trainingLabels has 9999 lines, trainingData has 10000

Q&A related to the challenge: JRS 2012 DM Competition: Topical Classification of Biomedical Research Papers

Post by cwilkes » Tue Jan 03, 2012 1:38 am

$ wc -l trainingLabels.txt trainingData.csv
9999 trainingLabels.txt
10000 trainingData.csv


Shouldn't the two files have the same number of lines, if each line in trainingLabels.txt corresponds to one line in trainingData.csv?

Re: trainingLabels has 9999 lines, trainingData has 10000

Post by cwilkes » Tue Jan 03, 2012 6:19 am

Figured it out: the majorityClasses.txt and trainingLabels.txt files are missing an EOL character after the last row, so wc -l reports them as one row short. That should probably be fixed in the downloadable files.
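
For anyone hitting the same mismatch: wc -l counts newline characters rather than rows, so a final row without a trailing EOL goes uncounted. A minimal Python sketch of the difference, using a made-up three-row sample (the helper names count_newlines and count_lines are hypothetical, not part of the competition materials):

```python
def count_newlines(data: bytes) -> int:
    """Count newline bytes, which is the number `wc -l` reports."""
    return data.count(b"\n")


def count_lines(data: bytes) -> int:
    """Count logical rows, including a last row with no trailing newline."""
    return len(data.splitlines())


# Hypothetical sample: three rows, the last one without a trailing newline.
sample = b"1,2\n3\n4,5"

print(count_newlines(sample), count_lines(sample))  # prints: 2 3
```

Reading the file row by row (as most CSV and text readers do) gives the second number, so the labels and data still line up one to one.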

Re: trainingLabels has 9999 lines, trainingData has 10000

Post by NosferatoCorp » Tue Jan 03, 2012 9:32 am

Hi,

Thank you for your post. I guess that line counting is system/software dependent. On my system, trainingLabels.txt has a perfectly fine 10000 lines, with EOF ending the last line :-). Anyway, I have added the EOL at the end of trainingLabels.txt and majorityClasses.txt to avoid confusion.

Best,
Andrzej Janusz
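
The fix described above amounts to appending an EOL to any file that lacks one. A rough sketch of that in Python, in case you want to patch an older download locally; ensure_trailing_newline is a hypothetical helper, not a script the organizers used, and the demo runs on a temporary file rather than the real data:

```python
import os
import tempfile


def ensure_trailing_newline(path: str) -> bool:
    """Append a newline if the file is non-empty and lacks one.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb+") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() == 0:
            return False  # empty file: nothing to terminate
        f.seek(-1, os.SEEK_END)
        if f.read(1) == b"\n":
            return False  # already ends with a newline
        f.seek(0, os.SEEK_END)
        f.write(b"\n")
        return True


# Demo on a temporary stand-in file (assumption: two rows, no final EOL).
fd, demo_path = tempfile.mkstemp()
os.write(fd, b"label1\nlabel2")
os.close(fd)

print(ensure_trailing_newline(demo_path))  # first call appends the newline
print(ensure_trailing_newline(demo_path))  # second call leaves the file alone
os.remove(demo_path)
```

The check is idempotent, so running it on the corrected files does nothing.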

