A problem exists in dataset 6 of basic track!

Questions, answers, discussions related to RSCTC'2010 Discovery Challenge


Post by MohsenH » Sun Feb 21, 2010 11:22 pm

Hi

Our team has been working on the Basic Track of the competition for about two months, and
during that time we suspected that something was wrong with dataset 6. Today we found a reason
for this.

The problem probably lies either in the test set of dataset 6 itself or in the way balanced
accuracy is computed for this dataset.

To measure the accuracy of our model on a single dataset, we set the predictions for all datasets
other than the one of interest to zero, so the leaderboard feedback reflects the accuracy of that
specific model only.
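A minimal Python sketch of such a probe submission follows. It assumes one integer label per line, files named after the pattern of data6_dec.txt, and that 0 is never a real class label; the function name write_probe_submission and the test-set sizes passed to it are hypothetical.

```python
from pathlib import Path


def write_probe_submission(out_dir, test_sizes, target, label):
    """Write one prediction file per dataset (hypothetical helper).

    Every test point of dataset `target` gets `label`; every test point of the
    other datasets gets 0, assumed never to be a real class. `test_sizes` maps
    dataset number -> number of test points (take real counts from the data).
    The one-label-per-line format is an assumption, not the official spec.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for ds, n in sorted(test_sizes.items()):
        value = label if ds == target else 0
        # newline="\n" keeps plain LF line endings on every platform
        with open(out / f"data{ds}_dec.txt", "w", encoding="ascii", newline="\n") as f:
            f.writelines(f"{value}\n" for _ in range(n))
```

For example, write_probe_submission("probe", sizes, target=6, label=2) would produce the "all 2 for dataset 6" probe, given a dictionary sizes holding the real test-set sizes.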

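In this manner, if someone submits a constant label (all 1s, or all 2s, ..., or all Cs) for a dataset X
that has C classes and sets the predictions for the other datasets to zero, the expected leaderboard
accuracy is (0% + 0% + 0% + 0% + 0% + 100%/C) / 6, because a constant prediction gets every example of
one class right and every example of the other C - 1 classes wrong, giving a balanced accuracy of
100%/C (assuming every class appears among the evaluated test points). So for a 5-class dataset,
submitting all 1s should give 20% / 6 ≈ 3.33%, displayed as 3.00% on the leaderboard. The same
holds for submitting all 2s, 3s, 4s or 5s.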
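The arithmetic can be checked with a short Python sketch. It assumes balanced accuracy is the mean per-class recall over the classes present in the evaluated labels and that the leaderboard score is the unweighted mean over the six datasets; both are assumptions about the evaluation, and the function names and toy labels are made up for illustration.

```python
from fractions import Fraction


def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes that occur in y_true (assumed definition)."""
    classes = sorted(set(y_true))
    total = Fraction(0)
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        total += Fraction(hits, len(idx))
    return total / len(classes)


def leaderboard_score(per_dataset):
    """Assumed aggregation: unweighted mean of the six per-dataset scores."""
    return sum(per_dataset, Fraction(0)) / len(per_dataset)


# Toy test labels for a 5-class dataset X (made up for illustration).
y_true = [1, 1, 2, 3, 3, 3, 4, 5, 5]
for label in range(1, 6):
    ba = balanced_accuracy(y_true, [label] * len(y_true))
    # Zero predictions on the other five datasets contribute 0 each.
    score = leaderboard_score([0, 0, 0, 0, 0, ba])
    print(label, ba, f"{float(score):.2%}")  # prints e.g. "1 1/5 3.33%"
```

Every constant label gives the same 1/5 balanced accuracy and the same 1/30 overall score, which is why all five probes are expected to agree.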

This holds for every dataset except dataset 6, and anyone can check it:

If you submit zero as the prediction for all of datasets 1, 2, 3, 4 and 5 and 1 as the prediction
for every test point in dataset 6, you get 0.00%. The results for each label are:

if you submit all 1 for dataset 6 you get 0.00%
if you submit all 2 for dataset 6 you get 3.00%
if you submit all 3 for dataset 6 you get 2.00%
if you submit all 4 for dataset 6 you get 1.00%
if you submit all 5 for dataset 6 you get 1.00%

This indicates that there is a problem with this specific dataset.

I also tested dataset 3, and for every class I get 3.00%, as expected.

Please fix this problem or point out where I am making a mistake.



But we identified a strange thing in dataset 6:
when you
MohsenH
 

Re: A problem exists in dataset 6 of basic track!

Post by Guest » Mon Feb 22, 2010 12:02 am

This doesn't mean there is a problem with the dataset.

There are only 11 instances of class 1 in the training set, and there may be fewer in the test set. The leaderboard scores are computed on only a subset of the test set, so there could actually be no cases of class 1 in the leaderboard subset, which would give you the result you are getting.
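This explanation can be illustrated with a minimal Python sketch. It assumes that balanced accuracy averages recall only over the classes that actually occur in the evaluated subset; the subset below is a made-up example, not the real leaderboard data.

```python
from fractions import Fraction


def balanced_accuracy(y_true, y_pred):
    # Assumption: recall is averaged only over classes that occur in y_true.
    classes = sorted(set(y_true))
    total = Fraction(0)
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        total += Fraction(sum(y_pred[i] == c for i in idx), len(idx))
    return total / len(classes)


# Hypothetical leaderboard subset of a 5-class dataset with no class-1 cases.
subset = [2, 2, 3, 4, 4, 4, 5]
scores = {label: balanced_accuracy(subset, [label] * len(subset)) for label in range(1, 6)}
```

Under this assumption, all 1s scores 0 while all 2s, 3s, 4s and 5s each score 1/4. A missing class therefore explains the 0.00% for label 1, but the other four labels should still score the same.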

Does this make sense?
Guest
 

Re: A problem exists in dataset 6 of basic track!

Post by Guest » Mon Feb 22, 2010 12:11 am

Ah, but I see your point: a score of 0 for label 1 is OK, but you would then expect to get the same score for all the others (2, 3, 4 and 5).
Guest
 

Re: A problem exists in dataset 6 of basic track!

Post by IgglePiggle » Mon Feb 22, 2010 12:32 am

I just submitted all 1s for set 6 and zeros for all other sets, and got 0.03 as expected.

IgglePiggle
 
Posts: 4
Joined: Sat Dec 05, 2009 11:38 am

Re: A problem exists in dataset 6 of basic track!

Post by MohsenH » Mon Feb 22, 2010 1:14 am

To: IgglePiggle

I checked dataset 6 again with all 1s and again got 0.00%.

But anyway, did you test it with all the other values (2, 3, 4, 5)?
MohsenH
 

Re: A problem exists in dataset 6 of basic track!

Post by Marcin » Tue Feb 23, 2010 2:17 pm

Hello

MohsenH, we've checked your submission, and it seems that the data6_dec.txt file is broken: it contains some additional bytes that shouldn't be there, so during evaluation your predictions are not recognized correctly. Please check this.
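One way to look for such stray bytes before submitting is sketched below in Python. It assumes a strict format of ASCII text, one integer label per line, LF line endings; that format and the function name find_format_problems are assumptions, not the organizers' specification.

```python
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def find_format_problems(path, allowed_labels):
    """List bytes or lines a strict parser might reject (hypothetical checker).

    Assumed format: ASCII, one integer label per line, LF line endings.
    """
    data = Path(path).read_bytes()
    problems = []
    if data.startswith(UTF8_BOM):
        problems.append("UTF-8 byte order mark at start of file")
        data = data[len(UTF8_BOM):]
    if b"\r" in data:
        problems.append("carriage return (CR) bytes present")
    non_ascii = sorted({b for b in data if b > 0x7F})
    if non_ascii:
        problems.append(f"non-ASCII bytes: {[hex(b) for b in non_ascii]}")
    lines = data.split(b"\n")
    if lines and lines[-1] == b"":
        lines.pop()  # a single trailing newline is fine
    for lineno, raw in enumerate(lines, start=1):
        token = raw.rstrip(b"\r")  # CR already reported once above
        # bytes.isdigit() is False for empty lines, spaces, signs and junk bytes
        if not token.isdigit() or int(token) not in allowed_labels:
            problems.append(f"line {lineno}: unexpected content {raw!r}")
    return problems
```

For a 5-class dataset, find_format_problems("data6_dec.txt", range(1, 6)) returns an empty list when the file matches the assumed format.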

Regards, Marcin Wojnarski
Marcin
 
Posts: 115
Joined: Fri Oct 09, 2009 6:45 pm

