## A problem exists in dataset 6 of basic track!

Questions, answers, discussions related to RSCTC'2010 Discovery Challenge

### A problem exists in dataset 6 of basic track!

Hi

Our team has been working on the Basic Track of the competition for about two months, and
during this work we suspected that something was wrong with dataset 6. Today we found a reason
for this suspicion.

The problem is probably either in the test set of dataset 6 itself or in the way you compute balanced
accuracy for this dataset.

To get the accuracy of our model on a single dataset, we set the predictions for all datasets
other than the one of interest to zero. The leaderboard feedback then gives us the accuracy of
the model on that specific dataset.

In this manner, if someone submits a constant prediction (all 1, or all 2, ..., or all C) for dataset X,
which has C classes, and sets the predictions for the other datasets to zero, the expected leaderboard
accuracy is [(0% + 0% + 0% + 0% + 0% + 100%/C) / 6]. This means that for a 5-class dataset, submitting
all 1 should give accuracy = 3.00%. The same holds for submitting all 2, 3, 4 or 5.
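The arithmetic above can be checked with a minimal sketch. It assumes that balanced accuracy is the mean of per-class recall and that the leaderboard score is the plain average over the six datasets; neither detail is confirmed by the organizers in this thread. The labels in `y_true` are made up for illustration, and the function names are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true (assumed definition)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def expected_constant_score(num_classes, num_datasets=6):
    """Leaderboard score when one dataset scores 1/C and the others score 0."""
    return (1.0 / num_classes) / num_datasets


# Made-up 5-class labels; every class appears at least once.
y_true = [1, 1, 2, 3, 3, 3, 4, 5, 5]

# Any constant prediction k gets recall 1 on class k and 0 elsewhere, so 1/C overall.
for k in range(1, 6):
    y_pred = [k] * len(y_true)
    assert abs(balanced_accuracy(y_true, y_pred) - 1 / 5) < 1e-12

print(f"{expected_constant_score(5):.2%}")
```

Under these assumptions the exact value is 100%/5/6, about 3.33%; the 3.00% quoted above would then be that value as displayed on the leaderboard, presumably after rounding.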

This is what happens with every dataset except dataset 6.

Anyone can test this:

If you submit zero as the prediction for all of datasets 1, 2, 3, 4 and 5 and submit one as the prediction
for all test points in dataset 6, you get 0.00%. The results for each constant prediction are:

if you submit all 1 for dataset 6 you get 0.00%
if you submit all 2 for dataset 6 you get 3.00%
if you submit all 3 for dataset 6 you get 2.00%
if you submit all 4 for dataset 6 you get 1.00%
if you submit all 5 for dataset 6 you get 1.00%

This indicates that there is a problem with this specific dataset.

I tested dataset 3, and for all classes I get 3.00%, as expected.

Please fix this problem or point out where I am making a mistake.

But we identified a strange thing in dataset 6:
when you
MohsenH

### Re: A problem exists in dataset 6 of basic track!

This doesn't mean there is a problem with the dataset.

There are only 11 instances of class 1 in the training set. There may be fewer in the test set. The leaderboard scores are computed on only a subset of the test set, so there could actually be no cases of class 1 in the leaderboard set, which would give you the result you are getting.

Does this make sense?
Guest

### Re: A problem exists in dataset 6 of basic track!

Ah, but I see your point: a score of 0 for class 1 is OK, but you would expect to get the same score for all the others (2, 3, 4 and 5).
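A small sketch of this reasoning, assuming balanced accuracy is the mean of per-class recall over the classes actually present in the leaderboard subset (the evaluator's exact definition is not stated in this thread). The labels in `leaderboard_true` are hypothetical and deliberately contain no class 1.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true (assumed definition)."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical leaderboard subset in which class 1 never occurs.
leaderboard_true = [2, 2, 3, 4, 4, 4, 5, 5]

# Score of each constant submission "all k" on this subset.
scores = {
    k: balanced_accuracy(leaderboard_true, [k] * len(leaderboard_true))
    for k in range(1, 6)
}
print(scores)
```

With class 1 absent, "all 1" scores 0 while every other constant submission scores the same value, regardless of how unevenly the remaining classes are distributed. If the evaluator instead divided by all 5 classes, the common value would change but the four scores would still be equal, so a missing class alone would not explain unequal scores such as 3.00%, 2.00%, 1.00% and 1.00%.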
Guest

### Re: A problem exists in dataset 6 of basic track!

I just submitted all 1s for set 6, zeros for all other sets and got 0.03 as expected.

IgglePiggle


### Re: A problem exists in dataset 6 of basic track!

To: IgglePiggle

I checked dataset 6 with all 1s again and again got 0.00%.

But anyway, did you test it with the other values: 2, 3, 4 and 5?
MohsenH

### Re: A problem exists in dataset 6 of basic track!

Hello

MohsenH, we've checked your submission, and it seems that your data6_dec.txt file is broken: it contains some additional bytes that shouldn't be there, so during evaluation your predictions are not recognized correctly. Please check this.
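One way to look for such stray bytes is sketched below. It assumes the decision file should contain only ASCII digits, spaces and line breaks; the actual file format is not specified in this thread, so adjust `ALLOWED` as needed. The helper names `find_unexpected_bytes` and `check_file` are hypothetical.

```python
# Assumption: a valid decision file holds only digits, spaces and newlines.
ALLOWED = set(b"0123456789\r\n ")


def find_unexpected_bytes(data, allowed=ALLOWED):
    """Return (offset, byte value) pairs for every byte outside the allowed set."""
    return [(i, b) for i, b in enumerate(data) if b not in allowed]


def check_file(path):
    """Read a file in binary mode and report its unexpected bytes."""
    with open(path, "rb") as f:
        return find_unexpected_bytes(f.read())


# Example: a UTF-8 byte order mark in front of otherwise valid predictions.
sample = b"\xef\xbb\xbf1\n2\n1\n"
print(find_unexpected_bytes(sample))
```

Calling `check_file("data6_dec.txt")` on the submitted file would list the offsets of any bytes outside the assumed set, for example a byte order mark added by a text editor.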

Regards, Marcin Wojnarski
Marcin
