
Question about submission evaluation

Posted: Sun Jan 16, 2011 4:28 pm
by dslate
I have a question about the scoring of submissions.
On the submissions page it says:

In this track, the best solution is the active one: it is shown
on Leaderboard and will be used in final evaluation.

Presumably this means that a team's "best solution" as it appears on the
leaderboard may be different from the "best solution" used in final
evaluation. It seems that it is to the advantage of each team to
submit as many solutions as possible so as to increase the expected
final score.

Rewarding teams for the quantity of their submissions may not be what
the organizers intended. This issue has come up in other contests for
which the "best" as opposed to "last" solution was the active one. As
a result, the "Kaggle" data prediction competition site is asking each
team to select 5 of their submissions at the end of the competition,
and the best of these 5 on the final test set determines the team's
final ranking. This rule reduces the incentive for trying to "game
the system" by maximizing submission count.

Does anyone have any comments on this issue?

Thanks,

-- "Old Dogs With New Tricks"

Re: Question about submission evaluation

Posted: Mon Jan 17, 2011 7:40 am
by dslate
Just a follow-up to my question about submission evaluation:
On the site it says:

You may submit solutions many times, for the whole duration of the challenge.

Is there a maximum number of submissions for the challenge, and/or a maximum number per day?

Thanks,

-- dslate

Re: Question about submission evaluation

Posted: Mon Jan 17, 2011 4:48 pm
by Swietlicka
Hello,

Actually, by "best solution" we mean the one that has the highest score in the preliminary tests. Therefore, it is always the same solution as the one on leaderboard. Please also see our wiki.

Regarding the maximum number of submissions, there is a limit for the whole challenge: 1000. However, there are no daily limits.

Regards,
Joanna, TunedIT

Re: Question about submission evaluation

Posted: Tue Jan 18, 2011 5:51 pm
by dslate
Thanks Joanna for clarifying how solutions are evaluated.

It does seem to me, however, that designating the solution that is
best on the preliminary test data as the "active" solution for
purposes of final evaluation could have some odd consequences.
Suppose, for example, that a team submits a solution that performs
best on the preliminary data and thus becomes their "active" solution.
This team later decides, based on experiments with holdout sets,
cross-validation, etc., to submit a different solution which, in fact,
performs better on the final test data (without the team's knowledge,
of course) but, because of statistical noise, does not do as well on
the preliminary data. The team's earlier solution in effect blocks
the later one because of a statistical fluke involving the smaller,
hence noisier, preliminary dataset.
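
A rough sketch of how likely this blocking is (again in Python, with purely
hypothetical accuracies and preliminary-set size, since I don't know the real
ones): simulate an earlier solution with true accuracy 78% and a later one
with true accuracy 79%, score both on a preliminary split of 1000 samples,
and count how often the worse one still comes out ahead:

import random

TRUE_ACC_OLD = 0.78   # assumed true accuracy of the earlier, "active" solution
TRUE_ACC_NEW = 0.79   # assumed true accuracy of the later, genuinely better one
PRELIM_SIZE = 1000    # assumed number of samples in the preliminary split
TRIALS = 5000

def blocking_probability():
    """Fraction of trials in which the worse solution wins on the preliminary split."""
    blocked = 0
    for _ in range(TRIALS):
        old_hits = sum(random.random() < TRUE_ACC_OLD for _ in range(PRELIM_SIZE))
        new_hits = sum(random.random() < TRUE_ACC_NEW for _ in range(PRELIM_SIZE))
        if old_hits >= new_hits:  # the earlier solution remains "active"
            blocked += 1
    return blocked / TRIALS

print(round(blocking_probability(), 3))

With these particular assumptions the genuinely better solution is blocked in
roughly 30% of trials, which is why the effect worries me.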

If the effect I describe above is real (and undesirable), it could be
mitigated by permitting a team to cancel any of their submitted
solutions. As far as I know, there is currently no way to do that.

Any comments on this issue?

Thanks,

-- dslate

Re: Question about submission evaluation

Posted: Thu Jan 20, 2011 4:54 pm
by Swietlicka
Yes, in general you're right and in fact we do plan to implement the functionality you mentioned. However, the changes would be significant and we can't introduce them right now.

On the other hand, considering the size of the data in the contest, it is not a big issue this time.

Regards,
Joanna, TunedIT

Re: Question about submission evaluation

Posted: Fri Jan 21, 2011 8:52 pm
by dslate
Joanna,

I'm glad that you plan to change the submission rules along the lines I suggested, and I can understand why you wouldn't want to change them while the contest is already running. However, I'm not so sure that the amount of data involved is sufficient to prevent the kind of problem I described.

Thanks,

-- dslate

Re: Question about submission evaluation

Posted: Mon Jan 31, 2011 11:52 am
by abusche
Dear all,

may I ask a further question about the evaluation procedure?

Without being too concrete at this point (I could be, if requested), and at least for the genre challenge: is the class distribution (more or less) the same for the training, reported (leaderboard), and final test sets, or are there variations/shifts, intentional or not?

Thanks & Best,
André

Re: Question about submission evaluation

Posted: Wed Feb 02, 2011 6:05 pm
by Swietlicka
Dear André,

There may be some variations in the distribution of classes between the datasets, so you shouldn't make overly strong assumptions about it.

Regards,
Joanna, TunedIT

Re: Question about submission evaluation

Posted: Thu Feb 03, 2011 5:25 am
by dslate
Just adding to what Joanna said, the page describing the genres track says:

Training and test datasets contain tracks of distinct performers.

I interpret this as saying that the performers on the test dataset are not the same as those on the training dataset.
If this is true, then this makes for variations between training and test data that involve more than just the differences in
class frequencies. However, is it safe to assume that the leaderboard and final portions of the test data are
chosen randomly from the entire test set, and therefore the class frequencies should be roughly the same in each except for
statistical variation?
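
For a rough sense of how large that statistical variation could be, here is a
quick back-of-the-envelope check in Python (the set sizes and class frequency
are pure assumptions on my part, since I don't know the actual split):

import math

TEST_SIZE = 10000   # assumed size of the entire test set
SPLIT_SIZE = 5000   # assumed size of one randomly drawn portion (e.g. leaderboard)
CLASS_FREQ = 0.20   # assumed overall relative frequency of some class

# Hypergeometric variance of that class's count in one randomly drawn portion,
# converted to a standard deviation of its relative frequency.
count_variance = (SPLIT_SIZE * CLASS_FREQ * (1 - CLASS_FREQ)
                  * (TEST_SIZE - SPLIT_SIZE) / (TEST_SIZE - 1))
freq_std = math.sqrt(count_variance) / SPLIT_SIZE
print("one-sigma drift in class frequency: +/- %.4f" % freq_std)

With numbers of that order the drift is only a few tenths of a percentage
point, so random splitting should indeed keep the class frequencies close.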

Thanks,

-- dslate

Re: Question about submission evaluation

Posted: Fri Feb 04, 2011 1:39 pm
by Swietlicka
Indeed: the training set contains tracks of different performers than the test set. And yes, the leaderboard and final data portions were chosen randomly, so considering the size of the dataset, the class distribution should be very similar in both of them.

Regards,
Joanna, TunedIT

Re: Question about submission evaluation

Posted: Mon Feb 07, 2011 2:51 am
by jamesxli
Is there going to be another final test dataset for this contest? If yes, when will the final test data
be published? How will the score on the preliminary test data be included in the final score?

Re: Question about submission evaluation

Posted: Mon Feb 07, 2011 5:29 pm
by Swietlicka
There is only one final test dataset, which will be published after the end of the contest. The preliminary test score will not be included in the final score in any way - the final evaluation is completely distinct from the preliminary one.

Regards,
Joanna

Re: Question about submission evaluation

Posted: Mon Feb 07, 2011 5:55 pm
by jamesxli
So, during what time interval can people submit results for the final test data?

Re: Question about submission evaluation

Posted: Mon Feb 07, 2011 6:20 pm
by Swietlicka
Generally, you can (and should) do so throughout the duration of the challenge. Here's why: there is no distinct final test dataset from the participants' point of view. The solution should simply always contain predictions for the whole test dataset. However, during testing, the test dataset is split into two parts: the first for the preliminary tests (whose scores are published on the leaderboard), and the second for the final tests. This split is internal to the system, though, and participants won't be informed which data samples belong to which part of the dataset. I hope it's clearer now :)
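
Conceptually it works like the little sketch below (only an illustration with made-up labels and a simple accuracy metric, not our actual scoring code): a submission always contains predictions for the whole test set, and the system scores it twice against index sets that only the organizers know.

def accuracy(predictions, labels, indices):
    """Accuracy of the predictions against the labels, restricted to the given indices."""
    hits = sum(predictions[i] == labels[i] for i in indices)
    return hits / len(indices)

# Hypothetical data: a full test-set submission and the hidden ground truth.
predictions = ["rock", "jazz", "rock", "pop", "jazz", "pop"]
true_labels = ["rock", "jazz", "pop",  "pop", "rock", "pop"]

# Hidden, fixed split known only to the system.
preliminary_idx = [0, 2, 4]   # scored during the contest, shown on the leaderboard
final_idx = [1, 3, 5]         # scored only after the contest ends

print("leaderboard score:", accuracy(predictions, true_labels, preliminary_idx))
print("final score:", accuracy(predictions, true_labels, final_idx))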

Regards,
Joanna