Question about submission evaluation

Questions, answers, discussions related to ISMIS 2011 Contest

Question about submission evaluation

Postby dslate » Sun Jan 16, 2011 4:28 pm

I have a question about the scoring of submissions.
On the submissions page it says:

In this track, the best solution is the active one: it is shown
on Leaderboard and will be used in final evaluation.

Presumably this means that a team's "best solution" as it appears on the
leaderboard may differ from the "best solution" used in the final
evaluation. It seems that it would be to each team's advantage to
submit as many solutions as possible, so as to increase their expected
final score.

Rewarding teams for the quantity of their submissions may not be what
the organizers intended. This issue has come up in other contests for
which the "best" as opposed to "last" solution was the active one. As
a result, the "Kaggle" data prediction competition site is asking each
team to select 5 of their submissions at the end of the competition,
and the best of these 5 on the final test set determines the team's
final ranking. This rule reduces the incentive for trying to "game
the system" by maximizing submission count.
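For illustration, here is a quick simulation of the effect I have in mind,
under the reading that whichever submission happens to score best on the
final test set is the one that counts (the accuracy and noise figures are
entirely made up):

    import numpy as np

    rng = np.random.default_rng(0)
    true_acc = 0.70   # assume every submission is equally good in truth
    noise_sd = 0.01   # per-submission noise on the final test score
    trials = 10_000

    for n_subs in (1, 10, 100, 1000):
        # best final-test score among n_subs otherwise identical submissions
        best = (true_acc + rng.normal(0, noise_sd, (trials, n_subs))).max(axis=1)
        print(f"{n_subs:5d} submissions -> mean selected score {best.mean():.4f}")

The mean of the selected score creeps upward purely because more lottery
tickets were bought, not because any of the submissions got better.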

Does anyone have any comments on this issue?

Thanks,

-- "Old Dogs With New Tricks"
dslate
 
Posts: 15
Joined: Mon Jul 05, 2010 4:35 am

Re: Question about submission evaluation

Postby dslate » Mon Jan 17, 2011 7:40 am

Just a follow-up to my question about submission evaluation:
On the site it says:

You may submit solutions many times, for the whole duration of the challenge.

Is there a maximum number of submissions for the challenge, and/or a maximum number per day?

Thanks,

-- dslate
dslate
 
Posts: 15
Joined: Mon Jul 05, 2010 4:35 am

Re: Question about submission evaluation

Postby Swietlicka » Mon Jan 17, 2011 4:48 pm

Hello,

Actually, by "best solution" we mean the one with the highest score in the preliminary tests. Therefore, it is always the same solution as the one on the leaderboard. Please also see our wiki.

Regarding the maximum number of submissions, there is a limit for the whole challenge: 1000. However, there are no daily limits.

Regards,
Joanna, TunedIT
Swietlicka
 

Re: Question about submission evaluation

Postby dslate » Tue Jan 18, 2011 5:51 pm

Thanks Joanna for clarifying how solutions are evaluated.

It does seem to me, however, that designating the solution that is
best on the preliminary test data as the "active" solution for
purposes of final evaluation could have some odd consequences.
Suppose, for example, that a team submits a solution that performs
best on the preliminary data and thus becomes their "active" solution.
This team later decides, based on experiments with holdout sets,
cross-validation, etc., to submit a different solution which, in fact,
performs better on the final test data (without the team's knowledge,
of course) but, because of statistical noise, does not do as well on
the preliminary data. The team's earlier solution in effect blocks
the later one because of a statistical fluke involving the smaller,
hence noisier, preliminary dataset.
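To put rough numbers on it, here is a small simulation of how often the
older, truly worse solution would keep its "active" status (the true
accuracies and preliminary-set sizes are just guesses for illustration):

    import numpy as np

    rng = np.random.default_rng(0)
    p_old, p_new = 0.700, 0.705   # the newer solution is truly better by 0.5%
    trials = 100_000

    for n_prelim in (500, 2000, 10_000):
        acc_old = rng.binomial(n_prelim, p_old, trials) / n_prelim
        acc_new = rng.binomial(n_prelim, p_new, trials) / n_prelim
        blocked = np.mean(acc_old >= acc_new)   # older solution stays "active"
        print(f"preliminary set of {n_prelim:6d}: blocked in {blocked:.1%} of trials")

The smaller the preliminary set, the more often this reversal happens.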

If the effect I describe above is real (and undesirable), it could be
mitigated by permitting a team to cancel any of their submitted
solutions. As far as I know, there is currently no way to do that.

Any comments on this issue?

Thanks,

-- dslate
dslate
 
Posts: 15
Joined: Mon Jul 05, 2010 4:35 am

Re: Question about submission evaluation

Postby Swietlicka » Thu Jan 20, 2011 4:54 pm

Yes, in general you're right and in fact we do plan to implement the functionality you mentioned. However, the changes would be significant and we can't introduce them right now.

On the other hand, considering the size of the data in the contest, it is not a big issue this time.

Regards,
Joanna, TunedIT
Swietlicka
 

Re: Question about submission evaluation

Postby dslate » Fri Jan 21, 2011 8:52 pm

Joanna,

I'm glad that you plan to change the submission rules along the lines I suggested, and I can understand why you wouldn't want to change them while the contest is already running. However, I'm not so sure that the amount of data involved is sufficient to prevent the kind of problem I described.

Thanks,

-- dslate
dslate
 
Posts: 15
Joined: Mon Jul 05, 2010 4:35 am

Re: Question about submission evaluation

Postby abusche » Mon Jan 31, 2011 11:52 am

Dear all,

May I ask one more question about the evaluation procedure?

Without being too concrete at this point (I could be, if requested), and at least for the genre challenge: is the class distribution more or less the same for the training, preliminary (leaderboard), and final test sets, or are there (intentional or unintentional) variations/shifts?

Thanks & Best,
André
abusche
 
Posts: 2
Joined: Tue Jan 25, 2011 2:25 pm

Re: Question about submission evaluation

Postby Swietlicka » Wed Feb 02, 2011 6:05 pm

Dear André,

There may be some variation in the class distribution between the datasets, so you shouldn't make overly strong assumptions about it.

Regards,
Joanna, TunedIT
Swietlicka
 

Re: Question about submission evaluation

Postby dslate » Thu Feb 03, 2011 5:25 am

Just adding to what Joanna said, the page describing the genres track says:

Training and test datasets contain tracks of distinct performers.

I interpret this as saying that the performers in the test dataset are not the same as those in the training dataset.
If this is true, then the variation between training and test data involves more than just differences in
class frequencies. However, is it safe to assume that the leaderboard and final portions of the test data are
chosen randomly from the entire test set, and therefore that the class frequencies should be roughly the same in each,
apart from statistical variation?

Thanks,

-- dslate
dslate
 
Posts: 15
Joined: Mon Jul 05, 2010 4:35 am

Re: Question about submission evaluation

Postby Swietlicka » Fri Feb 04, 2011 1:39 pm

Indeed: the training set contains tracks of different performers than the test set. And yes, the leaderboard and final data portions were chosen randomly, so considering the size of the dataset, the class distribution should be very similar in both of them.
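As a rough back-of-the-envelope check (the test-set size and genre frequency below are only illustrative assumptions, not the actual figures), the sampling variation between two randomly chosen halves is small:

    import math

    n_half = 5000   # assumed size of each half of the test set (illustrative only)
    p = 0.20        # assumed frequency of some genre in the full test set

    # approximate standard error of the difference in that genre's frequency
    # between the two randomly chosen halves
    se_diff = math.sqrt(2 * p * (1 - p) / n_half)
    print(f"typical frequency difference between halves: about {se_diff:.3%}")

With numbers of that order, the difference stays well under one percentage point.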

Regards,
Joanna, TunedIT
Swietlicka
 

Re: Question about submission evaluation

Postby jamesxli » Mon Feb 07, 2011 2:51 am

Is there going to be another final test dataset for this contest? If so, when will the final test data
be published? How will the score on the preliminary test data be included in the final score?
jamesxli
 
Posts: 19
Joined: Wed Dec 09, 2009 6:55 pm

Re: Question about submission evaluation

Postby Swietlicka » Mon Feb 07, 2011 5:29 pm

There is only one final test dataset, which will be published after the end of the contest. The preliminary test score will not be included in the final score in any way - the final evaluation is completely distinct from the preliminary one.

Regards,
Joanna
Swietlicka
 

Re: Question about submission evaluation

Postby jamesxli » Mon Feb 07, 2011 5:55 pm

So, during what time interval can people submit results for the final test data?
jamesxli
 
Posts: 19
Joined: Wed Dec 09, 2009 6:55 pm

Re: Question about submission evaluation

Postby Swietlicka » Mon Feb 07, 2011 6:20 pm

Generally, you can (and should) do so throughout the duration of the challenge. Here's why: from the participants' point of view there is no distinct final test dataset. A solution should simply always contain predictions for the whole test dataset. Internally, however, the test dataset is split into two parts: the first is used for the preliminary tests (whose scores are published on the leaderboard) and the second for the final tests. Participants are not told which data samples belong to which part. I hope it's clearer now :)
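Roughly speaking, the evaluation works along the lines of the sketch below. The file names, split ratio, and accuracy metric are simplified placeholders, not the exact implementation:

    import numpy as np

    rng = np.random.default_rng(2011)

    # hidden ground truth for the WHOLE test set (never shown to participants)
    true_labels = np.loadtxt("test_labels_secret.txt", dtype=str)
    n = len(true_labels)

    # one random split, fixed for the entire contest
    prelim_mask = np.zeros(n, dtype=bool)
    prelim_mask[rng.choice(n, size=n // 2, replace=False)] = True

    def score(submission_file):
        # a submission holds one prediction per test sample, for all of them
        preds = np.loadtxt(submission_file, dtype=str)
        prelim_acc = np.mean(preds[prelim_mask] == true_labels[prelim_mask])   # leaderboard
        final_acc = np.mean(preds[~prelim_mask] == true_labels[~prelim_mask])  # after the contest
        return prelim_acc, final_acc

So every submission is scored on both parts at once; only the preliminary part is ever visible on the leaderboard.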

Regards,
Joanna
Swietlicka
 


