by dslate » Tue Jan 18, 2011 5:51 pm
Thanks, Joanna, for clarifying how solutions are evaluated.
It does seem to me, however, that designating the solution that is
best on the preliminary test data as the "active" solution for
purposes of final evaluation could have some odd consequences.
Suppose, for example, that a team submits a solution that performs
best on the preliminary data and thus becomes their "active" solution.
The team later decides, based on experiments with holdout sets,
cross-validation, etc., to submit a different solution that in fact
performs better on the final test data (without the team's knowledge,
of course) but, because of statistical noise, scores worse on the
preliminary data. The earlier solution then blocks the later one
purely through a statistical fluke of the smaller, and hence noisier,
preliminary dataset.
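
To get a feel for how often this could happen, here is a minimal
simulation sketch in Python. All of the numbers in it are invented
for illustration (true accuracies of 0.70 and 0.71, a 1,000-case
preliminary set); none of them come from this competition.

[code]
import numpy as np

rng = np.random.default_rng(0)

TRUE_ACC_OLD = 0.70   # earlier "active" solution's true accuracy (assumed)
TRUE_ACC_NEW = 0.71   # later solution, genuinely better (assumed)
PRELIM_N = 1_000      # preliminary test set size (assumed)
TRIALS = 100_000      # number of simulated competitions

# Observed preliminary accuracies: binomial noise around the true rates.
old_scores = rng.binomial(PRELIM_N, TRUE_ACC_OLD, TRIALS) / PRELIM_N
new_scores = rng.binomial(PRELIM_N, TRUE_ACC_NEW, TRIALS) / PRELIM_N

# Fraction of trials in which the genuinely better solution fails to
# beat the earlier one on the preliminary set, and so never becomes
# the team's "active" solution.
blocked = np.mean(new_scores <= old_scores)
print(f"Better solution blocked in {blocked:.1%} of trials")
[/code]

With these invented numbers the better solution loses the preliminary
comparison roughly a third of the time, so the fluke I am describing
would hardly be rare.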
If the effect I describe above is real (and undesirable), it could be
mitigated by permitting a team to cancel any of their submitted
solutions. As far as I know, there is currently no way to do that.
Any comments on this issue?
Thanks,
-- dslate