RSCTC 2010 Discovery Challenge. Basic Track

Status Closed
Type Scientific
Start 2009-12-01 00:00:00 CET
End 2010-02-28 23:59:59 CET
Prize $1,000

Registration is required.

Task

This page describes the task of the Basic Track only. The task of the Advanced Track is described on a separate page.

Datasets

Datasets for 6 different problems of DNA microarray data analysis and classification are located in the Repository, in the RSCTC/2010/B/public folder. They are available in both ARFF and CSV formats, either as separate files or zipped together. You must be logged in to TunedIT and registered for the challenge to view and download them; otherwise they will not even appear in the folder contents.

Your task is to train classifiers on the dataX_train sets and apply them to predict decisions for all samples of the corresponding dataX_test sets.

Solution

The solution should be a ZIP file containing 6 files:

  data1_dec.txt
  data2_dec.txt
  ...
  data6_dec.txt

Each file should contain a list of decisions: integer numbers starting from 1, one per line. See example solution files: example-all1.zip, example-majority.zip.
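
The packaging step described above can be sketched in Python as follows. This is only an illustration, not an official tool: the function name build_solution_zip and its input layout (a dict from dataset number to a list of decisions) are assumptions made for this example. Only the file names dataX_dec.txt and the one-integer-per-line layout come from the description above.

```python
import io
import zipfile


def build_solution_zip(decisions_by_dataset):
    """Return the bytes of a ZIP archive with one dataN_dec.txt per dataset.

    decisions_by_dataset is a hypothetical input layout: a dict mapping the
    dataset number (1..6) to a list of integer decisions, each >= 1.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for number in sorted(decisions_by_dataset):
            decisions = [int(d) for d in decisions_by_dataset[number]]
            if any(d < 1 for d in decisions):
                raise ValueError(f"dataset {number}: decisions must start from 1")
            # One decision per line, as required for the *_dec.txt files.
            body = "".join(f"{d}\n" for d in decisions)
            archive.writestr(f"data{number}_dec.txt", body)
    return buffer.getvalue()
```

The returned bytes can be written to disk with, for example, open("solution.zip", "wb").write(build_solution_zip(decisions)), where decisions holds the predictions for all 6 test sets.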

Use the submission form to submit your solution. A preliminary result will appear on the Leaderboard.

Evaluation

Predicted decisions are compared with the correct ones, and the balanced accuracy of predictions is calculated for each dataset. The results are then averaged over all 6 datasets.

Balanced accuracy is an average of the standard classification accuracies (acc_k) calculated for each decision class (k = 1, 2, ..., K) independently:

S_k = #{ i : class(sample_i) = k }
acc_k = #{ i : prediction(sample_i) = class(sample_i) = k } / S_k
BalancedAcc = (acc_1 + acc_2 + ... + acc_K) / K

In this way, every class has the same contribution to the final result, no matter how frequent it is.
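
For local validation, the measure above can be sketched in Python as shown below. This is a minimal illustration rather than the organizers' evaluation code. The function names are made up for this example, and K is taken to be the number of classes that actually occur among the correct decisions, which is an assumption (a class with no samples would make its per-class accuracy undefined).

```python
from collections import Counter


def balanced_accuracy(true_classes, predictions):
    """Average of per-class accuracies, following the formulas above."""
    if len(true_classes) != len(predictions):
        raise ValueError("true_classes and predictions differ in length")
    if not true_classes:
        raise ValueError("at least one sample is required")
    # S_k: number of samples whose correct class is k.
    class_sizes = Counter(true_classes)
    # Numerator of acc_k: samples of class k that were predicted correctly.
    correct = Counter(t for t, p in zip(true_classes, predictions) if t == p)
    per_class = [correct[k] / class_sizes[k] for k in class_sizes]
    return sum(per_class) / len(per_class)


def average_over_datasets(per_dataset_scores):
    """Mean of the balanced accuracies obtained on the individual datasets."""
    return sum(per_dataset_scores) / len(per_dataset_scores)
```

For example, with correct decisions [1, 1, 1, 1, 2] and predictions [1, 1, 1, 1, 1], plain accuracy is 0.8, but acc_1 = 1 and acc_2 = 0, so the balanced accuracy is 0.5.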

Note that only half of the decisions are used to calculate the preliminary result, which appears on the Leaderboard. The other half will be used for the final evaluation, to guarantee that the final result is unbiased. Keep in mind that the test datasets are small, so it is relatively easy to overfit to the preliminary test subset. The highest preliminary score seen on the Leaderboard during the challenge may ultimately turn out to result from overfitting rather than from a high-quality algorithm. If you want your algorithm to be tested more precisely, we invite you to the Advanced Track, where the evaluation procedure is very thorough thanks to reusing the same dataset multiple times.

Baselines

The Leaderboard of the Basic Track contains four baseline results. The three named Baseline_xxx correspond to simple feature selection algorithms combined with the 1-Nearest Neighbor classification algorithm. The one named Baseline corresponds to a naive majority classifier.
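
A naive majority classifier of the kind mentioned above can be sketched in a few lines of Python. This is an illustration only, not the organizers' baseline code: the function name and the tie-breaking rule (the smallest class label wins) are assumptions made for this example.

```python
from collections import Counter


def majority_decisions(train_decisions, n_test_samples):
    """Predict the most frequent training class for every test sample."""
    if not train_decisions:
        raise ValueError("training decisions must not be empty")
    counts = Counter(train_decisions)
    # Assumption: ties are broken in favor of the smallest class label.
    majority = min(counts, key=lambda k: (-counts[k], k))
    return [majority] * n_test_samples
```

Under balanced accuracy, such a classifier is fully correct on one class and fully wrong on all others, so on a test set with K classes it scores 1/K, provided the predicted class occurs in that test set.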


Good luck!

Marcin Wojnarski, Andrzej Janusz, Hung Son Nguyen, Jan Bazan
The Organizers
