This page describes the task of the Basic Track only. The task of the Advanced Track is described on a separate page.
Datasets for 6 different problems of DNA microarray data analysis and classification are located in the TunedIT Repository, in the RSCTC/2010/B/public folder. They are available in both ARFF and CSV formats, either as separate files or zipped together. You must be logged in to TunedIT and registered for the challenge to view and download them; otherwise, they will not even appear in the folder contents.
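Your task is to train classifiers on the training data of each of the 6 datasets and use them to predict decisions for the corresponding test samples.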
A solution should be a ZIP file containing 6 files, one with the predicted decisions for each dataset:
data1_dec.txt, data2_dec.txt, ..., data6_dec.txt
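A minimal sketch of packaging a solution with Python's standard `zipfile` module is shown below. It assumes one predicted decision per line, in the same order as the test samples; this page does not specify the exact layout of the files, so check it against the challenge's own format description before submitting. The function name `write_solution` is hypothetical.

```python
import zipfile


def write_solution(predictions, zip_path="solution.zip"):
    """Write data1_dec.txt ... data6_dec.txt into a single ZIP archive.

    predictions: list of 6 sequences; predictions[i] holds the predicted
    decisions for the test samples of dataset i+1.
    Assumption (not stated on this page): one decision per line,
    in the order in which the test samples appear.
    """
    if len(predictions) != 6:
        raise ValueError("expected predictions for exactly 6 datasets")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for i, decisions in enumerate(predictions, start=1):
            content = "\n".join(str(d) for d in decisions) + "\n"
            zf.writestr(f"data{i}_dec.txt", content)
    return zip_path
```

For example, `write_solution([preds1, preds2, preds3, preds4, preds5, preds6])` produces `solution.zip` in the current directory.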
The predicted decisions are compared with the correct ones, and the balanced accuracy of the predictions is calculated for each dataset. The results are then averaged over all 6 datasets.
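Balanced accuracy is the average of the standard classification accuracies $acc_k$ calculated separately for each decision class $k = 1, \dots, K$:
$$BAC = \frac{1}{K} \sum_{k=1}^{K} acc_k,$$
where $acc_k$ is the fraction of test samples from class $k$ that are classified correctly.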
In this way, every class contributes equally to the final result, regardless of how frequent it is.
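The scoring described above can be sketched in Python as follows, for checking your own predictions locally (for example, on a held-out part of the training data). This is an illustrative sketch, not the organizers' evaluation code; it assumes that the decision classes are those occurring among the correct decisions, and the function names are hypothetical.

```python
from collections import defaultdict


def balanced_accuracy(true_labels, predicted_labels):
    """Mean of the per-class accuracies acc_k.

    Assumption: the classes k are those present in true_labels.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    total = defaultdict(int)    # number of samples of class k
    correct = defaultdict(int)  # number of correctly classified samples of class k
    for t, p in zip(true_labels, predicted_labels):
        total[t] += 1
        if t == p:
            correct[t] += 1
    per_class = [correct[k] / total[k] for k in total]
    return sum(per_class) / len(per_class)


def challenge_score(datasets):
    """Average balanced accuracy over (true_labels, predicted_labels) pairs, one per dataset."""
    scores = [balanced_accuracy(t, p) for t, p in datasets]
    return sum(scores) / len(scores)
```

For instance, with correct decisions `["a", "a", "a", "b"]` and predictions `["a", "a", "a", "a"]`, plain accuracy is 0.75 but balanced accuracy is (1 + 0) / 2 = 0.5, because the rare class `b` counts as much as the frequent class `a`.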
Note that only half of the decisions are used to calculate the preliminary result that appears on the Leaderboard. The other half will be used for the final evaluation, so that the final result is unbiased. Keep in mind that the test datasets are small, so it is relatively easy to overfit to the preliminary test subset: the highest preliminary score seen on the Leaderboard during the challenge may ultimately turn out to result from overfitting rather than from a high-quality algorithm. If you want your algorithm to be evaluated more precisely, we invite you to take part in the Advanced Track, where the evaluation procedure is much more thorough because the same dataset is reused multiple times.
The Leaderboard of the Basic Track contains four baseline results. The three named Baseline_xxx correspond to simple feature selection algorithms combined with the 1-Nearest Neighbor classification algorithm. The one named Baseline corresponds to a naive majority classifier.
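The two kinds of baseline classifiers can be sketched in plain Python as below. The feature selection methods used by the Baseline_xxx entries are not described on this page, so the 1-NN sketch simply takes a hypothetical `features` list of selected column indices (standing in for the output of some selection step) and uses Euclidean distance, which is also an assumption. The function names are hypothetical.

```python
from collections import Counter


def majority_classifier(train_labels, n_test):
    """Predict the most frequent training class for every test sample."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_test


def one_nn(train_X, train_y, test_X, features=None):
    """1-Nearest Neighbor restricted to the given feature indices.

    features: hypothetical output of a feature selection step;
    None means that all features are used.
    Assumption: (squared) Euclidean distance, which ranks neighbors
    the same way as Euclidean distance.
    """
    if features is None:
        features = range(len(train_X[0]))
    features = list(features)
    predictions = []
    for x in test_X:
        best_label, best_dist = None, float("inf")
        for row, label in zip(train_X, train_y):
            d = sum((row[j] - x[j]) ** 2 for j in features)
            if d < best_dist:
                best_label, best_dist = label, d
        predictions.append(best_label)
    return predictions
```

Under balanced accuracy, the majority classifier scores exactly 1/K on a test set that contains all K classes (accuracy 1 for the majority class and 0 for every other class), regardless of how imbalanced the classes are. This makes it a natural lower reference point.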
Marcin Wojnarski, Andrzej Janusz, Hung Son Nguyen, Jan Bazan