Materials Identification Based on Measurements of Passively Emitted Electromagnetic Radiation

Status Closed
Type Industrial
Start 2011-05-11 10:00:00 CET
End 2011-11-30 10:00:00 CET
Prize $45,000

Registration is required.

Task

A sensor that responds to spontaneous electromagnetic radiation from matter produces a small voltage output, which is acquired by a 24-bit analog-to-digital converter at a sampling rate of 16,384 samples per second. Each data set comes from a 2-second run. Voltage data are acquired from three different materials.
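The acquisition figures above imply a few derived quantities that are useful when sizing a feature extractor. The following is only a back-of-the-envelope sketch: the constant names are illustrative, and it assumes that every sample of a run is stored in the data file, which the description does not state explicitly.

```python
# Figures taken from the task description.
SAMPLING_RATE_HZ = 16_384   # samples per second
RUN_SECONDS = 2             # duration of one run
ADC_BITS = 24               # converter resolution

# Derived quantities (assumption: no decimation or truncation of a run).
samples_per_run = SAMPLING_RATE_HZ * RUN_SECONDS   # values in one data set
adc_levels = 2 ** ADC_BITS                         # distinct quantization levels
nyquist_hz = SAMPLING_RATE_HZ / 2                  # highest representable frequency

print(samples_per_run, adc_levels, nyquist_hz)
```

Both the sampling rate and the resulting run length are powers of two, which is convenient for FFT-based features.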

There are essentially two datasets, Training and Test, each containing 1500 samples. Your task is to build a model on the Training data and use it to identify the samples of the Test set. You will submit a text file with a list of A/B/C labels, one per line, as your solution. TunedIT will calculate recognition accuracy and use it to decide on prospective winners. These scores will be kept secret.

To make on-line preliminary evaluation possible and to publish preliminary results on the Leaderboard, so that you have some indication of the quality of your current solution, we created a Preliminary Test set composed of 500 randomly selected samples of the Training set. These samples were not removed from the Training set; the part of the Training set that remains when they are excluded is called Training-1000. Although the classifications of the Preliminary samples can be recovered just by comparing the Preliminary Test set with the full Training set, we ask you to generate your solution in the following way:

  1. Train your algorithm on Training-1000 and generate 500 labels for the Preliminary Test set.
  2. Train your algorithm again, now on the full Training set, and generate 1500 labels for the Test samples.
  3. Combine both lists of labels (500+1500) and submit them as a solution. Labels must be listed on consecutive lines, with an empty line separating the two parts, in the same order as the numbering of sample files in each set: 1.lvm, 2.lvm, ... Please follow this format strictly. View the baseline solution for an example.
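The combined file described in step 3 could be assembled roughly as follows. This is a sketch, not the official tooling: `numeric_order` and `format_solution` are hypothetical helper names, and the trailing newline at the end of the file is an assumption (compare with the baseline solution before submitting).

```python
VALID_LABELS = {"A", "B", "C"}

def numeric_order(filenames):
    """Sort names such as '10.lvm' by integer index rather than as strings."""
    return sorted(filenames, key=lambda name: int(name.split(".")[0]))

def format_solution(preliminary, final, n_preliminary=500, n_final=1500):
    """Join both label lists, one label per line, separated by one empty line."""
    if len(preliminary) != n_preliminary or len(final) != n_final:
        raise ValueError("expected %d + %d labels, got %d + %d"
                         % (n_preliminary, n_final, len(preliminary), len(final)))
    for label in list(preliminary) + list(final):
        if label not in VALID_LABELS:
            raise ValueError("invalid label: %r" % (label,))
    # Assumption: a single trailing newline is acceptable to the evaluator.
    return "\n".join(preliminary) + "\n\n" + "\n".join(final) + "\n"

# Example with placeholder predictions (not real model output).
solution_text = format_solution(["A"] * 500, ["B"] * 1500)
ordered = numeric_order(["10.lvm", "2.lvm", "1.lvm"])
```

Sorting by the integer index matters because a plain string sort would place 10.lvm before 2.lvm and silently misalign the labels with the sample files.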

Two scores will be calculated, a preliminary one and a final one, separately for each part of the solution file. Only the preliminary score will be disclosed on the Leaderboard; the final score will be used solely by TunedIT, to inform the prospective winner when the 95% threshold is achieved.
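To check a solution file locally, the two-part scoring can be approximated as below. The actual TunedIT evaluation code is not shown here, so this sketch assumes that "recognition accuracy" means the fraction of labels that match the true classification; `split_solution` and `accuracy` are hypothetical names.

```python
def split_solution(text):
    """Split a solution file into its preliminary and final label lists."""
    lines = text.splitlines()
    separator = lines.index("")          # the first empty line divides the two parts
    preliminary = lines[:separator]
    final = [line for line in lines[separator + 1:] if line]
    return preliminary, final

def accuracy(predicted, truth):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(truth):
        raise ValueError("label lists differ in length")
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Tiny illustrative example (real files hold 500 + 1500 labels).
prelim, final = split_solution("A\nB\n\nC\nA\n")
prelim_score = accuracy(prelim, ["A", "C"])
meets_threshold = prelim_score >= 0.95
```

Only the preliminary part can be scored this way by participants, since the true labels of the Test set are not published.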

Keep in mind that the Leaderboard results of other participants can easily be distorted (!) by their authors, intentionally or not (overfitting), because the true classifications are already known. The scores must therefore be interpreted with care: even a preliminary score of 1.000 can easily be obtained (with a poor final score). We strongly encourage all of you to submit preliminary classifications generated with the above procedure, which involves the construction of two separate models, so that submissions truly reflect the performance of your algorithms.

In this challenge, the Leaderboard shows the score of the last submitted solution, so it may change frequently, and the result of a given team may move not only upwards but also downwards. Do not submit more than 10 solutions per day. If you submit more, the excess submissions may be ignored or removed without final evaluation.

Please do not assume that the Test set contains exactly 500 data sets for each substance; that is not the case.

Download

Note: you must be logged in and registered in this challenge in order to download the files.

Algorithm

You are free to use any open-source, GPL-compatible code in your solution. If you use an algorithm written by other people, you must enter a comment before the algorithm in the source code [this algorithm is created by AUTHOR NAME] and provide a link to the web site or publication where it appears.

The winning algorithm must not infringe any patents. The winner must also confirm in writing that no one involved in the development of the algorithm is filing a patent for the algorithm.