Materials Identification Based on Measurements of Passively Emitted Electromagnetic Radiation
A sensor that responds to electromagnetic radiation spontaneously emitted by matter produces a small voltage output, which is acquired by a 24-bit analog-to-digital converter at a sampling rate of 16,384 samples per second. Each data set comes from a 2-second run. Voltage data are acquired from three different materials.
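As a quick check of the acquisition figures above, the following sketch works out what they imply for a single data set. The constant names are illustrative only, and the frequency figures assume uniform sampling over the whole 2-second run.

```python
# Acquisition parameters as stated in the challenge description.
SAMPLING_RATE_HZ = 16_384   # samples per second
RUN_SECONDS = 2             # duration of one run
ADC_BITS = 24               # converter resolution

# Number of voltage readings in one data set.
samples_per_run = SAMPLING_RATE_HZ * RUN_SECONDS

# Number of distinct output codes a 24-bit converter can produce.
adc_levels = 2 ** ADC_BITS

# Highest frequency representable at this sampling rate (Nyquist limit),
# and the spacing between frequency bins for a 2-second record.
nyquist_hz = SAMPLING_RATE_HZ / 2
frequency_resolution_hz = 1 / RUN_SECONDS

print(samples_per_run, adc_levels, nyquist_hz, frequency_resolution_hz)
```

Each data set is therefore a fairly long one-dimensional signal, which is worth keeping in mind when choosing features or models.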
There are essentially two datasets, Training and Test, each containing 1500 samples. Your task is to build a model on the Training data and use it to identify the samples of the Test set. As a solution, you will submit a text file with a list of A/B/C labels, one per line. TunedIT will calculate the recognition accuracy and use it to decide prospective winners. These scores will be kept secret.
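The solution file format described above can be sketched as follows. This is a minimal illustration, not TunedIT's own tooling: the function names `write_solution` and `recognition_accuracy` are hypothetical, and the accuracy formula (fraction of labels that match) is an assumption about how recognition accuracy is computed.

```python
from pathlib import Path

VALID_LABELS = {"A", "B", "C"}

def write_solution(labels, path):
    """Write one A/B/C label per line to a plain text file."""
    invalid = [lab for lab in labels if lab not in VALID_LABELS]
    if invalid:
        raise ValueError(f"labels must be A, B or C, got: {invalid[:5]}")
    Path(path).write_text("\n".join(labels) + "\n", encoding="ascii")

def recognition_accuracy(predicted, truth):
    """Fraction of predicted labels equal to the true labels (assumed metric)."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("cannot score an empty list of labels")
    correct = sum(p == t for p, t in zip(predicted, truth))
    return correct / len(truth)
```

For example, `write_solution(["A", "C", "B"], "solution.txt")` produces a three-line file, and a scoring call on your own held-out data gives a local estimate before you submit.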
To make on-line preliminary evaluation possible and to publish preliminary results on the Leaderboard, so that you have some indication of the quality of your current solution, we created a Preliminary Test set composed of 500 randomly selected samples of the Training set (without removal from the Training set); the remaining part of the Training set will be called Training-1000. Although the classifications of the Preliminary samples can be recovered simply by comparing the Preliminary Test set with the full Training set, we ask you to generate your solution in the following way:
Two scores will be calculated, a preliminary one and a final one, each from its own part of the solution file. Only the preliminary score will be disclosed on the Leaderboard; the final score will be used solely by TunedIT, to inform the prospective winner when the 95% threshold is achieved.
Keep in mind that the Leaderboard results of other participants can easily be distorted (!) by their authors, intentionally or not (overfitting), because the true classifications of the Preliminary samples are already known. The scores must therefore be interpreted with care: even a preliminary score of 1.000 can easily be obtained (with a poor final score). We strongly encourage all of you to submit preliminary classifications generated with the above procedure, which involves the construction of two separate models, so that submissions truly depict the performance of your algorithms.
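The following sketch illustrates why a preliminary score can be misleading. It uses synthetic random data, not the challenge data, and all names (`split_training`, `MemorizingClassifier`) are hypothetical. It also assumes that the two-model approach means fitting one model on Training-1000 to label the Preliminary samples and a separate one on the full Training set to label the Test samples.

```python
import random

def split_training(sample_ids, preliminary_size=500, seed=0):
    """Pick a random Preliminary subset; the rest plays the role of Training-1000."""
    rng = random.Random(seed)
    preliminary = set(rng.sample(list(sample_ids), preliminary_size))
    training_rest = [i for i in sample_ids if i not in preliminary]
    return training_rest, sorted(preliminary)

class MemorizingClassifier:
    """Deliberately naive model: looks samples up, otherwise guesses the majority label."""

    def fit(self, samples, labels):
        self.table = {tuple(s): lab for s, lab in zip(samples, labels)}
        counts = {lab: labels.count(lab) for lab in sorted(set(labels))}
        self.default = max(counts, key=counts.get)
        return self

    def predict(self, samples):
        return [self.table.get(tuple(s), self.default) for s in samples]

def accuracy(predicted, truth):
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)

# Synthetic stand-in for the Training set: random features, random labels.
data_rng = random.Random(42)
samples = [[data_rng.random() for _ in range(4)] for _ in range(1500)]
labels = [data_rng.choice("ABC") for _ in range(1500)]

rest_ids, prelim_ids = split_training(range(1500))
prelim_x = [samples[i] for i in prelim_ids]
prelim_y = [labels[i] for i in prelim_ids]

# Model fitted on the full Training set has already seen every Preliminary sample.
leaky = MemorizingClassifier().fit(samples, labels)
# Model fitted on the Training-1000 part only has not.
honest = MemorizingClassifier().fit(
    [samples[i] for i in rest_ids], [labels[i] for i in rest_ids]
)

leaky_score = accuracy(leaky.predict(prelim_x), prelim_y)    # 1.0 by construction
honest_score = accuracy(honest.predict(prelim_x), prelim_y)  # near chance level here
print(leaky_score, honest_score)
```

The memorizing model learns nothing useful, yet it scores perfectly on samples it has already seen; only the score of the model fitted without the Preliminary samples says anything about performance on new data.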
In this challenge, the Leaderboard shows the score of the last submitted solution, so it may change frequently, and the result of a given team may move not only upwards but also downwards. Do not submit more than 10 solutions per day. If you submit more, the excess submissions may be ignored or removed without final evaluation.
Please do not assume that the randomly chosen data sets in the Test set add up to 500 for each substance; that is not the case.
You are free to use any open-source, GPL-compatible code in your solution. If you use an algorithm written by other people, you must enter a comment before the algorithm in the source code, [this algorithm is created by AUTHOR NAME], and provide the link to the web site or publication where it appears.
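One possible way to place the required attribution is shown below. The function is only a placeholder standing in for a borrowed algorithm, and the author name and link are left as placeholders to be filled in with the real source.

```python
import math

# [this algorithm is created by AUTHOR NAME]
# Source: LINK TO WEB SITE OR PUBLICATION
def root_mean_square(values):
    """Placeholder body standing in for the third-party algorithm."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```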
The winning algorithm must not infringe any patents. The winner must also confirm in writing that no one involved in the development of the algorithm is filing a patent for the algorithm.