SIAM SDM'11 Contest: Prediction of Biological Properties of Molecules from Chemical Structure

Start 2010-11-16 19:00:00 CET
End 2011-01-24 23:00:00 CET
Prize $1,000

Dear Contestants,

The SIAM SDM'11 Contest: Prediction of Biological Properties of Molecules from Chemical Structure has come to an end. Out of 246 registered teams, 114 submitted solutions. The compiled list of final results is shown in the last column of the Leaderboard. The winner of this challenge and recipient of the $1,000 award is Ed Ramsden (EdR). Our sincere congratulations! His model predicted the Final Test set with a Balanced Youden Index of 0.6890. The top three teams are:

  1. Ed Ramsden (EdR): Balanced Youden Index = 0.6890, Sensitivity = 0.7333, Specificity = 0.6890
  2. Yuchun Tang (piaopiao): Balanced Youden Index = 0.6889, Sensitivity = 0.6889, Specificity = 0.7195
  3. Frank Lemke (pat): Balanced Youden Index = 0.6768, Sensitivity = 0.7111, Specificity = 0.6768

Several contestants tied for positions 4 through 9 with Balanced Youden Index = 0.6667. Most of the submitted models performed better than the Baseline method (a simple 1-nearest neighbor classifier).
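
For readers comparing the three figures reported for each team, the short Python sketch below checks one possible reading of the score. It assumes that the Balanced Youden Index equals the smaller of sensitivity and specificity. That is an inference from the three rows listed above, not a definition stated in this announcement, and the function name `balanced_youden_index` is hypothetical.

```python
# Assumption (not stated in the announcement): the Balanced Youden Index
# is taken here to be min(sensitivity, specificity). This reading is
# inferred only from the three reported rows for the top teams.

def balanced_youden_index(sensitivity: float, specificity: float) -> float:
    """Return the smaller of the two rates (assumed definition)."""
    return min(sensitivity, specificity)

# Reported (sensitivity, specificity, index) for the top three teams.
reported = {
    "EdR": (0.7333, 0.6890, 0.6890),
    "piaopiao": (0.6889, 0.7195, 0.6889),
    "pat": (0.7111, 0.6768, 0.6768),
}

# Recompute the index for each team under the assumed definition.
recomputed = {
    team: balanced_youden_index(sens, spec)
    for team, (sens, spec, _) in reported.items()
}

for team, (_, _, index) in reported.items():
    print(team, recomputed[team], "reported:", index)
```

Under this assumed definition, a model scores well only if it identifies both substrates and non-substrates reliably, so a high rate on one class cannot compensate for a low rate on the other.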

The authors of the top solutions will be contacted via e-mail regarding their participation in the SIAM SDM'11 workshop dedicated to this Challenge. SIAM conferences are highly regarded, and these contestants are strongly encouraged to present their work. Everyone interested is encouraged to register for the SIAM SDM'11 conference and attend our workshop on April 30, 2011.

We may now reveal that the studied problem was to classify chemical molecules into substrates (label "S") and non-substrates (label "N") of the CYP 2C19 isoform of the cytochrome P450 enzyme in humans. Predicting whether or not a particular chemical will be metabolized by 2C19, as well as by other major CYP isoforms, is of primary importance to the pharmaceutical industry. The top results of this challenge are consistent with the noise level in the experimental data, estimated at about 30%. Hence, some Preliminary Test statistics submitted during Phase I (before January 10, 2011) that vastly exceeded 0.7 probably indicated overtrained models, finely tuned to that specific test set.

Congratulations to all for taking on this seemingly simple, but truly tough challenge! We would like to acknowledge Simulations Plus for providing sponsorship, TunedIT for excellent organizational and technical support, and the Society for Industrial and Applied Mathematics for providing the workshop venue. Thank you all for participating,

Robert Frączkiewicz, Robert D. Clark, Jinhua Zhang, Marcin Wojnarski, and Joanna Świetlicka
The Organizers

