Re: Solubility data

Posted: Tue Oct 20, 2009 7:45 pm
by jcbradley
Marcin - do you think it will be possible to store and refer to models of this type using TunedIT?
onsc wrote: Right, Marcin. The first column is the variable and the second is the coefficient, exactly as you say. -Andy

Re: Solubility data

Posted: Thu Oct 22, 2009 5:12 pm
by Marcin
Hi,
I've implemented your solubility model in Java, using the Weka architecture: the class SolubilityModel inherits from Weka's Classifier. The code essentially mimics the table printed at http://onschallenge.wikispaces.com/SolubilityModel003, so it's pretty straightforward. It assumes that each data sample (instance) has the same structure (number and meaning of attributes) as the file onsc/ONSDataNumeric.arff. Below is the main part of the code, excluding several lines of headers:

Code:
public class SolubilityModel extends Classifier {

   public double classifyInstance(Instance instance) throws Exception
   {
      // attributes are counted from zero! 2 means the 3rd attribute, solute_amr
      double AMR = instance.value(2);                // solute_amr
      double Kier1 = instance.value(19);             // solute_kier1
      double XLogP = instance.value(23);             // solute_xlogp
      double BCUTc1 = instance.value(5);             // solute_bcutcl
      double ATSc1 = instance.value(11);             // solute_atsc1
      double apol = instance.value(9);               // solute_apol
      double bpol = instance.value(16);              // solute_bpol
      double Solvent_DC = instance.value(25);        // solvent_dielectricconstant
      double Solvent_BCUTw1 = instance.value(29);    // solvent_bcutwl
      double Solvent_TopoPSA = instance.value(48);   // solvent_topopsa
      
      double prediction =
                                       (-133.454 )
         + 1/AMR                     * ( -31.4476)
         + 1/Kier1                   * (  36.6713)
         + XLogP                     * (   1.1049)
         + BCUTc1                    * (  -7.4035)
         + ATSc1                     * (   6.0309)
         + apol                      * (  -0.1731)
         + apol * Solvent_DC         * (  -0.0053)
         + bpol * Solvent_DC         * (   0.0065)
         + Solvent_BCUTw1            * (  11.6778)
         + Solvent_TopoPSA           * (   0.1071)
         ;
      return prediction;
   }
}

The full code, source and compiled, is uploaded to the Repository as Marcin/SolubilityModel.jar. Feel free to download it and upload it to your own Repository folder.
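
For anyone who wants to try the arithmetic without Weka, here is a minimal standalone sketch of the same linear formula. The coefficients are copied from the code above. The class name SolubilityFormula, the parameter names and the zero check on AMR and Kier1 are assumptions made for illustration; they are not part of the uploaded SolubilityModel.jar.

```java
// Standalone sketch of the linear model above, without the Weka dependency.
// SolubilityFormula and its parameter names are hypothetical (illustration only).
final class SolubilityFormula {

    private SolubilityFormula() {
        // static helper, not meant to be instantiated
    }

    // Each argument corresponds to one attribute read via instance.value(...)
    // in the Weka version; here the caller passes the values directly.
    static double predict(double amr, double kier1, double xlogp, double bcutc1,
                          double atsc1, double apol, double bpol,
                          double solventDc, double solventBcutw1,
                          double solventTopoPsa) {
        // Assumption: reject zeros, because 1/AMR and 1/Kier1 would be infinite.
        if (amr == 0.0 || kier1 == 0.0) {
            throw new IllegalArgumentException("AMR and Kier1 must be non-zero");
        }
        return -133.454
             + (1.0 / amr)      * -31.4476
             + (1.0 / kier1)    *  36.6713
             + xlogp            *   1.1049
             + bcutc1           *  -7.4035
             + atsc1            *   6.0309
             + apol             *  -0.1731
             + apol * solventDc *  -0.0053   // solute/solvent interaction term
             + bpol * solventDc *   0.0065   // solute/solvent interaction term
             + solventBcutw1    *  11.6778
             + solventTopoPsa   *   0.1071;
    }
}
```

A call such as SolubilityFormula.predict(amr, kier1, xlogp, ...) takes the ten descriptor values in the order listed in the signature. Passing the values by name rather than by column index keeps the sketch independent of the attribute order in the ARFF file, at the cost of a long parameter list.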

So now you can see what the code looks like and you can make modifications whenever you come up with a better model :-)
Regards, Marcin

Re: Solubility data

Posted: Thu Oct 22, 2009 5:17 pm
by jcbradley
Thanks, Marcin! Andy and I will look at it in detail.

Re: Solubility data

Posted: Thu Oct 22, 2009 5:21 pm
by Marcin
I've also run a couple of tests of SolubilityModel using TunedTester. You can view all the results here. Your model performs pretty well! If you want to run more tests, these are the settings that must be given in TunedTester:

Evaluation procedure: TunedIT/base/RegressionTT70.jar:org.tunedit.base.RegressionTT70
Algorithm: Marcin/SolubilityModel.jar:onsc.SolubilityModel
Dataset: onsc/ONSDataNumeric.arff

Re: Solubility data

Posted: Sat Nov 19, 2011 6:19 am
by maallin
Great interaction here. I am sure this is going to help a lot of people who are working with and learning about solubility data. This post was really informative, and after reading it I am sure that the webservice is going to be just what people like us need to refer to. Hope everything goes well and as planned.