Jeffrey Plante, Stephane Werner.
Abstract
The partition coefficient between octanol and water (logP) has been an important descriptor in QSAR predictions for many years, and the prediction of logP has therefore been examined countless times. One of the best-performing approaches is to predict logP using multiple methods and average the results. We have used those averaged predictions to develop a training set that distils the information present across the disparate logP methods into a single model. Our model was built using extendable atom types, where each atom is distilled down into a 6-digit number, and each individual atom is assumed to have a small additive effect on the overall logP of the molecule. Beyond the simple coefficient model, a consensus model is evaluated, which uses known compounds as a starting point in the calculation and modifies the experimental logP using the same coefficients as in the first model. We then test the performance of our models against two different datasets: one against which many different models routinely perform well, and another designed to better represent pharmaceutical space. The true strength of the model is shown on the pharmaceutical benchmark set, where both models perform better than any previously developed model.
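To make the two models in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. The 6-digit atom-type codes and their coefficients are placeholders invented for illustration, and the form of the library correction (experimental logP of a reference compound plus the coefficient difference between query and reference) is an assumption based only on the description above.

```python
from collections import Counter

# Placeholder 6-digit atom-type codes with made-up coefficients.
# These are illustrative assumptions, NOT values from the paper.
COEFFS = {111111: 0.5, 222222: -0.25, 333333: 1.0}


def logp_coeff(atom_types):
    """Additive model: one coefficient per typed atom, summed over the molecule."""
    return sum(COEFFS[t] for t in atom_types)


def logp_library(query_types, ref_types, ref_exp_logp):
    """Correct a reference compound's experimental logP by the coefficient
    difference between query and reference (assumed form of the correction)."""
    q, r = Counter(query_types), Counter(ref_types)
    delta = sum(COEFFS[t] * (q[t] - r[t]) for t in set(q) | set(r))
    return ref_exp_logp + delta


query = [111111, 111111, 222222, 333333]
reference = [111111, 222222, 333333]  # differs from the query by one atom

print(logp_coeff(query))                    # 1.75
print(logp_library(query, reference, 1.5))  # 2.0
```

In this sketch the library variant only pays for the atoms that differ between query and reference, so any error in the coefficients of the shared atoms cancels out.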
Year: 2018 PMID: 30552535 PMCID: PMC6755606 DOI: 10.1186/s13321-018-0316-5
Source DB: PubMed Journal: J Cheminform ISSN: 1758-2946 Impact factor: 5.514
Fig. 1 Random sampling versus targeted sampling. With a minimum learning requirement of 3 examples, random sampling is able to use only 7 columns, while targeted sampling is able to learn on 18 columns
Fig. 2 Number format of the identifier for each atom in the atom-typer
Fig. 3 Special codes for the local environment around the atom being typed
An example calculation showing both JPlogP-coeff and JPlogP-library
RMSE values for the grid search over the maximum number of matches to use (rows) and the minimum similarity threshold (columns)
| Maximum examples | 0.8 | 0.75 | 0.7 | 0.6 | 0.5 |
|---|---|---|---|---|---|
| 1 | 0.65 | 0.647 | 0.659 | 0.704 | 0.731 |
| 2 | 0.644 | 0.639 | 0.651 | 0.663 | 0.687 |
| 3 | 0.642 | 0.636 | 0.645 | 0.656 | 0.678 |
| 4 | 0.64 | 0.634 | 0.644 | 0.654 | 0.677 |
| 5 | 0.639 | 0.633 | 0.643 | 0.651 | 0.672 |
| 6 | 0.639 | 0.633 | 0.642 | 0.649 | 0.67 |
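As a quick check on how to read the grid, the short Python sketch below transcribes the RMSE values from the table and picks out the lowest cell. The variable names are illustrative; only the numbers come from the table.

```python
# RMSE values transcribed from the grid-search table above.
thresholds = [0.8, 0.75, 0.7, 0.6, 0.5]
rmse_by_max_examples = {
    1: [0.65, 0.647, 0.659, 0.704, 0.731],
    2: [0.644, 0.639, 0.651, 0.663, 0.687],
    3: [0.642, 0.636, 0.645, 0.656, 0.678],
    4: [0.64, 0.634, 0.644, 0.654, 0.677],
    5: [0.639, 0.633, 0.643, 0.651, 0.672],
    6: [0.639, 0.633, 0.642, 0.649, 0.67],
}

# Flatten to {(max_examples, threshold): rmse} and find every cell at the minimum.
cells = {
    (n, t): rmse
    for n, row in rmse_by_max_examples.items()
    for t, rmse in zip(thresholds, row)
}
best_rmse = min(cells.values())
best_cells = sorted(k for k, v in cells.items() if v == best_rmse)

print(best_rmse, best_cells)  # 0.633 [(5, 0.75), (6, 0.75)]
```

The minimum is shared by 5 and 6 examples at a threshold of 0.75, and the improvement from adding further examples beyond 3 is in the third decimal place.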
Performance of different logP methods against the Martel dataset
| Predictor | RMSE | Abs. err. 0–0.5 (%) | Abs. err. 0.5–1 (%) | Abs. err. 1–1.5 (%) | Abs. err. 1.5–2 (%) | Abs. err. ≥ 2 (%) |
|---|---|---|---|---|---|---|
| JPlogP-coeff | 1.04 | 39.7 | 31.5 | 14.7 | 7.3 | 6.6 |
| JPlogP-library | 1.05 | 38.6 | 32.4 | 14.9 | 7.4 | 6.7 |
| LogP4Average | 1.12 | 33.9 | 29.4 | 19.2 | 10.8 | 6.6 |
| XlogP3-AA | 1.16 | 33.0 | 27.9 | 20.7 | 10.5 | 8.1 |
| AAM | 1.18 | 34.2 | 27.6 | 17.5 | 11.7 | 8.9 |
| Molinspiration | 1.21 | 32.7 | 26.3 | 20.1 | 11.7 | 9.2 |
| SlogP | 1.24 | 32.4 | 27.6 | 19.8 | 10.7 | 9.5 |
| ALogP (Vega) | 1.24 | 34.4 | 27.4 | 17.5 | 9.8 | 10.9 |
| Biobyte CLogP | 1.24 | 32.0 | 31.3 | 16.8 | 9.1 | 10.9 |
| XLogP2 | 1.28 | 31.8 | 24.5 | 19.9 | 11.6 | 12.2 |
| AlogPS logP | 1.29 | 25.3 | 28.3 | 20.1 | 16.3 | 10.0 |
| ACD | 1.39 | 33.1 | 25.2 | 17.5 | 10.6 | 13.6 |
| KowWIN | 1.42 | 32.0 | 23.2 | 18.2 | 11.9 | 14.7 |
| Meylan (Vega) | 1.66 | 25.7 | 23.2 | 17.1 | 14.3 | 19.7 |
| Mannhold LogP | 1.71 | 14.1 | 16.3 | 23.2 | 20.9 | 25.5 |
| MLogP (Vega) | 1.95 | 12.3 | 14.7 | 17.7 | 19.5 | 35.8 |
| AlogP (CDK) | 3.72 | 1.6 | 2.5 | 5.0 | 9.1 | 81.9 |
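The two kinds of statistic reported in this table can be computed as in the Python sketch below. It is illustrative only: the function names are hypothetical, the handling of bin boundaries (each bin includes its lower edge) is an assumption, and the toy predictions are made up rather than taken from either benchmark.

```python
import math

BIN_LABELS = ["0-0.5", "0.5-1", "1-1.5", "1.5-2", ">=2"]


def rmse(predicted, experimental):
    """Root-mean-square error between predicted and experimental logP."""
    sq = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return math.sqrt(sum(sq) / len(sq))


def binned_abs_errors(predicted, experimental):
    """Percentage of compounds whose absolute error falls in each 0.5-wide bin.
    Assumption: a bin includes its lower edge; errors of 2 or more share the last bin."""
    counts = [0] * len(BIN_LABELS)
    for p, e in zip(predicted, experimental):
        idx = min(int(abs(p - e) // 0.5), len(BIN_LABELS) - 1)
        counts[idx] += 1
    n = len(predicted)
    return {label: 100.0 * c / n for label, c in zip(BIN_LABELS, counts)}


# Made-up toy values, for illustration only.
experimental = [1.0, 2.0, 3.0, 4.0]
predicted = [1.25, 2.75, 4.25, 6.0]

print(rmse(predicted, experimental))
print(binned_abs_errors(predicted, experimental))
```

Reading the two together is useful because RMSE is dominated by the few large errors in the last bin, while the binned percentages show how many compounds are predicted to within half a log unit.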
Performance of different logP methods against the Avdeef dataset
| Predictor | RMSE | Abs. err. < 0.5 (%) | Abs. err. 0.5–1 (%) | Abs. err. 1–1.5 (%) | Abs. err. 1.5–2 (%) | Abs. err. ≥ 2 (%) |
|---|---|---|---|---|---|---|
| JPlogP-library | 0.63 | 68.91 | 20.6 | 7.12 | 2.25 | 1.12 |
| LogP4Average | 0.65 | 71.91 | 16.11 | 7.49 | 3.37 | 1.12 |
| AlogP (Vega) | 0.65 | 70.04 | 18.35 | 6.74 | 3 | 1.87 |
| Biobyte CLogP | 0.76 | 70.41 | 17.23 | 4.12 | 3.75 | 4.49 |
| XlogP3-AA | 0.77 | 69.29 | 15.36 | 8.24 | 3.75 | 3.37 |
| SlogP | 0.79 | 49.06 | 34.46 | 10.49 | 4.12 | 1.87 |
| Molinspiration | 0.80 | 63.30 | 20.23 | 10.49 | 3.37 | 2.62 |
| JPlogP-coeff | 0.81 | 47.94 | 32.58 | 13.48 | 4.49 | 1.5 |
| ACD | 0.83 | 68.17 | 19.10 | 8.24 | 1.87 | 2.62 |
| KowWIN | 0.84 | 73.78 | 14.97 | 5.99 | 2.25 | 3.00 |
| MlogP (Vega) | 0.85 | 67.04 | 16.85 | 6.74 | 5.24 | 4.12 |
| AlogPS logP | 0.86 | 66.29 | 23.60 | 7.12 | 2.25 | 0.75 |
| Meylan (Vega) | 0.89 | 65.54 | 15.73 | 9.74 | 4.49 | 4.49 |
| XLogP2 | 1.05 | 56.93 | 20.22 | 8.99 | 7.12 | 6.74 |
| Mannhold LogP | 1.43 | 26.22 | 24.72 | 20.97 | 13.86 | 14.23 |
| AAM | 1.62 | 21.35 | 23.97 | 18.73 | 12.73 | 23.22 |
| AlogP (CDK) | 2.57 | 7.87 | 10.49 | 19.1 | 14.61 | 47.94 |