Kenneth Lopez, Silvana Pinheiro, William J Zamora.
Abstract
A multiple linear regression model called MLR-3 is used to predict the experimental n-octanol/water partition coefficient (log PN) of 22 N-sulfonamides proposed by the organizers of the SAMPL7 blind challenge. The MLR-3 model was trained with 82 molecules, including drug-like sulfonamides and small organic molecules that resemble the main functional groups present in the challenge dataset. Our model, submitted as "TFE-MLR", gave a root-mean-square error of 0.58 and a mean absolute error of 0.41 in log P units, the highest accuracy among empirical methods and among all ranked submissions. Overall, the results support the appropriateness of the multiple linear regression approach MLR-3 for computing the n-octanol/water partition coefficient of sulfonamide-bearing compounds. In this context, the outstanding performance of empirical methodologies, where 75% of the ranked submissions achieved root-mean-square errors < 1 log P units, supports the suitability of these strategies for obtaining accurate and fast predictions of physicochemical properties such as partition coefficients of bioorganic compounds.
Keywords: Empirical methods; Multiple linear regression; N-sulfonamides; SAMPL7 blind challenge; n-Octanol/water partition coefficients
Year: 2021 PMID: 34251523 PMCID: PMC8273033 DOI: 10.1007/s10822-021-00409-2
Source DB: PubMed Journal: J Comput Aided Mol Des ISSN: 0920-654X Impact factor: 3.686
Fig. 1 Structures of the 22 N-sulfonamides in the SAMPL7 log PN challenge dataset
Fig. 2 Representation of (a) some small molecules of the training set, which resemble the main functional groups in (b) molecules of the SAMPL7 dataset
List of descriptors used in the present study and their coefficient of determination (R²) against experimental log PN values for the training set
| Descriptor | Definition | R² |
|---|---|---|
| 1. RNH2 | Count of primary amine groups | 0.11 |
| 2. R2NH | Count of secondary amine groups | 0.06 |
| 3. R3N | Count of tertiary amine groups | 0.15 |
| 4. ROPO3 | Count of phosphate groups | 0.00 |
| 5. ROH | Count of alcohol groups | 0.03 |
| 6. RCHO | Count of aldehyde groups | 0.00 |
| 7. RCOR | Count of ketone groups | 0.00 |
| 8. RCOOH | Count of carboxylic acid groups | 0.00 |
| 9. RCOOR | Count of ester groups | 0.06 |
| 10. ROR | Count of ether groups | 0.03 |
| 11. RSO2NR | Count of sulfonamide groups | 0.03 |
| 12. RSR | Count of thioether groups | 0.00 |
| 13. RF | Count of fluoroalkyl groups | 0.13 |
| 14. RCl | Count of chloroalkyl groups | 0.01 |
| 15. RBr | Count of bromoalkyl groups | 0.01 |
| 16. RSO2R | Count of sulfone groups | 0.00 |
| 17. C | Count of carbon atoms | 0.50 |
| 18. RINGS | Count of rings (aliphatic and aromatic) | 0.30 |
| 19. AROMATIC | Count of aromatic rings | 0.34 |
| 20. HBA1 | Count of hydrogen bond acceptors considering acceptor sites, i.e., the sum of lone pairs on the acceptor atoms | 0.11 |
| 21. HBA2 | Count of hydrogen bond acceptors considering acceptor counts, i.e., the sum of acceptor atoms | 0.10 |
| 22. HBD | Count of hydrogen bond donor atoms | 0.02 |
| 23. PSA | Polar surface area in Å² | 0.05 |
| 24. MR | Molar refractivity in cm³/mol | 0.41 |
Statistical parameters of MLR approaches for predicting experimental log PN values for the training set (n = 82)ᵃ
| Model | R² | R²adj | RMSE | s | F | p-value |
|---|---|---|---|---|---|---|
| MLR-1 | 0.79 | 0.73 | 0.72 | 0.83 | 12.6 | 1.02 × 10⁻¹⁴ |
| MLR-2 | 0.82 | 0.75 | 0.68 | 0.80 | 12.2 | 1.30 × 10⁻¹⁴ |
| MLR-3 | 0.84 | 0.77 | 0.64 | 0.77 | 12.3 | 9.00 × 10⁻¹⁵ |
ᵃ R², coefficient of determination; R²adj, adjusted coefficient of determination; RMSE, root-mean-square error in log P units; s, residual standard error; F, Fisher ratio; p-value, statistical p value
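The training-set statistics are linked by standard regression identities, so they can be cross-checked against each other. The sketch below does this for the MLR-3 row. It assumes that all 24 listed descriptors enter the model alongside an intercept; the descriptor count per model is not stated in this record, so `p_assumed` is a hypothetical input and not a value reported by the authors.

```python
from math import sqrt


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R² for a model with p descriptors plus an intercept, fitted to n molecules."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def fisher_ratio(r2: float, n: int, p: int) -> float:
    """F statistic: explained variance per descriptor over residual variance per degree of freedom."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


def residual_standard_error(rmse: float, n: int, p: int) -> float:
    """RMSE divides the squared-error sum by n; s divides it by the n - p - 1 residual degrees of freedom."""
    return rmse * sqrt(n / (n - p - 1))


n_train = 82     # training-set size given in the table caption
p_assumed = 24   # ASSUMPTION: every listed descriptor is used by the model
r2_mlr3 = 0.84   # R² reported for MLR-3
rmse_mlr3 = 0.64 # RMSE reported for MLR-3

adj = adjusted_r2(r2_mlr3, n_train, p_assumed)
f_value = fisher_ratio(r2_mlr3, n_train, p_assumed)
s_value = residual_standard_error(rmse_mlr3, n_train, p_assumed)

print(f"adjusted R2 = {adj:.2f}, F = {f_value:.1f}, s = {s_value:.2f}")
```

Under this assumption the adjusted R² and s both round to the tabulated 0.77. F comes out slightly above the tabulated 12.3, which may reflect rounding of R² or a different number of descriptors in the actual model.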
Fig. 3 Structures and experimental log PN of 5 biologically active sulfonamide-bearing drugs chosen as the prediction set
Statistical parameters of the comparison between experimental and predicted log PN values for the test set using the 3 MLR approaches and other common approaches
| Method | MSEᵃ | MUEᵃ | RMSE | R² |
|---|---|---|---|---|
| | -0.04 | 0.51 | 0.66 | 0.72 |
| | 0.00 | 0.31 | 0.40 | 0.90 |
| ChemAxonᶜ | 0.05 | 0.24 | 0.28 | 0.98 |
| VLifeMDSᵈ | -0.73 | 0.90 | 0.97 | 0.72 |
| DataWarriorᵉ | -0.20 | 0.74 | 0.90 | 0.72 |
ᵃ MSE, mean signed error; MUE, mean unsigned error
ᵇ The bolded row represents the submitted approach
ᶜ Ref. [29]
ᵈ Ref. [15]
ᵉ Ref. [30]
Fig. 4 Comparison between experimental and predicted n-octanol/water log PN using the MLR-3 model for the training (blue) and test (orange) sets
Calculated (submission ID "TFE MLR") and experimental n-octanol/water partition coefficients (log PN) determined for the 22 sulfonamides included in the SAMPL7 dataset
| Compound | Calculated | Experimental | Δlog PN |
|---|---|---|---|
| SM25 | 2.35 | 2.67 | − 0.32 |
| SM26 | 1.19 | 1.04 | + 0.15 |
| SM27 | 1.47 | 1.56 | − 0.09 |
| SM28 | 1.87 | 1.18 | + 0.69 |
| SM29 | 1.47 | 1.61 | − 0.14 |
| SM30 | 2.74 | 2.76 | − 0.02 |
| SM31 | 1.55 | 1.96 | − 0.41 |
| SM32 | 1.98 | 2.44 | − 0.46 |
| SM33 | 3.25 | 2.96 | + 0.29 |
| SM34 | 2.06 | 2.83 | − 0.77 |
| SM35 | 1.37 | 0.88 | + 0.49 |
| SM36 | 2.64 | 0.76 | **+ 1.88** |
| SM37 | 1.45 | 1.45 | + 0.00 |
| SM38 | 0.94 | 1.03 | − 0.09 |
| SM39 | 2.21 | 1.89 | + 0.32 |
| SM40 | 1.01 | 1.83 | − 0.82 |
| SM41 | 1.45 | 0.58 | + 0.87 |
| SM42 | 1.58 | 1.76 | − 0.18 |
| SM43 | 0.38 | 0.85 | − 0.47 |
| SM44 | 1.39 | 1.16 | + 0.23 |
| SM45 | 2.66 | 2.55 | + 0.11 |
| SM46 | 1.46 | 1.72 | − 0.26 |
| RMSE | 0.58 | | |
| MUE | 0.41 | | |
| MSE | 0.05 | | |
ᵃ Bold value represents the compound with the largest deviation between the calculated and experimental values
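The summary rows of this table follow directly from the per-compound values. As a minimal sketch, the snippet below recomputes the mean signed error (MSE), mean unsigned error (MUE) and RMSE from the calculated and experimental columns, taking the error as calculated minus experimental, which is the sign convention the Δlog column follows. The variable names are illustrative only.

```python
from math import sqrt

# (compound, calculated log P, experimental log P), copied from the table above
data = [
    ("SM25", 2.35, 2.67), ("SM26", 1.19, 1.04), ("SM27", 1.47, 1.56),
    ("SM28", 1.87, 1.18), ("SM29", 1.47, 1.61), ("SM30", 2.74, 2.76),
    ("SM31", 1.55, 1.96), ("SM32", 1.98, 2.44), ("SM33", 3.25, 2.96),
    ("SM34", 2.06, 2.83), ("SM35", 1.37, 0.88), ("SM36", 2.64, 0.76),
    ("SM37", 1.45, 1.45), ("SM38", 0.94, 1.03), ("SM39", 2.21, 1.89),
    ("SM40", 1.01, 1.83), ("SM41", 1.45, 0.58), ("SM42", 1.58, 1.76),
    ("SM43", 0.38, 0.85), ("SM44", 1.39, 1.16), ("SM45", 2.66, 2.55),
    ("SM46", 1.46, 1.72),
]

# Signed error per compound: calculated minus experimental
errors = [calc - exp for _, calc, exp in data]
n = len(errors)

mse = sum(errors) / n                       # mean signed error
mue = sum(abs(e) for e in errors) / n       # mean unsigned error
rmse = sqrt(sum(e * e for e in errors) / n) # root-mean-square error

# Compound with the largest absolute deviation
worst = max(data, key=lambda row: abs(row[1] - row[2]))

print(f"RMSE = {rmse:.2f}, MUE = {mue:.2f}, MSE = {mse:.2f}")
print(f"largest deviation: {worst[0]} ({worst[1] - worst[2]:+.2f})")
```

The recomputed values round to the tabulated RMSE of 0.58, MUE of 0.41 and MSE of 0.05, and the largest deviation falls on SM36 (2.64 − 0.76 = +1.88).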
Statistical parameters of the comparison between experimental and predicted log PN values for the 22 N-sulfonamides in the SAMPL7 challenge dataset using the 3 MLR approaches
| Method | MSE | MUE | RMSE | R² |
|---|---|---|---|---|
| MLR-1 | 0.03 | 0.62 | 0.73 | 0.13 |
| MLR-2 | 0.12 | 0.51 | 0.66 | 0.24 |
ᵃ The bolded row represents the submitted approach
Fig. 5 Comparison between experimental n-octanol/water log PN values and those predicted by the multiple linear regression method for the SAMPL7 dataset. The red point marks the outlier found with our method. Statistical analyses are shown for all compounds (top left) and after exclusion of SM36 (bottom right)
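The effect of excluding SM36 can be explored from the tabulated SAMPL7 values. The sketch below compares RMSE and R² for all 22 compounds and for the 21 that remain without SM36. It takes R² as the squared Pearson correlation between calculated and experimental values, which is an assumption about how the statistic in the figure was obtained; the helper names are hypothetical.

```python
from math import sqrt

# (compound, calculated log P, experimental log P) for the SAMPL7 dataset
data = [
    ("SM25", 2.35, 2.67), ("SM26", 1.19, 1.04), ("SM27", 1.47, 1.56),
    ("SM28", 1.87, 1.18), ("SM29", 1.47, 1.61), ("SM30", 2.74, 2.76),
    ("SM31", 1.55, 1.96), ("SM32", 1.98, 2.44), ("SM33", 3.25, 2.96),
    ("SM34", 2.06, 2.83), ("SM35", 1.37, 0.88), ("SM36", 2.64, 0.76),
    ("SM37", 1.45, 1.45), ("SM38", 0.94, 1.03), ("SM39", 2.21, 1.89),
    ("SM40", 1.01, 1.83), ("SM41", 1.45, 0.58), ("SM42", 1.58, 1.76),
    ("SM43", 0.38, 0.85), ("SM44", 1.39, 1.16), ("SM45", 2.66, 2.55),
    ("SM46", 1.46, 1.72),
]


def rmse(rows):
    """Root-mean-square error of calculated versus experimental values."""
    return sqrt(sum((c - e) ** 2 for _, c, e in rows) / len(rows))


def pearson_r2(rows):
    """Squared Pearson correlation (ASSUMED definition of R² here)."""
    n = len(rows)
    mean_c = sum(c for _, c, _ in rows) / n
    mean_e = sum(e for _, _, e in rows) / n
    s_ce = sum((c - mean_c) * (e - mean_e) for _, c, e in rows)
    s_cc = sum((c - mean_c) ** 2 for _, c, _ in rows)
    s_ee = sum((e - mean_e) ** 2 for _, _, e in rows)
    return s_ce ** 2 / (s_cc * s_ee)


without_sm36 = [row for row in data if row[0] != "SM36"]

rmse_all, r2_all = rmse(data), pearson_r2(data)
rmse_excl, r2_excl = rmse(without_sm36), pearson_r2(without_sm36)

print(f"all compounds: n = {len(data)}, RMSE = {rmse_all:.2f}, R2 = {r2_all:.2f}")
print(f"without SM36:  n = {len(without_sm36)}, RMSE = {rmse_excl:.2f}, R2 = {r2_excl:.2f}")
```

Removing the single outlier lowers the RMSE and raises R², which matches the reason the figure reports both sets of statistics.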
Fig. 6 Difference between the experimental log PN values of SAMPL7 phenyl/methyl N-sulfonamide analogs. Δlog P is the difference log P(phenyl analog) − log P(methyl analog)
Fig. 7 Comparison between experimental and predicted n-octanol/water log PN using the MLR-3 model for 149 (top) and 147 (bottom) molecules from the training (blue), test (orange), SAMPL7 (light blue), and DB40 (unfilled dots) sets. In the bottom graph, two DrugBank values that lacked a stated source were omitted, and two experimental values were replaced by values from confirmed experimental sources. Red points mark the outliers found in both sets using our MLR-3 method (meloxicam and SM36 show the same deviation)