Abstract
The prediction of drug solubility usually draws on several open-source or commercially available computer programs across the various calculation steps. Popular statistics for indicating the strength of a prediction model include the coefficient of determination (r²), Pearson's linear correlation coefficient (rPearson), and the root-mean-square error (RMSE), among many others. Different programs may calculate these statistics using slightly different definitions. This commentary briefly reviews the definitions of three types of r² and RMSE statistics (model validation, bias compensation, and Pearson) and how systematic errors due to shortcomings in solubility prediction models can be indicated differently depending on the choice of statistical indices. The indices we have employed in recently published papers on the prediction of solubility of druglike molecules were unclear, especially in cases of drugs from 'beyond the Rule of 5' chemical space, as simple prediction models showed a distinctive 'bias-tilt' type of systematic scatter.
Keywords: coefficient of determination; linear correlation coefficient; linear regression; root-mean-square error
Year: 2020 PMID: 35299878 PMCID: PMC8923304 DOI: 10.5599/admet.888
Source DB: PubMed Journal: ADMET DMPK ISSN: 1848-7718
Figure 1. Correlation plots (pS0 = -log S0) showing three distinct definitions of the coefficient of determination (val = model validation, bias = bias compensation, and Pearson), illustrated by simulated data (squares) containing random and systematic errors. The statistics arising from the Pearson case place the prediction in the most favorable light (with RMSE referring to the experimental random error scatter about the green dash-dot curves). Those of the val case refer to model validation (with RMSE referring to the data scatter about the solid black 'identity' diagonal lines). The dashed red lines correspond to the intermediate bias case.
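The three cases in the caption differ only in the reference line about which residuals are measured. The following is a minimal sketch of that idea, not the commentary's own code: the function name `solubility_fit_stats`, the sign convention for bias (observed minus calculated), and the divisors n, n-1, and n-2 used for the three RMSE values are all assumptions made for illustration, and the synthetic data are hypothetical.

```python
import numpy as np

def solubility_fit_stats(obs, calc):
    """Return r2, RMSE, and bias under the val, bias, and Pearson conventions.

    Assumed definitions (not taken from the source): every r2 is
    1 - SS_res / SS_tot, with SS_res measured about a different line.
    """
    obs = np.asarray(obs, dtype=float)
    calc = np.asarray(calc, dtype=float)
    n = obs.size
    ss_tot = np.sum((obs - obs.mean()) ** 2)

    # val: residuals about the identity line (obs = calc).
    resid = obs - calc
    ss_val = np.sum(resid ** 2)

    # bias: residuals about the unit-slope line shifted by the mean error.
    bias = resid.mean()  # sign convention is an assumption
    ss_bias = np.sum((resid - bias) ** 2)

    # Pearson: residuals about the least-squares line obs = a + b * calc.
    b, a = np.polyfit(calc, obs, 1)
    ss_fit = np.sum((obs - (a + b * calc)) ** 2)

    return {
        "r2_val": 1 - ss_val / ss_tot,
        "r2_bias": 1 - ss_bias / ss_tot,
        "r2_pearson": 1 - ss_fit / ss_tot,
        "rmse_val": np.sqrt(ss_val / n),             # assumed divisor: n
        "rmse_bias": np.sqrt(ss_bias / (n - 1)),     # assumed divisor: n - 1
        "rmse_pearson": np.sqrt(ss_fit / (n - 2)),   # assumed divisor: n - 2
        "bias": bias,
    }

# Hypothetical pS0 data with a tilt, an offset, and random noise.
rng = np.random.default_rng(0)
calc = np.linspace(1.0, 8.0, 40)
obs = 0.7 * calc + 1.0 + rng.normal(0.0, 0.5, calc.size)
stats = solubility_fit_stats(obs, calc)
```

Because the least-squares line minimizes the residual sum of squares, and subtracting the mean error can only reduce it relative to the identity line, these definitions give r2_pearson ≥ r2_bias ≥ r2_val for any data set. The RMSE values need not follow the same strict order once different divisors are used.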
Recalculated statistics for the scatter plots in Ref. [7]
| Type | Fig. in | r² Pearson | r² bias | r² val | RMSE Pearson | RMSE bias | RMSE val | bias |
|---|---|---|---|---|---|---|---|---|
| GSE, acids | | 0.62 | 0.61 | 0.58 | 1.21 | 1.24 | 1.27 | -0.29 |
| GSE, bases | | 0.60 | 0.57 | 0.56 | 1.16 | 1.21 | 1.21 | -0.14 |
| GSE, neutrals | | 0.61 | 0.54 | 0.54 | 1.05 | 1.15 | 1.18 | -0.30 |
| GSE, zwitterions | | 0.24 | 0.07 | 0.02 | 1.38 | 1.54 | 1.57 | 0.34 |
| ABSOLV, acids | | 0.66 | 0.66 | 0.65 | 1.14 | 1.15 | 1.16 | -0.15 |
| ABSOLV, bases | | 0.64 | 0.64 | 0.62 | 1.10 | 1.10 | 1.13 | -0.28 |
| ABSOLV, neutrals | | 0.61 | 0.61 | 0.61 | 1.05 | 1.05 | 1.05 | -0.11 |
| ABSOLV, zwitterions | | 0.68 | 0.68 | 0.67 | 0.90 | 0.90 | 0.92 | -0.20 |
| RFR | | 0.98 | 0.98 | 0.98 | 0.28 | 0.28 | 0.28 | 0.00 |
| RFR | | 0.90 | 0.89 | 0.90 | 0.60 | 0.60 | 0.60 | -0.02 |
| RFR, zwitterions | | 0.91 | 0.91 | 0.91 | 0.45 | 0.45 | 0.45 | 0.01 |
| GSE, Test Set 1 | | 0.78 | 0.78 | 0.73 | 0.97 | 0.97 | 1.01 | -0.41 |
| GSE, Test Set 2 | | 0.45 | 0.26 | 0.07 | 1.07 | 1.23 | 1.34 | -0.61 |
| GSE, Test Set 3 | | 0.46 | 0.26 | 0.20 | 0.94 | 1.10 | 1.13 | -0.31 |
| GSE, Test Set 4 | | 0.69 | 0.69 | 0.68 | 1.23 | 1.24 | 1.25 | -0.08 |
| ABSOLV, Test Set 1 | | 0.77 | 0.69 | 0.58 | 0.98 | 1.15 | 1.27 | -0.65 |
| ABSOLV, Test Set 2 | | 0.55 | 0.55 | 0.35 | 0.98 | 0.98 | 1.13 | -0.62 |
| ABSOLV, Test Set 3 | | 0.47 | 0.36 | 0.26 | 0.94 | 1.02 | 1.10 | -0.41 |
| ABSOLV, Test Set 4 | | 0.72 | 0.72 | 0.70 | 1.18 | 1.18 | 1.18 | -0.29 |
| RFR, Test Set 1 | | 0.90 | 0.83 | 0.82 | 0.66 | 0.84 | 0.83 | -0.23 |
| RFR, Test Set 2 | | 0.66 | 0.66 | | | 0.85 | 0.92 | -0.41 |
| RFR, Test Set 3 | | 0.66 | 0.66 | 0.64 | 0.74 | 0.75 | 0.76 | -0.18 |
| RFR, Test Set 4 | | 0.82 | 0.77 | 0.71 | 0.95 | 1.05 | 1.15 | -0.54 |
| GSE, Test Set 1 | | 0.91 | 0.90 | 0.89 | 0.62 | 0.66 | 0.66 | 0.02 |
a GSE = General Solubility Equation; ABSOLV = Abraham Solvation Equation; RFR = Random Forest regression.
b Statistics reported in Ref. [7].
Recalculated statistics for the scatter plots in Ref. [8]
| Type | Fig. in | r² Pearson | r² bias | r² val | RMSE Pearson | RMSE bias | RMSE val | bias |
|---|---|---|---|---|---|---|---|---|
| GSE, small molecules | | 0.62 | 0.59 | 0.57 | 1.17 | 1.21 | 1.23 | -0.22 |
| GSE, large molecules | | 0.48 | -3.8 | -3.82 | 1.00 | 3.05 | 2.95 | 0.16 |
| GSE, modified | | 0.48 | 0.34 | 0.33 | 1.00 | 1.13 | 1.1 | 0.04 |
| ABSOLV, small molecules | | 0.67 | 0.67 | 0.66 | 1.08 | 1.08 | 1.1 | -0.2 |
| ABSOLV, large molecules | | 0.13 | -1.39 | -5.24 | 1.30 | 2.15 | 3.36 | -2.64 |
| ABSOLV, modified | | 0.48 | -0.91 | 2.07 | 1.01 | 1.92 | 2.07 | 0.92 |
| RFR, training set | | 0.98 | 0.98 | 0.98 | 0.26 | 0.27 | 0.27 | 0.00 |
| RFR, internal validation | | 0.89 | 0.89 | 0.89 | 0.64 | 0.64 | 0.64 | 0.02 |
| RFR, large molecules | | 0.45 | 0.42 | 0.37 | 1.03 | 1.06 | 1.07 | 0.30 |
a Statistics reported in Ref. [8].
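The negative r² values in the 'large molecules' rows can look surprising next to a positive Pearson r². A minimal sketch of how this arises, assuming the validation-type r² is computed as 1 - SS_res / SS_tot with residuals taken about the identity line (an assumption, not a definition quoted from the source); the five data points are hypothetical and chosen only so that the predictions follow the trend but sit about 3 log units too high.

```python
import numpy as np

# Hypothetical pS0 values: good correlation, large constant offset.
obs = np.array([4.0, 4.5, 5.0, 5.5, 6.0])
calc = np.array([7.1, 7.4, 8.0, 8.6, 8.9])

ss_tot = np.sum((obs - obs.mean()) ** 2)   # spread of the observations
ss_val = np.sum((obs - calc) ** 2)         # residuals about the identity line

r2_val = 1 - ss_val / ss_tot               # about -17.0: worse than predicting the mean
r2_pearson = np.corrcoef(obs, calc)[0, 1] ** 2   # about 0.98: the offset is invisible

print(f"r2_val = {r2_val:.2f}, r2_pearson = {r2_pearson:.2f}")
```

Whenever the residual sum of squares about the identity line exceeds the total sum of squares of the observations, the validation-type r² drops below zero, while the Pearson value is unaffected by any constant offset or slope error.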