Reiko Watanabe, Rikiya Ohashi, Tsuyoshi Esaki, Hitoshi Kawashima, Yayoi Natsume-Kitatani, Chioko Nagao, Kenji Mizuguchi.
Abstract
Prediction of the pharmacokinetic profiles of new chemical entities is essential in drug development to minimize the risk of potential withdrawals. Excretion of unchanged compounds by the kidney is a major route of drug elimination and plays an important role in pharmacokinetics. Herein, we created in silico prediction models of the fraction of drug excreted unchanged in the urine (fe) and of renal clearance (CLr), with datasets of 411 and 401 compounds, respectively, using freely available software; notably, all models require chemical structure information alone. The binary classification model for fe demonstrated a balanced accuracy of 0.74. The two-step prediction system for CLr combines a classification model that predicts the excretion type of a compound with regression models that predict the CLr value for each excretion type. The accuracies of the regression models increased upon adding the observed or predicted fraction unbound in plasma (fu,p) as a descriptor; with the predicted fu,p value, 78.6% of the samples in the higher range of renal clearance fell within 2-fold error. Our prediction system for renal excretion is freely available to the public and can be used as a practical tool for prioritization and optimization of compound synthesis in the early stage of drug discovery.
Year: 2019 PMID: 31827176 PMCID: PMC6906481 DOI: 10.1038/s41598-019-55325-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (a) Distribution of fe in Dataset_fe, consisting of 411 compounds. Average and median are shown in the top right. (b) Distribution of CLr on a logarithmic scale in Dataset_CLr, consisting of 401 compounds. Average and median are shown in the top left. (c) The chemical space of Dataset_fe, classified with the threshold set to 0.30. The frames indicate 95% normal confidence ellipses for the 411 compounds with fe ≥ 0.3 (red) and fe < 0.3 (green). (d) The chemical space of Dataset_CLr for 96 intermediate (IM, red circle), 104 reabsorption (R, green triangle), and 201 secretion (S, blue square) type compounds. (e) Compound counts by CR type. Average and median of CLr in each CR type are shown on the right.
Statistical results of the four binary classification models for fe prediction.
| Descriptor | Selected descriptors | Training or Test | Parameter | RFa (Model_…) | SVMa | ANNa | PLSa |
|---|---|---|---|---|---|---|---|
| without | 51 | Training | Kappa | 0.50 | 0.46 | 0.50 | 0.52 |
| | | Test | Kappa | 0.49b | 0.29 | 0.37 | 0.38 |
| | | Test | Bal. Acc. | 0.74 | 0.63 | 0.69 | 0.68 |
| | | Test | Sensitivity | 0.65 | 0.39 | 0.61 | 0.45 |
| | | Test | Specificity | 0.84 | 0.88 | 0.76 | 0.90 |

a RF, random forest; SVM, support vector machine with radial functions; ANN, artificial neural network; PLS, partial least squares; Bal. Acc., balanced accuracy.
b The highest Kappa in the test set among the four models.
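
The parameters in the table above can all be derived from a 2x2 confusion matrix. The following is a minimal Python sketch using the standard definitions of sensitivity, specificity, balanced accuracy (their mean), and Cohen's kappa; the function name and the counts are hypothetical illustrations, not the authors' code or data.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. fe >= 0.30) predicted correctly/incorrectly.
    tn/fp: negatives predicted correctly/incorrectly.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    # Cohen's kappa: observed agreement corrected for chance agreement.
    observed = (tp + tn) / n
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "kappa": kappa,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = binary_metrics(tp=20, fn=10, tn=40, fp=10)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

As a consistency check on the reported RF test-set values, (0.65 + 0.84) / 2 = 0.745, which agrees with the reported balanced accuracy of 0.74 to within rounding of the inputs.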
Figure 2. Relationship between CLr on a logarithmic scale and observed fu,p. (a) Whole Dataset_CLr (401 compounds), and (b) sub-categorized by CR type (104, 96, and 201 compounds in the reabsorption [R], intermediate [IM], and secretion [S] types, respectively). (c) Boxplot of observed fu,p in each excretion type. n, compound count; r, correlation coefficient.
Figure 3. Plot of predicted and observed CLr by the three regression models with the predicted fu,p value, (a) in the test set (66 compounds) and (b) in the external test set (41 compounds).
Statistical results and fold error of the best regression models for CLr prediction with or without fu,p.
| CR type | Descriptor set | Training or Test | r2 (best model) | RMSE (best model) | Within 2-fold error (%) (best model) | Within 3-fold error (%) (best model) | r2 (average) | Methoda |
|---|---|---|---|---|---|---|---|---|
| Reabsorption Type (R) | Without | Training | 0.48 | 0.56 | - | - | 0.50 | RF |
| | | Test | 0.38 | 0.61 | 37.5 (33.3) | 43.8 (33.3) | 0.23 | |
| | With observed | Training | 0.71 | 0.44 | - | - | 0.62* | RF |
| | | Test | 0.66 | 0.46 | 56.3 (33.3) | 62.5 (33.3) | 0.53* | |
| | With predicted | Training | 0.57 | 0.51 | - | - | 0.52* | PLS |
| | | Test | 0.52 | 0.54 | 43.8 (16.7) | 50.0 (33.3) | 0.47* | |
| Intermediate Type (IM) | Without | Training | 0.65 | 0.38 | - | - | 0.65 | SVM |
| | | Test | 0.56 | 0.28 | 68.8 (60.0) | 93.8 (90.0) | 0.43 | |
| | With observed | Training | 0.95 | 0.17 | - | - | 0.94* | RF |
| | | Test | 0.92 | 0.12 | 100 (100) | 100 (100) | 0.88* | |
| | With predicted | Training | 0.77 | 0.29 | - | - | 0.82* | RF |
| | | Test | 0.74 | 0.21 | 87.5 (83.3) | 100 (100) | 0.68* | |
| Secretion Type (S) | Without | Training | 0.43 | 0.51 | - | - | 0.46 | RF |
| | | Test | 0.41 | 0.46 | 48.6 (35.0) | 68.6 (60.0) | 0.36 | |
| | With observed | Training | 0.64 | 0.39 | - | - | 0.65* | RF |
| | | Test | 0.62 | 0.37 | 62.9 (55.0) | 80.0 (75.0) | 0.57* | |
| | With predicted | Training | 0.60 | 0.42 | - | - | 0.58* | RF |
| | | Test | 0.58 | 0.40 | 57.1 (50.0) | 80.0 (65.0) | 0.46* | |

a RF, random forest; SVM, support vector machine with radial functions; PLS, partial least squares; RMSE, root mean squared error.
* p-value calculated using the paired t-test with Kappa against the model without fu,p in each CR type (p < 0.05).
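
The fold-error columns above can be illustrated with a short Python sketch. It assumes the common definition in which a prediction is within k-fold error when max(predicted/observed, observed/predicted) ≤ k, and it assumes RMSE is computed on log10-transformed values; neither detail is stated in this record, and the function names and numbers are hypothetical.

```python
import math


def fold_error(predicted, observed):
    """Symmetric fold error; 1.0 means a perfect prediction (assumed definition)."""
    return max(predicted / observed, observed / predicted)


def percent_within_fold(predicted, observed, fold):
    """Percentage of predictions whose fold error does not exceed `fold`."""
    hits = sum(1 for p, o in zip(predicted, observed) if fold_error(p, o) <= fold)
    return 100.0 * hits / len(predicted)


def rmse_log10(predicted, observed):
    """RMSE between log10-transformed values (assumption: log-scale RMSE)."""
    squared = [(math.log10(p) - math.log10(o)) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical clearance values, for illustration only.
observed = [1.0, 2.0, 10.0, 50.0]
predicted = [1.5, 5.0, 9.0, 200.0]

print(percent_within_fold(predicted, observed, 2))  # share within 2-fold error
print(percent_within_fold(predicted, observed, 3))  # share within 3-fold error
print(rmse_log10(predicted, observed))
```

Fold error is symmetric by construction, so over- and under-prediction by the same factor are penalized equally.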
Statistical results of the 3-class classification models for CLr prediction.
| Model | Selected descriptors (n) | Training or Test set | Parameter | CR type | RFa (Model_…) | SVMa | ANNa | PLSa |
|---|---|---|---|---|---|---|---|---|
| Without | 15 | Training | Kappa | - | 0.34 | 0.34 | 0.32 | 0.29 |
| | | Test | Kappa | - | 0.32b | 0.19 | 0.18 | 0.22 |
| | | Test | Sensitivity | R | 0.56 | 0.56 | 0.56 | 0.50 |
| | | Test | Sensitivity | IM | 0.29 | 0.12 | 0.41 | 0.18 |
| | | Test | Sensitivity | S | 0.75 | 0.69 | 0.47 | 0.75 |
| | | Test | Balanced Accuracy | R | 0.70 | 0.69 | 0.68 | 0.68 |
| | | Test | Balanced Accuracy | IM | 0.58 | 0.50 | 0.59 | 0.54 |
| | | Test | Balanced Accuracy | S | 0.68 | 0.58 | 0.52 | 0.59 |

a RF, random forest; SVM, support vector machine; ANN, artificial neural network; PLS, partial least squares.
b The highest Kappa in the test set.
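
Per-class sensitivity and balanced accuracy for a 3-class model, as reported above for the R, IM, and S types, are commonly obtained by treating each class one-vs-rest. The Python sketch below assumes that approach, which this record does not confirm; the confusion matrix is hypothetical and the function name is illustrative.

```python
def per_class_metrics(confusion, labels):
    """One-vs-rest sensitivity and balanced accuracy for each class.

    confusion[i][j] = number of compounds of observed class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    results = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp                  # observed class i, predicted other
        fp = sum(row[i] for row in confusion) - tp   # observed other, predicted class i
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        results[label] = {
            "sensitivity": sensitivity,
            "balanced_accuracy": (sensitivity + specificity) / 2,
        }
    return results


# Hypothetical confusion matrix (rows: observed R, IM, S; columns: predicted R, IM, S).
confusion = [
    [8, 1, 1],
    [2, 6, 2],
    [1, 3, 6],
]
for label, values in per_class_metrics(confusion, ["R", "IM", "S"]).items():
    print(label, values)
```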
Figure 4. Plot of predicted and observed CLr in the external validation set of 41 compounds, obtained by the two-step prediction system with the predicted fu,p value.
Figure 5. Application of the generated prediction models. Left: In silico prediction system for fe in humans. Right: Two-step in silico prediction system for CLr in humans. R, reabsorption; IM, intermediate; S, secretion.
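
The two-step system described in the abstract (a classifier that assigns an excretion type, followed by a regression model specific to that type) can be sketched in Python as follows. This is a structural illustration only: `predict_clr`, the toy classifier, and the toy regressors are hypothetical stand-ins, not the published models, and the descriptor values and outputs carry no real meaning.

```python
from typing import Callable, Dict, Sequence, Tuple

Descriptors = Sequence[float]


def predict_clr(
    descriptors: Descriptors,
    classify_type: Callable[[Descriptors], str],
    regressors: Dict[str, Callable[[Descriptors], float]],
) -> Tuple[str, float]:
    """Step 1: predict the excretion type; step 2: apply that type's regressor."""
    cr_type = classify_type(descriptors)
    if cr_type not in regressors:
        raise ValueError(f"no regression model for excretion type {cr_type!r}")
    return cr_type, regressors[cr_type](descriptors)


# Hypothetical stand-ins: a threshold "classifier" and constant "regressors".
def toy_classifier(descriptors: Descriptors) -> str:
    return "S" if descriptors[0] > 0.5 else "R"


toy_regressors = {
    "R": lambda d: 1.0,
    "IM": lambda d: 2.0,
    "S": lambda d: 3.0,
}

print(predict_clr([0.9], toy_classifier, toy_regressors))
print(predict_clr([0.1], toy_classifier, toy_regressors))
```

Keeping the classifier and the per-type regressors as separate callables mirrors the two-step design: an error in step 1 routes the compound to the wrong regression model, so the overall accuracy depends on both steps.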