Sabina Podlewska, Rafał Kafel.
Abstract
Metabolic stability is an important parameter to optimize when designing new active compounds. Tuning it while maintaining the desired activity of a compound is difficult because of the extreme complexity of metabolic pathways in living organisms. In this study, a platform for in silico qualitative evaluation of metabolic stability, expressed as half-lifetime and clearance, was developed. The platform is based on machine learning methods, and separate models were constructed for human, rat and mouse data. The evaluation of compounds is qualitative, and two types of experiments can be performed: regression, in which the compound is assigned to one of the metabolic stability classes (low, medium, high) on the basis of the numerical value of the predicted half-lifetime, and classification, in which the molecule is directly assessed as having low, medium or high stability. The results show that the models have good predictive power, with accuracy values over 0.7 in all cases for the Sequential Minimal Optimization (SMO), k-nearest neighbor (IBk) and Random Forest algorithms. Additionally, for each analyzed compound, the 10 most similar structures from the training set (in terms of Tanimoto similarity) are identified and made available for download as separate files for more detailed manual inspection. The predictive power of the models was also tested on an external dataset with metabolic stability assessed via the GUSAR software, which gave good consistency of results for SMOreg and Naïve Bayes (~0.8 on average). The tool is available online.
Keywords: ChEMBL database; classification; machine learning; metabolic stability; regression
Year: 2018 PMID: 29601530 PMCID: PMC5979396 DOI: 10.3390/ijms19041040
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
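The abstract mentions that the 10 training-set structures most similar to each query compound are retrieved by Tanimoto similarity. A minimal sketch of that kind of lookup is given here, assuming fingerprints are represented as sets of on-bit indices; the function names `tanimoto` and `most_similar` and the dictionary layout are illustrative assumptions, not the platform's actual code.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not a and not b:
        return 0.0  # convention chosen for this sketch when both fingerprints are empty
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def most_similar(
    query: Set[int],
    training: Dict[str, Set[int]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Return the k training compounds with the highest Tanimoto similarity to the query."""
    scored = [(name, tanimoto(query, fp)) for name, fp in training.items()]
    # Sort by descending similarity; ties are broken by compound name for determinism.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical toy fingerprints, not real compounds.
    training_set = {"cmpd_a": {1, 2, 3}, "cmpd_b": {2, 3, 4}, "cmpd_c": {9}}
    print(most_similar({1, 2, 3}, training_set, k=2))
```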
A summary of some of the available tools for ADMET properties predictions.
| Package Name | Availability | Description |
|---|---|---|
| ADMET Predictor | commercial software | Comprehensive characterization of physicochemical and ADMET properties of compounds, including carcinogenicity, mutagenicity, overall toxicity and possible interactions with 5 selected CYP isoforms |
| CASE ULTRA | commercial software | A set of statistical and expert tools for evaluating compound toxicity |
| DEREK | commercial software | Expert system for predicting the toxicity of compounds, including carcinogenicity, mutagenicity, genotoxicity, teratogenicity, influence on fertility, skin irritation and allergic effects |
| META-PC | commercial software | Expert system for predicting the products of compound metabolism |
| METEOR | commercial software | Expert system for predicting metabolic transformations |
| ONCOLOGIC | free software | Prediction of the carcinogenicity of compounds |
| PASS | commercial software (simplified version is freely available online) | Qualitative evaluation of more than 3500 properties, including mechanisms of action, side and toxic effects, interactions with various enzymes and transport proteins, and influence on gene expression |
| TOPKAT | commercial software | Mutagenicity, carcinogenicity, irritating action on skin, eyes, etc. |
| GUSAR | commercial software | Evaluation of compound toxicity and interactions with selected off-targets |
The number of compounds present in each dataset used to construct the predictive models.
| Parameter | Human, Liver Microsomes | Human, Plasma | Rat, Liver Microsomes | Rat, Plasma | Mouse, Liver Microsomes | Mouse, Plasma |
|---|---|---|---|---|---|---|
| T1/2 | 2127 | 561 | 1308 | 277 | 808 | 62 |
| Clearance | 2546 | - | 1244 | - | 266 | - |
Figure 1. Distribution of compound half-lifetimes in the constructed datasets referring to experiments performed on human samples. For better visualization, the dataset was divided into several parts.
Figure 2. Distribution of compound half-lifetimes in the constructed datasets referring to experiments performed on rat samples. For better visualization, the dataset was divided into several parts.
Figure 3. Distribution of compound half-lifetimes in the constructed datasets referring to experiments performed on mouse samples. For better visualization, the dataset was divided into several parts.
Figure 4. Examples of differences in results of metabolic stability tests for human, rat and mouse models.
Figure 5. Standard deviation values of half-lifetimes between human, rat and mouse data.
Figure 6. Scheme of the prediction approaches covered in the study.
Number of compounds belonging to each metabolic stability class.
| Class/Number of Compounds | Human | Rat | Mouse |
|---|---|---|---|
| Low | 928 (44%) | 814 (62%) | 486 (60%) |
| Medium | 937 (44%) | 382 (29%) | 252 (31%) |
| High | 262 (12%) | 112 (9%) | 70 (9%) |
| Total | 2127 | 1308 | 808 |
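In the regression mode described in the abstract, a predicted half-lifetime is converted into one of the three stability classes counted above. The following sketch illustrates that binning step and the computation of class shares. The threshold values `low_max_h` and `high_min_h` are hypothetical placeholders chosen for illustration, since the class boundaries are not stated in this record.

```python
from collections import Counter
from typing import Dict, Iterable


def stability_class(half_life_h: float, low_max_h: float = 0.5, high_min_h: float = 2.0) -> str:
    """Map a half-lifetime (hours) to a stability class.

    The default thresholds are hypothetical placeholders, not the study's values.
    """
    if half_life_h <= low_max_h:
        return "low"
    if half_life_h >= high_min_h:
        return "high"
    return "medium"


def class_shares(half_lives_h: Iterable[float]) -> Dict[str, int]:
    """Percentage of compounds per class, rounded to whole percent."""
    counts = Counter(stability_class(t) for t in half_lives_h)
    total = sum(counts.values())
    if total == 0:
        return {"low": 0, "medium": 0, "high": 0}
    return {label: round(100 * counts[label] / total) for label in ("low", "medium", "high")}


if __name__ == "__main__":
    # Hypothetical predicted half-lifetimes in hours.
    print(class_shares([0.1, 0.2, 1.0, 3.0]))
```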
Evaluation parameters obtained in 10-fold CV for data (T1/2) produced on liver microsomes. Values above 0.7 are depicted in bold.
| 1d2d Descriptors | ExtFP | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Class | SMOreg | SMO | IBk | Naïve Bayes | Random Forest | J48 | SMOreg | SMO | IBk | Naïve Bayes | Random Forest | J48 | ||
| human | Recall | Low | 0.176 | 0.344 | 0.696 | 0.650 | 0.626 | |||||||
| Medium | 0.679 | 0.697 | 0.508 | 0.673 | ||||||||||
| High | 0.561 | 0.573 | 0.557 | 0.480 | 0.447 | 0.463 | 0.389 | 0.515 | 0.592 | 0.603 | 0.527 | 0.500 | ||
| Precision | Low | 0.688 | 0.695 | 0.693 | ||||||||||
| Medium | 0.477 | 0.520 | 0.692 | 0.649 | 0.618 | 0.695 | 0.631 | 0.676 | ||||||
| High | 0.573 | 0.632 | 0.609 | 0.310 | 0.562 | 0.662 | 0.643 | 0.615 | 0.295 | 0.648 | 0.491 | |||
| Overall accuracy | 0.524 | 0.517 | 0.660 | 0.698 | 0.571 | 0.682 | ||||||||
| AUROC | ||||||||||||||
| rat | Recall | Low | 0.467 | 0.565 | ||||||||||
| Medium | 0.617 | 0.644 | 0.114 | 0.561 | 0.542 | 0.427 | 0.605 | 0.607 | 0.573 | 0.586 | 0.576 | |||
| High | 0.514 | 0.400 | 0.476 | 0.648 | 0.228 | 0.390 | 0.598 | 0.384 | 0.429 | 0.553 | 0.348 | 0.429 | ||
| Precision | Low | |||||||||||||
| Medium | 0.379 | 0.680 | 0.663 | 0.512 | 0.694 | 0.570 | 0.354 | 0.672 | 0.637 | 0.489 | 0.655 | 0.598 | ||
| High | 0.422 | 0.545 | 0.568 | 0.181 | 0.615 | 0.387 | 0.276 | 0.506 | 0.505 | 0.365 | 0.500 | 0.444 | ||
| Overall accuracy | 0.553 | 0.566 | 0.528 | 0.657 | ||||||||||
| AUROC | 0.698 | |||||||||||||
| mouse | Recall | Low | 0.570 | 0.650 | ||||||||||
| Medium | 0.601 | 0.622 | 0.248 | 0.521 | 0.584 | 0.723 | 0.667 | 0.615 | 0.615 | 0.579 | 0.591 | |||
| High | 0.500 | 0.329 | 0.486 | 0.728 | 0.357 | 0.271 | 0.529 | 0.557 | 0.343 | 0.500 | 0.457 | 0.386 | ||
| Precision | Low | |||||||||||||
| Medium | 0.448 | 0.678 | 0.643 | 0.546 | 0.685 | 0.565 | 0.525 | 0.648 | 0.654 | 0.525 | 0.679 | 0.575 | ||
| High | 0.614 | 0.622 | 0.540 | 0.199 | 0.658 | 0.284 | 0.638 | 0.684 | 0.600 | 0.357 | 0.627 | 0.509 | ||
| Overall accuracy | 0.632 | 0.533 | 0.667 | 0.696 | 0.665 | 0.686 | ||||||||
| AUROC | 0.673 | |||||||||||||
Figure 7. Visualization of evaluation parameter values obtained in 10-fold CV studies (T1/2 liver microsomes data).
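The per-class recall and precision and the overall accuracy reported for the cross-validation experiments follow the standard definitions for multi-class problems. A minimal sketch of how they can be computed from true and predicted labels is given here; the function name `evaluate` and the return layout are assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

CLASSES = ("low", "medium", "high")


def evaluate(
    y_true: Sequence[str],
    y_pred: Sequence[str],
) -> Tuple[Dict[str, Dict[str, float]], float]:
    """Return per-class recall/precision and the overall accuracy."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    report: Dict[str, Dict[str, float]] = {}
    for label in CLASSES:
        true_pos = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        actual = sum(t == label for t in y_true)
        predicted = sum(p == label for p in y_pred)
        report[label] = {
            # Recall: fraction of compounds of this class that were recovered.
            "recall": true_pos / actual if actual else 0.0,
            # Precision: fraction of predictions of this class that were correct.
            "precision": true_pos / predicted if predicted else 0.0,
        }
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true) if y_true else 0.0
    return report, accuracy


if __name__ == "__main__":
    # Hypothetical labels for four compounds.
    truth = ["low", "low", "medium", "high"]
    preds = ["low", "medium", "medium", "high"]
    print(evaluate(truth, preds))
```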
Accuracies of predictions on external test set (T1/2 human data on liver microsomes). Values above 0.7 are depicted in bold.
| SMOreg | SMO | IBk | Naïve Bayes | Random Forest | J48 | ||
|---|---|---|---|---|---|---|---|
| Medium class predictions removed | 1d2d descriptors | 0.58 | 0.22 | 0.61 | 0.51 | ||
| ExtFP | 0.39 | 0.27 | 0.44 | 0.23 | 0.38 | ||
| Medium class predictions shifted to high class | 1d2d descriptors | 0.66 | 0.22 | 0.58 |||
| ExtFP | 0.64 | 0.61 | 0.54 | 0.53 |
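The two row groups of the table correspond to two ways of handling medium-class predictions on the external test set. The sketch below shows how accuracy could be computed under each strategy. It assumes that the external labels are binary (low or high), which would explain why medium predictions need special handling but is not stated explicitly in this record; the function name and strategy keywords are illustrative.

```python
from typing import Sequence


def external_accuracy(y_pred: Sequence[str], y_true: Sequence[str], strategy: str) -> float:
    """Accuracy of three-class predictions against binary (low/high) external labels.

    strategy="remove": compounds predicted as medium are dropped before scoring.
    strategy="shift_to_high": medium predictions are counted as high.
    """
    pairs = list(zip(y_pred, y_true))
    if strategy == "remove":
        pairs = [(p, t) for p, t in pairs if p != "medium"]
    elif strategy == "shift_to_high":
        pairs = [("high" if p == "medium" else p, t) for p, t in pairs]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    if not pairs:
        return 0.0
    return sum(p == t for p, t in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical predictions and external labels for four compounds.
    predicted = ["low", "medium", "high", "medium"]
    external = ["low", "high", "high", "low"]
    print(external_accuracy(predicted, external, "remove"))
    print(external_accuracy(predicted, external, "shift_to_high"))
```

Removing medium predictions scores only the compounds for which the model commits to an extreme class, whereas shifting keeps every compound in the evaluation, so the two accuracies are computed over different numbers of compounds.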
Figure 8. Screenshot from example results, containing summary of predictions and predicted metabolic stabilities with coloring corresponding to metabolic stability results.
Optimization conditions for SMOreg, SMO, Random Forest, and IBk.
| Method | Parameter | Tested Values |
|---|---|---|
| SMOreg/SMO | C | 0.01, 0.1, 1, 10, 100, 1000 |
| SMOreg/SMO | Gamma | 0.001, 0.01, 0.1, 1, 10 |
| SMOreg/SMO | Operations on data | Normalization, standardization |
| Random Forest | Number of trees | 10, 100, 1000 |
| IBk | Number of nearest neighbors | 1, 2, 3, 4, 5 |
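The tested values above define a search grid for each method. As an illustration, the grids can be expanded into explicit parameter combinations as sketched here. The sketch assumes an exhaustive search over all combinations, which this record does not confirm; the dictionary keys and the `expand_grid` helper are hypothetical names.

```python
from itertools import product
from typing import Any, Dict, List

# Tested values taken from the table; key names are illustrative.
SVM_GRID = {
    "C": [0.01, 0.1, 1, 10, 100, 1000],
    "gamma": [0.001, 0.01, 0.1, 1, 10],
    "scaling": ["normalization", "standardization"],
}
RANDOM_FOREST_GRID = {"n_trees": [10, 100, 1000]}
IBK_GRID = {"n_neighbors": [1, 2, 3, 4, 5]}


def expand_grid(grid: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """List every combination of the parameter values in the grid."""
    names = list(grid)
    return [dict(zip(names, values)) for values in product(*(grid[n] for n in names))]


if __name__ == "__main__":
    for name, grid in [("SMOreg/SMO", SVM_GRID), ("Random Forest", RANDOM_FOREST_GRID), ("IBk", IBK_GRID)]:
        print(name, len(expand_grid(grid)))
```

Under the exhaustive-search assumption this gives 6 × 5 × 2 = 60 settings for SMOreg/SMO, 3 for Random Forest and 5 for IBk.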
Conditions selected for each model.
| Compounds Representation | Method | Parameter | Human | Rat | Mouse |
|---|---|---|---|---|---|
| 1d2d descriptors | SMOreg | C | 0.01 | 0.1 | 0.1 |
| 1d2d descriptors | SMOreg | gamma | 0.1 | 0.1 | 0.1 |
| 1d2d descriptors | SMOreg | Operations on data | normalization | normalization | normalization |
| ExtFP | SMOreg | C | 0.1 | 1 | 1 |
| ExtFP | SMOreg | gamma | 0.001 | 0.001 | 0.001 |
| ExtFP | SMOreg | Operations on data | standardization | standardization | standardization |
| 1d2d descriptors | SMO | C | 100 | 100 | 100 |
| 1d2d descriptors | SMO | gamma | 0.01 | 0.1 | 0.1 |
| 1d2d descriptors | SMO | Operations on data | normalization | normalization | normalization |
| ExtFP | SMO | C | 10 | 10 | 10 |
| ExtFP | SMO | gamma | 0.01 | 0.01 | 0.001 |
| ExtFP | SMO | Operations on data | normalization | normalization | normalization |
| 1d2d descriptors | Random Forest | Number of trees | 1000 | 1000 | 1000 |
| ExtFP | Random Forest | Number of trees | 1000 | 100 | 100 |
| 1d2d descriptors | IBk | Number of nearest neighbors | 1 | 1 | 1 |
| ExtFP | IBk | Number of nearest neighbors | 1 | 5 | 1 |
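The "Operations on data" rows distinguish normalization from standardization. The sketch below shows the usual meaning of the two terms for a single feature column, assuming that normalization denotes min-max scaling to [0, 1] and standardization denotes rescaling to zero mean and unit variance. These definitions, and the use of the population standard deviation, are assumptions of this sketch rather than details given in this record.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def normalize(values: Sequence[float]) -> List[float]:
    """Min-max scale a feature column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values: Sequence[float]) -> List[float]:
    """Rescale a feature column to zero mean and unit (population) variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: no spread to rescale
    return [(v - mu) / sigma for v in values]


if __name__ == "__main__":
    # Hypothetical descriptor values for three compounds.
    column = [2.0, 4.0, 6.0]
    print(normalize(column))
    print(standardize(column))
```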