Hideaki Mamada, Yukihiro Nomura, Yoshihiro Uesawa.
Abstract
The toxicity, absorption, distribution, metabolism, and excretion properties of some targets are difficult to predict by quantitative structure-activity relationship analysis. Therefore, there is a need for a new prediction method that performs well for these targets. The aim of this study was to develop a new regression model of rat clearance (CL). We constructed a regression model using 1545 in-house compounds for which we had rat CL data. Molecular descriptors were calculated using Molecular Operating Environment (MOE), alvaDesc, and ADMET Predictor software. A DeepSnap and Deep Learning (DeepSnap-DL) classification model, which uses images of the three-dimensional chemical structures of compounds as features, was constructed, and the prediction probability for each compound was calculated. For molecular descriptor-based methods, which use molecular descriptors and conventional machine learning algorithms selected by DataRobot, the correlation coefficient (R²) and root mean square error (RMSE) were 0.625-0.669 and 0.295-0.318, respectively. We then combined the molecular descriptors and the DeepSnap-DL prediction probability as features and developed a novel regression method that we call the combination model. In the combination model, which uses these two types of features and conventional algorithms selected by DataRobot, R² and RMSE were 0.710-0.769 and 0.247-0.278, respectively. This finding shows that the combination model performed better than the molecular descriptor-based methods. Our combination model will contribute to the design of more rational compounds for drug discovery. This method may be applicable not only to rat CL but also to other pharmacokinetic, pharmacological activity, and toxicity parameters; therefore, applying it to other parameters may help to accelerate drug discovery.
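As a rough illustration of the feature construction and the two metrics reported in the abstract, the sketch below appends a per-compound prediction probability to each descriptor row and defines RMSE and R². The function names are hypothetical. R² is assumed here to be the coefficient of determination, because the abstract does not state which definition was used. The regression algorithms themselves (selected by DataRobot in the study) are not reproduced.

```python
import math

def append_probability(descriptor_rows, probabilities):
    """Return new feature rows: the descriptors plus one probability column.

    Assumption: one classifier probability per compound, in the same order
    as the descriptor rows.
    """
    if len(descriptor_rows) != len(probabilities):
        raise ValueError("one probability is required per compound")
    return [list(row) + [p] for row, p in zip(descriptor_rows, probabilities)]

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

Any regressor can then be trained on the augmented rows in place of the descriptor-only rows, and the two feature sets compared on the same held-out compounds.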
Year: 2022 PMID: 35647436 PMCID: PMC9134387 DOI: 10.1021/acsomega.2c00261
Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343
Figure 1. Flowchart of the modeling process for rat CL prediction.
Figure 2. Chemical space distribution of five datasets by PCA using 11 representative molecular descriptors (n = 1545). (a) PC1 (62.3%) and PC2 (12.0%). (b) PC1 (62.3%) and PC3 (8.0%). (c) PC2 (12.0%) and PC3 (8.0%). Each dot represents a compound (n = 5 × 309). The 11 representative molecular descriptors were molecular weight, SlogP (log octanol/water partition coefficient), topological polar surface area, h_logD (octanol/water distribution coefficient [pH 7]), h_pKa (acidity [pH 7]), h_pKb (basicity [pH 7]), a_acc (number of H-bond acceptor atoms), a_don (number of H-bond donor atoms), a_aro (number of aromatic atoms), b_ar (number of aromatic bonds), and b_rotN (number of rotatable bonds). The principal components were calculated from PC1 to PC3.
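The caption describes 1545 compounds arranged as five datasets of 309 compounds each. A minimal sketch of one way to obtain such a partition is a seeded shuffle followed by an equal five-way split. The function name, the use of a random shuffle, and the seed value are assumptions for illustration; the caption does not say how compounds were assigned to datasets.

```python
import random

def split_into_equal_groups(compound_ids, n_groups=5, seed=1):
    """Shuffle the IDs reproducibly and cut them into equally sized groups."""
    ids = list(compound_ids)
    if len(ids) % n_groups != 0:
        raise ValueError("number of compounds must be divisible by n_groups")
    random.Random(seed).shuffle(ids)  # local generator, global state untouched
    size = len(ids) // n_groups
    return [ids[i * size:(i + 1) * size] for i in range(n_groups)]

# Hypothetical integer IDs standing in for the 1545 in-house compounds.
groups = split_into_equal_groups(range(1545))
```

Each compound lands in exactly one group, so the groups can be rotated through training, validation, and test roles without overlap.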
Internal Validation and External Test Results
| methods (internal validation results) | R², Pattern 1 | R², Pattern 2 | R², Pattern 3 | R², Pattern 4 | R², Pattern 5 | RMSE, Pattern 1 | RMSE, Pattern 2 | RMSE, Pattern 3 | RMSE, Pattern 4 | RMSE, Pattern 5 |
|---|---|---|---|---|---|---|---|---|---|---|
| molecular descriptor-based methods | 0.573 | 0.580 | 0.582 | 0.595 | 0.576 | 0.338 | 0.334 | 0.334 | 0.328 | 0.337 |
| combination model | 0.676 | 0.688 | 0.692 | 0.702 | 0.692 | 0.294 | 0.288 | 0.287 | 0.282 | 0.287 |
The internal validation and test results were predicted by DataRobot using 4331–4338 descriptors for seed = 1. R², correlation coefficient; RMSE, root mean square error; combination model, model that uses both the prediction probability of DeepSnap and Deep Learning (DeepSnap-DL) and molecular descriptors as features.
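To summarize the internal validation table, the following sketch averages each metric over Patterns 1–5 and computes the relative RMSE reduction of the combination model. It assumes that the first five numeric columns of the table are R² and the last five are RMSE; the variable names are hypothetical.

```python
from statistics import mean

DESCRIPTOR = "molecular descriptor-based methods"
COMBINATION = "combination model"

# Internal validation values for Patterns 1-5, copied from the table.
# Assumption: first five columns are R2, last five are RMSE.
r2_by_method = {
    DESCRIPTOR: [0.573, 0.580, 0.582, 0.595, 0.576],
    COMBINATION: [0.676, 0.688, 0.692, 0.702, 0.692],
}
rmse_by_method = {
    DESCRIPTOR: [0.338, 0.334, 0.334, 0.328, 0.337],
    COMBINATION: [0.294, 0.288, 0.287, 0.282, 0.287],
}

mean_r2 = {name: mean(values) for name, values in r2_by_method.items()}
mean_rmse = {name: mean(values) for name, values in rmse_by_method.items()}

# Fraction by which the mean RMSE drops when the probability feature is added.
rmse_reduction = 1.0 - mean_rmse[COMBINATION] / mean_rmse[DESCRIPTOR]

print(f"mean R2:   {mean_r2[DESCRIPTOR]:.3f} -> {mean_r2[COMBINATION]:.3f}")
print(f"mean RMSE: {mean_rmse[DESCRIPTOR]:.3f} -> {mean_rmse[COMBINATION]:.3f}")
print(f"relative RMSE reduction: {rmse_reduction:.1%}")
```

Under that reading, the mean R² rises from about 0.581 to 0.690 and the mean RMSE falls from about 0.334 to 0.288, a reduction of roughly 14%.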
Top 10 Molecular Descriptors in the Molecular Descriptor-Based Methods
| descriptor | software to calculate molecular descriptor | description | average effect |
|---|---|---|---|
| nFCharge+ | MOE | number of atoms with a positive formal charge | 0.513 |
| BCUT_SLOGP_0 | MOE | descriptors using atomic contribution to log P | 0.496 |
| MATS2p | alvaDesc | descriptor-related polarizability | 0.418 |
| B09[N–O] | alvaDesc | presence/absence of N–O bond at some topological distance | 0.375 |
| FCharge+ | MOE | sum of the positive formal charges contained in a molecule | 0.317 |
| SMR_VSA3 | MOE | descriptor-related molar refraction | 0.311 |
| FCharge+_max | MOE | largest number of positive formal charges contained in a molecule | 0.297 |
| h_pstrain | MOE | strain energy needed to convert all protonation states into the input protonation state | 0.294 |
| MATS6e | alvaDesc | descriptor-related electronegativity | 0.242 |
| F09[N–O] | alvaDesc | frequency of N–O bond at some topological distance | 0.238 |
The top 10 descriptors were calculated based on the permutation importance of the prediction models using the molecular descriptor-based methods for seed = 1. The average effect was calculated based on five different prediction models using Patterns 1–5 (Figure ). MOE, Molecular Operating Environment; log P, log octanol/water partition coefficient.
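Permutation importance, on which the ranking is based, can be sketched as follows: shuffle one feature column at a time and record how much the prediction error grows. This is a generic illustration with a hypothetical toy model, not DataRobot's implementation; the function names and the choice of RMSE as the error metric are assumptions.

```python
import math
import random

def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def permutation_importance(predict, X, y, seed=1):
    """Return the RMSE increase caused by shuffling each feature column."""
    rng = random.Random(seed)
    baseline = rmse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between feature j and the target
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        permuted = rmse(y, [predict(row) for row in X_perm])
        importances.append(permuted - baseline)
    return importances

# Hypothetical toy model: only the first feature influences the prediction.
def toy_predict(row):
    return 2.0 * row[0]

X = [[float(i), float(i % 3)] for i in range(20)]
y = [toy_predict(row) for row in X]
scores = permutation_importance(toy_predict, X, y)
```

In this toy case, shuffling the first column degrades the predictions while shuffling the second leaves them unchanged, so only the first feature receives a positive score.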
Top 10 Descriptors (Molecular Descriptors and Prediction Probability of DeepSnap-DL) in the Combination Model
| descriptor | software to calculate molecular descriptor | description | average effect |
|---|---|---|---|
| prediction probability of DeepSnap-DL | | prediction probability calculated by DeepSnap-DL | 1.000 |
| BCUT_SLOGP_0 | MOE | descriptors using atomic contribution to log P | 0.107 |
| SMR_VSA3 | MOE | descriptor-related molar refraction | 0.088 |
| FCharge+_max | MOE | largest number of positive formal charges contained in a molecule | 0.064 |
| MATS2p | alvaDesc | descriptor-related polarizability | 0.054 |
| T_MIRyy | ADMET Predictor | descriptor-related topological equivalent of MIRyy___3D (medium relative principal moment of inertia), but without mass weighting | 0.050 |
| FCharge+ | MOE | sum of the positive formal charges contained in a molecule | 0.049 |
| nFCharge+ | MOE | number of atoms with a positive formal charge | 0.045 |
| F09[N–O] | alvaDesc | frequency of the N–O bond at some topological distance | 0.040 |
| RDF020m | alvaDesc | descriptor-related radial distribution function weighted by mass | 0.038 |
The top 10 descriptors were calculated based on the permutation importance of the combination models, which use molecular descriptors and the prediction probability of DeepSnap-DL as features, for seed = 1. The average effect was calculated based on five different prediction models using Patterns 1–5 (Figure ). DeepSnap-DL, DeepSnap and Deep Learning; MOE, Molecular Operating Environment; log P, log octanol/water partition coefficient.
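One plausible reading of "average effect" is that each model's importances are first scaled so that its top feature equals 1.0 and are then averaged over the five models. Under that assumption, a feature that ranks first in every model averages exactly 1.000, as the DeepSnap-DL prediction probability does here. The footnote does not confirm this procedure, and the feature names and raw values below are hypothetical.

```python
def normalize_to_top(importances):
    """Scale one model's importances so the largest value becomes 1.0."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}

def average_effect(per_model_importances):
    """Average the top-normalized importances across models."""
    normalized = [normalize_to_top(m) for m in per_model_importances]
    names = set().union(*normalized)  # every feature seen in any model
    n_models = len(normalized)
    return {
        name: sum(m.get(name, 0.0) for m in normalized) / n_models
        for name in names
    }

# Hypothetical raw importances from two models (not values from the study).
models = [
    {"feature_top": 8.0, "feature_a": 4.0, "feature_b": 2.0},
    {"feature_top": 4.0, "feature_a": 1.0, "feature_b": 2.0},
]
effects = average_effect(models)
```

Here `feature_top` is first in both models and averages 1.0, whereas `feature_a` and `feature_b` each average 0.375 because their normalized ranks differ between models.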