Vishal B Siramshetty1, Pranav Shah1, Edward Kerns1, Kimloan Nguyen1,2, Kyeong Ri Yu1,3, Md Kabir1,4, Jordan Williams1, Jorge Neyra1, Noel Southall1, Ðắc-Trung Nguyễn1, Xin Xu5.
Abstract
Hepatic metabolic stability is a key pharmacokinetic parameter in drug discovery. Metabolic stability is usually assessed in microsomal fractions, and only the best compounds progress in the drug discovery process. A high-throughput, single time point substrate depletion assay in rat liver microsomes (RLM) is employed at the National Center for Advancing Translational Sciences (NCATS). Between 2012 and 2020, RLM stability data were generated for ~24,000 compounds from more than 250 projects that cover a wide range of pharmacological targets and cellular pathways. Although RLM stability is a crucial endpoint, little or no such data exist in the public domain. In this study, computational models were developed for predicting RLM stability using different machine learning methods. In addition, a retrospective time-split validation was performed, and local models were built for projects that performed poorly with global models. Further analysis revealed medicinal chemistry knowledge inherent in the data that may help chemists synthesize metabolically stable compounds. We also deposited experimental data for ~2500 compounds in the PubChem BioAssay database (AID: 1508591). The global prediction models are publicly accessible (https://opendata.ncats.nih.gov/adme). To the best of our knowledge, this is the first publicly available RLM prediction model built using high-quality data generated at a single laboratory.
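The abstract refers to a single time point substrate depletion assay. A common way to turn the fraction of parent compound remaining at one time point into a half-life is to assume first-order depletion, C(t) = C0·e^(−kt), which gives t1/2 = ln 2 / k with k = −ln(C(t)/C0) / t. The sketch below assumes that model and a hypothetical 15-min incubation; it is not necessarily the exact calculation used in the NCATS assay.

```python
import math


def half_life_single_point(fraction_remaining: float, incubation_min: float) -> float:
    """Estimate t1/2 (min) from one substrate-depletion measurement.

    Assumes first-order depletion, C(t) = C0 * exp(-k * t), so that
    k = -ln(C(t) / C0) / t and t1/2 = ln(2) / k.
    """
    if not 0.0 < fraction_remaining < 1.0:
        raise ValueError("fraction_remaining must be strictly between 0 and 1")
    k = -math.log(fraction_remaining) / incubation_min  # elimination rate constant (1/min)
    return math.log(2) / k


# Hypothetical example: 25% of the parent compound left after a 15-min incubation
print(round(half_life_single_point(0.25, 15.0), 2))  # 7.5
```

A quick sanity check: a compound with exactly half of the parent remaining returns the incubation time itself as its half-life.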
Year: 2020 PMID: 33244000 PMCID: PMC7693334 DOI: 10.1038/s41598-020-77327-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Time-split distribution of RLM stability data (2012 to 2019).
Reproducibility data for control compounds. The mean and S.D. of the t1/2 values were calculated for exemplary controls across 600 plates. MSR, minimum significant ratio.
| Compound | t1/2 (min) | MSR |
|---|---|---|
| Buspirone | 3.8 ± 1.1 | 2.1 |
| Propranolol | 1.4 ± 0.3 | 1.7 |
| Diclofenac | 11.4 ± 2.6 | 1.8 |
| Loperamide | 8.9 ± 2.4 | 1.9 |
| Antipyrine | > 30 | N/A |
| Carbamazepine | > 30 | N/A |
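The MSR column summarizes run-to-run reproducibility of the controls. One widely used definition, for comparing two single determinations, is MSR = 10^(2·√2·s), where s is the standard deviation of the log10-transformed values across runs. The sketch below assumes that definition (the exact formula is not stated here) and uses hypothetical replicate values. Controls reported only as > 30 min are censored at the assay's upper limit, so no log-scale SD can be computed for them, which is consistent with the N/A entries.

```python
import math
import statistics


def minimum_significant_ratio(half_lives: list[float]) -> float:
    """MSR = 10 ** (2 * sqrt(2) * s), with s the sample SD of log10(t1/2).

    Assumed definition; not confirmed as the exact calculation behind the table.
    """
    logs = [math.log10(t) for t in half_lives]  # work on the log scale
    s = statistics.stdev(logs)                  # sample SD across runs/plates
    return 10 ** (2 * math.sqrt(2) * s)


# Hypothetical replicate t1/2 values (min) for one control compound across plates
replicates = [3.1, 4.2, 3.6, 4.9, 3.3]
print(round(minimum_significant_ratio(replicates), 2))
```

An MSR of 1 would mean perfectly reproducible runs; larger values mean two single measurements must differ by a larger fold before the difference is considered significant.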
Figure 2. Distributions of the data by (a) molecular weight, (b) TPSA and (c) log P.
Figure 3. Visualization of the chemical space of the RLM stability data set. The x and y axes are the first two dimensions of the t-SNE embedding.
Figure 4. Hierarchical clustering of the RLM stability data set. Highlighted are exemplary regions with (a) an abundance of highly stable compounds, (b) an abundance of highly unstable compounds and (c) a mixture of compounds belonging to different t1/2 groups.
Figure 5. Results of the eight models evaluated in five-fold cross-validation: (a) performance measured as AUC; (b) performance measured as BACC. Error bars show the standard deviation over the five folds for each model.
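Figure 5 reports AUC and BACC as a mean with a standard deviation over five cross-validation folds. The sketch below computes both metrics in plain Python and summarizes them across folds. It assumes binary labels with 1 for the positive class, a 0.5 score threshold for turning predicted probabilities into class calls, and the sample (n − 1) standard deviation; the caption specifies none of these.

```python
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, preds):
    """Mean of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def summarize_folds(fold_labels, fold_scores, threshold=0.5):
    """Return ((mean AUC, SD AUC), (mean BACC, SD BACC)) over the folds."""
    aucs, baccs = [], []
    for labels, scores in zip(fold_labels, fold_scores):
        preds = [1 if s >= threshold else 0 for s in scores]  # hypothetical 0.5 cut-off
        aucs.append(roc_auc(labels, scores))
        baccs.append(balanced_accuracy(labels, preds))
    return ((statistics.mean(aucs), statistics.stdev(aucs)),
            (statistics.mean(baccs), statistics.stdev(baccs)))
```

The pairwise AUC loop is quadratic in the number of compounds; for tens of thousands of compounds a rank-based formula or a library implementation would be the practical choice.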
Figure 6. Time-split validation results for the four modeling methods: (a) performance measured as AUC; (b) performance measured as BACC.
Figure 7. Top 10 NCATS projects chosen for retrospective analysis. (a) Distribution of compounds for all 10 projects across multiple years. (b) Performance of global models on data from the 10 projects in three consecutive years (Year 1, Year 2 and Year 3). The dotted line marks the BACC threshold of 0.7.
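Figure 7 evaluates global models on each project's compounds year by year against a BACC threshold of 0.7. A minimal sketch of that bookkeeping, assuming a simple year-based split (train on all earlier years, test on a single year) and assuming that a project counts as poorly predicted when its BACC falls below 0.7 in at least one year. The `Record` fields and the label coding are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    compound_id: str
    project: str
    year: int
    label: int  # hypothetical coding: 1 = unstable, 0 = stable


def time_split(records, test_year):
    """Train on everything measured before test_year; test on test_year only."""
    train = [r for r in records if r.year < test_year]
    test = [r for r in records if r.year == test_year]
    return train, test


def flag_projects(bacc_by_project, threshold=0.7):
    """Projects whose global-model BACC drops below `threshold` in at least one year."""
    return sorted(
        project
        for project, by_year in bacc_by_project.items()
        if any(bacc < threshold for bacc in by_year.values())
    )


# Global-model BACC for NCATS5, taken from the NCATS5 table that follows
print(flag_projects({"NCATS5": {2017: 0.61, 2018: 0.76, 2019: 0.57}}))  # ['NCATS5']
```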
Performance of the global and the local models generated for the project NCATS5.
| Year | # Compounds in Training Set | # Compounds in Test Set | BACC (global) | BACC (local) |
|---|---|---|---|---|
| 2017 | 1188 | 1018 | 0.61 | 0.60 |
| 2018 | 2206 | 620 | 0.76 | 0.71 |
| 2019 | 2862 | 338 | 0.57 | 0.75 |
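The NCATS5 table compares a global model with a project-specific local model year by year. For 2018, the training set (2206 compounds) equals the 2017 training and test sets combined (1188 + 1018), which fits an expanding window in which each year's test compounds join the next year's training data. A minimal sketch, assuming the local model is trained only on the project's own earlier compounds and that the better model for each year is chosen by BACC; the record keys are hypothetical.

```python
def local_splits(records, project, years):
    """Yield (year, train, test) for one project with an expanding training window.

    `records` are dicts with hypothetical keys "project" and "year".
    """
    own = [r for r in records if r["project"] == project]
    for year in years:
        train = [r for r in own if r["year"] < year]   # all earlier project data
        test = [r for r in own if r["year"] == year]   # this year's new compounds
        yield year, train, test


def pick_model(results):
    """Map {year: (bacc_global, bacc_local)} to the better model per year; ties go to global."""
    return {year: ("local" if local > glob else "global")
            for year, (glob, local) in results.items()}


# BACC values for NCATS5 from the table above
ncats5 = {2017: (0.61, 0.60), 2018: (0.76, 0.71), 2019: (0.57, 0.75)}
print(pick_model(ncats5))  # {2017: 'global', 2018: 'global', 2019: 'local'}
```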
Detailed statistics on the 18 transformations selected from the matched molecular pair analysis (MMPA).
| Fragment (L) | Fragment (R) | Training pairs: total (+ve/und/−ve) | Training mean t1/2, L (min) | Training mean t1/2, R (min) | Training R/L | Test pairs: total (+ve/und/−ve) | Test mean t1/2, L (min) | Test mean t1/2, R (min) | Test R/L |
|---|---|---|---|---|---|---|---|---|---|
|  |  | 25 (23/1/1) | 6 | 29 | 10.8 | 68 (57/8/3) | 9.4 | 28.4 | 7.7 |
|  |  | 41 (38/3/0) | 9.5 | 29.7 | 9.8 | 49 (41/7/1) | 9.3 | 29.0 | 8.6 |
|  |  | 19 (15/2/2) | 11.8 | 23.8 | 6.5 | 9 (6/1/2) | 11.2 | 13.5 | 1.5 |
|  |  | 15 (10/5/0) | 13 | 24.3 | 6.5 | 5 (3/0/2) | 5.2 | 7.9 | 1.3 |
|  |  | 17 (9/7/1) | 16.3 | 27.8 | 5.3 | 6 (5/1/0) | 9.5 | 21.5 | 6.3 |
|  |  | 15 (9/4/2) | 13.9 | 24.3 | 5.1 | 3 (2/0/1) | 11.0 | 12.9 | 5.5 |
|  |  | 17 (15/1/1) | 11.8 | 25.3 | 4.8 | 7 (6/1/0) | 8.2 | 17.9 | 2.8 |
|  |  | 15 (9/6/0) | 16.1 | 28 | 4.3 | 10 (4/2/4) | 13.5 | 16.3 | 3.0 |
|  |  | 15 (9/6/0) | 17 | 27.8 | 4.3 | 8 (5/0/3) | 9.9 | 17.2 | 2.0 |
|  |  | 15 (10/4/1) | 13.5 | 24.4 | 3.8 | 5 (4/0/1) | 11.3 | 16.0 | 2.2 |
|  |  | 18 (14/3/1) | 12.8 | 23.1 | 3.8 | 10 (9/0/1) | 7.4 | 16.6 | 2.2 |
|  |  | 21 (19/2/0) | 7.7 | 17.9 | 3.6 | 23 (16/0/7) | 6.6 | 12.3 | 3.6 |
|  |  | 23 (14/8/1) | 13.3 | 23.4 | 3.5 | 43 (26/5/12) | 11.1 | 18.3 | 4.0 |
|  |  | 25 (20/4/1) | 12.1 | 27.3 | 3.5 | 11 (7/3/1) | 15.7 | 25.2 | 2.2 |
|  |  | 21 (10/11/0) | 19.3 | 29.4 | 3.4 | 10 (6/4/0) | 16.8 | 29.5 | 3.4 |
|  |  | 17 (14/3/0) | 15.3 | 27.2 | 2.6 | 13 (7/4/2) | 11.8 | 21.1 | 4.5 |
|  |  | 15 (15/0/0) | 15.4 | 29.8 | 2.3 | 3 (3/0/0) | 15.3 | 30.0 | 2.0 |
|  |  | 15 (15/0/0) | 15.4 | 29.8 | 2.3 | 3 (3/0/0) | 15.3 | 30.0 | 2.0 |
The compound pairs for each transformation are grouped into three categories based on the shift in t1/2: positive (+ve) shift, negative (−ve) shift and undetermined (und).
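In the MMPA table, the R/L column is not the ratio of the two mean t1/2 columns (first training row: 29 / 6 ≈ 4.8, versus 10.8), which suggests it is an average of per-pair ratios. The sketch below summarizes one transformation that way. The fold-change rule for the +ve/−ve/und categories is a hypothetical parameter (`fold`), since the source does not state the rule, and the example pairs are hypothetical, with values reported as > 30 min capped at 30.

```python
import statistics


def shift_category(t_left, t_right, fold=2.0):
    """Hypothetical rule: '+ve' if t1/2 rises at least `fold`-fold from L to R,
    '-ve' if it falls at least `fold`-fold, otherwise 'und'."""
    ratio = t_right / t_left
    if ratio >= fold:
        return "+ve"
    if ratio <= 1.0 / fold:
        return "-ve"
    return "und"


def summarize_transformation(pairs, fold=2.0):
    """pairs: list of (t1/2 of the L compound, t1/2 of the R compound) in minutes."""
    counts = {"+ve": 0, "und": 0, "-ve": 0}
    for t_left, t_right in pairs:
        counts[shift_category(t_left, t_right, fold)] += 1
    return {
        "total": len(pairs),
        "counts": counts,
        "mean_L": statistics.mean([left for left, _ in pairs]),
        "mean_R": statistics.mean([right for _, right in pairs]),
        # mean of per-pair ratios, not the ratio of the two means
        "mean_R_over_L": statistics.mean([right / left for left, right in pairs]),
    }


# Hypothetical pairs; values reported as "> 30" are capped at 30 min here
pairs = [(4.0, 30.0), (8.0, 28.0), (12.0, 18.0)]
summary = summarize_transformation(pairs)
print(summary["counts"])                                # {'+ve': 2, 'und': 1, '-ve': 0}
print(round(summary["mean_R_over_L"], 2))               # 4.17
print(round(summary["mean_R"] / summary["mean_L"], 2))  # 3.17
```

The two printed ratios differ because a few large per-pair gains dominate the mean of ratios, which is the same pattern seen in the table's first rows.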
Comparison of performance of our best RLM stability model with the literature models.
| Metric | Chang et al. | Hu et al. | NCATS RLM (Best Individual Model) | NCATS RLM (Consensus Model) |
|---|---|---|---|---|
| BACC | 0.81 | 0.77 | 0.82 | 0.83 |
| Sensitivity | 0.82 | 0.73 | 0.86 | 0.85 |
| Specificity | 0.80 | 0.80 | 0.77 | 0.81 |
| Kappa | 0.62 | 0.53 | 0.64 | 0.66 |
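In this comparison, BACC is the mean of sensitivity and specificity (for the consensus model, (0.85 + 0.81) / 2 = 0.83), and Cohen's kappa corrects the observed agreement for agreement expected by chance. A minimal sketch computing all four metrics from a 2×2 confusion matrix; the counts in the example are hypothetical, and because kappa depends on class balance they are not meant to reproduce any column of the table.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, balanced accuracy and Cohen's kappa from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)          # true-positive rate
    specificity = tn / (tn + fp)          # true-negative rate
    bacc = (sensitivity + specificity) / 2
    n = tp + fn + tn + fp
    observed = (tp + tn) / n              # raw accuracy
    # Chance agreement: actual x predicted marginals for each class, summed
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return {"BACC": bacc, "Sensitivity": sensitivity,
            "Specificity": specificity, "Kappa": kappa}


# Hypothetical, class-balanced counts
m = classification_metrics(tp=86, fn=14, tn=77, fp=23)
print({k: round(v, 3) for k, v in m.items()})
# {'BACC': 0.815, 'Sensitivity': 0.86, 'Specificity': 0.77, 'Kappa': 0.63}
```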