Shuheng Huang1,2, Yingjie Gao1, Xuelian Zhang3, Ji Lu1, Jun Wei1, Hu Mei2, Juan Xing3, Xianchao Pan1.
Abstract
The ATP-binding cassette transporter ABCG2 is a physiologically important drug transporter that plays a central role in determining the ADMET (absorption, distribution, metabolism, elimination, and toxicity) profile of therapeutics and contributes to multidrug resistance. The development of predictive in silico models for identifying ABCG2 inhibitors is therefore of great interest in the early stages of drug discovery. In this work, a number of ligand-based classification models were developed from a large public dataset using partial least squares-discriminant analysis (PLS-DA) with molecular interaction field- and fingerprint-based structural descriptions, which capture the physicochemical and fragmental properties related to ABCG2 inhibition. An in-house dataset compiled from recent experimental studies was used to rigorously validate model performance. The key molecular properties and fragments that favor inhibitor binding are discussed in detail and were further explored by docking simulations. A highly informative chemical property was identified as the principal determinant of ABCG2 inhibition and was used to derive a simple rule with a strong capability for differentiating inhibitors from non-inhibitors. Furthermore, incorporating the rule into the best PLS-DA model significantly improved classification performance, in particular achieving high prediction accuracy on the independent in-house set. The integrative model is simple and accurate and could be applied to the evaluation of drug-transporter interactions in drug development. In addition, the dominant molecular features derived from the models may help medicinal chemists design novel inhibitors to circumvent ABCG2-mediated drug resistance.
Keywords: ABCG2 (BCRP); PLS-DA; in silico; inhibitors; prediction
Year: 2022; PMID: 35665065; PMCID: PMC9159808; DOI: 10.3389/fchem.2022.863146
Source DB: PubMed; Journal: Front Chem; ISSN: 2296-2646; Impact factor: 5.545
FIGURE 1. Schematic representation of the methods employed in our modeling.
Details of feature selection by stepwise linear regression on Volsurf descriptors.
| Iterations | Entered in Sequence | Removed | R² | Adjusted R² | Description |
|---|---|---|---|---|---|
| 1 | BV12-DRY | — | 0.217 | 0.216 | The best hydrophobic volumes generated by the hydrophobic probe at the energy level of −1.0 kcal/mol |
| 2 | LogP | no | 0.254 | 0.252 | Log of the octanol/water partition coefficient, computed by means of a linear equation derived by fitting VolSurf descriptors to experimental data on the water/octanol partition coefficient |
| 3 | W3-O | no | 0.270 | 0.267 | Hydrophilic regions generated by the carbonyl oxygen atom at energy level of −1.0 kcal/mol |
| 4 | D1-DRY | no | 0.280 | 0.276 | Hydrophobic regions generated by the hydrophobic probe at energy level of −0.2 kcal/mol |
| 5 | R-OH2 | no | 0.293 | 0.289 | Ratio Volume/Surface generated by the water probe |
| 6 | W4 | no | 0.300 | 0.294 | Hydrophilic regions, representing the molecular envelope accessible to the solvent, generated by the water probe at an energy level of −2.0 kcal/mol |
| 7 | Emin1-OH2 | no | 0.304 | 0.298 | Local interaction energy minima between the H2O probe and the target molecule |
| 8 | D12-DRY | no | 0.309 | 0.301 | Hydrophobic local interaction energy minima distances generated by the hydrophobic probe |
| 9 | W7-O | no | 0.287 | 0.282 | Hydrophilic regions generated by the carbonyl oxygen atom at energy level of −5.0 kcal/mol |
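
The two numeric columns in the table above are consistent with a coefficient of determination and its adjusted counterpart. As a minimal sketch, assuming the standard adjusted-R² formula and a training set of n = 736 compounds (an assumption inferred from the confusion-matrix counts reported for the training set elsewhere in this record, not stated in this table), the adjustment can be checked as follows. The function name `adjusted_r2` and the constant `N_TRAIN` are illustrative.

```python
def adjusted_r2(r2: float, n_samples: int, n_predictors: int) -> float:
    """Standard adjusted R^2: penalizes R^2 for the number of predictors."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Assumed sample size (TP + TN + FP + FN of the training set); not given in the table.
N_TRAIN = 736

# (number of predictors, R^2) pairs copied from the first three table rows.
for n_predictors, r2 in [(1, 0.217), (2, 0.254), (3, 0.270)]:
    print(n_predictors, round(adjusted_r2(r2, N_TRAIN, n_predictors), 3))
```

Under these assumptions the first rows reproduce the tabulated adjusted values. Later rows may differ in the last digit because the tabulated R² values are themselves rounded.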
Performance of VolSurf-based PLS-DA models.
| Model | Number of Descriptors | Training ACC | Training SEN | Training SPE | Training MCC | Internal Validation ACC | Internal Validation SEN | Internal Validation SPE | Internal Validation MCC |
|---|---|---|---|---|---|---|---|---|---|
| VS1 | 118 | 0.76 | 0.75 | 0.77 | 0.52 | 0.77 | 0.78 | 0.76 | 0.53 |
| VS2 | 9 | 0.76 | 0.72 | 0.78 | 0.51 | 0.79 | 0.76 | 0.81 | 0.57 |
| VS3 | 8 | 0.76 | 0.75 | 0.77 | 0.52 | 0.79 | 0.78 | 0.80 | 0.57 |
| VS4 | 7 | 0.75 | 0.76 | 0.74 | 0.50 | 0.79 | 0.79 | 0.78 | 0.57 |
| VS5 | 6 | 0.75 | 0.74 | 0.75 | 0.50 | 0.79 | 0.79 | 0.78 | 0.57 |
| VS6 | 5 | 0.75 | 0.74 | 0.76 | 0.50 | 0.78 | 0.78 | 0.78 | 0.56 |
| VS7 | 4 | 0.75 | 0.74 | 0.75 | 0.49 | 0.77 | 0.76 | 0.77 | 0.53 |
| VS8 | 3 | 0.74 | 0.72 | 0.76 | 0.47 | 0.78 | 0.76 | 0.78 | 0.55 |
| **VS9** | **2** | | | | | | | | |
| VS10 | 1 | 0.71 | 0.69 | 0.75 | 0.43 | 0.77 | 0.74 | 0.80 | 0.54 |
The best VolSurf-based PLS-DA model, which uses two descriptors, is highlighted in bold.
Performance of the PLS-DA models based on different fragment distinctions.
| Model | Fragment distinction | HL | Training ACC | Training SEN | Training SPE | Training MCC | Internal Validation ACC | Internal Validation SEN | Internal Validation SPE | Internal Validation MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| FD1 | A | 353 | 0.81 | 0.82 | 0.80 | 0.63 | 0.78 | 0.78 | 0.78 | 0.56 |
| FD2 | B | 353 | 0.77 | 0.74 | 0.79 | 0.54 | 0.76 | 0.76 | 0.75 | 0.52 |
| FD3 | A/B | 353 | 0.84 | 0.81 | 0.87 | 0.68 | 0.81 | 0.81 | 0.81 | 0.61 |
| FD4 | A/C | 257 | 0.83 | 0.83 | 0.83 | 0.66 | 0.82 | 0.83 | 0.81 | 0.63 |
| FD5 | C/D | 353 | 0.84 | 0.81 | 0.86 | 0.67 | 0.78 | 0.82 | 0.75 | 0.57 |
| FD6 | A/B/C | 353 | 0.84 | 0.83 | 0.86 | 0.69 | 0.82 | 0.83 | 0.81 | 0.63 |
| FD7 | A/B/H | 353 | 0.84 | 0.82 | 0.85 | 0.67 | 0.78 | 0.82 | 0.74 | 0.56 |
| FD8 | A/C/D | 307 | 0.85 | 0.82 | 0.87 | 0.69 | 0.81 | 0.80 | 0.82 | 0.61 |
| FD9 | A/B/C/D | 353 | 0.85 | 0.83 | 0.87 | 0.71 | 0.82 | 0.82 | 0.82 | 0.63 |
| FD10 | A/B/C/H | 257 | 0.83 | 0.82 | 0.85 | 0.67 | 0.78 | 0.81 | 0.75 | 0.56 |
The fragment distinction includes A (Atom), B (Bond), C (Connection), D (Donor & Acceptor), and H (Hydrogen Atoms).
HL, holographic length.
Performance of the PLS-DA models based on different fragment sizes.
| Model | Atom counts | HL | Training ACC | Training SEN | Training SPE | Training MCC | Internal Validation ACC | Internal Validation SEN | Internal Validation SPE | Internal Validation MCC |
|---|---|---|---|---|---|---|---|---|---|---|
| FS1 | 1–4 | 151 | 0.79 | 0.78 | 0.81 | 0.59 | 0.79 | 0.79 | 0.78 | 0.57 |
| FS2 | 2–5 | 257 | 0.81 | 0.80 | 0.82 | 0.62 | 0.81 | 0.85 | 0.78 | 0.63 |
| FS3 | 3–6 | 353 | 0.83 | 0.83 | 0.83 | 0.66 | 0.80 | 0.82 | 0.78 | 0.60 |
| FS4 | 4–7 | 257 | 0.83 | 0.83 | 0.83 | 0.66 | 0.82 | 0.83 | 0.81 | 0.63 |
| **FS5** | **5–8** | **353** | | | | | | | | |
| FS6 | 6–9 | 257 | 0.86 | 0.85 | 0.87 | 0.71 | 0.81 | 0.82 | 0.81 | 0.63 |
| FS7 | 7–10 | 307 | 0.86 | 0.83 | 0.88 | 0.72 | 0.81 | 0.81 | 0.81 | 0.61 |
| FS8 | 8–11 | 307 | 0.85 | 0.81 | 0.89 | 0.70 | 0.80 | 0.80 | 0.81 | 0.61 |
The best hologram-based PLS-DA model is highlighted in bold; it uses a fragment size of 5–8 atoms, fragment distinction A/C, and a hologram length of 353 bins.
Performance of the best models on the external validation set.
| Best Models | TP | TN | FP | FN | ACC | SEN | SPE | MCC |
|---|---|---|---|---|---|---|---|---|
| VolSurf-based | 327 | 116 | 18 | 173 | 0.70 | 0.65 | 0.87 | 0.43 |
| Hologram-based | 359 | 118 | 16 | 141 | 0.75 | 0.72 | 0.88 | 0.50 |
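
The ACC, SEN, SPE, and MCC columns can be recomputed from the TP/TN/FP/FN counts. The sketch below assumes the standard definitions of accuracy, sensitivity, specificity, and the Matthews correlation coefficient; the function name `classification_metrics` is illustrative and not taken from the study.

```python
from math import sqrt


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute ACC, SEN, SPE and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)   # true positive rate (inhibitors correctly found)
    spe = tn / (tn + fp)   # true negative rate (non-inhibitors correctly found)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"ACC": acc, "SEN": sen, "SPE": spe, "MCC": mcc}


# Counts copied from the external-validation table above.
for name, counts in {
    "VolSurf-based": (327, 116, 18, 173),
    "Hologram-based": (359, 118, 16, 141),
}.items():
    metrics = classification_metrics(*counts)
    print(name, {k: round(v, 2) for k, v in metrics.items()})
```

Rounded to two decimals, both rows agree with the tabulated values. The external set is imbalanced (500 inhibitors against 134 non-inhibitors), so MCC is a more informative summary than ACC alone.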
FIGURE 2. Analysis of VolSurf descriptors. (A) Loadings of the nine independent variables in the first two principal components. (B) Density distribution of the descriptors BV12-DRY (p = 1.566 × 10⁻⁶⁹) and LogP (p = 8.054 × 10⁻²⁹) in the two classes. The p values were calculated using Student’s t-test.
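
The caption reports p values from Student's t-test comparing a descriptor between the two classes. As a rough sketch of the underlying statistic, the pooled-variance two-sample t statistic can be computed with the Python standard library alone. The descriptor values below are toy numbers for illustration, not data from the study, and the function name `pooled_t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(a, b):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled sample variance (assumes equal variances in the two classes).
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df


# Toy descriptor values for two classes (illustrative only).
inhibitors = [2.0, 4.0, 6.0]
non_inhibitors = [1.0, 2.0, 3.0]
t_stat, dof = pooled_t_statistic(inhibitors, non_inhibitors)
print(t_stat, dof)
```

Converting the statistic to a p value requires the cumulative distribution function of Student's t distribution with `dof` degrees of freedom, which a statistics library would normally supply; that step is omitted here.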
Performance of the rule-based model on different data sets.
| Data Sets | TP | TN | FP | FN | ACC | SEN | SPE | MCC |
|---|---|---|---|---|---|---|---|---|
| training set | 262 | 264 | 117 | 93 | 0.71 | 0.74 | 0.69 | 0.43 |
| internal validation set | 144 | 144 | 46 | 34 | 0.78 | 0.81 | 0.76 | 0.56 |
| external validation set | 338 | 111 | 23 | 162 | 0.71 | 0.68 | 0.83 | 0.42 |
FIGURE 3. Mapping of atomic contributions in the fragments of ABCG2 inhibitors with diverse scaffolds.
FIGURE 4. Molecular interactions with ABCG2. (A) The occurrence frequencies of the binding residues involved in molecular interactions with the docked inhibitors. (B) The 22 hotspot residues with a high occurrence frequency (>0.5), which are closely related to inhibitor binding. (C) Interactions of compounds 181 and 383 with the residues in the binding pocket. H-bond interactions are represented as green dashed lines.
FIGURE 5. Comparison of the performance of the integrative model with the single classifiers.