Freya Klepsch, Poongavanam Vasanthanathan, Gerhard F Ecker.
Abstract
The ABC transporter P-glycoprotein (P-gp) actively transports a wide range of drugs and toxins out of cells and is therefore linked to multidrug resistance and to the ADME profile of therapeutics. Predictive in silico models for identifying P-gp inhibitors are thus of great interest in drug discovery and development. Because high-quality structural information about P-gp has been lacking, in silico prediction of P-gp inhibitors has so far been dominated by ligand-based approaches. The present study compares the P-gp inhibitor/noninhibitor classification performance of docking into a homology model of P-gp with that of supervised machine learning methods, such as k-nearest neighbor (kNN), support vector machine (SVM), random forest, and binary QSAR, using a large, structurally diverse data set. In addition, the applicability domain of the models was assessed with an algorithm based on Euclidean distance. Random forest and SVM performed best for classifying P-gp inhibitors and noninhibitors, correctly predicting 73% and 75% of the external test set compounds, respectively. Classification based on docking with the ChemScore scoring function correctly predicted 61% of the external test set. This indicates that ligand-based models currently remain the methods of choice for accurately predicting P-gp inhibitors. However, structure-based classification provides information about possible drug/protein interactions, which helps in understanding the molecular basis of ligand-transporter interaction and could therefore also support lead optimization.
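The abstract mentions an applicability domain (AD) assessed with a Euclidean-distance algorithm but gives no details. The sketch below shows one common distance-based formulation, not necessarily the authors' exact algorithm, under these assumptions: descriptor vectors are already scaled, and a query compound lies inside the AD if its distance to the nearest training compound does not exceed the mean plus Z standard deviations of the training compounds' own nearest-neighbor distances. The function names and the default Z = 0.5 are hypothetical.

```python
import math
from statistics import mean, pstdev


def euclidean(a, b):
    """Euclidean distance between two equal-length (pre-scaled) descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_distance(query, training, skip_index=None):
    """Distance from `query` to its closest training compound.

    `skip_index` excludes one training compound, so a training compound is not
    compared with itself when building the threshold.
    """
    return min(euclidean(query, t) for i, t in enumerate(training) if i != skip_index)


def ad_threshold(training, z=0.5):
    """Hypothetical AD cutoff: mean + z * SD of training nearest-neighbor distances."""
    nn = [nearest_distance(t, training, skip_index=i) for i, t in enumerate(training)]
    return mean(nn) + z * pstdev(nn)


def in_domain(query, training, z=0.5):
    """True if the query lies within the distance-based applicability domain."""
    return nearest_distance(query, training) <= ad_threshold(training, z)


# Toy, made-up descriptor vectors (not data from the study)
training = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(in_domain([0.5, 0.5], training))  # True
print(in_domain([5.0, 5.0], training))  # False
```

Predictions for compounds flagged as outside the domain would be treated as less reliable; a larger Z widens the domain at the cost of admitting more dissimilar compounds.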
Year: 2014 PMID: 24050383 PMCID: PMC3904775 DOI: 10.1021/ci400289j
Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956
Summary of Docking Runs Performed and Scoring Functions Used in This Study
| docking run | ligand protonation state | main scoring function | rescoring functions |
|---|---|---|---|
| 1 | neutral | ChemScore | GoldScore, ASP, ChemPLP, XScore |
| 2 | neutral | GoldScore | ChemScore, ASP, ChemPLP, XScore |
| 3 | protonated | ChemScore | GoldScore, ASP, ChemPLP, XScore |
| 4 | protonated | GoldScore | ChemScore, ASP, ChemPLP, XScore |
Figure 1. (A) Score plot from principal component analysis (first two principal components shown); inhibitors are shown as green circles and noninhibitors as red dots. (B) Loading plot of the descriptors used for the PCA.
Figure 2. Hydrophobic binding site formed by nonpolar residues of both transmembrane (TM) domains.
Models Obtained from the Distributions of Common Molecular Descriptors
| property | intersection point | TP | TN | FP | FN | sensitivity | specificity | MCC | accuracy |
|---|---|---|---|---|---|---|---|---|---|
| H-Acc | 2.5 | 902 | 156 | 385 | 165 | 0.85 | 0.29 | 0.16 | 0.66 |
| H-Don | 3.5 | 1005 | 108 | 433 | 62 | 0.94 | 0.20 | 0.22 | 0.69 |
| LogP | 3 | 886 | 334 | 208 | 180 | 0.83 | 0.62 | 0.45 | 0.76 |
| LogS | -4 | 896 | 355 | 186 | 171 | 0.84 | 0.66 | 0.50 | 0.78 |
| MR | 10 | 894 | 373 | 168 | 173 | 0.84 | 0.69 | 0.53 | 0.79 |
| MolWt | 300 | 1013 | 238 | 303 | 54 | 0.95 | 0.44 | 0.48 | 0.78 |
| N+O | 3.5 | 883 | 168 | 373 | 184 | 0.83 | 0.31 | 0.16 | 0.65 |
Note: H-Acc, number of hydrogen bond acceptors; H-Don, number of hydrogen bond donors; LogP, logarithm of the octanol/water partition coefficient; LogS, logarithm of water solubility; MR, molar refractivity; MolWt, molecular weight; N+O, number of nitrogen and oxygen atoms; TP, TN, FP, FN, numbers of true positives, true negatives, false positives, and false negatives (confusion matrix).
H-Acc ≤ 2, noninhibitors; H-Acc ≥ 3, inhibitors.
H-Don ≤ 3, inhibitors; H-Don ≥ 4, noninhibitors.
N+O ≤ 3, noninhibitors; N+O ≥ 4, inhibitors.
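Each row of the table above is a one-descriptor rule (for example, H-Acc ≥ 3 means inhibitor, i.e. a cutoff of 2.5), and the listed statistics follow from its confusion matrix. A minimal sketch of the standard formulas, with inhibitors as the positive class; the function names are hypothetical, and the final lines reproduce the H-Acc row from its counts.

```python
import math


def classify_by_threshold(values, cutoff, inhibitor_above=True):
    """Predict 1 (inhibitor) or 0 (noninhibitor) from one descriptor and a cutoff.

    inhibitor_above=True encodes rules like "H-Acc >= 3 -> inhibitor" (cutoff 2.5);
    inhibitor_above=False encodes rules like "H-Don <= 3 -> inhibitor" (cutoff 3.5).
    """
    if inhibitor_above:
        return [1 if v > cutoff else 0 for v in values]
    return [1 if v < cutoff else 0 for v in values]


def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) with 1 = inhibitor as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_stats(tp, tn, fp, fn):
    """Sensitivity, specificity, Matthews correlation coefficient and accuracy."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity,
            "MCC": mcc, "accuracy": accuracy}


# Counts from the H-Acc row; rounding gives 0.85, 0.29, 0.16, 0.66 as in the table
stats = classification_stats(902, 156, 385, 165)
print({name: round(value, 2) for name, value in stats.items()})
```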
Summary of Machine-Learning Models Based on the BestFirst Feature Selection Method, Evaluated on the Internal Test Set
| descriptors | models | TP | TN | FP | FN | sensitivity | specificity | accuracy | G-mean |
|---|---|---|---|---|---|---|---|---|---|
| MOE | RF | 215 | 112 | 60 | 20 | 0.91 | 0.65 | 0.80 | 0.77 |
| MOE | SVM | 219 | 109 | 63 | 16 | 0.93 | 0.63 | 0.81 | 0.77 |
| MOE | KNN | 215 | 114 | 58 | 20 | 0.91 | 0.66 | 0.81 | 0.78 |
| MOE | BQSAR | 196 | 120 | 52 | 39 | 0.83 | 0.70 | 0.78 | 0.76 |
| MACCS | RF | 207 | 96 | 76 | 28 | 0.88 | 0.56 | 0.74 | 0.70 |
| MACCS | SVM | 199 | 75 | 97 | 36 | 0.85 | 0.44 | 0.67 | 0.61 |
| MACCS | KNN | 215 | 79 | 93 | 20 | 0.91 | 0.46 | 0.72 | 0.65 |
| MACCS | BQSAR | 158 | 117 | 55 | 77 | 0.67 | 0.68 | 0.68 | 0.68 |
| SS-FP | RF | 215 | 73 | 99 | 20 | 0.91 | 0.42 | 0.71 | 0.62 |
| SS-FP | SVM | 220 | 66 | 106 | 15 | 0.94 | 0.38 | 0.70 | 0.60 |
| SS-FP | KNN | 220 | 67 | 105 | 15 | 0.94 | 0.39 | 0.71 | 0.60 |
| SS-FP | BQSAR | 188 | 86 | 86 | 47 | 0.80 | 0.50 | 0.67 | 0.63 |
| combined | RF | 215 | 118 | 54 | 20 | 0.91 | 0.69 | 0.82 | 0.79 |
| combined | SVM | 219 | 106 | 66 | 16 | 0.93 | 0.62 | 0.80 | 0.76 |
| combined | KNN | 207 | 124 | 48 | 28 | 0.88 | 0.72 | 0.81 | 0.80 |
| combined | BQSAR | 193 | 118 | 54 | 42 | 0.82 | 0.69 | 0.76 | 0.75 |
Note: RF, random forest; SVM, support vector machine; KNN, k-nearest neighbor; BQSAR, binary QSAR.
MOE: descriptors selected by BestFirst from the 2D MOE descriptors.
MACCS: descriptors selected by BestFirst from the MACCS fingerprints.
SS-FP: substructure fingerprints.
combined: descriptors selected by BestFirst from all calculated descriptors.
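The G-mean column is the geometric mean of sensitivity and specificity, which, unlike accuracy, penalizes a model that simply favors the majority class (here, inhibitors: 235 of the 407 internal test compounds, from TP + FN in any row). A minimal check against the MOE/RF row, plus a trivial "everything is an inhibitor" predictor for contrast; the helper names are hypothetical.

```python
import math


def g_mean(tp, tn, fp, fn):
    """Geometric mean of sensitivity and specificity."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


def accuracy(tp, tn, fp, fn):
    """Fraction of correctly classified compounds."""
    return (tp + tn) / (tp + tn + fp + fn)


# MOE / RF row of the internal test set table
print(round(g_mean(215, 112, 60, 20), 2))  # 0.77

# Calling all 407 compounds inhibitors: 235 TP, 172 FP
print(round(accuracy(235, 0, 172, 0), 2), g_mean(235, 0, 172, 0))  # 0.58 0.0
```

The trivial predictor still reaches 58% accuracy but a G-mean of zero, which is why G-mean is reported alongside accuracy for this imbalanced data set.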
Matthews Correlation Coefficient of the Models for the Internal Test Set Predictions (10-Fold Cross-Validation Values in Parentheses)
BestFirst descriptor models
| classification method | MOE | MACCS | SS-FP | combined |
|---|---|---|---|---|
| RF | 0.60 (0.64) | 0.47 (0.55) | 0.40 (0.43) | 0.63 (0.66) |
| SVM | 0.61 (0.55) | 0.31 (0.38) | 0.40 (0.38) | 0.59 (0.59) |
| KNN | 0.61 (0.61) | 0.43 (0.46) | 0.40 (0.41) | 0.61 (0.59) |
| BQSAR | 0.54 (0.63) | 0.35 (0.41) | 0.32 (0.41) | 0.51 (0.57) |
Note: RF, random forest; SVM, support vector machine; KNN, k-nearest neighbor; BQSAR, binary QSAR.
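The values in parentheses come from 10-fold cross-validation. The table does not say how the folds were drawn; the sketch below assumes stratified folds, so that each fold keeps roughly the overall inhibitor/noninhibitor ratio, a common choice for imbalanced classification data. The helper names and the fixed seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping class ratios similar per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's indices round-robin, so each fold gets about len(indices)/k
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) once per fold."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Made-up labels: 60 inhibitors (1) and 40 noninhibitors (0)
labels = [1] * 60 + [0] * 40
for train, test in cross_validation_splits(labels):
    print(len(train), len(test), sum(labels[i] for i in test))  # 90 10 6 for every fold
```

In practice, a model is fitted on each `train` split, predictions on the corresponding `test` split are pooled over all ten folds, and the MCC is computed once from the pooled predictions.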
Figure 3. Schematic representation of the occurrence of MACCS fingerprint keys in a phenylpyrazolone-type P-gp inhibitor.
Descriptors Selected by the BestFirst Algorithm for the Combined Descriptor Set
| descriptor set | selected descriptors |
|---|---|
| MACCS | 17 (CTC), 50 (C=C(C)), 54 (QHAAQH), 125 (aromatic ring > 1) |
| substructure fingerprint | 84 (carboxylic acid), 90 (carbothioic S ester) |
| MOE 2D | a_hyd, a_nC, b_single, logP(o/w), b_rotR, density, logS, vdw_area, vsa_hyd |
External Test Set Predictions
| descriptors | models | sensitivity | specificity | accuracy | G-mean | MCC |
|---|---|---|---|---|---|---|
| MOE | RF | 0.98 | 0.52 | 0.70 | 0.71 | 0.52 |
| MOE | SVM | 0.99 | 0.51 | 0.69 | 0.71 | 0.52 |
| MOE | kNN | 0.87 | 0.56 | 0.68 | 0.70 | 0.43 |
| MOE | BQSAR | 0.86 | 0.65 | 0.73 | 0.75 | 0.50 |
| MACCS | RF | 0.63 | 0.71 | 0.68 | 0.67 | 0.34 |
| MACCS | SVM | 0.27 | 0.93 | 0.68 | 0.50 | 0.28 |
| MACCS | kNN | 0.79 | 0.67 | 0.71 | 0.72 | 0.44 |
| MACCS | BQSAR | 0.77 | 0.16 | 0.39 | 0.35 | -0.09 |
| SS-FP | RF | 0.79 | 0.42 | 0.56 | 0.57 | 0.21 |
| SS-FP | SVM | 0.94 | 0.24 | 0.51 | 0.48 | 0.23 |
| SS-FP | kNN | 0.91 | 0.27 | 0.51 | 0.50 | 0.22 |
| SS-FP | BQSAR | 0.49 | 0.82 | 0.69 | 0.63 | 0.33 |
| combined | RF | 0.99 | 0.57 | 0.73 | 0.75 | 0.57 |
| combined | SVM | 0.97 | 0.62 | 0.75 | 0.77 | 0.59 |
| combined | kNN | 0.64 | 0.72 | 0.69 | 0.68 | 0.36 |
| combined | BQSAR | 0.58 | 0.74 | 0.68 | 0.66 | 0.32 |
Simplified Classification Models Using Rule-of-Five Descriptors
| models | TP | TN | FP | FN | sensitivity | specificity | MCC | accuracy |
|---|---|---|---|---|---|---|---|---|
| RF | 127 | 117 | 97 | 5 | 0.96 | 0.55 | 0.52 | 0.71 |
| SVM | 131 | 90 | 124 | 1 | 0.99 | 0.42 | 0.46 | 0.64 |
| KNN | 117 | 102 | 112 | 15 | 0.89 | 0.48 | 0.37 | 0.63 |
| DT | 126 | 132 | 82 | 6 | 0.95 | 0.62 | 0.57 | 0.75 |
Statistics describe the classification performance on the external test set. Note: RF, random forest; SVM, support vector machine; KNN, k-nearest neighbor; DT, decision tree; MCC, Matthews correlation coefficient.
Figure 4. Examples of misclassified compounds in the test set.
Figure 5. Probability scores of P-gp inhibitors and noninhibitors in the test set.
Classification Statistics with Different Probability Cutoffs
| data set | probability cutoff | sensitivity | specificity | accuracy | G-mean | MCC |
|---|---|---|---|---|---|---|
| training and internal test set | all | 0.91 | 0.69 | 0.82 | 0.79 | 0.63 |
| training and internal test set | <0.4/>0.6 | 0.94 | 0.68 | 0.84 | 0.80 | 0.66 |
| training and internal test set | <0.3/>0.7 | 0.95 | 0.78 | 0.89 | 0.86 | 0.75 |
| training and internal test set | <0.2/>0.8 | 0.97 | 0.73 | 0.89 | 0.84 | 0.74 |
| training and internal test set | <0.1/>0.9 | 0.99 | 0.78 | 0.94 | 0.88 | 0.83 |
| external test set | all | 0.99 | 0.56 | 0.73 | 0.75 | 0.56 |
| external test set | <0.4/>0.6 | 1.00 | 0.56 | 0.74 | 0.75 | 0.59 |
| external test set | <0.3/>0.7 | 1.00 | 0.60 | 0.77 | 0.77 | 0.63 |
| external test set | <0.2/>0.8 | 1.00 | 0.50 | 0.75 | 0.71 | 0.58 |
| external test set | <0.1/>0.9 | 1.00 | 0.60 | 0.82 | 0.78 | 0.68 |
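The cutoff rows can be read as a reject option: a compound counts as a predicted inhibitor only if its probability exceeds the upper cutoff and as a predicted noninhibitor only if it falls below the lower one; compounds in between are left unclassified, and the statistics are computed on the rest. That reading is an interpretation of the table layout, and the helpers and toy probabilities below are hypothetical.

```python
def apply_cutoffs(probabilities, labels, low=0.5, high=0.5):
    """Return (y_true, y_pred) for compounds outside the [low, high] uncertainty band.

    p > high  -> predicted inhibitor (1)
    p < low   -> predicted noninhibitor (0)
    otherwise -> unclassified and excluded from the statistics.
    With low = high = 0.5 every compound is kept unless p is exactly 0.5.
    """
    y_true, y_pred = [], []
    for p, label in zip(probabilities, labels):
        if p > high:
            y_true.append(label)
            y_pred.append(1)
        elif p < low:
            y_true.append(label)
            y_pred.append(0)
    return y_true, y_pred


def sensitivity_specificity(y_true, y_pred):
    """Assumes both classes are still present after filtering."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Made-up model outputs (probability of being an inhibitor) and true labels
probs = [0.95, 0.85, 0.65, 0.55, 0.35, 0.15, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0]

print(sensitivity_specificity(*apply_cutoffs(probs, labels)))  # all compounds
print(sensitivity_specificity(*apply_cutoffs(probs, labels, 0.3, 0.7)))  # (1.0, 1.0)
```

The trade-off in the table follows directly: tighter cutoffs keep fewer compounds, but those that remain are predicted more reliably.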
Figure 6. Distribution of P-gp inhibitors and noninhibitors based on ChemScore. Sensitivity, specificity, and MCC were calculated from the correct and incorrect classification rates at the intersection point of the two curves. (A) Distribution based on ChemScore alone; (B) distribution based on a combined ChemScore-logP score.
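The caption takes the classification threshold from the point where the inhibitor and noninhibitor score distributions cross. One way to locate that point, assuming the curves are Gaussian kernel density estimates with a fixed, hypothetical bandwidth (the smoothing actually used is not stated here), is to evaluate both densities on a grid between the two class means and interpolate at the sign change. Compounds scoring above the threshold would then be predicted as inhibitors. The function names and toy scores are hypothetical.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` with a Gaussian kernel."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def intersection_point(inhibitor_scores, noninhibitor_scores, bandwidth=2.0, steps=1000):
    """Score at which the inhibitor density first overtakes the noninhibitor density.

    Searches between the noninhibitor mean and the inhibitor mean, assuming
    inhibitors score higher on average (as with ChemScore in this study).
    """
    f_inh = gaussian_kde(inhibitor_scores, bandwidth)
    f_non = gaussian_kde(noninhibitor_scores, bandwidth)
    start = sum(noninhibitor_scores) / len(noninhibitor_scores)
    end = sum(inhibitor_scores) / len(inhibitor_scores)
    step = (end - start) / steps
    prev_x, prev_d = start, f_inh(start) - f_non(start)
    if prev_d >= 0:
        raise ValueError("noninhibitor density should dominate at its own mean")
    for i in range(1, steps + 1):
        x = start + i * step
        d = f_inh(x) - f_non(x)
        if d >= 0:
            # Linear interpolation between the last negative and first non-negative point
            return prev_x + step * (-prev_d) / (d - prev_d)
        prev_x, prev_d = x, d
    raise ValueError("no crossing found between the class means")


# Toy scores mirrored around 25, so the curves cross at 25
print(round(intersection_point([26, 28, 30, 32, 34], [16, 18, 20, 22, 24]), 1))  # 25.0
```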
Summary of Models Obtained Using Different Scoring Functions
| docking run | ligand protonation | scoring function | intersection point | sensitivity | specificity | accuracy | G-mean | MCC |
|---|---|---|---|---|---|---|---|---|
| CS docking run | neutral | ChemScore | 28 | 0.76 | 0.73 | 0.75 | 0.75 | 0.48 |
| CS docking run | neutral | GoldScore | 25 | 0.36 | 0.66 | 0.46 | 0.49 | 0.02 |
| CS docking run | neutral | ASP | 25 | 0.76 | 0.62 | 0.71 | 0.68 | 0.36 |
| CS docking run | neutral | ChemPLP | 50 | 0.66 | 0.69 | 0.67 | 0.68 | 0.34 |
| CS docking run | neutral | XScore | 6 | 0.63 | 0.78 | 0.68 | 0.70 | 0.38 |
| CS docking run | charged | ChemScore | 30 | 0.68 | 0.79 | 0.71 | 0.73 | 0.44 |
| CS docking run | charged | GoldScore | 18 | 0.50 | 0.42 | 0.47 | 0.46 | -0.08 |
| CS docking run | charged | ASP | 29 | 0.60 | 0.79 | 0.67 | 0.69 | 0.38 |
| CS docking run | charged | ChemPLP | 50 | 0.70 | 0.68 | 0.70 | 0.69 | 0.37 |
| CS docking run | charged | XScore | 6 | 0.68 | 0.73 | 0.70 | 0.71 | 0.39 |
| GS docking run | neutral | ChemScore | 22 | 0.73 | 0.58 | 0.68 | 0.65 | 0.31 |
| GS docking run | neutral | GoldScore | 45 | 0.61 | 0.75 | 0.66 | 0.67 | 0.34 |
| GS docking run | neutral | ASP | 25 | 0.72 | 0.56 | 0.67 | 0.64 | 0.28 |
| GS docking run | neutral | ChemPLP | 50 | 0.65 | 0.66 | 0.65 | 0.65 | 0.29 |
| GS docking run | neutral | XScore | 6 | 0.65 | 0.73 | 0.68 | 0.69 | 0.36 |
| GS docking run | charged | ChemScore | 25 | 0.59 | 0.75 | 0.64 | 0.66 | 0.32 |
| GS docking run | charged | GoldScore | 45 | 0.71 | 0.63 | 0.68 | 0.67 | 0.33 |
| GS docking run | charged | ASP | 25 | 0.74 | 0.57 | 0.68 | 0.65 | 0.31 |
| GS docking run | charged | ChemPLP | 50 | 0.70 | 0.65 | 0.68 | 0.67 | 0.33 |
| GS docking run | charged | XScore | 6 | 0.68 | 0.71 | 0.69 | 0.69 | 0.37 |
| | | combined score (ChemScore + logP) | -0.5 | 0.81 | 0.69 | 0.77 | 0.75 | 0.49 |
Figure 7. Verapamil docking poses generated by the four docking runs (CS, ChemScore; GS, GoldScore as main scoring function): (A) CS docking run with neutral ligand, (B) CS docking run with positively charged ligand, (C) GS docking run with neutral ligand, and (D) GS docking run with positively charged ligand.
Figure 8. Protein-ligand interaction fingerprint (PLIF) analysis. Important residues are shown with their hydrophobic and hydrogen-bonding interactions for (A) inhibitors and (B) noninhibitors.