
A Machine Learning-Based Prediction Platform for P-Glycoprotein Modulators and Its Validation by Molecular Docking.

Onat Kadioglu1, Thomas Efferth2.   

Abstract

P-glycoprotein (P-gp) is an important determinant of multidrug resistance (MDR) because its overexpression is associated with increased efflux of various established chemotherapy drugs in many clinically resistant and refractory tumors. This leads to insufficient therapeutic targeting of tumor populations, representing a major drawback of cancer chemotherapy. Therefore, P-gp is a target for pharmacological inhibitors to overcome MDR. In the present study, we utilized machine learning strategies to establish a model for P-gp modulators that predicts whether a given compound would behave as a substrate or an inhibitor of P-gp. Random forest feature selection algorithm-based leave-one-out random sampling was used. Testing the model with an external validation set revealed high performance scores. A P-gp modulator list of compounds from the ChEMBL database was used to test the performance, and predictions from both substrate and inhibitor classes were selected for the last step of validation with molecular docking. Predicted substrates revealed docking poses similar to that of doxorubicin, and predicted inhibitors revealed docking poses similar to that of the known P-gp inhibitor elacridar, supporting the validity of the predictions. We conclude that the machine-learning approach introduced in this investigation may serve as a tool for the rapid detection of P-gp substrates and inhibitors in large chemical libraries.

Keywords:  P-glycoprotein; artificial intelligence; drug discovery; machine learning; molecular docking; multidrug resistance

Year:  2019        PMID: 31640190      PMCID: PMC6829872          DOI: 10.3390/cells8101286

Source DB:  PubMed          Journal:  Cells        ISSN: 2073-4409            Impact factor:   6.600


1. Introduction

ATP-binding cassette (ABC) transporters are energy-dependent efflux pumps responsible for the active efflux of drugs, thereby reducing their intracellular concentration. Overexpression of ABC transporters in tumor cells gives rise to multidrug resistance (MDR), which leads to the failure of chemotherapy with fatal consequences for cancer patients [1]. P-glycoprotein, a well-known member of the ABC transporter family, is encoded by the ABCB1/MDR1 gene. It is an important determinant of MDR [2,3,4] and is upregulated in many clinically resistant and refractory tumors [5,6]. Its overexpression in tumor cells is associated with efficient extrusion of a large number of established anticancer drugs and natural cytotoxic products out of cancer cells, representing a major drawback of cancer chemotherapy [7]. Resistance is either inherently present or acquired during chemotherapy [8,9,10]. Hence, P-glycoprotein (P-gp) represents an important target in the search for pharmacological inhibitors to overcome MDR [11]. Targeting P-gp to overcome MDR is important for achieving higher success rates for chemotherapy. The concept is to combine P-gp inhibitors with established chemotherapy drugs to resensitize tumors [12,13,14,15]. Machine learning and artificial intelligence have recently attracted increasing interest in the area of drug discovery [16,17,18] because these methods have enormous potential to speed up preclinical development at minimal cost. For this purpose, we utilized a machine learning strategy to establish a prediction platform that predicts whether a given compound behaves as a substrate or an inhibitor of P-gp. Available natural compound databases serve as an invaluable source to identify novel lead compounds that possess activity against certain diseases or disorders by focusing on particular target biomarker proteins.
As a majority of established anticancer drugs are of natural origin [19], natural products may serve as lead compounds for derivatization to obtain novel chemical entities with improved pharmacological features. Analyses of the interaction between the compounds and the target protein with molecular docking provide clues about the possible binding mode and binding energy, as we reported before [11,20,21]. Selecting P-gp as target protein, the interaction of test compounds can be compared with that of known P-gp inhibitors, such as verapamil, valspodar, tariquidar, or elacridar, in order to assess their binding properties, docking poses, and binding energies. In those cases, where the test compounds yielded by using the P-gp modulator prediction platform possess similar docking poses and comparable binding energies as known inhibitors, it could be concluded that these compounds may be potential P-gp inhibitors. In the present study, we used machine learning strategies to establish such a P-gp modulator prediction platform for compounds by using defined chemical descriptors to predict whether a given compound can behave as a substrate or an inhibitor of P-gp. Selected compounds from inhibitor or substrate classes were subjected to molecular docking for further verification and compared with known P-gp inhibitors and substrates.

2. Material and Methods

2.1. Preparation of Compound List and Calculation of Chemical Descriptors

For the P-gp modulator/non-modulator prediction model, a compound list with modulators and non-modulators from Broccatelli et al. [22] was used. Compounds for learning and validation steps were randomly selected. Thirty-two modulator and thirty-two non-modulator compounds were used for the learning step, while 16 modulator and 16 non-modulator substances were used for the validation step (Table 1). For the P-gp inhibitor/substrate prediction model, a list of P-gp substrates and inhibitors was prepared by referring to the literature [23], yielding a total of 60 compounds (34 inhibitors, 26 substrates). Again, compounds for learning and validation steps were randomly selected. Forty compounds (20 inhibitors, 20 substrates) were used for learning and model establishment. The remaining 20 compounds (14 inhibitors, 6 substrates) were used for the external validation step (Table 2).
Table 1

Compounds selected for learning and external validation for the P-glycoprotein (P-gp) modulator/non-modulator prediction model.

Learning set: compound | Category | Learning set: compound | Category | External validation set: compound | Category
Escitalopram | Modulator | Hydroxyzine | Non-modulator | Terfenadine | Modulator
Simvastatin acid | Modulator | Oxybutynin | Non-modulator | Prazosin | Modulator
Neostigmine | Modulator | Ethosuximide | Non-modulator | Prednisone | Modulator
Zolmitriptan | Modulator | Warfarin | Non-modulator | Chloroquine | Modulator
Atomoxetine | Modulator | Mexilitene | Non-modulator | Lopinavir | Modulator
Methysergide | Modulator | Sulpiride | Non-modulator | Prednisolone | Modulator
Famciclovir | Modulator | Thiopental | Non-modulator | Vincristine | Modulator
Lovastatin acid | Modulator | Lamotrigine | Non-modulator | Sertraline | Modulator
Darifenacin | Modulator | Diphenhydramine | Non-modulator | Loperamide | Modulator
Paliperidone | Modulator | Enoxacin | Non-modulator | Etoposide | Modulator
Trospium | Modulator | Methylphenidate | Non-modulator | Indinavir | Modulator
Aprepitant | Modulator | Itraconazole | Non-modulator | Dipyridamole | Modulator
Apomorphine | Modulator | Nortriptyline | Non-modulator | Mitoxantrone | Modulator
Cetirizine | Modulator | Galantamine | Non-modulator | Cimetidine | Modulator
Cyclosporin A | Modulator | Ramelteon | Non-modulator | Bromocriptine | Modulator
Labetalol | Modulator | Rivastigmine | Non-modulator | Reserpine | Modulator
Amisulpride | Modulator | Ropivacaine | Non-modulator | Oxprenolol | Non-modulator
5-Hydroxymethyl tolterodine | Modulator | Zonisamide | Non-modulator | Alprazolam | Non-modulator
Cabergoline | Modulator | Zolpidem | Non-modulator | Oxcarbazepine | Non-modulator
Ximelagatran | Modulator | Sulfasalazine | Non-modulator | Tolterodine | Non-modulator
Hoechst 33342 | Modulator | Metoclopramide | Non-modulator | Zaleplon | Non-modulator
Rhodamine 123 | Modulator | Nalmefene | Non-modulator | Cyclobenzaprine | Non-modulator
Actinomycin D | Modulator | Oxycodone | Non-modulator | Nimodipine | Non-modulator
Olanzapine | Modulator | Topiramate | Non-modulator | Riluzole | Non-modulator
Ranitidine | Modulator | Hydrocodone | Non-modulator | Tiagabine | Non-modulator
Astemizole | Modulator | Rosuvastatin | Non-modulator | Nalbuphine | Non-modulator
Verapamil | Modulator | Tropisetron | Non-modulator | Duloxetine | Non-modulator
Ziprasidone | Modulator | Varenicline | Non-modulator | Pravastatin acid | Non-modulator
Chlorpromazine | Modulator | Clemastine | Non-modulator | Promazine | Non-modulator
Clozapine | Modulator | Clonazepam | Non-modulator | Bromazepam | Non-modulator
Trimethoprim | Modulator | Ropinirole | Non-modulator | Lorazepam | Non-modulator
Paroxetine | Modulator | Solifenacin | Non-modulator | Mirtazapine | Non-modulator
Table 2

Compounds selected for learning and external validation for the P-gp inhibitor/substrate prediction model.

Learning set: compound | Category | Learning set: compound | Category | External validation set: compound | Category | External validation set: compound | Category
Ginsenoside | Inhibitor | Epirubicin | Substrate | Agosterol | Inhibitor | Colchicin | Substrate
Laniquidar | Inhibitor | Etoposide | Substrate | Amiodarone | Inhibitor | Dexamethazone | Substrate
Loratidine | Inhibitor | Fexofenadine | Substrate | Amorinin | Inhibitor | Digoxin | Substrate
Mibefradil | Inhibitor | Hoechst 33342 | Substrate | Apigenin | Inhibitor | Docetaxel | Substrate
Naringenin | Inhibitor | Idarubicin | Substrate | Atorvastatin | Inhibitor | Doxorubicin | Substrate
Pgp-4008 | Inhibitor | Irinotecan | Substrate | Atovaquone | Inhibitor | Daunorubicin | Substrate
Phloretin | Inhibitor | Kaempferol | Substrate | Biochanin | Inhibitor
Quercetin | Inhibitor | Loperamide | Substrate | Biricodar | Inhibitor
Quinine | Inhibitor | Mitomycin | Substrate | Catechin | Inhibitor
Rotenone | Inhibitor | Mitoxantrone | Substrate | Cefoperazone | Inhibitor
Sakuranetin | Inhibitor | Ondansetron | Substrate | Chrysine | Inhibitor
Sertraline | Inhibitor | Paclitaxel | Substrate | Cyclosporine | Inhibitor
Sinensetin | Inhibitor | Procyanidin B2 | Substrate | Diltiazem | Inhibitor
Stigmasterol | Inhibitor | Rhodamine 123 | Substrate | Elacridar | Inhibitor
Syringaresinol | Inhibitor | Tenoposide | Substrate
Tamoxifen | Inhibitor | Topotecan | Substrate
Tariquidar | Inhibitor | Vinblastine | Substrate
Valspodar | Inhibitor | Vincristine | Substrate
Verapamil | Inhibitor | Vindesine | Substrate
Zosuquidar | Inhibitor | Vinorelbine | Substrate
Data Warrior software is a multipurpose chemistry data visualization and data analysis program that calculates various molecular descriptors and properties for a given set of compounds. It was used to calculate the chemical descriptors as previously reported [24,25]. After calculation of the 32 chemical descriptors, correlation coefficients between descriptors and correlation of the descriptors with the P-gp modulator category (substrate or inhibitor) were determined using SPSS statistics software version 23.0.0.3 (IBM Corp., Armonk, NY, USA). If the correlation coefficient between the P-gp modulator category (substrate or inhibitor) and a certain descriptor was below 0.1, this descriptor was omitted, so that only descriptors correlating with the category above 0.1 were selected for further processing. As a next step, descriptors that were highly correlated with one another (pairwise correlation coefficient above 0.9) were excluded [26]. By this strategy, relevant descriptors can be selected while limiting the redundancy that promotes over-fitting.
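The two-stage descriptor filter described above can be sketched in a few lines. This is a minimal illustration only, not the SPSS workflow used in the study: it assumes Pearson correlation on absolute values, assumes that the 0.9 threshold acts as a redundancy cut-off between descriptors, and uses hypothetical names (`pearson`, `select_descriptors`) and made-up toy values.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def select_descriptors(table, labels, min_relevance=0.1, max_redundancy=0.9):
    """table maps descriptor name -> values; labels are 0/1 class codes."""
    # Stage 1: keep descriptors whose |r| with the class exceeds the relevance cut-off.
    relevance = {name: abs(pearson(vals, labels)) for name, vals in table.items()}
    kept = [name for name in table if relevance[name] > min_relevance]
    # Stage 2: go from most to least relevant and skip any descriptor that is
    # nearly collinear with one that has already been selected.
    selected = []
    for name in sorted(kept, key=relevance.get, reverse=True):
        if all(abs(pearson(table[name], table[s])) <= max_redundancy for s in selected):
            selected.append(name)
    return selected

# Toy data (invented): 0 = substrate, 1 = inhibitor.
labels = [0, 0, 0, 1, 1, 1]
toy = {
    "logP": [1.0, 1.5, 2.0, 3.0, 3.5, 4.0],
    "logP_copy": [2.0, 3.0, 4.0, 6.0, 7.0, 8.0],  # collinear with logP
    "rings": [0, 1, 0, 1, 2, 1],
    "noise": [1, 2, 3, 1, 2, 3],                  # uncorrelated with the class
}
selected = select_descriptors(toy, labels)
print(selected)
```

In this toy run the uninformative descriptor is dropped in the first stage and only one of the two collinear descriptors survives the second.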

2.2. P-Glycoprotein Modulator Prediction Model Establishment

At first, a model that can predict whether a given compound is a P-gp modulator was built by using the compound list from Broccatelli et al. [22]. After applying the descriptor selection criteria by considering the relevancy and over-fitting issues, “logP”, “H-donors”, “polar surface area”, “ligand efficiency dependent lipophilicity”, “molecular complexity”, “stereo centers”, “rotatable bonds”, “ring closures”, “aromatic rings”, “sp-3 atoms”, “amides”, “amines”, “alkyl-amines”, and “basic nitrogens” were considered for the preparation of the P-gp modulator/non-modulator prediction model. Various classification algorithms with the leave-one-out random sampling method were tested, i.e., k-Nearest Neighbors (kNN), Neural Network, Random Forest (RF), and Support Vector Machine (SVM). Receiver operating characteristic (ROC) curves, which plot the true positive rate (= sensitivity) against the false positive rate (= 1 − specificity), are depicted in Figure 1. The RF algorithm performed better than the other classification algorithms in both the learning and validation steps. The overall performance of the established model based on the RF algorithm is summarized in Table 3. The P-gp modulator/non-modulator and P-gp inhibitor/substrate prediction models were established by using the machine learning software Orange (Ljubljana, Slovenia) [27].
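To make the leave-one-out idea concrete, the sketch below holds out each compound once, trains on the rest, and records the prediction for the held-out compound. It is not the Orange workflow used in the study: a simple nearest-neighbour rule stands in for the RF, kNN, Neural Network, and SVM learners, and the function names and descriptor values are hypothetical.

```python
def nearest_neighbor_predict(train_X, train_y, x):
    """Stand-in classifier: label of the closest training point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def leave_one_out(X, y, predict=nearest_neighbor_predict):
    """Return one held-out prediction per sample."""
    predictions = []
    for i in range(len(X)):
        # Train on everything except sample i, then predict sample i.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(predict(train_X, train_y, X[i]))
    return predictions

# Invented two-descriptor toy data with two well-separated classes.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [2.0, 2.1], [2.2, 1.9], [1.9, 2.3]]
y = ["substrate"] * 3 + ["inhibitor"] * 3

loo_predictions = leave_one_out(X, y)
loo_accuracy = sum(p == t for p, t in zip(loo_predictions, y)) / len(y)
print(loo_predictions, loo_accuracy)
```

Any classifier with the same `predict(train_X, train_y, x)` shape could be passed in instead of the nearest-neighbour stand-in.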
Figure 1

Receiver operating characteristic (ROC) curves of k-Nearest Neighbors (kNN), Neural Network, Random Forest (RF), and Support Vector Machine (SVM) classification algorithms based on random leave-one-out sampling for the P-gp modulator/non-modulator prediction model for the learning step.

Table 3

Performance of the P-gp modulator/non-modulator prediction model based on the RF classifier algorithm.

Steps | Sensitivity | Specificity | Overall Predictive Accuracy | Precision
Learning | 0.938 | 0.969 | 0.953 | 0.968
External Validation | 0.938 | 0.938 | 0.938 | 0.938
After applying the descriptor selection criteria by considering the relevancy and over-fitting issues, “logP”, “total surface area”, “shape index”, “molecular flexibility”, “rotatable bonds”, “aromatic rings”, “aromatic atoms”, “aromatic nitrogens”, “basic nitrogens”, “symmetric atoms”, and “acidic oxygens” were considered for P-gp inhibitor/substrate prediction model preparation. Various classification algorithms with the leave-one-out random sampling method were tested, i.e., kNN, Neural Network, RF, and SVM. The ROC curves are depicted in Figure 2. The RF algorithm performed better than the other classification algorithms. The overall performance for the established model is summarized in Table 4.
Figure 2

ROC curves of kNN, Neural Network, RF, and SVM classification algorithms based on random leave-one-out sampling for the P-gp inhibitor/substrate prediction model for the learning step.

Table 4

Performance of the P-gp inhibitor/substrate prediction model based on the RF classifier algorithm.

Steps | Sensitivity | Specificity | Overall Predictive Accuracy | Precision
Learning | 0.750 | 0.700 | 0.725 | 0.714
External Validation | 0.786 | 0.833 | 0.800 | 0.917
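The four scores in Table 4 follow from a 2 × 2 confusion matrix. The sketch below recomputes them under two assumptions that are not stated in the text: inhibitors are treated as the positive class, and the confusion counts are inferred here from the set sizes (20 inhibitors and 20 substrates for learning, 14 inhibitors and 6 substrates for external validation) so that they match the reported scores. The function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and precision from confusion counts."""
    return {
        "sensitivity": tp / (tp + fn),              # true positive rate
        "specificity": tn / (tn + fp),              # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "precision": tp / (tp + fp),                # positive predictive value
    }

# Counts inferred from the reported scores (assumption, not published values).
learning = classification_metrics(tp=15, fn=5, tn=14, fp=6)
validation = classification_metrics(tp=11, fn=3, tn=5, fp=1)

for name, scores in (("Learning", learning), ("External Validation", validation)):
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

Because the external validation set holds only six substrates, a single misclassified substrate shifts the specificity by about 0.17.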
In order to evaluate the model performance further and select potential inhibitors, a P-gp modulator compound list consisting of 643 compounds from ChEMBL was used.

2.3. Molecular Docking

The recently published human P-gp structure was used (nanodisc reconstituted in complex with UIC2 Fab and paclitaxel at the drug-binding pocket, PDB ID: 6QEX, in the absence of a lipid bilayer) [28]. The Fab chains were deleted. The bound ligands marked as “HETATM”, including taxol, were also deleted from the PDB structure file in order to prevent interference with molecular docking. The final receptor structure was prepared as a “.pdbqt” file with Autodock tools 1.5.7. Selected compounds from the inhibitor and substrate classes were subjected to an automated and comprehensive molecular docking campaign by using the high-performance supercomputer MOGON (Johannes Gutenberg University, Mainz). Compound flexibilities were taken into account, and a rigid receptor structure was used. At first, three independent screenings of all 643 compounds from ChEMBL were performed with the Autodock Vina algorithm by focusing on the drug-binding pocket of P-gp, where the majority of the known inhibitors and substrates bind. The grid parameters are listed in Table 5.
Table 5

Grid parameters for molecular docking analyses on human P-gp.

Parameter | x | y | z
Number of Points | 126 | 98 | 116
Grid Center | 168.614 | 166.372 | 162.000
Grid Spacing (Å) | 0.375
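As a rough worked example, the grid parameters in Table 5 can be converted into the physical size of the docking box. The sketch assumes the usual AutoDock convention that the edge length along each axis equals the number of points multiplied by the grid spacing; the variable names are illustrative.

```python
GRID_POINTS = (126, 98, 116)                 # x, y, z (Table 5)
GRID_CENTER = (168.614, 166.372, 162.000)    # x, y, z (Table 5)
GRID_SPACING = 0.375                         # Å (Table 5)

# Edge length of the box along each axis, in Å (assumed: points x spacing).
box_lengths = tuple(n * GRID_SPACING for n in GRID_POINTS)

# Lower and upper corners of the box around the grid center.
box_min = tuple(c - length / 2 for c, length in zip(GRID_CENTER, box_lengths))
box_max = tuple(c + length / 2 for c, length in zip(GRID_CENTER, box_lengths))

print(box_lengths)  # (47.25, 36.75, 43.5)
```

Under this assumption the search space is a box of about 47 × 37 × 44 Å centered on the drug-binding pocket.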
Afterward, the top 20 compounds in terms of binding energy from each of the inhibitor and substrate predictions were selected for molecular docking. Each molecular docking was based on three independent dockings, each consisting of 2,500,000 calculations. This means that each data point represents the mean value of 7,500,000 individual MOGON-based calculations. The Autodock 4 algorithm was used for defined molecular docking calculations on the drug-binding pocket of P-gp as described before [11], and Visual Molecular Dynamics (VMD) software (Theoretical and Computational Biophysics group at the Beckman Institute, University of Illinois at Urbana-Champaign) was used for the visualization of the docking poses. Estimated inhibition constants were calculated by the Autodock algorithm with the equation:

$$K_i = \exp\left(\frac{\Delta G}{R\,T}\right)$$

where K_i is the estimated inhibition constant (M); ΔG (cal/mol) = 1000 × LBE (lowest binding energy, kcal/mol); R is the gas constant, 1.986 cal/(mol·K); and T is room temperature, 298 K.
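The conversion from lowest binding energy to estimated inhibition constant can be illustrated with a short sketch. It assumes the standard thermodynamic relation K_i = exp(ΔG / (R·T)) together with the constants quoted above; the function name is hypothetical, and because the reported constants are means over replicate dockings, small deviations from the published values are expected.

```python
from math import exp

R_CAL = 1.986   # gas constant, cal/(mol*K), as quoted in the text
T_K = 298.0     # room temperature, K, as quoted in the text

def predicted_ki_micromolar(lbe_kcal_per_mol):
    """Estimated inhibition constant (µM) from a lowest binding energy (kcal/mol)."""
    delta_g = 1000.0 * lbe_kcal_per_mol        # kcal/mol -> cal/mol
    ki_molar = exp(delta_g / (R_CAL * T_K))    # assumed relation: Ki = exp(dG / RT)
    return ki_molar * 1e6                      # M -> µM

ki_strong = predicted_ki_micromolar(-8.900)    # LBE reported for 3-methylcholanthrene
ki_weak = predicted_ki_micromolar(-5.380)      # LBE reported for riluzole
print(round(ki_strong, 3), round(ki_weak, 1))
```

The two results land near 0.3 µM and slightly above 110 µM, the same order as the values reported for these compounds. Since the relation is exponential, a difference of about 1.4 kcal/mol in LBE changes the estimated constant roughly tenfold.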

2.4. Boxplot Analysis

The distribution of the values for the descriptors used for the P-gp inhibitor/substrate prediction model and the comparison for the predicted inhibitors and substrates among the ChEMBL P-gp modulator list were subjected to Boxplot analysis using Microsoft Excel 2019 (Microsoft, USA). Statistical significances were evaluated by the t-test (two-tailed, two-sample unequal variance).
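The two-sample unequal-variance t-test named above is commonly known as Welch's t-test. The sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom from raw values. The function name and sample values are invented for illustration, and the final step of converting t to a two-tailed p-value needs the t-distribution and is left out here.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of the two sample means (sample variance / n).
    va = variance(sample_a) / na
    vb = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Invented descriptor values for two small groups of compounds.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(round(t_stat, 3), round(dof, 3))
```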

3. Results

3.1. P-glycoprotein Modulator Predictions

The P-gp modulator/non-modulator prediction model was evaluated with the validation set as described in the corresponding Methods section. The RF algorithm reached 0.938 for all parameters. The ChEMBL P-gp modulator list of 643 compounds was tested, and 641 out of 643 substances were correctly predicted as modulators. The P-gp inhibitor/substrate prediction model was then evaluated with the same ChEMBL P-gp modulator list of 643 compounds. A total of 493 substances were predicted as inhibitors, and 150 compounds were predicted as substrates. Subjecting all compounds to Autodock Vina screening allowed them to be ranked according to their binding energies. The top 20 inhibitor predictions with strong interaction with P-gp are shown in Table 6. These inhibitors were selected for subsequent molecular docking. The top 20 substrate predictions with strong interaction with P-gp are shown in Table 7. These substrates were also selected for subsequent molecular docking. The complete predictions for all 493 inhibitors together with their binding affinities to P-gp are shown in Supplementary Table S1, while all predictions for the 150 substrates and their affinities to P-gp are listed in Supplementary Table S2. The average lowest binding energy (LBE) was −8.155 kcal/mol for the inhibitors and −9.289 kcal/mol for the substrates.
Table 6

Prediction of the top 20 P-gp inhibitors identified by the RF classification algorithm among the 493 compounds of the ChEMBL P-gp modulator list that were predicted as inhibitors. The results were validated by determining the binding affinities using Autodock VINA.

Name | ChEMBL ID | Inhibitor Probability | Class | VINA LBE (kcal/mol)
Karavoate P | CHEMBL1641677 | 0.849 | Synthetic | −12.200 ± 1.212
Tribenzoylbalsaminol F | CHEMBL1928854 | 0.549 | Synthetic | −12.033 ± 0.896
Zosuquidar | CHEMBL444172 | 0.513 | Synthetic | −11.967 ± 0.058
Latilagascenes D | CHEMBL435917 | 0.566 | Synthetic | −11.700 ± 0.001
Dihydrocytochalasin B | CHEMBL2074735 | 0.513 | Synthetic | −11.367 ± 0.231
Jolkinoate I | CHEMBL2315618 | 0.593 | Synthetic | −11.300 ± <0.001
Karavoate K | CHEMBL1641672 | 0.849 | Synthetic | −11.267 ± 0.493
Fanchinin | CHEMBL176045 | 0.586 | Synthetic | −11.233 ± 0.208
Latilagascene I | CHEMBL511018 | 0.586 | Synthetic | −11.167 ± 0.058
Karavoate L | CHEMBL1641673 | 0.766 | Synthetic | −11.133 ± 0.808
3-Methylcholanthrene | CHEMBL40583 | 0.788 | Synthetic | −11.100 ± <0.001
Lonafarnib | CHEMBL298734 | 0.567 | Synthetic | −11.000 ± <0.001
Karavoate N | CHEMBL1641675 | 0.666 | Synthetic | −10.933 ± 0.058
Tariquidar | CHEMBL348475 | 0.619 | Synthetic | −10.933 ± 0.404
Pimozide | CHEMBL1423 | 0.517 | Synthetic | −10.900 ± 0.100
Karavoate I | CHEMBL1641670 | 0.766 | Synthetic | −10.767 ± 0.058
Cryptotanshinone | CHEMBL187460 | 0.663 | Natural | −10.700 ± <0.001
Jolkinol B | CHEMBL489265 | 0.577 | Synthetic | −10.700 ± <0.001
Astemizole | CHEMBL296419 | 0.617 | Synthetic | −10.667 ± 0.115
Metergoline | CHEMBL19215 | 0.732 | Natural | −10.600 ± <0.001
Table 7

Prediction of the top 20 P-gp substrates identified by the RF classification algorithm among the 150 compounds of the ChEMBL P-gp modulator list that were predicted as substrates. The results were validated by determining the binding affinities using Autodock VINA.

Name | ChEMBL ID | Substrate Probability | Class | VINA LBE (kcal/mol)
Vindoline | CHEMBL526546 | 0.771 | Synthetic | −15.000 ± <0.001
Cepharanthin | CHEMBL2074948 | 0.614 | Natural | −12.600 ± <0.001
Latilagascene G | CHEMBL448193 | 0.514 | Synthetic | −12.300 ± <0.001
Mk3207 | CHEMBL1910936 | 0.733 | Synthetic | −12.167 ± 0.058
Ergocristine | CHEMBL446315 | 0.767 | Natural | −12.067 ± 0.058
Cytochalasin E | CHEMBL494856 | 0.6 | Natural | −11.800 ± <0.001
Jolkinoate L | CHEMBL2315621 | 0.567 | Synthetic | −11.533 ± 0.058
Irinotecan | CHEMBL481 | 0.967 | Natural | −11.400 ± 0.819
Latilagascenes E | CHEMBL373511 | 0.614 | Synthetic | −11.367 ± 0.116
Dofequidar | CHEMBL65067 | 0.583 | Synthetic | −11.300 ± 0.001
Acetyldigoxin | CHEMBL2074725 | 0.708 | Natural | −11.233 ± 0.808
Dihydroergocristine | CHEMBL601773 | 0.767 | Natural | −11.133 ± 0.666
Telcagepant | CHEMBL236593 | 0.517 | Synthetic | −11.067 ± 0.058
Ergotamine | CHEMBL442 | 0.8 | Natural | −10.933 ± 0.058
Candesartan Cilexetil | CHEMBL1014 | 0.567 | Synthetic | −10.900 ± 0.200
Digoxin | CHEMBL1751 | 0.708 | Natural | −10.833 ± 1.097
Bromocriptine | CHEMBL493 | 0.767 | Natural | −10.800 ± 0.100
Itrazole | CHEMBL64391 | 0.564 | Synthetic | −10.700 ± 0.436
Digitoxin | CHEMBL254219 | 0.725 | Natural | −10.667 ± 0.462
Paclitaxel | CHEMBL428647 | 0.808 | Natural | −10.633 ± 0.462
Among the 493 predicted inhibitors, 117 were natural products (= 23.7%), while all other compounds were of synthetic origin (Supplementary Table S1). The proportion of natural products was higher among the predicted P-gp substrates (69/150 = 46%) (Supplementary Table S2). This trend was even more apparent if we focused on the top 20 inhibitor or substrate compounds only (Table 6 and Table 7). Here, natural products accounted for 2/20 (= 10%) of the predicted inhibitors but 11/20 (= 55%) of the predicted substrates, indicating that P-glycoprotein may expel natural xenobiotics from cells with higher probability.

3.2. Molecular Docking

After running the prediction model on the P-gp modulator list from ChEMBL and the Autodock VINA screening, the top 20 compounds from the inhibitor class and the top 20 compounds from the substrate class were selected for molecular docking analyses on human P-gp. The lowest binding energies (LBE) and predicted inhibition constants are listed in Table 8 for the inhibitors and Table 9 for the substrates.
Table 8

Lowest binding energies (LBE) and predicted inhibition constants obtained by molecular docking of the top 20 P-gp inhibitors.

P-gp Inhibitor | AutoDock LBE (kcal/mol) | Predicted Inhibition Constant (µM)
3-Methylcholanthrene | −8.900 ± 0.001 | 0.300 ± <0.001
Astemizole | −9.693 ± 0.047 | 0.079 ± 0.007
Cryptotanshinone | −9.010 ± 0.001 | 0.251 ± <0.001
Dihydrocytochalasin B | −10.460 ± 0.020 | 0.0212 ± 0.001
Fanchinin | −9.937 ± 0.067 | 0.0522 ± 0.006
Jolkinoate I | −10.440 ± 0.200 | 0.0232 ± 0.008
Jolkinol B | −10.250 ± 0.044 | 0.0307 ± 0.002
Karavoate I | −12.310 ± 0.235 | 0.001 ± <0.001
Karavoate K | −12.330 ± 0.213 | 0.001 ± <0.001
Karavoate L | −12.807 ± 0.200 | 0.0004 ± <0.001
Karavoate N | −12.160 ± 0.560 | 0.002 ± 0.001
Karavoate P | −13.537 ± 0.605 | 0.0002 ± <0.001
Latilagascene I | −11.147 ± 0.561 | 0.009 ± 0.009
Latilagascenes D | −12.220 ± 0.370 | 0.001 ± 0.001
Lonafarnib | −11.433 ± 0.087 | 0.004 ± 0.001
Metergoline | −9.737 ± 0.029 | 0.073 ± 0.004
Pimozide | −10.220 ± 0.324 | 0.031 ± 0.025
Tariquidar | −11.273 ± 0.274 | 0.006 ± 0.002
Tribenzoylbalsaminol F | −12.403 ± 0.118 | 0.001 ± <0.001
Zosuquidar | −11.257 ± 0.361 | 0.006 ± 0.004
Elacridar (positive control) | −11.093 ± 0.361 | 0.008 ± 0.004
Table 9

Lowest binding energies (LBE) and predicted inhibition constants obtained by molecular docking of the top 20 P-gp substrates.

P-gp Substrate | AutoDock LBE (kcal/mol) | Predicted Inhibition Constant (µM)
Acetyldigoxin | −11.767 ± 0.480 | 0.003 ± 0.002
Bromocriptine | −12.360 ± 1.02 | 0.002 ± 0.001
Candesartan Cilexetil | −11.153 ± 0.370 | 0.007 ± 0.004
Cepharanthin | −10.753 ± 0.006 | 0.013 ± <0.001
Cytochalasin E | −10.957 ± 0.006 | 0.093 ± 0.001
Digitoxin | −11.390 ± 0.517 | 0.006 ± 0.004
Digoxin | −11.500 ± 0.151 | 0.004 ± 0.001
Dihydroergocristine | −11.670 ± 0.056 | 0.003 ± <0.001
Dofequidar | −10.970 ± 0.351 | 0.010 ± 0.006
Ergocristine | −12.407 ± 0.012 | 0.001 ± <0.001
Ergotamine | −11.227 ± 0.150 | 0.006 ± 0.001
Irinotecan | −11.380 ± 0.020 | 0.005 ± <0.001
Itrazole | −10.843 ± 0.186 | 0.012 ± 0.003
Jolkinoate L | −10.643 ± 0.681 | 0.022 ± 0.016
Latilagascenes E | −11.770 ± 0.185 | 0.002 ± 0.001
Latilagascene G | −12.500 ± 0.316 | 0.001 ± <0.001
Mk-3207 | −11.650 ± 0.020 | 0.003 ± <0.001
Paclitaxel | −9.607 ± 0.359 | 0.103 ± 0.065
Telcagepant | −9.333 ± 0.021 | 0.144 ± 0.005
Vindoline | −7.337 ± 0.211 | 4.363 ± 1.389
Doxorubicin (positive control) | −11.070 ± 0.135 | 0.008 ± 0.002
The negative control compounds (oxprenolol, promazine, riluzole) revealed weaker interaction with P-gp (Table 10) and slightly different docking pose as well (Figure 3).
Table 10

Lowest binding energies (LBE) and predicted inhibition constants obtained by molecular docking of the non-modulators.

Non-modulator | AutoDock LBE (kcal/mol) | Predicted Inhibition Constant (µM)
Oxprenolol | −5.743 ± 0.398 | 70.273 ± 40.057
Promazine | −6.933 ± 0.021 | 8.273 ± 0.262
Riluzole | −5.380 ± 0.010 | 114.080 ± 2.326
Figure 3

Molecular docking results for selected non-modulators (pink).

As can be seen in Figure 4, the predicted inhibitors adopted docking poses similar to that of elacridar at the drug-binding pocket of P-gp. Similar results were observed for the substrates: the predicted substrates revealed docking poses similar to that of doxorubicin. Hence, these results support the precision and reliability of the model.
Figure 4

Molecular docking results for selected inhibitors (red) and substrates (green) yielded from the P-gp inhibitor/substrate prediction model. Elacridar (blue) and doxorubicin (yellow) were selected as control drugs.

Predicted inhibitors and substrates interact with P-gp considerably more strongly than the negative control compounds. This is clear from both the binding energies and the predicted inhibition constants. Binding energies of the non-modulators range from −5.380 (riluzole) to −6.933 (promazine) kcal/mol, and their predicted inhibition constants range from 8.273 to 114.080 µM, whereas binding energies range from −7.337 (vindoline) to −12.500 (latilagascene G) kcal/mol for the predicted substrates and from −8.900 (3-methylcholanthrene) to −13.537 (karavoate P) kcal/mol for the predicted inhibitors. Predicted inhibition constants range from 0.001 to 4.363 µM for the predicted substrates and from 0.0002 to 0.300 µM for the predicted inhibitors. The docking pose of the negative control compounds differs from that of the inhibitors and substrates. Overall, it can be speculated that the predicted inhibitors interact with P-gp more strongly than the predicted substrates, and that the non-modulators interact only weakly with P-gp and bind to a different site. The distribution of the values for the descriptors used to build the model and the comparison of the predicted inhibitors and substrates in terms of those descriptor values were assessed with Boxplot analysis. As can be seen from Figure 5, the inhibitors revealed significantly different values for all descriptors except logP and acidic oxygens. The average values of descriptors for inhibitors and substrates are listed in Table 11.
Figure 5

Boxplot analysis of the descriptors used for the model and comparison of the predicted inhibitors and substrates.

Table 11

Average values of descriptors for inhibitors and substrates.

Descriptor | Inhibitor | Substrate
cLogP | 3.498 ± 2.464 | 3.134 ± 2.962
Total surface area | 311.199 ± 188.142 | 461.870 ± 286.187
Shape index | 0.529 ± 0.125 | 0.429 ± 0.081
Molecular flexibility | 0.395 ± 0.141 | 0.332 ± 0.114
Rotatable bonds | 6.799 ± 12.158 | 9.818 ± 11.778
Aromatic rings | 1.450 ± 1.168 | 1.918 ± 1.330
Aromatic atoms | 8.237 ± 6.470 | 10.759 ± 7.098
Symmetric atoms | 2.649 ± 3.637 | 3.582 ± 4.477
Aromatic nitrogens | 0.301 ± 0.772 | 0.559 ± 1.141
Basic nitrogens | 0.441 ± 0.625 | 0.659 ± 0.762
Acidic oxygens | 0.117 ± 0.361 | 0.171 ± 0.462

4. Discussion

In the present study, we utilized machine learning methods based on leave-one-out random sampling in order to develop a P-gp modulator prediction platform by using chemical descriptors. The main focus was to predict whether a given compound can behave as a substrate or an inhibitor of P-gp. The RF classification algorithm (AUC: 0.774) outperformed the other tested algorithms (kNN: 0.676, Neural Network: 0.745, SVM: 0.720). Performance scores for the external validation set were even higher than those for the learning set, with better sensitivity (0.786 vs. 0.750), specificity (0.833 vs. 0.700), overall prediction accuracy (0.800 vs. 0.725), and precision (0.917 vs. 0.714). Further testing with the P-gp modulator list from ChEMBL yielded promising results with accurate predictions. Four compounds from the inhibitor prediction list and four compounds from the substrate prediction list were selected for molecular docking analyses. Validations with molecular docking on a recently released human P-gp structure were performed in terms of binding energy and docking poses by including a known inhibitor (elacridar) and a known substrate (doxorubicin) as controls. Curcumin, miconazole, tacrolimus, and venlafaxine revealed a docking pose at the drug-binding pocket of P-gp similar to that of elacridar, with comparable binding energies. MK-3207, rifampin, vindoline, and voacamine revealed docking poses and binding energies comparable to those of doxorubicin. Overall, the precision and reliability of the model were further confirmed. Machine learning and artificial intelligence have attracted increasing interest in the drug discovery area [18,29,30], and these methods possess great potential for drug discovery, as they save time and costs during the preclinical steps. The RF algorithm depends on multiple decision trees that are built based on the training data, and a majority voting scheme is used to make classification or regression predictions [31].
The application of RF to drug discovery has recently been reported, and it outperformed other algorithms such as SVM and NN in terms of feature selection [32]. Various studies in the literature have utilized machine-learning strategies focusing on P-gp. One study described a P-gp substrate prediction model based on the RF algorithm to estimate the transport potential of central nervous system drugs; its accuracy lies between 0.713 and 0.846, whereas its precision is between 0.633 and 0.777 [33]. Our P-gp modulator prediction model has an accuracy of 0.953 for the learning set and 0.938 for the validation set, and our P-gp inhibitor prediction model has an accuracy of 0.725 for the learning set and 0.800 for the validation set. In terms of precision, our models also perform better. The modulator prediction model has a precision of 0.968 for the learning set and 0.938 for the validation set. The inhibitor prediction model has a precision of 0.714 for the learning set and 0.917 for the validation set. Similarly, a P-gp substrate efflux ratio prediction model based on the SVM algorithm has been recently reported [34]. The affinities of flavonoids to P-gp have been evaluated with an SVM-based model, and a high correlation with the experimental data has been achieved [35]. Another study involving P-gp inhibitor prediction was performed for chalcone derivatives, and selected inhibitor candidates were analyzed in terms of their docking pose on a homology model of human P-gp [36]. The blood–brain barrier permeability mechanism of central nervous system drugs has been predicted with an SVM-based model [37]. Binding pattern prediction based on a pharmacophore ensemble/SVM method for potential P-gp inhibitors was also recently reported [38].
Another SVM-based model coupled with molecular docking aimed to predict whether a given compound may act as a P-gp substrate; its accuracy lies between 0.750 and 0.800, specificity between 0.750 and 0.810, and sensitivity between 0.740 and 0.790 [39]. Our modulator prediction model outperforms that model in all those parameters. Our inhibitor prediction model outperforms it on the validation set. Similarly, in 2004, an SVM-based P-gp substrate prediction model was reported; sensitivity was 0.812, specificity was 0.792, and accuracy was 0.794 [40]. Our modulator prediction model outperforms that model in all those parameters. Our inhibitor prediction model outperforms it on the validation set for the specificity and accuracy parameters. In general, these previously published studies have certain disadvantages, e.g., low prediction performance scores, a focus on P-gp substrate prediction only, or molecular docking with homology models rather than crystal structures. Our model is superior to the previously published studies for several reasons. It is based on the RF algorithm with leave-one-out random sampling; it covers both natural and synthetic compounds; it has high sensitivity, specificity, predictive accuracy, and precision in predicting first P-gp modulators/non-modulators and then P-gp substrates/inhibitors from various chemical descriptors; and it was coupled with molecular docking using the recently released crystal structure of human P-gp. The fact that the predictions on the P-gp modulator list of compounds from ChEMBL were validated with accurate molecular docking results was also advantageous for our model. Furthermore, after the initial compound screening, selected inhibitors revealed docking poses similar to that of elacridar (as positive control for an inhibitor), and selected substrates revealed docking poses similar to that of doxorubicin (as positive control for a substrate).
Non-modulators interacted significantly more weakly with P-gp and bound to a slightly different position. Overall, these observations further support the reliability of the prediction model. The inhibitors and substrates selected after virtual screening are supported by the literature: astemizole [41], cryptotanshinone [42], dihydrocytochalasin B [43], jolkinol B [44], latilagascene D [45], lonafarnib [46], tariquidar [12], zosuquidar [47], acetyldigitoxin [48], bromocriptine [49], candesartan cilexetil [50], cepharanthin [51], cytochalasin E [52], digitoxin [53], digoxin [54], dihydroergocristine [55], dofequidar [56], ergocristine [55], irinotecan [57], latilagascene E [45], MK-3207 [58], paclitaxel [59], and vindoline [60]. Many cancer types involve P-gp overexpression, which is associated with increased efflux of established anticancer drugs and natural cytotoxic products out of cancer cells. This phenomenon represents a major drawback of cancer chemotherapy, limiting the killing of tumor populations due to MDR [61,62]. P-gp overexpression is indeed one of the main reasons for MDR and thus for inadequate chemotherapy success rates. Targeting P-gp is critical to achieving high success rates for chemotherapy; therefore, the identification of novel P-gp inhibitors is essential. Our prediction platform for P-gp modulators makes it possible to predict whether a given compound can behave as a substrate or an inhibitor of P-gp. The selection of potential inhibitors can be further validated by molecular docking and by comparing the binding energy and docking pose with those of known P-gp inhibitors. In the future, our model may help to identify potential novel P-gp inhibitors and to develop effective chemotherapy strategies that combine targeted chemotherapy drugs with the identified P-gp inhibitors.
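The comparisons above rest on four scores: accuracy, precision, sensitivity, and specificity. As a reminder of how these are conventionally derived from a binary confusion matrix, the following is a minimal sketch in Python. The function names, the class name, and the example labels are illustrative assumptions and are not taken from the study; the labels are made up and do not reproduce any reported value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def count_outcomes(y_true, y_pred, positive=1):
    """Tally a binary confusion matrix from paired true and predicted labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    # An undefined ratio (zero denominator) is reported as NaN rather than 0.
    return numerator / denominator if denominator else float("nan")


def performance_scores(c):
    """Return accuracy, precision, sensitivity, and specificity as a dict."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "sensitivity": _ratio(c.tp, c.tp + c.fn),
        "specificity": _ratio(c.tn, c.tn + c.fp),
    }


# Made-up labels for illustration only (1 = positive class, 0 = negative class).
example_true = [1, 1, 1, 1, 1, 0, 0, 0]
example_pred = [1, 1, 1, 1, 0, 0, 0, 1]
example_scores = performance_scores(count_outcomes(example_true, example_pred))
```

Reporting all four scores side by side, as done in the comparisons above, matters because a model can reach a high accuracy on an imbalanced set while still having a low specificity or precision.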

5. Conclusion

In the present study, we established P-gp modulator/non-modulator and inhibitor/substrate prediction models based on the RF algorithm and leave-one-out random sampling, and validated them by molecular docking. The identification of novel P-gp inhibitors is critical to overcome MDR and to achieve better chemotherapy strategies. The model can predict whether a given compound behaves as a substrate or an inhibitor of P-gp and will thus be helpful for identifying potential P-gp inhibitors.
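The two-step logic summarized above (first modulator versus non-modulator, then substrate versus inhibitor) can be sketched as a simple cascade. The sketch below is only an illustration under stated assumptions: the two classifiers are passed in as plain callables, and the toy stand-ins use hypothetical descriptor names (`mol_weight`, `logp`) with arbitrary thresholds. They are placeholders for trained models and do not represent the RF models or the chemical descriptors used in this study.

```python
from typing import Callable, Mapping

Descriptors = Mapping[str, float]
BinaryModel = Callable[[Descriptors], bool]


def classify_compound(
    descriptors: Descriptors,
    is_modulator: BinaryModel,
    is_inhibitor: BinaryModel,
) -> str:
    """Run the two-stage cascade and return one of three class labels."""
    # Stage 1: compounds predicted as non-modulators are not passed on.
    if not is_modulator(descriptors):
        return "non-modulator"
    # Stage 2: only predicted modulators are split into inhibitor/substrate.
    return "inhibitor" if is_inhibitor(descriptors) else "substrate"


# Toy stand-ins for trained classifiers. The descriptor names and the
# thresholds are arbitrary assumptions chosen only to make the sketch run.
def toy_is_modulator(d: Descriptors) -> bool:
    return d["mol_weight"] > 400.0


def toy_is_inhibitor(d: Descriptors) -> bool:
    return d["logp"] > 4.0


# Hypothetical descriptor values for three made-up compounds.
compounds = {
    "compound_a": {"mol_weight": 250.0, "logp": 5.0},
    "compound_b": {"mol_weight": 550.0, "logp": 5.0},
    "compound_c": {"mol_weight": 550.0, "logp": 1.5},
}
labels = {
    name: classify_compound(desc, toy_is_modulator, toy_is_inhibitor)
    for name, desc in compounds.items()
}
```

Keeping the two stages as separate callables means that either model can be retrained or replaced on its own, and that the second stage is never asked to classify a compound the first stage rejected.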