Literature DB >> 36120033

Machine Learning Enabled Structure-Based Drug Repurposing Approach to Identify Potential CYP1B1 Inhibitors.

Baddipadige Raju¹, Gera Narendra¹, Himanshu Verma¹, Manoj Kumar¹, Bharti Sapra¹, Gurleen Kaur², Subheet Kumar Jain², Om Silakari¹.

Abstract

Drug-metabolizing enzyme (DME)-mediated pharmacokinetic resistance of some clinically approved anticancer agents is one of the main reasons for cancer treatment failure. In particular, some commonly used anticancer medicines, including docetaxel, tamoxifen, imatinib, cisplatin, and paclitaxel, are inactivated by CYP1B1. Currently, no approved drugs are available to treat this CYP1B1-mediated inactivation, making the pharmaceutical industries strive to discover new anticancer agents. Because of the extreme complexity and high risk in drug discovery and development, it is worthwhile to come up with a drug repurposing strategy that may solve the resistance problem of existing chemotherapeutics. Therefore, in the current study, a drug repurposing strategy was implemented to find the possible CYP1B1 inhibitors using machine learning (ML) and structure-based virtual screening (SB-VS) approaches. Initially, three different ML models were developed such as support vector machines (SVMs), random forest (RF), and artificial neural network (ANN); subsequently, the best-selected ML model was employed for virtual screening of the selleckchem database to identify potential CYP1B1 inhibitors. The inhibition potency of the obtained hits was judged by analyzing the crucial active site amino acid interactions against CYP1B1. After a thorough assessment of docking scores, binding affinities, as well as binding modes, four compounds were selected and further subjected to in vitro analysis. From the in vitro analysis, it was observed that chlorprothixene, nadifloxacin, and ticagrelor showed promising inhibitory activity toward CYP1B1 in the IC50 range of 0.07-3.00 μM. These new chemical scaffolds can be explored as adjuvant therapies to address CYP1B1-mediated drug-resistance problems.

Entities: Chemical

Year: 2022 PMID： 36120033 PMCID： PMC9476183 DOI： 10.1021/acsomega.2c02983

Source DB: PubMed Journal: ACS Omega ISSN： 2470-1343

Introduction

Cytochrome P450 enzymes (CYPs) are heme-coupled monooxygenases involved in the metabolic biotransformation of a wide range of endogenous and exogenous compounds, including certain therapeutically approved anticancer medicines.[1] The human CYP superfamily is reported to be classified into 57 isoforms; among these, cytochrome P450 subfamily 1 (CYP1) enzymes including CYP1A1, CYP1A2, and CYP1B1 are of key interest to most researchers for their dominating involvement in the hydroxylation of procarcinogens such as PAHs and amines to cytotoxic and mutagenic compounds.[2] It is reported that CYP1B1 showed a 40% structural similarity with that of CYP1A1 and CYP1A2 isoforms.[3] The specific overexpression of CYP1B1 was identified in extrahepatic tissues, such as ovarian, lung, brain, lymph, breast, and colon cancer tissues, whereas no detectable amount of CYP1B1 was found in nearby normal tissues.[4−6] Unlike CYP1B1, overexpression of CYP1A1 was identified in the liver and some extrahepatic tissues including the pancreas, uterus, small intestine, and thymus, while CYP1A2 expression was found to be constitutive in human hepatic tissues.[7,8] The expression level of these three CYPs is highly influenced by interactions of certain planar aromatic compounds such as 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and 7,12-dimethylbenz[a]anthracene (DMBA) with the aryl hydrocarbon receptor (AhR) and a ligand-activated transcription factor.[9−11] The CYP1B1 isoform is the most appealing therapeutic target among the three CYP1 members discussed above for the following reasons. (1) CYP1B1 overexpression has only been found in cancer tissues (ovarian, lung, brain, lymph, breast, and colon). (2) Besides its role in the bioactivation of a variety of procarcinogens, CYP1B1 may play a role in the carcinogenic action of estradiol by converting 17-estradiol (E2) into 4-hydroxyestradiol, which is a mutagenic compound capable of forming covalent interactions with DNA. This type of conversion is not observed in the cases of CYP1A1 and CYP1A2 as these two isoforms mediate estrogen 17-hydroxylation primarily to 2-hydroxyestradiol, which is not a carcinogen. (3) It confers drug resistance by inactivating structurally diverse anticancer drugs such as docetaxel, imatinib, tamoxifen, cisplatin, and paclitaxel.[12−14] As a result, a treatment strategy that promotes the selective inhibition of CYP1B1 as adjuvant therapy could be advantageous in avoiding CYP1B1-mediated drug resistance and cancer progression. Currently, derivatives of various scaffolds such as α-naphthoflavones (ANF), flavonoids, trans-stilbenes, anthraquinones, coumarins, alkaloids, and chalcones have all been explored in the development of CYP1B1 inhibitors.[3,5,10,15−18] However, none of these chemicals reached the market due to their limited solubility and selectivity problems.[5,19,20] To identify potential CYP1B1 inhibitors with improved solubility and selectivity properties in the current study, the drug repurposing approach was utilized because it represents a cost-effective and appealing alternative strategy as existing licensed pharmacological compounds have undergone extensive safety and efficacy examinations.[21−25] Drug repurposing accounts for roughly 30% of the new FDA-approved pharmaceuticals in the United States in recent years.[26] In the area of CYP1B1 research, numerous ligand-based computational techniques have already been forged;[5,27] however, none of these models considered a broad chemical space (various chemical scaffolds) in the design process, and they are only applicable to certain chemical series inhibitors. Therefore, in the present study, the most diverse and potent CYP1B1 inhibitors were utilized in the development of machine learning (ML) models. The best-selected ML models were implemented in virtual screening of the FDA database (https://www.selleckchem.com/screening-l) in combination with structure-based virtual screening and in vitro analysis. Furthermore, molecular dynamics (MD) simulations were used to reveal the specific binding mechanisms between the hit molecule and its target. Thus, a novel CYP1B1 inhibitor with a well-established safety or toxicity profile can be anticipated from this study.

Material and Methods

Data Collection and Preparation

A total of 210 selective CYP1B1 inhibitors with IC50 ranges from 0.002 to 400 μM were collected from ChEMBL, Pubchem, and literature sources.[3−5,7,10,15,18,20,28−35] Some of the important considerations in data collection are as follows: (1) molecules tested with a similar assay technique, namely, the ethoxyresorufin-O-deethylase (EROD) assay, were considered for evaluating CYP1B1 and CYP1A1 enzyme inhibitory activity; and (2) the data of human recombinant cytochrome P450 1A enzymes (CYP1B1 and CYP1A1) were collected. Finally, one hundred thirty-six (136) inhibitors were marked as selective CYP1B1 inhibitors based on selectivity index values, i.e., IC50 of CYP1A1/IC50 of CYP1B1 > 3-folds. The remaining 74 molecules were considered nonselective inhibitors, i.e., IC50 of CYP1A1/IC50 of CYP1B1<3-folds. The details of the data with biological activity values against CYP1B1 and CYP1A1 are given in the Supporting Information as Table S1.

Decoys Generation

The balance of data sets is critical for developing a solid predictive model. The performance of the ML model is highly influenced by data imbalance between classes.[36] To balance the data set, 62 decoy molecules were generated by computing their molecular properties based on the two diverse selective CYP1B1 structures using the DUDE online database (http://dude.docking.org/).[37] In the current study, the AM1-BCC method was used to compute partial charges of all molecules. The geometry of the whole data set was optimized using the AMBER force field by employing an implicit OBC2 solvent model with a minimum energy tolerance of 0.25 kcal/mol. Afterward, the data set was divided into training (192) and test sets (80) in the ratio of 7:3 using a random split algorithm in the CARET library[38] (Table ).

Table 1

Data Used in the Development of ML Models

target	ML algorithms	train set	test set	train set		test set
CYP1B1a	SVM RF ANN	192	80	selective inhibitors	nonselective inhibitors	selective inhibitors	nonselective inhibitors
CYP1B1a	SVM RF ANN	192	80	96	96	40	40

Cytochrome P450 1B1; drug-inactivating enzymes responsible for resistance to the Taxane class of anticancer drugs, and Imatinib.

Calculation of Molecular Fingerprints and Data Preprocessing

In predictive modeling, small molecules were represented in the form of a binary string, known as a molecular fingerprint. In particular, molecular fingerprints reveal details on the molecule’s constitutive substructures. These molecular fingerprints are frequently supplied as a fixed-length string of integers 1 and 0, where 1 denotes the presence of a substructure in the given molecule and 0 denotes absence. The optimized structures of CYP1B1 inhibitors are imported into PaDEL software to compute different molecular fingerprints, namely, Estate (Est 79 bits), Molecular Access System (MACCS 166 bits), and PubChem fingerprint (PubChem 881 bits). A total of 1126 molecular fingerprints were calculated. To remove extraneous features from the data, some statistical filters were applied. The following methods were used to undertake data preprocessing: (1) deletion of molecular descriptors with zero variance values and (2) elimination of noise and bias among derived descriptors by the Pearson correlation analysis.

Feature Selection

To create a good prediction model, it is essential to choose an optimal number of descriptors. Feature selection methods have been extensively used to identify significant feature subsets for improving the prediction performance of the model. When developing a model, if molecular structures are represented by an excess number of descriptors than fixed samples, they may cause a problem of overfitting (unreliable predictions). Therefore, employing feature selection methods in predictive model development procedures is highly important. The details of the feature selection methods used in the current study are mentioned below.

Recursive Feature Elimination (RFE)

The recursive feature elimination (RFE) method has recently gained a lot of popularity due to its high efficiency in discovering informative features or attributes in the classification of inhibitors associated with drug activity analysis in a variety of biochemical fields.[39−42] The RFE algorithm assigns a ranking for features and recursively drops low-rank features and keeps important ones.[42]

Boruta

The Boruta feature selection method is a wrapper strategy based on random forests that eliminates features that are less useful than random probes.[43] By considering the interactions between multiple features, the Boruta algorithm removes irrelevant inputs repeatedly. It creates shadow features by reversing the original feature’s values. If a shadow feature’s importance is greater than or equal to that of the original feature, it is classified as unimportant.

ML Classification Algorithms

In the ML approach, the mined properties of known compounds are used to predict the activity of unknown compounds.[44] In this work, different classification models such as SVM, RF, and ANN were developed in combination with RFE and Boruta feature selection methods to classify selective CYP1B1 inhibitors and noninhibitors. For developing ML models, Rstudio version 1.4.1717 was used in the current study.

Support Vector Machine (SVM)

Vapnik created the support vector machine (SVM) model as a general data modeling methodology for pattern recognition.[45] The main idea behind the development of the SVM algorithm is to map the data into high dimensional space by constructing a hyperplane, which differentiates two separate classes of vectors with a maximum margin. The model can describe the position and direction of the hyperplane using a subset of training vectors. The initial training data in a space X is projected to a higher dimensional feature space in SVM using a Mercer kernel operator K. The following is a list of classifiers to be consideredHere, the maximal hyperplane is represented by αi. When f(x) is less than zero, x is classified as 1; when f(x) is more than zero, x is classified as −1. The radial basis kernel (RBF) function implemented with SVM was used to classify training and test set molecules.[46−48] The SVM algorithm was performed with the help of the kernel parameters Cost (C) and Sigma (σ).[38] To optimize the performance of the developed SVM model, 10-fold cross-validation was performed.

Random Forest

A random forest is a collection of different decision trees. At each node, feature bagging finds the best data split for a random feature subset.[49] The final forecast is a consensus overall decision tree. The correlation between any two trees, as well as the strength of each tree, has an impact on the forest error rate. A tree with a low error rate is a strong classifier. The final prediction is based on a consensus among all decision trees h(x;k), k = 1, where x is the observed input (covariate) vector of length p with the associated random vector X and θk is a set of independent and identically distributed random vectors.There are a handful of parameters that can be adjusted (e.g., number of trees, number of features, and mtry), and these factors have a big impact on the algorithm’s performance. In this study, random forest models were built using the 10-fold cross-validation method.

Artificial Neural Network (ANN)

An artificial neural network (ANN) is made up of a large number of connected artificial neurons.[50] Receiving neurons can process the input signal and then send it to downstream neurons connected to it. The weight assigned to the neurons can affect their learning rate. In the current study, the feed forward neural network was implemented with a single-layer logistic activation function to develop an ANN model. Furthermore, hyperparameter tuning parameters such as decay and cost were also utilized in combination with 10-fold cross-validation to optimize the developed models. The NNET package in Rstudio was used to create neural network models. The neuron’s output signal O is given by the following equationwhere w represents the weight vector and the variable (transfer) function is represented by f (net).

Model Validation

In this study, the robustness of the generated models was evaluated using 10-fold cross-validation of the training set. Data sets are frequently divided into two categories in binary classification models: selective and nonselective inhibitors. The following eqs –5 are used to calculate sensitivity (SE), specificity (SP), precision (P), accuracy (ACC), and Matthews’s correlation coefficient (MCC) to measure the predictive quality of the developed ML models. The MCC of ML models generated by different algorithms can be compared to assess their quality. When a model’s MCC score is near 1, it is deemed better, whereas when it reaches 0, it is considered a complete failure. Furthermore, the receiver operating characteristic (ROC) curve was plotted and the area under the ROC curve (AUC) was calculated. The values of AUC range from 0 to 1. An AUC value of 1 represents a flawless model, whereas 0.5 implies a random classifier, and 0.8 indicates a good model.[51,52]

Molecular Docking

Before performing the docking experiment conformation sampling method of the FLARE program was validated by the redocking experiment.[53] Initially, the cocrystal ligand ANF was extracted from its crystal structure of CYP1B1 (PDB ID: 3PM0); then, the extracted structure was redocked into protein 3PM0. Afterward, both cocrystallized and redocked ligands were aligned and the root-mean-square deviation (RMSD) of heavy atoms was calculated. When the RMSD is less than 2.0, the sampling is regarded as successful; otherwise, it is termed a failure. For the docking study, the PDB structures selected for CYP1B1 and CYP1A1 with PDB IDs 3PM0 [8] and 4I8V,[54] respectively, were based on the criteria reported in the literature. The proteins were prepared, which includes insertion of missing atoms in incomplete residues, modeling of missing loops, removal of cocrystallized external water, and protonation of the titratable residues using an XED force field at pH 7.4. A grid was generated by defining the binding site around the centroid of the cocrystallized ligand, i.e., α-naphthoflavone (ANF).

Molecular Dynamics

To examine the conformational changes and stability of the top docked ligands in association with CYP1B1, molecular dynamic (MD) simulations were performed for a period of 100 ns utilizing the Flare module of cresset working on the AMBER force field.[53] Molecules were solvated using the water-based TIP3P model to mimic the cellular environment. Following this, energy minimization of the system was performed to ensure that there were no steric conflicts and that the system had a good starting structure. The maximum number of iterations and minimum energy tolerance would be kept as 30 and 0.025, respectively. Flare’s “Analysis” protocol is used to calculate the ligand-protein RMSD value and estimate the ligand’s binding orientation. In this study, the remaining parameters have been set to default.

Enzymatic Assay

The recombinant human CYP1B1 and CYP1A1 enzymes were purchased from Sigma-Aldrich. 7-Ethoxyresorufin (7-ER), nicotinamide adenine dinucleotide phosphate (NADPH), glibenclamide, nadifloxacin, chlorprothixene, umbelliferone, propyl gallate, and α naphthoflavone (ANF) were also purchased from Sigma-Aldrich. Ticagrelor was received as a kind gift sample from Cipla Ltd, Mumbai. The inhibitory activity of these repurposed drugs against the CYP1B1 and CYP1A1 enzymes was determined using the EROD assay.[55,56] In general, a 200 μL mixture containing various concentrations (final concentrations were 1, 10, 100, 1000, and 10000 nM) of tested compounds (except for positive and negative control wells), an enzyme source (5 μL of 10 picomole CYP1B1 and 5 μL of 2.5 picomole CYP1A1), 45 μL of 0.1 M (pH = 7.4) phosphate buffer, and 50 μL of 7-ER were incubated in a black 96-well flat-bottomed microplate at 37 °C for various durations of time (incubation times for CYP1B1 and CYP1A1 were 35 and 15 min, respectively). Finally, 50 μL of 1.67 mM NADPH was added to the aforementioned mixture to start the reaction process. Later, using a Biotek Gen 5 microplate reader with 530 and 590 nm excitation and emission filters, the fluorescence intensity was measured for 10 min with a 2 min interval. Three replicates were used for each concentration, and each test was performed in triplicate. The IC50 value of each compound was calculated with the AAT Bioquest calculator https://www.aatbio.com/tools.

Results and Discussion

Data Collection and Data Diversity

A total of 210 selective CYP1B1 inhibitors were collected from the different small-molecule databases and the literature. DUDE, a database for useful decoys, was used to generate 62 decoys. In chemometrics, the diversity of chemical compounds significantly influences the outcome and prediction of an ML model. This can be conquered using a wide range of chemical diversity data sets. In the current study, the structural diversity of training set compounds was measured using AlogP, molecular weight (MW), hydrogen-bond donors (H.bond.donors), hydrogen-bond acceptors (H.bond.acceptors), topological polar surface area (TPSA), and molecular refractivity index (MR) descriptors. Figure A shows that the vast range of molecular weights ranging from approximately 144–600 and the AlogP ranging from −2.9 to 4 indicates that this training set has sufficient chemical space. Interestingly, it was also observed that nonselective inhibitors represented by the blue color showed a higher MW than selective inhibitors represented by the green color. Along with MW and AlogP, other drug-likeness descriptors including H.bond.donors and H.bond.acceptors were estimated, as seen in Figure B. This led to the discovery that H.bond.acceptors ranged from 0 to 7, while H.bond.donors ranged from 0 to 5. Likewise, TPSA and MR index values ranged from 0 to 150, as illustrated in Figure C. Finally, from this entire analysis, it was concluded that the training set has sufficient chemical space and could be used for the development of ML models.

Figure 1

Chemical diversity analysis of training set molecules. (A) Molecular weight on the X-axis vs LogP on the Y-axis. (B) Hydrogen-bond acceptors on the X-axis vs hydrogen-bond donors on the Y-axis. (C) Molar refractivity on the X-axis vs topological polar surface area on the Y-axis.

Data Preprocessing and Feature Selection

First of all, 1126 molecular fingerprints were calculated for CYP1B1 data using the freely available PaDEL tool. Later constant and highly correlated (>0.6) features were removed using the variable threshold and Pearson correlation method. Ultimately, after data preprocessing, 160 fingerprints were identified. These identified features were further subjected to feature selection methods using RFE and Boruta algorithms to identify important and relevant features. Finally, subsets of 20 and 19 features were selected based on the RFE and Boruta variable importance scores. The identified features using both algorithms are given in Table . Finally, different ML models, including SVM, RF, and ANN, were developed using each set of identified features.

Table 2

List of Molecular Fingerprints Used in Model Development

feature selection method	no. of features	selected features
RFE	20	MACCSFP90 + MACCSFP126 + PubchemFP189 + PubchemFP798 + EStateFP11 + PubchemFP686 + MACCSFP150 + PubchemFP374 + MACCSFP49 + PubchemFP715 + PubchemFP537 + PubchemFP667 + EStateFP29 + PubchemFP196 + MACCSFP144 + PubchemFP12 + PubchemFP695 + PubchemFP696 + MACCSFP69 + PubchemFP206
Boruta	19	EStateFP11 + EStateFP29 + MACCSFP49 + MACCSFP90 + MACCSFP126 + MACCSFP134 + MACCSFP137 + MACCSFP144 + MACCSFP150 + PubchemFP12 + PubchemFP189 + PubchemFP196 + PubchemFP374 + PubchemFP537 + PubchemFP686 + PubchemFP695 + PubchemFP696 + PubchemFP715 + PubchemFP798

Performance of Developed ML Models

A 10-fold cross-validation approach was used to test the performance of the developed predictive models (with optimized parameters), and the results of each model are represented in Tables and 4. Additionally, the results are also illustrated in the form of bar charts in Figures and 3. Initially, RFE-selected features were used to develop ML models. All of the developed models were optimized by various tuning parameters by implementing the grid search method. For example, the SVM model was created using the RBF and optimized using the caret package’s Cost (0.1–2) and Sigma (1–3) parameters, whereas the RF model was optimized using the mtry parameter, i.e., 7. On the other hand, the ANN model was optimized using size (1–3) and decay (0.01–0.1) parameters. The best model was chosen from the constructed models using statistical parameters such as SE, SP, P, ACC, and MCC. From Figure , it can be observed that all of the developed models showed predictive accuracy greater than 0.7 on both training and test sets. However, overall, the SVM model showed better results than RF and ANN in terms of ACC and MCC values on a 10-fold cross-validated test set. The results of each model developed using Boruta features are given in Figure . All models showed more or less similar results. However, in comparison, the SVM model showed better 10-fold cross-validated ACC and MCC values for the test set. Hence, the SVM models were selected as the best predictive models for both feature selection methods. Among these, the best model was selected by comparing their MCC values. According to Chicco et al., the MCC produces a more informative and truthful result in evaluating binary classification models than accuracy.[57] Hence, by considering the MCC values, the SVM model developed using RFE features (RFE+SVM) was selected as the best model. The best-selected SVM model was tuned using sigma (σ) 0.1 and Cost (C) 2.6 parameters. Furthermore, the performance of the best-selected model was assessed by an external validation method. An external data set of 37 selective inhibitors and 168 decoys was used in this method. The best-selected model provided 0.95% accuracy, 0.90% sensitivity, and 0.96% specificity for the external validation set. In addition, the MCC values showed that the best-selected model (RFE+VM) is capable of classifying both negative and positive classes, as evidenced by MCC values of 0.73 and 0.82 for 10-fold CV and external validation, respectively. All of these findings show that the created SVM model is more effective at distinguishing highly selective CYP1B1 inhibitors from nonselective CYP1B1 inhibitors.

Table 3

Results of the Models Trained Using RFE-Selected Features

s. no.	method		accuracy	sensitivity	specificity	F1	MCC
1	SVM C = 2.6 Sigma = 0.1	train	0.92	0.95	0.9	0.92	0.84
1	SVM C = 2.6 Sigma = 0.1	test	0.82	0.85	0.77	0.77	0.73
2	RF Mtry = 4	train	0.87	0.94	0.80	0.88	0.75
2	RF Mtry = 4	test	0.73	0.87	0.68	0.78	0.68
3	ANN size = 1 decay = 0.01	train	0.87	0.95	0.78	0.88	0.75
3	ANN size = 1 decay = 0.01	test	0.7	0.77	0.67	0.72	0.64

Table 4

Results of the Models Trained Using Boruta Selected Features

s. no.	method		accuracy	sensitivity	specificity	F1	MCC
1	SVM C = 2.1 Sigma = 0.1	train	0.91	0.94	0.86	0.91	0.80
1	SVM C = 2.1 Sigma = 0.1	test	0.77	0.87	0.74	0.79	0.70
2	RF Mtry = 7	train	0.9	0.95	0.83	0.90	0.80
2	RF Mtry = 7	test	0.72	0.87	0.71	0.76	0.67
3	ANN size = 1 decay = 0.01	train	0.90	0.96	0.83	0.90	0.80
3	ANN size = 1 decay = 0.01	test	0.7	0.77	0.65	0.71	0.60

Figure 2

Performance of the 10-fold cross-validated models developed by the RFE features.

Figure 3

Performance of the 10-fold cross-validated models developed by the Boruta features.

Performance of the 10-fold cross-validated models developed by the RFE features. Performance of the 10-fold cross-validated models developed by the Boruta features.

Y-Scrambling and ROC Curve Analysis

Random scrambling experiments were performed to prove that the best performance of the given SVM model was not a result of chance correlation. The classification labels (i.e., “S”, “N”) of the training set (182) molecules were randomly shuffled. Afterward, attempts were made to build an RFE+SVM model using scrambled data. A total of 30 randomized models were created; all of these scrambled models were found to have lower accuracy and MCC scores than the unscrambled model (Original model), as shown in Figure A. The area under the ROC curve is an important measure for evaluating the model’s performance. As a result, it was also plotted to demonstrate a binary classifier system’s diagnostic ability to distinguish between two classes. From Figure B, it can be observed that the AUC value of the best-selected SVM models on the test set is 0.79, indicating that the generated models have good prediction ability and reliability.

Figure 4

(A) Y-randomization results of the best-selected RFE+SVM model. Blue dots indicate the scrambled models and the red dot indicates the original model. (B) ROC curve of the best-selected RFE+SVM model on the test set.

ML-Based Virtual Screening

The best-selected ML model (i.e., RFE+SVM) was implemented in virtual screening (VS) of the Selleckchem database, which consists of 2644 FDA-approved chemical compounds. As a result of screening, 660 hits were identified to be selective CYP1B1 inhibitors, which are further subjected to docking studies to identify the novel chemical scaffolds as CYP1B1 inhibitors. The schematic representation of the VS protocol is shown in Figure .

Figure 5

Schematic representation of the virtual screening pipeline.

Schematic representation of the virtual screening pipeline. Following the development of ML models, validation, and virtual screening, the best predicted FDA molecules were further evaluated for target binding affinity against CYP1B1 (PDB ID: 3PM0) and CYP1A1 (PDB ID: 4I8V) enzymes through molecular docking simulation studies. Redocking simulation was performed using the crystallographic structure of 3PM0. From Figure A, it was observed that both ligands (i.e., extracted crystal ligand and redocked ligand) were entirely overlapped in the same orientation with a lower RMSD (i.e., 0.5Å), which indicates that the conformational sampling method of the Flare program appears to be reliable. The interaction patterns of ANF (cocrystal ligand) have also been analyzed. As illustrated in Figure B, three pi–pi stacking interactions were observed between the naphthalene moiety of ANF and active site residues of PHE 231 and PHE 268, which are crucial for binding. From the literature, it is evident that PHE 231, PHE 268, and PHE 134 are crucial active residues involved in the development of pi–pi stacking interactions in CYP1B1 inhibition.[58] However, PHE 224 and 123 play an important role in selective inhibition of CYP1A1 by forming crucial pi–pi stacking interactions.[58] By considering this crucial interaction and its binding affinity scores, the selectivity of the identified hits was judged against 1B1 over 1A1.

Figure 6

(A) Conformations of the docking ligand and extracted cocrystal ligand (ANF) are superimposed (the green ligand represents the cocrystal ligand pose, and the pink ligand represents the redocking ligand pose). (B) Binding mode of ANF inside the CYP1B1 active site. Initially, all of the 660 hits were docked in the active site of CYP1B1, and their interaction profiles and docking scores were analyzed. Based on the docking scores, 35 molecules were selected, which showed high binding affinity scores (i.e., <−8.5 kcal/mol) than that of the cocrystal ligand (i.e., −8.5 kcal/mol). The identified top 35 molecules with their docking scores are given in Table . All of the 35 molecules showed crucial pi–pi stacking interactions with PHE 231, PHE 268, and PHE 134 of CYP1B1. Furthermore, to understand the selectivity of these identified hits over 1A1, all of the molecules were docked against CYP1A1, and their docking scores and interactions profiles were analyzed. After comprehensive consideration of docking scores, binding modes, and crucial interactions, four compounds, namely, chlorprothixene, glibenclamide, nadifloxacin, and ticagrelor, were considered selective CYP1B1 inhibitors because none of these molecules showed selective interaction against 1A1. Indeed, similar to the cocrystal ligand (ANF), these compounds display a high binding affinity with CYP1B1 by establishing the intramolecular interactions with its key amino acids such as PHE 231, PHE 134, ASP 326, LEU 224, HIS 227, GLN 332, ASN 228, LUE 264, and ASN 265. The docking interactions of the top four molecules are illustrated in Figure . To understand the stability of these interactions, the top four molecules were further subjected to molecular dynamics studies.

Table 5

Top 35 Virtual Hits’ Information toward CYP1B1

molecule	LF dGa	LF VS scoreb	LF Rank scorec	selleckchem ID
ticagrelor	–11.9	–14.4	–11.1	878
bosutinib	–11.8	–13.7	–4.4	5
brinzolamide	–11.8	–11.5	–8.9	682
ceforanide	–11.8	–12.9	–6.2	2012
xipamide	–11.8	–12.1	–12.8	1961
dasatinib	–11.7	–14.3	–12.8	6
ezetimibe	–11.7	–13.9	–12.2	249
ozenoxacin	–11.7	–11.9	–10.7	2762
acotiamide-d₆	–11.6	–12.5	–9.8	1981
trichlormethiazide	–11.6	–11.2	–10.7	750
vilazodone	–11.6	–13.5	–4.7	2269
darolutamide	–11.6	–12.3	–13.4	1689
donepezil	–11.6	–12.6	–5.5	2011
haloperidol	–11.6	–13.0	–12.1	108
dolutegravir	–11.5	–12.6	–12.7	583
cabergoline	–11.5	–12.8	–2.9	2267
isavuconazole	–11.5	–13.0	–10.7	1310
ceftezole	–11.5	–10.9	–8.9	2058
pantethine	–11.4	–13.3	–5.2	1989
lapatinib	–11.4	–15.2	–8.8	383
pyritinol	–11.4	–12.2	–11.7	1927
(−)-epigallocatechin gallate	–11.4	–13.1	0.53	445
d-α-tocopheryl acetate	–11.4	–12.8	–1.5	1919
pexidartinib	–11.4	–13.3	–13.2	1818
lenvatinib	–11.4	–12.8	–7.8	422
ozanimod	–11.4	–13.6	–14.1	1834
halofuginone	–11.3	–12.1	–12.4	1840
nebivolol	–11.3	–13.1	–12.1	2506
gefarnate	–11.3	–12.1	–1.5	2367
doripenem	–11.3	–12.3	–6.7	2339
glyceryl trioctanoate octanoic acid	–11.3	–12.4	0.434	2643
peimine	–11.3	–12.2	–2.6	2326
glibenclamide	–10.2	–12.4	–8.4	280
chlorprothixene	–9.9	–10.8	–12.3	303
nadifloxacin	–9.08	–10.3	–10.2	429

Score (kcal) used to rank the poses of docked compounds.

Score (kcal) used to rank the binding affinity of docked compounds.

Score (kcal) used to rank the docked compounds in the virtual screening experiment.

Figure 7

Binding interaction patterns of the top four compounds in the active site of CYP1B1: (A) chlorprothixene, (B) glibenclamide, (C) nadifloxacin, and (D) ticagrelor. Pink-colored-lined dots represent π–π stacking interactions, dark-green-colored-lined dots represent strong hydrogen bonds, and light-green color represents weak hydrogen-bond interactions. Score (kcal) used to rank the poses of docked compounds. Score (kcal) used to rank the binding affinity of docked compounds. Score (kcal) used to rank the docked compounds in the virtual screening experiment. The stability and flexibility of each docking complex of CYP1B1 with chlorprothixene, glibenclamide, nadifloxacin, and ticagrelor were assessed as a function of time using a 100 ns MD simulation under an explicit hydration environment. MD simulation findings were assessed in terms of pi–pi stacking, hydrogen-bond dynamic stability, and structural flexibility of CYP1B1. The stability of the trajectory during MD simulation was determined by analyzing the RMSD of all complexes based on protein-heavy atoms. The RMSD plot for each ligand-protein complex is shown in Figure A–D, and it can be seen that the RMSD value of the ligand-protein fluctuates during an initial period of 35 ns and gradually stabilizes, and no significant fluctuation was observed in the four complexes. Following this, all complexes were equilibrated with average RMSD values of less than 1Å. According to Gupta et al., stable protein backbone fluctuations and moderate RMSD values are good indications of molecular dynamics simulation stability.[59] The low RMSD values indicate that over the 100 ns MD simulation, the protein conformation did not change or changed only slightly, confirming the logic and stability of the initial docking complexes. RMSD for the complexes CYP1B1 with chlorprothixene showed that the structure of the systems equilibrated well after the 15 ns simulation (Figure A). A slight fluctuation was observed during the 80–85 ns simulation period with an RMSD of 0.5 Å; however, it showed that RMSD profiles were always less than 1 Å for chlorprothixene throughout 7–100 simulations. From Figure A, it was observed that two initial docking interactions such as PHE 231 and ASP 326 were retained in the case of the compound chlorprothixene, indicating the stability of the initial docking complex. The CYP1B1 complex with glibenclamide (Figure B) showed that the structure of the system is equilibrated well after 35 ns simulation with RMSD of less than 1 Å. As shown in Figure B, throughout the 100 ns simulation time, glibenclamide maintained two initial pi–pi stacking contacts with the residue PHE 231. In the case of nadifloxacin, slight fluctuations were observed with an RMSD of 0.5Å during 20–40 ns simulations (Figure C); as a result, one new pi–pi stacking was observed with residue PHE 134, and two initial docking contacts were lost, i.e., LEU 224 and HIS 227 (Figure C). From Figure D, it was observed that after the 20 ns simulation, the CYP1B1-ticagrelor system reached equilibrium. The CYP1B1-ticagrelor complex slightly fluctuated (i.e., <1 Å) during the 65–75 ns simulation; as a result, one new pi–pi stacking interaction was observed with the residue PHE 231 shown in Figure D. Thus, the stability of CYP1B1 in chlorprothixene, glibenclamide, nadifloxacin, and ticagrelor-bound states was found to be suitable for postanalysis. The RMSD and crucial pi–pi stacking interactions of all complexes showed that they stably bounded to CYP1B1 throughout 100 ns simulations. The MD simulations revealed that key pi–pi stacking and hydrogen-bond interactions were conserved. The best four docked compounds have essential contacts with the enzyme’s active region, indicating that these compounds could potentially inhibit CYP1B1. Enzymatic test assays were performed utilizing human recombinant cytochrome P450 enzymes, namely, CYP1B1 and CYP1A1 enzymes, to see how well these four compounds can potentially inhibit CYP1B1.

Figure 8

RMSD plot of top four molecules in complex with CYP1B1: (A) chlorprothixene, (B) glibenclamide, (C) nadifloxacin, and (D) ticagrelor.

Figure 9

Interactions of top four ligands in complex with CYP1B1 (3PM0) after 100 ns molecular dynamics simulations: (A) chlorprothixene, (B) glibenclamide, (C) nadifloxacin, and (D) ticagrelor.

RMSD plot of top four molecules in complex with CYP1B1: (A) chlorprothixene, (B) glibenclamide, (C) nadifloxacin, and (D) ticagrelor. Interactions of top four ligands in complex with CYP1B1 (3PM0) after 100 ns molecular dynamics simulations: (A) chlorprothixene, (B) glibenclamide, (C) nadifloxacin, and (D) ticagrelor.

In Vitro CYP Inhibition Assay

The inhibitory effects of the top four compounds on recombinant human CYP1B1 and CYP1A1 enzymes were evaluated using the standard EROD assay, which has been widely used as a method for the assessment of CYP1 inhibitory activity.[19,60−62] ANF, a well-known CYP1 inhibitor, was used as a reference drug. The top selected drugs (chlorprothixene, nadifloxacin, glibenclamide, and ticagrelor) were examined at five concentrations as mentioned in the methodology section to investigate their CYP1B1 inhibitory activity. As shown in Figure , for each compound in the mixture, IC50 was calculated using a dose–response curve plotted between percentage inhibition and log concentration (these observations were calculated in triplicate). According to general evaluation, three drugs, including chlorprothixene, nadifloxacin, and ticagrelor, exhibited potential inhibition effects on both CYP1 isoforms as mentioned in Table , whereas 50% inhibition was not observed at 10 μM in the case of glibenclamide. Among all compounds, chlorprothixene showed potent inhibitory activity against 1B1 (i.e., IC50 = 0.072 ± 0.008 μM) compared to 1A1 (i.e., IC50 = 0.49 ± 0.04 μM) with 7-fold selectivity. This high potency may be assumed to be due to their crucial pi–pi stacking and salt bridge interactions with PHE 231 and ASP 326 residues, as displayed in Figure A. From the literature, it was evident that salt bridge interactions play an important role in conformational specificity and the molecular recognition process. The binding energy of the salt bridge interactions is significantly higher than the hydrogen-bond and pi–pi stacking interactions.[63] In the current study, chlorprothixene showed one pi–pi stacking and one salt bridge interaction with residues PHE 231 and ASP 326, respectively (Figure A). However, no salt bridge interactions were observed in the case of compounds glibenclamide, nadifloxacin, and ticagrelor (Figure B–D). From these results, it was concluded that ASP 326 and PHE 231 are the only contributors to chlorprothixene potency. On the other hand, nadifloxacin and ticagrelor showed moderate potency against both CYP1 isoforms. However, slightly better activity was observed against 1B1 with IC50 values of 1.46 and 2.81 μM, respectively. Finally, these findings led to the conclusion that pi–pi stacking interaction with PHE 231 is responsible for the potency of three compounds including chlorprothixene, nadifloxacin, and ticagrelor. A study conducted by Du et al. suggested that chlorprothixene, a dopamine receptor antagonist, can induce cell death by apoptosis in myeloid leukemia cell lines.[64,65] This may direct oncologists to explore chlorprothixene as a potential adjuvant in the case of imatinib-resistant-acute myeloid leukemia cell lines. Additionally, the second hit Ticagrelor, an antiplatelet drug, has been reported to show breast cancer prevention activity in orthotopic 4T1 breast cancer models by inhibiting tumor-cell platelet aggregation.[63] From these findings, it was hypothesized that ticagrelor may be used as adjuvant therapy in docetaxel-resistant breast cancer. Nadifloxacin is a fluoroquinolone class of antibacterial drug that has been reported to show promising anticancer activity in triple-negative breast cancer cell lines (i.e., MDA-MB-231) by apoptosis mechanism.[66] These results suggested that the combination therapy of docetaxel and nadifloxacin could improve the anticancer activity in docetaxel-resistant breast cancer cell lines. Herein, we report that chlorprothixene, ticagrelor, and nadifloxacin may ameliorate the anticancer activity in docetaxel-resistant cell lines by inhibiting CYP1B1.

Figure 10

Inhibitory potencies of the top four molecules identified from the EROD assay. (A) CYP1B1 and (B) CYP1A1.

Table 6

Identified Inhibitory Potencies of the Top Four Molecules in Human Recombinant CYP-P450 Enzymes

	recombinant human CYP-P450 enzymes/IC₅₀ μM		selectivity
molecule	CYP1B1 ± SD	CYP1A1 ± SD	CYP1A1/CYP1B1
chlorprothixene	0.07 ± 0.01	0.49 ± 0.04	6.8
glibenclamide	>10	>10
nadifloxacine	1.46 ± 0.06	2.36 ± 0.20	1.61
ticagrelor	2.81 ± 0.90	3.45 ± 0.39	1.22
ANF	0.02 ± 0.01	0.02 ± 0.004	1

Inhibitory potencies of the top four molecules identified from the EROD assay. (A) CYP1B1 and (B) CYP1A1.

Conclusions

Resistance against clinically authorized anticancer drugs is a major challenge in cancer treatment. Several commonly used anticancer medications, such as docetaxel, imatinib, and paclitaxel, are inactivated by CYP1B1, which ultimately results in drug resistance. A number of commonly used anticancer medications, such as docetaxel, tamoxifen, imatinib, and paclitaxel, are specifically quickly inactivated by CYP1B1, which ultimately results in drug resistance. Herein, efforts have been made to integrate in silico and in vitro approaches to identify possible selective CYP1B1 inhibitors. Based on the existing knowledge about already known selective and nonselective CYP1B1 inhibitors, three ML models such as SVM, RF, and ANN were developed. The best-selected SVM+RFE model was validated with known CYP1B1 inhibitors and Y-scrambling experiments. The best-selected model with an MCC value on the test set of 0.73 was used to screen the selleckchem database. To estimate the binding affinity, binding pose, and factors governing the binding process of the obtained hits, molecular docking and molecular dynamics (MD) simulations have been performed. From interaction and RMSD analysis at the atomic level through the 100 ns MD simulation, it was found that four compounds such as chlorprothixene, glibenclamide, nadifloxacin, and ticagrelor maintain mandatory pi–pi stacking and hydrogen-bonding interactions and can be considered to be the best. The ligand-binding affinity was then assessed via an in vitro enzymatic assay. The 50% inhibition concentration (IC50) of these four compounds to CYP1B1 was determined via an EROD assay. Three compounds including chlorprothixene, nadifloxacin, and ticagrelor were found to be highly potent inhibitors of CYP1B1 because they established IC50 values of 0.072, 1.463, and 2.81 μM, respectively. Although it was expected to inhibit CYP1A1, to our surprise, chlorprothixene exhibited significant selectivity (7-fold) toward CYP1B1 over CYP1A1 than the cocrystal ligand (ANF). From molecular dynamics results, it was concluded that pi–pi stacking and salt bridge interaction with PHE 231 and ASP 326 play a crucial role in the inhibitory potency of chlorprothixene, whereas, on the other hand, the same kind of salt bridge interaction, i.e., ASP 326, was absent in other compounds. These identified new leads can be considered for developing new therapeutics to treat CYP1B1-mediated drug-resistance problems in cancer patients. Overall, it can be concluded that the discovery of novel CYP1B1 inhibitors with the help of multiple ML and classical SB-VS methods may be beneficial to cope with the problem of chemoresistance.

57 in total

1. Prediction of the types of ion channel-targeted conotoxins based on radial basis function network.

Authors: Lu-Feng Yuan; Chen Ding; Shou-Hui Guo; Hui Ding; Wei Chen; Hao Lin
Journal: Toxicol In Vitro Date: 2012-12-29 Impact factor: 3.500

2. Human CYP1B1 and anticancer agent metabolism: mechanism for tumor-specific drug inactivation?

Authors: B Rochat; J M Morsman; G I Murray; W D Figg; H L McLeod
Journal: J Pharmacol Exp Ther Date: 2001-02 Impact factor: 4.030

3. Ciprofloxacin triggers the apoptosis of human triple-negative breast cancer MDA-MB-231 cells via the p53/Bax/Bcl-2 signaling pathway.

Authors: Artur Beberok; Dorota Wrześniok; Jakub Rok; Zuzanna Rzepka; Michalina Respondek; Ewa Buszman
Journal: Int J Oncol Date: 2018-03-08 Impact factor: 5.650

4. Bioflavonoids: selective substrates and inhibitors for cytochrome P450 CYP1A and CYP1B1.

Authors: H Doostdar; M D Burke; R T Mayer
Journal: Toxicology Date: 2000-04-03 Impact factor: 4.221

5. Selective inhibition of methoxyflavonoids on human CYP1B1 activity.

Authors: Hitomi Takemura; Toshimasa Itoh; Keiko Yamamoto; Hiroyuki Sakakibara; Kayoko Shimoi
Journal: Bioorg Med Chem Date: 2010-07-13 Impact factor: 3.641

6. Biphenyl urea derivatives as selective CYP1B1 inhibitors.

Authors: Mohd Usman Mohd Siddique; Glen J P McCann; Vinay Sonawane; Neill Horley; Ibidapo Steven Williams; Prashant Joshi; Sandip B Bharate; Venkatesan Jayaprakash; Barij N Sinha; Bhabatosh Chaudhuri
Journal: Org Biomol Chem Date: 2016-09-26 Impact factor: 3.876

7. Selectivity of polycyclic inhibitors for human cytochrome P450s 1A1, 1A2, and 1B1.

Authors: T Shimada; H Yamazaki; M Foroozesh; N E Hopkins; W L Alworth; F P Guengerich
Journal: Chem Res Toxicol Date: 1998-09 Impact factor: 3.739

8. Prediction of factor Xa inhibitors by machine learning methods.

Authors: H H Lin; L Y Han; C W Yap; Y Xue; X H Liu; F Zhu; Y Z Chen
Journal: J Mol Graph Model Date: 2007-03-12 Impact factor: 2.518

9. Identification of chlorprothixene as a potential drug that induces apoptosis and autophagic cell death in acute myeloid leukemia cells.

Authors: Yuxin Du; Kening Li; Xiangeng Wang; Aman Chandra Kaushik; Muhammad Junaid; Dongqing Wei
Journal: FEBS J Date: 2019-11-07 Impact factor: 5.542

10. PubChem Substance and Compound databases.

Authors: Sunghwan Kim; Paul A Thiessen; Evan E Bolton; Jie Chen; Gang Fu; Asta Gindulyte; Lianyi Han; Jane He; Siqian He; Benjamin A Shoemaker; Jiyao Wang; Bo Yu; Jian Zhang; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2015-09-22 Impact factor: 16.971