Prediction of QcrB Inhibition as a Measure of Antitubercular Activity with Machine Learning Protocols.

Afreen A Khan¹, Sannidhi S Poojary¹, Ketki K Bhave¹, Santosh R Nandan², Krishna R Iyer¹, Evans C Coutinho¹.

Abstract

It has always been a challenge to develop interventional therapies for Mycobacterium tuberculosis. Over the years, several attempts at developing such therapies have hit a dead end owing to the rapid mutation rates of the tubercular bacilli and their ability to lie dormant for years. Recently, the cytochrome bcc complex (QcrB) has shown some promise as a novel target against the tubercular bacilli, with Q203 being the first molecule acting on this target. In this paper, we report the deployment of several ML-based approaches to design molecules against QcrB. Machine learning (ML) models were developed based on a data set of 350 molecules using three different sets of molecular features, i.e., MACCS keys, ECFP6 fingerprints, and Mordred descriptors. Each feature set was trained on eight ML classifier algorithms and optimized to classify molecules accurately. The support vector machine-based classifier using the ECFP6 feature set was found to be the best classifier in this study. Further, screening of the known imidazopyridine amide inhibitors demonstrated that the model correctly classified the most potent molecules as actives, hence validating the model for future applications.

Year: 2022 PMID: 35664614 PMCID: PMC9161412 DOI: 10.1021/acsomega.2c01613

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Tuberculosis (TB) caused by Mycobacterium tuberculosis (Mtb) is a global health concern that is listed as one of the top 10 causes of death from a single infectious agent (ranking above HIV/AIDS since 2007). In 2019, an estimated 8.9–11.0 million people globally were affected by TB with a high death rate.[1] Moreover, the ongoing SARS-CoV-2 pandemic has hampered the progress made in TB efforts.[2] Traditional anti-TB drugs are known for their poor efficacy against nonreplicating bacteria, as they target processes that are necessary for cell growth and replication and have no effect on dormant tubercles. Consequently, treatments for both drug-susceptible TB (DS-TB) and drug-resistant TB (DR-TB) span several months. Furthermore, the current treatments for DR-TB are associated with low cure rates and high toxicity, thus necessitating the development of new and more efficacious drugs.[3] A novel target, the cytochrome bcc-aa3 complex in the oxidative phosphorylation pathway, has piqued the interest of researchers in this field. Bedaquiline (BDQ), the first FDA-approved antitubercular in 2012 for treating multidrug- and extensively drug-resistant disease, targets the c-subunit of mycobacterial F1F0 ATP synthase.[4] Additionally, both the imidazopyridine amide Q203, also known as telacebec, and lansoprazole sulfide (LPZS) target the cytochrome bcc complex (QcrB), a subunit of the mycobacterial cyt-bcc-aa3 oxidoreductase in the electron transport chain (ETC) (Figure 1).[5]

Figure 1

Structure of bedaquiline (BDQ) and reported QcrB inhibitors telacebec (Q203) and lansoprazole sulfide (LPZS).

Mtb relies on the energetically efficient oxidative phosphorylation pathway to sustain its growth. This is evident from high-density mutagenesis and deletion studies that indicate that Mtb is totally dependent on oxidative phosphorylation and cannot produce ATP by substrate-level phosphorylation.[6] Mycobacteria possess two terminal oxidases which catalyze the two-electron reduction of oxygen atoms to water, namely, the proton-pumping cyt-bcc-aa3 supercomplex and cytochrome bd oxidase (cyt-bd). Cytochrome bcc is an intermediary in the terminal reduction of oxygen in the aerobic electron transport chains[3] and transfers protons as part of the Q-cycle, which differentiates it from cytochrome bd. Cytochrome bd is a quinol oxidase that plays an important role in a number of physiological functions, thereby allowing pathogenic and commensal bacteria to survive in anaerobic conditions. It has been shown that Q203 binds to the quinol oxidation site (Qp) in the cytochrome b subunit of complex III (QcrB), disrupting ATP formation to cause bacteriostasis. The ETC is responsible for generating the proton motive force (PMF) by pumping protons across the membrane, and the energy from the PMF is used by ATP synthase to generate ATP molecules. As a continuous flow of PMF and ATP is essential even for the viability of nonreplicating Mtb, inhibition of this complex can eliminate nonreplicating subpopulations.[7] The approval of BDQ for the treatment of DR-TB, and Q203 being in phase II clinical development, demonstrate that mycobacterial respiration could be valuable for therapeutic interventions. One of the budding arms of computer-aided drug discovery is “machine learning” (ML). 
The approach of ML is quite different from that of traditional physical models, whose results rely solely on physical equations; machine learning instead uses algorithms that recognize patterns to establish relationships that help predict biological, chemical, and physical properties of novel compounds. ML techniques can be easily applied to enormous data sets that are quite unwieldy with physical models.[8] To date, many ML techniques have been implemented to guide traditional experiments and have reduced both time and cost. In the past decade, machine learning tools have made a tremendous impact on quantitative structure activity relationship (QSAR) modeling. The ML tools have been refined and modified over time, and as a result, they are able to identify potential biologically active molecules from millions of possibilities, efficiently and easily.[8] New approaches such as multitarget QSAR (mt-QSAR)[9,9b,9c] and multitasking quantitative-structure biological effect relationships (mt-QSBER)[10,10b,11,12] for antiviral and antimicrobial activity have been introduced that can simultaneously predict activities against multiple organisms. These mt-QSAR models work by constructing a drug–drug similarity complex network. The methods of mt-QSAR and mt-QSBER have been specifically applied to Mycobacterium tuberculosis,[13] where the fragments contributing to the activity were recognized and new molecular entities identified.[10b,14] The imidazopyridine amide Q203 was identified by Pethe et al. in 2013[5a] from a set of 352 molecules that were tested against Mtb.[15] Several attempts to modify Q203 and discover a more potent molecule have been unfruitful. The lack of an X-ray crystal structure of Mtb QcrB and the fact that homology models of the complex are not easy to build make structure-based approaches difficult. Thus, an ML-based QSAR or a CSAR (classification structure activity relationship) is a potential solution to this problem. 
Herein, we report the use of eight different ML algorithms to identify a pattern that will enable classification of molecules as active, moderately active, and inactive. To the best of our knowledge, this is an unprecedented attempt at using ML principles on QcrB inhibitors. We had previously reported the synthesis and SAR of benzyl piperazine ureas[16] with in silico studies suggesting QcrB as a plausible target. We have used the benzyl piperazine ureas (n = 55) and molecules reported by Moraski et al.[17,17b,18,18b] (n = 54) as a validation set. The ML classifier has been made publicly accessible as a web-based app, Q-TB (https://github.com/CoutinhoLab/Q-TB.git), to enable researchers to evaluate their molecules as potential inhibitors of the QcrB complex of Mtb.

Experimental Section

Data Collection and Curation

A total of 352 compounds with their corresponding QIM (quantification of intracellular mycobacteria) values were taken from the patent WO 2011/113606 A1.[15] The molecules were categorized as active (QIM <1 μM), moderately active (QIM 1–20 μM), and inactive (QIM >20 μM) as stated in the patent (Table 1). Compounds numbered 177 and 234 in the patent were discarded as they were found to be duplicates, leaving 350 compounds.

Table 1

Data Set Class Distribution

class	active (class 1)	moderately active (class 2)	inactive (class 3)
range (QIM)	<1 μM	1–20 μM	>20 μM
no. of compounds (% of total)	214 (61%)	58 (16.5%)	78 (22.5%)
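
The cutoffs in Table 1 amount to a simple labeling rule. The sketch below is illustrative only: the function name `qim_class` and the example values are hypothetical, and assigning the boundary values 1 and 20 μM to class 2 is an assumption, since the text does not say how exact boundary values were handled.

```python
def qim_class(qim_um: float) -> int:
    """Map a QIM value (in μM) to the activity class used in Table 1."""
    if qim_um < 1.0:
        return 1  # active
    if qim_um <= 20.0:
        return 2  # moderately active (boundaries 1 and 20 assumed inclusive)
    return 3      # inactive


# Hypothetical QIM values, not taken from the patent.
examples = [0.05, 1.0, 7.5, 20.0, 35.0]
print([qim_class(v) for v in examples])
```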

The classification led to 214 molecules being defined as active, 58 as moderately active, and 78 as inactive compounds (Table 1). Twenty percent of the 350 compounds (i.e., 70 molecules) were held back as an external validation set, while the rest were split into a training set (210 molecules) and a test set (70 molecules) at a ratio of 3:1. The three sets, i.e., the validation set, the training set, and the test set, had the same ratio of the three activity classes. A further external validation was performed on the set of 55 molecules previously disclosed by our lab, as mentioned earlier.
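
The class-preserving split described here can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' procedure (which is not given in the text): the helper `stratified_holdout`, the largest-remainder rounding, and the random seeds are all hypothetical choices; only the class counts (214/58/78) and the set sizes (210/70/70) come from the text.

```python
import random
from collections import Counter, defaultdict
from fractions import Fraction


def stratified_holdout(indices, labels, fraction, seed=0):
    """Split `indices` into (kept, held), holding out `fraction` of each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)

    target = round(len(indices) * fraction)
    # Largest-remainder allocation so the per-class counts sum to `target`.
    quotas = {c: len(members) * fraction for c, members in by_class.items()}
    alloc = {c: int(q) for c, q in quotas.items()}
    leftover = target - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1

    held = []
    for c, members in by_class.items():
        shuffled = members[:]
        rng.shuffle(shuffled)
        held.extend(shuffled[:alloc[c]])
    held_set = set(held)
    kept = [i for i in indices if i not in held_set]
    return kept, sorted(held)


# Class counts from Table 1; the labels stand in for the real compounds.
labels = ["active"] * 214 + ["moderate"] * 58 + ["inactive"] * 78
everything = list(range(len(labels)))

# 20% held back for validation, then the remainder split 3:1.
rest, validation = stratified_holdout(everything, labels, Fraction(1, 5))
train, test = stratified_holdout(rest, labels, Fraction(1, 4), seed=1)

for name, subset in [("train", train), ("test", test), ("validation", validation)]:
    print(name, len(subset), dict(Counter(labels[i] for i in subset)))
```

In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would do the same job; the hand-written version is used here only to keep the example dependency-free.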

Feature Engineering

Calculation of Molecular Descriptors and Feature Selection

A total of 1613 two-dimensional descriptors from 43 groups created on the molecular SMILES description of compounds in both the training and test sets were calculated with the python-based molecular descriptor calculator Mordred.[19] The descriptors encoding physicochemical as well as topological properties were used to quantitatively represent each compound. Following descriptor calculation, descriptors with null values or with errors were discarded. As a result, a pruned list of 1331 descriptors was used for model building.

Calculation of Molecular Features/Fingerprints

In addition to the aforesaid descriptors (physicochemical and topological), two types of fingerprints, MACCS (molecular access system) keys (166 bits) and ECFP6 (extended-connectivity) fingerprints (2047 bits), were calculated with RDKit (version 2020.09.1.0). Unlike the descriptors, the molecular fingerprints were not preprocessed; all of them were used “as is” for model building.

Model Building and Evaluation

We explored eight ML algorithms, namely, Logistic Regression (LR), k-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), eXtreme Gradient Boosting (XGB), Gaussian Naïve Bayes (GNB), Decision Trees (DT), and Linear Discriminant Analysis (LDA). The classifiers were built using Python’s Scikit-learn toolkit. For all ML algorithms (except LDA and GNB), a grid-search approach with prediction accuracy as the objective was applied for optimization of hyperparameters, and the optimal parameters were determined during 10-fold cross-validation. Logistic regression is an ML algorithm used to predict the probability of a target variable. KNN is based on the principle that all available data are stored, and a new point is classified based on the similarity to the stored data. SVM classifies the data points using a hyperplane and is effective in high-dimension spaces. An RF model builds multiple decision trees and merges them to obtain a more accurate and stable prediction. XGB is a decision tree-based ensemble ML algorithm that uses the gradient boosting framework. The Naïve Bayes method is a supervised machine learning algorithm based on the Bayes theory and operates on the assumption that each pair of features is independent. The DT classification works by splitting the data on different conditions and predicts the values of an unknown by learning the decision rules inferred by the data. LDA makes predictions by estimating the probability that a set of inputs belongs to a particular class.
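
The hyperparameter search described above (a grid of candidate values scored by accuracy under 10-fold cross-validation) can be illustrated with a dependency-free sketch. Everything here is an assumption made for illustration: the toy one-dimensional data, the k-nearest-neighbor stand-in, and the helper names are hypothetical, and the authors' actual grids are not given in the text. With scikit-learn, the equivalent would typically be `GridSearchCV` combined with `StratifiedKFold`.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """train: list of (x, label) pairs with a single numeric feature."""
    nearest = sorted(train, key=lambda p: abs(p[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stratified_folds(labels, n_folds, seed=0):
    """Deal the shuffled members of each class round-robin into the folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds


def cv_accuracy(data, k, n_folds=10):
    """Mean accuracy of k-NN over stratified folds."""
    folds = stratified_folds([label for _, label in data], n_folds)
    scores = []
    for fold in folds:
        held = set(fold)
        train = [data[i] for i in range(len(data)) if i not in held]
        hits = sum(knn_predict(train, data[i][0], k) == data[i][1] for i in fold)
        scores.append(hits / len(fold))
    return sum(scores) / n_folds


def grid_search(data, k_grid):
    """Return the best k (highest mean CV accuracy) and all scores."""
    results = {k: cv_accuracy(data, k) for k in k_grid}
    return max(results, key=results.get), results


# Toy, well-separated data: 20 points per class (hypothetical).
data = ([(float(x), "active") for x in range(0, 20)]
        + [(float(x), "moderate") for x in range(100, 120)]
        + [(float(x), "inactive") for x in range(200, 220)])

best_k, results = grid_search(data, k_grid=[1, 3, 5])
print(best_k, results)
```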

Validation Metrics

To measure the performance of the model, several metrics, namely, accuracy (AC), sensitivity (SE) or recall, precision (PR), specificity (SP), F1-score, area under the receiver operating characteristic curve (ROC-AUC), Matthews correlation coefficient (MCC), and error rate (ERR), were calculated. For definitions of the metrics, the following terms (and their abbreviations) are used: number of true positives (TP), number of false positives (FP), number of true negatives (TN), and number of false negatives (FN). Accuracy is a measure of model robustness and is given by

$$\mathrm{AC} = \frac{TP + TN}{TP + TN + FP + FN}$$

TN rate or specificity is the proportion of the actual negative cases correctly identified:

$$\mathrm{SP} = \frac{TN}{TN + FP}$$

Recall or TP rate or sensitivity is the percentage of true class labels correctly identified by the model as true and is given by

$$\mathrm{SE} = \frac{TP}{TP + FN}$$

Precision is the proportion of the predicted positive cases that were correctly identified, given by

$$\mathrm{PR} = \frac{TP}{TP + FP}$$

The F1 score is the harmonic mean of recall and precision, given by

$$\mathrm{F1} = \frac{2 \cdot \mathrm{PR} \cdot \mathrm{SE}}{\mathrm{PR} + \mathrm{SE}}$$

MCC is a measure that takes into account the positives and negatives and is given by

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$

Finally, the error rate is the proportion of misclassified cases:

$$\mathrm{ERR} = \frac{FP + FN}{TP + TN + FP + FN} = 1 - \mathrm{AC}$$

Based on the TP and FP rate pairs, the corresponding AUC-ROC metric is calculated.
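
The count-based metrics named in this section follow directly from the four confusion counts. The sketch below uses the standard textbook definitions; the function name and the example counts are hypothetical, and treating the problem as one class versus the rest is an assumption, since the text does not state how the three-class results were reduced to TP/FP/TN/FN.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for one class versus the rest."""
    total = tp + fp + tn + fn
    ac = (tp + tn) / total
    se = tp / (tp + fn) if tp + fn else 0.0          # recall / TP rate
    sp = tn / (tn + fp) if tn + fp else 0.0          # TN rate
    pr = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * pr * se / (pr + se) if pr + se else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"AC": ac, "SE": se, "SP": sp, "PR": pr,
            "F1": f1, "MCC": mcc, "ERR": 1.0 - ac}


# Hypothetical counts for a 70-compound test set.
metrics = binary_metrics(tp=40, fp=9, tn=18, fn=3)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```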

Y-Randomization or Y-Scramble Tests

Y-randomization tests are used to verify the “predictive power” of the model and are performed to evaluate any chance prediction. In each Y-randomization test, a new training set is generated by randomly shuffling the bioactivity labels (in the present case, active, moderately active, or inactive) while leaving the features data intact. A new model is built based on this shuffled training set with the same features, hyperparameters, and procedures of model training as the original ones. The resulting models are further evaluated on the test set with no random shuffling. If the new model performs worse than the one based on the original training set, it can safely be deduced that the performance of the classification model trained on the original training set is not accidental.
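
The procedure can be sketched as follows. This is a toy illustration, not the authors' code: a one-nearest-neighbor classifier on made-up one-dimensional data stands in for the real model and fingerprints, and all names and values are hypothetical. The essential steps match the description above: shuffle only the training labels, retrain with everything else unchanged, and score on the untouched test set.

```python
import random


def one_nn_predict(train_x, train_y, query):
    """Label of the training point closest to `query`."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[nearest]


def accuracy(train_x, train_y, test_x, test_y):
    hits = sum(one_nn_predict(train_x, train_y, x) == y
               for x, y in zip(test_x, test_y))
    return hits / len(test_x)


def y_randomization(train_x, train_y, test_x, test_y, n_rounds=100, seed=0):
    """Accuracy of models trained on label-shuffled copies of the training set."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        shuffled = train_y[:]          # features stay intact, labels are scrambled
        rng.shuffle(shuffled)
        scores.append(accuracy(train_x, shuffled, test_x, test_y))
    return scores


# Toy data: three well-separated classes (hypothetical values).
train_x = [float(v) for v in
           list(range(0, 10)) + list(range(100, 110)) + list(range(200, 210))]
train_y = ["active"] * 10 + ["moderate"] * 10 + ["inactive"] * 10
test_x = [2.2, 5.2, 8.2, 102.2, 105.2, 108.2, 202.2, 205.2, 208.2]
test_y = ["active"] * 3 + ["moderate"] * 3 + ["inactive"] * 3

original = accuracy(train_x, train_y, test_x, test_y)
scrambled = y_randomization(train_x, train_y, test_x, test_y)
print("original:", original)
print("scrambled min/max:", min(scrambled), max(scrambled))
```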

Similarity Maps

RDKit facilitates visualization of the atomic contributions to the predicted probability of the ML model. The “atomic weights” are generated by removing the bits belonging to the corresponding atom and comparing the similarity of the modification with the unmodified fingerprint. With the best performing model, the similarity maps were generated for compounds in the test set using the protocol provided by Riniker and Landrum.[20]

Results and Discussion

Classification Models

Discerning Feature Set

Eight ML algorithms along with three kinds of features were used for model building. The optimal hyperparameters were set using GridSearch and were validated based on the AC, MCC, SE, PR, F1-score, AUC, and ERR metrics as listed in Table 2. The optimal parameters on which the models were built are listed in Table 3.

Table 2

Performance of the Individual Models on the Test Set (a)

		test set
model (features_ML method)	10-fold stratified cross-validation	AC	MCC	SE/recall	PR	F1-score	AUC	ERR
maccs_lr	0.69	0.70	0.41	0.91	0.75	0.82	0.69	0.30
maccs_knn	0.65	0.61	0.10	0.98	0.64	0.77	0.60	0.39
maccs_svm	0.71	0.71	0.44	0.93	0.77	0.84	0.67	0.28
maccs_rf	0.68	0.69	0.41	0.93	0.78	0.85	0.69	0.30
maccs_xgb	0.68	0.67	0.33	0.93	0.73	0.82	0.71	0.33
maccs_gnb	0.64	0.40	0.21	0.35	0.83	0.49	0.65	0.60
maccs_dt	0.69	0.70	0.38	0.88	0.75	0.81	0.65	0.30
maccs_lda	0.66	0.60	0.24	0.79	0.72	0.76	0.58	0.40
ecfp6_lr	0.73	0.80	0.62	0.93	0.82	0.87	0.82	0.20
ecfp6_knn	0.69	0.69	0.36	1.00	0.67	0.80	0.69	0.31
ecfp6_svm	0.71	0.71	0.46	0.86	0.79	0.89	0.80	0.29
ecfp6_rf	0.75	0.78	0.59	0.93	0.78	0.85	0.75	0.22
ecfp6_xgb	0.71	0.73	0.47	0.91	0.75	0.82	0.74	0.27
ecfp6_gnb	0.59	0.60	0.32	0.63	0.73	0.68	0.67	0.40
ecfp6_dt	0.70	0.70	0.41	0.88	0.73	0.80	0.68	0.30
ecfp6_lda	0.67	0.76	0.54	0.91	0.83	0.87	0.77	0.24
des_lr	0.67	0.67	0.35	0.88	0.73	0.80	0.78	0.33
des_knn	0.67	0.60	0.07	0.93	0.62	0.75	0.64	0.40
des_svm	0.66	0.70	0.41	0.91	0.75	0.82	0.78	0.30
des_rf	0.72	0.70	0.40	0.95	0.76	0.85	0.78	0.30
des_xgb	0.70	0.69	0.35	0.95	0.69	0.80	0.76	0.29
des_gnb	0.59	0.57	0.19	0.74	0.68	0.71	0.60	0.43
des_dt	0.65	0.57	0.25	0.67	0.74	0.71	0.62	0.43
des_lda	0.57	0.64	0.34	0.81	0.80	0.80	0.67	0.36

(a) The model is named by the features followed by the ML algorithm used. For example, maccs_rf is the RF classifier trained on MACCS fingerprint descriptors.

Table 3

Optimal Parameters for the Respective Models (a)

model	parameters
LR	C = 0.23, max_iter = 100, penalty = l2, solver = lbfgs
KNN	n_neighbors = 5, leaf_size = 30, p = 2
SVM	C = 1.0, gamma = scale, kernel = poly
RF	n_estimators = 100, criterion = gini, min_samples_split = 2
XGB	booster = gbtree, max_depth = 6, min_child_weight = 1
GNB	priors = None, var_smoothing = 1e-09
DT	criterion = gini, min_samples_split = 2, splitter = best
LDA	solver = svd, other parameters at default values

(a) The acronyms for the various parameters are as mentioned in the Scikit-learn documentation.

All models show comparable performance on the test set. The AUC values range from 0.58 to 0.82, and the MCC values span from 0.10 to 0.54. The mean value of SE is 0.86, and the mean error rate is 0.33; this indicates that all of the models can predict the “actives” more confidently than the moderately active and inactive molecules. The average AUC of models built with the MACCS features is 0.66, that with ECFP6 fingerprints 0.74, and that with Mordred descriptors 0.70. As the highest average AUC is returned for models with the ECFP6 features, the ECFP6 features appear to capture the relationship between structure and activity of the imidazopyridine amides better than the other two feature sets.
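
The per-feature-set averages quoted here can be reproduced from the AUC column of Table 2. The snippet below is a small arithmetic check using those tabulated values (in the row order lr, knn, svm, rf, xgb, gnb, dt, lda); the variable names are illustrative.

```python
# Test-set AUC values copied from Table 2, grouped by feature set.
auc_by_feature_set = {
    "maccs": [0.69, 0.60, 0.67, 0.69, 0.71, 0.65, 0.65, 0.58],
    "ecfp6": [0.82, 0.69, 0.80, 0.75, 0.74, 0.67, 0.68, 0.77],
    "des":   [0.78, 0.64, 0.78, 0.78, 0.76, 0.60, 0.62, 0.67],
}

mean_auc = {name: sum(values) / len(values)
            for name, values in auc_by_feature_set.items()}
best_feature_set = max(mean_auc, key=mean_auc.get)

for name, value in mean_auc.items():
    print(f"{name}: {value:.3f}")
print("highest mean AUC:", best_feature_set)
```

The unrounded means are 0.655 (MACCS), 0.740 (ECFP6), and about 0.704 (Mordred descriptors), consistent with the rounded figures in the text.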

Leading ML Algorithm

According to the metrics listed in Table 2, the SVM method is better than the other ML methods at classifying the molecules with the lowest error rate. The SVM algorithm was applied to the data reserved for external validation. The results indicate that the SVM model built with the ECFP6 feature set performs the best.

Y-Randomization Tests

Y-scrambling or randomization was performed with the SVM algorithm. The classes were shuffled one hundred times while keeping the ECFP6 features intact. The performance of each “scrambled model” was examined on the test set, and the highest and lowest metrics are given in Table 4. The highest AC was found to be 0.63, while the lowest is 0.37; the MCC metric is highest at 0.28 and lowest at −0.21, and the highest and lowest values for AUC are 0.65 and 0.34, respectively. A table with the metrics of all 100 models generated is given in the Supporting Information (Section S4).

Table 4

Three Y-Randomization Models Sampled from the Set of 100 to Show the Highest and Lowest Values

models	AC	MCC	AUC
Y1	0.63	0.28	0.36
Y2	0.43	–0.13	0.65
Y3	0.37	–0.21	0.34

Overall, the AC, MCC, and AUC values for the randomization trials are lower than the corresponding values for the test set given in Table 2. This indicates that the models obtained after Y-randomization are unsatisfactory, and it can be safely concluded that the performance of the models built (Table 2) is not a result of chance correlation.

Model Application

Applicability Domain (AD)

The purpose of an applicability domain is to determine the boundaries within which the model can make reliable predictions for compounds based on their similarity with the compounds on which the model was constructed. The compounds that satisfy the scope of the model are within the AD. In this study, the principal component analysis (PCA) bounding box was used to assess the AD of compounds contained in the training and testing sets. The ECFP6 fingerprints were used as input for the PCA; the resulting PCA bounding box scores are plotted in Figure 2. The data set was divided into internal and external sets, followed by the predictive model construction (for subsequent prediction on the external set), and it was also subjected to a 10-fold CV. As can be seen from Figure 2, the test and train compounds are within the AD of this model.

Figure 2

PCA bounding box for assessing the applicability domain, internal training set (red), and external test set (purple).
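
The bounding-box test itself is simple once principal-component scores are available: a compound is inside the AD if each of its scores lies between the minimum and maximum observed for the training set on that component. The sketch below assumes the scores have already been computed elsewhere (for example with a PCA routine applied to the fingerprints); the function names and score values are hypothetical.

```python
def bounding_box(train_scores):
    """Per-component (min, max) limits from the training-set PC scores."""
    n_components = len(train_scores[0])
    mins = [min(row[d] for row in train_scores) for d in range(n_components)]
    maxs = [max(row[d] for row in train_scores) for d in range(n_components)]
    return mins, maxs


def in_domain(scores, box):
    """True if every PC score falls inside the training-set limits."""
    mins, maxs = box
    return all(lo <= value <= hi for value, lo, hi in zip(scores, mins, maxs))


# Hypothetical (PC1, PC2) scores for a handful of training compounds.
train_scores = [(-1.2, 0.4), (0.3, -0.9), (1.5, 0.8), (0.0, 1.1), (-0.7, -0.5)]
box = bounding_box(train_scores)

print(in_domain((0.1, 0.0), box))   # inside both ranges
print(in_domain((5.0, 0.0), box))   # PC1 outside the training range
```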

Chemical Space Analysis

Chemical space analysis is a key concept in drug discovery and helps to explore the characteristics differentiating active from moderately active and inactive molecules. Lipinski’s rule of five (Ro5) lists characteristics of drug-likeness for orally active drugs. The Ro5 filter was applied to the molecules in the training set. The molecular weight (MW), octanol–water partition coefficient (LogP), number of hydrogen bond acceptors (NumHAcceptors), and number of hydrogen bond donors (NumHDonors) were calculated using the RDKit library. According to the rules, all molecules in the training set fall within the Ro5 limits, i.e., MW < 500, LogP < 5, NumHAcceptors and NumHDonors < 10 (Figure 3). The box plots indicate that molecules classified as active have a higher average MW and LogP in contrast to the moderately active and inactive compounds. A scatter plot of MW as a function of LogP is shown in Figure 4, suggesting that the MW clusters in the range 400–500 Da, and the LogP ranges from 4.0 to 6.0. A large percentage of the active compounds have structures that are comparatively larger than the inactive compounds, as observed from the mean value of the box plots (Figure 3).

Figure 3

Lipinski’s rule of five plots for QcrB inhibitors (training set). The bioactivity class 1 is the actives, class 2 the moderately actives, and class 3 the inactives. The plots are as follows: top left, bioactivity class vs molecular weight; top right, bioactivity class vs LogP; bottom left, bioactivity class vs number of hydrogen-bond acceptors; bottom right, bioactivity class vs number of hydrogen-bond donors.

Figure 4

Plot of MW vs LogP of compounds used to build the model. The active compounds (class 1) are shown in blue, moderately active (class 2) in orange, and inactives (class 3) in green.
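
A rule-of-five check of the kind described in this section can be sketched as below. The default limits are the ones quoted in the text; note that the conventional Ro5 uses a limit of 5 for hydrogen bond donors, which can be supplied through the `limits` argument. The function name and the property values are hypothetical; in the study the properties were computed with RDKit.

```python
# Limits as quoted in the text (a value at or above the limit counts as a violation).
TEXT_LIMITS = {"MW": 500.0, "LogP": 5.0, "NumHAcceptors": 10, "NumHDonors": 10}


def ro5_violations(props, limits=TEXT_LIMITS):
    """Return the names of the properties that exceed their limit."""
    return [name for name, limit in limits.items() if props[name] >= limit]


# Hypothetical property values for two molecules.
mol_a = {"MW": 450.2, "LogP": 4.1, "NumHAcceptors": 6, "NumHDonors": 1}
mol_b = {"MW": 557.0, "LogP": 6.3, "NumHAcceptors": 5, "NumHDonors": 1}

print(ro5_violations(mol_a))
print(ro5_violations(mol_b))

# Conventional Ro5 donor limit, passed explicitly.
conventional = dict(TEXT_LIMITS, NumHDonors=5)
print(ro5_violations({"MW": 300.0, "LogP": 2.0,
                      "NumHAcceptors": 4, "NumHDonors": 6}, conventional))
```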

Model Applied to the Benzyl Piperazine Data Set

The data set disclosed by our group,[16] comprising 55 compounds, has an activity span from 1 to >20 μM. The molecules were classified according to the cutoffs used on the training and test sets. The ECFP6 feature was calculated for these 55 compounds, and the ecfp6_svm model was applied. Twenty-two compounds are predicted as moderately active and 33 as inactive, while none are predicted as active. Compared with the experimental classification of the data set, 23 molecules have been predicted correctly. To put this prediction in context, we note that the benzyl piperazine molecules are in a chemical space distinct from the area occupied by the training set on which the model was established; this could be the source of the variance in the predictions.

Model Tested on the Imidazopyridine Amides Data Set

Moraski et al.[18,21] have explored the SAR of imidazopyridine amides, which are postulated to act via inhibition of QcrB. A data set of 54 compounds reported by them was curated. The ECFP6 feature was calculated for these molecules, and the ecfp6_svm model was applied to them. The model classifies as active all molecules which are reported as most potent in the respective publications. This indicates that the model is able to handle and predict compounds belonging to the imidazopyridine amides well. Similarity maps were generated, as shown in Figure 5, for all molecules predicted as active. The green contours highlight the core that is similar to Q203, which is the basis for the prediction. Besides this, the red contours highlight regions that have a positive influence and also add to the activity.

Figure 5

Similarity maps for imidazopyridine amides predicted to be “active” inhibitors of QcrB according to the ecfp6_rf model.

Model Tested on Other Chemical Data Sets

Chemical data sets apart from the imidazopyridine amides, which have been tested and identified as potential QcrB inhibitors as described in the literature, were curated and predicted using the leading model. Most compounds (Figure 6) were categorized satisfactorily. The Tanimoto similarity value for each molecule was calculated using RDKit with Q203 as the reference. The similarity values are given for each molecule in Figure 6. The most potent molecules, namely 4, a trifluoroimidazo carboxamide;[17a] 5, a pentafluorosulfanyl imidazo carboxamide;[17a] and 6 and 8 from imidazopyridines,[22,23] are reported to exhibit an inhibition profile like Q203; all of these molecules are correctly identified as belonging to class 1.

Figure 6

Molecules validated from other chemical data sets with the Tanimoto similarity value (Tc).

The molecules 7, an imidazothiazole carboxamide;[17b] 9, an imidazopyridine;[4] 10, with an aminoquinazoline core;[3] 12, a pyrrolopyridinone;[24] and 13, a phenoxyalkyl benzimidazole,[25] are less potent than Q203 and are correctly identified as belonging to class 2. Finally, 11, a quinazoline molecule,[4] poorly inhibits mycobacteria and has been identified as inactive or class 3. The arylvinylpiperazine amide compound 14[26] is considered to be active; however, this was not correctly identified by the model. We suspect that this anomaly may be due to its binding mode, which is reported to be different from that of Q203. Likewise, compound 15, a morpholino thiophene,[27] is also predicted as inactive; this variance in prediction could be attributed either to the chemical space of its scaffold, which is not a part of the training set, or to the fact that its potency is lower than that of Q203. The internal and external validation results suggest that the model shows reasonable accuracy in classifying bioactivity profiles of imidazopyridine amides and many other chemical classes toward Mtb QcrB.
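
The Tanimoto coefficient (Tc) used in this comparison is the number of bits set in both fingerprints divided by the number of bits set in either. A minimal sketch on sets of on-bit indices follows; the bit indices are made up for illustration and are not the fingerprints of Q203 or of any compound in Figure 6, and in the study the values were computed with RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient for two fingerprints given as on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


# Hypothetical on-bit indices for a reference and a query fingerprint.
reference_bits = {1, 5, 9, 12, 40}
query_bits = {1, 5, 9, 77}

print(tanimoto(reference_bits, query_bits))      # 3 shared bits / 6 bits in total
print(tanimoto(reference_bits, reference_bits))  # identical fingerprints
```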

Screening of the PubChem Database

In a quest to find probable active molecules, a set of 211 molecules with the imidazopyridine core was retrieved from the PubChem database. The ECFP6 feature was calculated for all of the molecules. The molecules lie in the same chemical space as the training molecules. Further, the ecfp6_svm model was run on this set of compounds. A total of 110 molecules are predicted as active, 35 as moderately active, and 66 as inactive.

Model Deployment as a Q-TB Web Application

The ECFP6 model was deployed as an app named Q-TB (Figure 7) using Streamlit, to enable researchers to test their compounds as probable QcrB inhibitors. The web application accepts SMILES strings supplied in a CSV file as the molecular input. The input file must be named “Test.csv”, and the column holding the structures in SMILES format must be named “Smiles”; any deviation from this will lead to an error. The file is then submitted to the app. The ECFP6 descriptor is calculated using the RDKit package, and the ecfp6_svm model is then applied to the input molecules; the app predicts the bioactivity class, which is labeled accordingly as class 1, class 2, or class 3. The app allows researchers with little to no background in machine learning to predict the activity of their compounds. The web app along with the manual can be accessed at https://github.com/CoutinhoLab/Q-TB.

Figure 7

Snapshot of the web application Q-TB.
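
Because the app is strict about the input layout, it can help to check a file before uploading it. The sketch below is a hypothetical pre-flight check written for this purpose, not code from the Q-TB repository; only the required column name “Smiles” comes from the description above, and the function name and example strings are illustrative.

```python
import csv
import io


def read_smiles(csv_text, column="Smiles"):
    """Return the non-empty SMILES strings from CSV text, or raise ValueError."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"expected a column named {column!r}, "
                         f"found {reader.fieldnames}")
    return [row[column].strip() for row in reader
            if row[column] and row[column].strip()]


# Contents of a well-formed Test.csv (example molecules are arbitrary).
good = "Smiles\nCCO\nc1ccccc1\n"
print(read_smiles(good))

# A header with different capitalization is rejected.
try:
    read_smiles("SMILES\nCCO\n")
except ValueError as err:
    print("rejected:", err)
```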

Conclusions

ML is being widely used in all areas of science, including drug discovery, to predict bioactivity and physicochemical properties. QcrB of Mtb is a novel target that is being rigorously explored for the development of new anti-TB drugs. Our search for QcrB inhibitors of Mtb engaged the application of ML methods on a data set of 350 imidazopyridine amides curated from the literature. Three distinct classes of molecular descriptors were calculated, and eight different ML algorithms were applied to the data sets. Of the 24 models built, the support vector machine was selected as the appropriate algorithm for classification of the data set based on various performance metrics. The model was further analyzed using a validation set. To complete validation of the ecfp6_svm model, Y-randomization was carried out. The model was applied to a known set of imidazopyridine amides, and it was found to correctly classify (according to the literature) all potent molecules as active. New candidate QcrB inhibitors were identified by using the model to predict the bioactivity of a data set downloaded from PubChem. Lastly, the classifier model was deployed as a web application for public usage.

30 in total

1. Respiratory flexibility in response to inhibition of cytochrome C oxidase in Mycobacterium tuberculosis.

Authors: Kriti Arora; Bernardo Ochoa-Montaño; Patricia S Tsang; Tom L Blundell; Stephanie S Dawes; Valerie Mizrahi; Tracy Bayliss; Claire J Mackenzie; Laura A T Cleghorn; Peter C Ray; Paul G Wyatt; Eugene Uh; Jinwoo Lee; Clifton E Barry; Helena I Boshoff
Journal: Antimicrob Agents Chemother Date: 2014-08-25 Impact factor: 5.191

Review 2. The QSAR Paradigm in Fragment-Based Drug Discovery: From the Virtual Generation of Target Inhibitors to Multi-Scale Modeling.

Authors: Valeria V Kleandrova; Alejandro Speck-Planche
Journal: Mini Rev Med Chem Date: 2020 Impact factor: 3.862

Review 3. From machine learning to deep learning: progress in machine intelligence for rational drug discovery.

Authors: Lu Zhang; Jianjun Tan; Dan Han; Hao Zhu
Journal: Drug Discov Today Date: 2017-09-04 Impact factor: 7.851

4. Scaffold-switching: an exploration of 5,6-fused bicyclic heteroaromatics systems to afford antituberculosis activity akin to the imidazo[1,2-a]pyridine-3-carboxylates.

Authors: Garrett C Moraski; Allen G Oliver; Lowell D Markley; Sanghyun Cho; Scott G Franzblau; Marvin J Miller
Journal: Bioorg Med Chem Lett Date: 2014-05-28 Impact factor: 2.823

5. Preparation and Evaluation of Potent Pentafluorosulfanyl-Substituted Anti-Tuberculosis Compounds.

Authors: Garrett C Moraski; Ryan Bristol; Natalie Seeger; Helena I Boshoff; Patricia Siu-Yee Tsang; Marvin J Miller
Journal: ChemMedChem Date: 2017-06-27 Impact factor: 3.466

6. New insights toward the discovery of antibacterial agents: multi-tasking QSBER model for the simultaneous prediction of anti-tuberculosis activity and toxicological profiles of drugs.

Authors: Alejandro Speck-Planche; Valeria V Kleandrova; M Natália D S Cordeiro
Journal: Eur J Pharm Sci Date: 2013-02-01 Impact factor: 4.384

7. Putting Tuberculosis (TB) To Rest: Transformation of the Sleep Aid, Ambien, and "Anagrams" Generated Potent Antituberculosis Agents.

Authors: Garrett C Moraski; Patricia A Miller; Mai Ann Bailey; Juliane Ollinger; Tanya Parish; Helena I Boshoff; Sanghyun Cho; Jeffery R Anderson; Surafel Mulugeta; Scott G Franzblau; Marvin J Miller
Journal: ACS Infect Dis Date: 2014-12-27 Impact factor: 5.084

Review 8. Targeting Energy Metabolism in Mycobacterium tuberculosis, a New Paradigm in Antimycobacterial Drug Discovery.

Authors: Dirk Bald; Cristina Villellas; Ping Lu; Anil Koul
Journal: mBio Date: 2017-04-11 Impact factor: 7.867

Review 9. Anticipating the impact of the COVID-19 pandemic on TB patients and TB control programmes.

Authors: Toyin Togun; Beate Kampmann; Neil Graham Stoker; Marc Lipman
Journal: Ann Clin Microbiol Antimicrob Date: 2020-05-23 Impact factor: 3.944

10. Identification of novel imidazo[1,2-a]pyridine inhibitors targeting M. tuberculosis QcrB.

Authors: Katherine A Abrahams; Jonathan A G Cox; Vickey L Spivey; Nicholas J Loman; Mark J Pallen; Chrystala Constantinidou; Raquel Fernández; Carlos Alemparte; Modesto J Remuiñán; David Barros; Lluis Ballell; Gurdyal S Besra
Journal: PLoS One Date: 2012-12-31 Impact factor: 3.240