Development and validation of consensus machine learning-based models for the prediction of novel small molecules as potential anti-tubercular agents.

Mushtaq Ahmad Wani, Kuldeep K Roy.

Abstract

Tuberculosis (TB) is an infectious disease and a leading cause of death globally. The rapid emergence of drug resistance among pathogenic mycobacteria is a global threat that makes the discovery and development of new drugs urgent. However, because drug discovery and development is typically a lengthy and costly process, the strategic use of machine learning (ML) algorithms may help reduce both the cost and the time involved. Given the urgent need for new TB drugs, we attempted to develop predictive ML-based models for selecting novel small molecules for subsequent in vitro validation. For this purpose, we used the GlaxoSmithKline (GSK) TCAMS TB dataset, comprising 776 hits that were made publicly available to the wider scientific community through the ChEMBL Neglected Tropical Diseases (ChEMBL-NTD) database. We explored several ML classifiers, viz. decision trees (DT), support vector machine (SVM), random forest (RF), Bernoulli naive Bayes (BNB), k-nearest neighbors (k-NN), and linear logistic regression (LLR), as well as ensemble learning models (bagging and AdaBoost), all trained on the GSK dataset. Three models gave the best prediction results for both the training and test sets: the AdaBoost decision tree (ABDT), the RF classifier, and k-NN. However, when predicting an external set of known anti-tubercular compounds/drugs, each of these models showed limitations. The ABDT model correctly predicted 22 molecules as active, while the RF and k-NN models each correctly predicted 18 molecules as active; a number of molecules were predicted as active by two of the models and as inactive by the third.
Therefore, we concluded that the anti-tubercular potential of a new molecule should be judged from the consensus prediction of these three models, which may lessen the attrition rate during in vitro validation. We believe that this study may assist the wider anti-tuberculosis research community by providing a platform for prioritizing small molecules for subsequent validation in drug discovery and development.
© 2021. The Author(s), under exclusive licence to Springer Nature Switzerland AG.
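
The consensus step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the three predictions are combined, so the simple majority vote used here is an assumption, as are the function name `consensus_call`, the model keys, and the 1 = active / 0 = inactive encoding. The sketch only combines calls that have already been made; it does not train or reproduce the authors' models.

```python
from typing import Dict, List, Union

# Keys are illustrative labels for the three models named in the abstract.
MODELS = ("ABDT", "RF", "kNN")


def consensus_call(votes: Dict[str, int]) -> Dict[str, Union[int, bool, List[str]]]:
    """Combine binary calls (1 = active, 0 = inactive) from the three models.

    Assumption: consensus is a simple majority (at least 2 of 3 models agree).
    """
    for model in MODELS:
        if model not in votes:
            raise ValueError(f"missing prediction for model: {model}")
        if votes[model] not in (0, 1):
            raise ValueError(f"prediction for {model} must be 0 or 1")

    n_active = sum(votes[m] for m in MODELS)
    label = 1 if n_active >= 2 else 0
    return {
        "label": label,                              # majority call
        "n_active": n_active,                        # models voting "active"
        "unanimous": n_active in (0, len(MODELS)),   # all three agree
        "dissenters": [m for m in MODELS if votes[m] != label],
    }


if __name__ == "__main__":
    # A molecule called active by two models and inactive by the third,
    # the situation the abstract reports for part of the external set.
    print(consensus_call({"ABDT": 1, "RF": 1, "kNN": 0}))
```

Reporting the dissenting model alongside the majority label keeps split decisions visible, so molecules with a 2-of-3 call can be treated more cautiously than unanimous ones when choosing candidates for in vitro testing.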

Keywords:  ABDT; Machine learning; Mycobacterium tuberculosis; RF; Tuberculosis

Year:  2021        PMID: 34110578     DOI: 10.1007/s11030-021-10238-y

Source DB:  PubMed          Journal:  Mol Divers        ISSN: 1381-1991            Impact factor:   2.943

