Bioactivity Comparison across Multiple Machine Learning Algorithms Using over 5000 Datasets for Drug Discovery.

Thomas R Lane¹, Daniel H Foil¹, Eni Minerali¹, Fabio Urbina², Kimberley M Zorn¹, Sean Ekins¹

Abstract

Machine learning methods are attracting considerable attention from the pharmaceutical industry for use in drug discovery and beyond. In recent studies, we and others have applied multiple machine learning algorithms and modeling metrics and, in some cases, compared molecular descriptors to build models for individual targets or properties on a relatively small scale. Several research groups have used large numbers of datasets from public databases such as ChEMBL to evaluate machine learning methods of interest to them; the largest of these studies used on the order of 1400 datasets. We have now extracted well over 5000 datasets from ChEMBL, encoded them with the ECFP6 fingerprint, and used them to compare our proprietary software Assay Central with random forest, k-nearest neighbors, support vector classification, naïve Bayesian, AdaBoosted decision trees, and deep neural networks (three layers). Model performance was assessed using an array of fivefold cross-validation metrics, including area under the curve (AUC), F1 score, Cohen's kappa, and Matthews correlation coefficient. Based on ranked normalized scores for the metrics or datasets, all methods appeared comparable, while the distance from the top indicated that Assay Central and support vector classification were comparable. Unlike prior studies, which have placed considerable emphasis on deep neural networks (deep learning), we saw no advantage for them in this case. If anything, Assay Central may have had a slight advantage, as the activity cutoff for each of the over 5000 datasets (representing over 570,000 unique compounds) was based on Assay Central performance, although support vector classification appears to be a strong competitor. We also applied Assay Central to prospective predictions for the toxicity targets PXR and hERG to further validate these models. This work appears to be the largest-scale comparison of these machine learning algorithms to date. Future studies will likely evaluate additional databases, descriptors, and machine learning algorithms and further refine the methods for evaluating and comparing such models.
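The abstract names four cross-validation metrics and a "distance from the top" comparison. As a hedged, illustrative sketch (not Assay Central's implementation or the authors' scoring code), the standard definitions of these metrics can be computed from a binary confusion matrix in plain Python. The `distance_from_top` helper is hypothetical: it assumes the distance is simply the gap between a method's score and the best score on the same dataset, which is our assumption rather than the paper's stated definition.

```python
import math
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 = active, 0 = inactive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp: int, fp: int, tn: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp: int, fp: int, tn: int, fn: int) -> float:
    """Agreement between predictions and labels, corrected for chance agreement."""
    n = tp + fp + tn + fn
    p_obs = (tp + tn) / n
    # Expected chance agreement from the predicted and true class marginals
    p_exp = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp != 1 else 0.0


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random active outscores a random inactive (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one active and one inactive example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def distance_from_top(scores_by_method: Dict[str, float]) -> Dict[str, float]:
    """Hypothetical helper: gap between each method's score and the best score on one dataset."""
    best = max(scores_by_method.values())
    return {method: best - score for method, score in scores_by_method.items()}
```

In a fivefold cross-validation, these functions would be applied to each held-out fold and the per-fold values averaged; fold averaging is one common convention and not necessarily the exact procedure used in the study.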

Keywords:  deep learning; drug discovery; machine learning; pharmaceutics; support vector machines

Year: 2020    PMID: 33325717    PMCID: PMC8237624    DOI: 10.1021/acs.molpharmaceut.0c01013

Source DB: PubMed    Journal: Mol Pharm    ISSN: 1543-8384    Impact factor: 4.939

Cited by (6 in total)

1.  The Commoditization of AI for Molecule Design.

Authors:  Fabio Urbina; Sean Ekins
Journal:  Artif Intell Life Sci       Date:  2022-01-24

2.  Quantum Machine Learning Algorithms for Drug Discovery Applications.

Authors:  Kushal Batra; Kimberley M Zorn; Daniel H Foil; Eni Minerali; Victor O Gawriljuk; Thomas R Lane; Sean Ekins
Journal:  J Chem Inf Model       Date:  2021-05-25       Impact factor: 6.162

3.  [Review] An Overview of Organs-on-Chips Based on Deep Learning.

Authors:  Jintao Li; Jie Chen; Hua Bai; Haiwei Wang; Shiping Hao; Yang Ding; Bo Peng; Jing Zhang; Lin Li; Wei Huang
Journal:  Research (Wash D C)       Date:  2022-01-19

4.  Integrating concept of pharmacophore with graph neural networks for chemical property prediction and interpretation.

Authors:  Yue Kong; Xiaoman Zhao; Ruizi Liu; Zhenwu Yang; Hongyan Yin; Bowen Zhao; Jinling Wang; Bingjie Qin; Aixia Yan
Journal:  J Cheminform       Date:  2022-08-04       Impact factor: 8.489

5.  [Review] Schistosomiasis Drug Discovery in the Era of Automation and Artificial Intelligence.

Authors:  José T Moreira-Filho; Arthur C Silva; Rafael F Dantas; Barbara F Gomes; Lauro R Souza Neto; Jose Brandao-Neto; Raymond J Owens; Nicholas Furnham; Bruno J Neves; Floriano P Silva-Junior; Carolina H Andrade
Journal:  Front Immunol       Date:  2021-05-31       Impact factor: 7.561

6.  Development of Machine Learning Models and the Discovery of a New Antiviral Compound against Yellow Fever Virus.

Authors:  Victor O Gawriljuk; Daniel H Foil; Ana C Puhl; Kimberley M Zorn; Thomas R Lane; Olga Riabova; Vadim Makarov; Andre S Godoy; Glaucius Oliva; Sean Ekins
Journal:  J Chem Inf Model       Date:  2021-07-21       Impact factor: 6.162