Open Source Bayesian Models. 2. Mining a "Big Dataset" To Create and Validate Models with ChEMBL.

Alex M Clark, Sean Ekins.

Abstract

In an associated paper, we described a reference implementation of Laplacian-corrected naïve Bayesian model building using extended connectivity (ECFP) and molecular function class (FCFP) fingerprints of maximum diameter 6. As a follow-up, we have now undertaken a large-scale validation study to ensure that the technique generalizes to a broad variety of drug discovery datasets. To achieve this, we used the ChEMBL (version 20) database and split it into more than 2000 separate datasets, each of which consists of compounds and measurements with the same target and activity measurement. To test these datasets with the two-state Bayesian classification, we developed an automated algorithm for detecting a suitable threshold for active/inactive designation, which we applied to all collections. With these datasets, we were able to establish that our Bayesian model implementation is effective for the large majority of cases, and we were able to quantify the impact of fingerprint folding on the receiver operating characteristic (ROC) curve cross-validation metrics. We were also able to study the impact that the choice of training/testing set partitioning has on the resulting recall rates. The datasets have been made publicly available for download, along with the corresponding model data files, which can be used in conjunction with the CDK and several mobile apps. We have also explored some novel visualization methods that leverage the structural origins of the ECFP/FCFP fingerprints to identify the regions of a molecule responsible for positive and negative contributions to activity. The ability to score molecules against thousands of relevant datasets spanning multiple organisms may also help to assess desirable and undesirable off-target effects, as well as suggest potential targets for compounds derived from phenotypic screens.
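The modeling approach named in the abstract can be illustrated with a minimal sketch. It is not the authors' CDK implementation. It assumes the commonly cited form of the Laplacian-corrected estimator, in which a feature seen in T_i molecules, A_i of them active, contributes log((A_i + 1) / (T_i * P + 1)), where P is the overall fraction of actives. Plain integers stand in for hashed ECFP/FCFP features, folding is assumed to be a bitmask onto a power-of-two size, and the names `fold`, `train`, and `score` are hypothetical.

```python
import math
from collections import Counter


def fold(features, size=None):
    """Fold hashed feature indices into `size` bins; None means no folding.

    Assumption: folding keeps the low bits of each hash, so `size` must be
    a positive power of two.
    """
    if size is None:
        return set(features)
    if size <= 0 or size & (size - 1):
        raise ValueError("size must be a positive power of two")
    return {f & (size - 1) for f in features}


def train(fingerprints, labels, size=None):
    """Return a dict mapping each feature to its Laplacian-corrected log contribution."""
    total = len(fingerprints)
    base_rate = sum(labels) / total  # P: overall fraction of actives
    seen = Counter()         # T_i: molecules containing feature i
    seen_active = Counter()  # A_i: active molecules containing feature i
    for fp, active in zip(fingerprints, labels):
        for f in fold(fp, size):
            seen[f] += 1
            if active:
                seen_active[f] += 1
    return {
        f: math.log((seen_active[f] + 1.0) / (seen[f] * base_rate + 1.0))
        for f in seen
    }


def score(model, fp, size=None):
    """Sum the contributions of the features present; unseen features add 0."""
    return sum(model.get(f, 0.0) for f in fold(fp, size))


# Toy data: features 1-3 occur only in actives, features 4-6 only in inactives.
fps = [{1, 2}, {1, 3}, {4, 5}, {4, 6}]
labels = [1, 1, 0, 0]
model = train(fps, labels)
print(score(model, {1, 2}) > score(model, {4, 5}))
```

The raw score is an unnormalized sum of log ratios, so an active/inactive cutoff on it would still need to be calibrated separately. Passing a smaller `size` to `train` and `score` merges features that collide after folding, which is the effect the study quantifies on the ROC metrics.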

Year:  2015        PMID: 25995041     DOI: 10.1021/acs.jcim.5b00144

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  44 in total

1.  Assessment of Substrate-Dependent Ligand Interactions at the Organic Cation Transporter OCT2 Using Six Model Substrates.

Authors:  Philip J Sandoval; Kimberley M Zorn; Alex M Clark; Sean Ekins; Stephen H Wright
Journal:  Mol Pharmacol       Date:  2018-06-08       Impact factor: 4.436

2.  Making Transporter Models for Drug-Drug Interaction Prediction Mobile.

Authors:  Sean Ekins; Alex M Clark; Stephen H Wright
Journal:  Drug Metab Dispos       Date:  2015-07-21       Impact factor: 3.922

3.  Machine Learning Platform to Discover Novel Growth Inhibitors of Neisseria gonorrhoeae.

Authors:  Janaina Cruz Pereira; Samer S Daher; Kimberley M Zorn; Matthew Sherwood; Riccardo Russo; Alexander L Perryman; Xin Wang; Madeleine J Freundlich; Sean Ekins; Joel S Freundlich
Journal:  Pharm Res       Date:  2020-07-13       Impact factor: 4.200

4.  Doing it All - How Families are Reshaping Rare Disease Research.

Authors:  Sean Ekins; Ethan O Perlstein
Journal:  Pharm Res       Date:  2018-08-16       Impact factor: 4.200

5.  High-throughput screening and Bayesian machine learning for copper-dependent inhibitors of Staphylococcus aureus.

Authors:  Alex G Dalecki; Kimberley M Zorn; Alex M Clark; Sean Ekins; Whitney T Narmore; Nichole Tower; Lynn Rasmussen; Robert Bostwick; Olaf Kutsch; Frank Wolschendorf
Journal:  Metallomics       Date:  2019-03-20       Impact factor: 4.526

6.  Efficacy of Tilorone Dihydrochloride against Ebola Virus Infection.

Authors:  Sean Ekins; Mary A Lingerfelt; Jason E Comer; Alexander N Freiberg; Jon C Mirsalis; Kathleen O'Loughlin; Anush Harutyunyan; Claire McFarlane; Carol E Green; Peter B Madrid
Journal:  Antimicrob Agents Chemother       Date:  2018-01-25       Impact factor: 5.191

7.  Lack of Influence of Substrate on Ligand Interaction with the Human Multidrug and Toxin Extruder, MATE1.

Authors:  Lucy J Martínez-Guerrero; Mark Morales; Sean Ekins; Stephen H Wright
Journal:  Mol Pharmacol       Date:  2016-07-14       Impact factor: 4.436

8.  Comparing Machine Learning Algorithms for Predicting Drug-Induced Liver Injury (DILI).

Authors:  Eni Minerali; Daniel H Foil; Kimberley M Zorn; Thomas R Lane; Sean Ekins
Journal:  Mol Pharm       Date:  2020-06-08       Impact factor: 4.939

9.  Comparing Machine Learning Models for Aromatase (P450 19A1).

Authors:  Kimberley M Zorn; Daniel H Foil; Thomas R Lane; Wendy Hillwalker; David J Feifarek; Frank Jones; William D Klaren; Ashley M Brinkman; Sean Ekins
Journal:  Environ Sci Technol       Date:  2020-11-19       Impact factor: 9.028

10.  Comparison of Machine Learning Models for the Androgen Receptor.

Authors:  Kimberley M Zorn; Daniel H Foil; Thomas R Lane; Wendy Hillwalker; David J Feifarek; Frank Jones; William D Klaren; Ashley M Brinkman; Sean Ekins
Journal:  Environ Sci Technol       Date:  2020-10-21       Impact factor: 9.028
