Heterogeneous classifier fusion for ligand-based virtual screening: or, how decision making by committee can be a good thing.

Sereina Riniker, Nikolas Fechner, Gregory A Landrum.

Abstract

The concept of data fusion - the combination of information from different sources describing the same object with the expectation of generating a more accurate representation - has found application in a very broad range of disciplines. In the context of ligand-based virtual screening (VS), data fusion has been applied to combine knowledge from either different active molecules or different fingerprints to improve similarity search performance. Machine-learning (ML) methods based on fusion of multiple homogeneous classifiers, in particular random forests (RF), have also been widely applied in the ML literature. The heterogeneous version of classifier fusion - fusing the predictions from different model types - has been less explored. Here, we investigate heterogeneous classifier fusion for ligand-based VS using three different ML methods, namely RF, naïve Bayes (NB), and logistic regression (LR), with four 2D fingerprints: atom pairs, topological torsions, RDKit fingerprint, and circular fingerprint. The methods are compared using a previously developed benchmarking platform for 2D fingerprints, which is extended to ML methods in this article. The original data sets are filtered for difficulty, and a new set of challenging data sets from ChEMBL is added. Data sets were also generated for a second use case: starting from a small set of related actives instead of diverse actives. The final fused model consistently outperforms the other approaches across the broad variety of targets studied, indicating that heterogeneous classifier fusion is a very promising approach for ligand-based VS. The new data sets together with the adapted source code for ML methods are provided in the Supporting Information.
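The abstract does not state which rule is used to fuse the predictions of the different model types. As an illustration only, the sketch below assumes a simple mean-rank rule: each model ranks the candidate molecules by its predicted score, and the ranks are averaged across models. The function names, the fusion rule, and the example scores are hypothetical and are not taken from the article or its Supporting Information.

```python
from typing import Dict, List


def ranks_from_scores(scores: List[float]) -> List[int]:
    """Convert scores to ranks (1 = highest score).

    Ties are broken by original position, because sorted() is stable.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return ranks


def fuse_by_mean_rank(model_scores: Dict[str, List[float]]) -> List[int]:
    """Return molecule indices ordered best-first by mean rank across models.

    Assumption: mean-rank fusion is used here purely as an example of
    heterogeneous classifier fusion; other rules (max rank, mean score)
    would fit the same interface.
    """
    per_model_ranks = [ranks_from_scores(s) for s in model_scores.values()]
    n_molecules = len(per_model_ranks[0])
    if any(len(r) != n_molecules for r in per_model_ranks):
        raise ValueError("all models must score the same molecules")
    mean_rank = [
        sum(r[i] for r in per_model_ranks) / len(per_model_ranks)
        for i in range(n_molecules)
    ]
    return sorted(range(n_molecules), key=lambda i: mean_rank[i])


# Hypothetical predicted probabilities of activity for four molecules.
example_scores = {
    "RF": [0.9, 0.2, 0.6, 0.1],
    "NB": [0.7, 0.8, 0.3, 0.2],
    "LR": [0.8, 0.1, 0.9, 0.3],
}
fused_order = fuse_by_mean_rank(example_scores)
print(fused_order)
```

Working on ranks rather than raw scores avoids having to calibrate the probability outputs of the different model types against each other, which is one common reason to prefer rank-based fusion in this kind of sketch.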

Year:  2013        PMID: 24171408     DOI: 10.1021/ci400466r

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  12 in total

1.  Idea2Data: Toward a New Paradigm for Drug Discovery.

Authors:  Christos A Nicolaou; Christine Humblet; Hong Hu; Eva M Martin; Frank C Dorsey; Thomas M Castle; Keith Ian Burton; Haitao Hu; Jorg Hendle; Michael J Hickey; Joel Duerksen; Jibo Wang; Jon A Erickson
Journal:  ACS Med Chem Lett       Date:  2019-02-04       Impact factor: 4.345

2.  ROCS-derived features for virtual screening.

Authors:  Steven Kearnes; Vijay Pande
Journal:  J Comput Aided Mol Des       Date:  2016-09-08       Impact factor: 3.686

3.  Data-science based analysis of perceptual spaces of odors in olfactory loss.

Authors:  Jörn Lötsch; Alfred Ultsch; Antje Hähner; Vivien Willgeroth; Moustafa Bensafi; Andrea Zaliani; Thomas Hummel
Journal:  Sci Rep       Date:  2021-05-19       Impact factor: 4.379

4.  S2DV: converting SMILES to a drug vector for predicting the activity of anti-HBV small molecules.

Authors:  Jinsong Shao; Qineng Gong; Zeyu Yin; Wenjie Pan; Sanjeevi Pandiyan; Li Wang
Journal:  Brief Bioinform       Date:  2022-03-10       Impact factor: 11.622

5.  Understanding the foundations of the structural similarities between marketed drugs and endogenous human metabolites.

Authors:  Steve O'Hagan; Douglas B Kell
Journal:  Front Pharmacol       Date:  2015-05-13       Impact factor: 5.810

6.  Condorcet and borda count fusion method for ligand-based virtual screening.

Authors:  Ali Ahmed; Faisal Saeed; Naomie Salim; Ammar Abdo
Journal:  J Cheminform       Date:  2014-05-03       Impact factor: 5.514

7.  CFam: a chemical families database based on iterative selection of functional seeds and seed-directed compound clustering.

Authors:  Cheng Zhang; Lin Tao; Chu Qin; Peng Zhang; Shangying Chen; Xian Zeng; Feng Xu; Zhe Chen; Sheng Yong Yang; Yu Zong Chen
Journal:  Nucleic Acids Res       Date:  2014-11-20       Impact factor: 16.971

8.  MetMaxStruct: A Tversky-Similarity-Based Strategy for Analysing the (Sub)Structural Similarities of Drugs and Endogenous Metabolites.

Authors:  Steve O'Hagan; Douglas B Kell
Journal:  Front Pharmacol       Date:  2016-08-22       Impact factor: 5.810

9.  Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability.

Authors:  Martin Gütlein; Stefan Kramer
Journal:  J Cheminform       Date:  2016-10-31       Impact factor: 5.514

10.  Connecting proteins with drug-like compounds: Open source drug discovery workflows with BindingDB and KNIME.

Authors:  George Nicola; Michael R Berthold; Michael P Hedrick; Michael K Gilson
Journal:  Database (Oxford)       Date:  2015-09-16       Impact factor: 3.451
