Beware of machine learning-based scoring functions: on the danger of developing black boxes.

Joffrey Gabel, Jérémy Desaphy, Didier Rognan.

Abstract

Training machine learning algorithms with protein-ligand descriptors has recently gained considerable attention as a way to predict binding constants from atomic coordinates. Starting from a series of recent reports stating the advantages of this approach over empirical scoring functions, we could indeed reproduce the claimed superiority of Random Forest and Support Vector Machine-based scoring functions in predicting experimental binding constants from protein-ligand X-ray structures of the PDBBind dataset. Strikingly, these scoring functions, trained on simple protein-ligand element-element distance counts, were almost unable to enrich virtual screening hit lists in true actives when docking 10 reference DUD-E datasets; an a priori less accurate empirical scoring function (Surflex-Dock), by contrast, did achieve such enrichment. By systematically varying ligand poses away from the true X-ray coordinates, we show that the Surflex-Dock scoring function is, as expected, sensitive to the quality of docking poses. Conversely, our machine learning-based scoring functions are totally insensitive to docking poses (up to 10 Å root-mean-square deviation) and merely describe atomic element counts. This report does not disqualify the use of machine learning algorithms to design scoring functions. Protein-ligand element-element distance counts should, however, be used with extreme caution and only applied in a meaningful way. To avoid developing novel but meaningless scoring functions, we propose that two additional benchmarking tests be systematically performed when developing novel scoring functions: (i) sensitivity to docking pose accuracy, and (ii) ability to enrich hit lists in true actives upon structure-based (docking, receptor-ligand pharmacophore) virtual screening of reference datasets.
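The two proposed benchmarks can be illustrated with a minimal Python sketch. It is not the authors' code or protocol: the function names (`pair_counts`, `translate`, `enrichment_factor`), the toy coordinates, and both cutoff values are assumptions chosen for illustration, and the abstract does not state which distance cutoff the descriptors used. The first part counts protein-ligand element pairs within a cutoff and displaces the ligand to check whether the descriptor changes; the second part computes a simple enrichment factor for a ranked hit list.

```python
import math
from collections import Counter


def pair_counts(protein, ligand, cutoff):
    """Count protein-ligand element pairs closer than `cutoff` (angstroms).

    Atoms are (element, (x, y, z)) tuples. The cutoff is a free parameter here;
    the abstract does not specify the value used in the study.
    """
    counts = Counter()
    for p_elem, p_xyz in protein:
        for l_elem, l_xyz in ligand:
            if math.dist(p_xyz, l_xyz) < cutoff:
                counts[(p_elem, l_elem)] += 1
    return counts


def translate(atoms, shift):
    """Return a copy of `atoms` rigidly displaced by `shift` (a crude 'wrong pose')."""
    return [(elem, tuple(c + s for c, s in zip(xyz, shift))) for elem, xyz in atoms]


def enrichment_factor(scores, is_active, top_fraction=0.01):
    """Active rate in the top-scored fraction divided by the overall active rate.

    Assumes a higher score means a better predicted binder.
    """
    n = len(scores)
    total_actives = sum(is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(n * top_fraction))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits_in_top = sum(active for _, active in ranked[:n_top])
    return (hits_in_top / n_top) / (total_actives / n)


# Toy system (hypothetical coordinates, in angstroms).
protein = [("C", (0.0, 0.0, 0.0)), ("N", (3.0, 0.0, 0.0)), ("O", (0.0, 3.0, 0.0))]
ligand = [("C", (1.0, 1.0, 1.0)), ("O", (2.0, 1.0, 0.0))]
displaced = translate(ligand, (2.0, 0.0, 0.0))  # ligand moved 2 angstroms along x

for cutoff in (12.0, 2.0):  # illustrative long and short cutoffs
    same = pair_counts(protein, ligand, cutoff) == pair_counts(protein, displaced, cutoff)
    print(f"cutoff {cutoff:>4} A: descriptor unchanged by displacement = {same}")

# Toy screening list: 2 actives among 10 compounds, both ranked first.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print("EF at top 20%:", enrichment_factor(scores, labels, top_fraction=0.2))
```

In this toy system the long cutoff captures every atom pair both before and after the displacement, so the descriptor reduces to a count of element types and carries no pose information, whereas the short cutoff changes with the pose. A real pose-sensitivity test would use actual docking poses at increasing RMSD from the X-ray structure and a trained scoring function rather than raw counts.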

Year:  2014        PMID: 25207678     DOI: 10.1021/ci500406k

Source DB:  PubMed          Journal:  J Chem Inf Model        ISSN: 1549-9596            Impact factor:   4.956


  27 in total

1.  Docking pose selection by interaction pattern graph similarity: application to the D3R grand challenge 2015.

Authors:  Inna Slynko; Franck Da Silva; Guillaume Bret; Didier Rognan
Journal:  J Comput Aided Mol Des       Date:  2016-08-01       Impact factor: 3.686

2.  Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest.

Authors:  Cheng Wang; Yingkai Zhang
Journal:  J Comput Chem       Date:  2016-11-17       Impact factor: 3.376

3.  Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization.

Authors:  Maria Kadukova; Sergei Grudinin
Journal:  J Comput Aided Mol Des       Date:  2017-09-18       Impact factor: 3.686

4.  Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions.

Authors:  Edelmiro Moman; Maria A Grishina; Vladimir A Potemkin
Journal:  J Comput Aided Mol Des       Date:  2019-11-14       Impact factor: 3.686

5.  Protein-Ligand Scoring with Convolutional Neural Networks.

Authors:  Matthew Ragoza; Joshua Hochuli; Elisa Idrobo; Jocelyn Sunseri; David Ryan Koes
Journal:  J Chem Inf Model       Date:  2017-04-11       Impact factor: 4.956

6.  Incorporating Explicit Water Molecules and Ligand Conformation Stability in Machine-Learning Scoring Functions.

Authors:  Jianing Lu; Xuben Hou; Cheng Wang; Yingkai Zhang
Journal:  J Chem Inf Model       Date:  2019-10-31       Impact factor: 4.956

7.  A D3R prospective evaluation of machine learning for protein-ligand scoring.

Authors:  Jocelyn Sunseri; Matthew Ragoza; Jasmine Collins; David Ryan Koes
Journal:  J Comput Aided Mol Des       Date:  2016-09-03       Impact factor: 3.686

Review 8.  A review of mathematical representations of biomolecular data.

Authors:  Duc Duy Nguyen; Zixuan Cang; Guo-Wei Wei
Journal:  Phys Chem Chem Phys       Date:  2020-02-26       Impact factor: 3.676

9.  Interaction with specific HSP90 residues as a scoring function: validation in the D3R Grand Challenge 2015.

Authors:  Diogo Santos-Martins
Journal:  J Comput Aided Mol Des       Date:  2016-08-22       Impact factor: 3.686

Review 10.  Application of Computational Biology and Artificial Intelligence Technologies in Cancer Precision Drug Discovery.

Authors:  Nagasundaram Nagarajan; Edward K Y Yapp; Nguyen Quoc Khanh Le; Balu Kamaraj; Abeer Mohammed Al-Subaie; Hui-Yuan Yeh
Journal:  Biomed Res Int       Date:  2019-11-11       Impact factor: 3.411
