Beware of machine learning-based scoring functions-on the danger of developing black boxes.

Joffrey Gabel, Jérémy Desaphy, Didier Rognan.

Abstract

Training machine learning algorithms on protein-ligand descriptors to predict binding constants from atomic coordinates has recently gained considerable attention. Starting from a series of recent reports stating the advantages of this approach over empirical scoring functions, we could indeed reproduce the claimed superiority of Random Forest and Support Vector Machine-based scoring functions in predicting experimental binding constants from protein-ligand X-ray structures of the PDBBind dataset. Strikingly, these scoring functions, trained on simple protein-ligand element-element distance counts, were almost unable to enrich virtual screening hit lists in true actives upon docking experiments on 10 reference DUD-E datasets, a capability that was nevertheless verified for an a priori less accurate empirical scoring function (Surflex-Dock). By systematically varying ligand poses from true X-ray coordinates, we show that the Surflex-Dock scoring function is, as expected, sensitive to the quality of docking poses. Conversely, our machine learning-based scoring functions are totally insensitive to docking poses (up to 10 Å root-mean-square deviation) and just describe atomic element counts. This report does not disqualify using machine learning algorithms to design scoring functions. Protein-ligand element-element distance counts should, however, be used with extreme caution and only applied in a meaningful way. To avoid developing novel but meaningless scoring functions, we propose that two additional benchmarking tests be systematically performed when developing novel scoring functions: (i) sensitivity to docking pose accuracy, and (ii) ability to enrich hit lists in true actives upon structure-based (docking, receptor-ligand pharmacophore) virtual screening of reference datasets.
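The two benchmarking tests proposed in the abstract can be illustrated with a minimal Python sketch. It is not the authors' protocol. The function names (`rmsd`, `pose_sensitivity`, `enrichment_factor`), the use of a Pearson correlation between pose RMSD and score as the sensitivity measure, and the "higher score is better" convention are all assumptions made for illustration. The RMSD is computed in place, without superposition, on poses given as equally ordered lists of (x, y, z) tuples.

```python
import math


def rmsd(pose, reference):
    """In-place RMSD (in Å) between two poses with identical atom ordering."""
    if len(pose) != len(reference) or not pose:
        raise ValueError("poses must be non-empty and have the same atom count")
    squared = sum((a - b) ** 2 for p, r in zip(pose, reference) for a, b in zip(p, r))
    return math.sqrt(squared / len(pose))


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a score that never changes carries no pose information
    return cov / math.sqrt(var_x * var_y)


def pose_sensitivity(score_fn, reference, poses):
    """Test (i): correlation between each pose's RMSD to the reference and its score.

    score_fn is a hypothetical callable mapping a pose to a score (higher = better).
    """
    rmsds = [rmsd(p, reference) for p in poses]
    scores = [score_fn(p) for p in poses]
    return pearson(rmsds, scores)


def enrichment_factor(scores, is_active, fraction=0.01):
    """Test (ii): hit rate in the top-scored fraction divided by the overall hit rate."""
    n = len(scores)
    n_actives = sum(1 for a in is_active if a)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    n_top = max(1, round(fraction * n))
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    hits = sum(1 for _, active in ranked[:n_top] if active)
    return (hits / n_top) / (n_actives / n)


# Toy data (not from the paper): a two-atom reference pose and shifted copies.
reference = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
poses = [[(x + shift, y, z) for x, y, z in reference] for shift in (0.0, 1.0, 2.0, 5.0)]

count_only = lambda pose: float(len(pose))        # depends only on atom counts
distance_aware = lambda pose: -rmsd(pose, reference)  # penalizes displaced poses

print(pose_sensitivity(count_only, reference, poses))
print(pose_sensitivity(distance_aware, reference, poses))
```

In this toy setting, a score that depends only on atom counts gives a sensitivity of zero, while a score that penalizes displacement gives a strongly negative correlation. An enrichment factor near 1 indicates a ranking no better than random selection.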

Year: 2014 PMID: 25207678 DOI: 10.1021/ci500406k

Source DB: PubMed Journal: J Chem Inf Model ISSN: 1549-9596 Impact factor: 4.956

Cited by

27 in total

1. Docking pose selection by interaction pattern graph similarity: application to the D3R grand challenge 2015.

Authors: Inna Slynko; Franck Da Silva; Guillaume Bret; Didier Rognan
Journal: J Comput Aided Mol Des Date: 2016-08-01 Impact factor: 3.686

2. Improving scoring-docking-screening powers of protein-ligand scoring functions using random forest.

Authors: Cheng Wang; Yingkai Zhang
Journal: J Comput Chem Date: 2016-11-17 Impact factor: 3.376

3. Convex-PL: a novel knowledge-based potential for protein-ligand interactions deduced from structural databases using convex optimization.

Authors: Maria Kadukova; Sergei Grudinin
Journal: J Comput Aided Mol Des Date: 2017-09-18 Impact factor: 3.686

4. Nonparametric chemical descriptors for the calculation of ligand-biopolymer affinities with machine-learning scoring functions.

Authors: Edelmiro Moman; Maria A Grishina; Vladimir A Potemkin
Journal: J Comput Aided Mol Des Date: 2019-11-14 Impact factor: 3.686

5. Protein-Ligand Scoring with Convolutional Neural Networks.

6. Incorporating Explicit Water Molecules and Ligand Conformation Stability in Machine-Learning Scoring Functions.

7. A D3R prospective evaluation of machine learning for protein-ligand scoring.

Review 8. A review of mathematical representations of biomolecular data.

9. Interaction with specific HSP90 residues as a scoring function: validation in the D3R Grand Challenge 2015.

Review 10. Application of Computational Biology and Artificial Intelligence Technologies in Cancer Precision Drug Discovery.

Authors: Nagasundaram Nagarajan; Edward K Y Yapp; Nguyen Quoc Khanh Le; Balu Kamaraj; Abeer Mohammed Al-Subaie; Hui-Yuan Yeh
Journal: Biomed Res Int Date: 2019-11-11 Impact factor: 3.411