Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark.

Hongjian Li¹, Gang Lu², Kam-Heung Sze³, Xianwei Su¹, Wai-Yee Chan⁴, Kwong-Sak Leung⁵.

Abstract

The superior performance of machine-learning scoring functions for docking has prompted a series of debates on whether it stems from learning knowledge from training data that are similar in some sense to the test data. With a systematically revised methodology and a blind benchmark realistically mimicking the process of prospective prediction of binding affinity, we have evaluated three broadly used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting using both solo and hybrid features, showing for the first time that machine-learning scoring functions trained exclusively on a proportion as low as 8% of complexes dissimilar to the test set already outperform classical scoring functions, a percentage far lower than what has been recently reported on all three CASF benchmarks. The performance of machine-learning scoring functions is underestimated when similar samples are absent from artificially created training sets that discard the full spectrum of complexes to be found in a prospective environment. Given that some degree of similarity is inevitable in any large dataset, the criterion for selecting a scoring function should be which one makes the best use of all available data. Software code and data are provided at https://github.com/cusdulab/MLSF for interested readers to rapidly rebuild the scoring functions, reproduce our results, and even run extended analyses on their own benchmarks.
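The two ideas the abstract relies on, restricting the training set to complexes dissimilar to the test set and measuring scoring power, can be illustrated with a minimal sketch. This is not the authors' code (that is in the repository linked above). The function names, the 0-to-1 similarity scale, the cutoff value, and all toy numbers are assumptions invented for illustration, and Pearson correlation is used here as a common measure of scoring power rather than as a confirmed detail of this study.

```python
from math import sqrt


def dissimilar_training_subset(max_similarity, cutoff):
    """Return IDs of training complexes whose highest similarity to any
    test complex is at or below `cutoff`.

    `max_similarity` maps a (hypothetical) complex ID to its maximum
    similarity against the test set, on an assumed 0..1 scale.
    """
    return [cid for cid, sim in max_similarity.items() if sim <= cutoff]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data, invented for illustration only.
max_similarity = {"c1": 0.15, "c2": 0.92, "c3": 0.40, "c4": 0.99, "c5": 0.30}

# Keep only training complexes dissimilar to the test set.
subset = dissimilar_training_subset(max_similarity, cutoff=0.40)
proportion = len(subset) / len(max_similarity)

# Scoring power: correlation between predicted and measured affinities
# on the test set (values are made up).
measured = [4.2, 6.1, 7.8, 5.5]
predicted = [4.8, 5.9, 7.1, 5.0]
rp = pearson(predicted, measured)

print(f"training subset: {subset} ({proportion:.0%} of complexes)")
print(f"Pearson correlation on test set: {rp:.3f}")
```

In a real experiment, a model such as a random forest would be fitted on the features of the selected subset, and the cutoff would be swept to see how scoring power changes as more similar complexes are admitted.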

Entities: Chemical

Keywords: binding affinity; blind benchmark; machine learning; random forest; scoring function; scoring power

Substances: Ligands

Year: 2021 PMID: 34169324 PMCID: PMC8575004 DOI: 10.1093/bib/bbab225

Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
