Machine-learning scoring functions trained on complexes dissimilar to the test set already outperform classical counterparts on a blind benchmark.

Hongjian Li, Gang Lu, Kam-Heung Sze, Xianwei Su, Wai-Yee Chan, Kwong-Sak Leung.

Abstract

The superior performance of machine-learning scoring functions for docking has prompted a series of debates on whether it stems from learning knowledge from training data that are similar in some sense to the test data. Using a systematically revised methodology and a blind benchmark that realistically mimics prospective prediction of binding affinity, we evaluated three widely used classical scoring functions and five machine-learning counterparts calibrated with both random forest and extreme gradient boosting, using both solo and hybrid features. We show for the first time that machine-learning scoring functions trained exclusively on a proportion as low as 8% of complexes dissimilar to the test set already outperform classical scoring functions, a percentage far lower than what has recently been reported on all three CASF benchmarks. The performance of machine-learning scoring functions is underestimated because similar samples are absent from some artificially created training sets, which discard the full spectrum of complexes to be found in a prospective environment. Because some degree of similarity is inevitable in a large dataset, the criterion for selecting a scoring function should be which one makes the best use of all available materials. Software code and data are provided at https://github.com/cusdulab/MLSF so that interested readers can rapidly rebuild the scoring functions, reproduce our results and run extended analyses on their own benchmarks.
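
As a rough illustration of what training "exclusively on complexes dissimilar to the test set" could look like, the sketch below filters a training set by a similarity cutoff. It is not taken from the authors' repository: the function name `dissimilar_training_set`, the complex identifiers, the similarity values and the cutoff of 0.5 are all hypothetical, and the similarity measure itself is left as an assumption.

```python
from typing import Callable, Iterable, List


def dissimilar_training_set(
    train_ids: Iterable[str],
    test_ids: Iterable[str],
    similarity: Callable[[str, str], float],
    cutoff: float,
) -> List[str]:
    """Keep training complexes whose similarity to every test complex is at most `cutoff`."""
    test_ids = list(test_ids)
    return [
        t for t in train_ids
        if all(similarity(t, s) <= cutoff for s in test_ids)
    ]


# Toy, made-up similarity values in [0, 1]; unlisted pairs default to 0.0.
_TOY_SIM = {
    ("train_a", "test_x"): 0.95,
    ("train_b", "test_x"): 0.40,
    ("train_c", "test_y"): 0.70,
    ("train_d", "test_y"): 0.10,
}


def toy_similarity(train_id: str, test_id: str) -> float:
    """Hypothetical stand-in for a real structural or ligand similarity measure."""
    return _TOY_SIM.get((train_id, test_id), 0.0)


train = ["train_a", "train_b", "train_c", "train_d"]
test = ["test_x", "test_y"]

# Only complexes that are dissimilar to every test complex survive the filter.
kept = dissimilar_training_set(train, test, toy_similarity, cutoff=0.5)
fraction_kept = len(kept) / len(train)
print(kept, fraction_kept)
```

A scoring function would then be fitted on `kept` only, and `fraction_kept` corresponds to the kind of proportion the abstract refers to.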
© The Author(s) 2021. Published by Oxford University Press.

Keywords:  binding affinity; blind benchmark; machine learning; random forest; scoring function; scoring power

Year:  2021        PMID: 34169324      PMCID: PMC8575004          DOI: 10.1093/bib/bbab225

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622

