
Benchmarking protein-protein interface predictions: why you should care about protein size.

Juliette Martin

Abstract

A number of predictive methods have been developed to predict protein-protein binding sites. Each new method is traditionally benchmarked using sets of protein structures of various sizes, and global statistics are used to assess the quality of the prediction. Little attention has been paid to the potential bias due to protein size on these statistics. Indeed, small proteins involve proportionally more residues at interfaces than large ones. If a predictive method is biased toward small proteins, this can lead to an over-estimation of its performance. Here, we investigate the bias due to the size effect when benchmarking protein-protein interface prediction on the widely used docking benchmark 4.0. First, we simulate random scores that favor small proteins over large ones. Instead of the 0.5 AUC (Area Under the Curve) value expected by chance, these biased scores result in an AUC equal to 0.6 using hypergeometric distributions, and up to 0.65 using constant scores. We then use real prediction results to illustrate how to detect the size bias by shuffling, and subsequently correct it using a simple conversion of the scores into normalized ranks. In addition, we investigate the scores produced by eight published methods and show that they are all affected by the size effect, which can change their relative ranking. The size effect also has an impact on linear combination scores by modifying the relative contributions of each method. In the future, systematic corrections should be applied when benchmarking predictive methods using data sets with mixed protein sizes.
© 2014 Wiley Periodicals, Inc.
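The size effect, the shuffling check, and the rank correction described in the abstract can be illustrated with a small simulation. The sketch below is not the author's code: the protein sizes, the interface-size rule (`2 * sqrt(n)` residues), the score model (a per-protein constant `1/n` plus a small jitter), the reading of "shuffling" as a permutation of scores within each protein, and all function names are assumptions chosen only to reproduce the qualitative behaviour. Scores carry no residue-level information, yet the pooled AUC rises above 0.5 because small proteins have both higher scores and a larger interface fraction.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks of x, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(np.asarray(x), return_inverse=True,
                                   return_counts=True)
    last = np.cumsum(counts)      # highest rank in each tie group
    first = last - counts + 1     # lowest rank in each tie group
    return ((first + last) / 2.0)[inverse]


def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = int(labels.sum())
    n_neg = labels.size - n_pos
    u = average_ranks(scores)[labels].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def normalized_ranks(scores, protein_ids):
    """Replace each score by its within-protein rank, scaled into (0, 1)."""
    scores = np.asarray(scores, dtype=float)
    out = np.empty_like(scores)
    for pid in np.unique(protein_ids):
        mask = protein_ids == pid
        out[mask] = (average_ranks(scores[mask]) - 0.5) / mask.sum()
    return out


def shuffle_within_protein(scores, protein_ids, rng):
    """Permute scores inside each protein (assumed form of the shuffling check)."""
    out = np.array(scores, dtype=float)
    for pid in np.unique(protein_ids):
        mask = protein_ids == pid
        out[mask] = rng.permutation(out[mask])
    return out


rng = np.random.default_rng(0)
sizes = np.linspace(50, 500, 200).astype(int)   # hypothetical protein sizes

score_parts, label_parts, id_parts = [], [], []
for pid, n in enumerate(sizes):
    # Assumption: interface size grows like sqrt(n), so small proteins
    # expose a larger fraction of their residues at the interface.
    n_interface = int(round(2 * np.sqrt(n)))
    lab = np.zeros(n, dtype=bool)
    lab[rng.choice(n, size=n_interface, replace=False)] = True
    label_parts.append(lab)
    # Size-biased score with no residue-level signal: higher for small proteins.
    score_parts.append(1.0 / n + 1e-4 * rng.random(n))
    id_parts.append(np.full(n, pid))

scores = np.concatenate(score_parts)
labels = np.concatenate(label_parts)
protein_ids = np.concatenate(id_parts)

raw_auc = auc(scores, labels)
shuffled_auc = auc(shuffle_within_protein(scores, protein_ids, rng), labels)
corrected_auc = auc(normalized_ranks(scores, protein_ids), labels)

print(f"raw AUC:       {raw_auc:.3f}")
print(f"shuffled AUC:  {shuffled_auc:.3f}")
print(f"corrected AUC: {corrected_auc:.3f}")
```

Under these assumptions the raw and shuffled AUC values should sit clearly above 0.5, while the AUC computed on normalized ranks should fall back close to 0.5. A shuffled AUC that stays above chance is the warning sign: the apparent performance comes from differences between proteins, not from ranking residues within a protein.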

Keywords:  assessment; benchmark; bias; interface; machine learning; prediction; protein; protein-protein interaction; structure

Year:  2014        PMID: 24420747     DOI: 10.1002/prot.24512

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


