Simple sequence-based kernels do not predict protein-protein interactions.

Jiantao Yu, Maozu Guo, Chris J Needham, Yangchao Huang, Lu Cai, David R Westhead.

Abstract

MOTIVATION: A number of methods have been reported that predict protein-protein interactions (PPIs) with high accuracy using only simple sequence-based features such as amino acid 3mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physicochemically complementary surfaces. Are the reported high accuracies realistic?
RESULTS: We find that the reported accuracies of the predictions are significantly over-estimated, and strongly dependent on the structure of the training and testing datasets used. The choice of which protein pairs are deemed as non-interactions in the training data has a variable impact on the accuracy estimates, and the accuracies can be artificially inflated by a bias towards dominant samples in the positive data which result from the presence of hub proteins in the protein interaction network. To address this bias, we propose a positive set-specific method to create a 'balanced' negative set maintaining the degree distribution for each protein, leading to the conclusion that simple sequence-based features contain insufficient information to be useful for predicting PPIs, but that protein domain-based features have some predictive value.
AVAILABILITY: Our method, named 'BRS-nonint', is available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/. All the datasets used in this study are derived from publicly available data, and are available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/PPI_RandomBalance.html
CONTACT: maozuguo@hit.edu.cn; d.r.westhead@leeds.ac.uk.
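
The idea of a 'balanced' negative set, in which each protein appears among the non-interactions as often as it appears among the interactions, can be illustrated with a short sketch. This is not the authors' BRS-nonint implementation; it is a minimal illustration that assumes a simple shuffle-and-reject scheme, and the function name `balanced_negatives` and its parameters are hypothetical. It also assumes the positive list contains no duplicate pairs.

```python
import random
from collections import Counter


def balanced_negatives(positives, seed=0, max_rounds=1000):
    """Sample non-interacting pairs whose per-protein counts follow the positive set.

    Hypothetical sketch: each protein contributes one "stub" per positive
    pair it appears in; stubs are shuffled and paired, and pairs that are
    self-pairs, known positives, or repeats are rejected and retried.
    Returns (negatives, leftover_stubs); leftover_stubs is non-empty when
    the degree distribution could not be matched within max_rounds.
    """
    rng = random.Random(seed)
    positive_set = {frozenset(pair) for pair in positives}
    # One stub per appearance of a protein in the positive set.
    stubs = [protein for pair in positives for protein in pair]
    negatives = set()
    for _ in range(max_rounds):
        if len(stubs) < 2:
            break
        rng.shuffle(stubs)
        leftover = []
        for a, b in zip(stubs[0::2], stubs[1::2]):
            pair = frozenset((a, b))
            if a == b or pair in positive_set or pair in negatives:
                # Put both stubs back so they can be re-paired next round.
                leftover.extend((a, b))
            else:
                negatives.add(pair)
        stubs = leftover
    return [tuple(sorted(pair)) for pair in negatives], stubs


# Toy example (made-up protein names): "A" acts as a hub with degree 3.
positives = [("A", "B"), ("A", "C"), ("A", "D"), ("E", "F"), ("G", "H"), ("B", "E")]
negatives, unmatched = balanced_negatives(positives, seed=1)
negative_degree = Counter(protein for pair in negatives for protein in pair)
print(negatives, unmatched, negative_degree)
```

In this sketch a hub protein is forced to appear in the negative set as often as in the positive set, so a classifier cannot score well merely by learning which proteins are frequent in the positives. For strongly connected hubs, exact matching may be impossible, which is why unmatched stubs are returned rather than hidden.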

Year: 2010    PMID: 20801913    DOI: 10.1093/bioinformatics/btq483

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

