Simple sequence-based kernels do not predict protein-protein interactions.

Jiantao Yu, Maozu Guo, Chris J Needham, Yangchao Huang, Lu Cai, David R Westhead.

Abstract

MOTIVATION: A number of methods have been reported that predict protein-protein interactions (PPIs) with high accuracy using only simple sequence-based features such as amino acid 3mer content. This is surprising, given that many protein interactions have high specificity that depends on detailed atomic recognition between physiochemically complementary surfaces. Are the reported high accuracies realistic?
RESULTS: We find that the reported accuracies of the predictions are significantly over-estimated, and strongly dependent on the structure of the training and testing datasets used. The choice of which protein pairs are deemed as non-interactions in the training data has a variable impact on the accuracy estimates, and the accuracies can be artificially inflated by a bias towards dominant samples in the positive data which result from the presence of hub proteins in the protein interaction network. To address this bias, we propose a positive set-specific method to create a 'balanced' negative set maintaining the degree distribution for each protein, leading to the conclusion that simple sequence-based features contain insufficient information to be useful for predicting PPIs, but that protein domain-based features have some predictive value.
AVAILABILITY: Our method, named 'BRS-nonint', is available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/. All the datasets used in this study are derived from publicly available data, and are available at http://www.bioinformatics.leeds.ac.uk/BRS-nonint/PPI_RandomBalance.html
CONTACT: maozuguo@hit.edu.cn; d.r.westhead@leeds.ac.uk.
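The idea of a 'balanced' negative set, in which each protein appears as often among the non-interactions as it does among the interactions, can be illustrated with a short sketch. This is not the authors' BRS-nonint implementation: the function names `balanced_negatives` and `degree_counts`, the shuffle-and-retry strategy, and the `max_rounds` parameter are all assumptions made for illustration.

```python
import random
from collections import Counter


def degree_counts(pairs):
    """Count how many pairs each protein takes part in."""
    return Counter(protein for pair in pairs for protein in pair)


def balanced_negatives(positives, seed=0, max_rounds=1000):
    """Sample negative pairs that try to preserve each protein's positive-set degree.

    Hypothetical illustration only, not the published BRS-nonint algorithm.
    Returns (negatives, unplaced), where `unplaced` lists protein appearances
    that could not be paired within `max_rounds` shuffles.
    """
    rng = random.Random(seed)
    positive_set = {frozenset(pair) for pair in positives}
    # One "stub" per appearance of a protein in the positive set.
    stubs = [protein for pair in positives for protein in pair]
    negatives = set()
    for _ in range(max_rounds):
        if not stubs:
            break
        rng.shuffle(stubs)
        leftover = []
        # Pair consecutive stubs; reject self-pairs, known interactions, duplicates.
        for a, b in zip(stubs[0::2], stubs[1::2]):
            pair = frozenset((a, b))
            if a == b or pair in positive_set or pair in negatives:
                leftover.extend((a, b))
            else:
                negatives.add(pair)
        stubs = leftover
    return sorted(tuple(sorted(pair)) for pair in negatives), stubs
```

In this sketch a hub protein with many known interactions also receives many negative pairs, so a classifier cannot score well merely by learning which proteins are hubs. When `unplaced` comes back empty, `degree_counts(negatives)` equals `degree_counts(positives)`.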

Substances: Proteins

Year: 2010 PMID: 20801913 DOI: 10.1093/bioinformatics/btq483

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

35 in total

1. Revisiting the negative example sampling problem for predicting protein-protein interactions.

Authors: Yungki Park; Edward M Marcotte
Journal: Bioinformatics Date: 2011-09-09 Impact factor: 6.937

2. A Collection of Benchmark Data Sets for Knowledge Graph-based Similarity in the Biomedical Domain.

Authors: Carlota Cardoso; Rita T Sousa; Sebastian Köhler; Catia Pesquita
Journal: Database (Oxford) Date: 2020-01-01 Impact factor: 3.451

3. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks.

Authors: Dhvani Sandip Vora; Yogesh Kalakoti; Durai Sundar
Journal: Methods Mol Biol Date: 2023

4. The development of a universal in silico predictor of protein-protein interactions.

Authors: Guilherme T Valente; Marcio L Acencio; Cesar Martins; Ney Lemke
Journal: PLoS One Date: 2013-05-31 Impact factor: 3.240

5. Rigorous assessment and integration of the sequence and structure based features to predict hot spots.

Authors: Ruoying Chen; Wenjing Chen; Sixiao Yang; Di Wu; Yong Wang; Yingjie Tian; Yong Shi
Journal: BMC Bioinformatics Date: 2011-07-29 Impact factor: 3.169

6. In silico characterization and prediction of global protein-mRNA interactions in yeast.

Authors: Vera Pancaldi; Jürg Bähler
Journal: Nucleic Acids Res Date: 2011-04-01 Impact factor: 16.971

7. Improving accuracy of protein-protein interaction prediction by considering the converse problem for sequence representation.

Authors: Xianwen Ren; Yong-Cui Wang; Yong Wang; Xiang-Sun Zhang; Nai-Yang Deng
Journal: BMC Bioinformatics Date: 2011-10-24 Impact factor: 3.169

8. Predicting the fission yeast protein interaction network.

Authors: Vera Pancaldi; Omer S Saraç; Charalampos Rallis; Janel R McLean; Martin Převorovský; Kathleen Gould; Andreas Beyer; Jürg Bähler
Journal: G3 (Bethesda) Date: 2012-04-01 Impact factor: 3.154

9. Interactome-wide prediction of protein-protein binding sites reveals effects of protein sequence variation in Arabidopsis thaliana.

Authors: Felipe Leal Valentim; Frank Neven; Peter Boyen; Aalt D J van Dijk
Journal: PLoS One Date: 2012-10-15 Impact factor: 3.240

10. Survey of Natural Language Processing Techniques in Bioinformatics. (Review)

Authors: Zhiqiang Zeng; Hua Shi; Yun Wu; Zhiling Hong
Journal: Comput Math Methods Med Date: 2015-10-07 Impact factor: 2.238