
Effect of training datasets on support vector machine prediction of protein-protein interactions.

Siaw Ling Lo, Cong Zhong Cai, Yu Zong Chen, Maxey C M Chung.

Abstract

Knowledge of protein-protein interactions is useful for elucidating protein function via the concept of 'guilt-by-association'. A statistical learning method, the support vector machine (SVM), has recently been explored for predicting protein-protein interactions using artificially shuffled sequences as hypothetical noninteracting proteins, and it has shown promising results (Bock, J. R., Gough, D. A., Bioinformatics 2001, 17, 455-460). It remains unclear, however, how prediction accuracy is affected if real protein sequences are used to represent noninteracting proteins. In this work, this effect is assessed by comparing the results obtained with real protein sequences against those obtained with shuffled sequences. The real protein sequences of hypothetical noninteracting proteins are generated from an exclusion analysis combined with subcellular localization information for interacting proteins found in the Database of Interacting Proteins. Prediction accuracy is 76.9% using real protein sequences, compared with 94.1% using artificial shuffled sequences. The discrepancy likely arises because separating two sets of real protein sequences is expected to be harder than separating a set of real protein sequences from a set of artificial sequences. Training an SVM classification system on real protein sequences is expected to give better prediction results in practical cases. This is tested by using both SVM systems to predict putative protein partners of a set of thioredoxin-related proteins. The prediction results are consistent with observations, suggesting that real sequences are more practically useful for developing SVM classification systems that facilitate protein-protein interaction prediction.
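The two ways of building a negative (noninteracting) training set contrasted in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything here is an assumption made for illustration: the function names `shuffled_negative` and `exclusion_negatives`, the toy protein identifiers and compartments, and the rule that a pair counts as a hypothetical noninteractor when it is absent from the known interactions and its members are annotated to different subcellular compartments.

```python
import random
from itertools import combinations


def shuffled_negative(seq, rng):
    """Return a residue-shuffled copy of seq (same composition, scrambled order)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


def exclusion_negatives(localization, known_pairs):
    """Pair real proteins that are not known to interact and that are
    annotated to different compartments (an assumed exclusion rule)."""
    known = {frozenset(pair) for pair in known_pairs}
    negatives = []
    for a, b in combinations(sorted(localization), 2):
        if frozenset((a, b)) in known:
            continue  # exclude documented interactions
        if localization[a] == localization[b]:
            continue  # co-localized proteins might still interact
        negatives.append((a, b))
    return negatives


# Hypothetical toy data, not taken from the Database of Interacting Proteins.
localization = {
    "P1": "nucleus",
    "P2": "nucleus",
    "P3": "mitochondrion",
    "P4": "cytoplasm",
}
known_pairs = [("P1", "P2"), ("P1", "P3")]

negs = exclusion_negatives(localization, known_pairs)
shuffled = shuffled_negative("MKTAYIAKQR", random.Random(0))
```

A shuffled sequence keeps the amino acid composition of its source but is no longer a real protein, which is one plausible reason a classifier separates it from real sequences more easily than it separates two sets of real sequences.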

Year: 2005; PMID: 15717327; DOI: 10.1002/pmic.200401118

Source DB: PubMed; Journal: Proteomics; ISSN: 1615-9853; Impact factor: 3.984

