Evaluation of different biological data and computational classification methods for use in protein interaction prediction.

Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman.

Abstract

Protein-protein interactions play a key role in many biological systems. High-throughput methods can directly detect the set of interacting proteins in yeast, but the results are often incomplete and exhibit high false-positive and false-negative rates. Recently, many research groups independently proposed using supervised learning methods to integrate direct and indirect biological data sources for protein interaction prediction. However, their data sources, approaches, and implementations varied. Furthermore, the protein interaction prediction task itself can be subdivided into prediction of (1) physical interaction, (2) co-complex relationship, and (3) pathway co-membership. To investigate systematically the utility of different data sources, and of the way the data are encoded as features, for predicting each of these types of protein interaction, we assembled a large set of biological features and varied their encoding for use in each of the three prediction tasks. Six different classifiers were used to assess prediction accuracy: Random Forest (RF), RF similarity-based k-Nearest-Neighbor, Naïve Bayes, Decision Tree, Logistic Regression, and Support Vector Machine. For all classifiers, the three prediction tasks had different success rates, and co-complex prediction appears to be an easier task than the other two. Independent of the prediction task, however, the RF classifier consistently ranked among the top two classifiers for all combinations of feature sets. We therefore used this classifier to study the importance of different biological datasets. First, we used the splitting function of the RF tree structure, the Gini index, to estimate feature importance. Second, we determined classification accuracy when only the top-ranking features were used as input to the classifier. We find that the importance of different features depends on the specific prediction task and the way the features are encoded. Strikingly, gene expression is consistently the most important feature for all three prediction tasks, while the protein interactions identified using the yeast two-hybrid system were not among the top-ranking features under any condition. © 2006 Wiley-Liss, Inc.
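The Gini index mentioned in the abstract can be made concrete with a small sketch. It assumes binary labels (1 = interacting protein pair, 0 = non-interacting) and uses made-up toy values that do not reproduce any result of the study; the feature names `feature_a`, `feature_b`, `feature_c` and the helper functions are hypothetical. As a deliberate simplification, each feature is scored by its single best Gini decrease on the whole dataset. A Random Forest instead accumulates the Gini decrease at every node where a feature is used, across all trees, and averages over the forest, so this is a stand-in for illustration and not the authors' procedure.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_gini_decrease(values: Sequence[float], labels: Sequence[int]) -> float:
    """Largest weighted Gini decrease over all splits 'value <= t' on one feature."""
    n = len(labels)
    parent = gini(labels)
    best = 0.0
    for t in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        if not left or not right:
            continue  # a split must leave examples on both sides
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def rank_features(rows: Sequence[Sequence[float]], labels: Sequence[int],
                  names: Sequence[str]) -> List[Tuple[str, float]]:
    """Score each feature column by its best single-split Gini decrease, highest first."""
    scores: Dict[str, float] = {
        name: best_gini_decrease([row[j] for row in rows], labels)
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: each row is one protein pair, label 1 = interacting.
names = ["feature_a", "feature_b", "feature_c"]
rows = [
    (0.9, 0, 0.2),
    (0.8, 1, 0.7),
    (0.7, 0, 0.1),
    (0.2, 1, 0.9),
    (0.1, 0, 0.3),
    (0.3, 1, 0.6),
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels, names)
selected = [name for name, _ in ranking[:2]]  # keep only the top-ranking features
print(selected)  # → ['feature_a', 'feature_c']
```

Retraining a classifier on only the `selected` columns and comparing its accuracy with the full-feature model corresponds loosely to the second step described in the abstract. Library implementations such as scikit-learn's `RandomForestClassifier.feature_importances_` report the forest-wide mean decrease in impurity rather than this single-split score.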

Year:  2006        PMID: 16450363      PMCID: PMC3250929          DOI: 10.1002/prot.20865

Source DB:  PubMed          Journal:  Proteins        ISSN: 0887-3585


Cited by: 103 articles in total

1.  Bayesian neural adjustment of inhibitory control predicts emergence of problem stimulant use.

Authors:  Katia M Harlé; Jennifer L Stewart; Shunan Zhang; Susan F Tapert; Angela J Yu; Martin P Paulus
Journal:  Brain       Date:  2015-09-03       Impact factor: 13.501

2.  Identifying important risk factors for survival in patient with systolic heart failure using random survival forests.

Authors:  Eileen Hsich; Eiran Z Gorodeski; Eugene H Blackstone; Hemant Ishwaran; Michael S Lauer
Journal:  Circ Cardiovasc Qual Outcomes       Date:  2010-11-23

3.  Atypical cytostatic mechanism of N-1-sulfonylcytosine derivatives determined by in vitro screening and computational analysis.

Authors:  Fran Supek; Marijeta Kralj; Marko Marjanović; Lidija Suman; Tomislav Smuc; Irena Krizmanić; Biserka Zinić
Journal:  Invest New Drugs       Date:  2007-09-27       Impact factor: 3.850

4.  Global networks of functional coupling in eukaryotes from comprehensive data integration.

Authors:  Andrey Alexeyenko; Erik L L Sonnhammer
Journal:  Genome Res       Date:  2009-02-25       Impact factor: 9.043

5.  Protein interaction predictions from diverse sources. [Review]

Authors:  Yin Liu; Inyoung Kim; Hongyu Zhao
Journal:  Drug Discov Today       Date:  2008-03-06       Impact factor: 7.851

6.  Prediction of human functional genetic networks from heterogeneous data using RVM-based ensemble learning.

Authors:  Chia-Chin Wu; Shahab Asgharzadeh; Timothy J Triche; David Z D'Argenio
Journal:  Bioinformatics       Date:  2010-02-04       Impact factor: 6.937

7.  Large-scale de novo prediction of physical protein-protein association.

Authors:  Antigoni Elefsinioti; Ömer Sinan Saraç; Anna Hegele; Conrad Plake; Nina C Hubner; Ina Poser; Mihail Sarov; Anthony Hyman; Matthias Mann; Michael Schroeder; Ulrich Stelzl; Andreas Beyer
Journal:  Mol Cell Proteomics       Date:  2011-08-11       Impact factor: 5.911

8.  Ranking relations using analogies in biological and information networks.

Authors:  Ricardo Silva; Katherine Heller; Zoubin Ghahramani; Edoardo M Airoldi
Journal:  Ann Appl Stat       Date:  2010-08-03       Impact factor: 2.083

9.  An introduction to recursive partitioning: rationale, application, and characteristics of classification and regression trees, bagging, and random forests.

Authors:  Carolin Strobl; James Malley; Gerhard Tutz
Journal:  Psychol Methods       Date:  2009-12

10.  Predicting sulfotyrosine sites using the random forest algorithm with significantly improved prediction accuracy.

Authors:  Zheng Rong Yang
Journal:  BMC Bioinformatics       Date:  2009-10-29       Impact factor: 3.169
