Revisiting the negative example sampling problem for predicting protein-protein interactions.

Yungki Park, Edward M Marcotte.

Abstract

MOTIVATION: A number of computational methods have been proposed that predict protein-protein interactions (PPIs) based on protein sequence features. Since the number of potential non-interacting protein pairs (negative PPIs) is very high both in absolute terms and in comparison to that of interacting protein pairs (positive PPIs), computational prediction methods rely upon subsets of negative PPIs for training and validation. Hence, the need arises for subset sampling for negative PPIs.
RESULTS: We clarify that there are two fundamentally different types of subset sampling for negative PPIs. One is subset sampling for cross-validated testing, where one desires unbiased subsets so that predictive performance estimated with them can be safely assumed to generalize to the population level. The other is subset sampling for training, where one desires the subsets that best train predictive algorithms, even if these subsets are biased. We show that confusion between these two fundamentally different types of subset sampling led one study recently published in Bioinformatics to the erroneous conclusion that predictive algorithms based on protein sequence features are hardly better than random in predicting PPIs. Rather, both protein sequence features and the 'hubbiness' of interacting proteins contribute to effective prediction of PPIs. We provide guidance for appropriate use of random versus balanced sampling.
AVAILABILITY: The datasets used for this study are available at http://www.marcottelab.org/PPINegativeDataSampling.
CONTACT: yungki@mail.utexas.edu; marcotte@icmb.utexas.edu.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
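
The contrast between random and balanced sampling of negative PPIs can be made concrete with a small sketch. The abstract does not define "balanced" sampling, so the code below assumes one common reading: each protein appears among the negatives exactly as often as it appears among the positives. The function names, the toy network, and the rejection-sampling approach are illustrative assumptions, not the authors' procedure.

```python
import random
from collections import Counter
from itertools import combinations


def random_negatives(proteins, positives, n, seed=0):
    """Uniformly sample n non-interacting pairs (no constraint on per-protein counts)."""
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}
    # Every unordered protein pair that is not a known positive is a candidate.
    candidates = [pair for pair in combinations(sorted(proteins), 2)
                  if frozenset(pair) not in pos]
    return rng.sample(candidates, n)


def balanced_negatives(positives, seed=0, max_tries=10000):
    """Sample negatives so each protein occurs as often as it does among positives.

    Assumed definition of 'balanced'; implemented by naive rejection sampling,
    which can fail on networks where no such negative set exists.
    """
    rng = random.Random(seed)
    pos = {frozenset(p) for p in positives}
    # One entry per appearance of a protein in the positive set.
    stubs = [prot for pair in positives for prot in pair]
    for _ in range(max_tries):
        rng.shuffle(stubs)
        pairs = [frozenset(stubs[i:i + 2]) for i in range(0, len(stubs), 2)]
        no_self_pairs = all(len(p) == 2 for p in pairs)
        no_positives = all(p not in pos for p in pairs)
        no_duplicates = len(set(pairs)) == len(pairs)
        if no_self_pairs and no_positives and no_duplicates:
            return [tuple(sorted(p)) for p in pairs]
    raise RuntimeError("no balanced negative set found within max_tries")


if __name__ == "__main__":
    # Hypothetical toy network: two triangles of interacting proteins.
    positives = [("A", "B"), ("A", "C"), ("B", "C"),
                 ("D", "E"), ("D", "F"), ("E", "F")]
    proteins = "ABCDEF"
    print("random:  ", random_negatives(proteins, positives, 6))
    print("balanced:", balanced_negatives(positives))
    print("positive counts:", Counter(p for pair in positives for p in pair))
```

Under this reading, the random sampler matches the abstract's description of an unbiased subset for testing, while the balanced sampler deliberately ties per-protein frequency in the negatives to that in the positives, which is the kind of biased subset the abstract says may still be useful for training.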

Year:  2011        PMID: 21908540      PMCID: PMC3198576          DOI: 10.1093/bioinformatics/btr514

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  25 in total

1.  Predicting protein--protein interactions from primary structure.

Authors:  J R Bock; D A Gough
Journal:  Bioinformatics       Date:  2001-05       Impact factor: 6.937

2.  Learning to predict protein-protein interactions from protein sequences.

Authors:  Shawn M Gomez; William Stafford Noble; Andrey Rzhetsky
Journal:  Bioinformatics       Date:  2003-10-12       Impact factor: 6.937

3.  Simple sequence-based kernels do not predict protein-protein interactions.

Authors:  Jiantao Yu; Maozu Guo; Chris J Needham; Yangchao Huang; Lu Cai; David R Westhead
Journal:  Bioinformatics       Date:  2010-08-27       Impact factor: 6.937

4.  A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae.

Authors:  P Uetz; L Giot; G Cagney; T A Mansfield; R S Judson; J R Knight; D Lockshon; V Narayan; M Srinivasan; P Pochart; A Qureshi-Emili; Y Li; B Godwin; D Conover; T Kalbfleisch; G Vijayadamodar; M Yang; M Johnston; S Fields; J M Rothberg
Journal:  Nature       Date:  2000-02-10       Impact factor: 49.962

5.  Correlated sequence-signatures as markers of protein-protein interaction.

Authors:  E Sprinzak; H Margalit
Journal:  J Mol Biol       Date:  2001-08-24       Impact factor: 5.469

6.  A comprehensive two-hybrid analysis to explore the yeast protein interactome.

Authors:  T Ito; T Chiba; R Ozawa; M Yoshida; M Hattori; Y Sakaki
Journal:  Proc Natl Acad Sci U S A       Date:  2001-03-13       Impact factor: 11.205

7.  Predicting protein-protein interactions in unbalanced data using the primary structure of proteins.

Authors:  Chi-Yuan Yu; Lih-Ching Chou; Darby Tien-Hao Chang
Journal:  BMC Bioinformatics       Date:  2010-04-02       Impact factor: 3.169

8.  Human Protein Reference Database--2009 update.

Authors:  T S Keshava Prasad; Renu Goel; Kumaran Kandasamy; Shivakumar Keerthikumar; Sameer Kumar; Suresh Mathivanan; Deepthi Telikicherla; Rajesh Raju; Beema Shafreen; Abhilash Venugopal; Lavanya Balakrishnan; Arivusudar Marimuthu; Sutopa Banerjee; Devi S Somanathan; Aimy Sebastian; Sandhya Rani; Somak Ray; C J Harrys Kishore; Sashi Kanth; Mukhtar Ahmed; Manoj K Kashyap; Riaz Mohmood; Y L Ramachandra; V Krishna; B Abdul Rahiman; Sujatha Mohan; Prathibha Ranganathan; Subhashri Ramabadran; Raghothama Chaerkady; Akhilesh Pandey
Journal:  Nucleic Acids Res       Date:  2008-11-06       Impact factor: 16.971

9.  Exploiting amino acid composition for predicting protein-protein interactions.

Authors:  Sushmita Roy; Diego Martinez; Harriett Platero; Terran Lane; Margaret Werner-Washburne
Journal:  PLoS One       Date:  2009-11-20       Impact factor: 3.240

10.  Critical assessment of sequence-based protein-protein interaction prediction methods that do not require homologous protein sequences.

Authors:  Yungki Park
Journal:  BMC Bioinformatics       Date:  2009-12-14       Impact factor: 3.169

  21 in total

1.  Protein-protein interaction and non-interaction predictions using gene sequence natural vector.

Authors:  Nan Zhao; Maji Zhuo; Kun Tian; Xinqi Gong
Journal:  Commun Biol       Date:  2022-07-02

Review 2.  On protocols and measures for the validation of supervised methods for the inference of biological networks.

Authors:  Marie Schrynemackers; Robert Küffner; Pierre Geurts
Journal:  Front Genet       Date:  2013-12-03       Impact factor: 4.599

3.  Machine learning-based chemical binding similarity using evolutionary relationships of target genes.

Authors:  Keunwan Park; Young-Joon Ko; Prasannavenkatesh Durai; Cheol-Ho Pan
Journal:  Nucleic Acids Res       Date:  2019-11-18       Impact factor: 16.971

4.  Improving the measurement of semantic similarity between gene ontology terms and gene products: insights from an edge- and IC-based hybrid method.

Authors:  Xiaomei Wu; Erli Pang; Kui Lin; Zhen-Ming Pei
Journal:  PLoS One       Date:  2013-05-31       Impact factor: 3.240

5.  The development of a universal in silico predictor of protein-protein interactions.

Authors:  Guilherme T Valente; Marcio L Acencio; Cesar Martins; Ney Lemke
Journal:  PLoS One       Date:  2013-05-31       Impact factor: 3.240

6.  Probabilistic inference of biological networks via data integration.

Authors:  Mark F Rogers; Colin Campbell; Yiming Ying
Journal:  Biomed Res Int       Date:  2015-03-22       Impact factor: 3.411

7.  Computational Prediction of Protein-Protein Interaction Networks: Algorithms and Resources.

Authors:  Javad Zahiri; Joseph Hannon Bozorgmehr; Ali Masoudi-Nejad
Journal:  Curr Genomics       Date:  2013-09       Impact factor: 2.236

Review 8.  Recent applications of deep learning and machine intelligence on in silico drug discovery: methods, tools and databases.

Authors:  Ahmet Sureyya Rifaioglu; Heval Atas; Maria Jesus Martin; Rengul Cetin-Atalay; Volkan Atalay; Tunca Doğan
Journal:  Brief Bioinform       Date:  2019-09-27       Impact factor: 11.622

Review 9.  Machine learning and genome annotation: a match meant to be?

Authors:  Kevin Y Yip; Chao Cheng; Mark Gerstein
Journal:  Genome Biol       Date:  2013-05-29       Impact factor: 13.583

10.  Efficient prediction of human protein-protein interactions at a global scale.

Authors:  Andrew Schoenrock; Bahram Samanfar; Sylvain Pitre; Mohsen Hooshyar; Ke Jin; Charles A Phillips; Hui Wang; Sadhna Phanse; Katayoun Omidi; Yuan Gui; Md Alamgir; Alex Wong; Fredrik Barrenäs; Mohan Babu; Mikael Benson; Michael A Langston; James R Green; Frank Dehne; Ashkan Golshani
Journal:  BMC Bioinformatics       Date:  2014-12-10       Impact factor: 3.169

