Michelle S Scott (1), Geoffrey J Barton. (1) School of Life Sciences Research, College of Life Sciences, University of Dundee, Scotland, UK. michelle@compbio.dundee.ac.uk
Abstract
BACKGROUND: Although the prediction of protein-protein interactions has been extensively investigated for yeast, few such datasets exist for the far larger human proteome. Furthermore, it has recently been estimated that the overall average false positive rate of available computational and high-throughput experimental interaction datasets is as high as 90%.
RESULTS: The prediction of human protein-protein interactions was investigated by combining orthogonal protein features within a probabilistic framework. The features include co-expression, orthology to known interacting proteins, and the full-Bayesian combination of subcellular localization, co-occurrence of domains and post-translational modifications. A novel scoring function for local network topology was also investigated. This topology feature greatly enhanced the predictions and, together with the full-Bayes combined features, made the largest contribution to them. Using a conservative threshold, our most accurate predictor identifies 37606 human interactions, 32892 (80%) of which are not present in other publicly available large human interaction datasets, thus substantially increasing the coverage of the human interaction map. A subset of the 32892 novel predicted interactions has been independently validated. Comparison of the prediction dataset to other available human interaction datasets estimates the false positive rate of the new method to be below 80%, which is competitive with other methods. Since the new method scores and ranks all human protein pairs, smaller subsets of higher quality can be generated, leading to even lower false positive rates.
CONCLUSION: The set of interactions predicted in this work increases the coverage of the human interaction map and will help determine the highest confidence human interactions.
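The abstract describes combining several protein-pair features within a probabilistic framework and then ranking all pairs so that a conservative threshold can be applied. As a rough illustration only, the sketch below combines per-feature likelihood ratios into posterior odds, assuming the feature groups are conditionally independent (a naive-Bayes-style assumption that the abstract does not state). The prior odds, the likelihood ratios, the threshold, the protein names and the function names are all hypothetical and are not taken from the paper.

```python
import math


def posterior_odds(prior_odds, likelihood_ratios):
    """Multiply prior odds by each feature's likelihood ratio.

    Assumes the features are conditionally independent (illustrative
    assumption). Work in log space to avoid underflow with many features.
    """
    log_odds = math.log(prior_odds)
    for lr in likelihood_ratios.values():
        log_odds += math.log(lr)
    return math.exp(log_odds)


def rank_pairs(pairs, prior_odds):
    """Score every protein pair and sort from most to least likely."""
    scored = [(pair, posterior_odds(prior_odds, lrs)) for pair, lrs in pairs.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical prior: one true interaction per 400 candidate pairs.
PRIOR_ODDS = 1 / 400

# Hypothetical likelihood ratios for the feature groups named in the abstract.
pairs = {
    ("P1", "P2"): {
        "coexpression": 4.0,
        "orthology": 50.0,
        "combined_full_bayes": 10.0,  # localization + domains + PTMs
        "topology": 5.0,
    },
    ("P3", "P4"): {
        "coexpression": 0.5,
        "orthology": 1.0,
        "combined_full_bayes": 2.0,
        "topology": 1.0,
    },
}

ranked = rank_pairs(pairs, PRIOR_ODDS)

# Keep only pairs whose posterior odds exceed 1 (hypothetical threshold).
predicted = [pair for pair, odds in ranked if odds > 1.0]
```

Raising the threshold in the last step keeps a smaller, higher-confidence subset of the ranked list, which is the trade-off the abstract refers to when it mentions generating smaller subsets of higher quality.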