
UniHI: an entry gate to the human protein interactome.

Gautam Chaurasia, Yasir Iqbal, Christian Hänig, Hanspeter Herzel, Erich E Wanker, Matthias E Futschik.

Abstract

Systematic mapping of protein-protein interactions has become a central task of functional genomics. To map the human interactome, several strategies have recently been pursued. The generated interaction datasets are valuable resources for scientists in biology and medicine. However, comparison reveals limited overlap between different interaction networks. This divergence obstructs usability, as researchers have to interrogate numerous heterogeneous datasets to identify potential interaction partners for proteins of interest. To facilitate direct access through a single entry gate, we have started to integrate currently available human protein interaction data in an easily accessible online database. It is called UniHI (Unified Human Interactome) and is available at http://www.mdc-berlin.de/unihi. At present, it is based on 10 major interaction maps derived by computational and experimental methods. It includes more than 150 000 distinct interactions between more than 17 000 unique human proteins. UniHI provides researchers with a flexible integrated tool for finding and using comprehensive information about the human interactome.

Year: 2006 PMID: 17158159 PMCID: PMC1781159 DOI: 10.1093/nar/gkl817

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Protein-protein interactions (PPIs) are central to many if not all cellular processes. Their importance has provoked broad interest in their analysis, which in turn has led to the construction of various large-scale interaction maps. The first PPI datasets were generated for model organisms such as Saccharomyces cerevisiae, Drosophila melanogaster and Caenorhabditis elegans (1–5). Recently, the focus has shifted towards the systematic mapping of human PPIs. Both computationally and experimentally derived interaction datasets have been produced. They are mostly based on review of literature (6–8), extrapolation from interactions between orthologous proteins observed in other organisms (9–11) or application of high-throughput yeast two-hybrid (Y2H) assays (12,13). Although these maps will certainly have profound impact on biological research, major limitations are lack of overlap, completeness and integration. Scientists are required to interrogate numerous databases if they seek comprehensive information on potential interaction partners for specific human proteins. This generally involves time-consuming searches as various query formats and identifiers have to be used in different interaction databases. Some datasets are even stored in simple flat files. To overcome these obstacles, we have constructed the UniHI database for the integration of large-scale human PPI maps. UniHI offers a search platform that combines and gives access to ten different large-scale human PPI datasets. It includes over 150 000 interactions between more than 17 000 proteins. UniHI is intended to reduce unnecessary duplication of data, while incorporating the strength of single databases regarding careful curation and annotation of PPIs.

HIGH DIVERGENCE OF HUMAN PPI DATASETS

The construction of UniHI was motivated by the observation that human interaction maps tend to be highly divergent (14,15). This is also the case for the interaction maps integrated in UniHI (Table 1). We observed that fewer than 10% of all interactions occur in multiple maps, indicating a low degree of saturation (Figure 1B and Supplementary Data). The small number of shared interactions is remarkable considering the large number of proteins common to different datasets: more than 50% of all proteins are included in two or more maps (Figure 1A). Thus, current PPI datasets are highly complementary, sharing few interactions between many common proteins.

Table 1

PPI datasets currently integrated in UniHI

Dataset	Proteins	Interactions	Method	References	Database location
MDC-Y2H	1703	3186	Y2H screen	(12)
CCSB-Y2H	1549	2754	Y2H screen	(13)	(flat file only)
CCSB-LIT	2192	4067	Text mining	(13)	(flat file only)
HPRD-BIN	5908	15 508	Literature	(8)
HPRD-COMP	1277	4468	Literature	(8)
DIP	1033	1303	Literature	(7)
BIND	4273	5863	Literature	(6)
COCIT	3737	6580	Text mining	(10)
REACTOME	679	12 639	Literature	(16)
ORTHO	6225	71 466	Orthology	(11)
HOMOMINT	4127	10 174	Orthology	(17)
OPHID	4785	24 991	Orthology	(9)

Number of proteins and interactions in each dataset as well as construction approach are given.

Figure 1

Numbers of proteins (A) and interactions (B) common to multiple maps. The histograms display frequency of proteins and interactions that are included in N different maps. Comparisons were performed after mapping of proteins to their corresponding Entrez Gene IDs.
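The counting behind such histograms can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function names, the map names and the toy gene IDs are hypothetical, and interactions are assumed to be undirected pairs of Entrez Gene IDs.

```python
from collections import Counter

def occurrence_histogram(maps):
    """Count how many items occur in exactly N of the given maps.

    maps: dict of map name -> set of items (protein IDs or interaction pairs).
    Returns a dict of N -> number of items found in exactly N maps.
    """
    per_item = Counter()
    for items in maps.values():
        per_item.update(set(items))  # each map counts an item at most once
    return dict(Counter(per_item.values()))

def undirected(a, b):
    """Represent an interaction as an order-independent pair of gene IDs."""
    return frozenset((a, b))

# Toy data: hypothetical gene IDs and map names, not taken from UniHI.
toy_maps = {
    "MAP_A": {undirected(1, 2), undirected(2, 3)},
    "MAP_B": {undirected(2, 1), undirected(3, 4)},
    "MAP_C": {undirected(4, 5)},
}
interaction_hist = occurrence_histogram(toy_maps)

# Derive the protein content of each map from its interactions.
protein_maps = {
    name: {p for pair in pairs for p in pair} for name, pairs in toy_maps.items()
}
protein_hist = occurrence_histogram(protein_maps)

print(interaction_hist)  # one interaction shared by two maps, three unique
print(protein_hist)      # four proteins shared by two maps, one unique
```

Even in this toy case most proteins appear in two maps while only one interaction does, which mirrors the pattern described for Figure 1.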

INTEGRATION OF PPI DATASETS

We have started to integrate available large-scale human PPI maps in UniHI. In its initial version, UniHI is based on the unification of the following recently generated interaction datasets: MDC-Y2H, CCSB, HPRD, DIP, BIND, COCIT, REACTOME, ORTHO, HOMOMINT and OPHID (Table 1). These maps have been derived from manually curated databases (6–8,16), computational approaches employing text-mining (13,17), predictions based on orthology (9–11) and large Y2H screenings (12,13). For details see Supplementary Data. Matching of protein identifiers, which is essential for standardization, was performed using information from Ensmart and HGNC (18,19). For the combined map, we could assign 150 992 interactions between 17 064 unique proteins. For user friendliness, some modifications of the integrated datasets were carried out. First, we wanted to indicate whether interactions are binary or complex. Most of the included interactions are binary, while REACTOME comprises only complex interactions and HPRD comprises both binary and complex PPIs. To enable users to distinguish easily between the two types, we have split interaction data from HPRD into two sets (HPRD-BIN, HPRD-COMP). Secondly, we facilitated differentiation between PPIs identified with different strategies, as the choice of mapping approach has considerable impact on the PPIs detected. Maps based on multiple approaches were divided according to the methods used. CCSB data were divided into Y2H- and literature-based interaction maps (CCSB-Y2H, CCSB-LIT). OPHID comprises orthology-based PPIs as well as interactions imported from other databases; we included only the orthology-derived PPIs.
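The unification step can be pictured as follows. This is a minimal Python sketch under stated assumptions, not the import pipeline actually used for UniHI: the function `unify`, the dataset names and all identifiers are hypothetical, and the identifier lookup is assumed to be a plain dictionary that would in practice be built from resources such as those cited above.

```python
def unify(datasets, id_map):
    """Merge per-dataset interaction lists into one map keyed by unified IDs.

    datasets: dict of dataset name -> list of (id_a, id_b) in source identifiers
    id_map:   dict of source identifier -> unified identifier (hypothetical)
    Returns (unified, unmapped). unified maps each undirected pair to the set
    of datasets reporting it; unmapped lists pairs with an unresolved ID.
    """
    unified = {}
    unmapped = []
    for name, pairs in datasets.items():
        for a, b in pairs:
            ua, ub = id_map.get(a), id_map.get(b)
            if ua is None or ub is None:
                unmapped.append((name, a, b))  # keep for manual inspection
                continue
            key = tuple(sorted((ua, ub)))      # A-B and B-A become one entry
            unified.setdefault(key, set()).add(name)
    return unified, unmapped

# Toy input: all identifiers and dataset names are made up for illustration.
toy_datasets = {
    "Y2H_TOY": [("symA", "symB")],
    "LIT_TOY": [("accB", "accA"), ("accA", "accX")],
}
toy_id_map = {"symA": 101, "accA": 101, "symB": 102, "accB": 102}

unified_toy, unmapped_toy = unify(toy_datasets, toy_id_map)
print(unified_toy)
print(unmapped_toy)
```

Keeping the set of source datasets per pair, rather than a single merged flag, preserves the information needed later to distinguish mapping approaches.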

DATABASE STRUCTURE AND IMPLEMENTATION

The structure of the UniHI database has been designed to integrate PPI data obtained from different sources. UniHI is implemented as a relational database using the open source MySQL database management system. It consists of six key tables: Protein, ProteinAliases, ProteinDistribution, InteractionDistribution, InteractionProperties and InteractionScore. It links the proteins with information about their properties, their interactions and their distribution in the different PPI datasets (Supplementary Figure S1). A full description of the UniHI database structure and its implementation can be found in the Supplementary Data.
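To make the relational layout more concrete, the sketch below builds three of the six tables and runs a partner query. Only the table names come from the text. Every column name, the query and the sample rows are assumptions made for illustration, and SQLite (via Python's standard `sqlite3` module) stands in for MySQL so that the example is self-contained; the actual schema is described in the Supplementary Data.

```python
import sqlite3

# Hypothetical columns; only the table names are taken from the article.
SCHEMA = """
CREATE TABLE Protein (
    protein_id INTEGER PRIMARY KEY,
    symbol     TEXT NOT NULL
);
CREATE TABLE ProteinAliases (
    alias      TEXT NOT NULL,
    id_type    TEXT NOT NULL,
    protein_id INTEGER NOT NULL REFERENCES Protein(protein_id)
);
CREATE TABLE InteractionDistribution (
    protein_a INTEGER NOT NULL REFERENCES Protein(protein_id),
    protein_b INTEGER NOT NULL REFERENCES Protein(protein_id),
    dataset   TEXT NOT NULL
);
"""

def partners(conn, alias):
    """Return (partner symbol, dataset) rows for a protein given any alias."""
    query = """
        SELECT p.symbol, d.dataset
        FROM ProteinAliases a
        JOIN InteractionDistribution d
          ON a.protein_id IN (d.protein_a, d.protein_b)
        JOIN Protein p
          ON p.protein_id = CASE WHEN d.protein_a = a.protein_id
                                 THEN d.protein_b ELSE d.protein_a END
        WHERE a.alias = ?
        ORDER BY p.symbol, d.dataset
    """
    return conn.execute(query, (alias,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Placeholder rows, not real UniHI content.
conn.executemany("INSERT INTO Protein VALUES (?, ?)",
                 [(1, "PROT_A"), (2, "PROT_B"), (3, "PROT_C")])
conn.executemany("INSERT INTO ProteinAliases VALUES (?, ?, ?)",
                 [("PROT_A", "symbol", 1), ("ACC_A", "accession", 1)])
conn.executemany("INSERT INTO InteractionDistribution VALUES (?, ?, ?)",
                 [(1, 2, "MAP_X"), (2, 1, "MAP_Y"), (3, 1, "MAP_X")])

partner_rows = partners(conn, "ACC_A")
print(partner_rows)
```

Resolving the query through an alias table is one way to let the same lookup work for any supported identifier type.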

DATA ACCESS

Our aim was to provide easy and intuitive, but nevertheless efficient and comprehensive access to the integrated data. UniHI is accessible via a web server at http://www.mdc-berlin.de/unihi. A search interface based on the Java programming language offers two different search options. In a single protein search, users input a single protein to query for its direct interaction partners. In a network-oriented multiple protein search, users can supply a list of proteins. Proteins can be entered by their corresponding gene symbol, Entrez Gene ID, Uniprot ID, Unigene ID, OMIM ID, NCBI Geneinfo ID or Ensembl ID. A visualization tool for interaction data with various features has been implemented. We utilized and extended a pre-existing Java applet for graphical presentation of interaction networks (20). Retrieved interactions can be displayed either in textual (Figure 2) or graphical form (Figure 3). For both types of views, interactions are directly hyperlinked to the maps from which they originate, with the exception of OPHID, for technical reasons, and CCSB, which is only available as a text file. To facilitate the interpretation of results, characteristic sets of colors were used to distinguish maps as well as mapping approaches.

Figure 2

Textual representation of a query result for protein interactions in UniHI. For each interaction partner found, a hyperlink is provided to the database from which the interaction originates. Multiple links indicate inclusion in multiple maps. For easy discrimination between maps, specific colors have been assigned. Shades of blue have been used for datasets derived by literature search, shades of green for orthology-based maps, shades of red for maps derived from Y2H screens.

Figure 3

Graphical representation of PPIs. After retrieval, users of UniHI can visualize the interactions as graphs with interactions displayed as lines. (A) Output of the query for interaction partners of TP53. (B–D) Output for a query with multiple proteins (TP53, CDC2, E2F4, HD, A2M and GADD45A). Several features allow quick assessment of results. Line color indicates the map from which the interaction is derived. Multiple lines between proteins signify presence in several maps. Queried proteins are symbolized by gray rectangles, their interacting partners by yellow rectangles. To assist users in the evaluation of results, several tools are offered to restrict the interaction sets displayed according to selection of maps, multiple occurrences, common neighbourhood of proteins (C) and direct interactions between query proteins (D).

To permit users a highly targeted search, UniHI offers several tools to specify the displayed interactions: (i) Display only interactions from selected maps. This option can be used to exclude certain mapping approaches. (ii) Display only proteins that are common interaction partners to multiple proteins in the query. Such a procedure can narrow down the context of a chosen set of proteins and can help to identify putative modifiers of physiological processes (12). (iii) Display only interactions that occur in multiple maps. This approach may be used to gain confidence in interactions retrieved (21). (iv) Display only direct interactions between query proteins. This option can be used for the identification of protein complexes.
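The four display options can be expressed as simple filters over a unified interaction map. The Python sketch below is illustrative only and does not reproduce the UniHI implementation: the function names are hypothetical, the data structure (a dictionary from undirected protein pairs to the set of maps reporting them) is an assumption, and "common interaction partners" is interpreted here as partners shared by at least two query proteins.

```python
def neighbors(unified, protein):
    """Proteins directly linked to `protein` in the unified map."""
    return {b if a == protein else a for (a, b) in unified if protein in (a, b)}

def from_maps(unified, selected):
    """(i) Keep interactions reported by at least one selected map."""
    return {pair: maps & selected
            for pair, maps in unified.items() if maps & selected}

def common_partners(unified, query, min_shared=2):
    """(ii) Proteins that interact with at least `min_shared` query proteins."""
    counts = {}
    for q in query:
        for n in neighbors(unified, q) - {q}:  # ignore self-interactions
            counts[n] = counts.get(n, 0) + 1
    return {n for n, c in counts.items() if c >= min_shared}

def in_multiple_maps(unified, min_maps=2):
    """(iii) Keep interactions that occur in at least `min_maps` maps."""
    return {pair: maps for pair, maps in unified.items() if len(maps) >= min_maps}

def direct_between(unified, query):
    """(iv) Keep interactions whose partners are both query proteins."""
    return {pair: maps for pair, maps in unified.items() if set(pair) <= set(query)}

# Toy unified map with placeholder protein and map names.
toy_unified = {
    ("A", "B"): {"M1", "M2"},
    ("A", "C"): {"M1"},
    ("B", "C"): {"M3"},
    ("C", "D"): {"M2"},
}
toy_query = {"A", "B"}

print(from_maps(toy_unified, {"M3"}))
print(common_partners(toy_unified, toy_query))
print(in_multiple_maps(toy_unified))
print(direct_between(toy_unified, toy_query))
```

Because each filter takes and returns plain dictionaries or sets, the options can be combined, for example by applying the map selection first and the multiple-occurrence filter afterwards.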

SCOPE OF UniHI AND FUTURE DIRECTIONS

The aim of UniHI is to provide a unified set of protein interactions included in the major human PPI maps that are publicly available. As these are constantly extended, this demands ongoing integration of additional interaction data. UniHI has been designed with an open structure permitting future integration of further human interactome datasets. Links to already included maps will be updated every three months. Currently, Perl scripts with integrated SQL commands are used to preprocess and import interaction data after manual download from the corresponding web pages. For future versions of UniHI, we aim to automate this process. Detailed information about the updating procedure can be found in the Supplementary Data. To examine the constitution of UniHI, extensive statistical analysis was performed regarding network structure and functional annotation of integrated datasets. We also scrutinized the reliability of interaction maps using independent expression data and annotation (see Supplementary Data). Since the scope of UniHI can be expected to expand continuously, these analyses will be regularly repeated and presented on the UniHI webpage. This allows users to critically assess the single maps included in UniHI as well as UniHI itself. To assess the quality of the interaction data, information on co-expression and co-annotation is presented for each interaction pair. We also list how protein interactions were validated in each dataset. Additionally, UniHI provides available links to the original PubMed articles that were used for curation in literature-based interaction maps.
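As an illustration of what a co-expression value for an interaction pair might look like, the sketch below computes a Pearson correlation between two expression profiles. The choice of Pearson correlation is an assumption made here for illustration; the measure actually used in UniHI is described in the Supplementary Data, and the profiles shown are made-up numbers.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a flat profile
    return sxy / math.sqrt(sxx * syy)

# Made-up expression values across four hypothetical samples.
profile_a = [1.0, 2.0, 3.0, 4.0]
profile_b = [2.0, 4.0, 6.0, 8.0]
profile_c = [4.0, 3.0, 2.0, 1.0]

coexpr_ab = pearson(profile_a, profile_b)  # perfectly co-expressed
coexpr_ac = pearson(profile_a, profile_c)  # perfectly anti-correlated
print(coexpr_ab, coexpr_ac)
```

A value near 1 would support an interaction between co-regulated proteins, whereas values near 0 carry little information either way.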

CONCLUSIONS

Increasing numbers of human PPI datasets provide enormous amounts of valuable, but frequently unconnected information whose application in biology and medicine is still limited (22–24). Lack of integration and overlap need to be addressed more strongly with experimental and bioinformatical strategies. UniHI constitutes a highly practical integrated platform that allows simultaneous querying of the major human protein-protein interaction maps. It does not replace already available interaction maps, but facilitates single portal access to the larger part of the human interactome analyzed so far. UniHI enables the assembly of comprehensive lists of protein interactions and flexible network-orientated searching. It allows identification of network structures which would not be detectable if single maps were analyzed separately. UniHI is a flexible tool for the systematic utilization of human interactome data in biomedical research.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR online.

23 in total

1. The Database of Interacting Proteins: 2004 update.

Authors: Lukasz Salwinski; Christopher S Miller; Adam J Smith; Frank K Pettit; James U Bowie; David Eisenberg
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. A human protein-protein interaction network: a resource for annotating the proteome.

Authors: Ulrich Stelzl; Uwe Worm; Maciej Lalowski; Christian Haenig; Felix H Brembeck; Heike Goehler; Martin Stroedicke; Martina Zenkner; Anke Schoenherr; Susanne Koeppen; Jan Timm; Sascha Mintzlaff; Claudia Abraham; Nicole Bock; Silvia Kietzmann; Astrid Goedde; Engin Toksöz; Anja Droege; Sylvia Krobitsch; Bernhard Korn; Walter Birchmeier; Hans Lehrach; Erich E Wanker
Journal: Cell Date: 2005-09-23 Impact factor: 41.582

3. Online predicted human interaction database.

Authors: Kevin R Brown; Igor Jurisica
Journal: Bioinformatics Date: 2005-01-18 Impact factor: 6.937

4. Predictive models of molecular machines involved in Caenorhabditis elegans early embryogenesis.

Authors: Kristin C Gunsalus; Hui Ge; Aaron J Schetter; Debra S Goldberg; Jing-Dong J Han; Tong Hao; Gabriel F Berriz; Nicolas Bertin; Jerry Huang; Ling-Shiang Chuang; Ning Li; Ramamurthy Mani; Anthony A Hyman; Birte Sönnichsen; Christophe J Echeverri; Frederick P Roth; Marc Vidal; Fabio Piano
Journal: Nature Date: 2005-08-11 Impact factor: 49.962

5. A protein interaction network links GIT1, an enhancer of huntingtin aggregation, to Huntington's disease.

Authors: Heike Goehler; Maciej Lalowski; Ulrich Stelzl; Stephanie Waelter; Martin Stroedicke; Uwe Worm; Anja Droege; Katrin S Lindenberg; Maria Knoblich; Christian Haenig; Martin Herbst; Jaana Suopanki; Eberhard Scherzinger; Claudia Abraham; Bianca Bauer; Renate Hasenbank; Anja Fritzsche; Andreas H Ludewig; Konrad Büssow; Sarah H Coleman; Claire-Anne Gutekunst; Bernhard G Landwehrmeyer; Hans Lehrach; Erich E Wanker
Journal: Mol Cell Date: 2004-09-24 Impact factor: 17.970

6. A protein interaction map of Drosophila melanogaster.

Authors: L Giot; J S Bader; C Brouwer; A Chaudhuri; B Kuang; Y Li; Y L Hao; C E Ooi; B Godwin; E Vitols; G Vijayadamodar; P Pochart; H Machineni; M Welsh; Y Kong; B Zerhusen; R Malcolm; Z Varrone; A Collis; M Minto; S Burgess; L McDaniel; E Stimpson; F Spriggs; J Williams; K Neurath; N Ioime; M Agee; E Voss; K Furtak; R Renzulli; N Aanensen; S Carrolla; E Bickelhaupt; Y Lazovatsky; A DaSilva; J Zhong; C A Stanyon; R L Finley; K P White; M Braverman; T Jarvie; S Gold; M Leach; J Knight; R A Shimkets; M P McKenna; J Chant; J M Rothberg
Journal: Science Date: 2003-11-06 Impact factor: 47.728

7. A map of the interactome network of the metazoan C. elegans.

Authors: Siming Li; Christopher M Armstrong; Nicolas Bertin; Hui Ge; Stuart Milstein; Mike Boxem; Pierre-Olivier Vidalain; Jing-Dong J Han; Alban Chesneau; Tong Hao; Debra S Goldberg; Ning Li; Monica Martinez; Jean-François Rual; Philippe Lamesch; Lai Xu; Muneesh Tewari; Sharyl L Wong; Lan V Zhang; Gabriel F Berriz; Laurent Jacotot; Philippe Vaglio; Jérôme Reboul; Tomoko Hirozane-Kishikawa; Qianru Li; Harrison W Gabel; Ahmed Elewa; Bridget Baumgartner; Debra J Rose; Haiyuan Yu; Stephanie Bosak; Reynaldo Sequerra; Andrew Fraser; Susan E Mango; William M Saxton; Susan Strome; Sander Van Den Heuvel; Fabio Piano; Jean Vandenhaute; Claude Sardet; Mark Gerstein; Lynn Doucette-Stamm; Kristin C Gunsalus; J Wade Harper; Michael E Cusick; Frederick P Roth; David E Hill; Marc Vidal
Journal: Science Date: 2004-01-02 Impact factor: 47.728

8. Reactome: a knowledgebase of biological pathways.

Authors: G Joshi-Tope; M Gillespie; I Vastrik; P D'Eustachio; E Schmidt; B de Bono; B Jassal; G R Gopinath; G R Wu; L Matthews; S Lewis; E Birney; L Stein
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

9. Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome.

Authors: Arun K Ramani; Razvan C Bunescu; Raymond J Mooney; Edward M Marcotte
Journal: Genome Biol Date: 2005-04-15 Impact factor: 13.583

10. A first-draft human protein-interaction map.

Authors: Ben Lehner; Andrew G Fraser
Journal: Genome Biol Date: 2004-08-13 Impact factor: 13.583
