UniHI 7: an enhanced database for retrieval and interactive analysis of human molecular interaction networks.

Ravi Kiran Reddy Kalathur, José Pedro Pinto, Miguel A Hernández-Prieto, Rui S R Machado, Dulce Almeida, Gautam Chaurasia, Matthias E Futschik.

Abstract

Unified Human Interactome (UniHI) (http://www.unihi.org) is a database for retrieval, analysis and visualization of human molecular interaction networks. Its primary aim is to provide a comprehensive and easy-to-use platform for network-based investigations to a wide community of researchers in biology and medicine. Here, we describe a major update (version 7) of the database previously featured in NAR Database Issue. UniHI 7 currently includes almost 350,000 molecular interactions between genes, proteins and drugs, as well as numerous other types of data such as gene expression and functional annotation. Multiple options for interactive filtering and highlighting of proteins can be employed to obtain more reliable and specific network structures. Expression and other genomic data can be uploaded by the user to examine local network structures. Additional built-in tools enable ready identification of known drug targets, as well as of biological processes, phenotypes and pathways enriched with network proteins. A distinctive feature of UniHI 7 is its user-friendly interface designed to be utilized in an intuitive manner, enabling researchers less acquainted with network analysis to perform state-of-the-art network-based investigations.

Year: 2013 PMID: 24214987 PMCID: PMC3965034 DOI: 10.1093/nar/gkt1100

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The study of molecular systems and networks is now a major field in biology and medicine. The goals of network-based investigations range from prioritization of candidate genes to determination of complex molecular mechanisms underlying a disease or a biological process (1,2). An essential prerequisite for these investigations is the availability of resources for molecular interactions in model organisms and humans. To address this need, various databases have been established in recent years (3). Especially for protein–protein interactions in humans, many initiatives and research groups have contributed large sets of data derived from the literature, high-throughput methods or computational prediction (4–15). In parallel, a wide range of dedicated programs for network analyses have also been developed (16–18). However, it is a common experience that current resources and tools pose a considerable challenge to users, especially to researchers less acquainted with concepts in network biology. Frequently, users have to download, map, compile and integrate distinct data types to conduct network-based investigations. These activities require extensive knowledge of data processing and management. Thus, a salient ‘bottleneck’ exists for many interested researchers between the wealth of available molecular interaction data and their utilization. This observation motivated us to develop a new version of the Unified Human Interactome (UniHI) database for the retrieval, analysis and visualization of human molecular interaction networks: UniHI 7. We provide a platform that enables (i) retrieval of an integrated set of interactions from the major resources, (ii) intuitive use of tools for network-based investigations and (iii) easy utilization of complementary data and information for analysis, evaluation and visualization of retrieved networks.

SCOPE OF UNIHI 7

UniHI 7 integrates ∼350 000 molecular interactions for more than 30 000 human proteins. It is based on a complete re-implementation of previous versions of UniHI, with widely extended scope and functionality. Besides protein–protein interactions from 12 different resources [including HPRD (4), BioGrid (5), IntAct (6), DIP (7), BIND (8) and Reactome (9) databases; as well as four interaction maps produced by computational predictions (10–13) and two high-throughput yeast-2-hybrid (Y2H) screens (14,15)], UniHI 7 also comprises curated transcriptional regulatory interactions from three complementary databases: TRANSFAC (19), miRTarBase (20) and HTRIdb (21). In addition to these interactions, we also integrated drug target information from DrugBank (22) that can be mapped onto the interaction network. A detailed description of the incorporated resources can be found on the UniHI 7 web page and in the Supplementary Materials (Supplementary Table S1). Whereas former UniHI versions can primarily be regarded as integrated databases for human protein–protein interactions (23,24), additional strengths of UniHI 7 lie not only in the integration of regulatory interactions but also in its interactive analysis and visualization tools for molecular networks. Although there are other databases with integrated molecular interaction data (25–34), UniHI provides a distinct set of features ranging from simple filtering options to advanced network analysis tools (Supplementary Materials and Supplementary Table S2). The main application of UniHI 7 is the retrieval and examination of small to medium-sized local networks. It is ideally suited for researchers who want to explore the molecular context of a single protein or a select set of related proteins using a network-orientated approach.

SEARCHING FOR INTERACTIONS IN UNIHI 7

The UniHI 7 database can be queried for molecular interactions of single or multiple human proteins. Various identifiers such as gene symbol, Entrez Gene, UniProt and Ensembl IDs can serve as input. It is also possible to input gene and protein identifiers from the model organisms yeast (Saccharomyces cerevisiae), worm (Caenorhabditis elegans), fly (Drosophila melanogaster) or mouse (Mus musculus), which will be automatically mapped to human orthologs. This feature is convenient for researchers who work with these major model organisms and want to interrogate related human molecular networks. In total, 2977 yeast, 6922 worm, 7998 fly and 15 694 mouse genes were mapped to human orthologs included in UniHI 7 using information from the HGNC database (35). As identifiers for model organisms, Entrez Gene IDs can generally be used, as well as systematic names for yeast, WormBase IDs or gene symbols for worm, FlyBase IDs or gene symbols for fly and MGI identifiers or gene symbols for mouse. The list of identifiers and any other data uploaded by the user are stored only during the active session and are accessible only to the user.
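The ortholog mapping step described above can be illustrated with a minimal Python sketch. It is not the UniHI 7 implementation: the lookup table `ORTHOLOGS`, the function `map_to_human` and the three example entries are hypothetical placeholders chosen for illustration, whereas the database itself relies on HGNC information (35).

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical lookup table: (organism, identifier) -> human gene symbols.
# The entries are illustrative placeholders, not data exported from UniHI 7.
ORTHOLOGS: Dict[Tuple[str, str], List[str]] = {
    ("mouse", "Snca"): ["SNCA"],
    ("mouse", "Gadd45a"): ["GADD45A"],
    ("fly", "park"): ["PARK2"],
}


def map_to_human(
    identifiers: Iterable[str],
    organism: str,
    table: Dict[Tuple[str, str], List[str]] = ORTHOLOGS,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Map model-organism identifiers to human orthologs.

    Returns the successful mappings and, separately, the identifiers
    for which the table holds no human ortholog.
    """
    mapped: Dict[str, List[str]] = {}
    unmapped: List[str] = []
    for ident in identifiers:
        hits = table.get((organism, ident))
        if hits:
            mapped[ident] = list(hits)
        else:
            unmapped.append(ident)
    return mapped, unmapped


if __name__ == "__main__":
    found, missing = map_to_human(["Snca", "Xyz1"], "mouse")
    print(found)    # identifiers with a human ortholog in the table
    print(missing)  # identifiers that could not be mapped
```

Returning the unmapped identifiers separately is a design choice of this sketch; it makes explicit which query genes drop out before any network is retrieved.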

PARALLEL PRESENTATION OF PROTEINS, INTERACTIONS AND NETWORK

Whereas previous UniHI versions presented queried proteins, retrieved interactions and resulting networks in sequential order, UniHI 7 now displays them in parallel on four web pages: ‘Proteins’, ‘Physical Interactions’, ‘Regulatory Interactions’ and ‘Network’. This display scheme enables users to readily switch between the different types of displayed information (Figure 1). The ‘Proteins’ page lists the proteins in UniHI 7 that match the input, together with the names of the databases or resources in which each protein is included. In addition, hyperlinks to Entrez Gene, UniProt, RefSeq, OMIM and KEGG entries (if available) are given. On the ‘Physical Interactions’ and ‘Regulatory Interactions’ pages, the set of detected interactions is shown with additional information regarding their source, evidence, type and functional annotation. A crucial feature is that interactions can easily be traced back to the original resources and publications, and thus can be critically assessed by users. In addition, all the interactions displayed on these two pages can be downloaded as simple tables, which can be used as input for other computational tools.

Figure 1.

Search and result pages in UniHI 7. After performing an initial search on the input page, the user is presented with four result pages providing information about detected proteins, physical and regulatory interactions, as well as a graphic network. The user can easily switch between the four pages by clicking on the relevant tab.

The retrieved interactions are displayed as a network on the ‘Network’ page. For network visualization, we utilized the recently developed Cytoscape Web, which is a client-side application implemented in Flex/ActionScript and modeled after the popular Cytoscape software (36). To prevent the visualization tool from becoming unresponsive, certain automatic layout and filtering procedures are implemented for larger networks (Supplementary Materials). Information about proteins and interactions can be interactively explored in the network graphics, avoiding cumbersome comparison with the textual output. For instance, clicking on a network node provides information about the corresponding protein and links to other resources such as GeneCards (37) and GeneMania (38) for follow-up study. The displayed network can be exported as simple tab-delimited text or as an image, either as a PNG or PDF file.

INTEGRATED TOOLS FOR NETWORK ANALYSIS

Molecular networks are inherently difficult to analyze and interpret. In fact, the sheer number of retrieved interactions for well-studied proteins (including many kinases and receptors) can be overwhelming for users (Figure 2a). To help users with these challenges, we have implemented several tools for filtering and inspecting networks, as well as for mapping and utilizing complementary data and information. The application of these tools can be customized to different research objectives and can assist in the elucidation of network structure and prioritizing candidate proteins for follow-up studies.

Figure 2.

Network analysis and visualization tools: (a) original network of the query proteins (GADD45A, SNCA and PARK2); (b) network after filtering based on published evidence (i.e. number of PubMed references ≥3) and with known drug targets highlighted in red; (c) network after filtering with gene expression data (GSE20186) (39). Nodes corresponding to genes with a log2 fold change ≥ +0.2 are displayed in red and those ≤ −0.2 are shown in green; (d) significantly enriched KEGG pathways listed in a table; (e) filtered network with proteins linked to the selected term ‘Neurotrophin signaling pathway’ highlighted in red; and (f) filtered network with proteins linked to ‘Ubiquitin mediated proteolysis’ highlighted in red.

FILTERING OF INTERACTIONS AND UTILIZATION OF GENOMIC DATA

Filtering of interactions can be carried out on resource, published evidence (i.e. number of PubMed references), scale of experiment (i.e. small-scale or large-scale), type of derivation (i.e. literature, computational prediction or Y2H screens), connectivity (i.e. direct or indirect) and interaction (i.e. binary or complex) (Figure 2b). These filtering options can be tailored to produce more reliable and specific networks, e.g. by including only interactions reported in multiple publications. UniHI 7 also stores gene expression in 19 different human tissue types derived from SymAtlas (40,41). Users can apply these data to highlight or exclude proteins (based on a chosen threshold level) to derive tissue-specific networks. In addition to using gene expression data stored in UniHI 7, users can upload their own expression data to filter and examine human molecular interaction networks. This feature can be applied to detect network proteins that have distinct expression patterns related to physiological processes or diseases. Two different types of expression data can be used: absolute gene expression, i.e. positive values for transcript levels such as those detected by Affymetrix GeneChips or RNA-Seq; and differential gene expression, i.e. changes in expression derived from two-color arrays or by subtraction of absolute expression measurements. Thresholds for differential expression data can be set as maximum P-values or minimum fold changes (Figure 2c). In addition, gene lists, for example, derived from RNAi screens or high-throughput assays, can be uploaded and utilized for annotation and filtering of interaction networks. Together with its capacity to overlay expression data, this option makes UniHI 7 an efficient platform for network-based analyses in the ‘post-genomic’ era.
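Two of the filters described above, evidence-based filtering and highlighting by differential expression, can be sketched in Python as follows. The thresholds mirror those used in Figure 2 (at least three PubMed references; log2 fold change of at least +0.2 shown in red and at most −0.2 shown in green). Everything else is an assumption made for illustration: the `Interaction` record, the function names and the placeholder protein names and reference numbers do not reflect the internal UniHI 7 data model.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Interaction:
    """Hypothetical interaction record; not the UniHI 7 schema."""
    source: str
    target: str
    pubmed_ids: Tuple[int, ...]  # supporting publications (placeholders)


def filter_by_evidence(
    interactions: List[Interaction], min_refs: int = 3
) -> List[Interaction]:
    """Keep interactions supported by at least `min_refs` distinct references."""
    return [i for i in interactions if len(set(i.pubmed_ids)) >= min_refs]


def colour_by_fold_change(
    log2_fold_change: Dict[str, float], threshold: float = 0.2
) -> Dict[str, str]:
    """Assign a highlight colour to each gene passing the fold-change threshold.

    Genes with log2 fold change >= +threshold are marked red, those with
    log2 fold change <= -threshold green; all others are left uncoloured.
    """
    colours: Dict[str, str] = {}
    for gene, value in log2_fold_change.items():
        if value >= threshold:
            colours[gene] = "red"
        elif value <= -threshold:
            colours[gene] = "green"
    return colours


if __name__ == "__main__":
    # Placeholder proteins P1..P3 and placeholder reference numbers.
    edges = [
        Interaction("P1", "P2", (1, 2, 3)),
        Interaction("P1", "P3", (4,)),
    ]
    print(filter_by_evidence(edges))
    print(colour_by_fold_change({"P1": 0.5, "P2": -0.3, "P3": 0.1}))
```

Counting distinct references rather than raw entries guards against the same publication being reported by several source databases; whether UniHI 7 deduplicates in this way is not stated in the text.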

EXPLORING THE NETWORK: DRUG TARGETS, GENE ANNOTATION, PHENOTYPES AND DISEASES

Small molecules (drugs) can influence the activity of single proteins and alter pathogenic mechanisms, and they are of crucial importance in numerous therapeutic interventions. To facilitate identification of known drug targets in the retrieved networks, UniHI 7 provides relevant highlighting and filtering options (Figure 2b) as well as information about the drugs and their mechanisms of action. From the DrugBank database, information for 4203 drugs targeting 2139 annotated proteins was imported into UniHI 7 (22) (Supplementary Materials).

The functional relevance of networks is inherently difficult to assess. Hence, we have implemented a user-friendly integrated tool, which carries out enrichment analyses for molecular functions, biological processes, cellular location [as defined by Gene Ontology (42)], protein families [as defined by Pfam (43)] and pathways [as defined by KEGG (44)] of network proteins (Figure 2d). The significance of overrepresentation of network proteins in a Gene Ontology category, Pfam protein family or KEGG pathway is calculated using the hypergeometric test (which is equivalent to the one-tailed Fisher’s exact test) with the terms from the human genome as background distribution. For significant terms (i.e. GO categories, Pfam families or KEGG pathways), the number of included network proteins, the P-value and the false discovery rate for enrichment are displayed in a table. The associated proteins can easily be highlighted in the network graphics by clicking on the corresponding term (Figure 2e and f).

Finally, phenotype information can be assessed for network proteins in the new version of UniHI. For this purpose, we have integrated gene–phenotype associations, curated in the Mouse Genome Database (45) and mapped to their human orthologs, for several major phenotypes such as cardiovascular system or embryogenesis. In addition, we have collected genes linked to aging in humans from the GenAge database (46), genes associated with cancer from the Cancer Gene Census catalogue (47) and genes linked to human diseases from the OMIM database (48) (Supplementary Materials). As in the enrichment analysis described earlier, the UniHI user can assess whether phenotypic associations are overrepresented among network proteins, and highlight the associated proteins within the network. A help page with a detailed description of the different tools and sample outputs for typical analyses is available on the UniHI 7 webpage.
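The hypergeometric enrichment test mentioned above can be written out as a short Python sketch. With N background genes, K of them annotated with a term, n network proteins and k network proteins carrying the term, the P-value is the upper tail P(X ≥ k) = Σ_{i=k}^{min(n,K)} C(K,i)·C(N−K,n−i)/C(N,n). The function names and example numbers below are illustrative assumptions. UniHI 7 itself delegates the calculation to R/Bioconductor, and the text does not state which false discovery rate procedure is used; the Benjamini-Hochberg adjustment shown here is one common choice, not necessarily the authors'.

```python
from math import comb
from typing import List


def hypergeom_upper_tail(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for a hypergeometric variable.

    N: background genes, K: background genes annotated with the term,
    n: network proteins, k: network proteins annotated with the term.
    """
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(n, K) + 1)
    )
    return favourable / total


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P-values (an assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


if __name__ == "__main__":
    # Illustrative numbers only: 20 000 background genes, 120 in a pathway,
    # 50 network proteins of which 6 belong to the pathway.
    p = hypergeom_upper_tail(20000, 120, 50, 6)
    print(p)
    print(benjamini_hochberg([p, 0.04, 0.5]))
```

Exact integer binomials keep the sketch free of external dependencies; for genome-scale batches of terms, a statistics library would be the more practical route.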

IMPLEMENTATION

The architecture of UniHI 7 comprises a database layer and an application layer. The database layer is implemented using MySQL, an open source SQL relational database management system. The application layer is implemented using a J2EE architecture including, e.g. JDBC to connect to the back-end database, DAO for interacting with the database and accessing data, and JavaServer Pages to generate web pages. Data retrieval from the database is performed using the Hibernate library. Communication between the client and the application layer is handled through a Tomcat server. To perform enrichment analyses, UniHI 7 connects via Rserve (http://www.rforge.net/Rserve/) to the R/Bioconductor software (17). Matching of gene and protein identifiers was carried out using information from HGNC (35) and applying the g:Convert web tool (49). UniHI 7 performs best with small- to medium-sized networks (with up to several hundred interactions). For larger networks, visualization and analysis become increasingly time-consuming.

CONCLUSIONS

UniHI 7 is intended to serve as a bridge between resources for interaction data and more advanced software. It provides a user-friendly web-based platform to study networks underlying molecular mechanisms in human health and disease. Customization allows users to (i) adjust the set of included interactions, (ii) overlay retrieved sub-networks with other types of data, (iii) inspect networks for relevant genes, (iv) determine potential network functions and (v) identify associations with phenotypes. We hope that UniHI 7 will considerably facilitate network-orientated investigations for many researchers, especially for those who are new to this field.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

Funding for the presented work was provided by the Portuguese Fundação para a Ciência e a Tecnologia (FCT) [PTDC/BIA-GEN/116519/2010 and IBB/CBME, LA] and [A-2666]. R.K. is the recipient of an FCT scholarship [SFRH/BPD/70718/2010]. D.A. was supported by the FCT [PTDC/BIA-BCM/117975/2010]. Funding for open access charge: Portuguese Fundação para a Ciência e a Tecnologia [PTDC/BIA-GEN/116519/2010].

Conflict of interest statement. None declared.

49 in total

1. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors: M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal: Nat Genet Date: 2000-05 Impact factor: 38.330

2. The Database of Interacting Proteins: 2004 update.

Authors: Lukasz Salwinski; Christopher S Miller; Adam J Smith; Frank K Pettit; James U Bowie; David Eisenberg
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

3. Integrated systems approach identifies genetic nodes and networks in late-onset Alzheimer's disease.

Authors: Bin Zhang; Chris Gaiteri; Liviu-Gabriel Bodea; Zhi Wang; Joshua McElwee; Alexei A Podtelezhnikov; Chunsheng Zhang; Tao Xie; Linh Tran; Radu Dobrin; Eugene Fluder; Bruce Clurman; Stacey Melquist; Manikandan Narayanan; Christine Suver; Hardik Shah; Milind Mahajan; Tammy Gillis; Jayalakshmi Mysore; Marcy E MacDonald; John R Lamb; David A Bennett; Cliona Molony; David J Stone; Vilmundur Gudnason; Amanda J Myers; Eric E Schadt; Harald Neumann; Jun Zhu; Valur Emilsson
Journal: Cell Date: 2013-04-25 Impact factor: 41.582

4. Online predicted human interaction database.

Authors: Kevin R Brown; Igor Jurisica
Journal: Bioinformatics Date: 2005-01-18 Impact factor: 6.937

5. A gene atlas of the mouse and human protein-encoding transcriptomes.

Authors: Andrew I Su; Tim Wiltshire; Serge Batalov; Hilmar Lapp; Keith A Ching; David Block; Jie Zhang; Richard Soden; Mimi Hayakawa; Gabriel Kreiman; Michael P Cooke; John R Walker; John B Hogenesch
Journal: Proc Natl Acad Sci U S A Date: 2004-04-09 Impact factor: 11.205

6. Bioconductor: open software development for computational biology and bioinformatics.

Authors: Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal: Genome Biol Date: 2004-09-15 Impact factor: 13.583

7. A census of human cancer genes.

Authors: P Andrew Futreal; Lachlan Coin; Mhairi Marshall; Thomas Down; Timothy Hubbard; Richard Wooster; Nazneen Rahman; Michael R Stratton
Journal: Nat Rev Cancer Date: 2004-03 Impact factor: 60.716

8. Consolidating the set of known human protein-protein interactions in preparation for large-scale mapping of the human interactome.

Authors: Arun K Ramani; Razvan C Bunescu; Raymond J Mooney; Edward M Marcotte
Journal: Genome Biol Date: 2005-04-15 Impact factor: 13.583

9. A first-draft human protein-interaction map.

Authors: Ben Lehner; Andrew G Fraser
Journal: Genome Biol Date: 2004-08-13 Impact factor: 13.583

10. A new reference implementation of the PSICQUIC web service.

Authors: Noemi del-Toro; Marine Dumousseau; Sandra Orchard; Rafael C Jimenez; Eugenia Galeota; Guillaume Launay; Johannes Goll; Karin Breuer; Keiichiro Ono; Lukasz Salwinski; Henning Hermjakob
Journal: Nucleic Acids Res Date: 2013-05-13 Impact factor: 16.971
