NeXO Web: the NeXO ontology database and visualization platform.

Janusz Dutkowski, Keiichiro Ono, Michael Kramer, Michael Yu, Dexter Pratt, Barry Demchak, Trey Ideker.

Abstract

The Network-extracted Ontology (NeXO) is a gene ontology inferred directly from large-scale molecular networks. While most ontologies are constructed through manual expert curation, NeXO uses a principled computational approach that integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a global hierarchy of cellular components and processes. Here, we describe the development of the NeXO Web platform (http://www.nexontology.org), an online database and graphical user interface for visualizing, browsing and performing term enrichment analysis using NeXO and the Gene Ontology. The platform applies state-of-the-art web technology and visualization techniques to provide an intuitive framework for investigating biological machinery captured by both data-driven and manually curated ontologies.

Year:  2013        PMID: 24271398      PMCID: PMC3965056          DOI: 10.1093/nar/gkt1192

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Ontologies provide a powerful means of cataloging entities and entity relationships within many domains of knowledge (1,2). In molecular and cellular biology, a gene ontology provides structured knowledge about the cellular organization and biological functions encoded by genes. Although most ontologies, including the highly successful Gene Ontology (GO) (3), are constructed through manual expert curation, we have recently developed the Network-extracted Ontology (NeXO), a data-driven gene ontology inferred directly from ‘omics’ data (4). Through a principled computational approach, our method integrates evidence from hundreds of thousands of individual gene and protein interactions to construct a complete hierarchy of cellular components and processes, which recapitulates known biological machinery and uncovers many new structures.

Online databases and visualization platforms are essential for giving users convenient access to ontologies (e.g. 5–7). Following the publication of the NeXO concept paper (4), we now report the development of NeXO Web as an online resource, including the ontology database and a fully interactive graphical user interface (GUI) for storing, accessing and browsing the NeXO ontology. This system allows the user to retrieve genes and ontology terms by name and description, map the position of a gene or term in the hierarchy and display both the direct neighborhood of the gene or term and the entire graph structure of the ontology.

The NeXO Web resource complements currently available ontology visualization systems (e.g. 5,6) in three major ways. First, it represents the first gene ontology database built directly from high-throughput data. Second, it provides a novel and intuitive visualization system for exploring gene ontologies, with access to both NeXO and GO. In this system, the entire gene ontology is spread out hierarchically and explored with semantic zooming in the style of Google Maps (Figure 1). Third, the visualization system is directly integrated with term enrichment analysis, allowing the user to easily identify and visually explore NeXO and GO terms that are significantly enriched among a selected list of genes.
Figure 1.

NeXO Web ontology view. The hierarchical layout of the NeXO ontology graph in which nodes represent ontology terms and edges represent term relationships. The ontology may be explored interactively utilizing semantic zooming functionality which dynamically adjusts the level of detail presented to the user.


OVERVIEW OF THE NEXO ONTOLOGY

The NeXO ontology (4) currently combines evidence from four fundamental types of interactions available for yeast: physical protein–protein interactions, genetic interactions (synthetic lethality and epistasis), transcriptional networks (gene co-expression) and an integrated functional network YeastNet (8). These networks are integrated and clustered hierarchically using a probabilistic community detection algorithm (9), producing a binary tree (or dendrogram) in which genes are joined based on the similarity of their interaction patterns. The binary tree is subsequently transformed into a directed acyclic graph (DAG) by: (i) identifying binary joins in the tree that can be replaced by multi-way joins and (ii) supplementing the tree with additional parent–child connections supported by the input interaction data. An ontology alignment procedure is then applied to map between the data-driven DAG and the GO and transfer the term names and annotations from GO to the matching nodes in the NeXO DAG. The result is a network-extracted ontology which contains 4123 biological concepts and 5766 hierarchical concept relations and captures both known and novel biology (4).

The NeXO Web platform

To provide the biological community with convenient and intuitive access to NeXO, we have developed NeXO Web, an ontology database resource with a powerful GUI and API (application programming interface). The NeXO website currently supports access to both the NeXO and GO ontologies. For both types of ontologies, the intuitive visualization system performs a hierarchical layout of the ontology graph according to its most informative parent–child term relations (Figure 1). The entire structure is explored with semantic zooming functionality providing ‘details on demand’ in the style of Google Maps: the labels of the nodes appear and disappear to match the zoom level. The platform takes advantage of state-of-the-art web technologies and modern web browsers with HTML5 support, enabling modular architecture, enhanced performance and dynamic look-and-feel functionality. On the server side, Node.js and the Express web application framework provide a fully functional representational state transfer (REST) API (see also the ‘Developer Manual’ page in the online documentation) for accessing the input molecular interaction networks, the ontology DAGs and term annotations stored in a Neo4j graph database. Graph operations are implemented using the TinkerPop Gremlin framework, enabling complex graph traversal on the fly. Term enrichment functionality is implemented as a web service using NumPy and Flask-RESTful. Client-side JavaScript libraries including Cytoscape.js, Sigma.js and Highcharts support interactive visualization of networks and data charts.
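The semantic zooming behaviour, in which node labels appear and disappear with the zoom level, can be illustrated with a minimal sketch. This is not the NeXO Web client code: the function name `visible_labels`, the per-term display size and the `min_apparent_size` threshold are all assumptions chosen for illustration.

```python
def visible_labels(term_sizes, zoom, min_apparent_size=10.0):
    """Return the terms whose labels would be drawn at a given zoom factor.

    term_sizes: dict mapping a term ID to its node size at zoom 1.0
                (hypothetical display units).
    zoom: current zoom factor (larger means zoomed in).
    min_apparent_size: assumed threshold below which a label is hidden.
    """
    return {
        term
        for term, size in term_sizes.items()
        if size * zoom >= min_apparent_size
    }


# Toy example: a large root term and a small leaf term.
sizes = {"root": 20.0, "leaf": 2.0}
zoomed_out = visible_labels(sizes, zoom=1.0)
zoomed_in = visible_labels(sizes, zoom=5.0)
```

Under these assumptions, only the large term is labeled when zoomed out, and the small term gains its label once the user zooms in far enough.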

Navigating NeXO Web

The ontology graph: terms and relations

Both NeXO and GO ontologies are structured as DAGs of terms (nodes) and relations between terms (edges) (Figure 1). In GO, terms are labeled with the cellular component, process or function they represent. In NeXO, terms are labeled based on the best alignment of the data-driven ontology to the GO cellular component ontology. Edges can have either of two meanings: (i) the child term is a part of the parent term (‘part_of’ relation); (ii) the child term is a type of the parent term (‘is_a’ relation). For example, the ‘Cytosolic large ribosomal subunit’ and the ‘Cytosolic small ribosomal subunit’ are both parts of the ‘Cytosolic ribosome’ (Figure 2) which is a type of ‘Ribosomal subunit’ which, in turn, is a type of ‘Ribonucleoprotein complex’. Automatically identifying relationship types such as ‘is_a’ or ‘part_of’ is an active area of investigation. In its current version, NeXO does not distinguish between ontology relationship types; both types are shown.
Figure 2.

NeXO Web search results. Searching the NeXO ontology for terms whose name or description contains the phrase ‘ribosome’. One of the identified terms is NeXO:10022, which is significantly aligned to and named after the ‘cytosolic ribosome’ term of the GO Cellular Component ontology. Selecting this term repositions and rescales the ontology view on the term and its neighborhood. The node corresponding to the selected term is indicated in orange. The nodes and edges on the path from the selected node to the root of the ontology are indicated in blue.
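As a rough illustration of the highlighting described in the caption, the ancestors of a selected term can be collected by walking child-to-parent links in the DAG. The sketch below is not NeXO Web's implementation; the `parents` mapping and the function name `ancestors_and_edges` are hypothetical, and the toy hierarchy simply reuses the term names from the ribosome example in the text.

```python
from collections import deque


def ancestors_and_edges(parents, term):
    """Collect every node and (child, parent) edge between `term` and the root(s).

    parents: dict mapping a term to a list of its parent terms.
    Because the ontology is a DAG, a term may have several parents,
    so all upward paths are followed, not just one.
    """
    seen = {term}
    edges = set()
    queue = deque([term])
    while queue:
        child = queue.popleft()
        for parent in parents.get(child, ()):
            edges.add((child, parent))
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen, edges


# Toy hierarchy using the term names given in the text.
parents = {
    "Cytosolic large ribosomal subunit": ["Cytosolic ribosome"],
    "Cytosolic small ribosomal subunit": ["Cytosolic ribosome"],
    "Cytosolic ribosome": ["Ribosomal subunit"],
    "Ribosomal subunit": ["Ribonucleoprotein complex"],
}

nodes, edges = ancestors_and_edges(parents, "Cytosolic large ribosomal subunit")
```

The returned node and edge sets are what a client would color to show the route from the selected term to the root.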


Interactive browsing

Interactive browsing of the ontology is performed using the mouse, track pad or touchscreen device: by scrolling to zoom in or out of selected regions of the ontology, clicking-and-dragging to pan and clicking an ontology term to select it. When a term is selected, the relations to ancestral terms are highlighted and the term information panel is presented (see below). Double-clicking on the page background resets the current selection and adjusts the ontology graph to fit the page. Additionally, the navigation buttons (lower left) may be used to zoom in and out of the ontology and fit the ontology layout to screen. The user may select which species (currently yeast) and which ontology to visualize using the species selector and ontology selector, respectively, which are the two rightmost buttons in the bottom panel (Figure 1). The NeXO yeast ontology is displayed by default.

Searching for terms and genes

The NeXO Web search engine allows searching the ontology either by term keyword (including name and description) or by gene name (Figure 2). Results are displayed below the search box. Clicking on a search result selects and highlights a gene or term in the displayed ontology. The refresh button may be used to clear search results and the search box. Currently, the search engine requires that each result contain all words in the query. Queries are case-insensitive, and multiple words enclosed in double quotes are treated as a single phrase.
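The query rules stated above (all words required, case-insensitive matching, double-quoted phrases kept together) can be sketched as follows. This is only an illustration under assumptions: the function names `parse_query` and `matches` are hypothetical, and plain substring matching is assumed because the source does not say how the server tokenizes or indexes text.

```python
import re


def parse_query(query):
    """Split a query into lowercase tokens, keeping "quoted phrases" whole."""
    tokens = []
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', query):
        tokens.append((phrase or word).lower())
    return tokens


def matches(query, text):
    """True if every token of the query occurs in the text (case-insensitive)."""
    haystack = text.lower()
    return all(token in haystack for token in parse_query(query))


# Example: the quoted phrase must occur as a unit, the loose word anywhere.
tokens = parse_query('"Cytosolic ribosome" subunit')
hit = matches('"cytosolic ribosome" SUBUNIT', "Cytosolic ribosome, large subunit")
miss = matches('"ribosome cytosolic"', "Cytosolic ribosome, large subunit")
```

Here `tokens` is `['cytosolic ribosome', 'subunit']`, `hit` is `True` and `miss` is `False`, because the reversed phrase never appears as a contiguous string.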

TERM ENRICHMENT ANALYSIS

The NeXO Web platform also provides an integrated interface for performing term enrichment analysis in both the NeXO and GO ontologies (Figure 3A). The term enrichment interface can be accessed by clicking the double arrow link placed to the right of the search box. The user is asked to provide a list of query genes and specify optional parameters for the maximum P-value cut-off and minimum number of genes assigned to the term. The system then performs a series of hypergeometric tests to determine the enrichment of the list of query genes in any term in the active ontology. Terms which pass the thresholds for the maximum P-value and minimum number of query genes are listed underneath the query box in the order of increasing P-values. For example, enrichment analysis using genes whose knock-out causes cell sensitivity to methyl methanesulfonate (MMS) (10) identifies a number of known cellular components associated with replication and DNA repair as well as potentially novel components such as the term NeXO:9715 (Figure 3A).
Figure 3.

NeXO Web term analysis facilities. (A) The term enrichment analysis panel. Term enrichment analysis of genes whose knock-out sensitized cells to MMS reveals a number of enriched NeXO terms. One of the terms is the term NeXO:9715. Selecting this term in the NeXO ontology opens the slide-out term information panel (B). The term information panel shows the supporting interaction network, network support statistics and alignment of the term to the three branches of the GO. Although the term NeXO:9715 is not well aligned to any of the GO ontology branches, the network support for the term is very high, suggesting a newly discovered biological entity.
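The hypergeometric enrichment test described in this section can be sketched with the standard library alone. This is not the NeXO Web service code (which the text says uses NumPy and Flask-RESTful); the function names, the default thresholds and the choice to apply the minimum-gene cut-off to the number of query genes found in a term are assumptions. No multiple-testing correction is applied here because the source does not describe one.

```python
from math import comb


def hypergeom_pvalue(population, annotated, sample, overlap):
    """P(X >= overlap) for a hypergeometric draw without replacement.

    population: number of genes in the background
    annotated:  number of background genes assigned to the term
    sample:     number of query genes
    overlap:    number of query genes assigned to the term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


def enrich(query_genes, term_to_genes, background, max_p=0.01, min_genes=2):
    """Return (term, p-value, hits) tuples sorted by increasing p-value."""
    background = set(background)
    query = set(query_genes) & background
    results = []
    for term, genes in term_to_genes.items():
        members = set(genes) & background
        hits = query & members
        if len(hits) < min_genes:
            continue  # too few query genes in this term
        p = hypergeom_pvalue(len(background), len(members), len(query), len(hits))
        if p <= max_p:
            results.append((term, p, sorted(hits)))
    return sorted(results, key=lambda item: item[1])


# Toy data: 10 background genes, two hypothetical terms of 5 genes each.
background = [f"g{i}" for i in range(1, 11)]
terms = {"term_A": background[:5], "term_B": background[5:]}
results = enrich(background[:5], terms, background)
```

In the toy data all five query genes fall in `term_A`, giving a p-value of 1/252, while `term_B` is skipped because it contains no query genes.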


TERM INFORMATION PANEL

One of the key features of NeXO Web is the term information slide panel (Figure 3B), which is invoked whenever the user clicks on a term in the ontology. The information panel includes detailed information about the selected term, including term ID, name, description, synonyms and comments. The gene tab of the information panel also includes a list of genes associated with the term as well as links to reference databases such as the Saccharomyces Genome Database (11). The information panel also includes ontology-specific information—in the case of NeXO, detailed information on the network support for each term.

NeXO-specific term information

For NeXO terms, the term information panel displays statistics about the support for the term in network data (Figure 3B) as well as information on the alignment of the term to each of the branches of the GO (cellular component, biological process and molecular function). The network support statistics include the interaction density, the bootstrap score and the term robustness score. The interaction density is the fraction of pairs of genes associated with the term that are connected by an interaction in the input network. The bootstrap score is the fraction of times that the term was present during bootstrapping, in which 5% of input interactions have been removed. The term robustness score provides an integrated measure of data support for the term, combining interaction support and bootstrap measures (4). The data support measures and alignment statistics are key for prioritizing novel NeXO terms that are well supported by data, but do not map well to existing biology captured by the GO. As we have previously shown, many of these new components and relations may be further validated experimentally and some have been already incorporated into GO (4).
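Two of the support statistics defined above are simple fractions and can be illustrated directly. The sketch below follows the definitions in the text, but the function names and input formats are assumptions; in particular, how a term is judged to be "present" in a bootstrap replicate is not specified here, so the sketch takes precomputed true/false flags. The term robustness score is omitted because its formula is given only in (4).

```python
from itertools import combinations


def interaction_density(term_genes, interactions):
    """Fraction of gene pairs in the term that are linked by an interaction.

    term_genes:   iterable of gene names assigned to the term
    interactions: iterable of (gene_a, gene_b) pairs, treated as undirected
    """
    genes = sorted(set(term_genes))
    n_pairs = len(genes) * (len(genes) - 1) // 2
    if n_pairs == 0:
        return 0.0
    edges = {frozenset(pair) for pair in interactions}
    connected = sum(
        1 for a, b in combinations(genes, 2) if frozenset((a, b)) in edges
    )
    return connected / n_pairs


def bootstrap_score(present_flags):
    """Fraction of bootstrap replicates in which the term was recovered."""
    flags = list(present_flags)
    return sum(1 for flag in flags if flag) / len(flags) if flags else 0.0


# Toy example: three genes, two of the three possible pairs interact.
density = interaction_density(["A", "B", "C"], [("A", "B"), ("B", "C")])
score = bootstrap_score([True, True, False, True])
```

For the toy inputs the density is 2/3 and the bootstrap score is 0.75.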

NeXO gene–gene interaction network

To allow for visual inspection of the interaction evidence supporting each NeXO term, the term information panel also includes a dynamic network layout of gene interaction data supporting the term (Figure 3B). For terms with fewer than 100 associated genes, the supporting network is laid out using the spring-embedded layout. Larger networks are visualized using a simple degree-sorted circular layout for fast online performance. Interactions in the network are color-coded according to their type (e.g. protein–protein or genetic). The interactions supporting each NeXO term are also listed in the interaction tab of the information panel.
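The layout rule above can be sketched as a size check followed by a degree-sorted placement on a circle. This is an illustrative sketch, not the Cytoscape.js configuration used by NeXO Web: the function names are hypothetical, and placing the highest-degree gene first is an assumption, since the text does not state the sort direction.

```python
import math

SPRING_LAYOUT_MAX_GENES = 100  # threshold stated in the text


def choose_layout(n_genes):
    """Pick the layout family by term size, as described in the text."""
    return "spring" if n_genes < SPRING_LAYOUT_MAX_GENES else "circle"


def degree_sorted_circle(nodes, edges, radius=1.0):
    """Place nodes evenly on a circle, ordered by decreasing degree.

    nodes: list of node names
    edges: list of (a, b) pairs whose endpoints are all in `nodes`
    Returns a dict mapping node name to an (x, y) position.
    """
    degree = {node: 0 for node in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Ties are broken by name so the result is deterministic.
    ordered = sorted(nodes, key=lambda node: (-degree[node], node))
    step = 2 * math.pi / max(len(ordered), 1)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(ordered)
    }


positions = degree_sorted_circle(["a", "b", "c"], [("a", "b"), ("b", "c")])
```

A circular placement needs only one sort and no iterative force simulation, which is consistent with the stated goal of fast online performance for large terms.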

TREE-BASED LAYOUT OF THE ONTOLOGY

NeXO Web utilizes a tree-based layout of the ontology DAG. This requires identifying a tree structure which spans the ontology, laying out the tree and adding back the additional DAG edges not included in the spanning tree. Although NeXO has a natural spanning tree in the form of the clustering dendrogram derived from the input network data, GO DAGs require additional processing. Here we construct a tree from the original GO DAG by removing edges (parent–child term relations) to multiple parent nodes (terms) based on term size (number of genes) and the type of ontology relation. As done in (4), we first reduce the GO DAG to a relevant set of terms by removing terms that are empty (contain no genes) or redundant (contain the same genes as one of the children terms) with respect to the annotations in S. cerevisiae (10). We then apply rules for combining GO relations (3) to infer a transitive closure of the DAG. For example, the path A “part of” B “is a” C “is a” D implies the relation A “part of” D. For every term, the parent with the smallest size is chosen to be the term’s sole parent in the GO tree with the following preferences. In the GO Cellular Component ontology we first choose among the parents connected to the term by “part of” relations, if any exist. In the Biological Process and Molecular Function ontologies we first consider “is a” relations. We find that these preferences result in more informative trees due to the natural subcomponent relations in the Cellular Component ontology and the more functional nature of relations in the other two GO ontologies. For every term, after one of the parents is selected, edges to the other parents are temporarily removed—they are added back after the layout of the tree is established.
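The two rules in this section, combining relations along a path and choosing a single tree parent per term, can be sketched as follows. The composition rule encodes the standard GO behaviour illustrated by the example in the text (any path containing a "part of" step yields "part of"). The function names, the data structures and the tie-breaking by term ID are assumptions made for illustration.

```python
from functools import reduce


def compose(first, second):
    """Combine two consecutive relations into the inferred relation."""
    return "is_a" if first == second == "is_a" else "part_of"


def choose_tree_parent(parent_edges, term_size, preferred_relation):
    """Pick the sole tree parent for one term.

    parent_edges:       list of (parent_id, relation) pairs for the term
    term_size:          dict mapping a term ID to its number of genes
    preferred_relation: "part_of" for Cellular Component,
                        "is_a" for Biological Process and Molecular Function
    """
    if not parent_edges:
        return None  # root term
    preferred = [p for p, rel in parent_edges if rel == preferred_relation]
    candidates = preferred or [p for p, _ in parent_edges]
    # Smallest parent wins; ties are broken by ID (an assumption).
    return min(candidates, key=lambda p: (term_size[p], p))


# The path from the text: A part_of B is_a C is_a D implies A part_of D.
inferred = reduce(compose, ["part_of", "is_a", "is_a"])

# Toy term with three hypothetical parents.
edges = [("P1", "is_a"), ("P2", "part_of"), ("P3", "part_of")]
sizes = {"P1": 5, "P2": 50, "P3": 20}
cc_parent = choose_tree_parent(edges, sizes, "part_of")
bp_parent = choose_tree_parent(edges, sizes, "is_a")
```

With the Cellular Component preference the smaller "part of" parent `P3` is chosen even though `P1` is smaller overall, while the "is a" preference selects `P1`. The edges to the unselected parents would be set aside and restored after the tree layout is computed.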

SOFTWARE AND HARDWARE REQUIREMENTS

NeXO Web was developed and tested using the Chrome and Firefox web browsers. Minimum hardware requirements include an Intel Core i5 processor (or equivalent), 4 GB RAM and a 1280 × 800 screen resolution.

CONCLUSION

The NeXO Web database and platform is a systematically generated resource for genomics and systems biology—a data-driven catalog of cellular machinery from genes, to complexes, to pathways and higher-order processes. It provides means for performing multiscale analysis of biological networks, including automatically identifying, annotating and visualizing their complete hierarchical structure. Each NeXO term is automatically scored based on its support in data and correspondence to known biology as captured by the GO. For cell biologists, NeXO Web provides an intuitive framework for exploring both expert-curated and data-driven ontologies and for prioritizing new terms and term relations that can further be validated experimentally. For editors of the GO, the platform may serve as a tool for identifying terms and term relations that are already well supported by data and literature, but may have escaped prior curation efforts.

FUNDING

The National Resource for Network Biology (nrnb.org) under a grant from the National Institute of General Medical Sciences [GM103504]. Funding for open access charge: National Resource for Network Biology (NIH).

Conflict of interest statement. None declared.
  11 in total

1.  The National Center for Biomedical Ontology.

Authors:  Mark A Musen; Natalya F Noy; Nigam H Shah; Patricia L Whetzel; Christopher G Chute; Margaret-Anne Story; Barry Smith
Journal:  J Am Med Inform Assoc       Date:  2011-11-10       Impact factor: 4.497

2.  The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration.

Authors:  Barry Smith; Michael Ashburner; Cornelius Rosse; Jonathan Bard; William Bug; Werner Ceusters; Louis J Goldberg; Karen Eilbeck; Amelia Ireland; Christopher J Mungall; Neocles Leontis; Philippe Rocca-Serra; Alan Ruttenberg; Susanna-Assunta Sansone; Richard H Scheuermann; Nigam Shah; Patricia L Whetzel; Suzanna Lewis
Journal:  Nat Biotechnol       Date:  2007-11       Impact factor: 54.908

3.  The chemical genomic portrait of yeast: uncovering a phenotype for all genes.

Authors:  Maureen E Hillenmeyer; Eula Fung; Jan Wildenhain; Sarah E Pierce; Shawn Hoon; William Lee; Michael Proctor; Robert P St Onge; Mike Tyers; Daphne Koller; Russ B Altman; Ronald W Davis; Corey Nislow; Guri Giaever
Journal:  Science       Date:  2008-04-18       Impact factor: 47.728

4.  A gene ontology inferred from molecular networks.

Authors:  Janusz Dutkowski; Michael Kramer; Michal A Surma; Rama Balakrishnan; J Michael Cherry; Nevan J Krogan; Trey Ideker
Journal:  Nat Biotechnol       Date:  2013-01       Impact factor: 54.908

5.  QuickGO: a user tutorial for the web-based Gene Ontology browser.

Authors:  Rachael P Huntley; David Binns; Emily Dimmer; Daniel Barrell; Claire O'Donovan; Rolf Apweiler
Journal:  Database (Oxford)       Date:  2009-09-29       Impact factor: 3.451

6.  Resolving the structure of interactomes with hierarchical agglomerative clustering.

Authors:  Yongjin Park; Joel S Bader
Journal:  BMC Bioinformatics       Date:  2011-02-15       Impact factor: 3.169

7.  The Gene Ontology: enhancements for 2011.

Authors: 
Journal:  Nucleic Acids Res       Date:  2011-11-18       Impact factor: 16.971

8.  Saccharomyces Genome Database: the genomics resource of budding yeast.

Authors:  J Michael Cherry; Eurie L Hong; Craig Amundsen; Rama Balakrishnan; Gail Binkley; Esther T Chan; Karen R Christie; Maria C Costanzo; Selina S Dwight; Stacia R Engel; Dianna G Fisk; Jodi E Hirschman; Benjamin C Hitz; Kalpana Karra; Cynthia J Krieger; Stuart R Miyasato; Rob S Nash; Julie Park; Marek S Skrzypek; Matt Simison; Shuai Weng; Edith D Wong
Journal:  Nucleic Acids Res       Date:  2011-11-21       Impact factor: 16.971

9.  An improved, bias-reduced probabilistic functional gene network of baker's yeast, Saccharomyces cerevisiae.

Authors:  Insuk Lee; Zhihua Li; Edward M Marcotte
Journal:  PLoS One       Date:  2007-10-03       Impact factor: 3.240

10.  AmiGO: online access to ontology and annotation data.

Authors:  Seth Carbon; Amelia Ireland; Christopher J Mungall; ShengQiang Shu; Brad Marshall; Suzanna Lewis
Journal:  Bioinformatics       Date:  2008-11-25       Impact factor: 6.937

