INDRA-IPM: interactive pathway modeling using natural language with automated assembly.

Petar V Todorov, Benjamin M Gyori, John A Bachman, Peter K Sorger.

Abstract

SUMMARY: INDRA-IPM (Interactive Pathway Map) is a web-based pathway map modeling tool that combines natural language processing with automated model assembly and visualization. INDRA-IPM contextualizes models with expression data and exports them to standard formats.
AVAILABILITY AND IMPLEMENTATION: INDRA-IPM is available at: http://pathwaymap.indra.bio. Source code is available at http://github.com/sorgerlab/indra_pathway_map. The underlying web service API is available at http://api.indra.bio:8000.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press.

Year:  2019        PMID: 31070726      PMCID: PMC6821420          DOI: 10.1093/bioinformatics/btz289

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

Disease- or process-specific pathway maps are commonly used to communicate mechanistic information about interacting genes and proteins (Ostaszewski et al.). These maps contain information on biomolecules and their interactions organized around a specific biological process, for instance growth-factor signaling mediated by RAS and MAP kinases (Stephen et al.). Unlike genome-wide interactomes, pathway maps are typically restricted in scope and fit to purpose, which improves human intelligibility and avoids the ‘hairball effect’. Multiple graphical editing tools have been developed to assemble and display pathway maps (King et al.; O’Hara et al.; Sari et al.), but these do not currently use the primary medium of scientific communication in biomedicine: natural language. Natural language descriptions are familiar, do not require specialized expertise to create and edit, and can be drawn directly from the scientific literature (Gyori et al.). The use of natural language interfaces for pathway modeling and analysis makes it possible to draw on a much larger community of experts. In this article, we describe the INDRA (Integrated Network and Dynamical Reasoning Assembler) Interactive Pathway Map (INDRA-IPM), a web-based pathway modeling tool that builds on the capabilities of INDRA (Gyori et al.) to construct and edit pathway maps in natural language and display the results in familiar graphical formats. INDRA-IPM allows models to be exported in several standard exchange formats, thereby enabling the use of existing tools for causal inference, visualization and kinetic modeling. We also make the capabilities of INDRA-IPM available as a web service to facilitate use by other software.

2 Results

2.1 Pathway map construction

INDRA-IPM provides an interface for entering English language text that describes mechanisms, from which a pathway map is generated. This description is processed by a natural language processing (NLP) system; users can choose between the REACH (Valenzuela-Escarcega et al.) and TRIPS (Allen et al.) NLP systems. A pathway map is then generated automatically from the NLP output and visualized dynamically. Users can iteratively update and extend the pathway map by editing the underlying natural language.

2.2 Visual representation

The pathway map is represented as a directed graph with nodes corresponding to molecular entities (genes/proteins, families, complexes and small molecules) and edges representing mechanistic relationships among them. The graph is displayed using CytoscapeJS (Franz et al.) with a two-stage hierarchical layout procedure designed to reduce visual complexity. INDRA-IPM groups nodes with identical incoming and outgoing edges and aggregates them into a single bounding box with collapsed edges (e.g. RASGRF and SOS in Fig. 1). Nodes representing protein families or complexes (e.g. the Sprouty family, SPRY in Fig. 1) are recognized using the FamPlex ontology (Bachman et al.) and represented by a single node subdivided to show the genes in the family as slices.
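The grouping rule described above (nodes with identical incoming and outgoing edges share one bounding box) can be illustrated with a minimal sketch. This is not the INDRA-IPM implementation: the function name, the edge-triple representation and the toy edges are assumptions made for illustration only, and the edges are not curated biology.

```python
from collections import defaultdict


def group_equivalent_nodes(edges):
    """Group nodes that share identical incoming and outgoing edges.

    `edges` is an iterable of (source, relation, target) triples.
    Returns a sorted list of sorted node-name lists, one per group
    containing more than one node.
    """
    incoming = defaultdict(set)
    outgoing = defaultdict(set)
    nodes = set()
    for source, relation, target in edges:
        outgoing[source].add((relation, target))
        incoming[target].add((source, relation))
        nodes.update((source, target))

    # Nodes with the same (incoming, outgoing) signature are
    # interchangeable for layout and can share one bounding box.
    by_signature = defaultdict(list)
    for node in nodes:
        signature = (frozenset(incoming[node]), frozenset(outgoing[node]))
        by_signature[signature].append(node)

    return sorted(sorted(g) for g in by_signature.values() if len(g) > 1)


# Toy edges, invented for illustration (hypothetical, not curated biology).
edges = [
    ("RTK", "activates", "SOS1"),
    ("RTK", "activates", "RASGRF1"),
    ("SOS1", "activates", "KRAS"),
    ("RASGRF1", "activates", "KRAS"),
]
groups = group_equivalent_nodes(edges)
print(groups)
```

In this toy graph, SOS1 and RASGRF1 have the same upstream and downstream neighbors, so they form the only group; a renderer could then draw one collapsed edge into and out of their shared box.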
Fig. 1. Pathway maps are assembled from natural language descriptions of mechanisms (1). INDRA-IPM renders pathway maps as graphs with node and edge grouping and coloring determined by mutational status and expression level (2). Pathways can be stored and shared on NDEx and exported and downloaded in many standard formats (3). Node tooltips provide links to online databases with information on the genes/proteins in the pathway and to antibodies against proteins in the node (4). Literature-based evidence for a given interaction can be accessed by clicking on an edge. Corresponding evidence sentences drawn from these publications are then shown with links out to the PubMed entry in which the sentences are found (5).

2.3 Integration with modeling and exchange formats

To leverage automated assembly for diverse modeling tasks, INDRA-IPM exports models as SBML, SBGN, BNGL, Kappa, PySB and CX. These formats are widely used in computational biology for modeling, simulating and visualizing pathways. Users also have the option of storing maps on the Network Data Exchange (NDEx) (Pratt et al.), where they can be shared and reloaded into INDRA-IPM using a persistent URL. More details on these formats are available in the Supplementary Material.

2.4 Integration with gene level data

INDRA-IPM enables users to project mutation and expression data onto the pathway map and thereby visualize data specific to a particular cell type. Mutation status is mapped to color (green nodes denote wild-type and orange nodes denote mutations) and relative expression level is mapped to color intensity (greater color saturation denotes higher expression). Cancer Cell Line Encyclopedia (CCLE; Barretina et al.) data are embedded in INDRA-IPM, making it possible to view mutation and expression information for 996 cell lines.
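The color encoding described above can be sketched as a small function that takes hue from mutation status and saturation from relative expression. This is a hedged illustration rather than the tool's actual styling code: the hue values, the minimum saturation, the brightness and the function name are all assumptions.

```python
import colorsys

# Hypothetical base hues in degrees: green for wild-type, orange for mutated.
WILD_TYPE_HUE = 120.0
MUTATED_HUE = 30.0


def node_color(mutated, expression, expr_min, expr_max):
    """Return a hex color: hue from mutation status, saturation from expression."""
    hue = MUTATED_HUE if mutated else WILD_TYPE_HUE
    span = expr_max - expr_min
    # Scale expression into [0, 1]; clamp values outside the given range.
    relative = 0.0 if span <= 0 else (expression - expr_min) / span
    relative = min(1.0, max(0.0, relative))
    # Assumed floor of 0.2 so low-expression nodes still show their hue.
    saturation = 0.2 + 0.8 * relative
    r, g, b = colorsys.hsv_to_rgb(hue / 360.0, saturation, 0.9)
    return "#{:02x}{:02x}{:02x}".format(
        round(r * 255), round(g * 255), round(b * 255)
    )


# Usage: a highly expressed wild-type node and a highly expressed mutated node.
print(node_color(False, 10.0, 0.0, 10.0))
print(node_color(True, 10.0, 0.0, 10.0))
```

Separating hue from saturation keeps the two data types visually independent: a reader can tell mutation status from the color family alone, and expression from how vivid the node is.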

2.5 Integration with external resources

The NLP tools used by INDRA-IPM link each node (or subnode in a family) to a database identifier using named entity recognition. This makes it possible to connect a pathway map to standard external resources via uniform identifiers. For example, clicking on a node opens a tooltip with links to HGNC, UniProt and CiteAb, allowing users to access details about the constituents of a pathway and identify reagents useful for experiments (e.g. antibodies).
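As a rough sketch of how grounded identifiers can be turned into outbound links, the example below maps a node's database references to URLs. The `Node` class, the `HGNC`/`UP` key names and the use of identifiers.org resolver URLs are assumptions for illustration; the links that INDRA-IPM actually generates may differ, and CiteAb is omitted because its URL scheme is not given in the text.

```python
from dataclasses import dataclass, field

# Assumed link templates using identifiers.org compact identifiers.
LINK_TEMPLATES = {
    "HGNC": "https://identifiers.org/hgnc:{id}",
    "UP": "https://identifiers.org/uniprot:{id}",
}


@dataclass
class Node:
    """Hypothetical pathway-map node with grounded database references."""
    name: str
    db_refs: dict = field(default_factory=dict)


def tooltip_links(node):
    """Return {namespace: url} for every reference with a known template."""
    return {
        namespace: LINK_TEMPLATES[namespace].format(id=identifier)
        for namespace, identifier in node.db_refs.items()
        if namespace in LINK_TEMPLATES
    }


# Usage: KRAS grounded to its HGNC and UniProt identifiers.
kras = Node("KRAS", {"HGNC": "6407", "UP": "P01116", "TEXT": "K-Ras"})
links = tooltip_links(kras)
print(links)
```

References without a known template (here the raw `TEXT` entry) are simply skipped, so an ungrounded node yields an empty set of links rather than a broken URL.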

2.6 Integration with evidence from scientific literature

Clicking on an edge in a pathway map retrieves support for that interaction by querying a database of interactions aggregated by INDRA. This database includes information gathered from reading literature at scale (Valenzuela-Escarcega et al.) and information found in curated knowledge bases such as Pathway Commons (Cerami et al.) and the BEL Large Corpus (www.openbel.org). Users are therefore able to access literature support for relationships specified in natural language descriptions.

2.7 RAS pathway map

As a demonstration of INDRA-IPM, we wrote 43 English sentences to capture all nodes and interactions in a pathway map originally created by the NCI RAS Initiative (cancer.gov/research/key-initiatives/ras). The INDRA-IPM map automatically follows the same visual conventions as the hand-drawn map, hierarchically organizing the graph and spatially grouping related nodes to reduce clutter. In addition, INDRA-IPM substantially extends the original map by providing access to supporting evidence sourced by INDRA, linking elements to external data resources and providing context from CCLE data. The RAS pathway map is available as a built-in example in INDRA-IPM.

2.8 Web service API

To facilitate integration of INDRA-IPM with other tools, we make available a web-based API that provides access to the reading, assembly and export functions of INDRA-IPM.
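A client of such a web service would typically send text as JSON over HTTP and receive extracted statements in return. The sketch below only constructs the request; nothing is sent unless `send` is called. The base URL is the one given in the abstract, but the endpoint path `/trips/process_text` and the `{"text": ...}` payload shape are assumptions that should be checked against the service's own documentation before use.

```python
import json
import urllib.request

API_BASE = "http://api.indra.bio:8000"  # base URL stated in the abstract


def build_request(path, payload):
    """Build (but do not send) a JSON POST request to the web service."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        API_BASE + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=30):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# ASSUMPTION: the endpoint path and payload keys below are illustrative,
# not confirmed by the article.
read_request = build_request(
    "/trips/process_text", {"text": "MEK phosphorylates ERK."}
)
print(read_request.get_method(), read_request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or log, and lets the same helper target reading, assembly or export endpoints once their paths are known.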

Funding

This work was funded under the DARPA Big Mechanism and CwC Programs [W911NF-14-1-0397 and W911NF-15-1-0544] and by NIH [P50-GM107618].

Conflict of Interest: P.K.S. holds equity in Merrimack Pharmaceuticals, Glencoe Software, Applied Biomath and RareCyte Inc. P.K.S. declares that none of these relationships are directly or indirectly related to the content of this article.
References

1.  Dragging ras back in the ring.

Authors:  Andrew G Stephen; Dominic Esposito; Rachel K Bagni; Frank McCormick
Journal:  Cancer Cell       Date:  2014-03-17       Impact factor: 31.743

2.  NDEx, the Network Data Exchange.

Authors:  Dexter Pratt; Jing Chen; David Welker; Ricardo Rivas; Rudolf Pillich; Vladimir Rynkov; Keiichiro Ono; Carol Miello; Lyndon Hicks; Sandor Szalma; Aleksandar Stojmirovic; Radu Dobrin; Michael Braxenthaler; Jan Kuentzer; Barry Demchak; Trey Ideker
Journal:  Cell Syst       Date:  2015-10-28       Impact factor: 10.304

3.  Pathway Commons, a web resource for biological pathway data.

Authors:  Ethan G Cerami; Benjamin E Gross; Emek Demir; Igor Rodchenkov; Ozgün Babur; Nadia Anwar; Nikolaus Schultz; Gary D Bader; Chris Sander
Journal:  Nucleic Acids Res       Date:  2010-11-10       Impact factor: 16.971

4.  The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors:  Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal:  Nature       Date:  2012-03-28       Impact factor: 49.962

5.  SBGNViz: A Tool for Visualization and Complexity Management of SBGN Process Description Maps.

Authors:  Mecit Sari; Istemi Bahceci; Ugur Dogrusoz; Selcuk Onur Sumer; Bülent Arman Aksoy; Özgün Babur; Emek Demir
Journal:  PLoS One       Date:  2015-06-01       Impact factor: 3.240

6.  Escher: A Web Application for Building, Sharing, and Embedding Data-Rich Visualizations of Biological Pathways.

Authors:  Zachary A King; Andreas Dräger; Ali Ebrahim; Nikolaus Sonnenschein; Nathan E Lewis; Bernhard O Palsson
Journal:  PLoS Comput Biol       Date:  2015-08-27       Impact factor: 4.475

7.  From word models to executable models of signaling networks using automated assembly.

Authors:  Benjamin M Gyori; John A Bachman; Kartik Subramanian; Jeremy L Muhlich; Lucian Galescu; Peter K Sorger
Journal:  Mol Syst Biol       Date:  2017-11-24       Impact factor: 11.429

8.  FamPlex: a resource for entity recognition and relationship resolution of human protein families and complexes in biomedical text mining.

Authors:  John A Bachman; Benjamin M Gyori; Peter K Sorger
Journal:  BMC Bioinformatics       Date:  2018-06-28       Impact factor: 3.169

9.  Community-driven roadmap for integrated disease maps.

Authors:  Marek Ostaszewski; Stephan Gebel; Inna Kuperstein; Alexander Mazein; Andrei Zinovyev; Ugur Dogrusoz; Jan Hasenauer; Ronan M T Fleming; Nicolas Le Novère; Piotr Gawron; Thomas Ligon; Anna Niarakis; David Nickerson; Daniel Weindl; Rudi Balling; Emmanuel Barillot; Charles Auffray; Reinhard Schneider
Journal:  Brief Bioinform       Date:  2019-03-25       Impact factor: 11.622

10.  Cytoscape.js: a graph theory library for visualisation and analysis.

Authors:  Max Franz; Christian T Lopes; Gerardo Huck; Yue Dong; Onur Sumer; Gary D Bader
Journal:  Bioinformatics       Date:  2015-09-28       Impact factor: 6.937
