Mansoor Saqi, Artem Lysenko, Yi-Ke Guo, Tatsuhiko Tsunoda, Charles Auffray.
Abstract
Large amounts of data emerging from experiments in molecular medicine are leading to the identification of molecular signatures associated with disease subtypes. The contextualization of these patterns is important for obtaining mechanistic insight into the aberrant processes associated with a disease, and this typically involves the integration of multiple heterogeneous types of data. In this review, we discuss knowledge representations that can be useful to explore the biological context of molecular signatures, in particular three main approaches, namely, pathway mapping approaches, molecular network centric approaches and approaches that represent biological statements as knowledge graphs. We discuss the utility of each of these paradigms, illustrate how they can be leveraged with selected practical examples and identify ongoing challenges for this field of research.
Keywords: disease modeling; integrated knowledge networks; molecular medicine; multi-omics; precision medicine
Year: 2019 PMID: 29684165 PMCID: PMC6556902 DOI: 10.1093/bib/bby025
Source DB: PubMed Journal: Brief Bioinform ISSN: 1467-5463 Impact factor: 11.622
Approaches to knowledge representation for contextualizing disease biomarkers
| Approach | Examples of formats/frameworks | Advantages | Drawbacks |
|---|---|---|---|
| Pathway-centric | SBML, SBGN, BioPAX | Ease of navigation (e.g. using NaviCell, Google Maps API) | Difficult to represent disease context |
| Integrated molecular networks | GeneMANIA, STRING | Easy-to-use resources and tools | Difficult to represent disease context, although connecting layers of information can provide some context |
| Knowledge graphs | openBEL, RDF, MalaCards, BioXM™ | Agility of graph databases; semantic web approaches offer a federated solution; openBEL framework captures context | Lack of formal ontology in graph database representations; Semantic Web solutions do have a formal ontology but lack agility of graph databases such as Neo4j |
Figure 1. Examples of representations for contextualizing disease-associated genes. A pathway-centric view of the network neighbourhood of ALOX5 from ReactomeViz, using the Cytoscape plug-in (left). A molecular network-centric view using the GeneMANIA Cytoscape plug-in (middle). A heterogeneous network, including proteins, pathways and diseases, constructed and displayed using Neo4j (right).
List of abbreviations
| Abbreviation | Expanded name | Comment |
|---|---|---|
| API | Application programming interface | A set of interfaces, tools and functions used for the creation of applications |
| BEL | Biological Expression Language | A curation language for structured capturing of data about biological systems and experiments |
| BELIEF | BEL Information Extraction workFlow system | A framework for automated parsing of information into BEL format |
| BioPAX | Biological Pathways Exchange language | A standard that formally defines biological pathway conceptualization in OWL format |
| BMP | Big Mechanism Project | A project by the US Department of Defense for automated construction of mechanistic cancer models from scientific literature |
| COPD | Chronic Obstructive Pulmonary Disease | A lung disease characterized by impeded breathing and phenotypically similar to asthma |
| Cypher | | A query language for the Neo4j graph database |
| DBMS | Database management system | Software providing capabilities to mediate storage, query and manipulation of data |
| EBI | European Bioinformatics Institute | |
| EFO | Experimental Factor Ontology | |
| GO | Gene Ontology | One of the most widely used ontologies for representing functions of biological entities |
| GWAS | Genome-wide association study | An observational study that relates germ-line variations between individuals to phenotypes |
| IR | Information Retrieval | A process of extraction of relevant information from some wider superset |
| Kappa | | A rule-based, declarative language used mostly in the molecular biology domain |
| Neo4j | | Graph-based database solution |
| OBO | Open Biomedical Ontologies initiative | The OBO format is an alternative language for authoring ontologies; mainly used in biological sciences |
| OWL | Web Ontology Language | Currently the most widely used language for authoring ontologies |
| PD | Process Description language | An extension of the SBGN standard with additional capabilities to represent temporal aspects of biological processes |
| PPI | Protein-protein interaction | |
| RDF | Resource Description Framework | Core data exchange format for the Semantic Web |
| SBGN | Systems Biology Graphical Notation | A standard for graphical representation in the biological and biochemical domains |
| SBML | Systems Biology Markup Language | An XML-based format for storing and exchanging biological models |
| SBO | Systems Biology Ontology | |
| SNP | Single Nucleotide Polymorphism | A single-base variant or mutation in a genomic sequence |
| SQL | Structured Query Language | Family of similar query languages used for data management in relational databases |
| TM | Text Mining | A process of extraction of structured data from free text |
| TMO | Translational Medicine Ontology | |
| URL | Uniform Resource Locator | Web address, resolving to a resource on the Web |
| XML | Extensible Markup Language | Popular meta-language for document markup |
Figure 2. Analysis of the severe asthma differential expression gene signature in the context of the STRING protein association network. (A) Fifty genes in the first shell of the 22-gene signature; (a–c) high-degree module interconnectivity genes; (1) calmodulins and HIF-1 signalling pathway, (2) TGF-beta signalling pathway, (3) cytokine–cytokine receptor interaction pathway and (4) circadian rhythm-related genes. (B) Diffusion state distance of gene sets; this measure is derived from the similarity of random walk profiles for each pair of nodes. Random pairs of nodes are shown in grey. Distances between members of the asthma signatures are shown in blue (the severe asthma versus DisGeNET category shows distances between members of the two sets). Finally, the 10 genes closest to the differential expression signature set are shown in green. (C) The 10 genes closest to the severe asthma signature, visualized in the STRING database viewer.
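The diffusion state distance mentioned in panel (B) can be illustrated with a minimal sketch. It assumes the commonly used definition in which each node is profiled by the expected number of visits a fixed-length random walk makes to every node, and two nodes are compared by the L1 distance between their profiles. The function names, the `steps` parameter and the toy adjacency matrix are illustrative assumptions, not taken from the article or from any specific tool.

```python
def transition_matrix(adj):
    """Row-normalise a symmetric 0/1 adjacency matrix into random-walk probabilities."""
    P = []
    for row in adj:
        degree = sum(row)
        if degree == 0:
            raise ValueError("isolated node: random walk is undefined")
        P.append([x / degree for x in row])
    return P


def matmul(A, B):
    """Multiply two square matrices stored as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expected_visits(adj, steps):
    """He[i][j]: expected visits to j by a walk of `steps` steps from i (step 0 counted)."""
    n = len(adj)
    P = transition_matrix(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    He = [row[:] for row in power]
    for _ in range(steps):
        power = matmul(power, P)  # next power of P
        He = [[He[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return He


def dsd(adj, u, v, steps=5):
    """L1 distance between the random-walk visit profiles of nodes u and v."""
    He = expected_visits(adj, steps)
    return sum(abs(a - b) for a, b in zip(He[u], He[v]))


# Hypothetical toy network: a three-node path 0 - 1 - 2.
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(dsd(path, 0, 2, steps=1))
```

On a real protein association network, a sparse-matrix library would be preferable to the list-based multiplication used here, which is kept only to make the sketch dependency-free.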
Figure 3. Significant pathways identified using the Tied Diffusion through Interacting Events (TieDIE) method, an approach that can extract the most likely set of interconnectors between two sets of nodes. In this example, core asthma genes from DisGeNET and the severe asthma differential expression signature were considered in the context of the Reactome regulatory pathway network. In both panels, colour intensity indicates the weight magnitude assigned by the algorithm. (A) Complete set of all significant genes. Dashed lines indicate undirected edges of ‘component’ type, whereas solid edges show directed interactions; (B) subset of the network relating to the IL1RN gene. Arrow and bar-terminated edge styles show activation and inhibition, respectively.
Figure 4. Query expressed in natural language and the resulting output produced for the differential expression gene signature using the Hetionet graph database. The following entities are shown: proteins (blue), pathways (yellow), diseases (brown) and drugs (red). (A) Query for associated diseases; (B) query to explore the connections between the drug niclosamide, asthma and the differential expression signature, linked via relevant proteins and pathways.
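The kind of typed path query described in this caption can be sketched without a graph database. The example below is only an illustration of the idea: it matches paths whose node types follow a given pattern (for example drug, gene, pathway, gene, disease) in a small in-memory heterogeneous network. All node labels, relation names and edges are hypothetical placeholders, not data from Hetionet or from the article, and the functions are not part of any Neo4j or Hetionet API.

```python
from collections import defaultdict

# Hypothetical toy edges (source, relation, target); placeholders, not real biology.
EDGES = [
    ("drug:D1", "BINDS", "gene:G1"),
    ("gene:G1", "PARTICIPATES_IN", "pathway:P1"),
    ("gene:G2", "PARTICIPATES_IN", "pathway:P1"),
    ("disease:X", "ASSOCIATES_WITH", "gene:G2"),
    ("disease:Y", "ASSOCIATES_WITH", "gene:G3"),
]


def build_index(edges):
    """Undirected adjacency index: node -> set of (relation, neighbour)."""
    index = defaultdict(set)
    for src, rel, dst in edges:
        index[src].add((rel, dst))
        index[dst].add((rel, src))
    return index


def node_type(node):
    """Node type is encoded as the prefix before the first colon."""
    return node.split(":", 1)[0]


def match_paths(index, start, type_pattern):
    """Return all simple paths from `start` whose node types follow `type_pattern`."""
    paths = [[start]] if node_type(start) == type_pattern[0] else []
    for wanted in type_pattern[1:]:
        # Extend every partial path by one neighbour of the required type.
        paths = [
            path + [nbr]
            for path in paths
            for _, nbr in sorted(index[path[-1]])
            if node_type(nbr) == wanted and nbr not in path
        ]
    return paths


index = build_index(EDGES)
pattern = ["drug", "gene", "pathway", "gene", "disease"]
for path in match_paths(index, "drug:D1", pattern):
    print(" -> ".join(path))
```

In a graph database such as Neo4j the same pattern would normally be written declaratively in Cypher rather than traversed by hand; the sketch only makes the traversal explicit.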