Núria Queralt-Rosinach, Gregory S Stupp, Tong Shu Li, Michael Mayers, Maureen E Hoatlin, Matthew Might, Benjamin M Good, Andrew I Su.
Abstract
Hypothesis generation is a critical step in research and a cornerstone in the rare disease field. Research is most efficient when those hypotheses are based on the entirety of knowledge known to date. Systematic review articles are commonly used in biomedicine to summarize existing knowledge and contextualize experimental data. But the information contained within review articles is typically only expressed as free-text, which is difficult to use computationally. Researchers struggle to navigate, collect and remix prior knowledge as it is scattered in several silos without seamless integration and access. This lack of a structured information framework hinders research by both experimental and computational scientists. To better organize knowledge and data, we built a structured review article that is specifically focused on NGLY1 Deficiency, an ultra-rare genetic disease first reported in 2012. We represented this structured review as a knowledge graph and then stored this knowledge graph in a Neo4j database to simplify dissemination, querying and visualization of the network. Relative to free-text, this structured review better promotes the principles of findability, accessibility, interoperability and reusability (FAIR). In collaboration with domain experts in NGLY1 Deficiency, we demonstrate how this resource can improve the efficiency and comprehensiveness of hypothesis generation. We also developed a read-write interface that allows domain experts to contribute FAIR structured knowledge to this community resource. In contrast to traditional free-text review articles, this structured review exists as a living knowledge graph that is curated by humans and accessible to computational analyses. Finally, we have generalized this workflow into modular and repurposable components that can be applied to other domain areas. This NGLY1 Deficiency-focused network is publicly available at http://ngly1graph.org/.
Year: 2020 PMID: 32283553 PMCID: PMC7153956 DOI: 10.1093/database/baaa015
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Conceptual overview of structured review articles. This figure represents the distribution of knowledge in databases accessible to the community in terms of the domains compiled (X axis) and the information structured (Y axis). Gray squares indicate the knowledge focus of a database with regard to the domain(s) and the information structured.
Figure 2. Library architecture. The system is built from four components. The edges component contains libraries with functions to collect, normalize and format the information and data resources to be integrated as individual networks. The graph component contains functions to integrate these networks and create the knowledge graph. The Neo4j component contains the module that imports the graph into Neo4j. Finally, the hypothesis-generation component contains the modules to query the graph, structure the resulting semantic paths and extract summaries for analysing the connections and their evidence.
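To make the flow through the first three components more concrete, the following is a minimal Python sketch, not the authors' library. The names `Edge`, `normalize`, `build_graph` and `to_import_csv`, the toy statements, and the placeholder identifiers (`phenotype:X`, `geneA`, `geneB`) are all hypothetical. The CSV header fields `:START_ID`, `:TYPE` and `:END_ID` follow the Neo4j bulk-import convention; whether the published workflow uses exactly this format is an assumption.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    """One statement in an individual network (hypothetical record layout)."""
    subject: str
    predicate: str
    target: str
    source: str


def normalize(raw_rows, source):
    """Edges component (sketch): trim identifiers and normalize predicate labels."""
    return [
        Edge(s.strip(), p.strip().lower().replace(" ", "_"), o.strip(), source)
        for s, p, o in raw_rows
    ]


def build_graph(*edge_lists):
    """Graph component (sketch): merge networks, keeping the first copy of each statement."""
    merged = {}
    for edges in edge_lists:
        for e in edges:
            merged.setdefault((e.subject, e.predicate, e.target), e)
    return list(merged.values())


def to_import_csv(edges):
    """Neo4j component (sketch): serialize edges as a relationship CSV for bulk import."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([":START_ID", ":TYPE", ":END_ID", "source"])
    for e in edges:
        writer.writerow([e.subject, e.predicate, e.target, e.source])
    return buf.getvalue()


# Toy input: two hypothetical networks that share one statement.
curated = normalize([(" NGLY1 ", "Has Phenotype", "phenotype:X")], "curation")
database = normalize(
    [("NGLY1", "has phenotype", "phenotype:X"), ("geneA", "regulates", "geneB")],
    "database",
)
graph = build_graph(curated, database)
csv_text = to_import_csv(graph)
```

Keeping normalization separate from merging means each source can be cleaned independently before duplicates are detected, which mirrors the separation between the edges and graph components described in the caption.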
Figure 3. Exploration of mechanistic paths between NGLY1 and AQP1 based on the regulatory hypothesis. (A) First query topology for the regulatory hypothesis. We defined a path topology based on gene pathways of length four linking the NGLY1 ortholog in Drosophila (Pngl) with the human AQP1 gene. The bridging nodes and edges were based on transcriptional regulatory relationships in both Drosophila and human, plus orthology relationships between human and fly genes. (B) Mechanistic hypotheses resulting from the first query.
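The idea of a fixed path topology can be illustrated with a small, self-contained Python sketch that matches an ordered sequence of edge types on a toy graph. This is not the authors' query code. The function `find_paths`, the edge labels `regulates` and `ortholog_of`, the order of edge types in `pattern`, and the intermediate nodes (`flyGeneA`, `flyTF_B`, `humanTF_B`, `humanGeneA`) are hypothetical placeholders, and edge direction is simplified so that every edge points along the path.

```python
def find_paths(edges, start, end, pattern):
    """Return node paths from `start` to `end` whose edge types follow `pattern` in order."""
    # Index outgoing neighbours by (node, edge type) for constant-time lookup.
    out = {}
    for subject, edge_type, target in edges:
        out.setdefault((subject, edge_type), []).append(target)

    paths = [[start]]
    for edge_type in pattern:
        # Extend every partial path by one edge of the required type, avoiding revisits.
        paths = [
            path + [nxt]
            for path in paths
            for nxt in out.get((path[-1], edge_type), [])
            if nxt not in path
        ]
    return [path for path in paths if path[-1] == end]


# Toy graph with placeholder intermediate genes (not real data).
toy_edges = [
    ("Pngl", "regulates", "flyGeneA"),
    ("flyGeneA", "regulates", "flyTF_B"),
    ("flyTF_B", "ortholog_of", "humanTF_B"),
    ("humanTF_B", "regulates", "AQP1"),
    ("flyGeneA", "ortholog_of", "humanGeneA"),  # distractor: does not fit the pattern
]

# Assumed length-four topology: two fly regulatory steps, one orthology step,
# one human regulatory step.
pattern = ["regulates", "regulates", "ortholog_of", "regulates"]
paths = find_paths(toy_edges, "Pngl", "AQP1", pattern)
```

On this toy graph the only match is the path Pngl, flyGeneA, flyTF_B, humanTF_B, AQP1. In a graph database the same constraint would typically be written as a pattern in the query language instead of a hand-written traversal.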
Figure 4. Exploration of the evidence relating candidate regulators of AQP1 to NGLY1 Deficiency phenotypes. (A) Second query topology for the hypothesis that AQP1 regulation and the disease phenotypes share a genetic basis. (B) Hypotheses resulting from the second query. All edges are of type ‘has phenotype’.