Deepak R Unni, Sierra A T Moxon, Michael Bada, Matthew Brush, Richard Bruskiewich, J Harry Caufield, Paul A Clemons, Vlado Dancik, Michel Dumontier, Karamarie Fecho, Gustavo Glusman, Jennifer J Hadlock, Nomi L Harris, Arpita Joshi, Tim Putman, Guangrong Qin, Stephen A Ramsey, Kent A Shefchek, Harold Solbrig, Karthik Soman, Anne E Thessen, Melissa A Haendel, Chris Bizon, Christopher J Mungall.
Abstract
Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Several factors have left the task of reconciling data sources to downstream consumers:

- data set heterogeneity and complexity;
- the proliferation of ad hoc data formats;
- poor compliance with guidelines on findability, accessibility, interoperability, and reusability;
- in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs.

Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and the relationships between them (or predicates). Together these represent biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative. We also show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.
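Because both classes and predicates in Biolink Model are arranged in `is_a` hierarchies, a common operation is to ask whether one element descends from another. One example is checking that a predicate ultimately descends from `biolink:related_to`. The Python sketch below performs that lookup over a hand-written parent map. The map (`PARENT`) and the helper names (`ancestors`, `descends_from`) are hypothetical. The map is a simplified fragment chosen for illustration, not the actual Biolink Model hierarchy, which has more levels and is defined in the model's own schema.

```python
# Hypothetical, simplified is_a parent map for illustration only.
# The real Biolink Model hierarchy has more levels and changes between versions;
# in particular, the positively_regulates -> affects link is illustrative.
PARENT = {
    "biolink:Gene": "biolink:BiologicalEntity",
    "biolink:Disease": "biolink:BiologicalEntity",
    "biolink:PhenotypicFeature": "biolink:BiologicalEntity",
    "biolink:BiologicalEntity": "biolink:NamedThing",
    "biolink:has_phenotype": "biolink:related_to",
    "biolink:affects": "biolink:related_to",
    "biolink:positively_regulates": "biolink:affects",
}


def ancestors(element: str) -> list[str]:
    """Return the is_a ancestors of an element, nearest parent first."""
    chain = []
    current = PARENT.get(element)
    while current is not None:
        chain.append(current)
        current = PARENT.get(current)
    return chain


def descends_from(element: str, ancestor: str) -> bool:
    """True if `element` is `ancestor` or has it somewhere above it."""
    return element == ancestor or ancestor in ancestors(element)


print(ancestors("biolink:Gene"))
# ['biolink:BiologicalEntity', 'biolink:NamedThing']
print(descends_from("biolink:positively_regulates", "biolink:related_to"))
# True
```

A query tool could use the same walk in reverse to expand a broad predicate into its descendants, so that a search for a general relationship also matches more specific edges.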
Year: 2022 PMID: 36125173 PMCID: PMC9372416 DOI: 10.1111/cts.13302
Source DB: PubMed Journal: Clin Transl Sci ISSN: 1752-8054 Impact factor: 4.438
Biolink Model elements and their definitions
| Biolink Model element | Definition | Examples |
|---|---|---|
| Class | High‐level types (or categories) representing core biological concepts of interest such as genes, diseases, chemical substances, anatomic structures, and phenotypic features, arranged in a class hierarchy | biolink:Disease, biolink:PhenotypicFeature, biolink:Gene, biolink:SequenceVariant |
| Predicate | Objects that define the action being carried out by the subject (or named entity) of a core triple and help define how two entities (or classes) can be related to one another. In graph formalism, predicates are relationships that link two instances. Predicates in the Biolink Model all descend from the “biolink:related_to” predicate | biolink:has_phenotype, biolink:positively_regulates, biolink:affects, biolink:associated_with, biolink:related_to |
| Node property | A set of attributes that can be regarded as a characteristic or inherent part of an instance of “biolink:NamedThing” | biolink:symbol, biolink:name, biolink:id |
| Edge property | A set of attributes that can be regarded as a characteristic or inherent part of a statement, association, or edge | biolink:publications, biolink:has_evidence |
| Core triple | The domain knowledge of an association expressed by the subject and object nodes plus the predicate connecting them | biolink:Disease biolink:has_phenotype biolink:PhenotypicFeature |
| Association | Associations are classes that define a relationship between two domain concepts, constrained and qualified by edge attributes | biolink:DiseaseToPhenotypicFeatureAssociation, biolink:GeneToDiseaseAssociation |
| Type | A kind of value that determines which operations can be performed on data of that type. Biolink Model implements common types, such as integer and string, and also defines custom types, such as quotient and unit | URI or CURIE, string, integer, biolink:Quotient, biolink:Unit |
| Mixin | Modeling elements used to extend the properties (or slots) of a class, without changing its position in the class hierarchy. Please see the Biolink Model documentation for more information on mixin elements | biolink:GeneOrGeneProduct, biolink:DiseaseOrPhenotypicFeature |
Abbreviations: CURIE, compact URI; URI, uniform resource identifier.
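The elements in the table map naturally onto a small set of record types. Nodes carry node properties such as an identifier and a name, and edges carry a core triple plus edge properties such as publications. The Python sketch below shows one way to hold them. The `Node` and `Association` dataclasses, their field selection, and the `EX:` identifiers are hypothetical simplifications, not the Biolink Model schema or any official library.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An instance of a Biolink class with a few node properties (simplified)."""
    id: str          # a CURIE; the "EX:" prefix used below is hypothetical
    category: str    # a Biolink class, e.g. "biolink:Disease"
    name: str = ""


@dataclass
class Association:
    """An edge: subject, predicate, object, plus edge properties (simplified)."""
    subject: Node
    predicate: str   # a Biolink predicate, e.g. "biolink:has_phenotype"
    object: Node
    category: str = "biolink:Association"
    publications: list[str] = field(default_factory=list)

    def core_triple(self) -> tuple[str, str, str]:
        """Return the core triple at the class level: (subject class, predicate, object class)."""
        return (self.subject.category, self.predicate, self.object.category)


disease = Node("EX:0001", "biolink:Disease", "example disease")
phenotype = Node("EX:0002", "biolink:PhenotypicFeature", "example phenotype")
edge = Association(
    disease,
    "biolink:has_phenotype",
    phenotype,
    category="biolink:DiseaseToPhenotypicFeatureAssociation",
    publications=["PMID:0000000"],  # placeholder, not a real citation
)
print(edge.core_triple())
# ('biolink:Disease', 'biolink:has_phenotype', 'biolink:PhenotypicFeature')
```

Keeping the edge properties on the association record, rather than on the nodes, mirrors the table's split between node properties and edge properties.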
FIGURE 1. An example of an Association represented in Biolink Model. In (a), the green ovals represent the subject and object classes, connected by a predicate. Together, the classes and the predicate constitute a statement or "core triple" in the model. Edge properties provide further context and qualification to the core triple. The entire diagram, including the core triple and its provenance, represents a Biolink Model "association." In (b), we see a specific example of a "biolink:DiseaseToPhenotypicFeatureAssociation," where the subject is "biolink:Disease," the object is "biolink:PhenotypicFeature," and the predicate is "biolink:has_phenotype." In addition, the "biolink:publications" property (lavender oval) records the provenance of the core triple.
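An association class constrains which subject class, predicate, and object class may appear in its core triple. A loader can therefore check incoming edges against those constraints. The Python sketch below encodes only the pairing from Figure 1 (subject `biolink:Disease`, predicate `biolink:has_phenotype`, object `biolink:PhenotypicFeature`) and checks a JSON-like edge record against it. The `CONSTRAINTS` table, the `validate` helper, the record layout, and the `EX:` and placeholder identifiers are all hypothetical. The check uses exact matches; a fuller check would also accept descendant classes and predicates.

```python
# Hypothetical constraint table: association class ->
# (subject class, allowed predicates, object class).
# Only the Figure 1 pairing is encoded; exact matches only, no subclass reasoning.
CONSTRAINTS = {
    "biolink:DiseaseToPhenotypicFeatureAssociation": (
        "biolink:Disease",
        {"biolink:has_phenotype"},
        "biolink:PhenotypicFeature",
    ),
}


def validate(edge: dict) -> list[str]:
    """Return a list of constraint violations (empty if the edge conforms)."""
    rule = CONSTRAINTS.get(edge["category"])
    if rule is None:
        return [f"unknown association class: {edge['category']}"]
    subject_cat, predicates, object_cat = rule
    problems = []
    if edge["subject"]["category"] != subject_cat:
        problems.append(f"subject must be {subject_cat}")
    if edge["predicate"] not in predicates:
        problems.append(f"predicate {edge['predicate']} not allowed")
    if edge["object"]["category"] != object_cat:
        problems.append(f"object must be {object_cat}")
    return problems


edge = {
    "category": "biolink:DiseaseToPhenotypicFeatureAssociation",
    "subject": {"id": "EX:0001", "category": "biolink:Disease"},
    "predicate": "biolink:has_phenotype",
    "object": {"id": "EX:0002", "category": "biolink:PhenotypicFeature"},
    "publications": ["PMID:0000000"],  # placeholder provenance, not a real citation
}
print(validate(edge))
# []
```

Returning a list of messages, rather than raising on the first problem, lets a pipeline report every violation in a record at once.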
FIGURE 2. An overview of the Translator architecture that supports biomedical KG‐based question‐answering, including the role of Biolink Model, in the context of an example question. In this example, a user has posed the natural‐language question: what chemicals or drugs might be used to treat neurological disorders, such as epilepsy, that are associated with genomic variants of RHOBTB2? The question is translated into a graph query, as shown in the top left panel, which is then translated into a Translator standard machine query (not shown). The KG shown in the second panel from the left is derived from diverse "knowledge sources," a subset of which are displayed in the figure, that are exposed by Translator "knowledge providers." Biolink Model provides standardization and semantic harmonization across the disparate knowledge sources, thereby allowing them to be integrated into a KG capable of supporting question‐answering. In this example, Translator provided two answers or results of interest to the investigative team who posed the question, namely, fostamatinib disodium and ruxolitinib, as shown in the bottom left panel. KG, knowledge graph.
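Answering a graph query amounts to finding bindings of the query's node variables to KG nodes. Each binding must match the required Biolink category and, when given, a pinned identifier. Every query edge must also exist in the KG with the required predicate. The brute-force Python sketch below illustrates that idea on a tiny in-memory graph with a simplified gene, disease, chemical pattern (it omits the variant hop in the figure's question). Everything in it is hypothetical: the `EX:` identifiers, the toy edges, the dictionary layout, and the `candidates` and `answer` helpers. It does not reproduce the Translator standard machine query format or Translator's reasoning, and it is not real data.

```python
from itertools import product

# Toy KG with hypothetical identifiers (not real biomedical data).
KG_NODES = {
    "EX:gene1": "biolink:Gene",
    "EX:dis1": "biolink:Disease",
    "EX:dis2": "biolink:Disease",
    "EX:chem1": "biolink:ChemicalEntity",
    "EX:chem2": "biolink:ChemicalEntity",
}
KG_EDGES = {
    ("EX:dis1", "biolink:associated_with", "EX:gene1"),
    ("EX:chem1", "biolink:treats", "EX:dis1"),
    ("EX:chem2", "biolink:treats", "EX:dis2"),
}

# Query graph: node variables with a required category (and optional pinned id),
# plus (subject variable, predicate, object variable) edges.
QUERY_NODES = {
    "gene": {"category": "biolink:Gene", "id": "EX:gene1"},
    "disease": {"category": "biolink:Disease"},
    "chemical": {"category": "biolink:ChemicalEntity"},
}
QUERY_EDGES = [
    ("disease", "biolink:associated_with", "gene"),
    ("chemical", "biolink:treats", "disease"),
]


def candidates(spec: dict) -> list[str]:
    """KG nodes whose category matches and whose id matches the pinned id, if any."""
    return [nid for nid, cat in KG_NODES.items()
            if cat == spec["category"] and spec.get("id", nid) == nid]


def answer(query_nodes: dict, query_edges: list) -> list[dict]:
    """Enumerate all bindings and keep those where every query edge exists in the KG."""
    names = list(query_nodes)
    pools = [candidates(query_nodes[n]) for n in names]
    results = []
    for combo in product(*pools):
        binding = dict(zip(names, combo))
        if all((binding[s], p, binding[o]) in KG_EDGES for s, p, o in query_edges):
            results.append(binding)
    return results


print(answer(QUERY_NODES, QUERY_EDGES))
# [{'gene': 'EX:gene1', 'disease': 'EX:dis1', 'chemical': 'EX:chem1'}]
```

Enumerating the full Cartesian product is only workable on toy graphs; a realistic engine would expand from pinned nodes outward along query edges. The point here is that shared Biolink categories and predicates make the match conditions comparable across sources.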