Biolink Model: A universal schema for knowledge graphs in clinical, biomedical, and translational science.

Deepak R Unni¹,², Sierra A T Moxon², Michael Bada³, Matthew Brush³, Richard Bruskiewich⁴, J Harry Caufield², Paul A Clemons⁵, Vlado Dancik⁵, Michel Dumontier⁶, Karamarie Fecho⁷, Gustavo Glusman⁸, Jennifer J Hadlock⁸, Nomi L Harris², Arpita Joshi⁸, Tim Putman³, Guangrong Qin⁸, Stephen A Ramsey⁹, Kent A Shefchek³, Harold Solbrig¹⁰, Karthik Soman¹¹, Anne E Thessen³, Melissa A Haendel³, Chris Bizon⁷, Christopher J Mungall².

Abstract

Within clinical, biomedical, and translational science, an increasing number of projects are adopting graphs for knowledge representation. Graph-based data models elucidate the interconnectedness among core biomedical concepts, enable data structures to be easily updated, and support intuitive queries, visualizations, and inference algorithms. However, knowledge discovery across these "knowledge graphs" (KGs) has remained difficult. Data set heterogeneity and complexity; the proliferation of ad hoc data formats; poor compliance with guidelines on findability, accessibility, interoperability, and reusability; and, in particular, the lack of a universally accepted, open-access model for standardization across biomedical KGs has left the task of reconciling data sources to downstream consumers. Biolink Model is an open-source data model that can be used to formalize the relationships between data structures in translational science. It incorporates object-oriented classification and graph-oriented features. The core of the model is a set of hierarchical, interconnected classes (or categories) and relationships between them (or predicates) representing biomedical entities such as gene, disease, chemical, anatomic structure, and phenotype. The model provides class and edge attributes and associations that guide how entities should relate to one another. Here, we highlight the need for a standardized data model for KGs, describe Biolink Model, and compare it with other models. We demonstrate the utility of Biolink Model in various initiatives, including the Biomedical Data Translator Consortium and the Monarch Initiative, and show how it has supported easier integration and interoperability of biomedical KGs, bringing together knowledge from multiple sources and helping to realize the goals of translational science.

Year: 2022 PMID: 36125173 PMCID: PMC9372416 DOI: 10.1111/cts.13302

Source DB: PubMed Journal: Clin Transl Sci ISSN: 1752-8054 Impact factor: 4.438

INTRODUCTION

The use of graphs to formalize the representation of human knowledge dates back to the origins of artificial intelligence and the use of semantic networks for knowledge representation. The term “knowledge graph” (KG) is gaining popularity and is generally used to encompass a range of graph‐oriented representation frameworks, including Resource Description Framework (RDF) triple stores and labeled property‐graph databases, such as Neo4j. Examples of general‐domain KGs include the Google Knowledge Graph and Wikidata. Within the biomedical sciences, examples include SemMedDB, Hetionet, Implicitome, Monarch Initiative, the biological subset of Wikidata, SPOKE, and KG‐COVID‐19. Although KGs have been defined in various ways, perhaps the most intuitive definition is a graph in which the nodes represent real‐world entities and the edges represent known relationships between those entities. In a KG, the knowledge or “facts” are represented as statements, with each statement modeled as two nodes linked together by an edge representing the relationship between them. The statements can have additional properties, metadata, and qualifying attributes that further capture the meaning of the statement and characterize the properties of nodes and edges. Because the basic structure of a KG is generic, the knowledge contained within a KG can be heterogeneous and mutable and still be representable in the graph. The representation of knowledge as simple connections between core entities makes iterative, rapid development of KGs possible. In addition, by leveraging the graph data structure and using various inference strategies, one can infer new edges or connections between nodes in a graph. Ontology‐oriented KGs allow deductive inference through logical rules, from basic rules such as the Gene Ontology “true path” rule to more sophisticated methods like Description Logic inference. 
Ontology‐oriented KGs are also amenable to machine learning approaches, such as embedding in vector space, which supports the application of deep neural networks for tasks such as link prediction and node classification. Within the biomedical sciences, ontology‐oriented KGs have been used for tasks such as drug repurposing, target prioritization, and phenotype profile matching. Several ontologies and schemas for representing biomedical knowledge are available. A constellation of domain‐specific ontologies from the Open Biological and Biomedical Ontologies (OBO) Foundry can be used for modeling knowledge. For example, the Semanticscience Integrated Ontology is used for representing scientific data and knowledge. The Wikidata Ontology is used by Wikidata for representing knowledge. In terms of schemas, schema.org is used for representing metadata about entities and relationships to other entities. BioSchemas is an extension of schema.org for representing metadata about biological entities. Although existing efforts in modeling knowledge have been valuable, a unified data model that bridges across multiple ontologies, schemas, and data models does not exist. Here, we present Biolink Model as an open‐source, universal data model that defines entities and the relationships between these entities within translational science.

OVERVIEW OF BIOLINK MODEL

Biolink Model is a data model for organizing data in biomedical KGs. The model serves both as a map for bringing together data from different sources under one unified model, and as a bridge between ontological domains. Biolink Model is composed of several modeling elements, including a hierarchy of defined classes, properties (with defined types), predicates, mixins, and associations (Table 1). Domain knowledge in a KG that conforms to Biolink Model is represented using associations. An association minimally includes a subject and an object (Biolink Model classes) related by a Biolink Model predicate, together comprising its core triple (statement or primary assertion). The subject and object of an association are foundational domain concepts (e.g., genes, diseases, chemicals, and phenotypes), whose Internationalized Resource Identifiers (IRIs) come from community standard ontologies (e.g., HGNC, MONDO, ChEBI, and HPO). The predicate is a Biolink Model element that represents the relationship between the subject and object. Associations may also include slots to hold additional metadata about the core triple, primarily information about the provenance, and evidence supporting the assertion (Figure 1).
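The structure of an association described above can be sketched in a few lines of Python. This is a minimal illustration, not the classes generated from Biolink Model itself: the `Node` and `Association` dataclasses are hypothetical names introduced here, and the identifiers are placeholders rather than real ontology terms.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A KG node: an identifier (CURIE) plus the Biolink Model class it instantiates."""
    id: str
    category: str


@dataclass
class Association:
    """A core triple plus edge properties that qualify it (hypothetical sketch)."""
    subject: Node
    predicate: str
    object: Node
    category: str = "biolink:Association"
    publications: List[str] = field(default_factory=list)  # provenance of the triple

    def core_triple(self) -> Tuple[str, str, str]:
        """Return the primary assertion as (subject, predicate, object)."""
        return (self.subject.id, self.predicate, self.object.id)


# Placeholder identifiers for illustration only; they are not real ontology terms.
disease = Node(id="MONDO:0000000", category="biolink:Disease")
phenotype = Node(id="HP:0000000", category="biolink:PhenotypicFeature")

edge = Association(
    subject=disease,
    predicate="biolink:has_phenotype",
    object=phenotype,
    category="biolink:DiseaseToPhenotypicFeatureAssociation",
    publications=["PMID:00000000"],
)
print(edge.core_triple())
```

Keeping provenance on the association rather than on either node reflects the point made in the text: the metadata describes the statement, not the entities it connects.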

TABLE 1

Biolink Model elements and their definitions

Biolink Model element	Definition	Examples
Class	High‐level types (or categories) representing core biological concepts of interest such as genes, diseases, chemical substances, anatomic structures, and phenotypic features, arranged in a class hierarchy	biolink:Disease, biolink:PhenotypicFeature, biolink:Gene, biolink:SequenceVariant
Predicate	Objects that define the action being carried out by the subject (or named entity) of a core triple and help define how two entities (or classes) can be related to one another. In graph formalism, predicates are relationships that link two instances. Predicates in the Biolink Model all descend from the “biolink:related_to” predicate	biolink:has_phenotype, biolink:positively_regulates, biolink:affects, biolink:associated_with, biolink:related_to
Node property	A set of attributes that can be regarded as a characteristic or inherent part of an instance of “biolink:NamedThing”	biolink:symbol, biolink:name, biolink:id
Edge property	A set of attributes that can be regarded as a characteristic or inherent part of a statement, association, or edge	biolink:publications, biolink:has_evidence
Core triple	The domain knowledge of an association expressed by the subject and object nodes plus the predicate connecting them	biolink:Disease biolink:has_phenotype biolink:PhenotypicFeature
Association	Associations are classes that define a relationship between two domain concepts, constrained and qualified by edge attributes	biolink:DiseaseToPhenotypicFeatureAssociation, biolink:GeneToDiseaseAssociation
Type	A kind of value that tells what operations can be performed on a particular data set. Biolink Model implements common types, such as integer and string, but it also defines custom types like quotient and unit	URI or CURIE, string, integer, biolink:Quotient, biolink:Unit
Mixin	Modeling elements used to extend the properties (or slots) of a class, without changing its position in the class hierarchy. Please see the Biolink Model documentation for more information on mixin elements	biolink:GeneOrGeneProduct, biolink:DiseaseOrPhenotypicFeature

Abbreviations: CURIE, compact URI; URI, uniform resource identifier.

FIGURE 1

An example of an Association represented in Biolink Model. In (a), the green ovals represent the subject and object classes, connected by a predicate. Together, the classes and the predicate constitute a statement or “core triple” in the model. Edge properties provide further context and qualification to the core triple. The entire diagram, including the core triple and its provenance, represents a Biolink Model “association.” In (b), we see a specific example of a “biolink:DiseaseToPhenotypicFeatureAssociation,” where the subject is “biolink:Disease,” the object is “biolink:PhenotypicFeature,” and the predicate is “biolink:has_phenotype.” In addition, the “biolink:publications” property (lavender oval) records the provenance of the core triple.

Biolink Model aims to address several challenges that obstruct the interoperability between KGs, including: (1) the need for expertise to transform data between tabular, RDF, and graphical models; (2) sparse and/or inconsistent application of ontologies or other controlled vocabularies, as well as differences in the identifiers that are used for storing instances of nodes within a graph; and (3) the lack of a standard approach to model the intersection of ontological domains (e.g., the relationships between genes and diseases). Using the framework provided by the Linked data Modeling Language (LinkML), Biolink Model is distributed in a variety of formats, including YAML, JSON‐Schema, SQL‐DDL, Python/Java classes, and RDF. Additionally, Unified Modeling Language diagrams provide a visual representation of the model. Biolink Model is accessible in frameworks familiar to a wide variety of developers and database engineers. 
Because the model can be distributed in different formats, the model elements can also be validated using existing toolchains (e.g., JSONSchema validation and SQL constraints), thus speeding up the reconciliation of tabular data, ontologies, and graphs. The biomedical field has been a leader and champion of ontology development. However, this has sometimes led to the development of multiple ontologies or controlled vocabularies for the same domain concept. When this happens, KG creators must identify which vocabulary best suits their needs, as well as understand how to apply concepts from the chosen ontology to their class instances. Biolink Model helps solve this challenge by indicating to users which ontologies should be used for instances of its classes via identifier prefixes (id_prefixes), mappings, and associations. Biolink Model describes its classes in a description field. Part of the definition of a class is an id_prefixes construct. Recognizing that biomedical resources often implement new identifiers for their resource, instead of reusing existing identifiers from other resources, Biolink Model encourages reuse of existing ontologies by providing a list of possible ontologies (via id_prefixes) in preference order for engineers to use when instantiating model classes. For example, for a disease class, Biolink Model suggests that instances of the class use Mondo (the Mondo Disease Ontology) as the preferred disease vocabulary. The id_prefixes modeling construct allows the development of software that can normalize identifiers across data sources. Tools such as the Biomedical Data Translator Node Normalization Service and the Knowledge Graph Exchange Framework use the identifier mappings in Biolink Model to return the preferred equivalent identifier when presented with several identifiers that represent the same domain concept but with different namespaces (e.g., NCBIGene vs. HGNC gene identifiers). 
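The preference-ordered id_prefixes idea can be illustrated with a short Python sketch. It assumes a hypothetical `ID_PREFIXES` table and `preferred_identifier` helper; only the fact that Mondo is the preferred disease vocabulary comes from the text, while the remaining prefixes, their order, and all identifiers are illustrative placeholders. It is not the implementation used by the Node Normalization Service.

```python
from typing import Dict, Iterable, List, Optional

# Illustrative preference list. MONDO first follows the text; the other
# prefixes and their order are assumptions made for this sketch.
ID_PREFIXES: Dict[str, List[str]] = {
    "biolink:Disease": ["MONDO", "DOID", "OMIM"],
}


def prefix_of(curie: str) -> str:
    """Return the namespace part of a CURIE such as 'MONDO:0000000'."""
    return curie.split(":", 1)[0]


def preferred_identifier(category: str, equivalent_ids: Iterable[str]) -> Optional[str]:
    """Pick the identifier whose prefix ranks highest for the given class.

    Returns None when no identifier uses a prefix listed for the class.
    """
    rank = {prefix: i for i, prefix in enumerate(ID_PREFIXES.get(category, []))}
    candidates = [c for c in equivalent_ids if prefix_of(c) in rank]
    if not candidates:
        return None
    return min(candidates, key=lambda c: rank[prefix_of(c)])


# Placeholder identifiers assumed to denote the same disease concept.
same_concept = ["OMIM:000000", "DOID:0000000", "MONDO:0000000"]
print(preferred_identifier("biolink:Disease", same_concept))
```

Because the preference order lives in data rather than in code, the same function serves every class once its prefix list is filled in from the model.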
Each element in Biolink Model is mapped, when possible, to equivalent elements in other ontologies or models. Biolink Model uses mapping terms from the Simple Knowledge Organization System (SKOS) namespace to record classes and objects outside the model that can be considered similar in an exact, broad, narrow, close, or related manner to the Biolink Model class (e.g., the broad_mapping relation implements the skos:broadMatch). These mappings render the model and data more computable, allowing software programs to automatically harmonize and connect disparate data sources, thus facilitating interoperability. Finally, a key feature of Biolink Model is its association elements. Taking inspiration from successful efforts like Semanticscience Integrated Ontology, Biolink Model Association elements establish rules for transforming biomedical knowledge into computable statements and help define how to represent knowledge statements across ontological domains. “Computable,” in this context, means that each Biolink Model association defines the kinds of objects that can participate as a subject or object of a biomedical statement (via domain and range constraints); defines sets of attributes (edge properties described in Table 1 and detailed in the Biolink Model documentation) that are required to properly instantiate a relationship between two domain concepts; and provides a blueprint for registering and maintaining the provenance of each statement. In Web Ontology Language (OWL), Biolink Model association elements are equivalent to axioms, and in RDF, they are equivalent to statements (rdf:Statement). Because provenance and evidence are critical components of any data set (and the knowledge represented therein), Biolink Model provides properties capable of tracking evidence and provenance both at the class and association levels.
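As a rough illustration of what domain and range constraints make checkable, the Python sketch below validates a statement against one association type. The `Constraint` tuple, the `CONSTRAINTS` table, and the `validate` function are hypothetical names for this example; the subject, predicate, and object values for the disease-to-phenotype association follow Figure 1. For brevity the check uses exact class matches, whereas a fuller version would also accept subclasses.

```python
from typing import Dict, List, NamedTuple


class Constraint(NamedTuple):
    """Domain (subject class), predicate, and range (object class) of an association."""
    subject_class: str
    predicate: str
    object_class: str


# One entry, taken from the example in Figure 1; a real table would be
# generated from the model rather than written by hand.
CONSTRAINTS: Dict[str, Constraint] = {
    "biolink:DiseaseToPhenotypicFeatureAssociation": Constraint(
        subject_class="biolink:Disease",
        predicate="biolink:has_phenotype",
        object_class="biolink:PhenotypicFeature",
    ),
}


def validate(assoc_type: str, subject_class: str, predicate: str, object_class: str) -> List[str]:
    """Return a list of problems; an empty list means the statement conforms."""
    constraint = CONSTRAINTS.get(assoc_type)
    if constraint is None:
        return [f"unknown association type: {assoc_type}"]
    problems = []
    # Exact-match checks only; subclass handling is deliberately omitted here.
    if subject_class != constraint.subject_class:
        problems.append(f"subject must be {constraint.subject_class}, got {subject_class}")
    if predicate != constraint.predicate:
        problems.append(f"predicate must be {constraint.predicate}, got {predicate}")
    if object_class != constraint.object_class:
        problems.append(f"object must be {constraint.object_class}, got {object_class}")
    return problems


print(validate(
    "biolink:DiseaseToPhenotypicFeatureAssociation",
    "biolink:Gene",
    "biolink:has_phenotype",
    "biolink:PhenotypicFeature",
))
```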

APPLICATIONS OF BIOLINK MODEL

Translational science, by its nature, involves the application of diverse information derived from different subject matter experts and curated data sources to answer questions through integrated analyses of clinical and biomedical knowledge. Biolink Model supports translation, integration, and harmonization across knowledge sources by capturing subject matter expertise in a machine‐readable format that allows software to interoperate with disparate data sources using a common dialect, facilitated by a harmonized data model. We highlight several examples here.

Biomedical Data Translator (“Translator”) Consortium

The Translator Consortium has adopted Biolink Model as an open‐source upper‐level data model that supports semantic harmonization and reasoning across diverse Translator “knowledge sources.” The model serves a central role in the Translator program and forms the architectural basis of the Translator system, as described below. The Translator program aims to develop a comprehensive, relational, N‐dimensional infrastructure designed to integrate disparate data sources (including objective signs and symptoms of disease, drug effects, chemical and genetic interactions, cell and organ pathology, and other relevant biological entities and relations) and reason over the integrated data to rapidly derive biomedical insights. The ultimate goal of Translator is to augment human reasoning and thereby accelerate translational science and knowledge discovery. To achieve its ambitious goal, the Translator project assembled a diverse interdisciplinary team and a variety of biomedical data sources, including electronic health record data, clinical trial data, genomic and other ‐omics data, chemical reaction data, and drug data. There are hundreds of data sources in the Translator ecosystem, each with its own data representation and in formats that were neither compatible nor interoperable. Moreover, groups within the Translator Consortium had integrated the data sources as knowledge sources within independent KGs, but these KGs were developed using different technologies and formalisms, such as property graphs in Neo4j and semantically linked data via RDF and OWL. In order to interoperate between knowledge sources and reason across KGs, Biolink Model was adopted as the common dialect, thus enabling queries over the entire Translator KG ecosystem. The result was a federated, harmonized ecosystem that supports advanced reasoning and inference to derive biomedical insights based on user queries. 
An example Translator use case involved a collaboration with investigators at the Hugh Kaul Precision Medicine Institute (PMI) at the University of Alabama at Birmingham. PMI investigators posed the following natural‐language question to the Translator Consortium: what chemicals or drugs might be used to treat neurological disorders, such as epilepsy, that are associated with genomic variants of RHOBTB2? The investigators noted that RHOBTB2 variants cause an accumulation of RHOBTB2 protein and that this accumulation is believed to be the cause of the neurological disorder. To answer the PMI investigators’ question, Translator team members structured the following query: NCBIGene:23221 (CURIE for RHOBTB2) ‐> [biolink:entity_regulates_entity, biolink:genetically_interacts_with] ‐> biolink:Protein, biolink:Gene ‐> [biolink:related_to] ‐> biolink:SmallMolecule (Figure 2). Because of the hierarchical structure of Biolink Model, the use of biolink:related_to will also return more specific predicates such as biolink:negatively_regulates and biolink:positively_regulates. The objective was to identify drugs or chemicals that might regulate RHOBTB2 in some manner and thereby reduce the variant‐induced accumulation of RHOBTB2 and associated neurological symptoms. As all nodes and edges within the Translator KG ecosystem are annotated to Biolink Model classes and attributes, a Translator query can be constructed from a natural‐language user question and return results across a multitude of independent data sources. In addition, because the model uses hierarchical classes, with inheritance and polymorphism, natural‐language queries translated to graph queries using Biolink Model syntax can be constructed at varying levels of granularity and return results from all levels of the hierarchy. 
Finally, because Biolink Model provides attributes on both edges and nodes that record provenance and evidence for these knowledge statements, each result is annotated with the trail of evidence that supports it.
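The effect of querying with a general predicate can be sketched as follows. This Python example is a toy under stated assumptions: the `PARENT` map arranges a handful of predicates named in the text into a simplified hierarchy that is not copied from the model, the `EX:` node identifiers are placeholders, and `descendants` and `query` are hypothetical helpers rather than Translator functions.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy child -> parent links. Every predicate descends from biolink:related_to,
# as stated in Table 1; the intermediate arrangement is an assumption.
PARENT: Dict[str, str] = {
    "biolink:affects": "biolink:related_to",
    "biolink:entity_regulates_entity": "biolink:affects",
    "biolink:positively_regulates": "biolink:entity_regulates_entity",
    "biolink:negatively_regulates": "biolink:entity_regulates_entity",
    "biolink:has_phenotype": "biolink:related_to",
}

Edge = Tuple[str, str, str]  # (subject, predicate, object)

# Placeholder edges; the EX: identifiers do not denote real entities.
EDGES: List[Edge] = [
    ("EX:chem1", "biolink:negatively_regulates", "EX:gene1"),
    ("EX:chem2", "biolink:positively_regulates", "EX:gene1"),
    ("EX:disease1", "biolink:has_phenotype", "EX:pheno1"),
]


def descendants(predicate: str) -> Set[str]:
    """Return the predicate together with everything below it in the hierarchy."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    found = {predicate}
    stack = [predicate]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query(predicate: str) -> List[Edge]:
    """Return edges whose predicate is the given one or any of its descendants."""
    wanted = descendants(predicate)
    return [edge for edge in EDGES if edge[1] in wanted]


print(query("biolink:entity_regulates_entity"))
```

Querying with `biolink:related_to` matches every edge in this toy graph, while a narrower predicate matches only the edges beneath it, which is the granularity trade-off described above.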

FIGURE 2

An overview of the Translator architecture that supports biomedical KG‐based question‐answering, including the role of Biolink Model, in the context of an example question. In this example, a user has posed the natural‐language question: what chemicals or drugs might be used to treat neurological disorders, such as epilepsy, that are associated with genomic variants of RHOBTB2? The question is translated into a graph query, as shown in the top left panel, which is then translated into a Translator standard machine query (not shown). The KG shown in the second panel from the left is derived from a variety of diverse “knowledge sources,” a subset of which are displayed in the figure, that are exposed by Translator “knowledge providers.” Biolink Model provides standardization and semantic harmonization across the disparate knowledge sources, thereby allowing them to be integrated into a KG capable of supporting question‐answering. In this example, Translator provided two answers or results of interest to the investigative team who posed the question, namely, fostamatinib disodium and ruxolitinib, as shown in the bottom left panel. KG, knowledge graph.

When Translator team members sent the query to the Translator system, it returned several candidates of interest to PMI investigators, including fostamatinib disodium (CHEMBL.COMPOUND:CHEMBL3989516) and ruxolitinib (CHEMBL.COMPOUND:CHEMBL1789941). A review of the supporting evidence provided by Translator indicates that these are approved drugs that either directly or indirectly reduce or otherwise regulate the expression of RHOBTB2. Thus, Biolink Model helped Translator teams bring data together into a single system, thereby reducing the burden on the user to find and manually assemble data from these independent resources.

Monarch Initiative

Similar to Translator, the Monarch Initiative is a large‐scale bioinformatics web resource focused on leveraging existing biomedical knowledge to connect genotypes with phenotypes in an effort to aid research on genetic diseases. Monarch pulls together data from a wide variety of sources. However, because each source uses its own model to describe entities and their relationships, subject matter expertise is required to manually translate between knowledge representations. Monarch is adopting the Biolink Model to capture these mappings and make them available for other groups to use. For example, one of the main driving use cases for Monarch is to address the need to establish links between phenotypes identified in model organisms (e.g., mice, fruit flies, rats, yeast, worms, and zebrafish) and phenotypes identified in humans. Unsurprisingly, the vocabularies used to describe clinical observations and those used to describe model organisms are quite different. Clinical data often refer to “side effects” and “symptoms,” whereas model organism data typically refer to “traits” or “phenotypes.” In designing Biolink Model, subject matter experts from a variety of disciplines have reconciled these concepts in the “biolink:PhenotypicFeature” class. This makes it possible to query across multiple resources that use multiple terminologies and identifiers and find relevant results.

Illuminating the druggable genome

Illuminating the druggable genome (IDG) aims to identify protein drug targets by developing tools to search, display, and distribute information on these proteins to the biomedical community, while supporting research that helps scientists understand how these proteins function. The Illuminating the Druggable Genome Knowledge Graph (KG‐IDG) was created to use graph‐based machine learning to predict links between drugs and potential targets, in order to identify proteins that are promising drug targets and drugs that are promising repurposing candidates. The generation of this KG relies on Biolink Model to provide a “biolink:Protein” class with mappings to equivalent classes in UniProt, Ensembl, and the Protein Ontology Community. This step ensures that tooling used to identify links among these different protein sources can interrogate them using the same language and same hierarchical data representation. Similarly, KG‐IDG uses the “biolink:InformationContentEntity” grouping class to reason over many diverse sources of biomedical attribution, including clinical trials, books, and journal articles. KG‐IDG is able to reuse the “biolink:InformationContentEntity” hierarchy in Biolink Model to be specific about the attribution stored in the KG and also reason over the attribution using the higher‐level grouping classes, without creating another KG‐IDG‐specific schema.

Additional applications

Translator, Monarch, and KG‐IDG incorporate a broad spectrum of data from a variety of sources, with each source modeling its data using different approaches, independent identifier systems, and heterogeneous data representations. Biolink Model provides the semantic harmonization required to integrate these disparate data sources. A growing number of other projects also consult and reuse components of Biolink Model in designing their models. For example, the Alliance of Genome Resources imports some Biolink Model components, even though it does not use the entirety of Biolink Model. Other initiatives that rely on Biolink Model for data and knowledge harmonization include KG‐COVID‐19 and KG‐Microbe.

DISCUSSION

The success of Biolink Model can be attributed to its community—biologists, clinicians, data curators, developers, subject matter experts, and ontologists—all of whom have contributed their requirements, perspectives, and expertise to help build a flexible semantic data model. Biolink Model is under continual development, with frequent releases and a publicly accessible issue tracker on GitHub. To ensure sustained development of the model, we invite the biomedical community to contribute via GitHub pull requests and use the issue tracker to suggest new features, report problems, or ask questions (see Supplemental Resources within Supplementary Materials for links to the GitHub repository for Biolink Model, documentation, and other relevant resources). Biolink Model provides a blueprint to harmonize existing data sources and accelerate the development of new knowledge by leveraging a multitude of domain and technical expertise, captured in a variety of ontologies and existing models (via semantic mappings), within a single modeling framework that is easy to read, write, reuse, and distribute. Moreover, Biolink Model is grounded in semantic web technologies (characterized by classes and slots with their own IRIs, SKOS mappings to existing ontologies, descriptions, identifier prefixes, and domain and range constraints) and captures biomedical expertise as a computable knowledge artifact that can be read and interpreted by both machines and humans. Importantly, KGs that implement Biolink Model immediately gain access to the frameworks and tools developed by a variety of projects that use the model, as well as a platform to connect any Biolink Model–compliant KG to other Translator biomedical KGs. Because Biolink Model is platform‐agnostic, open‐source, and publicly accessible, and because it can be translated into a variety of data modeling formats, it encourages people from different backgrounds and with different expertise to work together to evolve the model. 
Most importantly, Biolink Model supports the harmonization of KGs and underlying data sources in a manner that adheres to FAIR principles and facilitates applications across a broad spectrum of biomedical use cases, thereby democratizing and accelerating translational science.

CONFLICTS OF INTEREST

The authors declared no competing interests for this work.
