Natalja Kurbatova, Rowan Swiers.
Abstract
BACKGROUND: Integrating data to build a biomedical knowledge graph is a challenging task. Data sources and publications use multiple disease ontologies, each with its own hierarchy. A common workflow is to map between ontologies, find disease clusters, and finally build a representation of the chosen disease area. Few published resources and tools support interactive, efficient and flexible cross-referencing and analysis of the multiple disease ontologies commonly found in data sources and research.
Keywords: Data integration; Knowledge graph; Ontologies
Year: 2021 PMID: 34289807 PMCID: PMC8296689 DOI: 10.1186/s12859-021-04173-w
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The disease ontology basis for the knowledge graph. Data Preparation: ontology matching, presented as a cross-reference flat file, and ontological hierarchies are created from BioPortal and Ontology Lookup Service data processed by R scripts. As an example of a cross-reference file record, a diagram shows chronic kidney disease as represented in six disease ontologies. Grakn Knowledge Base: data are loaded into the database from data files with Python scripts.
Contribution of individual ontologies to cross-referencing and unique terms
| Ontology/counts | # Terms only in this ontology | # Preferred terms | # References | # Unique references |
|---|---|---|---|---|
| MESH | 0 | 0 | 8328 | 8251 |
| UMLS | 0 | 0 | 17,648 | 17,591 |
| EFO | 7 | 70 | 4930 | 4930 |
| NCIT | 0 | 24 | 7067 | 7067 |
| OMIM | 0 | 0 | 8056 | 8032 |
| DOID | 0 | 5 | 9001 | 9001 |
| Orphanet | 1 | 69 | 9066 | 9066 |
| HP | 80 | 75 | 652 | 652 |
| MONDO | 109 | 21,453 | 21,482 | 21,482 |
| ICD10 | 0 | 0 | 11,271 | 4103 |
| Total | 1186 | 21,696 | 97,501 | |
The column "# Terms only in this ontology" gives the number of terms unique to the ontology, that is, terms with no cross-references in other ontologies. "# Preferred terms" gives the number of terms used as main entries, with other ontologies providing the cross-referencing terms. "# References" gives the total number of unique terms and cross-references found in the ontology. "# Unique references" gives the number of distinct (non-repeated) references.
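One way to read these column definitions is as counts over cross-reference records. The following is a minimal sketch under stated assumptions, not the authors' R or Python code: the record layout (`preferred`, `xrefs`), the function name `ontology_counts`, and all identifiers are hypothetical placeholders, and "references" is interpreted here as every appearance of an ontology's terms across records.

```python
from collections import Counter

# Hypothetical cross-reference records: one preferred (main) term per record,
# plus the terms other ontologies use for the same disease.
# All identifiers are placeholders, not real ontology IDs.
records = [
    {"preferred": ("MONDO", "M:1"), "xrefs": [("DOID", "D:1"), ("ICD10", "I:1")]},
    {"preferred": ("MONDO", "M:2"), "xrefs": [("DOID", "D:2"), ("ICD10", "I:1")]},
    {"preferred": ("EFO", "E:1"), "xrefs": []},
]


def ontology_counts(records):
    """Count, per ontology, the four quantities described in the table note."""
    only_here = Counter()   # main entries with no cross-references elsewhere
    preferred = Counter()   # terms used as main entries
    references = Counter()  # every appearance, as main entry or cross-reference
    unique = {}             # distinct term identifiers seen per ontology
    for rec in records:
        onto, _term = rec["preferred"]
        preferred[onto] += 1
        if not rec["xrefs"]:
            only_here[onto] += 1
        for o, t in [rec["preferred"], *rec["xrefs"]]:
            references[o] += 1
            unique.setdefault(o, set()).add(t)
    return {
        o: {
            "only_here": only_here[o],
            "preferred": preferred[o],
            "references": references[o],
            "unique_references": len(unique[o]),
        }
        for o in references
    }


counts = ontology_counts(records)
```

In this toy input the placeholder ICD10 code `I:1` is attached to two different records, so ICD10 has two references but only one unique reference. Under this interpretation, that kind of repetition would account for a gap between the last two columns.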
Fig. 2 Grakn Knowledge Base: part of the schema diagram, showing a disease node with multiple attributes for ontological terms. The Grakn schema with all nodes, attributes and logical rules is available in the GitHub repository.