Philip van Damme, Jesualdo Tomás Fernández-Breis, Nirupama Benis, Jose Antonio Miñarro-Gimenez, Nicolette F de Keizer, Ronald Cornet.
Abstract
BACKGROUND: Ontology matching should contribute to the interoperability aspect of FAIR data (Findable, Accessible, Interoperable, and Reusable). Multiple data sources can use different ontologies for annotating their data, which creates the need for dynamic ontology matching services. In this experimental study, we assessed the performance of ontology matching systems in the context of a real-life application from the rare disease domain. Additionally, we present a method for analyzing top-level classes to improve precision.
Keywords: FAIR data; Ontology matching; Rare diseases; Semantic interoperability
Year: 2022 PMID: 35841031 PMCID: PMC9284868 DOI: 10.1186/s13326-022-00273-5
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 The FAIRification process. Adapted from GO FAIR [3]. This study focuses on step 3 (define the semantic model) and step 4 (make data linkable)
Fig. 2 Use case for this study. How ontology matching can enable data querying of distributed data sources. The mentioned ontologies are ORDO (Orphanet Rare Disease Ontology) [44] and SNOMED Clinical Terms (SNOMED CT) [16]
Fig. 3 Classification of matching techniques. Adapted from [10]. For each category, an example of a possible implementation is given
Fig. 4 Overview of the performed experiments: selection of ontologies, ontology module extraction, matching of the ontologies using the selected matching systems, and evaluation of the alignments using reference alignments and a hierarchical analysis of mappings. The matching systems were selected beforehand
Classification of the matching systems based on the classification model of [10]. The systems that were used in this study are AgreementMakerLight 2.0 [24], FCA-Map [25], and LogMap 2.0 [27]
| | AgreementMakerLight 2.0 | FCA-Map | LogMap 2.0 |
|---|---|---|---|
| Semantic: Formal resource-based | X | - | - |
| Syntactic: Informal resource-based | - | - | - |
| Syntactic: String-based | X | X | X |
| Syntactic: Language-based | X | X | X |
| Syntactic: Constraint-based | - | X | - |
| Semantic: Model-based | - | X | X |
| Syntactic: Instance-based | - | X | - |
| Syntactic: Graph-based | - | - | X |
| Syntactic: Taxonomy-based | - | - | X |
Fig. 5 Mapping example. The class Polyploidy is mapped between NCIt (National Cancer Institute thesaurus [17]) and ORDO (Orphanet Rare Disease Ontology [44]). Shown are a chunk of the RDF output from the alignment and a visual representation of the mapping
Fig. 6 Example of a manually created top-level hierarchy mapping. The four classes from NCIt and SNOMED CT were matched by the matching system, and all four mappings were present in the reference alignments (true positive). Analyzing the top-level hierarchies reveals that NCIt classes are descendants of Anatomic Structure, System, or Substance and SNOMED CT classes of Body structure. A manual mapping between those top-level classes can then be created for NCIt-SNOMED CT
Fig. 7 Categories for evaluation with BioPortal and UMLS Metathesaurus reference alignments. Adapted from [10]
Details of the ontologies and extracted modules
| | ORDO | SNOMED CT | NCIt |
|---|---|---|---|
| Classes in module (% of total) | 299 (2%) | 1,408 (0.4%) | 1,014 (0.7%) |
| Axioms in module (% of total) | 2,227 (0.9%) | 7,105 (0.4%) | 19,017 (0.7%) |
| Object properties in module (% of total) | 7 (39%) | 16 (13%) | 40 (41%) |
| Total classes | 14,502 | 352,449 | 156,172 |
| Total axioms | 234,982 | 1,629,354 | 2,543,710 |
| Total object properties | 18 | 120 | 97 |
Details of the alignments. Shown are the number of mappings in the alignments for the whole ontologies and modules
| Ontology pair | AgreementMakerLight 2.0: # mappings (whole ontology) | AgreementMakerLight 2.0: # mappings (module) | FCA-Map: # mappings (whole ontology) | FCA-Map: # mappings (module) | LogMap 2.0: # mappings (whole ontology) | LogMap 2.0: # mappings (module) |
|---|---|---|---|---|---|---|
| ORDO - SNOMED CT | 6,463 | 42 | 4,973 | 46 | 5,742 | 53 |
| NCIt - ORDO | 2,543 | 36 | 4,663 | 47 | 2,679 | 31 |
| NCIt - SNOMED CT | 18,887 | 193 | 26,630 | 220 | 23,885 | 214 |
Details of the reference alignments. Shown are the number of mappings in the BioPortal and UMLS Metathesaurus reference alignments. Also shown are the overlap and harmonic mean of the overlap between the alignments. The harmonic mean was calculated by weighting the reference alignment means by the number of mappings in each reference alignment
| Ontology pair | Ontology type | Mappings UMLS | Mappings BioPortal | Overlap | Harmonic mean overlap |
|---|---|---|---|---|---|
| ORDO-SNOMED CT | Module | 35 | 7 | 3 | 14% |
| NCIt-ORDO | Module | 27 | 18 | 12 | 53% |
| NCIt-SNOMED CT | Module | 127 | 90 | 56 | 52% |
| ORDO-SNOMED CT | Whole ontology | 3,861 | 1,750 | 776 | 28% |
| NCIt-ORDO | Whole ontology | 1,484 | 1,450 | 656 | 45% |
| NCIt-SNOMED CT | Whole ontology | 19,309 | 16,290 | 10,195 | 57% |
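The harmonic mean overlap column can be reproduced from the three counts in each row. The sketch below assumes the harmonic mean is taken over the two overlap fractions (overlap divided by the size of each reference alignment), which is algebraically equal to 2 × overlap / (UMLS + BioPortal). This reading is an assumption, although it agrees with the percentages reported in the table. The function name is hypothetical.

```python
def harmonic_mean_overlap(n_umls: int, n_bioportal: int, overlap: int) -> float:
    """Harmonic mean of overlap/n_umls and overlap/n_bioportal (assumed definition)."""
    if overlap == 0:
        return 0.0
    frac_umls = overlap / n_umls            # share of UMLS mappings also in BioPortal
    frac_bioportal = overlap / n_bioportal  # share of BioPortal mappings also in UMLS
    return 2 * frac_umls * frac_bioportal / (frac_umls + frac_bioportal)


# (UMLS mappings, BioPortal mappings, overlap) for the module rows of the table.
module_rows = {
    "ORDO-SNOMED CT": (35, 7, 3),
    "NCIt-ORDO": (27, 18, 12),
    "NCIt-SNOMED CT": (127, 90, 56),
}

for pair, counts in module_rows.items():
    print(f"{pair}: {harmonic_mean_overlap(*counts):.0%}")
# ORDO-SNOMED CT: 14%
# NCIt-ORDO: 53%
# NCIt-SNOMED CT: 52%
```

A low value, as for the ORDO-SNOMED CT modules, indicates that the two reference alignments largely disagree, so scores against UMLS and BioPortal should be read separately.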
Evaluation results of the whole ontologies. Shown are the mean precision, recall, and F1-score against both the UMLS and BioPortal reference alignments. Scores for ontology pairs are the mean over all matching systems; scores for matching systems are the mean over all ontology pairs
| Pair or matching system | Precision UMLS | Precision BioPortal | Recall UMLS | Recall BioPortal | F1-score UMLS | F1-score BioPortal |
|---|---|---|---|---|---|---|
| ORDO - SNOMED CT | 0.45 | 0.28 | 0.66 | 0.89 | 0.53 | 0.42 |
| NCIt - ORDO | 0.33 | 0.44 | 0.67 | 0.91 | 0.43 | 0.58 |
| NCIt - SNOMED CT | 0.55 | 0.67 | 0.66 | 0.94 | 0.60 | 0.78 |
| AgreementMakerLight 2.0 | 0.47 | 0.54 | 0.66 | 0.96 | 0.55 | 0.66 |
| FCA-Map | 0.39 | 0.39 | 0.64 | 0.90 | 0.46 | 0.53 |
| LogMap 2.0 | 0.47 | 0.45 | 0.69 | 0.88 | 0.55 | 0.58 |
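As a reminder of how the scores in this table are defined, the sketch below computes precision, recall, and F1-score for one alignment against one reference alignment using plain set comparison. It is a minimal illustration, not the study's evaluation code: the evaluation categories of Fig. 7 may treat some cases differently, and the function name and toy identifiers are hypothetical.

```python
from typing import Set, Tuple

Mapping = Tuple[str, str]  # (class in ontology A, class in ontology B)


def evaluate(alignment: Set[Mapping], reference: Set[Mapping]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) of an alignment against a reference alignment."""
    true_positives = len(alignment & reference)
    precision = true_positives / len(alignment) if alignment else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data with hypothetical identifiers.
alignment = {("a1", "b1"), ("a2", "b2"), ("a3", "b9"), ("a4", "b4")}
reference = {("a1", "b1"), ("a2", "b2"), ("a5", "b5")}

precision, recall, f1 = evaluate(alignment, reference)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.50 recall=0.67 f1=0.57
```

The rows of the table are means of such per-alignment scores, so the F1-score in a row is not necessarily the harmonic mean of the precision and recall shown in that same row.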
Evaluation results of the modules. Shown are the mean precision, recall, and F1-score against both the UMLS and BioPortal reference alignments. Scores for ontology pairs are the mean over all matching systems; scores for matching systems are the mean over all ontology pairs
| Pair or matching system | Precision UMLS | Precision BioPortal | Recall UMLS | Recall BioPortal | F1-score UMLS | F1-score BioPortal |
|---|---|---|---|---|---|---|
| ORDO - SNOMED CT | 0.45 | 0.14 | 0.60 | 0.95 | 0.51 | 0.25 |
| NCIt - ORDO | 0.49 | 0.47 | 0.67 | 0.96 | 0.56 | 0.62 |
| NCIt - SNOMED CT | 0.51 | 0.42 | 0.84 | 0.98 | 0.64 | 0.59 |
| AgreementMakerLight 2.0 | 0.49 | 0.37 | 0.67 | 0.97 | 0.57 | 0.51 |
| FCA-Map | 0.42 | 0.31 | 0.68 | 1.00 | 0.52 | 0.46 |
| LogMap 2.0 | 0.53 | 0.35 | 0.77 | 0.92 | 0.62 | 0.49 |
Manually created mappings of top-level classes
| ORDO | SNOMED CT |
|---|---|
| clinical entity (Orphanet_C001) | Clinical finding (404684003) |
| genetic material (Orphanet_C010) | Substance (105590001) |
| geography (Orphanet_C009) | Environment or geographical location (308916002) |

| NCIt | ORDO |
|---|---|
| Disease, Disorder or Finding (C7057) | clinical entity (Orphanet_C001) |
| Gene Product (C26548) | genetic material (Orphanet_C010) |
| Conceptual Entity (C20181) | geography (Orphanet_C009) |
| Conceptual Entity (C20181) | inheritance (Orphanet_C005) |
| Property or Attribute (C20189) | age of onset (Orphanet_C023) |

| NCIt | SNOMED CT |
|---|---|
| Anatomic Structure, System, or Substance (C12219) | Body structure (123037004) |
| Disease, Disorder or Finding (C7057) | Clinical finding (404684003) |
| Property or Attribute (C20189) | Qualifier value (362981000) |
| Anatomic Structure, System, or Substance (C12219) | Substance (105590001) |
| Activity (C43431) | Procedure (71388002) |
| Organism (C14250) | Organism (410607006) |
| Drug, Food, Chemical or Biomedical Material (C1908) | Substance (105590001) |
| Drug, Food, Chemical or Biomedical Material (C1908) | Pharmaceutical / biologic product (373873005) |
| Manufactured Object (C97325) | Physical object (260787004) |
| Property or Attribute (C20189) | Observable entity (363787002) |
| Conceptual Entity (C20181) | Environment or geographical location (308916002) |
| Conceptual Entity (C20181) | Social context (48176007) |
| Conceptual Entity (C20181) | Observable entity (363787002) |
Hierarchy analysis results. For each system and ontology pair, the table shows the number of mappings whose classes have top-level ancestors that were not matched manually (Table 7). The number and percentage of false positives (FP) refer to the mappings that were discarded from the alignment before recalculating the precision and F1-score
| Matching system | Ontology pair | Whole ontology: incorrect hierarchy mappings (of which FP) | Whole ontology: proportion of total alignment (FP) | Module: incorrect hierarchy mappings (of which FP) | Module: proportion of total alignment (FP) |
|---|---|---|---|---|---|
| AgreementMakerLight 2.0 | ORDO-SNOMED CT | 494 (318) | 8% (5%) | 9 (6) | 21% (14%) |
| FCA-Map | ORDO-SNOMED CT | 489 (310) | 10% (6%) | 11 (8) | 24% (17%) |
| LogMap 2.0 | ORDO-SNOMED CT | 193 (106) | 3% (2%) | 5 (3) | 9% (6%) |
| AgreementMakerLight 2.0 | NCIt-SNOMED CT | 3,055 (252) | 16% (1%) | 46 (13) | 24% (7%) |
| FCA-Map | NCIt-SNOMED CT | 6,868 (3,299) | 26% (12%) | 60 (23) | 27% (10%) |
| LogMap 2.0 | NCIt-SNOMED CT | 3,790 (1,180) | 16% (5%) | 42 (9) | 20% (4%) |
| AgreementMakerLight 2.0 | NCIt-ORDO | 127 (102) | 5% (4%) | 4 (1) | 11% (3%) |
| FCA-Map | NCIt-ORDO | 1,229 (1,170) | 3% (3%) | 12 (8) | 26% (17%) |
| LogMap 2.0 | NCIt-ORDO | 130 (92) | 5% (3%) | 3 (0) | 10% (0%) |
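The hierarchy analysis summarized in this table can be sketched as a filter over an alignment: a mapping is flagged when no pair of its classes' top-level ancestors appears among the manually created top-level mappings. The code below is an illustrative sketch under that assumption, not the authors' implementation. The function name, the lookup dictionaries, the toy reference alignment, and the SNOMED CT label marked as hypothetical are all introduced here for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

Mapping = Tuple[str, str]  # (class label in ontology A, class label in ontology B)


def split_by_top_level(
    mappings: Iterable[Mapping],
    top_level_a: Dict[str, Set[str]],
    top_level_b: Dict[str, Set[str]],
    allowed_pairs: Set[Tuple[str, str]],
) -> Tuple[List[Mapping], List[Mapping]]:
    """Split mappings into (consistent, flagged) by their top-level ancestors."""
    consistent, flagged = [], []
    for class_a, class_b in mappings:
        tops_a = top_level_a.get(class_a, set())
        tops_b = top_level_b.get(class_b, set())
        # Consistent if at least one ancestor pair was mapped manually.
        ok = any((ta, tb) in allowed_pairs for ta in tops_a for tb in tops_b)
        (consistent if ok else flagged).append((class_a, class_b))
    return consistent, flagged


# Two of the manual NCIt-SNOMED CT top-level mappings.
allowed = {
    ("Anatomic Structure, System, or Substance", "Body structure"),
    ("Disease, Disorder or Finding", "Clinical finding"),
}
ncit_tops = {
    "Aneurysmal Bone Cyst": {"Disease, Disorder or Finding"},
    "Soft tissue": {"Anatomic Structure, System, or Substance"},
}
snomed_tops = {
    "Aneurysmal bone cyst (disorder)": {"Clinical finding"},  # hypothetical label
    "Aneurysmal bone cyst (morphologic abnormality)": {"Body structure"},
    "Disorder of soft tissue (disorder)": {"Clinical finding"},
}
mappings = [
    ("Aneurysmal Bone Cyst", "Aneurysmal bone cyst (disorder)"),
    ("Aneurysmal Bone Cyst", "Aneurysmal bone cyst (morphologic abnormality)"),
    ("Soft tissue", "Disorder of soft tissue (disorder)"),
]
# Toy reference alignment (hypothetical), used only to separate FP from TP.
reference = {("Aneurysmal Bone Cyst", "Aneurysmal bone cyst (morphologic abnormality)")}

consistent, flagged = split_by_top_level(mappings, ncit_tops, snomed_tops, allowed)
# Only flagged mappings that are also false positives are discarded.
discarded = [m for m in flagged if m not in reference]
print(len(consistent), len(flagged), len(discarded))
```

Because only false positives are discarded, recall is unaffected and only precision and F1-score change. Each class is given a set of top-level ancestors because a class may have more than one under multiple inheritance.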
Evaluation results of the whole ontologies after removing false-positive mappings with an incorrect top-level hierarchy. Shown are the mean precision and F1-score against both the UMLS and BioPortal reference alignments. Scores for ontology pairs are the mean over all matching systems; scores for matching systems are the mean over all ontology pairs. Recall did not change and is therefore not included
| Pair or matching system | Precision UMLS | Precision BioPortal | F1-score UMLS | F1-score BioPortal |
|---|---|---|---|---|
| ORDO - SNOMED CT | 0.47 (+0.02) | 0.29 (+0.01) | 0.55 (+0.02) | 0.44 (+0.02) |
| NCIt - ORDO | 0.36 (+0.03) | 0.48 (+0.04) | 0.46 (+0.03) | 0.62 (+0.04) |
| NCIt - SNOMED CT | 0.59 (+0.04) | 0.71 (+0.04) | 0.62 (+0.02) | 0.81 (+0.03) |
| AgreementMakerLight 2.0 | 0.49 (+0.02) | 0.56 (+0.02) | 0.56 (+0.01) | 0.68 (+0.02) |
| FCA-Map | 0.44 (+0.05) | 0.45 (+0.06) | 0.51 (+0.05) | 0.59 (+0.06) |
| LogMap 2.0 | 0.48 (+0.01) | 0.47 (+0.02) | 0.56 (+0.01) | 0.60 (+0.02) |
Evaluation results of the modules after removing false-positive mappings with an incorrect top-level hierarchy. Shown are the mean precision and F1-score against both the UMLS and BioPortal reference alignments. Scores for ontology pairs are the mean over all matching systems; scores for matching systems are the mean over all ontology pairs. Recall did not change and is therefore not included
| Pair or matching system | Precision UMLS | Precision BioPortal | F1-score UMLS | F1-score BioPortal |
|---|---|---|---|---|
| ORDO - SNOMED CT | 0.51 (+0.06) | 0.16 (+0.02) | 0.55 (+0.04) | 0.28 (+0.03) |
| NCIt - ORDO | 0.52 (+0.03) | 0.50 (+0.03) | 0.58 (+0.02) | 0.65 (+0.02) |
| NCIt - SNOMED CT | 0.59 (+0.08) | 0.71 (+0.29) | 0.62 (+0.02) | 0.81 (+0.22) |
| AgreementMakerLight 2.0 | 0.54 (+0.05) | 0.39 (+0.02) | 0.59 (+0.02) | 0.52 (+0.01) |
| FCA-Map | 0.49 (+0.07) | 0.37 (+0.06) | 0.57 (+0.05) | 0.52 (+0.06) |
| LogMap 2.0 | 0.55 (+0.02) | 0.36 (+0.01) | 0.64 (+0.02) | 0.49 (+0.00) |
Consensus alignment results. Shown are the F1-scores for vote-based consensus alignments. The number of votes represents how many systems selected the same mapping. F1-scores after correction for false-positive mappings with an incorrect top-level hierarchy are shown in parentheses. AgreementMakerLight 2.0 is abbreviated as AML 2.0
| | All systems (vote ≥ 2) | All systems (vote = 3) | AML 2.0 + FCA-Map | AML 2.0 + LogMap 2.0 | FCA-Map + LogMap 2.0 |
|---|---|---|---|---|---|
| F1-score BioPortal (top-level hierarchy) | 0.80 (0.81) | 0.87 (0.87) | 0.90 (0.90) | 0.87 (0.87) | 0.77 (0.78) |
| F1-score UMLS (top-level hierarchy) | 0.63 (0.65) | 0.59 (0.59) | 0.59 (0.59) | 0.59 (0.59) | 0.63 (0.64) |
| F1-score BioPortal (top-level hierarchy) | 0.71 (0.71) | 0.79 (0.79) | 0.80 (0.80) | 0.73 (0.73) | 0.76 (0.76) |
| F1-score UMLS (top-level hierarchy) | 0.53 (0.54) | 0.53 (0.53) | 0.53 (0.53) | 0.55 (0.55) | 0.51 (0.52) |
| F1-score BioPortal (top-level hierarchy) | 0.44 (0.44) | 0.49 (0.49) | 0.50 (0.50) | 0.44 (0.44) | 0.47 (0.47) |
| F1-score UMLS (top-level hierarchy) | 0.56 (0.57) | 0.51 (0.51) | 0.52 (0.52) | 0.55 (0.56) | 0.52 (0.52) |
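A vote-based consensus alignment of the kind reported above can be sketched as follows: each system contributes one vote per mapping it produced, and a mapping is kept when it reaches the vote threshold. This is a minimal illustration covering the vote-threshold columns only; how the two-system combinations were formed is not specified here and is not modeled. The function name and toy identifiers are hypothetical.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Mapping = Tuple[str, str]  # (class in ontology A, class in ontology B)


def consensus_alignment(alignments: Iterable[Set[Mapping]], min_votes: int) -> Set[Mapping]:
    """Keep mappings produced by at least `min_votes` of the given systems."""
    # Each alignment is a set, so a system votes at most once per mapping.
    votes = Counter(m for alignment in alignments for m in alignment)
    return {m for m, n in votes.items() if n >= min_votes}


# Toy alignments from three systems (hypothetical identifiers).
system_1 = {("a1", "b1"), ("a2", "b2"), ("a3", "b3")}
system_2 = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}
system_3 = {("a1", "b1"), ("a3", "b3"), ("a5", "b5")}

at_least_two = consensus_alignment([system_1, system_2, system_3], min_votes=2)
all_three = consensus_alignment([system_1, system_2, system_3], min_votes=3)
print(sorted(at_least_two))  # [('a1', 'b1'), ('a2', 'b2'), ('a3', 'b3')]
print(sorted(all_three))     # [('a1', 'b1')]
```

Raising the threshold generally trades recall for precision, which is why the two vote settings can yield different F1-scores against the same reference alignment.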
Four examples of mappings (NCIt-SNOMED CT) that are potentially incorrect based on their top-level hierarchies
| # | Label class A | Label class B | Top level hierarchy class A | Top level hierarchy class B |
|---|---|---|---|---|
| 1 | Soft tissue | Disorder of soft tissue (disorder) | Anatomic Structure, System, or Substance | Clinical finding (finding) |
| 2 | Aneurysmal Bone Cyst | Aneurysmal bone cyst (morphologic abnormality) | Disease, Disorder or Finding | Body structure (body structure) |
| 3 | Abnormality | Abnormal (qualifier value) | Disease, Disorder or Finding | Qualifier value (qualifier value) |
| 4 | Cell Proliferation | Hyperplasia (morphologic abnormality) | Biological Process | Body structure (body structure) |
Rare disease data items (117 in total). Items were extracted from the common data elements for rare diseases [14] and the Orphanet rare disease classifications [15]
| Data items 1-31 | Data items 32-62 | Data items 63-93 | Data items 94-117 |
|---|---|---|---|
| Pseudonym | Consent to the reuse of data | Thoracic malformation | Rare Infectious Diseases |
| Personal information | Biological sample | Rare Urogenital Diseases | Cholera |
| Date of birth | Link to a biobank | Urogenital tract malformation | Rare Intoxications |
| Date | Biobank | Rare Surgical Thoracic Diseases | Radiation myelitis |
| Female | Disability | Thoracic outlet syndrome | Rare Gynaecological And Obstetric Diseases |
| Male | Classification of functioning | Rare Skin Diseases | Vaginal carcinoma |
| Foetus | Classification of disability | Ichthyosis | Rare Surgical Maxillo-facial Diseases |
| Sex | Disability profile | Rare Renal Diseases | Cleft palate |
| Patient status | Disability score | Multicystic dysplastic kidney | Rare Allergic Disease |
| Alive | Rare diseases | Rare Eye Diseases | Acquired angioedema |
| Dead | Rare Cardiac Diseases | Retinoblastoma | Rare Teratologic Disorders |
| Lost in follow-up | Rare cardiomyopathy | Rare Endocrine Diseases | Infectious embryofetopathy |
| Opted-out | Rare Developmental Anomalies During Embryogenesis | Neuroendocrine neoplasm | Chromosomal Anomalies Sorted By Chromosomes |
| Opt-out | Hydrops fetalis | Rare Haematological Diseases | Polyploidy |
| Date of death | Rare Cardiac Malformations | Mastocytosis | Rare Rheumatologic Diseases Of Childhood |
| Care pathway | Congenital pericardium anomaly | Rare Immunological Diseases | Kawasaki disease |
| First contact with specialised centre | Rare Sucking Swallowing Disorders | Graft versus host disease | Rare Disorders Potentially Indicated For Transplant |
| Disease history | Stickler syndrome | Rare Systemic And Rhumatological Diseases | Systemic primary carnitine deficiency |
| Age at onset | Rare Inborn Errors Of Metabolism | Hereditary angioedema | Prevalence |
| Antenatal | MPI-CDG | Rare Odontological Diseases | Cases/families |
| At birth | Rare Gastroenterological Diseases | Bruck syndrome | Case |
| Age at diagnosis | Eosinophilic gastroenteritis | Rare Circulatory System Diseases | Worldwide |
| Diagnosis | Rare Genetic Diseases | Congenital renal artery stenosis | Validated |
| Diagnosis of the rare disease | Noonan syndrome | Rare Bone Diseases | Geographic |
| Genetic diagnosis | Rare Neurological Diseases | Aneurysmal bone cyst | |
| Undiagnosed case | Spinal cord injury | Rare Otorhinolaryngological Diseases | |
| Phenotype | Rare Abdominal Surgical Diseases | Familial nasal acilia | |
| Genotype | Adenoma of pancreas | Rare Infertility | |
| Research | Rare Hepatic Diseases | Tuberculosis | |
| Patient permission | Rare vascular liver disease | Rare Neoplastic Diseases | |
| Agreement to be contacted for research purposes | Rare Respiratory Diseases | Germ cell tumor | |