Daniel Faria, Catia Pesquita, Emanuel Santos, Isabel F Cruz, Francisco M Couto.
Abstract
Ontology matching is a growing field of research that is of critical importance for the semantic web initiative. The use of background knowledge for ontology matching is often a key factor for success, particularly in complex and lexically rich domains such as the life sciences. However, in most ontology matching systems, the background knowledge sources are either predefined by the system or have to be provided by the user. In this paper, we present a novel methodology for automatically selecting background knowledge sources for any given ontologies to match. This methodology measures the usefulness of each background knowledge source by assessing the fraction of classes mapped through it over those mapped directly, which we call the mapping gain. We implemented this methodology in the AgreementMakerLight ontology matching framework, and evaluated it using the benchmark biomedical ontology matching tasks from the Ontology Alignment Evaluation Initiative (OAEI) 2013. In each matching problem, our methodology consistently identified the sources of background knowledge that led to the highest improvements over the baseline alignment (i.e., without background knowledge). Furthermore, our proposed mapping gain parameter is strongly correlated with the F-measure of the produced alignments, thus making it a good estimator for ontology matching techniques based on background knowledge.
Year: 2014 PMID: 25379899 PMCID: PMC4224389 DOI: 10.1371/journal.pone.0111226
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
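The mapping gain described in the abstract can be illustrated with a small sketch. It assumes one plausible reading of that definition: the number of classes mapped only through the background knowledge source, divided by the number of classes mapped directly. The function name `mapping_gain` and the class identifiers are hypothetical, and the exact formula implemented in AgreementMakerLight may differ.

```python
def mapping_gain(direct: set[str], via_background: set[str]) -> float:
    """Ratio of classes newly mapped via a background source to classes mapped directly.

    Assumption: this is one reading of the abstract's definition, not
    necessarily the formula implemented in AgreementMakerLight.
    """
    if not direct:
        raise ValueError("the baseline alignment maps no classes")
    # Classes reached through the background source but not by direct matching.
    newly_mapped = via_background - direct
    return len(newly_mapped) / len(direct)


# Hypothetical class identifiers, for illustration only.
direct = {"c1", "c2", "c3", "c4"}
via_background = {"c3", "c4", "c5"}
gain = mapping_gain(direct, via_background)
print(gain)
```

Under this reading, a source that only rediscovers directly mapped classes has a gain of zero, which matches the intuition that it adds nothing over the baseline alignment.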
Ontologies used as background knowledge sources.
| Ontology Name | Acronym | Domain |
| Anatomical Entity Ontology | AEO | anatomy |
| Bilateria Anatomy | BILA | anatomy |
| Cell Type | CL | anatomy |
| Chemical Entities of Biological Interest | CHEBI | biochemistry |
| Common Anatomy Reference Ontology | CARO | anatomy |
| Foundational Model of Anatomy | FMA | anatomy |
| Human Disease Ontology | DOID | health |
| Human Phenotype Ontology | HP | phenotype |
| Infectious Disease | IDO | health |
| Mouse Anatomy | MA | anatomy |
| Minimal Anatomical Terminology | MAT | anatomy |
| NCI Thesaurus | NCI | health |
| NIF Cell | NIFC | neuroscience |
| NIF Dysfunction | NIFD | neuroscience |
| NIF Gross Anatomy | NIFGA | neuroscience |
| Ontology for General Medical Science | OGMS | medicine |
| Phenotypic Quality | PATO | phenotype |
| Subcellular Anatomy Ontology | SAO | anatomy |
| Symptom Ontology | SYMP | health |
| Uber Anatomy Ontology | Uberon | anatomy |
| Vertebrate Homologous Organ Groups | VHOG | anatomy |
| Vertebrate Skeletal Anatomy Ontology | VSAO | anatomy |
Correlation between Mapping Gain and F-measure, between Similarity Score and F-measure, and between Effectiveness and F-measure.
| Evaluation Task | Mapping Gain (All) | Mapping Gain (no UMLS) | Similarity Score (All) | Similarity Score (no UMLS) | Effectiveness (All) | Effectiveness (no UMLS) |
| Mouse-Human | 0.998 | 1.000 | 0.830 | 0.884 | 0.628 | 0.688 |
| FMA-NCI small | 0.997 | 0.988 | 0.168 | 0.716 | 0.965 | 0.783 |
| FMA-NCI whole | 0.609 | −0.985 | 0.800 | −0.938 | 0.613 | −0.840 |
| FMA-SNOMED small | 1.000 | 0.994 | 0.135 | 0.925 | 0.997 | 0.901 |
| FMA-SNOMED whole | 0.996 | 0.944 | 0.993 | 0.789 | 0.994 | 0.860 |
| SNOMED-NCI small | 0.999 | 0.987 | 0.682 | 0.928 | 0.994 | 0.591 |
| SNOMED-NCI whole | 0.999 | 0.972 | 0.982 | 0.867 | 0.993 | 0.581 |
| Average | 0.998 | 0.981 | 0.632 | 0.852 | 0.929 | 0.734 |
Correlation coefficients were computed with all background knowledge sources and with all sources except UMLS. The average was computed excluding the FMA-NCI whole task, as the reference alignment for this task is incomplete, resulting in the negative correlation coefficients observed without UMLS.
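As a quick check of the averaging rule stated in the note above, the following sketch recomputes the Average row for the Mapping Gain columns while leaving out the FMA-NCI whole task. The values are copied from the table; the variable names are illustrative only.

```python
from statistics import mean

# Mapping Gain correlations with F-measure: (all sources, no UMLS).
mapping_gain_corr = {
    "Mouse-Human": (0.998, 1.000),
    "FMA-NCI small": (0.997, 0.988),
    "FMA-NCI whole": (0.609, -0.985),
    "FMA-SNOMED small": (1.000, 0.994),
    "FMA-SNOMED whole": (0.996, 0.944),
    "SNOMED-NCI small": (0.999, 0.987),
    "SNOMED-NCI whole": (0.999, 0.972),
}

# The task with an incomplete reference alignment is excluded from the average.
EXCLUDED = {"FMA-NCI whole"}
kept = [pair for task, pair in mapping_gain_corr.items() if task not in EXCLUDED]

avg_all = round(mean(all_sources for all_sources, _ in kept), 3)
avg_no_umls = round(mean(no_umls for _, no_umls in kept), 3)
print(avg_all, avg_no_umls)
```

Both results agree with the Average row of the table for the Mapping Gain columns.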
Figure 1. F-measure of the automated background knowledge selection methodology as a function of the mapping gain threshold (in descending logarithmic scale) for six ontology matching tasks.
The first shift in each line (left-to-right) corresponds to the transition between the baseline alignment and the selection of the best background knowledge source, and subsequent shifts correspond to the selection of additional background knowledge sources. The addition of some background knowledge sources has no visible effect on F-measure.
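The stepwise behaviour described in the caption can be sketched as a simple threshold filter: as the mapping gain threshold is lowered, more background knowledge sources are selected. This is a simplified illustration, not the selection procedure from the paper; the function `select_sources`, the source names, and the gain values are all hypothetical.

```python
def select_sources(gains: dict[str, float], threshold: float) -> list[str]:
    """Return the sources whose mapping gain meets the threshold, best first."""
    chosen = [name for name, gain in gains.items() if gain >= threshold]
    return sorted(chosen, key=lambda name: gains[name], reverse=True)


# Hypothetical mapping gains; these are not values from the paper.
gains = {"source_A": 0.20, "source_B": 0.03, "source_C": 0.001}

# Lower the threshold in descending logarithmic steps, as on the figure's x-axis.
for threshold in (0.5, 0.1, 0.01, 0.001):
    print(threshold, select_sources(gains, threshold))
```

With a threshold above every gain, no source is selected and the result is the baseline alignment; each further step down can only add sources, never remove them.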
Run time and F-measure of our automated background knowledge selection methodology, and F-measure of the corresponding baseline and optimal alignments.
| Matching Task | Automated Selection Time (s) | Automated Selection F-measure | Baseline F-measure | Optimal F-measure |
| Mouse-Human | 120 | 91.1% | 81.2% | 91.1% |
| FMA-NCI small | 140 | 86.1% | 83.6% | 86.1% |
| FMA-SNOMED small | 190 | 78.1% | 76.3% | 78.1% |
| FMA-SNOMED whole | 800 | 76.5% | 75.6% | 76.6% |
| SNOMED-NCI small | 600 | 72.7% | 69.5% | 72.8% |
| SNOMED-NCI whole | 1500 | 70.4% | 68.2% | 70.4% |
Optimal: alignment obtained with the manually selected combination of background knowledge sources that actually leads to the highest F-measure.
Experiments were run on a desktop computer with an Intel(R) Core(TM) i5-4570 CPU @ 3.20GHz and 16 GB RAM.
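To make the comparison in the table above easier to read, the sketch below derives two quantities from its F-measure columns: the improvement of the automated selection over the baseline, in percentage points, and the tasks where the automated selection reaches the optimal F-measure. The values are copied from the table; the variable names are illustrative only.

```python
# F-measure (%) per task: (automated selection, baseline, optimal).
results = {
    "Mouse-Human": (91.1, 81.2, 91.1),
    "FMA-NCI small": (86.1, 83.6, 86.1),
    "FMA-SNOMED small": (78.1, 76.3, 78.1),
    "FMA-SNOMED whole": (76.5, 75.6, 76.6),
    "SNOMED-NCI small": (72.7, 69.5, 72.8),
    "SNOMED-NCI whole": (70.4, 68.2, 70.4),
}

# Improvement over the baseline, in percentage points, rounded to one decimal.
improvement = {
    task: round(auto - base, 1) for task, (auto, base, _) in results.items()
}

# Tasks where the automated selection equals the manually selected optimum.
matches_optimal = [task for task, (auto, _, opt) in results.items() if auto == opt]

for task, points in improvement.items():
    print(f"{task}: +{points} points")
print("Matches optimal:", matches_optimal)
```

In the two tasks where the automated selection falls short of the optimum, the gap is 0.1 percentage points.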