Abstract
BACKGROUND: We are currently facing a proliferation of heterogeneous biomedical data sources accessible through various knowledge-based applications. These data are annotated by increasingly extensive and widely disseminated knowledge organisation systems, ranging from simple terminologies and structured vocabularies to formal ontologies. To solve the interoperability problem that arises from the heterogeneity of these ontologies, an alignment task is usually performed. However, while significant effort has been made to provide tools that automatically align small ontologies containing hundreds or thousands of entities, little attention has been paid to matching large ontologies in the life sciences domain.
Keywords: Entity similarity; Information retrieval; Life sciences ontologies; Machine learning; Ontology matching; Semantic interoperability
Year: 2014 PMID: 25411633 PMCID: PMC4236493 DOI: 10.1186/2041-1480-5-44
Source DB: PubMed Journal: J Biomed Semantics
Figure 1. Matching process of ServOMap.
Figure 2. Fields generated for the NCI Thesaurus (a), FMA (b) and TheSoz (c).
Example index entries for a concept, a datatype property and an object property after pre-processing
| Entity | Field | Description | Value |
|---|---|---|---|
| Concept | directLabelCEN | Labels in English | thoracvertebrforamen foramenthoracvertebr foramenvertebrthorac vertebrforamenthorac |
| | directNameC | Local name | thoracicvertebralforamen |
| | uri | URI of the entity | |
| Datatype property | dRange | Range restriction | xsd string |
| | directNameP | Local name | outdatmean |
| | domainLabelsDP | Domain restriction (concept hierarchy) | concept name attribute entity |
| | propertyType | Constraint on the property | function |
| | uri | URI of the entity | |
| Object property | directNameP | Local name of the property | geneproductchemicclassif |
| | domainLabelsOP | Domain restriction | Gene Product Kind |
| | rangeLabelsOP | Range restriction | Gene Product Kind |
| | uri | URI of the entity | |
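The Concept rows suggest how a label is normalised before indexing: tokens are lower-cased, stemmed (thoracic → thorac, vertebral → vertebr) and concatenated in several orders. The Python sketch below illustrates that idea under stated assumptions. `crude_stem` is a hypothetical suffix stripper standing in for a real stemmer, and indexing every ordering of the stems is our assumption: the table shows four of the six possible orderings for this three-word label, so ServOMap's actual variant scheme may differ. Only the field names `directNameC` and `directLabelCEN` come from the table.

```python
import itertools
import re

# Hypothetical, deliberately crude suffix stripper standing in for a real
# stemmer; ServOMap's actual normalisation pipeline is not specified here.
SUFFIXES = ("ical", "ic", "al", "s")


def crude_stem(token: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least four characters so short words are left intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 4:
            return token[: -len(suffix)]
    return token


def tokenize(label: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", label.lower())


def index_fields(label: str) -> dict[str, str]:
    """Build two index fields (names borrowed from the table) for one label."""
    tokens = tokenize(label)
    stems = [crude_stem(t) for t in tokens]
    # Assumption: index every ordering of the stems so that word order
    # does not prevent a match between differently phrased labels.
    variants = sorted({"".join(p) for p in itertools.permutations(stems)})
    return {
        "directNameC": "".join(tokens),
        "directLabelCEN": " ".join(variants),
    }


fields = index_fields("Thoracic vertebral foramen")
print(fields["directNameC"])  # thoracicvertebralforamen
print(fields["directLabelCEN"])
```

Because the number of orderings grows factorially with the number of tokens, a practical version would cap or sample the variants for long labels.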
Figure 3. Lexical similarity computation.
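Figure 3 concerns the lexical similarity between entity labels. Since the caption does not name the string measure, the sketch below uses one common choice, the Dice coefficient over character trigrams, purely as an illustration; the function names are hypothetical and ServOMap may combine different or additional measures.

```python
def char_trigrams(text: str) -> set[str]:
    # Pad with spaces so word boundaries (and very short strings) yield trigrams.
    padded = f"  {text.lower()} "
    return {padded[i : i + 3] for i in range(len(padded) - 2)}


def trigram_dice(a: str, b: str) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over character-trigram sets, in [0, 1]."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    # Padding guarantees both sets are non-empty, so the denominator is never zero.
    return 2 * len(ta & tb) / (len(ta) + len(tb))


print(trigram_dice("Thoracic vertebral foramen", "thoracic vertebral foramen"))  # 1.0
print(trigram_dice("Thoracic vertebral foramen", "Vertebral foramen of thorax"))
```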
Figure 4. Strategy for generating candidate pairs.
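Because the approach is framed in information-retrieval terms, one way to read Figure 4 is that the target ontology is indexed and each source label is used as a query whose best-scoring hits become candidate pairs. The sketch below shows that idea with a toy in-memory inverted index and a plain term-overlap score; both are assumptions standing in for a real IR engine and its ranking function, and the entity identifiers are made up.

```python
from collections import Counter, defaultdict


def build_index(labels: dict[str, str]) -> dict[str, set[str]]:
    """Map each lower-cased token to the target entities whose label contains it."""
    index: dict[str, set[str]] = defaultdict(set)
    for entity, label in labels.items():
        for token in label.lower().split():
            index[token].add(entity)
    return index


def candidates(query: str, index: dict[str, set[str]], k: int = 3) -> list[str]:
    """Return up to k target entities ranked by the number of shared tokens."""
    hits: Counter = Counter()
    for token in set(query.lower().split()):
        for entity in index.get(token, ()):
            hits[entity] += 1
    # Sort by descending overlap, then by identifier for a deterministic order.
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in ranked[:k]]


target = {
    "NCI:1": "Thoracic vertebra",
    "NCI:2": "Thoracic vertebral foramen",
    "NCI:3": "Heart",
}
index = build_index(target)
print(candidates("vertebral foramen thoracic", index))  # ['NCI:2', 'NCI:1']
```

Restricting each query to its top k hits is what keeps candidate generation tractable for large ontologies, instead of comparing every source entity with every target entity.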
Figure 5. Machine-learning-based contextual similarity computation.
Figure 6. Discarding incorrect candidate mappings. In part (b), the green dashed line identifies the pair from M_exact, and the dashed black lines marked with a red cross identify the candidate mappings to be removed.
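Figure 6 shows candidates being removed once a pair in M_exact is known. One plausible reading, assumed here rather than taken from the paper, is a one-to-one filter: any candidate that reuses the source or the target entity of an exact mapping is dropped. The function name and the tuple representation of mappings are hypothetical.

```python
Mapping = tuple[str, str]  # (source entity, target entity)


def discard_conflicts(candidates: list[Mapping], exact: set[Mapping]) -> list[Mapping]:
    """Keep exact pairs and drop other candidates that reuse one of their entities."""
    fixed_sources = {source for source, _ in exact}
    fixed_targets = {target for _, target in exact}
    return [
        (source, target)
        for source, target in candidates
        if (source, target) in exact
        or (source not in fixed_sources and target not in fixed_targets)
    ]


exact = {("FMA:A", "NCI:X")}
cands = [("FMA:A", "NCI:X"), ("FMA:A", "NCI:Y"), ("FMA:B", "NCI:X"), ("FMA:B", "NCI:Y")]
print(discard_conflicts(cands, exact))
# [('FMA:A', 'NCI:X'), ('FMA:B', 'NCI:Y')]
```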
Size of input ontologies considered for the different matching problems
| Matching problem | Ontology | Small task | Large task |
|---|---|---|---|
| FMA-NCI | FMA | 5% - 3,696 | 100% - 78,989 |
| | NCI | 10% - 6,488 | 100% - 66,724 |
| FMA-SNOMED | FMA | 13% - 10,157 | 100% - 78,989 |
| | SNOMED | 5% - 13,412 | 40% - 122,464 |
| SNOMED-NCI | SNOMED | 17% - 51,128 | 40% - 122,464 |
| | NCI | 36% - 23,958 | 100% - 66,724 |
Each cell indicates the percentage of the fragment and the corresponding number of concepts.
Performance achieved by the ServOMap_2012 version on the LargeBio dataset
| Matching problem | Task | #Mappings | Precision | Recall | F-Measure | Time |
|---|---|---|---|---|---|---|
| FMA-NCI | Small | 2,300 | 99% | 75.3% | 85.5% | 25 |
| | Large | 2,413 | 93.3% | 74.4% | 82.8% | 98 |
| FMA-SNOMED | Small | 6,009 | 98.5% | 65.7% | 78.8% | 46 |
| | Large | 6,272 | 94.1% | 65.5% | 77.3% | 315 |
| SNOMED-NCI | Small | 10,829 | 97.2% | 55.9% | 70.9% | 153 |
| | Large | 12,462 | 83.5% | 55.2% | 66.4% | 654 |
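The F-Measure column is the harmonic mean of precision and recall, F = 2PR / (P + R). The short sketch below recomputes it from the rounded percentages reported in the tables, which gives a quick consistency check on any row; the function name is ours.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First row of the ServOMap_2012 table: P = 99%, R = 75.3%
print(round(100 * f_measure(0.99, 0.753), 1))  # 85.5
```

Because the inputs are themselves rounded, a recomputed value can differ from the reported one by about 0.1 percentage point.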
Performance achieved by the ServOMap_lt version on the LargeBio dataset
| Matching problem | Task | #Mappings | Precision | Recall | F-Measure | Time |
|---|---|---|---|---|---|---|
| FMA-NCI | Small | 2,468 | 98.8% | 80.6% | 88.8% | 20 |
| | Large | 2,640 | 91.4% | 79.8% | 85.2% | 95 |
| FMA-SNOMED | Small | 6,348 | 98.5% | 69.4% | 81.4% | 39 |
| | Large | 6,563 | 94.5% | 68.9% | 79.7% | 234 |
| SNOMED-NCI | Small | 11,730 | 96% | 59.8% | 73.7% | 147 |
| | Large | 13,964 | 79.6% | 59% | 67.8% | 738 |
Performance achieved by the ServOMap_2013 version on the LargeBio dataset
| Matching problem | Task | #Mappings | Precision | Recall | F-Measure | Time |
|---|---|---|---|---|---|---|
| FMA-NCI | Small | 2,512 | 95.1% | 81.5% | 87.7% | 141 |
| | Large | 3,235 | 72.7% | 80.3% | 76.3% | 2,690 |
| FMA-SNOMED | Small | 5,828 | 95.5% | 62.2% | 75.3% | 391 |
| | Large | 6,440 | 86.1% | 62% | 72.1% | 4,059 |
| SNOMED-NCI | Small | 12,716 | 93.3% | 64.2% | 76.1% | 1,699 |
| | Large | 14,312 | 82.2% | 63.7% | 71.8% | 6,320 |
Performance achieved by the ServOMap_V4 version on the LargeBio dataset
| Matching problem | Task | #Mappings | Precision | Recall | F-Measure |
|---|---|---|---|---|---|
| FMA-NCI | Small | 2,725 | 94.3% | 85% | 89.4% |
| | Large | 3,163 | 71.1% | 83.6% | 79.3% |
| FMA-SNOMED | Small | 6,978 | 95.5% | 74% | 83.4% |
| | Large | 7,940 | 83.3% | 73.46% | 78.1% |
| SNOMED-NCI | Small | 13,047 | 90.9% | 62.9% | 74.4% |
| | Large | 15,525 | 75.7% | 62.3% | 68.4% |
Use of the LogMap repair facility with ServOMap_V4 on the small fragments of the input ontologies
| Matching problem | #Mappings | Precision | Recall | F-Measure |
|---|---|---|---|---|
| FMA-NCI | 2,651 | 95.2% | 83.5% | 88.9% |
| FMA-SNOMED | 6,402 | 95.4% | 67.9% | 79.2% |
| SNOMED-NCI | 12,587 | 92.7% | 61.9% | 74.2% |
Figure 7. Graphical user interface of the system: parameters (a) and mappings (b).