Prodromos Kolyvakis, Alexandros Kalousis, Barry Smith, Dimitris Kiritsis.
Abstract
BACKGROUND: While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance.
Keywords: Denoising autoencoder; Ontology matching; Outlier detection; Semantic similarity; Sentence embeddings; Word embeddings
Year: 2018 PMID: 30111369 PMCID: PMC6094585 DOI: 10.1186/s13326-018-0187-8
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Example of alignments between the NCI Thesaurus and the Mouse Ontology (adapted from [56]). The dashed horizontal lines correspond to equivalence matchings between the NCI Thesaurus and the Mouse Anatomy ontology
Fig. 2 Phrase Retrofitting architecture based on a Siamese CBOW network [32] and Knowledge Distillation [34]. The input projection layer is omitted
Fig. 3 Autoencoder Architecture
Fig. 4 Overall proposed ontology matching architecture
Respective sizes of the ontology matching tasks
| Ontology I | #Types (Ontology I) | Ontology II | #Types (Ontology II) | #Matchings |
|---|---|---|---|---|
| MA | 2744 | NCI | 3304 | 1489 |
| FMA | 3696 | NCI | 6488 | 2504 |
| FMA | 10157 | SNOMED | 13412 | 7774 |
Performance of ontology matching systems across the different matching tasks.
| System | MA-NCI P | MA-NCI R | MA-NCI F1 | FMA-NCI P | FMA-NCI R | FMA-NCI F1 | FMA-SNOMED P | FMA-SNOMED R | FMA-SNOMED F1 |
|---|---|---|---|---|---|---|---|---|---|
| AML | 0.943 | 0.94 | | 0.908 | 0.94 | 0.924 | | 0.784 | 0.854 |
| CroMatcher | 0.942 | 0.912 | 0.927 | - | - | - | - | - | - |
| XMap | 0.924 | 0.877 | 0.9 | - | - | - | - | - | - |
| FCA_Map | 0.922 | 0.841 | 0.880 | 0.89 | 0.947 | 0.918 | 0.918 | 0.857 | 0.886 |
| LogMap | 0.906 | 0.850 | 0.878 | 0.894 | 0.930 | 0.912 | 0.933 | 0.721 | 0.814 |
| LogMapBio | 0.875 | 0.900 | 0.887 | 0.88 | 0.938 | 0.908 | 0.93 | 0.727 | 0.816 |
| Wieting | 0.804 | 0.879 | 0.839 | 0.840 | 0.857 | 0.849 | 0.867 | 0.851 | 0.859 |
| Wieting+DAE(O) | 0.952 | 0.871 | 0.909 | 0.909 | 0.851 | 0.879 | 0.929 | 0.832 | 0.878 |
| SCBOW | 0.847 | 0.917 | 0.881 | 0.899 | 0.895 | 0.897 | 0.843 | 0.866 | 0.855 |
| SCBOW+DAE(O) | 0.968 | 0.913 | 0.94 | 0.976 | 0.892 | 0.932 | 0.931 | 0.856 | 0.892 |
Note: Bold and underlined numbers indicate the best F1-score and the best precision on each matching task, respectively
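The F1 columns can be sanity-checked against the precision and recall columns. The sketch below assumes F1 is the standard harmonic mean of precision and recall; the function name `f1_score` and the `rows` dictionary are illustrative, and the three rows are copied from the MA-NCI columns of the table above. Because the published precision and recall are already rounded to three decimals, a recomputed value may differ from the reported one in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the MA-NCI columns above.
rows = {
    "CroMatcher": (0.942, 0.912, 0.927),
    "FCA_Map": (0.922, 0.841, 0.880),
    "LogMap": (0.906, 0.850, 0.878),
}

for system, (p, r, reported) in rows.items():
    recomputed = f1_score(p, r)
    print(f"{system}: recomputed F1 = {recomputed:.3f}, reported = {reported:.3f}")
```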
Fig. 5 Feature ablation study of our proposed approach across all the experimental ontology matching tasks
Ablation study experiment’s listings
| Experiment’s code | Phrase retrofitting | DAE features: Matching | DAE features: Outlier detection |
|---|---|---|---|
| W2V | - | - | - |
| DAE(O) | - | - | ✓ |
| DAE(M) | - | ✓ | - |
| DAE(MO) | - | ✓ | ✓ |
| SCBOW | ✓ | - | - |
| SCBOW+DAE(O) | ✓ | - | ✓ |
| SCBOW+DAE(M) | ✓ | ✓ | - |
| SCBOW+DAE(MO) | ✓ | ✓ | ✓ |
Sample misalignments produced by aligning ontologies using either SCBOW or Word2Vec vectors
| Terminology to be matched | Matching based on SCBOW | Matching based on Word2Vec |
|---|---|---|
| MA-NCI | ||
| gastrointestinal tract | digestive system | respiratory tract |
| tarsal joint | carpal tarsal bone | metacarpo phalangeal joint |
| thyroid gland epithelial tissue | thyroid gland medulla | prostate gland epithelium |
| FMA-NCI | ||
| cardiac muscle tissue | heart muscle | muscle tissue |
| set of carpal bones | carpus bone | sacral bone |
| white matter of telencephalon | brain white matter | white matter |
| FMA-SNOMED | ||
| zone of ligament of ankle joint | accessory ligament of ankle joint | entire ligament of elbow joint |
| muscle of anterior compartment of leg | compartment of lower leg | entire interosseus muscle of hand |
| dartos muscle | dartos layer of scrotum | tendon of psoas muscle |
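To make the idea of matching terms through their vectors concrete, here is a minimal sketch that represents a multi-word term as the average of its word vectors and picks the candidate with the highest cosine similarity. It is only a nearest-neighbour stand-in: it does not implement the phrase retrofitting or the denoising autoencoder, and the three-dimensional vectors in `WORD_VECTORS`, along with the helper names `embed`, `cosine`, and `best_match`, are invented for illustration rather than taken from the paper.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
WORD_VECTORS = {
    "heart":   [0.9, 0.1, 0.0],
    "cardiac": [0.8, 0.2, 0.1],
    "muscle":  [0.1, 0.9, 0.0],
    "tissue":  [0.1, 0.5, 0.4],
    "bone":    [0.0, 0.1, 0.9],
}


def embed(term):
    """Average the vectors of the known words in a term."""
    vectors = [WORD_VECTORS[w] for w in term.lower().split() if w in WORD_VECTORS]
    if not vectors:
        raise ValueError(f"no known words in term: {term!r}")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(term, candidates):
    """Return the candidate term whose embedding is closest to the query term."""
    query = embed(term)
    return max(candidates, key=lambda c: cosine(query, embed(c)))


print(best_match("cardiac muscle tissue", ["heart muscle", "muscle tissue", "bone"]))
```

Which candidate wins depends entirely on the vectors supplied, which is why the same term can be matched differently under SCBOW and Word2Vec vectors in the table above.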
Runtimes of the steps in the proposed algorithm
| Matching task | Step 1 (s) | Step 2 (s) | Step 3 (s) | Total (s) |
|---|---|---|---|---|
| MA - NCI | 337 | 34 | 36 | 407 |
| FMA - NCI | 490 | 82 | 40 | 612 |
| FMA - SNOMED | 609 | 490 | 41 | 1140 |
Proposed algorithm’s performance in relation to the used synonymy information sources
| System | Training data | MA-NCI P | MA-NCI R | MA-NCI F1 | FMA-NCI P | FMA-NCI R | FMA-NCI F1 | FMA-SNOMED P | FMA-SNOMED R | FMA-SNOMED F1 |
|---|---|---|---|---|---|---|---|---|---|---|
| SCBOW | SL | 0.845 | 0.911 | 0.877 | 0.897 | 0.840 | 0.868 | 0.795 | 0.773 | 0.784 |
| SCBOW | SL + AS | 0.847 | 0.917 | 0.881 | 0.899 | 0.895 | 0.897 | 0.843 | 0.866 | 0.855 |
| SCBOW + DAE(O) | SL | 0.946 | 0.905 | 0.925 | 0.972 | 0.830 | 0.895 | 0.912 | 0.759 | 0.829 |
| SCBOW + DAE(O) | SL + AS | 0.968 | 0.913 | 0.94 | 0.976 | 0.892 | 0.932 | 0.931 | 0.856 | 0.892 |
Note: SL: synonyms only from ConceptNet 5, BabelNet, and WikiSynonyms; AS: additional synonyms found in the ontologies to be matched
Fig. 6 Correlation between the relative change in training data’s size and F1-score
Fig. 7 Sensitivity analysis of the proposed algorithm’s performance with different threshold values
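A threshold sensitivity analysis of this kind can be sketched as a sweep over cut-off values. The sketch assumes that each candidate pair carries a similarity score and that a pair is kept when its score is at least the threshold; whether the paper thresholds a similarity or a distance is not stated in this record. The identifiers, scores, and reference alignment below are hypothetical toy data, not results from the paper.

```python
def precision_recall(predicted, reference):
    """Precision and recall of a predicted alignment against a reference alignment."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


def sweep(scored, reference, thresholds):
    """Evaluate the alignment kept at each threshold (score >= threshold)."""
    results = []
    for t in thresholds:
        predicted = {pair for pair, score in scored.items() if score >= t}
        precision, recall = precision_recall(predicted, reference)
        results.append((t, precision, recall))
    return results


# Hypothetical candidate pairs with similarity scores (toy data).
scored_pairs = {
    ("a1", "b1"): 0.95,
    ("a2", "b2"): 0.80,
    ("a3", "b9"): 0.60,
    ("a4", "b4"): 0.40,
}
# Hypothetical reference alignment (toy data).
reference = {("a1", "b1"), ("a2", "b2"), ("a4", "b4")}

for t, p, r in sweep(scored_pairs, reference, [0.3, 0.5, 0.7, 0.9]):
    print(f"threshold={t:.1f} precision={p:.2f} recall={r:.2f}")
```

In this toy setting, raising the threshold trades recall for precision, which is the kind of trade-off a sensitivity plot makes visible.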