Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus.

Vinh Nguyen1, Hong Yung Yip2, Goonmeet Bajaj3, Thilini Wijesiriwardene2, Vishesh Javangula4, Srinivasan Parthasarathy3, Amit Sheth2, Olivier Bodenreider1.   

Abstract

The Unified Medical Language System (UMLS) Metathesaurus construction process relies mainly on lexical algorithms and manual expert curation to integrate over 200 biomedical vocabularies. A lexical learning model (LexLM) was developed to predict synonymy among Metathesaurus terms and largely outperforms a rule-based approach (RBA) that approximates the current construction process. However, the LexLM could be improved further because it uses only lexical information from the source vocabularies, whereas the RBA also takes advantage of contextual information. We investigate the role of multiple types of contextual information available to the UMLS editors, namely source synonymy (SS), source semantic group (SG), and source hierarchical relations (HR), for the UMLS vocabulary alignment (UVA) problem. In this paper, we develop multiple variants of context-enriched learning models (ConLMs) by adding the types of contextual information listed above to the LexLM. We represent these context types in context-enriched knowledge graphs (ConKGs) with four variants: ConSS, ConSG, ConHR, and ConAll. We train these ConKG embeddings using seven KG embedding techniques. We create the ConLMs by concatenating the ConKG embedding vectors with the word embedding vectors from the LexLM. We evaluate the performance of the ConLMs using the UVA generalization test datasets with hundreds of millions of pairs. Our extensive experiments show a significant performance improvement from the ConLMs over the LexLM, namely +5.0% in precision (93.75%), +0.69% in recall (93.23%), and +2.88% in F1 (93.49%) for the best ConLM. Our experiments also show that the ConAll variant, which includes all three context types, takes more time but does not always perform better than variants with a single context type.
Finally, our experiments show that the pairs of terms with high lexical similarity benefit most from adding contextual information, namely +6.56% in precision (94.97%), +2.13% in recall (93.23%), +4.35% in F1 (94.09%) for the best ConLM. The pairs with lower degrees of lexical similarity also show performance improvement with +0.85% in F1 (96%) for low similarity and +1.31% in F1 (96.34%) for no similarity. These results demonstrate the importance of using contextual information in the UVA problem.
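The core construction described in the abstract — concatenating a term's context-enriched KG embedding with its word embedding before scoring synonymy — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vector dimensions, the random embeddings, and the use of cosine similarity in place of the trained neural classifier are all assumptions for demonstration.

```python
import numpy as np

def context_enriched_vector(word_emb, conkg_emb):
    """Concatenate a term's word embedding (LexLM side) with its
    context-enriched KG embedding (ConKG side) to form the ConLM input."""
    return np.concatenate([word_emb, conkg_emb])

def synonymy_score(vec_a, vec_b):
    """Cosine similarity of two enriched vectors; a stand-in for the
    trained synonymy-prediction network described in the paper."""
    return float(np.dot(vec_a, vec_b) /
                 (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Illustrative dimensions and random stand-in embeddings.
rng = np.random.default_rng(0)
word_dim, kg_dim = 50, 25

term_a = context_enriched_vector(rng.normal(size=word_dim),
                                 rng.normal(size=kg_dim))
term_b = context_enriched_vector(rng.normal(size=word_dim),
                                 rng.normal(size=kg_dim))

# The enriched vector has the combined dimensionality of both sources.
assert term_a.shape == (word_dim + kg_dim,)
score = synonymy_score(term_a, term_b)
```

In the paper itself, the concatenated vectors feed a supervised neural model trained on labeled synonym pairs; the cosine score here only illustrates where the enriched representation enters the pipeline.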

Keywords:  UMLS Metathesaurus; knowledge graph embeddings; neural networks; scalability; supervised learning; vocabulary alignment

Year:  2022        PMID: 36108322      PMCID: PMC9455675          DOI: 10.1145/3485447.3511946

Source DB:  PubMed          Journal:  Proc Int World Wide Web Conf


