Context-Enriched Learning Models for Aligning Biomedical Vocabularies at Scale in the UMLS Metathesaurus.

Vinh Nguyen1, Hong Yung Yip2, Goonmeet Bajaj3, Thilini Wijesiriwardene2, Vishesh Javangula4, Srinivasan Parthasarathy3, Amit Sheth2, Olivier Bodenreider1.   

Abstract

The Unified Medical Language System (UMLS) Metathesaurus construction process relies mainly on lexical algorithms and manual expert curation to integrate over 200 biomedical vocabularies. A lexical learning model (LexLM) was developed to predict synonymy among Metathesaurus terms and largely outperforms a rule-based approach (RBA) that approximates the current construction process. However, the LexLM could be improved further because it uses only lexical information from the source vocabularies, whereas the RBA also takes advantage of contextual information. We investigate the role of multiple types of contextual information available to the UMLS editors, namely source synonymy (SS), source semantic group (SG), and source hierarchical relations (HR), for the UMLS vocabulary alignment (UVA) problem. In this paper, we develop multiple variants of context-enriched learning models (ConLMs) by adding the types of contextual information listed above to the LexLM. We represent these context types in context-enriched knowledge graphs (ConKGs) with four variants: ConSS, ConSG, ConHR, and ConAll. We train these ConKG embeddings using seven KG embedding techniques. We create the ConLMs by concatenating the ConKG embedding vectors with the word embedding vectors from the LexLM. We evaluate the performance of the ConLMs using the UVA generalization test datasets with hundreds of millions of pairs. Our extensive experiments show a significant performance improvement from the ConLMs over the LexLM, namely +5.0% in precision (93.75%), +0.69% in recall (93.23%), and +2.88% in F1 (93.49%) for the best ConLM. Our experiments also show that the ConAll variant, which includes all three context types, takes more time but does not always perform better than variants with a single context type.
Finally, our experiments show that the pairs of terms with high lexical similarity benefit most from adding contextual information, namely +6.56% in precision (94.97%), +2.13% in recall (93.23%), +4.35% in F1 (94.09%) for the best ConLM. The pairs with lower degrees of lexical similarity also show performance improvement with +0.85% in F1 (96%) for low similarity and +1.31% in F1 (96.34%) for no similarity. These results demonstrate the importance of using contextual information in the UVA problem.
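The core construction described in the abstract — concatenating a term's context-enriched KG embedding with its word embedding before scoring synonymy — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vector dimensions, the random embeddings, and the use of cosine similarity in place of the trained neural classifier are all assumptions for demonstration.

```python
import numpy as np

def context_enriched_vector(word_emb, conkg_emb):
    """Concatenate a term's word embedding (LexLM side) with its
    context-enriched KG embedding (ConKG side) to form the ConLM input."""
    return np.concatenate([word_emb, conkg_emb])

def synonymy_score(vec_a, vec_b):
    """Cosine similarity of two enriched vectors; a stand-in for the
    trained synonymy-prediction network described in the paper."""
    return float(np.dot(vec_a, vec_b) /
                 (np.linalg.norm(vec_a) * np.linalg.norm(vec_b)))

# Illustrative dimensions and random stand-in embeddings.
rng = np.random.default_rng(0)
word_dim, kg_dim = 50, 25

term_a = context_enriched_vector(rng.normal(size=word_dim),
                                 rng.normal(size=kg_dim))
term_b = context_enriched_vector(rng.normal(size=word_dim),
                                 rng.normal(size=kg_dim))

# The enriched vector has the combined dimensionality of both sources.
assert term_a.shape == (word_dim + kg_dim,)
score = synonymy_score(term_a, term_b)
```

In the paper itself, the concatenated vectors feed a supervised neural model trained on labeled synonym pairs; the cosine score here only illustrates where the enriched representation enters the pipeline.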

Keywords:  UMLS Metathesaurus; knowledge graph embeddings; neural networks; scalability; supervised learning; vocabulary alignment

Year:  2022        PMID: 36108322      PMCID: PMC9455675          DOI: 10.1145/3485447.3511946

Source DB:  PubMed          Journal:  Proc Int World Wide Web Conf


