Improved biomedical word embeddings in the transformer era.

Jiho Noh, Ramakanth Kavuluru.

Abstract

BACKGROUND: Recent natural language processing (NLP) research is dominated by neural network methods that employ word embeddings as basic building blocks. Pre-training with neural methods that capture local and global distributional properties (e.g., skip-gram, GloVe) using free text corpora is often used to embed both words and concepts. Pre-trained embeddings are typically leveraged in downstream tasks using various neural architectures designed to optimize task-specific objectives, which may further tune such embeddings.
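(Illustration, not from the paper.) The skip-gram pre-training mentioned above trains a model to predict the context words within a fixed window around each center word; a minimal sketch of how those (center, context) training pairs are generated from a tokenized corpus:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in skip-gram pre-training.

    For each position i, every token within `window` positions of i
    (excluding i itself) becomes a context word for tokens[i].
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

For example, `skipgram_pairs(["myocardial", "infarction", "treatment"], window=1)` yields the pairs `("myocardial", "infarction")`, `("infarction", "myocardial")`, `("infarction", "treatment")`, and `("treatment", "infarction")`. Real implementations (e.g., word2vec) add negative sampling and window-size subsampling on top of this.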
OBJECTIVE: Despite advances in contextualized language model based embeddings, static word embeddings still form an essential starting point in BioNLP research and applications. They are useful in low resource settings and in lexical semantics studies. Our main goal is to build improved biomedical word embeddings and make them publicly available for downstream applications.
METHODS: We jointly learn word and concept embeddings by first using the skip-gram method and further fine-tuning them with correlational information manifesting in co-occurring Medical Subject Heading (MeSH) concepts in biomedical citations. This fine-tuning is accomplished with the transformer-based BERT architecture in the two-sentence input mode with a classification objective that captures MeSH pair co-occurrence. We conduct evaluations of these tuned static embeddings using multiple datasets for word relatedness developed by previous efforts.
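(A hedged sketch, not the authors' code.) The fine-tuning step above feeds a MeSH concept pair to BERT in its standard two-sentence input mode, with a binary label for whether the pair co-occurs in a citation. The function name and naive whitespace tokenization below are illustrative assumptions; real BERT tokenizers use WordPiece:

```python
def mesh_pair_input(term_a, term_b, cooccurs):
    """Pack a MeSH concept pair as a BERT-style two-sentence input.

    The pair becomes [CLS] tokens_a [SEP] tokens_b [SEP] with segment
    ids 0 for the first segment (including [CLS] and its [SEP]) and 1
    for the second; the label marks citation-level co-occurrence.
    """
    tokens_a = term_a.lower().split()
    tokens_b = term_b.lower().split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return {"tokens": tokens, "segment_ids": segment_ids, "label": int(cooccurs)}
```

A classification head over the [CLS] position would then be trained on this co-occurrence label, with gradients flowing back into the input embeddings so that the static word and concept vectors absorb the correlational signal.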
RESULTS: In both qualitative and quantitative evaluations, we demonstrate that our methods produce improved biomedical embeddings in comparison with other static embedding efforts. Without selectively culling concepts and terms (as was pursued by previous efforts), we believe we offer the most exhaustive evaluation of biomedical embeddings to date, with clear performance improvements across the board.
CONCLUSION: We repurposed a transformer architecture (typically used to generate dynamic embeddings) to improve static biomedical word embeddings using concept correlations. We provide our code and embeddings for public use for downstream applications and research endeavors: https://github.com/bionlproc/BERT-CRel-Embeddings.
Copyright © 2021 Elsevier Inc. All rights reserved.

Keywords:  Contextualized embeddings; Fine-tuned embeddings; Word embeddings

Year:  2021        PMID: 34284119      PMCID: PMC8373296          DOI: 10.1016/j.jbi.2021.103867

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   8.000


References:  12 in total

1.  A simple algorithm for identifying abbreviation definitions in biomedical text.

Authors:  Ariel S Schwartz; Marti A Hearst
Journal:  Pac Symp Biocomput       Date:  2003

2.  ALICE: an algorithm to extract abbreviations from MEDLINE.

Authors:  Hiroko Ao; Toshihisa Takagi
Journal:  J Am Med Inform Assoc       Date:  2005-05-19       Impact factor: 4.497

3.  Measures of semantic similarity and relatedness in the biomedical domain.

Authors:  Ted Pedersen; Serguei V S Pakhomov; Siddharth Patwardhan; Christopher G Chute
Journal:  J Biomed Inform       Date:  2006-06-10       Impact factor: 6.317

4.  A comparison of word embeddings for the biomedical natural language processing.

Authors:  Yanshan Wang; Sijia Liu; Naveed Afzal; Majid Rastegar-Mojarad; Liwei Wang; Feichen Shen; Paul Kingsbury; Hongfang Liu
Journal:  J Biomed Inform       Date:  2018-09-12       Impact factor: 6.317

5.  Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study.

Authors:  Serguei Pakhomov; Bridget McInnes; Terrence Adam; Ying Liu; Ted Pedersen; Genevieve B Melton
Journal:  AMIA Annu Symp Proc       Date:  2010-11-13

6.  Concept embedding to measure semantic relatedness for biomedical information ontologies.

Authors:  Junseok Park; Kwangmin Kim; Woochang Hwang; Doheon Lee
Journal:  J Biomed Inform       Date:  2019-04-19       Impact factor: 6.317

7.  Retrofitting Concept Vector Representations of Medical Concepts to Improve Estimates of Semantic Similarity and Relatedness.

Authors:  Zhiguo Yu; Byron C Wallace; Todd Johnson; Trevor Cohen
Journal:  Stud Health Technol Inform       Date:  2017

8.  An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

Authors:  Ramakanth Kavuluru; Anthony Rios; Yuan Lu
Journal:  Artif Intell Med       Date:  2015-05-15       Impact factor: 5.326

9.  Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings.

Authors:  Akm Sabbir; Antonio Jimeno-Yepes; Ramakanth Kavuluru
Journal:  Proc IEEE Int Symp Bioinformatics Bioeng       Date:  2018-01-11

10.  BioWordVec, improving biomedical word embeddings with subword information and MeSH.

Authors:  Yijia Zhang; Qingyu Chen; Zhihao Yang; Hongfei Lin; Zhiyong Lu
Journal:  Sci Data       Date:  2019-05-10       Impact factor: 6.444

Cited by:  1 in total

1.  Artificial Intelligence in Pharmacovigilance: An Introduction to Terms, Concepts, Applications, and Limitations.

Authors:  Jeffrey K Aronson
Journal:  Drug Saf       Date:  2022-05-17       Impact factor: 5.606

