Improved biomedical word embeddings in the transformer era.

Jiho Noh, Ramakanth Kavuluru.

Abstract

BACKGROUND: Recent natural language processing (NLP) research is dominated by neural network methods that employ word embeddings as basic building blocks. Pre-training with neural methods that capture local and global distributional properties (e.g., skip-gram, GloVe) using free text corpora is often used to embed both words and concepts. Pre-trained embeddings are typically leveraged in downstream tasks using various neural architectures designed to optimize task-specific objectives, which may further tune such embeddings.
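(Illustration, not from the paper.) The skip-gram pre-training mentioned above trains a model to predict the context words within a fixed window around each center word; a minimal sketch of how those (center, context) training pairs are generated from a tokenized corpus:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in skip-gram pre-training.

    For each position i, every token within `window` positions of i
    (excluding i itself) becomes a context word for tokens[i].
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

For example, `skipgram_pairs(["myocardial", "infarction", "treatment"], window=1)` yields the pairs `("myocardial", "infarction")`, `("infarction", "myocardial")`, `("infarction", "treatment")`, and `("treatment", "infarction")`. Real implementations (e.g., word2vec) add negative sampling and window-size subsampling on top of this.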
OBJECTIVE: Despite advances in contextualized language model based embeddings, static word embeddings still form an essential starting point in BioNLP research and applications. They are useful in low resource settings and in lexical semantics studies. Our main goal is to build improved biomedical word embeddings and make them publicly available for downstream applications.
METHODS: We jointly learn word and concept embeddings by first using the skip-gram method and further fine-tuning them with correlational information manifesting in co-occurring Medical Subject Heading (MeSH) concepts in biomedical citations. This fine-tuning is accomplished with the transformer-based BERT architecture in the two-sentence input mode with a classification objective that captures MeSH pair co-occurrence. We conduct evaluations of these tuned static embeddings using multiple datasets for word relatedness developed by previous efforts.
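(A hedged sketch, not the authors' code.) The fine-tuning step above feeds a MeSH concept pair to BERT in its standard two-sentence input mode, with a binary label for whether the pair co-occurs in a citation. The function name and naive whitespace tokenization below are illustrative assumptions; real BERT tokenizers use WordPiece:

```python
def mesh_pair_input(term_a, term_b, cooccurs):
    """Pack a MeSH concept pair as a BERT-style two-sentence input.

    The pair becomes [CLS] tokens_a [SEP] tokens_b [SEP] with segment
    ids 0 for the first segment (including [CLS] and its [SEP]) and 1
    for the second; the label marks citation-level co-occurrence.
    """
    tokens_a = term_a.lower().split()
    tokens_b = term_b.lower().split()
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return {"tokens": tokens, "segment_ids": segment_ids, "label": int(cooccurs)}
```

A classification head over the [CLS] position would then be trained on this co-occurrence label, with gradients flowing back into the input embeddings so that the static word and concept vectors absorb the correlational signal.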
RESULTS: In both qualitative and quantitative evaluations, we demonstrate that our methods produce improved biomedical embeddings in comparison with other static embedding efforts. Without selectively culling concepts and terms (as was pursued by previous efforts), we believe we offer the most exhaustive evaluation of biomedical embeddings to date, with clear performance improvements across the board.
CONCLUSION: We repurposed a transformer architecture (typically used to generate dynamic embeddings) to improve static biomedical word embeddings using concept correlations. We provide our code and embeddings for public use for downstream applications and research endeavors: https://github.com/bionlproc/BERT-CRel-Embeddings.
Copyright © 2021 Elsevier Inc. All rights reserved.

Keywords:  Contextualized embeddings; Fine-tuned embeddings; Word embeddings

Year:  2021        PMID: 34284119      PMCID: PMC8373296          DOI: 10.1016/j.jbi.2021.103867

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   8.000


References:  12 in total

1.  A simple algorithm for identifying abbreviation definitions in biomedical text.

Authors:  Ariel S Schwartz; Marti A Hearst
Journal:  Pac Symp Biocomput       Date:  2003

2.  ALICE: an algorithm to extract abbreviations from MEDLINE.

Authors:  Hiroko Ao; Toshihisa Takagi
Journal:  J Am Med Inform Assoc       Date:  2005-05-19       Impact factor: 4.497

3.  Measures of semantic similarity and relatedness in the biomedical domain.

Authors:  Ted Pedersen; Serguei V S Pakhomov; Siddharth Patwardhan; Christopher G Chute
Journal:  J Biomed Inform       Date:  2006-06-10       Impact factor: 6.317

4.  A comparison of word embeddings for the biomedical natural language processing.

Authors:  Yanshan Wang; Sijia Liu; Naveed Afzal; Majid Rastegar-Mojarad; Liwei Wang; Feichen Shen; Paul Kingsbury; Hongfang Liu
Journal:  J Biomed Inform       Date:  2018-09-12       Impact factor: 6.317

5.  Semantic Similarity and Relatedness between Clinical Terms: An Experimental Study.

Authors:  Serguei Pakhomov; Bridget McInnes; Terrence Adam; Ying Liu; Ted Pedersen; Genevieve B Melton
Journal:  AMIA Annu Symp Proc       Date:  2010-11-13

6.  Concept embedding to measure semantic relatedness for biomedical information ontologies.

Authors:  Junseok Park; Kwangmin Kim; Woochang Hwang; Doheon Lee
Journal:  J Biomed Inform       Date:  2019-04-19       Impact factor: 6.317

7.  Retrofitting Concept Vector Representations of Medical Concepts to Improve Estimates of Semantic Similarity and Relatedness.

Authors:  Zhiguo Yu; Byron C Wallace; Todd Johnson; Trevor Cohen
Journal:  Stud Health Technol Inform       Date:  2017

8.  An empirical evaluation of supervised learning approaches in assigning diagnosis codes to electronic medical records.

Authors:  Ramakanth Kavuluru; Anthony Rios; Yuan Lu
Journal:  Artif Intell Med       Date:  2015-05-15       Impact factor: 5.326

9.  Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings.

Authors:  Akm Sabbir; Antonio Jimeno-Yepes; Ramakanth Kavuluru
Journal:  Proc IEEE Int Symp Bioinformatics Bioeng       Date:  2018-01-11

10.  BioWordVec, improving biomedical word embeddings with subword information and MeSH.

Authors:  Yijia Zhang; Qingyu Chen; Zhihao Yang; Hongfei Lin; Zhiyong Lu
Journal:  Sci Data       Date:  2019-05-10       Impact factor: 6.444

Cited by:  1 in total

1.  Artificial Intelligence in Pharmacovigilance: An Introduction to Terms, Concepts, Applications, and Limitations.

Authors:  Jeffrey K Aronson
Journal:  Drug Saf       Date:  2022-05-17       Impact factor: 5.606

