Literature DB >> 34920703

Analyzing transfer learning impact in biomedical cross-lingual named entity recognition and normalization.

Renzo M Rivera-Zavala; Paloma Martínez

Abstract

BACKGROUND: The volume of biomedical literature and clinical data is growing at an exponential rate. Efficient access to the information described in unstructured biomedical texts is therefore a crucial task for both the biomedical industry and research. Named Entity Recognition (NER) is the first step in information and knowledge acquisition from unstructured texts. Recent NER approaches use contextualized word representations as input for a downstream classification task. However, distributed word vectors (embeddings) are very limited for Spanish, and even more so for the biomedical domain.
METHODS: In this work, we develop several biomedical Spanish word representations, and we introduce two deep learning approaches for the recognition of pharmaceutical, chemical, and other biomedical entities in Spanish clinical case texts and biomedical texts: one based on a Bi-LSTM-CRF model and the other on a BERT-based architecture.
RESULTS: Several Spanish biomedical embeddings, together with the two deep learning models, were evaluated on the PharmaCoNER and CORD-19 datasets. The PharmaCoNER dataset is composed of a set of Spanish clinical cases annotated with drugs, chemical compounds, and pharmacological substances; our extended Bi-LSTM-CRF model obtains an F-score of 85.24% on entity identification and classification, and the BERT model obtains an F-score of 88.80%. For the entity normalization task, the extended Bi-LSTM-CRF model achieves an F-score of 72.85% and the BERT model achieves 79.97%. The CORD-19 dataset consists of scholarly articles written in English, annotated with biomedical concepts such as disorders, species, chemicals or drugs, genes and proteins, enzymes, and anatomy. On this dataset, the Bi-LSTM-CRF and BERT models obtain F-measures of 78.23% and 78.86%, respectively, on entity identification and classification.
CONCLUSION: These results show that deep learning models with in-domain knowledge learned from large-scale datasets substantially improve named entity recognition performance. Moreover, contextualized representations help to handle the complexity and ambiguity inherent in biomedical texts. Embeddings based on words, concepts, senses, etc. for languages other than English are required to improve NER tasks in those languages.
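The CRF layer that both tagging approaches share can be illustrated by its decoding step. The sketch below is a minimal, self-contained Viterbi decoder in pure Python, not code from the paper: the emission scores stand in for per-token Bi-LSTM outputs, the transition matrix and the toy BIO tag set (O, B-CHEM, I-CHEM) are invented for illustration, and the strongly negative O-to-I-CHEM transition shows how the CRF forbids an entity from starting with an I- tag.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring tag index sequence.

    emissions[t][j]:   score of tag j at token t (e.g. from a Bi-LSTM)
    transitions[i][j]: score of moving from tag i to tag j
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])          # best score ending in each tag so far
    backptr = []                        # back-pointers per step
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(n_tags):
            cand = [score[i] + transitions[i][j] for i in range(n_tags)]
            best_i = max(range(n_tags), key=lambda i: cand[i])
            ptrs.append(best_i)
            new_score.append(cand[best_i] + emissions[t][j])
        backptr.append(ptrs)
        score = new_score
    # Follow back-pointers from the best final tag.
    best = [max(range(n_tags), key=lambda j: score[j])]
    for ptrs in reversed(backptr):
        best.append(ptrs[best[-1]])
    return best[::-1]


# Toy example: tags O=0, B-CHEM=1, I-CHEM=2 for "administered ibuprofeno 600"
tags = ["O", "B-CHEM", "I-CHEM"]
emissions = [[2.0, 0.1, 0.1],   # "administered" leans O
             [0.1, 2.0, 0.5],   # "ibuprofeno"   leans B-CHEM
             [0.5, 0.1, 1.0]]   # "600"          leans I-CHEM
transitions = [[0.0, 0.0, -10.0],   # O -> I-CHEM is effectively forbidden
               [0.0, 0.0, 1.0],
               [0.0, 0.0, 1.0]]
path = viterbi_decode(emissions, transitions)
print([tags[i] for i in path])  # ['O', 'B-CHEM', 'I-CHEM']
```

The transition matrix is what distinguishes a CRF from per-token softmax classification: it scores whole tag sequences, so label constraints like "I- must follow B- or I-" are enforced globally rather than token by token.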
© 2021. The Author(s).


Keywords:  Clinical texts; Contextual information; Deep learning; Natural language processing


Year:  2021        PMID: 34920703      PMCID: PMC8680060          DOI: 10.1186/s12859-021-04247-9

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


References:  16 in total

1.  An overview of MetaMap: historical perspective and recent advances.

Authors:  Alan R Aronson; François-Michel Lang
Journal:  J Am Med Inform Assoc       Date:  2010 May-Jun       Impact factor: 4.497

2.  Systematized nomenclature of medicine clinical terms (SNOMED CT) to represent computed tomography procedures.

Authors:  Thuppahi Sisira De Silva; Don MacDonald; Grace Paterson; Khokan C Sikdar; Bonnie Cochrane
Journal:  Comput Methods Programs Biomed       Date:  2011-03       Impact factor: 5.428

3.  PyMedTermino: an open-source generic API for advanced terminology services.

Authors:  Jean-Baptiste Lamy; Alain Venot; Catherine Duclos
Journal:  Stud Health Technol Inform       Date:  2015

4.  HUNER: improving biomedical NER with pretraining.

Authors:  Leon Weber; Jannes Münchmeyer; Tim Rocktäschel; Maryam Habibi; Ulf Leser
Journal:  Bioinformatics       Date:  2020-01-01       Impact factor: 6.937

5.  SECNLP: A survey of embeddings in clinical natural language processing. (Review)

Authors:  Katikapalli Subramanyam Kalyan; S Sangeetha
Journal:  J Biomed Inform       Date:  2019-11-08       Impact factor: 6.317

6.  LSTMVoter: chemical named entity recognition using a conglomerate of sequence labeling tools.

Authors:  Wahed Hemati; Alexander Mehler
Journal:  J Cheminform       Date:  2019-01-10       Impact factor: 5.514

7.  Med-BERT: pretrained contextualized embeddings on large-scale structured electronic health records for disease prediction.

Authors:  Laila Rasmy; Yang Xiang; Ziqian Xie; Cui Tao; Degui Zhi
Journal:  NPJ Digit Med       Date:  2021-05-20

8.  Transfer learning for biomedical named entity recognition with neural networks.

Authors:  John M Giorgi; Gary D Bader
Journal:  Bioinformatics       Date:  2018-12-01       Impact factor: 6.937

9.  Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data.

Authors:  Andrew L Beam; Benjamin Kompa; Allen Schmaltz; Inbar Fried; Griffin Weber; Nathan Palmer; Xu Shi; Tianxi Cai; Isaac S Kohane
Journal:  Pac Symp Biocomput       Date:  2020

10.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Authors:  Jinhyuk Lee; Wonjin Yoon; Sungdong Kim; Donghyeon Kim; Sunkyu Kim; Chan Ho So; Jaewoo Kang
Journal:  Bioinformatics       Date:  2020-02-15       Impact factor: 6.937

