UMLS-based data augmentation for natural language processing of clinical research literature.

Tian Kang, Adler Perotte, Youlan Tang, Casey Ta, Chunhua Weng.

Abstract

OBJECTIVE: The study sought to develop and evaluate a knowledge-based data augmentation method to improve the performance of deep learning models for biomedical natural language processing by overcoming training data scarcity.
MATERIALS AND METHODS: We extended the easy data augmentation (EDA) method for biomedical named entity recognition (NER) by incorporating Unified Medical Language System (UMLS) knowledge and called the resulting method UMLS-EDA. We designed experiments to systematically evaluate the effect of UMLS-EDA on popular deep learning architectures for both NER and classification. We also compared UMLS-EDA to BERT.
RESULTS: UMLS-EDA enables substantial improvement for NER tasks over the original long short-term memory conditional random fields (LSTM-CRF) model (micro-F1 score: +5%, +17%, and +15%), helps the LSTM-CRF model (micro-F1 score: 0.66) outperform LSTM-CRF with transfer learning by BERT (0.63), and improves the performance of the state-of-the-art sentence classification model. The largest gain in micro-F1 score is 9 percentage points, from 0.75 to 0.84, which exceeds classifiers with BERT pretraining (0.82).
CONCLUSIONS: This study presents a UMLS-based data augmentation method, UMLS-EDA. It is effective at improving deep learning models for both NER and sentence classification, and contributes original insights for designing new, superior deep learning approaches for low-resource biomedical domains.
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
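
To make the idea of knowledge-based augmentation for NER concrete, here is a minimal sketch of entity-level synonym replacement that keeps token labels aligned. It is an illustration only, not the authors' implementation: the `TOY_SYNONYMS` table is a hypothetical stand-in for a UMLS synonym lookup (real UMLS access requires a license), and the function name `augment`, the BIO label scheme, and the `Condition` entity type are assumptions made for the example.

```python
import random

# Hypothetical stand-in for a UMLS synonym lookup (assumption, not real UMLS data access).
TOY_SYNONYMS = {
    "myocardial infarction": ["heart attack"],
    "hypertension": ["high blood pressure"],
}

def augment(tokens, labels, synonyms, rng):
    """Replace each BIO-labeled entity mention that has a known synonym.

    The replacement is relabeled with B-/I- tags of the same entity type,
    so tokens and labels stay aligned even when the mention length changes.
    """
    out_tokens, out_labels = [], []
    i = 0
    while i < len(tokens):
        if labels[i].startswith("B-"):
            etype = labels[i][2:]
            # Find the end of the entity span (consecutive I- tags of the same type).
            j = i + 1
            while j < len(tokens) and labels[j] == "I-" + etype:
                j += 1
            mention = " ".join(tokens[i:j]).lower()
            candidates = synonyms.get(mention)
            # Keep the original mention when no synonym is known.
            new = rng.choice(candidates).split() if candidates else tokens[i:j]
            out_tokens.extend(new)
            out_labels.extend(["B-" + etype] + ["I-" + etype] * (len(new) - 1))
            i = j
        else:
            out_tokens.append(tokens[i])
            out_labels.append(labels[i])
            i += 1
    return out_tokens, out_labels

if __name__ == "__main__":
    tokens = ["Patients", "with", "myocardial", "infarction", "were", "enrolled"]
    labels = ["O", "O", "B-Condition", "I-Condition", "O", "O"]
    print(augment(tokens, labels, TOY_SYNONYMS, random.Random(0)))
```

Each augmented sentence would be added to the training set alongside the original. Replacing whole labeled mentions, rather than arbitrary words, is one simple way to avoid corrupting entity boundaries; whether UMLS-EDA handles boundaries this way is not stated in the abstract.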

Keywords:  NLP; UMLS; Unified Medical Language System; data augmentation; evidence based medicine; machine learning; named entity recognition; natural language processing

Year:  2021        PMID: 33367705      PMCID: PMC7973470          DOI: 10.1093/jamia/ocaa309

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  2 in total

Review 1.  Artificial Intelligence for Disease Assessment in Inflammatory Bowel Disease: How Will it Change Our Practice?

Authors:  Ryan W Stidham; Kento Takenaka
Journal:  Gastroenterology       Date:  2022-01-04       Impact factor: 22.682

2.  Investigating the impact of weakly supervised data on text mining models of publication transparency: a case study on randomized controlled trials.

Authors:  Linh Hoang; Lan Jiang; Halil Kilicoglu
Journal:  AMIA Annu Symp Proc       Date:  2022-05-23