Are synthetic clinical notes useful for real natural language processing tasks: A case study on clinical entity recognition.

Jianfu Li, Yujia Zhou, Xiaoqian Jiang, Karthik Natarajan, Serguei VS Pakhomov, Hongfang Liu, Hua Xu.

Abstract

OBJECTIVE: Developing clinical natural language processing systems often requires access to many clinical documents, which are not widely available to the public because of privacy and security concerns. To address this challenge, we propose methods to generate synthetic clinical notes and evaluate their utility in real clinical natural language processing tasks.
MATERIALS AND METHODS: We implemented 4 state-of-the-art text generation models, namely CharRNN, SeqGAN, GPT-2, and CTRL, to generate clinical text for the History and Present Illness section. We then manually annotated clinical entities in 500 randomly selected History and Present Illness notes generated by the best-performing algorithm. To compare the utility of natural and synthetic corpora, we trained named entity recognition (NER) models on all 3 corpora and evaluated their performance on 2 independent natural corpora.
RESULTS: Our evaluation shows that GPT-2 achieved the best BLEU (bilingual evaluation understudy) score, with a BLEU-2 of 0.92. NER models trained on the synthetic corpus generated by GPT-2 performed slightly better on the 2 independent corpora (strict F1 scores of 0.709 and 0.748, respectively) than NER models trained on the natural corpus (F1 scores of 0.706 and 0.737, respectively), indicating the good utility of synthetic corpora in clinical NER model development. We also demonstrated that an augmented method combining natural and synthetic corpora achieved better performance than using the natural corpus alone.
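As an illustration of the BLEU-2 metric named in the results, the following is a minimal sketch of sentence-level BLEU-2 with uniform weights over unigram and bigram precision and a brevity penalty. It is not the authors' evaluation code: the abstract does not say how references were chosen, whether smoothing was applied, or whether scores were computed at sentence or corpus level, so the function names `ngrams` and `bleu2` and the example sentences are assumptions made here for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu2(candidate, references):
    """Sentence-level BLEU-2: uniform weights, brevity penalty, no smoothing.

    Hypothetical helper for illustration; not taken from the paper.
    """
    precisions = []
    for n in (1, 2):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0
        precisions.append(clipped / total)
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the unigram and bigram precisions.
    return bp * math.exp(sum(math.log(p) for p in precisions) / 2)


# Made-up example sentences, not data from the study.
generated = "the patient has fever".split()
natural = ["the patient denies fever".split()]
score = bleu2(generated, natural)
```

Here the unigram precision is 3/4 and the bigram precision is 1/3, so the score is their geometric mean with no length penalty.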
CONCLUSIONS: Recent advances in text generation have made it possible to generate synthetic clinical notes that could be useful for training NER models for information extraction from natural clinical notes, thus lowering privacy concerns and increasing data availability. Further investigation is needed to apply this technology in practice.
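The strict F1 scores reported above are conventionally computed at the entity level: a predicted entity counts as correct only if both its boundaries and its type exactly match a gold entity. The sketch below shows one way to compute this from BIO-tagged sequences. It assumes a BIO tagging scheme and uses made-up entity types (`problem`, `treatment`, `test`); the abstract does not specify the tag scheme, the entity types, or the scoring tool, so the helper names `extract_entities` and `strict_f1` are hypothetical.

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from a BIO tag sequence; end is exclusive."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing entity
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                entities.add((start, i, etype))
            # A B- tag (or a stray I- tag) opens a new entity; O resets the state.
            start, etype = (i, tag[2:]) if tag.startswith(("B-", "I-")) else (None, None)
    return entities


def strict_f1(gold_sentences, pred_sentences):
    """Micro-averaged entity-level F1 with exact boundary and type matching."""
    tp = n_gold = n_pred = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up tag sequences: the second predicted entity has the wrong type.
gold = [["B-problem", "I-problem", "O", "B-treatment"]]
pred = [["B-problem", "I-problem", "O", "B-test"]]
f1 = strict_f1(gold, pred)
```

In this example one of two predicted entities matches one of two gold entities exactly, so precision, recall, and F1 are all 0.5. A prediction that gets the type right but truncates the span would also be counted as an error under strict matching.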
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  clinical notes; named entity recognition; natural language processing; neural language model; text generation

Year:  2021        PMID: 34272955      PMCID: PMC8449609          DOI: 10.1093/jamia/ocab112

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   7.942


