Improving domain adaptation in de-identification of electronic health records through self-training.

Shun Liao1,2, Jamie Kiros3, Jiyang Chen3, Zhaolei Zhang1,2, Ting Chen3.   

Abstract

OBJECTIVE: De-identification, the removal of protected health information entities, is a fundamental task in processing electronic health records. Deep learning models have proven to be promising tools for automating de-identification. However, when the target domain (where the model is applied) differs from the source domain (where the model is trained), the model often suffers a significant performance drop, commonly referred to as the domain adaptation issue. In de-identification, this issue can make a model unreliable when deployed. In this work, we aim to close the domain gap by leveraging unlabeled data from the target domain.
MATERIALS AND METHODS: We introduce a self-training framework that addresses the domain adaptation issue by leveraging unlabeled data from the target domain. We validate its effectiveness on 4 standard de-identification datasets. In each experiment, we use a pair of datasets: labeled data from the source domain and unlabeled data from the target domain. We compare the proposed self-training framework with a supervised learning baseline that directly deploys the model trained on the source domain.
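The general self-training idea described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the model, the pseudo-label selection rule, or the number of rounds, so the count-based `CountTagger` (standing in for a deep learning tagger), the `features` function, the `threshold`, and the `rounds` parameter are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def features(tokens, i):
    """Hypothetical features: the token, its character shape, the previous token."""
    tok = tokens[i]
    shape = "".join(
        "A" if c.isupper() else "a" if c.islower() else "9" if c.isdigit() else "-"
        for c in tok
    )
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [f"w={tok.lower()}", f"shape={shape}", f"prev={prev}"]


class CountTagger:
    """Toy stand-in for a neural tagger: each feature votes with its label frequencies."""

    def fit(self, sentences):
        self.counts = defaultdict(Counter)
        for tokens, labels in sentences:
            for i, label in enumerate(labels):
                for f in features(tokens, i):
                    self.counts[f][label] += 1
        return self

    def predict(self, tokens):
        """Return (labels, confidences), one entry per token."""
        labels, confs = [], []
        for i in range(len(tokens)):
            votes = Counter()
            for f in features(tokens, i):
                seen = self.counts.get(f)
                if seen:
                    total = sum(seen.values())
                    for lab, n in seen.items():
                        votes[lab] += n / total
            if not votes:
                # No known feature: fall back to the non-PHI label with zero confidence.
                labels.append("O")
                confs.append(0.0)
            else:
                lab, score = votes.most_common(1)[0]
                labels.append(lab)
                confs.append(score / sum(votes.values()))
        return labels, confs


def self_train(source_labeled, target_unlabeled, rounds=3, threshold=0.8):
    """Train on the source, pseudo-label confident target sentences, retrain."""
    model = CountTagger().fit(source_labeled)
    for _ in range(rounds):
        pseudo = []
        for tokens in target_unlabeled:
            labels, confs = model.predict(tokens)
            # Assumed selection rule: keep a sentence only if every token is confident.
            if confs and min(confs) >= threshold:
                pseudo.append((tokens, labels))
        model = CountTagger().fit(source_labeled + pseudo)
    return model


if __name__ == "__main__":
    source = [("Patient John was seen by Dr Smith".split(),
               ["O", "NAME", "O", "O", "O", "O", "NAME"])]
    target = ["Patient Mary was seen by Dr Jones".split()]
    adapted = self_train(source, target)
    print(adapted.predict(["Mary", "arrived"]))
```

The supervised baseline in this sketch corresponds to `CountTagger().fit(source)` used directly on target text; self-training differs only in that confidently pseudo-labeled target sentences are added to the training set before the final fit.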
RESULTS: On average, the proposed framework improves the F1-score by 5.38 compared with direct deployment. For example, using i2b2-2014 as the training dataset and i2b2-2006 as the test dataset, the proposed framework increases the F1-score from 76.61 to 85.41 (+8.8). The method also increases the F1-score by 10.86 for mimic-radiology and mimic-discharge.
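For readers unfamiliar with the metric, the F1-score is the harmonic mean of precision and recall. The sketch below computes it over sets of predicted and gold entities; the abstract does not say whether scoring is token-level or entity-level, so the entity-level convention and the `(start, end, type)` tuple format used here are assumptions. The reported scores appear to be on a 0-100 scale, so the function's 0-1 output would be multiplied by 100 to compare.

```python
def f1_score(gold, predicted):
    """F1 over two collections of entities, e.g. (start, end, type) tuples."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Made-up spans for illustration only; not data from the paper.
    gold = {(0, 4, "NAME"), (10, 20, "DATE"), (25, 31, "NAME"), (40, 45, "ID")}
    predicted = {(0, 4, "NAME"), (10, 20, "DATE"), (25, 31, "NAME")}
    print(f1_score(gold, predicted))

    # Arithmetic behind the reported i2b2-2014 to i2b2-2006 gain.
    print(round(85.41 - 76.61, 2))
```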
CONCLUSION: Our work demonstrates an effective self-training framework that boosts domain adaptation performance in the de-identification of electronic health records.
© The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  de-identification; domain adaptation; medical language processing

Year:  2021        PMID: 34363664      PMCID: PMC8449604          DOI: 10.1093/jamia/ocab128

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   7.942

