Improving domain adaptation in de-identification of electronic health records through self-training.

Shun Liao, Jamie Kiros, Jiyang Chen, Zhaolei Zhang, Ting Chen.

Abstract

OBJECTIVE: De-identification, the removal of protected health information entities, is a fundamental task in processing electronic health records. Deep learning models have proven to be promising tools for automating it. However, when the target domain (where the model is applied) differs from the source domain (where the model is trained), the model often suffers a significant performance drop, commonly referred to as the domain adaptation issue. In de-identification, this issue can make a model unreliable in deployment. In this work, we aim to close the domain gap by leveraging unlabeled data from the target domain.
MATERIALS AND METHODS: We introduce a self-training framework that addresses the domain adaptation issue by leveraging unlabeled data from the target domain. We validate its effectiveness on 4 standard de-identification datasets. In each experiment, we use a pair of datasets: labeled data from the source domain and unlabeled data from the target domain. We compare the proposed self-training framework with a supervised baseline that directly deploys the model trained on the source domain.
RESULTS: On average, the proposed framework improves the F1-score by 5.38 compared with direct deployment. For example, with i2b2-2014 as the training dataset and i2b2-2006 as the test dataset, the proposed framework increases the F1-score from 76.61 to 85.41 (+8.80). The method also increases the F1-score by 10.86 for mimic-radiology and mimic-discharge.
CONCLUSION: Our work demonstrates an effective self-training framework to boost the domain adaptation performance for the de-identification task for electronic health records.
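
The abstract does not spell out the training procedure, so the following is only a minimal sketch of generic self-training (pseudo-labeling) as described in the methods above: fit on labeled source data, label the unlabeled target data, keep the confident predictions, and refit. It is not the authors' implementation. The toy `CountTagger`, the `features` function, the `threshold` and `rounds` parameters, and the example tokens are all hypothetical stand-ins for the deep learning model and clinical datasets used in the paper.

```python
from collections import Counter

def features(token):
    """Hypothetical hand-written features; a real system would use a neural encoder."""
    return ["word=" + token.lower(), "is_title=" + str(token.istitle())]

class CountTagger:
    """Toy token classifier that averages per-feature label distributions."""

    def __init__(self):
        self.counts = {}

    def fit(self, examples):
        # Rebuild feature/label co-occurrence counts from (token, label) pairs.
        self.counts = {}
        for token, label in examples:
            for f in features(token):
                self.counts.setdefault(f, Counter())[label] += 1
        return self

    def predict(self, token):
        # Each known feature votes with its normalized label distribution.
        votes = Counter()
        for f in features(token):
            seen = self.counts.get(f)
            if seen:
                total = sum(seen.values())
                for label, n in seen.items():
                    votes[label] += n / total
        if not votes:
            return "O", 0.0
        label, score = max(votes.items(), key=lambda kv: kv[1])
        return label, score / sum(votes.values())

def self_train(source_labeled, target_unlabeled, rounds=3, threshold=0.8):
    """Assumed self-training loop: pseudo-label target tokens, keep confident ones, refit."""
    model = CountTagger().fit(source_labeled)
    pseudo = []
    for _ in range(rounds):
        pseudo = []
        for token in target_unlabeled:
            label, confidence = model.predict(token)
            if confidence >= threshold:
                pseudo.append((token, label))
        # Refit from scratch on source labels plus the current pseudo-labels.
        model = CountTagger().fit(source_labeled + pseudo)
    return model, pseudo

# Made-up example data: labeled source tokens and unlabeled target tokens.
SOURCE = [("Alice", "NAME"), ("Bob", "NAME"), ("was", "O"), ("seen", "O"), ("today", "O")]
TARGET = ["Carol", "Today", "admitted"]

if __name__ == "__main__":
    adapted, kept = self_train(SOURCE, TARGET)
    print(kept)
    print(adapted.predict("Carol"))
```

In this sketch the token "Today" receives conflicting votes from its two features, so its confidence falls below the threshold and it is left out of the pseudo-labeled set. Filtering of this kind is one common way to limit the noise that wrong pseudo-labels would otherwise feed back into training.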

Keywords: de-identification; domain adaptation; medical language processing

Year: 2021 PMID: 34363664 PMCID: PMC8449604 DOI: 10.1093/jamia/ocab128

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 7.942
