De-identification of clinical notes via recurrent neural network and conditional random field.

Zengjian Liu, Buzhou Tang, Xiaolong Wang, Qingcai Chen.

Abstract

De-identification, the detection and removal of identifying information such as protected health information (PHI) from clinical data, is a critical step before data can be shared or published. The 2016 Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-scale and RDOC Individualized Domains (N-GRID) clinical natural language processing (NLP) challenge includes a track on de-identifying electronic medical records (EMRs) (track 1). The challenge organizers provide 1000 annotated mental health records for this track, of which 600 are used as a training set and 400 as a test set. We develop a hybrid system for the de-identification task on the training set. First, four individual subsystems are used to identify PHI instances: a subsystem based on bidirectional LSTM (long short-term memory, a variant of recurrent neural network), a subsystem based on bidirectional LSTM with features, a subsystem based on conditional random field (CRF), and a rule-based subsystem. Then, an ensemble learning-based classifier is deployed to combine all PHI instances predicted by the three machine learning-based subsystems. Finally, the results of the ensemble learning-based classifier and the rule-based subsystem are merged. Experiments conducted on the official test set show that our system achieves the highest micro F1-scores of 93.07%, 91.43% and 95.23% under the "token", "strict" and "binary token" criteria respectively, ranking first in the 2016 CEGS N-GRID NLP challenge. In addition, on the dataset of the 2014 i2b2 NLP challenge, our system achieves the highest micro F1-scores of 96.98%, 95.11% and 98.28% under the "token", "strict" and "binary token" criteria respectively, outperforming other state-of-the-art systems. These experiments demonstrate the effectiveness of our proposed method.
Copyright © 2017. Published by Elsevier Inc.
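
The final merge step and the "strict" scoring criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how overlapping predictions are resolved, so the rule used here (keep the ensemble span when it overlaps a rule-based span) is an assumption, as are the names `Span`, `merge_predictions` and `strict_micro_f1` and the toy offsets. Strict matching is taken to mean that a predicted span counts only if its boundaries and PHI category both match the gold annotation exactly.

```python
from typing import NamedTuple


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str  # PHI category, e.g. "DATE"


def overlaps(a: Span, b: Span) -> bool:
    """True if the two character ranges share at least one position."""
    return a.start < b.end and b.start < a.end


def merge_predictions(ensemble: list[Span], rules: list[Span]) -> list[Span]:
    """Union of both prediction lists.

    Assumption (not from the paper): when a rule-based span overlaps an
    ensemble span, the ensemble span is kept and the rule-based one dropped.
    """
    merged = list(ensemble)
    for r in rules:
        if not any(overlaps(r, e) for e in ensemble):
            merged.append(r)
    return sorted(set(merged))


def strict_micro_f1(gold: set[Span], pred: set[Span]) -> float:
    """Micro F1 with exact match on boundaries and category.

    Counts are pooled over all categories, which is what makes it "micro".
    """
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy example with hypothetical offsets.
ensemble = [Span(0, 10, "NAME"), Span(20, 30, "DATE")]
rules = [Span(22, 30, "DATE"), Span(40, 52, "PHONE")]
gold = {
    Span(0, 10, "NAME"),
    Span(20, 30, "DATE"),
    Span(40, 52, "PHONE"),
    Span(60, 65, "AGE"),
}

merged = merge_predictions(ensemble, rules)
score = strict_micro_f1(gold, set(merged))
print(merged)
print(round(score, 4))
```

In this toy case the overlapping rule-based DATE span is discarded, the PHONE span found only by the rules is added, and the missed AGE span lowers recall but not precision.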

Keywords:  De-identification; Ensemble system; Natural language processing; Protected health information; Recurrent neural network

Year:  2017        PMID: 28579533      PMCID: PMC5705329          DOI: 10.1016/j.jbi.2017.05.023

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


