Automatic de-identification of electronic medical records using token-level and character-level conditional random fields.

Zengjian Liu, Yangxin Chen, Buzhou Tang, Xiaolong Wang, Qingcai Chen, Haodi Li, Jingfeng Wang, Qiwen Deng, Suisong Zhu.

Abstract

De-identification, the task of identifying and removing all protected health information (PHI) present in clinical data, including electronic medical records (EMRs), is a critical step in making clinical data publicly available. The 2014 i2b2 (Informatics for Integrating Biology and the Bedside) clinical natural language processing (NLP) challenge set up a track for de-identification (track 1). In this study, we propose a hybrid system for the de-identification track that combines machine learning and rule-based approaches. In our system, PHI instances are first identified by two conditional random fields (CRFs), one token-level and one character-level, and by a rule-based classifier; the outputs are then merged by a set of rules. Experiments conducted on the i2b2 corpus show that the system we submitted to the challenge achieves highest micro F-scores of 94.64%, 91.24% and 91.63% under the "token", "strict" and "relaxed" criteria, respectively, placing it among the top-ranked systems of the 2014 i2b2 challenge. After integrating some refined localization dictionaries, the system improves further, with F-scores of 94.83%, 91.57% and 91.95% under the "token", "strict" and "relaxed" criteria, respectively.
Copyright © 2015 Elsevier Inc. All rights reserved.
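
The abstract describes merging PHI spans from three recognizers and scoring the result with micro F-scores. The sketch below illustrates that general idea only. The `Span` type, the priority-based merge rule (earlier sources win on overlap), and the toy offsets are all assumptions made for illustration; the abstract does not state the authors' actual merge rules, and the scorer shown covers only exact offset-and-label matching, not the "token" or "relaxed" criteria.

```python
from typing import List, NamedTuple, Set


class Span(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # PHI category, e.g. "DATE"


def overlaps(a: Span, b: Span) -> bool:
    """True if the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def merge_spans(*sources: List[Span]) -> List[Span]:
    """Union spans from several recognizers; earlier sources win on overlap.

    This priority rule is a hypothetical stand-in, not the paper's rule set.
    """
    merged: List[Span] = []
    for spans in sources:
        for span in spans:
            if not any(overlaps(span, kept) for kept in merged):
                merged.append(span)
    return sorted(merged)


def micro_f1_exact(gold: Set[Span], pred: Set[Span]) -> float:
    """Micro-averaged F1: counts are pooled over all PHI categories,
    and a prediction is correct only if offsets and label match exactly."""
    tp = len(gold & pred)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy spans over a hypothetical note (not i2b2 data).
token_crf = [Span(0, 10, "DATE"), Span(20, 30, "DOCTOR")]
char_crf = [Span(0, 10, "DATE"), Span(22, 30, "DOCTOR"), Span(40, 52, "PHONE")]
rules = [Span(40, 52, "PHONE"), Span(60, 65, "ZIP")]

predicted = merge_spans(token_crf, char_crf, rules)
gold = {
    Span(0, 10, "DATE"),
    Span(20, 30, "DOCTOR"),
    Span(40, 52, "PHONE"),
    Span(70, 75, "AGE"),
}
score = micro_f1_exact(gold, set(predicted))
print(predicted)
print(f"micro F1 (exact match): {score:.2f}")
```

In this toy case the merged output contains four spans, three of which match the gold annotations exactly, so precision and recall are both 3/4. A real system would replace the priority rule with rules tuned on training data.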

Keywords:  De-identification; Electronic medical records; Hybrid method; Natural language processing; Protected health information; i2b2

Year:  2015        PMID: 26122526      PMCID: PMC4988843          DOI: 10.1016/j.jbi.2015.06.009

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


