
Bootstrapping a de-identification system for narrative patient records: cost-performance tradeoffs.

David Hanauer, John Aberdeen, Samuel Bayer, Benjamin Wellner, Cheryl Clark, Kai Zheng, Lynette Hirschman.

Abstract

PURPOSE: We describe an experiment to build a de-identification system for clinical records using the open source MITRE Identification Scrubber Toolkit (MIST). We quantify the human annotation effort needed to produce a system that de-identifies at high accuracy.
METHODS: Using two types of clinical records (history and physical notes, and social work notes), we iteratively built statistical de-identification models by annotating 10 notes, training a model, applying the model to another 10 notes, correcting the model's output, and training from the resulting larger set of annotated notes. This was repeated for 20 rounds of 10 notes each, and then an additional 6 rounds of 20 notes each, and a final round of 40 notes. At each stage, we measured precision, recall, and F-score, and compared these to the amount of annotation time needed to complete the round.
RESULTS: After the initial 10-note round (33 min of annotation time) we achieved an F-score of 0.89. After just over 8 h of annotation time (round 21) we achieved an F-score of 0.95. The number of annotation actions needed, as well as the time needed, decreased in later rounds as model performance improved. Accuracy on history and physical notes exceeded that on social work notes, suggesting that the wider variety of protected health information (PHI) and its contexts in social work notes are more difficult to model.
CONCLUSIONS: It is possible, with modest effort, to build a functioning de-identification system de novo using the MIST framework. The resulting system achieved performance comparable to other high-performing de-identification systems.
Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
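
The round schedule and the per-round metrics described in METHODS can be sketched as follows. This is a minimal illustration, not the MIST implementation: the function names are hypothetical, the schedule follows a literal reading of the abstract (20 rounds of 10 notes, 6 rounds of 20, one round of 40), and the abstract's "F-score" is assumed here to be the balanced F1 measure.

```python
# Round schedule under a literal reading of the abstract (an assumption):
# 20 rounds of 10 notes, then 6 rounds of 20 notes, then one round of 40.
ROUND_SIZES = [10] * 20 + [20] * 6 + [40]


def cumulative_notes(round_sizes):
    """Return the running total of annotated notes after each round."""
    totals, running = [], 0
    for size in round_sizes:
        running += size
        totals.append(running)
    return totals


def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall, and balanced F1 from PHI token counts.

    The counts are hypothetical inputs; the abstract does not report them.
    """
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Under this reading the schedule covers 27 rounds and 360 notes in total. With illustrative counts of 90 true positives, 10 false positives, and 10 false negatives, precision, recall, and F1 all come out at about 0.9.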

Keywords:  Computerized [E05.318.308.940.968.625]; Electronic health records [E05.318.308.940.968.625.500]; Medical informatics [L01.313.500]; Medical record systems; NLP; Natural language processing [L01.224.065.580]; Privacy [I01.880.604.473.352.500]

Year:  2013        PMID: 23643147     DOI: 10.1016/j.ijmedinf.2013.03.005

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


  11 in total

1.  Assessing the readability of ClinicalTrials.gov.

Authors:  Danny T Y Wu; David A Hanauer; Qiaozhu Mei; Patricia M Clark; Lawrence C An; Joshua Proulx; Qing T Zeng; V G Vinod Vydiswaran; Kevyn Collins-Thompson; Kai Zheng
Journal:  J Am Med Inform Assoc       Date:  2015-08-11       Impact factor: 4.497

2.  Efficient Active Learning for Electronic Medical Record De-identification.

Authors:  Muqun Li; Martin Scaiano; Khaled El Emam; Bradley A Malin
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2019-05-06

3.  Is the Juice Worth the Squeeze? Costs and Benefits of Multiple Human Annotators for Clinical Text De-identification.

Authors:  David S Carrell; David J Cronkite; Bradley A Malin; John S Aberdeen; Lynette Hirschman
Journal:  Methods Inf Med       Date:  2016-07-13       Impact factor: 2.176

4.  Location bias of identifiers in clinical narratives.

Authors:  David A Hanauer; Qiaozhu Mei; Bradley Malin; Kai Zheng
Journal:  AMIA Annu Symp Proc       Date:  2013-11-16

5.  Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing.

Authors:  Suzanna Schmeelk; Martins Samuel Dogo; Yifan Peng; Braja Gopal Patra
Journal:  Proc Annu Hawaii Int Conf Syst Sci       Date:  2022-01-04

6.  Privacy Policy and Technology in Biomedical Data Science.

Authors:  April Moreno Arellano; Wenrui Dai; Shuang Wang; Xiaoqian Jiang; Lucila Ohno-Machado
Journal:  Annu Rev Biomed Data Sci       Date:  2018-07

7.  De-identification of patient notes with recurrent neural networks.

Authors:  Franck Dernoncourt; Ji Young Lee; Ozlem Uzuner; Peter Szolovits
Journal:  J Am Med Inform Assoc       Date:  2017-05-01       Impact factor: 4.497

8.  Disambiguating Clinical Abbreviations Using a One-Fits-All Classifier Based on Deep Learning Techniques.

Authors:  Areej Jaber; Paloma Martínez
Journal:  Methods Inf Med       Date:  2022-02-01       Impact factor: 1.800

9.  Using Nonexperts for Annotating Pharmacokinetic Drug-Drug Interaction Mentions in Product Labeling: A Feasibility Study.

Authors:  Harry Hochheiser; Yifan Ning; Andres Hernandez; John R Horn; Rebecca Jacobson; Richard D Boyce
Journal:  JMIR Res Protoc       Date:  2016-04-11

10.  NOBLE - Flexible concept recognition for large-scale biomedical natural language processing.

Authors:  Eugene Tseytlin; Kevin Mitchell; Elizabeth Legowski; Julia Corrigan; Girish Chavan; Rebecca S Jacobson
Journal:  BMC Bioinformatics       Date:  2016-01-14       Impact factor: 3.169
