Literature DB >> 27405787

Is the Juice Worth the Squeeze? Costs and Benefits of Multiple Human Annotators for Clinical Text De-identification.

David S Carrell1, David J Cronkite, Bradley A Malin, John S Aberdeen, Lynette Hirschman.   

Abstract

BACKGROUND: Clinical text contains valuable information but must be de-identified before it can be used for secondary purposes. Accurate annotation of personally identifiable information (PII) is essential to the development of automated de-identification systems and to manual redaction of PII. Yet the accuracy of annotations may vary considerably across individual annotators and annotation is costly. As such, the marginal benefit of incorporating additional annotators has not been well characterized.
OBJECTIVES: This study models the costs and benefits of incorporating increasing numbers of independent human annotators to identify the instances of PII in a corpus. We used a corpus with gold standard annotations to evaluate the performance of teams of annotators of increasing size.
METHODS: Four annotators independently identified PII in a 100-document corpus consisting of randomly selected clinical notes from Family Practice clinics in a large integrated health care system. These annotations were pooled and validated to generate a gold standard corpus for evaluation.
RESULTS: Recall rates for all PII types ranged from 0.90 to 0.98 for individual annotators to 0.998 to 1.0 for teams of three, when meas-ured against the gold standard. Median cost per PII instance discovered during corpus annotation ranged from $ 0.71 for an individual annotator to $ 377 for annotations discovered only by a fourth annotator.
CONCLUSIONS: Incorporating a second annotator into a PII annotation process reduces unredacted PII and improves the quality of annotations to 0.99 recall, yielding clear benefit at reasonable cost; the cost advantages of annotation teams larger than two diminish rapidly.

Entities:  

Keywords:  Patient data privacy; cost analysis; data sharing; natural language processing

Mesh:

Year:  2016        PMID: 27405787      PMCID: PMC5194214          DOI: 10.3414/ME15-01-0122

Source DB:  PubMed          Journal:  Methods Inf Med        ISSN: 0026-1270            Impact factor:   2.176


  21 in total

1.  Standards for privacy of individually identifiable health information. Final rule.

Authors: 
Journal:  Fed Regist       Date:  2002-08-14

2.  Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text.

Authors:  David Carrell; Bradley Malin; John Aberdeen; Samuel Bayer; Cheryl Clark; Ben Wellner; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2012-07-06       Impact factor: 4.497

3.  Inductive creation of an annotation schema and a reference standard for de-identification of VA electronic clinical notes.

Authors:  Jeanmarie Mayer; Shuying Shen; Brett R South; Stephane Meystre; F Jeff Friedlin; William R Ray; Matthew Samore
Journal:  AMIA Annu Symp Proc       Date:  2009-11-14

4.  A de-identifier for medical discharge summaries.

Authors:  Ozlem Uzuner; Tawanda C Sibanda; Yuan Luo; Peter Szolovits
Journal:  Artif Intell Med       Date:  2007-11-28       Impact factor: 5.326

5.  Rapidly retargetable approaches to de-identification in medical records.

Authors:  Ben Wellner; Matt Huyck; Scott Mardis; John Aberdeen; Alex Morgan; Leonid Peshkin; Alex Yeh; Janet Hitzeman; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2007-06-28       Impact factor: 4.497

6.  Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial.

Authors:  Sumithra Velupillai; Hercules Dalianis; Martin Hassel; Gunnar H Nilsson
Journal:  Int J Med Inform       Date:  2009-05-23       Impact factor: 4.046

7.  The MITRE Identification Scrubber Toolkit: design, training, and assessment.

Authors:  John Aberdeen; Samuel Bayer; Reyyan Yeniterzi; Ben Wellner; Cheryl Clark; David Hanauer; Bradley Malin; Lynette Hirschman
Journal:  Int J Med Inform       Date:  2010-10-14       Impact factor: 4.046

Review 8.  Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1.

Authors:  Amber Stubbs; Christopher Kotfila; Özlem Uzuner
Journal:  J Biomed Inform       Date:  2015-07-28       Impact factor: 6.317

9.  Scaling drug indication curation through crowdsourcing.

Authors:  Ritu Khare; John D Burger; John S Aberdeen; David W Tresner-Kirsch; Theodore J Corrales; Lynette Hirchman; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2015-03-22       Impact factor: 3.451

10.  Combining knowledge- and data-driven methods for de-identification of clinical narratives.

Authors:  Azad Dehghan; Aleksandar Kovacevic; George Karystianis; John A Keane; Goran Nenadic
Journal:  J Biomed Inform       Date:  2015-07-22       Impact factor: 6.317

View more
  7 in total

1.  The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight.

Authors:  David S Carrell; David J Cronkite; Muqun Rachel Li; Steve Nyemba; Bradley A Malin; John S Aberdeen; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2019-12-01       Impact factor: 4.497

2.  Scalable Iterative Classification for Sanitizing Large-Scale Datasets.

Authors:  Bo Li; Yevgeniy Vorobeychik; Muqun Li; Bradley Malin
Journal:  IEEE Trans Knowl Data Eng       Date:  2016-11-11       Impact factor: 6.977

Review 3.  Making Sense of Big Textual Data for Health Care: Findings from the Section on Clinical Natural Language Processing.

Authors:  A Névéol; P Zweigenbaum
Journal:  Yearb Med Inform       Date:  2017-09-11

4.  Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

Authors:  David S Carrell; Bradley A Malin; David J Cronkite; John S Aberdeen; Cheryl Clark; Muqun Rachel Li; Dikshya Bastakoty; Steve Nyemba; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2020-07-01       Impact factor: 4.497

5.  Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.

Authors:  Azad Dehghan; Aleksandar Kovacevic; George Karystianis; John A Keane; Goran Nenadic
Journal:  J Biomed Inform       Date:  2017-06-07       Impact factor: 6.317

6.  Evaluating the re-identification risk of a clinical study report anonymized under EMA Policy 0070 and Health Canada Regulations.

Authors:  Janice Branson; Nathan Good; Jung-Wei Chen; Will Monge; Christian Probst; Khaled El Emam
Journal:  Trials       Date:  2020-02-18       Impact factor: 2.279

7.  The OpenDeID corpus for patient de-identification.

Authors:  Jitendra Jonnagaddala; Aipeng Chen; Sean Batongbacal; Chandini Nekkantti
Journal:  Sci Rep       Date:  2021-10-07       Impact factor: 4.379

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.