Effects of personal identifier resynthesis on clinical text de-identification.

Reyyan Yeniterzi, John Aberdeen, Samuel Bayer, Ben Wellner, Lynette Hirschman, Bradley Malin.

Abstract

OBJECTIVE: De-identified medical records are critical to biomedical research. Text de-identification software exists, including "resynthesis" components that replace real identifiers with synthetic identifiers. The goal of this research is to evaluate the effectiveness of, and examine possible bias introduced by, resynthesis in de-identification software.
DESIGN: We evaluated the open-source MITRE Identification Scrubber Toolkit, which includes a resynthesis capability, with clinical text from Vanderbilt University Medical Center patient records. We investigated four record classes from over 500 patients' files, including laboratory reports, medication orders, discharge summaries and clinical notes. We trained and tested the de-identification tool on real and resynthesized records.
MEASUREMENTS: We measured performance in terms of precision, recall, F-measure and accuracy for the detection of protected health identifiers as designated by the HIPAA Safe Harbor Rule.
RESULTS: The de-identification tool was trained and tested on a collection of real and resynthesized Vanderbilt records. Results for training and testing on the real records were 0.990 accuracy and 0.960 F-measure. The results improved when the tool was trained and tested on resynthesized records, with 0.998 accuracy and 0.980 F-measure, but deteriorated moderately when it was trained on real records and tested on resynthesized records, with 0.989 accuracy and 0.862 F-measure. Moreover, the results declined significantly when the tool was trained on resynthesized records and tested on real records, with 0.942 accuracy and 0.728 F-measure.
CONCLUSION: The de-identification tool achieves high accuracy when training and test sets are homogeneous (ie, both real or both resynthesized records). The resynthesis component regularizes the data to make them less "realistic," resulting in loss of performance, particularly when training on resynthesized data and testing on real data.
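
The four measures named under MEASUREMENTS can be illustrated with a minimal sketch. It assumes token-level scoring, where each token is labeled as a protected health identifier (PHI) or not, and a balanced F-measure (F1). The abstract does not state the scoring unit or the F-measure weighting, so both are assumptions, and the names `Counts`, `count_tokens` and `metrics` are hypothetical, not part of the MITRE toolkit.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # PHI tokens correctly flagged
    fp: int  # non-PHI tokens wrongly flagged as PHI
    fn: int  # PHI tokens the tagger missed
    tn: int  # non-PHI tokens correctly left untouched


def count_tokens(gold, pred):
    """Tally a confusion matrix from two equal-length boolean sequences
    (True = token is PHI). Token-level scoring is an assumption here."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g and p)
    fp = sum(1 for g, p in zip(gold, pred) if not g and p)
    fn = sum(1 for g, p in zip(gold, pred) if g and not p)
    tn = sum(1 for g, p in zip(gold, pred) if not g and not p)
    return Counts(tp, fp, fn, tn)


def metrics(c):
    """Precision, recall, balanced F-measure (F1) and accuracy."""
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Invented toy data: 100 tokens, 10 of them PHI. The tagger finds 7,
    # misses 3, and raises 2 false alarms among the 90 non-PHI tokens.
    gold = [True] * 10 + [False] * 90
    pred = [True] * 7 + [False] * 3 + [True] * 2 + [False] * 88
    for name, value in metrics(count_tokens(gold, pred)).items():
        print(f"{name}: {value:.3f}")
```

On this invented toy data, accuracy is 0.950 while the F-measure is about 0.737. Because non-PHI tokens greatly outnumber PHI tokens, accuracy under this scoring assumption moves far less than F-measure when the tagger misses identifiers, which is one plausible reading of why the reported accuracy stays above 0.94 even where the F-measure falls to 0.728.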

Year:  2010        PMID: 20190058      PMCID: PMC3000784          DOI: 10.1136/jamia.2009.002212

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


