The MITRE Identification Scrubber Toolkit: design, training, and assessment.

John Aberdeen, Samuel Bayer, Reyyan Yeniterzi, Ben Wellner, Cheryl Clark, David Hanauer, Bradley Malin, Lynette Hirschman.

Abstract

PURPOSE: Medical records must often be stripped of patient identifiers, or de-identified, before being shared. De-identification by humans is time-consuming, and existing software is limited in its generality. The open source MITRE Identification Scrubber Toolkit (MIST) provides an environment to support rapid tailoring of automated de-identification to different document types, using automatically learned classifiers to de-identify and protect sensitive information.
METHODS: MIST was evaluated with four classes of patient records from the Vanderbilt University Medical Center: discharge summaries, laboratory reports, letters, and order summaries. We trained and tested MIST on each class of record separately, as well as on pooled sets of records. We measured precision, recall, F-measure and accuracy at the word level for the detection of patient identifiers as designated by the HIPAA Safe Harbor Rule.
RESULTS: MIST was applied to medical records that differed in the amounts and types of protected health information (PHI): lab reports contained only two types of PHI (dates, names) compared to discharge summaries, which were much richer. Performance of the de-identification tool depended on record class; F-measure results were 0.996 for order summaries, 0.996 for discharge summaries, 0.943 for letters and 0.934 for laboratory reports. Experiments suggest the tool requires several hundred training exemplars to reach an F-measure of at least 0.9.
CONCLUSIONS: The MIST toolkit makes possible the rapid tailoring of automated de-identification to particular document types and supports the transition of the de-identification software to medical end users, avoiding the need for developers to have access to original medical records. We are making the MIST toolkit available under an open source license to encourage its application to diverse data sets at multiple institutions.
Copyright © 2010 Elsevier Ireland Ltd. All rights reserved.
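
The word-level measures named in METHODS (precision, recall, F-measure and accuracy) can be illustrated with a minimal sketch. It is not MIST's scorer: it assumes each word carries a single binary label (PHI or not PHI), that gold and predicted labels are aligned word by word, and that F-measure means the balanced harmonic mean of precision and recall. The function name `word_level_scores` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def word_level_scores(gold: Sequence[bool], pred: Sequence[bool]) -> Dict[str, float]:
    """Score predicted PHI labels against gold labels, one label per word.

    Assumption: True means the word is PHI, False means it is not.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must label the same words")

    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g and p)          # PHI word, flagged
    fp = sum(1 for g, p in pairs if not g and p)      # non-PHI word, flagged
    fn = sum(1 for g, p in pairs if g and not p)      # PHI word, missed
    tn = sum(1 for g, p in pairs if not g and not p)  # non-PHI word, left alone

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Toy example (made-up labels for eight words, not data from the study).
gold_labels = [True, True, False, False, True, False, False, False]
pred_labels = [True, False, False, True, True, False, False, False]
scores = word_level_scores(gold_labels, pred_labels)
print(scores)
```

In de-identification, a missed identifier (a false negative) leaks PHI, so recall is usually the measure to watch most closely. Accuracy can look high simply because most words in a record are not PHI.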

Year:  2010        PMID: 20951082     DOI: 10.1016/j.ijmedinf.2010.09.007

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


  63 in total

1.  Voice-dictated versus typed-in clinician notes: linguistic properties and the potential implications on natural language processing.

Authors:  Kai Zheng; Qiaozhu Mei; Lei Yang; Frank J Manion; Ulysses J Balis; David A Hanauer
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

Review 2.  Strategies for de-identification and anonymization of electronic health record data for use in multicenter research studies.

Authors:  Clete A Kushida; Deborah A Nichols; Rik Jadrnicek; Ric Miller; James K Walsh; Kara Griffin
Journal:  Med Care       Date:  2012-07       Impact factor: 2.983

3.  Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text.

Authors:  David Carrell; Bradley Malin; John Aberdeen; Samuel Bayer; Cheryl Clark; Ben Wellner; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2012-07-06       Impact factor: 4.497

4.  Assessing the readability of ClinicalTrials.gov.

Authors:  Danny T Y Wu; David A Hanauer; Qiaozhu Mei; Patricia M Clark; Lawrence C An; Joshua Proulx; Qing T Zeng; V G Vinod Vydiswaran; Kevyn Collins-Thompson; Kai Zheng
Journal:  J Am Med Inform Assoc       Date:  2015-08-11       Impact factor: 4.497

Review 5.  Electronic medical records as a tool in clinical pharmacology: opportunities and challenges.

Authors:  D M Roden; H Xu; J C Denny; R A Wilke
Journal:  Clin Pharmacol Ther       Date:  2012-06       Impact factor: 6.875

6.  Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

Authors:  Hee-Jin Lee; Yaoyun Zhang; Kirk Roberts; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

7.  Physician characteristics associated with higher adenoma detection rate.

Authors:  Ateev Mehrotra; Michele Morris; Rebecca A Gourevitch; David S Carrell; Daniel A Leffler; Sherri Rose; Julia B Greer; Seth D Crockett; Andrew Baer; Robert E Schoen
Journal:  Gastrointest Endosc       Date:  2017-09-01       Impact factor: 9.427

8.  Building a best-in-class automated de-identification tool for electronic health records through ensemble learning.

Authors:  Karthik Murugadoss; Ajit Rajasekharan; Bradley Malin; Vineet Agarwal; Sairam Bade; Jeff R Anderson; Jason L Ross; William A Faubion; John D Halamka; Venky Soundararajan; Sankar Ardhanari
Journal:  Patterns (N Y)       Date:  2021-05-12

9.  Preparing an annotated gold standard corpus to share with extramural investigators for de-identification research.

Authors:  Todd Lingren; Yizhao Ni; Louise Deleger; Megan Kaiser; Laura Stoutenborough; Keith Marsolo; Michal Kouril; Katalin Molnar; Imre Solti
Journal:  J Biomed Inform       Date:  2014-02-17       Impact factor: 6.317

10.  Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

Authors:  David S Carrell; Bradley A Malin; David J Cronkite; John S Aberdeen; Cheryl Clark; Muqun Rachel Li; Dikshya Bastakoty; Steve Nyemba; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2020-07-01       Impact factor: 4.497
