Hiding in plain sight: use of realistic surrogates to reduce exposure of protected health information in clinical text.

David Carrell, Bradley Malin, John Aberdeen, Samuel Bayer, Cheryl Clark, Ben Wellner, Lynette Hirschman.

Abstract

OBJECTIVE: Secondary use of clinical text is impeded by a lack of highly effective, low-cost de-identification methods. Both manual and automated methods for removing protected health information are known to leave behind residual identifiers. The authors propose a novel approach for addressing the residual identifier problem based on the theory of Hiding In Plain Sight (HIPS).
MATERIALS AND METHODS: HIPS relies on obfuscation to conceal residual identifiers. According to this theory, replacing the detected identifiers with realistic but synthetic surrogates should collectively render the few 'leaked' identifiers difficult to distinguish from the synthetic surrogates. The authors conducted a pilot study to test this theory on clinical narrative de-identified by an automated system. Test corpora included 31 oncology and 50 family practice progress notes read by two trained chart abstractors and an informaticist.
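As a rough illustration of the substitution step described in the methods, the Python sketch below replaces detected identifier spans with synthetic surrogates and leaves undetected ones in place. It is not the authors' system: the function name `replace_detected`, the `SURROGATES` pools, the span format, and the sample note are all assumptions made for the example.

```python
import random

# Hypothetical surrogate pools (illustrative only); a real system would draw
# from much larger lists of realistic names and dates.
SURROGATES = {
    "NAME": ["Alice Morgan", "Robert Hale", "Maria Jensen"],
    "DATE": ["03/14/2009", "11/02/2010", "07/21/2008"],
}

def replace_detected(text, spans, rng):
    """Replace each detected (start, end, kind) span with a synthetic surrogate.

    Identifiers the detector missed are left untouched, so any leaked
    identifier ends up surrounded by surrogates of the same type.
    """
    pieces = []
    cursor = 0
    for start, end, kind in sorted(spans):
        pieces.append(text[cursor:start])               # text before the span
        pieces.append(rng.choice(SURROGATES[kind]))     # synthetic replacement
        cursor = end
    pieces.append(text[cursor:])                        # trailing text
    return "".join(pieces)

# Made-up note, not drawn from the study corpora.
note = "John Smith was seen on 01/05/2011 with his wife Jane Smith."
# Assume the detector found the first name and the date but missed "Jane Smith".
detected = [(0, 10, "NAME"), (23, 33, "DATE")]
result = replace_detected(note, detected, random.Random(0))
print(result)
```

In the output, the leaked name "Jane Smith" sits next to a surrogate name and a surrogate date, so a reader cannot easily tell which identifiers are real.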
RESULTS: Experimental results suggest approximately 90% of residual identifiers can be effectively concealed by the HIPS approach in text containing average and high densities of personal identifying information.
DISCUSSION: This pilot test suggests HIPS is feasible, but requires further evaluation. The results need to be replicated on larger corpora of diverse origin under a range of detection scenarios. Error analyses also suggest areas where surrogate generation techniques can be refined to improve efficacy.
CONCLUSIONS: If these results generalize to existing high-performing de-identification systems with recall rates of 94-98%, HIPS could increase the effective de-identification rates of these systems to levels above 99% without further advancements in system recall. Additional and more rigorous assessment of the HIPS approach is warranted.
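The arithmetic behind the "above 99%" figure can be checked with a short Python sketch. It assumes, as one plausible reading of the abstract rather than a formula the authors state, that the effective de-identification rate equals system recall plus the fraction of missed identifiers that HIPS conceals. The function name `effective_rate` is hypothetical, and the 0.90 concealment value is taken from the RESULTS statement.

```python
def effective_rate(recall, concealment):
    """Detected identifiers plus the share of missed ones hidden by surrogates.

    Assumed model: effective = recall + (1 - recall) * concealment.
    """
    return recall + (1.0 - recall) * concealment

# Recall range quoted in the conclusions; ~90% concealment from the results.
for recall in (0.94, 0.98):
    rate = effective_rate(recall, 0.90)
    print(f"recall {recall:.2f} -> effective rate {rate:.3f}")
```

Under this assumption, a recall of 94% gives an effective rate of 99.4% and a recall of 98% gives 99.8%, consistent with the claim that rates could exceed 99% without further gains in recall.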

Year:  2012        PMID: 22771529      PMCID: PMC3638183          DOI: 10.1136/amiajnl-2012-001034

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497
