
Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

David S Carrell, Bradley A Malin, David J Cronkite, John S Aberdeen, Cheryl Clark, Muqun Rachel Li, Dikshya Bastakoty, Steve Nyemba, Lynette Hirschman.

Abstract

OBJECTIVE: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.
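The replacement step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the surrogate pool, the `make_surrogate` and `hips_resynthesize` helpers, the span format, and the sample note are all hypothetical and chosen only to show how a missed identifier ends up surrounded by fictitious ones.

```python
import random

# Hypothetical surrogate pool; a real system would draw on far larger dictionaries.
SURROGATE_NAMES = ["Alice Morgan", "Brian Keller", "Carla Jensen"]


def make_surrogate(kind, rng):
    """Return realistic but fictitious text for one PII category."""
    if kind == "NAME":
        return rng.choice(SURROGATE_NAMES)
    if kind == "DATE":
        return f"{rng.randint(1, 12):02d}/{rng.randint(1, 28):02d}/{rng.randint(2001, 2010)}"
    if kind == "AGE":
        return str(rng.randint(18, 89))
    raise ValueError(f"unsupported PII kind: {kind}")


def hips_resynthesize(text, tagged_spans, rng):
    """Replace each tagged (start, end, kind) span with a surrogate."""
    pieces, cursor = [], 0
    for start, end, kind in sorted(tagged_spans):
        pieces.append(text[cursor:start])   # untouched text before the span
        pieces.append(make_surrogate(kind, rng))
        cursor = end
    pieces.append(text[cursor:])            # untouched tail, including any missed PII
    return "".join(pieces)


def span_of(text, phrase, kind):
    """Locate a phrase and return it as a (start, end, kind) span."""
    start = text.index(phrase)
    return (start, start + len(phrase), kind)


note = "Patient John Smith was seen by Dr. Jane Doe on 03/14/2019."
# Assume the tagger found the patient name and the date but missed the doctor name.
spans = [span_of(note, "John Smith", "NAME"), span_of(note, "03/14/2019", "DATE")]
deidentified = hips_resynthesize(note, spans, random.Random(0))
print(deidentified)
```

In this toy output the leaked name "Jane Doe" sits next to a fictitious patient name and date, so a reader cannot tell from the text alone which of the identifiers is real.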
MATERIALS AND METHODS: Starting from 2000 representative clinical documents from each of 2 healthcare settings (4000 total), we applied a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.
RESULTS: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.
DISCUSSION AND CONCLUSIONS: Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario. This is more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
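For readers unfamiliar with the metrics, the following Python sketch shows one way an attacker's guesses could be scored against the truly leaked PII. The `score_attack` function, the `(document, text)` item format, and the toy data are assumptions for illustration, not the study's scoring code. Under this definition the share of leaked PII that escapes detection is 1 minus recall, so the reported overall mean recall of 26.8% corresponds to roughly 73% undetected, in line with the "approximately 70%" figure.

```python
def score_attack(guesses, leaked):
    """Return (precision, recall) of a reader's guesses against true leaked PII."""
    guesses, leaked = set(guesses), set(leaked)
    true_positives = len(guesses & leaked)
    # Precision: fraction of the reader's guesses that were real leaks.
    precision = true_positives / len(guesses) if guesses else 0.0
    # Recall: fraction of real leaks that the reader found.
    recall = true_positives / len(leaked) if leaked else 0.0
    return precision, recall


# Toy data (hypothetical): items are (document id, text) pairs.
leaked = {("doc1", "Jane Doe"), ("doc2", "47"), ("doc3", "05/02/2011"), ("doc4", "Mercy Clinic")}
guesses = {("doc1", "Jane Doe"), ("doc1", "Alice Morgan"), ("doc2", "03/09/2004")}

precision, recall = score_attack(guesses, leaked)
undetected = 1.0 - recall  # share of leaked PII that stayed hidden
print(f"precision={precision:.2f} recall={recall:.2f} undetected={undetected:.2f}")
```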
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.

Keywords:  biomedical research; confidentiality; de-identification; electronic health records; natural language processing; privacy

Year:  2020        PMID: 32930712      PMCID: PMC7647331          DOI: 10.1093/jamia/ocaa095

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


