Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

David S Carrell¹, Bradley A Malin², David J Cronkite¹, John S Aberdeen³, Cheryl Clark³, Muqun Rachel Li⁴, Dikshya Bastakoty², Steve Nyemba², Lynette Hirschman³.

Abstract

OBJECTIVE: Effective, scalable de-identification of personally identifying information (PII) for information-rich clinical text is critical to support secondary use, but no method is 100% effective. The hiding-in-plain-sight (HIPS) approach attempts to solve this "residual PII problem." HIPS replaces PII tagged by a de-identification system with realistic but fictitious (resynthesized) content, making it harder to detect remaining unredacted PII.
MATERIALS AND METHODS: Starting from 2000 representative clinical documents from each of 2 healthcare settings (4000 total), we used a novel method to generate 2 de-identified 100-document corpora (200 documents total) in which PII tagged by a typical automated machine-learned tagger was replaced by HIPS-resynthesized content. Four readers conducted aggressive reidentification attacks to isolate leaked PII: 2 readers from within the originating institution and 2 external readers.
RESULTS: Overall, mean recall of leaked PII was 26.8% and mean precision was 37.2%. Mean recall was 9% (mean precision = 37%) for patient ages, 32% (mean precision = 26%) for dates, 25% (mean precision = 37%) for doctor names, 45% (mean precision = 55%) for organization names, and 23% (mean precision = 57%) for patient names. Recall was 32% (precision = 40%) for internal and 22% (precision = 33%) for external readers.
DISCUSSION AND CONCLUSIONS: Approximately 70% of leaked PII "hiding" in a corpus de-identified with HIPS resynthesis is resilient to detection by human readers in a realistic, aggressive reidentification attack scenario. This is more than double the rate reported in previous studies but less than the rate reported for an attack assisted by machine learning methods.
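
The idea summarized in the abstract can be illustrated with a toy example. The sketch below is not the authors' system: the surrogate pools, the sample note, and the helper names (`resynthesize`, `score_attack`, `span`) are all hypothetical. It assumes a tagger that finds some PII spans and misses others, replaces the tagged spans with fictitious surrogates, and then scores a reader's guesses about which remaining items are real (leaked) PII using recall and precision as reported in the RESULTS.

```python
import random
import re

# Hypothetical surrogate pools; a real system would draw from far larger lists.
SURROGATES = {
    "NAME": ["Alice Moreno", "Brian Osei", "Carla Jensen"],
    "DATE": ["03/14/2011", "11/02/2012", "07/29/2013"],
}


def span(text, phrase, pii_type):
    """Locate the first occurrence of phrase and return (start, end, type)."""
    start = text.index(phrase)
    return (start, start + len(phrase), pii_type)


def resynthesize(text, tagged_spans, rng):
    """Replace each tagged span with a fictitious surrogate of the same type."""
    out, cursor = [], 0
    for start, end, pii_type in sorted(tagged_spans):
        out.append(text[cursor:start])
        out.append(rng.choice(SURROGATES[pii_type]))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def score_attack(guessed, leaked):
    """Return (recall, precision) of a reader's guesses against leaked PII."""
    guessed, leaked = set(guessed), set(leaked)
    hits = len(guessed & leaked)
    recall = hits / len(leaked) if leaked else 0.0
    precision = hits / len(guessed) if guessed else 0.0
    return recall, precision


note = ("John Smith saw Dr. Mary Jones on 01/02/2019; "
        "follow-up with Dr. Lee on 02/03/2019.")

# Assumed tagger output: three PII items found, two missed ("Lee", second date).
tagged = [
    span(note, "John Smith", "NAME"),
    span(note, "Mary Jones", "NAME"),
    span(note, "01/02/2019", "DATE"),
]
leaked = {"Lee", "02/03/2019"}

released = resynthesize(note, tagged, random.Random(0))

# A reader sees surrogates and leaks side by side. Here the reader guesses
# "Lee" (a true leak) and the first date in the text (actually a surrogate).
dates_in_release = re.findall(r"\d{2}/\d{2}/\d{4}", released)
guessed = {"Lee", dates_in_release[0]}

recall, precision = score_attack(guessed, leaked)
print(released)
print(f"recall={recall:.2f} precision={precision:.2f}")
```

In this toy case the reader recovers one of the two leaked items and one of the two guesses is wrong, so both recall and precision are 0.50. The study's figures are means over four readers and 200 documents, which this sketch does not attempt to reproduce.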

Keywords: biomedical research; confidentiality; de-identification; electronic health records; natural language processing; privacy

Year: 2020 PMID: 32930712 PMCID: PMC7647331 DOI: 10.1093/jamia/ocaa095

Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
