

The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight.

David S Carrell¹, David J Cronkite¹, Muqun Rachel Li², Steve Nyemba³, Bradley A Malin³,⁴,⁵, John S Aberdeen⁶, Lynette Hirschman⁶

Abstract

OBJECTIVE: Clinical corpora can be deidentified using a combination of machine-learned automated taggers and hiding in plain sight (HIPS) resynthesis. The latter replaces detected personally identifiable information (PII) with random surrogates, allowing leaked PII to blend in or "hide in plain sight." We evaluated the extent to which a malicious attacker could expose leaked PII in such a corpus.
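To make the HIPS idea concrete, here is a minimal Python sketch of resynthesis over tagger output. The surrogate pools, the `(start, end, pii_type)` span format, and the function name `hips_resynthesize` are hypothetical choices for illustration, not the tooling used in this study. The point it shows is the one the objective relies on: tagged PII is swapped for random same-type surrogates, while anything the tagger missed passes through unchanged and sits among those surrogates.

```python
import random

# Hypothetical surrogate pools, one per PII type (illustrative values only).
SURROGATES = {
    "NAME": ["Alice Moreno", "Tom Becker", "Priya Nair"],
    "DATE": ["03/14/2011", "11/02/2009", "07/21/2013"],
}


def hips_resynthesize(text, spans, rng):
    """Replace each tagged (start, end, pii_type) span with a random surrogate.

    Assumes spans do not overlap. Anything the tagger missed is copied through
    unchanged, so a leaked identifier ends up next to realistic surrogates of
    the same kind and can "hide in plain sight".
    """
    pieces = []
    cursor = 0
    for start, end, pii_type in sorted(spans):
        pieces.append(text[cursor:start])                # untagged text, leaks included
        pieces.append(rng.choice(SURROGATES[pii_type]))  # random same-type surrogate
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "Pt John Doe seen 01/05/2010 by Dr. Jane Roe."
    # Suppose the tagger found the patient name and the date but missed "Jane Roe".
    spans = [
        (note.find("John Doe"), note.find("John Doe") + len("John Doe"), "NAME"),
        (note.find("01/05/2010"), note.find("01/05/2010") + len("01/05/2010"), "DATE"),
    ]
    print(hips_resynthesize(note, spans, random.Random(0)))
```

In the output, the leaked "Jane Roe" looks no different from the surrogate that replaced "John Doe", which is exactly what a reader of the released corpus would face.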
MATERIALS AND METHODS: We modeled a scenario where an institution (the defender) externally shared an 800-note corpus of actual outpatient clinical encounter notes from a large, integrated health care delivery system in Washington State. These notes were deidentified by a machine-learned PII tagger and HIPS resynthesis. A malicious attacker obtained the corpus and performed a parrot attack intended to expose leaked PII. Specifically, the attacker mimicked the defender's process by manually annotating all PII-like content in half of the released corpus, training a PII tagger on these data, and using the trained model to tag the remaining encounter notes. The attacker hypothesized that untagged identifiers would be leaked PII, discoverable by manual review. We evaluated the attacker's success using measures of leak-detection rate and accuracy.
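The attack logic can be sketched in Python under loud assumptions: notes are pre-tokenized lists of words, `train_toy_tagger` is a hypothetical stand-in that only learns which preceding words ("cues") appear before annotated PII rather than the machine-learned tagger the attacker actually trained, and `looks_like_identifier` is a hypothetical predicate standing in for the attacker's manual review. The sketch captures only the core hypothesis: identifier-like tokens the parrot model fails to tag are flagged as candidate leaks.

```python
from typing import Callable

Note = list[str]  # simplifying assumption: a note is a list of tokens


def train_toy_tagger(annotated: list[tuple[Note, set[int]]]) -> set[str]:
    """Toy stand-in for the attacker's machine-learned PII tagger.

    It only learns which preceding words ("cues") appeared before tokens the
    attacker annotated as PII; the real attack trained a statistical tagger.
    """
    cues: set[str] = set()
    for tokens, pii_positions in annotated:
        for i in pii_positions:
            if i > 0:
                cues.add(tokens[i - 1].lower())
    return cues


def parrot_attack(
    annotated: list[tuple[Note, set[int]]],
    held_out: list[Note],
    looks_like_identifier: Callable[[str], bool],
) -> list[tuple[int, int]]:
    """Return (note_index, token_index) pairs hypothesized to be leaked PII.

    Identifier-like tokens the parrot tagger tags are treated as likely HIPS
    surrogates; identifier-like tokens it misses are flagged for manual review.
    """
    cues = train_toy_tagger(annotated)
    suspects = []
    for n, tokens in enumerate(held_out):
        for t, tok in enumerate(tokens):
            tagged = t > 0 and tokens[t - 1].lower() in cues
            if not tagged and looks_like_identifier(tok):
                suspects.append((n, t))
    return suspects


if __name__ == "__main__":
    annotated = [(["Seen", "by", "Dr.", "Roe", "today"], {3})]
    held_out = [["Dr.", "Nair", "called", "wife", "Maria", "Lopez"]]

    def capitalized_word(tok: str) -> bool:
        return tok.isalpha() and tok[0].isupper()

    print(parrot_attack(annotated, held_out, capitalized_word))  # [(0, 4), (0, 5)]
```

Here the parrot model learned from the annotated half that words after "Dr." are PII, so "Nair" is tagged and presumed to be a surrogate, while the name after "wife" is flagged as a possible leak. As the results below show, a real parrot attack built on this hypothesis also flags some surrogates and misses some leaks.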
RESULTS: The attacker correctly hypothesized that 211 (68%) of the 310 actual PII leaks in the corpus were leaks, and wrongly hypothesized that 191 resynthesized PII instances were also leaks. One-third of actual leaks (99 of 310) remained undetected.
DISCUSSION AND CONCLUSION: A malicious parrot attack to reveal leaked PII in clinical text deidentified by a machine-learned tagger and HIPS resynthesis can attenuate but not eliminate the protective effect of HIPS deidentification.
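The reported counts can be turned into the rates an attacker would care about. The Python sketch below computes the leak-detection rate (211/310, matching the reported 68%) and, as an assumption, a precision figure (211 of the 402 flagged instances, about 52%). The names `leak_metrics` and `LeakMetrics` are hypothetical, and the paper's own "accuracy" measure may be defined differently from this precision.

```python
from typing import NamedTuple


class LeakMetrics(NamedTuple):
    detection_rate: float  # share of actual leaks the attacker correctly flagged
    precision: float       # share of flagged instances that were actual leaks
    undetected: int        # actual leaks the attacker did not flag


def leak_metrics(true_hits: int, false_hits: int, total_leaks: int) -> LeakMetrics:
    """Summarize a leak-hunting attack from raw counts."""
    return LeakMetrics(
        detection_rate=true_hits / total_leaks,
        precision=true_hits / (true_hits + false_hits),
        undetected=total_leaks - true_hits,
    )


if __name__ == "__main__":
    # Counts reported in the abstract: 211 correct, 191 wrong, 310 actual leaks.
    m = leak_metrics(true_hits=211, false_hits=191, total_leaks=310)
    print(f"{m.detection_rate:.0%} {m.precision:.0%} {m.undetected}")  # 68% 52% 99
```

Roughly half of what the attacker flagged was a surrogate rather than a real leak, which is one way to read the conclusion that HIPS protection is attenuated but not eliminated.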


Keywords: deidentification; machine learning; natural language processing; patient data privacy; patient privacy

Year: 2019; PMID: 31390016; PMCID: PMC6857511; DOI: 10.1093/jamia/ocz114

Source DB: PubMed; Journal: J Am Med Inform Assoc; ISSN: 1067-5027; Impact factor: 4.497

