OBJECTIVE: Secondary use of clinical text is impeded by a lack of highly effective, low-cost de-identification methods. Both manual and automated methods for removing protected health information are known to leave behind residual identifiers. The authors propose a novel approach to the residual identifier problem based on the theory of Hiding In Plain Sight (HIPS).
MATERIALS AND METHODS: HIPS relies on obfuscation to conceal residual identifiers. According to this theory, replacing the detected identifiers with realistic but synthetic surrogates should render the few 'leaked' identifiers difficult to distinguish from the surrogates that surround them. The authors conducted a pilot study to test this theory on clinical narrative de-identified by an automated system. Test corpora included 31 oncology and 50 family practice progress notes read by two trained chart abstractors and an informaticist.
RESULTS: Experimental results suggest that approximately 90% of residual identifiers can be effectively concealed by the HIPS approach in text containing average and high densities of personally identifying information.
DISCUSSION: This pilot test suggests HIPS is feasible but requires further evaluation. The results need to be replicated on larger corpora of diverse origin under a range of detection scenarios. Error analyses also suggest areas where surrogate generation techniques can be refined to improve efficacy.
CONCLUSIONS: If these results generalize to existing high-performing de-identification systems with recall rates of 94-98%, HIPS could increase the effective de-identification rates of these systems to levels above 99% without further advancements in system recall. Additional and more rigorous assessment of the HIPS approach is warranted.
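The arithmetic behind the "above 99%" figure can be sketched as follows. This is an illustrative reading of the abstract, not the authors' own calculation: it assumes that the effective de-identification rate is the share of identifiers removed by the system (recall) plus the share of residual identifiers concealed by HIPS, and that the pilot's roughly 90% concealment rate applies uniformly. The function name `effective_deid_rate` and its parameters are hypothetical.

```python
def effective_deid_rate(recall: float, concealment: float = 0.90) -> float:
    """Share of identifiers either removed by the system or concealed by HIPS.

    Assumption (not stated in the abstract): the two effects combine as
    recall + (1 - recall) * concealment.
    """
    if not (0.0 <= recall <= 1.0 and 0.0 <= concealment <= 1.0):
        raise ValueError("recall and concealment must lie in [0, 1]")
    residual = 1.0 - recall  # identifiers the system failed to detect
    return recall + residual * concealment


# Recall range quoted in the abstract for high-performing systems.
for recall in (0.94, 0.96, 0.98):
    rate = effective_deid_rate(recall)
    print(f"recall={recall:.2f} -> effective rate={rate:.3f}")
```

Under these assumptions, recall of 94% and 98% gives effective rates of about 99.4% and 99.8%, which is consistent with the claim of levels above 99%.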