Text de-identification for privacy protection: a study of its impact on clinical text information content.

Stéphane M Meystre¹, Óscar Ferrández², F Jeffrey Friedlin³, Brett R South², Shuying Shen⁴, Matthew H Samore⁴.

Abstract

As more and more electronic clinical information becomes easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can ease this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical note informativeness and formatting, and ended with a detailed study of the overlap between select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. Informativeness was only minimally altered by these systems, while formatting was modified by only one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in the de-identified versions of our corpus, and many of these concepts were PHI that had been erroneously identified as clinical information.
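The concept-count comparison above boils down to a relative drop between two counts. A minimal Python sketch of that arithmetic, assuming the extractor's output has already been reduced to per-concept mention counts for each corpus version; the function name `concept_loss` and the toy counts are hypothetical illustrations, not the study's code or data:

```python
from collections import Counter


def concept_loss(original: Counter, deidentified: Counter) -> tuple[float, Counter]:
    """Compare concept mention counts before and after de-identification.

    Returns the relative drop (in percent) in total concept mentions and a
    Counter of the concepts that lost mentions, with the number lost.
    """
    total_original = sum(original.values())
    if total_original == 0:
        raise ValueError("original corpus has no concept mentions")
    total_deidentified = sum(deidentified.values())
    relative_drop = 100.0 * (total_original - total_deidentified) / total_original
    # Counter subtraction keeps only positive differences, i.e. concepts
    # found fewer times (or not at all) in the de-identified text.
    lost = original - deidentified
    return relative_drop, lost


# Hypothetical toy counts, not data from the study.
original = Counter({"pneumonia": 40, "Parkinson disease": 10, "aspirin": 50})
deidentified = Counter({"pneumonia": 40, "Parkinson disease": 8, "aspirin": 50})
drop, lost = concept_loss(original, deidentified)
print(f"{drop:.1f}% fewer concept mentions; lost: {dict(lost)}")
# prints: 2.0% fewer concept mentions; lost: {'Parkinson disease': 2}
```

Listing the lost concepts, not just the percentage, is what makes it possible to inspect why they disappeared; in this toy example an eponym such as "Parkinson" could have been masked as a person name.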
To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that the impact of automated text de-identification on clinical information is small but not negligible, and that improved disambiguation of clinical acronyms and eponyms could significantly reduce this impact.
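The exact versus partial overlap figures can be illustrated with character-offset spans. A minimal Python sketch, assuming each annotation is a (start, end) offset pair with an exclusive end, that "exact" means identical offsets, and that "partial" means sharing at least one character without identical offsets; these definitions, the name `overlap_rates`, and the toy spans are assumptions for illustration, not the study's published procedure:

```python
Span = tuple[int, int]  # (start, end) character offsets, end exclusive


def overlap_rates(clinical: list[Span], phi: list[Span]) -> tuple[float, float]:
    """Percent of clinical spans that exactly match, or only partly overlap, a PHI span."""
    if not clinical:
        raise ValueError("no clinical annotations to compare")
    phi_spans = set(phi)
    exact = partial = 0
    for start, end in clinical:
        if (start, end) in phi_spans:
            exact += 1  # identical offsets
        elif any(start < p_end and p_start < end for p_start, p_end in phi):
            partial += 1  # shares at least one character, but offsets differ
    n = len(clinical)
    return 100.0 * exact / n, 100.0 * partial / n


# Hypothetical toy spans, not annotations from the i2b2 corpus.
clinical = [(0, 10), (20, 35), (40, 50), (60, 70)]
phi = [(0, 10), (30, 38), (80, 90)]
print(overlap_rates(clinical, phi))  # prints (25.0, 25.0)
```

Each clinical span is counted at most once, and an exact match takes precedence over a partial one, so the two percentages never double-count the same annotation.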

Keywords: Confidentiality, patient data privacy; De-identification, Anonymization, Electronic health records; Medical informatics; Natural Language Processing; United States department of veterans affairs

Year: 2014; PMID: 24502938; DOI: 10.1016/j.jbi.2014.01.011

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317
