Text de-identification for privacy protection: a study of its impact on clinical text information content.

Stéphane M Meystre, Óscar Ferrández, F Jeffrey Friedlin, Brett R South, Shuying Shen, Matthew H Samore.

Abstract

As more electronic clinical information becomes easier to access for secondary uses such as clinical research, approaches that enable faster and more collaborative research while protecting patient privacy and confidentiality are becoming more important. Clinical text de-identification offers such advantages but is typically a tedious manual process. Automated Natural Language Processing (NLP) methods can alleviate this process, but their impact on subsequent uses of the automatically de-identified clinical narratives has barely been investigated. In the context of a larger project to develop and investigate automated text de-identification for Veterans Health Administration (VHA) clinical notes, we studied the impact of automated text de-identification on clinical information in a stepwise manner. Our approach started with a high-level assessment of clinical note informativeness and formatting, and ended with a detailed study of the overlap of select clinical information types and Protected Health Information (PHI). To investigate the informativeness (i.e., document type information, select clinical data types, and interpretation or conclusion) of VHA clinical notes, we used five different existing text de-identification systems. The informativeness was only minimally altered by these systems, while formatting was modified by only one system. To examine the impact of de-identification on clinical information extraction, we compared counts of SNOMED-CT concepts found by an open-source information extraction application in the original (i.e., not de-identified) version of a corpus of VHA clinical notes, and in the same corpus after de-identification. Only about 1.2-3% fewer SNOMED-CT concepts were found in de-identified versions of our corpus, and many of these concepts were PHI that was erroneously identified as clinical information.
To study this impact in more detail and assess how generalizable our findings were, we examined the overlap between select clinical information annotated in the 2010 i2b2 NLP challenge corpus and automatic PHI annotations from our best-of-breed VHA clinical text de-identification system (nicknamed 'BoB'). Overall, only 0.81% of the clinical information exactly overlapped with PHI, and 1.78% partly overlapped. We conclude that the impact of automated text de-identification on clinical information is small, but not negligible, and that improved disambiguation of clinical acronyms and eponyms could significantly reduce this impact.
Copyright © 2014 Elsevier Inc. All rights reserved.
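
The distinction between exact and partial overlap of clinical annotations with PHI annotations can be illustrated with a minimal sketch. It assumes each annotation is a half-open character span `(start, end)` and treats "exact" as identical offsets and "partial" as any other intersection; the abstract does not give the authors' precise matching rules, and the function names and sample spans below are hypothetical, not taken from the study.

```python
from typing import Dict, List, Tuple

# Assumption: an annotation is a (start, end) character span, end exclusive.
Span = Tuple[int, int]


def classify_overlap(clinical: Span, phi_spans: List[Span]) -> str:
    """Return 'exact', 'partial', or 'none' for one clinical annotation."""
    result = "none"
    for start, end in phi_spans:
        if (start, end) == clinical:
            return "exact"  # identical offsets take precedence
        # Two half-open intervals intersect when each starts before the other ends.
        if start < clinical[1] and clinical[0] < end:
            result = "partial"
    return result


def overlap_rates(clinical_spans: List[Span], phi_spans: List[Span]) -> Dict[str, float]:
    """Fraction of clinical annotations in each overlap category."""
    counts = {"exact": 0, "partial": 0, "none": 0}
    for span in clinical_spans:
        counts[classify_overlap(span, phi_spans)] += 1
    total = len(clinical_spans)
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


# Made-up spans for illustration only (not data from the study).
clinical = [(0, 5), (10, 20), (30, 40), (50, 60)]
phi = [(10, 20), (35, 45)]
rates = overlap_rates(clinical, phi)
```

With these made-up spans, one of the four clinical annotations matches a PHI span exactly and one intersects a PHI span without matching it, so `rates` assigns a quarter of the annotations to each of those two categories and the remaining half to no overlap.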

Keywords:  Confidentiality, patient data privacy; De-identification; Anonymization; Electronic health records; Medical informatics; Natural Language Processing; United States Department of Veterans Affairs

Year:  2014        PMID: 24502938     DOI: 10.1016/j.jbi.2014.01.011

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  15 in total

Review 1.  Clinical Natural Language Processing in 2014: Foundational Methods Supporting Efficient Healthcare.

Authors:  A Névéol; P Zweigenbaum
Journal:  Yearb Med Inform       Date:  2015-08-13

2.  De-identifying Spanish medical texts - named entity recognition applied to radiology reports.

Authors:  Irene Pérez-Díez; Raúl Pérez-Moraga; Adolfo López-Cerdán; Jose-Maria Salinas-Serrano; María de la Iglesia-Vayá
Journal:  J Biomed Semantics       Date:  2021-03-29

Review 3.  Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress.

Authors:  S M Meystre; C Lovis; T Bürkle; G Tognola; A Budrionis; C U Lehmann
Journal:  Yearb Med Inform       Date:  2017-09-11

Review 4.  Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis.

Authors:  S Velupillai; D Mowery; B R South; M Kvist; H Dalianis
Journal:  Yearb Med Inform       Date:  2015-08-13

5.  State of the art and a mixed-method personalized approach to assess patient perceptions on medical record sharing and sensitivity.

Authors:  Hiral Soni; Adela Grando; Anita Murcko; Sabrina Diaz; Madhumita Mukundan; Nassim Idouraine; George Karway; Michael Todd; Darwyn Chern; Christy Dye; Mary Jo Whitfield
Journal:  J Biomed Inform       Date:  2019-11-11       Impact factor: 6.317

6.  Natural Language Processing for Enterprise-scale De-identification of Protected Health Information in Clinical Notes.

Authors:  Noor Abu-El-Rub; Jay Urbain; George Kowalski; Kristen Osinski; Robert Spaniol; Mei Liu; Bradley Taylor; Lemuel R Waitman
Journal:  AMIA Annu Symp Proc       Date:  2022-05-23

7.  Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing.

Authors:  Suzanna Schmeelk; Martins Samuel Dogo; Yifan Peng; Braja Gopal Patra
Journal:  Proc Annu Hawaii Int Conf Syst Sci       Date:  2022-01-04

8.  Privacy Policy and Technology in Biomedical Data Science.

Authors:  April Moreno Arellano; Wenrui Dai; Shuang Wang; Xiaoqian Jiang; Lucila Ohno-Machado
Journal:  Annu Rev Biomed Data Sci       Date:  2018-07

9.  Learning to identify Protected Health Information by integrating knowledge- and data-driven algorithms: A case study on psychiatric evaluation notes.

Authors:  Azad Dehghan; Aleksandar Kovacevic; George Karystianis; John A Keane; Goran Nenadic
Journal:  J Biomed Inform       Date:  2017-06-07       Impact factor: 6.317

10.  Sharing big biomedical data.

Authors:  Arthur W Toga; Ivo D Dinov
Journal:  J Big Data       Date:  2015-06-27
