Literature DB >> 22947391

BoB, a best-of-breed automated text de-identification system for VHA clinical documents.

Oscar Ferrández1, Brett R South, Shuying Shen, F Jeffrey Friedlin, Matthew H Samore, Stéphane M Meystre.   

Abstract

OBJECTIVE: De-identification allows faster and more collaborative clinical research while protecting patient confidentiality. Clinical narrative de-identification is a tedious process that can be alleviated by automated natural language processing methods. The goal of this research is the development of an automated text de-identification system for Veterans Health Administration (VHA) clinical documents.
MATERIALS AND METHODS: We devised a novel stepwise hybrid approach designed to improve the current strategies used for text de-identification. The proposed system is based on a previous study on the best de-identification methods for VHA documents. This best-of-breed automated clinical text de-identification system (aka BoB) tackles the problem as two separate tasks: (1) maximize patient confidentiality by redacting as much protected health information (PHI) as possible; and (2) leave de-identified documents in a usable state preserving as much clinical information as possible.
RESULTS: We evaluated BoB with a manually annotated corpus of a variety of VHA clinical notes, as well as with the 2006 i2b2 de-identification challenge corpus. We present evaluations at the instance- and token-level, with detailed results for BoB's main components. Moreover, an existing text de-identification system was also included in our evaluation. DISCUSSION: BoB's design efficiently takes advantage of the methods implemented in its pipeline, resulting in high sensitivity values (especially for sensitive PHI categories) and a limited number of false positives.
CONCLUSIONS: Our system successfully addressed VHA clinical document de-identification, and its hybrid stepwise design demonstrates robustness and efficiency, prioritizing patient confidentiality while leaving most clinical information intact.

Entities:  

Mesh:

Year:  2012        PMID: 22947391      PMCID: PMC3555325          DOI: 10.1136/amiajnl-2012-001020

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


  15 in total

1.  Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications.

Authors:  Guergana K Savova; James J Masanz; Philip V Ogren; Jiaping Zheng; Sunghwan Sohn; Karin C Kipper-Schuler; Christopher G Chute
Journal:  J Am Med Inform Assoc       Date:  2010 Sep-Oct       Impact factor: 4.497

2.  A de-identifier for medical discharge summaries.

Authors:  Ozlem Uzuner; Tawanda C Sibanda; Yuan Luo; Peter Szolovits
Journal:  Artif Intell Med       Date:  2007-11-28       Impact factor: 5.326

3.  Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial.

Authors:  Sumithra Velupillai; Hercules Dalianis; Martin Hassel; Gunnar H Nilsson
Journal:  Int J Med Inform       Date:  2009-05-23       Impact factor: 4.046

4.  A system for de-identifying medical message board text.

Authors:  Adrian Benton; Shawndra Hill; Lyle Ungar; Annie Chung; Charles Leonard; Cristin Freeman; John H Holmes
Journal:  BMC Bioinformatics       Date:  2011-06-09       Impact factor: 3.169

5.  Effects of personal identifier resynthesis on clinical text de-identification.

Authors:  Reyyan Yeniterzi; John Aberdeen; Samuel Bayer; Ben Wellner; Lynette Hirschman; Bradley Malin
Journal:  J Am Med Inform Assoc       Date:  2010 Mar-Apr       Impact factor: 4.497

6.  The MITRE Identification Scrubber Toolkit: design, training, and assessment.

Authors:  John Aberdeen; Samuel Bayer; Reyyan Yeniterzi; Ben Wellner; Cheryl Clark; David Hanauer; Bradley Malin; Lynette Hirschman
Journal:  Int J Med Inform       Date:  2010-10-14       Impact factor: 4.046

7.  A software tool for removing patient identifying information from clinical documents.

Authors:  F Jeff Friedlin; Clement J McDonald
Journal:  J Am Med Inform Assoc       Date:  2008-06-25       Impact factor: 4.497

Review 8.  Automatic de-identification of textual documents in the electronic health record: a review of recent research.

Authors:  Stephane M Meystre; F Jeffrey Friedlin; Brett R South; Shuying Shen; Matthew H Samore
Journal:  BMC Med Res Methodol       Date:  2010-08-02       Impact factor: 4.615

9.  Automated de-identification of free-text medical records.

Authors:  Ishna Neamatullah; Margaret M Douglass; Li-wei H Lehman; Andrew Reisner; Mauricio Villarroel; William J Long; Peter Szolovits; George B Moody; Roger G Mark; Gari D Clifford
Journal:  BMC Med Inform Decis Mak       Date:  2008-07-24       Impact factor: 2.796

10.  Evaluating current automatic de-identification methods with Veteran's health administration clinical documents.

Authors:  Oscar Ferrández; Brett R South; Shuying Shen; F Jeffrey Friedlin; Matthew H Samore; Stéphane M Meystre
Journal:  BMC Med Res Methodol       Date:  2012-07-27       Impact factor: 4.615

View more
  24 in total

1.  The machine giveth and the machine taketh away: a parrot attack on clinical text deidentified with hiding in plain sight.

Authors:  David S Carrell; David J Cronkite; Muqun Rachel Li; Steve Nyemba; Bradley A Malin; John S Aberdeen; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2019-12-01       Impact factor: 4.497

2.  Efficient Active Learning for Electronic Medical Record De-identification.

Authors:  Muqun Li; Martin Scaiano; Khaled El Emam; Bradley A Malin
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2019-05-06

Review 3.  Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress.

Authors:  S M Meystre; C Lovis; T Bürkle; G Tognola; A Budrionis; C U Lehmann
Journal:  Yearb Med Inform       Date:  2017-09-11

4.  Biomedical data privacy: problems, perspectives, and recent advances.

Authors:  Bradley A Malin; Khaled El Emam; Christine M O'Keefe
Journal:  J Am Med Inform Assoc       Date:  2012-12-06       Impact factor: 4.497

Review 5.  Recent Advances in Clinical Natural Language Processing in Support of Semantic Analysis.

Authors:  S Velupillai; D Mowery; B R South; M Kvist; H Dalianis
Journal:  Yearb Med Inform       Date:  2015-08-13

Review 6.  "Big data" and the electronic health record.

Authors:  M K Ross; W Wei; L Ohno-Machado
Journal:  Yearb Med Inform       Date:  2014-08-15

7.  Location bias of identifiers in clinical narratives.

Authors:  David A Hanauer; Qiaozhu Mei; Bradley Malin; Kai Zheng
Journal:  AMIA Annu Symp Proc       Date:  2013-11-16

8.  Scalable Iterative Classification for Sanitizing Large-Scale Datasets.

Authors:  Bo Li; Yevgeniy Vorobeychik; Muqun Li; Bradley Malin
Journal:  IEEE Trans Knowl Data Eng       Date:  2016-11-11       Impact factor: 6.977

9.  Building a best-in-class automated de-identification tool for electronic health records through ensemble learning.

Authors:  Karthik Murugadoss; Ajit Rajasekharan; Bradley Malin; Vineet Agarwal; Sairam Bade; Jeff R Anderson; Jason L Ross; William A Faubion; John D Halamka; Venky Soundararajan; Sankar Ardhanari
Journal:  Patterns (N Y)       Date:  2021-05-12

10.  Resilience of clinical text de-identified with "hiding in plain sight" to hostile reidentification attacks by human readers.

Authors:  David S Carrell; Bradley A Malin; David J Cronkite; John S Aberdeen; Cheryl Clark; Muqun Rachel Li; Dikshya Bastakoty; Steve Nyemba; Lynette Hirschman
Journal:  J Am Med Inform Assoc       Date:  2020-07-01       Impact factor: 4.497

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.