Literature DB >> 33937843

Evaluation of Automated Public De-Identification Tools on a Corpus of Radiology Reports.

Jackson M Steinkamp1, Taylor Pomeranz1, Jason Adleberg1, Charles E Kahn1, Tessa S Cook1.   

Abstract

PURPOSE: To evaluate publicly available de-identification tools on a large corpus of narrative-text radiology reports.
MATERIALS AND METHODS: In this retrospective study, 21 categories of protected health information (PHI) in 2503 radiology reports were annotated from a large multihospital academic health system, collected between January 1, 2012 and January 8, 2019. A subset consisting of 1023 reports served as a test set; the remainder were used as domain-specific training data. The types and frequencies of PHI present within the reports were tallied. Five public de-identification tools were evaluated: MITRE Identification Scrubber Toolkit, U.S. National Library of Medicine‒Scrubber, Massachusetts Institute of Technology de-identification software, Emory Health Information DE-identification (HIDE) software, and Neuro named-entity recognition (NeuroNER). The tools were compared using metrics including recall, precision, and F1 score (the harmonic mean of recall and precision) for each category of PHI.
RESULTS: The annotators identified 3528 spans of PHI text within the 2503 reports. Cohen κ for interrater agreement was 0.938. Dates accounted for the majority of PHI found in the dataset of radiology reports (n = 2755 [78%]). The two best-performing tools both used machine learning methods-NeuroNER (precision, 94.5%; recall, 92.6%; microaveraged F1 score [F1], 93.6%) and Emory HIDE (precision, 96.6%; recall, 88.2%; F1, 92.2%)-but none exceeded 50% F1 on the important patient names category.
CONCLUSION: PHI appeared infrequently within the corpus of reports studied, which created difficulties for training machine learning systems. Out-of-the-box de-identification tools achieved limited performance on the corpus of radiology reports, suggesting the need for further advancements in public datasets and trained models.Supplemental material is available for this article.See also the commentary by Tenenholtz and Wood in this issue.© RSNA, 2020. 2020 by the Radiological Society of North America, Inc.

Entities:  

Year:  2020        PMID: 33937843      PMCID: PMC8082401          DOI: 10.1148/ryai.2020190137

Source DB:  PubMed          Journal:  Radiol Artif Intell        ISSN: 2638-6100


  14 in total

1.  Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

Authors:  Hee-Jin Lee; Yaoyun Zhang; Kirk Roberts; Hua Xu
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

2.  Detecting Protected Health Information in Heterogeneous Clinical Notes.

Authors:  Aron Henriksson; Maria Kvist; Hercules Dalianis
Journal:  Stud Health Technol Inform       Date:  2017

3.  The Cancer Imaging Archive (TCIA): maintaining and operating a public information repository.

Authors:  Kenneth Clark; Bruce Vendt; Kirk Smith; John Freymann; Justin Kirby; Paul Koppel; Stephen Moore; Stanley Phillips; David Maffitt; Michael Pringle; Lawrence Tarbox; Fred Prior
Journal:  J Digit Imaging       Date:  2013-12       Impact factor: 4.056

4.  Preparing a collection of radiology examinations for distribution and retrieval.

Authors:  Dina Demner-Fushman; Marc D Kohli; Marc B Rosenman; Sonya E Shooshan; Laritza Rodriguez; Sameer Antani; George R Thoma; Clement J McDonald
Journal:  J Am Med Inform Assoc       Date:  2015-07-01       Impact factor: 4.497

Review 5.  Privacy technology to support data sharing for comparative effectiveness research: a systematic review.

Authors:  Xiaoqian Jiang; Anand D Sarwate; Lucila Ohno-Machado
Journal:  Med Care       Date:  2013-08       Impact factor: 2.983

6.  De-identification of patient notes with recurrent neural networks.

Authors:  Franck Dernoncourt; Ji Young Lee; Ozlem Uzuner; Peter Szolovits
Journal:  J Am Med Inform Assoc       Date:  2017-05-01       Impact factor: 4.497

7.  Automated de-identification of free-text medical records.

Authors:  Ishna Neamatullah; Margaret M Douglass; Li-wei H Lehman; Andrew Reisner; Mauricio Villarroel; William J Long; Peter Szolovits; George B Moody; Roger G Mark; Gari D Clifford
Journal:  BMC Med Inform Decis Mak       Date:  2008-07-24       Impact factor: 2.796

Review 8.  Use and Understanding of Anonymization and De-Identification in the Biomedical Literature: Scoping Review.

Authors:  Raphaël Chevrier; Vasiliki Foufi; Christophe Gaudet-Blavignac; Arnaud Robert; Christian Lovis
Journal:  J Med Internet Res       Date:  2019-05-31       Impact factor: 5.428

9.  MIMIC-III, a freely accessible critical care database.

Authors:  Alistair E W Johnson; Tom J Pollard; Lu Shen; Li-Wei H Lehman; Mengling Feng; Mohammad Ghassemi; Benjamin Moody; Peter Szolovits; Leo Anthony Celi; Roger G Mark
Journal:  Sci Data       Date:  2016-05-24       Impact factor: 6.444

10.  Image De-Identification Methods for Clinical Research in the XDS Environment.

Authors:  K Y E Aryanto; G van Kernebeek; B Berendsen; M Oudkerk; P M A van Ooijen
Journal:  J Med Syst       Date:  2016-01-26       Impact factor: 4.460

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.