Building a best-in-class automated de-identification tool for electronic health records through ensemble learning.

Karthik Murugadoss¹, Ajit Rajasekharan¹, Bradley Malin², Vineet Agarwal¹, Sairam Bade³, Jeff R Anderson⁴,⁵, Jason L Ross¹, William A Faubion⁴, John D Halamka⁴,⁵, Venky Soundararajan¹, Sankar Ardhanari¹.

Abstract

The presence of personally identifiable information (PII) in natural language portions of electronic health records (EHRs) constrains their broad reuse. Despite continuous improvements in automated detection of PII, residual identifiers require manual validation and correction. Here, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep-learning models and rule-based methods, supported by heuristics for detecting PII in EHR data. Detected identifiers are then transformed into plausible, though fictional, surrogates to further obfuscate any leaked identifier. Our approach outperforms existing tools, with a recall of 0.992 and precision of 0.979 on the i2b2 2014 dataset and a recall of 0.994 and precision of 0.967 on a dataset of 10,000 notes from the Mayo Clinic. The de-identification system presented here enables the generation of de-identified patient data at the scale required for modern machine-learning applications to help accelerate medical discoveries.
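The abstract describes a two-stage pattern: several detectors (deep-learning and rule-based) flag candidate identifiers, and the flagged spans are then replaced with fictional surrogates. The following is a minimal toy sketch of that general pattern under stated assumptions. It is not the authors' implementation. The regex rules, the `model_detector` stand-in (here just another regex, where a real system would use a trained attention-based model), the union-and-merge policy, and the fixed `SURROGATES` table are all hypothetical choices made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    label: str


# Hypothetical rule-based detector: regexes for two identifier types.
RULES: Dict[str, "re.Pattern[str]"] = {
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"),
}


def rule_detector(text: str) -> List[Span]:
    return [Span(m.start(), m.end(), label)
            for label, pattern in RULES.items()
            for m in pattern.finditer(text)]


def model_detector(text: str) -> List[Span]:
    # Hypothetical stand-in for a trained NER model: tags the
    # capitalized word after an honorific as a NAME.
    return [Span(m.start(1), m.end(1), "NAME")
            for m in re.finditer(r"\b(?:Dr|Mr|Ms)\.\s+([A-Z][a-z]+)", text)]


def ensemble(text: str, detectors: List[Callable[[str], List[Span]]]) -> List[Span]:
    # Take the union of all detector outputs (favoring recall), then merge
    # overlapping spans; a merged span keeps the label of the earliest one.
    spans = sorted((s for d in detectors for s in d(text)),
                   key=lambda s: (s.start, -s.end))
    merged: List[Span] = []
    for s in spans:
        if merged and s.start < merged[-1].end:
            last = merged[-1]
            merged[-1] = Span(last.start, max(last.end, s.end), last.label)
        else:
            merged.append(s)
    return merged


# Hypothetical fixed surrogates; a real system would generate varied,
# plausible values rather than reuse one per label.
SURROGATES = {"NAME": "Jordan", "PHONE": "555-010-0199", "DATE": "01/01/2000"}


def replace_with_surrogates(text: str, spans: List[Span]) -> str:
    out, cursor = [], 0
    for s in spans:  # spans are sorted and non-overlapping
        out.append(text[cursor:s.start])
        out.append(SURROGATES.get(s.label, "[REDACTED]"))
        cursor = s.end
    out.append(text[cursor:])
    return "".join(out)


note = "Seen by Dr. Smith on 03/14/2019; call 507-555-1234."
spans = ensemble(note, [rule_detector, model_detector])
print(replace_with_surrogates(note, spans))
# prints: Seen by Dr. Jordan on 01/01/2000; call 555-010-0199.
```

Taking the union of detector outputs trades some precision for recall, which is usually the safer direction for de-identification; substituting surrogates rather than blanking spans means that any identifier a detector misses is harder to distinguish from the fictional values around it.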

Keywords: anonymization; de-identification; ensemble; mayo; nference; obfuscation

Year: 2021 PMID: 34179842 PMCID: PMC8212138 DOI: 10.1016/j.patter.2021.100255

Source DB: PubMed Journal: Patterns (N Y) ISSN: 2666-3899
