De-identification of health records using Anonym: effectiveness and robustness across datasets.

Guido Zuccon, Daniel Kotzur, Anthony Nguyen, Anton Bergheim.

Abstract

OBJECTIVE: Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records based on conditional random field classifiers informed by linguistic and lexical features, as well as features extracted by pattern matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary usage of clinical data. De-identification tools that adapt to different sources of clinical data are attractive as they would require minimal intervention to guarantee high effectiveness.
METHODS AND MATERIALS: The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Informatics for Integrating Biology and the Bedside (i2b2) dataset, used for evaluation in a de-identification challenge. The datasets used here vary in type of health records, source of data, and quality, with one of the datasets containing optical character recognition errors.
RESULTS: Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training.
CONCLUSION: Findings show that Anonym is comparable to the best approach from the 2006 i2b2 shared task. It is easy to retrain Anonym with new datasets; if retrained, the system is robust to variations in training size, data type and quality in the presence of sufficient training data.
Crown Copyright © 2014. Published by Elsevier B.V. All rights reserved.
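
The abstract mentions two technical ingredients that a small example may make more concrete: pattern-matching features supplied to a sequence classifier, and the recall and precision figures used to report results. The Python sketch below is illustrative only and is not taken from Anonym. The regular expressions in `PATTERNS`, the feature names, and the helper functions `token_features` and `precision_recall` are all assumptions made for this example, and no CRF is trained here. The feature dictionaries are simply the kind of per-token input that a CRF toolkit typically accepts.

```python
import re

# Hypothetical patterns chosen for illustration; these are NOT Anonym's rules.
PATTERNS = {
    "DATE": re.compile(r"\d{1,2}/\d{1,2}/\d{2,4}"),
    "PHONE": re.compile(r"\d{3}-\d{3}-\d{4}"),
    "LONG_NUMBER": re.compile(r"\d{6,}"),
}


def token_features(tokens, i):
    """Return a feature dict for tokens[i], combining lexical cues,
    neighbouring tokens, and pattern-matching flags."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "has_digit": any(c.isdigit() for c in tok),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # One boolean feature per pattern: does the whole token match it?
    for name, pattern in PATTERNS.items():
        feats["matches_" + name] = pattern.fullmatch(tok) is not None
    return feats


def precision_recall(gold, predicted):
    """Token-level precision and recall for binary labels,
    where True marks a token that is part of a personal identifier."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g and p)        # identifiers correctly flagged
    fp = sum(1 for g, p in pairs if not g and p)    # non-identifiers wrongly flagged
    fn = sum(1 for g, p in pairs if g and not p)    # identifiers missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

In this framing, recall is the share of true identifiers that the system flags, and precision is the share of flagged tokens that really are identifiers. For de-identification, a missed identifier (lower recall) leaks personal information, whereas a false alarm (lower precision) only removes text that could have been kept.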

Keywords:  Conditional random fields; De-identification; Health records; Pattern matching

Year:  2014        PMID: 24791676     DOI: 10.1016/j.artmed.2014.03.006

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  6 in total

1.  CRFs based de-identification of medical records.

Authors:  Bin He; Yi Guan; Jianyi Cheng; Keting Cen; Wenlan Hua
Journal:  J Biomed Inform       Date:  2015-08-24       Impact factor: 6.317

2.  Ethics and Epistemology in Big Data Research.

Authors:  Wendy Lipworth; Paul H Mason; Ian Kerridge; John P A Ioannidis
Journal:  J Bioeth Inq       Date:  2017-03-20       Impact factor: 1.352

3.  Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. (Review)

Authors:  S M Meystre; C Lovis; T Bürkle; G Tognola; A Budrionis; C U Lehmann
Journal:  Yearb Med Inform       Date:  2017-09-11

4.  Scalable Iterative Classification for Sanitizing Large-Scale Datasets.

Authors:  Bo Li; Yevgeniy Vorobeychik; Muqun Li; Bradley Malin
Journal:  IEEE Trans Knowl Data Eng       Date:  2016-11-11       Impact factor: 6.977

5.  A study of deep learning methods for de-identification of clinical notes in cross-institute settings.

Authors:  Xi Yang; Tianchen Lyu; Qian Li; Chih-Yin Lee; Jiang Bian; William R Hogan; Yonghui Wu
Journal:  BMC Med Inform Decis Mak       Date:  2019-12-05       Impact factor: 2.796

6.  Generating high-quality data abstractions from scanned clinical records: text-mining-assisted extraction of endometrial carcinoma pathology features as proof of principle.

Authors:  Anthony Nguyen; John O'Dwyer; Thanh Vu; Penelope M Webb; Sharon E Johnatty; Amanda B Spurdle
Journal:  BMJ Open       Date:  2020-06-11       Impact factor: 2.692
