De-identification of health records using Anonym: effectiveness and robustness across datasets.



Guido Zuccon¹, Daniel Kotzur², Anthony Nguyen², Anton Bergheim³.

Abstract

OBJECTIVE: Evaluate the effectiveness and robustness of Anonym, a tool for de-identifying free-text health records. Anonym is based on conditional random field (CRF) classifiers informed by linguistic and lexical features, as well as by features extracted with pattern-matching techniques. De-identification of personal health information in electronic health records is essential for the sharing and secondary use of clinical data. De-identification tools that adapt to different sources of clinical data are attractive because they would require minimal intervention to achieve high effectiveness.

METHODS AND MATERIALS: The effectiveness and robustness of Anonym are evaluated across multiple datasets, including the widely adopted Integrating Biology and the Bedside (i2b2) dataset, which was used for evaluation in a de-identification challenge. The datasets vary in the type of health record, the data source, and data quality; one of them contains optical character recognition (OCR) errors.
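To make "linguistic and lexical features, as well as features extracted by pattern matching" concrete, the sketch below builds one feature dictionary per token, which is the usual input format for a linear-chain CRF. It is a minimal illustration under stated assumptions, not Anonym's feature set: the regular expressions in `PATTERNS`, the name lexicon, and the helper names `word_shape`, `token_features`, and `sentence_features` are all hypothetical.

```python
import re

# Hypothetical pattern-matching features; Anonym's real patterns are not
# described in the abstract.
PATTERNS = {
    "date": re.compile(r"^\d{1,2}/\d{1,2}/\d{2,4}$"),
    "phone": re.compile(r"^\d{3,4}-\d{3,4}$"),
    "id_number": re.compile(r"^\d{6,10}$"),
}


def word_shape(token: str) -> str:
    """Collapse runs of character classes: X=upper, x=lower, d=digit."""
    shape = []
    for ch in token:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch  # punctuation is kept as-is
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(tokens, i, name_lexicon):
    """Lexical, orthographic, lexicon and regex features for tokens[i]."""
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "is_title": tok.istitle(),
        "suffix3": tok[-3:].lower(),
        # Hypothetical lexicon lookup (e.g. a list of known surnames).
        "in_name_lexicon": tok.lower() in name_lexicon,
    }
    for name, rx in PATTERNS.items():
        feats["pattern_" + name] = bool(rx.match(tok))
    # Neighbouring-token context, as commonly used in CRF taggers.
    feats["prev_lower"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next_lower"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


def sentence_features(tokens, name_lexicon):
    """One feature dict per token: the sequence a CRF would label."""
    return [token_features(tokens, i, name_lexicon) for i in range(len(tokens))]


feats = sentence_features(["Seen", "by", "Dr", "Smith", "on", "12/03/2006", "."], {"smith"})
print(feats[3]["shape"], feats[5]["pattern_date"])
```

A CRF toolkit that accepts lists of per-token feature dictionaries (for example, the third-party sklearn-crfsuite package) could then be trained on these sequences together with per-token PHI labels; the classifier itself is omitted here to keep the sketch dependency-free.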
RESULTS: Anonym identifies and removes up to 96.6% of personal health identifiers (recall) with a precision of up to 98.2% on the i2b2 dataset, outperforming the best system proposed in the i2b2 challenge. The effectiveness of Anonym across datasets is found to depend on the amount of information available for training.
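Recall here is the fraction of gold-standard personal health identifiers that the system finds, and precision is the fraction of the system's predictions that are correct. The sketch below computes both under one assumption that the abstract does not confirm: a prediction counts only if its start offset, end offset, and PHI category all exactly match a gold annotation. The spans in the example are invented for illustration.

```python
from typing import Set, Tuple

# (start offset, end offset, PHI category); exact-match scoring is an assumption.
Span = Tuple[int, int, str]


def precision_recall(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float]:
    """Span-level precision and recall with exact matching."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical example: three gold identifiers, two found correctly, no false alarms.
gold = {(0, 5, "NAME"), (10, 20, "DATE"), (25, 31, "ID")}
predicted = {(0, 5, "NAME"), (10, 20, "DATE")}
p, r = precision_recall(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f}")
```

In de-identification, recall is usually the more critical of the two, since every missed identifier is a potential privacy leak, whereas a false positive only removes some clinically useful text.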
CONCLUSION: Findings show that Anonym is comparable to the best approach from the 2006 i2b2 shared task. Anonym is easy to retrain on new datasets; once retrained, the system is robust to variations in training size, data type and data quality, given sufficient training data.

Keywords: Conditional random fields; De-identification; Health records; Pattern matching


Year: 2014 PMID: 24791676 DOI: 10.1016/j.artmed.2014.03.006

Source DB: PubMed Journal: Artif Intell Med ISSN: 0933-3657 Impact factor: 5.326

Cited by (6 in total)

1. CRFs based de-identification of medical records.

2. Ethics and Epistemology in Big Data Research.

3. Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress. (Review)

4. Scalable Iterative Classification for Sanitizing Large-Scale Datasets.

5. A study of deep learning methods for de-identification of clinical notes in cross-institute settings.

6. Generating high-quality data abstractions from scanned clinical records: text-mining-assisted extraction of endometrial carcinoma pathology features as proof of principle.
Authors: Anthony Nguyen; John O'Dwyer; Thanh Vu; Penelope M Webb; Sharon E Johnatty; Amanda B Spurdle
Journal: BMJ Open Date: 2020-06-11 Impact factor: 2.692