Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

Abstract

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. This corpus was de-identified under a broad interpretation of the HIPAA guidelines using double-annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the 2014 i2b2/UTHealth shared task. All annotated private health information was automatically replaced with realistic surrogates, which were then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. This corpus was first used for the 2014 i2b2/UTHealth shared task, during which the systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 using entity-based micro-averaged evaluations.
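As a rough illustration of what an entity-based micro-averaged F-measure involves, the sketch below pools true positives, false positives, and false negatives over all documents before computing precision, recall, and F1. It is not the shared task's official evaluation script: the function name `micro_f1`, the `(start, end, category)` entity representation, and the exact-match criterion are all assumptions made for this example, and the offsets and categories are made-up toy data.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, PHI category).
Entity = Tuple[int, int, str]


def micro_f1(
    gold: Dict[str, Set[Entity]],
    pred: Dict[str, Set[Entity]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) pooled over all documents.

    An entity counts as correct only if its span and category match
    a gold entity exactly (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # predicted and in the gold standard
        fp += len(p - g)   # predicted but not in the gold standard
        fn += len(g - p)   # in the gold standard but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    return precision, recall, f1


# Toy data: one prediction in doc1 has the wrong end offset.
gold = {
    "doc1": {(0, 4, "NAME"), (10, 20, "DATE")},
    "doc2": {(5, 9, "AGE")},
}
pred = {
    "doc1": {(0, 4, "NAME"), (10, 18, "DATE")},
    "doc2": {(5, 9, "AGE")},
}
precision, recall, f1 = micro_f1(gold, pred)
```

Micro-averaging weights every entity equally, so documents or categories with many entities dominate the score; a macro average would instead compute the measure per document or per category and then average those values.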

Keywords: Annotation; De-identification; HIPAA; Natural language processing

Year: 2015 PMID: 26319540 PMCID: PMC4978170 DOI: 10.1016/j.jbi.2015.07.020

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

13 in total

1. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals.

Authors: A L Goldberger; L A Amaral; L Glass; J M Hausdorff; P C Ivanov; R G Mark; J E Mietus; G B Moody; C K Peng; H E Stanley
Journal: Circulation Date: 2000-06-13 Impact factor: 29.690

2. Identifying risk factors for heart disease over time: Overview of 2014 i2b2/UTHealth shared task Track 2. (Review)

Authors: Amber Stubbs; Christopher Kotfila; Hua Xu; Özlem Uzuner
Journal: J Biomed Inform Date: 2015-07-22 Impact factor: 6.317

3. Recognizing obesity and comorbidities in sparse data.

Authors: Ozlem Uzuner
Journal: J Am Med Inform Assoc Date: 2009-04-23 Impact factor: 4.497

4. Portability of an algorithm to identify rheumatoid arthritis in electronic health records.

Authors: Robert J Carroll; Will K Thompson; Anne E Eyler; Arthur M Mandelin; Tianxi Cai; Raquel M Zink; Jennifer A Pacheco; Chad S Boomershine; Thomas A Lasko; Hua Xu; Elizabeth W Karlson; Raul G Perez; Vivian S Gainer; Shawn N Murphy; Eric M Ruderman; Richard M Pope; Robert M Plenge; Abel Ngo Kho; Katherine P Liao; Joshua C Denny
Journal: J Am Med Inform Assoc Date: 2012-02-28 Impact factor: 4.497

5. Creation of a new longitudinal corpus of clinical narratives.

Authors: Vishesh Kumar; Amber Stubbs; Stanley Shaw; Özlem Uzuner
Journal: J Biomed Inform Date: 2015-10-01 Impact factor: 6.317

6. Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1. (Review)

Authors: Amber Stubbs; Christopher Kotfila; Özlem Uzuner
Journal: J Biomed Inform Date: 2015-07-28 Impact factor: 6.317

7. What can natural language processing do for clinical decision support? (Review)

Authors: Dina Demner-Fushman; Wendy W Chapman; Clement J McDonald
Journal: J Biomed Inform Date: 2009-08-13 Impact factor: 6.317

8. Clinical decision support with automated text processing for cervical cancer screening.

Authors: Kavishwar B Wagholikar; Kathy L MacLaughlin; Michael R Henry; Robert A Greenes; Ronald A Hankey; Hongfang Liu; Rajeev Chaudhry
Journal: J Am Med Inform Assoc Date: 2012-04-29 Impact factor: 4.497

9. Automated de-identification of free-text medical records.

Authors: Ishna Neamatullah; Margaret M Douglass; Li-wei H Lehman; Andrew Reisner; Mauricio Villarroel; William J Long; Peter Szolovits; George B Moody; Roger G Mark; Gari D Clifford
Journal: BMC Med Inform Decis Mak Date: 2008-07-24 Impact factor: 2.796

10. Evaluating the effects of machine pre-annotation and an interactive annotation interface on manual de-identification of clinical text.

Authors: Brett R South; Danielle Mowery; Ying Suo; Jianwei Leng; Óscar Ferrández; Stephane M Meystre; Wendy W Chapman
Journal: J Biomed Inform Date: 2014-05-20 Impact factor: 6.317

35 in total

1. Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation.

2. De-identification of psychiatric intake records: Overview of 2016 CEGS N-GRID shared tasks Track 1. (Review)

3. Identification of Gout Flares in Chief Complaint Text Using Natural Language Processing.

4. Ensemble method-based extraction of medication and related information from clinical texts.

5. Ensemble-based Methods to Improve De-identification of Electronic Health Record Narratives.

6. Efficient Active Learning for Electronic Medical Record De-identification.

7. Comparative Study of Various Approaches for Ensemble-based De-identification of Electronic Health Record Narratives.

8. Practical applications for natural language processing in clinical research: The 2014 i2b2/UTHealth shared tasks.

9. A cascaded approach for Chinese clinical text de-identification with less annotation effort.

10. Building a best-in-class automated de-identification tool for electronic health records through ensemble learning.