Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus.

Amber Stubbs, Özlem Uzuner.

Abstract

The 2014 i2b2/UTHealth natural language processing shared task featured a track focused on the de-identification of longitudinal medical records. For this track, we de-identified a set of 1304 longitudinal medical records describing 296 patients. The corpus was de-identified under a broad interpretation of the HIPAA guidelines using double annotation followed by arbitration, rounds of sanity checking, and proofreading. The average token-based F1 measure for the annotators compared to the gold standard was 0.927. The resulting annotations were used both to de-identify the data and to set the gold standard for the de-identification track of the shared task. All annotated private health information was replaced automatically with realistic surrogates, which were then read over and corrected manually. The resulting corpus is the first of its kind made available for de-identification research. It was first used in the 2014 i2b2/UTHealth shared task, during which the participating systems achieved a mean F-measure of 0.872 and a maximum F-measure of 0.964 under entity-based micro-averaged evaluation.
Copyright © 2015 Elsevier Inc. All rights reserved.
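
For readers unfamiliar with the metric named in the abstract, the following is a minimal sketch of an entity-based micro-averaged F-measure. It is an illustration only, not the shared task's official evaluation script. The `(start, end, category)` entity representation, the exact-match criterion, the function name `micro_f1`, and the toy data are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

# Hypothetical entity representation: (start offset, end offset, PHI category).
Entity = Tuple[int, int, str]


def micro_f1(
    gold: Dict[str, Set[Entity]],
    pred: Dict[str, Set[Entity]],
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1), micro-averaged over all documents.

    Micro-averaging pools true positives, false positives, and false
    negatives across documents before computing the ratios. An entity
    counts as correct only if its span and category both match exactly
    (an assumption of this sketch).
    """
    tp = fp = fn = 0
    for doc_id in gold.keys() | pred.keys():
        g = gold.get(doc_id, set())
        p = pred.get(doc_id, set())
        tp += len(g & p)   # entities found in both gold and prediction
        fp += len(p - g)   # predicted entities absent from gold
        fn += len(g - p)   # gold entities the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy data, invented for illustration.
gold = {
    "doc1": {(0, 10, "NAME"), (25, 35, "DATE")},
    "doc2": {(5, 12, "HOSPITAL")},
}
pred = {
    "doc1": {(0, 10, "NAME"), (25, 34, "DATE")},   # DATE span is off by one
    "doc2": {(5, 12, "HOSPITAL"), (40, 44, "AGE")},  # AGE is spurious
}
precision, recall, f1 = micro_f1(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

A token-based variant, such as the one used above to compare annotators with the gold standard, would apply the same counting to individual tokens rather than whole entity spans, so a partially overlapping span would earn partial credit.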

Keywords:  Annotation; De-identification; HIPAA; Natural language processing

Year:  2015        PMID: 26319540      PMCID: PMC4978170          DOI: 10.1016/j.jbi.2015.07.020

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


