The UAB Informatics Institute and 2016 CEGS N-GRID de-identification shared task challenge.

Duy Duc An Bui, Mathew Wyatt, James J Cimino.

Abstract

Clinical narratives (the text notes found in patients' medical records) are important information sources for secondary use in research. However, to protect patient privacy, they must be de-identified before use. Manual de-identification is considered the gold standard approach but is tedious, expensive, slow, and impractical for large-scale clinical data. Automated or semi-automated de-identification using computer algorithms is a promising alternative. The Informatics Institute of the University of Alabama at Birmingham (UAB) is applying de-identification to clinical data drawn from the UAB hospital's electronic medical records system before releasing them for research. To gain experience developing our own automatic de-identification tool, we participated in the regular de-identification track of the shared task challenge organized by the Centers of Excellence in Genomic Science (CEGS) Neuropsychiatric Genome-Scale and RDoC Individualized Domains (N-GRID). We focused on the popular and successful methods from previous challenges: rule-based, dictionary-matching, and machine-learning approaches. We also explored new techniques such as disambiguation rules, term ambiguity measurement, and the use of a multi-pass sieve framework at a micro level. For the challenge's primary measure (strict entity), our submissions achieved competitive results (f-measures: 87.3%, 87.1%, and 86.7%). For our preferred measure (binary token HIPAA), our submissions achieved superior results (f-measures: 93.7%, 93.6%, and 93%). With these encouraging results, we have gained the confidence to improve the tool and use it for the real de-identification task at the UAB Informatics Institute.
Copyright © 2017 Elsevier Inc. All rights reserved.
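
The gap between the two reported measures is easier to see with a small worked example. The sketch below is only an illustration, not the challenge's official scorer: it assumes that "strict entity" counts a prediction as correct only when its character offsets and type match the gold annotation exactly, and that "binary token" counts each whitespace-delimited token as PHI or not PHI regardless of type. The restriction to HIPAA categories is not modeled, and the sentence, spans, and function names are hypothetical.

```python
import re
from typing import List, Set, Tuple

# A PHI annotation: (start offset, end offset, PHI type). Hypothetical format.
Span = Tuple[int, int, str]


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def strict_entity_f(gold: Set[Span], pred: Set[Span]) -> float:
    """A prediction counts only if offsets and type match gold exactly."""
    tp = len(gold & pred)
    return f_measure(tp, len(pred - gold), len(gold - pred))


def token_offsets(text: str) -> List[Tuple[int, int]]:
    """Whitespace tokenization (an assumption; real scorers may differ)."""
    return [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def phi_tokens(text: str, spans: Set[Span]) -> Set[Tuple[int, int]]:
    """Tokens that overlap any PHI span, ignoring the PHI type."""
    return {
        (ts, te)
        for ts, te in token_offsets(text)
        if any(ts < end and start < te for start, end, _ in spans)
    }


def binary_token_f(text: str, gold: Set[Span], pred: Set[Span]) -> float:
    """Each token is scored as PHI vs. not PHI."""
    g, p = phi_tokens(text, gold), phi_tokens(text, pred)
    tp = len(g & p)
    return f_measure(tp, len(p - g), len(g - p))


text = "Pt John Smith seen on 03/14/2016."
gold = {(3, 13, "PATIENT"), (22, 32, "DATE")}
pred = {(8, 13, "PATIENT"), (22, 32, "DATE")}  # system missed "John"

strict_score = strict_entity_f(gold, pred)
token_score = binary_token_f(text, gold, pred)
print(round(strict_score, 3))  # 0.5
print(round(token_score, 3))   # 0.8
```

Under these assumptions, a partially detected name earns no credit in the strict measure but partial credit at the token level, which is one reason token-level scores tend to be higher than strict entity scores for the same system output.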

Keywords:  Automatic de-identification; Clinical natural language processing; Machine learning; Shared task

Year: 2017; PMID: 28478268; PMCID: PMC5670015; DOI: 10.1016/j.jbi.2017.05.001

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317


