Evaluating the state-of-the-art in automatic de-identification.

Ozlem Uzuner, Yuan Luo, Peter Szolovits.

Abstract

To facilitate and survey studies in automatic de-identification, as a part of the i2b2 (Informatics for Integrating Biology to the Bedside) project, the authors organized a Natural Language Processing (NLP) challenge on automatically removing private health information (PHI) from medical discharge records. This manuscript provides an overview of this de-identification challenge, describes the data and the annotation process, explains the evaluation metrics, discusses the nature of the systems that addressed the challenge, analyzes the results of received system runs, and identifies directions for future research. The de-identification challenge data consisted of discharge summaries drawn from the Partners Healthcare system. The authors prepared this data for the challenge by replacing authentic PHI with synthesized surrogates. To focus the challenge on non-dictionary-based de-identification methods, the data was enriched with out-of-vocabulary PHI surrogates, i.e., made-up names. The data also included some PHI surrogates that were ambiguous with medical non-PHI terms. A total of seven teams participated in the challenge. Each team submitted up to three system runs, for a total of sixteen submissions. The authors used precision, recall, and F-measure to evaluate the submitted system runs based on their token-level and instance-level performance on the ground truth. The systems with the best performance scored above 98% in F-measure for all categories of PHI. Most out-of-vocabulary PHI could be identified accurately. However, identifying ambiguous PHI proved challenging. The performance of systems on the test data set is encouraging. Future evaluations of these systems will involve larger data sets from more heterogeneous sources.
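As an illustration of the token-level evaluation mentioned in the abstract, the following is a minimal sketch of how precision, recall, and F-measure might be computed over per-token PHI labels. It is not the challenge's actual scoring script: the function name `token_prf`, the use of "O" for non-PHI tokens, the label names, and the balanced F-measure (beta = 1) are all assumptions made for this example.

```python
def token_prf(gold, predicted, beta=1.0):
    """Token-level precision, recall, and F-measure for PHI labels.

    gold, predicted: equal-length lists of per-token labels.
    "O" marks a non-PHI token (an assumed convention for this sketch).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    # A true positive is a PHI token whose predicted category matches the gold one.
    tp = sum(1 for g, p in zip(gold, predicted) if g != "O" and g == p)
    predicted_phi = sum(1 for p in predicted if p != "O")
    gold_phi = sum(1 for g in gold if g != "O")

    precision = tp / predicted_phi if predicted_phi else 0.0
    recall = tp / gold_phi if gold_phi else 0.0
    denom = beta ** 2 * precision + recall
    f_measure = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, f_measure


# Hypothetical six-token example: one missed PATIENT token, one spurious HOSPITAL token.
gold = ["O", "PATIENT", "PATIENT", "O", "DATE", "O"]
predicted = ["O", "PATIENT", "O", "O", "DATE", "HOSPITAL"]
precision, recall, f_measure = token_prf(gold, predicted)
```

In this toy example two of the three predicted PHI tokens are correct and two of the three gold PHI tokens are found, so precision, recall, and F-measure all come to 2/3. An instance-level score would instead compare whole PHI spans rather than individual tokens.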

Year:  2007        PMID: 17600094      PMCID: PMC1975792          DOI: 10.1197/jamia.M2444

Source DB:  PubMed          Journal:  J Am Med Inform Assoc        ISSN: 1067-5027            Impact factor:   4.497


