A de-identifier for medical discharge summaries.

Ozlem Uzuner, Tawanda C Sibanda, Yuan Luo, Peter Szolovits.

Abstract

OBJECTIVE: Clinical records contain significant medical information that can be useful to researchers in various disciplines. However, these records also contain personal health information (PHI) whose presence limits the use of the records outside of hospitals. The goal of de-identification is to remove all PHI from clinical records. This is a challenging task because many records contain foreign and misspelled PHI; they also contain PHI that are ambiguous with non-PHI. These complications are compounded by the linguistic characteristics of clinical records. For example, medical discharge summaries, which are studied in this paper, are characterized by fragmented, incomplete utterances and domain-specific language; they cannot be fully processed by tools designed for lay language.

METHODS AND RESULTS: In this paper, we show that we can de-identify medical discharge summaries using a de-identifier, Stat De-id, based on support vector machines and local context (F-measure=97% on PHI). Our representation of local context aids de-identification even when PHI include out-of-vocabulary words and even when PHI are ambiguous with non-PHI within the same corpus. Comparison of Stat De-id with a rule-based approach shows that local context contributes more to de-identification than dictionaries combined with hand-tailored heuristics (F-measure=85%). Comparison with two well-known named entity recognition (NER) systems, SNoW (F-measure=94%) and IdentiFinder (F-measure=36%), on five representative corpora shows that when the language of documents is fragmented, a system with a relatively thorough representation of local context can be a more effective de-identifier than systems that combine (relatively simpler) local context with global context. Comparison with a Conditional Random Field De-identifier (CRFD), which utilizes global context in addition to the local context of Stat De-id, confirms this finding (F-measure=88%) and establishes that strengthening the representation of local context may be more beneficial for de-identification than complementing local with global context.
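As a rough illustration of the two ideas the abstract relies on, the sketch below builds features for one token from its immediate neighbours only, and computes a balanced F-measure from error counts. It is not Stat De-id's actual feature set, which the abstract does not specify: the function names, the window size of two tokens, the `<PAD>` marker, the example sentence, and the choice of the balanced (F1) form of the F-measure are all assumptions made for this example. In a full system, feature dictionaries like these would be vectorized and passed to a support vector machine, a step that is omitted here.

```python
from typing import Dict, List


def local_context_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Describe tokens[i] using only the token itself and its near neighbours.

    Hypothetical feature set for illustration; not taken from the paper.
    """
    target = tokens[i]
    feats: Dict[str, object] = {
        "lower": target.lower(),
        "is_capitalized": target[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in target),
        "length": len(target),
    }
    # Neighbouring words within the window; positions past either end of
    # the token list are filled with a padding marker.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"word[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Invented, fragmented sentence in the style of a discharge summary.
tokens = "Pt seen by Dr. Smith on 3/14 for chest pain".split()
feats = local_context_features(tokens, tokens.index("Smith"))
```

Here the preceding token `dr.` is a strong cue that `Smith` is a name even if the word never appeared in training data, which is the kind of evidence a local-context representation can exploit for out-of-vocabulary PHI.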

Year:  2007        PMID: 18053696      PMCID: PMC2271040          DOI: 10.1016/j.artmed.2007.10.001

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


