
A Study of Deep Learning Methods for De-identification of Clinical Notes at Cross Institute Settings.

Xi Yang, Tianchen Lyu, Chih-Yin Lee, Jiang Bian, William R Hogan, Yonghui Wu.

Abstract

In this study, we examined a deep learning method for de-identification of clinical notes at UF Health in a cross-institute setting. We developed deep learning models using the 2014 i2b2/UTHealth corpus and evaluated their performance on clinical notes collected from UF Health. We compared four pre-trained word embeddings: two from the general domain and two from the clinical domain. We also explored linguistic features (i.e., word shape and part-of-speech) to further improve de-identification performance. The experimental results show that the performance of deep learning models trained on the i2b2/UTHealth corpus dropped significantly (strict and relaxed F1 scores fell from 0.9547 and 0.9646 to 0.8360 and 0.8870, respectively) when the models were applied to a corpus from a different institution (UF Health). Linguistic features, including word shape and part-of-speech, further improved de-identification performance in the cross-institute setting (to 0.8527 and 0.9052, respectively).
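
To make the word-shape feature and the strict versus relaxed F1 scores more concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the authors' implementation: the function names (`word_shape`, `span_f1`), the span representation, and in particular the overlap-based definition of a relaxed match are hypothetical choices, since the abstract does not specify them.

```python
import re
from typing import List, Tuple

# Assumed span format (not from the paper): (start offset, end offset, PHI type)
Span = Tuple[int, int, str]


def word_shape(token: str) -> str:
    """Map a token to a coarse shape: uppercase -> X, lowercase -> x, digit -> d."""
    shape = re.sub(r"[A-Z]", "X", token)
    shape = re.sub(r"[a-z]", "x", shape)
    # Digits are replaced last so the inserted 'd' is not rewritten to 'x'.
    return re.sub(r"[0-9]", "d", shape)


def _overlaps(a: Span, b: Span) -> bool:
    """Assumed relaxed criterion: same PHI type and overlapping character ranges."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def span_f1(gold: List[Span], pred: List[Span], relaxed: bool = False) -> float:
    """F1 over entity spans; strict requires exact offsets and type."""
    if relaxed:
        matched_pred = sum(any(_overlaps(p, g) for g in gold) for p in pred)
        matched_gold = sum(any(_overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        matched_pred = sum(p in gold_set for p in pred)
        matched_gold = sum(g in pred_set for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 5, "NAME"), (10, 20, "DATE")]
    pred = [(0, 5, "NAME"), (12, 20, "DATE")]  # second span has a shifted start
    print(word_shape("Smith"), word_shape("01/02/2019"))
    print(span_f1(gold, pred), span_f1(gold, pred, relaxed=True))
```

In this toy example the second predicted span misses the gold start offset, so it counts as an error under strict matching but as a hit under the assumed relaxed matching, which is why relaxed scores are never lower than strict ones.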


Keywords:  De-identification; Deep Learning; Natural Language Processing

Year:  2019        PMID: 31879734      PMCID: PMC6932867          DOI: 10.1109/ICHI.2019.8904544

Source DB:  PubMed          Journal:  IEEE Int Conf Healthc Inform        ISSN: 2575-2626


