A cascaded approach for Chinese clinical text de-identification with less annotation effort.

Zhe Jian, Xusheng Guo, Shijian Liu, Handong Ma, Shaodian Zhang, Rui Zhang, Jianbo Lei.

Abstract

With the rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has become available to support clinical research. Secondary use of clinical data usually requires de-identification of personal information to protect patient privacy. Since manual de-identification of free clinical text requires a significant amount of human work, developing an automated de-identification system is necessary. While many de-identification systems are available for English clinical text, designing one for Chinese clinical text faces many challenges, such as the unavailability of necessary lexical resources and the sparsity of patient health information (PHI) in Chinese clinical text. In this paper, we designed a de-identification pipeline that takes advantage of both rule-based and machine learning techniques. In particular, our method can effectively construct a data set with dense PHI, which significantly reduces annotation time for subsequent supervised learning. We experimented on a dataset of 3000 heterogeneous clinical documents to evaluate the annotation cost and the de-identification performance. Our approach improves annotation efficiency by more than 60% while achieving an F score above 90%. We demonstrate that combining rule-based and machine learning techniques is an effective way to reduce annotation cost and achieve high performance in the Chinese clinical text de-identification task.
Copyright © 2017 Elsevier Inc. All rights reserved.
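
The abstract describes using rules to build a PHI-dense data set before annotation and supervised learning. The sketch below illustrates one way such a first stage might look; it is not the authors' implementation. The patterns in `PHI_PATTERNS`, the helper names, and the `min_hits` threshold are all assumptions introduced for illustration, since the abstract does not list the actual rules.

```python
import re

# Hypothetical surface patterns for candidate PHI; the abstract does not
# specify the rules used in the paper, so these are illustrative only.
PHI_PATTERNS = {
    "DATE": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日|\d{4}-\d{1,2}-\d{1,2}"),
    "PHONE": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    "HOSPITAL": re.compile(r"[\u4e00-\u9fff]{2,10}医院"),
}


def split_sentences(text):
    """Split on Chinese/Latin sentence-ending punctuation and newlines."""
    return [s.strip() for s in re.split(r"[。！？!?\n]+", text) if s.strip()]


def count_candidates(sentence):
    """Count rule matches in a sentence across all candidate PHI patterns."""
    return sum(len(p.findall(sentence)) for p in PHI_PATTERNS.values())


def select_dense_sentences(documents, min_hits=1):
    """Keep only sentences with at least `min_hits` candidate PHI matches.

    The returned (sentence, hits) pairs form a smaller, PHI-dense pool that
    annotators could label before training a supervised model.
    """
    selected = []
    for doc in documents:
        for sent in split_sentences(doc):
            hits = count_candidates(sent)
            if hits >= min_hits:
                selected.append((sent, hits))
    return selected
```

Under these assumptions, annotators would only see sentences that the rules flag, and a sequence labeler trained on the resulting annotations would form the second stage of the cascade.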

Keywords:  Annotation cost; Chinese NLP; Clinical natural language processing; De-identification

Year:  2017        PMID: 28756160      PMCID: PMC5583002          DOI: 10.1016/j.jbi.2017.07.017

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


