A cascaded approach for Chinese clinical text de-identification with less annotation effort.

Zhe Jian¹, Xusheng Guo², Shijian Liu³, Handong Ma⁴, Shaodian Zhang⁴, Rui Zhang⁵, Jianbo Lei⁶.

Abstract

With the rapid adoption of Electronic Health Records (EHR) in China, an increasing amount of clinical data has become available to support clinical research. Secondary use of clinical data usually requires de-identification of personal information to protect patient privacy. Because manual de-identification of free clinical text requires a significant amount of human work, an automated de-identification system is necessary. While many de-identification systems are available for English clinical text, designing one for Chinese clinical text faces many challenges, such as the unavailability of necessary lexical resources and the sparsity of patient health information (PHI) in Chinese clinical text. In this paper, we designed a de-identification pipeline that takes advantage of both rule-based and machine learning techniques. In particular, our method can effectively construct a data set with dense PHI, which significantly reduces annotation time for subsequent supervised learning. We experimented on a dataset of 3000 heterogeneous clinical documents to evaluate the annotation cost and the de-identification performance. Our approach can increase the efficiency of the annotation effort by over 60% while reaching an F score above 90%. We demonstrate that combining rule-based and machine learning techniques is an effective way to reduce annotation cost and achieve high performance in the Chinese clinical text de-identification task.
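The abstract does not spell out the rules or the learning model, so the following is only a minimal sketch of the first, rule-based stage of such a cascade: cheap patterns flag sentences likely to contain PHI, and only those sentences are passed on for manual annotation and later supervised learning. The pattern set (`PHI_RULES`), the sentence splitter, and all function names are hypothetical choices made for illustration and are not taken from the paper.

```python
import re

# Hypothetical rule set: illustrative assumptions, not the paper's rules.
PHI_RULES = {
    # Dates written as e.g. 2016年3月5日
    "DATE": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
    # 11-digit mobile numbers starting with 1, not embedded in a longer number
    "PHONE": re.compile(r"(?<!\d)1\d{10}(?!\d)"),
    # 18-character national ID numbers (last character may be X)
    "ID": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def split_sentences(text):
    """Split on common Chinese sentence terminators, dropping empty pieces."""
    return [s.strip() for s in re.split(r"[。！？\n]", text) if s.strip()]


def find_candidates(sentence):
    """Return (label, matched_text) pairs for every rule that fires."""
    hits = []
    for label, pattern in PHI_RULES.items():
        hits.extend((label, m.group()) for m in pattern.finditer(sentence))
    return hits


def select_dense_sentences(documents):
    """Keep only sentences with at least one rule hit.

    The result is a smaller, PHI-dense set that an annotator would review
    instead of reading every sentence of every document.
    """
    selected = []
    for doc in documents:
        for sentence in split_sentences(doc):
            hits = find_candidates(sentence)
            if hits:
                selected.append((sentence, hits))
    return selected


if __name__ == "__main__":
    docs = ["患者于2016年3月5日入院。主诉咳嗽三天。联系电话13812345678。"]
    for sentence, hits in select_dense_sentences(docs):
        print(sentence, hits)
```

In this sketch the rules favor recall over precision: a false hit costs the annotator one extra sentence, whereas a missed sentence never reaches the training set. A real pipeline would need rules for names, addresses, and institution names, which depend on lexical resources the abstract notes are scarce for Chinese.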

Keywords: Annotation cost; Chinese NLP; Clinical natural language processing; De-identification

Year: 2017 PMID: 28756160 PMCID: PMC5583002 DOI: 10.1016/j.jbi.2017.07.017

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317

4 in total

1. De-identifying Spanish medical texts - named entity recognition applied to radiology reports.

Authors: Irene Pérez-Díez; Raúl Pérez-Moraga; Adolfo López-Cerdán; Jose-Maria Salinas-Serrano; María de la Iglesia-Vayá
Journal: J Biomed Semantics Date: 2021-03-29

2. Classifying Cyber-Risky Clinical Notes by Employing Natural Language Processing.

Authors: Suzanna Schmeelk; Martins Samuel Dogo; Yifan Peng; Braja Gopal Patra
Journal: Proc Annu Hawaii Int Conf Syst Sci Date: 2022-01-04

3. An Efficient Method for Deidentifying Protected Health Information in Chinese Electronic Health Records: Algorithm Development and Validation.

Authors: Peng Wang; Yong Li; Liang Yang; Simin Li; Linfeng Li; Zehan Zhao; Shaopei Long; Fei Wang; Hongqian Wang; Ying Li; Chengliang Wang
Journal: JMIR Med Inform Date: 2022-08-30

4. De-identifying free text of Japanese electronic health records.

Authors: Kohei Kajiyama; Hiromasa Horiguchi; Takashi Okumura; Mizuki Morita; Yoshinobu Kano
Journal: J Biomed Semantics Date: 2020-09-21
