A machine learning based approach to identify protected health information in Chinese clinical text.

Liting Du, Chenxi Xia, Zhaohua Deng, Gary Lu, Shuxu Xia, Jingdong Ma.

Abstract

BACKGROUND: With the increasing use of electronic health records (EHRs) worldwide, protecting private information in clinical text has drawn extensive attention from both healthcare providers and researchers. De-identification, the process of identifying and removing protected health information (PHI) from clinical text, has been central to the discourse on medical privacy since 2006. While de-identification is becoming the global norm for handling medical records, there is a paucity of studies on its application to Chinese clinical text. Without efficient and effective privacy protection algorithms in place, the use of indispensable clinical information would be restricted.
OBJECTIVES: We aimed to (i) describe the current process for handling PHI in China, (ii) propose a machine learning based approach to identify PHI in Chinese clinical text, and (iii) validate the effectiveness of the machine learning algorithm for de-identification of Chinese clinical text.
METHODS: Using 14,719 discharge summaries from regional health centers in Ya'an City, Sichuan province, China, we built a conditional random fields (CRF) model to identify PHI in clinical text, and then used regular expressions to improve the recognition results for the PHI categories with fewer samples.
RESULTS: We constructed a Chinese clinical text corpus with PHI tags through substantial manual annotation; descriptive statistics showed that PHI in the corpus is wide-ranging and falls into diverse categories. In the evaluation, our CRF-based model achieved an F-measure of 0.9878 for identifying PHI in Chinese clinical text.
CONCLUSION: The rapid adoption of EHRs in the health sector has created an urgent need for tools that can parse patient-specific information from Chinese clinical text. Our application of CRF algorithms to de-identification shows the potential to meet this need by offering a highly accurate and flexible solution for analyzing Chinese clinical text.
Copyright © 2018 Elsevier B.V. All rights reserved.
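
The regular-expression step mentioned in METHODS can be illustrated with a minimal Python sketch. The abstract does not give the authors' actual expressions or PHI category names, so the patterns, the labels (`ID_NUMBER`, `PHONE`, `DATE`) and the helper functions `find_phi` and `mask_phi` below are all hypothetical, chosen only to show how rule-based matching might supplement a CRF model on categories with few training samples.

```python
import re

# Hypothetical patterns for illustration only; the paper's real
# expressions and category names are not stated in the abstract.
PHI_PATTERNS = {
    # 18-character resident ID: 17 digits plus a digit or X check character
    "ID_NUMBER": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
    # 11-digit mobile number starting with 1
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # Dates written as YYYY年M月D日
    "DATE": re.compile(r"\d{4}年\d{1,2}月\d{1,2}日"),
}


def find_phi(text):
    """Return (start, end, category) spans for every pattern match, sorted by position."""
    spans = []
    for category, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.append((match.start(), match.end(), category))
    spans.sort()
    return spans


def mask_phi(text, spans):
    """Replace each span with a [CATEGORY] placeholder, skipping overlapping spans."""
    pieces = []
    cursor = 0
    for start, end, category in spans:
        if start < cursor:
            continue  # overlaps a span that was already masked
        pieces.append(text[cursor:start])
        pieces.append(f"[{category}]")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


if __name__ == "__main__":
    note = "患者于2017年3月5日入院，联系电话13812345678。"
    print(mask_phi(note, find_phi(note)))
```

In a pipeline like the one described, spans found this way would presumably be merged with the CRF model's predictions; how the authors combined the two is not specified here.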

Keywords:  Conditional random fields; De-identification; Electronic health records; Protected health information

Year:  2018        PMID: 29887232     DOI: 10.1016/j.ijmedinf.2018.05.010

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


  4 in total

1.  Findings from the 2019 International Medical Informatics Association Yearbook Section on Health Information Management.

Authors:  Meryl Bloomrosen; Eta S Berner
Journal:  Yearb Med Inform       Date:  2019-08-16

2.  An Efficient Method for Deidentifying Protected Health Information in Chinese Electronic Health Records: Algorithm Development and Validation.

Authors:  Peng Wang; Yong Li; Liang Yang; Simin Li; Linfeng Li; Zehan Zhao; Shaopei Long; Fei Wang; Hongqian Wang; Ying Li; Chengliang Wang
Journal:  JMIR Med Inform       Date:  2022-08-30

3.  Validation of an algorithm to evaluate the appropriateness of outpatient antibiotic prescribing using big data of Chinese diagnosis text.

Authors:  Houyu Zhao; Jiaming Bian; Li Wei; Liuyi Li; Yingqiu Ying; Zeyu Zhang; Xiaoying Yao; Lin Zhuo; Bin Cao; Mei Zhang; Siyan Zhan
Journal:  BMJ Open       Date:  2020-03-19       Impact factor: 2.692

4.  De-identifying free text of Japanese electronic health records.

Authors:  Kohei Kajiyama; Hiromasa Horiguchi; Takashi Okumura; Mizuki Morita; Yoshinobu Kano
Journal:  J Biomed Semantics       Date:  2020-09-21
