A machine learning based approach to identify protected health information in Chinese clinical text.

Liting Du¹, Chenxi Xia¹, Zhaohua Deng¹, Gary Lu², Shuxu Xia¹, Jingdong Ma³.

Abstract

BACKGROUND: With the increasing adoption of electronic health records (EHRs) worldwide, protecting private information in clinical text has drawn extensive attention from both healthcare providers and researchers. De-identification, the process of identifying and removing protected health information (PHI) from clinical text, has been central to the discourse on medical privacy since 2006. While de-identification is becoming the global norm for handling medical records, there is a paucity of studies on its application to Chinese clinical text. Without efficient and effective privacy protection algorithms in place, the use of indispensable clinical information would be restricted.
OBJECTIVES: We aimed to (i) describe the current process for handling PHI in China, (ii) propose a machine learning based approach to identify PHI in Chinese clinical text, and (iii) validate the effectiveness of the machine learning algorithm for de-identification of Chinese clinical text.
METHODS: Using 14,719 discharge summaries from regional health centers in Ya'an City, Sichuan Province, China, we built a conditional random fields (CRF) model to identify PHI in clinical text, and then used regular expressions to improve the recognition results for the PHI categories with fewer samples.
RESULTS: We constructed a Chinese clinical text corpus with PHI tags through substantial manual annotation; descriptive statistics of the corpus showed that PHI is wide-ranging and falls into diverse categories. In the evaluation, our CRF-based model achieved an F-measure of 0.9878 for identifying PHI in Chinese clinical text.
CONCLUSION: The rapid adoption of EHRs in the health sector has created an urgent need for tools that can parse patient-specific information from Chinese clinical text. Our application of CRF algorithms to de-identification has shown the potential to meet this need by offering a highly accurate and flexible solution for analyzing Chinese clinical text.
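
ILLUSTRATION: The abstract describes a CRF model whose output is supplemented by regular expressions for sparse PHI categories, evaluated by F-measure. The sketch below is not the authors' implementation. It assumes that PHI is represented as (start, end, category) character spans, that regex matches are added only where the model predicted nothing, and that scoring is by exact span match. The two patterns, the function names, the sample sentence, and the stand-in "model" output are all hypothetical.

```python
import re

# Hypothetical patterns for two sparse PHI categories; the abstract does not
# list the expressions that were actually used.
PHI_PATTERNS = {
    # 11-digit mainland China mobile number, not embedded in a longer digit run
    "PHONE": re.compile(r"(?<!\d)1[3-9]\d{9}(?!\d)"),
    # 18-character resident ID number (17 digits plus a digit or X)
    "ID": re.compile(r"(?<!\d)\d{17}[\dXx](?!\d)"),
}


def regex_phi_spans(text):
    """Return the set of (start, end, category) spans matched by the patterns."""
    spans = set()
    for category, pattern in PHI_PATTERNS.items():
        for match in pattern.finditer(text):
            spans.add((match.start(), match.end(), category))
    return spans


def merge_spans(model_spans, regex_spans):
    """Keep all model spans; add regex spans that overlap none of them."""
    merged = set(model_spans)
    for start, end, category in regex_spans:
        if all(end <= s or start >= e for s, e, _ in model_spans):
            merged.add((start, end, category))
    return merged


def f_measure(gold, predicted):
    """Balanced F-measure over exactly matching (start, end, category) spans."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def span_of(text, fragment, category):
    """Helper for the example: locate a fragment and label it."""
    start = text.index(fragment)
    return (start, start + len(fragment), category)


# Invented sentence: "Patient Zhang San, contact phone 13812345678,
# admitted in March 2016."
text = "患者张三，联系电话13812345678，于2016年3月入院。"

# Stand-in for CRF output: assume the model found the name only.
model_spans = {span_of(text, "张三", "NAME")}
gold_spans = {
    span_of(text, "张三", "NAME"),
    span_of(text, "13812345678", "PHONE"),
    span_of(text, "2016年3月", "DATE"),
}

predicted = merge_spans(model_spans, regex_phi_spans(text))
score = f_measure(gold_spans, predicted)
```

In this toy case the regex step recovers the phone number that the stand-in model missed, while the date remains unrecognized, so recall stays below 1. Exact span matching is one of several possible scoring conventions; the abstract does not state which one underlies the reported F-measure.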

Keywords: Conditional random fields; De-identification; Electronic health records; Protected health information

Year: 2018 PMID: 29887232 DOI: 10.1016/j.ijmedinf.2018.05.010

Source DB: PubMed Journal: Int J Med Inform ISSN: 1386-5056 Impact factor: 4.046

Cited

4 in total

1. Findings from the 2019 International Medical Informatics Association Yearbook Section on Health Information Management.

Authors: Meryl Bloomrosen; Eta S Berner
Journal: Yearb Med Inform Date: 2019-08-16

2. An Efficient Method for Deidentifying Protected Health Information in Chinese Electronic Health Records: Algorithm Development and Validation.

Authors: Peng Wang; Yong Li; Liang Yang; Simin Li; Linfeng Li; Zehan Zhao; Shaopei Long; Fei Wang; Hongqian Wang; Ying Li; Chengliang Wang
Journal: JMIR Med Inform Date: 2022-08-30

3. Validation of an algorithm to evaluate the appropriateness of outpatient antibiotic prescribing using big data of Chinese diagnosis text.

Authors: Houyu Zhao; Jiaming Bian; Li Wei; Liuyi Li; Yingqiu Ying; Zeyu Zhang; Xiaoying Yao; Lin Zhuo; Bin Cao; Mei Zhang; Siyan Zhan
Journal: BMJ Open Date: 2020-03-19 Impact factor: 2.692

4. De-identifying free text of Japanese electronic health records.

Authors: Kohei Kajiyama; Hiromasa Horiguchi; Takashi Okumura; Mizuki Morita; Yoshinobu Kano
Journal: J Biomed Semantics Date: 2020-09-21