Jia Su (1), Jinpeng Hu (1), Jingchi Jiang (1), Jing Xie (1), Yang Yang (1), Bin He (1), Jinfeng Yang (2), Yi Guan (3)
(1) Language Technology Research Center, Harbin Institute of Technology, Integrated Building Room 803, 92 West Dazhi Street, Harbin 150001, Heilongjiang, China.
(2) School of Software, Harbin University of Science and Technology, Harbin, Heilongjiang, China.
(3) Language Technology Research Center, Harbin Institute of Technology, Integrated Building Room 803, 92 West Dazhi Street, Harbin 150001, Heilongjiang, China. Electronic address: guanyi@hit.edu.cn.
Abstract
BACKGROUND AND OBJECTIVE: Early prevention of cardiovascular diseases (CVDs) can effectively reduce later loss of health, and detecting CVD risk factors is a simple way to achieve early prevention. Personal health records play a prominent role in health information extraction because of their factuality and reliability. This study describes how to extract CVD risk factors from Chinese electronic medical records (CEMRs). METHODS: The extraction process involves two tasks: (a) CVD risk factor recognition and (b) risk factor time and assertion classification. We treated risk factor recognition as a named entity recognition (NER) task, and time and assertion classification as text classification tasks. We developed an information extraction pipeline consisting of NER and text classification modules built on machine learning models. For risk factor recognition, we built a bidirectional long short-term memory (BLSTM) network that takes additional risk factor textual features as input. For time and assertion classification, we built convolutional neural networks (CNNs) that take the risk factor type and section label as input, as well as a support vector machine (SVM). RESULTS: The best F1 score was 0.9609 for risk factor recognition, and 0.9812 and 0.9612 for time and assertion classification, respectively. The experimental results showed that the system achieves high performance and can extract risk factors from CEMRs efficiently. CONCLUSIONS: The proposed system is the first for extracting CVD risk factors from CEMRs and is competitive with risk factor extraction systems developed on English EMRs. Further, its good performance should benefit CVD prevention.
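The two-stage data flow described under METHODS (recognize risk factors, then classify each one for time and assertion using the sentence, the risk factor type, and the section label) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RiskFactor` record, the function names, the label values (`"before"`, `"during"`, `"present"`, `"absent"`), and the English example text are all hypothetical, and the toy lexicon and rule-based functions are simple stand-ins for the BLSTM, CNN, and SVM models.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class RiskFactor:
    """One recognized risk factor mention (hypothetical record layout)."""
    text: str
    start: int
    end: int
    factor_type: str
    time: str = ""
    assertion: str = ""


# Stage 1 maps a sentence to mentions; stage 2 labels one mention at a time.
Recognizer = Callable[[str], List[RiskFactor]]
Classifier = Callable[[str, RiskFactor, str], str]


def extract_risk_factors(sentence: str, section: str, recognize: Recognizer,
                         classify_time: Classifier,
                         classify_assertion: Classifier) -> List[RiskFactor]:
    """Run recognition first, then classify every recognized mention."""
    factors = recognize(sentence)
    for factor in factors:
        factor.time = classify_time(sentence, factor, section)
        factor.assertion = classify_assertion(sentence, factor, section)
    return factors


def toy_recognizer(sentence: str) -> List[RiskFactor]:
    """Lexicon lookup standing in for the NER model (assumption)."""
    lexicon = {"hypertension": "Hypertension", "smoking": "Smoking"}
    lowered = sentence.lower()
    found = []
    for term, factor_type in lexicon.items():
        idx = lowered.find(term)
        if idx != -1:
            end = idx + len(term)
            found.append(RiskFactor(sentence[idx:end], idx, end, factor_type))
    return sorted(found, key=lambda f: f.start)


def toy_time(sentence: str, factor: RiskFactor, section: str) -> str:
    """Section-label rule standing in for the time classifier (assumption)."""
    return "before" if section == "past history" else "during"


def toy_assertion(sentence: str, factor: RiskFactor, section: str) -> str:
    """Negation-cue rule standing in for the assertion classifier (assumption)."""
    words = sentence[:factor.start].lower().split()
    previous = words[-1] if words else ""
    return "absent" if previous in {"no", "denies"} else "present"


if __name__ == "__main__":
    results = extract_risk_factors(
        "History of hypertension, denies smoking.", "past history",
        toy_recognizer, toy_time, toy_assertion)
    for r in results:
        print(r.factor_type, r.time, r.assertion)
```

Passing the three stages in as callables keeps the pipeline independent of any particular model, so trained classifiers could replace the toy functions without changing `extract_risk_factors`.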