Chenshuo Wang1, Xianxiang Chen2, Lidong Du2, Qingyuan Zhan3, Ting Yang4, Zhen Fang5. 1. Institute of Electronics, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China. 2. Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China. 3. China-Japan Friendship Hospital, Beijing, China. 4. China-Japan Friendship Hospital, Beijing, China. Electronic address: dryangting@qq.com. 5. Institute of Electronics, Chinese Academy of Sciences, Beijing, China; Personalized Management of Chronic Respiratory Disease, Chinese Academy of Medical Sciences, China; University of Chinese Academy of Sciences, Beijing, China. Electronic address: zfang@mail.ie.ac.cn.
Abstract
OBJECTIVES: Identifying acute exacerbations of chronic obstructive pulmonary disease (AECOPDs) is of utmost importance for reducing the associated mortality and financial burden. In this research, the authors aimed to develop identification models for AECOPDs and to compare the relative performance of different modeling paradigms to find the best model for this task. METHODS: Data were extracted from the electronic medical records (EMRs) of patients with chronic obstructive pulmonary disease who were admitted to the China-Japan Friendship Hospital between February 2011 and March 2017. Five machine learning algorithms (random forest, support vector machine (SVM), logistic regression, K-nearest neighbor and naïve Bayes) were used to develop the AECOPD identification models. Feature selection was performed to find an optimal feature subset, and 10-fold cross-validation was used to find the best hyperparameters for each model. Model performance was evaluated with the following metrics: area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). RESULTS: A total of 303 EMRs (AECOPD patients: 135; non-AECOPD patients: 168) were included in the study. After feature selection, the SVM model obtained the best performance (sensitivity: 0.80, specificity: 0.83, PPV: 0.81, NPV: 0.85 and AUC: 0.90). CONCLUSIONS: Our research confirms that the proposed SVM-based model is a powerful tool for identifying AECOPD patients, and it is a promising source of decision support for clinicians when a confirmed clinical diagnosis is difficult to reach.
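The five evaluation metrics named in the METHODS can be made concrete with a short sketch. This is not the authors' code. It assumes binary labels (1 = AECOPD, 0 = non-AECOPD), a hypothetical 0.5 decision threshold, and hypothetical helper names (`confusion_counts`, `threshold_metrics`, `roc_auc`). AUC is computed here with the pairwise (Mann-Whitney) formulation: the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN); assumes labels are 1 = AECOPD, 0 = non-AECOPD."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def _ratio(num: int, den: int) -> float:
    # Undefined metrics (empty denominator) are reported as NaN rather than raising.
    return num / den if den else float("nan")


def threshold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Sensitivity, specificity, PPV and NPV from hard (thresholded) predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true-positive rate
        "specificity": _ratio(tn, tn + fp),  # true-negative rate
        "ppv": _ratio(tp, tp + fp),          # positive predictive value
        "npv": _ratio(tn, tn + fn),          # negative predictive value
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Illustrative toy data only (not from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.45]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(threshold_metrics(y_true, y_pred))
print(roc_auc(y_true, scores))
```

On this toy data the counts are TP = 2, FP = 1, TN = 3, FN = 2, giving sensitivity 0.5, specificity 0.75, PPV about 0.667, NPV 0.6 and AUC 0.875. Note that AUC uses the continuous scores, whereas the other four metrics depend on the chosen threshold.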