Ali Hussain, Hee-Eun Choi, Hyo-Jung Kim, Satyabrata Aich, Muhammad Saqlain, Hee-Cheol Kim.
Abstract
Preventing exacerbations and determining disease severity during the hospitalization of chronic obstructive pulmonary disease (COPD) patients is a crucial aim of the Global Initiative for Chronic Obstructive Lung Disease (GOLD); however, this option is available only for stable-phase patients. The assessment and prediction techniques currently in use have recently been found inadequate for patients with acute exacerbation of COPD. To improve the monitoring and treatment of these patients, we need to rely on AI systems, because traditional methods take a long time to reach a prognosis. Machine-learning techniques have shown the capacity to be used effectively in critical healthcare applications. In this paper, we propose a voting ensemble classifier with 24 features to identify the severity of COPD. We trained five machine-learning classifiers on this set of 24 features, namely random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), XGBoost (XGB), and K-nearest neighbor (KNN), and then combined their results with a soft voting ensemble (SVE) method. The SVE achieved an accuracy of 91.0849%, a precision of 90.7725%, a recall of 91.3607%, an F-measure of 91.0656%, and an AUC score of 96.8656%. Our results show that the SVE classifier with the proposed 24 features outperformed conventional machine-learning-based methods for COPD patients. The SVE classifier helps respiratory physicians estimate the severity of COPD at an early stage, which in turn guides the treatment strategy and supports the prognosis of COPD patients.
Keywords: chronic obstructive pulmonary disease (COPD); disease severity; features set; machine learning; prediction models
Year: 2021 PMID: 34064395 PMCID: PMC8147791 DOI: 10.3390/diagnostics11050829
Source DB: PubMed Journal: Diagnostics (Basel) ISSN: 2075-4418
Related work.
| Reference | Features | Classifiers | Outcomes | Performance Indices |
|---|---|---|---|---|
| [ | Clinical | LSTM, ANN, SVM | 92.86% | Accuracy |
| [ | Clinical | DTF | 75.8% | Accuracy |
| [ | Clinical | ANN | More than 90% | Sensitivity, Specificity, AUC |
| [ | Clinical | Naïve Bayes, SVM | 87.8% | Accuracy |
| [ | Clinical | RF | 75% | Sensitivity, Specificity |
| [ | Clinical | ANN | 92% | Accuracy |
| [ | Clinical | MLNN | 94.46% | Accuracy |
| [ | Clinical | GBDT, LR | 89.1% | Accuracy |
| [ | Clinical | LR | 77.6% | Sensitivity, Specificity |
Features used in our study.
| No. | Attributes | Value | Description |
|---|---|---|---|
| 1 | Sex | M/F | Male/Female |
| 2 | DBP | Numerical | Diastolic blood pressure (DBP) |
| 3 | NEUT | Numerical | Neutrophil (NEUT) |
| 4 | AA | Yes, no | Availability of asthma (AA) |
| 5 | EO | Numerical | Eosinophils (EO) |
| 6 | Sputum3m | Yes, no | Have you had phlegm almost every day for more than three months a year? |
| 7 | PHA | Yes, no | History of asthma (PHA) |
| 8 | SGRQc | Numerical | The St. George’s Respiratory Questionnaire (SGRQc). Have you experienced several respiratory symptoms over the past year? |
| 9 | DLCO | Numerical | Diffusing capacity for carbon monoxide (DLCO) |
| 10 | FEF | Numerical | The forced mid-expiratory flow (FEF) |
| 11 | WBC | Numerical | White blood cell (WBC) |
| 12 | SPY | Numerical | Smoke per year (SPY) |
| 13 | BR | Numerical | Breath Result (BR) |
| 14 | Alb | Numerical | Albumin (Alb) |
| 15 | Pt | Numerical | Platelets (Pt) |
| 16 | RBC | Numerical | Red blood cells (RBC) |
| 17 | DS | Numerical | Duration of Smoke (DS) |
| 18 | FF_ratio | Numerical | The ratio FEV1/FVC |
| 19 | CAT | Numerical | COPD Assessment Test (CAT) |
| 20 | LYM | Numerical | Lymphocytes (LYM) |
| 21 | SBP | Numerical | Systolic blood pressure (SBP) |
| 22 | FEV1 | Numerical | Forced expiratory volume in one second (FEV1) |
| 23 | Wt | Numerical | Weight (Wt) |
| 24 | FVC | Numerical | FVC (forced vital capacity): maximum volume of air that can be exhaled during a forced maneuver |
Figure 1. The basic architecture of the SVE classifier for COPD classification.
Severity detection: classifiers and specifications.
| Classifier | Specification |
|---|---|
| Random Forest | n_estimators = 500, random_state = 0, criterion = ‘gini’, max_depth = 15, min_samples_split = 5, min_samples_leaf = 5 |
| Support Vector Machine | kernel = ‘rbf’, degree = 4, gamma = 7.9, C = 20, decision_function_shape = ‘ovr’, probability = True, random_state = 0 |
| Gradient Boosting Machine | learning_rate = 0.1, n_estimators = 500, max_depth = 15, min_samples_split = 5, min_samples_leaf = 5, subsample = 1, max_features = ‘sqrt’, random_state = 10 |
| XGBoost | random_state = 0, silent = False, scale_pos_weight = 2, learning_rate = 0.1, colsample_bytree = 0.4, subsample = 0.9, objective = ‘binary:logistic’, n_estimators = 500, reg_alpha = 0.01, max_depth = 15, gamma = 7 |
| K-nearest neighbor | n_neighbors = 2, weights = ‘uniform’, algorithm = ‘auto’, leaf_size = 40, p = 2, metric = ‘manhattan’ |
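The soft voting step that combines the five classifiers can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes equal weights for all classifiers, and the per-classifier probabilities are hypothetical values for a single patient. The parameter names in the table resemble scikit-learn's, where `VotingClassifier(voting="soft")` performs the same averaging; `probability = True` on the SVM is what makes class probabilities available for it.

```python
from typing import Dict, List


def soft_vote(probas: Dict[str, List[float]]) -> List[float]:
    """Average the per-class probabilities across classifiers (equal weights)."""
    n_models = len(probas)
    n_classes = len(next(iter(probas.values())))
    return [
        sum(p[c] for p in probas.values()) / n_models
        for c in range(n_classes)
    ]


# Hypothetical probabilities [P(mild), P(severe)] for one patient.
# These numbers are illustrative only and do not come from the paper.
example = {
    "RF": [0.40, 0.60],
    "SVM": [0.55, 0.45],
    "GBM": [0.30, 0.70],
    "XGB": [0.45, 0.55],
    "KNN": [0.50, 0.50],
}

labels = ["Mild", "Severe"]
avg = soft_vote(example)

# The ensemble predicts the class with the highest averaged probability.
prediction = labels[avg.index(max(avg))]
print(avg, prediction)
```

Unlike hard (majority) voting, soft voting lets a confident classifier outweigh several uncertain ones, because the averaged probabilities, not the individual votes, decide the label.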
Five-fold cross validation.
| Classifier | 1st Fold (%) | 2nd Fold (%) | 3rd Fold (%) | 4th Fold (%) | 5th Fold (%) | Average (%) |
|---|---|---|---|---|---|---|
| Random Forest | 84.0268 | 83.8926 | 86.3087 | 88.9784 | 87.2311 | 86.0875 |
| Support Vector Machine | 87.1140 | 87.5167 | 89.1275 | 91.5322 | 88.1720 | 88.6925 |
| Gradient Boosting Machine | 88.8590 | 88.4563 | 90.7382 | 91.5322 | 91.1290 | 90.1429 |
| XGBoost | 84.9664 | 84.0268 | 87.5167 | 90.0537 | 86.4247 | 86.5976 |
| K-nearest neighbor | 84.4295 | 85.3691 | 87.3825 | 86.2903 | 86.6935 | 86.0329 |
| Soft voting ensemble (SVE) | 90.2013 | 88.1879 | 92.2147 | 93.6827 | 91.1290 | 91.0831 |
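The Average column can be reproduced from the per-fold values. The short sketch below recomputes it as a plain arithmetic mean over the five folds, using the numbers from the table; the recomputed means agree with the reported ones to within the last printed digit.

```python
from statistics import mean

# Per-fold accuracies (%) copied from the five-fold cross-validation table.
fold_accuracy = {
    "Random Forest": [84.0268, 83.8926, 86.3087, 88.9784, 87.2311],
    "Support Vector Machine": [87.1140, 87.5167, 89.1275, 91.5322, 88.1720],
    "Gradient Boosting Machine": [88.8590, 88.4563, 90.7382, 91.5322, 91.1290],
    "XGBoost": [84.9664, 84.0268, 87.5167, 90.0537, 86.4247],
    "K-nearest neighbor": [84.4295, 85.3691, 87.3825, 86.2903, 86.6935],
    "Soft voting ensemble (SVE)": [90.2013, 88.1879, 92.2147, 93.6827, 91.1290],
}

# Cross-validated accuracy is the mean over the five held-out folds.
averages = {name: mean(scores) for name, scores in fold_accuracy.items()}
best = max(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: {avg:.4f}")
print("Highest average:", best)
```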
Figure 2. The complete framework of the study for identifying the different stages of COPD patients.
Comparative analysis of the classifiers across COPD severity stages (%).
| Classifier | Disease Severity | Precision | Recall | F-Measure |
|---|---|---|---|---|
| RF | Mild | 85.3360 | 89.9141 | 87.5652 |
| RF | Severe | 89.3181 | 84.5161 | 86.8507 |
| SVM | Mild | 89.2070 | 86.9098 | 88.0434 |
| SVM | Severe | 87.2117 | 89.4623 | 88.3227 |
| GBM | Mild | 87.7263 | 93.5622 | 90.5503 |
| GBM | Severe | 93.0875 | 86.8817 | 89.8776 |
| XGB | Mild | 89.3569 | 86.4806 | 87.8951 |
| XGB | Severe | 86.8750 | 89.6774 | 88.2539 |
| KNN | Mild | 84.9484 | 88.4120 | 86.6456 |
| KNN | Severe | 87.8923 | 84.3010 | 86.0591 |
| SVE | Mild | 91.3606 | 90.7725 | 91.0656 |
| SVE | Severe | 90.8119 | 91.3978 | 91.1039 |
Figure 3. ROC–AUC curves of the proposed SVE method and the five classifiers.
Overall performance of all classifiers (%).
| Classifier | Accuracy | Precision | Recall | F-Measure | AUC |
|---|---|---|---|---|---|
| RF | 87.2180 | 89.9141 | 85.3360 | 87.5652 | 94.7875 |
| SVM | 88.1847 | 86.9098 | 89.2070 | 88.0434 | 94.0616 |
| GBM | 90.2255 | 93.5622 | 87.7263 | 90.5503 | 96.3192 |
| XGB | 88.0773 | 86.4806 | 89.3569 | 87.8952 | 95.8452 |
| KNN | 86.3587 | 88.4120 | 84.9484 | 86.6456 | 90.0259 |
| SVE | 91.0849 | 90.7725 | 91.3607 | 91.0656 | 96.8656 |
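The F-measure column follows from the precision and recall columns as their harmonic mean, F = 2PR / (P + R). The sketch below recomputes it from the table's values as a consistency check; the helper name `f_measure` is ours, not the paper's.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) in %, copied from the table above.
overall = {
    "RF": (89.9141, 85.3360, 87.5652),
    "SVM": (86.9098, 89.2070, 88.0434),
    "GBM": (93.5622, 87.7263, 90.5503),
    "XGB": (86.4806, 89.3569, 87.8952),
    "KNN": (88.4120, 84.9484, 86.6456),
    "SVE": (90.7725, 91.3607, 91.0656),
}

# Largest gap between the recomputed and the reported F-measure.
max_diff = 0.0
for name, (p, r, reported) in overall.items():
    recomputed = f_measure(p, r)
    max_diff = max(max_diff, abs(recomputed - reported))
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported:.4f}")
```

Because the harmonic mean is dominated by the smaller of the two values, a classifier such as GBM with high precision but lower recall still ends up with a lower F-measure than the more balanced SVE.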
Figure 4. Feature importance for the proposed soft voting ensemble classifier. Note that FVC, Wt, FEV1, SBP, LYM, CAT, FF_ratio, DS, RBC, Pt, Alb, BR, SPY, WBC, FEF, DLCO, SGRQc, PHA, S3m, EO, AA, NEUT, and DBP denote forced vital capacity, weight, forced expiratory volume in one second, systolic blood pressure, lymphocytes, COPD assessment test score, FEV1/FVC ratio, duration of smoke, red blood cells, platelets, albumin, breath result, smoke per year, white blood cell, forced mid-expiratory flow, diffusing capacity of the lung for carbon monoxide, the St. George’s Respiratory Questionnaire, history of asthma, sputum3m, eosinophils, availability of asthma, neutrophil, and diastolic blood pressure, respectively.
Recall (sensitivity) of all classifiers on different training/testing splits (%).
| Classifier | 90/10 split | 80/20 split | 70/30 split | 60/40 split | 50/50 split | Mean ± STD |
|---|---|---|---|---|---|---|
| Random Forest | 87.9167 | 85.3360 | 84.5222 | 84.3592 | 84.0260 | 85.2320 ± 1.5762 |
| Support Vector Machine | 92.3076 | 89.2070 | 87.2675 | 86.1490 | 84.3485 | 87.8559 ± 3.0497 |
| Gradient Boosting Machine | 90.0000 | 87.7263 | 87.6177 | 87.4747 | 86.5853 | 87.7822 ± 1.1592 |
| XGBoost | 90.9090 | 89.3569 | 89.5434 | 89.3289 | 88.4892 | 89.1192 ± 0.4288 |
| K-nearest neighbor | 91.071 | 84.9484 | 84.0599 | 80.8593 | 79.3822 | 84.0641 ± 4.5296 |
| Soft voting ensemble (SVE) | 94.5945 | 91.3607 | 90.7172 | 89.5288 | 89.0691 | 91.0540 ± 2.1799 |
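The Mean ± STD column summarizes how stable each classifier's recall is across the five splits. For the two rows recomputed below, the reported values are consistent with the arithmetic mean and the sample standard deviation (n − 1 denominator); that the authors used this estimator is an assumption based on these numbers, not something the table states.

```python
from statistics import mean, stdev

# Recall (%) per training/testing split (90/10 ... 50/50), copied from the table.
recall_by_split = {
    "Random Forest": [87.9167, 85.3360, 84.5222, 84.3592, 84.0260],
    "Soft voting ensemble (SVE)": [94.5945, 91.3607, 90.7172, 89.5288, 89.0691],
}

summary = {}
for name, values in recall_by_split.items():
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    summary[name] = (mean(values), stdev(values))
    print(f"{name}: {summary[name][0]:.4f} ± {summary[name][1]:.4f}")
```

A smaller standard deviation means the recall depends less on how much data is held out for testing.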
A comparison of our results with state-of-the-art models for stage detection.
| Author | Objective | Accuracy (%) | AUC Score (%) |
|---|---|---|---|
| Our Work | “Mild”/“Severe” detection | 91.0848 | 96.8656 |
| Peng et al. [ | “Mild”/“Severe” detection | 80.3 | 80.3 |
| Ryynanen et al. [ | “HRQoL” detection | 77 | 69 |