Chew-Teng Kor, Yi-Rong Li, Pei-Ru Lin, Sheng-Hao Lin, Bing-Yen Wang, Ching-Hsiung Lin.
Abstract
BACKGROUND: The study developed accurate, explainable machine learning (ML) models for predicting a first-time acute exacerbation of chronic obstructive pulmonary disease (COPD), abbreviated AECOPD, at the individual level.
Keywords: COPD; SHapley Additive exPlanations (SHAP); acute exacerbation; explainable machine learning; local explanation
Year: 2022 PMID: 35207716 PMCID: PMC8879653 DOI: 10.3390/jpm12020228
Source DB: PubMed Journal: J Pers Med ISSN: 2075-4426
Figure 1. Flowchart of the study population.
Patient characteristics in the overall cohort and in the split (train and test) datasets.
| Characteristic | Overall cohort: non-AECOPD | Overall cohort: AECOPD | p-value | Split: train data | Split: test data | p-value |
|---|---|---|---|---|---|---|
| Demographic | ||||||
| Age | 72 ± 10 | 73 ± 10 | 0.557 | 73 ± 10 | 72 ± 11 | 0.882 |
| BMI | 24 ± 4 | 24 ± 4 | 0.593 | 24 ± 4 | 24 ± 4 | 0.962 |
| Vital sign | ||||||
| Pulse | 84 ± 13 | 86 ± 15 | 0.081 | 84 ± 14 | 84 ± 13 | 0.817 |
| Respiratory rate | 19 ± 1 | 19 ± 1 | 0.490 | 19 ± 1 | 19 ± 1 | 0.650 |
| SBP | 135 ± 18 | 135 ± 17 | 0.856 | 135 ± 18 | 137 ± 17 | 0.184 |
| DBP | 75 ± 10 | 76 ± 11 | 0.110 | 75 ± 11 | 77 ± 10 | 0.072 |
| Lung function | ||||||
| Post-bronchodilator FEV1/FVC | 60 ± 10 | 57 ± 11 | 0.014 | 59 ± 11 | 60 ± 10 | 0.289 |
| CAT | 4 ± 2 | 6 ± 3 | <0.001 | 4 ± 3 | 4 ± 3 | 0.678 |
| Symptoms | ||||||
| Cough | 177(50%) | 111(71.6%) | <0.001 | 230(56.5%) | 58(56.9%) | 0.949 |
| Dyspnea | 142(40.1%) | 87(56.1%) | 0.001 | 193(47.4%) | 36(35.3%) | 0.028 |
| Wheeze | 112(31.6%) | 93(60%) | <0.001 | 170(41.8%) | 35(34.3%) | 0.170 |
| Comorbidity within 1 year | ||||||
| Chronic pulmonary disease | 163(46.2%) | 101(65.2%) | <0.001 | 211(51.8%) | 53(52.5%) | 0.909 |
| Congestive heart failure | 17(4.8%) | 17(11%) | 0.011 | 31(7.6%) | 3(3%) | 0.094 |
| Sleep disorder | 112(31.7%) | 68(43.9%) | 0.008 | 140(34.4%) | 40(39.6%) | 0.328 |
| Anxiety | 26(7.4%) | 15(9.7%) | 0.378 | 33(8.1%) | 8(7.9%) | 0.951 |
| Pneumonia | 35(9.9%) | 41(26.5%) | <0.001 | 61(15%) | 15(14.9%) | 0.973 |
| Hypertension | 159(44.9%) | 85(54.8%) | 0.039 | 196(48.2%) | 48(47.1%) | 0.843 |
| Cancer | 46(13%) | 31(20%) | 0.044 | 63(15.5%) | 14(13.9%) | 0.685 |
| COPD medication within 6 months | ||||||
| Short-acting bronchodilators | 216(61%) | 111(71.6%) | 0.022 | 259(63.6%) | 68(66.7%) | 0.568 |
| Dual bronchodilator | 58(16.4%) | 49(31.6%) | <0.001 | 84(20.6%) | 23(22.5%) | 0.672 |
| Triple bronchodilator | 204(57.6%) | 118(76.1%) | <0.001 | 262(64.4%) | 60(58.8%) | 0.298 |
| Chronic disease medication within 1 year | ||||||
| Antibiotic | 44(12.4%) | 47(30.3%) | <0.001 | 78(19.2%) | 13(12.7%) | 0.130 |
| Oral long-acting bronchodilator | 14(4%) | 25(16.1%) | <0.001 | 31(7.6%) | 8(7.8%) | 0.939 |
| Methylxanthines | 217(61.3%) | 122(78.7%) | <0.001 | 267(65.6%) | 72(70.6%) | 0.340 |
| Lab data within 6 months | ||||||
| WBC count | 7.7 ± 3.1 | 9.1 ± 3.9 | <0.001 | 8.3 ± 3.6 | 7.4 ± 2.4 | 0.003 |
| RBC count | 4.5 ± 0.7 | 4.5 ± 0.5 | 0.165 | 4.5 ± 0.6 | 4.5 ± 0.7 | 0.910 |
| Mean platelet volume | 8.1 ± 0.9 | 8.2 ± 1 | 0.480 | 8.1 ± 1 | 8.1 ± 0.8 | 0.606 |
| Monocyte | 8.9 ± 3.5 | 9 ± 3.7 | 0.716 | 9 ± 3.5 | 8.7 ± 3.8 | 0.424 |
| Eosinophil | 2.6 ± 3.2 | 2.5 ± 2.7 | 0.753 | 2.4 ± 2.5 | 3 ± 4.4 | 0.224 |
| Lymphocyte | 21.9 ± 11.3 | 20.5 ± 13.2 | 0.249 | 21.7 ± 12.2 | 20.5 ± 10.7 | 0.421 |
| Neutrophil/Lymphocyte | 5.3 ± 7.1 | 7.2 ± 11.8 | 0.091 | 5.9 ± 8.6 | 6 ± 10.1 | 0.887 |
| Outcome | ||||||
| AECOPD | | | | 121(29.7%) | 34(33.3%) | 0.479 |
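The table does not say which test produced each p-value. As a hedged check, and not a statement of the authors' method, a Pearson chi-square test without continuity correction reproduces several categorical p-values from the reported counts. Dyspnea in the train versus test data (193 of 407 vs 36 of 102) gives p ≈ 0.028, and the AECOPD outcome row (121 of 407 vs 34 of 102) gives p ≈ 0.479. The denominators (407 train, 102 test, 354 non-AECOPD, 155 AECOPD) are back-calculated from the percentages and are an assumption. The helper name `chi2_2x2` is hypothetical. It uses only the standard library and relies on the identity that, for one degree of freedom, the upper-tail probability of a chi-square statistic x is erfc(sqrt(x / 2)).

```python
import math


def chi2_2x2(a: int, n1: int, b: int, n2: int) -> tuple[float, float]:
    """Pearson chi-square test (no Yates correction) comparing proportions a/n1 and b/n2.

    Builds the 2x2 table [[a, n1 - a], [b, n2 - b]] and returns (statistic, p_value).
    """
    observed = [[a, n1 - a], [b, n2 - b]]
    total = n1 + n2
    row_totals = [n1, n2]
    col_totals = [a + b, total - a - b]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    # One degree of freedom: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Denominators are back-calculated from the table's percentages (assumption).
for label, counts in [
    ("dyspnea, train vs test", (193, 407, 36, 102)),
    ("AECOPD outcome, train vs test", (121, 407, 34, 102)),
    ("cough, non-AECOPD vs AECOPD", (177, 354, 111, 155)),
]:
    stat, p = chi2_2x2(*counts)
    print(f"{label}: chi2 = {stat:.2f}, p = {p:.3g}")
```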
Figure 2. Study framework for the AECOPD risk assessment model and feature engineering.
Performance of the machine learning models for predicting AECOPD on the test data.
| Model | AUC | Threshold | Sensitivity | Specificity | PPV | NPV | F1 Score | Accuracy |
|---|---|---|---|---|---|---|---|---|
| Gradient boosted machines (GBMs) | 0.8326 | 0.2614 | 79.41% | 77.94% | 64.29% | 88.33% | 71.05% | 78.43% |
| Extreme Gradient Boosting (XGBoost) | 0.7703 | 0.2961 | 58.82% | 86.76% | 68.97% | 80.82% | 63.49% | 77.45% |
| Random Forest (RF) | 0.7509 | 0.3237 | 64.71% | 83.82% | 66.67% | 82.61% | 65.68% | 77.45% |
| Support Vector Machine (SVM) | 0.8361 | 0.3913 | 82.35% | 69.12% | 57.14% | 88.68% | 67.47% | 73.53% |
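Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated at each model's threshold, which was selected using Youden's index (J = sensitivity + specificity - 1).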
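Each row of the comparison table can be recomputed from a 2 × 2 confusion matrix. The sketch below assumes 102 test patients, 34 with AECOPD and 68 without, based on the outcome row of the characteristics table. It back-calculates the GBM counts from the reported percentages (TP = 27, FN = 7, TN = 53, FP = 15). These counts are inferred, not reported. The sketch also includes a Youden threshold search. It assumes a case is called positive when its score is at least the threshold, because the authors' exact rule and tie-handling are not stated. `Confusion` and `youden_threshold` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Hypothetical container for a binary confusion matrix."""
    tp: int
    fn: int
    tn: int
    fp: int

    def metrics(self) -> dict[str, float]:
        sens = self.tp / (self.tp + self.fn)  # recall among AECOPD patients
        spec = self.tn / (self.tn + self.fp)  # recall among non-AECOPD patients
        ppv = self.tp / (self.tp + self.fp)
        npv = self.tn / (self.tn + self.fn)
        f1 = 2 * ppv * sens / (ppv + sens)  # harmonic mean of PPV and sensitivity
        acc = (self.tp + self.tn) / (self.tp + self.fn + self.tn + self.fp)
        return {"sensitivity": sens, "specificity": spec, "ppv": ppv,
                "npv": npv, "f1": f1, "accuracy": acc}


def youden_threshold(scores: list[float], labels: list[int]) -> tuple[float, float]:
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    Every observed score is tried as a candidate; score >= threshold counts as positive.
    On ties in J, the lowest threshold found first is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_j = float("nan"), -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < t and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# GBM counts back-calculated from the reported percentages (assumption).
gbm = Confusion(tp=27, fn=7, tn=53, fp=15)
for name, value in gbm.metrics().items():
    print(f"{name}: {value:.2%}")

# Made-up scores, only to exercise the threshold search.
toy_scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.8]
toy_labels = [0, 0, 1, 0, 1, 1]
print(youden_threshold(toy_scores, toy_labels))
```

With these inferred counts, the printed values match the GBM row (79.41%, 77.94%, 64.29%, 88.33%, 71.05%, 78.43%).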
Figure 3. Raincloud plots of the predicted AECOPD scores from each machine learning model.
Figure 4. Receiver operating characteristic (ROC) curves and decision curves of the models for predicting AECOPD. (a) ROC curves; (b) decision curves.
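Both panels rest on quantities that take only a few lines to compute. The AUC equals the probability that a randomly chosen AECOPD patient receives a higher predicted score than a randomly chosen non-AECOPD patient, with ties counted as one half. Decision curves plot the standard net benefit, TP/n - FP/n × pt/(1 - pt), against the threshold probability pt. The sketch below uses a small made-up score list for the AUC. For the net benefit, it uses the back-calculated GBM test counts (TP = 27, FP = 15, n = 102) at pt = 0.2614 as an illustration only, assuming that threshold is on the predicted-probability scale. `auc_rank` and `net_benefit` are hypothetical names.

```python
def auc_rank(scores: list[float], labels: list[int]) -> float:
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(tp: int, fp: int, n: int, pt: float) -> float:
    """Decision-curve net benefit at threshold probability pt."""
    # False positives are weighted by the odds pt / (1 - pt) implied by the threshold.
    return tp / n - fp / n * pt / (1 - pt)


# Made-up scores for the AUC illustration.
toy_scores = [0.1, 0.2, 0.3, 0.35, 0.6, 0.8]
toy_labels = [0, 0, 1, 0, 1, 1]
print(f"toy AUC: {auc_rank(toy_scores, toy_labels):.3f}")

# Back-calculated GBM counts at its reported threshold (assumption).
pt = 0.2614
print(f"GBM net benefit: {net_benefit(27, 15, 102, pt):.4f}")
print(f"treat-all net benefit: {net_benefit(34, 68, 102, pt):.4f}")
print("treat-none net benefit: 0.0")
```

Treating no one has zero net benefit by definition. A model adds value at a given pt when its curve lies above both the treat-all and treat-none reference lines.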
Figure 5. Calibration of the machine learning models for predicting AECOPD, shown with calibration belts. (a) GBM; (b) XGBoost; (c) Random Forest; (d) SVM.
Figure 6. Summary SHapley Additive exPlanations (SHAP) plots. (a) Global feature importance in the final GBM model output; (b) relationship between feature values and AECOPD in the GBM model.
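In the shap library, the bar-type summary plot ranks features by their mean absolute SHAP value across samples, and panel (a) presumably follows that convention. The sketch below shows that aggregation on a made-up 3 × 3 matrix of SHAP values, with rows as patients and columns as features. The values are not from the study. `mean_abs_shap` and the feature names are hypothetical.

```python
def mean_abs_shap(shap_values: list[list[float]],
                  feature_names: list[str]) -> list[tuple[str, float]]:
    """Rank features by mean |SHAP| across samples (rows = samples, columns = features)."""
    n = len(shap_values)
    importance = [
        (name, sum(abs(row[j]) for row in shap_values) / n)
        for j, name in enumerate(feature_names)
    ]
    # Largest average magnitude first, as in a global-importance bar chart.
    return sorted(importance, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP values for 3 patients x 3 features (not from the study).
toy = [[0.30, -0.05, 0.10],
       [-0.20, 0.02, 0.15],
       [0.25, -0.01, -0.12]]
for name, score in mean_abs_shap(toy, ["feature_a", "feature_b", "feature_c"]):
    print(f"{name}: {score:.3f}")
```

Taking absolute values means a feature that pushes some patients' risk up and others' down still counts as important. Panel (b) keeps the sign and plots it against the feature value.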
Figure 7. Local explanation plots for individuals with different AECOPD statuses and GBM model predictions. (a) AECOPD, predicted as AECOPD by the model; (b) non-AECOPD, predicted as non-AECOPD; (c) AECOPD, but predicted as non-AECOPD; (d) non-AECOPD, but predicted as AECOPD. Green and red bars show each feature's contribution to the prediction. Green bars are negative contributions that decrease the predicted value, and red bars are positive contributions that increase it. The x-axis shows the model's predicted value; the y-axis lists the features and their observed values.
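The bars in these local plots are Shapley values. Each one is a feature's contribution, and together they add up to the difference between the individual's prediction and the model's base value. The brute-force sketch below computes exact Shapley values for a hypothetical three-feature toy model by enumerating every coalition. Features outside a coalition are filled in from a single baseline vector. This is a simplification: SHAP explainers typically integrate over a background distribution rather than one reference point. For tree ensembles such as GBMs, the shap library's TreeExplainer also uses a polynomial-time algorithm instead of this exponential enumeration. `shapley_values` and `toy_model` are illustrative, not the study's model.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float], baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x; features outside a coalition take baseline values."""
    m = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(m)]
        return f(z)

    phi = [0.0] * m
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for size in range(m):
            # Shapley weight |S|! (m - |S| - 1)! / m! for coalitions S without feature i.
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z: Sequence[float]) -> float:
    """Hypothetical risk score with one main effect and one interaction."""
    return 0.1 + 0.3 * z[0] + 0.2 * z[1] * z[2]


x = [1.0, 1.0, 1.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
# Efficiency property: base value + sum of contributions equals the prediction.
print("contributions:", [round(v, 4) for v in phi])
print("base + sum:", round(toy_model(baseline) + sum(phi), 4), "prediction:", toy_model(x))
```

Here feature 0 receives its full linear effect (0.3). The interaction 0.2·z1·z2 is split equally between features 1 and 2 (0.1 each), so the base value 0.1 plus the contributions (0.5) reproduces the prediction 0.6.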