Qin Liu, Baoguo Pang, Haijun Li, Bin Zhang, Yumei Liu, Lihua Lai, Wenjun Le, Jianyu Li, Tingting Xia, Xiaoxian Zhang, Changxing Ou, Jianjuan Ma, Shenghao Li, Xiumei Guo, Shuixing Zhang, Qingling Zhang, Min Jiang, Qingsi Zeng.
Abstract
BACKGROUND: To develop machine learning classifiers that predict, at admission, which patients with coronavirus disease 2019 (COVID-19) will progress to critical illness.
Keywords: COVID-19; chest CT; critical illness; machine learning; prediction
Year: 2021 PMID: 33717594 PMCID: PMC7947498 DOI: 10.21037/jtd-20-2580
Source DB: PubMed Journal: J Thorac Dis ISSN: 2072-1439 Impact factor: 2.895
Figure 1. The framework for predicting progression to critical illness in COVID-19 patients. The workflow consists of five steps: (1) clinical and laboratory data collection; (2) chest CT image acquisition; (3) AI-based quantitative CT analysis; (4) feature selection; and (5) development of clinical, radiological, and combined models using eight machine learning classifiers. Model performance was evaluated by receiver operating characteristic curve analysis.
The baseline characteristics and laboratory findings at admission
| Characteristic | Non-critical (n=123) | Critical (n=35) | P value |
|---|---|---|---|
| Age (years) | 58.2±14.4 | 61.5±11.7 | 0.213 |
| Sex | |||
| Male | 64 (52.0) | 25 (71.4) | 0.041 |
| Female | 59 (48.0) | 10 (28.6) | |
| Comorbidities | |||
| COPD | 3 (2.4) | 2 (5.7) | 0.307 |
| Heart disease | 10 (8.1) | 4 (11.4) | 0.513 |
| Hypertension | 23 (18.7) | 17 (48.6) | <0.001 |
| Diabetes | 9 (7.3) | 12 (34.3) | <0.001 |
| Malignancy | 1 (0.8) | 1 (2.9) | 0.395 |
| Cerebropathy | 3 (2.4) | 0 | 1.000 |
| Others | 23 (18.7) | 7 (20.0) | 0.863 |
| No. of comorbidities | |||
| 0 | 80 (65.0) | 11 (31.4) | <0.001 |
| 1 | 25 (20.3) | 11 (31.4) | 0.167 |
| 2 | 13 (10.6) | 9 (25.7) | 0.049 |
| 3 | 3 (2.4) | 2 (5.7) | 0.307 |
| 4 | 2 (1.6) | 2 (5.7) | 0.213 |
| WBC (×10⁹/L) | 5.4±2.1 | 9.2±5.3 | <0.001 |
| Neutrophil (×10⁹/L) | 4.4±6.2 | 8.2±5.2 | 0.002 |
| Neutrophil (%) | 67.0±15.4 | 86.6±6.9 | <0.001 |
| Lymphocyte (×10⁹/L) | 1.3±1.9 | 0.6±0.3 | 0.031 |
| Lymphocyte (%) | 22.8±11.3 | 8.0±4.7 | <0.001 |
| Eosinophil (×10⁹/L) | 0.1±0.3 | 0.01±0.02 | 0.143 |
| Eosinophil (%) | 2.3±10.2 | 0.1±0.2 | 0.227 |
| Monocyte (×10⁹/L) | 0.4±0.2 | 0.5±0.3 | 0.186 |
| Monocyte (%) | 7.7±3.3 | 5.1±2.9 | 0.002 |
| Hemoglobin (g/L) | 126.4±19.4 | 131.9±16.0 | 0.138 |
| Platelet (×10⁹/L) | 216.4±76.7 | 172.5±56.4 | 0.001 |
| Fibrinogen (g/L) | 3.8±2.0 | 5.4±1.8 | 0.002 |
| D-dimer (μg/mL) | 20.7±102.3 | 963.9±2,241.8 | 0.011 |
| hs-CRP (mg/L) | 19.1±19.9 | 34.8±4.3 | <0.001 |
| ALT (U/L) | 26.1±17.0 | 43.0±28.2 | 0.004 |
| AST (U/L) | 25.9±14.2 | 50.3±34.6 | 0.001 |
| TBIL (μmol/L) | 12.5±7.7 | 13.4±6.4 | 0.554 |
| DBIL (μmol/L) | 6.1±19.3 | 6.0±4.2 | 0.983 |
| ALP (U/L) | 67.0±30.6 | 73.2±36.7 | 0.436 |
| LDH (U/L) | 231.8±105.9 | 458.4±161.4 | <0.001 |
| Procalcitonin (ng/mL) | 0.14±0.13 | 0.52±0.84 | 0.032 |
| Creatinine (μmol/L) | 69.7±17.8 | 91.0±46.8 | 0.020 |
| Urea nitrogen (mmol/L) | 4.6±1.6 | 7.6±4.3 | 0.001 |
Data are presented as mean ± standard deviation (SD) or number (percentage). P values were calculated by t test, Mann-Whitney U test, χ² test, or Fisher's exact test, as appropriate. Abbreviations: COPD, chronic obstructive pulmonary disease; CT, computed tomography; WBC, white blood cells; hs-CRP, high-sensitivity C-reactive protein; ALT, alanine transaminase; AST, aspartate aminotransferase; TBIL, total bilirubin; DBIL, direct bilirubin; ALP, alkaline phosphatase; LDH, lactate dehydrogenase.
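
As a check on how the categorical P values in this table can be reproduced, the sketch below applies Pearson's χ² test to the sex-by-group counts listed above. It assumes no continuity correction, which the source does not specify, and the function name `chi2_2x2` is illustrative. Only the Python standard library is used.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test for a 2x2 table [[a, b], [c, d]].

    No continuity correction is applied (an assumption, not stated in the source).
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square upper tail reduces to erfc.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Sex by group, taken from the table: male 64 vs 25, female 59 vs 10.
stat, p = chi2_2x2(64, 25, 59, 10)
print(f"chi2 = {stat:.3f}, p = {p:.3f}")
```

Under these assumptions the computed P value rounds to 0.041, matching the value reported for sex.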
Figure 2. Feature selection using the LASSO binary logistic regression model. (A) The tuning parameter (lambda) in the LASSO regression was selected by 5-fold cross-validation using the 1-standard-error criterion; four laboratory features with non-zero coefficients were selected. (B) LASSO coefficient profiles of the 27 clinical features. (C) The tuning parameter (lambda) in the LASSO regression was selected by 5-fold cross-validation using the 1-standard-error criterion; four quantitative CT features with non-zero coefficients were selected. (D) LASSO coefficient profiles of the 201 radiological features.
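
The 1-standard-error criterion named in panels (A) and (C) can be sketched as follows: among the candidate lambda values, take the largest one whose mean cross-validated error lies within one standard error of the minimum. The lambda grid and fold errors below are made-up illustrative numbers, not data from the study, and `one_se_lambda` is a hypothetical helper rather than the authors' code.

```python
import math
from statistics import mean, stdev

def one_se_lambda(lambdas, fold_errors):
    """Select lambda by the 1-standard-error rule.

    lambdas: candidate penalty values.
    fold_errors: fold_errors[i] holds the per-fold CV errors for lambdas[i].
    Returns the largest lambda whose mean CV error is within one standard
    error of the smallest mean CV error.
    """
    means = [mean(errs) for errs in fold_errors]
    ses = [stdev(errs) / math.sqrt(len(errs)) for errs in fold_errors]
    best = min(range(len(lambdas)), key=lambda i: means[i])
    threshold = means[best] + ses[best]
    eligible = [lam for lam, m in zip(lambdas, means) if m <= threshold]
    return max(eligible)

# Hypothetical 5-fold CV errors for four candidate lambdas.
lambdas = [0.01, 0.03, 0.10, 0.30]
fold_errors = [
    [0.62, 0.58, 0.66, 0.60, 0.64],  # mean 0.62
    [0.55, 0.60, 0.58, 0.52, 0.65],  # mean 0.58 (minimum)
    [0.60, 0.63, 0.57, 0.59, 0.61],  # mean 0.60, within 1 SE of the minimum
    [0.80, 0.78, 0.84, 0.79, 0.79],  # mean 0.80
]
chosen = one_se_lambda(lambdas, fold_errors)
print(chosen)
```

Choosing the larger lambda trades a small loss in cross-validated fit for a sparser model, which is consistent with only four features surviving in each panel.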
Figure 3. Relative importance of the selected clinical (A) and radiological (B) features according to the LASSO regression coefficients.
Figure 4. Two representative cases of non-critical and critical COVID-19 patients. The non-critical case was a 25-year-old female who presented with fever for one day. Her initial chest CT images show GGO and consolidation with crazy paving and air bronchogram sign in the lateral segment of the right middle lobe (A,B). The laboratory tests show WBC of 4.3×10⁹/L, neutrophil count of 2.7×10⁹/L, lymphocyte count of 1.1×10⁹/L, lymphocyte percentage of 26.1%, D-dimer of 263 µg/mL, and LDH of 47.6 U/L. The critical case was a 58-year-old male who had fever for 10 days and shortness of breath for 3 days. The admission thin-section chest CT images demonstrate extensive GGO and consolidation with crazy paving and bronchial wall thickening in both lungs (C,D). The laboratory findings show WBC of 10.2×10⁹/L, neutrophil count of 9.6×10⁹/L, lymphocyte count of 0.2×10⁹/L, lymphocyte percentage of 2.2%, D-dimer of 1,807 µg/mL, and LDH of 811.7 U/L.
Comparison of clinical model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19
| Classifiers | AUC (95% CI) | Accuracy% (95% CI) | F1 score (95% CI) | PPV% (95% CI) | NPV% (95% CI) | Specificity% (95% CI) | Sensitivity% (95% CI) |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.960 (0.913–1.000) | 90.6 (81.1–98.1) | 82.8 (65.9–100.0) | 70.6 (54.5–100.0) | 100.0 (95.1–100.0) | 87.8 (75.6–100.0) | 100.0 (83.3–100.0) |
| AdaBoost | 0.929 (0.857–1.000) | 84.9 (71.7–98.1) | 75.0 (53.3–100.0) | 60.0 (44.4–100.0) | 100.0 (91.1–100.0) | 80.5 (63.4–100.0) | 100.0 (66.7–100.0) |
| RF | 0.959 (0.913–1.000) | 90.6 (81.1–98.1) | 82.8 (68.4–100.0) | 70.6 (54.5–100.0) | 100.0 (97.1–100.0) | 87.8 (75.6–100.0) | 100.0 (91.7–100.0) |
| LR | 0.937 (0.871–1.000) | 90.6 (81.1–98.1) | 82.8 (68.4–96.0) | 70.6 (54.5–92.3) | 100.0 (97.1–100.0) | 87.8 (75.6–97.6) | 100.0 (91.7–100.0) |
| KNN | 0.851 (0.718–0.983) | 90.6 (83.0–98.1) | 78.9 (55.6–100.0) | 83.3 (62.5–100.0) | 92.9 (86.7–100.0) | 95.1 (87.8–100.0) | 75.0 (50.0–100.0) |
| SVM | 0.917 (0.834–1.000) | 92.5 (73.6–98.1) | 86.5 (57.1–100.0) | 81.8 (46.2–100.0) | 97.4 (92.7–100.0) | 95.1 (65.9–100.0) | 91.7 (75.0–100.0) |
| NB | 0.856 (0.734–0.977) | 86.8 (77.4–94.3) | 74.1 (53.8–94.7) | 66.7 (50.0–90.0) | 94.9 (87.8–100.0) | 87.8 (75.6–97.6) | 83.3 (58.3–100.0) |
| BPNN | 0.821 (0.680–0.962) | 90.6 (83.0–96.2) | 76.6 (52.0–95.7) | 90.0 (69.2–100.0) | 90.9 (84.8–97.6) | 97.6 (92.7–100.0) | 66.7 (41.7–91.7) |
The confusion matrix in our study was a 2×2 contingency table reporting the numbers of true positives, false positives, false negatives, and true negatives. Sensitivity = true positives/(true positives + false negatives) × 100%. Specificity = true negatives/(true negatives + false positives) × 100%. Accuracy = (true positives + true negatives)/n × 100%. The F1 score is the harmonic mean of precision and recall, F1 = 2 × (precision × recall)/(precision + recall), where precision = true positives/(true positives + false positives) and recall = true positives/(true positives + false negatives); its best value is 1.0 and its worst value is 0.0 (it is tabulated here as a percentage). PPV is the probability that the disease is present when the test is positive, and NPV is the probability that the disease is absent when the test is negative (both expressed as percentages). The ROC curve was created by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) as the predicted probability threshold was varied, and the AUC was calculated from this curve. The 95% CIs were calculated with the bootstrap method (100 iterations). AUC, area under the curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; NB, Naive Bayes; LR, Linear Regression; RF, Random Forest; XGBoost, Extreme Gradient Boosting; AdaBoost, Adaptive Boosting; KNN, K-Nearest Neighbor; SVM, Kernel Support Vector Machine (k-SVM); BPNN, Back Propagation Neural Networks.
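
The metric definitions in this footnote can be made concrete with a short sketch. The confusion-matrix counts used here (12 true positives, 5 false positives, 0 false negatives, 36 true negatives) are not reported in the source; they are inferred from the percentages in the XGBoost row of the table above, assuming a 53-patient test set, so they should be read as illustrative. The function name is hypothetical.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the tabulated metrics, as percentages, from a 2x2 confusion matrix.

    Assumes every denominator is non-zero; no guard against empty classes.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # also called recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # also called precision
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / n
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    raw = {
        "accuracy": accuracy,
        "f1": f1,
        "ppv": ppv,
        "npv": npv,
        "specificity": specificity,
        "sensitivity": sensitivity,
    }
    return {name: round(100 * value, 1) for name, value in raw.items()}

# Counts inferred from the reported percentages (an assumption, see above).
m = classification_metrics(tp=12, fp=5, fn=0, tn=36)
print(m)
```

With these inferred counts the sketch returns accuracy 90.6, F1 82.8, PPV 70.6, NPV 100.0, specificity 87.8, and sensitivity 100.0, which agree with the point estimates in the XGBoost row.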
Comparison of radiological model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19
| Classifiers | AUC (95% CI) | Accuracy% (95% CI) | F1 score (95% CI) | PPV% (95% CI) | NPV% (95% CI) | Specificity% (95% CI) | Sensitivity% (95% CI) |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.890 (0.757–1.000) | 90.6 (77.4–96.2) | 80.3 (57.1–100.0) | 71.4 (50.0–100.0) | 97.2 (90.7–100.0) | 90.2 (75.6–100.0) | 91.7 (66.7–100.0) |
| AdaBoost | 0.872 (0.743–1.000) | 86.8 (71.7–96.2) | 77.2 (53.3–95.7) | 66.7 (44.4–91.7) | 96.5 (89.5–100.0) | 87.8 (65.9–97.6) | 91.7 (66.7–100.0) |
| RF | 0.878 (0.735–1.000) | 88.7 (75.5–96.2) | 76.9 (55.6–100.0) | 71.4 (47.6–100.0) | 95.3 (89.7–100.0) | 90.2 (70.7–100.0) | 83.3 (66.7–100.0) |
| LR | 0.872 (0.735–1.000) | 86.8 (73.6–96.2) | 78.6 (54.3–96.0) | 68.8 (45.8–92.3) | 96.6 (89.7–100.0) | 87.8 (68.3–97.6) | 91.7 (66.7–100.0) |
| KNN | 0.826 (0.690–0.962) | 86.8 (77.4–94.3) | 72.4 (50.0–95.2) | 70.0 (50.0–90.9) | 92.5 (86.0–100.0) | 90.2 (80.5–97.6) | 75.0 (50.0–100.0) |
| SVM | 0.833 (0.691–0.976) | 83.0 (67.9–92.5) | 68.3 (47.5–94.7) | 57.9 (40.0–90.0) | 94.9 (88.2–100.0) | 82.9 (61.0–97.6) | 83.3 (58.3–100.0) |
| NB | 0.856 (0.734–0.977) | 86.8 (77.4–96.2) | 74.1 (53.8–94.7) | 66.7 (50.0–90.0) | 94.9 (88.1–100.0) | 87.8 (78.0–97.6) | 83.3 (58.3–100.0) |
| BPNN | 0.736 (0.584–0.888) | 77.4 (66.0–88.7) | 57.1 (37.0–81.1) | 50.0 (33.3–72.7) | 89.5 (81.8–97.1) | 80.5 (68.3–92.7) | 66.7 (41.7–91.7) |
Metric definitions, the bootstrap procedure for the 95% CIs, and abbreviations are the same as in the footnote to the clinical model table above.
Comparison of combined model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19
| Classifiers | AUC (95% CI) | Accuracy% (95% CI) | F1 score (95% CI) | PPV% (95% CI) | NPV% (95% CI) | Specificity% (95% CI) | Sensitivity% (95% CI) |
|---|---|---|---|---|---|---|---|
| XGBoost | 0.955 (0.906–1.000) | 90.6 (81.1–98.1) | 82.8 (68.4–96.0) | 70.6 (54.5–92.3) | 100.0 (97.1–100.0) | 87.8 (75.6–97.6) | 100.0 (91.7–100.0) |
| AdaBoost | 0.955 (0.905–1.000) | 92.5 (83.0–98.1) | 85.7 (70.4–96.0) | 75.0 (57.1–92.3) | 100.0 (97.6–100.0) | 90.2 (78.0–97.6) | 100.0 (91.7–100.0) |
| RF | 0.959 (0.913–1.000) | 90.6 (83.0–98.1) | 82.8 (70.4–100.0) | 70.6 (57.1–100.0) | 100.0 (97.6–100.0) | 87.8 (78.0–100.0) | 100.0 (91.7–100.0) |
| LR | 0.935 (0.870–1.000) | 88.7 (79.2–96.2) | 80.0 (66.5–95.7) | 66.7 (52.2–91.7) | 100.0 (97.1–100.0) | 85.4 (73.2–97.6) | 100.0 (91.7–100.0) |
| KNN | 0.904 (0.792–1.000) | 94.3 (86.8–100.0) | 87.0 (65.6–100.0) | 90.9 (75.0–100.0) | 95.2 (88.9–100.0) | 97.6 (92.7–100.0) | 83.3 (58.3–100.0) |
| SVM | 0.886 (0.780–0.992) | 84.9 (73.6–94.3) | 74.3 (54.1–100.0) | 62.5 (45.5–100.0) | 97.1 (90.7–100.0) | 82.9 (68.3–100.0) | 91.7 (66.7–100.0) |
| NB | 0.873 (0.773–0.973) | 84.9 (75.5–94.3) | 73.3 (58.1–88.9) | 61.1 (47.4–80.0) | 97.2 (91.4–100.0) | 82.9 (70.7–92.7) | 91.7 (75.0–100.0) |
| BPNN | 0.856 (0.734–0.977) | 86.8 (77.4–94.3) | 74.1 (53.8–94.7) | 66.7 (50.0–90.0) | 94.9 (88.1–100.0) | 87.8 (78.0–97.6) | 83.3 (58.3–100.0) |
Metric definitions, the bootstrap procedure for the 95% CIs, and abbreviations are the same as in the footnote to the clinical model table above.
Figure 5. Receiver operating characteristic curve analyses of eight machine learning classifiers in predicting critical illness among COVID-19 patients. (A) clinical model; (B) radiological model; and (C) combined model.
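
For readers who want to see how an AUC and a bootstrap confidence interval of the kind reported for these curves can be obtained, here is a minimal sketch. The labels and scores are synthetic, and the percentile method, the resampling of whole patients, and the fixed seed are assumptions; the source states only that the bootstrap was run with 100 iterations.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (equivalent to the area under
    the ROC curve)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=100, alpha=0.05, seed=0):
    """Approximate percentile bootstrap CI for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that lack one of the classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Synthetic example: 1 = critical, 0 = non-critical.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 1]
scores = [0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.70, 0.75, 0.80, 0.90, 0.95]
point = auc(labels, scores)
lower, upper = bootstrap_ci(labels, scores)
print(f"AUC = {point:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only 100 resamples and a small test set, the interval endpoints are coarse, which may explain why several upper bounds in the tables sit exactly at 1.000.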