
Machine learning models for predicting critical illness risk in hospitalized patients with COVID-19 pneumonia.

Qin Liu¹, Baoguo Pang², Haijun Li³, Bin Zhang⁴, Yumei Liu⁵, Lihua Lai¹, Wenjun Le⁶, Jianyu Li¹, Tingting Xia¹, Xiaoxian Zhang⁷, Changxing Ou⁷, Jianjuan Ma⁸, Shenghao Li¹, Xiumei Guo¹, Shuixing Zhang⁴, Qingling Zhang⁷, Min Jiang⁹, Qingsi Zeng¹.

Abstract

BACKGROUND: To develop machine learning classifiers at admission for predicting which patients with coronavirus disease 2019 (COVID-19) will progress to critical illness.
METHODS: A total of 158 patients with laboratory-confirmed COVID-19 admitted to three designated hospitals between December 31, 2019 and March 31, 2020 were retrospectively included. Twenty-seven clinical and laboratory variables were collected from the medical records, and 201 quantitative CT features of COVID-19 pneumonia were extracted using artificial intelligence software. Critical illness was defined according to the COVID-19 guidelines. Least absolute shrinkage and selection operator (LASSO) logistic regression was used to select predictors of critical illness from the clinical and radiological features, respectively. We then developed clinical and radiological models using eight machine learning classifiers: naive Bayes (NB), linear regression (LR), random forest (RF), extreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), K-nearest neighbor (KNN), kernel support vector machine (k-SVM), and back propagation neural networks (BPNN). A combined model incorporating the selected clinical and radiological features was also developed with the same eight classifiers. Predictive performance was validated using 5-fold cross-validation, and the models were compared by the area under the receiver operating characteristic curve (AUC).
RESULTS: The mean age of all patients was 58.9±13.9 years, and 89 (56.3%) were male. Thirty-five (22.2%) patients deteriorated to critical illness. LASSO analysis selected four clinical features (lymphocyte percentage, lactic dehydrogenase, neutrophil count, and D-dimer) and four quantitative CT features. The XGBoost-based clinical model yielded the highest AUC, 0.960 (95% confidence interval [CI]: 0.913–1.000). The XGBoost-based radiological model achieved an AUC of 0.890 (95% CI: 0.757–1.000). The predictive efficacy of the XGBoost-based combined model was very close to that of the XGBoost-based clinical model, with an AUC of 0.955 (95% CI: 0.906–1.000).
CONCLUSIONS: An XGBoost-based clinical model at admission might be used as an effective tool to identify patients at high risk of critical illness. 2021 Journal of Thoracic Disease. All rights reserved.


Keywords: COVID-19; chest CT; critical illness; machine learning; prediction

Year: 2021 PMID: 33717594 PMCID: PMC7947498 DOI: 10.21037/jtd-20-2580

Source DB: PubMed Journal: J Thorac Dis ISSN: 2072-1439 Impact factor: 2.895

Introduction

The emergence and rapid spread of coronavirus disease 2019 (COVID-19), a potentially fatal disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), is a major and urgent threat to global health. As of July 24, 2020, the World Health Organization (WHO) had reported more than 15.64 million confirmed cases and 636,384 deaths. The clinical spectrum of COVID-19 pneumonia ranges from mild to critical. Most patients with COVID-19 have mild acute respiratory symptoms, such as fever, dry cough, and fatigue, but some rapidly develop fatal complications, including acute respiratory distress syndrome (ARDS) or respiratory failure, multiple organ dysfunction or failure, septic shock, or even death (1). To date, no specific treatment has been recommended for COVID-19 other than meticulous supportive care (2); thus, early identification of patients at high risk of progression to critical illness may allow appropriate supportive treatment to be provided in advance and reduce mortality. Several groups have developed early-warning models that incorporate possible prognostic biomarkers to predict poor outcomes in patients with COVID-19. Ji et al. established a clinical nomogram to predict progression risk in COVID-19 (3). Liu et al. identified patients at elevated risk of severe illness from quantitative computed tomography (CT) features of pneumonia lesions in the early days (4). Liang et al. developed a clinical score consisting of 10 clinical variables at hospital admission to predict which patients with COVID-19 will develop critical illness (5). Yan et al. developed a clinical model based on lactic dehydrogenase (LDH), lymphocytes, and high-sensitivity C-reactive protein (hs-CRP) that predicted the mortality of COVID-19 patients more than 10 days in advance with more than 90% accuracy (6). Dong et al. developed a scoring system based on D-dimer, lymphocytes, and erythrocyte sedimentation rate to predict the severity of COVID-19 (7). Wang et al. constructed a clinical-laboratory model to predict in-hospital mortality of COVID-19 patients (8). However, the role of quantitative CT features has not been fully investigated, and most of these studies relied on conventional statistical methods such as Cox regression and binary logistic regression. Although undeniably successful, these conventional methods may be limited when relationships are nonlinear or candidate predictors are numerous. Machine learning is broadly defined as a body of computational methods/models that use patterns in data to improve performance or make accurate predictions (9). It provides a powerful set of tools for unravelling the relationships between variables and outcomes, particularly when data are nonlinear and complex (10), and it is best applied when there are many variables and overfitting can be a problem for traditional statistical methods (10). The profusion of COVID-19 data calls for machine learning to improve and accelerate disease management (11). Recent studies have demonstrated that machine learning and artificial intelligence (AI), using CT findings or radiomic/deep learning features extracted from CT images, can detect, triage, and assess the severity and prognosis of COVID-19 patients (12-23). Such models might augment human diagnostic performance and show great potential for assisting decision-making in the management of COVID-19 by assessing disease severity and predicting clinical outcomes. Because machine learning is purely data-driven, it is essential to compare multiple models to find the best one for a specific prediction task (24). Therefore, the primary aim of this study was to compare the performance of multiple machine learning models based on clinical, laboratory, and radiological data for predicting critical illness in patients with COVID-19 pneumonia.
Early detection of patients who are likely to develop critical illness is of great importance in clinical settings, as it may help clinicians choose treatment strategies and make better use of limited resources. We present the following article in accordance with the STROBE reporting checklist (available at http://dx.doi.org/10.21037/jtd-20-2580).

Methods

Data sources

The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013) and approved by the Institutional Review Board of the First Affiliated Hospital of Guangzhou Medical University (approval number: 202056); the need for informed consent was waived because of the retrospective nature of the study. The reporting follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) checklist (25). We included laboratory-confirmed hospitalized cases of COVID-19 admitted for treatment to three designated hospitals (Huangpi District Hospital of Traditional Chinese Medicine, Hankou Hospital of Wuhan, and The First Affiliated Hospital of Guangzhou Medical University) between December 31, 2019 and March 31, 2020. COVID-19 was confirmed by real-time reverse transcription-polymerase chain reaction (RT-PCR) assay of nasal and pharyngeal swab specimens (at least two samples taken at least 24 hours apart), following the protocol established by the WHO. Patients aged <18 years, patients without available clinical/CT records, and patients who were critically ill on admission were excluded. Finally, 158 patients with COVID-19 were included: 123 (77.8%) non-critical and 35 (22.2%) critical cases. On admission, clinical data including age, sex, and comorbidities were collected. Laboratory parameters, mainly routine blood tests, coagulation profile, liver and renal function, and myocardial enzymes, were measured at admission. Data in source documents were confirmed independently by two researchers. Figure 1 illustrates the workflow of this study.

Figure 1

The framework of predicting progression to critical illness in COVID-19 patients. The workflow mainly consists of five steps: (1) clinical and laboratory data collection; (2) chest CT image acquisition; (3) AI-based quantitative CT analysis; (4) feature selection; and (5) development of clinical, radiological, and combined models using eight machine learning classifiers. The performance of models was evaluated by receiver operating characteristic curve analysis.

CT image acquisition

All patients underwent chest CT on a 64-slice CT scanner (Siemens Definition AS + 128, Forchheim, Germany). Each patient was scanned from the lung apex to the diaphragm during breath-holds at full end-inspiration and at normal end-expiration; patients were instructed on breath-holding to reduce breathing artifacts. No contrast agent was administered. Acquisition parameters were as follows: tube voltage, 120 kilovolts (kV); tube current, automatic milliampere-seconds (mAs); pitch, 1.2; rotation time, 0.5 s; field of view (FOV), 330 mm × 330 mm. Lung images were reconstructed at a slice thickness of 1.0–1.25 mm using the I50 medium-sharp algorithm. Lung window level and window width were set at −530 to −430 Hounsfield units (HU) and 1,400 to 1,600 HU, respectively.
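Window level and width determine how CT values are mapped to display gray levels. The sketch below assumes the usual linear windowing to 8-bit gray levels, which is a general display convention rather than a detail stated in this study; `window_to_gray` is a hypothetical helper, and the example uses level −530 HU and width 1,400 HU, the lower ends of the stated ranges.

```python
def window_to_gray(hu: float, level: float, width: float) -> int:
    """Map a CT value (HU) to an 8-bit gray level for a display window.

    Values at or below level - width/2 render black (0), values at or above
    level + width/2 render white (255), and values in between scale linearly.
    """
    lower = level - width / 2.0
    upper = level + width / 2.0
    if hu <= lower:
        return 0
    if hu >= upper:
        return 255
    return round((hu - lower) / width * 255)


# Lung window with level -530 HU and width 1,400 HU: visible range -1,230 to 170 HU.
for hu in (-1230, -880, 0, 170):
    print(hu, window_to_gray(hu, level=-530, width=1400))
```

Anything denser than the upper edge (for example, bone) saturates to white, which is why a wide window is used for lung parenchyma.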

Quantitative CT analysis

Quantitative analysis of COVID-19-infected lungs was performed with the care.ai Intelligent Multi-disciplinary Imaging Diagnosis Platform, Intelligent Evaluation System of Chest CT for COVID-19 (YT-CT-Lung, YITU Healthcare Technology Co., Ltd., China). This system uses a multi-scale convolutional neural network with adaptive thresholding and morphological operations to segment the lungs and pneumonia lesions (26,27). By thresholding CT values within the pneumonia lesions, three components were generated: ground-glass opacities (GGO), −1,000 to −500 HU; semi-consolidation, −500 to −250 HU; and consolidation, −250 to 60 HU (4). Pneumonia lesions, GGO, consolidation, and the whole lungs were then quantified from the segmentation results. All images were independently reviewed by two radiologists (with 10 and 20 years of experience in thoracic imaging), and discrepancies were resolved by consensus. A total of 201 quantitative CT features were extracted, as follows:
(I) Volumes of pneumonia lesions, GGO, and consolidation in both lungs, the left lung, the right lung, and the five lobes (n=24).
(II) Volumes and percentages of pneumonia lesions, GGO, and consolidation in the 18 lung segments (n=36).
(III) Percentages of pneumonia volume, GGO volume, and consolidation volume in both lungs, the left lung, the right lung, and each lobe (n=24).
(IV) CT values (mean, standard deviation, median, maximum, interquartile range) of pneumonia lesions, GGO, and consolidation in both lungs, the left lung, and the right lung (n=45).
(V) Hellinger distance, intersection over union (IOU), volume, and CT values (mean, standard deviation, median, maximum, interquartile range) of the total lung, plus volumes and percentages of the whole lung with densities of −1,000 to −700 HU, −700 to −600 HU, −600 to −500 HU, −500 to −300 HU, −300 to −200 HU, −200 to 60 HU, and 60 to 1,000 HU (n=22). Here, the Hellinger distance measures the similarity of two distributions: the closer the value is to 0, the more similar they are. IOU, also called the overlap ratio, is the ratio of the intersection to the union of two distributions; complete overlap gives a ratio of 1.0.
(VI) Hellinger distance, IOU, volume, and CT values (mean, standard deviation, median, maximum, interquartile range) of the left lung, plus volumes and percentages of the left lung in the same seven density bands (n=22).
(VII) Hellinger distance, IOU, volume, and CT values (mean, standard deviation, median, maximum, interquartile range) of the right lung, plus volumes and percentages of the right lung in the same seven density bands (n=22).
(VIII) A score for each of the five lobes, computed as 3 × (volume ratio of consolidation to total lung) + 2 × (volume ratio of GGO to total lung) (n=5), and a total lung score obtained by summing the five lobe scores (n=1).
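The thresholding step and several derived features can be sketched in a few functions. This is a minimal illustration, not the YT-CT-Lung implementation: the HU band edges and the lobe-score formula come from the text, while the band-boundary handling, the function names, and the histogram form of IOU (sum of bin-wise minima over sum of bin-wise maxima) are assumptions, since the text does not state which two distributions are compared.

```python
import math
from typing import Dict, Sequence

# HU bands for lesion components, as stated in the text.
# Assumption: each band includes its lower bound and excludes its upper bound.
BANDS = {
    "ggo": (-1000.0, -500.0),
    "semi_consolidation": (-500.0, -250.0),
    "consolidation": (-250.0, 60.0),
}


def classify_lesion_voxels(hu_values: Sequence[float]) -> Dict[str, int]:
    """Count lesion voxels in each HU band; voxels outside every band are ignored."""
    counts = {name: 0 for name in BANDS}
    for hu in hu_values:
        for name, (low, high) in BANDS.items():
            if low <= hu < high:
                counts[name] += 1
                break
    return counts


def lobe_score(consolidation_vol: float, ggo_vol: float, total_lung_vol: float) -> float:
    """3 x (consolidation / total lung) + 2 x (GGO / total lung), per the text."""
    return 3.0 * consolidation_vol / total_lung_vol + 2.0 * ggo_vol / total_lung_vol


def total_lung_score(lobe_scores: Sequence[float]) -> float:
    """Sum of the five lobe scores."""
    return sum(lobe_scores)


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two normalized histograms: 0 = identical, 1 = disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def histogram_iou(p: Sequence[float], q: Sequence[float]) -> float:
    """Overlap ratio of two histograms: 1.0 when they coincide, 0.0 when disjoint."""
    union = sum(max(a, b) for a, b in zip(p, q))
    return sum(min(a, b) for a, b in zip(p, q)) / union if union else 1.0


print(classify_lesion_voxels([-800, -400, -100, 50, 100]))
print(lobe_score(consolidation_vol=10, ggo_vol=20, total_lung_vol=100))
```

Both similarity measures are bounded, so they can enter a regression model alongside volumes without rescaling.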

Definition of endpoint

We graded the severity of COVID-19 according to the latest COVID-19 guidelines released by the National Health Commission of China (28) and the American Thoracic Society guidelines for community-acquired pneumonia (29). Critical illness was defined as a composite of admission to the intensive care unit (ICU), respiratory failure requiring mechanical ventilation, shock during hospitalization, or death.
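Because the endpoint is a composite, any single component is enough to label a patient as critical. The record type below is hypothetical and only illustrates that logic; it is not the study's data structure.

```python
from dataclasses import dataclass


@dataclass
class HospitalCourse:
    """Hypothetical per-patient record of the four endpoint components."""
    icu_admission: bool
    mechanical_ventilation_for_respiratory_failure: bool
    shock: bool
    death: bool


def is_critical(course: HospitalCourse) -> bool:
    """Composite endpoint: true if any one component occurred."""
    return (
        course.icu_admission
        or course.mechanical_ventilation_for_respiratory_failure
        or course.shock
        or course.death
    )


print(is_critical(HospitalCourse(False, False, True, False)))  # shock alone is sufficient
```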

Feature selection and machine learning model development

Patients in the training dataset were used for feature selection and machine learning model development. Imputation was considered for variables with less than 20% missing values, and five laboratory variables (C-reactive protein, myohemoglobin, creatine kinase, erythrocyte sedimentation rate, and brain natriuretic peptide) with more than 50% missing values were excluded. In total, 27 clinical variables and 201 quantitative CT features entered two separate selection processes. Missing values of numeric features were imputed with the mean. The least absolute shrinkage and selection operator (LASSO) logistic regression algorithm was used to select the most informative predictors from the candidate variables; it mitigates collinearity among variables measured in the same patient and reduces overfitting. The penalty parameter lambda was chosen by 5-fold cross-validation as the largest value whose cross-validated error was within one standard error of the minimum. We first constructed the clinical and radiological models from the clinical and radiological features selected by LASSO, respectively, and then built a combined model from both sets of selected features. Eight machine learning classifiers were used to develop these models for predicting critical illness: Naive Bayes (NB), Linear Regression (LR), Random Forest (RF), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost), K-Nearest Neighbor (KNN), Kernel Support Vector Machine (k-SVM), and Back Propagation Neural Networks (BPNN). The predictive value of the models was validated by 5-fold cross-validation. Classification performance was measured using the area under the curve (AUC), F1 score, accuracy, positive predictive value (PPV), negative predictive value (NPV), sensitivity, and specificity.
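Two of the preprocessing rules above can be made concrete: mean imputation and the one-standard-error rule for choosing lambda. The sketch is illustrative only (the study used R's glmnet, whose `cv.glmnet` reports this choice as `lambda.1se`); the cross-validation errors are invented, and dropping a column at or above the 20% threshold is an assumption, because the text does not say how variables with 20–50% missingness were handled.

```python
import math
import statistics
from typing import List, Optional, Sequence


def impute_column_mean(values: Sequence[Optional[float]], max_missing: float = 0.20) -> Optional[List[float]]:
    """Mean-impute a numeric column, or return None if too many values are missing.

    Assumption: columns with a missing fraction >= max_missing are dropped here.
    """
    n_missing = sum(v is None for v in values)
    if n_missing / len(values) >= max_missing:
        return None
    mean = statistics.fmean(v for v in values if v is not None)
    return [mean if v is None else float(v) for v in values]


def lambda_1se(lambdas: Sequence[float], fold_errors: Sequence[Sequence[float]]) -> float:
    """Largest lambda whose mean CV error is within one standard error of the minimum.

    fold_errors[i] holds the per-fold validation errors for lambdas[i].
    """
    means = [statistics.fmean(errs) for errs in fold_errors]
    ses = [statistics.stdev(errs) / math.sqrt(len(errs)) for errs in fold_errors]
    best = min(range(len(lambdas)), key=lambda i: means[i])
    threshold = means[best] + ses[best]
    return max(lam for lam, m in zip(lambdas, means) if m <= threshold)


# Invented 5-fold errors for four candidate lambdas (illustration only).
lambdas = [0.001, 0.01, 0.1, 1.0]
fold_errors = [
    [0.30, 0.32, 0.28, 0.31, 0.29],
    [0.25, 0.27, 0.23, 0.26, 0.24],
    [0.255, 0.275, 0.235, 0.265, 0.245],
    [0.40, 0.42, 0.38, 0.41, 0.39],
]
print(lambda_1se(lambdas, fold_errors))
```

Here lambda = 0.01 has the lowest mean error, but lambda = 0.1 is within one standard error of it, so the rule picks the larger penalty, which keeps fewer features and favors a sparser model.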
Machine learning models were implemented in open-source Python 3.x and Project Jupyter version 1.2.3 (Anaconda, Inc, https://jupyter.org/about).

Statistical analysis

Categorical variables were expressed as counts and percentages, and continuous variables as mean and standard deviation (SD) or median and interquartile range. All statistical analyses were performed using R software, version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria), with the following packages: “glmnet” for LASSO logistic regression, “xgboost” for XGBoost, “adabag” for AdaBoost, “naivebayes” for NB, “mlr” for LR, “class” for KNN, “randomForest” for RF, “e1071” for SVM, and “nnet” for BPNN. Clinical and laboratory characteristics of non-critical and critical COVID-19 cases were compared using the chi-square test, Fisher’s exact test, or Mann-Whitney U test, as appropriate. AUCs of different models were compared using the DeLong test. P<0.05 was considered significant.
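The DeLong test compares two AUCs estimated on the same patients while accounting for their correlation. Below is a minimal standard-library sketch of the original DeLong formulation (structural components per case and per control, then a z statistic); it is illustrative, not the code used in the study, which does not name the R function it called, and the six-patient scores are invented. In R, `roc.test(..., method = "delong")` from the pROC package performs this comparison.

```python
import math
from typing import List, Sequence, Tuple


def _psi(x: float, y: float) -> float:
    """Mann-Whitney kernel: 1 if the case outranks the control, 0.5 on ties."""
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0


def _components(pos: Sequence[float], neg: Sequence[float]) -> Tuple[List[float], List[float]]:
    """DeLong structural components: V10 (one per case) and V01 (one per control)."""
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a: Sequence[float], b: Sequence[float]) -> float:
    """Sample covariance with an (n - 1) denominator."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(y_true: Sequence[int], scores_a: Sequence[float], scores_b: Sequence[float]):
    """Two-sided DeLong test for two correlated AUCs; returns (auc_a, auc_b, z, p)."""
    def split(scores):
        pos = [s for s, t in zip(scores, y_true) if t == 1]
        neg = [s for s, t in zip(scores, y_true) if t == 0]
        return pos, neg

    v10a, v01a = _components(*split(scores_a))
    v10b, v01b = _components(*split(scores_b))
    auc_a = sum(v10a) / len(v10a)
    auc_b = sum(v10b) / len(v10b)
    m, n = len(v10a), len(v01a)  # numbers of cases and controls
    var = (
        (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
        + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n
    )
    if var <= 0:
        raise ValueError("variance of the AUC difference is zero; the test is undefined")
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return auc_a, auc_b, z, p


y = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
model_b = [0.6, 0.4, 0.7, 0.5, 0.3, 0.8]
print(delong_test(y, model_a, model_b))
```

With so few patients, an AUC gap of 1.0 versus 0.56 is not significant (P≈0.16), which illustrates why small validation sets make model comparisons hard to resolve.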

Results

Clinical characteristics of patients

Among the 158 patients with COVID-19, 123 (77.8%) were non-critical and 35 (22.2%) were critical cases, including 12 deaths during hospitalization. The relatively high rate of critical illness in our study reflects the fact that the First Affiliated Hospital of Guangzhou Medical University admitted only severe/critical cases transferred from other designated hospitals in Guangzhou (10 critical cases were included from this hospital). The mean age of all patients was 58.9±13.9 years (range, 25–95 years), and 89 of 158 patients (56.3%) were male. Fever (72.8%) was the most common symptom, followed by dry cough (67.7%), shortness of breath (48.7%), and fatigue (41.8%). Sixty-seven patients (42.4%) had at least one underlying comorbidity, most commonly hypertension (25.3%), followed by diabetes (13.3%) and heart disease (8.9%). Baseline clinical and laboratory characteristics of non-critically ill and critically ill patients are shown in Table 1.

Table 1

The baseline characteristics and laboratory findings at admission

Characteristic	Non-critical (n=123)	Critical (n=35)	P value
Age (years)	58.2±14.4	61.5±11.7	0.213
Sex
Male	64 (52.0)	25 (71.4)	0.041
Female	59 (48.0)	10 (28.6)	0.041
Comorbidities
COPD	3 (2.4)	2 (5.7)	0.307
Heart disease	10 (8.1)	4 (11.4)	0.513
Hypertension	23 (18.7)	17 (48.6)	<0.001
Diabetes	9 (7.3)	12 (34.3)	<0.001
Malignancy	1 (0.8)	1 (2.9)	0.395
Cerebropathy	3 (2.4)	0	1.000
Others	23 (18.7)	7 (20.0)	0.863
No. of comorbidities
0	80 (65.0)	11 (31.4)	<0.001
1	25 (20.3)	11 (31.4)	0.167
2	13 (10.6)	9 (25.7)	0.049
3	3 (2.4)	2 (5.7)	0.307
4	2 (1.6)	2 (5.7)	0.213
WBC (×10⁹/L)	5.4±2.1	9.2±5.3	<0.001
Neutrophil (×10⁹/L)	4.4±6.2	8.2±5.2	0.002
Neutrophil (%)	67.0±15.4	86.6±6.9	<0.001
Lymphocyte (×10⁹/L)	1.3±1.9	0.6±0.3	0.031
Lymphocyte (%)	22.8±11.3	8.0±4.7	<0.001
Eosinophil (×10⁹/L)	0.1±0.3	0.01±0.02	0.143
Eosinophil (%)	2.3±10.2	0.1±0.2	0.227
Monocyte (×10⁹/L)	0.4±0.2	0.5±0.3	0.186
Monocyte (%)	7.7±3.3	5.1±2.9	0.002
Hemoglobin (g/L)	126.4±19.4	131.9±16.0	0.138
Platelet (×10⁹/L)	216.4±76.7	172.5±56.4	0.001
Fibrinogen (g/L)	3.8±2.0	5.4±1.8	0.002
D-dimer (μg/mL)	20.7±102.3	963.9±2,241.8	0.011
hs-CRP (mg/L)	19.1±19.9	34.8±4.3	<0.001
ALT (U/L)	26.1±17.0	43.0±28.2	0.004
AST (U/L)	25.9±14.2	50.3±34.6	0.001
TBIL (μmol/L)	12.5±7.7	13.4±6.4	0.554
DBIL (μmol/L)	6.1±19.3	6.0±4.2	0.983
ALP (U/L)	67.0±30.6	73.2±36.7	0.436
LDH (U/L)	231.8±105.9	458.4±161.4	<0.001
Procalcitonin (ng/mL)	0.14±0.13	0.52±0.84	0.032
Creatinine (μmol/L)	69.7±17.8	91.0±46.8	0.020
Urea nitrogen (mmol/L)	4.6±1.6	7.6±4.3	0.001

Data are presented as mean ± standard deviation (SD) or number (percentage). P values were calculated by t test, Mann-Whitney U test, χ2 test or Fisher’s exact test, as appropriate. Abbreviations: COPD, chronic obstructive pulmonary disease; CT, computed tomography; WBC, white blood cells; hs-CRP, high-sensitivity C-reactive protein; ALT, alanine transaminase; AST, aspartate aminotransferase; TBIL, total bilirubin; DBIL, direct bilirubin; ALP, alkaline phosphatase; LDH, lactate dehydrogenase.
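As a worked check on Table 1, the P value for sex can be reproduced with a Pearson chi-square test on the 2×2 table of counts (critical: 25 male, 10 female; non-critical: 64 male, 59 female). The sketch assumes no continuity correction, which the text does not state but which reproduces the reported P=0.041; the same function applied to hypertension (17 of 35 vs. 23 of 123) gives P<0.001, also matching the table. `pearson_chi2_2x2` is a hypothetical helper name.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p). With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: critical, non-critical; columns: male, female (counts from Table 1).
chi2_sex, p_sex = pearson_chi2_2x2(25, 10, 64, 59)
# Rows: critical, non-critical; columns: hypertension, no hypertension.
chi2_htn, p_htn = pearson_chi2_2x2(17, 18, 23, 100)
print(f"sex: chi2 = {chi2_sex:.3f}, P = {p_sex:.3f}")  # sex: chi2 = 4.167, P = 0.041
print(f"hypertension: chi2 = {chi2_htn:.2f}, P = {p_htn:.4f}")
```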

Predictors of developing critical illness in COVID-19 patients

A total of 27 clinical and laboratory variables measured at hospital admission (Table 1) were entered into the LASSO regression. After LASSO selection (Figure 2A,B), four variables were retained as predictors of critical illness, ranked by the absolute value of their regression coefficients as lymphocyte percentage, LDH, neutrophil count, and D-dimer (Figure 3A). Most of the 201 quantitative CT features were redundant, and only four were selected (Figure 2C,D); ranked by the absolute value of their regression coefficients, these were pneumonia percentage in the lateral basal segment of the left lower lobe, volume of whole lung with density of −300 to −200 HU, pneumonia volume in both lungs, and pneumonia volume in the right lung (Figure 3B). Figure 4 illustrates the CT findings and clinical parameters in two representative cases of non-critical and critical COVID-19.

Figure 2

Feature selection using the LASSO binary logistic regression model. (A) Tuning parameter (lambda) selection in the LASSO regression using 5-fold cross-validation with the 1-standard-error criterion; four laboratory features with non-zero coefficients were selected. (B) LASSO coefficient profiles of the 27 clinical features. (C) Tuning parameter (lambda) selection in the LASSO regression using 5-fold cross-validation with the 1-standard-error criterion; four quantitative CT features with non-zero coefficients were selected. (D) LASSO coefficient profiles of the 201 radiological features.

Figure 3

Relative importance of the selected clinical (A) and radiological (B) features according to the LASSO regression coefficient.

Figure 4

Two representative cases of non-critical and critical COVID-19 patients. The non-critical case was a 25-year-old female who presented with fever for one day. Her initial chest CT images show GGO and consolidation with crazy paving and an air bronchogram sign in the lateral segment of the right middle lobe (A,B). Laboratory tests showed WBC of 4.3×10⁹/L, neutrophils of 2.7×10⁹/L, lymphocyte count of 1.1×10⁹/L, lymphocyte percentage of 26.1%, D-dimer of 263 µg/mL, and LDH of 47.6 U/L. The critical case was a 58-year-old male who had fever for 10 days and shortness of breath for 3 days. His admission thin-section chest CT images demonstrate extensive GGO and consolidation with crazy paving and bronchial wall thickening in both lungs (C,D). Laboratory findings showed WBC of 10.2×10⁹/L, neutrophils of 9.6×10⁹/L, lymphocyte count of 0.2×10⁹/L, lymphocyte percentage of 2.2%, D-dimer of 1,807 µg/mL, and LDH of 811.7 U/L.

Performance of the clinical model, radiological model, and combined model

Machine learning models were built from the above risk factors for critical illness and validated by internal bootstrap validation. Tables 2–4 and Figure 5 show the predictive performance of the eight classifiers in the clinical, radiological, and combined models. In the validation phase of the clinical model (Table 2 and Figure 5A), the AUCs of the eight classifiers ranged from 0.821 to 0.960, and the AUCs of XGBoost, AdaBoost, RF, LR, and SVM exceeded 0.900. XGBoost showed the highest discriminatory power, with an AUC of 0.960 (95% CI: 0.913–1.000), sensitivity of 100.0% (95% CI: 83.3–100.0%), specificity of 87.8% (95% CI: 75.6–100.0%), accuracy of 90.6% (95% CI: 81.1–98.1%), F1 score of 82.8% (95% CI: 65.9–100.0%), PPV of 70.6% (95% CI: 54.5–100.0%), and NPV of 100.0% (95% CI: 95.1–100.0%). In the validation phase of the radiological model (Table 3 and Figure 5B), the AUCs of all classifiers except BPNN exceeded 0.800. The XGBoost-based model achieved an AUC of 0.890 (95% CI: 0.757–1.000), sensitivity of 91.7% (95% CI: 66.7–100.0%), specificity of 90.2% (95% CI: 75.6–100.0%), accuracy of 90.6% (95% CI: 77.4–96.2%), F1 score of 80.3% (95% CI: 57.1–100.0%), PPV of 71.4% (95% CI: 50.0–100.0%), and NPV of 97.2% (95% CI: 90.7–100.0%). In the validation phase of the combined model (Table 4 and Figure 5C), the AUCs of the eight classifiers ranged from 0.856 to 0.959. The XGBoost-based combined model performed similarly to the XGBoost-based clinical model, with an AUC of 0.955 (95% CI: 0.906–1.000), sensitivity of 100.0% (95% CI: 91.7–100.0%), specificity of 87.8% (95% CI: 75.6–97.6%), accuracy of 90.6% (95% CI: 81.1–98.1%), F1 score of 82.8% (95% CI: 68.4–96.0%), PPV of 70.6% (95% CI: 54.5–92.3%), and NPV of 100.0% (95% CI: 97.1–100.0%). Although the clinical model outperformed the radiological model in predicting critical illness, the difference was not significant (P=0.330). Adding the quantitative CT features to the clinical model yielded no significant improvement (P=0.763).

Table 2

Comparison of clinical model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19

Classifier	AUC (95% CI)	Accuracy, % (95% CI)	F1 score, % (95% CI)	PPV, % (95% CI)	NPV, % (95% CI)	Specificity, % (95% CI)	Sensitivity, % (95% CI)
XGBoost	0.960 (0.913–1.000)	90.6 (81.1–98.1)	82.8 (65.9–100.0)	70.6 (54.5–100.0)	100.0 (95.1–100.0)	87.8 (75.6–100.0)	100.0 (83.3–100.0)
AdaBoost	0.929 (0.857–1.000)	84.9 (71.7–98.1)	75.0 (53.3–100.0)	60.0 (44.4–100.0)	100.0 (91.1–100.0)	80.5 (63.4–100.0)	100.0 (66.7–100.0)
RF	0.959 (0.913–1.000)	90.6 (81.1–98.1)	82.8 (68.4–100.0)	70.6 (54.5–100.0)	100.0 (97.1–100.0)	87.8 (75.6–100.0)	100.0 (91.7–100.0)
LR	0.937 (0.871–1.000)	90.6 (81.1–98.1)	82.8 (68.4–96.0)	70.6 (54.5–92.3)	100.0 (97.1–100.0)	87.8 (75.6–97.6)	100.0 (91.7–100.0)
KNN	0.851 (0.718–0.983)	90.6 (83.0–98.1)	78.9 (55.6–100.0)	83.3 (62.5–100.0)	92.9 (86.7–100.0)	95.1 (87.8–100.0)	75.0 (50.0–100.0)
SVM	0.917 (0.834–1.000)	92.5 (73.6–98.1)	86.5 (57.1–100.0)	81.8 (46.2–100.0)	97.4 (92.7–100.0)	95.1 (65.9–100.0)	91.7 (75.0–100.0)
NB	0.856 (0.734–0.977)	86.8 (77.4–94.3)	74.1 (53.8–94.7)	66.7 (50.0–90.0)	94.9 (87.8–100.0)	87.8 (75.6–97.6)	83.3 (58.3–100.0)
BPNN	0.821 (0.680–0.962)	90.6 (83.0–96.2)	76.6 (52.0–95.7)	90.0 (69.2–100.0)	90.9 (84.8–97.6)	97.6 (92.7–100.0)	66.7 (41.7–91.7)

Table 3

Comparison of radiological model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19

Classifier	AUC (95% CI)	Accuracy, % (95% CI)	F1 score, % (95% CI)	PPV, % (95% CI)	NPV, % (95% CI)	Specificity, % (95% CI)	Sensitivity, % (95% CI)
XGBoost	0.890 (0.757–1.000)	90.6 (77.4–96.2)	80.3 (57.1–100.0)	71.4 (50.0–100.0)	97.2 (90.7–100.0)	90.2 (75.6–100.0)	91.7 (66.7–100.0)
AdaBoost	0.872 (0.743–1.000)	86.8 (71.7–96.2)	77.2 (53.3–95.7)	66.7 (44.4–91.7)	96.5 (89.5–100.0)	87.8 (65.9–97.6)	91.7 (66.7–100.0)
RF	0.878 (0.735–1.000)	88.7 (75.5–96.2)	76.9 (55.6–100.0)	71.4 (47.6–100.0)	95.3 (89.7–100.0)	90.2 (70.7–100.0)	83.3 (66.7–100.0)
LR	0.872 (0.735–1.000)	86.8 (73.6–96.2)	78.6 (54.3–96.0)	68.8 (45.8–92.3)	96.6 (89.7–100.0)	87.8 (68.3–97.6)	91.7 (66.7–100.0)
KNN	0.826 (0.690–0.962)	86.8 (77.4–94.3)	72.4 (50.0–95.2)	70.0 (50.0–90.9)	92.5 (86.0–100.0)	90.2 (80.5–97.6)	75.0 (50.0–100.0)
SVM	0.833 (0.691–0.976)	83.0 (67.9–92.5)	68.3 (47.5–94.7)	57.9 (40.0–90.0)	94.9 (88.2–100.0)	82.9 (61.0–97.6)	83.3 (58.3–100.0)
NB	0.856 (0.734–0.977)	86.8 (77.4–96.2)	74.1 (53.8–94.7)	66.7 (50.0–90.0)	94.9 (88.1–100.0)	87.8 (78.0–97.6)	83.3 (58.3–100.0)
BPNN	0.736 (0.584–0.888)	77.4 (66.0–88.7)	57.1 (37.0–81.1)	50.0 (33.3–72.7)	89.5 (81.8–97.1)	80.5 (68.3–92.7)	66.7 (41.7–91.7)

Table 4

Comparison of combined model based on eight machine learning classifiers in predicting critical illness among patients with COVID-19

Classifier	AUC (95% CI)	Accuracy, % (95% CI)	F1 score, % (95% CI)	PPV, % (95% CI)	NPV, % (95% CI)	Specificity, % (95% CI)	Sensitivity, % (95% CI)
XGBoost	0.955 (0.906–1.000)	90.6 (81.1–98.1)	82.8 (68.4–96.0)	70.6 (54.5–92.3)	100.0 (97.1–100.0)	87.8 (75.6–97.6)	100.0 (91.7–100.0)
AdaBoost	0.955 (0.905–1.000)	92.5 (83.0–98.1)	85.7 (70.4–96.0)	75.0 (57.1–92.3)	100.0 (97.6–100.0)	90.2 (78.0–97.6)	100.0 (91.7–100.0)
RF	0.959 (0.913–1.000)	90.6 (83.0–98.1)	82.8 (70.4–100.0)	70.6 (57.1–100.0)	100.0 (97.6–100.0)	87.8 (78.0–100.0)	100.0 (91.7–100.0)
LR	0.935 (0.870–1.000)	88.7 (79.2–96.2)	80.0 (66.5–95.7)	66.7 (52.2–91.7)	100.0 (97.1–100.0)	85.4 (73.2–97.6)	100.0 (91.7–100.0)
KNN	0.904 (0.792–1.000)	94.3 (86.8–100.0)	87.0 (65.6–100.0)	90.9 (75.0–100.0)	95.2 (88.9–100.0)	97.6 (92.7–100.0)	83.3 (58.3–100.0)
SVM	0.886 (0.780–0.992)	84.9 (73.6–94.3)	74.3 (54.1–100.0)	62.5 (45.5–100.0)	97.1 (90.7–100.0)	82.9 (68.3–100.0)	91.7 (66.7–100.0)
NB	0.873 (0.773–0.973)	84.9 (75.5–94.3)	73.3 (58.1–88.9)	61.1 (47.4–80.0)	97.2 (91.4–100.0)	82.9 (70.7–92.7)	91.7 (75.0–100.0)
BPNN	0.856 (0.734–0.977)	86.8 (77.4–94.3)	74.1 (53.8–94.7)	66.7 (50.0–90.0)	94.9 (88.1–100.0)	87.8 (78.0–97.6)	83.3 (58.3–100.0)

Figure 5

Receiver operating characteristic curve analyses of eight machine learning classifiers in predicting critical illness among COVID-19 patients. (A) clinical model; (B) radiological model; and (C) combined model.

The confusion matrix in our study was given as a 2×2 contingency table that reported the number of true positives, false positives, false negatives, and true negatives. Sensitivity = true positives/(true positives + false negatives) ×100%. Specificity = true negatives/(true negatives + false positives) ×100%. Accuracy = (true positives + true negatives)/n ×100%, where n is the total number of patients. The F1 score is the harmonic mean of precision and recall; its best value is 1.0 and its worst is 0.0. F1 = 2 × (precision × recall)/(precision + recall), where precision = true positives/(true positives + false positives) and recall = true positives/(true positives + false negatives). PPV was the probability that the disease was present when the test was positive (expressed as a percentage). NPV was the probability that the disease was not present when the test was negative (expressed as a percentage). The ROC curve was created by plotting the true positive rate (sensitivity) against the false positive rate (1 - specificity), and AUC values were calculated by varying the predicted probability threshold. 95% CIs were calculated with the bootstrap method (100 iterations). AUC, area under the curve; CI, confidence interval; ROC, receiver operating characteristic; PPV, positive predictive value; NPV, negative predictive value; NB, Naive Bayes; LR, Linear Regression; RF, Random Forest; XGBoost, Extreme Gradient Boosting; AdaBoost, Adaptive Boosting; KNN, K-Nearest Neighbor; k-SVM, Kernel Support Vector Machine; BPNN, Back Propagation Neural Networks.

Receiver operating characteristic curve analyses of eight machine learning classifiers in predicting critical illness among COVID-19 patients. (A) Clinical model; (B) radiological model; (C) combined model.
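The metric definitions in the legend above can be checked with a short sketch. The Python below is illustrative only and is not the authors' code. The names `Confusion`, `metrics`, `roc_auc`, and `bootstrap_auc_ci` are hypothetical. The text states only that 100 bootstrap iterations were used, so the percentile bootstrap below is an assumption. So are its fixed seed and its rule of skipping resamples that contain only one class.

```python
import random
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts from a 2x2 contingency table (hypothetical container)."""
    tp: int
    fp: int
    fn: int
    tn: int


def _div(num: float, den: float) -> float:
    # Undefined ratios (e.g. no predicted positives) are returned as NaN.
    return num / den if den else float("nan")


def metrics(c: Confusion) -> dict:
    """Legend metrics as fractions; multiply by 100 to express as %."""
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = _div(c.tp, c.tp + c.fn)   # recall, true positive rate
    specificity = _div(c.tn, c.tn + c.fp)   # true negative rate
    ppv = _div(c.tp, c.tp + c.fp)           # precision
    npv = _div(c.tn, c.tn + c.fn)
    accuracy = _div(c.tp + c.tn, n)
    # F1 is the harmonic mean of precision (PPV) and recall (sensitivity).
    f1 = _div(2 * ppv * sensitivity, ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}


def roc_auc(y_true: list[int], y_score: list[float]) -> float:
    """Sweep every distinct score as a threshold, trace TPR (sensitivity)
    against FPR (1 - specificity), and integrate with the trapezoidal rule."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("AUC needs both positive and negative cases")
    points = [(0.0, 0.0)]
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    # The lowest threshold labels everyone positive, so the curve ends at (1, 1).
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def bootstrap_auc_ci(y_true, y_score, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed procedure)."""
    rng = random.Random(seed)
    n = len(y_true)
    if not 0 < sum(y_true) < n:
        raise ValueError("need both classes")
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:                            # skip single-class resamples
            aucs.append(roc_auc(yt, [y_score[i] for i in idx]))
    aucs.sort()
    lo = aucs[int(alpha / 2 * n_boot)]
    hi = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` returns 0.75. The trapezoidal rule gives tied positive/negative scores half credit, which matches the usual rank-based definition of the AUC.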

Discussion

In this study, we developed and validated multiple machine learning models to predict the risk of critical illness among patients hospitalized for COVID-19 pneumonia. The clinical model, comprising decreased lymphocyte percentage and increased LDH, neutrophil count, and D-dimer, achieved the highest performance in predicting critical illness in COVID-19 patients, with an AUC of 0.960 (95% CI: 0.913–1.000) and an accuracy of 90.6% (95% CI: 81.1–98.1%). Currently, risk factors associated with a fatal outcome are most often identified from clinical and laboratory parameters. Although COVID-19 more often affects older males with pre-existing comorbidities, these characteristics were not good predictors of progression to critical illness. Previous studies have identified many risk factors related to disease severity or poor prognosis using traditional statistical methods or LASSO regression (3-8). In practice, which predictors are identified depends on the available features, the feature selection method, and the sample size of the study. Our findings showed that lymphocyte percentage, LDH, neutrophil count, and D-dimer were four significant predictors of COVID-19 severity. Lymphocytopenia is a prominent feature of COVID-19 because viral particles target lymphocytes, damaging their cytoplasmic components and destroying them; it may therefore reflect the severity of COVID-19 (2). In this study, lymphocyte percentage appeared to play the most important role in predicting critical illness. In critically ill patients with COVID-19, a rise in LDH indicates greater activity and extent of lung injury (30). Neutrophilia is a biomarker of acute infection: neutrophils are recruited early to sites of infection, where they kill pathogens (bacteria, fungi, and viruses) by oxidative burst and phagocytosis (31).
Some studies support the hypothesis that a little-known yet powerful function of neutrophils, the ability to form neutrophil extracellular traps, may contribute to organ damage and death in COVID-19 (32). Neutrophil count, either alone or as a ratio with lymphocytes, also predicts disease severity in COVID-19 patients (33-35). Elevated D-dimer indicates a hypercoagulable state in patients with COVID-19 and is an independent predictor of requiring critical care support or of in-hospital mortality (36). Our XGBoost-based clinical model, built on these four biomarkers, predicted critical illness in individual patients in advance with an accuracy of more than 90%. Chest CT plays an indispensable role in the detection, diagnosis, and follow-up of COVID-19 pneumonia (37). Visual CT findings such as GGO, consolidation, crazy paving, and bronchial wall thickening are key clues to COVID-19. However, in clinical practice chest CT images are usually interpreted visually by radiologists. This is somewhat subjective and highly variable, cannot quantify disease severity, and is time-consuming and labor-intensive. Many recent studies have used AI algorithms to integrate chest CT findings, with or without other variables such as clinical symptoms, exposure history, and laboratory tests, to diagnose COVID-19 rapidly (15-18,38-54). Other studies have used quantitative CT features derived from artificial intelligence to quantify pneumonia lesions and the risk of poor outcomes in patients with COVID-19 (4,19-22,55-61). In particular, Yin et al. concluded that quantitative CT features were superior to a semiquantitative visual CT score for assessing the severity of COVID-19 (60). Liu et al. found that quantitative CT features on day 0 and day 4 could predict progression to severe illness in COVID-19 patients and outperformed the Acute Physiology and Chronic Health Evaluation II score, the neutrophil-to-lymphocyte ratio, and D-dimer (4). Yu et al. observed that larger consolidation lesions in the upper lung on admission CT increased the risk of poor prognosis in COVID-19 patients (61). In this study, the XGBoost-based radiological model predicted the risk of critical illness with good accuracy. However, it added little to the XGBoost-based clinical model, possibly because the clinical model already performed very well. This study also has some potential limitations. First, it was a retrospective study with a relatively small sample size. Second, all data for machine learning training and validation came from China, which could limit the generalizability of the models to other parts of the world; validation of the proposed models outside China would therefore be helpful. Third, our AI system did not evaluate radiological features extracted by radiologists (such as crazy paving, lymphadenopathy, bronchial wall thickening, and pleural effusion) (38,62,63), which might improve model performance. However, these CT findings are mainly used to diagnose COVID-19, not to predict its outcome. Finally, external validation is needed to establish the generalizability of our machine learning models. External validation was not performed because there were insufficient data for machine learning. Even so, our clinical model might perform well on external data because it is built on four simple, strong predictors that have been established in previous studies.
In conclusion, we identified the XGBoost-based clinical model with lymphocyte percentage, LDH, neutrophil count, and D-dimer as the optimal tool for estimating the risk of critical illness among patients with COVID-19. Early detection of patients likely to develop critical illness is of great importance in clinical settings, as it may help identify patients at risk of rapid deterioration who require high-level monitoring. If a patient's predicted risk of critical illness is low, routine monitoring may be sufficient, whereas high-risk patients might need aggressive treatment or ICU care. However, large-scale prospective studies are warranted to validate the effectiveness of our proposed machine learning models.

References (59 in total)

1. Big data and machine learning algorithms for health-care delivery.

Authors: Kee Yuan Ngiam; Ing Wei Khor
Journal: Lancet Oncol Date: 2019-05

2. Machine Learning-state of the art.

Authors: Mark Nicholls
Journal: Eur Heart J Date: 2019-12-01

3. Development and Validation of a Deep Learning-Based Model Using Computed Tomography Imaging for Predicting Disease Severity of Coronavirus Disease 2019.

Authors: Lu-Shan Xiao; Pu Li; Fenglong Sun; Yanpei Zhang; Chenghai Xu; Hongbo Zhu; Feng-Qin Cai; Yu-Lin He; Wen-Feng Zhang; Si-Cong Ma; Chenyi Hu; Mengchun Gong; Li Liu; Wenzhao Shi; Hong Zhu
Journal: Front Bioeng Biotechnol Date: 2020-07-31

4. Neutrophil-to-lymphocyte ratio predicts critical illness patients with 2019 coronavirus disease in the early stage.

Authors: Jingyuan Liu; Yao Liu; Pan Xiang; Lin Pu; Haofeng Xiong; Chuansheng Li; Ming Zhang; Jianbo Tan; Yanli Xu; Rui Song; Meihua Song; Lin Wang; Wei Zhang; Bing Han; Li Yang; Xiaojing Wang; Guiqin Zhou; Ting Zhang; Ben Li; Yanbin Wang; Zhihai Chen; Xianbo Wang
Journal: J Transl Med Date: 2020-05-20

5. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies.

Authors: Erik von Elm; Douglas G Altman; Matthias Egger; Stuart J Pocock; Peter C Gøtzsche; Jan P Vandenbroucke
Journal: PLoS Med Date: 2007-10-16

6. Automatic distinction between COVID-19 and common pneumonia using multi-scale convolutional neural network on chest CT scans.

Authors: Tao Yan; Pak Kin Wong; Hao Ren; Huaqiao Wang; Jiangtao Wang; Yang Li
Journal: Chaos Solitons Fractals Date: 2020-07-25

7. AI-assisted CT imaging analysis for COVID-19 screening: Building and deploying a medical AI system.

Authors: Bo Wang; Shuo Jin; Qingsen Yan; Haibo Xu; Chuan Luo; Lai Wei; Wei Zhao; Xuexue Hou; Wenshuo Ma; Zhengqing Xu; Zhuozhao Zheng; Wenbo Sun; Lan Lan; Wei Zhang; Xiangdong Mu; Chenxi Shi; Zhongxiao Wang; Jihae Lee; Zijian Jin; Minggui Lin; Hongbo Jin; Liang Zhang; Jun Guo; Benqi Zhao; Zhizhong Ren; Shuhao Wang; Wei Xu; Xinghuan Wang; Jianming Wang; Zheng You; Jiahong Dong
Journal: Appl Soft Comput Date: 2020-11-10

8. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets.

Authors: Stephanie A Harmon; Thomas H Sanford; Sheng Xu; Evrim B Turkbey; Holger Roth; Ziyue Xu; Dong Yang; Andriy Myronenko; Victoria Anderson; Amel Amalou; Maxime Blain; Michael Kassin; Dilara Long; Nicole Varble; Stephanie M Walker; Ulas Bagci; Anna Maria Ierardi; Elvira Stellato; Guido Giovanni Plensich; Giuseppe Franceschelli; Cristiano Girlando; Giovanni Irmici; Dominic Labella; Dima Hammoud; Ashkan Malayeri; Elizabeth Jones; Ronald M Summers; Peter L Choyke; Daguang Xu; Mona Flores; Kaku Tamura; Hirofumi Obinata; Hitoshi Mori; Francesca Patella; Maurizio Cariati; Gianpaolo Carrafiello; Peng An; Bradford J Wood; Baris Turkbey
Journal: Nat Commun Date: 2020-08-14

9. Characteristics of and Important Lessons From the Coronavirus Disease 2019 (COVID-19) Outbreak in China: Summary of a Report of 72 314 Cases From the Chinese Center for Disease Control and Prevention.

Authors: Zunyou Wu; Jennifer M McGoogan
Journal: JAMA Date: 2020-04-07
