Machine learning-based prognostic modeling using clinical data and quantitative radiomic features from chest CT images in COVID-19 patients.

Isaac Shiri¹, Majid Sorouri², Parham Geramifar³, Mostafa Nazari⁴, Mohammad Abdollahi², Yazdan Salimi¹, Bardia Khosravi², Dariush Askari⁵, Leila Aghaghazvini⁶, Ghasem Hajianfar⁷, Amir Kasaeian⁸, Hamid Abdollahi⁹, Hossein Arabi¹, Arman Rahmim¹⁰, Amir Reza Radmard¹¹, Habib Zaidi¹².

Abstract

OBJECTIVE: To develop prognostic models for survival (alive or deceased status) prediction of COVID-19 patients using clinical data (demographics and history, laboratory tests, visual scoring by radiologists) and lung/lesion radiomic features extracted from chest CT images.
METHODS: Overall, 152 patients were enrolled in this study protocol. These were divided into 106 training/validation and 46 test datasets (untouched during training), respectively. Radiomic features were extracted from the segmented lungs and infectious lesions separately from chest CT images. Clinical data, including patients' history and demographics, laboratory tests and radiological scores were also collected. Univariate analysis was first performed (q-value reported after false discovery rate (FDR) correction) to determine the most predictive features among all imaging and clinical data. Prognostic modeling of survival was performed using radiomic features and clinical data, separately or in combination. Maximum relevance minimum redundancy (MRMR) and XGBoost were used for feature selection and classification. The receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC), sensitivity, specificity, and accuracy were used to assess the prognostic performance of the models on the test datasets.
RESULTS: For clinical data, cancer comorbidity (q-value < 0.01), consciousness level (q-value < 0.05) and the radiological score of involved zones (q-value < 0.02) were found to be highly correlated with outcome. Oxygen saturation (AUC = 0.73, q-value < 0.01) and blood urea nitrogen (AUC = 0.72, q-value = 0.72) were identified as highly prognostic clinical features. For lung radiomic features, SAHGLE (AUC = 0.70) and HGLZE (AUC = 0.67) from GLSZM were identified as the most prognostic features. Amongst lesion radiomic features, RLNU from GLRLM (AUC = 0.73) and HGLZE from GLSZM (AUC = 0.73) had the highest performance. In multivariate analysis, combining lung, lesion and clinical features provided the most accurate prognostic model (AUC = 0.95 ± 0.029 (95% CI: 0.95-0.96), accuracy = 0.88 ± 0.046 (95% CI: 0.88-0.89), sensitivity = 0.88 ± 0.066 (95% CI: 0.87-0.90) and specificity = 0.89 ± 0.07 (95% CI: 0.87-0.90)).
CONCLUSION: Combination of radiomic features and clinical data can effectively predict outcome in COVID-19 patients. The developed model has significant potential for improved management of COVID-19 patients.

Keywords: COVID-19; Computed tomography (CT); Modeling; Prognosis; Radiomics

Year: 2021 PMID: 33691201 PMCID: PMC7925235 DOI: 10.1016/j.compbiomed.2021.104304

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease (COVID-19) has significantly impacted global health and continues to be a major global concern as the number of infected patients and mortality are still rapidly growing [1,2]. The first-line approach to diagnosing COVID-19 is a molecular diagnostic method, the real-time quantitative reverse transcription-polymerase chain reaction (qPCR) assay [3]. In addition, X-ray computed tomography (CT) has garnered much clinical and research interest for the management of COVID-19 patients [4]. Several studies have compared the two diagnostic methods and documented their benefits and limitations. For instance, some of these studies reported that qPCR has variable sensitivity for different biological samples, while CT was unable to detect small infected lung regions [3–5]. Furthermore, predicting patients' prognosis from early findings in the course of the disease has been an area of active research [6,7].
The emerging field of radiomics provides a reliable, non-invasive and cost-effective approach to improve diagnosis, prognosis and therapy response prediction in a number of diseases [8–15]. Radiomics is an image data mining framework that extracts extensive information from medical images using a wide range of features, which are then correlated with clinical and biological findings [8,9,16–22]. Furthermore, radiomic studies can be used to provide differential diagnosis [23], and CT image radiomics are increasingly utilized for this purpose. Yanling et al. [24] developed a radiomics nomogram incorporating CT radiomic signatures and laboratory data for differentiating bacterial pneumonia from acute paraquat lung injury. In another study, Wang et al. [25] applied CT radiomics for differential diagnosis of progressive pulmonary tuberculosis from community-acquired pneumonia. Their radiomics model outperformed senior radiologists' clinical judgment [25].
A number of studies applied deep or machine learning algorithms for COVID-19 outbreak prediction, detection/segmentation of infected pneumonia regions from radiologic images, as well as new drug development and disease screening [26–35]. In diagnostic studies, artificial intelligence approaches have been applied to various medical imaging modalities, including radiography, ultrasound and CT, to build more accurate detection/diagnostic models [36,37]. For the specific case of CT, a number of radiomic studies have been conducted for detection, including distinguishing COVID-19 from other lung infections, and for prediction of hospital stay. In these studies, CT radiomic features and machine learning algorithms were used to develop and implement such models. Qi et al. [4] studied 52 COVID-19 patients for predicting hospital stay. CT radiomic features and machine learning algorithms, including logistic regression and Random Forest, were employed, and the models exhibited area under the receiver operating characteristic (ROC) curve (AUC) values of 0.97 and 0.92 for logistic regression and Random Forest, respectively. The detection radiomic models developed by Guiot et al. [38] achieved a sensitivity and specificity of 78.9% and 91.1%, respectively, whereas the radiomics signature to detect COVID-19 from CT images developed by Fang et al. [39] achieved an AUC of 0.82 on the test sets. Studies utilizing radiomics with machine learning or deep learning have indicated that these approaches, alone or in combination with clinical information, have the potential to serve as substitutes for diagnosis, prognosis and therapy response evaluation [9,11].
In the present study, we aimed to develop prognostic models to predict survival (alive or deceased status) in COVID-19 patients. Specifically, we aimed to develop various prognostic models using CT radiomic features, clinical data (demographics and history, and laboratory tests) and radiological scores obtained from radiologists' reports.

Materials and methods

Fig. 1 summarizes the various steps involved in the study design. The Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) [40] checklist is reported in Supplemental Table 1.

Fig. 1

Flowchart of the adopted study protocol.

Patient population

This retrospective study was conducted with institutional review board approval. Formal written consent was waived owing to the nature of the study. All patients admitted to our tertiary center between February 11, 2020 and June 20, 2020 were enrolled in our study protocol. First, we collected a COVID-19 dataset by applying a set of inclusion and exclusion criteria. Our inclusion criteria were as follows: 1) a high-quality CT scan, 2) confirmation of COVID-19 by qPCR, 3) visible infected regions in the lungs, and 4) availability of clinical and radiological data and reports. The inclusion and exclusion criteria are presented in Fig. 2. The median interval between symptom onset and hospital admission was 6 days. All patients received one of the two standard treatment regimens administered in the hospital: hydroxychloroquine or lopinavir/ritonavir. All clinical, laboratory and imaging features were extracted on the first day of admission.

Fig. 2

Inclusion and exclusion criteria followed in the study protocol.

Clinical data

Demographics, history and clinical data

Gender, age, weight, height, BMI, past medical history of comorbidities (i.e. diabetes, hypertension, ischemic heart disease, and cancer), history of smoking, initial vital signs, including respiratory rate (RR), O2 saturation (O2Sat), pulse rate (PR), systolic blood pressure (SBP), diastolic blood pressure (DBP) and temperature in degrees Celsius (T), and the level of consciousness were obtained and recorded [6,41–43].

Laboratory data

Upon admission, the results of laboratory tests were extracted from patient medical records. These included aspartate aminotransferase (AST) in U/L, alanine aminotransferase (ALT) in U/L, alkaline phosphatase (ALP) in U/L, total and direct bilirubin (T.Bill and D.Bill, respectively) in mg/dL, hemoglobin (HB) in g/dL, white blood cells (WBC) per mm³, venous blood gas analysis of acidity (PH), carbon dioxide concentration (PCO2) and bicarbonate concentration (HCO3), C-reactive protein (CRP) in mg/L, platelet count (Plt) per mm³, blood creatinine level (Cr) in mg/dL, blood urea nitrogen (BUN) in mg/dL, prothrombin time (PT) in seconds, partial thromboplastin time (PTT) in seconds, prothrombin time normalized as the international normalized ratio (INR), procalcitonin level (PCT) in ng/dL, and sodium and potassium (Na and K, respectively) in mEq/L. We also used the differential counts of neutrophils, lymphocytes, monocytes and eosinophils in percentages (Neutr.Diff, Lymph.Diff, Mono.Diff and Eosin.Diff, respectively) [6,41–43].

Radiological scores

To obtain radiological data, we designed a questionnaire (presented in Supplemental material) with which the following information was gathered: (a) type of parenchymal abnormality, such as (i) ground-glass opacities (GGO), (ii) consolidation, (iii) reticular pattern, and (iv) mixed pattern; (b) axial and craniocaudal distribution; (c) pleural effusion; (d) pericardial effusion; and (e) emphysema [6,41–43]. In addition, we adapted the 6-zone segmentation, which includes the upper, middle and lower zones of each lung [6,41–43]. Both the left and right lungs were divided into three zones: above the level of the carina, between the carina and the inferior pulmonary vein, and below the level of the inferior pulmonary vein [6,41–43]. We then evaluated the involvement of each zone separately and scored it from 0 to 4 (0: no involvement, 1: 1%–25% involved, 2: 26%–50% involved, 3: 51%–75% involved, and 4: 76%–100% involved) [6,41–46]. The total involvement score (Inv.Score) was calculated by summing the scores of the individual zones. All radiological scores were assigned by consensus of two radiologists experienced in reporting CT findings, and a third senior radiologist settled any disagreement between the two [6,41–43].
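The zone scoring rule described above can be illustrated with a short sketch. The function names are hypothetical, and treating any non-zero involvement below 1% as score 1 is an assumption, since the source does not define that boundary case.

```python
def zone_score(percent_involved: float) -> int:
    """Map the involved percentage of one lung zone to the 0-4 score.

    Assumption: any non-zero involvement below 1% is scored as 1,
    because the text only defines the bands 0, 1-25, 26-50, 51-75, 76-100.
    """
    if not 0 <= percent_involved <= 100:
        raise ValueError("percent_involved must lie between 0 and 100")
    if percent_involved == 0:
        return 0
    if percent_involved <= 25:
        return 1
    if percent_involved <= 50:
        return 2
    if percent_involved <= 75:
        return 3
    return 4


def total_involvement_score(zone_percentages) -> int:
    """Sum the scores of the six zones (upper, middle, lower of each lung)."""
    if len(zone_percentages) != 6:
        raise ValueError("expected exactly six zones")
    return sum(zone_score(p) for p in zone_percentages)


# Hypothetical patient: per-zone involvement in percent.
example_zones = [0, 10, 30, 55, 80, 100]
print(total_involvement_score(example_zones))  # 0 + 1 + 2 + 3 + 4 + 4 = 14
```

With six zones scored 0 to 4, the total involvement score ranges from 0 to 24.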

CT imaging

CT scanning with breath holding was performed on a 16 detector-row Brilliance 16 CT scanner (Philips Medical Systems, Best, the Netherlands) using the following scanning parameters: a tube voltage of 100 kVp for patients with BMI ≤ 30 (111 patients) and 120 kVp for patients with BMI > 30 (41 patients); 45 mA tube current; 16 × 1.5 mm collimation; 0.5 s rotation time; pitch of 1.0; and 35 cm field-of-view [6,41–43]. CT images of low quality owing to bulk patient motion or severe respiratory motion, or whose axial coverage did not include the entire lungs, were excluded.

Image segmentation

Two anatomical segmentations were performed, namely 1) whole lung (Lung) and 2) COVID-19 lesions (Lesion). All segmentations were performed by a radiologist (12 years of experience) using the 3D Slicer software v4.8.1 [47].

Image preprocessing and feature extraction

All CT images were interpolated to isotropic voxels and re-sampled to 1 × 1 × 1 mm³ [48]. Subsequently, discretization to 64 gray levels was performed for radiomics analysis [48]. For feature extraction, first-order statistics (19 FOS features), shape-based (16 shape features), gray-level co-occurrence matrix (23 GLCM features), gray-level run length matrix (16 GLRLM features), gray-level size zone matrix (16 GLSZM features), neighboring gray tone difference matrix (5 NGTDM features), and gray level dependence matrix (14 GLDM features) features were extracted [48,49]. Feature extraction was performed using the Python-based software package PyRadiomics v2.1.2 [49], which was standardized through the Image Biomarker Standardization Initiative (IBSI) [48]. Full details about the feature categories are provided in Table 1. We also constructed 15 new shape features by dividing lesion shape features by whole lung shape features to obtain relative shape features.
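As an illustration of two of the preprocessing steps above, the following sketch shows a fixed-bin-count discretization to 64 gray levels and the construction of relative shape features as lesion-to-lung ratios. It is a pure-Python sketch, not the PyRadiomics implementation used in the study; the function names, the `Rel_` prefix and the example values are hypothetical.

```python
import math


def discretize_fixed_bin_count(intensities, n_bins=64):
    """Map intensities inside a region of interest to gray levels 1..n_bins.

    Sketch of a fixed-bin-count scheme: the ROI intensity range is split
    into n_bins equal-width bins and the maximum is clamped to the last bin.
    """
    lo, hi = min(intensities), max(intensities)
    if hi == lo:
        return [1 for _ in intensities]
    width = (hi - lo) / n_bins
    levels = []
    for value in intensities:
        level = math.floor((value - lo) / width) + 1
        levels.append(min(level, n_bins))
    return levels


def relative_shape_features(lesion_shape, lung_shape):
    """Divide each lesion shape feature by the same whole-lung feature."""
    return {
        "Rel_" + name: lesion_shape[name] / lung_shape[name]
        for name in lesion_shape
        if name in lung_shape and lung_shape[name] != 0
    }


# Hypothetical values, for illustration only.
print(discretize_fixed_bin_count([0, 1, 63, 64]))
print(relative_shape_features({"VVolume": 280.0}, {"VVolume": 1400.0}))
```

The ratio features express lesion geometry relative to the patient's own lung, which makes them less dependent on overall lung size.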

Table 1

Detailed description of the extracted radiomic features used in this study protocol.

Feature family	Features
Shape Features	Voxel Volume (VVolume); Mesh Volume (MVolume); Surface Area; Surface Area to Volume ratio (SVR); Sphericity; Maximum 3D diameter (M3DD); Maximum 2D diameter (Slice) (M2DDS); Maximum 2D diameter (Column) (M2DDC); Maximum 2D diameter (Row) (M2DDR); Major Axis; Minor Axis; Least Axis; Elongation; Flatness
First Order Statistics (FO)	Energy; Total Energy (TE); Entropy; Minimum; 10th percentile; 90th percentile; Maximum; Mean; Median; Interquartile Range (IQR); Range; Mean Absolute Deviation (MAD); Robust Mean Absolute Deviation (RMAD); Root Mean Squared (RMS); Skewness; Kurtosis; Variance; Uniformity
Gray Level Co-occurrence Matrix (GLCM)	Autocorrelation (AC); Joint Average (JA); Cluster Prominence (CP); Cluster Shade (CS); Cluster Tendency (CT); Contrast; Correlation; Difference Average (DA); Difference Entropy (DE); Difference Variance (DV); Joint Energy (JEnergy); Joint Entropy (JEntropy); Informational Measure of Correlation (IMC) 1; Informational Measure of Correlation (IMC) 2; Inverse Difference Moment (IDM); Inverse Difference Moment Normalized (IDMN); Inverse Difference (ID); Inverse Difference Normalized (IDN); Inverse Variance (IV); Maximum Probability (MP); Maximum Correlation Coefficient (MCC); Sum Average (SA); Sum Entropy (SE); Sum of Squares (SS)
Gray Level Run Length Matrix (GLRLM)	Short Run Emphasis (SRE); Long Run Emphasis (LRE); Gray Level Non-Uniformity (GLN); Gray Level Non-Uniformity Normalized (GLNN); Run Length Non-Uniformity (RLN); Run Length Non-Uniformity Normalized (RLNN); Run Percentage (RP); Gray Level Variance (GLV); Run Variance (RV); Run Entropy (RE); Low Gray Level Run Emphasis (LGLRE); High Gray Level Run Emphasis (HGLRE); Short Run Low Gray Level Emphasis (SRLGLE); Short Run High Gray Level Emphasis (SRHGLE); Long Run Low Gray Level Emphasis (LRLGLE); Long Run High Gray Level Emphasis (LRHGLE)
Gray Level Size Zone Matrix (GLSZM)	Small Area Emphasis (SAE); Large Area Emphasis (LAE); Gray Level Non-Uniformity (GLN); Gray Level Non-Uniformity Normalized (GLNN); Size-Zone Non-Uniformity (SZN); Size-Zone Non-Uniformity Normalized (SZNN); Zone Percentage (ZP); Gray Level Variance (GLV); Zone Variance (ZV); Zone Entropy (ZE); Low Gray Level Zone Emphasis (LGLZE); High Gray Level Zone Emphasis (HGLZE); Small Area Low Gray Level Emphasis (SALGLE); Small Area High Gray Level Emphasis (SAHGLE); Large Area Low Gray Level Emphasis (LALGLE); Large Area High Gray Level Emphasis (LAHGLE)
Gray Level Dependence Matrix (GLDM)	Small Dependence Emphasis (SDE); Large Dependence Emphasis (LDE); Gray Level Non-Uniformity (GLN); Dependence Non-Uniformity (DN); Dependence Non-Uniformity Normalized (DNN); Gray Level Variance (GLV); Dependence Variance (DV); Dependence Entropy (DE); Low Gray Level Emphasis (LGLE); High Gray Level Emphasis (HGLE); Small Dependence Low Gray Level Emphasis (SDLGLE); Small Dependence High Gray Level Emphasis (SDHGLE); Large Dependence Low Gray Level Emphasis (LDLGLE); Large Dependence High Gray Level Emphasis (LDHGLE)
Neighboring Gray Tone Difference Matrix (NGTDM)	Coarseness; Contrast; Busyness; Complexity; Strength

Univariate analysis

We performed univariate analysis after normalizing each feature to Z-scores to determine the prognostic importance of each feature (clinical, radiological and radiomic). For continuous features, we performed Student's t-test and area under the ROC curve (AUC) analysis. Categorical features were evaluated using the Chi-square test. To account for multiple comparisons, we applied false discovery rate (FDR) correction and report the resulting q-values (FDR-adjusted p-values) to assess the significance of the features.
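The FDR correction mentioned above can be sketched as follows. The source does not name the FDR procedure, so the widely used Benjamini-Hochberg step-up method is assumed here; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return FDR-adjusted p-values (q-values) in the input order.

    Assumption: the Benjamini-Hochberg procedure; the study itself
    only states that FDR correction was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        q_values[idx] = running_min
    return q_values


# Hypothetical p-values from four univariate tests.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A feature would then be called significant when its q-value, rather than its raw p-value, falls below the chosen threshold.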

Multivariate machine learning analysis

The maximum relevance minimum redundancy (MRMR) algorithm [50] was used for feature selection. MRMR combines maximum-relevance selection, which favors features with maximal correlation to the patients' outcome (alive or deceased status), with minimum-redundancy selection, which ensures minimal redundancy among the selected features [50]. The eXtreme Gradient Boosting (XGBoost) machine learning algorithm [51], an ensemble learning algorithm based on decision trees, was adopted for classification. Feature selection and classification were performed using the praznik and caret R packages, respectively.
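To make the relevance/redundancy trade-off concrete, the following is a minimal greedy MRMR sketch. It is an illustration in Python, not the praznik code used in the study, and it assumes discrete feature values, mutual information as the dependence measure, and the "relevance minus mean redundancy" criterion. All names and the toy data are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (count / n) * log(count * n / (px[a] * py[b]))
        for (a, b), count in pxy.items()
    )


def mrmr_select(features, outcome, k):
    """Greedily pick k feature names maximizing relevance minus mean redundancy.

    features: dict mapping feature name -> list of discrete values.
    outcome:  list of class labels (e.g. 0 = alive, 1 = deceased).
    """
    relevance = {
        name: mutual_information(values, outcome)
        for name, values in features.items()
    }
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: "b" duplicates "a", while "c" is weaker but less redundant.
outcome = [0, 0, 0, 0, 1, 1, 1, 1]
toy_features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 1, 1, 1, 1, 1],
    "c": [0, 0, 1, 0, 1, 1, 1, 0],
}
print(mrmr_select(toy_features, outcome, 2))
```

In the toy data the duplicate feature is skipped in favor of a weaker but complementary one, which is the behavior the minimum-redundancy term is meant to produce. Continuous radiomic features would need discretization or a different dependence estimator before such a sketch could be applied.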

Data modeling and univariate analysis

We developed various prognostic models using the collected data and the adopted classification method. Our models were: 1) Clinical (pre-clinical, lab and radiological data), 2) Lung radiomics (radiomic features extracted from the whole lung), 3) Lesion radiomics (radiomic features extracted from lesions), 4) Lung + Lesion (combined radiomic features extracted from whole lung and lesions), 5) Lung + Clinical (combined radiomic features extracted from whole lung, clinical and radiological data), 6) Lesion + Clinical (combined radiomic features extracted from lesions, clinical and radiological data) and 7) Lung + Lesion + Clinical (combined radiomic features extracted from whole lung, lesions, clinical and radiological data). For model validation, 106 patients were used as the training/validation dataset whereas the remaining 46 patients were used as the test dataset (unseen and untouched during training). The ROC curve, AUC, accuracy (ACC), sensitivity (SEN) and specificity (SPE) were used to assess the prognostic performance of the models. The different steps followed are summarized in Fig. 3.

Fig. 3

Flowchart of the training and test steps implemented in the current study.
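For reference, the four performance measures used in this study (AUC, ACC, SEN and SPE) can be computed as in the sketch below. The function names are hypothetical, and coding the deceased class as 1 and the alive class as 0 is an assumption made for illustration.

```python
def auc_from_scores(y_true, scores):
    """AUC as the probability that a positive case outranks a negative one.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present")
    return {
        "ACC": (tp + tn) / len(y_true),
        "SEN": tp / (tp + fn),
        "SPE": tn / (tn + fp),
    }


# Hypothetical predictions for five patients (1 = deceased, 0 = alive).
labels = [1, 1, 0, 0, 0]
probabilities = [0.9, 0.4, 0.5, 0.2, 0.1]
predicted = [1 if p >= 0.5 else 0 for p in probabilities]
print(auc_from_scores(labels, probabilities))
print(confusion_metrics(labels, predicted))
```

AUC is threshold-free, whereas ACC, SEN and SPE depend on the cut-off applied to the predicted probability (0.5 in this example).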

The steps summarized in Fig. 3 are as follows:
1) The model was trained using the data of 106 patients. To find the optimal hyperparameters, XGBoost was tuned with the random search method under bootstrap resampling with 1000 repetitions.
2) After tuning and optimizing the models based on AUC, we selected the optimal model.
3) The optimal model was tested on the test set (untouched/unseen during bootstrapping), and accuracy, AUC, sensitivity and specificity were calculated for the optimal model on the test set.
4) Steps 1–3 were repeated 100 times to ensure the repeatability of the results.
5) The mean, SD and 95% CI of accuracy, AUC, sensitivity and specificity were calculated from step 4.
6) Clinical, radiomics and combined models were statistically evaluated using the results of step 5.
In other words, we repeatedly trained a bootstrapped model with 1000 repetitions (on the 106-patient dataset) and tested it on the independent test set, 100 times in total, to make sure that the results are repeatable for the different models. All results are reported on the 46-patient test set (unseen during model training by bootstrap) over the 100 repetitions, as the mean, standard deviation and 95% confidence interval (CI) for each model. After testing for normality with the Kolmogorov-Smirnov test, we used the Wilcoxon signed-rank test to determine significant differences between the models. A p-value < 0.05 was considered statistically significant. All statistical analyses were performed using R 3.6.3.
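The summary statistics over the repeated runs can be sketched as below. The source does not state how the 95% CI was obtained; this sketch assumes a normal-approximation interval for the mean (mean ± 1.96 × SD / √n). With n = 100 repetitions and an SD of about 0.03, the half-width under this assumption is about 0.006. The function name and the example values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_repetitions(values, z=1.96):
    """Return (mean, SD, (CI lower, CI upper)) for repeated test-set metrics.

    Assumption: a normal-approximation 95% CI of the mean; the study
    does not specify its CI formula.
    """
    if len(values) < 2:
        raise ValueError("need at least two repetitions")
    m = mean(values)
    sd = stdev(values)  # sample standard deviation
    half_width = z * sd / sqrt(len(values))
    return m, sd, (m - half_width, m + half_width)


# Hypothetical AUC values from three repetitions of the whole pipeline.
auc_runs = [0.94, 0.95, 0.96]
print(summarize_repetitions(auc_runs))
```

Because the interval describes the uncertainty of the mean across repetitions, it is much narrower than the spread (SD) of the individual runs.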

Results

Following application of the inclusion and exclusion criteria, 152 patients, including 87 males and 65 females, were retained from an initial triage of 545 patients. Forty patients with a mean age of 65.7 years had critical conditions and eventually died, whereas 112 patients with a mean age of 59.5 years fully recovered from COVID-19. The flowchart in Fig. 2 shows the number of excluded, included, recovered and deceased patients. The descriptive statistics of the continuous and categorical features of patients are presented in Tables 2 and 3 for the training/validation and test sets (unseen and untouched during training).

Table 2

Descriptive statistics (mean ± STD) of continuous clinical features collected for the training/validation and test sets.

Continuous Features	Training/Validation	Test Set	p-value
Lesion Volume	280 ± 260	260 ± 230	0.51
Lung Volume	1300 ± 300	1300 ± 300	0.92
Lesion Lung Ratio	0.22 ± 0.22	0.21 ± 0.18	0.70
Age	62 ± 17	60 ± 15	0.53
Weight	77 ± 15	78 ± 16	0.18
Height	170 ± 9.7	170 ± 9.5	0.54
BMI	28 ± 4.7	28 ± 5.8	0.32
O2 Saturation (O2Sat)	90 ± 7.8	87 ± 10	0.66
Systolic Blood Pressure (SBP)	130 ± 23	120 ± 24	0.76
Diastolic Blood Pressure (DBP)	78 ± 14	76 ± 12	0.60
Respiratory Rate (RR)	20 ± 4.7	22 ± 7.3	0.39
Pulse Rate (PR)	94 ± 20	93 ± 13	0.08
Temperature in Celsius Degree (T)	37 ± 1	37 ± 0.93	0.61
HB	13 ± 2.9	12 ± 2.8	0.90
White Blood Cells (WBCs)	9000 ± 17000	9900 ± 8000	0.18
Platelet Count in/mm3 (Plt)	180000 ± 96000	210000 ± 110000	0.65
Lymphocyte Diff	19 ± 13	20 ± 13	0.62
Neutrophil Diff	73 ± 16	71 ± 16	0.67
Monocyte Diff	5.9 ± 3.4	6.7 ± 2.9	0.82
Eosinophil Diff	1.7 ± 2.6	1.6 ± 1.1	0.19
C-reactive Protein in mg/L (CRP)	64 ± 46	66 ± 40	0.45
Blood Creatinine Level in mg/dL (Cr)	1.3 ± 1	1.7 ± 1.7	0.18
BUN	24 ± 23	28 ± 26	0.84
AST	82 ± 220	54 ± 43	0.45
ALT	53 ± 160	38 ± 47	0.75
ALP	250 ± 250	180 ± 110	0.52
Sodium in mEq/L (Na)	140 ± 18	140 ± 3.6	0.39
Potassium in mEq/L (K)	4.6 ± 0.75	4.7 ± 0.72	0.96
PT	18 ± 10	16 ± 4.9	0.87
PTT	28 ± 11	27 ± 7.6	0.27
INR	1.6 ± 1	1.3 ± 0.43	0.58
T.Bill	2.1 ± 5	1.6 ± 2.6	0.50
D.Bill	0.91 ± 2.7	0.42 ± 0.26	0.75
PH	7.4 ± 0.08	7.4 ± 0.1	0.89
PCO2	40 ± 8.8	41 ± 13	0.07
HCO3	24 ± 5.7	24 ± 5.7	0.16
Total Involvement Score (Total.Inv.Score)	6.4 ± 4	7.2 ± 4.9	0.88

Table 3

Descriptive statistics (frequency and percent) of discrete (categorical) clinical features in the training/validation and test sets.

Categorical Features		Training/Validation (frequency in %)	Test (frequency in %)	p-value
Gender	F	37 (34.6%)	28 (62.2%)	0.15
Gender	M	70 (65.4%)	17 (37.8%)	0.15
ground-glass opacities (GGO)	0	4 (3.74%)	0	1.00
	1	49 (45.8%)	21 (46.7%)
	2	54 (50.5%)	24 (53.3%)
Consolidation	0	7 (6.54%)	7 (15.6%)	1.00
	1	53 (49.5%)	13 (28.9%)
	2	47 (43.9%)	25 (55.6%)
Reticular	0	36 (33.6%)	12 (26.7%)	0.58
	1	51 (47.7%)	25 (55.6%)
	2	20 (18.7%)	8 (17.8%)
Axial Distribution (Ax.Dist)	1	61 (57%)	23 (51.1%)	0.74
	2	2 (1.87%)	1 (2.22%)
	3	44 (41.1%)	21 (46.7%)
Craniocaudal Distribution (CC.Dist)	1	4 (3.74%)	2 (4.44%)	0.37
	2	36 (33.6%)	16 (35.6%)
	3	67 (62.6%)	27 (60%)
Number of Involved Zones (Num.Zones.Involved)	1	3 (2.8%)	1 (2.22%)	0.79
	2	9 (8.41%)	3 (6.67%)
	3	6 (5.61%)	4 (8.89%)
	4	9 (8.41%)	5 (11.1%)
	5	20 (18.7%)	5 (11.1%)
	6	60 (56.1%)	27 (60%)
Pleural Effusion (Pleural.Eff)	0	78 (72.9%)	36 (80%)	0.46
Pleural Effusion (Pleural.Eff)	1	29 (27.1%)	9 (20%)	0.46
Pericardial Effusion (Pericardial.Eff)	0	88 (82.2%)	37 (82.2%)	1.00
Pericardial Effusion (Pericardial.Eff)	1	19 (17.8%)	8 (17.8%)	1.00
Emphysema	0	79 (73.8%)	34 (75.6%)	0.61
Emphysema	1	28 (26.2%)	11 (24.4%)	0.61
Cardiomegaly	0	49 (45.8%)	22 (48.9%)	1.00
Cardiomegaly	1	58 (54.2%)	23 (51.1%)	1.00
Diabetes	0	76 (71%)	35 (77.8%)	0.50
Diabetes	1	31 (29%)	10 (22.2%)	0.50
Hypertension (HTN)	0	71 (66.4%)	28 (62.2%)	1.00
Hypertension (HTN)	1	36 (33.6%)	17 (37.8%)	1.00
Ischemic Heart Disease (IHD)	0	82 (76.6%)	37 (82.2%)	1.00
Ischemic Heart Disease (IHD)	1	25 (23.4%)	8 (17.8%)	1.00
Cancerous	0	95 (88.8%)	42 (93.3%)	0.75
Cancerous	1	12 (11.2%)	3 (6.67%)	0.75
Smoking	1	94 (87.85%)	39 (86.67%)	0.55
	2	9 (8.41%)	4 (8.89%)
	3	4 (3.74%)	2 (4.44%)
Consciousness	0	2 (1.87%)	0	0.46
	1	1 (0.935%)	1 (2.22%)
	2	9 (8.41%)	4 (8.89%)
	3	95 (88.8%)	40 (88.89%)
Mixed	0	50 (46.7%)	30 (66.7%)	0.57
Mixed	1	57 (53.3%)	15 (33.3%)	0.57

Our univariate analysis of clinical features is shown in Supplemental Figures 1 and 2 for categorical and continuous features in terms of AUC, p- and q-values. Among clinical features, BUN (AUC = 0.73), oxygen saturation (AUC = 0.71), monocyte (AUC = 0.70) and the number of involved zones (AUC = 0.70) were identified as the most prognostic features. Amongst continuous clinical features, systolic blood pressure, diastolic blood pressure, hemoglobin, platelet, lymphocyte, neutrophil and monocyte had a q-value < 0.05. Amongst discrete (categorical) clinical features, smoking, cancerous, consciousness and total involvement score proved to be statistically significant (p-value < 0.05) between the alive and deceased groups. For univariate radiomics analysis, the results are displayed in Supplemental Figures 3 and 4. Amongst lung radiomic features, SAHGLE (AUC = 0.70) and HGLZE (AUC = 0.67) from GLSZM, JA and SA from GLCM, Busyness from NGTDM and Median from FO (AUC = 0.67) were found to be the most prognostic features. Amongst lesion radiomic features, RLNU from GLRLM, HGLZE from GLSZM, DNU from GLDM, Range from FO and Volume from Shape (AUC = 0.73) had the highest performance, with significant q-values after FDR correction.

Models

Important features selected by the MRMR algorithm are reported for each model in Supplemental Table 2. Fig. 4 depicts the heat map of AUC, ACC, SEN and SPE for different combinations of models. The mean (STD) and confidence interval (CI) for AUC, ACC, SEN and SPE for the developed models (test set) are summarized in Tables 4 and 5, respectively. Our results indicated that the combined model (Lung + Lesion + Clinical) had the highest prognostic capability with AUC = 0.95 ± 0.02, ACC = 0.88 ± 0.04, SEN = 0.88 ± 0.06 and SPE = 0.89 ± 0.07. The 95% CIs for these parameters were 0.95–0.96, 0.88–0.89, 0.87–0.90 and 0.87–0.90, respectively.

Fig. 4

Heat map of area under the curve (AUC), accuracy (ACC), sensitivity (SEN) and specificity (SPE) for different combinations of models.

Table 4

Mean and STD of area under the curve (AUC), accuracy (ACC), sensitivity (SEN) and specificity (SPE) in the test set for the different models studied.

Mean ± SD	AUC	ACC	SEN	SPE
Clinical	0.87 ± 0.04	0.79 ± 0.05	0.76 ± 0.07	0.82 ± 0.08
Lung	0.92 ± 0.03	0.85 ± 0.04	0.85 ± 0.07	0.85 ± 0.06
Lesion	0.92 ± 0.03	0.85 ± 0.05	0.87 ± 0.06	0.83 ± 0.08
Lung + Lesion	0.91 ± 0.04	0.83 ± 0.05	0.85 ± 0.08	0.80 ± 0.09
Lung + Clinical	0.92 ± 0.03	0.85 ± 0.04	0.83 ± 0.06	0.87 ± 0.05
Lesion + Clinical	0.94 ± 0.03	0.87 ± 0.04	0.87 ± 0.07	0.87 ± 0.06
Lung + Lesion + Clinical	0.95 ± 0.03	0.88 ± 0.04	0.88 ± 0.06	0.89 ± 0.07

Table 5

Confidence interval (CI) of area under the curve (AUC), accuracy (ACC), sensitivity (SEN) and specificity (SPE) in the test set for the different models.

CI (lower-upper)	AUC	ACC	SEN	SPE
Clinical	0.86–0.87	0.78–0.80	0.74–0.77	0.81–0.84
Lung	0.91–0.92	0.84–0.86	0.84–0.86	0.84–0.87
Lesion	0.91–0.93	0.84–0.86	0.85–0.88	0.81–0.84
Lung + Lesion	0.90–0.91	0.82–0.84	0.84–0.87	0.79–0.82
Lung + Clinical	0.92–0.93	0.84–0.86	0.82–0.84	0.86–0.88
Lesion + Clinical	0.93–0.95	0.86–0.88	0.86–0.89	0.85–0.88
Lung + Lesion + Clinical	0.95–0.96	0.88–0.89	0.87–0.90	0.87–0.90

The ROC curves and boxplots of these models for the test set are shown in Figs. 5 and 6, respectively. In the boxplots, significant differences among the models can be observed. To compare the models in terms of significant changes in AUC, ACC, SEN and SPE, p-value plots are shown in Fig. 7. The combined Lung + Lesion + Clinical model has significant AUC differences (p < 0.05) relative to the other models. This model was also significantly different in terms of ACC with respect to all other models. With respect to SPE, it differed significantly from all models except the Lung + Clinical model, whereas for SEN, it differed significantly from all models except the Lesion + Clinical model. Among the remaining models, all pairwise AUC differences were significant (p < 0.05) except for Lung versus Lesion, Lung + Clinical versus Lung, Lung + Lesion versus Lung, and Lung + Clinical versus Lesion.

Fig. 5

ROC curve of the different models in the test sets.

Fig. 6

Box plot of the area under the curve (AUC), accuracy (ACC), sensitivity (SEN) and specificity (SPE) for different combinations of models. P-values comparing differences in values with respect to the Lung + Lesion + Clinical model are shown. Not significant (ns): p > 0.05, *: p ≤ 0.05, **: p ≤ 0.01, ***: p ≤ 0.001 and ****: p ≤ 0.0001.

Fig. 7

P-values for the comparison between the different models with respect to the area under the curve (AUC), accuracy (ACC), sensitivity (SEN) and specificity (SPE).

Discussion

A novel approach for prognostication of COVID-19 patients using different features, including semantic and radiomic image-derived features and clinical data (demographics and history, laboratory tests and visual scoring of CT by radiologists), was presented in this work. We demonstrated that clinical and quantitative radiomic features, alone or in combination, can be used as potential biomarkers for the prediction of survival in COVID-19 patients. Although some radiomic studies have been conducted in the framework of COVID-19, this is, to the best of our knowledge, the first study reporting on the use of advanced combined models for survival prognosis. We extracted two categories of radiomic features, from the whole lung and from lung lesions. The aim was to assess how the extracted radiomic features might be used as different prognostic parameters. Decoding heterogeneity is among the aims of radiomics analysis. Hence, since the delineated lesions and whole lungs have different characteristics, they could serve as different markers. Moreover, combining these features with other clinical parameters provides more variables for developing more predictive models. In previous lung radiomic studies, adding clinical data to radiomic signatures improved model performance. A study by Chen et al. [52] demonstrated that adding clinical data, such as smoking history, slightly improved performance for differentiating peripherally located small cell lung cancer from non-small cell lung cancer using CT radiomic features. Univariate analysis showed that some clinical parameters might be predictive. Among single clinical parameters, the highest AUCs were achieved for BUN, oxygen saturation, monocyte count and the number of involved zones. These findings were congruent with previous studies. Several studies have suggested that increased BUN can be attributed to acute kidney injury during the course of the disease, which can be a cause of adverse effects [53–55]. 
Colombi et al. [56] reported lung involvement area and hypoxia as reliable predictors of ICU admission and mortality among COVID-19 patients. In contrast, we believe that the predictive value of monocyte count observed here is a consequence of our small sample size, as Zeng et al. [57] did not report significant differences in monocyte count between severe and non-severe patients in a population of 3090 patients from 15 different studies.

With respect to single imaging measures, we observed that several radiomic features were predictive in both lesion and whole lung delineations. When comparing these two feature categories, whole lung features appeared to have higher AUCs. A recent study by Tan et al. [58] demonstrated the predictive value of the non-focus area of CT images for distinguishing different clinical types of COVID-19 pneumonia. In our study, whole lung features provided more relevant characteristics of the disease in COVID-19 patients. Extracting radiomic features from both infected and non-infected regions provided a more accurate prognostic model.

Regarding prognostic modeling, we observed that the combined model including all measures had the highest performance and differed significantly from the other models. In addition, the other models behaved similarly, although there were significant differences among them. For both radiomic models, the AUC ranged from 0.91 to 0.93. When they were combined with clinical features, the AUC improved from 0.91–0.93 to 0.95–0.96. These results indicate that the combined models provided more accurate information. In addition, compared to univariate radiomics analysis, multivariate modeling yields more reliable results. It should be emphasized that in predictive modeling, the variables are the heart of the model, although the classifiers also have a critical role [59,60]. In this work, we used a classifier with a wide range of features.
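
The single-feature AUCs discussed above can be computed directly from ranks. The following is a minimal sketch, not the pipeline used in this study: it assumes a binary outcome coded as 1 (positive class, e.g. deceased) and 0, and computes the AUC of one feature as the normalized Mann-Whitney U statistic. The function name `univariate_auc` and the toy values are hypothetical.

```python
def univariate_auc(values, labels):
    """AUC of a single feature via the Mann-Whitney U statistic.

    values: feature values, one per patient.
    labels: 1 for the positive class, 0 for the negative class.
    A tie between a positive and a negative value counts as 0.5.
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    # Count positive/negative pairs in which the positive case ranks higher.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up values for illustration only (not study data).
feature = [7.1, 6.4, 5.9, 5.2, 4.8, 4.1]
outcome = [1, 1, 0, 1, 0, 0]
auc = univariate_auc(feature, outcome)
print(round(auc, 3))
```

A feature that tends to be lower in the positive class yields an AUC below 0.5 with this definition; in that case `1 - auc` expresses the same discriminative strength.
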
A wide range of image analysis algorithms combined with machine learning techniques have recently been designed for COVID-19 detection, diagnosis and prognosis. Li et al. [61] applied artificial intelligence algorithms to distinguish COVID-19 from community-acquired pneumonia on chest CT. They developed a deep learning model, the COVID-19 detection neural network (COVNet), to extract visual features from volumetric chest CT examinations for the detection of COVID-19, and evaluated the model against community-acquired pneumonia and other non-pneumonia images. Their model achieved a sensitivity and specificity of 90% and 96%, respectively, with an AUC of 0.96. Although deep learning models have achieved high predictive/prognostic performance, their mechanisms of action are not fully understood [62]. In contrast, radiomic features may provide more reliable results because they capture tissue characteristics. Indeed, studies have indicated that these markers could decode the biological properties of the tissues [63,64]. In this regard, we believe that our results could be exploited reliably in clinical practice.

Although the presented results are important, this study inherently bears a number of limitations. First, the sample size is small and there is no external validation set from different centers. Further clinical studies are needed to verify our results with larger clinical databases. Second, therapeutic strategies for patients were not considered in this study. Including treatment parameters in the models may provide more reliable results. Third, although our imaging and radiomics settings were similar for all patients, we suggest assessing the reproducibility of radiomic features before clinical adoption. Image segmentation was performed only once by an experienced radiologist; as such, it was not possible to estimate the intra-observer variability and repeatability of the segmentations. Fourth, we tested only one feature selection algorithm and one classifier.
As there is no one-size-fits-all machine learning algorithm, different combinations of feature selectors and classifiers result in different performance. Future studies should focus on evaluating different algorithms and comparing their performance [65,66]. Finally, various inclusion and exclusion criteria were applied retrospectively to the datasets, which decreased the number of cases. Patients with severe motion artifacts, which are inevitable in some non-cooperative or unstable patients, were excluded. This might impact the generalizability of the obtained results. Future studies should use a more heterogeneous dataset in terms of CT image quality to ensure the generalizability of the model.
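
The segmentation repeatability limitation noted above could be addressed in future work by having the same reader delineate a subset of cases twice and quantifying the overlap. The following is a minimal sketch under stated assumptions, not part of this study: it assumes two binary masks flattened to equal-length 0/1 sequences and uses the Dice similarity coefficient as the agreement measure. The function name `dice_coefficient` and the toy masks are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    size_a = sum(1 for a in mask_a if a)
    size_b = sum(1 for b in mask_b if b)
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    if size_a + size_b == 0:
        # Both masks empty: treated here as perfect agreement (a convention).
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy flattened masks for illustration only (not study data).
first_pass = [0, 1, 1, 1, 0, 0, 1, 0]
second_pass = [0, 1, 1, 0, 0, 1, 1, 0]
dice = dice_coefficient(first_pass, second_pass)
print(round(dice, 3))
```

Values close to 1 indicate nearly identical delineations. The same comparison could be repeated on the radiomic features extracted from each delineation to assess feature reproducibility.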

Conclusion

We demonstrated that the combination of radiomic features with clinical and radiological data could be used to effectively predict survival in COVID-19 patients. To the best of our knowledge, this is the first study applying such a methodology to prognostic survival modeling in COVID-19. We also demonstrated that a number of individual clinical and imaging features are predictive and have the potential to be used in routine clinical practice for more accurate management of COVID-19 patients.

Declaration of competing interest

The authors declare that they have no conflict of interest.
