Literature DB >> 35672719

Evaluation of the models generated from clinical features and deep learning-based segmentations: Can thoracic CT on admission help us to predict hospitalized COVID-19 patients who will require intensive care?

Mutlu Gülbay1, Aliye Baştuğ2, Erdem Özkan3, Büşra Yüce Öztürk4, Bökebatur Ahmet Raşit Mendi3, Hürrem Bodur2.   

Abstract

BACKGROUND: The aim of the study was to predict the probability of intensive care unit (ICU) care for inpatient COVID-19 cases using clinical and artificial intelligence segmentation-based volumetric and CT-radiomics parameters on admission.
METHODS: Twenty-eight clinical/laboratory features, 21 volumetric parameters, and 74 radiomics parameters obtained by deep learning (DL)-based segmentations from CT examinations of 191 severe COVID-19 inpatients admitted between March 2020 and March 2021 were collected. Patients were divided into Group 1 (117 patients discharged from the inpatient service) and Group 2 (74 patients transferred to the ICU), and the differences between the groups were evaluated with the T-test and Mann-Whitney test. The sensitivities and specificities of significantly different parameters were evaluated by ROC analysis. Subsequently, 152 (79.5%) patients were assigned to the training/cross-validation set, and 39 (20.5%) patients were assigned to the test set. Clinical, radiological, and combined logit-fit models were generated by using the Bayesian information criterion from the training set and optimized via tenfold cross-validation. To simultaneously use all of the clinical, volumetric, and radiomics parameters, a random forest model was produced, and this model was trained by using a balanced training set created by adding synthetic data to the existing training/cross-validation set. The results of the models in predicting ICU patients were evaluated with the test set.
RESULTS: No parameter individually created a reliable classifier. When the test set was evaluated with the final models, the AUC values were 0.736, 0.708, and 0.794, the specificity values were 79.17%, 79.17%, and 87.50%, the sensitivity values were 66.67%, 60%, and 73.33%, and the F1 values were 0.67, 0.62, and 0.76 for the clinical, radiological, and combined logit-fit models, respectively. The random forest model that was trained with the balanced training/cross-validation set was the most successful model, achieving an AUC of 0.837, specificity of 87.50%, sensitivity of 80%, and F1 value of 0.80 in the test set.
CONCLUSION: By using a machine learning algorithm that was composed of clinical and DL-segmentation-based radiological parameters and that was trained with a balanced data set, COVID-19 patients who may require intensive care could be successfully predicted.
© 2022. The Author(s).

Entities:  

Keywords:  Artificial intelligence; COVID-19; Computed tomography; Deep learning; Logistic regression models; Machine learning; Radiomics

Mesh:

Year:  2022        PMID: 35672719      PMCID: PMC9172094          DOI: 10.1186/s12880-022-00833-2

Source DB:  PubMed          Journal:  BMC Med Imaging        ISSN: 1471-2342            Impact factor:   2.795


Background

Severe COVID-19 patients who are admitted to the inpatient ward due to the need for supplemental oxygen or due to evidence of systemic inflammation must be monitored for the development of critical illness, a rapid increase in oxygen needs and/or an increasing systemic deterioration [1]. For patients who progress to a critical illness level, transfers to the intensive care unit (ICU) are required; additionally, depending on the severity of the condition, the patient may also need oxygen delivery through a high-flow device, noninvasive ventilation, invasive mechanical ventilation, or extracorporeal membrane oxygenation [1]. Planning the ICU bed capacity is of primary importance during pandemic surges [2] since limitations in the ICU bed capacity have been reported to have an effect on mortality [3]. Thus, it is important to predict the need for ICUs, especially for patients with a severe clinical condition that requires inpatient treatment [4, 5]. Additionally, starting remdesivir in the ward was recommended if disease progression was predicted [1]. Although models for identifying ICU candidate patients have been reported, most of these models are based only on clinical data [6-11]. In a study where the candidate parameters included the presence or absence of chest X-ray findings, it was noted that this parameter was not included in the final model [12]. Promising results were achieved by combining the clinical data with the semiquantitative visual severity scores (VSS) depending on the volume, type, and extent of the infiltration that were measured on chest X-ray and CT [13-15]. Radiomics analysis extracts different quantitative data from medical images with various algorithms, and these data are used in further analyses for decision support [16]. Studies using radiomics models and machine learning methods have shown that these methods can diagnose COVID-19 [17, 18] and can determine its prognosis [19]. Although combined models of the clinical and radiomics parameters in RT-PCR-positive cohorts were reported [20], studies evaluating the efficacy of models that include clinical, quantitative volumetric and radiomics parameters for predicting disease progression in hospitalized COVID-19 patients are lacking. Using deep learning (DL) for a COVID-19 diagnosis was previously studied by using chest X-ray and CT parameters in pretrained or customized models, and the results were successful [21]. DL networks are also used for automated segmentation, and a high accuracy was shown in the U-Net architecture for the CT images in COVID-19 patients [22]. The aim of this study was to generate and compare models that predict the need for ICUs in hospitalized COVID-19 patients using clinical features and volumetric and radiomics data that were calculated by automated segmentations.

Materials and methods

This retrospective, cross-sectional, single-center study was approved by our institution’s review board (EK-E1-21-2090), and written informed consent was waived. All of the procedures that were performed in this study were in accordance with the 1964 Helsinki Declaration and its later amendments.

Study population

A total of 268 RT-PCR-positive severe COVID-19 patients hospitalized consecutively in our inpatient ward between March 2020 and March 2021 were evaluated (Fig. 1). All these patients had one or more of the criteria for severe illness [1]: an SpO2 < 94% when breathing room air, a < 300 mmHg arterial partial oxygen pressure to fraction of inspired oxygen (PaO2/FiO2) ratio, a respiratory rate > 30 per minute or infection involving more than 50% of the lung parenchyma. Patients were transferred to the ICU when one or more of the signs of critical illness [1], including acute respiratory distress syndrome, septic shock, and multiorgan failure, had developed.
Fig. 1

Flowchart of the study. Clinical, Radiological and Combined models are the final models in cross-validation. LR is Logistic regression

Flowchart of the study. Clinical, Radiological and Combined models are the final models in cross-validation. LR is Logistic regression The inclusion criteria in this patient group were patients older than 18 years of age, patients who had a thoracic CT scan in our hospital, patients who did not receive any steroid or antiviral treatment before the CT study, patients with no interstitial pulmonary disease, and patients who had not undergone pulmonary surgery, and 222 patients met these criteria. In this group, patients with enhanced CT examinations (n = 7, all were suspicious for embolism), respiratory artifacts (n = 14), massive pleural effusion (over two-thirds of the hemithorax) (n = 4), pneumothorax (n = 1), and cystic lung disease (n = 1) were excluded from the study.

CT protocol

CT studies were performed with a 128-detector system (GE Revolution, General Electric, Milwaukee, WI) from the first rib to the adrenal glands, nonenhanced by using the following parameters: 100 kV, 110 mAs, body filter, a 1.25 mm slice thickness, a 512 × 512 reconstruction matrix, a spiral pitch factor of 1.375:1, BonePlus convolution kernel, adaptive statistical iterative reconstruction of 70%.

Deep learning segmentation and radiomics feature calculation

The entire lung parenchyma and pneumonic lesions were segmented by using Quibim’s U-Net model, which is a convolutional neural network architecture that uses the ResNet-34 backbone, which was developed for the ‘A European initiative for automated diagnosis and quantitative analysis of COVID-19 on imaging’ project. Slices of the studies were preprocessed as the segmentation model input by applying a constant lung window level (WW = 1600, WL = − 600), normalization in range [0, 1], as well as by using the Balance Contrast Enhancement Technique (BCET preprocessing). Thus, the basic shapes of the image histograms were maintained. Several metrics were accounted for in order to evaluate the segmentation model. Dice Similarity Coefficient (DSC) and Intersection over Union (IoU) values were calculated on all of the scans with at least 1000 voxels in the ground truth segmentation. Due to the fact that DSC and IoU are zero in cases without ground truth mask, the final test set for these metrics was determined by using a histogram-based threshold of more than 1000 positive voxels. Average false positive and false negative volumes were calculated for all of the scans. In addition, Pearson's correlation coefficients of positive prediction and ground truth were determined. Two authors (MG, 16 years of experience and EO, last year of residency training) checked whether all ground glass opacities (GGO), consolidation or crazy paving areas in the CT studies were segmented by the DL algorithm. It was noted that DL did not segment GGO that were smaller than 1 mL, and patients who only had such lesions were excluded from the study (n = 4). Finally, the final study population consisted of 191 patients. The radiomics features were calculated using Quibim Texture Analysis software (Quibim SL, Valencia, Spain) from the obtained segmentations by the following parameters: (1) Resampled voxel size 1 × 1 × 1 mm3 by using bicubic interpolation, (2) Fixed bin-width of 25 for gray value discretization, (3) Density normalization according to Eq. (1):where f(x) is the normalized voxel density, x is the original density, μx is the mean density, σx is the standard deviation and S is the scaling factor (set to 500). (4) A voxel array shift of 1024 was added to prevent the negative values from being squared. (5) Second-order matrices were calculated using a distance of 1 voxel and 13 isotropic displacement vectors at angles of 0°, 45°, 90° and 135°.

Statistical analysis

Patients were categorized into Group 1 (patients who recovered with treatment in the inpatient ward) and Group 2 (patients transferred to the ICU for progressive disease from the inpatient ward). The data obtained from the patients were divided into (1) clinical data consisting of the demographic data of the patients, comorbid disease history, therapeutics given to the patient in the inpatient service, oxygen saturation, complete blood count, biochemical parameters, and acute phase reactants obtained at admission and (2) radiological data consisting of the volumetric data of the whole lung, the inflamed lung parenchyma as segmented by DL and the first- and second-order radiomics parameters calculated from the segmented lesions. Comparison of the nominal data between the two groups was performed with the chi-squared test or the Fisher's exact test. For continuous data, the values of a normally distributed parameter were given as the mean ± SD, and the values of nonnormally distributed parameters were provided as the median (IQR). Comparisons of the groups were conducted with the T-test or Mann–Whitney test, accordingly. If a parameter differed significantly between the two groups, the area under the curve (AUC) was calculated with the receiver operator characteristic (ROC) test, and the cutoff value, optimal sensitivity, and specificity were determined by using the Youden index. Logistic regression was used for the univariate nominal parameters to calculate the sensitivity and specificity. After the patient population was randomly divided into a training and cross-validation set (n = 152, 79.6%) and a test set (n = 39, 20.4%), logit fit models were created by using the Bayesian information criterion (BIC) from the training set. The clinical model was selected from the clinical data, and the radiological model was selected from the radiological data. A combined model from both clinical and radiological data was also constructed. The adequacy of the model’s parameters in predicting the categorical outcomes was evaluated with the Hosmer–Lemeshow goodness-of-fit test. Multicollinearity was evaluated by calculating the variance inflation factor (VIF). Models were optimized by calculating cost function (log loss), and the gradient descent optimization algorithm and initial theta vectors were replaced with the optimized ones. By using a tenfold cross-validation, the mean sensitivity, specificity, and accuracy values of the model were calculated by averaging all of the cross-validation results, and the model-specific cutoff values for each model were calculated via the Youden index of ROC analyses [23]. The test set results were obtained by using optimized models and model-specific cutoff values. The C-index and 95% CI values of the models were further separately calculated for the training and cross-validation sets via 1000 bootstrapping studies. To solve the class imbalance problem, the Synthetic Minority Oversampling Technique (SMOTE) algorithm was used by using the “smotefamily” package in the R statistical computing environment (R Foundation for Statistical Computing, Vienna, Austria) [24]. During the generation of the synthetic data, k = 3 was selected for the K-nearest neighbor algorithm. Another model including all of the clinical and radiological parameters in the study was created via the random forest classification algorithm, and this model was trained with the balanced training set containing the synthetic data. The effectiveness of the final random forest model was evaluated with the same test set that was used for the logit-fit models. Statistical analyses were performed using IBM SPSS v23 (IBM Corp, Armonk, NY), MedCalc v20.011 (MedCalc Software bvba, Ostend, Belgium), R v4.0.2 (R foundation, Vienna, Austria) and XLStat statistical and data analysis v2021.3.1 (Addinsoft, NY, USA). The power analysis was conducted using G*Power 3.1.

Results

Group features, demographics, symptoms and findings

There were 117 patients in Group 1 (61.6%) and 74 in Group 2 (38.4%). The mean age of the patients was 65.45 ± 14.02 (26–96 years), 57.6% of the patients were male, and 42.4% were female. ICU patients were followed in the ward for an average of 3.2 days (1–12 days) prior to transfer to the ICU. Whereas Group 1 patients were discharged with a mean duration of 8.8 ± 4.7 days (2–29 days), Group 2 patients had a mean duration of 19.2 ± 13.8 days (7–41 days, including ICU stay) of hospitalization that resulted in either death (n = 3) or discharge (n = 71). The mean age of the patients and the number of males were higher in Group 2, and the differences were significant (Table 1). Regarding the symptoms and findings, only fever was significantly different between the two groups. Among the comorbidities, patients diagnosed with chronic renal failure or coronary heart disease required significantly more ICU admissions (Table 1). Patients who needed corticosteroids in the ward were more frequently transferred to the ICU (Table 1).
Table 1

Demographics, symptoms, findings, comparison between the groups and discrimination assessment

ParameterGroup 1Group 2pCut-off or Risk factorSensitivity %Specificity %AUC
Age (years)62 (21)70 (23.3)< 0.001a> 7155.973.10.679d
SexMale: 52.0%Male: 48.0%0.005bMale67.350.30.774e
Female: 69.3%Female: 30.7%
Fever (body temperature > 38 °C)58.0%72.0%0.014bPresent72.042.00.770e
Dyspnea31.9%43.8%0.082b
Cough49.5%50.5%0.491b
Diabetes mellitus28.7%34.6%0.344b
Hypertension42.5%57.5%0.210b
Acute renal failure1.9%3.7%0.446c
Chronic renal failure0.4%3.8%0.001cPresent9.499.30.863e
Coronary heart disease21.0%33.6%0.022bPresent33.679.00.764e
COPD15.3%15.0%0.941b
Malignity3.2%6.5%0.199b
CS treatment at the wardR: 66.7%R: 83.8%0.009bNeed for CS treatment83.7833.330.595e
N/R: 33.3%N/R: 16.2%
High dose CS pulse treatment at the wardR: 30.9%R: 63.8%< 0.001bNeed for pulse CS treatment63.7769.070.680e
N/R: 69.1%N/R:36.2%

Group results were given as Median (IQR)

Group 1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU

COPD chronic obstructive pulmonary disease, CS corticosteroid, R received, N/R not received

aMann–Whitney test result

bChi-squared test result

cFisher’s exact test result

dROC analysis results

eOne-parameter logistic regression result

Demographics, symptoms, findings, comparison between the groups and discrimination assessment Group results were given as Median (IQR) Group 1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU COPD chronic obstructive pulmonary disease, CS corticosteroid, R received, N/R not received aMann–Whitney test result bChi-squared test result cFisher’s exact test result dROC analysis results eOne-parameter logistic regression result

Laboratory findings

Most of the laboratory findings differed significantly between the two groups (Table 2). However, at the time of admission, it was observed that patients who needed ICU care did not have a lower oxygen saturation value. Among the blood tests, procalcitonin was the most effective univariate classifier (Table 2).
Table 2

Laboratory findings, comparison between the groups and results of ROC analysis

ParameterGroup 1Group 2paCut-offSensitivity %Specificity %AUCb
SPO2 (%)92 (2)92 (4)0.987
wbc count (/μL)6100 (2730)5860 (3320)0.948
Lymhoocyte count (/μL)1100 (670)810 (530)0.0002≤ 100072.057.30.655
Neutrophil/Lymphocyte ratio5.14 (4.07)8.59 (8.30)0.0001> 3.6575.749.60.659
Platelet count (× 103/μL)225 (93.5)192 (96)0.010≤ 21461.756.00.593
eGFR (mL/min/1.73 m2)87 (33)69 (50)0.0003≤ 7456.166.20.632
Alanine aminotransferase (ALT; Units/L)28 (26)29 (28)0.594
Aspartate aminotransferase (AST; Units/L)34 (21)44 (39)0.0004> 3768.261.10.668
Lactate dehydrogenase (LDH; Units/L)294 (120)390 (182)0.00001> 36066.475.80.755
Creatine kinase (CK; Units/L)99.5 (110)179 (247)0.0001> 17253.376.30.676
C-reactive protein (CRP; mg/L)71 (85.5)100 (104)0.0005> 13439.386.00.627
Procalcitonin (PCT; μg/L)0.04 (0.06)0.17 (0.27)0.00001> 0.1162.681.50.785
Ferritin (ng/mL)254 (407)445 (706)0.0001> 40858.366.70.648
D-Dimer (ng/mL)0.70 (0.60)1.00 (1.28)0.001> 0.9251.969.30.626

Group results were given as Median (IQR)

Group 1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU

SPO Pulse oximetry saturation, eGFR estimated-glomerular filtration rate

aMann–Whitney test result

bROC analysis result

Laboratory findings, comparison between the groups and results of ROC analysis Group results were given as Median (IQR) Group 1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU SPO Pulse oximetry saturation, eGFR estimated-glomerular filtration rate aMann–Whitney test result bROC analysis result

DL segmentation findings and radiomics

The time between the onset of symptoms and the CT examination was 6 [7] days in Group 1 and 5 [5] days in Group 2 (p = 0.775, Mann–Whitney test). The positive RT-PCR test result and the CT study were conducted on the same day. The DL algorithm segmented both the whole lung tissue and the pneumonic areas of COVID-19 infection in the patients. In Group 2 patients, both the percentage and volume of pneumonic tissue secondary to COVID-19 were significantly higher (Table 3) than those in Group 1 patients. Additionally, in Group 2, the mean total lung volume was decreased by 11.2% compared to that in Group 1.
Table 3

Volumetric data, comparison between the groups and results of ROC analysis

ParameterGroup 1Group 2pCut-offSensitivity %Specificity %AUCc
Right lung
Upper Third (%)5 (16)11.5 (20)0.011a> 959.565.80.608
Middle Third (%)14 (17)21.5 (27)0.015a> 2448.777.80.604
Lower Third (%)12 (18)18.5 (30)0.012a> 3526.493.20.608
Infected Total (%)12 (12)21 (21)0.002a> 1954.176.10.635
Upper Third (mL)25 (71)51 (105)0.032a> 3960.160.10.592
Middle Third (mL)136 (152)174.5 (220)0.120a
Lower Third (mL)41 (69)63.5 (94)0.049a> 7746.073.00.584
Infected Total (mL)222 (235)307.5 (416)0.010a> 35447.376.90.611
Whole Right Lung Volume(mL)1849 ± 5741628 ± 5150.008b≤ 151646.069.20.592
Left lung
Upper Third (%)1 (7)4 (16)0.020a> 740.577.80.597
Middle Third (%)10 (20)19 (30)0.005a> 1266.258.10.622
Lower Third (%)11 (22)21 (31)0.024a> 2643.276.10.597
Infected Total (%)10 (16)15.5 (19)0.001a> 1163.559.80.640
Upper Third (mL)9 (30)13.5 (79)0.012a> 7127.092.30.575
Middle Third (mL)73 (136)125 (174)0.080a
Lower Third (mL)38 (70)67 (88)0.060a
Infected Total (mL)139 (209)232 (281)0.092a
Whole Left Lung Volume(mL)1583 ± 5251419 ± 4910.033b≤ 181983.831.60.581
Both lungs
Total infected lung (%)12.26 (12.55)18.60 (19.24)< 0.001a> 16.6159.5%70.1%0.674
Total infected lung (mL)377 (441)583.5 (612)0.001a> 43568.9%58.1%0.638
Total Lung Volume3433 ± 10783049 ± 9560.011b≤ 380483.8%33.3%0.589

Group results were given as Median (IQR) or Mean ± SD according to the parameter’s distribution

Group 1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU

aMann–Whitney test result

bT-test result

cROC analysis result

Volumetric data, comparison between the groups and results of ROC analysis Group results were given as Median (IQR) or Mean ± SD according to the parameter’s distribution Group 1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU aMann–Whitney test result bT-test result cROC analysis result Eighteen first-order and 58 second-order radiomics parameters were calculated from the segmentations (Table 4). The skewness was higher in Group 1, and the mean density was higher in Group 2 (Table 4).
Table 4

Radiomic parameters, comparison between the groups and results of ROC analysis

ParameterGroup 1Group 2pCut-offSensitivity %Specificity %AUCc
First order
10th percentile*− 126.15 (109.66)− 86.95 (113.60)0.010a> − 87.0450.073.50.611
90th percentile*508.17 ± 147.39591.50 ± 120.42< 0.001b> 488.2685.145.30.672
Energy (× 1011)5.71 (6.94)11.4 (9.94)< 0.001a> 6.9066.265.80.675
Entropy5.15 (0.32)5.29 (0.20)< 0.001a> 5.2566.265.00.672
Interquartile Range*355.01 ± 93.75398.27 ± 84.740.001a> 319.4287.840.20.644
Kurtosis2.64 (1.17)2.26 (0.63)< 0.001a≤ 2.5674.353.00.652
Maximum*998.36 ± 122.101009.82 ± 121.380.527b
Mean Absolute Deviation*199.58 ± 36.68214.74 ± 32.060.003b> 201.1175.750.40.629
Mean*162.99 ± 119.13243.10 ± 122.06< 0.001b> 159.2377.054.70.691
Median*129.73 ± 109.82231.95 ± 146.890.001b> 185.6760.871.80.702
Minimum*− 479.49 ± 83.67− 487.67 ± 89.950.506b
Range*1477.85 ± 101.111497.49 ± 134.920.278b
RMAD *146.78 ± 35.51162.46 ± 31.450.002b> 134.0086.539.30.636
Root Mean Squared*1211.68 ± 121.501293.10 ± 121.190.001b> 1213.8074.357.30.693
Skewness0.578 (0.730)0.187 (0.648)< 0.001a≤ 0.78894.639.30.721
Total Energy (× 1011)7.09 (6.95)10.00 (9.94)< 0.001a> 6.966.265.80.675
Uniformity0.0323 (0.0100)0.0281 (0.0058)< 0.001a≤ 0.029262.267,50.678
Variance59,791 ± 17,61966,375 ± 15,8030.010b> 59,73473.053.00.618
Second order—gray level cooccurrence matrix (GLCM)
Autocorrelation721.06 (420.12)1009.76 (446.41)< 0.001a> 706.4287.847.90.698
Cluster tendency310.78 (144.06)354.96 (89.13)0.012a> 311.0575.751.30.608
Clustershade2585.01 (3335.90)807.02 (4056,87)< 0.001a≤ 1429.1060.870.10.676
Maximum probability0.00464 (0.00345)0.00353 (0.00332)0.005a≤ 0.0033047.373.50.620
Sumentropy6.0181 (0.3201)6.1410 (0.1770)< 0.001a> 6.028678.453.90.666
Joint energy (× 10–3)1.628 (1.022)1.320 (0.701)0.002a≤ 1.60071.653.00.631
Joint entropy9.814 (0.514)10.016 (0.481)0.005a> 9.920962.262.40.620
Correlation0.6750 ± 0.08100.6906 ± 0.08440.206b
Contrast58.517 (24.270)61.059 (24.522)0.203a
IDN0.9190 ± 0.01680.9169 ± 0.01160.238b
IMC0.7626 (0.1008)0.7949 (0.1031)0.027a> 0.831135.185.50.596
Second order—gray level run length matrix (GLRLM)
Gray Level NUN (× 10–2)3.202 (0.992)2.776 (0.577)< 0.001a≤ 2.90763.566.70.680
Gray Level Variance94.812 (39.683)107.492 (25.276)0.007a> 96.08273.053.00.617
High GL Run Emphasis751.038 (415.427)1021.780 (440.303)< 0.001a< 724.96889.247.90.699
Long-run emphasis1.192 (0.052)1.188 (0.058)0.465a
Long-run High GLE882.784 (474.590)1241.118 (520.097)< 0.001a> 1052.40268.967.50.704
Long-run Low GLE(× 10–3)2.889 (1.573)2.103 (1.366)< 0.001a≤ 2.50467.664.10.667
Low GL Run Emphasis (× 10–3)2.402 (1.146)1.802 (1.043)< 0.001a≤ 2.10768.963.20.673
Run Entropy5.501 (0.253)5.618 (0.171)< 0.001a> 5.51078.453.00.686
Run Length Nonuniformity0.892 (0.240)0.895 (0.269)0.478a
Run Percentage0.942 (0.138)0.943 (0.154)0.473a
Run Variance0.660 (0.183)0.646 (0.222)0.455a
Short Run Emphasis0.957 (0.103)0.958 (0.115)0.475a
Short-run High GLE719.368 (390.961)982.643 (415.463)< 0.001a> 693.11789.246.20.697
Short-run Low GLE (× 10–3)2.294 (1.096)1.730 (0.997)< 0.001a≤ 1.99368.964.10.675
Second order—neighborhood gray tone difference matrix (NGTDM)
Busyness27.061 (27.856)35.585 (28.361)0.071a
Coarseness (× 10–5)2.331 (2.957)1.785 (1.479)0.007a≤ 2.50575.747.90.616
Complexity5736.34 ± 1210.866093.04 ± 1887.260.150b
Contrast0.2082 ± 0.07660.2329 ± 0.06940.026b> 0.201266.252.10.605
Strength (× 10–2)3.518 (3.577)2.365 (1.710)< 0.001a≤ 2.62963.566.70.655
Second order—gray level size zone matrix (GLSZM)
GL NUN (× 10– 2)2.810 (0.567)2.615 (0.299)< 0.001a≤ 2.74873.055.60.663
GL Variance103.41 ± 21.82110.81 ± 20.470.020a> 97.7679.743.60.610
High GL Zone Emphasis816.41 (376.85)1057.04 (340.56)< 0.001b> 873.4177.059.00.700
Low GL Zone Emphasis(× 10–2)2.429 (1.146)1.775 (0.880)< 0.001a≤ 2.23474.356.40.677
Size Zone NUN0.4399 (0.0395)0.4425 (0.0499)0.884b
Large Area High GLE16,758 (42,765)25,675 (86,414)0.033a> 11,56085.137.60.592
Large Area Low GLE0.0964 (0.3851)0.0635 (0.2388)0.229a
Small Area High GLE620.68 ± 186.50745.62 ± 203.19< 0.001b> 616.2474.358.10.689
Small Area Low GLE (× 10–3)1.642 (0.790)1.231 (0.627)< 0.001a≤ 1.77387.841.90.675
Zone Entropy7.1386 ± 0.14907.2035 ± 0.18280.008b> 7.217859.571.80.639
Zone Percentage0.4554 ± 0.06780.4623 ± 0.08060.527b
Zone Variance26.686 (86.459)29.728 (160.183)0.981a
Second order—gray leve dependence matrix (GLDM)
Dependence Entropy7.410 ± 0.1797.511 ± 0.194< 0.001b> 7.48962.269.20.666
Dependence Nonuniformity85,241 (96,288)138,809 (115,232)0.001a> 94,98274.355.60.646
Dependence NUN0.2321 (0.0407)0.2359 (0.0477)0.314b
Dependence variance2.077 (0.815)1.983 (0.776)0.281a
Dependence GLNU13,166 (15,270)17,013 (18,239)0.031a> 15,83758.159.80.593
GL Variance95.750 ± 28.192106.284 ± 25.2850.010b> 95.66773.053.00.618
High GLE834.296 (424.553)1047.474 (442.943)< 0.001a> 719.57689.247.90.698
Low GLE (× 10–3)2.408 (1.157)1.801 (1.059)< 0.001a≤ 2.12668.963.20.672
Large Dependence Emphasis8.245 (2.248)8.690 (2.938)0.419a
Small Dependence Emphasis0.397 ± 0.0550.403 ± 0.0670.534b
Large DHGLE5541.21 (2636.73)7369.43 (3673.84)< 0.001a> 6562.7570.358.40.708
Large DLGLE (× 10– 2)1.979 (2.010)1.388 (1.199)0.001a≤ 1.45559.568.40.646
Small DHGLE354.751 ± 134.628435.646 ± 147.741< 0.001b> 331.55378.453.00.670
Small DLGLE (× 10– 4)8.898 (4.828)6.843 (3.277)< 0.001a≤ 8.19868.962.40.678

Group results were given as Median (IQR) or Mean ± SD according to the parameter’s Nonnormal or Normal distribution, respectively

Group1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU

IDN inverse difference normalized, IMC informational measure of correlation, RMAD robust mean absolute deviation, GL gray level, GLE gray level emphasis, NUN nonuniformity normalized, DHGLE dependence high gray level emphasis, DLGLE dependence low gray level emphasis

*Standardized results

aMann–Whitney test result

bT-test result

cROC analysis result

Radiomic parameters, comparison between the groups and results of ROC analysis Group results were given as Median (IQR) or Mean ± SD according to the parameter’s Nonnormal or Normal distribution, respectively Group1: Patients discharged from inpatient floor; Group 2: Patients transferred to ICU IDN inverse difference normalized, IMC informational measure of correlation, RMAD robust mean absolute deviation, GL gray level, GLE gray level emphasis, NUN nonuniformity normalized, DHGLE dependence high gray level emphasis, DLGLE dependence low gray level emphasis *Standardized results aMann–Whitney test result bT-test result cROC analysis result

Predictive logit-fit models

None of the clinical, volumetric or radiomics parameters provided a dependable univariate classifier. Therefore, logit-fit models were created (Table 5).
Table 5

Features of clinical, radiological and combined models in detecting ICU candidate COVID-19 patients

Model featuresTraining and validation set**Test set***
TypeParameterspOdds Ratio* (95% CI)C-index 95% CI§AUCSpec%Sens%Acc%PrecisionF1
Clinical modelAge0.0021.057 (1.023–1.092)

Training

0.846

0.824–0.866

Validation

0.805

0.724–0.869

0.73679.1766.6774.3666.670.67
Platelet count0.1170.995 (0.990–0.999)
eGFR0.3850.989 (0.966–1.014)
AST0.3221.016 (0.998–1.034)
LDH< 0.0011.007 (1.003–1.011)
Procalcitonin< 0.0018.542 (1.892–38.559)
Fever > 38 °C0.0923.418 (1.439–8.121)
Radiological modelPercent infected lung0.0101.043 (1.009–1.078)

Training

0.841

0.819–0.860

Validation

0.815

0.746–0.877

0.70879.1760.0071.7964.280.62
First order skewness< 0.0010.029 (0.006–0.138)
First order RMAD0.0300.969 (0.942–0.997)
GLCM sumentropy0.0027.181 (1.974–26.128)
GLDM LDLGLE0.0055.486 (1.650–18.245)
Combined modelPercent infected lung0.0151.048 (1009–1088)

Training

0.897

0.882–0.914

Validation

0.853

0.787–0.905

0.79487.5073.3382.0578.570.76
First order skewness0.0010.012 (0.001–0.166)
GLCM clustershade0.0581.000 (1.000–1.001)
GLSZM LALGLE0.0021.525 (1.174–1.982)
N/L ratio0.0511.088 (1.000–1.184)
Platelet count0.0090.993 (0.987–0.998)
eGFR< 0.0010.962 (0.942–0.983)
AST0.0241.021 (1.003–1.040)

AUC area under curve, Spec% specificity %, Sens% sensitivity % (Recall), Acc% accuracy %, eGFR estimated glomerular filtration rate, AST aspartate aminotransferase, LDH lactate dehydrogenase, N/L neutrophil to lymphocyte ratio, GLCM gray level co-occurrence matrix, GLDM LDLGLE gray level dependence matrix LargeDependenceLowGrayLevelEmphasis, GLSZM LALGLE gray level size zone matrix LargeAreaLowGrayLevelEmphasis

*Standardized values, **Cut-off = 0.500, ***Model-specific cut-off, § 95% CI calculated via 1000 iterations of bootstrap resampling

Features of clinical, radiological and combined models in detecting ICU candidate COVID-19 patients Training 0.846 0.824–0.866 Validation 0.805 0.724–0.869 Training 0.841 0.819–0.860 Validation 0.815 0.746–0.877 Training 0.897 0.882–0.914 Validation 0.853 0.787–0.905 AUC area under curve, Spec% specificity %, Sens% sensitivity % (Recall), Acc% accuracy %, eGFR estimated glomerular filtration rate, AST aspartate aminotransferase, LDH lactate dehydrogenase, N/L neutrophil to lymphocyte ratio, GLCM gray level co-occurrence matrix, GLDM LDLGLE gray level dependence matrix LargeDependenceLowGrayLevelEmphasis, GLSZM LALGLE gray level size zone matrix LargeAreaLowGrayLevelEmphasis *Standardized values, **Cut-off = 0.500, ***Model-specific cut-off, § 95% CI calculated via 1000 iterations of bootstrap resampling The clinical model’s PP was calculated using Eq. (2):where PLT is the platelet count, GFR is the estimated glomerular filtration rate, AST is aspartate aminotransferase, LDH is lactate dehydrogenase and PCT is procalcitonin. Although the clinical model had good specificity, its sensitivity was limited in the training and validation sets (Table 5). Radiological model’s PP calculated using Eq. (3):where PIL is the percent of infected lung, RMAD is Robust Mean Absolute Deviation, and LDLGLE is GLDM-LargeDependenceLowGrayLevelEmphasis. This model showed a better sensitivity but a worse specificity than the clinical model (Table 5). Combined model’s PP calculated using Eq. (4):where PIL is the percent of infected lung, S is the skewness, C is GLCM-Clustershade, GLE is GLSZM-LargeAreaLowGrayLevelEmphasis, NLR is the neutrophil-to-lymphocyte ratio, PLT is the platelet count, GFR is the estimated glomerular filtration rate, and AST is aspartate aminotransferase. This model had the highest AUC and specificity (Table 5). The calculated p values for the clinical, radiological, and combined models were 0.376, 0.399, and 0.631, respectively, in the Hosmer–Lemeshow test. Radiologic and combined models showed better calibration than the clinical model (Fig. 2). The VIF value was less than 3.0 for all parameters in the models; thus, there was no significant multicollinearity.
Fig.2

Calibration plot of Clinical (a), Radiological (b), and Combined (c) models

Calibration plot of Clinical (a), Radiological (b), and Combined (c) models The optimal cutoff-off values were calculated for the clinical, radiological, and combined models as 0.565, 0.444, and 0.429, respectively (Fig. 3).
Fig.3

Prediction probability score versus model score graph of clinical (a), radiological (b), and combined (c) models. In the ROC analysis of the cross-validation sets, the optimal cutoff values of the models were determined and marked by using the Youden index

Prediction probability score versus model score graph of clinical (a), radiological (b), and combined (c) models. In the ROC analysis of the cross-validation sets, the optimal cutoff values of the models were determined and marked by using the Youden index

Test set features and test set results of logit-fit models

Fifty-nine patients in the training set and 15 in the test set were transferred to the ICU (p = 0.908, chi-squared test). The median age of the patients was 65 (22.8) years in the training set and 61 [20] years in the test set (p = 0.056, Mann–Whitney test). The training set included 88 males and 64 females, and the test set included 22 males and 17 females (p = 0.867, chi-squared test). In the test set, the combined model produced the best AUC, followed by the clinical model (Table 5).

Synthetic data generation and random forest algorithm

Despite the fact that they were well calibrated, there were problems with the logit-fit models due to the study population. The data had a low sample size and were affected by class-imbalance. Due to the fact that our sample size was low, the BIC method, which penalizes complex models with more parameters [25], was preferred in the model selection for logit-fit models to avoid overfitting [26]. In addition to the logit-fit models, a random forest classification algorithm, as a method that resistant to overfitting, was used for generating a model that uses all of the study parameters. While 93 (61%) patients in the training set did not require intensive care, 59 (39%) patients were transferred to the ICU. The use of the unbalanced training set, especially for the high-dimensional data, was reported as a reason for model bias in favor of the majority class [27]. To solve this problem, the instance of the minority class, which involves patients being transferred to the ICU, was increased to 93 by using the SMOTE algorithm. The prediction results on the test set of the model that was trained with the random forest algorithm on a more balanced training set did not lead to an increase in specificity (87.5%). On the other hand, sensitivity (80%) was increased, and increases in accuracy (84.6%), AUC (0.837), precision (0.80), and F1 score (0.80) were followed. A feature importance study was also conducted for the random forest model (Fig. 4). Overall importance was the highest in the PCT, followed by Skewness, LDH, PIL, CK, and GLDM-LargeDependenceLowGrayLevelEmphasis. It was observed that most of the parameters that were included in the logit-fit models also had high mean decrease accuracy values in the random forest model.
Fig.4

Mean Decrease Accuracy of study parameters in Random Forest model. Features with the highest overall importance were indicated. PIL: Percent infected lung, PLT: Platelet count, LDH: Lactate dehydrogenase, CK: Creatine kinase, PCT: Procalcitonin

Mean Decrease Accuracy of study parameters in Random Forest model. Features with the highest overall importance were indicated. PIL: Percent infected lung, PLT: Platelet count, LDH: Lactate dehydrogenase, CK: Creatine kinase, PCT: Procalcitonin When the ROC curves of the models were evaluated by pairwise comparison [28], no significant difference was found between the RF model and the Combined logit fit model (Fig. 5).
Fig. 5

ROC curves and their non-parametric pairwise comparison table of machine learning models in the study. The p values of the comparison results were given. RF: Random Forest model

ROC curves and their non-parametric pairwise comparison table of machine learning models in the study. The p values of the comparison results were given. RF: Random Forest model

Power analysis

On the post hoc analysis, for a difference of the mean for two independent groups including 117 and 74 patients with input parameters of a medium effect size (Cohen’s d = 0.5), two tails, and alpha = 0.05, the calculated power was 0.91.

Discussion

COVID-19 surges in the United States over the past two years were assessed by the CDC as three periods [29]. These are the Winter 2020–2021 period, the Delta period from July to November 2021, and the Omicron period that we are currently in, which started in December 2021. The maximum number of 7-day moving average ICU bed in use for COVID-19 was reported as 27,958 (January 9–16, 2021) in the Winter 2020–2021 period, 24,775 (September 6–13, 2021) in the Delta period, and 24,776 (January 15, 2022) in the Omicron period [29]. It has been reported that the number of patients who required intensive care due to the Omicron variant is one-fourth compared to the Delta variant [30]. The lack of difference between the ICU admission numbers between the Omicron period and the Delta period is due to the difference in the number of COVID-19 cases. While the maximum case number of 7-day moving average in the Delta period was 164,249, this value was reported as 798,976 in the Omicron period [29]. Since none of the 28 clinical, 21 volumetric, and 74 radiomics parameters could reliably predict patients who would require ICU admission, clinical, radiological, and combined models were built, and the combined model provided the best predictions. Models based solely on the clinical data emerged with easy accessibility and usability features. High fever, older age, elevated LDH, increased acute phase reactants and a decreased lymphocyte count were frequently reported in ICU candidates [4-11]. In our study, procalcitonin was distinguished as the parameter with the highest odds ratio value, and patients with chronic renal failure showed a significantly higher need for ICU care, which is consistent with previous publications [12]. In a study involving cases from 100 hospitals in South Korea, the presence or absence of chest X-ray findings did not significantly improve the clinical model outcomes when used as a parameter [12]. On the other hand, it has been reported that a better prediction was obtained when a deep-learning model, which was trained to discriminate critical and noncritical chest X-ray findings, was combined with a clinical model [31]. In prognostic studies that have evaluated CT findings, both increased volumes in pulmonary involvement [32] and an increased ratio of consolidation in pulmonary lesions [33] were associated with an unfavorable prognosis. In our study, the percentage of infected lung parenchyma was included in the radiological and combined models. The mean and median densities of the lesions were significantly higher in ICU patients, suggesting a higher frequency of consolidation. However, this parameter was not included, and skewness was selected for the models by BIC. It has been previously shown that as the GGO areas in the lesions increase, the skewness value of the lesion also increases [18]. We showed that the skewness of the lesions was significantly higher in patients who did not require ICU admission. VSS, which includes the classification of lesions as GGO or consolidation, has been reported as an effective method in the evaluation of COVID-19 prognosis [34, 35]. However, this method is reported to have reliability and reproducibility problems due to issues such as difficulty classifying lesions containing both areas of consolidation and GGO, and radiomics models were found to be more useful for predicting prognosis [36]. The total lung volume was 12% lower in the ICU group. Alveolar collapse is known to occur in patients with SARS-CoV-2 infection [37, 38], and surfactant reduction that results from the loss of alveolar type 2 cells, increased inflammatory cell migration to the interstitial space, and microvascular thrombosis may be responsible for this outcome [39]. Although the total lung volume was not directly entered into the models in the parameter selections, it participated in the calculation of the percent of infected lung parameter. Although segmentation can be performed manually in radiomics modeling studies related to COVID-19 [18, 40], this method takes considerable time due to the large number of lesions per patient, and the reproducibility problem needs to be overcome. Methods such as the segmentation of the entire lung (healthy and diseased), rather than individual lesions, have been suggested [36]. Automated segmentation solves all of these problems. While the use of AI in CT sections is not recommended as a screening test, its use as a predictive and prognostic decision support system in hospitalized patients has been suggested [41]. In a study examining clinical data and the radiomic features calculated from automated segmentations, a combination model produced the best predictions [19]. In our study, apart from radiomic parameters, the percent of infected lung parameter was included in the models. Thus, the subjective calculation of critical parameters, such as the lesion classification and the ratio of the diseased parenchyma in the VSS method, were solved, and the models were based on objective criteria. Additionally, volumetric parameters produced by automated segmentations are reportedly more accurate than human semiquantitative estimates [42]. The model that we propose for predicting the risk of ICU in the COVID-19 patient has two important features. First, it does not solely use clinical data. Models that are solely based on laboratory parameters did not consider lung parenchyma involvement as a parameter, nevertheless all of the combined models in our study included more than one radiological parameter, regardless of the machine learning algorithm that was used. Second, the reproducibility problem of VSS methods has been resolved by using the segmentations of the deep learning algorithm that was trained with CT studies from multiple hospitals in affected countries across Europe. Thus, we believe that models based on non-subjective clinical and radiological data that require no parameter calculation effort and that provide reproducible results could be more widely used in the field and can help healthcare providers to make decisions and better organize hospitals' resources. This study has some limitations. First, these are the results of single center. However, we used automated segmentation, and CT data were resampled during the radiomics parameter calculation. Second, the patient population was retrospectively selected from patients who had an indication for hospitalization, which could introduce selection bias. Third, the relationship between the antiviral treatment efficacy and the ICU requirements was not evaluated in this study since there is no definitively proven antiviral treatment for COVID-19. Fourth, we used an unbalanced data set; however, we increased the sensitivity by adding synthetic data to the training set. Finally, patients with a contrast-enhanced examination were not included in this study since the radiomics parameters would be affected. We believe that a different model is required for embolism cases.

Conclusion

The model that was created by combining the radiological parameters obtained by automated segmentation and the clinical parameters in COVID-19 patients requiring hospitalization was found to be useful as an objective method in predicting the risk of developing critical illness.
  35 in total

1.  Evaluation of lung volume loss with 3D CT volumetry in COVID-19 patients.

Authors:  Recep Savaş; Ayşegül Öz Özcan
Journal:  Diagn Interv Radiol       Date:  2021-01       Impact factor: 2.630

2.  Multicenter Study of Temporal Changes and Prognostic Value of a CT Visual Severity Score in Hospitalized Patients With Coronavirus Disease (COVID-19).

Authors:  Xiaofeng Wang; Xingxing Hu; Weijun Tan; Peter Mazzone; Eduardo Mireles-Cabodevila; Xiaozhen Han; Pingyue Huang; Weihua Hu; Raed Dweik; Zhenshun Cheng
Journal:  AJR Am J Roentgenol       Date:  2020-09-09       Impact factor: 3.959

3.  Development and Validation of a Clinical Risk Score to Predict the Occurrence of Critical Illness in Hospitalized Patients With COVID-19.

Authors:  Wenhua Liang; Hengrui Liang; Limin Ou; Binfeng Chen; Ailan Chen; Caichen Li; Yimin Li; Weijie Guan; Ling Sang; Jiatao Lu; Yuanda Xu; Guoqiang Chen; Haiyan Guo; Jun Guo; Zisheng Chen; Yi Zhao; Shiyue Li; Nuofu Zhang; Nanshan Zhong; Jianxing He
Journal:  JAMA Intern Med       Date:  2020-08-01       Impact factor: 21.873

4.  Radiomics Analysis of Computed Tomography helps predict poor prognostic outcome in COVID-19.

Authors:  Qingxia Wu; Shuo Wang; Liang Li; Qingxia Wu; Wei Qian; Yahua Hu; Li Li; Xuezhi Zhou; He Ma; Hongjun Li; Meiyun Wang; Xiaoming Qiu; Yunfei Zha; Jie Tian
Journal:  Theranostics       Date:  2020-06-05       Impact factor: 11.556

5.  Use of CT and artificial intelligence in suspected or COVID-19 positive patients: statement of the Italian Society of Medical and Interventional Radiology.

Authors:  Emanuele Neri; Vittorio Miele; Francesca Coppola; Roberto Grassi
Journal:  Radiol Med       Date:  2020-04-29       Impact factor: 3.469

6.  Prognostication of patients with COVID-19 using artificial intelligence based on chest x-rays and clinical data: a retrospective study.

Authors:  Zhicheng Jiao; Ji Whae Choi; Kasey Halsey; Thi My Linh Tran; Ben Hsieh; Dongcui Wang; Feyisope Eweje; Robin Wang; Ken Chang; Jing Wu; Scott A Collins; Thomas Y Yi; Andrew T Delworth; Tao Liu; Terrance T Healey; Shaolei Lu; Jianxin Wang; Xue Feng; Michael K Atalay; Li Yang; Michael Feldman; Paul J L Zhang; Wei-Hua Liao; Yong Fan; Harrison X Bai
Journal:  Lancet Digit Health       Date:  2021-03-24

7.  Radiomics: Images Are More than Pictures, They Are Data.

Authors:  Robert J Gillies; Paul E Kinahan; Hedvig Hricak
Journal:  Radiology       Date:  2015-11-18       Impact factor: 11.105

8.  Lung volume reduction and infection localization revealed in Big data CT imaging of COVID-19.

Authors:  Feng Shi; Ying Wei; Liming Xia; Fei Shan; Zhanhao Mo; Fuhua Yan; Dinggang Shen
Journal:  Int J Infect Dis       Date:  2020-11-03       Impact factor: 3.623

9.  Risk factors for intensive care unit admission and mortality in hospitalized COVID-19 patients.

Authors:  Ahmed Ayaz; Ainan Arshad; Hajra Malik; Haris Ali; Erfan Hussain; Bushra Jamil
Journal:  Acute Crit Care       Date:  2020-11-11
View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.