Multicenter Study on COVID-19 Lung Computed Tomography Segmentation with varying Glass Ground Opacities using Unseen Deep Learning Artificial Intelligence Paradigms: COVLIAS 1.0 Validation.

Jasjit S Suri^1,2, Sushant Agarwal^3,4, Luca Saba⁵, Gian Luca Chabert⁵, Alessandro Carriero⁶, Alessio Paschè⁵, Pietro Danna⁵, Armin Mehmedović⁷, Gavino Faa⁸, Tanay Jujaray^3,9, Inder M Singh¹⁰, Narendra N Khanna¹¹, John R Laird¹², Petros P Sfikakis¹³, Vikas Agarwal¹⁴, Jagjit S Teji¹⁵, Rajanikant R Yadav¹⁶, Ferenc Nagy¹⁷, Zsigmond Tamás Kincses¹⁸, Zoltan Ruzsa¹⁹, Klaudija Viskovic⁷, Mannudeep K Kalra²⁰.

Abstract

Variations in COVID-19 lesions such as ground-glass opacities (GGO), consolidations, and crazy paving can compromise the ability of solo deep learning (SDL) or hybrid deep learning (HDL) artificial intelligence (AI) models to predict automated COVID-19 lung segmentation in Computed Tomography (CT) from unseen data, leading to poor clinical manifestations. As the first study of its kind, "COVLIAS 1.0-Unseen" proves two hypotheses: (i) contrast adjustment is vital for AI, and (ii) HDL is superior to SDL. In a multicenter study, 10,000 CT slices were collected from 72 Italian (ITA) patients with low-GGO and 80 Croatian (CRO) patients with high-GGO. Hounsfield Units (HU) were automatically adjusted to train the AI models and predict from test data, leading to four combinations: two Unseen sets, (i) train-CRO:test-ITA and (ii) train-ITA:test-CRO, and two Seen sets, (iii) train-CRO:test-CRO and (iv) train-ITA:test-ITA. COVLIAS used three SDL models (PSPNet, SegNet, and UNet) and six HDL models (VGG-PSPNet, VGG-SegNet, VGG-UNet, ResNet-PSPNet, ResNet-SegNet, and ResNet-UNet). Two trained, blinded senior radiologists conducted ground truth annotations. Five types of performance metrics were used to validate COVLIAS 1.0-Unseen, which was further benchmarked against MedSeg, an open-source web-based system. After HU adjustment, for DS and JI, HDL (Unseen AI) > SDL (Unseen AI) by 4% and 5%, respectively. For CC, HDL (Unseen AI) > SDL (Unseen AI) by 6%. The COVLIAS-MedSeg difference was < 5%, meeting regulatory guidelines. Unseen AI was successfully demonstrated using automated HU adjustment. HDL was found to be superior to SDL.

Keywords: And AI; COVID-19; Glass ground opacities; Hounsfield units; Hybrid deep learning; Lung CT; Segmentation; Solo deep learning

Year: 2022 PMID： 35988110 PMCID： PMC9392994 DOI： 10.1007/s10916-022-01850-y

Source DB: PubMed Journal: J Med Syst ISSN： 0148-5598 Impact factor: 4.920

Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus that causes COVID-19, has infected 385 million individuals and killed 5.7 million people globally as of 3rd February 2022 [1]. On March 11th, 2020, the World Health Organization (WHO) declared COVID-19, the novel coronavirus disease, a global pandemic [2]. COVID-19 [3, 4] has proven to be worse in individuals with comorbidities such as coronary artery disease [3, 5], diabetes [6], atherosclerosis [7], fetal conditions [8], etc. [9-11]. It has also caused architectural distortion through the interaction between alveolar and vascular changes [12] and has affected aspects of daily life such as nutrition [13]. Pathology has shown that vaccine-induced immune thrombotic thrombocytopenia (VITT) can be triggered even after immunization with ChAdOx1 nCoV-19 [14]. It was also observed that adults who were born small, so-called intrauterine growth restriction (IUGR), are also likely to be affected by COVID-19 [8]. One of the gold standards for COVID-19 detection is the "reverse transcription-polymerase chain reaction" test, commonly known as RT-PCR. Nonetheless, the RT-PCR test takes time and has low sensitivity [15-17]. This is where image-based analysis of COVID-19 patients using chest radiographs and Computed Tomography (CT) [18-20] helps to diagnose the disease and works as a reliable complement to RT-PCR [21]. In the general diagnosis of COVID-19 and in body imaging, CT has shown high sensitivity and reproducibility [20, 22–24]. The primary benefit of CT [25, 26] is its capacity to identify anomalies such as ground-glass opacity (GGO), consolidation, and other opacities [27-29] seen in COVID-19 patients [30–35]. Deep learning (DL) is a branch of AI that employs deep layers to provide fully automatic feature extraction, classification, and segmentation of the input data [36, 37]. Our team has developed the COVLIAS system, which uses deep learning models for lung segmentation [38-40].
In these previous studies, only one cohort was used with cross-validation, which biased the performance because both the training and testing data came from the same CT machine, the same hospital settings, and the same geographical region [41-43]. To overcome this weakness, we introduce a multicentre study in which training is conducted on data from one country (Croatia) and testing on data from another (Italy), or vice versa. This so-called “Unseen AI” design is one of the innovations of the proposed study and has only recently gained visibility [38, 44]. Due to variations in COVID-19 lesions such as GGO, consolidations, and crazy paving, the ability of AI models to predict automated COVID-19 lung segmentation on Unseen CT data is compromised, leading to poor clinical manifestations (see Fig. 1). This happens when the Hounsfield Units (HU) [45] of the CT images are not consistent between the training and testing paradigms, which leads to over- and under-estimation of the predicted region. This can be prevented via normalization right before AI deployment [46, 47]. We embed such normalization in our AI framework automatically, which is another innovation besides the Unseen AI model design.

Fig. 1

Overlay of segmentation results (red) from the ResNet-SegNet HDL models trained without adjusting the HU level. The white arrow represents the region where the ResNet-SegNet HDL model under-estimates the lung region

Recent advances in deep learning, such as hybrid deep learning (HDL), have shown promising results [38–40, 48–52]. Using this premise, we hypothesize that HDL models are superior to solo DL (SDL) models for segmentation. In this study, we have designed nine SDL and HDL models that are trained and tested for COVID-19-based lung segmentation on multicentre databases. We further offer insight into how the nine AI models respond to COVID-19 data sets, which is another unique contribution of the proposed study. The analysis includes attributes such as (i) the size of the model, (ii) the number of layers in the AI architecture, (iii) the segmentation model utilized, and (iv) the encoder part of the AI model. These can be used for a comparison between the nine AI models. Lastly, to prove the effectiveness of the AI models, we present performance evaluation using (i) Dice Similarity (DS), (ii) Jaccard Index (JI) [53], (iii) Bland–Altman plots (BA) [54, 55], (iv) Correlation coefficient (CC) plot [56, 57], and (v) Figure of Merit. Finally, as part of scientific validation, we compare the performance of COVLIAS 1.0-Unseen against MedSeg [58], a web-based lung segmentation tool.

Literature Survey

Artificial intelligence (AI) has been in existence for a while, especially in the field of medical imaging [59, 60]. AI can play a vital role in the investigation of CT and X-ray images, assisting in the detection of COVID-19 type and overcoming the shortage of expert workers. It started with machine learning, which moved into applications of point-based models such as diabetes [61, 62], neonatal and infant mortality [63], and gene analysis [64], and image-based machine learning models such as carotid plaque classification [65-69], thyroid [70], liver [71], stroke [24], coronary [72], ovarian [73], prostate [74], skin cancer [75, 76], Wilson disease [77], ophthalmology [78], etc. The major challenge with these models is the feature extraction process, which is ad hoc in nature and, therefore, very time-consuming [79]. It has recently been shown that deep learning (DL) models overcome this weakness [59, 60]. Paluru et al. [80] proposed AnamNet, a hybrid of UNet and ENet, to segment COVID-19-based lesions using 4,300 images (from 69 patients, at a resolution of 512²) [81]. The authors compared the models against ENet [82], UNet++ [83], SegNet, and LEDNet [84]. The DSC for lesion detection turned out to be 0.77. Saood et al. [85] used a set of 100 images downscaled to 256² to compare two models, namely UNet and SegNet, and showed DS scores of 0.73 and 0.74, respectively. Cai et al. [86] established a tenfold CV protocol on 250 images from 99 patients and adopted the UNet model with a DS of 0.77. They also suggested a method for predicting the duration of an intensive care unit (ICU) stay. Suri et al. [40] benchmarked NIH [87] (a conventional model) against three AI models, namely, SegNet, VGG-SegNet, and ResNet-SegNet, using nearly 5000 CT scans from 72 patients at an image resolution of 768², concluding that ResNet-SegNet was the best performing model. In an inter-variability study by Suri et al. [39], three models, namely, PSPNet, VGG-SegNet, and ResNet-SegNet, were used. The authors showed that HDL models outperformed SDL models by ~ 5% for all the performance evaluation metrics using 5000 CT slices (taken from 72 patients) at an image resolution of 768². A recent study by the same authors [38] used VGG-SegNet and ResNet-SegNet to compare their COVLIAS 1.0 system against MedSeg. This study used HDL models and applied standard Mann–Whitney, Paired t-Test, and Wilcoxon tests to prove the system's stability.

Method and Methodology

Demographics and Data Acquisition

The proposed study utilizes two cohorts from different countries. The first dataset contains 72 adult Italian patients (approximately 5000 images, Fig. 2), of whom 46 were male and the remainder female. A total of 60 patients tested positive by RT-PCR, and broncho-alveolar lavage [88] was used for 12 individuals. This Italian cohort had an average GGO of 2.1, which was considered low. The second cohort consisted of 80 Croatian patients (approximately 5000 images, Fig. 3), of whom 57 were male and the rest female. This cohort had a mean age of 66 and an average GGO of 4.1, which was considered high.

Fig. 2

Sample CT scans taken from raw CRO data sets

Fig. 3

Sample CT scans taken from raw ITA data sets

For the patients in the Italian cohort, CT data were acquired using Philips' automatic tube current modulation (Z-DOM), while Croatia's CT volumes were acquired using the FCT Speedia HD 64-detector MDCT scanner (Fujifilm Corporation, Tokyo, Japan, 2017). The exclusion criteria consisted of patients having metallic items or poor image quality without artifacts or blurriness induced by patient movement during scan execution [38].

AI Architectures Adapted

The proposed study uses a total of nine AI models, of which (i) PSPNet (see Supplemental A.1), (ii) SegNet, and (iii) UNet are SDL models and (iv) VGG-PSPNet (Fig. 4), (v) ResNet-PSPNet (Fig. 5), (vi) VGG-SegNet (see Supplemental A.2), (vii) ResNet-SegNet (see Supplemental A.3), (viii) VGG-UNet (Fig. 6), and (ix) ResNet-UNet (Fig. 7) are the HDL models. The difference between SDL and HDL is that the traditional backbone, or encoder part, of the SDL model is replaced with a different model such as VGG or ResNet. Recent findings by Suri et al. [39, 40, 48, 49, 89] show that employing HDL models over SDL models in the medical sector helps learn complicated imaging features rapidly and reliably. Using this knowledge that HDL outperforms SDL, we here introduce four new HDL models, namely, VGG-PSPNet, ResNet-PSPNet, VGG-UNet, and ResNet-UNet, for lung segmentation of COVID-19-based CT images.

Fig. 4

VGG-PSPNet architecture

Fig. 5

ResNet-PSPNet architecture

Fig. 6

VGG-UNet architecture

Fig. 7

ResNet-UNet architecture

UNet [90] was one of the first medical segmentation models and consists mainly of two sections: (i) the encoder, where the model learns the features in the images, and (ii) the decoder, which up-samples the feature maps to produce the desired output, such as the segmented binary lung mask in this study. Another model used in this paper is SegNet [91], which transfers only the pooling indices from the compression (encoder) path to the expansion (decoder) path, thereby using little memory. The Pyramid Scene Parsing Network (PSPNet) [92] is a semantic segmentation network that considers the full context of an image using its pyramid pooling module. PSPNet extracts the feature map from an input image using a pretrained CNN and the dilated network technique. The size of the resulting feature map is 1/8 that of the input image. Finally, the collection of these features is used to generate the output binary mask. Residual networks (ResNet) [93] use "skip connections" together with "batch normalization" to train deep layers without sacrificing efficiency, permitting gradients to bypass a set number of levels. This mitigates the vanishing gradient problem, which is not addressed in VGGNet [94]. The primary attributes of the AI models, such as the backbone used in the architecture, the number of layers in the training models, the total number of parameters in the architecture, and the final size of the trained models, are further discussed and compared in the discussion section.
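The idea of passing only pooling indices from the encoder to the decoder, as described for SegNet above, can be illustrated with a minimal NumPy sketch. This is not the COVLIAS implementation: the function names `max_pool_with_indices` and `max_unpool` are hypothetical, and the sketch assumes a single-channel 2D feature map with non-overlapping pooling windows.

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """Non-overlapping k x k max pooling on a 2D array.

    Returns the pooled map and, for each pooled value, the flat index of
    its location in the input (the "pooling indices").
    """
    h, w = x.shape
    assert h % k == 0 and w % k == 0, "sketch assumes divisible dimensions"
    # Group the input into (h//k, w//k) blocks, each flattened to k*k values.
    blocks = (x.reshape(h // k, k, w // k, k)
               .transpose(0, 2, 1, 3)
               .reshape(h // k, w // k, k * k))
    local = blocks.argmax(axis=2)   # position of the maximum inside each block
    pooled = blocks.max(axis=2)
    # Convert the within-block position back to input coordinates.
    rows = np.arange(h // k)[:, None] * k + local // k
    cols = np.arange(w // k)[None, :] * k + local % k
    return pooled, rows * w + cols

def max_unpool(pooled, indices, shape):
    """Decoder-side upsampling: place each pooled value at its stored index."""
    out = np.zeros(shape, dtype=pooled.dtype)
    out.flat[indices.ravel()] = pooled.ravel()
    return out

# Hypothetical 4 x 4 feature map.
feature_map = np.array([[1., 2., 5., 6.],
                        [3., 4., 7., 8.],
                        [9., 1., 2., 3.],
                        [4., 5., 6., 0.]])
pooled, idx = max_pool_with_indices(feature_map)
restored = max_unpool(pooled, idx, feature_map.shape)
```

Only the integer index map has to be kept for the decoder, rather than the full encoder feature map, which is the memory saving the text refers to.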

Experimental Protocol

This study involves two datasets from different centers, each of ~ 5000 lung CT images for COVID-19 patients. We have utilized a fivefold cross-validation [95, 96] technique for the training of AI models without overlap. The training and testing performance was determined by the accuracy score of the binary output of the trained AI model and gold standard [39, 40], respectively. The accuracy of the system was computed using standardized protocol given the true positive, true negative, false negative, and false positive. Finally, to assess the model's training during the backpropagation, the cross-entropy loss function was employed. The plots of the accuracy and loss function can be seen in Figs. 8 and 9.
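As a hedged illustration of the two quantities named above, the sketch below computes pixel-wise accuracy from true/false positives and negatives and a cross-entropy loss for a binary lung mask. The binary (two-class) form of the loss and the function names are assumptions made for illustration; the source only states that accuracy and cross-entropy were used.

```python
import numpy as np

def pixel_accuracy(pred_mask, gt_mask):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN) over all pixels."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    tp = np.sum(pred & gt)     # lung predicted, lung in ground truth
    tn = np.sum(~pred & ~gt)   # background predicted, background in ground truth
    fp = np.sum(pred & ~gt)    # lung predicted, background in ground truth
    fn = np.sum(~pred & gt)    # background predicted, lung in ground truth
    return float((tp + tn) / (tp + tn + fp + fn))

def binary_cross_entropy(prob, gt_mask, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and the mask."""
    p = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    y = np.asarray(gt_mask, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

# Hypothetical 2 x 2 example: one false positive out of four pixels.
gt = np.array([[1, 0], [0, 1]])
pred = np.array([[1, 0], [1, 1]])
acc = pixel_accuracy(pred, gt)
loss = binary_cross_entropy(np.full((2, 2), 0.5), gt)
```

In the example, three of the four pixels agree with the ground truth, so the accuracy is 0.75; a constant prediction of 0.5 gives a loss of ln 2.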

Fig. 8

Accuracy and loss plot for the nine AI models for the training on the CRO dataset

Fig. 9

Accuracy and loss plot for the nine AI models for the training on the ITA dataset

Results and Performance Evaluations

Results

To test our hypothesis that HDL models outperform SDL models, we present a comparison between (i) SDL and HDL models and (ii) models trained on high-GGO and low-GGO lung CT images. The accuracy and loss plots for the nine AI models on the CRO and ITA datasets are presented in Figs. 8 and 9. Using overlays (Figs. 10, 11, 12 and 13), we present a visual representation of the results from the AI models in four scenarios. For Seen analysis: (i) train on Croatia data (CRO) and test on CRO, and (ii) train on Italy data (ITA) and test on ITA. For Unseen analysis: (iii) train on CRO and test on ITA, and (iv) train on ITA and test on CRO. This study makes use of two datasets: (i) CRO, with ~ 5000 CT images of COVID-19 patients who are considered high-GGO, and (ii) ITA, with ~ 5000 COVID-19 CT images of patients regarded as low-GGO.

Fig. 10

Visual overlays (set 1) showing the AI (green) output against the GT (red) for Seen analysis

Fig. 11

Visual overlays (set 2) showing the AI (green) output against the GT (red) for Seen analysis

Fig. 12

Visual overlays (set 1) showing the AI (green) output against the GT (red) for Unseen analysis

Fig. 13

Visual overlays (set 2) showing the AI (green) output against the GT (red) for Unseen analysis

Performance Evaluation

This study presents (i) DS, (ii) JI, (iii) BA plots, (iv) CC plots, and (v) Figure of Merit (FoM) as part of the performance evaluation of the nine AI models under Seen and Unseen settings. The cumulative frequency distribution (CFD) plots for DS and JI are presented in Figs. 14, 15, 16 and 17 at a threshold cutoff of 80%. Figures 18 and 19 show the BA plots, with mean and standard deviation (SD) lines, for the lung area estimated by the AI models against the ground truth tracings. Similarly, CC plots with a cutoff of 80% are displayed in Figs. 20 and 21. Tables 1, 2 and 3 summarize the mean, SD, and percentage improvement for the nine AI models for DS, JI, and CC. Comparing HDL against SDL across the four scenarios (CRO-CRO, ITA-ITA, CRO-ITA, and ITA-CRO), the DS score is better by 1%, 1%, 3%, and 1%, the JI score is better by 3%, 3%, 5%, and 2%, and the CC is better by 2%, 1%, 1%, and 6%, supporting the hypothesis that HDL > SDL for COVID-19 lungs. The standard deviation for most groups of AI models lies in the range of 0.01 to 0.06, i.e., in the second decimal place, which is considered stable.
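For readers unfamiliar with the two overlap metrics, the following sketch shows the standard definitions of Dice Similarity and the Jaccard Index for binary masks. The function names and the convention of returning 1.0 for two empty masks are assumptions for illustration, not details taken from COVLIAS.

```python
import numpy as np

def dice_similarity(mask_a, mask_b):
    """DS = 2|A ∩ B| / (|A| + |B|) for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * np.sum(a & b) / total)

def jaccard_index(mask_a, mask_b):
    """JI = |A ∩ B| / |A ∪ B| for two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.sum(a | b)
    if union == 0:
        return 1.0  # assumed convention, as above
    return float(np.sum(a & b) / union)

# Hypothetical AI output and ground-truth masks.
ai_mask = np.array([[1, 1], [0, 0]])
gt_mask = np.array([[1, 0], [0, 0]])
ds = dice_similarity(ai_mask, gt_mask)
ji = jaccard_index(ai_mask, gt_mask)
```

The two metrics are monotonically related by JI = DS / (2 - DS), which is why JI values are systematically lower than the DS values for the same segmentation.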

Fig. 14

Cumulative frequency plot for Dice using Seen analysis

Fig. 15

Cumulative frequency plot for Dice using Unseen analysis

Fig. 16

Cumulative frequency plot for Jaccard using Seen analysis

Fig. 17

Cumulative frequency plot for Jaccard using Unseen analysis

Fig. 18

BA plot for Seen analysis

Fig. 19

BA plot for Unseen analysis

Fig. 20

CC plot for Seen analysis

Fig. 21

CC plot for Unseen analysis

Table 1

Dice Similarity table for the nine AI models

Dice Similarity: Solo Deep Learning
	Seen-AI		Unseen-AI
Model	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
PSPNet	0.93	0.93	0.93	0.88
SegNet	0.95	0.96	0.89	0.91
UNet	0.93	0.95	0.92	0.9
µ	0.94	0.95	0.91	0.90
σ	0.01	0.02	0.02	0.02
Dice Similarity: Hybrid Deep Learning
	Seen-AI		Unseen-AI
Model	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
VGG-PSPNet	0.93	0.94	0.94	0.9
VGG-SegNet	0.94	0.96	0.95	0.92
VGG-UNet	0.96	0.95	0.93	0.84
ResNet-PSPNet	0.95	0.95	0.95	0.91
ResNet-SegNet	0.96	0.97	0.95	0.94
ResNet-UNet	0.96	0.97	0.94	0.93
µ	0.95	0.96	0.94	0.91
σ	0.01	0.01	0.01	0.04
% Improvement	1%	1%	3%	1%
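The "% Improvement" row of Table 1 can be reproduced from the per-model values. The sketch below assumes the improvement is the relative difference between the HDL and SDL column means; the source does not state the formula, so this is an assumption, and the variable names are illustrative.

```python
from statistics import mean

SCENARIOS = ["CRO-CRO", "ITA-ITA", "CRO-ITA", "ITA-CRO"]

# Dice Similarity values copied from Table 1 (one list per model, ordered as SCENARIOS).
SDL_DS = {
    "PSPNet": [0.93, 0.93, 0.93, 0.88],
    "SegNet": [0.95, 0.96, 0.89, 0.91],
    "UNet":   [0.93, 0.95, 0.92, 0.90],
}
HDL_DS = {
    "VGG-PSPNet":    [0.93, 0.94, 0.94, 0.90],
    "VGG-SegNet":    [0.94, 0.96, 0.95, 0.92],
    "VGG-UNet":      [0.96, 0.95, 0.93, 0.84],
    "ResNet-PSPNet": [0.95, 0.95, 0.95, 0.91],
    "ResNet-SegNet": [0.96, 0.97, 0.95, 0.94],
    "ResNet-UNet":   [0.96, 0.97, 0.94, 0.93],
}

def column_means(table):
    """Mean score per scenario across all models in the table."""
    return [mean(column) for column in zip(*table.values())]

def percent_improvement(sdl, hdl):
    """Assumed definition: 100 * (mean_HDL - mean_SDL) / mean_SDL per scenario."""
    return [100.0 * (h - s) / s
            for s, h in zip(column_means(sdl), column_means(hdl))]

improvement = percent_improvement(SDL_DS, HDL_DS)
for scenario, value in zip(SCENARIOS, improvement):
    print(f"{scenario}: {value:.1f}%")
```

Under this assumed definition, the values round to the 1%, 1%, 3%, and 1% listed in the last row of Table 1.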

Table 2

Jaccard Index table for the nine AI models

Jaccard Index: Solo Deep Learning
	Seen-AI		Unseen-AI
Model	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
PSPNet	0.86	0.87	0.87	0.8
SegNet	0.9	0.93	0.8	0.83
UNet	0.87	0.92	0.87	0.83
µ	0.88	0.91	0.85	0.82
σ	0.02	0.03	0.04	0.02
Jaccard Index: Hybrid Deep Learning
	Seen-AI		Unseen-AI
Model	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
VGG-PSPNet	0.85	0.95	0.9	0.81
VGG-SegNet	0.89	0.93	0.85	0.86
VGG-UNet	0.92	0.9	0.88	0.74
ResNet-PSPNet	0.89	0.91	0.9	0.83
ResNet-SegNet	0.93	0.94	0.91	0.88
ResNet-UNet	0.93	0.95	0.89	0.88
µ	0.90	0.93	0.89	0.83
σ	0.03	0.02	0.02	0.05
% Improvement	3%	3%	5%	2%

Table 3

Correlation Coefficient (P < 0.0001) for the nine AI models

CC: Solo Deep Learning
	Seen-AI		Unseen-AI
Models	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
PSPNet	0.98	0.98	0.97	0.77
SegNet	0.99	0.99	0.98	0.97
UNet	0.95	0.97	0.95	0.97
µ	0.97	0.98	0.97	0.90
σ	0.02	0.01	0.02	0.12
CC: Hybrid Deep Learning
	Seen-AI		Unseen-AI
Models	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
VGG-PSPNet	0.99	0.98	0.98	0.92
VGG-SegNet	0.98	0.99	0.96	0.98
VGG-UNet	0.99	0.98	0.98	0.85
ResNet-PSPNet	0.99	1	0.99	0.99
ResNet-SegNet	0.99	1	0.98	0.99
ResNet-UNet	0.99	1	0.97	0.99
µ	0.99	0.99	0.98	0.95
σ	0.00	0.01	0.01	0.06
% Improvement	2%	1%	1%	6%

Scientific Validation

The results from the MedSeg tool were compared against the gold standard tracings of the two datasets used in the study. Figure 22 shows the cumulative frequency plots of DS for the lungs segmented by the MedSeg tool for the Italian and Croatian datasets. Similarly, Figs. 23 and 24 show the JI and CC plots of the MedSeg results compared to the ground truth tracings of the two datasets, with ITA on the left and CRO on the right. The percentage difference between the DS, JI, and CC scores of the COVLIAS AI models and those of MedSeg is < 5%, supporting the applicability of the proposed AI models in the clinical domain. The mean and standard deviation of the lung area error are presented in Fig. 25 using BA plots, again with ITA on the left and CRO on the right. To quantify the system's error, Table 4 presents the Figure of Merit for the nine AI models for Seen and Unseen analysis. Finally, to assess the reliability of the AI-based segmentation system COVLIAS, statistical tests, namely Mann–Whitney, Paired t-Test, and Wilcoxon, are presented for Seen (Table 5) and Unseen (Table 6) analysis. MedCalc software (Ostend, Belgium) was used to carry out all the tests.
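The Bland–Altman and correlation analyses of lung area mentioned above can be sketched as follows. The area values are hypothetical, the function names are illustrative, and the 1.96 × SD limits of agreement are the conventional choice rather than something stated in the source.

```python
import numpy as np

def bland_altman(estimated_area, gt_area):
    """Return the mean difference (bias) and the 95% limits of agreement."""
    est = np.asarray(estimated_area, dtype=float)
    gt = np.asarray(gt_area, dtype=float)
    diff = est - gt
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample standard deviation of the differences
    return float(bias), float(bias - 1.96 * sd), float(bias + 1.96 * sd)

def correlation_coefficient(estimated_area, gt_area):
    """Pearson correlation between estimated and ground-truth areas."""
    return float(np.corrcoef(estimated_area, gt_area)[0, 1])

# Hypothetical lung areas (arbitrary units) for three slices.
estimated = [10.0, 12.0, 14.0]
ground_truth = [9.0, 12.0, 15.0]
bias, lower, upper = bland_altman(estimated, ground_truth)
cc = correlation_coefficient(estimated, ground_truth)
```

A bias close to zero with narrow limits indicates that the estimated area neither systematically over- nor under-estimates the ground truth, which complements the correlation coefficient: two perfectly correlated series can still disagree in scale, as in this example.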

Fig. 22

Cumulative frequency plot of DS for MedSeg for ITA (left) and CRO (right) data sets

Fig. 23

Cumulative frequency plot of JI for MedSeg for ITA data (left) and CRO data (right)

Fig. 24

CC plot for MedSeg vs. GT for ITA (left) and CRO (right)

Fig. 25

BA plot for MedSeg vs. GT for ITA (left) and CRO (right)

Table 4

The Figure of Merit for the nine AI models for Seen-AI vs. Unseen-AI

	Seen-AI		Unseen-AI
Models	CRO-CRO	ITA-ITA	CRO-ITA	ITA-CRO
PSPNet	90.93	94.41	91.84	96.47
SegNet	92.76	96.51	95.89	82.25
UNet	87.51	94.21	99.12	80.85
VGG-PSPNet	85.67	96.84	96.68	99.06
VGG-SegNet	92.48	98.79	81.33	91.56
VGG-UNet	98.74	91.63	88.60	72.49
ResNet-PSPNet	95.19	95.86	93.21	82.96
ResNet-SegNet	95.99	97.24	92.06	85.38
ResNet-UNet	99.85	99.26	86.83	94.77

Table 5

Statistical tests for Seen-AI analysis on nine AI models

	CRO-CRO			ITA-ITA
Models	Paired t-Test	Mann–Whitney	Wilcoxon	Paired t-Test	Mann–Whitney	Wilcoxon
PSPNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
SegNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
UNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
VGG-PSPNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
VGG-SegNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
VGG-UNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
ResNet-PSPNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
ResNet-SegNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
ResNet-UNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001

Table 6

Statistical tests for Unseen-AI analysis on nine AI models

	CRO-ITA			ITA-CRO
Models	Paired t-Test	Mann–Whitney	Wilcoxon	Paired t-Test	Mann–Whitney	Wilcoxon
PSPNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
SegNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
UNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
VGG-PSPNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
VGG-SegNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
VGG-UNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
ResNet-PSPNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
ResNet-SegNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001
ResNet-UNet	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001	P < 0.0001

Discussion

The proposed study presented nine automated CT lung segmentation techniques in an AI framework using three SDL models, namely, (i) PSPNet, (ii) SegNet, and (iii) UNet, and six HDL models, namely, (iv) VGG-PSPNet, (v) VGG-SegNet, (vi) VGG-UNet, (vii) ResNet-PSPNet, (viii) ResNet-SegNet, and (ix) ResNet-UNet. To prove our hypothesis, we use automated HU adjustment, with the values set to (1600, -400), and train our AI models to predict on test data (Fig. 26). After HU adjustment, for DS, JI, and CC, the percentage improvement for Seen AI is 1%, 3%, and 6%, and for Unseen AI is ~ 4%, ~ 5%, and 6%, respectively. We concluded that Unseen AI is possible using automated HU adjustment. Further, HDL was found to be superior to SDL (Tables 1, 2 and 3).
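One common way to apply such an HU adjustment is intensity windowing. The sketch below assumes that (1600, -400) denotes a window width of 1600 HU and a window level (center) of -400 HU; the source does not spell out this interpretation, and the function name `apply_hu_window` is hypothetical.

```python
import numpy as np

def apply_hu_window(hu_image, width=1600.0, level=-400.0):
    """Clip a CT image in Hounsfield Units to a window and scale it to [0, 1].

    Assumption: `width` and `level` correspond to the (1600, -400) pair in
    the text, giving a window from level - width/2 to level + width/2.
    """
    lower = level - width / 2.0   # -1200 HU with the assumed defaults
    upper = level + width / 2.0   #   400 HU with the assumed defaults
    clipped = np.clip(np.asarray(hu_image, dtype=float), lower, upper)
    return (clipped - lower) / (upper - lower)

# Hypothetical HU samples spanning air, lung, soft tissue, and bone.
samples = np.array([-2000.0, -1200.0, -400.0, 400.0, 1000.0])
normalized = apply_hu_window(samples)
```

Applying the same window to both the training and the test cohort puts their intensities on a common scale, which is the consistency between training and testing paradigms that the study identifies as necessary for Unseen AI.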

Fig. 26

Overlay of segmentation results from the ResNet-SegNet model trained without adjusting the HU level (red) and after adjusting the HU level (green). The white arrow represents the under-estimated region and the red arrows represent the same region estimated accurately by the ResNet-SegNet model

Comparison and Contrast of the Nine AI Models

The proposed study uses a total of nine AI architectures with three SDL (PSPNet, SegNet, and UNet) and six HDL models (VGG-PSPNet, VGG-SegNet, VGG-UNet, ResNet-PSPNet, ResNet-SegNet, and ResNet-UNet). ResNet-PSPNet had both the highest number of NN layers and the largest model size. The training for all the AI models was implemented on an NVIDIA DGX V100 using Python [97], with multiple GPUs used to speed up training (Table 7 and Fig. 27).

Table 7

Nine AI architectures and their comparison

		3 SDL Models			6 HDL Models
SN	Attributes	PSPNet	SegNet	UNet	VGG-PSPNet	VGG-SegNet	VGG-UNet	ResNet-PSPNet	ResNet-SegNet	ResNet-UNet
1	Backbone	NA	VGG-19	VGG-19	VGG-16	VGG-16	VGG-16	Res-50	Res-50	Res-50
2	Loss Function	CE	CE	CE	CE	CE	CE	CE	CE	CE
3	# Parameters	~ 4.4 M	~ 3.8 M	~ 4.6 M	~ 18.2 M	~ 11.6 M	~ 12.4 M	~ 31 M	~ 15 M	~ 16.5 M
4	# NN Layers	54	39	42	47	33	36	202	160	165
5	Size (MB)	50	43	52	209	133	142	355	171	188
6	# Epoch	50	50	50	50	50	50	50	50	50
7	Batch Size	8	8	8	2	4	4	2	4	4
8	Training Time*	~ 17	~ 15	~ 16	~ 60	~ 50	~ 50	~ 70	~ 60	~ 60
9	Prediction Time	< 2 s	< 2 s	< 2 s	< 2 s	< 2 s	< 2 s	< 2 s	< 2 s	< 2 s

MB MegaBytes, M Million, NN Neural Network

*in minutes

Fig. 27

Left: Number of NN layers. Right: Size of the final AI models used in COVLIAS 1.0

Benchmarking

Table 8 shows the benchmarking table using CT imaging. Our proposed study (row #7) took 10,000 CT scans of 152 patients and implemented 9 different models that consisted of three SDL, namely, PSPNet, SegNet, UNet, and six HDL models, namely, VGG-PSPNet, VGG-SegNet, VGG-UNet, ResNet-PSPNet, ResNet-SegNet, ResNet-UNet. The four scenarios (CRO-CRO, ITA-ITA, CRO-ITA, and ITA-CRO) correspond to SDL and HDL.

Table 8

Benchmarking table

-	C1	C2	C3	C4	C5	C6	C7	C8	C9	C10	C11	C12	C13
R#	Author	# Patients	# Images	Image Dim	#M	Model Types	Solo vs. HDL	Dim	AE	DS	JI	BA	ACC
R1	Paluru et al. [80]	69	~ 4339	512²	1	AnamNet	Solo	2D	✖	✔	✖	✖	✔
R2	Saood and Hatem [85]	-	~ 100	256²	2	UNet, SegNet	Solo	2D	✖	✔	✖	✖	✔
R3	Cai et al. [86]	99	~ 250	-	1	UNet	Solo	2D	✖	✔	✔	✖	✖
R4	Suri et al. [40]	72	~ 5000	768²	4	NIH, SegNet, VGG-SegNet, ResNet-SegNet	Both	2D	✔	✔	✔	✔	✔
R5	Suri et al. [39]	72	~ 5000	768²	3	PSPNet, VGG-SegNet, ResNet-SegNet	Both	2D	✔	✔	✔	✔	✔
R6	Suri et al. [38]	79	~ 5500	768²	2	VGG-SegNet, ResNet-SegNet	HDL	2D	✔	✔	✔	✔	✔
R7	Suri et al. (Proposed)	152	> 10,000	512²	9	PSPNet, SegNet, UNet, VGG-PSPNet, VGG-SegNet, VGG-UNet, ResNet-PSPNet, ResNet-SegNet, ResNet-UNet	Both	2D	✔	✔	✔	✔	✔

# number, HDL Hybrid Deep Learning, AE Area Error, DS Dice Similarity, JI Jaccard Index, BA Bland–Altman, ACC Accuracy, Dim Dimension (2D vs. 3D), R# Row number, #M number of AI models

A Special note on Tissue Characterization

Lung segmentation can be considered a tissue characterization (TC) process, which has previously been attempted using ML in applications such as plaque TC [66, 98], lung TC [99], coronary artery disease characterization [100], and liver TC [101], and in cancer applications such as skin cancer [102] and ovarian cancer [103]. More advanced TC can use hybrid models such as those in [24, 36, 51].

Strength, Weakness, and Extensions

The proposed study, COVLIAS 1.0-Unseen, proves our two hypotheses, (i) contrast adjustment is vital for AI, and (ii) HDL is superior to SDL, using nine models and ~ 10,000 CT scans (~ 5000 per cohort). The system was validated against MedSeg and tested for reliability and stability. It can also be noted that, while training the AI model for COVID-19 infected lungs, it is necessary to adjust the HU levels to obtain accurate segmentation results. Although we used HU adjustment, the study can be extended in several ways: (i) adjusting the contrast, removing noise, and adjusting the window level [104]; (ii) multimodality cross-validation, such as with ultrasound [105]; (iii) integrating more advanced image processing tools such as level sets [106], stochastic segmentation [107], and computer-aided diagnostic tools [108, 109] with AI models for lung segmentation; (iv) evaluating bias in the AI models, which has been the subject of recent studies, using AP(ai)Bias (AtheroPoint, Roseville, CA, USA) and other competitive models [42]; and (v) CVD assessment of patients during CT imaging [110].

Conclusions

The proposed research compares three SDL models, namely, PSPNet, SegNet, and UNet, and six HDL models, namely, VGG-PSPNet, VGG-SegNet, VGG-UNet, ResNet-PSPNet, ResNet-SegNet, and ResNet-UNet, against MedSeg for CT lung segmentation. The multicentre CT data were collected from hospitals in Italy (ITA) with low-GGO and Croatia (CRO) with high-GGO, each with ~ 5000 COVID-19 images. These CT images were annotated by two trained, blinded senior radiologists, thus creating an inter-variable multicentre dataset. To prove our hypothesis, we use an automated Hounsfield Units (HU) adjustment methodology to train the AI models, leading to four combinations: two Unseen sets, train-CRO:test-ITA and train-ITA:test-CRO, and two Seen sets, train-CRO:test-CRO and train-ITA:test-ITA. To keep the test set unique for each fold, we adopted a five-fold cross-validation technique. Five types of performance metrics were used, namely, (i) DS, (ii) JI, (iii) BA plots, (iv) CC plots, and (v) Figure-of-Merit. For DS and JI, HDL (Unseen AI) > SDL (Unseen AI) by 4% and 5%, respectively. For CC, HDL (Unseen AI) > SDL (Unseen AI) by 6%. The COVLIAS-MedSeg difference was < 5%, thus proving the hypothesis and making the system fit for clinical settings. Statistical tests such as Paired t-Test, Mann–Whitney, and Wilcoxon were used to demonstrate the stability and reliability of the AI system. Below is the link to the electronic supplementary material. Supplementary file1 (DOCX 255 KB)

38 in total

Review 1. Molecular pathways triggered by COVID-19 in different organs: ACE2 receptor-expressing cells under attack? A review.

Authors: L Saba; C Gerosa; D Fanni; F Marongiu; G La Nasa; G Caocci; D Barcellona; A Balestrieri; F Coghe; G Orru; P Coni; M Piras; F Ledda; J S Suri; A Ronchi; F D'Andrea; R Cau; M Castagnola; G Faa
Journal: Eur Rev Med Pharmacol Sci Date: 2020-12 Impact factor: 3.507

Review 2. Scanning electron microscopy of lung disease due to COVID-19 - a case report and a review of the literature.

Authors: T Congiu; R Demontis; F Cau; M Piras; D Fanni; C Gerosa; C Botta; A Scano; A Chighine; E Faedda; R Cau; P Van Eyken; F Marongiu; D Barcellona; L Saba; G Orrù; F Coghe; J S Suri; G Faa; E d'Aloja
Journal: Eur Rev Med Pharmacol Sci Date: 2021-12 Impact factor: 3.507

Review 3. Aortic vulnerability to COVID-19: is the microvasculature of vasa vasorum a key factor? A case report and a review of the literature.

Authors: G Faa; C Gerosa; D Fanni; D Barcellona; G Cerrone; G Orrù; A Scano; F Marongiu; J S Suri; R Demontis; M Nioi; E D'Aloja; G La Nasa; L Saba
Journal: Eur Rev Med Pharmacol Sci Date: 2021-10 Impact factor: 3.507

4. Vaccine-induced severe thrombotic thrombocytopenia following COVID-19 vaccination: a report of an autoptic case and review of the literature.

Authors: D Fanni; L Saba; R Demontis; C Gerosa; A Chighine; M Nioi; J S Suri; A Ravarino; F Cau; D Barcellona; M C Botta; M Porcu; A Scano; F Coghe; G Orrù; P Van Eyken; Y Gibo; G La Nasa; E D'aloja; F Marongiu; G Faa
Journal: Eur Rev Med Pharmacol Sci Date: 2021-08 Impact factor: 3.507

5. Complications in COVID-19 patients: Characteristics of pulmonary embolism.

Authors: Riccardo Cau; Alberto Pacielli; Homayounieh Fatemeh; Paolo Vaudano; Chiara Arru; Paola Crivelli; Giuseppe Stranieri; Jasjit S Suri; Lorenzo Mannelli; Maurizio Conti; Abdelkader Mahammedi; Mannudeep Kalra; Luca Saba
Journal: Clin Imaging Date: 2021-05-18 Impact factor: 1.605

6. Reorganizing stroke and neurological intensive care during the COVID-19 pandemic in Germany.

Authors: Niklas Alexander Kämpfer; Andrea Naldi; Nicola Luigi Bragazzi; Klaus Fassbender; Martin Lesmeister; Piergiorgio Lochner
Journal: Acta Biomed Date: 2021-11-03

Review 7. Imaging in COVID-19-related myocardial injury.

Authors: Riccardo Cau; Pier Paolo Bassareo; Lorenzo Mannelli; Jasjit S Suri; Luca Saba
Journal: Int J Cardiovasc Imaging Date: 2020-11-19 Impact factor: 2.357

Review 8. COVID-19 pathways for brain and heart injury in comorbidity patients: A role of medical imaging and artificial intelligence-based COVID severity classification: A review.

Authors: Jasjit S Suri; Anudeep Puvvula; Mainak Biswas; Misha Majhail; Luca Saba; Gavino Faa; Inder M Singh; Ronald Oberleitner; Monika Turk; Paramjit S Chadha; Amer M Johri; J Miguel Sanches; Narendra N Khanna; Klaudija Viskovic; Sophie Mavrogeni; John R Laird; Gyan Pareek; Martin Miner; David W Sobel; Antonella Balestrieri; Petros P Sfikakis; George Tsoulfas; Athanasios Protogerou; Durga Prasanna Misra; Vikas Agarwal; George D Kitas; Puneet Ahluwalia; Raghu Kolluri; Jagjit Teji; Mustafa Al Maini; Ann Agbakoba; Surinder K Dhanjil; Meyypan Sockalingam; Ajit Saxena; Andrew Nicolaides; Aditya Sharma; Vijay Rathore; Janet N A Ajuluchukwu; Mostafa Fatemi; Azra Alizad; Vijay Viswanathan; Pudukode R Krishnan; Subbaram Naidu
Journal: Comput Biol Med Date: 2020-08-14 Impact factor: 4.589

9. WHO Declares COVID-19 a Pandemic.

Authors: Domenico Cucinotta; Maurizio Vanelli
Journal: Acta Biomed Date: 2020-03-19