Narges Saeedizadeh, Shervin Minaee, Rahele Kafieh, Shakib Yazdani, Milan Sonka.
Abstract
The novel coronavirus disease (COVID-19) pandemic has caused a major outbreak in more than 200 countries around the world, severely impacting the health and lives of many people globally. By October 2020, more than 44 million people had been infected, and more than 1,000,000 deaths had been reported. Computed Tomography (CT) images can be used as an alternative to the time-consuming RT-PCR test to detect COVID-19. In this work we propose a segmentation framework to detect chest regions in CT images that are infected by COVID-19. An architecture similar to a Unet model was employed to detect ground-glass regions at the voxel level. As the infected regions tend to form connected components (rather than randomly distributed voxels), a suitable regularization term based on 2D anisotropic total variation was developed and added to the loss function. The proposed model is therefore called "TV-Unet". Experimental results obtained on a relatively large-scale CT segmentation dataset of around 900 images show that incorporating this new regularization term leads to a 2% gain in overall segmentation performance compared to the Unet trained from scratch. Our experimental analysis, ranging from visual evaluation of the predicted segmentation results to quantitative assessment of segmentation performance (precision, recall, Dice score, and mIoU), demonstrated great ability to identify COVID-19-associated regions of the lungs, achieving an mIoU rate of over 99% and a Dice score of around 86%.
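The BCE-plus-TV loss described in the abstract can be sketched as follows. This is a minimal NumPy illustration of a 2D anisotropic total-variation penalty added to binary cross-entropy, not the authors' implementation; the function names and the `tv_weight` argument (the λ weight) are assumptions:

```python
import numpy as np

def anisotropic_tv(p):
    """Anisotropic 2D total variation of a probability map p (H x W):
    sum of absolute differences between vertically and horizontally
    adjacent pixels. Scattered foreground pixels incur many large
    neighbor differences, so the penalty favors connected regions."""
    dv = np.abs(p[1:, :] - p[:-1, :]).sum()  # vertical neighbor differences
    dh = np.abs(p[:, 1:] - p[:, :-1]).sum()  # horizontal neighbor differences
    return dv + dh

def bce(p, y, eps=1e-7):
    """Mean pixel-wise binary cross-entropy between prediction p and mask y."""
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

def tv_unet_loss(p, y, tv_weight):
    """Binary cross-entropy plus the TV regularization term, weighted by tv_weight."""
    return bce(p, y) + tv_weight * anisotropic_tv(p)
```

The tables below sweep this weight over choices such as 1, 1/(number of pixels), and 1/(256 × number of pixels).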
Keywords: COVID-19; Computed tomography; Convolutional encoder decoder; Deep learning; Image segmentation; Total variation
Year: 2021 PMID: 34337587 PMCID: PMC8056883 DOI: 10.1016/j.cmpbup.2021.100007
Source DB: PubMed Journal: Comput Methods Programs Biomed Update ISSN: 2666-9900
Fig. 1 The architecture of the Unet model.
Fig. 2 The difference between normal and COVID-19 images.
Fig. 3 Sample images from the COVID-19 CT segmentation dataset. The first row shows two COVID-19 images. The red boundary contours in the second row denote regions of COVID-19 ground-glass pathology and are not a part of the original image data. The third row shows ground-glass masks.
Training/Validation/Testing splits prior to data augmentation.
| Data | Split 1 (Number of Images) | Split 2 (Number of Images) |
|---|---|---|
| Training | 654 | 590 |
| Validation | 75 | 64 |
| Test | 200 | 275 |
| Total | 929 | 929 |
Overall performance with different loss functions, using the best cut-off threshold of 0.3. Best performance shown in bold font.
| Loss | Optimizer | Learning Rate | mIOU | DSC | Average Precision |
|---|---|---|---|---|---|
| BCE | ADAM | 0.001 | 0.993 | 0.839 | 0.92 |
| DSC | ADAM | 0.001 | 0.990 | 0.764 | 0.90 |
| BCE+DSC | ADAM | 0.001 | 0.993 | 0.843 | 0.91 |
| BCE+DSC+TV | ADAM | 0.001 | 0.988 | 0.645 | 0.91 |
| BCE+TV | ADAM | 0.001 | 0.995 | | |
Overall performance for different Optimizer selection, using the best cut-off threshold of 0.3. Best performance shown in bold font.
| Loss | Optimizer | Learning Rate | mIOU | DSC | Average Precision |
|---|---|---|---|---|---|
| BCE+TV | ADAM | 0.001 | 0.995 | | |
| BCE+TV | SGD | 0.001 | 0.985 | 0.573 | 0.8 |
| BCE+TV | Adadelta | 0.001 | 0.991 | 0.780 | 0.9 |
| BCE+TV | Adagrad | 0.001 | 0.992 | 0.784 | 0.9 |
Overall model performance for different Learning Rates, again for the best cut-off threshold of 0.3. Best performance shown in bold font.
| Loss | Optimizer | Learning Rate | mIOU | DSC | Average Precision |
|---|---|---|---|---|---|
| BCE+TV | ADAM | 0.001 | 0.995 | | |
| BCE+TV | ADAM | 0.0001 | 0.993 | 0.838 | 0.92 |
Overall model performance for different values of the TV regularization weight λ, using the best cut-off threshold of 0.3. Best performance shown in bold font.
| Loss | λ | mIOU | DSC | Average Precision |
|---|---|---|---|---|
| BCE+TV | 1 | 0.991 | 0.770 | 0.86 |
| BCE+TV | 1/number of pixels | 0.992 | 0.824 | 0.90 |
| BCE+TV | 1/(256 × number of pixels) | 0.995 | | |
Fig. 4 Predicted segmentation masks by the Unet trained from scratch and the proposed TV-Unet for typical sample images from the testing set.
Precision, recall, Dice score, and mIoU rates of TV-Unet model for different threshold values for Split 1. Confidence intervals provided for recall metric. Best performance shown in bold font.
| Threshold | Recall | Precision | mIoU | DSC |
|---|---|---|---|---|
| 0.1 | 0.955 | 0.736 | 0.992 | 0.831 |
| 0.2 | 0.913 | 0.811 | 0.994 | 0.859 |
| 0.3 | 0.867 | 0.859 | | |
| 0.4 | 0.813 | 0.900 | 0.994 | 0.854 |
| 0.5 | 0.746 | 0.933 | 0.993 | 0.829 |
| 0.6 | 0.662 | 0.959 | 0.992 | 0.783 |
| 0.7 | 0.547 | 0.978 | 0.990 | 0.702 |
| 0.8 | 0.362 | 0.990 | 0.986 | 0.531 |
Precision, recall, Dice score, and mIoU rates of TV-Unet model for different threshold values for Split 2. Best performance shown in bold font.
| Threshold | Recall | Precision | mIoU | DSC |
|---|---|---|---|---|
| 0.1 | 0.892 | 0.626 | 0.987 | 0.7363 |
| 0.2 | 0.833 | 0.700 | 0.989 | 0.7609 |
| 0.3 | 0.781 | 0.750 | | |
| 0.4 | 0.730 | 0.789 | 0.990 | 0.7582 |
| 0.5 | 0.674 | 0.825 | 0.990 | 0.7413 |
| 0.6 | 0.610 | 0.859 | 0.990 | 0.7139 |
| 0.7 | 0.535 | 0.890 | 0.989 | 0.6692 |
| 0.8 | 0.422 | 0.926 | 0.987 | 0.5801 |
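The threshold sweeps in the two tables above can be reproduced with a short routine: binarize the predicted probability map at a cut-off and compute the four metrics from the confusion counts. A NumPy sketch, not the authors' code, under the assumption that mIoU is averaged over the foreground and background classes (which would explain mIoU values near 0.99 alongside lower Dice scores, since the large background class is almost always predicted correctly):

```python
import numpy as np

def segmentation_metrics(prob, mask, threshold=0.3):
    """Binarize a predicted probability map at `threshold` and score it
    against a binary ground-truth mask."""
    pred = prob >= threshold
    gt = mask.astype(bool)
    tp = np.sum(pred & gt)    # infected pixels correctly detected
    fp = np.sum(pred & ~gt)   # healthy pixels flagged as infected
    fn = np.sum(~pred & gt)   # infected pixels missed
    tn = np.sum(~pred & ~gt)  # healthy pixels correctly rejected
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    # mean IoU over the two classes (foreground and background)
    iou_fg = tp / (tp + fp + fn) if tp + fp + fn else 1.0
    iou_bg = tn / (tn + fp + fn) if tn + fp + fn else 1.0
    miou = (iou_fg + iou_bg) / 2
    return precision, recall, dice, miou
```

Raising the threshold trades recall for precision, reproducing the trend visible down the rows of both tables.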
Fig. 5 Precision-Recall curve for Split 1.
Fig. 6 Precision-Recall curve for Split 2.
Comparison of Unet trained from scratch and the proposed TV-Unet model performance in terms of precision, mIOU and DSC for Split 1. Best performance for each method shown in bold font.
| Model | Recall | Precision | mIOU | DSC |
|---|---|---|---|---|
| Unet | 0.975 | 0.575 | 0.985 | 0.727 |
| Unet | 0.945 | 0.688 | 0.990 | 0.798 |
| Unet | 0.91 | 0.765 | 0.992 | 0.832 |
| Unet | 0.85 | 0.834 | 0.993 | |
| TV-Unet | 0.975 | 0.675 | 0.990 | 0.798 |
| TV-Unet | 0.945 | 0.760 | 0.993 | 0.842 |
| TV-Unet | 0.91 | 0.812 | 0.994 | 0.860 |
| TV-Unet | 0.85 | 0.871 | 0.995 | |
Comparison of the Unet trained from scratch and TV-Unet model performance in terms of precision, mIOU and DSC for Split 2. Best performance for each method shown in bold font.
| Model | Recall | Precision | mIOU | DSC |
|---|---|---|---|---|
| Unet | 0.810 | 0.594 | 0.983 | |
| Unet | 0.643 | 0.621 | 0.985 | 0.633 |
| Unet | 0.535 | 0.655 | 0.985 | 0.595 |
| Unet | 0.422 | 0.693 | 0.984 | 0.527 |
| TV-Unet | 0.810 | 0.727 | 0.990 | |
| TV-Unet | 0.643 | 0.842 | 0.990 | 0.729 |
| TV-Unet | 0.535 | 0.890 | 0.989 | 0.670 |
| TV-Unet | 0.422 | 0.926 | 0.987 | 0.580 |
Fig. 7 The training and validation loss of the model during training.
Fig. 8 The training and validation recall of the model during training.
Fig. 9 The training and validation precision of the model during training.
Comparison of the TV-Unet model performance with other recent methods in terms of Sensitivity, Specificity and DSC for pathologic regions on COVID-SemiSeg dataset. Best performance is shown in bold font.
| Model | Sensitivity | Specificity | Dice Score |
|---|---|---|---|
| Unet+ | 0.672 | 0.902 | 0.518 |
| Inf-Net | 0.692 | 0.943 | 0.682 |
| Semi-Inf-Net | 0.725 | 0.960 | 0.739 |
| TV-Unet | | | |
Comparison of the TV-Unet model performance with other methods in terms of Sensitivity, Specificity and DSC for the Ground-Glass mask on COVID-SemiSeg dataset. Best performance shown in bold font.
| Model | Sensitivity | Specificity | Dice Score |
|---|---|---|---|
| DeepLab-v3+ (stride=8) | 0.478 | 0.863 | 0.375 |
| DeepLab-v3+ (stride=16) | 0.713 | 0.823 | 0.443 |
| FCN8s | 0.537 | 0.905 | 0.471 |
| Semi-Inf-Net+FCN8s | 0.720 | 0.941 | 0.646 |
| TV-Unet | | | |
Comparison of the TV-Unet model performance with several other methods in terms of Sensitivity, Specificity and DSC for the Consolidation mask on COVID-SemiSeg dataset. Best performance shown in bold font.
| Model | Sensitivity | Specificity | Dice Score |
|---|---|---|---|
| DeepLab-v3+ (stride=8) | 0.120 | 0.584 | 0.117 |
| DeepLab-v3+ (stride=16) | 0.245 | 0.560 | 0.188 |
| FCN8s | 0.212 | 0.567 | 0.221 |
| Semi-Inf-Net+FCN8s | 0.186 | 0.639 | 0.238 |
| TV-Unet | | | |