MID-UNet: Multi-input directional UNet for COVID-19 lung infection segmentation from CT images.

Jianning Chi¹, Shuang Zhang¹, Xiaoying Han¹, Huan Wang¹, Chengdong Wu¹, Xiaosheng Yu¹.

Abstract

Coronavirus Disease 2019 (COVID-19) has spread globally since the first case was reported in December 2019, becoming a worldwide health crisis with over 90 million confirmed cases. Segmenting lung infections from computed tomography (CT) scans with deep learning methods has great potential to assist the diagnosis and treatment of COVID-19. However, current deep learning methods for segmenting infection regions from lung CT images suffer from three problems: (1) low differentiation of semantic features between the COVID-19 infection regions, other pneumonia regions and normal lung tissues; (2) high variation of visual characteristics between different COVID-19 cases or stages; (3) high difficulty in constraining the irregular boundaries of the COVID-19 infection regions. To solve these problems, a multi-input directional UNet (MID-UNet) is proposed to segment COVID-19 infections in lung CT images. For the input part of the network, we first propose an image blurry descriptor to reflect the texture characteristics of the infections. Then the original CT image, the image enhanced by adaptive histogram equalization, the image filtered by the non-local means filter and the blurry feature map are used together as the input of the proposed network. For the structure of the network, we propose the directional convolution block (DCB), which consists of 4 directional convolution kernels. DCBs are applied on the short-cut connections to refine the extracted features before they are transferred to the de-convolution parts. Furthermore, we propose a contour loss based on the local curvature histogram and combine it with the binary cross entropy (BCE) loss and the intersection over union (IOU) loss for a better constraint on the segmentation boundary. Experimental results on the COVID-19-CT-Seg dataset demonstrate that our proposed MID-UNet provides superior performance over the state-of-the-art methods on segmenting COVID-19 infections from CT images.

Keywords: COVID-19; CT image; Convolutional neural networks; Deep learning; Infection segmentation

Year: 2022 PMID: 35935468 PMCID: PMC9344813 DOI: 10.1016/j.image.2022.116835

Source DB: PubMed Journal: Signal Process Image Commun ISSN: 0923-5965 Impact factor: 3.453

Introduction

The coronavirus disease 2019 (COVID-19) has rapidly become a global pandemic since the first case was reported from Wuhan, China in December 2019 [1]. According to the World Health Organization (WHO), there had been 99,363,697 confirmed cases and 2,135,959 deaths as of January 26, 2021 [2]. According to the statistics and analysis from the European Centre for Disease Prevention and Control (ECDC), the world will face a high risk of a second outbreak of COVID-19 in the following months due to weather change and virus variation [3]. Therefore, quick and accurate diagnosis of COVID-19 plays an important role in the prevention and control of the disease. Although the reverse-transcription polymerase chain reaction (RT-PCR) test has been considered the gold standard for COVID-19 diagnosis, it suffers from a strict testing environment, a long testing period and high false negative rates [4]. Given its convenience of operation and its ability to show the three-dimensional structure of the lung, computed tomography (CT) has been considered one of the important complements to RT-PCR tests for the early diagnosis of COVID-19, especially for follow-up assessment and evaluation of disease evolution [5]. As shown in Fig. 1, the most important signs of infection regions can be observed in the CT slice. However, delineating the infections manually is laborious, time-consuming and biased by clinical experience, resulting in subjective or inaccurate diagnosis. Therefore, automatic segmentation of infections in lung CT scans via computer vision techniques has attracted much attention from clinical researchers [6].

Fig. 1

Example of COVID-19 infected regions in the lung CT slice.

Recently, deep learning based frameworks have been applied in many medical image processing fields, including medical image segmentation [7], lesion detection [8] and classification [9]. Building on these successes, many deep learning systems have been proposed for COVID-19 infection detection [10], [11], [12]. The well-known UNet [7] and its variants, including UNet++ [13] and V-Net [14], were adopted in early but effective attempts to segment lungs [15], pulmonary opacities [16] and infection regions from CT scans in order to distinguish COVID-19 from common pneumonia. Following these works, more deep convolutional neural networks, such as Inf-Net [10], MSD-Net [11] and COPLE-Net [12], have been proposed recently and have achieved state-of-the-art performance on COVID-19 infection segmentation from lung CT scans. However, the effectiveness of current methods is still limited by several challenges: (1) The differences between COVID-19 infections, other pneumonia regions and normal lung tissues are quite vague due to the low quality of the CT slice. For example, the blurred boundaries of ground-glass opacities (GGOs) make them look similar to normal tissues, resulting in false negative detections. (2) Image characteristics of the infections, such as intensity, texture and size, vary a lot between cases. These varied low-level features make it difficult for the networks to recognize the high-level features of the infections. (3) The shapes of the COVID-19 infection regions are quite irregular, making it difficult to measure the differences of segmentation boundaries between the predicted results and the ground truth.
To address the above problems, a novel COVID-19 infection segmentation network is designed in this work following a new idea, that is, using multidimensional low-level image characteristics as the input of the convolutional neural network to extract more representative high-level features. In this way, the low-level characteristics can participate in the whole deep learning process to generate “instructed” high-level features, instead of just working as simple complements of the high-level features learned solely from the image. To further enhance the representability of the features learned by the network, we propose a directional convolution block (DCB) consisting of 4 different directional kernels, and apply it on every short-cut connection route of the backbone UNet to refine the transferred features so that the hidden directional information can be revealed more clearly. Moreover, to predict the precise boundaries of the infections, a curvature histogram based contour loss function is proposed and combined with the binary cross entropy (BCE) loss and the intersection over union (IOU) loss as a joint loss function to constrain the training process. The proposed network can better simulate the behavior of human clinicians in detecting COVID-19 infections: they comprehensively observe the intensity and texture information in the CT image, then summarize and refine the visual features from these low-level observations according to their experience, and finally reach the semantic diagnosis and delineate the boundaries of the infection regions. Experimental results illustrate that the proposed method performs better than the state-of-the-art methods on segmenting infections. In summary, the main contributions of our work are:
(1) We propose a multidimensional input for the segmentation network by combining the original image and its transformations, which can reflect the GGOs or infiltration features of the COVID-19 infection regions.
(2) We propose a directional convolution block (DCB) concatenating 4 directional kernels that are sensitive to horizontal, vertical and diagonal changes independently, which can represent the fibrotic-streak-like features of the COVID-19 infection regions.
(3) We propose a curvature histogram based region contour loss function, which can constrain the training process of the network to provide more precise segmentation results for irregular boundaries.
The rest of this paper is organized as follows. Section 2 reviews some works related to our work. The architecture of the proposed network is described in Section 3. We present and analyze the experimental results in Section 4. Finally, our work is concluded in Section 5.

Related works

Segmentation in lung CT images

Segmentation in lung CT scans [17], [18] has attracted much attention from researchers because it can help doctors qualify and quantify the shapes, locations and other features of organs and lesions, which are important for the diagnosis of lung diseases [19], [20]. Early lung CT segmentation methods mostly extracted hand-crafted features from the image, such as texture information [21], region information [22], and contour information [23]. The features were then fed to different classifiers for segmentation, e.g., the k-nearest-neighbor (KNN) classifier [24] or the support vector machine (SVM) [25]. Due to the similar visual appearance of lesions, normal tissues and organs, the performance of these methods was usually limited. To overcome the shortcomings of hand-crafted features, several deep learning based methods have been proposed for their stronger ability to represent visual features. For example, Jiang et al. [26] developed two multiple resolution residually connected networks (MRRNs) to combine features across multiple image resolutions and feature levels for lung tumor segmentation. In [27], Dou et al. implemented a three-dimensional fully convolutional neural network (3D-FCN) to detect nodule candidates and integrated two residual blocks for accurate classification of true nodules. Cao et al. [15] improved the UNet [7] with the residual-dense mechanism for lung nodule segmentation. UNet++ was proposed by Zhou et al. [13] as a semantic and instance segmentation architecture for medical images, and provided convincing performance for lung nodule segmentation.

Segmentation for COVID-19 infections

Given the good results of deep learning on lung structure segmentation, many deep learning methods [10], [11], [12], [28], [29], [30] have been proposed for COVID-19 infection segmentation and diagnosis. UNet [7] and ResNet [31] were used successively by Gozes et al. [28] for the detection of COVID-19 related abnormalities. In [29], Chen et al. trained the UNet++ [13] on their collected CT image slices for COVID-19 identification, achieving performance comparable with expert radiologists. Shan et al. [30] integrated V-Net [14] and the bottleneck structure [31] into a VB-Net that was able to distinguish multiple structures in lung CT scans, such as parenchyma, lobes and infections. Besides UNet and its variants, CNN-based methods with novel architectures have been applied to COVID-19 segmentation. For example, Fan et al. [10] designed a new COVID-19 Lung Infection Segmentation Network (Inf-Net) for segmenting infected regions. In the network, a parallel partial decoder was used for global map generation, and edge attention and implicit reverse attention were used to enhance the representations. To cope with noise, Wang et al. [12] proposed COPLE-Net and a noise-robust Dice loss, and combined them with an adaptive self-ensembling training framework for pneumonia lesion segmentation. In [11], Zheng et al. proposed a multi-scale discriminative network (MSD-Net) that consists of a pyramid convolution block (PCB), a channel attention block (CAB) and a residual refinement block (RRB). Experimental results illustrated that the MSD-Net could effectively segment multiple infection categories.

Imaging characteristics of COVID-19 infections

Extracting high-level visual features with deep learning frameworks can simulate how clinicians roughly focus on the infection regions. To delineate the boundaries of the infections precisely, low-level image characteristics of COVID-19 have proved to be necessary in several segmentation or classification tasks [6], [32], [33]. Shi et al. [32] extracted four location-specific hand-crafted features from chest CT images, including volume, infected lesion number, histogram distribution and surface area. The high-level representations of these hand-crafted features were then leveraged for COVID-19 classification. In [33], Tang et al. calculated 63 quantitative features, such as the infection ratio of the whole lung and the volume of GGO regions, and trained a random forest (RF) model to assess the severity of COVID-19. Shi et al. [6] proposed a logistic regression method using clinical and laboratory features together with chest CT images, leading to effective classification of COVID-19. Turning to pulmonary nodule detection, Zheng et al. [34] took maximum intensity projection (MIP) images of different slab thicknesses as the input of a CNN, achieving accurate and robust performance on nodule detection in CT scans. All the above methods show the potential of low-level features to improve deep learning based methods for COVID-19 infection segmentation. However, current methods usually concatenate the low-level image characteristics and the high-level visual features as the input of the subsequent classifier. This simple combination limits the ability of deep convolutional neural networks to mine the essential differences between the infection regions and other regions. Therefore, it is necessary to design a novel structure that combines the low-level image description with the network learning.

Proposed method

As shown in Fig. 2, the proposed COVID-19 infection segmentation network applies the UNet-like structure as backbone with the following novel modules:

Fig. 2

The architecture of our proposed MID-UNet model. The original image is expanded by its three transformations into multidimensional input of the network, where the encoding feature maps are refined by the directional convolution blocks (DCBs) and concatenated with the original short-cut connection for the up-convolution.

(1) Different transformations of the original lung CT slice are concatenated as a multidimensional input of the network, so that the GGOs and infiltration features of the COVID-19 infection region can be better reflected.
(2) Directional convolution blocks (DCBs) are proposed and applied on different short-cut connection routes of the network, so that the fibrotic-streak-like feature of the COVID-19 infection region can be better represented.
(3) A curvature histogram based contour loss is designed to form the joint loss function with the BCE loss and IOU loss, so that the irregular boundary of the COVID-19 infection region can be delineated more precisely.

Multidimensional input

The GGOs and infiltrations are significant features of COVID-19 infection regions in CT scans, with the visual appearances of low intensities, complex textures or vague shadows over bronchi. However, these image characteristics cannot be extracted effectively by conventional CNNs, where only the original images are taken as input and the learning processes begin from pixel-level features. Therefore, to reflect more regional features of infections, we propose a multidimensional input by concatenating the original image and its transformations, which can highlight the intensity distribution, texture details and blurring extent of the image.

Transformation via non-local means filter

The GGOs and infiltrations usually show lower intensities than the normal structures in the lung, but noise in the CT scan might degrade this characteristic. We use the non-local means (NLM) filter [35] to remove the noise and smooth the image so that the intensity distribution in each region becomes more homogeneous and more distinct from other regions. The filtering process is mathematically expressed as $I_{NLM} = F_{NLM}(I)$, where $I$ and $I_{NLM}$ are the input image and the filtered image, and $F_{NLM}(\cdot)$ represents the non-local means filtering method. The NLM transformation of an example CT slice is illustrated in Fig. 3. The intensities in the image are distributed more smoothly and the “low-intensity” characteristic of the infection is more obvious. Therefore, the NLM transformation can highlight the differences between infections and normal regions at the “intensity level”.

Fig. 3

Example of transforming the original image via non-local means filtering. The infected regions (surrounded by red lines) show lower intensities than other tissues.

Transformation via contrast limited adaptive histogram equalization

The GGOs and infiltrations usually exhibit many texture details since the exudates are mixed with vessels or other tissues. To highlight these texture details, we propose to enhance the image by contrast limited adaptive histogram equalization (CLAHE) [36]. The transformation can be formulated as $I_{CLAHE} = F_{CLAHE}(I)$, where $I$ and $I_{CLAHE}$ are the input image and the transformation, and $F_{CLAHE}(\cdot)$ denotes the histogram equalization method. An example enhanced result is shown in Fig. 4, where the infection regions with small mesh textures and high contrast are highlighted. This shows that the CLAHE transformation can be used as an effective complement that displays the “texture map” of the input image.

Fig. 4

Example of transforming the original image via CLAHE enhancement. The textural differences between infected regions (surrounded by red lines) and normal tissues are highlighted.

Transformation via unclarity descriptor

Besides the relatively low intensity and the complex inner texture, another CT appearance of GGOs and infiltrations in COVID-19 infection is the vague shadow over the bronchi, the so-called “white fog”. Intuitively, the vague shadows are blurred, unclear regions in the image. Therefore, we propose a local unclarity descriptor $U(x, y)$ to quantify these blurred regions. Here $U(x, y)$ represents the image unclarity value at $(x, y)$, $\Omega(x, y)$ is the local neighborhood centered at $(x, y)$ with window size $w \times w$, while $L(\Omega)$ and $G(\Omega)$ denote the Laplacian and Gaussian filtering of the local neighborhood $\Omega$, respectively. A clear region after sharpening, $L(\Omega)$, differs greatly from the same region after smoothing, $G(\Omega)$, while a vague region is not changed much by either the sharpening or the smoothing operation. Fig. 5 illustrates the result of applying the unclarity descriptor to the CT image, where the lesion regions show higher values than other regions. Consequently, the proposed transformation can reflect the “blurring extent” in different image regions.

Fig. 5

Example of transforming the original image via unclarity descriptor.

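As shown in Fig. 6, we form the multidimensional input by concatenating the above three transformations with the original image: $I_{MI} = [I, I_{NLM}, I_{CLAHE}, I_{UC}]$, where $I$ is the original image, $I_{NLM}$, $I_{CLAHE}$ and $I_{UC}$ are its NLM, CLAHE and unclarity transformations, and $[\cdot]$ denotes channel-wise concatenation. These transformations, each highlighting a different image characteristic, act as the prior knowledge of clinicians to guide the network learning, so that the low-level features reflecting the GGOs and infiltrations of COVID-19 can be learned more efficiently and accurately.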

Fig. 6

Multidimensional input via concatenating the original image and its three transformations.
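
The concatenation step can be sketched in a few lines. This is only a minimal illustration under stated assumptions: images are stored as nested Python lists, the helper name `stack_channels` is hypothetical, and the three "transformations" in the toy call are placeholders rather than real NLM, CLAHE or unclarity outputs.

```python
from typing import List

Image = List[List[float]]  # one H x W channel stored as nested lists


def stack_channels(original: Image, nlm: Image, clahe: Image,
                   unclarity: Image) -> List[Image]:
    """Return a channel-first (4, H, W) input from an image and its three transformations."""
    channels = [original, nlm, clahe, unclarity]
    height, width = len(original), len(original[0])
    for channel in channels:
        # Every transformation must keep the spatial size of the original slice.
        if len(channel) != height or any(len(row) != width for row in channel):
            raise ValueError("all channels must share the same H x W shape")
    return channels


# Toy 2 x 2 slice; the same array stands in for each transformation here.
slice_2x2 = [[0.1, 0.2], [0.3, 0.4]]
multi_input = stack_channels(slice_2x2, slice_2x2, slice_2x2, slice_2x2)
```

With this layout the first convolution of the network would take 4 input channels instead of 1, which is the only architectural change the multidimensional input requires at the input layer.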

Directional convolution module on short-cut connection

Another significant visual characteristic of COVID-19 infection is the fibrotic-streak-like feature, which presents as mesh textures in the image. However, it is degraded in the feature extraction and transmission process of UNet. In our work, a directional convolution block is proposed to refine the extracted features and is adopted as a complement to the input of every decoder module, so that the fibrotic-streak characteristics of the COVID-19 infections can be better highlighted.

Directional convolution block (DCB)

The directional convolution block (DCB) aims to refine the feature maps according to multi-direction receptive fields. As illustrated in Fig. 7(a) to (d), the DCB consists of 4 convolution kernels $K_h$, $K_v$, $K_{d1}$ and $K_{d2}$ with scale size $s \times s$, which can be mathematically formulated as:
$K_h(i, j) = w_{i,j}$ if $i = (s+1)/2$, and $0$ otherwise;
$K_v(i, j) = w_{i,j}$ if $j = (s+1)/2$, and $0$ otherwise;
$K_{d1}(i, j) = w_{i,j}$ if $i = j$, and $0$ otherwise;
$K_{d2}(i, j) = w_{i,j}$ if $i + j = s + 1$, and $0$ otherwise,
where $w_{i,j}$ denotes the weight parameter at $(i, j)$, $1 \le i, j \le s$, in the corresponding kernel $K_h$, $K_v$, $K_{d1}$ or $K_{d2}$. In each kernel, only the weight parameters along its own direction are initialized randomly and updated automatically through the network training, while the other parameters are fixed to 0. In this way, each kernel is only sensitive to feature changes in the direction it represents. In accordance with the feature map scale, in this work the DCBs with scale size 3 and 5 are applied to the bottom two and top two short-cut connections, respectively.

Fig. 7

The convolution kernels in the directional convolution block (DCB). (a) to (d) are the DCB kernels in the horizontal, vertical and two diagonal directions.
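
The "only one direction is trainable" rule can be illustrated with binary masks. The sketch below is an assumption about how such kernels might be built, not the authors' implementation: the function names are hypothetical, indices are 0-based, and in a framework such as PyTorch the mask would typically be multiplied with the kernel weights in every forward pass so that the off-direction entries stay at zero.

```python
from typing import Dict, List

Mask = List[List[int]]


def directional_masks(size: int) -> Dict[str, Mask]:
    """Build the four binary masks (horizontal, vertical, two diagonals) for a size x size kernel."""
    if size < 3 or size % 2 == 0:
        raise ValueError("size must be an odd integer >= 3")
    c = size // 2  # index of the central row / column
    rng = range(size)
    return {
        "horizontal": [[1 if i == c else 0 for j in rng] for i in rng],
        "vertical": [[1 if j == c else 0 for j in rng] for i in rng],
        "diagonal": [[1 if i == j else 0 for j in rng] for i in rng],
        "anti_diagonal": [[1 if i + j == size - 1 else 0 for j in rng] for i in rng],
    }


def apply_mask(weights: List[List[float]], mask: Mask) -> List[List[float]]:
    """Zero every weight that does not lie on the direction encoded by the mask."""
    return [[w * m for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(weights, mask)]


masks3 = directional_masks(3)
dense = [[0.5, -1.0, 2.0], [1.5, 0.25, -0.75], [3.0, 1.0, -2.0]]
horizontal_kernel = apply_mask(dense, masks3["horizontal"])
```

Each mask keeps exactly `size` entries, so a size-3 DCB kernel has 3 trainable weights and a size-5 kernel has 5, compared with 9 and 25 for an ordinary convolution.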

Multi-channel features for up-convolution

As shown in Fig. 8, each DCB is adopted on the short-cut connection to refine the features from the encoder module. Then the refined directional feature map $DCB(E_i)$ and the original feature map $E_i$ are concatenated with the output of the previous decoder module $D_{i+1}$ to form the input of the current decoder module: $D_i^{in} = [E_i, DCB(E_i), D_{i+1}]$. With the multi-channel features in the up-convolution route, the mesh textures or fibrotic streaks showing obvious directionality are additionally considered in the visual feature representation, leading to more accurate detection of COVID-19 infections. Fig. 9 illustrates the visualized feature maps of the COVID-19 infection region at three deconvolution levels in Fig. 2, calculated by the proposed network. Compared with the feature maps generated by the normal UNet, the UNet with multidimensional input, the UNet with DCB, and the UNet with multidimensional input and normal convolution on the short-cut connection, the feature maps of the proposed network show that the multidimensional input and the DCB work together to highlight the differences between the infection regions and other organs or normal tissues. With better representation of features, the proposed network can provide superior performance on COVID-19 infection segmentation.

Fig. 8

Concatenation of enhanced feature maps by DCB and short-cut connection.

Fig. 9

The visualized pixel-wise feature maps at different network levels produced by different methods for an example image. The first row shows the input lung CT image and the COVID-19 infection region. The second to fourth rows show the feature maps at three deconvolution levels in Fig. 2. The columns from left to right are the feature maps generated by different methods.

Joint loss function

The COVID-19 infections usually occupy irregular regions with sinuous boundaries, which are different from nodules or normal tissues. We propose a novel contour loss based on the curvature histogram to restrict the segmentation following this boundary signature. Then we combine it with IOU loss and binary cross entropy (BCE) loss as a joint loss for the network training, leading to more precise COVID-19 infection region prediction.

Contour loss function

The boundaries of the COVID-19 infection regions are usually irregular and sinuous, unlike those of nodules or normal tissues. Following this observation, we propose a contour loss function $L_{contour}$ that compares the local curvature histograms of the predicted region and the ground truth, $H_n(P)$ and $H_n(G)$. Here $P$ and $G$ are the predicted infection segmentation and the ground truth segmentation, and $H_n(\cdot)$ represents the local curvature histogram with $n$ bins, which is calculated from the local curvature map of the image and has been shown in [37] to be sensitive to tiny changes of region boundaries. For the local curvature calculation, we follow the definition in [37]: $\kappa(p) = A(C_r(p) \cap R) / A(C_r(p))$, where $\kappa(p)$ denotes the curvature at pixel $p$, $C_r(p)$ is the circle centered at $p$ with radius $r$, $R$ is the segmented region with $p$ on the edge of $R$, and $A(\cdot)$ calculates the region area.
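
To make the idea concrete, the sketch below computes an area-ratio curvature on boundary pixels of a binary mask, bins it into a histogram, and compares two histograms. Several choices here are assumptions for illustration only: curvature is taken as the fraction of a discrete disc that falls inside the region, boundary pixels are found with 4-connectivity, and the two histograms are compared with an L1 distance. The function names, radius and bin count are hypothetical, and this plain-Python version is not differentiable, so a training loss would need a soft counterpart.

```python
from typing import List, Tuple

Mask = List[List[int]]


def boundary_pixels(mask: Mask) -> List[Tuple[int, int]]:
    """Region pixels with at least one 4-neighbour outside the region (or the image)."""
    h, w = len(mask), len(mask[0])
    points = []
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    points.append((y, x))
                    break
    return points


def local_curvature(mask: Mask, y: int, x: int, radius: int) -> float:
    """Fraction of the discrete disc around (y, x) that lies inside the region."""
    h, w = len(mask), len(mask[0])
    inside = total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if dy * dy + dx * dx > radius * radius:
                continue
            total += 1
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and mask[ny][nx]:
                inside += 1
    return inside / total


def curvature_histogram(mask: Mask, radius: int = 2, bins: int = 8) -> List[float]:
    """Normalised histogram of the curvature values on the region boundary."""
    hist = [0.0] * bins
    points = boundary_pixels(mask)
    for y, x in points:
        k = local_curvature(mask, y, x, radius)
        hist[min(int(k * bins), bins - 1)] += 1.0
    return [v / len(points) for v in hist] if points else hist


def contour_loss(pred: Mask, truth: Mask, radius: int = 2, bins: int = 8) -> float:
    """L1 distance between the two curvature histograms (assumed distance)."""
    hp = curvature_histogram(pred, radius, bins)
    ht = curvature_histogram(truth, radius, bins)
    return sum(abs(a - b) for a, b in zip(hp, ht))


# A 5 x 5 square inside a 9 x 9 image, used as both prediction and ground truth.
square = [[1 if 2 <= y <= 6 and 2 <= x <= 6 else 0 for x in range(9)] for y in range(9)]
loss_same = contour_loss(square, square)
```

On a straight edge about half of the disc lies inside the region, convex corners give smaller values and concave corners larger ones, so a sinuous boundary spreads its histogram over more bins than a smooth one.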

Joint loss function

We combine the proposed contour loss function with the IOU loss function and the BCE loss function, which are both widely used for image segmentation. The joint loss can be formulated as $L_{joint} = \lambda_1 L_{contour} + \lambda_2 L_{IOU} + \lambda_3 L_{BCE}$, where $L_{contour}$ is proposed as Eq. (7), and $L_{IOU}$ and $L_{BCE}$ follow their original definitions in [38], [39]. $\lambda_1$, $\lambda_2$ and $\lambda_3$ are the weighting parameters for the three losses, empirically set as 0.5, 0.25 and 0.25, respectively.
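
The weighted combination can be sketched as follows. The weights 0.5, 0.25 and 0.25 come from the text; everything else is an assumption for illustration: the BCE term is the standard pixel-averaged form, the IOU term is a common soft formulation rather than necessarily the one in [38], [39], and the contour term is passed in as a precomputed number.

```python
import math
from typing import Sequence, Tuple


def bce_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Mean binary cross entropy over flattened pixel probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(pred)


def iou_loss(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Soft IOU loss: 1 - intersection / union on probabilities (assumed form)."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def joint_loss(contour: float, iou: float, bce: float,
               weights: Tuple[float, float, float] = (0.5, 0.25, 0.25)) -> float:
    """Weighted sum of the contour, IOU and BCE terms."""
    w_contour, w_iou, w_bce = weights
    return w_contour * contour + w_iou * iou + w_bce * bce


# Toy prediction over four pixels; the contour term is a placeholder value.
pred = [0.9, 0.2, 0.8, 0.1]
target = [1.0, 0.0, 1.0, 0.0]
total = joint_loss(contour=0.3, iou=iou_loss(pred, target), bce=bce_loss(pred, target))
```

Because the three weights sum to 1, the joint loss stays on the same scale as its components, with the contour term contributing half of the total.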

Experiments

Experimental materials

COVID-19 segmentation dataset

The dataset used for evaluating the performance of the proposed network consists of 3 public datasets as follows:

COVID-19 CT segmentation dataset-medseg part.

The COVID-19 CT Segmentation Dataset-Medseg part contains 100 axial CT images collected by the Italian Society of Medical and Interventional Radiology from different COVID-19 patients. The CT images were segmented manually by a radiologist using different labels for lung infections identification.

COVID-19 CT segmentation dataset-radiopedia part.

The COVID-19 CT Segmentation Dataset-Radiopedia part includes 829 slices from 9 axial volumetric CTs, where 373 slices have been evaluated by a radiologist as positive and segmented into different labels.

COVID-19 CT lung and infection segmentation dataset.

The COVID-19 CT Lung and Infection Segmentation Dataset contains labeled COVID-19 CT volumes from 20 patients, where the left lung, right lung and infections are labeled by two radiologists and verified by an experienced radiologist. We extract 2D axial CT slices from the 3D volumes and remove non-lung regions, obtaining 1844 labeled 2D 512 × 512 CT images with lung regions. To reduce the influence of overfitting due to limited data, the above three datasets are integrated in our experiment into one dataset with 2773 lung CT slices. The slices are then randomly split into 2233, 270 and 270 samples for training, validation and testing. Furthermore, several data augmentation operations, including random rotation and flipping, are applied to the training samples. Finally, the training set includes 17,864 samples, while the validation and testing sets include 270 samples each.
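
The sample counts above can be cross-checked with a few lines of arithmetic. The variable names are illustrative, and the augmentation factor is inferred from the reported totals rather than stated in the text.

```python
# Slices contributed by the three public datasets, as reported above.
medseg_slices = 100
radiopedia_slices = 829
lung_infection_slices = 1844
total_slices = medseg_slices + radiopedia_slices + lung_infection_slices

# Random split into training, validation and testing samples.
train, val, test = 2233, 270, 270
split_total = train + val + test

# Ratio between the augmented and the original training set size.
augmented_train = 17864
augmentation_factor = augmented_train // train
remainder = augmented_train % train
```

The three sources add up to the size of the split, and the augmented training set is an exact integer multiple of the original one, i.e. each training slice is represented by the same number of samples after augmentation.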

Evaluation metrics

To quantitatively evaluate the performance of different networks on segmenting lesions, the Dice similarity coefficient (DSC) [40] is calculated between the segmentation results and the ground truth: $DSC = 2|G \cap P| / (|G| + |P|)$, where $G$ and $P$ represent the ground truth and the predicted segmentation result, and $|\cdot|$ denotes the number of pixels. A larger DSC indicates more accurate segmentation. From the perspective of pixel-wise classification, sensitivity and specificity can also be used for segmentation evaluation: $Sensitivity = TP / (TP + FN)$ and $Specificity = TN / (TN + FP)$, where $TP$, $FN$, $TN$ and $FP$ are the numbers of true positive, false negative, true negative and false positive pixels obtained by comparing the predicted results with the ground truth.
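
A minimal sketch of the three metrics on flattened binary masks is given below. The function names are illustrative, and the convention of returning 1.0 when a denominator is zero (for example, Dice on two empty masks) is an assumption, not something specified in the text.

```python
from typing import Sequence, Tuple


def confusion_counts(pred: Sequence[int], truth: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count true positives, false positives, true negatives and false negatives."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def dice(pred: Sequence[int], truth: Sequence[int]) -> float:
    """DSC = 2|G ∩ P| / (|G| + |P|), written in terms of the confusion counts."""
    tp, fp, _, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0


def sensitivity(pred: Sequence[int], truth: Sequence[int]) -> float:
    """TP / (TP + FN): share of infected pixels that are detected."""
    tp, _, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 1.0


def specificity(pred: Sequence[int], truth: Sequence[int]) -> float:
    """TN / (TN + FP): share of non-infected pixels that are left unmarked."""
    _, fp, tn, _ = confusion_counts(pred, truth)
    return tn / (tn + fp) if (tn + fp) else 1.0


# Six-pixel toy example: 2 TP, 1 FP, 2 TN, 1 FN.
pred_mask = [1, 1, 0, 0, 1, 0]
true_mask = [1, 0, 0, 0, 1, 1]
scores = (dice(pred_mask, true_mask),
          sensitivity(pred_mask, true_mask),
          specificity(pred_mask, true_mask))
```

Since infected pixels are usually a small fraction of a CT slice, specificity tends to sit close to 1 for any reasonable model, which is why Dice and sensitivity separate the methods more clearly.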

Parameter setting

We apply the Adam optimizer [41] to train our model with an initial learning rate of 0.001, which decays to 0.0001 after 100 epochs. All the experiments are performed in PyTorch on an Intel Xeon Silver 4110 2.1 GHz server with 32 GB RAM and 4 NVIDIA TITAN Xp graphics processing unit cards with 12 GB memory.
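
One way to read the learning-rate setting is as a single step decay at epoch 100. The sketch below encodes that reading; it is an assumption, since the text does not say whether the decay is a step or a gradual schedule, and the function name is hypothetical.

```python
def learning_rate(epoch: int, initial: float = 1e-3, decayed: float = 1e-4,
                  decay_epoch: int = 100) -> float:
    """Step schedule: `initial` before `decay_epoch`, `decayed` from then on."""
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return initial if epoch < decay_epoch else decayed


# Learning rates at a few representative (0-based) epochs.
schedule = {epoch: learning_rate(epoch) for epoch in (0, 99, 100, 150)}
```

In PyTorch, the same step behavior could be obtained with `torch.optim.lr_scheduler.MultiStepLR` using `milestones=[100]` and `gamma=0.1` on an Adam optimizer created with `lr=0.001`.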

Comparator methods

We compare our MID-UNet with several state-of-the-art methods for COVID-19 infection segmentation, including UNet [7], UNet++ [13], Attention-UNet [42], DeepLabV3 [43], Inf-Net [10], COPLE-Net [12], TransFPN [44] and TransUNet [45]. The hyper-parameters of these comparator networks are set to the values described in their original works.

Experimental results

Ablation study

In this section, we implement several experiments to evaluate the effectiveness of different modules in the proposed network. Table 1 shows the models we propose for COVID-19 infection segmentation but with different design strategies, and compares the average Dice scores, sensitivities, specificities of segmenting infected regions in the dataset. Fig. 10 illustrates the results of segmenting an example lung CT slice by different design strategies in Table 1. We evaluate the effectiveness of multidimensional input, directional convolution block and joint loss function as follows.

Table 1

Ablation studies of our MID-UNet. The best results are shown in red fonts.

Methods	Dice	Sensitivity	Specificity
Backbone (UNet)	0.8273±0.0801	0.8418±0.0858	0.9976±0.0021
Model 1: Backbone+NL	0.8325±0.1437	0.8767±0.0743	0.9931±0.0032
Model 2: Backbone+CLAHE	0.8566±0.0934	0.8911±0.0817	0.9932±0.0025
Model 3: Backbone+UC	0.8592±0.0813	0.8914±0.0513	0.9957±0.0027
Model 4: Backbone+MI	0.8977±0.0876	0.9562±0.0733	0.9973±0.0034
Model 5: Backbone+DCB	0.9085±0.1116	0.9628±0.0501	0.9980±0.0030
Model 6: Backbone+OCB	0.8287±0.2076	0.8498±0.0378	0.9960±0.0081
Model 7: Backbone+JL	0.8798±0.0819	0.9258±0.0925	0.9988±0.0027
Model 8: Backbone+MI+JL	0.9166±0.0904	0.9568±0.0815	0.9971±0.0033
Model 9: Backbone+DCB+JL	0.9219±0.0811	0.9631±0.0708	0.9980±0.0035
Model 10: Backbone+MI+DCB	0.9428±0.0677	0.9458±0.0810	0.9983±0.0021
Model 11: Backbone+DCB+JL+MI(FC)	0.8682±0.0630	0.8966±0.0608	0.9968±0.0026
Model 12: Backbone+MI(input)+DCB+JL

Fig. 10

Results of segmenting an example lung CT slice by different strategies of the proposed method. (a) is the input image, (b) is the ground truth result, (c) is the UNet network as backbone, (d) to (o) are the models listed in Table 1.

Effectiveness of multidimensional input.

To evaluate the effectiveness of the multidimensional input (MI), we compare the performance of Models 4, 8, 10 and 12, which use MI (Fig. 10(g), (k), (m) and (o)), with that of the other models without MI. The results illustrate that the MI module predicts the GGO region, marked by the red rectangle, better than the other modules, including the DCB and the joint loss. Further, we compare the proposed model with a model that applies the multidimensional image transformations before the fully convolutional layer as a complement to the high-level features extracted from the original image (Model 11). As shown in Fig. 10(n), Model 11 cannot provide a comparable result in segmenting the region with GGOs, marked by the red rectangle. The above experiments show the effectiveness and necessity of the multidimensional input module. The testing results in Table 1 further demonstrate that the MI module consistently helps the proposed network obtain better segmentation performance.

Effectiveness of DCB.

The significance of the DCB module is investigated by comparing networks with the DCB module (Model 5, Model 9, Model 10, Model 11 and Model 12) and those without it. As shown in Fig. 10(h), (l), (m), (n) and (o), the models with the DCB module provide better segmentation of the region with fibrotic-streak-like features, marked by the blue rectangle, than those without the DCB module. To further evaluate the effectiveness of the DCB, we replace the DCB in Model 5 by an ordinary convolution block (OCB) on the short-cut connection of the network, and compare the performance in Fig. 10(h) and (i). It can be seen that the COVID-19 infected region with fibrotic-streak-like features, marked by the blue rectangle, cannot be detected by the model with the OCB module. Moreover, Table 1 demonstrates that embedding the DCB module into the network increases the Dice score, sensitivity and specificity. The above experiments indicate the effectiveness and necessity of the DCB in refining features for the COVID-19 infections, especially the fibrotic-streak-like features.

Effectiveness of joint loss function.

The segmentations of COVID-19 infections by the models with the joint loss function are shown in Fig. 10(j), (k), (l), (n) and (o). Compared with the models without the joint loss function, they better delineate the sinuous boundary of the infected region, marked by the green rectangle. This can be explained by the fact that the curvature histogram is sensitive to changes in the boundary shape, even when they are quite small. The quantitative results of the two strategies are compared in Table 1, indicating that the joint loss function helps describe the infection region with a precise contour.

Ablation studies of our MID-UNet. The best results are shown in red fonts.

Results of segmenting an example lung CT slice by different strategies of the proposed method. (a) is the input image, (b) is the ground truth result, (c) is the UNet network as backbone, (d) to (o) are the models listed in Table 1.
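The way the BCE and IOU terms of a joint loss combine can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the soft IoU formulation, the equal default weights, and the names `bce_loss`, `soft_iou_loss` and `joint_loss` are hypothetical, and the contour term is passed in as a precomputed number because its definition is not given in this section.

```python
import math


def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross entropy over flattened probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(pred)


def soft_iou_loss(pred, target, eps=1e-7):
    """1 - soft IoU, with products and sums standing in for set operations."""
    inter = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return 1.0 - (inter + eps) / (union + eps)


def joint_loss(pred, target, contour_term=0.0,
               w_bce=1.0, w_iou=1.0, w_contour=1.0):
    """Weighted sum of the three terms; the weights are assumed, not sourced."""
    return (w_bce * bce_loss(pred, target)
            + w_iou * soft_iou_loss(pred, target)
            + w_contour * contour_term)


# Flattened toy masks: two infected pixels followed by two background pixels.
target = [1, 1, 0, 0]
good = [0.9, 0.8, 0.1, 0.2]   # close to the ground truth
bad = [0.4, 0.3, 0.6, 0.7]    # mostly wrong

loss_good = joint_loss(good, target)
loss_bad = joint_loss(bad, target)
```

In this toy case `loss_good` is smaller than `loss_bad`, and any boundary penalty supplied through `contour_term` simply adds to the region-based terms.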

Comparisons with other models

Fig. 11 and Fig. 12 illustrate the segmentation results of lung infections from two example lung CT slices in our testing dataset via different segmentation networks. Some details of the segmentation results are zoomed in the red, blue and green rectangles, respectively. UNet over-estimates some normal tissues as infections, while omitting true infected areas of small size. UNet++ performs better than UNet in determining the infected regions, but a large number of tissues near the infections are mis-segmented. The Attention UNet cannot segment the complete area when the contrast is low and there is no salient area. DeepLabV3+ performs well on small infected regions, but over-estimates several normal tissues as infections. Both Inf-Net and COPLE-Net suffer from relatively fuzzy boundaries of the segmented regions, even with a semi-supervised learning strategy. TransFPN and TransUNet work better than the above methods, but still cannot provide complete segmentations of some fibrotic-streak-like infected regions. In contrast, the proposed MID-UNet provides superior performance over state-of-the-art methods in recognizing small-scale or fibrotic-streak-like infections and in preserving boundaries.

Fig. 11

Visual comparison of COVID-19 infections segmentation results.

Fig. 12

Visual comparison of COVID-19 infections segmentation results.

For quantitative evaluation, Table 2 shows the mean and standard deviation of the dice, sensitivity and specificity obtained by the different methods in segmenting COVID-19 infections. Our proposed MID-UNet model achieves the best results, which confirms our qualitative observations.
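For reference, the three reported metrics can be computed from pixel-wise confusion counts as in the minimal sketch below. It uses the standard definitions on flattened binary masks; the function names and the convention of returning 1.0 when a denominator is zero are assumptions, not taken from the paper.

```python
def confusion_counts(pred, truth):
    """Return (tp, fp, tn, fn) for two flattened binary masks of equal length."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p and not t:
            fp += 1
        elif not p and t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def dice(pred, truth):
    """Dice coefficient: 2*TP / (2*TP + FP + FN)."""
    tp, fp, _, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0  # assumed convention for empty masks


def sensitivity(pred, truth):
    """True positive rate: TP / (TP + FN)."""
    tp, _, _, fn = confusion_counts(pred, truth)
    return tp / (tp + fn) if (tp + fn) else 1.0


def specificity(pred, truth):
    """True negative rate: TN / (TN + FP)."""
    _, fp, tn, _ = confusion_counts(pred, truth)
    return tn / (tn + fp) if (tn + fp) else 1.0


# Toy example: 3 infected pixels and 5 background pixels in the ground truth.
truth = [1, 1, 1, 0, 0, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0, 0, 0]

scores = (dice(pred, truth), sensitivity(pred, truth), specificity(pred, truth))
```

Because background pixels usually far outnumber infected pixels in a CT slice, specificity tends to stay close to 1 for most methods, while dice and sensitivity are more discriminative.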

Table 2

Quantitative results (mean ± SD of dice, sensitivity and specificity) associated with different algorithms for the images in the testing dataset.

Method	Dice	Sensitivity	Specificity
UNet	0.8273±0.0801	0.8418±0.0858	0.9976±0.0021
UNet++	0.9159±0.0246	0.9164±0.0936	0.9835±0.0367
Attention UNet	0.9296±0.0706	0.9326±0.0786	0.9964±0.0019
DeepLabV3+	0.9323±0.0742	0.9297±0.0135	0.9810±0.0025
Inf-Net	0.9388±0.0675	0.9406±0.0784	0.9985±0.0014
COPLE-Net	0.9395±0.0553	0.9408±0.0261	0.9978±0.0021
TransFPN	0.9517±0.0376	0.9536±0.0459	0.9975±0.0012
TransUNet	0.9583±0.0590	0.9545±0.0863	0.9941±0.0072
Proposed


Running time

Table 3 compares the running time of the different networks on the dataset. The proposed method takes 16.88 h to train, which is slower than UNet, Attention UNet and COPLE-Net (15.26, 15.57 and 15.47 h, respectively), but faster than UNet++, DeepLabV3+, Inf-Net, TransFPN and TransUNet (17.58, 41.08, 20.69, 50.43 and 66.80 h, respectively). For testing, the proposed method takes 1.36 s on average, which is slower than UNet (1.15 s) but faster than TransFPN (5.14 s) and TransUNet (6.28 s). Considering the disparity in hardware relative to other methods, the proposed method can still be optimized further for clinical practice.

Table 3

Training and testing time of implementing different methods on the training and testing datasets.

Method	Training time (h)	Testing time (s)
UNet	15.26	1.15
UNet++	17.58	1.59
Attention UNet	15.57	1.16
DeepLabV3+	41.08	4.25
Inf-Net	20.69	2.19
COPLE-Net	15.47	1.24
TransFPN	50.43	5.14
TransUNet	66.80	6.28
Proposed	16.88	1.36
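The timing comparison can be restated as relative overhead with a few lines of arithmetic. The sketch below copies the values from Table 3; the names `TIMES` and `relative_overhead` are hypothetical, and choosing UNet as the default baseline is an assumption made for illustration.

```python
# (training time in hours, testing time in seconds), copied from Table 3.
TIMES = {
    "UNet": (15.26, 1.15),
    "UNet++": (17.58, 1.59),
    "Attention UNet": (15.57, 1.16),
    "DeepLabV3+": (41.08, 4.25),
    "Inf-Net": (20.69, 2.19),
    "COPLE-Net": (15.47, 1.24),
    "TransFPN": (50.43, 5.14),
    "TransUNet": (66.80, 6.28),
    "Proposed": (16.88, 1.36),
}


def relative_overhead(method, baseline="UNet"):
    """Percent extra training and testing time of `method` over `baseline`."""
    train, test = TIMES[method]
    base_train, base_test = TIMES[baseline]
    return (100.0 * (train / base_train - 1.0),
            100.0 * (test / base_test - 1.0))


train_pct, test_pct = relative_overhead("Proposed")

# Methods that train faster than the proposed one, in alphabetical order.
faster_to_train = sorted(
    name for name, (train, _) in TIMES.items() if train < TIMES["Proposed"][0]
)
```

With these numbers the proposed method needs about 11% more training time and about 18% more testing time than UNet.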


Discussion

COVID-19 infection segmentation in lung CT scans plays an important role in computer-assisted COVID-19 diagnosis. Deep learning based methods have been widely applied and provide state-of-the-art results in segmenting COVID-19 infected regions. However, current methods still suffer from several drawbacks, so their results are not fully convincing. UNet cannot extract the essential features from low-quality CT scans, resulting in incorrect segmentation. UNet++ enhances feature extraction by using a dense-like structure, but it lacks feature refinement, so it still omits true infected regions and over-estimates normal tissues as infections. The Attention UNet introduces attention modules that enhance the activation of salient areas, but its segmentations are defective in low-contrast, inconspicuous regions. The integration of depthwise separable convolution and atrous spatial pyramid pooling in DeepLabV3+ improves the ability of the network to recognize small-scale infected regions, but the focus on small-scale feature extraction causes small normal tissues to be incorrectly detected as infections. Inf-Net adopts a coarse-to-fine strategy that mimics how clinicians segment lung infection regions, and therefore obtains satisfactory performance. COPLE-Net applies a noise-robust dice loss and an adaptive self-ensembling framework to deal with noisy labels, leading to better segmentation of infections disturbed by noise. However, the constraints on segmentation boundaries are not fully considered in Inf-Net and COPLE-Net, so the resulting boundaries are relatively fuzzy and incomplete. By adopting the Transformer structure in the network, TransFPN and TransUNet work better than the above methods. However, the self-attention mechanism overlooks several visual features of the fibrotic-streak-like regions, leading to incomplete segmentations.
Considering the first two difficulties of COVID-19 infection segmentation discussed in Section 1, that is, the low differentiation between infection regions and other tissues and the high variation between different COVID-19 cases, our proposed network attempts to describe the essential and representative visual features of COVID-19 infections. The multidimensional input includes three transformations of the original image as complements, so that the low intensity, detailed texture and vague shadow of GGOs and infiltrations can be described. The DCB adopted on the short-cut connections enhances the directionality of the extracted low-level features, so that the mesh texture or fibrotic streaks can be represented. To address the third difficulty, precisely measuring the difference in irregular boundaries between the predicted result and the ground truth, our proposed curvature-histogram-based contour loss describes the shape signature of the sinuous boundary. By addressing these three problems, our proposed method provides superior performance over other state-of-the-art methods.
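As an illustration of how a curvature histogram can act as a shape signature for a boundary, the sketch below computes the turning angle at each vertex of a closed polygonal contour, bins the angles, and compares two histograms with an L1 distance. This is an assumed, simplified construction and not the contour loss defined by the authors; the function names, the 9-bin default and the L1 comparison are all hypothetical, and a hard histogram like this is not differentiable as written.

```python
import math


def turning_angles(contour):
    """Signed turning angle (radians, in [-pi, pi)) at each vertex of a closed contour."""
    n = len(contour)
    angles = []
    for i in range(n):
        x0, y0 = contour[i - 1]
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        a_in = math.atan2(y1 - y0, x1 - x0)    # direction of the incoming edge
        a_out = math.atan2(y2 - y1, x2 - x1)   # direction of the outgoing edge
        d = (a_out - a_in + math.pi) % (2 * math.pi) - math.pi
        angles.append(d)
    return angles


def curvature_histogram(contour, bins=9):
    """Normalized histogram of turning angles; an odd bin count centers 0 in a bin."""
    angles = turning_angles(contour)
    hist = [0.0] * bins
    for a in angles:
        idx = min(int((a + math.pi) / (2 * math.pi) * bins), bins - 1)
        hist[idx] += 1.0 / len(angles)
    return hist


def histogram_distance(h1, h2):
    """L1 distance between two histograms with the same number of bins."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


# A plain square and the same square with a small notch in its bottom edge.
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (1, 0), (2, 1), (3, 0), (4, 0), (4, 4), (0, 4)]

dist_same = histogram_distance(curvature_histogram(square),
                               curvature_histogram(square))
dist_notch = histogram_distance(curvature_histogram(square),
                                curvature_histogram(notched))
```

Even though the notch changes the enclosed area only slightly, it shifts a noticeable share of the histogram mass, which is the sensitivity to small boundary changes that the text attributes to curvature-based descriptions.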

Limitations of the study

Although our MID-UNet provides convincing results in segmenting COVID-19 infections on the testing dataset, the current model has some limitations. Firstly, we treat ground glass opacities, interstitial infiltrations and consolidation together as COVID-19 infections. In practice, however, clinicians may need to further distinguish different types of infected areas for condition assessment. Therefore, multi-class labeling methods should be integrated with our method in the future. Secondly, the training samples used in our work all contain infected areas, so the MID-UNet may have unsatisfactory accuracy on non-infected slices. Expanding the dataset with more non-infected slices, or adding a classifier that separates infected from non-infected slices, may solve this problem.

Conclusion

In this paper, we propose MID-UNet, a network with a novel structure for COVID-19 infection region segmentation in lung CT slices. Three spatial transformations are proposed to highlight image characteristics according to prior knowledge of COVID-19. We then integrate these transformations with the original image as a multidimensional input of the network, so that the self-learning proceeds under hidden guidance. To highlight the mesh textures or fibrotic streaks, which are significant CT symptoms of COVID-19, we propose the directional convolution block (DCB) and apply it on every short-cut connection of the network for feature refinement. Moreover, we propose a contour loss function based on the local curvature description and combine it with the IOU loss and the BCE loss so that the boundary delineation is more accurate. Experimental results demonstrate that the proposed MID-UNet can effectively segment COVID-19 infections from CT slices and outperforms the state-of-the-art models.

CRediT authorship contribution statement

Jianning Chi: Conceptualization, Methodology, Writing – original draft. Shuang Zhang: Data curation, Writing – review & editing. Xiaoying Han: Software, Validation. Huan Wang: Visualization. Chengdong Wu: Supervision. Xiaosheng Yu: Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
