Literature DB >> 35966970

GFNet: Automatic segmentation of COVID-19 lung infection regions using CT images based on boundary features.

Chaodong Fan1,2,3,4, Zhenhuan Zeng2, Leyi Xiao1,4,5,6,7, Xilong Qu4.   

Abstract

In early 2020, the global spread of COVID-19 presented the world with a serious health crisis. Due to the large number of infected patients, automatic segmentation of lung infections using computed tomography (CT) images has great potential to enhance traditional medical strategies. However, the segmentation of infected regions in CT slices still faces many challenges. In particular, the core problem is the high variability of infection characteristics and the low contrast between infected and normal regions, which leads to fuzzy regions in lung CT segmentation. To address this problem, we design a novel global feature network (GFNet) for COVID-19 lung infections: with VGG16 as the backbone, we design an Edge-guidance module (Eg) that fuses the features of each layer. Features are extracted by a reverse attention module and combined with Eg; this series of steps enables each layer to fully extract boundary details that previous models find difficult to notice, thus alleviating the fuzziness of infected regions. The multi-layer output features are fused into the final output to achieve automatic and accurate segmentation of infected areas. We compare GFNet with the traditional medical segmentation networks UNet and UNet++, the recent model Inf-Net, and methods from the few-shot learning field. Experiments show that our model outperforms these models on Dice, Sensitivity, Specificity and other evaluation metrics, and our segmentation results are visually clear and accurate, which demonstrates the effectiveness of GFNet. In addition, we verify the generalization ability of GFNet on another "never seen" dataset, and the results show that our model generalizes better than the above models. Our code has been shared at https://github.com/zengzhenhuan/GFNet.
© 2022 Elsevier Ltd. All rights reserved.


Keywords:  COVID-19; CT image; Convolutional neural network; Edge-guidance; Image segmentation

Year:  2022        PMID: 35966970      PMCID: PMC9359771          DOI: 10.1016/j.patcog.2022.108963

Source DB:  PubMed          Journal:  Pattern Recognit        ISSN: 0031-3203            Impact factor:   8.518


Introduction

Since December 2019, a large number of novel coronavirus cases have been reported in Wuhan, Hubei Province, China, and the number of infections keeps increasing. The novel coronavirus can cause acute respiratory disease in humans and may even cause fatal acute respiratory distress syndrome (ARDS). The International Committee on Taxonomy of Viruses (ICTV) named the novel coronavirus SARS-CoV-2, and the World Health Organization (WHO) named the disease it causes COVID-19. The novel coronavirus has been confirmed to be capable of human-to-human transmission, and it spread rapidly in China and the world due to the massive traffic and population movement during the Spring Festival. The novel coronavirus has extremely high rates of morbidity and mortality. According to the WHO, as of 10 April 2020 there had been 1,521,252 confirmed cases globally [1]; as of 12 July 2021, there had been 180 million confirmed cases worldwide and 4 million cumulative deaths [2]. Reverse transcription polymerase chain reaction (RT-PCR) is considered the gold standard for screening for COVID-19. However, its capacity for rapid and accurate large-scale detection is limited, and RT-PCR testing has been reported to have a high rate of false negatives. As an important complement to RT-PCR testing, radiological imaging techniques such as X-ray and computed tomography (CT) have also played a role in current diagnosis, including follow-up evaluation and assessment of disease progression [7], [8]. A clinical study of 1014 patients from Wuhan, China, showed that chest CT analysis could achieve a sensitivity of 0.97, a specificity of 0.25, and an accuracy of 0.68 in detecting cases of COVID-19 pneumonia, with RT-PCR test results as a reference [23].
Computed tomography (CT) imaging plays a crucial role in detecting the pulmonary manifestations of COVID-19 [3], [4], and the segmentation of infected lesions in CT scans is very important for quantitative measurement of disease progression [5], [6]. Lung CT image segmentation is a necessary initial step for lung image analysis [37]. Segmenting the lesions removes unnecessary background areas and assists doctors in diagnosis, making it an important step in CT image analysis. In contrast to common pneumonia, COVID-19 presents with pulmonary ground-glass opacity (GGO) and signs of pulmonary consolidation. CT imaging of COVID-19 in its early stages usually shows one or more GGO nodular, patchy or lamellar shadows, generally distributed in the outer third of the lung fields and under the pleura. As the disease progresses, most patients with COVID-19 develop consolidated lung lesions. During the COVID-19 outbreak in Wuhan, a large number of patients, including suspected cases, confirmed cases and follow-up cases, required chest CT examinations to observe the changes and severity of pneumonia. The qualitative evaluation of infection and longitudinal changes in CT slices can thus provide useful and important information in the fight against COVID-19, so the lesion regions need to be segmented separately. Existing work [36] has shown that, for real-world applications, segmentation is an important step since it removes background information, reduces the chance of data leakage, and forces the model to focus only on important image areas. Under the current circumstances, any missed case can lead to further COVID-19 transmission. Therefore, the large workload and the need for high diagnostic accuracy pose great challenges to radiologists, and radiologists' eyestrain increases the potential risk of missing small lesions. In the face of such a serious epidemic, applying deep learning to disease diagnosis is highly necessary.
Early, accurate and rapid diagnosis of suspected cases is crucial to timely isolation and medical treatment, and is of great significance to patient treatment, epidemic control and public health security. So, developing an artificial intelligence (AI) method for COVID-19 computer-aided diagnosis could be very helpful to radiologists. In Section 2, we will introduce some recent examples of deep learning applied to lung CT segmentation, explain their shortcomings, and then introduce the ideas and solutions of this paper. In Section 3, we will introduce the network structure and its core modules and loss functions in detail. In Section 4, we will introduce the experimental environment, dataset sources, evaluation metrics, and illustrate how to determine the optimal output of the network. Then we will present our experimental results qualitatively and quantitatively and compare them with other methods, and add ablation experiments to verify the effectiveness of the core modules. Finally, the conclusion is given in Section 5.

Related work

Recently, deep learning systems have been designed to examine patients infected with COVID-19 via radiological imaging [41], [42]. In clinical practice, automatic segmentation of lesions is highly desirable [6]. Although CT scans are important for diagnosis and treatment decisions, automatically segmenting COVID-19 pneumonia lesions from CT scans is challenging for several reasons. First of all, the infected lesions have a variety of complex manifestations, such as ground-glass opacity (GGO), consolidation, etc. [9]. Second, the size and location of pneumonia lesions vary greatly between different stages of infection and between patients. As shown in Fig. 1(a-c) [10], the lesions are irregular in shape with blurred boundaries, and some lesions, such as GGO, have low contrast with the surrounding region. In addition, radiologists' labeling of infected regions is a highly subjective task, often subject to personal bias and influenced by clinical experience, as shown in Fig. 1(d). Due to the outbreak of COVID-19, it is difficult to collect enough labeled data for deep model training in a short period of time, and obtaining high-quality pixel-level lung infection annotations on CT slices is expensive and time-consuming. These challenges make the task of automatically segmenting lesions more difficult.
Fig. 1

CT scan of COVID-19 patients with complex findings of pneumonia lesions (a-c) from three different patients, with some lesions highlighted by arrows.(d) Display the labels made by different observers on item (c). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Li et al. [4] used U-Net [12] to segment the lung from CT scans. UNet++ [13] has also been used to detect [14] and segment [15] infected lesions from CT scans. Liu et al. [21] proposed a weakly supervised COVID-19 infection segmentation method with scribble supervision. A et al. [37] designed and evaluated an automatic tool for COVID-19 lung infection segmentation and measurement using chest CT images: an uncertainty-aware mean teacher framework is incorporated to guide model training, encouraging the segmentation predictions to be consistent under different perturbations of an input image, and with a pixel-level uncertainty measure on the teacher model's predictions, the student model is guided with reliable supervision. Zhao et al. [22] proposed a new deep-learning-based method that integrates a 3D V-Net with shape priors for medical image segmentation; the shape prior was used to optimize the model weights at both the V-Net input and output, which significantly improved performance. Wang et al. [16] trained a lung segmentation network using ground-truth masks obtained by an unsupervised method, designed an effective lightweight 3D residual network, and proposed a weakly supervised COVID-19 lesion detection method. Wang et al. [10] proposed a novel noise-robust Dice loss function and a noise-robust learning framework based on CNN self-ensembling, which is robust to noisy labels. Fan et al.
[17] proposed Inf-Net for automatic segmentation of infected regions from lung CT images, together with a semi-supervised segmentation framework based on a random selection propagation strategy, which requires only a small number of labeled images and mainly uses unlabeled data to alleviate the shortage of high-quality labeled data. Maghdid et al. [38] built a comprehensive dataset of X-ray and CT scan images from multiple sources and provided a simple but effective COVID-19 detection technique using deep learning and transfer learning algorithms. Zhou et al. [43] proposed an ensemble deep learning model for COVID-19 detection from CT images. Mu et al. [45] proposed a multi-scale multi-level feature recursive aggregation (mmFRA) network to integrate multi-scale features with multi-level features. Katsamenis et al. [46] proposed a deep learning framework that can detect COVID-19 pneumonia in thoracic radiographs and differentiate it from bacterial pneumonia infection. Voulodimos et al. [25], [44] presented a few-shot learning paradigm for segmenting COVID-19 infectious regions; the main difference from traditional approaches is that it is an online learning paradigm rather than the static supervised learning of U-Net. However, all the above methods share a common deficiency: the fuzzy-boundary problem of lesion segmentation is not completely solved; in other words, fuzzy boundaries are not exactly segmented, so the segmentation results remain somewhat ambiguous, as shown in Fig. 2. Ambiguous segmentation of lesion regions can mislead medical staff: if normal tissue is segmented as a lesion, the patient's condition may be judged more serious than it is; if a lesion is segmented as normal tissue, the patient may not receive proper treatment, leading to misdiagnosis and even further spread of COVID-19.
Fig. 2

The position marked by the circle in the figure points out the problem of fuzzy boundaries in the segmentation results of an existing model [17]. In the two marked regions, it may be difficult for health care workers to determine with the naked eye whether these regions are infected or not. In image segmentation, these regions are difficult for previous methods to identify because of their fuzzy boundaries and low contrast with normal tissues. Such areas of ambiguity may interfere with a health care provider's perception of the infected region. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Therefore, the method proposed in this paper mainly aims to solve the problem of fuzzy boundaries and of normal tissues that are difficult to distinguish correctly. In clinical practice, when judging a patient's condition from CT images, doctors first determine the location of the lesion and then assess it according to the characteristics of the diseased region. Following these steps, a deep neural network should focus on both the location and the boundary of the lesion region. To solve the above problems, we carry out two main operations in sequence: (1) locate the lesion; (2) accurately extract its contour. As is well known, in a convolutional neural network the low-layer features have higher resolution and contain more detailed information, but, having passed through fewer convolutions, they carry lower-level semantics and more noise; high-layer features have stronger semantic information, but their resolution is low and their perception of details is poor [11]. How to integrate them efficiently, exploiting their advantages and discarding their drawbacks, is the key to improving a segmentation model. Therefore, we designed a framework that first aggregates high-level features to extract a rough estimate of the lesion region.
We also designed an Edge-guidance module to provide guidance for the reverse attention modules so that lesion contours can be accurately extracted after localization. Finally, by fusing the features of each layer, an accurate segmentation of the lesion region is obtained. In short, our contributions in this paper are fourfold: 1. A novel deep learning network framework, GFNet, is proposed for the segmentation of COVID-19 infected regions in two-dimensional lung CT images. By aggregating the high-layer features with the aggregation module, the aggregated features capture context information and generate a global location map as an initial guide region for subsequent steps. To further mine the boundary information of the target, we apply the reverse attention module step by step from the high layers to the low layers, extract the hidden details of each layer, and finally fuse the features of each layer, so that the network can fully extract details that previous models find difficult to notice. 2. We design an Edge-guidance map that contains the boundary features of each layer to further extract boundary information during per-layer feature extraction. Experiments prove that this design is very effective. 3. We apply the GFNet framework to VGG16 and evaluate our method on two different datasets: one "seen" dataset to verify learning ability, and one "never seen" dataset to verify generalization ability. Experimental results show that GFNet has better learning and generalization ability than existing models. 4. We conduct experiments with each model on training datasets of different sizes. Our model achieves good performance even when the training set is relatively small, so in real-world decision making GFNet is fully capable of such tasks under time constraints or with few training samples; with sufficient time or a large training set, it can also be trained fully to achieve maximum performance.

Proposed method

In this section, we will give a detailed introduction to the GFNet network structure, core network components, and loss functions.

Global-feature network(GFNet)

Network overview: The framework of GFNet is shown in Fig. 3. With VGG16 as the backbone network, the whole network is divided into five layers, and the high-layer features are those of layer 3, layer 4, and layer 5. We use the aggregation module to aggregate the high-layer features (f3, f4, f5) to generate a global location map Sg [17]. Then, starting from Sg, we import Sg and f5 into a reverse attention module (RA) under the guidance of the Edge-guidance module (Eg). The output S5 and f4 go to the next RA. Repeating this operation, we finally obtain S1; the lateral outputs are then fused and the prediction map is obtained through a sigmoid output. Next, we introduce the core modules of this network and the loss function in detail.
Fig. 3

Structure diagram of the GFNet model; f1 to f5 form the VGG16 backbone. GFNet contains one Eg, one aggregation module Sg, and five connected RA modules. The input image goes from f1 to f5, then from S5 back to S1 via the Sg, Eg and RA modules. The final output is the sum of the lateral outputs Si. See Section 3.1 for more details.


Edge guidance module

We know that the deeper the feature, the coarser the information, while low-layer features contain many useful details. To make good use of the features of each layer, based on the idea of edge detection [18], we designed an Edge-guidance module (Eg) that fuses the features of each layer. It contains the boundary information of each layer and has rich edge features, which helps us extract the boundary of the lesion region more effectively and accurately during subsequent feature extraction, thus addressing the fuzzy-boundary problem. The specific approach is as follows. The VGG16 network consists of 13 convolutional layers and 3 fully connected layers. We remove all the fully connected layers and the pool5 layer: on the one hand, the fully connected layers do not align with our design of a fully convolutional network; on the other hand, keeping the pool5 layer would double the stride, which is harmful for edge localization. The convolutional layers are divided into five stages, each followed by a pooling layer. The information captured by each convolutional layer becomes coarser as its receptive field size increases; detailed receptive field sizes of the different layers are given in Table 1. Hence, we combine hierarchical features from all the convolutional layers into a holistic framework in which all parameters are learned automatically. Since the receptive field sizes of the convolutional layers in VGG16 differ from each other, our network can learn multi-scale information, including low-level and object-level information, which helps extract more boundary detail. Since our boundary guidance module combines all the accessible convolutional layers to employ richer features, it is expected to yield a boost in accuracy. While obtaining the Edge-guidance map, we compute the gradient of the ground truth of the input image to obtain the edge ground truth of the boundary [17].
Then we conduct deep supervision of the Edge-guidance map and let it further learn the boundary features from the rich features. In Fig. 4 , we show the Edge-guidance results after combination. The edge obtains strong response at the COVID-19 infection region. Details of this module are shown in Fig. 4.
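As a concrete illustration of how an edge ground truth can be derived from a segmentation mask by taking its gradient, here is a minimal NumPy sketch; the function name `edge_ground_truth` is ours, not from the paper's released code:

```python
import numpy as np

def edge_ground_truth(mask: np.ndarray) -> np.ndarray:
    """Derive a binary edge map from a binary segmentation mask
    by thresholding its spatial gradient magnitude."""
    gy, gx = np.gradient(mask.astype(np.float32))
    grad_mag = np.sqrt(gx ** 2 + gy ** 2)
    return (grad_mag > 0).astype(np.float32)

# Toy 5x5 mask with a 3x3 foreground square: only the square's
# border pixels produce a non-zero gradient.
mask = np.zeros((5, 5), dtype=np.float32)
mask[1:4, 1:4] = 1.0
edge = edge_ground_truth(mask)
```

The resulting `edge` is 1 on the contour of the square and 0 both in the background and in the square's interior, which is exactly the supervision signal the Edge-guidance map is trained against.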
Table 1

Detailed receptive field and stride sizes of standard VGG16 net.

layer     conv1_1  conv1_2  pool1  conv2_1  conv2_2  pool2
rf size      3        5       6      10       14      16
stride       1        1       2       2        2       4

layer     conv3_1  conv3_2  conv3_3  pool3  conv4_1  conv4_2
rf size     24       32       40      44      60       76
stride       4        4        4       8       8        8

layer     conv4_3  pool4  conv5_1  conv5_2  conv5_3  pool5
rf size     92      100     132      164      196     212
stride       8       16      16       16       16      32
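The values in Table 1 follow from the standard recurrence rf_out = rf_in + (k - 1) * stride_in and stride_out = stride_in * s, applied over VGG16's 3x3 convolutions and 2x2 max-pools. A short Python sketch (function name ours) reproduces the table:

```python
def vgg16_receptive_fields():
    """Receptive field (rf) and cumulative stride of each layer in the
    VGG16 feature extractor, via rf += (k - 1) * stride_in."""
    # (name, kernel size, layer stride): conv layers are 3x3/s1,
    # max-pool layers are 2x2/s2.
    layers = [
        ("conv1_1", 3, 1), ("conv1_2", 3, 1), ("pool1", 2, 2),
        ("conv2_1", 3, 1), ("conv2_2", 3, 1), ("pool2", 2, 2),
        ("conv3_1", 3, 1), ("conv3_2", 3, 1), ("conv3_3", 3, 1), ("pool3", 2, 2),
        ("conv4_1", 3, 1), ("conv4_2", 3, 1), ("conv4_3", 3, 1), ("pool4", 2, 2),
        ("conv5_1", 3, 1), ("conv5_2", 3, 1), ("conv5_3", 3, 1), ("pool5", 2, 2),
    ]
    rf, stride, out = 1, 1, {}
    for name, k, s in layers:
        rf += (k - 1) * stride   # each layer widens rf by (k-1) input strides
        stride *= s              # pooling doubles the cumulative stride
        out[name] = (rf, stride)
    return out

rf_table = vgg16_receptive_fields()
```

Running this yields, e.g., `conv4_3 -> (92, 8)` and `pool5 -> (212, 32)`, matching Table 1.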
Fig. 4

An Edge-guidance map is generated by fusing the features of every layer to enrich the edge features. See Section 3.2 for more details.

We use the standard binary cross-entropy (BCE) loss function to measure the dissimilarity between the generated Edge-guidance map S_e and the edge ground-truth map G_e calculated from the gradient of the ground truth (GT):

L_edge = - Σ_{x=1}^{w} Σ_{y=1}^{h} [ G_e(x, y) log S_e(x, y) + (1 - G_e(x, y)) log(1 - S_e(x, y)) ]

where (x, y) denotes the coordinates of each pixel in the Edge-guidance map S_e and the edge ground-truth map G_e, and w and h respectively represent the width and height of the corresponding feature map.
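The edge loss above can be sketched in NumPy as a plain pixel-wise BCE; this is a generic illustration (function name ours), not the authors' implementation:

```python
import numpy as np

def bce_edge_loss(s_e: np.ndarray, g_e: np.ndarray, eps: float = 1e-7) -> float:
    """Pixel-wise binary cross-entropy between a predicted Edge-guidance
    map s_e and the edge ground truth g_e, both with values in [0, 1]."""
    s_e = np.clip(s_e, eps, 1.0 - eps)   # avoid log(0)
    return float(-np.mean(g_e * np.log(s_e) + (1 - g_e) * np.log(1 - s_e)))
```

A perfect edge prediction gives a loss near zero, while confidently wrong pixels are penalized heavily, which is what drives the deep supervision of the Edge-guidance map.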

Aggregation module

Many existing medical image segmentation networks [19], [20] use both high- and low-level encoder features to segment target organs and lesion regions. However, Wu et al. [35] pointed out that, compared with high-level features, low-level features demand more computing resources due to their larger spatial resolution but contribute less to performance. Inspired by this, we use a module that aggregates only the high-layer features. Specifically, for an input 2D CT image, we first obtain the three high-layer features f_i, i ∈ {3, 4, 5}, of VGG16. Then, we use a partial decoder [17], [35] to aggregate the high-layer features in a parallel connection. The decoder produces a global location map S_g, which then serves as the initial guide for the RA module. The details are shown in Fig. 5.
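The real partial decoder [35] uses learned convolutions; as a rough illustration of the idea of fusing only the high-layer maps into one global map, a simplified NumPy stand-in (nearest-neighbour upsampling, multiplicative fusion; all names ours) might look like:

```python
import numpy as np

def upsample(x: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling by block replication."""
    return np.kron(x, np.ones((factor, factor), dtype=x.dtype))

def aggregate_global_map(f3: np.ndarray, f4: np.ndarray, f5: np.ndarray) -> np.ndarray:
    """Toy stand-in for the partial decoder: bring the three high-layer
    maps to f3's resolution and fuse them multiplicatively, so a location
    must respond in all three layers to survive in the global map."""
    return f3 * upsample(f4, 2) * upsample(f5, 4)
```

The multiplicative fusion mimics the decoder's behaviour of suppressing locations that are not consistently activated across the high layers; the learned version replaces the products with convolutions.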
Fig. 5

Aggregated high-layers features are used to generate a global location map.


Reverse attention module

In clinical practice, clinicians usually segment the lung infection region in two steps: first roughly locating the infected region, and then accurately marking it by examining the local tissue structure. Inspired by this process and by [24], we design GFNet with five reverse attention modules (RA). First, the aggregation module acts as a rough locator and generates a high-level global map without structured details, providing rough location information for the lung infection region. Second, our structure mines discriminative infection regions in an erasing manner [26], [27]. Specifically, rather than simply aggregating features from all layers [27], we propose to adaptively learn reverse attention over the features of all five layers. We import S_{i+1} and f_i into an RA under the guidance of Eg. The specific approach is as follows: S_{i+1} is upsampled and reversed to obtain the reverse attention weights A_i, which are then multiplied with the lower-layer feature f_i. Before obtaining the RA feature R_i, we concatenate Eg with the multiplied result, so that the feature is extracted under the guidance of Eg. Finally, we add R_i to the upsampled S_{i+1} to obtain the lateral output S_i of layer i. For each lateral output S_i, we conduct deeply supervised learning. The output S_i and f_{i-1} then go to the next RA, and repeating this operation finally yields S_1. The formula is as follows:

R_i = Conv(Cat(f_i ⊗ A_i, D(Eg)))

where Cat denotes concatenation and D denotes downsampling, followed by two two-dimensional convolution layers with 64 filters. In fact, the RA weight A_i is a design widely adopted in salient object detection in computer vision [27]; it is defined as:

A_i = ⊖(σ(U(S_{i+1})))

where U denotes the upsampling operation, σ denotes the sigmoid activation function, and ⊖ denotes the reverse operation of subtracting the input from a matrix of all ones. The symbol ⊗ denotes the extension of the single-channel attention map to multiple channels, including the reversion of each channel of the candidate tensors in Eq. (2), followed by element-wise multiplication. The final output of this layer is:

S_i = R_i + U(S_{i+1})

The details of this process are shown in Fig. 6.
It is worth noting that erasure strategies in RA can ultimately refine inaccurate and coarse prediction region into an accurate and complete prediction map.
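Stripped of the convolutions and the Eg concatenation, the erasing idea of one RA step can be sketched in NumPy as follows; this is a simplified single-channel illustration (names ours), not the paper's module:

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention_step(f_i: np.ndarray, s_next: np.ndarray) -> np.ndarray:
    """One simplified RA step: upsample the coarser prediction s_next,
    'erase' its already-confident region (reverse attention), use the
    weights to gate the current-layer feature f_i, and add the residual
    back to form the lateral output."""
    up = np.kron(s_next, np.ones((2, 2), dtype=s_next.dtype))  # upsample x2
    a = 1.0 - sigmoid(up)        # reverse attention weights A_i
    r = f_i * a                  # attended feature R_i (convs and Eg omitted)
    return r + up                # lateral output S_i
```

Where `s_next` is already confident, the weights `a` are small and the current layer contributes little; where `s_next` is uncertain (near boundaries), `a` is large and `f_i` dominates, which is the erasing refinement behaviour described above.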
Fig. 6

The reverse attention module is used to implicitly learn edge features.

In [17], the RA module is also used, with Res2Net, to extract features from layer 5 to layer 3. We differ in that we apply the RA module to every layer, and we demonstrate in subsequent ablation experiments that our approach is better.

Loss function

We define the segmentation loss function L_seg as the combination of the weighted IoU loss function L^w_IoU and the weighted binary cross-entropy (BCE) loss function L^w_BCE, namely:

L_seg = L^w_IoU + λ L^w_BCE

where λ represents the weight, which was set to 1 in our experiments. The two parts of L_seg provide effective global (image-level) and local (pixel-level) supervision for accurate segmentation. Unlike the standard IoU loss, which is widely used in segmentation tasks, the weighted IoU loss increases the weights of difficult pixels to highlight their importance. In addition, compared with the standard BCE loss, L^w_BCE pays more attention to difficult pixels rather than assigning the same weight to all pixels. The definitions of these losses are the same as those in [31], [32], and their effectiveness has been validated in the object detection domain. We adopt deep supervision for the five lateral outputs (S_1, ..., S_5) and the global location map S_g, each of which is upsampled to the same size as the object-level segmentation ground-truth map G. Therefore, the total loss function is:

L_total = L_seg(S_g^up, G) + Σ_{i=1}^{5} L_seg(S_i^up, G)

where the superscript "up" denotes upsampling to the size of G.
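As a simplified illustration of L_seg, the following NumPy sketch combines an unweighted soft IoU loss with BCE; the pixel-difficulty weighting of [31], [32] is omitted for brevity, and the function name is ours:

```python
import numpy as np

def seg_loss(pred: np.ndarray, gt: np.ndarray, lam: float = 1.0,
             eps: float = 1e-7) -> float:
    """Simplified L_seg = soft IoU loss + lam * BCE loss, with uniform
    pixel weights standing in for the weighted versions of [31], [32]."""
    p = np.clip(pred, eps, 1 - eps)
    inter = (p * gt).sum()
    union = (p + gt - p * gt).sum()
    l_iou = 1.0 - inter / (union + eps)          # global, image-level term
    l_bce = -np.mean(gt * np.log(p) + (1 - gt) * np.log(1 - p))  # local term
    return float(l_iou + lam * l_bce)
```

The IoU term supervises the overlap of the whole predicted region (global), while the BCE term supervises every pixel independently (local), mirroring the two kinds of supervision described above.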

Experiment

Experimental environment and parameter setting

Our model was implemented with PyTorch 1.10.0 and CUDA 11.6 and trained on a single NVIDIA RTX 3080 laptop GPU. Before training, we uniformly resize all inputs to 352 × 352, and a multi-scale training method with scaling ratios of {0.75, 1, 1.25} is used to improve the generalization performance of the model. The network is trained with the Adam optimizer for 100 epochs, with a batch size of 4 and a learning rate of 1e-4.
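For reference, the three input resolutions implied by the scaling ratios can be computed directly; this assumes plain proportional scaling of the 352 × 352 base size, and the exact resize policy may differ in the released code:

```python
BASE_SIZE = 352              # base training resolution (352 x 352)
SCALES = [0.75, 1.0, 1.25]   # multi-scale training ratios

def training_sizes(base: int = BASE_SIZE, scales=SCALES):
    """Side lengths used for multi-scale training, assuming plain
    proportional scaling of the square input."""
    return [int(base * s) for s in scales]
```

Under this assumption, each training batch is resized to one of 264, 352 or 440 pixels per side, exposing the network to lesions at three effective scales.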

Baselines

For the infected-region experiments, we apply the GFNet framework of this paper to VGG16. We first identify the best value of the number of fused layers n. We then compare GFNet with two classic medical segmentation models, U-Net [12] and U-Net++ [13], the recent model Inf-Net [17], and two segmentation models based on few-shot learning, "Few-shot UNet" [25] and "SG-One" [39].

Datasets

Currently, the number of CT images with segmentation labels is very limited, because manually segmenting areas of lung infection is a difficult and time-consuming task and the outbreak of the disease was in its early stages. To address this, we use a semi-supervised learning strategy to improve GFNet, effectively expanding the training data with a large number of unlabeled CT images. This strategy follows the method in [17] and gradually enlarges the training dataset with unlabeled data based on a random sampling strategy; see Algorithm 1 [17] for details. Specifically, we generate pseudo labels for unlabeled CT images using the procedure described in Algorithm 1, and the resulting CT images with pseudo labels are then used to train our model. This semi-supervised approach is simple: it requires no measure to evaluate the predicted labels and no threshold. Moreover, this strategy provides better performance than other semi-supervised learning methods and prevents network overfitting; recent work [34] has confirmed this conclusion. We use the same training settings as in [17]. The number of randomly selected CT images per round is set to 5, i.e., K = 5; with 1600 unlabeled images, we need to perform 320 iterations with a batch size of 4. The unlabeled CT images are extracted from the COVID-19 CT Collection [33] dataset, which consists of 20 3D CT volumes from different COVID-19 patients. Fan et al. [17] extracted 1600 2D axial CT slices from these volumes, removed non-lung regions, and constructed an unlabeled training dataset for effective semi-supervised segmentation. After obtaining the semi-supervised dataset with pseudo labels, our training phase consists of two steps: (i) pre-training on 1600 CT images with pseudo labels, which takes 7.5 h to converge over 100 epochs with a batch size of 4.
(ii) Fine-tuning on 60 CT images with ground-truth labels, which takes about 15 min to converge over 100 epochs with a batch size of 4. These 60 CT images were taken from the COVID-19 CT Segmentation dataset [29], which consists of 100 axial CT images collected by the Italian Society of Medical and Interventional Radiology. A radiologist segmented the CT images using different labels to identify lung infections. Although this is the first open-access COVID-19 dataset for lung infection segmentation, it suffers from a small sample size: only 100 labeled images are available. We employ them as the labeled data: 60 CT images randomly selected as training samples, 20 for validation, and the remaining 20 for testing. We verify the model's learning ability on this test set. In addition, we found the public dataset "COVID-CS" [30] and tested the trained model directly on it to verify our model's generalization ability to "never seen" data. For a fair comparison, we adopt the same training mode and settings for the baseline models.
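The progressive enlargement described above (K = 5 images per round over 1600 unlabeled images, giving 320 rounds) can be sketched as a plain Python loop; the pseudo-labeling step is stubbed out here, since the real procedure of Algorithm 1 [17] retrains the network each round:

```python
import random

def semi_supervised_rounds(unlabeled, k=5, seed=0):
    """Sketch of the random-sampling enlargement: each round draws K
    unlabeled images, pseudo-labels them with the current model (stubbed),
    and adds them to the training pool before the next retraining."""
    rng = random.Random(seed)
    pool = list(unlabeled)
    train, rounds = [], 0
    while pool:
        n = min(k, len(pool))
        batch = [pool.pop(rng.randrange(len(pool))) for _ in range(n)]
        pseudo = [(img, "pseudo_label") for img in batch]  # stub: model prediction
        train.extend(pseudo)
        rounds += 1  # the real algorithm retrains the model on `train` here
    return rounds, len(train)
```

With 1600 unlabeled images and K = 5 this performs exactly 320 rounds and ends with all 1600 images pseudo-labeled, matching the iteration count quoted above.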
Algorithm 1

Evaluation metrics

Following Han et al. [5] and Huang et al. [26], we use four widely used metrics: Dice similarity coefficient, Sensitivity (Sen.), Specificity (Spec.) and Precision (Prec.). We also introduce three golden indicators from the object detection field: Structure Measure [27], Enhanced-alignment Measure [28], and Mean Absolute Error. In our evaluation, we choose the fused lateral output, passed through a sigmoid function, as the final prediction P. The seven metrics used to measure the performance of a model can be expressed as follows:

Dice coefficient

Dice is mainly used to calculate the similarity of two sets. For a prediction P and ground truth G, it is defined as:

Dice = 2|P ∩ G| / (|P| + |G|)

Sensitivity ()

Sen. reflects the percentage of lung infections that are correctly segmented. It is defined as:

Sen. = |P ∩ G| / |G|

Specificity ()

Spec. reflects the percentage of non-infected lung regions that are correctly segmented. It is defined as:

Spec. = |(T − P) ∩ (T − G)| / |T − G|

where T is the pixel set of the entire CT image.

Structural measure ():

This metric is more consistent with the human visual system and is used to measure the structural similarity between the prediction map and the ground-truth mask:

S = α · S_o + (1 − α) · S_r

where α is the balance factor controlling the trade-off between object-aware similarity S_o and region-aware similarity S_r. In this paper, we use the default setting of the original work (α = 0.5).

Enhanced-alignment measure (Eφ)

This recently proposed metric measures both local and global similarity between two binary maps: Eφ = 1/(w × h) Σx Σy φ(x, y), where w and h are the width and height of the ground-truth map G, (x, y) indexes each pixel in G, and φ denotes the enhanced alignment matrix. We obtain a set of Eφ values by binarizing the prediction with thresholds from 0 to 255. In our experiments, we report the mean Eφ over all thresholds (Eφmean).
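The full enhanced alignment matrix φ is defined in [28]; the sketch below only illustrates the evaluation protocol (binarize the soft prediction at every threshold from 0 to 255, score each binary mask, and average), with plain pixel agreement standing in for φ:

```python
import numpy as np

def mean_e_measure(pred, gt):
    """Binarize the soft prediction at thresholds 0..255, score each
    binary mask against the ground truth, and average the scores.
    NOTE: plain pixel agreement is used here as a stand-in for the
    enhanced alignment matrix phi of [28]."""
    gt = gt.astype(bool)
    scores = []
    for t in range(256):
        binary = pred >= t / 255.0
        scores.append(float((binary == gt).mean()))
    return float(np.mean(scores))
```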

Mean absolute error (MAE)

This metric measures the pixel-level error between the prediction S and the ground truth G, and is defined as MAE = 1/(w × h) Σx Σy |S(x, y) − G(x, y)|.
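MAE is straightforward to compute; for instance:

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between prediction S and ground truth G."""
    return float(np.abs(pred.astype(float) - gt.astype(float)).mean())

pred = np.array([[0.8, 0.2], [0.0, 1.0]])
gt = np.array([[1.0, 0.0], [0.0, 1.0]])
print(mae(pred, gt))  # (0.2 + 0.2 + 0 + 0) / 4 = 0.1
```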

The determination of the optimal value of n

We first conducted experiments with different values of n on the “COVID-19 CT Segmentation” dataset [29]; the results are shown in Fig. 7. Several metrics improve as the number of fused feature layers increases, so we set the best value of n to 5, i.e., the final output fuses all five lateral outputs. This experiment shows that combining all five lateral outputs yields the best performance.
Fig. 7

Evaluation metrics as a function of n, the number of fused lateral outputs.

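The paper's fusion module defines the exact combination; as a hypothetical sketch, averaging n lateral maps (already resized to a common resolution) illustrates the idea:

```python
import numpy as np

def fuse_lateral_outputs(side_outputs):
    """Combine n per-layer prediction maps into one output by simple
    averaging. Averaging is an assumption made for illustration; the
    paper's fusion module may weight the layers differently."""
    stacked = np.stack(side_outputs, axis=0)
    return stacked.mean(axis=0)

# Five 4x4 lateral maps (n = 5), e.g. one per decoder layer.
outputs = [np.full((4, 4), v) for v in (0.2, 0.4, 0.6, 0.8, 1.0)]
fused = fuse_lateral_outputs(outputs)
print(fused[0, 0])  # (0.2+0.4+0.6+0.8+1.0)/5 = 0.6
```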

Verifying the validity of the pseudo-label dataset

Our training process is divided into two steps: first, pre-training on the 1600 training images with pseudo labels, and then fine-tuning on the 60 images with ground-truth labels. In this part, we first verify the validity of the pseudo-label dataset to ensure that it helps the model's learning process. The results are shown in Table 2. There is a clear gap between the test results after training only on the pseudo-label set and those after fine-tuning with ground-truth labels, indicating that training with the pseudo-label dataset alone is not sufficient. However, if only ground-truth labels are used for training (“0+60”), the small sample size makes the final result inferior to pseudo-label training on some evaluation metrics such as Spec. We therefore follow the semi-supervised strategy of Inf-Net [17]: first pre-train on the pseudo-label dataset, then fine-tune on the images with ground-truth labels, ensuring a sufficient training sample size and achieving the best performance.
Table 2

1600 training sets with pseudo-labels were trained and tested directly on the “COVID-19 CT Segmentation” dataset [29] to verify the validity of the dataset.

Method         Dice   Sen.   Spec.  Sα     Eφmean  MAE
U-Net          0.682  0.607  0.972  0.710  0.842   0.084
U-Net++        0.584  0.608  0.845  0.625  0.796   0.117
Inf-Net        0.689  0.630  0.970  0.733  0.853   0.070
GFNet (ours)   0.697  0.613  0.976  0.723  0.858   0.069
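The two-stage schedule described above can be sketched as a framework-agnostic skeleton (all names here are hypothetical, and `train_step` stands in for one forward/backward/optimizer update):

```python
def train_step(model, batch):
    """Placeholder for one forward/backward/optimizer update."""
    pass

def two_stage_training(model, pseudo_labeled, ground_truth,
                       pretrain_epochs=50, finetune_epochs=100):
    """Hypothetical two-stage schedule: pre-train on the large
    pseudo-labeled set, then fine-tune on the ground-truth images."""
    log = []
    # Stage 1: pre-training on CT images with pseudo labels.
    for epoch in range(pretrain_epochs):
        for batch in pseudo_labeled:
            train_step(model, batch)
        log.append(("pretrain", epoch))
    # Stage 2: fine-tuning on CT images with ground-truth labels.
    for epoch in range(finetune_epochs):
        for batch in ground_truth:
            train_step(model, batch)
        log.append(("finetune", epoch))
    return log
```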

Experimental results and analysis

Quantitative analysis

To compare infection-region segmentation performance, we include the existing segmentation models U-Net, U-Net++, and Inf-Net, as well as two methods from the few-shot learning field: Few-shot U-Net and SG-One. The quantitative results on datasets [29] and [30] are shown in Tables 3 and 4. Each model was trained with the same procedure. We first examined the influence of training sets of different sizes on model performance: we split the 1600 pseudo-label training images into subsets of size [0, 400, 800, 1200, 1600], trained each model on each subset, and then fine-tuned on the 60 CT images with ground-truth labels. Overall, the traditional medical segmentation models U-Net and U-Net++ produce mediocre results, while the proposed GFNet performs best on most metrics. Few-shot U-Net, which builds on U-Net and uses an online learning paradigm to further improve segmentation of COVID-19, outperforms the traditional U-Net; SG-One, another few-shot method, is only moderately effective. Compared with the latest COVID-19 segmentation model and our GFNet, neither few-shot method stands out. As the training set grows, the performance of the existing methods gradually improves; we attribute this to the models learning more diverse knowledge from more data. For GFNet, enlarging the training set yields only a slight improvement, showing that GFNet not only learns well but also achieves good performance with few training samples. This is because the “Edge-guidance” module in GFNet has strong robustness and universality.
It can learn the boundary features of the target region quickly and from a small sample, and propagate them into the segmentation result. On the second dataset [30], our GFNet again performs best on almost every metric, meaning it transfers better to “never seen” data. This once more confirms the universality of Eg with respect to boundary features across datasets and indicates stronger overall generalization ability.
Table 3

Quantitative results of segmentation of infected region on the “COVID-19 CT Segmentation” dataset [29]. The table shows the effect of training sets of different sizes on the performance of each model. The best of each evaluation metric is marked in bold and the second best in italic.

Method           Training set   Dice   Sen.   Spec.  Sα     Eφmean  MAE
U-Net            0+60           0.632  0.653  0.917  0.762  0.880   0.083
                 400+60         0.651  0.702  0.937  0.765  0.892   0.076
                 800+60         0.675  0.693  0.930  0.764  0.894   0.078
                 1200+60        0.682  0.687  0.949  0.763  0.892   0.078
                 1600+60        0.691  0.720  0.965  0.775  0.891   0.076
Few-shot U-Net   0+60           0.645  0.659  0.921  0.759  0.881   0.079
                 400+60         0.662  0.673  0.934  0.769  0.899   0.078
                 800+60         0.683  0.695  0.945  0.772  0.901   0.075
                 1200+60        0.701  0.713  0.946  0.762  0.898   0.076
                 1600+60        0.717  0.720  0.952  0.775  0.904   0.073
SG-One           0+60           0.594  0.589  0.882  0.643  0.793   0.129
                 400+60         0.611  0.603  0.890  0.658  0.812   0.114
                 800+60         0.625  0.616  0.913  0.671  0.824   0.098
                 1200+60        0.632  0.639  0.927  0.685  0.821   0.087
                 1600+60        0.640  0.635  0.936  0.693  0.834   0.079
U-Net++          0+60           0.607  0.692  0.898  0.681  0.767   0.125
                 400+60         0.603  0.642  0.894  0.633  0.726   0.145
                 800+60         0.611  0.618  0.940  0.685  0.827   0.090
                 1200+60        0.625  0.640  0.930  0.702  0.830   0.098
                 1600+60        0.623  0.638  0.935  0.693  0.835   0.076
Inf-Net          0+60           0.699  0.715  0.928  0.782  0.862   0.078
                 400+60         0.727  0.721  0.960  0.793  0.901   0.065
                 800+60         0.739  0.716  0.961  0.781  0.896   0.064
                 1200+60        0.741  0.728  0.960  0.788  0.906   0.064
                 1600+60        0.743  0.739  0.963  0.779  0.908   0.062
GFNet (ours)     0+60           0.739  0.731  0.944  0.767  0.919   0.064
                 400+60         0.745  0.725  0.959  0.768  0.921   0.061
                 800+60         0.748  0.724  0.964  0.766  0.921   0.061
                 1200+60        0.750  0.721  0.966  0.764  0.922   0.062
                 1600+60        0.755  0.729  0.966  0.776  0.926   0.059

Training set size = pseudo-label pre-training + ground-truth fine-tuning.
Table 4

Quantitative results of segmentation of infected region on the dataset of “COVID-CS” [30]. The best segmentation results are shown in bold.

Method           Dice   Sen.   Spec.  Sα     Eφmean  MAE
U-Net            0.602  0.665  0.964  0.719  0.883   0.050
Few-shot U-Net   0.611  0.668  0.964  0.725  0.870   0.051
SG-One           0.596  0.581  0.979  0.721  0.892   0.056
U-Net++          0.485  0.617  0.946  0.650  0.801   0.077
Inf-Net          0.615  0.565  0.963  0.732  0.849   0.038
GFNet (ours)     0.663  0.605  0.981  0.743  0.895   0.033
In addition, we report the training and computational cost of each model in Table 5. In real-world applications, a deep learning model's accuracy matters, but so do its training cost, test speed, and related properties. As the table shows, on a training set of 1600 images our GFNet trains slightly slower than U-Net and Inf-Net, owing to its relative complexity. It is worth noting, however, that GFNet achieves good performance with only a small training set, which greatly reduces training time and gives it considerable practical potential. U-Net++ has a very long training time because, to avoid exhausting GPU memory on the same GPU, we had to use a smaller batch size. For test speed, we measured the time taken on a test set of 20 images; GFNet is second only to U-Net by a small margin. In summary, for real-world decision making, GFNet is fully capable of such tasks when deployed under time constraints or with few training samples; given sufficient time or a large training set, it can be trained further to achieve the best results.
Table 5

The table shows the size of each model and the computational cost for training and testing. The training time was conducted on the training set of 1600 images. The testing time was conducted on a test set of 20 images.

Method           Backbone  Params  FLOPs    Training time  Testing speed
U-Net            VGG16     7.9M    38.116G  7.2 h          1.51 s
U-Net++          VGG16     9.2M    65.938G  26.7 h         3.93 s
SG-One           VGG16     19.0M   45.916G  7.3 h          1.77 s
Inf-Net          Res2Net   33.1M   13.922G  5.3 h          1.92 s
GFNet (ours)     VGG16     18.1M   50.827G  7.5 h          1.63 s
GFNet(6) (ours)  VGG16     27.4M   49.537G  7.1 h          1.69 s
GFNet(4) (ours)  VGG16     10.9M   59.391G  9.8 h          1.55 s
Finally, we examine the effect of the number of parameters on GFNet's performance. Specifically, we changed the parameter count by adding or removing one layer from the original five-layer GFNet, yielding GFNet(6) and GFNet(4). Their detailed results are shown in Table 6. When we add or remove layers, performance degrades. In general, more parameters tend to bring better performance, yet GFNet(6) performs slightly worse. We suspect this is because the boundary details of COVID-19 infections are already very difficult to extract: as the number of layers increases, many details are lost, and for deeper structures simply stacking layers leads to network degradation [40]. Lastly, the training and test times of GFNet(6) and GFNet(4) are also listed in Table 5.
Table 6

Performance of three GFNet variants with different numbers of parameters under the “1600+60” training setting.

Method    Dice   Sen.   Spec.  Sα     Eφmean  MAE
GFNet(6)  0.741  0.713  0.961  0.758  0.912   0.064
GFNet(4)  0.723  0.700  0.958  0.747  0.910   0.067
GFNet     0.755  0.729  0.966  0.776  0.926   0.059
The segmentation results on the two lung-infection datasets are shown in Figs. 8 and 9. It is easy to see that GFNet is significantly better than the baselines. Specifically, its segmentation results are close to the ground-truth maps, with fewer tissue regions wrongly segmented. In contrast, the results of U-Net and U-Net++ are unsatisfactory, as some of them leave large amounts of tissue incorrectly segmented. Inf-Net is much better than both, but many boundary regions are still blurred, whereas GFNet delineates the infected regions with very clear and precise boundaries.
Fig. 8

Comparison of visual effects of segmentation of infected regions on the “COVID-19 CT Segmentation” dataset [29].

Fig. 9

Comparison of visual effects of segmentation of infected regions on the “COVID-CS” dataset [30].


Ablation experiments

In this section, we conduct experiments to verify the core modules and steps of GFNet (the Edge-guidance module, and the number of RA-module applications, i.e., the best value of n with n = 1, 2, 3, 4, 5), as shown in Table 7. The best result for each metric is shown in bold. (The results of the fusion module are given in Fig. 7.)
Table 7

Ablation experiments of GFNet. The best segmentation results are shown in bold.

No.  Backbone (VGG16)  Eg  S5  S4  S3  S2  S1  Dice   Sen.   Spec.  Sα     Eφmean  MAE
1    ✓                                         0.601  0.612  0.837  0.695  0.720   0.151
2    ✓                 ✓                       0.643  0.654  0.863  0.725  0.759   0.127
3    ✓                     ✓                   0.639  0.635  0.878  0.734  0.792   0.085
4    ✓                 ✓   ✓                   0.681  0.667  0.921  0.757  0.856   0.076
5    ✓                 ✓   ✓   ✓               0.710  0.672  0.944  0.776  0.872   0.069
6    ✓                 ✓   ✓   ✓   ✓           0.725  0.698  0.951  0.791  0.890   0.065
7    ✓                 ✓   ✓   ✓   ✓   ✓       0.737  0.715  0.962  0.792  0.899   0.063
8    ✓                 ✓   ✓   ✓   ✓   ✓   ✓   0.748  0.714  0.965  0.798  0.915   0.061

Effectiveness of edge-guidance module

Comparing No. 1 with No. 2, and No. 3 with No. 4, adding the Edge-guidance module significantly improves most metrics.

Effectiveness of RA module

Comparing No. 1 with No. 3, adding the RA module greatly improves most metrics.

Effectiveness of the combination of edge-guidance module and RA module

Comparing No. 3 with No. 4, combining the Eg and RA modules improves the metrics further.

Usage times of RA module

In [17], the RA module was used only three times, which we consider insufficiently rigorous: the RA module fuses high-level information with low-level information, so as features propagate from the deepest layer back to the shallowest, the information at every layer can be utilized. From No. 4 to No. 8, we can see that the results improve steadily as more RA modules are used.
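The reverse-attention idea borrowed from [17] can be sketched as follows (shapes and names are simplified assumptions): the higher-level prediction, upsampled to the current resolution, is inverted through a sigmoid so that confidently segmented regions are erased and the current layer attends to residual boundary details:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(features, coarse_pred):
    """Erase regions the higher-level prediction is already confident
    about, so this layer focuses on residual (often boundary) detail.
    `coarse_pred` is assumed to hold the higher-level prediction
    logits, already upsampled to the spatial size of `features`."""
    weight = 1.0 - sigmoid(coarse_pred)  # ~0 where confidently foreground
    return features * weight

feats = np.ones((2, 2))                         # dummy feature map
coarse = np.array([[10.0, -10.0], [0.0, 0.0]])  # logits: fg, bg, uncertain
out = reverse_attention(feats, coarse)
# the confident-foreground pixel is suppressed; uncertain pixels keep ~0.5 weight
```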

Deficiencies and future prospects

Although our GFNet significantly outperforms traditional methods on both datasets, some deficiencies remain. For example, GFNet is not the fastest in training or testing speed, and we will continue to improve it in this respect. In the future, we will combine GFNet with more advanced models and improve existing components such as the Edge-guidance module and the final fusion module.

Conclusion

In this paper, we propose GFNet, a segmentation network framework for COVID-19 lung CT infection regions. An edge-guidance module is added on top of the reverse attention module and the aggregation module, helping the model effectively capture boundary information along fuzzy lesion boundaries and thus produce very accurate boundary segmentation. Based on the idea of feature fusion, the obtained features are then fused so that the model produces an optimal output. Meanwhile, we investigated how many times the RA module should be applied: Inf-Net uses the RA module in three deep convolution layers, and we showed that results improve when the RA module is applied at every layer. To address the small size of the dataset, we use a semi-supervised approach to extend the training data, both to train GFNet and to prevent overfitting. On the two available datasets, we verify GFNet's learning ability and generalization ability, respectively, and the experimental results show that GFNet outperforms previous models. The qualitative results also show that our segmentation results are more accurate and clearer than those of other models. We further trained each model on datasets of different sizes: for the other models, enlarging the training set improves performance, whereas GFNet already works well with a small training set. This is because the “Edge-guidance” module in GFNet has strong robustness and universality; it can learn the boundary features of the target region quickly and from few samples and propagate them into the segmentation result. GFNet therefore has great potential for segmenting other similar targets with fuzzy boundaries. Finally, we also report the training and computational costs of each model.
In our experiments, our model is not the fastest in training or testing, but it achieves good performance even when the training set is relatively small. For real-world decision making, GFNet is fully capable of such tasks when applied under time constraints or with few training samples, and it can be trained further to reach maximum performance given sufficient time or a large training set. In the future, we will apply the GFNet framework to other medical image segmentation tasks, such as colonoscopic polyps and cells, or to image segmentation in other fields. GFNet thus has great potential to assist healthcare professionals in medical image segmentation.

Declaration of Competing Interest

We declare that we have no financial or personal relationships with other people or organizations that could inappropriately influence our work, and no professional or other personal interest of any nature in any product, service, and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.