Nan Mu, Hongyu Wang, Yu Zhang, Jingfeng Jiang, Jinshan Tang.
Abstract
In this paper, a progressive global perception and local polishing (PCPLP) network is proposed to automatically segment COVID-19-caused pneumonia infections in computed tomography (CT) images. The proposed PCPLP follows an encoder-decoder architecture. In particular, the encoder is implemented as a computationally efficient fully convolutional network (FCN). In this study, a multi-scale multi-level feature recursive aggregation (mmFRA) network is used to integrate multi-scale features (viz. global guidance features and local refinement features) with multi-level features (viz. high-level semantic features, middle-level comprehensive features, and low-level detailed features). Because of this innovative aggregation of features, an edge-preserving segmentation map can be produced in a boundary-aware multiple supervision (BMS) way. Furthermore, both global perception and local perception are devised. On the one hand, a global perception module (GPM), which provides a holistic estimation of potential lung infection regions, captures complementary coarse-structure information from different pyramid levels by enlarging the receptive fields without substantially increasing the computational burden. On the other hand, a local polishing module (LPM), which provides a fine prediction of the segmentation regions, explicitly heightens the fine-detail information and reduces the dilution of boundary knowledge. Comprehensive experimental evaluations demonstrate the effectiveness of the proposed PCPLP in accurately identifying infected lung regions with clear contours. Our model remarkably outperforms state-of-the-art segmentation models, both quantitatively and qualitatively, on a real COVID-19 CT dataset.
Keywords: Coronavirus disease 2019 (COVID-19); Feature recursive aggregation; Global perception; Local polishing; Multiple supervision
Year: 2021 PMID: 34305181 PMCID: PMC8272691 DOI: 10.1016/j.patcog.2021.108168
Source DB: PubMed Journal: Pattern Recognit ISSN: 0031-3203 Impact factor: 7.740
Fig. 1 An illustrative flowchart showing major components of the proposed PCPLP network.
Fig. 2 An overview of the proposed PCPLP network architecture.
A summary of four classic neural network models commonly used as backbones for image segmentation.
| | AlexNet | VGG | GoogLeNet | ResNet |
|---|---|---|---|---|
| Input Size | 227 × 227 | 224 × 224 | 224 × 224 | 224 × 224 |
| Number of Layers | 8 | 19 | 22 | 152 |
| Number of Conv. Layers | 5 | 16 | 21 | 151 |
| Filter Sizes | 3, 5, 11 | 3 | 1, 3, 5, 7 | 1, 3, 5, 7 |
| Strides | 1, 4 | 1 | 1, 2 | 1, 2 |
| Fully Connected Layers | 3 | 3 | 1 | 1 |
| TOP-5 Test Accuracy | 84.6% | 92.7% | 93.3% | 96.4% |
| Contributions | ReLU, Dropout | Small filter kernel | 1 × 1 Conv. | Residual learning |
| Advantages | Increase training speed and prevent overfitting | Stacked small filters add nonlinearity; suitable for parallel acceleration | Reduce the amount of computation | Overcome gradient vanishing |
| Disadvantages | Low accuracy | Small receptive field | Overfitting, vanishing gradient | Many parameters, long training time |
Details of Block1~Block5, which extract image features in a multi-level pyramid scheme.
| Block | Layer | Filter Size/Channels | Stride | Padding |
|---|---|---|---|---|
| Block1 | Conv1-1 | 3 × 3/64 | 1 | Yes |
| Block1 | Conv1-2 | 3 × 3/64 | 1 | Yes |
| Block1 | Maxpool | 2 × 2/64 | 2 | No |
| Block2 | Conv2-1 | 3 × 3/128 | 1 | Yes |
| Block2 | Conv2-2 | 3 × 3/128 | 1 | Yes |
| Block2 | Maxpool | 2 × 2/128 | 2 | No |
| Block3 | Conv3-1 | 3 × 3/256 | 1 | Yes |
| Block3 | Conv3-2 | 3 × 3/256 | 1 | Yes |
| Block3 | Conv3-3 | 3 × 3/256 | 1 | Yes |
| Block3 | Maxpool | 2 × 2/256 | 2 | No |
| Block4 | Conv4-1 | 3 × 3/512 | 1 | Yes |
| Block4 | Conv4-2 | 3 × 3/512 | 1 | Yes |
| Block4 | Conv4-3 | 3 × 3/512 | 1 | Yes |
| Block4 | Maxpool | 2 × 2/512 | 2 | No |
| Block5 | Conv5-1 | 3 × 3/512 | 1 | Yes |
| Block5 | Conv5-2 | 3 × 3/512 | 1 | Yes |
| Block5 | Conv5-3 | 3 × 3/512 | 1 | Yes |
| Block5 | Maxpool | 2 × 2/512 | 2 | No |
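The five blocks above follow a VGG-16-style feature-extraction layout: 3 × 3 convolutions with "same" padding keep the spatial size, while each 2 × 2 max pooling with stride 2 halves it. The following sketch traces the feature-map shape through the table as a sanity check; the 224 × 224 input size and the function itself are illustrative assumptions, not taken from the paper.

```python
# Trace (height, width, channels) through Block1-Block5 of the table.
# Stride-1 "same"-padded 3x3 convs leave the spatial size unchanged;
# each 2x2 maxpool with stride 2 halves it.

BLOCKS = [  # (number of 3x3 conv layers, output channels) per block
    (2, 64), (2, 128), (3, 256), (3, 512), (3, 512),
]

def trace_backbone(h, w):
    """Return the feature-map shape after each block (convs + maxpool)."""
    shapes = []
    for n_convs, channels in BLOCKS:
        h, w = h // 2, w // 2  # effect of the block's 2x2/stride-2 maxpool
        shapes.append((h, w, channels))
    return shapes

print(trace_backbone(224, 224))
# -> [(112, 112, 64), (56, 56, 128), (28, 28, 256), (14, 14, 512), (7, 7, 512)]
```

This confirms the pyramid structure: five levels whose resolution drops by a factor of two per level while the channel count grows toward 512.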
Details of the proposed encoder-decoder convolutional network.
| Module | Block | Layer | Filter Size/Channels | Stride | Padding |
|---|---|---|---|---|---|
| GPM | GPMleft | Conv | 7 × 1/128 | 1 | Yes |
| | | Conv | 1 × 7/256 | 1 | Yes |
| | GPMright | Conv | 1 × 7/128 | 1 | Yes |
| | | Conv | 7 × 1/256 | 1 | Yes |
| | LPM | Conv | 3 × 3/256 | 1 | Yes |
| | | ReLU | | | |
| | | Conv | 3 × 3/256 | 1 | Yes |
| | Conv | Conv | 3 × 3/128 | 1 | Yes |
| | | ReLU | | | |
| | LPM | Conv | 3 × 3/128 | 1 | Yes |
| | | ReLU | | | |
| | | Conv | 3 × 3/128 | 1 | Yes |
| | Contrast Layer | avg_pool | 3 × 3/128 | 1 | No |
| | DeConv | DeConv | 3 × 3/384 | 2 | Yes |
| | | ReLU | | | |
| | LPM | Conv | 3 × 3/384 | 1 | Yes |
| | | ReLU | | | |
| | | SConv | 3 × 3/384 | 1 | Yes |
| | | ReLU | | | |
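The GPM branches in the table use factorized 7 × 1 and 1 × 7 convolutions, which is how the abstract's claim of "enlarging the receptive fields without substantially increasing the computational burden" is realized: a 7 × 1 kernel followed by a 1 × 7 kernel covers the same 7 × 7 receptive field as a full 7 × 7 convolution with far fewer weights. A rough parameter count, under an assumed 512-channel input (the in-channel count feeding the GPM is not given in the table):

```python
def conv_params(kh, kw, c_in, c_out, bias=True):
    """Weight count of a single 2-D convolution layer."""
    return kh * kw * c_in * c_out + (c_out if bias else 0)

c_in = 512  # assumed number of channels entering the GPM

# GPMleft per the table: 7x1 conv to 128 channels, then 1x7 conv to 256
factorized = conv_params(7, 1, c_in, 128) + conv_params(1, 7, 128, 256)

# A plain 7x7 convolution with the same receptive field and output channels
full = conv_params(7, 7, c_in, 256)

print(factorized, full)  # the factorized branch is roughly 9x smaller
```

How the two branches (and the full 7 × 7 baseline, which the paper does not use) are fused afterward is not specified in this table; the comparison above only illustrates the parameter savings of kernel factorization.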
Fig. 3 A schematic diagram showing the structure of the GPM module.
Fig. 4 A schematic diagram showing the structure of the LPM.
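The decoder table also lists a "Contrast Layer" built from 3 × 3 average pooling with no padding. One common reading of such a layer, assumed here rather than taken from the paper, is a local-contrast feature: the response at each position minus its local average, which emphasizes boundaries. A minimal NumPy sketch of that reading:

```python
import numpy as np

def avg_pool_3x3(x):
    """3x3 average pooling, stride 1, no padding (per the table's contrast layer)."""
    h, w = x.shape
    out = np.empty((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = x[i:i + 3, j:j + 3].mean()
    return out

def contrast_feature(x):
    """Local contrast: each (valid) position minus its 3x3 neighborhood mean.
    This is an assumed formulation; the paper's exact definition may differ."""
    return x[1:-1, 1:-1] - avg_pool_3x3(x)

x = np.arange(25, dtype=float).reshape(5, 5)
print(contrast_feature(x))  # a linear ramp has zero local contrast everywhere
```

A flat or linearly varying region yields zero contrast, while edges and small structures produce large responses, which is consistent with the LPM's goal of heightening fine boundary detail.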
The 12 metrics for evaluating the performance of various segmentation models.
| Metric | Formula | Description |
|---|---|---|
| TPR / FPR | TPR = TP/(TP + FN); FPR = FP/(FP + TN) | TPR measures the proportion of actual positives correctly identified, while FPR measures the proportion of actual negatives incorrectly identified as positive |
| PR curve | precision vs. recall over varying thresholds | The PR curve mainly evaluates the comprehensiveness of the detected lung infection pixels |
| F-measure | (1 + β²) · Precision · Recall / (β² · Precision + Recall) | The F-measure is the weighted harmonic mean of precision and recall |
| DICE score | 2TP/(2TP + FP + FN) | The DICE score measures the similarity between the predicted map and the ground truth |
| Sensitivity score | TP/(TP + FN) | Sensitivity measures the proportion of true infection pixels that are detected; a low value indicates missed detections |
| Specificity score | TN/(TN + FP) | Specificity measures the proportion of non-infection pixels correctly rejected; a low value indicates false detections |
| MAE score | mean absolute difference between the segmentation map and the ground truth | The MAE score indicates the pixel-wise discrepancy between the segmentation map and the ground truth (lower is better) |
| AUC score | area under the ROC curve | The AUC score gives an intuitive indication of how well the segmentation map predicts the true lung infection regions |
| WP and WR | | WP and WR measure the exactness and completeness, respectively |
| OR score | | The OR score measures the completeness of lung infection pixels and the correctness of non-lung-infection pixels |
| S-measure | | The S-measure quantifies the structural similarity between the segmentation map and the ground truth |
| Execution time | average execution time per image (in seconds) | All experiments were performed with the same equipment and settings |
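The pixel-level metrics in the table follow directly from the confusion matrix of the binarized prediction against the ground truth. A minimal NumPy sketch (function and variable names are ours, not the paper's):

```python
import numpy as np

def segmentation_metrics(pred, gt):
    """DICE, sensitivity, specificity and MAE for binary masks.
    `pred` and `gt` are boolean arrays of the same shape."""
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return {
        "dice": 2 * tp / (2 * tp + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "mae": np.abs(pred.astype(float) - gt.astype(float)).mean(),
    }

# Toy example: a 4x4 image with 4 true infection pixels, 6 predicted
gt = np.zeros((4, 4), dtype=bool); gt[:2, :2] = True
pred = np.zeros((4, 4), dtype=bool); pred[:2, :3] = True
m = segmentation_metrics(pred, gt)
print(m)  # dice 0.8, sensitivity 1.0, specificity 10/12, mae 0.125
```

Thresholding a soft segmentation map at multiple levels before calling this function yields the points of the PR and ROC curves (and hence the F-measure and AUC) in the same way.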
Fig. 5 Performance comparisons of the proposed PCPLP framework with other models using the COVID-19 CT dataset [36].
Quantitative performance comparison of nine models in terms of different metrics. The best two results are highlighted in red and blue, respectively. The up-arrow ↑ indicates that a higher value means better segmentation quality, whereas the down-arrow ↓ implies the opposite.
Fig. 6 Visual comparisons of lung infection segmentation using different algorithms. The green and the yellow areas represent the undetected and detected true infection regions, respectively. The red areas indicate the false infection regions that are incorrectly detected. (a) The original CT images from the test set. (b) The corresponding ground truth for each image. (c-j) The corresponding segmentation results from the eight state-of-the-art models. (k) The segmentation maps of the proposed PCPLP.
Fig. 7 Performance comparisons using different variants of the proposed PCPLP model.
Fig. 8 Qualitative performance comparisons using different variants of the proposed PCPLP model.
Quantitative performance comparisons using different variants of the proposed PCPLP model. The best two results are highlighted in red and blue, respectively.