
Dual-branch combination network (DCN): Towards accurate diagnosis and lesion segmentation of COVID-19 using CT images.

Kai Gao¹, Jianpo Su¹, Zhongbiao Jiang², Ling-Li Zeng¹, Zhichao Feng³, Hui Shen⁴, Pengfei Rong⁵, Xin Xu¹, Jian Qin¹, Yuexiang Yang⁶, Wei Wang³, Dewen Hu¹.

Abstract

The recent global outbreak and spread of coronavirus disease (COVID-19) makes it an imperative to develop accurate and efficient diagnostic tools for the disease as medical resources are getting increasingly constrained. Artificial intelligence (AI)-aided tools have exhibited desirable potential; for example, chest computed tomography (CT) has been demonstrated to play a major role in the diagnosis and evaluation of COVID-19. However, developing a CT-based AI diagnostic system for the disease detection has faced considerable challenges, which is mainly due to the lack of adequate manually-delineated samples for training, as well as the requirement of sufficient sensitivity to subtle lesions in the early infection stages. In this study, we developed a dual-branch combination network (DCN) for COVID-19 diagnosis that can simultaneously achieve individual-level classification and lesion segmentation. To focus the classification branch more intensively on the lesion areas, a novel lesion attention module was developed to integrate the intermediate segmentation results. Furthermore, to manage the potential influence of different imaging parameters from individual facilities, a slice probability mapping method was proposed to learn the transformation from slice-level to individual-level classification. We conducted experiments on a large dataset of 1202 subjects from ten institutes in China. The results demonstrated that 1) the proposed DCN attained a classification accuracy of 96.74% on the internal dataset and 92.87% on the external validation dataset, thereby outperforming other models; 2) DCN obtained comparable performance with fewer samples and exhibited higher sensitivity, especially in subtle lesion detection; and 3) DCN provided good interpretability on the loci of infection compared to other deep models due to its classification guided by high-level semantic information. 
An online CT-based diagnostic platform for COVID-19 derived from our proposed framework is now available.


Keywords: Attention; COVID-19; CT image; Combined segmentation and classification


Year: 2020 PMID： 33129141 PMCID： PMC7543739 DOI： 10.1016/j.media.2020.101836

Source DB: PubMed Journal: Med Image Anal ISSN： 1361-8415 Impact factor: 8.545

Introduction

There has been a global outbreak and rapid spread of coronavirus disease (COVID-19) since the beginning of 2020. On March 11, 2020, the disease was declared a pandemic by the World Health Organization (WHO) (Roosa et al., 2020; Yan et al., 2020). According to real-time data published by WHO, more than 19 million people had been infected by the disease as of August 8, 2020, and over 716,000 had died from it. The epidemic has become a severe challenge to the global population, so accurate and efficient diagnosis of the disease is imperative. The reverse transcription-polymerase chain reaction (RT-PCR) test is regarded as the gold standard for COVID-19 diagnosis, but it is time-consuming and suffers from high false-negative rates (Ai et al., 2020; Chan et al., 2020; Fang et al., 2020). As a supplement, the chest computed tomography (CT) scan is more sensitive and efficient for COVID-19 diagnosis in practice and has been widely applied for early screening of the disease (Ai et al., 2020). Previous studies have shown that lesion size and severity can also be evaluated from chest CT images to facilitate the assessment of disease progression and subsequent treatment (Shi et al., 2020b). Thus, CT has been recognized as a COVID-19 diagnostic criterion in the Chinese “COVID-19 treatment plan (trial version 7)” (Chung et al., 2020; Huang et al., 2020a). However, manual evaluation of CT images typically takes several hours, which is not acceptable for COVID-19 clinical diagnosis given the efficiency demands of numerous suspected and confirmed cases. Therefore, it is critical to develop an AI-aided CT diagnostic system for rapid diagnosis and accurate evaluation of COVID-19 cases. The past decade has witnessed the emergence of deep learning, which has proven relatively superior in computer vision and pattern recognition (LeCun et al., 2015). 
Classification models, such as AlexNet (Krizhevsky et al., 2012) and VGGNet (Simonyan and Zisserman, 2014), used a series of cascaded convolutional modules to extract features for image classification. ResNet (He et al., 2016) introduced shortcuts to convolutional neural networks (CNNs) and mitigated the vanishing gradient problem. DenseNet (Huang et al., 2017) utilized skip connections between every two layers and replaced summation with a concatenation operation for easier information flow. In the field of image segmentation, Long et al. used a fully convolutional network to segment images and pioneered the application of deep learning in image segmentation tasks (Long et al., 2015). Several deep segmentation networks, such as DeepLab (Chen et al., 2018), PSPNet (Zhao et al., 2017b), and U-net (Ronneberger et al., 2015), were subsequently proposed and further improved image segmentation performance. Among them, U-net has been widely applied in medical image segmentation because of its simple and easy-to-train structure; hence, we adopted it in this study. Deep learning methods are also widely used in medical image analysis (Chen et al., 2019; Huang et al., 2020b; Lei et al., 2020; Li et al., 2020b; Litjens et al., 2017; Shen et al., 2017). Recently, deep learning has been utilized in COVID-19 diagnosis and evaluation, and the results have been encouraging (Shi et al., 2020a). Several studies utilized end-to-end classification models for COVID-19 diagnosis. For example, Li et al. proposed a three-dimensional (3D) COVID-19 detection neural network (COVNet) to distinguish COVID-19 from community-acquired pneumonia and achieved an area under the curve (AUC) score of 0.96 (Li et al., 2020a). Likewise, a 3D DeCoVNet was proposed for COVID-19 classification and achieved 90.7% sensitivity and 91.1% specificity (Zheng et al., 2020). However, the interpretability of the results of these models was limited, thereby hindering their clinical application. 
In some other studies, lesion segmentation was accomplished first, and classification was performed based on the segmentation results. For instance, Jin et al. proposed a three-stage model with U-net and 3D CNN for the diagnosis and evaluation of COVID-19. The model achieved a Dice similarity coefficient (DSC) of 0.754, sensitivity of 97.2%, and specificity of 92.2% (Jin et al., 2020). Chen et al. used a Nested U-net to delineate the lesions and divided the results into quadrants for individual-level prediction (Chen et al., 2020). The accuracies at the slice level and individual level were 98.85% and 95.24%, respectively. Zhang et al. developed an AI system to differentiate COVID-19 from common pneumonia as well as normal controls and achieved a weighted accuracy of 92.49% (Zhang et al., 2020). The problem with this kind of method is that the classification results are highly dependent on the segmentation performance. Thus, useful information may be excluded from the CT images due to inaccurate segmentation, thereby worsening the classification performance. To date, most of the studies have conducted the classification and segmentation processes separately. In fact, the two tasks can be combined to achieve better performance. Lesions in CT images are decisive in COVID-19 screening, but the lesion size is usually minor in the early stage of the disease and may be neglected by the classification network. However, the intermediate results from the segmentation network may help to focus the classification network more intensively on the lesion foci for accurate diagnosis through an attention mechanism (Fu et al., 2019; Hu et al., 2019; Oktay et al., 2018; Wang et al., 2017; Wang et al., 2018). Moreover, the attention maps can unveil regions that are crucial for classification, thus improving the interpretability of deep learning models and assisting in further assessment by clinicians. 
Hence, improved performance can be achieved by combining the classification and segmentation tasks. In this study, we proposed a combined segmentation–classification framework that simultaneously accomplishes COVID-19 diagnosis and the segmentation of lesions based on chest CT images. A U-net-based lung segmentation was first performed to delineate the lung contours. Then, a proposed dual-branch combination network (DCN) was used to perform slice-level segmentation and classification. We proposed a lesion attention (LA) module in DCN to utilize the intermediate results of both segmentation and classification branches to improve the classification performance. Finally, a slice probability mapping strategy and a fully connected network (FCN) were adopted to obtain individual-level results from slice-level results, adapting our method to CT scans with different slice numbers. We compared the performance of DCN to other models and demonstrated its efficacy in image classification. In addition, we found that the proposed method was more sensitive in classifying images with minor lesions. This is extremely helpful for early COVID-19 diagnosis, as lesions in the early stage are usually subtle and difficult to detect (Macmahon et al., 2017). More precisely, the contributions of this study are summarized as follows: 1) COVID-19 segmentation and classification are simultaneously achieved using the proposed DCN, and a novel weighted Dice loss is proposed to ensure the trainability of the network; 2) the sensitivity to COVID-19 is significantly improved, especially for subtle lesions; and 3) the intermediate attention maps produced by the proposed LA module provide interpretability for the classification.

Methods

Overall framework

The overall framework of the proposed method (Fig. 1(A)) can be divided into three parts. Part 1 is a lung segmentation network based on U-net to extract accurate lung regions. Part 2 is the proposed DCN (Fig. 1(B)), which can accomplish simultaneous slice-level classification and segmentation of CT images with the proposed LA module (Fig. 1(C)). In part 3, the slice results are integrated with a slice probability mapping method to obtain the classification results at the individual level with a three-layer fully connected network.

Fig. 1

A: The overall framework of our method, which consists of three parts: 1) lung segmentation using U-net; 2) slice-level combined segmentation and classification using the proposed dual-branch combination network (DCN); and 3) individual-level classification with a three-layer fully connected network. A slice probability mapping strategy is utilized to obtain individual-level results, considering inter-subject differences in slice number. B: Details of the DCN. The segmentation branch is based on U-net, and ResNet-50 with four residual blocks is utilized as the backbone of the classification branch. A lesion attention (LA) module is introduced before each residual block to combine classification features with segmentation features of corresponding scale using an attention mechanism. C: Internal structure of the LA module. f1, f2 represent rectified linear unit (ReLU) and sigmoid activation, respectively, and f3 is a series of convolution, batch normalization, and ReLU operations.

Lung segmentation

The images require preprocessing to eliminate interference and obtain the region of interest, that is, the lung. Thresholding methods based on Hounsfield unit (HU) values are widely used for chest CT image preprocessing (Armato III and Sensakovic, 2004). However, these thresholding methods are not accurate enough in practice, especially for CT images of patients with COVID-19. A possible explanation is that the HU values of the lesions in patients are relatively high, which makes them difficult to distinguish from other organs by thresholding and affects the subsequent analysis. Therefore, we trained a lung segmentation model based on U-net (Ronneberger et al., 2015) to achieve better lung segmentation results. The lung segmentation model has the same architecture as the segmentation branch of DCN, which is described in Section 2.3.1 (Model structure).

Dual-branch combination network

Model structure

We proposed DCN to accomplish simultaneous classification and segmentation of CT images. The network consists of a classification branch and a segmentation branch, corresponding to the classification and segmentation tasks, respectively. The backbone of the classification branch is ResNet-50 (Wang et al., 2017), including four residual blocks. The backbone of the segmentation branch is U-net and comprises an encoder and a decoder. The five blocks of the encoder consist of 64, 128, 256, 512, and 1024 channels, respectively. Four 2 × 2 max-pooling layers and four 2 × 2 up-sample layers are used for down-sampling and up-sampling. Each convolution block consists of a 3 × 3 convolution (Conv) layer, a batch normalization (BN) layer (Ioffe and Szegedy, 2015), a rectified linear unit (ReLU) (Nair and Hinton, 2010), and a second 3 × 3 Conv layer. The outputs of the encoding blocks are concatenated with the corresponding decoding blocks using skip connections (Huang et al., 2017). The intermediate results of the two branches are combined with the proposed LA modules. Backpropagation between the two branches is cut off to ensure the trainability of the model. DCN receives the segmented lung images obtained from Section 2.2 (Lung segmentation) as inputs, and outputs the slice-level classification and segmentation results.
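As a small worked example of the encoder arithmetic described above, the sketch below lists the feature-map shape after each of the five encoder blocks. It assumes a 512 × 512 input (the acquisition matrix reported for the internal scans) and padded 3 × 3 convolutions that preserve spatial size; neither assumption is confirmed by the text, and the function name is hypothetical.

```python
def unet_encoder_shapes(height, width, channels=(64, 128, 256, 512, 1024)):
    """Return (channels, height, width) after each encoder block."""
    shapes = []
    for index, n_channels in enumerate(channels):
        # Assumption: padded 3x3 convolutions keep the spatial size unchanged.
        shapes.append((n_channels, height, width))
        if index < len(channels) - 1:
            # A 2x2 max-pooling layer halves each spatial dimension.
            height, width = height // 2, width // 2
    return shapes


if __name__ == "__main__":
    for shape in unet_encoder_shapes(512, 512):
        print(shape)
```

Under these assumptions the four pooling steps reduce a 512 × 512 slice to 32 × 32 at the 1024-channel bottleneck.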

Lesion attention module

To better integrate the information of the two branches and improve the classification performance, we proposed the LA module. The inputs of the LA module contain two parts: $x_{cls}$ from the classification branch and $x_{seg}$ from the segmentation branch. The attention mechanism is utilized to focus the classification branch more on lesions. In the formulation of the LA module, $\oplus$ is the channel-level concatenation; $W_{cls}$, $W_{seg}$, and $W_{att}$ are the weights of 1 × 1 Conv layers; $b_{cls}$, $b_{seg}$, and $b_{att}$ are the corresponding biases; $F_{cls}$ and $F_{seg}$ refer to the input channel sizes of the classification and segmentation branches, respectively; and $F_{int}$ represents the output channel size of the corresponding Conv layers. Functions $f_1$ and $f_2$ correspond to the ReLU and sigmoid activation functions, respectively. The attention map is then normalized to [0, 1]. The final output of the LA module is obtained by applying $f_3$, which comprises a series of units including two 1 × 1 Conv layers, BN, and a ReLU.
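As a reading aid, one plausible form of the LA module is sketched below. It is only a guess that matches the roles described in the text (two branch inputs, three 1 × 1 Conv layers with biases, channel-level concatenation, ReLU, sigmoid, normalization to [0, 1], and a final Conv-BN-ReLU stage) and follows the additive attention gate of Oktay et al. (2018), which the paper cites. All subscripts, the min-max normalization, and the element-wise product $\odot$ are assumptions introduced here, not the authors' confirmed notation.

```latex
% Hypothetical reconstruction; every symbol name below is an assumption.
\begin{aligned}
q &= f_1\!\big( (W_{cls}^{\top} x_{cls} + b_{cls}) \oplus (W_{seg}^{\top} x_{seg} + b_{seg}) \big),
  && W_{cls} \in \mathbb{R}^{F_{cls} \times F_{int}},\; W_{seg} \in \mathbb{R}^{F_{seg} \times F_{int}} \\
\alpha &= f_2\!\big( W_{att}^{\top} q + b_{att} \big),
  && W_{att} \in \mathbb{R}^{2F_{int} \times 1} \\
\hat{\alpha} &= \frac{\alpha - \min(\alpha)}{\max(\alpha) - \min(\alpha)} \\
x_{out} &= f_3\!\big( \hat{\alpha} \odot x_{cls} \big)
\end{aligned}
```

In this reading, $\hat{\alpha}$ is a single-channel spatial map that rescales the classification features so that locations the segmentation branch considers lesion-like contribute more to the next residual block.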

Slice probability mapping

DCN handles the classification of each slice. We then need to combine the slice results to achieve individual-level classification and determine whether the subject is infected with COVID-19. However, the number of slices varies between subjects owing to differences in slice thickness, field of view, and lung volume. Some studies utilized max-pooling or average-pooling on fully connected layers to eliminate the effects of this problem (Li et al., 2020a). However, this may lead to loss of information, as the approach only keeps the maximum or average signal over all slices. To maximize the information from each slice, we proposed a slice probability mapping strategy based on resampling. Specifically, we sorted the slice results (that is, the probability of being infected) in descending order and fitted the curve with a bilinear interpolation approach (Li and Orchard, 2001). We then sampled 100 values from the curve at equal intervals and obtained consecutive probabilities in descending order. A simple three-layer FCN was then applied to the classification of individuals with the derived 100 values as input. The numbers of nodes in the two hidden layers are 256 and 128, respectively.
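The resampling step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it treats the interpolation of the sorted one-dimensional probability curve as piecewise-linear interpolation, which is an assumption, and the function name and the `n_points` parameter are hypothetical.

```python
def slice_probability_mapping(slice_probs, n_points=100):
    """Resample a variable-length list of slice probabilities to n_points values."""
    if not slice_probs:
        raise ValueError("need at least one slice probability")
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    # Sort in descending order so the curve is non-increasing.
    ordered = sorted(slice_probs, reverse=True)
    if len(ordered) == 1:
        return [ordered[0]] * n_points
    last = len(ordered) - 1
    resampled = []
    for k in range(n_points):
        # Position of the k-th sample on the original index axis [0, last].
        pos = k * last / (n_points - 1)
        lower = int(pos)
        upper = min(lower + 1, last)
        frac = pos - lower
        # Linear interpolation between the two neighbouring sorted values.
        resampled.append(ordered[lower] * (1 - frac) + ordered[upper] * frac)
    return resampled


if __name__ == "__main__":
    features = slice_probability_mapping([0.1, 0.9, 0.5, 0.7])
    print(len(features), features[0], features[-1])
```

Whatever the number of slices in a scan, the output always has 100 values, so it can be fed to a fully connected network with a fixed input size.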

Loss function

The proposed DCN is a slice-level end-to-end network composed of a classification branch and a segmentation branch. Its loss function also comprises two parts: classification and segmentation losses. Similar to ResNet, we used cross-entropy loss (Zhao et al., 2017a) for the slice-level classification:

$$L_{cls} = -\left[ y \log \hat{y} + (1 - y) \log (1 - \hat{y}) \right]$$

where $y$ denotes the true label of the sample, and $\hat{y}$ refers to the predicted label. The original U-net used binary cross-entropy (BCE) loss (Ronneberger et al., 2015; Zhao et al., 2017a), which performed poorly on our dataset. CT images of patients with COVID-19 are extremely imbalanced data for segmentation because the lesion region is usually much smaller than the normal region and background, and BCE loss is not suitable for this circumstance (Milletari et al., 2016; Sudre et al., 2017). To deal with this problem, we used Dice loss (Milletari et al., 2016), which is an objective function that directly optimizes the network on the evaluation metric (Dice similarity coefficient (DSC)). The slice-level Dice loss can be written as:

$$L_{Dice} = 1 - \frac{2|X \cap Y| + s}{|X| + |Y| + s} = 1 - \frac{2\sum_{i} p_i g_i + s}{\sum_{i} p_i + \sum_{i} g_i + s}$$

where $X$ is the ground truth; $Y$ is the predicted result; and $p_i$ and $g_i$ represent the value of the $i$-th pixel of the predicted result and ground truth, respectively. The smooth parameter $s$ was used to prevent division by 0 and was set to 1 in this paper. Samples from normal subjects are necessary to train the classification branch. However, for the segmentation task, images of normal subjects are all negative samples. This can exacerbate the imbalance of samples, which will affect the training of the segmentation branch. To solve the problem, we proposed a novel weighted Dice loss for the segmentation branch:

$$L_{wDice} = w \cdot L_{Dice}$$

where $w$ is the loss weight determined by the label of the sample. The weights of slices with and without annotated lesions are set to 1 and 0, respectively, which means only slices with annotated lesions participate in the backpropagation of the segmentation branch. The total loss function can be written as:

$$L = L_{cls} + \lambda L_{wDice}$$

where $\lambda$ is the trade-off parameter for the two losses, which we set experimentally in this study. We used Dice and BCE losses for the lung segmentation network and FCN, respectively.
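The loss terms described above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: the Dice loss uses the common soft form with smoothing term s = 1, the total loss is taken as the classification loss plus λ times the weighted Dice loss, and the function names and the `lam` argument are hypothetical. The value of λ is not given here, so it is left as a parameter.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Binary cross-entropy for one slice; y_prob is the predicted probability."""
    y_prob = min(max(y_prob, eps), 1.0 - eps)  # avoid log(0)
    return -(y_true * math.log(y_prob) + (1 - y_true) * math.log(1.0 - y_prob))


def dice_loss(pred, target, smooth=1.0):
    """Soft Dice loss for one slice; pred and target are flat lists in [0, 1]."""
    intersection = sum(p * g for p, g in zip(pred, target))
    denominator = sum(pred) + sum(target)
    return 1.0 - (2.0 * intersection + smooth) / (denominator + smooth)


def weighted_dice_loss(pred, target, has_lesion, smooth=1.0):
    """Weight is 1 for slices with annotated lesions and 0 otherwise."""
    weight = 1.0 if has_lesion else 0.0
    return weight * dice_loss(pred, target, smooth)


def total_loss(cls_loss, seg_loss, lam):
    """Assumed combination: classification loss plus lam times segmentation loss."""
    return cls_loss + lam * seg_loss


if __name__ == "__main__":
    prediction = [0.9, 0.1, 0.8, 0.0]
    mask = [1, 0, 1, 0]
    seg = weighted_dice_loss(prediction, mask, has_lesion=True)
    cls = cross_entropy(1, 0.95)
    print(total_loss(cls, seg, lam=1.0))
```

With a weight of 0, slices from normal subjects contribute nothing to the segmentation term, so only the classification branch learns from them.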

Experiments and results

Materials

Subjects

A total of 1918 CT scans from 1202 subjects (704 patients versus 498 controls, 210,395 slices) collected in ten hospitals were included in the study. The data were divided into an internal training set (48 patients versus 75 controls, 6130 slices) from the First Hospital of Yueyang and an external validation set (656 patients versus 423 controls, 204,265 slices) from nine other hospitals. Detailed information on the data sources can be found in Table 1. The internal training set was used for training and testing with a five-fold cross-validation strategy. The external validation set was used to evaluate the generalization performance of the model. All patients were laboratory-confirmed COVID-19 cases by RT-PCR test. The Institutional Review Board of Third Xiangya Hospital approved our study and waived the informed consent of patients based on the retrospective nature of the study. The personal information of the patients was removed in this study.

Table 1

Data composition and sources.

Cohort	COVID-19 patients	COVID-19 scans	COVID-19 slices	Normal controls	Normal scans	Normal slices
Internal	48	48	2371	75	75	3759
External	656	1372	166937	423	423	37328
Independent cohorts
Yueyang	48	48	2371	75	75	3759
Changsha 1	39	110	8976	423	423	37328
Changsha 2	201	578	46898	-	-	-
Wuhan	190	190	12199	-	-	-
Changde	76	133	50668	-	-	-
Xiangtan	39	106	24270	-	-	-
Shaoyang	62	144	8085	-	-	-
Hengyang	11	35	11704	-	-	-
Loudi	32	70	3966	-	-	-
Yiyang	6	6	171	-	-	-


Image acquisition and preprocessing

All subjects in the internal training set underwent a thick-section CT scan (Anke ANATOM 16 HD, First Hospital of Yueyang, China). The CT protocol was as follows: tube voltage, 120 kV; automatic tube current, 120 mA–240 mA; iterative reconstruction; 64 mm detector; slice thickness, 5 mm–6 mm; pitch, 1; matrix, 512 × 512; field of view, 360 × 360; and breath-hold at full inspiration. The scan parameters of the external validation set can be found in Table S1. DICOM files were converted into images using the Pydicom toolkit (Mason, 2011). The pixel values of the images represent HU values within the window of -900 HU to 100 HU. They were further normalized into 8-bit grayscale (0–255).
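The windowing and 8-bit normalization can be illustrated with a short sketch. It assumes the pixel values have already been converted to HU and applies a linear mapping of the [-900, 100] HU window onto 0–255 with rounding; the rounding rule and the function name are assumptions, and reading the DICOM files themselves is omitted.

```python
def hu_to_uint8(hu_values, hu_min=-900.0, hu_max=100.0):
    """Clip HU values to a window and rescale them linearly to integers in 0..255."""
    span = hu_max - hu_min
    scaled = []
    for hu in hu_values:
        # Values outside the window saturate at its edges.
        clipped = min(max(hu, hu_min), hu_max)
        scaled.append(int(round((clipped - hu_min) / span * 255)))
    return scaled


if __name__ == "__main__":
    # Air-like, window floor, mid-window, window ceiling, and bone-like values.
    print(hu_to_uint8([-1000, -900, -400, 100, 500]))
```

Everything at or below -900 HU maps to 0 and everything at or above 100 HU maps to 255, which keeps lung parenchyma and lesions within the usable grayscale range.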

Data annotation

Although the thresholding methods are inaccurate for severely infected lungs, they can still reduce the burden of manual annotation when combined with manual supervision and correction. We first used a threshold-based lung CT preprocessing approach to extract the lung areas (Armato III and Sensakovic, 2004). A series of morphological operations, such as dilation and erosion, were then performed to obtain better results. We checked each slice and selected slices with a good shape as the ground truth for lung segmentation. Slices with unsatisfactory results were manually re-delineated. Furthermore, we asked six experienced radiologists to annotate the CT images of patients with COVID-19 in the internal dataset at pixel level. In the segmentation task, each pixel was annotated as a lesion of COVID-19 or background (labeled as 1 or 0). A total of 2371 slices from patients were annotated manually, and each slice was annotated by one radiologist. We also asked three radiologists to annotate the same CT images from a subset of patients to compare the segmentation performance of DCN with that of radiologists. For each slice of patients in the classification task, we considered it as a positive sample if lesions were marked by radiologists and set the slice label to 1. Otherwise, we considered the slice as a negative sample and set the label to 0. Slices from healthy controls were labeled as 0. Given the large amount of data in the external dataset and the lack of annotation experts, we did not annotate the external dataset at slice level.

Parameters and metrics

Training details

All training and testing processes were performed using PyTorch (Steiner et al., 2019) on a server with NVIDIA Tesla P100 GPUs. The lung segmentation, DCN, and FCN models were trained separately. The lung segmentation model was trained for 50 epochs with a batch size of 16. Likewise, DCN, VGGNet, ResNets, and DenseNet were trained for 100 epochs with a batch size of 8. The FCN model was trained for 20 epochs with a batch size of 16. All the models were optimized using the Adam optimizer (Kingma and Ba, 2015) with an initial learning rate of 0.001 and a learning rate decay of 0.95 per epoch. Five-fold cross-validation was utilized in the internal training stage. For the external validation stage, the model was pre-trained using all samples of the internal dataset and tested on the external dataset. To deal with the problem of imbalanced data sizes in the training stage, an under-sampling approach (Buda et al., 2018) was adopted for negative samples. Specifically, all positive samples and an equal number of randomly selected negative samples were used for training in an epoch, and negative samples were re-sampled in the next epoch.
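The per-epoch under-sampling and the learning rate schedule can be sketched as below. This is a minimal illustration, not the authors' training code: it assumes the decay of 0.95 per epoch is multiplicative (exponential), and the function names and the sample representation are hypothetical.

```python
import random


def undersample_epoch(positives, negatives, rng):
    """Return all positives plus an equal number of freshly drawn negatives."""
    n_negatives = min(len(positives), len(negatives))
    # Negatives are re-drawn on every call, i.e. once per epoch.
    sampled_negatives = rng.sample(negatives, n_negatives)
    epoch_samples = list(positives) + sampled_negatives
    rng.shuffle(epoch_samples)
    return epoch_samples


def learning_rate(epoch, initial_lr=0.001, decay=0.95):
    """Assumed exponential decay: the rate is multiplied by `decay` each epoch."""
    return initial_lr * decay ** epoch


if __name__ == "__main__":
    rng = random.Random(0)
    positives = [("pos", i) for i in range(5)]
    negatives = [("neg", i) for i in range(20)]
    for epoch in range(3):
        samples = undersample_epoch(positives, negatives, rng)
        print(epoch, len(samples), learning_rate(epoch))
```

Because the negatives are re-drawn each epoch, the model eventually sees most negative slices while every individual epoch stays balanced.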

Evaluation metrics

In this study, we adopted a commonly used metric, DSC, to evaluate segmentation performance; precision and recall were also calculated at a threshold of 0.5:

$$\mathrm{DSC} = \frac{2N_{TP}}{2N_{TP} + N_{FP} + N_{FN}}, \quad \mathrm{Precision} = \frac{N_{TP}}{N_{TP} + N_{FP}}, \quad \mathrm{Recall} = \frac{N_{TP}}{N_{TP} + N_{FN}}$$

where $N$ represents the number of pixels; subscripted T/F means the pixel is correctly/incorrectly predicted; and subscripted P/N refers to whether the pixel is a positive/negative sample. Accuracy (Acc), sensitivity (Sen), and specificity (Spc) were utilized to evaluate the classification performance. Accuracy is used to describe the performance on the whole dataset, whereas sensitivity and specificity represent the classification results for patients and normal controls, respectively:

$$\mathrm{Acc} = \frac{TP + TN}{TP + FP + TN + FN}, \quad \mathrm{Sen} = \frac{TP}{TP + FN}, \quad \mathrm{Spc} = \frac{TN}{TN + FP}$$

where TP, FP, TN, and FN refer to the numbers of true-positive, false-positive, true-negative, and false-negative samples, respectively. The average accuracy (AA) was also introduced to eliminate the interference of data imbalance:

$$\mathrm{AA} = \frac{\mathrm{Sen} + \mathrm{Spc}}{2}$$

The receiver operating characteristic (ROC) curve and AUC were used to evaluate the network segmentation and classification performances.
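The metrics defined in this section reduce to a few ratios of confusion-matrix counts, as the sketch below shows. The function names and the example counts are hypothetical and chosen only for illustration; the formulas follow the standard definitions of DSC, precision, recall, accuracy, sensitivity, and specificity, with average accuracy taken as the mean of sensitivity and specificity. No guard against empty classes is included.

```python
def segmentation_metrics(n_tp, n_fp, n_fn):
    """Pixel-level DSC, precision, and recall from pixel counts."""
    return {
        "DSC": 2 * n_tp / (2 * n_tp + n_fp + n_fn),
        "Precision": n_tp / (n_tp + n_fp),
        "Recall": n_tp / (n_tp + n_fn),
    }


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and average accuracy from sample counts."""
    sen = tp / (tp + fn)
    spc = tn / (tn + fp)
    return {
        "Acc": (tp + tn) / (tp + fp + tn + fn),
        "Sen": sen,
        "Spc": spc,
        # Average accuracy weights both classes equally regardless of class size.
        "AA": (sen + spc) / 2,
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(segmentation_metrics(n_tp=8, n_fp=2, n_fn=2))
    print(classification_metrics(tp=90, fp=20, tn=80, fn=10))
```

The relation between AA, Sen, and Spc can be checked against the reported slice-level DCN results: (89.14 + 98.04) / 2 = 93.59.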

Segmentation results

The DSC of lung segmentation was 99.11% (Table 2). A comparison between manual annotation and U-net-based lung segmentation is shown in Fig. 2(A). The U-net segmentation is highly consistent with the ground truth, which provides a reliable basis for the subsequent analysis.

Table 2

Segmentation performance of lung and lesion on the internal dataset with five-fold cross-validation.

	DSC	Precision	Recall
Lung	99.11%	99.33%	98.89%
Lesion	83.51%	83.46%	83.55%

Fig. 2

A: Manual and AI-based segmentation of lung and lesions in CT images from four patients. B: The left figure shows the segmentation results of three radiologists and the automatic results of DCN. The right figure shows the uncertain region without the consensus of all three radiologists. C: Pixel level ROC curve of DCN and performance of radiologists.

For the segmentation of lesions, we achieved a DSC of 83.51%. The segmentation results are shown in Fig. 2(A). To better evaluate the performance of the proposed segmentation method, a comparison between the proposed DCN and segmentation of three radiologists was performed, and the results are shown in Fig. 2(B) and (C). Annotated lesions without the consensus of all three radiologists are labeled as uncertain regions. A pixel-level ROC curve is shown in Fig. 2(C); our method reached an AUC of 0.964. The results of the three radiologists are also shown in the diagram. The results show that the performance of our method is comparable with an average of three radiologists, which indicates that our segmentation algorithm is comparable to human-level annotation and capable of COVID-19 auxiliary diagnosis.

Classification results

Internal dataset

As we used a slice-based strategy, results were obtained at both the slice level and the individual level. Five other deep learning models (VGG-16, ResNet-34, ResNet-50, ResNet-101, and DenseNet-121) were also used for comparison with DCN, and the other parts of the framework (lung segmentation and FCN) were kept for fair comparisons. The slice-level training and testing performances of fold 1 are shown in Fig. 3. We found that the training process of DCN was more stable than that of the other networks. All five other models suffered from overfitting, as indicated by the large difference between their training and testing performances. With our model, the gap between the training and testing stages was smaller, which means DCN is more resistant to overfitting. This is probably due to the extra input in each LA module from the segmentation branch and the benefits of the attention mechanism.

Fig. 3

Training and testing performance of six models (VGG-16, ResNet-34, ResNet-50, ResNet-101, DenseNet-121, and the proposed DCN) on the internal dataset (fold 1). Solid lines refer to training and testing accuracies, and dashed lines denote corresponding losses.

For all patients and healthy controls, we achieved a slice-level accuracy of 95.99% and an individual-level accuracy of 96.74%, which are significantly higher than the results of other models. The ROC curves are shown in Fig. 4. The proposed DCN also achieved the best performance with a slice-level AUC of 0.9755 and an individual-level AUC of 0.9864. The detailed results are presented in Table 3. We further divided the slices with lesions into six groups (0–1 k, 1–2 k, 2–3 k, 3–4 k, 4–5 k, ≥5 k) according to the number of pixels of the lesion regions and calculated the accuracy of each group. As shown in Fig. 5, the proposed method outperformed other methods in all six groups and significantly improved the classification accuracy of small-lesion slices, which is vital for the early diagnosis of COVID-19.
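The grouping of lesion-bearing slices by lesion size can be sketched as follows. The bin boundaries are treated as half-open intervals of 1000 pixels (so a lesion of exactly 1000 pixels falls in the 1–2 k group), which is an assumption, and the function names and record format are hypothetical.

```python
GROUP_LABELS = ["0-1 k", "1-2 k", "2-3 k", "3-4 k", "4-5 k", ">=5 k"]


def lesion_size_group(lesion_pixels, bin_width=1000):
    """Map a lesion pixel count to one of the six group indices."""
    if lesion_pixels <= 0:
        raise ValueError("slice has no annotated lesion pixels")
    # Everything at or above 5 * bin_width falls into the last group.
    return min(lesion_pixels // bin_width, len(GROUP_LABELS) - 1)


def accuracy_by_group(records):
    """records: iterable of (lesion_pixels, predicted_label) for slices with lesions."""
    correct = [0] * len(GROUP_LABELS)
    total = [0] * len(GROUP_LABELS)
    for lesion_pixels, predicted_label in records:
        group = lesion_size_group(lesion_pixels)
        total[group] += 1
        # Every slice here is truly positive, so a prediction of 1 is correct.
        correct[group] += int(predicted_label == 1)
    return {
        label: (correct[i] / total[i] if total[i] else None)
        for i, label in enumerate(GROUP_LABELS)
    }


if __name__ == "__main__":
    # Hypothetical predictions, for illustration only.
    print(accuracy_by_group([(500, 1), (800, 0), (1500, 1), (7000, 1)]))
```

Since all slices in these groups contain lesions, the per-group accuracy equals the per-group sensitivity.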

Table 3

Classification performance. Slice-level and individual-level results of the internal dataset and individual-level results of the external dataset are illustrated. Slice-level results of the external dataset are unavailable due to the lack of slice-level annotation. Acc, AA, Sen, Spc represent accuracy, average accuracy, sensitivity, and specificity, respectively.

		Slice-level					Individual-level
Cohort	Method	Acc (%)	AA (%)	Sen (%)	Spc (%)	AUC	Acc (%)	AA (%)	Sen (%)	Spc (%)	AUC
Internal validation	VGG16	92.68	86.46	74.89	98.02	0.9392	93.49	93.54	93.75	93.33	0.9422
	ResNet-34	92.25	85.37	72.57	98.17	0.9328	91.87	92.13	89.58	94.67	0.9114
	ResNet-50	93.96	89.15	80.20	98.09	0.9510	94.31	94.58	95.83	93.33	0.9506
	ResNet-101	93.05	86.57	74.51	98.62	0.9499	94.31	94.58	95.83	93.33	0.9294
	DenseNet-121	93.50	88.51	79.20	97.81	0.9472	93.49	93.54	93.75	93.33	0.9467
	DCN (ours)	95.99	93.59	89.14	98.04	0.9755	96.74	96.95	97.91	96.00	0.9864
External validation	VGG16	-	-	-	-	-	87.58	87.87	87.32	88.42	0.9264
	ResNet-34	-	-	-	-	-	90.03	89.47	90.52	88.42	0.9383
	ResNet-50	-	-	-	-	-	90.92	90.14	91.62	88.65	0.9512
	ResNet-101	-	-	-	-	-	90.58	90.74	90.45	91.02	0.9493
	DenseNet-121	-	-	-	-	-	86.41	85.09	87.68	82.51	0.9128
	DCN (ours)	-	-	-	-	-	92.87	92.89	92.86	92.91	0.9771

Fig. 5

Classification results on slices of different lesion sizes. The annotated images were divided into 6 groups according to lesion size, and we calculated the sensitivity of each group.

External dataset

Different CT scanning equipment and parameters may cause variations in CT data. To verify the generalization performance of our method, we tested the model on the external dataset from nine different hospitals scanned with different equipment and parameters. The external dataset included 1795 CT scans from 656 patients and 423 normal controls. The slice thickness varied from 0.6 mm to 10 mm. The models were pre-trained on the internal dataset and tested on the external dataset. The proposed DCN achieved 92.87% accuracy, 92.86% sensitivity, and 92.91% specificity at the individual level, significantly outperforming the other models. The ROC curves are shown in Fig. 4, and the proposed method achieved the best AUC of 0.9771.

Fig. 4

Slice-level and individual-level ROC curves on the internal dataset and individual-level ROC curve on the external dataset.


Training with small samples

Training with small samples was also performed to evaluate the generalization performance of the models. The models were trained with different sample sizes and tested on a balanced dataset with 1000 images. The sensitivity (solid lines) and specificity (dashed lines) are shown in Fig. 6. The specificity of all six models remained at a relatively high level (over 95%), and the increase in model performance was mainly due to the increase in sensitivity; this means that adding training samples enhanced the ability of the networks to detect lesions. The proposed DCN achieved markedly higher sensitivity than the other models when trained on small samples.

Fig. 6

Model performance with small-size training datasets. Solid lines refer to sensitivity on slices with lesions while dashed lines represent specificity on normal slices. The models were tested on a selected dataset with 500 positive samples and 500 negative samples.

Comparison with other COVID-19 studies

To better evaluate our method, we compared it with other methods designed for COVID-19 classification. COVNet (Li et al., 2020a) and 3D-ResNet (Zhang et al., 2020) were implemented on our dataset, and the results are shown in Table 4. The results demonstrate the superiority of our method. We also observed a significant drop in the performance of COVNet and 3D-ResNet on the external dataset. This may be due to data heterogeneity, because the external dataset was scanned using parameters different from those of the internal dataset. In comparison, our DCN is more robust to data heterogeneity.

Table 4

			Slice-level				Individual-level				External validation
Method	LA module	SPM module	Acc (%)	AA (%)	Sen (%)	Spc (%)	Acc (%)	AA (%)	Sen (%)	Spc (%)	Acc (%)	AA (%)	Sen (%)	Spc (%)
COVNet			-	-	-	-	88.61	87.65	83.33	92.00	77.58	79.83	75.72	83.94
3D-ResNet			-	-	-	-	92.68	91.75	87.50	96.00	77.64	81.27	74.64	87.90
DCN(base)			94.40	92.93	90.19	95.67	91.87	91.50	89.58	93.33	77.64	84.85	71.67	98.02
DCN		✓	94.40	92.93	90.19	95.67	94.31	94.58	95.83	93.33	89.44	90.81	88.29	93.33
DCN	✓		95.99	93.59	89.14	98.04	94.31	94.21	94.67	93.75	87.48	89.90	85.47	94.32
DCN	✓	✓	95.99	93.59	89.14	98.04	96.74	96.95	97.91	96.00	92.87	92.89	92.86	92.91

Comparison of DCN with other COVID-19 classification methods. COVNet and 3D-ResNet were tested, and the individual-level results of internal and external datasets were obtained. DCNs with different modules were tested in an ablation study to evaluate the effectiveness of the modules. LA refers to lesion attention; SPM represents slice probability mapping.

Moreover, we conducted an ablation study on DCN to measure the effects of the LA module and slice probability mapping. In the base model of DCN, the LA module was replaced with a 1 × 1 Conv layer, and the slice probability mapping was replaced with max-pooling of the features derived from the last residual block. We observed that the LA module significantly improved the slice-level classification accuracy, which emphasizes the effectiveness of the attention mechanism for COVID-19 classification. Moreover, the slice probability mapping improved the individual-level accuracy, especially for the external dataset, which indicates that slice probability mapping improved the generalization of the model.
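
The accuracy gains attributable to each module can be read directly off Table 4. The short sketch below recomputes them from the accuracy columns of the table; the dictionary keys and the helper name `gain` are illustrative choices, not part of the original work.

```python
# Accuracy (%) copied from Table 4:
# (slice-level, individual-level, external validation)
ACCURACY = {
    "base": (94.40, 91.87, 77.64),
    "SPM": (94.40, 94.31, 89.44),
    "LA": (95.99, 94.31, 87.48),
    "LA+SPM": (95.99, 96.74, 92.87),
}
LEVELS = ("slice", "individual", "external")

def gain(with_module, without_module):
    """Accuracy difference (percentage points) between two table rows."""
    return {
        level: round(a - b, 2)
        for level, a, b in zip(LEVELS, ACCURACY[with_module], ACCURACY[without_module])
    }

# Effect of adding the LA module when SPM is already present.
la_gain = gain("LA+SPM", "SPM")
# Effect of adding SPM when the LA module is already present.
spm_gain = gain("LA+SPM", "LA")

print("LA module gain:", la_gain)
print("SPM gain:", spm_gain)
```

The LA gains computed this way match the 1.59%, 2.43%, and 3.43% figures quoted in the Discussion, and the slice-level accuracy is unchanged by SPM, as the identical slice-level entries in Table 4 indicate.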

Attention maps

To further analyze the proposed DCN, the attention maps derived from the testing stage are shown in Fig. 7, including four patients and four controls. The six rows show the original testing images, the lesion masks, and the attention maps from the four LA modules, respectively. The consistency between lesion masks and attention maps is very high, especially for LA modules 2 and 3. In other words, the module enables the network to focus on areas with lesions. The attention maps reveal the areas emphasized for classification and improve the interpretability of the classification results. We also found that some activated areas in the first and last attention maps were inconsistent with the lesion masks. This is probably because the classification input of the first LA module and the segmentation input of the last LA module come from shallow layers of the network and contain more low-level semantic information. Quantitative analysis was also performed by calculating the DSC of the generated masks. We resized the generated masks to the same size as the input images and calculated the DSC of the resized masks. DSCs of 0.28, 0.56, 0.46, and 0.17 were achieved for the four LA modules, respectively, which is consistent with the analysis above.
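
The quantitative comparison described above (resize the attention-derived mask to the input size, then compute the DSC against the lesion mask) can be sketched as follows. This is a minimal illustration, not the authors' code: the nearest-neighbour resizing, the 0.5 binarization threshold, and the function names are all assumptions.

```python
import numpy as np

def resize_nearest(mask, out_shape):
    """Nearest-neighbour resize of a 2D array to out_shape (assumed method)."""
    rows = (np.arange(out_shape[0]) * mask.shape[0] / out_shape[0]).astype(int)
    cols = (np.arange(out_shape[1]) * mask.shape[1] / out_shape[1]).astype(int)
    return mask[np.ix_(rows, cols)]

def dice(pred, target, eps=1e-8):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.sum(pred & target)
    return 2.0 * intersection / (pred.sum() + target.sum() + eps)

# Toy 4x4 attention map (hypothetical values) with a hot top-left corner.
attention = np.zeros((4, 4))
attention[:2, :2] = 0.9

# Toy 8x8 lesion mask covering rows 0-3, columns 0-1.
lesion = np.zeros((8, 8), dtype=bool)
lesion[:4, :2] = True

# Binarize at an assumed threshold of 0.5, then resize to the image size.
generated = resize_nearest(attention >= 0.5, lesion.shape)
score = dice(generated, lesion)
print(f"DSC = {score:.4f}")
```

In this toy case the resized mask covers 16 pixels, the lesion covers 8, and all 8 lesion pixels overlap, so the DSC is 16/24.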

Fig. 7

Attention maps of internal dataset produced by LA modules. The first row shows the lung segmentation images, and the second row shows the manually annotated lesion masks. Att_maps 1-4 refer to attention maps of LA modules 1-4, respectively.

Online platform

Based on the high accuracy of the proposed method, we built a cloud platform for COVID-19 auxiliary diagnosis and lesion segmentation (http://218.77.58.164:8808/index). The system can process data in batches and provide feedback on the risks of being infected and possible lesion regions in a few seconds. The platform provides COVID-19 diagnostic and segmentation assistance to doctors and others worldwide, thereby relieving their burden and providing support for the global fight against the COVID-19 epidemic.

Discussion

CT imaging has proven to be an effective tool for the diagnosis and quantification of COVID-19, but image reading is time-consuming. AI-based auxiliary diagnoses of CT scans are crucial for the early screening of COVID-19. In this study, we proposed a combined segmentation–classification framework for the segmentation of lesions and diagnosis of COVID-19 based on chest CT images. The method achieved an accuracy of 96.74% and an AUC of 0.9864 on the internal dataset with five-fold cross-validation. The generalization performance of the proposed method was confirmed on a large multi-site external dataset with an accuracy of 92.87%. The experiments demonstrated that DCN outperformed five other commonly used classification models on both internal and external datasets. Furthermore, we compared DCN with two other COVID-19 classification methods, and DCN achieved superior performance. This is probably because we trained the models on a relatively small dataset, and our slice-based method is easier to train than individual-based methods, which require more training data.

The proposed DCN achieved a lung segmentation DSC of 99.11% and a lesion segmentation DSC of 83.51%. Although it is difficult to compare DCN with other COVID-19 segmentation methods due to their different datasets and annotation quality, we compared our results with the segmentation results of radiologists and demonstrated the reliability of our lesion segmentation results.

An LA module was proposed to fuse the intermediate results of the segmentation and classification branches for better performance. The LA module was inspired by the attention mechanism (Fu et al., 2019; Oktay et al., 2018). The intermediate results from the two branches were concatenated to produce the attention maps for image classification. The classification branch could then concentrate more on the infected loci. The ablation study in Section 3.4.4 demonstrated the effectiveness of the LA module, as it improved the accuracy significantly (1.59% for slice level, 2.43% for individual level, and 3.43% for external validation). The high degree of consistency between the manually annotated masks and the attention masks (Fig. 7) verified the effectiveness of the LA module. Based on the accurate attention maps of LA modules 2 and 3, our method can provide good interpretations of the classification results.

Another advantage of our method is its sensitivity in processing images with small lesions. As shown in Fig. 5, the proposed DCN achieved an average improvement in sensitivity of over 20% for images with lesion sizes of less than 1000 pixels, compared with other models. This is mainly due to the attention mechanism provided by the proposed LA module, which allows the network to focus on the infected loci. Considering that lesions are subtle at the early stage of COVID-19, our method is highly applicable to early screening of the disease. Moreover, DCN also performed markedly better when trained on small samples, especially in terms of sensitivity. Thus, DCN would prove valuable in the absence of sufficient samples, such as in the early stage of the COVID-19 epidemic or other similar situations.

We proposed some other techniques in this study to ensure the efficacy of our model. A weighted Dice loss function was proposed to handle the different requirements of the training data and the different optimization goals of the classification and segmentation branches. The loss function also facilitates the training of the segmentation branch by reducing the sample imbalance. The difference in slice numbers caused by the diversity of scanning machines and parameters raised another technical challenge for slice-based methods. Hence, to utilize the information in every slice, we proposed a slice probability mapping strategy, with which we can derive features with the same dimensions in each scan case for subsequent calculations. The slice probability mapping enables the analysis of scans with different slice numbers, thereby facilitating the implementation of our method on diverse datasets. Moreover, the results of the ablation study, especially those on the external dataset, demonstrated the effectiveness of the slice probability mapping.

The proposed DCN has several limitations. First, the precision of the attention masks partly depends on the accuracy of the segmentation branch. The segmentation branch learns from manual annotation (whose quality is not guaranteed), and inconsistencies between different radiologists may introduce biases. Semi-supervised or unsupervised methods may provide new perspectives for resolving this problem. Second, due to the large data size, the external dataset was not labeled at slice level. Hence, we could not analyze the slice-level performance at the external validation stage. Human-in-the-loop methods may be useful for further analysis.
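
To make the idea of deriving fixed-dimension features from scans with different slice numbers more concrete, the following is a minimal sketch. The exact form of the slice probability mapping is not specified in this section, so the histogram-based mapping below is an assumption chosen purely for illustration; the function name `slice_probability_histogram`, the bin count, and the toy probabilities are all hypothetical.

```python
import numpy as np

def slice_probability_histogram(slice_probs, n_bins=10):
    """Map a variable-length list of slice-level probabilities to a
    fixed-length vector (assumed design: normalized histogram)."""
    probs = np.clip(np.asarray(slice_probs, dtype=float), 0.0, 1.0)
    counts, _ = np.histogram(probs, bins=n_bins, range=(0.0, 1.0))
    # Normalize by the slice count so thick-slice and thin-slice scans
    # produce comparable vectors.
    return counts / probs.size

# Two hypothetical scans with different numbers of slices.
scan_a = [0.03, 0.07, 0.92, 0.97, 0.55]
scan_b = [0.02, 0.04, 0.06, 0.08, 0.12, 0.15, 0.33, 0.48, 0.61, 0.77, 0.88, 0.96]

features_a = slice_probability_histogram(scan_a)
features_b = slice_probability_histogram(scan_b)
print(features_a.shape, features_b.shape)
```

Both scans yield a vector of the same length regardless of slice count, so a small individual-level classifier could, under this assumption, be trained on such vectors.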

Conclusion

The proposed combined segmentation–classification network for the diagnosis of COVID-19 outperformed commonly used classification models on both internal and external validation datasets. Further, the proposed LA module enables the network to focus on infected loci and significantly improves the detection of small lesions for early screening of COVID-19. Moreover, the attention maps aid the identification of lesion loci, thereby improving the interpretation of classification. In the future, we will continue to improve the network performance and extend DCN to a wider range of applications such as lung nodule classification and tumor detection.

CRediT authorship contribution statement

Kai Gao: Methodology, Software, Writing - original draft. Jianpo Su: Software, Visualization, Writing - review & editing. Zhongbiao Jiang: Investigation, Resources, Data curation. Ling-Li Zeng: Supervision, Validation. Zhichao Feng: Investigation, Writing - review & editing. Hui Shen: Conceptualization, Methodology, Project administration. Pengfei Rong: Conceptualization, Data curation, Project administration. Xin Xu: Supervision. Jian Qin: Methodology. Yuexiang Yang: Software, Resources. Wei Wang: Resources, Data curation. Dewen Hu: Conceptualization, Funding acquisition.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

24 in total

Review 1. Deep learning.

Authors: Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal: Nature Date: 2015-05-28 Impact factor: 49.962

2. Guidelines for Management of Incidental Pulmonary Nodules Detected on CT Images: From the Fleischner Society 2017.

Authors: Heber MacMahon; David P Naidich; Jin Mo Goo; Kyung Soo Lee; Ann N C Leung; John R Mayo; Atul C Mehta; Yoshiharu Ohno; Charles A Powell; Mathias Prokop; Geoffrey D Rubin; Cornelia M Schaefer-Prokop; William D Travis; Paul E Van Schil; Alexander A Bankier
Journal: Radiology Date: 2017-02-23 Impact factor: 11.105

3. Shape and margin-aware lung nodule classification in low-dose CT images via soft activation mapping.

Authors: Yiming Lei; Yukun Tian; Hongming Shan; Junping Zhang; Ge Wang; Mannudeep K Kalra
Journal: Med Image Anal Date: 2019-12-12 Impact factor: 8.545

Review 4. Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation, and Diagnosis for COVID-19.

Authors: Feng Shi; Jun Wang; Jun Shi; Ziyan Wu; Qian Wang; Zhenyu Tang; Kelei He; Yinghuan Shi; Dinggang Shen
Journal: IEEE Rev Biomed Eng Date: 2021-01-22

5. CT Imaging Features of 2019 Novel Coronavirus (2019-nCoV).

Authors: Michael Chung; Adam Bernheim; Xueyan Mei; Ning Zhang; Mingqian Huang; Xianjun Zeng; Jiufa Cui; Wenjian Xu; Yang Yang; Zahi A Fayad; Adam Jacobi; Kunwei Li; Shaolin Li; Hong Shan
Journal: Radiology Date: 2020-02-04 Impact factor: 11.105

6. A familial cluster of pneumonia associated with the 2019 novel coronavirus indicating person-to-person transmission: a study of a family cluster.

Authors: Jasper Fuk-Woo Chan; Shuofeng Yuan; Kin-Hang Kok; Kelvin Kai-Wang To; Hin Chu; Jin Yang; Fanfan Xing; Jieling Liu; Cyril Chik-Yan Yip; Rosana Wing-Shan Poon; Hoi-Wah Tsoi; Simon Kam-Fai Lo; Kwok-Hung Chan; Vincent Kwok-Man Poon; Wan-Mui Chan; Jonathan Daniel Ip; Jian-Piao Cai; Vincent Chi-Chung Cheng; Honglin Chen; Christopher Kim-Ming Hui; Kwok-Yung Yuen
Journal: Lancet Date: 2020-01-24 Impact factor: 79.321

7. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study.

Authors: Heshui Shi; Xiaoyu Han; Nanchuan Jiang; Yukun Cao; Osamah Alwalid; Jin Gu; Yanqing Fan; Chuansheng Zheng
Journal: Lancet Infect Dis Date: 2020-02-24 Impact factor: 25.071

8. Clinically Applicable AI System for Accurate Diagnosis, Quantitative Measurements, and Prognosis of COVID-19 Pneumonia Using Computed Tomography.

Authors: Kang Zhang; Xiaohong Liu; Jun Shen; Zhihuan Li; Ye Sang; Xingwang Wu; Yunfei Zha; Wenhua Liang; Chengdi Wang; Ke Wang; Linsen Ye; Ming Gao; Zhongguo Zhou; Liang Li; Jin Wang; Zehong Yang; Huimin Cai; Jie Xu; Lei Yang; Wenjia Cai; Wenqin Xu; Shaoxu Wu; Wei Zhang; Shanping Jiang; Lianghong Zheng; Xuan Zhang; Li Wang; Liu Lu; Jiaming Li; Haiping Yin; Winston Wang; Oulan Li; Charlotte Zhang; Liang Liang; Tao Wu; Ruiyun Deng; Kang Wei; Yong Zhou; Ting Chen; Johnson Yiu-Nam Lau; Manson Fok; Jianxing He; Tianxin Lin; Weimin Li; Guangyu Wang
Journal: Cell Date: 2020-05-04 Impact factor: 41.582

9. Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020.

Authors: K Roosa; Y Lee; R Luo; A Kirpich; R Rothenberg; J M Hyman; P Yan; G Chowell
Journal: Infect Dis Model Date: 2020-02-14

10. Deep learning-based model for detecting 2019 novel coronavirus pneumonia on high-resolution computed tomography.

Authors: Jun Chen; Lianlian Wu; Jun Zhang; Liang Zhang; Dexin Gong; Yilin Zhao; Qiuxiang Chen; Shulan Huang; Ming Yang; Xiao Yang; Shan Hu; Yonggui Wang; Xiao Hu; Biqing Zheng; Kuo Zhang; Huiling Wu; Zehua Dong; Youming Xu; Yijie Zhu; Xi Chen; Mengjiao Zhang; Lilei Yu; Fan Cheng; Honggang Yu
Journal: Sci Rep Date: 2020-11-05 Impact factor: 4.379
