Fully automatic pipeline of convolutional neural networks and capsule networks to distinguish COVID-19 from community-acquired pneumonia via CT images.

Qianqian Qi¹, Shouliang Qi², Yanan Wu³, Chen Li⁴, Bin Tian⁵, Shuyue Xia⁶, Jigang Ren⁷, Liming Yang⁸, Hanlin Wang⁹, Hui Yu¹⁰.

Abstract

BACKGROUND: Chest computed tomography (CT) is crucial in the diagnosis of coronavirus disease 2019 (COVID-19). However, the persistent pandemic and similar CT manifestations between COVID-19 and community-acquired pneumonia (CAP) raise methodological requirements.
METHODS: A fully automatic pipeline of deep learning is proposed for distinguishing COVID-19 from CAP using CT images. Inspired by the diagnostic process of radiologists, the pipeline comprises four connected modules for lung segmentation, selection of slices with lesions, slice-level prediction, and patient-level prediction. The roles of the first and second modules and the effectiveness of the capsule network for slice-level prediction were investigated. A dataset of 326 CT scans was collected to train and test the pipeline. Another public dataset of 110 patients was used to evaluate the generalization capability.
RESULTS: LinkNet exhibited the largest intersection over union (0.967) and Dice coefficient (0.983) for lung segmentation. For the selection of slices with lesions, the capsule network with the ResNet50 block achieved an accuracy of 92.5% and an area under the curve (AUC) of 0.933. The capsule network using the DenseNet121 block demonstrated better performance for slice-level prediction, with an accuracy of 97.1% and AUC of 0.992. For both datasets, the prediction accuracy of our pipeline was 100% at the patient level.
CONCLUSIONS: The proposed fully automatic deep learning pipeline can distinguish COVID-19 from CAP via CT images rapidly and accurately, thereby accelerating diagnosis and augmenting the performance of radiologists. This pipeline is convenient for use by radiologists and provides explainable predictions.

Keywords: Capsule network; Community-acquired pneumonia; Coronavirus disease 2019; Deep learning; Lung computed tomography image

Year: 2021 PMID: 34979404 PMCID: PMC8715632 DOI: 10.1016/j.compbiomed.2021.105182

Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 6.698

Introduction

Coronavirus disease 2019 (COVID-19) was discovered in late 2019 and spread rapidly worldwide within only a few months [1,2]. The novel coronavirus is characterized by high infectivity, mild symptoms, and a long incubation period. Currently, the number of COVID-19 patients abroad is increasing at a rate of 560,000 per day [3]. Early diagnosis of COVID-19 plays a crucial role in the isolation and treatment of patients [4,5]. The gold standard for the diagnosis of COVID-19 is the real-time polymerase chain reaction (RT-PCR) test for the detection of the novel coronavirus nucleic acid [6–9], which is time-consuming. Depending on their viral shedding concentration, some patients must undergo multiple nucleic-acid tests to confirm the diagnosis, during which the virus can spread [10,11]. Lesions can be clearly observed in the chests of patients via computed tomography (CT) or radiography, mainly as ground-glass opacities (GGO) and crazy-paving patterns [12–14]. However, the CT images of community-acquired pneumonia (CAP) and COVID-19 are similar. Thus, it is challenging even for experienced radiologists to distinguish between them. Moreover, radiologists read images for long periods, which degrades the reading quality. Furthermore, misdiagnosis and missed diagnosis are not conducive to the analysis of symptoms [15,16]. Deep learning, particularly convolutional neural networks (CNNs), demonstrates a significant potential for feature extraction and representation, and has attracted considerable attention for application in classification tasks for COVID-19 and CAP. Some researchers have focused on chest X-rays owing to their high speed and low radiation dose. Oh et al. proposed a patch-based CNN method, which has the advantage of relatively few parameters [17]. Nwosu et al. created a two-channel, semi-supervised model based on residual neural networks for the classification of chest X-ray images, with supervised and unsupervised paths [18]. 
Gulati proposed a new convolutional network model architecture based on a combination of DarkNet and AlexNet for the automatic diagnosis of COVID-19 using patient X-ray images [19]. Haritha et al. created a deep learning model (CheXNet) using a pretrained model (DenseNet121) for the diagnosis of COVID-19 patients [20]. Waheed et al. developed CovidGAN, which adopts an auxiliary classifier generative adversarial network to generate synthetic X-ray images [21]. Rahaman et al. compared the performance of different transfer learning approaches for the identification of COVID-19 using chest X-ray images [22]. In contrast to chest X-rays, CT images have no overlapping tissues; thus, more details can be obtained and reconstructed in different planes. Moreover, the high resolution of CT examination allows dissection-level analysis. Thus, researchers have used chest CT images for COVID-19 analysis. Soares et al. constructed an open dataset of CT scans and applied an available deep learning model to the dataset for classifying pneumonia [23]. Alshazly et al. used CT images from two datasets—related to severe acute respiratory syndrome coronavirus 2 (SARS-COV-2) and COVID-19—to propose a transfer learning strategy based on a custom input of different depth architectures [24]. Gozes et al. proposed a three-dimensional (3D) deep learning framework that can extract two-dimensional (2D) and 3D global features to distinguish COVID-19 from CAP [25]. Ouyang et al. combined online attention with 3D ResNet34 and developed a dual-sampling attention network [26]. Qi et al. proposed a deep represented multiple-instance learning method [27]. However, more advanced deep learning models are required to improve the performance of the differential diagnosis of COVID-19 and CAP. In this study, inspired by the diagnostic process of radiologists, we propose a fully automated pipeline of CNNs and capsule networks for distinguishing COVID-19 from CAP using CT images. 
The pipeline comprises LinkNet [28] for lung segmentation, a capsule network for automatically selecting critical slices with infected lesions, a capsule network for distinguishing COVID-19 from CAP at the slice level, and a majority voting module for patient-level prediction. Our study provides the following novelties and contributions. First, an automatic pipeline was constructed, and excellent performance was achieved for multiple datasets. The pipeline comprises four connected modules, namely lung segmentation, selection of slices with lesions, slice-level prediction, and patient-level prediction. Thus, the results of each module can be inspected. Second, LinkNet was used to accurately segment the lung field with infected lesions. Third, a capsule network was implemented to automatically select critical slices with infected lesions. Fourth, a capsule network was designed for the slice-level prediction of COVID-19. To the best of our knowledge, this type of pipeline mimicking the diagnostic process of radiologists has not been previously reported.

Materials and methods

Dataset

COVID-19 and CAP data were collected from the General Hospital of the Yangtze River Shipping and Affiliated Hospital of Guizhou Medical University. The patients were enrolled between December 2019 and March 2020. After the elimination of abnormal data, the final data included 161 CT scans from 57 patients with COVID-19 and 165 scans from 100 patients with CAP. Chest CT scans were performed using GE LightSpeed16 CT, Toshiba Aquilion ONE CT, Toshiba Aquilion CT, Siemens Somatom Scope CT, Siemens Somatom CT, and Definition AS + CT. The patient information and scanning parameters are presented in Table 1.

Table 1

Demographic information of the participants and acquisition parameters for the CT images.

Information	COVID-19	CAP	p value
Gender (male/female)	27/30	53/47	0.497ᵃ
Age (years)	56.1 ± 18.4	40.5 ± 20.7	6.526 × 10⁻⁶
kVp (kV)	120		–
Slice thickness (mm)	5		–
Pixel size (mm)	0.763 ± 0.067	0.697 ± 0.105	5.738 × 10⁻⁵
X-ray tube current (mA)	216.4 ± 23.5	207.1 ± 89.2	0.456ᵇ
Exposure (mAs)	60.4 ± 45.0	103.7 ± 37.2	1.995 × 10⁻⁹

ᵃ p was calculated via a chi-square test.

ᵇ p was calculated via a two-sample t-test.

All the COVID-19 subjects were diagnosed via RT-PCR tests. In the CAP group, 69 patients obtained etiological confirmation from a specialized laboratory (57 bacterial, 9 viral pneumonia, and 3 mycoplasma); 31 patients did not have etiological confirmation, but the possibility of false negatives was eliminated through strict epidemiological investigations, numerous RT-PCR tests, and clinical outcomes. The second dataset, which included 110 patients, was obtained from the China Consortium of Chest CT Image Investigation (CC-CCII) dataset [29]. The third dataset, which included 538 COVID-19 patients (9997 slices), was acquired from The Cancer Imaging Archive (TCIA) Collections [30].

Overview of study procedure

Inspired by the diagnostic process of radiologists, a fully automated pipeline is proposed for distinguishing COVID-19 from CAP via CT images (Fig. 1). The pipeline consists of four modules: (1) lung segmentation (radiologists initially devote attention to the lung field during the diagnosis of lung diseases), (2) selection of slices with lesions (radiologists shift their attention from the lung field to the lesions), (3) slice-level prediction (radiologists typically perform a preliminary diagnosis based on the observation results for a specific slice), and (4) patient-level prediction (radiologists provide the final diagnosis after integration of information from all suspected slices). The details of each module of the pipeline are presented in the following sections.
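The way the four modules chain together can be sketched in a few lines. This is only an illustrative skeleton: `run_pipeline` and all four callables are hypothetical stand-ins (the actual modules are the trained networks described in the following sections), and returning `None` when no slice is selected is an assumption of this sketch.

```python
def run_pipeline(slices, segment_lung, has_lesion, classify_slice, vote):
    """Chain the four modules; every callable is a hypothetical stand-in."""
    lung_only = [segment_lung(s) for s in slices]         # module 1: lung segmentation
    selected = [s for s in lung_only if has_lesion(s)]    # module 2: slice selection
    if not selected:
        return None  # assumption: no lesion slices means no prediction
    slice_labels = [classify_slice(s) for s in selected]  # module 3: slice-level prediction
    return vote(slice_labels)                             # module 4: patient-level prediction


# Toy stand-ins so the sketch runs; a "slice" is just a list of numbers.
def toy_segment(s):
    return [v for v in s if v >= 0]  # drop values "outside the lung"


def toy_has_lesion(s):
    return sum(s) > 1.0


def toy_classify(s):
    return "COVID-19" if max(s) > 0.8 else "CAP"


def toy_vote(labels):
    return max(set(labels), key=labels.count)


scan = [[0.1, -1.0, 0.2], [0.9, 0.7, -1.0], [0.6, 0.7, 0.1], [0.95, 0.5, 0.3]]
result = run_pipeline(scan, toy_segment, toy_has_lesion, toy_classify, toy_vote)
print(result)
```

Because each stage is a separate callable, the intermediate output of any module can be inspected on its own, which is the property the pipeline design emphasizes.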

Fig. 1

Overview of the proposed pipeline including four modules: (I) Lung segmentation, (II) selection of slices with lesions, (III) slice-level prediction, and (IV) patient-level prediction.

Lung segmentation

It was important to preprocess the data, as they originated from different hospitals. First, the CT scans were unified in a fixed window (window level = −300 HU, window width = 1400 HU). Second, the pixel intensity in the CT scan was normalized and adjusted between 0 and 1. Using Pulmonary Toolkit (https://www.tomdoel.com/software/), the mask of the lung field in 161 CT scans of COVID-19 was drawn semiautomatically. During the preparation of masks, all structures following the secondary bronchi were included in the lung field. The preprocessed data and obtained mask of the lung field were used to train and evaluate the lung segmentation module. Considering their good performance in medical image segmentation, five deep CNNs (U-Net [31], LinkNet, Recurrent Residual CNN-based U-Net (R2U-Net) [32], Attention U-Net [33], and U-Net++ [34,35]) were applied to our lung segmentation module. The performances of the different networks were compared to identify the most suitable structure. The network architectures are depicted in Fig. 2.
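The two preprocessing steps can be illustrated with a minimal sketch. The window level and width are taken from the text; the exact normalization formula (clip to the window, then min-max rescale) and the function name `window_and_normalize` are assumptions, since the text only states that intensities were adjusted to lie between 0 and 1.

```python
def window_and_normalize(hu_values, level=-300.0, width=1400.0):
    """Clip HU values to a fixed window and rescale them to [0, 1]."""
    low = level - width / 2.0    # -1000 HU for the window given in the text
    high = level + width / 2.0   # 400 HU for the window given in the text
    normalized = []
    for hu in hu_values:
        clipped = min(max(hu, low), high)                  # apply the window
        normalized.append((clipped - low) / (high - low))  # assumed min-max rescaling
    return normalized


# One made-up row of HU values: below the window, air, mid-window, soft tissue, bone.
row = [-2000.0, -1000.0, -300.0, 400.0, 1200.0]
normalized_row = window_and_normalize(row)
print(normalized_row)
```

Values outside the window saturate at 0 or 1, so scans from different scanners end up on a common intensity scale before segmentation.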

Fig. 2

Architecture of the deep CNN for lung segmentation: (a) U-Net; (b) LinkNet; (c) R2U-Net; (d) Attention U-Net; (e) U-Net++.

U-Net [31] was presented at the Medical Image Computing and Computer Assisted Intervention (MICCAI) Society in 2015. It has been widely applied in medical image segmentation. In this study, preprocessed slices with sizes of 512 × 512 pixels were fed into the network. As shown in Fig. 2(a), the encoder of the network contains four pairs of convolutional layers and pooling layers with 32, 64, 128, and 256 channels, and the decoder contains four upsampling layers. The most significant advantage of U-Net is that the skip connection between the layers in the encoder and decoder helps each upsampling result to be combined with low-level features, such that features of different scales are fused and the edge segmentation is more accurate. Instead of recovering lost location information through the maximum index of the pooling layer, LinkNet [28] directly connects the encoder to the decoder by replacing the “concatenation” operation in U-Net with the “addition” operation (Fig. 2(b)). This operation bypasses the spatial information and improves the segmentation accuracy while reducing the processing time. The architecture of LinkNet is similar to that of U-Net. The encoder of the U-Net network is replaced by a residual module in LinkNet. To improve the accuracy of the network and minimize the training parameters, the residual module of ResNet34 was used as the network encoder in this study. R2U-Net [32] integrates the structure of the recurrent neural network (RNN) and ResNet into U-Net. Its performance in the segmentation of retinal blood vessels is better than that of U-Net [31]. The convolutional layers in both the encoder and decoder are replaced by an RNN with an embedded ResNet50 module (Fig. 2(c)). Attention U-Net [33] is also based on the U-Net model and introduces the attention mechanism. 
An attention gate is inserted into each skip connection between the layers in the encoder and decoder (Fig. 2(d)). The information from the convolutional layer of the encoder and the bottom layer of the decoder is input into the attention gate. The output of the attention gate is concatenated to the upsampling layer to emphasize the salient region of this layer and improve the model performance. U-Net++ [34] is a deeply supervised encoder–decoder network that consists of a few U-Net sub-networks with different depths (Fig. 2(e)). This architecture takes advantage of redesigned skip pathways and deep supervision. The skip pathways are used to reduce the semantic gap between the feature maps of the encoder and decoder sub-networks, which leaves the optimizer with an easier optimization problem. We selected binary cross entropy (BCE) as a loss function for the four segmentation networks. This loss function is suited to binary classification problems and can be expressed as

$$L_{\mathrm{BCE}} = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right] \quad (1)$$

where $N$, $\hat{y}_i$, and $y_i$ represent the number of slices, the output of the network obtained using the sigmoid function, and the label, respectively. The CT images of both COVID-19 and CAP patients may contain lesions, and the presence of holes in the segmentation results of these networks cannot be avoided. In this study, morphological filling was applied using the “FindContours” function in the OpenCV library. The maximum edge information of the segmented lung field was used to fill the results, which avoids the loss of key lesion information in the lung region in the subsequent classification network.
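As a small numerical illustration of the binary cross-entropy loss described above, the following sketch evaluates it on a handful of made-up outputs. The helper names and the clipping constant `eps` are assumptions added for numerical safety; in practice a framework implementation would be used over all pixels of a batch.

```python
import math


def sigmoid(z):
    """Logistic function that maps a raw network output to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over paired predictions and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # assumed clipping to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(probs)


# Made-up raw outputs (logits) and ground-truth labels for three elements.
logits = [2.0, -1.0, 0.0]
labels = [1, 0, 1]
loss = binary_cross_entropy([sigmoid(z) for z in logits], labels)
print(round(loss, 4))
```

A maximally uncertain prediction of 0.5 for every element gives a loss of ln 2, which is a convenient reference point when reading training curves.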

Selection of slices with lesions

For COVID-19 and CAP, infected lesions seldom occupy the entire lung field. The attention of radiologists is devoted to CT slices with lesions during diagnosis. A capsule network [36] was designed and trained to classify all the slices into two categories (with and without lesions) after lung segmentation. As shown in Fig. 3, the capsule network for slice selection consisted of a convolutional block, a primary capsule layer, two convolutional capsule layers (A and B), and a dense capsule layer. The size of the input image was 512 × 512 pixels. The first three stages of the pretrained ResNet50 [37] were used as convolutional blocks (Fig. 3(b)). In total, 1024 features were output from the convolutional block and transmitted to the primary capsule layer. Each capsule of the primary capsule layer and the two convolutional capsule layers was a 1 × 16 vector (Fig. 3(c)). The output vector of the bottom capsule was multiplied by the weight matrix to calculate the prediction vector (i.e., high-level information), and was then transmitted to the upper capsule. Finally, a dense capsule layer was connected. Each of the two output categories had a 1 × 16 capsule. The prediction category of the image was determined according to its Frobenius norm.
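The final decision rule, choosing the category whose 1 × 16 output capsule has the larger norm, can be sketched as follows. The function names, the capsule values, and the class ordering (index 0 = without lesions, index 1 = with lesions) are assumptions for illustration only.

```python
import math


def capsule_norm(capsule):
    """Frobenius (Euclidean) norm of a single capsule vector."""
    return math.sqrt(sum(v * v for v in capsule))


def predict_from_capsules(capsules):
    """Return the index of the longest capsule together with all norms."""
    norms = [capsule_norm(c) for c in capsules]
    best = max(range(len(norms)), key=norms.__getitem__)
    return best, norms


# Two made-up 1 x 16 output capsules (assumed order: no lesion, lesion).
no_lesion_capsule = [0.05] * 16
lesion_capsule = [0.2] * 16
label, norms = predict_from_capsules([no_lesion_capsule, lesion_capsule])
print(label, [round(n, 3) for n in norms])
```

The capsule length acts as the class score, so no separate fully connected classification layer is needed after the dense capsule layer.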

Fig. 3

Structure of the capsule network for slice selection: (a) Overall structure; (b) Pretrained ResNet50 without fully connected layers; (c) Capsule architecture.

In this study, a dynamic routing mechanism with three iterations was used to update the parameters of the weight matrix. However, the dynamic routing mechanism could not completely replace back propagation. Thus, we used the spread loss function to train the network via back propagation:

$$L = \sum_{i \neq t}\left(\max\left(0,\; m - (a_t - a_i)\right)\right)^2 \quad (2)$$

Here, $a_t$ and $a_i$ represent the activation values of the target class and the $i$-th class other than the target, respectively, and $m$ is the margin.
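A minimal sketch of the spread loss, following its commonly used formulation for capsule networks, is given below. The margin value and the activation values are illustrative assumptions; the text does not state the margin that was used.

```python
def spread_loss(activations, target, margin):
    """Sum of squared margin violations between the target and every other class."""
    a_t = activations[target]
    loss = 0.0
    for i, a_i in enumerate(activations):
        if i == target:
            continue  # the target class is not compared with itself
        loss += max(0.0, margin - (a_t - a_i)) ** 2
    return loss


# Made-up capsule activations for two classes; the margin of 0.5 is an assumption.
confident = spread_loss([0.9, 0.1], target=0, margin=0.5)
wrong = spread_loss([0.4, 0.6], target=0, margin=0.5)
print(confident, round(wrong, 2))
```

The loss is zero once the target activation exceeds every other activation by at least the margin, and it grows quadratically as that gap shrinks or reverses.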

Slice-level prediction of COVID-19 and CAP

Using the selected slices with lesions as inputs, another capsule network was trained for the slice-level prediction of COVID-19 and CAP. Except for the convolutional block, which was a pretrained DenseNet121 block [38], the architecture of this capsule network was identical to that of the network for slice selection (Fig. 4(a)). The ResNet50 block described above and a pretrained Inception-V3 block were also used as the convolutional blocks in the capsule network for slice-level prediction. The detailed architectures of DenseNet121 and Inception-V3 [39] are depicted in Fig. 4(b) and (c), respectively. The same spread loss function (Equation (2)) was used to train the three models. The performances of the models using the three different convolutional blocks were compared.

Fig. 4

Structure of the capsule network for slice-level prediction of COVID-19 and CAP: (a) Overall structure; (b) Pretrained DenseNet121 without fully connected layers; (c) Pretrained Inception-V3 without fully connected layers.

Patient-level prediction

After the slice-level predictions, two majority voting methods were used to obtain the final patient-level prediction of COVID-19 or CAP. The first method is hard majority voting, wherein each slice with lesions from one patient provides a vote. The final prediction is the category that receives the majority of the votes. The second method is soft majority voting, wherein for each slice with lesions, the probabilities of the two categories are obtained by inputting the norms of the two capsules (i.e., the output of the capsule network) into a SoftMax operation. The final prediction is determined by comparing the sums of the probabilities of the two categories for all slices with lesions from one patient.
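The two voting schemes can be compared with a short sketch. The class indices (0 and 1 standing for the two categories), the tie-breaking rule (ties go to class 0), and the capsule norms are assumptions chosen for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a short list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def hard_vote(slice_norms):
    """Each slice casts one vote for the class with the larger capsule norm."""
    votes = [0, 0]
    for norms in slice_norms:
        votes[0 if norms[0] >= norms[1] else 1] += 1
    return 0 if votes[0] >= votes[1] else 1  # assumption: ties go to class 0


def soft_vote(slice_norms):
    """Sum the per-slice softmax probabilities and compare the two totals."""
    sums = [0.0, 0.0]
    for norms in slice_norms:
        probs = softmax(list(norms))
        sums[0] += probs[0]
        sums[1] += probs[1]
    return 0 if sums[0] >= sums[1] else 1  # assumption: ties go to class 0


# Made-up capsule norms (class 0, class 1) for three lesion slices of one patient.
patient = [(0.55, 0.45), (0.55, 0.45), (0.0, 1.0)]
print(hard_vote(patient), soft_vote(patient))
```

In this toy case the two schemes disagree: two weakly confident slices outvote one strongly confident slice under hard voting, whereas soft voting lets the confident slice dominate.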

Comparative experiments

Three categories of comparative experiments were conducted. The objective of the first experiment was to determine whether lung segmentation improves the performance at the slice and patient levels. The input of the capsule network for slice-level prediction was changed to the original CT images without lung segmentation, and all other modules in the pipeline were maintained unchanged. The objective of the second experiment was to determine whether the selection of slices with lesions is useful. After lung segmentation, all slices were directly used to train and evaluate the capsule network for slice-level prediction. The objective of the third experiment was to determine whether the introduction of the capsule network concept affects the accuracy of COVID-19 and CAP classification. In this experiment, in the last step of the process, we used the traditional DenseNet121 block instead of the capsule network for classification.

Training and evaluation of models

For the training of the lung segmentation network, we marked the lung fields on 161 CT scans of COVID-19. A total of 10,280 image slices were obtained and divided into training, validation, and test datasets in the ratio of 7:1:2. For the capsule network for the selection of slices with lesions, 19,781 slices (6712 slices with COVID-19 or CAP lesions; 13,069 slices without lesions) were collected. Among these slices, 17,356 were used for training and validation, and the remaining slices (approximately 1/10 of all slices) were used for testing. For the capsule network for slice-level prediction, 6712 slices with lesions were divided into training, validation, and test sets in the ratio of 8:1:1. All the slices in the test set were obtained from 34 scans (9 COVID-19 and 25 CAP); no slice from these scans appeared in the training and validation sets. Moreover, the data from the CC-CCII dataset were divided into training, validation, and test sets in the ratio of 7:1:2. Seven performance metrics were used to evaluate the different models: the intersection over union (IoU), Dice coefficient, area under the curve (AUC), accuracy, precision, sensitivity, and specificity. The metrics derived from the confusion matrix are defined as

$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}, \qquad \mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}, \qquad \mathrm{Specificity} = \frac{TN}{TN + FP}$$

Here, TP, TN, FP, and FN denote a true positive, true negative, false positive, and false negative, respectively. After training and validation, the final optimized parameters were obtained. The learning rate, batch size, and number of epochs were 0.0001, 1, and 21, respectively. Data augmentation and early stopping were used in the experiment to alleviate overfitting. Training was stopped early when the accuracy on the validation dataset did not increase within seven epochs. The experiments were implemented using the PyTorch library. The lung field masks were prepared using the Pulmonary Toolkit in MATLAB 2016b on a Windows 10 system. 
The workstation used for the implementation had an Intel Core I7-9700 3.00 GHz central processing unit (CPU) and an NVIDIA GeForce RTX 2080 Ti graphics processing unit (GPU).
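The confusion-matrix metrics listed above can be computed directly from the four counts. The sketch below uses their standard definitions; the counts are made up for illustration and are not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def overlap_metrics(tp, fp, fn):
    """IoU and Dice coefficient for a segmentation mask, from pixel counts."""
    iou = tp / (tp + fp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return iou, dice


# Hypothetical counts, used only to exercise the formulas.
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
iou, dice = overlap_metrics(tp=90, fp=20, fn=10)
print(metrics)
print(round(iou, 3), round(dice, 3))
```

For a single mask the two overlap scores are tied by Dice = 2·IoU / (1 + IoU), so the Dice coefficient is never lower than the IoU.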

Results

Table 2 presents the performances of the different lung segmentation models with regard to the IoU and Dice coefficient. LinkNet exhibited the largest IoU (0.967) and Dice coefficient (0.983) among the five segmentation models (U-Net, LinkNet, R2U-Net, Attention U-Net, and U-Net++), whereas R2U-Net exhibited the lowest IoU and Dice coefficient.

Table 2

Performances of the five lung segmentation networks.

Model	IoU	Dice coefficient
U-Net [40]	0.962	0.980
LinkNet	0.967	0.983
R2U-Net [32]	0.928	0.962
Attention U-Net	0.951	0.974
U-Net++ [34,35]	0.936	0.966

*Bold font indicates the network with the best performance.

Fig. 5 depicts examples of lung segmentation using different networks for the same slice with infected lesions. In example slice 1, there was an under-segmented region indicated by an arrow for U-Net, R2U-Net, Attention U-Net, and U-Net++. Additionally, there were several holes in the results obtained using R2U-Net and Attention U-Net. In example slice 2, under-segmented regions were observed for U-Net, R2U-Net, and Attention U-Net. Moreover, there were several holes in the results obtained using R2U-Net, and parts of the bed of the CT scanner were incorrectly segmented as the lung field.

Fig. 5

Examples of lung segmentation using different networks.

Fig. 6 depicts the receiver operating characteristic (ROC) curve and examples of the automatic selection of slices with lesions. The capsule network with the ResNet50 block achieved an accuracy of 92.5% and AUC of 0.933. Slices with obvious COVID-19 or CAP lesions were correctly selected. The slice marked with an asterisk is an example of a case of COVID-19 with small lesions, which was not selected. The slice marked with two asterisk symbols represents a case of CAP without significant lesions, which was incorrectly selected.

Fig. 6

ROC curve of the deep capsule network for automatic selection of slices with lesions and classification results (* indicates an example of a slice with a small COVID-19 lesion that was incorrectly classified as a slice without lesions, and ** indicates an example of a slice without apparent CAP lesions that was incorrectly classified as a slice with lesions).

Prediction for our laboratory dataset

Fig. 7 depicts the accuracy and loss for our laboratory dataset. The accuracy in the training and validation process was high, and the check point indicated that the accuracy in the validation process did not increase after seven epochs. The loss in the training and validation process decreased rapidly, indicating that the parameters selected in our model were sufficient. Table 3 presents the performances of the three capsule networks for slice-level prediction with the pretrained DenseNet121, Inception-V3, and ResNet50 network blocks. The capsule network using the DenseNet121 block exhibited the best performance. It had the fewest training parameters and achieved an accuracy of 97.1% and an AUC of 0.992.

Fig. 7

Accuracy and loss for our laboratory dataset.

Table 3

Performance comparison of slice-level prediction models with different pretraining blocks.

Model	Params. (M)	Precision	Accuracy	Sensitivity	Specificity	AUC
ResNet50	9.63	0.965	0.981	0.997	0.966	0.983
Inception	9.92	0.939	0.923	0.900	0.945	0.973
DenseNet121	8.04	0.979	0.971	0.959	0.981	0.992

* Bold font indicates the highest value among the three models.

The confusion matrix for the final patient-level prediction is shown in Fig. 8. The prediction accuracy of the pipeline reached 100%, whereas the diagnostic accuracies of two radiologists (A and B) were 65.1% and 66.8%, respectively. Thus, the pipeline clearly outperformed the radiologists on this dataset.

Fig. 8

Confusion matrix of patient-level prediction of COVID-19 and CAP for our laboratory dataset.

Prediction for two other public datasets

For the CC-CCII dataset, the confusion matrix of the pipeline is shown in Fig. 9. For slice-level prediction, the accuracy was 93.4% and the AUC was 0.876. The accuracy of the proposed method for patient-level prediction reached 100%, which was higher than those of the radiologists (A: 73.7%; B: 78.9%). Thus, the proposed strategy is robust and applicable to multiple datasets.

Fig. 9

Results for the prediction of COVID-19 and CAP with the CC-CCII dataset: (a) Confusion matrix at the slice level; (b) Confusion matrix at the patient level; (c) Two examples from the CC-CCII dataset that were incorrectly diagnosed.

For the dataset from TCIA Collections, the accuracies were 94.8% and 96.7% for slice- and patient-level prediction, respectively.

Results of comparative experiment

Without lung segmentation before the slice-level prediction of COVID-19 and CAP, the accuracy was reduced to 86.32% and the AUC was reduced to 0.880. At the patient level, the accuracy was reduced to 78.57%. Without the selection of slices with lesions, the accuracy of slice-level prediction was merely 78.5%, which was approximately 20 percentage points lower than that of the model with slice selection. The AUC was reduced to 0.654. At the patient level, the accuracy was reduced to 85.71%. When the traditional DenseNet121 block was used for slice-level prediction, the accuracy was 94.7% and the AUC was 0.960 (inferior to those of the capsule network). Correspondingly, the accuracy at the patient level was reduced from 100% to 97.1%.

Comparison of our method and current state-of-the-art studies

Table 4 presents a performance comparison of our pipeline and other state-of-the-art methods for distinguishing COVID-19 from CAP. The state-of-the-art methods include deep represented multiple-instance learning [27], 2D CNN methods [41], the BigBiGAN framework [42], pretrained EfficientNet-b7 [43], and 3D ResNet34 with attention modules [26].

Table 4

Performance of our method and state-of-the-art methods.

Study	Key aspects	Performance
Our method	- Lung segmentation	Accuracy = 0.971
	- Selection of slices with lesions	Sensitivity = 0.959
	- Slice-level prediction	Specificity = 0.981 AUC = 0.992
	- Patient-level prediction
	- 157 patients (COVID-19: 57; CAP: 100)
	- Binary classification (COVID-19 or CAP)
Qi et al., 2021 [27]	- Deep features extracted by ResNet50	Accuracy = 0.959 Sensitivity = 0.972
	- 241 patients (COVID-19: 141; CAP: 100)	Specificity = 0.941
	- Binary classification (COVID-19 or CAP)	AUC = 0.955
Javaheri et al., 2021 [41]	- Training a subset of the control dataset model	Accuracy = 0.933
	- Feed all the datasets into the trained model	Sensitivity = 0.909
	- Classifying the given CT images	Specificity = 1.00
	- 335 CT images (COVID-19: 111; CAP: 115; Normal: 109)	AUC = 0.94
Song et al., 2020 [42]	- BigBiGAN framework is used for semantic feature extraction	Sensitivity = 0.92
	- Linear classifier is constructed using the semantic feature matrix	Specificity = 0.91
	- 201 CT images (COVID-19: 98; non-COVID-19 pneumonia: 103)	AUC = 0.972
Basset et al., 2021 [43]	- Lung segmentation using Bi-convGRU	Accuracy = 0.968
	- Pretrained EfficientNet-b7 is used to obtain features	AUC = 0.988
	- Attention modules are used to learn multi-scale features for lesion localization
	- 305 CT images (COVID-19: 169; CAP: 60; Normal: 76)
Ouyang et al., 2020 [26]	- VB-Net toolkit for lung segmentation	Accuracy = 0.875
	- Two 3D ResNet34 networks	Sensitivity = 0.869
	- Online attention module and ensemble learning	Specificity = 0.901
	- 3645 CT images (COVID-19: 2565; CAP: 1080)	AUC = 0.944
	- Binary classification (COVID-19 or CAP)

Our method achieved an accuracy of 0.971 at the slice level, outperforming the aforementioned state-of-the-art methods. Because the steps of the proposed pipeline mimic the workflow of radiologists, the pipeline is more applicable in medicine than the other methods.

Discussion

COVID-19 pandemic and challenges (distinguishing COVID-19 from CAP)

COVID-19 was detected in December 2019 and then rapidly spread worldwide [44]. In the following year, the delta variant emerged, which is more contagious and pathogenic than the original strain [45]. Although vaccines have been widely used and distributed among large populations [46], it is critical to diagnose and treat affected patients at an early stage [47]. The gold standard for COVID-19 diagnosis is the RT-PCR test [48,49]. However, this test is time-consuming and has limitations in underdeveloped areas [50]. Thus, a quick and simple approach to distinguish COVID-19 from CAP is required. Song et al. proposed an end-to-end classification method using a dataset acquired from two different hospitals that included 201 CT images (COVID-19: 98; non-COVID-19 pneumonia: 103) [42]. The state-of-the-art BigBiGAN framework was used for feature extraction, and a support vector machine was employed as the classifier, resulting in an AUC of 0.972. In our previous work, we proposed a method based on multiple-instance learning for distinguishing COVID-19 from CAP [27]. In that study, the pretrained ResNet50 block with finetuning was employed for deep feature representation, and the k-nearest neighbor method was used to generate the final result. An accuracy of 95% and an AUC of 0.943 were achieved. The proposed pipeline can achieve a performance comparable to those of the state-of-the-art methods. Thus, we developed an automatic pipeline of CNNs and capsule networks to distinguish COVID-19 from CAP using CT images. This pipeline can help radiologists to classify COVID-19 and CAP. CT is one of the most widely used imaging methods in clinical practice [51–54] and plays an important role in the diagnosis of CAP and epidemiological studies [55]. GGO, consolidation, and peripheral and bilateral involvement have been observed in CT images of COVID-19 [56]. 
Radiologists have a high specificity but moderate sensitivity for distinguishing COVID-19 from CAP, which implies missed diagnoses of COVID-19 [57]. Thus, the deep learning is effective for screening [41].

Pipeline mimicking radiological diagnosis

To mimic the diagnostic process of radiologists, our pipeline has four modules: (1) lung segmentation, (2) selection of slices with lesions, (3) slice-level prediction, and (4) patient-level prediction. Therefore, the pipeline is convenient for use by radiologists. End-to-end deep learning models have been proposed for distinguishing COVID-19 from CAP [41,42,58]. Compared with these models, the proposed pipeline has better explanatory power, as radiologists can check the output of each module and confirm the final results. Thus, the four-module pipeline provides explainable predictions.
Deep learning networks have been used to segment the lung field as a preprocessing step before prediction, and high accuracies have been achieved [59,60]. In our study, the lung segmentation module improved the classification performance for COVID-19 and CAP. This is because all lesions lie within the lung field, whereas tissues outside the lung field can interfere with the feature representation in the capsule network. We compared the performances of U-Net, LinkNet, R2U-Net, Attention U-Net, and U-Net++ for lung segmentation. LinkNet outperformed the other four networks. It achieved a Dice coefficient of 0.983 and an IoU of 0.967, which were larger than or comparable to previous results obtained using DenseNet161 U-Net [61], LungSeg-Net [62], and three-stage segmentation [63]. In LinkNet, the residual module was employed in the encoder of the network to represent the high-level semantic information of the CT images, which contributed to the outstanding segmentation results. Although R2U-Net, which includes the recurrent convolution module, and Attention U-Net, which includes the attention mechanism, were derived from U-Net, their lung segmentation performances were inferior to that of U-Net. This may be because the R2U-Net and Attention U-Net models were trained from scratch on our small dataset and could not be fully trained.
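The two segmentation metrics reported above are closely related, and a short sketch can make the relation concrete. The Python below is a minimal illustration and not the authors' evaluation code: the function names `iou_and_dice` and `dice_from_iou` and the toy masks are assumptions introduced here. For a single pair of binary masks, Dice = 2 x IoU / (1 + IoU); applying this identity to the reported IoU of 0.967 gives approximately 0.983, which agrees with the reported Dice coefficient.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two binary masks given as flat sequences of 0/1."""
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_area = sum(1 for p in pred if p)
    truth_area = sum(1 for t in truth if t)
    union = pred_area + truth_area - intersection
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    iou = intersection / union
    dice = 2 * intersection / (pred_area + truth_area)
    return iou, dice


def dice_from_iou(iou):
    """Convert an IoU value to the equivalent Dice value for a single mask pair."""
    return 2 * iou / (1 + iou)


# Toy masks (hypothetical data, for illustration only).
predicted_mask = [1, 1, 1, 0, 0, 0, 1, 0]
reference_mask = [1, 1, 0, 0, 0, 0, 1, 1]

iou, dice = iou_and_dice(predicted_mask, reference_mask)
print(f"IoU = {iou:.3f}, Dice = {dice:.3f}")
print(f"Dice implied by IoU 0.967: {dice_from_iou(0.967):.3f}")
```

Metrics averaged over many scans need not satisfy the identity exactly, so this agreement is a consistency check on the reported values rather than a derivation of them.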
The selection of slices with lesions is useful for distinguishing COVID-19 from CAP with capsule networks via CT images. CNN models have been trained to classify COVID-19 and CAP without selecting lesion slices [23,64]. In this study, however, a dedicated module was introduced to identify the slices with lesions so that only informative slices were fed into the subsequent network in our pipeline. This significantly increased the accuracy of slice-level prediction (by approximately 20%). The patient-level prediction accuracy increased by 14% with the selection of slices with lesions, mainly because lesions do not necessarily spread throughout the lung field. Automatically filtering out slices without lesions thus reduces the workload of radiologists and improves the classification accuracy.
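The flow from lesion-slice selection to a patient-level result can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the function `predict_patient`, the stand-in callables `has_lesion` and `classify_slice`, the dictionary fields `lesion_score` and `covid_prob`, the 0.5 thresholds, and in particular the majority-vote aggregation rule are all hypothetical choices made here for the example.

```python
from collections import Counter


def predict_patient(slices, has_lesion, classify_slice):
    """Aggregate slice-level labels into one patient-level label.

    slices         -- per-slice inputs, assumed to be lung-segmented already
    has_lesion     -- stand-in for the lesion-slice selection module
    classify_slice -- stand-in for the slice-level prediction module
    Returns (patient_label, slice_labels); patient_label is None when no
    slice is selected.
    """
    lesion_slices = [s for s in slices if has_lesion(s)]
    if not lesion_slices:
        return None, []
    slice_labels = [classify_slice(s) for s in lesion_slices]
    # Majority vote is an assumed aggregation rule; ties are not handled specially.
    patient_label, _ = Counter(slice_labels).most_common(1)[0]
    return patient_label, slice_labels


# Hypothetical per-slice scores standing in for real network outputs.
toy_slices = [
    {"lesion_score": 0.1, "covid_prob": 0.5},
    {"lesion_score": 0.9, "covid_prob": 0.8},
    {"lesion_score": 0.8, "covid_prob": 0.7},
    {"lesion_score": 0.7, "covid_prob": 0.3},
]


def toy_has_lesion(s):
    return s["lesion_score"] >= 0.5


def toy_classify(s):
    return "COVID-19" if s["covid_prob"] >= 0.5 else "CAP"


label, per_slice = predict_patient(toy_slices, toy_has_lesion, toy_classify)
print(label, per_slice)
```

Returning the per-slice labels alongside the patient label reflects the point made above: each intermediate output remains available for a radiologist to inspect.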

Advantages of capsule network

The capsule network captures the spatial information of images better than CNNs [65,66]. In the comparative experiment, the accuracy of the capsule network reached 97.1%, which is higher than that obtained without a capsule block. This is mainly because, compared with the scalar values produced by CNN models, the vectors output by the capsule network can better represent the features. The vector formulation used by the capsule network can offset this deficiency of the CNN and help the network represent the features in a strong and lightweight manner [67].
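To illustrate what a vector output means in practice, the sketch below implements the squashing nonlinearity from the original capsule network formulation, in which the length of a capsule's output vector lies between 0 and 1 and its direction encodes the properties of the feature. The text above does not state which nonlinearity or routing scheme was used in this pipeline, so this is an assumption for illustration only; the function names `squash` and `vector_length` are introduced here.

```python
import math


def squash(s, eps=1e-9):
    """Squash a capsule's raw vector s so that its length falls in [0, 1).

    v = (|s|^2 / (1 + |s|^2)) * (s / |s|)
    The direction of s is preserved; only its length is rescaled.
    """
    norm_sq = sum(x * x for x in s)
    norm = math.sqrt(norm_sq)
    scale = norm_sq / (1.0 + norm_sq) / (norm + eps)  # eps avoids division by zero
    return [scale * x for x in s]


def vector_length(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


strong = squash([3.0, 4.0])   # long input vector: output length close to 1
weak = squash([0.1, 0.0])     # short input vector: output length close to 0
print(vector_length(strong), vector_length(weak))
```

A scalar activation can only report how strongly a feature is present, whereas the squashed vector reports both a bounded strength (its length) and a direction that can carry additional information about the feature.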

Limitations and future studies

Our study has several limitations. First, the datasets considered are small. Although the capsule network is a lightweight network, overfitting may occur because of the small dataset size, which may limit the generalization and robustness of the pipeline. Second, COVID-19 is distinguished from CAP, but the clinical phenotypes of COVID-19 and CAP are not considered, as the cases of different clinical phenotypes of COVID-19 are unequally distributed in the dataset and the COVID-19 phenotypes of some patients are unclear. Third, healthy individuals may also have opacities in the lung field. The influence of this condition is unknown because a healthy control group was not included in our study. As the number of patients in the dataset increases, the clinical types of CAP and the severity of COVID-19 are expected to become balanced, and the pipeline is expected to exhibit a better generalization ability. More advanced methods, such as Deep Bayes-SqueezeNet [68], deep convolutional generative adversarial networks [69], and ensemble learning [70], may be needed to improve the detection performance for COVID-19. Additionally, these methods demonstrate potential for use in the segmentation of lung infections and the prognosis of COVID-19 patients in the future [71,72].

Conclusion

In this study, a fully automatic deep learning pipeline that can rapidly and accurately distinguish COVID-19 from CAP using CT images was developed. The performance of the pipeline was improved by adding modules for lung segmentation and the selection of slices with lesions. The capsule network in the pipeline effectively represented the deep features of COVID-19 lesions in CT images. Because the four modules mimic the diagnostic process of radiologists, the pipeline is convenient for use by radiologists and provides explainable predictions. The proposed pipeline can accelerate diagnosis and augment the performance of radiologists.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards.

Informed consent

Informed consent was waived because this was a retrospective study.

Declaration of competing interest

The authors declare that they have no conflict of interest.

48 in total

1. Comparison of effective radiation doses from X-ray, CT, and PET/CT in pediatric patients with neuroblastoma using a dose monitoring program.

Authors: Yeun Yoon Kim; Hyun Joo Shin; Myung Joon Kim; Mi-Jung Lee
Journal: Diagn Interv Radiol Date: 2016 Jul-Aug Impact factor: 2.630

2. UNet++: A Nested U-Net Architecture for Medical Image Segmentation.

Authors: Zongwei Zhou; Md Mahfuzur Rahman Siddiquee; Nima Tajbakhsh; Jianming Liang
Journal: Deep Learn Med Image Anal Multimodal Learn Clin Decis Support (2018) Date: 2018-09-20

3. Identification of COVID-19 samples from chest X-Ray images using deep learning: A comparison of transfer learning approaches.

Authors: Md Mamunur Rahaman; Chen Li; Yudong Yao; Frank Kulwa; Mohammad Asadur Rahman; Qian Wang; Shouliang Qi; Fanjie Kong; Xuemin Zhu; Xin Zhao
Journal: J Xray Sci Technol Date: 2020 Impact factor: 1.535

4. Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia.

Authors: Qun Li; Xuhua Guan; Peng Wu; Xiaoye Wang; Lei Zhou; Yeqing Tong; Ruiqi Ren; Kathy S M Leung; Eric H Y Lau; Jessica Y Wong; Xuesen Xing; Nijuan Xiang; Yang Wu; Chao Li; Qi Chen; Dan Li; Tian Liu; Jing Zhao; Man Liu; Wenxiao Tu; Chuding Chen; Lianmei Jin; Rui Yang; Qi Wang; Suhua Zhou; Rui Wang; Hui Liu; Yinbo Luo; Yuan Liu; Ge Shao; Huan Li; Zhongfa Tao; Yang Yang; Zhiqiang Deng; Boxi Liu; Zhitao Ma; Yanping Zhang; Guoqing Shi; Tommy T Y Lam; Joseph T Wu; George F Gao; Benjamin J Cowling; Bo Yang; Gabriel M Leung; Zijian Feng
Journal: N Engl J Med Date: 2020-01-29 Impact factor: 176.079

5. COVIDiagnosis-Net: Deep Bayes-SqueezeNet based diagnosis of the coronavirus disease 2019 (COVID-19) from X-ray images.

Authors: Ferhat Ucar; Deniz Korkmaz
Journal: Med Hypotheses Date: 2020-04-23 Impact factor: 1.538

6. Detection and Severity Classification of COVID-19 in CT Images Using Deep Learning.

Authors: Yazan Qiblawey; Anas Tahir; Muhammad E H Chowdhury; Amith Khandakar; Serkan Kiranyaz; Tawsifur Rahman; Nabil Ibtehaz; Sakib Mahmud; Somaya Al Maadeed; Farayi Musharavati; Mohamed Arselene Ayari
Journal: Diagnostics (Basel) Date: 2021-05-17

7. Performance of Radiologists in Differentiating COVID-19 from Non-COVID-19 Viral Pneumonia at Chest CT.

Authors: Harrison X Bai; Ben Hsieh; Zeng Xiong; Kasey Halsey; Ji Whae Choi; Thi My Linh Tran; Ian Pan; Lin-Bo Shi; Dong-Cui Wang; Ji Mei; Xiao-Long Jiang; Qiu-Hua Zeng; Thomas K Egglin; Ping-Feng Hu; Saurabh Agarwal; Fang-Fang Xie; Sha Li; Terrance Healey; Michael K Atalay; Wei-Hua Liao
Journal: Radiology Date: 2020-03-10 Impact factor: 11.105

8. CT imaging and clinical course of asymptomatic cases with COVID-19 pneumonia at admission in Wuhan, China.

Authors: Heng Meng; Rui Xiong; Ruyuan He; Weichen Lin; Bo Hao; Lin Zhang; Zilong Lu; Xiaokang Shen; Tao Fan; Wenyang Jiang; Wenbin Yang; Tao Li; Jun Chen; Qing Geng
Journal: J Infect Date: 2020-04-12 Impact factor: 6.072