
Analysis of CT scan images for COVID-19 pneumonia based on a deep ensemble framework with DenseNet, Swin transformer, and RegNet.

Lihong Peng¹,², Chang Wang¹, Geng Tian³, Guangyi Liu¹, Gan Li¹, Yuankang Lu¹, Jialiang Yang³, Min Chen⁴, Zejun Li⁴.

Abstract

COVID-19 has posed enormous challenges to the global economy and public health. Identifying patients with COVID-19 infection from CT scan images helps contain the pandemic. Manually screening COVID-19-related CT images consumes a lot of time and resources. Artificial intelligence techniques, including deep learning, can effectively help doctors and medical workers screen COVID-19 patients. In this study, we developed an ensemble deep learning framework, DeepDSR, by combining DenseNet, Swin transformer, and RegNet for COVID-19 image identification. First, we integrate three available COVID-19-related CT image datasets into one larger dataset. Second, we pretrain the weights of DenseNet, Swin transformer, and RegNet on the ImageNet dataset based on transfer learning. Third, we continue to train DenseNet, Swin transformer, and RegNet on the integrated larger image dataset. Finally, the classification results are obtained by integrating the results from the three models with the soft voting approach. The proposed DeepDSR model is compared with three state-of-the-art deep learning models (EfficientNetV2, ResNet, and Vision transformer) and three individual models (DenseNet, Swin transformer, and RegNet) on binary classification and three-classification problems. The results show that DeepDSR achieves the best precision of 0.9833, recall of 0.9895, accuracy of 0.9894, F1-score of 0.9864, AUC of 0.9991, and AUPR of 0.9986 on the binary classification problem and significantly outperforms the other methods. Furthermore, DeepDSR obtains the best precision of 0.9740, recall of 0.9653, accuracy of 0.9737, and F1-score of 0.9695 on the three-classification problem, further suggesting its powerful image identification ability. We anticipate that the proposed DeepDSR framework will contribute to the diagnosis of COVID-19.


Keywords: COVID-19 pneumonia; CT scan image; DenseNet; RegNet; Swin transformer; deep ensemble

Year: 2022 PMID: 36212877 PMCID: PMC9539545 DOI: 10.3389/fmicb.2022.995323

Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 6.064

Introduction

In December 2019, a novel acute atypical respiratory disease, COVID-19, broke out in Wuhan, China (Ksiazek et al., 2003; Zhou et al., 2020). The World Health Organization declared COVID-19 a global pandemic on 11 March 2020. As of 26 June 2022, the disease had infected over 541 million individuals and caused over 6.3 million deaths (COVID Live - Coronavirus Statistics - Worldometer, 2022). COVID-19 has exacerbated human suffering, damaged the global economy, and seriously affected health, environmental, and social fields worldwide (Mofijur et al., 2021). It has also indirectly affected education and religion globally. Moreover, it has pushed healthcare resources to the brink in many countries and regions and will deeply affect medical research (Harper et al., 2020). Furthermore, middle-income and especially low-income countries remain more vulnerable and face more serious challenges in preventing COVID-19 (Peters et al., 2020). The COVID-19 pandemic has thus posed severe challenges to global public health (Wang et al., 2020; Sun et al., 2022a).

The screening of massive numbers of samples each day overwhelms laboratories worldwide (Agaoglu et al., 2022). Detection of SARS-CoV-2 by RT-PCR from a nasopharyngeal swab sample is the most common way to diagnose COVID-19. However, RT-PCR does not offer high sensitivity and specificity (Pu et al., 2022). Moreover, sampling and the consecutive tests needed to rule out false positives and false negatives take about 6 h (Lee et al., 2022). Many patients show clinical, laboratory, and radiological features of COVID-19 although their RT-PCR results are negative (Saad Menezes et al., 2022). Much evidence suggests that chest computed tomography (CT) is an accurate and efficient means of diagnosing COVID-19 (Chung et al., 2020; Pan et al., 2020; Wang C C et al., 2021; Wang B et al., 2021).
It has high sensitivity and a low misdiagnosis rate and is thus an efficient complement to RT-PCR (Fields et al., 2021). Although it is vital to rapidly detect patients with COVID-19 infection from CT images, expert thoracic radiologists cannot always diagnose positive cases immediately, which may not only delay treatment but also promote further transmission because COVID-19 patients are not promptly isolated (Jin et al., 2020; Shorten et al., 2021; Afshar et al., 2022). In this situation, it is especially important to help doctors and health care workers distinguish COVID-19-related CT images from non-COVID-19-related ones using artificial intelligence techniques.

Many studies have shown that artificial intelligence (AI) techniques, including machine learning, have achieved enormous success in bioinformatics and medical image analysis (Chen et al., 2018a,b, 2019; Wang B et al., 2021; Wang C C et al., 2021; Zhang et al., 2021; Yang et al., 2022; Liu et al., 2022a). Over the last decade, deep learning techniques have outperformed numerous state-of-the-art machine learning algorithms and demonstrated excellent learning ability in many fields, including image recognition (Voulodimos et al., 2018; Wang B et al., 2021; Wang C C et al., 2021; Sun et al., 2022; Liu et al., 2022a,b). Despite the lack of standardization, AI technologies, especially deep learning, have been widely applied to data collection and performance evaluation for COVID-19 (Roberts et al., 2021). Abbas et al. (2021) proposed a novel convolutional neural network (CNN) model, DeTraC, to classify COVID-19-related chest X-ray images based on feature extraction, decomposition, and class composition. Shalbaf and Vafaeezadeh (2021) designed a deep transfer learning-based ensemble model with different pre-trained CNN architectures to detect COVID-19 from CT images. Zhang et al. (2020) developed a deep learning-based anomaly detection system to screen for COVID-19 in chest X-ray images. Zhou et al. (2021) explored an ensemble deep learning framework to detect COVID-19 from CT images. Karbhari et al. (2021) introduced an auxiliary classifier generative adversarial network to generate synthetic chest X-ray images and then detected COVID-19 with a custom-made deep learning model. Chouat et al. (2022) used a deep transfer learning algorithm to screen COVID-19-positive patients based on CT scans and chest X-ray images. Fan et al. (2022) proposed a branch network model that combines CNN and transformer structures to identify COVID-19 from CT scan images. Ter-Sarkisov (2022) built a COVID-CT-Mask-Net model to diagnose COVID-19 through regional features of chest CT scan images. Chieregato et al. (2022) presented a deep learning-based COVID-19 prognostic hybrid model to support clinical decision making.

These models are mainly based on CNNs and attention mechanisms and effectively separate COVID-19-related images from non-COVID-19-related ones. However, their classification performance can still be improved. In this study, we developed an ensemble deep learning framework (DeepDSR) for COVID-19 diagnosis by integrating three state-of-the-art neural networks: DenseNet, Swin transformer, and RegNet.

Materials and methods

Materials

We use three available COVID-19-related CT image datasets to investigate the performance of the proposed DeepDSR model. Dataset 1 can be downloaded from https://www.kaggle.com/datasets/plameneduardo/a-covid-multiclass-dataset-of-ct-scans. It contains 4,173 publicly available CT scan images from 210 patients, of which 2,168 images are from 80 patients whose COVID-19 infection was confirmed by RT-PCR in hospitals in Sao Paulo, Brazil (Soares et al., 2020). Dataset 2 can be downloaded from https://www.kaggle.com/datasets/plameneduardo/sarscov2-ctscan-dataset. It contains 1,252 CT scan images from patients infected with COVID-19 and 1,230 CT scan images from patients not infected with COVID-19 in hospitals in Sao Paulo, Brazil (Soares et al., 2020). Dataset 3 can be downloaded from https://github.com/UCSD-AI4H/COVID-CT. It contains 349 COVID-19 CT images from 216 patients and 463 non-COVID-19 CT images (Zhao et al., 2020). To boost the generalization ability of DeepDSR, we integrate the three datasets into one larger dataset, so that DeepDSR can effectively classify CT images both in the individual datasets and in other datasets. We remove images of poor quality and images that do not conform to specifications. Finally, we obtain one dataset of 7,398 pulmonary CT images: 3,768 from patients with COVID-19 infection, 1,247 from patients with other pneumonia infections, and 2,383 from normal lungs. We use the 3,768 COVID-19-related images and the 2,383 normal CT images to train the models for the binary classification problem and all 7,398 images for the three-classification problem. In Figure 1, rows 1–3 show pulmonary CT images from patients with COVID-19 infection, normal lungs, and patients with other pneumonia infections, respectively.

Figure 1

Image examples in dataset.

The pipeline of DeepDSR

It is difficult to obtain the best prediction accuracy when a model is trained on only a few thousand images. Thus, we design an ensemble model that uses transfer learning to mitigate the lack of data. The ensemble integrates three different state-of-the-art network architectures: DenseNet, Swin transformer, and RegNet. The pipeline is shown in Figure 2. First, we preprocess the data by integrating three available COVID-19-related CT image datasets into one larger dataset. Second, we pretrain the weights of DenseNet, Swin transformer, and RegNet on the ImageNet dataset based on transfer learning. Third, we continue to train DenseNet, Swin transformer, and RegNet on the integrated larger dataset. Finally, the classification results are obtained by integrating the results from the three models with the soft voting approach.

Figure 2

The pipeline for COVID-19-related CT image classification based on an ensemble of DenseNet, RegNet, and Swin transformer.

DenseNet

CNNs can be trained more accurately and efficiently when they contain shorter connections between layers close to the input and those close to the output. Traditional convolutional networks connect each layer only to its subsequent layer. Inspired by the model proposed by Huang et al. (2017), we introduce a Dense Convolutional Network, DenseNet, to classify COVID-19-related CT scan images. DenseNet connects each layer to every other layer in a feed-forward fashion so that the model can be trained accurately and efficiently. A DenseNet with $L$ layers has $L(L+1)/2$ direct connections. At each layer, as shown in Figure 3A, the CT image feature maps from all preceding layers are taken as inputs, and the layer's own feature maps are passed as inputs to all subsequent layers. In ResNet (He et al., 2016), the original features and the new features are added element by element to obtain the output features. Unlike ResNet, DenseNet builds its shortcuts through direct concatenation. DenseNet alleviates the vanishing-gradient problem, strengthens feature propagation, encourages feature reuse, and greatly decreases the number of parameters.
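The difference between concatenation and element-wise addition can be made concrete by tracking feature-map widths. The following is a minimal sketch rather than the authors' implementation: the initial width `k0` and growth rate `k` follow the notation of Huang et al. (2017), and the numeric values are arbitrary assumptions chosen for illustration.

```python
def dense_block_input_channels(num_layers, k0, k):
    """Input width seen by each layer of a dense block.

    Layer l (1-indexed) receives the concatenation of the block input
    (k0 channels) and the k-channel outputs of the l-1 earlier layers.
    """
    return [k0 + k * (layer - 1) for layer in range(1, num_layers + 1)]


def direct_connections(num_layers):
    """Number of direct connections in a densely connected block of L layers."""
    return num_layers * (num_layers + 1) // 2


def residual_block_channels(num_layers, k0):
    """With element-wise addition (ResNet style) the width stays constant."""
    return [k0] * num_layers


# Hypothetical sizes chosen only for illustration.
print(dense_block_input_channels(4, k0=64, k=32))  # [64, 96, 128, 160]
print(direct_connections(4))                       # 10
print(residual_block_channels(4, k0=64))           # [64, 64, 64, 64]
```

Because each layer only has to add `k` new channels to what is already available, individual layers can stay narrow, which is one way to read the parameter savings mentioned above.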

Figure 3

(A) The DenseNet Block; (B) Shifted-Window technique; (C) The Squeeze-and-Excitation network.

Swin transformer

Transformers are difficult to transfer from language to vision because of differences between the two domains. Thus, Liu et al. developed a hierarchical transformer that computes representations with shifted windows (Liu et al., 2021). For an image, a vision transformer first splits it into fixed-size patches. Second, the patches are linearly embedded and position embeddings are added. Third, the embedded results are fed to a standard Transformer encoder. Finally, an extra learnable “classification token” is added to the sequence to classify images. Inspired by the model proposed by Liu et al. (2021), we use the window-shift technique and design a Swin transformer to classify COVID-19-related CT scan images. The window-shift technique and the sliding-window approach are similar in modeling ability, but the former is also beneficial for all-MLP architectures and has much lower latency than the latter. Swin transformer shifts the window partition between consecutive self-attention layers. As shown in Figure 3B, the shifted windows bridge the windows of the previous layer, which significantly enhances the modeling ability. The window-shift technique limits self-attention computation to non-overlapping local windows while still supporting cross-window connections, thereby effectively improving the image classification ability of the model. As a result, Swin transformer models COVID-19-related images flexibly and has computational complexity that is linear in image size.
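How a shifted partition connects neighbouring windows can be seen on a small grid. The sketch below is illustrative only: the function name `window_ids`, the 8 x 8 grid, and the window size of 4 are assumptions, and the attention mask that Swin transformer applies to regions wrapped around by the cyclic shift is left out.

```python
def window_ids(height, width, window, shift=0):
    """Label every grid position with the index of the attention window it falls in.

    A cyclic shift of `shift` positions is applied along both axes before
    the grid is cut into non-overlapping window x window tiles.
    """
    assert height % window == 0 and width % window == 0
    per_row = width // window
    grid = []
    for i in range(height):
        row = []
        for j in range(width):
            si, sj = (i - shift) % height, (j - shift) % width
            row.append((si // window) * per_row + (sj // window))
        grid.append(row)
    return grid


regular = window_ids(8, 8, window=4)           # one layer: plain partition
shifted = window_ids(8, 8, window=4, shift=2)  # next layer: shifted by window // 2

# Positions (3, 3) and (4, 4) are neighbours but sit in different regular windows...
print(regular[3][3], regular[4][4])  # 0 3
# ...and share a window once the partition is shifted.
print(shifted[3][3], shifted[4][4])  # 0 0
```

Attention is only ever computed inside one window of fixed size, so the cost grows with the number of windows, that is, linearly with the number of positions, while alternating the two partitions lets information cross window borders.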

RegNet

Neural architecture search and RegNet are two representative neural network design paradigms. The two paradigms are complementary: they can improve the efficiency of search algorithms while producing better models. Neural architecture search mainly focuses on the search strategy, aiming to find the best network instances more efficiently within a fixed, manually designed search space. In contrast, RegNet (Radosavovic et al., 2020) focuses on designing the design spaces themselves. It uses residual structures to simplify the training of deeper networks. It can deepen the understanding of network design and help discover design principles that generalize well across different settings. Instead of concentrating on the design of individual network instances, RegNet designs network design spaces that parameterize populations of networks. The process resembles manual network design but operates at the design-space level. Consequently, we obtain a low-dimensional design space composed of simple and regular networks. In this study, RegNet consists of a stem with stride 2 and 32 convolution kernels, followed by a network body composed of a series of stages, and finally a head. In the network body, the stages operate at progressively reduced resolution. Each stage consists of multiple identical blocks with stride 1, except that the first block uses a stride-2 convolution. The head, composed of an average pooling layer and a fully connected layer, outputs the classes. In addition, RegNet comes in two variants: RegNetX, and RegNetY, which combines RegNetX with the squeeze-and-excitation network. As shown in Figure 3C, the squeeze-and-excitation network is generally composed of one global average pooling layer and two fully connected layers that use ReLU and sigmoid, respectively, as activation functions.
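The squeeze-and-excitation step described above can be written out in a few lines. This is a sketch under stated assumptions, not the RegNetY implementation: it uses plain Python lists instead of a tensor library, omits biases, and the weights `w1` and `w2` are hand-picked hypothetical values.

```python
import math


def squeeze_excite(feature_maps, w1, w2):
    """Reweight channels with a squeeze-and-excitation gate.

    feature_maps: list of C channels, each a 2-D list of floats.
    w1: reduction weights, shape (C_reduced x C).
    w2: expansion weights, shape (C x C_reduced).
    Biases are left out to keep the sketch short.
    """
    # Squeeze: global average pooling gives one number per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # First fully connected layer followed by ReLU.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # Second fully connected layer followed by sigmoid: one gate in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale every value of a channel by that channel's gate.
    return [[[v * g for v in r] for r in ch] for ch, g in zip(feature_maps, gates)]


# Two 2x2 channels and hypothetical weights.
x = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
w1 = [[0.5, 0.5]]     # 2 channels -> 1 hidden unit
w2 = [[0.0], [10.0]]  # 1 hidden unit -> 2 channel gates
y = squeeze_excite(x, w1, w2)
print(y[0][0][0])  # 0.5: the first channel is halved (gate = sigmoid(0))
```

With these weights the second channel passes through almost unchanged, because its gate is close to 1.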

Ensemble

Although machine learning techniques have achieved significant success in knowledge discovery, they may fail to perform well when the data are imbalanced, high-dimensional, or noisy. Consequently, ensemble learning, which effectively integrates the prediction results of multiple individual classifiers, has been widely applied to image processing (Sagi and Rokach, 2018). Ensemble learning methods first generate multiple weak predictions using different machine learning models and then obtain better performance by combining the results of the individual models with different voting strategies. There are five main types: bagging, AdaBoost, gradient boosting, random forest, and random subspace (Dong et al., 2020). Bagging generates sample subsets by random sampling and trains the base learners in parallel (Breiman, 1996). AdaBoost improves the classification ability of individual models by iteratively adjusting the weights of all misclassified samples (Hastie et al., 2009). Gradient boosting draws sample subsets by random sampling and trains each classifier to reduce the residuals left by the previous model; thus, gradient boosting better fits the real data (Friedman, 2002). Random forest takes decision trees as predictors and trains multiple models separately to reduce overfitting (Breiman, 2001). Random subspace constructs a set of feature subspaces by random sampling, trains learners on these feature subspaces, and obtains the final classification by combining the results of the individual classifiers (Ho, 1995). Ensemble learning uses different strategies to combine the results of the individual models. For regression, it obtains the final result by averaging all predictions.
For classification, ensemble learning uses voting to reach the final decision by combining the individual classifiers. Absolute majority voting takes as the final result the class predicted by more than half of the individual classifiers, and relative majority voting takes the class predicted by the largest number of individual classifiers. Therefore, we combine DenseNet, Swin transformer, and RegNet into an ensemble deep learning model, DeepDSR, to improve COVID-19 classification performance. The classification scores from the three individual classifiers are integrated by the soft voting approach, that is, by averaging them. Given a query image, for a binary classification problem, suppose that the scores with which DenseNet, Swin transformer, and RegNet classify it as COVID-19 related are $s_D^{\mathrm{cov}}$, $s_S^{\mathrm{cov}}$, and $s_R^{\mathrm{cov}}$, respectively. Its final score for the COVID-19-related class is given by Eq. (1):

$$S^{\mathrm{cov}} = \frac{1}{3}\left(s_D^{\mathrm{cov}} + s_S^{\mathrm{cov}} + s_R^{\mathrm{cov}}\right) \tag{1}$$

Similarly, its final score for the non-COVID-19-related class is given by Eq. (2):

$$S^{\mathrm{non}} = \frac{1}{3}\left(s_D^{\mathrm{non}} + s_S^{\mathrm{non}} + s_R^{\mathrm{non}}\right) \tag{2}$$

The image is taken as COVID-19 related when $S^{\mathrm{cov}} > S^{\mathrm{non}}$ and as non-COVID-19 related otherwise. Furthermore, for a three-classification problem, suppose that the scores with which DenseNet, Swin transformer, and RegNet classify the image as COVID-19 related are $s_D^{\mathrm{cov}}$, $s_S^{\mathrm{cov}}$, and $s_R^{\mathrm{cov}}$, respectively. Its final score for the COVID-19 class is computed by Eq. (3):

$$S^{\mathrm{cov}} = \frac{1}{3}\left(s_D^{\mathrm{cov}} + s_S^{\mathrm{cov}} + s_R^{\mathrm{cov}}\right) \tag{3}$$

Similarly, its final score for the other-pneumonia class is computed by Eq. (4):

$$S^{\mathrm{pneu}} = \frac{1}{3}\left(s_D^{\mathrm{pneu}} + s_S^{\mathrm{pneu}} + s_R^{\mathrm{pneu}}\right) \tag{4}$$

And its final score for the normal-lung class is computed by Eq. (5):

$$S^{\mathrm{norm}} = \frac{1}{3}\left(s_D^{\mathrm{norm}} + s_S^{\mathrm{norm}} + s_R^{\mathrm{norm}}\right) \tag{5}$$

Finally, the image is taken as COVID-19 related when $S^{\mathrm{cov}}$ is larger than $S^{\mathrm{pneu}}$ and $S^{\mathrm{norm}}$; it is taken as other pneumonia when $S^{\mathrm{pneu}}$ is larger than the other two values; otherwise, it is taken as normal lung.
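The soft voting rule can be sketched in a few lines of Python. The probability vectors below are hypothetical softmax outputs, and the class order (0 = COVID-19, 1 = other pneumonia, 2 = normal) is an assumption made for the example; the same function covers the binary case with two-element vectors.

```python
def soft_vote(*model_probs):
    """Average per-class probabilities from several models and pick the top class.

    Each argument is one model's probability vector over the same classes.
    Returns (averaged_probabilities, index_of_predicted_class).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    assert all(len(p) == n_classes for p in model_probs)
    avg = [sum(p[c] for p in model_probs) / n_models for c in range(n_classes)]
    return avg, max(range(n_classes), key=avg.__getitem__)


# Hypothetical softmax outputs for one image.
densenet = [0.70, 0.20, 0.10]
swin = [0.40, 0.50, 0.10]
regnet = [0.60, 0.10, 0.30]

avg, label = soft_vote(densenet, swin, regnet)
print(label)  # 0
```

In this example the Swin transformer alone would have chosen class 1, but the averaged scores favour class 0, which is the kind of correction soft voting is meant to provide. Exact ties resolve to the lowest class index in this sketch, which may differ from the tie rule stated in the text.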

Transfer learning and pre-training

CNNs usually need to train a large number of parameters. However, it is almost impossible to learn so many parameters from only a few training images (Zhuang et al., 2020; Zhu et al., 2021). Transfer learning can exploit existing knowledge by transferring it from source domains to a target domain and has thus been widely applied to problems in different but related fields (Pan and Yang, 2009; Weiss et al., 2016). It usually pretrains weights on a large-scale dataset using a standard neural architecture and then fine-tunes the weights on a target dataset. It has been successfully applied to medical image classification, for instance, cancer classification, pneumonia detection, and skin lesion identification (Chang et al., 2017; Deepak and Ameer, 2019; Khalifa et al., 2019; Chouhan et al., 2020). Furthermore, existing lung CT scan images are not sufficient to build a powerful image identification model because most lung CT images are not publicly available. In addition, an image processed by random affine transformation, random cropping, or flipping may no longer be a complete lung CT image because of the specific nature of CT scanning. These two situations can easily cause image classification models to overfit on small datasets. Therefore, we pretrain the proposed DeepDSR model by transfer learning to speed up training, reduce overfitting, alleviate the problems caused by insufficient data, and further improve the classification performance (Hijab et al., 2019; Cherti and Jitsev, 2021; Mustafa et al., 2021). In summary, we developed an ensemble deep model (DeepDSR) to analyze COVID-19 CT images by combining DenseNet, Swin transformer, and RegNet. First, we integrate three COVID-19 image datasets into one larger dataset. Second, we pretrain the weights of DenseNet, Swin transformer, and RegNet on the ImageNet dataset.
Third, we repeatedly select 80% of the CT images from the integrated larger dataset as the training set and the remaining 20% as the test set. Fourth, the training set is used to train DenseNet, Swin transformer, and RegNet separately, and the test set is used to test the performance of each of the three models. Finally, the final classification results are obtained by integrating the results from the three models.
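The 80/20 split in the third step can be sketched with the standard library. The text does not say how the split is randomized or rounded, so the shuffling, the seed, and the truncation to whole images below are assumptions; the image counts are those given for the binary task in the Materials section.

```python
import random


def split_80_20(items, seed=0):
    """Shuffle a copy of `items` and return (train, test) with an 80/20 split."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Stand-ins for image paths: 3,768 COVID-19 and 2,383 normal images (binary task).
images = [("covid", i) for i in range(3768)] + [("normal", i) for i in range(2383)]
train, test = split_80_20(images)
print(len(train), len(test))  # 4920 1231
```

Under these assumptions the test set holds 1,231 images, which agrees with the size of the test set analyzed in the Results.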

Results

Experimental evaluation and parameter settings

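To evaluate the performance of the proposed DeepDSR framework, we use six metrics: precision, recall, accuracy, F1-score, AUC, and AUPR. True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN) are defined in Table 1. Precision, recall, accuracy, F1-score, True Positive Rate (TPR), and False Positive Rate (FPR) are computed as follows:

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \mathrm{TPR} = \frac{TP}{TP + FN}$$

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$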

Table 1

The confusion matrix.

		True results
		Positive	Negative
Predicted results	Positive	TP	FP
Predicted results	Negative	FN	TN

AUC is the area under the TPR-FPR curve, and AUPR is the area under the precision-recall curve. For each sample (image), the classification scores from the three individual networks (DenseNet, Swin transformer, and RegNet) are each computed by a softmax layer. The final classification probability is obtained by averaging the scores from the three single models, and AUC and AUPR are computed from this probability. The six metrics are not equally important for COVID-19 CT image classification: in medical image classification, the consequences of false negatives are more severe than those of false positives. Therefore, recall and AUPR are more important than the other four metrics. The experiments are run for 100 epochs to obtain the optimal parameter settings. DenseNet and RegNet use stochastic gradient descent, and Swin transformer uses AdamW, as the optimizer to update parameters. The detailed parameter settings are listed in Table 2. In the result tables that follow, bold font in each column marks the best performance among the compared methods.
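AUC can also be read as the fraction of (positive, negative) image pairs in which the positive image receives the higher score, which gives a compact way to compute it from the averaged ensemble probabilities. The sketch below uses this pairwise form; the labels and scores are hypothetical, and the quadratic loop is chosen for clarity rather than speed.

```python
def auc_from_scores(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for positive, 0 for negative.
    scores: predicted probability of the positive class.
    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical averaged ensemble probabilities for six images.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = auc_from_scores(labels, scores)  # 8 of the 9 pairs are ranked correctly
print(auc)
```

Only the positive image scored 0.4 is ranked below a negative image (0.5), so one pair out of nine is misordered.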

Table 2

Parameter settings.

Model	Parameter setting
Swin transformer	epochs = 100, batch_size = 8, lr = 0.0001
RegNet	epochs = 100, batch_size = 16, lr = 0.001, lrf = 0.01
DenseNet	epochs = 100, batch_size = 16, lr = 0.001, lrf = 0.01

Table 3

The performance comparison of DeepDSR and other models for COVID-19 image binary classification.

	Precision	Recall	Accuracy	F1-score	AUC	AUPR
EfficientNetV2	0.5077	0.9015	0.6231	0.6495	0.7800	0.6649
ResNet	0.9786	0.9602	0.9764	0.9693	0.9960	0.9943
Vision transformer	0.9811	0.9769	0.9838	0.9790	0.9982	0.9975
DeepDSR	0.9833	0.9895	0.9894	0.9864	0.9991	0.9986

The bold fonts represent the best performance in each column.

Table 4

The performance comparison of DeepDSR and three individual models for binary classification problem.

	Precision	Recall	Accuracy	F1-score	AUC	AUPR
Swin transformer	0.9619	0.9539	0.9675	0.9579	0.9943	0.9924
RegNet	0.9571	0.9832	0.9764	0.9700	0.9963	0.9949
DenseNet	0.9770	0.9790	0.9829	0.9780	0.9981	0.9973
DeepDSR	0.9833	0.9895	0.9894	0.9864	0.9991	0.9986

The bold fonts represent the best performance in each column.

Table 5

Statistical analyses of four models on 1,231 images.

	DenseNet	Swin transformer	RegNet	DeepDSR
TN	743	736	733	746
FN	10	22	8	5
FP	11	18	21	8
TP	467	455	469	472

The bold fonts represent the best performance in each column.


The performance comparison of DeepDSR with other three models for COVID-19 image binary classification

We compare the proposed DeepDSR method with three state-of-the-art deep learning models (EfficientNetV2, ResNet, and Vision transformer) when classifying CT scan images into two classes: COVID-19 related or non-COVID-19 related. EfficientNetV2 (Tan and Le, 2021) aims to solve the slow training of EfficientNetV1 on large training images. It replaces some MBConv structures in the shallow layers with Fused-MBConv structures and finds the optimal combination through neural architecture search to improve the network training speed. Finally, EfficientNetV2 uses a non-uniform scaling strategy to scale the model more reasonably. ResNet (He et al., 2016) aims to solve the vanishing gradient and network degradation problems of traditional neural networks. ResNet addresses the vanishing gradient problem through data preprocessing and batch normalization layers and reduces the network degradation problem through a residual structure. ResNet uses shortcut connections to add the feature matrices of different layers, which makes much deeper networks possible. Transformer (Vaswani et al., 2017) has been broadly used in natural language processing, and the attention mechanism has been widely used in computer vision. Inspired by the transformer, Dosovitskiy et al. divided each image into patches and took the sequence of linear embeddings of these patches as the input of the transformer. Image patches are processed in the same way as tokens in NLP applications. Vision transformer (Dosovitskiy et al., 2020) achieved excellent results when pretrained on a sufficiently large dataset and transferred to tasks with fewer data points. We first select 80% of the integrated COVID-19-related CT scan images as the training set and 20% as the test set.
We then train DeepDSR, EfficientNetV2 (Tan and Le, 2021), ResNet (He et al., 2016), and Vision transformer (Dosovitskiy et al., 2020) for 100 epochs each. The results are shown in Table 3 and Figure 4A. DeepDSR significantly outperforms EfficientNetV2 in terms of precision, recall, accuracy, F1-score, AUC, and AUPR. For example, DeepDSR exceeds EfficientNetV2 by 21.93% in AUC and 33.42% in AUPR. DeepDSR also performs better than ResNet and Vision transformer, although the improvement is slight. Figures 4B,C illustrate the AUC and AUPR values of DeepDSR and the other models when classifying COVID-19-related CT images into two classes. These results show that DeepDSR can efficiently identify CT scan images of patients infected with COVID-19.
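The percentages 21.93% and 33.42% quoted above are reproduced when the gap between the two models is divided by DeepDSR's own score rather than by the EfficientNetV2 score. This convention is an inference from the numbers in Table 3, not something the text states, and the helper name below is hypothetical.

```python
def gap_relative_to(best, other):
    """Gap between two scores, expressed as a percentage of `best`."""
    return 100.0 * (best - other) / best


# AUC and AUPR from Table 3: DeepDSR versus EfficientNetV2.
print(round(gap_relative_to(0.9991, 0.7800), 2))  # 21.93
print(round(gap_relative_to(0.9986, 0.6649), 2))  # 33.42
```

Dividing by the EfficientNetV2 score instead would give larger percentages, so the denominator matters when these figures are compared with improvements reported elsewhere.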

Figure 4

(A) The performance comparison of DeepDSR and other models for COVID-19 image binary classification. (B,C) The AUC and AUPR values of DeepDSR and other models for COVID-19 image binary classification.

The performance comparison of DeepDSR and three individual models for COVID-19 image binary classification

To compare the image classification performance of the proposed DeepDSR model with that of DenseNet, Swin transformer, and RegNet, we run the experiment for 100 epochs. At each epoch, we select 80% of the samples to train DeepDSR, DenseNet, Swin transformer, and RegNet and use the remaining 20% to test their performance. Table 4 and Figure 5A show the prediction results of the four models. The proposed ensemble model, DeepDSR, outperforms the three individual models in terms of precision, recall, accuracy, F1-score, AUC, and AUPR. Figures 5B,C illustrate the AUC and AUPR values obtained by the four models. We find that DeepDSR, the ensemble of DenseNet, Swin transformer, and RegNet, classifies CT images into two classes (COVID-19 related or not) more effectively.

Figure 5

(A) The performance comparison of DeepDSR and three individual models for COVID-19 binary classification problem; (B,C) The AUC and AUPR values of DeepDSR and three individual models for COVID-19 binary classification problem.

Statistics of true positives/negatives and false positives/negatives

We investigate the classification results on the 1,231 CT images in the test set to illustrate more intuitively the effect of DeepDSR on CT image identification performance. Table 5 and Figure 6 give the numbers of true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN) obtained by DeepDSR, DenseNet, Swin transformer, and RegNet, respectively.

Figure 6

Statistical analysis of four methods on 1,231 images.

The results show that DeepDSR, DenseNet, Swin transformer, and RegNet each misclassify a few samples. DeepDSR yields the most TPs and TNs and the fewest FPs and FNs. Furthermore, EfficientNetV2, ResNet, and Vision transformer produce many more FPs and FNs than DeepDSR, demonstrating higher error rates. Moreover, DeepDSR, the ensemble of DenseNet, Swin transformer, and RegNet, outperforms all three individual models. Thus, combining the predictions obtained from all the base models can significantly improve CT image classification performance. In addition, the ensemble consisting of all three base models outperforms all other combinations. DeepDSR is tuned to use those predictions that help improve the classification performance and to ignore the wrong predictions made by the base models.
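The counts in Table 5 can be turned back into the metrics of Table 4 with the standard confusion-matrix formulas. The short check below does this for DeepDSR; the function name is hypothetical and the code is a verification sketch, not part of the authors' pipeline.

```python
def metrics(tp, fp, fn, tn):
    """Precision, recall, accuracy and F1-score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, accuracy, f1


# Counts reported for DeepDSR in Table 5.
p, r, a, f1 = metrics(tp=472, fp=8, fn=5, tn=746)
print([round(v, 4) for v in (p, r, a, f1)])  # [0.9833, 0.9895, 0.9894, 0.9864]
```

The four values agree with the DeepDSR row of Table 4, so the two tables are mutually consistent.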

The effect of transfer learning on the performance

In the above sections, we pretrain the weights of DenseNet, Swin transformer, and RegNet on the ImageNet dataset and continue to train the three models on the integrated larger dataset for 100 epochs. To validate the importance of pretraining, we set up control experiments without pretraining (trained for 100 epochs and for 200 epochs) and compare them with the pretrained models trained for 100 epochs. The results are shown in Table 6 and Figure 7.

Table 6

The effect of transfer learning on the performance.

	Precision	Recall	Accuracy	F1-score	AUC	AUPR
With pre-train	0.9833	0.9895	0.9894	0.9864	0.9991	0.9986
Without pre-train	0.8773	0.9140	0.9171	0.8953	0.9716	0.9455
Without pre-train (200 epoch)	0.9544	0.9224	0.9529	0.9382	0.9866	0.9821

The bold fonts represent the best performance in each column.

Figure 7

The effect of transfer learning on the performance.

From Table 6 and Figure 7, we observe that the network with pretrained weights performs much better than the network without pretraining, whether the latter is trained for 100 or 200 epochs. For example, at 100 epochs, the pretrained network achieves an accuracy of 0.9894, AUC of 0.9991, and AUPR of 0.9986, which are 7.88%, 2.83%, and 5.61% higher, respectively, than those of the network without pretraining. In addition, we compare DeepDSR with pretraining for 100 epochs against DeepDSR without pretraining for 200 epochs. The pretrained network significantly outperforms the network without pretraining even when the latter is trained for 200 epochs: its accuracy, AUC, and AUPR are 3.83%, 1.27%, and 1.68% higher, respectively. These results demonstrate that pretraining based on transfer learning can reduce the training time while improving the classification performance. Finally, adding more epochs to the pretrained network does not bring an obvious improvement. On the contrary, it can even cause drift and thus poorer performance.

Performance comparison for three-classification problem

Finally, we classify CT scan images into three classes to further evaluate the robustness and credibility of DeepDSR. We use 7,398 lung CT scan images: 3,768 from patients infected by COVID-19, 2,383 from normal lungs, and 1,247 from patients with other types of pneumonia. We select 80% of the images as the training set and use the remaining images as the test set. We repeatedly conduct the three-classification experiments on the 7,398 images for 100 epochs. Table 7 and Figure 8 give the precision, recall, accuracy, and F1-score of DeepDSR, the three comparative methods, and the three individual models.
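
The 80/20 split described above can be sketched as a seeded shuffle over image indices. This is only an illustration: the text does not say whether the split was stratified by class, how fractional counts were rounded, or which random seed was used, so the helper `split_indices`, its seed, and the rounding rule are all assumptions.

```python
import random


def split_indices(n_items: int, train_fraction: float = 0.8, seed: int = 0):
    """Shuffle indices 0..n_items-1 and cut them into train and test lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # seeded for reproducibility
    n_train = int(n_items * train_fraction)   # assumption: round down
    return indices[:n_train], indices[n_train:]


# Class sizes reported in the text.
class_counts = {"COVID-19": 3768, "normal": 2383, "other pneumonia": 1247}
total = sum(class_counts.values())

train_idx, test_idx = split_indices(total)
print(total, len(train_idx), len(test_idx))
```

A stratified variant would apply the same helper to each class separately so that the class proportions are preserved in both subsets.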

Table 7

The performance of DeepDSR and other models for three-classification problem.

Model	Precision	Recall	Accuracy	F1-score
EfficientNetV2	0.4023	0.4479	0.5132	0.3736
ResNet	0.9487	0.9397	0.9541	0.9439
Vision transformer	0.7112	0.6264	0.7373	0.6301
Swin transformer	0.9488	0.9371	0.9548	0.9424
RegNet	0.9492	0.9463	0.9568	0.9476
DenseNet	0.9552	0.9530	0.9608	0.9541
DeepDSR	0.9740	0.9653	0.9737	0.9695

The bold fonts represent the best performance in each column.
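
For a three-class problem, precision, recall, and F1-score have to be aggregated over the classes. The sketch below shows one common choice, macro averaging over a confusion matrix. The averaging scheme behind Table 7 is not stated in this section, so this is an assumption, and the confusion matrix is made-up data used only to exercise the function.

```python
def macro_metrics(confusion):
    """Macro-averaged metrics from a square confusion matrix.

    confusion[i][j] is the number of images of true class i
    that were predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    precisions, recalls, f1_scores = [], [], []
    for k in range(n):
        true_pos = confusion[k][k]
        predicted_k = sum(confusion[i][k] for i in range(n))  # column sum
        actual_k = sum(confusion[k])                          # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)
    return {
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "accuracy": sum(confusion[k][k] for k in range(n)) / total,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical counts; rows and columns are ordered
# (COVID-19, normal, other pneumonia).
example = [
    [90, 5, 5],
    [4, 94, 2],
    [6, 1, 93],
]
metrics = macro_metrics(example)
print({name: round(value, 4) for name, value in metrics.items()})
```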

Figure 8

The performance of DeepDSR and other models for three-classification problem.

The results from Table 7 and Figure 8 show that the proposed DeepDSR framework significantly outperforms EfficientNetV2 and Vision transformer in terms of precision, recall, accuracy, and F1-score. DeepDSR is also better than ResNet and the three individual models on these metrics. For example, DeepDSR achieves the best precision of 0.9740, recall of 0.9653, accuracy of 0.9737, and F1-score of 0.9695, improvements of 1.93%, 1.27%, 1.31%, and 1.59%, respectively, over the second-best method, which according to Table 7 is DenseNet for all four metrics. The results demonstrate that DeepDSR has better generalization ability and can thus be applied to classify COVID-19-related CT scan images.
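
DeepDSR combines its three component models by soft voting, as stated in the abstract. A minimal sketch of soft voting follows, assuming equal weights for the three models (the weighting is not given in this section) and using invented probability vectors in place of real model outputs. The function name `soft_vote` is illustrative and is not taken from the authors' code.

```python
def soft_vote(prob_lists, weights=None):
    """Average per-class probabilities across models and pick the top class.

    prob_lists: one probability vector per model, all of the same length.
    weights: optional per-model weights; equal weights are assumed if omitted.
    """
    n_models = len(prob_lists)
    if weights is None:
        weights = [1.0 / n_models] * n_models
    n_classes = len(prob_lists[0])
    averaged = [
        sum(w * probs[c] for w, probs in zip(weights, prob_lists))
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return averaged, predicted


# Hypothetical softmax outputs for one image, ordered
# (COVID-19, normal, other pneumonia).
densenet_probs = [0.50, 0.40, 0.10]
swin_probs = [0.30, 0.60, 0.10]
regnet_probs = [0.60, 0.30, 0.10]

averaged, predicted = soft_vote([densenet_probs, swin_probs, regnet_probs])
print([round(p, 4) for p in averaged], predicted)
```

In this made-up example one model prefers the second class, yet the averaged probabilities still favor the first, which illustrates how soft voting uses the confidence of each model and not only its top label.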

Conclusion

COVID-19 detection through CT scan images is characterized by high sensitivity, a low misdiagnosis rate, and high commercial practicability. Detecting COVID-19 from CT scan images with deep learning has therefore become a research hotspot. In this study, we developed a deep ensemble model, DeepDSR, to identify CT scan images of patients infected by COVID-19. DeepDSR combines three different state-of-the-art network architectures: DenseNet, Swin transformer, and RegNet. It obtained the best performance compared to three other deep learning models (EfficientNetV2, ResNet, and Vision transformer) as well as the three individual models when classifying CT images into two classes (COVID-19-related or non-COVID-19-related) or three classes (COVID-19-related, other pneumonia, and healthy lung). EfficientNetV2, ResNet, and Vision transformer are three state-of-the-art deep learning models with different network architectures. DeepDSR achieved the best values on all evaluation metrics compared with these three architectures, demonstrating its strong image classification ability. Moreover, DeepDSR aggregates three individual deep models, and ensembling by its nature reduces variance; the lower the correlation between the individual models, the greater this reduction. Therefore, DeepDSR, as an ensemble of different single models, significantly outperforms the three individual models. DeepDSR has three advantages. First, three COVID-19-related CT image datasets were fused to boost its generalization ability. Second, multiple methods, including batch normalization, were adopted to prevent overfitting. Third, as an ensemble of DenseNet, Swin transformer, and RegNet, DeepDSR classifies CT images more accurately than any of its component models.

However, DeepDSR is more complex to train than a single model: it takes more time to train and test and has more parameters to tune, and it therefore requires more computing resources. In the future, we will design more robust ensemble deep learning models to accurately classify images for diseases of interest, including COVID-19 and cancer. In particular, we will consider a deep heterogeneous ensemble framework that combines deep learning models with other supervised learning models to accurately identify images for related diseases.
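
The link between model correlation and ensemble variance mentioned in the conclusion can be made concrete with a standard identity. The derivation below is an idealized illustration, not part of the original analysis: it assumes the $n$ model outputs $f_1, \dots, f_n$ share a common variance $\sigma^2$ and a common pairwise correlation $\rho$, and that they are averaged with equal weights.

```latex
\operatorname{Var}\!\left(\frac{1}{n}\sum_{i=1}^{n} f_i\right)
  = \frac{1}{n^{2}}\left[\, n\sigma^{2} + n(n-1)\rho\sigma^{2} \,\right]
  = \rho\,\sigma^{2} + \frac{1-\rho}{n}\,\sigma^{2}
```

For $n = 3$ this gives $\rho\sigma^{2} + (1-\rho)\sigma^{2}/3$. With perfectly correlated models ($\rho = 1$) averaging brings no reduction, while with uncorrelated models ($\rho = 0$) the variance falls to $\sigma^{2}/3$, which is why combining architecturally different networks is attractive.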

Data availability statement

The original contributions presented in the study are included in the article/supplementary material. Source code is freely downloadable at: https://github.com/plhhnu/DeepDSR/. Datasets 1-3 can be downloaded from the following three links: https://www.kaggle.com/datasets/plameneduardo/a-covid-multiclass-dataset-of-ct-scans; https://www.kaggle.com/datasets/plameneduardo/sarscov2-ctscan-dataset; https://github.com/UCSD-AI4H/COVID-CT.

Author contributions

LP, CW, MC, and ZL: conceptualization. LP, CW, and ZL: methodology. CW and MC: software. LP, CW, GT, and GLiu: validation. LP, MC, and ZL: investigation. CW, GLi, and YL: data curation. LP and CW: writing (original draft preparation). LP, GT, and JY: writing (review and editing). LP: supervision. LP, CW, and MC: project administration. LP and MC: funding acquisition. All authors contributed to the article and approved the submitted version.

Funding

ZL was supported by the National Natural Science Foundation of China under grant no. 62172158. LP was supported by the National Natural Science Foundation of China under grant no. 61803151. GLiu and YL were supported by the Innovation and Entrepreneurship Training Program for College Students of Hunan Province under grant no. S202111535031 and the Innovation and Entrepreneurship Training Program for College Students of Hunan University of Technology under grant no. 20408610119.

Conflict of interest

GT and JY were employed by Geneis (Beijing) Co. Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.
