Literature DB >> 35530171

CrodenseNet: An efficient parallel cross DenseNet for COVID-19 infection detection.

Jingdong Yang1, Lei Zhang1, Xinjun Tang2.   

Abstract

Purpose Although the application of Convolutional Neural Networks (CNNs) to detect COVID-19 infection significantly enhances detection performance and efficiency, it often suffers from low sensitivity and poor generalization. Methods In this article, an effective CNN, CrodenseNet, is proposed for COVID-19 detection. CrodenseNet consists of two parallel DenseNet blocks, each of which contains dilated convolutions with different expansion scales and traditional convolutions. We apply cross-dense connections and one-sided soft thresholding to the layers to filter out noise-related features and to increase the interaction between local and global features. Results Cross-validation experiments on the COVIDx dataset show that CrodenseNet attains a COVID-19 detection precision of 0.967 ± 0.010, recall of 0.967 ± 0.010, F1-score of 0.973 ± 0.005, AP (area under the P-R curve) of 0.991 ± 0.002, and AUC (area under the ROC curve) of 0.996 ± 0.001. Conclusion CrodenseNet outperforms a variety of state-of-the-art models on these evaluation metrics, so it can assist clinicians in the prompt diagnosis of COVID-19 infection.
© 2022 Elsevier Ltd. All rights reserved.

Entities:  

Keywords:  CNN; Cross dense connections; DenseNet; One-sided soft thresholding transformation

Year:  2022        PMID: 35530171      PMCID: PMC9058031          DOI: 10.1016/j.bspc.2022.103775

Source DB:  PubMed          Journal:  Biomed Signal Process Control        ISSN: 1746-8094            Impact factor:   5.076


Introduction

Since December 2019, a novel coronavirus disease has been spreading globally; on February 11, 2020, the disease was officially named COVID-19 by the World Health Organization (WHO) [1]. Because of its rapid global spread, the WHO declared COVID-19 a global pandemic in March 2020 [2]. COVID-19 triggers neuroinflammation, hypoxia, vulnerability of the Blood Brain Barrier (BBB), increased predisposition to secondary infections, and brain dysfunction [3]. Meanwhile, with the outbreak of COVID-19 across the globe, many researchers have studied COVID-19 via Artificial Intelligence (AI). Albahli et al. [4] used Natural Language Processing (NLP) techniques to perform a sentiment analysis at three levels (negative, neutral, and positive) on a two-month Twitter dataset to assess the feelings of Gulf countries towards the pandemic and the lockdown. The results showed that 50.5% of the countries had a neutral attitude towards the pandemic, 31.2% a positive attitude, and 18.3% a negative attitude. Kumar et al. [5] proposed an AI-based automated image classification for sorting COVID-related medical waste streams from other waste. The results showed that the Support Vector Machine (SVM) performed best with 96.5% accuracy, effectively preventing the transmission of COVID-19 infection. Alyasseri et al. [6] reviewed more than 200 papers published from December 2019 to April 2021 on COVID-19 diagnosis using deep learning or machine learning and found that SVM was the most widely used machine learning method for COVID-19 diagnosis or outbreak prediction, while CNN was the most widely used deep learning method; accuracy, sensitivity, and specificity were the most widely used measures in previous studies. Hasoon et al. [7] applied three procedures to classify normal and COVID-19 images. The first procedure was preprocessing, including noise removal, thresholding, and morphological operations.
The second was Region of Interest (ROI) detection and segmentation, in which Local Binary Patterns (LBP) and Histogram of Gradients (HOG) were employed to extract edge features and Haralick descriptors were used to extract texture features. Finally, K-Nearest Neighbors (KNN) and SVM were used for classification. The results showed that the LBP-KNN model, with an average accuracy of 98.66%, outperformed the other models. This shows that machine learning has achieved high performance in COVID-19 detection, but unsatisfactory robustness due to feature engineering. With the development of deep learning, CNNs are usually used instead of feature engineering to accomplish end-to-end automatic detection. The main contributions of this paper are as follows. To extract COVID-19 features of different dimensions and scales, a parallel cross DenseNet, composed of dilated convolutions and traditional convolutions, is proposed to enlarge receptive fields and extract more features at the global and semantic levels. One-sided soft thresholding is employed to remove noise-related features, and auxiliary classification and regularization are integrated to reduce overfitting. Cross-validation experiments on the COVIDx chest X-ray image dataset show that the proposed model outperforms state-of-the-art CNN models in classification accuracy and generalization performance. Grad-CAM visualization is applied to display the key areas of feature extraction and aid clinicians in clinical diagnosis and analysis. In this paper, Section II discusses related works. Section III introduces the overall framework and implementation of the proposed model. Section IV presents the experimental setup, including the dataset description and hyperparameter settings. Section V presents the experimental results and analysis. Section VI discusses computational complexity and feature visualization. Section VII concludes with the main traits and shortcomings of the proposed model.

Related works

CNNs have been used in multiple studies to implement classification of COVID-19, as shown in Table 1 . Al-Waisy et al. [8] used Contrast-Limited Adaptive Histogram Equalization (CLAHE) and a Butterworth bandpass filter to enhance the contrast and eliminate the noise in chest X-ray images, respectively. The images were then input to two different deep learning methods (a deep belief network and a convolutional deep belief network) to distinguish healthy and COVID-19-infected individuals. The experimental results showed this approach can diagnose patients with COVID-19 with a detection accuracy of 99.93% on the large-scale dataset. Barshooi et al. [9] first expanded the dataset by combining traditional data augmentation techniques with Generative Adversarial Networks (GANs) and then applied different filters (Sobel, Laplacian of Gaussian (LoG), and Gabor) for feature extraction; the outputs of the filters, together with the original data, were treated as new data and input into six transfer models for training (AlexNet, GoogLeNet, VGG-19, ShuffleNet V2, DenseNet-121, and DenseNet-201). The experiments showed that the combination of Gabor and DenseNet-201 performed best, with a 2-class classification accuracy of 98.5%. These methods use specific filters during data preprocessing to reduce image noise on specific datasets and improve classification performance. However, variation in the distributions and types of the datasets will affect the classification results to a certain extent.
Table 1

Applications of CNN for detection of COVID-19 infection.

Researchers | Class (COVID-19, Normal, Viral pneumonia) | Models | Overall Accuracy | F1-score of COVID-19
A. S. Al-Waisy et al. [8] | 12000, 12000 | DBN + CDBN | 0.993 | 0.999
A. Barshooi et al. [9] | 360, 4200 | DenseNet-201 | 0.985 | 0.978
S. Rajpal et al. [10] | 520, 520, 520 | ResNet50 | 0.974 ± 0.02 | 0.987
S. Karakanis et al. [11] | 275, 270, 275 | CNN | 0.983 | /
T. Ozturk et al. [12] | 125, 500, 500 | DarkCovidNet | 0.8702 | 0.8737
C. Ouchicha et al. [13] | 219, 1341, 1345 | CVDNet | 0.9669 | 0.9664
L. Wang et al. [14] | 266, 8086, 5538 | COVID-Net | 0.933 | 0.9268
P. Kedia et al. [15] | 266, 8086, 5538 | CoVNet-19 | 0.9828 | 0.99
Rajpal et al. [10] created a segmented feature extraction and classification method for the detection of COVID-19 in chest X-ray images using three modules. The first module extracted 2048 feature vectors using ResNet-50. The second module extracted 252 frequency-domain and texture features, further reduced to 16. The third module concatenated the features of the first and second modules and classified them with a fully connected layer. With this method, performance improved; however, when the data distribution changed, the frequency-domain or texture features changed as well, and the classification performance degraded. Kedia et al. [15] created an integrated structural model (CoVNet-19) that combined two deep models (VGG-19 and DenseNet121) to extract in-depth features from chest X-ray images. An SVM classifier was applied for binary and 3-class classification, yielding accuracies of 0.9971 and 0.9828, respectively. In this method a deep learning model extracts in-depth features of the COVID-19 images and a machine learning model performs the classification. Because the feature extractor and feature classifier are heterogeneous structures, classification performance may decrease when samples follow different distributions. Karakanis et al. [11] first used a GAN to expand the dataset and then applied the ResNet-8 model to build two lightweight deep learning networks for binary and 3-class classification, respectively. The binary classification yielded an accuracy of 0.987, and the 3-class classification an accuracy of 0.983. Because the data generated by the GAN and the original data fell into the same distribution, it was difficult to generate effective data when a large amount of data had different distributions. Therefore, a large dataset fusing multiple datasets should be chosen for training and testing, which can demonstrate the robustness of the model well. Ozturk et al. [12] improved the Darknet-19 model by removing layers and adding filters for binary and multi-class classification, yielding accuracies of 0.9808 and 0.8702, respectively. Darknet-19 is a YOLOv3-based feature extraction network that can effectively extract local pixel position features, but it is not sensitive to global features. Both ResNet-8 and Darknet-19 are lightweight models with fewer parameters, but they may show decreased classification performance on hard samples. Therefore, the trade-off between the number of parameters and classification performance should be taken into account. Ouchicha et al. [13] proposed a 3-class image classification model that connected the residual blocks of two convolution kernels with different scales in parallel and added cross residual links to reuse features of different scales. The model extracted features using two larger convolutional kernels, which increased model complexity. The residual network excels at facilitating the back-propagation of gradients, but it is unable to make full use of the extracted features of different sizes, causing a decrease in classification performance. Wang et al. [14] applied reinforcement learning to generate a special model for COVID-19 datasets and used channel-wise convolutions in place of traditional convolutions to reduce model complexity. Such models show good performance on some specific datasets, but increasing the number of datasets would affect the overall classification performance. In summary, research on medical images has some limitations. For example, a convolutional kernel of fixed size is unable to extract features of different sizes. Feature maps of various dimensions cannot fully influence classification, which reduces feature-utilization efficiency. Meanwhile, the above models lack an effective filtering mechanism, resulting in a reduced feature transmission ratio.
Therefore, we propose an end-to-end model whose feature extractor and classifier are trained simultaneously to maximize the interplay between them. We also use a parallel cross DenseNet, which highly reuses the extracted features and achieves better performance, and we employ a one-sided soft-thresholding structure to automatically filter features further and reduce noise across the different feature layers.

A parallel DenseNet model with Cross-Dense connections

A. Architecture of CrodenseNet model

We propose a novel model, CrodenseNet, based on DenseNet [16], which applies dilated convolution layers [17] and traditional convolution layers to extract COVID-19 features from receptive fields of different scales. The model structure has three major innovations: cross-dense connections, one-sided soft thresholding, and auxiliary classifiers.

Cross-dense connections

As shown in Fig. 1 , on the basis of DenseNet we build a novel block that has two DenseBlocks in parallel, then add cross-dense connections between the two DenseBlock channels; finally, we use conventional convolution with a kernel size of 3 × 3 in one side channel and dilated convolution with an effective kernel size greater than 3 × 3 in the other side channel.
Fig. 1

The architecture of the cross-dense connections.

As a result, each convolution layer in each DenseBlock can accept the output feature maps of the current channel together with those of the parallel channel. This block can extract deep and shallow features of different scales via the complex cross-dense connections. The output of the l-th layer of channel a can be expressed as

x_l^a = Conv(ReLU(BN([x_0^a, x_0^b, x_1^a, x_1^b, …, x_{l-1}^a, x_{l-1}^b])))

where x^a and x^b denote the a-th and b-th channels, [·] is the concatenation of outputs from the 0th to the (l−1)th layer of both channels, BN is a Batch Normalization layer [18], ReLU is the linear rectification function [19], and Conv is a convolutional layer. The bottleneck layer of each channel in the original DenseBlock is a 1 × 1 convolution layer combined with the growth rate: first, the 1 × 1 convolution layer generates feature maps with n × growth-rate channels; then the n × n convolution layer generates feature maps with growth-rate channels, which reduces the number of feature maps and model parameters. Because of the added cross-dense connections, the l-th layer of the block has 2l − 1 inputs, and a block of l layers has 2l² connections in all, consisting of the spliced output feature maps of the first two convolutional layers of each DenseBlock. If the number of input channels is x and the growth rate is a, the final number of output channels is 2x + (2l − 1)a.

One-sided soft thresholding

As shown in Fig. 2 , we apply one-sided soft thresholding, a nonlinear transformation with a channel-attention mechanism, to the cross-dense connections. Pixels smaller than τ are set to zero, while pixels larger than τ are reduced by τ. The improved attention-mechanism architecture is shown in Fig. 3 .
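As a hedged sketch of one such layer (assuming PyTorch; the 4 × growth-rate bottleneck width and the exact BN → ReLU → Conv ordering follow DenseNet-BC and are our assumptions), a layer of one channel concatenates all earlier outputs of both channels before convolving:

```python
import torch
import torch.nn as nn

class CrossDenseLayer(nn.Module):
    """One layer of one channel: BN -> ReLU -> 1x1 Conv (bottleneck) ->
    BN -> ReLU -> 3x3 Conv over the concatenation of all earlier feature
    maps from *both* channels (the cross-dense input). Illustrative sketch."""
    def __init__(self, in_channels, growth_rate, dilation=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1, bias=False),
            nn.BatchNorm2d(4 * growth_rate),
            nn.ReLU(inplace=True),
            nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3,
                      padding=dilation, dilation=dilation, bias=False),
        )

    def forward(self, prev_feats):
        # prev_feats: earlier outputs of both channels, concatenated on dim 1
        return self.body(torch.cat(prev_feats, dim=1))

layer = CrossDenseLayer(in_channels=2 * 64, growth_rate=16)
a = torch.randn(1, 64, 32, 32)   # feature maps from channel a
b = torch.randn(1, 64, 32, 32)   # feature maps from channel b (cross link)
out = layer([a, b])
print(out.shape)                 # torch.Size([1, 16, 32, 32])
```

Each new layer contributes growth-rate feature maps, so the concatenated input grows with depth exactly as the 2x + (2l − 1)a channel count describes.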
Fig. 2

Illustration of one-sided soft thresholding.

Fig. 3

Attention Mechanism.

According to SENet [20], we apply a Global Average Pooling (GAP) layer to compress the feature map of each channel, yielding a series of real numbers equal in count to the number of channels. Then, using a fully connected layer with a ReLU transformation, the number of channels is reduced by a factor of 8 to lower model complexity, and a second fully connected layer restores the number of channels to the original value. A softmax transformation maps each real number into the range (0, 1) so that they sum to 1. The threshold τ of each channel is then calculated by multiplying the real number after the GAP by the channel-attention weight; the purpose of the softmax transformation is to keep each τ within the range of its own channel's values. Following DRSN-CS [21], we filter the original images for denoising using soft thresholding combined with the ReLU nonlinear transformation to build a one-sided soft-thresholding structure, and we activate the feature maps on each channel via a nonlinear one-sided soft-thresholding transformation with respect to τ. Therefore, in each channel, features greater than their respective τ are set to the current feature minus τ, and those smaller than τ are set to zero. This method can learn the importance of each channel automatically, eliminate interfering pixels, and reduce their influence on the subsequent convolutional layers.

Crodense Block

We name the combination of cross-dense connections and the nonlinear one-sided soft-thresholding transformation the CrodenseBlock. Lemma: given the inputs, the one-sided soft-thresholding transformation, and the convolution kernels of the two channels, the output of the one-sided channel after a one-layer CrodenseBlock is obtained as follows. As shown in Fig. 3, GAP in the Attention Mechanism module represents the global average pooling layer, w the fully connected layer parameters, r the features after one-sided soft thresholding, and y the result after the convolution operation. The computation proceeds in three steps: (1) transform the feature map via one-sided soft thresholding; (2) apply the convolution operation and the nonlinear ReLU transformation to the feature maps; (3) concatenate the feature maps of the two channels through the cross-dense connections. As illustrated in Fig. 4 , the CrodenseBlock1 module is made up of two parallel channels with a total of 6 layers (n), each layer consisting of two parallel convolutional layers, giving a total of 72 connections. The growth rate (g) is set to 16. The left channel uses a traditional convolution layer with a kernel size of 3 × 3, whose small receptive field extracts small-scale features. In the right channel, the traditional convolution layer is replaced by a dilated convolution layer with an expansion rate (d) of 8; seven zeros are inserted between the weights of each convolution kernel, resulting in an effective kernel size of 17 × 17. This makes use of a large receptive field to extract large-scale features. The feature maps of each convolution layer are transformed by the nonlinear one-sided soft-thresholding transformation and transmitted to subsequent convolution layers via the cross-dense connections. Different in-depth features are continuously added during feature transmission, and different feature maps are highly reused across the two channels. The model thereby integrates large-scale context information to improve classification accuracy.
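A minimal NumPy sketch of the one-sided soft thresholding with its attention-derived per-channel τ (the weight shapes, the ReLU between the two fully connected layers, and the use of the absolute GAP value are our assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_sided_soft_threshold(x, tau):
    # Values above tau are shrunk by tau; the rest are zeroed: ReLU(x - tau).
    return np.maximum(x - tau, 0.0)

def channel_thresholds(fmap, w1, w2):
    """Per-channel tau = GAP value * attention weight; the softmax keeps
    each weight in (0, 1) so tau stays within the scale of its channel."""
    gap = np.abs(fmap).mean(axis=(1, 2))             # (C,) per-channel average
    alpha = softmax(np.maximum(gap @ w1, 0.0) @ w2)  # FC -> ReLU -> FC -> softmax
    return gap * alpha                               # (C,) thresholds

rng = np.random.default_rng(0)
fmap = rng.normal(size=(8, 4, 4))   # a C = 8 feature map
w1 = rng.normal(size=(8, 1))        # channel count reduced by 8x
w2 = rng.normal(size=(1, 8))        # restored to 8 channels
tau = channel_thresholds(fmap, w1, w2)
out = one_sided_soft_threshold(fmap, tau[:, None, None])
print(out.min() >= 0)               # True: everything below its tau is zeroed
```

Because the transform equals ReLU(x − τ), it keeps the usual one-sided activation while suppressing small, noise-related responses per channel.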
Fig. 4

Architecture of CrodenseBlock.

As shown in Fig. 5 , CrodenseNet applies four different CrodenseBlocks to extract features from feature maps of various sizes to the greatest extent possible. Following each CrodenseBlock, the feature outputs of the left and right channels are connected to different transition layers. The left transition layer consists of a 1 × 1 convolution layer and a 2 × 2 average pooling layer, while the right transition layer consists of a 1 × 1 convolution layer and a 3 × 3 convolution layer with a stride (s) of 2. In the right channel, the convolution layer with a stride of 2 replaces the average pooling layer to prevent the loss of spatial characteristics that occurs when average pooling reduces the feature scale.
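A hedged PyTorch sketch of the two transition layers (the channel halving by the 1 × 1 convolutions follows Table 4; bias-free convolutions are our assumption):

```python
import torch
import torch.nn as nn

def transitions(in_ch):
    """The two transition layers after a CrodenseBlock: the left halves the
    channels with a 1x1 conv then average-pools; the right replaces the
    pool with a stride-2 3x3 conv to preserve spatial detail."""
    left = nn.Sequential(
        nn.Conv2d(in_ch, in_ch // 2, kernel_size=1, bias=False),
        nn.AvgPool2d(kernel_size=2, stride=2),
    )
    right = nn.Sequential(
        nn.Conv2d(in_ch, in_ch // 2, kernel_size=1, bias=False),
        nn.Conv2d(in_ch // 2, in_ch // 2, kernel_size=3, stride=2,
                  padding=1, bias=False),
    )
    return left, right

left, right = transitions(304)        # after CrodenseBlock1: 304 x 64 x 64
x = torch.randn(1, 304, 64, 64)
print(left(x).shape, right(x).shape)  # both torch.Size([1, 152, 32, 32])
```

Both branches halve the channels and the spatial size, but the strided convolution on the right learns its own downsampling instead of averaging.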
Fig. 5

Architecture of CrodenseNet.

Auxiliary classifier

Following the four CrodenseBlock structures, we add two identical auxiliary classifiers at the outputs of the two channels to provide additional effective gradient signals and extra regularization to curb overfitting, following GoogLeNet [22]. The feature maps of the two channels are concatenated and fed into the main classifier for the final decision. To prevent overfitting, all classifiers have three fully connected layers, with a Dropout layer [23] added after each of the first two fully connected layers. After the final fully connected layer, the feature map is fed into the softmax function [24], which yields the probability distribution over the corresponding classes.
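One auxiliary head can be sketched as follows, following the Classifier1/2 rows of Table 4 (the ReLU activations between the fully connected layers are our assumption):

```python
import torch
import torch.nn as nn

# GAP -> FC(512) -> Dropout -> FC(256) -> Dropout -> FC(3) -> Softmax.
aux_head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(960, 512), nn.ReLU(inplace=True), nn.Dropout(p=0.4),
    nn.Linear(512, 256), nn.ReLU(inplace=True), nn.Dropout(p=0.4),
    nn.Linear(256, 3), nn.Softmax(dim=1),
)
probs = aux_head(torch.randn(2, 960, 8, 8))
print(probs.shape)   # torch.Size([2, 3]); each row sums to 1
```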

Implementation of CrodenseNet model

The CrodenseNet model in this paper uses three cross-entropy loss functions to calculate the loss between the softmax output probabilities and the actual probabilities. CrodenseNet thus includes two auxiliary classifiers (Classifier1 and Classifier2) with losses L1 and L2, as well as a primary classifier (Classifier3) with loss L3. The auxiliary classifiers are used to improve effectiveness in the training stage. Different weights are assigned to the three cross-entropy loss functions: 0.1 for each auxiliary classifier and 0.8 for the main classifier. In back-propagation, the loss is therefore

L = 0.1 L1 + 0.1 L2 + 0.8 L3, with Li = −(1/n) Σ_{j=1}^{n} Σ_{c=1}^{k} y_{jc} log p_{jc},

where n is the number of images, k is the number of classes, y is the image label, and p is the predicted probability. The optimizer employs stochastic gradient descent (SGD) with a learning rate μ of 1e-3. The number of training epochs ∊ is 100, and the batch size β is 32. In each epoch, β chest X-ray images are selected per training step, and after η1 steps (each chest X-ray image in the training set is trained once), the entire validation set is tested and the test result λ is calculated from the output of the main classifier. The F1-score on COVID-19 is chosen as the evaluation metric for λ. When λ reaches its highest value, the current training parameters ω are saved, and training proceeds to the next epoch until it is complete. Finally, ω is loaded for testing and the classification performance is evaluated. Table 2 depicts the detailed process.
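A hedged PyTorch sketch of this weighted objective (the stand-in logits replace the real network outputs; note that nn.CrossEntropyLoss applies log-softmax internally, whereas the paper describes softmax followed by cross-entropy):

```python
import torch
import torch.nn as nn

# Weighted loss L = 0.1*L1 + 0.1*L2 + 0.8*L3 over the two auxiliary
# classifiers and the main classifier, as described above.
ce = nn.CrossEntropyLoss()

def total_loss(out1, out2, out3, target):
    return 0.1 * ce(out1, target) + 0.1 * ce(out2, target) + 0.8 * ce(out3, target)

# Stand-in logits for a batch of beta = 32 images and k = 3 classes.
logits = [torch.randn(32, 3, requires_grad=True) for _ in range(3)]
target = torch.randint(0, 3, (32,))
opt = torch.optim.SGD(logits, lr=1e-3)   # SGD with the paper's learning rate

loss = total_loss(*logits, target)
loss.backward()                          # gradients reach all three heads
opt.step()
print(loss.item() > 0)                   # True
```

The auxiliary terms inject gradient signal into the intermediate layers during training; at test time only the main classifier's output is used.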
Table 2

The Implementation of CrodenseNet model.

Input: chest X-ray images, training set δ1, validation set δ2
μ → the CNN initial learning rate
ξ → iteration steps in the CNN
∊ → the CNN maximum number of iterations
α → iteration steps in a training
β → the number of images covered in a training
η1 → the maximum number of iterations in a training epoch (η1 ← δ1/β)
η2 → the maximum number of iterations in a validation epoch (η2 ← δ2/β)
λ → variables of saving accuracy (λ ← 0)
Output: ω → CNN weights
Start

Initialize the CNN parameters: μ, ∊, β, λ

Lightly preprocess the images to a resolution of 3 × 256 × 256

Train the CNN and compute the weights

for ξ = 1 to ∊ do

 for α = 1 to η1 do

  Select a mini-batch from δ1 with the β size

  Forward propagation and compute the loss using μ

  Back-propagation and update ω with SGD optimization

 end

 for α = 1 to η2 do

  Select a mini-batch from δ2 with the β size

  Forward propagation and get the results of the CNN

 end

 If the F1-score of COVID-19 > λ do

  Save ω and λ ← F1-score of COVID-19

end

End


Experimental setup

A. Dataset description

The dataset in this paper is the COVIDx dataset, which includes 3 classes: COVID-19, normal, and viral pneumonia. It is made up of five different public COVID-19 datasets: (1) COVID-19 Image Data Collection [25], (2) Figure 1 COVID-19 Chest X-ray Dataset [26], (3) ActualMed COVID-19 Chest X-ray Dataset [27], (4) RSNA Pneumonia Detection Challenge dataset [28], and (5) COVID-19 Radiography Database [29]. There is a total of 19,164 chest X-ray images, including 4187 COVID-19 images, 8851 normal images, and 6126 viral pneumonia images. Table 3 shows the detailed distribution of each dataset. Sample chest X-ray images of COVIDx are shown in Fig. 6 .
Table 3

Distribution of COVIDx Chest X-ray images datasets.

ID | COVID-19 | NORMAL | PNEUMONIA | TOTAL IMAGES
1 | 478 | N/A | 57 | 535
2 | 35 | N/A | 57 | 92
3 | 58 | N/A | N/A | 58
4 | N/A | 8851 | 6012 | 14863
5 | 3616 | N/A | N/A | 3616
Fig. 6

COVIDx Datasets.

As shown in Fig. 7 , we randomly selected 100 images from each class, 300 images in total, and used t-distributed stochastic neighbor embedding (t-SNE) to project all samples into 2 dimensions. As can be seen, most images of different classes are discriminative, with COVID-19 on the left, normal on the lower right, and viral pneumonia on the upper right. However, since this dataset comes from 5 smaller datasets with different distributions, the data distributions of the 3 image classes may overlap.
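The projection in Fig. 7 can be reproduced in outline with scikit-learn (random vectors stand in for the 300 selected chest X-ray images; the perplexity value is our choice):

```python
import numpy as np
from sklearn.manifold import TSNE

# 2-D t-SNE embedding of flattened images, as in Fig. 7.
rng = np.random.default_rng(0)
images = rng.normal(size=(30, 64 * 64))   # 30 stand-in "images"
emb = TSNE(n_components=2, perplexity=10, random_state=0).fit_transform(images)
print(emb.shape)                          # (30, 2)
```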
Fig. 7

t-SNE of COVIDx Datasets.


Evaluation metrics and data augmentation

Precision, Sensitivity (Recall), F1-score, Accuracy, AP (area under the P-R curve), and AUC (area under the ROC curve) are the evaluation metrics in this paper. TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, respectively. The metrics are defined as follows:

Precision = TP / (TP + FP)
Sensitivity (Recall) = TP / (TP + FN)
F1-score = 2 × Precision × Recall / (Precision + Recall)
Accuracy = (TP + TN) / (TP + TN + FP + FN)

The chest X-ray images in this paper come from different COVID-19 datasets. We therefore first reduce the resolution of the chest X-ray images to 260 × 260, then rotate randomly within (−10°, +10°) and flip horizontally at random, and finally center-crop the images to the remaining 256 × 256 pixels. This augmentation effectively removes text interference and highlights the ROI. Before being fed into the model, all cropped images are normalized; the RGB means and standard deviations are set to (0.485, 0.456, 0.406) and (0.229, 0.224, 0.225), respectively.
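A worked example of these metric definitions for a toy confusion outcome (the counts are illustrative only):

```python
# Toy confusion counts to illustrate the metric definitions above.
TP, TN, FP, FN = 90, 95, 5, 10

precision = TP / (TP + FP)
recall = TP / (TP + FN)                             # sensitivity
f1 = 2 * precision * recall / (precision + recall)
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(round(precision, 3), round(recall, 3), round(f1, 3), accuracy)
# 0.947 0.9 0.923 0.925
```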

Hyperparameter setting

The detailed hyperparameters of CrodenseNet are shown in Table 4 . The size of the input images is 3 × 256 × 256. The process of feature extraction is as follows. The left channel goes through a convolution layer with a 3 × 3 kernel and a stride (s) of 2; the right channel goes through a dilated convolution layer with a scale (d) of 16 and a stride of 2, so the effective kernel size becomes 33 × 33. The feature map grows to 64 × 128 × 128. A 2 × 2 maximum pooling layer follows on the left, and a 3 × 3 convolution layer with a stride of 2 on the right. After this step, the feature map is reduced to 64 × 64 × 64. The images are then processed by the first CrodenseBlock layer, which has a growth rate (g) of 16, 6 layers (n), and an expansion rate (d) of 8 for the right channel. The feature maps change to 304 × 64 × 64. The left and right feature maps each pass through a 1 × 1 convolution layer, which reduces the number of channels, giving 152 × 64 × 64. The left side then passes through a 2 × 2 average pooling layer, and the right side through a 3 × 3 convolution layer with a stride of 2, resulting in a feature map of 152 × 32 × 32. Then comes the second CrodenseBlock layer, with a growth rate of 16, 12 layers, and an expansion rate of 2 for the right channel; the feature map becomes 672 × 32 × 32. After the second transition layer, the feature map is 336 × 16 × 16.
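The stem shapes described above can be checked with a short PyTorch sketch (the padding values are our assumptions, chosen to reproduce the stated 128 × 128 output):

```python
import torch
import torch.nn as nn

# Stem of CrodenseNet: a plain 3x3 stride-2 convolution on the left and a
# dilated (d = 16, effective kernel 33x33) stride-2 convolution on the right.
left = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1)
right = nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=16, dilation=16)

x = torch.randn(1, 3, 256, 256)
print(left(x).shape)    # torch.Size([1, 64, 128, 128])
print(right(x).shape)   # torch.Size([1, 64, 128, 128])
```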
Table 4

Configuration of CrodenseNet hyperparameters.

Name | Channel 1 layers | Channel 1 size of filters | Channel 2 layers | Channel 2 size of filters | Output shape (depth × height × width)
Feature | Input | — | Input | — | 3 × 256 × 256
 | Conv | 3 × 3, s = 2, p = 2 | Conv | 3 × 3, s = 2, p = 16, d = 16 | 64 × 128 × 128
 | MP | 2 × 2, s = 2 | Conv | 3 × 3, s = 2, p = 1 | 64 × 64 × 64
 | CrodenseBlock1 | g = 16, n = 6, d = 8 (both channels) | | | 304 × 64 × 64
 | Conv | 1 × 1, s = 1 | Conv | 1 × 1, s = 1 | 152 × 64 × 64
 | AP | 2 × 2, s = 2 | Conv | 3 × 3, s = 2, p = 1 | 152 × 32 × 32
 | CrodenseBlock2 | g = 16, n = 12, d = 2 (both channels) | | | 672 × 32 × 32
 | Conv | 1 × 1, s = 1 | Conv | 1 × 1, s = 1 | 336 × 32 × 32
 | AP | 2 × 2, s = 2 | Conv | 3 × 3, s = 2, p = 1 | 336 × 16 × 16
 | CrodenseBlock3 | g = 16, n = 24, d = 2 (both channels) | | | 1424 × 16 × 16
 | Conv | 1 × 1, s = 1 | Conv | 1 × 1, s = 1 | 712 × 16 × 16
 | AP | 2 × 2, s = 2 | Conv | 3 × 3, s = 2, p = 1 | 712 × 8 × 8
 | CrodenseBlock4 | g = 16, n = 16, d = 1 (both channels) | | | 1920 × 8 × 8
 | Conv | 1 × 1, s = 1 | Conv | 1 × 1, s = 1 | 960 × 8 × 8
Classifier1,2 | GAP | — | GAP | — | 960 × 1 × 1
 | FC | — | FC | — | 512 × 1 × 1
 | Dropout | p = 0.4 | Dropout | p = 0.4 | —
 | FC | — | FC | — | 256 × 1 × 1
 | Dropout | p = 0.4 | Dropout | p = 0.4 | —
 | FC | — | FC | — | 3 × 1 × 1
 | Softmax | — | Softmax | — | 3 × 1 × 1
Classifier3 | Concat (both channels) | | | | 1920 × 8 × 8
 | Conv | 3 × 3, s = 2, p = 1 | | | 1024 × 4 × 4
 | Conv | 3 × 3, s = 1, p = 1 | | | 512 × 4 × 4
 | GAP | | | | 512 × 1 × 1
 | FC | | | | 512 × 1 × 1
 | Dropout | p = 0.4 | | | —
 | FC | | | | 256 × 1 × 1
 | Dropout | p = 0.4 | | | —
 | FC | | | | 3 × 1 × 1
 | Softmax | | | | 3 × 1 × 1
The feature map is then 1424 × 16 × 16 after passing through the third CrodenseBlock layer, whose growth rate is 16, number of layers 24, and right-channel expansion rate 2. After the third transition layer, the feature map becomes 712 × 8 × 8. Through the fourth CrodenseBlock layer, whose growth rate is 16 and number of layers 16, the right channel reverts to a plain 3 × 3 convolution layer, and the feature map becomes 1920 × 8 × 8. Then, using 1 × 1 convolution layers, the number of channels is reduced, giving 960 × 8 × 8. The structure of the left and right auxiliary classifiers is as follows. First, the feature map is globally average-pooled to 960 × 1 × 1; it then passes through the first fully connected layer and a Dropout layer, and the feature map becomes 512 × 1 × 1. The feature map is reduced to 256 × 1 × 1 after the second fully connected layer and Dropout layer. Finally, it becomes 3 × 1 × 1 after the third fully connected layer and the Softmax layer. The main classifier is structured as follows: the feature maps of the left and right channels are concatenated into 1920 × 8 × 8. The feature map changes to 1024 × 4 × 4 after a 3 × 3 convolution layer with a stride of 2, and to 512 × 4 × 4 after a 3 × 3 convolution layer with a stride of 1. The main classifier reduces the number of channels to fuse the feature information of each channel. After global average pooling, the feature map is 512 × 1 × 1. It becomes 512 × 1 × 1 after the first fully connected and Dropout layers, and 256 × 1 × 1 after the second; finally, the feature map becomes 3 × 1 × 1 after the third fully connected layer and the Softmax layer.

Experimental results and analysis

Performance of COVID-19 detection

This paper conducts a 3-class classification experiment on the COVIDx dataset, with 70% of the data used for training, 10% for validation, and 20% for testing. Fig. 8 shows the confusion matrices of 5-fold cross-validation, as well as the PR and ROC curves for the detection of COVID-19 infection. Each fold consists of 832 COVID-19 images. CrodenseNet has the highest AUC (0.996) in each fold on COVID-19 when compared to DenseNet, CvdNet, CovidNet, and DarkCovidNet, indicating that CrodenseNet has a high positive detection rate on COVID-19 and is suitable for rapid COVID-19 testing. Its AP in each fold on COVID-19 is also the highest (0.991), indicating that CrodenseNet achieves good classification performance even though the number of COVID-19 instances is significantly lower than that of the normal or viral pneumonia classes.
Fig. 8

5-fold Confusion Matrix of CrodenseNet and PR and ROC curve for COVID-19 detection.

Fig. 9 shows that the evaluation metrics of CrodenseNet, i.e. Precision, Recall, and F1-score, on COVID-19 detection are the highest across 5-fold cross-validation. Table 5 provides detailed statistics for the classification performance of the various models on COVID-19 detection. Compared to DenseNet, CvdNet, CovidNet, and DarkCovidNet, CrodenseNet improves Precision by 0.7%, 2.3%, 5%, and 1.8%, respectively; F1-score by 1.1%, 2.1%, 5.8%, and 1.9%, to 0.973; Recall by 1.6%, 1.9%, 6.5%, and 1.3%; AUC by 0.1%, 0.1%, 0.9%, and 0.1%, to 0.996; and AP by 1%, 0.4%, 2.9%, and 0.5%, respectively. The generalization performance of CrodenseNet surpasses the four state-of-the-art models, and it achieves an average recall of 0.967 on COVID-19, which can help clinical physicians detect potential false-negative COVID-19 patients effectively and in a timely manner, reducing COVID-19 transmission risks via early diagnosis.
Fig. 9

5-fold performance of various models on COVID-19.

Table 5

Performance of various models on COVID-19 detection.

Model         | Precision     | Recall        | F1-score      | AP            | AUC
DenseNet121   | 0.972 ± 0.004 | 0.952 ± 0.009 | 0.962 ± 0.005 | 0.981 ± 0.005 | 0.995 ± 0.001
CrodenseNet   | 0.979 ± 0.004 | 0.967 ± 0.010 | 0.973 ± 0.005 | 0.991 ± 0.002 | 0.996 ± 0.001
CvdNet        | 0.957 ± 0.013 | 0.949 ± 0.014 | 0.953 ± 0.004 | 0.987 ± 0.002 | 0.995 ± 0.001
CovidNet      | 0.932 ± 0.013 | 0.908 ± 0.019 | 0.920 ± 0.009 | 0.963 ± 0.006 | 0.987 ± 0.002
DarkCovidNet  | 0.962 ± 0.010 | 0.949 ± 0.015 | 0.955 ± 0.003 | 0.986 ± 0.003 | 0.995 ± 0.001

Performance of CrodenseNet model

Fig. 10 shows the 5-fold cross-validation results on 3-class classification. Because CrodenseNet focuses on the prediction of COVID-19 infection, its performance on normal and viral pneumonia prediction is somewhat lower, which reduces overall performance. However, its average Precision, Recall, F1-score, AP, and AUC are higher than those of the other state-of-the-art models. Table 6 shows the evaluation metrics of the various models. Compared to DenseNet121, CvdNet, CovidNet, and DarkCovidNet, CrodenseNet improves precision by 0.5%, 1.5%, 6.7%, and 2.3%, respectively; recall by 1.1%, 2.3%, 7.1%, and 1.9%, to 0.935; F1-score by 0.9%, 2%, 6.8%, and 2.1%, to 0.937; and accuracy by 0.8%, 1.9%, 6.9%, and 2.1%, respectively. Therefore, the generalization performance of CrodenseNet also outperforms the four state-of-the-art models on normal and viral pneumonia detection in terms of evaluation metrics.
Fig. 10

5-fold performance of various models on 3-class classification.

Table 6

Performance and computational complexity of various models.

Model         | Precision     | Recall        | F1-score      | Accuracy      | Time (GFLOPs) | Parameters (n)
DenseNet121   | 0.935 ± 0.006 | 0.925 ± 0.010 | 0.929 ± 0.008 | 0.927 ± 0.008 | 3.76          | 6,956,931
CrodenseNet   | 0.940 ± 0.007 | 0.935 ± 0.013 | 0.937 ± 0.010 | 0.934 ± 0.010 | 6.08          | 43,571,503
CvdNet        | 0.926 ± 0.007 | 0.914 ± 0.013 | 0.919 ± 0.010 | 0.917 ± 0.011 | 6.78          | 5,321,571
CovidNet      | 0.881 ± 0.005 | 0.873 ± 0.006 | 0.877 ± 0.006 | 0.874 ± 0.005 | 8.18          | 165,971,219
DarkCovidNet  | 0.919 ± 0.004 | 0.918 ± 0.009 | 0.918 ± 0.006 | 0.915 ± 0.006 | 0.30          | 1,167,586
The CrodenseNet model has a total of 43,571,503 parameters, the majority of which come from the nonlinear transformation with one-sided soft thresholding and the fully connected layers of the final classifier. Because a large number of 1 × 1 convolution layers are used, the time complexity is kept low. As a result, CrodenseNet can quickly predict COVID-19 infection on batches of chest X-ray images.

Ablation study

To analyze the contribution of each block of CrodenseNet, we extend the basic parallel DenseNet with different blocks, namely cross-dense connection (CD), dilated convolution (DC), one-sided soft thresholding (ST), and multiple classifier (MC) blocks, and conduct an ablation study on the first-fold test data. As shown in Table 7, the basic parallel DenseNet has the lowest overall performance, due to its small receptive field and inefficient use of superficial and in-depth features. Adding cross-dense connections to each DenseNet module transmits superficial features effectively to the in-depth layers, which improves feature reuse and enhances classification performance. After adding dilated convolutions with different expansion rates to each cross-dense network and parallel DenseNet, features with different receptive field sizes can be extracted effectively, and the overall performance rises. After adding one-sided soft thresholding to the cross-dense network, CrodenseNet can automatically set a different soft threshold for each channel according to the channel's importance in the feature map, effectively filtering channel-wise noise and improving classification performance. Following the addition of multiple classifiers to the cross-dense network and parallel DenseNet, each channel receives more efficient gradient signals and additional regularization, and the final decision is made through the main classifier; this is equivalent to adding a classification model with shared features that also serves as an ensemble. As a result, the overall performance is significantly improved. CrodenseNet combines all four modules (CD, DC, ST, and MC): it extracts features with receptive fields of varying size and uses cross-dense connections and one-sided soft thresholding to effectively remove noise interference from the feature maps.
Furthermore, the multiple classifiers provide more gradient signals and regularization, improving the overall performance of the model. With all blocks added, CrodenseNet achieves a Precision of 0.938, Recall of 0.934, F1-score of 0.936, and Accuracy of 0.932 on the first-fold test data.
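The exact form of the one-sided soft thresholding transformation is not given in this excerpt; the sketch below assumes the common one-sided form max(x − τ, 0), i.e. a ReLU shifted by a learned channel-wise threshold τ, which matches the text's description of a different threshold per channel. It is an illustrative NumPy implementation, not the authors' code.

```python
import numpy as np

def one_sided_soft_threshold(x, tau):
    """Per-channel one-sided soft thresholding on a (C, H, W) feature map.

    Assumed form: y = max(x - tau_c, 0). Each channel c is shifted down by
    its learned threshold tau_c and clipped at zero, so small (noise-related)
    activations are suppressed while strong activations pass through.
    The exact transformation used by CrodenseNet may differ.
    """
    return np.maximum(x - tau.reshape(-1, 1, 1), 0.0)

x = np.array([[[0.2, 1.5], [-0.3, 0.9]],
              [[2.0, 0.1], [0.4, 1.1]]])   # toy (C=2, H=2, W=2) feature map
tau = np.array([0.5, 1.0])                 # per-channel thresholds
y = one_sided_soft_threshold(x, tau)       # weak activations become 0
```

In a trained network the thresholds would be produced per channel by a small attention branch rather than fixed by hand as here.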
Table 7

Ablation study of CrodenseNet.

Model             | Precision | Recall | F1-score | Accuracy
parallel DenseNet | 0.910     | 0.898  | 0.903    | 0.903
+DC               | 0.922     | 0.905  | 0.913    | 0.912
+MC               | 0.922     | 0.912  | 0.916    | 0.915
+CD               | 0.923     | 0.914  | 0.918    | 0.916
+CD + DC          | 0.924     | 0.921  | 0.922    | 0.921
+CD + ST          | 0.926     | 0.924  | 0.925    | 0.922
+CD + MC          | 0.938     | 0.923  | 0.930    | 0.927
CrodenseNet       | 0.938     | 0.934  | 0.936    | 0.932
We add one-sided soft thresholding to different CrodenseBlocks to find the configuration that achieves the highest performance. As shown in Fig. 11(a), adding ST to the first two CrodenseBlocks performs best. This is because the front CrodenseBlocks extract superficial texture features, while the back CrodenseBlocks extract in-depth features; one-sided soft thresholding reduces noise in the superficial features, further improving performance.
Fig. 11

Ablation study for ST and DC.

We vary the dilation rates of the dilated convolution layers in the four CrodenseBlocks to assess their impact on performance. Fig. 11(b) shows that performance increases as the dilation rate of the dilated convolutions grows. Upon reaching the critical setting (10, 4, 4, 1), performance declines abruptly. Dilated convolution enlarges the receptive field and strengthens the relations among global features, but when the feature map is small and the dilation rate is large, the relations between long-range features weaken and performance declines. For CrodenseNet, the weight of the main classifier is set to 0.8 and the weight of the subsidiary classifiers to 0.01, 0.04, 0.1, 0.4, 0.8, and 1, respectively. The results on the test dataset are shown in Fig. 12. The main classifier performs best when the subsidiary-classifier weight is set to 0.1. Meanwhile, with a weight of 0.8 or 1 there is a small decrease in main-classifier performance, though it remains higher than with a weight of 0.01 or 0.04. Therefore, the subsidiary classifiers of the two channels enhance gradient-transmission efficiency, improve the accuracy of feature extraction in the convolutional layers, add extra regularization, and further improve the performance of the main classifier.
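The effect of dilation on the receptive field can be checked with simple arithmetic: a k × k kernel with dilation rate d spans d·(k − 1) + 1 pixels along each axis, and stacked layers grow the receptive field by the standard recurrence. The sketch below is a plain-Python illustration (not from the paper) of why a dilation rate approaching the feature-map size stops helping, as observed at the (10, 4, 4, 1) setting.

```python
def effective_kernel(k, d):
    """Span of a k x k kernel with dilation rate d along one axis."""
    return d * (k - 1) + 1

def receptive_field(layers):
    """Receptive field of a stack of (kernel, dilation, stride) conv layers.

    Standard recurrence: rf += (effective_kernel - 1) * jump; jump *= stride.
    """
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

# A 3x3 kernel with dilation 2 covers a 5x5 area with only 9 weights.
span = effective_kernel(3, 2)                           # -> 5
# Three stacked 3x3 convs (stride 1) with dilations 1, 2, 4:
rf = receptive_field([(3, 1, 1), (3, 2, 1), (3, 4, 1)]) # -> 15
# On an 8x8 map, a single dilation-4 3x3 kernel already spans 9 pixels,
# wider than the map itself, which is consistent with the observed
# performance drop when the dilation rate grows too large.
```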
Fig. 12

Test for MC in ablation study.


Discussion

Computational complexity

Model complexity is a key factor influencing prediction error: low complexity leads to underfitting, while high complexity causes overfitting. Time complexity and the number of model parameters are often used to represent model complexity [30]. Time complexity, i.e. the number of model operations, is measured in floating-point operations (FLOPs) and represents the time required to train or predict. Model parameters are the trainable weights of the convolutional and fully connected layers; the greater the number of model parameters, the greater the amount of memory required. For the convolutional layers, the time complexity and number of parameters are calculated as

Time ~ O( Σ_{l=1}^{D} M_l² · K_l² · C_{l−1} · C_l ),    Params ~ O( Σ_{l=1}^{D} K_l² · C_{l−1} · C_l ),

where M is the side length of the feature maps, K is the side length of the convolutional kernel, D is the number of convolutional layers, and C_l represents the number of output channels of the l-th convolutional layer. As shown in Table 6, the limitation of CrodenseNet is its use of a large number of fully connected layers, which increases both the number of parameters and the time complexity. Hence, we plot a radar chart for each model to showcase the trade-off between time complexity and qualitative performance, as shown in Fig. 13. Our proposed CrodenseNet has outstanding comprehensive performance overall, despite an increase in time complexity compared to DenseNet121 and DarkCovidNet. The radar-chart areas per 10^6 pixels for CrodenseNet, DenseNet121, DarkCovidNet, CvdNet, and CovidNet are 1.108, 1.081, 1.007, 0.65, and 0.051, respectively. Thus the comprehensive performance of CrodenseNet is broadly similar to that of DenseNet121 and DarkCovidNet, and much better than that of CvdNet and CovidNet.
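The two formulas above can be turned into a small counting helper. The sketch below applies them to the 1 × 1 channel-reduction layer described earlier (1920 → 960 channels on an 8 × 8 map); the layer dimensions come from the text, while the helper itself is a generic illustration rather than the counting code used by the authors.

```python
def conv_flops(m, k, c_in, c_out):
    """Multiply-accumulate count of one conv layer: M^2 * K^2 * C_in * C_out."""
    return m * m * k * k * c_in * c_out

def conv_params(k, c_in, c_out, bias=True):
    """Trainable parameters of one conv layer: K^2 * C_in * C_out (+ bias)."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# The 1 x 1 convolution reducing 1920 channels to 960 on an 8 x 8 feature map:
params = conv_params(1, 1920, 960)   # 1*1*1920*960 + 960 = 1,844,160 parameters
flops = conv_flops(8, 1, 1920, 960)  # 8*8*1*1*1920*960 = 117,964,800 MACs
```

This also makes visible why 1 × 1 convolutions are cheap relative to 3 × 3 ones: the K² factor shrinks both counts by a factor of nine.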
Fig. 13

Radar chart of each model's performance.


Features of visualization

To assist clinicians in the analysis and diagnosis of COVID-19, the heat-map visualization technique Grad-CAM [31] is applied to highlight the feature areas the model focuses on when classifying chest X-ray images. As shown in Fig. 14, (a) and (c) are the original X-ray images, while (b) and (d) are the corresponding heat maps showing the key regions/pixels of feature extraction. The more vibrant the color in the heat map, the more attention the model pays to that region. Besides the lungs, the red regions for COVID-19 recognition are concentrated in the upper right and upper left of the image, including shoulder regions and noise pixels. This is because the pathological features of COVID-19 are concentrated at the lung edges, so the extracted features cannot be completely concentrated within the lungs. The red regions for Normal recognition are concentrated in the right lung, where there are fewer interference pixels; because the lungs of a Normal-class patient have no pathological features, the model can focus on the left and right lungs. The red regions for Viral recognition are concentrated in both lungs, indicating that the pathological features of viral pneumonia are more visible and the model can accurately identify the lesion areas. When predicting COVID-19 and viral pneumonia, CrodenseNet is easily influenced by interference pixels outside the lungs. X-ray images should therefore be cropped appropriately before being fed into the classification model, to remove interference pixels and improve attention on key features.
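The Grad-CAM weighting that produces these heat maps can be sketched in a few lines. This NumPy illustration assumes the last convolutional feature maps and the gradient of the class score with respect to them have already been extracted (e.g. via framework hooks); it shows the weighting step itself, not the authors' implementation.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Grad-CAM localization map from conv features and class-score gradients.

    feature_maps, gradients: arrays of shape (C, H, W). The channel weights
    are the globally averaged gradients; the map is the ReLU of the weighted
    sum over channels, normalized to [0, 1] for display as a heat map.
    """
    weights = gradients.mean(axis=(1, 2))                                  # (C,)
    cam = np.maximum((weights[:, None, None] * feature_maps).sum(axis=0), 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()
    return cam

rng = np.random.default_rng(0)
cam = grad_cam(rng.standard_normal((960, 8, 8)),
               rng.standard_normal((960, 8, 8)))   # 8x8 map, upsampled for overlay
```

In practice the 8 × 8 map is bilinearly upsampled to the input resolution and overlaid on the X-ray image.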
Fig. 14

Heatmap showing the parts of the input image (a), (c) that triggered the prediction (b), (d).

Heatmap showing the parts of the input image (a), (c) that triggered the prediction (b), (d).

Conclusion

In this paper, the CrodenseNet model, built on the DenseNet framework, is proposed. Two DenseNet blocks with different convolutional kernel sizes and parallel cross-dense connections extract the deep features of chest X-ray images, and a nonlinear transformation with one-sided soft thresholding filters interference noise, enabling automatic 3-class identification of COVID-19, normal, and viral pneumonia. According to cross-validation experiments on the COVIDx dataset, CrodenseNet achieves a precision of 97.9%, a recall of 96.7%, and an F1-score of 97.3% for COVID-19 detection. CrodenseNet outperforms various state-of-the-art models in classification accuracy and generalization performance, but has higher computational complexity than lightweight models. Therefore, we would like to optimize the model architecture to decrease time complexity. Meanwhile, a spatial attention mechanism is needed to improve the ability of feature extraction for COVID-19 infection. In the future, we plan to design a more efficient architecture for COVID-19 detection, reduce the time complexity of the model, and improve the sensitivity of prediction.

Funding

This research is funded by the (81973749).

Ethical approval

Not required.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.