
A COVID-19 CXR image recognition method based on MSA-DDCovidNet.

Wei Wang1, Wendi Huang1, Xin Wang1, Peng Zhang2, Nian Zhang3.   

Abstract

Coronavirus disease 2019 (COVID-19) has not yet been contained. Detecting infected persons from chest X-ray (CXR) images with deep learning methods is a safe and effective screening approach. To this end, a dual-path multi-scale fusion (DMFF) module and a dense dilated depth-wise separable (D3S) module are used to extract shallow and deep features, respectively. Based on these two modules and a multi-scale spatial attention (MSA) mechanism, a lightweight convolutional neural network model, MSA-DDCovidNet, is designed. Experimental results show that the accuracy of the MSA-DDCovidNet model on COVID-19 CXR images is as high as 97.962%. In addition, the proposed MSA-DDCovidNet has lower computational complexity and fewer parameters. Compared with other methods, MSA-DDCovidNet can help diagnose COVID-19 more quickly and accurately.
© 2022 The Authors. IET Image Processing published by John Wiley & Sons Ltd on behalf of The Institution of Engineering and Technology.


Year:  2022        PMID: 35601273      PMCID: PMC9111165          DOI: 10.1049/ipr2.12474

Source DB:  PubMed          Journal:  IET Image Process        ISSN: 1751-9659            Impact factor:   2.373


INTRODUCTION

The 2019‐nCoV is spreading at an extremely fast rate. Coronavirus disease 2019 (COVID‐19), caused by 2019‐nCoV, has put many countries and regions with scarce medical resources and low medical standards into trouble. The most commonly used method for diagnosing COVID‐19 is detection based on reverse transcriptase polymerase chain reaction (RT‐PCR). It has high specificity, but the demand for detection kits keeps increasing [1]. In addition, its sensitivity is low, which makes it prone to false negative diagnostic results, and false negatives have serious consequences for COVID‐19 prevention. For countries and regions where medical resources are scarce, a fast, reliable, and low‐cost detection method should be sought. Chest X‐ray (CXR) is the most widely used imaging test for diagnosing heart and other chest diseases [2]. Compared with CT scans, CXR is more widely available, and X‐rays deliver lower ionizing radiation [3]. Detecting diseases through chest radiographs is an extremely challenging task that requires professional knowledge and careful observation. COVID‐19 presents radiological features that can be detected by CXR. However, if these characteristics are analysed by manual film reading, it not only takes up a lot of medical staff's time but is also prone to errors caused by visual fatigue and other disturbances. Therefore, it is necessary to find a way to automate the detection of CXR images. The purpose of this study is to find a lightweight and accurate automatic CXR image recognition method for COVID‐19 to assist medical staff in diagnosis. Since convolutional neural networks (CNNs) have excellent performance in image recognition tasks, especially image classification, a CNN model is adopted to realize this method.
In order to ensure that the model can accurately identify COVID‐19 CXR images in a low‐cost way, depth‐wise separable convolution [4], feature reuse and multi‐scale feature fusion are fully exploited in the design of the network structure. The remainder of the paper is arranged as follows: Section 2 discusses related work on CNN image recognition and medical image recognition. Section 3 describes the structure of our proposed network and its modules. Section 4 presents the experimental dataset, parameter settings and experimental results, and analyses the results in detail. Section 5 analyses the advantages of the structure of MSA‐DDCovidNet and the limitations of the study. Section 6 summarizes the paper and describes our prospects for the future of this study.

RELATED WORK

In recent years, deep learning has been widely used in medical image detection. For example, Wang W et al. [5] applied deep-learning-based image classification to colonic polyps and proposed the improved approaches VGGNets‐GAP and ResNets‐GAP with global average pooling (GAP) to classify colonoscopy polyp images for assisted diagnosis. Inspired by DenseNet [6] and MobileNet [4], Wang W et al. [7] proposed Dense‐MobileNet, which achieved good performance on a children's colonoscopy polyp dataset. As a representative branch of deep learning, the convolutional neural network (CNN) has excellent performance in image feature extraction and learning [8]. Therefore, researchers recommend using deep learning to help detect lesion information in CXR images, save medical resources, and improve diagnostic efficiency. For example, Khan et al. [9] proposed CoroNet based on the structure of Xception [10], which achieved good performance on COVID‐19 CXR image classification. Based on Xception [10] and ResNet50V2 [11], Rahimzadeh et al. [12] designed a network that improves performance by combining the output features of the two networks; it achieved good results on a dataset containing three classes of CXR images: COVID‐19, pneumonia and normal. Wang et al. [13] designed the channel feature weight extraction (CFWE) module according to the characteristics of CXR images and proposed a new CFW‐Net. Ozturk et al. [14] proposed DarkCovidNet, an improvement on the DarkNet‐19 network, which achieved good classification accuracy. To recognize COVID‐19 CXR images, Wang et al. [15] designed a new network, MCFF‐Net, based on the Parallel Channel Attention Feature Fusion (PCAF) module. Wang et al. [16] proposed a new method to detect COVID‐19 patients in CXR images based on MAI‐Nets, obtaining an excellent result with an accuracy of 96.42%.

ARCHITECTURE DESIGN

Commonly, CXR images of different classes are highly similar to one another (high inter‐class similarity), while CXR images within the same class carry few class‐specific cues. This leads to model bias and overfitting, which reduces the performance and generalization of the model. Moreover, a CNN for mobile terminals requires few parameters and fast inference, otherwise it causes delays and undermines recognition efficiency. In response to these problems, a new lightweight CNN, MSA‐DDCovidNet, is proposed, based on the DMFF module, the D3S module and the multi‐scale spatial attention (MSA) mechanism.

DMFF module and D3S module

The DMFF module and the D3S module are proposed by our team, and both are built on depth‐wise separable convolution. They are computationally efficient and have strong representational capacity on shallow and deep feature maps, respectively. Their structure diagrams are shown in Figures 1 and 2. In Figures 1 and 2, H, W, and C denote the height, width, and channels of the feature maps, respectively; f is the number of convolution kernels, k the size of the convolution kernel, and s the stride. Depth‐wise separable convolution decomposes the convolution into two steps: a depth‐wise convolution and a point‐wise convolution. This decomposition greatly reduces the amount of calculation and the number of model parameters. Applying h‐swish can reduce inference latency [17], so h‐swish is adopted as the activation function in the network.
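The depth‐wise separable decomposition with an h‐swish activation described above can be sketched in PyTorch (the paper's framework) as follows. This is an illustrative building block, not the authors' exact layer; the kernel size and channel numbers below are example values.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise convolution followed by point-wise (1x1) convolution,
    with h-swish activation, as described in the text."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, dilation=1):
        super().__init__()
        padding = dilation * (kernel_size - 1) // 2
        # Depth-wise step: one k x k filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size, stride=stride,
                                   padding=padding, dilation=dilation,
                                   groups=in_ch, bias=False)
        # Point-wise step: 1 x 1 convolution mixes channel information
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.act = nn.Hardswish()  # h-swish

    def forward(self, x):
        return self.act(self.pointwise(self.depthwise(x)))

# Parameter comparison against a standard convolution (64 -> 128 channels)
std = nn.Conv2d(64, 128, 3, padding=1, bias=False)
dws = DepthwiseSeparableConv(64, 128)
p_std = sum(p.numel() for p in std.parameters())   # 3*3*64*128 = 73728
p_dws = sum(p.numel() for p in dws.parameters())   # 3*3*64 + 64*128 = 8768
```

The roughly 8× parameter reduction in this example illustrates why the decomposition makes the network lightweight.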
FIGURE 1

The structure of DMFF Module and D3S Module

FIGURE 2

The structure of Multi‐scale Spatial Attention mechanism

The DMFF module splits the input feature maps along the channel dimension into two branches. After increasing the channels with a point‐wise convolution, one branch uses a dilated depth‐wise convolution layer, that is, a depth‐wise convolution whose kernel is dilated with an expansion rate of 2 instead of an ordinary kernel. The other branch uses a depth‐wise convolution layer after a point‐wise convolution layer. Finally, the feature maps of the two branches are concatenated along the channel dimension, and the output is obtained after a channel shuffle [18] operation. The receptive fields of the two branches are clearly different, so the channel‐wise concatenation realizes multi‐scale feature fusion and enhances the spatial representational capacity of the model. Since dilated convolution with an expansion rate of 2 does not increase the complexity of the model [11], the two branches have the same parameters and amount of calculation. Because the features extracted by convolution layers close to the input contain detailed texture information, the DMFF module is used in the shallow layers of the proposed network.
The D3S module is based on a dilated depth‐wise separable convolution layer and a dense connection. The input feature maps pass through a dilated depth‐wise separable convolution layer, and the resulting feature maps are channel‐wise concatenated with the input feature maps as the output of the module. Compared with standard convolution, dilated depth‐wise separable convolution has fewer parameters, less calculation and a larger receptive field, which makes the model more lightweight and efficient. The features extracted in the deep layers of the network are more critical for distinguishing heterogeneous samples, and feature reuse can alleviate information loss.
Therefore, the D3S module will be used in the deep layers of the proposed network.
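A minimal sketch of the D3S module as described above: a dilated depth‐wise separable convolution whose output is channel‐wise concatenated with the input (dense connection). The growth of 16 channels follows the H · W · C → H · W · (C + 16) shape used in the complexity analysis; the h‐swish activation placement is an assumption.

```python
import torch
import torch.nn as nn

class D3SModule(nn.Module):
    """Sketch of the D3S module: dilated depth-wise separable convolution
    plus dense connection (channel-wise concat with the input)."""
    def __init__(self, in_ch, growth=16, dilation=2):
        super().__init__()
        pad = dilation  # keeps H and W unchanged for a 3x3 kernel
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=pad,
                                   dilation=dilation, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, growth, 1, bias=False)
        self.act = nn.Hardswish()

    def forward(self, x):
        out = self.act(self.pointwise(self.depthwise(x)))
        # Feature reuse via dense connection: C -> C + growth channels
        return torch.cat([x, out], dim=1)

m = D3SModule(32)
y = m(torch.randn(1, 32, 28, 28))  # 32 channels in, 48 channels out
```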

Multi‐scale spatial attention (MSA) mechanism

Inspired by Kim et al. [19], a novel multi‐scale spatial attention (MSA) mechanism is proposed. Before being fed to the fully connected layer, the feature maps are passed through the MSA attention, as shown in Figure 2. Let there be L successive D3S modules in the network. On the one hand, to obtain a spatial attention map, the feature maps output by the first DMFF module are passed through a global average pooling layer and a standard convolution layer; the resulting feature maps are taken as the spatial attention map. On the other hand, three groups of feature maps containing semantic features of different depths are channel‐wise concatenated, so the resulting feature maps contain rich multi‐scale deep features. These feature maps are multiplied with the spatial attention map to extract the key spatial information. Compared with a single‐scale spatial attention mechanism, the MSA mechanism can capture feature information at different depths and has better spatial representational capacity.
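One possible reading of the mechanism above can be sketched as follows. Since the paper does not specify the pooling axis or gating function, this sketch assumes the shallow feature map is averaged along the channel axis, convolved into a single‐channel spatial map, sigmoid‐gated, resized, and broadcast‐multiplied with the concatenated multi‐scale deep features; the 7 × 7 kernel and sigmoid are assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSAttention(nn.Module):
    """Hedged sketch of a multi-scale spatial attention mechanism:
    a spatial map derived from shallow features reweights the
    channel-wise concatenation of several deep feature maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(1, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, shallow, deep_feats):
        # Average-pool the shallow features along the channel axis
        pooled = shallow.mean(dim=1, keepdim=True)        # (N, 1, H1, W1)
        attn = torch.sigmoid(self.conv(pooled))           # spatial attention map
        fused = torch.cat(deep_feats, dim=1)              # multi-scale fusion
        # Match the attention map to the deep feature resolution
        attn = F.interpolate(attn, size=fused.shape[2:], mode='bilinear',
                             align_corners=False)
        return fused * attn                               # broadcast over channels

msa = MSAttention()
shallow = torch.randn(1, 16, 112, 112)
deep = [torch.randn(1, c, 7, 7) for c in (64, 80, 96)]
out = msa(shallow, deep)  # 64 + 80 + 96 = 240 channels
```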

The structure of MSA‐DDCovidNet

The structure of MSA‐DDCovidNet is shown in Figure 3. The input image is preprocessed before being fed to the model. The first layer contains dilated convolution filters with an expansion rate of 2. Then the DMFF module is applied five times to halve the spatial dimensions (height and width) of the feature maps, remove redundant information and compress the features. A depth‐wise separable convolution layer is then used to enrich the feature information. Next, nine successive D3S modules extract deep features and alleviate the vanishing of gradients. The MSA mechanism then extracts spatial‐domain information from the multi‐scale feature maps. After the global average pooling layer, the spatial size of the feature maps becomes 1 × 1, and a point‐wise convolution layer is used to increase the feature dimension. Next, a fully connected layer reduces the impact of feature coordinate information on classification. Finally, the SoftMax layer performs the classification.
FIGURE 3

The structure of MSA‐DDCovidNet


Network complexity

In this work, the amount of computation and the number of parameters are adopted to measure the complexity of the model. The parameters are produced by the weight layers of the CNN, mainly the convolution layers and the fully connected layers. The amount of computation refers to floating‐point operations (FLOPs); every operation in the network, even a simple element‐wise addition, contributes to it. Both quantities depend mainly on the depth and width of the model, the resolution of the input images and the structure of the model.
For an input feature map of size Hi · Wi · Ci and an output feature map of size Ho · Wo · Co, a standard convolution with kernel size k × k produces the parameters Pstd and the amount of computation Fstd:
Pstd = k² · Ci · Co,  Fstd = k² · Ci · Co · Ho · Wo.
Since dilated convolution with an expansion rate of 2 does not increase parameters or calculations, a depth‐wise convolution and a depth‐wise dilated convolution with an expansion rate of 2 both produce the parameters Pdw and the amount of computation Fdw:
Pdw = k² · Ci,  Fdw = k² · Ci · Ho · Wo.
For an input feature map H · W · C and an output feature map (H / 2) · (W / 2) · (C + 16), a standard convolution with kernel size 3 × 3 therefore produces Pconv3_1 = 9 · C · (C + 16) parameters and Fconv3_1 = 9 · C · (C + 16) · (H / 2) · (W / 2) operations. When the DMFF module performs the same dimension conversion, its parameters PDMFF and computation FDMFF are obtained by summing the point‐wise and depth‐wise terms of its two branches, giving the reductions ∆DMFF_P = Pconv3_1 − PDMFF and ∆DMFF_F = Fconv3_1 − FDMFF. Similarly, for an input feature map H · W · C and an output feature map H · W · (C + 16), a standard 3 × 3 convolution produces Pconv3_2 = 9 · C · (C + 16) parameters and Fconv3_2 = 9 · C · (C + 16) · H · W operations, while the D3S module yields PD3S and FD3S, with reductions ∆D3S_P = Pconv3_2 − PD3S and ∆D3S_F = Fconv3_2 − FD3S. In all cases ∆DMFF_P > 0, ∆DMFF_F > 0, ∆D3S_P > 0 and ∆D3S_F > 0, which means the DMFF module and the D3S module make a positive contribution to reducing the parameters and calculation. The complexity of the MSA mechanism can be analysed in the same way: for three groups of input feature maps with shapes H · W · C, H · W · (C + 16) and H · W · (C + 32), an output feature map H · W · (3 · C + 48) and a shallow feature map H1 · W1 · C1, the parameters PMSA and the amount of computation FMSA follow from the same per‐layer formulas.
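The arithmetic behind these reductions can be checked with a short script. This computes parameter and FLOP (multiply‐accumulate) counts for a standard convolution versus a depth‐wise separable replacement; bias terms are ignored, and the example shapes (C = 64, H = W = 28, growth of 16 channels) are illustrative, not taken from a specific layer of the network.

```python
# Parameter and FLOP counts for a standard convolution versus its
# depth-wise separable replacement, following the formulas in the text.

def standard_conv(k, c_in, c_out, h_out, w_out):
    params = k * k * c_in * c_out
    flops = params * h_out * w_out
    return params, flops

def depthwise_separable_conv(k, c_in, c_out, h_out, w_out):
    p_dw = k * k * c_in        # depth-wise: one k x k filter per channel
    p_pw = c_in * c_out        # point-wise: 1 x 1 convolution to c_out
    flops = (p_dw + p_pw) * h_out * w_out
    return p_dw + p_pw, flops

# Example: H x W x C -> H x W x (C + 16) with C = 64, H = W = 28
p_std, f_std = standard_conv(3, 64, 80, 28, 28)
p_sep, f_sep = depthwise_separable_conv(3, 64, 80, 28, 28)
reduction_p = p_std - p_sep   # positive: fewer parameters
reduction_f = f_std - f_sep   # positive: fewer operations
```

Under these example shapes the separable version needs 5696 parameters against 46 080 for the standard convolution, matching the positive reductions asserted above.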

EXPERIMENTAL RESULTS

Dataset

Two different datasets were used in this study. The first is used in the comparative experiments between the MSA‐DDCovidNet network and several state‐of‐the‐art CNNs. The CXR images in this dataset come from two sources: the Kaggle CXR dataset [20] (https://www.kaggle.com/paultimothymooney/chest‐xray‐pneumonia) and the dataset collected by Joseph et al. [21]. The Kaggle CXR dataset has a total of 5863 images, including pneumonia and normal CXR images; from these two classes, 4265 pneumonia images and 1575 normal images were selected. The dataset of Joseph et al. has a total of 790 CXR and CT images of patients infected with COVID‐19 or other pneumonia, from which 412 CXR images of COVID‐19 patients were selected. Therefore, the experimental dataset in this article contains a total of 6252 images. 310 COVID‐19 images, 1341 normal images, and 3875 pneumonia images are randomly selected from the experimental dataset as the training set; the remaining 102 COVID‐19 images, 234 normal images, and 390 pneumonia images are used as the test set. In the following sections, the COVIDx dataset [22] is adopted to verify the performance of MSA‐DDCovidNet on another CXR image dataset. The COVIDx dataset is obtained according to the dataset generation method provided by Wang et al. [22], yielding 589 COVID‐19 images, 8851 normal images and 6053 pneumonia images. Similar to the method of Nihad et al. [23], 100 COVID‐19 images, 885 normal images, and 594 pneumonia images in COVIDx are randomly selected as the test set, and the remainder serves as the training set. Figure 4 shows examples of the CXR images in the experimental dataset of this work. It reflects the high inter‐class similarity and low intra‐class variance of CXR images, which increases the difficulty of the CXR image classification task.
FIGURE 4

Cases of CXR images. (a) COVID‐19 CXR images, mainly characterized by pulmonary interstitial oedema and exudation, thickening of pulmonary markings, and multiple patchy and spotted shadows. (b) Normal CXR images. (c) Pneumonia CXR images


The evaluation criteria of model

In terms of model evaluation criteria, we refer to the criteria adopted by most medical image classification models. Accuracy, precision, sensitivity, specificity, F1‐score, the receiver operating characteristic (ROC) curve and the area under the curve (AUC) are adopted. The formulas for the first five criteria are as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Sensitivity = TP / (TP + FN)
Specificity = TN / (TN + FP)
F1‐score = 2 · Precision · Sensitivity / (Precision + Sensitivity)

In these equations, TP denotes true positives, FP false positives, FN false negatives, and TN true negatives.
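These criteria can be computed directly from confusion‐matrix counts. The example values below reproduce the class‐wise COVID‐19 numbers reported later in the paper (97 of 102 positives detected, no false positives among the 624 non‐COVID test images); the function itself is a generic sketch.

```python
# Evaluation criteria computed from confusion-matrix counts.

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # also called recall
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1

# COVID-19 class on the test set: 97 true positives, 5 missed,
# and all 624 non-COVID images (234 normal + 390 pneumonia) kept negative
acc, prec, sens, spec, f1 = metrics(tp=97, fp=0, fn=5, tn=624)
# sensitivity = 97 / 102, i.e. the 95.10% reported for COVID-19
```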

Preprocessing and parameter settings

Since model training requires sufficient data samples, data augmentation techniques are used in this work. First, the CXR images are scaled to a fixed size of 256 × 256, and a centre crop is applied to obtain a size of 224 × 224. Then a series of augmentations is applied to the training set: the CXR images are flipped horizontally with a probability of 0.5, and the brightness, contrast, and saturation of the images are randomly adjusted to 0.6–1.4 times their original values. After augmentation, the number of samples effectively used for training is four times the size of the training set. All experiments are conducted in the same configuration environment; the software platform and hardware environment are shown in Table 1.
TABLE 1

Experimental platform configuration

Attribute          Configuration information
Operating system   Ubuntu 18.04.1
CPU                Intel(R) Xeon(R) CPU E5‐2670 v3 @ 2.30GHz
GPU                GeForce RTX 2080
CUDNN              CUDNN 7.5.0
CUDA               CUDA 10.0.130
Framework          PyTorch
IDE                PyCharm
Language           Python
After many experiments, the following training strategy was adopted. The initial learning rate of the experimental models was set to 0.001. Each group of experiments was trained for 150 epochs, and the loss function was the cross‐entropy loss with label smoothing regularization [24] (epsilon = 0.1). The Adam [25] optimizer with betas = (0.9, 0.999) was used to make the models converge quickly. The batch sizes of the training set and test set were 32 and 16, respectively.
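The training configuration above can be sketched in PyTorch as follows. Recent PyTorch versions expose label smoothing directly on nn.CrossEntropyLoss; the paper implements it as a separate regularized loss [24], so this is an equivalent substitute rather than the authors' code, and the placeholder linear model stands in for MSA‐DDCovidNet.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for MSA-DDCovidNet (3-class output)
model = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224 * 3, 3))
# Cross-entropy with label smoothing, epsilon = 0.1
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
# Adam with the stated learning rate and betas
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))

# One illustrative training step
images = torch.randn(32, 3, 224, 224)        # training batch size 32
labels = torch.randint(0, 3, (32,))          # COVID-19 / normal / pneumonia
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```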

Experimental results and analysis

In order to illustrate the lightweight design and classification performance of our proposed model, several state‐of‐the‐art models are used as the control group in the experiments, such as VGG19 [26], GoogLeNet [27], ResNet50 [28] and DenseNet121 [6]. The control group also contains various lightweight networks such as SqueezeNet1.0 [29], ShuffleNet [30], MobileNetV2 [18] and ShuffleNetV2 [31]. The performance of these models is shown in Table 2. As can be seen from Table 2, the classification accuracy, precision, sensitivity, specificity and F1‐score of MSA‐DDCovidNet are 97.96%, 98.09%, 98.07%, 98.33% and 98.07%, respectively. Every criterion of our proposed network is better than those of the other networks. Taking ResNet50 [28] as an example, its accuracy of 93.53% is the highest among the traditional networks in our experiment, yet it is still 4.43% lower than that of the proposed network.
TABLE 2

Values of criteria of experimented models

Model               Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  F1‐score (%)
VGG19 [26]          93.11         96.09          92.93            96.47            93.02
GoogLeNet [27]      92.56         95.29          91.56            95.78            92.06
ResNet50 [28]       93.53         96.01          93.15            96.53            93.34
DenseNet121 [6]     93.11         95.98          92.75            96.38            92.92
SqueezeNet1.0 [29]  67.91         45.83          50.51            64.16            57.93
MobileNet [4]       88.53         90.14          87.25            91.84            87.89
ShuffleNet [30]     87.02         90.08          86.17            92.31            86.59
MobileNetV2 [18]    89.26         91.89          88.51            93.16            88.89
ShuffleNetV2 [31]   92.01         91.92          91.74            96.29            91.87
MSA‐DDCovidNet      97.96         98.09          98.07            98.33            98.07
In terms of network complexity, it can be seen from Table 3 that the parameters and the amount of calculation of MSA‐DDCovidNet outperform those of the other methods. Taking the lightweight networks ShuffleNet [30] and SqueezeNet1.0 [29] as examples, they have the least amount of calculation and the fewest parameters in the control group, respectively, but they are still not as lightweight as our network, and their classification performance is also far below ours. Moreover, as shown in Table 3, the parameters and the amount of calculation of ResNet50 [28] are 54.68 and 43.21 times those of ours, respectively, which is obviously not as lightweight as MSA‐DDCovidNet.
TABLE 3

Parameters and flops of several deep learning models and MSA‐DDCovidNet

Model           Flops (million)  Params (million)
VGG19           18 736.81        137.04
GoogLeNet       1 434.21         5.32
ResNet50        3 919.13         22.42
DenseNet121     2 731.91         6.62
SqueezeNet_1.0  702.71           0.73
MobileNet       560.73           3.11
ShuffleNet      142.02           0.91
MobileNetV2     311.13           2.13
ShuffleNetV2    144.72           1.22
MSA‐DDCovidNet  90.69            0.41
Figure 5 shows the confusion matrix of MSA‐DDCovidNet on the test set. As can be seen from Figure 5, the sensitivity for COVID‐19 is 95.10%, with 97 of the 102 tested images detected correctly. In addition, the true detection rate of the Normal class is 98.29%, and the Pneumonia class achieves a 98.46% success ratio. Based on this confusion matrix, the values of various criteria of MSA‐DDCovidNet are calculated, as shown in Table 4. The weighted average precision, sensitivity, and specificity of MSA‐DDCovidNet are all higher than 97%, at 97.95%, 97.93% and 98.23%, respectively. More notably, the precision and specificity of MSA‐DDCovidNet in recognizing COVID‐19 reach 100%. Since the baseline sensitivity for COVID‐19 CXR images is 69% [32], this shows that our proposed network can effectively improve the diagnostic efficiency of COVID‐19.
FIGURE 5

The confusion matrix of MSA‐DDCovidNet

TABLE 4

Precision, sensitivity, specificity of MSA‐DDCovidNet on test set

Class      Precision (%)  Sensitivity (%)  Specificity (%)
COVID‐19   100            95.10            100
Normal     97.05          98.29            98.58
Pneumonia  97.96          98.46            97.55
Average    97.95          97.93            98.23
In addition, some deep learning methods for the detection of COVID‐19 in CXR images are compared with MSA‐DDCovidNet, as shown in Table 5. DarkCovidNet [14] has the fewest parameters among the five comparison models, but its parameter count is still 2.68 times that of MSA‐DDCovidNet, and its classification accuracy is 10.94% lower. ECOVNet‐Soft [23] has the highest accuracy among the five comparison models, which is still 2.26% lower than that of our proposed network, while its parameter count is 12.146 times ours. Therefore, considering both network performance and complexity, our proposed network is a recommendable intelligent method for recognizing CXR images of COVID‐19.
TABLE 5

Comparison of MSA‐DDCovidNet with other deep learning methods developed using X‐ray images

Method                  Numbers of cases                           Model              Accuracy (%)  Params (million)
Rahimzadeh et al. [12]  224 COVID‐19, 700 Pneumonia, 504 Normal    XResNet50V2 [12]   92.85         45.37
Wang et al. [22]        358 COVID‐19, 5538 Pneumonia, 8066 Normal  Covid‐Net [22]     93.3          11.75
Khan et al. [9]         284 COVID‐19, 657 Pneumonia, 310 Normal    CoroNet [9]        94.59         33.00
Ozturk et al. [14]      125 COVID‐19, 500 Pneumonia, 500 Normal    DarkCovidNet [14]  87.02         1.10
Nihad et al. [23]       589 COVID‐19, 6053 Pneumonia, 8851 Normal  ECOVNet‐Soft [23]  95.70         4.98
Our Method              412 COVID‐19, 4265 Pneumonia, 1575 Normal  MSA‐DDCovidNet     97.96         0.41

The results of these methods were obtained on different datasets; if they are evaluated on the same dataset, the performance differences become more intuitive and convincing. To further verify the effectiveness of MSA‐DDCovidNet, a supplementary experiment is conducted on the COVIDx [22] dataset: the six models in Table 5 are trained and compared under the experimental environment and parameter settings of this study (see Section 4.3 for details). The results are shown in Table 6.
TABLE 6

Values of criteria of experimented models

Model           Accuracy (%)  Precision (%)  Sensitivity (%)  Specificity (%)  F1‐score (%)
XResNet50V2     80.87         75.69          80.87            81.87            78.19
Covid‐Net       93.22         93.19          93.22            93.79            93.17
CoroNet         94.81         94.85          94.81            95.45            94.78
DarkCovidNet    74.86         70.14          74.86            76.24            72.42
ECOVNet‐Soft    86.83         81.26          86.83            87.20            83.92
MSA‐DDCovidNet  90.63         90.86          90.63            92.51            90.65
As shown in Table 6, CoroNet, proposed by Khan et al. [9], outperforms the other models on all criteria. Based on Xception [10], CoroNet [9] adopts depth‐wise separable convolution instead of standard convolution to reduce the parameters of the model; however, the large depth and width of the network still result in a mass of parameters. Covid‐Net [22] makes full use of point‐wise convolution and depth‐wise separable convolution in its PEPX module, which effectively reduces the parameters and finally obtains good performance with fewer parameters. XResNet50V2 [12] by Rahimzadeh et al. contains two parallel sub‐networks, Xception [10] and ResNet50V2 [11], and adopts a fully connected layer to classify the features extracted by the two sub‐networks, which produces a mass of parameters. Moreover, its complex structure makes it difficult to optimize; in the end, it needs more parameters but cannot obtain good performance. The structure of DarkCovidNet [14] is similar to VGGNet [26], consisting of standard convolution layers, max pooling layers and fully connected layers. Its few parameters and low depth and width make it difficult to fit a relatively large dataset such as COVIDx, so DarkCovidNet [14] performs poorly in this experiment. After preprocessing, the CXR images in COVIDx are finally resized to 224 × 224. For a fairer comparison, the ECOVNet‐Soft in this experiment is based on the EfficientNet‐b0 model rather than the original EfficientNet‐b5. The ECOVNet‐Soft obtained in this way is a relatively lightweight network, and its performance in this experiment differs slightly from that in the original paper [23]; such a difference is considered reasonable given the difference in hardware devices. MSA‐DDCovidNet is the model with the fewest parameters in the experiment.
Due to the application of depth‐wise separable convolution, feature reuse and multi‐scale feature fusion, it still performs well in this experiment. From a comprehensive point of view, although CoroNet [9] and Covid‐Net [22] achieve better performance through sophisticated designs, their parameter counts are more than 28 times that of MSA‐DDCovidNet. Moreover, MSA‐DDCovidNet performs better than the more complex models XResNet50V2 [12], DarkCovidNet [14] and ECOVNet‐Soft [23]. The ROC curve is an effective evaluation method that reflects the classification performance of a model, capturing the trade‐off between the true positive rate and the false positive rate. Figure 6 shows the ROC curves of the six models; the labels in Figure 6 show the micro‐average, macro‐average and class‐wise AUC scores.
FIGURE 6

ROC curves of MSA‐DDCovidNet and the other deep learning models in Table 5

The comparison results in Figure 6 are similar to those in Table 6. Both CoroNet [9] and Covid‐Net [22] have better ROC curves and AUC values, and the performance of MSA‐DDCovidNet is only behind these two networks. It can also be seen in Figure 6 that the three underperforming networks, XResNet50V2, DarkCovidNet, and ECOVNet‐Soft, have poor classification ability for COVID‐19. Under the same experimental settings as the other models, DarkCovidNet underperforms; the intuitive explanation is that its low depth and width make it difficult to detect the relatively few COVID‐19 CXR images among the numerous CXR images. In contrast, MSA‐DDCovidNet achieves relatively good performance with fewer parameters. In summary, MSA‐DDCovidNet is a network worth applying to CXR image recognition.

DISCUSSION

In order to verify that the multi‐scale spatial attention mechanism is better than the traditional spatial attention mechanism, a control network, SSA‐DDCovidNet, is designed by replacing the attention mechanism in MSA‐DDCovidNet with the traditional single‐scale spatial attention mechanism. Figure 7 shows the accuracy curves of the two networks on the experimental dataset. As can be seen from Figure 7, the average accuracy of the proposed network over 150 epochs is higher than that of SSA‐DDCovidNet, and its highest accuracy is 2.03% higher.
FIGURE 7

Accuracy curves of MSA‐DDCovidNet and SSA‐DDCovidNet on test set. The red line represents the accuracy curve of MSA‐DDCovidNet and the green line represents the accuracy curve of SSA‐DDCovidNet. The two curves peaked at the 120th epoch and the 109th epoch, respectively

An additional experiment is conducted to verify the choice of the feature map used to obtain the spatial attention map. Two comparison networks are designed: D3S9Net and DMFF5Net. In MSA‐DDCovidNet, the output feature map of the 1st DMFF module is used to generate the spatial attention map; in D3S9Net, the output feature map of the 9th D3S module is used; and in DMFF5Net, the output feature map of the 5th DMFF module is used. The 1st DMFF module, the 5th DMFF module and the 9th D3S module lie in the shallow, middle and deep layers of the network, respectively, so feature maps of different depths are adopted to generate attention maps and their performance is compared. The test accuracy curves of the three networks are shown in Figure 8. Our interpretation of the result is that the feature map loses some spatial information at each downsampling; since the features in the shallow feature map have not been compressed many times, they are relatively complete. Therefore, it is more reasonable to obtain the spatial attention map in the shallow layers of the network.
FIGURE 8

Accuracy curves of MSA‐DDCovidNet, D3S9Net and DMFF5Net on test set. The red line represents the accuracy curve of MSA‐DDCovidNet, the green line represents the accuracy curve of D3S9Net and the black line denotes the accuracy curve of DMFF5Net. The three curves peaked at the 120th epoch, the 89th epoch and the 113th epoch respectively

As a lightweight network, MSA‐DDCovidNet benefits greatly from its structure, but a performance gap remains between it and some sophisticated, highly complex networks. The model needs further study and improvement. In future work, MSA‐DDCovidNet will be rescaled to use more parameters for better performance while keeping the network lightweight.

CONCLUSION

In this paper, to recognize COVID‐19 CXR images effectively, two feature‐sensitive modules proposed by our team are used: the DMFF module and the D3S module. Based on these two modules and the MSA mechanism, we propose MSA‐DDCovidNet, a network with strong spatial representation capacity and few parameters. To verify its performance, two datasets are adopted. In the preliminary experiment, 4265 CXR images of pneumonia patients, 1575 normal CXR images and 412 CXR images of COVID‐19 patients are selected from the two datasets, and the network is compared experimentally with a series of other networks. The results show that MSA‐DDCovidNet performs excellently, with a classification accuracy of 97.96% on the test set. More notably, its precision, sensitivity and specificity for COVID‐19 are 100%, 95.10% and 100%, respectively. In addition, the larger COVIDx dataset is adopted to further verify the performance of MSA‐DDCovidNet: an additional experiment compares it with several other deep learning models, and it again achieves good performance. Two ablation experiments are also conducted to verify the effectiveness of the MSA mechanism. It is therefore believed that using MSA‐DDCovidNet to detect COVID‐19 in CXR images can effectively improve diagnostic efficiency and help detect and isolate patients in time. Due to the shortage of COVID‐19 CXR images, it is necessary to collect more of them to better demonstrate the effectiveness of the proposed network. Although MSA‐DDCovidNet performed very well in our experiments, it still requires further clinical research and testing. After further training and testing, MSA‐DDCovidNet is expected to be put into practical use in the auxiliary diagnosis of COVID‐19.
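The per‐class precision, sensitivity and specificity quoted above follow directly from a confusion matrix. The sketch below shows the standard one‐vs‐rest computation; the 3‐class matrix and its counts are invented for illustration and are not the paper's actual results:

```python
import numpy as np

def per_class_metrics(cm, cls):
    """Compute precision, sensitivity (recall) and specificity for one class
    from a multi-class confusion matrix cm, where cm[i, j] counts samples
    of true class i predicted as class j."""
    tp = cm[cls, cls]                 # correctly predicted as this class
    fp = cm[:, cls].sum() - tp        # other classes predicted as this class
    fn = cm[cls, :].sum() - tp        # this class predicted as something else
    tn = cm.sum() - tp - fp - fn      # everything else
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return precision, sensitivity, specificity

# Hypothetical confusion matrix; rows are true classes
# (0: COVID-19, 1: normal, 2: pneumonia), columns are predictions.
cm = np.array([[97,   2,   1],
               [ 0, 150,   8],
               [ 0,   5, 420]])
p, se, sp = per_class_metrics(cm, cls=0)  # metrics for the COVID-19 class
# With no false positives for class 0, precision and specificity are both 1.0
```

Note that precision and specificity can both reach 100% while sensitivity stays below 100%, exactly the pattern reported for the COVID‐19 class: no non‐COVID image is flagged as COVID‐19, but a few COVID‐19 images are missed.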

CONFLICTS OF INTEREST

The authors declare that they have no conflicts of interest.
