Literature DB >> 33230398

Application of deep learning techniques for detection of COVID-19 cases using chest X-ray images: A comprehensive study.

Soumya Ranjan Nayak¹, Deepak Ranjan Nayak², Utkarsh Sinha¹, Vaibhav Arora¹, Ram Bilas Pachori³.

Abstract

The emergence of Coronavirus Disease 2019 (COVID-19) in early December 2019 has caused immense damage to health and global well-being. Currently, there are approximately five million confirmed cases and the novel virus is still spreading rapidly all over the world. Many hospitals across the globe are not yet equipped with an adequate amount of testing kits and the manual Reverse Transcription-Polymerase Chain Reaction (RT-PCR) test is time-consuming and troublesome. It is hence very important to design an automated and early diagnosis system which can provide fast decision and greatly reduce the diagnosis error. The chest X-ray images along with emerging Artificial Intelligence (AI) methodologies, in particular Deep Learning (DL) algorithms have recently become a worthy choice for early COVID-19 screening. This paper proposes a DL assisted automated method using X-ray images for early diagnosis of COVID-19 infection. We evaluate the effectiveness of eight pre-trained Convolutional Neural Network (CNN) models such as AlexNet, VGG-16, GoogleNet, MobileNet-V2, SqueezeNet, ResNet-34, ResNet-50 and Inception-V3 for classification of COVID-19 from normal cases. Also, comparative analyses have been made among these models by considering several important factors such as batch size, learning rate, number of epochs, and type of optimizers with an aim to find the best suited model. The models have been validated on publicly available chest X-ray images and the best performance is obtained by ResNet-34 with an accuracy of 98.33%. This study will be useful for researchers to think for the design of more effective CNN based models for early COVID-19 detection.

Entities: Disease Gene Species

Keywords: COVID-19; Chest X-ray; Convolutional Neural Networks; Optimization algorithms; SARS-CoV-2

Year: 2020 PMID： 33230398 PMCID： PMC7674150 DOI： 10.1016/j.bspc.2020.102365

Source DB: PubMed Journal: Biomed Signal Process Control ISSN： 1746-8094 Impact factor: 3.880

Introduction

The outbreak of Coronavirus Disease 2019 (COVID-19) has put the globe under tremendous pressure since early December 2019. To date, more than five million people across the globe have been infected, and there are approximately three lakhs confirmed death cases as reported by the World Health Organization (WHO) [1]. It is caused by Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) and the common symptoms of COVID-19 include fever, myalgia, dry cough, headache, sore throat, and chest pain [2], [3] and therefore, it is considered as a respiratory disease. It can take around 14 days to show complete symptoms in the infected person. Currently, there is no explicit treatment or drug available to heal this disease. However, the utmost common practice used in the diagnosis of COVID-19 is called Reverse Transcription-Polymerase Chain Reaction (RT-PCR) [4], [5]. Recently, it has been found that medical imaging techniques such as X-ray and Computed Tomography (CT) play a crucial role in testing COVID-19 cases [4], [6], [7], [8]. Since the virus generally causes infection in the lungs, the chest radiography images (chest X-ray or CT images) have been widely considered [9] and the interpretation of these images is manually performed by radiologists to find some visual indicators for COVID-19 infection. These visual indicators can be served as an alternative method for the rapid screening of infected patients. The conventional diagnosis process has become relatively faster but still causes a high risk for medical staff. Moreover, it is costly and there exists a limited number of diagnostic test kits. On the other hand, medical imaging techniques (e.g., X-ray and CT) based screening are relatively safe, faster, and easily accessible. Compared to CT imaging, X-ray imaging has been extensively used for COVID-19 screening as it requires less imaging time, lower cost, and X-ray scanners are widely available even in rural areas [7], [10]. However, the visual inspection of X-ray images by radiologists at a larger scale is time-consuming, cumbersome, and may lead to inaccurate diagnosis due to lack of prior knowledge about the virus-infected regions. Thus, there is a strong need for design of automated methods to obtain a faster and accurate COVID-19 diagnosis. The recent automated methods used contemporary Artificial Intelligence (AI) technologies (mostly Deep Learning (DL) algorithms) to enhance the power of X-ray imaging and aimed to reduce the workload of radiologists [4]. The DL models, in particular, Convolutional Neural Networks (CNN) have shown to be effective than traditional AI methods and have been widely used for analyzing several medical images [11], [12], [13], [14], [15]. Recently, CNN has been successfully applied to detect pneumonia in chest X-ray images [16], [17], [18], [19]. Recently, studies have been conducted using DL models related to the diagnosis of COVID-19 through X-ray images. For instance, Ozturk et al. [20] developed a DL network termed as DarkCovidNet based on X-ray images for automated COVID-19 diagnosis. The model achieved a higher accuracy of 87.02% and 98.08% for multi-class (COVID-19, normal, and pneumonia) and two-class (COVID-19 and normal) cases. Hemdan et al. [21] developed a COVIDX-Net model considering X-ray images. Seven different CNN models have been used to train the COVIDX-Net model and the model is validated on 50 X-ray images (25 normal and 25 COVID-19 cases). Wang and Wong [22] designed a deep CNN model referred to as COVID-Net for COVID-19 detection which obtained a testing accuracy of 92.6%. Apostolopoulos et al. [23] evaluated a set of existing CNN models for classification of COVID-19 cases and obtained the highest testing accuracy of 93.48% and 98.75% for three and binary classes. Narin et al. [24] yielded testing accuracy of 98% on a dataset of 100 images (50 normal and 50 COVID-19 cases) using the ResNet50 model. Sethy and Behera [25] obtained the features from different pre-trained CNN architectures using chest X-ray images. The ResNet50 coupled with Support Vector Machine (SVM) classifier achieved the highest accuracy of 95.38% using 50 samples (25 normal and 25 COVID-19 cases). Ucar and Korkmaz [26] suggested a COVIDiagnosis-Net model using SqueezeNet and Bayesian optimizer to achieve a testing accuracy of 98.3% over three-class classification cases. Toğaçar et al. [27] designed an automated method for classification of COVID-19 cases from normal and pneumonia cases using two DL models (MobileNetV2 and SqueezeNet) and SVM classifier. In their study, the original dataset was reconstructed using Fuzzy Color and Stacking techniques prior to the application of DL models. The features obtained by the DL models were further processed using Social Mimic Optimization (SMO) algorithm to derive efficient features and finally SVM was applied on the combined feature set. Recently, Farooq and Hafeez [28] built a ResNet baed CNN model named as COVID-ResNet for classification of COVID-19 and three other cases (normal, bacterial pneumonia and viral pneumonia). They obtained an accuracy of 96.23% over a publicly available dataset (i.e., COVIDx); however, only 68 COVID-19 samples were considered in this study. The major challenge lies while applying DL models is to collect an adequate number of samples with proper annotations for effective training. The literature reveals that the earlier models are validated using a fewer number of samples and the data in most cases are imbalanced. The pre-trained models adopted for the classification of COVID-19 cases are not fully investigated empirically. Furthermore, a comprehensive comparative study among the pre-trained CNN models has not yet reported. Hence, there exists a scope to perform an empirical study of different pre-trained CNN models to obtain the best performing model that can be used as an alternative tool for the rapid diagnosis of COVID-19. In this study, we proposed a DL empowered automated COVID-19 screening method using chest X-ray images. Eight most successful pre-trained CNN models namely VGG-16, AlexNet, GoogleNet, MobileNet-V2, SqueezeNet, ResNet-34, ResNet-50 and Inception-V3 have been taken into consideration based on the concept of Transfer Learning (TL). A comparative analysis has been made among all these models by considering several factors such as batch size, learning rate, number of epochs, misclassification rate, and type of optimization techniques, and finally, the best performing model has been obtained and compared with the state-of-the-art methods. The models have been validated using a larger number of chest X-ray images collected from covid-chestxray-dataset [29] and ChestX-ray8 [30] datasets. To mitigate the data scarcity and data imbalance problem, a multi-scale offline augmentation technique followed by data balancing and normalization has been adopted. A comparison of the proposed model with other state-of-the-art methods has been made in the context of COVID-19 class sensitivity. The suggested model for COVID-19 diagnosis is easy to implement, follows an end to end architecture without the need for manual feature engineering and could assist radiologists for accurate and stable diagnosis of COVID-19 infection. The major contributions of this proposed research can be outlined as follows: The effectiveness of the eight most efficient pre-trained deep CNN models, namely, VGG-16, Inception-V3, ResNet-34, MobileNet-V2, AlexNet, GoogleNet, ResNet-50, and SqueezeNet have been compared comprehensively. The impact of several hyper- parameters such as learning rate, batch size, number of epochs, and optimization techniques have been studied. Eventually, the best model is derived that will be useful for the researchers to design a more efficient CNN based solution for early detection of COVID-19 infection. The data in the publicly available datasets are limited and imbalanced. To overcome this, we performed multi-operation data augmentation while balancing the samples for both COVID-19 and normal classes. The remainder of the paper is structured as follows. Section 2 gives a description of the samples used for validation of different CNN models and describes the proposed automated model for COVID-19 detection. The results and comparisons are presented in Section 3. The concluding remarks are outlined in Section 4.

Materials and method

In this section, we present a detailed description of the proposed methodology designed for COVID-19 detection and the data used for the validation of the proposed model.

Dataset

To validate the proposed method, the chest X-ray images have been collected from two different sources. The dataset (covid-chestxray-dataset [29]) prepared by Cohen JP has been used to collect the COVID-19 X-ray images that considers images from various open sources and has been updated regularly. While chest X-ray images of normal category have been collected from a GitHub repository1 which contains 500 images selected from ChestX-ray8 dataset [30] In our experiment, 203 COVID-19 frontal-view chest X-ray images have been selected from covid-chestxray-dataset. To deal with the data imbalance problem, the same 203 frontal-view chest X-ray images of healthy lungs have been randomly selected from [30]. The example of chest X-ray images from both normal and COVID-19 classes are depicted in Fig. 1. Since the information about the actual data split was not provided in the datasets, 70% of the data was used for training and remaining 30% for testing which resulted in 286 images (143 COVID-19 and 143 normal cases) in the training set and 120 images (60 COVID-19 and 60 normal cases) in the test set.

Fig. 1

Samples of frontal-view chest X-ray images from the dataset: (a) COVID-19 case and (b) normal case.

Proposed methodology

The proposed model for automated diagnosis of COVID-19 cases is illustrated in Fig. 2. The model aims to classify a given chest X-ray image into normal or COVID-19 category which includes two vital stages: preprocessing (normalization and augmentation) and classification using pre-trained CNN architectures. The description of each stage is detailed in the subsequent sections.

Fig. 2

Overview of the proposed automated COVID-19 diagnosis method using frontal-view chest X-ray images.

Preprocessing

This section gives a detailed description of the methods used at the preprocessing stage. Normalization Normalization of data is an essential step and is generally used to maintain numerical stability in the CNN architectures. With normalization, a CNN model is likely to learn faster and the gradient descent is more likely to be stable [31]. Therefore, in this study, the pixel values of the input images have been normalized in between the range 0–1. The images used in the considered datasets are gray-scale images and the rescaling was achieved by multiplying 1/255 with the pixel values. Data Augmentation The CNN models require a vast amount of data for effective training and have shown to perform better on bigger datasets [11], [13]. However, the available training X-ray images in the considered dataset is very less (i.e., 286 X-ray images). This has been a major concern while performing analysis of medical images using DL algorithms since it is hard to collect medical data. To handle this problem, data augmentation technique has been widely employed which helps in expanding the number of images using a set of transformations while preserving class labels. Augmentation also increases variability in the images and serves as a regularizer at the dataset level [32]. The techniques adopted in this study for augmenting the training images are illustrated in Fig. 3.

Fig. 3

Illustration of different data augmentation techniques used in this study.

Illustration of different data augmentation techniques used in this study. The images were augmented using the following techniques: (1) images were rotated by the angles of 5 degrees clockwise, (2) images were scaled by a measure of 15%, (3) images were performed horizontal flipping, and (4) images were added Gaussian noise with a zero mean and variance 0.25. It is worth noting that all these techniques have been applied over the training samples and the example results of each technique are depicted in Fig. 4. Finally, a larger training set containing 1430 images was obtained which is 5 times more than the original training images.

Fig. 4

Sample results of data augmentation.

COVID-19 prediction using pre-trained CNN models

The CNN models have been proved to obtain superior results in a wide range of medical image processing applications. However, training these models from scratch would be difficult for prediction of COVID-19 cases due to the limited availability of X-ray samples. The application of pre-trained models using the concept of Transfer Learning (TL) can be useful in such circumstances. In TL, the knowledge gained by a DL model trained from a large dataset is used to solve a related task with a comparatively smaller dataset. This helps in eliminating the need for a large dataset and longer learning time as required by DL methods that are trained from scratch [11], [32], [33]. In this study, eight pre-trained models such as AlexNet [34], VGG-16 [35], GoogleNet [36], MobileNet-V2 [37], SqueezeNet [38], ResNet-34 [39], ResNet-50 [39], and Inception-V3 [40] have been used for classification of COVID-19 from normal cases. These networks have been achieved dramatic success in a wide range of computer vision and medical image analysis problems and hence were chosen in this study to distinguish COVID-19 infection from normal cases. It is worth noting here that these models were originally trained on a large-scale labeled dataset called ImageNet [34] and later fine-tuned over the chest X-ray images. The last layer in these models has been removed and a new Fully Connected (FC) layer is inserted with an output size of two that represents two different classes (normal and COVID-19). In these resulted models, only the final FC layer is trained, whereas other layers are initialized with pre-trained weights. The hyperparameters play a key role for tuning these DL models and were kept constant to derive a fair comparison. The details of the parameter settings are discussed in Section 3.1. The architectural overview of the pre-trained CNN models is tabulated in Table 1 and the major components of each network are illustrated in Fig. 5. AlexNet comprises of five convolutional layers and three FC layers and was trained over 1.2 million images from 1000 categories [34]. ReLU activation is used in this network. The first and second FC layers have 4096 neurons, whereas the final FC layer has 1000 neurons. In VGG-16 network, there is an increased number of layers i.e., 13 convolutional layers and three FC layers [35]. The filters in this network are restricted to 3 × 3 with stride and padding of 1. The model was trained over a million images of 1000 categories. SqueezeNet executes better performance than AlexNet with comparatively fewer parameters [38]. It has one convolution layer at the beginning and the end and has eight fire modules in between.

Table 1

Architectural descriptions of the pre-trained CNN models used in this study.

Model	Layers	Parameters (in million)	Input layer size	Output layer size
AlexNet	8	60	(224,224,3)	(2,1)
VGG-16	16	138	(224,224,3)	(2,1)
GoogleNet	22	5	(224,224,3)	(2,1)
MobileNet-V2	53	3.4	(224,224,3)	(2,1)
SqueezeNet	18	1.25	(224,224,3)	(2,1)
ResNet-34	34	21.8	(224,224,3)	(2,1)
ResNet-50	50	25.6	(224,224,3)	(2,1)
Inception-V3	42	24	(299,299,3)	(2,1)

Fig. 5

Illustration of the major components of eight pre-trained CNN models. Convolution blocks of different colors indicate filters of different size. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

ResNet-34 is a deep residual network and is designed based on the concept of residual learning [39]. It consists of one standalone convolution layer and 16 residual bocks followed by one FC layer. This network mainly overcomes the degradation problem by introducing residual connections. ResNet-50 is another variant of ResNet that consists of same number of residual block as with ResNet-34, but, the structure of residual block is different [39]. ResNet-50 model replaces each two layer residual block of ResNet-34 with a three layer bottleneck block that uses 1 × 1 convolutions. GoogleNet is a 22 layer deep architecture with 5 million parameters which is considerably than the parameters used in AlexNet and VGG models [36]. The inception module is the basic building block of this model that helps in processing the data using multiple filters in parallel. Inception-V3 is a variant of Inception-V2 and avoids representational bottlenecks [40]. It has more effective computations by using factorization techniques and achieves a low error rate as opposed to its earlier variants. This network contains 42 layers with an input image size of 299 × 299. MobileNet-V2 is designed based on the ideas of MobileNet-V1 that utilize depth wise separable convolution as efficient building blocks. But, this version introduces a new layer module that is the inverted residual with linear bottleneck [37]. It is a small and cost-effective architecture that helps in achieving high performance with limited resources. It has 19 residual bottleneck layers. The main purpose of this research is to determine the best performing DL model for COVID-19 screening that can drive forward researchers for development of more effective AI based solutions. Architectural descriptions of the pre-trained CNN models used in this study. Illustration of the major components of eight pre-trained CNN models. Convolution blocks of different colors indicate filters of different size. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Experiments and results

This section presents the results obtained from several experiments. We performed a comprehensive experimental analysis for prediction of COVID-19 from X-ray images using eight pre-trained CNN models namely, AlexNet, VGG-16, GoogleNet, MobileNet-V2, Squeezenet, ResNet-34, ResNet-50 and Inception-V3. We analyzed the impact of several hyperparameters associated with these models and carried out a comparative analysis among eight CNN models. Finally, the best performing model is obtained. We also compared the results with recent state-of-the-art approaches.

Experimental setup and performance metrics used

The CNN models were evaluated using chest X-ray samples collected from covid-chestxray-dataset [29] and ChestX-ray8 dataset [30]. The details of the data splitting used in our study with and without augmentation are shown in Table 2. The augmented samples have been used for training the model. After performing augmentation on training images, 1430 X-ray images have been obtained which were further divided into a train and validation set using a splitting ratio of 70% and 30% and thereby, resulting in 1002 training images and 428 validation images as shown in Table 2. The validation set has been used to prevent the model from overfitting and obtain an optimal model.

Table 2

Details of data splitting with and without augmentation.

Class	Original dataset		Augmented dataset
	Train	Test	Train	Validation	Test
COVID-19	143	60	501	214	60
Normal	143	60	501	214	60

Total	286	120	1002	428	120

The input X-ray images were initially resized to 224 × 224 while using AlexNet, VGG-16, GoogleNet, MobileNet-V2, SqueezeNet, ResNet-34, ResNet-50, and to 299 × 299 while using Inception-V3. We implemented our algorithms using PyTorch toolbox. For TL, the batch size was set as 32. Each model was trained for 50 epochs. The batch size and number of epochs have been determined empirically. The training was performed using Adam optimizer and the learning rate has been empirically decided. The performance of each model was evaluated based on different metrics like F1-score, Specificity (Spe), Sensitivity (Sen), Precision (Pre), Accuracy (Acc), and Area Under the ROC Curve (AUC). These metrics were computed by different parameters of the confusion matrix such as True Positive (TP), False Positive (FP), True Negative (TN) and False Negative (FN) [41], [42]. The metrics are defined as follows: Details of data splitting with and without augmentation. In this study, COVID-19 infections and normal cases were considered as positive and negative cases respectively. Therefore, and indicate the accurately predicted normal and COVID-19 cases respectively, whereas and indicate the inaccurately predicted normal and COVID-19 cases respectively.

Results

The training performance in terms of training loss, validation loss and validation accuracy obtained by different networks at different epochs are listed in Table 3. Fig. 6 illustrates the training and validation loss across different iterations for all networks.

Table 3

Trainings performance of all the proposed models.

Model	Epoch	Train loss	Valid loss	Valid accuracy (%)
AlexNet	1	0.8252	0.3738	84.81
		…	…	…
	49	0.0143	0.0053	98.73
	50	0.0232	0.0043	99.05

VGG-16	1	0.8113	0.3149	84.57
		…	…	…
	49	0.1759	0.1859	96.30
	50	0.1724	0.1886	96.15

GoogleNet	1	0.4749	0.2396	81.35
		…	…	…
	49	0.0020	0.0396	98.27
	50	0.0028	0.0477	98.62

MobileNet-V2	1	0.5772	0.2308	90.65
		…	…	…
	49	0.0102	0.0243	97.92
	50	0.0057	0.0270	97.23

SqueezeNet	1	0.6943	0.2576	90.89
		…	…	…
	49	0.0057	0.0298	97.37
	50	0.0071	0.0289	97.65

ResNet-34	1	0.8606	0.4616	77.80
		…	…	…
	49	0.0019	0.0245	98.68
	50	0.0015	0.0270	98.13

ResNet-50	1	0.5923	0.3542	83.84
		…	…	…
	49	0.0050	0.0126	98.71
	50	0.0035	0.0137	98.78

Inception-V3	1	1.3397	0.6440	69.83
		…	…	…
	49	0.3319	0.1395	94.97
	50	0.3460	0.1393	95.13

Fig. 6

Loss convergence plot obtained for different CNN architectures: (a) ResNet-34, (b) ResNet-50, (c) GoogleNet, (d) VGG-16, (e) AlexNet, (f) MobileNet-V2, (g) Inception-V3, and (h) SqueezeNet.

We presented the confusion matrices of all eight CNN models on the test data in Fig. 7. It can be observed that our proposed method with ResNet-34 and ResNet-50 could able to classify all COVID-19 infection cases accurately. The ROC curves of all models are depicted in Fig. 8.

Fig. 7

Confusion matrices obtained for different CNN models: (a) ResNet-34, (b) ResNet-50, (c) GoogleNet, (d) VGG-16, (e) AlexNet, (f) MobileNet-V2, (g) Inception-V3, and (h) SqueezeNet.

Fig. 8

ROC curves of eight different CNN models used in this study.

The detailed classification results obtained from all networks are compared in terms of various metrics and are tabulated in Table 4. It can be seen that the ResNet-34 model achieved the highest performance with a precision of 96.77%, specificity of 96.67%, F1-score of 0.9836, accuracy of 98.33%, and AUC of 0.9836. Also, it is noticed that the obtained sensitivity for COVID-19 class is remarkably higher (i.e., 100%) for both ResNet models. AlexNet network was found to be the second-best performer for COVID-19 prediction which obtained a precision of 96.72%, sensitivity of 98.33%, specificity of 96.67%, F1-score of 0.9752, accuracy of 97.50%, and AUC of 0.9642.

Table 4

Classification results comparison of all eight CNN models.

Model	Pre (%)	Sen (%)	Spe (%)	F1-score	Acc (%)	AUC
ResNet-34	96.77	100.00	96.67	0.9836	98.33	0.9836
ResNet-50	95.24	100.00	95.00	0.9756	97.50	0.9731
GoogleNet	96.67	96.67	96.67	0.9667	96.67	0.9696
VGG-16	95.08	96.67	95.00	0.9587	95.83	0.9487
AlexNet	96.72	98.33	96.67	0.9752	97.50	0.9642
MobileNet-V2	98.24	93.33	98.33	0.9573	95.83	0.9506
Inception-V3	96.36	88.33	96.67	0.9217	92.50	0.9342
SqueezeNet	98.27	95.00	98.33	0.9661	96.67	0.9705

Trainings performance of all the proposed models. Loss convergence plot obtained for different CNN architectures: (a) ResNet-34, (b) ResNet-50, (c) GoogleNet, (d) VGG-16, (e) AlexNet, (f) MobileNet-V2, (g) Inception-V3, and (h) SqueezeNet. Confusion matrices obtained for different CNN models: (a) ResNet-34, (b) ResNet-50, (c) GoogleNet, (d) VGG-16, (e) AlexNet, (f) MobileNet-V2, (g) Inception-V3, and (h) SqueezeNet. Classification results comparison of all eight CNN models.

Results comparison with different optimization methods

Adam optimizer [43] has been chosen for performing the training of all networks. To evaluate the effectiveness, its results were compared with of other efficient optimization methods such as SGD [44], Adadelta [45], and RMSProp [46]. Table 5 lists the classification results of different optimizers for two different and best performing CNN models such as ResNet-34 and AlexNet. The results have been computed over the test set. It can be seen that Adam optimizer outperforms other competitive optimization methods.

Table 5

Classification performance (in %) comparison among different optimizers.

Model	Optimizer	Pre (%)	Sen (%)	Spe (%)	F1-score	Acc (%)
ResNet-34	SGD	95.00	100.00	95.24	0.9740	97.50
	Adadelta	95.00	93.44	94.92	0.9420	94.17
	RMSProp	96.67	98.31	96.72	0.9748	97.50
	Adam	96.77	100.00	96.67	0.9836	98.33

AlexNet	SGD	98.33	93.65	98.25	0.9593	95.83
	Adadelta	91.67	93.22	91.80	0.9240	92.50
	RMSProp	95.00	98.28	95.16	0.9661	96.67
	Adam	96.72	98.33	96.67	0.9752	97.50

Classification performance (in %) comparison among different optimizers.

Optimal learning rate selection

To determine the optimal learning rate for all networks, a graph has been plotted between different possible learning rates and the validation loss as shown in Fig. 9. The best learning rate of a model was chosen based on the minimum loss.

Fig. 9

Plot between learning rate and loss obtained for all models: (a) ResNet-34, (b) ResNet-50, (c) GoogleNet, (d) VGG-16, (e) AlexNet, (f) MobileNet-V2, (g) Inception-V3, and (h) SqueezeNet.

Results comparison with different batch sizes

Batch size is one of the most crucial hyper-parameters for tuning the DL models. In this experiment, the impact of batch size on testing accuracy is studied. Table 6 shows the test accuracy of all networks when trained using different batch sizes such as 8, 16 and 32. It can be seen that a higher and stable testing performance is obtained for all network models with batch size 32 and hence, a batch size of 32 has been chosen in the study.

Table 6

Testing accuracy (in %) obtained by the CNN models trained with different batch sizes.

Model	Batch size
	8	16	32
ResNet-34	98.33	98.33	98.33
ResNet-50	96.67	97.50	97.50
GoogleNet	95.83	95.83	96.67
VGG-16	96.67	95.83	95.83
AlexNet	96.67	96.67	97.50
MobileNet-V2	95.00	96.67	95.83
Inception-V3	92.50	91.67	92.50
SqueezeNet	96.67	95.83	96.67

Testing accuracy (in %) obtained by the CNN models trained with different batch sizes. Comparison with state-of-the-art deep learning COVID-19 detection approaches using chest X-ray images. C: COVID-19, N: Normal, P: Pneumonia, BP: Bacterial pneumonia, VP: Viral pneumonia

Misclassification results analysis

The misclassification samples predicted by two best performing CNN models: ResNet-34 and AlexNet are depicted in Fig. 10. The misclassification was possibly occurred due to the similar imaging features between the normal and COVID-19 infection cases.

Fig. 10

Illustration of misclassification results obtained by two best networks: (a) ResNet-34 and (b) AlexNet.

COVID-19 class performance comparison with the state-of-the-art approaches. NR: Not reported

Comparison with state-of-the-art methods

The results achieved by the best CNN model are compared with the recently proposed DL methods for automated COVID-19 diagnosis using chest X-ray images in Table 7. It can be observed that the proposed method achieved higher performance than the other existing schemes. Compared to most of the studies (Hemdan et al. [21], Narin et al. [24] and Sethy and Behera [25]), the proposed study considered a fairly large number of chest X-ray samples to validate the CNN models. On the other hand, Toğaçar et al. [27], Ozturk et al. [20], Wang and Wong [22], Ucar and Korkmaz [26] and Farooq and Hafeez [28] used comparatively larger datasets to validate their models, but these datasets led to the class imbalance problem and accommodated less number of COVID-19 samples in general. But, the dataset used in the proposed study has equal class distribution for normal and COVID-19 infection cases. In [26], the class imbalance problem was resolved using offline augmentation technique. Also, it can be noticed that binary classification was performed in most of the studies; however, multi-class (mostly three or four) classification was performed in [20], [22], [26], [27] and [28]. Only the study in [28] considered four classes such as COVID-19, normal, bacterial pneumonia, and viral pneumonia and an accuracy of 96.23% was achieved using a model called COVID-ResNet. However, the major weakness of this model is that it was validated on very less number of COVID-19 samples (i.e., 68). To summarize, the multi-class COVID-19 classification task is more vital and challenging due to the similar manifestation of COVID-19 with different types of pneumonia. Therefore, the performance in these cases is comparatively lower and is yet to be improved. For example, the accuracies obtained in [22] and [20] are 92.60% and 87.02% respectively.

Table 7

Comparison with state-of-the-art deep learning COVID-19 detection approaches using chest X-ray images.

Reference	Method	Classes	Number of X-ray samples	Acc (%)
Hemdan et al. [21]	COVIDX-Net	COVID-19	50	90.00
		Normal	C: 25 and N: 25
Narin et al. [24]	ResNet-50	COVID-19	100	98.00
		Normal	C: 50 and N: 50
Sethy and Behera [25]	ResNet-50 and SVM	COVID-19	50	95.38
		Normal	C: 25 and N: 25
Toğaçar et al. [27]	SqueezeNet and MobileNetV2	COVID-19	458	98.25
	SMO and SVM	Normal	C: 295, N: 65 and P: 98
		Pneumonia
Wang and Wong [22]	COVID-Net	COVID-19	13800	92.60
		Normal	C: 183, N: – and P: –
		Pneumonia
Ucar and Korkmaz [26]	Bayes-SqueezeNet	COVID-19	5949	98.30
		Normal	C: 76, N: 1583 and P: 4290
		Pneumonia
Farooq and Hafeez [28]	COVID-ResNet	COVID-19	5941	96.23
		Normal	C: 68, N: –, BP: –, VP: –
		Bacterial pneumonia
		Viral pneumonia
Ozturk et al. [20]	DarkCovidNet	COVID-19	625	98.08
		Normal	C: 125 and N: 500
		COVID-19	1125	87.02
		Normal	C: 125, N: 500 and P: 500
		Pneumonia
Proposed method	ResNet-34	COVID-19	406	98.33
		Normal	C: 203 and N: 203

C: COVID-19, N: Normal, P: Pneumonia, BP: Bacterial pneumonia, VP: Viral pneumonia

Furthermore, a performance comparison has been made between the proposed scheme and existing methods in the context of COVID-19 class sensitivity and the results are given in Table 8. Wang and Wong [22] achieved the lowest COVID-19 class sensitivity of 87.10%. It can be seen that the proposed method achieved 100% COVID-19 class sensitivity. Also, Hemdan et al. [21], Ucar and Korkmaz [26], and Farooq and Hafeez [28] obtained similar COVID-19 class sensitivity (i.e., 100%). However, the actual test set used in these studies contained comparatively less number of COVID-19 samples.

Table 8

COVID-19 class performance comparison with the state-of-the-art approaches.

Reference	COVID-19 class sensitivity (%)
Hemdan et al. [21]	100.00
Narin et al. [24]	96.00
Sethy and Behera [25]	NR
Toğaçar et al. [27]	99.32
Wang and Wong [22]	87.10
Ucar and Korkmaz [26]	100.00
Farooq and Hafeez [28]	100.00
Ozturk et al. [20]	90.65
Proposed method	100.00

NR: Not reported

ROC curves of eight different CNN models used in this study. Plot between learning rate and loss obtained for all models: (a) ResNet-34, (b) ResNet-50, (c) GoogleNet, (d) VGG-16, (e) AlexNet, (f) MobileNet-V2, (g) Inception-V3, and (h) SqueezeNet. Illustration of misclassification results obtained by two best networks: (a) ResNet-34 and (b) AlexNet.

Discussion

The study of chest X-ray images for accurate prediction of COVID-19 infection has been attracting remarkable attention since the release of a dataset developed by Cohen [29]. Thereafter, several attempts have been made to develop a reliable diagnosis model using DL techniques. The concept of TL has been extensively used with CNN based networks. However, most of the earlier methods were evaluated using a limited amount of data. Further, in some cases, the data are imbalanced. In this study, we have comprehensively evaluated the effectiveness of the eight most effective CNN models such as AlexNet, VGG-16, MobileNet-V2, Squeezenet, ResNet-34, and Inception-V3 for prediction of COVID-19 infections in X-ray images. Extensive experiments were conducted on a comparatively large dataset by considering several factors to determine the best performing model for automated COVID-19 screening. The COVID-19 X-ray images and normal X-ray images were taken from two sources [29] and [30] respectively. To handle the data imbalance issue, the same amount of data was chosen for both the groups. The experimental results and the detailed comparative analysis among all methods demonstrated the superiority of ResNet models. We achieved accuracy and sensitivity of 98.33% and 100.00% with ResNet-34 network. The model is cost-effective and can aid the radiologists to verify their decisions. The major purpose of this research to take faster decisions for quarantine of patients that can ultimately help in reducing the spread of COVID-19 infection. The major weakness of the proposed study is that it is validated using a limited amount of COVID-19 samples. To date, there exists no large dataset publicly available as this is an ongoing and new pandemic. But, in the future, we plan to validate our approach using large datasets.

Conclusion

In this study, a DL based automated method was proposed for efficient classification of COVID-19 infection cases from normal cases using chest X-ray images. Several pre-trained CNN architectures using the concept of TL were studied by considering several crucial factors and their results over a set of publicly available X-ray samples were compared. The results indicated that ResNet-34 outperformed other competitive networks with an accuracy of 98.33% and can hence be regarded as a potential model for prediction of COVID-19 infection. The model can be utilized by the radiologists to verify their screening and thereby, reducing their workload. This study also paves the way for further development of effective deep CNN models (using residual connections) for a more accurate diagnosis of COVID-19 infection. The suggested DL model is developed to earn significant performance for binary classification (COVID-19 versus normal) and a limited number of studies have been proposed to date for multi-class classification (COVID versus pneumonia versus normal). Hence, in future studies, the effectiveness of the proposed model will be verified for multi-class classification problem. Further, we plan to explore the use of optimization algorithms along with the DL models used in this study to design a more reliable model.

CRediT authorship contribution statement

Soumya Ranjan Nayak: Formal analysis, Resources, Validation, Writing - original draft. Deepak Ranjan Nayak: Conceptualization, Methodology, Software, Writing - review & editing, Visualization, Supervision. Utkarsh Sinha: Data curation, Investigation, Software, Writing - original draft. Vaibhav Arora: Data curation, Investigation, Software, Writing - original draft. Ram Bilas Pachori: Investigation, Writing - review & editing, Supervision, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

20 in total

Review 1. Deep learning.

Authors: Yann LeCun; Yoshua Bengio; Geoffrey Hinton
Journal: Nature Date: 2015-05-28 Impact factor: 49.962

Review 2. Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation, and Diagnosis for COVID-19.

Authors: Feng Shi; Jun Wang; Jun Shi; Ziyan Wu; Qian Wang; Zhenyu Tang; Kelei He; Yinghuan Shi; Dinggang Shen
Journal: IEEE Rev Biomed Eng Date: 2021-01-22

3. Dermatologist-level classification of skin cancer with deep neural networks.

Authors: Andre Esteva; Brett Kuprel; Roberto A Novoa; Justin Ko; Susan M Swetter; Helen M Blau; Sebastian Thrun
Journal: Nature Date: 2017-01-25 Impact factor: 49.962

4. Deep Learning COVID-19 Features on CXR Using Limited Training Data Sets.

Authors: Yujin Oh; Sangjoon Park; Jong Chul Ye
Journal: IEEE Trans Med Imaging Date: 2020-05-08 Impact factor: 10.048

5. Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning.

Authors: Hoo-Chang Shin; Holger R Roth; Mingchen Gao; Le Lu; Ziyue Xu; Isabella Nogues; Jianhua Yao; Daniel Mollura; Ronald M Summers
Journal: IEEE Trans Med Imaging Date: 2016-02-11 Impact factor: 10.048

6. Essentials for Radiologists on COVID-19: An Update-Radiology Scientific Expert Panel.

Authors: Jeffrey P Kanne; Brent P Little; Jonathan H Chung; Brett M Elicker; Loren H Ketai
Journal: Radiology Date: 2020-02-27 Impact factor: 11.105

7. Automated detection of COVID-19 cases using deep neural networks with X-ray images.

Authors: Tulin Ozturk; Muhammed Talo; Eylul Azra Yildirim; Ulas Baran Baloglu; Ozal Yildirim; U Rajendra Acharya
Journal: Comput Biol Med Date: 2020-04-28 Impact factor: 4.589

8. Using X-ray images and deep learning for automated detection of coronavirus disease.

Authors: Khalid El Asnaoui; Youness Chawki
Journal: J Biomol Struct Dyn Date: 2020-05-22

9. Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists.

Authors: Pranav Rajpurkar; Jeremy Irvin; Robyn L Ball; Kaylie Zhu; Brandon Yang; Hershel Mehta; Tony Duan; Daisy Ding; Aarti Bagul; Curtis P Langlotz; Bhavik N Patel; Kristen W Yeom; Katie Shpanskaya; Francis G Blankenberg; Jayne Seekins; Timothy J Amrhein; David A Mong; Safwan S Halabi; Evan J Zucker; Andrew Y Ng; Matthew P Lungren
Journal: PLoS Med Date: 2018-11-20 Impact factor: 11.069

10. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.

Authors: Linda Wang; Zhong Qiu Lin; Alexander Wong
Journal: Sci Rep Date: 2020-11-11 Impact factor: 4.379

51 in total

1. COVID-RDNet: A novel coronavirus pneumonia classification model using the mixed dataset by CT and X-rays images.

Authors: Lingling Fang; Xin Wang
Journal: Biocybern Biomed Eng Date: 2022-08-05 Impact factor: 5.687

2. An ML prediction model based on clinical parameters and automated CT scan features for COVID-19 patients.

Authors: Abhishar Sinha; Swati Purohit Joshi; Purnendu Sekhar Das; Soumya Jana; Rahuldeb Sarkar
Journal: Sci Rep Date: 2022-07-04 Impact factor: 4.996

3. Detecting Covid19 and pneumonia from chest X-ray images using deep convolutional neural networks.

Authors: Nallamothu Sri Kavya; Thotapalli Shilpa; N Veeranjaneyulu; D Divya Priya
Journal: Mater Today Proc Date: 2022-05-19

4. Diagnosis of dengue virus infection using spectroscopic images and deep learning.

Authors: Mehdi Hassan; Safdar Ali; Muhammad Saleem; Muhammad Sanaullah; Labiba Gillani Fahad; Jin Young Kim; Hani Alquhayz; Syed Fahad Tahir
Journal: PeerJ Comput Sci Date: 2022-06-01

5. Automated system for classification of COVID-19 infection from lung CT images based on machine learning and deep learning techniques.

Authors: Bhargavee Guhan; Laila Almutairi; S Sowmiya; U Snekhalatha; T Rajalakshmi; Shabnam Mohamed Aslam
Journal: Sci Rep Date: 2022-10-18 Impact factor: 4.996

6. Audio texture analysis of COVID-19 cough, breath, and speech sounds.

Authors: Garima Sharma; Karthikeyan Umapathy; Sri Krishnan
Journal: Biomed Signal Process Control Date: 2022-04-18 Impact factor: 3.880

7. Framework for Real-Time Detection and Identification of possible patients of COVID-19 at public places.

Authors: Bharati Peddinti; Amir Shaikh; Bhavya K R; Nithin Kumar K C
Journal: Biomed Signal Process Control Date: 2021-04-01 Impact factor: 3.880

8. Determination of COVID-19 pneumonia based on generalized convolutional neural network model from chest X-ray images.

Authors: Adi Alhudhaif; Kemal Polat; Onur Karaman
Journal: Expert Syst Appl Date: 2021-05-04 Impact factor: 6.954

9. Chest X-ray Classification Using Deep Learning for Automated COVID-19 Screening.

Authors: Ankita Shelke; Madhura Inamdar; Vruddhi Shah; Amanshu Tiwari; Aafiya Hussain; Talha Chafekar; Ninad Mehendale
Journal: SN Comput Sci Date: 2021-05-26

10. Classifying Multi-Level Stress Responses From Brain Cortical EEG in Nurses and Non-Health Professionals Using Machine Learning Auto Encoder.

Authors: Ashlesha Akella; Avinash Kumar Singh; Daniel Leong; Sara Lal; Phillip Newton; Roderick Clifton-Bligh; Craig Steven Mclachlan; Sylvia Maria Gustin; Shamona Maharaj; Ty Lees; Zehong Cao; Chin-Teng Lin
Journal: IEEE J Transl Eng Health Med Date: 2021-05-05 Impact factor: 3.316