Literature DB >> 33967366

Automated detection of COVID-19 from CT scan using convolutional neural network.

Narendra Kumar Mishra¹, Pushpendra Singh², Shiv Dutt Joshi¹.

Abstract

Under the prevailing circumstances of the global pandemic of COVID-19, early diagnosis and accurate detection of COVID-19 through tests/screening and, subsequently, isolation of the infected people would be a proactive measure. Artificial intelligence (AI) based solutions, using Convolutional Neural Network (CNN) and exploiting the Deep Learning model's diagnostic capabilities, have been studied in this paper. Transfer Learning approach, based on VGG16 and ResNet50 architectures, has been used to develop an algorithm to detect COVID-19 from CT scan images consisting of Healthy (Normal), COVID-19, and Pneumonia categories. This paper adopts data augmentation and fine-tuning techniques to improve and optimize the VGG16 and ResNet50 model. Further, stratified 5-fold cross-validation has been conducted to test the robustness and effectiveness of the model. The proposed model performs exceptionally well in case of binary classification (COVID-19 vs. Normal) with an average classification accuracy of more than 99% in both VGG16 and ResNet50 based models. In multiclass classification (COVID-19 vs. Normal vs. Pneumonia), the proposed model achieves an average classification accuracy of 86.74% and 88.52% using VGG16 and ResNet50 architectures as baseline, respectively. Experimental results show that the proposed model achieves superior performance and can be used for automated detection of COVID-19 from CT scans.

Entities: Chemical Disease Gene Species

Keywords: COVID-19; CT scan images; Convolutional Neural Network; ResNet50; Transfer learning; VGG16

Year: 2021 PMID： 33967366 PMCID： PMC8084624 DOI： 10.1016/j.bbe.2021.04.006

Source DB: PubMed Journal: Biocybern Biomed Eng ISSN： 0208-5216 Impact factor: 4.314

Introduction

The viral disease COVID-19, which broke-out from Wuhan China, has spread worldwide as a global pandemic. As of 09 Jan 21, more than 87.5 millions of people have been infected including 1.9 million deaths worldwide due to the pandemic [1]. COVID-19 is one of the single largest events in human history which has affected in such proportion globally. The pandemic has not only affected routine activities of the people but has also led to the mental and psychological stress apart from substantial financial loss, economic stagnation, and health liabilities [2], [3]. So far, no proven vaccine has been developed for the disease, though concerted efforts are in-progress. The asymptomatic COVID-19 transmission cases have also been reported, though they are less contagious than the symptomatic cases [4]. These circumstances demand the regular check-ups of a large number of people that may lead to an extra burden. Being a viral infection, early identification, and isolation of COVID-19 patients is vital in breaking the chain of its spread and efficient handling of the present situation. Presently Reverse Transcription–Polymerase Chain Reaction (RT-PCR) and rapid tests are being conducted for early detection of the disease [5]. However, the sensitivity shown by these tests is not optimal. There are many false-positive and false-negative results, which defeats the very purpose of early identification and isolation of the patients [6], [7]. This further delays in making the right decision and suitable action taken. Shortage and timely availability of sufficient test-kits to conduct mass level tests is also a significant concern in many countries [8]. Several studies have been performed for prediction of the spread and continuous monitoring of the COVID-19 pandemic [9], [10], [11], [12]. The X-ray, computerized tomography (CT) scan, ultrasound screening tests are also being used by radiologists to examine the patients to detect COVID-19 infection. While evaluating the CT scan images, specific and consistent patterns/features have been identified in almost all COVID-19 patients. Two significant patterns, one ground glass opacities in the early stage and other one is pulmonary embolism demonstrating linear consolidation in the latter stages, have been identified as prominent signs in detecting the infection of virus [13], [14]. It is important to note that COVID-19 develops symptoms similar to Pneumonia, as both are viral diseases, affecting the lungs and leading to breathing problems. Therefore, it becomes very challenging and bewildering to differentiate between COVID-19 and Pneumonia. In pandemic scenarios, this manual method of evaluating images takes much time and needs intensive human resources. As an alternative, the need of the hour is to find an automatic system/tool to have early and precise detection of COVID-19 disease to control its spread. Machine learning, deep learning and AI based approaches have been used for detection and classifications various diseases [15], [16], [17]. Thus, as an alternative, AI-based solutions can provide efficient solutions that can help in automatic learning of features/patterns from CT scan images, which can augment the capabilities of radiologists in better decision-making and more effective management of the situation. Deep Learning models [18], based on CNN are highly effective and have shown promising results in various medical imaging applications [19], [20], [21]. CNN has made possible the development of deep neural network architectures consisting of several intermediate layers. In contrast to the traditional Machine Learning algorithms, where input features are required to be fed as input to the algorithm, CNN has automatic feature learning capabilities. Technology improvements in terms of large data handling and high-speed Graphical Processing Units (GPU) have worked as a catalyst in augmenting the performance. CNN based deep learning models have shown promising results across diverse fields and are fundamental to almost all image recognition tasks. CNN consists of several intermediate layers, where initially low-level features are learned, and higher-level or fine-grade features are learned in deeper stages. Fundamentally, CNN based model consists of two levels. The first level is feature extraction layer cascaded with several layers of convolution, activation, and MaxPooling operations, which helps in learning unique and specific features from an input. The second level consists of fully connected layers that perform the actual classification task [22]. The development of any deep learning algorithm from scratch requires resources in terms of a high-speed GPU processor for execution, large number of input images for training, and time to fine-tune and optimize the model parameters for specific tasks. Therefore, in this paper, transfer learning-based architectures have been used as a base model to reduce complexity, which has been optimized and fine-tuned further to improve the performance of the COVID-19 detection algorithm. There are a number of easily accessible top-performing models, namely VGG16/VGG19 [23], ResNet50 [24], Inception [25], Xception [26], InceptionResNet [27] and DenseNet [28], which can be integrated into a new image classification task. The weights of these models, which have been trained optimally on millions of images for similar kinds of image recognition tasks, can be easily loaded into the algorithm. Based on requirements, the weights of model can be re-trained, or new convolution layers can be incorporated into the algorithm to increase the model capacity. Further, model performance can be improved by assimilating BatchNormalization, regularization, and tuning of the neural network Hyperparameters. Since the spread of COVID-19, several deep learning-based models have been proposed to detect the disease. Models have been implemented using X-ray [29], [30], [31], [32], [33], [34] or CT scan [35], [36] images. Since the pandemic is in-continuation and spreading at a very high rate, there is a lack of a large number of good quality-labeled radio-graphic images to train a neural network. Given this, the dataset in almost all proposed models are highly unbalanced view paucity of adequate COVID-19 images. It is also pertinent to mention that developing a deep learning model from scratch needs many resources and concerted efforts to optimize it. Therefore, most of the proposed models available in the literature have been built-upon using transfer learning-based architecture. COVID-19 detection model for three classes, consisting of Coronavirus, Bacterial Pneumonia and Normal categories, has been proposed in [29], [37], [35], [38]. In [29], the proposed DarkCOVIDNet model based on the Darknet-19 model has been evaluated using X-ray images and 5-fold cross-validation. The authors in [37] have introduced a COVID-Net network architecture, and [35] has studied various transfer learning-based models, including ResNet50 and VGG16 using X-ray images and single fold cross-validation. A similar analysis has been carried out in [38] using X-ray, CT scan, and ultrasound images applying various transfer learning models. A hybrid approach based on the integration of artificial neural networks and fuzzy logic has been described in [39] for diagnosis of pulmonary diseases using chest X-ray images. The author performs the classification task by feeding grayscale histogram features, grey-level co-occurrence matrix, texture-based features, and local binary pattern texture features into the artificial neural network. The same authors have undertaken the classification of COVID-19 using the same texture features and neural networks in [40]. A seven-layer convolutional neural network-based COVID-19 diagnosis model has been proposed in [41] using 14-way data augmentation and introduced stochastic pooling. In reference [42], six different pretrained models have been experimented with by making the number of layers adaptive and adding two new fully connected layers in each model. The authors have fused the features from the best two pretrained models using discriminant correlation analysis. The authors in [43] have used different pretrained models as feature extractors combined with machine learning algorithms to perform the automatic detection using chest X-ray images. Automated detection of COVID-19 using CNN-based U-Net architecture has been proposed in [44] that consists of a sequence of convolutional blocks where the output of a block is concatenated with the output from a previous block in a specific pattern before feeding to the next block. A model based on Multiple Kernels-Extreme Learning Machine and neural network using chest CT images has been proposed in [45] for COVID-19 detection based on DenseNet201 transfer learning architecture. In this study, the final output class is predicted using the majority voting of the outputs from three different activation functions. CVID-19 detection model proposed by [46] uses ShuffleNet and SqueezeNet architectures as a feature extractor and Multiclass Support Vector Machine as a classifier. In this study, extensive experiments have been performed through the proposed model and the main contributions are as follows: Transfer learning-based architectures (VGG16 and ResNet50), originally developed on ImageNet dataset, have been studied and experimented with to detect COVID-19 disease from CT scan images. The dataset has been prepared by collecting three different categories of images consisting of Normal, Pneumonia, and COVID-19 from authentic websites [47], [48], [49]. Several performance improvement techniques and data augmentation have been incorporated in the proposed model to improve the performance of standard VGG16 and ResNet50 models. To make the standard pretrained models more specific and relevant for COVID-19 detection, convolutional blocks of VGG16 and ResNet50 architectures have been re-trained. A new convolutional block consisting of Convolutional layer, BatchNormalization, MaxPooling, and Dropout has been incorporated into the proposed model to minimize the overfitting and hence, the generalization error. Rest of the work is organized as follows: The details of dataset preparation, proposed neural network architecture, data augmentation, and performance improvements incorporated into the baseline VGG16 and ResNet50 models have been discussed in Section II. The discussions and analysis of results obtained from the experiments are presented in Section III. Finally, Section IV concludes the paper by proposing some future scopes.

Experimental design of COVID-19 detection model

Dataset

The major challenge in training and validating the proposed model is the availability of authentic, labeled, and class-balanced COVID-19 CT scan images. Most of the available open-source datasets are either unorganized or highly unbalanced, which would affect the learning and prediction capabilities of deep learning models significantly. Moreover, the available numbers of COVID-19 CT scan images are minimal and insufficient. Therefore, for experiment purposes, the dataset is obtained by downloading CT scan images from the authentic websites [47], [48], [49]. From these images, a class-balanced dataset consisting of 400 images each of COVID-19 and Normal categories is prepared by performing required pre-processing that consists of resizing and conversion to a common image format (PNG). Further, since COVID-19 has similar symptoms as Pneumonia and both are viral disease, a new dataset comprising three categories of COVID-19, Pneumonia, and Normal (Healthy) images, is prepared to assess the robustness and effectiveness of the proposed model. Two hundred fifty images exhibiting Pneumonia have been downloaded from the website [49] to perform the 3-class classification task. All input images are resized to 224x224 pixels. Sample images from each class are shown in Fig. 1 . The complete dataset is divided in a ratio of 80:20 for training and testing of the neural network. Training dataset has been further divided in a ratio of 80:20 for model training and validation purpose. Detailed information about the training and testing samples is listed in Table 1 .

Fig. 1

Sample CT scan images used for training and evaluation of the proposed COVID-19 detection model. First row: Normal; second row: COVID-19, third row: Pneumonia.

Table 1

Details of number of images used for training, validation and testing of the proposed COVID-19 detection model.

Sr.	Class	Train	Validation	Test	Source
(i)	COVID-19	256	64	80	[48], [49]
(ii)	Normal	256	64	80	[47]
(iii)	Pneumonia	160	40	50	[49]

Total		672	168	210

Sample CT scan images used for training and evaluation of the proposed COVID-19 detection model. First row: Normal; second row: COVID-19, third row: Pneumonia. Details of number of images used for training, validation and testing of the proposed COVID-19 detection model.

Proposed deep learning architecture

In this section, standard VGG16 and ResNet50 models a baseline models have been described, and various performance improvement techniques have been discussed. These techniques have been integrated with the VGG16 and ResNet50 model carefully to obtain the proposed deep learning model in detecting COVID-19 from the CT scan images.

VGG16 and ResNet50 Model

VGG16 model was the winner of the ILSVRC-2014 annual computer vision competition and developed by the Visual Graphics Group (VGG) from Oxford. It has been trained over a dataset of 14 million images corresponding to 1000 categories. It consists of 5 convolution blocks (Conv Block-1 to Conv Block-5) and Fully Connected layers, as shown in Fig. 2 , and details of each block are shown in Table 2 . It is pertinent to note that BatchNormalization and Dropout techniques were not incorporated in the original VGG16 model. The model is flexible, simple in implementation, and still very powerful in performance. Further, as an improvement to the VGG16, ResNet50 architecture was proposed, which was the winner of several tracks in ILSVRC & COCO 2015 competitions. With the increase in deep learning models’ depth, a vanishing gradient problem has been observed, causing accuracy saturation and degradation in subsequent stages. ResNet50 architecture has addressed this problem by introducing a deep residual learning framework and by incorporating shortcut connections. We observe that ResNet50 obtains accuracy gain from increased depths with lower complexity than VGG nets. The details of layers of ResNet50 model is shown in Table 2 and Fig. 2.

Fig. 2

Process flow architecture of the proposed neural network models: (A) Model-1, based on VGG16 architecture; (B) Model-2, based on ResNet50 architecture.

Table 2

Details of convolutional building blocks: VGG16 architecture (Left) and ResNet50 architecture (Right).

Layer Name	Output Size	Filter Details
		[size, numbers]
ConvBlock-1	112×112	3×3,643×3,64
MaxPooling
ConvBlock-2	56×56	3×3,1283×3,128
MaxPooling
ConvBlock-3	28×28	3×3,2563×3,2563×3,256
MaxPooling
ConvBlock-4	14×14	3×3,5123×3,5123×3,512
MaxPooling
ConvBlock-5	7×7	3×3,5123×3,5123×3,512
MaxPooling
FC	1000	FC,4096FC,4096FC,1000,
		Softmax
Conv1	112×112	7×7, 64, stride 2
Conv2_x	56×56	3×3 MaxPool, stride 2
		1×1,643×3,641×1,256×3
Conv3_x	28×28	1×1,1283×3,1281×1,512×4
Conv4_x	14×14	1×1,2563×3,2561×1,512×6
Conv5_x	7×7	1×1,5123×3,5121×1,2048×3
FC	1000	AveragePool,
		FC-1000, Softmax

Process flow architecture of the proposed neural network models: (A) Model-1, based on VGG16 architecture; (B) Model-2, based on ResNet50 architecture. Details of convolutional building blocks: VGG16 architecture (Left) and ResNet50 architecture (Right).

Data augmentation and fine-tuning

Data augmentation and various performance improvement techniques have been implemented to enhance the learning capabilities of the baseline models. Details are enumerated as follows: Data Augmentation: Data augmentation has been applied to improve the diverse feature learning capabilities of the model by creating more number of distinct images from the training dataset. It is pertinent to mention that the performance of deep learning model generalizes well on new images when trained on large number of training dataset. Data augmentation introduces random variations in the dataset at each iteration of the optimization process. It has been applied carefully to augment the dataset while preserving the quality. Various types of data augmentation techniques have been applied to the training dataset, such as horizontal flip, vertical flip, rotation, width/height shift, zoom, and shear. Re-train the weights of VGG16 and ResNet50 Model: VGG16, and ResNet50 were originally developed to classify millions of images into the thousands of categories. Re-training of the models’ convolution blocks has been carried out to improve the model weights specific to the detection of COVID-19 disease. Fine-Tuning of the Model: Hyperparameters of the model have been fine-tuned precisely through iterations to achieve the optimal performance. Various fine-tuning techniques, incorporated into the proposed models, are briefed as follows: Number of Layers- Several number of intermediate layers and nodes have been experimented within the proposed model. It has been observed that a small number of layers underfit the model capabilities. In contrast, large numbers lead to increased model complexity due to an increased number of weights to be learned, more time to converge, and performance degradation. Suitable values were set empirically to achieve performance improvement. Learning Rate- Learning rate is one of the most vital parameter that needs careful selection. It has been observed through experiments that a too-large value of the learning rate leads to the quick and sub-optimal convergence of the optimization process. At the same time, a too-small value stuck the learning process. An optimal value of the learning rate has been deduced analytically to obtain optimal performance. Batchsize- Batchsize is the number of training samples taken together to estimate the error gradient of the optimization algorithm. Experimentally, it has been noted that Batchsize impacts the learning behavior and speed of the model. A suitable value of Batchsize is chosen empirically. BatchNormalization- Improved stability of the neural network has been achieved by incorporating a BatchNormalization after every convolutional and fully connected layers. It reduces the covariance shift between the layers by normalizing the input to a layer[50] while training a neural network. Regularization- For the given training dataset, VGG16 and ResNet50 models showed the tendency of overfitting and sub-optimal performance. Mitigation of overfitting and generalization of the proposed model has been accomplished by incorporating a regularization technique. Two popular regularization methods, Dropout [51] and Early Stopping, have been implemented in this paper. Optimization Algorithms- Various optimization algorithms, viz. SGD, Adam, RMSprop have been analysed to assess their effectiveness in the classification task.

Proposed model

Initially, performance analysis of baseline VGG16 and ResNet50 model was carried out. It was observed that validation learning curves do not converge to training learning curves and have random variations. The baseline model also performed poorly on the test data. Based on these observations, it was predicted that the model is showing a tendency of overfitting and need fine-tuning of Hyperparameters. To further improve the performance, the following modifications have been incorporated in the baseline models: Weights of the convolution Blocks of VGG16 and ResNet50 are re-trained so that the proposed model is more suitable and efficient for the specific classification task. An additional convolution block consisting of Convolutional layers, BatchNormalization, MaxPooling, and Dropout layers are inserted before the fully connected layer in VGG16 architecture. This block has been primarily used to mitigate the overfitting of the model (through BatchNormalization and Dropout) and also to assist in the learning of fine-order features (through convolutional layers). BatchNormalization and Dropout are incorporated in the fully connected block of VGG16. Similar changes are also done in ResNet50, in addition to the incorporation of two more Fully Connected layers. The process flow architecture of the proposed model based on VGG16 and ResNet50 architecture is shown in Fig. 2.

Experimental parameters

The proposed deep learning model has been trained and evaluated using Keras [52] library on Google Colab. The Hyperparameter values have been tuned optimally in multiple iterations while training the model. Details of the final experimental parameters are tabulated in Table 3 .

Table 3

Details of Hyperparameters (Left) and data augmentation (Right) used in the proposed COVID-19 detection model.

Hyperparameters	Details
	Model1	Model2
Learning rate	0.00007	0.00003
Momentum	0.8	0.8
Batchsize	32	16
Epochs	500	350
Patience(Early Stopping)	100	100
Dropout	0.2–0.5	0.5
Optimizer	SGD	SGD
Momentum SGD	0.7	0.8
Learning rate decay	epoch/iteration	epoch/iteration

Data Augmentation

Re-scaling	1/255	1/255
Horizontal flip	Yes	Yes
Vertical flip	Yes	Yes
Rotation range	45°	45°
Width shift range	20%	20%
Height shift range	20%	20%
Shear range	20%	20%
Zoom range	20%	20%

Details of Hyperparameters (Left) and data augmentation (Right) used in the proposed COVID-19 detection model.

5-fold cross-validation

Cross-validation has been used to estimate the performance and robustness of the proposed model on unseen data with the aim of minimal generalization error. In this study, 5-fold cross-validation has been used, where the actual dataset is divided into five folds. In each experiment, only four folds have been used for the purpose of training, and the holdout set is used for testing purpose. Five experiments are performed to get five different accuracies corresponding to each fold of the dataset as test set. The absolute accuracy of the proposed model is the average of the accuracy from all experiments. A stratified version of cross-validation has been implemented to ensure the balance of classes in each fold. The schematic representation of the proposed stratified 5-Fold cross-validation is shown in Fig. 3 .

Fig. 3

Schematic representation of stratified 5-Fold cross-validation technique implemented in the proposed COVID-19 detection model.

Results analysis and discussions

Proposed models have been trained on Google Colab using Hyperparameters as listed in Table IV using the prepared dataset. The model based on VGG16 and ResNet50 have been trained for maximum epochs of 500 and 350, respectively, combined with Early Stopping. Performance analysis is carried out using Accuracy, Precision, Recall (Sensitivity), Specificity, and F1-Score metrics. The values of various metrices recorded and analyzed for each experiment. The performance measures are mathematically calculated as follows:where TP, TN, FP, and FN are True Positives, True Negatives, False Positives, and False Negatives, respectively. Four different experiments have been undertaken to obtain the detailed analysis of the proposed models. Binary classification of COVID vs. Normal, and multiclass (3-class) classification of COVID vs. Normal vs. Pneumonia have been undertaken using two different proposed models (VGG16 and ResNet50). For each experiment, 5-Fold cross-validation has been carried out to evaluate the robustness and effectiveness of the model. Details of models and various experiments are tabulated in Table 4 .

Table 4

Details of models and various experiments conducted in the proposed COVID-19 detection model.

Sr.No.	Particulars	Details
(i)	Baseline Model	Original VGG16 model with changes in output layer
		Original ResNet50 with changes in output layer
(ii)	Proposed Model	Model1 based on VGG16 Architecture (Refer Fig. 2)
		Model2 based on ResNet50 Architecture (Refer Fig. 3)
(iii)	Experiment-1	Binary Classification
		COVID vs. Normal) using Model1
(iv)	Experiment-2	Binary Classification
		(COVID vs. Normal) using Model2
(v)	Experiment-3	Multiclass Classification
		(COVID vs. Normal vs. Pneumonia) using Model1
(vi)	Experiment-4	Multiclass Classification
		(COVID vs. Normal vs. Pneumonia) using Model2

Details of models and various experiments conducted in the proposed COVID-19 detection model. The details of performance measures of the experiments are presented in Table 5 . Moreover, an evaluation matrix for the binary classification of COVID-19 vs. Normal using proposed models (Experiment-1 and Experiment-2) is tabulated in Table 6 . Similarly, an evaluation matrix for multiclass classification of COVID-19 vs. Normal vs. Pneumonia (Experiment-3 and Experiment-4) is presented in Table 7 .

Table 5

Performance measures Matrix of the binary classification (COVID-19 vs. Normal) and multiclass classification (COVID-19 vs. Normal vs. Pneumonia) experiments, based on VGG16 and ResNet50 architecture.

Folds	Experiment-1			Experiment-2			Experiment-3			Experiment-4
	Test	Test	Epochs	Test	Test	Epochs	Test	Test	Epochs	Test	Test	Epochs
	Loss	Accuracy(%)		Loss	Accuracy(%)		Loss	Accuracy(%)		Loss	Accuracy(%)
Fold-1	0.031	99.37	500	0.011	100	350	0.38	82.38	435	0.505	78.09	331
Fold-2	0.032	99.37	346	0.022	100	350	0.45	80.67	307	0.688	81.64	328
Fold-3	0.04	97.5	500	0.02	100	350	0.407	82.85	490	0.186	93.33	350
Fold-4	0.0001	100	339	0.007	100	350	0.259	90.95	313	0.211	93.33	350
Fold-5	0.046	99.37	500	0.054	98.12	350	0.117	96.87	500	0.105	96.19	350

	Average Accuracy  = 99.12			Average Accuracy  = 99.62			Average Accuracy  = 86.74			Average Accuracy  = 88.52

Table 6

Evaluation Matrix of binary classification, COVID-19 vs. Normal, using VGG16 based architecture (Experiment-1) and using ResNet50 based architecture (Experiment-2).

Folds	Class	Experiment-1				Experiment-2
		Precision	Sensitivity	Specificity	F1-score	Precision	Sensitivity	Specificity	F1-score
Fold-1	COVID	100	99	100	99	100	100	100	100
	Normal	99	100	98.75	99	100	100	100	100
	Average	99	99	99.37	99	100	100	100	100
Fold-2	COVID	100	99	100	99	100	100	100	100
	Normal	99	100	98.75	99	100	100	100	100
	Average	99	99	99.37	99	100	100	100	100
Fold-3	COVID	95	100	95	98	100	100	100	100
	Normal	100	95	100	97	100	100	100	100
	Average	98	97	97.5	97	100	100	100	100
Fold-4	COVID	100	100	100	100	100	100	100	100
	Normal	100	100	100	100	100	100	100	100
	Average	100	100	100	100	100	100	100	100
Fold-5	COVID	99	100	98.75	99	96	100	96.25	98
	Normal	100	99	100	99	100	96	100	98
	Average	99	99	99.37	99	98	98	98.12	98

Table 7

Evaluation Matrix of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using VGG16 based architecture (Experiment-3) and using ResNet50 based architecture (Experiment-4).

Folds	Folds	Experiment-3				Experiment-4
		Precision	Sensitivity	Specificity	F1-score	Precision	Sensitivity	Specificity	F1-score

Fold-1	COVID	79	72	88.46	76	81	55	92.3	66
	Normal	99	100	99.23	99	100	100	100	100
	Pneumonia	62	70	86.87	66	53	80	77.5	63
	Average	80	81	91.52	80	78	78	89.93	76
Fold-2	COVID	79	65	90	71	90	57	96.15	70
	Normal	98	100	98.42	99	100	100	100	100
	Pneumonia	60	74	84.07	66	58	90	78.98	70
	Average	79	80	90.83	79	82	82	91.71	80
Fold-3	COVID	84	68	92.31	75	93	89	96.15	91
	Normal	98	100	98.46	99	100	99	100	99
	Pneumonia	62	80	85	70	84	92	94.37	88
	Average	81	83	91.92	81	92	93	96.84	93
Fold-4	COVID	94	81	96.92	87	92	90	95.38	91
	Normal	100	100	100	100	100	100	100	100
	Pneumonia	75	92	90.62	83	85	88	95	86
	Average	90	91	95.85	90	92	93	96.79	92
Fold-5	COVID	99	94	99.23	96	95	95	96.92	95
	Normal	100	99	100	99	100	100	100	100
	Pneumonia	91	100	96.87	95	92	92	97.5	92
	Average	97	97	98.7	97	96	96	98.14	96

Performance measures Matrix of the binary classification (COVID-19 vs. Normal) and multiclass classification (COVID-19 vs. Normal vs. Pneumonia) experiments, based on VGG16 and ResNet50 architecture. Evaluation Matrix of binary classification, COVID-19 vs. Normal, using VGG16 based architecture (Experiment-1) and using ResNet50 based architecture (Experiment-2). Evaluation Matrix of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using VGG16 based architecture (Experiment-3) and using ResNet50 based architecture (Experiment-4). Training and Validation Learning Curves of Cross-Entropy Loss and Classification Accuracy have been also evaluated during the training process. Parameters of the model are optimized based on the learning curves of Cross-Entropy Loss, and the model has been evaluated based on the learning curves of Classification Accuracy. The validation curve indicates how well the model is learning. Overfitting and underfitting of the model are assessed, and based on these observations, fine-tuning of the Hyperparameters is carried out to achieve a Good-Fit of the model. During the fine-tuning of the model, it is observed that the Stochastic Gradient Descent optimization algorithm, as compared to the other adaptive optimization algorithms, gives better control of Hyperparameters to improve the learning performance. We also observed that the model performance is highly sensitive to the Hyperparameters, particularly to learning rate, batch size, momentum, and thus careful fine-tuning is required. As compared to ResNet50 architecture, VGG16 architecture is more flexible as extra convolution blocks, including optimization techniques, can be incorporated more easily. Learning curves of Cross-Entropy Loss and Classification Accuracy for Experiment-1 to Experiment-4, for each 5-fold cross-validation, are shown in Fig. 4, Fig. 5, Fig. 6, Fig. 7 , respectively. From the evaluation of the learning curves, it is observed that there is a minimal fluctuations and better convergence between the training and validation curves that shows the Good-Fit of the model.

Fig. 4

Fig. 5

Learning Curves of binary classification, COVID-19 vs. Normal, using ResNet50 based architecture (Experiment-2) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset.

Fig. 6

Learning Curves of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using VGG16 based architecture (Experiment-3) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset.

Fig. 7

Learning Curves of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using ResNet50 based architecture (Experiment-4) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset.

Learning Curves of binary classification, COVID-19 vs. Normal, using VGG16 based architecture (Experiment-1) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Learning Curves of binary classification, COVID-19 vs. Normal, using ResNet50 based architecture (Experiment-2) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Learning Curves of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using VGG16 based architecture (Experiment-3) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Learning Curves of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using ResNet50 based architecture (Experiment-4) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Confusion matrix [53] is an essential tool to analyze the model performance, which provides a matrix of true labels vs. predicted labels. Confusion matrices of Experiment-1 to Experiment-4, for each fold, are shown in Fig. 8, Fig. 9, Fig. 10, Fig. 11 , respectively. The diagonal elements of the matrix show the number of correct classification in each category. Through the confusion matrix, the performance measures can be calculated for each class, giving a better understanding of the relations and interdependencies among various classes. It is observed that the Normal category of CT images are categorized exceptionally well from COVID-19 and Pneumonia images. However, major confusion occurred between COVID and Pneumonia categories. This confusion is primarily due to the similarity of impacts made by the diseases on human.

Fig. 8

Fig. 9

Confusion Matrix of binary classification, COVID-19 vs. Normal, using ResNet50 based architecture (Experiment-2) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset.

Fig. 10

Confusion Matrix of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using VGG16 based architecture (Experiment-3) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset.

Fig. 11

Confusion Matrix of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using ResNet50 based architecture (Experiment-4) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset.

Confusion Matrix of binary classification, COVID-19 vs. Normal, using VGG16 based architecture (Experiment-1) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Confusion Matrix of binary classification, COVID-19 vs. Normal, using ResNet50 based architecture (Experiment-2) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Confusion Matrix of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using VGG16 based architecture (Experiment-3) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. Confusion Matrix of multiclass classification, COVID-19 vs. Normal vs. Pneumonia, using ResNet50 based architecture (Experiment-4) using: (A) Fold-1 dataset; (B) Fold-2 dataset; (C) Fold-3 dataset; (D) Fold-4 dataset; (E) Fold-5 dataset. In last one year, several efforts have been made in development of deep learning based COVID-19 detection models. Direct comparison with the proposed models in literature could not be carried out view lack of common training dataset. However, some comparisons have been undertaken in-terms of proposed methodologies. Authors in [29] have used DarkCOVIDNet model and achieves an average classification accuracy of 87.02% using 5-fold cross-validation on X-ray images. COVID-Net network architecture in [37] achieves classification accuracy of 93.3%, whereas, ResNet50 and VGG16 based models in [35] achieves classification accuracy of 87.54% and 74.84%, respectively. It is pertinent to note that the dataset in [37], [35] are class unbalanced due to lack of adequate COVID-19 images and have been evaluated using single-fold cross-validation only. Superior classification accuracy between COVID-19 and Pneumonia has been proposed using ultrasound images in [38]. The authors in [40] attains validation accuracy of 92.81% for binary class classification (COVID-19 vs. Normal) and 96.83% for three-class classification (COVID-19 vs. Pneumonia vs. Normal) using feature-based feed-forward neural network. The author in [41] using a 10-fold cross-validation experiment for binary classification (normal vs. COVID-19), achieves a sensitivity of 94.44%, a specificity of 93.63%, and accuracy of 94.03%. The proposed model in [42] achieves average F1-score of 97.04 and precision of 97.32%, 96.42%, 96.99%, 97.38% on COVID-19, pneumonia, tuberculosis and healthy classes, respectively. The extractor-classifier combination in [43] achieves the best F1-score of 98.5% using MobileNet architecture with the SVM classifier and a linear Kernel. The authors attains the best F1-score of 95.6% using DenseNet201 with multi-layer perceptron (MLP) for the other dataset. U-Net-based architecture [44] attains average classification accuracy of 94.26% using 10-fold cross-validation using lungs CT scan images. The model [45] based on Multiple Kernels-Extreme Learning Machine and neural network using chest CT images achieves classification accuracy of 98.36% for binary classification. The proposed model in [46] obtains classification accuracy of 100% for binary classification (COVID vs. Non-COVID), 99.72% for three-class classification (COVID-19 vs. Normal vs. Pneumonia) and 94.44% for four-class classification (COVID-19 vs. Normal vs. Bacterial Pneumonia vs. Viral Pneumonia). Based on the extensive experimentation and detailed performance analysis of the proposed models in this study, some important observations have been made, which are briefly presented as follows: In Experiment-1, the model shows superior performance with a test accuracy of more than 97% in all the dataset folds. In all folds, training and validation learning curves converge perfectly, showing a Good-Fit of the model. In Experiment-2, the performance of the model is outstanding. It achieves test accuracy of almost 100% in all experiments that show the great avenues in developing technology for the detection of COVID-19 disease using CT images. This performance is consistent in all the 5-Fold experiments. The training and validation learning curves are converging very well without any substantial random variations. In Experiment-3, the model achieves varying degrees of test accuracies ranging from 80% to 96%. This variation is mainly due to the diverse origin of input images and the similarity of features of Pneumonia and COVID-19 images. However, in all the experiment folds, the training and validation learning curves converge very well. The model improvement can be achieved by adding more number of input training images. In Experiment-4, the Fold-5 experiment achieved the highest test accuracy of 97%, and the Fold-2 experiment achieved the lowest test accuracy of 78%. It shows a wide variation in the test accuracy from different experiments. Comprehensive analysis of the confusion matrix reveals that the reduction in performance is primarily due to the confusion between Pneumonia and COVID classes. As both diseases are viral with similar symptoms and training images are comparatively smaller in number, the model could not learn essential higher-order features that lead to lower test accuracies in some experiments. The model achieved an average accuracy of 88.52% that can be considered a promising performance towards developing a deep learning-based COVID-19 detection model. Stratified 5-Fold cross-validation plays a crucial role in assessing the effectiveness and robustness of the model. It averages the biases towards the large variations in the test dataset as final accuracy will be the mean of accuracy from each fold. Many deep learning-based models proposed in the literature have been evaluated based on single fold experiment only. It is considered to be a preliminary evaluation technique as noticeable from the large variations among the test results from 5-fold cross-validation of each experiment.

Conclusion and future scopes

A deep learning model based on VGG16 and ResNet50 transfer learning architecture has been experimented in this study for the rapid and efficient detection of COVID-19 disease using CT scan images. Four different experiments have been undertaken to evaluate the proposed models for binary class classification (COVID vs. Normal) and multiclass classification (COVID vs. Normal vs. Pneumonia). The robustness of the proposed models has been assessed through the stratified 5-fold cross-validation. It is observed that the proposed model performs exceptionally well in case of binary classification with an average classification accuracy of more than 99% in both VGG16 and ResNet50 based models. The model’s performance degrades relatively when Pneumonia images are added to make it multiclass classification, though it shows superior and promising results. With the limited available dataset, the proposed model achieves an average accuracy of 86.74% and 88.52% using VGG16 and ResNet50 architectures as baseline, respectively, in the case of multiclass classification. This degradation is primarily due to viral nature and similar impacts on human by COVID-19 and Pneumonia. The results analysis of the proposed model has shown promising results, which would enable the development of AI-based automated solutions/systems assisting the radiologist in efficient and accurate detection of COVID-19 disease. The proposed model can be further built upon or improved by researching the following areas: Availability of a large number of good quality training images will enable the learning of diverse and higher-order features and will reduce the generalization error that would further accelerate the research and development. Various high performing transfer learning-based models can be integrated with the proposed model to improve the model’s learning capabilities and performance. Performance can be further improved upon by incorporating X-ray or ultrasound images along with CT scan images. Pre-processing measures can be incorporated into the proposed model to learn the fine-order distinguishing features between Pneumonia and CT scan images to improve the overall performance. A hybrid approach based on integration of fuzzy logic and artificial neural network can be studied to have a robust representation of partial truth/uncertainty in the classification. Performance study of the proposed model can be undertaken by collecting continent-wise dataset to have more understanding about the impact of regional variation.

CRediT authorship contribution statement

Narendra Kumar Mishra: Conceptualization, Methodology, Software, Validation, Writing - original draft, Writing - review & editing. Pushpendra Singh: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing. Shiv Dutt Joshi: Conceptualization, Methodology, Validation, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

24 in total

1. Efficient Epileptic Seizure Prediction Based on Deep Learning.

Authors: Hisham Daoud; Magdy A Bayoumi
Journal: IEEE Trans Biomed Circuits Syst Date: 2019-07-17 Impact factor: 3.833

2. SDCT-AuxNet^θ: DCT augmented stain deconvolutional CNN with auxiliary classifier for cancer diagnosis.

Authors: Shiv Gehlot; Anubha Gupta; Ritu Gupta
Journal: Med Image Anal Date: 2020-02-04 Impact factor: 8.545

3. Modeling and prediction of COVID-19 pandemic using Gaussian mixture model.

Authors: Amit Singhal; Pushpendra Singh; Brejesh Lall; Shiv Dutt Joshi
Journal: Chaos Solitons Fractals Date: 2020-06-16 Impact factor: 5.944

4. COVID-19 Detection System Using Chest CT Images and Multiple Kernels-Extreme Learning Machine Based on Deep Neural Network.

Authors: M Turkoglu
Journal: Ing Rech Biomed Date: 2021-01-27

5. COVID-19 classification by CCSHNet with deep fusion using transfer learning and discriminant correlation analysis.

Authors: Shui-Hua Wang; Deepak Ranjan Nayak; David S Guttery; Xin Zhang; Yu-Dong Zhang
Journal: Inf Fusion Date: 2020-11-13 Impact factor: 12.975

6. COVIDetection-Net: A tailored COVID-19 detection from chest radiography images using deep learning.

Authors: Ahmed S Elkorany; Zeinab F Elsharkawy
Journal: Optik (Stuttg) Date: 2021-02-01 Impact factor: 2.443

7. Automatic detection of COVID-19 disease using U-Net architecture based fully convolutional network.

Authors: Prasad Kalane; Sarika Patil; B P Patil; Davinder Pal Sharma
Journal: Biomed Signal Process Control Date: 2021-02-20 Impact factor: 3.880

8. Using X-ray images and deep learning for automated detection of coronavirus disease.

Authors: Khalid El Asnaoui; Youness Chawki
Journal: J Biomol Struct Dyn Date: 2020-05-22

12 in total

1. Multi-task semantic segmentation of CT images for COVID-19 infections using DeepLabV3+ based on dilated residual network.

Authors: Hasan Polat
Journal: Phys Eng Sci Med Date: 2022-03-14

2. Automatic classification of brain magnetic resonance images with hypercolumn deep features and machine learning.

Authors: Kemal Akyol
Journal: Phys Eng Sci Med Date: 2022-08-23

3. Automated diagnosis of COVID stages from lung CT images using statistical features in 2-dimensional flexible analytic wavelet transform.

Authors: Rajneesh Kumar Patel; Manish Kashyap
Journal: Biocybern Biomed Eng Date: 2022-07-01 Impact factor: 5.687

Review 4. Automated COVID-19 diagnosis and prognosis with medical imaging and who is publishing: a systematic review.

Authors: Ashley G Gillman; Febrio Lunardo; Joseph Prinable; Gregg Belous; Aaron Nicolson; Hang Min; Andrew Terhorst; Jason A Dowling
Journal: Phys Eng Sci Med Date: 2021-12-17

5. Computer-Aided Detection of COVID-19 from CT Images Based on Gaussian Mixture Model and Kernel Support Vector Machines Classifier.

Authors: Ahmet Saygılı
Journal: Arab J Sci Eng Date: 2021-10-07 Impact factor: 2.807