Fareed Ahmad, Muhammad Usman Ghani Khan, Kashif Javed.
Abstract
The novel coronavirus is deadly for humans and animals. The ease of its dispersion, coupled with its tremendous capacity to cause illness and death in infected people, makes it a risk to society. The chest X-ray is a conventional, but hard to interpret, radiographic test for the initial diagnosis of coronavirus infection and its differentiation from other related infections. It carries a considerable amount of information on physiological and anatomical features, yet extracting the relevant information can be challenging even for a professional radiologist. In this regard, deep-learning models can help deliver swift, accurate and reliable outcomes. Existing datasets are small and suffer from class imbalance. In this paper, we prepare a relatively larger and well-balanced dataset compared to the available datasets. Furthermore, we analyze deep learning models, namely AlexNet, SqueezeNet, DenseNet201, MobileNetV2 and InceptionV3, under numerous variations: training the models from scratch, fine-tuning without pre-trained weights, fine-tuning while updating the pre-trained weights of all layers, and fine-tuning with pre-trained weights combined with data augmentation. Our results show that fine-tuning with augmentation generates the best results among pre-trained models. Finally, we make architectural adjustments to the MobileNetV2 and InceptionV3 models to learn more intricate features, which are then merged in our proposed ensemble model. The performance of our model is statistically analyzed against the other models using four performance metrics with a paired two-sided t-test on 5 different splits of our dataset into training and test sets. We find that it is statistically better than its competing methods on all four metrics. Thus, computer-aided classification based on the proposed model can assist radiologists in identifying coronavirus from other related infections in chest X-rays with higher accuracy.
This can help in a reliable and speedy diagnosis, thereby saving valuable lives and mitigating the adverse socioeconomic impact on our community.
Keywords: Data augmentation; Deep learning models; Ensemble learning; Feature fusion; Novel coronavirus classification; Transfer learning
Year: 2021 PMID: 34010794 PMCID: PMC8058056 DOI: 10.1016/j.compbiomed.2021.104401
Source DB: PubMed Journal: Comput Biol Med ISSN: 0010-4825 Impact factor: 4.589
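The abstract describes merging adjusted MobileNetV2 and InceptionV3 networks into a single ensemble. As a minimal sketch of one common way to combine two classifiers, the snippet below averages the two networks' softmax outputs (score-level fusion); note that the paper's actual model fuses learned features, and the class ordering and logit values here are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores to a probability distribution."""
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(logits_a, logits_b):
    """Average the two models' softmax outputs and pick the top class."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    fused = [(x + y) / 2 for x, y in zip(pa, pb)]
    return max(range(len(fused)), key=fused.__getitem__), fused

# Hypothetical 4-class logits (e.g. normal, bacterial, viral, COVID-19)
cls, probs = ensemble_predict([0.2, 0.1, 0.4, 2.0], [0.3, 0.0, 0.2, 1.5])
# cls → 3 (both branches favour the last class)
```

Feature-level fusion, as used in the paper, would instead concatenate the penultimate-layer activations of both branches before a shared classification head.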
A comparison of different CNN models for COVID-19 classification.
| Approach | No. of Classes | Dataset Details | Augmentation? | Dataset Balanced? | Transfer Learning? | 5-fold CV? | Ensemble Model? | Statistical Comparison? | Tuning Hyperparameters? |
|---|---|---|---|---|---|---|---|---|---|
| [ | 5 | N 191, B 54, T 57, V 20, C 180 | | | | | | | |
| [ | 3 | N 8851, P 6012, C 180 | | | | | | | |
| [ | 3 | N 1349, P 3895, C 66 | | | | | | | |
| [ | 5 | Not specified | | | | | | | |
| [ | 3 | N 8066, P 5521, C 183 | | | | | | | |
| [ | 3 | N 8851, P 9579, C 99 | | | | | | | |
| [ | 4 | N 7595, B 2780, C 313, UP 6012 | | | | | | | |
| [ | 2 | N 500, C 184 | | | | | | | |
| [ | 4 | N 1203, B 931, V 660, C 68 | | | | | | | |
| [ | 3 | N 1579, V 1485, C 423 | | | | | | | |
| [ | 3 | N 401, V 401, C 401 | | | | | | | |
Fig. 1 Example posteroanterior chest radiograph images of our dataset.
Fig. 2 Various phases of our proposed method.
Fig. 3 Flowchart of various steps of our proposed method.
The range of hyper-parameters used for different pre-trained models.
| Deep Model | Batch Size | Learning Rate |
|---|---|---|
| AlexNet | 48, 32 | 5e-10, 1e-8, 8e-7 |
| SqueezeNet | 48, 32, 24 | 1e-8, 8e-7, 1e-7 |
| DenseNet201 | 48, 32 | 1e-7, 1e-6, 1e-5 |
| MobileNetV2 | 48, 32, 24 | 1e-8, 6e-7, 4e-7, 1e-7 |
| InceptionV3 | 36, 32, 24 | 1e-8, 1e-6, 8e-5, 2e-5 |
| Our ensemble model (InceptionV3 + MobileNetV2) | 48 | 9e-8, 9e-7, 7e-7, 4e-7, 1e-7 |
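Ranges like those in the table above are typically explored as a grid over batch size and learning rate. The sketch below illustrates the pattern using the MobileNetV2 ranges from the table; the `evaluate` function is a hypothetical stand-in for the actual train-and-validate step, which the paper does not reproduce here.

```python
from itertools import product

# Search ranges taken from the table for MobileNetV2
batch_sizes = [48, 32, 24]
learning_rates = [1e-8, 6e-7, 4e-7, 1e-7]

def evaluate(batch_size, lr):
    """Placeholder for training the model and returning validation accuracy.
    This toy score simply peaks at (32, 4e-7) for illustration."""
    return -abs(lr - 4e-7) - abs(batch_size - 32) * 1e-9

# Exhaustively try every (batch size, learning rate) pair, keep the best
best = max(product(batch_sizes, learning_rates),
           key=lambda cfg: evaluate(*cfg))
# best → (32, 4e-7) for this toy score
```

In practice `evaluate` would fine-tune the network for a fixed number of epochs on the training split and report held-out accuracy, which is far more expensive than this sketch suggests.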
The number of training parameters and depth of pre-trained models.
| Pre-trained Model | Depth | Weight Parameters |
|---|---|---|
| AlexNet | 8 | 61 million |
| SqueezeNet | 18 | 1.24 million |
| DenseNet201 | 201 | 20.0 million |
| MobileNetV2 | 53 | 3.5 million |
| InceptionV3 | 48 | 25 million |
Fig. 4 InceptionV3 module.
A comparison of the performance of our ensemble model and various deep learning models.
| Model | Methods | Specificity | Recall | F-score | Accuracy |
|---|---|---|---|---|---|
| Our ensemble method (InceptionV3 + MobileNetV2) | | 98.97 ± 0.22 | 96.89 ± 0.65 | 96.90 ± 0.65 | 98.45 ± 0.32 |
| Ensemble method [ | | | | | |
| AlexNet | Trained on ODS-NPTW | 76.15 ± 0.37 | 28.48 ± 1.46 | 24.18 ± 1.82 | 64.36 ± 0.64 |
| | FT on ODS-NPTW | 92.69 ± 0.70 | 78.11 ± 2.20 | 77.77 ± 2.17 | 89.03 ± 1.03 |
| | FT on ODS-PTW | 97.41 ± 0.43 | 92.21 ± 1.25 | 92.20 ± 1.33 | 96.13 ± 0.62 |
| | FT on ADS-ALUF | | | | |
| SqueezeNet | Trained on ODS-NPTW | 75.28 ± 0.31 | 25.85 ± 0.95 | 14.00 ± 1.32 | 62.91 ± 0.69 |
| | FT on ODS-NPTW | 92.46 ± 0.49 | 77.30 ± 1.13 | 77.35 ± 1.15 | 88.69 ± 0.65 |
| | FT on ODS-PTW | 97.24 ± 0.38 | 91.77 ± 0.87 | 91.78 ± 0.95 | 95.87 ± 0.54 |
| | FT on ADS-ALUF | | | | |
| DenseNet201 | Trained on ODS-NPTW | 74.64 ± 1.43 | 23.73 ± 4.03 | 22.20 ± 4.68 | 61.99 ± 2.14 |
| | FT on ODS-NPTW | 87.46 ± 0.74 | 62.36 ± 2.49 | 62.55 ± 2.33 | 81.21 ± 1.01 |
| | FT on ODS-PTW | 97.36 ± 0.43 | 92.08 ± 1.35 | 92.04 ± 1.41 | 96.05 ± 0.65 |
| | FT on ADS-ALUF | | | | |
| MobileNetV2 | Trained on ODS-NPTW | 74.48 ± 0.57 | 23.46 ± 1.73 | 14.29 ± 0.85 | 61.74 ± 1.06 |
| | FT on ODS-NPTW | 92.33 ± 0.54 | 77.04 ± 1.06 | 76.68 ± 1.13 | 88.50 ± 0.74 |
| | FT on ODS-PTW | 97.91 ± 0.21 | 93.76 ± 0.63 | 93.71 ± 0.66 | 96.86 ± 0.32 |
| | FT on ADS-ALUF | | | | |
| InceptionV3 | Trained on ODS-NPTW | 72.16 ± 0.58 | 16.57 ± 1.33 | 16.31 ± 1.00 | 58.24 ± 0.59 |
| | FT on ODS-NPTW | 85.01 ± 0.41 | 55.02 ± 1.09 | 55.29 ± 1.17 | 77.53 ± 0.52 |
| | FT on ODS-PTW | 98.08 ± 0.45 | 94.19 ± 1.63 | 94.21 ± 1.60 | 97.11 ± 0.70 |
| | FT on ADS-ALUF | | | | |
ODS, ADS, NPTW, PTW, ALUF and FT stand for original dataset, augmented dataset, no pre-trained weights, pre-trained weights, all layers unfrozen, and fine-tuned, respectively. A • denotes that our deep learning ensemble model is statistically better than the competing model.
Fig. 5 Confusion matrices for 5-fold cross-validation.
Fig. 6 ROC curve of the deep ensemble model for 5-fold cross-validation.
Fig. 7 Learning curves for training and validation accuracy (blue and black dotted lines) and training and validation loss (orange and black dotted lines) of fold 2 of the fine-tuned, pre-trained ensemble model for various infections in X-rays.
A comparison of our model with previous approaches for COVID-19 classification.
(c) If the null hypothesis is rejected, the performance of our model is statistically different from that of the other model. We consider it a win for our model if its mean accuracy is greater than that of the competing model, and denote it by a •. Otherwise, it is a loss for our model, denoted by a ◦.
If the t-test does not reject the null hypothesis, the performance of our model is not statistically different from that of the other model, and we consider it a tie. A tie is represented by no symbol.
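The win/tie/loss rule above can be sketched in a few lines. With 5 splits the paired t statistic has 4 degrees of freedom, for which the standard two-sided critical value at α = 0.05 is 2.776; the accuracy values in the example call are hypothetical.

```python
import math
from statistics import mean, stdev

# Two-sided critical t value for df = n - 1 = 4 at alpha = 0.05
T_CRIT_DF4 = 2.776

def compare(ours, other):
    """Paired two-sided t-test over matched splits.
    Returns 'win' (•), 'loss' (◦), or 'tie' (no symbol) per the rule above."""
    diffs = [a - b for a, b in zip(ours, other)]
    n = len(diffs)
    sd = stdev(diffs)
    if sd == 0:  # identical differences on every split
        t = math.inf if mean(diffs) != 0 else 0.0
    else:
        t = mean(diffs) / (sd / math.sqrt(n))
    if abs(t) <= T_CRIT_DF4:
        return "tie"  # null hypothesis not rejected
    return "win" if mean(ours) > mean(other) else "loss"

# Hypothetical per-split accuracies for our model vs. a competitor
result = compare([98.4, 98.7, 98.2, 98.5, 98.6],
                 [96.1, 96.5, 95.9, 96.3, 96.2])
# result → "win"
```

Using `scipy.stats.ttest_rel` and its p-value would be equivalent; the critical-value form is shown here to keep the sketch dependency-free.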
| Approach | Classes | COVID-19 Images | Balanced? | 5-fold CV? | Statistical Comparison? | Tuning Hyperparameters? | Accuracy |
|---|---|---|---|---|---|---|---|
| Our model | 4 | 1000 | | | | | 98.45% |
| COVID-Net [ | 3 | 180 | | | | | 93.3% |
| CoroNet [ | 4 | 284 | | | | | 89.6% |
| XGB [ | 4 | 130 | | | | | 79.52% |
| DELT [ | 4 | 305 | | | | | 90.13% |
| COVID-ResNet [ | 4 | 68 | | | | | 96.23% |