
Coronavirus disease (COVID-19) detection in Chest X-Ray images using majority voting based classifier ensemble.

Tej Bahadur Chandra1, Kesari Verma1, Bikesh Kumar Singh2, Deepak Jain3, Satyabhuwan Singh Netam3.   

Abstract

Novel coronavirus disease (nCOVID-19) is the most challenging problem facing the world. The disease is caused by severe acute respiratory syndrome coronavirus-2 (SARS-COV-2) and has led to high morbidity and mortality worldwide. Studies reveal that infected patients exhibit distinct radiographic visual characteristics along with fever, dry cough, fatigue, dyspnea, etc. Chest X-Ray (CXR) is one of the important, non-invasive clinical adjuncts that play an essential role in the detection of such visual responses associated with SARS-COV-2 infection. However, the limited availability of expert radiologists to interpret CXR images and the subtle appearance of disease radiographic responses remain the biggest bottlenecks in manual diagnosis. In this study, we present an automatic COVID screening (ACoS) system that uses radiomic texture descriptors extracted from CXR images to identify normal, suspected, and nCOVID-19 infected patients. The proposed system uses a two-phase classification approach (normal vs. abnormal and nCOVID-19 vs. pneumonia) with a majority vote based classifier ensemble of five benchmark supervised classification algorithms. The training-testing and validation of the ACoS system are performed using 2088 (696 normal, 696 pneumonia and 696 nCOVID-19) and 258 (86 images of each category) CXR images, respectively. The obtained validation results for phase-I (accuracy (ACC) = 98.062%, area under curve (AUC) = 0.977) and phase-II (ACC = 91.329% and AUC = 0.913) show the promising performance of the proposed system. Further, the Friedman post-hoc multiple comparisons and z-test statistics reveal that the results of the ACoS system are statistically significant. Finally, the obtained performance is compared with existing state-of-the-art methods.
© 2020 Elsevier Ltd. All rights reserved.


Keywords:  Chest X-Ray; Contagious; Coronavirus; Pandemic; Pneumonia; SARS-COV-2; nCOVID-19

Year:  2020        PMID: 32868966      PMCID: PMC7448820          DOI: 10.1016/j.eswa.2020.113909

Source DB:  PubMed          Journal:  Expert Syst Appl        ISSN: 0957-4174            Impact factor:   6.954


Introduction

The recent outbreak of the novel coronavirus disease (nCOVID-19) has infected millions of people and caused a large number of deaths across the world (“Coronavirus Disease 2019, 2020”; “Johns Hopkins University, Corona Resource Center, 2020”). The World Health Organization (WHO) has declared this epidemic a global health emergency. nCOVID-19 is caused by a highly contagious virus named severe acute respiratory syndrome coronavirus-2 (SARS-COV-2), and the infection can be transmitted even by asymptomatic patients during the incubation period (Huang et al., 2020, Kooraki et al., 2020). According to experts, the virus mainly infects the human respiratory tract, leading to severe bronchopneumonia with symptoms such as fever, dyspnea, dry cough, fatigue, and respiratory failure (N. Chen et al., 2020, Cheng et al., 2020). There is no specific vaccine or medication available to cure the disease and prevent its further spread. Also, the standard confirmatory clinical test for detecting nCOVID-19, the reverse transcription-polymerase chain reaction (RT-PCR) test, is manual, complex, and time-consuming (Chowdhury, Rahman, Khandakar, Mazhar, Kadir, Mahbub, & Reaz, 2020). The limited availability of test kits and domain experts in hospitals and the rapid increase in the number of infected patients necessitate an automatic screening system, which can act as a second opinion for expert physicians to quickly identify infected patients who require immediate isolation and further clinical confirmation. Chest X-Ray (CXR) is one of the important, non-invasive clinical adjuncts that play an essential role in the preliminary investigation of different pulmonary abnormalities (Chandra & Verma, 2020, Chandra and Verma, 2020a, Chandra et al., 2020, Ke et al., 2019). 
It can act as an alternative screening modality for the detection of nCOVID-19 or to validate the related diagnosis, where the CXR images are interpreted by expert radiologists to look for infectious lesions associated with nCOVID-19. Earlier studies reveal that infected patients exhibit distinct visual characteristics in CXR images, as shown in Fig. 1 (Cheng et al., 2020, Chowdhury et al., 2020, Chung et al., 2020, Zhang et al., 2020). These characteristics typically include multi‐focal, bilateral ground‐glass opacities and patchy reticular (or reticulonodular) opacities in non-ICU patients, and dense pulmonary consolidations in ICU patients (Hosseiny, Kooraki, Gholamrezanezhad, Reddy, & Myers, 2020). However, manual interpretation of these subtle visual characteristics on CXR images is challenging and requires domain expertise (Kanne, Little, Chung, Elicker, & Ketai, 2020; L. Wang & Wong, 2020). Moreover, the exponential increase in the number of infected patients makes it difficult for radiologists to complete the diagnosis in time, contributing to high morbidity and mortality (El Asnaoui, Chawki, & Idri, 2020).
Fig. 1

(a–c) nCOVID-19 infected chest X-Ray images, (d–f) pneumonia infected chest X-Ray images, (g–i) normal chest X-Ray images.

To fight the nCOVID-19 epidemic, recent machine learning (ML) techniques can be embedded in an automatic computer-aided diagnosis (CAD) system. In this direction, many clinical and radiological studies have been reported, describing various radio-imaging findings and the epidemiology of nCOVID-19 (N. Chen et al., 2020, Huang et al., 2020, Kooraki et al., 2020, Yoon et al., 2020). Further, many deep learning models, such as deep convolutional networks, recursive networks, and transfer learning models, have been implemented to automatically analyze radiological disease characteristics (Chouhan et al., 2020, Jaiswal et al., 2019). Xue et al. (2018) used a convolutional neural network (CNN) to assign class labels to superpixels extracted from the lung parenchyma and localized tuberculosis-infected regions in CXR images with an average Dice index of 0.67. Pesce et al. (2019) used two novel models: the first is based on a backpropagation neural network that uses weakly labeled CXR images and generates visual attention feedback for accurate localization of pulmonary lesions; the second is a reinforcement learning-based recurrent attention model, which learns the sequence of images to find nodules. Recently, Purkayastha, Buddi, Nuthakki, Yadav, and Gichoya (2020) introduced CheXNet, a deep learning (DL) based model integrated with the LibreHealth Radiology Information System, which analyzes uploaded CXR images and assigns one of 14 diagnostic labels. Motivated by the promising performance of DL models reported in the literature and the urgent need for an alternative screening tool for early detection of nCOVID-19 infected patients, the research community has applied different DL techniques to chest radiograph images (Abbas, Abdelsamea, & Gaber, 2020; S. Wang et al., 2020, Xu et al., 2020). 
The detailed description of different state-of-the-art methods, including the imaging modality, dataset size, algorithms, and obtained performance, is summarized in Table 1. Initially, authors used a mixture of CXR images collected from different hospitals, publications, and older repositories (Abbas, Abdelsamea, & Gaber, 2020, Hemdan, Shouman, & Karar, 2020, Narin et al., 2020). However, the limited availability of annotated CXR images of nCOVID-19 cases for training data-hungry DL models turned out to be the biggest bottleneck (X. Wang et al., 2017). Later, to avoid overfitting, studies used data augmentation techniques, which generate different variants of the source image by applying random photometric transformations such as blurring, sharpening, and contrast adjustment (Chowdhury et al., 2020; S. Wang et al., 2020, Xu et al., 2020). Further, CT images have also been used to perform in-depth volumetric analysis of subtle disease responses (similar to viral pneumonia or other inflammatory lung diseases) (Maghdid, Asaad, Ghafoor, Sadiq, & Khan, 2020; S. Wang et al., 2020, Xu et al., 2020).
Table 1

Existing literature for detection of nCOVID-19 using DL approaches. (Abbreviations: CXR: Chest X-Ray, CT: Computed Tomography, nCOVID: Novel Coronavirus Disease, AUC: Area Under Curve, CNN: Convolutional Neural Network, DT: Decision Tree, SVM: Support Vector Machine, KNN: k-Nearest Neighbor, VGG: Visual Geometry Group).

ArticlesImaging ModalityDataset Size
ClassAlgorithms/TechniquesPerformance
nCOVID-19PneumoniaNormalAugmentedACC (%)AUC
Ozturk et al. (2020)CXR1275005003DarkNet87.02
Xu et al. (2020)CT2192241752634 patches nCOVID,2661 patches Pneumonia,6576 patches Normal3ResNet-1886.70
Panwar et al. (2020)CXR1421422nCOVnet88.100.88
S. Wang et al. (2020)CT791801065 (740 Negative, 325 nCOVID)2Deep Learning89.50
Hemdan, Shouman, and Karar (2020)CXR25252VGG19, DenseNet201, ResNetV2, InceptionV3, InceptionResNetV2, Xception, MobileNetV290.00
Pathak, Shukla, Tiwari, Stalin, and Singh (2020)CT4134392ResNet-5093.020.93
L. Wang et al. (2020)CXR385553880663COVID-Net93.30
Maghdid, Asaad, Ghafoor, Sadiq, and Khan (2020)CXRCT85 CXR203 CT85 CXR158 CT2AlexNet, Modified CNN94.00 CXR82.0 CT
Abbas, Abdelsamea, and Gaber (2020)CXR10511803DeTraC95.12
AsnaouiEl Chawki and Idri (2020)CXR427315832CNN, VGG16 VGG19, Inception_V3, Xception, DensNet201, MobileNet_V2, Inception_ Resnet_V2, Resnet5096.61
Chowdhury et al. (2020)CXR4231485157965403AlexNet, ResNet18, DenseNet201, SqueezeNet97.94
Narin et al. (2020)CXR50502ResNet50, ResNetV2, InceptionV398.00
Ucar and Korkmaz (2020)CXR663895134946083Deep Bayes-SqueezeNe98.26
Nour, Cömert, and Polat (2020)CXR219134513417653CNN, SVM, DT, KNN98.97
Ardakani, Kanafi, Acharya, Khadem, and Mohammadi (2020)CT510510VGG-16, ResNet-18, ResNet-101, AlexNet, VGG-19, Xception, SqueezeNet, GoogleNet, MobileNet-V2, ResNet-5099.510.99
Toğaçar et al. (2020)CXR29598653MobileNetV2, SqueezeNet, SVM99.271.00
A retrospective analysis of the above literature shows that existing studies were performed using a limited number of input CXR or CT images, which may lead to overfitting of the data-hungry DL models (X. Wang et al., 2017). Moreover, DL approaches require huge computational resources along with a large number of accurately annotated CXR images to train the model, which restricts their clinical acceptability (Altaf et al., 2019, Ho and Gwak, 2019). Conventional ML techniques can be better integrated with CAD systems to overcome these shortcomings. Despite several studies, to the best of our knowledge, no prior work has used conventional ML approaches with ensemble learning based on majority voting for the classification of normal and nCOVID-19 infected CXR images. In this study, we develop an automatic COVID screening (ACoS) system that employs hierarchical classification using conventional ML algorithms and radiomic texture descriptors to segregate normal, pneumonia, and nCOVID-19 infected patients. The major advantage of the proposed system is that it can be modeled using a limited number of annotated images and can be deployed even in resource-constrained environments.

Contributions and organization of the paper

The contributions of this study are summarized as follows:
1. We propose an ACoS system for the detection of nCOVID-19 infected patients using hierarchical classification and augmented images. The proposed model can be used as a retrospective tool or to validate the related diagnosis.
2. We apply a majority vote based classifier ensemble to aggregate the prediction results of five supervised classification algorithms.
3. We review and compare the performance of the proposed ACoS system with state-of-the-art methods.
The remaining sections of the paper are organized as follows. Section 2 describes the materials and methods used in this study. The obtained results and their detailed analysis are discussed in Section 3. The paper is concluded in Section 4.

Materials and methods

Data acquisition, preprocessing, and augmentation

In this study, we used datasets from three public repositories: the COVID‐Chestxray set (Cohen, Morrison, & Dao, 2020), the Montgomery set (Candemir et al., 2014, Jaeger et al., 2014), and the NIH ChestX-ray14 set (X. Wang et al., 2017). The detailed statistics of the number of posteroanterior (PA) view CXR images used from each repository are shown in Table 2. Initially, all input images are preprocessed, which includes image resizing (512 × 512 pixels), format conversion (Portable Network Graphics), and color space conversion (grayscale). Subsequently, a texture-preserving guided filter is applied to reduce the inherent quantum noise (Sprawls, 2018). The choice of the de-noising filter is based on our previous study (Chandra & Verma, 2020a).
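The preprocessing chain described above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' MATLAB code: nearest-neighbour resizing stands in for whatever interpolation was actually used, the grayscale weights are the ITU-R BT.601 luminance weights, and the guided filter follows the standard formulation of He et al. with the image guiding itself. The radius `r=4` and regulariser `eps=1e-3` are illustrative values, since this section does not report the filter parameters.

```python
import numpy as np

def to_grayscale(rgb):
    """Luminance from RGB using ITU-R BT.601 weights."""
    return rgb[..., :3] @ np.array([0.2989, 0.5870, 0.1140])

def resize_nearest(img, size=512):
    """Nearest-neighbour resize of a 2-D image to size x size pixels (assumed interpolation)."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window via an integral image, edges replicated."""
    k = 2 * r + 1
    h, w = img.shape
    s = np.cumsum(np.cumsum(np.pad(img, r, mode="edge"), axis=0), axis=1)
    s = np.pad(s, ((1, 0), (1, 0)))          # leading zero row/column for the integral image
    return (s[k:k + h, k:k + w] - s[:h, k:k + w] - s[k:k + h, :w] + s[:h, :w]) / (k * k)

def guided_filter(p, guide=None, r=4, eps=1e-3):
    """Guided filter (He et al.); with guide=None the image guides itself."""
    I = p if guide is None else guide
    mean_I, mean_p = box_mean(I, r), box_mean(p, r)
    var_I = box_mean(I * I, r) - mean_I ** 2
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p
    a = cov_Ip / (var_I + eps)               # per-window linear model q = a * I + b
    b = mean_p - a * mean_I
    return box_mean(a, r) * I + box_mean(b, r)

def preprocess(image):
    """Grayscale, scale to [0, 1], resize to 512 x 512, then self-guided denoising."""
    img = np.asarray(image, dtype=float)
    gray = img if img.ndim == 2 else to_grayscale(img)
    return guided_filter(resize_nearest(gray / 255.0))

# Synthetic noisy "radiograph" in place of a real CXR file.
rng = np.random.default_rng(0)
noisy = np.clip(0.5 + 0.05 * rng.standard_normal((600, 700)), 0, 1) * 255
out = preprocess(np.stack([noisy] * 3, axis=-1))
print(out.shape)   # (512, 512)
```

A larger `eps` smooths more aggressively, while a smaller one keeps more local texture; the latter is the property that makes an edge- and texture-preserving filter attractive ahead of radiomic feature extraction.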
Table 2

Statistics of the number of CXR images used from different repositories for performance evaluation in training, testing, and validation set.

Dataset property | COVID‐Chestxray set | Montgomery set | NIH ChestX-ray14 set | Augmented images | Training-testing set (80%) | Validation set (20%)
Number of nCOVID-19 CXR images | 434 | – | – | 348 | 696 | 86
Number of pneumonia X-ray images | 89 | – | 345 | 348 | 696 | 86
Number of normal X-ray images | 19 | 80 | 335 | 348 | 696 | 86
Total number of X-ray images | 542 | 80 | 680 | 1044 | 2088 | 258
The preprocessed images are divided into two subsets: a training-testing set (80%) and a validation set (20%). Further, image augmentation is applied to the images of the training-testing set to build a generalized model that incorporates the possible variability in the images that might occur due to diverse imaging conditions. We applied different random photometric transformations with random parameters within the specified ranges, as described in Table 3.
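The per-class counts in Table 2 follow from a simple split: each class contributes 434 images, 86 of which are held out for validation, and each remaining training image receives one augmented copy. The sketch below reproduces that arithmetic. Rounding the 20% hold-out down (`int(0.2 * 434) = 86`) is our assumption about how the split was realised, since an exact 80% of 434 images would be 347.2; the one-copy-per-image augmentation is likewise inferred from the 348 augmented images per class.

```python
import numpy as np

def split_per_class(n_images, val_fraction=0.2, seed=0):
    """Shuffle one class's image indices and hold out int(val_fraction * n) for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_images)
    n_val = int(val_fraction * n_images)      # 0.2 * 434 = 86.8, truncated to 86
    return idx[n_val:], idx[:n_val]           # (training-testing, validation)

# Each class (normal, pneumonia, nCOVID-19) has 434 images before splitting (Table 2).
train_idx, val_idx = split_per_class(434)
n_augmented = len(train_idx)                  # one augmented copy per training image
print(len(train_idx), len(val_idx), len(train_idx) + n_augmented)   # 348 86 696
```

Augmentation is applied only to the training-testing indices, so the validation images remain unseen originals.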
Table 3

Image augmentation using various photometric transformations.

Transformation | Range | Description
Sharpening | Automatic | Highlights fine details by adjusting the contrast between bright and dark pixels.
Gaussian blur | σ = 0.1 to 1.5 | Randomly smooths texture information, with σ drawn from the specified range.
Brightness | −20 to 20 | Randomly increases or decreases pixel intensities within the given range.
Contrast adjustment | Automatic | Adjusts the contrast of the image.
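A hedged sketch of the Table 3 transformations follows. The blur σ range and the brightness offset range come from Table 3; the "automatic" sharpening and contrast adjustment are not specified in this section, so unsharp masking (amount 1, σ = 1) and a 1st to 99th percentile contrast stretch are assumed stand-ins. Images are assumed to be float arrays on a 0 to 255 scale, and one randomly chosen transformation is applied per call.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma):
    """Separable Gaussian blur with edge replication; output keeps the input shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

def augment(img, rng):
    """Apply one randomly chosen photometric transform to a [0, 255] image."""
    op = str(rng.choice(["sharpen", "blur", "brightness", "contrast"]))
    if op == "blur":
        out = blur(img, rng.uniform(0.1, 1.5))          # sigma range from Table 3
    elif op == "brightness":
        out = img + rng.uniform(-20, 20)                 # intensity offset range from Table 3
    elif op == "contrast":
        lo, hi = np.percentile(img, [1, 99])             # assumed "automatic" stretch limits
        out = (img - lo) / max(hi - lo, 1e-6) * 255.0
    else:
        out = img + (img - blur(img, 1.0))               # assumed unsharp mask, amount 1
    return np.clip(out, 0, 255), op

rng = np.random.default_rng(1)
image = rng.uniform(0, 255, size=(64, 64))
augmented, op = augment(image, rng)
print(op, augmented.shape)
```

Clipping back to [0, 255] keeps every augmented variant a valid 8-bit-range image, so the same feature extractors can be applied to original and augmented images alike.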

Feature extraction and feature selection

The nCOVID-19 infected patients exhibit different radiographic texture patterns on CXR images, such as patchy ground-glass opacities (Fig. 2a), reticulonodular opacities (Fig. 2b), and pulmonary consolidations (Fig. 2c) (Hosseiny et al., 2020). These subtle visual characteristics can be efficiently represented with the help of radiomic texture descriptors. The study uses eight first-order statistical features (FOSF) (Srinivasan & Shobha, 2008), 88 grey level co-occurrence matrix (GLCM) features computed in four different orientations (Gómez et al., 2012, Haralick et al., 1973), and 8100 histogram of oriented gradients (HOG) features (Dalal & Triggs, 2005, Santosh and Antani, 2018). The FOSF describe the complete image at a glance using the mean, variance, roughness, smoothness, kurtosis, energy, entropy, etc. They can easily quantify global texture patterns; however, they do not consider local neighborhood information. To overcome this shortcoming, the GLCM and HOG feature descriptors are used to perform in-depth texture analysis. The GLCM features describe the spatial correlation among pixel intensities in radiographic texture patterns along four distinct directions (i.e., 0°, 45°, 90°, and 135°), whereas the HOG features encode local shape/texture information. The selection of these statistical texture features is motivated by the fact that they can efficiently encode natural texture patterns and are widely used in medical image analysis (Chandra and Verma, 2020a, Chandra et al., 2020, Santosh and Antani, 2018, Vajda et al., 2018).
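The three descriptor families can be illustrated with NumPy. The sketch below computes seven common first-order statistics (the exact eight FOSF are defined in Appendix A, which is not reproduced in this section), a GLCM with four Haralick-style descriptors for each of the four standard offsets, and the HOG vector length. Eight grey levels and the four offsets mirror the defaults of MATLAB's `graycomatrix`; making the matrix symmetric is our choice. The HOG parameters are an assumption: 9 orientation bins with 2 × 2-cell blocks overlapping by one cell (which we understand to be the defaults of MATLAB's `extractHOGFeatures`) and 32 × 32-pixel cells on a 512 × 512 image give exactly the 8100 features reported. Similarly, 88 GLCM features over four orientations correspond to 22 descriptors per orientation.

```python
import numpy as np

def first_order(img, bins=256):
    """Global first-order statistics of an image with intensities in [0, 1]."""
    hist, _ = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    nz = p[p > 0]
    mean, var = img.mean(), img.var()
    std = np.sqrt(var)
    return {
        "mean": mean,
        "variance": var,
        "skewness": ((img - mean) ** 3).mean() / std ** 3,
        "kurtosis": ((img - mean) ** 4).mean() / std ** 4,
        "smoothness": 1 - 1 / (1 + var),
        "energy": (p ** 2).sum(),
        "entropy": -(nz * np.log2(nz)).sum(),
    }

# (row, col) offsets for 0, 45, 90 and 135 degrees, as in MATLAB's graycomatrix.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(img, dy, dx, levels=8):
    """Symmetric, normalised grey level co-occurrence matrix for one offset."""
    q = np.minimum((img * levels).astype(int), levels - 1)   # quantise to 0..levels-1
    h, w = q.shape
    y0, y1 = max(0, -dy), min(h, h - dy)
    x0, x1 = max(0, -dx), min(w, w - dx)
    a = q[y0:y1, x0:x1]                          # reference pixels
    b = q[y0 + dy:y1 + dy, x0 + dx:x1 + dx]      # their neighbours at (dy, dx)
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)
    m = m + m.T                                  # count each pair in both directions
    return m / m.sum()

def haralick_subset(p):
    """Four Haralick descriptors computed from a normalised GLCM."""
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    return {
        "contrast": ((i - j) ** 2 * p).sum(),
        "energy": (p ** 2).sum(),
        "homogeneity": (p / (1 + np.abs(i - j))).sum(),
        "correlation": ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j),
    }

def hog_length(image_size=512, cell=32, block=2, overlap=1, bins=9):
    """Length of a HOG vector for a square image with square cells and blocks."""
    cells = image_size // cell
    blocks = (cells - block) // (block - overlap) + 1
    return blocks * blocks * block * block * bins

print(hog_length())   # 8100
```

On a checkerboard, the 0° GLCM only contains pairs of opposite grey levels (maximal contrast, correlation −1), while the 45° GLCM only pairs equal levels (zero contrast); this directional sensitivity is why the GLCM is computed in several orientations.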
Fig. 2

nCOVID-19 infected Chest X-ray images showing: (a) Ground‐glass opacities, (b) Reticular opacities, (c) Pulmonary consolidation, (d) Mild opacities.

In this study, a total of 8196 features (8 FOSF, 88 GLCM, 8100 HOG) are extracted from each CXR image (described in Appendix-A). However, not all the extracted features are relevant for accurate characterization of the visual indicators associated with nCOVID-19. Thus, to select the most informative features, we used a recently developed meta-heuristic approach called binary grey wolf optimization (BGWO) (Mirjalili et al., 2014, Too et al., 2018). The method imitates the leadership hierarchy and the encircling and hunting strategy of grey wolves. Compared with other evolutionary algorithms, the method is less prone to getting trapped in local minima, which motivated us to use it in our study (Emary, Zawbaa, & Hassanien, 2016). Mathematically, the grey wolves are divided into four categories denoted by alpha (α), beta (β), delta (δ), and omega (ω). The α-wolf is the decision-maker and administers the hunting process with the help of the β-wolves. The β-wolves are the fittest candidates to replace the alpha when the alpha is very old or dead. The δ-wolves are next in the hierarchy; they obey the orders of the α- and β-wolves but command the ω-wolves. The ω-wolves are the lowest in the hierarchy and report to these leader wolves. The encircling strategy of the wolves is described in Eqs. (1) and (2):

$$\vec{X}(t+1) = \vec{X}_p(t) - \vec{A}\cdot\vec{D} \tag{1}$$
$$\vec{D} = \left|\vec{C}\cdot\vec{X}_p(t) - \vec{X}(t)\right| \tag{2}$$

where $\vec{X}_p(t)$ and $\vec{X}(t)$ denote the positions of the prey and the grey wolf in the $t$-th iteration, respectively. $\vec{A}$ and $\vec{C}$ are coefficient vectors computed using Eqs. (3) and (4), respectively:

$$\vec{A} = 2a\cdot\vec{r}_1 - a \tag{3}$$
$$\vec{C} = 2\cdot\vec{r}_2 \tag{4}$$

where $\vec{r}_1$ and $\vec{r}_2$ are random vectors with components between 0 and 1, and $a$ denotes the encircling coefficient, which decreases linearly from 2 to 0 to balance the trade-off between exploration and exploitation. Further, the positions of the wolves at iteration $t$ are updated using Eqs. (5)–(8), where $\vec{A}_1, \vec{A}_2, \vec{A}_3$ are computed using Eq. (3) (and $\vec{C}_1, \vec{C}_2, \vec{C}_3$ using Eq. (4)), and $\vec{D}_\alpha, \vec{D}_\beta, \vec{D}_\delta$ are calculated using Eq. (2) with $\vec{X}_p$ replaced by $\vec{X}_\alpha$, $\vec{X}_\beta$, and $\vec{X}_\delta$, respectively:

$$\vec{X}_1 = \vec{X}_\alpha - \vec{A}_1\cdot\vec{D}_\alpha \tag{5}$$
$$\vec{X}_2 = \vec{X}_\beta - \vec{A}_2\cdot\vec{D}_\beta \tag{6}$$
$$\vec{X}_3 = \vec{X}_\delta - \vec{A}_3\cdot\vec{D}_\delta \tag{7}$$
$$\vec{X}(t+1) = \frac{\vec{X}_1 + \vec{X}_2 + \vec{X}_3}{3} \tag{8}$$
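The following Python sketch shows how a binary grey wolf optimiser can drive feature selection. It keeps continuous positions in [0, 1], moves each wolf towards the alpha, beta, and delta leaders with the random coefficient vectors A and C described above, averages the three leader-relative steps, and converts positions into feature masks through a sigmoid transfer function (one common binarisation, assumed here rather than taken from the study). The fitness, 0.99 × nearest-centroid classification error + 0.01 × fraction of selected features on synthetic data, is an illustrative assumption; this section does not state the objective used in the study.

```python
import numpy as np

def binarize(pos, rng):
    """Sigmoid transfer: select feature j with probability 1 / (1 + exp(-10 (x_j - 0.5)))."""
    prob = 1.0 / (1.0 + np.exp(-10.0 * (pos - 0.5)))
    return (rng.random(pos.shape) < prob).astype(int)

def bgwo(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Minimise fitness(mask) over 0/1 feature masks with a binary grey wolf optimiser."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_wolves, n_features))       # continuous wolf positions in [0, 1]
    leaders = []                                    # three best so far: alpha, beta, delta

    def consider(score, p, mask):
        leaders.append((score, p.copy(), mask.copy()))
        leaders.sort(key=lambda item: item[0])
        del leaders[3:]

    history = []
    for t in range(n_iter):
        for p in pos:                               # evaluate the pack, refresh the leaders
            mask = binarize(p, rng)
            consider(fitness(mask), p, mask)
        history.append(leaders[0][0])
        a = 2.0 - 2.0 * t / n_iter                  # decreases linearly from 2 towards 0
        for i in range(n_wolves):
            estimates = []
            for _, leader, _ in leaders:
                A = 2.0 * a * rng.random(n_features) - a
                C = 2.0 * rng.random(n_features)
                D = np.abs(C * leader - pos[i])     # distance to this leader
                estimates.append(leader - A * D)    # step relative to this leader
            pos[i] = np.clip(np.mean(estimates, axis=0), 0.0, 1.0)   # average of the steps
    best_score, _, best_mask = leaders[0]
    return best_mask, best_score, history

def make_fitness(X, y, weight=0.99):
    """weight * nearest-centroid error + (1 - weight) * fraction of selected features."""
    half = len(y) // 2
    X_tr, y_tr, X_te, y_te = X[:half], y[:half], X[half:], y[half:]

    def fitness(mask):
        cols = mask.astype(bool)
        if not cols.any():
            return 1.0                              # empty subset: worst possible score
        c0 = X_tr[y_tr == 0][:, cols].mean(axis=0)
        c1 = X_tr[y_tr == 1][:, cols].mean(axis=0)
        d0 = ((X_te[:, cols] - c0) ** 2).sum(axis=1)
        d1 = ((X_te[:, cols] - c1) ** 2).sum(axis=1)
        error = np.mean((d1 < d0).astype(int) != y_te)
        return weight * error + (1 - weight) * cols.mean()

    return fitness

# Synthetic data: only the first two of 20 features carry class information.
rng = np.random.default_rng(42)
y = np.arange(200) % 2
X = rng.standard_normal((200, 20))
X[:, :2] += 3.0 * y[:, None]
mask, score, history = bgwo(make_fitness(X, y), n_features=20)
print(np.flatnonzero(mask), round(float(score), 3))
```

Because the leaders are only ever replaced by better masks, the best score in `history` never increases; the small feature-count term nudges the search towards compact subsets once the classification error stops improving.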

Proposed methodology

To develop a robust ACoS system, we formulated the following hypotheses:
Hypothesis 1: Image augmentation could improve the robustness of the ACoS system by incorporating the variability in input CXR images that might occur due to diverse imaging modalities, exposure times, radiation doses, and patient postures.
Hypothesis 2: The solid mathematical foundation and better generalization capability of the support vector machine (SVM) could uncover the subtle radiological characteristics associated with nCOVID-19.
Hypothesis 3: A majority voting based classifier ensemble could act as a multi-expert recommendation and reduce the chance of a false diagnosis.
To evaluate these hypotheses, we propose a prototype ACoS system, as shown in Fig. 3. The proposed system consists of five major steps: preprocessing, image augmentation, feature extraction, classification, and performance evaluation. Initially, to examine Hypothesis 1, the input CXR images are preprocessed (resized, de-noised), and image augmentation is applied (described in Section 2.1). Subsequently, radiomic texture descriptors are extracted from the complete CXR image, and a binary grey wolf optimization (BGWO) (Emary et al., 2016, Mirjalili et al., 2014) based feature selection technique is applied to pick the most relevant features. Further, to examine Hypothesis 2, the selected features are used to train models using five supervised classification algorithms, namely decision tree (DT) (Shalev-Shwartz & Ben-David, 2014), support vector machine (SVM) (Vapnik, 1998), k-nearest neighbor (KNN) (Han, Kamber, & Pei, 2012), naïve Bayes (NB) (Rish et al., 2001), and artificial neural network (ANN) (Artificial Neural Network, 2013). The proposed methodology uses a two-phase classification approach. In phase-I, normal and abnormal (nCOVID-19 and pneumonia) images are segregated. Subsequently, the abnormal images are further classified in phase-II to segregate nCOVID-19 and pneumonia. 
Moreover, the fully trained model is validated using a separate validation set. The final prediction for the validation set is the majority vote of seven benchmark classifiers (ANN, KNN, NB, DT, SVM (linear kernel), SVM (radial basis function (RBF) kernel), and SVM (polynomial kernel)), which reduces the chance of misclassification (Hypothesis 3). Finally, the performance measures are evaluated for the testing and validation sets. All the experiments in this study are implemented using MATLAB R2018a.¹
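The two-phase routing can be summarised in a few lines. In this sketch, `phase1` and `phase2` are hypothetical callables standing in for the trained phase-I and phase-II models (or their majority-vote ensembles), and the integer label codings are assumptions; the point is that only images flagged as abnormal in phase-I ever reach phase-II.

```python
import numpy as np

ABNORMAL = 1   # assumed phase-I coding: 0 = normal, 1 = abnormal
COVID = 1      # assumed phase-II coding: 0 = pneumonia, 1 = nCOVID-19

def two_phase_predict(features, phase1, phase2):
    """Phase-I separates normal from abnormal; phase-II labels only the abnormal rows."""
    labels = np.full(len(features), "normal", dtype="<U9")
    abnormal = phase1(features) == ABNORMAL
    if abnormal.any():
        sub = phase2(features[abnormal])
        labels[abnormal] = np.where(sub == COVID, "nCOVID-19", "pneumonia")
    return labels

# Hypothetical threshold rules standing in for the trained phase-I and phase-II models.
def phase1(X):
    return (X[:, 0] > 0).astype(int)

def phase2(X):
    return (X[:, 1] > 0).astype(int)

X = np.array([[-1.0, 5.0], [1.0, 1.0], [1.0, -1.0]])
print(two_phase_predict(X, phase1, phase2).tolist())   # ['normal', 'nCOVID-19', 'pneumonia']
```

A side effect of this hierarchy is that a phase-I error cannot be corrected later: a normal image wrongly flagged as abnormal will be labelled either nCOVID-19 or pneumonia in phase-II.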
Fig. 3

The prototype of the proposed automatic COVID screening (ACoS) system. (Abbreviations: BGWO: Binary Gray Wolf Optimization, SVM: Support Vector Machine, DT: Decision Tree, KNN: k-Nearest Neighbor, NB: Naïve Bayes, ANN: Artificial Neural Network).


Classification

To assess the discriminative performance of the aforementioned features, we used popular supervised classification algorithms: SVM (linear, radial basis function, polynomial) (Chandra and Verma, 2020a, Vapnik, 1998), ANN (Artificial Neural Network, 2013), KNN (Han, Kamber, & Pei, 2012), NB (Khatami et al., 2017, Venegas-Barrera and Manjarrez, 2011), and DT (Han, Kamber, & Pei, 2012, Pantazi, Moshou, & Bochtis, 2020). These algorithms are computationally efficient and are widely used in the literature for the classification of pulmonary diseases using CXR images (Chandra and Verma, 2020a, Santosh and Antani, 2018). The selection of these classifiers is motivated by the fact that they can be efficiently trained on smaller datasets without compromising performance. In this study, separate sets of models were created for phase-I and phase-II. In phase-I, the models were trained using normal and abnormal images (nCOVID-19 and pneumonia) from the training-testing set, whereas in phase-II, only abnormal images (nCOVID-19 and pneumonia) were used to train the models. In both phases, the performance of the classifiers was evaluated using a 10-fold cross-validation setup. In each fold, all the optimizable learning hyper-parameters were tuned using the Bayesian automatic optimization method (Snoek, Larochelle, & Adams, 2012). In general, learning hyper-parameters can be optimized in two ways: manual and automatic searching. Manual parameter tuning requires expertise; however, when dealing with numerous models and larger datasets, even expertise may not be sufficient (Ucar & Korkmaz, 2020). To overcome this shortcoming, automatic parameter tuning is used as an alternative. 
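The cross-validation loss that the tuner minimises can be sketched as below. This illustrates only the objective: a stratified 10-fold misclassification rate, here for a k-nearest-neighbour classifier whose neighbour count plays the role of the hyper-parameter. The study tuned hyper-parameters with Bayesian optimisation in MATLAB; directly comparing a few candidate values at the end is a simplification for illustration, not that optimiser.

```python
import numpy as np

def stratified_folds(y, k=10, seed=0):
    """Assign each sample a fold index 0..k-1, spreading every class evenly over folds."""
    rng = np.random.default_rng(seed)
    folds = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        folds[idx] = np.arange(len(idx)) % k
    return folds

def knn_predict(X_train, y_train, X_test, n_neighbors):
    """Plain k-nearest-neighbour vote with squared Euclidean distance."""
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :n_neighbors]
    return np.array([np.bincount(votes).argmax() for votes in y_train[nearest]])

def cv_loss(X, y, n_neighbors, k=10, seed=0):
    """Mean misclassification rate over k stratified folds."""
    folds = stratified_folds(y, k, seed)
    errors = []
    for f in range(k):
        test = folds == f
        pred = knn_predict(X[~test], y[~test], X[test], n_neighbors)
        errors.append(np.mean(pred != y[test]))
    return float(np.mean(errors))

# Illustrative two-class data; compare a few candidate neighbour counts.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(2.0, 1.0, (100, 2))])
y = np.repeat([0, 1], 100)
losses = {n: cv_loss(X, y, n) for n in (1, 3, 5, 7)}
best_n = min(losses, key=losses.get)
print(losses, best_n)
```

A Bayesian optimiser would call the same `cv_loss`-style objective, but choose each next candidate from a surrogate model of the losses seen so far instead of enumerating a fixed list.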
In this study, a Bayesian optimization algorithm is used to select the best hyper-parameters by automatically minimizing the cross-validation loss.² Moreover, to examine Hypothesis 3, the majority vote based classifier ensemble technique (described in Appendix B, Algorithm 1) is applied (shown in Fig. 3) using a separate validation set (258 CXR images). To select the optimal combination of evaluated classifiers for the majority vote, we implemented an exhaustive search using a recursive elimination method (Chatterjee, Dey, & Munshi, 2019; Q. Chen, Meng, & Su, 2020). The method starts with all evaluated classifiers and, according to the selection criteria, iteratively eliminates classifiers until all possible combinations are exhausted.
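Algorithm 1 in Appendix B is not reproduced in this section, so the following is a hedged sketch of majority voting and of an exhaustive search over classifier subsets that starts from the full set and works down to single classifiers. Using accuracy as the selection criterion, resolving vote ties towards the lowest class label, and preferring the larger ensemble when accuracies tie are all assumptions. The classifier names in the usage example label hypothetical prediction rows.

```python
import itertools
import numpy as np

def majority_vote(predictions):
    """predictions: (n_classifiers, n_samples) integer labels -> most frequent label per sample."""
    predictions = np.asarray(predictions)
    n_classes = int(predictions.max()) + 1
    counts = np.stack([(predictions == c).sum(axis=0) for c in range(n_classes)])
    return counts.argmax(axis=0)          # ties resolve to the lowest label (assumption)

def best_ensemble(predictions, y_true, names):
    """Score every non-empty classifier subset, largest subsets first; keep the most accurate."""
    predictions = np.asarray(predictions)
    best_acc, best_names = -1.0, ()
    for size in range(len(names), 0, -1):
        for combo in itertools.combinations(range(len(names)), size):
            acc = float(np.mean(majority_vote(predictions[list(combo)]) == y_true))
            if acc > best_acc:            # strict ">" keeps the larger ensemble on ties
                best_acc, best_names = acc, tuple(names[i] for i in combo)
    return best_acc, best_names

preds = np.array([[0, 1, 1, 0],    # hypothetical classifier "SVM"
                  [0, 1, 0, 0],    # hypothetical classifier "KNN"
                  [1, 1, 1, 0]])   # hypothetical classifier "ANN"
print(majority_vote(preds))                                                  # [0 1 1 0]
print(best_ensemble(preds, np.array([0, 1, 1, 0]), ["SVM", "KNN", "ANN"]))   # (1.0, ('SVM', 'KNN', 'ANN'))
```

With seven classifiers there are only 2^7 − 1 = 127 non-empty subsets, so an exhaustive search is cheap. In the example, no single classifier is perfect, yet their vote is: each one's error is outvoted by the other two.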

Performance evaluation metrics

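The performance of the proposed ACoS system is assessed using seven performance measures, as shown in Eqs. (9)–(15) (Han, Kamber, & Pei, 2012), where the numbers of infected and normal CXR images correctly predicted by the proposed system are denoted by true positive (TP) and true negative (TN), respectively; false positive (FP) and false negative (FN) denote the misclassification of normal and infected images, respectively; P = TP + FN and N = TN + FP.

$$\text{Accuracy} = \frac{TP + TN}{P + N} \tag{9}$$
$$\text{Specificity} = \frac{TN}{N} \tag{10}$$
$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$
$$\text{Recall} = \frac{TP}{P} \tag{12}$$
$$\text{F1-Measure} = \frac{2\cdot\text{Precision}\cdot\text{Recall}}{\text{Precision} + \text{Recall}} \tag{13}$$
$$\text{AUC} = \frac{\text{Recall} + \text{Specificity}}{2} \tag{14}$$
$$\text{MCC} = \frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} \tag{15}$$

Finally, the obtained results are statistically validated using the z-test, Friedman average ranking, and the Holm (Holm, 1979) and Shaffer (Shaffer, 1986) post-hoc multiple comparison methods.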
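The seven measures can be cross-checked against Table 6. The sketch computes them from confusion-matrix counts; the counts used here (TP = 170, FN = 2, TN = 83, FP = 3) are not stated in the source but are inferred because they reproduce every entry of the majority-voting row of Table 6 for the 172 abnormal and 86 normal validation images. In that table, each AUC value equals the mean of recall and specificity, which is the AUC of a single hard-decision operating point.

```python
import math

def metrics(tp, fn, tn, fp):
    """Seven performance measures from confusion counts, with P = TP + FN and N = TN + FP."""
    p, n = tp + fn, tn + fp
    recall = tp / p                       # sensitivity
    specificity = tn / n
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (p + n),
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "auc": (recall + specificity) / 2,   # AUC of a single hard-decision operating point
        "mcc": (tp * tn - fp * fn) / math.sqrt((tp + fp) * p * n * (tn + fn)),
    }

# Counts inferred from the majority-voting row of Table 6 (not stated in the source).
m = metrics(tp=170, fn=2, tn=83, fp=3)
print({k: round(v, 5) for k, v in m.items()})
```

MCC is the most demanding of the seven here because it uses all four cells of the confusion matrix, which is why it sits below accuracy and F1 for the same predictions.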

Experimental results and discussion

This section presents a detailed discussion of the experimental results of the proposed ACoS system. To evaluate the three hypotheses (Hypothesis 1, Hypothesis 2, and Hypothesis 3), the following experiments were formulated:
Experiment 1: Different photometric transformations were randomly applied to the input CXR images, and the classification performance was evaluated (Hypothesis 1).
Experiment 2: The classification performance of SVM was assessed and compared with the other benchmark classifiers (Hypothesis 2).
Experiment 3: The classification performance of the majority voting technique and the individual benchmark classifiers was evaluated using a separate validation set (Hypothesis 3).
In this study, we used a two-phase classification technique to discriminate normal, nCOVID-19, and pneumonia X-ray images. Initially, two sets of classification models were created for phase-I and phase-II, respectively, using the original CXR images from the training-testing set. Subsequently, image augmentation was performed using the photometric transformations discussed in Section 2.1. The augmented images, along with the original CXR images, were used to re-train the models, and the classification performance was evaluated. The results in Tables 4 and 5 show that, in both phases, the supervised models trained using augmented images performed considerably better than the models trained using only the original CXR images (the sole exception being NB in phase-I), which supports Hypothesis 1. This improvement can be explained by the fact that the augmented images provide sufficient training instances for the possible variations in input CXR images, which might occur due to diverse imaging parameters and platforms in different hospitals.
Table 4

Phase-I (Normal vs. Abnormal) classification performance of different supervised models using the training-testing set in a 10-fold cross-validation setup. (Abbreviations: SVM: Support Vector Machine, DT: Decision Tree, KNN: k-Nearest Neighbor, NB: Naïve Bayes, ANN: Artificial Neural Network, STD: Standard Deviation, AUC: Area Under Curve, MCC: Matthews Correlation Coefficient, Note: best performance is highlighted with bold letters).

Classification algorithm | Accuracy (%, ±STD) | Specificity (%, ±STD) | Precision (%, ±STD) | Recall (%, ±STD) | F1-Measure (%, ±STD) | AUC (±STD) | MCC (±STD)
Without using augmented images
SVM (RBF kernel) | 66.70 ± 0.83 | 3.40 ± 2.22 | 66.96 ± 0.79 | 98.56 ± 0.95 | 79.74 ± 0.57 | 0.51 ± 0.01 | 0.06 ± 0.09
DT | 90.73 ± 5.03 | 85.95 ± 6.04 | 92.95 ± 3.11 | 93.12 ± 4.97 | 93.02 ± 3.89 | 0.90 ± 0.05 | 0.79 ± 0.11
NB | 97.22 ± 1.15 | 97.71 ± 3.24 | 98.85 ± 1.13 | 96.98 ± 0.82 | 97.90 ± 0.86 | 0.97 ± 0.02 | 0.94 ± 0.03
KNN | 97.41 ± 2.08 | 98.28 ± 1.20 | 98.14 ± 1.00 | 96.99 ± 3.26 | 98.02 ± 0.64 | 0.98 ± 0.02 | 0.94 ± 0.04
SVM (Poly kernel) | 98.47 ± 0.81 | 97.97 ± 2.39 | 99.00 ± 0.86 | 98.71 ± 0.81 | 98.85 ± 0.61 | 0.98 ± 0.01 | 0.97 ± 0.02
ANN | 98.47 ± 1.12 | 98.29 ± 1.48 | 99.13 ± 0.75 | 98.56 ± 1.18 | 98.84 ± 0.85 | 0.98 ± 0.01 | 0.97 ± 0.02
SVM (Linear kernel) | 98.85 ± 1.09 | 98.57 ± 1.51 | 99.21 ± 0.76 | 98.99 ± 0.98 | 99.13 ± 0.73 | 0.99 ± 0.01 | 0.97 ± 0.02
Using augmented images
SVM (RBF kernel) | 82.90 ± 1.71 | 48.71 ± 5.09 | 79.62 ± 1.63 | 98.17 ± 0.35 | 88.64 ± 1.01 | 0.74 ± 0.03 | 0.62 ± 0.04
DT | 95.88 ± 1.04 | 93.38 ± 3.6 | 96.75 ± 1.73 | 97.13 ± 1.17 | 96.92 ± 0.76 | 0.95 ± 0.02 | 0.91 ± 0.02
NB | 96.70 ± 1.55 | 97.28 ± 1.84 | 98.60 ± 0.95 | 96.41 ± 1.76 | 97.49 ± 1.19 | 0.97 ± 0.02 | 0.93 ± 0.03
ANN | 99.33 ± 0.25 | 98.13 ± 1.55 | 99.18 ± 0.56 | 99.43 ± 0.34 | 99.50 ± 0.16 | 0.99 ± 0.01 | 0.98 ± 0.02
SVM (Poly kernel) | 99.38 ± 0.55 | 99.57 ± 0.17 | 99.78 ± 0.15 | 99.19 ± 0.17 | 99.13 ± 0.52 | 0.99 ± 0.01 | 0.99 ± 0.01
KNN | 99.41 ± 0.51 | 99.71 ± 0.90 | 99.86 ± 0.13 | 99.31 ± 0.40 | 99.71 ± 0.28 | 1.00 ± 0.01 | 0.99 ± 0.01
SVM (Linear kernel) | 99.67 ± 0.31 | 99.57 ± 0.27 | 99.79 ± 0.67 | 99.86 ± 0.14 | 99.82 ± 0.17 | 1.00 ± 0.00 | 0.99 ± 0.00
Table 5

Phase-II (nCOVID vs. Pneumonia) classification performance of different supervised models using the training-testing set in a 10-fold cross-validation setup.

Classification algorithm | Accuracy (%, ±STD) | Specificity (%, ±STD) | Precision (%, ±STD) | Recall (%, ±STD) | F1-Measure (%, ±STD) | AUC (±STD) | MCC (±STD)
Without using augmented images
DT | 72.98 ± 3.54 | 70.93 ± 12.06 | 73.06 ± 6.20 | 74.96 ± 8.82 | 73.41 ± 3.22 | 0.73 ± 0.04 | 0.47 ± 0.07
ANN | 74.29 ± 5.28 | 74.29 ± 32.99 | 74.29 ± 8.57 | 74.29 ± 30.07 | 74.29 ± 17.5 | 0.74 ± 0.05 | 0.49 ± 0.13
SVM (RBF kernel) | 80.03 ± 4.40 | 80.18 ± 10.61 | 81.05 ± 8.12 | 79.91 ± 7.44 | 80.01 ± 4.30 | 0.80 ± 0.04 | 0.61 ± 0.09
KNN | 80.17 ± 4.62 | 83.30 ± 7.82 | 82.68 ± 6.42 | 77.03 ± 7.23 | 79.47 ± 4.92 | 0.80 ± 0.05 | 0.61 ± 0.09
NB | 80.46 ± 6.03 | 79.57 ± 6.26 | 80.00 ± 5.71 | 81.36 ± 7.70 | 80.58 ± 6.18 | 0.80 ± 0.06 | 0.61 ± 0.12
SVM (Poly kernel) | 83.34 ± 3.46 | 85.89 ± 5.63 | 85.42 ± 4.37 | 80.77 ± 5.81 | 82.87 ± 3.63 | 0.83 ± 0.03 | 0.67 ± 0.07
SVM (Linear kernel) | 85.07 ± 6.94 | 87.09 ± 9.64 | 87.11 ± 8.73 | 83.08 ± 7.39 | 84.83 ± 6.74 | 0.85 ± 0.07 | 0.71 ± 0.14
Using augmented images
NB | 84.63 ± 6.75 | 85.94 ± 6.26 | 85.51 ± 6.43 | 83.31 ± 8.46 | 84.32 ± 7.12 | 0.85 ± 0.07 | 0.69 ± 0.13
DT | 84.84 ± 5.27 | 84.19 ± 4.82 | 84.36 ± 4.85 | 85.49 ± 6 | 84.91 ± 5.34 | 0.85 ± 0.05 | 0.7 ± 0.11
ANN | 94.93 ± 2.54 | 94.2 ± 3.52 | 94.29 ± 3.1 | 95.65 ± 3.5 | 94.96 ± 2.55 | 0.95 ± 0.03 | 0.9 ± 0.05
KNN | 97.27 ± 1.86 | 95.83 ± 2.5 | 95.98 ± 2.35 | 98.7 ± 1.25 | 97.31 ± 1.81 | 0.97 ± 0.02 | 0.95 ± 0.04
SVM (RBF kernel) | 97.7 ± 1.85 | 97.41 ± 2.17 | 97.49 ± 2.27 | 97.99 ± 1.91 | 97.71 ± 1.82 | 0.98 ± 0.02 | 0.95 ± 0.04
SVM (Poly kernel) | 97.98 ± 1.99 | 97.7 ± 2.26 | 97.75 ± 2.18 | 98.19 ± 1.58 | 97.89 ± 2.05 | 0.98 ± 0.02 | 0.95 ± 0.04
SVM (Linear kernel) | 98.78 ± 0.96 | 98.14 ± 1.69 | 98.19 ± 1.68 | 99.23 ± 0.72 | 98.79 ± 0.95 | 0.99 ± 0.01 | 0.98 ± 0.02
Moreover, the results obtained using different supervised algorithms for phase-I (Table 4) and phase-II (Table 5) demonstrate that SVM (linear kernel) outperformed the others using a selected feature set (1546 features for phase-I and 2018 features for phase-II). The better performance of SVM can be attributed to its generalization capability and its ability to learn and infer intricate natural patterns by efficiently adapting the hyperplane and soft margins using support vectors. Further, the obtained accuracy (ACC) of 99.67 ± 0.31%, area under the curve (AUC) of 1.00 ± 0.00, and Matthews correlation coefficient (MCC) of 0.99 ± 0.00 for phase-I, and the ACC of 98.78 ± 0.96%, AUC of 0.99 ± 0.01, and MCC of 0.98 ± 0.02 for phase-II demonstrate its promising performance, supporting Hypothesis 2. nCOVID-19 is highly contagious, and even a single false negative may lead to community spread of the infection. Therefore, to reduce the chance of misclassification, we used a majority voting based classifier ensemble of seven benchmark supervised models, as shown in Fig. 3. Further, the classification performance of the majority voting technique and the individual benchmark classifiers was evaluated using a separate validation set (which was not used during training of the models). 
The set consists of 258 CXR images (86 normal, 86 nCOVID-19, and 86 pneumonia). First, the radiomic texture features (described in Section 2.2) were extracted from the input CXR images and classified by each supervised model in phase-I. The output of each model acts as an expert opinion for assigning the input CXR image to the normal or abnormal (nCOVID-19 or pneumonia) class. The phase-I classification performance of each model on the validation set is shown in Table 6. The majority voting algorithm (ACC of 98.062%, AUC of 0.977, and MCC of 0.956) outperformed every individual model in phase-I. All images classified as abnormal in phase-I were then passed to phase-II for differential diagnosis between nCOVID-19 and pneumonia.
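The two-phase flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: each model is assumed to be any callable that maps a radiomic feature vector to a label string, and `threshold_model`, the label strings, and the toy thresholds are hypothetical stand-ins for the trained NB, DT, KNN, ANN, and SVM models.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical model interface: any callable mapping a feature vector to a label.
Model = Callable[[Sequence[float]], str]


def majority_vote(models: List[Model], features: Sequence[float]) -> str:
    """Return the label predicted by the most models.

    Counter.most_common lists equal counts in first-encountered order,
    so a tie goes to whichever tied label received a vote first.
    """
    votes = Counter(model(features) for model in models)
    return votes.most_common(1)[0][0]


def two_phase_screening(features: Sequence[float],
                        phase1_models: List[Model],
                        phase2_models: List[Model]) -> str:
    # Phase-I: normal vs. abnormal.
    if majority_vote(phase1_models, features) == "normal":
        return "normal"
    # Phase-II: only images flagged as abnormal are re-examined,
    # here to separate nCOVID-19 from pneumonia.
    return majority_vote(phase2_models, features)


def threshold_model(index: int, threshold: float, above: str, below: str) -> Model:
    """Hypothetical stand-in for a trained classifier: thresholds one feature."""
    def predict(features: Sequence[float]) -> str:
        return above if features[index] > threshold else below
    return predict


if __name__ == "__main__":
    phase1 = [threshold_model(0, t, "abnormal", "normal") for t in (0.4, 0.5, 0.6)]
    phase2 = [threshold_model(1, t, "nCOVID-19", "pneumonia") for t in (0.3, 0.5, 0.7)]
    for x in ([0.2, 0.9], [0.8, 0.9], [0.55, 0.4]):
        print(x, "->", two_phase_screening(x, phase1, phase2))
```

The factory function `threshold_model` binds each threshold in its own closure; a bare `lambda` inside the list comprehension would late-bind the loop variable and make every toy model use the last threshold.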
Table 6

Phase-I (Normal vs. Abnormal) classification performance of different supervised models and majority voting algorithm using the validation set. (Abbreviations: SVM: Support Vector Machine, DT: Decision Tree, KNN: k-Nearest Neighbor, NB: Naïve Bayes, ANN: Artificial Neural Network, STD: Standard Deviation, AUC: Area Under Curve, MCC: Matthews Correlation Coefficient). (Note: best performance is highlighted with bold letters).

Classification algorithms | Accuracy (%) | Specificity (%) | Precision (%) | Recall (%) | F1-Measure (%) | AUC | MCC
NB | 88.372 | 73.256 | 87.766 | 95.930 | 91.667 | 0.846 | 0.734
DT | 90.698 | 80.233 | 90.659 | 95.930 | 93.220 | 0.881 | 0.788
SVM (RBF Kernel) | 95.349 | 89.535 | 94.944 | 98.256 | 96.571 | 0.939 | 0.895
KNN | 95.736 | 94.186 | 97.076 | 96.512 | 96.793 | 0.953 | 0.904
SVM (Linear Kernel) | 96.124 | 90.698 | 95.506 | 98.837 | 97.143 | 0.948 | 0.913
SVM (Poly Kernel) | 96.124 | 91.860 | 96.023 | 98.256 | 97.126 | 0.951 | 0.912
ANN | 96.512 | 93.023 | 96.571 | 98.256 | 97.406 | 0.956 | 0.921
Majority voting | 98.062 | 96.512 | 98.266 | 98.837 | 98.551 | 0.977 | 0.956
In phase-II, the images classified as abnormal were classified by each supervised model, and the predictions were aggregated by the majority voting based classifier ensemble. As shown in Table 7, the ensemble achieved higher performance (ACC of 91.329%, AUC of 0.914, and MCC of 0.831) than every individual model, which supports the robustness of the proposed ACoS system and the validity of Hypothesis 3.
Table 7

Phase-II (nCOVID vs. Pneumonia) classification performance of different supervised models and majority voting algorithm using the validation set. (Note: best performance is highlighted with bold letters).

Classification algorithms | Accuracy (%) | Specificity (%) | Precision (%) | Recall (%) | F1-Measure (%) | AUC | MCC
KNN | 72.093 | 76.744 | 74.359 | 67.442 | 70.732 | 0.721 | 0.444
ANN | 73.256 | 53.488 | 66.667 | 93.023 | 77.670 | 0.733 | 0.506
DT | 79.070 | 82.558 | 81.250 | 75.581 | 78.313 | 0.791 | 0.583
NB | 80.814 | 72.093 | 76.238 | 89.535 | 82.353 | 0.808 | 0.626
SVM (Linear Kernel) | 81.977 | 83.721 | 83.133 | 80.233 | 81.657 | 0.820 | 0.640
SVM (Poly Kernel) | 86.047 | 79.070 | 81.633 | 93.023 | 86.957 | 0.860 | 0.728
SVM (RBF Kernel) | 86.628 | 83.721 | 84.615 | 89.535 | 87.006 | 0.866 | 0.734
Majority voting | 91.329 | 86.207 | 87.368 | 96.512 | 91.713 | 0.914 | 0.831
To break the chain of community spread of nCOVID-19, a desirable property of any ACoS system is to minimize Type-II (false negative) errors without increasing Type-I (false positive) errors. Fig. 4(a) and (b) show the confusion matrices (CMs) of the majority voting algorithm on the validation set for phase-I and phase-II, respectively. The CMs show that the majority voting approach outperformed the individual models by making fewer Type-I and Type-II errors.
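The Type-I/Type-II trade-off maps directly onto the metrics reported in Tables 6 and 7. The sketch below computes them from the four confusion-matrix counts. The counts themselves are not printed in the text; the values in the usage example (phase-I: TP = 170, FN = 2, FP = 3, TN = 83; phase-II: TP = 83, FN = 3, FP = 12, TN = 75) are an assumption back-calculated from the majority-voting percentages, and they reproduce every majority-voting entry to three decimals. The AUC line reflects the observation that the AUC values in both tables equal (recall + specificity)/2, which is the AUC of a single hard-label operating point.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Validation metrics from a 2x2 confusion matrix (positive = abnormal / nCOVID-19)."""
    recall = tp / (tp + fn)          # sensitivity; drops as Type-II errors (FN) rise
    specificity = tn / (tn + fp)     # drops as Type-I errors (FP) rise
    precision = tp / (tp + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": (tp + tn) / (tp + fn + fp + tn),
        "specificity": specificity,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        # With hard 0/1 predictions the ROC curve has one operating point,
        # so its AUC reduces to the mean of recall and specificity.
        "auc": (recall + specificity) / 2,
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


if __name__ == "__main__":
    # Assumed counts, back-calculated from the majority-voting rows of Tables 6 and 7.
    phase1 = binary_metrics(tp=170, fn=2, fp=3, tn=83)
    phase2 = binary_metrics(tp=83, fn=3, fp=12, tn=75)
    for name, metrics in (("Phase-I", phase1), ("Phase-II", phase2)):
        print(name, {k: round(v, 3) for k, v in metrics.items()})
```

The assumed phase-II total (95 + 78 = 173 images) equals the number of images that phase-I would forward as abnormal (170 + 3), which is consistent with the cascade described above.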
Fig. 4

Confusion matrix using validation set for (a) Majority voting (Phase-I), (b) Majority voting (Phase-II).

Statistical analysis

This section examines the statistical significance of the results obtained from the various experiments performed in this study. First, the z-test was used to assess whether the performance (ACC and F1-measure) of the different supervised models in phase-I and phase-II differed when the models were trained with and without augmented images. The null hypothesis is that the performance of the supervised models before and after applying image augmentation is equal; the alternative hypothesis is that the models trained using augmented images perform better. The test statistics for phase-I and phase-II at the 95% confidence level (α = 0.05) are shown in Table 8. In phase-I, the ANN, SVM (linear and RBF kernel), DT, and KNN models performed significantly better with augmented images than when trained on the original CXR images (i.e., the null hypothesis was rejected in favor of the alternative). Similarly, in phase-II, the null hypothesis was rejected for all models (p < 0.05), i.e., the models trained using augmented images exhibit higher performance.
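For reference, the sketch below shows how a z-score is converted into one- and two-sided p-values under the standard normal distribution, using only the Python standard library. Which tail corresponds to "augmented images perform better" depends on how the z-score was signed (for example, without-augmentation minus with-augmentation), which the text does not state; treating a negative z as favoring augmentation is an assumption here.

```python
import math
from typing import Tuple


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def normal_sf(z: float) -> float:
    """Upper-tail probability 1 - Phi(z), computed without cancellation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def z_test_p_values(z: float) -> Tuple[float, float, float]:
    """Return (lower-tail, upper-tail, two-sided) p-values for a z statistic."""
    lower = normal_cdf(z)   # alternative: the true difference is negative
    upper = normal_sf(z)    # alternative: the true difference is positive
    two_sided = 2.0 * min(lower, upper)
    return lower, upper, two_sided


if __name__ == "__main__":
    # Phase-I accuracy z-scores for the ANN and NB rows of Table 8.
    for name, z in (("ANN", -2.78686), ("NB", 0.79659)):
        lower, upper, two = z_test_p_values(z)
        print(f"{name}: z={z:+.5f} lower={lower:.5f} upper={upper:.5f} two-sided={two:.5f}")
```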
Table 8

Computed z-scores comparing the performance (accuracy and F-measure) of different supervised models trained with vs. without augmented images on the Training-Testing set in the 10-fold cross-validation setup (at the 95% confidence level, i.e., alpha = 0.05). (Abbreviations: SVM: Support Vector Machine, DT: Decision Tree, KNN: k-Nearest Neighbor, NB: Naïve Bayes, ANN: Artificial Neural Network, STD: Standard Deviation, AUC: Area Under Curve, MCC: Matthews Correlation Coefficient, Note: bold values denote cases in which the alternative hypothesis is not supported).

Classifiers | Phase-I Accuracy Z-Score | Phase-I Accuracy P-Value | Phase-I F1-Measure Z-Score | Phase-I F1-Measure P-Value | Phase-II Accuracy Z-Score | Phase-II Accuracy P-Value | Phase-II F1-Measure Z-Score | Phase-II F1-Measure P-Value
ANN | −2.78686 | 0.00821 | −2.84720 | 0.00693 | −16.78630 | 0.00000 | −16.83080 | 0.00000
SVM (Linear Kernel) | −2.32891 | 0.02649 | −2.04164 | 0.04963 | −13.50650 | 0.00000 | −13.70514 | 0.00000
SVM (RBF Kernel) | −10.23706 | 0.00000 | −6.70490 | 0.00000 | −18.81539 | 0.00000 | −18.84667 | 0.00000
SVM (Poly Kernel) | −2.51459 | 0.01690 | −0.77056 | 0.29647 | −15.24470 | 0.00000 | −15.41234 | 0.00000
DT | −5.79733 | 0.00000 | −5.03525 | 0.00000 | −7.96144 | 0.00000 | −7.74892 | 0.00000
NB | 0.79659 | 0.29048 | 0.71026 | 0.31000 | −2.94058 | 0.00529 | −2.62549 | 0.01271
KNN | −4.74345 | 0.00001 | −4.86842 | 0.00000 | −16.23091 | 0.00000 | −16.75614 | 0.00000
The statistical significance of the proposed ACoS system on the validation set was evaluated using the Friedman average ranking method and the Holm and Shaffer pairwise comparison procedures (Chandra and Verma, 2020a, Chandra et al., 2020). The Friedman test compares the mean ranks of the classifiers under the null hypothesis that all classifiers perform equally. Based on the average ranks shown in Table 9, the test rejects the null hypothesis in favor of the alternative at α = 0.05 for both phases, confirming a substantial difference in the performance of the classification algorithms. The same conclusion follows from the Friedman test statistic (with 7 degrees of freedom) computed separately for phase-I and phase-II. Further, the majority voting algorithm obtained the lowest average rank (first rank) in both phases, which supports the validity of Hypothesis 3.
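To make the Friedman step concrete, the sketch below computes the standard Friedman chi-square statistic from average ranks, using the values reported in Table 9. It assumes the ranks were obtained over N = 7 blocks, namely the seven performance metrics of Tables 6 and 7 (each rank in Table 9 is a multiple of 1/14, as expected for seven blocks with occasional tied half-ranks), and it applies no tie correction, so the result need not coincide exactly with the statistic produced by the authors' software. The constant 14.067 is the upper 5% point of the chi-square distribution with k − 1 = 7 degrees of freedom.

```python
from typing import Sequence

# Upper 5% critical value of chi-square with 7 degrees of freedom
# (k - 1 = 7 for k = 8 classifiers), taken from standard tables.
CHI2_CRIT_DF7_ALPHA05 = 14.067


def friedman_chi_square(avg_ranks: Sequence[float], n_blocks: int) -> float:
    """Friedman statistic from average ranks (no tie correction).

    avg_ranks: average rank of each of the k classifiers (1 = best).
    n_blocks:  number of ranked blocks (here, assumed performance metrics).
    """
    k = len(avg_ranks)
    sum_sq = sum(r * r for r in avg_ranks)
    return 12.0 * n_blocks / (k * (k + 1)) * (sum_sq - k * (k + 1) ** 2 / 4.0)


if __name__ == "__main__":
    # Table 9 order: NB, DT, SVM (RBF), KNN, SVM (Linear), SVM (Poly), ANN, Majority Voting.
    phase1 = [7.929, 7.071, 5.714, 4.000, 3.714, 3.929, 2.571, 1.071]
    phase2 = [5.214, 5.714, 2.429, 7.571, 4.071, 3.357, 6.643, 1.000]
    for name, ranks in (("Phase-I", phase1), ("Phase-II", phase2)):
        stat = friedman_chi_square(ranks, n_blocks=7)
        verdict = "reject H0" if stat > CHI2_CRIT_DF7_ALPHA05 else "do not reject H0"
        print(f"{name}: chi2 = {stat:.2f} -> {verdict}")
```

When every classifier has the same average rank ((k + 1)/2 = 4.5), the bracketed term vanishes and the statistic is zero, which is a quick sanity check of the formula.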
Table 9

Average ranking of classifiers based on different classification performance metrics using the Friedman test with 7 degrees of freedom. (Note: the minimum value represents the better rank and is highlighted in bold).

Classification algorithms | Average ranking (Phase-I) | Average ranking (Phase-II)
NB | 7.929 | 5.214
DT | 7.071 | 5.714
SVM (RBF Kernel) | 5.714 | 2.429
KNN | 4.000 | 7.571
SVM (Linear Kernel) | 3.714 | 4.071
SVM (Poly Kernel) | 3.929 | 3.357
ANN | 2.571 | 6.643
Majority Voting | 1.071 | 1.000
Since the Friedman average rankings in Table 9 show that the mean ranks of the classification algorithms differ significantly (at α = 0.05), it is meaningful to perform pairwise post-hoc comparisons. In this study, the Holm (Holm, 1979) and Shaffer (Shaffer, 1986) post-hoc procedures were used to perform multiple pairwise comparisons; for each pair, the null hypothesis is that the two algorithms perform equally. In total, 28 pairs of classification algorithms (indexed by i in Tables 10 and 11) were compared at the α = 0.05 level of significance. Proceeding from the smallest p-value, the Holm and Shaffer procedures reject a hypothesis as long as its unadjusted p-value does not exceed the corresponding threshold listed in the Holm and Shaffer columns, respectively, for both phase-I and phase-II. The test statistics for phase-I and phase-II are shown in Tables 10 and 11, respectively. The results show that the proposed majority vote based classifier ensemble performs significantly better than NB, DT, and SVM (RBF kernel) in phase-I, and than KNN, ANN, DT, and NB in phase-II; together with its first rank in both phases, this supports the validity of Hypothesis 3.
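The post-hoc step can be sketched in the same way. The code derives each pairwise z-score from the Table 9 average ranks as z = (R_i − R_j)/SE with SE = sqrt(k(k + 1)/(6N)); with k = 8 and the assumed N = 7 this reproduces the z column of Table 10 to within about 0.001 (for example, 5.2372 for NB vs. Majority Voting). It then converts each z to a two-sided normal p-value and applies Holm's step-down procedure. Shaffer's procedure, which exploits logical relations between pairwise hypotheses, is not implemented here, and the adjusted p-values follow the usual running-maximum rule capped at 1.

```python
import math
from itertools import combinations
from typing import Dict, Set, Tuple

Pair = Tuple[str, str]


def pairwise_z(avg_ranks: Dict[str, float], n_blocks: int) -> Dict[Pair, float]:
    """z = (R_a - R_b) / SE for every pair, with SE = sqrt(k (k + 1) / (6 N))."""
    k = len(avg_ranks)
    se = math.sqrt(k * (k + 1) / (6.0 * n_blocks))
    return {(a, b): (avg_ranks[a] - avg_ranks[b]) / se
            for a, b in combinations(avg_ranks, 2)}


def two_sided_p(z: float) -> float:
    """Two-sided p-value of a standard normal statistic: 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def holm(p_values: Dict[Pair, float], alpha: float = 0.05) -> Tuple[Set[Pair], Dict[Pair, float]]:
    """Holm step-down procedure: rejected hypotheses and Holm-adjusted p-values."""
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    m = len(ordered)
    adjusted: Dict[Pair, float] = {}
    running_max = 0.0
    for i, (pair, p) in enumerate(ordered):
        # The (i+1)-th smallest p-value is multiplied by (m - i); the running
        # maximum enforces the step-down order and the cap keeps values <= 1.
        running_max = max(running_max, min(1.0, (m - i) * p))
        adjusted[pair] = running_max
    # adjusted <= alpha  <=>  p_(j) <= alpha / (m - j) for every earlier j as well.
    rejected = {pair for pair, p_adj in adjusted.items() if p_adj <= alpha}
    return rejected, adjusted


if __name__ == "__main__":
    # Phase-I average ranks from Table 9; N = 7 ranked metrics is an assumption.
    ranks = {"NB": 7.929, "DT": 7.071, "SVM (RBF)": 5.714, "KNN": 4.000,
             "SVM (Linear)": 3.714, "SVM (Poly)": 3.929, "ANN": 2.571,
             "Majority Voting": 1.071}
    z = pairwise_z(ranks, n_blocks=7)
    rejected, adjusted = holm({pair: two_sided_p(v) for pair, v in z.items()})
    for pair in sorted(rejected, key=adjusted.get):
        print(pair, f"|z| = {abs(z[pair]):.4f}", f"adjusted p = {adjusted[pair]:.4f}")
```

With these inputs the sketch rejects seven hypotheses, the same pairs (i = 22 to 28) that the Holm threshold column of Table 10 implies.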
Table 10

p-value and adjusted p-value for pairwise multiple comparisons of different supervised classification algorithms (Phase-I: Normal vs. Abnormal) using the validation set at α = 0.05 (Abbreviations: SVM: Support Vector Machine, DT: Decision Tree, KNN: k-Nearest Neighbor, NB: Naïve Bayes, ANN: Artificial Neural Network, STD: Standard Deviation, AUC: Area Under Curve, MCC: Matthews Correlation Coefficient).

i | Algorithms | z = (R0 − Ri)/SE | p | Holm | Shaffer | Adjusted p-value (pHolm) | Adjusted p-value (pShaffer)
28 | NB vs. Majority Voting | 5.2372 | 0.0000 | 0.0018 | 0.0018 | 0.0036 | 0.0036
27 | DT vs. Majority Voting | 4.5826 | 0.0000 | 0.0019 | 0.0024 | 0.0037 | 0.0048
26 | NB vs. ANN | 4.0916 | 0.0000 | 0.0019 | 0.0024 | 0.0038 | 0.0048
25 | SVM (RBF Kernel) vs. Majority Voting | 3.5460 | 0.0004 | 0.0020 | 0.0024 | 0.0040 | 0.0048
24 | DT vs. ANN | 3.4369 | 0.0006 | 0.0021 | 0.0024 | 0.0042 | 0.0048
23 | NB vs. SVM (Linear Kernel) | 3.2187 | 0.0013 | 0.0022 | 0.0024 | 0.0043 | 0.0048
22 | NB vs. SVM (Poly Kernel) | 3.0551 | 0.0023 | 0.0023 | 0.0024 | 0.0045 | 0.0048
21 | NB vs. KNN | 3.0005 | 0.0027 | 0.0024 | 0.0024 | 0.0048 | 0.0048
20 | DT vs. SVM (Linear Kernel) | 2.5641 | 0.0103 | 0.0025 | 0.0025 | 0.0050 | 0.0063
19 | DT vs. SVM (Poly Kernel) | 2.4004 | 0.0164 | 0.0026 | 0.0026 | 0.0053 | 0.0063
18 | SVM (RBF Kernel) vs. ANN | 2.4004 | 0.0164 | 0.0028 | 0.0028 | 0.0056 | 0.0063
17 | DT vs. KNN | 2.3458 | 0.0190 | 0.0029 | 0.0029 | 0.0059 | 0.0063
16 | KNN vs. Majority Voting | 2.2367 | 0.0253 | 0.0031 | 0.0031 | 0.0063 | 0.0063
15 | SVM (Poly Kernel) vs. Majority Voting | 2.1822 | 0.0291 | 0.0033 | 0.0033 | 0.0067 | 0.0067
14 | SVM (Linear Kernel) vs. Majority Voting | 2.0185 | 0.0435 | 0.0036 | 0.0036 | 0.0071 | 0.0071
13 | NB vs. SVM (RBF Kernel) | 1.6912 | 0.0908 | 0.0038 | 0.0038 | 0.0077 | 0.0077
12 | SVM (RBF Kernel) vs. SVM (Linear Kernel) | 1.5275 | 0.1266 | 0.0042 | 0.0042 | 0.0083 | 0.0083
11 | SVM (RBF Kernel) vs. SVM (Poly Kernel) | 1.3639 | 0.1726 | 0.0045 | 0.0045 | 0.0091 | 0.0091
10 | SVM (RBF Kernel) vs. KNN | 1.3093 | 0.1904 | 0.0050 | 0.0050 | 0.0100 | 0.0100
9 | ANN vs. Majority Voting | 1.1456 | 0.2519 | 0.0056 | 0.0056 | 0.0111 | 0.0111
8 | KNN vs. ANN | 1.0911 | 0.2752 | 0.0063 | 0.0063 | 0.0125 | 0.0125
7 | DT vs. SVM (RBF Kernel) | 1.0365 | 0.3000 | 0.0071 | 0.0071 | 0.0143 | 0.0143
6 | SVM (Poly Kernel) vs. ANN | 1.0365 | 0.3000 | 0.0083 | 0.0083 | 0.0167 | 0.0167
5 | SVM (Linear Kernel) vs. ANN | 0.8729 | 0.3827 | 0.0100 | 0.0100 | 0.0200 | 0.0200
4 | NB vs. DT | 0.6547 | 0.5127 | 0.0125 | 0.0125 | 0.0250 | 0.0250
3 | KNN vs. SVM (Linear Kernel) | 0.2182 | 0.8273 | 0.0167 | 0.0167 | 0.0333 | 0.0333
2 | SVM (Linear Kernel) vs. SVM (Poly Kernel) | 0.1637 | 0.8700 | 0.0250 | 0.0250 | 0.0500 | 0.0500
1 | KNN vs. SVM (Poly Kernel) | 0.0546 | 0.9565 | 0.0500 | 0.0500 | 0.1000 | 0.1000
Table 11

p-value and adjusted p-value for pairwise multiple comparisons of different supervised classification algorithms (Phase-II: nCOVID-19 vs. Pneumonia) using the validation set at α = 0.05.

i | Algorithms | z = (R0 − Ri)/SE | p | Holm | Shaffer | Adjusted p-value (pHolm) | Adjusted p-value (pShaffer)
28 | KNN vs. Majority Voting | 5.0190 | 0.0000 | 0.0018 | 0.0018 | 0.0000 | 0.0000
27 | ANN vs. Majority Voting | 4.3098 | 0.0000 | 0.0019 | 0.0024 | 0.0004 | 0.0003
26 | KNN vs. SVM (RBF) | 3.9279 | 0.0001 | 0.0019 | 0.0024 | 0.0022 | 0.0018
25 | DT vs. Majority Voting | 3.6006 | 0.0003 | 0.0020 | 0.0024 | 0.0079 | 0.0067
24 | NB vs. Majority Voting | 3.2187 | 0.0013 | 0.0021 | 0.0024 | 0.0309 | 0.0270
23 | KNN vs. SVM (Poly) | 3.2187 | 0.0013 | 0.0022 | 0.0024 | 0.0309 | 0.0270
22 | ANN vs. SVM (RBF) | 3.2187 | 0.0013 | 0.0023 | 0.0024 | 0.0309 | 0.0270
21 | KNN vs. SVM (Linear) | 2.6732 | 0.0075 | 0.0024 | 0.0024 | 0.1578 | 0.1578
20 | ANN vs. SVM (Poly) | 2.5095 | 0.0121 | 0.0025 | 0.0025 | 0.2418 | 0.1934
19 | DT vs. SVM (RBF) | 2.5095 | 0.0121 | 0.0026 | 0.0026 | 0.2418 | 0.1934
18 | SVM (Linear) vs. Majority Voting | 2.3458 | 0.0190 | 0.0028 | 0.0028 | 0.3417 | 0.3037
17 | NB vs. SVM (RBF) | 2.1276 | 0.0334 | 0.0029 | 0.0029 | 0.5673 | 0.5339
16 | ANN vs. SVM (Linear) | 1.9640 | 0.0495 | 0.0031 | 0.0031 | 0.7926 | 0.7926
15 | DT vs. SVM (Poly) | 1.8003 | 0.0718 | 0.0033 | 0.0033 | 1.0772 | 1.0772
14 | SVM (Poly) vs. Majority Voting | 1.8003 | 0.0718 | 0.0036 | 0.0036 | 1.0772 | 1.0772
13 | KNN vs. NB | 1.8003 | 0.0718 | 0.0038 | 0.0038 | 1.0772 | 1.0772
12 | NB vs. SVM (Poly) | 1.4184 | 0.1561 | 0.0042 | 0.0042 | 1.8728 | 1.8728
11 | KNN vs. DT | 1.4184 | 0.1561 | 0.0045 | 0.0045 | 1.8728 | 1.8728
10 | SVM (Linear) vs. SVM (RBF) | 1.2548 | 0.2096 | 0.0050 | 0.0050 | 2.0957 | 2.0957
9 | DT vs. SVM (Linear) | 1.2548 | 0.2096 | 0.0056 | 0.0056 | 2.0957 | 2.0957
8 | SVM (RBF) vs. Majority Voting | 1.0911 | 0.2752 | 0.0063 | 0.0063 | 2.2019 | 2.2019
7 | ANN vs. NB | 1.0911 | 0.2752 | 0.0071 | 0.0071 | 2.2019 | 2.2019
6 | NB vs. SVM (Linear) | 0.8729 | 0.3827 | 0.0083 | 0.0083 | 2.2964 | 2.2964
5 | ANN vs. DT | 0.7092 | 0.4782 | 0.0100 | 0.0100 | 2.3910 | 2.3910
4 | SVM (Poly) vs. SVM (RBF) | 0.7092 | 0.4782 | 0.0125 | 0.0125 | 2.3910 | 2.3910
3 | KNN vs. ANN | 0.7092 | 0.4782 | 0.0167 | 0.0167 | 2.3910 | 2.3910
2 | SVM (Linear) vs. SVM (Poly) | 0.5455 | 0.5854 | 0.0250 | 0.0250 | 2.3910 | 2.3910
1 | DT vs. NB | 0.3819 | 0.7025 | 0.0500 | 0.0500 | 2.3910 | 2.3910
Finally, the performance of the proposed system is compared with existing state-of-the-art methods (summarized in Table 1). First, the proposed method is compared in the two-class setting (normal vs. abnormal/nCOVID-19), as shown in Table 12. The proposed method (ACC = 98.06%) achieved higher accuracy than Panwar et al. (2020), Hemdan, Shouman, and Karar (2020), and Maghdid, Asaad, Ghafoor, Sadiq, and Khan (2020), and accuracy comparable to Narin et al. (2020) and Ozturk et al. (2020). Note, however, that Narin et al. (2020) and Ozturk et al. (2020) used comparatively fewer CXR images to train their DL models.
Table 12

Two-class (normal vs. abnormal) performance comparison of the proposed method with state-of-the-art methods.

Articles | Class | Algorithms/techniques | ACC (%)
Panwar et al. (2020) | 2 | nCOVnet | 88.10
Hemdan, Shouman, and Karar (2020) | 2 | VGG19, DenseNet201, ResNetV2, InceptionV3, InceptionResNetV2, Xception, MobileNetV2 | 90.00
Maghdid, Asaad, Ghafoor, Sadiq, and Khan (2020) | 2 | AlexNet, Modified CNN | 94.00
Narin et al. (2020) | 2 | ResNet50, ResNetV2, InceptionV3 | 98.00
Ozturk et al. (2020) | 2 | DarkNet | 98.08
Proposed Method (Phase-I) | 2 | Majority vote based classifier ensemble | 98.06
Further, the overall three-class accuracy (normal vs. nCOVID-19 vs. pneumonia) of the proposed model is evaluated and compared with existing state-of-the-art methods, as shown in Table 13. The proposed method achieved a higher overall accuracy (ACC = 93.411%) than Ozturk et al. (2020) (87.02%) and a marginally higher one than Wang and Wong (2020) (93.30%). However, its accuracy is lower than that of Abbas, Abdelsamea, and Gaber (2020), Chowdhury et al. (2020), Ucar and Korkmaz (2020), Nour et al. (2020), and Toğaçar, Ergen, and Cömert (2020). This may be partly because Toğaçar et al. (2020) and Abbas, Abdelsamea, and Gaber (2020) used far smaller datasets (458 and 196 CXR images in total, respectively) for their DL models. In addition, the radiological responses of pneumonia and nCOVID-19 are subtle, which confuses the classifier; overcoming this limitation is still an open research problem.
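The overall three-class accuracy of a cascade combines the decisions of both phases. The text does not spell out the bookkeeping, so the sketch below shows one accounting, under an assumption, that reproduces the reported 93.411%: the 83 normal images kept as normal in phase-I (specificity 96.512% of 86, Table 6) plus the 158 of 173 forwarded images that phase-II labels correctly (91.329%, Table 7), out of all 258 validation images. How the three normal images forwarded by phase-I were scored in phase-II is not stated and is part of that assumption.

```python
def cascade_overall_accuracy(phase1_correct_normals: int,
                             phase2_correct: int,
                             n_images: int) -> float:
    """Overall accuracy of a two-stage cascade.

    Images kept as normal in phase-I contribute their correct decisions there;
    images forwarded to phase-II contribute that phase's correct decisions.
    Abnormal images wrongly kept as normal in phase-I count as errors because
    they appear in neither term.
    """
    return (phase1_correct_normals + phase2_correct) / n_images


if __name__ == "__main__":
    # Assumed counts back-calculated from Tables 6 and 7 (not stated in the text).
    acc = cascade_overall_accuracy(phase1_correct_normals=83, phase2_correct=158, n_images=258)
    print(f"overall accuracy = {100 * acc:.3f}%")
```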
Table 13

Three-class (normal, nCOVID-19, and pneumonia) performance comparison of the proposed method with state-of-the-art methods.

Articles | Class | Algorithms/techniques | Total number of images | ACC (%)
Ozturk et al. (2020) | 3 | DarkNet |  | 87.02
L. Wang et al. (2020) | 3 | COVID-Net |  | 93.30
Abbas et al. (2020) | 3 | DeTraC | 196 | 95.12
Chowdhury et al. (2020) | 3 | AlexNet, ResNet18, DenseNet201, SqueezeNet | 3487 | 97.94
Ucar and Korkmaz (2020) | 3 | Deep Bayes-SqueezeNet | 5957 | 98.26
Nour et al. (2020) | 3 | CNN, SVM, DT, KNN | 3670 | 98.97
Toğaçar et al. (2020) | 3 | MobileNetV2, SqueezeNet, SVM | 458 | 99.27
Proposed Method (Overall) | 3 | Majority vote based classifier ensemble | 2346 | 93.41

Discussion

The morbidity and mortality rates due to nCOVID-19 are rapidly increasing, with thousands of deaths reported worldwide. The WHO has declared this pandemic a global health emergency (Coronavirus Disease 2019, 2020). In this study, we presented an ACoS system to detect nCOVID-19 infected patients using CXR image data. We performed a two-phase classification to segregate normal, nCOVID-19, and pneumonia infected images. The major challenges we faced in this study were: (i) publicly available nCOVID-19 infected CXR images are limited and lack standardization; and (ii) the radiological characteristics of nCOVID-19 and pneumonia infections are ambiguous.
Moreover, several DL-based studies have been reported in the literature for detecting nCOVID-19 infection in CXR and CT images (as shown in Table 1). Although these DL methods reported promising performance, they suffer from the following shortcomings: (i) they resize the input CXR images to a lower resolution (such as 64 × 64 or 224 × 224) before processing, which may result in the loss of crucial discriminative texture information; (ii) they demand massive training data to train the model sufficiently; (iii) they require expertise to define a suitable network architecture and to set many hyper-parameters (such as input resolution, number of layers, number of filters, and filter shape); (iv) they require high computational resources, extensive memory, and a significant amount of time to train the network; and (v) unlike conventional machine learning models, DL models are difficult to explain. To overcome these limitations, we used a combination of radiomic texture features with conventional ML algorithms.
The promising performance of the proposed ACoS system can be justified by the following facts: (i) the radiomic texture descriptors (FOSF, GLCM, and HOG features) are highly efficient in encoding natural textures and can therefore quantify the correlation attributes of the radiological visual characteristics associated with nCOVID-19 infection; (ii) the image augmentation technique provides sufficient instances to train the model for variable inputs, making the model robust; (iii) conventional ML algorithms can be trained efficiently using smaller datasets, fewer resources, and minimal hyper-parameter tuning without compromising performance; and (iv) the majority vote based classifier ensemble used in the proposed ACoS system acts as a multi-expert recommendation system and reduces the chance of misclassification.
The main disadvantage of the proposed system is that the subtle radiographic responses of different abnormalities, such as TB, pneumonia, and influenza, confuse the classifier and limit the diagnostic performance of the system. In the proposed ACoS system, the majority vote based classifier ensemble technique has been exploited to reduce the chance of misclassifying nCOVID-19 infected patients. Such a method can easily be integrated into a mobile radiology van and can work for the welfare of society.

Conclusion

In this study, we presented an ACoS system for the preliminary diagnosis of nCOVID-19 infected patients, so that appropriate precautionary measures (such as isolation and RT-PCR testing) can be taken to prevent further outbreak of the infection. The key findings of the study are summarized as follows: (i) the proposed ACoS system demonstrated promising potential to segregate normal, pneumonia, and nCOVID-19 infected patients, as shown by the performance of phase-I (ACC = 98.062%, AUC = 0.977, and MCC = 0.956) and phase-II (ACC = 91.329%, AUC = 0.914, and MCC = 0.831) on the validation set; (ii) input CXR images vary considerably owing to diverse imaging conditions in different hospitals, and the augmented images used by the proposed system provide sufficient variability to train the model and improve its robustness; (iii) radiomic texture descriptors such as FOSF, GLCM, and HOG features are highly efficient in quantifying the correlation attributes of the radiological visual characteristics associated with nCOVID-19 infection; (iv) unlike data-hungry DL approaches, the proposed ACoS system uses conventional ML algorithms that can be trained with limited annotated images and fewer computational resources, so this type of system may have greater clinical acceptability and can be deployed even in resource-constrained environments; and (v) the Friedman test with post-hoc multiple comparisons and the z-test statistics confirm the statistical significance of the results. Future work should focus on improving the reliability and clinical acceptability of the system. Integrating the patient's symptomatology and the radiologist's feedback with the CAD system could help build a robust screening system. Further, an in-depth analytical comparison between conventional algorithms and deep learning methods could help establish its clinical acceptability.

CRediT authorship contribution statement

Tej Bahadur Chandra: Conceptualization, Methodology, Software, Validation, Investigation, Writing - original draft. Kesari Verma: Conceptualization, Formal analysis, Supervision, Writing - review & editing. Bikesh Kumar Singh: Formal analysis, Supervision, Writing - review & editing. Deepak Jain: Visualization, Validation. Satyabhuwan Singh Netam: Visualization, Validation.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Table 14

Radiomic texture features (FOSF, GLCM, and HOG features) extracted from CXR images.

Category of features | Number of features | Name of features
First Order Statistical Feature (FOSF) (Srinivasan & Shobha, 2008) | 8 | Mean (m), Variance (μ2), Standard deviation (σ), Skewness (μ3), Kurtosis (μ4), Smoothness (R), Uniformity (U), Entropy (e)
Gray Level Co-occurrence Matrix (GLCM) Texture Feature (Gómez et al., 2012, Haralick et al., 1973) | 88 (22 × 4) | Sum average, Sum variance, Difference variance, Energy, Autocorrelation, Entropy, Sum entropy, Difference entropy, Contrast, Homogeneity I, Homogeneity II, Correlation I, Correlation II, Cluster Prominence, Cluster Shade, Sum of squares, Maximum probability, Dissimilarity, Information measure of correlation I, Information measure of correlation II, Inverse difference normalized, Inverse difference moment normalized
Histogram of Oriented Gradients (HOG) (Dalal & Triggs, 2005, Santosh and Antani, 2018) | 8100 | f1, f2, f3, f4, …, f8100
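The sketch below illustrates two of the descriptor families in Table 14 using only the Python standard library: the eight first-order statistics computed from a normalized gray-level histogram, and three of the 22 GLCM descriptors for a single pixel offset. It follows common textbook definitions and is not the authors' code; returning skewness and kurtosis as raw third and fourth central moments, normalizing the variance by (L − 1)² in the smoothness term, using a non-symmetric GLCM, and the specific homogeneity variant are all assumptions. The "× 4" in Table 14 plausibly reflects four offsets or directions, which is also an assumption; HOG is not covered.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def first_order_features(pixels: Sequence[int], levels: int = 256) -> Dict[str, float]:
    """First-order statistics of a gray-level histogram (FOSF row of Table 14)."""
    n = len(pixels)
    p = {z: c / n for z, c in Counter(pixels).items()}  # normalized histogram
    mean = sum(z * pz for z, pz in p.items())
    var = sum((z - mean) ** 2 * pz for z, pz in p.items())
    return {
        "mean": mean,
        "variance": var,
        "std": math.sqrt(var),
        "skewness": sum((z - mean) ** 3 * pz for z, pz in p.items()),  # raw mu3
        "kurtosis": sum((z - mean) ** 4 * pz for z, pz in p.items()),  # raw mu4
        "smoothness": 1.0 - 1.0 / (1.0 + var / (levels - 1) ** 2),
        "uniformity": sum(pz * pz for pz in p.values()),
        "entropy": -sum(pz * math.log2(pz) for pz in p.values()),
    }


def glcm(image: List[List[int]], levels: int, dr: int = 0, dc: int = 1) -> List[List[float]]:
    """Normalized, non-symmetric co-occurrence matrix for one pixel offset (dr, dc)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]


def glcm_features(P: List[List[float]]) -> Dict[str, float]:
    """Three Haralick-style descriptors computed from a normalized GLCM."""
    cells = [(i, j, P[i][j]) for i in range(len(P)) for j in range(len(P))]
    return {
        "contrast": sum((i - j) ** 2 * p for i, j, p in cells),
        # Angular second moment; some definitions report its square root as "energy".
        "asm_energy": sum(p * p for _, _, p in cells),
        # One common homogeneity variant: sum of P(i, j) / (1 + (i - j)^2).
        "homogeneity": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
    }


if __name__ == "__main__":
    img = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 2, 2, 2], [2, 2, 3, 3]]
    print(first_order_features([v for row in img for v in row], levels=4))
    print(glcm_features(glcm(img, levels=4)))
```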
Algorithm 1. Majority voting algorithm using ensemble of benchmark classifiers.
Input/Initialize
Validation dataset D = {x1, x2, …, xn} // n = number of images in the dataset
Classification models M = {m1, m2, …, mk} // k = number of classification models
Healthy = 0; Unhealthy = 0;
Output
Prediction result of each image as healthy or infected.
Algorithm
for i = 1 to n // for each image xi in D
  Healthy = 0; Unhealthy = 0; // reset the vote counters for image xi
  for j = 1 to k // for each classification model mj in M
    xi.classm ← predict(xi, mj) // class label predicted by model mj for input image xi
    if xi.classm = 'Healthy' // jth model classifies the image as normal
      Healthy = Healthy + 1; // increment the Healthy counter by one
    else if xi.classm = 'Unhealthy' // jth model classifies the image as abnormal
      Unhealthy = Unhealthy + 1; // increment the Unhealthy counter by one
    end
  end
  if Healthy > Unhealthy
    xi.class = 'Healthy';
  else
    xi.class = 'Unhealthy'; // a tie (possible only when k is even) is labeled 'Unhealthy'
  end
end
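For readers who want to run Algorithm 1, a direct Python transcription follows. Models are assumed to be callables that return the strings 'Healthy' or 'Unhealthy'; this interface is hypothetical, since the paper does not specify one, and the toy threshold functions in the usage example merely stand in for trained classifiers.

```python
from typing import Any, Callable, Iterable, List, Sequence

# Hypothetical model interface: a callable returning 'Healthy' or 'Unhealthy'.
Model = Callable[[Any], str]


def majority_voting(dataset: Iterable[Any], models: Sequence[Model]) -> List[str]:
    """Python rendering of Algorithm 1: one 'Healthy'/'Unhealthy' label per image."""
    predictions = []
    for x in dataset:
        healthy = 0
        unhealthy = 0  # vote counters are reset for every image
        for model in models:
            label = model(x)
            if label == "Healthy":
                healthy += 1
            elif label == "Unhealthy":
                unhealthy += 1
        # As in Algorithm 1, a tie (possible only with an even number of
        # models) resolves to 'Unhealthy'.
        predictions.append("Healthy" if healthy > unhealthy else "Unhealthy")
    return predictions


if __name__ == "__main__":
    toy_models = [
        lambda s: "Healthy" if s < 0.3 else "Unhealthy",
        lambda s: "Healthy" if s < 0.5 else "Unhealthy",
        lambda s: "Healthy" if s < 0.7 else "Unhealthy",
    ]
    print(majority_voting([0.1, 0.4, 0.6, 0.9], toy_models))
```

With an odd number of voters, as with the seven models used in this study, a binary vote can never tie, so the tie rule only matters for even-sized ensembles.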

References

1. Lung segmentation in chest radiographs using anatomical atlases with nonrigid registration.
Authors: Sema Candemir; Stefan Jaeger; Kannappan Palaniappan; Jonathan P Musco; Rahul K Singh; Alexandros Karargyris; Sameer Antani; George Thoma; Clement J McDonald
Journal: IEEE Trans Med Imaging. Date: 2013-11-13.

2. Feature Selection for Automatic Tuberculosis Screening in Frontal Chest Radiographs.
Authors: Szilárd Vajda; Alexandros Karargyris; Stefan Jaeger; K C Santosh; Sema Candemir; Zhiyun Xue; Sameer Antani; George Thoma
Journal: J Med Syst. Date: 2018-06-29.

3. Deep Transfer Learning Based Classification Model for COVID-19 Disease.
Authors: Y Pathak; P K Shukla; A Tiwari; S Stalin; S Singh; P K Shukla
Journal: Ing Rech Biomed. Date: 2020-05-20.

4. Automatic tuberculosis screening using chest radiographs.
Authors: Stefan Jaeger; Alexandros Karargyris; Sema Candemir; Les Folio; Jenifer Siegelman; Fiona Callaghan; Kannappan Palaniappan; Rahul K Singh; Sameer Antani; George Thoma; Clement J McDonald
Journal: IEEE Trans Med Imaging. Date: 2013-10-01.

5. Learning to detect chest radiographs containing pulmonary lesions using visual attention networks.
Authors: Emanuele Pesce; Samuel Joseph Withey; Petros-Pavlos Ypsilantis; Robert Bakewell; Vicky Goh; Giovanni Montana
Journal: Med Image Anal. Date: 2019-01-09.

6. Automated detection of COVID-19 cases using deep neural networks with X-ray images.
Authors: Tulin Ozturk; Muhammed Talo; Eylul Azra Yildirim; Ulas Baran Baloglu; Ozal Yildirim; U Rajendra Acharya
Journal: Comput Biol Med. Date: 2020-04-28.

7. Application of deep learning technique to manage COVID-19 in routine clinical practice using CT images: Results of 10 convolutional neural networks.
Authors: Ali Abbasian Ardakani; Alireza Rajabzadeh Kanafi; U Rajendra Acharya; Nazanin Khadem; Afshin Mohammadi
Journal: Comput Biol Med. Date: 2020-04-30.

8. Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet.
Authors: Harsh Panwar; P K Gupta; Mohammad Khubeb Siddiqui; Ruben Morales-Menendez; Vaishnavi Singh
Journal: Chaos Solitons Fractals. Date: 2020-05-28.

9. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images.
Authors: Linda Wang; Zhong Qiu Lin; Alexander Wong
Journal: Sci Rep. Date: 2020-11-11.