Moloud Abdar1, Soorena Salari2, Sina Qahremani3, Hak-Keung Lam4, Fakhri Karray5,6, Sadiq Hussain7, Abbas Khosravi1, U Rajendra Acharya8,9,10, Vladimir Makarenkov11, Saeid Nahavandi1. 1. Institute for Intelligent Systems Research and Innovation (IISRI), Deakin University, Geelong, Australia. 2. Department of Computer Science and Software Engineering, Concordia University, Montreal, Canada. 3. Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran. 4. Centre for Robotics Research, Department of Engineering, King's College London, London, United Kingdom. 5. Centre for Pattern Analysis and Machine Intelligence, Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, ON, Canada. 6. Department of Machine Learning, Mohamed bin Zayed University of Artificial Intelligence, Abu Dhabi, United Arab Emirates. 7. System Administrator, Dibrugarh University, Dibrugarh, India. 8. Department of Electronics and Computer Engineering, Ngee Ann Polytechnic, Clementi, Singapore. 9. Department of Biomedical Engineering, School of Science and Technology, SUSS University, Singapore. 10. Department of Biomedical Informatics and Medical Engineering, Asia University, Taichung, Taiwan. 11. Department of Computer Science, University of Quebec in Montreal, Montreal, Canada.
Abstract
The COVID-19 (Coronavirus disease 2019) pandemic has become a major global threat to human health and well-being. Thus, the development of computer-aided detection (CAD) systems that are capable of accurately distinguishing COVID-19 from other diseases using chest computed tomography (CT) and X-ray data is of immediate priority. Such automatic systems are usually based on traditional machine learning or deep learning methods. Differently from most of the existing studies, which used either CT scan or X-ray images in COVID-19-case classification, we present a new, simple but efficient deep learning feature fusion model, called UncertaintyFuseNet, which is able to classify accurately large datasets of both of these types of images. We argue that the uncertainty of the model's predictions should be taken into account in the learning process, even though most of the existing studies have overlooked it. We quantify the prediction uncertainty in our feature fusion model using the effective Ensemble Monte Carlo Dropout (EMCD) technique. A comprehensive simulation study has been conducted to compare the results of our new model to the existing approaches, evaluating the performance of competing models in terms of Precision, Recall, F-Measure, Accuracy and ROC curves. The obtained results demonstrate the efficiency of our model, which provided prediction accuracies of 99.08% and 96.35% for the considered CT scan and X-ray datasets, respectively. Moreover, our UncertaintyFuseNet model was generally robust to noise and performed well with previously unseen data. The source code of our implementation is freely available at: https://github.com/moloud1987/UncertaintyFuseNet-for-COVID-19-Classification.
The 2019 coronavirus (COVID-19) has been spreading astonishingly fast across the globe since its emergence in December 2019, and its exact origin is still unknown [1], [2], [3]. Overall, the COVID-19 pandemic has caused a consecutive series of catastrophic losses worldwide, infecting more than 287 million people and causing around 5.4 million deaths up to the present. The rapid spread of COVID-19 continues to threaten human life and health with the emergence of novel variants such as Delta and Omicron. All of this makes COVID-19 not only an epidemiological disaster, but also a psychological and emotional one. The uncertainties and the loss of normalcy caused by this pandemic provoke severe anxiety, stress and sadness among people. Easy respiratory transmission of the disease from person to person triggers swift spread of the pandemic. While many COVID-19 cases show milder symptoms, the symptoms of the remaining cases are unfortunately life-critical. The health-care systems in many countries seem to have arrived at the point of collapse as the number of cases has been increasing drastically due to the fast propagation of some of its variants. Regarding COVID-19 diagnostics, the reverse transcription polymerase chain reaction (RT-PCR) is one of the gold standards for COVID-19 detection. However, RT-PCR has a low sensitivity; hence, many COVID-19 cases will not be recognized by this test, and the patients may not get proper treatment. These unrecognized patients pose a threat to the healthy population due to the highly infectious nature of the virus. Chest X-ray (CXR) and Computed Tomography (CT) have been widely used to identify prominent pneumonia patterns in the chest. These imaging technologies, accompanied by artificial intelligence tools, may be used to diagnose COVID-19 patients in a more accurate, fast and cost-effective manner.
Failure to provide prompt detection and treatment of COVID-19 patients increases the mortality rate. Hence, the detection of COVID-19 cases using deep learning models applied to both CXR and CT images may have huge potential in healthcare applications. In recent years, deep learning models have found widespread applicability not only in the medical imaging field but also in many other areas [4], [5], [6], [7]. These models have also been extensively applied for COVID-19 detection. It is critical to discriminate COVID-19 from other forms of pneumonia and flu. Farooq et al. [8] introduced an open-access dataset and the open-source code of their implementation using a CNN framework for distinguishing COVID-19 from analogous pneumonia cohorts in chest X-ray images. The authors designed their COVIDResNet model by utilizing a pre-trained ResNet-50 framework, allowing them to improve the model's performance and reduce its training time. An automatic and accurate identification of COVID-19 using CT images helps radiologists to screen patients in a better way. Zheng et al. [9] proposed a fully automated system for COVID-19 detection from chest CT images. Their deep learning model, called COVNet, investigates visual features of the chest CT images. Moreover, Hall et al. [10] presented a new deep learning model, named COVIDX-Net, to aid radiologists with COVID-19 detection from CXR image data. The authors explored seven deep learning architectures, including DenseNet, VGG-19 and MobileNet v2.0. In another study, Abbas et al. [11] designed the Decompose, Transfer, and Compose (DeTraC) model for COVID-19 image classification using CXR data. A class decomposition approach was employed to identify irregularities in CXR data by scrutinizing the class boundaries. Segmentation also plays a key role in COVID-19 quantification applied to CT scan data. Chen et al. [12] proposed a novel deep learning method for automatic segmentation of COVID-19 infection regions.
Aggregated Residual Transformations were employed to learn a robust and expressive feature representation and the soft attention technique was applied to improve the potential of the system to distinguish several symptoms of COVID-19. However, we noticed that there are still some open issues in the recently proposed traditional machine learning and deep learning models for COVID-19 detection. For this reason, optimizing the existing models should be a priority in COVID-19 detection and classification. Ensemble and fusion-based models [13] have shown outstanding performance in different medical applications. In the following, we provide more information about fusion-based models, discussing how they can be used in the framework of the deep learning approach.
Uncertainty quantification (UQ)
Many traditional machine learning and deep learning models have been developed not only for the analysis of CXR and CT image data but also for many other medical applications, often yielding highly accurate results even for a limited number of images [7]. However, DNNs require a large amount of data to fine-tune their trainable parameters, and a limited number of images usually leads to epistemic uncertainty. Trust is an issue for models deployed with low numbers of training samples. Out-of-distribution (OoD) samples and distribution shift between the training and testing samples make such models fail in real-world applications. The lack of confidence on unknown or new cases is usually not reported for these models; however, this information is essential for the development of reliable medical diagnostic tools. These unknown samples, which are generally hard to predict, often have important practical value. It is essential to estimate uncertainties to provide extra insight beyond point estimates. This additional vision aims at enhancing the overall trustworthiness of the systems, allowing clinicians to know when they can trust the predictions made by the models. Flawed decisions made by such models can be fatal for patients at risk. Hence, proper uncertainty estimation is necessary to make ML models trustworthy and reliable [14], [15], [16]. Trustworthy uncertainty estimates can facilitate clinical decision making and, more importantly, provide clinicians with appropriate feedback on the reliability of the obtained results [17]. As discussed above, COVID-19 has had many negative effects on all aspects of human life around the world, causing millions of deaths. In this regard, our study proposes a simple and accurate deep learning model, called UncertaintyFuseNet, for detecting COVID-19 cases.
Our model includes an uncertainty quantification method to increase the reliability of the obtained results.
Research gaps
Our comprehensive literature review helped us to identify several important research gaps related to COVID-19 detection/segmentation methods. Below, we list the most important of them:
- There are not sufficient COVID-19 image data to develop accurate and robust deep learning models. This lack of data can impact the performance of deep learning approaches.
- To the best of our knowledge, there are very few studies that have used both types of images (CT scan and X-ray) simultaneously.
- There are very few studies that have examined the uncertainty of the COVID-19 predictions provided by deep learning models.
- There are very few COVID-19 classification studies considering the model's robustness and its ability to process unknown data.
- The impressive effect of different feature fusion methods has received little attention in COVID-19 classification research. It is worth noting that feature fusion techniques are very effective both for improving a model's performance and for dealing with uncertainty within ML and DL models.
Main contributions
The main contributions of this study are as follows:
- We propose a novel feature fusion model for accurate detection of COVID-19 cases.
- We quantify the uncertainty in our proposed feature fusion model using the effective Ensemble MC Dropout (EMCD) technique.
- The proposed feature fusion model demonstrates strong robustness to data contamination (data noise).
- Our new model provides very encouraging results in terms of unknown data detection.
The main characteristics of the proposed UncertaintyFuseNet model are as follows: (i) it is an accurate model with promising performance; (ii) it can be used efficiently to carry out classification analysis of large CT and X-ray image datasets; (iii) it quantifies the prediction uncertainty; (iv) it is a reliable model in terms of processing noisy data; and (v) it allows for accurate detection of OoD samples.
The rest of this study is organized as follows. Section 2 formulates the proposed methodology. The main experiments of this study are discussed in Section 3. Section 4 presents the obtained results and provides a comprehensive comparison with existing studies. Finally, the conclusions are presented in Section 5.
Proposed methodology
This section includes two main sub-sections describing: (i) basic deep learning models, in sub-Section 2.1, and (ii) our novel feature fusion model, UncertaintyFuseNet, in sub-Section 2.2. It may be noted that we also applied two traditional machine learning algorithms (i.e., Random Forest (RF) and Decision Tree (DT), with max-depth 50 and n-estimators 200) and compared their performances with the considered deep learning models.
Basic deep learning models
In this sub-section, we provide more details regarding the two basic deep learning models: (i) Deep 1 (Simple CNN) and (ii) Deep 2 (Multi-headed CNN), shown in Fig. 1 and Fig. 2, respectively. The first deep learning model (Simple CNN) includes three convolutional layers followed by MC dropout in the feature extraction layer. The extracted features are then given to the classification layer, consisting of three dense layers and MC dropout. More details of the Deep 1 model can be found in Fig. 1. Our second deep learning model, Deep 2 (multi-headed CNN), comprises three main heads (feature extractors). The extracted features in each branch are then given to the fusion layers, followed by the classification layer, as illustrated in Fig. 2.
Fig. 1
A general overview of the applied deep learning model deep 1 (simple CNN).
Fig. 2
A general overview of the applied deep learning model deep 2 (multi-headed CNN).
Proposed feature fusion model: UncertaintyFuseNet
Feature fusion is an approach used to combine features (different pieces of information) of the same sample (input) extracted by various methods. Let X be a training space of N labeled samples (images). Given feature vectors f_1, f_2, ..., f_k of the same input sample, extracted by k different deep learning models, the total feature fusion vector obtained from the k sources is computed as follows:

F = f_1 ⊕ f_2 ⊕ ... ⊕ f_k,

where ⊕ denotes vector concatenation.

In this study, after preprocessing the data, we feed our dataset to the model. Our model consists of two major branches. The first branch has five convolutional blocks; each block is made up of two tandem convolutional layers followed by batch normalization and max-pooling layers. The fourth and fifth blocks also have dropout layers at their outputs. It is worth noting that separable convolutions were utilized in the second and subsequent layers of this branch, whereas the usual convolution layer was used elsewhere (the first layer of UncertaintyFuseNet, Simple CNN, and Multi-headed CNN). The following training parameters were used in our experiments: a learning rate of 0.0005, a batch size of 128, 200 epochs, with Adam as the optimizer. The second branch is a VGG16 transfer learning network whose output is used in the fusion layer. After the two branches, the model is followed by a fusion layer that concatenates the third, fourth, and fifth convolutional blocks' outputs with VGG16's output. Finally, we used fully connected layers to process the fused features and classify the data. In this part, we used four dense layers with 512, 128, 64, and 3 neurons, respectively, with the ReLU activation function. The outputs of the first three dense layers are followed by dropout with rates of 0.7, 0.5, and 0.3, respectively. The stated model is not simplistic.
Indeed, to boost the model's power in dealing with data and extracting high-quality features, we have employed a novel feature fusion approach combining different sources:
- We selected the third convolutional block's output as a fusion source to provide a holistic perspective on the data distribution. These features help the model to consider unprocessed, raw information and use it in the prediction.
- We included the final and penultimate convolutional blocks' outputs in the feature fusion layer to provide more refined information. These features give the model a detailed view of the dataset and help it process advanced classification features.
- As suggested by recent pneumonia detection studies, where pretrained networks have been successfully used to create high-quality generalizable features, we used the output of VGG16 in the fusion layer.
The pseudo-code of the proposed model for detecting COVID-19 cases is reported in Algorithm 1. Its general view is illustrated in Fig. 3.
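The concatenation-based fusion described above can be sketched in a few lines of NumPy. The feature dimensions below are illustrative placeholders, not the model's actual layer widths:

```python
import numpy as np

# Hypothetical flattened feature vectors for one input image; the
# dimensions are illustrative placeholders, not the model's widths.
f_conv3 = np.random.rand(64)    # third convolutional block output
f_conv4 = np.random.rand(128)   # fourth (penultimate) block output
f_conv5 = np.random.rand(256)   # fifth (final) block output
f_vgg16 = np.random.rand(512)   # VGG16 transfer-learning branch output

# Feature fusion: concatenate the vectors from all sources into a
# single vector that feeds the dense classification layers.
fused = np.concatenate([f_conv3, f_conv4, f_conv5, f_vgg16])
print(fused.shape)  # (960,)
```

In the actual model, this concatenation is a layer inside the network, so the dense classification layers learn jointly from raw, intermediate, and pretrained features.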
Fig. 3
A general overview of the proposed model inspired by a hierarchical feature fusion approach and EMCD.
It should be noted that detailed information about the convolution blocks in Figs. 1, 2, and 3 is reported in Table B.12 in the Appendix. To generate the final prediction, after training the applied models with the uncertainty module, we first run each model T times. Thereafter, we average the predicted softmax probabilities (outputs) over those T random predictions, sampling a stochastic dropout mask for each single prediction.
Table B.12
The detailed parts of Convolution blocks used in Deep 1 (Simple CNN), Deep 2 (Multi-headed CNN), and our proposed model (Fusion model).
Model | Layer name | Input size
Deep 1 (Simple CNN) | Conv1 | 2D convolution, Kernel size: 3, Activation: ReLU, Max Pooling: 2
We then used model ensembling and acquired predictions from the trained models with various weight distributions and initialized weights. This strategy allowed us to improve the model's performance drastically. Thus, after training the model, we use the MC equation with T = 200 (see Eq. (3)) to obtain predictions of the model through different stochastic paths (using MC dropouts to create randomness in our architectures). After getting all predictions, we calculate the mean for each sample. Using this approach, we obtain an ensemble of different models, which helps boost the model's performance. Precisely, we run the proposed model 200 times for each sample at the test stage and take the average prediction as the final prediction of the model. The pseudo-code of the applied EMCD procedure included in our model for detecting COVID-19 cases is summarized in Algorithm 2.
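The EMCD averaging step can be sketched as follows. The forward pass below is a stand-in for a network with MC dropout active at test time; the logits and noise model are our illustrative assumptions, not the paper's network:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200  # number of stochastic forward passes, as in the paper

def mc_forward_pass():
    """Stand-in for one forward pass with MC dropout kept active at
    test time. A real model would sample a fresh dropout mask; here
    we perturb fixed class logits to mimic that stochasticity."""
    logits = np.array([2.0, 0.5, -1.0]) + rng.normal(0.0, 0.3, size=3)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()  # softmax probabilities over 3 classes

# EMCD: average the softmax outputs of T stochastic passes.
probs = np.stack([mc_forward_pass() for _ in range(T)])
mean_prob = probs.mean(axis=0)        # ensemble prediction
prediction = int(mean_prob.argmax())  # final class label
spread = probs.std(axis=0)            # dispersion as an uncertainty cue
```

The per-class standard deviation across the T passes gives a simple uncertainty signal: the wider the spread, the less the averaged prediction should be trusted.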
Experiments
In this section, we present the data considered in our study (sub-Section 3.1), the results obtained using our new model (sub-Section 3.2), the results showing that our new model is robust against noise (sub-Section 3.3), and the results showing how our new model copes with unknown data (sub-Section 3.4).
Datasets considered
In this study, two types of input image data were used: CT scan [18] and X-ray images (see Table 1). Some random samples of the CT scan and X-ray datasets considered in this study are shown in Fig. 4. The CT scan dataset has three classes of data: non-informative CT (NiCT), positive CT (pCT), and negative CT (nCT) images. The X-ray dataset also has three data classes: COVID-19, Normal, and Pneumonia images.
Table 1
Characteristics of the CT scan and X-ray datasets considered in our study.
Dataset | # of samples | # of classes
CT scan images | 19 685 (70% train, 30% test) | 3
X-ray images | 6432 (train: 5144, test: 1288) | 3
Fig. 4
Some random image samples from the CT scan and X-ray datasets considered in our study.
It should be pointed out that for the CT scan dataset we randomly used 70% of the whole data for training and the remaining 30% for testing the applied models. The X-ray dataset, however, was originally divided into two main categories: train (5144 images) and test (1288 images). Thus, we used these train and test categories in our study as well.
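The 70/30 split of the CT scan dataset can be sketched as follows (a minimal NumPy illustration using the sample count from Table 1; the seed and split procedure are our assumptions, not the paper's exact code):

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility

n_samples = 19685                       # CT scan dataset size (Table 1)
indices = rng.permutation(n_samples)    # shuffle before splitting
n_train = int(0.7 * n_samples)          # 70% of the data for training

train_idx, test_idx = indices[:n_train], indices[n_train:]
print(len(train_idx), len(test_idx))    # 13779 5906
```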
Experimental results
In this section, the experimental results are presented and discussed. Since we also considered the impact of UQ methods, our experiments have been conducted with and without applying them for detection of COVID-19 cases. In our first experiment, we compared five different machine learning models, including Random Forest (RF), Decision Trees (DT, max-depth 50, and n-estimators 200), Deep 1 (Simple CNN), Deep 2 (Multi-headed CNN), and our proposed model (feature fusion model).
COVID-19 classification without considering uncertainty
First, we investigated the performance of the five considered classifiers (RF, DT, simple CNN, multi-headed CNN and our proposed feature fusion model) without considering uncertainty. The obtained results are presented in Table 2 and Table 3 for the CT scan and X-ray datasets, respectively. As shown in Table 2, our feature fusion model outperformed the other methods for the CT scan dataset, providing an accuracy of 99.136%, followed by the simple CNN with an accuracy of 98.763%. The obtained results also indicate that DT provided the weakest performance for the CT scan dataset among the five competing models. Figs. B.15 (in the Appendix) and 5 present the confusion matrices and the ROC curves, respectively, obtained for the CT scan dataset without quantifying uncertainty.
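In the tables that follow, the Recall and Accuracy columns coincide for every model, which is consistent with support-weighted averaging of per-class metrics. A small NumPy sketch (with a hypothetical confusion matrix, not the paper's data) shows how these metrics are derived:

```python
import numpy as np

# Hypothetical 3-class confusion matrix (rows: true class, cols: predicted).
cm = np.array([[50,  2,  1],
               [ 3, 45,  2],
               [ 1,  1, 48]])

tp = np.diag(cm).astype(float)           # true positives per class
precision = tp / cm.sum(axis=0)          # per-class precision
recall = tp / cm.sum(axis=1)             # per-class recall
f_measure = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()

# Support-weighted averages, as commonly reported in multi-class studies.
support = cm.sum(axis=1)
w_precision = np.average(precision, weights=support)
w_recall = np.average(recall, weights=support)
w_f = np.average(f_measure, weights=support)
# Note: the support-weighted recall always equals the overall accuracy.
```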
Table 2
Comparison of the results (given in %) provided by different ML models for detecting COVID-19 cases for the CT scan dataset: Results without considering uncertainty.
ML model | Precision | Recall | F-measure | Accuracy
RF | 97.111 | 97.070 | 97.091 | 97.070
DT | 93.049 | 93.040 | 93.045 | 93.040
Deep 1 (Simple CNN) | 98.787 | 98.763 | 98.775 | 98.763
Deep 2 (Multi-headed CNN) | 98.599 | 98.577 | 98.588 | 98.577
Proposed (Fusion model) | 99.137 | 99.136 | 99.136 | 99.136
Table 3
Comparison of the results (given in %) provided by different ML models for detecting COVID-19 cases for the X-ray dataset: Results without considering uncertainty.
ML model | Precision | Recall | F-measure | Accuracy
RF | 91.532 | 91.381 | 91.456 | 91.381
DT | 83.828 | 84.006 | 83.917 | 84.006
Deep 1 (Simple CNN) | 93.847 | 93.167 | 93.506 | 93.167
Deep 2 (Multi-headed CNN) | 95.041 | 94.953 | 94.997 | 94.953
DarkCovidNet | 95.752 | 95.729 | 95.741 | 95.729
Proposed (Fusion model) | 97.121 | 97.127 | 97.124 | 97.127
Fig. B.15
Confusion matrices obtained using different models for the CT scan datasets without quantifying uncertainty.
Fig. 5
ROC curves obtained for the five considered ML models for the CT scan data without quantifying uncertainty.
To demonstrate the effectiveness of the proposed feature fusion model, the same five ML models have been applied to analyze the X-ray data. It can be observed from Table 3 that our feature fusion model performed much better than the other competing ML models, providing an accuracy of 97.127%, followed by the multi-headed CNN model with an accuracy of 94.953%. The traditional Decision Tree model provided much worse results for the X-ray data (recall of 84.006%) than for the CT scan data (recall of 93.040%). Figs. B.16 (in the Appendix) and 6 present the confusion matrices and the ROC curves obtained by the five ML models for the X-ray dataset without quantifying uncertainty.
Fig. B.16
Confusion matrices obtained using different models for the X-ray dataset without quantifying uncertainty.
Fig. 6
ROC curves obtained for the five considered ML models for the X-ray data without quantifying uncertainty.
COVID-19 classification considering uncertainty
The results, discussed in the previous sub-Section 3.2.1, provided by our new feature fusion model are promising, suggesting that it can be used by clinical practitioners for automatic detection of COVID-19 cases. We believe that new, efficient intelligent (i.e., ML and DL) models to deal with COVID-19 data are urgently needed. At the same time, we believe that uncertainty estimates should accompany such intelligent models. To accomplish this, we applied the uncertainty quantification method called EMC dropout to estimate the uncertainty of our deep learning predictions. The EMC method was used in the framework of the Deep 1 (Simple CNN) and Deep 2 (Multi-headed CNN) models, and our proposed model. Table 4 and Fig. B.17 in the Appendix (confusion matrices) and Fig. 7 (ROC curves) show the results provided by the three compared deep learning models considering uncertainty for the CT scan dataset. As shown in Table 4, our feature fusion model yielded a better classification performance compared to the Deep 1 and Deep 2 CNN-based models, providing an accuracy of 99.085% for the CT scan data, followed by the Deep 1 model with an accuracy of 98.831%. The results obtained using deep learning models with and without uncertainty quantification (UQ) reveal that our proposed feature fusion model with the UQ method had a slightly poorer performance than the model without UQ. The Deep 1 CNN model performed slightly better with UQ, while the Deep 2 CNN model performed slightly better without UQ.
Table 4
Comparison of the results (given in %) provided by the 3 DL models for detecting COVID-19 cases for the CT scan dataset: Results obtained with uncertainty quantification.
DL model | Precision | Recall | F-measure | Accuracy
Deep 1 (Simple CNN) | 98.831 | 98.854 | 98.843 | 98.831
Deep 2 (Multi-headed CNN) | 98.493 | 98.523 | 98.508 | 98.493
Proposed (Fusion model) | 99.085 | 99.085 | 99.085 | 99.085
Fig. B.17
Confusion matrices obtained using different models for the CT scan dataset with quantifying uncertainty.
Fig. 7
ROC curves obtained for the three considered DL models for the CT scan data with uncertainty quantification.
We also evaluated the performance of the three considered DL models with uncertainty quantification on the X-ray dataset. The obtained statistics, confusion matrices, and ROC curves for the three competing DL models are presented in Table 5, Fig. B.17 in the Appendix, and Fig. 8, respectively. Our proposed feature fusion model achieved an accuracy of 96.350% for COVID-19 detection on the X-ray dataset, compared to 95.263% for the simple CNN. For the X-ray data, the proposed model outperformed both the Deep 1 (simple CNN) and Deep 2 (multi-headed CNN) models, but was slightly surpassed by DarkCovidNet (see Table 5 and Fig. 8).
Table 5
Comparison of the results (given in %) provided by the 3 DL models for detecting COVID-19 cases for the X-ray dataset: Results obtained with uncertainty quantification.
Method | Precision | Recall | F-measure | Accuracy
Deep 1 (Simple CNN) | 95.263 | 95.354 | 95.309 | 95.263
Deep 2 (Multi-headed) | 95.186 | 95.257 | 95.222 | 95.186
DarkCovidNet | 97.460 | 97.4589 | 97.459 | 97.458
Proposed (Fusion model) | 96.350 | 96.370 | 96.360 | 96.350
Fig. 8
ROC curves obtained for the three considered DL models for the X-ray data with uncertainty quantification.
Robustness against noise
The human visual system is remarkably robust to a wide variety of natural noises and corruptions, such as snow, fog or rain [19]. However, the performance of many modern image and speech recognition systems degrades greatly when they are evaluated on previously unseen noises and corruptions. Thus, conducting robustness tests on the considered ML and DL models is necessary to reveal their level of stability against noise. In this study, the robustness of the applied deep learning models against noise has been investigated. We added different amounts of noise to both the CT scan and X-ray datasets to evaluate the performance of the Simple CNN, the Multi-headed CNN and our proposed feature fusion model. Gaussian noise with different standard deviations (STD) was generated, using STD values of 0.0001, 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, and 0.6, with the mean equal to 0. Our simulation results obtained for the CT scan and X-ray datasets are presented in Table 6 and Table 7, respectively. It may be noted from Table 6 (CT scan data results) that both the Simple CNN and the Multi-headed CNN models did not perform well on noisy data compared to our feature fusion model. The results reported in Table 6 indicate that the values of all metrics computed for the Simple and Multi-headed CNNs decrease dramatically as the level of noise increases. In contrast, our feature fusion model was much more robust against noise according to all of the considered metrics.
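The noise-injection protocol described above can be sketched as follows. This is a minimal NumPy illustration; the `add_gaussian_noise` helper and the assumption that images are scaled to [0, 1] are ours, not the paper's exact preprocessing:

```python
import numpy as np

rng = np.random.default_rng(42)

def add_gaussian_noise(images, std, mean=0.0):
    """Corrupt images (assumed scaled to [0, 1]) with additive
    Gaussian noise N(mean, std^2), clipped back to the valid range."""
    noisy = images + rng.normal(mean, std, size=images.shape)
    return np.clip(noisy, 0.0, 1.0)

batch = rng.random((4, 64, 64, 1))  # hypothetical image batch
for std in (0.0001, 0.001, 0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6):
    # each noisy batch would then be passed to the trained model and
    # precision/recall/F-measure/accuracy recomputed per STD level
    noisy = add_gaussian_noise(batch, std)
```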
Table 6
Robustness against noise results (given in %) provided by the 3 compared DL models for detecting COVID-19 cases for the CT scan dataset. Here, the added noise follows N(μ, σ²), where μ is the mean of the noise and σ is the standard deviation of the noise.
DL model | Noise STD | Precision | Recall | F-measure | Accuracy
Deep 1 (Simple CNN) | 0.0001 | 98.852 | 98.831 | 98.842 | 98.831
 | 0.001 | 98.868 | 98.848 | 98.858 | 98.848
 | 0.01 | 98.754 | 98.730 | 98.742 | 98.730
 | 0.1 | 95.785 | 95.614 | 95.700 | 95.614
 | 0.2 | 89.270 | 87.284 | 88.266 | 87.284
 | 0.3 | 86.370 | 82.593 | 84.439 | 82.593
 | 0.4 | 82.628 | 75.194 | 78.736 | 75.194
 | 0.5 | 78.303 | 64.053 | 70.465 | 64.053
 | 0.6 | 75.770 | 57.534 | 65.405 | 57.534
Deep 2 (Multi-headed) | 0.0001 | 98.526 | 98.493 | 98.509 | 98.493
 | 0.001 | 98.524 | 98.493 | 98.508 | 98.493
 | 0.01 | 98.558 | 98.526 | 98.542 | 98.526
 | 0.1 | 93.447 | 92.871 | 93.158 | 92.871
 | 0.2 | 86.943 | 83.423 | 85.147 | 83.423
 | 0.3 | 80.168 | 69.065 | 74.203 | 69.065
 | 0.4 | 76.032 | 58.465 | 66.102 | 58.465
 | 0.5 | 73.837 | 54.690 | 62.837 | 54.690
 | 0.6 | 72.914 | 53.149 | 61.482 | 53.149
Proposed (Fusion model) | 0.0001 | 99.085 | 99.085 | 99.085 | 99.085
 | 0.001 | 99.119 | 99.119 | 99.119 | 99.119
 | 0.01 | 99.194 | 99.187 | 99.190 | 99.187
 | 0.1 | 99.098 | 99.085 | 99.092 | 99.085
 | 0.2 | 98.828 | 98.814 | 98.821 | 98.814
 | 0.3 | 98.109 | 98.086 | 98.097 | 98.086
 | 0.4 | 96.956 | 96.884 | 96.920 | 96.884
 | 0.5 | 96.201 | 96.088 | 96.145 | 96.088
 | 0.6 | 95.804 | 95.665 | 95.734 | 95.665
Table 7
Robustness against noise results (given in %) provided by the 3 compared DL models for detecting COVID-19 cases for the X-ray dataset.
DL model
Noise STD
Precision
Recall
F-measure
Accuracy
Deep 1 (Simple CNN)
0.0001
95.408
95.341
95.375
95.341
0.001
95.408
95.341
95.341
95.341
0.01
95.338
95.263
95.301
95.263
0.1
94.554
94.254
94.404
94.254
0.2
91.540
89.285
90.398
89.285
0.3
88.534
82.065
85.176
82.065
0.4
85.770
73.136
78.951
73.136
0.5
84.294
64.518
73.092
64.518
0.6
82.545
57.375
67.696
57.375
Deep 2 (Multi-headed)
0.0001
95.474
95.419
95.446
95.419
0.001
95.188
95.108
95.148
95.108
0.01
95.404
95.341
95.372
95.341
0.1
93.922
93.322
93.621
93.322
0.2
88.861
82.453
85.537
82.453
0.3
83.781
58.074
68.598
58.074
0.4
82.207
40.062
53.871
40.062
0.5
81.750
31.521
45.499
31.521
0.6
81.366
27.639
41.262
27.639
Proposed (Fusion model)
0.0001
96.498
96.506
96.502
96.506
0.001
96.568
96.583
96.576
96.583
0.01
96.492
96.506
96.499
96.506
0.1
96.363
96.350
96.357
96.350
0.2
94.403
94.254
94.329
94.254
0.3
91.769
91.071
91.418
91.071
0.4
88.225
85.714
86.951
85.714
0.5
84.181
78.804
81.404
78.804
0.6
81.082
68.322
74.157
68.322
Table 7 reports the performance of the three selected deep learning models under different noise conditions for the X-ray dataset. Again, neither the Simple CNN nor the Multi-headed CNN performed well in this context, whereas our new model was usually much more robust against noise. It should be noted that our feature fusion model performed better on the CT scan data than on the X-ray data. This stage of the experiments was necessary to demonstrate the stability of the applied models against noise. Our results clearly indicate that the proposed feature fusion model is robust against noise for both considered types of image data: CT scan and X-ray images. Among the various COVID-19 diagnostic resources, CT scan and X-ray imaging are the main ones. This motivated us to propose an efficient deep learning-based COVID-19 detection model that works well on both CT scan and X-ray images, in order to assist clinicians in providing timely diagnostics and clinical support to their patients.
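For reference, the Precision, Recall, F-measure and Accuracy values reported in Tables 6 and 7 can be recomputed from a confusion matrix. The sketch below assumes support-weighted averaging across classes (the averaging scheme is our assumption, as it is not stated explicitly in the tables), and the function name `weighted_metrics` is illustrative.

```python
import numpy as np

def weighted_metrics(y_true, y_pred, n_classes):
    """Support-weighted precision, recall and F-measure, plus overall
    accuracy, computed from integer class labels."""
    cm = np.zeros((n_classes, n_classes), dtype=float)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                      # rows: true class, cols: predicted
    tp = np.diag(cm)
    support = cm.sum(axis=1)               # number of true samples per class
    pred_ct = cm.sum(axis=0)               # number of predictions per class
    prec = np.divide(tp, pred_ct, out=np.zeros_like(tp), where=pred_ct > 0)
    rec = np.divide(tp, support, out=np.zeros_like(tp), where=support > 0)
    f1 = np.divide(2 * prec * rec, prec + rec,
                   out=np.zeros_like(tp), where=(prec + rec) > 0)
    w = support / support.sum()            # weight each class by its support
    return float(w @ prec), float(w @ rec), float(w @ f1), float(tp.sum() / cm.sum())

prec, rec, f1, acc = weighted_metrics([0, 0, 1, 2], [0, 0, 1, 2], 3)
```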
Unknown data detection
In this sub-section, we evaluate the performance of the deep learning models when they are fed with unknown images, i.e., images whose class the models cannot know and for which they should express high uncertainty. To perform this evaluation, we fed the compared DL models with one sample image from the well-known MNIST dataset (see Fig. 9). The Mean and STD values produced by the Simple CNN, the Multi-headed CNN and our proposed feature fusion model are reported in Table 8. The obtained results indicate that our feature fusion model expressed its uncertainty towards unknown data much better than the two other DL models.
Fig. 9
The MNIST sample image fed to the deep learning models as an unknown sample.
Table 8
Unknown image class detection by Simple CNN, Multi-headed CNN and our proposed feature fusion model when fed with the image presented in Fig. 9.
DL model
CT scan
X-ray
nCT
NiCT
pCT
COVID-19
Normal
Pneumonia
Deep 1 (Simple CNN)
Mean
0.02
0.98
0.0
0.57
0.15
0.28
STD
0.10
0.10
0.01
0.39
0.26
0.35
Deep 2 (Multi-headed CNN)
Mean
0.05
0.30
0.65
0.68
0.22
0.10
STD
0.19
0.43
0.45
0.32
0.27
0.16
Proposed fusion model
Mean
0.56
0.0
0.44
0.41
0.59
0.0
STD
0.50
0.07
0.50
0.49
0.49
0.0
We fed the MNIST sample image presented in Fig. 9 to the three deep learning models trained on the CT scan and X-ray datasets, and then predicted the class of this unknown image sample. Estimating the uncertainty of traditional machine learning and deep learning models using UQ methods is vital for critical predictions, such as those arising in medical case studies. Ideally, the applied ML models should be able to capture a portion of both the epistemic and aleatoric uncertainty. In this study, we applied a new feature fusion model to classify two types of medical data: CT scan and X-ray images. Table 8 reports the Mean and STD values of the three considered deep learning models applied to unknown data; the Mean value corresponds to the model's prediction and the STD to its uncertainty. As reported in Table 8, our model usually provides zero (or close to zero) Mean and STD values for one of the image classes (for both CT scan and X-ray image types).
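The Mean and STD columns of Table 8 come from repeated stochastic forward passes with dropout kept active at test time (Monte Carlo Dropout). A minimal sketch of that procedure follows; the linear "network" `noisy_logits`, its weight matrix `W`, and all names are hypothetical stand-ins, not the paper's architecture.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(logits_fn, x, n_samples=100, rng=None):
    """Monte Carlo Dropout at test time: the per-class mean of the
    stochastic softmax outputs is the prediction (Table 8 'Mean'),
    and their per-class STD is the uncertainty (Table 8 'STD')."""
    rng = rng or np.random.default_rng(0)
    samples = np.stack([softmax(logits_fn(x, rng)) for _ in range(n_samples)])
    return samples.mean(axis=0), samples.std(axis=0)

# Toy stand-in for a 3-class network with dropout active at inference.
W = np.array([[1.0, -0.5, 0.2],
              [0.3, 0.8, -1.0]])          # hypothetical weights

def noisy_logits(x, rng, p=0.5):
    mask = rng.binomial(1, 1 - p, size=W.shape) / (1 - p)   # inverted dropout
    return x @ (W * mask)

mean, std = mc_dropout_predict(noisy_logits, np.array([0.4, 0.6]))
```

For an unknown input such as the MNIST digit, a well-calibrated model should yield a diffuse Mean vector together with large STD values, rather than a confident low-variance prediction.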
Discussion
Nowadays, timely and accurate detection of COVID-19 cases has become a crucial health care task. Various methods from different fields of science have been proposed to tackle the problem of accurate COVID-19 diagnostics, and traditional machine learning (ML) and deep learning (DL) methods have been among the most effective of them. In this work, we focused on the detection of COVID-19 cases using CT scan and X-ray image data. We proposed a new, simple but very efficient feature fusion model, called UncertaintyFuseNet, and compared its performance with several classical ML and DL techniques. The prediction results we obtained confirm that our feature fusion model can be highly effective in detecting COVID-19 cases. Moreover, we have shown the superiority of our model in dealing with noisy data. The obtained results also reveal that the proposed model can effectively classify previously unseen images. Our study attempts to fill the gap reported in the literature [20]. To do so, we have compared the performance of the proposed feature fusion model to recent state-of-the-art machine learning techniques used to classify CT scan and X-ray image data (see Table 11). The Grad-CAM visualization procedure was carried out to identify the important features of each data class (this analysis was conducted for both the CT scan and X-ray image datasets). Fig. 12 and Fig. 13 illustrate the most important features used by our feature fusion model to identify each data class for the CT scan and X-ray image datasets, respectively. Moreover, the T-SNE visualisations of the different models applied to the CT scan and X-ray datasets, without and with quantifying uncertainty, are presented in Fig. 10 and Fig. 11. Finally, the output posterior distributions of our proposed feature fusion model for both considered image datasets are presented in Fig. 14. 
This figure clearly shows that the correctly classified samples of a given class do not overlap with samples of the other classes (incorrect classes).
Table 11
Comprehensive comparison of the results provided by our proposed model with the state-of-the-art techniques for automated detection of COVID-19 cases using both the CT scan and X-ray image datasets.
Dataset
Study
Year
# of samples
Performance
UQ
Code
Precision
Recall
F-measure
Accuracy
AUC
CT scan
Li et al. [36]
2020
1540 (3 classes)
N/A
82.60
N/A
N/A
0.918
×
×
Jaiswal et al. [37]
2020
2492 (2 classes)
96.29
96.29
96.29
96.25
0.970
×
×
Wang et al. [38]
2020
640 (2 classes)
96.61
97.71
97.14
97.15
N/A
×
×
Sharma [39]
2020
2200 (3 classes)
N/A
92.10
N/A
91.00
N/A
×
×
Panwar et al. [40]
2020
1600 (2 classes)
95.00
95.00
95.00
95.00
N/A
×
×
Do and Vu [41]
2020
746 (2 classes)
85.00
85.00
85.00
85.00
0.922
×
×
Singh [42]
2020
N/A (2 classes)
N/A
91.00
89.97
93.50
N/A
×
×
Pham [43]
2020
746 (2 classes)
N/A
91.14
93.00
92.62
0.980
×
×
Martinez [44]
2020
746 (2 classes)
94.40
86.60
90.30
90.40
0.965
×
×
Loey et al. [45]
2020
11 012 (2 classes)
N/A
80.85
N/A
81.41
N/A
×
×
Ning et al. [18]
2020
19 685 (3 classes)
N/A
N/A
N/A
N/A
0.978
×
×
Han et al. [46]
2020
460 (3 classes)
95.90
90.50
92.30
94.30
0.988
×
×
Shamsi Jokandan et al. [7]
2021
746 (2 classes)
N/A
86.50
N/A
87.90
0.942
✓
×
Benmalek et al. [47]
2021
19 685 (3 classes)
98.50
98.60
98.50
N/A
N/A
✓
✓
Kumar et al. [48]
2022
2926 (2 classes)
N/A
N/A
N/A
98.87
N/A
×
✓
Masood et al. [49]
2022
19 685 (2 classes)
99.75
99.70
99.72
99.75
N/A
×
×
Ours
2022
19 685 (3 classes)
99.08
99.08
99.08
99.08
1.00
✓
✓
X-ray
Khan et al. [50]
2020
1251 (4 classes)
90.00
89.92
89.80
89.60
N/A
×
✓
Ozturk et al. [22]
2020
1125 (3 classes)
89.96
85.35
87.37
87.02
N/A
×
✓
Mesut et al. [51]
2020
458 (3 classes)
98.89
98.33
98.57
99.27
N/A
×
✓
Mahmud et al. [52]
2020
1220 (4 classes)
82.87
83.82
83.37
90.30
0.825
×
✓
Heidari et al. [53]
2020
2544 (3 classes)
N/A
N/A
N/A
94.50
N/A
×
×
Rahimzadeh and Attar [54]
2020
11 302 (3 classes)
72.83
87.31
N/A
91.40
N/A
×
✓
Pereira et al. [55]
2020
1144 (7 classes)
N/A
N/A
64.91
N/A
N/A
×
×
De Moura et al. [56]
2020
1616 (3 classes)
79.00
79.33
79.33
79.86
N/A
×
×
Yoo et al. [57]
2020
1170 (2 classes)
97.00
99.00
97.98
98.00
0.980
×
×
Chandra et al. [58]
2020
2346 (2 classes)
N/A
N/A
N/A
91.32
0.914
×
×
Zhang et al. [59]
2020
2706 (2 classes)
77.13
N/A
N/A
78.57
0.844
×
✓
Shamsi Jokandan et al. [7]
2021
100 (2 classes)
N/A
99.90
N/A
98.60
0.997
✓
×
Ahmad et al. [60]
2021
4200 (4 classes)
93.01
92.97
92.97
96.49
N/A
×
×
Patel [61]
2021
6432 (3 classes)
N/A
N/A
N/A
93.67
N/A
✓
✓
Basu et al. [62]
2022
2926 (2 classes)
N/A
92.90
N/A
97.60
N/A
×
×
Masud [63]
2022
6432 (2 classes)
N/A
N/A
N/A
92.70
0.964
×
×
Ours
2022
6432 (3 classes)
96.35
96.37
96.36
96.35
0.993
✓
✓
Fig. 12
Grad-CAM visualization for our proposed fusion model, without and with UQ, for the nCT, NiCT, and pCT classes of the CT scan dataset.
Fig. 13
Grad-CAM visualization for our proposed fusion model, without and with UQ, for the COVID-19, Normal, and Pneumonia classes of the X-ray dataset.
Fig. 10
T-SNE visualisation of different models applied to the CT scan data without and with quantifying uncertainty.
Fig. 11
T-SNE visualisation of different models applied to the X-ray data without and with quantifying uncertainty.
Fig. 14
The output posterior distributions of our proposed feature fusion model for the nCT (14(a)), NiCT (14(b)) and pCT (14(c)) classes of the CT scan dataset, and the COVID-19 (14(d)), Normal (14(e)) and Pneumonia (14(f)) classes of the X-ray dataset.
Comparison with the state-of-the-art
In this sub-section, we briefly compare the results provided by our new model with those yielded by state-of-the-art DL techniques (see Table 9 and Table 10). The state-of-the-art models used in our comparison are Bayesian Deep Learning [21], DarkCovidNet [22], CNN [23], DeTraC (Decompose, Transfer, and Compose) [24], and ResNet50 [25].
Table 9
Comparison of the results of our DL feature fusion model with the state-of-the-art DL models for CT scan data.
DL model
Precision
Recall
F-measure
Accuracy
Bayesian Deep Learning [21]
98.351
98.333
98.342
98.333
DarkCovidNet [22]
97.460
97.458
97.459
97.458
CNN [23]
97.753
97.750
97.751
97.750
DeTraC [24]
96.972
96.958
96.965
96.958
ResNet50 [25]
95.571
95.541
95.556
95.541
Proposed fusion model
99.085
99.085
99.085
99.085
Table 10
Comparison of the results of our DL feature fusion model with the state-of-the-art DL models for X-ray data.
DL model
Precision
Recall
F-measure
Accuracy
Bayesian Deep Learning [21]
95.398
95.419
95.408
95.419
DarkCovidNet [22]
95.752
95.729
95.741
95.729
CNN [23]
95.400
95.341
95.370
95.341
DeTraC [24]
95.276
95.263
95.270
95.263
ResNet50 [25]
94.153
94.177
94.165
94.177
Proposed fusion model
96.350
96.370
96.360
96.350
As can be seen from Table 9 and Table 10, our proposed feature fusion model not only achieved superior performance, but also significantly outperformed the state-of-the-art models applied to the same datasets.
Significance of the feature fusion model
Wang et al. [20] proposed a DL feature fusion model for COVID-19 case detection. Their model provided excellent prediction performance on the CT scan data considered. However, the authors stated that it may be much less efficient on other types of medical data, such as X-ray images. In another study, Tang et al. [26] proposed an ensemble deep learning model for COVID-19 detection using X-ray image data only. Moreover, most of the existing studies focus on COVID-19 case detection without conducting any uncertainty analysis of the model's predictions. Shamsi et al. [7] have been among the rare authors who considered uncertainty in their study; however, they used very small datasets in their training experiments. In this work, we proposed a novel general feature fusion model which can be effectively used to analyze large CT scan and X-ray datasets (both of these types of images can be processed successfully), while quantifying the uncertainty of the model's predictions using the Ensemble MC Dropout (EMCD) technique. It should be noted that the proposed feature fusion model could easily be generalized to classify other complex diseases. Moreover, the model's performance could be further improved by incorporating different optimization algorithms into it, such as the Arithmetic optimization algorithm [27], Aquila optimizer [28], Artificial Immune System (AIS) algorithm [29], Marine Predators algorithm [30], or Cuckoo search optimization algorithm [31]. Finally, Neural Architecture Search (NAS) is a recent technique for automating the design of deep learning models; the architecture of the proposed feature fusion model could therefore be further improved using newly proposed NAS techniques [32], [33], [34], [35]. A comprehensive comparison of the results provided by our proposed model with the state-of-the-art techniques for automated detection of COVID-19 cases, using both the CT scan and X-ray image datasets, is presented in Table 11. 
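The Ensemble MC Dropout idea mentioned above can be sketched as follows: several ensemble members each perform multiple dropout-active forward passes, and all of the resulting softmax samples are pooled before computing the predictive mean and STD. This is an illustrative reconstruction under our own assumptions; the toy linear "members", their weights, and all names (`emcd_predict`, `make_member`) are hypothetical, not the authors' implementation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def emcd_predict(members, x, n_passes=20, rng=None):
    """Ensemble MC Dropout: pool the stochastic softmax outputs of every
    ensemble member over several dropout-active passes; the pooled mean
    is the prediction and the pooled STD is the predictive uncertainty."""
    rng = rng or np.random.default_rng(0)
    samples = np.stack([softmax(m(x, rng))
                        for m in members
                        for _ in range(n_passes)])
    return samples.mean(axis=0), samples.std(axis=0)

# Toy ensemble: each "member" is a linear head with inverted dropout on.
def make_member(w, p=0.3):
    def member(x, rng):
        mask = rng.binomial(1, 1 - p, size=w.shape) / (1 - p)
        return x @ (w * mask)
    return member

rng0 = np.random.default_rng(42)
members = [make_member(rng0.normal(size=(2, 3))) for _ in range(3)]
mean, std = emcd_predict(members, np.array([1.0, -1.0]))
```

Pooling across both the ensemble and the dropout passes lets the STD reflect disagreement between members as well as the within-member dropout variance.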
Moreover, the most important features of our feature fusion model are summarized below:
- Our model provided the highest COVID-19 detection performance, compared to traditional machine learning models, simple deep learning models, and state-of-the-art deep learning techniques, for both considered types of medical data (CT scan and X-ray images).
- The proposed model takes advantage of an uncertainty quantification strategy based on the effective Ensemble MC Dropout (EMCD) technique.
- The proposed model is robust against noise.
- The proposed model is able to detect unknown data with high accuracy.
Conclusion
In this study, we have described a new deep learning feature fusion model that accurately detects COVID-19 cases using CT scan and X-ray data. In order to detect COVID-19 cases accurately and provide health practitioners with an efficient diagnostic tool they can rely on, we quantified the uncertainty of the model's predictions while detecting the disease cases. Moreover, our model demonstrated excellent robustness to noise and the ability to process unknown data. A class-wise analysis procedure was implemented to ensure a steady performance of the model. We have demonstrated the effectiveness of our model in various computational experiments. Our experimental results suggest that the presented feature fusion model can efficiently analyze both CT and X-ray data. The use of hierarchical features in the model's architecture helped it outperform the considered traditional machine learning models, classical deep learning models, and state-of-the-art deep learning models. The limitations of the proposed feature fusion model will be addressed in our future studies. Thus, in the future, we intend to: (i) expand the considered COVID-19 datasets and test our feature fusion model on multi-modal data, (ii) include an attention mechanism while merging features, and (iii) integrate into our model some modern data fusion techniques, such as decision-level fusion.
CRediT authorship contribution statement
Moloud Abdar: Conception of the project, Design of methodology, Data selection and collection, Analysis and experimental protocol, Writing – original draft, Result discussion, Writing – review & editing. Soorena Salari: Analysis and experimental protocol, Drafting the initial draft, Experimental protocol, Result discussion, Writing – review & editing. Sina Qahremani: Analysis and experimental protocol, Drafting the initial draft, Experimental protocol, Result discussion, Writing – review & editing. Hak-Keung Lam: Writing – review & editing, Supervision. Fakhri Karray: Writing – review & editing, Supervision. Sadiq Hussain: Drafting the initial draft. Abbas Khosravi: Writing – review & editing, Supervision. U. Rajendra Acharya: Writing – review & editing, Supervision. Vladimir Makarenkov: Writing – review & editing, Supervision. Saeid Nahavandi: Supervision.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.