Avinandan Banerjee, Rajdeep Bhattacharya, Vikrant Bhateja, Pawan Kumar Singh, Aimé Lay-Ekuakille, Ram Sarkar.
Abstract
Biomedical images contain a large volume of sensor measurements that can reveal descriptors of the disease under investigation. Computer-based analysis of such measurements helps detect the disease and thereby helps medical professionals choose an adequate therapy quickly. In this paper, we propose a robust deep learning ensemble framework, the COVID Fuzzy Ensemble Network (COFE-Net), for COVID-19 screening from chest X-rays (CXR) and CT scans as part of Computer-Aided Detection (CADe) for medical practitioners. We use Transfer Learning for Convolutional Neural Networks (CNNs), which is widely adopted in recent literature, and propose an efficient ensemble network to combine them. Fuzzy logic is used to combine the decision scores generated by three state-of-the-art CNNs (Inception V3, Inception ResNet V2 and DenseNet 201) through the Choquet fuzzy integral. Experimental results support the efficacy of our approach over empirical ensembling: the fuzzy ensemble adjusts the classifier ensemble weights dynamically, based on the confidence scores for coalitions of inputs. This is the chief advantage of our strategy, since other methods do not adapt dynamically to the multiple generated measurements. Results on multiple datasets demonstrate the effectiveness of the proposed method. The source code of our proposed method is available at: https://github.com/theavicaster/covid-cade-ensemble.
Keywords: Biomedical measurement; COFE-Net; COVID-19 detection; CT Scan; Chest X-Ray; Classifier fusion; Deep learning; Ensemble; Fuzzy integral
Year: 2021 PMID: 34663998 PMCID: PMC8516129 DOI: 10.1016/j.measurement.2021.110289
Source DB: PubMed Journal: Measurement (Lond) ISSN: 0263-2241 Impact factor: 5.131
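The Choquet-integral fusion named in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a Sugeno λ-fuzzy measure built from one density per classifier, and the density values and the function names (`solve_lambda`, `choquet_integral`, `fuse`) are hypothetical.

```python
from typing import List, Sequence


def solve_lambda(densities: Sequence[float], iters: int = 200) -> float:
    """Solve prod(1 + lam * g_i) = 1 + lam for the non-trivial root lam > -1.

    Assumes at least two densities, each strictly between 0 and 1.
    """
    if len(densities) < 2:
        raise ValueError("need at least two classifiers")
    total = sum(densities)
    if abs(total - 1.0) < 1e-6:
        return 0.0  # densities are already additive

    def f(lam: float) -> float:
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    if total < 1.0:
        # Root is positive; f is negative just right of 0 and grows afterwards.
        lo, hi, sign_near_lo = 0.0, 1.0, -1.0
        while f(hi) < 0.0:
            hi *= 2.0
    else:
        # Root lies in (-1, 0); f is positive near -1 and negative just left of 0.
        lo, hi, sign_near_lo = -1.0, 0.0, 1.0

    for _ in range(iters):  # plain bisection
        mid = 0.5 * (lo + hi)
        if f(mid) * sign_near_lo > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def choquet_integral(scores: Sequence[float], densities: Sequence[float], lam: float) -> float:
    """Choquet integral of classifier scores with respect to the lambda-measure."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    g_prev, result = 0.0, 0.0
    for rank, i in enumerate(order):
        if rank == len(order) - 1:
            g_curr = 1.0  # the measure of the full classifier set is 1
        else:
            g_curr = densities[i] + g_prev + lam * densities[i] * g_prev
        result += scores[i] * (g_curr - g_prev)
        g_prev = g_curr
    return result


def fuse(prob_rows: Sequence[Sequence[float]], densities: Sequence[float]) -> List[float]:
    """Fuse per-classifier class probabilities (one row per classifier), class by class."""
    lam = solve_lambda(densities)
    n_classes = len(prob_rows[0])
    return [
        choquet_integral([row[c] for row in prob_rows], densities, lam)
        for c in range(n_classes)
    ]


# Hypothetical scores from three CNNs for a two-class problem.
fused = fuse([[0.9, 0.1], [0.6, 0.4], [0.7, 0.3]], densities=[0.3, 0.3, 0.3])
predicted_class = max(range(len(fused)), key=lambda c: fused[c])
```

Because the measure of each coalition depends on which classifiers currently give the highest scores, the effective weight of a classifier changes from sample to sample, which is the dynamic behaviour the abstract contrasts with fixed-weight ensembling.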
Fig. 1 Sample images of chest X-rays for all three classes in the COVID-X dataset.
Fig. 2 Schematic diagram of our proposed methodology, which consists of: (I) preprocessing input biomedical images to conform to the expected input of standard CNN architectures, (II) classification using three CNNs leveraging Transfer Learning, and (III) an ensemble of classifiers using the Choquet fuzzy integral to yield the prediction, available for medical practitioners.
Fig. 3 Proposed classification network following state-of-the-art CNN architectures.
Class-wise distribution of CXR samples in the COVID-X dataset.
| Phase | COVID-19 | Pneumonia | Normal | Total |
|---|---|---|---|---|
| Train | 468 | 5458 | 7966 | 13892 |
| Test | 100 | 594 | 885 | 1579 |
Comparison of base learners on the SARS-COV 2 CT Scan Dataset.
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | FPR (%) | AUC (%) | MCC (%) | McNemar’s Test |
|---|---|---|---|---|---|---|---|---|---|
| DenseNet 201 | 98.79 | 98.38 | 99.18 | 98.78 | 98.40 | 1.59 | 98.79 | 97.58 | – |
| Inception v3 | 97.18 | 97.02 | 97.28 | 97.15 | 97.07 | 2.92 | 97.18 | 94.36 | 0.0247 |
| Inception ResNet v2 | 97.58 | 97.30 | 97.83 | 97.56 | 97.34 | 2.65 | 97.58 | 95.16 | 0.0323 |
| ResNet 152 v2 | 96.24 | 96.20 | 96.20 | 96.20 | 96.27 | 3.72 | 96.24 | 92.48 | 0.0085 |
| EfficientNet B7 | 97.44 | 98.88 | 95.93 | 97.38 | 98.93 | 1.06 | 97.43 | 94.93 | 0.0441 |
| Xception | 97.18 | 98.06 | 96.20 | 97.12 | 98.13 | 1.86 | 97.17 | 94.37 | 0.0247 |
| VGG 19 | 97.71 | 97.82 | 97.56 | 97.69 | 97.87 | 2.12 | 97.71 | 95.43 | 0.05330 |
Confusion matrix for Inception V3 on SARS-COV 2 CT Scan Dataset.
| | Predicted | |
|---|---|---|
| True | 365 | 11 |
| | 10 | 359 |
Confusion matrix for Inception ResNet v2 on SARS-COV 2 CT Scan Dataset.
| | Predicted | |
|---|---|---|
| True | 366 | 10 |
| | 8 | 361 |
Confusion matrix for DenseNet 201 on SARS-COV 2 CT Scan Dataset.
| | Predicted | |
|---|---|---|
| True | 370 | 6 |
| | 3 | 366 |
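The metrics reported for the base learners can be recomputed from a 2×2 confusion matrix with the standard definitions. The sketch below uses the DenseNet 201 counts and assumes that the second row and column hold the positive class; that orientation is inferred from the reported recall and specificity, not stated in the source, and the function name `binary_metrics` is hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tn: int, fp: int, fn: int, tp: int) -> Dict[str, float]:
    """Return common binary classification metrics as percentages."""
    total = tn + fp + fn + tp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)            # sensitivity, true positive rate
    specificity = tn / (tn + fp)       # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    values = {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }
    return {name: 100.0 * value for name, value in values.items()}


# DenseNet 201 counts; row 1 = negative class, row 2 = positive class (assumed).
metrics = binary_metrics(tn=370, fp=6, fn=3, tp=366)
```

Under that assumption the results agree with the DenseNet 201 row of the base-learner table to within rounding, for example accuracy = 736/745 ≈ 98.79 %.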
McNemar’s test on the SARS COV-2 CT Scan Dataset compared to ensemble model.
| CNN Model | McNemar’s Test |
|---|---|
| Inception v3 | 0.0360 |
| Inception ResNet v2 | 0.0442 |
| DenseNet 201 | 0.0483 |
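McNemar's test compares two classifiers on the same test samples using only the cases where exactly one of them is correct. The sketch below is one common variant (chi-square approximation with continuity correction); the source does not say which variant was used, and the counts and function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def discordant_counts(y_true: Sequence[int], pred_a: Sequence[int],
                      pred_b: Sequence[int]) -> Tuple[int, int]:
    """Count samples where only model A is right (b) and only model B is right (c)."""
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    return b, c


def mcnemar_p_value(b: int, c: int) -> float:
    """Two-sided p-value from the continuity-corrected chi-square statistic (1 dof)."""
    n = b + c
    if n == 0:
        return 1.0  # the two models never disagree
    # max(..., 0) keeps the statistic at zero when b == c.
    chi2 = max(abs(b - c) - 1, 0) ** 2 / n
    # Survival function of chi-square with one degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical disagreement counts between a base CNN and the ensemble.
p_value = mcnemar_p_value(b=2, c=10)
significant = p_value < 0.05
```

A p-value below 0.05, as in every row of the table, is conventionally read as evidence that the two models make different errors.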
KL and JS divergences among CNN classifiers on SARS COV-2 CT Scan Dataset.
| Distribution (P) | Distribution (Q) | KL divergence | JS divergence |
|---|---|---|---|
| Inception V3 | DenseNet 201 | 0.3452 | 0.1020 |
| DenseNet 201 | Inception V3 | 0.2202 | |
| Inception ResNet v2 | DenseNet 201 | 0.3245 | 0.1262 |
| DenseNet 201 | Inception ResNet v2 | 0.2506 | |
| Inception V3 | Inception ResNet v2 | 0.3330 | 0.0983 |
| Inception ResNet v2 | Inception V3 | 0.3100 | |
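The two divergences in the table can be sketched for a pair of discrete probability vectors as below. KL divergence depends on the order of its arguments, which is why each classifier pair has two KL values, while JS divergence is symmetric and appears once per pair. How the paper aggregates the divergences over the test set is not stated here, so the example works on a single hypothetical pair of softmax outputs.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(P || Q) in nats; eps guards against division by zero."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical class-probability outputs of two CNNs for one image.
p = [0.9, 0.1]
q = [0.6, 0.4]
kl_pq = kl_divergence(p, q)
kl_qp = kl_divergence(q, p)
js = js_divergence(p, q)
```

Larger divergences between base learners suggest that they disagree in their confidence, which is the situation in which an ensemble has something to gain.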
Configuration for fuzzy ensemble on the 3-class classification problem on the COVID-X Dataset.
| CNN Model | Accuracy (%) | Fuzzy Measure |
|---|---|---|
| Inception v3 | 95.06 | 0.038 |
| Inception ResNet v2 | 94.62 | 0.015 |
| DenseNet 201 | 95.88 | 0.074 |
Configuration for fuzzy ensemble on the 2-class classification problem on the COVID-X Dataset.
| CNN Model | Accuracy (%) | Fuzzy Measure |
|---|---|---|
| Inception v3 | 99.36 | 0.030 |
| Inception ResNet v2 | 99.36 | 0.043 |
| DenseNet 201 | 99.36 | 0.026 |
Comparison with empirical ensemble methods on the COVID-X Dataset.
| Ensemble Method | Accuracy (%) 3-Class | Accuracy (%) 2-Class |
|---|---|---|
| Maximum | 94.68 | 99.36 |
| Multiplication | 95.22 | 99.36 |
| Average | 95.87 | 99.41 |
| Weighted Average | 96.20 | 99.46 |
| Choquet fuzzy integral (COFE-Net) | 96.39 | 99.49 |
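The four empirical ensemble rules in the table combine the class probabilities of the base classifiers with a fixed rule. A minimal sketch, assuming each classifier outputs one probability per class; the scores, weights and function names are hypothetical:

```python
import math
from typing import List, Optional, Sequence


def combine(prob_rows: Sequence[Sequence[float]], rule: str,
            weights: Optional[Sequence[float]] = None) -> List[float]:
    """Combine per-classifier probabilities (one row per classifier) class by class."""
    n_classes = len(prob_rows[0])
    columns = [[row[c] for row in prob_rows] for c in range(n_classes)]
    if rule == "maximum":
        return [max(col) for col in columns]
    if rule == "multiplication":
        return [math.prod(col) for col in columns]
    if rule == "average":
        return [sum(col) / len(col) for col in columns]
    if rule == "weighted_average":
        if weights is None:
            raise ValueError("weighted_average needs one weight per classifier")
        total = sum(weights)
        return [sum(w * s for w, s in zip(weights, col)) / total for col in columns]
    raise ValueError(f"unknown rule: {rule}")


def predict(fused: Sequence[float]) -> int:
    """Index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical outputs of three CNNs for a two-class problem.
rows = [[0.6, 0.4], [0.2, 0.8], [0.7, 0.3]]
by_max = predict(combine(rows, "maximum"))
by_product = predict(combine(rows, "multiplication"))
by_weighted = predict(combine(rows, "weighted_average", weights=[0.5, 0.1, 0.4]))
```

In all four rules the contribution of each classifier is fixed in advance, whereas the fuzzy-integral row in the table comes from a combination whose effective weights depend on the scores of each sample.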
Confusion matrix for 3-class classification on COVID-X.
| True \ Predicted | COVID-19 | Normal | Pneumonia |
|---|---|---|---|
| COVID-19 | 95 | 5 | 0 |
| Normal | 0 | 870 | 15 |
| Pneumonia | 2 | 35 | 557 |
Confusion Matrix for 2-class classification on COVID-X dataset.
| True \ Predicted | COVID-19 | Non COVID-19 |
|---|---|---|
| COVID-19 | 93 | 7 |
| Non COVID-19 | 1 | 1478 |
Performance Metrics for 3-class classification on COVIDx dataset.
| Metric (%) | COVID-19 | Normal | Pneumonia | Overall |
|---|---|---|---|---|
| Accuracy | 99.56 | 96.51 | 96.70 | 96.39 |
| Precision | 97.94 | 95.60 | 97.38 | 96.97 |
| Recall | 95.00 | 98.30 | 93.77 | 95.69 |
| F1-Score | 96.44 | 96.93 | 95.54 | 96.30 |
| Specificity | 99.86 | 94.24 | 98.47 | 97.52 |
| FPR | 0.135 | 5.764 | 1.523 | 2.474 |
| AUC | 97.43 | 96.27 | 96.12 | – |
| MCC | 96.22 | 92.95 | 92.97 | 93.31 |
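The per-class columns above follow from the 3-class confusion matrix when each class is treated one-vs-rest. The sketch below assumes the row and column order COVID-19, Normal, Pneumonia, which is inferred from the row sums matching the test split (100, 885, 594); the function name is hypothetical.

```python
from typing import Dict, List, Sequence


def per_class_metrics(cm: Sequence[Sequence[int]],
                      labels: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """One-vs-rest metrics (in percent) for a square confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    total = sum(sum(row) for row in cm)
    out: Dict[str, Dict[str, float]] = {}
    for k, label in enumerate(labels):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp
        fp = sum(row[k] for row in cm) - tp
        tn = total - tp - fn - fp
        out[label] = {
            "accuracy": 100.0 * (tp + tn) / total,
            "precision": 100.0 * tp / (tp + fp),
            "recall": 100.0 * tp / (tp + fn),
            "specificity": 100.0 * tn / (tn + fp),
            "fpr": 100.0 * fp / (fp + tn),
        }
    return out


# Counts from the 3-class COVID-X confusion matrix (class order assumed).
cm: List[List[int]] = [[95, 5, 0], [0, 870, 15], [2, 35, 557]]
labels = ["COVID-19", "Normal", "Pneumonia"]
metrics = per_class_metrics(cm, labels)
overall_accuracy = 100.0 * sum(cm[k][k] for k in range(3)) / sum(sum(r) for r in cm)
```

With that class order, the computed precision, recall, specificity and per-class accuracy agree with the table to within rounding, and the overall accuracy is 1522/1579 ≈ 96.39 %.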
Performance Metrics for 2-Class classification on COVIDx dataset.
| Metric (%) | COVID-19 | Non COVID-19 | Overall |
|---|---|---|---|
| Accuracy | 99.49 | 99.49 | 99.49 |
| Precision | 98.94 | 99.52 | 99.23 |
| Recall | 93.00 | 99.93 | 96.46 |
| F1-Score | 95.88 | 99.73 | 97.80 |
| Specificity | 93.00 | 99.93 | 96.46 |
| FPR | 0.068 | 7.00 | 3.53 |
| AUC | 96.46 | 96.46 | – |
| MCC | 95.66 | 95.66 | 95.66 |
Confusion matrix for classification on COVID-19 Radiography Database.
| True \ Predicted | COVID-19 | Normal | Viral Pneumonia |
|---|---|---|---|
| COVID-19 | 120 | 0 | 0 |
| Normal | 0 | 133 | 1 |
| Viral Pneumonia | 0 | 1 | 133 |
Performance Metrics for Classification on COVID-19 Radiography Database.
| Metric (%) | COVID-19 | Normal | Viral Pneumonia | Overall |
|---|---|---|---|---|
| Accuracy | 100.00 | 99.46 | 99.46 | 99.49 |
| Precision | 100.00 | 99.25 | 99.25 | 99.50 |
| Recall | 100.00 | 99.25 | 99.25 | 99.50 |
| F1-Score | 100.00 | 99.25 | 99.25 | 99.50 |
| Specificity | 100.00 | 99.61 | 99.61 | 99.74 |
| FPR | 0.00 | 0.394 | 0.394 | 0.262 |
| AUC | 100.00 | 99.43 | 99.43 | – |
| MCC | 100.00 | 98.86 | 98.86 | 99.22 |
Confusion matrix for classification on SARS-COV 2 CT Scan Dataset.
| | Predicted | |
|---|---|---|
| True | 370 | 6 |
| | 2 | 367 |
Performance metrics upon SARS-COV 2 CT Scan Dataset.
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | FPR (%) | AUC (%) | MCC (%) |
|---|---|---|---|---|---|---|---|---|
| COFE-Net | 98.93 | 98.40 | 99.46 | 98.92 | 98.40 | 1.59 | 98.93 | 97.86 |
| Inception V3 | 97.18 | 97.02 | 97.28 | 97.15 | 97.07 | 2.92 | 97.18 | 94.36 |
| Inception ResNet V2 | 97.58 | 97.30 | 97.83 | 97.56 | 97.34 | 2.65 | 97.58 | 95.16 |
| DenseNet 201 | 98.79 | 98.38 | 99.18 | 98.78 | 98.40 | 1.59 | 98.79 | 97.58 |
Fig. 4 ROC curve for SARS-COV 2 CT Scan Dataset.
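The area under a ROC curve can be computed without plotting, as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one. When the scores are hard 0/1 predictions this reduces to the mean of recall and specificity, which is the pattern the AUC column in the tables follows (for COFE-Net, (99.46 + 98.40) / 2 = 98.93). Whether the authors computed AUC this way is an assumption; the sketch below only illustrates the identity, using the COFE-Net counts with the second row assumed to be the positive class.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: P(score_pos > score_neg) + 0.5 * P(tie)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hard predictions rebuilt from the COFE-Net confusion matrix (orientation assumed).
tn, fp, fn, tp = 370, 6, 2, 367
y_true = [0] * (tn + fp) + [1] * (fn + tp)
y_pred = [0] * tn + [1] * fp + [0] * fn + [1] * tp

auc = roc_auc(y_true, y_pred)
balanced_accuracy = 0.5 * (tp / (tp + fn) + tn / (tn + fp))
```

Passing continuous class probabilities instead of hard labels to `roc_auc` gives the usual threshold-free AUC.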
Confusion matrix for classification on the Montgomery Dataset.
| True \ Predicted | Normal | Tuberculosis |
|---|---|---|
| Normal | 16 | 0 |
| Tuberculosis | 1 | 11 |
Performance Metrics on the Montgomery Dataset.
| Metric (%) | Normal | Tuberculosis | Overall |
|---|---|---|---|
| Accuracy | 96.43 | 96.43 | 96.43 |
| Precision | 94.11 | 100.00 | 97.06 |
| Recall | 100.00 | 91.67 | 95.83 |
| F1-Score | 96.97 | 95.65 | 96.31 |
| Specificity | 91.67 | 100.00 | 95.83 |
| FPR | 8.333 | 0.000 | 4.167 |
| AUC | 95.83 | 95.83 | – |
| MCC | 92.88 | 92.88 | 92.88 |
Results of the ablation study upon SARS-COV 2 CT Scan Dataset.
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | FPR (%) | AUC (%) | MCC (%) |
|---|---|---|---|---|---|---|---|---|
| Inception V3 | 97.18 | 97.02 | 97.28 | 97.15 | 97.07 | 2.92 | 97.18 | 94.36 |
| Inception ResNet V2 | 97.58 | 97.30 | 97.83 | 97.56 | 97.34 | 2.65 | 97.58 | 95.16 |
| DenseNet 201 | 98.79 | 98.38 | 99.18 | 98.78 | 98.40 | 1.59 | 98.79 | 97.58 |
| Inception V3 and Inception ResNet V2 | 97.71 | 96.56 | 98.91 | 97.72 | 96.54 | 3.457 | 97.72 | 95.46 |
| Inception V3 and DenseNet 201 | 98.79 | 98.38 | 99.18 | 98.78 | 98.40 | 1.595 | 98.79 | 97.58 |
| Inception ResNet V2 and DenseNet 201 | 98.79 | 98.64 | 98.91 | 98.78 | 98.67 | 1.329 | 98.79 | 97.58 |
| Ensemble of all three | 98.93 | 98.40 | 99.46 | 98.92 | 98.40 | 1.59 | 98.93 | 97.86 |
Results of the ablation study for three-class classification upon COVID-X Dataset.
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | FPR (%) | MCC (%) |
|---|---|---|---|---|---|---|---|
| Inception V3 | 95.06 | 96.77 | 94.01 | 95.30 | 95.50 | 3.491 | 90.86 |
| Inception ResNet V2 | 94.62 | 93.07 | 91.81 | 92.42 | 96.59 | 3.41 | 90.05 |
| DenseNet 201 | 95.88 | 96.03 | 95.02 | 95.50 | 97.22 | 2.77 | 92.38 |
| Inception V3 and Inception ResNet V2 | 95.95 | 94.88 | 94.42 | 94.65 | 97.44 | 2.557 | 92.50 |
| Inception V3 and DenseNet 201 | 96.07 | 96.76 | 95.15 | 95.92 | 97.30 | 2.700 | 92.73 |
| Inception ResNet V2 and DenseNet 201 | 96.20 | 95.28 | 94.36 | 94.81 | 97.60 | 2.393 | 92.96 |
| Ensemble of all three | 96.39 | 96.97 | 95.69 | 96.30 | 97.52 | 2.474 | 93.31 |
Results of the ablation study for two-class classification upon COVID-X Dataset.
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | FPR (%) | AUC (%) | MCC (%) |
|---|---|---|---|---|---|---|---|---|
| Inception V3 | 99.36 | 98.67 | 95.93 | 97.25 | 95.93 | 4.06 | 95.93 | 94.56 |
| Inception ResNet V2 | 99.36 | 99.15 | 95.46 | 97.22 | 95.46 | 4.53 | 95.46 | 94.56 |
| DenseNet 201 | 99.36 | 98.67 | 95.93 | 97.25 | 95.93 | 4.06 | 95.93 | 94.56 |
| Inception V3 and Inception ResNet V2 | 99.36 | 99.15 | 95.46 | 97.22 | 95.46 | 4.534 | 95.46 | 94.54 |
| Inception V3 and DenseNet 201 | 99.43 | 98.71 | 96.43 | 97.54 | 96.43 | 3.56 | 96.43 | 95.11 |
| Inception ResNet V2 and DenseNet 201 | 99.43 | 99.19 | 95.96 | 97.51 | 95.96 | 4.03 | 95.96 | 95.10 |
| Ensemble of all three | 99.49 | 99.23 | 96.46 | 97.80 | 96.46 | 3.53 | 96.46 | 95.66 |
K-Fold performance metrics upon SARS-COV 2 CT Scan Dataset.
| Fold Number | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) | FPR (%) | AUC (%) | MCC (%) |
|---|---|---|---|---|---|---|---|---|
| 1 | 99.80 | 100.00 | 99.59 | 99.79 | 100.00 | 0.000 | 99.79 | 99.59 |
| 2 | 99.60 | 100.00 | 99.19 | 99.59 | 100.00 | 0.000 | 99.59 | 99.20 |
| 3 | 99.79 | 99.59 | 100.00 | 99.79 | 99.60 | 0.40 | 99.80 | 99.59 |
| 4 | 99.59 | 99.19 | 100.00 | 99.59 | 99.20 | 0.79 | 99.60 | 99.35 |
| 5 | 99.60 | 98.38 | 99.60 | 99.60 | 99.60 | 0.40 | 99.60 | 99.49 |
| Average | 99.68 | 98.38 | 99.68 | 99.68 | 99.68 | 0.32 | 99.68 | 99.44 |
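The fold-wise evaluation above can be sketched as a plain K-fold split followed by averaging the per-fold scores. The source does not say whether the folds were shuffled or stratified, so the shuffling, the seed and the function names below are assumptions; the accuracies are the five values from the table.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 5,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # near-equal fold sizes
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


splits = list(k_fold_indices(n_samples=10, k=5))

# Per-fold accuracies (%) as reported in the table.
fold_accuracy = [99.80, 99.60, 99.79, 99.59, 99.60]
average_accuracy = mean(fold_accuracy)
```

Each sample appears in exactly one test fold, so the averaged score uses every sample for testing once.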
Fig. 5 ROC curve for first fold on SARS-COV 2 CT Scan Dataset.
Confusion matrix for first fold on SARS COV-2 CT Scan Dataset.
| | Predicted | |
|---|---|---|
| True | 250 | 0 |
| | 1 | 244 |
Comparison with state-of-the-art methods for COVID-19 CADe on the COVID-X dataset [18].
| Method | Data Distribution | Accuracy (%) |
|---|---|---|
| COVID-Net | 358 COVID-19, 5538 Pneumonia, 8066 Normal | 93.3 |
| COVID-ResNet | 68 COVID-19, 1591 Pneumonia, 1203 Normal | 96.23 |
| COVID-CAPS | Not specified | 98.3 |
| COVIDiagnosis-Net | 76 COVID-19, 4290 Pneumonia, 1583 Normal | 98.26 |
| COFE-Net (proposed), 3-class | | 96.39 |
| COFE-Net (proposed), 2-class | | 99.49 |
Comparison with state-of-the-art methods for COVID-19 CADe on multi-class classification.
| Method | Data Distribution | Accuracy (%) |
|---|---|---|
| Transfer Learning Dataset 1 | 224 COVID-19, 700 Pneumonia, 504 Normal | 93.48 |
| Transfer Learning Dataset 2 | 224 COVID-19, 714 Pneumonia, 504 Normal | 94.72 |
| Majority Voting ML | 782 COVID-19, 782 Pneumonia, 782 Normal | 93.41 |
| DenseNet201 | 423 COVID-19, 1485 Pneumonia, 1579 Normal | 97.94 |
| Cascaded CNNs | 69 COVID-19, 79 Bact. Pneumonia, 79 Viral Pneumonia, 79 Normal | 99.9 |
| CoroNet Dataset 1 | 284 COVID-19, 657 Pneumonia, 310 Normal | 95.0 |
| CoroNet Dataset 2 | 157 COVID-19, 500 Pneumonia, 500 Normal | 90.21 |
| Stacked VGG Ensemble | 219 COVID-19, 1345 Pneumonia, 1341 Normal | 97.4 |
| Pruned Weighted Average | 313 COVID-19, 8792 Pneumonia, 7595 Normal | 99.01 |
| COFE-Net (proposed) | | 96.39 |
Comparison with state-of-the-art methods for COVID-19 CADe on binary-class classification.
| Method | Data Distribution | Accuracy (%) |
|---|---|---|
| Transfer Learning Dataset 1 | 224 COVID-19, 1204 nonCOVID-19 | 98.75 |
| Transfer Learning Dataset 2 | 224 COVID-19, 1214 nonCOVID-19 | 96.78 |
| DenseNet201 | 423 COVID-19, 3064 nonCOVID-19 | 99.70 |
| CoroNet Dataset 1 | 284 COVID-19, 967 nonCOVID-19 | 99.0 |
| DarkCovidNet | 127 COVID-19, 500 nonCOVID-19 | 98.08 |
| Majority Voting ML | 782 COVID-19, 1564 nonCOVID-19 | 98.06 |
| Stacked VGG Ensemble | 219 COVID-19, 2686 nonCOVID-19 | 99.48 |
| Class Decomposition | 116 COVID-19, 80 nonCOVID-19 | 97.35 |
| COFE-Net (proposed) | | 99.49 |
Comparison with state-of-the-art methods on the COVID-19 Radiography Database.
| Method | Accuracy (%) |
|---|---|
| VGG 19 | 93.00 |
| Transfer Learning | 98.29 |
| AlexNet | |
| COFE-Net (proposed) | 99.49 |
Comparison with state-of-the-art methods on the SARS-COV 2 CT Scan Dataset.
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) | Specificity (%) |
|---|---|---|---|---|---|
| xDNN | 88.60 | 89.70 | 88.60 | 89.15 | – |
| Transfer Learning | 94.04 | 95.00 | 94.00 | 94.50 | 95.86 |
| Bi-stage FS | 95.32 | 95.30 | 95.30 | 95.30 | – |
| DenseNet 201 | 96.25 | 96.29 | 96.29 | 96.29 | 96.21 |
| KarNet | 97.00 | 95.00 | 98.00 | 97.00 | 95.00 |
| Gabor Ensemble | 97.40 | 99.10 | 95.50 | 97.30 | – |
| COFE-Net (proposed) | 98.93 | 98.40 | 99.46 | 98.92 | 98.40 |
Comparison with state-of-the-art methods on the Montgomery Dataset.
| Method | Accuracy (%) |
|---|---|
| FRCNN | 92.60 |
| HDHFS | 92.70 |
| HCDEL | 93.47 |
| VoPreCNNFT | 97.50 |
| COFE-Net (proposed) | 96.43 |
Fig. 6 Misclassified CT scan sample with true class non COVID-19.
Fig. 7 Misclassified CT scan sample with true class COVID-19.
Fig. 8 Misclassified CXR sample with true class Viral Pneumonia.