Ebenezer Jangam1,2, Chandra Sekhara Rao Annavarapu2, Aaron Antonio Dias Barreto3.
Abstract
To accurately diagnose multiple lung diseases from chest X-rays, the critical requirement is to identify lung diseases with high sensitivity and specificity. This study proposed a novel multi-class classification framework that minimises either false positives or false negatives, which is useful in computer-aided diagnosis or computer-aided detection, respectively. To minimise false positives or false negatives, we generated the respective stacked ensemble from pre-trained models and fully connected layers using a selection metric and a systematic method. The diversity of the base classifiers was based on the diverse sets of false positives or false negatives they generated. The proposed multi-class framework was evaluated on two chest X-ray datasets, and its performance was compared with existing models and the base classifiers. Moreover, we used LIME (Local Interpretable Model-agnostic Explanations) to locate the regions focused on by the multi-class classification framework.
Keywords: COVID-19; Deep learning; Multi-class classification; Stacked ensemble; Transfer learning
Year: 2022 PMID: 36157353 PMCID: PMC9490695 DOI: 10.1007/s11042-022-13710-5
Source DB: PubMed Journal: Multimed Tools Appl ISSN: 1380-7501 Impact factor: 2.577
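The stacked-ensemble construction summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random-forest base learners below stand in for the diverse pre-trained CNNs with fully connected heads, the synthetic data stands in for extracted chest X-ray features, and the meta-learner is a plain logistic regression over the concatenated class probabilities of the bases.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy 3-class problem standing in for (Normal, Pneumonia, COVID-19) features.
X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Base classifiers: stand-ins for diverse pre-trained models + FC layers.
bases = [RandomForestClassifier(n_estimators=50, random_state=s).fit(X_tr, y_tr)
         for s in (0, 1, 2)]

def stack_features(X):
    # Stacking: concatenate each base model's class-probability outputs.
    return np.hstack([b.predict_proba(X) for b in bases])

# Meta-learner trained on the stacked probabilities.
meta = LogisticRegression(max_iter=1000).fit(stack_features(X_tr), y_tr)
acc = meta.score(stack_features(X_te), y_te)
print(round(acc, 3))
```

In the paper, base-classifier diversity is driven by the false positives or false negatives each base produces; here the random seeds merely stand in for that diversity.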
Multi-class classification models
| Study | Data | Method | Contribution |
|---|---|---|---|
| Ibrahim et al. [ | chest X-rays combined from different sources | AlexNet | Two-way, three-way and four-way classification |
| Nishio et al. [ | 1248 images taken from two public chest X-ray datasets | VGG-16 | CADx system for evaluation of COVID-19 pneumonia, non-COVID-19 pneumonia and healthy images |
| Asif et al. [ | Mixed dataset of CXR and CT scan images | Deep CNN based InceptionV3 | Three class classification of COVID-19 pneumonia, non-COVID-19 pneumonia and healthy images |
| Khan et al. [ | Mixed dataset of CXR images | Modification of Xception architecture and transfer learning | Three-class classification of COVID-19 CXR images |
| Chowdhury et al. [ | Mixed dataset of CXR images | Image augmentation, transfer learning and multiple pre-trained models | Comparison of performance of different pre-trained models |
| Shelke et al. [ | Indian dataset of CXR images | Pneumonia detection using DenseNet-161, COVID-19 detection using ResNet-18 and VGG-16 | Classification of COVID-19 pneumonia from normal, pneumonia and tuberculosis. |
| Bassi and Attux [ | Indian dataset of CXR images | Pneumonia detection using DenseNet-161, COVID-19 detection using ResNet-18 and VGG-16 | Classification of COVID-19 pneumonia from normal, pneumonia and tuberculosis. |
| Karakanis et al. [ | CXR images and synthetic images | GAN for synthetic images; lightweight ResNet8 for COVID-19 detection; Grad-CAM for heatmap generation | Three-class classification of COVID-19 pneumonia from normal and pneumonia images |
| Ibrahim et al. [ | 33,676 CXR and CT images from RSNA and SIRM | ResNet152V2 and VGG19 | For classification of COVID-19 pneumonia from lung cancer and pneumonia, VGG19 provided better accuracy. |
| Ibrahim et al. [ | CXR images from two different datasets | Evaluation using ResNet, MobileNet, DenseNet and InceptionV3 | Comparison of accuracy of five pre-trained models. DenseNet121 provided the best accuracy. |
| Karar et al. [ | CXR images | VGG16, ResNet50V2 and DenseNet169 | VGG16, ResNet50V2 and DenseNet169 provided better performance for COVID-19, viral pneumonia and bacterial pneumonia respectively. |
| Zebin et al. [ | CXR images | VGG16, ResNet50 and EfficientNetB0 and gradient class activation mapping for progress monitoring | In addition to classification, disease monitoring of COVID-19 was performed. |
| Gupta et al. [ | COVID19 Radiograph dataset and Chest X-ray dataset | Integrated stacking of pre-trained models | Classification using integrated stacking of pre-trained models. |
| Ismael and Şengür [ | Chest X-ray images collected from multiple datasets | ResNet and VGG for feature extraction and SVM for classification | Hybrid model for classification |
| Rahimzadeh and Attar [ | 11,302 CXR images collected from two public datasets | Concatenation of Xception and ResNet50V2 | Three-class classification |
| Mahmud et al. [ | Balanced dataset of CXR images collected from two datasets | DNN based on depthwise dilated convolution. Features extracted from different resolutions of X-rays are jointly converged by a stacking algorithm | Classification using CovXNet with feature extraction, stacking, and gradient-based activation mapping |
| Hussain et al. [ | Assembled CXR images and CT scans | Modification of existing architecture | Classification using CoroDet |
| Abbas et al. [ | CXR images from JSRT and other public dataset | Decompose using AlexNet; transfer and compose using AlexNet, VGG19, ResNet, GoogLeNet and SqueezeNet | Proposed Decompose, Transfer and Compose method |
Fig. 1 Overview of generation of multi-class classification framework
Fig. 2 Disease screening model architecture
Fig. 3 Disease diagnosis model architecture
Datasets
| Name | Dataset1 [ | Dataset2 [ |
|---|---|---|
| Total Images | 15,153 | 6,118 |
| COVID-19 Positive | 3,616 | 262 |
| Viral Pneumonia | 1,345 | 1,583 |
| Normal Images | 10,192 | 4,273 |
| Images used for training | 13,953 (9792 N, 945 P, 3216 C) | 5,600 (4100 N, 1400 P, 100 C) |
| Images used for Validation | 600 (200 of each class) | 218 (73 N, 83 P, 62 C) |
| Images used for Testing | 600 (200 of each class) | 300 (100 of each class) |
N : Normal, P : Pneumonia, C : COVID-19
Comparison of multi-class classification framework’s precision with base classifiers on COVID-19 Radiography Database [15]
| Model | Precision-Normal | Precision-Pneumonia | Precision-COVID-19 |
|---|---|---|---|
| 0.9091 | 0.99 | ||
| 0.9346 | 0.9947 | 0.9848 | |
| 0.9132 | 0.9898 | ||
| 0.9171 | 0.9747 | ||
| 0.8924 | 0.9945 | 0.9897 | |
| 0.9337 | 0.9648 | 0.9366 | |
| 0.9458 | 0.9945 | 0.9213 | |
| 0.8959 | 0.9522 | ||
| 0.9471 | 0.9896 | 0.96 | |
| 0.9327 | 0.9655 | ||
| 0.9561 | 0.9701 | 0.9794 | |
| 0.9843 | 0.9707 | 0.9608 | |
| 0.9346 | |||
| 0.9947 | 0.9476 |
Bold entries denote highest values for a specific evaluation metric (Column)
Comparison of multi-class classification framework’s recall with base classifiers on COVID-19 Radiography Database [15]
| Model | Recall-Normal | Recall-Pneumonia | Recall-COVID-19 |
|---|---|---|---|
| 0.9 | 0.99 | ||
| 0.94 | 0.97 | ||
| 0.92 | 0.975 | ||
| 0.995 | 0.925 | 0.965 | |
| 0.995 | 0.905 | 0.965 | |
| 0.915 | 0.96 | 0.96 | |
| 0.96 | 0.9 | ||
| 0.99 | 0.85 | ||
| 0.985 | 0.95 | 0.96 | |
| 0.97 | 0.945 | 0.98 | |
| 0.98 | 0.975 | 0.95 | |
| 0.94 | 0.98 | ||
| 0.98 | 0.95 | ||
| 0.93 |
Bold entries denote highest values for a specific evaluation metric (Column)
Comparison of proposed multi-class classification framework with base classifiers on COVID-19 Radiography Database [15]
| Model | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0.9722 | 0.97 | 0.9633 | 0.9711 | |
| 0.9747 | 0.97 | 0.97 | 0.9723 | |
| 0.9732 | 0.9675 | 0.965 | 0.9704 | |
| 0.9666 | 0.9625 | 0.9617 | 0.9646 | |
| 0.9666 | 0.9575 | 0.955 | 0.962 | |
| 0.9429 | 0.9488 | 0.945 | 0.9458 | |
| 0.9457 | 0.9625 | 0.9517 | 0.954 | |
| 0.9501 | 0.9575 | 0.945 | 0.9538 | |
| 0.9642 | 0.9638 | 0.965 | 0.964 | |
| 0.9659 | 0.9688 | 0.965 | 0.9673 | |
| 0.9713 | 0.9638 | 0.9683 | 0.9675 | |
| 0.9691 | 0.9738 | 0.9717 | 0.9714 | |
| 0.97 | ||||
| 0.9688 | 0.975 | 0.9744 |
Bold entries denote highest values for a specific evaluation metric (Column)
Comparison of multi-class classification framework’s specificity with base classifiers on COVID-19 Radiography Database [15]
| Model | Specificity-Normal | Specificity-Pneumonia | Specificity-COVID-19 |
|---|---|---|---|
| 0.9497 | 0.9948 | ||
| 0.9646 | 0.9975 | 0.9923 | |
| 0.9523 | 0.9948 | ||
| 0.9545 | 0.9871 | ||
| 0.9397 | 0.9975 | 0.9948 | |
| 0.9673 | 0.9817 | 0.9665 | |
| 0.9718 | 0.9974 | 0.9563 | |
| 0.9413 | 0.9735 | ||
| 0.9720 | 0.9949 | 0.9797 | |
| 0.9649 | 0.9821 | ||
| 0.9772 | 0.9847 | 0.9899 | |
| 0.9846 | 0.9797 | ||
| 0.965 | |||
| 0.9975 | 0.9723 |
Bold entries denote highest values for a specific evaluation metric (Column)
Confusion Matrix of the Disease Diagnosis Model on the COVID-19 Radiography Database [15]
| Predicted Labels | |||||
|---|---|---|---|---|---|
| Normal | Pneumonia | COVID 19 | Total | ||
| True Labels | Normal | 200 | 0 | 0 | 200 |
| Pneumonia | 4 | 196 | 0 | 200 | |
| COVID 19 | 10 | 0 | 190 | 200 | |
| Total | 214 | 196 | 190 | 600 | |
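The per-class metrics reported in the tables above follow directly from a confusion matrix such as this one. A minimal numpy sketch of the computation, using the diagnosis-model matrix values from the table (the helper name is ours):

```python
import numpy as np

# Confusion matrix of the disease diagnosis model on the COVID-19
# Radiography Database (rows = true, cols = predicted):
# classes in order Normal, Pneumonia, COVID-19.
cm = np.array([[200,   0,   0],
               [  4, 196,   0],
               [ 10,   0, 190]])

def per_class_metrics(cm):
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class k but wrong
    fn = cm.sum(axis=1) - tp          # class k instances missed
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)           # a.k.a. sensitivity
    specificity = tn / (tn + fp)
    accuracy = tp.sum() / cm.sum()
    return precision, recall, specificity, accuracy

prec, rec, spec, acc = per_class_metrics(cm)
print(prec, rec, spec, acc)
```

For example, precision for Normal is 200/214 ≈ 0.9346 (14 non-normal images were predicted Normal), while precision for COVID-19 is 190/190 = 1.0, reflecting the diagnosis model's emphasis on avoiding false positives.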
Confusion Matrix of the Disease Screening Model on the COVID-19 Radiography Database [15]
| Predicted Labels | |||||
|---|---|---|---|---|---|
| Normal | Pneumonia | COVID 19 | Total | ||
| True Labels | Normal | 200 | 0 | 0 | 200 |
| Pneumonia | 3 | 186 | 11 | 200 | |
| COVID 19 | 0 | 1 | 199 | 200 | |
| Total | 203 | 187 | 210 | 600 | |
Comparison of multi-class classification framework’s precision with base classifiers on Chest Xray Images PNEUMONIA and Covid-19 dataset [11]
| Model | Precision-Normal | Precision-Pneumonia | Precision-COVID-19 |
|---|---|---|---|
| 0.9151 | 0.9515 | ||
| 0.949 | 0.8205 | 0.9882 | |
| 0.9417 | 0.9612 | ||
| 0.9375 | 0.8919 | ||
| 0.9381 | 0.9074 | ||
| 0.9684 | 0.9252 | ||
| 0.9327 | 0.97 | 0.9896 | |
| 0.96 | 0.9252 | ||
| 0.9886 | 0.8772 | ||
| 0.96 | 0.9245 | ||
| 0.9588 | 0.9159 | ||
| 0.9681 | 0.9083 | ||
| 0.9794 | 0.934 | ||
| 0.9429 |
Bold entries denote highest values for a specific evaluation metric (Column)
Comparison of the proposed multi-class classification framework’s recall with base classifiers on Chest Xray Images PNEUMONIA and Covid-19 dataset [11]
| Model | Recall-Normal | Recall-Pneumonia | Recall-COVID-19 |
|---|---|---|---|
| 0.97 | 0.98 | 0.91 | |
| 0.93 | 0.96 | 0.84 | |
| 0.97 | 0.99 | 0.94 | |
| 0.9 | 0.99 | 0.93 | |
| 0.91 | 0.98 | 0.95 | |
| 0.92 | 0.99 | 0.98 | |
| 0.97 | 0.97 | 0.95 | |
| 0.96 | 0.99 | 0.93 | |
| 0.87 | 0.98 | ||
| 0.96 | 0.94 | ||
| 0.9567 | 0.96 | ||
| 0.91 | 0.99 | 0.97 | |
| 0.95 | 0.99 | 0.97 | |
| 0.96 | 0.98 |
Bold entries denote highest values for a specific evaluation metric (Column)
Comparison of proposed multi-class classification framework with base classifiers using chest X-ray images pneumonia and COVID-19 [11]
| Model | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0.9666 | 0.9425 | 0.9533 | 0.9544 | |
| 0.9365 | 0.8925 | 0.91 | 0.914 | |
| 0.9757 | 0.96 | 0.9667 | 0.9678 | |
| 0.9573 | 0.9375 | 0.94 | 0.9473 | |
| 0.9614 | 0.9475 | 0.9467 | 0.9544 | |
| 0.9734 | 0.9675 | 0.9633 | 0.9704 | |
| 0.9705 | 0.96 | 0.9633 | 0.9652 | |
| 0.9713 | 0.9525 | 0.96 | 0.9618 | |
| 0.9665 | 0.9575 | 0.95 | 0.962 | |
| 0.9711 | 0.955 | 0.96 | 0.963 | |
| 0.9687 | 0.9575 | 0.9567 | 0.963 | |
| 0.9691 | 0.96 | 0.9567 | 0.9645 | |
| 0.97 | 0.97 | 0.9742 | ||
| 0.9689 |
Bold entries denote highest values for a specific evaluation metric (Column)
Comparison of multi-class classification framework’s specificity with base classifiers on Chest Xray Images PNEUMONIA and Covid-19 dataset [11]
| Model | Specificity-Normal | Specificity-Pneumonia | Specificity-COVID-19 |
|---|---|---|---|
| 0.9545 | 0.9741 | ||
| 0.973 | 0.8939 | 0.9947 | |
| 0.9698 | 0.9795 | ||
| 0.9697 | 0.9385 | ||
| 0.9698 | 0.949 | ||
| 0.985 | 0.9596 | ||
| 0.9648 | 0.9846 | 0.9949 | |
| 0.9796 | 0.9594 | ||
| 0.9296 | |||
| 0.9796 | 0.9596 | ||
| 0.9798 | 0.9545 | ||
| 0.9849 | 0.9495 | ||
| 0.9899 | 0.9648 | ||
| 0.97 |
Bold entries denote highest values for a specific evaluation metric (Column)
Confusion Matrix of the Disease Diagnosis Model on the Chest Xray Images PNEUMONIA and Covid-19 dataset [11]
| Predicted Labels | |||||
|---|---|---|---|---|---|
| Normal | Pneumonia | COVID 19 | Total | ||
| True Labels | Normal | 95 | 5 | 0 | 100 |
| Pneumonia | 1 | 99 | 0 | 100 | |
| COVID 19 | 1 | 2 | 97 | 100 | |
| Total | 97 | 106 | 97 | 300 | |
Confusion Matrix of the Disease Screening Model on the Chest Xray Images PNEUMONIA and Covid-19 dataset [11]
| Predicted Labels | |||||
|---|---|---|---|---|---|
| Normal | Pneumonia | COVID 19 | Total | ||
| True Labels | Normal | 96 | 0 | 4 | 100 |
| Pneumonia | 0 | 98 | 2 | 100 | |
| COVID 19 | 1 | 0 | 99 | 100 | |
| Total | 97 | 98 | 105 | 300 | |
Precision of the proposed model on COVID-19 Radiography Database [15] under varied thresholds
| Threshold | Precision-Normal | Precision-Pneumonia | Precision-COVID-19 |
|---|---|---|---|
| 0.1 | 0.9041 | 0.9944 | 0.9754 |
| 0.2 | 0.9005 | 0.9944 | 0.985 |
| 0.3 | |||
| 0.4 | 0.9302 | 0.9948 | |
| 0.5 | 0.9259 | 0.9948 | |
| 0.6 | 0.9091 | 0.9947 | |
| 0.7 | 0.8969 | 0.9946 | |
| 0.8 | 0.8734 | 0.9944 | |
| 0.9 | 0.8621 | 0.9845 | 0.9943 |
Bold entries denote highest values for a specific evaluation metric (Column)
Recall of the proposed model on COVID-19 Radiography Database [15] under varied thresholds
| Threshold | Recall-Normal | Recall-Pneumonia | Recall-COVID-19 |
|---|---|---|---|
| 0.1 | 0.99 | 0.885 | |
| 0.2 | 0.995 | 0.89 | 0.985 |
| 0.3 | 0.98 | ||
| 0.4 | 0.965 | ||
| 0.5 | 0.96 | ||
| 0.6 | 0.94 | ||
| 0.7 | 0.925 | ||
| 0.8 | 0.895 | ||
| 0.9 | 0.87 |
Bold entries denote highest values for a specific evaluation metric (Column)
Performance of the proposed model on COVID-19 Radiography Database [15] under varied thresholds
| Threshold | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0.1 | 0.958 | 0.955 | 0.955 | 0.9565 |
| 0.2 | 0.96 | 0.9567 | 0.9567 | 0.9583 |
| 0.3 | ||||
| 0.4 | 0.9733 | 0.9717 | 0.9717 | 0.9725 |
| 0.5 | 0.9718 | 0.97 | 0.97 | 0.9709 |
| 0.6 | 0.9662 | 0.9633 | 0.9633 | 0.9648 |
| 0.7 | 0.9621 | 0.9583 | 0.9583 | 0.9602 |
| 0.8 | 0.9542 | 0.9483 | 0.9483 | 0.9513 |
| 0.9 | 0.9469 | 0.94 | 0.94 | 0.9435 |
Bold entries denote highest values for a specific evaluation metric (Column)
Fig. 4 Variation of precision, recall, accuracy and F1 score with threshold on COVID-19 Radiography Database [15]
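The threshold sweep in the tables above amounts to trading COVID-19 recall against precision by varying a decision cut-off on the COVID-19 class probability. The paper does not spell out the exact thresholding rule; the sketch below shows one plausible rule, with made-up softmax outputs for illustration:

```python
import numpy as np

# Stand-in softmax outputs for six images over (Normal, Pneumonia, COVID-19).
probs = np.array([[0.70, 0.20, 0.10],
                  [0.10, 0.30, 0.60],
                  [0.40, 0.45, 0.15],
                  [0.20, 0.20, 0.60],
                  [0.05, 0.05, 0.90],
                  [0.50, 0.10, 0.40]])

def predict_with_threshold(probs, t, positive=2):
    # Flag the positive class (COVID-19) whenever its probability reaches t;
    # otherwise fall back to the argmax over the remaining classes.
    preds = []
    for p in probs:
        if p[positive] >= t:
            preds.append(positive)
        else:
            others = [i for i in range(len(p)) if i != positive]
            preds.append(max(others, key=lambda i: p[i]))
    return np.array(preds)

print(predict_with_threshold(probs, 0.3))  # low t: more COVID-19 calls
print(predict_with_threshold(probs, 0.7))  # high t: fewer COVID-19 calls
```

Lowering the threshold raises COVID-19 recall at the cost of precision (more images flagged), matching the trend visible in the tables.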
Precision of the proposed model on the chest X-ray images pneumonia and COVID-19 [11] under varied thresholds
| Threshold | Precision-Normal | Precision-Pneumonia | Precision-COVID-19 |
|---|---|---|---|
| 0.1 | |||
| 0.2 | 0.9596 | ||
| 0.3 | 0.9596 | ||
| 0.4 | 0.9596 | ||
| 0.5 | 0.9596 | ||
| 0.6 | 0.9596 | ||
| 0.7 | 0.95 | 0.9429 | |
| 0.8 | 0.95 | 0.934 | |
| 0.9 | 0.9406 | 0.9252 |
Bold entries denote highest values for a specific evaluation metric (Column)
Recall of the proposed model on the chest X-ray images pneumonia and COVID-19 [11] under varied thresholds
| Threshold | Recall-Normal | Recall-Pneumonia | Recall-COVID19 |
|---|---|---|---|
| 0.1 | |||
| 0.2 | 0.97 | ||
| 0.3 | 0.97 | ||
| 0.4 | 0.97 | ||
| 0.5 | 0.97 | ||
| 0.6 | 0.97 | ||
| 0.7 | 0.95 | ||
| 0.8 | 0.94 | ||
| 0.9 | 0.92 |
Bold entries denote highest values for a specific evaluation metric (Column)
Performance of the proposed model on the chest X-ray images pneumonia and COVID-19 [11] under varied thresholds
| Threshold | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0.1 | ||||
| 0.2 | 0.9705 | 0.97 | 0.97 | 0.9703 |
| 0.3 | 0.9705 | 0.97 | 0.97 | 0.9703 |
| 0.4 | 0.9705 | 0.97 | 0.97 | 0.9703 |
| 0.5 | 0.9705 | 0.97 | 0.97 | 0.9703 |
| 0.6 | 0.9705 | 0.97 | 0.97 | 0.9703 |
| 0.7 | 0.9643 | 0.9633 | 0.9633 | 0.9638 |
| 0.8 | 0.9613 | 0.96 | 0.96 | 0.9607 |
| 0.9 | 0.9533 | 0.9533 | 0.9533 | 0.9543 |
Bold entries denote highest values for a specific evaluation metric (Column)
Fig. 5 Variation of precision, recall, accuracy and F1 score with threshold on the chest X-ray images pneumonia and COVID-19 [11]
Performance of the Disease Diagnosis Model on COVID-19 Radiography Database [15] under varied noise levels
| Noise Percentage | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0 | ||||
| 2.5 | 0.9687 | 0.9513 | 0.96 | 0.9599 |
| 5 | 0.913 | 0.9013 | 0.91 | 0.9071 |
| 10 | 0.8184 | 0.8013 | 0.81 | 0.8097 |
| 30 | 0.6185 | 0.605 | 0.605 | 0.6117 |
Performance of the Disease Screening Model on COVID-19 Radiography Database [15] under varied noise levels
| Noise Percentage | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0 | ||||
| 2.5 | 0.9389 | 0.9425 | 0.9417 | 0.9407 |
| 5 | 0.8834 | 0.8988 | 0.8917 | 0.891 |
| 10 | 0.7973 | 0.8122 | 0.8048 | 0.8047 |
| 30 | 0.5833 | 0.5913 | 0.5833 | 0.5873 |
Performance of the Disease Diagnosis Model on the chest X-ray images pneumonia and COVID-19 [11] under varied noise levels
| Noise Percentage | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0 | ||||
| 2.5 | 0.9353 | 0.9275 | 0.9233 | 0.9314 |
| 5 | 0.8804 | 0.8675 | 0.8667 | 0.8739 |
| 10 | 0.8037 | 0.77 | 0.7733 | 0.7865 |
| 30 | 0.6385 | 0.6025 | 0.6 | 0.62 |
Performance of the Disease Screening Model on the chest X-ray images pneumonia and COVID-19 [11] under varied noise levels
| Noise Percentage | Precision | Recall | Accuracy | F1 Score |
|---|---|---|---|---|
| 0 | ||||
| 2.5 | 0.923 | 0.9375 | 0.93 | 0.9302 |
| 5 | 0.8684 | 0.8825 | 0.8733 | 0.8754 |
| 10 | 0.7886 | 0.7975 | 0.7967 | 0.793 |
| 30 | 0.628 | 0.64 | 0.6367 | 0.634 |
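The robustness experiments above degrade the test images with increasing amounts of noise. The paper's exact noise model is not reproduced here; as one plausible reading of "noise percentage", the sketch below corrupts the stated percentage of pixels with salt-and-pepper noise (function name is ours):

```python
import numpy as np

def add_salt_pepper(img, pct, rng):
    """Corrupt `pct` percent of pixels with salt (255) or pepper (0) noise."""
    noisy = img.copy()
    n = int(round(img.size * pct / 100.0))
    idx = rng.choice(img.size, size=n, replace=False)
    noisy.flat[idx] = rng.choice([0, 255], size=n)
    return noisy

rng = np.random.default_rng(0)
img = np.full((64, 64), 128, dtype=np.uint8)   # stand-in chest X-ray
for pct in (0, 2.5, 5, 10, 30):
    noisy = add_salt_pepper(img, pct, rng)
    changed = np.mean(noisy != img) * 100       # fraction actually corrupted
    print(pct, round(changed, 1))
```

Re-evaluating the models on such corrupted copies of the test set, as in the tables above, shows how quickly performance decays as the corruption level rises.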
Fig. 6 Pneumonia images: output from LIME
Fig. 7 COVID-19 images: output from LIME
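LIME explains a prediction by perturbing regions of the input and fitting a local surrogate to the resulting score changes. The authors used the LIME library itself; the sketch below illustrates the underlying perturbation idea with a simpler occlusion-sensitivity map (function name and toy scorer are ours, not from the paper):

```python
import numpy as np

def occlusion_map(img, score_fn, patch=8):
    """Occlude each patch and record the drop in the classifier's score.
    A simple stand-in for LIME's perturbation-based attribution."""
    base = score_fn(img)
    h, w = img.shape
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h, patch):
        for j in range(0, w, patch):
            masked = img.copy()
            masked[i:i + patch, j:j + patch] = 0
            heat[i // patch, j // patch] = base - score_fn(masked)
    return heat

# Toy "classifier score": mean intensity of the upper-left quadrant,
# so occluding that region should dominate the heatmap.
img = np.ones((32, 32))
score = lambda x: x[:16, :16].mean()
heat = occlusion_map(img, score)
print(np.unravel_index(heat.argmax(), heat.shape))
```

As in Figs. 6 and 7, the resulting heatmap highlights the image regions the model relied on, which lets a radiologist check that the framework attends to lung fields rather than artefacts.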
Comparison between the proposed model and models proposed in previous research papers
| Model | Accuracy | F1 Score |
|---|---|---|
| CNN Model [ | 0.8422 | 0.8421 |
| Automated Model [ | 0.8702 | 0.8737 |
| COVID-Net Model [ | 0.924 | 0.9 |
| CoroNet Model [ | 0.95 | 0.956 |
| CVDNet Model [ | 0.9669 | 0.9668 |
| Proposed framework |
Fig. 8 F1 score of different models on different datasets
Fig. 9 Accuracy of different models on different datasets