Surbhi Mittal¹, Vasantha Kumar Venugopal², Vikash Kumar Agarwal², Manu Malhotra², Jagneet Singh Chatha², Savinay Kapur², Ankur Gupta², Vikas Batra², Puspita Majumdar¹,³, Aakarsh Malhotra¹,³, Kartik Thakral¹, Saheb Chhabra¹,³, Mayank Vatsa¹, Richa Singh¹, Santanu Chaudhury¹.
Abstract
Consistent clinical observations of characteristic findings of COVID-19 pneumonia on chest X-rays have prompted the research community to strive to provide a fast and reliable method for screening suspected patients. Several machine learning algorithms have been proposed to find the abnormalities in the lungs specific to COVID-19 pneumonia using chest X-rays and to distinguish them from other etiologies of pneumonia. However, despite the enormous magnitude of the pandemic, there are very few public databases of COVID-19 pneumonia, and to the best of our knowledge, there is no database with annotations of abnormalities on the chest X-rays of COVID-19-affected patients. Annotated databases of X-rays can be of significant value in the design and development of algorithms for disease prediction. Further, explainability analysis of existing or new deep learning algorithms will be enhanced significantly with access to ground-truth abnormality annotations. The proposed COVID Abnormality Annotation for X-Rays (CAAXR) database is built upon the BIMCV-COVID19+ database, a large-scale dataset of COVID-19+ chest X-rays. The primary contribution of this study is the annotation of the abnormalities in over 1700 frontal chest X-rays. Further, we define protocols for semantic segmentation as well as classification for robust evaluation of algorithms. We provide benchmark results on the defined protocols using popular deep learning models such as DenseNet, ResNet, MobileNet, and VGG for classification, and UNet, SegNet, and Mask-RCNN for semantic segmentation. The class-wise accuracy, sensitivity, and AUC-ROC scores are reported for the classification models, and the IoU and DICE scores are reported for the segmentation models.
Year: 2022 PMID: 36240175 PMCID: PMC9565456 DOI: 10.1371/journal.pone.0271931
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.752
Fig 1. Chest X-rays of (a) RT-PCR-proven COVID-19 pneumonia, showing the typical bilateral peripheral consolidations and ground-glass opacities, and (b) non-COVID-19 pneumonia, showing the lobar distribution of consolidations with pleural effusion.
Summary of existing datasets containing COVID-19 positive chest X-rays.
| Dataset | COVID-19 CXR Samples | Disease Region Annotation | Description |
|---|---|---|---|
| COVID-19 Image Data Collection [ | 11 | ✘ | The COVID-19 Image Data Collection dataset was the first effort towards COVID-19 CXR data collection. |
| COVIDx [ | 385 | ✘ | The COVIDx dataset combines infected CXRs from a wide variety of sources; however, the number of COVID-19 CXRs is extremely limited. |
| COVID-19 Posteroanterior Chest X-Ray fused [ | 153 | ✘ | The CPCXR dataset is a combination of three publicly available datasets. |
| BrixIA Covid-19 [ | 192 | ✘ | The CXRs in the BrixIA dataset are annotated as per the BrixIA score which is a score to measure the degree of lung compromise. |
| Novel COVID-19 Chestxray Repository [ | 752 | ✘ | The Novel COVID-19 Chestxray Repository dataset is a combination of CXRs collected from various sources including healthy and pneumonia-infected CXRs. |
| COVID-19 Radiography Database [ | 3616 | ✘ | The COVID-19 Radiography Database dataset is a combination of different datasets and collected as a part of a Kaggle competition with CXRs added incrementally. |
| BIMCV-COVID19+ [ | 2429 | ✘ | The BIMCV-COVID19+ dataset contains a large number of COVID-19-infected CXRs collected in Spain. |
| CAAXR (Proposed) | 1700+ | ✔ | The proposed CAAXR dataset uses a subset of the BIMCV-COVID19+ dataset and provides additional disease region annotations, unavailable in existing datasets. |
Fig 2. Representative image of the CARPL Annotation Platform tool used for loading DICOM files and subsequently drawing bounding boxes on each image.
Fig 3. Samples of abnormality annotations performed on the BIMCV-COVID19+ dataset.
The regions annotated by the radiologists are depicted using green bounding boxes along with the abnormality label.
Proposed experimental protocols for the train, validation, and test sets of the annotation dataset.
2-fold cross-validation is performed. The source images for COVID-19 X-rays are taken from the BIMCV-COVID19+ database, while the rest of the images are from the CheXpert database. The corresponding disease abnormality annotations are provided in this study.
| S. No. | Protocol | Classes | Sub-Classes | Train (Sub-class) | Train (Total) | Test (Sub-class) | Test (Total) |
|---|---|---|---|---|---|---|---|
| D1 | Disease Localization | Abnormal Regions | Lung-specific abnormalities | - | 828 | - | 828 |
| | | Other Regions | Healthy regions & other non-lung abnormalities | - | | - | |
| C1 | 2-class classification | COVID | - | 1188 | 2400 | 1200 | 2400 |
| | | Non-COVID | Healthy | 492 | | 480 | |
| | | | Non-COVID Pneumonia | 240 | | 240 | |
| | | | Other Abnormalities | 480 | | 480 | |
| C2 | 3-class classification | COVID | - | 1188 | 3600 | 1200 | 3600 |
| | | Unhealthy Others | Non-COVID Pneumonia | 480 | | 480 | |
| | | | Other Abnormalities | 720 | | 720 | |
| | | Healthy | - | 1212 | | 1200 | |
| C3 | 4-class classification | COVID | - | 1188 | 4800 | 1200 | 4800 |
| | | Non-COVID Pneumonia | - | 1200 | | 1200 | |
| | | Other Abnormalities | - | 1200 | | 1200 | |
| | | Healthy | - | 1212 | | 1200 | |
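The 2-fold cross-validation underlying these protocols can be illustrated with a stratified split that halves each class between the two folds. This is a generic sketch, not the paper's exact partitioning code; the class names and counts used in the example are placeholders.

```python
import random

def two_fold_split(samples_by_class, seed=0):
    """Stratified 2-fold split: each class is shuffled and halved
    between the two folds, keeping class proportions similar."""
    rng = random.Random(seed)
    folds = ([], [])
    for cls, items in samples_by_class.items():
        items = list(items)
        rng.shuffle(items)
        half = len(items) // 2
        folds[0].extend((cls, s) for s in items[:half])
        folds[1].extend((cls, s) for s in items[half:])
    return folds
```

Training on fold 0 and testing on fold 1, then swapping, gives the two runs whose mean and standard deviation are reported in the benchmark tables below.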
Fig 4. Number of samples present in each protocol: (a) semantic segmentation; (b) 2-class classification; (c) 3-class classification; (d) 4-class classification.
Evaluation of existing deep learning models for COVID-19 semantic segmentation.
| Algorithm | Test Set IoU | Test Set DICE | Consolidation Subset IoU | Consolidation Subset DICE |
|---|---|---|---|---|
| UNet [ | 0.22 ± 0.03 | 0.32 ± 0.03 | 0.24 ± 0.03 | 0.35 ± 0.00 |
| SegNet [ | 0.18 ± 0.01 | 0.28 ± 0.02 | 0.35 ± 0.03 | 0.30 ± 0.02 |
| Mask-RCNN [ | 0.30 ± 0.02 | 0.43 ± 0.02 | 0.32 ± 0.02 | 0.46 ± 0.02 |
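For reference, the reported IoU and DICE scores are standard overlap metrics between a predicted mask and the ground-truth annotation. A minimal pure-Python sketch, assuming masks flattened to 0/1 lists:

```python
def iou_dice(pred, gt):
    """IoU = |P ∩ G| / |P ∪ G|; DICE = 2|P ∩ G| / (|P| + |G|),
    for two binary masks given as flat 0/1 sequences of equal length."""
    inter = sum(p & g for p, g in zip(pred, gt))
    union = sum(p | g for p, g in zip(pred, gt))
    total = sum(pred) + sum(gt)
    iou = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return iou, dice

# Toy example: two flattened 2x4 masks overlapping in 2 pixels.
pred = [1, 1, 0, 0, 1, 0, 0, 0]
gt   = [1, 0, 0, 0, 1, 1, 0, 0]
print(iou_dice(pred, gt))  # (0.5, 0.666...)
```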
Fig 5. Samples of semantic disease segmentation for popular deep learning algorithms.
Evaluation of existing deep learning algorithms for COVID-19 prediction corresponding to all classification protocols.
In the case of multi-class classification, a one-vs-all strategy is used to report sensitivity for COVID detection.
Sensitivity (%) @ Y specificity:

| Algorithm | 2-class, Y = 99% | 2-class, Y = 90% | 3-class, Y = 99% | 3-class, Y = 90% | 4-class, Y = 99% | 4-class, Y = 90% |
|---|---|---|---|---|---|---|
| DenseNet121 [ | 97.35 ± 0.06 | 99.52 ± 0.06 | 97.14 ± 0.16 | 99.57 ± 0.14 | 96.87 ± 0.30 | 99.52 ± 0.11 |
| MobileNetv2 [ | 99.12 ± 0.11 | 99.75 ± 0.07 | 98.58 ± 0.17 | 99.57 ± 0.06 | 98.13 ± 0.21 | 99.62 ± 0.10 |
| ResNet18 [ | 97.01 ± 0.41 | 99.70 ± 0.10 | 96.62 ± 0.40 | 99.60 ± 0.08 | 96.17 ± 0.55 | 99.43 ± 0.06 |
| VGG19 [ | 94.77 ± 0.96 | 99.55 ± 0.28 | 93.84 ± 0.30 | 99.53 ± 0.11 | 94.02 ± 0.96 | 99.38 ± 0.12 |
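Sensitivity @ Y specificity is read off the ROC curve: choose the most permissive decision threshold whose specificity still meets the target, and report the sensitivity at that threshold. A minimal sketch, assuming per-sample COVID scores where a higher score means more COVID-like (one-vs-all in the multi-class protocols):

```python
def sensitivity_at_specificity(pos_scores, neg_scores, target_spec):
    """Best sensitivity over thresholds whose specificity >= target_spec.

    pos_scores: scores of true-positive-class (COVID) samples.
    neg_scores: scores of all other samples.
    """
    best = 0.0
    for t in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        # Specificity: fraction of negatives correctly rejected at threshold t.
        specificity = sum(s < t for s in neg_scores) / len(neg_scores)
        if specificity >= target_spec:
            sensitivity = sum(s >= t for s in pos_scores) / len(pos_scores)
            best = max(best, sensitivity)
    return best
```

With only a handful of samples the threshold grid is coarse, so small toy inputs can give the same value at Y = 99% and Y = 90%; on the test sets above the two operating points differ.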
Class-wise classification accuracy corresponding to existing deep learning algorithms for all classification protocols.
| Protocol | Classes | DenseNet121 | MobileNetv2 | ResNet18 | VGG19 |
|---|---|---|---|---|---|
| 2-class classification | COVID | 96.98 ± 1.22 | 98.81 ± 0.78 | 98.27 ± 0.32 | 96.74 ± 1.22 |
| | Non-COVID | 98.32 ± 0.65 | 98.68 ± 0.85 | 98.09 ± 0.30 | 97.45 ± 1.43 |
| 3-class classification | COVID | 97.54 ± 1.04 | 97.92 ± 0.68 | 95.89 ± 1.35 | 96.43 ± 0.70 |
| | Non-COVID Unhealthy | 73.36 ± 4.34 | 74.73 ± 4.28 | 72.69 ± 1.37 | 72.06 ± 1.68 |
| | Healthy | 72.85 ± 4.04 | 72.79 ± 3.69 | 70.96 ± 1.43 | 71.71 ± 1.73 |
| 4-class classification | COVID | 87.74 ± 4.70 | 97.27 ± 0.93 | 95.83 ± 1.59 | 90.81 ± 1.58 |
| | Non-COVID Pneumonia | 57.22 ± 10.89 | 57.63 ± 6.41 | 53.86 ± 3.24 | 56.06 ± 4.91 |
| | Unhealthy Others | 50.80 ± 3.21 | 50.54 ± 2.74 | 47.23 ± 2.13 | 51.20 ± 1.64 |
| | Healthy | 58.06 ± 8.61 | 57.03 ± 5.07 | 54.18 ± 5.53 | 55.14 ± 3.68 |
Class-wise AUC corresponding to existing deep learning algorithms for all four classification protocols.
| Protocol | Classes | DenseNet121 | MobileNetv2 | ResNet18 | VGG19 |
|---|---|---|---|---|---|
| 2-class classification | COVID | 0.998 ± 0.000 | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.996 ± 0.001 |
| | Non-COVID | 0.998 ± 0.000 | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.996 ± 0.001 |
| 3-class classification | COVID | 0.998 ± 0.000 | 0.999 ± 0.000 | 0.998 ± 0.000 | 0.997 ± 0.000 |
| | Non-COVID Unhealthy | 0.895 ± 0.002 | 0.898 ± 0.001 | 0.889 ± 0.002 | 0.889 ± 0.003 |
| | Healthy | 0.894 ± 0.005 | 0.901 ± 0.001 | 0.889 ± 0.001 | 0.886 ± 0.003 |
| 4-class classification | COVID | 0.998 ± 0.000 | 0.998 ± 0.000 | 0.997 ± 0.000 | 0.996 ± 0.000 |
| | Non-COVID Pneumonia | 0.820 ± 0.035 | 0.821 ± 0.040 | 0.795 ± 0.039 | 0.812 ± 0.032 |
| | Unhealthy Others | 0.788 ± 0.005 | 0.793 ± 0.004 | 0.768 ± 0.012 | 0.793 ± 0.006 |
| | Healthy | 0.830 ± 0.037 | 0.840 ± 0.037 | 0.822 ± 0.039 | 0.832 ± 0.028 |
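The class-wise AUC-ROC under a one-vs-all strategy equals the probability that a randomly chosen sample of the target class outscores a randomly chosen sample of any other class (the Mann-Whitney formulation, with ties counting half). A minimal sketch:

```python
def auc_one_vs_all(scores, labels, target):
    """One-vs-all AUC via the Mann-Whitney statistic: the fraction of
    (target, non-target) sample pairs where the target-class sample
    receives the higher score; tied scores contribute 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == target]
    neg = [s for s, l in zip(scores, labels) if l != target]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

This pairwise form is O(n²) but makes the probabilistic meaning explicit; threshold-sweep implementations (e.g. trapezoidal integration of the ROC curve) compute the same quantity.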
p-values obtained from pairwise t-tests between the baseline models for the 4-class classification protocol.
| | MobileNetv2 | ResNet18 | VGG19 |
|---|---|---|---|
| DenseNet121 | 0.00004 | 0.00003 | 0.06119 |
| MobileNetv2 | - | 0.99248 | 0.02548 |
| ResNet18 | - | - | 0.02375 |
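The p-values above come from pairwise t-tests on the models' scores. A minimal Welch t-statistic sketch on two samples of per-run results; the p-value would then be obtained from the t-distribution CDF with the returned degrees of freedom (e.g. via `scipy.stats`). Whether the paper used Welch's or Student's variant, and paired or unpaired samples, is an assumption here:

```python
import math

def welch_t(a, b):
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees
    of freedom (does not assume equal variances)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Unbiased sample variances.
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    se2 = va / na + vb / nb
    t = (ma - mb) / math.sqrt(se2)
    dof = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, dof
```

A p-value below 0.05 (e.g. DenseNet121 vs. MobileNetv2 above) indicates the two models' results differ significantly; a value near 1 (MobileNetv2 vs. ResNet18) indicates they do not.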