Mizuho Nishio, Daigo Kobayashi, Eiko Nishioka, Hidetoshi Matsuo, Yasuyo Urase, Koji Onoue, Reiichi Ishikura, Yuri Kitamura, Eiro Sakai, Masaru Tomita, Akihiro Hamanaka, Takamichi Murakami.
Abstract
This retrospective study aimed to develop and validate a deep learning model for classifying chest X-ray (CXR) images into coronavirus disease 2019 (COVID-19) pneumonia, non-COVID-19 pneumonia, and healthy. One private and two public datasets of CXR images were included. The private dataset comprised CXR images from six hospitals. A total of 14,258 and 11,253 CXR images were included in the two public datasets, and 455 in the private dataset. A deep learning model based on EfficientNet with Noisy Student was constructed using the three datasets. The test set of 150 CXR images from the private dataset was evaluated by the deep learning model and by six radiologists. Three-category classification accuracy and class-wise area under the curve (AUC) for each of healthy, non-COVID-19 pneumonia, and COVID-19 pneumonia were calculated; the consensus of the six radiologists was used to calculate class-wise AUC. The three-category classification accuracy of our model was 0.8667, whereas those of the six radiologists ranged from 0.5667 to 0.7733. The class-wise AUCs for healthy, non-COVID-19 pneumonia, and COVID-19 pneumonia were 0.9912, 0.9492, and 0.9752 for our model, and 0.9656, 0.8654, and 0.8740 for the consensus of the six radiologists, respectively. The difference in class-wise AUC between our model and the consensus of the six radiologists was statistically significant for COVID-19 pneumonia (p = 0.001334). Thus, an accurate deep learning model for three-category classification could be constructed; the diagnostic performance of our model was significantly better than that of the consensus interpretation by the six radiologists for COVID-19 pneumonia.
Year: 2022 PMID: 35581272 PMCID: PMC9113076 DOI: 10.1038/s41598-022-11990-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
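The class-wise AUC reported in the abstract is a one-vs-rest quantity: each class is scored against the other two using that class's predicted probability. A minimal sketch with scikit-learn (the labels and softmax scores below are illustrative toy data, not the study's):

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

classes = ["healthy", "non-COVID-19 pneumonia", "COVID-19 pneumonia"]
y_true = np.array([0, 0, 1, 1, 2, 2, 2, 0])  # toy ground-truth labels
y_score = np.array([                         # toy softmax outputs, one row per image
    [0.8, 0.1, 0.1], [0.6, 0.3, 0.1], [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2], [0.1, 0.2, 0.7], [0.2, 0.2, 0.6],
    [0.3, 0.1, 0.6], [0.7, 0.2, 0.1],
])

# One-vs-rest binarization: column k is 1 where the true class is k.
y_bin = label_binarize(y_true, classes=[0, 1, 2])
for k, name in enumerate(classes):
    auc = roc_auc_score(y_bin[:, k], y_score[:, k])
    print(f"{name}: AUC = {auc:.4f}")
```

Three-category accuracy is then simply `np.mean(y_score.argmax(axis=1) == y_true)` on the same arrays.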
Figure 1. Our DL model. Abbreviation: DL, deep learning.
Numbers of CXR images in the COVIDx, COVIDBIMCV, and COVIDprivate datasets.
| Dataset | Total number of CXR images | Number of CXR images of the healthy | Number of CXR images of non-COVID-19 pneumonia | Number of CXR images of COVID-19 pneumonia |
|---|---|---|---|---|
| COVIDx | 14,258 | 8066 | 5575 | 617 |
| COVIDBIMCV | 11,253 | 8799 | 979 | 1475 |
| COVIDprivate | 455 | 139 | 139 | 177 |
All cases of non-COVID-19 pneumonia are bacterial pneumonia in COVIDprivate.
CXR, chest X-ray imaging; COVIDx, public dataset used for COVID-Net; COVIDBIMCV, public dataset obtained from the PadChest dataset and the BIMCV-COVID19+ dataset; COVIDprivate, private dataset collected from six hospitals.
Patients’ characteristics in the COVIDprivate dataset.
| Hospital | Number of patients | Male | Female | Age (y) (mean ± standard deviation) |
|---|---|---|---|---|
| Hospital 1 | 6 | 4 | 2 | 68.0 ± 9.78 |
| Hospital 2 | 20 | 15 | 5 | 61.7 ± 14.8 |
| Hospital 3 | 7 | 5 | 2 | 73.1 ± 12.1 |
| Hospital 4 | 173 | 104 | 69 | 58.3 ± 19.3 |
| Hospital 5 | 186 | 99 | 87 | 61.2 ± 18.5 |
| Hospital 6 | 63 | 30 | 33 | 65.3 ± 17.7 |
| Total | 455 | 198 | 257 | 61.0 ± 18.6 |
COVIDprivate, private dataset collected from six hospitals.
Figure 2. Schematic illustration of dataset splitting, model training, and prediction with our DL model. Abbreviations: COVIDx, public dataset used for COVID-Net; COVIDBIMCV, public dataset obtained from the PadChest dataset and the BIMCV-COVID19+ dataset; COVIDprivate, private dataset collected from six hospitals.
Class-wise precision, recall, F1-score, and three-category classification accuracy of four DL models and six radiologists in the COVIDprivate dataset.
| Model or Radiologist | Healthy: Precision | Healthy: Recall | Healthy: F1-score | Non-COVID-19 pneumonia: Precision | Non-COVID-19 pneumonia: Recall | Non-COVID-19 pneumonia: F1-score | COVID-19 pneumonia: Precision | COVID-19 pneumonia: Recall | COVID-19 pneumonia: F1-score | Accuracy* |
|---|---|---|---|---|---|---|---|---|---|---|
| Our DL model | 0.8475, 0.7458, 0.9348 | 0.7000, 0.5652, 0.8302 | ||||||||
| COVID-Net | 0.6173, 0.5067, 0.7229 | 0.7634, 0.6726, 0.8392 | 0.6604, 0.5254, 0.7827 | 0.7000, 0.5714, 0.8182 | 0.6796, 0.5656, 0.7708 | 0.7500, 0.5000, 0.9412 | 0.2400, 0.1250, 0.3636 | 0.3636, 0.2089, 0.5079 | 0.6467, 0.5667, 0.7200 | |
| Sharma et al | 0.0000, 0.0000, 0.0000 | 0.0000, 0.0000, 0.0000 | 0.0000, 0.0000, 0.0000 | 0.3627, 0.2687, 0.4592 | 0.4868, 0.3803, 0.5806 | 0.6000, 0.4524, 0.7500 | 0.5400, 0.3958, 0.6793 | 0.5684, 0.4337, 0.6813 | 0.4267, 0.3400, 0.5067 | |
| DarkCovidNet | 0.2500, 0.0000, 1.0000 | 0.0200, 0.0000, 0.0638 | 0.0370, 0.0000, 0.1132 | 0.4648, 0.3478, 0.5882 | 0.6600, 0.5227, 0.7869 | 0.5455, 0.4301, 0.6462 | 0.3467, 0.2429, 0.4588 | 0.5200, 0.3799, 0.6591 | 0.4160, 0.3051, 0.5206 | 0.4000, 0.3267, 0.4800 |
| Radiologist1 | 0.8039, 0.6862, 0.9038 | 0.8200, 0.7111, 0.9167 | 0.8119, 0.7209, 0.8837 | 0.6327, 0.4902, 0.7619 | 0.6200, 0.4878, 0.7547 | 0.6263, 0.5055, 0.7333 | 0.6400, 0.5088, 0.7727 | 0.6400, 0.5000, 0.7647 | 0.6400, 0.5238, 0.7358 | 0.6933, 0.6200, 0.7600 |
| Radiologist2 | 0.8333, 0.7222, 0.9318 | 0.8000, 0.6779, 0.9038 | 0.8163, 0.7209, 0.8932 | 0.7000, 0.5714, 0.8197 | 0.7000, 0.5745, 0.8182 | 0.7000, 0.5895, 0.7959 | 0.7115, 0.5818, 0.8302 | 0.7400, 0.6111, 0.8519 | 0.7255, 0.6200, 0.8148 | 0.7467, 0.6800, 0.8133 |
| Radiologist3 | 0.8600, 0.7556, 0.9500 | 0.8600, 0.7755, 0.9250 | 0.7200, 0.5957, 0.8400 | 0.7200, 0.5882, 0.8409 | 0.7200, 0.6118, 0.8142 | 0.7400, 0.6154, 0.8667 | 0.7400, 0.6122, 0.8537 | 0.7400, 0.6316, 0.8367 | 0.7733, 0.7067, 0.8400 | |
| Radiologist4 | 0.6154, 0.5051, 0.7215 | 0.9600, 0.8965, 1.0000 | 0.7500, 0.6560, 0.8244 | 0.8276, 0.6786, 0.9615 | 0.4800, 0.3404, 0.6200 | 0.6076, 0.4706, 0.7246 | 0.6279, 0.4736, 0.7778 | 0.5400, 0.3921, 0.6724 | 0.5806, 0.4444, 0.6903 | 0.6600, 0.5865, 0.7333 |
| Radiologist5 | 0.7358, 0.6122, 0.8511 | 0.7800, 0.6596, 0.8913 | 0.7573, 0.6531, 0.8432 | 0.5417, 0.4000, 0.6793 | 0.5200, 0.3846, 0.6563 | 0.5306, 0.4051, 0.6400 | 0.5102, 0.3725, 0.6471 | 0.5000, 0.3673, 0.6316 | 0.5051, 0.3789, 0.6154 | 0.6000, 0.5267, 0.6800 |
| Radiologist6 | 0.5385, 0.4375, 0.6429 | 0.9800, 0.9362, 1.0000 | 0.6950, 0.6031, 0.7792 | 0.6667, 0.4783, 0.8519 | 0.3600, 0.2249, 0.5000 | 0.4675, 0.3158, 0.6001 | 0.5625, 0.3793, 0.7419 | 0.3600, 0.2222, 0.4894 | 0.4390, 0.2899, 0.5618 | 0.5667, 0.4867, 0.6467 |
Each cell shows the classification metric and its 95% CI (lower and upper bounds). * indicates three-category classification accuracy. The experience of the six radiologists was 10 months and 4, 7, 10, 10, and 15 years. The underlined values represent the best values for each column.
DL, deep learning; CI, confidence interval; COVIDprivate, private dataset collected from six hospitals.
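The table reports a 95% CI for every metric, but this excerpt does not state how the CIs were obtained; a common choice for case-level metrics is the percentile bootstrap over test cases. A hedged numpy sketch (the function name and defaults here are my own, not the authors'):

```python
import numpy as np

def bootstrap_accuracy_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for classification accuracy.

    Resamples test cases with replacement and takes the empirical
    alpha/2 and 1-alpha/2 quantiles of the resampled accuracies.
    """
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    accs = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, n)  # resample cases with replacement
        accs[b] = np.mean(y_true[idx] == y_pred[idx])
    lo, hi = np.quantile(accs, [alpha / 2, 1 - alpha / 2])
    return np.mean(y_true == y_pred), lo, hi
```

The same resampling indices can be reused for precision, recall, and F1 so that all CIs come from identical bootstrap replicates.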
Class-wise AUC and its 95% CI of our DL model and consensus of six radiologists.
| Model or Radiologist | Dataset | Healthy: AUC | Healthy: 95% CI | Non-COVID-19 pneumonia: AUC | Non-COVID-19 pneumonia: 95% CI | COVID-19 pneumonia: AUC | COVID-19 pneumonia: 95% CI |
|---|---|---|---|---|---|---|---|
| Our DL model | COVIDx | 0.9914 | 0.9837, 0.9990 | 0.9772 | 0.9601, 0.9942 | 0.9934 | 0.9871, 0.9996 |
| Our DL model | COVIDBIMCV | 0.9712 | 0.9548, 0.9877 | 0.9568 | 0.9355, 0.9781 | 0.9856 | 0.9702, 1.0000 |
| Our DL model | COVIDprivate | 0.9912 | 0.9801, 1.0000 | 0.9492 | 0.9118, 0.9866 | 0.9752 | 0.9555, 0.9949 |
| COVID-Net | COVIDprivate | 0.8917 | 0.8405, 0.9429 | 0.8500 | 0.7909, 0.9091 | 0.7167 | 0.6347, 0.7987 |
| Sharma et al | COVIDprivate | 0.6074 | 0.5111, 0.7037 | 0.5017 | 0.4089, 0.5945 | 0.7564 | 0.6768, 0.8360 |
| DarkCovidNet | COVIDprivate | 0.4315 | 0.3350, 0.5280 | 0.7226 | 0.6420, 0.8032 | 0.5589 | 0.4630, 0.6548 |
| Consensus of radiologists | COVIDprivate | 0.9656 | 0.9401, 0.9911 | 0.8654 | 0.8022, 0.9286 | 0.8740 | 0.8164, 0.9316 |
DL, deep learning; CI, confidence interval; AUC, area under the curve; COVIDx, public dataset used for COVID-Net; COVIDBIMCV, public dataset obtained from the PadChest dataset and the BIMCV-COVID19+ dataset; COVIDprivate, private dataset collected from six hospitals.
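The abstract reports a significant class-wise AUC difference for COVID-19 pneumonia (p = 0.001334), but this excerpt does not state which test was used; DeLong's test and the paired bootstrap are the usual options for correlated AUCs on a shared test set. A paired-bootstrap sketch for one one-vs-rest class (the helper below is my own illustration):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def paired_bootstrap_auc_diff(y_true, score_a, score_b, n_boot=2000, seed=0):
    """CI for AUC(a) - AUC(b) when both models score the same test cases.

    Each bootstrap replicate resamples cases with replacement and scores
    both models on the same resample, preserving their correlation.
    """
    rng = np.random.default_rng(seed)
    y_true = np.asarray(y_true)
    a, b = np.asarray(score_a), np.asarray(score_b)
    n = len(y_true)
    diffs = []
    while len(diffs) < n_boot:
        idx = rng.integers(0, n, n)
        if len(np.unique(y_true[idx])) < 2:  # AUC needs both classes present
            continue
        diffs.append(roc_auc_score(y_true[idx], a[idx])
                     - roc_auc_score(y_true[idx], b[idx]))
    point = roc_auc_score(y_true, a) - roc_auc_score(y_true, b)
    return point, np.quantile(diffs, [0.025, 0.975])
```

A 95% CI for the difference that excludes zero corresponds to rejecting equality at the 5% level; DeLong's test gives an analytic p-value for the same comparison.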
Figure 3. Class-wise ROC curves in the COVIDprivate dataset. Note: (A) consensus of radiologists and (B) our DL model. Abbreviations: DL, deep learning; COVIDprivate, private dataset collected from six hospitals; AUC, area under the curve; ROC, receiver operating characteristic.
Figure 4. Results of Grad-CAM for our DL model. Note: (A) healthy, (B) non-COVID-19 pneumonia, (C) COVID-19 pneumonia. Each panel consists of a CXR image and the corresponding Grad-CAM result. One trained instance of our DL model was used for Grad-CAM. Abbreviations: DL, deep learning; CXR, chest X-ray imaging.
Summary of COVID-19 DL models on CXR images.
| Authors | Classification | Dataset | Number of COVID-19 images | Performance | Comparison with radiologists |
|---|---|---|---|---|---|
| Shorfuzzaman et al. | Multi-class, Binary | Public | 230 | Accuracy = 95.6% (multi-class) | No |
| Ozturk et al. | Multi-class, Binary | Public | 125 | Accuracy = 87.02% (multi-class) | No |
| Nishio et al. | Multi-class | Public | 215 | Accuracy = 83.6% | No |
| Sharma et al. | Multi-class | Public | 51 (original), 75 (dataset-II) | COVID-19 sensitivity = 100% (original), 66.67% (dataset-II) | No |
| Wang et al. | Multi-class | Public | 358 (original COVIDx) | Accuracy = 93.3% | No |
| Elgendi et al. | Multi-class | Public, Private | 50 (Dataset 1), 198 (Dataset 2), 248 (Dataset 3), 58 (Dataset 4) | MCC = 0.51 | No |
| Wehbe et al. | Binary | Private | 4253 | Accuracy = 82% | Yes |
| Monshi et al. | Multi-class | Public | 320 (COVIDcxr), NA (COVIDx ver. 3) | Accuracy = 95.82% | No |
| Karakanis et al. | Multi-class, Binary | Public | 145 | Accuracy = 98.3% | No |
| Ours | Multi-class | Public, Private | 617 (COVIDx ver. 5) 1475 (COVIDBIMCV) 177 (COVIDprivate) | Accuracy = 86.67% | Yes |
The definition of accuracy in multi-class classification may differ between these studies.
CXR, chest X-ray imaging; DL, deep learning; NA, not available; MCC, Matthews correlation coefficient; COVIDx, public dataset used for COVID-Net.