Mahbubunnabi Tamal, Maha Alshammari, Meernah Alabdullah, Rana Hourani, Hossain Abu Alola, Tarek M Hegazi.
Abstract
The objective of this article is to propose and validate a combination of machine learning and radiomics features for detecting COVID-19 early and rapidly from chest X-ray (CXR) images in the presence of other viral/bacterial pneumonia and at different levels of disease severity. It is vital to assess the performance of any diagnostic method on an independent data set and at a very early stage of the disease, when severity is very low; most diagnostic methods fail in such cases. A total of 378 CXR images containing both normal lungs and pneumonia (COVID-19 and other lung conditions) were collected from publicly available data sets. For each lung segment, 71 radiomics features that can detect COVID-19 were chosen from 100 extracted features based on a Z-score heatmap and a one-way ANOVA test. The three best-performing classical machine learning algorithms during the training phase, (1) fine Gaussian support vector machine (SVM), (2) fine k-nearest neighbor (KNN) and (3) ensemble bagged model (EBM) trees, were chosen for further evaluation on an independent test data set. The independent test data set consists of 115 COVID-19 CXR images collected from a local hospital and 100 CXR images collected from publicly available data sets containing normal lungs and viral/bacterial pneumonia. For the test data set, two experienced radiologists scored severity from 0 to 4 for each lung with pneumonia (both COVID-19 and non-COVID-19). Ensemble Bagging Model Trees (EBM) with the selected radiomics features is the most suitable classifier for distinguishing COVID-19 from other lung infections, with an overall sensitivity of 87.8% and specificity of 97% (95.2% accuracy and 0.9228 area under curve), and is robust across severity levels. The method can also detect COVID-19 from CXR when two experienced radiologists were unable to detect any abnormality in the lung CXR (represented by a severity score of 0).
Once the CXR is acquired and the lung is segmented, extracting the radiomics features and providing a diagnosis takes less than two minutes. Since the proposed method does not require any manual intervention (e.g., sample collection), it can be integrated straightaway with a standard X-ray reporting system and used as an efficient, cost-effective and rapid early diagnosis device.
Keywords: COVID-19; Chest X-ray; Early diagnosis; Machine learning; Radiomics
Year: 2021 PMID: 33967406 PMCID: PMC8095015 DOI: 10.1016/j.eswa.2021.115152
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 6.954
Training data description
| Disease type | Description | No. of CXR | Source |
|---|---|---|---|
| non-COVID-19 (total 152 images) | Normal | 50 | JSRT |
| non-COVID-19 | Viral/bacterial pneumonia | 50 | Kaggle |
| non-COVID-19 | Other lung conditions (ARDS (4), SARS (15), Pneumocystis (12), Streptococcus (13), Chlamydophila (1), E. Coli (4), Klebsiella (1) and Legionella (1)) | 52 | GitHub |
| COVID-19 | COVID-19 | 226 | GitHub |
Fig. 1. Illustration of the methodology for COVID-19 detection from CXR images.
Test data description
| Disease type | Description | No. of CXR | Source |
|---|---|---|---|
| non-COVID-19 | Normal | 25 | JSRT |
| non-COVID-19 | Viral/bacterial pneumonia | 25 | Kaggle |
| COVID-19 | COVID-19 | 115 (25 patients) | Local hospital |
Fig. 2. Normal CXR (a) and RT-PCR-confirmed COVID-19 positive cases with high (b) and low (c) severity scores.
Number of segmented lungs for each severity
| Severity | 0 | 1 | 2 | 3 | 4 | Total |
|---|---|---|---|---|---|---|
| non-COVID-19 | 50 | 17 | 22 | 11 | 0 | 100 |
| COVID-19 | 36 | 88 | 71 | 27 | 8 | 230 |
| Total | 86 | 105 | 93 | 38 | 8 | 330 |
Fig. 3. Z-score heatmap of the 71 radiomics features that yield a statistically significant difference between COVID-19 and other cases. Each row represents one feature and each column represents one CXR image used in the training set.
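The feature screening described in the abstract (Z-score normalization followed by a one-way ANOVA test) can be illustrated with a minimal sketch. This is not the authors' code: the function names `z_scores` and `anova_f` and the toy values are hypothetical, and the sketch stops at the F statistic because converting it to a p-value needs an F-distribution routine (for example from a statistics library) that the source does not specify.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize one feature across images: (x - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def anova_f(groups):
    """One-way ANOVA F statistic for one feature split into groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    if ss_within == 0:
        return float("inf")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical toy values for a single feature (not data from the study).
covid = [2.1, 2.4, 2.2, 2.6]
other = [1.0, 1.2, 0.9, 1.1]

z = z_scores(covid + other)       # one heatmap row, in Z-score units
f_stat = anova_f([covid, other])  # large F suggests the groups differ
print(f"F = {f_stat:.2f}")
```

A feature would be retained when its F statistic corresponds to a p-value below a chosen significance threshold; the threshold used in the study is not stated in this record.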
Performance of the classifiers during training.
| Classifier | Sensitivity | Specificity | Accuracy | AUC-ROC |
|---|---|---|---|---|
| Fine Gaussian SVM | 98.2% | 88.4% | 93.4% | 0.9894 |
| Fine KNN | 88.9% | 97.9% | 93.3% | 0.9343 |
| Ensemble Bagged Model Trees (EBM) | 91.6% | 92.6% | 91.8% | 0.9772 |
Performance of the classifiers during testing.
| Classifier | Sensitivity | Specificity | Accuracy | AUC-ROC |
|---|---|---|---|---|
| Fine Gaussian SVM | 99.6% | 85% | 95.2% | 0.9228 |
| Fine KNN | 73.5% | 98% | 80.9% | 0.8574 |
| Ensemble Bagged Model Trees (EBM) | 87.8% | 97% | 90.6% | 0.9241 |
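As a consistency check, the test-set figures for SVM and EBM can be recomputed from the detected-lung totals reported in the per-severity tables (229 of 230 and 85 of 100 lungs for SVM; 202 of 230 and 97 of 100 for EBM). The sketch below assumes COVID-19 is the positive class and counts lungs rather than images; the `metrics` helper is a hypothetical name, not part of the study.

```python
def metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion counts."""
    sensitivity = tp / (tp + fn)   # detected COVID-19 / all COVID-19
    specificity = tn / (tn + fp)   # detected non-COVID / all non-COVID
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy


# Counts are segmented lungs, taken from the per-severity tables.
svm = metrics(tp=229, fn=1, tn=85, fp=15)
ebm = metrics(tp=202, fn=28, tn=97, fp=3)

for name, (sens, spec, acc) in (("SVM", svm), ("EBM", ebm)):
    print(f"{name}: sensitivity={sens:.1%} specificity={spec:.1%} accuracy={acc:.1%}")
```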
Fig. 4. ROC plot of the three best-performing classifiers during training (left) and during testing (right).
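The AUC-ROC values in the tables summarize curves like these in a single number. One common way to compute it, shown here as a sketch and not necessarily the procedure used in the study, is the rank-based definition: the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case. The function name `roc_auc` and the labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores (1 = COVID-19, 0 = non-COVID-19).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.4f}")
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a few hundred lungs; a sort-based implementation would be preferable for large data sets.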
Sensitivity and specificity based on severity for SVM
| Group | Measure | Severity 0 | Severity 1 | Severity 2 | Severity 3 | Severity 4 | Total |
|---|---|---|---|---|---|---|---|
| COVID-19 | Original | 36 | 88 | 71 | 27 | 8 | 230 |
| COVID-19 | Detected | 35 | 88 | 71 | 27 | 8 | 229 |
| COVID-19 | Sensitivity | 97.2% | 100.0% | 100.0% | 100.0% | 100.0% | 99.6% |
| Non-COVID-19 | Original | 50 | 17 | 22 | 11 | 0 | 100 |
| Non-COVID-19 | Detected | 47 | 11 | 20 | 7 | 0 | 85 |
| Non-COVID-19 | Specificity | 94.0% | 64.7% | 90.9% | 63.6% | 0.0% | 85.0% |
Sensitivity and specificity based on severity for EBM.
| Group | Measure | Severity 0 | Severity 1 | Severity 2 | Severity 3 | Severity 4 | Total |
|---|---|---|---|---|---|---|---|
| COVID-19 | Original | 36 | 88 | 71 | 27 | 8 | 230 |
| COVID-19 | Detected | 33 | 78 | 64 | 21 | 6 | 202 |
| COVID-19 | Sensitivity | 91.7% | 88.6% | 90.1% | 77.8% | 75.0% | 87.8% |
| Non-COVID-19 | Original | 50 | 17 | 22 | 11 | 0 | 100 |
| Non-COVID-19 | Detected | 50 | 16 | 20 | 11 | 0 | 97 |
| Non-COVID-19 | Specificity | 100.0% | 94.1% | 90.9% | 100.0% | 0.0% | 97.0% |
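The per-severity percentages for EBM follow directly from the "Original" and "Detected" rows. The sketch below recomputes them. One point worth noting: there are no non-COVID-19 lungs at severity 4, so specificity at that level is a 0/0 ratio; the table prints 0.0%, whereas this sketch returns `None` to mark the value as undefined. The helper name `rate` is hypothetical.

```python
def rate(detected, original):
    """Fraction correctly classified, or None when there are no cases."""
    return None if original == 0 else detected / original


# EBM counts per severity level 0..4 (segmented lungs).
covid_original = [36, 88, 71, 27, 8]
covid_detected = [33, 78, 64, 21, 6]
non_covid_original = [50, 17, 22, 11, 0]
non_covid_detected = [50, 16, 20, 11, 0]

sensitivity = [rate(d, o) for d, o in zip(covid_detected, covid_original)]
specificity = [rate(d, o) for d, o in zip(non_covid_detected, non_covid_original)]

for level, (sens, spec) in enumerate(zip(sensitivity, specificity)):
    spec_text = "undefined" if spec is None else f"{spec:.1%}"
    print(f"severity {level}: sensitivity={sens:.1%} specificity={spec_text}")
```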
Fig. 5. Sensitivity and specificity for the whole test data set as well as for each severity level. The size of the filled circles represents severity levels from 0 to 4.