| Literature DB >> 35350595 |
Farid Al-Areqi1, Mehmet Zeki Konyar2.
Abstract
Rapid diagnosis of Covid-19 is the best way to prevent the spread of infection. This paper proposes using machine learning methods to aid rapid Covid-19 diagnosis and focuses on the effect of several feature groups on classification accuracy. The proposed method uses 746 axial computed tomography (CT) images of the lung: 349 Covid-19 (positive) and 397 non-Covid-19 (negative). Gray-level texture, shape, and first-order statistical features were extracted from the images. The feature vector for model training was constructed from a single feature group or a combination of groups, and then classified with Support Vector Machine, Random Forest, k-nearest neighbor, and XGBoost models. The hyperparameters of the models were selected by a tuning test. Experimental results were obtained with 10-fold cross-validation and verified with an additional independent test. The best overall accuracy was 98.65%, achieved with first-order statistical features classified by XGBoost. Among the gray-level features, GLSZM gave the best individual result at 81.25%, and the best combination result was 85.52% with the GLDM, GLRLM, and GLSZM features. An important finding of this paper is that, for Covid-19 classification, shape and first-order statistical features are more valuable than gray-level features. The proposed results were compared with literature studies on the same Covid-19 dataset using accuracy, precision, sensitivity, and F1-score metrics; literature studies that used different Covid-19 datasets were also compared. Our results show significant superiority over the literature studies.
Keywords: CT images; Covid-19; Diagnosis; Features; Machine learning
Year: 2022 PMID: 35350595 PMCID: PMC8947946 DOI: 10.1016/j.bspc.2022.103662
Source DB: PubMed Journal: Biomed Signal Process Control ISSN: 1746-8094 Impact factor: 5.076
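The pipeline the abstract describes — extracting several feature groups per CT image, concatenating them into a single feature vector, and training a classifier — can be sketched as follows. This is a minimal illustration with random synthetic features, not the authors' code; the group names and dimensions (24, 16, 14) are hypothetical stand-ins, and scikit-learn's `SVC` stands in for the paper's SVM model.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 100
# Hypothetical stand-ins for per-image feature groups (dimensions invented).
glcm_feats = rng.normal(size=(n, 24))
glrlm_feats = rng.normal(size=(n, 16))
shape_feats = rng.normal(size=(n, 14))
y = rng.integers(0, 2, size=n)  # 1 = Covid-19, 0 = non-Covid-19

# Combine feature groups into one vector per image, as the paper does
# when testing group combinations.
X = np.hstack([glcm_feats, glrlm_feats, shape_feats])
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf").fit(Xtr, ytr)
print(clf.score(Xte, yte))
```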
Summary of the literature studies.
| Authors | Year | Dataset | Pre-processing | Classification method | Results |
|---|---|---|---|---|---|
| Ardakani et al. | 2020 | 612 CT images (306 COVID-19 and 306 non-COVID-19) | 20 Radiological features extraction | DT, KNN, Naive Bayes, SVM and Ensemble | Accuracy of 91.94% with Ensemble |
| Al-Karawi et al. | 2020 | 470 CT images (275 positive and 195 negative) | Features extraction (FFT-Gabor) | SVM | 95.37% Accuracy |
| Barstugan et al. | 2020 | 150 CT images | Patch regions cropping and features extraction | SVM | 99.64% Accuracy |
| Dey et al. | 2020 | 400 CT images (200 normal and 200 COVID-19) | Segmentation and feature extraction | RF, KNN, SVM | Accuracy of 87% with KNN |
| Liu et al. | 2020 | 88 CT images (61 COVID-19 and 27 general pneumonia) | Delineation of ROIs and feature extraction | DT, SVM, LR, KNN and Ensemble of bagged tree | 94.16% Accuracy with EBT |
| Özkaya et al. | 2020 | 150 CT images | Deep learning-based feature extraction | SVM | 98.27% Accuracy |
| Kassani et al. | 2021 | 274 CT images (117 X-ray and 20 CT positive and 117 X-ray and 20 CT negative) | Deep learning-based feature extraction | DT, RF, XGBoost, AdaBoost, Bagging, LightGBM | 99.00% accuracy on features extracted by DenseNet121 with Bagging |
| Shi et al. | 2021 | 2,685 CT images (1658 COVID-19 and 1027 bacterial pneumonia) | Segmentation and feature extraction | SVM, LR, NN and RF & LightGBM-based proposed method | 89.4% Accuracy |
| Zheng et al. | 2020 | 540 CT images (313 Covid-19 and 229 without Covid-19) | Deep learning-based feature extraction | 2D UNet, 3D CNN | 90.8% Accuracy |
| Xu et al. | 2020 | 618 CT images (219 Covid-19, 224 IAVP, 175 healthy people) | Deep learning-based feature extraction | 3D CNN, ResNet18, the location-attention | 86.7% Accuracy |
| Song et al. | 2021 | 274 CT images (88 Covid-19, 100 infected with bacteria pneumonia and 86 healthy people) | Deep learning-based feature extraction | ResNet50, Details Relation Extraction neural network | 93% Accuracy |
| Wang et al. | 2021 | 1065 CT images (325 COVID-19 positive and 740 COVID-19 negative) | Deep learning-based feature extraction | CNN, GoogleNet inception network | 89.5% Accuracy |
| Alsharman et al. | 2020 | 812 CT images (349 COVID-19 and 463 non-COVID-19) | Deep learning-based feature extraction | GoogleNet based CNN | 82.14% Accuracy |
Fig. 1 Overview of the proposed method.
Fig. 2 Samples of the CT images dataset. a) images of Covid-19 infected patients, b) images of non-Covid-19 patients.
Fig. 3 An example of the calculations of GLCM, GLRLM and GLSZM matrices [22].
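The gray-level co-occurrence matrix of Fig. 3 can be illustrated with a minimal computation: count how often each pair of gray levels occurs at a fixed pixel offset. This is a hedged sketch, not the paper's implementation; the `glcm` helper and the toy 3×3 image below are invented for illustration, and a real pipeline would use a radiomics library.

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Count co-occurrences of gray levels at offset (dx, dy)."""
    m = np.zeros((levels, levels), dtype=int)
    h, w = image.shape
    for y in range(h):
        for x in range(w):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < h and 0 <= x2 < w:
                m[image[y, x], image[y2, x2]] += 1
    return m

# Toy 3-level "image" (invented for illustration).
img = np.array([[0, 0, 1],
                [1, 2, 2],
                [2, 2, 0]])
print(glcm(img, 3))  # horizontally adjacent gray-level pairs
```

Texture features such as contrast or homogeneity are then scalar statistics computed from the normalized matrix.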
Fig. 4 SVM visualization in 2D.
Fig. 5 Decision boundaries created by the nearest neighbors for different values of K [24].
Fig. 6 Random forest model consists of four decision trees.
Fig. 7 The basic structure of XGBoost [24].
Fig. 8 The basic concept of k-fold cross-validation. P stands for performance.
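The k-fold procedure of Fig. 8 takes a few lines with scikit-learn. A minimal sketch on synthetic data, assuming scikit-learn is available; Random Forest stands in here for any of the paper's four classifiers, and the data dimensions are invented.

```python
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in for the image feature vectors (shapes invented).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = RandomForestClassifier(random_state=0)

# 10-fold cross-validation, as used in the paper: train on 9 folds,
# score on the held-out fold, repeat for each fold, then average.
scores = cross_val_score(clf, X, y, cv=10)
print(scores.mean())
```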
The hyperparameter spaces of the machine learning models.
| Model | Hyperparameter Spaces |
|---|---|
| SVM | Kernel = { |
| RF | Min samples leaf = { |
| KNN | Number of neighbors (K) = {3, |
| XGBoost | Learning rate = {0.01, 0.1, |
* The bold parameters were selected for overall results.
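A tuning test over a grid like the one in the table above can be sketched with scikit-learn's `GridSearchCV`, which scores every parameter combination by cross-validation and keeps the best. The value sets below are hypothetical (the table's actual parameter lists are truncated in this record), and KNN stands in for any of the four models.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import make_classification

# Synthetic stand-in data (shapes invented).
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical grid; only K = 3 is visible in the table above.
grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}
search = GridSearchCV(KNeighborsClassifier(), grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # the combination with the best CV score
```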
Results of GLCM features.
| GLCM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 72.66% | 73.21% | 75.98% | 75.25% |
| RF | 77% | 79.57% | 79.67% | 79.68% |
| KNN | 73.40% | 74.50% | 76.40% | 74.10% |
| XGBoost | 79% | 77.90% | 78% | |
Results of GLRLM features.
| GLRLM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 72.60% | 70.4% | 71.42% | 70.93% |
| RF | 79% | 82.46% | 80.9% | |
| KNN | 74.27% | 75.6% | 78.75% | 77.96% |
| XGBoost | 78.42% | 79.42% | 82.42% | 78.3% |
Results of GLSZM features.
| GLSZM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 73.99% | 74.35% | 80.13% | 78.21% |
| RF | 82% | 84.14% | 82.85% | |
| KNN | 78% | 79.3% | 81.64% | 80.4% |
| XGBoost | 80.54% | 82.11% | 82.8% | 82.18% |
Results of GLCM, GLRLM, GLSZM combination features.
| GLCM + GLRLM + GLSZM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 82.85% | 84.04% | 83.79% | 83.41% |
| RF | 86.22% | 85.77% | 86% | |
| KNN | 80.9% | 84% | 81.1% | 80.2% |
| XGBoost | 84.32% | 85.14% | 85.78% | 85.45% |
Results of GLDM, GLRLM, GLSZM combination features.
| GLDM + GLRLM + GLSZM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 78.95% | 75.88% | 89.66% | 81.54% |
| RF | 82.82% | 86.57% | 85% | |
| KNN | 82.98% | 84.48% | 81.8% | 83.54% |
| XGBoost | 82.44% | 84.72% | 83.48% | 84.47% |
Fig. 9 Confusion matrix of RF classifier results for GLDM, GLRLM, GLSZM feature combination.
Results of GLSZM + NGTDM + GLDM combination features.
| GLSZM + NGTDM + GLDM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 70.11% | 72.72% | 74.34% | 72.15% |
| RF | 84.25% | 85.29% | 84.64% | |
| KNN | 80.96% | 81.33% | 82.87% | 82.37% |
| XGBoost | 82.83% | 82.88% | 85.41% | 83.51% |
Results of GLCM + NGTDM + GLDM combination features.
| GLCM + NGTDM + GLDM | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 74.78% | 76.87% | 79.98% | 79.97% |
| RF | 81% | 83.6% | 83% | |
| KNN | 78.28% | 78.78% | 79.4% | 78.7% |
| XGBoost | 80.41% | 82.57% | 78.42% | 81.73% |
Results of the Shape features.
| Shape | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 91.3% | 91.3% | 90% | 90.64% |
| RF | 92.34% | 93.37% | 92% | |
| KNN | 88.75% | 87.95% | 91% | 89.38% |
| XGBoost | 91.42% | 91.83% | 93.22% | 92.37% |
Results of the First Order Statistics features.
| First Order | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| SVM | 94.36% | 100% | 88.9% | 94.39% |
| RF | 94.1% | 96.1% | 93.59% | 94.25% |
| KNN | 83.9% | 90.86% | 81.17% | 84.98% |
| XGBoost | 98.65% | 99.73% | 96.98% | 98.35% |
Fig. 10 Confusion matrix of RF classifier results for shape features.
Fig. 11 Confusion matrix of XGBoost classifier results for first order statistics features.
Performance comparison of the proposed method with the literature studies.
| Method | Dataset Size | Accuracy(%) |
|---|---|---|
| Ardakani et al. | 612 | 91.94 |
| Al-Karawi et al. | 470 | 95.37 |
| Barstugan et al. | 150 | 99.64 |
| Dey et al. | 400 | 87.00 |
| Liu et al. | 88 | 94.16 |
| Özkaya et al. | 150 | 98.27 |
| Kassani et al. | 274 | 99.00 |
| Shi et al. | 2,685 | 89.40 |
| Zheng et al. | 540 | 90.80 |
| Xu et al. | 618 | 86.70 |
| Song et al. | 274 | 93.00 |
| Wang et al. | 1065 | 89.50 |
| Alsharman et al. | 812 | 82.14 |
| Proposed Method | 746 | 98.65 |
Performance comparison between the proposed method and literature studies on the same dataset.
| Method | ACC | PRE | SENS | F1 |
|---|---|---|---|---|
| Saeedi et al. | 90.61 | 89.76 | 90.80 | 90.28 |
| Pham | 96.20 | 92.22 | 95.78 | 96.00 |
| Sakagianni et al. | 88.10 | 88.57 | 86.11 | 87.32 |
| Elaziz et al. | 78.30 | 78.50 | 78.30 | 78.40 |
| Madhavi et al. | 97.50 | 96.73 | 98.34 | 97.48 |
| Shaik et al. | 97.79 | 97.77 | 97.84 | 97.78 |
| Cruz | 86.70 | 88.17 | 83.67 | 85.86 |
| Polsinelli et al. | 85.03 | 85.01 | 87.55 | 86.26 |
| Proposed Method | 98.65 | 99.73 | 96.98 | 98.35 |
Independent test set scores of the proposed method on the COVID-CT dataset.
| Feature vector | Classifier | ACC | PRE | SENS | F1 |
|---|---|---|---|---|---|
| GLSZM | SVM | 70.67 | 69.15 | 81.25 | 74.71 |
| | RF | 83.33 | 82.35 | 87.50 | 84.85 |
| | KNN | 74.00 | 78.87 | 70.00 | 74.17 |
| | XGBoost | 82.00 | 80.46 | 87.50 | 83.83 |
| GLCM + GLRLM + GLSZM | SVM | 82.67 | 80.00 | 90.00 | 84.71 |
| | RF | 88.00 | 85.23 | 93.75 | 89.29 |
| | KNN | 80.67 | 80.00 | 85.00 | 82.42 |
| | XGBoost | 86.00 | 85.54 | 88.75 | 87.12 |
| GLSZM + NGTDM + GLDM | SVM | 81.33 | 77.66 | 91.25 | 83.91 |
| | RF | 85.33 | 84.52 | 88.75 | 86.59 |
| | KNN | 81.33 | 80.23 | 86.25 | 83.13 |
| | XGBoost | 85.33 | 84.52 | 88.75 | 86.59 |
| Shape | SVM | 90.00 | 90.12 | 91.25 | 90.68 |
| | RF | 94.00 | 94.00 | 94.00 | 94.00 |
| | KNN | 92.00 | 92.23 | 92.00 | 91.97 |
| | XGBoost | 94.67 | 94.67 | 94.67 | 94.67 |
| First Order | SVM | 96.67 | 96.89 | 96.67 | 96.67 |
| | RF | 96.67 | 96.67 | 96.67 | 96.67 |
| | KNN | 87.33 | 87.35 | 87.33 | 87.34 |
| | XGBoost | 98.67 | 98.67 | 98.67 | 98.67 |