Automated detection of COVID-19 cases from chest X-ray images using deep neural network and XGBoost.

Abstract

INTRODUCTION: Since the COVID-19 pandemic began in late 2019, many researchers and scholars have tried to provide methods for detecting COVID-19 cases. Accordingly, this study focused on identifying patients with COVID-19 from chest X-ray images.
METHODS: In this paper, a method for diagnosing coronavirus disease from X-ray images was developed. In this method, the DenseNet169 Deep Neural Network (DNN) was used to extract features from X-ray images of the patients' chests. The extracted features were then given as input to the Extreme Gradient Boosting (XGBoost) algorithm to perform the classification task.
RESULTS: Evaluation of the proposed approach and its comparison with methods presented in recent years revealed that this method was more accurate and faster than the existing ones and had an acceptable performance for detecting COVID-19 cases from X-ray images. In the two-class and three-class problems, respectively, the experiments showed accuracies of 98.24% and 89.70%, specificities of 99.78% and 100%, and sensitivities of 92.08% and 95.20%.
CONCLUSION: This study aimed to detect people with COVID-19, focusing on non-clinical approaches. The developed method could be employed as an initial detection tool to assist radiologists in diagnosing the disease more accurately and quickly.
IMPLICATION FOR PRACTICE: The proposed method's simple implementation, along with its acceptable accuracy, allows it to be used in COVID-19 diagnosis. Moreover, gradient-weighted class activation mapping (Grad-CAM) can be used to represent the deep neural network's decision area on a heatmap. Radiologists might use this heatmap to evaluate the chest area more accurately.

Keywords: COVID-19; Chest X-ray images; Deep neural network (DNN); DenseNet169; XGBoost

Year: 2022 PMID: 35410707 PMCID: PMC8958100 DOI: 10.1016/j.radi.2022.03.011

Source DB: PubMed Journal: Radiography (Lond) ISSN: 1078-8174

Introduction

The COVID-19 virus was reported in Wuhan, China, in late December 2019, when its cause was still unknown, after which it spread rapidly throughout the world.¹,²,³ Within 30 days, the virus had spread to most parts of China. The infectious disease caused by this virus was named COVID-19 by the World Health Organization (WHO) on February 11, 2020. COVID-19 was reported in Iran on February 21, 2020. By July 20, 2021, about 3.5 million confirmed cases had been identified in Iran and about 192 million worldwide. Most types of coronavirus affect animals, but they can also be transmitted to humans owing to their zoonotic nature. The Severe Acute Respiratory Syndrome-associated coronavirus (SARS-CoV) causes severe respiratory disease and death in humans. The well-known signs and symptoms of COVID-19 include fever, cough, sore throat, headache, fatigue, muscle pain, and shortness of breath. Since the onset of the pandemic, COVID-19 has directly affected life in most communities, including human health, social welfare, businesses, and social relationships. It has also had indirect effects, such as reducing the quality of education in schools and universities, weakening family relationships, and decreasing sports activities.
The most common method of diagnosing COVID-19 in individuals is the Real-Time Reverse Transcription-Polymerase Chain Reaction (RT-PCR) assay. However, identification through this approach is time-consuming, and the results may have a high false-negative rate. Alternatively, chest radiographic imaging methods, such as Computed Tomography (CT) scans and X-rays, can play a vital and effective role in the timely diagnosis and treatment of this disease, especially in pregnant women and children. Chest X-ray Radiograph (CXR) images are mostly utilized to diagnose chest pathology and have rarely been applied to detect COVID-19.
This research was conducted on these types of images because radiographic imaging devices are available in most hospitals and specialized clinics. According to previous studies, radiological images of patients with COVID-19 carry important and useful information for identifying the virus in the body. However, one disadvantage of CXR images is that they cannot resolve soft tissues with poor contrast and therefore cannot determine the degree of a patient's lung involvement. To compensate for this shortcoming, Computer-Aided Diagnosis (CAD) systems can be employed. Most CAD systems depend on Graphics Processing Units (GPUs), which are used to run medical image processing algorithms, such as image enhancement and organ or tumor segmentation. The development of artificial intelligence, especially branches such as Machine Learning and Deep Learning (DL), has made this process more intelligent than it would be with human intelligence alone. Artificial intelligence has also significantly increased the speed of diagnosis and even treatment in fields such as the medical sciences. For instance, it has contributed effectively to the medical community and patients in areas such as diagnosing lung and cardiovascular diseases and performing brain surgery.
Advances in DL have shown promising results in medical image analysis and radiology. DL has various architectures, each with a variety of applications in the related fields. One type of DL architecture is the Deep Convolutional Neural Network (DCNN), which is employed specifically in image processing. Its applications include pattern recognition and image classification. Depending on the problem involved, DCNNs can be used in many different ways. One of the existing methods is to use pre-trained neural networks, which were utilized in this research.
In this approach, freely available pre-trained models are employed, and image features are extracted using DNNs. The second step, following the extraction of image features, is to apply a classification method. Among the various classification methods, such as Support Vector Machines (SVMs) and Decision Trees, the XGBoost classifier was applied in this paper.
Apostolopoulos et al. carried out a study on a set of X-ray images from patients with pneumonia, patients with COVID-19, and healthy individuals to assess Convolutional Neural Network (CNN) performance. In that research, transfer learning was utilized, and the research process was done in 3 stages. The results demonstrated that the use of DL could lead to the extraction of significant features related to COVID-19. In another study, Wang et al. presented the COVID-Net network (a DCNN for COVID-19 detection), implemented on X-ray images. Their proposed network could help physicians during the screening phase. Sethy et al. utilized DL and SVM to detect coronavirus-involved patients using X-ray images. Since SVM provides a powerful approach, it was applied in their classification process. Hemdan et al. proposed COVIDX-Net, which consisted of VGG16 and Google MobileNet. Mishra et al. proposed a decision fusion approach, which combined predictions from varied DCNNs to identify COVID-19 from chest CT images. Narin et al. proposed five models for diagnosing patients with pneumonia and coronavirus via X-ray images. These models were based on pre-trained CNNs, namely ResNet 152, ResNet 101, ResNet 50, Inception-ResNetV2, and InceptionV3. Pandit et al. employed the pre-trained VGG-16 to detect COVID-19 from chest radiographs and achieved 96% accuracy. Song et al. developed a system to identify patients with COVID-19 using CT images collected from hospitals in two provinces of China.
Javadi Moghaddam and Gholamalinejad developed a novel DL structure for COVID-19 detection. The pooling layer of their proposed structure was a combination of pooling and a Squeeze-and-Excitation block layer. They also used the Mish activation function to improve convergence. Ozturk et al. proposed a new model for detecting COVID-19 from X-ray images. Their model addressed two problems: binary classification (distinguishing COVID-19 from the "no-finding" class) and multi-class classification (distinguishing the COVID-19, pneumonia, and "no-finding" classes). They proposed a DCNN called DarkCovidNet, which included 17 convolutional layers, and achieved 98.08% and 87.02% accuracies in their binary and multi-class classifications, respectively.
The rest of the paper is organized as follows: the Methodology section describes the dataset, the XGBoost algorithm, and the proposed method. The Results and Discussion sections present and discuss the experimental results. Finally, the Conclusion section concludes the paper.

Methodology

In this section, first, the dataset used in this study is presented. Then the XGBoost algorithm employed for classification in our proposed method is introduced, and finally, the proposed method is presented.

Dataset

As in the study by Ozturk et al., X-ray images from two different sources were used as the dataset in this paper: (1) the COVID-19 X-ray image dataset collected by Cohen, and (2) the ChestX-ray8 dataset collected by Wang et al. Cohen collected images from public sources and through indirect collection from hospitals and physicians. This project was approved by the University of Montreal's Ethics Committee. Fig. 1 depicts sample images from this dataset. There are 43 female and 82 male cases in the dataset, and the subjects' average age is approximately 55 years.

Figure 1

Sample images in Cohen's dataset.

The ChestX-ray8 dataset was employed for normal and pneumonia images. This dataset consists of 108,948 frontal-view X-ray images of 32,717 unique patients, from which 500 no-findings and 500 pneumonia chest X-ray images were selected randomly. Overall, the dataset used in this study contained 1125 chest X-ray images: 125 images labeled as COVID-19, 500 images labeled as pneumonia, and 500 images labeled as no findings. Fig. 2 shows sample images from the ChestX-ray8 dataset.

Figure 2

Sample images in the ChestX-ray8 dataset.

XGBoost

XGBoost is an efficient and scalable tree-boosting algorithm proposed by Chen and Guestrin in 2016. It is an improved version of the Gradient Boosted Decision Tree (GBDT) method and differs from GBDT in that it has been shown to overcome GBDT's computational limitations.⁴¹,⁴²,⁴³ GBDT uses a first-order Taylor expansion of the loss function, whereas XGBoost uses a second-order Taylor expansion.⁴⁴,⁴⁵,⁴⁶ In addition, XGBoost adds a regularization term to the objective function to penalize model complexity and prevent overfitting.
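
For reference, the second-order approximation and the regularization term can be written as below. This follows the formulation in Chen and Guestrin's original description of XGBoost; the symbols are not defined in this paper and are introduced here only for illustration. At boosting round t, f_t is the tree being added, l is the loss, g_i and h_i are its first and second derivatives at the previous prediction, T is the number of leaves, w is the vector of leaf weights, and γ and λ are regularization coefficients.

```latex
\begin{align}
\mathcal{L}^{(t)} &\approx \sum_{i=1}^{n}\Big[\, l\big(y_i,\hat{y}_i^{(t-1)}\big)
    + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t^{2}(x_i) \Big] + \Omega(f_t), \\
g_i &= \frac{\partial\, l\big(y_i,\hat{y}_i^{(t-1)}\big)}{\partial\, \hat{y}_i^{(t-1)}}, \qquad
h_i = \frac{\partial^{2} l\big(y_i,\hat{y}_i^{(t-1)}\big)}{\partial \big(\hat{y}_i^{(t-1)}\big)^{2}}, \\
\Omega(f) &= \gamma\, T + \tfrac{1}{2}\, \lambda\, \lVert w \rVert^{2}.
\end{align}
```

Dropping the h_i term gives the first-order view used by GBDT, and the Ω term is the penalty on model complexity mentioned above.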

Proposed method

Building on similar past research and on common ways of applying artificial intelligence to image processing, especially for medical images, our proposed method uses image features extracted by pre-trained networks. Transfer learning is one application of artificial intelligence. In this technique, networks are designed and trained on a very large set of available data, and the weights of the network layers are learned. For example, in image processing, the ImageNet dataset contains millions of images in 1000 different classes. Pre-trained networks can be used in several ways, including the following: (1) using the structure of a pre-designed network to train one's own model by removing the last layer of the network and adding new layers to perform classification; (2) extracting image features with a pre-trained model and passing the extracted features to another algorithm for classification. In this paper, the features were extracted using the second method, and the XGBoost classifier was employed for classification. The images were first given as input to the DenseNet169 DNN so that the network could extract image features. The extracted features were then given as input to the XGBoost algorithm to perform the classification. The framework of the proposed method can be seen in Fig. 3. The proposed method was implemented using Python 3.8 and Keras 2.4 (i.e., the Python deep learning API).

Figure 3

Framework of the proposed method.
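
The two-stage framework can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names, the use of ImageNet weights, global average pooling (which yields a 1664-dimensional vector from DenseNet169), the Keras `preprocess_input` step, and the mapping of the reported hyperparameters onto `XGBClassifier` argument names are all assumptions. The heavy libraries are imported inside the functions so that the module itself loads without TensorFlow or XGBoost installed.

```python
# Reported XGBoost settings, mapped onto XGBClassifier argument names.
# The mapping (e.g. gamma for the gamma parameter) is an assumption.
XGB_PARAMS = {
    "booster": "gbtree",      # gradient boosted trees as base learner
    "tree_method": "exact",   # exact greedy tree construction
    "n_estimators": 100,      # number of boosted trees
    "learning_rate": 0.44,    # eta
    "gamma": 0,
    "max_depth": 6,
}


def build_feature_extractor():
    """Return a headless DenseNet169 that maps 224x224x3 images to 1664 features."""
    from tensorflow.keras.applications import DenseNet169

    return DenseNet169(
        weights="imagenet",       # assumed pre-training source
        include_top=False,        # drop the ImageNet classification layer
        pooling="avg",            # assumed: global average pooling -> 1664 values
        input_shape=(224, 224, 3),
    )


def extract_features(extractor, images):
    """images: NumPy array of shape (n, 224, 224, 3) with RGB values in [0, 255]."""
    from tensorflow.keras.applications.densenet import preprocess_input

    return extractor.predict(preprocess_input(images.astype("float32")), verbose=0)


def train_classifier(features, labels):
    """labels: integer class ids (0, 1, ...); only this classifier is trained."""
    from xgboost import XGBClassifier

    classifier = XGBClassifier(**XGB_PARAMS)
    classifier.fit(features, labels)
    return classifier
```

A typical call sequence would be `train_classifier(extract_features(build_feature_extractor(), images), labels)`; the DenseNet169 weights stay frozen throughout.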

Results

This research was done in two phases. In the first phase, the best pre-trained DNN was selected to extract the features, and in the second phase, the XGBoost classifier parameters were set by trial and error. The dataset was used for two cases: a 2-class problem, including COVID-19 and no findings (625 images), and a 3-class problem, consisting of COVID-19, pneumonia, and no findings (1125 images). In the first phase, 17 pre-trained neural networks were assessed, with the XGBoost classifier applied with its default parameters for classification. Table 1 shows the average accuracy of the DNNs for each of the 2-class and 3-class problems. A 5-fold cross-validation method was applied to obtain the average accuracies in this experiment.
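
As an illustration of the 5-fold protocol on the 2-class data, the sketch below splits 125 COVID-19 and 500 no-findings labels into five folds. The text does not state whether the folds were stratified or how they were shuffled, so the stratification, the random seed, and the function name `stratified_folds` are assumptions.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class proportions."""
    rng = random.Random(seed)
    indices_by_class = defaultdict(list)
    for index, label in enumerate(labels):
        indices_by_class[label].append(index)

    folds = [[] for _ in range(k)]
    for indices in indices_by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    return folds


# Class counts of the 2-class problem described above.
labels = ["covid"] * 125 + ["no_findings"] * 500
folds = stratified_folds(labels)

for number, fold in enumerate(folds, start=1):
    # Each fold serves once as the test set; the other four form the training set.
    print(number, Counter(labels[i] for i in fold))
```

With these counts, every fold holds 25 COVID-19 and 100 no-findings images, so each test fold contains 125 images.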

Table 1

Comparison of the average accuracies of the different DNNs.

DNN	Three-class problem, average accuracy (%)	Two-class problem, average accuracy (%)
Xception	78.84	93.59
VGG16	81.68	96.48
VGG19	80.08	95.36
ResNet 50	80.71	95.51
ResNet 152	79.55	95.68
ResNet50V2	80.53	94.71
ResNet101V2	76.88	93.95
ResNet152V2	77.60	93.59
InceptionV3	79.02	92.79
InceptionResNetV2	68.44	90.72
MobileNet	79.55	95.51
MobileNetV2	82.57	96.16
DenseNet121	82.51	96.32
DenseNet169	83.02	97.43
DenseNet201	82.31	96.63
NASNetMobile	74.57	93.11
EfficientNetB0	80.00	97.28

As can be seen, the DenseNet169 network has the best accuracy in both cases. As a result, this network was selected as the feature extractor of the proposed model in the second phase. The input to this network was an image with dimensions of 224 × 224 × 3, and its output consisted of 1664 features extracted from the given image. After the network type was determined, the parameters of the XGBoost classifier were set. Table 2 shows the parameters used in the XGBoost algorithm.

Table 2

The XGBoost parameter settings.

Parameter	Value
Base Learner	Gradient boosted tree
Tree construction algorithm	Exact greedy
Number of gradient-boosted trees	100
Learning rate (η)	0.44
Lagrange multiplier (γ)	0
Maximum depth of trees	6

A 5-fold cross-validation was used for the 2-class problem. In the 3-class problem, 80% of the dataset was used for training and the remaining 20% as the test set. The average accuracy for the 2-class problem was 98.24%, and the test accuracy for the 3-class problem was 89.70%. The confusion matrices for each of the five folds in the 2-class problem are shown in Fig. 4, and the confusion matrix for the 3-class problem is shown in Fig. 5.

Figure 4

Confusion matrices for the 2-class problem.

Figure 5

Confusion matrix for the 3-class problem.
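
The performance metrics reported for these confusion matrices follow the standard definitions, which can be sketched as below. The function name and the example counts are hypothetical and are not taken from the figures; COVID-19 is treated as the positive class.

```python
def binary_metrics(tp, fn, fp, tn):
    """Return standard metrics (in percent) from binary confusion-matrix counts."""

    def percent(numerator, denominator):
        # Guard against empty denominators (e.g. no positive predictions).
        return 100.0 * numerator / denominator if denominator else 0.0

    sensitivity = percent(tp, tp + fn)   # recall on the positive class
    specificity = percent(tn, tn + fp)   # recall on the negative class
    precision = percent(tp, tp + fp)
    total = precision + sensitivity
    f1_score = 2.0 * precision * sensitivity / total if total else 0.0
    accuracy = percent(tp + tn, tp + fn + fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1_score": f1_score,
        "accuracy": accuracy,
    }


# Hypothetical fold: 25 COVID-19 images (20 detected) and 100 no-findings images.
example = binary_metrics(tp=20, fn=5, fp=0, tn=100)
print(example)
```

In this made-up fold, five missed COVID-19 cases lower sensitivity to 80% while accuracy stays at 96%, which shows why accuracy alone can hide weak sensitivity on an unbalanced dataset.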

The results of comparing the proposed approach with the method proposed by Ozturk et al. for the 3-class and 2-class problems can be observed in Tables 3 and 4, respectively. A comparison of the results obtained in this study with those of other proposed methods is given in Table 5.

Table 3

Comparison of the proposed method with DarkCovidNet (3-class problem).

Performance Metrics (%)	Proposed Method	DarkCovidNet
Sensitivity	95.20	88.17
Specificity	100	93.66
Precision	92.50	90.97
F1-score	91.20	89.44
Accuracy	89.70	89.33

Table 4

Comparison of the proposed method with DarkCovidNet (2-class problem).

Performance Metrics (%)	Method	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Average
Sensitivity	Proposed Method	95.20	95.40	96.70	81.40	91.40	92.08
Sensitivity	DarkCovidNet	100	96.42	90.47	93.75	93.18	95.13
Specificity	Proposed Method	100	100	100	89.90	100	99.78
Specificity	DarkCovidNet	100	96.42	90.47	93.75	93.18	95.30
Precision	Proposed Method	99.50	99.50	99.40	95.30	99.02	98.54
Precision	DarkCovidNet	100	94.52	98.14	98.57	98.58	98.03
F1-score	Proposed Method	98.50	98.50	98.20	92.50	97.30	97.00
F1-score	DarkCovidNet	100	95.52	93.79	95.93	95.62	96.51
Accuracy	Proposed Method	99.20	99.20	99.20	95.20	98.40	98.24
Accuracy	DarkCovidNet	100	97.60	96.80	97.60	97.60	98.08

Table 5

Comparison of the proposed method with other DL-based methods.

Study	Type of Images	Number of Samples	Method Used	Accuracy (%)
Apostolopoulos et al.²⁸	Chest X-ray	1428	VGG-19	93.48
Wang et al.²⁹	Chest X-ray	13,645	COVID-Net	92.40
Sethy et al.³⁰	Chest X-ray	50	ResNet 50 + SVM	95.38
Hemdan et al.³¹	Chest X-ray	50	COVIDX-Net	90.00
Narin et al.³³	Chest X-ray	100	Deep CNN ResNet-50	98.00
Song et al.³⁵	Chest CT	1485	DRE-Net	86.00
Wang et al.⁴⁸	Chest CT	453	M-Inception	82.90
Zheng et al.⁴⁹	Chest CT	542	UNet + 3D Deep Network	90.80
Xu et al.⁵⁰	Chest CT	443	ResNet + Location Attention	86.60
Ozturk et al.¹²	Chest X-ray	625	DarkCovidNet	98.08
Ozturk et al.¹²	Chest X-ray	1125	DarkCovidNet	89.33
Proposed Method	Chest X-ray	625	DenseNet169 + XGBoost	98.24
Proposed Method	Chest X-ray	1125	DenseNet169 + XGBoost	89.70

Discussion

As can be seen in Tables 3 and 4, the proposed method outperforms the DarkCovidNet network on every reported metric in the 3-class problem and on every metric except sensitivity in the 2-class problem. Notably, the proposed approach had a higher speed and lower computational complexity than the method presented by Ozturk et al. because it did not require training the DNN; only the XGBoost algorithm was trained. Table 5 shows that the proposed method is more accurate than the other DL-based models. However, it should be noted that the results presented in Table 5 were obtained from different datasets. This study's limitations include the use of an unbalanced dataset with a limited number of COVID-19 X-ray images and low sensitivity in the two-class problem. To compare the performance of XGBoost with other machine learning algorithms, we replaced XGBoost with Random Forest and with SVM as the classifier and repeated the experiments. A linear kernel was used for the SVM. Table 6 shows the results of comparing the different machine learning algorithms. As can be seen, XGBoost outperforms the other machine learning algorithms in both the 2-class and 3-class problems.

Table 6

Comparison of the different machine learning algorithms.

Method	2-class problem accuracy (%)	3-class problem accuracy (%)
DenseNet169 + XGBoost	98.24	89.70
DenseNet169 + Random Forest	95.85	80.15
DenseNet169 + SVM	96.96	79.20

To further analyze the proposed method, gradient-weighted class activation mapping (Grad-CAM) was used to represent the decision area on a heatmap. Fig. 6 depicts the heatmaps for two confirmed COVID-19 cases. As can be seen, the model concentrates mainly on the lung area, which suggests that the developed method extracted relevant features. Radiologists can employ the heatmap to evaluate the chest area more accurately.

Figure 6

Heatmap of two confirmed COVID-19 cases.
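
The core Grad-CAM computation behind such heatmaps can be sketched as below. The text does not describe how Grad-CAM was attached to the DenseNet169 and XGBoost pipeline, so this shows only the generic combination step, assuming the last convolutional feature maps and the gradients of the class score with respect to them are already available. The function name and the toy 2 × 2 inputs are hypothetical.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps (nested lists, K x H x W) into one H x W heatmap."""
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])

    # Channel weight: spatial average of the class-score gradient for that channel.
    alphas = [
        sum(sum(row) for row in gradient) / (height * width)
        for gradient in gradients
    ]

    # Weighted sum over channels, then ReLU to keep only positive evidence.
    heatmap = [
        [
            max(0.0, sum(a * fmap[i][j] for a, fmap in zip(alphas, feature_maps)))
            for j in range(width)
        ]
        for i in range(height)
    ]

    # Scale to [0, 1] so the map can be overlaid on the X-ray image.
    peak = max(max(row) for row in heatmap)
    if peak > 0:
        heatmap = [[value / peak for value in row] for row in heatmap]
    return heatmap


# Toy input: two 2 x 2 feature maps with constant gradients of 1.0 and -0.5.
feature_maps = [[[1.0, 2.0], [3.0, 4.0]], [[4.0, 3.0], [2.0, 1.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-0.5, -0.5], [-0.5, -0.5]]]
heatmap = grad_cam(feature_maps, gradients)
print(heatmap)
```

In practice the resulting map is upsampled to the input resolution (224 × 224 here) before being drawn over the radiograph.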

Conclusion

This study aimed to detect and identify people with COVID-19, focusing on non-clinical approaches and artificial intelligence techniques. In the proposed method, DenseNet169 was employed to extract image features, and the XGBoost algorithm was used for classification. The obtained results revealed that the detection accuracy of the proposed method in the 2-class problem was 98.24%, which was higher than that of the other methods compared. In the 3-class problem, the method reached 89.70% accuracy, indicating better performance than the DarkCovidNet network. Besides being highly accurate, the proposed approach had a higher speed and lower computational complexity than the other proposed methods since it did not require training the DNN.

Funding statement

This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.

Conflict of interest statement

The authors declare that they have no conflicts of interest.
