Warda M Shaban, Asmaa H Rabie, Ahmed I Saleh, M A Abo-Elsoud.
Abstract
COVID-19, as an infectious disease, has shocked the world and still threatens the lives of billions of people. The detection of COVID-19 has recently become a critical task for medical practitioners. Unfortunately, COVID-19 spreads very quickly between people and reached millions of people worldwide within a few months. It is therefore essential to identify infected people quickly and accurately so that measures to prevent further spread can be taken. Although several medical tests have been used to detect certain injuries, the desired detection efficiency has not yet been achieved. In this paper, a new Hybrid Diagnose Strategy (HDS) is introduced. HDS relies on a novel technique for ranking selected features by projecting them into a proposed Patient Space (PS). A Feature Connectivity Graph (FCG) is constructed that indicates both the weight of each feature and its binding degree to other features. The rank of a feature is determined by two factors: the first is the feature weight, and the second is its binding degree to its neighbors in PS. The ranked features are then used to derive a classification model that classifies new persons as infected or not infected. The classification model is a hybrid model consisting of two classifiers: a fuzzy inference engine and a Deep Neural Network (DNN). The proposed HDS has been compared against recent techniques. Experimental results show that HDS outperforms its competitors in terms of average accuracy, precision, recall, and F-measure, achieving about 97.658%, 96.756%, 96.55%, and 96.615%, respectively. Additionally, HDS provides the lowest error value, 2.342%. Further, the results were validated statistically using the Wilcoxon Signed Rank Test and the Friedman Test.
Keywords: COVID-19; Classification; Feature selection; Fuzzy logic
Year: 2020 PMID: 33204229 PMCID: PMC7659585 DOI: 10.1016/j.asoc.2020.106906
Source DB: PubMed Journal: Appl Soft Comput ISSN: 1568-4946 Impact factor: 6.725
Fig. 1. COVID-19 epidemic curve with and without protective measures.
Predicted number of COVID-19 cases using X = 3 and m = 5 days.
| Number of incubation periods (m) | Day | Predicted incident cases (E) | Predicted total cases (E) |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 1 | 5 | 3 | 4 |
| 2 | 10 | 9 | 13 |
| 3 | 15 | 27 | 40 |
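The rows above follow a geometric progression: each incubation period multiplies the incident cases by X, so the incident count after m periods is X^m and the total is the running sum. A minimal sketch that reproduces the table under that reading; the function name `project_cases` and its parameter names are illustrative, not taken from the paper.

```python
def project_cases(x=3, periods=3, days_per_period=5):
    """Return (m, day, incident, total) rows for a geometric case projection.

    Assumption: incident cases after m incubation periods are x**m, and the
    total is the running sum of incident cases.
    """
    rows = []
    total = 0
    for m in range(periods + 1):
        incident = x ** m              # cases newly infected in period m
        total += incident              # cumulative cases up to period m
        rows.append((m, m * days_per_period, incident, total))
    return rows


for row in project_cases():
    print(row)
```

Raising X by even a small amount changes the totals quickly, which is why protective measures that lower X flatten the curve in Fig. 1.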
Predicted number of COVID-19 deaths using X = 3, m = 5 days, D = 10%, and n = 14 days.
| Number of incubation periods | Day | Predicted incident cases (E) | Predicted new deaths (S) | Predicted total deaths (S) |
|---|---|---|---|---|
| 0 | 0 | 1 | 0 | 0.0 |
| 1 | 5 | 3 | 0 | 0.0 |
| 2 | 10 | 9 | 0 | 0.0 |
|  | 14 | 0 | 0.1 | 0.1 |
| 3 | 15 | 27 | 0 | 0.1 |
|  | 19 | 0 | 0.3 | 0.4 |
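Reading the table, each cohort of incident cases contributes D × (cases) deaths exactly n days after it appears: the single day-0 case yields 0.1 deaths on day 14, and the three day-5 cases yield 0.3 deaths on day 19. The sketch below assumes that reading; `project_deaths` and the `horizon` cut-off (day 19, where the table stops) are hypothetical names.

```python
def project_deaths(x=3, periods=3, days_per_period=5,
                   death_rate=0.10, lag_days=14, horizon=19):
    """Return (day, new_deaths, total_deaths) for days on which deaths occur.

    Assumption: the cohort of x**m incident cases appearing on day
    m * days_per_period produces death_rate * cases deaths lag_days later.
    Only deaths falling on or before `horizon` are reported.
    """
    new_deaths = {}
    for m in range(periods + 1):
        death_day = m * days_per_period + lag_days
        if death_day <= horizon:
            new_deaths[death_day] = new_deaths.get(death_day, 0.0) + death_rate * x ** m
    timeline, total = [], 0.0
    for day in sorted(new_deaths):
        total += new_deaths[day]
        # Round for display: raw floats carry tiny binary errors (0.30000000000000004).
        timeline.append((day, round(new_deaths[day], 4), round(total, 4)))
    return timeline


print(project_deaths())  # [(14, 0.1, 0.1), (19, 0.3, 0.4)]
```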
Fig. 2. Different COVID-19 diagnosis techniques.
Comparison of previous works on COVID-19 classification techniques.
| Used technique | Description | Advantages | Disadvantages |
|---|---|---|---|
| DarkCovidNet model | DarkCovidNet is an automated COVID-19 detection model that was introduced as a new detection method based on chest X-ray images. It applies deep learning to perform both binary and multi-class classification. | DarkCovidNet can be used in remote areas of countries affected by COVID-19 to overcome the shortage of radiologists. It can also be used to diagnose other chest-related diseases, including tuberculosis and pneumonia. | A limitation of this model is that it relies on a limited number of COVID-19 X-ray images. |
| Group Method of Data Handling (GMDH) model | GMDH was used as a binary classification model. It is a type of artificial neural network that was used to predict the number of confirmed COVID-19 cases in Hubei province. | GMDH can work with inadequate knowledge and is fault tolerant. | The main drawback of GMDH is the unexplained behavior of the network. |
| KNN Variant (KNNV) algorithm | KNNV was introduced to accurately and efficiently classify COVID-19 patients using incomplete and heterogeneous COVID-19 data. It inherits the merits of KNN: a different K value is calculated independently for each unknown patient, and the distances between patients are computed efficiently. | KNNV is a simple technique that uses the merits of the KNN method to classify COVID-19 patients. | KNNV is a lazy learning method with a high computational time. |
| Automated Detection and Patient Monitoring (ADPM) algorithm | ADPM was proposed for the detection, quantification, and tracking of COVID-19 patients. It uses a deep learning model to classify COVID-19 from CT images. | ADPM can distinguish COVID-19 patients from other patients and is efficient at classifying positive cases. | ADPM does not achieve optimal accuracy. |
| Proposed Convolutional Neural Network (CNN) | A CNN was proposed to accurately detect COVID-19 patients using the EfficientNet architecture. It performs binary and multi-class classification using X-ray images. | The CNN can accurately detect COVID-19 patients. | It is more complex. |
| Corona Patients Detection Strategy (CPDS) | CPDS was proposed to detect COVID-19 patients using an enhanced KNN classifier based on the most effective and significant features, which were selected using a Hybrid Feature Selection Methodology (HFSM). | CPDS can accurately detect infected patients with a minimal time penalty. | KNN is a lazy learner. |
Fig. 3. The proposed hybrid diagnose strategy.
Some laboratory findings of patients infected with COVID-19.
| Features | Description |
|---|---|
| C-Reactive Protein (CRP) | CRP is one of the plasma proteins known as acute-phase proteins. The pooled effect size showed that the CRP level was significantly higher in patients with severe COVID-19 than in patients with non-severe COVID-19. |
| Lactate Dehydrogenase (LDH) | LDH is a type of protein known as an enzyme. LDH plays an important role in the body's energy production. The pooled effect size showed that the LDH level was significantly higher in patients with severe COVID-19 than in patients with non-severe COVID-19. |
| Eosinophil | Eosinophils are a type of disease-fighting white blood cell. |
| Leukocytes (WBC) | WBC denotes the number of white blood cells in a blood sample. COVID-19 patients have a low WBC count on the first day. |
| Neutrophils | Neutrophils are the most abundant type of granulocytes and make up 40% to 70% of all white blood cells in humans. |
| Basophils | Basophils are a type of white blood cell. Although they are produced in the bone marrow, they are found in many tissues throughout the body. |
| Lymphocyte (LYM) | LYM is a small white blood cell (leukocyte) that plays a large role in defending the body against disease. It was significantly lower in patients with severe COVID-19 than in patients with non-severe COVID-19. |
| Platelets | Platelets are tiny blood cells that help the body form clots to stop bleeding. |
| Monocytes | Monocytes are a type of leukocyte, or white blood cell. They are the largest type of leukocyte and can differentiate into macrophages and myeloid lineage dendritic cells. |
| Alanine Aminotransferase (ALT) | ALT is an enzyme found primarily in the liver and kidney. It was originally referred to as serum glutamic pyruvic transaminase (SGPT). Normally, a low level of ALT exists in the serum. |
An illustration of estimating the feature impact.
| Feature | Accuracy with the feature | Accuracy without the feature | w(f) | Action |
|---|---|---|---|---|
| f1 | 0.86 | 0.83 | 0.03 | Keep |
| f2 | 0.88 | 0.88 | 0.0 | Remove |
| f3 | 0.85 | 0.87 | −0.02 | Remove |
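In the illustration, each feature's impact w(f) is the first accuracy minus the second, and only features with a strictly positive impact are kept. A minimal sketch of that rule, assuming the two accuracy columns are measured with and without the feature; the helper names are hypothetical.

```python
def feature_impact(acc_with, acc_without):
    """w(f): how much accuracy drops when feature f is removed."""
    return acc_with - acc_without


def select_features(accuracies):
    """Map feature -> (w(f), action) given (accuracy with f, accuracy without f).

    A feature is kept only if removing it lowers accuracy (w(f) > 0).
    """
    decisions = {}
    for name, (acc_with, acc_without) in accuracies.items():
        w = round(feature_impact(acc_with, acc_without), 6)  # tame float noise
        decisions[name] = (w, "Keep" if w > 0 else "Remove")
    return decisions


example = {"f1": (0.86, 0.83), "f2": (0.88, 0.88), "f3": (0.85, 0.87)}
for name, (w, action) in select_features(example).items():
    print(name, w, action)
```

A zero impact (f2) is treated as "Remove", since the feature adds cost without improving accuracy.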
Fig. 4. Feature Connectivity Graph (FCG) illustrative example.
Fig. 5. Identifying the friends of a feature (illustrative example).
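The abstract states that a feature's rank combines its own weight with its binding degree to its neighbors (its "friends") in the Patient Space, but the exact formula is not given in this excerpt. The sketch below assumes a linear combination with hypothetical coefficients `alpha` and `beta`, and takes the binding degree of a feature as the sum of the FCG edge weights incident to it. It illustrates the idea only and is not the paper's definition.

```python
def rank_features(weights, bindings, alpha=1.0, beta=0.5):
    """Rank the features of a Feature Connectivity Graph (hypothetical scoring).

    weights:  feature -> node weight w(f)
    bindings: (f, g) -> binding degree of the undirected edge between f and g
    score(f) = alpha * w(f) + beta * (sum of binding degrees to f's neighbors)
    """
    binding_sum = {f: 0.0 for f in weights}
    for (f, g), degree in bindings.items():
        binding_sum[f] += degree   # an edge binds both of its endpoints
        binding_sum[g] += degree
    scores = {f: alpha * weights[f] + beta * binding_sum[f] for f in weights}
    # Highest score first; ties broken by feature name so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy graph (hypothetical numbers): f1 is linked to both f2 and f3.
print(rank_features({"f1": 0.03, "f2": 0.05, "f3": 0.02},
                    {("f1", "f2"): 0.6, ("f1", "f3"): 0.4}))
```

Here f1 ranks first despite having a lower weight than f2, because it is strongly bound to the other features; that is the effect the binding term is meant to capture.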
Fig. 6. The employed fuzzy inference engine for COVID-19 detection.
Fig. 7. The membership functions for the considered fuzzy sets.
The assigned values of the membership-function parameters.
| Parameter | Assigned value |
|---|---|
|  | 2 |
|  | 4 |
|  | 6 |
|  | 6.5 |
|  | 12 |
|  | 19.5 |
|  | 0.5 |
|  | 1.3 |
|  | 2 |
|  | 450 |
|  | 530 |
|  | 650 |
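Fig. 7 defines Low (L), Medium (M), and High (H) fuzzy sets, and the twelve values above may form four triples, one per fuzzy input. Which triple belongs to which input, and the exact curve shapes, cannot be recovered from this excerpt. The sketch assumes a left shoulder for L, a triangle for M, and a right shoulder for H, each defined by a triple (a, b, c). Treating (450, 530, 650) as the LDH breakpoints is purely an illustrative assumption, and `fuzzify` is a hypothetical name.

```python
def fuzzify(x, a, b, c):
    """Degrees of membership of a crisp value x in the fuzzy sets L, M, H.

    Assumed shapes (not confirmed by the source):
      L: 1 up to a, falling linearly to 0 at b        (left shoulder)
      M: 0 at a, rising to 1 at b, falling to 0 at c  (triangle)
      H: 0 up to b, rising linearly to 1 at c         (right shoulder)
    """
    def falling(v, lo, hi):
        # 1 at or below lo, 0 at or above hi, linear in between.
        if v <= lo:
            return 1.0
        if v >= hi:
            return 0.0
        return (hi - v) / (hi - lo)

    low = falling(x, a, b)
    high = 1.0 - falling(x, b, c)
    if x <= a or x >= c:
        medium = 0.0
    elif x <= b:
        medium = (x - a) / (b - a)
    else:
        medium = (c - x) / (c - b)
    return {"L": low, "M": medium, "H": high}


# Hypothetical use: treat (450, 530, 650) as the breakpoints for LDH.
for ldh in (400, 490, 590, 700):
    print(ldh, fuzzify(ldh, 450, 530, 650))
```

With these shapes the three degrees always sum to 1, so every input value is fully distributed across the L, M, and H sets.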
Fig. 8. Sources of input data uncertainty.
The considered fuzzy rules.
| ID | WBC | LYM | MON | LDH | Rule output | ID | WBC | LYM | MON | LDH | Rule output | ID | WBC | LYM | MON | LDH | Rule output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | L | L | L | L | L | 28 | M | L | L | L | L | 55 | H | L | L | L | M |
| 2 | L | L | L | M | L | 29 | M | L | L | M | L | 56 | H | L | L | M | L |
| 3 | L | L | L | H | M | 30 | M | L | L | H | L | 57 | H | L | L | H | H |
| 4 | L | L | M | L | L | 31 | M | L | M | L | L | 58 | H | L | M | L | L |
| 5 | L | L | M | M | M | 32 | M | L | M | M | M | 59 | H | L | M | M | M |
| 6 | L | L | M | H | L | 33 | M | L | M | H | M | 60 | H | L | M | H | H |
| 7 | L | L | H | L | M | 34 | M | L | H | L | L | 61 | H | L | H | L | H |
| 8 | L | L | H | M | L | 35 | M | L | H | M | M | 62 | H | L | H | M | H |
| 9 | L | L | H | H | M | 36 | M | L | H | H | H | 63 | H | L | H | H | H |
| 10 | L | M | L | L | L | 37 | M | M | L | L | M | 64 | H | M | L | L | L |
| 11 | L | M | L | M | M | 38 | M | M | L | M | M | 65 | H | M | L | M | M |
| 12 | L | M | L | H | L | 39 | M | M | L | H | M | 66 | H | M | L | H | H |
| 13 | L | M | M | L | M | 40 | M | M | M | L | M | 67 | H | M | M | L | M |
| 14 | L | M | M | M | M | 41 | M | M | M | M | M | 68 | H | M | M | M | M |
| 15 | L | M | M | H | M | 42 | M | M | M | H | M | 69 | H | M | M | H | M |
| 16 | L | M | H | L | L | 43 | M | M | H | L | M | 70 | H | M | H | L | H |
| 17 | L | M | H | M | M | 44 | M | M | H | M | M | 71 | H | M | H | M | M |
| 18 | L | M | H | H | H | 45 | M | M | H | H | M | 72 | H | M | H | H | H |
| 19 | L | H | L | L | M | 46 | M | H | L | L | L | 73 | H | H | L | L | H |
| 20 | L | H | L | M | L | 47 | M | H | L | M | M | 74 | H | H | L | M | H |
| 21 | L | H | L | H | H | 48 | M | H | L | H | H | 75 | H | H | L | H | H |
| 22 | L | H | M | L | L | 49 | M | H | M | L | M | 76 | H | H | M | L | H |
| 23 | L | H | M | M | M | 50 | M | H | M | M | M | 77 | H | H | M | M | M |
| 24 | L | H | M | H | H | 51 | M | H | M | H | M | 78 | H | H | M | H | H |
| 25 | L | H | H | L | H | 52 | M | H | H | L | H | 79 | H | H | H | L | H |
| 26 | L | H | H | M | H | 53 | M | H | H | M | M | 80 | H | H | H | M | H |
| 27 | L | H | H | H | H | 54 | M | H | H | H | H | 81 | H | H | H | H | H |
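The 81 rules above enumerate every L/M/H combination of WBC, LYM, MON, and LDH in a fixed order, so the rule outputs can be stored as an 81-character string indexed by 27·WBC + 9·LYM + 3·MON + LDH (with L = 0, M = 1, H = 2). The sketch evaluates the rule base Mamdani-style (min for AND, max to aggregate rules that share an output). That choice is an assumption, since the inference and defuzzification operators HDS actually uses are not recoverable from this excerpt; `rule_output` and `infer` are hypothetical names.

```python
from itertools import product

LEVELS = "LMH"
INPUTS = ("WBC", "LYM", "MON", "LDH")
# Rule outputs transcribed from the table, in rule-ID order 1..81.
RULE_OUTPUTS = (
    "LLMLMLMLM" "LMLMMMLMH" "MLHLMHHHH"   # WBC = L (rules 1-27)
    "LLLLMMLMH" "MMMMMMMMM" "LMHMMMHMH"   # WBC = M (rules 28-54)
    "MLHLMHHHH" "LMHMMMHMH" "HHHHMHHMH"   # WBC = H (rules 55-81)
)
assert len(RULE_OUTPUTS) == 81


def rule_output(wbc, lym, mon, ldh):
    """Look up the consequent of the rule whose antecedent is (wbc, lym, mon, ldh)."""
    idx = (27 * LEVELS.index(wbc) + 9 * LEVELS.index(lym)
           + 3 * LEVELS.index(mon) + LEVELS.index(ldh))
    return RULE_OUTPUTS[idx]


def infer(memberships):
    """memberships: input name -> {level: degree}. Returns strength per output level."""
    strength = {"L": 0.0, "M": 0.0, "H": 0.0}
    for combo in product(LEVELS, repeat=4):
        firing = min(memberships[var][lvl] for var, lvl in zip(INPUTS, combo))
        out = rule_output(*combo)
        strength[out] = max(strength[out], firing)
    return strength


# Toy membership degrees for one person (hypothetical values).
patient = {
    "WBC": {"L": 0.0, "M": 0.3, "H": 0.7},
    "LYM": {"L": 1.0, "M": 0.0, "H": 0.0},
    "MON": {"L": 0.0, "M": 1.0, "H": 0.0},
    "LDH": {"L": 0.0, "M": 0.5, "H": 0.5},
}
print(infer(patient))  # {'L': 0.0, 'M': 0.5, 'H': 0.5}
```

In this example rules 32, 33, and 59 support M and rule 60 supports H. A defuzzification step over the output membership functions would then turn these strengths into a single crisp value.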
The fuzzy rules using two items of evidence per rule.
| Rule ID | Rule |
|---|---|
| 1 | |
| 2 | |
| … | … |
| N | |
Fig. 9. The output membership function.
Fig. 10. The output membership function.
Fig. 11. DNN architecture.
Fig. 12. Illustrative example showing how to diagnose an input case for a person using the proposed HDS.
The applied parameters and their corresponding values.
| Parameter | Description | Applied value |
|---|---|---|
|  |  | 2 |
|  |  | 4 |
|  |  | 6 |
|  |  | 6.5 |
|  |  | 12 |
|  |  | 19.5 |
|  |  | 0.5 |
|  |  | 1.3 |
|  |  | 2 |
|  |  | 450 |
|  |  | 530 |
|  |  | 650 |
|  |  | 3 |
|  |  | 6 |
|  |  | 9 |
|  | Weighting factors | 1 |
|  |  | 0.5 |
Dataset description.
| Criteria | Category | Number of cases |
|---|---|---|
| Total number of cases | Male | 188 |
|  | Female | 91 |
| Sick cases | COVID-19 | 177 |
|  | Non-COVID-19 | 102 |
| COVID-19 patients by age group |  | 5 |
|  | 20–40 | 9 |
|  | 41–60 | 63 |
|  | 61–80 | 78 |
|  | >80 | 22 |
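The counts are internally consistent: 188 + 91 = 279 = 177 + 102, and the five age-group counts sum to the 177 COVID-19 patients. Because the classes are imbalanced, a majority-class baseline is a useful reference point for the accuracies reported for HDS. The snippet checks the totals and computes that baseline, assuming each test fold mirrors the overall class ratio.

```python
male, female = 188, 91
covid, non_covid = 177, 102
covid_by_age = [5, 9, 63, 78, 22]        # age-group counts from the table

total = male + female
assert covid + non_covid == total        # both splits cover the same 279 cases
assert sum(covid_by_age) == covid        # the age groups cover all COVID-19 cases

# Accuracy of a trivial classifier that always answers "COVID-19".
majority_baseline = covid / total
print(total, f"{majority_baseline:.2%}")  # 279 63.44%
```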
Fig. 13. The total number of cases according to age.
Fig. 14. The total number of cases according to age and sex.
Fig. 15. Distribution of COVID-19 and non-COVID-19 patients.
Confusion matrix.
| Known label / Predicted label | Positive | Negative |
|---|---|---|
| Positive | True Positive (TP) | False Negative (FN) |
| Negative | False Positive (FP) | True Negative (TN) |
Confusion matrix formulas.
| Measure | Formula | Intuitive meaning |
|---|---|---|
| Precision (P) | TP/(TP+FP) | The percentage of positive predictions that are correct. |
| Recall (R) | TP/(TP+FN) | The percentage of positively labeled instances that were predicted as positive. |
| Accuracy (A) | (TP+TN)/(TP+TN+FP+FN) | The percentage of predictions that are correct. |
| Error (E) | 1 - Accuracy | The percentage of predictions that are incorrect. |
| Macro-average | Precision: (P1+…+Pc)/c; Recall: (R1+…+Rc)/c | The average of the per-class precision and recall of the system over the c classes. |
| Micro-average | Precision: (TP1+…+TPc)/(TP1+…+TPc+FP1+…+FPc); Recall: (TP1+…+TPc)/(TP1+…+TPc+FN1+…+FNc) | Sums the individual true positives, false positives, and false negatives of the system over the different classes and then uses these sums to compute the statistics. |
| F-measure | 2*PR/(P+R) | The harmonic mean of Precision and Recall. |
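The formulas in the table translate directly into code. The sketch below computes the binary measures from a confusion matrix and the macro- and micro-averages from per-class counts. The counts in the usage lines are hypothetical, and `safe_div` (which returns 0 for an empty denominator) is an added convention, not part of the table.

```python
def safe_div(num, den):
    """Division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp, fp, fn, tn):
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    accuracy = safe_div(tp + tn, tp + tn + fp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "accuracy": accuracy,
        "error": 1 - accuracy,
        "f_measure": safe_div(2 * precision * recall, precision + recall),
    }


def macro_micro(per_class):
    """per_class: list of (TP, FP, FN) tuples, one per class."""
    c = len(per_class)
    tp_sum = sum(tp for tp, _, _ in per_class)
    fp_sum = sum(fp for _, fp, _ in per_class)
    fn_sum = sum(fn for _, _, fn in per_class)
    return {
        # Macro: average the per-class ratios.
        "macro_precision": sum(safe_div(tp, tp + fp) for tp, fp, _ in per_class) / c,
        "macro_recall": sum(safe_div(tp, tp + fn) for tp, _, fn in per_class) / c,
        # Micro: pool the counts first, then take one ratio.
        "micro_precision": safe_div(tp_sum, tp_sum + fp_sum),
        "micro_recall": safe_div(tp_sum, tp_sum + fn_sum),
    }


# Hypothetical counts for a two-class problem (not from the paper).
print(binary_metrics(tp=8, fp=2, fn=1, tn=9))
# Per-class (TP, FP, FN): the negative class contributes (TN, FN, FP).
print(macro_micro([(8, 2, 1), (9, 1, 2)]))
```

In this example the micro-averaged precision and recall both equal the accuracy (0.85), as they always do when every instance receives exactly one label.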
Performance of HDS in terms of accuracy, precision, recall, and error.
| Fold | Accuracy | Precision | Recall | Error |
|---|---|---|---|---|
| 1 | 98% | 97% | 97% | 2% |
| 2 | 96.86% | 96% | 96% | 3.14% |
| 3 | 96.86% | 95.5% | 95.5% | 3.14% |
| 4 | 98% | 97% | 97% | 2% |
| 5 | 98% | 97% | 97% | 2% |
| 6 | 98% | 96.6% | 96.5% | 2% |
| 7 | 96.86% | 95.86% | 95% | 3.14% |
| 8 | 98% | 97.5% | 97% | 2% |
| 9 | 98% | 97% | 96.5% | 2% |
| 10 | 98% | 98.1% | 98% | 2% |
| Average | 97.658% | 96.756% | 96.55% | 2.342% |
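The Average row is the arithmetic mean of the ten folds, and the per-fold values in the table reproduce it exactly. The snippet recomputes those means and adds the sample standard deviation across folds. The table does not report this spread, but it indicates how stable HDS is from fold to fold.

```python
from statistics import mean, stdev

# Per-fold results (%) copied from the table above.
folds = {
    "Accuracy":  [98, 96.86, 96.86, 98, 98, 98, 96.86, 98, 98, 98],
    "Precision": [97, 96, 95.5, 97, 97, 96.6, 95.86, 97.5, 97, 98.1],
    "Recall":    [97, 96, 95.5, 97, 97, 96.5, 95, 97, 96.5, 98],
    "Error":     [2, 3.14, 3.14, 2, 2, 2, 3.14, 2, 2, 2],
}

for name, values in folds.items():
    print(f"{name}: mean {mean(values):.3f}%, sample std {stdev(values):.3f}")

# Accuracy and error are complementary in every fold.
assert all(abs(a + e - 100) < 1e-9 for a, e in zip(folds["Accuracy"], folds["Error"]))
```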
Performance of HDS in terms of Macro-average (precision & recall), Micro-average (precision & recall), and F-measure.
| Fold | Macro-average precision | Macro-average recall | Micro-average precision | Micro-average recall | F-measure |
|---|---|---|---|---|---|
| 1 | 96% | 97% | 96.5% | 96.8% | 97% |
| 2 | 95.6% | 95.9% | 96% | 96% | 96% |
| 3 | 95% | 95.8% | 95% | 95% | 95% |
| 4 | 96.5% | 97% | 96.9% | 97.2% | 97% |
| 5 | 96.5% | 97% | 96.9% | 96.9% | 96.9% |
| 6 | 96% | 96.5% | 96.6% | 97% | 96.5% |
| 7 | 95.55% | 95.3% | 95.5% | 95.8% | 95.75% |
| 8 | 97.1% | 96.3% | 97.8% | 96.9% | 97% |
| 9 | 96% | 96% | 96.9% | 97% | 96.5% |
| 10 | 97.1% | 98% | 97% | 96.9% | 98.5% |
| Average | 96.135% | 96.44% | 96.51% | 96.55% | 96.615% |
Comparison between HDS and the existing classification techniques in terms of accuracy, precision, recall, and error.
| Used technique | Accuracy | Precision | Recall | Error |
|---|---|---|---|---|
| DarkCovidNet | 84.26% | 85.6% | 82.5% | 15.74% |
| GMDH | 92.48% | 93% | 91.4% | 7.52% |
| KNNV | 91.5% | 92.3% | 93.6% | 8.5% |
| ADPM | 90.4% | 89.9% | 89.9% | 9.6% |
| CNN | 85.6% | 87.42% | 85.6% | 14.4% |
| CPDS | 94.9% | 90.86% | 91% | 5.1% |
| RBFL | 95.97% | 92.6% | 93.48% | 4.03% |
| HDS | 97.658% | 96.756% | 96.5% | 2.342% |
Comparison between HDS and the existing classification techniques in terms of Macro-average (precision & recall), Micro-average (precision & recall), and F-measure.
| Used technique | Macro-average precision | Macro-average recall | Micro-average precision | Micro-average recall | F-measure |
|---|---|---|---|---|---|
| DarkCovidNet | 83% | 84.6% | 82% | 87.6% | 82.5% |
| GMDH | 92% | 90% | 90.6% | 89.95% | 90.3% |
| KNNV | 89.5% | 85.7% | 87.6% | 88.5% | 90.1% |
| ADPM | 89.5% | 88.9% | 87.98% | 89.56% | 83% |
| CNN | 85.7% | 86.4% | 83.6% | 83.9% | 87% |
| CPDS | 90.1% | 92.16% | 91.8% | 89.8% | 93.42% |
| RBFL | 94.7% | 93.6% | 93.8% | 94.7% | 94.8% |
| HDS | 96.035% | 96.42% | 96.5% | 96.52% | 96.615% |
Fig. 16. ROC curve of the validation testing.
WSRT results.
| Model 1 vs. Model 2 | WSRT | p-value | Estimated median difference |
|---|---|---|---|
| HDS vs. DarkCovidNet | 0.0 | 0.00 | 0.003 |
| HDS vs. GMDH | 0.0 | 0.00 | 0.0017 |
| HDS vs. KNNV | 0.0 | 0.00 | 0.0018 |
| HDS vs. ADPM | 0.0 | 0.00 | 0.0025 |
| HDS vs. CNN | 0.0 | 0.00 | 0.0031 |
| HDS vs. CPDS | 0.0 | 0.00 | 0.0016 |
| HDS vs. RBFL | 0.0 | 0.00 | 0.0015 |
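The Wilcoxon Signed Rank Test compares two models on paired results, for example their scores on the same folds. This excerpt does not say which paired values were tested, so the sketch only shows the mechanics: rank the absolute paired differences, sum the ranks of the positive and negative differences, and report the smaller sum as W. If the table's WSRT column is this min(W+, W−) statistic, a value of 0.0 means every nonzero paired difference favors the same model. The p-value here uses the large-sample normal approximation without tie or continuity correction; an exact test is preferable for small samples such as ten folds. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def wilcoxon_signed_rank(x, y):
    """Return (W, approximate two-sided p) for paired samples x and y.

    Zero differences are dropped; at least one nonzero difference is required.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))   # equals 2 * (1 - Phi(|z|))
    return w, p


# Toy paired scores (hypothetical): model A beats model B on all five folds.
print(wilcoxon_signed_rank([1, 2, 3, 4, 5], [0, 0, 0, 0, 0]))
```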
Friedman mean ranking.
| Used technique | Rank |
|---|---|
| DarkCovidNet | 6.0 |
| GMDH | 2.3 |
| KNNV | 3.36 |
| ADPM | 3.8 |
| CNN | 4.8 |
| CPDS | 2.1 |
| RBFL | 1.5 |
| HDS | 1.2 |
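A lower Friedman mean rank means a method tends to place better across the compared blocks (for example folds or metrics), so HDS at 1.2 ranks best and DarkCovidNet at 6.0 ranks worst. The sketch computes mean ranks, giving rank 1 to the highest score and averaging ties. It also computes the Friedman chi-square statistic, which has k − 1 degrees of freedom for k methods. The toy scores in the usage line are hypothetical, not the paper's data, and `friedman` is a hypothetical name.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(scores):
    """scores[b][j] = score of method j on block b (higher is better).

    Returns (mean rank per method, Friedman chi-square statistic).
    """
    n_blocks, k = len(scores), len(scores[0])
    totals = [0.0] * k
    for block in scores:
        # Negate the scores so the best (highest) score receives rank 1.
        for j, r in enumerate(average_ranks([-s for s in block])):
            totals[j] += r
    mean_ranks = [t / n_blocks for t in totals]
    chi2 = 12 * n_blocks / (k * (k + 1)) * (
        sum(r * r for r in mean_ranks) - k * (k + 1) ** 2 / 4
    )
    return mean_ranks, chi2


# Two blocks, three methods (hypothetical scores).
print(friedman([[0.9, 0.8, 0.7], [0.85, 0.9, 0.6]]))  # ([1.5, 1.5, 3.0], 3.0)
```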
Comparison between HDS and different feature selection techniques in terms of accuracy, precision, recall, and error.
| Used technique | Accuracy | Precision | Recall | Error |
|---|---|---|---|---|
| Method 1: HDS-(FAR-BSO) | 90.4% | 86.5% | 84.5% | 9.6% |
| Method 2: HDS-OCS | 92.98% | 92.9% | 92.4% | 7.02% |
| Method 3: HDS-FWFSS | 93.8% | 93.4% | 93.6% | 6.2% |
| Method 4: HDS-HFS | 95.5% | 94.9% | 93.9% | 4.5% |
| HDS | 97.658% | 96.756% | 96.5% | 2.342% |
Comparison between HDS and different feature selection techniques in terms of Macro-average (precision & recall), Micro-average (precision & recall), and F-measure.
| Used technique | Macro-average precision | Macro-average recall | Micro-average precision | Micro-average recall | F-measure |
|---|---|---|---|---|---|
| Method 1: HDS-(FAR-BSO) | 85% | 86.6% | 85.9% | 89.6% | 86.5% |
| Method 2: HDS-OCS | 93% | 91% | 90.6% | 89.95% | 90.9% |
| Method 3: HDS-FWFSS | 93.5% | 94.7% | 91.46% | 90.5% | 91.1% |
| Method 4: HDS-HFS | 94.5% | 94.9% | 92.08% | 92.56% | 92.8% |
| HDS | 96.035% | 96.42% | 96.5% | 96.52% | 96.615% |