Warda M Shaban, Asmaa H Rabie, Ahmed I Saleh, M A Abo-Elsoud.
Abstract
COVID-19, as an infectious disease, has shocked the world and still threatens the lives of billions of people. Early detection of COVID-19 patients is important for treating the disease and controlling its spread. This paper introduces a new strategy for detecting COVID-19 infected patients, called Distance Biased Naïve Bayes (DBNB). The novelty of DBNB lies in two contributions. The first is a new feature selection technique called Advanced Particle Swarm Optimization (APSO), which selects the most informative and significant features for diagnosing COVID-19 patients. APSO is a hybrid method that combines filter and wrapper methods to provide accurate and significant features for the subsequent classification phase. The features are extracted from laboratory findings for different cases, some of whom are COVID-19 infected and some of whom are not. APSO consists of two sequential feature selection stages: the Initial Selection Stage (IS2) and the Final Selection Stage (FS2). IS2 uses filter techniques to quickly select the most important features for diagnosing COVID-19 patients while removing redundant and ineffective ones. This minimizes the computational cost of FS2, the next stage of APSO. FS2 uses Binary Particle Swarm Optimization (BPSO) as a wrapper method for accurate feature selection. The second contribution is a new classification model that combines evidence from statistical and distance-based classification models. The proposed classification technique avoids the problems of traditional Naïve Bayes (NB) and consists of two modules: the Weighted Naïve Bayes Module (WNBM) and the Distance Reinforcement Module (DRM). DBNB aims to accurately detect infected patients with the minimum time penalty, based on the most effective features selected by APSO. DBNB has been compared with recent COVID-19 diagnosis strategies. Experimental results show that DBNB outperforms them, achieving the highest accuracy with the lowest time penalty.
Keywords: COVID-19; Classification; Feature selection; NB; Optimization; Particle swarm; Wrapper
Year: 2021 PMID: 34149100 PMCID: PMC8205562 DOI: 10.1016/j.patcog.2021.108110
Source DB: PubMed Journal: Pattern Recognit ISSN: 0031-3203 Impact factor: 7.740
Fig. 1 A graphic representation of the rapid spike in infections.
Fig. 2 Different COVID-19 diagnosis techniques.
Fig. 3 The proposed DBNB classification strategy.
Fig. 4 The sequential steps of the APSO method.
An example of a single particle.
| f1 | f2 | f3 | f4 | f5 | f6 | f7 | f8 | f9 | f10 | f11 | f12 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 |
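A particle is a bit mask over the candidate features: a 1 keeps the feature and a 0 drops it. As a minimal sketch, the particle in the table can be decoded as follows. The helper name `selected_features` and the list `FEATURES` are illustrative, not taken from the paper.

```python
# Illustrative feature names f1..f12, matching the table's column headers.
FEATURES = [f"f{i}" for i in range(1, 13)]


def selected_features(particle, names=FEATURES):
    """Return the names of the features whose bit is 1 in the particle."""
    if len(particle) != len(names):
        raise ValueError("particle length must equal the number of features")
    return [name for name, bit in zip(names, particle) if bit == 1]


# The single particle shown in the table above.
particle = [0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
print(selected_features(particle))
```

For this particle the selected subset is f2, f4, f5, f7, f9, and f10, which is the subset a wrapper stage would pass to the classifier when evaluating fitness.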
Feature selection using the APSO algorithm.
The assumptions for employing BPSO in FS2.
| No. | Assumption | Value |
|---|---|---|
| 1 | No. of generations to process | 2 |
| 2 | Swarm size (no. of particles) | 4 (no. of filter methods in IS2, g) |
| 3 | Initial P_personal (p_i) | P_i |
| 4 | Initial P_global | 0 |
| 5 | Initial V_Pi | 0 |
| 6 | Particle size (P) | 6 (no. of features, m) |
| 7 | Fitness function | Accuracy of NB classifier |
| 8 | Initial swarm | P1={1, 0, 1, 0, 1, 1}; P2={0, 0, 1, 1, 0, 1}; P3={1, 1, 1, 1, 0, 1}; P4={1, 1, 0, 0, 1, 1} |
| 9 | w | 1.1 |
| 10 | c1=c2 | 2 |
| 11 | r1=r2 | 0.6 |
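The table fixes w, c1, c2, r1, and r2 but does not show the update rule itself. The sketch below assumes the standard BPSO formulation (a velocity update followed by a sigmoid transfer function), which may differ from the paper's exact rule. Using P3 as the global best is a hypothetical choice made only for illustration; the function names are likewise illustrative.

```python
import math
import random


def sigmoid(v):
    """Map a velocity to a probability of setting the bit to 1."""
    return 1.0 / (1.0 + math.exp(-v))


def bpso_velocity(x, v, pbest, gbest, w=1.1, c1=2.0, c2=2.0, r1=0.6, r2=0.6):
    """Standard PSO velocity update, applied bit by bit."""
    return [
        w * vi + c1 * r1 * (pb - xi) + c2 * r2 * (gb - xi)
        for xi, vi, pb, gb in zip(x, v, pbest, gbest)
    ]


def bpso_position(v, rng):
    """Sample each bit: 1 with probability sigmoid(velocity), else 0."""
    return [1 if rng.random() < sigmoid(vi) else 0 for vi in v]


p1 = [1, 0, 1, 0, 1, 1]   # particle P1 from the initial swarm
p3 = [1, 1, 1, 1, 0, 1]   # assumed global best (hypothetical choice)
v0 = [0.0] * 6            # initial velocity is 0, as in the table

v1 = bpso_velocity(p1, v0, pbest=p1, gbest=p3)
x1 = bpso_position(v1, random.Random(0))  # seeded only for repeatability
print(v1)
print(x1)
```

Because the initial personal best equals the particle itself and the initial velocity is zero, only the social term contributes in the first step: bits where P1 and the assumed global best agree keep a velocity of 0, and bits where they differ move by ±1.2.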
Fig. 5 Calculating the distance to class centers.
An example training dataset with normalized feature weights.
| Case number | f1 | f2 | f3 | f4 | f5 | Diagnosis |
|---|---|---|---|---|---|---|
| 1 | L | L | H | M | L | T |
| 2 | H | H | H | M | L | F |
| 3 | M | L | L | L | H | T |
| 4 | L | H | L | L | H | T |
| 5 | M | L | L | L | H | T |
| 6 | H | H | H | M | L | F |
| 7 | M | H | L | H | H | F |
| 8 | L | L | H | L | L | F |
| 9 | M | H | L | H | L | T |
| 10 | H | L | L | L | H | T |
| Normalized weight | 1 | 0.7 | 0.8 | 0.5 | 0.6 | |
Conditional probabilities for feature f1.
| Value | Count in class T | Count in class F | P(f1 \| T) | P(f1 \| F) |
|---|---|---|---|---|
| L | 2 | 1 | 2/6 | 1/4 |
| M | 3 | 1 | 3/6 | 1/4 |
| H | 1 | 2 | 1/6 | 2/4 |
| Total | 6 | 4 | 100% | 100% |
Prior probabilities of the target classes.
| Diagnosis | Count | Prior probability |
|---|---|---|
| T | 6 | P(T)=6/10 |
| F | 4 | P(F)=4/10 |
| Total | 10 | 1 |
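The prior and conditional probabilities follow from simple counting over the ten training cases. A minimal sketch, using exact fractions so the results can be compared directly with the tables; `CASES`, `priors`, and `conditional` are illustrative names, not from the paper.

```python
from collections import Counter
from fractions import Fraction

# The ten training cases: (f1, f2, f3, f4, f5, diagnosis).
CASES = [
    ("L", "L", "H", "M", "L", "T"),
    ("H", "H", "H", "M", "L", "F"),
    ("M", "L", "L", "L", "H", "T"),
    ("L", "H", "L", "L", "H", "T"),
    ("M", "L", "L", "L", "H", "T"),
    ("H", "H", "H", "M", "L", "F"),
    ("M", "H", "L", "H", "H", "F"),
    ("L", "L", "H", "L", "L", "F"),
    ("M", "H", "L", "H", "L", "T"),
    ("H", "L", "L", "L", "H", "T"),
]


def priors(cases):
    """P(class) = class count / total number of cases."""
    counts = Counter(row[-1] for row in cases)
    return {label: Fraction(n, len(cases)) for label, n in counts.items()}


def conditional(cases, feature_index):
    """P(value | class) for one feature; unseen pairs are simply absent."""
    class_counts = Counter(row[-1] for row in cases)
    joint = Counter((row[feature_index], row[-1]) for row in cases)
    return {
        (value, label): Fraction(n, class_counts[label])
        for (value, label), n in joint.items()
    }


print(priors(CASES))
print(conditional(CASES, 0))  # feature f1
```

Note that `Fraction` reduces its values, so 2/6 prints as 1/3 and 6/10 as 3/5. No smoothing is applied here; a value that never occurs with a class would get probability zero.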
The applied parameters and their values.
| Parameter | Description | Applied value |
|---|---|---|
| m | No. of extracted features | 12 |
| w | Inertia weight | 1.1 |
| c1 | The cognitive acceleration | 2 |
| c2 | The social acceleration | 2 |
| r1 | Uniformly distributed random number | 0.6 |
| r2 | Uniformly distributed random number | 0.6 |
Dataset description.
| Criteria | Category | Number of cases |
|---|---|---|
| Total number of cases | Male | 1969 |
| | Female | 1031 |
| Not sick (ordinary) cases | | 410 |
| Sick cases | COVID-19 | 1990 |
| | Other | 600 |
| COVID-19 patients by age | <15 | 20 |
| | 15-25 | 98 |
| | 25-35 | 170 |
| | 35-45 | 287 |
| | 45-55 | 395 |
| | 55-65 | 420 |
| | >65 | 600 |
Fig. 6 The total number of cases according to age.
Fig. 8 The distribution of COVID-19 patients and non-COVID-19 patients.
Clinical laboratory data for 1990 COVID-19 patients.
| Feature | Normal range | Severe group (n=696) | Mild group (n=1294) |
|---|---|---|---|
| WBC, ×10⁹ per L | 3.5-9.5 | 4.96 ± 1.85 | 4.26 ± 1.64 |
| AST, U/L | 15-40 | 33.21 ± 18.24 | 27.80 ± 11.42 |
| ALT, U/L | 9-50 | 24.50 (15.75, 37.75) | 27.00 (21.00, 41.00) |
| LDH, U/L | 120-250 | 360–540 | 183–360 |
| CRP, mg/L | 0-10 | 18.76 ± 22.20 | 39.37 ± 27.68 |
| PCT, ng/ml | <0.1 | 0.02 (0.01, 0.04) | 0.04 (0.02, 0.09) |
| FIB, g/L | 2-4 | 3.11 ± 0.83 | 3.84 ± 1.00 |
| Cr, µmol/L | 74.3-107 | 66.96 ± 13.38 | 65.33 ± 15.55 |
| NEU, ×10⁹ per L | 1.8-6.3 | 3.43 ± 1.63 | 2.65 ± 1.49 |
| LYM, ×10⁹ per L | 1.1-3.2 | 1.07 ± 0.40 | 1.20 ± 0.42 |
| IL-6, pg/L | ≤ 20 | 10.60 (5.13, 24.18) | 36.10 (23.00, 59.20) |
| D-D, μg/L | 0-0.55 | 0.21 (0.19, 0.27) | 0.49 (0.29, 0.91) |
Performance of APSO in terms of accuracy, precision, and recall.
| Fold | Accuracy | Precision | Recall |
|---|---|---|---|
| 1 | 93.4% | 91.68% | 92.28% |
| 2 | 94.5% | 95.5% | 95.5% |
| 3 | 92.63% | 94.2% | 94% |
| 4 | 94.72% | 95.01% | 94% |
| 5 | 95.52% | 95.8% | 94% |
| 6 | 95.36% | 94.54% | 95.76% |
| 7 | 94.52% | 94.98% | 95.61% |
| 8 | 94.52% | 95.98% | 96.79% |
| 9 | 93% | 91.5% | 95.29% |
| 10 | 94.54% | 94.6% | 91.5% |
| Average | 94.271% | 94.379% | 94.473% |
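The Average row is the arithmetic mean of the ten fold values. A short sketch that recomputes it from the table; the variable names are illustrative.

```python
# Per-fold APSO results (percent), copied from the table above.
accuracy = [93.4, 94.5, 92.63, 94.72, 95.52, 95.36, 94.52, 94.52, 93.0, 94.54]
precision = [91.68, 95.5, 94.2, 95.01, 95.8, 94.54, 94.98, 95.98, 91.5, 94.6]
recall = [92.28, 95.5, 94.0, 94.0, 94.0, 95.76, 95.61, 96.79, 95.29, 91.5]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


for name, values in [("accuracy", accuracy),
                     ("precision", precision),
                     ("recall", recall)]:
    print(f"{name}: {mean(values):.3f}%")
```

The same check applies unchanged to the WNBM and DBNB fold tables.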
Comparison between APSO and the existing feature selection techniques in terms of accuracy, precision, recall, and inference time.
| Used Technique | Accuracy | Precision | Recall | Inference time (Sec) |
|---|---|---|---|---|
| FAM-BSO | 92.8% | 92% | 92.2% | 14 |
| OCS | 89% | 89.234% | 89.98% | 11 |
| FWFSS | 91.12% | 90.58% | 90.8% | 12.5 |
| HFS | 93.3% | 92.89% | 92.5% | 12 |
| APSO | 94.271% | 94.379% | 94.473% | 9 |
Performance of WNBM in terms of accuracy, precision, and recall.
| Fold | Accuracy | Precision | Recall |
|---|---|---|---|
| 1 | 96.5% | 95.2% | 96.6% |
| 2 | 95.98% | 94.9% | 93.98% |
| 3 | 97.5% | 96.8% | 96.99% |
| 4 | 97.5% | 96.8% | 96.99% |
| 5 | 95.36% | 94.6% | 93.6% |
| 6 | 96.365% | 95.87% | 94.12% |
| 7 | 97% | 96.8% | 94.2% |
| 8 | 95.98% | 94.6% | 93.9% |
| 9 | 96.78% | 95% | 94.9% |
| 10 | 96.89% | 95.5% | 94.99% |
| Average | 96.585% | 95.607% | 95.027% |
Comparison between WNBM and the existing classification techniques in terms of accuracy, precision, recall, and inference time.
| Used technique | Accuracy | Precision | Recall | Inference time (Sec) |
|---|---|---|---|---|
| EKNN | 93.5% | 90.9% | 92.3% | 18 |
| NB-PKC | 94.02% | 91.68% | 90.78% | 13 |
| WOA-SVM | 92.6% | 90.89% | 91% | 14 |
| WNBM | 96.585% | 95.607% | 95.027% | 11 |
Performance of DBNB in terms of accuracy, precision, and recall.
| Fold | Accuracy | Precision | Recall |
|---|---|---|---|
| 1 | 97.78% | 96% | 96.5% |
| 2 | 96.86% | 95.9% | 95% |
| 3 | 96.86% | 95.5% | 95.5% |
| 4 | 97.78% | 96% | 96.5% |
| 5 | 97.78% | 96% | 96.5% |
| 6 | 97.78% | 96.6% | 96.85% |
| 7 | 96.86% | 95.86% | 95% |
| 8 | 97.78% | 96.5% | 96.5% |
| 9 | 97.78% | 96% | 96.85% |
| 10 | 97.78% | 97.1% | 97.2% |
| Average | 97.504% | 96.146% | 96.24% |
Comparison between DBNB and the existing classification techniques in terms of accuracy, precision, and recall.
| Used technique | Accuracy | Precision | Recall |
|---|---|---|---|
| CNN | 84.2% | 85.3% | 82.12% |
| GMDH | 92.4% | 93% | 91% |
| CPDS | 94.6% | 90.06% | 91.63% |
| DarkCovidNet | 85% | 87.2% | 85.21% |
| CS | 90.2% | 89.19% | 89.4% |
| DBNB | 97.504% | 96.146% | 96.24% |
Conditional probabilities for feature f2.
| Value | Count in class T | Count in class F | P(f2 \| T) | P(f2 \| F) |
|---|---|---|---|---|
| L | 4 | 1 | 4/6 | 1/4 |
| H | 2 | 3 | 2/6 | 3/4 |
| Total | 6 | 4 | 100% | 100% |
Conditional probabilities for feature f3.
| Value | Count in class T | Count in class F | P(f3 \| T) | P(f3 \| F) |
|---|---|---|---|---|
| L | 5 | 1 | 5/6 | 1/4 |
| H | 1 | 3 | 1/6 | 3/4 |
| Total | 6 | 4 | 100% | 100% |
Conditional probabilities for feature f4.
| Value | Count in class T | Count in class F | P(f4 \| T) | P(f4 \| F) |
|---|---|---|---|---|
| L | 4 | 1 | 4/6 | 1/4 |
| M | 1 | 2 | 1/6 | 2/4 |
| H | 1 | 1 | 1/6 | 1/4 |
| Total | 6 | 4 | 100% | 100% |
Conditional probabilities for feature f5.
| Value | Count in class T | Count in class F | P(f5 \| T) | P(f5 \| F) |
|---|---|---|---|---|
| L | 2 | 3 | 2/6 | 3/4 |
| H | 4 | 1 | 4/6 | 1/4 |
| Total | 6 | 4 | 100% | 100% |
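The priors, the conditional probability tables for f1 to f5, and the normalized feature weights are enough to score a new case with a weighted naive Bayes rule. The text here does not give WNBM's formula, so the sketch below assumes one common formulation in which each conditional probability is raised to the power of its feature weight. It does not include the distance-based DRM step, and all names in it are illustrative.

```python
# Normalized weights for f1..f5, from the training-data table.
WEIGHTS = [1.0, 0.7, 0.8, 0.5, 0.6]

# Prior probabilities of the two classes.
PRIOR = {"T": 6 / 10, "F": 4 / 10}

# COND[i][value] = (P(value | T), P(value | F)) for feature f(i+1).
COND = [
    {"L": (2 / 6, 1 / 4), "M": (3 / 6, 1 / 4), "H": (1 / 6, 2 / 4)},
    {"L": (4 / 6, 1 / 4), "H": (2 / 6, 3 / 4)},
    {"L": (5 / 6, 1 / 4), "H": (1 / 6, 3 / 4)},
    {"L": (4 / 6, 1 / 4), "M": (1 / 6, 2 / 4), "H": (1 / 6, 1 / 4)},
    {"L": (2 / 6, 3 / 4), "H": (4 / 6, 1 / 4)},
]


def weighted_nb_scores(case):
    """Score each class as prior * product of P(value | class) ** weight."""
    scores = dict(PRIOR)
    for table, weight, value in zip(COND, WEIGHTS, case):
        # Values never seen in training get probability 0 (no smoothing).
        p_true, p_false = table.get(value, (0.0, 0.0))
        scores["T"] *= p_true ** weight
        scores["F"] *= p_false ** weight
    return scores


def predict(case):
    """Return the class with the highest weighted score."""
    scores = weighted_nb_scores(case)
    return max(scores, key=scores.get)


print(predict(("M", "L", "L", "L", "H")))  # same values as training case 3
print(predict(("H", "H", "H", "M", "L")))  # same values as training case 2
```

With exponent weighting, a weight of 1 leaves a feature's evidence unchanged, while a smaller weight pulls its probability toward 1 and so reduces its influence on the final score.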