Josephat Kalezhi, Mathews Chibuluma, Christopher Chembe, Victoria Chama, Francis Lungo, Douglas Kunda
Abstract
Since its emergence, the Covid-19 outbreak has been declared a global health crisis by the World Health Organization. Researchers have proposed a number of techniques to understand how the pandemic affects populations. Among these are data mining models, which had already been applied successfully in a wide range of settings before the advent of the Covid-19 pandemic. In this work, the researchers applied several existing data mining methods (classifiers) available in the Waikato Environment for Knowledge Analysis (WEKA) machine learning library to gain a better understanding of how the epidemic spread within Zambia. The classifiers used include the J48 decision tree, Multilayer Perceptron and Naïve Bayes, among others. Their predictions are compared against those of simpler classifiers and against results reported in related works.
Keywords: COVID-19; Coronavirus; J48 algorithm; Multilayer perceptron; Naïve Bayes; WEKA
Year: 2022 PMID: 35317385 PMCID: PMC8813672 DOI: 10.1016/j.rineng.2022.100363
Source DB: PubMed Journal: Results Eng ISSN: 2590-1230
Fig. 1. Decision tree produced by the J48 algorithm for the whole country from 1st August 2020 to 11 September 2020.
Fig. 2. Decision tree produced by the J48 algorithm for Muchinga province over the stated period.
Results of running ZeroR with 10-fold cross-validation for Muchinga province from 1st August to 11 September 2020.
| Classifier | Correctly classified instances (%) | Incorrectly classified instances (%) | Total number of instances | Root mean squared error |
|---|---|---|---|---|
| ZeroR | 49.375 | 50.625 | 320 | 0.3881 |
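ZeroR ignores every attribute and always predicts the most frequent class in the training data, which is why it serves as the baseline in these comparisons. With 320 instances, 49.375% corresponds to 158 correct predictions, roughly the size of the largest class in the Muchinga data. The Python sketch below illustrates the idea; the class labels are hypothetical (not the study's data) and the function names are illustrative, not WEKA's API.

```python
from collections import Counter


def zero_r_fit(labels):
    """Learn a ZeroR 'model': the most frequent class label in the training set.

    Ties are broken by first occurrence, which is what Counter.most_common does.
    """
    return Counter(labels).most_common(1)[0][0]


def zero_r_predict(majority, n):
    """Predict the same majority class for each of n test instances."""
    return [majority] * n


# Hypothetical class labels, for illustration only.
train = ["low", "high", "low", "medium", "low"]
test = ["low", "high", "low"]

majority = zero_r_fit(train)
predictions = zero_r_predict(majority, len(test))
accuracy = sum(p == t for p, t in zip(predictions, test)) / len(test)
print(majority, accuracy)  # majority is "low"; 2 of the 3 test labels match
```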
Results of running OneR with 10-fold cross-validation for Muchinga province from 1st August to 11 September 2020.
| Classifier | Correctly classified instances (%) | Incorrectly classified instances (%) | Total number of instances | Root mean squared error |
|---|---|---|---|---|
| OneR | 86.875 | 13.125 | 320 | 0.2562 |
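OneR goes one step further than ZeroR: for each attribute it builds a rule mapping every attribute value to the majority class among the training instances with that value, then keeps the single attribute whose rule makes the fewest training errors. The Python sketch below handles nominal attributes only (WEKA's OneR also discretizes numeric attributes, which is omitted here); the attribute values and labels are hypothetical.

```python
from collections import Counter, defaultdict


def one_r(rows, labels):
    """Learn a OneR rule from nominal attributes.

    rows: list of tuples of attribute values; labels: the class of each row.
    Returns (best_attribute_index, {value: predicted_class}, training_errors).
    """
    best = None
    for attr in range(len(rows[0])):
        # Count class frequencies separately for each value of this attribute.
        counts = defaultdict(Counter)
        for row, y in zip(rows, labels):
            counts[row[attr]][y] += 1
        # Each value predicts its majority class; the other instances are errors.
        rule = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - c.most_common(1)[0][1] for c in counts.values())
        # Keep the attribute whose rule makes the fewest errors (first wins ties).
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    return best


# Hypothetical nominal data: (area type, period) -> case level.
rows = [("urban", "early"), ("urban", "late"), ("rural", "early"),
        ("rural", "late"), ("urban", "late"), ("rural", "early")]
labels = ["high", "high", "low", "low", "high", "low"]

best_attr, rule, errors = one_r(rows, labels)
print(best_attr, rule, errors)
```

Here attribute 0 separates the classes perfectly, while attribute 1 would make two errors, so OneR keeps attribute 0.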
Results of running J48 with 10-fold cross-validation for Muchinga province from 1st August to 11 September 2020.
| Classifier | Correctly classified instances (%) | Incorrectly classified instances (%) | Total number of instances | Root mean squared error |
|---|---|---|---|---|
| J48 | 87.1875 | 12.8125 | 320 | 0.2215 |
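Each result in these tables comes from 10-fold cross-validation: the instances are divided into ten folds, each fold serves once as the test set while a model is trained on the other nine, and correct classifications are pooled over all ten test folds. The Python sketch below assumes stratified folds (similar class proportions in every fold) built from a seeded shuffle, which is why changing the random seed shifts the result slightly; it is a simplified illustration, not WEKA's exact fold-assignment code. The 320-instance label set is hypothetical, except that its majority class has the 158 members implied by ZeroR's 49.375%.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k=10, seed=1):
    """Split instance indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for y in sorted(by_class):
        indices = by_class[y]
        rng.shuffle(indices)  # the seed controls this shuffle
        # Deal indices round-robin so each fold receives a share of every class.
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    return folds


def cross_validate(features, labels, fit, predict, k=10, seed=1):
    """Return the percentage of instances classified correctly over k folds."""
    correct = 0
    for test_fold in stratified_folds(labels, k, seed):
        held_out = set(test_fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        model = fit([features[i] for i in train], [labels[i] for i in train])
        correct += sum(predict(model, features[i]) == labels[i] for i in test_fold)
    return 100.0 * correct / len(labels)


def zero_r_fit(features, labels):
    """ZeroR ignores the features and remembers the majority class."""
    return Counter(labels).most_common(1)[0][0]


def zero_r_predict(model, x):
    return model


# Hypothetical 320-instance label set; only the majority count (158) echoes the tables.
labels = ["a"] * 158 + ["b"] * 100 + ["c"] * 62
features = [()] * len(labels)  # ZeroR needs no attributes

accuracy = cross_validate(features, labels, zero_r_fit, zero_r_predict)
print(accuracy)
```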
Results of running J48 with different random seeds and repeated holdout for Muchinga province from 1st August to 11 September 2020.
| Random seed | Correctly classified instances (%) |
|---|---|
| 1 | 87.1875 |
| 2 | 87.5 |
| 3 | 87.5 |
| 4 | 86.5625 |
| 5 | 86.5625 |
| 6 | 86.875 |
| 7 | 86.25 |
| 8 | 86.25 |
| 9 | 86.875 |
| 10 | 86.5625 |
| Sample mean | 86.8125 |
| Standard deviation | 0.4612 |
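The two summary rows can be checked directly from the ten per-seed accuracies. In the Python sketch below, the reported standard deviation of 0.4612 matches the sample standard deviation (n - 1 in the denominator); the population form would give 0.4375.

```python
import statistics

# Correctly classified instances (%) for random seeds 1 to 10, copied from the table above.
accuracies = [87.1875, 87.5, 87.5, 86.5625, 86.5625,
              86.875, 86.25, 86.25, 86.875, 86.5625]

mean = statistics.mean(accuracies)
sd = statistics.stdev(accuracies)           # sample standard deviation (n - 1)
pop_sd = statistics.pstdev(accuracies)      # population standard deviation (n)
print(f"mean = {mean:.4f}, sd = {sd:.4f}, population sd = {pop_sd:.4f}")
# mean = 86.8125, sd = 0.4612, population sd = 0.4375
```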
Comparison of results for Muchinga province with 10-fold cross validation from 1st August to 11 September 2020. Total number of instances = 320.
| Classifier | Correctly classified instances (%) | Incorrectly classified instances (%) | Root mean squared error |
|---|---|---|---|
| ZeroR | 49.375 | 50.625 | 0.3881 |
| OneR | 86.875 | 13.125 | 0.2562 |
| J48 | 87.1875 | 12.8125 | 0.2215 |
| Multilayer Perceptron | 87.5 | 12.5 | 0.2128 |
| Naïve Bayes | 87.5 | 12.5 | 0.2239 |
| Random Forest | 85.9375 | 14.0625 | 0.2394 |
| Support Vector Machine | 88.4375 | 11.5625 | 0.3288 |
| K Nearest Neighbor | 82.5 | 17.5 | 0.2663 |
| Logistic Regression | 87.5 | 12.5 | 0.2177 |
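Accuracy and root mean squared error can rank classifiers differently: Support Vector Machine has the highest accuracy in this table but also the largest RMSE. For a nominal class, WEKA computes RMSE from the predicted class-probability distribution rather than from the hard prediction; as we understand its evaluation code, the squared errors are summed over the class components, divided by the number of classes, averaged over instances and then square-rooted. The Python sketch below follows that reading (treat the per-class averaging as an assumption) on hypothetical predictions, and shows how a confident wrong prediction inflates RMSE while counting as just one misclassification.

```python
import math


def weka_style_rmse(distributions, true_classes):
    """RMSE over predicted class-probability vectors (one vector per instance).

    Assumption: squared errors are averaged over the class components before
    averaging over instances, which is our reading of WEKA's convention.
    """
    total = 0.0
    for probs, y in zip(distributions, true_classes):
        actual = [1.0 if k == y else 0.0 for k in range(len(probs))]
        squared = sum((p - a) ** 2 for p, a in zip(probs, actual))
        total += squared / len(probs)
    return math.sqrt(total / len(true_classes))


def accuracy(distributions, true_classes):
    """Fraction of instances whose most probable class is the true class."""
    hits = sum(max(range(len(p)), key=p.__getitem__) == y
               for p, y in zip(distributions, true_classes))
    return hits / len(true_classes)


truth = [0, 0]  # hypothetical: two instances, both of class 0
hedged = [[0.8, 0.2], [0.4, 0.6]]     # right on the first, mildly wrong on the second
confident = [[1.0, 0.0], [0.0, 1.0]]  # right on the first, fully wrong on the second

for name, dist in [("hedged", hedged), ("confident", confident)]:
    print(name, accuracy(dist, truth), round(weka_style_rmse(dist, truth), 4))
```

Both predictors misclassify one of the two instances, so their accuracies are equal, yet the confident one has the larger RMSE.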
Comparison of results for the nation (Zambia) with 10-fold cross validation from 1st August to 11 September 2020. Total number of instances = 5338.
| Classifier | Correctly classified instances (%) | Incorrectly classified instances (%) | Root mean squared error |
|---|---|---|---|
| ZeroR | 37.692 | 62.308 | 0.2978 |
| OneR | 47.0776 | 52.9224 | 0.3637 |
| J48 | 57.8494 | 42.1506 | 0.2645 |
| Multilayer perceptron | 48.5013 | 51.4987 | 0.2837 |
| Naïve Bayes | 48.1266 | 51.8734 | 0.2825 |
| Random Forest | 53.1847 | 46.8153 | 0.2899 |
| Support Vector Machine | 55.2454 | 44.7546 | 0.3014 |
| K Nearest Neighbor | 51.8359 | 48.1641 | 0.3268 |
| Logistic Regression | 55.976 | 44.024 | 0.2619 |
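Because every classifier is evaluated on the same 5338 instances, each percentage can be converted back into a count of correctly classified instances, which makes the gap to the ZeroR baseline concrete. A short Python sketch using the figures from the table above:

```python
N = 5338  # instances in the national data set

# Correctly classified instances (%), copied from the national comparison table.
accuracy_pct = {
    "ZeroR": 37.692,
    "OneR": 47.0776,
    "J48": 57.8494,
    "Multilayer perceptron": 48.5013,
    "Naïve Bayes": 48.1266,
    "Random Forest": 53.1847,
    "Support Vector Machine": 55.2454,
    "K Nearest Neighbor": 51.8359,
    "Logistic Regression": 55.976,
}

# Back out whole-instance counts; the published percentages are rounded.
counts = {name: round(pct * N / 100) for name, pct in accuracy_pct.items()}
baseline = counts["ZeroR"]

for name in sorted(counts, key=counts.get, reverse=True):
    extra = counts[name] - baseline
    print(f"{name:<24} {counts[name]:>5} correct ({extra:+d} vs ZeroR)")
```

For example, J48 classifies 3088 instances correctly against ZeroR's 2012, that is, 1076 more.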
Comparison of correctly classified instances for some provinces of Zambia with 10-fold cross-validation from 1st August to 11 September 2020.
| Province | Number of instances | ZeroR (%) | OneR (%) | J48 (%) | Multilayer perceptron (%) | Naïve Bayes (%) |
|---|---|---|---|---|---|---|
| Lusaka | 3433 | 41.5963 | 49.2281 | 49.9854 | 48.9659 | 50.9758 |
| Copperbelt | 844 | 62.3223 | 68.128 | 67.1801 | 66.2322 | 67.7725 |
| Central | 176 | 59.0909 | 83.5227 | 81.25 | 81.8182 | 81.25 |
| Southern | 228 | 42.5439 | 54.8246 | 47.807 | 47.807 | 57.8947 |
| Eastern | 61 | 54.0984 | 67.2131 | 77.0492 | 80.3279 | 72.1311 |
| Northwestern | 160 | 60.625 | 82.5 | 78.75 | 80 | 78.75 |
| Luapula | 11 | 0 | 100 | 81.8182 | 100 | 54.5455 |
| Northern | 98 | 64.2857 | 86.7347 | 84.6939 | 84.6939 | 82.6531 |
| Muchinga | 320 | 49.375 | 86.875 | 87.1875 | 87.5 | 87.5 |
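With several classifiers per province, it helps to read off which one scores highest in each row while keeping ties visible. The Python sketch below does this for the five classifiers in the table above. For a province with few instances, such as Luapula (11), a single instance moves accuracy by about 9.1 percentage points, so small differences there carry little weight.

```python
CLASSIFIERS = ["ZeroR", "OneR", "J48", "Multilayer perceptron", "Naïve Bayes"]

# Correctly classified instances (%) per province, copied from the table above.
ACCURACY = {
    "Lusaka": [41.5963, 49.2281, 49.9854, 48.9659, 50.9758],
    "Copperbelt": [62.3223, 68.128, 67.1801, 66.2322, 67.7725],
    "Central": [59.0909, 83.5227, 81.25, 81.8182, 81.25],
    "Southern": [42.5439, 54.8246, 47.807, 47.807, 57.8947],
    "Eastern": [54.0984, 67.2131, 77.0492, 80.3279, 72.1311],
    "Northwestern": [60.625, 82.5, 78.75, 80, 78.75],
    "Luapula": [0, 100, 81.8182, 100, 54.5455],
    "Northern": [64.2857, 86.7347, 84.6939, 84.6939, 82.6531],
    "Muchinga": [49.375, 86.875, 87.1875, 87.5, 87.5],
}


def best_classifiers(scores):
    """Return every classifier that reaches the row maximum (ties kept)."""
    top = max(scores)
    return [name for name, s in zip(CLASSIFIERS, scores) if s == top]


for province, scores in ACCURACY.items():
    print(f"{province:<13} {', '.join(best_classifiers(scores))}")
```

OneR is the best or joint-best of these five classifiers in five of the nine provinces (Copperbelt, Central, Northwestern, Luapula and Northern).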
Comparison of correctly classified instances for the additional classifiers reported in Ref. [37], with 10-fold cross-validation from 1st August to 11 September 2020.
| Province | Number of instances | Random Forest (%) | Support Vector Machine (%) | K Nearest Neighbor (%) | Logistic Regression (%) |
|---|---|---|---|---|---|
| Lusaka | 3433 | 43.7518 | 49.1407 | 43.1984 | 51.2962 |
| Copperbelt | 844 | 62.6777 | 68.128 | 61.4929 | 69.4313 |
| Central | 176 | 73.8636 | 84.0909 | 69.8864 | 82.3864 |
| Southern | 228 | 45.1754 | 55.2632 | 40.3509 | 54.8246 |
| Eastern | 61 | 75.4098 | 67.2131 | 78.6885 | 77.0492 |
| Northwestern | 160 | 75 | 81.25 | 73.125 | 80 |
| Luapula | 11 | 90.9091 | 100 | 90.9091 | 100 |
| Northern | 98 | 85.7143 | 86.7347 | 81.6327 | 84.6939 |
| Muchinga | 320 | 85.9375 | 88.4375 | 82.5 | 87.5 |
The simplified high-level J48 algorithm [38].
| Step 1: The classifier first selects an attribute to place at the root node and creates a branch for every possible value of that attribute. |
| Step 2: The instances are then split into subsets, one for every branch extending from the root node. |
| Step 3: The procedure is repeated recursively from Step 1 for each branch, selecting an attribute for each node and using only the instances that actually reach that branch. |
| Step 4: Development of a branch stops when all instances at its node have the same class. |
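The four steps above describe top-down decision-tree induction in general. J48 is WEKA's implementation of C4.5, which chooses the split attribute in Step 1 by gain ratio (information gain divided by the split information of the attribute). The Python sketch below follows the four steps with gain ratio on nominal attributes only. It leaves out pruning, numeric thresholds, missing-value handling and the other refinements of the real J48, and the example data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, attr):
    """Information gain of splitting on attr, divided by its split information."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or (attr, {value: subtree}) (internal node)."""
    # Step 4: stop when all instances at this node have the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # No attributes left to split on: make a leaf with the majority class.
    if not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: pick the attribute with the highest gain ratio for this node.
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    # Step 2: split the instances into one subset per value of that attribute.
    subsets = {}
    for row, y in zip(rows, labels):
        sub_rows, sub_labels = subsets.setdefault(row[best], ([], []))
        sub_rows.append(row)
        sub_labels.append(y)
    # Step 3: recurse on each branch using only the instances that reach it.
    remaining = [a for a in attrs if a != best]
    return (best, {v: build_tree(r, ys, remaining) for v, (r, ys) in subsets.items()})


def classify(tree, row):
    """Follow branches down to a leaf; raises KeyError for a value unseen in training."""
    while isinstance(tree, tuple):
        attr, branches = tree
        tree = branches[row[attr]]
    return tree


# Hypothetical nominal data: (area type, period) -> case level.
rows = [("urban", "early"), ("urban", "late"), ("rural", "early"),
        ("rural", "late"), ("urban", "late"), ("rural", "early")]
labels = ["high", "high", "low", "low", "high", "low"]

tree = build_tree(rows, labels, attrs=[0, 1])
print(tree)                              # (0, {'urban': 'high', 'rural': 'low'})
print(classify(tree, ("rural", "late")))  # low
```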