| Literature DB >> 33286642 |
Abstract
In this research, we develop ordinal decision-tree-based ensemble approaches in which an objective-based information gain measure is used to select the classifying attributes. We demonstrate the applicability of the approaches using AdaBoost and random forest algorithms for the task of classifying the regional daily growth factor of the spread of an epidemic based on a variety of explanatory factors. In such an application, some of the potential classification errors could have critical consequences. The classification tool will enable the spread of the epidemic to be tracked and controlled by yielding insights regarding the relationship between local containment measures and the daily growth factor. In order to benefit maximally from a variety of ordinal and non-ordinal algorithms, we also propose an ensemble majority voting approach to combine different algorithms into one model, thereby leveraging the strengths of each algorithm. We perform experiments in which the task is to classify the daily COVID-19 growth rate factor based on environmental factors and containment measures for 19 regions of Italy. We demonstrate that the ordinal algorithms outperform their non-ordinal counterparts with improvements in the range of 6-25% for a variety of common performance indices. The majority voting approach that combines ordinal and non-ordinal models yields a further improvement of between 3% and 10%.
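The objective-based information gain measure mentioned in the abstract selects splitting attributes using an entropy in which classes contribute according to objective-derived weights. The record does not reproduce the paper's exact definition, so the per-class weighting below (`class_weights`) is an illustrative assumption, a minimal sketch rather than the published formula:

```python
import numpy as np

def objective_based_entropy(labels, class_weights):
    """Entropy in which each class's contribution is scaled by an
    objective-derived weight (hypothetical form for illustration)."""
    classes, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()                    # empirical class probabilities
    w = np.array([class_weights[c] for c in classes])
    return float(-np.sum(w * p * np.log2(p)))

def obig(parent, splits, class_weights):
    """Objective-based information gain: parent entropy minus the
    size-weighted entropy of the child partitions."""
    n = len(parent)
    weighted_children = sum(len(s) / n * objective_based_entropy(s, class_weights)
                            for s in splits)
    return objective_based_entropy(parent, class_weights) - weighted_children
```

With uniform weights this reduces to the classical information gain; non-uniform weights let the split criterion penalize impurity more heavily in classes whose misclassification is costly.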
Keywords: AdaBoost; COVID-19; decision trees; ensemble algorithms; epidemic; information gain; objective-based entropy; ordinal classification; random forest
Year: 2020 PMID: 33286642 PMCID: PMC7517475 DOI: 10.3390/e22080871
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The meta-code of the proposed ordinal random forest classifier based on the objective-based information gain (OBIG) measure.
Figure 2. The meta-code of the proposed ordinal AdaBoost classifier based on the OBIG measure.
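Figures 1 and 2 present the authors' meta-code, which is not reproduced in this record. As a rough sketch of the overall shape of such an ensemble, the class below builds a bootstrap forest whose votes are combined by the median class index, an ordinal-aware aggregation. It uses scikit-learn's standard Gini-based trees, since the OBIG splitting criterion is not available there, so this is an approximation rather than the authors' algorithm:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

class OrdinalVotingForest:
    """Bootstrap ensemble of decision trees whose votes are aggregated
    by the median class index. Illustrative sketch only: the paper's
    trees split on the OBIG measure, not on Gini impurity."""

    def __init__(self, n_trees=100, random_state=0):
        self.n_trees = n_trees
        self.rng = np.random.default_rng(random_state)
        self.trees = []

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.trees = []
        for _ in range(self.n_trees):
            idx = self.rng.integers(0, len(X), len(X))   # bootstrap sample
            tree = DecisionTreeClassifier(
                max_features="sqrt",
                random_state=int(self.rng.integers(1 << 30)))
            tree.fit(X[idx], y[idx])
            self.trees.append(tree)
        return self

    def predict(self, X):
        votes = np.stack([t.predict(X) for t in self.trees])  # (n_trees, n_samples)
        # Median of the ordinal class indices, rounded back to a valid level
        return np.median(votes, axis=0).round().astype(int)
```

Taking the median rather than the mode respects the ordering of the levels: when votes are spread across adjacent growth-factor levels, the prediction lands between them instead of jumping to whichever extreme happens to be most frequent.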
Performance measures of CART models for the classification of the daily growth factor.
| Model | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
|---|---|---|---|---|---|
| CART | 0.361 | 0.379 | 0.493 | 1.537 | −0.079 |
| Ordinal CART–OBE(…) | 0.391 | 0.389 | — | 1.684 | −0.011 |
| Ordinal CART–OBE(…) | 0.366 | 0.358 | 0.518 | — | 0.068 |
| Ordinal CART–OBE(…) | 0.385 | 0.389 | 0.526 | 1.274 | — |
| Ordinal CART–OBE(…) | 0.409 | 0.442 | 0.535 | 1.316 | 0.016 |
Figure 3. Comparison of the AUC values (y-axis) for the two best ordinal CART models vs. non-ordinal CART as a function of the growth factor level (x-axis).
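A per-level AUC comparison of the kind plotted in Figures 3 and 4 can be obtained with a one-vs-rest reduction. The helper below assumes a classifier that outputs one probability column per growth-factor level; the paper's exact evaluation protocol is not given in this record:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def per_level_auc(y_true, proba, levels):
    """One-vs-rest AUC for each ordinal level.

    proba[:, k] holds the predicted probability of levels[k]."""
    y_true = np.asarray(y_true)
    aucs = {}
    for k, level in enumerate(levels):
        y_bin = (y_true == level).astype(int)
        # AUC is undefined when a level never occurs (or always occurs)
        aucs[level] = (roc_auc_score(y_bin, proba[:, k])
                       if 0 < y_bin.sum() < len(y_bin) else None)
    return aucs
```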
Performance measures for the classification of the daily growth factor using ordinal and non-ordinal AdaBoost models.
| Model | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
|---|---|---|---|---|---|
| AdaBoost | 0.380 | 0.411 | 0.507 | 1.253 | 0.006 |
| Ordinal AdaBoost–OBE(…) | 0.377 | 0.381 | 0.504 | 1.56 | −0.093 |
| Ordinal AdaBoost–OBE(…) | — | — | — | — | — |
| Ordinal AdaBoost–OBE(…) | 0.405 | 0.484 | 0.538 | 1.242 | 0.072 |
| Ordinal AdaBoost–OBE(…) | 0.447 | 0.474 | 0.563 | 1.221 | 0.122 |
Performance measures for the classification of the daily growth factor using ordinal and non-ordinal random forest models.
| Model | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
|---|---|---|---|---|---|
| Random forest (RF) | 0.405 | 0.400 | 0.540 | 1.421 | 0.035 |
| Ordinal RF–OBE(…) | — | — | — | — | — |
| Ordinal RF–OBE(…) | 0.407 | 0.453 | 0.559 | 1.400 | 0.126 |
| Ordinal RF–OBE(…) | 0.425 | — | 0.557 | 1.211 | 0.131 |
| Ordinal RF–OBE(…) | 0.437 | 0.442 | 0.555 | 1.411 | 0.026 |
Figure 4. Comparison of the AUC values (y-axis) for the best ordinal AdaBoost and random forest classifiers vs. their conventional counterparts as a function of the growth factor level (x-axis).
Paired t-test results for the significance of the difference in the predictions of the ordinal classifiers and their non-ordinal counterparts (ordinal CART vs. CART; ordinal AdaBoost vs. AdaBoost; ordinal random forest vs. random forest).
| Paired t-test (p-value) | Ordinal CART–OBE | Ordinal AdaBoost–OBE(…) | Ordinal RF–OBE(…) |
|---|---|---|---|
| Non-ordinal counterpart | 0.14 | 0.0057 | 0.0015 |
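A paired t-test of this kind can be reproduced with `scipy.stats.ttest_rel`. The per-instance error vectors below are made-up illustrative numbers, not the paper's data:

```python
import numpy as np
from scipy import stats

# Hypothetical per-instance absolute errors for an ordinal model and its
# non-ordinal counterpart on the same test instances (illustrative only).
err_ordinal    = np.array([1, 0, 2, 1, 0, 1, 1, 0, 2, 1])
err_nonordinal = np.array([2, 1, 2, 1, 1, 2, 1, 1, 2, 2])

# Paired test: the two error vectors are matched instance by instance
t_stat, p_value = stats.ttest_rel(err_ordinal, err_nonordinal)
```

A small p-value, as in the AdaBoost and random forest columns of the table, indicates that the ordinal model's per-instance errors differ significantly from those of its non-ordinal counterpart.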
Figure 5. Error distribution of ordinal CART and non-ordinal CART.
Figure 6. Error distribution of ordinal AdaBoost and non-ordinal AdaBoost.
Figure 7. Error distribution of ordinal random forest and non-ordinal random forest.
Performance measures of the best ordinal classifiers in comparison to eight popular non-ordinal classifiers.
| Model | Performance Measures for Classification | | | Performance Measures for Ordinal Classification | |
|---|---|---|---|---|---|
| Naïve Bayes | 0.246 | 0.305 | 0.478 | — | −0.065 |
| Logistic regression | 0.453 | 0.505 | 0.560 | 1.189 | 0.060 |
| Gradient boosting | 0.347 | 0.356 | 0.481 | 1.611 | −0.117 |
| XGBoost | 0.378 | 0.389 | 0.506 | 1.558 | −0.077 |
| K-nearest neighbor | 0.433 | 0.453 | 0.543 | 1.305 | 0.013 |
| AdaBoost | 0.380 | 0.411 | 0.507 | 1.253 | 0.006 |
| Random forest | 0.405 | 0.400 | 0.540 | 1.421 | 0.035 |
| CART | 0.361 | 0.379 | 0.493 | 1.537 | −0.079 |
| Ordinal CART–OBE(…) | 0.409 | 0.442 | 0.535 | 1.316 | 0.016 |
| Ordinal AdaBoost–OBE(…) | — | — | — | — | — |
| Ordinal RF–OBE(…) | 0.439 | 0.484 | — | 1.147 | — |
Comparison of global performance measures for the majority voting ensemble approach and the best individual classifiers (non-ordinal and ordinal).
| Performance Measure | Majority Voting Model Based on Ordinal and Non-Ordinal Classifiers | Best Non-Ordinal Classifier: Logistic Regression | Best Ordinal Classifier: Ordinal AdaBoost Based on OBE |
|---|---|---|---|
| — | — | 0.453 | 0.475 |
| — | — | 0.505 | 0.526 |
| — | — | 0.560 | 0.578 |
| — | — | 1.189 | 1.137 |
| — | — | 0.060 | 0.016 |