Nonita Sharma, Jaiditya Dev, Monika Mangla, Vaishali Mehta Wadhwa, Sachi Nandan Mohanty, Deepti Kakkar.
Abstract
The manuscript presents a bragging-based ensemble forecasting model for predicting the number of incidences of a disease from past occurrences. The objectives of this research are to enhance accuracy, reduce overfitting, and handle overdrift; the proposed model shows promising results in terms of error metrics. The disease datasets are collected from the official government site of Hong Kong and cover the years 2010 to 2019. Preprocessing is done using log transformation and z-score transformation. The proposed ensemble model is applied, and its applicability to a specific disease dataset is presented. The proposed ensemble model is compared against three ensemble models, namely dynamic ensemble for time series (DETS), arbitrated dynamic ensemble (ADE), and random forest, using different error metrics. The proposed model reduces MAE (mean absolute error) by 27.18%, 3.07%, 11.58%, and 13.46% for tuberculosis, dengue, food poisoning, and chickenpox, respectively. The comparison between the proposed model and the existing models shows that the proposed ensemble model gives better accuracy on all four disease datasets.
© Ohmsha, Ltd. and Springer Japan KK, part of Springer Nature 2021.
Keywords: Bootstrapping; Bragging; Disease forecasting; Ensemble; Time series forecasting
Year: 2021 PMID: 33424081 PMCID: PMC7781432 DOI: 10.1007/s00354-020-00119-7
Source DB: PubMed Journal: New Gener Comput ISSN: 0288-3635 Impact factor: 1.048
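The preprocessing named in the abstract (log transformation and z-score transformation) can be sketched as follows. This is a minimal illustration, not the authors' code: the use of `log1p` (to tolerate zero counts), the order of the two steps, and the function names `log_zscore` and `invert` are assumptions. The sample values are the 2019 "Persons affected" counts from the monthly table in this record.

```python
import math
import statistics

def log_zscore(series):
    """Apply log(1 + x), then standardise to zero mean and unit variance."""
    logged = [math.log1p(x) for x in series]
    mean = statistics.fmean(logged)
    std = statistics.pstdev(logged)  # population standard deviation
    if std == 0:
        raise ValueError("series is constant after the log transform")
    return [(v - mean) / std for v in logged], mean, std

def invert(z_values, mean, std):
    """Map standardised values back to the original count scale."""
    return [math.expm1(z * std + mean) for z in z_values]

# 2019 monthly "Persons affected" counts (January to December)
counts = [26, 216, 90, 62, 74, 90, 86, 26, 60, 69, 12, 37]
z, mu, sigma = log_zscore(counts)
restored = invert(z, mu, sigma)
```

Keeping `mu` and `sigma` makes it possible to map forecasts produced on the transformed scale back to case counts before error metrics are computed.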
Fig. 1 Proposed bragging-based ensemble model demonstrating the modeling of trend using ensemble model and seasonality modeling using bragged ensemble model
| Year | Month | Outbreaks | Persons affected |
|---|---|---|---|
| 2019 | January | 7 | 26 |
| 2019 | February | 24 | 216 |
| 2019 | March | 21 | 90 |
| 2019 | April | 17 | 62 |
| 2019 | May | 24 | 74 |
| 2019 | June | 35 | 90 |
| 2019 | July | 20 | 86 |
| 2019 | August | 11 | 26 |
| 2019 | September | 21 | 60 |
| 2019 | October | 17 | 69 |
| 2019 | November | 5 | 12 |
| 2019 | December | 10 | 37 |
Fig. 2 Time series representation of disease datasets to demonstrate the trend and seasonality in the data
Fig. 3 Application of ensemble model on the bootstrapped datasets. Lighter grey lines represent the individual forecasts of the model. Red line represents the final forecast
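Bragging (bootstrap robust aggregating) combines forecasts from bootstrap replicates with the median rather than the mean. The sketch below illustrates that idea only, and every specific in it is an assumption rather than the authors' configuration: the moving-block bootstrap, the block length, the number of replicates, and the placeholder base learner `mean_of_last`. The sample series is the 2019 monthly "Outbreaks" column from the table in this record.

```python
import random
import statistics

def moving_block_bootstrap(series, block_len, rng):
    """Resample a series by concatenating randomly chosen contiguous blocks."""
    n = len(series)
    out = []
    while len(out) < n:
        start = rng.randrange(0, n - block_len + 1)
        out.extend(series[start:start + block_len])
    return out[:n]

def mean_of_last(series, window=3):
    """Placeholder base learner: forecast the mean of the last `window` points."""
    return statistics.fmean(series[-window:])

def bragging_forecast(series, n_boot=50, block_len=4, seed=0):
    """Median of base-learner forecasts over bootstrap replicates."""
    rng = random.Random(seed)
    forecasts = [
        mean_of_last(moving_block_bootstrap(series, block_len, rng))
        for _ in range(n_boot)
    ]
    return statistics.median(forecasts), forecasts

# 2019 monthly "Outbreaks" counts (January to December)
outbreaks = [7, 24, 21, 17, 24, 35, 20, 11, 21, 17, 5, 10]
final, individual = bragging_forecast(outbreaks)
```

Here `individual` plays the role of the grey per-replicate forecasts and `final` the role of the single aggregated forecast; the median makes the aggregate less sensitive to a few extreme replicates than a mean would be.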
Statistics measures of ADF test
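| Measure | Value |
|---|---|
| Test statistic | 0.815369 |
| p-value | 0.991880 |
| Lags used | 13.000000 |
| Critical value (1%) | −3.481682 |
| Critical value (5%) | −2.884042 |
| Critical value (10%) | −2.578770 |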
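The ADF values can be turned into a stationarity decision with a simple comparison. The sketch assumes that 0.815369 is the test statistic and that the three negative values are the 1%, 5%, and 10% critical values; the helper name `adf_decision` is hypothetical. The unit-root null hypothesis is rejected at a given level only when the statistic is below that level's critical value.

```python
def adf_decision(test_statistic, critical_values):
    """Return, per significance level, whether the unit-root null is rejected."""
    return {level: test_statistic < cv for level, cv in critical_values.items()}

decision = adf_decision(
    0.815369,
    {"1%": -3.481682, "5%": -2.884042, "10%": -2.578770},
)
print(decision)  # {'1%': False, '5%': False, '10%': False}
```

Under that reading the null is not rejected at any level, which agrees with the large value 0.991880 if it is the p-value, so the tested series would be treated as non-stationary.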
MAE error metrics result after the application of base model on disease datasets
| Base learners | Tuberculosis (MAE) | Dengue (MAE) | Food poisoning (MAE) | Chickenpox (MAE) |
|---|---|---|---|---|
| Theta forecasting | 13.40 | 2.48 | 49.65 | 54.12 |
| Moving average | 14.63 | 4.58 | 56.01 | 46.86 |
| Spline | 13.64 | 4.19 | 37.06 | 33.71 |
| Naïve | 9.56 | 3.84 | 21.58 | 30.41 |
| Random walk with drift | 9.55 | 3.84 | 42.25 | 44.25 |
| Croston’s method | 16.58 | 3.31 | 49.96 | 47.97 |
| Holt-Winters | 12.57 | 3.36 | 47.58 | 44.41 |
| Simple exponential smoothing | 12.55 | 3.34 | 65.38 | 43.44 |
| Seasonal naïve | 3.59 | 3.43 | 18.57 | 39.24 |
| NNAR | 2.97 | 2.75 | 14.59 | 24.12 |
| ETS | 4.26 | 2.88 | 88.31 | 56.46 |
| ARIMA | 3.50 | 2.92 | 31.78 | 37.78 |
Error metrics result of best performing base model (NNAR) on disease datasets
| Disease | RMSE | MAE | MAPE |
|---|---|---|---|
| Tuberculosis | 4.34 | 2.97 | 0.82 |
| Dengue | 3.70 | 2.75 | 0.82 |
| Food poisoning | 19.57 | 14.59 | 0.76 |
| Chickenpox | 13.50 | 24.10 | 0.39 |
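For reference, the three error metrics reported in these tables can be computed as below. This is a generic sketch with made-up sample values, not data from the study. It expresses MAPE as a percentage, which is an assumption, since the record does not state whether the tabulated MAPE values are fractions or percentages.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Illustrative values only
actual = [10, 20, 30]
predicted = [12, 18, 33]
scores = {
    "RMSE": rmse(actual, predicted),
    "MAE": mae(actual, predicted),
    "MAPE": mape(actual, predicted),
}
```

A useful sanity check when reading such tables is that RMSE computed on a set of forecasts is never smaller than MAE computed on the same forecasts.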
Error metrics result after the application of proposed model on disease datasets
| Ensemble models | RMSE | MAE | MAPE |
|---|---|---|---|
| DETS-time series ensemble model (tuberculosis) | 4.15# | 2.17# | 0.86# |
| ADE-time series ensemble model (tuberculosis) | 4.27 | 2.56 | 0.87 |
| Random forest (tuberculosis) | 5.86 | 3.04 | 1.24 |
| Proposed ensemble model (tuberculosis) | 2.25# | 1.58# | 0.54# |
| DETS-time series ensemble model (dengue) | 3.19# | 1.95# | 0.78# |
| ADE-time series ensemble model (dengue) | 3.26 | 2.31 | 0.91 |
| Random forest (dengue) | 5.26 | 3.56 | 1.13 |
| Proposed ensemble model (dengue) | 1.43# | 1.89# | 0.34# |
| DETS-time series ensemble model (food poisoning) | 14.51 | 11.95 | 0.25 |
| ADE-time series ensemble model (food poisoning) | 16.23 | 12.34 | 0.31 |
| Random forest (food poisoning) | 19.54 | 21.87 | 0.95 |
| Proposed ensemble model (food poisoning) | 11.56# | 11.45# | 0.13# |
| DETS-time series ensemble model (chickenpox) | 14.53 | 18.12 | 0.21 |
| ADE-time series ensemble model (chickenpox) | 15.31 | 20.12 | 0.28 |
| Random forest (chickenpox) | 20.08 | 24.53 | 0.34 |
| Proposed ensemble model (chickenpox) | 12.78# | 15.68# | 0.26# |
# indicates minimum value of the error metric
Comparison of percentage decrease in error metrics of proposed ensemble model vis-à-vis other ensemble models
| Ensemble models | RMSE (%) | MAE (%) | MAPE (%) |
|---|---|---|---|
| Proposed ensemble model (tuberculosis) | 45.17 | 27.18 | 37.20 |
| Proposed ensemble model (dengue) | 55.17 | 3.07 | 56.41 |
| Proposed ensemble model (food poisoning) | 20.33 | 4.18 | 48 |
| Proposed ensemble model (chickenpox) | 12.04 | 13.46 | −23.8 |
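The percentage decreases can be recomputed from the error-metrics table. The sketch assumes the baseline is the DETS row for the same disease; this is an inference from the numbers, not something the record states, and the helper name `pct_decrease` is hypothetical. The inputs are the food poisoning rows for DETS and the proposed model.

```python
def pct_decrease(baseline, proposed):
    """Relative reduction of an error metric versus a baseline, in percent."""
    return (baseline - proposed) / baseline * 100.0

dets_food = {"RMSE": 14.51, "MAE": 11.95, "MAPE": 0.25}
proposed_food = {"RMSE": 11.56, "MAE": 11.45, "MAPE": 0.13}

decrease = {
    metric: round(pct_decrease(dets_food[metric], proposed_food[metric]), 2)
    for metric in dets_food
}
print(decrease)  # {'RMSE': 20.33, 'MAE': 4.18, 'MAPE': 48.0}
```

These values match the row reporting 20.33, 4.18, and 48. A negative result, such as the −23.8 in the MAPE column, means the proposed model's error is higher than the baseline's on that metric.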