Gustavo B Libotte, Lucas Dos Anjos, Regina C C Almeida, Sandra M C Malta, Renato S Silva.
Abstract
Reliable data are essential to obtain adequate simulations for forecasting the dynamics of epidemics. In this context, several political, economic, and social factors may cause inconsistencies in the reported data, which affect the capacity for realistic simulations and predictions. In the case of COVID-19, for example, such uncertainties are mainly caused by large-scale underreporting of cases due to reduced testing capacity in some locations. To mitigate the effects of noise in the data used to estimate model parameters, we propose strategies capable of improving the ability to predict the spread of the disease. Using a compartmental model in a COVID-19 case study, we show that regularizing the data by means of Gaussian process regression can reduce the variability of successive forecasts, improving predictive ability. We also present the advantages of adopting time-varying parameters in compartmental models, rather than the usual approach with constant values.
Keywords: Gaussian process regression; Noisy data; Regularization of data; Time-dependent parameters; Uncertainty quantification
Year: 2022 PMID: 35017792 PMCID: PMC8736321 DOI: 10.1007/s11071-021-07069-9
Source DB: PubMed Journal: Nonlinear Dyn ISSN: 0924-090X Impact factor: 5.741
Fig. 1 Schematic description of the SEIRPD-Q model
Fig. 2 Results of the regularization of data using GPR. The blue dots are the original data and the orange dots represent the resulting regularized data. Shaded areas indicate two standard deviations from the corresponding regularized data. We also show the cumulative data, for comparison purposes
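The kind of regularization shown in Fig. 2 can be sketched in a few lines. This is a minimal illustration of Gaussian process regression, not the authors' implementation. It assumes a squared-exponential kernel, fixed hyperparameters (`length_scale`, `noise_var`), and synthetic data. All function names are hypothetical.

```python
import math
import random

def rbf(a, b, length_scale):
    """Squared-exponential kernel with unit signal variance (assumed kernel)."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower

def forward_sub(lower, rhs):
    """Solve L y = rhs by forward substitution."""
    y = []
    for i, row in enumerate(lower):
        y.append((rhs[i] - sum(row[k] * y[k] for k in range(i))) / row[i])
    return y

def backward_sub(lower, rhs):
    """Solve L^T x = rhs by back substitution."""
    n = len(lower)
    x = [0.0] * n
    for i in reversed(range(n)):
        partial = sum(lower[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (rhs[i] - partial) / lower[i][i]
    return x

def gpr_regularize(times, values, length_scale=10.0, noise_var=0.1):
    """Return the GP posterior mean and standard deviation at the observed times.

    Hyperparameters are fixed here for brevity (an assumption); in practice they
    are usually fitted, e.g. by maximizing the marginal likelihood.
    """
    n = len(values)
    mean = sum(values) / n
    scale = math.sqrt(sum((v - mean) ** 2 for v in values) / n) or 1.0
    y = [(v - mean) / scale for v in values]  # standardize the observations
    gram = [[rbf(times[i], times[j], length_scale) + (noise_var if i == j else 0.0)
             for j in range(n)] for i in range(n)]
    lower = cholesky(gram)
    alpha = backward_sub(lower, forward_sub(lower, y))
    smoothed, stds = [], []
    for t in times:
        k_star = [rbf(t, s, length_scale) for s in times]
        mu = sum(k * a for k, a in zip(k_star, alpha))
        v = forward_sub(lower, k_star)
        var = max(1.0 - sum(c * c for c in v), 0.0)  # guard against round-off
        smoothed.append(mean + scale * mu)
        stds.append(scale * math.sqrt(var))
    return smoothed, stds

def roughness(series):
    """Sum of squared second differences, a simple measure of day-to-day noise."""
    return sum((series[i + 1] - 2 * series[i] + series[i - 1]) ** 2
               for i in range(1, len(series) - 1))

# Synthetic daily counts (not real data): a smooth epidemic-like bump plus noise.
rng = random.Random(0)
days = list(range(60))
noisy = [100.0 * math.exp(-((d - 30) / 12.0) ** 2) + rng.gauss(0.0, 10.0) for d in days]
regularized, std = gpr_regularize(days, noisy)
band = [(m - 2 * s, m + 2 * s) for m, s in zip(regularized, std)]  # two-sigma band
```

Here `regularized` plays the role of the orange dots and `band` the shaded two-standard-deviation region. A library implementation with fitted hyperparameters would normally be preferred over this hand-rolled solver.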
Fig. 3 NRMSEs computed by comparing simulations of the SEIRPD-Q model, performed using parameters that best fit the training data, where , in relation to the test set (composed of 14 points). The area under the curve represents the total deviation relative to the test data in all runs, where the number of training data varies. (Color figure online)
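As a rough illustration of the error measure named in this caption, the sketch below computes a normalized root-mean-square error between a simulation and a 14-point test window. The caption does not state which normalization is used, so dividing by the range of the observed values is an assumption. The function name and the data are hypothetical.

```python
import math

def nrmse(simulated, observed):
    """RMSE divided by the range of the observed values (assumed normalization)."""
    if len(simulated) != len(observed) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((s - o) ** 2 for s, o in zip(simulated, observed)) / len(observed)
    spread = max(observed) - min(observed)
    if spread == 0:
        raise ValueError("observed values have zero range")
    return math.sqrt(mse) / spread

# Hypothetical 14-point test window and a simulation with a constant offset.
test_data = [100.0 + 10.0 * day for day in range(14)]
simulation = [value + 13.0 for value in test_data]
score = nrmse(simulation, test_data)
```

Other conventions, such as dividing by the mean of the observations, change the scale of the score but not the ranking of runs evaluated on the same test window.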
Fig. 4 Simulations of the daily number of infected and dead individuals using optimal parameters obtained with the deterministic approach, by varying the amount of data in the training set. The shaded areas represent the variation range of the simulations, whereas the points are the fitted data. Box-and-whisker diagrams are used to show the variability of the simulations on specific days. We also show the corresponding results for cumulative data, where the noise has less effect, for comparison purposes. The results follow the same color scheme as in Fig. 2: blue for original data and orange for regularized data. (Color figure online)
Table 1 Statistical results of all 136 simulations performed using the optimal parameters. The results refer to the cumulative number of infected and dead individuals, that is, for , obtained using both original and regularized data on the last day that the model was simulated, October 5, 2020. On this day, Rio de Janeiro had accumulated 273,335 confirmed cases and 18,780 deaths
| | Original data: infected | Original data: dead | Regularized data: infected | Regularized data: dead |
|---|---|---|---|---|
| Min | 146,763.4 | 8,066.9 | 80,762.4 | 6,950.6 |
| Max | 36,518,946.5 | 3,579,976.6 | 253,387.8 | 17,528.7 |
| Median | 259,406.5 | 17,313.5 | 217,621.5 | 16,196.7 |
| Q1 | 172,065.9 | 14,796.5 | 163,064.3 | 15,249.4 |
| Q3 | 3,970,094.4 | 342,994.5 | 243,274.8 | 17,059.5 |
Table 2 MAP values and 95% CIs of the parameters estimated using Bayesian calibration (in a.u.)
| Parameter | Original data | Regularized data |
|---|---|---|
| | 0.007121 | 0.004859 |
| | 0.07126 | 0.0625 |
| | 1190.8021 | 1236.5740 |
Fig. 5 The left frame shows the posterior distribution of the parameters obtained in the Bayesian inference for original (in blue) and regularized (in orange) data, by fitting the daily number of infected and dead individuals; the right frame compares the variance of each parameter, using the same color scheme. (Color figure online)
Fig. 6 Simulations with parameters obtained by fitting the daily number of infected (in red) and dead (in green) individuals using the Bayesian approach. The shaded areas refer to the 95% CI (see Table 2). Hatched areas bound the range of training and test data. Training data is never regularized. (Color figure online)
Fig. 7 Optimal parameters obtained by fitting the daily data of infected and dead individuals, for . Each run is associated with a training set of a specific size, from 60 to 196 data points. The test data are the 14 data points that follow the training set of the corresponding run. Parameters obtained by fitting original data are shown in blue, and those obtained from regularized data are shown in orange. (Color figure online)
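The train/test scheme described in this caption resembles a rolling-origin evaluation, sketched below. This is only an illustration: the function name is hypothetical, the series is a placeholder, and treating both training-size bounds as inclusive is an assumption that the caption does not settle.

```python
def rolling_splits(series, min_train=60, max_train=196, horizon=14):
    """Yield (train, test) pairs: a growing training prefix and the next `horizon` points.

    Both size bounds are treated as inclusive, which is an assumption.
    """
    for size in range(min_train, max_train + 1):
        test = series[size:size + horizon]
        if len(test) < horizon:
            break  # not enough data left for a full test window
        yield series[:size], test

# Placeholder series standing in for daily counts (indices used as values).
daily_counts = list(range(210))
splits = list(rolling_splits(daily_counts))
first_train, first_test = splits[0]
last_train, last_test = splits[-1]
```

Each pair would then be used to fit the model on `train` and score the resulting simulation against `test`.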
Fig. 8 NRMSEs computed by comparing simulations of the SEIRPD-Q model, performed using parameters that best fit the training data, where , in relation to the test set (composed of 14 points). The area under the curve represents the total deviation relative to the test data in all runs, where the number of training data varies
Fig. 9 Optimal parameters obtained in a procedure similar to that of Fig. 7, for . We also show the interval that includes the curves of , for each optimal value of and associated with the same run. The optimal values of presented in Fig. 7 are also shown, for comparison with the curves of