
Short-term forecasting COVID-19 cumulative confirmed cases: Perspectives for Brazil.

Matheus Henrique Dal Molin Ribeiro^1,2, Ramon Gomes da Silva^1, Viviana Cocco Mariani^3,4, Leandro dos Santos Coelho^1,4.

Abstract

The new Coronavirus (COVID-19) is an emerging disease responsible for infecting millions of people since its first notification. Developing efficient short-term forecasting models makes it possible to forecast the number of future cases and, in this context, to develop strategic planning in the public health system to avoid deaths. In this paper, autoregressive integrated moving average (ARIMA), cubist regression (CUBIST), random forest (RF), ridge regression (RIDGE), support vector regression (SVR), and stacking-ensemble learning are evaluated in the task of forecasting, one, three, and six days ahead, the COVID-19 cumulative confirmed cases in ten Brazilian states with a high daily incidence. In the stacking-ensemble learning approach, the CUBIST, RF, RIDGE, and SVR models are adopted as base-learners and a Gaussian process (GP) as the meta-learner. The models' effectiveness is evaluated based on the improvement index, mean absolute error, and symmetric mean absolute percentage error criteria. In most cases, SVR and stacking-ensemble learning perform better on the adopted criteria than the compared models. In general, the developed models can generate accurate forecasts, achieving errors in the ranges of 0.87%-3.51%, 1.02%-5.63%, and 0.95%-6.90% for one, three, and six days ahead, respectively. The ranking of models, from best to worst regarding accuracy over all scenarios, is SVR, stacking-ensemble learning, ARIMA, CUBIST, RIDGE, and RF. The evaluated models are recommended for forecasting and monitoring the ongoing growth of COVID-19 cases, since they can assist managers in decision-making support systems.


Keywords: ARIMA; COVID-19; Decision-making; Forecasting; Machine learning; Time-series

Year: 2020 PMID: 32501370 PMCID: PMC7252162 DOI: 10.1016/j.chaos.2020.109853

Source DB: PubMed Journal: Chaos Solitons Fractals ISSN: 0960-0779 Impact factor: 5.944

Introduction

The new Coronavirus (COVID-19) is an emerging disease that has infected millions of people and killed thousands worldwide since its first notification, according to the World Health Organization (WHO) [1], [2]. Also according to the WHO, Brazil had registered 40,581 confirmed cases by April 22nd, 2020, holding the 12th position in the world ranking of confirmed COVID-19 cases and the 2nd position in the Americas (behind the United States of America). Due to the impact of the COVID-19 pandemic on people's lives and the world's economy, governments and the population are most concerned with (i) when the COVID-19 outbreak will peak; (ii) how long the outbreak will last; and (iii) how many people will eventually be infected [3]. Further, Boccaletti et al. [4] have identified at least three scientific communities that may cooperate in the effort to deal with the current pandemic: (i) the community of applied mathematicians, virologists, and epidemiologists, who develop sophisticated diffusion models tailored to the specific properties of a given pathogen; (ii) the community of complex systems scientists, who study the spread of infections using compartmental models together with methods and principles from statistical mechanics and nonlinear dynamics; and (iii) the community of scientists who use artificial intelligence (AI), and most specifically deep learning approaches, to produce accurate predictive models. Different studies are also evaluating the impact of COVID-19 on society, whether through predictions of future cases or through variables that help to explain the spread of the disease [5], [6], [7], [8], [9]. Moreover, epidemiological time series forecasting plays an important role in the public health system, since it allows managers to develop strategic planning to avoid possible epidemics. Forecasting diseases as accurately as possible is important because of their impact on the public health system.

To ensure this accuracy, AI models have been widely used to forecast epidemiological time series over the years [10], [11], [12]. In the AI context, Vaishya et al. [13] presented a review of trends in COVID-19 data analysis. Against this background, the objective of this paper is to explore and compare the predictive capacity of machine learning regression and statistical models in the task of forecasting COVID-19 cumulative cases in Brazil one, three, and six days ahead. To this end, datasets from ten Brazilian states, some with a high incidence of COVID-19 so far, such as Sao Paulo and Rio de Janeiro, are used to evaluate the forecasting efficiency of the autoregressive integrated moving average (ARIMA), cubist regression (CUBIST), random forest (RF), ridge regression (RIDGE), support vector regression (SVR), and stacking-ensemble learning models. In the stacking-ensemble learning modelling, which is an effective ensemble learning approach [14], [15], CUBIST, RF, RIDGE, and SVR are used as base-learners (weak models), and a Gaussian process (GP) as the meta-learner (strong model). The out-of-sample forecasting accuracy of each model is compared using the improvement percentage index (IP), mean absolute error (MAE), and symmetric mean absolute percentage error (sMAPE).

The contributions of this paper can be summarized as follows. First, it presents a novel analysis of forecasting models for the cumulative confirmed cases of COVID-19 in Brazil, whose accuracy can assist governors in decision-making to contain the pandemic and in strategies concerning the health system. Second, it uses heterogeneous machine learning models, as well as the stacking-ensemble learning approach, to forecast the Brazilian cumulative confirmed cases of COVID-19. Third, it evaluates the models in a multi-day-ahead forecasting strategy, with forecasting horizons of one, three, and six days ahead. This range of horizons makes it possible to verify the effectiveness of the models in different scenarios, helping future strategies for fighting COVID-19.

The remainder of this paper is organized as follows. Section 2.1 gives a brief description of the dataset adopted in this paper. The forecasting models applied in this study are described in Section 2.2. Section 3 details the procedures applied in the research methodology. The results and the related discussion of the models' forecasting performance are given in Section 4. Finally, Section 5 concludes this work with considerations and some directions for future research.

Material and methods

This section describes the material analyzed (Section 2.1) as well as the models applied in this paper (Section 2.2).

Dataset description

The collected dataset refers to the cumulative confirmed cases of COVID-19 that occurred in Brazil until April 18 or 19, 2020. The dataset was collected from an application programming interface [16] that retrieves the daily information about COVID-19 cases from all 27 Brazilian State Health Offices, gathers it, and makes it publicly available. Among the 27 federative units (26 states and one federal district), ten states were chosen, some with a high incidence of COVID-19 cases and others, from the south of Brazil, with lower temperatures: Amazonas (AM), Bahia (BA), Ceara (CE), Minas Gerais (MG), Parana (PR), Rio de Janeiro (RJ), Rio Grande do Norte (RN), Rio Grande do Sul (RS), Santa Catarina (SC), and Sao Paulo (SP). The measurement period varies by state, since each state is counted from the day of its first case until the day of its last report. The cumulative confirmed cases and deaths of each state, as well as the dates of the first and last reports, are given in Table 1. Changes in the way the health departments account for the number of cases may change the data presented in this paper.

Table 1

First and last report dates by state.

State	Number of observed days	First report	Last report	Cumulative confirmed cases	Cumulative deaths
AM	34	13/03/2020	19/04/2020	2044	182
BA	43	06/03/2020	19/04/2020	1249	45
CE	35	16/03/2020	19/04/2020	3306	189
MG	42	08/03/2020	19/04/2020	1154	39
PR	36	12/03/2020	18/04/2020	960	49
RJ	38	05/03/2020	19/04/2020	4675	402
RN	30	12/03/2020	18/04/2020	561	26
RS	38	10/03/2020	19/04/2020	869	26
SC	39	12/03/2020	19/04/2020	1025	35
SP	53	25/02/2020	19/04/2020	14267	1015

A heatmap of the cumulative confirmed cases is presented in Fig. 1.

Fig. 1

Heatmap of the cumulative confirmed cases of the analyzed states.

Methodologies

This section briefly describes each model employed in the data analysis.

ARIMA is a Box and Jenkins model usually employed to deal with non-stationary time series. The ARIMA model is fully specified by the autoregressive order (p), the degree of differencing (d), and the moving average order (q). These parameters define the model order and are usually chosen by grid-search, as well as from the autocorrelation and partial autocorrelation functions. In this context, the model is described as ARIMA(p,d,q) [17].

CUBIST is a rule-based model that makes predictions following the regression tree principle [18]. The final forecast is obtained through a committee of rules combined with a neighborhood concept similar to k-nearest-neighbor modelling.

A GP is a collection of Gaussian-distributed random variables, fully specified by its mean and covariance (kernel) functions [19]. In this paper, a GP with a linear kernel is adopted.

RIDGE is a regularized regression approach [20] that adds a penalization term to the ordinary least squares objective. It is an effective tool, since the penalty shrinks the parameter estimates and controls their standard errors at the cost of a small bias. Moreover, the model can deal with multi-collinearity among the inputs.

RF is a bagging ensemble-based model that combines the advantages of bagging, characterized by the creation of multiple samples resampled with replacement from the same dataset through the bootstrap technique, with the random selection of predictors to compose each node of the decision tree [21]. RF is a fast and robust supervised learning method able to deal with the randomness of the time series. It is also attractive because, in addition to being an ensemble approach, only the number of predictors for each node needs to be tuned.

SVR consists of determining support vectors (points) close to a hyperplane that maximizes the margin between two classes of points obtained from the difference between the target value and a threshold. To deal with non-linear problems, SVR relies on kernel functions, which compute the similarity between two observations. In this paper, the linear kernel is adopted. The main advantage of SVR lies in its capacity to capture non-linearity in the predictors and use it to improve the forecasts. It is also advantageous in this case study because the samples are small [22].

Stacked generalization, or stacking-ensemble learning, is an ensemble-based approach [23] that combines, through a meta-learner, the predictions of a set of weak models (base-learners) to obtain a stronger learner. This approach usually operates on two levels: in the first level, the base-learners are trained and their predictions are obtained; in the second level, a meta-learner is trained using the predictions of the first level as inputs. The stacking predictions are those of the meta-learner. The main advantage of stacking-ensemble learning is that it can improve accuracy and additionally reduce the error variance [14].
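The two-level idea behind stacking can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study used R): the two base-learners here are fixed toy rules standing in for CUBIST, RF, RIDGE, and SVR, and the meta-learner is a plain least-squares linear combination standing in for the GP with a linear kernel. All function names and the example series are hypothetical. Because the toy base-learners are not fitted to the data, their in-sample predictions can be passed to the meta-learner directly; with trained base-learners, out-of-fold predictions would be needed instead.

```python
from typing import Callable, List, Sequence, Tuple


def persistence(window: Sequence[float]) -> float:
    """Toy base-learner (assumption): tomorrow equals today."""
    return window[-1]


def extrapolation(window: Sequence[float]) -> float:
    """Toy base-learner (assumption): continue the last increment."""
    return 2.0 * window[-1] - window[-2]


BASE_LEARNERS: List[Callable[[Sequence[float]], float]] = [persistence, extrapolation]


def level_one_inputs(series: Sequence[float], n_lags: int):
    """Level 0 -> level 1: base-learner predictions become meta-learner inputs."""
    features, targets = [], []
    for t in range(n_lags, len(series)):
        window = series[t - n_lags:t]
        features.append([learner(window) for learner in BASE_LEARNERS])
        targets.append(series[t])
    return features, targets


def fit_linear_meta(features, targets) -> Tuple[float, float]:
    """Least-squares weights for two base-learners (2x2 normal equations, no intercept)."""
    a11 = sum(f[0] * f[0] for f in features)
    a12 = sum(f[0] * f[1] for f in features)
    a22 = sum(f[1] * f[1] for f in features)
    b1 = sum(f[0] * y for f, y in zip(features, targets))
    b2 = sum(f[1] * y for f, y in zip(features, targets))
    det = a11 * a22 - a12 * a12
    if det == 0:
        raise ValueError("base-learner predictions are collinear")
    return (b1 * a22 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def stacked_forecast(series: Sequence[float], n_lags: int, weights) -> float:
    """One-step forecast of the stacked model: weighted base-learner predictions."""
    window = series[-n_lags:]
    return sum(w * learner(window) for w, learner in zip(weights, BASE_LEARNERS))


cases = [float(v) for v in range(1, 11)]  # hypothetical linear series 1..10
features, targets = level_one_inputs(cases, n_lags=2)
weights = fit_linear_meta(features, targets)
next_value = stacked_forecast(cases, 2, weights)
print(weights, next_value)
```

On this linear toy series the meta-learner puts all its weight on the extrapolation rule, which is exact here, so the stacked forecast for the next value is 11.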

Proposed forecasting framework

This section describes the main steps of the data analysis adopted for the CUBIST, RF, RIDGE, SVR, and stacking-ensemble learning models. The ARIMA modelling is also described.

Step 1: First, the raw data are split into training and test datasets. The test dataset is composed of the last six observations, and the training dataset of the remaining samples [14]. The training data are centered by their mean value and divided by their standard deviation. To develop multi-days-ahead forecasting of COVID-19 cases, the recursive strategy is employed [24]. In this strategy, one model is fitted for one-day-ahead forecasting. The forecast value is then used as an input to the same model to forecast the next step, and so on until the desired horizon is reached. The training structure adopted in this paper is stated as follows,

$$y_{t+1} = f(y_t, y_{t-1}, \ldots, y_{t-n+1}) + \epsilon, \qquad (1)$$

in which $f$ is a function related to the adopted model in the training stage, $y_{t+1}$ is the COVID-19 case count one day ahead, $y_t, \ldots, y_{t-n+1}$ are the past confirmed cases, and $\epsilon$ is the random error, following a normal distribution with zero mean and constant variance $\sigma^2$. In this paper, the aim is to obtain the cases up to the next H days, specifically up to 1 (ODA, one-day-ahead), 3 (TDA, three-days-ahead), and 6 days ahead (SDA, six-days-ahead). The following structure is considered,

$$\hat{y}_{t+h} = \begin{cases} f(y_t, \ldots, y_{t-n+1}), & h = 1 \\ f(\hat{y}_{t+h-1}, \ldots, \hat{y}_{t+1}, y_t, \ldots, y_{t+h-n}), & 1 < h \le n \\ f(\hat{y}_{t+h-1}, \ldots, \hat{y}_{t+h-n}), & h > n \end{cases} \qquad (2)$$

where $\hat{y}_{t+h}$ is the forecast value at time t and forecast horizon up to h, and $y_t, \ldots, y_{t+h-n}$ and $\hat{y}_{t+h-1}, \ldots, \hat{y}_{t+1}$ are the previously observed and forecast cases, lagged in days. The n value is chosen through grid-search in order to best capture the data behavior.

Step 2: In the stacking-ensemble learning modelling, the base-learners CUBIST, RF, RIDGE, and SVR are trained and their forecasts are used as inputs for the GP meta-learner. In the training stage, leave-one-out cross-validation with a time slice is adopted [14]. Finally, the out-of-sample forecasts are computed. These approaches are developed using the caret package [25]. The ARIMA modelling is performed with the forecast package [26], [27], using the auto.arima function. To define the ARIMA order, grid-search is adopted, and the most suitable order is the one that reaches the lowest Akaike and Bayesian information criteria. Both analyses are developed using the R software [28]. All hyperparameters employed in this study are presented in Table B.1 in Appendix B.
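The recursive strategy of Step 1 can be sketched in a few lines of Python. This is an illustration only, not the code used in the study (which relied on R and the caret package): the one-step model `linear_extrapolation` is a toy stand-in for a fitted CUBIST, RF, RIDGE, or SVR model, and the function names and the example series are hypothetical.

```python
from typing import Callable, List, Sequence


def make_lagged(series: Sequence[float], n_lags: int):
    """Build training pairs: the n_lags previous values -> the next value."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(list(series[t - n_lags:t]))
        targets.append(series[t])
    return rows, targets


def recursive_forecast(one_step: Callable[[List[float]], float],
                       history: Sequence[float],
                       n_lags: int,
                       horizon: int) -> List[float]:
    """Forecast `horizon` steps by feeding each forecast back in as an input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        y_hat = one_step(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest lag, append the forecast
    return forecasts


def linear_extrapolation(window: List[float]) -> float:
    """Toy one-step model (assumption): continue the last observed increment."""
    return window[-1] + (window[-1] - window[-2])


cases = [10.0, 20.0, 30.0, 40.0]  # hypothetical cumulative counts
rows, targets = make_lagged(cases, n_lags=2)
sda = recursive_forecast(linear_extrapolation, cases, n_lags=2, horizon=6)
print(sda[0], sda[2], sda[5])  # forecasts at the ODA, TDA, and SDA horizons
```

With this toy series the increment of 10 is simply continued, so the forecasts one, three, and six days ahead are 50, 70, and 100. From the second step onward every input window contains at least one forecast value, which is why errors tend to accumulate as the horizon grows.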

Table B1

Hyperparameters selected by grid-search for each evaluated model.

State	ARIMA (p,d,q)	CUBIST committees	CUBIST neighbors	SVR cost	RIDGE regularization	RF number of randomly selected predictors
AM	(1,2,0)	10	5	1	3.16E-03	2
BA	(0,2,1)	20	9	1	1E-04	2
CE	(2,2,1)	1	9	1	0	4
MG	(0,2,1)	1	9	1	1E-04	2
PR	(0,2,1)	20	5	1	3.16E-03	3
RJ	(0,2,1)	1	9	1	1E-04	3
RN	(1,1,0)	1	9	1	3.16E-03	5
RS	(0,1,0)	1	9	1	1E-04	3
SC	(0,2,1)	10	0	1	3.16E-03	5
SP	(0,2,0)	20	9	1	1E-04	5
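A grid-search of the kind used to select these hyperparameters can be sketched as follows. The sketch assumes a rolling-origin reading of the time-slice validation scheme (each validation point is forecast only from earlier data) and tunes a single hypothetical hyperparameter, the number of lags of a toy drift model. It does not reproduce the caret procedure, and the names and the example series are illustrative.

```python
from typing import Dict, Sequence, Tuple


def drift_forecast(window: Sequence[float]) -> float:
    """Toy one-step model (assumption): last value plus the mean increment."""
    return window[-1] + (window[-1] - window[0]) / (len(window) - 1)


def rolling_origin_mae(series: Sequence[float], n_lags: int, first_origin: int) -> float:
    """Mean absolute one-step error, forecasting each point from past data only."""
    errors = []
    for t in range(first_origin, len(series)):
        window = series[t - n_lags:t]  # strictly before time t
        errors.append(abs(series[t] - drift_forecast(window)))
    return sum(errors) / len(errors)


def grid_search_lags(series: Sequence[float],
                     candidates: Sequence[int]) -> Tuple[int, Dict[int, float]]:
    """Pick the candidate lag count with the lowest rolling-origin MAE."""
    first_origin = max(candidates)  # same validation points for every candidate
    scores = {n: rolling_origin_mae(series, n, first_origin) for n in candidates}
    best = min(scores, key=scores.get)
    return best, scores


series = [float(t * t) for t in range(8)]  # hypothetical accelerating series
best, scores = grid_search_lags(series, [2, 3, 4])
print(best, scores)
```

For this accelerating series the shortest window tracks the most recent increment best, so the search selects 2 lags. Evaluating every candidate on the same validation points keeps the scores comparable.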

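Step 3: To evaluate the effectiveness of the adopted models, the IP (3), MAE (4), and sMAPE (5) performance criteria are computed from the out-of-sample forecasts (test set) as

$$\mathrm{IP} = 100 \times \frac{M_c - M_b}{M_c}, \qquad (3)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|, \qquad (4)$$

$$\mathrm{sMAPE} = \frac{2}{n}\sum_{i=1}^{n} \frac{\left| y_i - \hat{y}_i \right|}{\left| y_i \right| + \left| \hat{y}_i \right|}, \qquad (5)$$

where n is the number of observations, and $y_i$ and $\hat{y}_i$ are the ith observed and predicted values, respectively. Also, $M_c$ and $M_b$ represent the performance measure of the compared and best models, respectively. Fig. 2 presents the proposed forecasting framework.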

Fig. 2

Proposed forecasting framework.
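The three criteria of Step 3 can be written as short Python functions. This sketch assumes the common definitions of MAE and sMAPE (the latter returned as a fraction, to be multiplied by 100 for a percentage) and an IP defined as the relative reduction of a measure with respect to the compared model. That IP form reproduces the 6.58%–92.77% range reported for AM at the ODA horizon from the MAE values in Table A1. The function names and the small observed/predicted example are hypothetical.

```python
from typing import Sequence


def mae(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(y - p) for y, p in zip(observed, predicted)) / len(observed)


def smape(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Symmetric MAPE as a fraction; assumes |y| + |y_hat| is never zero."""
    terms = [2.0 * abs(y - p) / (abs(y) + abs(p)) for y, p in zip(observed, predicted)]
    return sum(terms) / len(terms)


def improvement_percentage(m_compared: float, m_best: float) -> float:
    """Percentage by which the best model reduces the compared model's measure."""
    return 100.0 * (m_compared - m_best) / m_compared


# Hypothetical observed and predicted cumulative cases.
observed = [100.0, 200.0]
predicted = [110.0, 190.0]
print(mae(observed, predicted), round(smape(observed, predicted), 4))

# MAE values for AM at the ODA horizon from Table A1: CUBIST 45, RIDGE 48.17, RF 622.17.
ip_vs_ridge = improvement_percentage(48.17, 45.0)
ip_vs_rf = improvement_percentage(622.17, 45.0)
print(round(ip_vs_ridge, 2), round(ip_vs_rf, 2))
```

The last line prints 6.58 and 92.77, the lower and upper ends of the MAE improvement range quoted for CUBIST in the AM state.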

Results

This section describes the results of the experiments on the out-of-sample forecasts (test set). First, Section 4.1 compares the results of the evaluated models over the ten datasets and three forecasting horizons adopted. In Table A.1 in Appendix A, the best results regarding accuracy are presented in bold. Additionally, Figs. 3 and 4 illustrate the relation between the observed and predicted values achieved by the models with the best set of performance measures in Table A.1, and box-plots of the out-of-sample errors are shown in Fig. 5.

Table A1

Performance measures for each evaluated model.

State	Forecasting horizon	Criteria	ARIMA	CUBIST	RF	RIDGE	Stacking	SVR
AM	ODA	MAE	95	45	622.17	48.17	121.5	56.33
		sMAPE	6.61%	2.80%	42.50%	2.83%	7.13%	3.18%
	TDA	MAE	101.33	71.33	622.17	83.67	176.67	80.5
		sMAPE	6.55%	4.50%	42.50%	4.49%	10.47%	4.19%
	SDA	MAE	119.17	162.17	622.17	62.33	233.17	79.17
		sMAPE	6.97%	9.55%	42.50%	3.45%	13.87%	4.13%

BA	ODA	MAE	12	93.83	366.33	45.33	107.67	42.33
		sMAPE	1.56%	9.16%	42.02%	4.36%	10.68%	4.15%
	TDA	MAE	70	132	366.33	74.33	171.67	59.67
		sMAPE	8.00%	12.92%	42.02%	7.46%	17.32%	5.63%
	SDA	MAE	155.67	152.33	366.33	152.83	215.83	73.17
		sMAPE	15.41%	15.08%	42.02%	15.16%	22.25%	6.90%

CE	ODA	MAE	18	65.17	916	70.33	220.83	87.67
		sMAPE	0.87%	2.49%	40.28%	2.81%	8.20%	3.17%
	TDA	MAE	69.66	128.83	916	149.83	382.17	136.67
		sMAPE	3.01%	4.48%	40.28%	5.39%	14.48%	4.78%
	SDA	MAE	257	118.17	916	98.17	484.33	164.17
		sMAPE	9.34%	4.11%	40.28%	3.52%	18.78%	5.77%

MG	ODA	MAE	32	17.5	235.5	24.33	56.5	16
		sMAPE	3.63%	1.81%	26.21%	2.50%	5.59%	1.57%
	TDA	MAE	26	21.33	235.5	21.67	78.17	21
		sMAPE	3.08%	2.20%	26.21%	2.13%	7.81%	2.04%
	SDA	MAE	55	36.83	235.5	32.17	97.83	14.33
		sMAPE	5.43%	3.58%	26.21%	3.14%	9.88%	1.41%

PR	ODA	MAE	31	27.33	163.5	38	23.5	35.33
		sMAPE	3.96%	3.26%	21.09%	4.50%	2.69%	4.18%
	TDA	MAE	51.66	57.33	163.5	76.5	28.17	60.17
		sMAPE	6.21%	6.56%	21.09%	8.61%	3.21%	6.89%
	SDA	MAE	73.67	118	163.5	151	24.17	117.17
		sMAPE	8.20%	12.56%	21.09%	15.75%	2.75%	12.53%

RJ	ODA	MAE	110	165.5	1305.67	273.67	69.5	360.83
		sMAPE	3.17%	3.82%	37.06%	6.25%	1.70%	8.09%
	TDA	MAE	120	275.67	1305.67	462.83	68	429.33
		sMAPE	3.18%	6.24%	37.06%	10.20%	1.65%	9.49%
	SDA	MAE	158.33	532.67	1305.67	696.17	65.17	529.5
		sMAPE	3.67%	11.34%	37.06%	14.67%	1.58%	11.43%

RN	ODA	MAE	6	17	152.5	24.83	30.33	18.33
		sMAPE	1.61%	3.87%	39.28%	5.56%	6.45%	4.14%
	TDA	MAE	8.33	30.83	152.5	37.67	54	35.5
		sMAPE	2.11%	6.54%	39.28%	8.51%	11.66%	7.69%
	SDA	MAE	36.33	15.83	152.5	62	54	18.5
		sMAPE	7.61%	3.42%	39.28%	12.76%	11.66%	4.15%

RS	ODA	MAE	12	12.83	146.67	11.33	45.5	8.17
		sMAPE	1.64%	1.62%	19.82%	1.43%	5.76%	0.97%
	TDA	MAE	24	19.17	147.33	18.67	71.33	8.5
		sMAPE	3.22%	2.47%	19.92%	2.42%	9.14%	1.02%
	SDA	MAE	34.5	34.17	147.5	37.67	91.83	7.83
		sMAPE	4.31%	4.26%	19.95%	4.74%	11.89%	0.95%

SC	ODA	MAE	21	93.67	179.5	180.5	33.83	177.67
		sMAPE	2.43%	9.66%	20.97%	17.53%	3.66%	17.27%
	TDA	MAE	44.33	100.33	179.5	277	41	257.33
		sMAPE	4.76%	10.30%	20.97%	25.34%	4.39%	23.79%
	SDA	MAE	56	102.83	179.5	338.5	43.83	330.33
		sMAPE	5.65%	10.53%	20.97%	29.95%	4.68%	29.23%

SP	ODA	MAE	436	1587	3799	537.33	1363.83	409
		sMAPE	4.65%	13.47%	35.85%	4.44%	11.44%	3.51%
	TDA	MAE	1485.66	2471.83	3801	579.17	2243	326.67
		sMAPE	14.56%	21.81%	35.88%	4.79%	19.47%	2.77%
	SDA	MAE	2779	3054.67	3801.5	591.83	2665.83	362.83
		sMAPE	24.74%	27.60%	35.88%	4.95%	23.55%	3.04%

Fig. 3

Predicted versus observed cumulative confirmed cases of COVID-19 for AM, BA, CE, and MG states.

Fig. 4

Predicted versus observed cumulative confirmed cases of COVID-19 for PR, RJ, RN, RS, SC, and SP states.

Fig. 5

Box-plot for absolute error according to model and state for COVID-19 forecasting up to SDA.


Performance measures for compared models

In this section, the main results achieved by the best model regarding the MAE and sMAPE criteria are presented for the multi-days-ahead short-term forecasting of the cumulative cases of COVID-19 in ten Brazilian states.

AM: In this state, the CUBIST and RIDGE approaches could be considered for forecasting COVID-19 cases. For ODA and TDA, CUBIST outperforms the other models, while for SDA RIDGE achieves better accuracy regarding MAE and sMAPE than the others. The improvement in MAE achieved by CUBIST ranges between 6.58%–92.77% for ODA and 11.39%–88.54% for TDA. For the SDA horizon, RIDGE outperforms the other models in sMAPE, reducing this criterion by 16.46%–91.88%.

BA, MG, RS, and SP: For these states, in all forecasting windows, the SVR approach achieved better accuracy than the other models for both the MAE and sMAPE criteria in the multi-days-ahead forecasting of the confirmed COVID-19 cases. The improvement in sMAPE ranges between 13.26%–95.11%, 4.23%–94.88%, and 38.59%–95.24% for the ODA, TDA, and SDA forecasting horizons, respectively. The same behavior is observed for the improvement in the MAE criterion.

CE and RN: In the CE state, the ARIMA model performs better out-of-sample than the other models for the ODA and TDA time windows. For the MAE criterion, the improvement ranges between 72.36%–98.03% for ODA and 45.93%–92.40% for TDA. For sMAPE, the improvement is 65.06%–97.84% for ODA and 32.81%–92.53% for TDA. For SDA, SVR has better results than the ARIMA model. For the RN state, the same analysis holds for the ODA and TDA horizons. The exception is the SDA horizon, in which the CUBIST model is more effective in the MAE and sMAPE criteria than the remaining models.
PR, RJ, and SC: For these states, located in the south (PR and SC) and southeast (RJ) regions of Brazil, the most appropriate approach to forecast the cumulative cases of COVID-19 is stacking-ensemble learning, except in the ODA horizon, when the ARIMA model has better results. Stacking overcomes the drawbacks of single models and achieves better accuracy than the other models. For these states, the improvements in MAE and sMAPE are between 14.01%–94.68% and 17.48%–95.41%, respectively, for the ODA horizon. The improvement in the other forecasting horizons shows the same behavior as in ODA, with a greater magnitude of improvement for TDA and SDA.

Remark: In this experiment, 180 scenarios (10 datasets, 3 forecasting horizons, and 6 models) were evaluated for the task of forecasting cumulative COVID-19 cases. Overall, the best models for each state obtained sMAPE values between 0.87%–3.51%, 1.02%–5.63%, and 0.95%–6.90% for ODA, TDA, and SDA forecasting, respectively. The ranking of models over all scenarios is SVR, stacking-ensemble, ARIMA, CUBIST, RIDGE, and RF. In contrast to the findings of [29], for the datasets evaluated in this paper, ARIMA modelling was effective in some situations for very short horizons. When the horizon is SDA, the ARIMA model performs worse than most of the compared models. However, for ODA the applications are limited. From a broader perspective, the efficiency of SVR is due to its ability to deal with small datasets, while stacking-ensemble learning combines the advantages of several single models to learn the data behavior and obtain forecasts close to the observed values. On the other hand, the difficulty of the RF model in forecasting cumulative COVID-19 cases could be attributed to the fact that this approach requires more observations to learn the data pattern effectively.

According to Figs. 3 and 4, the behavior of the data is learned by the evaluated models, which can produce forecasts compatible with the observed values. The good performance obtained in the training phase persists in the test stage. In Figs. 3a and 4c, the RIDGE and CUBIST models, and in Figs. 3d and 4f, SVR, had difficulties capturing the variability of the first observations. The dataset is small for all states, which explains the difficulty of the mathematical models in learning the behavior. Fig. 5 shows the box-plots of the out-of-sample forecasting errors in the SDA horizon for each model and dataset used. This horizon is chosen for analysis because of the recursive strategy adopted, since the errors increase as the forecasting horizon grows. The box diagram depicts the variation of the absolute errors of each model, which reflects its stability. The dots outside the boxes are considered outlier errors, and the black dot inside each box is the MAE of the model. In the box-plot analysis, smaller boxes indicate models with lower variation in the errors, and the results presented in Table A.1 are corroborated by Fig. 5. Models with lower errors also reach better stability, which means that the most suitable model for each state can maintain a learning pattern, achieving homogeneous prediction errors.

Conclusion and future research

In this paper, six approaches, namely the machine learning models CUBIST, RF, RIDGE, SVR, and stacking-ensemble learning, as well as the ARIMA statistical model, were employed in the task of forecasting, one, three, and six days ahead, the COVID-19 cumulative confirmed cases in ten Brazilian states with a high daily incidence. The COVID-19 cumulative confirmed cases for the AM, BA, CE, MG, PR, RJ, RN, RS, SC, and SP states were used. The IP, MAE, and sMAPE criteria were adopted to evaluate the performance of the compared approaches. Moreover, the stability of the out-of-sample errors was evaluated through box-plots. From the obtained results, it is possible to infer that SVR and stacking-ensemble learning are suitable tools to forecast COVID-19 cases for most of the adopted states, since these approaches were able to learn the nonlinearities inherent to the evaluated epidemiological time series. Also, ARIMA can be considered in some respects for ODA, while the CUBIST and RIDGE models deserve attention for this task in the TDA and SDA time windows. Therefore, the ranking of models, from best to worst regarding accuracy over all scenarios, is SVR, stacking-ensemble learning, ARIMA, CUBIST, RIDGE, and RF. However, even though the models discussed in this paper produced forecasts similar to the observed cases, they should be used cautiously. This is due to the chaotic dynamics of the analyzed data, as well as the diversity of exogenous factors that can affect the daily notifications of COVID-19. For future work, it is intended (i) to adopt deep learning approaches combined with stacking-ensemble learning, (ii) to employ copula functions for data augmentation when dealing with small samples, (iii) to use multi-objective optimization to tune the hyperparameters of the adopted forecasting models, and (iv) to adopt a set of features that can help to explain the future cases of COVID-19.

CRediT authorship contribution statement

Matheus Henrique Dal Molin Ribeiro: Conceptualization, Methodology, Formal analysis, Validation, Writing - original draft, Writing - review & editing. Ramon Gomes da Silva: Conceptualization, Methodology, Formal analysis, Validation, Writing - original draft, Writing - review & editing. Viviana Cocco Mariani: Conceptualization, Writing - review & editing. Leandro dos Santos Coelho: Conceptualization, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

11 in total

1. Modeling Dengue vector population using remotely sensed data and machine learning.

Authors: Juan M Scavuzzo; Francisco Trucco; Manuel Espinosa; Carolina B Tauro; Marcelo Abril; Carlos M Scavuzzo; Alejandro C Frery
Journal: Acta Trop Date: 2018-05-16 Impact factor: 3.112

2. Forecasting emergency admissions due to respiratory diseases in high variability scenarios using time series: A case study in Chile.

Authors: Miguel Becerra; Alejandro Jerez; Bastián Aballay; Hugo O Garcés; Andrés Fuentes
Journal: Sci Total Environ Date: 2019-11-23 Impact factor: 7.963

Review 3. Artificial Intelligence (AI) applications for COVID-19 pandemic.

Authors: Raju Vaishya; Mohd Javaid; Ibrahim Haleem Khan; Abid Haleem
Journal: Diabetes Metab Syndr Date: 2020-04-14

4. Predicting turning point, duration and attack rate of COVID-19 outbreaks in major Western countries.

Authors: Xiaolei Zhang; Renjun Ma; Lin Wang
Journal: Chaos Solitons Fractals Date: 2020-04-20 Impact factor: 5.944

5. Association of the COVID-19 pandemic with Internet Search Volumes: A Google Trends^TM Analysis.

Authors: Maria Effenberger; Andreas Kronbichler; Jae Il Shin; Gert Mayer; Herbert Tilg; Paul Perco
Journal: Int J Infect Dis Date: 2020-04-17 Impact factor: 3.623

6. Composite Monte Carlo decision making under high uncertainty of novel coronavirus epidemic using hybridized deep learning and fuzzy rule induction.

Authors: Simon James Fong; Gloria Li; Nilanjan Dey; Rubén González Crespo; Enrique Herrera-Viedma
Journal: Appl Soft Comput Date: 2020-04-09 Impact factor: 6.725

7. Real-time forecasts of the COVID-19 epidemic in China from February 5th to February 24th, 2020.

Authors: K Roosa; Y Lee; R Luo; A Kirpich; R Rothenberg; J M Hyman; P Yan; G Chowell
Journal: Infect Dis Model Date: 2020-02-14

8. Application of the ARIMA model on the COVID-2019 epidemic dataset.

Authors: Domenico Benvenuto; Marta Giovanetti; Lazzaro Vassallo; Silvia Angeletti; Massimo Ciccozzi
Journal: Data Brief Date: 2020-02-26

Review 9. World Health Organization declares global emergency: A review of the 2019 novel coronavirus (COVID-19).

Authors: Catrin Sohrabi; Zaid Alsafi; Niamh O'Neill; Mehdi Khan; Ahmed Kerwan; Ahmed Al-Jabir; Christos Iosifidis; Riaz Agha
Journal: Int J Surg Date: 2020-02-26 Impact factor: 6.071

10. Analysis and forecast of COVID-19 spreading in China, Italy and France.

Authors: Duccio Fanelli; Francesco Piazza
Journal: Chaos Solitons Fractals Date: 2020-03-21 Impact factor: 5.944
