Aman Swaraj1, Karan Verma2, Arshpreet Kaur3, Ghanshyam Singh4, Ashok Kumar5, Leandro Melo de Sales6. 1. Indian Institute of Technology, Roorkee, India. Electronic address: amanswaraj007@gmail.com. 2. National Institute of Technology, Delhi, India. Electronic address: karanverma@nitdelhi.ac.in. 3. DIT University, Dehradun, India. Electronic address: arshpreet.kaur@dituniversity.edu. 4. Malaviya National Institute of Technology Jaipur, India. Electronic address: gsingh.ece@mnit.ac.in. 5. Government Mahila Engineering College, Ajmer, India. Electronic address: kumarashoksaini@gmail.com. 6. Universidade Federal De Alagoas-UFAL, Brazil. Electronic address: leandro@ic.ufal.
Abstract
BACKGROUND: Time-series forecasting plays a critical role during pandemics, as it provides essential information that can help contain the spread of the disease. The novel coronavirus disease, COVID-19, is spreading rapidly all over the world. Densely populated countries such as India face a particularly imminent risk in tackling the epidemic. Different forecasting models are being used to predict future cases of COVID-19. The limitation of most of them is that, on their own, they cannot capture both the linear and the nonlinear features of the data. METHODS: We propose an ensemble model integrating an autoregressive integrated moving average (ARIMA) model and a nonlinear autoregressive (NAR) neural network. The ARIMA model extracts the linear correlations, and the NAR neural network models the ARIMA residuals, which contain the nonlinear components of the data. COMPARISON: A single ARIMA model, the ARIMA-NAR model and a few other existing models that have been applied to COVID-19 data in different countries are compared on the basis of performance evaluation parameters. RESULT: The hybrid combination displayed a significant reduction in RMSE (16.23%), MAE (37.89%) and MAPE (39.53%) when compared with the single ARIMA model for daily observed cases. Similar results with reduced error percentages were found for daily reported deaths and cases of recovery. The RMSE of our hybrid model was also lower than that of other models used for forecasting COVID-19 in different countries. CONCLUSION: The results suggest that the new hybrid model is more effective than a single ARIMA model in capturing the linear as well as the nonlinear patterns of the COVID-19 data.
The novel coronavirus disease, COVID-19 (caused by SARS-CoV-2), which was first reported in Wuhan, China, after an outbreak of unusual pneumonia in late 2019, has already infected over 5.6 million people and caused more than 350,000 deaths worldwide [1]. Surpassing the fatalities caused by previous outbreaks such as severe acute respiratory syndrome (SARS) [2], [3] and Middle East respiratory syndrome (MERS) [4], [5], COVID-19 has been characterized by the World Health Organization (WHO) as a global pandemic [6]. The virus, which is assumed to be of zoonotic origin [7], [8], has spread rapidly with a transmission rate of around 1.4 to 2.5 [9]. Therefore, to curb the outbreak, nationwide lockdowns have been observed in more than two hundred countries, including India. Table 1 shows the phases of lockdown conducted in India.
Table 1
Depiction of lockdown phases.
| Lockdown phase | Dates | Number of cases | Days | Increase percentage |
|---|---|---|---|---|
| Phase 0 | 22/01/2020–24/03/2020 | 2872 | 58 | – |
| Phase 1 | 25/03/2020–14/04/2020 | 10,951 | 21 | 281.3% |
| Phase 2 | 15/03/2020–02/05/2020 | 31,118 | 19 | 184.16% |
| Phase 3 | 03/05/2020–17/05/2020 | 53,193 | 12 | 70.93% |
COVID-19 first appeared in India in Kerala in late January, in a patient with a recent travel history to Wuhan, China. Initially, transmission was slow, and the virus infected very few people, all within Kerala. However, the number of cases started rising again in mid-March after the pandemic hit western Europe, and strict lockdown measures were subsequently observed throughout the nation.
India is the second-most populous country in the world after China. Even slight negligence in containing the pandemic can lead to unprecedented panic and widespread loss of trade, economy, outsourcing workforce, manufacturing, and other services all over the world. It is therefore essential to have a proper strategy for combating the epidemic. In the current absence of an adequate cure for the disease, short-term forecasts of the spread can provide state authorities with a realistic estimate of the magnitude of the outbreak for the coming weeks.
However, despite all the intervention strategies implemented by state authorities, the curve has risen exponentially (Fig. 1). Presently, the highest number of cases is observed in the United States; however, the curve is rising steeply in Russia, India, and South American countries such as Brazil.
Fig. 1
Total Confirmed cases of COVID-19 Worldwide from Jan 22 to May 15, 2020 [1].
Time-series forecasting during epidemics has been regarded as an essential tool for containing the spread of contagious diseases such as Ebola and influenza [10], [11], [12], [13], [14], [15], [16]. Timing plays a critical role in an epidemic, and from the very beginning, an exceptional level of monitoring is required to curb the spread. Several studies have shown that proper analysis of such outbreaks can contribute substantially to devising the right course of action in due time [17], [18]. In this connection, a standard model often used for analyzing the trend of an epidemic, 'susceptible–exposed–infectious–resistant' (SEIR), has recently been applied to COVID-19 cases in various countries [19], [20], [21], [22], [23], [24], [25], [26], [27].
Researchers have subsequently proposed alternative forecasting models, including LSTM, SVR, ARIMA, and a few others, for forecasting COVID-19 cases in different countries [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43]. Some of the relevant work is presented in Table 2.
Table 2
Existing models over COVID-19 data in different countries.
(ABC – Artificial Bee Colony; KNN – K-Nearest Neighbors; SVR – Support Vector Regression; ANFIS – Adaptive Neuro-Fuzzy Inference System; PSO – Particle Swarm Optimization; GA – Genetic Algorithm; FPA – Flower Pollination Algorithm; FPASSA – Flower Pollination Algorithm Salp Swarm Algorithm; ARIMA – Auto-Regressive Integrated Moving Average; DNN – Deep Neural Network; LSTM – Long Short-Term Memory; PR – Polynomial Regression; ANN – Artificial Neural Networks).
Among all these forecasting models, ARIMA is the most popular [44], [45], [46]. ARIMA works under the assumption that the present value is linearly related to past observed values and errors. However, previous pandemics have often shown complex and nonlinear patterns over time, and therefore a linear approach might not yield the best results. Artificial Neural Networks (ANN) have emerged as one of the most successful methods for handling this nonlinearity [47], [48], [49], [50]. However, ANN models do not capture the linear and nonlinear features of a time series equally well [51], and thus several hybrid methodologies have been developed [52], [53], [54], [55]. Zhang [56] proposed a combination of ARIMA and NAR (Non-linear Auto-Regressive) Neural Network and evaluated it on some well-known datasets. Wang et al. [57] implemented a similar model for forecasting tuberculosis cases in China. The same approach was adopted by Benmouiza et al. [58] for small-scale solar radiation forecasting. Most of the hybrid models improved prediction accuracy compared with their individual component models.
Therefore, a hybrid model capable of modeling both the linear and the nonlinear components of the COVID-19 time series could yield better forecasts.
With this motivation, we develop an ensemble model combining ARIMA and NAR models for predicting future cases of COVID-19 in India and then compare the results produced by the hybrid model with those of the single ARIMA model.
The rest of the paper is organized as follows: In Section 2, we discuss the methods for forecasting future COVID-19 cases along with the overall flow of the work. The implementation of these methods, along with a comparative analysis, is described in Section 3. Section 4 holds a discussion, and Section 5 presents the conclusion.
System description
Section 2.1 describes the COVID-19 time-series data sources. Section 2.2 describes our proposed ensemble model, which is depicted in Fig. 2. We first implement the ARIMA model and analyze its results. Then, to further improve on these results, we develop a hybrid ARIMA-NAR combination. The models are compared using performance evaluation parameters. The section ends with a brief description of the accuracy estimation parameters in Section 2.3. All the ARIMA and NAR models are built in MATLAB v. 9.4.0.813654 (R2018a) using the Econometric Modeller Toolbox and Neural Net Time Series Toolbox respectively.
Fig. 2
Pictorial description of the stack based ensemble ARIMA model.
Data set collection
The cumulative counts of confirmed cases, reported deaths and recovered cases of COVID-19 were taken from the official COVID-19 Data Repository of Johns Hopkins University [1]. For our study, we processed the data in Microsoft Excel to obtain the respective daily counts and considered three forecasting windows: May 6–15, July 21–30 and Aug 1–10. The starting point of every series is fixed at 22nd January.
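The conversion from cumulative totals to daily counts described above amounts to first differencing. The following is a minimal Python sketch of that step, not the authors' Excel workflow; the function name `cumulative_to_daily` and the sample numbers are illustrative assumptions rather than values from the repository.

```python
def cumulative_to_daily(cumulative):
    """Convert a list of cumulative totals into daily increments.

    The first day's increment is taken to be the first cumulative value
    (an assumption: the series is treated as starting from zero).
    """
    if not cumulative:
        return []
    daily = [cumulative[0]]
    for previous, current in zip(cumulative, cumulative[1:]):
        daily.append(current - previous)
    return daily


# Made-up cumulative totals, for illustration only.
cumulative_cases = [1, 1, 3, 3, 8, 15]
daily_cases = cumulative_to_daily(cumulative_cases)
print(daily_cases)  # [1, 0, 2, 0, 5, 7]
```

Summing the daily list recovers the last cumulative value, which is a quick sanity check on the conversion.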
Stacking based ARIMA-NAR model
Stacking-based models use the predictions of multiple models to build a new one. In this study, we use ARIMA models to extract the linear relationships in the data and a NAR neural network to capture the nonlinear patterns. Fig. 4 gives a stepwise explanation of the ARIMA-NAR ensemble model. First, Section 2.2.1 describes the working of the ARIMA model. Next, Section 2.2.2 covers the NAR neural network, and finally Section 2.2.3 shows how both models contribute to the final forecast.
Fig. 4
Prediction by ARIMA, ARIMA-NAR Model for daily new cases of COVID-19 in India between May 6–15, 2020.
ARIMA model for linear patterns
The ARIMA econometric model was first presented by Box & Jenkins in 1970 [59]. The model is generally favored for its flexibility across various types of time-series data and for its predictive accuracy.
ARIMA is a combination of autoregressive (AR) and moving average (MA) models, along with differencing. In AR models, predictions are based on past values of the time series, and in MA models, prior residuals are used to forecast future values. The underlying process can be written as:
$$y_t = \theta_0 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_a y_{t-a} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \theta_2 \varepsilon_{t-2} - \dots - \theta_c \varepsilon_{t-c} \tag{1}$$
Here, $y_t$ is the actual observed value at time $t$ and $\varepsilon_t$ is the random error. $\phi_i$ ($i = 1, \dots, a$) and $\theta_j$ ($j = 0, 1, \dots, c$) are model parameters, where $a$ and $c$ denote the order of the model. Random errors are assumed to be independent and identically distributed with zero mean and constant variance.
In simpler terms, the model is represented as ARIMA(a, b, c), where 'a' denotes the order of the AR model, 'b' is the degree of differencing, and 'c' is the order of the MA model. These parameters are determined in three iterative steps: model identification, parameter selection and model verification.
Since ARIMA models are generally suitable for stationary time series, the identification step first checks the stationarity of the series. If the series is not stationary, differencing can be applied to make it stationary. In the second step, appropriate orders for the AR and MA models are selected on the basis of the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots of the stationary data. In the final step, the goodness of fit is verified using Akaike's Information Criterion (AIC) and the Bayesian Information Criterion (BIC). These three steps are repeated until a satisfactory model is achieved, which is then used for forecasting.
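To make the identification step more concrete, the sketch below differences a series and computes its sample autocorrelation function in plain Python. It is only an illustration of the idea: the study itself used MATLAB, and the helper names `difference` and `sample_acf` as well as the toy series are assumptions introduced here.

```python
def difference(series, order=1):
    """Apply first differencing `order` times (the 'b' in ARIMA(a, b, c))."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def sample_acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    denominator = sum(c * c for c in centered)
    if denominator == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    acf = []
    for lag in range(max_lag + 1):
        numerator = sum(centered[t] * centered[t - lag] for t in range(lag, n))
        acf.append(numerator / denominator)
    return acf


# Toy series with a quadratic trend: two rounds of differencing make it constant.
toy_series = [1, 4, 9, 16, 25, 36]
print(difference(toy_series, order=1))  # [3, 5, 7, 9, 11]
print(difference(toy_series, order=2))  # [2, 2, 2, 2]
print(sample_acf([1, 2, 3, 4, 5], max_lag=2))
```

A slowly decaying ACF on the raw series is the usual sign that more differencing is needed before the AR and MA orders are read off the plots.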
NAR neural network for nonlinear patterns
An artificial neural network (ANN) is a mapping structure represented by a mathematical model inspired by the biological nervous system. It can learn dynamic nonlinear time-series patterns and approximate a wide range of functions. An ANN processes information by combining neurons connected in a network of weighted links, and each neuron produces its output by applying an activation function, which can be expressed as:
$$o = \varphi\left(\sum_{i=1}^{m} w_i x_i + b\right) \tag{2}$$
where $\varphi$ is the activation function, $b$ is the bias of the neuron, $w_i$ represents the weight of the $i$-th input $x_i$, and $o$ is the output.
The nonlinear autoregressive neural network (NAR) is a well-known ANN for modeling dynamic systems and predicting future values in a nonlinear time series [56], [57], [58]. It is based on the architecture of a recurrent neural network with embedded memory and feedback connections. The general equation of a NAR model can be defined as:
$$y_t = f(y_{t-1}, y_{t-2}, \dots, y_{t-d}) \tag{3}$$
Here, $f$ represents the nonlinear function, and the previous $d$ output values determine the future values.
Among the architectures available for a NAR model, the closed-loop network is widely used for multi-step-ahead forecasting:
$$\hat{y}_{t+k} = f(y_{t-1}, y_{t-2}, \dots, y_{t-d}) \tag{4}$$
Here, $k$ denotes the number of future points.
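One common way to realise closed-loop multi-step forecasting is to feed each prediction back into the input window in place of an observation. The sketch below shows that feedback loop in Python under stated assumptions: `one_step_model` stands in for a trained NAR network, and `toy_model` is a hypothetical averaging rule used only so the example runs; neither comes from the paper.

```python
def closed_loop_forecast(one_step_model, history, n_lags, horizon):
    """Forecast `horizon` future points by feeding predictions back as inputs.

    one_step_model: callable taking a list of the last `n_lags` values
                    (oldest first) and returning the next value.
    """
    if n_lags < 1 or len(history) < n_lags:
        raise ValueError("history must contain at least n_lags (>= 1) values")
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(horizon):
        next_value = one_step_model(window)
        forecasts.append(next_value)
        # Slide the window: drop the oldest value, append the new prediction.
        window = window[1:] + [next_value]
    return forecasts


def toy_model(window):
    """Hypothetical stand-in for a trained network: mean of the last two values."""
    return 0.5 * window[-1] + 0.5 * window[-2]


print(closed_loop_forecast(toy_model, [2.0, 4.0], n_lags=2, horizon=3))
# [3.0, 3.5, 3.25]
```

Because later predictions depend on earlier ones, errors can accumulate with the horizon, which is one reason short forecasting windows are easier than long ones.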
Forecast from ARIMA, NAR combined hybrid model
Although ARIMA and ANN are both potent methods for time-series forecasting, each has its own limitations. ARIMA models have achieved success in linear problems, whereas NAR models are more suitable for nonlinear domains [56], [57], [58]. In a real-world problem, it is difficult to ascertain all the characteristics of the data, and therefore the study of a hybrid model capable of modeling both linear and nonlinear time series is essential.
In general, a time series contains both a linear autocorrelation structure and nonlinear components, and it can be written as:
$$y_t = L_t + N_t \tag{5}$$
where $y_t$ is the original time-series data, $L_t$ denotes the linear component, and $N_t$ the nonlinear part at time $t$. The hybrid methodology is carried out in two steps. First, the linear component is modeled using ARIMA, such that the residuals left after modeling contain only the nonlinear relationship. If we denote the residuals left by ARIMA at time $t$ as $e_t$, then we get
$$e_t = y_t - \hat{L}_t \tag{6}$$
where $\hat{L}_t$ denotes the value forecast at time $t$ by the ARIMA model.
Residual diagnosis plays a vital role in checking the sufficiency of ARIMA models. Although an ARIMA model is considered sufficient if the residuals left after fitting display no linear correlation structure, residual analysis cannot detect the presence of significant nonlinear patterns in the data. Modeling the residuals with ANNs makes these nonlinear patterns accessible. So, in the second step, the residuals are modeled by a NAR neural network with $n$ input nodes as follows:
$$e_t = f(e_{t-1}, e_{t-2}, \dots, e_{t-n}) + \varepsilon_t \tag{7}$$
where $f$ represents the nonlinear function evaluated by the NAR model and the leftover error is denoted by $\varepsilon_t$, such that the final prediction is:
$$\hat{y}_t = \hat{L}_t + \hat{N}_t \tag{8}$$
where $\hat{y}_t$ denotes the final predicted value at time $t$, and the forecast from Eq. (7) is represented as $\hat{N}_t$, the forecast value of the residuals.
The ARIMA-NAR combination thus exploits the strengths of both ARIMA and ANN models to capture linear as well as nonlinear patterns.
Zhang [56] and Granger [60] have further pointed out the importance of the subjective selection of component models while building a hybrid model, as a combination of sub-optimal models can sometimes yield better hybrid forecasts than a combination of the individually optimal ones.
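The two-step procedure above can be sketched end to end with simple stand-ins. In the Python example below, an AR(1) model fitted by least squares plays the role of the linear (ARIMA) stage, and `residual_model` is any callable that plays the role of the trained NAR network; both are assumptions chosen to keep the sketch short, not the models used in the study.

```python
def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} (stand-in for the linear stage)."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def hybrid_one_step(series, residual_model, n_lags):
    """Return (linear forecast, residual forecast, combined forecast)."""
    c, phi = fit_ar1(series)
    # Step 1: in-sample linear fit and its residuals, e_t = y_t - L_hat_t.
    fitted = [c + phi * previous for previous in series[:-1]]
    residuals = [obs - fit for obs, fit in zip(series[1:], fitted)]
    linear_forecast = c + phi * series[-1]
    # Step 2: forecast the next residual from its own recent history.
    residual_forecast = residual_model(residuals[-n_lags:])
    # Final prediction: y_hat = L_hat + N_hat.
    return linear_forecast, residual_forecast, linear_forecast + residual_forecast


def mean_of_lags(lags):
    """Hypothetical residual model: average of the recent residuals."""
    return sum(lags) / len(lags)


linear, correction, combined = hybrid_one_step([1, 2, 4, 8, 16], mean_of_lags, n_lags=2)
print(linear, correction, combined)
```

On this toy series the linear stage fits exactly, so the residual correction is zero; on real data the correction term is where any remaining nonlinear structure would enter the forecast.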
Constructing the hybrid model in MATLAB
The data are first divided randomly into training, validation and testing sets over multiple iterations. A weight-optimisation algorithm is then used to adjust the network weights; the 'Neural Net Time Series Toolbox' in MATLAB provides three such algorithms, namely Levenberg–Marquardt [61], Bayesian Regularization [62] and scaled conjugate gradient [63]. The optimum NAR model is selected on the basis of low MSE and high R values. The error autocorrelation plot is also used to verify the adequacy of the model. After training is finished, all the synaptic weights are saved, and the model is ready for prediction.
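As a rough illustration of the data preparation described above, the sketch below builds lagged input/target pairs from a residual series and splits them at random into training, validation and test sets. It is written in Python rather than MATLAB, and the 70/15/15 split, the seed, and the helper names are assumptions made for the example; the paper does not state the ratios it used.

```python
import random


def make_lagged_samples(series, n_lags):
    """Build (inputs, target) pairs: n_lags past values predict the next one."""
    return [
        (series[i:i + n_lags], series[i + n_lags])
        for i in range(len(series) - n_lags)
    ]


def random_split(samples, train_pct=70, val_pct=15, seed=0):
    """Shuffle samples and split them by percentage (assumed 70/15/15)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, validation, test


residual_series = list(range(25))  # placeholder values, not real residuals
samples = make_lagged_samples(residual_series, n_lags=5)
train, validation, test = random_split(samples)
print(len(train), len(validation), len(test))  # 14 3 3
```

Repeating the split with different seeds corresponds to the multiple iterations mentioned above, with the best network kept according to the validation error.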
Performance evaluation measures
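In general, the performance of a forecasting model is determined by comparing the actual values with the predicted ones. Three standard evaluation measures are the root mean square error (RMSE), the mean absolute error (MAE) and the mean absolute percentage error (MAPE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2} \tag{9}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{t=1}^{n}\left|y_t - \hat{y}_t\right| \tag{10}$$
$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{t=1}^{n}\left|\frac{y_t - \hat{y}_t}{y_t}\right| \tag{11}$$
where $y_t$ is the actual value, $\hat{y}_t$ is the predicted value and $n$ is the number of predictions. The optimum prediction model can thus be selected on the basis of these performance measures.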
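The three evaluation measures named above follow their standard textbook definitions and can be computed directly. The Python sketch below does so; the function names and the sample values are illustrative assumptions, not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values for illustration.
actual = [100, 200, 400]
predicted = [110, 190, 380]
print(round(rmse(actual, predicted), 4))  # 14.1421
print(round(mae(actual, predicted), 4))   # 13.3333
print(round(mape(actual, predicted), 4))  # 6.6667
```

MAPE is undefined when an actual value is zero, which matters for daily death counts early in an outbreak; such days would need to be excluded or handled separately.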
Results
A total of 85,784 cases of novel coronavirus were reported throughout India, along with 2,753 deaths and 30,258 cases of recovery, till May 15, 2020. Fig. 5 shows the number of cases observed on a daily basis, daily reported deaths and daily recovered cases in India between January 22 and May 15, 2020. We use the data from Jan 22 to May 5, 2020 for training and then test the respective models on 6–15 May 2020 for all three datasets, and additionally on 21–30 July and 1–10 Aug for cumulative cases in India. We also compare the results with the LSTM and SIR models.
Fig. 5
Prediction by ARIMA, ARIMA-NAR Model for daily new reports of death due to COVID-19 in India between May 6–15, 2020.
The final forecast is obtained by combining the separate prediction values of the ARIMA and NAR models. Fig. 4, Fig. 5, Fig. 6 respectively show the predictions of the ARIMA model and the NAR neural network for daily observed cases, reported deaths, and daily recovered cases between May 6–15, 2020. RMSE, MAE and MAPE values are calculated for the predictions made by the single ARIMA model and the combined ARIMA-NAR model for all three datasets (Table 3a, Table 3b, Table 3c). Fig. 7, Fig. 8, Fig. 9 further compare three different models, ARIMA, hybrid ARIMA and LSTM, for cumulative cases of COVID-19 in India over three phases: 6–15 May, 21–30 July and 1–10 Aug, respectively. Additionally, we compare against the compartmental SIR model in Fig. 10 and Table 4. Finally, we present a long-term forecast of COVID-19 cases with the hybrid model in Fig. 11 and Table 5.
Fig. 6
Prediction by ARIMA, ARIMA-NAR Model for daily new cases of recovery from COVID-19 in India between May 6–15, 2020.
Table 3a
Prediction accuracy evaluation for daily observed cases in India between 6th and 15th May 2020.
| Model | RMSE | MAE | MAPE |
|---|---|---|---|
| Single ARIMA | 329.4373 | 284.9 | 7.8% |
| Hybrid ARIMA | 275.9648 | 176.9298 | 4.7% |
Table 3b
Prediction accuracy evaluation for daily reported deaths in India between 6th and 15th May, 2020.
| Model | RMSE | MAE | MAPE |
|---|---|---|---|
| Single ARIMA | 46.3923 | 43.8708 | 44.02% |
| Hybrid ARIMA | 37.79482 | 35.3597 | 35.32% |
Table 3c
Prediction accuracy evaluation for daily recovered cases in India between 6th and 15th May, 2020.
| Model | RMSE | MAE | MAPE |
|---|---|---|---|
| Single ARIMA | 198.0642 | 168.1494 | 10.66% |
| Hybrid ARIMA | 177.6032 | 153.3469 | 9.67% |
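The percentage reductions quoted for the hybrid model can be recomputed from the tabulated errors as (single − hybrid) / single × 100. The short Python sketch below does this for the Table 3a values; the function name `percent_reduction` is an illustrative assumption.

```python
def percent_reduction(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (baseline - improved) / baseline * 100.0


# (single ARIMA, hybrid ARIMA) error values as listed in Table 3a.
table_3a = {
    "RMSE": (329.4373, 275.9648),
    "MAE": (284.9, 176.9298),
    "MAPE": (7.8, 4.7),
}

for measure, (single, hybrid) in table_3a.items():
    print(f"{measure}: {percent_reduction(single, hybrid):.2f}% reduction")
```

Run on the tabulated values, this gives about 16.23% for RMSE, 37.90% for MAE and 39.74% for MAPE. The small differences from the quoted 37.89% and 39.53% are consistent with the table entries having been rounded before publication.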
Fig. 7
Prediction by ARIMA, ARIMA-NAR and LSTM Model for cumulative new cases of COVID-19 in India between May 6–15, 2020.
Fig. 8
Prediction by ARIMA, ARIMA-NAR and LSTM Model for cumulative new cases of COVID-19 in India between July 21–30, 2020.
Fig. 9
Prediction by ARIMA, ARIMA-NAR and LSTM Model for cumulative new cases of COVID-19 in India between Aug 1–10, 2020.
Fig. 10
Predictions using the SIR model. Top panel with white, red, yellow and green regions indicate initial exponential growth, fast growth (with positive and negative phase separated by red vertical line), asymptotic slow growth and curve flattening, respectively. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Table 4
Accuracy comparison of SIR model and Hybrid ARIMA model for daily new cases in India between 6th and 15th May 2020.
| Measure | SIR model | Hybrid model |
|---|---|---|
| RMSE | 2499.233 | 275.9648 |
Fig. 11
Forecast for a duration of 40 days using (a) ARIMA; (b) LSTM; (c) Hybrid ARIMA.
Table 5
Accuracy comparison of ARIMA model and Hybrid ARIMA model for a duration of 40 days.
| Measure | ARIMA model | Hybrid model |
|---|---|---|
| RMSE | 11759.72517 | 8908.786344 |
As seen in Table 3a, Table 3b, Table 3c, the hybrid ARIMA model gives lower errors than the single ARIMA model. The RMSE, MAE and MAPE values of the hybrid combination for daily observed cases are 275.9648 (16.23% reduction), 176.9298 (37.89% reduction) and 4.7% (39.53% reduction), respectively. Similar results with reduced error percentages were found for daily reported deaths, cases of recovery and cumulative confirmed cases. Further, it is evident from Fig. 4, Fig. 5, Fig. 6, Fig. 7, Fig. 8, Fig. 9 that the hybrid ARIMA model performed consistently better on all occasions.
Prediction with the SIR model: The well-known compartmental Susceptible-Infectious-Recovered (SIR) model tracks the number of susceptible individuals 'S', the number of infectious individuals 'I', and the number of recovered or deceased individuals 'R'. Details of the implementation, including the selection of R0 (basic reproduction number), β (transmission rate) and γ (average recovery rate), can be found in Batista [64] and Ranjan [65].
After carrying out the overall prediction, we noted the predicted values of daily new cases from 6th May 2020 to 15th May 2020 in order to calculate the RMSE and compare it with that of the hybrid model (Table 4).
To check the validity of the model over a longer duration, we trained the models on 200 days of data and forecast the next 40 days. Since ARIMA is a linear model, its forecast over the test period simply rises linearly in Fig. 11(a); similarly, in Fig. 11(b) the LSTM model also settles down in the long run. When the residual corrections are added in the hybrid ARIMA, the graph shows some nonlinear variations in Fig. 11(c). However, these nonlinear variations also become constant over time, which suggests that the error values captured in the training data more or less start repeating after a while, something that is unlikely to happen in a real-life scenario. Thus, to forecast over a longer duration, the model may need proper adjustments. Still, compared to the single ARIMA and LSTM models, the hybrid model is more reliable (Table 5).
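For readers unfamiliar with the SIR compartments mentioned above, the following Python sketch integrates the standard SIR equations with a simple Euler scheme. It is a generic illustration, not the implementation of Batista [64] or Ranjan [65]; the population size, β = 0.3 and γ = 0.1 are hypothetical values chosen only so that R0 = β/γ = 3.

```python
def simulate_sir(population, initial_infected, beta, gamma, days, steps_per_day=10):
    """Euler integration of dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.

    Returns a list of (S, I, R) tuples, one per day, starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    dt = 1.0 / steps_per_day
    trajectory = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / population * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        trajectory.append((s, i, r))
    return trajectory


# Hypothetical parameters for illustration only (R0 = beta / gamma = 3).
trajectory = simulate_sir(population=1000, initial_infected=1, beta=0.3, gamma=0.1, days=60)
peak_day = max(range(len(trajectory)), key=lambda day: trajectory[day][1])
print("peak of infectious compartment on day", peak_day)
```

Daily new cases for comparison with observed counts can be read off as the day-to-day decrease in S.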
Discussion
The current COVID-19 outbreak has posed a major challenge for the healthcare sector all over the world. After the catastrophic rise in the number of COVID-19 cases in the USA and western Europe, a proper strategy for epidemic control in a densely populated country like India has become a priority, and forecasting future cases is essential for implementing control measures in due time. Several forecasting models have been proposed in recent months for predicting future cases of COVID-19 in different countries. Most of the forecasting work has been done using standard ARIMA models, which are popular for their statistical properties in building models.
Generally, a time series comprises linear as well as nonlinear patterns, and the trend of COVID-19 over the last few months clearly shows nonlinear patterns (Fig. 3). While ARIMA models have proven quite useful for linear time series, they cannot extract nonlinear patterns sufficiently. On the other hand, NAR, a powerful class of ANN, has displayed favourable characteristics for modelling nonlinear time series. However, ANN models have their own limitations in capturing linear and nonlinear patterns equally well. Therefore, a hybrid approach that uses ARIMA and ANN models together is proposed in the present study.
Fig. 3
Daily observed cases, reported deaths and recovered cases of COVID-19 in India till May 15, 2020.
Our study highlights the value of analysing linear and nonlinear patterns with separate models in time-series forecasting. Three separate datasets, namely daily confirmed cases of COVID-19 in India, reported deaths and cases of recovery, were each used to train both models over a period of more than a hundred days between January 22 and May 5, 2020. First, the best ARIMA model was selected for each dataset, and the fitting curve and residual plot of all three datasets were generated.
Further, to extract the nonlinear patterns, the residuals left by the ARIMA models were fitted to the NAR neural network. The ARIMA and NAR models were then used to predict the future cases and the future residuals, respectively. The combination of the predictions from both models was used as the final result of the hybrid model.
Our hybrid ARIMA model was able to capture quite well the nonlinear patterns that were left as residuals by the ARIMA model. On the basis of the RMSE, MAE, and MAPE measures (Eqs. (9), (10), (11)), we evaluated the prediction accuracy of both models for all three datasets. The reduced errors seen in Table 3a, Table 3b, Table 3c clearly support the superiority of the proposed hybrid ARIMA model over a single ARIMA model. We have also compared the model with the LSTM and SIR models, and the hybrid ARIMA outperforms them as well.
Although our model has shown better performance than LSTM, SIR and ARIMA, the difference between the results starts to shrink as the number of days increases for cumulative cases with a larger dataset. This shows the limitation of our model when forecasting over a longer horizon of months. In addition to the current COVID-19 transmission rate and prevention policies, uncertain behavioural patterns and mitigation schemes also affect forecasting accuracy at longer intervals.
Still, our model is particularly suited for quick short-term forecasts in an epidemic. This is in line with previous studies where a combination of ARIMA and NAR models has been explored as a way of producing better time-series forecasting results. Hence, the present study can be regarded as a credible approach for time-series forecasting during pandemics.
Conclusion
In this paper, we presented a new hybrid model for COVID-19 time-series forecasting by combining an Auto-Regressive Integrated Moving Average (ARIMA) model with a Nonlinear Auto-Regressive (NAR) neural network. ARIMA models were used to capture the linear relationships in the time series, and the residuals of the ARIMA model, containing the nonlinear components, were fitted by the NAR model. The prediction accuracy of both models was measured on the basis of Root Mean Squared Error, Mean Absolute Error, and Mean Absolute Percentage Error. With lower values of RMSE, MAE, and MAPE, the ARIMA-NAR combination produced better prediction results than the single ARIMA model. Our model also outperforms the SIR and LSTM models for short-term forecasts. Therefore, the new hybrid model can be considered a reliable tool for policymakers in producing short-term forecasts of COVID-19 and devising proper strategies in due time.
However, for longer intervals, the difference between the results of the models decreases owing to the uncertainties of data, mitigation policies and behavioural patterns.
Ethical approval: This article does not contain any studies with human participants or animals performed by any of the authors.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.