
Comparing the accuracy of several network-based COVID-19 prediction algorithms.

Massimo A Achterberg¹, Bastian Prasse¹, Long Ma¹, Stojan Trajanovski², Maksim Kitsak¹, Piet Van Mieghem¹.

Abstract

Researchers from various scientific disciplines have attempted to forecast the spread of coronavirus disease 2019 (COVID-19). The proposed epidemic prediction methods range from basic curve fitting methods and traffic interaction models to machine-learning approaches. If we combine all these approaches, we obtain the Network Inference-based Prediction Algorithm (NIPA). In this paper, we analyse a diverse set of COVID-19 forecast algorithms, including several modifications of NIPA. Among the algorithms that we evaluated, the original NIPA performed best at forecasting the spread of COVID-19 in Hubei, China and in the Netherlands. In particular, we show that network-based forecasting with NIPA outperforms all other forecasting algorithms that we evaluated.


Keywords: Bayesian methods; Epidemiology; Forecast accuracy; Machine learning methods; Network inference; SIR model; Time series methods

Year: 2020 PMID: 33071402 PMCID: PMC7546239 DOI: 10.1016/j.ijforecast.2020.10.001

Source DB: PubMed Journal: Int J Forecast ISSN: 0169-2070

Introduction

In December 2019, SARS-CoV-2, the virus that causes coronavirus disease 2019 (COVID-19), emerged in the Chinese province of Hubei. The number of COVID-19 cases in China rose dramatically to almost 80,000 by the end of February 2020. From China, COVID-19 quickly spread throughout the whole world, with almost ten million cases by the end of June 2020. Many countries imposed nation-wide lockdowns to slow down the spread of COVID-19. A reliable forecast of the pandemic outbreak is key for targeted disease countermeasures and for the appropriate design of exit strategies to lift lockdowns. Unfortunately, just like weather forecasting, the prediction of epidemic outbreaks is subject to fundamental limits (Moran et al., 2016). One limitation is the scarcity of data: epidemic time series are relatively short, and carrying out medical tests on a large scale is challenging. Moreover, the final number of infected cases is highly sensitive to initial perturbations (Prasse, Achterberg & Van Mieghem, 2020).

Nonetheless, many methods have been developed and applied to forecast the spread of COVID-19. Perhaps the simplest approach is to fit the number of infections to a sigmoid curve, such as the logistic function (Roosa et al., 2020, Verhulst, 1845), the Hill function (Hill, 1910), or the Gompertz function (Gompertz, 1825). The parameters of the sigmoid curve can be estimated by nonlinear regression. For the comparison of prediction algorithms in this work, we focus on the logistic function. The logistic function is of particular interest, because it is the (approximate) solution for the number of infected cases (Van Mieghem, 2016) in the Susceptible-Infected-Susceptible (SIS) epidemic model, and for the number of removed cases in the Susceptible-Infected-Removed (SIR) epidemic model (Kermack and McKendrick, 1927, Prasse, Achterberg and Van Mieghem, 2020).
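The link between the logistic function and the SIS model can be made concrete in the simplest possible setting, which is an assumption not made elsewhere in this paper: a single, well-mixed region in continuous time with infection rate $\beta$ and curing rate $\delta$, where $\beta > \delta$. The mean-field equation for the infected fraction $v(t)$ is then exactly the logistic equation:

$$
\frac{dv}{dt} = \beta v (1 - v) - \delta v = (\beta - \delta)\, v \left(1 - \frac{v}{v_\infty}\right), \qquad v_\infty = 1 - \frac{\delta}{\beta},
$$

$$
v(t) = \frac{v_\infty}{1 + \exp\left(-(\beta - \delta)(t - t_0)\right)}, \qquad t_0 = \frac{1}{\beta - \delta} \ln\left(\frac{v_\infty}{v(0)} - 1\right).
$$

In this idealised setting, the logistic growth rate equals $\beta - \delta$ and the long-term level equals the steady-state infected fraction $v_\infty$. This is only a sketch of the connection, not the derivation of Van Mieghem (2016).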
By fitting the number of infected cases to a sigmoid curve, we implicitly assume that the spread in a particular region is independent of other regions, which contrasts with the strong interconnectedness of our modern world. Network-based techniques take into account the interaction between different regions, which is due to the movement of people. The interaction can be described by a network with $N$ nodes. Each node in the network represents a particular region (country, province, municipality, or city), and a link from node $j$ to node $i$ represents the existence of an interaction from region $j$ to region $i$, specified by a link weight $\beta_{ij}$ denoting the infection probability from region $j$ to region $i$. The self-infection probability within region $i$ is given by $\beta_{ii}$, which we expect to dominate the other infection probabilities, because the interaction within a region is stronger than the interaction with other regions. However, the infection probability matrix $B$, with elements $\beta_{ij}$, is unknown and must be derived from past observations of the epidemic. We address this issue in more detail in Section 2.

Throughout this work, we often use “the number of infected cases”, which we understand as “the number of cases reported by local authorities”. Asymptomatic individuals, who do not feel sick and may not even know that they are infected and infectious, are not reported and can infect others unwittingly. One way to estimate the percentage of asymptomatic cases is to test a random sample of the population, for example with blood tests. For COVID-19, the fraction of asymptomatic cases is estimated to be as large as 80% (Day, 2020). Since the number of asymptomatic cases cannot be determined on a daily basis, we confine ourselves to the number of reported cases in this work.

Many scientific disciplines have investigated and forecasted the spread of COVID-19. Statistical approaches are commonly based on Kalman filtering (Yang, Yi et al., 2020) or on Bayesian methods (Lorch et al., 2020). Network-based approaches consider aeroplane networks, daily commute traffic, or cell phone traffic (Chang et al., 2020). Data scientists apply machine-learning algorithms, like the adaptive neuro-fuzzy inference system (Al-qaness, Ewees, Fan, & Abd El Aziz, 2020) or Long Short-Term Memory (LSTM) (Yang, Zeng et al., 2020). Mathematicians have estimated the parameters of compartmental models such as the SIR model (Kergassner et al., 2020, Yang, Zeng et al., 2020) or the Susceptible-Exposed-Infected-Removed (SEIR) model (He, Peng, & Sun, 2020). Most epidemic models forecast the number of infected cases as a point forecast (generally, the mean of a distribution) rather than as a complete distribution. All models in this work were designed to provide point forecasts, but can be generalised to provide prediction intervals. We discuss this topic further in Section 2.

The focus of this work is the comparison of a diverse set of methods for forecasting the spread of COVID-19, ranging from fitting closed-form epidemic curves and comprehensive machine-learning algorithms to network-based approaches. We focus on the spread of COVID-19, but we emphasise that all methods can be applied to general epidemic outbreaks. We show that purely machine-learning-based algorithms and network-agnostic epidemiological models are less accurate than algorithms that combine multiple approaches and exploit the underlying network topology. In particular, the Network Inference-based Prediction Algorithm (NIPA) is more accurate than any other algorithm that we evaluated. In Section 2, we describe eight forecast algorithms for predicting the future number of COVID-19 cases. In Section 3, we demonstrate their performance in two selected regions, namely Hubei (China) and the Netherlands, and discuss the strengths and weaknesses of each algorithm. Finally, we summarise our findings in Section 4.

Prediction algorithms

The spread of COVID-19 can be measured in terms of the daily number of reported cases. We model the course of the epidemic with an SIR compartmental model, where each individual is either susceptible (healthy), infected (can infect susceptible individuals), or removed (recovered or died). We denote the (discrete) time by $t = 1, 2, \dots, n$, where $n$ is the total number of observation days. The first COVID-19 case was reported on day $t = 1$. Since nearly all governments report their epidemic data once a day, a time step of one day is a natural choice; we investigate the effect of the time step on the prediction accuracy in Appendix G. The SIR epidemic model with time-varying spreading parameters is given by:

SIR Epidemic Model (Kermack and McKendrick, 1927, Prasse and Van Mieghem, 2020a, Youssef and Scoglio, 2011)

The viral state of region $i$, described by the fraction $I_i[t]$ of infectious and the fraction $R_i[t]$ of removed individuals, evolves in discrete time according to

$$I_i[t+1] = (1 - \delta_i) I_i[t] + S_i[t] \sum_{j=1}^{N} \beta_{ij}[t] I_j[t], \qquad (1)$$

$$R_i[t+1] = R_i[t] + \delta_i I_i[t], \qquad (2)$$

and the fraction of susceptible individuals follows as

$$S_i[t] = 1 - I_i[t] - R_i[t]. \qquad (3)$$

Here, $\beta_{ij}[t]$ denotes the infection probability from region $j$ to region $i$ at time $t$, and $\delta_i$ denotes the curing probability of region $i$. The spread of COVID-19 cannot be described exactly by the SIR equations (1), (2), and (3). The COVID-19 pandemic evolves in continuous time, whereas the SIR model evolves in discrete time, with a time step of one day. Additionally, the SIR model does not describe phenomena like personal social distancing, nation-wide lockdowns, and the availability of vaccinations. Each of these simplifications introduces model errors. Before introducing the forecasting algorithms, we explain how model errors can be used to obtain prediction intervals for the forecasted number of infected cases. As described in Prasse, Achterberg, Ma and Van Mieghem (2020), we obtain the fractions of susceptible ($S_i[t]$), infectious ($I_i[t]$), and removed ($R_i[t]$) individuals in region $i$ from the observed infections. We aim to find the best possible forecast of the cumulative number of infected cases in region $i$ at future times $t > n$. In this work, we discuss eight prediction methods.
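As a minimal sketch of how Eqs. (1) to (3) are iterated for $N$ regions, the NumPy code below advances the model day by day. The three-region infection matrix, the curing probabilities, and the initial condition are hypothetical values chosen for illustration, and the infection matrix is kept constant in time for simplicity.

```python
import numpy as np


def sir_step(S, I, R, B, delta):
    """One day of the discrete-time network SIR model, Eqs. (1) to (3).

    S, I, R : arrays of shape (N,) with susceptible/infectious/removed fractions.
    B       : (N, N) infection probabilities, B[i, j] = beta_ij (from j to i).
    delta   : (N,) curing probabilities.
    """
    I_next = (1.0 - delta) * I + S * (B @ I)  # Eq. (1)
    R_next = R + delta * I                    # Eq. (2)
    S_next = 1.0 - I_next - R_next            # Eq. (3)
    return S_next, I_next, R_next


def simulate(I0, B, delta, days):
    """Iterate the model from initial infectious fractions I0 (nobody removed yet)."""
    I = np.asarray(I0, dtype=float)
    R = np.zeros_like(I)
    S = 1.0 - I - R
    trajectory = [I.copy()]
    for _ in range(days):
        S, I, R = sir_step(S, I, R, B, delta)
        trajectory.append(I.copy())
    return np.array(trajectory), S, R


# Hypothetical example: three regions with dominant self-infection on the diagonal.
B = np.array([[0.30, 0.02, 0.01],
              [0.03, 0.25, 0.02],
              [0.01, 0.02, 0.20]])
delta = np.array([0.10, 0.10, 0.10])
trajectory, S, R = simulate([1e-4, 0.0, 0.0], B, delta, days=60)
print(trajectory[-1])  # infectious fractions per region after 60 days
```

Region 3 starts without infections and is infected only through the off-diagonal entries of `B`, which is exactly the inter-regional effect that network-agnostic methods ignore.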

Potential generalisation to prediction intervals

Before introducing the different prediction methods, we emphasise that this work focuses on short-term point forecasts. Long-term epidemic behaviour is highly uncertain, and providing forecast intervals is essential to give a complete picture of the long-term viral spread (Cirillo & Taleb, 2020). Extending the point forecast methods in this work to prediction intervals is outside the scope of this work. Nonetheless, we consider it valuable to discuss conceptually how the SIR equation (1) can be extended to allow for the computation of prediction intervals. A real epidemic does not follow the SIR model (1) exactly. Instead, the infection state evolves from time $t$ to $t+1$ as

$$I_i[t+1] = (1 - \delta_i) I_i[t] + S_i[t] \sum_{j=1}^{N} \beta_{ij}[t] I_j[t] + w_i[t], \qquad (4)$$

where $w_i[t]$ denotes the model error of region $i$ at time $t$; see also Appendix A. Equation (4) can be used as a basis for prediction intervals with a Monte Carlo approach. We define the error vector as $w[t] = (w_1[t], \dots, w_N[t])^T$ and the infection vector as $I[t] = (I_1[t], \dots, I_N[t])^T$ for all times $t$. Then, based on Eq. (4), the past observations $I[1], \dots, I[n]$, and the model errors $w[t]$, the point forecast algorithms provide an estimate of the viral state $I[t]$ at future times $t > n$. Conceptually, a prediction interval for the future viral state can be obtained in two steps. First, we draw random samples from the distribution of the future model errors $w[n], w[n+1], \dots$. Second, for each sample of errors, we obtain a point forecast of the future viral states $I[t]$. The prediction intervals for the future viral state then follow from the ensemble of point forecasts. The details of this method are beyond the scope of this paper. Two particular challenges are the determination of the distribution of the model errors and the implementation of a computationally efficient sampling method.
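The two-step Monte Carlo procedure outlined above can be sketched as follows. This is not a method evaluated in this paper: the independent Gaussian error distribution with standard deviation `sigma`, the clipping of fractions to a feasible range, the 90% level, and all parameter values are assumptions made only for illustration.

```python
import numpy as np


def forecast_with_errors(S, I, R, B, delta, errors):
    """Iterate Eq. (4): the SIR step plus an additive model error w[t].

    errors has shape (horizon, N); an all-zero array gives the point forecast.
    """
    path = []
    for w in errors:
        I_new = (1.0 - delta) * I + S * (B @ I) + w
        R_new = R + delta * I
        # Ad-hoc choice for this sketch: keep the fractions feasible.
        I_new = np.clip(I_new, 0.0, 1.0 - R_new)
        S, I, R = 1.0 - I_new - R_new, I_new, R_new
        path.append(I)
    return np.array(path)


def prediction_interval(S, I, R, B, delta, horizon, sigma,
                        n_samples=1000, level=0.9, seed=0):
    """Monte Carlo prediction interval for the future infectious fractions."""
    rng = np.random.default_rng(seed)
    ensemble = np.empty((n_samples, horizon, len(I)))
    for k in range(n_samples):
        # Step 1: sample future model errors (assumed i.i.d. Gaussian).
        w = rng.normal(0.0, sigma, size=(horizon, len(I)))
        # Step 2: one forecast per error sample.
        ensemble[k] = forecast_with_errors(S, I, R, B, delta, w)
    lower, upper = np.quantile(ensemble, [(1 - level) / 2, (1 + level) / 2], axis=0)
    point = forecast_with_errors(S, I, R, B, delta, np.zeros((horizon, len(I))))
    return point, lower, upper


# Hypothetical two-region example, six days ahead.
B = np.array([[0.30, 0.02],
              [0.03, 0.25]])
delta = np.array([0.10, 0.10])
I0 = np.array([0.010, 0.005])
R0 = np.zeros(2)
point, lower, upper = prediction_interval(1.0 - I0 - R0, I0, R0, B, delta,
                                          horizon=6, sigma=5e-4)
```

The loop over samples is the computationally expensive part that the text mentions; in practice the error distribution would have to be estimated from past residuals rather than assumed.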

Sigmoid curves

The logistic function is a well-known example of an epidemiological sigmoid curve (Van Mieghem, 2016, Verhulst, 1845). We assume the cumulative number of infected cases in region $i$ at time $t$, expressed as a fraction $c_i[t]$ of the population, to follow a logistic function:

$$c_i[t] = \frac{c_{\infty,i}}{1 + \exp\left(-K_i (t - t_{0,i})\right)},$$

where $c_{\infty,i}$ is the long-term fraction of infections, $K_i$ is the logistic growth rate, and $t_{0,i}$ is the inflection point, also known as the epidemic peak. The parameters $c_{\infty,i}$, $K_i$, and $t_{0,i}$ are estimated for each region separately using a nonlinear curve fitting procedure, which is explained in Appendix F. Other sigmoid curves, like the Hill function and the Gompertz function, are also discussed in Appendix F.
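A minimal sketch of a per-region logistic fit with SciPy's `scipy.optimize.curve_fit` (nonlinear least squares) is shown below. The initial guess and the synthetic, noise-free data are assumptions of this sketch; the fitting procedure actually used in this paper is the one described in Appendix F.

```python
import numpy as np
from scipy.optimize import curve_fit


def logistic(t, c_inf, K, t0):
    """Logistic curve c_inf / (1 + exp(-K (t - t0)))."""
    return c_inf / (1.0 + np.exp(-K * (t - t0)))


def fit_logistic(t, c):
    """Fit (c_inf, K, t0) to the cumulative fractions c of one region.

    Heuristic initial guess (an assumption): the largest observed value,
    a moderate growth rate, and the day with the largest daily increase.
    """
    p0 = (c.max(), 0.1, t[np.argmax(np.diff(c))])
    params, _ = curve_fit(logistic, t, c, p0=p0, maxfev=10000)
    return params


# Hypothetical noise-free region: c_inf = 1%, K = 0.3 per day, peak on day 20.
t = np.arange(41, dtype=float)
c = logistic(t, 0.01, 0.3, 20.0)
c_inf, K, t0 = fit_logistic(t, c)

# Six-day-ahead forecast beyond the last observation day.
forecast = logistic(np.arange(41, 47, dtype=float), c_inf, K, t0)
```

Because each region is fitted on its own, nothing in this procedure can capture infections imported from neighbouring regions.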

Long short-term memory

Recurrent neural networks (RNNs) (Elman, 1990) have been used in various tasks related to sequences (Goodfellow, Bengio, & Courville, 2016), such as time series analysis and forecasting, speech recognition, and natural language processing (Young, Hazarika, Poria, & Cambria, 2018), where they have achieved state-of-the-art performance. LSTM networks (Hochreiter & Schmidhuber, 1997) are a specific type of RNN that addresses the long-standing problem of learning long-term dependencies. On top of the input weights and hidden weights of the standard RNN unit, LSTM introduces input, output, and optional forget gates, each with additional weights. There are several variations of LSTM networks (Gers and Schmidhuber, 2001, Gers et al., 2000), such as LSTMs with or without a forget gate and with or without a “peephole connection” (Jozefowicz, Zaremba, & Sutskever, 2015). For the internal mechanism of the gates and the exact mathematical relations, we refer the reader to Gers et al. (2000) or Yu, Si, Hu, and Zhang (2019). Here, we use the most common variant: an LSTM with a forget gate. In the simulations, we use a single LSTM layer with sequence and hidden sizes both equal to four (stacking several LSTM layers is possible, but leads to more overfitting), a learning rate of 0.1, and the Adam optimiser (Kingma & Ba, 2014), with a mean squared error loss and 2000 epochs of training.
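To make the gate mechanism concrete, a forward pass through a single LSTM layer with a forget gate (no peephole connections) is sketched below in NumPy, together with the windowing that turns a case series into input sequences of length four. The weights are random and the training loop (Adam, mean squared error, 2000 epochs) is omitted; the gate ordering, the initialisation, the linear read-out, and the synthetic series are assumptions of this sketch, not the configuration used in this paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_windows(series, seq_len=4):
    """Pairs (seq_len consecutive values, next value) for one-step-ahead training."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[k:k + seq_len] for k in range(len(series) - seq_len)])
    y = series[seq_len:]
    return X, y


class LSTMLayer:
    """Single LSTM layer with input, forget and output gates (scalar input)."""

    def __init__(self, hidden_size=4, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        H = hidden_size
        # Stacked parameters for [input gate, forget gate, candidate, output gate].
        self.W = rng.uniform(-s, s, size=(4 * H, 1))  # input weights
        self.U = rng.uniform(-s, s, size=(4 * H, H))  # hidden (recurrent) weights
        self.b = np.zeros(4 * H)
        self.w_out = rng.uniform(-s, s, size=H)       # linear read-out to one value
        self.H = H

    def forward(self, window):
        h = np.zeros(self.H)  # hidden state
        c = np.zeros(self.H)  # cell state
        for x in window:
            z = self.W[:, 0] * x + self.U @ h + self.b
            i, f, g, o = np.split(z, 4)
            i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
            g = np.tanh(g)
            c = f * c + i * g  # forget gate decides how much old memory to keep
            h = o * np.tanh(c)
        return h, float(self.w_out @ h)  # final hidden state and prediction


# Hypothetical normalised daily case counts.
series = np.linspace(0.0, 1.0, 12) ** 2
X, y = make_windows(series, seq_len=4)
layer = LSTMLayer(hidden_size=4)
h, prediction = layer.forward(X[0])
```

With a short series, only a handful of windows are available for training, which illustrates why the LSTM struggles early in an outbreak.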

Network inference-based prediction algorithm (NIPA)

Network-based approaches take into account the interactions between different regions. However, the contact network (and consequently also the infection probability matrix $B$) is unknown and must be inferred from the epidemic outbreak. NIPA was originally proposed in Prasse and Van Mieghem (2020a), and an adaptation of NIPA was applied to the spread of COVID-19 in Hubei, China (Prasse, Achterberg, Ma et al., 2020) and Italy (Pizzuti, Socievole, Prasse, & Van Mieghem, 2020). NIPA consists of two steps. First, the infection matrix $\hat{B}$ is inferred from the epidemic outbreak. Second, the estimated infection matrix $\hat{B}$ and the estimated curing probabilities $\hat{\delta}_i$ of each node $i$ are used to forecast the outbreak by iterating the SIR model. Even though NIPA successfully forecasted the spread of COVID-19 in the Chinese province of Hubei, the underlying infection matrix could not be inferred accurately (Prasse & Van Mieghem, 2020b).
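The first NIPA step can be illustrated with a simplified stand-in. Because Eq. (1) is linear in $1 - \delta_i$ and in the row $\beta_{i1}, \dots, \beta_{iN}$, each region yields one regression problem with nonnegative coefficients. The sketch below solves it with SciPy's nonnegative least squares `scipy.optimize.nnls` on noise-free synthetic data from a hypothetical three-region network with a constant infection matrix; it is not the exact NIPA estimator (see Prasse and Van Mieghem, 2020a).

```python
import numpy as np
from scipy.optimize import nnls


def simulate(I0, B, delta, days):
    """Generate susceptible and infectious fractions with Eqs. (1) to (3)."""
    I = np.asarray(I0, dtype=float)
    R = np.zeros_like(I)
    S = 1.0 - I
    S_hist, I_hist = [S], [I]
    for _ in range(days):
        I, R = (1.0 - delta) * I + S * (B @ I), R + delta * I
        S = 1.0 - I - R
        S_hist.append(S)
        I_hist.append(I)
    return np.array(S_hist), np.array(I_hist)


def estimate_network(S, I):
    """Estimate delta_i and row i of B for every region from (T, N) histories.

    Regression per region i, following Eq. (1):
        I_i[t+1] = (1 - delta_i) I_i[t] + sum_j beta_ij (S_i[t] I_j[t]),
    with all coefficients constrained to be nonnegative.
    """
    N = I.shape[1]
    delta_hat = np.empty(N)
    B_hat = np.empty((N, N))
    for i in range(N):
        A = np.column_stack([I[:-1, i], S[:-1, i, None] * I[:-1]])
        coef, _ = nnls(A, I[1:, i])
        delta_hat[i] = 1.0 - coef[0]  # nnls only guarantees 1 - delta_i >= 0
        B_hat[i] = coef[1:]
    return delta_hat, B_hat


# Hypothetical ground truth for three regions.
B_true = np.array([[0.35, 0.03, 0.02],
                   [0.04, 0.30, 0.03],
                   [0.02, 0.05, 0.40]])
delta_true = np.array([0.10, 0.12, 0.15])
S, I = simulate([1e-3, 5e-4, 0.0], B_true, delta_true, days=80)
delta_hat, B_hat = estimate_network(S, I)
```

With noise-free data the true parameters are recovered; with real, noisy case counts the regression becomes ill-conditioned, which is consistent with the observation that accurate forecasts do not require an accurately inferred matrix.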

NIPA applied to each region separately

As a benchmark model, we apply NIPA to each region separately, which we name NIPA separate. NIPA separate is a machine-learning method based on the SIR model, but it does not consider the interaction between different regions.
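For NIPA separate, the regression of each region only involves its own infections, so only $1 - \delta_i$ and the self-infection probability $\beta_{ii}$ are estimated. A minimal sketch follows, using ordinary least squares (`numpy.linalg.lstsq`) as a simplified stand-in for the NIPA estimator and a hypothetical isolated region; it estimates both parameters and then iterates the single-region SIR model six days ahead.

```python
import numpy as np


def fit_separate(S, I):
    """Least-squares estimate of (delta, beta_ii) for one region from 1-D series."""
    A = np.column_stack([I[:-1], S[:-1] * I[:-1]])
    (one_minus_delta, beta_ii), *_ = np.linalg.lstsq(A, I[1:], rcond=None)
    return 1.0 - one_minus_delta, beta_ii


def forecast_separate(S, I, R, delta, beta_ii, horizon):
    """Iterate the single-region SIR model, ignoring all other regions."""
    path = []
    for _ in range(horizon):
        I, R = (1.0 - delta) * I + beta_ii * S * I, R + delta * I
        S = 1.0 - I - R
        path.append(I)
    return np.array(path)


# Hypothetical isolated region: generate 30 observed days.
delta_true, beta_true = 0.1, 0.3
I_obs, R_obs = [1e-3], [0.0]
for _ in range(30):
    I_t, R_t = I_obs[-1], R_obs[-1]
    S_t = 1.0 - I_t - R_t
    I_obs.append((1.0 - delta_true) * I_t + beta_true * S_t * I_t)
    R_obs.append(R_t + delta_true * I_t)
I_obs, R_obs = np.array(I_obs), np.array(R_obs)
S_obs = 1.0 - I_obs - R_obs

delta_hat, beta_hat = fit_separate(S_obs, I_obs)
forecast = forecast_separate(S_obs[-1], I_obs[-1], R_obs[-1],
                             delta_hat, beta_hat, horizon=6)
```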

NIPA static prior

The formulation of NIPA can be extended to include knowledge of the underlying contact network. We use a time-independent traffic network, with traffic intensity matrix $W$, to obtain a prior $\bar{B}$ for the infection probability matrix by scaling the traffic intensities with positive scalars. We explain our motivation for the prior infection matrix in Appendix B. The positive scalars are unknown and are set by cross-validation. We assume that the true infection matrix $B$ is normally distributed around the prior infection matrix $\bar{B}$. Based on the prior infection matrix $\bar{B}$ and the observations of the spread of COVID-19, we obtain the Bayesian estimate $\hat{B}$ by solving an optimisation problem that balances the fit to the observed infection vectors $I[t]$ at all times $t = 1, \dots, n$ against the deviation from the prior $\bar{B}$. Using the estimated infection matrix $\hat{B}$ and the estimated curing probabilities $\hat{\delta}_i$ for each region $i$, we forecast the outbreak by iterating the SIR model. For details on NIPA static prior, see Appendix C.
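Under a Gaussian prior centred at the traffic-based prior, and assuming additionally Gaussian errors in Eq. (1) (an assumption of this sketch), the maximum a posteriori estimate of one row $\theta$ of the regression coefficients is a ridge regression that shrinks towards the prior $\bar\theta$ instead of towards zero: $\hat\theta = (A^T A + \lambda I)^{-1}(A^T y + \lambda \bar\theta)$, where $\lambda$ is the ratio of the error variance to the prior variance. The sketch ignores the nonnegativity of the infection probabilities and uses random data; the matrix `A`, the prior row, and `lam` are hypothetical, and this is not the exact optimisation problem of Appendix C.

```python
import numpy as np


def map_row_estimate(A, y, theta_prior, lam):
    """MAP estimate with a Gaussian prior centred at theta_prior.

    Minimises ||y - A theta||^2 + lam * ||theta - theta_prior||^2, whose
    closed-form solution is (A^T A + lam I)^{-1} (A^T y + lam theta_prior).
    lam = (error variance) / (prior variance); lam -> 0 ignores the prior.
    """
    n_coef = A.shape[1]
    lhs = A.T @ A + lam * np.eye(n_coef)
    rhs = A.T @ y + lam * theta_prior
    return np.linalg.solve(lhs, rhs)


# Hypothetical regression data for one region (columns as in Eq. (1)).
rng = np.random.default_rng(1)
A = rng.normal(size=(50, 4))
theta_true = np.array([0.90, 0.30, 0.02, 0.01])
y = A @ theta_true + rng.normal(scale=0.01, size=50)
theta_prior = np.array([0.85, 0.25, 0.05, 0.05])  # hypothetical prior values

for lam in (0.0, 10.0, 1e6):
    print(lam, map_row_estimate(A, y, theta_prior, lam).round(3))
```

As `lam` grows, the estimate moves from the data-driven least-squares solution towards the prior, which mirrors how a poor prior can bias NIPA static prior until enough data has been collected.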

NIPA dynamic prior

During the COVID-19 pandemic, many countries imposed some form of lockdown, in which the free movement of people is significantly restricted. Thus, the true contact network is not static but varies over time. We use a time-varying traffic matrix $W[t]$ as an approximation for the prior infection matrix $\bar{B}[t]$, whose entries are obtained by scaling the traffic intensities $w_{ij}[t]$ with positive scalars for all times $t$. The positive scalars are unknown and are set by hold-out validation. We propose a Bayesian approach, called NIPA dynamic prior, to estimate the true infection matrix $B[t]$ from the time series of infected cases and the prior infection matrix $\bar{B}[t]$. Using the estimated time-varying infection matrix $\hat{B}[t]$ and the curing probabilities $\hat{\delta}_i$ for each region $i$, we forecast the outbreak by iterating the SIR model. Appendix D explains the technical details of NIPA dynamic prior. One challenge for NIPA dynamic prior is that the future contact network is unavailable. Hence, we assume that the traffic matrix remains constant after the last observation point $n$: $W[t] = W[n]$ for all $t > n$. We summarise all prediction algorithms in Table 1.
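The assumption $W[t] = W[n]$ for $t > n$ can be sketched as follows. For illustration only, the infection matrix at each time is taken to be the traffic matrix times a single hypothetical scalar `alpha`, instead of the Bayesian estimate of NIPA dynamic prior; the traffic matrices, curing probabilities, and initial state are made up.

```python
import numpy as np


def extend_traffic(W_obs, horizon):
    """W_obs: (n, N, N) observed traffic matrices W[1], ..., W[n].

    Returns (n + horizon, N, N) where every future matrix equals W[n].
    """
    future = np.repeat(W_obs[-1:], horizon, axis=0)
    return np.concatenate([W_obs, future], axis=0)


def iterate_sir(S, I, R, B_seq, delta):
    """Iterate Eqs. (1) to (3) with a time-varying infection matrix B[t]."""
    path = []
    for B in B_seq:
        I, R = (1.0 - delta) * I + S * (B @ I), R + delta * I
        S = 1.0 - I - R
        path.append(I)
    return np.array(path)


# Hypothetical traffic: two provinces, flows halve after day 10 (a lockdown).
n, horizon = 20, 6
W_base = np.array([[1.0, 0.2],
                   [0.3, 1.0]])
W_obs = np.array([W_base * (1.0 if t < 10 else 0.5) for t in range(n)])
W_all = extend_traffic(W_obs, horizon)

alpha = 0.25  # hypothetical scaling from traffic intensity to infection probability
delta = np.array([0.1, 0.1])
I0 = np.array([0.01, 0.0])
path = iterate_sir(1.0 - I0, I0, np.zeros(2), alpha * W_all, delta)
forecast = path[n:]  # the last `horizon` steps use the frozen W[n]
```

Freezing the last matrix is the simplest choice; if lockdown measures are lifted during the forecast horizon, this assumption underestimates the future contact intensity.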

Table 1

All algorithms discussed in this paper. *If the algorithm is based on a phenomenological epidemic process, like the SIR model. **If the algorithm is able to forecast small perturbations in the global trend. ***If the spread between different regions is considered.

Algorithm	Epidemiology*	Adaptive**	Network***
NIPA	✓	✓	✓
NIPA separate	✓	✓	×
NIPA static prior	✓	✓	✓
NIPA dynamic prior	✓	✓	✓
Logistic function	✓	×	×
Hill function	✓	×	×
Gompertz function	✓	×	×
LSTM	×	✓	×

Evaluation of the prediction performance

We evaluate the prediction accuracy of the methods discussed in Section 2 by forecasting the spread of COVID-19 in a selected number of regions. We set the maximal forecast horizon to six days, because of the difficulty of predicting epidemic outbreaks over longer horizons (Prasse, Achterberg & Van Mieghem, 2020). Each prediction algorithm produces a forecast of the cumulative number of infected cases for each region $i$ at time $t$. To quantify the prediction error at time $t$, we use the symmetric mean absolute percentage error (sMAPE) over all regions, which is commonly used in forecasting (Hyndman & Koehler, 2006). Furthermore, to investigate over- and underestimation, we quantify the signed percentage error (PE) of the forecast for region $i$ at time $t$. We consider the spread of COVID-19 in two regions: the cities in Hubei, China, and the provinces in the Netherlands. These regions cannot be regarded as fully representative of the spread of COVID-19, let alone of general infectious diseases. Rather, these regions illustrate the strengths and weaknesses of our methods.
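A sketch of both error measures follows, using one common convention. Both the sMAPE variant (absolute error divided by the mean of the absolute forecast and true values, averaged over regions) and the sign convention of the PE (positive means overestimation) are assumptions; other variants differ by a constant factor or by the sign. The case counts are made up.

```python
import numpy as np


def smape(forecast, truth):
    """Symmetric MAPE over regions: mean of |f - y| / ((|f| + |y|) / 2)."""
    f = np.asarray(forecast, dtype=float)
    y = np.asarray(truth, dtype=float)
    return float(np.mean(2.0 * np.abs(f - y) / (np.abs(f) + np.abs(y))))


def percentage_error(forecast, truth):
    """Signed error per region, (f - y) / y; positive values mean overestimation."""
    f = np.asarray(forecast, dtype=float)
    y = np.asarray(truth, dtype=float)
    return (f - y) / y


truth = [100, 200]     # hypothetical cumulative cases in two regions
forecast = [110, 180]  # hypothetical forecasts
print(smape(forecast, truth))             # mean of 20/210 and 40/380
print(percentage_error(forecast, truth))  # one overestimate, one underestimate
```

Because sMAPE divides by the size of the numbers involved, a fixed absolute error counts less as the cumulative number of cases grows, which is one reason the errors in the figures tend to decrease over time.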

Hubei, China

We first evaluate the prediction accuracy in the Chinese province of Hubei. In December 2019, the first cases of COVID-19 were detected in Wuhan, the capital of Hubei. The first case outside Wuhan was reported on January 21. From January 24 onwards, the whole province of Hubei was under lockdown, prohibiting any non-urgent travel. On February 15, the local government in Hubei changed the diagnosing policy, causing an abrupt increase in the number of reported cases on February 15. Therefore, we restrict ourselves to the period from January 21 to February 14. The reported cases are provided by the Health Commission of Hubei (2020). The majority of COVID-19 patients were reported in Wuhan, as shown in Fig. 1. We removed the region Shennongjia from our analysis, because of the small number of infections in that region.

Fig. 1

The figure on the left shows a geographical map of Hubei. The darker the city, the more infections per 100,000 inhabitants on February 14. The three cities with the most infections on February 14 are displayed on the right.

For NIPA static prior, we require a traffic network describing the interactions between the cities in Hubei. The Chinese company Baidu provides a daily estimate of the number of commuters between all cities in Hubei (Baidu Migration website, 2020). The static prior is set proportional to the traffic network on January 21, the first day of the observation period. Fig. 2 shows the prediction accuracy over time for the different forecast algorithms. The horizontal axis shows the date $t$ of the forecasted situation. For a forecast $\tau$ days ahead, we used all available information from January 22 until day $t - \tau$. For example, the right-most point in Fig. 2(a) uses data from January 22 to February 13 to forecast the situation on February 14.

Fig. 2

Prediction accuracy for the situation in Hubei, China. The subfigures show the prediction accuracy for a forecast horizon of (a) one day, (b) two days, (c) three days, (d) four days, (e) five days, and (f) six days for the prediction algorithms from Section 2.

The sMAPE error in Fig. 2 tends to decrease as time evolves, because a growing amount of data becomes available. Furthermore, the total number of infected cases quickly increases, whereas the daily number of new cases increases at a lower rate, indicating sub-exponential growth (Maier and Brockmann, 2020, Prasse, Achterberg and Van Mieghem, 2020). Sub-exponential growth tends to reduce the sMAPE error, because sMAPE is a relative error metric. On the other hand, the prediction accuracy decreases rapidly as the forecast horizon is enlarged. In particular, the number of cases five and six days ahead around February 1 cannot be predicted accurately, as illustrated by Fig. 2(e) and Fig. 2(f), respectively. In general, the logistic function performs worse than the other algorithms. There may be several reasons for this. First, by fitting a logistic curve, we assume the number of cases to follow the SIR model closely (Kermack and McKendrick, 1927, Prasse, Achterberg and Van Mieghem, 2020). Hence, we do not allow for any individual or governmental responses to COVID-19, which typically flatten the (logistic) curve. Second, the logistic function ignores the spread between regions, which further deteriorates the prediction accuracy. Third, the logistic function is symmetric around the epidemic peak at its inflection point: the rise and the decline of the daily number of cases around the peak are mirror images. Most COVID-19 outbreaks, however, show a rapid increase and a more gradual decrease in the daily number of cases. A possible reason is that most lockdowns are enforced immediately, whereas lockdown measures are lifted gradually. Occasionally, the Hill function (Hill, 1910) and the Gompertz function (Gompertz, 1825) are used to predict epidemic outbreaks, because they allow asymmetry around the epidemic peak. In this work, we focus on the logistic function because of its relation to the solution of the SIR and SIS models, and we discuss the Hill function and the Gompertz function in Appendix F.

The performance of LSTM is reasonable, but LSTM fails to produce an accurate forecast around January 31. In the first part of the period shown in Fig. 2, the time series is short, so little data is available to train the LSTM. Pure machine-learning algorithms are known to yield a lower prediction accuracy than other methods if the time series is short (Makridakis, Spiliotis, & Assimakopoulos, 2020). The prediction accuracy of all NIPA methods in Fig. 2 is similar, although NIPA static prior is considerably worse around February 4 for predictions of three or more days ahead. A possible reason is that the static prior captures the impact of the province-wide lockdown on January 24 incorrectly, whereas the original NIPA method has more freedom to adjust its contact network accordingly, and NIPA dynamic prior receives a more tailored, time-varying prior during the lockdown. Another reason is that the prior network (dynamic or static) may deviate significantly from the true infection matrix. Under ideal circumstances, namely when the epidemic outbreak exactly follows the SIR model, we show in Appendix E that NIPA static prior outperforms NIPA. Fig. 2 also shows that neglecting the network interaction, as NIPA separate does, decreases the prediction accuracy compared to NIPA. Hence, a network-based approach appears beneficial for forecasting. We summarise the results in Section 4.

Another interesting topic is forecast bias: the tendency to systematically overestimate or underestimate the true number of infected cases. Using the percentage error (PE), we estimate the bias of all prediction algorithms for each region $i$ at time $t$. The surface error plots in Fig. 3 show the PE as a function of time for a four-days-ahead prediction. The logistic function and LSTM show the largest spread around the mean, especially around February 1, which is in agreement with Fig. 2. Furthermore, Fig. 3 illustrates that the logistic function and LSTM systematically underestimate the true number of cases. On the other hand, NIPA static prior appears to overestimate the true number of cases. A possible reason is the following. The static prior is proportional to the traffic flow before the lockdown measures. When a lockdown is introduced, the static prior remains unchanged and does not reflect the reduced traffic, so the algorithm overestimates the number of cases. After some time, the newly collected data shows that the prior is inaccurate, so NIPA static prior relies less on the prior and more on the data, which improves the forecast accuracy again.

Fig. 3

Surface error plots for four-days-ahead forecasts versus time. The subfigures show (a) NIPA, (b) NIPA separate, (c) NIPA static prior, (d) NIPA dynamic prior, (e) logistic function, and (f) LSTM.
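The symmetry argument in the Hubei discussion can be checked numerically. The sketch compares the daily increase of a logistic curve with that of a Gompertz curve, written here in the common form $a \exp(-b e^{-kt})$ (the parametrisation in Appendix F may differ), and measures how long each takes to rise from half its peak to the peak and to decay back to half. All parameter values are hypothetical.

```python
import numpy as np


def logistic(t, a, k, t0):
    return a / (1.0 + np.exp(-k * (t - t0)))


def gompertz(t, a, b, k):
    return a * np.exp(-b * np.exp(-k * t))


def half_widths(t, cumulative):
    """Rise and decay times between half the peak daily increase and the peak."""
    daily = np.gradient(cumulative, t)
    p = int(np.argmax(daily))
    half = daily[p] / 2.0
    rise = t[p] - t[:p][daily[:p] < half][-1]
    decay = t[p + 1:][daily[p + 1:] < half][0] - t[p]
    return rise, decay


t = np.linspace(0.0, 100.0, 10001)
rise_log, decay_log = half_widths(t, logistic(t, 1.0, 0.2, 50.0))
rise_gom, decay_gom = half_widths(t, gompertz(t, 1.0, np.exp(10.0), 0.2))
print(f"logistic: rise {rise_log:.2f}, decay {decay_log:.2f}")  # nearly equal: symmetric peak
print(f"Gompertz: rise {rise_gom:.2f}, decay {decay_gom:.2f}")  # slower decay than rise
```

The Gompertz curve reproduces the pattern of a rapid increase followed by a more gradual decrease, which a logistic curve cannot represent regardless of its parameters.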

The Netherlands

As a second case study, we regard the spread of COVID-19 in the Netherlands. The first patient, who had visited Italy the week before, was diagnosed on February 27. After February 27, the number of cases grew rapidly, as depicted in Fig. 4. The epidemic peak was observed at the end of March, and the daily number of cases subsequently dropped. We consider the spread of COVID-19 at a provincial level, for which data is available from the Dutch National Institute for Public Health and the Environment, called RIVM (RIVM, 2020). The Netherlands is subdivided into 12 provinces, for which the RIVM reports the daily number of new infections. Since the number of infected cases increased more gradually in the Netherlands than in Hubei, China, the total epidemic period is longer and more data points are available. A more gradual increase in the number of cases should be beneficial for the prediction accuracy.

Fig. 4

The figure on the left shows a geographical map of the Netherlands. The darker the province, the more infections per 100,000 inhabitants on May 19. The four provinces with the most infections on May 19 are displayed on the right.

For NIPA static prior, we require a traffic network as an approximation for the interaction between the provinces. Statistics Netherlands (Centraal Bureau voor de Statistiek, CBS) reports, for every pair of provinces, the number of people working in one province and living in the other, averaged over one year (CBS, 2018). For NIPA dynamic prior, we use the Google Mobility Data “Workplaces” to estimate the time-varying traffic network for each province in the Netherlands (Google LLC, 2020). Google reports the percentage decrease of traffic on day $t$ in province $i$ compared to a baseline day between January 3 and February 6, 2020. During the lockdown, we expect a substantial decrease because of the lockdown measures. We then construct the time-dependent traffic matrix by combining the yearly commuter numbers from CBS with these daily decreases. The prediction accuracy for the Netherlands is outlined in Fig. 5. Before April 1, the situation in the Netherlands is similar to that in Hubei: the NIPA methods perform best, but there are large deviations in the prediction accuracy. After April 1, the NIPA methods achieve nearly identical accuracy. In other words, the influence of the static or dynamic prior network on the prediction is small. The main reason is that the NIPA algorithms are trained on a growing amount of infection data as time advances. Among the best performing methods over the whole period are the original NIPA and NIPA separate, whereas the logistic function and LSTM show the worst performance.
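One plausible way to combine the two data sources, stated purely as an assumption, is to multiply each yearly CBS commuter flow into workplace province $i$ by $1 - d_i[t]/100$, where $d_i[t]$ is the percentage decrease of workplace traffic in province $i$ on day $t$. The matrix orientation, the rule, and the numbers below are hypothetical. Google's reports give the percent change from the baseline, which is negative during a lockdown, so its sign has to be flipped to obtain a decrease.

```python
import numpy as np


def time_varying_traffic(W, decrease_pct):
    """Scale a static commuter matrix by daily workplace-traffic decreases.

    W            : (N, N) commuters, W[i, j] = people working in i and living in j
                   (assumed orientation).
    decrease_pct : (T, N) percentage decrease of workplace traffic per province.
    Returns        (T, N, N) with W_t[i, j] = (1 - d_i[t] / 100) * W[i, j].
    """
    factor = 1.0 - np.asarray(decrease_pct, dtype=float) / 100.0  # (T, N)
    return factor[:, :, None] * np.asarray(W, dtype=float)[None, :, :]


# Hypothetical two-province example over three days.
W = np.array([[1000.0, 200.0],
              [150.0, 800.0]])
google_change = np.array([[0.0, 0.0],      # Google-style percent change from baseline
                          [-20.0, -10.0],
                          [-50.0, -40.0]])
W_t = time_varying_traffic(W, -google_change)  # flip sign: change -> decrease
```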

Fig. 5

Prediction accuracy for the situation in the Netherlands. The subfigures show the prediction accuracy (a) one day ahead, (b) two days ahead, (c) three days ahead, (d) four days ahead, (e) five days ahead, and (f) six days ahead.

The prediction accuracy of NIPA separate is comparable to that of NIPA, except at the left-hand side of Fig. 5. A possible reason is that the spread of the coronavirus was initially dominated by interprovincial interactions. After the lockdown was imposed at the end of March, the interaction between provinces decreased significantly, so the spread of the coronavirus mainly took place within each province.

Conclusion

We compared the prediction accuracy of eight algorithms designed to forecast the spread of COVID-19. We summarise the results in Table 2. The error in Table 2 was obtained by averaging all sMAPE forecast errors for forecast horizons between one and six days. Fitting a sigmoid curve, like the logistic function, performed the worst among the methods considered. The main reasons for the low prediction accuracy are the imposed symmetry around the epidemic peak and the neglect of the interaction between regions. Other sigmoid curves, such as the Hill function and the Gompertz function, performed better than the logistic function, but worse than most other algorithms. The LSTM machine-learning algorithm is not based on any phenomenological epidemic process, nor does it consider interactions between regions. Table 2 shows that the prediction accuracy of LSTM is comparable to that of the Hill and Gompertz functions.

Table 2

The performance of all algorithms discussed in this paper. The Netherlands is abbreviated as NL. *As input, each algorithm requires the population size of each region and a time series of the infected cases in each region at any time .

Algorithm	Additional input*	Mean sMAPE (Hubei)	Mean sMAPE (NL)	Bias
NIPA	–	0.122	0.0381
NIPA separate	–	0.129	0.0487
NIPA static prior	Static traffic network	0.135	0.0384	Over
NIPA dynamic prior	Dynamic traffic network	0.129	0.0429
Logistic function	–	0.186	0.0735	Under
Hill function	–	0.142	0.0531
Gompertz function	–	0.141	0.0528
LSTM	–	0.160	0.0570	Under
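The comparison in Table 2 can be reproduced with a few lines of arithmetic. The snippet below copies the mean sMAPE values from the table, identifies the most accurate algorithm per region, and computes its relative error reduction with respect to the logistic function; no new data is involved.

```python
# Mean sMAPE over forecast horizons of one to six days, copied from Table 2.
errors = {
    "NIPA": (0.122, 0.0381),
    "NIPA separate": (0.129, 0.0487),
    "NIPA static prior": (0.135, 0.0384),
    "NIPA dynamic prior": (0.129, 0.0429),
    "Logistic function": (0.186, 0.0735),
    "Hill function": (0.142, 0.0531),
    "Gompertz function": (0.141, 0.0528),
    "LSTM": (0.160, 0.0570),
}

summary = {}
for k, region in enumerate(["Hubei", "NL"]):
    best = min(errors, key=lambda name: errors[name][k])
    reduction = 1.0 - errors[best][k] / errors["Logistic function"][k]
    summary[region] = (best, reduction)
    print(f"{region}: best = {best}, {100 * reduction:.1f}% lower error than the logistic function")
# Hubei: best = NIPA, 34.4% lower error than the logistic function
# NL: best = NIPA, 48.2% lower error than the logistic function
```

In the Netherlands, NIPA static prior (0.0384) is almost as accurate as NIPA (0.0381), so the ranking among the NIPA variants there is within a narrow margin.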

The Network Inference-based Prediction Algorithm (NIPA) combines machine learning with phenomenological epidemiology (the SIR model), and it considers the interaction between different regions. Table 2 illustrates that the prediction accuracy of NIPA is better than that of any other algorithm. Applying NIPA to each region separately (NIPA separate) yielded a larger forecast error than NIPA in both regions. We thus conclude that a network-based approach is beneficial for accurate forecasts. We also showed that, under ideal circumstances where the outbreak exactly follows the SIR model, choosing a time-varying or static prior close to the true contact network may improve the forecast accuracy of NIPA. Surprisingly, including a time-varying or static prior in NIPA does not improve the forecast accuracy on real infection data for the considered regions. Among several possible reasons, the chosen prior might be an inaccurate estimate of the true contact network. In a practical setting, such as the current COVID-19 pandemic, policymakers might prefer to anticipate the worst-case prediction of the number of infected cases. In that case, an asymmetric error metric that penalises underestimations more heavily than overestimations may be more suitable.
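As one possible form of such an asymmetric metric, offered as a hypothetical example rather than a metric used in this paper, the absolute percentage error can be weighted more heavily whenever the forecast falls below the true value. The weight `under_weight` is an assumed tuning parameter; a value of 1 recovers a symmetric error.

```python
import numpy as np


def asymmetric_ape(forecast, truth, under_weight=2.0):
    """Mean absolute percentage error with extra weight on underestimation.

    Errors where forecast < truth are multiplied by under_weight (> 1 penalises
    underestimating the number of infected cases more than overestimating it).
    """
    f = np.asarray(forecast, dtype=float)
    y = np.asarray(truth, dtype=float)
    ape = np.abs(f - y) / y
    weights = np.where(f < y, under_weight, 1.0)
    return float(np.mean(weights * ape))


truth = [1000, 1000]
print(asymmetric_ape([900, 1000], truth))   # underestimate by 10% in one region
print(asymmetric_ape([1100, 1000], truth))  # overestimate by 10% in one region
```

Under such a metric, the systematic underestimation of the logistic function and LSTM would be penalised more, whereas the overestimation of NIPA static prior would be penalised less.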

References

1. Learning to forget: continual prediction with LSTM.

Authors: F A Gers; J Schmidhuber; F Cummins
Journal: Neural Comput Date: 2000-10

2. The Gompertz Curve as a Growth Curve.

Authors: C P Winsor
Journal: Proc Natl Acad Sci U S A Date: 1932-01

3. LSTM recurrent networks learn simple context-free and context-sensitive languages.

Authors: F A Gers; E Schmidhuber
Journal: IEEE Trans Neural Netw Date: 2001

4. An individual-based approach to SIR epidemics in contact networks.

Authors: Mina Youssef; Caterina Scoglio
Journal: J Theor Biol Date: 2011-06-06

5. Covid-19: four fifths of cases are asymptomatic, China figures indicate.

Authors: Michael Day
Journal: BMJ Date: 2020-04-02

6. SEIR modeling of the COVID-19 and its dynamics.

Authors: Shaobo He; Yuexi Peng; Kehui Sun
Journal: Nonlinear Dyn Date: 2020-06-18

7. Deciphering death: a commentary on Gompertz (1825) 'On the nature of the function expressive of the law of human mortality, and on a new mode of determining the value of life contingencies'.

Authors: Thomas B L Kirkwood
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2015-04-19

8. Modified SEIR and AI prediction of the epidemics trend of COVID-19 in China under public health interventions.

Authors: Zifeng Yang; Zhiqi Zeng; Ke Wang; Sook-San Wong; Wenhua Liang; Mark Zanin; Peng Liu; Xudong Cao; Zhongqiang Gao; Zhitong Mai; Jingyi Liang; Xiaoqing Liu; Shiyue Li; Yimin Li; Feng Ye; Weijie Guan; Yifan Yang; Fei Li; Shengmei Luo; Yuqi Xie; Bin Liu; Zhoulang Wang; Shaobo Zhang; Yaonan Wang; Nanshan Zhong; Jianxing He
Journal: J Thorac Dis Date: 2020-03
