Subhash Kumar Yadav, Yusuf Akhter.
Abstract
In this review, we discuss statistical modeling and prediction techniques for various infectious diseases, including the recent COVID-19 pandemic. Distribution fitting, time series modeling along with predictive monitoring approaches, and epidemiological modeling are illustrated. When the epidemiological data are sufficient, that is, when the required sample size is met, the normal distribution or other theoretical distributions are fitted, and the best-fitted distribution is chosen to predict the spread of the disease. Infectious diseases develop over time, and the data cover a single variable, the number of infections; therefore, time series models are fitted and prediction is based on the best-fitted model. Monitoring approaches may also be applied to time series models, which could estimate the parameters more precisely. In epidemiological modeling, more biological parameters are incorporated into the models before the disease spread is forecast. We also discuss how to improve existing modeling methods, the use of fuzzy variables, and the detection of fraud in the available data. Finally, we review the results of recent statistical modeling efforts to predict the course of COVID-19 spread.
Keywords: distribution fitting models; epidemiological models of disease; estimation; parameters; prediction; time series regression models
Year: 2021 PMID: 34222166 PMCID: PMC8242242 DOI: 10.3389/fpubh.2021.645405
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
Figure 1. A case of normal distribution. The normal distribution is a symmetric bell-shaped curve. The standard normal distribution has a mean of zero and a standard deviation of one. The coefficient of skewness for this distribution is zero while the coefficient of kurtosis is three. The presented figure represents different normal curves for different values of means and variances.
Figure 5. The Gamma distribution is the distribution of a random variable X for which E(X) = κθ = α/σ is fixed and greater than zero, and E[log(X)] = ψ(κ) + log(θ) = ψ(α) − log(σ) is fixed (ψ is the digamma function). The presented figure represents different gamma curves for different values of the parameters α and σ.
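The two gamma moments named in the Figure 5 caption can be checked numerically. The following is a minimal sketch, assuming the shape-scale parameterization (κ, θ) and the standard identities E(X) = κθ and E[log(X)] = ψ(κ) + log(θ). The `digamma` helper, the parameter values, and the simulated sample are illustrative assumptions and are not taken from the review.

```python
import math
import random


def digamma(x: float) -> float:
    """Approximate the digamma function psi(x) for x > 0."""
    result = 0.0
    # Recurrence psi(x) = psi(x + 1) - 1/x moves x into the asymptotic range.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    # Asymptotic series: ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)
    result += math.log(x) - 0.5 / x
    result -= inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def gamma_moments(shape: float, scale: float) -> tuple:
    """Return (E[X], E[log X]) for a gamma distribution with given shape and scale."""
    return shape * scale, digamma(shape) + math.log(scale)


# Hypothetical parameter values, chosen only for illustration.
shape, scale = 2.0, 1.5
theory_mean, theory_mean_log = gamma_moments(shape, scale)

# random.gammavariate takes (shape, scale) in that order.
rng = random.Random(0)
sample = [rng.gammavariate(shape, scale) for _ in range(50_000)]
sample_mean = sum(sample) / len(sample)
sample_mean_log = sum(math.log(x) for x in sample) / len(sample)

print(f"E[X]:     theory {theory_mean:.3f}, simulated {sample_mean:.3f}")
print(f"E[log X]: theory {theory_mean_log:.3f}, simulated {sample_mean_log:.3f}")
```

With a rate parameter σ = 1/θ, as the caption's α and σ appear to be used, the same quantities read α/σ and ψ(α) − log(σ).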
Fitting of different distributions for different infectious diseases.
| 1 | Meyer and Held ( | The authors studied short-time human travel behavior using power-law distributions (Pareto, Uniform, Cauchy, etc.) with respect to distance. They used extended space-time models for influenza surveillance data to better capture the dynamics of disease spread, and studied the statistical properties of the best-fitted distribution to better explain and predict influenza. |
| 2 | Virlogeux et al. ( | This work studied a novel avian influenza virus, influenza A(H7N9), that emerged in China. The authors fitted different parametric and non-parametric distributions to A(H7N9) incubation periods and studied the properties of the fitted distributions. The best-fitted parametric distribution was the Weibull distribution, with a mean incubation period of 3.4 days (95% confidence interval [3.0, 3.7]) and a variance of 2.9 days. The results were very similar for the non-parametric Turnbull estimate. |
| 3 | Virlogeux et al. ( | The authors studied Middle East Respiratory Syndrome coronavirus (MERS) disease in the Arabian Peninsula and in South Korea in 2015. They examined the incubation period distribution of MERS coronavirus infection using parametric (Lognormal, Gamma, Weibull, Exponential, Log-logistic) and non-parametric (Turnbull) methods. They showed that Gamma and Weibull are the best-fitted distributions for South Korea, while Lognormal and Log-logistic are the best-fitted for Saudi Arabia, and estimated a mean incubation period of 6.9 days (95% credibility interval [6.3, 7.5]) for cases in South Korea and 5.0 days (95% credibility interval [4.0, 6.6]) for cases in Saudi Arabia. |
| 4 | Hanel et al. ( | The authors worked on the most standard methods based on maximum likelihood (ML) estimates of power-law functions. After the best-fitted power-law distribution was identified using fitting measures, the appropriate ML estimator was derived for arbitrary exponents of power-law distributions on bounded discrete sample spaces. They showed that a similar estimator also works for continuous data. This ML estimator was implemented and its performance was compared with previous work. Further, a general protocol was given on how it could be used for estimating the spread of infections. |
| 5 | Li et al. ( | In this study, prediction and parameter estimation of infections were studied using noisy case reporting data. A simple stochastic, discrete-time, discrete-state epidemic model was established with both process and observation errors and was used to characterize the efficiency of different flavors of Bayesian Markov chain Monte Carlo (MCMC). They fitted different parametric distributions with ceilings (binomial and beta-binomial) and without ceilings (Poisson and negative binomial), and the statistical properties of the best-fitted distribution were studied to explain and predict the nature of the infections. |
| 6 | De-Souza et al. ( | The authors noted that climate change has a strong influence on health and death rates due to respiratory diseases, yet this influence remains poorly understood in terms of probability distribution modeling. They fitted the Burr, Inverse Gaussian, Lognormal, Pert, Rayleigh, and Weibull distributions to respiratory diseases, and the shape and scale parameters of the distributions were determined to verify the quality of fit through fitting measures. The lognormal and Rayleigh distributions were observed to fit hospital admissions best. |
| 7 | Valvo ( | The author studied an epidemiological model for predicting the time trends of COVID-19 deaths worldwide, taking a bimodal distribution function, a mixture of two lognormal distributions, to model the time distribution of deaths in a country. The asymmetric lognormal distribution was reported to fit better than symmetric distribution functions. Based on the best model, the future behavior of the spread of COVID-19 was analyzed and extrapolated until the end of 2020. |
| 8 | Vazquez ( | The author noted that infection spread is expected to grow exponentially in time, but its initial kinetics is not well understood. Analytical expressions were derived for the kinetic behavior with a gamma distribution of generation intervals. Excluding the exponential distribution, the spread of infection grows as a power law at short times. At long times, the kinetics is exponential, with a growth rate determined by the reproductive number and the parameters of the generation interval distribution. These derivations can be used to obtain better estimates of the parameters used for infection spread. |
| 9 | El-Monsef ( | The author fitted a finite mixture of m-Erlang distributions to analyze COVID-19 dissemination. The author derived different moments and shape-parameter estimates for the suggested model and showed that it has a bounded hazard function. A special case of the suggested distribution was also discussed, along with a predictive technique for estimating the parameters of the fitted distribution. Data on COVID-19 cases from Egypt were used to examine the flexibility of the proposed model. |
| 10 | Almetwally et al. ( | The authors suggested a model for fitting COVID-19 mortality rates in the UK and Canada using an optimal statistical technique. They proposed a new two-parameter lifetime distribution by combining the inverted Topp-Leone (ITL) and modified Kies inverted Topp-Leone (MKITL) distributions. They showed that the suggested model has several important properties, such as a simple linear representation, a hazard rate function, and a moment function, and they used various estimation methods for its parameters. Through a data simulation study on COVID-19 cases, they showed that the suggested model performs better than traditional methods. |
| 11 | Mubarak and Almetwally ( | The authors introduced a new extended three-parameter exponential distribution and studied its survival and hazard functions. They used the maximum likelihood estimation (MLE) and maximum product spacing (MPS) methods to estimate the parameters of this distribution. An empirical study using COVID-19 data was carried out to compare the suggested model with some well-known distributions, and it was concluded that the suggested distribution fits better than the competing distributions. |
| 12 | Gonçalves et al. ( | The authors concluded that inaccurate epidemiological concepts were being used during the COVID-19 pandemic. They pointed to incorrect references to a “normal epidemic curve” and a “log-normal curve/distribution” in social media and scientific journals, and noted that textbooks and courses of reputed institutions have spread slightly incorrect information. Most of these sources present a histogram as an epidemic curve or treat epidemic data as a Gaussian distribution, ignoring the temporal indexing of the data. The authors further observed that an epidemic curve may have a Gaussian shape and be modeled with a Gauss function, but it cannot be a perfect normal or log-normal distribution, as some previous studies have suggested. They also noted that a pandemic produces highly complex data, and that handling it effectively requires going beyond a “one-size-fits-all solution” in statistical and mathematical modeling. Finally, they suggested that classical textbooks should be updated on pandemics, and that epidemiology should give reliable information for policy making and implementation. |
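The distribution-fitting workflow summarized in the table above (fit several candidate distributions, then keep the best-fitted one) can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic stand-ins for incubation periods, only three candidates with closed-form maximum likelihood estimates are tried, and the Akaike information criterion (AIC) is assumed as the fitting measure. None of the reviewed studies is claimed to have used exactly this procedure.

```python
import math
import random


def fit_exponential(data):
    """Return (log-likelihood, number of parameters) at the exponential MLE."""
    rate = len(data) / sum(data)
    return sum(math.log(rate) - rate * x for x in data), 1


def fit_normal(data):
    """Return (log-likelihood, number of parameters) at the normal MLE."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n  # the MLE divides by n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def fit_lognormal(data):
    """Return (log-likelihood, number of parameters) at the lognormal MLE."""
    logs = [math.log(x) for x in data]
    loglik_of_logs, k = fit_normal(logs)
    # Change of variables: the density of x is the normal density of log(x) over x.
    return loglik_of_logs - sum(logs), k


def aic(loglik, k):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * k - 2 * loglik


# Synthetic "incubation periods" in days (hypothetical, for illustration only).
rng = random.Random(1)
incubation_days = [rng.lognormvariate(1.5, 0.6) for _ in range(1000)]

candidates = {
    "exponential": fit_exponential,
    "normal": fit_normal,
    "lognormal": fit_lognormal,
}
scores = {name: aic(*fit(incubation_days)) for name, fit in candidates.items()}
best = min(scores, key=scores.get)
print(best, {name: round(score, 1) for name, score in scores.items()})
```

Distributions without closed-form estimates, such as Weibull or Gamma, would need a numerical optimizer, which is omitted here to keep the sketch dependency-free.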
Stationary and non-stationary time series regression models used in epidemiology.
| 1. | Autoregressive model (AR) | Present values of the time series are explained linearly by previous values and the present residual |
| 2. | Moving Average (MA) | Present values of the time series are explained linearly by present and previous residuals |
| 3. | Autoregressive Moving Average (ARMA) | A combination of AR and MA: present values of the time series are explained linearly by previous values as well as present and previous residuals |
| 4. | Autoregressive Integrated Moving Average (ARIMA) | Based on the ARMA model, with a differencing procedure that transforms non-stationary data into stationary data |
| 5. | Seasonal Autoregressive Integrated Moving Average (SARIMA) | Based on the ARIMA model, but also includes seasonal differencing for data with periodic patterns |
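The relationship between differencing and the autoregressive part of the models listed above can be illustrated with the simplest integrated case, ARIMA(1,1,0). The sketch below is an assumption-laden toy: the series is simulated, the AR(1) coefficient is estimated by ordinary least squares without an intercept, and the function names are hypothetical. A real analysis would normally rely on a dedicated time series library.

```python
import random


def difference(series):
    """First-order differencing: turns levels into increments."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of phi in x_t = phi * x_{t-1} + e_t (zero mean assumed)."""
    numerator = sum(prev * cur for prev, cur in zip(series, series[1:]))
    denominator = sum(prev * prev for prev in series[:-1])
    return numerator / denominator


def forecast_next(levels, phi):
    """One-step ARIMA(1,1,0) forecast: last level plus the predicted next increment."""
    last_increment = levels[-1] - levels[-2]
    return levels[-1] + phi * last_increment


# Simulate a non-stationary series whose increments follow an AR(1) process.
rng = random.Random(42)
true_phi = 0.6
increments = [0.0]
for _ in range(2000):
    increments.append(true_phi * increments[-1] + rng.gauss(0.0, 1.0))

levels = [0.0]
for inc in increments:
    levels.append(levels[-1] + inc)

# Differencing recovers the stationary increments, on which the AR(1) part is fitted.
phi_hat = fit_ar1(difference(levels))
next_level = forecast_next(levels, phi_hat)
print(f"estimated phi = {phi_hat:.3f}, next-step forecast = {next_level:.2f}")
```

Seasonal differencing, as in SARIMA, would subtract the value one season earlier instead of the immediately preceding value.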
Different time series regression models for different infectious diseases.
| 1 | Zhang et al. ( | In this study, the authors presented a complete analysis of different prediction methods based on monthly infection data for typhoid fever. The seasonal autoregressive integrated moving average (SARIMA) model was compared with three neural network models, namely backpropagation neural networks (BPNN), radial basis function neural networks (RBFNN), and Elman recurrent neural networks (ERNN), and the differences, pros, and cons of the two types of models were examined. The evaluation was based on three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), and mean square error (MSE). The results showed that RBFNN obtained the smallest MAE, MAPE, and MSE in both the modeling and forecasting processes. Ultimately, the RBFNN method was suggested for better explanation and prediction of typhoid fever infection spread. |
| 2 | Zhang et al. ( | In this work, four time series methods, namely two decomposition methods (regression and exponential smoothing), ARIMA, and support vector machine (SVM), were compared on nine types of infections. The performances were evaluated based on three metrics: mean absolute error (MAE), mean absolute percentage error (MAPE), and mean square error (MSE). The robustness of the statistical models in predicting the potential spread of the infections showed their good applicability in epidemiological surveillance. No single method was completely superior to the others, but support vector machine-based methods performed better than the ARIMA models and decomposition methods in most cases. |
| 3 | Imai et al. ( | In this study, time series regression was applied to evaluate the short-term associations of air pollution and weather with mortality or morbidity of infectious diseases. They used different approaches, including mathematical modeling, wavelet analysis, and ARIMA models. They concluded that the time series regression can be used to investigate the dependence of infectious diseases on weather, but may need modifying to allow for features specific to this context. |
| 4 | Song et al. ( | The authors compiled monthly data on influenza infections from all provinces and autonomous regions in mainland China and applied time series analysis to construct an ARIMA model. They evaluated the goodness of fit through the autocorrelation function (ACF) and partial autocorrelation function (PACF), and used automatic model selection to determine the order of the model parameters. It is conceivable that SARIMA is the best time series model for the prediction of influenza infection spread. |
| 5 | Sarkar and Chatterjee ( | The authors applied different time series models to analyze and forecast financial data as well as epidemiological data on malaria infection dissemination. They studied the epidemiological data on malaria using three time series models, namely Auto-Regressive Integrated Moving Average (ARIMA), Generalized Auto-Regressive Conditional Heteroskedastic (GARCH), and Random Walk. They showed a good fit of the models to the data and provided the best forecast for future infection spread. As far as the future prevalence pattern is concerned, the predictions of these models may help researchers and public health professionals to design control programs for malaria. |
| 6 | Chae et al. ( | The authors studied the prediction of infections by optimizing the parameters of deep learning algorithms while considering big data, including social media data. The performance of the deep neural network (DNN) and long short-term memory (LSTM) learning models was compared with that of the autoregressive integrated moving average (ARIMA) when predicting three infections 1 week into the future. They showed that the DNN and LSTM models perform better than ARIMA. The DNN model performed stably, and the LSTM model was more accurate when infections were spreading. |
| 7 | Tapak et al. ( | The authors analyzed the accuracy of support vector machine, artificial neural network, and random-forest time series models in influenza-like illness (ILI) modeling and infection detection. The models were applied to a data set of weekly ILI cases in Iran. To judge the robustness of the models, the root mean square error (RMSE), mean absolute error (MAE), and intra-class correlation coefficient (ICC) were used as testing criteria. The random-forest time series model performed better than the other methods. The outcome indicated that the time series models used had excellent performance, suggesting they could be effectively applied for predicting weekly ILI infections and endemics. |
| 8 | Chaurasia and Pal ( | In this work, the authors analyzed the numbers of COVID-19 cases, deaths, and recoveries worldwide within a specific period. They compared several prediction techniques, namely the naive method, simple average, moving average, single exponential smoothing, Holt linear trend method, Holt-Winters method, and ARIMA, and examined how these methods improve the root mean square error score. They concluded that the naive method performed best among the methods used. |
| 9 | Rahmadani and Lee ( | The authors suggested a hybrid deep learning framework that combines a meta-population model with long short-term memory (LSTM) for predicting COVID-19 dissemination. They extended the susceptible–exposed–infected–recovered compartment model by taking into account human mobility among a number of regions, and combined the meta-population model with deep learning models to estimate the parameters of the hybrid model. They compared the suggested hybrid deep learning framework with other estimation methods for predicting COVID-19 spread patterns and showed improvement over previously presented methods. |
| 10 | Kalantari ( | The author used the singular spectrum analysis (SSA) method to predict the number of daily confirmed infection cases, deaths, and recoveries caused by COVID-19. The SSA method was compared with other commonly used time series prediction techniques, including ARIMA, fractional ARIMA, exponential smoothing, TBATS, and neural network autoregression (NNAR), on the basis of the root mean squared error (RMSE) fitting measure. Among the studied models, the SSA technique was shown to be best for predicting the number of daily confirmed infection cases, deaths, and recoveries caused by COVID-19. |
| 11 | Satrio et al. ( | The authors used machine learning models to predict the spread of COVID-19 in Indonesia. They also attempted to estimate a timeline for the return to normalcy. They used the PROPHET forecasting model as well as ARIMA and examined their robustness and accuracy for the confirmed new infection cases, deaths, and recovered numbers. They showed that PROPHET performs better than the ARIMA model on the analyzed data set. |
| 12 | Beneditto et al. ( | The authors used machine learning models to forecast the trend of the disease in Indonesia and to approximate when normality will return. This study used Facebook's Prophet forecasting model and the ARIMA forecasting model and compared their performance and accuracy on a dataset containing the confirmed cases, deaths, and recovered numbers, obtained from the Kaggle website. The predictions were then compared with the last 2 weeks of the actual data to measure the models' performance against each other. The results showed that Prophet predicted the outcomes better than ARIMA, although its predictions drift further from the actual data the more days ahead it predicts. |
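Several of the studies in the table above rank forecasting models by MAE, MAPE, MSE, or RMSE. The sketch below shows how these metrics are computed on a held-out window, together with the naive forecast that repeats the last observed value. The case counts are made-up numbers used only for illustration, and the function names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean square error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(mse(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent; assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(actual)


def naive_forecast(history, horizon):
    """Naive method: repeat the last observed value for every future step."""
    return [history[-1]] * horizon


# Hypothetical held-out case counts and one model's predictions for them.
actual = [100.0, 110.0, 120.0, 130.0]
predicted = [90.0, 110.0, 130.0, 130.0]

print("MAE ", mae(actual, predicted))
print("MSE ", mse(actual, predicted))
print("RMSE", round(rmse(actual, predicted), 3))
print("MAPE", round(mape(actual, predicted), 3))

# Baseline for comparison: naive forecast from a hypothetical training history.
baseline = naive_forecast([80.0, 90.0, 95.0], horizon=len(actual))
print("naive RMSE", round(rmse(actual, baseline), 3))
```

Because MAPE divides by the observed value, it is undefined when a day or week has zero reported cases, which is one reason studies often report it alongside MAE and RMSE.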
Different infectious diseases, their causative microorganisms, and modes of transmission.
| Virus | Measles, Chickenpox, Mumps, Rubella, Smallpox, Influenza, Herpes, HIV (AIDS virus) | Poliomyelitis | Arboviruses: | Rabies |
| Bacteria | Gonorrhea | Typhoid Fever | Plague | Brucellosis |
| Protozoa | Syphilis | Amebiasis | Malaria | |
| Helminths | Trichinosis | |||
Various epidemiological models for different infectious diseases.
| 1 | Huppert and Katriel ( | The authors have discussed the extent to which the disease transmission models provide reliable predictions. They examined the predictions of the model to test which are trustworthy. An important benefit derived from mathematical modeling activity is that it demands transparency and accuracy regarding our assumptions, thus enabling us to test our understanding of the disease epidemiology by comparing model results and observed patterns. Models can also assist in decision-making by making projections regarding important issues such as intervention-induced changes in the spread of disease. |
| 2 | Steele et al. ( | The authors noted that early detection of infectious disease outbreaks can reduce the ultimate size of the outbreak, with lower overall morbidity and mortality due to the disease, and that numerous approaches to earlier detection of outbreaks exist. The systematic review used the PRISMA framework (Preferred Reporting Items for Systematic Reviews and Meta-analyses) and the MEDLINE (PubMed) database. Five studies were identified and included in the review. These studies evaluated the effect of electronic-based reporting on detection timeliness, the impact of laboratory agreements on timeliness, and barriers to notification by general practitioners. |
| 3 | Driessche ( | The author worked on the basic reproduction number, |
| 4 | Walters et al. ( | The authors observed that mathematical models can aid in the understanding of the risks associated with the global spread of infectious diseases. To assess the current state of mathematical models for the global spread of infectious diseases, the authors reviewed the literature highlighting common approaches and good practice, and identifying research gaps. They found that most epidemiological data come from published journal articles, population data come from a wide range of sources, and travel data mainly come from statistics or surveys, or commercial datasets. However, they believed that open access datasets should be used wherever possible to aid model reproducibility and transparency. |
| 5 | Raissi et al. ( | The authors considered compartmental disease transmission models and discussed the importance of determining model parameters that provide insight into disease transmission and prevalence. They used three approaches, namely an optimization approach, physics-informed deep learning, and a statistical inference method, to estimate parameters and analyze disease transmission. The performance of the deep learning method was validated against representative small and big data sets corresponding to a well-known benchmark example, and the results indicate that deep learning is a viable candidate for determining model parameters. The results also indicate the efficiency and importance of statistical inference methods for researchers to understand and analyze the data and make confident predictions. |
| 6 | Li et al. ( | The authors established a dynamics model of infectious diseases and a time series model to predict the trend and short-term transmission of COVID-19 in mainland China for clinical trials. They applied a six-compartment dynamic model and established time series models based on different mathematical formulas according to the variation law of the original data. Finally, they suggested that continuing to increase investment in various medical resources, so that suspected patients can be diagnosed and treated promptly, is a very effective prevention and treatment method. |
| 7 | Prasse et al. ( | The authors have used a network-based model to describe the COVID-19 epidemic in the Hubei province. They have suggested the network-inference-based prediction algorithm (NIPA) to predict the future prevalence of the COVID-19 epidemic in the cities of China and they have shown that NIPA is best for accurate prediction of the infection spread. |
| 8 | Yang et al. ( | The authors described a short-term predictor of the daily cases reported in Wuhan City, using an individual-level network-based model to rebuild the epidemic dynamics in Hubei Province, and examined the effectiveness of non-pharmaceutical interventions on the epidemic spread under various scenarios. They showed through a simulation study that, without continued control measures, the epidemic in Hubei Province could have become persistent, and that the infection rate is controlled through protective measures and social distancing. They modeled COVID-19 transmission with non-Markovian processes and showed how these models produce different epidemic trajectories in comparison to Markov processes. |
| 9 | Popov and Nakov ( | The authors worked on epidemiological models of the spread of infectious diseases, including COVID-19. Models and simulations of an epidemic in the presence of quarantine, and of the moment of its termination, were made. They pointed out that it is important to pinpoint the timing of the lifting of measures or their granting. Through the proposed simulation model, they showed the impact of group gatherings, such as the beginning of the school year, holidays, and mass events, on the epidemic picture. These studies are also relevant in the event of a mutation in the virus that changes the rate of spread. |
| 10 | Saraee and Silva ( | In this review, the authors compared studies that used epidemiological models for disease forecasting with other models that identified socio-demographic factors associated with COVID-19. They evaluated several models, from basic equation-based mathematical models to more advanced machine-learning ones. They identified high-impact models used by policymakers, discussed their limitations, and suggested possible areas of application for future research. |
| 11 | Moein et al. ( | The authors used different mathematical techniques, including the susceptible-infected-recovered (SIR) model, for the description and prediction of the infection spread of COVID-19. They simulated the infection spread data in Isfahan province of Iran along with three suppressive measures of the stringency level of physical distancing. They showed that the SIR model was able to predict the actual spread and pattern of COVID-19 only in the short term, not in the long term. They also concluded that other published works using SIR models for predicting COVID-19 have the same drawback, because the assumptions of SIR models do not hold for the COVID-19 pandemic. Finally, they suggested that more sophisticated modeling strategies and detailed knowledge of the biomedical and epidemiological aspects of the disease are needed to predict the spread of this pandemic. |
| 12 | Alvarez et al. ( | The authors proposed a simple epidemiological model that can be implemented in Excel spreadsheets and is able to simulate the data of the COVID-19 pandemic well. They showed that the model can closely follow the evolution of COVID-19 spread in big cities by simply adjusting parameters for demographic conditions and the aggressiveness of the response to the epidemic. They also advised that the suggested epidemiological simulator may be used to judge the efficiency of the population's response to the pandemic. The simplicity and accuracy of the model will help to understand the extent of an epidemic event and the efficacy of any policy response from the state. |
Figure 6. The Susceptible-Infectious (SI) model is the simplest of the disease models. Individuals are born without immunity, which means they are susceptible to all infections. Once infected, and if not given any treatment, all cases remain infected for life and stay in contact with the susceptible population. This model applies to diseases such as cytomegalovirus (CMV) and herpes.
Figure 7. In the Susceptible-Infectious-Susceptible (SIS) model, infected cases become susceptible again after recovery. This model is applied to diseases in which re-infection and relapse are common, e.g., the common cold (rhinoviruses) or sexually transmitted diseases (STDs) such as Gonorrhea or Syphilis.
Figure 8. The Susceptible-Infected-Recovered (SIR) model is an epidemiological model that computes the theoretical number of people infected with a contagious disease in a closed population over time. The family of these models involves coupled equations relating the numbers of susceptible people, infected cases, and individuals recovered from the disease.
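The coupled equations behind the SIR model can be made concrete with a short simulation. The sketch below integrates the standard SIR equations, dS/dt = −βSI/N, dI/dt = βSI/N − γI, and dR/dt = γI, with a simple forward Euler scheme. The transmission rate β, recovery rate γ, population size, and step size are hypothetical values chosen for illustration and are not estimates from any study in the review.

```python
def simulate_sir(beta, gamma, population, initial_infected, days, dt=0.1):
    """Forward Euler integration of the SIR model in a closed population.

    Returns a list of (time, susceptible, infected, recovered) tuples.
    """
    s = population - initial_infected
    i = float(initial_infected)
    r = 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for step in range(1, steps + 1):
        new_infections = beta * s * i / population * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((step * dt, s, i, r))
    return trajectory


# Hypothetical parameters: basic reproduction number beta / gamma = 3.
trajectory = simulate_sir(
    beta=0.3, gamma=0.1, population=1000.0, initial_infected=1.0, days=160
)
peak_time, _, peak_infected, _ = max(trajectory, key=lambda row: row[2])
final_time, s_end, i_end, r_end = trajectory[-1]

print(f"peak of about {peak_infected:.0f} infected on day {peak_time:.0f}")
print(f"after {final_time:.0f} days: S={s_end:.0f}, I={i_end:.1f}, R={r_end:.0f}")
```

Because the population is closed, S + I + R stays equal to N at every step, which is a convenient sanity check on any implementation.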
Figure 9. The Susceptible-Infected-Recovered-Susceptible (SIRS) model is an epidemiological model that describes the theoretical number of individuals infected with a contagious disease in a closed population over time. In this model, the equations relate the numbers of susceptible, infected, and recovered individuals, along with recovered individuals who become susceptible again.
Figure 10. The Susceptible-Exposed-Infected-Recovered (SEIR) model is an extension of the SIR model to include an exposed but non-infectious group of individuals. This model considers the number of susceptible, exposed, infectious, and recovered individuals with no additional mortality associated with infectious disease.
Figure 11. The SEIR model assumes that people carry lifelong immunity to a disease after recovery, but for many diseases the immunity deteriorates over time. In such cases, the Susceptible-Exposed-Infected-Recovered-Susceptible (SEIRS) model is applied to permit recovered individuals to return to a susceptible state. The parameter ξ represents the rate at which recovered individuals become susceptible again because of decay in immunity.
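The effect of the waning-immunity rate ξ can be seen by simulating the model with and without it. The sketch below is a minimal illustration, assuming a closed population with no births or deaths and the usual SEIRS equations: dS/dt = −βSI/N + ξR, dE/dt = βSI/N − σE, dI/dt = σE − γI, and dR/dt = γI − ξR. The symbols β, σ, and γ, their values, and the function name are assumptions introduced for this example; only ξ is named in the caption.

```python
def simulate_seirs(beta, sigma, gamma, xi, population, initial_infected, days, dt=0.1):
    """Forward Euler integration of the SEIRS model; xi = 0 reduces it to SEIR.

    Returns the final (susceptible, exposed, infected, recovered) state.
    """
    s = population - initial_infected
    e, i, r = 0.0, float(initial_infected), 0.0
    for _ in range(int(round(days / dt))):
        exposure = beta * s * i / population * dt  # S -> E
        onset = sigma * e * dt                     # E -> I
        recovery = gamma * i * dt                  # I -> R
        waning = xi * r * dt                       # R -> S (loss of immunity)
        s += waning - exposure
        e += exposure - onset
        i += onset - recovery
        r += recovery - waning
    return s, e, i, r


# Hypothetical rates per day, chosen only for illustration.
params = dict(beta=0.3, sigma=0.2, gamma=0.1, population=1000.0,
              initial_infected=1.0, days=1500)

seir_end = simulate_seirs(xi=0.0, **params)    # lifelong immunity
seirs_end = simulate_seirs(xi=0.01, **params)  # immunity wanes over ~100 days

# Endemic level implied by setting all derivatives to zero (under the
# no-births, no-deaths assumption stated above).
beta, sigma, gamma, xi, n = 0.3, 0.2, 0.1, 0.01, 1000.0
endemic_infected = n * (1 - gamma / beta) / (1 + gamma / sigma + gamma / xi)

print(f"SEIR  infected after 1500 days: {seir_end[2]:.4f}")
print(f"SEIRS infected after 1500 days: {seirs_end[2]:.1f}")
print(f"SEIRS endemic level from the equations: {endemic_infected:.1f}")
```

With ξ = 0 the infection dies out once enough people have recovered, whereas any positive ξ keeps replenishing the susceptible pool and, in this parameter regime, the infection settles at an endemic level.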