Mathematical modeling of infectious disease dynamics.

Abstract

Over the last few years, an intensive worldwide effort has accelerated the establishment of a global surveillance network for combating pandemics of emergent and re-emergent infectious diseases. Scientists from fields ranging from medicine and molecular biology to computer science and applied mathematics have teamed up for the rapid assessment of potentially urgent situations. Toward this aim, mathematical modeling plays an important role in efforts that focus on predicting, assessing, and controlling potential outbreaks. To better understand and model contagion dynamics, the impact of numerous variables has to be analyzed and thoroughly studied, ranging from the micro host-pathogen level to host-to-host interactions, as well as the prevailing ecological, social, economic, and demographic factors across the globe. Here, we present and discuss the main approaches used for the surveillance and modeling of infectious disease dynamics. We present the basic concepts underpinning their implementation and practice, and for each category we give an annotated list of representative works.

Keywords: agent-based models; dynamical models; machine learning models; mathematical epidemiology; statistical models

Year: 2013 PMID: 23552814 PMCID: PMC3710332 DOI: 10.4161/viru.24041

Source DB: PubMed Journal: Virulence ISSN: 2150-5594 Impact factor: 5.882

Introduction

No doubt, the history of mankind has been shaped by the pitiless outbreaks of infectious disease pandemics. Whole nations and civilizations have been wiped off the map through the ages. The list is long: the biblical pharaonic plagues that hit Ancient Egypt in the middle of the Bronze Age, around 1715 BC; the “λοιμός” (plague) of Athens from 430 to 425 BC, which ended the Periclean golden era; the “cocoliztli” epidemics of the 16th century, which caused some 13 million deaths and decimated the native Mesoamerican population; and the Black Death, the bubonic plague that broke out in Europe in 1348 and is estimated to have killed over 25 million people in just five years. The pandemic influenza virus of 1918–1919 swept through America, Europe, Asia, and Africa, with a death toll of around 40 million people. Two less severe, one-year influenza pandemics followed in the next decades: the 1957 and 1968 influenza pandemics resulted in two million and one million deaths, respectively (World Health Organization: http://apps.who.int/iris/handle/10665/68985). In recent decades, emerging and re-emerging epidemics such as AIDS, measles, malaria, and tuberculosis have caused millions of deaths each year. According to the UNAIDS report on the global AIDS epidemic, an estimated 34 million people, including 3.4 million children, were living with HIV worldwide at the end of 2010, while the related deaths and new infections were 1.8 and 2.7 million, respectively. Rapid technological and theoretical progress has dramatically enhanced our arsenal for fighting epidemics, and we are getting better at it. The global surveillance network is growing under an intensive worldwide effort. We are now able to produce effective vaccines and antiviral drugs, and our knowledge extends to details such as the molecular structure of a variety of viruses. Intensive research on the design of better drugs and vaccines is under way.
Yet studies warn us that a new pandemic, most worryingly one of the influenza type, is sooner or later on the way. The critical questions are not whether but when it will arise, how it is going to spread, how deadly it will be, who should get the vaccine when not everyone can, how likely multiple waves of re-emergence are, and what type of intervention may be applied to stop the spread. Unfortunately, even with all the advances, we still do not have robust answers. The problem stems mainly from two sources: (1) the continuous mutation of viruses, and (2) the complexity of the disease transmission mechanism. The odds are that in a real crisis, even if researchers succeed in coming up with a vaccine tailor-made for an emerged virus strain, it is doubtful that it would stop a pandemic. The complex multi-scale interplay between a host of factors, ranging from the micro host–pathogen and individual-scale host–host interactions to macro-scale ecological, social, economic, and demographic conditions across the globe, and complicated by technical issues such as the time lag between vaccine prototype development and commercial production and distribution, imposes a real impediment to our potential for control. Mathematical and statistical models and computational engineering play a most valuable role in shedding light on the problem and in supporting decision making.

The Beginning of Mathematical Modeling in Epidemiology

The very first publication addressing the mathematical modeling of epidemics dates back to 1766. In this seminal paper, Essai d'une nouvelle analyse de la mortalité causée par la petite vérole, Daniel Bernoulli developed a mathematical model to analyze the mortality due to smallpox in England, which at that time accounted for one in 14 of all deaths. Bernoulli used his model to show that inoculation against the virus would increase the life expectancy at birth by about three years. An English translation and review of this work can be found in Sally Blower (2004), while a revision of the main findings and a presentation of the criticism by D’Alembert appears in Dietz and Heesterbeek (2002). Lambert, in 1772, followed up the work of Bernoulli, extending the model by incorporating age-dependent parameters. Laplace also worked on the same concept. However, this line of research was not developed systematically until the benchmark paper of Ross in 1911, which effectively established modern mathematical epidemiology. In this work, Ross addressed the mechanistic a priori modeling approach, using a set of equations to approximate the discrete-time dynamics of malaria through mosquito-borne pathogen transmission (for a discussion and a review of this model see also Smith et al. [2012]). Following up the work of Ross, Kermack and McKendrick published three seminal papers that founded deterministic compartmental epidemic modeling. In these papers, they addressed mass-action incidence in the disease transmission cycle, suggesting that the probability of infection of a susceptible individual (one who has not yet had the illness) is proportional to the number of its contacts with infected individuals. Hence, the rate at which susceptibles become infected is given by kSI, where S and I represent the population densities of susceptible and infected people, respectively.
In this context, the rate at which infected individuals recover is given by λI, while the rate at which recovered individuals become susceptible again is given by μR; k, λ, and μ are proportionality constants. This mechanistic-deterministic representation bears a strong analogy to the Law of Mass Action introduced by Guldberg and Waage in 1864 and is called the SIR model; it implies homogeneous mixing of the contacts and conservation of the total mass (population), as well as relatively low rates of interaction. Forty years after the paper of Ross, MacDonald extended Ross’s model to explain in depth the transmission process of malaria and to propose methods for eradicating the disease on an operational level. Owing to the importance of MacDonald’s contribution to the field, including his exploitation of computers, mathematical models for the dynamics and control of mosquito-transmitted pathogens are known as Ross–MacDonald models. At this point it would be remiss of us not to mention the work of Enko, who in 1889 published a remarkable probabilistic model describing the epidemic of measles in discrete time. With this model, Enko evaluated the number of contacts between infectives and susceptibles in the population. The model of Enko is the precursor of the famous Reed-Frost chain binomial model, introduced by W. H. Frost in 1928 in biostatistics lectures at Johns Hopkins University (not published in a journal at the time, but published in 1979). This model assumes that the infection spreads from an infected to a susceptible individual through discrete-time Markov chain events. This representation set the basis of contemporary stochastic epidemic modeling.
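As an illustration of the chain binomial idea, the following is a minimal sketch of a Reed-Frost-type simulation. It assumes the textbook formulation, which is not spelled out in the text above: each susceptible escapes infection from each of the I current infectives independently with probability 1 - p, and infectives are removed after one generation. The function name `reed_frost` and all parameter values are hypothetical choices made for the example.

```python
import random

def reed_frost(s0, i0, p, rng, max_steps=1000):
    """Simulate one chain binomial epidemic; return (S, I) per generation.

    Assumed textbook form: a susceptible is infected in the next generation
    with probability 1 - (1 - p)**I, independently of the others.
    """
    s, i = s0, i0
    history = [(s, i)]
    for _ in range(max_steps):
        if i == 0:
            break  # no infectives left, the chain has ended
        p_inf = 1.0 - (1.0 - p) ** i
        # Binomial draw realised as s independent Bernoulli trials.
        new_i = sum(1 for _ in range(s) if rng.random() < p_inf)
        s, i = s - new_i, new_i
        history.append((s, i))
    return history

rng = random.Random(1)  # fixed seed so the run is reproducible
run = reed_frost(s0=99, i0=1, p=0.02, rng=rng)
final_size = run[0][0] - run[-1][0]
print(len(run) - 1, "generations; final size:", final_size)
```

Because the state at one generation depends only on the state at the previous one, the sequence of (S, I) pairs is a discrete-time Markov chain, which is the property the text attributes to this family of models.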

Mathematical Modeling Methodologies in Epidemiology

Mathematical modeling and simulation allow for rapid assessment. Simulation is also used when the cost of collecting data is prohibitive or when there is a large number of experimental conditions to test. Over the years, a vast number of approaches have been proposed that look at the problem from different perspectives. These fall into three general categories (see Fig. 1): (1) statistical methods for the surveillance of outbreaks and the identification of spatial patterns in real epidemics, (2) mathematical models within the context of dynamical systems (also called state-space models) used to forecast the evolution of a “hypothetical” or ongoing epidemic spread, and (3) machine learning/expert methods for forecasting the evolution of an ongoing epidemic. Each of these three categories in turn contains different approaches, forming a large and diverse literature. Here, we try to map these approaches and describe their basic underpinning concepts.

Figure 1. An overview of mathematical models for infectious diseases.

Statistical-Based Methods for Epidemic Surveillance

One of the most important aspects of epidemics is surveillance: the early detection of possible outbreaks and of patterns that may help to control a spread. One of the very first success stories in the area is the analysis of the cholera epidemic that swept through London in 1854. At that time John Snow, a physician, collected spatiotemporal data and, by visualizing them on a map, found a particular pattern around the Broad Street water pump, which was in fact the point of origin of the transmission. His analysis helped to eradicate the disease. At the dawn of the 20th century Greenwood, an epidemiologist and statistician, was the first Professor of Epidemiology and Statistics at the London School of Hygiene and Tropical Medicine, establishing a rigorous mathematical connection between the two fields. Today, global initiatives to combat epidemics require effective domestic action mechanisms and preparedness across the globe. An intensive worldwide effort led by the World Health Organization and the Centers for Disease Control is speeding up the establishment of a global surveillance network. Newly emerged pandemics such as AIDS, the severe acute respiratory syndrome (SARS) of 2002–2003, and the H1N1 swine flu of 2009 remind us of the importance of surveillance and prompt outbreak detection. Toward this aim, statistical methods have enhanced our potential for fighting epidemics by allowing the rapid assessment of emerging situations. Obviously, the correctness of the data and the selection of the appropriate methodology are crucial for the construction of statistical models that can capture the characteristics of a communicable disease efficiently and robustly. To date, several statistical methods have been proposed (see also Unkel et al. [2012] for a review of statistical methods for the detection of disease outbreaks).
On the website of the Centers for Disease Control and Prevention (CDC) ( http://www.cdc.gov/ ) one can find a list of references in the field. Here we present and discuss the most common schemes, which can be classified as follows:

Regression methods

Regression models try to detect an outbreak by monitoring a statistic of reported infected cases, say y(t), against a baseline estimated from time series of epidemic-free periods. An epidemic alert is raised when y(t) surpasses a threshold, say k, defined relative to μ, the mean value of the time-series distribution, at a given confidence level (usually 95%). A basic regression model is that proposed by Serfling, which was initially constructed to monitor influenza deaths based on the seasonal pattern of pneumonia and influenza deaths. Owing to the seasonal behavior of the disease, a cyclic regression model of the following general form has been addressed:

$$y(t) = c_0 + c_1 t + \sum_{i} \left[ a_i \cos(i\theta) + b_i \sin(i\theta) \right] + e(t)$$

Here θ is a linear function of time t, while the coefficients $c_0$, $c_1$, $a_i$, and $b_i$ are to be determined by a parameter identification technique. The cosine and sine terms are used to approximate cyclical seasonal patterns; e(t) is the noise (assumed to be Gaussian distributed with mean zero and variance σ²), which is estimated from the time series. In the original paper of Serfling, y(t) was the expected mean value of total deaths due to pneumonia and influenza in units of 4-week periods. The model was fitted using data from 108 US cities for a 3-year period starting in September 1955, with the coefficients estimated by least squares. Other models that include quadratic terms in t, to account for long-term changes due to factors such as population growth or disease assessment, have also been proposed. Today, this approach is used for the detection of influenza outbreaks by the Centers for Disease Control in the US, as well as in Australia, France, and Italy. While this approach is very popular among epidemiologists for prediction and surveillance purposes, one has to be cautious about its use, as the form of the equations usually relies on ad hoc assumptions about the dependence between the dynamics of a disease and the independent factors (variables) that determine its spread.
In addition, the choice of the model (linear/nonlinear) and the assumptions on the statistical properties (for example independence, normal distribution, and fixed variance) of the unmodeled dynamics (represented by e(t)) call for caution, especially in the surveillance and prediction of outbreaks of newly emerging epidemics.
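The fitting step of a cyclic regression can be sketched as follows. This is a minimal illustration under stated assumptions, not Serfling's original computation: it keeps a single harmonic, uses synthetic data with 13 four-week periods per year, and raises an alert when an observation exceeds the fitted baseline by z = 1.645 residual standard deviations (a one-sided 95% level, assumed here). The helper names `design_row`, `solve`, `fit_cyclic`, and `alert` are hypothetical.

```python
import math
import random

def design_row(t, period):
    """Regressors of the cyclic model: intercept, trend, one harmonic."""
    w = 2.0 * math.pi * t / period
    return [1.0, float(t), math.cos(w), math.sin(w)]

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_cyclic(ts, ys, period):
    """Least-squares coefficients and residual standard deviation."""
    rows = [design_row(t, period) for t in ts]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    coef = solve(xtx, xty)
    resid = [y - sum(c * x for c, x in zip(coef, r)) for r, y in zip(rows, ys)]
    sigma = math.sqrt(sum(e * e for e in resid) / (len(ys) - k))
    return coef, sigma

def alert(t, y, coef, sigma, period, z=1.645):
    """True when y exceeds the seasonal baseline by more than z * sigma."""
    baseline = sum(c * x for c, x in zip(coef, design_row(t, period)))
    return y > baseline + z * sigma

# Synthetic epidemic-free baseline: three years of 4-week periods.
rng = random.Random(0)
period = 13
ts = list(range(39))
ys = [100 + 0.5 * t + 20 * math.cos(2 * math.pi * t / period)
      + 5 * math.sin(2 * math.pi * t / period) + rng.gauss(0, 2) for t in ts]
coef, sigma = fit_cyclic(ts, ys, period)
print([round(c, 2) for c in coef], round(sigma, 2))
```

The threshold here inherits the assumptions the text warns about: Gaussian, independent, constant-variance residuals. If those fail, the nominal confidence level of the alert no longer holds.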

Time series analysis based on autoregressive models such as the autoregressive integrated moving average model (ARIMA) and seasonal ARIMA (SARIMA), as well as neural networks

These models relax the assumption made by regression models that the residuals are not autocorrelated, as well as the hypothesis of simple autoregressive models such as AR (autoregressive) and ARMA (autoregressive moving average), in which past disturbances are not modeled. In this category, ARIMA models are the most commonly used. Their general form reads:

$$A(z^{-1})\,\Delta^{d} y(t) = B(z^{-1})\,e(t)$$

where y(t) denotes a stationary stochastic process at time t with mean value E(y(t)) = μ; $z^{-1}$ is the backward shift operator, defined by $z^{-k} y(t) = y(t-k)$, and $\Delta^{d}$ is the differencing operator of order d, defined by $\Delta^{d} \equiv (1 - z^{-1})^{d}$; $A(z^{-1})$ is the autoregressive operator, defined as $A(z^{-1}) = 1 + a_1 z^{-1} + \cdots + a_n z^{-n}$; $B(z^{-1})$ is the moving-average operator, defined by $B(z^{-1}) = 1 + b_1 z^{-1} + \cdots + b_m z^{-m}$; e(t) is the residual (noise) at time t, representing the part of the measurement that cannot be predicted from previous measurements. For d = 0 and n = 0 one gets the moving average model, while for d = 1 and n = m = 0 one gets the random walk (with drift when a constant term is included). Seasonal differencing enters naturally in the above framework by considering the seasonal differencing operator $\Delta_k^{S} \equiv (1 - z^{-k})^{S}$, where k is the length of the seasonal cycle and S is the degree of seasonal differencing, producing a series of changes from one season to the next. The time series is then split into two sets: one serving as a training set, and another containing the remaining data and serving as a test (validation) set. The Akaike Information Criterion is usually applied to identify the optimal model order by compromising between the goodness of fit and the number of parameters. The fitted model is then used to forecast the evolution of the disease. The reliability of such approaches is limited mostly by (1) the statistical uncertainty related to the estimation of the values of the unknown parameters and (2) the hypotheses related to the statistical properties of the corresponding time series.
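The workflow described above (differencing, a training/test split, order selection by the Akaike Information Criterion, then forecasting) can be sketched with the standard library alone. The sketch below is a simplification and an assumption of this example: it fits only the autoregressive part by least squares, i.e., an ARIMA(n, 1, 0) model, because estimating moving-average terms needs an iterative likelihood fit that a dedicated statistics package would normally provide. The synthetic data and the names `difference`, `fit_ar`, `select_order`, and `forecast_next` are hypothetical.

```python
import math
import random

def difference(y, d=1):
    """Apply the differencing operator (1 - z^-1) d times."""
    for _ in range(d):
        y = [b - a for a, b in zip(y, y[1:])]
    return y

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[k]] for k, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ar(w, n, start):
    """Least-squares fit of w(t) = sum_k phi_k w(t-k) + e(t) on w[start:]."""
    rows = [[w[t - k] for k in range(1, n + 1)] for t in range(start, len(w))]
    targets = w[start:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    phi = solve(xtx, xty)
    rss = sum((y - sum(p * x for p, x in zip(phi, r))) ** 2
              for r, y in zip(rows, targets))
    return phi, rss, len(targets)

def select_order(w, max_n):
    """Pick the AR order with the lowest AIC on a common sample."""
    best = None
    for n in range(1, max_n + 1):
        phi, rss, count = fit_ar(w, n, start=max_n)
        aic = count * math.log(rss / count) + 2 * n
        if best is None or aic < best[0]:
            best = (aic, n, phi)
    return best

def forecast_next(y, phi):
    """One-step forecast of y under an ARIMA(n, 1, 0) model."""
    w = difference(y, 1)
    w_next = sum(p * w[-k] for k, p in enumerate(phi, start=1))
    return y[-1] + w_next

# Synthetic series whose first difference is AR(1) with coefficient 0.6.
rng = random.Random(42)
w, y = [0.0], [10.0]
for _ in range(300):
    w.append(0.6 * w[-1] + rng.gauss(0, 1))
    y.append(y[-1] + w[-1])
train, test = y[:240], y[240:]
aic, order, phi = select_order(difference(train, 1), max_n=3)
pred = forecast_next(train, phi)
print("order:", order, "phi:", [round(p, 2) for p in phi])
print("forecast:", round(pred, 2), "observed:", round(test[0], 2))
```

All candidate orders are scored on the same observations (`start=max_n`), since AIC values are only comparable when computed on a common sample.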

Statistical process control methods, including cumulative sum (CUSUM) charts and exponentially weighted moving average (EWMA)-based methods

CUSUM is probably the most commonly used technique for the detection of disease outbreaks. Detection is achieved by monitoring a cumulative performance measure over time. Let us consider the number of infected cases $y(t_i)$ as observed at different time instances $t_i$, i = 1, 2, …, n. In its simplest representation, for a single-parameter process, CUSUM is defined as

$$\mathrm{CUSUM}(i) = \sum_{j=1}^{i} \big( y(t_j) - k \big)$$

or in a recursive form as

$$\mathrm{CUSUM}(i) = \mathrm{CUSUM}(i-1) + \big( y(t_i) - k \big), \qquad \mathrm{CUSUM}(0) = 0$$

where k is a reference value corresponding to the difference between the in-control and the out-of-control mean. The process is considered to be in-control if CUSUM(i) < h, with h denoting a threshold (its value is usually taken to be three times the standard deviation from the baseline/mean value of the in-control observations). An alarm is raised at time $t_i$ if CUSUM(i) exceeds h; the process is then considered to be out-of-control. The reference value k is determined by likelihood-ratio-based methods. Hence, denoting by f(θ0) and f(θ1) the probability functions of the in-control and out-of-control processes with parameters θ0 and θ1, respectively, the reference value is derived from the log-likelihood ratio $\ln\left[ f(\theta_1) / f(\theta_0) \right]$. The probability functions f(θ0) and f(θ1) and their parameters can be estimated using data from past periods. For Poisson distributions the reference value reads:

$$k = \frac{\mu_1 - \mu_0}{\ln \mu_1 - \ln \mu_0}$$

where μ0 and μ1 are the mean values of the in-control and out-of-control Poisson distributions. For an epidemic that involves time-varying characteristics, such as seasonality, the reference parameter is itself time-varying, i.e., k ≡ k(t). The EWMA control chart method monitors infectious disease dynamics using the following recursive statistical estimator, which in its simple form reads:

$$z(t) = \gamma\, y(t) + (1 - \gamma)\, z(t-1)$$

Here γ is a “forgetting” factor, a number between 0 and 1 that sets how quickly the weight of past observations decays when estimating future values. Again, an alarm is raised at time t if z(t) > h. Other statistical process control methods, such as temporal scan statistics, have also been used.
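The two control charts can be sketched in a few lines. The example below makes several assumptions that go beyond the text: it uses the common one-sided CUSUM variant that resets at zero, max(0, C + y - k), so that quiet periods do not accumulate negative credit; it writes the EWMA update as z = γy + (1 - γ)z; and the counts, thresholds, and function names (`poisson_reference`, `cusum_alarms`, `ewma_alarms`) are invented for illustration.

```python
import math

def poisson_reference(mu0, mu1):
    """Reference value k for Poisson counts with in-control mean mu0
    and out-of-control mean mu1 (likelihood-ratio result)."""
    return (mu1 - mu0) / (math.log(mu1) - math.log(mu0))

def cusum_alarms(counts, k, h):
    """One-sided CUSUM with reset at zero (an assumed, common variant)."""
    c, path, alarms = 0.0, [], []
    for i, y in enumerate(counts):
        c = max(0.0, c + y - k)
        path.append(c)
        if c > h:
            alarms.append(i)
    return path, alarms

def ewma_alarms(counts, gamma, h, z0):
    """EWMA chart z = gamma * y + (1 - gamma) * z, started at z0."""
    z, path, alarms = z0, [], []
    for i, y in enumerate(counts):
        z = gamma * y + (1.0 - gamma) * z
        path.append(z)
        if z > h:
            alarms.append(i)
    return path, alarms

# Hypothetical weekly case counts: baseline near 5, outbreak from week 7.
counts = [4, 5, 3, 6, 4, 5, 4, 9, 11, 12, 10]
k = poisson_reference(mu0=5.0, mu1=10.0)
cusum_path, cusum_hits = cusum_alarms(counts, k, h=5.0)
ewma_path, ewma_hits = ewma_alarms(counts, gamma=0.3, h=7.0, z0=5.0)
print("k =", round(k, 4))
print("CUSUM alarms at weeks:", cusum_hits)
print("EWMA alarms at weeks:", ewma_hits)
```

With these numbers both charts flag week 8, one week after the counts start to rise; a smaller h or larger γ would react sooner at the price of more false alarms.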

Hidden Markov models (HMM) used to explain statistical correlation in time series

The question that HMMs come to answer in epidemiology is the following: how can we infer the dynamics of a particular infectious disease and forecast its outbreak when we cannot monitor or record the characteristics of the disease explicitly, but can observe some possible indicators of it? For example, can we forecast the evolution of an influenza epidemic by monitoring the number of reported cases as recorded through a surveillance network of physicians or in hospital units? HMMs are exploited exactly under these limitations and constraints. Within this context, let us denote by Y(t) the stochastic process of the unobserved (hidden) state, e.g., the number of cases of the disease in the population at time t, and by O(t) the stochastic process of the observable states. Formally, HMMs are Markov processes, i.e., stochastic processes that satisfy the so-called Markov property (here, for the sake of presentation, we assume discrete-time Markov processes), defined by:

$$P\big(Y(t) = y(t) \mid Y(t-1) = y(t-1), \ldots, Y(0) = y(0)\big) = P\big(Y(t) = y(t) \mid Y(t-1) = y(t-1)\big)$$

along with the time-invariant transition probability between two realizations, say $y_i$ and $y_j$:

$$a_{ij} = P\big(Y(t) = y_j \mid Y(t-1) = y_i\big)$$

The above relations simply state that all the information necessary for predicting the distribution of Y(t) at time t, with a certain probability defined by P(.), is contained within Y(t – 1); y(.) denotes a realization of the stochastic process Y(.). In HMMs, the following conditional independence assumption holds:

$$P\big(O(t) = o(t) \mid Y(t), Y(t-1), \ldots, Y(0), O(t-1), \ldots, O(0)\big) = P\big(O(t) = o(t) \mid Y(t) = y(t)\big)$$

Here, the transition probability between an observed state, say $o_k$, and a hidden state, say $y_j$, is defined as

$$b_j(o_k) = P\big(O(t) = o_k \mid Y(t) = y_j\big)$$

There are three basic questions that have to be answered here: (1) what is the likelihood of the observed sequence, (2) what is the most likely hidden sequence given a, b and the observation sequence, and (3) given the observation sequence, which HMM parameters, i.e., a, b and the initial state distribution, maximize the likelihood of the observation sequence and/or hidden sequence.
The first problem is usually tackled with the use of the forward-backward algorithm, the second problem with the use of the Viterbi algorithm, and the third problem with the use of the so-called Expectation-Maximization (EM) algorithm.
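The first two problems can be illustrated with a small two-state example. The sketch below assumes a hypothetical HMM with hidden states 0 ("non-epidemic") and 1 ("epidemic") and observations 0 ("low reported counts") and 1 ("high reported counts"); all probabilities are invented for illustration. For the likelihood only the forward pass of the forward-backward algorithm is needed, so only that pass is shown.

```python
def forward_likelihood(obs, pi, a, b):
    """Likelihood of the observation sequence via the forward recursion.

    pi[j]   : initial probability of hidden state j
    a[i][j] : transition probability from hidden state i to j
    b[j][o] : probability of observing symbol o in hidden state j
    """
    n = len(pi)
    alpha = [pi[j] * b[j][obs[0]] for j in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    return sum(alpha)

def viterbi(obs, pi, a, b):
    """Most likely hidden state sequence (Viterbi algorithm)."""
    n = len(pi)
    delta = [pi[j] * b[j][obs[0]] for j in range(n)]
    back = []  # back[t][j] = best predecessor of state j at step t + 1
    for o in obs[1:]:
        step, new_delta = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: delta[i] * a[i][j])
            step.append(best_i)
            new_delta.append(delta[best_i] * a[best_i][j] * b[j][o])
        back.append(step)
        delta = new_delta
    state = max(range(n), key=lambda j: delta[j])
    path = [state]
    for step in reversed(back):
        state = step[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters (assumptions of this example).
pi = [0.9, 0.1]
a = [[0.9, 0.1], [0.2, 0.8]]
b = [[0.8, 0.2], [0.1, 0.9]]
obs = [0, 0, 1, 1, 1, 0]

likelihood = forward_likelihood(obs, pi, a, b)
hidden = viterbi(obs, pi, a, b)
print("likelihood:", likelihood)
print("most likely hidden path:", hidden)
```

For long sequences the products underflow, so practical implementations work with logarithms or rescale the forward variables at each step; that refinement is omitted here for brevity.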

Spatial models for monitoring, identifying and forecasting disease outbreaks in different locations

Most infectious diseases result in strong spatio-temporal patterns whose systematic analysis is of utmost importance for better understanding, predicting, and combating outbreaks. Spatial surveillance requires the use of multivariate techniques. Most multivariate methods can be viewed as extensions of standard univariate methods, such as the ones described above; however, there are others, such as clustering and principal component analysis (PCA) based methods, that do not have a common ancestor with univariate ones. Kleinschmidt et al. (2000) used a two-tier approach for the surveillance of malaria. They used regression analysis on the larger scale and kriging to interpolate the count data at unobserved locations in order to forecast the prevalence of the disease on the local scale. Cohen et al. (2010) exploited PCA to create a single surveillance index that can be used to summarize temporal and spatial trends of malaria in India. Coleman et al. (2009) used the SaTScan freeware software ( http://www.satscan.org/ ) to identify malaria outbreaks in a province of South Africa by detecting time and space clusters. The SaTScan software is based on the spatial scan statistic and the Bernoulli spatial model. SaTScan has also been exploited by Gaudart et al. (2006) to identify spatio-temporal clusters of high-risk incidence of malaria in a Mali village. A temporal analysis using the ARIMA technique was also undertaken. Finally, we should also mention the use of copulas (joint distribution functions used to model the dependencies between random variables based on given/known marginal distributions of the individual variables) for parametric multivariate analysis.
Copulas can be integrated naturally within the HMM framework and within hazard analysis approaches such as the Cox and Plackett–Dale survival models, in order to better understand and ultimately design more efficient intervention policies, such as vaccination of targeted parts of the population, and to project future trends for risk assessment, especially for fatal diseases such as AIDS. Such models are used to quantify the effect of demographic variables (such as age, gender, social status, and spatial characteristics) on survival rates (i.e., the occurrence rates of events such as death or infection) in the population.

Mathematical/Mechanistic State-Space Models

According to the level of approximation of reality and to increasing complexity, mathematical models may be grouped into the following categories:

“Continuum” models in the form of differential and/or (integro)-partial differential equations

Continuum models describe the coarse-grained dynamics of epidemics in the population. One might, for example, study a model for the evolution of the disease as a function of age and of the time since vaccination, or investigate the influence of quarantine or isolation of the infected part of the population. Such models can be explored using powerful analysis techniques for ordinary or partial differential equations. However, owing to the complexity and stochasticity of the phenomena, most available continuum models are only qualitative caricatures that cannot capture all of the details, thereby compromising epidemiological realism. Within this context, the population is divided into compartments according to the state of health of the individuals, such as susceptible (S), infected (I), and recovered (R). Other states linked with control policies, such as vaccinated (V) and quarantined (Q), are also used. The compartmental SIR mass-action model of Kermack and McKendrick (1927) is the basis of such models. In this representation, it is assumed that an infected individual infects a susceptible one with a probability $p_{S \to I}$ and that an infected individual recovers with a probability $p_{I \to R}$. The system dynamics under the mass-balance formulation can be approximated by the following three ordinary differential equations:

$$\frac{dP(S)}{dt} = -p_{S \to I} \sum_{j \in N(S)} P(S, I_j)$$

$$\frac{dP(I)}{dt} = p_{S \to I} \sum_{j \in N(S)} P(S, I_j) - p_{I \to R}\, P(I)$$

$$\frac{dP(R)}{dt} = p_{I \to R}\, P(I)$$

where P({S, I, R}) denotes the probability that an individual is in one of the states {S, I, R} at time t, and P(A, B) is the pair joint probability of having states A and B communicating at time t; N(S) denotes the set of links of a susceptible individual, and $I_j$ denotes an infected link j. The above equations are not in closed form. Assuming Markovian behavior of the underlying process, the pair probability factorizes as P(S, I) = P(S)P(I). Under the mean-field approximation, assuming that the population is perfectly mixed and that every susceptible has the same probability of becoming infected, the probabilities are equated to the expected (mean) values of the corresponding variables in the population.
These assumptions lead to the following set of equations:

$$\frac{dS}{dt} = -aSI, \qquad \frac{dI}{dt} = aSI - \beta I, \qquad \frac{dR}{dt} = \beta I$$

where S, I, and R denote expected (mean) values; a and 1/β denote the mean values of the disease transmission probability and of the length of the period for which an individual can transmit the disease before recovering. The above set of equations is the celebrated Kermack and McKendrick model. When a recovered individual becomes susceptible again after a period of time 1/γ, the SIRS mean-field model becomes:

$$\frac{dS}{dt} = -aSI + \gamma R, \qquad \frac{dI}{dt} = aSI - \beta I, \qquad \frac{dR}{dt} = \beta I - \gamma R$$

In the Kermack and McKendrick model, the disease becomes epidemic, i.e., $dI/dt > 0$, if and only if $aS/\beta > 1$. Hence, the number of infectives will increase as long as $S > \beta/a$. At $S = \beta/a$ the number of infected cases reaches a maximum, and after this it decreases to zero. The threshold quantity $R_0 = aS(0)/\beta$ is called the basic reproduction number and indicates whether the disease will become epidemic (if R0 > 1) or die out (if R0 < 1). Generally speaking, R0 represents the average number of secondary infections produced by a single infected individual introduced into a completely susceptible population. A transmission potential index that relaxes the hypothesis of a fully susceptible population is the effective reproduction number, defined as the average number of secondary infections produced by a single infected individual in a population in which the disease is already present. The parameters of these models can be estimated using epidemic data from past periods. Within this context, Coburn et al. (2009) give a review of simulating influenza, including swine flu (H1N1), with SIR models. Nichol et al. (2010) used a SIR model to simulate influenza dynamics on a college campus and thereby assess the impact of various vaccination scenarios. Correia et al. (2011) used a SIR model to study measles and hepatitis C in Portugal using data from 1996 until 2007.
SIR-type models have also been extended to incorporate demographics such as age distributions, mortality, and the spatial dependence of the spread, to account for diffusion and migration effects as well as genetic mutations in the interacting populations, thus enhancing their realism. Gaudart et al. (2009) addressed a modified MacDonald SIRS model to approximate the dynamics of malaria in the region of Bancoumana, Mali, over the period from June 1996 to June 2001. MacDonald’s model was extended to incorporate the state of contagious children as well as the states of susceptible and contagious Anopheles. Magal et al. (2010) presented an age-dependent infection model with a mass-action law and analyzed its stability using a Lyapunov function. Metcalf et al. (2011) developed a metapopulation SIR-based model including the probability of infection by age to predict the rubella dynamics in Peru. The model was fitted using weekly time series of reported rubella cases from 1997 to 2009. Ajelli et al. (2011) developed an SIR-based metapopulation model that incorporates a spatial contact matrix describing the mixing level between Italian regions. The authors used the model to predict the spatiotemporal dynamics of hepatitis A in the southern regions of Italy. Models of the same type have also been used for nosocomial epidemics, modeled at the level of both pathogen and host-host interactions (see, e.g., Webb et al., 2005). In Gaudart et al. (2010), the Ross and McKendrick model was extended to incorporate demographics and genetic changes in the populations in order to simulate the spread of malaria in Mali and the plague in the Middle Ages. The authors employed the Archimedean copula approach to relate the risk of infection and biological age.
In another study the authors augmented the model with age classes and a diffusion term to account for spatial effects, in order to approximate the epidemic front wave dynamics of the Black Death between 1348 and 1350. In Demongeot et al. (2012) the Ross and McKendrick SIR model was revised to incorporate demographic and spatial dynamics, introducing continuous age classes and diffusion of both human and vector subpopulations within the infected zones. The model was used to simulate the spread of malaria in Bancoumana, Mali.
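The mean-field SIR and SIRS equations and the role of the epidemic threshold can be explored numerically. The following minimal sketch integrates dS/dt = -aSI + γR, dI/dt = aSI - βI, dR/dt = βI - γR with a fourth-order Runge-Kutta step, treating S, I, and R as population fractions; setting γ = 0 recovers the SIR model. The parameter values, step size, and function names are assumptions chosen for illustration only.

```python
def sirs_rhs(state, a, beta, gamma):
    """Right-hand side of the mean-field SIRS model (gamma = 0 gives SIR)."""
    s, i, r = state
    return (-a * s * i + gamma * r, a * s * i - beta * i, beta * i - gamma * r)

def rk4_step(state, dt, a, beta, gamma):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(x + h * k for x, k in zip(base, slope))
    k1 = sirs_rhs(state, a, beta, gamma)
    k2 = sirs_rhs(shifted(state, k1, dt / 2), a, beta, gamma)
    k3 = sirs_rhs(shifted(state, k2, dt / 2), a, beta, gamma)
    k4 = sirs_rhs(shifted(state, k3, dt), a, beta, gamma)
    return tuple(x + dt / 6 * (p + 2 * q + 2 * u + v)
                 for x, p, q, u, v in zip(state, k1, k2, k3, k4))

def simulate(s0, i0, a, beta, gamma=0.0, dt=0.01, t_end=100.0):
    """Return the list of (S, I, R) states from t = 0 to t_end."""
    state = (s0, i0, 1.0 - s0 - i0)
    path = [state]
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt, a, beta, gamma)
        path.append(state)
    return path

# Above threshold: a * S(0) / beta = 0.5 * 0.99 / 0.2 > 1, so I first grows.
a, beta = 0.5, 0.2
epidemic = simulate(0.99, 0.01, a, beta)
peak = max(range(len(epidemic)), key=lambda n: epidemic[n][1])
print("R0 =", a * 0.99 / beta)
print("S at the peak of I:", round(epidemic[peak][0], 3), "vs beta/a =", beta / a)

# Below threshold: a * S(0) / beta < 1, so I decays from the start.
fadeout = simulate(0.99, 0.01, a=0.1, beta=0.2)
print("max I below threshold:", max(st[1] for st in fadeout))
```

The first run shows the infected fraction peaking when S has fallen to β/a, and the second shows monotone decay when aS(0)/β < 1, which is the threshold behavior described above.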

Stochastic models including discrete and continuous-time individual-based Markov-chain models

These are usually individual-level models that relax the mean-field assumptions of infinite population and perfect mixing, introducing the uniqueness of individual behavior, including multiple heterogeneous characteristics. The main representative of this category is the discrete Markov chain (DMC). In a DMC both time and states are defined on a discrete set of values. The states of the individuals change at every discrete time step in a probabilistic manner according to simple rules involving their own states and the states of their links, satisfying the Markov property, i.e., the future values of the states at time t + Δt depend only on the values of the states at the previous time step t:

$$P\big(X(t+\Delta t) \mid X(t), X(t-\Delta t), \ldots, X(0)\big) = P\big(X(t+\Delta t) \mid X(t)\big)$$

where X(t) denotes the state of the system at time t. For example, for a stochastic SIRS-like model these transition rules may read:
• Rule #1: An infected individual (I) infects a susceptible (S) link with a probability p = λ if an active physical communication exists between them.
• Rule #2: An infected individual (I) recovers with a probability p = δ.
• Rule #3: A recovered individual (R) becomes susceptible (S) with a probability p = γ. This condition expresses the case of temporary immunity.
When these transition probabilities remain constant in time, the Markov process is called a time-homogeneous Markov process. The links between individuals form the contact network through which the disease spreads. For simple DMC models this network is assumed to be a fully connected graph, resulting in homogeneous mixing of individuals. For this case, and in the limit of an infinite number of individuals, the stochastic model can be regarded as a mean-field deterministic model.
For a uniform distribution with z links per individual, and in the limit of an infinite-size population, the governing equations read:

$$\frac{dS}{dt} = -\lambda z S I + \gamma R, \qquad \frac{dI}{dt} = \lambda z S I - \delta I, \qquad \frac{dR}{dt} = \delta I - \gamma R$$

where S, I, and R denote the fractions of susceptible, infected, and recovered individuals. However, the above deterministic mean-field approximations may introduce significant bias when the assumptions of infinite-size population, homogeneous individuals, and homogeneous or random regular networks do not hold. Therefore, they may miss important quantitative and/or qualitative information at the coarse-grained/emergent (continuum) level. This situation worsens as the heterogeneity becomes stronger (e.g., interactions on more complex networks with finite-size populations). Within this context, a comparison between stochastic and the analogous deterministic models is given in Allen and Burgin (2000). Lekone et al. (2006) used a stochastic SEIR model (E stands for individuals exposed to the disease) to simulate the dynamics of the Ebola outbreak in the Democratic Republic of Congo in 1995. Bishai et al. (2011) used a stochastic SIR model with age structure and two additional states (compartments) to describe heterogeneity in vaccination. The authors combined the epidemic model with an economic model incorporating the costs of disease control policies to study the cost effectiveness of supplemental immunization activities for measles in Uganda. Wang et al. (2012) developed a stochastic model within the SIR concept to simulate and better understand the multi-periodic patterns in outbreaks of avian flu in North America. The model assumes random contact between individuals as well as environmental transmission of the virus. Non-Markovian SIR-like models have also been proposed. These models incorporate “memory” in the transmission dynamics. For example, Streftaris and Gibson (2004) propose a non-Markovian SIR model for foot-and-mouth disease outbreaks. In their model they assume that individuals remain infected for a time drawn randomly from a two-parameter Weibull distribution.
Randomized versions of classical deterministic SIR-like models, drawing on random chemical kinetics, have also been proposed to account for non-constant populations with age classes arising from birth and death processes, and for spatial demographics. Within this context, Allen and Burgin (2000) compare the dynamics of deterministic epidemic models and their stochastic counterparts for populations of constant and variable size.

Complex network models

These models relax the hypothesis of the above stochastic models that the interactions between individuals are instantaneous and homogeneous.

One of the most critical problems in epidemics concerns the dynamic effects of contact network heterogeneity. Contacts between individuals evolve under numerous complicated and strongly heterogeneous modes that are influenced by a broad spectrum of factors, ranging from the inherent variability of the pathogen and the stochasticity of host–pathogen interactions that characterize the transmission mechanisms of a particular disease, to population-level factors complicated by environmental, seasonal, economic, and demographic conditions. Furthermore, in many situations the spread of an epidemic is shaped by the topology of the social contact network and, vice versa, the dynamic evolution of the transmission network depends on the emergent dynamics of the epidemic. For example, in a severe epidemic outbreak, a change in the state of endemicity of a particular part of the population can cause a significant change in the characteristics of the transmission network (e.g., through link-cutting due to hospitalization). Understanding this complex behavior is of utmost importance for public-health measures and policies for controlling disease outbreaks. Vaccination, quarantine, and/or the use of antiviral drugs on targeted parts of the population have to be carefully designed to combat an emerging epidemic efficiently. Poor understanding of infectious disease dynamics as they emerge from heterogeneous contact interactions may have serious negative consequences. Over the last years, there has been an intense effort to study the interplay between the emergent dynamics of infectious diseases and the underlying topology of the transmission network. Within this context, Kuperman and Abramson (2001) showed how changes in the rewiring probability used to construct small-world networks influence the dynamics of a simple epidemic model.
It was shown that there exists a critical value of the rewiring probability that marks the onset of a phase transition from stationary endemic situations to self-sustained oscillations. Hwang et al. (2005) studied the influence of the clustering coefficient and average path length on epidemic outbreaks evolving on scale-free networks. Shirley and Rushton studied the impact of four different types of network topology, namely Erdős–Rényi, regular lattice, small-world, and scale-free, on epidemic dynamics. Reppas et al. (2012) studied the influence of the path length of small-world networks on the dynamics of a simple stochastic SIRS epidemic model. Studies on adaptive networks have only very recently begun to appear in the physics literature, indicating that adaptation can trigger effects that are not present in other types of networks. Regarding real-world cases, Read et al. (2008) studied the impact of social networking on the spread of a communicable disease by constructing the underlying contact network from a diary-based survey of 49 adults who recorded 8661 encounters with 3528 different individuals over 14 non-consecutive days. Christakis and Fowler (2010) studied a flu outbreak at Harvard University in 2009. Following 744 students, they mapped the transmission network through the students' friends and contacts and detected the critical nodes and links that were responsible for rapid spread and could be used as early-warning detectors. In particular, by measuring several statistics of the underlying network topology, they quantified the centrality of individuals in the network, i.e., how likely it is for the disease to be transmitted from one individual to others through the network. Salathé et al. (2010) used wireless sensors to record close-proximity interactions during a typical day at an American high school.
Based on these measurements, they constructed the transmission network and studied the potential of the disease to spread in terms of topological characteristics such as transitivity and average path length, with respect to the duration of contact between students. Keeling et al. (2010) constructed two metapopulation networks based on information available from 2001 on commuter movements between 10 000 wards in Great Britain. From the cattle trading system they also constructed the movement network between 150 000 farms. They considered four infectious diseases, namely influenza and smallpox in humans, and foot-and-mouth disease and tuberculosis in cattle. Comparing simulations with actual data, the authors raised the question of whether simple network models can adequately capture the influence of movements in an epidemic. Furthermore, they showed that accounting for the identity of individuals, in contrast to the random-mover assumption, can significantly influence the emergent infection dynamics. Rocha et al. (2011) simulated the spread of sexually transmitted infections using SI and SIR models evolving over a transmission network constructed from data extracted from a Brazilian Internet community where sex buyers rate their encounters with escorts. The network extended over 12 cities. They showed that, due to the high clustering and the distinct communities of the underlying topology, the network slows down outbreaks.
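The topological quantities that recur in the studies above (rewiring probability, clustering coefficient, average path length) can be made concrete with a short sketch. The construction below follows the usual ring-lattice-plus-rewiring recipe for small-world networks; it is an illustration under assumed parameters and does not reproduce the networks used in any of the cited studies. All function names are hypothetical.

```python
import random
from collections import deque


def small_world(n, k, p, seed=0):
    """Ring of n nodes, each linked to its k nearest neighbours (k even);
    every lattice link is rewired to a random new target with probability p."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if j in adj[i] and rng.random() < p:
                # Avoid self-loops and duplicate links when choosing the new target.
                candidates = [c for c in range(n) if c != i and c not in adj[i]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[i].discard(j)
                    adj[j].discard(i)
                    adj[i].add(new)
                    adj[new].add(i)
    return adj


def clustering(adj):
    """Mean local clustering coefficient (nodes with fewer than 2 links count as 0)."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        links = sum(1 for a in nb for b in nb if a < b and b in adj[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


def average_path_length(adj):
    """Mean shortest-path length over all reachable ordered pairs (breadth-first search)."""
    total, pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs
```

Sweeping `p` from 0 towards 1 and recording both statistics shows the typical small-world regime, in which a few rewired links shorten the average path length considerably while the clustering remains comparatively high. An epidemic rule set can then be run on each resulting `adj` to study how the topology affects the dynamics.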

Agent-based simulations

In contemporary mathematical epidemiology, agent-based modeling represents the state of the art for reasoning about and simulating complex epidemic systems. Agent-based models take into account details such as the transportation infrastructure of the simulated area, the mobility of the population, demographics, and epidemiological aspects such as the evolution of the disease within a host and transmission between hosts (Fig. 2). Public-health epidemiologists, researchers, and policy makers are turning to these detailed models for reasons of ethics, cost, timeliness, and appropriateness. In epidemic systems, testing experimental conditions would put the safety of people at risk, creating an ethical problem. In other cases, real-time evaluation of an existing system may take prohibitively long. For example, in a disaster, simulation can be used to rapidly evaluate many previously unexamined alternatives. In all of these cases, since the real-world system under study is a complex system, multi-agent simulations are used because they are considered to incorporate the appropriate level of complexity. For example, the Models of Infectious Disease Agent Study (MIDAS, https://www.epimodels.org/midas/pubglobamodel.do), a network launched on May 1, 2004 and funded by the US National Institutes of Health, has as its pilot effort the detailed modeling of the dynamics of a hypothetical flu pandemic.

Figure 2. Schematic of the components of an agent-based epidemic simulator.

Within this context, Eubank et al. (2004) addressed the use of EpiSims, a detailed agent-based simulator that incorporates population mobility data based on TRANSIMS together with epidemic models of host-pathogen and host-host interactions. EpiSims, developed at Los Alamos National Laboratory, creates a synthetic population based on the Transportation Analysis and Simulation System (TRANSIMS, http://code.google.com/p/transims/). The authors simulated the spread of an infectious disease in the area of Portland, Oregon, US, whose network involves 1.5 million people (nodes), 180 000 locations, and a total of 1.6 million vertices. Ferguson et al. (2005) developed and presented simulation results concerning the H5N1 influenza A pandemic in Southeast Asia. Their simulations involved 85 million agents residing in Thailand and a 100 km-wide zone of neighboring countries. Demographic data involving details about households, the location of schools and workplaces, and population mobility were taken into account. Using the detailed agent-based simulations, they evaluated containment strategies with respect to their potential for preventing a pandemic and the distribution of drugs necessary to eradicate the spread. Burke et al. (2006) presented an agent-based model for the spread of smallpox. The model considered hypothetical towns of 6000 and 50 000 inhabitants. A distribution of households, workplaces, schools, and hospital units was constructed based on US demographic data. The authors investigated the efficiency of various contagion control scenarios, such as vaccination of households and of children at schools, isolation of infected persons, and vaccination of medical staff in hospitals. Balcan et al. (2009) investigated how short-scale and long-scale contacts due to air travel can influence the spatiotemporal pattern of a pandemic.
The authors made use of the GLEaM agent-based computational platform (http://www.gleamviz.org/), which consists of three data layers: the demographic/population layer, the mobility layer, and the epidemic modeling layer. In this study, real-world data from 29 countries around the globe, as well as air-travel flows from 3362 airports indexed by IATA, were integrated into a spatial metapopulation epidemic model.
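The idea of coupling an epidemic layer with a mobility layer in a spatial metapopulation model can be sketched as follows. This is a deliberately simplified, deterministic toy and not the algorithm of GLEaM or of any simulator mentioned above: the patches, the `travel` matrix, and the rates `beta` and `mu` are all hypothetical, and real platforms use stochastic dynamics and far richer data.

```python
def metapop_sir_step(pops, travel, beta, mu):
    """One discrete time step of a toy metapopulation SIR model.

    pops   : list of (S, I, R) tuples, one per patch (population counts as floats)
    travel : travel[i][j] is the fraction of patch i's residents moving to patch j
             per step (diagonal entries are ignored)
    beta   : per-step transmission rate inside a patch (assumed well below 1)
    mu     : per-step recovery rate
    """
    # Epidemic layer: local infection and recovery inside every patch.
    after_local = []
    for s, i, r in pops:
        n = s + i + r
        new_inf = beta * s * i / n if n > 0 else 0.0
        new_rec = mu * i
        after_local.append((s - new_inf, i + new_inf - new_rec, r + new_rec))

    # Mobility layer: redistribute each compartment according to the travel matrix.
    n_patches = len(pops)
    result = []
    for j in range(n_patches):
        leaving = sum(travel[j][k] for k in range(n_patches) if k != j)
        compartments = []
        for c in range(3):
            stay = after_local[j][c] * (1.0 - leaving)
            arrive = sum(after_local[k][c] * travel[k][j]
                         for k in range(n_patches) if k != j)
            compartments.append(stay + arrive)
        result.append(tuple(compartments))
    return result
```

Iterating `metapop_sir_step` from a state in which only one patch holds infected individuals shows how the travel matrix alone determines when and where the infection first appears in other patches, which is the kind of question the mobility data layers are meant to answer.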

Empirical/Machine Learning-Based Models

Over the last years, machine learning applied to data extracted from internet-based communication platforms and search engines has been used to obtain early indicators of social trends. Microblogging services and web search platforms have revolutionized the way private and publicly available information diffuses. Such emerging technology appears promising for mining data on the personal behavior of individuals. For example, such services, with the aid of search queries, have been exploited as tools for predicting stock-market movements and movie box-office revenue. Within this context, Ginsberg et al. (2009) exploited search queries on the Google platform for early detection of influenza epidemics in the US. The authors used around 50 million Google web queries related to influenza symptoms between 2003 and 2008. A linear model relating the log-odds of a visit to a physician in a certain region to the log-odds of a related search query submitted from the same region was fitted using publicly available data from the CDC’s US Influenza Sentinel Provider Surveillance Network (http://www.cdc.gov/flu/). This approach has now been realized as a web-based surveillance tool (http://www.google.org/flutrends/). Hulth et al. (2009) processed web queries related to influenza that were submitted to a Swedish website between 2005 and 2007. The authors fitted two models, one relating the volume of web queries to the total number of laboratory-verified influenza cases and one relating it to the number of persons exhibiting influenza-like symptoms treated by physicians in Sweden. The models were used in turn to estimate outbreaks of the disease in time as well as to predict the evolution of influenza. In Chan et al. (2011) a linear model was used to relate Google search queries about dengue in Bolivia, Brazil, India, Indonesia, and Singapore to publicly available dengue case data between 2003 and 2010.
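A linear model on the log-odds scale, of the kind described above for relating search-query fractions to physician-visit fractions, can be sketched with ordinary least squares. The code is a minimal sketch and not the fitting procedure of any cited study: the function names are hypothetical, and the inputs are assumed to be fractions strictly between 0 and 1.

```python
import math


def logit(p):
    """Log-odds of a fraction p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: maps a log-odds value back to a fraction."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def fit_log_odds_model(query_fractions, visit_fractions):
    """Fit logit(visit fraction) = b0 + b1 * logit(query fraction)."""
    xs = [logit(q) for q in query_fractions]
    ys = [logit(v) for v in visit_fractions]
    return fit_line(xs, ys)


def predict_visit_fraction(b0, b1, query_fraction):
    """Estimate the visit fraction implied by a new query fraction."""
    return inv_logit(b0 + b1 * logit(query_fraction))
```

Working on the log-odds scale keeps every prediction inside the interval (0, 1) after the inverse transform, which a linear fit on the raw fractions would not guarantee. In practice the fitted coefficients would be validated on held-out weeks before being used for surveillance.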

Conclusion

In this paper, we presented and discussed key modeling methods used for the surveillance and forecasting of infectious disease outbreaks. Generally speaking, epidemiological models can be categorized into three classes: statistical, mathematical/mechanistic state-space, and machine learning-based models. Public-health organizations throughout the world use such models to evaluate and develop intervention policies against outbreaks of ever-emerging epidemics. Simulation allows for rapid assessment and decision making, providing quantification of and insight into the spatio-temporal dynamics of a spread. An intensive inter- and multi-disciplinary research effort is speeding up developments in the field, integrating advances from epidemiology, molecular biology, computational engineering and science, and applied mathematics, as well as sociology. Nowadays, molecular, sociological, demographic, and epidemiologic data are exploited to develop state-of-the-art, detailed, very large-scale bottom-up agent-based models that aspire to approximate the dynamics of real-world cases. Within this context, along with the available information ranging from the host-pathogen interaction level to the host-host, city, country, and global levels, complex network theory has provided the necessary “glue” for systematically linking epidemiology, demographics, and sociology. On the one hand, to bridge the scales of modeling, one first has to find the appropriate observable variables in terms of which deterministic or stochastic models can be expressed. In this direction, data mining techniques that have flourished over the last few years can be employed to extract such information. On the other hand, due to the complexity of the underlying multiscale interactions, such models are built on incomplete knowledge, which enters, e.g., as inaccuracies in the parameters, the evolution rules, and the contact network.
Thus far, simple brute-force temporal simulations have been used to study the behavior of very large-scale detailed agent-based simulators in the presence of such inaccuracies. For example, some of the rules and model parameters, such as the virus pathogenicity (as this may be expressed in terms of the reproduction number) and different social network topologies, are examined in order to assess how such factors may influence the spread of an outbreak. However, such simple simulations are inefficient for the systematic analysis of the emergent epidemic dynamics in parameter space. New rigorous computational methodologies that can be used to address this issue, such as the equation-free multiscale framework, have the potential to expedite novel computational modeling and analysis, as well as to enhance our understanding and forecasting capability in combating epidemic outbreaks.
