
Data-driven methods for present and future pandemics: Monitoring, modelling and managing.

Teodoro Alamo¹, Daniel G Reina², Pablo Millán Gata³, Victor M Preciado⁴, Giulia Giordano⁵.

Abstract

This survey analyses the role of data-driven methodologies for pandemic modelling and control. We provide a roadmap from the access to epidemiological data sources to the control of epidemic phenomena. We review the available methodologies and discuss the challenges in the development of data-driven strategies to combat the spreading of infectious diseases. Our aim is to bring together several different disciplines required to provide a holistic approach to epidemic analysis, such as data science, epidemiology, and systems-and-control theory. A 3M-analysis is presented, whose three pillars are: Monitoring, Modelling and Managing. The focus is on the potential of data-driven schemes to address three different challenges raised by a pandemic: (i) monitoring the epidemic evolution and assessing the effectiveness of the adopted countermeasures; (ii) modelling and forecasting the spread of the epidemic; (iii) making timely decisions to manage, mitigate and suppress the contagion. For each step of this roadmap, we review consolidated theoretical approaches (including data-driven methodologies that have been shown to be successful in other contexts) and discuss their application to past or present epidemics, such as Covid-19, as well as their potential application to future epidemics.


Keywords: Epidemic control; Epidemiological models; Forecasting; Machine learning; Model predictive control; Optimal control; Pandemic control; Surveillance systems

Year: 2021 PMID: 34220287 PMCID: PMC8238691 DOI: 10.1016/j.arcontrol.2021.05.003

Source DB: PubMed Journal: Annu Rev Control ISSN: 1367-5788 Impact factor: 6.091

Introduction

The 2019 coronavirus pandemic (Covid-19) is one of the most critical public health emergencies in recent human history. While facing this pandemic, governments, public institutions, healthcare professionals, and researchers of different disciplines address the problem of effectively controlling the spread of the virus while minimizing the negative effects on both the economy and society. The challenges raised by this pandemic require a holistic approach. In this document, we analyse the interplay between data science, epidemiology and control theory, which is crucial to understand and manage the spread of diseases both in human and animal populations. In line with current epidemiological needs, this paper aims to review available methodologies, while anticipating the difficulties and challenges encountered in the development of data-driven strategies to combat pandemics. We consider the Covid-19 pandemic as a case study and summarize some lessons learned from this pandemic with the hope of improving our preparedness for handling future outbreaks. In the context of epidemic outbreaks, data-driven tools are fundamental to: (i) monitor the spread of the epidemic and assess the potential impact of adopted countermeasures, not only from a healthcare perspective but also from a socioeconomic one; (ii) model and forecast the epidemic evolution; (iii) manage the epidemic by making timely decisions to mitigate and suppress the contagion. Optimal decision making in the context of a pandemic is a complex process involving a significant amount of uncertainty; at the same time, failing to react in a timely manner and with adequate intensity, even in the presence of overwhelming uncertainties, can lead to severe consequences. This survey provides a holistic roadmap that runs from the retrieval of epidemiological data to the decision-making process aimed at controlling, mitigating and preventing the epidemic spread. 
A 3M-analysis is proposed, covering three main aspects: Monitoring, Modelling and Managing, as shown in Fig. 1. A more detailed document, focused on the Covid-19 pandemic, can be found in the preprint (Alamo, Reina and Millán, 2020). Each step of this roadmap is presented through a review of consolidated theoretical methods and a discussion of their potential to help us understand and control pandemics. When possible, examples of applications of these methodologies on past or current epidemics are provided. Data-driven methodologies that have proven successful in other biological contexts, or have been identified as promising solutions in the Covid-19 pandemic, are highlighted. This survey does not provide an exhaustive enumeration of methodologies, algorithms and applications. Instead, it is conceived to serve as a bridge between those disciplines required to develop a holistic approach to the epidemic, namely: data science, epidemiology, and control theory.

Fig. 1

3M-Approach to data-driven control of an epidemic: Monitoring, Modelling and Managing.

Data are a fundamental pillar to understand, model, forecast, and manage many of the aspects required to provide a comprehensive response against an epidemic, or pandemic, outbreak. There exist many different open data resources and institutions providing relevant information not only in terms of specific epidemiological variables but also of other auxiliary variables that facilitate the assessment of the effectiveness of the implemented interventions (see Alamo, Reina, Mammarella and Abella, 2020 for a review on open data resources and repositories for the Covid-19 case). Since the available epidemiological data suffer from severe limitations, methodologies to detect anomalies in the raw data and generate time-series with enhanced quality (like data reconciliation, data-fusion, data-clustering, signal processing, to name just a few) play a crucial role. Another important aspect of the 3M-approach is the real-time surveillance of the epidemic, which can be implemented by monitoring mobility, using social media to assess the compliance with restrictions and recommendations, pro-active testing, contact-tracing, etc. The design and implementation of surveillance systems capable of detecting secondary epidemic waves early is also very important.

Modelling techniques are also fundamental in the fight against pandemics. Epidemiological models range from coarse compartmental models to complex networked and agent-based models. Fundamental parameters characterizing the dynamics of the virus can be identified using these models. Besides, data-driven parameter estimation provides mechanisms to forecast the epidemic evolution, as well as to anticipate the effectiveness of adopted interventions. However, fitting the models to the available data requires specific techniques because of critical issues like partial observation, non-linearities and non-identifiability. 
Sensitivity analysis, model selection and validation methodologies have to be implemented (Burnham and Anderson, 2010, Martcheva, 2015). Apart from the forecasting possibilities that epidemiological models offer, alternative forecasting techniques from the field of data science can be applied in this context. The choice ranges from simple linear parametric methods to complex deep-learning approaches. The methods can be parametric or non-parametric in nature. Some of these techniques provide probabilistic characterizations of the provided forecasts. Several measures to mitigate the epidemic can be found in the literature, but one needs to be careful about their effectiveness (Xiao & Torok, 2020). Some measures, like an aggressive lockdown of an entire country, have a devastating effect on the economy and they might be adopted at very precise moments, preferably as early as possible and for short time periods. Other measures, like pro-active testing and contact-tracing, can be very effective while having a minor impact on the economy (Ferretti, Wymant, Kendall, Zhao, et al., 2020). In this direction, control theory provides a consolidated framework to formulate and solve many relevant decision-making problems (Nowzari, Preciado, & Pappas, 2016), such as the optimal allocation of resources (e.g. test reagents and vaccines) and the determination of the optimal time to implement certain interventions. The use of optimal control theory and (distributed) model predictive control has great potential in epidemic control. Mathematical tools from the fields of control theory and dynamic systems, such as bifurcation theory and Lyapunov theory, have been extensively used to characterize the different possible qualitative behaviours of epidemics. This survey is organized as follows: Section 2 describes different methodologies to monitor the current state of a pandemic. An overview of different techniques to model an epidemic is provided in Section 3. 
The main forecasting techniques are described in Section 4. The question of how to assess the effectiveness of different non-pharmaceutical measures is analysed in Section 5. The decision-making process and its link with control theory are addressed in Section 6. The paper closes with a section presenting conclusions and lessons learned.

Monitor – Estimation of the state of a pandemic

There is a plethora of indicators that can be monitored in order to contain a pandemic. This includes not only estimations of the current incidence of the disease in the population and the healthcare system, but also the (daily) surveillance of measures that directly or indirectly affect its spread, such as physical distancing and mobility, as well as testing and contact tracing. In order to design an effective response to an epidemic outbreak, it is of utmost importance to build up-to-date estimations of the epidemic state. This estimation process is hindered by the presence of an incubation period of the infectious disease, which introduces a time-delay between the beginning of a new infection and its potential detection. Another challenge in the estimation process is the presence of infectious but asymptomatic cases, which is an important transmission vector in the case of several pathogens, including HIV, Zika virus and SARS-CoV-2 (Ferretti et al., 2020). These (and other) challenges motivate the need for specific surveillance and estimation methodologies capable of using available information in order to design quick and effective control measures (Alamo, Reina, Gata, Preciado, & Giordano, 2021). In this section, we cover the most relevant techniques to monitor the state of the pandemic, focusing on approaches oriented towards (i) real-time monitoring of different aspects of the pandemic (real-time epidemiology); (ii) early detection of infected cases and immune response estimation (pro-active testing); (iii) estimation of the current fraction of infected population, both symptomatic and asymptomatic (state estimation methods); (iv) early detection of new waves (epidemic wave surveillance).

Real-time epidemiology

The use of a large number of real-time data streams to infer the status and dynamics of a population’s health presents enormous opportunities as well as significant scientific and technological challenges (Bettencourt et al., 2007, Drew et al., 2020, Zeng et al., 2010). Real-time epidemic data can vary widely in nature and origin (e.g., mobile phone data, social media data, IoT data and public health systems) (Alamo, Reina, Mammarella et al., 2020, Ting et al., 2020). During the Covid-19 pandemic, mobile phone data, when used properly and carefully, have provided invaluable information for supporting public health actions across early, middle, and late-stage pandemic phases (Oliver, Lepri, Sterly, Lambiotte, Deletaille et al., 2020). Voluntary installation of Covid-19 apps or web-based tools has allowed the active retrieval of data related to exposure and infections. The information stemming from these sources has provided real-time epidemiological data that have then been used to identify hot spots for outbreaks (Drew et al., 2020). Social media have also been relevant to assess the mobility of the population and its awareness with regard to physical distancing, as well as the state of the economy and many other key indicators (Zhou, Su, Pei, Zhang, Du et al., 2020). Our ability to extract information regarding population mobility is essential to predict spatial transmission, identify risk areas, and decide control measures against the disease. Nowadays, the most effective way to access real-time mobility data is through Big Data technologies and Geographic Information Systems (GIS). These systems have played a relevant role when addressing past epidemics like SARS and MERS (Peeri et al., 2020), providing efficient aggregation of multi-source big data, rapid visualization of epidemic information, spatial tracking of confirmed cases, surveillance of regional transmission and spatial segmentation of the epidemic risk (Wang et al., 2020, Zhou, Su et al., 2020).

Proactive testing

Proactive testing is key in the control of infectious diseases, since it allows us to identify and isolate infected individuals. It also provides relevant information to identify risk areas, the fraction of asymptomatic carriers, and the levels of immunization attained in the population (Winter and Hegde, 2020, Yilmaz et al., 2020). There are different methodologies to approach proactive testing:

Risk-based approach: In this approach, one must first test those individuals with the highest probability of being carriers of the disease (i.e. not only those with symptoms, but also those who have been heavily exposed to the disease). For example, healthcare workers are at high risk and can also be relevant transmission vectors. Second, test those individuals who have been exposed to a confirmed case according to contact tracing. Finally, test those individuals who have recently travelled to hot spots (Wang et al., 2020). The determination of hot spots can be done by means of government mobility surveillance or by personal software environments (Drew et al., 2020).

Voucher-based system: In this system, people who test positive are given an anonymous voucher that they can share with a limited number of people whom they think they might have infected. The recipients can use this voucher to book a test and receive their test results without ever revealing their identity. People receiving a positive result are given vouchers to further backtrack the path of infection; see Nanni, Andrienko, Barabási, Boldrini, Bonchi, et al. (2021) and Roomp and Oliver (2020) for the Covid-19 case.

Serology studies: Some tests (such as RT-PCR revealing viral load) are unable to detect past infection. Conversely, serological tests, carried out within the correct time frame after disease onset, can detect both active and past infections, since they detect antibodies produced in response to the disease. Serological analysis can be useful to identify clusters of cases, to retrospectively delineate transmission chains, to ascertain how long transmission has been ongoing, or to estimate the fraction of asymptomatic individuals in the population (Winter & Hegde, 2020).

State-space estimation methods

As we will see in the next section, dynamic state-space epidemiological models are fundamental to characterize how the virus spreads in a specific region and estimate time-varying epidemiological variables that are not directly measurable (Cazelles and Chau, 1997, Scharbarg et al., 2020). Classical state-space estimation methods, like the Kalman filter (Riad, Scoglio, Cohnstaedt, & McVey, 2019), are employed to estimate the fraction of currently infected population. The objective of the Kalman filter is to update our knowledge about the state of the system whenever a new observation is available (Durbin & Koopman, 2012). Different modifications and generalizations of the Kalman filter have been developed and tailored to epidemic models. These methodologies are essential both to the estimation problem and to the inference of the parameters that describe the model (see Abreu and Dutra, 2020, Schön et al., 2011).
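The predict/update cycle of the Kalman filter can be illustrated with a minimal scalar sketch. It is not the formulation of any of the cited works: it assumes, purely for illustration, that the infected fraction follows a random walk and is observed directly with additive noise. The function name, the variances and the sample data are all hypothetical.

```python
def kalman_filter_1d(observations, process_var, obs_var, x0, p0):
    """Scalar Kalman filter for a random-walk state observed with noise.

    Assumed model (illustrative only):
        x_k = x_{k-1} + w_k,  w_k ~ N(0, process_var)
        y_k = x_k + v_k,      v_k ~ N(0, obs_var)
    Returns a list of (estimate, variance) pairs, one per observation.
    """
    x, p = x0, p0
    estimates = []
    for y in observations:
        # Predict: the random walk leaves the mean unchanged, uncertainty grows.
        p = p + process_var
        # Update: blend the prediction with the new observation.
        gain = p / (p + obs_var)
        x = x + gain * (y - x)
        p = (1.0 - gain) * p
        estimates.append((x, p))
    return estimates


if __name__ == "__main__":
    # Hypothetical noisy daily estimates of the infected fraction.
    reported = [0.010, 0.013, 0.011, 0.016, 0.018, 0.017, 0.022]
    filtered = kalman_filter_1d(reported, 1e-6, 4e-6, reported[0], 1e-4)
    for day, (x, p) in enumerate(filtered):
        print(f"day {day}: estimate={x:.4f}, variance={p:.2e}")
```

In practice the state would be the vector of compartments of an epidemiological model, and the nonlinear dynamics would call for one of the generalizations mentioned above rather than this linear scalar form.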

Epidemic wave surveillance

Infectious diseases often lead to recurring epidemic waves interspersed with periods of low-level transmission, as observed, for example, in the “Spanish” flu (Reid, Taubenberger, & Fanning, 2001), influenza (Vega, Lozano, Meerhoff, Snacken, Beauté, et al., 2015) and Covid-19 (Glass, 2020). In this context, it is crucial to implement a surveillance system able to detect, or even predict, recurring epidemic waves, so as to enable an immediate response aiming to reduce the potential burden of the outbreak. Detecting outbreaks requires methodologies able to process huge amounts of data stemming from various surveillance systems (Althouse et al., 2015, Dubrawski, 2011, Elliot et al., 2020) and determine whether the spread of the virus has surpassed a threshold requiring mitigation measures; see, e.g. Lazarus (2010). A large body of literature focuses on epidemiological detection problems, since many infectious diseases undergo considerable seasonal fluctuations with peaks seriously impacting the healthcare systems (Sparks, 2013, Unkel et al., 2012). National surveillance systems are implemented world-wide to rapidly detect outbreaks of influenza-like illnesses, and assess the effectiveness of influenza vaccines (Thompson et al., 2006, Vega et al., 2015). Specific methodologies to determine the baseline influenza activity and epidemic thresholds have been proposed and implemented (Vega, Lozano, Meerhoff, Snacken, Mott, et al., 2013). These methods aim at reducing false alarms and detection lags. Outbreak detection can be implemented in different ways that range from simple predictors based on moving average filters (Farrington, Andrews, Beale, & Catchpole, 1996) and fusion methods (Dubrawski, 2011) to complex spatial and temporal analyses (Batlle et al., 2020, Chan and King, 2011). 
In the early phases of a new pandemic, such as the recent Covid-19, the detection of recurring epidemic waves is particularly challenging because: (i) historical seasonal data are lacking, (ii) determining the current fraction of infected population can be difficult when many asymptomatic infected are present, and (iii) determining baselines and thresholds requires a precise characterization of the regional (time-varying) reproduction number.
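The simplest end of this spectrum, a moving-average predictor with an exceedance threshold, can be sketched as follows. This is a generic illustration and not the algorithm of any cited reference: the window length, the multiplier `k` and the sample counts are assumptions chosen for the example.

```python
from statistics import mean, stdev


def detect_exceedances(counts, window=7, k=3.0):
    """Flag time indices whose count exceeds a trailing baseline.

    The baseline is the mean of the previous `window` counts, and the
    threshold is that mean plus `k` sample standard deviations.
    """
    alarms = []
    for t in range(window, len(counts)):
        baseline = counts[t - window:t]
        threshold = mean(baseline) + k * stdev(baseline)
        if counts[t] > threshold:
            alarms.append(t)
    return alarms


if __name__ == "__main__":
    # Hypothetical daily case counts with one abrupt jump on day 8.
    daily_cases = [10, 12, 11, 9, 10, 11, 10, 11, 30, 12]
    print(detect_exceedances(daily_cases))
```

A trailing baseline of this kind needs no historical seasonal data, but it is also contaminated by the outbreak it has just detected, which is one reason why more elaborate baseline and threshold methods are used in operational systems.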

Model – Epidemiological models

Mathematical epidemiology is a well-established field aiming to model the spread of diseases both in human and animal populations (Martcheva, 2015, Rothman et al., 2008, Thrusfield, 2018). Given the high complexity of these phenomena, models are key to understand epidemiological patterns and support decision making processes (Heesterbeek, Anderson, Andreasen, Bansal, De Angelis, et al., 2015). There are in-host models that take into account the complexity of virus–host dynamics at the microscopic scale, describing how the pathogen interacts with cellular biomolecular processes and with the immune system, and between-host models that describe how the epidemic spreads within a population at the macroscopic scale, by considering the contagion either at an aggregate level (compartmental models) or through agent-based networked models of the population. Approaches for epidemic multi-scale modelling, which include the interplay between immunological and epidemiological phenomena, are very recent and mostly rely on partial differential equations, sometimes reduced to small-size ordinary differential-equation systems, see e.g. Almocera and Hernandez-Vargas, 2019, Almocera et al., 2018, Barbarossa and Röst, 2015, Cai et al., 2017, Feng et al., 2015, Gandolfi et al., 2015, Gulbudak and Browne, 2020 and Hart, Maini, Yates, and Thompson (2020). Multi-scale epidemic modelling with an interdisciplinary approach integrating epidemiology, immunology, economy and mathematics is advocated in Bellomo, Bingham, Chaplain, Dosi, Forni, et al. (2020).

Time-response and viral shedding

In-host infection dynamics capture the interplay between virus and host. Models describing the dynamics of the immune response (Castiglione & Celada, 2015) in the presence of an infectious disease have been proposed for influenza (Handel et al., 2010, Li et al., 2021, Yan et al., 2017, Zarnitsyna et al., 2016) and generic viral infections (Moore et al., 2020). Very recently an immunological description for Covid-19 has been provided (Matricardi, Dal Negro, & Nisini, 2020) and has enabled the characterization of virus–host dynamics for SARS-CoV-2 (Abuin et al., 2020, Hernandez-Vargas and Velasco-Hernandez, 2020). The evolution of a disease and its infectiousness over time can be characterized through some key epidemiological parameters (see e.g. Heffernan et al., 2005, Hellewell et al., 2020, Vink et al., 2014, Wallinga and Lipsitch, 2007):

Latency time: Time during which an individual is infected but not yet infectious. For Covid-19, initial estimates are of 3–4 days (Li, Pei, Chen, Song, Zhang et al., 2020).

Incubation time: Time between infection and onset of symptoms. For Covid-19, the median incubation period is estimated to be 5.1 days, and 97.5% of those who develop symptoms will do so within 11.5 days of infection (Lauer, Grantz, Bi, Jones, Zheng, et al., 2020); the median time from the onset of symptoms to death is close to 3 weeks (Zhou et al., 2020).

Serial interval: Time between symptom onsets of successive cases in a transmission chain (Vink et al., 2014). For Covid-19, initial estimates of the median serial interval yield a value of around 4 days, which is shorter than its median incubation period (Nishiura, Linton, & Akhmetzhanov, 2020); this implies that a substantial proportion of secondary transmission may occur prior to illness onset (He, Lau, Wu, Deng, Wang, et al., 2020).

Infectiousness profile: It characterizes the infectiousness of an infected individual over time. For Covid-19, the estimated median duration of viral shedding was 20 days in survivors, while the most prolonged observed duration of viral shedding in survivors was 37 days (Zhou, Yu et al., 2020).

Basic reproduction number: It represents the average number of new infections generated by an infectious person at the early stages of the outbreak, when everyone is susceptible, and no countermeasures have been taken (Heffernan et al., 2005, Liu et al., 2020, Wallinga and Lipsitch, 2007). For the original strain of SARS-CoV-2, first estimations range from 2.24 to 3.58 (Zhao, Lin, Ran, Musa, Yang, et al., 2020); the effect of temperature and humidity on this parameter is addressed in different studies, see e.g. Mecenas, Travassos da Rosa Moreira Bastos, Rosário Vallinoto, and Normando (0000).

The basic reproduction number, along with the serial interval, can be used to estimate the number of infections that are caused by a single case in a given time period. Without any control measure, at the early stages of the outbreak, more than 400 people can be infected by a single Covid-19 case in one month (Nicola, O’Neill, Sohrabi, Khan, Agha, et al., 2020). Estimates of the basic reproduction number are of interest during an outbreak because they provide information about the level of intervention required to interrupt transmission and about the potential final size of the outbreak (Heffernan et al., 2005). The aforementioned parameters are often inferred from epidemiological models, once they have been fitted to available data on the number of confirmed infection cases and deaths (Rothman et al., 2008, Wallinga and Lipsitch, 2007).
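The order of magnitude of the "more than 400 people in one month" figure can be reproduced with a back-of-the-envelope generation count. The sketch below is one possible reading, not necessarily the calculation of the cited source: it assumes a basic reproduction number of 2.5 (inside the 2.24 to 3.58 range quoted above), a fixed generation time of 5 days, unlimited susceptibles, and it counts the index case itself. All three numerical inputs are illustrative assumptions.

```python
def cumulative_infections(r0, generation_days, horizon_days):
    """Total cases in a transmission tree grown for `horizon_days`.

    Assumes every case infects exactly `r0` others after one generation
    time and that the susceptible pool is never depleted. Generation 0 is
    the index case, so the result includes it.
    """
    generations = int(horizon_days // generation_days)
    return sum(r0 ** k for k in range(generations + 1))


if __name__ == "__main__":
    total = cumulative_infections(r0=2.5, generation_days=5, horizon_days=30)
    print(f"{total:.0f} cumulative cases after 30 days")  # about 406
```

With these assumed values, 30 days hold six generations and the geometric sum 1 + 2.5 + ... + 2.5^6 equals roughly 406; a shorter generation time or a larger reproduction number would raise the total sharply.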

Simple compartmental models

Compartmental models partition a population into different groups, called compartments, associated with mutually exclusive stages of the disease. Each compartment is associated with a variable that counts the individuals who are in that stage of the infection (Brauer, 2008). The simplest compartmental models are the SI, SIS, and SIR models, introduced by Kermack and McKendrick at the beginning of the 20th century (Kermack & McKendrick, 1927). The SIR model includes three compartments: Susceptible ($S$), representing healthy individuals susceptible to infection, Infected ($I$), and Recovered/Removed ($R$). For possibly fatal diseases, this last compartment can take into account both recovered (with permanent immunity) and deceased individuals; however, for low mortality rate diseases, including only recovered individuals can be a good approximation. The SIR model describes the dynamics of an epidemic according to the following set of nonlinear differential equations:

$$\dot{S}(t) = -\beta S(t) I(t), \tag{1}$$
$$\dot{I}(t) = \beta S(t) I(t) - \gamma I(t), \tag{2}$$
$$\dot{R}(t) = \gamma I(t), \tag{3}$$

where $\beta$ is the infection rate, while $\gamma$ is the recovery rate; the variables $S(t)$, $I(t)$ and $R(t)$ represent the fraction of susceptible, infected and recovered (or removed) individuals within the population, and $S(t) + I(t) + R(t) = 1$ at all times $t$. At the onset of a new epidemic, $S$ equals approximately the entire population, and thus from (2) it holds that $I(t) \approx I_0 e^{(\beta - \gamma) t} = I_0 e^{(R_0 - 1)\gamma t}$, where $I_0$ represents the initial number of infected and $R_0 = \beta/\gamma$ is the basic reproduction number, i.e. the average number of secondary cases produced by an infectious individual when $S(t) \approx 1$. Clearly, when $R_0$ is greater than 1, there is an exponential increase in the number of infected individuals during the early days of the epidemic. The same equation can also be used to estimate the point at which the number of newly infected individuals begins to decrease, namely when $S(t)$ falls below $\gamma/\beta = 1/R_0$. At this point, the given population has reached what is known as herd immunity (Fine, Eames, & Heymann, 2011). 
To account for the latency time, an extended version of the SIR model, called the SEIR model, includes an extra compartment for Exposed (E) individuals, who have been infected but are not yet infectious, and are transitioning into the Infectious compartment at a fixed rate.
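The behaviour of the SIR model described above can be explored numerically. The following is a minimal sketch using a forward Euler integration with population fractions; the parameter names `beta` (infection rate) and `gamma` (recovery rate), their values, the step size and the initial condition are illustrative assumptions, and a production study would normally use an adaptive ODE solver.

```python
def simulate_sir(beta, gamma, i0, days, dt=0.1):
    """Integrate the SIR model with forward Euler steps.

    State variables are population fractions, so s + i + r stays at 1.
    Returns a list of (time, s, i, r) tuples.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    trajectory = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for n in range(1, steps + 1):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((n * dt, s, i, r))
    return trajectory


if __name__ == "__main__":
    # Assumed rates giving a basic reproduction number beta/gamma = 2.5.
    traj = simulate_sir(beta=0.5, gamma=0.2, i0=1e-3, days=160)
    t_peak, s_peak, i_peak, _ = max(traj, key=lambda row: row[2])
    print(f"infections peak near day {t_peak:.1f} with i = {i_peak:.3f}")
    print(f"susceptible fraction at the peak: {s_peak:.3f}")
```

In this sketch the infected fraction stops growing once the susceptible fraction drops to roughly `gamma / beta`, which is the herd immunity condition discussed above. An SEIR variant would add an exposed state between `s` and `i` with its own transition rate.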

Extended compartmental models

To model the specific dynamics of a given infectious disease, extended compartmental models including additional compartments and transitions are often proposed. In particular, it is possible to consider symptomatic and asymptomatic compartments, vaccinated and unvaccinated, the possibility of reinfection after recovery, quarantined individuals, hospitalized, etc. Comprehensive books, surveys and works on compartmental models and their extensions include (Anderson and May, 1991, Brauer and Castillo-Chavez, 2012, Breda et al., 2012, Capasso and Serio, 1978, Diekmann and Heesterbeek, 2000, Gumel et al., 2004, Hethcote, 2000). The number of compartments required to model a disease depends on a variety of factors. For example, when modelling the dynamics of a new disease, for which no vaccine is available, it makes no sense to consider the vaccinated group. However, in other cases, as when modelling seasonal influenza, it is relevant to distinguish between vaccinated and unvaccinated populations (Brauer, 2008). Many diseases, like malaria, West Nile virus, etc., are transmitted not directly from human to human but by infected animals (usually insects) (Taylor, Latham, & Woolhouse, 2001). For these cases, the corresponding animal compartments are included in the model. Another relevant factor influencing which compartments to include in a model is the quantity and quality of available data. Complex models require more data to fit the parameters, so in the early stages of a new disease outbreak simple compartmental models are often employed.

Many applications of extended compartmental models can be found in the literature. For example, in Riley et al. (2003), the authors use a dynamical compartmental model to analyse the effective transmission rate of the SARS epidemic in Hong Kong. The model consists of 7 compartments: Susceptible individuals (S) become infected and enter a latent state (L). They then progress to a short asymptomatic and potentially infectious state (I) followed by a symptomatic state that leads to diagnosis (Y) and hospitalization. It is assumed that every symptomatic case is eventually hospitalized and either dies ($H_D$) or, after treatment in the hospital ($H_R$), recovers (R) (see Fig. 2). In Chowell, Blumberg, Simonsen, Miller, and Viboud (2014), a stochastic SEIR model is used to estimate the basic reproduction number of MERS-CoV in the Arabian Peninsula, distinguishing between cases transmitted by animals and secondary cases.

Fig. 2

Illustration of an extended compartmental epidemic model with seven compartments used in Riley, Fraser, Donnelly, Ghani, Abu-Raddad, et al. (2003) to model SARS: Susceptible (S), Latent (L), Asymptomatic and potentially infectious (I), Symptomatic Diagnosed (Y), Hospitalized that die ($H_D$), Hospitalized that recover ($H_R$) and Recovered (R).

In the spread of the Covid-19 pandemic, asymptomatic infected individuals play a crucial role (see Ferretti et al., 2020, Giordano et al., 2020); the large prevalence of asymptomatic infections makes it harder to detect all cases and, thus, to break the contagion chain in a timely manner. In Giordano et al. (2020), a SIDARTHE model with eight compartments is proposed. This model distinguishes between asymptomatic and symptomatic infected, as well as detected and undetected infection cases, and partitions the population into Susceptible, Infected (asymptomatic infected, undetected), Diagnosed (asymptomatic infected, detected), Ailing (symptomatic infected, undetected), Recognized (symptomatic infected, detected), Threatened (infected with life-threatening symptoms, detected), Healed (recovered) and Extinct (dead) individuals. In Ferretti et al. (2020), the epidemic model includes a transmission rate that takes into account the contributions of asymptomatic, presymptomatic and symptomatic transmissions, as well as environmental transmission. In both works, the results indicate that the contribution of asymptomatic infected to $R_0$ is higher than that of symptomatic infected and other transmission modalities. In fact, symptomatic infected are often rapidly detected and isolated.

Age-structured models

Age-structured epidemic models incorporate heterogeneous, age-dependent contact rates between individuals (Del Valle, Hyman, & Chitnis, 2013). In Safi, Gumel, and Elbasha (2013) and Xue-Zhi, Gupur, and Guang-Tian (2001), stability results for different age-structured SEIR models are given. For Covid-19, an age-structured model, aiming at estimating the effect of physical distancing measures in Wuhan, is presented in Prem, Liu, Russell, Kucharski, Eggo, et al. (2020). In Salje, Kiem, Lefrancq, Courtejoie, et al. (2020), a stratified approach is used to model the epidemic in France.

Seasonal behaviour

Some works have studied the influence of temperature and humidity on the spread of viruses (Grassly and Fraser, 2006, Waikhom et al., 2016). In the case of Covid-19, it has been reported that both variables have an effect on the basic reproduction number (Iqbal et al., 2020, Mecenas et al., 0000). This influence might be included in the epidemic models to capture the seasonal behaviour of Covid-19; for instance, by considering the parameters $\beta$ and $\gamma$ as functions of both temperature and relative humidity. Yet, it remains unclear under which circumstances seasonal and geographic variations in climate can substantially alter the dynamics of a given pandemic, especially in the case of high susceptibility (Baker, Yang, Vecchi, Metcalf, & Grenfell, 2020).

Spatial epidemiology

Compartmental models are well-suited to describe the evolution of epidemics in a single, well-mixed population where each individual is assumed to interact with every other at a common rate (homogeneous contacts). While this can be a reasonable approximation in some contexts, it is not appropriate to study the global spread of a pandemic over a large, geographically dispersed population. In the last decades, compartmental models have been successfully extended to spatial epidemiological models in order to analyse spreading phenomena where spatial patterns need to be more accurately described. Graphs and networks have often been used to achieve this, see for instance (House, 2012, Keeling and Eames, 2005, Kiss et al., 2017, Lewien and Chapman, 2019, Mei et al., 2017, Nowzari et al., 2016, Ogura and Preciado, 2016a, Ogura and Preciado, 2016b, Paré et al., 2020, Paré et al., 2018, Pastor-Satorras et al., 2015, Zino et al., 2020). Three widely used classes of models are described in the following sub-subsections.

Meta-population models

Meta-population models integrate two types of dynamics: one related to the disease, typically driven by a compartmental model, and the other to the mobility of individuals (agent-based model) across the sub-populations that build the meta-population under analysis (Ball et al., 2015, Grenfell and Harwood, 1997). As a representative example, in Brockmann and Helbing (2013) the authors introduce the notion of effective distance to capture the spatio-temporal dynamics of epidemics, combining the SIR model of populations with mobility among them. The resulting model for each population $n$ is

$$\dot{S}_n = -\beta \frac{S_n I_n}{N_n} + \sum_{m \neq n} \left( w_{nm} S_m - w_{mn} S_n \right),$$
$$\dot{I}_n = \beta \frac{S_n I_n}{N_n} - \gamma I_n + \sum_{m \neq n} \left( w_{nm} I_m - w_{mn} I_n \right),$$
$$\dot{R}_n = \gamma I_n + \sum_{m \neq n} \left( w_{nm} R_m - w_{mn} R_n \right),$$

where $S_n$, $I_n$ and $R_n$ denote the susceptible, infected and removed individuals of population $n$, $N_n = S_n + I_n + R_n$, $\beta$ and $\gamma$ are the infection and recovery rates, and $w_{nm}$ is the per capita traffic flux from population $m$ to population $n$. In Aleta and Moreno (2020), the authors use a SEIR compartmental model together with stochastic data-driven simulations to capture the mobility in all Spanish provinces. The work focuses on evaluating the effectiveness of containment measures in Spain on February 28th, when a few dozen cases of Covid-19 had been detected. Meta-population models to capture the spatio-temporal dynamics of the Covid-19 epidemic in Italy have been proposed in Bertuzzo et al., 2020, Della Rossa et al., 2020 and Gatto et al. (2020). By capturing both the temporal and the spatial evolution of epidemics, meta-population models are also capable of forecasting the effectiveness of mobility restrictions.
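The coupling between local SIR dynamics and mobility can be illustrated with a short simulation. The following is a minimal sketch, not the implementation of any of the cited works: it assumes a forward-Euler discretization, two sub-populations, and hypothetical values for the infection rate `beta`, the recovery rate `gamma` and the per capita traffic fluxes `w[n][m]` (from population m to population n).

```python
def metapop_sir_step(S, I, R, w, beta, gamma, dt):
    """Advance an SIR meta-population model by one forward-Euler step.

    S, I, R : lists holding the compartment sizes of each sub-population.
    w[n][m] : per capita traffic flux from population m to population n
              (illustrative values; the diagonal is ignored).
    """
    n_pop = len(S)
    new_S, new_I, new_R = [], [], []
    for n in range(n_pop):
        N_n = S[n] + I[n] + R[n]
        infection = beta * S[n] * I[n] / N_n if N_n > 0 else 0.0
        recovery = gamma * I[n]
        # Net mobility flow into population n, one value per compartment.
        flow = [
            sum(w[n][m] * X[m] - w[m][n] * X[n] for m in range(n_pop) if m != n)
            for X in (S, I, R)
        ]
        new_S.append(S[n] + dt * (-infection + flow[0]))
        new_I.append(I[n] + dt * (infection - recovery + flow[1]))
        new_R.append(R[n] + dt * (recovery + flow[2]))
    return new_S, new_I, new_R


def simulate(S, I, R, w, beta, gamma, dt, steps):
    """Iterate the Euler step a fixed number of times."""
    for _ in range(steps):
        S, I, R = metapop_sir_step(S, I, R, w, beta, gamma, dt)
    return S, I, R


# Two sub-populations; the outbreak is seeded only in the first one.
S_init, I_init, R_init = [990.0, 1000.0], [10.0, 0.0], [0.0, 0.0]
w = [[0.0, 0.01], [0.01, 0.0]]  # assumed symmetric per capita travel rates
S, I, R = simulate(S_init, I_init, R_init, w,
                   beta=0.3, gamma=0.1, dt=0.1, steps=500)
```

Because the mobility terms only move individuals between sub-populations, the total population is conserved, while the infection reaches the second sub-population through travel alone.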

Social network models

Social network models consider that transmission can only occur between linked or connected individuals (El-Sayed, Scarborough, Seemann, & Galea, 2012), which makes it possible to model heterogeneity in contact patterns explicitly. Small-world networks have been used in combination with compartmental models to model disease transmission of SARS (Small & Tse, 2005) and Covid-19 (Thurner, Klimek, & Hanel, 2020), and also to assess the efficacy of contact tracing (Kiss, Green, & Kao, 2006). In general, network models produce a more accurate prediction of the disease spread (Paré et al., 2020). In particular, the use of homogeneous compartmental models in populations with heterogeneous contacts tends to underestimate disease burden early in the outbreak and overestimate it towards the end, although for certain kinds of networks compartmental models can be modified to prevent this problem (Bansal, Grenfell, & Meyers, 2007). Another interesting aspect of studying epidemic spread with network models is the observation of a percolation phase transition (House, 2012, Pastor-Satorras et al., 2015), i.e., an abrupt change in the global dynamics of the epidemic. Percolation theory has been widely studied in random networks (Albert & Barabási, 2002). In the context of epidemic modelling, the phase transition occurs when isolated clusters of infected people join to form a giant component that is able to infect many people (Harding, Spinney, & Prokopenko, 2020).
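The emergence of a giant component can be reproduced numerically by treating each contact as an edge that transmits the infection independently with a fixed probability (bond percolation). The sketch below is only illustrative: it assumes an Erdős–Rényi random graph rather than the small-world topologies used in the cited studies, and the function name, network size, mean degree and transmissibility values are all hypothetical.

```python
import random


def largest_outbreak_fraction(n, mean_degree, transmissibility, seed=0):
    """Fraction of nodes in the largest cluster of transmitting contacts."""
    rng = random.Random(seed)
    parent = list(range(n))  # union-find forest over the nodes

    def find(a):
        # Find the cluster representative, compressing the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    p_edge = mean_degree / (n - 1)
    for i in range(n):
        for j in range(i + 1, n):
            # A contact exists with probability p_edge and transmits the
            # infection with probability `transmissibility`.
            if rng.random() < p_edge * transmissibility:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(n):
        root = find(i)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / n


# Below and above the percolation threshold (1 / mean_degree in this model).
low = largest_outbreak_fraction(1000, mean_degree=4.0, transmissibility=0.1)
high = largest_outbreak_fraction(1000, mean_degree=4.0, transmissibility=0.8)
```

Under these assumptions, the largest cluster stays a negligible fraction of the population below the threshold and spans most of the network above it.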

Self-exciting spatio-temporal point processes

In epidemiology, it is natural to register each new infection event with a pair $(t, s)$, in which $t$ refers to time and $s$ to location. The underlying stochastic model for this kind of data is called a spatio-temporal point process (Diggle, 2006). Since each infection event potentially causes new ones, an epidemic can be modelled as a self-exciting spatio-temporal point process in which the rate of infections depends on the past history of the process (Reinhart et al., 2018, Zino et al., 2020). In this setting, the objective is to estimate an intensity function which predicts the rate of infections at any spatial location and time (Diggle, 2006, Waller, 2010). This modelling framework, which constitutes a generalization of Hawkes processes (Hawkes, 1971), permits the incorporation of the distributions of the duration of incubation, pre-symptomatic and asymptomatic phases, along with the modulating effect of time-varying counter-measures and detection efforts (Garetto, Leonardi, & Torrisi, 2021).
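To make the notion of a self-exciting intensity concrete, the following sketch evaluates the conditional intensity of a simple spatio-temporal Hawkes-type process. It is an illustration under stated assumptions and not the model of any cited work: the triggering kernel is taken to be separable, with an exponential decay in time and an isotropic Gaussian in two-dimensional space, and the background rate `mu`, branching ratio `alpha`, decay rate `omega` and spatial scale `sigma` are hypothetical.

```python
import math


def intensity(t, x, y, events, mu=0.1, alpha=0.8, omega=0.5, sigma=1.0):
    """Conditional intensity at time t and location (x, y).

    events : list of past infection events (t_i, x_i, y_i).
    The rate is a constant background term plus one triggering
    contribution per event that occurred strictly before t.
    """
    rate = mu
    for t_i, x_i, y_i in events:
        if t_i >= t:
            continue  # only the past history excites the process
        temporal = omega * math.exp(-omega * (t - t_i))
        dist2 = (x - x_i) ** 2 + (y - y_i) ** 2
        spatial = math.exp(-dist2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
        rate += alpha * temporal * spatial
    return rate
```

With a single past event at the origin, the intensity is highest close to that event in both space and time and relaxes towards the background rate `mu` elsewhere.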

Computer-based models

Computer-based simulation methods to predict the spread of epidemics can take into account numerous factors, such as heterogeneous behavioural patterns, mobility patterns, both at long and short scales, demographics, epidemiological data, or disease-specific mechanisms (Helbing et al., 2015, Marathe and Vullikanti, 2013). The real-world accuracy of mathematical and computational models used in epidemiology has been considerably improved by the integration of large-scale data sets and explicit simulations of entire populations down to the scale of single individuals. These computational tools have recently gained importance in the field of infectious disease epidemiology, by providing rationales and quantitative analysis to support decision-making and policy-making processes (Tizzoni, Bajardi, Poletto, Ramasco, Balcan, et al., 2012). As a representative example, the Global Epidemic and Mobility simulation framework (GLEAM) allows performing stochastic simulations of a global epidemic with different global–local mobility patterns, as well as data regarding demographics or hospitalization (Van den Broeck, Gioannini, Gonçalves, Quaggiotto, Colizza, et al., 2011). However, detailed simulation-based methods depend on a significant number of parameters, which need to be chosen and fixed for a specific simulation. This is especially difficult in the early days of an epidemic outbreak. Furthermore, these approaches might not reveal which factors are actually relevant in the spread of epidemics. Simpler data-driven tools have also been developed to overcome these difficulties (Marathe & Vullikanti, 2013).

Modelling the effect of containment measures

Controlling an emerging infectious disease requires both the prompt implementation of countermeasures and the rapid assessment of their efficacy (Brauner et al., 2020, Cauchemez et al., 2006, Chowell et al., 2003, Guan et al., 2020, Gumel et al., 2004, Haug et al., 2020). In what follows, we enumerate the most relevant non-pharmaceutical interventions, focusing on different research works that assess their efficacy.

Quarantine: Quarantine of diagnosed or probably infected cases is crucial in every epidemic outbreak. In order to model the effect of quarantine, specific compartments are included in the epidemic models for SARS (Chowell et al., 2003, Gumel et al., 2004). If a significant fraction of the infected population is not diagnosed (or diagnosed with a significant delay), then the modelling is harder and non-diagnosed groups are included in the models (Ansumali et al., 2020, Giordano et al., 2020, May and Anderson, 1987). Quarantine of a whole population (i.e., lockdown) is the most extreme measure in the scope of physical distancing/mobility restrictions. The extreme impact of Covid-19 led to the quarantine of the epicentre of the pandemic (Wuhan) on January 24th, 2020, and the same measures were subsequently adopted in different countries of Europe and America (Gatto et al., 2020). In this case, the effect of a lockdown can be modelled by means of time-varying epidemic models, see e.g. Calafiore, Novara, and Possieri (2020).

Physical distancing: Physical (or social) distancing is another measure promoted by governments, public and private institutions in an attempt to reduce disease transmission (Morato et al., 2020, Prather et al., 2020, Prem et al., 2020). Examples include population-wide wearing of masks, capacity reduction on public transport, and reducing or stopping the activity of educational institutions or factories. In Maharaj and Kleczkowski (2012), the authors conduct a simulation-based analysis to determine the effects of physical distancing both on public health and on the economy. Two social network models (regular and small-world networks) are combined with a compartmental SIR model, and the economic impact takes into account the costs of individuals falling ill and the cost of a reduction in social contacts.

Mobility restrictions: Governments often introduce long-range or local mobility restrictions aimed at reducing disease transmission. Spatial epidemiology is particularly useful to model the effects of such measures. For instance, in Thurner et al. (2020), the authors show, by means of a small-world network model, that the onset of mobility restrictions influences the final size of the outbreak, which is well below the levels of herd immunity.

Proactive testing: Proactive testing of asymptomatic individuals is very relevant for the monitoring and control of the Covid-19 pandemic (World Health Organization (WHO), 2020), since it makes it possible to isolate infectious individuals and implement contact tracing strategies, which have been shown to be crucial for an effective control of the pandemic (Giordano et al., 2020).

Contact tracing: Contact tracing is a widely used epidemic control measure that aims to identify and isolate infected individuals by following the social contacts of individuals that are known to be infectious. A review of contact-tracing based epidemic models for SARS and MERS can be found in Kwok et al. (2019). In Kiss et al. (2006), a small-world, scale-free network model is combined with a compartmental model to assess the efficacy of contact tracing.
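As an illustration of how a lockdown can enter a time-varying epidemic model, the sketch below simulates an SIR model whose transmission rate drops at a given day. It is a minimal example under stated assumptions, not the model of Calafiore et al. (2020): the population size, the rates, the lockdown day and the post-lockdown transmission rate are hypothetical, and a forward-Euler scheme is used for simplicity.

```python
def simulate_sir(beta_of_t, gamma=0.1, N=1_000_000, I0=10, days=300, dt=0.1):
    """Simulate an SIR model with a time-varying transmission rate.

    beta_of_t : function returning the transmission rate at time t (days).
    Returns the peak number of infectious individuals and the final
    number of removed individuals.
    """
    S, I, R = float(N - I0), float(I0), 0.0
    peak = I
    for k in range(round(days / dt)):
        t = k * dt
        new_infections = beta_of_t(t) * S * I / N
        recoveries = gamma * I
        S += dt * (-new_infections)
        I += dt * (new_infections - recoveries)
        R += dt * recoveries
        peak = max(peak, I)
    return peak, R


def no_intervention(t):
    return 0.3  # assumed constant transmission rate


def lockdown(t):
    # Assumed: contacts are sharply reduced from day 40 onwards.
    return 0.3 if t < 40 else 0.08


peak_free, final_free = simulate_sir(no_intervention)
peak_lock, final_lock = simulate_sir(lockdown)
```

Comparing the two runs gives a rough sense of how the timing and strength of the restriction change the epidemic peak and the final outbreak size in this idealized setting.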

Fitting epidemic models to data

Dynamic epidemiological models rely on a set of parameters that have to be tuned in order to provide realistic predictions and/or infer essential features, such as the (time-varying) effective reproduction number (Cori, Ferguson, Fraser, & Cauchemez, 2013), or the latent period. Fitting epidemic models to data is a fundamental problem in epidemiology that can be approached in different ways. We can distinguish between classical methods, in which the parameters of the model are unknown but fixed, and Bayesian methods, in which they are assumed to be random variables (Kypraios, Neal, & Prangle, 2017). Another classification follows from the accessibility of the populations considered in the compartments of the model:

Full access to the evolution of the number of cases in each compartment: In most models, the parameters that determine the dynamics multiply linear or bi-linear terms, depending on the current number of cases in each compartment. This means that a (vector) equality constraint, which depends (bi-)linearly on the parameters to fit, can be obtained at each sample time. In the case of linear constraints, standard linear identification techniques, such as least-squares methods, can be applied to estimate the parameters that best fit the model to the data. See, for example, Allman and Rhodes (2004) and Martcheva (2015, Chapter 6).

Partial access to the number of cases in each compartment: In many situations, there are no available time series for one or more of the groups considered in the model. This complicates the data-fitting process considerably because it is no longer possible to obtain, in a simple way, the equality constraints described in the full access case. The standard approach in this case is to resort to non-linear identification techniques (see Abreu and Dutra, 2020, Schön et al., 2011). In this context, Monte Carlo based methods (e.g. Markov Chain Monte Carlo and Sequential Monte Carlo algorithms) play a crucial role in addressing the challenges that lie in reconciling predictions and observations (McKinley, Cook, & Deardon, 2009).
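The full-access case can be illustrated with a discrete-time SIR model, in which each sample yields one constraint that is linear in the infection rate and one that is linear in the recovery rate. The sketch below is a simplified example under stated assumptions (noise-free synthetic data, a daily sampling time, hypothetical parameter values) and solves the two scalar least-squares problems in closed form.

```python
def simulate_discrete_sir(beta, gamma, N, I0, steps):
    """Generate noise-free S, I, R time series from a discrete-time SIR model."""
    S, I, R = [float(N - I0)], [float(I0)], [0.0]
    for _ in range(steps):
        new_infections = beta * S[-1] * I[-1] / N
        recoveries = gamma * I[-1]
        S.append(S[-1] - new_infections)
        I.append(I[-1] + new_infections - recoveries)
        R.append(R[-1] + recoveries)
    return S, I, R


def fit_sir_least_squares(S, I, R, N):
    """Estimate beta and gamma by least squares from full compartment data.

    Constraints used at each sample k:
        S[k] - S[k+1] = beta  * S[k] * I[k] / N
        R[k+1] - R[k] = gamma * I[k]
    """
    num_beta = den_beta = num_gamma = den_gamma = 0.0
    for k in range(len(S) - 1):
        regressor = S[k] * I[k] / N
        num_beta += regressor * (S[k] - S[k + 1])
        den_beta += regressor ** 2
        num_gamma += I[k] * (R[k + 1] - R[k])
        den_gamma += I[k] ** 2
    return num_beta / den_beta, num_gamma / den_gamma


S, I, R = simulate_discrete_sir(beta=0.3, gamma=0.1, N=10_000, I0=10, steps=60)
beta_hat, gamma_hat = fit_sir_least_squares(S, I, R, N=10_000)
```

With real, noisy data the same estimator applies, but the recovered values are only approximations, and when some compartments are not observed this closed-form approach is no longer available.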

Sensitivity analysis

Sensitivity analysis (SA) is the study of how the uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input (Saltelli, 2002). See the review paper (Qian & Mahdi, 2020) on the use of this technique in the context of biological sciences. A monovariate and multivariate sensitivity analysis for a data-fitted SARS model is given in Álvarez, Donado-Campos, and Morilla (2015). The use of SA is common in many research papers on modelling Covid-19 (see e.g. Fang et al., 2020, Salje et al., 2020).
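A very simple form of SA is the local, one-at-a-time elasticity of a model output with respect to each parameter, estimated by finite differences. The sketch below is illustrative only: it is a local technique, much simpler than the variance-based global methods discussed by Saltelli (2002), and it uses the ratio of a hypothetical infection rate `beta` to a hypothetical recovery rate `gamma` as the output of interest.

```python
def elasticity(model, params, name, rel_step=1e-6):
    """Relative change of the model output per relative change of one parameter."""
    base = model(**params)
    perturbed = dict(params)
    perturbed[name] = params[name] * (1.0 + rel_step)
    return (model(**perturbed) - base) / (base * rel_step)


def reproduction_ratio(beta, gamma):
    # Output of interest: infection rate times mean infectious period (1 / gamma).
    return beta / gamma


params = {"beta": 0.3, "gamma": 0.1}  # assumed values
e_beta = elasticity(reproduction_ratio, params, "beta")
e_gamma = elasticity(reproduction_ratio, params, "gamma")
```

For this output, a 1% increase in `beta` raises the ratio by about 1%, while a 1% increase in `gamma` lowers it by about 1%; the same helper can be applied to any scalar output of a fitted model.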

Validation and model selection

The ultimate test of the validity of any model is that its behaviour is in accord with real data. Because of the simplifications introduced in any mathematical model of a biological system, we must expect some divergence between the results of a model and reality, even for the most carefully collected data and most detailed model. Different questions arise in this context: (i) How can we determine if a model describes data well? (ii) How can we determine the parameter values in a model that are appropriate for describing real data? These questions are too broad to have a single answer (Allman and Rhodes, 2004, Vittinghoff et al., 2012). Epidemic models depend on their data calibration. However, many possible models are potentially suited to analyse the spread of a pandemic in a given moment. The models are inherently linked to the goal for which they were envisaged. For a given goal (for example second outbreak detection), different models can be considered. Model selection techniques are used on a regular basis in epidemiology (Portet, 2020). They address the problem of choosing, among a set of candidate models, the most suitable for a given purpose (Burnham & Anderson, 2010). The selection is based on different aspects: (i) How the calibrated model is able to reconcile and match observations and (ii) the complexity of the model. Under similar adjustment to observations, simpler models are preferred since they are more robust from an information-theoretic point of view (Huyvaert, Burnham, & Anderson, 2011). There are often different sets of parameters yielding a similar fit to data, but providing significantly different estimations of the main characteristics of the spread of the epidemic (like peak size, reproduction number, etc.). This issue is known as non-identifiability (Gustafsson et al., 2020, Roda et al., 2020). 
Identifiability issues may lead to inferences that are driven more by prior assumptions than by the data themselves (Lintusaari, Gutmann, Kaski, & Corander, 2016). There are some approaches to address this difficulty. The first one is to resort to simplified models (SIR and SEIR models, for example) in which the number of parameters to adjust is small, see Postnikov (2020) and Roda et al. (2020). The second one is to use data from different regions in a non-aggregated way, which reduces the probability of parametric over-fitting (Fiacchini & Alamir, 2021). In this context, model selection theory provides systematic methodologies to determine which model structure best suits the purposes of the model (Burnham and Anderson, 2010, Portet, 2020).
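The trade-off between goodness of fit and complexity can be made concrete with the Akaike information criterion (AIC), one of the information-theoretic tools commonly used for model selection. The sketch below assumes least-squares fits with Gaussian errors, for which the AIC reduces (up to an additive constant) to n ln(RSS/n) + 2k; the residuals and parameter counts are hypothetical and only serve to illustrate the comparison.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC of a least-squares fit with Gaussian errors (up to a constant)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical residuals of two calibrated models on the same eight data points.
residuals_simple = [0.5, -0.4, 0.6, -0.5, 0.4, -0.6, 0.5, -0.5]
residuals_complex = [0.95 * r for r in residuals_simple]  # slightly better fit

aic_simple = aic_least_squares(residuals_simple, n_params=2)
aic_complex = aic_least_squares(residuals_complex, n_params=5)
```

Here the more complex model fits marginally better, but its three extra parameters are penalized, so the simpler model obtains the lower (preferred) AIC.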

Model – Forecasting

The task of forecasting a time series can be stated as a supervised learning problem in which a number of temporal variables (also called predictors or features in the machine learning literature) are used to learn a model able to predict the future value of an output variable of interest (Bishop, 2006). In our context, we focus on forecasting methods aiming to predict the future evolution of epidemiological variables (Chowell et al., 2019, Suárez et al., 2017). We find in the literature numerous approaches to forecast temporal variables describing the evolution of Covid-19 (Calafiore et al., 2020, Petropoulos and Makridakis, 2020, Tayarani-N, 2020), from black-box approaches to estimates based on learning the internal parameters of compartmental epidemic models. Forecasting in the context of global pandemics faces many difficulties (Ioannidis, Cripps, & Tanner, 2020) and requires the implementation of validation and sensitivity analysis (Burnham & Anderson, 2010). We now introduce some considerations that should be taken into account in order to select and train a suitable forecasting model. First, we start with some statistical considerations:

Frequentist versus Bayesian statistical methods: In the former, probabilities are assigned according to experiment repetition and occurrence. In the latter, the parameters of a model are learned using Bayes’ theorem and prior knowledge about the probability distributions of unknown variables (Bonamente, 2013).

Parametric versus non-parametric approaches: In the former, we assume a parametric function mapping past input variables into future predictions. This function contains several unknown parameters that are learned using historic time series. In the non-parametric approach, we do not assume such a parametric function (Malley, Kruppa, Dasgupta, Malley, & Ziegler, 2012); for example, one can make future predictions for a given time series by analysing historic time series whose behaviour resembles that of the time series under consideration.

Other considerations to keep in mind are:

The model should be trained with reliable data. If the available data is poor, the forecasts produced will be unreliable. In this direction, data-cleaning techniques such as data reconciliation, standardization, filtering, and outlier detection should be utilized to improve the quality of the input data collected (Albuquerque & Biegler, 1996).

The amount of data collected should be appropriate for the forecasting technique under consideration. For instance, black-box models, such as deep learning, require vast amounts of data compared with compartmental models (Torrealba-Rodriguez et al., 2020, Yang et al., 2020); therefore, while dealing with relatively short time series, making predictions using compartmental models is more appropriate than using deep learning (and other black-box techniques).

Learning procedures should include training, validation, and test phases executed separately. In other words, the available data set should be divided into three parts, each one used for a different purpose. In the training stage, model parameters are learned using training data. In the validation step, one adjusts model hyper-parameters and performs comparisons with other competing approaches. Finally, the final test of a model should be carried out with data that has not been used during the training or validation phases (Burnham & Anderson, 2010).

Interpretability of the model. While deep learning (and other black-box techniques) may produce high-quality predictions, the obtained model is hard to interpret; in other words, we typically do not have an intuitive understanding of why the model is making a prediction (Arrieta, Díaz-Rodríguez, Del Ser, Bennetot, Tabik, et al., 2020). However, when policy-makers make critical decisions based on the forecast of a model, it is important for them to understand why the model is behaving in a certain way. Therefore, it is sometimes reasonable to use more interpretable models, with parameters having a clear physical/biological interpretation, even at the expense of having a lower performance than with black-box approaches.
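The division of the available data into training, validation and test parts can be sketched as follows. The chronological ordering of the three parts is an assumption made here because it is a common choice for time series (it prevents the model from being evaluated on data older than its training set); the function name and the split fractions are hypothetical.

```python
def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a time series into consecutive training, validation and test parts."""
    n = len(series)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = series[:n_train]
    validation = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]  # most recent samples, never used for fitting
    return train, validation, test


daily_cases = list(range(100))  # placeholder for a real epidemiological series
train, validation, test = chronological_split(daily_cases)
```

Model parameters would then be learned on `train`, hyper-parameters tuned and competing models compared on `validation`, and the final assessment carried out once on `test`.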

Model – Impact assessment tools

In order to design effective control strategies, it is important to define the control goals first. In the context of the current pandemic, the ultimate goal is to maintain the spread of the virus within an adequate threshold (e.g., a low level of infection cases; Priesemann et al., 2021), while minimizing the economic and social impacts of the interventions. Once this goal is quantified in terms of a cost function, we should then consider the types of interventions that can be taken to achieve our goals, as well as their associated costs. For example, there are several non-pharmaceutical interventions that can be used before a vaccine is widely available, such as physical distancing, border closures, school closures, and isolation of symptomatic individuals, among others (see Section 3.7). Each of these interventions has an associated economic and social cost that should be considered while making a decision. In order to use disciplined decision-making techniques, like the ones described below, one needs to clearly state the control objectives in a precise, quantitative form. Furthermore, it is necessary to quantify the impact and costs of all possible interventions, as well as their actuation limits (Brauner et al., 2020, Cauchemez et al., 2006). In this direction, we can quantify the impact of our actions by using suitable indexes such as the mean reproductive number, the mortality index, the unemployment rate or the public debt, to name just a few. Once the decision-maker has decided how to use these indexes to measure the impact and cost of potential actions, the decision-making process can be stated as a formal optimization problem with constraints. For example, the goal could be the minimization of a weighted index measuring the economic and social impact of our actions while keeping the reproductive number smaller than one.
We would like to remark that the numerical estimation of certain indexes is not an easy task because it requires the design of data-driven strategies to assess the effect of each potential decision on different indexes. This could be done by means of the predictive models and forecasting schemes analysed in the previous sections. In some cases, quantifying the effect of one intervention on the spread of an epidemic is a non-trivial task, since multiple interventions are typically present at the same time (Haushofer, Jessica, & Metcalf, 2020). In these scenarios, correlation analyses, such as the Pearson Correlation Coefficient (PCC), offer only a naive way to assess causality. Whenever possible, a reliable approach to establish causality is to perform Randomized Controlled Trials (RCT) (Donner and Klar, 1994, Haushofer et al., 2020). In an RCT, a subset of randomly chosen individuals receives an intervention, while the rest of the individuals receive no intervention. A standard statistical analysis of the observed results can be used to reliably evaluate the impact of this intervention. In the following subsections, we discuss a collection of indexes that could be included in the decision-making process of managing a pandemic.
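One possible way to write such a constrained decision problem is sketched below. All symbols are introduced here for illustration and are not taken from the original text: $u$ denotes a set of interventions, $\mathcal{U}$ the admissible interventions (encoding the actuation limits), $C_e(u)$ and $C_s(u)$ hypothetical economic and social cost indexes, $w_e$ and $w_s$ non-negative weights chosen by the decision-maker, $R(u)$ the reproductive number predicted under $u$, and $\bar{R}$ a safety margin below one.

```latex
\begin{aligned}
\min_{u \in \mathcal{U}} \quad & w_{e}\, C_{e}(u) + w_{s}\, C_{s}(u) \\
\text{subject to} \quad & R(u) \le \bar{R} < 1 .
\end{aligned}
```

In practice, $R(u)$, $C_e(u)$ and $C_s(u)$ are not known exactly and would have to be estimated from data, which is precisely the role of the impact assessment tools discussed in this section.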

Spread of the virus and reproductive number

It is natural to express the effectiveness of control strategies in terms of the effective reproductive number $R_t$. As introduced in Section 3, the basic reproduction number $R_0$ determines the potential of an epidemic to spread exponentially at its early stage by measuring the number of secondary infections induced by a typical infectious individual in a population when everyone is susceptible. In contrast, when an epidemic is ongoing, the effective reproduction number, denoted by $R_t$, is used to quantify the average number of secondary infections per infectious case in a population with both susceptible and non-susceptible hosts. The effective reproduction number can be used to assess the ability of available control measures to contain the spread of an epidemic. By implementing interventions able to maintain $R_t$ below 1, the incidence of new infections decreases and the spread of the epidemic fades with time. In Cori et al. (2013), the authors presented a software tool that was validated with five different epidemics, including SARS and influenza. This tool can be used to estimate the daily reproductive number and its variation in the presence of vaccination and super-spreading events. For Covid-19, a numerical analysis of the effective reproductive number can be found in Fang et al. (2020), where, using real data and a SEIR model, the authors estimate $R_t$ in Wuhan and quantify the effectiveness of government measures. Based on the number of deaths, in Flaxman et al. (2020), the Imperial College Covid-19 Response Team used a semi-mechanistic Bayesian model to estimate the evolution of $R_t$ when non-pharmaceutical measures, such as physical distancing, self-isolation, school closure, bans on public events, and complete lockdown, were recommended/enforced. Limitations in the use of $R_t$ as an assessment tool stem from the unreliability of available data sources. As a result, determining the real value of $R_t$ is difficult.
Other indirect measures, such as the number of deaths, the number of ICU cases, or the saturation of healthcare systems, can also be employed to assess the current epidemic burden, as described in the next subsection.
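A basic point estimate of the effective reproduction number can be obtained from the renewal equation, by dividing the incidence on a given day by the infection pressure generated by previous cases. The sketch below is a deliberately simplified version of this idea and not the software tool of Cori et al. (2013), which additionally uses a Bayesian formulation with sliding time windows; the serial-interval weights and the synthetic incidence are hypothetical.

```python
def infection_pressure(incidence, serial_interval, t):
    """Weighted sum of past incidence; serial_interval[s - 1] is the probability
    that a secondary case arises s days after its primary case."""
    max_lag = min(t, len(serial_interval))
    return sum(serial_interval[s - 1] * incidence[t - s]
               for s in range(1, max_lag + 1))


def estimate_rt(incidence, serial_interval):
    """Ratio estimate of the effective reproduction number for each day."""
    estimates = []
    for t in range(len(incidence)):
        pressure = infection_pressure(incidence, serial_interval, t)
        estimates.append(incidence[t] / pressure if pressure > 0 else None)
    return estimates


def simulate_renewal(r, serial_interval, seed_cases, days):
    """Synthetic incidence generated with a constant reproduction number r."""
    incidence = [float(seed_cases)]
    for t in range(1, days):
        incidence.append(r * infection_pressure(incidence, serial_interval, t))
    return incidence


serial_interval = [0.2, 0.5, 0.3]  # assumed distribution over lags of 1-3 days
incidence = simulate_renewal(1.5, serial_interval, seed_cases=10, days=30)
rt = estimate_rt(incidence, serial_interval)
```

On noise-free synthetic data the estimator recovers the reproduction number used in the simulation; with real data, reporting delays and under-ascertainment make the estimate considerably less reliable, in line with the limitations noted above.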

Healthcare systems capacity

The capacity to prevent, detect, and respond to epidemic outbreaks varies widely across countries. The preparedness and resilience of a healthcare system is a particularly relevant factor to analyse the future impact of an infectious outbreak on the population (Kandel, Chungong, Omaar, & Xing, 2020). The capacity of a healthcare system to continue delivering the same level (quantity, quality and equity) of basic healthcare services and protection to the population can severely degrade during an epidemic outbreak (Blanchet et al., 2017, Emanuel et al., 2020). At the early stages of the Covid-19 outbreak, its virulence and high contagiousness quickly saturated the healthcare systems of many cities around the world, resulting in higher mortality rates (Lai et al., 2020, Miller et al., 2020). Furthermore, in countries with low capacity, such as African and South American countries, saturation levels are reached even with a significantly smaller number of cases (Morato et al., 2020, Velavan and Meyer, 2020). To limit the saturation of healthcare systems and plan resource distribution effectively, tools that assess the effect of different interventions on the magnitude and timing of the epidemic peak during first and secondary outbreaks (see Sections 3 and 4) are fundamental. However, precise tools to forecast these peaks are challenging to obtain, due to the limitations of the available data, the time-varying nature of the mitigation efforts, and the potential seasonal behaviour of a pandemic. Another issue is the uncertain adherence of the population to the interventions (see next subsection). In order to partially circumvent these issues, forecasts of cumulative disease burden are often sought. While missing the intensity and timing of the peaks, these projections can at least help to identify areas with heavy present and/or future pandemic incidence.

Adherence to interventions and social impact

Analyses of the relationship between risk perception and preventive behaviours can be found in the social epidemiology literature (Berkman et al., 2014, Lep et al., 2020). Moreover, the level of belief in the effectiveness of recommended behaviours and trust in authorities are important predictors of adherence to preventive behaviour (see the survey paper Bish & Michie, 2010), which is fundamental to deploy effective containment strategies (Moran et al., 2016). Here, we review some of the methodologies that could be helpful to design indexes aiming to monitor the adherence of the population to interventions and the social burden of the pandemic.

Social network analysis: Online social networks, such as Facebook and Twitter, can be used to assess the impact of an infectious disease on society, since people post their feelings and worries on these networks. In Shanthakumar, Anand, and Ramesh (2020), 530,206 tweets in the USA were analysed to measure the social impact of Covid-19. The hashtags were classified into six categories, including general covid, quarantine, school closures, panic buying, lockdowns, frustration and hope. Thus, the number of tweets in each category can be used as a metric of social impact and overall sentiment. Similarly, the Weibo microblogging social network was used in Li et al. (2020) to study the propagation of situational information related to Covid-19 in China. In Jiang, Chen, Yan, Lerman, and Ferrara (2020), the political polarization with regard to Covid-19 in the United States was analysed using a large Twitter dataset.

Search engines: Online searches made by citizens in search engines, such as Google, Bing, or Baidu, can be used to measure the social impact of the epidemic in different locations. Normally, people try to find information about unknown diseases, drugs, vaccines, and treatments on the Internet. Along this line, the authors of Ginsberg et al. (2009) found a correlation between the relative frequency of certain queries in Google and the percentage of physician visits in which a patient presents influenza-like symptoms. Furthermore, other works have performed similar studies for other epidemics like Influenza Virus A (H1N1) (Cook, Conrad, Fowlkes, & Mohebbi, 2011). Regarding Covid-19, in Qin, Sun, Wang, Wu, Chen, et al. (2020), the Baidu engine is used to estimate the number of new cases of Covid-19 in China from the number of searches of five keywords: dry cough, fever, chest distress, coronavirus, and pneumonia. These five keywords showed a high correlation with the number of new cases.

News: The number and the content of posts in online newspapers can also be used to assess the spread of the virus. Along this line, in Zheng, Du, Wang, Zhang, Cui, et al. (2020), Natural Language Processing (NLP) is used to extract the relevant features of news media in China.

Online questionnaires: Another tool for measuring the social impact of a health emergency is the online questionnaire, such as those of Oliver, Barber, Roomp and Roomp (2020) (Spain, 146,728 participants), Qiu et al. (2020) (China, 52,730 participants) and Kleinberg, van der Vegt, and Mozes (2020) (UK, 2500 participants), which were implemented for the Covid-19 pandemic. These questionnaires make it possible to rapidly ask citizens multiple questions related to adherence to interventions, as well as psychological, social and economic impact, among other aspects. The main difficulty is to spread the questionnaires throughout the population, although social networks and web-based tools help to reach a large share of the population.

Mobility: One of the most relevant indexes to understand the spread of a pandemic is mobility (Tizzoni et al., 2014). See Section 8.4 in Alamo, Reina, Mammarella et al. (2020) for a list of mobility data sets in the context of Covid-19. The reduction of mobility is not only due to the quarantines and lockdowns imposed by governments but also due to the population’s increasing fear of getting infected. In Engle, Stromme, and Zhou (2020), a perceived risk index of contracting Covid-19 is defined. This metric measures the individuals’ perception of risk, and it is determined by several variables, such as prevalence in both local and neighbouring locations, as well as population demographics. The results in Engle et al. (2020) indicate that a rise of the local infection rate from 0% to 0.003% reduces mobility by 2.31%.

Manage – Managing and decision making

Deciding which of the far-reaching social and economic restrictions are the most effective to contain the spread of a disease, as well as the conditions under which they can be safely lifted, is one of the main goals of data-driven decision approaches to combat pandemics. Unlike an unmitigated pandemic, which spreads through the susceptible population out of control and eventually fades out, a mitigated pandemic presents waves. For example, a first wave grows when a very transmissible disease appears and decreases due to, for example, social distancing measures. However, as soon as social distancing measures are relaxed, a new wave can appear as long as a large number of individuals remain susceptible to the infection. To avoid recurrent waves, it is important to put in place surveillance systems and reactive mechanisms to reduce the potential burden of secondary epidemic waves. The decision-making process in this context is complex for many reasons:

(i) The presence of uncertainty in some crucial parameters characterizing the spread, such as seasonality, extent and duration of immunity of a new pandemic outbreak (Cobey, 2020, Kissler et al., 2020).

(ii) The difficulties in assessing the quantitative effect of a specific set of mitigation interventions on the effective reproduction number (Haushofer et al., 2020).

(iii) The possibility of significant non-symptomatic transmission (as in the case of Covid-19), which renders some interventions less effective (Nishiura et al., 2020, Prather et al., 2020).

(iv) The different regional incidence and adherence to interventions, which motivates spatially distributed decisions (Della Rossa et al., 2020, Sélley et al., 2015).

(v) The limited capacity of healthcare systems and the logistic challenges to address mass testing and mass vaccination.

(vi) The necessity to mitigate the spread of the epidemic and, at the same time, reduce the socioeconomic impact.

(vii) The time delay induced by the incubation period of the disease, as well as by the testing system, which does not allow for a prompt evaluation of the effect of the implemented actions.

(viii) The difficulties of assessing in a quantitative way the disruptive effects of the undertaken measures on relevant macroeconomic variables.

In what follows, we analyse under which circumstances the epidemic can be mitigated (controllability of the pandemic). After that, we also discuss some methodologies that have been applied to combat infectious diseases, including the Covid-19 pandemic, and that could potentially be applied in the context of future pandemics. See also the review papers (Bussell et al., 2019, Nowzari et al., 2016) for the use of control theory in the context of disease control, or (Ansumali et al., 2020, Paré et al., 2020, Preciado et al., 2014) for the stability analysis of an epidemic.

Controllability of the pandemic

In this subsection, we review the most important factors determining the controllability of a pandemic, that is, the aspects that have a relevant impact on the effective reproduction number. We link them with standard epidemic threshold theorems (e.g. Becker, 1977, Kermack and McKendrick, 1927, Whittle, 1955). The epidemic threshold theorem of Kermack and McKendrick (1927), and in particular its stochastic form as given by Whittle (1955), is fundamental to predict the size and nature of an infectious disease outbreak. The theorem indicates that, in homogeneously mixed communities, major epidemics can be prevented by keeping the product of the size of the susceptible population, the infection rate, and the mean duration of the infectious period sufficiently small (Becker, 1977). We now discuss how to have an impact on each of these factors by means of control actions.

Size of the susceptible population: The most effective way to reduce the susceptible population is by means of vaccines: vaccination campaigns increase herd immunity to a level that prevents further spread of the disease (Giordano et al., 2021, Scherer and McLean, 2002). Protection against an infectious disease can be achieved either by widespread vaccination or by repeated waves of infection over the years, until a large enough fraction of the population is immunized (Graham, 2020). However, an issue is the duration of the acquired immunity (Kissler et al., 2020), which in some infectious diseases, like the seasonal influenza, is not long enough to prevent recurring seasonal peaks (Cobey, 2020).

Infection rate: This factor can be reduced by means of different control actions like physical distancing, mobility constraints or prohibition of certain activities (Kraemer et al., 2020, Ngonghala et al., 2020). Depending on the seasonality and the specific demographic characteristics of a given population, the implemented measures can exhibit a time-varying effect on the infection rate (Cori et al., 2013, Fang et al., 2020). Seasonal variation might cause flows from tropical to temperate regions and back in each hemisphere’s respective winter, limiting opportunities for global disease declines (Cobey, 2020) and implying that surveillance methods to detect a seasonal peak should be put in place.

Mean duration of the infectious period: An effective way to reduce the infectious period consists in detecting infected cases and placing them in quarantine (Chowell et al., 2003). Challenges are posed by relatively short latent periods and by the presence of many asymptomatic cases, as in the Covid-19 pandemic; then, the impact of quarantine measures depends very much on how quickly detection takes place. It has been shown that the probability of effectively controlling the outbreak decreases with long delays from symptom onset to isolation (Ferretti et al., 2020, Hellewell et al., 2020). A large prevalence of asymptomatic cases is indeed an issue due to the significant probability that transmission occurs before the onset of symptoms (when the median latent delay is smaller than the median incubation time), hence before the infection can be detected (Ferretti et al., 2020, Giordano et al., 2020).
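The threshold condition discussed in this subsection can be illustrated with a few lines of Python. This is a minimal sketch under the homogeneous-mixing assumption of the theorem; the function names and all numerical values are hypothetical, and the critical immunization fraction 1 - 1/R0 additionally assumes a perfectly protective vaccine.

```python
def threshold_product(susceptible_fraction, infection_rate, infectious_days):
    """Product that the threshold theorem asks to keep small (below 1)."""
    return susceptible_fraction * infection_rate * infectious_days


def critical_immune_fraction(r0):
    """Immune fraction at which the threshold product drops to 1."""
    return max(0.0, 1.0 - 1.0 / r0)


# Hypothetical baseline: everyone susceptible, 0.5 infections per day,
# mean infectious period of 5 days.
baseline = threshold_product(1.0, 0.5, 5.0)

# Each control lever acts on one factor of the product.
after_vaccination = threshold_product(0.35, 0.5, 5.0)   # fewer susceptibles
after_distancing = threshold_product(1.0, 0.15, 5.0)    # lower infection rate
after_isolation = threshold_product(1.0, 0.5, 1.5)      # shorter infectious period

print(baseline, after_vaccination, after_distancing, after_isolation)
print(critical_immune_fraction(baseline))
```

Because the three factors enter as a product, partial reductions of several factors can be combined when no single measure is sufficient on its own.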

Optimal allocation of limited resources

During a major health crisis, policy makers face the problem of optimally allocating limited resources, such as intensive care beds, ventilators, tests, high-filtration masks and Individual Protection Equipment (IPE), medicines, vaccines, etc. (Brandeau et al., 2003, Zaric and Brandeau, 2002). This raises the problem of how to allocate resources ethically and consistently (Emanuel et al., 2020). In this context, the term “resource allocation problem” extends to issues such as where and when to allocate available resources. A rigorous and precise allocation method should lead to the formulation of an optimization problem, composed of a mathematical model and efficient algorithms to obtain its numerical solution (Hansen & Day, 2011). In the mathematical model, resource allocations are the decision variables, while the objectives are encoded in cost functions and equality or inequality constraints. For example, in Brandeau et al. (2003) and Zaric and Brandeau (2002), budget allocation models for multiple populations are provided. In Preciado, Zargham, Enyioha, Jadbabaie, and Pappas (2013), a network model is used to optimally allocate vaccines to eradicate an initial epidemic outbreak using linear matrix inequalities. An extension of this work to the case of directed and weighted networks can be found in Nowzari, Preciado, and Pappas (2015) and Preciado et al. (2014), where geometric programming was proposed to find an optimal solution. The same authors extend this last result to more general compartmental models in Nowzari, Preciado, and Pappas (2017). See also Hayhoe, Barreras, and Preciado (2020) for an application of geometric programming and multi-task learning in the context of Covid-19. In Lampariello and Sagratella (2021), an optimization problem is formulated to find the number of tests to be performed in the different Italian regions in order to maximize the overall detection capabilities. The problem is a convex quadratic program.
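To make the structure of such a problem concrete (decision variables, a cost function, an equality constraint and inequality constraints), the following sketch solves a toy convex quadratic allocation program. It is not the formulation of Lampariello and Sagratella (2021): the objective (stay as close as possible to a hypothetical regional demand), the demand figures, and the budget are all assumptions made for illustration. The closed-form solution used here is the well-known sort-based projection onto a scaled simplex.

```python
def allocate_tests(demand, budget):
    """Solve  min sum_i (x_i - demand_i)^2  s.t.  sum_i x_i = budget, x_i >= 0.

    Assumes budget > 0. Uses the sort-based Euclidean projection onto the
    simplex {x >= 0, sum(x) = budget}.
    """
    ordered = sorted(demand, reverse=True)
    running, theta = 0.0, 0.0
    for j, d in enumerate(ordered, start=1):
        running += d
        candidate = (running - budget) / j
        if d - candidate > 0:      # region j still receives a positive share
            theta = candidate      # keep the shift of the largest valid j
    # Every region is shifted by the same amount, then clipped at zero.
    return [max(d - theta, 0.0) for d in demand]


# Hypothetical daily test demand in three regions and a total budget of 600.
allocation = allocate_tests([500.0, 300.0, 50.0], 600.0)
print(allocation)
```

In a realistic setting the cost function would encode detection capability or health outcomes rather than distance to a demand vector, and a general-purpose convex solver would replace the closed-form step.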
In Gollier and Grossner (2020), a group testing (Walter, Hildreth, & Beaty, 1980) approach is considered, and it is shown how the optimization of the group size can save between 85% and 95% of tests with respect to individual testing. See also Yilmaz et al. (2020) for a strategy that optimizes testing resources in the context of the Covid-19 pandemic. Estimation, forecasting, and impact assessment techniques are often used to allocate resources, as they enable decision-makers to predict imbalances between supply and demand and to evaluate the overall efficiency of different allocation alternatives.

In Emanuel et al. (2020), the authors propose fair resource allocation guidelines in the time of Covid-19, which can be a reference for future pandemics. These guidelines come from four fundamental values: (i) maximizing the benefits, (ii) treating people equally, (iii) promoting instrumental value, and (iv) giving priority to the worst off. The guidelines are condensed into the following recommendations: to maximize the number of saved lives and life-years, with the latter metric subordinated to the former; to prioritize critical interventions for healthcare workers and others who take care of sick patients, because of their instrumental value; to invoke equality for patients with similar prognoses, operationalized through random allocation; to distinguish priorities depending on the interventions and the scientific evidence (e.g. vaccines could be prioritized for older persons, while allocating ICU resources depending on prognosis might mean giving priority to younger patients); and to give some priority for interventions to people who participate in research to prove the safety and effectiveness of vaccines and therapeutics.
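The group-size optimization mentioned above can be illustrated with the classic two-stage (Dorfman) pooling scheme. This is a sketch under strong assumptions: a perfectly accurate test, independent infections, and a single retest stage. It is not necessarily the scheme analysed in the cited works, and the prevalence value is hypothetical.

```python
def expected_tests_per_person(prevalence, group_size):
    """Expected number of tests per person under two-stage pooling.

    One pooled test per group; if the pool is positive, every member
    is retested individually.
    """
    if group_size == 1:
        return 1.0  # plain individual testing
    p_pool_positive = 1.0 - (1.0 - prevalence) ** group_size
    return 1.0 / group_size + p_pool_positive


def best_group_size(prevalence, max_size=100):
    """Group size that minimizes the expected tests per person."""
    return min(range(1, max_size + 1),
               key=lambda k: expected_tests_per_person(prevalence, k))


prevalence = 0.01  # hypothetical: 1% of the tested population is infected
k = best_group_size(prevalence)
saving = 1.0 - expected_tests_per_person(prevalence, k)
print(k, round(saving, 3))
```

The saving grows as the prevalence falls, which is why pooling is most attractive for screening populations where infection is rare.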

Trigger control

A strategy to modulate the intensity of non-pharmaceutical interventions consists in implementing a trigger mechanism to maintain the effective reproduction number close to one, avoiding the saturation of the healthcare system while reducing, when possible, the economic and social burden of the pandemic (Bin et al., 2021, Cauchemez et al., 2006, Della Rossa et al., 2020, Preciado et al., 2014). The on-line surveillance of the pandemic makes it possible to estimate the time-varying value of the effective reproduction number. Three cases are possible:

(i) The effective reproduction number is well below 1: in this case, one could consider lifting one or more non-pharmaceutical measures. However, other criteria should be met in order to relax the confinement measures in a safe way (Priesemann et al., 2021). The three criteria highlighted by the European Commission to decide on the lifting of confinement measures for Covid-19 (European Commission, 2020) are:

(a) Epidemiological criteria showing that the spread of the disease has significantly decreased and stabilized for a sustained period of time. This can, for example, be indicated by a sustained reduction in the number of new infections, hospitalizations and patients in intensive care.

(b) Sufficient health system capacity, in terms of, for instance, occupancy rate for Intensive Care Units; adequate number of hospital beds; access to pharmaceutical products required in intensive care units; reconstitution of stocks of equipment; access to care, in particular for vulnerable groups; availability of primary care structures, as well as sufficient staff with appropriate skills to care for patients discharged from hospitals or maintained at home and to engage in measures to lift confinement (testing for example). This criterion is essential as it indicates that the different national healthcare systems can cope with future increases in cases after lifting the measures. At the same time, hospitals are likely to face a backlog of elective interventions that had been temporarily postponed during the pandemic peak. Therefore, healthcare systems should have recovered sufficient capacity in general, and not only related to the management of Covid-19.

(c) Appropriate monitoring capacity, including large-scale testing capacity to detect and monitor the spread of the virus combined with contact tracing and possibilities to isolate people in case of resurgence and further spread of infections. Antibody detection capacities, e.g. in the case of Covid-19, provide complementary data on the share of the population that has successfully overcome the disease and eventually measure the acquired immunity.

(ii) The effective reproduction number has increased to a level clearly above 1: this would demand, in most cases, extremely prompt strengthening of the mitigation interventions. The stringency of the new measures should guarantee that the healthcare system is not overwhelmed by a new epidemic wave. This requires the implementation of forecasting tools that help decision-makers to determine the most suitable set of mitigating measures.

(iii) The effective reproduction number is close to 1: in this case, a deeper analysis is required. The decision on whether to keep the current set of mitigation measures or not will depend on the current fraction of infected population, the healthcare system capacity, and the feasibility of implementing, in a short period of time, a mitigating intervention capable of bringing the effective reproduction number to admissible values. That is, the decision could be determined by the worst-case cost of delaying the implementation of new measures by one week.

It is worth stressing that preemptive actions are always preferable: the earlier a countermeasure is adopted, the better in terms of its efficacy and potential to save lives (Giordano et al., 2021).
In order to develop a timely and appropriate response, different methodologies from the field of control theory are available (see the review paper Nowzari et al., 2016). Relying on Pontryagin’s maximum principle, optimal control approaches have been proposed to design optimal treatment plans, or vaccination plans, that minimize the cost of the epidemic, including both the cost of infection and the cost of treatment or vaccination (Bloem et al., 2009, Forster and Gilligan, 2007, Hansen and Day, 2011, Morton and Wickwire, 1974). Robust control approaches have also been proposed to control the spreading of infectious diseases, seen as uncertain dynamical systems (Lee and Leitmann, 1994, Leitmann, 1998). We provide more details on optimal control approaches in the following subsections.
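The three-case trigger logic described in this subsection can be sketched as a simple decision rule. The band limits (0.9 and 1.1), the function name, and the action labels are hypothetical choices for illustration; in practice the band would reflect the uncertainty of the reproduction-number estimate, and the lifting criteria would be assessed separately.

```python
def trigger_decision(rt_estimate, lower=0.9, upper=1.1,
                     lifting_criteria_met=False):
    """Map an estimate of the effective reproduction number to an action.

    lower/upper delimit a hypothetical band around 1 inside which the
    estimate is treated as "close to one".
    """
    if rt_estimate > upper:
        # Clearly above 1: strengthen interventions promptly.
        return "strengthen"
    if rt_estimate < lower:
        # Well below 1: relax only if the additional criteria
        # (epidemiological, health system capacity, monitoring) hold.
        return "relax" if lifting_criteria_met else "hold"
    # Close to 1: no automatic answer, a deeper analysis is required.
    return "analyse"


for rt in (0.7, 1.0, 1.3):
    print(rt, trigger_decision(rt), trigger_decision(rt, lifting_criteria_met=True))
```

The asymmetry is deliberate: strengthening depends only on the estimate, whereas relaxing also requires the lifting criteria to be satisfied.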

Optimal control theory

Optimal control theory (Liberzon, 2012) can be applied to effectively reduce the burden of an epidemic (Lenhart & Workman, 2007; Martcheva, 2015, Chapter 9). The dynamic optimization techniques of the calculus of variations and of optimal control theory provide methods for solving planning problems in continuous time. The solution is a continuous function (or a set of functions) indicating the optimal path to be followed by the variables through time or space (Kamien & Schwartz, 2012). We present here a common formulation of a continuous dynamical optimization problem (Hartl, Sethi, & Vickson, 1995, Section 2): In an epidemic control problem, $x(t)$ represents the state of the pandemic at time $t$ (for example, in terms of the populations of the different compartments), and $u(t)$ is the control action, which can be stated in a direct way (intensity of the interventions, number of vaccines, treatments), or in an indirect way (infection rate, immunologic protection, recovery rate). The differential equation represents the epidemic model, inequality (4) allows us to incorporate (mixed) constraints on $x(t)$ and $u(t)$, whereas the (pure) constraint (5) can be used to impose limits on the size of the components of $x(t)$. Finally, (6), (7) are terminal constraints. The question of existence of optimal pairs was studied in Cesari (1965) and Filippov (1962). See also Hartl et al. (1995, Section 3) and the references therein. Pontryagin’s maximum principle provides necessary conditions that characterize the optimal solutions in the presence of inequality constraints (Kirk, 2004, Liberzon, 2012). These necessary conditions become sufficient under certain convexity conditions on the objective and constraint functions (Kamien and Schwartz, 1971, Mangasarian, 1966).
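For reference, the problem class treated in Hartl et al. (1995, Section 2) is usually written in the following standard form. The function names $F$, $S$, $f$, $g$, $h$, $a$, $b$, the horizon $T$, the initial state $x_0$, and the assignment of the labels (4) to (7) to the individual constraints are assumptions made here so that the references in the text resolve; this is a sketch of the standard form rather than a verbatim display.

```latex
\begin{align}
\max_{u(\cdot)} \quad & \int_{0}^{T} F(x(t),u(t),t)\,dt + S(x(T),T) \notag\\
\text{s.t.} \quad & \dot{x}(t) = f(x(t),u(t),t), \qquad x(0) = x_0, \notag\\
& g(x(t),u(t),t) \ge 0, \tag{4}\\
& h(x(t),t) \ge 0, \tag{5}\\
& a(x(T),T) \ge 0, \tag{6}\\
& b(x(T),T) = 0. \tag{7}
\end{align}
```

A cost-minimization problem, as is typical in epidemic control, fits this form by maximizing the negative of the cost.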
In general, the solution of the optimal control problem in the presence of nonlinear dynamics and constraints requires iterative numerical methods to solve the so-called Hamiltonian system, which is a two-point boundary value problem, plus a maximum (minimum) condition of the Hamiltonian (see e.g. Kirk, 2004, Chapter 6). We now describe some examples of the use of optimal control theory in epidemic control. In Zaman, Kang, and Jung (2008), the dynamic optimal vaccination strategy for a SIR epidemic model is described. The optimal solution is obtained using a forward–backward iterative method with a Runge–Kutta fourth-order solver. An example of how to deploy scarce resources for disease control when epidemics occur in different but interconnected regions is presented in Rowthorn, Laxminarayan, and Gilligan (2009). The authors solve the optimal control problem of minimizing the total level of infection when the control actions are bounded. In Youssef and Scoglio (2013), the authors apply Pontryagin’s maximum principle to obtain an optimal bang-bang strategy to minimize the total number of infection cases during the spread of SIR epidemics in contact networks. Optimal control theory is employed to design the best policies to control the spread of seasonal and novel A-H1N1 strains in Prosper, Saucedo, Thompson, Torres-Garcia, Wang, et al. (2011). An example of the use of optimal control theory to control the present Covid-19 pandemic is presented in Hayhoe et al. (2020) and Mandal et al. (2020), where the authors design an optimal strategy, for a five-compartment model, in order to minimize the number of infected cases while minimizing the cost of non-pharmaceutical interventions.
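The forward–backward iterative idea can be sketched for a normalized SIR model with a vaccination rate u(t). This is an illustrative toy, not the model or code of any cited paper: the cost (infected fraction plus a quadratic control penalty), all parameter values, and the relaxation factor are assumptions, and explicit Euler steps are used instead of a fourth-order Runge–Kutta solver to keep the sketch short.

```python
def forward_backward_sweep(beta=0.5, gamma=0.2, s0=0.99, i0=0.01,
                           horizon=60.0, steps=600, weight_inf=1.0,
                           weight_ctrl=1.0, u_max=0.1, iterations=50):
    """Minimize  int (weight_inf*I + 0.5*weight_ctrl*u^2) dt  for
    S' = -beta*S*I - u*S,  I' = beta*S*I - gamma*I,  0 <= u <= u_max."""
    dt = horizon / steps
    u = [0.0] * (steps + 1)

    def simulate(control):
        s, i = [s0], [i0]
        for k in range(steps):
            new_inf = beta * s[k] * i[k]
            s.append(s[k] + dt * (-new_inf - control[k] * s[k]))
            i.append(i[k] + dt * (new_inf - gamma * i[k]))
        return s, i

    def cost(control):
        _, i = simulate(control)
        return dt * sum(weight_inf * i[k] + 0.5 * weight_ctrl * control[k] ** 2
                        for k in range(steps))

    for _ in range(iterations):
        s, i = simulate(u)                      # forward pass: states
        lam_s = [0.0] * (steps + 1)             # adjoints, zero at final time
        lam_i = [0.0] * (steps + 1)
        for k in range(steps, 0, -1):           # backward pass: adjoints
            d_lam_s = lam_s[k] * (beta * i[k] + u[k]) - lam_i[k] * beta * i[k]
            d_lam_i = (-weight_inf + lam_s[k] * beta * s[k]
                       - lam_i[k] * (beta * s[k] - gamma))
            lam_s[k - 1] = lam_s[k] - dt * d_lam_s
            lam_i[k - 1] = lam_i[k] - dt * d_lam_i
        # Minimum condition of the Hamiltonian in u, projected onto [0, u_max].
        u_new = [min(u_max, max(0.0, lam_s[k] * s[k] / weight_ctrl))
                 for k in range(steps + 1)]
        u = [0.5 * (old + new) for old, new in zip(u, u_new)]  # relaxed update
    return u, cost(u), cost([0.0] * (steps + 1))


u_opt, cost_controlled, cost_uncontrolled = forward_backward_sweep()
print(round(cost_controlled, 3), round(cost_uncontrolled, 3))
```

The relaxed update (averaging the old and new control) is a common way to stabilize the sweep; convergence is not guaranteed in general and should be monitored in any serious use.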

Model predictive control

Model predictive control (MPC) provides optimal solutions to a control decision problem subject to constraints (Camacho and Bordons, 2013, Rawlings et al., 2017). MPC is a receding horizon methodology that involves repeatedly solving a constrained optimization problem, using predictions of future costs, disturbances, and constraints over a moving time horizon. In epidemic control, this optimization problem is solved daily, or weekly, in order to decide the optimal control action (for example, the intensity of mitigation interventions, or the optimal allocation of resources). The output of the model predictive controller is adaptive in the sense that it takes into consideration the latest available information on the state of the pandemic (Bussell et al., 2019, Sélley et al., 2015). See, for example, Alleman et al. (2020), Köhler et al. (2020) and Morato et al. (2020) for MPC formulations that address the control of the Covid-19 pandemic, and Carli, Cavone, Epicoco, Scarabaggio, and Dotoli (2020) for a review paper on the application of MPC in the context of the Covid-19 pandemic. Because of the spatially clustered distribution of an epidemic, it is possible to apply specific control techniques from the field of distributed model predictive control (Christofides et al., 2013, Maestre and Negenborn, 2014). For example, non-linear model predictive control can be used to control an epidemic by solely acting upon the individuals’ contact pattern or network (Sélley et al., 2015). Another example of distributed MPC in the control of epidemics is given in Köhler, Enyioha, and Allgöwer (2018), where the problem of dynamically allocating limited resources (vaccines and antidotes) to control an epidemic spreading process over a network is addressed.
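The receding-horizon mechanism can be sketched with a toy weekly controller for a normalized SIR model. Everything below is hypothetical and chosen for brevity: the rates, the three intervention levels, the cost weights, the three-week horizon, and the exhaustive search over plans (a real MPC formulation would use a constrained optimizer and a calibrated model).

```python
from itertools import product

BETA, GAMMA = 0.4, 0.2        # hypothetical daily infection and recovery rates
LEVELS = (0.0, 0.3, 0.6)      # hypothetical intervention intensities
W_INF, W_CTRL = 100.0, 1.0    # hypothetical weights: infections vs. restrictions


def simulate_week(s, i, u):
    """Advance the model by 7 daily Euler steps under intensity u."""
    cost = 0.0
    for _ in range(7):
        new_inf = BETA * (1.0 - u) * s * i
        s, i = s - new_inf, i + new_inf - GAMMA * i
        cost += W_INF * i + W_CTRL * u
    return s, i, cost


def plan_cost(s, i, plan):
    """Predicted cost of applying a sequence of weekly intensities."""
    total = 0.0
    for u in plan:
        s, i, week_cost = simulate_week(s, i, u)
        total += week_cost
    return total


def mpc_step(s, i, horizon=3):
    """Optimize over the whole horizon, return only the first move."""
    best_plan = min(product(LEVELS, repeat=horizon),
                    key=lambda plan: plan_cost(s, i, plan))
    return best_plan[0]


def run(weeks, controller):
    s, i = 0.999, 0.001
    peak, actions = i, []
    for _ in range(weeks):
        u = controller(s, i)               # re-plan from the latest state
        s, i, _ = simulate_week(s, i, u)   # apply the first move only
        peak = max(peak, i)
        actions.append(u)
    return peak, actions


peak_mpc, weekly_actions = run(30, mpc_step)
peak_free, _ = run(30, lambda s, i: 0.0)
print(round(peak_mpc, 4), round(peak_free, 4), weekly_actions)
```

Re-solving the problem every week from the newly observed state is what makes the controller adaptive: prediction errors made in one week are corrected at the next planning step.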

Multi-objective control

Pareto optimality is used in multi-objective control problems with conflicting objectives. For instance, in a bi-objective problem with conflicting objectives, improving one objective implies worsening the other. Pareto optimality is based on Pareto dominance: one solution dominates another if it is at least as good in all the objectives and strictly better in at least one. Thus, the goal of the optimization algorithm is to find the Pareto front, which includes all non-dominated solutions of the control problem. Therefore, there is a set of optimal solutions instead of a single optimal solution. The Pareto front is a useful tool for decision-makers, since it allows them to visualize all the possible optimal solutions (for two objectives it is a curve, for three objectives a surface, and so forth) and to evaluate the trade-off between different strategies. In the context of epidemic control (Sharomi & Malik, 2017), Pareto optimality has been used in Yousefpour, Jahanshahi, and Bekiros (2020) in a bi-objective control problem whose goals are related to epidemic measures, such as the number of cases, and to economic costs.
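As a minimal sketch, the Pareto front of a finite set of candidate strategies can be extracted by pairwise comparison, using the standard definition of Pareto dominance (no worse in every objective and strictly better in at least one). Both objectives are minimized here, and the candidate strategies, given as (number of cases, economic cost) pairs, are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(points):
    """Keep the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical strategies: (number of cases, economic cost).
strategies = [(100, 9), (150, 5), (300, 1), (200, 6), (150, 7)]
print(pareto_front(strategies))
```

The pairwise scan is quadratic in the number of candidates, which is adequate for small sets; the decision-maker then chooses among the remaining strategies according to the preferred trade-off.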

Conclusions

This review has presented a roadmap for controlling present and future pandemics from a data-driven perspective, based on three pillars: Monitoring, Modelling, and Managing. We have highlighted the interplay between data science, epidemiology, and control theory to address the different challenges raised by a pandemic. Methodologies and approaches proposed for previous epidemics and the present Covid-19 pandemic have been reviewed, without claiming exhaustiveness, given the huge and continuously growing literature on this subject. Although the relevant body of literature is extremely large and many approaches have been studied in the past, further research is still needed. Implementing effective control strategies to mitigate a pandemic is difficult because of various reasons: (i) the unavoidable uncertainty affecting some crucial parameters that characterize the spread, including compliance issues due to the unpredictable human behaviour, (ii) the difficulties in assessing the quantitative effect of mitigating interventions, (iii) the impossibility of obtaining a prompt evaluation of the effect of the implemented interventions, due to the intrinsic time-delay, and at the same time the critical importance of acting quickly, due to the exponential nature of the spreading phenomenon: even a small delay in interventions can lead to a much heavier healthcare burden and a much larger death toll (see e.g. Giordano et al., 2021). The first step for modelling different aspects of the pandemic is the processing of the available raw data to obtain consolidated time-series. In order to obtain predictive models, which are crucial for the decision-making process, we have discussed several techniques from epidemiology and machine learning. We have described the most relevant modelling and forecasting approaches, focusing on the adjustment of the prediction models to the available data, model selection and validation processes. 
Different surveillance systems able to detect, or anticipate, possible recurring epidemic waves have been surveyed. These systems enable an immediate response that reduces the potential burden of the outbreak. Different methods from control theory can be applied to provide an optimal, robust and adaptive response to the time-varying incidence of an epidemic. These methods can be applied to the optimal allocation of resources, useful for testing campaigns and vaccination plans, and to determine trigger control schemes that modulate the stringency of the adopted interventions. We have reviewed the control-theory literature focused on the analysis and the design of feedback structures for the efficient control of an epidemic. Besides, we have also mentioned some techniques from distributed model predictive control that can be applied to control the temporal and spatial evolution of an infectious disease. Preventing and controlling pandemics will be an increasingly important challenge in the future, due to the likelihood of new virus spillovers resulting from the increasing ecological footprint of humans. The systems and control community has powerful tools available to contribute and take on this fundamental challenge.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

152 in total

1. Dynamic resource allocation for epidemic control in multiple populations.

Authors: Gregory S Zaric; Margaret L Brandeau
Journal: IMA J Math Appl Med Biol Date: 2002-12

2. Resource allocation for control of infectious diseases in multiple independent populations: beyond cost-effectiveness analysis.

Authors: Margaret L Brandeau; Gregory S Zaric; Anke Richter
Journal: J Health Econ Date: 2003-07 Impact factor: 3.883

3. How generation intervals shape the relationship between growth rates and reproductive numbers.

Authors: J Wallinga; M Lipsitch
Journal: Proc Biol Sci Date: 2007-02-22 Impact factor: 5.349

4. "Herd immunity": a rough guide. (Review)

Authors: Paul Fine; Ken Eames; David L Heymann
Journal: Clin Infect Dis Date: 2011-04-01 Impact factor: 9.079

5. Using the Kalman filter and dynamic models to assess the changing HIV/AIDS epidemic.

Authors: B Cazelles; N P Chau
Journal: Math Biosci Date: 1997-03 Impact factor: 2.144

6. Predicting COVID-19 in China Using Hybrid AI Model.

Authors: Nanning Zheng; Shaoyi Du; Jianji Wang; He Zhang; Wenting Cui; Zijian Kang; Tao Yang; Bin Lou; Yuting Chi; Hong Long; Mei Ma; Qi Yuan; Shupei Zhang; Dong Zhang; Feng Ye; Jingmin Xin
Journal: IEEE Trans Cybern Date: 2020-05-08 Impact factor: 11.448

7. New coronavirus outbreak. Lessons learned from the severe acute respiratory syndrome epidemic.

Authors: E Álvarez; J Donado-Campos; F Morilla
Journal: Epidemiol Infect Date: 2015-01-16 Impact factor: 4.434

8. Mathematical Model Reveals the Role of Memory CD8 T Cell Populations in Recall Responses to Influenza.

Authors: Veronika I Zarnitsyna; Andreas Handel; Sean R McMaster; Sarah L Hayward; Jacob E Kohlmeier; Rustom Antia
Journal: Front Immunol Date: 2016-05-09 Impact factor: 7.561

9. Influenza surveillance in Europe: establishing epidemic thresholds by the moving epidemic method.

Authors: Tomás Vega; Jose Eugenio Lozano; Tamara Meerhoff; René Snacken; Joshua Mott; Raul Ortiz de Lejarazu; Baltazar Nunes
Journal: Influenza Other Respir Viruses Date: 2012-08-16 Impact factor: 4.380

10. The effect of control strategies to reduce social mixing on outcomes of the COVID-19 epidemic in Wuhan, China: a modelling study.

Authors: Kiesha Prem; Yang Liu; Timothy W Russell; Adam J Kucharski; Rosalind M Eggo; Nicholas Davies; Mark Jit; Petra Klepac
Journal: Lancet Public Health Date: 2020-03-25
