Raj Dandekar (1), Chris Rackauckas (2), George Barbastathis (3, 4). (1) Department of Computational Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. (2) Department of Applied Mathematics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. (3) Department of Mechanical Engineering, Massachusetts Institute of Technology, Cambridge, MA 02139, USA. (4) Singapore-MIT Alliance for Research and Technology (SMART) Centre, Singapore 138602, Singapore.
Abstract
We have developed a globally applicable diagnostic COVID-19 model by augmenting the classical SIR epidemiological model with a neural network module. Our model does not rely upon previous epidemics like SARS/MERS and all parameters are optimized via machine learning algorithms used on publicly available COVID-19 data. The model decomposes the contributions to the infection time series to analyze and compare the role of quarantine control policies used in highly affected regions of Europe, North America, South America, and Asia in controlling the spread of the virus. For all continents considered, our results show a generally strong correlation between strengthening of the quarantine controls as learnt by the model and actions taken by the regions' respective governments. In addition, we have hosted our quarantine diagnosis results for the top 70 affected countries worldwide, on a public platform.
Coronavirus disease 2019 (COVID-19), caused by the virus SARS-CoV-2, has led to a global pandemic, with 12,552,765 confirmed cases in more than 200 countries as of July 12, 2020. As the disease began to spread beyond its apparent origin in Wuhan, the responses of local and national governments varied considerably. The evolution of infections has been similarly diverse, in some cases appearing to be contained and in others reaching catastrophic proportions. In Hubei province itself, starting at the end of January, more than 10 million residents were quarantined by shutting down public transport systems, train stations, and airports, and by imposing police controls on pedestrian traffic. Subsequently, similar policies were applied nationwide in China. By the end of March, the rate of infections was reportedly receding.
By the end of February 2020, the virus began to spread in Europe, with Italy imposing extraordinary quarantine measures starting on March 11, 2020. France enacted strict quarantine measures beginning on March 17, followed by the UK on March 23, whereas no such measures were enforced in Sweden. South Korea, Iran, and Spain experienced acute initial increases, but then adopted drastic generalized quarantine. In the United States, the first infections were detected in Washington State as early as January 20, 2020, and it is now being reported that the virus had been circulating undetected in New York City as early as mid-February. Federal, state, and city government responses were comparatively delayed and variable, with most states declaring stay-at-home orders by the end of March. In South America, Brazil, Chile, and Peru were the most affected countries as of July 12, and they used differing quarantine policies. Brazil's first case was reported in the last week of February and the country went into a state of partial quarantine on March 24.
Chile declared a state of disaster for 90 days in the first week of March, and the military was deployed to enforce quarantine measures. In Peru, a nationwide curfew was imposed much later, on March 19.
The available COVID-19 infected case counts, by country and worldwide, show that the infection growth curve also behaved very differently across the globe. In some countries, the infected case count peaked within a month and then declined, while in others it increased for much longer before plateauing. In some of the most affected countries, the infected count has not yet reached a plateau, and the number of daily active cases continues to increase or remain stagnant as of July 12, 2020. The disparity in the countries' responses is compounded by a commensurate disparity in their effectiveness in controlling the severity of infectious spread. This, together with standard challenges in epidemiological modeling and certain unusual features of the disease itself (such as the possibility that individuals remain asymptomatic yet infectious for up to 2 weeks), makes it very difficult to interpret the policies or to draw lessons for future outbreaks.
Here, we focus on compartment-based modeling, a widely used tool in epidemiology. The earliest version of the compartment model was the SIR (Susceptible-Infected-Recovered) model. Two major assumptions of this class of models are (1) homogeneity and (2) the law of mass action, which states that the rate of change of a compartment population at the next time step is proportional to the compartment population at the current time step; hence, compartmental models typically result in a set of coupled ordinary differential equations (ODEs) governing the populations.
These simplifying assumptions make compartment models weaker than the other main class of models, agent-based models, which simulate autonomous agents and their interactions within a constrained environment (see Gallagher and Baltimore and references therein for a detailed introduction). Although it is easier to incorporate heterogeneity in agent-based models, the significant advantage of compartment modeling is interpretability: physically meaningful information about the system, such as the reproduction number, can be extracted directly from the ODEs. Stochastic variations of compartment-based models [13, 14, 15] and Bayesian approaches have also been studied.
For analyzing different aspects of the COVID-19 outbreak, compartment-based models built on the SEIR (Susceptible-Exposed-Infected-Recovered) framework have been used widely [17, 18, 19, 20, 21, 22]. These studies show that, although increasing the number of compartments results in more realistic behavior, the model then becomes less identifiable; i.e., it becomes progressively more difficult to uniquely determine parameters from the data.
For example, a recent study of the COVID-19 outbreak in Wuhan, China, showed that the large number of parameters in SEIR models makes them less reliable than the simpler SIR models.
To deal with the aforementioned disparity between government responses and outcomes to the COVID-19 pandemic, several models have studied the effect of quarantine/lockdown measures on the evolution of the disease [21, 22, 23, 24]. Existing models generally (1) lack independent estimation: they use parameters based on previous knowledge of SARS/MERS coronavirus epidemiology rather than derived independently from the COVID-19 data, or parameters, such as the rate of detection and the nature of the government response, that are fixed before running the model; or (2) lack global applicability: they are not implemented on a global scale; or (3) lack interpretability, as we defined it earlier.
In this paper, we propose a globally scalable, interpretable compartment-based model with entirely independent parameter estimation through a novel approach: augmenting a first principles-derived epidemiological model with a data-driven module, implemented as a neural network. Previous approaches to functional quantification through data involve probabilistic methods, such as variational inference [25, 26, 27, 28, 29, 30, 31, 32, 33, 34] and variational Gaussian processes, which do not incorporate knowledge of the ODEs governing the system under consideration. We leverage our model to quantify the quarantine strengths and to analyze and compare the role of the quarantine control policies used to control the effective reproduction number of the virus [36, 37, 38, 39, 40, 41] in the European, North American, South American, and Asian continents.
In the SEIR model [42, 43, 44], the population is divided into the susceptible S, exposed E, infected I, and recovered R groups, and their relative growths and competition are represented as a set of coupled ODEs, whereas the simpler SIR model does not account for the exposed population E.
These models cannot capture the large-scale effects of more granular interactions, such as the population's response to social distancing and quarantine policies. Moreover, a major assumption of these models is that the rate of transitions between population states is fixed. In our approach, we relax this assumption by estimating the time-dependent quarantine effect on virus exposure with a neural network that informs the infected variable I in the SIR model. The trained model thus decomposes the effects, and the neural network encodes information about the quarantine strength function in the locale where the model is trained.
In general, neural networks with suitable nonlinear activation functions are universal approximators [45, 46, 47]. Unbounded activation functions in particular, such as the rectified linear unit (ReLU), are known to be effective in approximating nonlinear functions with a finite set of parameters [48, 49, 50]. Thus, a neural network solution is attractive for approximating quarantine effects in combination with analytical epidemiological models. The downside is that the presence of the neural network term as a component of the ODEs results in limited interpretability. The recently emerging field of scientific machine learning exploits conservation principles within a universal differential equation, SIR in our case, to mitigate overfitting and other related machine learning risks.
In the present work, the neural network is trained on publicly available infection and population data for COVID-19 for a specific region under study; the results are provided in the next section, followed by a Discussion. Details of the model estimation procedure and parameter inference are presented in the Experimental Procedures.
Results
Standard SIR Model
The classic SIR epidemiological model is a standard tool for basic analysis concerning the outbreak of epidemics. In this model, the entire population is divided into three sub-populations: susceptible S, infected I, and recovered R. The sub-populations' evolution is governed by the following system of three coupled nonlinear ODEs:
$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \qquad (1)$$
$$\frac{dI(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - \gamma\, I(t) \qquad (2)$$
$$\frac{dR(t)}{dt} = \gamma\, I(t) \qquad (3)$$
Here, β and γ are the infection and recovery rates, respectively, and are assumed to be constant in time. The total population N = S(t) + I(t) + R(t) is seen to remain constant as well; that is, births and deaths (unrelated to the disease) are neglected. The recovered population is to be interpreted as those who can no longer infect others, so it also includes individuals deceased due to the infection. The possibility of recovered individuals becoming reinfected is accounted for by SEIS (Susceptible-Exposed-Infected-Susceptible) models, but we do not use this model here, as negligibly few reinfection cases for COVID-19 have been recorded as of now. The reproduction number R_t in the SEIR and SIR models is defined as
$$R_t = \frac{\beta}{\gamma} \qquad (4)$$
An important assumption of the SIR models is homogeneous mixing among the sub-populations. Therefore, this model cannot account for social distancing or social network effects. In addition, the model assumes uniform susceptibility and disease progress for every individual, and that no spreading occurs through animals or other non-human means. Alternatively, the SIR model may be interpreted as quantifying the statistical expectations on the respective mean populations, while deviations from the model's assumptions contribute to statistical fluctuations around the mean.
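As an illustration of the SIR system described above, the following is a minimal sketch that integrates the standard equations dS/dt = -βSI/N, dI/dt = βSI/N - γI, dR/dt = γI with a hand-written fourth-order Runge-Kutta step. The function names, the step size, and the parameter values are assumptions chosen for illustration; they are not fitted to any data and are not taken from the study.

```python
def sir_rhs(state, beta, gamma):
    """Right-hand side of the classic SIR ODEs for state = (S, I, R)."""
    s, i, r = state
    n = s + i + r                      # total population, conserved by the model
    new_infections = beta * s * i / n  # mass-action infection term
    recoveries = gamma * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(rhs, state, dt, *params):
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base, slope, h):
        return tuple(b + h * k for b, k in zip(base, slope))

    k1 = rhs(state, *params)
    k2 = rhs(shifted(state, k1, dt / 2), *params)
    k3 = rhs(shifted(state, k2, dt / 2), *params)
    k4 = rhs(shifted(state, k3, dt), *params)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the SIR model and return one (S, I, R) tuple per day."""
    state = (float(s0), float(i0), float(r0))
    steps_per_day = round(1 / dt)
    trajectory = [state]
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(sir_rhs, state, dt, beta, gamma)
        trajectory.append(state)
    return trajectory


# Illustrative values only (hypothetical, not fitted to any data set).
beta, gamma = 0.3, 0.1
traj = simulate_sir(s0=999_500, i0=500, r0=0, beta=beta, gamma=gamma, days=120)
reproduction_number = beta / gamma
peak_day = max(range(len(traj)), key=lambda d: traj[d][1])
print(f"reproduction number = {reproduction_number:.1f}, peak on day {peak_day}")
```

Because the three right-hand sides sum to zero, the total S + I + R stays constant along the trajectory, which is a quick sanity check on any implementation of the model.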
Augmented QSIR Model
To study the effect of quarantine control globally, we start with the SIR epidemiological model. Figure 1A shows the schematic of the modified SIR model, the QSIR model, which we consider. We augment the SIR model by introducing a time-varying quarantine strength rate term Q(t) and a quarantined population T(t), which is prevented from having any further contact with the susceptible population. Thus, the term I(t) denotes the infected population still having contact with the susceptibles, as in the standard SIR model, while the term T(t) denotes the infected population who are effectively quarantined and isolated. Further, we introduce an additional recovery rate δ, which quantifies the rate of recovery of the quarantined population. Thus, we can write an expression for the quarantined infected population T(t) as
$$T(t) = Q(t)\, I(t) \qquad (5)$$
Figure 1
Illustration of the QSIR Model and Neural Network Architecture
(A) Schematic of the augmented QSIR model considered in the present study.
(B) Schematic of the neural network architecture used to learn the quarantine strength function Q(t). Here, T(t) represents the quarantined infected population prescribed by the quarantine strength rate Q(t).
Based on the modified model, we define a COVID-19 spread parameter C_p(t), in a similar way to the reproduction number defined in the SIR model (Equation 4), as
$$C_p(t) = \frac{\beta}{\gamma + \delta + Q(t)} \qquad (6)$$
C_p > 1 indicates that infections are being introduced into the population at a higher rate than they are being removed, leading to rapid spread of the disease. On the other hand, C_p < 1 indicates that the COVID-19 spread has been brought under control in the region of consideration. Since Q(t) does not follow from first principles and is highly dependent on local quarantine policies, we devised a neural network-based approach to approximate it.
Recently, it has been shown that neural networks can be used as function approximators to recover unknown constitutive relationships in a system of coupled ODEs. Following this principle, we represent Q(t) as an n-layer-deep neural network with weights W_1, W_2, ..., W_n, activation function r, and the input vector U = (S(t), I(t), R(t)) as
$$Q(t) = W_n\, r\big(W_{n-1} \cdots r(W_1 U)\big) \equiv \mathrm{NN}(W, U) \qquad (7)$$
For the implementation, we choose an n = 2-layer densely connected neural network with 10 units in the hidden layer and the ReLU activation function. This choice was made because we found sigmoidal activation functions to stagnate. The final model is described by a total of 54 tunable parameters. The neural network architecture schematic is shown in Figure 1B. The governing coupled ODEs for the QSIR model are
$$\frac{dS(t)}{dt} = -\frac{\beta\, S(t)\, I(t)}{N} \qquad (8)$$
$$\frac{dI(t)}{dt} = \frac{\beta\, S(t)\, I(t)}{N} - \big(\gamma + Q(t)\big)\, I(t) \qquad (9)$$
$$\frac{dR(t)}{dt} = \gamma\, I(t) + \delta\, T(t) \qquad (10)$$
$$\frac{dT(t)}{dt} = Q(t)\, I(t) - \delta\, T(t) \qquad (11)$$
with Q(t) = NN(W, U) as in Equation 7. More details about the model initialization and parameter estimation methods are given in the Experimental Procedures. In all cases considered below, we trained the model using data starting from the dates when the 500th infection was recorded in each region and up to June 1, 2020.
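The structure of the augmented model can be sketched in a few lines. The example below assumes the QSIR system takes the form dS/dt = -βSI/N, dI/dt = βSI/N - (γ + Q)I, dR/dt = γI + δT, dT/dt = QI - δT, with Q produced by a small dense network with one hidden layer of 10 ReLU units. Several details are assumptions made for illustration only: the three-component input (S, I, R), its scaling by the total population, the absolute value used to keep the rate non-negative, the random untrained weights, the forward Euler step, and all parameter values. Under the three-input assumption the network has 51 weights and biases, which together with β, γ, and δ is one decomposition consistent with the stated total of 54 tunable parameters.

```python
import random

N_INPUTS, N_HIDDEN = 3, 10  # assumed input size; 10 hidden units as stated in the text


def init_network(seed=0):
    """Create untrained weights for a 3 -> 10 -> 1 dense network (illustrative)."""
    rng = random.Random(seed)
    return {
        "w1": [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
               for _ in range(N_HIDDEN)],
        "b1": [0.0] * N_HIDDEN,
        "w2": [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)],
        "b2": 0.0,
    }


def count_parameters(net):
    """Number of weights and biases in the network."""
    n_w1 = sum(len(row) for row in net["w1"])
    return n_w1 + len(net["b1"]) + len(net["w2"]) + 1


def quarantine_strength(net, u):
    """Forward pass Q = |W2 relu(W1 u + b1) + b2| for an input vector u."""
    hidden = [max(0.0, sum(w * x for w, x in zip(row, u)) + b)
              for row, b in zip(net["w1"], net["b1"])]
    raw = sum(w * h for w, h in zip(net["w2"], hidden)) + net["b2"]
    return abs(raw)  # assumption: force the quarantine rate to be non-negative


def qsir_rhs(state, net, beta, gamma, delta):
    """Right-hand side of the assumed QSIR ODEs for state = (S, I, R, T)."""
    s, i, r, t = state
    n = s + i + r + t
    q = quarantine_strength(net, [s / n, i / n, r / n])  # inputs scaled by N
    infection = beta * s * i / n
    return (
        -infection,
        infection - (gamma + q) * i,
        gamma * i + delta * t,
        q * i - delta * t,
    )


def euler_step(state, dt, *args):
    """One forward Euler step (kept simple; a real fit would use a better solver)."""
    return tuple(x + dt * dx for x, dx in zip(state, qsir_rhs(state, *args)))


net = init_network()
state = (999_500.0, 500.0, 0.0, 0.0)
for _ in range(300):  # 30 days with dt = 0.1
    state = euler_step(state, 0.1, net, 0.3, 0.05, 0.05)

total_parameters = count_parameters(net) + 3  # network + beta, gamma, delta
print(total_parameters)
```

In an actual fit the network weights and the three rates would be optimized jointly against the infected and recovered case counts; the sketch only shows how the learned term enters the right-hand side.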
Interpretation of Q(t)
Q(t) denotes the rate at which infected persons are effectively quarantined and isolated from the remaining population, and thus gives composite information about (1) the effective testing rate of the infected population as the disease progressed and (2) the intensity of the enforced quarantine as a function of time. To understand the nature of the evolution of Q(t), we look at the time point when the plot of Q(t) approximately shows an inflection point or a sudden increase in Q(t). An inflection point in Q(t) indicates the time when the rate of increase of Q(t), i.e., dQ(t)/dt, was at its peak, while a sudden increase corresponds to a sudden intensification of the quarantine policies used in the region under consideration.
Introducing Q(t) in the SIR model has a similar effect to that of a time-varying, decreasing contact rate within the population, which would simulate a lockdown situation. As a result, although Q(t) denotes infected population quarantine, the way it is introduced in our augmented SIR model enables our model to capture broad population-level lockdown effects without burdening the regression with additional parameters. We demonstrate this ability of our model in the results of the subsequent sections.
Further, we define the quarantine efficiency, Q_eff, as the increase in Q(t) within a month following the detection of the 500th infected case in the region under consideration. Thus,
$$Q_{\mathrm{eff}} = Q(30) - Q(1) \qquad (12)$$
The magnitude of Q_eff shows how rapidly the infected individuals were prevented from coming into contact with the susceptibles in the first month following the detection of the 500th infected case, and is indicative of the quarantine responsiveness: the testing and tracing protocols to identify and isolate infected individuals.
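The two diagnostics described above, the inflection point of the quarantine strength and the quarantine efficiency over the first month, can be read off a daily series of quarantine strength values. The sketch below assumes such a series is available with day 1 being the first day after the 500th detected case, takes the efficiency to be the value on day 30 minus the value on day 1, and locates the inflection point as the day with the largest central-difference slope. The function names and the synthetic logistic series are assumptions for illustration, not outputs of the trained model.

```python
import math


def quarantine_efficiency(q_by_day):
    """Increase in quarantine strength between day 1 and day 30.

    q_by_day[0] is day 1 after the 500th detected case.
    """
    if len(q_by_day) < 30:
        raise ValueError("need at least 30 daily values")
    return q_by_day[29] - q_by_day[0]


def inflection_day(q_by_day):
    """Day (1-based) on which the central-difference slope of Q is largest."""
    if len(q_by_day) < 3:
        raise ValueError("need at least 3 daily values")
    best_index, best_slope = None, -math.inf
    for k in range(1, len(q_by_day) - 1):
        slope = (q_by_day[k + 1] - q_by_day[k - 1]) / 2.0
        if slope > best_slope:
            best_index, best_slope = k, slope
    return best_index + 1  # list index k corresponds to day k + 1


# Synthetic, hypothetical series: a logistic ramp centred on day 14.
q_series = [0.4 / (1.0 + math.exp(-(day - 14) / 2.0)) for day in range(1, 61)]

q_eff = quarantine_efficiency(q_series)
day_of_inflection = inflection_day(q_series)
print(f"Q_eff = {q_eff:.3f}, inflection on day {day_of_inflection}")
```

A finite-difference slope is sensitive to noise, so on a real learned curve some smoothing before locating the maximum may be needed.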
Europe
Figure 2 shows the comparison of the model-estimated infected and recovered case counts with actual COVID-19 data for the highest affected European countries as of June 1, 2020, namely: Russia, the UK, Spain, and Italy, in that order. We find that, despite the small set of optimized parameters (note that the contact rate β and the recovery rate γ are fixed, and not functions of time), a reasonably good match is seen in all four cases.
Figure 2
Europe: Infected and Recovered COVID-19 Case Count Evolution
COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected European countries as of June 1, 2020.
Figure 3 shows the evolution of the neural network learnt quarantine strength function Q(t) for the considered European nations. Inflection points in Q(t) are seen for the UK, Spain, and Italy at 14, 10, and 16 days, respectively, post detection of the 500th case, i.e., on March 23, March 15, and March 14, respectively. This is in good agreement with the nationwide quarantine imposed on March 25, March 14, and March 9 in the UK, Spain, and Italy, respectively.
Figure 3
Europe: Quarantine Strength Evolution in Response to COVID-19
Quarantine strength Q(t) learnt by the neural network in the highest affected European countries as of June 1, 2020. The transition from the red to blue shaded regions indicates the COVID-19 spread parameter reaching a value C_p < 1, leading to halting of the infection spread. The green dashed line indicates the time when quarantine measures were implemented in the region under consideration, which generally matches well with an inflection point seen in the Q(t) plot, denoted by the red dashed line. For regions in which a clear inflection or ramp-up point is not seen (Russia), the red dashed line is not shown.
Figure 16A shows the comparison of the contact rate β, the quarantine efficiency Q_eff (Equation 12), and the recovery rate γ. It should be noted that the contact and recovery rates are assumed to be constant in our model over the period between the detection of the 500th infected case and June 1, 2020. The average contact rate in Spain and Italy is seen to be higher than in Russia and the UK over the considered duration of 2–3 months, possibly because Russia and the UK were affected relatively late by the virus, which gave sufficient time for the enforcement of strict social distancing protocols before a widespread outbreak. For Spain and Italy, the quarantine efficiency and the recovery rate are generally higher than for Russia and the UK, possibly indicating more efficient testing, isolation, quarantine, and hospital practices in Spain and Italy. This agrees well with the ineffectiveness of testing, contact tracing, and quarantine practices seen in the UK. Although the social distancing strength also varied with time, we do not focus on that aspect in the present study, and it will be the subject of future studies.
A higher quarantine efficiency combined with a higher recovery rate led Spain and Italy to bring down the COVID-19 spread parameter C_p (defined in Equation 6) from C_p > 1 to C_p < 1 in 16 and 25 days, respectively, as compared with 32 days for the UK and 42 days for Russia (Figure 4).
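The "days until the spread parameter drops below 1" comparison can be sketched as a simple threshold search. The example assumes the spread parameter takes the form C_p(t) = β / (γ + δ + Q(t)), with constant β, γ, and δ and a daily series of quarantine strength values; this form, the function names, and every number below are assumptions for illustration, not values from the study.

```python
def spread_parameter(beta, gamma, delta, q):
    """Assumed form of the spread parameter: beta / (gamma + delta + Q)."""
    return beta / (gamma + delta + q)


def days_until_controlled(beta, gamma, delta, q_by_day):
    """First day (1-based) on which the spread parameter is below 1.

    Returns None if the threshold is never crossed within the series.
    """
    for day, q in enumerate(q_by_day, start=1):
        if spread_parameter(beta, gamma, delta, q) < 1.0:
            return day
    return None


# Hypothetical inputs: a linearly increasing quarantine strength.
beta, gamma, delta = 0.3, 0.05, 0.05
q_by_day = [0.015 * day for day in range(1, 61)]

first_controlled_day = days_until_controlled(beta, gamma, delta, q_by_day)
print(first_controlled_day)
```

With these illustrative numbers the threshold is crossed once Q exceeds β - γ - δ = 0.2, which makes explicit how a faster ramp-up of the quarantine strength, or higher recovery rates, shortens the time to control.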
Figure 16
COVID-19 Spread and Subsequent Response of Majorly Affected Continents and Countries Therein
Global comparison of infection, recovery rates, and quarantine efficiency.
Figure 4
Europe: COVID-19 Spread Parameter Evolution in Response to COVID-19
Control of COVID-19 quantified by the evolution of the COVID-19 spread parameter C_p in the highest affected European countries as of June 1, 2020. The transition from the red to blue shaded regions indicates C_p < 1, leading to halting of the infection spread.
Quarantine Efficiency Map for Europe
Figure 5 shows Q_eff for the 23 highest affected European countries. We can see that Q_eff in the western European regions is generally higher than in eastern Europe. This can be attributed to the strong quarantine response measures implemented in western countries, such as Spain, Italy, Germany, and France, after the rise of infections seen first in Italy and Spain. Although countries such as Switzerland and Turkey did not enforce as strict a quarantine response as their west European counterparts, they were generally successful in halting the infection count before it reached catastrophic proportions, due to strong testing and tracing protocols. These countries thus managed to identify potentially infected individuals and prevent them from coming into contact with susceptibles, giving them a high Q_eff score, as seen in Figure 5. In contrast, our study also identifies countries like Sweden, which had very limited quarantine measures, with a low Q_eff score, as seen in Figure 5. This strengthens the validity of our model in diagnosing the effectiveness of quarantine and isolation protocols in different countries, which agrees well with the actual protocols seen in these countries.
Figure 5
Europe: Quarantine Efficiency Heatmap
Quarantine efficiency, Q_eff, defined in Equation 12, for the 23 highest affected European countries. Note that Q_eff is indicative of the quarantine responsiveness: the testing and tracing protocols to identify and isolate infected individuals. The map also shows the demarcation between countries with a high Q_eff, shown by a green dotted line, and those with a low Q_eff, shown by a red dotted line.
USA
Figure 6 shows a reasonably good match between the model-estimated infected and recovered case counts and the actual COVID-19 data for the highest affected North American states (including states from Mexico, the United States, and Canada) as of June 1, 2020, namely: New York, New Jersey, Illinois, and California. Q(t) for New York and New Jersey shows a ramp-up point in the week immediately following the detection of the 500th case in these regions, i.e., on March 19 for New York and on March 24 for New Jersey (Figure 7). This matches well with the actual dates, March 22 in New York and March 21 in New Jersey, when stay-at-home orders and isolation measures were enforced in these states. A relatively slower rise of Q(t) is seen for Illinois, while California showed a ramp-up a week after detection of the 500th case. Although no significant difference is seen in the mean contact and recovery rates between the different US states, the quarantine efficiency in New York and New Jersey is seen to be significantly higher than that of Illinois and California (Figure 16B), indicating the effectiveness of the rapidly deployed quarantine interventions in New York and New Jersey. Owing to the high quarantine efficiency in New York and New Jersey, these states were able to bring down the COVID-19 spread parameter, C_p, to less than 1 in 19 days (Figure 8). On the other hand, although Illinois and California reached close to C_p = 1 after the 30 day and 20 day mark, respectively, C_p still remained greater than 1 (Figure 8), indicating that these states were still in the danger zone as of June 1, 2020. An important caveat to this result is the reporting of the recovered data.
Figure 6
USA: Infected and Recovered COVID-19 Case Count Evolution
COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected USA states as of June 1, 2020.
Figure 7
USA: Quarantine Strength Evolution in Response to COVID-19
Quarantine strength Q(t) learnt by the neural network in the highest affected USA states as of June 1, 2020. The transition from the red to blue shaded regions indicates the COVID-19 spread parameter reaching a value C_p < 1, leading to halting of the infection spread. The green dashed line indicates the time when quarantine measures were implemented in the region under consideration, which generally matches well with an inflection point (for New York, New Jersey, and Illinois) or a ramp-up point (California) seen in the Q(t) plot, denoted by the red dashed line.
Figure 8
USA: COVID-19 Spread Parameter Evolution in Response to COVID-19
Control of COVID-19 quantified by the evolution of the COVID-19 spread parameter C_p in the highest affected USA states as of June 1, 2020. The transition from the red to blue shaded regions indicates C_p < 1, leading to halting of the infection spread.
Compared with Europe, the recovery rates seen in North America are significantly lower (Figures 16A and 16B). It should be noted that accurate reporting of recovery rates is likely to play a major role in this apparent difference. In our study, the recovered data include individuals who cannot further transmit infection, and thus include treated patients who are currently in a healthy state and also individuals who died due to the virus. Since deaths can be quantified in a robust manner, the death data are generally reported more accurately. However, there is no clear definition for quantifying the number of people who transitioned from infected to healthy. As a result, the accuracy and timeliness of recovered data vary significantly between countries, with underreporting of the recovered data being a common practice.
Since the effective reproduction number calculation depends on the recovered case count, accurate data regarding the recovered count are vital to assess whether the infection has been curtailed in a particular region or not. Thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data-driven assessment of the pandemic spread.
Quarantine Efficiency Map for the USA
Figure 9A shows the quarantine efficiency for 20 major US states spanning the whole country. Figure 9B shows the comparison between a report published in the Wall Street Journal on May 21, highlighting USA states based on the quarantine measure strength, and the quarantine efficiency magnitude in our study. The size of the circles represents the magnitude of the quarantine efficiency. The blue color indicates the states for which the quarantine efficiency was greater than the mean quarantine efficiency across all US states, while red indicates the opposite. Our results indicate that the north-eastern and western states were much more responsive in implementing rapid quarantine measures in the month following early detection, as compared with the southern and central states. This matches the on-ground situation, as indicated by a generally strong correlation between the red circles in our study (states with lower quarantine efficiency) and the yellow regions in the Wall Street Journal report (states with reduced imposition of restrictions), and between the blue circles in our study (states with higher quarantine efficiency) and the blue regions in the Wall Street Journal report (states with a generally higher level of restrictions). This strengthens the validity of our approach, in which the quarantine efficiency is recovered through a trained neural network rooted in fundamental epidemiological equations.
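The blue/red grouping described above amounts to comparing each state's quarantine efficiency with the mean over all states. A minimal sketch of that split follows; the function name, the region labels, and the efficiency values are hypothetical placeholders and are not results from the study.

```python
from statistics import mean


def split_by_mean_efficiency(q_eff_by_region):
    """Split regions into those above and those at or below the mean efficiency."""
    threshold = mean(q_eff_by_region.values())
    above = sorted(r for r, q in q_eff_by_region.items() if q > threshold)
    below = sorted(r for r, q in q_eff_by_region.items() if q <= threshold)
    return threshold, above, below


# Hypothetical placeholder values, not results from the study.
example = {"region_a": 0.40, "region_b": 0.10, "region_c": 0.25, "region_d": 0.05}

threshold, above_mean, below_mean = split_by_mean_efficiency(example)
print(f"mean = {threshold:.2f}")
print("above mean (blue):", above_mean)
print("at or below mean (red):", below_mean)
```

A mean-based threshold is sensitive to a few very high or very low values; a median split would be a natural alternative if the distribution of efficiencies is skewed.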
Figure 9
USA: Quarantine Efficiency Heatmap and Its Comparison with Ground Truth Data
(A) Quarantine efficiency, Q_eff, defined in Equation 12, for 20 major USA states. Note that Q_eff is indicative of the quarantine responsiveness: the testing and tracing protocols to identify and isolate infected individuals.
(B) Comparison between a report published in the Wall Street Journal on May 21 and the quarantine efficiency magnitude in our study. A generally strong correlation is seen between the magnitude of quarantine efficiency in our study and the level of restrictions actually imposed in different USA states.
Asia
Figure 10 shows a reasonably good match between the model-estimated infected and recovered case counts and the actual COVID-19 data for the highest affected Asian countries as of June 1, 2020, namely India, China, and South Korea. The quarantine strength shows a rapid ramp-up in China and South Korea (Figure 11), which agrees well with cusps in government interventions that took place in the weeks leading up to and after the end of January and February for China and South Korea, respectively. On the other hand, a slow buildup of the quarantine strength is seen for India, with no significant ramp-up. This is reflected in the quarantine efficiency comparison (Figure 16C), in which the efficiency is much higher for China and South Korea than for India. South Korea shows a significantly lower contact rate than its Asian counterparts, indicating strongly enforced and followed social distancing protocols. No significant difference in the recovery rate is observed between the Asian countries. Owing to the high quarantine efficiency in China, and a high quarantine efficiency coupled with strongly enforced social distancing in South Korea, these countries were able to bring the COVID-19 spread parameter from above 1 to below 1 in 21 and 13 days, respectively, while it took 33 days in India (Figure 12).
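The number of days needed to bring the spread parameter below 1 can be read off a time series of the learnt quarantine strength. The sketch below assumes the spread parameter takes the form β / (γ + Q(t)), with contact rate β, recovery rate γ, and quarantine strength Q(t), which is our reading of its definition earlier in the paper; the function name and the numerical values are hypothetical and chosen only for illustration.

```python
def days_until_controlled(beta, gamma, quarantine_strength):
    """Return the first day index at which beta / (gamma + Q) drops below 1.

    Assumes the spread parameter has the form beta / (gamma + Q(t)).
    Returns None if the parameter never drops below 1 in the series.
    """
    for day, q in enumerate(quarantine_strength):
        if beta / (gamma + q) < 1.0:
            return day
    return None


# Hypothetical values, not fitted to any country.
q_series = [0.0, 0.1, 0.2, 0.3, 0.4]
control_day = days_until_controlled(beta=0.5, gamma=0.2, quarantine_strength=q_series)
```

With these placeholder values the parameter equals exactly 1 on day 3 and first falls below 1 on day 4.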
Figure 10
Asia: Infected and Recovered COVID-19 Case Count Evolution
COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected Asian countries as of June 1, 2020.
Figure 11
Asia: Quarantine Strength Evolution in Response to COVID-19
Quarantine strength learnt by the neural network in the highest affected Asian countries as of June 1, 2020. The transition from the red to blue shaded regions indicates the COVID-19 spread parameter falling below 1, leading to halting of the infection spread. The green dashed line indicates the time when quarantine measures were implemented in the region under consideration, which generally matches well with a ramp-up point seen in the plot, denoted by the red dashed line. For regions in which a clear inflection or ramp-up point is not seen (India), the red dashed line is not shown.
Figure 12
Asia: COVID-19 Spread Parameter Evolution in Response to COVID-19
Control of COVID-19 quantified by the COVID-19 spread parameter evolution in the highest affected Asian countries as of June 1, 2020. The transition from the red to blue shaded regions indicates the spread parameter falling below 1, leading to halting of the infection spread.
South America
Figure 13 shows a reasonably good match between the model-estimated infected and recovered case counts and the actual COVID-19 data for the highest affected South American countries as of June 1, 2020, namely Brazil, Chile, and Peru. For Brazil, the quarantine strength is approximately constant initially, with a ramp-up around the 20-day mark, after which it stagnates (Figure 14). The key difference between the COVID-19 progression in Brazil and in other nations is that the infected and the recovered (recovered healthy + dead in our study) counts remained very close to one another as the disease progressed (Figure 13). Owing to this, the new infected people introduced into the population were balanced by the infected people removed from it, either by recovering or by dying. This higher recovery rate, combined with a generally low quarantine efficiency and contact rate (Figure 16D), keeps the COVID-19 spread parameter for Brazil below 1 for almost the entire duration of the disease progression (Figure 15). For Chile, the quarantine strength is almost constant for the entire duration considered (Figure 14). Thus, although government regulations were imposed swiftly following the initial detection of the virus, leading to a high initial magnitude of the quarantine strength, the government imposition was subsequently relaxed. This may be attributed to several political and social factors outside the scope of the present study. For Chile too, the infected and recovered counts remain closer to each other than in other nations. A generally high quarantine magnitude coupled with a moderate recovery rate (Figure 16D) keeps the spread parameter below 1 for the entire duration of disease progression (Figure 15). In Peru, the quarantine strength shows a very slow buildup (Figure 14) with a very low magnitude. Also, the recovered count is lower relative to the infected count than in its South American counterparts (Figure 13). A low quarantine efficiency coupled with a low recovery rate (Figure 16D) leaves Peru in the danger zone (spread parameter above 1) for 48 days after detection of the 500th case (Figure 15).
Figure 13
South America: Infected and Recovered COVID-19 Case Count Evolution
COVID-19 infected and recovered evolution compared with our neural network augmented model prediction in the highest affected South American countries as of June 1, 2020.
Figure 14
South America: Quarantine Strength Evolution in Response to COVID-19
Quarantine strength learnt by the neural network in the highest affected South American countries as of June 1, 2020. The transition from the red to blue shaded regions indicates the COVID-19 spread parameter falling below 1, leading to halting of the infection spread. The green dotted line indicates the time when quarantine measures were implemented in the region under consideration.
Figure 15
South America: COVID-19 Spread Parameter Evolution in Response to COVID-19
Control of COVID-19 quantified by the COVID-19 spread parameter evolution in the highest affected South American countries as of June 1, 2020. The transition from the red to blue shaded regions indicates the spread parameter falling below 1, leading to halting of the infection spread.
Figure 16
COVID-19 Spread and Subsequent Response of Majorly Affected Continents and Countries Therein
Global comparison of infection, recovery rates, and quarantine efficiency.
Discussion
Our model captures the infected and recovered counts for highly affected countries in Europe, North America, Asia, and South America reasonably well, and is thus globally applicable. Along with capturing the evolution of infected and recovered data, the novel machine learning-aided epidemiological approach allows us to extract valuable information regarding the quarantine policies, the evolution of the COVID-19 spread parameter, the mean contact rate (social distancing effectiveness), and the recovery rate. Thus, it becomes possible to compare across different countries, with the model serving as an important diagnostic tool.
Our results show a generally strong correlation between strengthening of the quarantine controls, i.e., increasing quarantine strength as learnt by the neural network model; actions taken by the regions' respective governments; and decrease of the COVID-19 spread parameter for all continents considered in the present study.
Based on the COVID-19 data collected (details in the Experimental Procedures), we note that the accuracy and timeliness of reporting of recovered data vary significantly between countries, with underreporting of the recovered data being a common practice. In the North American countries, for example, the recovered data are significantly lower than in the European and Asian counterparts. Thus, our results strongly indicate the need for each country to follow a particular metric for estimating the recovered count robustly, which is vital for data-driven assessment of the pandemic spread.
The key highlights of our model are: (1) it is highly interpretable, with few free parameters rooted in an epidemiological model, (2) it relies only on COVID-19 data and not on previous epidemics, and (3) it is highly flexible and adaptable to different compartmental modeling assumptions. In particular, our method can be readily extended to more complex compartmental models, including hospitalization rates, testing rates, and distinction between symptomatic and asymptomatic individuals. Thus, the methodology presented in the present study can be readily adapted to any province, state, or country globally, making it a potentially useful tool for policy makers in the event of future outbreaks or a relapse in the current one.
Finally, we have hosted our quarantine diagnosis results for the top 70 affected countries worldwide on a public platform (https://covid19ml.org/ or https://rajdandekar.github.io/COVID-QuarantineStrength/), which can be used for informed decision making by public health officials and researchers alike. We believe that such a publicly available global tool will be of significant value for researchers who want to study the correlation between the quarantine strength evolution in a particular region and a wide range of metrics, spanning from mortality rate to the socio-economic impact of COVID-19 in that region.
Currently, our model lacks forecasting abilities. To do robust forecasting based on previous data available, the model needs to be further augmented through coupling with real-time metrics parameterizing social distancing, e.g., the publicly available Apple mobility data. This could be the subject of future studies.
Experimental Procedures
Resource Availability
Lead Contact
Further information and requests for resources should be directed towards Professor George Barbastathis, MIT. Email: gbarb@mit.edu.
Materials Availability
All the results of our work are hosted publicly at covid19ml.org. Preliminary versions of this work can be found at medRxiv 2020.04.03.20052084 and arXiv:2004.02752.
Data and Code Availability
Data for the infected and recovered case count in all regions were obtained from the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. All code files are available at https://github.com/RajDandekar/MIT-Global-COVID-Modelling-Project-1. All results are publicly hosted at https://covid19ml.org/ or https://rajdandekar.github.io/COVID-QuarantineStrength/.
Augmented QSIR Model: Initial Conditions
The starting point for each simulation was the day on which the infected case count crossed 500. The number of susceptible individuals was assumed to be equal to the population of the considered region. Also, in all simulations, the number of recovered individuals was initialized from the data on that starting day. The quarantined population is initialized to a small number.
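The initialization described above can be sketched as follows. This is an illustrative helper rather than the code used in our study: the function name, the dictionary keys, and in particular the default value of the initial quarantined population (`quarantined0`) are assumptions made for the example.

```python
def initial_conditions(infected, recovered, population, threshold=500, quarantined0=1.0):
    """Build the initial state from daily case-count series.

    The start day is the first day on which the infected count reaches
    `threshold`. `quarantined0` is a hypothetical small placeholder value.
    """
    for day, count in enumerate(infected):
        if count >= threshold:
            return {
                "start_day": day,
                "susceptible": float(population),    # whole population assumed susceptible
                "infected": float(count),
                "recovered": float(recovered[day]),  # taken from data on the start day
                "quarantined": float(quarantined0),  # small number, value assumed
            }
    raise ValueError("infected count never reaches the threshold")


# Hypothetical daily series, for illustration only.
state0 = initial_conditions(
    infected=[100, 300, 520, 900],
    recovered=[0, 5, 12, 40],
    population=1_000_000,
)
```

In this example the simulation would start on day index 2, the first day with at least 500 infected cases.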
Augmented QSIR Model: Parameter Estimation
The time-resolved data for the infected and recovered case counts for each locale considered were obtained from the CSSE at Johns Hopkins University. The neural network augmented SIR ODE system was trained by minimizing the mean square error loss function given in (Equation 13), whose arguments include the neural network's weights W. For most of the regions under consideration, all parameters were optimized jointly by minimizing this loss function. Minimization was performed using local adjoint sensitivity analysis, following a procedure similar to that outlined in a recent study, with the ADAM optimizer and a learning rate of 0.01. The iterations required for convergence varied based on the region considered and generally ranged from 40,000 to 100,000. For regions with a low recovered count (all US states and the UK), we used a two-stage optimization procedure to find the optimal parameters. In the first stage, the loss function in (Equation 13) was minimized. For the second stage, we fix the parameters that govern the recovered count at the optimal values found in the first stage and optimize the remaining parameters using a loss function defined on the infected count alone. In the second stage, we do not include the recovered count in the loss function, since it depends on the parameters that have already been optimized in the first stage. By placing more emphasis on fitting the infected count, such a two-stage procedure leads to much more accurate model estimates when the recovered data count is low. The iterations required for convergence in both stages varied based on the region considered and generally ranged from 30,000 to 100,000.
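The structure of the two-stage procedure can be illustrated on a toy problem. The sketch below is not the QSIR system and does not use adjoint sensitivity analysis: it assumes a hypothetical linear toy model with two parameters, uses plain (non-logarithmic) mean square errors, and estimates gradients by central finite differences. Only the ADAM update rule, the learning rate of 0.01, and the freeze-then-refit structure mirror the text.

```python
import math


def adam_minimize(loss, x0, lr=0.01, steps=2000, b1=0.9, b2=0.999, eps=1e-8, h=1e-6):
    """Minimize loss(x) over a list of floats with ADAM and finite-difference gradients."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimates
    v = [0.0] * len(x)  # second-moment estimates
    for t in range(1, steps + 1):
        grad = []
        for i in range(len(x)):
            up = x[:i] + [x[i] + h] + x[i + 1:]
            down = x[:i] + [x[i] - h] + x[i + 1:]
            grad.append((loss(up) - loss(down)) / (2.0 * h))
        for i, g in enumerate(grad):
            m[i] = b1 * m[i] + (1.0 - b1) * g
            v[i] = b2 * v[i] + (1.0 - b2) * g * g
            m_hat = m[i] / (1.0 - b1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - b2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


def mse(pred, data):
    """Mean square error between two equally long series."""
    return sum((p - d) ** 2 for p, d in zip(pred, data)) / len(data)


def toy_model(beta, gamma, times):
    """Hypothetical toy model (NOT the QSIR equations): two linear series."""
    infected = [(beta - gamma) * t for t in times]
    recovered = [gamma * t for t in times]
    return infected, recovered


times = [float(t) for t in range(10)]
infected_data, recovered_data = toy_model(0.5, 0.2, times)  # synthetic "data"


def stage_one_loss(params):
    """Joint loss on the infected and recovered series."""
    beta, gamma = params
    infected, recovered = toy_model(beta, gamma, times)
    return mse(infected, infected_data) + mse(recovered, recovered_data)


# Stage 1: optimize all parameters on the joint loss.
beta_stage1, gamma_stage1 = adam_minimize(stage_one_loss, [0.0, 0.0])


def stage_two_loss(params):
    """Infected-only loss with gamma frozen at its stage-one value."""
    (beta,) = params
    infected, _ = toy_model(beta, gamma_stage1, times)
    return mse(infected, infected_data)


# Stage 2: refit the remaining parameter on the infected series alone.
(beta_stage2,) = adam_minimize(stage_two_loss, [beta_stage1])
```

The parameter governing the toy recovered series (`gamma`) is fixed after the first stage, so the second stage can concentrate entirely on the infected series, as in the procedure described above.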
Parameter Inference: Gaussian Process Residue Model
To validate the robustness of the model and the uniqueness of the parameters recovered by the model, we consider a Gaussian process residue model for uncertainty quantification. Gaussian processes have emerged as a useful tool for regression, classification, clustering, and uncertainty quantification. Gaussian process regression can be viewed as a Bayesian inference problem in which we want to recover the posterior for the regression function that best approximates the training data. The novelty of such an approach stems from placing the prior probability distribution over a function space rather than over a finite set of parameters. Any finite set of function values drawn from such a distribution follows a multivariate normal distribution, which allows for exact estimation of the posterior distribution. The covariance underlying the function space distribution is specified by the kernel function. The kernel function affects the shape and noise of the resulting posterior distribution. In the present study, we fit a Gaussian process regression model to the residual error between the best fit model (described earlier in the section on the augmented QSIR model and optimized using the method described in the previous section) and the data. For the prior over the function space, we use a mean of zero and a variance described by a squared exponential kernel with a lengthscale of 1 and a significantly high signal standard deviation, which allows for noisy estimates of the posterior. Such a fitted model for the infected and recovered case count for Russia is shown in Figure 17. It should be noted that the recovered optimal posterior is not a deterministic function, but a distribution over functions. Subsequently, we sampled 500 error residues from this model and superimposed them on the best fit predictions to simulate 500 samples of the infected and recovered case count data.
Finally, we applied our model, described in the section on the augmented QSIR model and optimized using the method described in the previous section, to these samples. Figure 18 shows inferred parameters for 500 realizations of the Gaussian process residue model superimposed on the best fit model prediction applied to Russia and shown for (1) the quarantine strength function, (2) the contact rate β, and (3) the recovery rate. It can be seen that, for all realizations, the quarantine strength function follows a similar behavior, which lies close to the best fit model prediction. In addition, the inferred histograms for the contact rate β and the recovery rate show a peak that is close to the best fit model prediction. This further validates the robustness of the model and strengthens the uniqueness of the parameters recovered by the model. Similar figures for all other countries are shown in the Supplemental Information.
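The step of superimposing sampled residues on the best fit prediction can be sketched with the standard library alone. The example below is a simplification: it draws residues from a zero-mean Gaussian process with a squared exponential kernel (lengthscale 1) rather than from the fitted posterior, which would additionally condition on the observed residues. The signal standard deviation, the best fit curve, and all function names are hypothetical.

```python
import math
import random


def sq_exp_kernel(t1, t2, lengthscale=1.0, signal_std=1.0):
    """Squared exponential covariance between two time points."""
    return signal_std ** 2 * math.exp(-((t1 - t2) ** 2) / (2.0 * lengthscale ** 2))


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def sample_residues(times, n_samples, signal_std, lengthscale=1.0, seed=0):
    """Draw zero-mean Gaussian process samples at the given time points."""
    rng = random.Random(seed)
    n = len(times)
    # A small diagonal jitter keeps the factorization numerically stable.
    cov = [
        [
            sq_exp_kernel(times[i], times[j], lengthscale, signal_std)
            + (1e-8 * signal_std ** 2 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]
    lower = cholesky(cov)
    draws = []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated sample = L z, where L is the Cholesky factor of the covariance.
        draws.append([sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)])
    return draws


times = [float(day) for day in range(15)]
best_fit = [500.0 * math.exp(0.1 * t) for t in times]  # hypothetical best fit curve
residues = sample_residues(times, n_samples=500, signal_std=20.0)  # hypothetical std
synthetic_data = [[b + r for b, r in zip(best_fit, draw)] for draw in residues]
```

Each of the 500 entries of `synthetic_data` plays the role of one simulated case count series to which the model would then be refitted.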
Figure 17
Gaussian Process Residue Regression Model
Gaussian process residue model fitted to (A) the infected case count and (B) the recovered case count for Russia.
Figure 18
Parameter Inference to Demonstrate Robustness of QSIR Model Recovered Parameters
Inferred parameters for 500 realizations of the Gaussian process residue model superimposed on the best fit model prediction applied to Russia and shown for (A) the quarantine strength function and (B) the contact rate β and the recovery rate. A total of 30 million iterations were performed on the MIT Supercloud cluster to generate parameter histograms for one country.