Vladislav Soukhovolsky1, Anton Kovalev1, Anne Pitt1, Boris Kessel2. 1. Federal Research Center "Krasnoyarsk Science Center of the Siberian Branch of the Russian Academy of Sciences". 2. Hillel Yaffe Medical Center, Rappoport Medical School, Technion, Haifa, Israel.
Abstract
А model of coronavirus incidence is proposed. Process of disease development is represented as analogue of first- and second order phase transition in physical systems. The model is very simple in terms of the data necessary for the calculations. To verify the proposed model, only data on the current incidence rate are required. However, the determination coefficient of model R2 is very high and exceeds 0.95 for most countries. The model permits the accurate prediction of the pandemics dynamics at intervals of up to 10 days. The ADL(autoregressive distributed lag)-model was introduced in addition to the phase transition model to describe the development of the disease at the exponential phase.The ADL-model allows describing nonmonotonic changes in relative infection over the time, and providing to governments and health care decision makers the possibility to predict the outcomes of their decisions on public health.
А model of coronavirus incidence is proposed. Process of disease development is represented as analogue of first- and second order phase transition in physical systems. The model is very simple in terms of the data necessary for the calculations. To verify the proposed model, only data on the current incidence rate are required. However, the determination coefficient of model R2 is very high and exceeds 0.95 for most countries. The model permits the accurate prediction of the pandemics dynamics at intervals of up to 10 days. The ADL(autoregressive distributed lag)-model was introduced in addition to the phase transition model to describe the development of the disease at the exponential phase.The ADL-model allows describing nonmonotonic changes in relative infection over the time, and providing to governments and health care decision makers the possibility to predict the outcomes of their decisions on public health.
The COVID 19 pandemic definitively changed the face of the modern world. It emerged even the best national public health care systems before a huge trouble. COVID 19 is a mass causalities event per definition: a situation when the amount of the acute care patients overwhelmed emergency medical services resources, such as personnel and equipment. For "war plan", the intelligence is crucial.Daily data about the coronavirus pandemic, including the number of new cases and deaths, is collected internationally. The use of these data makes possible the modelling of the international disease spread. However, due to the limited reliability of morbidity and mortality statistics, it is still questionable whether current databases may be used to assess the patterns of the epidemic and its prognosis.Moreover, neither the significant percentage of asymptomatic virus-infected people nor the real number of recovered patients has not been well-established. Nevertheless, despite the limited and uncertain data, the mathematic modelling of the pandemic's progression is crucial for predicting the possible future spread, preparing the next steps to combat it, and correctly evaluating the efficacy of governmental public health measures already performed.One of the most important information components is the possibility to develop the high-grade accuracy matemathics model of epidemics progression allowing the medical care providers to organize better the national resources.Adequate modelling also establishes a framework for future evaluation of this and other pandemics.This paper describes novel infectious disease modelling, covering many aspects of COVID-19 dynamics.
Material and methods
For statistical analysis of coronavirus incidence dynamics, data from the World Health Organization [1] and from Johns Hopkins University [2] from 01.01.2020 to 04.17.2020 was collected.There are many infectious disease spread models, such as SIR, SEIRD, and SEIHR. Most of them use differential equation systems (continuous time) or differenсе equations (discrete time) for describing population dynamics with different characteristics in relation to infection [3], [4], [5], [6], [7].In the simplest SIR-model, the entire population is divided into three groups: Susceptible (S), Infected (I) and Removed (R) [8] The SIR model is a homogeneous Markov chain with a single absorbing state R, which can be imagined as a graph with the states of the S, I, R process and with the transitions between the states. The probability
defines the potential possibility of moving from state i to state j А .Verification of the parameters of the SIR-model is based on daily medical statistics. However, in this case, the value of variable I (where patients with latent signs should be included) will not be the same as the I values from the statistics.Since, at the early stages of an epidemic, the number of S susceptible patients is unproportionally much higher than the number of actually ill patients, and the rate of recovery is less than the rate of infection, the SIR-model defines the S value as constant. Thus, at the initial stage of an epidemic's development, the dynamics of infection can be represented as follows:where m = βS = const.However, model [1] is unable to recognize the nonmonotonic changes of process dynamics (for example, resulting from quarantine measures). In general, the SIR model describes the ending of the epidemic, as a process of exhausting the extent of group S, assuming in advance that the total population will be infected.Most other modelling systems use Markov models that have all the above-mentioned disadvantages. [9,10]. The proposed bifurcation models that include the sudden increases in incidence at certain stages of the epidemic [11] are regression models and do not explain the reasons for such bifurcations.We propose that the prevalence of coronavirus infected population can exist in one of two states. The number of cases is low and is kept constant in time at the initial low-infection stage of the disease. The viral epidemic goes to an exponential stage with very high incidence later. The coronavirus pandemic may be interpreted as the biological analogue of a well-known physics phenomenon, first-order phase transition. An example of such a phase transition is the condensation of a substance from gaseous to liquid state, when the density of gas molecules increases sharply from some critical temperature or pressure. In general, the function of thermodynamic potential (free energy) may be presented as a bimodal function with two local minimums and a maximum between these lows - a potential barrier.Theory and Calculations:Theory:With the bimodal function of the F(x) state, the system could be near one of two stable states - low x or high x. A "soft" system with a small height and a small half-width of a potential barrier easily alters its phase state. On the contrary, a "tough" system with a high potential barrier and a high half-width value of a potential barrier will last quite some time in this condition.The transition from a stable state with low incidence to an epidemic state will occur when some critical value of x of incidents is reached. The ecological potential of the population morbidity F(x) will be defined as follows: consider a long time series of coronavirus cases since the first patient. In this series, there will be a significant proportion of events (days in time series) with cases close to x1. These events will be responsible for the state of the low-incidence population. The other part of the events will have values close to x, the state of epidemic. Finally, a small proportion of the events will have values close to the critical value of x
r. When, rarely, there are many cases close to this value, there is a high rate of system transfer from x1 to x2.If the proportion of people with this disease is low, and the proportion of time when the population is observed with a slightly higher incidence compared to x1 is small, then condition x1 can be said to be stable.The potential function of F(x) can be represented as the reverse function of the daily infection incidents’ density distribution function. In a state of initial stage of epidemic with low incidence,the number of cases will be near the x1 and value of F(x will be small. If we consider the epidemic development as a first-order phase transition, this population will be near the critical value of xr number of patients for a very short time, and the value of the function F(x will be large.The process of the pandemic's growth can be considered as an analogue to a second order phase transition, in which the connection of the system's integral characteristics (the order parameter w, varying from 0 to 1) to the impact of the external factor Y is described by the equation
[12,13]. In this version of the second order phase transition model, the role of the external factor Y is significant from the pandemic's absolute beginning on the planetand w(T) is the portion of countries with coronavirus incidents at time T.To simulate the processes of coronavirus dynamics in an epidemic state, we will use data on the current daily incidence of disease [I(t)-I(t-1)]. Since the n population in different countries can vary greatly - from billions to tens of thousands, we obtained, as comparable data for each country, indicators of relative current incidence introduced as per inhabitant. To reduce the variance, the value of lnp(t) will be considered instead p(t).Under the assumption that the reported incidence on day t, after the onset of the epidemic, depends both on the duration of the t epidemic since the beginning of its exponential stage and on the disease in k of the previous days. Then, in the logarithmic scale, you can use the ADL (k,1) (autoregressive distributed lag) - model [14].where p(t) - relative morbidity, t - duration of the exponential phase of the epidemic (number of days since its beginning), a - odds, k - the order of autogressing, ε(t) - noise.Eq. (2) can be considered as a regression with the availablevalues of the current relative morbidity, the relative morbidity during the previous days, and the duration of time since the beginning of the exponential epidemic stage. To calculate regression coefficients, it is necessary to evaluate the order of k autoregression. The easiest way to estimate k is to calculate the partial autocorrelation function (PACF) of the simulated time series and to determine the number of PACF members that are significantly different from zero [15].The quality of approximation with the ADL model may be determined by two indicators - the degree of consent of the accounting data and the model number of cases and the magnitude of the phase shift between the number of accounting data and the model time line. The consent of the amplitude of model and in-kind data was assessed by the R regressive equation determination ratio [2]. To determine the phase shift of the model and the natural series of morbidity, it is possible to calculate the cross-correlation function (CCF) of the model time series and the number of in-kind data.Calculations:The coronavirus epidemic model as a first-order phase transition was tested on data of the disease's development in 55 administrative divisions of the United States. According to the data from 01.22.2020 to 04.12.2020, the f(x) function was built to distribute the number of cases in one day and the reverse function of F(x).For states with different populations and different levels of morbidity, critical numbers are very close when the exponential phase of the epidemic begins (Table 1
).
Table 1
The threshold values of the number of cases in four different states of the United States.
State
population
incidents on 04.12.2020
Threshold
Illinois
12 671 821
20 852
25
North Dakota
762 062
308
19
California
39 512 223
22 795
21
Wyoming
578 759
270
26
The threshold values of the number of cases in four different states of the United States.Similar data was obtained for all countries from the WHO registry (
Fig. 3
).
Fig. 3
The potential function F(x) and the first-order phase transition for patients in all countries from the WHO registry from 01.01.2020 to 04.08.2020.
The potential function F(x) and the first-order phase transition for patients throughout the United States from 01.01.2020 to 04.12.2020 (total 555,152 cases).The dynamics of coronavirus development in different US states. Data is aligned with the start date of the exponential phase of the epidemic in the state, taken as t = 0 (1 - North Dakota, 2 - California, 3 - Illinois, 4 - Wyoming).The potential function F(x) and the first-order phase transition for patients in all countries from the WHO registry from 01.01.2020 to 04.08.2020.The peak function of F(x) accounts for the critical number of cases = 30. Thus, this means that the epidemic is probably going to an exponential phase when the number of infected is between 20 to 30 people in a country.It is possible on next step to calculate the change in time of incidence within large territories or the planet as a whole, using the second order phase transition model. Fig. 4
shows a change in the time of the order parameter square for 211 countries in the WHO registry.
Fig. 4
The dynamics of coronavirus disease on the planet. 1 - initial stage of infection (-single country - China), 2 – initial stage of infection, 3 - inhibition of the infection process, 4 - stage of mass infection, 5 - complete infection of the planet.
The dynamics of coronavirus disease on the planet. 1 - initial stage of infection (-single country - China), 2 – initial stage of infection, 3 - inhibition of the infection process, 4 - stage of mass infection, 5 - complete infection of the planet.A similar pattern is true for a large country such as the United States. Fig. 5
shows the change, over time, in the relative number of states and other administrative units in the U.S. where the coronavirus was registered (data taken from the Johns Hopkins University site).
Fig. 5
The dynamics of coronavirus in the United States. 1 - Initial stage of infection (three states), 2 – inhibition stage of the infection process, 3 - mass infection stage, 4 - complete infection of the country.
The dynamics of coronavirus in the United States. 1 - Initial stage of infection (three states), 2 – inhibition stage of the infection process, 3 - mass infection stage, 4 - complete infection of the country.Fig. 6 shows the dynamics of the relative coronavirus incidence in the United States from 01.23.2020 to 04.08.2020.
Fig. 6
Dynamics of the relative coronavirus incidence for the United States (1 - lag phase; 2 - exponential phase).
Dynamics of the relative coronavirus incidence for the United States (1 - lag phase; 2 - exponential phase).As the epidemic develops in a single country, the following phases may be identified:- lag-phase, when a consistently low number of cases is recorded over a long period (1 - 2 people per country);- exponential phase, when the relative number of cases (the number of cases per inhabitant) increases exponentially over time;- stabilization and fading phase, when the relative number of cases does not change over time and later begins to decrease.As shown in Fig. 6, the lag-phase of the disease lasted from January 23 (the date on which the first patient was recorded, according to WHO) until March 6, after which the disease went into an exponential phase, which can be approximated by the equation (where h(t) is the estimated relative number of cases).The ADL model may be used to more accurately approximate incidence data. To estimate the order of the autocorrelation member in Eq. (2), a partial autocorrelation function (PACF) of a number of statistical data differentials and exponential equations {ln p(t) – h(t)} was calculated. This transformation brings the data of a different time range to a stationary form. Fig. 7
shows PACF for the U.S. time series from 03.06.2020 to 04.08 2020.
Fig. 7
PACF time series for USA (1 PACF; 2 - St. Err.).
PACF time series for USA (1 PACF; 2 - St. Err.).It has shown in Fig. 7, that the order k of the autocorrelation function in [2] equals 2 and, therefore, it is possible to write the equation of the morbidity dynamics:Equation ratios [3] can be interpreted as the susceptibility of magnitude ln p(t) to change values ln p(t-1), ln p(t-2) and t:If the susceptibilities [4] are positive, then with the growth of ln p(t-1), ln p(t-2), relative contamination rates b(t) also grow. With negative values of [4], relative infection rates fall.The calculations showed that, not only for the U.S., but also for other countries, the order k of autoregression for model [3] may be adopted equal to 2. Fig. 8
reflects the epidemic's dynamics in the U.S. and the model's calculations [3] with the Statistica 10 package.
Fig. 8
Data for USA (1), the model using data to 04.04.2020 (2) and the forecast (3) according to the model (3) from 04.09.2020 to 04.17.2020.
Data for USA (1), the model using data to 04.04.2020 (2) and the forecast (3) according to the model (3) from 04.09.2020 to 04.17.2020.The calculated data for the parameters of the ADL model for the exponential phase of the disease in the United States are presented in Table 2
.
Table 2.
Parameters of the ADL-model of the epidemic in the United States.
Parameters of model
value
Std.Err.
t(32)
p-level
a0
0.429
0.756
0.568
0.574
b
-0.010
0.011
-0.962
0.343
a2
-0.096
0.185
-0.519
0.607
a1
1.099
0.176
6.236
0.000
R2
0.996
F
2 662.1
St. err. - standard error of the coefficient, t - value of t-Student criterion for the coefficient; p -confidence level.
Parameters of the ADL-model of the epidemic in the United States.St. err. - standard error of the coefficient, t - value of t-Student criterion for the coefficient; p -confidence level.The values of a> 1 can be characterized as a parameter proportional to the relative number of latent and manifest cases at the moment (t-1). As can be seen from Table 2, the determination factor R ≈ 0.95 for the ADL model is higher than the R ≈ 0.85 for a simple exponential model (Fig. 6). Temporary series of data on relative morbidity and model calculations by [3] is synphasse. This is seen on the cross-correlation function (Fig. 9
).
Fig. 9
The cross-correlation function (CCF) of statistics data and model series for United States. (1 - CCF; 2 - St. Err.).
The cross-correlation function (CCF) of statistics data and model series for United States. (1 - CCF; 2 - St. Err.).In a number of countries, the disease intensity begins to decrease, and a monotonic exponential model cannot describe this process. Fig. 10
shows data on the incidence of coronavirus in Russia.
Fig. 10
Data for Russia (1), the model according to the data until 04.04.2020 (2) and the forecast (3) according to the model from 04.09.2020 to 04.17.2020.
Data for Russia (1), the model according to the data until 04.04.2020 (2) and the forecast (3) according to the model from 04.09.2020 to 04.17.2020.Table 3 shows the calculations of the ADL model's parameters model to describe the epidemic in Russia.
Table 3
Parameters of the ADL-model of the epidemic's development in Russia.
Parameters of model
value
Std.Err.
t(24)
p - level
a0
-10.516
2.805
-3.750
0.001
b
0.115
0.032
3.620
0.001
a2
-0.369
0.172
-2.146
0.042
a1
0.761
0.169
4.489
0.000
R2
0.987
F
709 200
Parameters of the ADL-model of the epidemic's development in Russia.As shown in Table 3 and Fig. 10, the ADL model describes very well the development dynamics of the disease in Russia. Similar calculations of the ADL-model coefficients were performed for a number of countries and are summarized in Table 4
.
Table 4
Model parameters (3) for different countries.
Parameters of model
Korea
USA
Israel
Russia
Italy
Germany
Spain
a0
-4.993
0.429
-1.125
-10.516
-0.067
-0.3480
-0.019
b
-0.007
-0.010
0.004
0.115
-0.004
-0.0010
-0.004
a2
-0.013
-0.096
-0.022
-0.369
-0.198
-0.1804
-0.383
a1
0.631
1.099
0.932
0.761
1.176
1.1381
1.364
R2
0.770
0.996
0.957
0.987
0.990
0.9830
0.995
F
48.9
2 662.1
266.4
709.2
1 300.6
660.2
2 275
Model parameters (3) for different countries.As shown in Table 3, the susceptibility a2 is negative for all countries, that is, with an increase in ln p(t-2), the number of infected on day t decreases. Susceptibility b has a positive sign for Russia (to a greater extent) and for Israel (to a lesser extent), which can be interpreted as an increase in incidence over time. For other countries, the coefficient b is negative and then, over time, the incidence should fall. Small values of coefficient a1 for Russia and Korea could indicate a weak effect of the number of infected the previous day on the current incidence.Using the values of the parameters of the model [3] for individual countries, we can give a short-term prognosis of incidence. Indicators of the quality of the forecast can be defined as the difference between the data ln p (t) of the official statistics of morbidity and forecast calculations (Table 5
).
Table 5
The difference between the data ln p(t) of the official statistics of incidence and predictive calculations.
Date
Korea
USA
Israel
Russia
Italy
04.09
-0.144
0.036
0.125
-0.142
0.103
04.10
-0.293
0.050
0.399
-0.311
0.174
04.11
-0.209
0.003
0.214
-0.470
0.265
04.12
-0.153
-0.047
0.219
-0.489
0.226
04.13
-0.157
-0.059
0.098
-0.488
0.100
04.14
-0.191
0.019
0.168
-0.534
0.040
04.15
-0.197
0.159
-0.125
-0.592
0.117
04.16
-0.249
0.309
-0.247
-0.688
0.320
8 days average
-0.199
0.059
0.106
-0.464
0.168
The difference between the data ln p(t) of the official statistics of incidence and predictive calculations.Absolute discrepancy between statistics and model data for Israel and Russia.The minus sign in Table 5 means that the incidence data is less than the predicted estimates. Such ratios observed for Korea and Russia, and it is presumed that their slight decrease in incidence compared to the model may be due to quarantine measures. The same is true for data during the week from 04.09 to 04.16 in Israel.
Discussion
From the point of view of phase transition, the transition from x1 to x2 can occur in two ways:With large enough fluctuations of the system state, the value of which exceeds the height of the potential barrier, the system can spontaneously move from one state to another; orUnder the influence of an external field, one of the local minimum potential functions can disappear, and the system will "fall" into the only remaining state.The analysis of morbidity dynamics raises the question of whether the observed dynamics of the pandemic have some common properties or are individual to each country. Thus, we consider the dynamics of the disease onset (the date of the first patient) in different countries.As an indicator of the pandemic's development, we introduce a variable - the parameter of the order w(t) - the ratio of the number of countries which have not currently registered the disease t (0 As shown in Figs. 4 and 5, the pandemic's development, either worldwide or in a single country is characterized by deterministic dependencies. One can introduce the rate of disease - the angular coefficient b of the equation. Globally, the rate of coronavirus spread is characterized by B ≈ 0.03, and for the U.S. B ≈ 0.067, that is, the coronavirus spread rate in the United States is more than twice the spread rate across the planet. These differences appear be related to the peculiarities of population movements, governmental public health measures, economics etc. These patterns of the disease spread need to take into account when planning how to fight the pandemic.To describe the behavior of a bistable system, three characteristic times can be entered - the characteristic time τoscillations near the state x the characteristic time of oscillations near the state x, and the typical time θ of the system's transfer from state x to state x and back. In this case, two time series of cases can be accounted: a temporary series characterizing the population near x and a time range characterizing the population near x. These series may be considered stationary, the values of which fluctuate near the values of x and xMany questions were reffered immediately to the public health systems worldwide with burst of COVID 19 pandemics. Even advanced national care systems found itself unprepared to "tsunami –like" growing amount of corona virus patients. The major efforts are being made in achieving of adequate medical staff, equipment, drugs and etc. However, the basic strategy planning of every single country is depends on prognostic epidemiologic modelling while the strict monitoring of the rapidly changed parameters. The mathematical models were created by scientists in order to predict the course of the outbreak and identify governmental interventions that could be effective in decrease of epidemics progression.Three possible approaches for such modeling may be considered. The first approach is to model the available data using non-linear regression (for example, the agent-based model [16]. Using this method allows obtaining accurate estimates of existing data and give a short-term forecast of incidence, but it is difficult to describe the mechanisms of the pandemic.Most coronavirus spread models use various versions of the SIR-model (for example, SEIR-models [17], [18], [19]. In this case, the problem is the size estimation of the S-group. If we accept that the entire country's population should be attributed to this group, then, based on the SIR-models, it should be concluded that the pandemic will stop only when the majority of the population becomes ill [20,21]. Moreover, the SIR-models do not raise the question of the critical number of patients, after reaching the beginning of the exponential phase. Choosing the model of sufficiently large initial values of the cases number makes it difficult to answer this question.Finally, a recursive bifurcation model is proposed in which the development of a disease is considered a transition from a condition with a low number of cases of the epidemic [11]. This method is similar to our proposed approach. Our model describes the infection processes as first and second order phase transitions. The development model of the coronavirus pandemic as a first-order phase transition indicates either the possibility of the population returning to a state with a low level of cases or the epidemic returning. In this case, the appearance of a second or multiple waves of the disease depends on fluctuations in the number of patients near state x1, the height of the potential barrier and the influence of various external factors. The possibility of the pandemic's later waves appearing in the model as phase transitions iseasily explained, while SIR-models and derivative models do not predict such effects. In Markov models, the influence of an external field can be introduced only by force through specified changes in the coefficients of the transition matrix.The possibility of describing the spatial dynamics of pandemic development as second order phase transitions is somewhat unexpected. Perhaps this effect may be related to the peculiarities of the movements of infected people.ConclusionThe model is very simple in terms of the data necessary for the calculations. To verify the proposed model, only data on the current incidence rate are required. However, the determination coefficient of model R2 is very high and exceeds 0.95 for most countries. The model permits the accurate prediction of the pandemics dynamics at intervals of up to 10 days. The ADL model allows describing nonmonotonic changes in relative infection over the time, and providing to governments and health care decision makers the possibility to predict the outcomes of their decisions on public health.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Table 6
Absolute discrepancy between statistics and model data for Israel and Russia.
Authors: Samuel Clifford; Carl A B Pearson; Petra Klepac; Kevin Van Zandvoort; Billy J Quilty; Rosalind M Eggo; Stefan Flasche Journal: J Travel Med Date: 2020-08-20 Impact factor: 8.490
Authors: Wladimir Lyra; José-Dias do Nascimento; Jaber Belkhiria; Leandro de Almeida; Pedro Paulo M Chrispim; Ion de Andrade Journal: PLoS One Date: 2020-09-02 Impact factor: 3.240