Literature DB >> 33994678

Time-dependent probability distribution for number of infection in a stochastic SIS model: case study COVID-19.

Abstract

We derive the time-dependent probability distribution for the number of infected individuals at a given time in a stochastic Susceptible-Infected-Susceptible (SIS) epidemic model. The mean, variance, skewness, and kurtosis of the distribution are obtained as a function of time. We study the effect of noise intensity on the distribution and later derive and analyze the effect of changes in the transmission and recovery rates of the disease. Our analysis reveals that the time-dependent probability density function exists if the basic reproduction number is greater than one. It converges to the Dirac delta function on the long run (entirely concentrated on zero) as the basic reproduction number tends to one from above. The result is applied using published COVID-19 parameters and also applied to analyze the probability distribution of the aggregate number of COVID-19 cases in the United States for the period: January 22, 2020-March 23, 2021. Findings show that the distribution shifts concentration to the right until it concentrates entirely on the carrying infection capacity as the infection growth rate increases or the recovery rate reduces. The disease eradication and disease persistence thresholds are calculated.

Entities: CellLine Chemical Disease Species

Keywords: COVID-19; Hypergeometric; Infection; Kummer; Laguerre function; Probability density function; Stochastic differential equation; Whittaker function

Year: 2021 PMID： 33994678 PMCID： PMC8112579 DOI： 10.1016/j.chaos.2021.110983

Source DB: PubMed Journal: Chaos Solitons Fractals ISSN： 0960-0779 Impact factor: 5.944

Introduction

Several advances have been made in the field of mathematical modeling in studying the behavior of certain dynamical systems whose state evolves with time [1], [10], [23], [38], [43], [44], [49], [51], [54], [56], [57]. Most of these dynamical processes in nature are far from equilibrium and so using stationary distribution to describe and analyze the evolution of such processes has a great limitation [8], [9], [30], [55]. Although studying the evolution of such random dynamical systems using time-dependent probability density function is challenging, significant analysis and estimates about the process can be deduced from such study. The exact solutions in terms of probability density function of dynamical systems can only be obtained for a restricted class of dynamical systems [12], [13], [21], [31], [41]. Apart from Fokker-Planck equation (FPE), a partial differential equation that describes the time evolution of probability density function, there are other methods such as the moment or cumulant equations approach [17] that can be used to analyze the distribution of nonlinear systems under random perturbations. Because of the complexity of solving partial differential equations, many approximate methods have been developed numerically to solve the FPE equation for situations far from equilibrium. A simple thermodynamically consistent matrix numerical method (MNM) is developed by Holubec et al. [26] for solving over-damped FPEs with time-dependent coefficients. Paola at al. [17] proposed an approximate method called the Taylor moments for the probabilistic description of response of nonlinear system driven by a stochastic process. Other methods such as the stochastic linearization (SL) method [53], the equivalent non-linear equations method (ENLE) [14], the moment differential equation method (MDE) Monte Carlo simulation using a non-Gaussian closure approximation [40], the path integral approach, perturbation technique and Galerkin method [32], [35], [60] have been developed for approximating the time-dependent probability density of some general first-order and second-order non-linear stochastic systems. In this work, we develop the time-dependent probability density function for the number of infections of certain diseases satisfying a particular stochastic SIS epidemic model. The state transition probability density function for the dynamic process is governed by the FPE equation [19], [58]. The result obtained is used to analyze the distribution of the aggregate daily infection count of COVID-19 cases for the United States collected from the Center of Disease Control and Prevention (CDC) for the period: 01/22/2020-03/23/2021. The organization of the paper is as follows. In Section 2, we formulate a stochastic differential equation (SDE) used in modeling the number of infection at a given time by extending the well known deterministic SIS epidemic model with vital dynamics to a stochastic case. The existence and uniqueness of the solution of the SDE is also discussed. The threshold under which the disease dies out using the SDE model is obtained and analyzed. In Section 3, we derive the closed form time-dependent probability density function for the number of infection at a given time for the stochastic SIS model. Properties of the distribution, namely, mean, variance, skewness, kurtosis functions are also derived in Section 3. Special cases of the distribution are also discussed in this section. Numerical results are carried out in Section 4 to verify our claim. Discussion and summary of the work done are presented in Section 5.

Formulation of model

Our main focus in this work is on the derivation of time-dependent probability density function for the number of infected individuals following the general stochastic SIS model. To do this, we first study the SIS deterministic epidemic modelwhere and represent the population of infected and susceptible individuals, respectively, and is the recruitment rate, is the transmission rate, is the natural death rate and is the temporary recovery rate. To maintain a constant population, we assume . With this, the population size remains constant over time so that . This reduces the model governing in (2.1) to the formThe model (2.1) has been studied extensively by Brauer et al. [11], Hethcote and Yorke [25], Lajmanovich and Yorke [33] and Nold [45]. Many factors such as pneumonia seasonality, mobility, testing rates, mask use per capital weather [27], social behavior, strain-specific factors [42], and public health intervention [34] can act as external fluctuations affecting the contact/transmission rate of the disease. For this reason, we assume the contact/transmission rate of the disease is affected by environmental perturbations, such as stated above, changing the infection transmission rate to the formwhere each infected individual makes infectious contacts with each other in the interval is a standard Wiener process on a filtered probability space (), the filtration function is right-continuous and each with contains all -null sets in . By substituting (2.3) into (2.1), we reduce the deterministic epidemic model (2.1) to the Stratonovich stochastic modelwhere denotes the Stratonovich integral [2]. We used the Stratonovich calculus instead of the Itô calculus following the arguments made in [7], [37]. We also assume the initial process is measurable and independent of . The solution of (2.4) with Itô calculus has been studied extensively by Gray [22]. In their work, Dalal et al. [16] showed that the introduction of stochastic noise can stabilize system that is unstable. In this work, we are interested in studying how the system behaves in the presence of noise in the transmission rate of the disease. This behavior is studied by analyzing the time-dependent probability density function of the process and . The stochastic differential equation has a unique positive solution [22], [39], [47], [59] in the feasible regionfor all with probability one. The model governing in (2.4) reduces to the Itô formDefineThe number is referred to the reproduction number for (2.1). This is the average number of infection cases produced by an infectious individual when introduced into a completely susceptible population. The number is the stochastic version of for the model (2.6). It was shown in Méndez et al. [39] and Otunuga [48] that the boundaries and of (2.6) are unattainable at all times if regardless of the noise intensity. Following a similar Theorem in [22], we show in Theorem 1 that the disease will die out exponentially almost surely on the long run in the presence of noise if . For any given initial condition the solution of the stochastic differential Eq. (2.6) tends to zero exponentially almost surely if . It follows from (2.6) thatIf we have from (2.5) thatThe result follows from the fact thatusing the large number theorem for martingales [24]. □ We see in Theorem 1 that replacing the condition with the condition for the disease to die out exponentially almost surely is also valid. We note here that epidemic advances can still grow initially, leading to transient epidemic advance ifA numerical result is derived later in Section 4 to confirm Theorem 1 by showing that the stationary probability distribution of concentrates entirely on zero if (which also implies ). Likewise, it can be shown that the stochastic differential Eq. (3.2) has a unique stationary distribution if . The proof of this is similar to the proof given in Theorem 6.2 of the work of Gray et al. [22], so we omit the proof here. We give the closed form representation of the time-dependent and stationary probability density function for in the next section and confirm that these distributions exist if . It was shown in Otunuga [48] that the solution of the system (2.4) is obtained aswhereandLet be the probability density for the process satisfying (2.6) at time given initial point . In his work, Otunuga [48] derived the probability density function for the specific case where the resulting Fokker Planck differential equation for has only discrete eigen-values and satisfyingis a non-negative integer. In this work, we extend the work of Otunuga [48] to include both discrete and continuous eigen-values.

Derivation of time-dependent probability distribution for the general stochastic SIS epidemic model

Using the transformationwe reduce (2.4) to the formBy setting the stochastic differential equation governing in (3.2) reduces to the Stratonovich stochastic differential equationThe Itô equivalent of (3.3) is given byUsing the change of state variablein (3.4), we obtain the Itô stochastic differential equationwhere and are defined in (2.10). Let be the probability density function for the process satisfying (3.6) at time given initial point and let denotes the corresponding limiting or stationary distribution. We give the closed form expression for and in the following theorem. Assume and . Then the solution satisfying (3.6) has a unique stationary and time-dependent probability density function and respectively, obtained as where is the Laguerre polynomial is the Pochhammer symbol , and is the Whittaker function . It follows from (3.6) and the Fokker Planck equation that satisfieswith boundary conditionWe assume a solution of the formwhere is a constant. The stationary density function corresponding to (3.7) is obtained asprovided thatWe convert (3.7)-(3.8) to Sturm-Liouville equationby substituting . By substituting (3.10) into (3.12), we see that the differential equation in (3.12) reduces toUsing the transformationwe further reduce (3.13) into a differential equation of the formwhereThe differential Eq. (3.15) is the well known Whittaker differential equation (see Section 13.14 of [46]) with solutionwhere are constants and ; are the Whittaker functions [5], [46]. It now follows from (3.14) that the function satisfying (3.12) is obtained asUsing relations 6.9.2 and 6.9.4 in Bateman [5], the Sturm-Liousville equation (3.12) has eigenvalueswith corresponding eigenfunctionwhere is the Laguerre polynomial of degree and is a normalizing factor obtained asThe Sturm-Liouville equation also has continuous range of eigenvaluewith corresponding eigenfunctionwhere is the Kummer function [46], is the imaginary unit, and is a constant derived using the orthogonal conditionwhere is the Dirac delta function. According to Szmytkowski [52], the functions and with are orthogonal on the positive real semi-axis with the weight in the senseIt follows immediately from (3.25) and (3.26) thatThe principal solution is obtained as □ We note here that the distribution is the Gamma distribution with shape parameter . Since and it follows from (3.27) thatUsing the fact that the distribution is the Gamma distribution, it follows immediately that the limiting mean, mode, variance, skewness, kurtosis function of are obtained asrespectively. Using the following integrals regarding Laguerre polynomial (these follows from the Chu-Vandermonde identities (see [46] Section 16.1.1)),where is the Pochhammer’s symbol, it can be shown that andwhere is the Hypergeometric regularized function [46]. □ In order to confirm the validity of the solution (3.27), we show that the solution corresponds to an initial value . That is, for any real valued continuous function we have . To do this, we show that for . We use (3.29) and equation (44) in Becker [6] to show that □ Let and be the time-dependent probability density function for the number of infected and susceptible individuals at time given initial points and respectively. Also, letand be the corresponding stationary density function. We give the closed form expression for and in the following theorem. Assume and . Then the solution given in (2.8) and satisfying (2.4) has a unique stationary and time-dependent probability density function respectively, obtained as Using the transformationin (3.5), we obtain the probability stationary function and the density function of the number of infection at a given time time with initial condition asandLet . Since it follows that the density probability function of the number of susceptible individuals at time with initial condition can be obtained as Eq. (3.30) follows using the transformation (3.1) □ The condition given in (3.10) is equivalent to . It follows that the probability distribution and do not exist if . Following similar approach in Remark 1, it can be shown that and . Also, using the substitution it can be shown that .

Properties of the stationary distribution

One can easily check thatThe cummulative density function, for the distribution in (3.30) is obtained aswhere is the lower incomplete gamma distribution. The mean and variance, denoted and respectively, of the distribution are obtained aswhere is the upper incomplete gamma function and is the Kummer/confluent hypergeometric function [46]. We see later in (3.40) that these results correspond to the limiting mean and variance of the distribution . It follows from (3.35) that the median is the number that satisfies the equationThe following theorem shows interval where the stationary density function is decreasing and increasing. Define If then and the probability stationary distribution is increasing and decreasing on the interval and respectively, with maximum value attained at . If and then is increasing and decreasing on the interval and respectively. In this case, has local maximum at but no absolute maximum value on . If then is decreasing on the interval . We note that on and on . If then . Hence, attains its maximum value at and (i) follows. Likewise, if and then and (ii) follows. Finally, if then on and (iii) follows. □

Special case where

The condition is equivalent to . For this case, reduces toIf are samples of independent identically distributed random variables, the maximum likelihood estimate of is

Properties of the density function

The th moment, denoted of the distribution at time is obtained aswhere and withusing identities (13.14.5), (13.6.19) and (13.16.6) in [46]. The mean variance skewness and kurtosis of the distribution can easily be calculated from (3.37). The limiting th moment, is obtained aswhere is defined in (3.21) and is the Kummer/Congruent Hypergeometric function[46]. It follows immediately from (3.39) thatwhere is the incomplete gamma function. The values obtained in (3.40) are the mean, variance, skewness, and kurtosis, respectively, of the stationary density function in (3.30). This is confirmed numerically in Section 4.

Special cases

In this section, we briefly discuss few interesting cases of the expression for given in (3.30) involving particular values for the parameters and .

Case for small

If then is approximatelywith .

Case for large

The constant is called the harvesting effort of the system (2.4). It follows from (3.11) that . For large so that it follows from (2.8) that . That is, large value of leads the infected population to extinct. Also, this condition is equivalent to so that and the probability density function reduces to that in (3.30). This is verified numerically in Section 4.

Case for large noise intensity

As the noise intensity in the transmission rate of disease increases (that is, as ), we see that and the probability distribution converges to as and . This effect is shown numerically in Section 4.

Numerical simulation

In this section, numerical simulation is carried out to verify our claim. First, we use published and assumed COVID-19 epidemiological parameters in Subsection 4.1 to confirm our results. The properties of the distribution are investigated numerically in Subsection 4.2. In Subsection 4.3, the obtained time-dependent and stationary probability density function are used to analyze the distribution of aggregate number of COVID-19 infection cases in the United States.

Analysis using published data

We apply the distribution using published and assumed parameters. According to recent survey conducted by CDC1 on people with mild COVID-19 cases, one-third did not return to normal health within two to three weeks of testing positive, and recovery takes six weeks or more for those with severe cases. For this reason, we set to be in the range for our analysis. The CIA World Factbook2 reported that the current population of the United States as at July 2020 is 332,639,102. So, we set unit representing 332,639,102 individuals. The total number of infected cases3 as of November 30, 2020 is 13386251. So is about and increasing. For this reason, we suggest . Also, the annual U.S. birth rates for the year 2020 is 12.4 births per 1000 population, while the death rate is 8.3 death per 1000 population and life expectancy is 80.3 years4 . Therefore, we set to be in the range . The parameters used are associated with COVID-19 data and are given in Table 1 below. A MATHEMATICA program that evaluates the Whittaker and Laguarre functions and performs the integration in (3.32) is available from the author upon request.

Table 1

Parameter values associated with COVID-19 data.

Parameter	Description	Value	Source
γ	temporary recovery rate (day−1)	[1/30,1/2]	CDC1
μ	death rate (day−1)	[833650000,180.3×365]	CIA4
Λ	recruitment rate (day−1)	μ
β	transmission rate (day−1)	[0.5,1.5],[0.1,2]	[18], Assumed
σ	noise intensity	[0.01,8]	Assumed
N	population Size	1	CIA2
I0	initial infection cases	[0.04,0.05]	CDC3

Parameter values associated with COVID-19 data.

Verification of the result obtained

We note from (3.1) that the time-dependent probability density function reduces to if the population size is converted to fractions, that is, if . For the rest of the numerical analysis in this subsection, we set . In order to verify the correctness of the density function and the stationary density function obtained in (3.30), we first discretize the stochastic model (3.4) using the Milstein scheme [20] as follows:where are independent standard normal variable, for sample size for simulations. For large number of simulations the probability density function derived from the histogram of the random variable is plotted and compared with the graph of the probability density function for in Fig. 1 (a) below. To check the correctness of the result for we simulate the probability density function derived from the histogram of the random variable for large and large number of simulations . The simulated density function is now compared with the graph of in Fig. 1(b) below

Fig. 1

Comparison of the exact density functions and with simulated distribution of and respectively.

Comparison of the exact density functions and with simulated distribution of and respectively. The red and blue curves in Fig. 1(a) are the trajectories of the exact probability density function and simulated distribution of respectively, for simulations at time using the parameters . The red and blue curves in Fig. 1(b) are the trajectories of the exact probability density function and simulated distribution of respectively, for simulations, using the same parameters above.

Behavior of the distribution of as the transmission rate changes

In this section, we analyze numerically the behavior of the time-dependent distribution and as the transmission rate increases. Fig. 2 (a), (b) and (c) is the graph of the probability density function for the case where and respectively. Here, we use and . Fig. 2(d) shows the stationary probability distribution which is the limiting distribution . We see here that the distributions for the three cases skewed left. This is evident in Figs. 9(b) and 10(b) as we can see there that the skewness plot is negative. We see from Fig. 2(d) that as . Also, we observe that as increases, the stationary density function shifts concentration to the right. The concentration moves closer to as grows larger. This is because as the transmission rate increases, we expect the number of infection to increase until everyone in the population becomes infected. This can be confirmed analytically using (3.30).

Fig. 2

Graphs of the probability density function and as increases.

Fig. 9

Graphs of the mean, variance, skewness, kurtosis function with time.

Fig. 10

Graphs of the mean, variance, skewness, kurtosis function with time.

Graphs of the probability density function and as increases. Stationary density function for the case where and respectively. Effect of changing on the stationary density function .

Behavior of the distribution of as and .

As shown in Theorem 1, the disease will die out on the long run if (implying also). Fig. 3(a) and (b) show the behavior of the distribution as disease dies out in the population using parameters and varying so that in 3(a) and in 3(b). The result shows that on the long run, the density function of converges to the Dirac delta function as .

Fig. 3

Stationary density function for the case where and respectively.

Effect of change in temporary recovery rate

It follows from condition (3.11) that . Fig. 4 (a) and (b) shows the effect of increase in the parameter on the stationary density function using parameters with in Fig. 4(a), and in Fig. 4(b), respectively. We see here that as increases to from the left, the distribution shifts concentration to the left until it concentrates entirely on 0. This shows that the disease dies out as . We also see the effect of large noise in the system as increases. We see that the distribution becomes taller and narrower in the presence of small noise, and flatter and wider in the presence of large noise. In Fig. 4(c), we see that the stationary distribution concentrates more on the final size of the epidemic as the recovery rate decreases to zero. This analysis shows that the disease dies out as the recovery rate reaches the threshold . The number of infected reaches maximum as the number of those who recovered declines to zero.

Fig. 4

Effect of changing on the stationary density function .

Distribution for the case where .

Fig. 5 (a) and (b) show the probability and stationary density functions and respectively, for the case where or equivalently, using parameters in Table 1 with . We see here that as the stationary distribution . This follows directly from (3.30).

Fig. 5

Probability and stationary density functions and respectively, for the case where .

Probability and stationary density functions and respectively, for the case where . Fig. 6 shows the probability density function for the case where using parameters in Table 1 with . The analysis also shows that if the number of infection is not harvested in any way (in this case, by recovering), then as for some with . Also, we see that . This is true because if if as t→∞

Fig. 6

Probability density function for the case where .

Probability density function for the case where . Effect of noise intensity on the stationary density function .

Effect of noise in the system

Fig. 7 shows the effect of noise intensity on the stationary density function using and varying . Fig. 7(a) shows the distribution as while Fig. 7(b) shows the distribution as . We see from Fig. 7(a) that the stationary density function concentrates entirely on the number as . This is because the solutionof the deterministic equivalent of (3.4) where converges to if . The number is the endemic equilibrium of the deterministic equivalent of (3.4) where . Endemic persists in the deterministic system if . This is confirmed in Fig. 7(a). The graph shows that the Dirac delta function, as . This is not the case for large . In fact, as the stationary density function as and as .

Fig. 7

Effect of noise intensity on the stationary density function .

Plot of the mean, variance, skewness, and kurtosis function using published parameters

Fig. 8 (a), (b), (c) and (d) is the graph of the mean, variance, skewness, and kurtosis, respectively, of the distribution at time using parameters in Table 1 with and . The dashed line is the horizontal line representing the limits of the mean, variance, skewness, and kurtosis functions, respectively, of the distribution given in (3.40). Fig. 8(a) shows that if the transmission rate is and the temporary recovery rate is as low as the average number of infection cases will rise to on the long run if proper care and prevention is not taken. The variance function converges to 0.0039 on the long run. This shows the magnitude of the spread of the data away from the mean is small. The skewness function is a positive function, suggesting that the distribution of the data is skewed right.

Fig. 8

Graphs of the mean, variance, skewness, kurtosis function with time.

Graphs of the mean, variance, skewness, kurtosis function with time. Graphs of the mean, variance, skewness, kurtosis function with time. Graphs of the mean, variance, skewness, kurtosis function with time. Fig. 9 (a), (b), (c) and (d) is the graph of the mean, variance, skewness, and kurtosis, respectively, of the distribution at time using parameters in Table 1 with and . The dashed line is as described in Fig. 8. Fig. 9(a) shows that if the transmission rate is and the temporary recovery rate is as low as the average number of infection cases will rise to on the long run if proper care and prevention is not taken. The variance function converges to 0.012 on the long run. The skewness function is a negative function, suggesting that the distribution is left skewed. Fig. 10 (a), (b), (c) and (d) is the graph of the mean, variance, skewness, and kurtosis, respectively, of the distribution at time using parameters in Table 1 with and . The dashed line is as described in Fig. 8. Fig. 10(a) shows that if the transmission rate is and the temporary recovery rate is as low as the average number of infection cases will rise to on the long run if proper care and prevention is not taken. The variance function converges to 0.0064 on the long run. The skewness function is a negative function. By comparing the results derived in Figs. 9 and 10, it follows that the average number of cases increases and the variance decreases on the long run as the transmission rate increases.

Probability distribution of aggregate number of COVID-19 infections in the United States

Several epidemiological models [1], [10], [23], [38], [43], [44], [49], [51], [57] have been developed to study the transmission of the COVID-19 virus. As it is well known, the generalized logistic equation is widely used in interpreting the aggregate number of COVID-19 infection trajectories in several countries [49], [50], [57]. We see in our work that model (2.2) is a logistic differential equation. In the case of modeling the aggregate number of infected cases, is referred to as the aggregate infection growth rate, is the aggregate infection carrying capacity, and serves as the aggregate infection harvesting effort. In this section, we study the trajectory and the distribution of the aggregate counts of COVID-19 cases reported by the Centers for Disease Control and Prevention (CDC)5 for some states in the United States using model (2.6) and the time-dependent density function (3.30), respectively. We assume the aggregate number of infection follows the model (2.6), where in this case represents the aggregate infection cases at time . The advantage of using model (2.6) over most models available in literature is that it considers presence of external perturbations/disturbances in the contact rate of the disease that can be caused by many factors such as pneumonia seasonality, mobility, testing rates, mask use per capital weather [27], social behavior, strain-specific factors [42], and public health intervention [34]. Also, in the presence of this noise, the harvesting function is not constant as widely assumed, but non-linear. Figs. 11 , 12 , and 13 shows the plot of the aggregate infection counts of COVID-19 cases for the states and territories: AK, AR, AZ, CA, CO, CT, DC, DE, GA, GU, HI, IA, ID, IL, IN, KS, KY, LA, MD, ME, MI, MN, MO, MP, MS, MT, NC, ND, NE, NH, NJ, NM, NV, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WI, WV, WY, in the United States for the period 01/22/2020 to 03/23/2021 using model (2.6), and the parameters in the model estimated for the deterministic version of the model case using the Non-linear Least Squares estimation scheme [15], [36]. We also estimated the initial point for better fit.

Fig. 11

Real and simulated aggregate counts of COVID-19 infection for the states and territories: AK, AR, AZ, CA, CO, CT, DC, DE, GA, GU, HI, IA, ID, IL, IN, KS in the United States.

Fig. 12

Real and simulated aggregate counts of COVID-19 infection for the states and territories: KY, LA, MD, ME, MI, MN, MO, MP, MS, MT, NC, ND, NE, NH, NJ, NM in the United States.

Fig. 13

Real and simulated aggregate counts of COVID-19 infection for the states and territories: NV, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WI, WV, WY in the United States.

Real and simulated aggregate counts of COVID-19 infection for the states and territories: AK, AR, AZ, CA, CO, CT, DC, DE, GA, GU, HI, IA, ID, IL, IN, KS in the United States. Real and simulated aggregate counts of COVID-19 infection for the states and territories: KY, LA, MD, ME, MI, MN, MO, MP, MS, MT, NC, ND, NE, NH, NJ, NM in the United States. Real and simulated aggregate counts of COVID-19 infection for the states and territories: NV, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WI, WV, WY in the United States. The parameter estimates and of the aggregate infection carrying capacity the infection growth rate the infection recovery rate and the initial population size are estimated along with the root mean square error (RMSE) of the estimation and reported in Table 2 . is the number of non-zero aggregate infection days from January 22, 2020 to March 23, 2021. The trajectory of the aggregate infection counts, together with the estimated trajectory for each of the 48 states and territories studied are reported in Fig. 11, Fig. 12, and 13. The red and blue curves are the trajectory for the real and simulated COVID-19 aggregate count data, respectively. The estimated parameter here represents the carrying capacity of the aggregate infection count for each states. We show in later section that as the aggregate harvesting effort reduces, or as the aggregate infection growth rate increases due to no proper public health intervention, the aggregate COVID-19 count estimate converges to .

Table 2

Parameter estimates and the RMSE for the period of Jan 22, 2020-March 23, 2021.

State	N^	β^	γ^	I0	T	RMSE
AK	116955	0.06495	0.03196	11.7	373	1169.2
AR	1387119	0.05572	0.03857	3073.4	374	8295.1
AZ	6207611	0.05191	0.03901	10109.9	419	42953.8
CA	18525258	0.04999	0.03442	18040.2	419	180570.9
CO	1287227	0.06487	0.04007	528.3	381	17744.6
CT	4265220	0.05993	0.04875	8086.5	377	16566.6
DC	1139045	0.06526	0.05747	3.501.3	379	1926.8
DE	993497	0.06256	0.05049	2217.8	375	3.887.0
GA	7996680	0.06166	0.04890	23805.8	386	38483.0
GU	14309	0.06804	0.03079	2.4421	371	152.2
HI	88343	0.066541	0.045843	231.96	381	1164.9
IA	1047217	0.058366	0.037049	1664.8	378	10774
ID	490785	0.055512	0.033361	688.45	372	4195.1
IL	4441504	0.053073	0.035285	5387.3	422	47364
IN	1783274	0.057553	0.033006	898.27	380	24196
KS	868640	0.058924	0.035784	615.67	378	7801
KY	1386548	0.06343	0.041157	797.04	379	7928.9
LA	3399215	0.063446	0.051774	19790	377	17743
MD	4032036	0.062989	0.0521	16646	381	15558
ME	152787	0.0656	0.040101	24.428	374	2044
MI	2316851	0.057358	0.037583	3746.3	376	32390
MN	1243941	0.061734	0.035266	555.81	380	20819
MO	1580962	0.056665	0.033793	1439.2	379	10266
MP	857	0.066576	0.053355	10.391	358	2.8125
MS	1939174	0.063662	0.049823	7495.3	374	9953.3
MT	188925	0.064356	0.029378	14.373	375	1290
NC	4994376	0.056624	0.041666	10277	384	26445
ND	174226	0.064338	0.026613	14.732	374	2473.8
NE	655227	0.062605	0.040601	714.66	380	7454
NH	232013	0.065474	0.038755	24.024	384	3291.7
NJ	21306864	0.064247	0.055493	45087	381	44812
NM	511161	0.062123	0.036503	208.01	375	7326.1
NV	1387057	0.065965	0.048652	2906.5	381	11652
OH	2680011	0.060963	0.035346	871.78	376	35223
OK	1386379	0.056107	0.035141	1358.7	379	10747
OR	655383	0.063132	0.043888	674.69	386	4765.3
PA	3720047	0.06246	0.041921	2585.5	380	44433
RI	559165	0.051781	0.035041	975.64	384	6601.1
SC	4294273	0.060356	0.047188	10037	381	20568
SD	245266	0.067793	0.035468	43.639	376	3069.5
TN	3173919	0.05427	0.036698	6150.5	381	26324
TX	17156629	0.05962	0.045363	43642	381	78597
UT	1095558	0.054204	0.03238	1074.7	378	9331.2
VA	5879402	0.056827	0.044575	13170	378	19814
VT	132684	0.076231	0.058152	46.712	378	607.51
WI	1372907	0.058849	0.030678	477.69	383	13913
WV	327686	0.064602	0.034811	36.89	369	3243.5
WY	99777	0.070869	0.031364	2.2229	374	1245.1

Parameter estimates and the RMSE for the period of Jan 22, 2020-March 23, 2021.

Distribution of the aggregate infection count in the United States for March 23, 2021

Here, we study the effect of small disturbances in the system by setting and plot the distribution of the aggregate number of infection that occurred on March 23, 2021 for the state of AK, ID, KY, ND, SD, WI, WV, WY. Let be the number of non-zero days from January 22, 2020 (first day CDC started recording data) to March 23, 2021. We plot the curve for the distribution together with the density function obtained from the histogram of the random variable in (4.1) for simulations. The value of for each of the states analyzed are reported in Table 3 . We confirm the result derived in Fig. 7(a), that is, the probability density function of the aggregate count of COVID-19 for March 23, 2021 concentrates entirely on the number for small . This result is shown in Fig. 14 . The analysis shows that the aggregate count of COVID-19 for the states analyzed converges to asymptotically on the day March 23, 2021. This number is estimated and reported in Table 3.

Table 3

Parameter estimate for the day March 23, 2021.

State	AK	ID	KY	ND	SD	WI	WV	WY
N¯	59355	195541	486138	102066	116825	656407	150941	55572

Fig. 14

Probability distribution of aggregate number of counts that occurred on March 23, 2021.

Parameter estimate for the day March 23, 2021. Probability distribution of aggregate number of counts that occurred on March 23, 2021.

Distribution of the final size of the aggregate infection count in the United States

In this section, we study and analyze the distribution of the final size of the aggregate infection count in the United States by studying how the distribution behaves as . According to (3.34), we know thatWe verify this by plotting, for large say, the distribution together with . The verification plot is shown in Fig. 15 . For this reason, we analyze the distribution of the final size of the aggregate infection count using the stationary probability density function .

Fig. 15

Comparison of the probability density function and .

Comparison of the probability density function and . Fig. 15 shows the comparison of the stationary density function (in red color) and the limiting distribution (in blue color). As evidenced in Fig. 2(d), we see that the stationary probability density function concentrates more on the final size of the epidemic as the infection growth rate increases. We also see a similar behavior in Figs. 4(c) and 7 (b) as the aggregate harvesting effort reduces and as the noise intensity increases, respectively. For this reason, we study the effect of high infection growth rate low recovery rate (low aggregate harvesting effort ), and effect of large noise on the distribution of the aggregate COVID-19 counts in the United States in the following subsections. This is done by plotting the stationary probability density function of the final size of the aggregate infection count for large, small, and large , and respectively. We see that the graph of concentrates on the estimate (obtained in Table 2), which is the aggregate carrying capacity of the COVID-19 count in the United States as public health intervention reduced. Fig. 16 shows the distribution of the final size of the aggregate infection count as the infection growth rate increases using the estimated parameters in Table 2 with large noise intensity and varying infection growth rates (blue curve), (black curve), and (red curve) for each states analyzed. The figure shows that if proper intervention is not taken (resulting in increase in infection growth rate and constant recovery rate ), the aggregate count of COVID-19 infection will rise to the estimated aggregate carrying capacity value in Table 2 for each respective states and territories. The analysis was done for the 48 states and territories mentioned in Table 2 and similar results obtained for each states and territories. For this reason and in order to minimize space, we only report the plots for the state of AK, ID, KY, ND, SD, WI, WV, WY.

Fig. 16

Distribution of the final size of the aggregate infection as the infection growth rate increases for the states: AK, ID, KY, ND, SD, WI, WV, WY in the United States.

Distribution of the final size of the aggregate infection as the infection growth rate increases for the states: AK, ID, KY, ND, SD, WI, WV, WY in the United States. Fig. 17 shows the distribution of the final size of the aggregate infection count as the aggregate harvesting effort decreases using the estimated parameters in Table 2 with large noise intensity and varying recovery rate (blue curve), (black curve), and (red curve) for each states analyzed. The figure shows that if proper intervention is not taken (resulting in decline in recovery rate and constant aggregate growth rate ), the aggregate count of COVID-19 infection will rise to the estimated aggregate carrying capacity value in Table 2 for each respective states and territories. The analysis was done for the 48 states and territories mentioned in Table 2 and similar results obtained for each states and territories. We only report the plots for the state of AK, ID, KY, ND, SD, WI, WV, WY here.

Fig. 17

Distribution of the final size of the aggregate infection as the aggregate harvesting rate reduces with large noise for the states: AK, ID, KY, ND, SD, WI, WV, WY in the United States.

Results, conclusion and discussion

By assuming the contact rate of certain diseases is affected by environmental perturbations, we convert the general deterministic SIS epidemic model with vital dynamics and constant population to a stochastic SIS model and used this model to study the dynamics of some infectious diseases. The study in this work involves analyzing the distribution of numbers of infections whose dynamic follows the SIS stochastic model (2.6). To do this, we derive the closed-form time-dependent probability density function of the number of infection at time given the initial point . The stationary probability density function of the process is also derived and analyzed and our studies show that converges to on the long run. We showed that these distibutions exist if the average number, of individuals infected by an infectious individual in a completely susceptible population is more than one. Also, we showed that the disease will die out if . Using the Milstein scheme, the stochastic model (2.6) is discretized for sample size and simulations and the correctness of the obtained probability density functions and are verified in Section 4.1.1 by comparing them to the probability density function derived from the histogram of the discretized processes and (for large ), respectively, in (4.1), for and using published epidemiological data. The effect of changes in the transmission rate and temporary recovery rate on the distribution and is studied. It was shown that as approaches 1 (from above), the distribution approaches the Dirac delta function and concentrates entirely on zero. That is, on the long run, the number of infection declines and the disease eventually dies out as the reproduction number approaches 1 (from above). Also, as the recovery rate decreases to zero, the distribution of the number of infection shifts concentration to the right until it concentrates entirely on the infection carrying capacity . That is, the infected population converges to the infection carrying capacity as the recovery rate reduces significantly. A similar analysis done on the transmission rate shows that the infected population converges to the infection carrying capacity as increases. The effect of noise on the system is also analyzed. It was shown that on the long run, the probability density function shifts concentration on the disease endemic equilibrium point as the noise intensity approaches zero. This is because (2.2) has an endemic equilibrium that is globally stable if . Some properties of the time-dependent probability density function, namely the mean, variance, skewness, and Kurtosis functions are derived and analyzed as a function of time. Similar analysis described above is done using the real aggregate counts of COVID-19 cases reported by the Centers for Disease Control and Prevention for the states: AK, AR, AZ, CA, CO, CT, DC, DE, GA, GU, HI, IA, ID, IL, IN, KS, KY, LA, MD, ME, MI, MN, MO, MP, MS, MT, NC, ND, NE, NH, NJ, NM, NV, OH, OK, OR, PA, RI, SC, SD, TN, TX, UT, VA, VT, WI, WV, WY in the United States. We study the trajectory and the distribution of the aggregate counts using the time-dependent density function and stationary distribution given in (3.30). The validity of the derived time-dependent density function is verified by comparing the distribution for the aggregate count of COVID-19 for the day March 23, 2021 with the histogram of the distribution of the random variable for simulations, where is the number of non-zero days from January 1, 2020 to March 23, 2021. We also show numerically using the real COVID-19 aggregate counts for AK, ID, KY, ND, SD, WI, WV, WY that on the long run, the time-dependent density function converges to the stationary density function . We use the distribution to confirm the estimate of the aggregate count for the month of March 23, 2021 in Table 3 and compare the result with the real data. By studying the distribution of the final size of the aggregate infection counts, our analysis shows that on the long run, if proper intervention is not taken such that the harvesting effort decreases significantly to 0.01 (or less), the aggregate counts of COVID-19 cases for each states may increase to the value reported in Table 2. A similar analysis shows that the aggregate counts of COVID-19 cases for each states may increase to the value if the infection growth rate increases to 0.3 (or more). These studies are confirmed by showing that the distribution shifts concentration on the value as the harvesting effort decreases or the infection growth rate increases. More studies are still ongoing on the distribution of the aggregate count of infection as data are being updated daily. Further extension of the result obtained in this work using fractional order cases as discussed in Jajarmi et al. [28], [29] is under investigation. For more readings on fractional derivatives, we direct readers to the papers [3], [4], [51], [56].

Availability of data and materials

The data used in the analysis can be found on the CDC website.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

CRediT authorship contribution statement

Olusegun Michael Otunuga: Conceptualization, Data curation, Formal analysis, Software, Writing - original draft, Writing - review & editing.

Declaration of Competing Interest

The author declares that there is no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

2 in total

1. Stochastic Modeling and Forecasting of Covid-19 Deaths: Analysis for the Fifty States in the United States.

Authors: Olusegun Michael Otunuga; Oluwaseun Otunuga
Journal: Acta Biotheor Date: 2022-09-16 Impact factor: 1.185

2. Stochasticity of disease spreading derived from the microscopic simulation approach for various physical contact networks.

Authors: Yuichi Tatsukawa; Md Rajib Arefin; Shinobu Utsumi; Kazuki Kuga; Jun Tanimoto
Journal: Appl Math Comput Date: 2022-06-21 Impact factor: 4.397

2 in total