Mi Jin Lee and Deok-Sun Lee, Department of Physics, Inha University, Incheon 22212, Korea.
Abstract
For a reliable prediction of epidemic or information spreading patterns in complex systems, well-defined measures are essential. In the susceptible-infected model on heterogeneous networks, the size of the cluster of infected nodes fluctuates so strongly in the intermediate-time regime that its mean cannot serve as a representative value. The cluster size follows a very broad distribution, which we show originates from the dependence of the cluster size on the time when a hub node is first infected. In contrast, the distribution of the time taken to infect a given number of nodes is well concentrated at its mean, suggesting that the mean infection time is a better measure. We show that the mean infection time can be evaluated from the scaling behavior of the boundary area of the infected cluster, and we use it to find a nonexponential, algebraic spreading phase in the intermediate stage on strongly heterogeneous networks. Such slow spreading arises because only small-degree nodes remain susceptible, while most hub nodes have already been infected in the early exponential-spreading stage. Our results offer a way to circumvent large statistical fluctuations and to quantify reliably the temporal pattern of spreading under structural heterogeneity.
Heterogeneity in the connectivity of elements in complex systems [1] leads to peculiar dynamic behaviors. These include large connected components formed with a small number of links [2], the onset of a global epidemic [3] or of synchronization [4] at any positive interaction strength, and a novel singularity of the free energy in the Ising and Potts models [5-7]. Extensive studies on the structure and dynamics of complex networks [8-10] have shown that such anomalous emergent behaviors are underlain by the different dynamic influences of nodes, which are essentially determined by their degrees (numbers of connected nodes). This advance in our understanding is, however, restricted to the equilibrium or stationary state. In reality, quick action before the stationary state is reached is necessary to control, e.g., the spread of a life-threatening virus or of word about marketed products. Despite this importance, the theoretical understanding of the nonstationary state is far from complete. This is partly because the relevant variables vary in time and the order parameters are neither small nor large in the intermediate-time regime, which defies analytic approaches based on approximations valid in the early- or late-time regime. Moreover, the speed of epidemic spreading shows large statistical fluctuations, presumably due to the heterogeneity of the infection seed's degree and the stochasticity of infection trajectories [11,12]. As we show here, the epidemic size at a given time suffers from such large fluctuations that it is disqualified as a reliable measure, particularly in the intermediate stage of spreading. Therefore, a new reliable measure and a theory for it are required to predict and control disease and information spreading efficiently in real-world complex systems.
Here we focus on epidemic spreading processes, for which various approaches have been proposed, such as pair approximation [13], the branching process approach [14], the moment closure method [15], and message passing [16]. We study the simplest model, the susceptible-infected (SI) model, to investigate thoroughly the statistical fluctuations appearing in the temporal pattern of spreading and to provide a theory for an alternative, reliable measure. To quantify the fluctuations, we measure the distribution of the number of infected nodes at a given time. On heterogeneous networks this distribution turns out to be so broad that its mean loses representativeness in the intermediate-time regime. We show analytically that the asymptotic behavior of the distribution derives from the dependence of the epidemic size on the time when a hub is first infected, a hub being defined here as a node with degree larger than 30% of the maximum degree. In contrast, the distribution of the time taken to infect a given number of nodes is well concentrated at its mean. This suggests that the mean infection time can serve as a measure for a reliable description of spreading phenomena. We construct a theory to evaluate the mean infection time. On strongly heterogeneous networks, this theory leads us to discover an algebraic relation between the number of infected nodes and the mean infection time in the intermediate stage, in contrast to the well-known exponential spreading in the early stage. We investigate the origin of this algebraic-spreading phase, which helps us understand the temporal complexity arising from structural heterogeneity in various complex systems.
In Sec. II, the SI model and the model networks are described along with their numerical implementation. We compare and analyze the fluctuations of the number of infected nodes and of the infection time in Sec. III. Our theory for the mean infection time is presented in Sec. IV. We summarize and discuss the results in Sec. V.
MODEL
We consider the SI model on scale-free (SF) networks of $N$ nodes and $L$ undirected links, which display a power-law degree distribution $P(k) \sim k^{-\gamma}$ for large $k$, with $\gamma$ being the degree exponent. In simulations, we use the uncorrelated configuration model [17] to construct the SF networks. Each node $i$ is assigned $k_i$ link stubs, with the degree $k_i$ selected as described below so that its distribution takes the prescribed power-law form. The stubs are then randomly paired until at most one stub is left. Finally, a leftover unpaired stub, multiple links, and self-loops are removed; their numbers are negligible in all the cases considered. For given $N$ and $\gamma$, the degree $k_i$ of node $i$ is the integer part of a real-valued random number $x$ drawn from the distribution $C x^{-\gamma}$ for $x \ge x_0$, with $C$ being a normalization constant and $x_0$ determined such that the resulting mean degree equals $\langle k \rangle = 2L/N$ [18]. The degree cannot be larger than $\sqrt{N}$, a constraint imposed to remove degree-degree correlations between neighboring nodes. In practice, the maximum degree $k_{\max}$ behaves as $k_{\max} \sim N^{1/2}$ for $\gamma < 3$ and $k_{\max} \sim N^{1/(\gamma-1)}$ for $\gamma > 3$ [17].
In the SI model, the state of node $i$ is either susceptible (S) or infected (I). A susceptible node becomes infected with rate $\lambda$ by each of its infected neighbors, while the transition from infected to susceptible is not allowed. We simulate the SI model with asynchronous updating [19] as follows. (i) At the initial time ($t = 0$), a randomly selected node is infected. (ii) At each time $t$, we count the number $\ell$ of links that have a susceptible node at one end and an infected node at the other end. We then select one such link at random and infect its susceptible node with probability $\lambda$. This is repeated $\ell$ times, after which we move to the next time step $t + 1$. (iii) Step (ii) is repeated until all the nodes are infected.
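The network construction and the update rule described above can be sketched in a few lines of Python. This is a minimal, unoptimized illustration under stated assumptions, not the implementation used for the figures. The function names `sample_degrees`, `configuration_model`, and `simulate_si` are hypothetical. The lower bound `x0` is passed in directly rather than tuned to a target $\langle k \rangle$, and degrees above $\sqrt{N}$ are simply redrawn. The boundary-link list is counted once at the start of each time step, so an attempt on a link whose susceptible end was already infected earlier in the same step is wasted.

```python
import math
import random


def sample_degrees(n_nodes, gamma, x0, rng):
    """Degrees as the integer part of x drawn from C x**(-gamma), x >= x0.

    Values above sqrt(n_nodes) are redrawn (assumed way of imposing the cutoff).
    """
    k_cut = math.isqrt(n_nodes)
    degrees = []
    while len(degrees) < n_nodes:
        u = 1.0 - rng.random()                      # uniform in (0, 1]
        k = int(x0 * u ** (-1.0 / (gamma - 1.0)))   # inverse-transform sample
        if k <= k_cut:
            degrees.append(k)
    return degrees


def configuration_model(degrees, rng):
    """Randomly pair stubs; drop a leftover stub, self-loops, and multi-links."""
    stubs = [i for i, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    if len(stubs) % 2:
        stubs.pop()                                 # one unpaired stub
    neighbors = [set() for _ in degrees]
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:                                  # skip self-loops
            neighbors[a].add(b)                     # sets merge multi-links
            neighbors[b].add(a)
    return [sorted(s) for s in neighbors]


def simulate_si(adj, lam, rng):
    """Asynchronous SI updating; returns each node's infection time (or None)."""
    time_of = [None] * len(adj)
    seed = rng.randrange(len(adj))
    time_of[seed] = 0                               # (i) random seed at t = 0
    infected = [seed]
    t = 0
    while True:
        # (ii) boundary links: infected node i, susceptible neighbor j
        boundary = [(i, j) for i in infected for j in adj[i] if time_of[j] is None]
        if not boundary:                            # no susceptible node reachable
            break
        for _ in range(len(boundary)):              # ell attempts per time step
            _, j = rng.choice(boundary)
            if time_of[j] is None and rng.random() < lam:
                time_of[j] = t + 1                  # infected during [t, t + 1)
                infected.append(j)
        t += 1
    return time_of


rng = random.Random(1)
adj = configuration_model(sample_degrees(2000, 2.5, 2.0, rng), rng)
times = simulate_si(adj, 0.5, rng)
```

Each run returns one infection time per node, with `None` for nodes outside the seed's connected component. The number of infected nodes at time $t$ and the time at which a given number of nodes have been infected can both be read off from this list.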
LARGE FLUCTUATION OF THE NUMBER OF INFECTED NODES AND ITS ORIGIN
Simulation data for the number $n$ of infected nodes are scattered in the $(t, n)$ plane, and the extent of the scatter varies with $t$ except at very small or very large $t$. See Fig. 1 for SF networks and Fig. 8 in Appendix C for other cases.
FIG. 1.
Fluctuation in the spreading of infection on SF networks. The fraction of simulation runs of the SI model yielding $n$ infected nodes at time $t$ is color-coded. The SI model with infection rate $\lambda$ is simulated 100 times in each of 200 SF networks of $N$ nodes, $L$ links (mean degree $\langle k \rangle = 2L/N$), and degree exponent $\gamma$. Shown are the mean infection time $T_n$ and the fastest and slowest spreading, i.e., the runs taking the shortest and the longest time, respectively, to infect a given number of nodes.
FIG. 8.
Simulation results of the SI model on (a)–(d) SF networks and (e)–(h) Erdős–Rényi (ER) networks. (a) and (e) The fraction of simulation runs yielding $n$ infected nodes at time $t$, color-coded in the $(t, n)$ plane. (b) and (f) The probability distribution $P_t(n)$ at a given $t$. Inset: the standard deviation $\sigma_n$ versus the mean $\langle n \rangle$. (c) and (g) The probability distribution of $t_n$ at a given $n$. Inset: the standard deviation $\sigma_t$ versus the mean $T_n$. (d) and (h) Plot of $n$ versus $t - t_h$. A solid line fitting Eq. (2) is shown.
To quantify such fluctuations, we measure the distribution $P_t(n)$ of the number $n$ of infected nodes at time $t$. In the intermediate-time regime, this distribution is found to follow a power law,
$$P_t(n) \sim n^{-\alpha}, \qquad (1)$$
over a wide range of $n$ [Fig. 2(a)]. The standard deviation $\sigma_n$ is mostly not smaller than the mean $\langle n \rangle$ and scales almost linearly with it during the time period in which Eq. (1) holds. We refer to this period as the intermediate-time regime. With such large fluctuations, $\langle n \rangle$ cannot be a representative value of $n$. For instance, at a time in this regime, the probability of observing $n$ larger than $\langle n \rangle$ is only 0.15 [see Figs. 1 and 2(a)].
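The quantities used in this paragraph can be computed directly from an ensemble of runs. The sketch below computes $\langle n \rangle$, $\sigma_n$, and the fraction of runs with $n > \langle n \rangle$ at a fixed time $t$. It assumes that each run is stored as a list of per-node infection times, with `None` for nodes that are never infected. This data layout and the function names are hypothetical.

```python
import statistics


def infected_count(infection_times, t):
    """n(t): number of nodes infected at or before time t."""
    return sum(1 for s in infection_times if s is not None and s <= t)


def size_statistics(runs, t):
    """Return <n>, sigma_n, and the fraction of runs with n > <n> at time t."""
    sizes = [infected_count(run, t) for run in runs]
    mean = statistics.fmean(sizes)
    std = statistics.pstdev(sizes)          # population standard deviation
    frac_above = sum(n > mean for n in sizes) / len(sizes)
    return mean, std, frac_above


# Three toy runs of a four-node system (infection times per node).
runs = [[0, 1, 2, 3], [0, 5, 6, None], [0, 1, 1, 9]]
mean_n, sigma_n, frac = size_statistics(runs, 2)
```

When $P_t(n)$ is as broad as a power law, $\sigma_n/\langle n \rangle$ is of order one or larger. In that case the fraction of runs above the mean can differ strongly from one half, which is the sense in which $\langle n \rangle$ fails to be representative.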
FIG. 2.
Statistics of the infection spreading in the SI model on SF networks. (a) The probability distribution $P_t(n)$ of $n$ at time $t$; $\sigma_n$ is its standard deviation. The dashed line is shown as a guide. Inset: $\sigma_n$ versus the mean $\langle n \rangle$ for different $t$, with two reference lines (solid and dashed). (b) The distribution of the time $t_n$ taken to infect $n$ nodes; $T_n$ and $\sigma_t$ are its mean and standard deviation. Inset: $\sigma_t$ versus $T_n$, with a solid reference line shown as a guide.
In striking contrast, the distribution of the time $t_n$ taken to infect $n$ nodes is well concentrated at its mean $T_n \equiv \langle t_n \rangle$ [Fig. 2(b)]. The standard deviation $\sigma_t$ remains far smaller than the mean $T_n$ unless $n$ is very small, demonstrating that $T_n$ is a well-defined measure (see Appendix A). Note that the line representing $T_n$ lies in the middle of the high-probability region of the $(t, n)$ plane (Fig. 1), supporting its representativeness. Therefore, in describing and predicting the pattern of spreading over heterogeneous contact networks, one should ask how long it will take to infect a given number of nodes rather than how many nodes will be infected at a given time. In SF networks, the difference between the two mean values, $\langle n \rangle$ at given $t$ and $T_n$ at given $n$, grows (see Appendix B).
Before addressing the theory for the mean infection time, let us consider why $P_t(n)$ decays so slowly in Eq. (1). Hubs are abundant in SF networks, and their infection should play a role in speeding up or slowing down spreading [11]. As seen in Figs. 3(a) and 3(b), the cluster of the first 80 infected nodes in the fastest spreading contains many hub nodes infected very early. In the slowest spreading, by contrast, only small-degree nodes are infected early and hub nodes are infected late. This suggests that whether and when hubs are infected determines the growth of the infected cluster. To check this, we measure the hub-infection time $t_h$, the earliest time at which any hub is infected; using different criteria for hubs does not change the results qualitatively. When $n$ is plotted as a function of $t - t_h$ [Fig. 3(c)], its statistical fluctuation is significantly reduced compared with the large fluctuation at given $t$ shown in Fig. 1. Moreover, $n$ grows abruptly for $t - t_h > \Delta$, with $\Delta$ being a constant. This demonstrates that global spreading can occur once hubs are infected. For $t - t_h > \Delta$, the number of infected nodes at time $t$ satisfies a relation of the form
$$n \simeq f(t - t_h), \qquad (2)$$
where $f$ is an increasing function with two positive fitting constants [solid line in Fig. 3(c)]. The fluctuation of $n$ around its mean in Eq. (2) is small for given $t$ and $t_h$. In addition, the probability distribution $Q_t(t_h)$ of the hub-infection time is almost constant over the relevant range (see Appendix C). Changing variables from $t_h$ to $n$ in Eq. (2) therefore yields
$$P_t(n) = Q_t(t_h)\left|\frac{\partial t_h}{\partial n}\right| \propto \frac{1}{f'(t - t_h)}, \qquad (3)$$
which agrees with Eq. (1). This finding provides a guideline for the epidemic-size distribution: a distribution $P_t(n)$ different from Eq. (1) implies a relation other than Eq. (2). The hub-infection time $t_h$ is expected to depend on the network characteristics of the initially infected node (seed) [11] and also on the specific realization of spreading in the early stage. We find that both the degree of the seed and its shortest distance to a hub are significantly correlated with $t_h$ (see Fig. 9 in Appendix C). In practice, various factors, such as the core [20], can influence epidemic spreading and should be considered when designing efficient intervention strategies [21] and identifying superspreaders and superblockers [22,23].
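Both the infection time $t_n$ and the hub-infection time $t_h$ can be extracted from per-run infection times. The sketch below assumes one infection time per node (`None` if never infected) and defines $t_n$ as the infection time of the $n$th infected node, with ties broken arbitrarily. It also assumes hubs are nodes with degree larger than 30% of the maximum degree, as in the definition used here. The function names and the data layout are hypothetical.

```python
import statistics


def time_to_infect(infection_times, n):
    """t_n: time at which the n-th node is infected, or None if n is never reached."""
    reached = sorted(s for s in infection_times if s is not None)
    return reached[n - 1] if len(reached) >= n else None


def infection_time_statistics(runs, n):
    """Mean T_n and standard deviation sigma_t over runs that reach n infections."""
    times = [time_to_infect(run, n) for run in runs]
    times = [s for s in times if s is not None]
    return statistics.fmean(times), statistics.pstdev(times)


def hub_infection_time(infection_times, degrees, hub_fraction=0.3):
    """t_h: earliest infection time among hubs, i.e. nodes with k > 0.3 k_max."""
    k_max = max(degrees)
    hub_times = [s for s, k in zip(infection_times, degrees)
                 if k > hub_fraction * k_max and s is not None]
    return min(hub_times) if hub_times else None


runs = [[0, 1, 2, 3], [0, 5, 6, None], [0, 1, 1, 9]]
T_2, sigma_t = infection_time_statistics(runs, 2)
t_h = hub_infection_time(runs[1], degrees=[1, 10, 2, 4])
```

Computing $t_h$ per run and histogramming $n$ against $t - t_h$ is one way to reproduce the collapse described above. Runs that never reach $n$ infections are simply excluded from $T_n$ in this sketch.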
FIG. 3.
Role of infecting hubs in infection spreading on an SF network. The cluster of the first 80 infected nodes at the observation time in (a) the fastest spreading and (b) the slowest spreading. Node size and color represent the degree and the infection order of each node, respectively. (c) Plot of $n$ versus the difference between $t$ and the hub-infection time $t_h$. The mean and the standard deviation are shown, along with a solid line fitting Eq. (2) to the data.
FIG. 9.
The dependence of the hub-infection time $t_h$ on the initially infected node (seed). (a) The hub-infection time increases with the distance from the seed node to the nearest hub node, i.e., a node with degree larger than $0.3\,k_{\max}$, in SF networks with different numbers of nodes $N$. (b) Plot of $t_h$ versus the degree of the seed node.
ANALYTIC APPROACH TO THE MEAN INFECTION TIME
According to the conventional mean-field theory applied to heterogeneous networks [10,24], the probability that a susceptible node is infected per unit time is proportional to its degree and to the probability of encountering an infected neighbor. The latter probability is assumed to be a function of time and is solved for self-consistently. This reveals exponential growth and saturation of the number of infected nodes in the early- and late-time regimes, respectively [3,10,11,25]. In the intermediate-time regime, however, large fluctuations prevent us from relying on time-dependent functions.
To construct a theory for the mean infection time, let us first consider the time $\tau$ taken to newly infect a susceptible node when $n$ nodes are infected; its average will be identified with $T_{n+1} - T_n$. Infection spreads along the links connecting an infected node and a susceptible node. We call these links boundary links; their total number $\ell$, which is counted during the simulation of the SI model as described in Sec. II, essentially determines $\tau$. Boundary links were also used to formulate a uniform mean-field framework for time-dependent quantities in [26]. Suppose a cluster of infected nodes has $\ell$ boundary links, each transmitting infection with rate $\lambda$. Then the next infection occurs at a time between $\tau$ and $\tau + d\tau$ with probability $\lambda \ell\, e^{-\lambda \ell \tau} d\tau$, so that the mean waiting time is $1/(\lambda \ell)$. Given $n$ infected nodes, the fluctuation of $\ell$ is insignificant unless $n$ is very small (see Fig. 10 in Appendix D), which allows us to use its mean $\ell_n$. Therefore, we evaluate the mean time taken to infect one more node given $n$ infected ones as
$$T_{n+1} - T_n \simeq \frac{1}{\lambda \ell_n} = \frac{1}{\lambda\,(K_n - K_n^{\rm I})}, \qquad (4)$$
where we introduced
$$K_n = \Big\langle \sum_{i \in {\rm I}} \sum_{j} A_{ij} \Big\rangle = \sum_{m=1}^{n} k_m, \qquad K_n^{\rm I} = \Big\langle \sum_{i \in {\rm I}} \sum_{j \in {\rm I}} A_{ij} \Big\rangle. \qquad (5)$$
Here $A_{ij}$ is the adjacency matrix, which is symmetric, ${\rm I}$ is the set of $n$ infected nodes, and $k_m$ is the expected degree of the newly infected node given $m-1$ infected nodes or, equivalently, of the $m$th infected node. $K_n$ and $K_n^{\rm I}$ are the link-based volumes of the whole and of the internal part of the infected cluster: $K_n$ ($K_n^{\rm I}$) is the sum of the numbers of all (infected) neighbors of all infected nodes. Each internal link is counted twice in both sums, while each boundary link is counted once and only in $K_n$. Hence $\ell_n = K_n - K_n^{\rm I}$, which can be considered the boundary area of the cluster. A similar link-based approach was taken in establishing nonlinear differential equations for time-dependent variables [12,27].
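The link-based volumes and the one-step mean waiting time can be checked on a toy graph. In the sketch below, `K` sums the degrees of the infected nodes, `K_I` counts their infected neighbors, and `ell = K - K_I` counts boundary links. A small Monte Carlo run confirms that the first of `ell` independent rate-`lam` events occurs after a mean time of `1/(lam*ell)`. The adjacency-list layout, the toy graph, and the function names are assumptions made for illustration.

```python
import math
import random


def cluster_volumes(adj, infected):
    """Return (K, K_I, ell) for a set of infected nodes.

    K    : sum over infected i of the number of all neighbors of i
    K_I  : sum over infected i of the number of infected neighbors of i
    ell  : K - K_I, the number of boundary (infected-susceptible) links
    """
    inf = set(infected)
    K = sum(len(adj[i]) for i in inf)
    K_I = sum(1 for i in inf for j in adj[i] if j in inf)
    return K, K_I, K - K_I


def mean_waiting_time(adj, infected, lam):
    """Mean time to the next infection, 1 / (lam * ell); infinite if ell = 0."""
    ell = cluster_volumes(adj, infected)[2]
    return math.inf if ell == 0 else 1.0 / (lam * ell)


# Path 0-1-2-3 with an extra leaf 4 attached to node 1.
adj = [[1], [0, 2, 4], [1, 3], [2], [1]]
vols = cluster_volumes(adj, {0, 1, 2})        # a tree-like infected cluster
wait = mean_waiting_time(adj, {0, 1}, lam=0.5)

# Monte Carlo check: the first of ell rate-lam events waits 1/(lam*ell) on average.
rng = random.Random(0)
ell = cluster_volumes(adj, {0, 1})[2]
mc_wait = sum(min(rng.expovariate(0.5) for _ in range(ell))
              for _ in range(20000)) / 20000
```

For the tree-like cluster {0, 1, 2}, this gives `K_I = 2 * (3 - 1)` and two boundary links (1-4 and 2-3), matching a direct count.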
FIG. 10.
The ratio of the standard deviation to the mean of the number $\ell$ of boundary links for a given number $n$ of infected nodes in SF networks.
To complete and solve Eq. (4), the $n$ dependence of $K_n$ and $K_n^{\rm I}$ must be known. When a node $i$ is newly infected, $K^{\rm I}$ increases by twice the number of its previously infected neighbors, $2\sum_{j \in {\rm I}} A_{ij}$. When $n$ is not too large, the newly infected node is very likely to have just one infected neighbor, so no loop is formed in the infected cluster. This is supported by the tree structure of the clusters in Figs. 3(a) and 3(b) [28]. Hence $K^{\rm I}$ increases by 2 whenever a node is newly infected, resulting in
$$K_n^{\rm I} \simeq 2(n-1). \qquad (6)$$
This approximation is valid over a wide range of $n$ [Fig. 4(a)]. It breaks down only in the large-$n$ region, where a newly infected node can have more than one infected neighbor and loops form in the infected cluster.
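As a worked step, consider how one new infection changes the boundary area. Write $K_n$ and $K_n^{\rm I}$ for the whole and internal link-based volumes of a cluster of $n$ infected nodes, $\ell_n = K_n - K_n^{\rm I}$ for the number of boundary links, and $k_{n+1}$ for the degree of the next infected node. Assuming, as in the tree-like regime just described, that the new node has exactly one infected neighbor, the increments follow directly:

```latex
\begin{aligned}
K_{n+1} - K_n &= k_{n+1}, \qquad K^{\rm I}_{n+1} - K^{\rm I}_n = 2, \\
\ell_{n+1} - \ell_n &= k_{n+1} - 2 .
\end{aligned}
```

Suppose the degrees of newly infected nodes stay close to a constant $\bar{k} > 2$. Then $\ell_n \approx (\bar{k} - 2)\,n$ for large $n$, and summing the mean waiting times $1/(\lambda \ell_n)$ gives $T_n \approx \ln n / [\lambda(\bar{k} - 2)]$, i.e., exponential growth of $n$ with time. If instead the degree of newly infected nodes decreases with $n$, the boundary grows more slowly and so does $n$.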
FIG. 4.
The volume of the infected cluster and the mean infection time in SF networks. (a) The internal volume $K_n^{\rm I}$ versus $n$. The approximation in Eq. (6) is shown as a guide. (b) The degree $k_n$ of the newly infected node given $n-1$ infected nodes. Its cumulative sum gives the whole volume $K_n$ as in Eq. (10). The lines represent $\bar{k}_n$ computed using Eq. (8) with the degree distributions of the networks used in the simulations. (c) Plots of $T_n$ versus $n$ from simulations (points) and from the solutions (lines) to Eq. (4) with Eqs. (6) and (10) and initial conditions at $n = 1$, including $T_1 = 0$.
Next, we consider the degree of the $n$th infected node. Before its infection, the node was susceptible and connected to one of the previously infected nodes. Let us assume that every link from the susceptible nodes is equally likely to lead to one of the infected nodes. Then the probability that a susceptible neighbor of the infected nodes has degree $k$ can be approximated as $k S_k(n)/\sum_{k'} k' S_{k'}(n)$, where $S_k(n)$ is the expected number of susceptible nodes with degree $k$ given $n$ infected nodes. We also made an additional approximation in this relation, which is valid in the intermediate stage. The decrease in the number of susceptible nodes of degree $k$, $S_k(n) - S_k(n+1)$, is equal to $k S_k(n)/\sum_{k'} k' S_{k'}(n)$, giving
$$S_k(n+1) = S_k(n)\left[1 - \frac{k}{\sum_{k'} k' S_{k'}(n)}\right] \simeq S_k(n)\,\theta_n^{k}, \qquad \theta_n \equiv 1 - \frac{1}{\sum_{k'} k' S_{k'}(n)}. \qquad (7)$$
Here $\theta_n$ is the probability that a given link of a susceptible node is not used to transmit infection when a newly infected node appears. With $S_k(1) \simeq N P(k)$, Eq. (7) gives $S_k(n) \simeq N P(k)\, e^{-k x_n}$, where $x_n \equiv -\sum_{m=1}^{n-1} \ln \theta_m$. The expected degree of the $n$th infected node is then evaluated as
$$\bar{k}_n = \frac{\sum_k k^2 S_k(n-1)}{\sum_k k\, S_k(n-1)} = \kappa(x_{n-1}), \qquad \kappa(x) \equiv \frac{\sum_k k^2 P(k)\, e^{-kx}}{\sum_k k\, P(k)\, e^{-kx}}, \qquad (8)$$
where $\kappa(x)$ reduces to the mean degree of a node's neighbor, $\langle k^2 \rangle / \langle k \rangle$, for $x = 0$. Notice that $\kappa$ is computed by using the degree distribution of the underlying network.
Simulations support the agreement between $k_n$ and $\bar{k}_n$ [Fig. 4(b)]. $k_n$ is constant for small $n$ but decreases with $n$ for large $n$, particularly in SF networks with small $\gamma$. A similar decrease in the degree of newly infected nodes with time was noted in [11]. However, its functional behavior has remained unknown, and it must be understood for a theory of the mean infection time. The exponential term in Eq. (8) is the key. Suppose $n$ is so small that $k_{\max} x_n \ll 1$, i.e., $n \ll n_\times$, with $n_\times$ defined by
$$k_{\max}\, x_{n_\times} \simeq 1. \qquad (9)$$
Then the exponential term is close to 1 for all $k \le k_{\max}$, and thus $\bar{k}_n \simeq \langle k^2 \rangle / \langle k \rangle$. If $n \gg n_\times$, $e^{-k x_n}$ is quite small for $k > 1/x_n$. This means that susceptible nodes with degree larger than $1/x_n$ are rarely seen because they are already infected, which causes $\bar{k}_n$ to decrease with $n$. In the configuration-model SF networks [17], $n_\times$ scales differently with $N$ for $\gamma < 3$ and $\gamma > 3$. Therefore, the intermediate stage of infection is divided into two regimes: $n \ll n_\times$ and $n \gg n_\times$. The decay of $\bar{k}_n$ with $n$ for $n \gg n_\times$ is significant in SF networks with $\gamma < 3$, for which $\langle k^2 \rangle$ diverges with $N$. The asymptotic behaviors of $\bar{k}_n$ for $n \gg n_\times$, which differ between $\gamma < 3$ and $\gamma > 3$, are derived in Appendix E.
We solve Eq. (4) using Eq. (6) for $K_n^{\rm I}$ together with the approximation
$$K_n \simeq \sum_{m=1}^{n} \bar{k}_m \qquad (10)$$
for $K_n$, which grows linearly with $n$ for $n \ll n_\times$ and sublinearly for $n \gg n_\times$ when $\gamma < 3$. This yields $T_n$ from the degree distribution $P(k)$ of the substrate network. In Fig. 4(c), the simulation data agree with this solution in the intermediate stage.
The analytic solution for $T_n$ reveals a crossover around $n_\times$ from the exponential- to the algebraic-spreading phase in SF networks with $\gamma < 3$. For $n \ll n_\times$, $\bar{k}_n$ is fixed at $\langle k^2 \rangle/\langle k \rangle$, so that $K_n - K_n^{\rm I} \simeq (\langle k^2 \rangle/\langle k \rangle - 2)\,n$. This yields, up to a constant prefactor, the exponential spreading
$$n \sim \exp\!\left[\lambda\left(\frac{\langle k^2 \rangle}{\langle k \rangle} - 2\right) T_n\right]. \qquad (11)$$
In SF networks with $\gamma > 3$, $\bar{k}_n$ decreases very weakly with $n$ for $n \gg n_\times$, and therefore Eq. (11) remains approximately valid there as well. By contrast, in SF networks with $\gamma < 3$, the sublinear growth of $K_n$ for $n \gg n_\times$ leads to an algebraic relation between $n$ and $T_n$ [Eq. (12)], whose coefficient is derived in Appendix E. This means that infection spreads algebraically with time, more slowly than exponentially. Polynomial growth of $n$ with time has also been studied using the branching process approach [14]. It reflects the unequal chances of infection for nodes of different degrees: most hub nodes are infected for $n \ll n_\times$, and only small-degree nodes are left susceptible for $n \gg n_\times$. Knowing when this crossover in spreading speed occurs can help in designing and executing a timely and efficient strategy to intervene in the spreading process.
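The degree-resolved recursion for the number of susceptible nodes, the expected degree of newly infected nodes, and the mean infection time can be iterated numerically. The sketch below rests on several explicit assumptions. It starts from $S_k(1) = N P(k)$. Per new infection, each link of a susceptible node is taken to be the one used with probability $1/\sum_{k'} k' S_{k'}(n)$, so $S_k$ is multiplied by $\theta_n^k$ with $\theta_n = 1 - 1/\sum_{k'} k' S_{k'}(n)$. The seed is given the expected degree $\langle k \rangle$, and the internal volume is $2(n-1)$. The crossover $n_\times$ is taken as the first $n$ with $k_{\max} x_n \ge 1$, where $x_n = -\sum_{m<n} \ln \theta_m$. The helper `power_law_pk`, a discrete power law on an assumed degree range, and all function names are hypothetical.

```python
import math


def power_law_pk(gamma, k_min, k_max):
    """Hypothetical helper: normalized P(k) ~ k**(-gamma) for k_min <= k <= k_max."""
    weights = {k: k ** (-gamma) for k in range(k_min, k_max + 1)}
    z = sum(weights.values())
    return {k: w / z for k, w in weights.items()}


def mean_infection_time(pk, n_nodes, lam, n_max):
    """Iterate S_k(n), the expected degree of new infections, K_n, and T_n.

    Returns (T, kbar, n_cross) with T[n-1] = T_n and kbar[n-1] the expected
    degree of the n-th infected node; n_cross is the first n with
    k_max * x_n >= 1 (None if not reached).
    """
    s = {k: n_nodes * p for k, p in pk.items()}        # S_k(1) ~ N P(k)
    k_max = max(pk)
    mean_k = sum(k * p for k, p in pk.items())
    kbar = [mean_k]            # seed: a random node with expected degree <k>
    K = mean_k                 # whole volume K_1
    T = [0.0]                  # T_1 = 0
    x, n_cross = 0.0, None
    for n in range(1, n_max):
        boundary = K - 2 * (n - 1)                      # ell_n = K_n - K_n^I
        total = sum(k * sk for k, sk in s.items())      # links of susceptible nodes
        if boundary <= 0 or total <= 1:
            break
        T.append(T[-1] + 1.0 / (lam * boundary))        # T_{n+1} = T_n + 1/(lam ell_n)
        next_k = sum(k * k * sk for k, sk in s.items()) / total
        kbar.append(next_k)                             # degree of (n+1)-th node
        K += next_k                                     # K_{n+1} = K_n + kbar_{n+1}
        theta = 1.0 - 1.0 / total                       # a given link is not used
        s = {k: sk * theta ** k for k, sk in s.items()}  # S_k(n+1)
        x -= math.log(theta)                            # x_{n+1}
        if n_cross is None and k_max * x >= 1.0:
            n_cross = n + 1
    return T, kbar, n_cross


pk = power_law_pk(2.2, 2, 100)
T, kbar, n_cross = mean_infection_time(pk, 10_000, 0.5, 2_000)
```

Plotting $\ln n$ against $T_n$, and $\log n$ against $\log T_n$, from such output is one way to look for the exponential and algebraic regimes on either side of $n_\times$. The discrete degree distribution used here simplifies the integer-part sampling of Sec. II.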
CONCLUSION
To conclude, we have shown that the infection time is well defined as a function of the number of infected nodes, enabling a reliable description and prediction of the temporal pattern of spreading in heterogeneous networks. We investigated the link-based volume and boundary area of the infected cluster as functions of its size. This allowed us to see how node degree affects the order of infection and to understand the temporal complexity characterized by algebraic spreading in the nonstationary state. In more complex spreading dynamics, such as the susceptible-infected-susceptible and susceptible-infected-recovered models, the infected cluster may shrink in the bulk due to recovery as well as grow at the boundary; studying this interplay could deepen our understanding of spreading phenomena. The perspective and method presented in this work can be used in practical applications as well as in the study of various model dynamics on heterogeneous networks.