Literature DB >> 29720624

Explicit size distributions of failure cascades redefine systemic risk on finite networks.

Rebekka Burkholz¹, Hans J Herrmann^2,3, Frank Schweitzer⁴.

Abstract

How big is the risk that a few initial failures of nodes in a network amplify to large cascades that span a substantial share of all nodes? Predicting the final cascade size is critical to ensure the functioning of a system as a whole. Yet, this task is hampered by uncertain and missing information. In infinitely large networks, the average cascade size can often be estimated by approaches building on local tree and mean field approximations. Yet, as we demonstrate, in finite networks, this average does not need to be a likely outcome. Instead, we find broad and even bimodal cascade size distributions. This phenomenon persists for system sizes up to 107 and different cascade models, i.e. it is relevant for most real systems. To show this, we derive explicit closed-form solutions for the full probability distribution of the final cascade size. We focus on two topological limit cases, the complete network representing a dense network with a very narrow degree distribution, and the star network representing a sparse network with a inhomogeneous degree distribution. Those topologies are of great interest, as they either minimize or maximize the average cascade size and are common motifs in many real world networks.

Entities: CellLine Chemical Disease

Year: 2018 PMID： 29720624 PMCID： PMC5932047 DOI： 10.1038/s41598-018-25211-3

Source DB: PubMed Journal: Sci Rep ISSN： 2045-2322 Impact factor: 4.379

Introduction

Systemic risk is defined as the risk that a large fraction, ρ → 1, of a system fails. This can happen because extreme events shock the system[1] or because the failures of a few system elements cause new failures. This propagation results in failure cascades that can eventually reach the size of the system[2]. In the aftermath of the financial crisis in 2008/2009, systemic risks induced by such cascades received attention especially in the context of financial contagion of bank defaults[3-5]. Also the increasing complexity of economic value chains[6] raised awareness for similar effects[7], in particular concerning critical resources[8] and food[9]. Many of these works are inspired by stability analyses of ecological food webs[10-12]. But also generic models from physics and related areas have been utilized to study failure cascades. Examples include Ising models[13], models of fiber bundles that break under stress[14,15], models of epidemic spreading[16,17], voter models[18], percolation[19-22] and models of overload failures in power grids or other infrastructure[23]. Many of these cascade models have in common that they can be mapped to a generic threshold model for propagating failures on a graph G, or a complex network, where nodes represent the system elements and links between nodes their interactions.

Network model

Formally G = (V, E) is a constant undirected network consisting of N = |V| nodes, which are elements of the node set V. The link set E contains tuples of the form (i, j), each representing a pairwise connection between nodes i and j. Network topologies vary between Erdös-Rényi random networks, where the number of links per node can be described by an average degree, and scale-free networks, where the number of links vastly differs between core nodes, or hubs with very many links, and peripheral nodes with very few links. In this paper, we discuss limit cases of these two topologies, i.e. the complete network, where each node has the same number of links, N − 1, and the star network, where the center node has N − 1 links, but the peripheral nodes have only one link to the center. Both are recurring motifs of real world networks and an approximation of prevailing core periphery structures. In particular, financial and economic networks often have densely connected core nodes with peripheral nodes loosely connected to them[24,25]. Most importantly, for many models, one of the two, the complete network or the star, minimizes while the other maximizes average systemic risk[26,27], when systemic risk is measured by the cascade size, i.e. the fraction of failed nodes in the network.

Cascade processes

A failure of a node i at time t is indicated by its binary state variable s (t) = 1. Otherwise, it is healthy or functional, s (t) = 0, and can switch to failed in each of the discrete time steps t = 0, …, T. For simplicity, we exclude the possibility to recover, i.e. node states can only change from 0 to 1 but not vice versa. Each node state s (t) is characterized by two node variables, a threshold θ and the load λ (t). θ summarizes the “robustness” of a node, its ability to withstand shocks. It is assumed to stay fixed over time. Individual thresholds are initially drawn independently at random from an arbitrary cumulative distribution function (cdf) F (θ), often also called the response function. A node i fails at time t + 1, if its load exceeds its threshold, i.e. s (t + 1) = Θ[λ (t) − θ], where Θ[x] is the Heaviside function with Θ[x] = 1 if x ≥ 0 and Θ[x] = 0 otherwise. The load λ (t) can change because of interaction with neighboring nodes involving a load distribution process. This basically describes how a failure can propagate along a link if two nodes interact, because a load increase can result in λ (t) ≥ θ. Initially, all nodes with θ ≤ λ0 fail, where λ (0) = λ0 denotes the initial load of a node. This allows us to define the size of a cascade at time t as the fraction of failed nodes in the network: . A cascade maximally lasts T = N − 1 time steps because at least one node needs to fail at each time step to keep the process going. The final size of the cascade, ρ = ρ (T), i.e. the final fraction of failed nodes, is a common measure for systemic risk. While the cascade dynamics itself is deterministic, ρ is a random variable, as it depends on the thresholds that are drawn initially independently at random according to the distribution F (θ). This randomness can model uncertainty (e.g.g incomplete data or information) or changing exposures of nodes, for instance, due to fluctuating markets or aging system components in an engineering setting. Based on combinatorial arguments, in the materials and methods section and the SI, we derive a closed-form solution for the probability distribution of ρ on complete graphs, p (ρΔ), and stars, , of finite arbitrary size N.

Three paradigmatic load distribution mechanisms

We discuss three exemplary load distribution mechanisms well established in different fields. We refer to the first one[28] as exposure diversification (ED). It means that a node is exposed equally to the risk that any of its neighbors fail. The more neighbors, the better for hedging against such exposures (in low systemic risk regimes)[29]. A node carries simply the fraction of its failed neighbors as load. Thus, in complete networks each functional node carries the load λ [k] = k/(N − 1) when k nodes have failed. The ED mechanism has not only attracted theoretical interest[30-34], it is also used to explain opinion formation[35] and financial contagion[3,36,37], where random thresholds correspond to fluctuating exposures between banks and changing capital buffers. The randomness can also be interpreted as model uncertainty in such complex systems[38]. The second mechanism, denoted as damage diversification (DD)[29,39], assumes that a failing node distributes its total load 1 equally among its network neighbors. Consequently, DD coincides with ED for complete networks. The third mechanism defines the prominent fiber bundle model[14,40] in material science. A force, applied to a bundle of fibers, is modeled by a load λ0 carried by each node in a network, while a node’s threshold determines the strength of a fiber. When a fiber i breaks, it distributes its full load λ(t) equally to its still functional network neighbors, which become connected afterwards. On complete networks, this mechanism is also known as global load sharing rule and translates into λ [k] = λ0 + kλ0/(N − k) that each functional node carries when k nodes have failed. For the special and simplifying case of uniformly distributed thresholds, a formula for the cascade size distribution is known already[41]. With our new derivations, now arbitrary threshold distributions and thus different fiber materials can be studied. For the star network, we focus on a local load sharing variant where no new edges are created[42,43]

Results

The only random ingredient that we study is the cumulative threshold distribution F(θ). Our derivations apply to an arbitrary choice. Yet to be consistent and to allow for comparisons, we discuss only , i.e. a normal distribution with mean μ + λ0 and standard deviation σ. For ED and DD, λ0 does not influence the results, so we set the initial load to λ0 = 0. For the fiber bundle model, we assume λ0 = 1, i.e. a value large enough that we can observe both cascades that span the whole network (ρ = 1), and cascades that stop early. We are especially interested in parameter regions (μ, σ) that lead to medium average cascade sizes. This implies that such cascades would not encompass the whole system in case of an infinitely large network. We then investigate, for such “safe” parameter regions, the size of cascades for finite networks. As a reference case, Fig. 1(a) shows the phase diagram for ρΔ for infinitely large systems, assuming a complete network and the ED/DD mechanism[29,39]. ρΔ is taken as a measure of systemic risk. If the mean value of the threshold distribution, μ, is large, the system is less prone to large failure cascades. Taking a small, but fixed value of μ, we recall two important observations, (i) the sharp phase transition between completely safe systems (white, average cascade size zero) and completely failed systems at a critical value of (μ, σ), (ii) the gradual decrease of systemic risk in areas with high σ. This means, the heterogeneity in thresholds plays a crucial role in amplifying, or preventing, systemic risks. Therefore, we first study the impact of increasing heterogeneity σ on failure cascades in finite networks. Figure 1(b) shows the broad distribution of the final cascade size, spanning values between 0 and 1. Only for large values of σ we find distributions that match our expectations derived from the behavior of infinite systems, i.e. they are unimodal and symmetric. As σ decreases, however, the distributions become asymmetric and broader and even bi-modal. Only very close to a phase transition, broader distributions in finite systems are expected by theoretical physicists and, usually, explicit formulas for such distributions are unknown. But even for parameters further away from the phase transition in Fig. 1(a), the average cascade size is meaningless to characterize the system. Instead, we can expect either small cascades or, as a new observation, very big ones that are likely to destroy the whole system and are relevant for systemic risk estimations. Counter-intuitively, the smaller the threshold heterogeneity, the the less predictive the system behavior becomes.

Figure 1

ED/DD mechanism on complete network and normally distributed thresholds, i.e , with mean μ and standard deviation σ. (a) (Average) final fraction of failed nodes ρΔ (color coded) on infinitely large network. The darker the color, the higher is ρΔ. (b) Distribution of ρΔ on finite network of size N = 50 with respect to different σ. μ = 0.5 is fixed. This strong bi-modality is a new observation that only applies to finite systems and is not captured by risk approximations for infinite systems. To further investigate how this behavior depends on the system size N, we simulate, for parameters (μ, σ) = (0.5, 0.4), the distribution of final cascade sizes. Monte Carlo simulations are necessary because, for systems up to N = 108, the products of large binomial terms and small probabilities that appear in our exact derivations become unfeasible to solve numerically. Figure 2 shows the results. We see that the bi-modality persists even for networks of size N = 107. So, the question whether networks of that size can be reasonably described by approximations for infinite networks, is clearly answered by “no”. On the other hand, real economic networks which are prone to failure cascades are often of the order 104–106, for example ownership networks or collaboration networks of firms[6]. Hence, it is of importance to correctly reflect finite-size effects in risk estimations of such systems.

Figure 2

(a) Final cascade size distribution for complete networks of several sizes N obtained via 104 independent simulations of the ED mechanism, i.e. draws from P(ρΔ). The thresholds are independently normally distributed with mean μ = 0.5 and standard deviation σ = 0.4 (). (b) Position of the two local maxima of the distributions in (a) with respect to the network size N. Figure 3 shows the corresponding cascade size distributions for the star-shaped networks. Here, we have to distinguish the ED and the DD mechanisms of load distribution, which coincide only for complete networks. Figure 3(a) shows the results for the ED mechanism. Again, we observe bi-modality in the distribution of cascade sizes that vanishes only for large threshold heterogeneity σ. The bi-modality has a clear interpretation with respect to the central node. In smaller cascades, the center does not fail, which explains the peak at low values of . But if the center fails, it triggers a significant number of further failures of leaf nodes that explains the right peak in the distribution. This bi-modality is so fundamental that it does not even vanish in the limit N → ∞, as we show in the materials and methods section.

Figure 3

Probability distribution of the final fraction of failed nodes on a star network with N = 50 nodes for the (a) ED mechanism and (b) DD mechanism. The thresholds are normally distributed with mean μ = 0.5 and standard deviation σ (i.e. . The results for the DD mechanism presented in Fig. 3(b) show that, in contrast to the ED mechanism, the distributions of the final cascade size are unimodal, even for smaller σ. Again, this can be explained with the role of the central node. Although the center fails with high probability, with the DD mechanism, its failure distributes only a small amount of load equally to the (large number) of leaf nodes. Hence, this does not cause substantial additional failures. As in infinitely large systems[29], in the presence of hubs, failure cascades involving the DD mechanism expose the system to a smaller failure risk than cascades involving the ED mechanism - at the expense that hubs usually fail. Eventually, we also discuss the third load distribution mechanism used in the fiber bundle model. Because of the additional load λ0, more load is distributed after a cascade started. As a result, in complete networks large cascades amplify to a full system break-down shown in the very right peak at ρΔ = 1 in Fig. 4(a). This is an extreme case of the bi-modal distribution, where also small cascades can occur with high probability. As Fig. 4(b) shows, in star-like networks the range of probable cascade sizes is usually bigger. Dependent on the (μ, σ) values of the threshold distribution, bi-modal cascade-size distributions can still occur. The peaks of the cascade-size distribution can be explained again by the survival or the failure of the central node. In comparison with complete networks, these peaks are often close together. Further, the lower connectivity of the star-like networks reduces the risk of large cascades.

Figure 4

Fiber bundle model with initial load λ0 = 1. Probability distribution of the final fraction of failed nodes with respect to normally distributed thresholds, i.e. , on (a) complete network of size N = 30 and (b) star of size N = 50.

Discussion

Our driving question was whether systemic risk in finite systems can be reasonably estimated from approximations valid for infinite systems. We have to negate this question for two reasons. First, taking the final size of failure cascades as a risk measure, we could demonstrate, for the first time, that the distribution of this quantity changes fundamentally in finite systems. While a unimodal and narrow distribution, typical for infinite systems, would allow to use the average cascade size as a reasonable measure, we could show that for finite systems we often have a very broad and even bi-modal distribution. Hence, an estimated average does not quantify the real risk. Instead, small cascades but also extreme larger failure cascades become more likely. Even though a complete system break-down is unlikely as we show (which may be expected for infinite, but not for finite systems), large cascades in finite systems are much more frequent than expected so far. Secondly, we could also estimate the range of sizes for which such finite size effects play a role. We found that they are important for system sizes up to 107, i.e. these systems are quite large, even though finite. This demonstrates that corrections are needed not just for small systems of a few to a few hundred nodes. Instead, they are already necessary for all real-world systems, up to a hundred million nodes. In addition to these insights, we were able to derive novel explicit closed-form solutions to calculate the full distribution of the final cascade size, for two exemplary topologies. Our results are unique, as we derive exact solutions that apply to networks of arbitrary finite size N, a wide range of load distribution mechanisms, and arbitrary threshold distribution to characterize the robustness of the nodes. In light of the combinatorial complexity of the problem, such results are rare and crucially depend on the high symmetry of the considered networks. Complete networks and stars are of particular interest, as they appear to be systemic risk minimizing or maximizing when only the average cascade size is considered[26,27]. Intuitively, this is reasonable, as the complete graph represents the limit case of perfect risk diversification, where the risk is shared between all system components. The star network can be less prone to the amplification of failures in the course of a cascade, in particular when its central node is very robust. For that reason, central counter parties are often favored to reduce systemic risk. Yet, we have to reconsider our assessment of robust finite network topologies and take the probability of large system break-downs into account, as these are relevant for risk averse decision makers. Many further real world networks, especially economic networks, show a core-periphery structure[24,25] with densely connected core nodes and peripheral nodes loosely connected to the core. Thus, their topology can be understood as a superposition of the sample topologies discussed here, the complete network and the star network. Sparse topologies, with the star network as a paragon, occur frequently in real world networks[44] and play an important role in the spreading of failures[45]. Even other topologies, like Mandala[46] networks which are specifically designed to reduce average systemic risk with respect to percolation and related risk measures, consist of complete cliques that are highly interlinked and are thus prone to uncertainty about the final cascade size. These systems usually have sizes that require to consider the full cascade size distribution for the estimation of systemic risk, as we propose with our solutions. Beyond their elegance, such closed-form formulas have the advantage that they enable explicit optimization strategies for control of the involved parameters. For instance, we can derive derivatives with respect to those parameters that inform gradient based optimization methods. This allows to design robust systems, by controlling e.g. the broadness or the convergence of the cascade size distribution. It can be also applied in restoration strategies for failed systems[47]. Yet, we cannot expect to derive such analytic results for arbitrary network topologies. In this case, we have to rely on Monte Carlo simulations to estimate the broadness and shape of the cascade size distribution. We hope to inspire systemic risk analysts to base their insights and decisions on risk measures that can capture such broadness and multi-modality.

Methods

Derivation of cascade size distributions

In finite networks, the final cascade size ρ takes discrete values in the set {0, 1/N, 2/N, …, 1}. To determine the distributions P(ρΔ) and , we can calculate the probabilities for each event and , where k denotes the number of failed nodes at the end of the cascading process.

Complete networks

For complete networks, we can decompose the probability of k final failures into the product of the probability that N−k out of N nodes do not fail and the probability p that exactly k nodes fail Recall that λ [k] denotes the load that a node with k failed neighbors carries so that F(λ [k]) is the probability that this load exceeds the node’s threshold. With the help of the inclusion-exclusion principle[48], p can be expressed as The details of this derivation and proofs are provided in the supplementary information. Apparently, our derivations apply to an even larger class of models than threshold models. The failure probability of a node, here F(λ[k]), can be any function of the number of its failed neighbors.

Star networks

The central node with degree d = N − 1 has a prominent role, as it is the only node that can distribute accumulated load received by other nodes. We allow it to have a different threshold cdf F and load function λ than the remaining N − 1 leaf nodes, which have degree 1, threshold cdf F and carry loads of the form λ. A leaf node can fail initially with probability F (λ [0]) (with 0 neighboring failures) or, if the center has failed before, it can carry the load λ [j, l], which depends on two additional variables that are determined by the history of the center: j and l. j denotes the number of failed leaf nodes that have distributed load to the center, before the center has failed. l indicates the number of nodes among which the accumulated load of the center is shared (when the center fails). We simply add the probabilities for all possible cases and obtain: The first summand considers the case when the center does not fail, while the second term adds the probability for the case when the center fails initially. Then, each of the other k − 1 failures of leaves can either occur initially or because of a load distribution by the center. The size of this load might depend on the number of nodes l that fail initially together with the center, since these nodes cannot receive load after the failure of the center. The index j in the third term finally takes the events into account when j leaves have failed before the center. Remarkably, this distribution converges for N → ∞ to a discrete distribution with non-zero probability at three events, if λ [k] and λ [j, l] only depend on fractions of nodes (λ [k] = λ [k/N], λ [j, l] = λ [j/N, l/N]). In an infinitely large star, exactly F (λ0) of the leaves fail initially. With probability F (λ0), the center node fails as well initially so that of the nodes fail in total. With probability F (λ [F (λ0)]) − F (λ0), the center fails after the initial failures, which leads to . Or with probability 1 − F (λ [F (λ0)]), the center does not fail and we have . In case of the ED model, two cases collapse to one because λ [0, 1 − F (λ0)] = λ [F (λ0), 1 − F (λ0)] = 1. Thus, the observed bi-modality of the distributions in Fig. 3(a) persists for increasing network size N.

Exemplary load distribution mechanisms

We have discussed three basic load distribution mechanisms corresponding to the exposure diversification (ED) model, the damage diversification (DD) model and a fiber bundle model. In the following, we specify how these models define the loads used in the derivations above.

ED load redistribution

In the complete graph, each load carries the load λ [k] = k/(N − 1). For the star network, λ [k] = k/(N − 1) holds for the center node and λ [0] = 0, λ [j, l] = 1 for the leaf nodes.

DD load redistribution

In the complete graph, each load carries the load λ [k] = k/(N − 1). On the star network, DD corresponds to the choice λ [k] = k, λ [0] = 0, λ [j, l] = 1/(N − 1).

Fiber bundle model load redistribution

On complete networks, the total load λ0k has been redistributed so that each functional node carries λ [k] = λ0 + kλ0/(N − k). For stars, we have λ [k] = (k + 1)λ0, λ [0] = λ0, λ [j, l] = (j + 1)λ0/l. For finite networks of arbitrary size N, we have obtained explicit closed forms of the full probability distribution of final cascade sizes for two prominent topologies. This is a major achievement that allows us to calculate the systemic risk in a highly efficient manner. More detailed information can be found in the supplementary information. Supplementary Materials

26 in total

Explicit size distributions of failure cascades redefine systemic risk on finite networks.

Introduction

Network model

Cascade processes

Three paradigmatic load distribution mechanisms

Results

Discussion

Methods

Derivation of cascade size distributions

Complete networks

Star networks

Exemplary load distribution mechanisms

ED load redistribution

DD load redistribution

Fiber bundle model load redistribution

1. Emergence of scaling in random networks

2. Seed size strongly affects cascades on random networks.

3. How to make a fragile network robust and vice versa.

4. Information cascades on degree-correlated random networks.

5. Complex systems. Complexity theory and financial regulation.

6. Framework for cascade size calculations on random networks.

7. Emergence of core-peripheries in networks.

8. Correlations between thresholds and degrees: An analytic approach to model attacks and failure cascades.

9. Mandala networks: ultra-small-world and highly sparse graphs.

10. Systemic trade risk of critical resources.

1. Efficient message passing for cascade size distributions.

2. The Downside of Heterogeneity: How Established Relations Counteract Systemic Adaptivity in Tasks Assignments.