
Modelling indirect interactions during failure spreading in a project activity network.

Abstract

Spreading broadly refers to the notion of an entity propagating throughout a networked system via its interacting components. Evidence of its ubiquity and severity can be seen in a range of phenomena, from disease epidemics to financial systemic risk. In order to understand the dynamics of these critical phenomena, computational models map the probability of propagation as a function of direct exposure, typically in the form of pairwise interactions between components. By doing so, the important role of indirect interactions remains unexplored. In response, we develop a simple model that accounts for the effect of both direct and subsequent exposure, which we deploy in the novel context of failure propagation within a real-world engineering project. We show that subsequent exposure has a significant effect in key aspects, including the: (a) final spreading event size, (b) propagation rate, and (c) spreading event structure. In addition, we demonstrate the existence of 'hidden influentials' in large-scale spreading events, and evaluate the role of direct and subsequent exposure in their emergence. Given the evidence of the importance of subsequent exposure, our findings offer new insight on particular aspects that need to be included when modelling network dynamics in general, and spreading processes specifically.


Year: 2018 PMID: 29531250 PMCID: PMC5847592 DOI: 10.1038/s41598-018-22770-3

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Introduction

Recent years have witnessed a flurry of work on spreading processes[1,2], ranging from empirical expositions on the impact of such spreading (e.g. existence of large-scale spreading events[3-5], properties of ‘super-spreaders’)[6-10] to methodological developments that map the underlying dynamics (e.g. spreading mechanisms[11-14], modelling frameworks)[15-19]. Pairwise interactions are perceived as the centrepiece in understanding the evolution of spreading processes[20], since they capture the direct exposure of each node to the prospect of switching its state, for example from ‘non-affected’ to ‘affected’. Importantly, the impact of direct exposures can be supplemented by subsequent exposures, which may arise due to global or local network features. Past work has focused on the impact of these indirect effects by considering the global topology of the network (e.g. distribution of shortest paths)[21,22] and how it is influenced by particular mechanisms (e.g. flow redistribution)[23,24]. However, such indirect effects can also arise from local, non-trivial structures (e.g. particular network motifs such as the feed-forward loop)[25,26]. Despite the intuitive importance of these local structures[26], little attention has been paid to evaluating their impact on the overall spreading process, largely due to the particularities of the spreading models typically deployed to study these processes. In particular, spreading models can be classified into two broad categories[12,27], depending on the incorporated mechanisms: (i) epidemiological models, where the spreading process is viewed as an independent event across different pairs of nodes (e.g. the probability of node i affecting its neighbour j is independent of other interactions), and (ii) sociological models, where the spreading process is viewed as an interdependent event (e.g. the probability of node i affecting its neighbour j depends on the state of node j’s neighbours).
Despite this distinction, both model categories are grounded on the same fundamental premise, in which direct exposure is the principal factor that determines the dynamics of the spread. Yet the implications of this premise can be non-trivial, as subsequent exposures can interfere with the spreading process and affect key outcomes, even in simple examples like the one discussed below. Consider the toy network in Fig. 1, where each node can switch states irreversibly from ‘non-affected’ to ‘affected’ with a given probability p. In the case of an epidemiological spreading model, this premise corresponds to the widely used ‘Susceptible’-‘Infected’ model[28], where nodes and links correspond to individuals and infection pathways (e.g. social interactions), respectively. In this case, the ability of node k to affect its immediate neighbours is assessed independently across all possible pairs (pair k, i; pair k, j; pair k, l). At this point, node j is directly exposed to the infection of node k, through the directed link from node k to node j, and indirectly, through paths k → i → j and k → l → j (see Methods). As a result, the possibility for node j to switch state is evaluated at three distinct points during the evolution of the spread (one direct link; two indirect paths), compared to the single evaluation that takes place in the case of node i and node l (due to their direct link). Hence, node j is three times more likely to switch state, compared to node i and l, despite the fact that the probability of them changing state is uniformly set. We can trace this effect to the implicit assumption that direct and subsequent exposures are of equal importance, which is appropriate when dealing with typical disease spreading.

Figure 1

Example to highlight the distinction between the direct and subsequent exposure of node j to the failure of node k (time runs from left to right). The top panel focuses on the direct case, where node j is directly exposed to node k’s failure; the remaining panels illustrate the case where node j is subsequently exposed, via node i (middle panel) and node l (bottom panel).

Fleshing out this assumption, the probability of node j switching states, either due to its contact with node k or due to its contact with node i (as part of path k → i → j), is exactly the same. This suggests that the underlying pathogen that drives the spread has remained unchanged (and therefore, the probability of node k infecting its neighbours, including node j, is exactly the same as the probability of node i infecting its own neighbours, including node j). However, consider an alternative context where the pathogen is replaced by a defect, which spreads across a network of activities, where nodes correspond to technical activities[29] (e.g. specify the design of an engineering component) and links to functional dependencies (e.g. activity j can start if, and only if, its predecessor task i finishes; i.e. manufacturing an engineering component can start only if its specifications have first been defined)[30,31]. In this case, the interpretation of direct and subsequent exposures is distinct, since activity j may be particularly susceptible (or immune) to the particular defect that caused activity i to fail. For example, consider the case where activity k corresponds to ‘specify component’, activity i to ‘manufacture component’ and activity j to ‘install component’. In the case where activity k fails, the probability of affecting activity j may be low (e.g. due to the availability of prefabricated, standardised components).
Yet, if activity i is affected by the failure of activity k, the probability of activity j being affected is now higher – in this case the defect propagating across the activity network has evolved from ‘failure to specify’ to ‘failure to specify and manufacture’. Therefore, the fact that activity j is additionally exposed to the defect in an indirect way (through node i and node l; Fig. 1) directly interferes with the likelihood of activity j changing its state; yet typical spreading models – like the aforementioned SI model – would assume that such subsequent exposure does not interfere. In conjunction with the fact that activity networks have a high concentration of local, non-trivial structures[32] – and hence experience pronounced levels of subsequent exposure (see Supplementary Fig. 1) – the likelihood of obtaining misleading results through the use of typical spreading models is high. Consequently, a range of decisions that may be driven by those results can be significantly affected, from estimating an appropriately sized contingency budget to developing appropriate (project) risk management framework(s)[31].

In response, we propose a simple spreading model that allows us to map the impact of direct and subsequent exposures (see Methods). It does so by separating the probability of a defect spreading (p) into two distinct components: p1, which controls the probability of node j being affected by the failure of its predecessor, node k (Fig. 1, top panel), and p2, which controls the probability of node j being affected by additional, subsequent exposures to the failure of node k, through node i (Fig. 1, middle panel) or node l (Fig. 1, bottom panel). We distinguish between the two by keeping track of node k’s successor at time t, say node j, and assessing whether node j has been encountered before. If so, then node j has been directly exposed to the failure of a different predecessor before t (Fig. 1 middle/bottom panel at t = 2), and hence the probability of node j failing is controlled by p2; if node j is encountered for the first time, the probability of node j failing is controlled by p1 (Fig. 1 middle/bottom panel at t = 1). We formulate two models: M0, where p1 is varied and p2 = 0, and M, where both p1 and p2 are varied. Hence, any difference between the results obtained under M0 and M reflects the impact of subsequent exposure, with p2 controlling the magnitude of the effect. We deploy both model variants, M and M0, on a real-world network of activities (i.e. a project)[33,34], and explore two key quantities that characterise spreading, the spreading event size (S) and the rate at which spreading propagates, on both of which p2 has an important effect. We subsequently focus on particular structural features of the pathways used to sustain these spreading events, and specifically on the different impact that p1 and p2 have. Finally, we explore the topological properties of nodes involved in large-scale spreading events. In agreement with recent studies[6,35] on information spreads, we report the presence of ‘hidden influentials’ (i.e. nodes with average topological properties which play a key role in sustaining large-scale spreading events), with their existence being increasingly pronounced at higher p1 and/or p2 values.

Results

We first establish the importance of subsequent exposure (parameter p2) by illustrating its effect on the spreading event size through a comparative analysis between the results of spreading models M and M0. At this point we provide evidence on the link between subsequent exposure and clustering, where an increase in the latter (clustering) controls the magnitude of the former (subsequent exposure). We subsequently use model M to highlight the contrasting impact that direct (parameter p1) and subsequent exposures have on the propagation rate. We then focus on the relationships between the structural characteristics of these spreading events and their size/rate, and how p1 and p2 affect them. Finally, we provide insight on the topological features of nodes capable of fuelling large-scale spreading events, and how p1 and p2 influence them.

Spreading event size and propagation rate

We first define the ratio of the largest spreading event size obtained using the M model, over the largest spreading event size obtained under the M0 model, as rmax. In a similar fashion, we denote the ratio of the average spreading event sizes as ravg. With rmax and ravg being functions of both p1 and p2, we can examine the isolated effect of each parameter by considering the corresponding averages of rmax: one taken across p2 (leaving a function of p1) and one taken across p1 (leaving a function of p2). By applying the same approach to ravg, we obtain the corresponding averages of ravg. Increasing parameter p2 leads to values of rmax (Fig. 2a) and ravg (Fig. 2b) being significantly higher than 1, demonstrating the augmenting effect that subsequent exposures have on the spreading event size. This result suggests that this activity network contains a high enough number of non-trivial subgraphs – such as the ones included in Fig. 1 – which allows p2 to have a significant impact on the progression of the spreading process. If the converse were true (i.e. ratios close to 1), it would suggest that the activity network could be approximated by a ‘locally tree-like’ network, where subsequent exposure would have no effect on the spreading process, with results resembling those of a tree network (results from a tree network are shown in Supplementary Fig. 2 for reference).
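The ratios described above can be sketched in a few lines. This is only an illustrative sketch: the function names and the `{(p1, p2): ratio}` grid layout are assumptions made here, and the sketch assumes that each (p1, p2) setting of M is compared with the M0 run at the same p1.

```python
from statistics import mean

def size_ratios(sizes_m, sizes_m0):
    """Return (r_max, r_avg) for one parameter setting.

    sizes_m and sizes_m0 are lists of spreading event sizes obtained under
    models M and M0; every event contains at least its seed, so sizes >= 1.
    """
    r_max = max(sizes_m) / max(sizes_m0)
    r_avg = mean(sizes_m) / mean(sizes_m0)
    return r_max, r_avg

def average_over_other(ratios, keep):
    """Average a {(p1, p2): ratio} grid over the parameter that is not kept.

    keep=0 gives the ratio as a function of p1 (averaged across p2);
    keep=1 gives the ratio as a function of p2 (averaged across p1).
    """
    grouped = {}
    for params, ratio in ratios.items():
        grouped.setdefault(params[keep], []).append(ratio)
    return {value: mean(rs) for value, rs in grouped.items()}
```

A ratio above 1 under this sketch means that events under M are larger than under M0 for that setting.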

Figure 2

Difference in spreading event size between spreading models M0 and M. (a) Ratio of the largest spreading event sizes (rmax) under the entire spectrum of p1 and p2 (largest ); (b) like (a), focusing on the ratio of average spreading event sizes (ravg) under parameters p1 and p2; (c) ratio of largest (red) and average (blue) spreading event size, averaged across p2, as a function of p1; (d) ratio of largest (red) and average (blue) spreading event size, averaged across p1, as a function of p2.

Taken in isolation, p1 and p2 show qualitatively different characteristics in terms of their impact on the spreading process, further highlighting the non-trivial interaction between direct and subsequent exposure in the context of a spreading process. The concave relationship of both averaged ratios (rmax and ravg) with respect to p1 demonstrates the principal role of p1 in sustaining the spreading process, with both ratios converging to 1 at the two extreme ends of p1 (Fig. 2c). On one hand, if p1 = 0, no spreading occurs and therefore the effect of p2 is nullified, with both M and M0 converging to identical spreading events; on the other hand, when p1 = 1, direct exposure successfully switches the state of all nodes and therefore no nodes are left for p2 to affect. Interestingly, high ratio values are preserved up to relatively high p1 values (), indicating the strong influence of p2 even under unfavourable conditions, i.e. under p1 > p2, a node is much more likely to switch state due to a direct rather than a subsequent exposure, and therefore one would naturally expect that the influence of p2 would be limited, giving rise to a low ratio. Finally, the intuitive expectation of p2 having an ever-increasing effect on both ratios is supported by the monotonically increasing trends shown in Fig. 2d. This insight is consistent when we consider the cumulative probability distribution of S, focusing on (i) the probability of observing a spreading event of a given size, and (ii) the magnitude of the largest event. In this case, the augmenting role of p2 is particularly pronounced at the tail of the distribution, where higher p2 increases both (i) and (ii) (Fig. 3b). This is in contrast to the direct impact of p1, where higher p1 increases both (i) and (ii) across the entire range of S (Fig. 3a). Taken in conjunction, this behaviour demonstrates the subtle impact of indirect interactions with respect to the emergence of small-scale spreading events – which are largely driven by direct interactions – and their marked influence with respect to large-scale spreading events, both in terms of the probability of their emergence and their absolute size.

Figure 3

Cumulative probability distribution of spreading event size (S), where marker colour corresponds to (a) p1 and (b) p2. For clarity, p1 and p2 are sampled in 0.3 step intervals, taking the values of [0, 0.3, 0.6, 0.9].

The extent to which subsequent exposures control the spreading event size depends on the clustering of the network, as captured by the clustering coefficient (C)[36]. To demonstrate this, we have deployed the Watts-Strogatz model to generate artificial networks whilst varying the rewiring probability β, moving progressively from clustered to random networks. In doing so, we find that an increase in β (and hence, a drop in C) decreases the difference between spreading event sizes obtained by M and M0, for both and , with results being qualitatively similar to Fig. 2 (see Fig. 4). This finding suggests that the increased impact that subsequent exposure has on the spreading process in general – and on the activity network in particular – relies, at least partly, on increased clustering.

Figure 4

Difference in spreading event size between spreading models M0 and M for a set of artificial, ‘small-world’ networks with increasing probability of rewiring (β), ranging from clustered ((a) β = 0.1) to random ((f) β = 1) topologies; (g) ratio of average (blue) spreading event size, averaged across p2, as a function of p1, across the β spectrum; (h) ratio of average (blue) spreading event size, averaged across p1, as a function of p2, across the β spectrum.

We now focus on the rate at which a given spreading event propagates across the network, which we quantify as S/t, where S refers to the spreading event size and t refers to the average number of simulation steps needed for all nodes affected (within that spreading event) to switch state from ‘non-affected’ to ‘affected’ (which is a variation of survival probability)[37]. As such, we define the average propagation rate as the propagation rate for each spreading event, averaged across all events, and the maximum propagation rate as the propagation rate for the single largest spreading event. We find that direct and subsequent exposures have opposite effects on the propagation rate, in terms of both the average and the maximum rate. In particular, we find a positive relationship between p1 and the propagation rate, in terms of both the average (blue marker) and the maximum (red marker) rate, which corresponds to the intuitive expectation that increased direct exposure eases the way in which spreading progresses, enhancing the overall propagation rate (Fig. 5a). However, a negative relationship exists between p2 and the propagation rate, again for both the average and the maximum rate (Fig. 5b). This is due to the elaborate topology of the pathways deployed by the spreading process, where higher p2 increases the likelihood of utilising wider spreading pathways (i.e. high Sw; see Topology of spreading pathways), which are more likely to involve a higher number of links to be traversed for affecting the same number of nodes (increasing the denominator of S/t), eventually delaying the overall spreading process. Considering both effects, this behaviour suggests that the propagation rate is shaped by contrasting dynamics, where direct exposure provides immediate – and thus, faster – pathways for spreading to propagate, while subsequent exposure unlocks slower pathways which suppress the overall propagation rate (even though they may increase the overall spreading event size, as seen in Fig. 2d).
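As a rough illustration of the propagation rate, the sketch below divides the event size by the average number of steps the affected nodes needed to switch state. The input format (a hypothetical `affected_at` mapping from node to switch time), the inclusion of the seed in the event size, and the exclusion of the seed from the average are all assumptions made here, not details confirmed by the text.

```python
def propagation_rate(affected_at, seed):
    """Propagation rate of one spreading event, read here as S / t.

    affected_at maps each affected node to the step at which it switched
    state (the seed switches at step 0). S counts every affected node,
    including the seed (an assumption); t is the mean switch time of the
    non-seed nodes (also an assumption).
    """
    switch_times = [t for node, t in affected_at.items() if node != seed]
    if not switch_times:
        # Only the seed was affected: nothing propagated.
        return 0.0
    size = len(affected_at)
    avg_steps = sum(switch_times) / len(switch_times)
    return size / avg_steps
```

Under this reading, an event that reaches the same nodes through longer detours has a larger t and therefore a lower rate.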

Figure 5

Propagation rate as a function of (a) direct exposure, p1, and (b) subsequent exposure, p2. Blue and red markers correspond to the average rate across all spreading events and the propagation rate for the largest spreading event, respectively.

Topology of spreading pathways

We characterise the structure of a spreading event by considering the maximum depth and width of the underlying pathways that have sustained it. We define the maximum depth of a spreading event (Sd) as the shortest path between the initial seed node and the farthest node involved in the event[6]. In addition, we define its maximum width (Sw) as the maximum number of nodes affected whilst being at the same distance from the seed node[38]. As such, we characterise the structure of each spreading event as Sd/Sw, which provides a continuous measure for the overall shape of the underlying pathways, where a high value of Sd/Sw corresponds to long and narrow pathways, whilst a low value of Sd/Sw corresponds to short and wide pathways (see Supplementary Fig. 3 for examples). In that way, Sd/Sw is maximised when the spread is composed of a single linear chain, and minimised when the spread resembles the structure of a star-shaped network. We first focus on the relationship between the largest spreading event size and Sd/Sw, where we identify a non-trivial relationship roughly composed of two opposing trends, see Fig. 6 (the largest spreading event size is normalised over the total number of nodes, N). The first trend dominates the small to medium sized events, where the spreading event size increases in step with Sd/Sw, demonstrating the reliance of the spreading process on long and narrow pathways. However, as spreading events become larger than a given threshold (in this case, when) the positive relationship between the spreading event size and Sd/Sw reverses, with the spreading process enlisting an increasingly high number of relatively shorter and wider pathways. This switch suggests the existence of an upper bound on the number of long and narrow sequences of consecutive tasks within the activity network. These sequences are largely composed of low out-degree nodes and, given the finite size of the network, pose a limit to the growth of the spread. To surpass this limit, and to further fuel the growth of the spreading event size, spreading utilises additional pathways which emerge through the inclusion of occasional high out-degree nodes, which allow the spreading process to branch out in order to increase in size, resulting in relatively wider spreads.
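The depth and width measures can be illustrated with a breadth-first search from the seed. This is a minimal sketch under stated assumptions: the function name and the `successors` adjacency mapping are hypothetical, distances are measured along paths that stay within the affected nodes, and the seed's own level is left out of the width count.

```python
from collections import Counter, deque

def event_shape(successors, seed, affected):
    """Return (Sd, Sw, Sd / Sw) for one spreading event.

    successors maps a node to the list of its successors; affected is the
    collection of nodes involved in the event (seed included).
    """
    affected = set(affected)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, ()):
            if nxt in affected and nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    depth = max(dist.values())  # distance to the farthest affected node
    level_counts = Counter(d for d in dist.values() if d > 0)
    width = max(level_counts.values(), default=1)  # busiest distance level
    return depth, width, depth / width
```

For a linear chain of four nodes this gives a ratio of 3, while a star with three leaves gives 1/3, matching the two extremes described above.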

Figure 6

Parameter space of the largest spreading event size, normalised over the total number of nodes (N), and its underlying structure, Sd/Sw, as a function of (a) p1 and (b) p2; (c) the relationship between the largest spreading events, averaged over p2 and mapped as a function of p1, and (d) the relationship between the largest spreading events, averaged over p1 and mapped as a function of p2.

The rate at which the spreading event size increases depends on the spreading event structure. In the case where the spreading event size is negatively correlated with Sd/Sw, the spreading event size grows roughly 3 times faster than in the case where it is positively correlated with Sd/Sw (the gradient is roughly −0.09 and +0.03, respectively). This result emphasizes the multiplicative effect that wider structures can provide, which enhances the number of nodes that can be reached and, in turn, affected. With respect to the impact of direct and subsequent exposure, p1 and p2 show distinct trends, demonstrated by the marker colour patterns in Fig. 6a,b, respectively. Focusing on the impact of p1, the transition in marker colour, from blue to red, is accompanied by a smooth increase in the spreading event size (Fig. 6a,c). This result is somewhat expected, since p1 plays a key role in the progression of the overall spreading process. In addition, p1 has an important role in determining the structure of the resulting spreading event, albeit in a non-trivial manner. In particular, short and wide structures (low Sd/Sw) can occur at both extremes of p1, with the shape slowly converging to the highest attainable values as p1 approaches ~0.5. Shifting focus to the impact of p2, we observe that the entire range of Sd/Sw is obtainable under any given value of p2, indicating the limited role of subsequent exposure in determining the structure of the pathways used by the spreading process (Fig. 6b). The subtle impact of p2 on Sd/Sw is further highlighted by the limited range of Sd/Sw values reported in Fig. 6d, which is significantly narrower than the corresponding range under p1 in Fig. 6c. Note that these results are robust when we consider the relationship between the average (instead of the largest) spreading event size and Sd/Sw, as a function of p1 or p2, see Supplementary Fig. 4.

We now focus on the relationship between the propagation rate of the largest spreading event and the structure of the pathways used to sustain it, Sd/Sw, as a function of p1 (Fig. 7a) and p2 (Fig. 7b). Similar to Fig. 6, we identify a non-trivial relationship roughly composed of two distinct behaviours, where the propagation rate initially increases in step with spreading pathways becoming increasingly long and narrow (higher Sd/Sw). This overall increase in propagation rate is the result of two conflicting effects – an increase in propagation rate, driven by higher p1 values (Fig. 7a,c), combined with a decrease in propagation rate driven by higher p2 values (Fig. 7b,d). This result demonstrates the conflicting nature of p1 and p2, where the former relies on direct interactions which are faster to take effect, while the latter introduces additional indirect pathways that take longer to evaluate completely due to their non-trivial nature (similar to the ones depicted in Fig. 1).

Figure 7

The relationship between the propagation rate of the largest spreading events across the entire range of p1 and p2, and its underlying structure, Sd/Sw, as a function of (a) p1 and (b) p2; (c) the relationship between the propagation rate of the largest spreading events, averaged over p2 and mapped as a function of p1, and (d) the relationship between the propagation rate of the largest spreading events, averaged over p1 and mapped as a function of p2.

Once the propagation rate reaches a given threshold (in this case, ), its positive relationship with Sd/Sw reverses to a negative relationship, which essentially reflects the need to utilise wider pathways (and hence triggers a decrease in Sd/Sw) in order to increase the propagation rate further. This switch in behaviour is induced when p1 grows over ~0.5; as soon as this reversal in the relationship between the propagation rate and Sd/Sw takes place, the fork-like shape of Fig. 7 indicates that two possible trajectories are available. Importantly, the similar marker colouring within both trajectories (Fig. 7a) suggests that p1 has a limited role in determining which of the two trajectories is followed. Yet the distinct marker colouring shown in Fig. 7b indicates that p2 is the key parameter in determining which of the two trajectories is followed. Specifically, the primary trajectory is accessible under the entire range of p2, whereas the secondary trajectory is accessible only under a limited range of p2 values, roughly from 0 to 0.6. This demonstrates the complex nature of subsequent exposure: on one hand, low-to-medium subsequent exposure means that the topology of the utilised failure pathways depends on the topology of the network itself; on the other hand, high subsequent exposure always results in wider pathways being utilised (since any auxiliary path stemming from the main failure pathway is pursued to its full length, triggering an increase in Sw, and hence a decrease in Sd/Sw). Both of these aspects are further reinforced by isolating the results obtained at equal p1 and p2 increments, with p2 controlling the emergence of the second trajectory (see Supplementary Fig. 5 for p1, and Supplementary Fig. 6 for p2).

‘Hidden influentials’ and the effect of p1 and p2

Large-scale spreading events are typically associated with extraordinary topological characteristics of the node that initially triggers them (i.e. the seed node), the simplest of which is the (out-)degree[39]. Our results confirm this common conjecture, albeit with certain strong caveats. Specifically, we find that the spreading event size is highly correlated with node out-degree, as reported in refs[6,40] (Fig. 8a). However, this correlation deteriorates as p1 increases, which suggests that the spreading process is shifting from being a locally-driven process (and hence, dominated by the properties of the seed node) to a globally-driven process, where the characteristics of the intermediate nodes eventually dilute the correlation between spreading event size and the topological characteristics of the seed node. This correlation deteriorates faster once the effect of p2 is introduced, as additional intermediate nodes are employed early on during the spreading. Recent empirical work on information spreads has identified a similar effect, where large-scale spreading events are largely sustained by intermediate nodes with no special topological features – the so-called ‘hidden influentials’[6,41].

Figure 8

(a) Correlation coefficient between spreading event size and the out-degree of the initial seed node, as a function of p1 and p2; (b) average out-degree of nodes involved in a spreading event, excluding that of the initial seed node, as a function of the normalised spreading event size and p1; the dotted line corresponds to the network average out-degree; (c) largest average out-degree of involved nodes as a function of p1, where a lower value indicates an increasingly important role for typical nodes in sustaining spreading, converging to the network average; (d) largest average out-degree of involved nodes as a function of p2.

Following the work of Baños et al.[6], we evaluate whether these ‘hidden influentials’ exist in the activity network by comparing the average out-degree of nodes involved in a spreading event, excluding that of the seed node, with the network average out-degree, and by examining the relationship between the former and the spreading event size. Notably, the average out-degree of involved nodes converges to the network average as the spreading event size increases, confirming the presence of these ‘hidden influentials’ (Fig. 8b). This behaviour demonstrates that the existence of extraordinary nodes (e.g. hubs) is not a necessary condition for large-scale spreading to occur, and hints at other non-local topological properties that may characterise the nature of these ‘hidden influentials’[9,42]. More generally, this result highlights the intrinsic challenge in containing spreading, where system-wide spreading events are sustained by merely typical nodes, which themselves are hard to identify a priori. We now focus on exploring the impact of p1 and p2 on the emergence of these ‘hidden influentials’, by considering the largest average out-degree of involved nodes, averaged across the entire range of p2 and p1 values, respectively. Notably, an overall decreasing trend is noted as both p1 and p2 increase, where an increase in p1 triggers a rapid decrease in this quantity (Fig. 8c), converging towards the network average, while an increase in p2 triggers a linear decrease (Fig. 8d). This behaviour corresponds to an increase in the role of ‘hidden influentials’ in sustaining the spreading process as immediate failure becomes more likely. In conjunction with the fact that the spreading event size increases with larger p1 and/or p2, these results further suggest that larger spreading events may be harder to contain than smaller ones, simply because larger ones are increasingly reliant on the existence of these ‘hidden influentials’.
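The comparison behind the ‘hidden influentials’ test can be sketched as follows. The function names and the `successors` mapping are hypothetical; the sketch simply contrasts the mean out-degree of the nodes involved in an event (seed excluded) with the mean out-degree of the whole network.

```python
from statistics import mean

def mean_out_degree_excluding_seed(successors, affected, seed):
    """Mean out-degree of the nodes involved in an event, seed excluded.

    Returns None when the event contains only the seed.
    """
    others = [n for n in affected if n != seed]
    if not others:
        return None
    return mean(len(successors.get(n, ())) for n in others)

def network_mean_out_degree(successors, nodes):
    """Mean out-degree over every node of the network."""
    return mean(len(successors.get(n, ())) for n in nodes)
```

An event whose involved nodes have a mean out-degree close to the network mean is one sustained by typical nodes rather than by hubs.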

Discussion

In this paper, we have introduced a simple model which allows us to decouple the effect of subsequent exposure from the overall spreading process, and comparatively examine its impact on key quantities, including spreading event size (Figs 2,3) and propagation rate (Fig. 5). Our results highlight the conflicting dynamics of spreading, where subsequent exposure increases the number of nodes affected whilst reducing the rate at which the spread progresses. With subsequent exposure being a derivative of clustered networks, our results clarify broader discussions within the literature, which typically focus on providing high-level insight[43-46]. For example, ref.[44] focuses on identifying a (largely) positive link between clustering and spreading event size. Yet the question of why clustering enhances spreading event size remains unexplored, with cases of no effect being treated as outliers. Our results suggest that subsequent exposure is one possible mechanism underlying the relationship between clustering and enhanced spreading event size, allowing for additional aspects to be explored in a similar fashion. From a methodological standpoint, our results demonstrate the need to explicitly account for, and control the effects of, subsequent exposure when modelling spreading-like processes. For example, consider the frequent use of ‘locally tree-like’ approximations, typically used to obtain analytically tractable descriptions of various network dynamics[1,47-49]. Despite the valuable insights that these approximations provide, the eventual nullification of subsequent exposure – and its effect on the spreading process – clouds the real difference between these models and the respective real-world systems they represent, skewing our confidence and biasing results in a non-trivial manner.
The results presented within this paper serve as additional motivation for recently emerging lines of inquiry[50,51] which focus on relaxing the ‘locally tree-like’ assumption, integrating the effect of subsequent exposure into the overall spreading process. More generally, relaxing these approximations has the potential to uncover dynamical properties that are shared across a range of real-world systems, similar in spirit to the work of Barzel and Barabási[20]. In terms of applications, our work provides the grounds for a dialogue between researchers in network science and project management, where hotly researched domain challenges (e.g. project complexity evaluation)[52-57] can be treated as network-related problems[31,32]. For example, increased susceptibility to the spreading of failures can be reasonably interpreted as a contributing factor to project complexity. Hence, the relationship between spreading event size (or propagation rate) and the structure of the underlying pathways (Figs 6 and 7 respectively) can serve as an objective, quantitative measure of project complexity (assuming that the activity network is an up-to-date reflection of the actual project plan). Similarly, the proportion (and identity) of ‘hidden influentials’ within an activity network (Fig. 8) could be used to support the overall project risk mitigation scheme, where activities with limited connections (yet increased probability of being ‘hidden influentials’) receive adequate attention.

Methods

Data

The data comprise a real-world engineering project, capturing the set of planned activities that need to be completed in order to deliver a definitive commercial product in the area of defence. The overall duration of the project is 577 days, and it is composed of 578 distinct tasks with 1,085 dependencies. Note that some tasks are used as planning instruments (e.g. milestones)[58] and thus include no dependencies; these tasks are excluded from the analysis (8 tasks in total). The delivery of each activity is typically conditional on a number of other activities. We refer to these interactions as functional dependencies, since they effectively control the function of each activity, e.g. the start of activity j depends on the completion of activity i. The directionality of each dependency dictates the functional role of each activity, i.e. whether it acts as a predecessor (activity i precedes activity j) or a successor (activity k succeeds activity j) of another task (leaf activities, which have no successor activities, also exist). The set of tasks and dependencies was subsequently converted to an activity network, defined as a directed network G = (V, E), where V is the set of nodes and E is that of directed edges. Every activity is abstracted as a node, and a functional dependency between activity i and j is captured in the form of a directed link from node i to node j. The number of successors and predecessors each activity has corresponds to its out-degree and in-degree respectively. The cumulative probability distribution of out-degree (red) and in-degree (blue) is shown in Supplementary Fig. 7; note its heavy-tailed nature, evident from the straight line formed under the log-log axes.
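As an illustration of the conversion described above, the following is a minimal sketch of how a task and dependency list might be turned into a directed activity network with in-degree and out-degree counts. The function names (`build_activity_network`, `ccdf`) and the toy data are hypothetical and are not taken from the study; the use of a complementary cumulative distribution, P(X >= x), is likewise an assumption about how the degree distribution was plotted.

```python
from collections import defaultdict


def build_activity_network(tasks, dependencies):
    """Return (successors, predecessors) maps for a directed activity network.

    tasks: iterable of task identifiers.
    dependencies: iterable of (i, j) pairs, meaning activity j depends on the
    completion of activity i (a directed link i -> j).
    Tasks that take part in no dependency (e.g. milestones) are dropped.
    """
    successors = defaultdict(set)
    predecessors = defaultdict(set)
    for i, j in dependencies:
        successors[i].add(j)
        predecessors[j].add(i)
    # Keep only tasks that appear in at least one dependency.
    connected = {t for t in tasks if successors.get(t) or predecessors.get(t)}
    succ = {t: set(successors.get(t, ())) for t in connected}
    pred = {t: set(predecessors.get(t, ())) for t in connected}
    return succ, pred


def ccdf(values):
    """Complementary cumulative distribution P(X >= x) for each distinct x."""
    values = list(values)
    n = len(values)
    return {x: sum(v >= x for v in values) / n for x in sorted(set(values))}


# Toy example (hypothetical data, not the project data set).
tasks = ["a", "b", "c", "d", "milestone"]
deps = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
succ, pred = build_activity_network(tasks, deps)
out_degree = {t: len(s) for t, s in succ.items()}  # number of successors
in_degree = {t: len(p) for t, p in pred.items()}   # number of predecessors
```

In this toy network the isolated `milestone` task is excluded, and node `c` sits at the end of a feed-forward loop (a -> b -> c together with a -> c), which is the kind of local structure that gives rise to subsequent exposure.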

Spreading Model Formulation

Every node j of the network at time t is characterised by a dynamic binary state variable, where ‘0’ and ‘1’ correspond to the ‘non-affected’ and ‘affected’ state, respectively. During the spreading process, node j may irreversibly switch from the ‘non-affected’ to the ‘affected’ state at time t if: (i) node j has at least one predecessor, node i, and (ii) at least one node i was in the ‘affected’ state at t-1. To initiate the process, we artificially switch the state of some seed node at t = 0 from ‘non-affected’ to ‘affected’, and track the progression of the spreading process as time increases in discrete increments of 1. In order to distinguish between direct and subsequent exposures, we keep track of node i’s successors at time t, say node j, and assess whether node j has been encountered before. If so, then node j has been directly exposed to the failure of a different predecessor at some earlier time (<t), but did not switch states during that time. Therefore, the probability of node j switching states now, at time t, is controlled by p2 (Fig. 1, middle panel). However, if this is the first time node j has been encountered, then it is the first time node j is exposed to failure in general, and therefore the probability of switching states at time t is controlled by p1 (e.g. Fig. 1, top panel). Note the broad nature of the term ‘affected’, acknowledging the fact that failure can mean very different things, depending on the context of the project. For example, ‘failure’ can mean ‘structural defect’ in a construction project, or something much less tangible, such as ‘contaminated’ or ‘compromised’, in a cyber-security project.
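The switching rule described above can be summarised compactly. The notation below is an assumption introduced for illustration and is not taken from the original equations: $P_{i \to j}(t)$ denotes the probability that a single exposure of a ‘non-affected’ node j to an ‘affected’ predecessor i switches node j at time t, and $p_2$ denotes the probability associated with subsequent exposure.

```latex
P_{i \to j}(t) =
\begin{cases}
p_1, & \text{if node } j \text{ is encountered for the first time (direct exposure)},\\
p_2, & \text{if node } j \text{ has been encountered before without switching (subsequent exposure)}.
\end{cases}
```

Under this reading, a model that ignores subsequent exposure corresponds to the special case $p_2 = 0$.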

Spreading Model Implementation

The model is implemented as follows. First, an initialisation phase is carried out, where simulation time t is set to 0 and the state of all nodes is set to ‘0’. In addition, an empty set B is created in order to record all successor nodes encountered during time t. The spreading process is initiated by externally switching the state of node i from ‘0’ to ‘1’. We then identify all successors of node i, node(s) j (if no successors exist, the process terminates). For each node j, we record index j in set B(t), and then check whether index j was already present in the set at an earlier time. If index j was not present, the interaction between node i and j is the result of direct exposure; hence, the probability of node j switching states, under both model M and M0, is equal to p1. However, if index j was already present, the interaction between node i and j is the result of subsequent exposure; hence, the probability of node j switching states, under model M, is equal to p2 (in the case of M0, p2 is always set to 0). Once all node(s) j have been tested with respect to the prospect of changing states, we record the total number of state changes up to, and including, time t, and increase t by 1. The process repeats until the total number of state changes remains constant between consecutive time steps. The process is then reiterated for each node i, in order to evaluate the total number of state changes caused by the failure of every possible seed node. Finally, this procedure is repeated for 48 independent runs, with the results presented herein being the average (the number of runs was determined in order to minimise the standard error of the mean). Formally, the condition by which node j changes state depends on the spreading model used: the condition for spreading model M0 is given by eq. 1 and that for spreading model M by eq. 2, where the random variable is drawn uniformly at random from U(0,1) in both equations.
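The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`simulate_spread`, `mean_final_size`) are hypothetical; each affected node is assumed to expose its successors once, in the step after it switches; a node exposed twice within the same step is assumed to be treated as subsequently exposed on the second test; and the seed is counted in the event size. Setting `p2=0` gives the model without subsequent exposure (M0).

```python
import random


def simulate_spread(successors, seed, p1, p2, rng=None):
    """Simulate one spreading event; return the cumulative size per time step.

    successors: dict mapping each node to an iterable of its successor nodes.
    p1: switching probability on first (direct) exposure.
    p2: switching probability on subsequent exposure (0 reproduces M0).
    """
    rng = rng or random.Random()
    affected = {seed}        # nodes in state '1'
    encountered = set()      # set B: successors exposed at least once
    frontier = [seed]        # nodes that switched in the previous step
    size_over_time = [1]     # seed counted as the first state change (assumption)
    while frontier:
        newly_affected = []
        for i in frontier:
            for j in successors.get(i, ()):
                if j in affected:
                    continue  # switching is irreversible
                p = p2 if j in encountered else p1
                encountered.add(j)
                if rng.random() < p:
                    affected.add(j)
                    newly_affected.append(j)
        size_over_time.append(len(affected))
        frontier = newly_affected  # empty frontier: size stayed constant, stop
    return size_over_time


def mean_final_size(successors, p1, p2, runs=48, rng=None):
    """Average final event size for every possible seed node over `runs` runs."""
    rng = rng or random.Random(0)
    nodes = set(successors) | {j for s in successors.values() for j in s}
    return {
        seed: sum(simulate_spread(successors, seed, p1, p2, rng)[-1]
                  for _ in range(runs)) / runs
        for seed in nodes
    }


# Toy feed-forward loop (hypothetical): a -> b, a -> c, b -> c.
ffl = {"a": ["b", "c"], "b": ["c"], "c": []}
print(simulate_spread(ffl, "a", p1=1.0, p2=0.0))  # deterministic: [1, 3, 3]
```

With intermediate values of `p1`, comparing `mean_final_size(ffl, p1, 0.0)` against `mean_final_size(ffl, p1, p2)` isolates the contribution of subsequent exposure: node `c` gets a second chance via `b` only when `p2 > 0`.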

Data availability

The datasets generated and/or analysed during the current study, and related source code, are available from the corresponding author on reasonable request.

29 in total

1. Network motifs: simple building blocks of complex networks.

2. Message passing approach for general epidemic models.

3. A generalized model of social and biological contagion.

4. A simple model of global cascades on random networks.

5. Clustering in complex networks. II. Percolation properties.

6. Cascades on correlated and modular random networks.

7. Collective Influence of Multiple Spreaders Evaluated by Tracing Real Information Flow in Large-Scale Social Networks.

8. Default cascades in complex networks: topology and systemic risk.

9. Universality in network dynamics.

10. Spatio-temporal propagation of cascading overload failures in spatially embedded networks.