
The localization of non-backtracking centrality in networks and its physical consequences.

Romualdo Pastor-Satorras, Claudio Castellano.

Abstract

The spectrum of the non-backtracking matrix plays a crucial role in determining various structural and dynamical properties of networked systems, ranging from the threshold in bond percolation and non-recurrent epidemic processes, to community structure, to node importance. Here we calculate the largest eigenvalue of the non-backtracking matrix and the associated non-backtracking centrality for uncorrelated random networks, finding expressions in excellent agreement with numerical results. We show, however, that the same formulas do not work well for many real-world networks. We identify the mechanism responsible for this violation in the localization of the non-backtracking centrality on network subgraphs whose formation is highly unlikely in uncorrelated networks, but rather common in real-world structures. Exploiting this knowledge, we present a heuristic generalized formula for the largest eigenvalue, which is remarkably accurate for all networks of a large empirical dataset. We show that this newly uncovered localization phenomenon explains the failure of the message-passing prediction for the percolation threshold in many real-world structures.


Year: 2020 PMID: 33303816 PMCID: PMC7728761 DOI: 10.1038/s41598-020-78582-x

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Introduction

The non-backtracking (NB) operator is a binary matrix representation of the topology of a network, whose elements represent the presence of non-backtracking paths between pairs of different nodes, traversing a third intermediate one[1,2]. By means of a message-passing approach[3], the NB matrix finds a natural use in the representation of dynamical processes on networks, such as percolation[4,5] and non-recurrent epidemics[6], where a spreading process cannot affect a given node twice, and therefore backtracking propagation paths are inhibited[7,8]. Within this approach, the bond percolation threshold and the epidemic threshold in the SIR model[6] are found to be inversely proportional to the largest eigenvalue (LEV) of the NB matrix, $\mu_M$. The spectrum of the non-backtracking matrix is relevant also for other problems in network science, such as community structure[9] and node importance[2,10-12].

The principal eigenvector (PEV) associated with the LEV of the NB matrix has been recently used to build a new measure of node importance or centrality[13]. A classical measure of node centrality is given by eigenvector centrality, based on the idea that a node is central if it is connected to other central nodes. In this perspective, the eigenvector centrality of node i is defined as the ith component of the principal eigenvector of the adjacency matrix[14]. Eigenvector centrality has the drawback of being strongly affected by the presence of large hubs, which exhibit an exceedingly large component of the adjacency matrix PEV because of a peculiar self-reinforcing bootstrap effect. The hub is highly central since it has a large number of mildly central neighbors; the neighbors are in turn central just because of their proximity to the highly central hub[2,15]. In terms of the adjacency matrix this self-reinforcement is revealed by the localization of the PEV on a star graph composed of the largest hub and its immediate neighbors.
To correct for this feature, in Ref.[2] it was proposed to build a centrality measure using the NB matrix, in such a way as to avoid backtracking paths that could artificially inflate a hub’s centrality. In this way, an alternative non-backtracking centrality (NBC) of nodes was defined, in which the effect of hubs is strongly suppressed.

Consider an unweighted undirected complex network with N nodes and E edges. The non-backtracking (NB) matrix $\mathbf{B}$ is a representation of the network topology in terms of a $2E \times 2E$ non-symmetric matrix in which rows and columns represent virtual directed edges $j \to i$ pointing from node j to node i, taking the value

$$B_{j \to i,\, m \to \ell} = \delta_{j\ell}\,(1 - \delta_{im}), \qquad (1)$$

where $\delta_{ij}$ represents the Kronecker symbol. Each NB matrix element represents a possible walk in the network composed of a pair of directed edges, one pointing from node m to node $\ell$, and the other from node j to node i. The element is nonzero when the edges share the central node ($\ell = j$), and when the walk does not return to the first node ($m \neq i$). The principal eigenvector $f_{j \to i}$ of the NB matrix, associated with the largest eigenvalue (LEV) $\mu_M$, is given by the relation

$$\mu_M\, f_{j \to i} = \sum_{m \to \ell} B_{j \to i,\, m \to \ell}\, f_{m \to \ell}. \qquad (2)$$

Since $\mathbf{B}$ is a non-negative matrix, the Perron–Frobenius theorem[16] guarantees that $\mu_M$ and all components $f_{j \to i}$ are positive, provided that the matrix is irreducible. The element $f_{j \to i}$ expresses the centrality of node j, disregarding the possible contribution of node i. The non-backtracking centrality $x_i$ of node i is defined as[2]

$$x_i = \sum_j A_{ij}\, f_{j \to i}, \qquad (3)$$

where $A_{ij}$ is the network adjacency matrix. If the PEV of the NB matrix is normalized as $\sum_{j \to i} f_{j \to i} = 1$, which is valid if $\mathbf{B}$ is irreducible, then the natural normalization $\sum_i x_i = 1$ emerges.
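
The definitions in Eqs. (1)-(3) can be illustrated with a minimal sketch that applies power iteration directly to the directed-edge components, without building the NB matrix explicitly. This is not the authors' code (the paper iterates the Ihara–Bass matrix instead); the function name `nb_centrality`, the edge-list input format and the use of a shifted operator (B + I, so that bipartite graphs also converge) are assumptions made here for illustration.

```python
from collections import defaultdict


def nb_centrality(edges, iters=2000, tol=1e-12):
    """Estimate the NB largest eigenvalue and the NBC of each node.

    edges: iterable of undirected pairs (u, v), u != v, no duplicates.
    Returns (mu, x) with x normalized so that sum(x.values()) == 1.
    """
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    # One component f[(j, i)] per directed edge j -> i, normalized to sum 1.
    f = {(j, i): 1.0 for j in nbrs for i in nbrs[j]}
    total = sum(f.values())
    f = {e: val / total for e, val in f.items()}
    mu = 0.0
    for _ in range(iters):
        # s[j] = sum of all components m -> j arriving at node j.
        s = defaultdict(float)
        for (m, j), val in f.items():
            s[j] += val
        # (B f)_{j->i}: everything arriving at j except what comes from i.
        bf = {(j, i): s[j] - f[(i, j)] for (j, i) in f}
        new_mu = sum(bf.values())  # eigenvalue estimate, since sum(f) == 1
        # Iterate with B + I (hypothetical choice) to damp oscillations.
        g = {e: bf[e] + f[e] for e in f}
        norm = sum(g.values())
        f = {e: val / norm for e, val in g.items()}
        converged = abs(new_mu - mu) < tol
        mu = new_mu
        if converged:
            break
    x = defaultdict(float)
    for (j, i), val in f.items():
        x[i] += val  # Eq. (3): x_i = sum_j A_ij f_{j->i}
    return mu, dict(x)
```

For a complete graph of four nodes, where every node has degree 3, the sketch returns an eigenvalue equal to the degree minus one and a uniform centrality, as expected for a regular graph.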

Results

Theory for uncorrelated random networks

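The NBC can be practically calculated by using the Ihara–Bass determinant formula[2,17], which shows that the NBC values correspond to the first N elements of the PEV of the $2N \times 2N$ matrix

$$\mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{I} - \mathbf{D} \\ \mathbf{I} & \mathbf{0} \end{pmatrix}, \qquad (4)$$

where $\mathbf{A}$ is the adjacency matrix, $\mathbf{I}$ is the identity matrix, and $\mathbf{D}$ is a diagonal matrix of elements $D_{ij} = \delta_{ij} k_i$, with $k_i$ the degree of node i. Using the Ihara–Bass formalism[18] (see Method “Theory for uncorrelated networks” section) one can express, in full generality, the leading eigenvalue in terms of the NBC, normalized as $\sum_i x_i = 1$, as

$$\mu_M = \sum_i (k_i - 1)\, x_i. \qquad (5)$$

Following Ref.[2] (see Method “Theory for uncorrelated networks” section), it is possible to argue that, for uncorrelated random networks, i.e., networks with a given degree sequence but completely random in all other respects[13], the dependence of the components of the NB matrix PEV is

$$f_{j \to i} \simeq \frac{k_j - 1}{N\left[\langle k^2 \rangle - \langle k \rangle\right]}. \qquad (6)$$

Introducing this relation into the definition of the NBC, Eq. (3), and applying the normalization $\sum_i x_i = 1$, we obtain

$$x_i^{\mathrm{un}} \simeq \frac{\sum_j A_{ij}\,(k_j - 1)}{N\left[\langle k^2 \rangle - \langle k \rangle\right]}, \qquad (7)$$

that, inserted into Eq. (5), leads to

$$\mu_M^{\mathrm{un}} \simeq \frac{\sum_{ij} A_{ij}\,(k_i - 1)(k_j - 1)}{N\left[\langle k^2 \rangle - \langle k \rangle\right]}. \qquad (8)$$

These expressions constitute an improvement over previous results[2,9,18], namely

$$x_i^{\mathrm{an}} = \frac{k_i}{N \langle k \rangle}, \qquad \mu_M^{\mathrm{an}} = \frac{\langle k^2 \rangle}{\langle k \rangle} - 1 \qquad (9)$$

($\langle k^n \rangle$ is the nth moment of the degree distribution), which can be recovered from Eqs. (7) and (8) by replacing the network adjacency matrix with its annealed approximated value $\bar{A}_{ij} = k_i k_j / (N \langle k \rangle)$[19,20].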
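
As a rough illustration of how the uncorrelated and annealed predictions differ in practice, the following sketch evaluates both from an edge list. It assumes the forms of Eqs. (7)-(9) in which the uncorrelated expressions are sums over the adjacency matrix weighted by $(k_j - 1)$ and normalized by $N[\langle k^2 \rangle - \langle k \rangle]$; the function names are hypothetical, and the code assumes at least one node of degree larger than one so that the normalization is nonzero.

```python
def degrees(edges):
    """Degree of every node appearing in an undirected edge list."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def nb_lev_predictions(edges):
    """Return (mu_uncorrelated, mu_annealed) for the NB largest eigenvalue."""
    deg = degrees(edges)
    n = len(deg)
    k1 = sum(deg.values()) / n
    k2 = sum(k * k for k in deg.values()) / n
    # Each undirected edge appears twice in the double sum over (i, j).
    num = 2.0 * sum((deg[u] - 1) * (deg[v] - 1) for u, v in edges)
    mu_un = num / (n * (k2 - k1))
    mu_an = k2 / k1 - 1.0
    return mu_un, mu_an


def nbc_uncorrelated(edges):
    """Uncorrelated prediction for the NBC of every node (sums to 1)."""
    deg = degrees(edges)
    n = len(deg)
    k1 = sum(deg.values()) / n
    k2 = sum(k * k for k in deg.values()) / n
    x = {v: 0.0 for v in deg}
    for u, v in edges:
        x[u] += deg[v] - 1
        x[v] += deg[u] - 1
    return {v: val / (n * (k2 - k1)) for v, val in x.items()}
```

On a regular graph the two predictions coincide. On a star, which is a tree and therefore admits no closed non-backtracking walk, the uncorrelated expression gives zero while the annealed one does not, which is one simple way to see that the adjacency structure carries information the annealed approximation discards.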

Test on synthetic networks

We now check the predictions developed above with the LEV and the NBC determined numerically by applying the power iteration method[21] to the Ihara–Bass matrix for random uncorrelated networks with a power-law degree distribution $P(k) \sim k^{-\gamma}$, generated using the uncorrelated configuration model (UCM)[22]. In Fig. 1 we present, as a function of the network size N, a comparison between the NB LEV, $\mu_M$, evaluated numerically and our theoretical prediction Eq. (8). The match between theory and simulation is excellent. However, Eq. (9) also gives very accurate results, which on average stay close to the theoretical result Eq. (8). A much more noticeable improvement is observed instead for the NB centrality $x_i$, for which the annealed network approximation does not provide accurate predictions (see Fig. 2, bottom row). In Fig. 2 (top row) we show the dependence of the NBC on the structure of the adjacency matrix, as given by Eq. (7), namely $x_i \propto \sum_j A_{ij}(k_j - 1)$. The analytical expression is extremely accurate over part of the range of $\gamma$ considered. Over the rest of the range, although some scattering can be observed with respect to the expected value, the prediction is still good, and much more accurate than the annealed network approximation. More evidence about the superior accuracy of our approach is found considering the inverse participation ratio as a function of network size (see Method “Localization of the non-backtracking centrality”).

Figure 1

NB LEV for uncorrelated networks. Scaling of the LEV of the NB matrix as a function of network size N in power-law UCM networks with different degree exponent $\gamma$. Dashed lines correspond to the theoretical prediction Eq. (8). Simulation results correspond to the average over 25 different network realizations. Error bars are smaller than the symbol size.

Figure 2

NBC for uncorrelated networks. Scatter plot of the numerical NBC in power-law UCM networks with different degree exponent $\gamma$, as a function of the theoretical predictions in Eq. (7) (top row) and in Eq. (9) (bottom row). The dashed lines represent the identity line $y = x$. Each column corresponds to one degree exponent: panels (a) and (e); (b) and (f); (c) and (g); (d) and (h).

Non-backtracking principal eigenvalue of characteristic subgraphs

The non-backtracking centrality was introduced with the goal of overcoming the flaws of eigenvector centrality, due to the localization of the adjacency matrix principal eigenvector on star graphs surrounding hubs of large degree, which artificially inflate their own eigenvector centrality[2]. For the NBC the addition of a large hub to an otherwise homogeneous network has a limited impact. Indeed, the addition of a dangling hub of degree K, connected to leaves of degree 1 and to a generic network by a single edge, does not alter at all the value of $\mu_M$[2,9] (see Method “Largest non-backtracking eigenvalue of characteristic subgraphs” section). In the case of a hub integrated into the network, connected to K other random nodes in the graph, Ref.[2] argued, from the perspective of the annealed network approximation, that its effect is irrelevant in the thermodynamic limit. A more elaborate analysis (see Method “Largest non-backtracking eigenvalue of characteristic subgraphs” section) shows that this is true unless the hub degree K is sufficiently large compared with the network size. Only in this case does an integrated hub have an effect, leading to a LEV significantly larger than the LEV of the original network.

However, it is possible that other types of subgraphs play for the NB centrality the same role that star graphs play for eigenvector centrality: They can have, alone, large values of $\mu_M$, so that, if present within an otherwise random network, they determine $\mu_M$ of the whole structure, with the overall NBC localized on them. We now show that these subgraphs actually exist and can have dramatic effects. As noticed in Ref.[2], the simplest example is a clique of size $K_c$, which is a regular graph of degree $K_c - 1$ and is therefore associated with a LEV equal to $K_c - 2$. If $K_c$ is large enough, this value can dominate over the uncorrelated prediction of Eq. (8). But also a homogeneous (Poisson) subgraph of average degree $\langle k \rangle_h$, for which the LEV is approximately $\langle k \rangle_h$[2,9], can become the substrate of a localized NB PEV if $\langle k \rangle_h$ is sufficiently large.

Apart from these simple examples, a less trivial one is the case of overlapping hubs, i.e., a set of n hubs of degree K, connected to the same K leaves of degree n, see Supplementary Fig. SF1. The intrinsic LEV associated with such a structure is (see Method “Largest non-backtracking eigenvalue of characteristic subgraphs” section)

$$\mu_M^{\mathrm{oh}} = \sqrt{(n - 1)(K - 1)}. \qquad (10)$$

This last case is particularly important, since $\mu_M^{\mathrm{oh}}$ can become very large due to a few overlapping hubs of very large degree K, or due to a large number of hubs with moderate overlap K.

Localization in real-world networks

In Fig. 3a,b we compare the theoretical predictions derived for uncorrelated and annealed networks with the values of $\mu_M$ computed numerically for a set of 109 real-world networks of diverse origin (see Supplementary Table ST1 for details). In opposite ways, both predictions, the uncorrelated one of Eq. (8) and the annealed one of Eq. (9), fail to provide an accurate approximation of empirical results for many networks. In the most noticeable cases, the networks Zhishi and DBpedia, the uncorrelated prediction Eq. (8) largely underestimates the value of $\mu_M$, while the annealed network prediction Eq. (9) largely overestimates it.

Figure 3

Test of theoretical approaches for real-world networks. LEV of the NB matrix, $\mu_M$, as a function of the theoretical predictions of Eq. (8) (a), Eq. (9) (b), and Eq. (11) (c), for the set of 109 real-world networks described in Supplementary Table ST1.

To shed light on the origin of these discrepancies, in Supplementary Fig. SF2 we compare the empirical NBC, $x_i$, with the theoretical prediction of Eq. (7) for four real-world networks in which the predictions largely fail. We observe that, in all networks, a few nodes assume an exceedingly large value of $x_i$, i.e., the NBC is localized on a very small subset of nodes, which includes the largest hubs. It is clear that, in order to obtain an accurate prediction of $\mu_M$ in real-world networks, it is necessary to take into account the possible localization of the NB centrality on subgraphs which, despite being relatively small, may determine $\mu_M$ for the whole structure. In previous paragraphs, we have seen that two special subgraphs, a large clique/relatively dense homogeneous graph, or a set of overlapping hubs, may become the set where the NBC gets localized if the associated LEV is larger than the one for the rest of the network. It is then natural to postulate (in analogy with what happens for the adjacency matrix[23]) that the overall $\mu_M$ is well approximated by the maximum among Eq. (8) and the LEV values associated with each possible network subgraph s. (We note here that, while in the case of the adjacency matrix this result is exact due to Rayleigh’s inequality[24], for the NB matrix we simply proceed by analogy. As we will see later on, however, the conjecture turns out to be quite accurate.) An exhaustive search among all subgraphs is computationally impractical. However, if we limit ourselves to the types of subgraphs discussed above, it is numerically easy to find reasonable estimates of their maximum LEVs. The hubs, either dangling or integrated, provide a negligible contribution, as we can check numerically.

The K-core decomposition (see Method “K-core decomposition”) provides, as the core with maximum index, an approximation of the densest subgraph in the network. The value $\mu_M^{\mathrm{core}}$ associated with such a max K-core, which can be either a clique or a relatively dense homogeneous graph, is a good estimate of the maximum LEV among these types of subgraphs. Concerning the overlapping-hubs contribution $\mu_M^{\mathrm{oh}}$, the pair of n and K values maximizing Eq. (10) can be well approximated by a heuristic greedy algorithm described in Method “Algorithm to determine optimal n and K values for overlapping hubs”. Following this line of reasoning, we can then write an approximate expression for the NB LEV in generic networks as

$$\mu_M \simeq \max\left\{ \mu_M^{\mathrm{un}},\; \mu_M^{\mathrm{oh}},\; \mu_M^{\mathrm{core}} \right\}, \qquad (11)$$

where $\mu_M^{\mathrm{un}}$ is the uncorrelated prediction of Eq. (8), $\mu_M^{\mathrm{oh}}$ is given by Eq. (10), and $\mu_M^{\mathrm{core}}$ is computed as the largest eigenvalue of the NB matrix defined by the subgraph spanned by the maximum K-core. The comparison of Eq. (11) with empirical results in real-world networks, displayed in Fig. 3c, reveals a striking accuracy in all cases and substantiates the predictive power of Eq. (11) for the LEV of the non-backtracking matrix on generic real-world networks. The spontaneous formation of large cliques or sets of overlapping hubs is exceedingly improbable in uncorrelated networks. A K-core structure exists only for $\gamma < 3$[25], but in that case its LEV does not exceed the uncorrelated prediction. As a consequence, for all uncorrelated networks Eq. (11) gives back Eq. (8).

Application to percolation

Spectral properties of the non-backtracking matrix are at the heart of the message-passing theory for bond percolation[7]: For locally tree-like networks, the percolation threshold is given by the inverse of the NB matrix LEV,

$$p_c = \frac{1}{\mu_M}. \qquad (12)$$

A comparison of this prediction with results obtained numerically for our set of real-world networks is presented in Fig. 4 (a similar test was already performed in Ref.[18]), where the percolation threshold is obtained as the position of the main susceptibility peak (see Method “Numerical simulations of bond percolation”).

In the majority of cases the numerical threshold $p_c$ and the prediction $1/\mu_M$ differ by less than 50%, but for the remaining networks the discrepancy is larger, in some cases by more than one order of magnitude. These failures of prediction (12) can be understood by applying the knowledge acquired in the previous Sections. Most (and the largest) of the violations occur when the NBC is localized on small subgraphs, either overlapping hubs or the max K-core, which determine the overall value of $\mu_M$. In these cases the system actually undergoes what can be seen as a double percolation transition[26], reflected, in Fig. 5, by the presence of two distinct peaks of the susceptibility (see also Ref.[27] for the effect of mesoscopic structures on percolation). In the networks considered in this figure, the message-passing value $1/\mu_M$ signals the buildup of the connected subgraph of relatively small size where the NBC is localized, originating the first susceptibility peak. The second and largest peak occurs for much larger values of p and signals the formation of a percolating cluster encompassing a larger fraction of the nodes. Two (or even multiple) peaks are present also in other networks. The message-passing theory accurately predicts only the leftmost of these peaks (see Fig. 5), while it does not give any information about the position of other peaks and the associated transition.

Figure 4

Test of message-passing prediction for bond percolation threshold in real-world networks. The bond percolation threshold determined numerically from the main peak of the susceptibility is divided by the message-passing prediction [Eq. (12)] and plotted for the 109 real-world networks considered. The horizontal dashed red line marks the level below which the prediction is considered accurate. Vertical dashed lines mark the size scale of the networks, increasing from left to right. Symbols show which of the terms in Eq. (11) is maximal. Symbols are surrounded by a black (red) circle in case a secondary peak appears in the susceptibility on the left (right) of the main peak.

Figure 5

Susceptibility plots for networks exhibiting a secondary peak on the left. Numerical bond percolation susceptibility for the networks (a): GR-QC, 1993-2003; (b): Reactome; (c): PGP; (d): Flickr; (e): Web Stanford; (f): DBLP, collaborations; (g): Web Notre Dame; (h): Zhishi; (i): US Patents; and (j): DBpedia. The global maximum of the susceptibility, indicating the percolation threshold, is marked by a gray vertical bar. Black vertical lines indicate the position of the secondary peak. Red vertical lines signal the value of the message-passing prediction of Eq. (12). Notice that for three of the networks (Web Stanford, Zhishi and DBpedia) the NBC is localized on overlapping hubs, while for the others localization occurs on the max K-core.

Some other networks exhibit quite large discrepancies between the numerical threshold and the message-passing prediction of Eq. (12), but in the absence of a secondary peak. Our theory does not provide an explanation for these cases. However, it must be remarked that this phenomenology occurs for small networks, for which the very concept of localization on a subgraph is not well defined. Moreover, in these cases the peak of the susceptibility is wide and it may hide the presence of another peak (see Supplementary Fig. SF3). Finally, an ample discrepancy between the numerical threshold and the prediction is observed also for a few networks (Road network TX, Road network CA, Road network PA and US Power grid) having very large values of the average shortest path length and thus not possessing the small-world property. This is not surprising, as the almost planar nature of these topologies makes our framework inapplicable to them.

In summary, realizing that localization of the NB centrality can determine the value of $\mu_M$ for the whole structure allows us to understand the presence of a double percolation transition in several real-world networks. In these cases message-passing theory captures only the first of the transitions, corresponding to the emergence of a localized subgraph, while the occurrence of the second transition is completely missed by the theory[28,29].

Discussion

Our results show that the non-backtracking centrality, which was introduced to avoid the pathological self-reinforcement mechanism that plagues standard eigenvector centrality, is affected by the same problem. The NBC may also get localized on specific network subgraphs, with the same bootstrap mechanism at work: Some nodes are highly central because they are in “contact” with other central nodes and the latter are central because they are in contact with the former. The only difference is that for the adjacency matrix the relevant subgraphs are stars and self-reinforcement takes place among the hub and its direct neighbors[23]. For the NB matrix the relevant subgraphs are groups of nodes sharing many neighbors and self-reinforcement occurs at distance 2. The possibility of localization also for the NB matrix has been overlooked so far, because it is exceedingly unlikely in random uncorrelated networks. However, as we show here, in real-world topologies these structures are rather common. Indeed, cliques and sets of overlapping hubs are, respectively, complete unipartite and bipartite subgraphs, which naturally arise in many networks, for structural or functional reasons.

The results presented here have a number of implications. Knowing which of the three contributions determines $\mu_M$ in Eq. (11) also allows one to rapidly estimate the relevant non-backtracking centralities in the network. If the uncorrelated term $\mu_M^{\mathrm{un}}$ dominates, then the NBC values are given by Eq. (7). If instead the overlapping-hubs term $\mu_M^{\mathrm{oh}}$ is largest, then non-backtracking centralities are given by Eq. (41) in the subset of overlapping hubs and are essentially zero elsewhere. Similarly, when the max K-core term $\mu_M^{\mathrm{core}}$ dominates in Eq. (11), the NBC is approximately constant in the max K-core and much smaller elsewhere.

Additionally, our results shed light on the LEV of the adjacency matrix, $\Lambda_M$. In Ref.[23], it was argued that $\Lambda_M$ is determined by two subgraphs that have an associated large LEV, and that correspond to the node of maximum degree (hub), taken as an isolated star graph, and the maximum K-core. Thus, in the spirit of Rayleigh’s inequality[24], the approximation $\Lambda_M \simeq \max\{\sqrt{k_{\max}},\, \langle k \rangle_{\mathrm{core}}\}$ was proposed, where $\sqrt{k_{\max}}$ is the LEV of a star graph of degree $k_{\max}$ and $\langle k \rangle_{\mathrm{core}}$ is the LEV of the maximum K-core, approximated by its average degree[23]. The subgraph composed of n overlapping hubs of degree K turns out to possess also a large LEV of the adjacency matrix, given by $\sqrt{nK}$. We can then propose an improved approximation, taking into account the effect of overlapping hubs, of the form $\Lambda_M \simeq \max\{\sqrt{k_{\max}},\, \langle k \rangle_{\mathrm{core}},\, \sqrt{nK}\}$. In Supplementary Fig. SF4 we check this new expression, observing that it provides some improvement in the estimation of the adjacency matrix LEV, particularly for networks of large size.

The localization phenomenon of the NB matrix has also strong implications for percolation and thus for the related susceptible-infected-removed model for epidemic dynamics. Quite surprisingly, this reveals strong analogies with what happens in some regions of the phase-diagram of the paradigmatic susceptible-infected-susceptible model for epidemic dynamics (SIS)[30]. The formation (under appropriate conditions) of localized clusters below the global epidemic transition is a striking common feature of both types of dynamics, which they share despite their completely different nature. This intriguing similarity extends to the predictive power of theoretical approaches. For SIS dynamics quenched mean-field theory predicts when localized clusters of activity start to appear, but misses the formation of an overall endemic state[30]. For percolation (and SIR dynamics) message-passing theory captures the formation of localized clusters but is not predictive for what concerns the possible second transition involving a much larger fraction of the network. The quest for theoretical approaches able to understand and predict this nontrivial second transition is a challenging avenue for future research. Another related line for future research is the exploitation of the improved understanding presented here to devise targeted immunization strategies[12].

Methods

Theory for uncorrelated networks

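Denoting the PEV of the matrix $\mathbf{M}$ as $(\vec{x}, \vec{w})$, we can rewrite Eq. (4) as[18]

$$\mathbf{A}\vec{x} + (\mathbf{I} - \mathbf{D})\,\vec{w} = \mu_M\, \vec{x}, \qquad \vec{x} = \mu_M\, \vec{w},$$

which translates into

$$\mu_M\, x_i = \sum_j A_{ij}\, x_j - \frac{k_i - 1}{\mu_M}\, x_i. \qquad (15)$$

Summing over i and rearranging, with the normalization $\sum_i x_i = 1$, we obtain

$$(\mu_M - 1)\left[\mu_M + 1 - \sum_i k_i x_i\right] = 0.$$

Discarding the solution $\mu_M = 1$, which is always an eigenvalue, we have

$$\mu_M + 1 = \sum_i k_i\, x_i,$$

leading to

$$\mu_M = \sum_i (k_i - 1)\, x_i,$$

which allows us to compute $\mu_M$ once the NBC is known.

Following Ref.[2], we can obtain an approximation for the NB matrix PEV (and hence for the NBC) by expanding the eigenvalue relation

$$\mu_M\, f_{j \to i} = \sum_{m \to \ell} B_{j \to i,\, m \to \ell}\, f_{m \to \ell},$$

that, after some transformations, can be written as[2]

$$\mu_M\, f_{j \to i} = \sum_m A_{jm}\, f_{m \to j} - f_{i \to j}. \qquad (20)$$

Let us now compute the average value of $f_{i \to j}$ over all outgoing nodes i with a fixed degree $k_i = k$, that is

$$\bar{f}_{\mathrm{out}}(k) = \frac{1}{k N P(k)} \sum_{i:\, k_i = k}\; \sum_j A_{ij}\, f_{i \to j}, \qquad (21)$$

where kNP(k) represents the number of edges emanating from nodes of degree k. Applying Eq. (20) to the previous equation we can write

$$\bar{f}_{\mathrm{out}}(k) = \frac{k - 1}{\mu_M\, k N P(k)} \sum_{i:\, k_i = k}\; \sum_l A_{il}\, f_{l \to i}.$$

Assuming now[2] that the components $f_{l \to i}$ entering nodes of degree k have the same distribution as in the whole network (assumption valid in the limit of random uncorrelated networks), we can substitute $\sum_{i:\, k_i = k} \sum_l A_{il} f_{l \to i} \simeq k N P(k)/(2E)$, where E is the number of undirected edges in the original network. With this assumption, we can write

$$\bar{f}_{\mathrm{out}}(k) \simeq \frac{k - 1}{2E\, \mu_M}. \qquad (25)$$

Analogously, we can compute the average of $f_{j \to l}$ over all ingoing nodes l with fixed degree $k_l = k$,

$$\bar{f}_{\mathrm{in}}(k) = \frac{1}{k N P(k)} \sum_{l:\, k_l = k}\; \sum_j A_{lj}\, f_{j \to l}. \qquad (26)$$

Applying again Eq. (20), we can write

$$\bar{f}_{\mathrm{in}}(k) = \frac{1}{\mu_M\, k N P(k)} \sum_{l:\, k_l = k}\; \sum_j A_{lj} \left[ \sum_m A_{jm}\, f_{m \to j} - f_{l \to j} \right].$$

The matrix element $(\mathbf{A}^2)_{lm} = \sum_j A_{lj} A_{jm}$ counts the number of walks of length 2 between nodes l and m[13], and

$$\sum_{l:\, k_l = k}\; \sum_{m \neq l} (\mathbf{A}^2)_{lm}$$

counts those walks that start at nodes of degree k and are non-backtracking. In a tree-like network, the number of such walks is equal to the number of next-nearest neighbors of nodes of degree k, that is on average $k\left[\langle k^2 \rangle - \langle k \rangle\right]/\langle k \rangle$ per node[13]. Therefore, replacing the components in the sum by their network average 1/(2E), we have

$$\bar{f}_{\mathrm{in}}(k) \simeq \frac{\langle k^2 \rangle - \langle k \rangle}{2E\, \mu_M\, \langle k \rangle}. \qquad (28)$$

That is, in random uncorrelated networks, we have $\bar{f}_{\mathrm{out}}(k) \sim k - 1$ and $\bar{f}_{\mathrm{in}}(k) \sim \mathrm{const}$. Extending this relation at the level of individual edges, we can approximate the normalized dependence of the components of the NB matrix PEV as

$$f_{j \to i} \simeq \frac{k_j - 1}{N\left[\langle k^2 \rangle - \langle k \rangle\right]}.$$

In Supplementary Fig. SF5 we check the dependence obtained for the components of the PEV of the NB matrix as a function of the outgoing and ingoing degree, namely $f_{j \to i} \sim k_j - 1$. The averaged components $\bar{f}_{\mathrm{out}}(k)$ and $\bar{f}_{\mathrm{in}}(k)$, defined in Eqs. (21) and (26), correctly fulfill the scaling forms $\bar{f}_{\mathrm{out}}(k) \sim k - 1$ and $\bar{f}_{\mathrm{in}}(k) \sim \mathrm{const}$, respectively. Indeed, for UCM networks, the theoretical predictions in Eqs. (25) and (28) are extremely well fulfilled.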

Localization of the non-backtracking centrality

The concept of vector localization/delocalization refers to whether the components $x_i$ of a vector are evenly distributed over the network or they attain a large value on some subset of nodes V of size $N_V \ll N$ and are much smaller in the rest of the network. In the first scenario we have $x_i \sim N^{-1/2}$ for all nodes i, and we say the vector is delocalized. In the second scenario, one has $x_i \sim N_V^{-1/2}$ for $i \in V$, and $x_i \approx 0$ for $i \notin V$, and we say the vector is localized on V. For the NBC $x_i$, defined with a Euclidean normalization $\sum_i x_i^2 = 1$, localization can be measured in terms of the inverse participation ratio $Y_4(N)$[2,15], defined as

$$Y_4(N) = \sum_i x_i^4.$$

For a delocalized vector, $x_i \sim N^{-1/2}$, so one has $Y_4(N) \sim N^{-1}$; on the other hand, for a vector localized on a subgraph of size $N_V$, we have $Y_4(N) \sim N_V^{-1}$. Therefore, fitting the inverse participation ratio to a power-law form $Y_4(N) \sim N^{-\alpha}$, a value $\alpha = 1$ indicates delocalization, while $\alpha < 1$ implies localization on a subextensive set of nodes of size $N_V \sim N^{\alpha}$[31]. In the extreme case of localization on a finite set of nodes (independent of N), one has instead $\alpha = 0$.

The functional form derived for $x_i$ in Eq. (7) helps to explain the localization properties of the NBC for UCM networks observed in Ref.[31]. In Supplementary Fig. SF6 we show a comparison of the inverse participation ratio numerically obtained in power-law UCM networks with the theoretical prediction computed from Eq. (7), and with the prediction obtained from the annealed network approximation Eq. (9). As we can see, the prediction from our expression provides an almost perfect match for the numerical observation, while the annealed network approximation exhibits sizeable inaccuracies, particularly in part of the range of $\gamma$ considered.
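
The two limiting behaviors of the inverse participation ratio can be checked with a few lines of code. The sketch below assumes the standard definition as the sum of fourth powers of the components after Euclidean normalization; the function name is hypothetical and the input is any sequence of non-negative centralities, not all zero.

```python
import math


def inverse_participation_ratio(x):
    """Sum of x_i**4 after rescaling x to unit Euclidean norm."""
    norm = math.sqrt(sum(v * v for v in x))
    return sum((v / norm) ** 4 for v in x)
```

A uniform vector of length N gives a value of order 1/N, while a vector concentrated on a single node gives a value of order one, which are the delocalized and fully localized limits discussed in the text.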

Largest non-backtracking eigenvalue of characteristic subgraphs

Dangling star graph

Let us consider a dangling star network, see Supplementary Fig. SF1a, formed by a hub h of degree K connected to $K - 1$ leaves l of degree 1 and by one edge to a connector node n of a generic network. By applying Eq. (15), we obtain the following equations for the LEV and the NBC:

$$\mu_M\, x_h = (K - 1)\, x_l + x_n - \frac{K - 1}{\mu_M}\, x_h,$$

$$\mu_M\, x_l = x_h,$$

$$\mu_M\, x_n = \sum_{j \neq h} A_{nj}\, x_j + x_h - \frac{k_n - 1}{\mu_M}\, x_n,$$

where $k_n$ is the degree of node n (including the edge to the hub), $x_l$ is the NBC of each leaf, and the equations corresponding to the rest of the nodes are the same as in the absence of the dangling star. From the first two equations, assuming $\mu_M \neq 0$, we obtain $x_l = x_h/\mu_M$ and $x_h = x_n/\mu_M$. Introducing the last equality into the third equation, the dependence on $x_h$ drops out and the equation takes the form of Eq. (15) in the absence of the dangling star, that is, with node n having degree $k_n - 1$. We conclude therefore that a dangling star is unable to alter the value of the overall LEV and its NBC depends only on the centrality of the connector node n. The reason for this is the absence of non-backtracking paths between the hub and the leaves, so that the hub has the effect of a node of degree one[2,9].
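
The statement that a dangling star leaves the LEV unchanged can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: it estimates the LEV by power iteration on the directed-edge components with a shifted operator (B + I), and the function name `nb_lev` and the test graphs are hypothetical.

```python
from collections import defaultdict


def nb_lev(edges, iters=5000, tol=1e-12):
    """Estimate the largest NB eigenvalue of an undirected simple graph."""
    nbrs = defaultdict(set)
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    f = {(j, i): 1.0 for j in nbrs for i in nbrs[j]}
    mu = 0.0
    for _ in range(iters):
        total = sum(f.values())
        s = defaultdict(float)
        for (m, j), val in f.items():
            s[j] += val  # everything arriving at node j
        # Non-backtracking step: exclude the component coming back from i.
        bf = {(j, i): s[j] - f[(i, j)] for (j, i) in f}
        new_mu = sum(bf.values()) / total
        # Shifted update (B + I) f, rescaled to keep magnitudes bounded.
        f = {e: (bf[e] + f[e]) / total for e in f}
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# A 4-clique, and the same clique with a dangling star attached to node 0.
clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
star = [(0, "h")] + [("h", "leaf%d" % a) for a in range(5)]
lev_clique = nb_lev(clique)
lev_with_star = nb_lev(clique + star)
```

In this example both estimates agree with the value of the isolated clique, in line with the argument that no non-backtracking walk can return from the leaves.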

Integrated star graph

The case of an integrated star of degree K, i.e., a hub connected by K edges to K randomly chosen connector nodes in a network (Supplementary Fig. SF1b), is more difficult to analyze. To simplify calculations, we consider the case of a regular network with fixed degree q. For symmetry reasons, the nodes connected to the hub, of degree $q + 1$, have approximately the same NBC, $x_1$, different from the centrality $x_2$ of the nodes not connected to the hub, and also from $x_h$, the centrality of the hub. Applying the Ihara–Bass determinant formula, Eq. (15), we can write

$$\mu_M\, x_h = K\, x_1 - \frac{K - 1}{\mu_M}\, x_h,$$

$$\mu_M\, x_1 = x_h + q \left[ \frac{K}{N}\, x_1 + \left(1 - \frac{K}{N}\right) x_2 \right] - \frac{q}{\mu_M}\, x_1,$$

$$\mu_M\, x_2 = q \left[ \frac{K}{N}\, x_1 + \left(1 - \frac{K}{N}\right) x_2 \right] - \frac{q - 1}{\mu_M}\, x_2,$$

where, to ease calculations, we have made the mean-field assumption that nodes in the network are neighbors of nodes connected to the hub with probability K/N, and of other nodes with probability $1 - K/N$, which is valid in the limit of large K and N. These conditions lead to an equation for $\mu_M$, Eq. (34), in which the trivial solution $\mu_M = 1$ has been factorized. This is an algebraic equation of fifth order that cannot be solved analytically in general. However, for sufficiently large K it reduces to a simpler form, leading to the solution given in Eq. (36). Instead, for small K, expanding Eq. (34) to first order in the deviation from the LEV of the original network, one finds that the value of $\mu_M$ is very close to the value of the original random regular network, with a correction that vanishes with N. We conclude that the addition of a finite integrated hub does not change the value of $\mu_M$ of the whole network unless K is sufficiently large, a case which may be relevant in small networks. Not surprisingly, the uncorrelated expression Eq. (8) fails here, since it predicts a finite value in the limit of large K. While we considered a star integrated into a homogeneous network, Supplementary Fig. SF7 shows that the same picture is valid also in the case of power-law distributed synthetic networks, replacing q by the network average degree $\langle k \rangle$: for K up to a threshold value the addition of the hub has no effect on $\mu_M$; for larger values, Eq. (36) holds.

Overlapping hubs

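Let us consider now a graph composed of n hubs, sharing all their K leaves, see Supplementary Fig. SF1c. We can evaluate $\mu_M$ and the NBC by applying again the Ihara–Bass determinant formula. For symmetry reasons, the components $x_h$ of the hubs are equal, and correspondingly the components $x_l$ of the leaves. Thus, from Eq. (15) we can write

$$\mu_M\, x_h = K\, x_l - \frac{K - 1}{\mu_M}\, x_h, \qquad \mu_M\, x_l = n\, x_h - \frac{n - 1}{\mu_M}\, x_l.$$

Imposing that the components $x_h$ and $x_l$ are non-zero, we obtain $(\mu_M^2 - 1)\left[\mu_M^2 - (n - 1)(K - 1)\right] = 0$ and hence the largest eigenvalue

$$\mu_M = \sqrt{(n - 1)(K - 1)},$$

while the NB centralities fulfill

$$\frac{x_h}{x_l} = \frac{K}{n} \sqrt{\frac{n - 1}{K - 1}}. \qquad (41)$$

That is, for large K, the NBC becomes strongly localized in the hubs. In Supplementary Fig. SF8 we check the effects of adding n overlapping hubs of degree K to power-law distributed synthetic networks. As we can see, as soon as the LEV of the overlapping hubs is large enough (in practice, when it exceeds the LEV of the original network), the actual value of the NB LEV is dominated by the presence of the overlapping hubs.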
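
The closed-form results for overlapping hubs can be verified by substituting them back into the two reduced node equations. The sketch below assumes those equations in the form obtained from Eq. (15) for a complete bipartite structure of n hubs and K leaves, with n and K both at least 2; the function names are hypothetical.

```python
import math


def overlapping_hubs(n, K):
    """Return (mu, ratio): the NB LEV and the centrality ratio x_h / x_l."""
    mu = math.sqrt((n - 1) * (K - 1))
    ratio = (K / n) * math.sqrt((n - 1) / (K - 1))
    return mu, ratio


def residuals(n, K):
    """Residuals of the hub and leaf equations for the closed-form solution."""
    mu, ratio = overlapping_hubs(n, K)
    x_l, x_h = 1.0, ratio
    res_hub = mu * x_h - (K * x_l - (K - 1) * x_h / mu)
    res_leaf = mu * x_l - (n * x_h - (n - 1) * x_l / mu)
    return res_hub, res_leaf
```

Both residuals vanish up to rounding error for any admissible pair of n and K, and the ratio grows with K at fixed n, consistent with the localization of the NBC on the hubs.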

K-core decomposition

The K-core decomposition[32] is an iterative classification process of the vertices of a network in layers of increasing density of mutual connections, denoted by increasing values of the index K. One starts by removing the vertices of degree K = 1, repeating the process until only nodes with degree K ≥ 2 are left. The removed nodes constitute the K = 1 shell, and the remaining ones are the K = 2 core. At the next step, all vertices with degree K = 2 are iteratively removed, thus leaving the K = 3 core. The procedure is repeated until the maximum K-core (of index ) is reached, such that one more iteration removes all nodes in the network. The maximum K-core of generic networks is usually a homogeneous subgraph[23]. The K-core structure of networks has been proposed as a classification of node importance in dynamical processes on complex topologies[33].
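The pruning procedure can be sketched in a few lines. This is an illustrative implementation under simple assumptions (an undirected graph stored as a dictionary of neighbor sets, a hypothetical function name `shell_index`), not the code used in the paper. It assigns to each node the index K of the shell in which it is removed; the maximum K-core is the set of nodes with the largest index.

```python
from typing import Dict, List, Set


def shell_index(adj: Dict[int, Set[int]]) -> Dict[int, int]:
    """Return the K-shell index of every node by iterative pruning."""
    degree = {v: len(neighbors) for v, neighbors in adj.items()}
    remaining = set(adj)
    shell: Dict[int, int] = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k are pruned at level k.
        stack: List[int] = [v for v in remaining if degree[v] <= k]
        if not stack:
            k += 1  # nothing left to prune at this level, move to the next
            continue
        while stack:
            v = stack.pop()
            if v not in remaining:
                continue  # already pruned through a duplicate stack entry
            remaining.discard(v)
            shell[v] = k
            for w in adj[v]:
                if w in remaining:
                    degree[w] -= 1
                    # Pruning v may push a neighbor below the threshold.
                    if degree[w] <= k:
                        stack.append(w)
    return shell
```

The sketch rescans the remaining nodes at every level, which is quadratic in the worst case; bucket-based implementations are faster but longer.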

Algorithm to determine optimal n and K values for overlapping hubs

The determination of the set of all overlapping hubs in a real-world network is highly time-consuming. We can however obtain a working approximation using the following greedy algorithm: We order the nodes in decreasing order of their degree, . Starting from node , we visit the set of nodes and identify the number of common neighbors , that are common neighbors of the set of nodes . Repeating this process for all nodes in the network, we compute the values for all nodes and all sets of nodes (in decreasing order of degree) of length . We choose as values of n and K the values of and that maximize the product .
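One possible reading of this greedy procedure is sketched below. Several details are assumptions, since the symbols of the original description are not available here: the candidate sets are taken to be runs of consecutive nodes in the degree ordering, and the quantity maximized is taken to be (n − 1)(K − 1), chosen because it is the square of the NB LEV of n hubs sharing K leaves. The function name `greedy_overlapping_hubs` is hypothetical.

```python
from typing import Dict, Set, Tuple


def greedy_overlapping_hubs(adj: Dict[int, Set[int]]) -> Tuple[int, int]:
    """Return (n, K) for the best degree-ordered run of nodes.

    n is the number of candidate hubs in the run and K the number of
    neighbors shared by all of them. The score (n - 1) * (K - 1) is an
    assumed objective, not taken from the source.
    """
    # Decreasing degree; ties broken by node label to stay deterministic.
    order = sorted(adj, key=lambda v: (-len(adj[v]), v))
    best_n, best_k, best_score = 0, 0, -1
    for start in range(len(order)):
        common = set(adj[order[start]])
        for m, idx in enumerate(range(start, len(order)), start=1):
            if m > 1:
                common &= adj[order[idx]]  # keep only shared neighbors
            if not common:
                break  # no leaf is shared by the whole run any more
            score = (m - 1) * (len(common) - 1)
            if score > best_score:
                best_n, best_k, best_score = m, len(common), score
    return best_n, best_k
```

Because the score is symmetric under exchanging hubs and leaves, a run of leaves can tie with the run of hubs; the strict comparison keeps the first candidate found, which is the one starting from the highest-degree node.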

Numerical simulations of bond percolation

We consider the bond percolation process in which network edges are randomly kept with probability p and removed with probability 1 − p. For each realization of this process with a given value of p, one considers the largest cluster remaining in the network, of size . The average of this quantity over independent realizations is denoted by . The critical percolation point p_c separates a subcritical phase at p < p_c, in which only clusters of small size are present, so that in the thermodynamic limit , from a supercritical phase at p > p_c, in which there is a finite spanning cluster leading to [34]. In order to estimate the value of the percolation point, one considers the susceptibility , defined as[18,35] The percolation threshold is defined as the value of p for which shows a maximum[35]. To compute numerically in real-world networks we perform the averages on bond percolation experiments applying the Newman–Ziff algorithm[36].
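The measurement can be illustrated with a small Monte Carlo sketch. It is not the Newman–Ziff algorithm used in the paper: each value of p is sampled directly with a union-find structure, which is simpler but slower. The susceptibility is assumed here to take the form (⟨S²⟩ − ⟨S⟩²)/⟨S⟩, with S the size of the largest cluster; this form and the function names are assumptions made for the example.

```python
import random
from typing import List, Sequence, Tuple


def largest_cluster(n_nodes: int, edges: Sequence[Tuple[int, int]],
                    p: float, rng: random.Random) -> int:
    """Size of the largest cluster after keeping each edge with probability p."""
    parent: List[int] = list(range(n_nodes))
    size: List[int] = [1] * n_nodes

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    largest = 1
    for u, v in edges:
        if rng.random() < p:  # edge kept
            ru, rv = find(u), find(v)
            if ru != rv:
                if size[ru] < size[rv]:
                    ru, rv = rv, ru
                parent[rv] = ru  # union by size
                size[ru] += size[rv]
                largest = max(largest, size[ru])
    return largest


def susceptibility(n_nodes: int, edges: Sequence[Tuple[int, int]],
                   p: float, runs: int = 200, seed: int = 0) -> float:
    """Assumed form: (<S^2> - <S>^2) / <S> over independent realizations."""
    rng = random.Random(seed)
    total = total_sq = 0.0
    for _ in range(runs):
        s = largest_cluster(n_nodes, edges, p, rng)
        total += s
        total_sq += s * s
    mean = total / runs
    return (total_sq / runs - mean * mean) / mean
```

Scanning p over a grid and taking the position of the maximum of `susceptibility` gives a numerical estimate of the threshold; the resolution of the grid and the number of runs limit its accuracy.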
