Yang Liu1,2, Xi Wang3, Jürgen Kurths1,4,5. 1. Potsdam Institute for Climate Impact Research, 14412 Potsdam, Germany. 2. Department of Computer Science, Technische Universität Berlin, 10587 Berlin, Germany. 3. Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong. 4. Department of Physics, Humboldt University Berlin, 12489 Berlin, Germany. 5. Institute of Applied Physics, Russian Academy of Science, 603950 Nizhny Novgorod, Russia.
Abstract
Most of the existing methods for the robustness and targeted immunization problems can be viewed as greedy strategies, which are quite efficient but easily become trapped in a local optimum. In this paper, starting from a percolation perspective, we develop two strategies, the relationship-related (RR) strategy and the prediction relationship (PR) strategy, that avoid local optima solely by investigating the interrelationships among nodes. RR combines the sum rule and the product rule from explosive percolation, while PR relies on the assumption that nodes with high degree are usually more important than those with low degree. In this manner, our methods are better able to collapse or protect a network. Simulations on a number of networks demonstrate their effectiveness, especially on large real-world networks: to fragment each of them to the same giant-component size, RR needs fewer than 90% of the nodes required by the best existing methods.
There has recently been an enormous amount of interest in the targeted immunization and robustness problems of network science [1-5], such as investigating the critical threshold of structural collapse under an intentional attack [6], or probing the optimal targeted-immunization threshold when a virus may spread [7]. These problems arise in, but are not limited to, preventing the spread of viruses in computer or population networks [8,9], controlling information transmission in social networks [10-12], and avoiding the breakdown of infrastructure networks [13,14].
For a network, finding the critical or optimal threshold is mathematically equivalent to finding the minimum set of nodes whose removal fragments the network to a prescribed state, e.g., such that the size of the giant component falls below a given value. To achieve this, numerous methods have been proposed in the last few years, including random immunization [15], acquaintance strategies [7,16], targeted methods [3,4,17-19], and others [20-23], ranging from methods that need only local information to those that require knowledge of the whole network. In random immunization, the immunized nodes are selected uniformly at random, without any priority among them. Random selection is also used in the acquaintance strategy, but there a randomly chosen neighbor of a randomly selected node is immunized [7]. The targeted method, in turn, is a widely accepted approach that first ranks every node by importance and then removes nodes in descending order of importance until the network meets the immunization requirement [3,4,18,19].
Within networks, there are numerous relationships among nodes. Generally, high-degree nodes tend to connect to other high-degree nodes in assortatively mixed networks, whereas they mostly have low-degree neighbors in disassortative networks [24].
Moreover, a node with a low degree might play a critical role, whereas nodes with high degree might be comparatively insignificant (e.g., as measured by the betweenness centrality of nodes [25]). In the course of immunization, some subinfluential nodes may become influential after a few nodes are removed, while others might lose their importance instead (cf. the heuristic immunization strategies of [3,4,18], including the high adaptive degree centrality strategy). Such methods, e.g., the Collective Influence method (CI) [3], which generally obtains better results with a larger radius $\ell$, show that a better immunization strategy can be found when more interrelationships among nodes are considered. This may also explain why the high adaptive betweenness centrality strategy (HAB) is significantly effective in most situations, as is the belief propagation-guided decimation (BPD) method [4,26] in artificial networks. However, HAB is limited by its high time complexity, and BPD is less effective in real-world networks because they always contain many loops.
Most of these methods can also be viewed as greedy strategies, i.e., they repeatedly recalculate the importance of nodes in the remaining network and then remove the most influential node or a fraction of the most influential nodes. For an optimization problem, a greedy strategy is quite efficient but easily becomes trapped in a local optimum. Moreover, as Fig. 1(a) illustrates, removing one node can change whether other nodes should be removed. These facts motivate another approach: can local optima be effectively avoided by investigating the interrelationships among nodes?
FIG. 1.
Brief illustrations of the proposed methods. (a) When the task is to split this network into isolated nodes, almost all of the methods mentioned in this paper will remove (marked with “'), whereas the optimal removal set is evidently the nodes marked by “”. Now, assuming that is removed first, most of these methods will easily find the optimal solution in the remaining network, but removing cannot induce the same situation. In other words, the removal of or influences the status of (removed or not) and, as a result, directly determines whether the optimal solution can be reached. (b) An example of RR under the sum rule and . In this temporary network (), we consider two assumed cases, (1) then and (2) , then , i.e., two rounds of selection. For the first choice between and in case 1, the occupied node is either or since . After this, node would be chosen because of (which might lead to the optimal solution of ). In contrast, would be selected to be occupied first , and then
in case 2 (which might be associated with the optimization of ). This example also shows how the status of a node influences the status of other nodes, e.g., to and . (c, d) An example of the PR method. In (c), some low-degree nodes are chosen and occupied first. (d) = and here marked by colored shading.
Here, also from a percolation perspective [3,19,27,28], we propose two strategies, the relationship-related (RR) method and the prediction relationship (PR) method, which achieve excellent performance compared with other existing strategies. The main idea of both strategies is to explore and exploit the interrelationships among nodes. RR combines the sum rule and the product rule from explosive percolation [29], while PR relies on the assumption that nodes with high degree are usually more important than those with low degree. In this way, our approaches are better at avoiding local optima and obtain smaller thresholds than other methods. To demonstrate their effectiveness, we conduct numerous simulations on a number of networks. The results show that our methods have significant advantages over other strategies, especially on large real-world networks, where RR collapses each network to the same giant-component size as CI, BPD, or the Explosive Immunization method (EI) [19] while removing fewer than 90% of the nodes those methods need. Moreover, our methods may also be applied to the feedback vertex set (FVS) problem [26,30,31].
METHOD
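We consider an undirected network $\mathcal{G}(V, E)$ composed of $n$ nodes tied by $m$ edges, where $V$ and $E$ are the node set and the edge set, respectively. Let $\pi$ be an arbitrary configuration (sequence) of $V$, namely, $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$, where each $\pi_i$ corresponds to a unique node of the network. Then the threshold regarding $\pi$ is defined to be

$$q_c(\pi) = \min\left\{ q : G_\pi(q) \le C \right\}, \tag{1}$$

in which $C$ is a given value and $G_\pi(q)$ represents the probability that a node is part of the giant (largest) connected component in the remaining network after the removal of all nodes in $\{\pi_1, \ldots, \pi_{qn}\}$, including their incident edges. Denoting by

$$F(\pi) = \frac{1}{n} \sum_{i=1}^{n} G_\pi\!\left(\frac{i}{n}\right) \tag{2}$$

the average size fraction of the giant components of $\pi$, the solution to the targeted immunization or robustness problem is to search for the optimal sequence $\pi^*$, which satisfies

$$\pi^* = \arg\min_{\pi \in \Pi} q_c(\pi) \quad \text{or} \quad \pi^* = \arg\min_{\pi \in \Pi} F(\pi), \tag{3}$$

where $\Pi$ denotes the set of all configurations of $V$. In general, finding the optimal solution is NP-hard.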
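Both objectives can be evaluated for a given removal order in near-linear time by running the percolation in reverse: re-inserting nodes from last removed to first removed and merging components with a union-find structure yields the giant-component fraction after every prefix of removals. The Python sketch below is illustrative only; it assumes the giant fraction is measured as (largest component size)/n, that the threshold is the smallest removed fraction k/n whose giant fraction is at most C, and that F averages the giant fraction over the removed fractions 1/n, ..., 1. The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Sequence, Tuple


def giant_fractions(order: Sequence[Hashable],
                    edges: List[Tuple[Hashable, Hashable]]) -> List[float]:
    """g[k] = giant-component size / n after removing order[0..k-1], k = 0..n."""
    n = len(order)
    adj: Dict[Hashable, List[Hashable]] = {v: [] for v in order}
    for u, w in edges:
        adj[u].append(w)
        adj[w].append(u)
    parent: Dict[Hashable, Hashable] = {}
    size: Dict[Hashable, int] = {}

    def find(x: Hashable) -> Hashable:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    g = [0.0] * (n + 1)  # g[n] = 0: every node removed
    largest = 0
    # Re-insert nodes in reverse removal order: after inserting order[k:],
    # the present nodes are exactly those left after removing the first k.
    for k in range(n - 1, -1, -1):
        v = order[k]
        parent[v], size[v] = v, 1
        for w in adj[v]:
            if w in parent:
                rv, rw = find(v), find(w)
                if rv != rw:
                    if size[rv] < size[rw]:
                        rv, rw = rw, rv
                    parent[rw] = rv
                    size[rv] += size[rw]
        largest = max(largest, size[find(v)])  # components only grow here
        g[k] = largest / n
    return g


def threshold(g: List[float], C: float) -> float:
    """Smallest removed fraction k/n whose giant fraction is <= C (assumed form)."""
    n = len(g) - 1
    return next(k / n for k, val in enumerate(g) if val <= C)


def average_giant_fraction(g: List[float]) -> float:
    """Average of the giant fraction over removed fractions 1/n, ..., 1 (assumed form)."""
    n = len(g) - 1
    return sum(g[1:]) / n
```

For a 5-node path 0-1-2-3-4 removed in the order [2, 0, 1, 3, 4], the giant fractions are [1.0, 0.4, 0.4, 0.4, 0.2, 0.0], so with C = 0.2 the threshold is 0.8 and the average giant fraction is 0.28. The reverse pass visits every edge a constant number of times, which is what makes repeated evaluation of candidate orders affordable.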
Relationship-related (RR) method
Inspired by Ref. [29], we develop the RR method as a percolation process, i.e., we change the process from removing the most influential node to occupying the least important node. In other words, we start the RR method with an arbitrary configuration $\pi$ of the node set [or one given by a strategy, e.g., the high-degree centrality strategy (HD)] and a nonoccupied network, and then reverse the order of $\pi$ to obtain a new sequence $\pi'$, satisfying

$$\pi'_i = \pi_{n-i+1},$$

where $i = 1, 2, \ldots, n$. Let $\alpha$ be the proportion of possible candidates and $\beta$ be the number of selections, respectively. Denoting the number of occupied nodes by $k$, we then determine the $(k+1)$-th occupied node through the following procedure:
(i) Each time, randomly select one node from the nearest nonoccupied node set, i.e., from the fraction $\alpha$ of possible candidates among the nonoccupied nodes of $\pi'$ that follow the $k$ occupied ones.
(ii) Independently repeat selection (i) $\beta$ times to form the candidate node set $S$, and then occupy the node of $S$ that minimizes $\sigma$ (randomly choose one if several nodes share the same minimum):

$$v^* = \arg\min_{v \in S} \sigma(v), \tag{6}$$

where $\sigma(v)$ is defined by one of the following two cases, corresponding to the sum rule and the product rule [29], respectively:

$$\sigma(v) = \begin{cases} \sum_{c \in \mathcal{C}(v)} |c| & \text{(sum rule)}, \\ \prod_{c \in \mathcal{C}(v)} |c| & \text{(product rule)}, \end{cases} \tag{7}$$

in which $|c|$ is the size of component $c$ and $\mathcal{C}(v)$ denotes the set of components that node $v$ would connect in the temporary network consisting of all the occupied nodes and their edges.
(iii) Update $\pi'$ by swapping $\pi'_{k+1}$ and $v^*$, i.e., by exchanging their places in $\pi'$.
(iv) Repeat (i)-(iii) until all nodes are occupied, and obtain a new sequence $\tilde{\pi}$ by reversing $\pi'$.
Next, replace $\pi$ with $\tilde{\pi}$ based on Eq. (3): when the objective is $q_c$, replace $\pi$ with $\tilde{\pi}$ if $q_c(\tilde{\pi}) < q_c(\pi)$; otherwise, replace $\pi$ with $\tilde{\pi}$ if $F(\tilde{\pi}) < F(\pi)$. In this manner, we further obtain $\pi^{t+1}$ from $\pi^{t}$ with updated $\alpha$ and $\beta$, where $t$ denotes the time step. An illustration of RR is shown in Fig. 1(b).
Now let us turn to the parameters $\alpha$ and $\beta$ as well as the two selection rules [Eq. (7)]. A large $\alpha$ means that the selection happens over a large range of possible candidates, which on the one hand makes RR converge quickly at the early stage, but on the other hand causes RR to saturate, i.e., to stop improving once $t$ reaches some value. The value of $\beta$ is the main contributor to the running time of RR. To balance these effects, we let both parameters depend on $t$ (other choices are possible): $\alpha$ decreases from its initial value $\alpha_0$ at a prescribed decrease rate, and $\beta$ increases from its initial value $\beta_0$ at a prescribed increase rate.
With respect to the two selection rules, our simulation results show that the sum rule is more efficient than the product rule in small networks but less efficient in large networks. Hence, we combine them and use the following adaptive probability to determine which rule is adopted at each $t$:

$$p = \frac{N_s}{N_s + N_p},$$

where $p$ is the probability of selecting the sum rule (otherwise the product rule is used), and $N_s$ and $N_p$ are the numbers of positive replacements under the sum rule and the product rule, respectively. In other words, if the sum rule yields a better result (a smaller $q_c$ or $F$), then $N_s \leftarrow N_s + 1$, and vice versa. In this paper, we initialize both $N_s$ and $N_p$ with 1.
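The occupation pass of RR and the adaptive choice between the two rules can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the hypothetical `window` parameter stands in for the nearest nonoccupied node set (here simply the next `window` free positions of the reversed sequence), `n_select` plays the role of the number of selections, the score is the sum or the product of the sizes of the distinct occupied components a candidate would connect, and the rule probability is taken as N_s / (N_s + N_p) with both counters starting at 1. The outer loop that keeps a new order only when it lowers q_c or F is omitted.

```python
import math
import random
from typing import Dict, Hashable, List, Optional, Sequence


def rr_pass(order: Sequence[Hashable], adj: Dict[Hashable, List[Hashable]],
            window: int, n_select: int, rule: str = "sum",
            rng: Optional[random.Random] = None) -> List[Hashable]:
    """One RR-style occupation pass; returns a new removal order (window >= 1)."""
    rng = rng if rng is not None else random.Random()
    seq = list(reversed(order))  # occupation order = reversed removal order
    parent: Dict[Hashable, Hashable] = {}
    size: Dict[Hashable, int] = {}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def score(v) -> int:
        # Sizes of the distinct occupied components that v would connect.
        sizes = [size[r] for r in {find(w) for w in adj[v] if w in parent}]
        return sum(sizes) if rule == "sum" else math.prod(sizes)

    n = len(seq)
    for k in range(n):
        # (i)-(ii): draw n_select candidates from the next `window` free positions.
        picks = [rng.randrange(k, min(n, k + window)) for _ in range(n_select)]
        scores = {p: score(seq[p]) for p in picks}
        best = min(scores.values())
        chosen = rng.choice([p for p, s in scores.items() if s == best])
        seq[k], seq[chosen] = seq[chosen], seq[k]  # (iii): swap into place k
        v = seq[k]  # occupy v and merge it with its occupied neighbors
        parent[v], size[v] = v, 1
        for w in adj[v]:
            if w in parent:
                rv, rw = find(v), find(w)
                if rv != rw:
                    if size[rv] < size[rw]:
                        rv, rw = rw, rv
                    parent[rw] = rv
                    size[rv] += size[rw]
    return list(reversed(seq))  # (iv): reverse back into a removal order


class AdaptiveRule:
    """Pick 'sum' with probability N_s / (N_s + N_p), else 'product' (assumed form)."""

    def __init__(self, rng: Optional[random.Random] = None):
        self.counts = {"sum": 1, "product": 1}  # both counters start at 1
        self.rng = rng if rng is not None else random.Random()

    def p_sum(self) -> float:
        return self.counts["sum"] / (self.counts["sum"] + self.counts["product"])

    def choose(self) -> str:
        return "sum" if self.rng.random() < self.p_sum() else "product"

    def record(self, rule: str, improved: bool) -> None:
        # A positive replacement (smaller q_c or F) credits the rule that produced it.
        if improved:
            self.counts[rule] += 1
```

With `window=1` and `n_select=1` the only candidate at each step is the next node, so the pass returns the input order unchanged, which is a convenient sanity check; wider windows and more selections let the chosen rule reorder nodes. Keeping the union-find structure across steps means each score costs only a few `find` calls per neighbor.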
Prediction relationship (PR) method
The PR method is based on the assumption that high-degree nodes are normally more influential than nodes with low degree; i.e., PR tries to keep the occupied components away from as many high-degree nodes as possible. To achieve this, we first assign each node a weight based on the degree distribution, where $P(k)$ is the probability that a node has degree $k$. Then, similar to RR, we construct the selection function $\sigma(v)$ from these weights, in which $\mathcal{N}(v)$ denotes all of $v$'s nearest unoccupied neighbors (here $v$ itself is viewed as an occupied node). An example of PR is shown in Figs. 1(c) and 1(d).
RR and PR for the feedback vertex set (FVS) problem
Following Ref. [4], we further adapt RR and PR to obtain the optimal FVS of a given network, which helps RR and PR obtain smaller thresholds than direct calculation in model networks. A feedback vertex set is a subset of $V$ whose removal leaves no loop in the remaining network (i.e., the remaining network is a forest). Denoting by $n_{\mathrm{FVS}}$ the number of nodes in the FVS, the goal of optimizing the FVS is to minimize $n_{\mathrm{FVS}}$. How can we achieve this in RR and PR?
Considering the candidate node set $S$ [see Sec. II A (ii)], we construct a subset $S'$ of it based on a quantity $\eta(v)$, which is defined through the number of occupied nodes that belong to a component $c$ and are also nearest neighbors of $v$. We then rewrite Eq. (6) so that the node $v^*$ to be occupied is chosen with the help of $S'$. In addition, another strategy is adopted for the FVS problem: if $\eta(v^*) = 2$ (meaning that two neighbors of $v^*$ lie in the same component, i.e., occupying $v^*$ would create a loop), then, after the swap step [see Sec. II A (iii)], we further exchange the places of $v^*$ and one of these two neighbors, chosen at random. Finally, without loss of generality, we replace $\pi$ with the new sequence if it has a smaller $n_{\mathrm{FVS}}$.
Clearly, the temporary network composed of the occupied nodes contains no loop if every occupied node satisfies the constraint of Eq. (14) during the occupation process. In other words, minimizing $n_{\mathrm{FVS}}$ is equivalent to maximizing the number of occupied nodes under this constraint.
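The two checks underlying the FVS variant can be sketched in Python. The first helper counts, for a candidate node, the largest number of its occupied neighbors that fall in a single occupied component; a count of 2 or more means occupying the node would close a loop. The second helper verifies that a node set is a feedback vertex set by confirming that the remaining graph is a forest. This is an illustrative sketch: the mapping `comp_of` (occupied node to component label) and both function names are hypothetical, and the graph is assumed simple, with self-loops already deleted.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def max_occupied_neighbors_per_component(v: Hashable,
                                         adj: Dict[Hashable, List[Hashable]],
                                         comp_of: Dict[Hashable, Hashable]) -> int:
    """Largest number of v's occupied neighbors lying in one occupied component."""
    counts = Counter(comp_of[w] for w in adj[v] if w in comp_of)
    return max(counts.values(), default=0)  # 0 if v has no occupied neighbor


def is_feedback_vertex_set(nodes: Iterable[Hashable],
                           edges: List[Tuple[Hashable, Hashable]],
                           fvs: Set[Hashable]) -> bool:
    """True if deleting `fvs` leaves a forest (no loop) in a simple graph."""
    parent = {v: v for v in nodes if v not in fvs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, w in edges:
        if u in fvs or w in fvs:
            continue  # edge disappears together with its removed endpoint
        ru, rw = find(u), find(w)
        if ru == rw:
            return False  # u and w already connected: this edge closes a loop
        parent[ru] = rw
    return True
```

For a triangle 0-1-2 with a pendant node 3 attached to node 2, the empty set and {3} are not feedback vertex sets, while {0} is; and if nodes 0 and 1 are occupied in the same component, occupying node 2 would close a loop (count 2).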
RESULTS
In this section, unless otherwise stated, the radius $\ell$ of CI [3] is fixed to 4 and each result of EI [19] is obtained with 2000 candidates. Note that our BPD [4] results differ slightly from those in Ref. [4], since in the BPD code we fix the “Degree threshold” with the “Degree of top percent” (Table II). To validate the proposed methods in more detail, we test RR and PR with two different optimization objectives, $q_c$ and $F$, respectively. In addition, both RR and PR start from the HD ordering, with parameter settings chosen separately for four ranges of network size $n$. The threshold $q_c$ is computed with a fixed cutoff on the giant-component fraction.
TABLE II.
The threshold ($q_c \times n$), the average giant fraction $F$, and the size $n_{\mathrm{FVS}}$ of the feedback vertex set obtained by HD, CI, BPD, EI, PR, and RR on the 17 real-world networks. Here CI uses a different radius $\ell$ for the Email-EuAll and as-Skitter networks. Each result of EI, PR, and RR is averaged over 20 independent realizations. The bold numbers are the minimal values of each objective among these methods for the same network.
| Network^a | qc×n HD | qc×n CI | qc×n BPD | qc×n EI | qc×n PR | qc×n RR | F CI | F EI | F PR | F RR | nFVS BPD | nFVS PR | nFVS RR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Power | 975 | 570 | 316 | 337.10 | 440.90 | 282.55 | 0.0449 | 0.0112 | 0.0154 | 0.0076 | 516 | 485.60 | 487.65 |
| CA-GrQc | 912 | 1760 | 398 | 428.25 | 390.20 | 372.10 | 0.0527 | 0.0347 | 0.0356 | 0.0289 | 1449 | 1426.20 | 1427.00 |
| p2p-Gnutella08 | 2045 | 1444 | 1300 | 1508.95 | 1331.20 | 1372.55 | 0.1415 | 0.1651 | 0.1486 | 0.1386 | 1256 | 1276.85 | 1281.00 |
| as-733 | 243 | 192 | 162 | 169.35 | 187.80 | 152.85 | 0.0150 | 0.0097 | 0.0117 | 0.0087 | 216 | 208.00 | 208.60 |
| Scottish | 877 | 2036 | 434 | 471.05 | 432.85 | 442.70 | 0.0542 | 0.0259 | 0.0256 | 0.0231 | 444 | 436.35 | 438.00 |
| CA-AstroPh | 8544 | 4865 | 4198 | 4320.60 | 4055.60 | 4013.10 | 0.1562 | 0.1579 | 0.1368 | 0.1200 | 8626 | 8529.65 | 8525.80 |
| CA-CondMat | 5726 | 3217 | 2569 | 2700.80 | 2559.30 | 2534.35 | 0.0832 | 0.0774 | 0.0694 | 0.0625 | 8323 | 8230.20 | 8228.40 |
| hep-th | 18097 | 11184 | 10294 | 11002.85 | 9913.35 | 9732.10 | 0.2541 | 0.2742 | 0.2437 | 0.1915 | 12344 | 12097.45 | 12103.15 |
| Cit-HepPh | 22533 | 14164 | 13455 | 14498.90 | 13089.05 | 12982.90 | 0.2645 | 0.2860 | 0.2533 | 0.2056 | 15405 | 15133.45 | 15139.80 |
| Email-Enron | 4097 | 3074 | 2621 | 2764.35 | 2619.00 | 2572.90 | 0.0292 | 0.0314 | 0.0263 | 0.0217 | 7853 | 7748.70 | 7746.35 |
| loc-Gowalla | 53828 | 31386 | 26951 | 26916.70 | 25703.10 | 25015.30 | 0.0868 | 0.0916 | 0.0812 | 0.0625 | 38841 | 37690.20 | 37739.00 |
| Email-EuAll | 1431 | 1193 | 1064 | 6985.80 | 1104.30 | 1077.20 | 0.0056 | 0.0019 | 0.0012 | 0.0008 | 1187 | 1182.80 | 1193.65 |
| com-Amazon | 78308 | 42108 | 29572 | 27471.15 | 28056.55 | 26342.10 | 0.0793 | 0.0619 | 0.0583 | 0.0424 | 85274 | 82364.55 | 82263.80 |
| web-Google | 253099 | 82525 | 50861 | 41948.85 | 41175.95 | 33573.35 | 0.0526 | 0.0322 | 0.0312 | 0.0227 | 208876 | 205231.85 | 205435.45 |
| PAroad | 273899 | 71134 | 21172 | 17204.05 | 11150.15 | 10124.80 | 0.0417 | 0.0034 | 0.0019 | 0.0012 | 194443 | 176535.00 | 177536.80 |
| Txroad | 307413 | 82744 | 20873 | 16800.10 | 10676.50 | 9365.95 | 0.0342 | 0.0019 | 0.0011 | 0.0007 | 239909 | 217066.25 | 217823.05 |
| as-Skitter | 322128 | 151846 | 74286 | 70901.00 | 62059.25 | 63977.35 | 0.0394 | 0.0287 | 0.0239 | 0.0215 | 228775 | 224356.65 | 225329.90 |
The source code of CI is from Ref. [43]. The source code of BPD is from Ref. [44].
We first validate our methods on a number of real-world networks from various fields: one power grid network [32,33] (Power), three collaboration networks [34] (ca-GrQc, ca-AstroPh, and ca-CondMat), one Internet peer-to-peer network [34,35] (p2p-Gnutella08), two autonomous systems graphs [36] (as-733 and as-Skitter), the Scottish cattle movement network [19], two citation networks [36,37] (hep-th and cit-HepPh), two communication networks (email-Enron [38,39] and email-EuAll [34]), one location-based online social network [40] (loc-Gowalla), the Amazon product co-purchasing network [41] (com-Amazon), the Google web graph [39] (web-Google), and two road networks [39] (roadNet-PA and roadNet-TX, denoted PAroad and Txroad in the tables). These networks were chosen to cover a range of both edge density [1] and degree assortativity [24,42], two properties associated with the robustness of a network. Some basic information on these networks is given in Table I. Note that for all networks studied here, directed edges are replaced with undirected edges, and self-loops and isolated nodes are deleted.
TABLE I.
Basic information on the real-world networks, where CC is the clustering coefficient [32] and AC is the assortativity coefficient [24].
| Network^a | n | m | CC | AC |
|---|---|---|---|---|
| Power | 4941 | 6594 | 0.0801 | 0.0035 |
| CA-GrQc | 5242 | 14484 | 0.5296 | 0.6593 |
| p2p-Gnutella08 | 6301 | 20777 | 0.0109 | 0.0356 |
| as-733 | 6474 | 12572 | 0.2522 | −0.1818 |
| Scottish | 7228 | 24784 | 0.2798 | −0.1985 |
| CA-AstroPh | 18771 | 198050 | 0.6306 | 0.2051 |
| CA-CondMat | 23133 | 93439 | 0.6334 | 0.1340 |
| hep-th | 27240 | 341923 | 0.3119 | −0.0302 |
| Cit-HepPh | 34546 | 420877 | 0.2848 | −0.0063 |
| Email-Enron | 36692 | 183831 | 0.4970 | −0.1108 |
| loc-Gowalla | 196591 | 950327 | 0.2367 | −0.0293 |
| Email-EuAll | 265214 | 364481 | 0.0671 | −0.1781 |
| com-Amazon | 334863 | 925872 | 0.3967 | −0.0588 |
| web-Google | 875713 | 4322051 | 0.5143 | −0.0551 |
| PAroad | 1088092 | 1541898 | 0.0465 | 0.1227 |
| Txroad | 1379917 | 1921660 | 0.0470 | 0.1304 |
| as-Skitter | 1696415 | 11095298 | 0.2581 | −0.0814 |
The source data of these networks is from either http://www.snap.stanford.edu/data or http://www.konect.uni-koblenz.de/networks/opsahl-powergrid.
In Fig. 2, the fraction of the largest component versus the fraction of removed nodes is plotted for RR, CI, BPD, and EI on the CA-AstroPh, Cit-HepPh, TXroad, and as-Skitter networks. In almost all the situations studied here, RR needs notably fewer removed nodes than the other strategies to reach the same giant-component size. Regarding the summary metrics (Fig. 3 and Table II), RR also achieves a smaller threshold than HD, CI, BPD, and EI in most networks (14 of the 17) and the minimal average giant fraction in all cases, especially for the four largest networks, where both $q_c$ and $F$ of RR are significantly smaller than those of the other strategies; e.g., RR needs fewer than half of the nodes required by HD, CI, or BPD to fragment the two road networks. In addition, PR achieves a smaller $q_c$ than HD, CI, BPD, and EI in 12 of the 17 networks.
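The "fewer than half" statement for the road networks can be checked directly from the $q_c \times n$ columns of Table II. The short Python snippet below copies those values and divides the RR result by the best of HD, CI, and BPD; the dictionary and function names are only for illustration.

```python
# q_c * n values copied from Table II for the two road networks.
TABLE_II_ROADS = {
    "PAroad": {"HD": 273899, "CI": 71134, "BPD": 21172, "RR": 10124.80},
    "Txroad": {"HD": 307413, "CI": 82744, "BPD": 20873, "RR": 9365.95},
}


def rr_ratio_to_best_baseline(row: dict) -> float:
    """Nodes removed by RR divided by the fewest removed by HD, CI, or BPD."""
    return row["RR"] / min(row["HD"], row["CI"], row["BPD"])


for name, row in TABLE_II_ROADS.items():
    print(f"{name}: {rr_ratio_to_best_baseline(row):.3f}")
# PAroad: 0.478
# Txroad: 0.449
```

In both cases the best of the three baselines is BPD, and RR removes roughly 45% to 48% as many nodes.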
FIG. 2.
The fraction of the size of the giant component versus the fraction of removed nodes for CI, BPD, EI, and RR for (a) the CA-AstroPh network, (b) the Cit-HepPh network, (c) the TXroad network, and (d) the as-Skitter network (where CI with ). Each result of EI and RR is obtained by averaging 20 realizations.
FIG. 3.
The percentage of improvement of 17 real-world networks by comparing RR with HD, CI, BPD, and EI. where corresponds to HD, CI, BPD, or EI. Each result of EI and RR is obtained by averaging 20 realizations.
We further evaluate the performance of the proposed strategies (both RR and PR) on artificial model networks, including Erdős-Rényi (ER) [45] and scale-free (SF) [33] networks. Note that here RR is used in the normal way (optimizing $F$), whereas PR optimizes the FVS (following the idea of BPD). As illustrated in Fig. 4, RR significantly outperforms CI, with lower curves on both ER and SF networks. For the threshold on the ER networks, we compare the values obtained by BPD, by EI (slightly larger, by 0.0005, than the result in Ref. [19]), and by PR. Meanwhile, in the SF networks the threshold of PR is closer to that of BPD than the threshold of EI is. The results of $q_c$ versus the average degree $\langle k \rangle$ are shown in Fig. 5. Interestingly, with the strong-hub cutoff $K$ held fixed, EI performs increasingly worse as $\langle k \rangle$ increases. This happens because the score $\sigma(v)$ (used to measure the spreading ability of a node [19]) becomes less and less able to distinguish nodes with similar degree as $\langle k \rangle$ rises,

$$\sigma(v) = |\mathcal{N}(v)| - L(v) - H(v),$$

where $\mathcal{N}(v)$ consists of all of $v$'s nearest neighbors, $L(v)$ is the number of leaves (nodes with degree 1) in $\mathcal{N}(v)$, and $H(v)$ is the number of strong hubs (nodes with degree larger than $K$) in $\mathcal{N}(v)$. In other words, more and more nodes have a degree larger than $K$ as the network becomes dense. Hence, we also report results of EI with an adapted $K$ in Fig. 5 (this adaptation, however, is not valid for real-world networks). To summarize, in terms of the threshold, PR performs better than both CI and EI but slightly worse than BPD in the model networks. In contrast, RR obtains a considerably smaller $F$ than the other methods in both Fig. 4(a) and Fig. 4(b).
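The saturation effect described above can be reproduced with a few lines of Python, assuming the EI score has the effective-degree form "number of neighbors that are neither leaves nor strong hubs" (an assumption about Ref. [19], not a transcription of its code). In a complete graph on five nodes every node has degree 4, so with a small hub cutoff every neighbor counts as a strong hub and all scores collapse to 0, whereas a larger cutoff restores them. The function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, Set


def effective_degree(v: Hashable, adj: Dict[Hashable, Set[Hashable]], K: int) -> int:
    """Assumed EI-style score: neighbors of v that are neither leaves nor strong hubs."""
    nbrs = adj[v]
    leaves = sum(1 for w in nbrs if len(adj[w]) == 1)
    hubs = sum(1 for w in nbrs if len(adj[w]) > K)  # strong hubs: degree > K
    return len(nbrs) - leaves - hubs


def complete_graph(n: int) -> Dict[int, Set[int]]:
    """Adjacency sets of the complete graph on n nodes."""
    adj: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        adj[i].add(j)
        adj[j].add(i)
    return adj
```

Usage: `effective_degree(0, complete_graph(5), K=3)` gives 0 for every node, while `K=6` gives 4; once most nodes exceed the cutoff, the score no longer separates them, which matches the degradation of EI with growing average degree when $K$ is fixed.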
FIG. 4.
The fraction of the size of the largest cluster versus the fraction of removed nodes (over 50 sample networks) for CI, BPD, EI, PR, and RR on (a) ER networks with and , and (b) SF networks with , , and .
FIG. 5.
The threshold in dependence on the average degree (50 sample networks for each ) for CI, BPD, , , and PR for (a) ER networks with and (b) SF networks with and .
The different performances of BPD in model and real-world networks raise another question: how do loops influence the effectiveness of BPD, given that the belief propagation (BP) algorithm is sensitive to the presence of cycles and most real-world networks contain many loops (see the clustering coefficients in Table I)? We again employ the paradigmatic ER and SF models to construct the base networks. Then, for each network, the following procedure is used to raise the clustering coefficient, i.e., to increase the number of local loops.
(i) Randomly choose one node $u$ and two of its neighbors $v$ and $w$ such that there is no edge between $v$ and $w$.
(ii) In the same way, choose one neighbor $x$ of $v$ and one neighbor $y$ of $w$, subject to the corresponding constraints.
(iii) Delete the edges $(v, x)$ and $(w, y)$ and, at the same time, add the two new edges $(v, w)$ and $(x, y)$.
(iv) Repeat (i)-(iii) until the network reaches the desired clustering coefficient.
In this manner, the clustering coefficient of these networks increases while the degree of every node, and hence the degree distribution, is kept unchanged. As illustrated in Fig. 6, the FVS fraction of PR rises more slowly than that of BPD as the clustering coefficient CC increases in both ER and SF networks, while the threshold of PR decreases more quickly than that of BPD. This may indicate that PR is more suitable than BPD for real-world networks. We therefore also report the performance of BPD, PR, and RR on the FVS problem in Table II, where PR finds a smaller FVS than BPD in 16 of the 17 networks (all except p2p-Gnutella08).
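One rewiring step of the clustering-raising procedure can be sketched in Python on adjacency sets. This is a hedged illustration: the exact constraints of step (ii) are assumed here to be that $x$ and $y$ are distinct from $u$, $v$, $w$ and from each other and are not already adjacent, so that no self-loop or multi-edge is created. Deleting $(v, x)$ and $(w, y)$ and adding $(v, w)$ and $(x, y)$ closes the triangle $u$-$v$-$w$ while every node keeps its degree. The function name is hypothetical.

```python
import random
from typing import Dict, Set


def rewire_once(adj: Dict[int, Set[int]], rng: random.Random) -> bool:
    """One clustering-raising rewiring attempt; returns True if it was applied."""
    u = rng.choice(sorted(adj))
    if len(adj[u]) < 2:
        return False
    v, w = rng.sample(sorted(adj[u]), 2)
    if w in adj[v]:
        return False  # step (i): v and w must not be adjacent
    xs = [x for x in adj[v] if x not in (u, w)]  # assumed constraints on x
    ys = [y for y in adj[w] if y not in (u, v)]  # assumed constraints on y
    if not xs or not ys:
        return False
    x, y = rng.choice(xs), rng.choice(ys)
    if x == y or y in adj[x]:
        return False  # would create a self-loop or a duplicate edge
    # Step (iii): delete (v, x) and (w, y), add (v, w) and (x, y).
    adj[v].remove(x); adj[x].remove(v)
    adj[w].remove(y); adj[y].remove(w)
    adj[v].add(w); adj[w].add(v)
    adj[x].add(y); adj[y].add(x)
    return True
```

Step (iv) amounts to calling `rewire_once` repeatedly and measuring the clustering coefficient (for instance with NetworkX's `average_clustering`) until the target value is reached; since each accepted move preserves every degree, the degree distribution of the ER or SF base network is untouched.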
FIG. 6.
The threshold and the fraction of the feedback vertex set versus the clustering coefficients CC for BPD and PR in (a) ER networks with and , and (b) SF networks with , , and .
Moreover, we consider the two largest networks, the TXroad network (with maximal degree 12) and the as-Skitter network (with a far larger maximal degree), to demonstrate the efficiency of the proposed methods. Since it is hard to analyze the computational complexity of RR and PR in detail, we run them together with CI and BPD (open-source codes written in C or C++) in the same simulation environment and compare their running times. As illustrated in Fig. 7, both PR and RR obtain smaller thresholds than CI and BPD within quite a short time; in particular, RR obtains a better result than CI and BPD on the TXroad network after only a short running time. Note that the running times of CI and BPD reported here should be taken as a reference rather than as a benchmark.
FIG. 7.
The running time [measured in seconds (s)] of CI, BPD, PR (with ) and RR (with ) on (a) the TXroad network and (b) the as-Skitter network. The horizontal dashed lines correspond to the thresholds of CI or BPD. The values marked beside the vertical dashed lines give the computational time at which the proposed methods begin to obtain smaller thresholds than both CI and BPD. All results are averaged over 20 runs.
Finally, the susceptible-infected-recovered (SIR) epidemic spreading model [5,18,46] is used to investigate the spread of a virus on the Email-Enron and loc-Gowalla networks, comparing the CI, EI, and RR methods. In an SIR simulation, each node is in the susceptible, infected, or recovered state. Before the simulation starts, a fraction of nodes is identified by a given strategy and removed from the network. Then one random node of the remaining network is selected as the infected source, and all other nodes are susceptible. In each time step, every infected node infects each of its susceptible neighbors with the infection rate and then recovers with the recovery rate. Recovered nodes are also removed from the network. This process is repeated until no infected node remains. The simulation results are shown in Fig. 8, where the infection and recovery rates are fixed to 0.2 and 0.05, respectively. On both networks, RR yields 9.5 to 20.0 times fewer recovered individuals than EI under the same immunized fraction [Figs. 8(a) and 8(b)]. In terms of the final recovered fraction [Figs. 8(c) and 8(d)], RR also outperforms CI and EI in almost all situations.
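A single SIR realization as described above can be sketched in Python with synchronous time steps. Assumptions worth flagging: infection attempts in a step use the infected set from the start of that step, nodes infected in a step do not recover in the same step, and the final recovered fraction is normalized by the total number of nodes, including immunized ones. The function name is hypothetical; the default rates 0.2 and 0.05 are the values quoted in the text.

```python
import random
from typing import Dict, Hashable, Iterable, Optional, Set


def sir_final_recovered(adj: Dict[Hashable, Iterable[Hashable]],
                        immunized: Set[Hashable],
                        infection_rate: float = 0.2,
                        recovery_rate: float = 0.05,
                        rng: Optional[random.Random] = None) -> float:
    """One SIR run with immunized nodes removed; returns recovered / total nodes."""
    rng = rng if rng is not None else random.Random()
    remaining = [v for v in adj if v not in immunized]
    if not remaining:
        return 0.0
    infected = {rng.choice(remaining)}  # single random infected source
    recovered: Set[Hashable] = set()
    while infected:
        newly_infected = set()
        for v in infected:
            for w in adj[v]:
                susceptible = (w not in immunized and w not in infected
                               and w not in recovered and w not in newly_infected)
                if susceptible and rng.random() < infection_rate:
                    newly_infected.add(w)
        # Nodes infected before this step recover with the recovery rate.
        newly_recovered = {v for v in infected if rng.random() < recovery_rate}
        infected = (infected - newly_recovered) | newly_infected
        recovered |= newly_recovered
    return len(recovered) / len(adj)
```

Averaging this function over many random sources and immunized sets of increasing size would produce curves of the kind shown in Figs. 8(c) and 8(d); as a sanity check, immunizing the center of a five-node star leaves every leaf isolated, so each run ends with exactly one recovered node (fraction 0.2).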
FIG. 8.
The SIR simulation results of CI, EI and RR respectively on the Email-Enron network (a, c) and the loc-Gowalla network (b, d), including (a–b) the rate of infected () and recovered () individuals versus the spreading time step under the immunized fraction , and (c–d) the final recovered fraction versus the fraction of immunized nodes . In each network, independent selections are conducted.
CONCLUSION
In this paper, we have developed two effective strategies for the robustness and targeted immunization problems based on the percolation transition. The proposed strategies choose the removed (immunized) nodes by repeatedly investigating and capturing the interrelationships among nodes. To evaluate the effectiveness of both methods, we conducted numerous simulations on two types of model networks as well as 17 real-world networks from different fields. The results, especially on most of the empirical networks, clearly show that our strategies considerably outperform well-known existing strategies such as CI [3] and EI [19]. In addition, our strategies may open up a new path toward more effective solutions to the robustness and immunization problems, as well as toward finding minimal feedback vertex sets [4] in network science.