Constructing a Watts-Strogatz network from a small-world network with symmetric degree distribution.

Mozart B C Menezes¹, Seokjin Kim², Rongbing Huang³.

Abstract

Though the small-world phenomenon is widespread in many real networks, it is still challenging to replicate a large network at full scale for further study of its structure and dynamics when sufficient data are not readily available. We propose a method to construct a Watts-Strogatz network using a sample from a small-world network with a symmetric degree distribution. Our method yields an estimated degree distribution which fits closely with that of a Watts-Strogatz network and leads to accurate estimates of network metrics such as the clustering coefficient and degree of separation. We observe that the accuracy of our method increases with network size.

Year: 2017; PMID: 28604809; PMCID: PMC5467850; DOI: 10.1371/journal.pone.0179120

Source DB: PubMed; Journal: PLoS One; ISSN: 1932-6203; Impact factor: 3.240

Introduction

After the term “small world” was coined in Milgram’s pioneering experiment [1], Watts and Strogatz [2] proposed the most compelling analytical framework for the small-world phenomenon, which is prevalent in a range of social, information, technological, and biological networks. A small world consists of many local clusters, yet all members are connected by short paths through a few better-connected members. The conditions for a small world to emerge are minimal, and many real networks exhibit small-world properties [3-5]. However, it is not straightforward to quantify “small-world-ness.” One quantitative method measures the equivalence between a network and a unique Watts-Strogatz (WS) model [6]. For a real network with high equivalence, the corresponding WS model can be generated to explore its structure and dynamics. Even when such equivalence is confirmed, it is not feasible to study a large real network if sufficient data on the population are unavailable. For example, modern social networks are very large, and collecting their data and finding key parameters takes significant resources and time. Web-based experiments accommodating a large number of participants are more difficult to control in some respects than are those conducted in physical laboratories [7]. When such real experiments are impractical, an artificially structured network can be studied instead [8]. However, a large real network with strong small-world properties cannot be replicated as the corresponding WS model unless its parameters are estimated from a sample. A WS model [2] is characterized by three parameters: n, the number of nodes; K, the number of neighbors a node has on its right side in the regular lattice before rewiring; and p, the “rewiring” probability with which the right end of an arc incident to a node is rewired uniformly at random to another node.
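
The WS construction described in the last sentence can be sketched in plain Python. This is a minimal generator built on our own assumptions. Each lattice arc is visited once. A rewired right end is redrawn uniformly among the nodes that would create neither a self-loop nor a duplicate arc. The function name `watts_strogatz` is hypothetical, and the sketch is not the Mathematica 10 generator used for the figures.

```python
import random


def watts_strogatz(n, K, p, seed=None):
    """Hypothetical WS generator: ring lattice plus rewiring of right ends.

    Returns an adjacency list (one set of neighbours per node).
    """
    if not 0 < 2 * K < n - 1:
        raise ValueError("expected 0 < 2K < n - 1")
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    # Regular lattice: node i is joined to its K nearest neighbours on the right
    # (and therefore to K on the left), so every node starts with degree 2K.
    for i in range(n):
        for j in range(1, K + 1):
            k = (i + j) % n
            adj[i].add(k)
            adj[k].add(i)
    # Rewiring: each lattice arc {i, i + j} is visited once; with probability p
    # its right end moves to a uniformly chosen node that keeps the graph simple.
    for j in range(1, K + 1):
        for i in range(n):
            if rng.random() >= p:
                continue
            k = (i + j) % n
            candidates = [t for t in range(n) if t != i and t not in adj[i]]
            if not candidates:
                continue
            t = rng.choice(candidates)
            adj[i].remove(k)
            adj[k].remove(i)
            adj[i].add(t)
            adj[t].add(i)
    return adj


adj = watts_strogatz(1000, 8, 0.04, seed=1)
degrees = [len(nbrs) for nbrs in adj]
# Rewiring moves arcs but never creates or destroys them, so half the mean degree is K.
print(sum(degrees) / (2 * len(degrees)))  # 8.0
```

Each rewiring step removes one arc and adds another, so the full network always has nK arcs. Half its mean degree therefore equals K exactly. From a sample of nodes, that quantity is only an estimate.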
The population size n is known a priori in many cases or can be estimated [9, 10]. Half the sample mean of node degrees is an estimator for K: each node has degree 2K before rewiring, and the total number of arcs remains invariant after rewiring. Of the three parameters, p is the most challenging to estimate. Motivated by this need, we formulate a method to estimate p that yields an estimated degree distribution which fits closely with that of the corresponding WS model. These three parameters (n, K, p) suffice to characterize a WS network. We observe that, across many WS networks generated with the same values of (n, K, p), network metrics such as the clustering coefficient (CC) and the degree of separation (DS, defined as the characteristic path length in [2]) vary very little. A question that follows directly is how many arcs are incident to node i ∈ N after rewiring, where N is the set of nodes in a network. We start by deriving the degree distribution of a network, represented by the probability P(δ = m) that node i has degree m. Here P is a probability mass function, and δ is the degree of node i (the number of arcs incident to node i) after rewiring. A previous derivation of P(δ = m) in [11] assumes that δ ≥ K after rewiring, which does not hold for some of the WS networks we have generated, although this assumption yields a simpler formulation. We therefore propose a new formulation of P(δ = m) that is closer to the exact value in a WS network.
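
The two metrics named above can be computed on any adjacency list, as in the sketch below. We assume the common definitions. CC is the local clustering coefficient averaged over all nodes, with nodes of degree below 2 contributing 0. DS is the mean shortest-path length over all connected pairs, found by breadth-first search. The exact conventions used by Mathematica may differ. The helper names `ring_lattice`, `clustering_coefficient` and `degree_of_separation` are hypothetical. The check uses an unrewired lattice (p = 0), whose values are known in closed form.

```python
from collections import deque


def ring_lattice(n, K):
    """Regular ring lattice: node i is joined to its K nearest neighbours on each side."""
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(1, K + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    return adj


def clustering_coefficient(adj):
    """CC: local clustering coefficient averaged over all nodes (degree < 2 counts as 0)."""
    total = 0.0
    for nbrs in adj:
        d = len(nbrs)
        if d < 2:
            continue
        # number of arcs among the neighbours of this node
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)


def degree_of_separation(adj):
    """DS: mean shortest-path length over all connected ordered pairs (one BFS per node)."""
    total, pairs = 0, 0
    for source in range(len(adj)):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


lattice = ring_lattice(20, 2)
print(clustering_coefficient(lattice))  # 0.5, i.e. 3(K - 1) / (2(2K - 1)) for K = 2
print(degree_of_separation(lattice))    # 55/19, about 2.895
```

For the lattice, CC equals 3(K − 1)/(2(2K − 1)). That gives 0.7 for K = 8, the value of K used in Fig 1. Rewiring lowers CC and shortens paths.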

Results and discussion

In the regular lattice of a WS network before rewiring, a node i ∈ N has degree 2K, with K arcs incident to its right neighbors and K to its left ones. Let $N_i$ be the set of nodes connected to node i before or during rewiring (whereas δ is the degree of node i after rewiring). Then, before rewiring, $|N_i| = 2K$. Note that $N_i$ does not include node i. Assuming $|N_i| = 2K$, node i loses one degree when the following sequence of events takes place for a node $j \in N_i$. Arc {i, j} is chosen for rewiring with probability p. The end of arc {i, j} attached to node i is chosen with probability 1/2. The chosen end is rewired, with probability (n − 1 − 2K)/n, to a node which is neither node j nor one of the nodes in $N_i$. Consequently, the probability that the degree of a node decreases by 1 is
$$\alpha \equiv p \cdot \frac{1}{2} \cdot \frac{n-1-2K}{n} = \frac{p(n-1-2K)}{2n}.$$
We acknowledge that $|N_i| = 2K$ might not hold during rewiring, so our formulation of P(δ = m) is an approximation. The small-world property (high local clustering and short paths) emerges for small rewiring probabilities, ranging from p = 0.001 to 0.1 in Fig 2 of [2]. For a small p, e.g., p = 0.01, only about 1% of the arcs are rewired. Accordingly, $|N_i| = 2K$ holds for most nodes throughout rewiring, and this assumption is not significantly limiting. As shown in the examples we have generated, our approximation still results in small errors. On the other hand, node i gains one degree if an arc {j, k}, among the (n − 2)K arcs not initially incident to node i, is detached and one of its ends is rewired to node i, chosen randomly. If i = j or i = k, arc {j, k} is attached back and not rewired. Arc {j, k} is detached for rewiring with probability p. Node i is chosen with probability 1/n, and an end of arc {j, k} is rewired to node i. Thus, the probability that the degree of a node increases by 1 is β ≡ p/n.
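
The per-arc probabilities in this paragraph can be evaluated directly. The sketch multiplies the three factors listed for a loss (p, 1/2 and (n − 1 − 2K)/n) to get the loss probability, which we label α. It also computes the gain probability β = p/n and the Poisson rate λ = (n − 2)Kβ used in the next paragraph. The function name is hypothetical, and the parameters are those of Fig 1.

```python
def loss_gain_probabilities(n, K, p):
    """Per-arc loss probability alpha, gain probability beta, and Poisson rate lambda."""
    alpha = p * 0.5 * (n - 1 - 2 * K) / n  # arc chosen * end at node i * target outside N_i
    beta = p / n                             # arc detached * node i chosen as the new end
    lam = (n - 2) * K * beta                 # expected gains over the (n - 2)K other arcs
    return alpha, beta, lam


alpha, beta, lam = loss_gain_probabilities(1000, 8, 0.04)
print(f"{alpha:.6f} {beta:.6f} {lam:.6f}")  # 0.019660 0.000040 0.319360
print(f"{16 * alpha:.5f} vs {lam:.5f}")     # expected losses 2K*alpha vs gains: 0.31456 vs 0.31936
```

For finite n, the expected number of degrees lost (2Kα) and gained (λ) differ slightly. Both approach Kp = 0.32 as n grows, which is why the mean degree stays close to 2K.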
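We write the degree of node i after rewiring as δ = 2K − X + Y, where X and Y are binomial random variables representing the number of degrees lost and the number of degrees gained at node i, respectively. Let α = p(n − 1 − 2K)/(2n) and β = p/n be the per-arc probabilities of losing and gaining one degree. X counts losses over the 2K initially incident arcs, and Y counts gains over the (n − 2)K other arcs, so that
$$P(X = x) = \binom{2K}{x}\alpha^{x}(1-\alpha)^{2K-x}, \qquad P(Y = y) = \binom{(n-2)K}{y}\beta^{y}(1-\beta)^{(n-2)K-y},$$
and
$$P(\delta = m) = \sum_{x=\max\{2K-m,\,0\}}^{\min\{n-1-m,\,2K\}} P(X = x)\,P(Y = m-2K+x \mid X = x). \qquad (4)$$
For m ≥ n or m < 0, P(δ = m) = 0. In Eq (4), the bounds are max{2K − m, 0} ≤ x ≤ min{n − 1 − m, 2K}, where x is the number of degrees lost. For 0 ≤ m ≤ 2K, 2K − m ≤ x ≤ 2K: node i loses at most 2K degrees, since it has 2K neighbors before rewiring. It also cannot lose fewer than 2K − m degrees, since otherwise δ = m is impossible. For 2K < m ≤ n − 1 − 2K, 0 ≤ x ≤ 2K. In this case, node i can lose all 2K degrees as long as it can gain from the other n − 1 − 2K nodes, and it can lose none since 2K < m. For n − 1 − 2K < m ≤ n − 1, 0 ≤ x ≤ n − 1 − m. Since m is larger than in the previous case, node i loses no more than n − 1 − m degrees and can lose none. In the conditional probability in Eq (4), Y = m − 2K + x follows immediately from 2K − x + Y = m. We assume a large n (i.e., n ≫ K), consistent with the large networks we are mainly interested in. Substituting the binomial probability mass functions of X and Y into Eq (4), we have
$$P(\delta = m) = \sum_{x=\max\{2K-m,\,0\}}^{\min\{n-1-m,\,2K\}} \binom{2K}{x}\alpha^{x}(1-\alpha)^{2K-x}\binom{(n-2)K}{m-2K+x}\beta^{m-2K+x}(1-\beta)^{(n-2)K-(m-2K+x)}. \qquad (5)$$
The binomial probability mass function with probability β in Eq (5) can be approximated by that of a Poisson distribution with rate λ = (n − 2)Kβ = (n − 2)Kp/n for a large n and a small β, which is the case in small-world networks. Since α = p(n − 1 − 2K)/(2n) ≈ p/2 and λ = (n − 2)Kp/n ≈ Kp for a large n, Eq (5) can be written as
$$P(\delta = m) \approx \sum_{x=\max\{2K-m,\,0\}}^{2K} \binom{2K}{x}\left(\frac{p}{2}\right)^{x}\left(1-\frac{p}{2}\right)^{2K-x}\frac{e^{-Kp}(Kp)^{m-2K+x}}{(m-2K+x)!}. \qquad (6)$$
Then the mean estimated from Eq (6) is $2K - 2K\cdot\frac{p}{2} + Kp = 2K$, and the standard deviation is $\sqrt{2K\cdot\frac{p}{2}\left(1-\frac{p}{2}\right) + Kp} = \sqrt{Kp(4-p)/2}$. From now on, we use P(m) and P(δ = m) interchangeably. Fig 1 shows a WS network generated with parameters (n = 1,000, K = 8, p = 0.04). As shown in Fig 2, the degree distribution of the generated WS network is symmetric and closely estimated by Eq (6). Because degrees in WS networks are symmetrically distributed, our framework is intended for networks with a symmetric degree distribution.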
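
Eqs (5) and (6) can be evaluated numerically, and their moments checked, with the sketch below. It assumes 0 < p ≤ 1. As we read Eq (5), the conditional probability in Eq (4) is replaced by the unconditional binomial probability of Y. The binomial term for Y is computed in log space because the binomial coefficient with (n − 2)K trials quickly becomes too large for a float. The names `pmf_eq5`, `pmf_eq6` and `mean_sd` are hypothetical.

```python
from math import comb, exp, lgamma, log, log1p, sqrt


def pmf_eq5(m, n, K, p):
    """Eq (5): X ~ Bin(2K, alpha) degrees lost, Y ~ Bin((n - 2)K, beta) degrees gained."""
    if m < 0 or m >= n:
        return 0.0
    alpha = p * (n - 1 - 2 * K) / (2 * n)
    beta = p / n
    trials = (n - 2) * K
    total = 0.0
    for x in range(max(2 * K - m, 0), min(n - 1 - m, 2 * K) + 1):
        y = m - 2 * K + x
        p_x = comb(2 * K, x) * alpha**x * (1 - alpha) ** (2 * K - x)
        # log-space binomial term avoids overflowing comb((n - 2)K, y)
        log_p_y = (lgamma(trials + 1) - lgamma(y + 1) - lgamma(trials - y + 1)
                   + y * log(beta) + (trials - y) * log1p(-beta))
        total += p_x * exp(log_p_y)
    return total


def pmf_eq6(m, K, p):
    """Eq (6): large-n limit with X ~ Bin(2K, p/2) and Y ~ Poisson(Kp)."""
    if m < 0:
        return 0.0
    a, lam = p / 2, K * p
    total = 0.0
    for x in range(max(2 * K - m, 0), 2 * K + 1):
        y = m - 2 * K + x
        p_x = comb(2 * K, x) * a**x * (1 - a) ** (2 * K - x)
        total += p_x * exp(-lam + y * log(lam) - lgamma(y + 1))
    return total


def mean_sd(pmf, support):
    """Mean and standard deviation of a pmf evaluated on a finite support."""
    probs = [pmf(m) for m in support]
    mu = sum(m * q for m, q in zip(support, probs))
    return mu, sqrt(sum((m - mu) ** 2 * q for m, q in zip(support, probs)))


n, K, p = 1000, 8, 0.04
support = range(0, 2 * K + 40)
print(mean_sd(lambda m: pmf_eq6(m, K, p), support))     # mean 16, sd about 0.796
print(mean_sd(lambda m: pmf_eq5(m, n, K, p), support))  # mean about 16.0048
```

For Fig 1's parameters, the standard deviation from Eq (6) matches √(Kp(4 − p)/2) ≈ 0.796. The finite-n version, Eq (5), differs only in the third decimal place.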

Fig 1

A Watts-Strogatz network.

This example with parameters (n = 1,000, K = 8, p = 0.04), generated in Mathematica 10 (Wolfram Research), has CC = 0.6170 and DS = 4.1531. Node sizes and colors represent degrees, which range from 13 to 19.

Fig 2

Distributions of degrees in a Watts-Strogatz network and degrees estimated by Eq (6).

The WS network was generated with parameters of (n = 1,000, K = 8, p = 0.04).

The example WS network in Fig 1 is not the only one whose node degrees are close to those estimated by Eq (6). We now demonstrate the statistical fit using 8 parameter tuples, formed from n = 5,000 or 10,000, K = 50 or 75, and p = 0.01 or 0.05. For each tuple (n, K, p), 100 WS networks were generated and their node degrees were recorded for two tests. First, for each of the 800 WS networks, a chi-square test compared the actual node degrees with the expected values nP(δ = m) given by Eq (6). None of the 800 tests was significant at the 5% level, and these results corroborate our observation in Fig 2. Second, for each tuple (n, K, p), a t-test was performed on the number of nodes with each degree m across the 100 WS networks. The 95% confidence interval is $\bar{x}_m \pm t_{0.025,99}\, s_m/\sqrt{100}$, where $\bar{x}_m$ is the sample mean from the 100 WS networks and $s_m$ is the sample standard deviation. For each tuple and each m, the value nP(δ = m) estimated by Eq (6) lay within the corresponding confidence interval. Fig 3 shows the distributions of CC and DS from 1,000 WS networks randomly generated with the same parameters as in Fig 1 (n = 1,000, K = 8, p = 0.04). The distributions of CC and DS are symmetric with small ranges (0.6048 ≤ CC ≤ 0.6336 and 4.0278 ≤ DS ≤ 4.3333). Thus, given accurate estimates of n, K and p, the resulting estimates of CC and DS would also be accurate. It is therefore promising to use our method to estimate K and p from a sample and then evaluate network metrics such as CC and DS of the corresponding WS network.
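
The first test can be sketched as a chi-square goodness-of-fit statistic between observed degree counts and nP(m) from Eq (6). Several choices here are our own assumptions. The generator is our own stdlib sketch, not Mathematica's. Adjacent degrees are pooled until each bin's expected count reaches 5. The 5% critical value comes from the Wilson-Hilferty approximation rather than an exact chi-square quantile. The helper names are hypothetical. To keep the runtime short, the sketch uses Fig 1's parameters instead of the larger tuples in the text.

```python
import random
from collections import Counter
from math import comb, exp, lgamma, log, sqrt
from statistics import NormalDist


def ws_degrees(n, K, p, seed=0):
    """Hypothetical WS generator (ring lattice, right ends rewired with prob. p); returns degrees."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(1, K + 1):
            adj[i].add((i + j) % n)
            adj[(i + j) % n].add(i)
    for j in range(1, K + 1):
        for i in range(n):
            if rng.random() < p:
                k = (i + j) % n
                t = rng.choice([v for v in range(n) if v != i and v not in adj[i]])
                adj[i].remove(k)
                adj[k].remove(i)
                adj[i].add(t)
                adj[t].add(i)
    return [len(s) for s in adj]


def pmf_eq6(m, K, p):
    """Eq (6): degree pmf with X ~ Bin(2K, p/2) lost and Y ~ Poisson(Kp) gained."""
    a, lam = p / 2, K * p
    return sum(comb(2 * K, x) * a**x * (1 - a) ** (2 * K - x)
               * exp(-lam + (m - 2 * K + x) * log(lam) - lgamma(m - 2 * K + x + 1))
               for x in range(max(2 * K - m, 0), 2 * K + 1))


def chi_square_fit(degrees, K, p, min_expected=5.0):
    """Chi-square statistic of degree counts against n * P(m), pooling sparse bins."""
    n = len(degrees)
    observed = Counter(degrees)
    bins, obs_acc, exp_acc = [], 0.0, 0.0
    for m in range(max(max(observed), 2 * K + 40) + 1):
        obs_acc += observed.get(m, 0)
        exp_acc += n * pmf_eq6(m, K, p)
        if exp_acc >= min_expected:
            bins.append([obs_acc, exp_acc])
            obs_acc = exp_acc = 0.0
    bins[-1][0] += obs_acc  # fold the sparse upper tail into the last bin
    bins[-1][1] += exp_acc
    stat = sum((o - e) ** 2 / e for o, e in bins)
    return stat, len(bins) - 1  # n, K, p are given, so no parameters are estimated


def chi2_critical(dof, level=0.05):
    """Wilson-Hilferty approximation to the upper chi-square quantile."""
    z = NormalDist().inv_cdf(1 - level)
    c = 2 / (9 * dof)
    return dof * (1 - c + z * sqrt(c)) ** 3


degrees = ws_degrees(1000, 8, 0.04, seed=1)
stat, dof = chi_square_fit(degrees, 8, 0.04)
print(f"chi2 = {stat:.2f}, dof = {dof}, 5% critical value ~ {chi2_critical(dof):.2f}")
```

The test is non-significant when the statistic is below the critical value. Because the generator and the pooling rule are ours, a single run says nothing definitive about the 800 tests reported above.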

Fig 3

Distributions of CC and DS.

1,000 WS networks were randomly generated with parameters (n = 1,000, K = 8, p = 0.04).

Given that n is known or estimated, we propose the algorithm below to find estimates $\hat{K}$ and $\hat{p}$ of the population values K and p, respectively. Let S = {(i, $\delta_i$); i = 1, …, s} be a set of s individuals sampled from a WS network, where individual i has degree $\delta_i$. The total number of arcs remains the same after rewiring, so the mean degree is 2K. We therefore estimate K by half the sample mean, $\hat{K} = \frac{1}{2s}\sum_{i=1}^{s}\delta_i$. We estimate the standard deviation of degrees by the sample standard deviation, $\hat{\sigma} = \sqrt{\frac{1}{s-1}\sum_{i=1}^{s}(\delta_i - 2\hat{K})^2}$. We then search for $\hat{p}$ until the standard deviation calculated from Eq (6), $\sigma(\hat{p}) = \sqrt{\sum_m (m - \mu(\hat{p}))^2 P(m)}$ with $\mu(\hat{p}) = \sum_m m P(m)$, gets close enough to $\hat{\sigma}$. Our algorithm is based on a key observation: as the rewiring probability increases in the WS procedure, the variation of degrees in the resulting network also increases. Thus, given $\hat{\sigma}$, we find $\hat{p}$ from Eq (6) in a “reverse” manner, by bisection on [0, 1].

Algorithm 1
Input: n, ($\delta_i$, i = 1, …, s)
Define ϵ with a very small value (e.g., 0.00001).
Calculate $\hat{K}$ and $\hat{\sigma}$.
Let $p_L = 0$, $p_U = 1$ and $\hat{p} = (p_L + p_U)/2$.
Given $\hat{p}$, use Eq (6) to calculate $\mu(\hat{p})$ and $\sigma(\hat{p})$.
while $|\sigma(\hat{p}) - \hat{\sigma}| > \epsilon$ do
    If $\sigma(\hat{p}) > \hat{\sigma}$, let $p_U = \hat{p}$. Else, let $p_L = \hat{p}$.
    Let $\hat{p} = (p_L + p_U)/2$ and use Eq (6) to calculate $\mu(\hat{p})$ and $\sigma(\hat{p})$.
end while
Construct a WS network with parameters $(n, \hat{K}, \hat{p})$, and calculate CC and DS.

Fig 4 shows results from applying Algorithm 1 to 200 WS networks (100 with n = 10,000 and 100 with n = 20,000) with K = 80 and p chosen randomly from [0.005, 0.05]. From each network, s = 100 nodes were randomly sampled along with their degrees to calculate estimates of p, CC and DS with Algorithm 1. The closer the markers lie to the diagonal, the more closely the estimated values match the actual ones. In Fig 4, all estimated values are normalized between 0 and 1 so that they can be plotted together. Closer matches between estimated and actual values are observed for the larger networks with n = 20,000. The variation around the diagonal seems consistent with that in Fig 3, but is slightly amplified by the use of $\hat{p}$ as an estimate of p.
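
Algorithm 1 can be sketched in Python as follows, under several stated assumptions. The hypothetical helper `sd_eq6` computes σ(p) numerically from Eq (6) on a support wide enough for the Poisson tail. The result should agree with √(Kp(4 − p)/2), the standard deviation of 2K − X + Y with X ~ Bin(2K, p/2) and Y ~ Poisson(Kp). $\hat{K}$ is rounded to an integer because a WS network needs an integer K. `statistics.stdev` uses the s − 1 denominator. The `max_iter` cap is our safeguard and is not part of Algorithm 1. The names `solve_p` and `algorithm_1` are also hypothetical.

```python
from math import comb, exp, lgamma, log, sqrt
from statistics import mean, stdev


def pmf_eq6(m, K, p):
    """Eq (6): degree pmf with X ~ Bin(2K, p/2) lost and Y ~ Poisson(Kp) gained (p > 0)."""
    a, lam = p / 2, K * p
    total = 0.0
    for x in range(max(2 * K - m, 0), 2 * K + 1):
        y = m - 2 * K + x
        total += (comb(2 * K, x) * a**x * (1 - a) ** (2 * K - x)
                  * exp(-lam + y * log(lam) - lgamma(y + 1)))
    return total


def sd_eq6(K, p):
    """sigma(p): standard deviation of Eq (6), summed over a support covering the Poisson tail."""
    lam = K * p
    support = range(0, 2 * K + int(lam + 10 * sqrt(lam)) + 20)
    probs = [pmf_eq6(m, K, p) for m in support]
    mu = sum(m * q for m, q in zip(support, probs))
    return sqrt(sum((m - mu) ** 2 * q for m, q in zip(support, probs)))


def solve_p(K, sigma_hat, eps=1e-5, max_iter=100):
    """Bisection on [0, 1]; relies on sigma(p) increasing with p."""
    p_lo, p_hi = 0.0, 1.0
    p_hat = (p_lo + p_hi) / 2
    sigma = sd_eq6(K, p_hat)
    for _ in range(max_iter):
        if abs(sigma - sigma_hat) <= eps:
            break
        if sigma > sigma_hat:
            p_hi = p_hat  # too much degree variation: shrink p
        else:
            p_lo = p_hat  # too little degree variation: grow p
        p_hat = (p_lo + p_hi) / 2
        sigma = sd_eq6(K, p_hat)
    return p_hat


def algorithm_1(sample_degrees):
    """Estimate (K_hat, p_hat) from sampled degrees; K_hat is half the sample mean, rounded."""
    K_hat = max(1, round(mean(sample_degrees) / 2))
    sigma_hat = stdev(sample_degrees)
    return K_hat, solve_p(K_hat, sigma_hat)


print(algorithm_1([15, 16, 16, 17]))  # K_hat = 8, p_hat close to 0.0421
```

The final step of Algorithm 1 would pass $(n, \hat{K}, \hat{p})$ to a WS generator and compute CC and DS on the result. If $\hat{\sigma}$ exceeds σ(1), the bisection simply drifts toward p = 1.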

Fig 4

Estimated values of p, CC and DS.

For each case of (a) n = 10,000 and (b) n = 20,000, 100 WS networks were generated with K = 80 and a randomly chosen p ∈ [0.005, 0.05]. From each network, s = 100 nodes were randomly sampled along with their degrees to calculate estimates of p, CC and DS from Algorithm 1. All estimated values were normalized between 0 and 1.

In Fig 5, we measure the accuracy of the estimate $\hat{p}$ calculated by Algorithm 1. For each combination of n = 10,000 or 20,000, K = 80, and s/n (percentage of nodes sampled) = 1%, 3% or 5%, we generated 30 WS networks with a randomly chosen p ∈ [0.05, 0.2] and calculated $\hat{p}$ from Algorithm 1. Then, for each combination, we calculated the sample mean of the 30 ratios $\hat{p}/p$. A ratio of 1 represents an exact match between $\hat{p}$ and p. As the sample size increases from 1% to 5% of nodes, the mean ratio approaches 1. Also, as in Fig 4, higher accuracy is observed for the larger networks with n = 20,000. For each percentage of nodes sampled (s/n = 1%, 3%, 5%), the confidence interval for n = 20,000 is narrower than that for n = 10,000, and both confidence intervals contain the exact-match ratio $\hat{p}/p = 1$.

Fig 5

Sample means and 95% confidence intervals of 30 ratios $\hat{p}/p$.

For each combination of n = 10,000 or 20,000, K = 80, and s/n = 1%, 3% or 5%, 30 WS networks were generated with a randomly chosen p ∈ [0.05, 0.2]. For each network, the value of $\hat{p}$ was calculated from Algorithm 1.

Table 1 summarizes the sample means and 95% confidence intervals shown in Fig 5, in the format sample mean ± margin of error. The margin of error is calculated as $t_{0.025,29}\,\sigma/\sqrt{30}$, where σ is the sample standard deviation of the 30 ratios. Again, for each percentage of nodes sampled (s/n = 1%, 3%, 5%), the margins of error for n = 20,000 are smaller, resulting in narrower confidence intervals. Thus, our method is more accurate for larger networks (e.g., large-scale social networks).

Table 1

Sample means and 95% confidence intervals of 30 ratios $\hat{p}/p$.

n	s/n = 1%	s/n = 3%	s/n = 5%
10,000	0.9943 ± 0.0638	0.9952 ± 0.0371	1.0050 ± 0.0258
20,000	0.9850 ± 0.0362	0.9908 ± 0.0218	1.0061 ± 0.0152

Table notes: 95% confidence intervals are shown in the format sample mean ± margin of error. For each combination of n = 10,000 or 20,000, K = 80, and s/n = 1%, 3% or 5%, 30 WS networks were generated with a randomly chosen p ∈ [0.05, 0.2]. For each network, the value of $\hat{p}$ was calculated from Algorithm 1.
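
The claims made about Table 1 can be checked directly from its entries, as in the small sketch below. It transcribes the table, rebuilds each interval as mean ± margin, and tests three claims. First, every interval contains the exact-match ratio 1. Second, n = 20,000 gives the narrower interval at each sampling fraction. Third, margins shrink as s/n grows. The names `TABLE_1` and `interval` are hypothetical.

```python
# Table 1 transcribed: n -> [(mean ratio, margin of error) at s/n = 1%, 3%, 5%]
TABLE_1 = {
    10_000: [(0.9943, 0.0638), (0.9952, 0.0371), (1.0050, 0.0258)],
    20_000: [(0.9850, 0.0362), (0.9908, 0.0218), (1.0061, 0.0152)],
}


def interval(mean, margin):
    """95% confidence interval as (lower, upper)."""
    return mean - margin, mean + margin


# Every interval contains the exact-match ratio 1.
contains_one = all(lo <= 1.0 <= hi for row in TABLE_1.values()
                   for lo, hi in (interval(m, e) for m, e in row))
# For each sampling fraction, n = 20,000 has the smaller margin of error.
narrower_for_larger_n = all(e20 < e10 for (_, e10), (_, e20)
                            in zip(TABLE_1[10_000], TABLE_1[20_000]))
# Within each row, the margin shrinks as s/n grows from 1% to 5%.
shrinks_with_sample = all(r[0][1] > r[1][1] > r[2][1] for r in TABLE_1.values())

print(contains_one, narrower_for_larger_n, shrinks_with_sample)  # True True True
```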

Conclusion

We have presented a method to construct a Watts-Strogatz network using a sample from a small-world network with a symmetric degree distribution. Our method yields an estimated degree distribution which fits closely with that of a WS network. It also allows the population to be characterized with accurate estimates of network metrics such as the clustering coefficient and degree of separation. This is particularly useful when sufficient information on the population is not available due to limited resources and time. As observed, our method is more accurate for larger networks. An obvious limitation of our method is its requirement of a symmetric degree distribution, whereas many real networks have skewed degree distributions. Applications of our method are also limited to networks exhibiting strong small-world properties that can be well represented by a WS model, since our method is formulated based on the WS rewiring procedure. For a real network with either a non-symmetric degree distribution or weak small-world properties, we still hope that our method serves as a building block for potential revisions or extensions. Replicating a large network from a sample allows further experiments on a generated network for more insights into its structure and dynamics. This is feasible if some fundamental properties (e.g., small-world properties) of the network are identified and formulated in an analytical model (e.g., the WS model). Then, key parameters of the model can be estimated and a full-scale network can be constructed.

References

Amaral LA, Scala A, Barthelemy M, Stanley HE. Classes of small-world networks. Proc Natl Acad Sci U S A. 2000-10-10.

Centola D. The spread of behavior in an online social network experiment. Science. 2010-09-03.

Salganik MJ, Dodds PS, Watts DJ. Experimental study of inequality and unpredictability in an artificial cultural market. Science. 2006-02-10.

Watts DJ, Strogatz SH. Collective dynamics of 'small-world' networks. Nature. 1998-06-04.

Maltiel R, Raftery AE, McCormick TH, Baraff AJ. Estimating Population Size Using the Network Scale Up Method. Ann Appl Stat. 2015-09.

Humphries MD, Gurney K. Network 'small-world-ness': a quantitative method for determining canonical network equivalence. PLoS One. 2008-04-30.
