
A generalised significance test for individual communities in networks.

Abstract

Many empirical networks have community structure, in which nodes are densely interconnected within each community (i.e., a group of nodes) and sparsely across different communities. Like other local and meso-scale structure of networks, communities are generally heterogeneous in various aspects such as the size, density of edges, connectivity to other communities and significance. In the present study, we propose a method to statistically test the significance of individual communities in a given network. Compared to the previous methods, the present algorithm is unique in that it accepts different community-detection algorithms and the corresponding quality function for single communities. The present method requires that a quality of each community can be quantified and that community detection is performed as optimisation of such a quality function summed over the communities. Various community detection algorithms including modularity maximisation and graph partitioning meet this criterion. Our method estimates a distribution of the quality function for randomised networks to calculate a likelihood of each community in the given network. We illustrate our algorithm by synthetic and empirical networks.


Year: 2018; PMID: 29743534; PMCID: PMC5943579; DOI: 10.1038/s41598-018-25560-z

Source DB: PubMed; Journal: Sci Rep; ISSN: 2045-2322; Impact factor: 4.379

Introduction

Many biological, physical and social systems can be expressed as networks, with nodes representing individual entities within the network and edges representing pairwise relationships between nodes[1,2]. Many empirical networks have community structure, i.e., the network is composed of communities, which are groups of nodes that are densely interconnected with each other while sparsely interconnected with those in other groups[3,4]. A community may correspond to the role of nodes. For example, communities may correspond to functional modules of proteins[5], groups of airports serving the same geographical region[6] and groups of people sharing an interest[7]. Many algorithms have been proposed for finding communities in networks[3,4]. These algorithms are often equipped with a quality function with which to judge whether or not the detected community structure is significant overall. A fundamental question that is asked much less often is the significance of individual communities. In fact, a network may be composed of a part where community structure is pronounced and another part where community structure is vague or absent. To discuss community structure in such a “chimera” network, one needs methods to assess the statistical significance of single communities. In the present study, we consider the significance of single communities that have been detected by a non-overlapping community-detection algorithm. An algorithm for testing the significance of individual communities was previously proposed[8]. In that algorithm, one uses a quality function for individual communities to compare the quality of a community in question, detected in the given network, with that of communities detected in randomised networks. The distribution of the quality function in randomised networks is analytically known. The authors then used the same significance test in OSLOM, which is an algorithm for finding various types of communities[9].
However, OSLOM does not optimise the same quality function as that used in the aforementioned statistical test or its aggregate over the different communities. The same discrepancy exists in a different significance test for single communities[10]. In an extreme case, let us suppose one detects communities by optimising a quality function that is very different from the quality function used in the statistical test. Then, the detected communities may have small values of the quality function used in the statistical test and will be judged to be insignificant. However, in terms of the quality function used in the community detection, the detected communities may be sufficiently strong. This pitfall may be overcome if one uses the same quality function for the community detection and the statistical test. There exist such significance tests for individual communities[11,12]. However, these significance tests[11,12] do not consider the possible dependence of the quality function value on the size of community[10,13,14]. This practice is problematic for the following reason. Suppose that two communities in the given network have different sizes and bear the same value of the quality function. Then, the significance level (i.e., p-value) in these statistical tests is the same for the two communities. In general, however, the quality function value may be positively correlated with the community size, which is in fact often the case (Methods section). In this case, it is easier for the larger community to attain the observed quality function value than for the smaller community under the null model. Then, the smaller community should be judged to be more significant than the larger community if they yield the same quality function value. An aforementioned statistical test does consider the dependence of the quality function value on the community size[10]. 
However, that method does not use a common quality function between community detection and statistical testing, as discussed already. Based on these considerations, it will be useful to develop methods to test the significance of individual communities that (i) use a quality function that is consistent with the one used in community detection, and (ii) take into account the dependence of the quality function value on the community size. We will develop a new statistical test for individual communities that meets these criteria. An additional feature of our method is that it allows for general quality functions. Python code for the present significance test is available at https://github.com/skojaku/qstest/.

Methods

Correlation between quality and community size

We consider unweighted networks composed of N nodes. Denote their N × N adjacency matrix by $\boldsymbol{A} = (A_{ij})$, where $A_{ij} = 1$ if nodes i and j are adjacent and $A_{ij} = 0$ otherwise. We assume that the network is undirected (i.e., $A_{ij} = A_{ji}$ for all i ≠ j) and does not contain self-loops (i.e., $A_{ii} = 0$). Let M be the number of edges in the network. We denote by $d_i$ the degree of node i. One may regard a community as significant if its quality value is significantly larger than that expected for randomised networks. This intuitive approach has a problem. To see this, let us consider a benchmark network generated by the Lancichinetti-Fortunato-Radicchi (LFR) model[15] (Fig. 1(a)). The network has $N = 10^3$ nodes and consists of C = 31 non-overlapping communities; each node i belongs to exactly one of them. To generate the network, we set the average degree to 10, the maximum degree to 100, the range of the number of nodes in a community c (denoted by $n_c$) to [10,100] and the power-law exponent for the distributions of $d_i$ and $n_c$ to 2. Let us consider a quality function given by[13,14]

$$q_c^{\mathrm{mod}} \equiv \frac{1}{2M}\sum_{i \in c}\sum_{j \in c}\left(A_{ij} - \frac{d_i d_j}{2M}\right), \qquad (1)$$

where both sums run over the nodes that belong to community c.

Figure 1

(a) A network with 31 non-overlapping communities generated by the LFR model. The circles represent nodes. The lines between the nodes represent edges. The colour of each node indicates the planted community to which the node belongs. (b–e) Quality of a community (i.e., $q_c^{\mathrm{mod}}$, $q_c^{\mathrm{int}}$, $q_c^{\mathrm{exp}}$ and $q_c^{\mathrm{cnd}}$) plotted against its number of nodes, $n_c$. The circles indicate the planted communities shown in panel (a). The crosses indicate the communities detected in 500 randomised networks generated by the configuration model. To find communities in the randomised networks, we use the Louvain algorithm[26] for $q_c^{\mathrm{mod}}$ (panel (b)) and a variant of the Kernighan–Lin algorithm[27] for $q_c^{\mathrm{int}}$, $q_c^{\mathrm{exp}}$ and $q_c^{\mathrm{cnd}}$ (panels (c–e)).

Note that the modularity is the sum of $q_c^{\mathrm{mod}}$ over the communities[7]. We find a strong positive correlation between $q_c^{\mathrm{mod}}$ and $n_c$ (circles in Fig. 1(b)). This is also true for communities in randomised networks that are generated by the configuration model, i.e., random networks that preserve the expected degree of each node (crosses in Fig. 1(b)). Crucially, large communities detected in the randomised networks have larger $q_c^{\mathrm{mod}}$ values than small communities in the original network do. Therefore, we cannot judge the significance of communities solely by the value of $q_c^{\mathrm{mod}}$. The results are qualitatively the same for the other quality functions for individual communities introduced in the following section (Fig. 1(c–e)).
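As a minimal sketch (not the authors' implementation), the per-community modularity contribution $q_c^{\mathrm{mod}}$ can be computed from an edge list as follows. The function name `modularity_contributions`, the edge-list input and the `membership` dictionary are assumptions made for illustration. The sketch uses the identity $q_c^{\mathrm{mod}} = m_c/M - (\mathrm{vol}_c/2M)^2$, where $m_c$ is the number of intra-community edges and $\mathrm{vol}_c$ is the sum of degrees in community c; this follows from summing $A_{ij} - d_i d_j/(2M)$ over all pairs of nodes in c.

```python
from collections import defaultdict


def modularity_contributions(edges, membership):
    """Return {community: q_c^mod} for an undirected, unweighted simple graph.

    edges      -- iterable of (i, j) pairs, each undirected edge listed once
    membership -- dict mapping every node to its community label
    (Both input conventions are illustrative assumptions.)
    """
    edges = list(edges)
    m = len(edges)                      # M, the number of edges
    degree = defaultdict(int)
    intra = defaultdict(int)            # intra-community edges per community
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if membership[i] == membership[j]:
            intra[membership[i]] += 1

    vol = defaultdict(int)              # sum of degrees per community
    for node, c in membership.items():
        vol[c] += degree[node]

    # sum_{i,j in c} A_ij = 2 * intra[c] and sum_{i,j in c} d_i d_j = vol[c]^2
    return {c: (2 * intra[c] - vol[c] ** 2 / (2 * m)) / (2 * m) for c in vol}
```

Summing the returned values over all communities gives the modularity of the partition, which is a quick sanity check on small graphs.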

Our statistical test

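On the basis of the observations made in the previous section, we construct a statistical test for individual communities as follows. Note that we do not specify the quality function $q_c$, which may be $q_c^{\mathrm{mod}}$ or a different one. Moreover, we do not specify how one measures the size $s_c$ of community c. We refer to the present statistical test based on a quality function q and community size s as the (q, s)–test. Suppose that we have a community c with quality $q_c$ and size $s_c$. We judge community c to be significant if its $q_c$ value is larger than those for communities of the same size $s_c$ detected in randomised networks. We compute $P(\tilde{q} \ge q_c \mid s_c)$, which is the probability that a community of size $s_c$ detected in randomised networks generated by the configuration model has a quality value larger than $q_c$.

We numerically estimate $P(\tilde{q} \ge q_c \mid s_c)$ as follows. First, we generate 500 randomised networks using the configuration model. Then, we detect communities in each randomised network by the algorithm that has been used to detect communities in the original network. Let $\tilde{C}$ be the sum of the number of communities detected in the 500 randomised networks. For each community ℓ ($1 \le \ell \le \tilde{C}$) in the randomised networks, we compute the quality $\tilde{q}_\ell$ and size $\tilde{s}_\ell$. Then, we compute the average values, i.e., $\langle\tilde{q}\rangle = \sum_{\ell} \tilde{q}_\ell / \tilde{C}$ and $\langle\tilde{s}\rangle = \sum_{\ell} \tilde{s}_\ell / \tilde{C}$, and the unbiased estimates of the standard deviation, i.e., $\sigma_q = \sqrt{\sum_{\ell} (\tilde{q}_\ell - \langle\tilde{q}\rangle)^2 / (\tilde{C} - 1)}$ and $\sigma_s = \sqrt{\sum_{\ell} (\tilde{s}_\ell - \langle\tilde{s}\rangle)^2 / (\tilde{C} - 1)}$. We estimate the joint probability distribution $P(\tilde{q}, \tilde{s})$ using the kernel density estimator[16] as follows:

$$P(\tilde{q}, \tilde{s}) = \frac{1}{\tilde{C} h^2 \sigma_q \sigma_s}\sum_{\ell=1}^{\tilde{C}} f\left(\frac{\tilde{q} - \tilde{q}_\ell}{h\sigma_q}, \frac{\tilde{s} - \tilde{s}_\ell}{h\sigma_s}\right), \qquad (2)$$

where h is the width of the kernel. The function f (·, ·) is the bivariate Gaussian kernel (i.e., bivariate standard normal distribution) given by

$$f(x_1, x_2) = \frac{1}{2\pi\sqrt{1 - \gamma^2}}\exp\left[-\frac{x_1^2 - 2\gamma x_1 x_2 + x_2^2}{2(1 - \gamma^2)}\right], \qquad (3)$$

where

$$\gamma = \frac{1}{(\tilde{C} - 1)\sigma_q \sigma_s}\sum_{\ell=1}^{\tilde{C}} (\tilde{q}_\ell - \langle\tilde{q}\rangle)(\tilde{s}_\ell - \langle\tilde{s}\rangle) \qquad (4)$$

is the Pearson correlation coefficient between $\tilde{q}$ and $\tilde{s}$. The probability distribution estimated by the Gaussian kernels converges to the true probability distribution, whatever its form, as the number of samples increases[17]. Although there are also non-Gaussian kernels that share this property[17], we used Gaussian kernels, which are a state-of-the-art choice.

The width h is a free parameter that affects the speed of the convergence to the true probability distribution. Optimising the value of h requires assumptions about the true probability distribution and intensive computations[18,19]. Therefore, we set $h = \tilde{C}^{-1/6}$ according to Scott’s rule-of-thumb[20], which often provides a reasonable estimate in practice[18-20]. The conditional probability, $P(\tilde{q} \ge q_c \mid s_c)$, is given by

$$P(\tilde{q} \ge q_c \mid s_c) = \frac{\int_{q_c}^{\infty} P(\tilde{q}, s_c)\,d\tilde{q}}{\int_{-\infty}^{\infty} P(\tilde{q}, s_c)\,d\tilde{q}}. \qquad (5)$$

The integration of f (x1, x2) over x1 yields

$$\int_{x}^{\infty} f(x_1, x_2)\,dx_1 = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x_2^2}{2}\right)\left[1 - \Phi\left(\frac{x - \gamma x_2}{\sqrt{1 - \gamma^2}}\right)\right], \qquad (6)$$

where Φ (·) is the cumulative distribution function of the standard normal distribution. By substituting Eq. (6) into Eq. (5), we have

$$P(\tilde{q} \ge q_c \mid s_c) = \frac{\sum_{\ell=1}^{\tilde{C}} \exp\left[-\frac{(s_c - \tilde{s}_\ell)^2}{2h^2\sigma_s^2}\right]\left[1 - \Phi\left(\frac{\sigma_s (q_c - \tilde{q}_\ell) - \gamma\sigma_q (s_c - \tilde{s}_\ell)}{h\sigma_q\sigma_s\sqrt{1 - \gamma^2}}\right)\right]}{\sum_{\ell=1}^{\tilde{C}} \exp\left[-\frac{(s_c - \tilde{s}_\ell)^2}{2h^2\sigma_s^2}\right]}. \qquad (7)$$

Finally, we regard community c as significant if $P(\tilde{q} \ge q_c \mid s_c) \le \alpha$, where α ∈ [0, 1] is the significance level. The conditional probability obeys a uniform probability distribution over [0, 1] for a community detected in a randomised network (see Supplementary Information 1). One can estimate more accurate p-values (i.e., $P(\tilde{q} \ge q_c \mid s_c)$) using a larger number of randomised networks, which, however, requires additional computational time. We opt to use 500 randomised networks to obtain sufficiently accurate p-values in a reasonable time. In fact, the p-value does not change much if one increases the number of randomised networks beyond 500 or if one uses networks with different numbers of nodes and communities (Supplementary Information 2). As the number of communities, C, increases, some insignificant communities would be judged significant owing to the multiple comparison problem. To avoid this, we use the Šidák correction[21], i.e., $\alpha = 1 - (1 - \alpha')^{1/C}$, where α′ ∈ [0, 1] is the targeted significance level. We set α′ = 0.05.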
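The p-value computation described in this section can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the code of the qstest package: the function names `normal_cdf` and `qs_test_pvalue` are hypothetical, the kernel width is taken to be Scott's rule for two dimensions (the number of samples raised to the power −1/6), and the conditional probability is evaluated as the ratio of Gaussian-kernel sums that results from integrating a bivariate Gaussian kernel over the quality axis. The sketch also assumes that the sampled qualities and sizes are not perfectly correlated and that the observed size is not so far from all sampled sizes that every kernel weight underflows to zero.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def qs_test_pvalue(q_c, s_c, q_rand, s_rand):
    """Estimate P(q~ >= q_c | s_c) from samples (q_rand[l], s_rand[l]).

    q_rand, s_rand -- qualities and sizes of the communities detected in the
                      randomised networks (hypothetical argument names).
    """
    n = len(q_rand)
    mean_q = sum(q_rand) / n
    mean_s = sum(s_rand) / n
    # Unbiased (n - 1) estimates of the standard deviations and covariance.
    sigma_q = math.sqrt(sum((q - mean_q) ** 2 for q in q_rand) / (n - 1))
    sigma_s = math.sqrt(sum((s - mean_s) ** 2 for s in s_rand) / (n - 1))
    cov = sum((q - mean_q) * (s - mean_s)
              for q, s in zip(q_rand, s_rand)) / (n - 1)
    gamma = cov / (sigma_q * sigma_s)   # Pearson correlation coefficient
    h = n ** (-1.0 / 6.0)               # Scott's rule of thumb in 2 dimensions

    num = 0.0
    den = 0.0
    for q_l, s_l in zip(q_rand, s_rand):
        dq = (q_c - q_l) / (h * sigma_q)
        ds = (s_c - s_l) / (h * sigma_s)
        weight = math.exp(-0.5 * ds * ds)   # kernel weight along the size axis
        tail = 1.0 - normal_cdf((dq - gamma * ds) / math.sqrt(1.0 - gamma ** 2))
        num += weight * tail
        den += weight
    return num / den
```

A community would then be called significant when the returned value does not exceed the corrected significance level α. Samples whose size is close to $s_c$ dominate both sums, which is how the estimate conditions on community size.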

Time complexity

The time complexity of the proposed statistical test is evaluated as follows. One randomised network is generated from the configuration model using an efficient algorithm[22], which is implemented in some network analysis software[23,24]. For each generated randomised network, we detect communities. Any community-detection algorithm qualified for the present statistical test computes the quality and size of the individual communities and maximises the quality function for the entire network. We use the quality and size of the optimised communities in the statistical test. We carry out these procedures for each of the R randomised networks, so the sampling cost is R times the sum of the time to generate one randomised network and Z, where Z is the time complexity of the community-detection algorithm. We compute the p-value for each of the C communities in the original network using Eq. (7) with $R C_{\mathrm{conf}}$ samples $(\tilde{q}_\ell, \tilde{s}_\ell)$ on average, where $C_{\mathrm{conf}}$ is the average number of communities detected in a randomised network. This incurs a time complexity of $O(R C C_{\mathrm{conf}})$. The total time required by the proposed statistical test is the sum of the sampling cost and the cost of computing the p-values. The time complexity can be mitigated using parallel computing: one runs multiple threads, each of which generates independent samples of $(\tilde{q}_\ell, \tilde{s}_\ell)$. Once the sampling is completed in all the threads, one computes the p-value using Eq. (7). We used 16 threads on a computer with Intel 2.6 GHz Sandy Bridge processors and 4 GB of memory. For the largest network we analysed (i.e., Internet[25]; N = 34,761 nodes), our statistical test needed 403 seconds using the Louvain community-detection algorithm[26]. With the Kernighan–Lin community-detection algorithm[27], it took 17,763 seconds (i.e., approximately 5 hours).
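To illustrate why generating a randomised network is cheap compared with community detection, the following sketch draws one network by stub matching, which runs in time linear in the number of edge ends. This is a simplified stand-in and an assumption on our part: it preserves each node's degree exactly and may produce self-loops and multiple edges, whereas the sampler cited above[22] may differ (for example, by preserving degrees only in expectation). The function name `configuration_model_edges` is hypothetical.

```python
import random


def configuration_model_edges(degrees, rng=None):
    """Return a random edge list in which node i has degree degrees[i].

    Stub matching: every node contributes one "stub" per unit of degree,
    the stubs are shuffled and then paired consecutively. Self-loops and
    multi-edges are possible and are NOT removed in this sketch.
    """
    rng = rng or random.Random()
    stubs = [node for node, d in enumerate(degrees) for _ in range(d)]
    if len(stubs) % 2:
        raise ValueError("the sum of degrees must be even")
    rng.shuffle(stubs)
    return list(zip(stubs[0::2], stubs[1::2]))
```

Because each randomised network is drawn independently, calls such as this one, followed by community detection, can be distributed over threads or processes and their samples pooled before the p-values are computed.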

Community detection with different quality functions

Among various quality functions for individual communities apart from $q_c^{\mathrm{mod}}$[4,13,14], we consider the following three quality functions. The internal average degree[14] (i.e., normalised number of intra-community edges), denoted by $q_c^{\mathrm{int}}$, is defined by

$$q_c^{\mathrm{int}} \equiv \frac{1}{n_c}\sum_{i \in c}\sum_{j \in c} A_{ij}.$$

The maximisation of $q_c^{\mathrm{int}}$ yields a community having dense intra-community connectivity. The expansion[14], denoted by $q_c^{\mathrm{exp}}$, is defined by

$$q_c^{\mathrm{exp}} \equiv -\frac{1}{n_c}\sum_{i \in c}\sum_{j \notin c} A_{ij}.$$

The maximisation of $q_c^{\mathrm{exp}}$ yields a community having sparse inter-community connectivity. Finally, the conductance[14], denoted by $q_c^{\mathrm{cnd}}$, is defined by

$$q_c^{\mathrm{cnd}} \equiv 1 - \frac{1}{\mathrm{vol}_c}\sum_{i \in c}\sum_{j \notin c} A_{ij},$$

where $\mathrm{vol}_c = \sum_{i \in c} d_i$ is the sum of degrees of nodes (i.e., volume) in community c. Similar to the case of $q_c^{\mathrm{exp}}$, the maximisation of $q_c^{\mathrm{cnd}}$ yields a community having sparse inter-community connectivity. One can also interpret the maximisation of $q_c^{\mathrm{cnd}}$ as the maximisation of the number of intra-community edges[28].

For $q_c^{\mathrm{mod}}$, we adopt the Louvain algorithm to maximise the modularity (i.e., the sum of $q_c^{\mathrm{mod}}$ over the communities, $\sum_{c} q_c^{\mathrm{mod}}$) to find communities in the original and randomised networks. However, the Louvain algorithm is not applicable to $Q = \sum_{c=1}^{C} q_c$, where $q_c$ is $q_c^{\mathrm{int}}$, $q_c^{\mathrm{exp}}$ or $q_c^{\mathrm{cnd}}$. Therefore, we adopt a variant of the Kernighan–Lin algorithm[29] used in a previous study[27]. The algorithm seeks a partitioning of the network into communities that maximises Q.

Suppose that each node i has a tentative label $c_i$ indicating the index of the community to which node i belongs. First, we assign each node to one of the C communities selected uniformly at random. Second, for each node i, we tentatively relabel it to a different label and measure the increment in Q. Third, we select the node i and its new label c that maximise the increment in Q among all nodes i (1 ≤ i ≤ N) and all possible new labels. Regardless of whether Q increases or not, we accept the proposed relabelling of node i (i.e., set $c_i = c$). Fourth, we determine the pair of another node j (j ≠ i) and its tentative new label c′ that maximises the increment in Q, and change the label of j to c′ (i.e., set $c_j = c'$). In this manner, we relabel nodes one by one. Here we do not relabel the nodes that have already been relabelled. After sequentially relabelling the N nodes, we select the labelling that yields the largest value of Q among the N + 1 labellings that have appeared in the course of relabelling the N nodes. If the initial labelling (before relabelling any node) yields the largest value of Q, we terminate the algorithm. Otherwise, we use the labelling that has yielded the largest Q value among the N + 1 labellings as the initial labelling in the next round of updating the labels. We repeat rounds of updating until the initial labelling is the best labelling in the round in terms of the Q value.

To find communities in networks using $q_c^{\mathrm{int}}$, $q_c^{\mathrm{exp}}$ or $q_c^{\mathrm{cnd}}$, we need to specify the number of communities, C. Otherwise, the maximisation of the quality functions may yield trivial communities. For example, $q_c^{\mathrm{exp}}$ is always the largest when each connected component constitutes a community because there is no inter-community edge. In the analysis of synthetic networks, we set C to the number of planted communities. For empirical networks, we set C to the number of communities identified by the Louvain algorithm.
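The three quality functions discussed in this section can be sketched as follows. The exact normalisation and sign conventions here are assumptions chosen so that a larger value is always better, as the text requires: the internal average degree is taken as twice the number of intra-community edges divided by the number of nodes, the expansion as the negated number of boundary edges per node, and the conductance as one minus the fraction of the community's volume that leaves the community. The function name `community_qualities` and its inputs are hypothetical, and the sketch assumes that the community is non-empty and has at least one incident edge.

```python
def community_qualities(edges, membership, c):
    """Return the assumed q_int, q_exp and q_cnd values of community c.

    edges      -- iterable of (i, j) pairs, each undirected edge listed once
    membership -- dict mapping every node to its community label
    """
    n_c = sum(1 for label in membership.values() if label == c)
    intra = 0   # edges with both ends in c
    cut = 0     # edges with exactly one end in c
    for i, j in edges:
        in_i = membership[i] == c
        in_j = membership[j] == c
        if in_i and in_j:
            intra += 1
        elif in_i or in_j:
            cut += 1
    vol_c = 2 * intra + cut   # sum of degrees of the nodes in c
    return {
        "int": 2 * intra / n_c,     # internal average degree
        "exp": -cut / n_c,          # negated expansion
        "cnd": 1.0 - cut / vol_c,   # one minus conductance
    }
```

With these conventions, a community that is a whole connected component attains the maximum of the expansion (0) and of the conductance (1), which is the trivial optimum that fixing the number of communities C is meant to avoid.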

Other statistical tests

We compare the (q, s)–test with two statistical tests, i.e., the test proposed by Spirin and Mirny[10] and the test proposed by Lancichinetti, Radicchi and Ramasco[8], which we refer to as the S–test and L–test, respectively. As is the case with the (q, s)–test, both the S–test and the L–test adopt the configuration model as the null model. For both statistical tests, we set the significance level for a single community to $\alpha = 1 - (1 - \alpha')^{1/C}$, where α′ = 0.05. The S–test regards a community as significant if it has more intra-community edges than a community composed of the same number of nodes detected in randomised networks does. Their original algorithm[10] is slow for large networks. Therefore, we adopt the Kernighan–Lin algorithm[29] to optimise the quality function for a community adopted in the S–test. In our numerical experiments, our implementation is faster and also finds better community structure than their original algorithm does in terms of their quality function. The L–test regards a community as significant if every node in the community has more neighbours within the community than expected for the configuration model. In the original paper[8], the authors defined two significance measures, i.e., the C–score and the B–score. We adopt the B–score, which is less conservative than the C–score. In the original article[8], the B–score is claimed to be more trustworthy than the C–score because the C–score, but not the B–score, relies on extreme value statistics.
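The per-community significance level shared by all the tests is the Šidák correction of the targeted level. A minimal sketch, with the hypothetical function name `sidak_alpha`:

```python
def sidak_alpha(target_alpha, n_communities):
    """Per-community significance level under the Šidák correction.

    target_alpha  -- targeted family-wise significance level (alpha')
    n_communities -- number of communities tested (C)
    """
    return 1.0 - (1.0 - target_alpha) ** (1.0 / n_communities)
```

For example, with α′ = 0.05 and C = 10 communities, each community is tested at a level slightly above 0.005, so that the probability of at least one false positive among 10 independent tests is 0.05.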

Data

We apply the statistical test to the 12 empirical networks listed in Table 1. We ignore the directions and weights of edges in the empirical networks.

Table 1

Properties of 12 empirical networks.

Network	N	M	C	n_c (Min)	n_c (Max)	vol_c (Min)	vol_c (Max)
Karate[30]	34	78	3	5	17	16	78
Dolphin[31]	62	159	4	7	22	37	123
Les Misérables[32]	77	254	10	2	16	3	147
Email[33]	151	1527	6	16	50	258	1081
Jazz[34]	198	2742	6	3	63	9	2029
Network science[7]	379	914	11	6	65	27	290
Blog[35]	1222	16,714	2	565	657	15,755	17,673
Airport[36,37]	2939	15,677	20	2	712	2	12,638
Protein[38,39]	3023	6149	161	2	312	2	1832
Chess[25]	7115	55,779	409	2	812	3	23,034
Astro-ph (co-authorship)[40]	18,771	198,050	116	2	3547	2	98,628
Internet[25]	34,761	107,720	65	4	13,710	7	106,881

Column C indicates the number of communities detected by the Louvain algorithm. Columns n and vol indicate the number of nodes in a community and the sum of degrees of nodes in a community, respectively.

The karate club network represents the relationships among the members of a university’s karate club[30]. Each node represents a member of the karate club. Two members are defined to be adjacent if they are friends outside of the club activities. The dolphin social network represents the relationships of the dolphins living near Doubtful Sound in New Zealand[31]. Each node represents a dolphin. Two dolphins are defined to be adjacent if they are frequently observed in the same school. The network of Les Misérables represents the relationships between the characters of a novel, Les Misérables[32]. Each node represents a character of the book. Each edge indicates that the two characters appear in the same chapter of the book. The Enron email network represents the email interactions among the staff of Enron Inc[33]. Each node represents an email account. Each edge indicates that an email is sent from one account to the other account. The jazz network represents the collaborations among jazz musicians[34]. Each node represents a jazz musician. Each edge indicates that two musicians belong to the same band. The network of network scientists represents the collaborations between researchers in network science[7]. Each node represents a researcher. Two researchers are defined to be adjacent if they have published a co-authored paper cited by one of two popular review papers on network science. Then, some nodes and edges were added manually by the author of the article[7]. We only consider the largest connected component of the network. The political blog network is the network of blogs on the United States presidential election in 2004[35]. Each node represents a blog. 
Two blogs are defined to be adjacent if there is at least one hyperlink between the two blogs on their front page. The airport network consists of nodes representing airports in the world[36,37]. Two airports are defined to be adjacent if there is a direct commercial flight between the two airports. The protein network represents the physical interactions among human proteins[38,39]. Each node represents a protein. Two proteins are defined to be adjacent if they physically interact. The Chess network represents the chess matches between players[25]. Each node represents a chess player. Each edge indicates that they have played at least once. The Astro-ph network represents the collaborations among the researchers who published a joint paper in the arXiv’s astro-ph section[40]. Each node represents a researcher. Two researchers are defined to be adjacent if they have published a joint paper. The Internet network represents the network of autonomous systems[25]. A node represents an autonomous system, which is a group of routers maintained by a network operator. Two autonomous systems are defined to be adjacent if they have a logical peering relation.

Results

We measure the size of a community in two ways: the number of nodes in a community c, $n_c$, and the sum of degrees of nodes in a community c, $\mathrm{vol}_c$. In the next two subsections, we consider the $(q_c^{\mathrm{mod}}, n_c)$–test and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–test. We show the results for other quality functions in the third subsection.

Synthetic networks

In this section, we examine synthetic networks with planted communities. We generate networks using the LFR model[15], which places edges such that the degree of a node, $d_i$, and the number of nodes in a community c, $n_c$, follow power-law distributions. We set the power-law exponent for the distributions of $d_i$ and $n_c$ to 2, the average degree to 10, the maximum degree to 100 and the range of $n_c$ to [20,200]. The networks are composed of $N = 10^3$ nodes. Each node i has an average fraction 1 − μ of neighbours belonging to the same community, where μ ∈ {0, 0.025, 0.05, …, 1} is a mixing parameter controlling the “strength” of community structure. With μ = 0, all edges are placed within communities, and the community structure is the strongest. With μ = 1, all edges are between different communities. We set the extent of overlaps between different communities to zero. We generate 30 networks using the LFR model at each μ value. For each generated network, we classify the planted communities into significant and insignificant communities by each statistical test. Then, we compute the true positive rate (i.e., the fraction of planted communities in the network that are judged to be significant). Finally, we average the true positive rate over the 30 generated networks. Figure 2 shows the true positive rate as a function of μ. The true positive rate for the S–test is the smallest over the entire range of μ, indicating that the S–test is the most conservative. The S–test does not regard all the planted communities as significant even at μ = 0 for the following reason. In the S–test, one detects the strongest community in each randomised network, where the strength of a community is measured by the number of intra-community edges. Then, a focal community in the original network is regarded as significant if it is stronger than the majority of the strongest communities detected in the randomised networks. The strongest communities in the randomised networks often contain almost the largest possible number of intra-community edges, whereas the planted communities do not always do so, even at μ = 0. Therefore, the S–test concludes that some planted communities are insignificant. The true positive rate for the L–test is 1 when μ = 0 and ranges between 0.55 and 0.95 for 0 < μ ≤ 0.5. The true positive rate for the $(q_c^{\mathrm{mod}}, n_c)$–test and that for the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–test are comparable and close to 1 for 0 ≤ μ ≤ 0.3. In contrast, there is a visible difference between the results for the $(q_c^{\mathrm{mod}}, n_c)$– and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–tests for 0.3 < μ ≤ 0.5. This result suggests that the definition of the size of a community may affect the significance of weak communities but not of strong communities.

Figure 2

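True positive rate for the statistical tests applied to the networks generated by the LFR model. Legends S, L, $(q_c^{\mathrm{mod}}, n_c)$ and $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$ indicate the S–test, the L–test, the $(q_c^{\mathrm{mod}}, n_c)$–test and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–test, respectively. The error bars indicate ±1 standard deviation.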

Empirical networks

We apply the statistical tests to the 12 empirical networks listed in Table 1 (see the Data section for details). In this section, we detect communities by modularity maximisation using the Louvain algorithm[26]. Then, we apply the statistical tests to each detected community. The fraction of significant communities for each statistical test is shown in Table 2. The $(q_c^{\mathrm{mod}}, n_c)$– and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–tests identify more significant communities than the S–test and the L–test do in a majority of the 12 empirical networks. This result indicates that the $(q_c^{\mathrm{mod}}, n_c)$– and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–tests are more generous than the S–test and the L–test, which is consistent with the results for the LFR model. This is probably because the $(q_c^{\mathrm{mod}}, n_c)$– and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–tests use $q_c^{\mathrm{mod}}$ to evaluate the quality of individual communities, which is consistent with the objective function of modularity maximisation, $\sum_{c} q_c^{\mathrm{mod}}$.

Table 2

Fraction of significant communities identified by the S–test, the L–test, the (qmod, s)–test, the (qint, s)–test, the (qexp, s)–test and the (qcnd, s)–test in the 12 empirical networks.

Network	S	L	(q_c^mod, n_c)	(q_c^mod, vol_c)	(q_c^int, n_c)	(q_c^int, vol_c)	(q_c^exp, n_c)	(q_c^exp, vol_c)	(q_c^cnd, n_c)	(q_c^cnd, vol_c)
Karate	1.00	0.33	0.67	1.00	0.00	0.00	0.00	0.00	0.00	0.33
Dolphin	1.00	0.50	1.00	0.75	0.00	0.00	0.00	0.00	0.50	0.50
Les Misérables	0.40	0.40	0.40	0.60	0.20	0.40	0.00	0.00	0.50	0.40
Enron	1.00	0.00	1.00	1.00	0.33	0.67	0.00	0.00	1.00	1.00
Jazz	0.67	0.67	0.67	1.00	0.67	0.83	0.00	0.00	1.00	1.00
Netscience	1.00	0.64	1.00	1.00	0.91	0.82	0.09	0.09	0.91	1.00
Blog	0.00	1.00	1.00	1.00	0.50	0.50	0.00	0.00	1.00	1.00
Airport	0.00	0.60	0.70	0.80	0.15	0.55	0.00	0.00	0.40	0.20
Protein	0.00	0.35	0.14	0.22	0.03	0.12	0.01	0.01	0.00	0.00
Chess	0.00	0.25	0.13	0.15	0.36	0.58	0.00	0.00	0.01	0.03
Astro-ph	—	0.61	0.24	0.53	1.00	1.00	0.00	0.00	0.33	0.12
Internet	—	0.55	0.65	0.60	0.00	0.18	0.00	0.00	0.00	0.02

The hyphen indicates that the test did not terminate within 64 days on our computer (Intel 2.6 GHz Sandy Bridge processors and 4GB of memory).

To quantify the agreement between two statistical tests, we compute the level of agreement defined by $\tau = (C_{11} + C_{00})/C$, where $C_{00}$ is the number of communities classified as insignificant by both statistical tests and $C_{11}$ is the number of communities classified as significant by both tests. Note that 0 ≤ τ ≤ 1, τ = 1 if the two tests regard the same set of communities as significant, and τ = 0 if the two tests completely disagree. We compute τ between each pair of statistical tests for each empirical network and then average τ over the 12 empirical networks. The averaged τ values are shown in Table 3. We find τ = 0.42 between the S–test and the L–test, indicating that the two statistical tests disagree for a majority of communities. The L–test weakly agrees with the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–test (i.e., τ = 0.58) but disagrees with the other tests for a majority of communities (i.e., τ < 0.5). The τ between the $(q_c^{\mathrm{mod}}, n_c)$– and the $(q_c^{\mathrm{mod}}, \mathrm{vol}_c)$–tests is large (τ = 0.84), suggesting that the significance of a majority of communities is not strongly affected by the definition of the community size.
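The level of agreement τ is simply the fraction of communities on which two tests give the same verdict. A minimal sketch, in which the function name `agreement` and the boolean-list inputs are illustrative assumptions:

```python
def agreement(significant_a, significant_b):
    """Level of agreement tau = (C11 + C00) / C between two tests.

    significant_a, significant_b -- sequences of booleans, one entry per
    community, saying whether each test judged that community significant.
    """
    if len(significant_a) != len(significant_b) or not significant_a:
        raise ValueError("need two non-empty sequences of equal length")
    same = sum(1 for a, b in zip(significant_a, significant_b)
               if bool(a) == bool(b))
    return same / len(significant_a)
```

For instance, two tests that agree on two of four communities give τ = 0.5.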

Table 3

Agreement between pairs of statistical tests.

Test	S	L	(q_c^mod, n_c)	(q_c^mod, vol_c)
S	1.00	0.42	0.73	0.66
L	0.42	1.00	0.49	0.58
(q_c^mod, n_c)	0.73	0.49	1.00	0.84
(q_c^mod, vol_c)	0.66	0.58	0.84	1.00


Other quality functions

In this section, we examine the $(q_c^{\mathrm{int}}, s_c)$–, the $(q_c^{\mathrm{exp}}, s_c)$– and the $(q_c^{\mathrm{cnd}}, s_c)$–tests, where $s_c$ is either $n_c$ or $\mathrm{vol}_c$. For the synthetic networks, the true positive rate for the $(q_c^{\mathrm{int}}, n_c)$– and the $(q_c^{\mathrm{int}}, \mathrm{vol}_c)$–tests is small in the entire range of μ (Fig. 3). As is the case for the S–test, the quality function $q_c^{\mathrm{int}}$ uses the number of intra-community edges. Some planted communities are regarded as insignificant because randomised networks often contain a community having almost the largest possible number of intra-community edges (Fig. 1(c)). The quality function $q_c^{\mathrm{exp}}$ is the largest when the community c is disconnected from the other nodes. Randomised networks often contain many disconnected components, yielding a large value of $q_c^{\mathrm{exp}}$ (Fig. 1(d)). Therefore, the true positive rate for the $(q_c^{\mathrm{exp}}, n_c)$– and the $(q_c^{\mathrm{exp}}, \mathrm{vol}_c)$–tests is also close to zero in the entire range of μ. In contrast to the $(q_c^{\mathrm{int}}, s_c)$– and the $(q_c^{\mathrm{exp}}, s_c)$–tests, the $(q_c^{\mathrm{cnd}}, n_c)$– and the $(q_c^{\mathrm{cnd}}, \mathrm{vol}_c)$–tests yield a true positive rate close to one when μ ≤ 0.3. These results suggest that the outcome of the test depends considerably on the quality function. For all the (q, s)–tests, the definition of community size (i.e., $n_c$ or $\mathrm{vol}_c$) does not strongly influence the true positive rate.

Figure 3

True positive rate as a function of mixing parameter, μ, for the six (q, s)–tests.

For the empirical networks, we first detect communities by maximising q, where q is either $q_c^{\mathrm{int}}$, $q_c^{\mathrm{exp}}$ or $q_c^{\mathrm{cnd}}$, using the variant of the Kernighan–Lin algorithm (see the Other statistical tests section). Then, we apply the (q, s)–test to each detected community. The results for the $(q_c^{\mathrm{int}}, s)$–, the $(q_c^{\mathrm{exp}}, s)$– and the $(q_c^{\mathrm{cnd}}, s)$–tests applied to the 12 empirical networks are shown in Table 2. For all the networks, the $(q_c^{\mathrm{cnd}}, s)$–test regards more communities as significant than the $(q_c^{\mathrm{int}}, s)$– and the $(q_c^{\mathrm{exp}}, s)$–tests, where $s$ is either $n_c$ or $\mathrm{vol}_c$. This result is consistent with those obtained for the synthetic networks (Fig. 3). For each quality function q, the level of agreement (i.e., τ) between the two definitions of the community size (i.e., $n_c$ or $\mathrm{vol}_c$) is shown in Table 4. For most empirical networks, the agreement τ is larger than 0.8, indicating that the results of the statistical test do not strongly depend on the definition of community size in most cases.

Table 4

Agreement between the (q, n)–test and the (q, vol)–test.

Network	$q_c^{\mathrm{int}}$	$q_c^{\mathrm{exp}}$	$q_c^{\mathrm{cnd}}$
Karate	1.00	1.00	0.67
Dolphin	1.00	1.00	1.00
Les Misérables	0.60	1.00	0.90
Enron	0.67	1.00	1.00
Jazz	0.50	1.00	1.00
Netscience	0.73	1.00	0.91
Blog	1.00	1.00	1.00
Airport	0.60	1.00	0.60
Protein	0.90	0.99	1.00
Chess	0.77	1.00	0.98
Astro-ph	1.00	1.00	0.76
Internet	0.82	1.00	0.98


Discussion

We proposed a non-parametric statistical test, called the (q, s)–test, for the significance of individual communities, which accounts for the correlation between the quality and the size of single communities. We demonstrated our test with several quality functions q including the one defined as the contribution of a single community to the modularity. In fact, the (q, s)–test accepts different quality functions for individual communities such as those described in the previous literature[13,14,41-43]. In addition, the (q, s)–test does not prescribe how communities should be detected in a given network. We note that one should use a q that is consistent with the objective function for community detection, because the former is maximised in the (q, s)–test and the latter is maximised in community detection.
We have used two definitions of the size of a community, i.e., the number of nodes in a community (i.e., n) and the sum of the degrees of the nodes in a community (i.e., vol). For degree-homogeneous networks, the choice does not matter because n ∝ vol. However, for degree-heterogeneous networks, which communities are judged significant may depend considerably on whether we use n or vol. If q explicitly uses its own measure of the size of a community, we should probably adopt the corresponding definition of the community size in the (q, s)–test. If a measure of community size is not explicit, we suggest selecting the measure of community size that is most strongly correlated with q. If q is correlated with multiple quantities (e.g., both n and vol) that are not perfectly correlated with each other, one can extend the (q, s)–test by adopting multivariate Gaussian kernels with three or more variables instead of bivariate Gaussian kernels. A downside of this approach is that we would need more data to reliably estimate the distribution of (q, s), where s is at least two-dimensional.
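
The role of the bivariate Gaussian kernels can be sketched as follows. The example estimates the probability that a community of a given size in randomised networks has a quality at least as large as the observed one, i.e., P(q ≥ q_obs | s = s_obs), from samples of (q, s). It assumes a product of two one-dimensional Gaussian kernels with hand-picked bandwidths `h_q` and `h_s`. The function names, the bandwidths and the toy samples are hypothetical, and the kernel and bandwidth selection used in the paper may differ.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_p_value(q_obs, s_obs, samples, h_q, h_s):
    """Kernel estimate of P(q >= q_obs | s = s_obs).

    samples: (q, s) pairs of communities detected in randomised networks.
    h_q, h_s: kernel bandwidths (assumed values, not taken from the paper).
    """
    num = 0.0
    den = 0.0
    for q_i, s_i in samples:
        # Weight each randomised community by how close its size is to s_obs.
        w = math.exp(-0.5 * ((s_obs - s_i) / h_s) ** 2)
        # Mass of the Gaussian kernel centred at q_i that lies above q_obs.
        num += w * (1.0 - normal_cdf((q_obs - q_i) / h_q))
        den += w
    if den == 0.0:
        raise ValueError("no randomised community has a comparable size")
    return num / den


# Hypothetical samples in which quality grows with size (q = 0.02 * s).
samples = [(0.02 * s, s) for s in range(5, 55, 5)]
p_small = conditional_p_value(0.5, 10, samples, h_q=0.05, h_s=5.0)
p_large = conditional_p_value(0.5, 50, samples, h_q=0.05, h_s=5.0)
```

In this toy setting the same quality value, q = 0.5, is unusually high for a community of size 10 but unremarkable for a community of size 50, so `p_small` is much smaller than `p_large`. This is the effect of conditioning on the size when q and s are correlated.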
The (q, s)–test can also be used to assess the significance of other structures of networks, such as bipartite communities[44] and core-periphery structure[45-47], provided that the quality function for the individual structure (e.g., a single bipartite community) is explicitly defined. In fact, we applied a variant of the (q, s)–test to core-periphery structure in our previous study[47]. Robustness of community structure against random perturbations (e.g., addition, removal and rewiring of edges) is an alternative measure of the significance of communities[14,48,49]. With this approach, if small perturbations do not considerably change communities, then the communities are regarded as significant. Statistical tests based on quality functions, including the (q, s)–test, and those based on robustness may provide different results[49]. As is the case for quality functions, the robustness of an individual community may be correlated with the size of the community. For example, removal of a small number of intra-community edges may destroy small communities, whereas large communities may survive the removal of more intra-community edges. If this is the case, it may be worthwhile to make a robustness-based test of individual communities account for the dependence of the robustness measure on the size of a community.

