Growing networks with communities: A distributive link model.

Ke-Ke Shang¹, Bin Yang¹, Jack Murdoch Moore², Qian Ji¹, Michael Small³.

Abstract

Evolution and popularity are the two keys of the Barabasi-Albert model, which generates a power-law distribution of network degrees. Evolving network generation models are important because they explain both how and why complex networks (and scale-free networks, in particular) are ubiquitous. We adopt the evolution principle and propose a very simple and intuitive new model of network growth, which naturally evolves modular networks with multiple communities. The number and size of the communities evolve over time and are primarily controlled by a single free parameter. Surprisingly, under some circumstances, our framework can construct a tree-like network with clear community structures: the branches and leaves of a tree. Results also show that new communities absorb link resources and thereby weaken the degree growth of hub nodes. Our models offer a common explanation for the communities of regular and tree-like networks and also break the tyranny of the early adopter; unlike under the standard popularity principle, newer nodes and communities come to dominance over time. Importantly, our model fits the SARS-Cov-2 haplotype evolutionary network well.

Year: 2020 PMID: 32357655 PMCID: PMC7192348 DOI: 10.1063/5.0007422

Source DB: PubMed Journal: Chaos ISSN: 1054-1500 Impact factor: 3.642

Evolution and popularity are the two keys of the Barabasi–Albert (BA) model, the default model for the generation and growth of complex networks with a power-law degree distribution. Evolving network generation models are important because they explain both how and why complex networks (and scale-free networks, in particular) are ubiquitous. However, the motivation and manifestation of “popularity,” the other key to the BA model, are often rather ambiguous for natural networks. We also argue that the “rich-get-richer” principle is not necessarily sufficient or ubiquitous across biological, social, and physical networks, and particularly engineering networks. Moreover, the growth of mesoscale network structures (communities) is not yet fully understood. Here, we develop a growing network model in which new communities regulate the degree distribution: new communities absorb link resources and thereby weaken the degree growth of hub nodes. Interestingly, our framework can construct a tree-like network with clear community structures, namely the branches and leaves of a tree. Results show that our models offer a common explanation for the communities of regular and tree-like networks and can also regulate degree distributions via a free parameter. We also show that our model can replicate features of real biological systems: it can construct a network whose structure is similar to that of the SARS-Cov-2 haplotype evolutionary network.

INTRODUCTION

Evolving networks are ubiquitous, from social networks to protein networks and from the Internet to contact networks. In all these instances, time is therefore a basic element of any model intended to fit them. However, another basic element, the principle of preferential attachment that leads to the scale-free degree distribution, is increasingly controversial as an omnipresent model of complex systems; one recent study found that scale-free networks are rare. In this report, we argue that time evolution is the key to any network model but that the birth time of a node should not entirely preordain its final state. The first-mover advantage is real, but it does not guarantee final success: sometimes better ideas appear later. In particular, the degree distribution of growing engineering networks is usually influenced by both geography and resource-driven demand; hence, links cannot be predicted well by an algorithm based solely on pure preferential attachment. On the other hand, communities appear in many complex networks and are frequently associated with important functions of those networks, and research into community detection has recently attracted increasing attention. In fact, the appearance of new communities draws degree resources away from old hub nodes: links within communities are denser than those between groups, so a new hub node readily forms in a new community. In this report, we adopt two keys, network evolution and new communities, to design a growing network model. As time evolves, upstarts (new nodes with high degrees) appear in new groups, and our model thus offers a new perspective on the “rich-get-richer” phenomenon. First, we construct a fully connected network in which every node is an initial community. Second, we use the same probability mass function to decide whether to generate a new community or which existing community to choose.
Third, we introduce a new node that connects to nodes within the new or chosen community with a higher probability and to nodes outside it with a lower probability. Fourth and finally, we repeat the second and third steps as required. In what follows, we show the network structures constructed by our model and their degree distributions. In regular networks (networks with many crossing links), the degree of the hub node decreases as the probability of new communities appearing increases, consistent with the hypothesis of our model. Surprisingly, our model can also construct a tree-like network with naturally growing new branches (communities). To investigate the community structure of our model more carefully, we measure how visible the community structure is with the modularity Q. We find that networks constructed by our model with appropriate parameters have clear community structures. Furthermore, we measure the predictability of communities and find that links within a community are more predictable than those between communities. Overall, our model can construct a regular or tree-like network with clear and predictable communities and a controllable degree distribution. Finally, we construct a tree-like network whose structure is similar to that of the SARS-Cov-2 haplotype evolutionary network and a regular network whose structure is similar to that of the karate club network.

GROWING MODEL

Time is the key driver in the BA model: the birth time of a node determines its popularity, and its popularity determines how many links it accretes. Hence, the rich get richer, and the distribution of link resources becomes increasingly unequal. In contrast to such a mechanism, in which high-degree nodes absorb ever more links over time, our model introduces the control parameter to create new communities that acquire link resources and to make new and old communities share those resources. Given nodes, construct a fully connected network in which each node is an independent community with its own label . Next, we introduce a free control parameter ; the probability that a new community appears is inversely proportional to here, is the number of nodes in community and is the number of nodes in the whole network. If a new community is formed, we introduce a new node as its first member and connect it at random to () existing nodes in other communities; the label of the new community is then , where is the number of old communities. Otherwise, the new node joins an old community. The community is chosen according to its label ; a bigger label means a later birth. In other words, the probability that a particular community is chosen is determined by its age, inversely proportional to . A later-born community, with a bigger , is therefore more easily chosen, which leads to more communities forming. Next, the new node connects to nodes of its own community with probability , and otherwise to random nodes of other communities. Here, can be , because the first node of a new community has already connected to nodes of other communities at step 2. If the community has too few nodes to receive all of the new node's internal links, the new node first connects to all nodes within the community and then connects to external nodes at random. Repeat steps (2), (3), and (4) times, after which the network has nodes.
In a large network, a small number of nodes may connect only to other communities. Such nodes have little impact on the community structure and are not strongly wired to their intended community. To accelerate the computation, we can delete these nodes and then add new nodes. In real-world growing networks, however, it is natural for some nodes to connect only to other communities. For example, in the SARS-Cov-2 haplotype evolutionary network, the haplotype (group ) connects only to other groups because of the relationship between them, and no community of its own is observed. Hence, we can also simply ignore this phenomenon. Furthermore, from Eq. (1), when , and the probability that a new community appears tends to , so the number of communities . Conversely, when , and the probability that a new community appears tends to , so the number of communities . We set the probability that a new community appears equal to when . Hence, . Here, we provide a fitting function for the relationship between the number of communities and , where is a free fitting parameter. As shown in Fig. 1, we fix and vary the number of nodes . The fitted curves agree closely with the results of our model.
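The growth procedure above can be sketched in Python. This is a hedged illustration under stated assumptions, not the authors' implementation: Eq. (1) is missing from this copy, so `p_new_community` is a hypothetical placeholder, β/(β + mean community size), chosen only because it tends to 0 as β → 0 and to 1 as β → ∞; community choice is weighted by label so that later-born communities are favored; and each of the new node's `m` links is assumed to go inside its community with probability `p_in`. All function and parameter names are ours.

```python
import random
from collections import defaultdict


def p_new_community(beta, sizes):
    """HYPOTHETICAL stand-in for Eq. (1), which is elided in the text.

    It only mimics the qualitative limits: -> 0 as beta -> 0, -> 1 as beta -> inf.
    """
    mean_size = sum(sizes) / len(sizes)
    return beta / (beta + mean_size)


def grow_network(n0, steps, m, p_in, beta, seed=None):
    """Sketch of the community-growth model; returns (edge set, node -> community label)."""
    rng = random.Random(seed)
    # Fully connected seed graph; every seed node is its own community (labels 1..n0).
    community = {v: v + 1 for v in range(n0)}
    members = defaultdict(list)
    for v in range(n0):
        members[v + 1].append(v)
    edges = {(u, v) for u in range(n0) for v in range(u + 1, n0)}

    for _ in range(steps):
        new = len(community)
        labels = sorted(members)
        sizes = [len(members[c]) for c in labels]
        if rng.random() < p_new_community(beta, sizes):
            # A new community is born; its first node links only to existing (outside) nodes.
            c = labels[-1] + 1
            targets = rng.sample(list(community), min(m, len(community)))
        else:
            # Join an old community; ASSUMED weight = label, so later-born ones are favored.
            c = rng.choices(labels, weights=labels)[0]
            inside = list(members[c])
            outside = [v for v in community if community[v] != c]
            # ASSUMED per-link rule: inside with probability p_in, outside otherwise.
            k_in = sum(rng.random() < p_in for _ in range(m))
            k_in = min(k_in, len(inside))  # too small: link to all members, rest go outside
            k_out = min(m - k_in, len(outside))
            targets = rng.sample(inside, k_in) + rng.sample(outside, k_out)
        community[new] = c
        members[c].append(new)
        edges.update((min(new, t), max(new, t)) for t in targets)
    return edges, community
```

Setting `beta=0` switches community birth off, so the network keeps only its initial communities; larger `beta` values produce more communities, the qualitative trend reported in Table I.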

FIG. 1.

The relationship between the number of communities and the free control parameter . The ordinate is the number of communities, and the abscissa is the free control parameter. The solid line is the result curve of our model, and the dashed line is the result of our fitting method. (a) We set , , , , and for the model; only , , and actually affect the model curve. (b) We set , , , , and for the model. (c) We set , , , , and for the model.

FITNESS OF A GROWING MODEL

In this report, we give three examples (Fig. 2) to illustrate our network growth model qualitatively. Here, we keep the number of initial nodes , the number of links per step (), and the probability of connecting to its own existing community () the same and only change the free control parameter . As increases, the number of communities increases, the size (number of nodes) of the biggest community decreases, and the degree of the largest hub node decreases (Table I). Furthermore, as shown in Table I, increasing sharply increases the number of communities and strongly inhibits the growth of community size, which in turn limits the degree of the hub node in each community. The lowest degree, however, stays the same (D_min = 3 in all three networks). This indicates that new nodes are more likely to get richer (gain more links) and that, under our control, more degree resources can be allocated to nodes that were not previously rich.
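The four quantities reported in Table I can be read off an edge list and a node-to-community mapping. The following minimal Python sketch assumes exactly that input format; `summary_stats` and the toy graph are our own illustration, not the paper's code.

```python
from collections import Counter


def summary_stats(edges, community):
    """Return the Table I quantities N_c, S_max, D_max, D_min.

    edges: iterable of undirected (u, v) pairs, each link listed once.
    community: dict mapping every node to its community label.
    """
    sizes = Counter(community.values())      # number of nodes per community
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    degrees = [degree[v] for v in community]  # nodes without links count as degree 0
    return {
        "N_c": len(sizes),
        "S_max": max(sizes.values()),
        "D_max": max(degrees),
        "D_min": min(degrees),
    }


# Toy example: two triangles joined by one bridge link.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
toy_comm = {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2}
print(summary_stats(toy_edges, toy_comm))
```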

FIG. 2.

TABLE I.

The number of communities N_c, the size of the biggest community S_max, the highest degree D_max, and the lowest degree D_min for each model network in Fig. 2.

	N_c	S_max	D_max	D_min
β = min(n_i)	5	20	17	3
β = mean(n_i)	8	17	14	3
β = max(n_i)	17	14	13	3

Network structures and the corresponding degree distributions for three regular networks of our model. In the network structure figures, each community has a unique node shape, dashed links connect different communities, and solid links lie within a community. We set , , , and for all networks and for (a), for (b), and for (c). Here, a new node connects to a randomly chosen old node within the same community.

Next, we set and and then directly construct a tree-like network. As shown in Fig. 3, the network produced by our model closely resembles a naturally evolving tree. The initial nodes are the roots of the tree; as time evolves, each branch is a community, each branch of a branch is a new community, and so on. The number of communities increases with ; hence, we set to avoid a large number of communities.
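The parameter values for the tree-like case are elided in this copy. Assuming that case corresponds to each new node bringing a single link, a short argument explains the tree shape: a triangle requires its youngest node to link to both older nodes, which one link cannot do, so the only triangles left are the C(n0, 3) triangles of the fully connected seed. The sketch below checks this with uniform random attachment, a simplification that ignores the community rules; `one_link_growth` and `count_triangles` are hypothetical helper names.

```python
import random
from itertools import combinations


def count_triangles(edges):
    """Count closed triangles in a simple undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Each triangle {a < b < c} is counted once: on edge (a, b), via common neighbour c.
    return sum(1 for u, v in edges for w in adj[u] & adj[v] if w > max(u, v))


def one_link_growth(n0, steps, seed=None):
    """Fully connected seed of n0 nodes, then each new node adds exactly one link.

    ASSUMPTION: the target is a uniformly random older node (no community rules).
    """
    rng = random.Random(seed)
    edges = list(combinations(range(n0), 2))
    for new in range(n0, n0 + steps):
        edges.append((rng.randrange(new), new))
    return edges


print(count_triangles(one_link_growth(4, 30, seed=0)))  # only the seed's triangles remain
```

With `n0 = 2` the seed is a single link, and the result is an exact tree (one fewer link than nodes, no triangles).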

FIG. 3.

The network structure and the corresponding degree distribution for a tree-like network of our model. In the network structure figure, each community has a unique node shape, dashed links connect different communities, and solid links lie within a community. We set , , , , and for this network. Here, a new node connects to a randomly chosen old node within the same community.

Regular networks

To further assess the fitness of our growing network model, we measure the community structure of our model with the standard metric, the modularity Q. Here, we set , , and as in Fig. 2. As shown in Fig. 4(a), the Q value () of all networks increases with . That is, as increases, our model produces a clearer community structure. In detail, compared with and , has the worst performance under the Q metric. Moreover, has the best performance under the Q metric, and the network has a clear community structure ( is around ) when . Note that also yields a clear community structure when .
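Tables II and III report community structure as the modularity Q. For a partition of an undirected graph with m links, Q = Σ_c [L_c/m − (d_c/2m)²], where L_c is the number of links inside community c and d_c is the total degree of its nodes. A dependency-free sketch of this standard formula follows (the function name is ours); networkx offers an equivalent `networkx.algorithms.community.modularity` if that library is available.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q = sum_c [L_c/m - (d_c / 2m)^2], undirected and unweighted."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: links with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: summed degree of the nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Two triangles joined by a bridge: Q = 2 * (3/7 - 1/4) = 5/14.
bridge_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
print(modularity(bridge_edges, {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}))
```

Putting every node in one community always gives Q = 0, which is why a higher Q is read as a clearer community structure.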

FIG. 4.

We set as , , and , respectively. (a) The Q value of different regular networks as increases. The ordinate is the Q score of communities, and the abscissa is the value. (b) The Pr value of different regular networks as increases. The ordinate is the Pr score of networks, and the abscissa is the value.

Furthermore, many previous studies demonstrate that communities can improve the predictability of links. Next, we therefore measure link predictability with the metric Pr (), where is the prediction accuracy of the algorithm on internal links (links within communities) and is its accuracy on all links. We adopt the standard metric to measure link prediction accuracy. In practice, it is not easy to improve the accuracy of a basic link prediction algorithm by more than ; hence, a community has a very predictable structure if . In this report, we adopt three well-known basic link prediction algorithms, , , and , for regular networks. As shown in Fig. 4(b), has the worst performance under the Pr metric. has better performance under the Pr metric, and communities are most predictable () when lies around the interval . In summary, we suggest setting and to construct a regular network with a clear community structure and more predictable community links. However, we can also set and to further regulate the degree distribution. In addition, each real-world network has its own particular community structure, and not all networks have a high Q or Pr value; each network has its own statistical characteristics. We can, however, observe these statistics in a real network and then tune to construct a network with similar Q and Pr values and a similar number of communities.
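The predictability measure is only partly recoverable here: the accuracy metric, the three basic predictors, and the exact Pr formula are elided. The sketch below therefore makes three labeled assumptions: accuracy is measured by AUC, the predictor is the common-neighbours index, and Pr is taken as the AUC on held-out links inside communities minus the AUC on all held-out links. It illustrates the idea of comparing internal-link predictability with overall predictability, not the paper's exact definition; all names are ours.

```python
from itertools import combinations


def cn_score(adj, u, v):
    """Common-neighbours index (ASSUMED predictor; the text elides which ones it uses)."""
    return len(adj[u] & adj[v])


def auc(adj, positives, negatives):
    """Exact AUC: share of (positive, negative) pairs where the positive scores higher; ties count 1/2."""
    wins = 0.0
    for u, v in positives:
        s = cn_score(adj, u, v)
        for a, b in negatives:
            t = cn_score(adj, a, b)
            wins += 1.0 if s > t else (0.5 if s == t else 0.0)
    return wins / (len(positives) * len(negatives))


def predictability_gap(train_edges, probe_edges, community):
    """HYPOTHETICAL Pr = AUC(within-community probe links) - AUC(all probe links)."""
    nodes = sorted(community)
    adj = {v: set() for v in nodes}
    for u, v in train_edges:
        adj[u].add(v)
        adj[v].add(u)
    observed = {frozenset(e) for e in train_edges} | {frozenset(e) for e in probe_edges}
    # Non-links: node pairs that are neither training links nor held-out (probe) links.
    negatives = [p for p in combinations(nodes, 2) if frozenset(p) not in observed]
    internal = [(u, v) for u, v in probe_edges if community[u] == community[v]]
    if not internal:
        raise ValueError("no held-out links inside communities")
    return auc(adj, internal, negatives) - auc(adj, probe_edges, negatives)
```

For two 4-node cliques joined by one bridge, holding out one internal link and the bridge gives an internal AUC of 1.0 and an overall AUC of 0.75, so the hypothetical gap is 0.25: the internal link is easier to predict than the inter-community link.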

Tree-like network

Growing model

Many real-world systems are naturally represented as tree-like networks with almost no closed triangles. Here, we set , , , and as in Fig. 3. We adopt the (the algorithm for tree-like networks) as the basic link prediction algorithm. As depicted in Fig. 5, compared with and , both Q and Pr reach their best, high-quality values ( and ) when we set . Hence, for tree-like networks, we suggest setting to achieve a clear community structure and more predictable community links.

FIG. 5.

The Q value and the Pr value of three tree-like networks. We set as , , and , respectively. The ordinate is the Q or Pr score of communities, and the abscissa is the value.

The haplotype evolutionary network of SARS-Cov-2

Tree-like networks are ubiquitous in nature and human society. In particular, the virus haplotype evolutionary network is a natural tree-like network with new communities. In this report, we adopt the SARS-Cov-2 haplotype evolutionary data as a real-world network: a node stands for a haplotype or a median vector, and a link stands for an evolutionary relationship between the two connected nodes. The SARS-Cov-2 haplotype evolutionary network with groups (communities) has nodes ( haplotypes and one median vector ) and links before February . In the SARS-Cov-2 network, we observe that the root node of each community is the hub with the highest degree; hence, in our model a new node connects to the oldest node within its community with a given probability. As depicted in Fig. 6, when we set to avoid a large number of communities, . Our model constructs a network with the same number of communities, the same lowest degree, and the same maximum tree depth within a community as the real evolutionary network. Furthermore, as shown in Table II, our model network has Q and Pr values and a highest degree similar to those of the real network. These results show that our model fits well with a real-world evolving network that has new communities. In addition, the Barabasi–Albert (BA) network cannot grow communities [Fig. 6(c)], and its highest degree () is far lower than that of the real network (D_max = 27). Hence, we suggest that the BA model cannot fit a real-world evolving network that has new communities.
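The SARS-Cov-2 variant of the model changes only where a new node's internal link goes. The sketch below assumes a single internal link per new node and a hypothetical probability `q` of linking to the community's oldest (root) node, with the link otherwise going to a uniformly random other old member; the value used in the paper is elided. With `q` close to 1, the root becomes the community hub, in line with the observation that each root node has the highest degree.

```python
import random


def pick_internal_target(members, q, rng):
    """Choose the in-community neighbour of a new node.

    members: existing community nodes in birth order (members[0] is the oldest).
    q: HYPOTHETICAL probability of linking to the oldest node.
    """
    if len(members) == 1 or rng.random() < q:
        return members[0]
    return rng.choice(members[1:])  # otherwise a random other old member


def grow_one_community(size, q, seed=None):
    """Grow one community in which each new node adds one internal link; return degrees."""
    rng = random.Random(seed)
    members, degree = [0], {0: 0}
    for new in range(1, size):
        target = pick_internal_target(members, q, rng)
        degree[target] += 1
        degree[new] = 1
        members.append(new)
    return degree
```

With `q=1.0` the community becomes a star around its root; with `q=0.0` the root keeps only the link from the second node, and the hub role passes to later members.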

FIG. 6.

TABLE II.

The number of communities N_c, the maximum depth of the tree within a community L_max, the size of the biggest community S_max, the highest degree D_max, the lowest degree D_min, the modularity value Q, and the predictability value Pr for networks in Fig. 6.

	N_c	L_max	S_max	D_max	D_min	Q	Pr
SARS-Cov-2	5	2	30	27	1	0.5471	0.0082
Growing model	5	2	34	31	1	0.5208	0.0071

(a) The network structure of SARS-Cov-2 haplotypes. (b) The network structure of our model; we set , , , , and for it. Here, a new node connects to the oldest node within the same community with a given probability and randomly connects to other old nodes within the same community. (c) The network structure of the BA model; we set , , and .

Karate club network

Unlike the SARS-Cov-2 haplotype evolutionary network, many well-known traditional networks have no natural communities; instead, researchers have proposed many detection algorithms to uncover the communities they define. In this report, we adopt the karate club network as the real regular network: a node represents a club member, and a link indicates a relationship between a pair of nodes. The karate club network is a commonly used dataset and has been widely applied in community detection. We adopt the foundational community detection algorithm, the Girvan–Newman (GN) algorithm, to detect the communities of the karate club network. As depicted in Fig. 7, when we set to avoid a large number of communities, , , and . Our model constructs a network with the same number of communities and the same size of the biggest community as the real karate club network. Surprisingly, our model and the real network each have a community with only one node. Furthermore, as shown in Table III, our model network has Q and Pr values and a lowest degree similar to those of the real network. These results show that our model can also fit this traditional network, which has been widely used for community detection.
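The Girvan–Newman algorithm detects communities by repeatedly removing the link with the highest edge betweenness and recomputing betweenness after each removal. A compact pure-Python sketch follows, with edge betweenness computed by Brandes-style breadth-first search for unweighted graphs. The stopping rule (stop once the graph has at least `k` connected components) is our assumption, because the text does not say how the five karate-club communities were selected from the GN dendrogram; networkx provides `networkx.algorithms.community.girvan_newman` as a ready-made alternative.

```python
from collections import deque


def edge_betweenness(adj):
    """Unweighted edge betweenness (Brandes); each undirected edge is keyed by a frozenset."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        stack, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:  # BFS from s, counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate path dependencies from the farthest nodes back to s
            w = stack.pop()
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += c
                delta[v] += c
    return eb


def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman(edges, k):
    """Remove highest-betweenness links until the graph has at least k components."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    comps = components(adj)
    while len(comps) < k and any(adj.values()):
        eb = edge_betweenness(adj)
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
    return comps
```

On two triangles joined by a bridge, the bridge carries all 18 cross-group shortest paths (counted from both ends), so it is removed first and the two triangles come out as the two communities.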

FIG. 7.

(a) The network structure of the karate club. (b) The network structure of our model; we set , , , , and for it. Here, a new node connects to a randomly chosen old node within the same community.

TABLE III.

The number of communities N_c, the size of the biggest community S_max, the size of the smallest community S_min, the highest degree D_max, the lowest degree D_min, the modularity value Q, and the predictability value Pr for networks in Fig. 7.

	N_c	S_max	S_min	D_max	D_min	Q	Pr
Karate network	5	12	1	17	1	0.4013	0.0581
Growing model	5	12	1	11	2	0.4746	0.0554

CONCLUSION AND DISCUSSION

Growing network models and static network models with communities have been proposed many times, but a growth model that incorporates the birth of communities and can produce tree-like networks is very rare. Our model simultaneously exhibits a clear community structure, more predictable links within communities, and a tunable degree distribution. In particular, when constructing tree-like networks, our model grows the branches and leaves of the tree, and these new branches or leaves correspond almost perfectly to the new communities. More importantly, our model can construct a network with the same number of communities as the SARS-Cov-2 haplotype evolutionary network and a similar structure. In addition, our model can also fit a network commonly used in community detection. “Rich get richer” is conventional wisdom in network science; however, new communities often expand rapidly in the real world. Our work exploits this new-community phenomenon to contribute a novel network model. Another possible direction is to use the growth of new communities to improve the link prediction accuracy of tree-like networks.

REFERENCES

1. Emergence of scaling in random networks

Authors:
Journal: Science Date: 1999-10-15 Impact factor: 47.728

2. Compartments revealed in food-web structure.

Authors: Ann E Krause; Kenneth A Frank; Doran M Mason; Robert E Ulanowicz; William W Taylor
Journal: Nature Date: 2003-11-20 Impact factor: 49.962

3. Community structure in social and biological networks.

Authors: M Girvan; M E J Newman
Journal: Proc Natl Acad Sci U S A Date: 2002-06-11 Impact factor: 11.205

4. Finding community structure in very large networks.

Authors: Aaron Clauset; M E J Newman; Cristopher Moore
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2004-12-06

5. Modularity and community structure in networks.

Authors: M E J Newman
Journal: Proc Natl Acad Sci U S A Date: 2006-05-24 Impact factor: 11.205

6. Adaptive coevolutionary networks: a review.

Authors: Thilo Gross; Bernd Blasius
Journal: J R Soc Interface Date: 2008-03-06 Impact factor: 4.118

7. Popularity versus similarity in growing networks.

Authors: Fragkiskos Papadopoulos; Maksim Kitsak; M Ángeles Serrano; Marián Boguñá; Dmitri Krioukov
Journal: Nature Date: 2012-09-12 Impact factor: 49.962

8. Finding missing edges in networks based on their community structure.

Authors: Bowen Yan; Steve Gregory
Journal: Phys Rev E Stat Nonlin Soft Matter Phys Date: 2012-05-15

9. Simulation of an SEIR infectious disease model on the dynamic contact network of conference attendees.

Authors: Juliette Stehlé; Nicolas Voirin; Alain Barrat; Ciro Cattuto; Vittoria Colizza; Lorenzo Isella; Corinne Régis; Jean-François Pinton; Nagham Khanafer; Wouter Van den Broeck; Philippe Vanhems
Journal: BMC Med Date: 2011-07-19 Impact factor: 8.775

10. Decoding the evolution and transmissions of the novel pneumonia coronavirus (SARS-CoV-2 / HCoV-19) using whole genomic data.

Authors: Wen-Bin Yu; Guang-Da Tang; Li Zhang; Richard T Corlett
Journal: Zool Res Date: 2020-05-18
