
Structural Entropy of the Stochastic Block Models.

Jie Han¹, Tao Guo¹, Qiaoqiao Zhou², Wei Han¹, Bo Bai¹, Gong Zhang¹.

Abstract

With the rapid expansion of graphs and networks and the growing magnitude of data from all areas of science, effective treatment and compression schemes for context-dependent data are extremely desirable. A particularly interesting direction is to compress the data while keeping only the "structural information" and ignoring the concrete labelings. In this direction, Choi and Szpankowski introduced structures (unlabeled graphs), which allowed them to compute the structural entropy of the Erdős–Rényi random graph model. Moreover, they also provided an asymptotically optimal compression algorithm that (asymptotically) achieves this entropy limit and runs in expected linear time. In this paper, we consider the stochastic block models with an arbitrary number of parts. We define a partitioned structural entropy for stochastic block models, which generalizes the structural entropy for unlabeled graphs and encodes the partition information as well. We then compute the partitioned structural entropy of the stochastic block models, and provide a compression scheme that asymptotically achieves this entropy limit.


Keywords: network compression; optimal compression algorithm; stochastic block model (SBM); structural entropy

Year: 2022 PMID: 35052107 PMCID: PMC8775199 DOI: 10.3390/e24010081

Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524

1. Introduction

Shannon’s metric of “entropy” of information is a foundational concept of information theory [1,2]. Given a discrete random variable X with support set (that is, the set of possible outcomes) $x_1, \dots, x_n$, which occur with probabilities $p(x_1), \dots, p(x_n)$, the entropy of X is defined as

$$H(X) = -\sum_{i=1}^{n} p(x_i) \log p(x_i),$$

where the logarithm here and throughout this paper is of base 2. Note that the entropy of X is a function of the probability distribution of X. The entropy was originally introduced by Shannon in [3] as part of his theory of communication, where a data communication system consists of a data source X, a channel and a receiver. The fundamental problem of communication is for the receiver to reliably recover what data was generated by the source, based on the bits it receives through the channel. Shannon proved that the entropy of the source X plays a central role: his source coding theorem shows that the entropy is the mathematical limit on how well the data can be losslessly compressed.

The question then arises: how can one compress data that has structure, e.g., data in social networks? In his lesser-known 1953 paper [4], Shannon argued for an extension of information theory, where data is considered as observations of a source, to “non-conventional data” (that is, lattices). Indeed, nowadays data appears in various formats and structures (e.g., sequences, expressions, interactions) and in drastically increasing amounts. In many scenarios, data is highly context-dependent and, in particular, the structural information and the context information seem to be two conceptually different aspects. Therefore it is desirable to develop novel theory and efficient algorithms for extracting useful information from non-conventional data structures. Roughly speaking, such data consists of structural information, which might be understood as the “shape” of the data, and context information, which should be recognized as data labels.

It is well known that complex networks (e.g., social networks) admit community structures [5]. That is, users within a group interact with each other more frequently than with those outside the group. The stochastic block model (SBM) [6] is a celebrated random graph model that has been widely used to study the community structures in graphs and networks. It provides a good benchmark to evaluate the performance of community detection algorithms and inspires the design of many algorithms for community detection tasks. The theoretical underpinnings of the SBM have been extensively studied and sharp thresholds for exact recovery have been successively established [7,8,9,10,11,12]. We refer readers to [13] for a recent survey, where other interesting and important problems in SBM are also discussed. In addition to the SBM discussed in [13], there are other angles from which to study compression of data with graph structures. Asadi et al. [14] investigated data compression on graphs with clusters. Zenil et al. [15] surveyed information-theoretic methods, in particular Shannon entropy and algorithmic complexity, for characterizing graphs and networks.
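The entropy definition at the start of this section can be made concrete with a few lines of Python. This is only an illustrative sketch: the function name `shannon_entropy` and the example distributions are assumptions introduced here, not part of the paper.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits (base-2 logarithm) of a discrete distribution.

    `probs` lists the probabilities of the outcomes; zero-probability
    outcomes contribute nothing to the sum.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Illustrative distributions (hypothetical examples).
fair_coin = shannon_entropy([0.5, 0.5])
biased_coin = shannon_entropy([0.9, 0.1])
uniform_8 = shannon_entropy([1 / 8] * 8)
print(fair_coin, biased_coin, uniform_8)
```

A fair coin carries one bit per outcome and a uniform choice among eight outcomes carries three, while the biased coin carries less than one bit, which is the room that lossless compression exploits.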

1.1. Compression of Graphs

In recent years, graphical data and the network structures supporting them have become increasingly common and important in many branches of engineering and the sciences. To better represent and transmit graphical data, many works consider the problem of compressing a (random) graph up to isomorphism, i.e., compressing the structure of a graph. A graph G consists of a finite set V of vertices and a set E of edges, each of which connects two vertices. A graph on n vertices can be represented by a binary matrix (the adjacency matrix), which can in turn be viewed as a binary sequence. Thus, encoding a labeled graph (that is, one in which all vertices need to be distinguished) is equivalent to encoding a $\binom{n}{2}$-digit binary sequence, given a certain probability distribution on all possible edges. However, such a string does not reflect the internal symmetries that are conveyed by the graph automorphisms, and sometimes we are only interested in the local or global structures in the graph, rather than the exact vertex labelings. The structural entropy is defined when the graphs are considered unlabeled, or simply called structures, where the vertices are viewed as indistinguishable. The goal of this natural definition is to capture the information of the structure, and thus it provides a fundamental measure in graph/structure compression schemes.

The problem has a strong theoretical background. Back in 1984, Turán [16] raised the question of finding an efficient coding method for general unlabeled graphs on n vertices, for which a lower bound of $\binom{n}{2} - n \log n + O(n)$ bits is suggested. This lower bound can be seen from the number of unlabeled graphs [17]. The question was later answered by Naor [18] in 1990, who proposed such a representation that is optimal up to the first two leading terms when all unlabeled graphs are equally likely. In a recent paper, Kieffer et al. [19] studied the structural complexity of binary trees. There have also been some heuristic methods for real-world graph compression schemes, see [20,21,22,23,24].
Rather recently, Choi and Szpankowski [25] studied the structural entropy of the Erdős–Rényi random graph $G(n,p)$. They computed the structural entropy given that p is not (very) close to 0 or 1 and also gave a compression scheme that matches their computation. Later, the structural entropy of other randomly generated graphs, e.g., preferential attachment graphs and web graphs, was also studied [26,27,28,29].

However, it is well known that the Erdős–Rényi model is too simplistic to model real networks, in particular due to its strong homogeneity and absence of community structure. In this paper, we consider the compression of graphical structures of the SBM, which in general models real networks better and circumvents the issues of the ER model. In summary, our contributions are as follows. We introduce the partitioned structural entropy, which generalizes the structural entropy for unlabeled graphs, and we show that it reflects the partition information of the SBM. We provide an explicit formula for the partitioned structural entropy of the SBM. We also propose a compression scheme that asymptotically achieves this entropy limit.

Semantic communications are considered a key component of future generation networks, where a natural problem is how to efficiently extract and transmit the “semantic information”. In the case of graph data, one may view the (partitioned) structures as the information that needs to be abstracted, while the concrete labeling information is considered redundant. From this point of view, our result is a step towards the study of semantic compression/communication under appropriate contexts.

1.2. Related Works

Finally, we would like to point out that there are some other information metrics defined on graphs. The term “graph entropy” has been defined and used in several different senses in the literature. For example, the graph entropy introduced by Körner in [30] denotes the number of bits one has to convey to resolve the ambiguity of a vertex in a graph. This notion also turns out to be useful in other areas, including combinatorics. Chromatic entropy, introduced in [31], is the lowest entropy of any coloring of a graph. It finds application in zero-error source coding. We remark that the structural entropy we consider is quite different from the Körner graph entropy and the chromatic entropy. On the other hand, a concept of graph entropy (also called the topological information content of a graph) was introduced by Rashevsky [32] and Trucco [33], and later by Mowshowitz [34,35,36,37,38,39]; it is defined as a function of (the structure of) a graph and an equivalence relation defined on its vertices or edges. Such a concept is a measure of the graph itself and does not involve any probability distribution.

2. Preliminaries

2.1. Structural Entropy of Unlabeled Graphs

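Now let us formally define the structural entropy given a probability distribution on unlabeled graphs. In this subsection, we use notation borrowed from [25]. Given an integer n, define $\mathcal{G}_n$ as the collection of all n-vertex labeled graphs.

Definition (Entropy of Random Graph). Given an integer n and a random graph G distributed over $\mathcal{G}_n$, the entropy of G is defined as

$$H_G = -\sum_{G \in \mathcal{G}_n} P(G) \log P(G),$$

where $P(G)$ is the probability of G in $\mathcal{G}_n$.

Then the random structure model $\mathcal{S}_n$ associated with the probability distribution $\mathcal{G}_n$ is defined as the unlabeled version of $\mathcal{G}_n$. For a given $S \in \mathcal{S}_n$, the probability of S can be computed as:

$$P(S) = \sum_{G \in \mathcal{G}_n,\ G \cong S} P(G).$$

Here $G \cong S$ means that G and S have the same structure, that is, S is isomorphic to G. Clearly, if all isomorphic labeled graphs have the same probability, then for any labeled graph $G \cong S$, one has:

$$P(S) = N(S) \cdot P(G),$$

where $N(S)$ stands for the number of different labeled graphs that have the same structure as S.

Definition (Structural Entropy). The structural entropy $H_S$ of a random graph $\mathcal{G}_n$ is defined as the entropy of the random structure $\mathcal{S}_n$ associated with $\mathcal{G}_n$, that is,

$$H_S = -\sum_{S \in \mathcal{S}_n} P(S) \log P(S),$$

where the sum is over all distinct structures.

The Erdős–Rényi random graph $G(n,p)$, also called the binomial random graph, is a fundamental random graph model, which has n vertices and in which each pair of vertices is connected with probability p, independently of other pairs. In 2012, Choi and Szpankowski [25] proved the following for the Erdős–Rényi random graphs.

Theorem 1 (Choi and Szpankowski, [25]). For large n and all p satisfying $p \gg \frac{\ln n}{n}$ and $1 - p \gg \frac{\ln n}{n}$, the following holds:

(i) The structural entropy of $G(n,p)$ is:

$$H_S = \binom{n}{2} h(p) - \log n! + O\left(\frac{\log n}{n^{\alpha}}\right)$$

for some $\alpha > 0$.

(ii) For a structure S of n vertices and $\varepsilon > 0$,

$$P\left( \left| -\frac{1}{\binom{n}{2}} \log P(S) - h(p) + \frac{\log n!}{\binom{n}{2}} \right| < \varepsilon \right) > 1 - 2\varepsilon,$$

where $h(p) = -p \log p - (1-p) \log (1-p)$ is the entropy rate of a binary memoryless source.

Furthermore, the authors of [25] also presented a compression algorithm for unlabeled graphs that asymptotically achieves the structural entropy up to an error term.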
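For very small n, the definitions in this subsection can be evaluated by exhaustive enumeration, which may help make the gap between the entropy of labeled graphs and the structural entropy concrete. The following is a minimal sketch, assuming the Erdős–Rényi model with edge probability p; the function name and the brute-force canonical form (minimum over all relabelings) are choices made here for illustration and are unrelated to the methods of [25].

```python
import itertools
import math


def structural_entropy_bruteforce(n, p):
    """Return (H_G, H_S, number of structures) for the n-vertex
    Erdos-Renyi graph with edge probability 0 < p < 1.

    Only feasible for tiny n: all 2^C(n,2) labeled graphs and all n!
    relabelings are enumerated.
    """
    pairs = list(itertools.combinations(range(n), 2))
    perms = list(itertools.permutations(range(n)))
    structure_prob = {}
    graph_entropy = 0.0
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        edges = {pair for pair, b in zip(pairs, bits) if b}
        m = len(edges)
        prob = p ** m * (1 - p) ** (len(pairs) - m)
        graph_entropy -= prob * math.log2(prob)
        # Canonical form of the structure: the lexicographically smallest
        # edge-indicator tuple over all vertex relabelings.
        canon = min(
            tuple(
                1 if (min(s[u], s[v]), max(s[u], s[v])) in edges else 0
                for u, v in pairs
            )
            for s in perms
        )
        structure_prob[canon] = structure_prob.get(canon, 0.0) + prob
    h_s = -sum(q * math.log2(q) for q in structure_prob.values())
    return graph_entropy, h_s, len(structure_prob)


h_g, h_s, count = structural_entropy_bruteforce(4, 0.5)
print(count, h_g, h_s, h_g - h_s, math.log2(math.factorial(4)))
```

For n = 4 there are 11 distinct structures, and the saving $H_G - H_S$ is strictly smaller than $\log 4!$ because most 4-vertex graphs are symmetric; the asymptotic statements concern large n, where almost all graphs are asymmetric.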

2.2. Stochastic Block Model–Our Result

As the ER model is not appropriate for modeling real networks, the stochastic block model was introduced on the assumption that vertices in a network connect independently but with probabilities based on their profiles, or equivalently, on their community assignment. For example, in the SBM with two communities and symmetric parameters, also known as the planted bisection model and denoted here by $G(n,p,q)$, the vertex set is partitioned into two sets $V_1$ and $V_2$; any pair of vertices inside $V_1$ or inside $V_2$ is connected with probability p, any pair of vertices across the two clusters is connected with probability q, and all these connections are independent.

As an illuminating example, consider a context G where there are users and devices, each pair of users and each pair of devices is connected with probability p, a user and a device are connected with probability q, and each of these connections is independent of all other connections. Suppose that we need to compress the information of G. In this context it is not appropriate to view G as an unlabeled graph: in addition to the structure information, it is also important to keep the “community” information, so the compression also needs to encode who is a user and who is a device.

Definition (Partition-respecting isomorphism, Partitioned Unlabeled Graphs). Let […]

Note that every labeled graph G corresponds to a unique structure S, and we use $G \cong S$ to denote this relation. Furthermore, under the above definition, general unlabeled graphs correspond to the case $r = 1$.

Definition (Partitioned Structural Entropy). Let V be a set of n vertices where […]

In this paper, we extend Theorem 1 to the structural entropy of the stochastic block model with any given number of blocks, and provide a compression algorithm that asymptotically matches this structural entropy. For ease of comprehension, we first give the result for the balanced bipartition case $r = 2$.

Theorem 2. Let n be a positive even integer and let $V = V_1 \cup V_2$ be a set of n vertices with $|V_1| = |V_2| = n/2$. Suppose $G(n,p,q)$ is a probability distribution of graphs on V where every edge inside $V_1$ or $V_2$ is present with probability p and every edge between $V_1$ and $V_2$ is present with probability q, and these edges are mutually independent. For large even n and all p satisfying $p \gg \frac{\ln n}{n}$ and $1 - p \gg \frac{\ln n}{n}$, the following holds:

(i) The partitioned structural entropy of $G(n,p,q)$ is:

$$H_S = 2\binom{n/2}{2} h(p) + \frac{n^2}{4} h(q) - 2 \log \left(\frac{n}{2}\right)! + O\left(\frac{\log n}{n^{\alpha}}\right)$$

for some $\alpha > 0$.

(ii) For a balanced bipartitioned structure S and $\varepsilon > 0$, […] where $h(p) = -p \log p - (1-p) \log (1-p)$ is the entropy rate of a binary memoryless source.

Note that the structural entropy here is larger than that in Theorem 1 (even if $p = q$), which reflects the fact that the SBM with “a planted (bi-)partition” contains prefixed structures, and so has fewer symmetries than $G(n,p)$, the pure random model. (For $G(n,p)$, when it is asymmetric, compared with the completely labeled graphs, Theorem 1 saves a term of $\log n!$; this saving becomes $2 \log (n/2)!$ for the planted balanced bipartition case in Theorem 2.)
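The planted bisection model and the closing remark on label savings can be illustrated numerically. The sketch below is an assumption-laden illustration rather than anything taken from the paper: the function names are hypothetical, vertices 0..n/2-1 are taken to form the first part, and the two savings compared are $\log n!$ (unlabeled graphs) and $2 \log (n/2)!$ (balanced bipartitioned structures), whose difference is exactly $\log \binom{n}{n/2}$.

```python
import math
import random
from itertools import combinations


def sample_planted_bisection(n, p, q, seed=None):
    """Sample a labeled graph with two equal parts {0..n/2-1} and {n/2..n-1}.

    Pairs inside a part are joined with probability p, pairs across the
    parts with probability q, all independently.
    """
    if n % 2:
        raise ValueError("n must be even")
    rng = random.Random(seed)
    half = n // 2
    edges = set()
    for u, v in combinations(range(n), 2):
        same_part = (u < half) == (v < half)
        if rng.random() < (p if same_part else q):
            edges.add((u, v))
    return edges


def log2_factorial(k):
    """log2(k!) computed via lgamma to avoid huge integers."""
    return math.lgamma(k + 1) / math.log(2)


def label_savings(n):
    """Bits saved by dropping labels entirely vs. keeping a balanced bipartition."""
    return log2_factorial(n), 2 * log2_factorial(n // 2)


edges = sample_planted_bisection(40, 0.9, 0.1, seed=1)
full, split = label_savings(100)
print(len(edges), full, split, full - split)
```

Keeping the partition costs roughly n extra bits compared with forgetting all labels, which is small next to the quadratic leading terms of the entropy.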

3. Proof of Theorem 2

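One key ingredient in the proof of Theorem 1 in [25] is the following lemma on the symmetry of $G(n,p)$. A graph is called asymmetric if its automorphism group does not contain any permutation other than the identity; otherwise it is called symmetric.

Lemma 1 (Kim, Sudakov and Vu, 2002). For all p satisfying $p \gg \frac{\ln n}{n}$ and $1 - p \gg \frac{\ln n}{n}$, a random graph $G \in G(n,p)$ is symmetric with probability $O(n^{-w})$ for any positive constant w.

Proof of Theorem 2. Let $V = V_1 \cup V_2$ be the balanced bipartition of the vertex set. Note that every pair of vertices in $V_1$ or in $V_2$ should be considered indistinguishable, but not the pairs of vertices in $V_1 \times V_2$. Recall that we write $G \cong S$ for a graph G and a structure S if S represents the structure of G (with respect to the partition $\{V_1, V_2\}$). Let G be drawn from the planted bisection model $G(n,p,q)$ of Theorem 2. We first compute $H_G$. Note that there are $\binom{n}{2}$ possible edges in G, and we can view G as a binary sequence of length $\binom{n}{2}$, where each digit is a Bernoulli random variable. Moreover, for edges inside $V_1$ or $V_2$, the random variable, denoted by $X_p$, has expectation p, and for edges in $V_1 \times V_2$ the random variable, denoted by $X_q$, has expectation q. Thus, we have:

$$H_G = 2\binom{n/2}{2} h(p) + \frac{n^2}{4} h(q).$$

Now write $\mathcal{S}$ for the probability distribution on V over all partitioned unlabeled graphs inherited from $G(n,p,q)$, namely, given S, $P(S) = \sum_{G \cong S} P(G)$. Let $H_S$ be the partitioned structural entropy of $G(n,p,q)$. Therefore, compared with our goal, it remains to show that:

$$H_G - H_S = 2 \log \left(\frac{n}{2}\right)! + O\left(\frac{\log n}{n^{\alpha}}\right). \quad (2)$$

Note that in $G(n,p,q)$, all labeled graphs G such that $G \cong S$ have the same probability $P(G)$. Thus, given a (labeled) graph G, we have $P(G) = P(S)/N(S)$, where S is such that $G \cong S$ and $N(S)$ is the number of labeled graphs with structure S. So the graph entropy of $G(n,p,q)$ can be written as:

$$H_G = -\sum_{G} P(G) \log P(G) = -\sum_{S} P(S) \log \frac{P(S)}{N(S)} = H_S + \sum_{S} P(S) \log N(S). \quad (3)$$

Define $S[W]$ to be S restricted on W for $W \subseteq V$. Now we split S into $S_1$ and $S_2$, i.e., $S_1 = S[V_1]$ and $S_2 = S[V_2]$. Write $\mathrm{Aut}(S_i)$ for the automorphism group of $S_i$, and we naturally have:

$$\frac{\left((n/2)!\right)^2}{|\mathrm{Aut}(S_1)| \, |\mathrm{Aut}(S_2)|} \le N(S) \le \left(\left(\frac{n}{2}\right)!\right)^2.$$

Combining this with (2) and (3), it remains to show that:

$$\sum_{S} P(S) \log \left( |\mathrm{Aut}(S_1)| \, |\mathrm{Aut}(S_2)| \right) = O\left(\frac{\log n}{n^{\alpha}}\right).$$

In the summation above we only need to focus on S such that either $S_1$ or $S_2$ is symmetric, as otherwise $\log \left( |\mathrm{Aut}(S_1)| \, |\mathrm{Aut}(S_2)| \right) = 0$. By Lemma 1, we conclude that the probability that S restricted on $V_1$ or $V_2$ is symmetric is $O(n^{-w})$ for some $w > 1$, and for such S we use the trivial bound $|\mathrm{Aut}(S_i)| \le (n/2)!$. This gives us the desired estimate in (i).

To show (ii), for a set V of n vertices and a balanced bipartition $V = V_1 \cup V_2$ of V, we define the typical set $T_\varepsilon^n$ as the set of structures S on n vertices satisfying:

(a) S is asymmetric on $V_1$ and $V_2$, respectively;

(b) […], for $G \cong S$.

Denote by $T_1$ and $T_2$ the sets of structures satisfying the properties (a) and (b), respectively, and thus we have $T_\varepsilon^n = T_1 \cap T_2$. Firstly, by the asymmetry of $G(n,p)$ (Lemma 1), we conclude that […] for large n. Secondly, we use a binary sequence of length $\binom{n}{2}$ to represent a (labeled) instance G of $G(n,p,q)$, where the first $\binom{n/2}{2}$ bits represent the induced subgraph on $V_1$, the next $\binom{n/2}{2}$ bits represent the induced subgraph on $V_2$, and finally the remaining $n^2/4$ bits represent the bipartite graph on $V_1 \times V_2$. Since all edges of G are generated independently, the induced subgraphs on $V_1$ and on $V_2$ each have in expectation $\binom{n/2}{2} p$ 1’s, and the AEP property of binary sequences implies that […] holds with probability at least […]. Similarly, the bipartite graph on $V_1 \times V_2$ has in expectation $n^2 q / 4$ 1’s, and the AEP property of binary sequences gives that, with probability at least […], […]. Since these edges are independent, we finally conclude that (b) holds with probability at least […]. Thus, […].

Now we can compute $P(S)$ for $S \in T_\varepsilon^n$. By (a), $N(S) = \left((n/2)!\right)^2$ for any $S \in T_\varepsilon^n$. Together with (b) and a straightforward computation, the assertion of (ii) follows. □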

4. SBM Compression Algorithm

Given the computation of the structural entropy, a natural next step is to design efficient compression schemes that are close to or even (asymptotically) achieve this entropy limit. Choi and Szpankowski [25] presented such an algorithm (which they named Szip) for (unlabeled) random graphs, which uses in expectation at most […] bits and asymptotically achieves the structural entropy given in Theorem 1. Roughly speaking, Szip greedily peels off vertices from the graph and (efficiently) stores the neighborhood information. This procedure can simply be reversed, but the labeling of the recovered graph may differ from that of the original graph, which is the reason why a saving in codeword length is achieved. Refinements and analysis needed to achieve the stated performance are also provided in [25].

Here we give an algorithm that optimally compresses SBMs, which uses the Szip algorithm as a building block and matches the structural entropy computation in Theorem 2. The algorithm consists of two stages. Writing $V_1$ and $V_2$ for the two parts and $S_1$ and $S_2$ for the structures induced on them, it first compresses $S_1$ and $S_2$ using Szip and then compresses the bipartite graph between $V_1$ and $V_2$ using an arithmetic compression algorithm with the help of the Szip decoding outputs.

To give a brief description of the compression algorithm, we again use the balanced bipartition as an example. The encoding and decoding procedure of the algorithm is illustrated in Figure 1. The algorithm encodes the observed structure S into a binary string as follows. It uses Szip as a subroutine to compress $S_1$ and $S_2$ into two binary sequences. Then, as part of the encoder, we run the Szip decoder on these two sequences to obtain decoded structures $S_1'$ and $S_2'$, respectively. We then compress the bipartite graph between $V_1$ and $V_2$ as a labeled bipartite graph under the vertex labeling of $S_1'$ and $S_2'$ into a third binary sequence. This “Labeled Encoder” can be implemented by treating the bipartite graph as a binary sequence of length $n^2/4$ and compressing it using a standard arithmetic encoder [40,41,42]. The concatenation of the Szip algorithms and the arithmetic encoder forms the cascade encoder of our algorithm and produces the final codeword. Upon receiving the codeword, we decode its parts in parallel using the Szip decoder and the arithmetic decoder. This completes our algorithm.
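The two-stage pipeline described above can be sketched on toy inputs. The code below is not Szip: as a loudly labeled stand-in, each part is encoded by a brute-force canonical form (feasible only for a handful of vertices), and the cross edges are written as raw bits instead of being passed through an arithmetic coder. All function names are hypothetical. What the sketch does preserve is the consistency idea: the encoder learns the labeling that the decoder will reconstruct for each part and writes the bipartite graph under that labeling.

```python
from itertools import combinations, permutations


def canonical_code(k, edges):
    """Stand-in for a structure coder on a k-vertex graph (NOT Szip).

    Returns (bits, relabel): bits is the lexicographically smallest
    adjacency bit tuple over all relabelings, and relabel[v] is the label
    that original vertex v receives in the decoded graph.
    """
    pairs = list(combinations(range(k), 2))
    best = None
    for relabel in permutations(range(k)):
        new_edges = {tuple(sorted((relabel[u], relabel[v]))) for u, v in edges}
        bits = tuple(int(pair in new_edges) for pair in pairs)
        if best is None or bits < best[0]:
            best = (bits, relabel)
    return best


def decode_structure(k, bits):
    """Rebuild the relabeled k-vertex graph from its bit tuple."""
    pairs = list(combinations(range(k), 2))
    return {pair for pair, b in zip(pairs, bits) if b}


def encode(k, edges1, edges2, cross):
    """edges1/edges2: edges inside each part, local labels 0..k-1.
    cross: set of (u, v) with u in part 1 and v in part 2 (local labels)."""
    bits1, relabel1 = canonical_code(k, edges1)
    bits2, relabel2 = canonical_code(k, edges2)
    # Write the bipartite graph under the labeling the decoder will see.
    new_cross = {(relabel1[u], relabel2[v]) for u, v in cross}
    bits3 = tuple(int((a, b) in new_cross) for a in range(k) for b in range(k))
    return bits1, bits2, bits3


def decode(k, code):
    bits1, bits2, bits3 = code
    s1 = decode_structure(k, bits1)
    s2 = decode_structure(k, bits2)
    cross = {(a, b) for a in range(k) for b in range(k) if bits3[a * k + b]}
    return s1, s2, cross


k = 3
e1, e2, cross = {(0, 1)}, {(0, 2), (1, 2)}, {(0, 2), (2, 1)}
code = encode(k, e1, e2, cross)
print(code, decode(k, code))
```

The decoded triple is generally not equal to the input, but it is isomorphic to it by a relabeling that maps each part to itself, which is exactly the information a partitioned structure retains.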

Figure 1. Illustration of the compression algorithm.

The main challenge in the design of our algorithm is how the decoder can keep the bipartite graph between the two parts $V_1$ and $V_2$ consistent with the decoded versions of the structures $S_1$ and $S_2$ induced on them. A key observation here is that since Szip is a deterministic algorithm, although it may permute the vertex labelings, its output is invariant given the same input. Given this, our solution is to first run Szip (both encoding and decoding) at the encoder, and obtain structures $S_1'$ and $S_2'$, respectively. We then compress the bipartite graph (as a labeled bipartite graph) under the vertex labeling of $S_1'$ and $S_2'$. This guarantees that the decoded structures and the decoded bipartite graph share the same vertex labeling as $S_1'$ and $S_2'$, namely, S is recovered.

Before discussing the performance of the algorithm, we first describe some useful properties of the arithmetic compression algorithm in the following lemma. We omit the proof of the lemma, which follows from the analysis in [40,41,42] and the AEP properties in [1,2].

Lemma 2. Let L be the codeword length of the arithmetic compression algorithm when compressing a binary sequence with length m and entropy rate h. For large m, the following holds:

(i) The expected codeword length asymptotically achieves the entropy of the message, i.e., […]

(ii) For any […]

(iii) The arithmetic algorithm runs in time […]

The following theorem characterizes the performance of our algorithm. It is immediate from Theorem 2 in [25] (performance of Szip) and Lemma 2; we omit the detailed proof here.

Theorem 3. Let […]

(i) The algorithm asymptotically achieves the structural entropy in […]

(ii) For any […]
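The first property of the arithmetic coder, that the code length tracks the entropy of the message, can be checked numerically without implementing a coder. The sketch below relies on a standard fact about arithmetic coding with a known i.i.d. model, namely that the code length exceeds $-\log_2 P(\text{sequence})$ by at most a couple of bits; the function names, the seed and the parameters m and q are assumptions chosen for illustration.

```python
import math
import random


def binary_entropy(q):
    """Entropy rate h(q) in bits of a Bernoulli(q) source, 0 < q < 1."""
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)


def ideal_code_length(bits, q):
    """-log2 of the probability of `bits` under an i.i.d. Bernoulli(q) model.

    This is the length an idealized arithmetic coder would spend,
    up to a small additive constant.
    """
    ones = sum(bits)
    zeros = len(bits) - ones
    return -(ones * math.log2(q) + zeros * math.log2(1 - q))


rng = random.Random(1)
m, q = 20000, 0.2
bits = [int(rng.random() < q) for _ in range(m)]
rate = ideal_code_length(bits, q) / m
print(rate, binary_entropy(q))
```

For the bipartite part of a balanced SBM the sequence length would be $n^2/4$ and the model parameter would be the cross-part probability q, so the cost per potential cross edge approaches h(q).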

5. General SBM with $r \ge 2$ Blocks

In the previous sections, we discussed the structural entropy of the SBM and the compression algorithm that asymptotically achieves this structural entropy for the balanced bipartition case ($r = 2$). The corresponding results in Theorems 2 and 3 can be easily generalized to the general r-partition case. We briefly describe the generalizations below.

5.1. Structural Entropy

Our approach can deal with general SBMs similarly. In a general SBM with $r \ge 2$ parts $V_1, \dots, V_r$, the transition matrix, an $r \times r$ symmetric matrix $P = (p_{ij})$, is used to describe the connection probabilities between and within the communities: two vertices $u \in V_i$ and $v \in V_j$ are connected by an edge with probability $p_{ij}$ (i and j are not necessarily distinct). We first give the result on the computation of the partitioned structural entropy of the SBM.

Theorem 4. Fix r reals […]

(i) The r-partitioned structural entropy […] for some […]

(ii) For […]
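The leading terms of the entropy for a general block structure can be tabulated directly from the transition matrix. In the sketch below, the labeled entropy is the sum of the entropies of the independent edge indicators, which follows from the model description. Subtracting $\sum_i \log n_i!$ as the label saving is an assumption made here by analogy with the term $2 \log (n/2)!$ of the balanced bipartition case; the exact statement of the general theorem, including its vanishing error term, is not reproduced. All function names and the example parameters are hypothetical.

```python
import math


def binary_entropy(x):
    """Entropy in bits of a Bernoulli(x) variable; 0 at the endpoints."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def sbm_labeled_entropy(sizes, P):
    """Entropy (bits) of a labeled SBM with part sizes `sizes` and
    symmetric transition matrix P (P[i][j] = edge probability between
    part i and part j)."""
    total = 0.0
    for i in range(len(sizes)):
        total += math.comb(sizes[i], 2) * binary_entropy(P[i][i])
        for j in range(i + 1, len(sizes)):
            total += sizes[i] * sizes[j] * binary_entropy(P[i][j])
    return total


def label_saving(sizes):
    """Assumed saving sum_i log2(n_i!) from forgetting labels within parts."""
    return sum(math.lgamma(s + 1) for s in sizes) / math.log(2)


def partitioned_structural_entropy_estimate(sizes, P):
    """Leading-order estimate only; ignores the vanishing error term."""
    return sbm_labeled_entropy(sizes, P) - label_saving(sizes)


sizes = [50, 30, 20]
P = [[0.5, 0.1, 0.1], [0.1, 0.4, 0.05], [0.1, 0.05, 0.3]]
print(sbm_labeled_entropy(sizes, P), label_saving(sizes),
      partitioned_structural_entropy_estimate(sizes, P))
```

With a single part the saving reduces to $\log n!$, matching Theorem 1, and with two equal parts it reduces to $2 \log (n/2)!$, matching the remark after Theorem 2.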

5.2. Compression Algorithm

The compression algorithm for a general r with vertex partition $V = V_1 \cup \dots \cup V_r$ can be viewed as a union of the compression algorithms for the structures $S_i$ induced on the parts $V_i$ ($i \in [r]$) and for the edges between distinct parts $V_i$ and $V_j$ ($i < j$). To be more precise, we describe the algorithm as follows. It first compresses each $S_i$ into a binary sequence using Szip. Then it runs the Szip decoder on each of these sequences to obtain the decoded structure $S_i'$. With the vertex indices of $S_i'$, $i \in [r]$, we can compress the edges between the parts as a labeled r-partite graph into a further binary sequence using an arithmetic encoder. This completes the encoding procedure and gives the codewords, which we concatenate to obtain the final codeword. The decoding simply runs the Szip decoders and the labeled (arithmetic) decoders in parallel. The correctness of the decoding output can be argued as in the bipartition case. The performance of the algorithm can be obtained similarly to Theorem 3, as follows.

Theorem 5. Fix r reals […]

(i) The algorithm asymptotically achieves the structural entropy in […]

(ii) For any […]

6. Conclusions

In this paper, we defined the partitioned unlabeled graphs and partitioned structural entropy, which generalize the structural entropy for unlabeled graphs introduced by Choi and Szpankowski [25]. We then computed the partitioned structural entropy for stochastic block models and gave a compression algorithm that asymptotically achieves this structural entropy limit. As mentioned earlier, we believe that in appropriate contexts the structural information of a graph or network can be interpreted as a kind of semantic information, in which case, the communication schemes may benefit from structural compressions which considerably reduce the cost.

