Adib Hasan1, Po-Chien Chung2, Wayne Hayes2.
Abstract
Graphlets are small connected induced subgraphs of a larger graph G. Graphlets are now commonly used to quantify local and global topology of networks in the field. Methods exist to exhaustively enumerate all graphlets (and their orbits) in large networks as efficiently as possible using orbit counting equations. However, the number of graphlets in G is exponential in both the number of nodes and edges in G. Enumerating them all is already unacceptably expensive on existing large networks, and the problem will only get worse as networks continue to grow in size and density. Here we introduce an efficient method designed to aid statistical sampling of graphlets up to size k = 8 from a large network. We define graphettes as the generalization of graphlets allowing for disconnected graphlets. Given a particular (undirected) graphette g, we introduce the idea of the canonical graphette [Formula: see text] as a representative member of the isomorphism group Iso(g) of g. We compute the mapping [Formula: see text], in the form of a lookup table, from all $2^{k(k-1)/2}$ undirected graphettes g of size k ≤ 8 to their canonical representatives [Formula: see text], as well as the permutation that transforms g to [Formula: see text]. We also compute all automorphism orbits for each canonical graphette. Thus, given any k ≤ 8 nodes in a graph G, we can in constant time infer which graphette it is, as well as which orbit each of the k nodes belongs to. Sampling a large number N of such k-sets of nodes provides an approximation of both the distribution of graphlets and orbits across G, and the orbit degree vector at each node.
Year: 2017 PMID: 28832661 PMCID: PMC5568234 DOI: 10.1371/journal.pone.0181570
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. All (connected) graphlets of sizes k = 3, 4, 5 nodes, and their automorphism orbits; within each graphlet, nodes of equal shading are in the same orbit.
The numbering of these graphlets and orbits was created by hand [8] and does not correspond to the automatically generated numbering used in this paper. The figure is taken verbatim from [16].
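
To make the notion of an automorphism orbit concrete, the following is a minimal brute-force sketch, not the paper's implementation. It tries every node permutation, keeps those that preserve adjacency, and merges nodes that some automorphism maps onto one another. The function name `automorphism_orbits` and the edge-list input format are assumptions made for illustration, and the approach is only practical for small k.

```python
from itertools import permutations

def automorphism_orbits(k, edges):
    """Return the automorphism orbits of a k-node undirected graph as sorted lists."""
    edge_set = {frozenset(e) for e in edges}
    # Union-find parents: nodes mapped onto each other by an automorphism get merged.
    parent = list(range(k))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for perm in permutations(range(k)):
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == edge_set:  # perm preserves adjacency, so it is an automorphism
            for u in range(k):
                ru, rv = find(u), find(perm[u])
                if ru != rv:
                    parent[max(ru, rv)] = min(ru, rv)

    orbits = {}
    for u in range(k):
        orbits.setdefault(find(u), []).append(u)
    return sorted(orbits.values())

# A 3-node path has two orbits (the two end nodes, and the middle node);
# a triangle has a single orbit containing all three nodes.
path3 = automorphism_orbits(3, [(0, 1), (1, 2)])
triangle = automorphism_orbits(3, [(0, 1), (1, 2), (0, 2)])
```

Nodes in the same returned list are interchangeable under some symmetry of the graph, which is what equal shading indicates in the figure.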
Fig 2. Three isomorphic representations of the Petersen graph.
Fig 3. All the possible 3-graphettes.
Fig 4. All 3-graphettes with exactly one edge; the canonical one is the one with lowest integer representation (the middle one in this case).
Each of them is placed in a lookup table indexed by the bit vector representation of its adjacency matrix, pointing at the canonical one. In this way we can determine that it is the one-edge 3-graphette in constant time.
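
The lookup idea can be sketched for small k as follows. This is an illustrative brute-force construction under stated assumptions, not the authors' code: the bit order (node pairs in the order produced by `itertools.combinations`) is an assumption and may differ from the lower-triangle layout used in the paper, so the particular integer chosen as canonical may differ as well. The helper names `bit_vector` and `build_canonical_table` are hypothetical.

```python
from itertools import combinations, permutations

def bit_vector(k, edges):
    """Encode an undirected k-node graph as an integer, one bit per node pair."""
    pairs = list(combinations(range(k), 2))  # assumed bit order: (0,1), (0,2), (1,2), ...
    edge_set = {frozenset(e) for e in edges}
    return sum(1 << i for i, p in enumerate(pairs) if frozenset(p) in edge_set)

def build_canonical_table(k):
    """Map every k-graphette bit vector to its lowest-integer isomorph."""
    pairs = list(combinations(range(k), 2))
    table = {}
    for g in range(1 << len(pairs)):
        edges = [p for i, p in enumerate(pairs) if (g >> i) & 1]
        # Relabel the nodes in every possible way and keep the smallest encoding.
        table[g] = min(
            bit_vector(k, [(perm[u], perm[v]) for u, v in edges])
            for perm in permutations(range(k))
        )
    return table

table3 = build_canonical_table(3)
canonicals3 = sorted(set(table3.values()))
```

Once the table is built, identifying a graphette is a single dictionary (or array) access on its bit vector. Under the assumed bit order, the three one-edge 3-graphettes all point at the same entry, and `canonicals3` has four elements, matching the count of canonical 3-graphettes.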
For each value of k: The number of bits required to store the lower-triangle of the adjacency matrix for an undirected k-graphette; the number of such k-graphettes counting all isomorphs, which is just $2^{\text{bits}}$; the number of canonical k-graphettes (this will be the number of unique entries in the above lookup table [22], and up to k = 8, 14 bits is sufficient); and the total number of unique automorphism orbits (up to k = 8, 17 bits is sufficient) [27].
Note that up to k = 8, together the lookup table for canonical graphettes and their canonical orbits fits into 31 bits, allowing storage as a single 4-byte integer, with 1 bit to store whether the graphette is connected (i.e., also a graphlet). The suffixes K, M, G, T, P, and E represent exactly $2^{10}$, $2^{20}$, $2^{30}$, $2^{40}$, $2^{50}$ and $2^{60}$, respectively.
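
One way such a 4-byte entry could be laid out is sketched below. The field order (canonical index in the low 14 bits, orbit identifier in the next 17 bits, connectivity flag in the top bit) is an assumption for illustration; the source only states the bit widths, not the layout. The names `pack_entry` and `unpack_entry` are hypothetical.

```python
CANON_BITS = 14   # enough for the canonical graphette index up to k = 8
ORBIT_BITS = 17   # enough for the orbit identifier up to k = 8

def pack_entry(canonical_id, orbit_id, connected):
    """Pack the three fields into one 32-bit unsigned integer (hypothetical layout)."""
    if not 0 <= canonical_id < (1 << CANON_BITS):
        raise ValueError("canonical_id does not fit in 14 bits")
    if not 0 <= orbit_id < (1 << ORBIT_BITS):
        raise ValueError("orbit_id does not fit in 17 bits")
    return (int(connected) << 31) | (orbit_id << CANON_BITS) | canonical_id

def unpack_entry(word):
    """Recover (canonical_id, orbit_id, connected) from a packed 32-bit word."""
    canonical_id = word & ((1 << CANON_BITS) - 1)
    orbit_id = (word >> CANON_BITS) & ((1 << ORBIT_BITS) - 1)
    connected = bool(word >> 31)
    return canonical_id, orbit_id, connected

word = pack_entry(12345, 70000, True)
fields = unpack_entry(word)
```

Since 14 + 17 = 31, the remaining bit of the 32-bit word is free for the connectivity flag, which is the arithmetic behind the single-integer claim.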
| k | bits | #Graphs | Space | #Canonicals | #Orbits |
|---|---|---|---|---|---|
| 1 | 0 | 1 | 0 | 1 | 1 |
| 2 | 1 | 2 | 0.25 B | 2 | 2 |
| 3 | 3 | 8 | 3 B | 4 | 6 |
| 4 | 6 | 64 | 48 B | 11 | 20 |
| 5 | 10 | 1 K | 1.25 KB | 34 | 90 |
| 6 | 15 | 32 K | 60 KB | 156 | 544 |
| 7 | 21 | 2 M | 5.25 MB | 1044 | 5096 |
| 8 | 28 | 256 M | 896 MB | 12346 | 79264 |
| 9 | 36 | 64 G | 288 GB | 274668 | 2208612 |
| 10 | 45 | 32 T | 180 TB | 12005168 | 113743760 |
| 11 | 55 | 32 P | 220 PB | 1018997864 | 10926227136 |
| 12 | 66 | 64 E | 528 EB | 165091172592 | 1956363435360 |
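
The bits, #Graphs, and Space columns can be reproduced from k alone. The sketch below assumes that Space is the number of graphettes multiplied by the bits per graphette, expressed in bytes; this is inferred from the rows shown rather than stated in the caption, and the function name `table_row` is hypothetical.

```python
def table_row(k):
    """Return (bits, number of graphettes, assumed space in bytes) for size k."""
    bits = k * (k - 1) // 2            # entries in the lower triangle of a k x k matrix
    n_graphs = 1 << bits               # every bit pattern is a distinct labeled graphette
    space_bytes = n_graphs * bits / 8  # assumption: one bits-wide entry per graphette
    return bits, n_graphs, space_bytes

row7 = table_row(7)
```

For k = 7 this gives 21 bits, $2^{21}$ (2 M) graphettes, and 5.25 × $2^{20}$ bytes (5.25 MB), in agreement with the corresponding table row.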
| | The graph with nodes |
| | The set of nodes of graph |
| | The boolean value denoting connectivity between nodes |
| ⟺, iff | If and only if |
| \|·\| | The number of elements in set |
| | The adjacency matrix representation of graph |
| | The set of automorphisms of graph |
| | Canonical isomorph of graphette |