Harish Kannan, Emil Saucan, Indrava Roy, Areejit Samal.
Abstract
Topological data analysis can reveal higher-order structure beyond pairwise connections between vertices in complex networks. We present a new method based on discrete Morse theory to study topological properties of unweighted and undirected networks using persistent homology. Leveraging the features of discrete Morse theory, our method not only captures the topology of the clique complex of such graphs via the concept of critical simplices, but also achieves close to the theoretical minimum number of critical simplices in several analyzed model and real networks. This leads to a reduced filtration scheme based on the subsequence of the corresponding critical weights, which significantly increases computational efficiency. We have employed our filtration scheme to explore the persistent homology of several model and real-world networks. In particular, we show that our method can detect differences in the higher-order structure of networks, and the corresponding persistence diagrams can be used to distinguish between different model networks. In summary, our method based on discrete Morse theory further increases the applicability of persistent homology to investigate the global topology of complex networks.
Year: 2019 PMID: 31554857 PMCID: PMC6761140 DOI: 10.1038/s41598-019-50202-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. An illustration of the construction of a discrete Morse function f on a clique complex K corresponding to an unweighted and undirected graph G using our algorithm. (a) A simple example of an unweighted and undirected graph G containing 9 vertices and 11 edges. (b) The clique simplicial complex K corresponding to the simple graph G shown in (a). The clique complex K consists of 9 vertices or 0-simplices, 11 edges or 1-simplices and 2 triangles or 2-simplices. The figure also displays the orientation of the 1- and 2-simplices using arrows. (c) Generation of a discrete Morse function f on the clique complex K shown in (b) using our algorithm. The figure lists the state of the Flag variable in algorithm 1 and the IsCritical variable in algorithm 2 (see SI Appendix) for each simplex in K. In this example, the clique complex has 4 critical simplices and their respective critical weights correspond to the filtration steps. The figure also lists the FiltrationWeight for each simplex in K obtained using algorithm 3 (see SI Appendix).
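The clique complex described in this caption can be sketched in a few lines of Python. The edge list below is a hypothetical graph chosen only to have the same counts as the caption (9 vertices, 11 edges, 2 triangles); it is not the graph shown in the figure, and the function name `clique_complex` is an illustrative assumption rather than part of the authors' algorithms.

```python
def clique_complex(edges, max_dim=2):
    """Return {p: sorted list of p-simplices} for the clique complex of a
    simple graph given as an edge list (isolated vertices are not covered)."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    simplices = {0: [(v,) for v in sorted(adj)]}
    for p in range(1, max_dim + 1):
        found = set()
        for s in simplices[p - 1]:
            # A vertex adjacent to every vertex of s extends s to a larger clique.
            common = set.intersection(*(adj[v] for v in s))
            for w in common:
                found.add(tuple(sorted(s + (w,))))
        simplices[p] = sorted(found)
    return simplices


# Hypothetical graph: two triangles (1,2,3) and (3,4,5) plus an unfilled 5-cycle.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5),
         (5, 6), (6, 7), (7, 8), (8, 9), (9, 5)]
K = clique_complex(edges)
counts = {p: len(s) for p, s in K.items()}
print(counts)
```

Every clique of the graph becomes a simplex, so triangles are filled in automatically while the chordless 5-cycle remains a 1-dimensional hole.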
Figure 2. Filtration based on the entire sequence of weights satisfying the discrete Morse function is equivalent, in terms of persistent homology, to filtration based only on the subsequence of critical weights. (a) Filtration of the network shown in Fig. 1 based on the weights of the 4 critical simplices. There is a 0-hole (or connected component) that persists across the 4 stages of the filtration, while another 0-hole is born at stage 2 on addition of the critical vertex v2 but dies at stage 3, which corresponds to the weight of the critical edge . Moreover, a 1-hole is born at stage 4 on addition of the critical edge . (b) Five intermediate stages during the filtration between critical weights 1.1 (stage 2) and 2.35 (stage 3). (c) Four intermediate stages during the filtration between critical weights 2.35 (stage 3) and 3.48 (stage 4). The homology of the clique complex remains unchanged during the intermediate stages of the filtration: the birth and death of holes occur only at stages that correspond to critical weights.
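The birth and death of 0-holes described in this caption can be illustrated with a standard union-find computation of 0-dimensional persistence. This is a minimal sketch, not the authors' algorithms: the toy graph, the vertex names and the starting weight 0.0 are hypothetical, and only the values 1.1, 2.35 and 3.48 are borrowed from the caption. Treating a cycle-closing edge as the birth of a 1-hole is valid here only because the toy graph contains no triangles.

```python
def h0_persistence(vertex_weights, edges):
    """vertex_weights: {vertex: weight}; edges: [(weight, u, v)].
    Returns (H0 bars, weights of edges that close a cycle)."""
    parent = {v: v for v in vertex_weights}
    birth = dict(vertex_weights)  # birth weight, looked up at component roots

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    bars, cycle_births = [], []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            cycle_births.append(w)  # edge closes a cycle
            continue
        if birth[ru] > birth[rv]:
            ru, rv = rv, ru  # make ru the older component
        parent[rv] = ru  # elder rule: the younger component dies
        if birth[rv] < w:
            bars.append((birth[rv], w))  # skip zero-length bars
    roots = {find(v) for v in parent}
    bars.extend((birth[r], float("inf")) for r in roots)
    return sorted(bars), cycle_births


# Hypothetical toy filtration: a 4-cycle a-b-c-e plus a late vertex d.
vertex_weights = {"a": 0.0, "b": 0.0, "c": 0.0, "e": 0.0, "d": 1.1}
edges = [(0.0, "a", "b"), (0.0, "b", "c"), (0.0, "c", "e"),
         (2.35, "c", "d"), (3.48, "e", "a")]
bars, cycle_births = h0_persistence(vertex_weights, edges)
print(bars, cycle_births)
```

In this toy example one component persists indefinitely, a second is born at 1.1 and dies at 2.35, and a cycle closes at 3.48, mirroring the pattern of events the caption reports.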
The table lists the number of p-simplices (n), the number of critical p-simplices (m) and the p-Betti number β for clique complexes corresponding to model and real networks.
| Network | n0 | m0 | β0 | n1 | m1 | β1 | n2 | m2 | β2 | n3 | m3 | β3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ER model with | 1000 | 90 | 21 | 2007 | 1090 | 1021 | 7 | 0 | 0 | 0 | 0 | 0 |
| WS model with | 1000 | 123 | 1 | 2000 | 991 | 864 | 137 | 5 | 0 | 0 | 0 | 0 |
| BA model with | 1000 | 8 | 1 | 1996 | 949 | 942 | 55 | 0 | 0 | 0 | 0 | 0 |
| Spherical model with | 1000 | 172 | 126 | 2028 | 118 | 0 | 2029 | 180 | 0 | 1321 | 554 | 446 |
| Hyperbolic model with | 1000 | 144 | 144 | 2593 | 20 | 0 | 5440 | 426 | 0 | 11456 | 8159 | 7753 |
| US Power Grid | 4941 | 573 | 1 | 6594 | 1671 | 1080 | 651 | 21 | 0 | 90 | 15 | 13 |
| Email communication | 1133 | 6 | 1 | 5451 | 1694 | 1186 | 5343 | 871 | 53 | 3419 | 1577 | 1262 |
| Route views | 6474 | 17 | 1 | 12572 | 2459 | 2157 | 6584 | 627 | 19 | 5636 | 3335 | 3013 |
| Yeast protein interaction | 1870 | 272 | 173 | 2203 | 424 | 318 | 222 | 12 | 0 | 41 | 12 | 7 |
| Hamsterster friendship | 1858 | 33 | 23 | 12534 | 4484 | 2970 | 16750 | 5324 | 1880 | 10015 | 4814 | 2874 |
| Euro road | 1174 | 213 | 26 | 1417 | 425 | 237 | 32 | 1 | 0 | 0 | 0 | 0 |
| Human protein interaction | 3133 | 269 | 210 | 6149 | 2454 | 2298 | 1047 | 109 | 1 | 142 | 35 | 24 |
Note that the dimension p of simplices ranges from 0 to 3. For the model networks, the statistics are reported for a particular instance of ER graph with and , WS graph with , and , BA graph with and , Spherical random graph produced from HGG model with , , and , and Hyperbolic random graph produced from HGG model with , , and . For the statistics corresponding to a more comprehensive list of model networks with different parameters, we refer the readers to SI Table S2. Note that we omit self-loops in the real networks considered here.
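The table values can be sanity-checked against two standard facts of discrete Morse theory: the alternating sums of simplex counts, critical simplex counts and Betti numbers all equal the Euler characteristic, and the number of critical p-simplices is at least the p-th Betti number (weak Morse inequalities). The sketch below applies this to the four table rows that contain no 3-simplices, since for those rows the table is guaranteed to cover the whole clique complex. The short network labels and helper names are illustrative assumptions.

```python
# (n_p, m_p, beta_p) for p = 0, 1, 2, copied from the table rows with n3 = 0.
rows = {
    "ER model":  [(1000, 90, 21), (2007, 1090, 1021), (7, 0, 0)],
    "WS model":  [(1000, 123, 1), (2000, 991, 864), (137, 5, 0)],
    "BA model":  [(1000, 8, 1), (1996, 949, 942), (55, 0, 0)],
    "Euro road": [(1174, 213, 26), (1417, 425, 237), (32, 1, 0)],
}


def alternating_sum(values):
    """Sum of (-1)^p * values[p]."""
    return sum((-1) ** p * x for p, x in enumerate(values))


def check(row):
    n, m, beta = zip(*row)
    chi = alternating_sum(n)
    same_chi = chi == alternating_sum(m) == alternating_sum(beta)
    weak_morse = all(mp >= bp for mp, bp in zip(m, beta))
    return chi, same_chi, weak_morse


results = {name: check(row) for name, row in rows.items()}
for name, (chi, same_chi, weak_morse) in results.items():
    print(f"{name}: chi={chi}, consistent={same_chi}, m_p>=beta_p={weak_morse}")
```

All four rows pass both checks, which also confirms that the columns are ordered as simplices, critical simplices and Betti number for each dimension in turn.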
Figure 3. Barcode diagrams for H0 and H1 in model networks. (a) ER model with and . (b) WS model with , and . (c) BA model with and . (d) Spherical random graphs produced from HGG model with , , and . (e) Hyperbolic random graphs produced from HGG model with , , and .
Figure 4. Barcode diagrams for H0 and H1 in real networks. (a) US Power Grid. (b) Email communication. (c) Route views. (d) Yeast protein interaction. (e) Hamsterster friendship.
Figure 5. Barcode diagrams for H2 in model and real networks. (a) Spherical random graphs produced from HGG model with , , and . (b) Hyperbolic random graphs produced from HGG model with , , and . (c) US Power Grid. (d) Email communication. (e) Route views. (f) Yeast protein interaction. (g) Hamsterster friendship.
Figure 6. Bottleneck distance between persistence diagrams of model networks, namely, ER model with and , WS model with , and , BA model with and , Spherical random graphs produced from HGG model with , , and , and Hyperbolic random graphs produced from HGG model with , , and . For each of the five model networks, 10 random samples are generated by fixing the number of vertices n and other parameters of the model. We report the distance (rounded to two decimal places) between two different models as the average of the distance between each of the possible pairs of the 10 sample networks corresponding to the two models along with the standard error.
The table lists the value of optimality indicator μ for various model and real networks analyzed here.
| Network | Optimality indicator μ |
|---|---|
| ER model with | 0.924 ± 0.004 |
| ER model with | 0.947 ± 0.004 |
| ER model with | 0.959 ± 0.006 |
| WS model with | 0.890 ± 0.003 |
| WS model with | 0.917 ± 0.007 |
| WS model with | 0.906 ± 0.003 |
| BA model with | 0.989 ± 0.003 |
| BA model with | 0.985 ± 0.004 |
| BA model with | 0.964 ± 0.006 |
| Spherical model with | 0.925 ± 0.007 |
| Spherical model with | 0.901 ± 0.003 |
| Spherical model with | 0.887 ± 0.003 |
| Hyperbolic model with | 0.939 ± 0.013 |
| Hyperbolic model with | 0.927 ± 0.007 |
| Hyperbolic model with | 0.921 ± 0.005 |
| US Power Grid | 0.893937 |
| Email communication | 0.871847 |
| Route views | 0.952140 |
| Yeast protein interaction | 0.942157 |
| Hamsterster friendship | 0.793236 |
| Euro road | 0.840678 |
| Human protein interaction | 0.957924 |
Eq. 14 gives the definition of μ, which indicates how close our algorithm comes to the optimal case. The value of μ ranges from 0 to 1, with μ = 1 indicating that our algorithm achieves exactly the theoretical minimum number of critical simplices and μ = 0 indicating that all simplices are critical. For model networks, the value reported in this table is the average of μ across 10 samples of each model network for a chosen set of parameter values along with the corresponding standard deviations.
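Eq. 14 itself is not reproduced in this record, so the sketch below assumes one form of μ that matches the stated endpoints: μ = 1 − Σ(m_p − β_p) / Σ(n_p − β_p), summed over p = 0 to 3. This assumed form gives 1 when every m_p equals β_p and 0 when every m_p equals n_p, and it reproduces the tabulated values for three real networks from the simplex counts in the first table. The function name and data layout are illustrative.

```python
def optimality_indicator(row):
    """Assumed form: mu = 1 - sum(m_p - beta_p) / sum(n_p - beta_p)."""
    excess = sum(m - b for _, m, b in row)      # critical simplices above the minimum
    room = sum(n - b for n, _, b in row)        # largest possible excess
    return 1 - excess / room


# (n_p, m_p, beta_p) for p = 0..3, copied from the first table.
real_networks = {
    "US Power Grid": [(4941, 573, 1), (6594, 1671, 1080), (651, 21, 0), (90, 15, 13)],
    "Yeast protein interaction": [(1870, 272, 173), (2203, 424, 318), (222, 12, 0), (41, 12, 7)],
    "Euro road": [(1174, 213, 26), (1417, 425, 237), (32, 1, 0), (0, 0, 0)],
}

mu = {name: optimality_indicator(row) for name, row in real_networks.items()}
for name, value in mu.items():
    print(f"{name}: {value:.6f}")
```

The printed values agree with the table entries 0.893937, 0.942157 and 0.840678 to six decimal places, which supports the assumed form without confirming that it is the exact expression in Eq. 14.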