Maarten Houbraken, Sofie Demeyer, Tom Michoel, Pieter Audenaert, Didier Colle, Mario Pickavet.
Abstract
Subgraph matching algorithms are used to find and enumerate specific interconnection structures in networks. By enumerating these specific structures/subgraphs, the fundamental properties of the network can be derived. More specifically in biological networks, subgraph matching algorithms are used to discover network motifs, specific patterns occurring more often than expected by chance. Finding these network motifs yields information on the underlying biological relations modelled by the network. In this work, we present the Index-based Subgraph Matching Algorithm with General Symmetries (ISMAGS), an improved version of the Index-based Subgraph Matching Algorithm (ISMA). ISMA quickly finds all instances of a predefined motif in a network by intelligently exploring the search space and taking into account easily identifiable symmetric structures. However, more complex symmetries (possibly involving switching multiple nodes) are not taken into account, resulting in superfluous output. ISMAGS overcomes this problem by using a customised symmetry analysis phase to detect all symmetric structures in the network motif subgraphs. These structures are then converted to symmetry-breaking constraints used to prune the search space and speed up calculations. The performance of the algorithm was tested on several types of networks (biological, social and computer networks) for various subgraphs with a varying degree of symmetry. For subgraphs with complex (multi-node) symmetric structures, high speed-up factors are obtained as the search space is pruned by the symmetry-breaking constraints. For subgraphs with no or simple symmetric structures, ISMAGS still reduces computation times by optimising set operations. Moreover, the calculated list of subgraph instances is minimal as it contains no instances that differ by only a subgraph symmetry. An implementation of the algorithm is freely available at https://github.com/mhoubraken/ISMAGS.
Year: 2014 PMID: 24879305 PMCID: PMC4039476 DOI: 10.1371/journal.pone.0097896
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Subgraph examples.
In “XXXXXX”, every node is symmetric to every other node because, in a clique, all nodes can be swapped. The “XX00XX” graph has rotation symmetry (all nodes can be shifted along the ring) and reflection symmetry (for example, the top two nodes can be switched with the bottom two nodes). The “XZ00ZY” graph is also symmetric, as the same configuration is obtained when node 1 is switched with node 2 and node 3 with node 4. While the G4 graph has no symmetric properties, the tetrahedron and the Petersen graph [25] have more complex symmetric structures.
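The symmetries named in this caption can be checked by brute force on the three four-node examples. The sketch below is only an illustration and not part of ISMAGS: the function names, the 0-based node numbering, the edge lists, and the assumption that the Z edges are directed while X and Y are undirected are all choices made here for the example.

```python
from itertools import permutations

def undirected(pairs, label):
    """Encode each undirected edge as two directed, labelled edges."""
    return [(u, v, label) for u, v in pairs] + [(v, u, label) for u, v in pairs]

def automorphisms(n, edges):
    """Return every permutation of nodes 0..n-1 that maps the edge set onto itself."""
    edge_set = set(edges)
    found = []
    for perm in permutations(range(n)):
        mapped = {(perm[u], perm[v], label) for u, v, label in edge_set}
        if mapped == edge_set:
            found.append(perm)
    return found

# Assumed edge lists for the Figure 1 examples (nodes numbered from 0).
clique4 = undirected([(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)], "X")
ring4 = undirected([(0, 1), (1, 3), (3, 2), (2, 0)], "X")
xz00zy = (undirected([(0, 1)], "X") + undirected([(2, 3)], "Y")
          + [(0, 2, "Z"), (1, 3, "Z")])  # Z assumed directed

print(len(automorphisms(4, clique4)))  # 24: any two nodes can be swapped
print(len(automorphisms(4, ring4)))    # 8: rotations and reflections of the ring
print(automorphisms(4, xz00zy))        # [(0, 1, 2, 3), (1, 0, 3, 2)]
```

The last result contains only the identity and the double swap, which is the multi-node symmetry that a check for single-node swaps would miss.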
Figure 2. Some permuted instances of the Petersen graph.
Figure 3. Example of subgraph refinement.
The top figure shows the initial partitioning of the “XZ00ZY” graph, in which all nodes are in the same cell. However, nodes 1 and 2 have an outgoing edge while nodes 3 and 4 do not, which indicates that the partition needs to be refined. Nodes 1 and 2 are therefore put in a separate cell, as shown in the bottom figure.
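A refinement step of this kind can be sketched as repeatedly splitting cells until all nodes in a cell have the same edge profile. The code below is a generic partition refinement written for illustration, not the routine used in ISMAGS; the function name `refine`, the 0-based numbering, and the edge list (directed Z edges, undirected X and Y edges stored in both directions) are assumptions.

```python
from collections import defaultdict

def refine(edges, partition):
    """Split cells until all nodes of a cell share the same edge profile.

    The profile of a node lists, for each incident edge, its direction,
    its label and the cell that contains the other endpoint.
    """
    while True:
        cell_of = {v: i for i, cell in enumerate(partition) for v in cell}
        profile = defaultdict(list)
        for u, v, label in edges:
            profile[u].append(("out", label, cell_of[v]))
            profile[v].append(("in", label, cell_of[u]))
        refined = []
        for cell in partition:
            groups = defaultdict(list)
            for v in cell:
                groups[tuple(sorted(profile[v]))].append(v)
            refined.extend(groups[key] for key in sorted(groups))
        if len(refined) == len(partition):  # no cell was split: stable
            return refined
        partition = refined

# Assumed encoding of "XZ00ZY": X between 0 and 1, Y between 2 and 3,
# directed Z edges 0 -> 2 and 1 -> 3.
xz00zy = [(0, 1, "X"), (1, 0, "X"), (2, 3, "Y"), (3, 2, "Y"),
          (0, 2, "Z"), (1, 3, "Z")]

print(refine(xz00zy, [[0, 1, 2, 3]]))  # [[0, 1], [2, 3]]
```

With 0-based numbering, the cell `[0, 1]` holds the two nodes with an outgoing edge, matching the split shown in the bottom figure.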
Figure 4. Subgraph symmetry analysis of “XX00XX”.
Each branch is labelled with the applied coupling. The boxed numbers indicate the order of tree traversal, which is a depth-first exploration that couples the smallest node first. In the initial partitions, all nodes are in the same cell. The first coupling splits the cells of both partitions into 3 cells (the separation of cells is denoted by |). When a permutation is found, the orbit partition is updated as shown. Orbit pruning is used to reduce the required computations, as explained in the text.
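The orbit update mentioned in this caption amounts to merging the orbit cells of every node and its image under the newly found permutation. The following sketch shows that bookkeeping only; the function name `update_orbits`, the 0-based numbering and the two example permutations of the ring (edges 0-1, 1-3, 3-2, 2-0) are assumptions, and the search tree that produces the permutations is not reproduced.

```python
def update_orbits(orbits, perm):
    """Merge the orbit cells linked by perm, where perm[v] is the image of node v."""
    cells = [set(cell) for cell in orbits]
    for v, image in enumerate(perm):
        cell_v = next(c for c in cells if v in c)
        cell_i = next(c for c in cells if image in c)
        if cell_v is not cell_i:
            cell_v |= cell_i
            cells = [c for c in cells if c is not cell_i]
    return sorted(sorted(c) for c in cells)

orbits = [[0], [1], [2], [3]]             # initially every node is its own orbit
orbits = update_orbits(orbits, (1, 0, 3, 2))
print(orbits)                             # [[0, 1], [2, 3]]
orbits = update_orbits(orbits, (2, 3, 0, 1))
print(orbits)                             # [[0, 1, 2, 3]]
```

Once all nodes share one orbit, further permutations cannot change the orbit partition, which is the kind of situation in which pruning avoids redundant branches.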
Algorithm 1: findSubgraphInstances(Graph, Subgraph).
| Set<Constraint> … |
| NodeListHandler[] … |
| … |
| add … |
| … |
| SubgraphNode … |
| NodeList … |
| mapNodes(…) |
Algorithm 2: mapNodes(SubgraphNode, NodeListHandler[], Set<…>, …).
| List<Node> … |
| … |
| map … |
| … |
| export instance; |
| … |
| addNeighbourList(…) |
| … |
| mapNodes(…) |
| … |
| unmap … |
| … |
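The map, recurse and unmap pattern of Algorithms 1 and 2 can be illustrated with a plain backtracking search. The sketch below is not the ISMAGS implementation: it leaves out the NodeListHandler index structures and simply tries every graph node for each motif node. The names `find_instances`, `consistent` and `map_nodes`, the example graph, and the convention that a constraint `(a, b)` requires the graph node mapped to `a` to have a smaller id than the one mapped to `b` are all assumptions made for this example.

```python
def find_instances(graph_edges, motif_n, motif_edges, constraints=()):
    """Enumerate mappings of motif nodes 0..motif_n-1 onto an undirected graph."""
    adj = {}
    for u, v in graph_edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    instances, mapping = [], {}

    def consistent(m_node, g_node):
        # Every motif edge to an already mapped node must exist in the graph.
        for a, b in motif_edges:
            other = b if a == m_node else a if b == m_node else None
            if other in mapping and mapping[other] not in adj[g_node]:
                return False
        # Symmetry-breaking constraints: image of a < image of b.
        for a, b in constraints:
            if a == m_node and b in mapping and not g_node < mapping[b]:
                return False
            if b == m_node and a in mapping and not mapping[a] < g_node:
                return False
        return True

    def map_nodes(m_node):
        if m_node == motif_n:
            instances.append(tuple(mapping[i] for i in range(motif_n)))  # export
            return
        for g_node in sorted(adj):
            if g_node not in mapping.values() and consistent(m_node, g_node):
                mapping[m_node] = g_node   # map
                map_nodes(m_node + 1)
                del mapping[m_node]        # unmap

    map_nodes(0)
    return instances

# Example graph containing three 4-rings; motif is the ring 0-1-3-2-0.
graph = [(1, 2), (2, 4), (4, 3), (3, 1), (2, 5), (5, 3)]
ring = [(0, 1), (1, 3), (3, 2), (2, 0)]
ring_constraints = [(0, 1), (0, 2), (0, 3), (1, 2)]

print(len(find_instances(graph, 4, ring)))           # 24
print(find_instances(graph, 4, ring, ring_constraints))
# [(1, 2, 3, 4), (1, 2, 3, 5), (2, 4, 5, 3)]
```

Without constraints, each of the three rings is reported eight times, once per symmetry of the motif. With the constraints, each ring is reported once and the rejected branches are cut off before they are explored further.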
Algorithm 3: analyseSubgraph(Subgraph).
| Set<Permutation> … |
| Set<Constraint> … |
| Set<Set<SubgraphNode>> … |
| … |
| add set … |
| … |
| Partition … |
| OPP … |
| processOPP(…) |
| … |
Algorithm 4: processOPP(OPP, Set<…>, …).
| … |
| add current mapping to … |
| update … |
| … |
| sort … |
| … |
| couple … |
| … |
| processOPP(…) |
| … |
| add … |
| … |
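The outcome of the symmetry analysis, a set of permutations turned into symmetry-breaking constraints, can be reproduced on small motifs with a brute-force substitute. The sketch below swaps in exhaustive enumeration of permutations for the partition-based search of Algorithms 3 and 4, so it only scales to a handful of nodes. The function names and the constraint convention (a pair `(v, w)` meaning that `v` must be mapped to a smaller graph node than `w`) are assumptions.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All permutations of 0..n-1 that preserve an undirected edge set."""
    edge_set = {frozenset(e) for e in edges}
    return [p for p in permutations(range(n))
            if {frozenset((p[u], p[v])) for u, v in edges} == edge_set]

def symmetry_breaking_constraints(n, edges):
    """Derive ordering constraints that keep one representative per symmetry class."""
    group = automorphisms(n, edges)
    constraints = []
    for v in range(n):
        # Nodes that v can be moved to while all earlier nodes stay fixed.
        orbit = {p[v] for p in group}
        constraints.extend((v, w) for w in sorted(orbit) if w != v)
        # Keep only the symmetries that fix v before treating the next node.
        group = [p for p in group if p[v] == v]
    return constraints

ring = [(0, 1), (1, 3), (3, 2), (2, 0)]                      # "XX00XX"-like ring
clique = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]    # 4-clique

print(symmetry_breaking_constraints(4, ring))
# [(0, 1), (0, 2), (0, 3), (1, 2)]
print(symmetry_breaking_constraints(4, clique))
# [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```

For the ring, four constraints are enough to eliminate all eight symmetric variants of an instance, while the clique needs a full ordering of its nodes.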
Table 5. Network properties.
| Network | #Nodes | #Edges |
| PGS (reduced) | 1255 | 6454 |
| P | 887 | 1844 |
| G | 469 | 4051 |
| S | 404 | 659 |
| XYZ/ABZ | 15078 | 79794 |
| X/A | 4847 | 36391 |
| Y/B | 9602 | 40630 |
| Z | 5208 | 3132 |
| Slashdot | 79120 | 469768 |
| E | 37412 | 118755 |
| F | 69998 | 351013 |
| Wiki-Vote | 7115 | 100762 |
| p2p-Gnutella08 | 6301 | 20777 |
| p2p-Gnutella30 | 36682 | 88328 |
| CA-CondMat | 23133 | 93439 |
| CA-HepTh | 9877 | 25973 |
For each network, the number of nodes and edges is given. If multiple edge types are present in the network, separate counts are given for each edge type, denoting the number of edges of that type as well as the number of nodes having an edge of that type. The XYZ-network and the ABZ-network have the same node and edge count, as the A- and B-edges are the directed versions of the X- and Y-edges. Similarly, the reduced PGS-network has the same node and edge count as the PGS-network.
Table 6. Comparison between ISMA and ISMAGS on the biological networks.
| Subgraph | #instances | Search space ISMA (#nodes) | Search space ISMAGS (#nodes) | SPRF | Calculation time ISMA (ms) | Calculation time ISMAGS (ms) | SF |
| PGS-network | |||||||
| GGG | 9008 | 4520 | 4520 | 1.00 | 25.91 | 5.80 | 4.46 |
| SSS | 78 | 763 | 761 | 1.00 | 1.61 | 0.24 | 6.65 |
| SsS | 0 | 190 | 90 | 2.11 | 0.62 | 0.06 | 10.24 |
| GPS | 47 | 462 | 454 | 1.02 | 0.99 | 0.28 | 3.56 |
| SSG | 103 | 763 | 761 | 1.00 | 1.76 | 0.44 | 3.97 |
| SsG | 25 | 190 | 131 | 1.45 | 0.49 | 0.13 | 3.79 |
| GGS | 294 | 462 | 454 | 1.02 | 1.29 | 0.48 | 2.69 |
| GGP | 418 | 8571 | 8571 | 1.00 | 14.96 | 3.83 | 3.90 |
| ssG | 112 | 372 | 419 | 0.89 | 0.91 | 0.25 | 3.69 |
| PGSPGS | 0 | 391 | 232 | 1.69 | 0.97 | 0.14 | 7.02 |
| P0P | 24452 | 4575 | 4575 | 1.00 | 19.59 | 2.87 | 6.82 |
| P0P00P | 221290 | 57167 | 53479 | 1.07 | 192.67 | 30.67 | 6.28 |
| P0P00P000P | 2570154 | 551837 | 496059 | 1.11 | 2142.74 | 302.55 | 7.08 |
| Petersen | 9430 | 1131600 | 616418 | 1.84 | 330882* | 733.25 | 451 |
| Reduced PGS-network | |||||||
| 3-clique | 10614 | 7709 | 7709 | 1.00 | 39.50 | 8.81 | 4.48 |
| 4-clique | 11150 | 18323 | 18323 | 1.00 | 118.69 | 27.60 | 4.30 |
| 5-clique | 7669 | 29473 | 29473 | 1.00 | 225.49 | 48.36 | 4.66 |
| 6-clique | 3616 | 37142 | 37142 | 1.00 | 320.71 | 64.90 | 4.94 |
| 7-clique | 1158 | 40758 | 40758 | 1.00 | 379.00 | 76.63 | 4.95 |
| 8-clique | 226 | 41916 | 41916 | 1.00 | 412.69 | 84.97 | 4.86 |
| 9-clique | 24 | 42142 | 42142 | 1.00 | 431.79 | 92.21 | 4.68 |
| 10-clique | 1 | 42166 | 42166 | 1.00 | 451.08 | 99.08 | 4.55 |
| XYZ-network | |||||||
| XZ00ZY | 2554 | 23164 | 19095 | 1.21 | 36.66 | 9.58 | 3.83 |
| XXXZ000Z0Y00ZYY | 4727 | 73647 | 39572 | 1.86 | 152.77 | 27.54 | 5.55 |
| ABZ-Network | |||||||
| AZ00ZB | 1337 | 11588 | 10859 | 1.07 | 20.46 | 6.02 | 3.40 |
| AAAZ000Z0B00ZBB | 837 | 15186 | 14190 | 1.07 | 36.52 | 9.73 | 3.75 |
The reported #instances is the number of subgraph instances as exported by ISMAGS. The search space size is the number of nodes visited during the search process. SPRF is the search space reduction factor, calculated as the ratio of the search space size for ISMA to the search space size for ISMAGS. The speed-up factor (SF) is defined analogously for the calculation time. All reported timings are averaged over 1000 runs unless marked with an asterisk (*), in which case only one run was performed because the long computation times were prohibitive for more elaborate testing.
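As a worked check of these two definitions, the factors can be recomputed from the table entries. The snippet below uses the “XZ00ZY” row of the XYZ-network and the search space sizes of the Petersen row; the helper name `factor` and the rounding to two decimals are choices made here.

```python
def factor(isma_value, ismags_value):
    """Ratio of the ISMA value to the ISMAGS value, rounded to two decimals."""
    return round(isma_value / ismags_value, 2)

# "XZ00ZY" on the XYZ-network
sprf_xz = factor(23164, 19095)   # search space sizes (#nodes)
sf_xz = factor(36.66, 9.58)      # calculation times (ms)
print(sprf_xz, sf_xz)            # 1.21 3.83

# Petersen graph on the PGS-network, search space only
print(factor(1131600, 616418))   # 1.84
```

A factor of 1.00 therefore means that both algorithms visited the same number of nodes, and a value below 1 (as for “ssG”) means that ISMAGS visited more nodes than ISMA.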
Table 7. Comparison between ISMA and ISMAGS on the Slashdot and SNAP networks.
| Subgraph | #instances | Search space ISMA (#nodes) | Search space ISMAGS (#nodes) | SPRF | Calculation time ISMA (ms) | Calculation time ISMAGS (ms) | SF |
| Slashdot | |||||||
| 3-clique E | 12176 | 156167 | 156167 | 1.00 | 844.37 | 283.44 | 2.98 |
| 4-clique E | 223 | 168343 | 168343 | 1.00 | 1075.89 | 364.76 | 2.95 |
| 5-clique E | 3 | 168566 | 168566 | 1.00 | 1253.54 | 392.47 | 3.19 |
| 3-clique F | 392556 | 421011 | 421011 | 1.00 | 4145.04 | 1600.07 | 2.59 |
| 4-clique F | 1779701 | 813567 | 813567 | 1.00 | 14137.33 | 4727.03 | 2.99 |
| FEF | 67594 | 148084 | 139542 | 1.06 | 1018.42 | 890.77 | 1.14 |
| FEE | 75100 | 178997 | 173435 | 1.03 | 1145.29 | 618.22 | 1.85 |
| FE00EF | 755899 | 7544696 | 1577058 | 4.78 | 35769.79 | 5388.92 | 6.64 |
| Wiki-Vote | |||||||
| 3-clique | 608389 | 107877 | 107877 | 1.00 | 1794.22 | 410.19 | 4.37 |
| 4-clique | 2077903 | 716266 | 716266 | 1.00 | 14137.27 | 5156.31 | 2.74 |
| tetrahedron | 84787 | 115205 | 49836 | 2.31 | 1025.65 | 320.72 | 3.20 |
| G4 | 62406 | 80584 | 76364 | 1.06 | 580.07 | 448.54 | 1.29 |
| p2p-Gnutella08 | |||||||
| 3-clique | 2383 | 27078 | 27078 | 1.00 | 84.96 | 21.23 | 4.00 |
| 4-clique | 175 | 29461 | 29461 | 1.00 | 107.87 | 28.69 | 3.76 |
| tetrahedron | 2 | 9565 | 6043 | 1.58 | 21.22 | 4.92 | 4.32 |
| G4 | 6 | 9546 | 9472 | 1.01 | 20.03 | 7.97 | 2.51 |
| p2p-Gnutella30 | |||||||
| 3-clique | 1590 | 125010 | 125010 | 1.00 | 405.26 | 113.69 | 3.56 |
| 4-clique | 13 | 126600 | 126600 | 1.00 | 464.50 | 139.06 | 3.34 |
| tetrahedron | 2 | 41147 | 27365 | 1.50 | 101.60 | 28.03 | 3.62 |
| G4 | 0 | 41149 | 40910 | 1.01 | 99.60 | 38.12 | 2.61 |
| CA-CondMat | |||||||
| 3-clique | 173361 | 116572 | 116572 | 1.00 | 548.19 | 128.96 | 4.25 |
| 4-clique | 294008 | 289933 | 289933 | 1.00 | 1518.38 | 357.29 | 4.25 |
| tetrahedron | 0 | 67688 | 39758 | 1.70 | 163.95 | 35.08 | 4.67 |
| G4 | 0 | 62974 | 62612 | 1.01 | 149.69 | 54.06 | 2.77 |
| CA-HepTh | |||||||
| 3-clique | 28339 | 35850 | 35848 | 1.00 | 118.27 | 30.45 | 3.88 |
| 4-clique | 65592 | 64187 | 64187 | 1.00 | 290.38 | 64.56 | 4.50 |
| tetrahedron | 0 | 18258 | 11434 | 1.60 | 37.38 | 9.16 | 4.08 |
| G4 | 0 | 17853 | 17068 | 1.05 | 35.82 | 11.54 | 3.10 |
The reported parameters are defined as in Table 6. Note that the SNAP networks are assumed to be undirected for the clique subgraphs and directed for the tetrahedron.
Table 8. Comparison between ISMAGS, VF2, GK and G-Trie on the biological networks.
| Subgraph | #instances | VF2 (ms) | GK (ms) | G-Trie (ms) | ISMAGS (ms) |
| #runs | | 1000 | 1000 | 1000 | 1000 |
| PGS-network | |||||
| GGG | 9008 | 885.56 | 319.27 | 1.03 | 5.80 |
| SSS | 78 | 24.12 | 22.90 | 0.25 | 0.24 |
| SsS | 0 | 22.50 | 17.40 | 0.22 | 0.06 |
| P0P | 24452 | 268.76 | 103.81 | - | 2.87 |
| P0P00P | 221290 | 4252.12 | 585.77 | - | 30.67 |
| P0P00P000P | 2570154 | 35303* | 5705.64 | - | 302.55 |
| Petersen | 9430 | 6854418* | 53608* | - | 733.25 |
| Reduced PGS-network | |||||
| 3-clique | 10614 | 1170* | 511.33 | 1.51 | 8.81 |
| 4-clique | 11150 | 7300* | 1177.12 | 3.60 | 27.60 |
| 5-clique | 7669 | 32527* | 2160.31 | 5.55 | 48.36 |
| 6-clique | 3616 | 115409* | 2968.93 | 7.44 | 64.90 |
| 7-clique | 1158 | 329521* | 3513.20 | 14.41 | 76.63 |
| 8-clique | 226 | 754671* | 3700.21 | 71.60 | 84.97 |
| 9-clique | 24 | 1337881* | 3813.57 | 671.08 | 92.21 |
| 10-clique | 1 | 1848315* | 4181.80 | - | 99.08 |
The top row denotes, for each algorithm, the number of runs averaged to obtain the reported timing results. However, for results marked with an asterisk (*), only 1 run was performed. For the G-Trie algorithm, some results are missing because the size of the subgraph (10 nodes) was not supported by the reference implementation. The results for the “P0P”, “P0P00P” and “P0P00P000P” subgraphs are also omitted, as the algorithm did not support “don't care” links.
Table 9. Comparison between ISMAGS, VF2, GK and G-Trie on the SNAP networks.
| Subgraph | #instances | VF2 (ms) | GK (ms) | G-Trie (ms) | ISMAGS (ms) |
| #runs | | 100 | 100 | 1000 | 1000 |
| Wiki-Vote | |||||
| 3-clique | 608389 | 187191.52 | 27940.99 | 90.28 | 410.19 |
| 4-clique | 2077903 | 3410302° | 189357.30 | 613.08 | 5156.31 |
| tetrahedron | 84787 | 15260.17 | 106367.64 | 443.71 | 320.72 |
| G4 | 62406 | 8168.52 | 128836.22 | 1006.36 | 448.54 |
| p2p-Gnutella08 | |||||
| 3-clique | 2383 | 816.04 | 1163.03 | 6.66 | 21.23 |
| 4-clique | 175 | 1659.69 | 1359.35 | 6.81 | 28.69 |
| tetrahedron | 2 | 114.66 | 1151.98 | 6.18 | 4.92 |
| G4 | 6 | 108.73 | 1766.08 | 12.32 | 7.97 |
| p2p-Gnutella30 | |||||
| 3-clique | 1590 | 6259.23 | 5681.83 | 43.21 | 113.69 |
| 4-clique | 13 | 5867.19 | 5527.54 | 43.07 | 139.06 |
| tetrahedron | 2 | 1991.82 | 5793.46 | 34.34 | 28.03 |
| G4 | 0 | 1964.98 | 6671.50 | 72.07 | 38.12 |
| CA-CondMat | |||||
| 3-clique | 173361 | 37742.10 | 7196.78 | 41.73 | 128.96 |
| 4-clique | 294008 | 232134.17 | 15558.19 | 68.48 | 357.29 |
| tetrahedron | 0 | 6547.67 | 11779.51 | 40.48 | 35.08 |
| G4 | 0 | 4848.04 | 13660.84 | 121.68 | 54.06 |
| CA-HepTh | |||||
| 3-clique | 28339 | 4441.81 | 1416.95 | 7.64 | 30.45 |
| 4-clique | 65592 | 21374.00 | 2361.03 | 11.19 | 64.56 |
| tetrahedron | 0 | 539.45 | 1790.64 | 6.65 | 9.16 |
| G4 | 0 | 392.03 | 2449.13 | 19.03 | 11.54 |
Similar to Table 8, the top row denotes, for each algorithm, the number of runs averaged to obtain the reported timing results. However, the result marked with a circle (°) was averaged over 10 runs.