Vincenzo Bonnici, Rosalba Giugno, Alfredo Pulvirenti, Dennis Shasha, Alfredo Ferro.
Abstract
BACKGROUND: Graphs can represent biological networks at the molecular, protein, or species level. An important query is to find all matches of a pattern graph to a target graph. Accomplishing this is inherently difficult (NP-complete), and the efficiency of heuristic algorithms for the problem may depend upon the input graphs. The common aim of existing algorithms is to eliminate unsuccessful mappings as early and as inexpensively as possible.
Year: 2013 PMID: 23815292 PMCID: PMC3633016 DOI: 10.1186/1471-2105-14-S7-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Search space tree. Each leaf of the search space tree that ends a path leading to an isomorphism is highlighted with a green stick.
Statistics of biochemical datasets.
| Min Vertices | Min Edges | Max Vertices | Max Edges | Avg (SD) Vertices | Avg (SD) Edges | Avg (SD) Degree | Total Labels | Avg (SD) Labels |
|---|---|---|---|---|---|---|---|---|
| 4 | 8 | 245 | 500 | 44.98 | 93.91 | 4.17 | 62 | 4.36 |
| 240 | 480 | 33067 | 61546 | 5663.6 | 86661.27 | 3.21 | 14 | 5.9 |
| 1683 | 3414 | 7979 | 16302 | 3614.1 | 7386.2 | 4.08 | 13 | 4.63 |
| 7 | 16 | 883 | 18832 | 376.86 | 8679.48 | 44.78 | 21 | 18.86 |
| 1081 | 12961 | 6726 | 230468 | 3167.6 | 87759.6 | 48.14 | 31676 | 3167.6 |
| 5720 | 51464 | 12575 | 332458 | 7827.1 | 107135 | 28.66 | 78271 | 7827.1 |
Statistics of the number of vertices and number of edges. The columns give the minimum, maximum and average number of vertices and edges in each dataset. Total Labels is the total number of labels in the dataset. Avg Labels is the average number of labels per graph. Standard deviations are reported in parentheses.
Figure 2. GreatestConstraintFirst algorithm. The algorithm generates an ordering of the pattern vertices, the sequence μ, that introduces as many topological constraints as early as possible in the subgraph isomorphism matching process.
Figure 3. Search strategy in RI. The sequence of pattern vertices produced by the static search strategy of RI. The first vertex inserted in μ is 4, since it has the greatest number of edges. Suppose next that μ = {4, 1}; the candidates to be inserted are vertices 0, 2, 6, 5 and 7. The next vertex inserted in μ is 5, because vertex 5 has more edges pointing to neighbors of vertices in μ (namely to 2 and 7) than vertex 0 has. Vertex 5 has fewer edges pointing to the remaining vertices (the edge 〈5, 8〉 for 5, against the edges 〈0, 3〉 and 〈0, 9〉 for 0), but this criterion has less weight in the defined score.
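The ordering rule described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. It assumes an undirected pattern graph stored as a dictionary of neighbor sets, and it assumes a three-part score compared lexicographically: edges to vertices already in μ, then edges to vertices adjacent to μ, then edges to all remaining vertices. The function name `greatest_constraint_first`, the tie-breaking by smallest vertex id, and the example graph are all hypothetical; the graph is not the one drawn in Figure 3.

```python
def greatest_constraint_first(adj):
    """Return a static ordering of pattern vertices (illustrative sketch).

    adj: dict mapping each vertex to the set of its neighbors (undirected,
    no self-loops). The scoring below is an assumption based on the figure
    caption, not a transcription of the published pseudocode.
    """
    if not adj:
        return []
    # Start from a vertex of maximum degree; ties go to the smallest id.
    first = max(sorted(adj), key=lambda v: len(adj[v]))
    order = [first]
    placed = {first}
    while len(order) < len(adj):
        # Vertices outside the ordering that touch at least one placed vertex.
        frontier = set().union(*(adj[v] for v in placed)) - placed

        def score(u):
            nbrs = adj[u]
            in_order = len(nbrs & placed)             # edges into the ordering
            near = len((nbrs - placed) & frontier)    # edges to neighbors of the ordering
            far = len(nbrs - placed - frontier)       # edges to all remaining vertices
            return (in_order, near, far)              # compared lexicographically

        remaining = sorted(v for v in adj if v not in placed)
        nxt = max(remaining, key=score)               # first maximum = smallest id on ties
        order.append(nxt)
        placed.add(nxt)
    return order


# Hypothetical six-vertex pattern graph, built from an edge list.
edges = [(4, 1), (4, 0), (4, 2), (4, 5), (5, 2), (5, 3), (0, 3)]
example_adj = {v: set() for e in edges for v in e}
for a, b in edges:
    example_adj[a].add(b)
    example_adj[b].add(a)

example_order = greatest_constraint_first(example_adj)
```

Because the ordering depends only on the pattern graph, it can be computed once before the search starts, which is what "static" and "target independent" mean in this context.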
Figure 4. Matching algorithm. The matching algorithm in RI.
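Since the figure itself is not reproduced here, the following Python sketch shows the general shape of a backtracking search that follows a fixed vertex ordering and discards a partial mapping as soon as a cheap check fails. It is an illustration under stated assumptions, not the RI matching procedure: graphs are undirected and unlabeled, a match only requires every pattern edge to be present in the target (non-induced matching), and the function name `find_matches` and the example graphs are hypothetical.

```python
def find_matches(pattern, target, order):
    """Yield every mapping of pattern vertices onto distinct target vertices
    that preserves all pattern edges (illustrative sketch).

    pattern, target: dicts mapping each vertex to its set of neighbors.
    order: a sequence containing every pattern vertex exactly once.
    """
    mapping = {}   # pattern vertex -> target vertex (current partial match)
    used = set()   # target vertices already taken

    def extend(i):
        if i == len(order):
            yield dict(mapping)          # copy: mapping is mutated on backtrack
            return
        u = order[i]
        # Images of the already-matched neighbors of u.
        mapped_nbrs = [mapping[w] for w in pattern[u] if w in mapping]
        if mapped_nbrs:
            # A candidate must be adjacent to all of those images.
            candidates = set.intersection(*(target[t] for t in mapped_nbrs))
        else:
            candidates = set(target)
        for t in sorted(candidates):
            if t in used:
                continue
            if len(target[t]) < len(pattern[u]):
                continue                 # cheap degree check prunes early
            mapping[u] = t
            used.add(t)
            yield from extend(i + 1)
            del mapping[u]               # backtrack
            used.discard(t)

    yield from extend(0)


# Hypothetical example: a triangle pattern in a triangle with one pendant vertex.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
small_target = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
triangle_matches = list(find_matches(triangle, small_target, [0, 1, 2]))
```

Each recursion level corresponds to one level of the search space tree, and each yielded mapping corresponds to a leaf reached along a fully consistent path.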
Comparison of subgraph isomorphism algorithms.
| Algorithm | Search Strategy | Reduce Search Space | Preprocessing Data | Data Structure |
|---|---|---|---|---|
| FocusSearch | Static, semi-target dependent | Local domain reduction | Yes | List |
| Lad | Dynamic, target dependent | Domain reduction until convergence | Yes | Matrix |
| VFlib | Dynamic, target dependent | Two-look-ahead pruning rules | No | List |
| RI | Static, target independent | Fast and light pruning rules | No | List |
Review of some algorithmic aspects of the most recent subgraph isomorphism algorithms.