Steffen Klamt, Axel von Kamp.
Abstract
BACKGROUND: Interaction graphs (signed directed graphs) provide an important qualitative modeling approach for Systems Biology. They enable the analysis of causal relationships in cellular networks and can even be useful for predicting qualitative aspects of systems dynamics. Fundamental issues in the analysis of interaction graphs are the enumeration of paths and cycles (feedback loops) and the calculation of shortest positive/negative paths. These computational problems have been discussed only to a minor extent in the context of Systems Biology and, in particular, the shortest signed paths problem requires algorithmic developments.
Year: 2009 PMID: 19527491 PMCID: PMC2708159 DOI: 10.1186/1471-2105-10-181
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example graphs illustrating the different stages and possible problems when searching for shortest paths in signed graphs (for discussion see main text). Edges with arrows are positive, those with bars are negative. In Figure 1e, a negative path from A to B with length 7 is indicated.
Figure 2. The graph of Figure 1d transformed for the calculation of shortest paths with the two-step algorithm (TSA). The positive edges X → X' (X ∈ {A, B, C, G, H, F}) with zero weight are not displayed to reduce clutter (cf. Additional file 2). The shortest negative path from A to B is A → H' → C → B' with a length of 5. Note that the lengths of the (shortest) cycles are computed during the exhaustive search and are not displayed in this transformed (acyclic) graph.
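To illustrate why signs complicate shortest-path searches, the following minimal sketch runs a breadth-first search over (node, sign) states. It is not the two-step algorithm described in the caption above: it computes shortest signed walks, which may revisit nodes, so each result is only a lower bound on the length of the corresponding shortest signed path. The function name `shortest_signed_walks` and the three-edge example graph are hypothetical and chosen for illustration only.

```python
from collections import deque


def shortest_signed_walks(edges, source):
    """Return {(node, sign): length} of the shortest signed walks from source.

    edges: iterable of (u, v, sign) tuples with sign +1 or -1.
    A walk may repeat nodes, so each length is a lower bound on the
    length of the shortest signed (simple) path of the same sign.
    """
    adjacency = {}
    for u, v, sign in edges:
        adjacency.setdefault(u, []).append((v, sign))

    # The empty walk at the source has positive sign and length 0.
    dist = {(source, +1): 0}
    queue = deque([(source, +1)])
    while queue:
        node, sign = queue.popleft()
        for successor, edge_sign in adjacency.get(node, []):
            # The sign of a walk is the product of its edge signs.
            state = (successor, sign * edge_sign)
            if state not in dist:
                dist[state] = dist[(node, sign)] + 1
                queue.append(state)
    return dist


# Hypothetical example graph (not one of the graphs in Figure 1).
example_edges = [("A", "B", +1), ("A", "C", -1), ("C", "B", +1)]
walk_lengths = shortest_signed_walks(example_edges, "A")
```

In this small example no walk repeats a node, so the walk lengths coincide with the shortest signed path lengths. In graphs with cycles the two can differ, which is where exact approaches such as an exhaustive search become necessary.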
Table 1. Benchmarks for path and cycle enumeration
| Network | Nodes | Edges | Number of cycles | EMC [s] | Johnson [s] | Number of I/O paths | EMC [s] | Breadth-first [s] |
| T-cell | 94 | 138 | 100 | 0.1 | 0.04 | 8058 | 0.98 | 0.27 |
| EGFR | 106 | 230 | 237 | 0.15 | 0.07 | 384766 | 131 | 21 |
| T-cell+EGFR | 200 | 410 | 337 | 6.0 ± 0.9 | 0.14 ± 0.01 | n/a | n/a | n/a |
| Regulon DB | 1493 | 3565 | 132 | 194 | 0.77 | 44194 | 4716 | 38 |
| CA1 neuron | 512 | 1047 | n/a | n/a | n/a | n/a | n/a | n/a |
| Cancer signaling | 1240 | 3144 | n/a | n/a | n/a | n/a | n/a | n/a |
The running times when using elementary modes computation (columns "EMC") are compared with those of the graph algorithms (Johnson's algorithm for cycles and breadth-first search for I/O paths). The number of edges refers to unique edges (parallel edges with the same sign and half-edges are removed before calculation). The values for the combined network "T-cell+EGFR" are mean and standard error over ten runs with different random connections between the two networks. An entry n/a indicates that the procedure quickly ran out of memory (3 GB) because of a combinatorial explosion of paths or cycles. The platform used was MATLAB 2006b under 32-bit Linux with an Intel E6600 processor.
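As an illustration of breadth-first enumeration of input/output paths, the sketch below extends partial paths level by level and records the sign of each completed path. It is a simplified assumption of how such a search can be organized, not the implementation that was benchmarked. The function name `enumerate_signed_paths` and the example graph are hypothetical.

```python
from collections import deque


def enumerate_signed_paths(edges, source, target):
    """List all simple paths from source to target with their signs.

    edges: iterable of (u, v, sign) tuples with sign +1 or -1.
    Returns a list of (path, sign) pairs, shortest paths first.
    """
    adjacency = {}
    for u, v, sign in edges:
        adjacency.setdefault(u, []).append((v, sign))

    found = []
    # Each queue entry is a partial path and the product of its edge signs.
    queue = deque([((source,), +1)])
    while queue:
        path, sign = queue.popleft()
        node = path[-1]
        if node == target:
            found.append((list(path), sign))
            continue
        for successor, edge_sign in adjacency.get(node, []):
            # Skip nodes already on the path so that paths stay simple.
            if successor not in path:
                queue.append((path + (successor,), sign * edge_sign))
    return found


# Hypothetical example with one feedback edge B -| A.
example_edges = [("A", "B", +1), ("A", "C", -1), ("C", "B", +1), ("B", "A", -1)]
io_paths = enumerate_signed_paths(example_edges, "A", "B")
```

Because every partial path is kept in the queue, memory use grows with the number of paths, which is consistent with the n/a entries reported for the larger networks.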
Table 2. Benchmarks for calculation of shortest signed paths between all pairs of nodes.
| Network | uSCCs (nodes per uSCC) | DLACC-TI [s] | DLACC-TI: TI corrections | DLACC-TI: remaining errors | TSA [s] | DFT [s] |
| T-cell | 1 (33) | 0.79 | 71 | 2 | 0.34 | 0.02 |
| EGFR | 1 (33) | 1.18 | 183 | 1 | 0.61 | 0.26 |
| T-cell+EGFR | 2 (33, 33) | 6.0 ± 0.04 | 879 ± 79 | 3 ± 0 | 3.1 ± 0.03 | 419 ± 82 |
| Regulon DB | 1 (30) | 103 | 145 | 0 | 11.8 | 1.0 |
| CA1 neuron | 1 (154) | 25 | 1869 | 43* | 582* | 2213* |
| Cancer signaling | 4 (2, 2, 2, 445) | 243 | 2161 | n/a | >12 h | >12 h |
The running times for the different algorithms are shown and the quality of the approximation with DLACC-TI is assessed. The number of uSCCs in each network is also shown, together with the number of nodes that they contain (in parentheses). The column "TI corrections" gives the number of shorter paths that can be identified with transitive inference after having run DLACC. The "remaining errors" column shows how many shortest paths (after the TI step) differ in their length from the exact results delivered by TSA or DFT. When an algorithm ran longer than 12 hours, it was considered impractical and terminated. Therefore, no exact results were determined for the cancer signaling network and, consequently, the quality of the approximation with DLACC-TI cannot be given. (*) For the CA1 neuron, the search depths in the two-step algorithm and in the exhaustive search were limited to length 18 to make calculations practicable. Therefore, some paths may have been missed. The longest shortest path identified for this network with DLACC-TI is also of length 18. The computational environment is the same as in Table 1.