Marco Beccuti, Rosalba Giugno, Nicola Licheri, Vincenzo Bonnici.
Abstract
BACKGROUND: Graphs are mathematical structures widely used to express relationships among elements when representing biomedical and biological information. Several analyses are performed on top of these representations. A common task is searching for one substructure, the query, within a single graph, called the target. This problem is referred to as one-to-one subgraph search, and it is known to be NP-complete. Heuristics and indexing techniques can be applied to speed up the search. Indexing techniques are also exploited when searching in a collection of target graphs, a setting referred to as the one-to-many subgraph problem. Filter-and-verification methods that use indexing quickly prune target graphs, or parts of them, that do not contain the query. The expensive verification phase is then performed only on the subset of promising targets. Indexing strategies extract graph features at a granularity sufficient for a powerful filtering step, and the features are stored in data structures that allow efficient access. Index size, querying time and filtering power are key factors in the development of efficient subgraph searching solutions.
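The filter-and-verification idea described in the abstract can be illustrated with a minimal sketch. It is not the GRAPES or GRAPES-DD implementation: the function names (`path_features`, `passes_filter`, `verify`, `search`), the choice of labeled simple paths as features, and the brute-force verifier are all assumptions made for illustration. Graphs are assumed to be undirected and given as a label dictionary plus a symmetric adjacency dictionary.

```python
from collections import Counter
from itertools import permutations


def path_features(labels, adj, max_len):
    """Count the label sequences of all simple paths with at most max_len nodes."""
    counts = Counter()

    def extend(path):
        counts[tuple(labels[v] for v in path)] += 1
        if len(path) < max_len:
            for w in adj[path[-1]]:
                if w not in path:
                    extend(path + [w])

    for v in adj:
        extend([v])
    return counts


def passes_filter(query_feats, target_feats):
    """A target can contain the query only if it has every query feature
    at least as many times as the query does."""
    return all(target_feats[f] >= c for f, c in query_feats.items())


def verify(q_labels, q_adj, t_labels, t_adj):
    """Brute-force (exponential) check for a label- and edge-preserving
    injective mapping of the query into the target."""
    q_nodes = list(q_adj)
    for image in permutations(t_adj, len(q_nodes)):
        m = dict(zip(q_nodes, image))
        labels_ok = all(q_labels[u] == t_labels[m[u]] for u in q_nodes)
        edges_ok = all(m[v] in t_adj[m[u]] for u in q_nodes for v in q_adj[u])
        if labels_ok and edges_ok:
            return True
    return False


def search(query, targets, max_len=3):
    """Filter targets by path features, then verify only the survivors."""
    q_labels, q_adj = query
    q_feats = path_features(q_labels, q_adj, max_len)
    hits = []
    for name, (t_labels, t_adj) in targets.items():
        t_feats = path_features(t_labels, t_adj, max_len)
        if passes_filter(q_feats, t_feats) and verify(q_labels, q_adj, t_labels, t_adj):
            hits.append(name)
    return hits


# Toy data: the query is a single A-B edge.
query = ({0: "A", 1: "B"}, {0: [1], 1: [0]})
targets = {
    "g1": ({0: "A", 1: "B", 2: "C"}, {0: [1, 2], 1: [0, 2], 2: [0, 1]}),  # triangle
    "g2": ({0: "A", 1: "B", 2: "C"}, {0: [2], 1: [2], 2: [0, 1]}),        # A-C-B path
}
print(search(query, targets))
```

In this toy run, `g2` has no path labeled A-B, so it is discarded by the filter and the expensive `verify` call is made only for `g1`. A real index would store the target features once, in a compact structure, rather than recomputing them per query.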
Keywords: Decision diagrams; Graph indexing; Pattern matching; Query processing; Subgraph isomorphism
Year: 2021 PMID: 33888059 PMCID: PMC8061067 DOI: 10.1186/s12859-021-04129-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Different kinds of MTMDD encoding the function that counts the occurrences of an element in the multiset S: (a) an OMTMDD; (b) a ROMTMDD; (c) a QROMTMDD
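The kind of structure named in the Fig. 1 caption can be sketched in a few lines, assuming the usual definition of a quasi-reduced ordered diagram: variables appear in a fixed order, duplicate nodes are merged, and no level is skipped. The class name `QuasiReducedMTMDD`, its methods, and the tuple-based node encoding are hypothetical and are not taken from the paper or from any decision-diagram library.

```python
from collections import Counter


class QuasiReducedMTMDD:
    """Toy diagram encoding f(x) = number of occurrences of tuple x in a multiset."""

    def __init__(self, multiset, domain, depth):
        self.domain = tuple(domain)   # values each variable can take
        self.depth = depth            # number of variables (levels)
        self.unique = {}              # node key -> node id (merges duplicates)
        self.nodes = []               # node id -> node key
        self.root = self._build(dict(Counter(multiset)), 0)

    def _intern(self, key):
        # Return the existing id for an identical node, or create a new one.
        if key not in self.unique:
            self.unique[key] = len(self.nodes)
            self.nodes.append(key)
        return self.unique[key]

    def _build(self, counts, level):
        if level == self.depth:
            # Terminal node holding the occurrence count (0 if absent).
            return self._intern(("t", counts.get((), 0)))
        children = tuple(
            self._build({k[1:]: c for k, c in counts.items() if k[0] == a}, level + 1)
            for a in self.domain
        )
        # Quasi-reduced: the node is kept even if all its children are equal.
        return self._intern((level, children))

    def evaluate(self, x):
        node = self.root
        for a in x:
            _, children = self.nodes[node]
            node = children[self.domain.index(a)]
        return self.nodes[node][1]


S = [("a", "b"), ("b", "b")]
dd = QuasiReducedMTMDD(S, domain=("a", "b"), depth=2)
print(dd.evaluate(("a", "b")), dd.evaluate(("a", "a")), len(dd.nodes))
```

For this multiset the two second-level nodes are identical and are stored once, so the diagram has 4 nodes where an unshared ordered tree would have 7. A fully reduced variant would additionally drop the root, since both of its children point to the same node.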
Fig. 2 A multiplication operation between two QROMTMDDs
Fig. 3 GRAPES-DD workflow with path length
Fig. 4 GRAPES-DD indexing of a target graph using an MTMDD built from partial tries
Fig. 5 GRAPES/GRAPES-DD ratios of memory peak (as a RAM requirement) and index size (as a storage requirement), obtained by indexing Barabasi-Albert graphs. The chart was made with the boxplot function of the Python 3 pandas module
Fig. 6 GRAPES/GRAPES-DD ratios of memory peak (as a RAM requirement) and index size (as a storage requirement), obtained by indexing Forest-Fire graphs. The chart was made with the boxplot function of the Python 3 pandas module
Indexing comparison of GRAPES and GRAPES-DD on synthetic graphs in terms of RAM requirement, storage requirement, and build time
| Graphs | | RAM GRAPES-DD (MB) | RAM GRAPES (MB) | RAM ratio | Storage GRAPES-DD (MB) | Storage GRAPES (MB) | Storage ratio | Build time GRAPES-DD (s) | Build time GRAPES (s) |
|---|---|---|---|---|---|---|---|---|---|
| Barabasi-Albert | 0.1% | 3649 | 7935 | 2.2 | 305 | 3493 | 11.5 | 470 | 109 |
| | 1% | 8229 | 66,838 | 8.1 | 1543 | 28,772 | 18.6 | 646 | 214 |
| | 10% | 7876 | 81,552 | 10.4 | 10,071 | 34,368 | 3.4 | 668 | 265 |
| | 0.5 | 2103 | 94,654 | 45.0 | 24,915 | 40,281 | 1.6 | 516 | 330 |
| | 1 | 8351 | 58,519 | 7.0 | 16,602 | 25,145 | 1.5 | 766 | 213 |
| | 1.5 | 1447 | 3068 | 2.1 | 1246 | 1219 | 1.0 | 144 | 26 |
| Forest-Fire | 0.1% | 834 | 3929 | 4.7 | 121 | 1689 | 13.9 | 63 | 29 |
| | 1% | 1308 | 21,255 | 16.2 | 738 | 9229 | 12.5 | 77 | 66 |
| | 10% | 1167 | 24,936 | 21.4 | 5351 | 10,650 | 2.0 | 62 | 70 |
| | 0.1 | 147 | 1426 | 9.7 | 2882 | 585 | 0.2 | 10 | 7 |
| | 0.3 | 188 | 2451 | 13.1 | 3358 | 1024 | 0.3 | 14 | 9 |
| | 0.5 | 281 | 4966 | 17.7 | 3922 | 2109 | 0.5 | 26 | 16 |
| | 0.7 | 487 | 11,694 | 24.0 | 4535 | 5020 | 1.1 | 57 | 37 |
| | 0.9 | 988 | 29,565 | 29.9 | 5386 | 12,840 | 2.4 | 139 | 88 |
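The ratio columns in the table above appear to be the GRAPES value divided by the GRAPES-DD value, which matches the "GRAPES/GRAPES-DD ratios" wording of the Fig. 5 and Fig. 6 captions. A ratio above 1 therefore means GRAPES-DD needs less of the resource. The short check below recomputes two rows under that assumption. The helper name `ratio` and the row labels are introduced here for illustration, and the published values were presumably computed from unrounded measurements, so the last digit of other rows may differ slightly.

```python
def ratio(grapes: float, grapes_dd: float) -> float:
    """Return GRAPES / GRAPES-DD rounded to one decimal place."""
    return round(grapes / grapes_dd, 1)


# (label, RAM GRAPES-DD, RAM GRAPES, storage GRAPES-DD, storage GRAPES), all in MB
rows = [
    ("Barabasi-Albert 0.1%", 3649, 7935, 305, 3493),
    ("Barabasi-Albert 1%", 8229, 66838, 1543, 28772),
]

for label, ram_dd, ram_g, sto_dd, sto_g in rows:
    print(label, "RAM ratio:", ratio(ram_g, ram_dd), "storage ratio:", ratio(sto_g, sto_dd))
```

Both rows reproduce the tabulated ratios (2.2 and 11.5, then 8.1 and 18.6).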
Indexing comparison of GRAPES and GRAPES-DD on biochemical datasets in terms of RAM requirement, storage requirement, and build time
| RAM GRAPES-DD (MB) | RAM GRAPES (MB) | RAM ratio | Storage GRAPES-DD (MB) | Storage GRAPES (MB) | Storage ratio | Build time GRAPES-DD (s) | Build time GRAPES (s) |
|---|---|---|---|---|---|---|---|
| 5304 | 1064 | 0.20 | 164 | 39 | 0.24 | 170.12 | 16 |
| 532 | 556 | 1.04 | 22 | 17 | 0.78 | 176.00 | 10.07 |
| 512 | 7057 | 13.77 | 253 | 1392 | 5.51 | 617.24 | 754.56 |
| 629 | 1698 | 2.70 | 166 | 665 | 4.00 | 2514.18 | 2906.65 |
Fig. 7 Cumulative time for running 100 queries over the four collections of biochemical graphs. The chart was made with the plot function of the Python 3 pandas module
Indexing comparison of GRAPES and GRAPES-DD on single PPI networks in terms of RAM requirement, storage requirement, and build time
| Species | | | RAM GRAPES-DD (MB) | RAM GRAPES (MB) | RAM ratio | Storage GRAPES-DD (MB) | Storage GRAPES (MB) | Storage ratio | Build time GRAPES-DD (s) | Build time GRAPES (s) |
|---|---|---|---|---|---|---|---|---|---|---|
| | 4709 | 40,284 | 38.1 | 91.1 | 2.39 | 10.5 | 26.2 | 2.49 | 60.93 | 63.49 |
| | 5230 | 53,699 | 56.3 | 136.3 | 2.42 | 11.7 | 42.4 | 3.63 | 604.47 | 630.92 |
| | 5762 | 76,482 | 61.1 | 150.9 | 2.47 | 12.7 | 47.5 | 3.73 | 846.24 | 879.90 |
| | 5936 | 89,674 | 46.5 | 121.2 | 2.61 | 12.9 | 36.8 | 2.86 | 128.52 | 135.01 |
| | 1557 | 2472 | 7.1 | 5.7 | 0.80 | 0.2 | 0.3 | 1.45 | 0.24 | 0.20 |
| | 2421 | 3981 | 7.8 | 7.9 | 1.02 | 0.5 | 1.0 | 2.06 | 0.52 | 0.44 |
| | 3664 | 7005 | 10.4 | 14.6 | 1.40 | 1.0 | 2.8 | 2.65 | 1.33 | 1.13 |
| | 6173 | 26,184 | 25.1 | 58.5 | 2.33 | 4.3 | 16.0 | 3.75 | 34.16 | 34.19 |
| | 1185 | 2008 | 8.0 | 12.1 | 1.51 | 0.6 | 1.5 | 2.30 | 0.31 | 0.26 |
| | 2488 | 6151 | 12.1 | 32.1 | 2.65 | 1.9 | 6.6 | 3.47 | 3.28 | 3.21 |
| | 2729 | 7235 | 13.3 | 36.2 | 2.73 | 2.3 | 7.9 | 3.37 | 4.30 | 4.22 |
| | 7928 | 37,542 | 52.3 | 198.7 | 3.80 | 13.1 | 64.3 | 4.90 | 144.05 | 156.07 |
| M. musculus | 1810 | 2413 | 8.0 | 13.1 | 1.64 | 0.7 | 2.2 | 2.94 | 0.42 | 0.36 |
| | 3255 | 5424 | 11.0 | 31.0 | 2.81 | 1.9 | 7.1 | 3.68 | 2.52 | 2.50 |
| | 3758 | 6853 | 13.1 | 43.6 | 3.33 | 2.6 | 11.2 | 4.30 | 4.47 | 4.61 |
| | 6875 | 23,779 | 41.1 | 193.6 | 4.71 | 12.4 | 62.1 | 5.02 | 76.64 | 81.56 |
| | 4638 | 10,665 | 17.3 | 55.0 | 3.18 | 3.6 | 14.6 | 4.07 | 5.60 | 5.36 |
| | 8728 | 31,164 | 53.5 | 215.4 | 4.02 | 13.1 | 70.8 | 5.42 | 65.14 | 68.26 |
| | 9826 | 48,835 | 87.5 | 351.4 | 4.02 | 21.5 | 120.1 | 5.59 | 213.95 | 230.16 |
| | 10,186 | 51,484 | 89.2 | 391.8 | 4.39 | 22.0 | 134.1 | 6.10 | 191.63 | 209.63 |