Daiji Fukagawa, Takeyuki Tamura, Atsuhiro Takasu, Etsuji Tomita, Tatsuya Akutsu.
Abstract
BACKGROUND: Measuring similarities between tree-structured data is important for analysis of RNA secondary structures, phylogenetic trees, glycan structures, and vascular trees. The edit distance is one of the most widely used measures for comparing tree-structured data. However, computing the edit distance for rooted unordered trees is known to be NP-hard. Furthermore, there is almost no available software tool that can compute the exact edit distance for unordered trees.
Year: 2011 PMID: 21342542 PMCID: PMC3044267 DOI: 10.1186/1471-2105-12-S1-S13
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of tree edit operations and an edit distance mapping under the unit cost model. T2 is obtained from T1 by deletion of the node labeled e, insertion of node k, and substitution of node f. The corresponding mapping M is shown by broken curves.
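As a minimal sketch of how a mapping determines a cost under the unit cost model, the snippet below counts one unit per substituted, deleted, and inserted node. The function name, the dictionary representation, and the toy labels are assumptions made for illustration; they are not the trees drawn in Figure 1, and the sketch does not check that the mapping preserves ancestor relations.

```python
def unit_cost_of_mapping(labels1, labels2, mapping):
    """Cost of an edit distance mapping under the unit cost model.

    labels1, labels2: dict node_id -> label for T1 and T2.
    mapping: dict from T1 node ids to T2 node ids (must be one-to-one).
    The caller is responsible for passing a mapping that preserves
    ancestor relations; this sketch does not verify it.
    """
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("mapping must be one-to-one")
    # A mapped pair with different labels is one substitution.
    substitutions = sum(
        1 for u, v in mapping.items() if labels1[u] != labels2[v]
    )
    # Unmapped nodes of T1 are deleted, unmapped nodes of T2 are inserted.
    deletions = len(labels1) - len(mapping)
    insertions = len(labels2) - len(mapping)
    return substitutions + deletions + insertions


# Hypothetical toy labels (not taken from Figure 1).
t1 = {1: "a", 2: "b", 3: "e", 4: "f"}
t2 = {1: "a", 2: "b", 3: "k", 4: "g"}
m = {1: 1, 2: 2, 4: 4}  # node 3 of T1 is deleted, node 3 of T2 is inserted
cost = unit_cost_of_mapping(t1, t2, m)
print(cost)
```

The edit distance is the minimum of this cost over all valid mappings, which is the part that is NP-hard for unordered trees.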
Figure 2. Example of the reduction from tree edit distance to maximum clique. We consider the case of γ(a, ε) = γ(ε, a) = 1, γ(a, a) = 0, and γ(a, b) = 2 for a ≠ b (i.e., f(v, v) = 2 and f(u, v) = 0 for ℓ(u) ≠ ℓ(v)). In the left figure, the label and the node ID are shown in the upper and lower parts of each node, respectively. Vertices with f(u) = 0 are omitted in the right figure. The maximum clique shown by bold lines in the right figure corresponds to the optimal edit distance mapping shown by broken lines in the left figure.
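The weights in the caption can be checked with a short sketch. It assumes that the weight of a mapped pair is f = γ(a, ε) + γ(ε, b) − γ(a, b), which is inferred from the values 2 and 0 quoted above rather than stated in this record; the function names and the toy trees are likewise hypothetical.

```python
EPS = None  # stands for the empty label (epsilon)


def gamma(a, b):
    """Edit costs from the Figure 2 caption."""
    if a is EPS or b is EPS:
        return 1  # deletion or insertion
    return 0 if a == b else 2  # match or substitution


def weight(a, b):
    """Assumed pair weight: cost saved by mapping a to b instead of
    deleting a and inserting b."""
    return gamma(a, EPS) + gamma(EPS, b) - gamma(a, b)


def distance_from_mapping(labels1, labels2, mapping):
    """Cost of a mapping = delete everything + insert everything - savings."""
    base = sum(gamma(a, EPS) for a in labels1.values())
    base += sum(gamma(EPS, b) for b in labels2.values())
    gain = sum(weight(labels1[u], labels2[v]) for u, v in mapping.items())
    return base - gain


# Hypothetical two-node trees, given only by their labels.
l1 = {1: "a", 2: "b"}
l2 = {1: "a", 2: "c"}
print(weight("a", "a"), weight("a", "b"))
print(distance_from_mapping(l1, l2, {1: 1}))
```

Under this assumption, minimizing the mapping cost is the same as maximizing the total weight of the mapped pairs, which is what a maximum (vertex weighted) clique search optimizes.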
CPU time on maximum vertex weighted clique-based method
| total number of nodes | average CPU time (sec.) |
|---|---|
| 30 ~ 34 | 0.004340 |
| 35 ~ 39 | 0.004990 |
| 40 ~ 44 | 0.015200 |
| 45 ~ 49 | 0.050800 |
| 50 ~ 54 | 0.473000 |
| 55 ~ 59 | 2.160000 |
| 60 ~ 64 | 3.020000 |
| 65 ~ 69 | 15.300000 |
| 70 ~ 74 | 4.380000 |
| 75 ~ 79 | 2.610000 |
| 80 ~ 84 | 7.930000 |
| 85 ~ 89 | 232.000000 |
CPU time on maximum clique-based method
| total number of nodes | average CPU time (sec.) |
|---|---|
| 30 ~ 34 | 0.010400 |
| 35 ~ 39 | 0.000191 |
| 40 ~ 44 | 0.000203 |
| 45 ~ 49 | 0.001100 |
| 50 ~ 54 | 0.000780 |
| 55 ~ 59 | 0.004530 |
| 60 ~ 64 | 0.125000 |
| 65 ~ 69 | 4.600000 |
| 70 ~ 74 | 0.016400 |
| 75 ~ 79 | 0.032800 |
| 80 ~ 84 | 0.000087 |
| 85 ~ 89 | 0.000032 |
Figure 3. ROC curve for leukemia dataset
Figure 4. ROC curve for erythrocyte dataset
Comparison of glycan similarity measures via AUC score
| similarity measure | AUC score (leukemia) | AUC score (erythrocyte) | CPU time (sec.) |
|---|---|---|---|
| global alignment score [ | 0.686 | 0.797 | 10.08 |
| local alignment score [ | 0.623 | 0.822 | 10.18 |
| ordered tree edit distance | 0.729 | 0.773 | 38.02 |
| unordered tree edit distance | 0.731 | 0.777 | 48.33 |
| reversed ordered tree | 0.730 | 0.769 | 37.92 |
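For readers unfamiliar with the AUC score, the sketch below computes it as the fraction of (positive, negative) pairs that a score ranks correctly, counting ties as half. The function name and the toy scores are hypothetical, and this record does not state how the scores behind the table were derived; for a distance measure, the distance would typically be negated first so that larger means more similar.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical similarity scores, not data from the study.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.2]
result = auc(positives, negatives)
print(round(result, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation.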
Figure 5. Comparison of unordered and ordered tree edit distances
Figure 6. Comparison of tree edit and glycan alignment