Jason T L Wang, Huiyuan Shan, Dennis Shasha, William H Piel.
Abstract
As phylogenetic databases grow, so does the need to search them efficiently. Thanks to previous and ongoing research, searching by attribute value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially in large databases and especially for approximate matches, is still an art. We propose structural search techniques that, given a query or pattern tree P and a database of phylogenies D, find trees in D that are sufficiently close to P. The "closeness" is a measure of the topological relationships in P that are found to be the same or similar in a tree from D. We develop a filtering technique that accelerates searches and present algorithms for rooted and unrooted trees, where the trees can be weighted or unweighted. Experimental results comparing the similarity measure with existing tree metrics and evaluating the efficiency of the search techniques demonstrate that the proposed approach is promising.
Keywords: Structural pattern matching; phylogenetic trees; structural search and retrieval; tree search strategies
Year: 2007 PMID: 19325851 PMCID: PMC2658875
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Figure 1. Illustration of up and down operations between two nodes in a tree.
Figure 2. A tree and its up and down matrices.
Figure 3. Example trees.
Figure 4. Example showing how the data tree reduction technique works in near neighbour searching.
Figure 5. Distribution of PAR metric values.
Figure 9. Distribution of USim values.
Figure 8. Distribution of QUA metric values.
Comparison of the five studied tree metrics
| Metric | Weighted trees | Internal labels | Unresolved trees | Different taxa | Polynomial-time computable | Reference |
|---|---|---|---|---|---|---|
| PAR | N | N | Y | N | Y | ( |
| MAST | N | Y | N | Y | Y | ( |
| NNI | N | N | N | N | N | (DasGupta et al 1995) |
| QUA | N | N | Y | N | Y | ( |
| WSSP | Y | Y | Y | Y | Y | |
Figure 10. Running times on 1,000 synthetic trees for search methods with and without the filter.
Figure 11. Running times of the proposed search method on different sizes of databases.
Figure 12. An example query and search results displayed via the Web-based interface of the proposed search engine.
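
The captions of Figures 1 and 2 refer to up and down operations between two nodes and to a tree's up and down matrices. The sketch below is one possible reading of those terms, not the authors' implementation. It assumes that an "up" is a step from a node to its parent and a "down" is a step from a parent to a child, so that the path from u to v goes up to their lowest common ancestor and then down. The function names, the parent-map representation, and the toy tree are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumption: a rooted tree is stored as a child -> parent map; the root maps to None.
ParentMap = Dict[str, Optional[str]]


def path_to_root(parent: ParentMap, node: str) -> List[str]:
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def up_down(parent: ParentMap, u: str, v: str) -> Tuple[int, int]:
    """Count (ups, downs) on the path u -> lowest common ancestor -> v."""
    steps_up_from_u = {anc: i for i, anc in enumerate(path_to_root(parent, u))}
    # The first ancestor of v that is also an ancestor of u is the lowest common one.
    for downs, anc in enumerate(path_to_root(parent, v)):
        if anc in steps_up_from_u:
            return steps_up_from_u[anc], downs
    raise ValueError("nodes are not in the same tree")


def up_down_matrices(parent: ParentMap):
    """Build nested dicts up[u][v] and down[u][v] for every ordered node pair."""
    nodes = sorted(parent)
    up = {u: {v: up_down(parent, u, v)[0] for v in nodes} for u in nodes}
    down = {u: {v: up_down(parent, u, v)[1] for v in nodes} for u in nodes}
    return up, down


# Hypothetical toy tree: root r has children a and x; x has children b and c.
example_parent: ParentMap = {"r": None, "a": "r", "x": "r", "b": "x", "c": "x"}
example_up, example_down = up_down_matrices(example_parent)
```

Under these assumptions, `example_up["a"]["b"]` is 1 and `example_down["a"]["b"]` is 2, because the path from a to b climbs one edge to r and then descends two edges through x. Comparing such matrices between a query tree and a database tree is one way a structural closeness score could be built; the specific measure used in the paper is not reproduced here.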