Dimitris Papamichail, Georgios Papamichail.
Abstract
BACKGROUND: Approximate string matching is important in many areas, including computational biology, text processing and pattern recognition. Considerable effort has gone into designing efficient algorithms for several variants of the problem, including comparison of two strings, approximate pattern identification in a string, and calculation of the longest common subsequence that two strings share.
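As an illustration of the "comparison of two strings" variant, the following is a minimal Python sketch of the textbook dynamic-programming edit distance. It assumes unit costs for insertion, deletion and substitution; the function name `edit_distance` is hypothetical, and this is the standard quadratic method rather than the algorithm proposed in the article.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance via the classic O(len(a) * len(b)) recurrence."""
    if len(a) < len(b):
        a, b = b, a  # iterate rows over the longer string so each row is short
    # prev[j] holds the distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
```

Only two rows are kept in memory at a time, so the sketch uses linear space but still evaluates every cell of the edit graph.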
Year: 2009 PMID: 19208109 PMCID: PMC2648743 DOI: 10.1186/1471-2105-10-S1-S10
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Dependency graph.
Figure 2. Edit graphs under different scoring schemes. Cell values and optimal paths of the edit graph under different scoring schemes.
Figure 3. Edit distance algorithm iterations. The main diagonal is depicted in blue; iteration transitions are drawn alternately in red and green. Cells whose values are shown were inserted into the list at the end of each iteration, whereas cells whose values are circled were removed from the list, dominated by the cells to which they are connected by arcs.
Figure 4. Performance on random strings. Edit distance calculations on random strings with different length ratios, comparing the performance of our algorithm, Ukkonen's algorithm and the basic algorithm.
Algorithm performance comparing biologically related sequences of similar length
| Sequence A | Sequence B | Alphabet size | (Average) length | Our algorithm | Ukkonen's algorithm | Basic algorithm | (Average) edit distance |
|---|---|---|---|---|---|---|---|
| Random 16S | Random 16S | 4 | 1350 | 0.679 | 0.811 | 2.554 | 421.3 |
| Hyphomonas 16S | Hyphomonas 16S | 4 | 1330 | 0.25 | 0.18 | 2.14 | 46 |
| Alphaproteobacteria 16S | Betaproteobacteria 16S | 4 | 1320 | 0.42 | 0.46 | 2.07 | 318 |
| Cucumber necrosis | Lisianthus necrosis | 4 | 4790 | 6.70 | 6.32 | 28.27 | 1154 |
| Human poliovirus 1 | Human Rhinovirus A | 20 | 870 | 1.02 | 1.05 | 0.88 | 472 |
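
To make the idea behind diagonal-restricted methods concrete, here is a hedged Python sketch of a threshold-doubling band computation in the spirit of Ukkonen's cutoff approach. It assumes unit edit costs; the names `banded_distance` and `edit_distance_doubling` are hypothetical, and the code is not the implementation benchmarked in the table above, nor the article's own algorithm.

```python
from typing import Optional


def banded_distance(a: str, b: str, t: int) -> Optional[int]:
    """Return the unit-cost edit distance if it is <= t, otherwise None.

    Only cells (i, j) with |i - j| <= t are evaluated; each row is stored
    as a band of width 2t + 1, where index k corresponds to j = i + k - t.
    """
    n, m = len(a), len(b)
    if abs(n - m) > t:
        return None  # the end cell lies outside the band
    inf = t + 1      # any value above t is treated as "too large"
    width = 2 * t + 1
    prev = [inf] * width
    for k in range(t, width):          # row 0: cell (0, j) costs j insertions
        if k - t <= m:
            prev[k] = k - t
    for i in range(1, n + 1):
        curr = [inf] * width
        for k in range(width):
            j = i + k - t
            if j < 0 or j > m:
                continue
            if j == 0:
                curr[k] = i            # i deletions; here i <= t
                continue
            cost = 0 if a[i - 1] == b[j - 1] else 1
            best = prev[k] + cost                      # diagonal (i-1, j-1)
            if k + 1 < width:
                best = min(best, prev[k + 1] + 1)      # up (i-1, j)
            if k > 0:
                best = min(best, curr[k - 1] + 1)      # left (i, j-1)
            curr[k] = min(best, inf)
        prev = curr
    d = prev[m - n + t]
    return d if d <= t else None


def edit_distance_doubling(a: str, b: str) -> int:
    """Double the threshold until the band is wide enough to hold an optimal path."""
    t = max(1, abs(len(a) - len(b)))
    while True:
        d = banded_distance(a, b, t)
        if d is not None:
            return d
        t *= 2


if __name__ == "__main__":
    print(edit_distance_doubling("kitten", "sitting"))
```

Because the band width tracks the threshold, the work done by this sketch grows with the edit distance rather than with the full size of the edit graph, which is why such methods tend to pay off when the compared sequences are similar.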