Abstract
BACKGROUND: Measuring sequence similarity is central to many problems in bioinformatics. In several contexts, alignment-free techniques based on exact occurrences of substrings are faster, but also less accurate, than alignment-based approaches. Recently, several studies have attempted to bridge the accuracy gap by introducing approximate matches into the definition of composition-based similarity measures.
Keywords: Alignment free; Compositional approaches; Mismatches; Phylogenetic analysis; Sequence similarity
Year: 2016 PMID: 27103940 PMCID: PMC4839165 DOI: 10.1186/s13015-016-0072-x
Source DB: PubMed Journal: Algorithms Mol Biol ISSN: 1748-7188 Impact factor: 1.405
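To make the idea of "approximate matches" in a composition-based measure concrete, here is a minimal sketch in Python. It is not the MissMax algorithm (whose filtering and heuristics are not described in this excerpt); it is a naive quadratic-time illustration that, for each position of one sequence, finds the longest match anywhere in the other sequence while tolerating at most `k` mismatches. The function names and the convention that mismatched positions count toward the match length are assumptions made for illustration only.

```python
def extend_with_mismatches(x, y, i, j, k):
    """Length of the longest common extension of x[i:] and y[j:]
    that contains at most k mismatching positions (hypothetical helper)."""
    length = 0
    mismatches = 0
    while i + length < len(x) and j + length < len(y):
        if x[i + length] != y[j + length]:
            if mismatches == k:
                break  # the next mismatch would exceed the budget
            mismatches += 1
        length += 1
    return length


def longest_match_lengths(x, y, k):
    """For every start position i in x, the longest k-mismatch match
    against any start position j in y (brute force over all pairs)."""
    return [
        max((extend_with_mismatches(x, y, i, j, k) for j in range(len(y))), default=0)
        for i in range(len(x))
    ]


def average_match_length(x, y, k):
    """Mean of the per-position longest match lengths; 0.0 for empty x."""
    lengths = longest_match_lengths(x, y, k)
    return sum(lengths) / len(lengths) if lengths else 0.0
```

With `k = 0` this reduces to exact substring matching; raising `k` lets matches extend through substitutions, which is the kind of relaxation the abstract refers to. The brute-force scan examines every pair of positions, which is the quadratic cost that a filtering approach would aim to avoid.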
Fig. 1 Candidate from when
Fig. 2 Candidate from when
Fig. 3 Guessing the maximum extension between the suffix and a candidate
Average time (in seconds) for the comparison of two sequences on several datasets with the “relaxed” version of MissMax
| k | Rodents | Carnivora | Mixed | Random |
|---|---|---|---|---|
| 5 | 2.11 | 2.22 | 2.15 | 1.85 |
| 10 | 2.45 | 2.56 | 2.47 | 2.18 |
| 20 | 3.07 | 3.17 | 3.09 | 2.72 |
| 50 | 4.74 | 4.87 | 4.82 | 4.28 |
| 100 | 7.42 | 7.62 | 7.57 | 6.65 |
Average time (in seconds) for the comparison of two sequences on several datasets with the exact version of MissMax
| k | Rodents | Carnivora | Mixed | Random |
|---|---|---|---|---|
| 5 | 4.50 | 5.16 | 4.45 | 3.26 |
| 10 | 5.16 | 6.39 | 5.64 | 3.81 |
| 20 | 7.47 | 8.46 | 7.52 | 4.85 |
| 50 | 12.08 | 14.29 | 12.90 | 8.26 |
| 100 | 21.01 | 23.55 | 21.28 | 13.15 |
Fig. 4 Time performance of relaxed MissMax for different values of k, as a function of the input length
Fig. 5 The tree for the 27 primates dataset reconstructed by MissMax with . It is in perfect agreement with the reference tree reported in [11]
Average percentage of pairs of positions considered in several datasets with the relaxed filter, with respect to the quadratic maximum number of pairs
| Relaxed MissMax | k = 5 | k = 10 | k = 20 | k = 50 | k = 100 |
|---|---|---|---|---|---|
| Rodents | 7.26 | 8.37 | 9.15 | 9.74 | 9.93 |
| Mixed | 7.16 | 8.15 | 8.76 | 9.28 | 9.79 |
| Random | 9.73 | 13.31 | 15.17 | 15.29 | 15.27 |
Average percentage of pairs of positions considered in several datasets with the exact filter, with respect to the quadratic maximum number of pairs
| Exact MissMax | k = 5 | k = 10 | k = 20 | k = 50 | k = 100 |
|---|---|---|---|---|---|
| Rodents | 9.16 | 12.38 | 16.08 | 18.13 | 19.89 |
| Mixed | 9.34 | 12.64 | 16.45 | 17.93 | 19.32 |
| Random | 15.40 | 26.92 | 33.51 | 32.56 | 31.54 |