Md Aashikur Rahman Azim, Costas S Iliopoulos, M Sohel Rahman, M Samiruzzaman.
Abstract
This paper deals with the circular pattern matching (CPM) problem, which arises in many biological contexts. CPM consists of finding all occurrences of the rotations of a pattern 𝒫 of length m in a text 𝒯 of length n. In this paper, we present SimpLiFiCPM (pronounced "Simplify CPM"), a simple and lightweight filter-based algorithm to solve the problem. We compare our algorithm with the state-of-the-art algorithms and the results are found to be excellent. Much of the speed of our algorithm comes from the fact that our filters are effective but extremely simple and lightweight.
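To make the problem statement concrete, the following is a minimal brute-force sketch of CPM in Python. It is not SimpLiFiCPM and uses no filters; the function name `circular_matches` is a hypothetical name chosen for illustration. It relies only on the standard fact that every rotation of a pattern of length m is a length-m substring of the pattern concatenated with itself.

```python
def circular_matches(pattern: str, text: str) -> list[int]:
    """Return every start position in `text` where some rotation of
    `pattern` occurs. Brute-force baseline, for illustration only."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    # Every rotation of the pattern is a length-m substring of pattern+pattern.
    doubled = pattern + pattern
    rotations = {doubled[i:i + m] for i in range(m)}
    # Compare each length-m window of the text against the set of rotations.
    return [i for i in range(len(text) - m + 1) if text[i:i + m] in rotations]


if __name__ == "__main__":
    # Rotations of "abc" are "abc", "bca", and "cab".
    print(circular_matches("abc", "xbcabcax"))
```

This baseline hashes one length-m window per text position, which is the kind of per-position cost a filter-based approach tries to avoid for most windows.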
Year: 2015 PMID: 26557649 PMCID: PMC4628665 DOI: 10.1155/2015/259320
Source DB: PubMed Journal: Int J Genomics ISSN: 2314-436X Impact factor: 2.326
Algorithm 1: Exact circular pattern signature using Observations 1–6 in a single pass.
Algorithm 2: Reduction of search space in a text string using Procedure ECPS_FT.
An example simulation of SimpLiFiCPM.
| Iteration | Local total sum | abs sum | Actual sum | Local individual sum [0 : 4] | modulus sum | xor sum | Does it match with pattern? | Output file |
|---|---|---|---|---|---|---|---|---|
| 1 | 18 | 14 | 0 | {2, 2, 6, 8} | 5 | 28 | YES | |
| 2 | 15 | 12 | 0 | {3, 2, 6, 4} | 4 | 18 | NO | $ |
| 3 | 13 | 8 | 0 | {4, 2, 3, 4} | 3 | 14 | NO | |
| 4 | 15 | 8 | 0 | {3, 2, 6, 4} | 6 | 18 | NO | |
| 5 | 15 | 8 | 0 | {3, 2, 6, 4} | 6 | 18 | NO | |
| 6 | 14 | 10 | 0 | {4, 0, 6, 4} | 5 | 18 | NO | |
| 7 | 12 | 6 | 0 | {5, 0, 3, 4} | 4 | 14 | NO | |
| 8 | 15 | 12 | 0 | {4, 0, 3, 8} | 5 | 24 | NO | |
| 9 | 16 | 12 | 0 | {3, 2, 3, 8} | 5 | 28 | NO | |
| 10 | 18 | 10 | 0 | {2, 2, 6, 8} | 6 | 24 | NO | |
| 11 | 16 | 14 | 0 | {3, 2, 3, 8} | 4 | 24 | NO | |
| 12 | 16 | 14 | 0 | {3, 2, 3, 8} | 4 | 24 | NO | |
| 13 | 18 | 14 | 0 | {2, 2, 6, 8} | 5 | 28 | YES | |
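The simulation above compares cheap per-window quantities against the pattern's values before declaring a match. The sketch below illustrates that general filter-then-verify idea under stated assumptions: it uses two simplified, hypothetical filters (the sum and the XOR of character codes, both unchanged by rotation) rather than the paper's Observations 1–6, and the function name `filtered_circular_matches` is invented for this example.

```python
def filtered_circular_matches(pattern: str, text: str) -> list[int]:
    """Filter-then-verify sketch for circular pattern matching.
    The two filters here are simplified stand-ins (assumptions),
    not the filters defined in the paper."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    p = [ord(c) for c in pattern]
    t = [ord(c) for c in text]

    # Pattern signature: both values are invariant under rotation.
    target_sum = sum(p)
    target_xor = 0
    for v in p:
        target_xor ^= v

    # Signature of the first text window.
    win_sum = sum(t[:m])
    win_xor = 0
    for v in t[:m]:
        win_xor ^= v

    doubled = pattern + pattern  # contains every rotation of the pattern
    hits = []
    for i in range(n - m + 1):
        if i > 0:
            # Slide the window by one position in O(1).
            win_sum += t[i + m - 1] - t[i - 1]
            win_xor ^= t[i - 1] ^ t[i + m - 1]
        # Cheap filters first; verify only the windows that survive.
        if win_sum == target_sum and win_xor == target_xor:
            if text[i:i + m] in doubled:
                hits.append(i)
    return hits


if __name__ == "__main__":
    print(filtered_circular_matches("abc", "xbcabcax"))
```

A window that passes both filters is only a candidate, so the final substring check is what guarantees correctness; the filters merely reduce how often that check runs.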
Elapsed time (in seconds) and speed-up comparisons among Filter-CPM [8], ACSMF-SimpleZerok [4], and SimpLiFiCPM on a text of size 299 MB.
| m | Elapsed time (s) of ACSMF-SimpleZero | Elapsed time (s) of Filter-CPM | Speed-up of Filter-CPM over ACSMF-SimpleZero | Elapsed time (s) of SimpLiFiCPM | Speed-up of SimpLiFiCPM over ACSMF-SimpleZero |
|---|---|---|---|---|---|
| 500 | 5.938 | 3.025 | 2 | 1.167 | 5 |
| 550 | 7.914 | 3.068 | 3 | 1.456 | 5 |
| 600 | 7.691 | 3.06 | 3 | 1.364 | 6 |
| 650 | 7.836 | 3.074 | 3 | 1.006 | 8 |
| 700 | 7.739 | 3.072 | 3 | 1.028 | 8 |
| 750 | 7.82 | 3.051 | 3 | 1.073 | 7 |
| 800 | 7.839 | 3.209 | 2 | 1.04 | 8 |
| 850 | 8.382 | 3.053 | 3 | 1.055 | 8 |
| 900 | 7.646 | 3.055 | 3 | 1.278 | 6 |
| 950 | 7.876 | 3.049 | 3 | 1.402 | 6 |
| 1000 | 7.731 | 3.067 | 3 | 1.216 | 6 |
| 1600 | 7.334 | 3.206 | 2 | 1.182 | 6 |
| 1650 | 8.239 | 3.063 | 3 | 0.969 | 9 |
| 1700 | 7.572 | 3.059 | 2 | 1.18 | 6 |
| 1750 | 5.968 | 3.066 | 2 | 1.144 | 5 |
| 1800 | 7.551 | 3.064 | 2 | 1.179 | 6 |
| 1850 | 7.407 | 3.079 | 2 | 1.086 | 7 |
| 1900 | 7.861 | 3.225 | 2 | 1.126 | 7 |
| 1950 | 7.339 | 3.073 | 2 | 1.028 | 7 |
| 2000 | 7.814 | 3.062 | 3 | 1.118 | 7 |
| 2050 | 5.969 | 3.057 | 2 | 1.988 | 3 |
| 2100 | 5.173 | 3.036 | 2 | 1.187 | 4 |
| 2150 | 5.317 | 3.027 | 2 | 1.919 | 3 |
| 2200 | 6.032 | 3.168 | 2 | 1.927 | 3 |
| 2250 | 5.009 | 3.073 | 2 | 1.895 | 3 |
| 2300 | 5.029 | 3.024 | 2 | 1.891 | 3 |
| 2350 | 5.041 | 3.047 | 2 | 1.887 | 3 |
| 2400 | 6.036 | 3.046 | 2 | 1.91 | 3 |
| 2450 | 6.04 | 3.037 | 2 | 1.886 | 3 |
| 2500 | 7.046 | 3.029 | 2 | 1.976 | 4 |
| 2550 | 7.042 | 3.037 | 2 | 1.987 | 4 |
| 2600 | 8.043 | 4.029 | 2 | 2.883 | 3 |
| 2650 | 8.049 | 4.03 | 2 | 2.884 | 3 |
| 2700 | 8.031 | 4.183 | 2 | 2.892 | 3 |
| 2750 | 8.039 | 4.044 | 2 | 2.882 | 3 |
| 2800 | 9.026 | 4.067 | 2 | 2.886 | 3 |
| 2850 | 9.154 | 4.036 | 2 | 2.901 | 3 |
| 2900 | 10.049 | 4.045 | 2 | 3.134 | 3 |
| 2950 | 11.044 | 5.052 | 2 | 3.876 | 3 |
| 3000 | 12.044 | 6.039 | 2 | 3.9 | 3 |
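The speed-up columns in the table above appear to be the ratio of the ACSMF-SimpleZero elapsed time to the other algorithm's elapsed time, rounded to the nearest integer. The short sketch below reproduces a few entries under that assumption; the helper name `speed_up` is hypothetical and the rounding rule is inferred from the tabulated values, not stated in the source.

```python
def speed_up(baseline_seconds: float, candidate_seconds: float) -> int:
    """Speed-up of a candidate over a baseline, as a rounded ratio.
    The rounding rule is an assumption inferred from the table."""
    return round(baseline_seconds / candidate_seconds)


if __name__ == "__main__":
    # First data row: 5.938 s (ACSMF-SimpleZero), 3.025 s (Filter-CPM),
    # 1.167 s (SimpLiFiCPM).
    print(speed_up(5.938, 3.025))
    print(speed_up(5.938, 1.167))
```

Because the ratios are rounded, two rows with the same reported speed-up can differ noticeably in their underlying elapsed times.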
Elapsed time (in seconds) and speed-up comparisons among ACSMF-SimpleZerok and three variants of SimpLiFiCPM (considering different combinations of the filters) for a text of size 299 MB.
| m | Filters 1 to 3: elapsed time (s) of ACSMF-SimpleZero | Filters 1 to 3: elapsed time (s) of SimpLiFiCPM-[1 ⋯ 3] | Filters 1 to 3: speed-up over ACSMF-SimpleZero | Filters 1 to 4: elapsed time (s) of ACSMF-SimpleZero | Filters 1 to 4: elapsed time (s) of SimpLiFiCPM-[1 ⋯ 4] | Filters 1 to 4: speed-up over ACSMF-SimpleZero | Filters 1 to 5: elapsed time (s) of ACSMF-SimpleZero | Filters 1 to 5: elapsed time (s) of SimpLiFiCPM-[1 ⋯ 5] | Filters 1 to 5: speed-up over ACSMF-SimpleZero |
|---|---|---|---|---|---|---|---|---|---|
| 500 | 6.355 | 3.522 | 2 | 6.373 | 4.973 | 1 | 6.397 | 2.523 | 3 |
| 550 | 8.526 | 20.43 | 0 | 8.564 | 4.866 | 2 | 8.38 | 2.545 | 3 |
| 600 | 8.149 | 43.544 | 0 | 8.286 | 4.902 | 2 | 8.359 | 2.518 | 3 |
| 650 | 8.315 | 4.35 | 2 | 8.448 | 4.894 | 2 | 8.324 | 2.47 | 3 |
| 700 | 9.063 | 7.596 | 1 | 8.71 | 4.9 | 2 | 8.249 | 2.493 | 3 |
| 750 | 8.399 | 6.837 | 1 | 8.643 | 5.101 | 2 | 8.326 | 2.478 | 3 |
| 800 | 8.357 | 16.293 | 1 | 8.346 | 4.915 | 2 | 8.265 | 2.48 | 3 |
| 850 | 8.79 | 10.651 | 1 | 8.309 | 4.924 | 2 | 8.48 | 2.562 | 3 |
| 900 | 7.959 | 23.181 | 0 | 8.411 | 4.916 | 2 | 8.223 | 2.525 | 3 |
| 950 | 8.652 | 15.443 | 1 | 8.552 | 4.93 | 2 | 8.678 | 2.519 | 3 |
| 1000 | 8.285 | 12.399 | 1 | 8.371 | 4.916 | 2 | 8.375 | 2.616 | 3 |
| 1600 | 7.846 | 6.074 | 1 | 7.927 | 4.915 | 2 | 7.872 | 2.529 | 3 |
| 1650 | 8.918 | 2.691 | 3 | 8.878 | 4.904 | 2 | 8.854 | 2.523 | 4 |
| 1700 | 7.839 | 6.506 | 1 | 7.697 | 4.897 | 2 | 7.8 | 2.522 | 3 |
| 1750 | 6.252 | 30.173 | 0 | 6.523 | 5.09 | 1 | 6.399 | 2.526 | 3 |
| 1800 | 8.643 | 26.655 | 0 | 8.218 | 4.918 | 2 | 8.143 | 2.487 | 3 |
| 1850 | 8.072 | 2.901 | 3 | 8.026 | 4.901 | 2 | 8.095 | 2.532 | 3 |
| 1900 | 8.442 | 30.468 | 0 | 8.495 | 4.927 | 2 | 8.297 | 2.516 | 3 |
| 1950 | 8.123 | 2.542 | 3 | 8.367 | 4.927 | 2 | 7.951 | 2.495 | 3 |
| 2000 | 8.366 | 12.175 | 1 | 8.58 | 5.13 | 2 | 8.394 | 2.533 | 3 |