He Quan Sun, Malcolm Yoke Hean Low, Wen Jing Hsu, Jagath C Rajapakse.
Abstract
BACKGROUND: Weak motif discovery in DNA sequences is an important but unresolved problem in computational biology. Previous algorithms for this problem usually require a large amount of memory or execution time. In this paper, we propose a fast and memory-efficient algorithm, RecMotif, which is guaranteed to discover all motifs with specific (l, d) settings (where l is the motif length and d is the maximum number of mutations between a motif instance and the true motif).
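To make the (l, d) setting concrete, the following is a minimal sketch of how one might test whether an l-mer counts as an instance of a given motif, assuming "mutations" means substitutions counted by Hamming distance. The function names `hamming` and `find_instances` are hypothetical and are not taken from RecMotif; this illustrates only the problem definition, not the RecMotif search itself.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def find_instances(sequence: str, motif: str, d: int) -> list[tuple[int, str]]:
    """Return (position, l-mer) pairs within Hamming distance d of the motif.

    Assumption: a motif instance is any window of length l = len(motif)
    that differs from the motif in at most d positions.
    """
    l = len(motif)
    hits = []
    for i in range(len(sequence) - l + 1):
        window = sequence[i:i + l]
        if hamming(window, motif) <= d:
            hits.append((i, window))
    return hits


# Usage: a (4, 1) setting on a toy sequence.
print(find_instances("ttacgaacctt", "acgt", 1))
```

In the actual discovery problem the true motif is unknown, so a check like this applies only once a candidate motif has been proposed.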
Year: 2010 PMID: 21172058 PMCID: PMC3024859 DOI: 10.1186/1471-2105-11-S11-S8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. RecMotif example. This figure gives an example explaining the process of RecMotif.
Figure 2. Example for processing vertex A. This figure gives an example of processing a vertex A in the example of Figure 1.
Time complexities of RecMotif.
| Range of p | Range of p | ||
|---|---|---|---|
| (0,0.035] | (0.32,0.38] | ||
| (0.035,0.10] | (0.38,0.43] | ||
| (0.10,0.18] | (0.43,0.47] | ||
| (0.18,0.26] | (0.47,0.51] | ||
| (0.26,0.32] | (0.51,0.54] |
This table shows the complexities of RecMotif with different p and n.
Algorithm complexities.
| Algorithm | Time complexity | Space complexity |
|---|---|---|
| PMSprune | ||
| iTriplet | ||
| DPCFG | ||
| RecMotif | See Table | |
This table shows time and space complexities of PMSprune, iTriplet, DPCFG and RecMotif.
Figure 3. Effects of sequence length n on execution time on model (15, 4). This figure shows, for MCP, how the execution times of PMSprune, iTriplet, DPCFG and RecMotif change with increasing n.
Effects of (l, d) on execution time.
| (l, d): p | DPCFG | PMSprune | iTriplet | RecMotif |
|---|---|---|---|---|
| (12, 3): 0.054 | 0.825 | 1.63 | 173.6 | 0.630 |
| (15, 4): 0.057 | 0.673 | 5.22 | 189.2 | 0.703 |
| (18, 5): 0.057 | 0.596 | 16.9 | 230.4 | 0.700 |
| (21, 6): 0.056 | 0.532 | 46.5 | 250.0 | 0.677 |
| (24, 7): 0.055 | 0.475 | 80.2 | 291.5 | 0.585 |
| (27, 8): 0.053 | 0.432 | 137.1 | 354.4 | 0.633 |
| (30, 9): 0.051 | 0.394 | 242.9 | 443.6 | 0.629 |
| (33, 10): 0.048 | 0.365 | 405.2 | 553.8 | 0.556 |
| (36, 11): 0.046 | 0.329 | 651.8 | 1419 | 0.484 |
| (39, 12): 0.044 | 0.311 | 1056 | 2779 | 0.500 |
| (42, 13): 0.042 | 0.286 | 1842 | 2895 | 0.483 |
| (44, 14): 0.063 | 0.674 | -o | -e | 0.971 |
| (47, 15): 0.059 | 0.577 | -o | -e | 0.921 |
| (50, 16): 0.055 | 0.520 | -o | -e | 0.832 |
This table shows how the execution times of PMSprune, iTriplet, DPCFG and RecMotif are influenced by the values of l and d with approximately fixed p.
*Note: -o: over 5 hours; -e: memory allocation error. Time unit: seconds.
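This record does not define p. One reading that appears consistent with the tabulated values is that p is the probability that two uniformly random l-mers over the four-letter DNA alphabet differ in at most 2d positions; this is an assumption, not something stated here. Under that assumption, the sketch below computes p from a binomial sum. The function name `pairwise_neighbor_probability` is hypothetical.

```python
from math import comb


def pairwise_neighbor_probability(l: int, d: int) -> float:
    """P(two uniform random l-mers differ in at most 2d positions).

    Assumption: bases are independent and uniform over {a, c, g, t},
    so two random bases differ with probability 3/4.
    """
    mismatch = 3 / 4
    max_diff = min(2 * d, l)
    return sum(
        comb(l, k) * mismatch**k * (1 - mismatch) ** (l - k)
        for k in range(max_diff + 1)
    )


# Usage: evaluate the assumed definition for two (l, d) settings.
for l, d in [(12, 3), (15, 4)]:
    print((l, d), round(pairwise_neighbor_probability(l, d), 3))
```

For (12, 3) and (15, 4) this gives values close to the 0.054 and 0.057 listed in the table, which supports the assumed reading but does not confirm it.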
Different motifs for increasing p.
| ID | p | (l, d) | ID | p | (l, d) |
|---|---|---|---|---|---|
| 1 | 0.029 | (28, 8) | 15 | 0.163 | (50, 17) |
| 2 | 0.086 | (29, 9) | 16 | 0.167 | (36, 12) |
| 3 | 0.096 | (23, 7) | 17 | 0.175 | (19, 6) |
| 4 | 0.101 | (20, 6) | 18 | 0.181 | (33, 11) |
| 5 | 0.103 | (40, 13) | 19 | 0.190 | (16, 5) |
| 6 | 0.107 | (17, 5) | 20 | 0.197 | (30, 10) |
| 7 | 0.112 | (14, 4) | 21 | 0.206 | (41, 14) |
| 8 | 0.112 | (37, 12) | 22 | 0.214 | (27, 9) |
| 9 | 0.119 | (34, 11) | 23 | 0.223 | (38, 13) |
| 10 | 0.128 | (31, 10) | 24 | 0.234 | (24, 8) |
| 11 | 0.139 | (28, 9) | 25 | 0.242 | (35, 12) |
| 12 | 0.149 | (25, 8) | 26 | 0.260 | (54, 19) |
| 13 | 0.155 | (39, 13) | 27 | 0.283 | (18, 6) |
| 14 | 0.162 | (22, 7) | 28 | 0.285 | (40, 14) |
This table shows the values of l and d with increasing p.
Figure 4. Effects of increasing p (with related (l, d)) on execution time. This figure shows how the execution times of PMSprune, iTriplet, DPCFG and RecMotif change with the parameter p.
Results of RecMotif on biological data
| Data | Discovered | Model | Published |
|---|---|---|---|
| (16, 3) | ccatattaggacatct | ||
| (13, 2) | ttcgcgccaaact | ||
| (18, 6) | tgtgaxxxxgxtcaca | ||
| (20, 5) | tactgtatatatatacagta | ||
| (10, 2) | cctcagcccc, | ||
| agacccagca, | |||
| (13, 2) | ccctaatgggcca |
This table shows the results of RecMotif on real biological data.
*Note: 'x' in the consensus means that every base appears at that position with a frequency of less than 50 percent.