Abstract
BACKGROUND: Finding a Shortest Common Supersequence (SCS) of a set of sequences is an important problem with applications in many areas, and it is a key problem in biological sequence analysis. The SCS problem is well known to be NP-complete, and many heuristic algorithms have been proposed. Some heuristics work well on a few long sequences (as in sequence comparison applications); others work well on many short sequences (as in oligo-array synthesis). Unfortunately, most do not work well on large SCS instances that contain many long sequences.
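To make the definition concrete, here is a minimal sketch of what it means for a string to be a common supersequence: every input sequence must be obtainable from it by deleting characters. The function names and the three example sequences are hypothetical and are not taken from the paper.

```python
from typing import Iterable


def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    # `ch in it` advances the iterator until ch is found, so order is preserved.
    return all(ch in it for ch in s)


def is_common_supersequence(candidate: str, sequences: Iterable[str]) -> bool:
    """Return True if every sequence is a subsequence of candidate."""
    return all(is_subsequence(s, candidate) for s in sequences)


# Hypothetical input, chosen only for illustration.
example = ["ACT", "CGT", "AGC"]
print(is_common_supersequence("ACGCT", example))  # True
print(is_common_supersequence("ACGT", example))   # False: "AGC" does not fit
```

Checking a candidate is cheap; the hard part of SCS is finding the shortest candidate that passes this check.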
Year: 2006 PMID: 17217504 PMCID: PMC1780115 DOI: 10.1186/1471-2105-7-S4-S12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A step-by-step illustration of the process of finding a common supersequence via deposition. The characters above the bar represent the parts of the sequences that have yet to be deposited. The underlined blue characters are the characters deposited in the current step. The final result is CS = "ACGCT", which, for this example, is also the optimal result.
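A deposition-style construction can be sketched as follows. This is a minimal illustration that uses the well-known majority-merge rule (deposit the character that currently heads the most remaining sequences); the selection rule actually used in the paper may differ, and the function name, tie-breaking rule, and input sequences are assumptions.

```python
from collections import Counter


def deposit(sequences):
    """Build a common supersequence by depositing one character per step."""
    remaining = [s for s in sequences if s]
    result = []
    while remaining:
        # Count how many remaining sequences start with each character.
        fronts = Counter(s[0] for s in remaining)
        # Majority rule; ties are broken alphabetically (an arbitrary choice).
        ch = min(fronts, key=lambda c: (-fronts[c], c))
        result.append(ch)
        # Consume ch from every sequence that starts with it.
        remaining = [s[1:] if s[0] == ch else s for s in remaining]
        # Drop sequences that have been fully deposited.
        remaining = [s for s in remaining if s]
    return "".join(result)


# Hypothetical input, not the sequences shown in Figure 1.
print(deposit(["ACT", "CGT", "AGC"]))  # ACGTC
```

Each step deposits a character that heads at least one remaining sequence, so the loop always terminates and the output contains every input as a subsequence.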
Figure 2. The procedure of the reduction process.
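The caption does not spell out the reduction steps, so the following is only a sketch of one simple way to shorten a common supersequence, not necessarily the procedure in Figure 2: try deleting each character in turn and keep the deletion whenever the result is still a common supersequence. All names and the example input are hypothetical.

```python
def is_subsequence(s, t):
    """Return True if s can be obtained from t by deleting characters."""
    it = iter(t)
    return all(ch in it for ch in s)


def reduce_supersequence(cs, sequences):
    """Greedily delete characters while cs stays a common supersequence."""
    i = 0
    while i < len(cs):
        shorter = cs[:i] + cs[i + 1:]
        if all(is_subsequence(s, shorter) for s in sequences):
            cs = shorter  # deletion is safe; retry the same index
        else:
            i += 1        # this character is needed; move on
    return cs


# Hypothetical input: a deliberately wasteful supersequence of two sequences.
print(reduce_supersequence("ACGTACGT", ["ACT", "CGT"]))  # ACGT
```

A single left-to-right pass like this yields a supersequence from which no single character can be removed, which is a local optimum rather than a guaranteed shortest result.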
The results of the Deposition process and the Reduction process.
| 100 | 100 | 158.0 | 267.8 (7.1) | |||
| 500 | 100 | 160.6 | 274.9 (7.3) | |||
| 1000 | 100 | 161.3 | 277.1 (6.0) | |||
| 5000 | 100 | 162.9 | 281.6 (4.0) | |||
| 100 | 1000 | 1441.8 | 2495.4 (50.6) | |||
| 500 | 1000 | 1457.6 | 2543.9 (48.6) | |||
| 1000 | 1000 | 1472.3 | 2557.5 (45.1) | |||
| 5000 | 1000 | 1481.6 | 2566.6 (35.1) | |||
Each row of the table represents 10 randomly generated instances. For each row, we list the average value of LB (the lower bound), the average |CS| after Deposition and after Reduction (the lengths of the resulting common supersequences), their standard deviations (in parentheses), and the estimated performance ratios R' (each |CS| divided by LB).
A comparison of the lengths of the SCS results obtained by different algorithms on simulated DNA sequences.
| 100 | 100 | 158.0 | 400 | 2.53 | 304.6 (15.3) | 1.93 | 303.9 (11.9) | 1.92 | 276.8 (5.5) | 1.75 | ||
| 500 | 100 | 160.6 | 400 | 2.49 | 329.0 (18.2) | 2.05 | 321.6 (9.1) | 2.00 | 286.4 (7.7) | 1.78 | ||
| 1000 | 100 | 161.3 | 400 | 2.48 | 335.4 (19.9) | 2.08 | 329.0 (17.8) | 2.04 | 289.1 (8.2) | 1.79 | ||
| 5000 | 100 | 162.9 | 400 | 2.46 | 339.8 (21.2) | 2.09 | 336.2 (21.2) | 2.06 | 294.8 (10.6) | 1.81 | ||
| 100 | 1000 | 1441.8 | 4000 | 2.77 | 2936.7 (146.6) | 2.04 | 2921.5 (143.4) | 2.03 | 2547.1 (24.7) | 1.77 | ||
| 500 | 1000 | 1457.6 | 4000 | 2.74 | 3049.6 (150.0) | 2.09 | 3043.6 (145.2) | 2.09 | 2578.0 (23.3) | 1.77 | ||
| 1000 | 1000 | 1472.3 | 4000 | 2.72 | 3142.3 (176.9) | 2.13 | 3115.5 (173.6) | 2.12 | 2590.9 (26.5) | 1.76 | ||
| 5000 | 1000 | 1481.6 | 4000 | 2.70 | 3194.5 (221.9) | 2.16 | 3172.8 (199.0) | 2.14 | 2602.1 (25.6) | 1.76 | ||
The average and standard deviation (in parentheses) over 10 randomly generated instances are given. The estimated performance ratio is also given for each algorithm.
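The estimated performance ratio is the supersequence length divided by the lower bound LB. As a quick check, the sketch below recomputes the ratios for the first row of the table above (LB = 158.0); the function name is hypothetical, and the values are copied from that row.

```python
def performance_ratio(cs_length: float, lower_bound: float) -> float:
    """Estimated performance ratio: supersequence length over the lower bound."""
    return cs_length / lower_bound


lb = 158.0  # lower bound from the first row of the table
for length in (400, 304.6, 303.9, 276.8):
    # Reproduces the ratios printed in that row: 2.53, 1.93, 1.92, 1.75
    print(f"{length} / {lb} = {performance_ratio(length, lb):.2f}")
```

Because LB is a lower bound on the optimal length rather than the optimum itself, this ratio overestimates how far each result is from optimal.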
The comparison of the lengths of the SCS results obtained by different algorithms on selected DNA and protein sequences.
| DNA-1 | 100 | 500 | 686.6 | 2,000 | 2.91 | 1359.7 (18.7) | 1.98 | ||
| DNA-2 | 500 | 500 | 689.5 | 2,000 | 2.90 | 1430.5 (22.1) | 2.07 | ||
| DNA-3 | 100 | 1000 | 1361.0 | 4,000 | 2.94 | 2698.0 (39.9) | 1.98 | ||
| DNA-4 | 500 | 1000 | 1364.4 | 4,000 | 2.93 | 2822.3 (43.4) | 2.07 | ||
| PROT-1 | 100 | 500 | 800.6 | 10,000 | 12.49 | 5312.7 (81.9) | 6.64 | ||
| PROT-2 | 500 | 500 | 803.5 | 10,000 | 12.45 | 5935.6 (61.8) | 7.39 | ||
| PROT-3 | 1000 | 500 | 809.8 | 10,000 | 12.35 | 6082.4 (44.9) | 7.51 | ||
The average and standard deviation (in parentheses) are given. The estimated performance ratio is also given for each algorithm.
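The constant lengths 2,000, 4,000, and 10,000 in the table above equal the alphabet size times a per-row value (4 × 500 and 4 × 1000 for DNA, 20 × 500 for protein), assuming the second numeric column is the sequence length. That is consistent with a trivial baseline that repeats the whole alphabet once per sequence position. The sketch below illustrates that baseline; the column interpretation, the function name, and the alphabet strings are assumptions.

```python
def alphabet_supersequence(alphabet: str, k: int) -> str:
    """Repeat the alphabet k times.

    Any sequence of length at most k over this alphabet is a subsequence,
    because its i-th character can be matched inside the i-th repetition.
    """
    return alphabet * k


DNA = "ACGT"                         # 4 letters
PROTEIN = "ACDEFGHIKLMNPQRSTVWY"     # 20 standard amino acids

print(len(alphabet_supersequence(DNA, 500)))      # 2000
print(len(alphabet_supersequence(DNA, 1000)))     # 4000
print(len(alphabet_supersequence(PROTEIN, 500)))  # 10000
```

The baseline ignores the input sequences entirely, which is why its length has no standard deviation and why any reasonable heuristic should beat it.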
The comparison of the lengths of the SCS results between RE and DR on simulated DNA sequences.
| 5 | 10 | 40 | 20.45 (0.53) | |
| 10 | 10 | 40 | 26.04 (0.93) | |
| 50 | 10 | 40 | 35.00 (1.00) | |
| 100 | 10 | 40 | 34.69 (0.97) | |
| 5 | 100 | 400 | 188.08 (5.55) | |
| 10 | 100 | 400 | 229.28 (7.61) | |
| 50 | 100 | 400 | 286.12 (13.34) | |
| 100 | 100 | 400 | 281.95 (10.86) | |
In each cell, the average and standard deviation (in parentheses) over 10 randomly generated instances are given.
A comparison of the lengths of the SCS results between RE and DR on selected DNA and protein sequences.
| DNA-5 | 100 | 100 | 400 | 301.04 (8.85) | |
| DNA-6 | 500 | 100 | 400 | 308.87 (7.61) | |
| PROT-4 | 100 | 100 | 2,000 | 1028.90 (35.54) | |
| PROT-5 | 500 | 100 | 2,000 | 1232.10 (31.03) | |
In each cell, the average and standard deviation (in parentheses) over 10 randomly generated instances are given.