Changyong Yu1, Pengxi Lin1, Yuhai Zhao2, Tianmei Ren1, Guoren Wang3.
Abstract
BACKGROUND: Searching for the longest common subsequences (LCS) of multiple (i.e., three or more) sequences, known as the MLCS problem, is a classic but difficult problem in various fields. The primary bottleneck is that present state-of-the-art algorithms must construct a huge graph (a directed acyclic graph, or DAG), which usually exceeds the memory available on the computer. Because of their massive time and space consumption, present algorithms are inapplicable to problems with long and large-scale sequences.
Keywords: Mini-MLCS; Multiple longest common subsequences (MLCS); The branch and bound
Year: 2022 PMID: 36071384 PMCID: PMC9450393 DOI: 10.1186/s12859-022-04906-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Fig. 1 The score table L for two sequences. The LCS can be determined by tracing back from the cell with score 5 to the cell with score 1 in the score table L. The dominated region is shown as the shaded portion
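As a minimal sketch of how a score table like the one in Fig. 1 is built and traversed, the following Python code fills L with the standard two-sequence dynamic-programming recurrence and then traces back from the bottom-right cell. The function names and the example strings are illustrative assumptions and are not taken from the paper.

```python
def lcs_table(a: str, b: str) -> list:
    """Return the score table L, where L[i][j] is the LCS length
    of the prefixes a[:i] and b[:j]."""
    L = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, ca in enumerate(a, start=1):
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                # A match extends the best LCS of the two shorter prefixes.
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                # Otherwise inherit the better of the two neighbours.
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L


def lcs_traceback(a: str, b: str, L: list) -> str:
    """Recover one LCS by walking from the highest score back to zero."""
    i, j = len(a), len(b)
    out = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


# Hypothetical example strings, chosen only for illustration.
a, b = "AGGTAB", "GXTXAYB"
L = lcs_table(a, b)
print(L[-1][-1], lcs_traceback(a, b, L))
```

The table needs space proportional to the product of the two sequence lengths. Extending it directly to many sequences multiplies one more dimension per sequence, which is why graph-based methods over match points are used for the MLCS problem instead.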
Fig. 2 The DAG constructed for three sequences, with black and gray nodes representing repeated and dominated nodes, respectively
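The idea of a dominated node in Fig. 2 can be illustrated with a short sketch. It assumes the convention commonly used in dominant-point MLCS methods: a match point is a tuple of 1-based positions, one per sequence, that all hold the same character, and a point p dominates q when p is no larger than q in every coordinate and the two differ. The paper's exact definitions may differ, and the helper names and example sequences below are hypothetical.

```python
def successors(point, seqs, alphabet):
    """For each character, find the nearest match point strictly after
    `point` in every sequence (positions are 1-based, 0 means 'before start')."""
    result = {}
    for ch in alphabet:
        nxt = []
        for pos, s in zip(point, seqs):
            # With 1-based positions, 0-based index `pos` is the first
            # character located strictly after position `pos`.
            idx = s.find(ch, pos)
            if idx == -1:
                break  # ch no longer occurs in this sequence
            nxt.append(idx + 1)
        else:
            result[ch] = tuple(nxt)
    return result


def dominates(p, q):
    """True if p precedes or equals q in every coordinate and p != q."""
    return p != q and all(pi <= qi for pi, qi in zip(p, q))


def non_dominated(points):
    """Keep only the points that no other point in the collection dominates."""
    pts = list(points)
    return [p for p in pts if not any(dominates(q, p) for q in pts)]


# Hypothetical three-sequence example.
seqs = ["ACTG", "CATG", "TACG"]
succ = successors((0, 0, 0), seqs, "ACGT")
print(succ)
print(sorted(non_dominated(succ.values())))
```

In this example the match point for G, (4, 4, 4), lies behind the match points for A, C and T in every sequence, so it is dominated and would not need to be expanded as a direct successor of the start node.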
Fig. 3 The process of estimating a lower bound Lower(R) on the length of the longest paths in the DAG
Fig. 4 Panels a, b and c show the use of Theorem 1, Strategy 2, and Strategy 3, respectively, to remove unnecessary points. Grey, dark grey and light grey points represent the unnecessary points in a, b and c, respectively
The run time (s)/memory (GB) consumed by the compared algorithms on DNA sequences with length fixed to 120
| Number of sequences | DNA( | ||||
|---|---|---|---|---|---|
| mini-MLCS | Top-MLCS | Quick-DP | Fast-MLCS | ||
| 10,000 | 15 | 1245.7/31.8 | |||
| 15,000 | 14 | 1499.2/17.2 | |||
| 20,000 | 13 | 1362.8/17.6 | |||
| 25,000 | 13 | 1225.1/35.8 | |||
| 30,000 | 11 | 226.1/2.9 | 515.5/5.1 | ||
| 35,000 | 11 | 403.0/3.9 | 590.6/5.1 | ||
| 40,000 | 11 | 256.1/3.1 | 432.6/5.5 | ||
| 45,000 | 11 | 393.7/3.4 | 480.2/5.4 | ||
| 50,000 | 11 | 298.8/3.6 | 448.1/6.6 | ||
The bold values represent the minimum running time and minimum memory of all the algorithms in the table on the dataset
The run time (s)/memory (GB) consumed by the compared algorithms on DNA sequences with the number of sequences fixed to 20,000
| Length of sequences | DNA( | ||||
|---|---|---|---|---|---|
| mini-MLCS | Top-MLCS | Quick-DP | Fast-MLCS | ||
| 90 | 8 | 3.7/0.2 | 4.8/0.1 | 38.84/0.8 | |
| 95 | 9 | 14.1/0.3 | 25.6/0.4 | 466.5/0.9 | |
| 100 | 10 | 39.5/1.1 | 76.9/2.6 | 4531.1/1.7 | |
| 105 | 11 | 91.9/2.1 | 339.6/3.6 | ||
| 110 | 11 | 130.2/2.5 | 437.3/5.3 | ||
| 115 | 12 | 256.0/3.6 | 809.9/6.6 | ||
| 120 | 13 | 1362.8/17.6 | |||
The average results for benchmark BL, length fixed to 100
| Number of sequences | | A* | | | Top-MLCS | | | mini-MLCS | | |
|---|---|---|---|---|---|---|---|---|---|---|
| 10 | 4 | 20.5 | 428.33 | 6 | 0.0 | – | 0 | 20.5 | 6 | |
| | 12 | 12.7 | 1.73 | 10 | 12.7 | 5.2 | 10 | 12.7 | 10 | |
| | 20 | 7.9 | 0.08 | 10 | 7.9 | 0.28 | 10 | 7.9 | 10 | |
| 50 | 4 | 0.0 | – | 0 | 0.0 | – | 0 | 20.1 | 7 | |
| | 12 | 6.9 | 0.17 | 10 | 6.9 | 0.46 | 10 | 6.9 | 10 | |
| | 20 | 3.0 | 0.06 | 10 | 3.0 | 0.08 | 10 | 3.0 | 10 | |
| 100 | 4 | 0.0 | – | 0 | 0.0 | – | 0 | 19.3 | 6 | |
| | 12 | 5.2 | 0.08 | 10 | 5.2 | 0.23 | 10 | 5.2 | 10 | |
| | 20 | 2.1 | 0.07 | 10 | 2.1 | 0.08 | 10 | 2.1 | 10 | |
| 150 | 4 | 0.0 | – | 0 | 0.0 | – | 0 | 18.8 | 9 | |
| | 12 | 4.7 | 0.07 | 10 | 4.7 | 0.16 | 10 | 4.7 | 10 | |
| | 20 | 1.9 | 0.08 | 10 | 1.9 | 0.08 | 10 | 1.9 | 10 | |
| 200 | 4 | 0.0 | – | 0 | 0.0 | – | 0 | 18.0 | 8 | |
| | 12 | 4.1 | 0.07 | 10 | 4.1 | 0.18 | 10 | 4.1 | 10 | |
| | 20 | 1.1 | 0.06 | 10 | 1.1 | 0.11 | 10 | 1.1 | 10 | |
The bold values represent the shortest average running time of all the algorithms in the table on the dataset
The Lower(R)/run time (s) obtained for different values of t
| Length of sequences | | t = 10 | t = 20 | t = 50 | t = 100 | t = 150 |
|---|---|---|---|---|---|---|
| 90 | 8 | 8/0.06 | 8/1.63 | 8/2.73 | 8/4.13 | 8/5.79 |
| 95 | 9 | 8/1.28 | 8/1.88 | 9/2.60 | 9/4.88 | 9/6.68 |
| 100 | 10 | 10/1.15 | 10/2.28 | 10/4.99 | 10/8.77 | 10/10.81 |
| 105 | 11 | 11/0.77 | 11/2.50 | 11/3.47 | 11/10.60 | 11/9.37 |
| 110 | 11 | 10/1.68 | 11/2.74 | 11/4.35 | 11/10.19 | 11/13.03 |
| 115 | 12 | 10/1.55 | 11/2.85 | 11/6.75 | 12/11.29 | 12/15.77 |
| 120 | 13 | 12/1.07 | 12/2.53 | 12/4.21 | 13/6.38 | 13/10.21 |