Katarzyna Nałęcz-Charkiewicz, Robert M Nowak.
Abstract
BACKGROUND: Assembly is an indispensable step in sequencing the genomes of new organisms and in studying structural genomic changes. In recent years, the rapid development of next-generation sequencing (NGS) methods has raised hopes of making whole-genome sequencing a fast and reliable tool for use in, for example, medical diagnostics. However, this is hampered by the slowness and high computational requirements of current processing algorithms, which creates a need for more efficient ones. One possible approach, still little explored, is the use of quantum computing.
Keywords: De novo assembly; Hybrid algorithm; Quantum annealing; TSP; Travelling salesman problem; VRP; Vehicle routing problem
Year: 2022 PMID: 35392798 PMCID: PMC8988116 DOI: 10.1186/s12859-022-04661-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Diagram showing the steps of the algorithm
Fig. 2 The method of calculating the Pearson correlation coefficients matrix
Parameters of sequences used for experiments (datasetA)
| Overlap length | Length | Coverage | % of errors | Average read length |
|---|---|---|---|---|
| 2300 | 30,000 | 5 | 0 | 3000 |
| 2500 | 30,000 | 5 | 0 | 3000 |
| 2700 | 30,000 | 5 | 0 | 3000 |
| 2300 | 30,000 | 5 | 0.5 | 3000 |
| 2500 | 30,000 | 5 | 0.5 | 3000 |
| 2700 | 30,000 | 5 | 0.5 | 3000 |
| 2300 | 30,000 | 5 | 1.0 | 3000 |
| 2500 | 30,000 | 5 | 1.0 | 3000 |
| 2700 | 30,000 | 5 | 1.0 | 3000 |
| 2300 | 30,000 | 5 | 1.5 | 3000 |
| 2500 | 30,000 | 5 | 1.5 | 3000 |
| 2700 | 30,000 | 5 | 1.5 | 3000 |
| 2300 | 15,000 | 10 | 0 | 3000 |
| 2500 | 15,000 | 10 | 0 | 3000 |
| 2700 | 15,000 | 10 | 0 | 3000 |
| 2300 | 15,000 | 10 | 0.5 | 3000 |
| 2500 | 15,000 | 10 | 0.5 | 3000 |
| 2700 | 15,000 | 10 | 0.5 | 3000 |
| 2300 | 15,000 | 10 | 1.0 | 3000 |
| 2500 | 15,000 | 10 | 1.0 | 3000 |
| 2700 | 15,000 | 10 | 1.0 | 3000 |
| 2300 | 15,000 | 10 | 1.5 | 3000 |
| 2500 | 15,000 | 10 | 1.5 | 3000 |
| 2700 | 15,000 | 10 | 1.5 | 3000 |
| 2300 | 10,000 | 15 | 0 | 3000 |
| 2500 | 10,000 | 15 | 0 | 3000 |
| 2700 | 10,000 | 15 | 0 | 3000 |
| 2300 | 10,000 | 15 | 0.5 | 3000 |
| 2500 | 10,000 | 15 | 0.5 | 3000 |
| 2700 | 10,000 | 15 | 0.5 | 3000 |
| 2300 | 10,000 | 15 | 1.0 | 3000 |
| 2500 | 10,000 | 15 | 1.0 | 3000 |
| 2700 | 10,000 | 15 | 1.0 | 3000 |
| 2300 | 10,000 | 15 | 1.5 | 3000 |
| 2500 | 10,000 | 15 | 1.5 | 3000 |
| 2700 | 10,000 | 15 | 1.5 | 3000 |
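The three sequence lengths and coverages in the table appear to be chosen so that the number of reads stays roughly constant. A minimal sketch of that arithmetic, assuming the usual definition of coverage as total sequenced bases divided by sequence length; the function name `expected_read_count` is hypothetical, and the read simulator actually used may produce a slightly different count:

```python
def expected_read_count(sequence_length: int, coverage: float, read_length: int) -> int:
    """Idealized number of reads so that reads * read_length = coverage * sequence_length."""
    return round(sequence_length * coverage / read_length)


# (sequence length, coverage) pairs taken from the table above;
# the average read length is 3000 in every row.
for length, coverage in [(30_000, 5), (15_000, 10), (10_000, 15)]:
    print(length, coverage, expected_read_count(length, coverage, 3000))
```

Under this assumption, each of the three configurations yields the same idealized read count, so the instances differ in how densely the reads cover the sequence rather than in how many reads there are.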
Fig. 3 Arrangement of reads in relation to the input sequence for an exemplary sequence (COVID-19) from datasetB
Results of experiments for circular random sequences (datasetA), coverage = 5
| % of errors | Overlap length | Method | Path cost | Real contigs | Calculated contigs | Correct overlaps | Incorrect overlaps | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 0 | 2300 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2300 | DWVRP | 268 | 2 | 2 | 49 | 0 | 1.00 |
| 0 | 2500 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2500 | DWVRP | 309 | 2 | 2 | 49 | 0 | 1.00 |
| 0 | 2700 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2700 | DWVRP | 113 | 1 | 1 | 50 | 0 | 1.00 |
| 0.5 | 2300 | GOT | 449 | 2 | 2 | 49 | 0 | 1.00 |
| 0.5 | 2300 | DWVRP | 648 | 1 | 6 | 45 | 0 | 0.90 |
| 0.5 | 2500 | GOT | 350 | 2 | 3 | 48 | 0 | 0.98 |
| 0.5 | 2500 | DWVRP | 390 | 1 | 7 | 44 | 0 | 0.88 |
| 0.5 | 2700 | GOT | 349 | 0 | 3 | 48 | 0 | 0.94 |
| 0.5 | 2700 | DWVRP | 484 | 0 | 10 | 41 | 0 | 0.80 |
| 1.0 | 2300 | GOT | 507 | 0 | 2 | 49 | 0 | 0.96 |
| 1.0 | 2300 | DWVRP | 965 | 3 | 6 | 45 | 0 | 0.94 |
| 1.0 | 2500 | GOT | 469 | 1 | 4 | 47 | 0 | 0.94 |
| 1.0 | 2500 | DWVRP | 892 | 3 | 7 | 44 | 0 | 0.92 |
| 1.0 | 2700 | GOT | 648 | 1 | 5 | 46 | 0 | 0.92 |
| 1.0 | 2700 | DWVRP | 881 | 2 | 8 | 43 | 0 | 0.88 |
| 1.5 | 2300 | GOT | 912 | 0 | 3 | 48 | 0 | 0.94 |
| 1.5 | 2300 | DWVRP | 1331 | 2 | 5 | 46 | 0 | 0.94 |
| 1.5 | 2500 | GOT | 1054 | 3 | 5 | 45 | 1 | 0.92 |
| 1.5 | 2500 | DWVRP | 2233 | 10 | 12 | 39 | 0 | 0.96 |
| 1.5 | 2700 | GOT | 724 | 0 | 8 | 43 | 0 | 0.84 |
| 1.5 | 2700 | DWVRP | 1124 | 2 | 10 | 41 | 0 | 0.84 |
Results of experiments for circular random sequences (datasetA), coverage = 10
| % of errors | Overlap length | Method | Path cost | Real contigs | Calculated contigs | Correct overlaps | Incorrect overlaps | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 0 | 2300 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2300 | DWVRP | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2500 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2500 | DWVRP | 65 | 1 | 1 | 50 | 0 | 1.00 |
| 0 | 2700 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2700 | DWVRP | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0.5 | 2300 | GOT | 127 | 0 | 0 | 51 | 0 | 1.00 |
| 0.5 | 2300 | DWVRP | 176 | 0 | 1 | 50 | 0 | 0.98 |
| 0.5 | 2500 | GOT | 138 | 0 | 1 | 50 | 0 | 0.98 |
| 0.5 | 2500 | DWVRP | 300 | 1 | 3 | 48 | 0 | 0.96 |
| 0.5 | 2700 | GOT | 150 | 0 | 0 | 51 | 0 | 1.00 |
| 0.5 | 2700 | DWVRP | 196 | 0 | 1 | 50 | 0 | 0.98 |
| 1.0 | 2300 | GOT | 325 | 0 | 0 | 51 | 0 | 1.00 |
| 1.0 | 2300 | DWVRP | 482 | 0 | 3 | 48 | 0 | 0.94 |
| 1.0 | 2500 | GOT | 380 | 0 | 1 | 50 | 0 | 0.98 |
| 1.0 | 2500 | DWVRP | 719 | 2 | 6 | 45 | 0 | 0.92 |
| 1.0 | 2700 | GOT | 288 | 0 | 1 | 50 | 0 | 0.98 |
| 1.0 | 2700 | DWVRP | 509 | 1 | 4 | 47 | 0 | 0.94 |
| 1.5 | 2300 | GOT | 422 | 1 | 0 | 50 | 1 | 0.98 |
| 1.5 | 2300 | DWVRP | 578 | 0 | 4 | 47 | 0 | 0.92 |
| 1.5 | 2500 | GOT | 483 | 0 | 0 | 51 | 0 | 1.00 |
| 1.5 | 2500 | DWVRP | 581 | 0 | 2 | 49 | 0 | 0.96 |
| 1.5 | 2700 | GOT | 363 | 0 | 1 | 50 | 0 | 0.98 |
| 1.5 | 2700 | DWVRP | 636 | 1 | 3 | 48 | 0 | 0.96 |
Results of experiments for circular random sequences (datasetA), coverage = 15
| % of errors | Overlap length | Method | Path cost | Real contigs | Calculated contigs | Correct overlaps | Incorrect overlaps | Accuracy |
|---|---|---|---|---|---|---|---|---|
| 0 | 2300 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2300 | DWVRP | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2500 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2500 | DWVRP | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2700 | GOT | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0 | 2700 | DWVRP | 0 | 0 | 0 | 51 | 0 | 1.00 |
| 0.5 | 2300 | GOT | 285 | 0 | 0 | 51 | 0 | 1.00 |
| 0.5 | 2300 | DWVRP | 461 | 0 | 3 | 48 | 0 | 0.94 |
| 0.5 | 2500 | GOT | 450 | 0 | 3 | 48 | 0 | 0.94 |
| 0.5 | 2500 | DWVRP | 574 | 0 | 3 | 48 | 0 | 0.94 |
| 0.5 | 2700 | GOT | 272 | 0 | 4 | 47 | 0 | 0.92 |
| 0.5 | 2700 | DWVRP | 403 | 0 | 7 | 44 | 0 | 0.86 |
| 1.0 | 2300 | GOT | 640 | 0 | 0 | 51 | 0 | 1.00 |
| 1.0 | 2300 | DWVRP | 779 | 0 | 3 | 48 | 0 | 0.94 |
| 1.0 | 2500 | GOT | 716 | 0 | 2 | 49 | 0 | 0.96 |
| 1.0 | 2500 | DWVRP | 903 | 0 | 3 | 48 | 0 | 0.94 |
| 1.0 | 2700 | GOT | 509 | 0 | 2 | 49 | 0 | 0.96 |
| 1.0 | 2700 | DWVRP | 816 | 1 | 10 | 41 | 0 | 0.82 |
| 1.5 | 2300 | GOT | 917 | 0 | 1 | 50 | 0 | 0.98 |
| 1.5 | 2300 | DWVRP | 1994 | 1 | 8 | 43 | 0 | 0.86 |
| 1.5 | 2500 | GOT | 789 | 1 | 0 | 50 | 1 | 0.98 |
| 1.5 | 2500 | DWVRP | 1048 | 0 | 2 | 49 | 0 | 0.96 |
| 1.5 | 2700 | GOT | 837 | 0 | 3 | 48 | 0 | 0.94 |
| 1.5 | 2700 | DWVRP | 1353 | 2 | 6 | 45 | 0 | 0.92 |
Results of experiments for datasetB
| Sequence | Method | Path cost | Real contigs | Calculated contigs | Correct overlaps | Incorrect overlaps | Accuracy |
|---|---|---|---|---|---|---|---|
| covid19 | GOT | 524 | 0 | 0 | 52 | 0 | 1.00 |
| covid19 | DWVRP | 756 | 0 | 2 | 50 | 0 | 0.96 |
| lambda_phage | GOT | 99 | 1 | 5 | 85 | 0 | 0.96 |
| lambda_phage | DWVRP | 174 | 2 | 9 | 81 | 0 | 0.92 |
| NC_000913_cov5 | GOT | 282 | 2 | 1 | 45 | 1 | 0.98 |
| NC_000913_cov5 | DWVRP | 518 | 5 | 7 | 40 | 0 | 0.96 |
| NC_000913_cov10 | GOT | 282 | 1 | 2 | 90 | 0 | 0.99 |
| NC_000913_cov10 | DWVRP | 678 | 4 | 6 | 85 | 1 | 0.96 |
| NC_000913_cov15 | GOT | 229 | 1 | 1 | 137 | 0 | 1.00 |
| NC_000913_cov15 | DWVRP | 526 | 3 | 10 | 128 | 0 | 0.95 |