Subrata Saha, Sanguthevar Rajasekaran.
Abstract
Next-generation sequencing techniques produce millions of short reads from a genomic sequence in a single run. The reads are short and very numerous, some regions of the sequence are very likely to receive low read coverage, and erroneous base calling can introduce errors into the reads. As a consequence, sequence assemblers often fail to sequence an entire DNA molecule and instead output a set of overlapping segments that together represent a consensus region of the DNA. These overlapping segments are collectively called contigs in the literature. The final step of the sequencing process, called scaffolding, is to assemble the contigs into the correct order. Scaffolding techniques typically exploit additional information such as mate pairs, paired ends, or optical restriction maps. In this paper we introduce a series of novel algorithms for scaffolding that exploit optical restriction maps (ORMs). Simulation results show that our algorithms are reliable, scalable, and efficient compared to the best known algorithms in the literature.
Year: 2014 PMID: 25081913 PMCID: PMC4120203 DOI: 10.1186/1471-2164-15-S5-S5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
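
The abstract's idea of ordering contigs with an optical restriction map can be illustrated with a minimal sketch. It is not the GPA1, GPA2, or GPA3 algorithms evaluated below; it only assumes that an ORM can be treated as an ordered list of fragment lengths between restriction sites. The function names, the `GAATTC` recognition site, and the 10% sizing tolerance are all hypothetical choices for illustration, and the sketch ignores contig orientation and missed restriction sites.

```python
def in_silico_digest(sequence, site):
    """Return the lengths of the fragments between consecutive occurrences of `site`."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start)
        start = sequence.find(site, start + 1)
    return [b - a for a, b in zip(cuts, cuts[1:])]


def best_placement(contig_fragments, map_fragments, tolerance=0.1):
    """Slide the contig's fragments along the map.

    Returns (offset, cost) for the window with the smallest total sizing
    difference, or None if no window is within `tolerance` (relative error
    per fragment; an assumed threshold, not a value from the source).
    """
    k = len(contig_fragments)
    best = None
    for offset in range(len(map_fragments) - k + 1):
        window = map_fragments[offset:offset + k]
        pairs = list(zip(contig_fragments, window))
        if any(abs(c - m) > tolerance * m for c, m in pairs):
            continue  # at least one fragment differs too much in size
        cost = sum(abs(c - m) for c, m in pairs)
        if best is None or cost < best[1]:
            best = (offset, cost)
    return best


def order_contigs(contigs, map_fragments, site):
    """Order contig names by the map offset of their best placement."""
    placed = []
    for name, sequence in contigs.items():
        fragments = in_silico_digest(sequence, site)
        if not fragments:
            continue  # fewer than two sites: nothing to compare against the map
        hit = best_placement(fragments, map_fragments)
        if hit is not None:
            placed.append((hit[0], name))
    return [name for _, name in sorted(placed)]


SITE = "GAATTC"  # hypothetical recognition site
toy_contigs = {
    "A": "AA" + SITE + "A" * 10 + SITE + "A" * 20 + SITE + "AA",
    "B": SITE + "C" * 30 + SITE + "C" * 5 + SITE,
}
toy_map = [36, 11, 50, 16, 26]  # toy ORM fragment lengths
scaffold = order_contigs(toy_contigs, toy_map, SITE)
```

Contigs with fewer than two restriction sites yield no fragments and are left unplaced in this sketch; a practical method would need a strategy for them.
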
Results for Yersinia pestis.
| Contigs | Method | Missed probability | % Resize | Conflicts | Wrong placement | % Accuracy | Time (s) |
|---|---|---|---|---|---|---|---|
| 50 | GPA1 | 0.0 | 0 | 0 | 0 | 100.00 | 31.97 |
| | | 0.1 | 5 | 0 | 0 | 100.00 | 29.45 |
| | | 0.2 | 10 | 0 | 0 | 100.00 | 29.07 |
| | | 0.3 | 20 | 27 | 1 | 44.00 | 25.99 |
| | GPA2 | 0.0 | 0 | 0 | 0 | 100.00 | 34.10 |
| | | 0.1 | 5 | 0 | 0 | 100.00 | 33.42 |
| | | 0.2 | 10 | 0 | 0 | 100.00 | 30.83 |
| | | 0.3 | 20 | 25 | 1 | 48.00 | 28.21 |
| | GPA3 | 0.0 | 0 | 0 | 0 | 100.00 | 35.02 |
| | | 0.1 | 5 | 0 | 0 | 100.00 | 32.41 |
| | | 0.2 | 10 | 0 | 0 | 100.00 | 27.76 |
| | | 0.3 | 20 | 12 | 2 | 72.00 | 27.24 |
| 100 | GPA1 | 0.0 | 0 | 1 | 0 | 99.00 | 34.05 |
| | | 0.1 | 5 | 4 | 0 | 96.00 | 31.23 |
| | | 0.2 | 10 | 7 | 0 | 93.00 | 27.76 |
| | | 0.3 | 20 | 45 | 6 | 49.00 | 25.92 |
| | GPA2 | 0.0 | 0 | 1 | 0 | 99.00 | 31.44 |
| | | 0.1 | 5 | 2 | 0 | 98.00 | 33.17 |
| | | 0.2 | 10 | 4 | 2 | 94.00 | 26.18 |
| | | 0.3 | 20 | 36 | 10 | 54.00 | 28.90 |
| | GPA3 | 0.0 | 0 | 1 | 0 | 99.00 | 32.41 |
| | | 0.1 | 5 | 0 | 0 | 100.00 | 30.10 |
| | | 0.2 | 10 | 1 | 0 | 99.00 | 29.64 |
| | | 0.3 | 20 | 27 | 6 | 67.00 | 29.04 |
| 200 | GPA1 | 0.0 | 0 | 3 | 0 | 98.50 | 36.90 |
| | | 0.1 | 5 | 8 | 0 | 96.00 | 33.28 |
| | | 0.2 | 10 | 21 | 0 | 89.50 | 33.11 |
| | | 0.3 | 20 | 69 | 4 | 63.50 | 29.61 |
| | GPA2 | 0.0 | 0 | 3 | 0 | 98.50 | 33.56 |
| | | 0.1 | 5 | 10 | 1 | 94.50 | 33.73 |
| | | 0.2 | 10 | 19 | 3 | 89.50 | 34.29 |
| | | 0.3 | 20 | 92 | 7 | 50.50 | 32.40 |
| | GPA3 | 0.0 | 0 | 3 | 0 | 98.50 | 34.93 |
| | | 0.1 | 5 | 5 | 0 | 97.50 | 35.96 |
| | | 0.2 | 10 | 12 | 1 | 93.50 | 32.25 |
| | | 0.3 | 20 | 52 | 5 | 71.50 | 32.16 |
| 400 | GPA1 | 0.0 | 0 | 8 | 0 | 98.00 | 40.17 |
| | | 0.1 | 5 | 20 | 2 | 94.50 | 35.00 |
| | | 0.2 | 10 | 56 | 7 | 84.25 | 32.21 |
| | | 0.3 | 20 | 120 | 15 | 66.25 | 30.47 |
| | GPA2 | 0.0 | 0 | 8 | 0 | 98.00 | 34.77 |
| | | 0.1 | 5 | 28 | 5 | 91.75 | 35.83 |
| | | 0.2 | 10 | 47 | 25 | 82.00 | 33.15 |
| | | 0.3 | 20 | 116 | 35 | 62.25 | 28.99 |
| | GPA3 | 0.0 | 0 | 7 | 0 | 98.25 | 37.64 |
| | | 0.1 | 5 | 18 | 0 | 95.50 | 31.70 |
| | | 0.1 | 5 | 29 | 8 | 90.75 | 31.50 |
| | | 0.3 | 20 | 162 | 21 | 76.75 | 31.70 |
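
In most rows of the table above, the % Accuracy value matches the share of contigs that are neither in conflict nor wrongly placed. That relation is an inference from the reported numbers, not a definition given in this record, and a few rows deviate from it. A minimal sketch of the assumed arithmetic, with a hypothetical function name:

```python
def placement_accuracy(contigs, conflicts, wrong_placements):
    """Percentage of contigs that are neither in conflict nor wrongly placed.

    Assumed relation, inferred from the table values rather than stated
    by the source.
    """
    if contigs <= 0:
        raise ValueError("contigs must be positive")
    return 100.0 * (contigs - conflicts - wrong_placements) / contigs


# Two rows taken from the Yersinia pestis table:
# 50 contigs, GPA1, missed probability 0.3 (reported 44.00)
acc_50 = placement_accuracy(50, 27, 1)
# 400 contigs, GPA1, missed probability 0.3 (reported 66.25)
acc_400 = placement_accuracy(400, 120, 15)
```
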
Results for Yersinia enterocolitica.
| Contigs | Method | Missed probability | % Resize | Conflicts | Wrong placement | % Accuracy | Time (s) |
|---|---|---|---|---|---|---|---|
| 200 | GPA1 | 0.0 | 0 | 0 | 0 | 100.00 | 43.37 |
| | | 0.1 | 5 | 5 | 0 | 97.50 | 43.97 |
| | | 0.2 | 10 | 18 | 0 | 91.00 | 38.92 |
| | | 0.3 | 20 | 92 | 4 | 51.00 | 28.32 |
| | GPA2 | 0.0 | 0 | 0 | 0 | 100.00 | 46.41 |
| | | 0.1 | 5 | 3 | 0 | 98.50 | 45.47 |
| | | 0.2 | 10 | 11 | 6 | 91.50 | 32.71 |
| | | 0.3 | 20 | 84 | 10 | 53.00 | 32.29 |
| | GPA3 | 0.0 | 0 | 0 | 0 | 100.00 | 41.10 |
| | | 0.1 | 5 | 6 | 2 | 96.00 | 43.61 |
| | | 0.2 | 10 | 11 | 0 | 94.50 | 40.41 |
| | | 0.3 | 20 | 57 | 7 | 68.00 | 31.87 |
| 400 | GPA1 | 0.0 | 0 | 9 | 0 | 97.75 | 46.67 |
| | | 0.1 | 5 | 17 | 1 | 95.50 | 45.02 |
| | | 0.2 | 10 | 45 | 1 | 88.50 | 37.00 |
| | | 0.3 | 20 | 111 | 18 | 67.75 | 32.95 |
| | GPA2 | 0.0 | 0 | 10 | 1 | 97.25 | 46.66 |
| | | 0.1 | 5 | 26 | 4 | 92.50 | 49.04 |
| | | 0.2 | 10 | 50 | 22 | 82.00 | 33.21 |
| | | 0.3 | 20 | 135 | 26 | 59.75 | 31.90 |
| | GPA3 | 0.0 | 0 | 9 | 0 | 97.75 | 43.89 |
| | | 0.1 | 5 | 15 | 0 | 96.25 | 36.04 |
| | | 0.2 | 10 | 29 | 5 | 91.50 | 33.53 |
| | | 0.3 | 20 | 54 | 23 | 80.75 | 33.04 |
Figure 1. Aligning ordered contigs onto the .
Figure 2. Aligning ordered contigs onto the .
Results for simulated data.
| Length | Contigs | Method | Placed | Observed length | Difference | Edit dist | Coverage | Time (s) |
|---|---|---|---|---|---|---|---|---|
| 1 | 7 | GPA1 | 6 | 84689 | 15311 | 15483 | 84.69% | 0.40 |
| | | GPA2 | 6 | 84689 | 15311 | 15483 | 84.69% | 0.45 |
| | | GPA3 | 6 | 84689 | 15311 | 15483 | 84.69% | 0.37 |
| 3 | 34 | GPA1 | 26 | 259619 | 40381 | 40923 | 86.54% | 1.95 |
| | | GPA2 | 26 | 281905 | 18095 | 18917 | 93.97% | 2.07 |
| | | GPA3 | 32 | 260662 | 39338 | 86220 | 86.89% | 2.10 |
| 5 | 52 | GPA1 | 39 | 445727 | 54273 | 55210 | 89.15% | 4.67 |
| | | GPA2 | 39 | 454376 | 45624 | 50185 | 90.88% | 5.46 |
| | | GPA3 | 38 | 431582 | 68418 | 69285 | 86.32% | 4.67 |
| 7 | 53 | GPA1 | 43 | 571908 | 128092 | 129160 | 81.70% | 8.62 |
| | | GPA2 | 45 | 656139 | 45624 | 48593 | 93.73% | 8.27 |
| | | GPA3 | 50 | 586588 | 113412 | 143189 | 83.80% | 8.17 |
Comparisons.
| Length | Method | Correctly placed | Accuracy | Time (s) |
|---|---|---|---|---|
| 5 | GPA1 | 49 | 98.00% | 5.87 |
| | GPA2 | 49 | 98.00% | 4.65 |
| | GPA3 | 49 | 98.00% | 4.62 |
| | Nagarajan et al. [ | 30 | 60.00% | 1620 |
| 6 | GPA1 | 50 | 100.00% | 8.52 |
| | GPA2 | 50 | 100.00% | 7.12 |
| | GPA3 | 50 | 100.00% | 7.12 |
| | Nagarajan et al. [ | 32 | 64.00% | 14400 |
| 7 | GPA1 | 49 | 98.00% | 8.79 |
| | GPA2 | 49 | 98.00% | 8.19 |
| | GPA3 | 49 | 98.00% | 8.48 |
| | Nagarajan et al. [ | - | - | - |
| 8 | GPA1 | 50 | 100.00% | 11.77 |
| | GPA2 | 50 | 100.00% | 11.70 |
| | GPA3 | 50 | 100.00% | 10.64 |
| | Nagarajan et al. [ | - | - | - |