Abstract
BACKGROUND: Third-generation sequencing technologies generate long reads that exhibit high error rates, in particular for insertions and deletions, which are usually the most difficult errors to cope with. The only exact algorithm capable of aligning sequences with insertions and deletions is a dynamic programming algorithm.
Keywords: NGS; Read correction; Semi-global alignment
Year: 2018 PMID: 29902968 PMCID: PMC6003146 DOI: 10.1186/s12859-018-2228-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Semi-global alignment in a band. The w-band is shown in gray. The first, horizontal sequence has length l1 = 20. The second sequence has length l2 = 12. The width of the band is w = 3. This is a global alignment, hence the path starts at position [1,1] in the matrix and ends at position [l2,l1]
Fig. 2 Semi-global alignments. Left: path of a semi-global alignment between two sequences using the full-matrix DP algorithm. The score function used is: match = +3, mismatch = -1, gap = -2. Middle: same as left, but using the DP algorithm in a band. The band is not wide enough (w = 6) to fit the path. Right: same as middle, but this time the band can accommodate the path. Band limits are displayed with dotted lines
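The banded DP described in the captions above can be illustrated with a minimal sketch. It uses the score function quoted for Fig. 2 (match = +3, mismatch = -1, gap = -2) and makes several assumptions that are not stated in this record: the band is taken to be the set of cells with |i - j| <= w, the alignment is assumed to start at the top-left corner, and it is allowed to end anywhere on the last row or last column. The function name `banded_semiglobal_score` is hypothetical.

```python
NEG_INF = float("-inf")

def banded_semiglobal_score(a, b, w, match=3, mismatch=-1, gap=-2):
    """Best score of an alignment restricted to the band |i - j| <= w.

    Assumption: the path starts at the top-left corner and may end on
    any cell of the last row or last column that lies inside the band.
    Returns -inf if no such cell is reachable.
    """
    n, m = len(a), len(b)
    # Row 0: only leading gaps, and only inside the band.
    prev = [NEG_INF] * (m + 1)
    for j in range(min(m, w) + 1):
        prev[j] = j * gap
    best = prev[m]  # cell (0, m) is on the last column
    for i in range(1, n + 1):
        cur = [NEG_INF] * (m + 1)
        lo, hi = max(0, i - w), min(m, i + w)
        for j in range(lo, hi + 1):
            if j == 0:
                cur[j] = i * gap
                continue
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s,   # diagonal: match or mismatch
                         prev[j] + gap,     # gap in b
                         cur[j - 1] + gap)  # gap in a
        best = max(best, cur[m])            # candidate end on the last column
        prev = cur
    return max(best, max(prev))             # candidate ends on the last row

# A band that is too narrow misses the path that a wider band finds.
wide = banded_semiglobal_score("GGGACGTACGT", "ACGTACGT", w=3)
narrow = banded_semiglobal_score("GGGACGTACGT", "ACGTACGT", w=0)
print(wide, narrow)
```

Setting w to at least the length of the longer sequence makes every cell part of the band, so the same function then behaves like the full-matrix algorithm under the same assumptions.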
Comparison of simulated and theoretical standard deviations
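| Length | Error rate set 1, simulated | Error rate set 1, theoretical | Error rate set 2, simulated | Error rate set 2, theoretical | Error rate set 3, simulated | Error rate set 3, theoretical |
|---|---|---|---|---|---|---|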
| 2000 | 28.8 ± 0.3 | 30.0 | 20.2 ± 0.3 | 21.5 | 25.1 ± 0.3 | 26.1 |
| 3000 | 35.6 ± 0.5 | 36.7 | 25.2 ± 0.4 | 26.4 | 30.8 ± 0.3 | 32.0 |
| 4000 | 41.4 ± 0.5 | 42.4 | 28.8 ± 0.3 | 30.5 | 35.6 ± 0.5 | 37.0 |
| 6000 | 50.1 ± 0.7 | 51.9 | 35.5 ± 0.3 | 37.3 | 44.0 ± 0.5 | 45.3 |
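One way to read the table above is to check how the second value of each pair (the one reported without an uncertainty, taken here to be the theoretical standard deviation) changes with sequence length. The sketch below tests whether those values are consistent with growth proportional to the square root of the length. This is an observation about the tabulated numbers only, not a formula taken from the source, and the helper names are hypothetical.

```python
import math

# Second value of each pair in the table, keyed by sequence length,
# in the order: error rate set 1, set 2, set 3.
theoretical = {
    2000: (30.0, 21.5, 26.1),
    3000: (36.7, 26.4, 32.0),
    4000: (42.4, 30.5, 37.0),
    6000: (51.9, 37.3, 45.3),
}

def sqrt_scaled(base_length, base_value, length):
    """Extrapolate a value assuming it grows like sqrt(length)."""
    return base_value * math.sqrt(length / base_length)

def max_deviation(table, base_length=2000):
    """Largest gap between the tabulated and sqrt-extrapolated values."""
    base = table[base_length]
    return max(
        abs(sqrt_scaled(base_length, b, length) - value)
        for length, values in table.items()
        for b, value in zip(base, values)
    )

print(f"largest deviation from sqrt scaling: {max_deviation(theoretical):.3f}")
```

A deviation on the order of the rounding in the table (one decimal place) would suggest that the tabulated values are compatible with square-root scaling over this range of lengths.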
Two-way table of banded DP alignments
| | optimal score | suboptimal score |
|---|---|---|
| did not reach the band edge | 68.4% | 0.5% |
| reached the band edge | 2.2% | 28.9% |
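The two-way table above can be turned into conditional frequencies, which show how informative "reached the band edge" is as a warning sign for a suboptimal score. The following sketch is plain arithmetic on the four percentages in the table. The dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Percentages from the two-way table: (band outcome, score outcome).
table = {
    ("inside", "optimal"): 68.4,
    ("inside", "suboptimal"): 0.5,
    ("edge", "optimal"): 2.2,
    ("edge", "suboptimal"): 28.9,
}

def conditional(table, band, score):
    """Share of alignments with the given score outcome among those
    with the given band outcome."""
    row_total = sum(v for (b, _), v in table.items() if b == band)
    return table[(band, score)] / row_total

total = sum(table.values())
p_subopt_given_edge = conditional(table, "edge", "suboptimal")
p_opt_given_inside = conditional(table, "inside", "optimal")

print(f"total: {total:.1f}%")
print(f"suboptimal, given the band edge was reached: {p_subopt_given_edge:.1%}")
print(f"optimal, given the band edge was not reached: {p_opt_given_inside:.1%}")
```

Under these numbers, about 93% of the alignments that reached the band edge were suboptimal, and about 99% of those that stayed inside the band were optimal.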