Susanne Balzer, Ketil Malde, Inge Jonassen.
Abstract
MOTIVATION: 454 pyrosequencing, by Roche Diagnostics, has emerged as an alternative to Sanger sequencing when it comes to read lengths, performance and cost, but shows higher per-base error rates. Although there are several tools available for noise removal, targeting different application fields, data interpretation would benefit from a better understanding of the different error types.
Year: 2011 PMID: 21685085 PMCID: PMC3117331 DOI: 10.1093/bioinformatics/btr251
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Empirical flow value distributions (D. labrax) and derived intervals.
Fig. 3. Flow value histograms for G. morhua mate-pair reads (forward matches, N = 7 016 764). The y-axis is on a log10 scale. The 15 flow cycles correspond to the 42 positions of the linker sequence. The gray areas contain correct base calls. Subpeaks point toward putative PCR errors.
Table 1. Flow value intervals from empirical distributions (D. labrax)
| Interval size (%) | 0-distribution | 1-distribution | 2-distribution | 3-distribution |
|---|---|---|---|---|
| 5 | [0.00, 0.02] | [1.01, 1.02] | [2.00, 2.02] | [3.01, 3.03] |
| 10 | [0.00, 0.04] | [1.01, 1.03] | [2.00, 2.03] | [3.00, 3.04] |
| 25 | [0.00, 0.07] | [1.00, 1.04] | [1.97, 2.05] | [2.97, 3.07] |
| 50 | [0.00, 0.11] | [0.96, 1.07] | [1.93, 2.09] | [2.90, 3.12] |
| 75 | [0.00, 0.14] | [0.92, 1.12] | [1.86, 2.16] | [2.81, 3.20] |
| 90 | [0.00, 0.18] | [0.86, 1.18] | [1.78, 2.24] | [2.69, 3.30] |
| 95 | [0.00, 0.22] | [0.81, 1.23] | [1.72, 2.31] | [2.61, 3.39] |
Fig. 2. Bins for homopolymer lengths 0, 1 and 2, based on different flow value interval sizes from Table 1.
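
The binning idea behind the flow value intervals tabulated above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the helper name `classify_flow`, the choice of closed intervals, and the convention of returning `None` for values outside every interval are all assumptions made here. The interval bounds themselves are copied from the table.

```python
# Flow value intervals copied from the table above (D. labrax).
# Outer key: interval size in percent.
# Inner key: homopolymer length; value: (lower, upper) flow value bound.
INTERVALS = {
    5:  {0: (0.00, 0.02), 1: (1.01, 1.02), 2: (2.00, 2.02), 3: (3.01, 3.03)},
    10: {0: (0.00, 0.04), 1: (1.01, 1.03), 2: (2.00, 2.03), 3: (3.00, 3.04)},
    25: {0: (0.00, 0.07), 1: (1.00, 1.04), 2: (1.97, 2.05), 3: (2.97, 3.07)},
    50: {0: (0.00, 0.11), 1: (0.96, 1.07), 2: (1.93, 2.09), 3: (2.90, 3.12)},
    75: {0: (0.00, 0.14), 1: (0.92, 1.12), 2: (1.86, 2.16), 3: (2.81, 3.20)},
    90: {0: (0.00, 0.18), 1: (0.86, 1.18), 2: (1.78, 2.24), 3: (2.69, 3.30)},
    95: {0: (0.00, 0.22), 1: (0.81, 1.23), 2: (1.72, 2.31), 3: (2.61, 3.39)},
}


def classify_flow(value, size):
    """Return the homopolymer length whose interval contains `value`.

    Hypothetical helper: intervals are treated as closed, and None is
    returned when the value lies outside every interval for this size.
    """
    for length, (lower, upper) in INTERVALS[size].items():
        if lower <= value <= upper:
            return length
    return None


if __name__ == "__main__":
    # The same flow value can fall inside a wide bin but outside a narrow one.
    for size in (5, 50, 95):
        print(size, classify_flow(1.05, size))
```

Narrow interval sizes leave most of the flow value axis unassigned, while wide ones assign more values to a homopolymer length; how unassigned values are interpreted is not specified in this sketch.
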
Estimated fractions of error types as a percentage of overall errors, by flow value interval size
| Interval size (%) | Pyrosequencing errors (%) | Putative PCR errors (%) | Extreme errors (%) |
|---|---|---|---|
| 5 | 80.18 | 3.97 | 15.85 |
| 10 | 79.28 | 5.78 | 14.94 |
| 25 | 75.69 | 11.17 | 13.14 |
| 50 | 67.15 | 24.65 | 8.20 |
| 75 | 59.18 | 36.89 | 3.93 |
| 90 | 51.62 | 47.02 | 1.36 |
| 95 | 46.63 | 52.77 | 0.60 |
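
As a quick consistency check on the error type fractions tabulated above, the following Python sketch sums each row and looks for the interval size at which the putative PCR share overtakes the pyrosequencing share. The function names are hypothetical and chosen only for this illustration; the numbers are copied from the table.

```python
# Rows copied from the table above.
# Key: interval size in percent.
# Value: (pyrosequencing, putative PCR, extreme) errors, each in percent.
ERROR_FRACTIONS = {
    5:  (80.18, 3.97, 15.85),
    10: (79.28, 5.78, 14.94),
    25: (75.69, 11.17, 13.14),
    50: (67.15, 24.65, 8.20),
    75: (59.18, 36.89, 3.93),
    90: (51.62, 47.02, 1.36),
    95: (46.63, 52.77, 0.60),
}


def row_totals(fractions):
    """Sum the three error type shares per interval size, rounded to 2 decimals."""
    return {size: round(sum(row), 2) for size, row in fractions.items()}


def first_size_where_pcr_dominates(fractions):
    """Return the smallest interval size where the PCR share exceeds the
    pyrosequencing share, or None if that never happens."""
    for size in sorted(fractions):
        pyro, pcr, _extreme = fractions[size]
        if pcr > pyro:
            return size
    return None


if __name__ == "__main__":
    print(row_totals(ERROR_FRACTIONS))
    print(first_size_where_pcr_dominates(ERROR_FRACTIONS))
```

With the tabulated values, every row sums to 100%, and the putative PCR share first exceeds the pyrosequencing share at the 95% interval size.
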
Fig. 4. Putative PCR and pyrosequencing error rates with respect to flow cycles (for underlying flow value intervals of size 5 and 95%).