Johanna Brodin, Mattias Mild, Charlotte Hedskog, Ellen Sherwood, Thomas Leitner, Björn Andersson, Jan Albert.
Abstract
BACKGROUND: Ultra-deep pyrosequencing (UDPS) is used to identify rare sequence variants. The sequence depth is influenced by several factors including the error frequency of PCR and UDPS. This study investigated the characteristics and source of errors in raw and cleaned UDPS data.
Year: 2013 PMID: 23894647 PMCID: PMC3720931 DOI: 10.1371/journal.pone.0070388
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Total number of reads, mean error frequency (%) per nucleotide, and number of unique sequence variants in raw and cleaned UDPS data from forward and reverse reads in three UDPS runs.
| Run | Sequencing direction | Raw data: no. of reads / mean % error frequency per nucleotide | Raw data: no. of unique variants | Cleaned data: no. of reads / mean % error frequency per nucleotide | Cleaned data: no. of unique variants |
| 1 | Forward | 10,121/0.20 | 633 | 8,756/0.063 | 204 |
| 1 | Reverse | 7,378/0.19 | 480 | 6,205/0.058 | 146 |
| 2 | Forward | 12,092/0.23 | 682 | 9,537/0.058 | 206 |
| 2 | Reverse | 10,482/0.61 | 527 | 1,462/0.077 | 91 |
| 3 | Forward | 2,570/0.21 | 271 | 2,187/0.041 | 85 |
| 3 | Reverse | 5,050/0.14 | 354 | 4,583/0.08 | 124 |
| Total | Both | 47,693/0.30 | 2,044 | 32,730/0.056 | 315 |
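The read counts in the table above make it easy to check how much data the cleaning step discarded in each run and direction. The following is a minimal sketch, not part of the original study: the names `READS`, `retention`, and `summarize` are illustrative assumptions, and only the read counts are taken from the table.

```python
# Read counts copied from the table above:
# (run, sequencing direction) -> (raw reads, cleaned reads)
READS = {
    (1, "Forward"): (10121, 8756),
    (1, "Reverse"): (7378, 6205),
    (2, "Forward"): (12092, 9537),
    (2, "Reverse"): (10482, 1462),
    (3, "Forward"): (2570, 2187),
    (3, "Reverse"): (5050, 4583),
}


def retention(raw, cleaned):
    """Fraction of raw reads that survive the cleaning step."""
    return cleaned / raw


def summarize(reads):
    """Return per-row retention plus the raw and cleaned read totals."""
    per_row = {key: retention(raw, cleaned) for key, (raw, cleaned) in reads.items()}
    total_raw = sum(raw for raw, _ in reads.values())
    total_cleaned = sum(cleaned for _, cleaned in reads.values())
    return per_row, total_raw, total_cleaned


per_row, total_raw, total_cleaned = summarize(READS)
for (run, direction), frac in per_row.items():
    print(f"run {run} {direction}: {frac:.1%} of reads retained")
print(f"total: {total_cleaned}/{total_raw} reads retained")
```

The computed totals can be compared against the "Total" row of the table, and the per-row fractions show which run and direction lost the largest share of reads during cleaning.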
Figure 1. Examples of how different types of UDPS error were defined.
Frequency of specific nucleotide substitution errors in raw UDPS data.
| From base | To A | To T | To G | To C |
| A | – | 0.00 (0.00–0.01) | 0.06 (0.00–0.20) | 0.00 (0.00–0.02) |
| T | 0.00 (0.00–0.01) | – | 0.00 (0.00–0.03) | 0.06 (0.00–0.19) |
| G | 0.02 (0.00*–0.21) | 0.00 (0.00–0.01) | – | 0.00 (0.00–0.01) |
| C | 0.00 (0.00–0.01) | 0.02 (0.00–0.18) | 0.00 (0.00–0.01) | – |
Results were combined from the three UDPS runs and are displayed as median (range) percent (%) error per nucleotide.
0.00* denotes an error frequency of 0.00021%. 0.00 denotes that the substitution error was not observed.
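The substitution table above can be split into transitions (purine to purine, or pyrimidine to pyrimidine) and transversions (purine to pyrimidine, or the reverse). The sketch below is an illustration only, not the authors' analysis: `is_transition` and `MEDIAN_ERROR` are assumed names, and the values are the median percentages as printed in the table.

```python
PURINES = {"A", "G"}


def is_transition(src, dst):
    """True when both bases are purines or both are pyrimidines."""
    return src != dst and ((src in PURINES) == (dst in PURINES))


# Median % error per nucleotide, copied from the table above:
# (from base, to base) -> median as printed
MEDIAN_ERROR = {
    ("A", "T"): 0.00, ("A", "G"): 0.06, ("A", "C"): 0.00,
    ("T", "A"): 0.00, ("T", "G"): 0.00, ("T", "C"): 0.06,
    ("G", "A"): 0.02, ("G", "T"): 0.00, ("G", "C"): 0.00,
    ("C", "A"): 0.00, ("C", "T"): 0.02, ("C", "G"): 0.00,
}

# Partition the twelve possible substitutions by class.
transitions = {k: v for k, v in MEDIAN_ERROR.items() if is_transition(*k)}
transversions = {k: v for k, v in MEDIAN_ERROR.items() if not is_transition(*k)}

for (src, dst), median in sorted(transitions.items()):
    print(f"transition {src}->{dst}: median {median:.2f}%")
print(f"transversions with a nonzero median: "
      f"{sum(1 for v in transversions.values() if v > 0)}")
```

Grouping the medians this way shows that the nonzero medians in the raw data (A to G, T to C, G to A, C to T) are all transitions.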
Figure 2. The average frequency of different substitution errors in percent (%) in cleaned UDPS data from three sequencing runs.
Thick arrows indicate transitions and thin arrows indicate transversions.
Figure 3. Site-specific error frequencies in percent (%) in cleaned UDPS data obtained in the forward sequencing direction of run 1.
All sequencing errors were substitutions since all deletions and insertions were removed by the data cleaning procedure. The bars are color-coded according to the type of substitution error. Homopolymeric regions are shaded.
Figure 4. PCR/UDPS error ratio in our cleaned data.
This figure compares the reverse and forward read counts of each variant in run 1. A) Our filtered UDPS data. B) Same data, normalized by the main variant forward and reverse counts.