Vladimir Potapov, Jennifer L Ong.
Abstract
Next-generation sequencing technology has enabled the detection of rare genetic or somatic mutations and contributed to our understanding of disease progression and evolution. However, many next-generation sequencing technologies first rely on DNA amplification, via the Polymerase Chain Reaction (PCR), as part of sample preparation workflows. Mistakes made during PCR appear in sequencing data and contribute to false mutations that can ultimately confound genetic analysis. In this report, a single-molecule sequencing assay was used to comprehensively catalog the different types of errors introduced during PCR, including polymerase misincorporation, structure-induced template-switching, PCR-mediated recombination and DNA damage. In addition to well-characterized polymerase base substitution errors, other sources of error were found to be equally prevalent. PCR-mediated recombination by Taq polymerase was observed at the single-molecule level, and surprisingly found to occur as frequently as polymerase base substitution errors, suggesting it may be an underappreciated source of error for multiplex amplification reactions. Inverted repeat structural elements in lacZ caused polymerase template-switching between the top and bottom strands during replication, and the frequency of these events was measured for different polymerases. For very accurate polymerases, DNA damage introduced during temperature cycling, and not polymerase base substitution errors, appeared to be the major contributor toward mutations occurring in amplification products. In total, we analyzed PCR products at the single-molecule level and present here a more complete picture of the types of mistakes that occur during DNA amplification.
Year: 2017 PMID: 28060945 PMCID: PMC5218489 DOI: 10.1371/journal.pone.0169774
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Error rate of Taq DNA polymerase.
| Amplicon | Substitution rate | Deletion rate | Insertion rate | Total error rate | Total bases |
|---|---|---|---|---|---|
| LacZ-1 | 1.2 × 10⁻⁴ (98.8%) | 1.6 × 10⁻⁶ (1.2%) | - (0.0%) | 1.3 × 10⁻⁴ | 323,802 |
| LacZ-1 | 1.7 × 10⁻⁴ (97.3%) | 4.7 × 10⁻⁶ (2.6%) | 1.8 × 10⁻⁷ (0.1%) | 1.8 × 10⁻⁴ | 35,879,784 |
| LacZ-2 | 1.7 × 10⁻⁴ (96.1%) | 5.1 × 10⁻⁶ (2.9%) | 1.8 × 10⁻⁶ (1.0%) | 1.8 × 10⁻⁴ | 15,857,446 |
| DNA-1 | 1.4 × 10⁻⁴ (97.2%) | 3.9 × 10⁻⁶ (2.8%) | 1.2 × 10⁻⁷ (0.1%) | 1.4 × 10⁻⁴ | 18,680,811 |
| DNA-2 | 1.4 × 10⁻⁴ (97.5%) | 3.4 × 10⁻⁶ (2.4%) | 1.5 × 10⁻⁷ (0.1%) | 1.4 × 10⁻⁴ | 27,978,748 |
Reported error rates are per base per doubling as detailed in Materials and Methods. Numbers in parentheses are percentages of the total error rate.
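The relationship between the columns can be checked with a short calculation. The following is a minimal sketch, assuming the total error rate is simply the sum of the substitution, deletion and insertion rates; the function name `error_breakdown` is hypothetical and not taken from the paper.

```python
def error_breakdown(substitution, deletion, insertion):
    """Return the total error rate and each error type's share of it (percent).

    Assumption: the total is the plain sum of the three per-type rates.
    """
    total = substitution + deletion + insertion
    shares = {
        "substitution": 100 * substitution / total,
        "deletion": 100 * deletion / total,
        "insertion": 100 * insertion / total,
    }
    return total, shares


# Rounded values from the LacZ-2 row of the table (per base per doubling).
total, shares = error_breakdown(1.7e-4, 5.1e-6, 1.8e-6)
print(f"total = {total:.1e}")
for kind, pct in shares.items():
    print(f"{kind}: {pct:.1f}%")
```

With the rounded LacZ-2 values this gives a total of about 1.8 × 10⁻⁴ and shares of 96.1%, 2.9% and 1.0%, in line with the table. Other rows may differ in the last digit because the table entries are themselves rounded.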
Distribution of individual error types for Taq DNA polymerase.
| Sequencing method | A→G, T→C (%) | G→A, C→T (%) | A→T, T→A (%) | A→C, T→G (%) | G→C, C→G (%) | G→T, C→A (%) |
|---|---|---|---|---|---|---|
| Sanger (dideoxy) | 66 | 21 | 10 | 0.9 | 1.4 | 0.9 |
| Pacific Biosciences RSII | 66 | 19 | 9.3 | 2.0 | 1.6 | 2.0 |
Data derived from sequencing LacZ-1 amplicon.
Substitution error rates measured by PacBio single-molecule sequencing.
| DNA Polymerase | Substitution rate | Accuracy | Fidelity relative to Taq | Total bases |
|---|---|---|---|---|
| Taq | 1.5 × 10⁻⁴ (± 0.2 × 10⁻⁴) | 6,456 | 1 | 98,396,789 |
| Q5 | 5.3 × 10⁻⁷ (± 0.9 × 10⁻⁷) | 1,870,763 | 280 | 112,619,228 |
| Phusion | 3.9 × 10⁻⁶ (± 0.7 × 10⁻⁶) | 255,118 | 39 | 118,262,939 |
| Deep Vent | 4.0 × 10⁻⁶ (± 2.0 × 10⁻⁶) | 251,129 | 44 | 106,217,940 |
| | 5.1 × 10⁻⁶ (± 1.1 × 10⁻⁶) | 195,275 | 30 | 79,614,976 |
| PrimeSTAR GXL | 8.4 × 10⁻⁶ (± 1.1 × 10⁻⁶) | 118,467 | 18 | 118,964,566 |
| KOD | 1.2 × 10⁻⁵ (± 0.2 × 10⁻⁵) | 82,303 | 12 | 121,234,438 |
| Kapa HiFi HotStart ReadyMix | 1.6 × 10⁻⁵ (± 0.3 × 10⁻⁵) | 63,323 | 9.4 | 101,742,963 |
| Deep Vent (exo-) | 5.0 × 10⁻⁴ (± 0.1 × 10⁻⁴) | 2,020 | 0.3 | 60,218,605 |
a Reported error rates are per base per doubling as detailed in Materials and Methods. Standard deviations were determined based on sequencing several samples and are given here in brackets.
b Accuracy is calculated as 1 over substitution rate such that accuracy is a number of bases over which 1 substitution error is expected.
c Fidelity relative to Taq numbers are computed separately for each amplicon (LacZ-1, LacZ-2, DNA-1, DNA-2) and the average number is reported per DNA polymerase. Individual values are available in S1 Table.
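The two derived columns can be illustrated with a short calculation. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the per-amplicon ratio is taken to be the Taq rate divided by the polymerase rate, and the per-amplicon rates in the example are made-up placeholders (the real values are in S1 Table, which is not reproduced here).

```python
from statistics import mean


def accuracy(substitution_rate):
    """Number of bases over which one substitution error is expected."""
    return 1.0 / substitution_rate


def relative_fidelity(taq_rates, polymerase_rates):
    """Mean of per-amplicon ratios (Taq rate / polymerase rate).

    Assumption: both dicts are keyed by the same amplicon names.
    """
    ratios = [taq_rates[name] / polymerase_rates[name] for name in taq_rates]
    return mean(ratios)


# Accuracy from the rounded Q5 substitution rate in the table.
q5_accuracy = accuracy(5.3e-7)
print(round(q5_accuracy))

# Made-up per-amplicon rates, for illustration only.
taq = {"amplicon A": 2.0e-4, "amplicon B": 1.0e-4}
pol = {"amplicon A": 2.0e-6, "amplicon B": 5.0e-6}
averaged = relative_fidelity(taq, pol)                     # mean of 100 and 20
pooled = mean(taq.values()) / mean(pol.values())           # ratio of the mean rates
print(round(averaged, 1), round(pooled, 1))
```

Accuracy computed from a rounded rate will not exactly reproduce the table entry, presumably because the table was built from unrounded rates. The second part shows that averaging per-amplicon ratios generally differs from dividing the averaged rates, which may explain why the fidelity column does not track the substitution-rate column exactly.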
Fig 1. Fidelity measurements and mutational spectrum of DNA polymerases.
(A) Base substitution error rates of various DNA polymerases relative to Taq polymerase. (B) Proportion of each type of base substitution error as a percentage of the total errors for each polymerase.
Fig 2. Schematic of template-switching in lacZ.
Replication of potential cruciform structures at lacZ coordinates 3083..3103 without template-switching (A) and with template-switching (B, C and D) between top and bottom strands. Three classes of template-switching events were identified based on sequence context: double-switching events (B) and single-switch events on the bottom (C) or top (D) strands. Similar sites were identified at locations 185..222 and 1843..1981 in lacZ.
Regions in lacZ with observed template-switching events.
| Inversion | Location in lacZ | Location in LacZ-2 amplicon | Stem length | Loop length |
|---|---|---|---|---|
| Inversion 1 | 185..222 | n.a. | 16 | 6 |
| Inversion 2 | 1843..1981 | 45..183 | 10 | 129 |
| Inversion 3 | 3083..3103 | 1285..1305 | 8 | 5 |
a The length of the LacZ-2 amplicon is 1353 nt. Inversion 2 is located close to the 5’ end of the amplicon and Inversion 3 is close to the 3’ end. Inversion 1 is not covered by the LacZ-2 amplicon.
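The two coordinate columns are related by a constant shift, which can be sketched as follows. This is an illustration only: the offset is inferred from the Inversion 2 row rather than stated in the source, coordinates are assumed to be 1-based and inclusive, and the function name `to_amplicon` is hypothetical.

```python
AMPLICON_LENGTH = 1353  # LacZ-2 amplicon length in nt, from the table footnote
OFFSET = 1843 - 45      # inferred: lacZ position 1843 corresponds to amplicon position 45


def to_amplicon(start, end):
    """Map a lacZ interval to LacZ-2 amplicon coordinates.

    Returns None when the interval is not fully inside the amplicon.
    """
    a_start, a_end = start - OFFSET, end - OFFSET
    if a_start < 1 or a_end > AMPLICON_LENGTH:
        return None
    return a_start, a_end


inversions = {
    "Inversion 1": (185, 222),
    "Inversion 2": (1843, 1981),
    "Inversion 3": (3083, 3103),
}
for name, (start, end) in inversions.items():
    print(name, to_amplicon(start, end))
```

Under these assumptions the sketch returns no mapping for Inversion 1 and reproduces the amplicon coordinates listed for Inversions 2 and 3.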
Percentage of strands with inversions.
| Enzyme | Location in lacZ: 1843..1981 | Location in lacZ: 3083..3103 |
|---|---|---|
| Taq | n.d. | n.d. |
| Q5 | 0.003% | 0.001% |
| Phusion | n.d. | 0.02% |
| Deep Vent | 1.54% | 0.17% |
| | 0.42% | 0.003% |
| PrimeSTAR GXL | n.d. | 0.004% |
| KOD | 0.01% | 0.40% |
| Kapa HiFi HotStart ReadyMix | 0.006% | 0.003% |
| Deep Vent (exo-) | 0.03% | 0.01% |
a “n.d.” indicates that template-switching reads were not detected.
Fig 3. Schematic of measuring PCR-mediated recombination rate.
Two template pairs, differing by single point mutations (markers) spaced at regular intervals across the gene are co-amplified. Recombination products contain markers from both templates.
PCR-mediated recombination rate by Taq DNA polymerase.
| Template pair | Nre | Ntotal | Recombination rate | Strands with at least 1 recombination event |
|---|---|---|---|---|
| DNA-1:DNA-1x | 19,943 | 77,725,936 | 9.6 × 10⁻⁵ | 23% |
| DNA-2:DNA-2x | 14,687 | 44,271,304 | 1.3 × 10⁻⁴ | 28% |
a Number of recombination events.
b Total number of analyzed sequenced bases.
c Recombination rate is per base per doubling. Recombination rate is doubled to account for “cryptic” recombination events.
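The footnotes suggest how the rate column is derived from the two counts. The following is a minimal sketch, assuming that "per doubling" means dividing by the number of template doublings in the PCR; that number is not given in this section, so `doublings` is a placeholder parameter and the value used below is purely illustrative. The function name is hypothetical.

```python
def recombination_rate(n_rec, n_total, doublings):
    """Recombination rate per base per doubling.

    n_rec:     number of observed recombination events
    n_total:   total number of analyzed sequenced bases
    doublings: number of template doublings (assumed input, not from this section)
    """
    observed_per_base = n_rec / n_total
    # Doubled to account for "cryptic" recombination events, per the footnote.
    return 2 * observed_per_base / doublings


# Counts from the DNA-1:DNA-1x row; doublings=5.0 is a placeholder value.
rate = recombination_rate(19_943, 77_725_936, doublings=5.0)
print(f"{rate:.2e}")
```

Because the doubling count here is a placeholder, the printed value is not expected to match the table exactly; the actual value would come from Materials and Methods.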
Fig 4. Thermocycling-induced DNA damage.
The base substitution error rates of plasmid libraries without treatment, mock thermocycling, and mock thermocycling and PreCR treatment. The base substitution error rate of plasmid libraries treated with PreCR before and after SMRTbell library construction represents the background error rate of the single-molecule sequencing fidelity assay.