Piyush Gampawar, Yasaman Saba, Ulrike Werner, Reinhold Schmidt, Bertram Müller-Myhsok, Helena Schmidt.
Abstract
Library preparation for whole-exome sequencing is a critical step that enriches the regions of interest. For Ion Proton, only two exome library preparation methods are available, AmpliSeq and SureSelect. Although of major interest, a comparison of the two methods has so far been missing from the literature. Here, we systematically evaluate the performance of AmpliSeq and SureSelect and present an improved variant calling pipeline. We used 12 in-house DNA samples with genome-wide and exome microarray data and a commercially available reference DNA (NA12878) for evaluation. Both methods had a high concordance (>97%) with microarray genotypes and, when validated against NA12878, a sensitivity of >93% and a positive predictive value of >80%. Application of our variant calling pipeline decreased the number of false positive variants by 90% and resulted in a positive predictive value of 97%. This improvement is highly relevant in research as well as clinical settings.
Keywords: AmpliSeq; SureSelect; exome sequencing; ion proton sequencer; library preparation; validation
Year: 2019 PMID: 31608108 PMCID: PMC6774276 DOI: 10.3389/fgene.2019.00856
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Workflow of the study design. The same color represents the steps at the same level. Identical steps are used to analyze both methods. AS, AmpliSeq; SS, SureSelect; TTR, Total Target Region; ETR, Effective Target Region; OTR, Overlapping Target Region; TPs, True Positives; FNs, False Negatives; FPs, False Positives; PPV, Positive Predictive Value.
Comparison of laboratory protocols and design of AmpliSeq and SureSelect library preparation methods.
| | AmpliSeq | SureSelect |
|---|---|---|
| Enrichment approach | PCR | Hybridization |
| DNA input | 100 ng | 1 µg |
| Steps in library preparation | 3 | 8 |
| DNA fragmentation | NA | Enzymatic fragmentation |
| Target selection | Amplification using primers | Hybridization with RNA library baits |
| Incubation time | ∼4 h | ∼26 h |
| Library preparation time | 6 h | 2.5 days |
| Total target region | 57.74 MB | 60.45 MB |
| RefSeq coding (within total target region) | 32.30 MB (91.13%) | 31.15 MB (87.88%) |
| UCSC coding (within total target region) | 32.57 MB (88.65%) | 32.26 MB (87.78%) |
| Ensembl coding (within total target region) | 32.40 MB (87.90%) | 32.23 MB (87.44%) |
| Effective target region | 46.35 MB | NA |
| RefSeq coding (within effective target region) | 30.58 MB (86.28%) | NA |
| UCSC coding (within effective target region) | 30.76 MB (83.72%) | NA |
| Ensembl coding (within effective target region) | 30.62 MB (83.07%) | NA |
NA, not applicable; MB, million bases.
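The percentages in the coding rows are not defined in the table. A plausible reading, and an assumption here, is that each one is the share of the database's coding sequence that falls inside the target region. Under that reading, dividing the covered bases by the percentage should give roughly the same total coding size for both kits. The helper name `implied_total_mb` is illustrative only.

```python
def implied_total_mb(covered_mb: float, percent: float) -> float:
    """Total size (MB) implied by a covered amount and its percentage share."""
    return covered_mb / (percent / 100.0)


# RefSeq coding rows for the total target region, copied from the table.
refseq = {"AmpliSeq": (32.30, 91.13), "SureSelect": (31.15, 87.88)}

# Assumption: the percentage is covered coding bases / all coding bases.
implied = {kit: implied_total_mb(mb, pct) for kit, (mb, pct) in refseq.items()}

for kit, total in implied.items():
    print(f"{kit}: implied RefSeq coding size ~ {total:.2f} MB")
```

If the two implied totals agree closely, the reading is at least consistent with the reported figures.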
Figure 2. Evenness of coverage, per-base depth of coverage, and its comparison between the AmpliSeq and SureSelect methods. (A) Evenness of coverage plotted for original and downsampled BAM files. (B) Scatter plot showing the distribution of per-base coverage of AmpliSeq and SureSelect up to 1000X read depth. (C) Bar chart showing the difference in coverage after dividing the depth of coverage into 45 groups and normalizing. SureSelect covers more bases in the coverage range of 11X to 150X than AmpliSeq. AS, AmpliSeq; SS, SureSelect.
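The comparison in panel (C) can be pictured as binning per-base depths, normalizing each bin to a fraction of all bases, and subtracting one method from the other. The following is only a sketch of that idea: the bin edges, the toy depth values, and the function name `binned_fractions` are assumptions for illustration and are not the 45 groups or the data used in the study.

```python
from bisect import bisect_right
from typing import List, Sequence


def binned_fractions(depths: Sequence[int], edges: Sequence[int]) -> List[float]:
    """Fraction of bases whose depth falls in each [edges[k], edges[k+1]) bin."""
    counts = [0] * (len(edges) - 1)
    for d in depths:
        k = bisect_right(edges, d) - 1
        if 0 <= k < len(counts):  # depths outside the edges are not binned
            counts[k] += 1
    total = len(depths)
    return [c / total for c in counts]


# Hypothetical bin edges and made-up per-base depths, for illustration only.
edges = [0, 11, 151, 1001]
depths_as = [5, 8, 20, 200, 400]
depths_ss = [12, 30, 90, 140, 300]

frac_as = binned_fractions(depths_as, edges)
frac_ss = binned_fractions(depths_ss, edges)

# Positive values mean the second method has a larger share of bases in that bin.
difference = [s - a for a, s in zip(frac_as, frac_ss)]
print(difference)
```

Normalizing by the total number of bases keeps the two methods comparable even when their target regions differ in size.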
Variant validation of default TVC output (VCF1) against NA12878 truth set.
| | Total Variants | Truth set | TPs | FNs | FPs | Sensitivity | PPV |
|---|---|---|---|---|---|---|---|
| AmpliSeq | | | | | | | |
| Total Variants | 54,351 | 49,340 | 45,946 | 3,394 | 8,405 | 93.12% | 84.54% |
| Total SNVs | 50,913 | 45,092 | 43,840 | 1,252 | 7,073 | 97.22% | 86.11% |
| Exonic SNVs | 19,650 | 16,964 | 16,588 | 376 | 3,062 | 97.78% | 84.42% |
| Total Indels | 3,436 | 4,248 | 2,106 | 2,142 | 1,330 | 49.58% | 61.29% |
| Exonic indels | 539 | 329 | 231 | 98 | 308 | 70.21% | 42.86% |
| SureSelect | | | | | | | |
| Total Variants | 54,934 | 46,982 | 43,929 | 3,053 | 11,005 | 93.50% | 79.97% |
| Total SNVs | 52,013 | 43,367 | 42,230 | 1,137 | 9,783 | 97.38% | 81.19% |
| Exonic SNVs | 19,171 | 16,120 | 15,846 | 274 | 3,325 | 98.30% | 82.66% |
| Total Indels | 2,921 | 3,614 | 1,699 | 1,915 | 1,222 | 47.01% | 58.17% |
| Exonic indels | 312 | 277 | 195 | 82 | 117 | 70.40% | 62.50% |
Truth set, variants in v3.3.2 of the high-confidence calls VCF of NA12878 from the Genome in a Bottle project; SNVs, Single Nucleotide Variants; TPs, True Positives; FNs, False Negatives; FPs, False Positives; PPV, Positive Predictive Value; TVC, Torrent Variant Caller; VCF1, this corresponds to .
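The sensitivity and PPV columns follow from the TP, FN, and FP counts using the standard definitions: sensitivity = TP / (TP + FN) and PPV = TP / (TP + FP). A minimal sketch that recomputes a few rows of the table; the function names are illustrative and the counts are copied from the table.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Percentage of truth-set variants that were called."""
    return 100.0 * tp / (tp + fn)


def ppv(tp: int, fp: int) -> float:
    """Percentage of called variants that are in the truth set."""
    return 100.0 * tp / (tp + fp)


# (TPs, FNs, FPs) for selected rows of the table.
rows = {
    ("AmpliSeq", "Total Variants"): (45946, 3394, 8405),
    ("SureSelect", "Total Variants"): (43929, 3053, 11005),
    ("AmpliSeq", "Exonic indels"): (231, 98, 308),
}

for (method, subset), (tp, fn, fp) in rows.items():
    print(f"{method:10s} {subset:15s} "
          f"sensitivity={sensitivity(tp, fn):.2f}%  PPV={ppv(tp, fp):.2f}%")
```

The same counts also give the other columns: TPs + FNs is the truth-set size and TPs + FPs is the total number of called variants.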
Variant validation in various steps of optimization using NA12878 truth set.
| Steps | Total Variants | Truth set | TPs | FNs | FPs | Sensitivity | PPV |
|---|---|---|---|---|---|---|---|
| AmpliSeq | | | | | | | |
| TTR (VCF1) | 54,351 | 49,340 | 45,946 | 3,394 | 8,405 | 93.12% | 84.54% |
| RG (VCF2) | 55,241 | 49,340 | 46,660 | 2,680 | 8,581 | 94.57% | 84.47% |
| HCR (VCF3) | 47,538 | 48,796 | 46,320 | 2,476 | 1,218 | 94.93% | 97.44% |
| SureSelect | | | | | | | |
| TTR (VCF1) | 54,934 | 46,982 | 43,929 | 3,053 | 11,005 | 93.50% | 79.97% |
| RG (VCF2) | 55,831 | 46,982 | 44,551 | 2,431 | 11,280 | 94.83% | 79.80% |
| HCR (VCF3) | 45,200 | 46,557 | 44,253 | 2,304 | 947 | 95.05% | 97.91% |
Truth set, variants in v3.3.2 of the high-confidence calls VCF of NA12878 from the Genome in a Bottle project; TPs, True Positives; FNs, False Negatives; FPs, False Positives; PPV, Positive Predictive Value; VCF1–3, this corresponds to ; TTR, Total Target Region; RG, Regularization; HCR, High-Confidence Region.
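The size of the false-positive reduction can be recomputed from the FPs column. The sketch below compares the first step (VCF1) with the last (VCF3) for each method; choosing these two steps as the baseline and endpoint is an assumption, since the record does not say which comparison underlies the summary figure in the abstract.

```python
def relative_reduction(before: int, after: int) -> float:
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


# FP counts at VCF1 (TTR) and VCF3 (HCR), copied from the table.
false_positives = {
    "AmpliSeq": (8405, 1218),
    "SureSelect": (11005, 947),
}

reduction = {
    method: relative_reduction(before, after)
    for method, (before, after) in false_positives.items()
}

for method, pct in reduction.items():
    print(f"{method}: false positives reduced by {pct:.1f}%")
```

Note that the HCR step also changes the truth set size, so the VCF1 and VCF3 rows are evaluated over slightly different regions.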
Figure 3. Effect of optimizing the variant calling pipeline. Effect of the optimization steps on total variants, true positives, false positives, sensitivity, and PPV in AmpliSeq and SureSelect. Blue represents AmpliSeq and red SureSelect. PPV, positive predictive value.
Comparing performance of AmpliSeq vs. SureSelect within RefSeq-coding region and overlapping target region.
| Steps | Total Variants | Truth set | TPs | FNs | FPs | Sensitivity | PPV |
|---|---|---|---|---|---|---|---|
| RefSeq-coding region: AmpliSeq | | | | | | | |
| TTR (VCF1) | 21,584 | 19,270 | 17,836 | 1,434 | 3,748 | 92.56% | 82.64% |
| RG (VCF2) | 21,878 | 19,270 | 18,087 | 1,183 | 3,791 | 93.86% | 82.67% |
| HCR (VCF3) | 18,331 | 19,270 | 18,009 | 1,261 | 322 | 93.46% | 98.24% |
| RefSeq-coding region: SureSelect | | | | | | | |
| TTR (VCF1) | 21,649 | 19,270 | 17,312 | 1,958 | 4,337 | 89.84% | 79.97% |
| RG (VCF2) | 21,943 | 19,270 | 17,523 | 1,747 | 4,420 | 90.93% | 79.86% |
| HCR (VCF3) | 17,747 | 19,270 | 17,443 | 1,827 | 304 | 90.52% | 98.29% |
| Overlapping target region: AmpliSeq | | | | | | | |
| TTR (VCF1) | 35,093 | 32,213 | 30,367 | 1,846 | 4,723 | 94.27% | 86.53% |
| RG (VCF2) | 35,550 | 32,213 | 30,788 | 1,425 | 4,762 | 95.58% | 86.60% |
| HCR (VCF3) | 31,161 | 31,979 | 30,611 | 1,368 | 550 | 95.72% | 98.23% |
| Overlapping target region: SureSelect | | | | | | | |
| TTR (VCF1) | 36,228 | 32,213 | 30,651 | 1,562 | 5,577 | 95.15% | 84.61% |
| RG (VCF2) | 36,744 | 32,213 | 31,067 | 1,146 | 5,677 | 96.44% | 84.55% |
| HCR (VCF3) | 31,478 | 31,979 | 30,896 | 1,083 | 582 | 96.61% | 98.15% |
Overlapping target region (OTR), the common region covered by both the AS and SS designs; this region is 43.17 Mb. RefSeq-coding region, the coding region from the RefSeq database, downloaded from the UCSC Table Browser on 20/04/2017; the NA12878 truth set has 19,270 variants in this region. Truth set, variants in v3.3.2 of the high-confidence calls VCF of NA12878 from the Genome in a Bottle project; SNVs, Single Nucleotide Variants; TPs, True Positives; FNs, False Negatives; FPs, False Positives; PPV, Positive Predictive Value; VCF1–3, this corresponds to ; RG, Regularization; HCR, High-Confidence Region.
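An overlapping target region of this kind is the intersection of two sets of genomic intervals. The sketch below shows one way such an intersection and its total size could be computed. It assumes BED-style half-open intervals that are already sorted and merged per chromosome. The coordinates are toy values, not the actual kit designs, and the function names are illustrative. The record does not say which tool the authors used.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED files


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Intersect two sorted, non-overlapping interval lists on one chromosome."""
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def overlap_size(x: Dict[str, List[Interval]], y: Dict[str, List[Interval]]) -> int:
    """Total number of bases shared by two designs across all chromosomes."""
    total = 0
    for chrom in x.keys() & y.keys():
        total += sum(end - start for start, end in intersect(x[chrom], y[chrom]))
    return total


# Toy designs with made-up coordinates.
design_a = {"chr1": [(100, 200), (300, 400)], "chr2": [(0, 50)]}
design_b = {"chr1": [(150, 350)], "chr3": [(10, 20)]}

shared_chr1 = intersect(design_a["chr1"], design_b["chr1"])
shared_bases = overlap_size(design_a, design_b)
print(shared_chr1, shared_bases)
```

Restricting both call sets to the shared intervals before counting TPs, FNs, and FPs is what makes the two methods directly comparable over the same bases.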