Peidong Shen1, Wenyi Wang2, Aung-Kyaw Chi1, Yu Fan2, Ronald W Davis1, Curt Scharfe1.
Abstract
Target enrichment technologies utilize single-stranded oligonucleotide probes to capture candidate genomic regions from a DNA sample before sequencing. We describe target capture using double-stranded probes, which consist of single-stranded, complementary long padlock probes (cLPPs), each selectively capturing one strand of a genomic target through circularization. Using two probes per target increases sensitivity for variant detection, and cLPPs are easily produced by PCR at low cost. Additionally, we introduce an approach for generating capture libraries with uniformly randomized template orientations. This facilitates bidirectional sequencing of both the sense and antisense template strands during one paired-end read, which maximizes target coverage.
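The complementary relationship between the two strands of a double-stranded probe can be illustrated with a reverse-complement computation. This is a minimal sketch and not the authors' probe-design procedure; the function name `reverse_complement` and the example sequence are hypothetical and chosen only for illustration.

```python
# Hypothetical illustration: the two single-stranded probes of a cLPP pair
# are complementary, so one can be derived from the other by taking the
# reverse complement. The sequence below is made up.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


sense_probe = "ATGCGTTAGC"  # made-up sequence, not from the paper
antisense_probe = reverse_complement(sense_probe)
```

Applying the function twice returns the original sequence, which is a quick sanity check when pairing probe strands.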
Year: 2013 PMID: 23718862 PMCID: PMC3706973 DOI: 10.1186/gm454
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Figure 1. Probe construction, target capture and reciprocal paired-end sequencing. (a) Each cLPP contains a common linker flanked by post-capture amplification sites (red and green) and two target-specific capturing arms (blue and orange). Probe ends are trimmed (BsaI and MlyI) and 5'-phosphorylated to produce functional cLPPs. (b) Multiplex probe-target hybridization followed by gap-filling and ligation triggers probe circularization and target capture. (c) Capture libraries are multiplex-amplified using hybrid primers that anneal to the probes' amplification sites and add Illumina sequencing adaptors (P5 or P7) and sample-specific barcodes. This is done in two separate PCRs during which the adaptors swap positions at the ends of templates. Both PCRs are pooled for reciprocal PE sequencing of both DNA strands.
Figure 2. Coverage distribution across target regions. (a) Cumulative mean percent base coverage across 5,619 targets captured using cLPPs and ssLPPs, respectively, shown separately for sequence read 1 and read 2. All bases have a minimum of 10× coverage. (b) Log ratio of coverage of read 1 and read 2. Each boxplot shows the coverage distribution of a group of amplicons within a defined size range; the number of amplicons, percent bases covered (≥10×) and average GC content are shown for each group. The distributions of all groups differ significantly from one another, and the mean of each group differs significantly from 0.
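The two quantities plotted in Figure 2 (percent of bases covered at ≥10× and the log ratio of read 1 to read 2 coverage) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the function names are hypothetical, the logarithm base (2) is an assumption because the caption does not specify it, and the pseudocount is added only to avoid taking the logarithm of zero.

```python
import math


def percent_bases_covered(depths, min_depth=10):
    """Percentage of positions whose read depth is at least min_depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d >= min_depth)
    return 100.0 * covered / len(depths)


def log2_coverage_ratio(read1_depth, read2_depth, pseudocount=1.0):
    """log2 of (read 1 coverage / read 2 coverage).

    The base-2 logarithm and the pseudocount are assumptions made for
    this sketch; the caption only says "log ratio".
    """
    return math.log2((read1_depth + pseudocount) / (read2_depth + pseudocount))
```

A ratio of 0 means read 1 and read 2 cover an amplicon equally; positive values indicate higher coverage in read 1.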
Performance of cLPP target capture and sequencing
| (i) Sensitivity; percentage of the target bases that are represented by one or more reads | 98.7% of all exons and 97.8% of all target bases at >20× coverage (0.012× mean coverage) |
| (ii) Specificity; percentage of sequence reads that map to the intended targets (5,471 exons) | 98.1% mapped target reads confirming the high on-target specificity of LPP capture |
| (iii) Accuracy; base calling concordance to known sample SNVs | >99% concordance rate for both heterozygous and homozygous SNVs, with 97.9% of sample SNVs at >20× coverage (0.012× mean) |
| (iv) Uniformity; variability in sequence coverage across target regions | 91% of capture products were distributed within a 50-fold range (94% within 100-fold) |
| (v) Reproducibility; or how closely results obtained from independent samples correlate | r = 0.93 rank-order correlation between two different HapMap samples (Figure S8 in Additional file |
| (vi) Cost of LPP target capture | $86 per sample for 100 genes in 100 samples or $16.70 per sample for 100 genes in 1,000 samples (Figure S2 in Additional file |
| (vii) Ease of use and time effort | <8 hours target capture and library preparation ('sequencing-ready'), <24 hours MiSeq and approximately 6 hours variant calling (524 genes). Total time: 38 hours |
| (viii) DNA amount required per experiment | >50 ng of genomic DNA |
| (ix) Multiplexing of candidate genomic targets and of DNA samples | Multiplex target capture of 524 genes per sample; sample multiplexing of 7 capture libraries per MiSeq run (#B, Additional file |
Performance parameters are adapted from [2] and calculated for cLPPs based on: (i to iv) multiplex targeted sequencing of 524 genes using cLPPs and MiSeq for NA18507 (experiment #7, Additional file 2); (v) comparison of two independent sample preparations (NA18507 and NA12878, experiment #6) and by estimating the standard deviation (SD) across five cLPP capture experiments (experiment #2, 4 to 7, Additional file 2).
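The two per-sample costs in row (vi) can be related through a simple fixed-plus-variable cost model, per-sample cost = fixed / n + variable. This model is an assumption made here for illustration; the source does not state how the costs were derived and refers to a supplementary figure for details. The function names `fit_cost_model` and `per_sample_cost` are hypothetical.

```python
def fit_cost_model(n1, cost1, n2, cost2):
    """Solve per_sample_cost = fixed / n + variable from two data points.

    Assumes a one-time fixed cost shared across all samples plus a
    constant per-sample cost; this is an illustrative assumption only.
    """
    fixed = (cost1 - cost2) / (1.0 / n1 - 1.0 / n2)
    variable = cost1 - fixed / n1
    return fixed, variable


def per_sample_cost(n, fixed, variable):
    """Per-sample cost for n samples under the assumed model."""
    return fixed / n + variable


# Values from row (vi): $86 at 100 samples, $16.70 at 1,000 samples.
fixed, variable = fit_cost_model(100, 86.0, 1000, 16.70)
```

Under this assumed model, the two reported figures correspond to a shared fixed cost of about $7,700 and a per-sample cost of about $9, so the per-sample price falls as the fixed cost is spread over more samples.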