Ryan Tewhey, Jason B Warner, Masakazu Nakano, Brian Libby, Martina Medkova, Patricia H David, Steve K Kotsopoulos, Michael L Samuels, J Brian Hutchison, Jonathan W Larson, Eric J Topol, Michael P Weiner, Olivier Harismendy, Jeff Olson, Darren R Link, Kelly A Frazer.
Abstract
Targeted enrichment of specific loci of the human genome is a promising approach to enable sequencing-based studies of genetic variation in large populations. Here we describe an enrichment approach based on microdroplet PCR, which enables 1.5 million amplifications in parallel. We sequenced six samples enriched by microdroplet or traditional singleplex PCR using primers targeting 435 exons of 47 genes. Both methods generated similarly high-quality data: 84% of the uniquely mapping reads fell within the targeted sequences; coverage was uniform across approximately 90% of targeted bases; sequence variants were called with >99% accuracy; and reproducibility between samples was high (r² = 0.9). We scaled the microdroplet PCR to 3,976 amplicons totaling 1.49 Mb of sequence, sequenced the resulting sample with both Illumina GAII and Roche 454, and obtained data with equally high specificity and sensitivity. Our results demonstrate that microdroplet technology is well suited for processing DNA for massively parallel enrichment of specific subsets of the human genome for targeted sequencing.
Year: 2009 PMID: 19881494 PMCID: PMC2779736 DOI: 10.1038/nbt.1583
Source DB: PubMed Journal: Nat Biotechnol ISSN: 1087-0156 Impact factor: 54.908
Figure 1. Microdroplet PCR workflow
Primer Library Generation (A): (1) Identify targeted sequences of interest in the genome. (2) Design and synthesize forward and reverse primer pairs for each targeted sequence (library element). (3) Generation of primer pair droplets for each library element. A microfluidic chip is used to encapsulate the aqueous PCR primers in inert fluorinated carrier oil with a block-copolymer surfactant to generate the equivalent of a picoliter scale test tube compatible with standard molecular biology. (4) Primer library, primer pair droplets of library elements are mixed together so that each library element has an equal representation. Genomic DNA Template Mix Preparation (B): (5) Genomic DNA is biotinylated (red dots), fragmented into 2 to 4 kb fragments and purified. (6) Purified genomic DNA is mixed together with all of the components of the PCR reaction (DNA polymerase, dNTPs, and buffer) except for the PCR primers. Droplet Merge and PCR (C): (7) Primer Library droplets are dispensed to the microfluidic chip (8) while the Genomic DNA Template is delivered as an aqueous solution and template droplets are formed within the microfluidic chip. The primer pair droplets and template droplets are then paired together in a 1:1 ratio. (9) Paired droplets flow through the channel of the microfluidic chip to pass through a merge area where an electric field induces the two discrete droplets to coalesce into a single PCR droplet. The roughly 1.5 million PCR droplets are collected into a single 0.2 ml PCR tube. The collection of PCR droplets (PCR Library) is processed in a standard thermal cycler for targeted amplification, followed by breaking the emulsion of PCR droplets to release the PCR amplicons into solution for genomic DNA (gDNA) removal, purification and sequencing.
Illumina GAII reads and mapping statistics
| Sample | PCR Method | Filtered Reads: Number | Filtered Reads: Bases | Mapped Reads: HG18 | Mapped Reads: Full | Uniquely Mapped Reads: HG18 | Uniquely Mapped Reads: Full | Uniquely Mapped Reads: Primer | Uniquely Mapped Reads: Exonic | % Uniquely Mapping to Trimmed: Filtered | % Uniquely Mapping to Trimmed: Mapped | % Uniquely Mapping to Trimmed: Uniquely | Exonic: % of Filtered | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| NA11832 | Traditional | 1496088 | 53.86 Mb | 50.29 Mb | 48.06 Mb | 49.24 Mb | 47.57 Mb | 44.47 Mb | 25.35 Mb | 83.44% | 89.37% | 90.31% | 47.06% | |
| NA11832 | Microdroplet | 1673982 | 60.26 Mb | 56.39 Mb | 44.70 Mb | 52.99 Mb | 44.23 Mb | 41.46 Mb | 21.47 Mb | 69.53% | 74.31% | 78.24% | 35.63% | |
| NA11992 | Traditional | 1213396 | 43.68 Mb | 39.33 Mb | 37.39 Mb | 38.46 Mb | 37.01 Mb | 35.59 Mb | 21.55 Mb | 82.31% | 91.42% | 92.53% | 49.34% | |
| NA11992 | Microdroplet | 1367394 | 49.23 Mb | 43.83 Mb | 30.09 Mb | 39.81 Mb | 29.71 Mb | 28.45 Mb | 16.01 Mb | 58.55% | 65.76% | 71.47% | 32.53% | |
| NA12006 | Traditional | 1256622 | 45.24 Mb | 41.84 Mb | 39.64 Mb | 40.99 Mb | 39.27 Mb | 37.49 Mb | 21.59 Mb | 83.66% | 90.45% | 91.45% | 47.72% | |
| NA12006 | Microdroplet | 1148454 | 41.34 Mb | 37.83 Mb | 30.00 Mb | 35.43 Mb | 29.67 Mb | 28.15 Mb | 15.82 Mb | 68.86% | 75.26% | 79.46% | 38.27% | |
| NA18505 | Traditional | 1222820 | 44.02 Mb | 40.77 Mb | 38.17 Mb | 39.83 Mb | 37.77 Mb | 36.63 Mb | 22.00 Mb | 84.09% | 90.81% | 91.95% | 49.97% | |
| NA18505 | Microdroplet | 1116948 | 40.21 Mb | 36.06 Mb | 30.36 Mb | 34.36 Mb | 30.05 Mb | 29.21 Mb | 15.87 Mb | 73.40% | 81.84% | 85.01% | 39.46% | |
| NA18517 | Traditional | 838226 | 30.18 Mb | 28.12 Mb | 26.35 Mb | 27.41 Mb | 26.04 Mb | 24.99 Mb | 14.89 Mb | 83.81% | 89.93% | 91.17% | 49.33% | |
| NA18517 | Microdroplet | 587958 | 21.17 Mb | 18.12 Mb | 13.04 Mb | 16.40 Mb | 12.87 Mb | 12.37 Mb | 6.88 Mb | 59.16% | 69.12% | 75.41% | 32.49% | |
| NA18489 | Traditional | 1429866 | 51.48 Mb | 46.90 Mb | 44.44 Mb | 45.97 Mb | 44.03 Mb | 41.08 Mb | 23.36 Mb | 80.56% | 88.42% | 89.36% | 45.37% | |
| NA18489 | Microdroplet | 1885186 | 67.87 Mb | 63.50 Mb | 56.18 Mb | 60.79 Mb | 55.62 Mb | 51.75 Mb | 27.18 Mb | 77.03% | 82.32% | 85.14% | 40.06% | |
| All Samples | Traditional | 7457018 | 268.45 Mb | 247.24 Mb | 234.10 Mb | 241.91 Mb | 231.69 Mb | 220.25 Mb | 128.72 Mb | 82.04% | 89.08% | 91.04% | 47.95% | |
| All Samples | Microdroplet | 7779922 | 280.08 Mb | 255.74 Mb | 204.22 Mb | 239.77 Mb | 202.15 Mb | 191.39 Mb | 103.23 Mb | 68.36% | 74.84% | 79.82% | 36.86% | |
| NA18858 | Microdroplet | 10603854 | 381.74 Mb | 325.57 Mb | 293.89 Mb | 309.65 Mb | 289.97 Mb | 245.20 Mb | 122.87 Mb | 64.23% | 76.30% | 79.19% | 32.19% | |
- Filtered reads: high-quality reads from Illumina Pipeline 1.3.
- Mapped reads: number of bases from filtered reads that were mapped by Maq; non-unique reads are randomly placed at one of their multiple locations.
- Uniquely mapped reads: Maq mapping score of 20 or greater, corresponding to a 1% chance of being incorrectly mapped.
- Full: amount of sequence mapping to amplicons, including primer sequences.
- NA18517 had fewer reads than the other 5 samples for both traditional and microdroplet PCR. This was likely a technical issue with both library preparations that is independent of PCR method and DNA quality.
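The mapping-score footnote above follows the Phred convention, in which a quality Q corresponds to an error probability of 10^(-Q/10), so a score of 20 gives 1%. The following is a minimal sketch of that conversion and of a threshold filter. The function names and the example reads are hypothetical and are not taken from Maq or from the study.

```python
def mapping_error_probability(mapq: float) -> float:
    """Convert a Phred-scaled mapping quality into an error probability."""
    return 10 ** (-mapq / 10)


def keep_confident(reads, min_mapq=20):
    """Keep (name, mapq) pairs whose mapping quality meets the threshold."""
    return [read for read in reads if read[1] >= min_mapq]


# Hypothetical reads: (read name, mapping quality)
reads = [("read_a", 0), ("read_b", 19), ("read_c", 20), ("read_d", 37)]
kept = keep_confident(reads)

print(mapping_error_probability(20))
print([name for name, _ in kept])
```

Under this convention, raising the threshold from 20 to 30 would tighten the accepted error probability from 1% to 0.1%.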
Figure 2. Coverage plots of targeted sequences
(A) Validation phase: base-by-base coverage of three target sequences, selected for their varying lengths and GC%, amplified by microdroplet (blue) and traditional (red) PCR. (B) Scale-up phase: coverage of two targets representing an average and a maximum amplicon length, sequenced by Illumina GA (green) and Roche 454 (yellow). At the bottom of each plot, the PCR primer positions are shown (grey dumbbells connected by a line). Roche 454 end sequencing of average-sized amplicons results in 2-fold higher coverage of middle bases, whereas end sequencing of larger amplicons leaves middle bases with no coverage.
Figure 3. Normalized Coverage Distribution Plots
The 457 validation-phase amplicons amplified by traditional PCR (A) and microdroplet PCR (B), and the 3,976 scale-up-phase amplicons amplified by microdroplet PCR (C). Normalized coverage is the absolute base coverage divided by the mean coverage of bases for the indicated sample. Each colored line represents either one of the six samples (A & B) or one of two sequencing platforms (C). The solid colored lines represent the cumulative distribution (left axis) for each sample. The colored dashed lines indicate a skewed normal distribution (right axis) for each sample. For each sample, the mean coverage across all bases is listed.
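The normalization described in this caption (per-base coverage divided by the sample's mean coverage) can be sketched as follows. The depth values are hypothetical, and reading the cumulative curve as "fraction of bases at or above a given normalized coverage" is an assumption, since the caption does not state the direction of accumulation.

```python
def normalized_coverage(depths):
    """Divide each per-base depth by the mean depth of the sample."""
    mean_depth = sum(depths) / len(depths)
    if mean_depth == 0:
        raise ValueError("mean coverage is zero; normalization is undefined")
    return [depth / mean_depth for depth in depths]


def fraction_at_or_above(norm, threshold):
    """Fraction of bases whose normalized coverage is >= threshold (assumed direction)."""
    return sum(1 for value in norm if value >= threshold) / len(norm)


# Hypothetical per-base depths for one sample (mean depth is 20)
depths = [10, 20, 30, 40, 0]
norm = normalized_coverage(depths)

print(norm)
print(fraction_at_or_above(norm, 0.5))
```

Because every sample is scaled by its own mean, curves from samples with very different sequencing depths can be compared on the same axis.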
Figure 4. Inter-sample reproducibility of amplicon coverage
For the validation phase the normalized mean coverage of each amplicon is plotted for NA12006 (Caucasian) versus NA18505 (African) samples for the traditional (A) and microdroplet (B) PCR methods. For each sample (assigned same color as in Figure 2) the average normalized coverage of each amplicon is plotted for traditional versus microdroplet (C). Correlation matrix for all samples depicting Lin's concordance coefficient (D). All samples show a high correlation among each other within a PCR method but not between the two methods.
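Lin's concordance coefficient, named in this caption, measures how closely paired values fall on the identity line, so it penalizes shifts in location or scale that a Pearson correlation ignores. A minimal sketch using the standard definition with population (1/n) variances is shown below. The input values are hypothetical, and the paper's exact computation is not given in this record.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient for two equal-length sequences."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


# Hypothetical normalized amplicon coverages
identical = lins_ccc([1, 2, 3], [1, 2, 3])  # perfect agreement
shifted = lins_ccc([1, 2, 3], [2, 3, 4])    # perfectly correlated but offset by 1

print(identical, shifted)
```

The shifted pair has a Pearson correlation of 1 but a concordance of 4/7, which illustrates why two methods can each be internally reproducible yet score poorly against each other.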
Validation phase sequence variant detection rates and concordance
| Sample | PCR Method | HapMap SNPs: Total | HapMap SNPs: ENCODE | Variant Detection Rate: Total % | Variant Detection Rate: ENCODE % | Variant Concordance: Total % | Variant Concordance: ENCODE % | Reference % |
|---|---|---|---|---|---|---|---|---|
| NA11832 | Traditional | 459 | 279 | 98.47 | 97.85 | 99.67 | 98.53 | 51.08 |
| NA11832 | Microdroplet | 459 | 279 | 97.39 | 97.13 | 98.66 | 98.52 | 51.11 |
| NA11992 | Traditional | 460 | 279 | 99.35 | 99.28 | 99.13 | 98.92 | 51.55 |
| NA11992 | Microdroplet | 460 | 279 | 98.48 | 98.57 | 99.33 | 98.91 | 51.64 |
| NA12006 | Traditional | 461 | 280 | 99.13 | 99.29 | 99.34 | 99.28 | 51.63 |
| NA12006 | Microdroplet | 461 | 280 | 97.61 | 97.86 | 98.85 | 99.27 | 50.02 |
| NA18505 | Traditional | 444 | 265 | 98.87 | 98.49 | 99.09 | 98.85 | 53.40 |
| NA18505 | Microdroplet | 444 | 265 | 97.52 | 98.11 | 98.85 | 98.46 | 52.62 |
| NA18517 | Traditional | 439 | 262 | 97.95 | 97.33 | 99.07 | 99.21 | 50.40 |
| NA18517 | Microdroplet | 439 | 262 | 95.67 | 95.80 | 99.29 | 99.20 | 49.69 |
| NA18489 | Traditional | 189 | 99 | 100 | 100 | 99.47 | 100 | 51.67 |
| NA18489 | Microdroplet | 189 | 99 | 98.94 | 98.99 | 99.47 | 100 | 51.66 |
- HapMap SNPs: number of called genotypes in the 457 amplicons (Total) and the number mapping to the 234 amplicons in the ENCODE intervals (HapMap release 27).
- Variant concordance: the number of genotypes that match between sequence and HapMap, divided by the total number of comparisons.
- Reference %: averaged across all concordant heterozygous sites, the number of observations of the reference allele in comparison to the alternate allele.
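The concordance footnote above (matching genotypes divided by total comparisons) can be sketched in a few lines. The detection-rate definition used here, called HapMap sites divided by all HapMap sites, is an assumption, because this record does not define it. The SNP names and genotypes are hypothetical.

```python
def detection_rate(seq_calls, hapmap_calls):
    """Percent of HapMap sites with a sequence-based genotype call (assumed definition)."""
    called = sum(1 for site in hapmap_calls if seq_calls.get(site) is not None)
    return 100 * called / len(hapmap_calls)


def concordance(seq_calls, hapmap_calls):
    """Percent of compared sites where the sequence genotype matches HapMap."""
    compared = [site for site in hapmap_calls if seq_calls.get(site) is not None]
    if not compared:
        raise ValueError("no sites were called, so nothing can be compared")
    matches = sum(1 for site in compared if seq_calls[site] == hapmap_calls[site])
    return 100 * matches / len(compared)


# Hypothetical genotypes; None marks a site with no sequence-based call
hapmap = {"snp1": "AG", "snp2": "GG", "snp3": "CT", "snp4": "TT"}
sequenced = {"snp1": "AG", "snp2": "GG", "snp3": "CC", "snp4": None}

print(detection_rate(sequenced, hapmap))
print(round(concordance(sequenced, hapmap), 2))
```

Keeping the two measures separate matters: an uncalled site lowers the detection rate but does not count against concordance, which is computed only over sites that were actually compared.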
Scale-up phase sequence variant detection rates and concordance
| Sample | Sequencing | HapMap SNPs | Variant Detection Rate (%) | Variant Concordance (%) | Discordant SNPs | Discordant SNPs in Common |
|---|---|---|---|---|---|---|
| NA18858 | Illumina GAII | 2226 | 99.326 | 98.83 | 26 | 21 |
| NA18858 | 454 Flx | 2226 | 92.273 | 98.49 | 31 | 21 |
- Discordant: number of SNPs with discordant genotypes between HapMap and the Illumina or 454 sequence data. Common: number of discordant SNPs shared by both platforms.