Elaine M Kenny, Paul Cormican, William P Gilks, Amy S Gates, Colm T O'Dushlaine, Carlos Pinto, Aiden P Corvin, Michael Gill, Derek W Morris.
Abstract
Screening large numbers of target regions in multiple DNA samples for sequence variation is an important application of next-generation sequencing, but an efficient method to enrich the samples in parallel has yet to be reported. We describe an advanced method that combines DNA samples using indexes or barcodes prior to target enrichment to facilitate this type of experiment. Sequencing libraries for multiple individual DNA samples, each incorporating a unique 6-bp index, are combined in equal quantities, enriched using a single in-solution target enrichment assay and sequenced in a single reaction. Sequence reads are parsed based on the index, allowing sequence analysis of individual samples. We show that the use of indexed samples does not affect the efficiency of the enrichment reaction. For three- and nine-indexed HapMap DNA samples, the method was found to be highly accurate for SNP identification. Even with sequence coverage as low as 8x, 99% of sequence SNP calls were concordant with known genotypes. Within a single experiment, this method can sequence the exonic regions of hundreds of genes in tens of samples for sequence and structural variation using as little as 1 μg of input DNA per sample.
Year: 2010 PMID: 21163834 PMCID: PMC3041504 DOI: 10.1093/dnares/dsq029
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
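The index-based parsing of reads described in the abstract can be illustrated with a minimal sketch. It assumes the 6-bp index occupies the first six bases of each read, which is consistent with the 40 and 80 bp reads being reported as 34 and 74 bp in the coverage calculation, but is not stated explicitly here. The function name `demultiplex`, the `unassigned` bin and two of the three index sequences are hypothetical; only the pairing of ACACAT with NA10859 comes from the Figure 3 caption. Exact matching is used, so any read with a sequencing error in its index falls into the `unassigned` bin.

```python
from collections import defaultdict

INDEX_LENGTH = 6  # the abstract states that each library carries a unique 6-bp index


def demultiplex(reads, index_to_sample, index_length=INDEX_LENGTH):
    """Group reads by sample using the leading index and return the trimmed inserts."""
    bins = defaultdict(list)
    for read in reads:
        # Assumption: the index is the first `index_length` bases of the read.
        index, insert = read[:index_length], read[index_length:]
        # Reads whose index matches no known sample are set aside, not guessed.
        sample = index_to_sample.get(index, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# ACACAT -> NA10859 is taken from the Figure 3 caption.
# The other two index sequences are hypothetical placeholders.
index_to_sample = {
    "ACACAT": "NA10859",
    "TTGGCA": "NA11881",
    "GATCTG": "NA11882",
}

# Toy reads, shortened for readability (real reads were 40 or 80 bp).
reads = ["ACACATGGCTA", "TTGGCAAACGT", "NNNNNNACGTA"]
binned = demultiplex(reads, index_to_sample)
```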
Figure 2. Sequence coverage across on-target and off-target regions. Sequence coverage is plotted for the single non-indexed and indexed samples at an on-target site (PTBP2 on chromosome 1; A) and at an off-target site (chromosome 12; B). Inclusion of the index does not dramatically change the pattern of sequence coverage at on-target or off-target regions. The higher sequence coverage observed for the non-indexed sample compared with the indexed sample reflects the larger number of clusters that passed QC filtering during the sequence run (Supplementary Table SB).
Figure 1. Experimental design. Genomic DNA from nine HapMap samples was chosen for the study (three trio families). DNA from one of the samples (NA11881) was prepared twice (with and without an indexed adapter), target enriched and sequenced separately as single samples (non-indexed sample and one indexed sample in Step 1 and enriched libraries 1 and 2 in Step 2). One trio family (NA11881, NA11882 and NA10859; all indexed) was pooled after the Illumina genomic DNA sample prep and enriched together using one SureSelect enrichment reaction to produce the enriched library 3 sample. Indexed DNA from all nine samples was also pooled after the Illumina genomic DNA sample prep and enriched together using one SureSelect enrichment reaction to produce the enriched library 4 sample. Note: enriched libraries 3 and 4 were also sequenced using 80 bp reads to generate additional data for validation of the method for SNP detection.
Percentage on-target and fold enrichment for each library
| | Non-index sample | One-index sample | Three-index sample^a | Nine-index sample^a |
|---|---|---|---|---|
| Percentage reads in targeted regions ± 50 bp (%)^b | 20 | 22 | 21 | 18 |
| Fold enrichment in targeted regions^c | 1708 | 1885 | 1689 | 1467 |
| Percentage target bases covered (%)^d | 98 | 98 | 98 | 98 |
| Median coverage of target^e | 169x^f | 93x^f | 164x | 46x |
^a Forty and 80 bp data were combined for the three-index and nine-index samples. Average values given for multisample libraries. Individual values are listed in Supplementary Table SB.
^b Number of reads uniquely mapping to the target region (±50 bp) as a % of the number of reads uniquely mapping to hg18.
^c (Sequence reads uniquely mapping to the target regions / sequence reads mapping to hg18) × maximum enrichment, where maximum enrichment is the ratio of genome length (3 080 419 510 bp) to target length (377 388 bp).
^d Percentage of target bases covered by at least one sequence read.
^e (Number of 34 or 74 bp reads matching target × 34 or 74) / target length.
^f The difference in median read coverage between the non-indexed and indexed sample reflects the larger number of clusters on the flowcell and also the larger number of clusters passing QC filters in the non-indexed sample (83.48 versus 57.65%, Supplementary Table SB).
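The fold-enrichment and coverage formulas given in the table footnotes can be written out as a short sketch. The genome and target lengths are the values quoted in the footnote; the function names and the read counts in the usage lines are hypothetical and are not taken from the study, so the results are not expected to reproduce the table values.

```python
GENOME_LENGTH_BP = 3_080_419_510  # hg18 genome length quoted in the footnote
TARGET_LENGTH_BP = 377_388        # total target length quoted in the footnote

# Maximum possible enrichment: the value reached if every mapped read were on target.
MAX_ENRICHMENT = GENOME_LENGTH_BP / TARGET_LENGTH_BP


def fold_enrichment(reads_on_target, reads_mapped):
    """(on-target reads / mapped reads) x maximum enrichment."""
    return (reads_on_target / reads_mapped) * MAX_ENRICHMENT


def target_coverage(reads_on_target, read_length_bp):
    """(number of on-target reads x read length) / target length."""
    return (reads_on_target * read_length_bp) / TARGET_LENGTH_BP


# Hypothetical counts for illustration only.
example_enrichment = fold_enrichment(reads_on_target=200_000, reads_mapped=1_000_000)
example_coverage = target_coverage(reads_on_target=377_388, read_length_bp=34)
```

With 20% of mapped reads on target, the first call returns one fifth of the maximum enrichment (about 8162).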
Figure 3. Percentage of sequence reads per indexed sample in sequenced libraries. Percentage distribution per sample of sequence reads (pre-alignment to reference genome; 40 and 80 bp data combined) for the three-index (A) and the nine-index (B) sample libraries. The relative underperformance of sample NA10859 in the three-index library is not observed in the nine-index library and is unlikely to be due to a systematic problem with the ACACAT index.
Concordance of SNPs called by MAQ in sequencing data with known HapMap genotypes
| Individual ID (number of SNPs with at least one non-reference allele in the PTBP2 and CDC42 target regions for this sample) | Three-index samples: concordant SNP calls / SNPs with at least 8x coverage and Phred-like consensus quality > 30 (% concordance) | Nine-index samples: concordant SNP calls / SNPs with at least 8x coverage and Phred-like consensus quality > 30 (% concordance) |
|---|---|---|
| NA11881 (103) | 102/103 (99.0%) | 96/97^a (98.9%) |
| NA11882 (136) | 134/135 (99.3%) | 132/134 (98.5%) |
| NA10859 (106) | 104/105 (99.0%) | 98/99 (98.9%) |
| NA12144 (103) | | 96/97 (98.9%) |
| NA12239 (57) | | 54/56 (96.4%) |
| NA12145 (111) | | 107/107 (100%) |
| NA10846 (109) | | 104/105 (99%) |
| NA10847 (109) | | 106/107 (99%) |
| NA12146 (106) | | 101/102 (99%) |
^a Only 97 of the 103 SNPs had ≥8x coverage and a Phred-like consensus score >30 and were included in concordance analysis.
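The concordance figures in the table follow a simple filter-then-compare pattern, sketched below. The thresholds (coverage of at least 8x and Phred-like consensus quality above 30) are those given in the table header. The record layout, the function name `concordance` and the toy genotypes are assumptions for illustration and do not reflect the output format of MAQ.

```python
def normalise(genotype):
    """Treat genotypes as unordered allele pairs, so 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype))


def concordance(calls, known_genotypes, min_coverage=8, min_quality=30):
    """Return (concordant, eligible) counts for calls passing both filters.

    `calls` is a list of dicts with hypothetical keys: position, genotype,
    coverage, quality. `known_genotypes` maps position -> reference genotype.
    """
    eligible = [
        c for c in calls
        if c["coverage"] >= min_coverage          # at least 8x coverage
        and c["quality"] > min_quality            # consensus quality > 30
        and c["position"] in known_genotypes      # a known genotype exists
    ]
    concordant = sum(
        1 for c in eligible
        if normalise(c["genotype"]) == normalise(known_genotypes[c["position"]])
    )
    return concordant, len(eligible)


# Toy data for illustration only; positions and genotypes are invented.
known = {101: "AG", 202: "CC", 303: "TT"}
calls = [
    {"position": 101, "genotype": "GA", "coverage": 12, "quality": 45},  # concordant
    {"position": 202, "genotype": "CT", "coverage": 9, "quality": 40},   # discordant
    {"position": 303, "genotype": "TT", "coverage": 5, "quality": 50},   # below 8x, excluded
]
n_concordant, n_eligible = concordance(calls, known)
percent = 100 * n_concordant / n_eligible
```

Excluded calls shrink the denominator, which is why NA11881 is scored on 97 SNPs in the nine-index library although 103 SNPs are known for that sample.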