Anna Kiialainen, Olof Karlberg, Annika Ahlford, Snaevar Sigurdsson, Kerstin Lindblad-Toh, Ann-Christine Syvänen.
Abstract
Targeted sequencing is a cost-efficient way to answer biological questions in many projects, but choosing an enrichment method can be difficult. In this study we compared two hybridization methods of target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery: Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. In the SureSelect libraries, 74–75% of the sequence reads originated from the targeted region, compared with 41–67% in the Nimblegen libraries. Up to 99.9% of the regions targeted by capture probes could be sequenced from the SureSelect libraries, and up to 99.5% from the Nimblegen libraries. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.
Year: 2011 PMID: 21347407 PMCID: PMC3036585 DOI: 10.1371/journal.pone.0016486
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sequencing and alignment statistics.
| Sample | Number of reads (PF) | % of reads that align to genome | Number of reads that align to target | % of reads that align to target |
| --- | --- | --- | --- | --- |
| NA10860 SureSelect (Downsampled) | 18 325 890 | 89 | 13 803 067 (7 603 844) | 75 (75) |
| NA10860 Nimblegen | 10 093 998 | 85 | 5 363 593 | 53 |
| NA11992 SureSelect (Downsampled) | 18 785 249 | 90 | 13 818 498 (9 293 047) | 74 (74) |
| NA11992 Nimblegen | 12 634 643 | 83 | 5 209 663 | 41 |
| NA11993 SureSelect (Downsampled) | 19 148 344 | 89 | 14 200 810 (9 972 236) | 74 (74) |
| NA11993 Nimblegen | 13 444 317 | 88 | 8 985 107 | 67 |
PF = passing filter. Values in parentheses refer to the downsampled SureSelect libraries.
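The last column of the table above is consistent with dividing the reads that align to the target by the pass-filter reads. The sketch below recomputes that percentage from the transcribed counts as a sanity check; it covers only the full-depth libraries, because the pass-filter totals of the downsampled SureSelect libraries are not listed. The function name `on_target_percent` is illustrative, not from the paper.

```python
# Read counts transcribed from the table above (full-depth libraries only).
# Each tuple is (pass-filter reads, reads that align to the target region).
libraries = {
    "NA10860 SureSelect": (18_325_890, 13_803_067),
    "NA10860 Nimblegen": (10_093_998, 5_363_593),
    "NA11992 SureSelect": (18_785_249, 13_818_498),
    "NA11992 Nimblegen": (12_634_643, 5_209_663),
    "NA11993 SureSelect": (19_148_344, 14_200_810),
    "NA11993 Nimblegen": (13_444_317, 8_985_107),
}


def on_target_percent(pf_reads: int, target_reads: int) -> float:
    """Percentage of pass-filter reads that align to the target region."""
    if pf_reads <= 0:
        raise ValueError("pf_reads must be positive")
    return 100.0 * target_reads / pf_reads


if __name__ == "__main__":
    for name, (pf, target) in libraries.items():
        # Rounded to whole percent, matching the precision of the table.
        print(f"{name}: {on_target_percent(pf, target):.0f}%")
```

Rounded to whole percentages, the recomputed values match the table (75, 53, 74, 41, 74 and 67%).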
Figure 1. Coverage plots of the sequencing libraries prepared with the SureSelect (left) and Nimblegen (right) methods.
Cumulative base frequency is plotted against coverage. In the SureSelect libraries, 94–97% of the bases in the regions targeted by the probes are covered by at least ten reads. In the Nimblegen libraries the corresponding number is 83–97%.
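A cumulative coverage curve like the one in Figure 1 reports, for each depth threshold k, the fraction of targeted bases covered by at least k reads. The following is a minimal sketch of that calculation over a list of per-base depths. The depths in the example are hypothetical, and the sketch uses an inclusive threshold (at least k); whether the paper's ">1X" style thresholds are strict or inclusive is not stated here.

```python
from typing import Sequence


def fraction_covered(depths: Sequence[int], min_depth: int) -> float:
    """Fraction of target bases whose read depth is at least min_depth."""
    if not depths:
        raise ValueError("depths must be non-empty")
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def cumulative_curve(depths: Sequence[int], max_depth: int) -> list[float]:
    """Fraction of bases covered by >= k reads, for k = 0..max_depth."""
    return [fraction_covered(depths, k) for k in range(max_depth + 1)]


if __name__ == "__main__":
    # Hypothetical per-base depths for a 10-base target region.
    depths = [0, 3, 12, 15, 25, 31, 40, 55, 60, 90]
    print(fraction_covered(depths, 10))  # 0.8
    print(cumulative_curve(depths, 3))
```

Plotting `cumulative_curve` over a real depth profile (for example, one exported per base from an alignment) would give a curve of the kind shown in Figure 1.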
Coverage of the probe-targeted regions and of the regions targeted by both probe sets in each sample. Each percentage is given as probe-targeted region/region targeted by both probe sets.
| Sample | Covered >1X (%) | Covered >10X (%) | Covered >30X (%) | Average sequencing depth (X) |
| --- | --- | --- | --- | --- |
| NA10860 SureSelect | 99.5/99.5 | 94.4/94.4 | 83.6/83.4 | 86 |
| NA10860 Nimblegen | 99.0/99.5 | 93.0/94.8 | 77.1/80.7 | 61 |
| NA11992 SureSelect | 99.9/99.9 | 97.4/97.4 | 88.5/88.5 | 105 |
| NA11992 Nimblegen | 96.9/97.9 | 83.1/86.3 | 65.4/70.6 | 59 |
| NA11993 SureSelect | 99.8/99.8 | 97.3/97.3 | 88.7/88.7 | 113 |
| NA11993 Nimblegen | 99.5/99.8 | 96.9/98.0 | 88.1/90.6 | 101 |
Figure 2. Coverage of a single gene.
Per-base coverage plotted along the ARL11 gene in sample NA11993, for the sequencing libraries prepared with the Nimblegen (top) and SureSelect (bottom) methods. The bars at the top of the graph mark the regions targeted by probes in each design (red) and the repeat elements identified by RepeatMasker (blue).
Figure 3. Normalized coverage of the sequencing libraries prepared with the SureSelect (left) and Nimblegen (right) methods.
To normalize the coverage, the absolute per-base coverage was divided by the mean coverage. In the SureSelect libraries, 67–69% of the regions targeted by probes are covered by at least half of the average coverage. In the Nimblegen libraries, 61–72% of the regions targeted by probes are covered by at least half of the average coverage.
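The normalization described above divides each per-base depth by the mean depth, so a value of 0.5 marks half of the average coverage. Below is a small sketch of that step and of the "at least half of the average" fraction. The depth values are hypothetical, and the mean is taken over the list passed in (assumed here to be the probe-targeted bases of one library).

```python
from statistics import mean
from typing import Sequence


def normalized_coverage(depths: Sequence[int]) -> list[float]:
    """Divide each per-base depth by the mean depth over the region."""
    avg = mean(depths)
    if avg == 0:
        raise ValueError("mean coverage is zero; cannot normalize")
    return [d / avg for d in depths]


def fraction_at_least_half_mean(depths: Sequence[int]) -> float:
    """Fraction of bases whose normalized coverage is at least 0.5."""
    norm = normalized_coverage(depths)
    return sum(1 for x in norm if x >= 0.5) / len(norm)


if __name__ == "__main__":
    # Hypothetical per-base depths; the mean is 33.1, so the cut-off is 16.55.
    depths = [0, 3, 12, 15, 25, 31, 40, 55, 60, 90]
    print(fraction_at_least_half_mean(depths))  # 0.6
```

Normalizing by the mean makes libraries sequenced to different depths comparable, which is why uniformity is judged on this scale rather than on absolute coverage.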
HapMap SNPs in the targeted region for the three individuals sequenced.
| Number of SNPs | NA10860 SureSelect | NA10860 Nimblegen | NA11992 SureSelect | NA11992 Nimblegen | NA11993 SureSelect | NA11993 Nimblegen |
| --- | --- | --- | --- | --- | --- | --- |
| In HapMap | 977 | 977 | 1796 | 1796 | 1772 | 1772 |
| Targeted by probes | 673 | 923 | 1335 | 1722 | 1322 | 1701 |
| Targeted by both probe sets | 666 | 666 | 1323 | 1323 | 1310 | 1310 |
| Covered by sequencing | 710 | 884 | 1422 | 1519 | 1406 | 1708 |
| Covered by probes and sequencing | 652 | 870 | 1328 | 1498 | 1307 | 1678 |
| Covered by both probe sets and sequencing | 647 | 635 | 1317 | 1189 | 1297 | 1293 |
| Sequencing agrees with HapMap | 677 | 853 | 1378 | 1436 | 1375 | 1631 |
| Targeted by both probe sets and sequencing agrees with HapMap | 618 | 612 | 1279 | 1129 | 1271 | 1232 |
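One way to read the HapMap table is as agreement rates: how many HapMap SNPs in the region were recovered with a matching result, relative either to the SNPs covered by sequencing or to all HapMap SNPs in the region. The source does not define these ratios; the sketch below is an illustrative calculation on the transcribed counts, and the choice of denominators and the name `agreement_rates` are assumptions.

```python
from typing import Dict, Tuple

# Transcribed from the table above:
# (in HapMap, covered by sequencing, sequencing agrees with HapMap)
HAPMAP_COUNTS: Dict[str, Tuple[int, int, int]] = {
    "NA10860 SureSelect": (977, 710, 677),
    "NA10860 Nimblegen": (977, 884, 853),
    "NA11992 SureSelect": (1796, 1422, 1378),
    "NA11992 Nimblegen": (1796, 1519, 1436),
    "NA11993 SureSelect": (1772, 1406, 1375),
    "NA11993 Nimblegen": (1772, 1708, 1631),
}


def agreement_rates(in_hapmap: int, covered: int, agrees: int) -> Tuple[float, float]:
    """Return (agreement among covered SNPs, agreement among all HapMap SNPs).

    Both denominators are an illustrative choice, not a definition from the paper.
    """
    if in_hapmap <= 0 or covered <= 0:
        raise ValueError("counts must be positive")
    return agrees / covered, agrees / in_hapmap


if __name__ == "__main__":
    for name, counts in HAPMAP_COUNTS.items():
        among_covered, among_all = agreement_rates(*counts)
        print(f"{name}: {100 * among_covered:.1f}% of covered, {100 * among_all:.1f}% of all")
```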
SNPs in the targeted region and in the region covered by both probe sets. Each value is given as targeted region/region covered by both probe sets.
| Sample | Total | In HapMap | Annotated in Ensembl | In 1000 Genomes | Novel |
| --- | --- | --- | --- | --- | --- |
| NA10860 SureSelect | 2405/2069 | 232/209 | 2245/1949 | 2227/1946 | 80/66 |
| NA10860 Nimblegen | 3061/2089 | 310/209 | 2829/1953 | 2837/1937 | 131/87 |
| NA11992 SureSelect | 2513/2161 | 645/598 | 2359/2051 | 2341/2031 | 67/53 |
| NA11992 Nimblegen | 2598/1821 | 667/511 | 2411/1712 | 2409/1686 | 92/62 |
| NA11993 SureSelect | 2702/2310 | 640/591 | 2453/2125 | 2483/2145 | 118/91 |
| NA11993 Nimblegen | 3371/2221 | 758/556 | 3021/2031 | 3079/2049 | 181/98 |
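The table tallies each call set against three variant resources. The sketch below shows one way to produce such counts, assuming a call is "novel" when it appears in none of HapMap, Ensembl or 1000 Genomes; the source does not spell out that definition. Variants are keyed here by a hypothetical (chromosome, position, alternate allele) tuple, and the databases are plain in-memory sets rather than any real annotation API.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical variant key: (chromosome, position, alternate allele).
Variant = Tuple[str, int, str]


def annotate(calls: Iterable[Variant], databases: Dict[str, Set[Variant]]) -> Counter:
    """Count calls found in each database, plus calls found in none ("Novel")."""
    tally: Counter = Counter()
    for call in calls:
        tally["Total"] += 1
        hits = [name for name, known in databases.items() if call in known]
        for name in hits:
            tally[name] += 1
        if not hits:  # assumed definition of a novel SNP
            tally["Novel"] += 1
    return tally


if __name__ == "__main__":
    calls = [("chr1", 100, "A"), ("chr1", 200, "G"), ("chr2", 50, "T")]
    databases = {
        "HapMap": {("chr1", 100, "A")},
        "Ensembl": {("chr1", 100, "A"), ("chr1", 200, "G")},
        "1000 Genomes": {("chr1", 200, "G")},
    }
    print(annotate(calls, databases))
```

Because a SNP can appear in several resources at once, the per-database columns overlap and need not sum to the total.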