Christian Rellstab, Stefan Zoller, Andrew Tedder, Felix Gugerli, Martin C. Fischer.
Abstract
Sequencing of pooled samples (Pool-Seq) using next-generation sequencing technologies has become increasingly popular because it is a rapid and cost-effective way to determine allele frequencies of single nucleotide polymorphisms (SNPs) in population pools. Allele frequencies determined by Pool-Seq have been validated against individual genotyping, but these studies tend to use samples from existing model organism databases or DNA stores and therefore do not validate a realistic setup for sampling natural populations. Here we used pyrosequencing to validate allele frequencies determined by Pool-Seq in three natural populations of Arabidopsis halleri (Brassicaceae). Allele frequencies of the pooled population samples (each consisting of DNA from 20 individual plants) were estimated after mapping Illumina reads to (i) the publicly available, high-quality reference genome of a closely related species (Arabidopsis thaliana) and (ii) our own de novo draft genome assembly of A. halleri. We then pyrosequenced nine selected SNPs in the same individuals from each population, resulting in a total of 540 samples. Our results show a highly significant and accurate relationship between pooled and individually determined allele frequencies, irrespective of the reference genome used. Allele frequencies differed on average by less than 4%. Neither the Pool-Seq nor the individual-based approach tended to yield systematically higher or lower allele frequency estimates. Moreover, the rather high coverage of the mappings to the two reference genomes, ranging from 55 to 284x, had no significant effect on the accuracy of Pool-Seq. A resampling analysis showed that only very low coverage values (below 10-20x) would substantially reduce the precision of the method. We therefore conclude that a pooled re-sequencing approach is well suited for analyses of genetic variation in natural populations.
Year: 2013 PMID: 24244686 PMCID: PMC3820589 DOI: 10.1371/journal.pone.0080422
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
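The abstract contrasts two estimates of the same allele frequency: the share of Pool-Seq reads carrying an allele, and the allele count obtained by genotyping each individual separately. The following minimal Python sketch shows both estimates and the two summaries the abstract reports (the average difference and whether either method tends to over- or underestimate). It assumes diploid individuals, and the function names are hypothetical; it is not the authors' analysis pipeline.

```python
def pool_seq_frequency(allele_reads: int, coverage: int) -> float:
    """Pool-Seq estimate: share of mapped reads at a SNP that carry the allele."""
    if coverage <= 0 or not 0 <= allele_reads <= coverage:
        raise ValueError("need 0 <= allele_reads <= coverage and coverage > 0")
    return allele_reads / coverage


def individual_frequency(genotypes: list[int]) -> float:
    """Individual-based estimate, assuming diploid individuals.

    Each entry is the number of allele copies (0, 1 or 2) in one successfully
    genotyped individual; failed samples are simply left out of the list.
    """
    if not genotypes or any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("expected a non-empty list of 0/1/2 allele counts")
    return sum(genotypes) / (2 * len(genotypes))


def compare_estimates(pooled: list[float], individual: list[float]) -> tuple[float, float]:
    """Return (mean absolute difference, mean signed difference).

    The absolute version measures accuracy; a signed mean close to zero means
    neither method systematically over- or underestimates the frequencies.
    """
    if not pooled or len(pooled) != len(individual):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - i for p, i in zip(pooled, individual)]
    mean_abs = sum(abs(d) for d in diffs) / len(diffs)
    mean_signed = sum(diffs) / len(diffs)
    return mean_abs, mean_signed


# Hypothetical pool of 20 diploid plants: 120 of 200 reads carry the major allele.
print(pool_seq_frequency(120, 200))                    # 0.6
# 5 homozygotes, 14 heterozygotes, 1 homozygote for the other allele: 24 / 40.
print(individual_frequency([2] * 5 + [1] * 14 + [0]))  # 0.6
```

Keeping the absolute and signed differences separate mirrors the two distinct claims in the abstract: small average error, and no directional bias.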
Overview of Pool-Seq validation studies.
| Study | Year | Pool-Seq platform | Validation method | Markers validated | Pool size (individuals) | Accuracy | Remarks |
|---|---|---|---|---|---|---|---|
| Van Tassell et al. | 2008 | Illumina CSMA | Illumina Infinium Array | 23,357 SNPs | 15-35 | | Reduced representation libraries |
| Holt et al. | 2009 | Illumina GA I | Illumina GA I | 403 SNPs | 6 | | |
| Druley et al. | 2009 | Illumina GA I | TaqMan/database (samples with previously known polymorphisms, determined by Sanger) | 14 SNPs in 4 genes | 1,111 | | Pre-amplification of loci |
| Ingman and Gyllensten | 2009 | Roche 454 GS FLX | Database | 16 SNPs in 1 gene | 96 | | Pre-amplification of gene; tested pooled DNA and PCR products |
| Out et al. | 2009 | Illumina GA I | Sanger | 23 SNPs in 1 gene | 287 | | Tested pooled DNA and PCR products |
| | | | | 17 SNPs in 1 gene | 88 | | |
| Bansal et al. | 2011 | Illumina GA IIx | HapMap database (samples with previously known polymorphisms) | >4,000 SNPs | 20 | | In-solution hybridization |
| Margraf et al. | 2011 | Illumina GA IIx | Sanger | 47 SNPs in 1 gene | 30/50 | All known variants detected | Pre-amplification of two gene regions |
| Niranjan et al. | 2011 | Illumina GA IIx | Sanger | n.a. | 20/40 | | Pre-amplification of gene |
| Zhu et al. | 2012 | Illumina GA IIx | Database (strains with previously known polymorphisms) | 100 × 1,000 random SNPs genome-wide | 22-92 | | Pooled flies prior to DNA extraction |
| Gautier et al. | 2013 | Illumina HiSeq 2000 | Illumina HiSeq 2000 | 49,597 SNPs | 20/30 | | Restriction site-associated DNA (RAD) sequencing |
| Zavodna et al. | 2013 | Roche 454 GS FLX | Sanger | 2 mitochondrial genes | 2-13 | R² = 0.31-0.96 | Pre-amplification of genes |
R² = coefficient of determination of a linear regression; r = Pearson's correlation coefficient; mcc = Matthews correlation coefficient; cc = concordance correlation coefficient.
For pooled PCR products.
Calculated from the supporting material.
Comparing estimates of nucleotide diversity and pairwise population differentiation.
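The studies in the overview summarise agreement between pooled and individual estimates with different statistics. To make them easier to compare, here is a short Python sketch of three of the abbreviations in the note above: Pearson's r, R² of a simple linear regression (equal to r² when the regression has an intercept), and Lin's concordance correlation coefficient, which, unlike r, also penalises departures from the 1:1 line. The Matthews correlation coefficient applies to presence/absence calls rather than to frequencies and is not covered. Function names are illustrative only.

```python
from statistics import fmean


def _moments(x: list[float], y: list[float]) -> tuple[float, float, float, float, float]:
    """Means and population (1/n) variances/covariance of paired data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson_r(x: list[float], y: list[float]) -> float:
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / (sxx * syy) ** 0.5


def r_squared(x: list[float], y: list[float]) -> float:
    # Coefficient of determination of a least-squares line with intercept.
    return pearson_r(x, y) ** 2


def concordance_cc(x: list[float], y: list[float]) -> float:
    """Lin's concordance correlation coefficient.

    Equals 1 only if all points lie on the 1:1 line; a constant offset or a
    slope different from 1 lowers it even when r = 1.
    """
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


pyro = [0.1, 0.4, 0.7]
pool = [0.2, 0.5, 0.8]  # same pattern, shifted up by 0.1
print(round(pearson_r(pyro, pool), 3))       # 1.0
print(round(concordance_cc(pyro, pool), 3))  # 0.923
```

The example shows why the choice of statistic matters: a constant bias of 0.1 leaves r at 1.0 but pulls the concordance coefficient down to about 0.92.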
SNP positions, PCR and pyrosequencing primers.
| Gene | TAIR locus identifier | SNP | SNP position in consensus sequenceᵃ | Contig nameᵇ | SNP position in contigᶜ | Primer | Primer sequence | PCR product size [bp] | TA [°C]ᵈ | Sequence to analyseᵉ | Pyrosequencing failures (of 60 samples) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ADC2 | AT4G34710 | 1 | 1587 | NODE_171306 | 9158 | PCR forward | | 223 | 58 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 0 |
| AHK1 | AT2G17820 | 1 | 1251 | NODE_107322 | 14799 | PCR forward | | 163 | 58 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 0 |
| AHK1 | AT2G17820 | 2 | 4933 | NODE_107322 | 18437 | PCR forward | | 373 | 55 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 2 |
| AT3G60750 | AT3G60750 | 1 | 1059 | NODE_134451 | 1382 | PCR forward | | 197 | 58 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 0 |
| ERD7 | AT2G17840 | 1 | 1622 | NODE_8629 | 4361 | PCR forward | | 106 | 58 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 8 |
| RUS1 | AT3G45890 | 1 & 2 | 1982/1996 | NODE_69242 | 24280/24294 | PCR forward | | 129 | 58 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 6/5 |
| RUS1 | AT3G45890 | 3 | 2430 | NODE_69242 | 24722 | PCR forward | | 96 | 58 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 0 |
| UHV1 | AT5G41150 | 1 | 3583 | NODE_76279 | 4123 | PCR forward | | 99 | 55 | | |
| | | | | | | PCR reverse | | | | | |
| | | | | | | Pyrosequencing | | | | | 1 |

ᵃ Positions of the SNPs in the consensus sequences of File S1.
ᵇ Names of the contigs containing the SNPs (File S2).
ᶜ Positions of the SNPs in the contigs of File S2.
ᵈ TA, annealing temperature.
ᵉ The investigated SNPs are marked in bold and underlined (IUPAC codes).
Figure 1. Validation of SNP allele frequencies determined by next-generation sequencing of pooled samples of Arabidopsis halleri.
Comparison between major allele frequencies calculated from individual genotyping using pyrosequencing (PyroMark) and from pooled population samples (Pool-Seq) sequenced with Illumina. In Figure 1a, Pool-Seq allele frequencies were calculated from reads mapped to the publicly available Arabidopsis thaliana genome. In Figure 1b, the reads were mapped to our own de novo draft genome of A. halleri. Shown are the results for SNPs from six genes in all three populations studied. The dashed line represents the expected 1:1 relationship. Open circles represent comparisons that include incomplete data from pyrosequencing; filled circles refer to comparisons with complete data. Note that some data points overlap.
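Figure 1 judges agreement by how closely the points follow the dashed 1:1 line. One way to put a number on that is an ordinary least-squares fit: perfect agreement gives slope 1 and intercept 0. The Python sketch below assumes individual (pyrosequencing) frequencies on the x-axis and Pool-Seq frequencies on the y-axis; the helper name `fit_line` and the example values are hypothetical and not taken from the paper.

```python
def fit_line(x: list[float], y: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = intercept + slope * x.

    Returns (slope, intercept). Points exactly on the 1:1 line give (1.0, 0.0).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical major-allele frequencies for four SNP/population combinations.
pyro = [0.55, 0.70, 0.85, 0.95]  # individual genotyping (x-axis)
pool = [0.57, 0.68, 0.86, 0.94]  # Pool-Seq (y-axis)
slope, intercept = fit_line(pyro, pool)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Reporting slope and intercept alongside R² separates two questions: whether the estimates are tightly related (R²) and whether that relationship is actually the 1:1 line.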
Figure 2. Effect of sequencing coverage on the accuracy of Pool-Seq in Arabidopsis halleri.
a) Mean difference between allele frequencies determined by Illumina Pool-Seq and by individual pyrosequencing. b) Coefficient of determination (R²) of the linear regression between the two allele frequency estimates. Data derive from a resampling analysis with 1000 iterations, in which reads (mapped to Arabidopsis thaliana) were randomly drawn for 27 SNP/population combinations. Error bars represent standard deviations.
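The resampling analysis behind Figure 2 can be sketched as follows, under assumptions the caption does not spell out. This Python version summarises each SNP/population combination by its allele read count, its total coverage and its pyrosequencing frequency, and in every iteration draws reads without replacement down to a fixed target coverage. For each iteration it records the mean absolute difference and the R² between the subsampled Pool-Seq frequencies and the individual frequencies, then reports the mean and standard deviation across iterations, as plotted in panels a and b. All names and input values are hypothetical.

```python
import random
from statistics import fmean, stdev


def r_squared(x: list[float], y: list[float]) -> float:
    """R^2 of a least-squares line with intercept (equal to Pearson's r^2)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for constant data; 0.0 is this sketch's convention
    return sxy * sxy / (sxx * syy)


def resample_coverage(sites, target, iterations=1000, seed=1):
    """sites: (allele_reads, coverage, individual_freq) per SNP/population pair.

    Returns (mean, sd) of the mean absolute difference and (mean, sd) of R^2.
    """
    if iterations < 2:
        raise ValueError("need at least two iterations for a standard deviation")
    if any(target > cov for _, cov, _ in sites):
        raise ValueError("target coverage exceeds the coverage of a site")
    rng = random.Random(seed)
    individual = [f for _, _, f in sites]
    mads, r2s = [], []
    for _ in range(iterations):
        pooled = []
        for allele_reads, cov, _ in sites:
            # Draw `target` reads without replacement from the mapped reads.
            reads = [1] * allele_reads + [0] * (cov - allele_reads)
            pooled.append(sum(rng.sample(reads, target)) / target)
        mads.append(fmean(abs(p - i) for p, i in zip(pooled, individual)))
        r2s.append(r_squared(individual, pooled))
    return (fmean(mads), stdev(mads)), (fmean(r2s), stdev(r2s))


# Made-up illustrative input for three SNP/population combinations.
sites = [(120, 200, 0.60), (45, 150, 0.30), (250, 280, 0.90)]
for cov in (10, 20, 50, 150):
    (mad, mad_sd), (r2, r2_sd) = resample_coverage(sites, cov)
    print(cov, round(mad, 3), round(mad_sd, 3), round(r2, 3), round(r2_sd, 3))
```

Drawing without replacement keeps each subsample within the reads actually observed at a site; drawing with replacement (a bootstrap) would be an equally plausible reading of the caption.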