Yann Dorant1, Laura Benestan1,2, Quentin Rougemont1, Eric Normandeau1, Brian Boyle1,3, Rémy Rochette4, Louis Bernatchez1.
Abstract
Unraveling genetic population structure is challenging in species potentially characterized by large population sizes and high dispersal rates, which often result in weak genetic differentiation. Genotyping a large number of samples can improve the detection of subtle genetic structure, but this may substantially increase sequencing costs and downstream bioinformatic computation time. To overcome this challenge, alternative, cost-effective sequencing approaches, namely Pool-seq and Rapture, have been developed. We empirically measured the resolution power and congruence of these two methods in documenting weak population structure in a nonmodel species with high gene flow, compared with a conventional genotyping-by-sequencing (GBS) approach. For this, we used the American lobster (Homarus americanus) as a case study. First, we found that the GBS, Rapture, and Pool-seq approaches gave similar allele frequency estimates (i.e., correlation coefficient over 0.90) and that all three revealed the same weak pattern of population structure. Yet, Pool-seq data showed F_ST estimates three to five times higher than GBS and Rapture, while the latter two methods returned similar F_ST estimates, indicating that individual-based approaches provided more congruent results than Pool-seq. We conclude that, despite higher costs, GBS and Rapture are more convenient approaches to use for species exhibiting very weak differentiation. While both GBS and Rapture provided similar estimates of population genetic parameters, GBS remains more cost-effective in projects involving a relatively small number of genotyped individuals (e.g., <1,000). Overall, this study illustrates the complexity of estimating genetic differentiation and other summary statistics in complex biological systems characterized by large population sizes and high migration rates.
Keywords: GBS; Homarus; Pool‐seq; Rapture; marine genomics; population genetics
Year: 2019 PMID: 31236247 PMCID: PMC6580275 DOI: 10.1002/ece3.5240
Source DB: PubMed Journal: Ecol Evol ISSN: 2045-7758 Impact factor: 2.912
Figure 1. Map of lobster sampling locations. GAS, Gaspé; LOB, Lobster Bay; SID, Sidney Bight; SJH, Saint‐John Harbour; THE, The Wolves/Deer Island; TRI, Triton
Summary statistics of data obtained using the genotyping-by-sequencing (GBS), Rapture, and Pool‐seq approaches
| | GBS | Rapture | Pool‐seq |
|---|---|---|---|
| Number of individual barcodes per sequencing chip | | | |
| Average reads per library (millions) | 80 | 84 | 78 |
| Average reads per individual/pool (millions) | 1.3 | 0.46 | 8.4 |
| Proportion of targeted loci with at least one read per sample/pool | 98% | 95% | 99% |
| SNPs called | 41,147 | 35,325 | 49,238 |
| SNPs after quality filtering | 16,986 | 13,930 | 10,874 |
| SNPs (only one SNP per locus) | 8,079 | 6,401 | 5,558 |
| SNPs mean depth | 17× | 33× | 87× |
| % targeted loci after filtering | 82% | 65% | 56% |
The last row (% targeted loci after filtering) gives the proportion of loci kept at the end of the filtering steps, relative to the maximum number of loci expected (i.e., the 9,818 loci from the reference catalog used for mapping and for sequence capture).
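The arithmetic behind the last row can be sketched in Python. This is a minimal illustration, not the authors' script: it assumes that the "only one SNP per locus" counts equal the number of loci retained, and it truncates rather than rounds, an assumption chosen only because it matches the reported percentages. All names are illustrative.

```python
import math

CATALOG_LOCI = 9818  # loci in the reference catalog (from the table note)

# SNP counts after keeping one SNP per locus (from the summary table).
# Assumption: each count equals the number of loci retained.
loci_kept = {"GBS": 8079, "Rapture": 6401, "Pool-seq": 5558}


def percent_retained(kept: int, expected: int = CATALOG_LOCI) -> int:
    """Percentage of expected loci retained, truncated to a whole number."""
    return math.floor(100 * kept / expected)


percentages = {method: percent_retained(n) for method, n in loci_kept.items()}
for method, pct in percentages.items():
    print(f"{method}: {pct}%")
```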
Figure 2. Minor allele frequency correlation comparing GBS and Rapture. Comparison between minor allele frequency (MAF) estimates of the 4,664 overlapping SNPs from individual GBS (x‐axis) and Rapture (y‐axis), with the Pearson correlation values for each population comparison. The black line represents the expected correlation (1:1 proportion)
Figure 3. Minor allele frequency correlations comparing GBS and Pool‐seq. Comparison between minor allele frequency (MAF) estimates for the 4,664 overlapping SNPs from individual GBS sequence data (x‐axis) and Pool‐seq data (y‐axis), with the Pearson correlation values for each comparison. For each sampling site comparison, Pool‐seq values represent the average minor allele frequency across pool replicates. The black line represents the expected correlation (1:1 proportion)
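The comparison described in the Figure 3 caption (average the pool replicates per SNP, then correlate with the individual-based estimates) can be sketched as follows. This is a hedged illustration on simulated frequencies, not the study's data or code; the function name, the noise level, and the simulated values are all assumptions, and the frequencies are assumed to refer to the same allele at the same SNPs in both datasets.

```python
import numpy as np


def maf_correlation(individual_maf, pool_replicate_mafs):
    """Pearson r between individual-based MAF and replicate-averaged pool MAF.

    individual_maf: array of shape (n_snps,)
    pool_replicate_mafs: array of shape (n_replicates, n_snps)
    """
    ind = np.asarray(individual_maf, dtype=float)
    pools = np.asarray(pool_replicate_mafs, dtype=float)
    pool_mean = pools.mean(axis=0)  # average across pool replicates, per SNP
    return float(np.corrcoef(ind, pool_mean)[0, 1])


# Simulated example (hypothetical values, for illustration only).
rng = np.random.default_rng(42)
n_snps = 500
true_maf = rng.uniform(0.05, 0.5, size=n_snps)
gbs_maf = np.clip(true_maf + rng.normal(0, 0.03, n_snps), 0, 0.5)
pool_mafs = np.clip(true_maf + rng.normal(0, 0.03, (3, n_snps)), 0, 0.5)

r = maf_correlation(gbs_maf, pool_mafs)
print(f"Pearson r = {r:.2f}")
```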
Details of minor allele frequency (MAF) correlations between individual‐based approaches (i.e., GBS and Rapture) and Pool‐seq for the overlapping SNP datasets
| | GAS | LOB | SID | SJH | THE | TRI |
|---|---|---|---|---|---|---|
| GBS versus | | | | | | |
| Pool replicate 1 | 0.86 | 0.85 | 0.87 | 0.92 | 0.64 | 0.85 |
| Pool replicate 2 | 0.86 | 0.88 | 0.86 | 0.88 | 0.84 | 0.88 |
| Pool replicate 3 | 0.86 | 0.85 | 0.86 | 0.91 | 0.86 | 0.86 |
| Pool replicate 4 | 0.85 | 0.80 | 0.84 | – | – | 0.87 |
| Average | 0.86 | 0.85 | 0.86 | 0.90 | 0.78 | 0.87 |
| Rapture versus | | | | | | |
| Pool replicate 1 | 0.84 | 0.84 | 0.87 | 0.89 | 0.61 | 0.84 |
| Pool replicate 2 | 0.85 | 0.87 | 0.85 | 0.86 | 0.82 | 0.87 |
| Pool replicate 3 | 0.85 | 0.84 | 0.85 | 0.88 | 0.84 | 0.84 |
| Pool replicate 4 | 0.85 | 0.78 | 0.83 | – | – | 0.86 |
| Average | 0.85 | 0.83 | 0.85 | 0.88 | 0.76 | 0.85 |
Values represent MAF correlations between individual‐based data and each Pool‐seq replicate for each sampling site (columns). Sampling site codes are detailed in Figure 1 (i.e., sampling map). All correlation values were significant (p‐value < 10⁻⁴) and were calculated with the Pearson method. Note the weaker correlation for Pool replicate 1 for the THE population.
Genetic differentiation (i.e., pairwise F_ST values) estimated with the Weir and Cockerham (1984) index
| Sample pair | GBS (overall SNPs) | Rapture (overall SNPs) | Pool‐seq (overall SNPs) | GBS (overlapping SNPs) | Rapture (overlapping SNPs) | Pool‐seq (overlapping SNPs) |
|---|---|---|---|---|---|---|
| GAS-LOB | 0.0021 | 0.0022 | 0.0077 | 0.0016 | 0.0017 | 0.0081 |
| GAS-SID | | | 0.0046 | | | 0.0060 |
| GAS-SJH | 0.0030 | 0.0018 | 0.0033 | 0.0020 | 0.0015 | 0.0041 |
| GAS-THE | 0.0026 | 0.0015 | 0.0085 | 0.0013 | | 0.0095 |
| GAS-TRI | 0.0006 | 0.0006 | 0.0047 | | | 0.0061 |
| LOB-SID | 0.0010 | 0.0017 | 0.0071 | | 0.0016 | 0.0078 |
| LOB-SJH | 0.0009 | | 0.0030 | 0.0012 | | 0.0037 |
| LOB-THE | | | 0.0085 | | | 0.0086 |
| LOB-TRI | 0.0018 | 0.0023 | 0.0064 | 0.0012 | 0.0025 | 0.0069 |
| SID-SJH | 0.0025 | 0.0014 | 0.0032 | 0.0022 | 0.0016 | 0.0040 |
| SID-THE | 0.0021 | 0.0013 | 0.0088 | 0.0011 | | 0.0098 |
| SID-TRI | | 0.0006 | 0.0047 | | | 0.0056 |
| SJH-THE | | | | | | 0.0037 |
| SJH-TRI | 0.0024 | 0.0015 | 0.0024 | 0.0019 | 0.0016 | 0.0036 |
| THE-TRI | 0.0022 | 0.0019 | 0.0077 | 0.0021 | 0.0021 | 0.0099 |
| Average | 0.0014 | 0.0011 | 0.0054 | 0.0011 | 0.0011 | 0.0065 |
95% confidence intervals were obtained after 1,000 bootstraps and are provided below F_ST values. Sampling site codes are detailed in Figure 1 (i.e., sampling map). Values in bold were not significant.
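To illustrate how a pairwise F_ST estimate and a 1,000-replicate bootstrap interval fit together, here is a minimal Python sketch. It makes two loud assumptions: it uses Hudson's ratio-of-averages estimator as a simpler stand-in for the Weir and Cockerham (1984) index used in the study, and it resamples SNPs with replacement, which the source does not specify. The data are simulated and all names are illustrative.

```python
import numpy as np


def hudson_fst_terms(p1, p2, n1, n2):
    """Per-SNP numerator and denominator of Hudson's F_ST estimator.

    NOTE: stand-in estimator, not the Weir and Cockerham (1984) index.
    p1, p2: sample allele frequencies; n1, n2: sampled chromosomes per population.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num, den


def fst_with_bootstrap_ci(p1, p2, n1, n2, n_boot=1000, seed=0):
    """Ratio-of-sums F_ST with a percentile 95% CI from resampling SNPs (assumed scheme)."""
    num, den = hudson_fst_terms(p1, p2, n1, n2)
    estimate = num.sum() / den.sum()
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, num.size, size=(n_boot, num.size))  # SNP indices per replicate
    boot = num[idx].sum(axis=1) / den[idx].sum(axis=1)
    lower, upper = np.percentile(boot, [2.5, 97.5])
    return float(estimate), float(lower), float(upper)


# Simulated weakly differentiated populations (hypothetical values).
rng = np.random.default_rng(1)
n_snps, n_chrom = 2000, 100
ancestral = rng.uniform(0.1, 0.9, n_snps)
pop_a = np.clip(ancestral + rng.normal(0, 0.03, n_snps), 0.01, 0.99)
pop_b = np.clip(ancestral + rng.normal(0, 0.03, n_snps), 0.01, 0.99)
freq_a = rng.binomial(n_chrom, pop_a) / n_chrom
freq_b = rng.binomial(n_chrom, pop_b) / n_chrom

estimate, lower, upper = fst_with_bootstrap_ci(freq_a, freq_b, n_chrom, n_chrom)
print(f"F_ST = {estimate:.4f} (95% CI {lower:.4f} to {upper:.4f})")
```

Under this scheme, a pairwise estimate whose interval includes zero would be read as not significant; whether the study applied exactly this criterion is not stated in the text above.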
Figure 4. Clustering analysis under a Bayesian hierarchical model. Panels (a), (b), and (c) show the eigenvalue decomposition of the scaled variance–covariance matrices of population allele frequencies (Ω) for the GBS, Rapture, and Pool‐seq datasets, respectively. Left plots correspond to the overall SNP datasets and right plots correspond to the overlapping SNP datasets. The variance–covariance matrix (Ω) was estimated from the neutral core model proposed by Coop, Witonsky, Di Rienzo, & Pritchard (2010) and implemented in BAYPASS software (Gautier, 2015).
Two‐dimensional association of genetic variation versus geography
| Dataset | Latitude, PC1 | Latitude, PC2 | Longitude, PC1 | Longitude, PC2 |
|---|---|---|---|---|
| GBS overall | 0.71 | 0.74 | 0.69 | 0.70 |
| GBS overlap | 0.69 | 0.72 | 0.68 | 0.78 |
| Rapture overall | 0.55 | 0.94 | 0.48 | 0.63 |
| Rapture overlap | 0.68 | 0.94 | 0.720 | 0.82 |
| Pool‐seq overall | −0.08 | 0.87 | −0.15 | 0.76 |
| Pool‐seq overlap | 0.60 | 0.92 | 0.460 | 0.79 |
Values represent Pearson correlation coefficients (r) between the Ω‐PC coordinates of each sampling site (i.e., PC1 and PC2, see Figure 4) and its geographic position (i.e., latitude and longitude).
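The two steps behind this table (eigen-decomposing a population covariance matrix Ω, then correlating the leading axes with geography) can be sketched in Python. This is an illustration under stated assumptions, not the BAYPASS workflow itself: the Ω matrix and the latitudes below are hypothetical, constructed so that a latitudinal gradient dominates, and scaling the eigenvectors by the square root of the eigenvalues is one common convention rather than a detail confirmed by the source.

```python
import numpy as np


def omega_pc_coordinates(omega, n_components=2):
    """Leading principal axes of a symmetric covariance matrix.

    Returns (coords, explained): site coordinates on each axis and the
    proportion of total variance carried by each axis.
    """
    omega = np.asarray(omega, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(omega)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    explained = eigvals[order] / eigvals.sum()
    coords = eigvecs[:, order] * np.sqrt(np.clip(eigvals[order], 0, None))
    return coords, explained


def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical latitudes for six sites (NOT the study's coordinates).
latitude = np.array([43.0, 44.5, 45.0, 46.2, 48.5, 49.5])
z = (latitude - latitude.mean()) / latitude.std()

# Hypothetical Omega: uniform drift plus a covariance component along latitude.
omega = 1e-3 * np.eye(6) + 5e-4 * np.outer(z, z)

coords, explained = omega_pc_coordinates(omega)
r_lat = pearson_r(coords[:, 0], latitude)
print(f"PC1 explains {explained[0]:.1%}; |r| with latitude = {abs(r_lat):.2f}")
```

The sign of an eigenvector is arbitrary, so the sign of a PC-versus-geography correlation carries no meaning on its own; only its magnitude and its consistency across datasets are comparable.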
Figure 5. Genotyping cost relative to sampling design. Genotyping costs were estimated from our experimental design and sequencing platform fees. Genotyping-by-sequencing (GBS) costs were based on a 96‐barcode sequencing setup. Pool‐seq genotyping costs were calculated based on a pool size of 50 samples, three technical replicates per pool, and 15 Pool‐seq libraries per sequencing chip. Rapture costs are given for three multiplexing scenarios (i.e., 96, 192, and 384 individual barcodes). Rapture genotyping costs were estimated based on the probe kit investment (here a 20K‐probe kit ≈ US$6,000; Arbor Biosciences™, 2016), an average read depth of 15×, and an optimized capture step for five Rapture in the same laboratory experiment. We also allowed for 10% of poor‐quality samples to be re‐sequenced in GBS and Rapture. We set two sequencing runs for each individual or pool library in each approach
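The structure of such a cost comparison can be sketched as a small Python model. Only the design parameters come from the caption (barcodes per chip, pool size of 50, three replicates, 15 pool libraries per chip, two runs per library, 10% re-sequencing, and a probe kit of about US$6,000). Every other price below is a hypothetical placeholder, and the way the terms are combined is an assumption, not the authors' cost model.

```python
import math
from dataclasses import dataclass


@dataclass
class Prices:
    """Unit prices in US$. All but probe_kit are HYPOTHETICAL placeholders."""
    sequencing_run: float = 1000.0          # hypothetical price per chip run
    library_prep_per_sample: float = 10.0   # hypothetical, individual libraries
    pool_library_prep: float = 50.0         # hypothetical, per pooled library
    probe_kit: float = 6000.0               # from the caption (20K-probe kit)


def individual_cost(n_samples, barcodes_per_chip, prices,
                    runs_per_library=2, reseq_percent=10, use_probes=False):
    """Cost of an individual-based design (GBS, or Rapture if use_probes=True)."""
    extra = (n_samples * reseq_percent + 99) // 100  # integer ceiling of the re-sequenced share
    n_effective = n_samples + extra
    n_chips = math.ceil(n_effective / barcodes_per_chip)
    cost = n_effective * prices.library_prep_per_sample
    cost += n_chips * runs_per_library * prices.sequencing_run
    if use_probes:
        cost += prices.probe_kit  # one-off capture probe investment
    return cost


def poolseq_cost(n_samples, prices, pool_size=50, replicates=3,
                 libraries_per_chip=15, runs_per_library=2):
    """Cost of a Pool-seq design with replicated pooled libraries."""
    n_libraries = math.ceil(n_samples / pool_size) * replicates
    n_chips = math.ceil(n_libraries / libraries_per_chip)
    return (n_libraries * prices.pool_library_prep
            + n_chips * runs_per_library * prices.sequencing_run)


prices = Prices()
gbs = individual_cost(960, 96, prices)
rapture = individual_cost(960, 384, prices, use_probes=True)
pool = poolseq_cost(960, prices)
print(f"GBS: {gbs:.0f}  Rapture (384 barcodes): {rapture:.0f}  Pool-seq: {pool:.0f}")
```

Because the unit prices are placeholders, the printed totals are illustrative only; the point of the sketch is how the fixed probe cost, the multiplexing level, and the number of chips interact as the sample size grows.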
Advantages and disadvantages of each approach
| Approach | Advantages | Disadvantages |
|---|---|---|
| GBS | Keeps individual information; no reference genome required; allows low‐coverage sequencing; library normalization | High genotyping costs with large numbers of samples; heavy bioinformatic processing when dealing with thousands of samples; limited multiplexing for sequencing |
| Rapture | Costs decrease with the number of samples compared to GBS; keeps individual information; no reference genome required; allows low‐coverage sequencing; fast bioinformatic processing; requires fewer reads per sample than GBS for the same coverage | Requires a prior RAD‐seq experiment to develop capture probes; investment in probe production; longer overall time to obtain results; less cost‐effective when the number of samples is small |
| Pool‐seq | Low costs; fast library preparation; large library multiplexing (hundreds to thousands of samples); fast bioinformatic processing | No individual information; requires a genomic reference; requires pools of > 40 individuals; unbalanced contribution of samples; minimal coverage > 20× |
Genotyping costs are proportional to the number of samples.
For the same sequencing depth, GBS needs more sequencing effort per sample than Rapture.