Claire Mérot, Emma L Berdan, Hugo Cayuela, Haig Djambazian, Anne-Laure Ferchaud, Martin Laporte, Eric Normandeau, Jiannis Ragoussis, Maren Wellenreuther, Louis Bernatchez.
Abstract
Across a species range, multiple sources of environmental heterogeneity, at both small and large scales, create complex landscapes of selection, which may challenge adaptation, particularly when gene flow is high. One key to multidimensional adaptation may reside in the heterogeneity of recombination along the genome. Structural variants, like chromosomal inversions, reduce recombination, increasing linkage disequilibrium among loci at a potentially massive scale. In this study, we examined how chromosomal inversions shape genetic variation across a species range and asked how their contribution to adaptation in the face of gene flow varies across geographic scales. We sampled the seaweed fly Coelopa frigida along a bioclimatic gradient stretching across 10° of latitude, a salinity gradient, and a range of heterogeneous, patchy habitats. We generated a chromosome-level genome assembly to analyze 1,446 low-coverage whole genomes collected along those gradients. We found several large nonrecombining genomic regions, including putative inversions. In contrast to the collinear regions, inversions and low-recombining regions differentiated populations more strongly, either along an ecogeographic cline or at a fine-grained scale. These genomic regions were associated with environmental factors and adaptive phenotypes, albeit with contrasting patterns. Altogether, our results highlight the importance of recombination in shaping adaptation to environmental heterogeneity at local and large scales.
Keywords: diptera; environmental associations; local adaptation; population genomics; structural variants
Year: 2021 PMID: 33963409 PMCID: PMC8382925 DOI: 10.1093/molbev/msab143
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Fig. 1. Coelopa frigida sampling across an environmental gradient. Map of the 16 sampling sites, colored by geographic region. The background of the map displays the gradient of annual mean air temperature. The inset shows the location of the study area at a wider scale. Photos show C. frigida and its habitat of seaweed beds.
Fig. 2. Two large chromosomal inversions structure within-species genetic variability. (A) PCA of whole-genome variation. Individuals are colored by karyotype at the inversion Cf-Inv(1), as determined previously with an SNP marker (Mérot et al. 2018). Ellipses indicate secondary grouping along PC2. (B) LD in LG1 and LG4. The upper triangles include all individuals and the lower triangles include homokaryotes for the most common arrangement of each inversion. Bars represent the position of the inversions. The color scale shows the 2nd higher percentile of the R² value between SNPs summarized by windows of 250 kb. (C) Along the genome, correlation between PC1 scores of local PCAs performed on windows of 100 SNPs and PC1 scores of the PCA performed on the whole genome; FST differentiation between the two homokaryotypes of Cf-Inv(1) in sliding windows of 25 kb; and nucleotide diversity (π) within the three karyotypic groups of Cf-Inv(1), smoothed for visualization. Dashed lines represent the inferred boundaries of the inversion Cf-Inv(1). (D) Correlation between PC1 scores of local PCAs performed on windows of 100 SNPs and PC2 scores of the PCA performed on the whole genome; FST differentiation between the two homokaryotypes of Cf-Inv(4.1) in sliding windows of 25 kb; and nucleotide diversity (π) within the three karyotypic groups of Cf-Inv(4.1), smoothed for visualization. Dashed lines represent the inferred boundaries of the inversion Cf-Inv(4.1).
Name, Position, and Characteristics of the Putative Inversions and Regions Appearing as Clusters of Outlier Windows in the Local PCA Analysis.
| Name | Status | Chr. | Start | Stop | Size (Mb) | dXY |
|---|---|---|---|---|---|---|
| | | LG1 | 8,342,182 | 33,487,673 | 25.1 | 1.84% [1.80 |
| | | LG4 | 1,088,816 | 7,995,568 | 6.9 | 0.64% [0.61 |
| | | LG4 | 22,421,881 | 25,145,365 | 2.7 | 0.079% [0.061 |
| | | LG4 | 30,622,035 | 31,991,919 | 1.4 | 0.32% [0.31 |
| | | LG2 | 14,083,320 | 20,869,940 | 6.8 | |
| | | LG3 | 7,486,933 | 13,829,649 | 6.3 | |
| | | LG5 | 15,940,464 | 32,665,323 | 16.7 | |
Note.—For putative inversions, absolute nucleotide divergence (dXY) in noncoding regions was calculated between homokaryotypic groups and corrected by the mean of nucleotide diversity (π) within homokaryotypic groups by windows of 25 kb. Numbers between square brackets indicate confidence intervals drawn by bootstrapping windows of 25 kb.
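The correction described in this note can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline. It assumes that "corrected by" means subtracting the mean within-group π from dXY in each 25 kb window (the usual net-divergence form), and that the confidence interval is a percentile bootstrap over windows. The function names and the toy values are hypothetical.

```python
import random
from statistics import mean


def net_divergence(dxy, pi_a, pi_b):
    """Per-window dXY minus the mean within-group diversity.

    Assumption: "corrected" is read as a subtraction; dxy, pi_a and pi_b
    are equal-length lists with one value per 25 kb window.
    """
    return [d - (a + b) / 2 for d, a, b in zip(dxy, pi_a, pi_b)]


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of the mean, resampling windows with replacement."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lower = means[int(n_boot * alpha / 2)]
    upper = means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper


# Toy per-window values (hypothetical, not taken from the study).
dxy = [0.020, 0.022, 0.019, 0.021]
pi_hom_a = [0.002, 0.003, 0.002, 0.003]
pi_hom_b = [0.004, 0.003, 0.002, 0.001]

net = net_divergence(dxy, pi_hom_a, pi_hom_b)
print(mean(net), bootstrap_ci(net))
```

Resampling whole windows rather than individual SNPs keeps linked sites together, which is presumably why the note describes the bootstrap at the window level.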
Fig. 3. Detecting other regions exhibiting nonrecombining haplotypic blocks. (A) LD across the five major chromosomes expressed as the 2nd higher percentile of the R² value between SNPs summarized by windows of 1 Mb. (B) Recombination rate (in cM/Mb) inferred from the linkage map, smoothed with a loess function accounting for 10% of the markers. (C) Nucleotide diversity (π) by sliding windows of 100 kb (step 20 kb) averaged across the different geographic populations. (D) Position along the genome of clusters of local PCA windows scored as outliers (>4 SD) along each axis of the MDS, at the upper end in black and the lower end in gray. Colored rectangles indicate the position of the inversions and the regions of interest gathering outlier clusters or putative inversions. Dashed lines represent their inferred boundaries across all plots. (E) PCA performed on SNPs located in each region of interest. For the two regions on LG4 that appear as two linked putative inversions (Cf-Inv(4.2) and Cf-Inv(4.3)), three clusters were identified with high confidence and colored as putative homokaryotes and heterokaryotes. The same colors are used in both regions since karyotyping was consistent across all individuals.
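Panel D relies on flagging local PCA windows that lie more than 4 SD from the mean along an MDS axis. A minimal sketch of that thresholding step follows, assuming the MDS coordinates are already available as a plain list with one value per window. The function name, the use of the sample standard deviation, and the toy coordinates are assumptions, not the authors' code.

```python
from statistics import mean, stdev


def outlier_windows(coords, n_sd=4.0):
    """Return (upper, lower) lists of window indices beyond n_sd SDs of the mean.

    coords: one MDS coordinate per genomic window (hypothetical input).
    """
    mu = mean(coords)
    sd = stdev(coords)  # sample SD; needs at least two windows
    upper = [i for i, x in enumerate(coords) if x > mu + n_sd * sd]
    lower = [i for i, x in enumerate(coords) if x < mu - n_sd * sd]
    return upper, lower


# Toy axis: 29 unremarkable windows and one extreme window (made-up values).
coords = [0.0] * 29 + [10.0]
upper, lower = outlier_windows(coords)
print(upper, lower)
```

Keeping the upper and lower tails separate mirrors the black and gray marks in the panel, since windows at opposite ends of an axis may reflect different haplotypic blocks.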
Fig. 4. Genetic variation is geographically structured along a North-South gradient and displays isolation by resistance (IBR). (A) Third and fourth PCs of a PCA on whole-genome variation. Individuals are colored by their geographic region, as in fig. 1. (B and C) IBR displayed as the association between genetic distance (FST/(1−FST)) and the distance by the least-cost path following the coast. Colors denote the subset of SNPs used for the calculation of the FST. The results are displayed in two panels with different y scales to better display the lower values. (D) Latitudinal variation of inversion frequencies.
Association between Genetic Distance and Geographic Distances Measured as Least-Cost Distances along the Shoreline (IBR) for the Different Fractions of the Genome.
| SNP subset | | | | Intercept | Slope coefficient | Comparison to collinear regions |
|---|---|---|---|---|---|---|
| All | 0.19 | 29.3 | <0.001 | 0.0085 | 0.0020 [0.0013 | |
| Collinear | 0.54 | 138.6 | <0.001 | 0.0062 | 0.0019 [0.0015 | |
| LD pruned | 0.63 | 199.5 | <0.001 | 0.0057 | 0.0021 [0.0018 | |
| | −0.01 | 0.3 | 0.59 | 0.0137 | −0.0006 [−0.0032 | −* |
| | 0.29 | 49.4 | <0.001 | 0.0172 | 0.0134 [0.0096 | +* |
| | 0.50 | 121.5 | <0.001 | 0.0075 | 0.0030 [0.0025 | +* |
| | 0.44 | 95.4 | <0.001 | 0.0074 | 0.0028 [0.0023 | +* |
| | 0.49 | 113.1 | <0.001 | 0.0066 | 0.0019 [0.0016 | n.s. |
| | 0.55 | 147.2 | <0.001 | 0.0080 | 0.0033 [0.0028 | +* |
Note.—Numbers between square brackets indicate the limits of the 95% distribution of the slope coefficient. The comparison to collinear regions displays the output of a full model comparing each region to the collinear genome, providing the direction and the significance (*) of the interaction term.
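The intercept and slope reported above come from relating linearized genetic distance, FST/(1−FST), to least-cost geographic distance. The sketch below shows that relationship with an ordinary least-squares fit in plain Python. It is a simplified illustration under stated assumptions: the function names and toy values are hypothetical, and it does not reproduce the significance tests or the full model with an interaction term, which must account for the non-independence of pairwise population comparisons.

```python
def linearize_fst(fst):
    """Convert FST to the linearized genetic distance FST / (1 - FST)."""
    return fst / (1.0 - fst)


def ols(x, y):
    """Ordinary least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Toy pairwise comparisons (hypothetical values, not from the study).
least_cost_distance = [50.0, 150.0, 300.0, 600.0]  # arbitrary distance units
pairwise_fst = [0.006, 0.007, 0.009, 0.012]

genetic_distance = [linearize_fst(f) for f in pairwise_fst]
intercept, slope = ols(least_cost_distance, genetic_distance)
print(intercept, slope)
```

A steeper slope for a given SNP subset would indicate that differentiation accumulates faster with distance in that fraction of the genome, which is the comparison the last column of the table summarizes.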
Fig. 5. Environmental and phenotypic associations. Candidate SNPs associated with (A) climatic variation along the North-South gradient, (B) salinity variation along the estuarine gradient, (C) variation in abiotic characteristics of the wrackbed habitat, and (D and E) variation in wrackbed algal composition. The Manhattan plots show the Bayes factor from the environmental association analysis performed in BayPass, controlling for population structure. (F) Candidate SNPs associated with wing size. The Manhattan plot shows the P values from the GWAS. Points are colored according to FDR (black: <0.00001, red: <0.0001, orange: <0.001). Dashed lines represent the inferred boundaries of inversions and low-recombining regions.
Genomic Repartition of Candidate SNPs Associated with Environmental Variables.
| Region | Tested SNPs (N) | Tested SNPs (%) | Climate (N) | Climate (%) | Climate (OR) | Salinity (N) | Salinity (%) | Salinity (OR) | Bed abiotic characteristics (N) | Bed abiotic characteristics (%) | Bed abiotic characteristics (OR) | Algal composition PC1 (N) | Algal composition PC1 (%) | Algal composition PC1 (OR) | Algal composition PC2 (N) | Algal composition PC2 (%) | Algal composition PC2 (OR) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | 1,155,978 | | 3,635 | | | 509 | | | 780 | | | 372 | | | 2,740 | | |
| Collinear | 814,279 | 70 | 556 | 15 | 0.2 | 301 | 59 | 0.8 | 163 | 21 | 0.3 | 254 | 68 | 1.0 | 390 | 14 | 0.2 |
| | 176,963 | 15 | 1,474 | 41 | | 64 | 13 | 0.8 | 584 | 75 | | 77 | 21 | | 1,494 | 55 | |
| | 57,323 | 5.0 | 480 | 13 | | 15 | 2.9 | 0.6 | 11 | 1.4 | 0.3 | 14 | 3.8 | 0.8 | 33 | 1.2 | 0.2 |
| | 17,019 | 1.5 | 111 | 3.1 | | 8 | 1.6 | 1.1 | 8 | 1.0 | 0.7 | 3 | 0.8 | 0.5 | 26 | 0.9 | 0.6 |
| | 20,458 | 1.8 | 93 | 2.6 | | 6 | 1.2 | 0.7 | 9 | 1.2 | 0.7 | 3 | 0.8 | 0.5 | 15 | 0.5 | 0.3 |
| | 16,313 | 1.4 | 11 | 0.3 | 0.2 | 28 | 5.5 | | 0 | 0.0 | 0.0 | 3 | 0.8 | 0.6 | 7 | 0.3 | 0.2 |
| | 53,623 | 4.6 | 910 | 25 | | 87 | 17 | | 5 | 0.6 | 0.1 | 18 | 4.8 | 1.0 | 775 | 28 | |
Note.—Repartition of the candidate SNPs associated with each environmental variable using the combination of two GEA methods. N is the number of outlier SNPs located in a given region, % is the proportion of the outliers found in this region, and OR indicates the odds ratio. Values in bold with a star indicate a significant excess of candidate SNPs in a Fisher exact test. Results obtained for each GEA method are presented in supplementary table S5, Supplementary Material online.
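To make the N, %, and OR columns concrete, the sketch below computes two candidate enrichment measures for one cell of the table (Collinear region, Climate), using only counts that appear in the table. Both functions are hypothetical helpers written for illustration. The first is the ratio between a region's share of candidate SNPs and its share of tested SNPs, and the second is the conventional odds ratio of a 2×2 table, assuming candidates are a subset of the tested SNPs. For this cell the first measure rounds to 0.2, which matches the tabulated value, but whether the OR column was derived this way is an assumption. The Fisher exact test itself is not reproduced.

```python
def enrichment_ratio(cand_in_region, cand_total, tested_in_region, tested_total):
    """Share of candidates falling in the region divided by the region's share of tested SNPs."""
    share_of_candidates = cand_in_region / cand_total
    share_of_tested = tested_in_region / tested_total
    return share_of_candidates / share_of_tested


def odds_ratio(cand_in_region, cand_total, tested_in_region, tested_total):
    """Conventional odds ratio from the 2x2 table (candidate or not, in region or not).

    Assumption: every candidate SNP is also counted among the tested SNPs.
    """
    a = cand_in_region                           # candidate, in region
    b = cand_total - cand_in_region              # candidate, outside region
    c = tested_in_region - cand_in_region        # non-candidate, in region
    d = (tested_total - tested_in_region) - b    # non-candidate, outside region
    return (a * d) / (b * c)


# Counts read from the table: Collinear region, Climate column.
args = (556, 3635, 814279, 1155978)
print(round(enrichment_ratio(*args), 2), round(odds_ratio(*args), 3))
```

Values below 1 on either measure mean the region holds fewer climate-associated candidates than its share of tested SNPs would predict.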