Mona Schreiber1,2, Yixuan Gao3, Natalie Koch3, Joerg Fuchs2, Stefan Heckmann2, Axel Himmelbach2, Andreas Börner2, Hakan Özkan4, Andreas Maurer3, Nils Stein2, Martin Mascher2,5, Steven Dreissig3. 1. University of Marburg, Department of Biology, Marburg, Germany. 2. Leibniz Institute of Plant Genetics and Crop Plant Research (IPK), Seeland, OT Gatersleben, Germany. 3. Institute of Agricultural and Nutritional Sciences, Martin-Luther-University Halle-Wittenberg, Halle (Saale), Germany. 4. University of Cukurova, Faculty of Agriculture, Department of Field Crops, 01330 Adana, Turkey. 5. German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Leipzig, Germany.
Abstract
The genomic landscape of recombination plays an essential role in evolution. Patterns of recombination are highly variable along chromosomes, between sexes, individuals, populations, and species. In many eukaryotes, recombination rates are elevated in sub-telomeric regions and drastically reduced near centromeres, resulting in large low-recombining (LR) regions. The processes of recombination are influenced by genetic factors, such as different alleles of genes involved in meiosis and chromatin structure, as well as external environmental stimuli like temperature and overall stress. In this work, we focused on the genomic landscapes of recombination in a collection of 916 rye (Secale cereale) individuals. By analysing population structure among individuals of different domestication status and geographic origin, we detected high levels of admixture, reflecting the reproductive biology of a self-incompatible, wind-pollinating grass species. We then analysed patterns of recombination in overlapping subpopulations, which revealed substantial variation in the physical size of LR regions, with a tendency for larger LR regions in domesticated subpopulations. Genome-wide association scans (GWAS) for LR region size revealed a major quantitative-trait-locus (QTL) at which, among 18 annotated genes, an ortholog of histone H4 acetyltransferase ESA1 was located. Rye individuals belonging to domesticated subpopulations showed increased synaptonemal complex length, but no difference in crossover frequency, indicating that only the recombination landscape is different. Furthermore, the genomic region harbouring rye ScESA1 showed moderate patterns of selection in domesticated subpopulations, suggesting that larger LR regions were indirectly selected for during domestication to achieve more homogeneous populations for agricultural use.
The genomic landscape of recombination plays an essential role in evolution. Patterns of recombination are highly variable along chromosomes, between sexes, individuals, populations, and species. In many eukaryotes, recombination rates are elevated in sub-telomeric regions and drastically reduced near centromeres, resulting in large low-recombining (LR) regions. The processes of recombination are influenced by genetic factors, such as different alleles of genes involved in meiosis and chromatin structure, as well as external environmental stimuli like temperature and overall stress. In this work, we focused on the genomic landscapes of recombination in a collection of 916 rye (Secale cereale) individuals. By analysing population structure among individuals of different domestication status and geographic origin, we detected high levels of admixture, reflecting the reproductive biology of a self-incompatible, wind-pollinating grass species. We then analysed patterns of recombination in overlapping subpopulations, which revealed substantial variation in the physical size of LR regions, with a tendency for larger LR regions in domesticated subpopulations. Genome-wide association scans (GWAS) for LR region size revealed a major quantitative-trait-locus (QTL) at which, among 18 annotated genes, an ortholog of histone H4 acetyltransferase ESA1 was located. Rye individuals belonging to domesticated subpopulations showed increased synaptonemal complex length, but no difference in crossover frequency, indicating that only the recombination landscape is different. Furthermore, the genomic region harbouring rye ScESA1 showed moderate patterns of selection in domesticated subpopulations, suggesting that larger LR regions were indirectly selected for during domestication to achieve more homogeneous populations for agricultural use.
Meiosis is a specialized cell division during which homologous chromosomes recombine and pair to produce genetically reshuffled gametes. As such, it is an important factor in the evolution of eukaryotic genomes, facilitating the creation of genetic diversity across time (Charlesworth 1976; Barton and Charlesworth 1998; Otto 2009). Within populations, the breeding system, for example, outcrossing versus selfing, has an impact on the effective rate of recombination and patterns of genetic diversity (Jain 1976; Nordborg 2000; Charlesworth and Wright 2001). The genomic landscape of recombination plays a role in shaping patterns of diversity along the genome and has an effect on the efficacy of selection (i.e., linked selection) (Charlesworth and Jensen 2021). In different populations of the same species, genome-wide recombination rates were shown to be similar (e.g., honeybee [Apis mellifera] [Ross et al. 2015], great tit [Parus major] [van Oers et al. 2014]). In species in which recombination hot-spots are determined by PRDM9, hot-spot landscape divergence is driven by the evolution of PRDM9 binding sites (Baker et al. 2017; Latrille et al. 2017).Meiotic recombination takes place during prophase I, where DNA double-strand breaks (DSBs) are initiated. DSB repair using the homologous chromosome is required for chromosome pairing and balanced chromosome segregation (Mercier et al. 2015). At the onset of meiotic prophase, sister-chromatid loops are tethered to the chromosome axis via REC8-cohesins, a proteinaceous scaffold that organizes meiotic chromosomes into loops and regulates meiotic recombination events (Ur and Corbett 2021). Dependent on DSB repair, homologous chromosomes pair via the polymerization of the synaptonemal complex (SC), a liquid crystal-like structure comprising the chromosome axis, transverse filament-like protein elements, and a central element. Together, the SC acts as a regulator of chromatin density and impacts on the distribution of recombination events along the chromosome and heterochiasmy (Kleckner et al. 2003; Rog et al. 2017; Capilla-Pérez et al. 2021; France et al. 2021; Gordon et al. 2021). In mice, for example, quantitative variation in SC length has an effect on genome-wide recombination rates, and several candidate genes were identified [e.g., Hmf1, Rnf212 (Wang et al. 2019)]. The distribution of recombination events is also strongly influenced by the local chromatin environment, for example, via patterns of DNA methylation (Melamed-Bessudo and Levy 2012; Mirouze et al. 2012; Yelina et al. 2012; Underwood et al. 2018) and histone acetylation (Perrella et al. 2010; Wang et al. 2021).In many eukaryotes, recombination rates increase with distance from the centromere and correlate with the density of heterochromatin. Interestingly, chromosome length, or SC length at prophase, correlates with the distribution of recombination events. Species with longer chromosomes, that is, large-genome grasses, show drastically telomere-biased recombination landscapes, whereas smaller genome species, such as Arabidopsis, do not show large low-recombining (LR) regions around the centromere (Haenel et al. 2018; Dreissig et al. 2019; Rowan et al. 2019; Danguy des Déserts et al. 2021; Dukić et al. 2022). Within a given species, SC length variation is influencing the total crossover number, with longer SCs per base pair (bp) permitting more DSBs and crossovers to occur (Kleckner et al. 2003). It was recently shown that a reduction in histone H4 acetylation via knock-out of ESA1 in yeast leads to a reduction in SC length and crossover number, suggesting that histone modifications are important regulators of SC length and crossover distribution (Wang et al. 2021). Meiotic recombination rates are highly variable at different scales, such as along the chromosome, between sexes, individuals, populations, and species (Stapley et al. 2017). At the population scale, several meiotic genes exhibit allelic variation and were shown to be under selection in different species (Wright et al. 2015; Johnston et al. 2016, 2018; Brand et al. 2019). In Drosophila, for example, the mini-chromosome maintenance protein MEI-218 is influencing the recombination landscape and is under positive selection in some populations (Brand et al. 2019).Rye (Secale cereale) is a self-incompatible, wind-pollinating plant species belonging to the Poaceae (Monocotyledons). Its wide-ranged distribution, random-mating reproductive biology, and recently assembled reference genome (7.9 gigabases) make it an ideal model for population genetic studies (Li et al. 2021; Rabanus-Wallace et al. 2021; Sun et al. 2021). Rye was domesticated from weeds outside its region of origin, the Fertile Crescent, which was first hypothesized by Nikolai Ivanovič Vavilov (1887–1943) and recently confirmed by Sun et al. (2021). To this date, numerous samples of outbreeding rye individuals were collected, maintained, and stored in ex situ genebanks as geo-referenced accessions, enabling comprehensive genomic studies not only in this species (Wambugu et al. 2018; Mascher et al. 2019; Milner et al. 2019; Haupt and Schmid 2022).Here, our goal was to explore population-level differences in the recombination landscape of a widely distributed grass species. We hypothesized that recombination landscape divergence was driven by allelic variants of genes involved in shaping patterns of recombination, and that indirect selection of such genes may have occurred during domestication. This was based on the observation that alleles of meiotic genes were shown to impact on crossover frequency, patterning, or both, in other eukaryotes like fruit flies, soay sheep, red deer, mice, wheat, barley, and Arabidopsis (Johnston et al. 2016, 2018; Ziolkowski et al. 2017; Brand et al. 2018, 2019; Gardiner et al. 2019; Wang et al. 2019; Dreissig et al. 2020; Barakate et al. 2021). To test this hypothesis, we used a large diversity panel comprising 916 individuals belonging to 266 accessions of diverse geographic origin. We estimated population-scale recombination rates (ρ) based on genotyping-by-sequencing (GBS) data, quantified differences in the size of LR regions, and performed genome-wide association scans (GWAS) to identify putative candidate genes. In doing so, we identified an ortholog of yeast histone acetyltransferase ESA1 as a putative candidate among a total of 18 annotated genes in a ±1 Mb interval surrounding the quantitative-trait-locus (QTL). This led us to hypothesize that LR region size might be linked to differences in SC length, which we analyzed in individuals from different subpopulations. Finally, we analyzed patterns of diversity at the ScESA1 genomic region and showed that indirect selection for increased LR region size might have occurred during the domestication of rye.
Materials and Methods
Plant Material, DNA Isolation, and GBS Library Construction and Sequencing
Our panel comprises 266 genebank accessions S. cereale based on passport information available in the genebank information system of the German Federal ex situ Genebank at IPK Gatersleben (GBIS; https://gbis.ipk-gatersleben.de/GBIS_I) such as taxonomic status, country of origin, and collection site (Oppermann et al. 2015). Out of these, 94 accessions comprising 582 individuals were previously described and sequenced (Schreiber et al. 2019; Rabanus-Wallace et al. 2021), and another 172 accessions comprising 334 individuals were added in this work (supplementary file S1, Supplementary Material online spreadsheet ‘Passport data all samples’ and ‘Passport data Turkey samples’). DNA was isolated from one up to nine plants per accession as described in Schreiber et al. 2019, resulting in a total of 916 individuals for further analyses (supplementary file S1, Supplementary Material online). GBS was done as described in Schreiber et al. (2019).
GBS Read Alignment and Variant Calling
Adapters were trimmed from raw reads with Cutadapt (Martin 2011) and trimmed reads were mapped to the Lo7 reference genome sequence assembly (Rabanus-Wallace et al. 2021) using BWA-MEM (Li 2013). Alignment records were converted to Binary/Alignment Map format with SAMtools (Li et al. 2009) and sorted with Novosort (http://www.novocraft.com/products/novosort/). Multi-sample variant calling was done with BCFtools. The resulting single nucleotide polymorphism (SNP) matrix in Variant Call Format was filtered for minimum read depth per SNP of 2 (–minDP 2), minimum genotype quality of 2 (–minGQ 2), maximum missing data of 10% (–max-missing 0.9), minor allele frequency of 0.01 (–maf 0.01), to contain only bi-allelic sites (–min-alleles 2 –max-alleles 2), and no indels (–remove-indels) using vcftools (Danecek et al. 2011). The final SNP matrix contained 57,326 sites, with an average SNP density of 6.6 SNPs/Mb. Unfiltered SNP data were deposited at the European Variation Archive under project number PRJEB51528. Data for this study are available at the European Nucleotide Archive under accession number PRJEB50548.
Population Structure Analysis
Population structure among all 916 S. cereale individuals was analyzed by performing a principal component analysis (PCA) using all 57,326 SNPs employing the smartpca function in EIGENSOFT without outlier removal and using least squares projection (Patterson et al. 2006). Partitioning of rye individuals into K ancestral populations and estimation of individual ancestry coefficients were done using sNMF (Frichot et al. 2014). The sNMF algorithm was run for K-values between 1 and 20 to identify the optimal number of ancestral populations according to the cross-entropy criterion. Ancestry coefficients were averaged across 20 replications using CLUMPP applying the LargeKGreedy algorithm with 500 permutations (Jakobsson and Rosenberg 2007).
Recombination Landscape and Diversity Analysis
In order to analyze recombination landscapes among 916 rye individuals, we grouped 50 individuals each into sliding subpopulations with an overlap of five individuals (supplementary fig. S1, Supplementary Material online). Individuals were grouped along the first eigenvector of a PCA. The rationale behind this selection was to group individuals with similar eigenvalues and ancestry coefficients and provide a method to compare between subpopulations which otherwise cannot be separated based on population structure (Battey et al. 2020). This approach resulted in a total of 174 subpopulations. Among these subpopulations, the number of SNPs ranged from 33,630 to 48,105. Independent of the total number of SNPs, SNP distributions along the chromosome were highly similar between subpopulations, with rank correlations ranging from 0.986 to 1 (supplementary fig. S2, Supplementary Material online). Genome-wide population recombination rates were estimated in each subpopulation using the Interval program from the LDhat package based on a common SNP set comprised of 57,326 sites (Hudson 2001; McVean et al. 2004; Auton and McVean 2007). Population recombination rates (ρ = 4N × r) were estimated between pairs of SNPs. The Interval program was run for 10,000,000 iterations, sampling every 5,000 runs, with a block penalty of 5, and using a population-scaled mutation rate per site of θ = 0.01. Similar to previous work (Dreissig et al. 2019), different iteration numbers (10,000,000 versus 60,000,000), block penalties (5, 10, 15, 20), and θ-values (0.1, 0.01, 0.001) were tested, compared with a reference recombination map (Lo7 × Lo255) (Martis et al. 2013; Bauer et al. 2017), and selected based on visual comparison of multiple runs. The Stat program from LDhat was used to summarize ρ/kb values and the first 100,000 runs were discarded as burn in. Population recombination rates of each chromosome in each subpopulation were normalized to range from 0 to 1 in order to compare the chromosomal distribution of high- and low-recombining regions. Normalized ρ-values were aggregated in 10 Mb bins and mean values were calculated. Population recombination landscapes were compared with a reference genetic map derived from a cross between two inbred lines (Lo7 × Lo255 [Martis et al. 2013; Bauer et al. 2017]), with rank correlations ranging from 0.42 to 0.61 (P < 2.2 × 10−16). LR regions were defined as 10 Mb bins showing 20-fold lower normalized ρ-values than the respective chromosome average (Choo 1998; Baker et al. 2014; Fuentes et al. 2022). We calculated the Spearman's rank correlation between subpopulations using normalized ρ in 10 Mb bins to describe differences in the recombination landscapes, where high correlation coefficients indicate similar recombination landscapes and vice versa. Genome-wide LR region size was calculated by summarizing the number of LR 10 Mb bins across all seven chromosomes. A smaller SNP set of 25,806 comprising only sites polymorphic in all subpopulations was analyzed in the same way and used to test for a potential bias introduced by varying numbers of SNPs in different subpopulations. A comparison of recombination maps estimated using 25,806 or 57,326 sites revealed that no bias was introduced and similar broad-scale recombination landscapes were measured (supplementary fig. S3, Supplementary Material online). Pairwise F values, nucleotide diversity (µ), and Tajima's D were calculated in 1 Mb windows using vcftools (Danecek et al. 2011). The Akey d statistic was used to infer the significance of F values according to the following formula developed by Akey et al. (2010): , where and denote the expected value and standard deviation of F between subpopulatons i and j calculated all SNPs. For nucleotide diversity and Tjaima's D, genome-wide averages were subtracted from values at selected 1 Mb windows to account for genome-wide differences between subpopulations.
Genome-wide Association Scans
We performed GWAS for genome-wide LR region size as the phenotype using the FarmCPU method in GAPIT and a custom SAS script (Dreissig et al. 2020; Wang and Zhang 2021). Since LR region size was estimated in sets of 50 individuals, we calculated the average values of each SNP (coded as 0, 1, or 2 in individuals) resulting in subpopulation-average genotypes on a continuous scale from 0 to 2. In standard GWAS scenarios, each individual is attributed a phenotypic and genotypic value. In our case, however, phenotypes are estimated in subpopulations of 50 individuals, which means there are no individual phenotypic values. This is effectively reducing statistical power, since marker–trait associations are averaged across 50 individuals. In order to challenge the statistical power of this method, we applied it to a previously published data set in which GWAS were performed on flowering time in a large nested-association-mapping population in barley (Maurer et al. 2015). This population is composed of 1,420 lines belonging to 25 families, with 21 up to 72 lines per family. We created 1,420 random subsets of 25 individuals for which phenotypic and genotypic values were averaged. This is expected to result in a reduction of statistical power, since both phenotype and genotype values were averaged across individuals. GWAS results were then compared between both approaches, that is, average phenotypes and genotypes versus individual phenotypes and genotypes. Results of both approaches were almost identical (supplementary fig. S4, Supplementary Material online).To validate the GWAS output for LR region size, we performed 100 permutations in which genotype and phenotype values were randomized. Additionally, GWAS were performed on LR region size data estimated based on 174 random subpopulations in which individuals were randomly mixed rather than grouped along the first eigenvector.
Cytogenetic Analyses
For cytogenetic analyses, 66 of the 266 accessions, with 5–10 individuals per accession, were cultivated on an experimental field station in central Germany (Halle [Saale], 51°29′52.7″N, 11°59′31.3″E) from September 2020 to June 2021. Individual plants of those accessions selected for cytogenetic analyses (domesticated and non-domesticated) were subjected to ploidy and genome size measurement via flow cytometry to exclude polyploid accessions and to ensure comparable genome sizes (supplementary file S1, Supplementary Material online, spreadsheet ‘ploidy, genome size, cytogenetics’). Genome size measurements were performed according to Doležel et al. (2007). Spikes undergoing meiosis were collected across a range of 2 weeks and fixed in 3:1 ethanol (99%):glacial acetic acid (99%). For each sample, collection time point and air temperature values (°C) up to 40 h before collection were documented, since rye meiotic prophase is known to take ∼40 h (Bennett et al. 1973) (supplementary fig. S5, Supplementary Material online). Immunostaining of ZYP1 (lateral element of the SC) and HEI10 (involved in class I crossover maturation) was performed using a modified protocol adapted from Chelysheva et al. (2010). Briefly, fixed anthers were transferred into 20 µl of acetocarmine staining solution and meiocytes were released from anthers by cutting the top of anther and gently pressing using forceps under a stereo-microscope. Afterwards, cells were heated over an ethanol flame for 2 s and chromosomes were spread by applying gentle pressure to the cover slip. Cover slips were removed after freezing slides in liquid nitrogen. Slides were then heated in a standard microwave for 40 s at 900 W in 50 ml citrate buffer (10 mM, pH = 6), followed by washing in 1× phosphate-buffered saline. Slides were treated with 4% bovine serum albumin (BSA) for blocking before overnight incubation with primary antibodies targeting ZYP1 and HEI10. Anti-HvHEI10 was described by Desjardins et al. (2020). To produce anti-HvZYP1, a synthetic peptide (HPANIGELFSEGSLNPYADD) corresponding to aa 839–858 of Hordeum vulgare ZYP1 was used for immunization of rabbits by LifeTein. Rabbit anti-HvZYP1 was affinity-purified against the synthetic peptide by LifeTein. Secondary antibodies raised against guinea pig IgG and rabbit IgG coupled with CF™ 568 and CF™ 488 (Sigma–Aldrich SAB4600080, SAB4600030), respectively, were incubated for 1 h at 37 °C. Slides were then mounted in VectaShield supplemented with DAPI and analyzed using a Zeiss CellObserver HS system equipped with a 40× and 100× objective. Image analysis was done using Fiji (Schindelin et al. 2012) and SNT (Arshadi et al. 2021) for SC-length measurements. SNT is able to automatically detect and measure linear segments in 3D images, but was used semi-automatically for 2D ZYP1 images. Briefly, reference points were manually placed along ZYP1 segments until all visible ZYP1 segments were covered. Then, total SC length was measured as the sum of all individual segments. The reliability of this approach depends on the overall size of all chromosomes and the resolution of the microscope used to acquire images. Using this method, it is not possible to measure individual chromosomes, but only to obtain an approximation for total SC length (supplementary fig. S6, Supplementary Material online). Approximate total SC length was measured in six individuals out of distant subpopulations (three per subpopulation) with contrasting LR region size. A total of 6–27 cells were analyzed per individual, resulting in 35–50 SC-length measurements per subpopulation. For comparison between different temperature conditions, three individual plants belonging to one accession were sampled, and SC length and HEI10 foci were measured in 6–9 and 66–73 cells per individual, respectively. Class I crossover numbers were also counted in the same set of cells by counting HEI10 foci co-localizing with the SC lateral element ZYP1. SC-length and crossover number were compared between groups using the t.test function in R (Student's t-test) for pairwise comparisons, and nested analysis of variance (aov[SC length ∼ subpopulation/individual] function) in case of multiple comparisons.
Results and Discussion
Recombination Landscape Divergence Among Populations
Eukaryotic recombination landscapes are typically characterized by increased recombination rates in sub-telomeric regions and drastically reduced recombination rates in peri-centromeric regions (Haenel et al. 2018). The relative size of LR regions is dependent on chromosome length, chromatin environment, as well the spatio-temporal dynamics of meiosis in different species. Empirical work on recombination landscape variation in humans and different plant species showed that broad-scale recombination landscapes are similar between populations (Bauer et al. 2013; Dreissig et al. 2019; Spence and Song 2019; Danguy des Déserts et al. 2021; Dukić et al. 2022). Here, we used rye as a model to analyze the extent of recombination landscape divergence between subpopulations.In order to explore population structure in rye, we analyzed 916 S. cereale individuals belonging to 266 accessions of diverse geographic origin. Based on prior classification, 458 individuals were classified as domesticated rye (88 accessions), 292 (137 accessions) as feral with either non-brittle rachis (273) or with brittle rachis (19), 158 as weedy rye (65 accessions), and 8 as S. cereale × vavilovii hybrids (two accessions) (Schreiber et al. 2019, 2021). Due to heterogeneity within accessions, 27 out of 266 contained individuals which were previously assigned to multiple classifications. A PCA based on 57,326 SNPs revealed that rye accessions form overlapping clusters, suggesting high levels of gene flow before ex situ conservation due to outbreeding via wind pollination and self-incompatibility (fig. 1 and ). We analyzed levels of population admixture using sNMF with the number of ancestral populations (K) ranging from 1 to 20. As K increased, the cross-entropy criterion decreased without reaching a local minimum, suggesting high levels of admixture and no clear population structure (fig. 1 and ). This led us to group individuals in sliding subpopulations of n = 50 individuals with an overlap of five individuals along the first eigenvector, which explained 38.16% of the observed genetic variation. This grouping appeared feasible due to the absence of distinguishable subpopulations and high levels of admixture caused by outbreeding through wind pollination.
Fig. 1.
Rye population admixture. Rye individuals show high levels of admixture among samples of diverse geographic origin and different domestication status. (A and B) PCA including 916 rye individuals belonging to 266 accessions of diverse geographic origin (A) and different domestication status (B). First and second eigenvectors explain up to 47.7% of the observed genetic variation. (C) Estimation of cross-entropy criterion for varying numbers of ancestral populations ranging from K = 1 to K = 20. (D) Ancestry coefficients of individual Secale cereale samples (K = 20). Samples are shown in ascending order according to eigenvalues of the first eigenvector of a PCA.
Rye population admixture. Rye individuals show high levels of admixture among samples of diverse geographic origin and different domestication status. (A and B) PCA including 916 rye individuals belonging to 266 accessions of diverse geographic origin (A) and different domestication status (B). First and second eigenvectors explain up to 47.7% of the observed genetic variation. (C) Estimation of cross-entropy criterion for varying numbers of ancestral populations ranging from K = 1 to K = 20. (D) Ancestry coefficients of individual Secale cereale samples (K = 20). Samples are shown in ascending order according to eigenvalues of the first eigenvector of a PCA.We estimated genome-wide population recombination rates (ρ = 4N × r) in each subpopulation using LDhat, which revealed substantial variation with a minimum of ρ = 1.3 up to a maximum of ρ = 28.4. This is likely based on differences in effective population size (N) as well as different meiotic recombination rates. Since it is rather difficult to disentangle the relative contributions of effective population size, drift, and selection from actual variations in the meiotic recombination rate (Peñalba and Wolf 2020; Monroe et al. 2022), we instead focused on comparing recombination landscapes, that is, the distribution of normalized ρ-values along the chromosome. We used the Spearman's rank correlation to describe recombination landscape divergence among subpopulations, where a high correlation coefficient between any two subpopulations indicates similar recombination landscapes and a low correlation coefficient indicates divergent recombination landscapes. Across the 15,051 pairwise comparisons, correlations ranged from 0.42 to 0.91, with subpopulations close to each other in the PCA space showing high correlations and those at the respective ends of the first eigenvector showing low correlations (fig. 2). Lowest rank correlations were observed between domesticated subpopulations and weedy subpopulations. In intraspecific subpopulations, recombination landscapes are typically very similar, as was seen in animals and plants, with rank correlations always higher than 0.7 (Bauer et al. 2013; Dreissig et al. 2019; Spence and Song 2019; Danguy des Déserts et al. 2021).
Fig. 2.
Recombination landscape divergence in rye. Recombination landscapes vary between subpopulations, with a tendency toward smaller low-recombining regions in weedy subpopulations. (A) Spearman's rank correlation between recombination landscapes of rye subpopulations. Sliding subpopulations were defined along the first eigenvector explaining 38.16% of observed variance with a population size of n = 50 individuals and an overlap of 5 between windows. (B) Genome-wide mean LR region size in Mb for each subpopulation. Contrasting subpopulations are highlighted in blue (domesticated) and red (weedy), respectively, and were used for cytological validation. (C) Recombination landscapes (chromosome 5R) of selected subpopulations showing contrasting LR region size in (B), ranging from domesticated (blue) to weedy (red) subpopulations.
Recombination landscape divergence in rye. Recombination landscapes vary between subpopulations, with a tendency toward smaller low-recombining regions in weedy subpopulations. (A) Spearman's rank correlation between recombination landscapes of rye subpopulations. Sliding subpopulations were defined along the first eigenvector explaining 38.16% of observed variance with a population size of n = 50 individuals and an overlap of 5 between windows. (B) Genome-wide mean LR region size in Mb for each subpopulation. Contrasting subpopulations are highlighted in blue (domesticated) and red (weedy), respectively, and were used for cytological validation. (C) Recombination landscapes (chromosome 5R) of selected subpopulations showing contrasting LR region size in (B), ranging from domesticated (blue) to weedy (red) subpopulations.Intrigued by these differences between domesticated and weedy subpopulations, we compared the physical size of LR regions among all subpopulations along each chromosome. LR regions were defined as 10 Mb bins showing 20-fold lower recombination rates than the respective chromosome average (Choo 1998; Baker et al. 2014), which revealed a tendency toward smaller LR regions in weedy subpopulations (fig. 2 and ). LR region size was gradually decreasing from domesticated to weedy subpopulations (R = −0.92, R2 = 0.85, P < 2.2 × 10−16). This pattern held true for all chromosomes except 4R, where the exact opposite was observed (supplementary fig. S7, Supplementary Material online). This might be caused by a chromosomal inversion segregating in those individuals analyzed here compared with the reference genome assembly (Rabanus-Wallace et al. 2021). In order to validate these observations, individuals were also randomly assigned to subpopulations and analyzed following the same procedure. However, random subpopulations did not show any differences in LR region size, indicating that the above-mentioned differences are not detectable when individuals from non-related subpopulations are grouped (supplementary fig. S7, Supplementary Material online). Structural variations are known to influence recombination landscapes, with large chromosomal inversions of more than 1 Mb typically preventing homologous recombination or inverting recombination landscapes (Lukaszewski et al. 2012; Schmidt et al. 2020; Rabanus-Wallace et al. 2021). Smaller structural variations of <500 Kb, however, were shown to influence patterns of recombination only at the fine-scale (Rowan et al. 2019; Fuentes et al. 2022). In our data, we cannot determine how many small structural variations might segregate in rye subpopulations. Since we analyzed broad-scale differences in LR region size at a scale of 100–500 Mb, contributions of small structural variations are expected to be neglectable (Lian et al. 2022).
In order to explore the genetic architecture underlying recombination landscape variation, we performed GWAS to identify genetic markers associated with quantitative differences in LR region size. Since LR region size of each subpopulation was estimated based on 50 individuals, we calculated the mean of each marker in each subpopulation. This method was expected to have reduced statistical power, which is why it was first tested on a flowering time data set in barley (H. vulgare) (Maurer et al. 2015). In this scenario, it was able to detect identical QTL, so we concluded our approach was suitable for reliably detecting major QTL (see Materials and Methods, supplementary fig. S4, Supplementary Material online). We identified one major and one minor recombination landscape QTL located on chromosomes 2R and 6R, respectively (fig. 3, supplementary file S1 and fig. S8, Supplementary Material online). The major QTL (QTL.1) on chromosome 2R explained 88% of the phenotypic variation, occurred at a minor allele frequency of 0.4 (proportion of missing data = 0.02), showed a significant deviation from Hardy–Weinberg equilibrium (HWE) (P = 9.5 × 10−18, loss of heterozygosity P = 7.3 × 10−18, supplementary file S1, Supplementary Material online), and showed an effect size of −29% (−539.7 Mb total LR size, mean LR size = 1880 Mb, fig. 3, supplementary fig. S9, Supplementary Material online). The minor QTL (QTL.2) on chromosome 6R explained only 5% of the phenotypic variation and occurred with a minor allele frequency of 0.02 (proportion of missing data = 0.03), and did not deviate from HWE (P = 1). We did not filter out SNPs deviating from HWE in order to capture variants under selection. Although an excess of heterozygosity might be caused by genotyping errors, a loss of heterozygosity might be indicative of selection (Fardo et al. 2009; Coleman et al. 2016; Abramovs et al. 2020). Therefore, the loss of heterozygosity at major QTL.1 might suggest that this locus was under selection during domestication (see below). For validation, GWAS was repeated 100 times using randomly mixed phenotype and genotype values, without detecting any significant associations (supplementary fig. S10, Supplementary Material online). We also performed GWAS on LR region size estimated in random subpopulations composed of randomly assigned individuals, rather than grouped according to the first eigenvector, and did not detect any significant associations (supplementary figs. S7 and S9, Supplementary Material online). Within a ±1 Mb interval surrounding the major QTL.1 on chromosome 2R, a total of 18 genes were located. Among these, we identified a histone acetyltransferase gene whose homologs are highly expressed in meiotic tissue in barley (HORVU.MOREX.r2.2HG0101410), wheat (Traes_2AS_72EE169B7; Traes_2BS_CCF195D52.1; Traes_2DS_DCFDFA58B.2), and Arabidopsis (HAM1; AT5G64610) (Berardini et al. 2015; Colmsee et al. 2015; Borrill et al. 2016; Mascher et al. 2017; Ramírez-González et al. 2018). In Arabidopsis thaliana, histone H3 acetylation has a significant impact on crossover distribution, but not total crossover number per cell, suggesting a role in crossover patterning (Perrella et al. 2010). In the budding yeast S. cerevisiae, meiosis-specific depletion of histone H4 acetylation via knock-out of ESA1 results in shorter chromosome axes, fewer DSBs, and reduced crossover frequency (Wang et al. 2021). The histone acetyltransferase identified at QTL.1 shares high amino acid sequence homology with yeast ESA1 (fig. 3) and the Arabidopsis histone H4 acetylase HAM1, which is also highly expressed in meiotic tissue (Berardini et al. 2015). Chromosome axis and SC length positively correlate with crossover frequency (Kleckner et al. 2003), and recombination landscapes are influenced by overall SC length, resulting in species with longer SCs having distally-biased recombination landscapes (i.e., large-genome Triticeae) and species with shorter SCs having more uniform distributions along chromosome arms (i.e., Arabidopsis) (Haenel et al. 2018; Rowan et al. 2019; Danguy des Déserts et al. 2021). Allelic variation in a histone acetyltransferase, potentially leading to altered histone acetylation, might therefore influence chromosome axis/SC length and consequently have an effect on recombination landscapes. We therefore suggest ScESA1 as a candidate gene for regulating LR region size. However, additional work such as comprehensive resequencing and functional characterization of ScESA1 is required to confirm this.
Fig. 3.
GWAS identified an ortholog of ESA1 underlying LR region size variation. Quantitative variation in LR region size is largely explained by one major QTL on chromosome 2R. An ortholog of histone acetyltransferase ESA1 is closely linked to this QTL and shows high amino acid similarity to yeast ESA1 and Arabidopsis HAM1. (A) GWAS on LR region size identified two significant associations on chromosomes 2R (QTL.1) and 6R (QTL.2). Red line indicates Bonferroni threshold at P = 0.01. (B) Correlation between LR region size and mean genotype value at QTL.1 showing that 88% of phenotypic variation are explained by this locus. (C) Multiple amino acid sequence alignment (Corpet 1988) of histone acetyltransferase ESA1 orthologs in yeast (Secale cerevisae), rye (S. cereale), and Arabidopsis (Arabidopsis thaliana). Conserved amino acids are highlighted in red.
GWAS identified an ortholog of ESA1 underlying LR region size variation. Quantitative variation in LR region size is largely explained by one major QTL on chromosome 2R. An ortholog of histone acetyltransferase ESA1 is closely linked to this QTL and shows high amino acid similarity to yeast ESA1 and Arabidopsis HAM1. (A) GWAS on LR region size identified two significant associations on chromosomes 2R (QTL.1) and 6R (QTL.2). Red line indicates Bonferroni threshold at P = 0.01. (B) Correlation between LR region size and mean genotype value at QTL.1 showing that 88% of phenotypic variation are explained by this locus. (C) Multiple amino acid sequence alignment (Corpet 1988) of histone acetyltransferase ESA1 orthologs in yeast (Secale cerevisae), rye (S. cereale), and Arabidopsis (Arabidopsis thaliana). Conserved amino acids are highlighted in red.
SC Length Variation Correlates with LR Region Size Variation Between Subpopulations
To address the question whether differences in LR region size might be caused by variations in SC length, we quantified SC length and class I crossovers by immunolabelling of ZYP1 and HEI10 proteins on pachytene chromosomes. We selected rye individuals belonging to subpopulations of contrasting LR region sizes and sampled meiotic cells at identical environmental temperatures (12 °C) (fig. 4). Total SC length was significantly higher in individuals from subpopulations with larger LR regions, with an average of 1147.6 µm compared with an average of 975.9 µm in individuals from subpopulations with smaller LR regions (fig. 4, nested ANOVA, Psubpoulation = 0.00208, Pindividual(subpopulation) = 0.07568). Class I crossovers numbers per cell did not differ significantly between these subpopulations, with an average of 15.4 compared with 15.5 (fig. 4, nested ANOVA, Psubpopulation = 0.824, Pindividual(subpopulationo) = 1.01 × 10−10), respectively. We detected small but significant differences in class I crossover number between individuals within subpopulations (mean of ±3.33), which may be attributed to genetic heterogeneity between these individuals, which is typical of outbreeding species (Schreiber et al. 2019; Sun et al. 2021). In subpopulations with larger LR regions, the numerical distribution of crossovers significantly deviated from a Poisson distribution, typically caused by positive crossover interference (χ2 goodness-of-fit test, P = 0.0055). In subpopulations with smaller LR regions, this was not the case, suggesting reduced crossover interference (P = 0.0962). Previous work in other eukaryotes, such as Arabidopsis, barley, yeast, and mouse, showed that crossover numbers are influenced by SC length (Phillips et al. 2015; Lloyd et al. 2018; Wang et al. 2019, 2021; Song et al. 2021). Consequently, shorter SCs are expected to permit fewer crossovers and vice versa. SC length is also influenced by environmental temperature during meiotic prophase (Phillips et al. 2015; Lloyd et al. 2018; Weitz et al. 2021). We therefore tested if genetically similar rye individuals of one accession experiencing different temperatures during meiosis showed SC length variation accompanied by crossover number variation. Indeed, SC length and class I crossover number co-varied at different temperatures (7, 12, 18 °C) following a U-shaped pattern, with SC length and crossover number at a local minimum at intermediate temperature (12 °C), and increased SC length and crossover number at colder and warmer conditions (7 and 18 °C) (fig. 4 and ). Total SC length ranged from 921.5 µm at 12 °C, to 1,560.1 µm at 7 °C, and 1,235.8 µm at 18 °C (TukeyHSD test, P < 0.05). In agreement with previous work (Kleckner et al. 2003; Lloyd et al. 2018), class I crossover numbers co-varied with total SC length, with 15.7 at 12 °C, 22.9 at 7 °C, and 29.2 at 18 °C (TukeyHSD test, P < 0.05). In our work, total SC length was strongly influenced by environmental temperature and less by genetic divergence between individuals, and crossover number was only affected by temperature. This suggests that genetically determined subtle differences in SC length do not affect class I crossover number, but only crossover distribution along the chromosome, leading to smaller LR regions. Previous work in H. vulgare, a closely related species, showed that increased SC length under higher temperatures led to a shift of crossovers toward more centromere-proximal regions (Higgins et al. 2012; Phillips et al. 2015). In other species, such as Arabidopsis, yeast, and mice, crossover number also co-varied with SC length, but whether this led to a shift in crossover distribution was not analyzed (Lloyd et al. 2018; Wang et al. 2019, 2021). In Drosophila, however, a single meiotic gene was shown to influence both crossover frequency and patterning, suggesting that certain key regulators are able to change recombination landscapes (Brand et al. 2018). In our work, subpopulations with smaller LR regions also showed decreased SC length. This might suggest differences in how SC length and recombination landscapes are intertwined in response to temperature variation or epigenetic changes, such as histone acetylation. On the other hand, our observation of a correlation between LR region size and SC length does not necessarily imply causality, but might also be caused by other factors during the domestication process. These data expand our knowledge on the multiple mechanisms underlying recombination landscape variation, and demonstrate that population-level differences can be associated with SC length variation. Whether this is caused by natural variation in an ortholog of the histone H4 acetyltransferase ESA1 remains to be addressed in future functional studies.
Fig. 4.
Synaptonemal complex length and class I crossover variation in divergent subpopulations. Population-level differences in recombination landscapes were tested in multiple individuals by quantifying synaptonemal complex (SC) length and class I crossover number. In individuals belonging to subpopulations with small LR regions, total SC length is reduced, but not crossover number, which suggests crossovers are shifted toward more centromere-proximal positions along the chromosome axis. (A) Immunolabeling of class I crossovers via HEI10 and SC lateral element ZYP1 on rye pachytene chromosomes samples from individuals belonging to divergent subpopulations. (B and C) Approximate total SC length and HEI10 foci per cell in individuals belonging to divergent subpopulations. Individuals (represented by different shapes) were sampled at 12 °C (0–40 h average before sampling). (D and E) SC length and HEI10 foci per cell in individuals of accession R1213 (subpopulations 51–78) sampled at different temperatures prior to meiotic prophase.Scale bar = 20 µm. *P < 0.05, **P < 0.005, and ***P < 0.0005, ns = not significant.
Synaptonemal complex length and class I crossover variation in divergent subpopulations. Population-level differences in recombination landscapes were tested in multiple individuals by quantifying synaptonemal complex (SC) length and class I crossover number. In individuals belonging to subpopulations with small LR regions, total SC length is reduced, but not crossover number, which suggests crossovers are shifted toward more centromere-proximal positions along the chromosome axis. (A) Immunolabeling of class I crossovers via HEI10 and SC lateral element ZYP1 on rye pachytene chromosomes samples from individuals belonging to divergent subpopulations. (B and C) Approximate total SC length and HEI10 foci per cell in individuals belonging to divergent subpopulations. Individuals (represented by different shapes) were sampled at 12 °C (0–40 h average before sampling). (D and E) SC length and HEI10 foci per cell in individuals of accession R1213 (subpopulations 51–78) sampled at different temperatures prior to meiotic prophase.Scale bar = 20 µm. *P < 0.05, **P < 0.005, and ***P < 0.0005, ns = not significant.
Patterns of Diversity in the ScESA1 Genomic Region
To investigate whether the ScESA1 genomic region is showing patterns of selection across subpopulations, we analyzed levels of diversity in the genomic region harboring ScESA1 and compared it to the rest of the genome. First, we calculated pairwise fixation indices (F) across all subpopulations in 1 Mb windows with an overlap of 500 Kb along all chromosomes. We then calculated the Akey d statistic (Akey et al. 2010) in a 2 Mb interval surrounding ScESA1 (21 polymorphic sites) and found increasing pairwise d-values the more distant two subpopulations were in the PCA space, that is, domesticated versus weedy subpopulations (fig. 5). We also observed significant differentiation at the ScESA1 locus among clusters of domesticated subpopulations. Overall differentiation between domesticated and weedy rye is weak, but genomic regions harboring genes underlying domestication traits, such as grain-shattering and brittle culm, were shown to be differentiated (Sun et al. 2021). This was also observed in the domesticated and weedy subpopulations analyzed here, with the major domestication genes for brittle rachis (ScBtr) and brittle culm (ScBc1) showing signification differentiation (fig. 5). At ScESA1, differentiation was weaker compared with other key domestication genes, but still significant (d > 95% quantile) (fig. 5), supporting the idea that the ScESA1 region diverged between domesticated and weedy subpopulations. Next, we estimated nucleotide diversity (π) in 1 Mb windows across all 174 subpopulations and found reduced diversity levels in domesticated subpopulations in the ScESA1 region compared with the genome-wide average (fig. 5). In addition, Tajima's D showed a deviation from 0 in domesticated subpopulations, but not in a uniform tendency, indicating that some domesticated subpopulations underwent balancing selection (Tajima's D > 0), whereas others underwent a selective sweep (Tajima's D < 0) (fig. 5). These observations suggest that diversity in the ScESA1 region is not evolving neutrally, and might be reduced by moderate selection on ScESA1 alleles in domesticated rye. However, among domesticated subpopulations, we did not observe a uniform tendency for differences in nucleotide diversity and Tajima's D, suggesting that selection for the ScESA1 genomic region was not paralleled in all domesticated accessions. Interestingly, this locus also showed weak but significant signals of selection in a comparison between cultivated rye and its putative wild progenitor (S. cereale subsp. vavilovii), but no candidate gene was proposed at the time (Li et al. 2021).
Fig. 5.
Patterns of diversity at the ScESA1 genomic locus. Rye ScESA1 shows patterns of selection in case of comparisons between domesticated and weedy subpopulations. This suggests that an increase in the size of LR regions was indirectly selected during rye domestication, presumably in order to generate more homogenous rye populations to ensure stable yields in farming systems. (A) Pairwise d-values among 174 subpopulations in 2 Mb genomic interval harboring ScESA1. The blue and red lines represent the 95th and 99th percentiles, respectively. (B) Genome-wide d-values for a comparison between a domesticated (22) and a weedy subpopulation (144), which are not overlapping. Chromosomal d distribution marks several loci known to harbor domestication genes (e.g., ScBtr on chromosome 3R and ScBc1 on chromosome 2R; domesticated gene information adopted from [Sun et al. 2021]). ScESA1 is highlighted in red. The dashed blue and red lines represent the 95th and 99th percentile. (C) Nucleotide diversity (π) in the ±5 Mb interval surrounding ScESA1 across 174 subpopulations. Genome-wide π-values were subtracted from local π at ScESA1. (D) Tajima's D in the ±5 Mb interval surrounding ESA1 across 174 subpopulations. Genome-wide Tajima's D-values were subtracted from local Tajima's D at ScESA1.
Patterns of diversity at the ScESA1 genomic locus. Rye ScESA1 shows patterns of selection in case of comparisons between domesticated and weedy subpopulations. This suggests that an increase in the size of LR regions was indirectly selected during rye domestication, presumably in order to generate more homogenous rye populations to ensure stable yields in farming systems. (A) Pairwise d-values among 174 subpopulations in 2 Mb genomic interval harboring ScESA1. The blue and red lines represent the 95th and 99th percentiles, respectively. (B) Genome-wide d-values for a comparison between a domesticated (22) and a weedy subpopulation (144), which are not overlapping. Chromosomal d distribution marks several loci known to harbor domestication genes (e.g., ScBtr on chromosome 3R and ScBc1 on chromosome 2R; domesticated gene information adopted from [Sun et al. 2021]). ScESA1 is highlighted in red. The dashed blue and red lines represent the 95th and 99th percentile. (C) Nucleotide diversity (π) in the ±5 Mb interval surrounding ScESA1 across 174 subpopulations. Genome-wide π-values were subtracted from local π at ScESA1. (D) Tajima's D in the ±5 Mb interval surrounding ESA1 across 174 subpopulations. Genome-wide Tajima's D-values were subtracted from local Tajima's D at ScESA1.In summary, we conclude that recombination landscapes differ between domesticated and weedy subpopulations in rye, especially in the size of LR regions. Through GWAS, we identified a histone H4 acetyltransferase (ScESA1) as a putative candidate gene, but further work is required to confirm its effect and function in plants. We observed a correlation between domestication status, LR region size, and SC length, but no difference in class I crossover number. These observations allow us to speculate that LR-region size is regulated via SC-length, which in turn might be influenced by overall histone H4 acetylation, as was shown in yeast (Wang et al. 2021). Finally, we detected weak but significant patterns of selection for the genomic interval harboring ScESA1 in domesticated rye. Rye is an outbreeding, wind-pollinating species, which shaped its domestication history (Schreiber et al. 2019, 2021; Sun et al. 2021). During the domestication of rye, and throughout its cultivation as landraces, and later as population cultivars, it might have been advantageous for farmers and breeders to have larger LR regions in order to develop more homogeneous populations for agricultural use. In this way, indirect selection on ScESA1 and larger LR regions might have occurred during rye domestication.Click here for additional data file.
Authors: Eva Bauer; Thomas Schmutzer; Ivan Barilar; Martin Mascher; Heidrun Gundlach; Mihaela M Martis; Sven O Twardziok; Bernd Hackauf; Andres Gordillo; Peer Wilde; Malthe Schmidt; Viktor Korzun; Klaus F X Mayer; Karl Schmid; Chris-Carolin Schön; Uwe Scholz Journal: Plant J Date: 2017-02-08 Impact factor: 6.417
Authors: Kevin M Wright; Brian Arnold; Katherine Xue; Maria Šurinová; Jeremy O'Connell; Kirsten Bomblies Journal: Mol Biol Evol Date: 2014-12-26 Impact factor: 16.240
Authors: James D Higgins; Ruth M Perry; Abdellah Barakate; Luke Ramsay; Robbie Waugh; Claire Halpin; Susan J Armstrong; F Chris H Franklin Journal: Plant Cell Date: 2012-10-26 Impact factor: 11.277
Authors: Nataliya E Yelina; Kyuha Choi; Liudmila Chelysheva; Malcolm Macaulay; Bastiaan de Snoo; Erik Wijnker; Nigel Miller; Jan Drouaud; Mathilde Grelon; Gregory P Copenhaver; Christine Mezard; Krystyna A Kelly; Ian R Henderson Journal: PLoS Genet Date: 2012-08-02 Impact factor: 5.917