Literature DB >> 31744900

Inferring the Genomic Landscape of Recombination Rate Variation in European Aspen (Populus tremula).

Rami-Petteri Apuli1, Carolina Bernhardsson1,2, Bastian Schiffthaler3, Kathryn M Robinson3, Stefan Jansson3, Nathaniel R Street3, Pär K Ingvarsson4.   

Abstract

The rate of meiotic recombination is one of the central factors determining genome-wide levels of linkage disequilibrium which has important consequences for the efficiency of natural selection and for the dissection of quantitative traits. Here we present a new, high-resolution linkage map for Populus tremula that we use to anchor approximately two thirds of the P. tremula draft genome assembly on to the expected 19 chromosomes, providing us with the first chromosome-scale assembly for P. tremula (Table 2). We then use this resource to estimate variation in recombination rates across the P. tremula genome and compare these results to recombination rates based on linkage disequilibrium in a large number of unrelated individuals. We also assess how variation in recombination rates is associated with a number of genomic features, such as gene density, repeat density and methylation levels. We find that recombination rates obtained from the two methods largely agree, although the LD-based method identifies a number of genomic regions with very high recombination rates that the map-based method fails to detect. Linkage map and LD-based estimates of recombination rates are positively correlated and show similar correlations with other genomic features, showing that both methods can accurately infer recombination rate variation across the genome. Recombination rates are positively correlated with gene density and negatively correlated with repeat density and methylation levels, suggesting that recombination is largely directed toward gene regions in P. tremula.
Copyright © 2020 Apuli et al.

Entities:  

Keywords:  linkage disequilibrium; linkage map; linked selection; methylation; nucleotide diversity; recombination

Mesh:

Year:  2020        PMID: 31744900      PMCID: PMC6945010          DOI: 10.1534/g3.119.400504

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


Meiotic recombination (hereafter recombination) is an important evolutionary force that directly alters levels of linkage disequilibrium (e.g., Wright 1931). Recombination therefore has important consequences for how effective natural selection is at removing deleterious mutations (Felsenstein 1974), increasing the frequency of beneficial mutations (Barton 1995) and for determining the resolution of association mapping for the dissection of quantitative traits (Nordborg and Weigel 2008). Recombination rates are known to vary between species, among individuals within species and even among different regions in a genome (Nachman 2002). Local recombination rates have been shown to be positively correlated with neutral genetic diversity across a wide range of organisms (reviewed in Nachman 2002). One possible explanation for such an association is that cross-over events and/or associated processes, such as gene conversion and double-strand break repair, have direct mutagenic effects and thus act to increase nucleotide polymorphism (e.g., Kulathinal ). An alternate explanation is that natural selection has indirect effects on sites linked to a site under selection and therefore also acts to reduce diversity on these sites (Begun and Aquadro 1992). Since recombination breaks down linkage disequilibrium, areas of high recombination are characterized by a rapid decay of linkage disequilibrium and linked selection will hence impact fewer sites in the vicinity of a selected site in these regions (Begun and Aquadro 1992). Conversely, in areas of low recombination rates, linkage disequilibrium will be extensive and indirect selection will impact a larger genomic region. Variation in recombination rates across the genome will generate an association between recombination and sequence diversity. Local variation in recombination rates is therefore an important factor for understanding how natural or artificial selection shapes sequence diversity across the genome of an organism. Recombination rates are also known to be associated with a number of different genomic features, such as gene density, repeat density, and cytosine methylation, although the magnitude and direction of these associations are still under debate. Recombination rates have been shown to be both positively and negatively correlated with gene density (positively: e.g., Wang , negatively: e.g., Giraut ), GC-content (positively: Kim , negatively: Giraut ), repeat density and methylation levels (positively: e.g., Rodgers-Melnick , negatively: Giraut ). Characterizing associations between recombination rates and various genomic features at the genus or species level is thus important to avoid making incorrect assumptions about the strength and/or direction of these associations. Traditionally, recombination rates have been estimated from the relationship of marker positions in linkage maps (Stapley ) and more recently recombination rates have also been linked to physical regions of a genome through whole genome sequencing (Nachman 2002). However, producing linkage maps is time consuming and may even be infeasible in some species as it in many cases requires controlled crossing of known parents and the establishment of a large segregating progeny population (Stapley ). Therefore, methods have been developed that infer recombination rates from linkage disequilibrium (LD) between segregating polymorphisms in individuals sampled from natural populations (e.g., McVean , Chan ). Due to the relative ease of obtaining sequence information with modern sequencing methods even from wild populations, these LD-based methods for estimating recombination rates have been widely employed (e.g., McVean , Kulathinal Silva-Junior and Grattapaglia 2015, Wang , Booker ). Detailed knowledge of local variation in recombination rates can be used to infer the action of linked selection by establishing a correlation between the levels of nucleotide diversity and recombination rates across the genome of an organism (McVean , Chan ). Using polymorphism data to infer recombination rates and then using these inferred recombination rates to explain variation in genetic diversity could be problematic, but simulations and studies performed using well-established animal model species such as Mus musculus (Booker ) and in a number of Drosophila species (Kulathinal , Chan ) suggest that indirect methods for estimating recombination rates are not strongly affected by natural selection. However, comparisons of LD-based and genetic linkage map-based methods for estimating recombination rates are not readily available in plant species. Genome structure, and in particular local rates of recombination, show large scale differences between plants and animals (Haenel ), and it would therefore also be valuable to assess how well indirect methods for inferring recombination rates perform in plants. The genus Populus has emerged as an important model system for forest genetics due to its rapid growth rate, ability to generate natural clones and a manageable genome size of ca. 480 Mbp distributed across a haploid set of 19 (2n = 38) chromosomes (Taylor 2002, Lin ). Furthermore, both large and small scale synteny is highly conserved across species in the genus, enabling the transfer of genetic resources between species within the genus (Jansson and Douglas 2007). Further interest in Populus has been spurred by their economical (e.g., Taylor 2002) and ecological importance (e.g., Kouki ) and over the past two decades a growing number of the ca. 40 species in the genus have been fully sequenced, including Populus trichocarpa (Black cottonwood) (Tuskan ), P. euphratica (Ma ) and P. tremula (European aspen) (Lin ). Linkage maps have also been produced for many of the species in the genus (e.g., Paolucci , Tong ), including Populus tremula (Zhigunov ) but these maps have generally been relatively coarse, utilizing a few hundred up to a few thousand markers and typically employing mapping populations consisting of fewer than 300 progenies. Consequently, many of these maps have failed to resolve the expected 19 linkage groups typical for the genus and there is thus a need for developing a high-resolution, fine-scale linkage maps for the whole genus. P. tremula is a species of special interest within the genus as it has the largest distribution of any tree species in Eurasia, spanning from Spain and Scotland in the west to pacific China and Russia in the east, Iceland and northern Scandinavia in the north to northern Africa and southern China in the south (Luquez ). Such extensive geographic distribution means P. tremula has adapted to a great variety of different environments, making it a promising species for studying the effects of spatially varying selection and adaptation (e.g., Farmer 1996, Luquez , Wang ). Here we present a newly developed, fine-scale genetic map for Populus tremula and use this map to anchor scaffolds from the current draft genome assembly of P. tremula (Potra v1.1, Lin ) to chromosomes. We then use this new resource to estimate local variation in recombination rates and use this to assess the correlation with recombination rates inferred from data on linkage disequilibrium in sample of unrelated individuals. We assess how different genomic features, such as gene density, repeat content and methylation levels are associated with estimates of local recombination rates. These results provide a valuable resource for enhancing our understanding of genome evolution and the recombination landscape in Populus and will also further facilitate the identification of loci controlling quantitative traits of ecological and economic value.

Material and methods

Plant material

In 2013, a controlled F1 cross was performed between two unrelated P. tremula individuals (UmAsp349.2 x UmAsp229.1) from the Umeå Aspen (UmAsp) collection that consists of c. 300 individuals collected in the vicinity of Umeå in northern Sweden (Fracheboud ). This cross yielded 764 full sib progenies that were planted and monitored in a common garden at the Forestry Research Institute of Sweden’s research station in Sävar, 20 km north-east of Umeå (63.9N 20.5E). In addition, we utilized SNP data for 94 individuals of P. tremula belonging to the SwAsp collection that consists of 116 individuals sampled from 12 sites across Sweden (6-10 individuals per site, Luquez ) displaying no population structure (Wang ). The SNP data has previously been described in Wang and consists of 4,425,109 SNPs with a minor allele frequency exceeding 5%.

DNA extraction and sequence capture

In 2015 leaf samples were collected from all progenies of the F1 cross. DNA was extracted using the Qiagen Plant Mini kit according to manufacturer guidelines and sent to Rapid Genomics (http://www.rapid-genomics.com) for genotyping using sequence capture probes. The probe set contain 45,923 probes of 120 bases each that were designed to target unique genic regions in the v1.1 P. tremula genome assembly (Lin ), as well as an additional 70 probes that were designed to specifically target the putative sex determination region on chromosome 19 of the P. trichocarpa genome assembly v3.0 (https://phytozome.jgi.doe.gov/pz/portal.html). Parents and all offspring were subjected to sequence capture and subsequently sequenced on an Illumina HighSeq 2000 platform using paired-end (2x100bp) sequencing to an average depth of 15x per sample. All sequence capture data were delivered from Rapid Genomics in the spring of 2016. In addition, the two parents of the F1 cross were whole-genome re-sequenced to an average depth of 15x on an Illumina HiSeq 2500 platform with paired-end sequencing (2x150 bp) at the National Genomics Infrastructure at the Science for Life Laboratory in Stockholm, Sweden. All raw sequencing reads were mapped against the complete P. tremula v.1.1 reference genome using BWA-MEM v.0.7.12 (Li and Durbin 2009) using default parameters. Following read mapping, PCR duplicates were marked using Picard (http://broadinstitute.github.io/picard/) and local realignment around indels was performed using GATK RealignerTargetCreator and IndelRealigner (McKenna ; DePristo ). Genotyping was performed using GATK HaplotypeCaller (version 3.4-46, (DePristo ; Van der Auwera ) with a diploid ploidy setting and gVCF output format. CombineGVCFs was then run on batches of ∼200 gVCFs to hierarchically merge samples into a single gVCF and a final SNP call was performed using GenotypeGVCFs jointly on the combined gVCF file, using default read mapping filters.

Construction of a high-density linkage map

Linkage maps were built separately for the two parents using a pseudo-testcross strategy (Grattapaglia & Sederoff 1994) by employing bi-allelic SNPs that were segregating according to Mendelian expectations. Pairwise estimates of recombination frequency were calculated between all markers and marker pairs showing no evidence for recombination were collected into bins and one representative marker was used for map construction. Markers were grouped into linkage groups (LGs) using a LOD threshold of 12 and ordered using the Kosambi mapping function as implemented in the BatchMap software (Schiffthaler ). AllMaps (Tang ) was used to create physical chromosomes from the P. tremula genome assembly v.1.1 (Lin ) and the male and female genetic maps, with equal weight given to the two maps. For more details about the construction of the genetic and physical maps, please refer to Supplementary Methods.

Genetic map and linkage disequilibrium-based recombination rates

The genetic and physical maps for all 19 P. tremula chromosomes were read into MareyMap (Rezvoy , https://cran.r-project.org/web/packages/MareyMap) and the ‘sliding window’ method in MareyMap was used to estimate recombination rate in windows of 1 Mbp with a step size of 250 kbp. We only used windows with at least 8 SNPs in order to avoid regions with large gaps being assigned recombination values. To generate a LD-based recombination map we used LDhelmet v.1.10 (Chan ) with a random subset of 25 diploid individuals (i.e., 50 haplotypes) from the SwAsp data set (Wang ). We used the default values from the LDhelmet manual as Populus tremula has similar levels of nucleotide polymorphism and extent of linkage disequilibrium as Drosophila melanogaster, on which the default settings in LDhelmet are based upon (Wang ). As LDhelmet outputs estimates of recombination in units of ρ/bp whereas the genetic map is in units of cM, we converted the LDhelmet results to cM distances following the method outlined in Booker . We did this to be able to make comparisons between the recombination rates estimated using the two methods. The conversion assumes that the physical size of a chromosome is constant for the two methods so that the cumulative genetic distance in either cM or ρ should be the same but on different scales. The cumulative ρ was calculated by multiplying the ρ/bp estimates with the distance in bp between the adjacent SNP’s and then summed across chromosomes. Knowing the cumulative ρ and corresponding cM-values, it is possible to derive a ‘scaling factor’ to calculate cM values from the corresponding ρ values. The resulting cM values were read into MareyMap and recombination rates were estimated as described earlier for the genetic map-based recombination map. More details on the estimation of recombination rates from the genetic maps can be found in Supplementary Methods.

Correlation of recombination rate estimates, genetic correlates of recombination rate and model of recombination rate

We compared recombination rates inferred from the consensus genetic map or from the sequence data by calculating Spearman’s rank correlations (hereafter correlations) across 1 Mbp windows. We also assessed correlations between the two recombination rates and a number of genomic features, including gene density, repeat density, GC-content, substitution density, neutral diversity (π) and methylation. Gene and repeat density were estimated using bedtools (Quinlan 2014) based on the annotation for the v1.1 P. tremula genome (Lin , ftp://plantgenie.org/Data/PopGenIE/Populus_tremula/v1.1/gff3/). GC-content was calculated from the genome FASTA using an in-house developed awk script (modified from: https://www.biostars.org/p/70167/#70172). The original script was modified to have window functionality across a FASTA sequence and to take into consideration sequence gaps. Windows with more than 80% gaps (N) were discarded to avoid biased results. Substitutions relative to P. trichocarpa were estimated from a vcf-file with SNP-calls for a single Populus trichocarpa individual mapped against the P. tremula reference genome v1.1. Comparisons of putative substitutions were then made against a list of SNP positions from the 94 SwAsp P. tremula data set. Substitutions relative to P. tremuloides were inferred in a similar way based on SNP calls from five P. tremuloides individuals mapped against the P. tremula reference genome v1.2 (Lin ). Files containing old, new and the total number of substitutions were used to calculate substitution densities in windows across the genome. Neutral genetic diversity (π) was estimated using vcftools and only fourfold degenerate sites or intergenic sites located at least 2 kbp from an annotated gene. For nucleotide diversity calculations, data for all 94 SwAsp samples were used. Methylation levels were estimated using bisulfite sequencing data from six SwAsp individuals that were bisulfite sequenced using two biological replicates per individual. Samples were sequenced using paired-end (2x150) sequencing on an Illumina HiSeq X at the National Genomics Infrastructure facility at Science for Life Laboratory in Uppsala, Sweden. Methylation levels were estimated using the Bismark tools (Krueger and Andrews 2011, https://www.bioinformatics.babraham.ac.uk/projects/bismark/). Following trimming and quality control, sequence reads were mapped against polymorphism-substituted versions of the P. tremula v1.1 assembly (Lin ) for each individual sample. Following read mapping BAM files were deduplicated to remove optical duplicates. Methylation levels were then extracted separately for the different methylation contexts (GpG, CHG and CHH) and average values were calculated across 1 Mbp window using a step size of 250 kbp across the P. tremula genome. Further details on the handling of the methylation sequencing data can be found in Supplementary Methods. Correlations between the two recombination maps and between the recombination maps and the various genomic features were calculated in R (R Core Team 2018). We also assessed the independent effects of different genomic features on the two recombination rate estimates using multiple regression.

Data availability

The raw sequencing reads used for the construction of the genetic map and raw bisulfate sequencing reads are available from ENA under study number PRJEB34662 (https://www.ebi.ac.uk/ena/data/view/PRJEB34662). BatchMap input files for the female and male genetic maps, the two component maps and the consensus map files as well as all processed files of the bisulfate sequencing data are available from zenodo.org under DOI number 10.5281/zenodo.3257544 (https://doi.org/10.5281/zenodo.3257544). Additional scripts and files used for the analyses are available at https://github.com/parkingvarsson/recombination_rate_variation. All SNP data used for population genomic analyses is already publicly available through ENA under accession number PRJNA297202 (https://www.ebi.ac.uk/ena/data/view/PRJNA297202). A filtered VCF files with final SNP data are available to download from zenodo.org under DOI number 10.5281/zenodo.3546316 (https://doi.org/10.5281/zenodo.3546316). Supplemental material available at figshare: https://doi.org/10.25387/g3.8481641.

Results

P. tremula linkage maps

14,598 unique markers in the female map and 13,997 unique markers in the male map were distributed across 3,861 and 3,710 scaffolds, respectively. Markers in both parental framework maps grouped into 19 linkage groups (LGs), corresponding to the haploid number of chromosomes in Populus (Table 1). Among the mapped scaffolds, 19 scaffolds contained markers that mapped to different positions within the same LG, but that were more than 20 cM apart (Supplemental Material, Figure S3) and 340 scaffolds contained markers that mapped to two or more LGs (Figures S2 and S4). These scaffolds were split according to criteria described in Supplementary methods. Additionally, there were 49 scaffolds split within gene models (Figure S5). Two ambiguous markers in scaffold Potra001073 were removed. After splitting 14,596 (12,900 binned) and 13,996 (12,382 binned) markers from 4,184 and 4,011 scaffolds remained. These markers spanned 4072.72 cM and 4053.68 cM for the female and male framework maps, respectively. The parental maps were used to produce a consensus genetic map consisting of 19,519 markers derived from 4,761 scaffolds spanning 4059.00 cM (Table 1). Linkage groups (LG) were assigned to the corresponding P. trichocarpa homologs through synteny assessment (Table 1, Figure S1).
Table 1

Summary of female male and consensus linkage maps for each chromosome

ChrLGFemaleMaleConsensus
Probe markersBin markersSize (cM)Probe markersBin markersSize (cM)Probe markersSize (cM)
161,7841,573498.01,7371,542499.32,417498.8
231,054924266.31,028904268.31,407266.3
38826736229.9790695235.31,117235.2
41813728217.2835736263.01,128220.5
57965842284.5947836270.61,293271.9
6141,2411,108311.11,116993294.31,602296.4
718572497161.9501450155.1765155.5
812895781227.4824726211.0117215.5
92697617171.5640578171.6879172.2
1041,026901279.01,015879270.81,388273.7
1116515460181.4541480180.7724182.0
1217515452142.4486429140.7678143.3
1311586530173.7608531176.4822178.7
145733655194.3702609182.0983194.3
1510553505145.8593532157.9777159.2
1619383334145.8220195125.8430145.4
1715488433155.5447394148.9647149.4
1813591502156.3592531166.2792165.5
199361322130.6375342135.8500135.8
Total14,59812,9004072.713,99712,3824053.719,5194059.0

Physical assembly Potra v1.2

The parental framework maps were used to produce a physical map, Potra v1.2, of the P. tremula chromosomes that we used to estimate recombination maps (Figure 1, Figure S6). The scaffolds anchored from the parental framework maps spanned 210.7 Mbp and 205.1 Mbp for the female and male respectively. This corresponds to 54.6% and 53.1% of the 385.8 Mbp covered by the v1.1 assembly (Table S1). Of the 4,761 scaffolds with markers, 96.6% could be anchored in the assembly providing a total physical assembly consisting of 223.4 Mbp. This corresponds to 57.9% of the v1.1 P. tremula assembly (Lin ) (Table 2). 75.7% of the physical map was both anchored and oriented, while the remaining 24.3% was only anchored. 199,967 of the v1.1 assembly scaffolds were not covered by the framework maps and thus could not be anchored to the physical map. There was a clear distinction between the scaffolds we could and could not anchor to the Potra v1.2 assembly. The median length for the 4214 scaffolds anchored in the map was 37 kbp and these scaffolds contain 26,808 predicted gene models. Conversely, the median length on unanchored scaffolds was only 0.3 kbp and they collectively contain only 8,501 predicted gene models (Table 2). After the initial assembly, gap estimation added 43 Mbp of gap sequences across the genome, increasing the estimated total size of the v1.2 assembly to 265 Mbp. This is approximately 55% of the 479 Mbp genome size estimated for P. tremula (Lin ).
Figure 1

Genetic maps and resulting physical map created by Allmaps for Chr1 and Chr5. The left panel shows the marker distribution (in cM) for the genetic maps and the anchored genomic region (in Mbp) for the physical map, while the right panel is showing the Marey maps, i.e., the correspondence between the physical (x-axis) and recombination-based (y-axis) position of markers. The female map is depicted in green and the male map is depicted in orange.

Table 2

Summary of physical assembly Potra v1.2

AnchoredOrientedUnplaced
Markers (unique)19,30215,597215
Markers per Mbp86.492.21.3
Gene elements26,808 (75.9%)NA8501 (24.1%)
N50 Scaffolds (bp)2,0711,589238
Scaffolds4,5992,759200,129
Scaffolds with 1 marker1,1510133
Scaffolds with 2 markers99978717
Scaffolds with 3 markers6203847
Scaffolds with >=4 markers1,8291,5885
Total bases223,381,844 (57,9%)169,187,619 (43,9%)162,436,408 (42,1%)
Genetic maps and resulting physical map created by Allmaps for Chr1 and Chr5. The left panel shows the marker distribution (in cM) for the genetic maps and the anchored genomic region (in Mbp) for the physical map, while the right panel is showing the Marey maps, i.e., the correspondence between the physical (x-axis) and recombination-based (y-axis) position of markers. The female map is depicted in green and the male map is depicted in orange.

Recombination estimates

Recombination estimates were produced based on the consensus linkage map (LMB) and from LD data (LDB) derived from 25 randomly selected individuals from the SwAsp collection. The LMB recombination rate estimates varied between 1.605 cM/Mbp on chromosome 4 to 26.911 cM/Mbp on chromosome 11, while the LDB estimates varied between 1.969 cM/Mbp on chromosome 5 to 231.801 cM/Mbp on chromosome 1. The median estimated recombination rate in the LMB map was 16.0 cM/Mbp with a mean of 15.6 cM/Mbp, whereas the median recombination rate for the LDB map was 14.0 cM/Mbp with a mean of 16.1 cM/Mbp (Table S2, Figure S7). The majority of all recombination rate estimates (97%) for both maps fell in the range of 2 - 27 cM/Mb (Figure 2, Figure S8). There were 20 windows where the LDB estimates are 1.5-15-fold higher compared to the corresponding rates from the LMB estimates and 2-14-fold higher than the mean recombination rate estimate from the LDB map. 13 of these windows had recombination rates exceeding 27 cM/Mbp, while seven were within 2-27 cM/Mbp. Conversely, there were also 23 windows where the LDB estimates were 2-4 times lower than the corresponding LMB estimates (Figure 2, Figures S8 and S9).
Figure 2

Recombination rates and genomic features calculated in 1 Mbp windows across chromosome 1 and 5 with a step size of 250kbp. A) Recombination rate estimated from the linkage map (cM/Mbp) B) Recombination rate estimated from sequence LD data (cM/Mbp) C) Nucleotide diversity (1/bp) D) Divergence (sites/Mbp) E) Gene density (percentage coding/Mbp) F) Repeat density (percentage repeats/Mbp) G) CpG methylation (percentage/Mbp).

Recombination rates and genomic features calculated in 1 Mbp windows across chromosome 1 and 5 with a step size of 250kbp. A) Recombination rate estimated from the linkage map (cM/Mbp) B) Recombination rate estimated from sequence LD data (cM/Mbp) C) Nucleotide diversity (1/bp) D) Divergence (sites/Mbp) E) Gene density (percentage coding/Mbp) F) Repeat density (percentage repeats/Mbp) G) CpG methylation (percentage/Mbp).

Correlation of recombination rate estimates and genomic features

The Spearman’s rank correlation between recombination rate estimates for the LMB and LDB maps was 0.478 (Figure 3). This was the strongest positive correlation of all of the correlations calculated for both maps and strongest correlation overall for the LDB map. Correlations between the LMB and LDB recombination maps and neutral diversity were the second strongest positive correlations for both maps, 0.447 and 0.442 respectively. This correlation was the second strongest overall for LDB map. Correlation with neutral diversity was also the only variable where there was no notable decrease in the correlation coefficient from the LMB to the LDB recombination maps (Figure 3).
Figure 3

Correlations between recombination rates and genomic features. All correlations are significant (P < 0.01) with the exception of those marked with * (P > 0.01).

Correlations between recombination rates and genomic features. All correlations are significant (P < 0.01) with the exception of those marked with * (P > 0.01). For the LMB map we observed strong negative correlations with CHG methylation (-0.515), CHH methylation (-0.511) and CpG methylation (-0.505). For the LDB map the corresponding correlations were -0.379 (CHH), -0.371 (CHG) and -0.353 (CpG). Methylation levels were also strongly correlated with each other (0.918-0.984) and with repeat density (0.800-0.840). Repeat density was also moderately negatively correlated with recombination rate estimates from both the LMB (-0.408) and LDB maps (-0.291). Both recombination maps showed only weak correlations (-0.1<ρ<0.1) with either old or new neutral substitution densities (-0.06 - -0.1). Neutral substitution densities and neutral diversity showed only weak negative correlations (Figure 3). Overall, the LMB estimates displayed consistently stronger correlations with the different genomic features compared to the LDB estimates. This is in line with 5% of the total variation being explained by a multiple regression model for the LDB recombination estimates compared to 35% variation explained for the LMB estimates (Table 3).
Table 3

Multiple regression of recombination rate and various genomic features

FactorEstimateStd. Errortp
Linkage map-based recombination rates
Gene density0.2250.0376.121.3e-9
Repeat density−0.1110.069−1.600.111
GC content−0.1820.042−4.371.36e-5
GpG methylation−0.3600.050−7.181.25e-12
Nucleotide diversity0.1400.0344.075.0e-5
Substitution density−0.0380.056−0.6850.493
R2 = 0.345
LD-based recombination rates
Gene density−0.0090.031−0.110.836
Repeat density0.2180.0832.638.6e-3
GC content0.1110.0502.220.027
GpG methylation−0.0210.060−0.360.717
Nucleotide diversity0.1760.0414.262.2e-5
Substitution density−0.1630.067−2.470.015
R2 = 0.054

Discussion

P. tremula fine-scale genetic maps and physical assembly Potra v1.2

The genetic maps presented here are the most marker-dense maps produced for P. tremula to date. Our female map is only 20 cM larger than the male map, despite having 518 more informative markers (Table 1) and most chromosomes have size differences of less than 10 cM between sexes (Table 1, Figure S6). However, in cases where we observed differences between the maps for the two sexes exceeding 10 cM, the male map is shorter in all but one case. These results, together with the overall shorter linkage map for the male, could suggest overall lower recombination rates in males, in line with what has been observed in many other highly outcrossing plant species (Lenormand and Dutheil 2005). The high marker density in our framework genetic maps allowed us to anchor 57.9% of the P. tremula v1.1 genome assembly on to the expected 19 chromosomes, providing us with the first chromosome-scale assembly for P. tremula (Table 2). The map length in the section Populus, to which P. tremula belongs (Wang ), has previously been estimated to be 1,600-3,500 cM (e.g., Zhang , Paolucci , Zhigunov ). The most relevant comparison for our purposes is the recently produced linkage maps in P. tremula by Zhigunov . Their map contains 2000 informative markers with an average marker distance of 1.5 cM that were observed in 122 progenies, resulting in a total map length of 3000-3100 cM. Our framework maps are much denser with ca. 12000-13000 informative markers (Table 1) and with an average distance of ∼0.3 cM between markers. In addition, our map is based on a mapping population consisting of 764 progenies and we are hence able to achieve a far greater resolution in our maps. However, larger data sets, both with respect to the number of markers and the number of progenies used, increase the risk of genotyping errors. Genotyping errors will ultimately lead to an inflation of map sizes as errors can be interpreted as recombination events during map creation and this could help explain why our maps are roughly 1000 cM longer than those reported by Zhigunov , given that we use 5-7 -fold more markers and a mapping population that is six times larger. On the other hand, our framework genetic maps are similar in size to the ca. 4200 cM and 3800 cM maps presented by Tong for the more distantly related species Populus deltoides and Populus simonii, respectively. The large size of these maps led Tong to suggest that their maps were suffering from inflation due to the difficulty of properly ordering a large number of markers within a linkage group. While we likely also suffer from such size inflation, these issues appear to be less severe in our P. tremula parental maps, which contain between 8-14 times the number of markers used in the P. deltoides (1601) and P. simonii (940) maps and yet yield linkage maps of similar size. One explanation for this is our considerably larger mapping population compared to the P. deltoides and P. simonii maps (299 progenies). A greater number of segregating progenies helps mitigate the problems of ordering a larger number of markers by increasing resolution of recombination detection.

Recombination rate estimates

Recombination rates estimated from both the consensus linkage map and from polymorphism data showed substantial variation across all chromosomes on Mbp scales (Figure 2). For the consensus genetic map-based estimates and LD-based estimates, the majority of our observations fell in the range 0-27 cM/Mbp (Figure 2, Figure S8) which is similar to what has been observed in other plants such as Arabidopsis thaliana (Giraut ), Populus trichocarpa (Slavov ) and Eucalyptus grandis (Silva-Junior and Grattapaglia 2015), where recombination rates across chromosomes mostly fall within 0-25 cM/Mbp. We observed a small number of genomic windows where the LDB recombination rates were either 1.5-15-fold higher or lower than the corresponding estimates based on the consensus genetic map (Figure 2, Figure S8). While we do not know for certain what causes these large differences in recombination rates, a possible explanation could be that such windows harbor recombination hotspots or coldspots that the comparatively coarse linkage map fails to detect. Recombination hotspots, with local recombination rates 10 to 100-fold higher than the genome-wide average, have been observed in a number of species, including Drosophila melanogaster (Chan ), Arabidopsis thaliana (Kim ), Zea mays (He and Dooner 2009), Oryza sativa (Si ) and Eucalyptus grandis (Silva-Junior and Grattapaglia 2015). Similarly, coldspots have been identified in Zea mays (He and Dooner 2009) and Oryza sativa (Si ) among others. Hotspots or coldspots for recombination are, however, often quite restricted in size (Choi and Henderson 2015), spanning only a few kbp, and the relatively coarse recombination maps produced here are consequently not suitable for accurate detection of such regions. It is also worth noting that in species with lower levels of diversity, the resolution of LD based estimates would be lower and the power to detect hot- and coldspots for LDB and LMB estimates may not differ much for these species. The average recombination rate in P. tremula is 2-27 times higher than those found in a number of, mostly domesticated, plant and animal species (reviewed in Henderson (2012) and Tiley and Burleigh (2015)), suggesting that P. tremula exhibits recombination rates that are among the highest recorded in the animal and plant kingdoms. Of the species covered in these reviews, P. trichocarpa (Slavov ) makes for the most interesting comparison since it is one of the few undomesticated species listed, is closely related to P. tremula and has previously been compared with P. tremula (Wang ). Despite the close relationship, the average recombination rate in P. trichocarpa (Slavov ) is less than a third of what we estimated for P. tremula. Similar observations were previously made by Wang who found that population-based recombination rates in P. trichocarpa were on average only a quarter of the corresponding values in P. tremula. Wang argued that the differences they observed in recombination rates between P. tremula and P. trichocarpa could at least partly stem from differences in the effective population size (Ne) of the two species (Wang ). In light of this, it would be interesting to perform further comparisons of recombination rates between P. tremula and other Populus species that have wide distribution ranges and large Ne, such as P. deltoides (Tong ) or P. tremuloides (Wang ).

Correlations between recombination rate and genomic features

Recombination rate estimates from the consensus linkage map and from polymorphism data showed a moderately strong positive correlation (>0.4) (Figure 3). A similar correlation between linkage map and LD-based estimates of recombination was also observed in Mus musculus by Booker , suggesting that LDB recombination rate estimates are reliable substitutes for genetic map-based recombination rate estimates. A recent study in Gasterosteus aculeatus (Shanfelter ), however, found even stronger correlations between the two estimates (∼0.8). We observed a strong positive correlation between recombination rate and gene density (0.45 and 0.41 respectively) (Figure 3). This is in line with earlier observations in plants (Tiley and Burleigh 2015, Stapley ) and implies that recombination may be linked to gene-dense regions through a higher recruitment of the recombination machinery to euchromatic genome regions. Preferential recruitment of recombination to euchromatic genome regions has also been put forward as an explanation for why recombination rates across plants generally show stronger correlations with gene density compared to genome size (Henderson 2012, Tiley and Burleigh 2015). Studies in plants like Arabidopsis thaliana and Oryza sativa have shown that while crossover events are enriched in genic regions, they mostly occur in promoters a few hundred bps upstream of the transcription start site or downstream of the transcription termination site (Choi , Marand ). We observed negative correlations between local recombination rates and both repeat density and methylation (Figure 3), in line with earlier results that highlighted the role of chromatin features in establishing crossover locations in plants (Choi , Marand ). For instance, Choi showed that methylation is lower at observed sites of crossovers and Rodgers-Melnick showed that cross-over density in Zea mays is negatively correlated with repeats and CpG methylation. All methylation contexts were highly correlated in our data and also strongly correlated with repeat density (≥0.8, Figure 3), in line with the observation that most repetitive elements in plant genomes are strongly methylated (Saze and Kakutani 2011). Compared to earlier results from P. tremula, we observed a weaker correlation between recombination rates and gene density (Wang ). One possible reason for this is likely to be the reference genome used. Wang based their analyses on the P. trichocarpa reference genome whereas our analyses were based on a de novo assembly for P. tremula. The P. trichocarpa assembly, while more contiguous than our current P. tremula assembly, is less ideal for these types of analyses since divergence between the two species leads to substantially reduced rates of read mapping primarily in intergenic regions (Lin ). Our current assembly, while only representing 55% of the expected genome size of P. tremula, likely offers a more unbiased set of genomic regions where we are able to call genetic variants. In contrast, the data derived from using the P. trichocarpa reference genome likely suffers from under-representations of repeat-rich regions and other intergenic regions (Wang ). GC-content was positively correlated with both our recombination rate estimates, similar to what has been observed in humans (Homo sapiens) (e.g., Fullerton ) and Arabidopsis thaliana (Kim ) among others. However, when GC-content was included in a multiple regression model with other genomic features, the direct effect of GC content was actually negative for the LMB recombination rate (Table 3). GC-content is strongly correlated with gene density (0.50) in P. tremula, and gene density is in turn also strongly positively correlated with recombination (Figure 3). The strand separation needed in the strand invasion of meiotic recombination is harder to achieve in areas with high GC-content due to higher annealing energy and can explain why GC-content has a direct negative effect on recombination rates when effects of gene density are accounted for (Table 3, e.g., Mandel and Marmur 1968).

Effects of linked selection on patterns of nucleotide diversity in P. tremula

Both of our recombination rate estimates were strongly correlated with nucleotide diversity at putatively neutral sites (Figure 3). A positive correlation between local recombination rate and nucleotide polymorphism is usually interpreted as a signature of ubiquitous natural selection acting either through positive (hitchhiking) or negative (background) selection (Begun and Aquadro 1992). Alternatively, such a correlation could also arise if recombination itself is mutagenic (Begun and Aquadro 1992). However, if recombination is mutagenic one also expects to see a correlation between recombination and sequence divergence at neutral sites (Begun and Aquadro 1992). Our data show little evidence supporting the idea that recombination has a direct mutagenic effect as we observed only a weak and negative correlation between local recombination rates and substitutions at putatively neutral sites (Figure 3). In light of this, and in line with earlier results, we observed that linked selection has pervasive effects on neutral diversity across the P. tremula genome (Ingvarsson 2010, Wang ).

Conclusions

Our high-density Populus tremula genetic maps and the new chromosome-scale genome assembly we present here provide a valuable resource not only for P. tremula, but also for comparative genomics studies within the entire genus Populus. Estimates of recombination rates derived from two different approaches were in broad agreement and showed similar correlations various genomic features. However, our results suggest that LD-based estimates of recombination might be especially useful for identifying fine scale recombination variation and for identifying features such as recombination hot- or cold-spots as it is based on information derived from population-scale variation. We have also verified and extended the observation that linked selection is an important force shaping genome-wide variation in P. tremula by showing that the positive correlation between local recombination rates and nucleotide diversity and neutral sites is robust even when factoring in the effects of other genomic features. Although a positive correlation between recombination and diversity is a hallmark signature of linked selection, the pattern can be established by either positive or negative selection. We have earlier documented evidence for both a reduction in levels of standing variation due to recurrent hitchhiking (Ingvarsson 2010) and a reduction in the efficacy of purifying selection at eliminating weakly deleterious variants in regions of low recombination (Wang ). More work is thus needed to assess the relative importance of positive and negative selection in shaping genome-wide variation in P. tremula and having access to the resources developed here will facilitate these studies.
  51 in total

Review 1.  Populus: arabidopsis for forestry. Do we need a model tree?

Authors:  Gail Taylor
Journal:  Ann Bot       Date:  2002-12       Impact factor: 4.357

Review 2.  Variation in recombination rate across the genome: evidence and implications.

Authors:  Michael W Nachman
Journal:  Curr Opin Genet Dev       Date:  2002-12       Impact factor: 5.578

3.  Evolution in Mendelian Populations.

Authors:  S Wright
Journal:  Genetics       Date:  1931-03       Impact factor: 4.562

4.  Recombination and linkage disequilibrium in Arabidopsis thaliana.

Authors:  Sung Kim; Vincent Plagnol; Tina T Hu; Christopher Toomajian; Richard M Clark; Stephan Ossowski; Joseph R Ecker; Detlef Weigel; Magnus Nordborg
Journal:  Nat Genet       Date:  2007-08-05       Impact factor: 38.330

5.  Genome-wide patterns of recombination, linkage disequilibrium and nucleotide diversity from pooled resequencing and single nucleotide polymorphism genotyping unlock the evolutionary history of Eucalyptus grandis.

Authors:  Orzenil B Silva-Junior; Dario Grattapaglia
Journal:  New Phytol       Date:  2015-06-16       Impact factor: 10.151

6.  Natural selection on synonymous and nonsynonymous mutations shapes patterns of polymorphism in Populus tremula.

Authors:  Pär K Ingvarsson
Journal:  Mol Biol Evol       Date:  2009-10-16       Impact factor: 16.240

Review 7.  Populus: a model system for plant biology.

Authors:  Stefan Jansson; Carl J Douglas
Journal:  Annu Rev Plant Biol       Date:  2007       Impact factor: 26.379

8.  Genome resequencing reveals multiscale geographic structure and extensive linkage disequilibrium in the forest tree Populus trichocarpa.

Authors:  Gancho T Slavov; Stephen P DiFazio; Joel Martin; Wendy Schackwitz; Wellington Muchero; Eli Rodgers-Melnick; Mindie F Lipphardt; Christa P Pennacchio; Uffe Hellsten; Len A Pennacchio; Lee E Gunter; Priya Ranjan; Kelly Vining; Kyle R Pomraning; Larry J Wilhelm; Matteo Pellegrini; Todd C Mockler; Michael Freitag; Armando Geraldes; Yousry A El-Kassaby; Shawn D Mansfield; Quentin C B Cronk; Carl J Douglas; Steven H Strauss; Dan Rokhsar; Gerald A Tuskan
Journal:  New Phytol       Date:  2012-08-03       Impact factor: 10.151

9.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

10.  Genome-wide fine-scale recombination rate variation in Drosophila melanogaster.

Authors:  Andrew H Chan; Paul A Jenkins; Yun S Song
Journal:  PLoS Genet       Date:  2012-12-20       Impact factor: 5.917

View more
  6 in total

1.  Diversity and determinants of recombination landscapes in flowering plants.

Authors:  Thomas Brazier; Sylvain Glémin
Journal:  PLoS Genet       Date:  2022-08-30       Impact factor: 6.020

2.  Genomic prediction of the recombination rate variation in barley - A route to highly recombinogenic genotypes.

Authors:  Federico Casale; Delphine Van Inghelandt; Marius Weisweiler; Jinquan Li; Benjamin Stich
Journal:  Plant Biotechnol J       Date:  2021-12-11       Impact factor: 9.803

3.  A Novel Strategy to Reveal the Landscape of Crossovers in an F1 Hybrid Population of Populus deltoides and Populus simonii.

Authors:  Zhiting Li; Wei Zhao; Jinpeng Zhang; Zhiliang Pan; Shengjun Bai; Chunfa Tong
Journal:  Plants (Basel)       Date:  2022-04-12

4.  Variation in fine-scale recombination rate in temperature-evolved Drosophila melanogaster populations in response to selection.

Authors:  Ari Winbush; Nadia D Singh
Journal:  G3 (Bethesda)       Date:  2022-09-30       Impact factor: 3.542

5.  High-Resolution Estimates of Crossover and Noncrossover Recombination from a Captive Baboon Colony.

Authors:  Jeffrey D Wall; Jacqueline A Robinson; Laura A Cox
Journal:  Genome Biol Evol       Date:  2022-04-10       Impact factor: 4.065

6.  Adaptive Introgression Facilitates Adaptation to High Latitudes in European Aspen (Populus tremula L.).

Authors:  Martha Rendón-Anaya; Jonathan Wilson; Sæmundur Sveinsson; Aleksey Fedorkov; Joan Cottrell; Mark E S Bailey; Dainis Ruņis; Christian Lexer; Stefan Jansson; Kathryn M Robinson; Nathaniel R Street; Pär K Ingvarsson
Journal:  Mol Biol Evol       Date:  2021-10-27       Impact factor: 16.240

  6 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.