Literature DB >> 35320272

Meiotic, genomic and evolutionary properties of crossover distribution in Drosophila yakuba.

Nikale Pettie¹, Ana Llopart^1,2, Josep M Comeron^1,2.

Abstract

The number and location of crossovers across genomes are highly regulated during meiosis, yet the key components controlling them are fast evolving, hindering our understanding of the mechanistic causes and evolutionary consequences of changes in crossover rates. Drosophila melanogaster has been a model species to study meiosis for more than a century, with an available high-resolution crossover map that is, nonetheless, missing for closely related species, thus preventing evolutionary context. Here, we applied a novel and highly efficient approach to generate whole-genome high-resolution crossover maps in D. yakuba to tackle multiple questions that benefit from being addressed collectively within an appropriate phylogenetic framework, in our case the D. melanogaster species subgroup. The genotyping of more than 1,600 individual meiotic events allowed us to identify several key distinct properties relative to D. melanogaster. We show that D. yakuba, in addition to higher crossover rates than D. melanogaster, has a stronger centromere effect and crossover assurance than any Drosophila species analyzed to date. We also report the presence of an active crossover-associated meiotic drive mechanism for the X chromosome that results in the preferential inclusion in oocytes of chromatids with crossovers. Our evolutionary and genomic analyses suggest that the genome-wide landscape of crossover rates in D. yakuba has been fairly stable and captures a significant signal of the ancestral crossover landscape for the whole D. melanogaster subgroup, even informative for the D. melanogaster lineage. Contemporary crossover rates in D. melanogaster, on the other hand, do not recapitulate ancestral crossovers landscapes. As a result, the temporal stability of crossover landscapes observed in D. yakuba makes this species an ideal system for applying population genetic models of selection and linkage, given that these models assume temporal constancy in linkage effects. Our studies emphasize the importance of generating multiple high-resolution crossover rate maps within a coherent phylogenetic context to broaden our understanding of crossover control during meiosis and to improve studies on the evolutionary consequences of variable crossover rates across genomes and time.

Entities: Chemical

Mesh：

Substances：
Drosophila Proteins

Year: 2022 PMID： 35320272 PMCID： PMC8979470 DOI： 10.1371/journal.pgen.1010087

Source DB: PubMed Journal: PLoS Genet ISSN： 1553-7390 Impact factor: 5.917

Introduction

Meiotic recombination is an essential cellular and evolutionary process. During meiosis, double strand breaks (DSBs) are typically repaired by reciprocal exchange between homologous chromosomes that result in either crossovers or non-crossover gene conversion events [1-6]. Crossovers and the cross-connections they form between non-sister chromatids (chiasmata) are essential to signal proper chromosome segregation during the first meiotic division in many species [7,8]. Meiotic recombination and crossovers also play critical roles in evolution, directly generating new combinations of alleles and, indirectly, increasing the level of genetic variation within species and the rate of adaptations [9-15]. Given the important dual role of meiotic recombination and crossing-over, it is not surprising that both processes are present in the vast majority of eukaryotic organisms. Crossover number and distribution are critically regulated, with three phenomena playing key and likely interrelated roles: centromere effect, crossover interference and crossover assurance. The centromere effect and crossover interference influence crossover patterning along chromosomes while crossover assurance plays a role in the distribution of crossovers among tetrads or bivalents (reviewed in [16,17]). The centromere effect describes the classic observation in Drosophila that crossovers do not form in regions near the centromeres [18-24], a phenomenon that has also been observed in many other organisms such as Saccharomyces cerevisiae, Arabidopsis thaliana, and humans [25-29]. Although the causes of the centromere effect are not yet fully understood, it seems to be influenced by the distance from the centromere and highly repetitive heterochromatin [17,20,23,24,30]. A similar but weaker pattern is also observed near the telomeres in D. melanogaster [17,18,24]. The extent of the centromere effect varies between species, even among Drosophila. For instance, it is weaker in D. simulans than in D. melanogaster, and even weaker—if at all present—in D. mauritiana based on crossover data using P-elements as markers [31]. Studies in the sister species D. pseudoobscura and D. persimilis also show a tendency towards detectable but weaker centromere effect than in D. melanogaster [32-34]. Notably, all these Drosophila species with weaker centromere effect than D. melanogaster have higher crossover rates and longer genetic maps. Crossover interference occurs when one crossover inhibits the formation of a second crossover in neighboring regions and results in fewer and more distant double crossovers than expected if the formation of each crossover along a chromosome arm was independent [35,36]. This phenomenon was first identified in D. melanogaster [35-41] and has now been observed in many other, but not all, organisms [28,42-49]. Interestingly, two species with no apparent crossover interference (fission yeast and Aspergillus nidulans) also lack a synaptonemal complex [50-52]. In support for a direct role of the synaptonemal complex in crossover interference, mutants for the synaptonemal complex protein ZIP1 in S. cerevisiae lack interference [53]. On the other hand, the complete synaptonemal complex is not required for interference in D. melanogaster [40]. Worth noting, neither the centromere effect nor crossover interference could be detected in D. melanogaster Blm (Bloom syndrome helicase) mutants, suggesting a likely unifying pathway for crossover patterning across chromosomes [37]. Crossover assurance or “obligate chiasma” is the observation that each tetrad or pair of homologous chromosomes undergoes at least one crossover, irrespective of chromosome length or total number of crossovers genome-wide [16,40,54]. Because crossover formation is important for proper chromosome segregation at meiosis I, crossover assurance is expected to be under critical and tight regulatory control. As expected, most species with a limited number of crossovers per meiosis show some degree of crossover assurance, thus reducing the chance of tetrads with zero crossovers. C. elegans is an extreme case of absolute crossover assurance, with a percentage of tetrads with no crossovers (E tetrads) representing less than 1% [55]. Nevertheless, crossover formation is not always required for faithful segregation of homologs. In Drosophila, as in most diptera, males show faithful chromosome segregation despite no crossing over during meiosis [56-58]. Females also show proper achiasmate chromosome segregation of the small (dot) chromosome (also known as Muller F element and chromosome 4 in the D. melanogaster subgroup) through alternative mechanisms now known to involve centromeres and heterochromatic DNA [59-63]. A deep understanding of the different mechanisms controlling meiosis, therefore, benefits from quantifying E0 and more generally the full series of E tetrad classes (where r indicates the number of crossovers in a tetrad). The direct quantification of E tetrads is, however, only possible in systems where the four products of meiosis are recovered. In species where only one of the four chromatids resulting from meiosis can be analyzed (as in most species including Drosophila), E can be estimated under specific meiotic models based on the fraction of products with different numbers of crossovers. The first and most frequently used model to estimate E was proposed by Weinstein (1936) under the assumptions of no crossover interference, no chromatid interference, random segregation of chromatids into haploid gametes and equal viability among all possible meiotic products ([64,65]; see Materials and Methods for details). In D. melanogaster, estimates of E (based on Weinsten’s model unless otherwise noted) vary between studies, genotypes and conditions but the general trend suggests that crossover assurance in female meiosis is incomplete (E > 0; see [17] and references therein). Using Whole-Genome Sequencing (WGS), Miller et al. (2016) [41] calculated an overall E of 0.112, with similar values for the X chromosome and autosomes whereas studies of visible markers (observed fly phenotypes) and much larger sample sizes often show smaller estimates of E for the X chromosome than for autosomes [16,30,41,66]. In all, complete or absolute crossover assurance has not been reported for any chromosome of D. melanogaster wild-type flies. Interestingly, studies in D. virilis show smaller E0 compared to D. melanogaster [67,68], a result that has been proposed to be the consequence of the higher rate of recombination and longer genetic map [67]. Insights into the distribution of crossover rates across genomes also provides an excellent opportunity for testing hypotheses concerning evolutionary processes. A variety of models of selection predict that variation in recombination rates has consequences on rates of evolution and levels of variation within species (diversity). These models rely on the notion that selection acting at a locus or nucleotide site causes population dynamics at neighboring genomic regions that can be understood as a reduction in local N relative to a case without selection [69-77]. Notably, this phenomenon results from either positive or negative selection [69,70,74,77-83]. Although the precise effects of selection and linkage on N depend on numerous factors, all these models predict a positive association between intragenomic variation in crossover rates and both the efficacy of selection (which is influenced by the population-scaled selection coefficient; N × s) and the level of neutral intraspecific variation or diversity (which is influenced by the population-scaled mutation rate; N × μ) (reviewed in [72,73,84]). Whereas the predicted effect of variation in crossover rates on neutral diversity has been observed in many species ([72, 73,84,85]; although see [86]), studies in humans, Drosophila and yeast have provided mixed support for the prediction that variation in crossover rates across genomes modulates the efficacy of selection as estimated from either codon usage bias or rates of protein evolution (see [87,88] and references therein). In many species, synonymous codons are not used in equal frequencies. This observation has been proposed to be the result—at least partially—of weak selection favoring a subset of codons that increase translational speed and accuracy, particularly in highly expressed genes (see [89-92] and references therein). This model directly predicts a positive association between crossover rates (through increased efficacy of selection) and the degree of bias towards preferred codons (CUB) across genomes. In D. melanogaster, initial studies of variation in crossover rates and CUB provided some support for models of weak selection and linkage [93,94] but later studies using larger data sets reported different trends for the X chromosome and autosomes, and instances of a negative association between recombination rates and CUB [95-98]. These unexpected results for CUB in D. melanogaster are difficult to reconcile with models of selection and linkage, given that either demographic effects or the inclusion of a fraction of mutations under strong selection [99,100] would predict a weakly positive or no association between crossover rates and CUB, but not a significantly negative one. At the same time, analyses of the rate of protein evolution that include data from the D. melanogaster lineage show rates that are not (or only very weakly negatively) associated with crosover rates after excluding genes putatively under positive selection. Analyses focusing on genes likely under positive selection for amino acid changes show either no significant or a weakly significant positive association between crossover rates and rates of protein evolution [98,101-105]. Studies in the D. pseudoobscura subgroup show no correlation between variation in crossover rates and rates of protein evolution [32]. The causes for the apparent weak support for models of selection and linkage when applied to the efficacy of selection can be multiple, including limitations when capturing potential demographic events [73,106-108]. Most of these previous studies also used low-resolution crossover maps, which is significant because the effect of variation in crossover rates on N across genomes has been shown to be best captured when using fine-scale crossover data [32,34,80,83,109]. Additionally, there are potential biases associated with maps based on visible markers or P-elements with reporter genes, resulting from combinations of markers with different viability or the presence of transposable elements (TEs) that can directly affect recombination frequencies near insertion sites [110-112]. The use of WGS-based high-resolution crossover maps, however, reduces many of such limitations [113-119]. Another caveat to most of these previous studies is the frequent assumption that crossover maps observed today capture, to a significant degree, ancestral crossover rates despite evidence that the rate and genomic distribution of crossovers are fast-evolving, with differences between distant as well as closely related species [32,113,120-130]. Temporal changes in recombination landscapes would decouple our contemporary measures of recombination rates (and contemporary N) from the long-term N influencing efficacy of selection, thus reducing the genomic correlations predicted by models ([73,86,107] and references therein). These temporal changes in recombination landscapes and local N can be understood as similar to those proposed after demographic events but with the additional challenge that different genomic regions will follow different non-equilibrium conditions and dynamics whereas demographic changes act genome-wide. In this study, we apply a novel, highly-efficient, WGS approach to generate a high-resolution crossover map in D. yakuba and tackle multiple questions that benefit from being addressed collectively and within a phylogenetic framework. These questions range from the patterns of crossover distribution across genomes and between chromatids, to the relationship between crossover rates and repetitive elements and short motifs, and the evolutionary consequences of crossover rate variation across genomes and time. Our data in D. yakuba can be directly compared with those of the related species D. melanogaster, a model system to study crossover and meiosis for a century for which high-resolution crossover maps already exist [41,113]. Importantly, our new crossover landscape in D. yakuba also allows an initial study of the evolutionary effects of variation in recombination landscapes within the framework of the D. melanogaster subgroup, and specifically test the hypothesis that the lack of general support for models of selection and linkage in the D. melanogaster lineage may be the consequence of recent changes in the distribution of crossover rates across the D. melanogaster genome [98,131].

Results

A high-resolution crossover map in D. yakuba using dual SNP-barcode genotyping

To generate a whole-genome high-resolution crossover map and genotype a high number of F2 flies with Illumina WGS, we used a new highy efficient dual-barcoding genotyping scheme (). This method involves the analysis of multiple crosses between different parental lines and allows combining multiple F2 individuals (one per cross) per individual library barcode-sequence to later ‘demultiplex’ them bioinformatically based on diagnostic SNPs (genetic barcodes). These diagnostic SNPs are singletons, variants unique to a single line used in the study, and every Illumina read containing one can be assigned to a single parental genome and cross. Furthermore, before genotyping F2 individuals we improved the D. yakuba reference genome using PacBio long-read sequencing to obtain high-quality sequences of all parental lines used in the crosses, with an average coverage across all chromosomes of 125× (see Materials and Methods and ). Each parental line had more than 90,000 diagnostic SNPs, about one every 1.3 kb across the genome, and were used for the dual-barcoding method as well as to identify the genomic location of crossovers.

Dual-barcoding genotyping method used to obtain crossover rates.

Diagnostic SNPs are used as genetic barcodes and allow the pooling of multiple F2 individuals from different crosses for a given sequence barcode (left panel). The combination of genetic and sequence barcodes ensures efficient genotyping of multiple individuals and accurate crossover localization along chromosome arms (right panel). Diagnostic, or strain-specific SNPs, are singletons for the complete set of genotypes used in the study, including the tester line. More than 1,600 F2 individuals from three crosses of wild-type lines were genotyped to generate this first high-resolution genetic map in D. yakuba ( One cross (Sn20 x Sn17) showed a heterozygous chromosomal inversion on 2R (see ) and, therefore, no data from that chromosome arm were used in the analyses. In all, we recovered a total of 5,273 crossover events distributed across all chromosome arms except the dot chromosome ( and Tables and S1). The absence of crossovers on this small chromosome 4 despite the presence of diagnostic SNPs is consistent with results in other Drosophila species [37,41,113,132,133]. There are fewer non-crossovers (NCOs), more single crossovers (1COs) and, especially, more double crossovers (2COs) in D. yakuba than in D. melanogaster (). This trend is observed across all chromosomes examined, with the X chromosome of D. yakuba showing the largest increase (64%) in crossovers relative to D. melanogaster. Notably, we observed multiple chromatids with triple crossovers (3COs) and 0.05% (three) cases with four crossovers (4COs).

Number and distribution of crossovers.

A) Observed crossovers per chromatid in D. yakuba (this study) and D. melanogaster (see Materials and Methods for details). Note that studies using visible markers are based on much larger sample sizes but, at the same time, may underestimate crossover events due to limited density of markers along chromosome arms. NCO: non-crossover, 1CO: single crossover, 2CO: double crossover, 3CO: triple crossover, 4CO: quadruple crossovers per chromosome arm. B) Relative location of crossovers along chromosome arms in D. yakuba. Each horizontal line represents a chromosome with one or more crossovers. A change in color represents a crossover event, starting from the telomere (blue). Chromosome arms have been ordered based on crossover class for better visualization, from 1CO (top) to 4 CO (bottom). Number of chromatids analyzed (n): X: n = 1,622; 2L: n = 1,661; 2R: n = 841; 3L: n = 1,701; 3R: n = 1,635. NCO: zero crossover, 1CO: single crossover, 2CO: double crossover, 3CO: triple crossover, 4CO: 4 crossovers in a single chromatid. 2The Sn20 x Sn17 cross is heterozygous for an inversion on 2R ( and ) and, therefore, no crossovers were analyzed for this chromosome arm. For comparison, meiotic events in D. melanogaster analyzed in this study are shown in . We observe some variation in the average number of crossovers per gamete among the three crosses analyzed (from 3.25 to 3.76). Nevertheless, there is a much larger degree of variability in the distribution of crossovers per gamete within a cross, and an ANOVA test indictes that cross plays a minor role in the number of crossovers per gamete (ANOVA test, partial η2 = 0.015; ). Additionally, there is no evidence of an interchromosomal effect [39,134,135] caused by the presence of a heterozygous inversion on chromosome 2R in the Sn20 x Sn17 cross given that the number of crossovers in freely recombining chromosome arms for this cross is not higher than in the other two crosses ( In agreement, an ANOVA test shows that the presence of the heterozygous 2R inversion has no detectable effect on the number of crossovers per gamete on freely recombining chromosome arms (partial η2 = 0.001; P > 0.05). We, therefore, performed all analyses combining the results of the three crosses unless noted. In D. yakuba, the average number of crossovers per chromosome arm is 0.94, 0.57, 0.55, 0.73 and 0.67, for the X chromosome and chromosome arms 2L, 2R, 3L and 3R, respectively. Overall, our study shows an average of 3.46 crossovers per viable meiotic product relative to 2.76 in D. melanogaster. The total genetic map length for D. yakuba obtained in our crosses is 339.31 cM, about 25% longer than that for D. melanogaster [41,113,136]. Average crossover rates are highest for chromosome X (4.07 cM/Mb) and range between 2.00 and 2.76 cM/Mb for autosomal arms (2.00, 2.47, 2.51 and 2.76 cM/Mb for chromosome arms 3R, 2L, 2R and 3L, respectively). Although recombination maps generated for the different crosses show some differences, the main trends of intrachromosomal variation are consistent among crosses for all chromosome arms ().

High-resolution crossover maps for D. yakuba.

Crossover rates (cM/Mb) are shown along chromosome arms from three different crosses as well as the average (thick red line). Maps are shown for overlapping 250-kb windows with increments of 50 kb. Below each chromosome, vertical blue and red bars indicate the presence of transposable and INE-1 elements, respectively. The scale for the different chromosome arms is equivalent and differences in figure size capture differences in chromosome arm length.

Centromere and telomere effects

Similar to other Drosophila species, D. yakuba shows a noticeable centromere effect (Figs and ). Using the standard approach of comparing observed and expected crossover numbers within a centromere-proximal region of a predetermined size (e.g., one-third of the chromosome arm [41]), we identified significant centromere effects on the autosomes but not on the X chromosome ( and ). To compare the magnitude of centromere effects among chromosome arms and between species, we used a more quantitative method to estimate the actual size of centromere-proximal regions with a significant deficit in crossovers relative to expectations (see Materials and Methods). Using this method, we identified a centromere effect in all chromosome arms of D. yakuba ( and ). On autosomes, the proportion of the chromosome that experiences the centromere effect is much larger in D. yakuba than in D. melanogaster, with chromosome arms 2L and 3R in D. yakuba showing an almost two-fold increase in the size of the affected genomic region relative to D. melanogaster. Chromosome arm 3R in D. yakuba, in particular, stands out with 47% and 52% of the arm showing a detectable centromere effect at P = 1×10−6 and P = 0.01 significance levels, respectively. On the other hand, the centromere effect is weakest on the X chromosome of D. yakuba, as it was also reported for D. melanogaster [21,41,113,137], and both species show a similar proportion of the chromosome influenced by the centromere effect (12% and 13% of the chromosome arm at P = 1×10−6 and P = 0.01 significance levels, respectively).

Centromere and telomere effects for different chromosome arms of D. yakuba and D. melanogaster.

For D. melanogaster results are shown for genome r5.3 and r6 assemblies. The proportion of the chromosome with significant reduction in crossovers is based on the study of overlapping 1-Mb windows with increments of 100 kb. Two levels of significance are shown. To examine the telomere effect, we first used the standard methodology applied to the telomere-proximal one-third of the chromosome arm, and identified a significant deficit of crossovers on the D. yakuba X chromosome but not on any of the autosomal arms, reversing the observation for the centromere effect (). The use of the quantitative method to identify the proportion of the chromosome affected by the telomere effect in D. yakuba confirms a much smaller effect in all chromosome arms compared to the centromere effect (), a trend that is also observed in D. melanogaster [21,31,41,113]. Our results for D. yakuba are also consistent with the observation in D. melanogaster that the X chromosome has the strongest telomere effect (12% and 11% of the chromosome for D. yakuba and D. melanogaster, respectively, at P = 1×10−6 significance level). Chromosome arm 3L, on the other hand, shows the weakest telomere effect in D. yakuba (as it is also the case in D. melanogaster), with <3% and 4% of the chromosome arm affected at P = 1×10−6 and P = 0.01 significance levels, respectively.

Satellite repeats and the centromere effect

Although the mechanisms controlling the magnitude of the centromere effect are not fully understood, analyses in D. melanogaster suggest that the distance to the centromere plays a role, with increasing crossover suppression in peri-centromeric regions when centromeric heterochromatin is removed [137,138]. At the same time, centromeres are heavily enriched in satellite DNA sequences [137,139-141]. We, therefore, investigated whether the greater centromere effect in D. yakuba relative to D. melanogaster could be related to reduced centromeric heterochromtin and centromere size (see ). The complete sequence of centromeres in D. yakuba is yet unknown and, therefore, we estimated the amount of satellite DNA in heterochromatic regions as an indirect measure of centromere size. We used PacBio reads that do and do not align to our updated euchromatic D. yakuba assembly as a proxy for euchromatic and heterochromatic sequences, respectively, and characterized satellite DNA for simple repeats (k-mers) using k-Seek [142]. For D. melanogaster, PacBio reads were aligned to both the r5.3 and r6 releases [143], with similar conclusions and, therefore, only the r6 results will be discussed here. In agreement with previous studies indicating a high turnover rate in satellite repeats between closely related Drosophila species [139-141], we find that satellites that are most enriched in the centromeres of D. melanogaster are mostly absent in the heterochromatic reads of D. yakuba (). Overall, heterochromatic reads in D. yakuba show a lower abundance (more than 20-fold) of satellite repeats than those in D. melanogaster and this tendency remains when considering the difference observed in euchromatic reads, which is only 4-fold (). Tentatively, these results support the notion that centromeres in D. yakuba are likely shorter than those in D. melanogaster, and shorter centromeres in D. yakuba may play a role in the stronger centromere effect relative to D. melanogaster.

Crossover interference in D. yakuba

In this study we recovered a total of 673 chromatids with 2 crossovers from a single meiosis event, thus allowing for a detailed study of crossover interference. Genome-wide, the average inter-crossover distance (ICD) in 2CO chromatids is shorter than expected if crossovers were randomly distributed across the genome (7.5 vs 8.4Mb, respectively), with a minimum ICD of 820 Kb. The average ICD in 3CO chromatids is also shorter than random expectations (5.9 vs 6.3 Mb, respectively). Indeed, the standard approach of studying crossover interference by randomly positioning crossover events across the genome would suggest that crossover interference is completely absent in D. yakuba. However, given the strong centromere/telomere effects, we quantified crossover interference based on expectations from the distribution of crossovers in 1CO chromatids (see Materials and Methods for details). The ratio of observed to expected ICD using this approach is 1.36 genome-wide, with the highest values observed for chromosome arms 2L and 3R (1.54 and 1.51, respectively) and the lowest values observed for the X chromosome and chromosome arm 2R (1.17 and 1.08, respectively) (). These analyses reveal a highly significant tendency for crossovers in 2CO chromatids to be more distant than expected (positive interference) for all chromosome arms except 2R, with a weak, albeit significant, departure from genome-wide expectations (P = 0.046) that becomes highly significant when removing 2R (P < 1×10−6). 1 Observed average inter-crossover distance (ICD; in kb) in 2CO chromatids. 2 Expected ICD based on two randomly chosen crossovers from 1CO chromatids (see text for details). We also studied crossover interference based on the full distribution of ICD in 2CO chromatids and estimated the shape parameter (ν) of a gamma distribution (see and Materials and Methods). Estimates of ν greater than those expected under no crossover interference (ν = 1 under ideal conditions; see Materials and Methods) are an indication of positive interference. In D. melanogaster, a large-scale study of visible markers along the X chromosome indicates ν ~ 5 [144]. In D. yakuba, estimates of ν range between 3.85 and 9.5 (for the X chromosome and chromosome arm 3R, respectively) and provide additional support for a significant degree of positive crossover interference in this species.

Tetrad analysis shows stronger crossover assurance in D. yakuba than in D. melanogaster

Tetrad analysis suggests that there are fewer tetrads with zero crossovers (E0) in D. yakuba than in D. melanogaster ( and ). The direct application of Weinstein’s equations [64] generates a D. yakuba genome-wide estimate of E very close to zero (E = -0.005), which is compatible with absolute crossover assurance (P[E = 0] = 0.99; see Materials and Methods for details). Autosomes show a weak but positive estimate of E0 (E = 0.059) suggesting a stronger degree of crossover assurance than that reported for D. melanogaster autosomes. The D. yakuba X chromosome data, however, produces a negative estimate of E (E = -0.202).

Maximum Likelihood (ML) estimates of tetrad frequencies.

A) Estimates of E under a ML model with unrestricted E. Probability (y-axis) of models with variable E0 (x-axis), with Er>0 allowed to vary to provide the best fit for a given E0. 1 Estimates of tetrad frequency in D. melanogaster based on the distribution of 196 meiotic products analyzed by Miller et al. (2016) using whole genome sequencing (WGS). 2 Estimates of tetrad frequency in D. melanogaster based on visible markers (see text). B) Estimates of tetrad frequencies based on ML models with a biologically relevant range for E (E ≥ 0). Note that the best model for the D. yakuba X chromosome (E = 0) is, nonetheless, incompatible with the observed data [P(E = 0) = 1.6x10-13]. E0: tetrads that do not undergo crossing over, E1: tetrads with 1 CO, E2: tetrads with 2 COs, E3: tetrads with 3 COs, and E4: tetrads with 4 COs. Negative E0 can be caused by sampling errors and small sample sizes. Whereas limited sampling is hardly a drawback when using visible markers, it can become a serious limitation in studies based on WGS that are often restricted to a few hundred meiotic products. For instance, a typical distribution of crossover events for D. melanogaster of 50% NCO, 45% 1CO and 5% 2CO chromatids predicts E = 0.10 but random sampling could generate negative E in 10% and 22% of the studies when the number of analyzed chromosomes is 250 and 100, respectively. Given our very large sample size of 1,622 chromatids analyzed for the D. yakuba X chromosome, the likelihood of obtaining our negative E given a true E = 0 is remarkably small (P < 1×10−14). We then used a ML approach that allows adding diverse sets of rules (see Materials and Methods and [66,145]), and explored a model that constrains all tetrad classes to biologically relevant frequencies (E ≥ 0) (). Although the best model for the D. yakuba X chromosome under these conditions is E = 0, we can, nonetheless, rule out E ≥ 0 (P[E ≥0] < 1.2×10−14).

Analysis of viability effects

Besides statistical considerations, estimates of E0 under Weinstein’s model can generate negative values if at least one of the assumptions is violated. The most commonly argued violation is viability effects associated with specific combinations of alleles. This is often an issue when using visible markers to characterize crossovers, and has been shown to play a significant role in tetrad analyses in D. melanogaster [66]. Moreover, fitness effects associated with visible mutations can also alter recombination frequencies in other chromosome arms [39,41,134,135,146]. WGS genotyping methods such as ours and the use of wild type flies, on the other hand, can ameliorate such potential biases but viability effects should be still considered, especially if inbred lines are being used in the crossing schemes. This is because inbreeding could cause the fixation of recessive deleterious alleles, as evidenced by the fitness decline in inbred lines documented in many species, including Drosophila [147-151]. In such cases, the crossing of two inbred lines has the potential to bias (either upward or downward) the relative frequency of recombinant haplotypes observed in adults (e.g., non-epistatic deleterious alleles in repulsion phase would bias the frequency of recombinant haplotypes upward whereas the fixation of compensatory mutations along a chromosome could generate opposite effects). Our crossing and genotyping schemes are designed to reduce the likelihood of viability effects: heterozygous F1 females are crossed to males from a ‘third’ (tester) line and we genotyped F2 females (). Thus, potential deleterious alleles with egg-to-adult viability effects that became fixed due to inbreeding will likely be heterozygous in F2 females, limiting viability defects across the whole genome. Moreover, the inbred lines used in our study were generated by en masse brother-sister mating, which exposes mutations to both drift and selection and increases the likelihood of purging severely deleterious and non-recessive alleles relative to more extreme inbreeding methods [152,153]. Finally, we used three different crosses involving a total of six inbred parental lines from different natural populations, making it unlikely that multiple lines share similar combinations of deleterious alleles. All three crosses show the same trend of a negative unconstrained estimate of E0 for the X chromosome [E of -0.163, -0.185 and -0.263]. Combined, our results and crossing schemes would predict limited viability effects altering haplotype frequencies. To rule out viability effects as a cause of the unsual E0 values for the X chromosome more directly, we estimated the frequency of the four possible haplotypes (pairwise combinations of the two parental genotypes) recovered in F2 individuals as a function of the genetic distance across the entire chromosome. Viability effects would generate one parental haplotype to be more frequent than the other and/or one recombinant haplotype to be more frequent than the other. Our study () shows that the two parental haplotypes as well as the two recombinant haplotypes are observed at equal frequencies for any given genetic distance. Moreover, the frequency of recombinant haplotypes increases with genetic distance, as predicted if all (or the great majority) of recombinants are observed in adult females. Note also that the frequency of the two recombinant haplotypes stays lower than 25% for ~ 50 cM due to the presence of multiple COs in a fraction of gametes. Additionally, we obtain equivalent results for each of the three crosses analyzed. These results allow us to rule out viability effects in our recombination data and strongly hint at the presence of mechanisms that alter the segregation or transmission of chromatids for the D. yakuba X chromosome.

Crossover-associated meiotic drive in D. yakuba X chromosome

A biological explanation for an observed deficit in chromatids with zero crossovers would be a form of meiotic drive [154] during meiosis II. Specifically, our data could be explained by meiotic drive if the sister chromatid that experienced a crossover was preferentially segregated into the oocyte nucleus relative to its nonrecombinant sister chromatid, which would be preferentially extruded to the second polar body (). As a consequence of this form of crossover-associated meiotic drive (MDCO), the number of crossovers detected in the offspring would be above that expected under Mendelian segregation for any given number of crossovers during meiosis I. Note that MDCO differs from the standard concept of meiotic drive because only the latter is associated with allelic genetic variants in heterozygotes [154-164].

Model of tetrad frequencies with crossover-associated meiotic drive (MDCO).

A) MDCO model where chromatids with crossovers are preferentially transmitted to the oocyte with a bias b (b > 0.5) when the sister chromatid has no crossovers. B) ML estimates of tetrad frequencies for the D. yakuba X chromosome under a MDCO model, with a best fit when b = 0.86. C) Observed and predicted frequencies of crossover classes for the chromosome X of D. yakuba under the MDCO model. D) Joint maximum-likelihood estimates (MLE) of E0 and b for the X chromosome of D. yakuba (left) and D. melanogaster (WGS; right). The red dot represents point estimates with maximum fit to the data. We explored a ML model including the possibility of a bias in meiosis II favoring the recovery (oocyte) of chromatids with crossovers when the sister chromatid has no crossovers. This modelling shows that allowing MDCO not only improves the fit to the X chromosome crossover data significantly relative to a model without segregation bias (Likelihood-ratio test, LRT, P < 1x10-13) but also explains the overall distribution of crossovers accurately (P = 0.997) ( and ). Specifically, point estimates for our D. yakuba X chromosome data suggest a model with E = 0.10 and a probability of chromatids with crossovers to ultimately be present in oocyte nuclei of b = 0.86 instead of 0.5 when paired with sister chromatids with no crossovers. More generally, however, a MDCO model fits the D. yakuba X chromosome data with P > 0.99 for a wide range of E0 and b values, from E0 = 0 (with b = 0.73) to E0 = 0.22 (with b = 1) (). Given these results for D. yakuba, we investigated the possibility of MDCO for the D. melanogaster X chromosome for comparison (). This study shows that although a significant effect of MDCO is not necessary to explain crossover information in this species, it is nonetheless compatible with it (P > 0.99) for a wide range of plausible E0 values (from 0.10 to 0.46). We also examined the possibility of chromatid interference [64,145]. Unlike MDCO, the average number of crossovers per offspring chromatid does not change under chromatid interference, only the distribution of crossovers among chromatids when E. In this regard, positive chromatid interference or PCI (making the nonsister pair of chromatids involved in one crossover less likely to be involved in additional crossovers) would cause a reduction in the fraction of chromatids with extreme values of crossovers, including those with zero crossovers. For instance, an extreme case of PCI for E would generate only chromatids with 1 crossover instead of the expected 25, 50 and 25% of chromatids with 0, 1 and 2 crossovers, respectively. Thus, although chromatid interference is rare in diploids [165], PCI could—in principle—generate the observed deficit of chromatids with zero crossovers for the D. yakuba X chromosome. To evaluate this possibility, we incorporated PCI into our ML model with constrained tetrad frequencies (E ≥ 0), allowing for a biased distribution of crossovers (b) favoring the use of the sister chromatid with the fewest number of previous crossovers (b ≥ 0.5 between sister chromatids instead of b = 0.5 for the random case). When applied to the D. yakuba X chromosome data, a PCI model is better than a model without PCI but, nonetheless, incompatible with our observations (best fit when b = 0.59, P = 1.1x10-9) and worse than the MDCO model described above (LRT = 41.2, P = 1.4x10-10).

Transposable elements (TEs) and crossover rates

Given the frequent deleterious effects of TE insertions, models concerning crossover rates and efficacy of selection predict that TEs would passively accumulate in regions with reduced crossover rates (reviewed in [166,167]). In agreement, TEs tend to be more frequent in genomic regions with limited or absent crossovers in many species, including D. melanogaster [67,110-112,166,168-170]. There is, however, an alternative explanation to this same observation which does not rely on an association between contemporary crossover rates and long-term efficacy of selection; TEs can actively reduce crossover rates via epigenetic mechanisms related to TE silencing around insertions thus creating a similar negative intra-genomic association between crossover rates and TE presence—even if crossover landscapes have changed very recently [171-177]. Consistent with previous studies [178,179], our analysis shows that there are more TE copies (about 30% more) in the D. yakuba genome assembly than in D. melanogaster (see ). Our study also shows that the overall difference in TE presence is driven by differences in copy number of INE-1 (Drosophila INterspersed Element), with a 1.7-fold increase relative to D. melanogaster (Padj = 1.26×10−78; ; see also [180-184]). The survey of TE distribution across the genome of D. yakuba reveals an accumulation in regions with reduced crossover rates, clustered near the centromeres and, to a lesser degree, telomeres (). Across the D. yakuba genome, TE presence is negatively correlated with crossover rates (Spearman’s ρ = -0.543, P = 5.2×10−41) (). Although TEs in D. melanogaster also show a significantly negative association with crossover rates (ρ = -0.354, P = 1.6×10−15), the correlation is significantly more negative in D. yakuba than in D. melanogaster (Fisher’s Z transformation, P < 8×10−15), a difference that is not expected if epigenetic factors were the only cause of the negative association. A similar pattern is observed when only INE-1 elements are analyzed (). More generally, the study of TE classes () or each TE independently indicates that the distribution of SINE elements and Class I retrotransposons, which include INE-1, shows a stronger correlation between abundance and crossover rates in D. yakuba than in D. melanogaster. Combined, these results suggest that D. yakuba shows a distribution of TEs and crossover rates across the genome that fits population genetic expectations better than direct epigenetic effects of TE presence on crossover distribution, and is a first indication that D. yakuba may have a more stable landscape of crossover rates than D. melanogaster. To gain insight into the temporal dynamics of TE accumulation, we focused on the large regions showing a clear difference in crossover rates between D. yakuba and D. melanogaster. Regions with centromere effect in D. yakuba but not in D. melanogaster show significantly more TEs in D. yakuba than in D. melanogaster (Mann–Whitney U test, z = 6.04, P < 1 × 10−6). On the other hand, regions affected by the centromere effect in both D. yakuba and D. melanogaster show similar TE presence in both species (z = - 0.01297, n.s.). These results are in agreement with predictions of models of selection and linkage if the stronger centromere effect has been fairly stable in D. yakuba.

Short DNA motifs and crossover localization

We investigated whether short DNA motifs associated with intragenomic variation in crossover rates in D. melanogaster [185] and D. simulans [186] are also significantly enriched near crossovers in D. yakuba (see ). These motifs contain [A]n, [CA]n, [TA]n, [GCA]n and a more variable [CYCYYY] n (). All nine individual motifs identified in D. melanogaster are significantly enriched near crossovers also in D. yakuba when studying crossovers flanked by diagnostic SNPs separated by a maximum of 3 kb (or 5 kb). All except a variable form of [CA]n are also significantly enriched near crossovers with a resolution of 1 kb (see Tables and ). Because several of the motifs have overlapping characteristics within the [A]n and [CA]n classes, we combined them and observed that both [A]n and [CA]n classes of motifs are very significantly enriched near crossovers in D. yakuba at all resolutions analyzed (). 1 Probabilities obtained by comparing the number of times this motif is identified in sequences containing a crossover event and expectations based on sequences of equivalent length randomly chosen across the genome. Three sequence datasets were analyzed based on the distance between diagnostics SNPs around a crossover event (5-kb or less, 3-kb or less, and 1-kb or less).

Evolutionary consequences and dynamics of intragenomic variation in crossover rates

Crossover rates and the efficacy of selection on synonymous codons: In agreement with predictions of models of linkage and selection, estimates of the strength of CUB (-ENC) in D. yakuba genes show a very significant and positive correlation with our D. yakuba high-resolution crossover rates (Rec): R = +0.228 (P = 6.0x10-13) for X-linked genes and R = +0.133 (P = 1.3x10-22) for autosomal genes. We also re-analyzed CUB in D. melanogaster for the same set of genes now with high-resolution crossover rates (Rec) and confirmed the previous results: that CUB increases with crossover rates along the X chromosome (R = +0.170, P = 8.9x10-8) but autosomal genes show a significant and negative correlation (R = -0.075, P = 3.8x10-8). These results suggest that previous incongruent results in D. melanogaster were not caused by the use of inadequate selection models (given that these models fit D. yakuba data remarkably well) but rather by departures from model assumptions when using D. melanogaster data. More specifically, the fit of D. yakuba data to models that assume constancy in linkage effects suggests that the landscape of crossover rates might have been fairly stable along the D. yakuba lineage whereas a change in the recent past of D. melanogaster may have likely occurred, as previously proposed by [98,131]. To explore this possibility, we first investigated how well Rec would predict CUB in D. melanogaster and how well Rec would predict CUB in D. yakuba. We show that crossover rates in D. yakuba are better predictors of intragenomic variation in CUB across the D. melanogaster genome than contemporary crossover rates from D. melanogaster, and that the results fit models of selection and linkage along both the X chromosome and autosomes (Fig ). D. melanogaster CUB in X-linked genes shows a stronger positive association with crossover rates in D. yakuba (Rec; R = +0.201, P = 2.2x10-10) than with crossover rates in D. melanogaster (Rec; R = +0.170; P = 8.9x10-8). CUB in autosomal genes in D. melanogaster also shows a significantly positive correlation with Rec (R = +0.063, P = 4.1x10-6), even though the use of Rec instead generated a negative association (see above). At the same time, Rec is a poor predictor of CUB along the X chromosome of D. yakuba (R = +0.115, P = 0.0003) and shows a negative association with CUB in D. yakuba autosomes (R = - 0.076, P = 2.7x10-8), again contrary to predictions. These results, therefore, open the possibility that contemporary crossover maps in D. yakuba may recapitulate to some extent the ancestral recombination environment of the D. melanogaster subgroup.

Fig 7

Correlation between codon usage bias (CUB) and crossover rates in D. yakuba (Rec) or D. melanogaster (Rec).

Correlation between codon usage bias (CUB) and crossover rates in D. yakuba (Rec) or D. melanogaster (Rec).

A generalized linear model (GLM) was used to estimate the correlation coefficient R between crossover rates (log10), either Rec or Rec, and CUB estimated for each gene in the five species analyzed. Numbers in black indicate significant estimates of R in the direction predicted by models of selection and linkage. Red numbers indicate a significant association in the opposite direction than that predicted by models. *, P < 0.05; **, P < 0.01); n.s., non-significant association (P > 0.05). To delve into the prospect that Rec may capture ancestral crossover patterns of the whole D. melanogaster subgroup, we obtained regression estimates between Rec and CUB for three other species of the subgroup (D. simulans, D. sechellia and D. erecta; see Fig ). For the X chromosome, we observe that Rec predicts almost equally well CUB in D. yakuba as CUB in D. erecta (R = +0.222, P = 2.5x10-12), a species that separated from D. yakuba ~10.4 Mya [187], and only slightly less well (R ranging between 0.191 and 0.201; P < 3.0x10-10) for D. simulans and D. sechellia, which separated from D. yakuba ~12.7 Mya [187]. Combined, these results strengthen the notion that Rec for the X chromosome captures the ancestral crossover rates of the whole subgroup to a significant degree and that Rec has been remarkably constant for more than 40 My of evolution. For autosomes, our results suggest a faster evolution of crossover landscapes relative to the X chromosome within the D. melanogaster subgroup. Nonetheless, Rec keeps being a good predictor of CUB across the entire phylogeny analyzed while Rec shows a negative association with CUB for all species, indicating that significant changes in crossover landscapes in the D. melanogaster lineage had to occur not only after the split from its common ancestor with D. simulans/D. sechellia 5.4 Mya [187] but also very recently in order to severely limit the linkage effects of the new crossover landscape on CUB. Note that the conclusions about temporal stability of crossover rates based on the analysis of CUB would also be valid if the cause of biased codon frequency was GC-biased gene conversion (gcGBC) instead of, or in combination with, translational selection [188,189]. This is because the resolution of heteroduplex DNA that arises during recombination is GC-biased and preferred synonymous codons are often G- and C-ending. gcBGC, therefore, directly predicts a positive relationship between crossover rates across genomes and the degree of GC-ending codons (and hence CUB) that mimics the predictions of weak selection [190]. Crossover rates and the efficacy of selection on protein evolution: To identify variation in the efficacy of selection at protein level, we estimated rates of nonsynonymous (dN) and synonymous (dS) evolution, and the dN/dS ratio (ω) across the D. melanogaster subgroup using the branch-model in the codeml program as implemented in PAML [191,192]. Moreover, to capture fine-scale intragenomic effects of variation in crossover rates, we analyzed rates of evolution using single-gene data, without grouping genes into crossover-groups. Additionally, we took into account gene-specific properties possibly influencing ω and used residuals as a measure of variable efficacy of selection on amino acid changes (ωR) for each gene and phylogenetic branch (see Materials and Methods). We first investigated changes in efficacy of selection on weakly deleterious amino acid mutations by focusing on genes with no signal of rapid evolution (see Materials and Methods and [104]). Variation in contemporary crossover rate across the D. yakuba genome is negatively associated with gene-specific rates of protein evolution along the D. yakuba (ωR-yak) lineage genome-wide (r = -0.047, P = 0.0002), as well as for the X chromosome (r = -0.078, P = 0.02) and autosomes (r = -0.043, P = 0.002) (Fig 8A). Along the D. melanogaster lineage, ωR-mel shows a weak negative association with Rec genome-wide (r = -0.026, P = 0.044) that is also observed for the X chromosome (r = -0.100, P = 0.002) but not for autosomes (r = -0.012, P > 0.05). Equivalent to our results with CUB, we observe that Rec is a better predictor of rates of evolution along the D. melanogaster lineage than Rec; Rec shows a significantly negative association with ωR-mel genome-wide (r = -0.053, P = 0.00004). In fact, contemporary crossover rates in D. yakuba show a weak but significantly negative association with rates of protein evolution across the whole phylogeny (ωR-total) (r = -0.036, P = 0.0044) while Rec shows no significantly association with ωR-total (Fig 8A).

Fig 8

Correlation between rates of protein evolution and crossover rates in D. yakuba (Rec) or D. melanogaster (Rec).

Correlation between rates of protein evolution and crossover rates in D. yakuba (Rec) or D. melanogaster (Rec).

A) For each branch across the D. melanogaster subgroup phylogeny, estimates of the efficacy of selection on amino acid changes (ωR) per gene were compared to crossover rates, either Rec or Rec, with a generalized regression model (GRM) to estimate the correlation coefficient R. B) For each branch across the phylogeny, genes with and without signal of positive selection based on PAML (see text for details) were compared to crossover rates for these same genes. Odds Ratio (OR) from logistic regression analysis was applied to capture variable likelihood of positive selection with crossover rates (see text for details). Numbers in black indicate significant estimates of R in the direction predicted by models of selection and linkage whereas red numbers indicate a significant R in the opposite direction. *, P < 0.05; **, P < 0.01; n.s., non-significant association (P > 0.05). When studying the relationship between Rec and ωR for different branches of the phylogeny, we confirm a consistently significant and negative association for the X chromosome for most branches (except for the D. erecta lineage), and a less conserved trend for autosomes (Fig 8A). Rec, on the other hand, shows no signal of being negatively associated with ωR along any lineage outside D. melanogaster; it even shows a positive association (contrary to predictions) with rate of protein evolution along the D. simulans lineage. In agreement with the results based on CUB, these analyses support a very recent change in crossover landscape in D. melanogaster and, additionally, identify a likely change in crossover patterns in D. simulans/D. sechellia after the split from their common ancestor with D. melanogaster. Therefore, our studies of protein evolution also support the conclusions that contemporary crossover rates across the D. yakuba genome capture ancestral properties of crossover landscapes, with the X chromosome showing a stronger degree of conservation. We also investigated whether variation in crossover rates is associated with the probability of adaptive protein evolution (based on PAML analyses; see Materials and Methods). To this end, we applied a logistic regression test (or logit) that allows the study of binary variables (a gene showing or not signals of positive selection along a lineage) as a response to a continuous variable (crossover rates), without grouping genes into arbitrary crossover classes. We observe that contemporary Rec is a good predictor of adaptive protein evolution along the D. yakuba lineage [Odds Ratio (OR) = 1.33, P = 0.0017]. Outside the D. yakuba lineage, Rec shows a similar association with adaptive protein evolution along the D. erecta lineage and along the lineages ancestral to both D. yakuba/D. erecta and D. melanogaster/ D. simulans/D. sechellia (), again suggesting that Rec captures some of the ancestral crossover landscape of the D. melanogaster subgroup. Contemporary Rec shows no significant association with the likelihood of adaptive amino acid changes along any of the lineages analyzed (including the D. melanogaster lineage), in agreement with previous studies [104] and with the proposal of a recent change in the distribution of crossover rates across the genome of D. melanogaster [98,131]. Crossover rates and variation in levels of neutral diversity: We estimated neutral diversity at four-fold synonymous sites (π4f) across the D. yakuba and D. melanogaster genomes as a proxy for recent N (see Materials and Methods) and compared these estimates with our high-resolution crossover landscapes for non-overlapping 100 kb regions. In D. yakuba, we observe that variation in crossover rates is a very strong predictor of π4f for both autosomes [r = 0.74, P = 7.6×10−169] and the X chromosome (r = 0.63, P = 5.6×10−25), as expected based on models of selection and linkage (). In D. melanogaster, we estimate equivalent trends when using similar methodologies and crossover map resolutions: r = 0.64 (P = 5.4×10−110) and r = 0.568 (P = 3.3×10−20) for autosomes and the X chromosomes, respectively (). These results show that contemporary Rec does a good job capturing heterogeneity in recent N across the D. melanogaster genome, and emphasizes the disparity between long- and short-tem N along the D. melanogaster lineage when combined with the results from CUB and protein evolution. At the same time, we observe that the D. yakuba lineage may conform better than D. melanogaster to models of selection and linkage when applying methodologies that assume consistency between long- and short-term N.

Relationship between crossover rate (cM/Mb) and levels of neutral nucleotide polymorphism (π4f) in D. yakuba and D. melanogaster.

π4f indicates pairwise nucleotide variation (/bp) at four-fold synonymous sites. Autosomal and X-linked regions are indicated as blue and red circles, respectively. Crossover rates are adjusted to allow a direct comparison between X-linked and autosomal regions. Dashed lines indicate linear regressions between π4f and adjusted crossover rates; solid lines indicate regressions between π4f and the log10 of crossover rates. Data shown for non-overlapping 100-kb windows.

Discussion

Here we report a genome-wide crossover map for D. yakuba using WGS and a new dual-barcoding methodology. Our method reduces library and sequencing costs significantly, allowing for the genotyping of a large number of individuals and, ultimately, the generation of genetic maps at very high resolution. The dual-barcoding approach involves crosses between multiple wild-type parental genotypes and, in turn, provides the opportunity to capture species-wide genetic maps that are more informative for genomic and evolutionary analyses than those generated using a single cross. Our crossing and genotyping schemes also reduce the potential effects of using inbred lines when studying meiotic outcomes. Nevertheless, the study of the frequency of genotypes in the chromatids analyzed is an important step to directly asses the potential role of egg-to-adult viability deffects. In our case, the study of the frequency of parental and recombining genotypes allowed us to rule out detectable viability effects biasing the meiotic products genotyped. We analyzed a large number of individual meiotic events (more than 1,600) and identified the precise localization of more than 5,300 crossover events. Amongst Drosophila species, D. yakuba shows an intermediate total map length (339 cM), longer than D. melanogaster (287 cM [41,113,136]) and shorter than D. mauritiana (about 500cM; [31]), D. pseuddobscura (>450 cM; [193,194]) or D. virilis (732 cm [117]). This high degree of variation is consistent with the proposal that the number and distribution of crossovers have the potential to evolve very fast, possibly as a result of adaptive processes ([118,126] see also [127,129,195]). Although our study in D. yakuba provides confirmatory evidence to some information garnered from the model species D. melanogaster, most of our analyses in D. yakuba reveal clear differences and a contrasting view of crossover distribution, control and evolution in Drosophila. As such, this study adds to the expanding notion that a deep understating of the causes and consequences of crossover rate variation across genomes benefits from the analysis of multiple species within an appropriate phylogenetic context. Before this study, a general trend among Drosophila species appeared to support a link between the length of genetic maps and the magnitude of the centromere effect; D. simulans, D. mauritiana, D. pseudoobscura and D. persimilis all show higher number of crossovers per meiosis and weaker centromere effect than D. melanogaster [31-34,126]. Such a trend suggested that the centromere effect is either a direct cause or a consequence of lower genome-wide crossover rates, or that the two processes coevolve. D. yakuba shows, however, a significantly higher crossover rate and overall longer genetic map than D. melanogaster while also showing a greater centromere effect, thus decoupling the two observations for the first time in Drosophila. Our results in D. yakuba, the species with the greatest centromere effect in the Drosophila genus to date, uncover a more complex relationship between crossover distribution and the centromere effect. A strong centromere effect also limits substantially the genomic region naturally available for multiple crossover events along a chromosome arm, begging the question of whether this impacts crossover interference. D. yakuba shows significant crossover interference, as observed in all other Drosophila species analyzed. Notably, the chromosome arms with stronger centromere effect (2L, 3L and 3R) also show a higher degree of crossover interference than chromosome arms with weaker centromere effects (2R and X) (see Tables and ). Combined, these results would support the concept of interconnected molecular mechanisms controlling the overall number of crossovers, centromere effects and crossover interference, with outcomes that may vary among chromosome arms. Our study in D. yakuba also differs from multiple reports in other Drosophila species showing that the suppression of crossovers in inversion heterozygotes increases the number of crossovers on freely recombining chromosome arms (the interchromosomal effect [39,134,135]). The presence of a heterozygous inversion on chromosome arm 2R in D. yakuba plays no detectable role in increasing the number of crossovers in any of the freely recombining chromosome arms. A key difference between D. yakuba and most Drosophila species where the interchromosomal effect has been studied is the higher number of crossovers per female meiosis in the former species. Therefore, although it is not possible to fully disentangle the effects of the genetic backgrounds carrying inversions, the results of our D. yakuba study open the possibility that the higher number of crossovers per chromosome arm in this species limits the potential increase in crossovers and the overall magnitude of the interchromosomal effect.

Crossover assurance and crossover-associated meiotic drive in D. yakuba

D. yakuba shows a large fraction of recovered chromatids with multiple crossovers, particularly for the X chromosome. As a consequence, and contrary to studies in D. melanogaster, tetrad analysis in D. yakuba requires the inclusion of E≥3 tetrads to capture the full observed distribution of meiotic products. Our tetrad analysis shows that D. yakuba has a stronger degree of crossover assurance than D. melanogaster, compatible with absolute crossover assurance under benign conditions for wild-type genotypes. We also show results that strongly support the presence of an active crossover-associated meiotic drive mechanism (MDCO) in D. yakuba that acts in addition to strong crossover assurance for the X chromosome and results in the preferential inclusion in oocytes of chromatids with crossovers (). More specifically, we show that models without bias in transmission or with biases in distribution of crossovers between sister chromatids are incompatible with observed data for the X chromosome. A model allowing MDCO not only fits the data far better than any of the other models investigated but also fits the data well. Moreover, our WGS approach together with the observation that this phenomenon is identified in all three crosses of wild-type strains makes it unlikely that the results are caused by viability effects. We, therefore, propose that D. yakuba has evolved increased effective crossover rates on the X chromosome, at least partly, by meiotic drive rather than by increasing the number of tetrads with two or more crossovers. In all, our analyses predict a total of 5.98 ─ 6.33 crossover events per female meiosis that would result in 3.46 detectable crossovers per viable meiotic product. The same quantitative study suggests that this MDCO mechanism would be equivalent to a 15 to 34% increase in X chromosome homologous recombination events in offspring relative to a case with no drive but the same number of meiotic crossover events in prophase of meiosis I. Our proposal of an active MDCO mechanism is not completely new. Meiotic drive for recombinant chromatids at meiosis II has been proposed to be active in the human female germline based on one study that genotyped oocytes and polar bodies of women with average age of 37.3 years [196]. In D. melanogaster, Singh et al. (2015) [197] showed an increase in recombinant offspring (rather than a direct increase in crossovers in meiosis I) as a plastic response to parasite pressures, suggesting an epigenetic activation of transmission distortion and assymetries during meiosis II, equivalent to the MDCO mechanism. Our results suggest that an equivalent mechanism may be active under benign conditions for the X chromosome in D. yakuba. This said, and given the standard observation that E0 is smaller on the X chromosome than on autosomes in D. melanogaster, it is possible that MDCO is active under benign conditions in the D. melanogaster X chromosome as well, but has not yet been identified due to the smaller number of crossovers. In fact, we show that MDCO is compatible with D. melanogaster X chromosome data (). In this regard, the detailed genetic and cellular study of other species of the D. melanogaster subgroup can provide additional support to the possibility of a more widespread MDCO, at least for the X chromosome, that may be enhanced under stress conditions. This leaves open the question of what might be the selective advantage, if any, to an active MDCO mechanism on the X chromosome. Modifers of crossover rates are known to segregate in natural populations of many species [118,120,128,198-201] and models of selection and recombination predict an overall benefit to increasing rates when finite populations are considered (see [202] and references therein). Increased crossover rates could be particularly frequent on the X chromosome because this chromosome would be more likely to take advantage of recesive beneficial modifiers that increase rates relative to autosomes, a possibility that would fit with Drosophila studies showing stronger positive selection acting on the X chromosome at both protein and gene expression levels [103,203-211]. At the same time, there is a direct limitation to increasing the number of crossovers in meiosis I given the known increased probability of missegregation and higher rates of ectopic exchange and chromosomal aberrations in multi-chiasma tetrads [212,213]. The proposed meiotic drive in meiosis II would allow taking advantage of the evolutionary benefits of higher recombination rates while limiting the number of crossovers per tetrad. The advantages of such mechanism, moreover, are predicted to be particularly relevant in species with high likelihood of ectopic recombination due to the abundance of TEs, as in the case of D. yakuba.

Transposable elements, satellite repeats and crossover distribution in D. yakuba

The study of TE abundance across the genome of D. yakuba confirms trends observed in other Drosophila species including D. melanogaster, with TEs being more frequently observed in genomic regions with significantly reduced crossover rates. Notably, D. yakuba shows a stronger negative correlation between TE abundance and crossover rate across the genome than D. melanogaster. These results suggest that the proposed epigenetic effects of TEs reducing local crossover rates play a minor role in the observed crossover landscapes and areinstead more consistent with models of selection and linkage. Because these models assume long-term equilibrium, our results also suggest a more stable landscape of crossover rates in D. yakuba than in D. melanogaster. The study of satellite DNA in D. yakuba, combined with previous results from D. simulans, confirms that D. melanogaster shows a highly evolved composition relative to other related species and provides outgroup information strengthening the idea that the changes in satellite composition in D. melanogaster arose after the split from D. simulans [139,214]. Our analysis of satellite DNA also suggests, albeit indirectly, shorter centromeres in D. yakuba than in D. melanogaster. Given the reports linking shorter centromeres to stronger centromere effects in Drosophila [138], this result is also congruent with the observed greater centromere effect in D. yakuba than in D. melanogaster.

Short DNA motifs and crossover distribution

Short DNA repeat motifs enriched near crossover events in D. melanogaster [185] are also overrepresented in genomic regions with high crossover rates in the closely related species D. simulans [186]. Here, we showed that these same motifs are also enriched near crossover events in D. yakuba despite the greater evolutionary distance. Notably, many of these motifs share properties associated with a high degree of open chromatin and DNA accessibility [185], and there is the additional observation that in Drosophila, as in other organisms, crossovers are enriched at or near transcriptionally active regions [215,216]. In this regard, it is important to consider DNA:RNA hybrids (R-loops) and the proposed role inducing DNA damage, instability and, ultimately, double-strand breaks [217-220]. Interestingly, poly A/T tracts, which is one of the crossover-associated motifs in Drosophila and in yeast [119,221,222], play a causal role in R-loop formation [223]. In all, therefore, crossover rate variation at the scale of several megabases seems best explained by centromere and telomere effects whereas variation at a finer or local scale would be associated with open chromatin, active transcription and specific DNA motifs. The inclusion of R-loops into the paradigm to describe crossover localization across Drosophila genomes is particularly appealing because it would provide a testable mechanistic explanation for the known influence on crossover rates and distribution of epigenetic changes due to environmental conditions and stress. Because crossover rates vary across genomes, models of selection and linkage predict that N will vary across genomes as well, as a function of crossover rates and gene distribution. As a result, analyses of diversity and selection that combine data from multiple loci are difficult and not recommended. To deal with this intragenomic variation in N, the standard approach is to combine inter- (divergence) and intra-specific (diversity) data and assume that contemporary N for a given locus is representative of past, long-term N influencing efficacy of selection and rates of evolution [73,80,83,84,86,88,95,106,107,224-231]. However, genomic landscapes of crossover rates vary between species evidencing that N at a given locus changes with time (in addition to demographic events) and, therefore, contemporary N may differ from long-term N in many species and genes. The reasons for not fully embracing temporal changes in crossover rates and N are mostly practical ones. Population genomic data can be easily obtained today, providing detailed views of nucleotide variation within and between species. In contrast, genome-wide high-resolution crossover maps are much less common because they need both very high marker density (ideally WGS or equivalent) and many individuals genotyped; high marker density or high number of genotyped individuals alone is not sufficient to generate accurate high-resolution maps. In all, the assumption of temporally stable genomic landscapes of N can produce inaccurate parameterization of linkage effects and selection, and thus hinders our understanding of the causes of variation within and between species. Among Drosophila species, detailed, experimentally generated, crossover maps exist for D. melanogaster [41,113] and D. pseudoobscura [32,118] but equivalent data is still infrequent for closely related species. Within the D. melanogaster subgroup, sparse crossover rate maps for the sister species D. simulans and D. mauritiana showed higher crossover rates and a different crossover landscape than D. melanogaster [31]. Limited data for the X chromosome of D. yakuba also suggested higher crossover rates than D. melanogaster [232,233]. Recent studies of meiosis genes comparing D. melanogaster and D. mauritiana further suggest that the change in crossover rates likely occurred in the D. melanogaster lineage [126], but there was no outgroup data to strengthen that conclusion. More generally, the lack of multispecies high-resolution crossover landscapes in the D. melanogaster subgroup has limited attempts to quantify the effects of linkage on the efficacy of selection in this species complex. Here, we used the new high-resolution contemporary crossover rate map for D. yakuba to add phylogenetic context for crossover landscapes within the D. melanogaster subgroup, and investigate the evolutionary influence of variation in crossover rates across the genome and time. Our studies show that the weak support for an effect of crossover rate variation on the efficacy of selection when using D. melanogaster as the focal species is, to a large degree, due to a widespread change in crossover rates in the very recent history of D. melanogaster, after the split from its common ancestor with D. simulans. Indeed, contemporary crossover rates in D. melanogaster (Recmel) are a good predictor of diversity and contemporary N within this species, but are very poorly associated with estimates of long-term efficacy of selection (CUB, efficacy of selection against deleterious amino acid changes and the incidence of adaptive amino acid changes). As a consequence, analyses in D. melanogaster that assume temporally stable crossover landscapes will be inaccurate. In contrast, the D. yakuba lineage seems to have had a more stable crossover landscape that is well captured by our new contemporary crossover maps. Studies of neutral diversity across the genome of D. yakuba fit well with patterns predicted by models of selection and linkage. That is, contemporary Recyak captures well recent intragenomic contemporary N as well as long-term N. The conclusion of high stability in crossover rates along the D. yakuba lineage is confirmed when considering the D. melanogaster subgroup, with the contemporary crossover landscape of D. yakuba capturing information of ancestral linkage effects across the whole D. melanogaster subgroup, including the D. melanogaster lineage. These results suggesting a fairly stable crossover landscape in D. yakuba are also congruent with the study of TE distribution, which indicates long-term co-evolutionary dynamics of TE abundance and crossover rates. Combined, these results strongly support the idea that D. yakuba may be an adequate species to study population genetics predictions influenced by crossover rates. Our study represents only one step towards a better understanding of temporal variation in crossover rates and their consequences in Drosophila. More generally, our analyses underscore the importance of generating high-resolution crossover rate maps from closely-related species as a necessary counterpart for every species with population genomic data and within a coherent phylogenetic context. Minimally, this information will allow testing assumptions of temporal stability of crossover landscapes. Moving forward, the additional information that these crossover maps provide will refine our understanding of crossover control during meiosis and activate research towards theoretical and modelling frameworks to study and parameterize evolutionary processes amid changes in crossover rates.

Materials and methods

Dual-barcoding genotyping method

We used a genotyping approach with two layers of barcoding to reduce sequencing and labor costs (see ). In typical Illumina sequencing, each sample receives one adapter (one library sequence barcode), which allows assigning reads to different samples when sequenced together. In our method, we use multiple parental lines (line 1, line 2, line 3, line 4, etc.), and each line is involved in a different cross (cross 1: line 1 x line 2; cross 2: line 3 x line 4, etc.). F1 virgin females are crossed to males of a ‘tester’ line to generate F2 individuals and we combine multiple F2 individuals, one from each cross, per library (per sequence barcode). After sequencing, reads are separated based on sequence barcode and, for each set of reads sharing a sequence barcode, SNPs specific to a single parental line (singleton SNPs) are used as diagnostic SNPs or genetic barcodes. Reads containing diagnostic SNPs can then be assigned to a specific parental genome and cross without ambiguity, and sequence reads not containing any diagnostic SNP for parental or tester lines are discarded.

Crossing scheme and sequencing of the parental genomes

Each of the seven D. yakuba lines used in this study was derived from the offspring of a single female (isofemale lines). Lines TZ043 and TZ020 were established from females collected during July 2001 in the Upanga District of Dar es Salaam (Tanzania; kindly provided by Bill Ballard), lines SN20 and SN17, from females collected in January 2004 in the São Nicolau waterfall (São Tomé Island; collected by Ana Llopart), lines Cost 1235.2 and Obat 1200.5, from females collected in March 2001 in the Obo Natural Reserve (São Tomé Island; collected by Daniel Lachaise), and line Rain 5 from a female collected in July 2009 in the Obo Natural Reserve (São Tomé Island; collected by Ana Llopart). The three crosses we used to obtain recombination maps were: TZ043 × TZ020, Sn20 × Sn17 and Rain 5 × Cost 1235.2. One-day old F1 female offspring from these crosses were crossed to males from the tester line Obat 1200.5. This last cross to the tester line was important to allow the generation of haploid sequences from the parental lines bioinformatically when sequencing F2 individuals (see [113]). We generated an improved D. yakuba genome reference sequence of Tai18E2 using PacBio and Illumina sequencing (see ) that was then used to generate high-quality genome sequences for all the parental lines used in the crosses, including the tester line (see ). In short, to generate genome sequences of parental lines (and to correct PacBio sequences) we used two rounds of mapping, each one including Bowtie2 [234] and Stampy [235] mapping. Variants were called using samtools mpileup version 0.1.18 with minimum base quality of 30 and minimum mapping quality of 35 followed by BCFtools view to filter variants with a minimum read depth 3 and a fraction of reads supporting the variant greater than 0.8. After filtering, vcfutlis vcf2fq was used to convert the VCF files to fastq format followed by seqtk fq2fa to convert to fasta format [236]. Our approach generated final sequences for parental lines that only show non-N positions when high quality information is present.

Diagnostic SNPs

Diagnostic SNPs were defined as sites where all parental (including the tester) genomes contain a high quality nucleotide call and only one genome had a different nucleotide variant (singleton SNP). Sites with low quality or ambiguity in one or more genomes, or with 3 or 4 nucleotide variants were removed from further consideration. Because the D. yakuba lines used in the crosses were inbred over several generations, they each contain few heterozygous sites that were not considered as diagnostic SNPs unless when including a high quality and unambiguously informative variant.

Sequencing of F2 flies

We combined one F2 female from each of the three crosses to prepare Illumina libraries with a given custom-designed barcode, following the same experimental protocols described in ’Illumina library preparation’ (). We multiplexed our libraries (112 libraries/lane) based on custom-designed adapters with ≥8-nucleotides that are at least 4 nucleotide changes away from any other barcode, and sequenced them in an Illumina HiSeq 4000 instrument housed at the Iowa Institute of Human Genetics (IIHG) Genomics Division, University of Iowa. Fastx_barcode_splitter and fastx_trimmer from FASTX-Toolkit version 0.014 (http://hannonlab.cshl.edu/fastx_toolkit/) were used to split sequence reads and to remove barcode sequences, respectively. Sequence reads were also filtered with Trimmomatic and parameters as described in ’Illumina alignment pipeline’ (). A total of six Illumina HiSeq 4000 lanes were used in this project, allowing us to genotype up to 1,701 F2 individuals.

Mapping to diagnostic SNPs

To identify reads mapping to a diagnostic SNP of a specific parental line, we first, conservatively, mapped reads to all other parental lines using Bowtie2 [234] with parameters that ensure perfect alignment for the whole read. Reads that did not align to any of these genomes were then mapped to the sequence of the parental genome of interest with the same restrictive parameters. After mapping, samtools mpileup was used to obtain all nucleotides with mapped reads followed by BCFtools view to generate VCF files [236]. Sites not previously assigned as a diagnostic SNP and sites assigned as diagnostic SNPs for the tester line were filtered out. This process was repeated for every parental line used in the crosses, meaning that every F2 individual we sequenced had only one (if all diagnostic SNPs corresponded to a single parental genome) or two (if one or more crossovers occured) VCF files with mapped diagnostic SNPs per chromosome arm, identifying the original cross. Note that the filtering process removed a higher number of chromatids sequenced for chromosome arm 2R relative to other chromosome arms () due to a lower fraction of diagnostic (singeton) SNPs. As a consequence, analyses for 2R are based on fewer chromatids. For each F2 individual, we used the ordered distribution of diagnostic SNPs along chromosomes in the VCF files (after merging when needed using BCFtools merge) for the purpose of downstream analysis and crossover localization.

Identification of crossovers

We used diagnostic SNPs as a method for dual multiplex sequencing but also to identify crossovers and their genomic location. After aligning reads from each F2 individual to parental genomes of a specific cross based on diagnostic SNPs, several filters were applied before determining crossover events. First, any F2 chromosome arm that contained less than 100 diagnostic SNPs was filtered out. Second, given that our diagnostic SNPs were not evenly distributed across the chromosomes, we removed regions with a significant bias of mapping to one parental genome over the other. We applied a sliding window approach with regions of 100 kb and increments of 25 kb, and removed from further consideration regions with a ratio of mapped reads to the parental genomes less than 0.1 or greater than 10. Our next filters aimed at decreasing events that could potentially be gene conversions without crossover. A block was defined as a region of continuous mapping to diagnostic SNPs from only one parental genome. Blocks shorter than 100 kb were removed from analysis and flanking blocks were combined. Crossovers were identified as consecutive diagnostic SNPs of one parental genome switching to diagnostic SNPs from the other parental genome (boundaries between blocks). Crossovers were then assigned a random position between the two diagnostic SNPs defining the region between blocks of diagnostic SNPs. A final filter was applied to require crossover events to be separated by at least 250 kb. Analyses using 500 kb as threshold produced the same results, in agreement with the observation that the two closest crossovers we identified in our study were 771 kb apart. To study crossovers in the dot chromosome, the filters used were relaxed to capture the paucity of SNPs among lines. In this case, the minimum number of diagnostic SNPs was set to 10, and the minimum number of diagnostic SNPs per block flanking potential crossovers was set to 4. Even with these more permissive filters, we did not detect any crossovers along the dot chromosome. Crossover rates were studied as cM/Mb per female meiosis, for each cross separately and after combining the results from the three crosses. Unless noted, the distribution of crossover rates along chromosome arms in D. yakuba was based on non-overlapping 200 kb windows. For D. melanogaster, we analyzed equivalent genomic sizes based on high-resolution crossover maps for r5.3 and r6 genome releases [113,237]. In all cases, we used log10 of crossover rates in analyses to estimate association between crossover rates and N or efficacy of selection (CUB, protein evolution and neutral diversity).

Analyses of viability effects

Differential egg-to-adult viability of parental and/or recombinant genotypes could bias the set of chromatids genotyped in adults. We, therefore, quantified the frequency of the four possible pairwise combinations of the parental genotypes for all possible genomic distances along the X chromosome. In particular, for each cross and chromatid analyzed, we identified the genotype at 1-kb resolution along the chromosome and obtained all pairwise 1-kb vs 1-kb (two-marker) haplotypes as a function of the genomic distance. For recombinant haplotypes, we followed the convention of ordering ‘left’ and ‘right’ genotypes from telomere-proximal to centromere-proximal. We then combined the results from all chromatids for each cross sepparately, obtaining the frequency of the four possible haplotypes as a function of genomic distance. Finally, we transformed genomic distances into genetic distances (cM) based on the genetic map generated for each cross.

The centromere effect

Two different methods were used to examine the centromere effect. The first method was performed following previous studies where the number of crossovers observed near the centromere (i.e., the centromere-proximal region) is compared to the expected number of crossovers for a region of equivalent size under the assumption that crossovers are randomly distributed along a chromosome arm [18,20,21,41]. This approach requires a predetermined, arbitrary, size for the centromere-proximal genomic region, and in this study we defined it as one third of the chromosome arm, as in Miller et al. (2016) [41]. For each chromosome arm we obtained the probability of detecting centromere effect after comparing the observed and expected number of crossovers within the centromere-proximal region for that chromosome arm based on 10 million random replicates. To obtain a random distribution of crossover locations on a chromosome arm, we took into account the genomic locations were the detection of a crossover would have been possible near centromeres based on our genotyping method. To study the centromere effect quantitively and identify the genomic region influenced by it, we applied a second method that estimates the proportion of the chromosome showing a significant reduction in crossovers. Starting at the most centromere-proximal region, we calculated the likelihood of a 1 Mb genomic region (window) showing a significantly reduced number of crossovers relative to the rest of the chromosome arm. This process was repeated after moving the window into the chromosome arm with 100 kb increments. For each window, we used a binomial distribution to determine if the number of crossovers was significantly reduced relative to expectations under the assumption that crossovers are randomly distributed along a chromosome arm. We obtained probability values for each 1 Mb window based on n trials of the binomial distribution, where n is the total number of crossovers in a chromosome arm with a probability equal to the fraction that this 1 Mb region represents of the total length of the chromosome arm where crossovers could be detected. Using these parameters, we obtained P values for each 1 Mb window moving from the centromere towards the center of the chromosome arms, and identified the threshold of the genomic regions affected by the centromere effect when five consecutive windows showed a non-significant reduction in crossovers. Two levels of significance were examined, P < 1×10−6 and P < 1× 10−2. In both models, the total number of crossovers from the three crosses analyzed were used to investigate centromere effects. Equivalent analyses were used to detect the telomere effect. To enable a direct comparison between D. yakuba and D. melanogaster, we also examined the centromere and telomere effect in D. melanogaster with the same methodologies as in our study of D. yakuba. Furthermore, we analyzed the D. melanogaster genome assemblies r5.3 and r6 to capture different levels of inclusion of peri-centromeric (or telomeric) sequences and heterochromatin. Unless noted, the D. melanogaster r6 release is used by default in the main text, and results comparing D. yakuba to both D. melanogaster r5.3 and r6 are shown in Figures and Tables.

Crossover interference

Crossovers tend to avoid peri-centromeric and peri-telomeric regions, and even within the ‘central’ region of chromosome arms (outside the influence of centromere and telomere effects), crossovers are not randomly distributed. Therefore, the direct comparison of observed inter-crossover distance (ICD) in chromatids with two crossovers (2CO) and the expected distance based on a random distribution of crossovers along a chromosome arm can produce biased conclusions on interference. Another approach to measure interference is to assume that the distance between two independent crossovers in 2CO chromatids (no interference) is exponentially distributed and, as such, follows a gamma distribution with shape parameter (ν) of 1 [45]. Fitting the observed series of ICD in 2CO chromatids to a gamma distribution, therefore, would provide information about the existence of crossover interference when ν is greater than 1, evidencing crossovers more evenly spaced (positive interference) than expected under a Poisson process [45,144,238]. This model, however, also assumes that all genomic sites along a chromosome are equally likely to become a crossover, which is not correct in many species and may cause expectations for ν to be different than 1 under no interference. To study crossover interference in D. yakuba, we considered that crossovers are not randomly distributed along chromosome arms. Expectations of ICD for 2CO chromatids under no interference can be obtained by using data from single crossover chromatids (1CO), randomly choosing two crossover locations along a chromosome and estimating the distance [39]. Our large number of genotyped meioses and 1CO chromatids (a total of 3,531) allows us to use this approach to obtain ICD expectations for 2CO chromatids after 1 million independent replicates per chromosome arm. Genome-wide analyses of crossover interference were generated by considering the differences between chromosome arms (observed number of 1CO and 2CO chromatids, and the distribution of crossovers along each arm in 1CO chromatids). For the sake of allowing comparisons with other studies, we also show the estimated ν for each chromosome and expectations of ν under the assumption of no interference based on the distance between two randomly chosen 1COs. Note that the random generation of ICDs based on the location of 1COs shows a best fit to gamma distributions with ν greater than 1 (as discussed above). Estimates of ν for ICD of 2CO chromatids from our D. yakuba datasets are consistently greater than 1 and, more relevant, also consistently greater than our expectations of ν, thus supporting the presence of positive interference as well.

Chromatid and tetrad analysis

Crossover events were divided into classes depending on the number of observed crossovers per chromatid: zero or non-crossover (NCOs), single crossover (1COs), double crossovers (2COs), triple crossovers (3COs), and quadruple crossovers (4COs). This information was also used to estimate the frequency of bivalent (or tetrad) exchange classes, initially based on Weinstein’s algebraic method [64,65]. The frequency of tetrad classes is estimated as E, denoting tetrads with r crossovers (E tetrads indicate the frequency of homologous chromosome pairs with no crossovers; E1 tetrads with a single crossover, E2 tetrads with two crossovers, etc.) Note that Weinstein’s method is based on a meiotic model that assumes random distribution of crossovers along chromatids with no crossover interference, random selection of pairs of nonsister chromatids (no chromatid interference), random distribution of chromatids into gametes, and equal viability among all possible meitotic products. Under this model, the frequency of tetrad classes can be estimated using the observed frequency of crossover classes (NCO, 1CO, 2CO, 3CO and 4CO) and by solving Er, To obtain confidence intervals and compare models, a maximum likelihood (ML) approach was applied to identify by simulation the combination of tetrad frequencies (Er) that best fits the observed data (number of NCO, 1CO, 2CO, 3CO and 4CO chromatids) under several models. This allowed us to obtain point estimates for Er, confidence intervals and overall fit of a model to data. To compare models, we calculated the likelihood test statistic for each model and performed a likelihood-ratio test (LRT). Given our detection of rare but non zero chromosomes with four crossovers and a nonnegligible fraction of chromosomes with three crossovers, we modeled E, E, E, E and E. Models limiting E perform much worse than E, at least for D. yakuba, and therefore are not used. Our simplest model assumed no restrictions in terms E values and it is equivalent to Weinstein’s model for a given maximum r (in our case E). We also used ML models to add a diverse set of rules [66,145] to initially restrict expectations to biologically feasible values (Er ≥ 0). We then expanded the rules of Er ≥ 0 to study models including meiotic drive or chromatid interference. The model with meiotic drive assumes a non-random segregation of sister chromatids in meiosis II depending on whether or not sister chromatids had crossovers. We explicitly explored meiotic drive with a bias (b) favoring chromatids with crossovers preferentially segregating into the oocyte nucleus when the sister chromatid has no crossovers (MDCO model). Under this model (see ), chromatids with crossovers have a probability of being segregated into the oocyte nucleus of 0.5 ≤ b ≤ 1 (instead of a case with no bias, b = 0.5), which causes nonrecombinant sister chromatids to be preferentially extruded to the second polar body. The model with chromatid interference, on the other hand, assumes that chromatids used for multiple exchanges in a tetrad with E are not chosen independently and at random [64,145]. This model allows for a biased distribution of crossovers (b) among chromatids when E. In particular, we studied a model of positive chromatid interference (PCI) favoring (0.5 ≤ b ≤ 1) the sister chromatid with the fewest number of previous crossovers for each additional crossover, thus reducing the number of chromatids with no crossovers relative to a random case of b = 0.5.

Crossover data from D. melanogaster

To compare the observed frequency of different crossover and tetrad classes in D. yakuba (this study) and D. melanogaster, we used D. melanogaster data based on WGS [41] and visible markers [37,239,240] (see and ). We included results from visible markers in D. melanogaster because these studies are based on much larger sample sizes than those based on WGS for this species. On the other hand, it is worth noting that studies using visible markers are more likely to generate egg-to-adult viability defects than WGS and, because of the reduced density of markers along chromosomes, underestimate crossover events (hence overestimating NCO chromatids and E0).

Rates of protein evolution in the D. melanogaster subgroup

To identify variation in the efficacy of selection at protein level, we estimated dN (the number of nonsynonymous substitutions per nonsynonymous site), dS (the number of synonymous substitutions per synonymous site) and ω (the dN/dS ratio) for each gene and across the entire phylogeny of the D. melanogaster subgroup (D. melanogaster, D. simulans, D. sechellia, D. yakuba and D. erecta; Figs and ). To this end, we applied the branch-model in the codeml program as implemented in PAML (v4.9j, February 2020) that allows different ω in all internal and external branches of the five-species tree [191,192]. Furthermore, we took into account gene-specific properties influencing ω (CDS and transcript length, number of introns, gene expression, and CUB as an estimate of the degree of weak selection on synonymous mutations and therefore dS) and used residuals generated by Generalized Additive Models (GAM) as a measure of variable efficacy of selection on amino acid changes (ωR) for each gene and phylogenetic branch. We estimated rates of evolution for a curated set of 7,062 aligned orthologous CDS from flyDIVaS [241]. To identify genes under positive selection for amino acid changes in one or more branches, we compared Model M1a (nearly neutral evolution) against a model that allows for the additional presence of positive selection at a fraction of sites (model M2a). For each branch, we compared maximum likelihood estimates (MLEs) under these two models and applied likelihood ratio tests (LRTs) to identify evidence of positive selection after correcting for multiple tests (FDR = 0.05) [242]. To study the efficacy of selection against deleterious amino acid changes, we followed Larracuente et al. (2008) [104] and generated a set of genes with no evidence of positive selection for each branch. This set of genes was obtained by removing genes with the highest 10% ωR as well as all those that showed significant signal of positive selection based on the LRT approach.

Synonymous codon usage

We estimated the degree of bias in the use of different synonymous codons (Codon Usage Bias, CUB) by calculating ENC (Effective Number of Codons) [243] in genes with more than 200 codons aligned across species. ENC measures departures from random use among synonymous codons for each amino acid and has been shown to be minimally influenced by differences in CDS length and amino acid composition [244]. The values for ENC range between 20 (extreme CUB, with only one synonymous codon used per amino acid) and 59 (random use of synonymous codons) and, therefore, we used -ENC to evaluate the expected positive correlation between CUB and efficacy of selection.

Neutral diversity in D. yakuba and D. melanogaster

We obtained estimates of neutral diversity based on pairwise sequence comparisons at four-fold synonymous sites (π4f). For D. yakuba, we analyzed the CY population (Cameroon; West Africa), which shows a high degree of long-term stability with median estimates of Tajima’s D [245] very close to equilibrium expectations (-0.08). For D. melanogaster, we analyzed the sub-Saharan African RG population from Rwanda (Drosophila Genome Nexus; http://johnpool.net/genomes.html [246,247]), which combines a relatively large sample size (n = 27), minimal signals of non-equilibrium dynamics, and low and well characterized levels of admixture [83,247].

Observed number of meiotic events for each of the three crosses of D. yakuba analyzed in this study.

(PDF) Click here for additional data file.

Centromere and telomere effect in D. yakuba based on the study of the most centromere- and telomere-proximal 1/3 region of each chromosome arm.

(PDF) Click here for additional data file.

Proportion of chromosome arms showing centromere and telomere effect in D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Satellite repeats in euchromatic and heterochromatic regions of D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Number of satellite repeats per Mb in heterochromatic and euchromatic PacBio reads of D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Tetrad analysis and estimates of E values for D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Transposable Elements (TEs) showing a significant difference in copy number between D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Spearman’s ρ correlation between TE abundance and crossover rate (cM/Mb) in D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Spearman’s ρ correlation between abundance of TE classes and crossover rate (cM/Mb) in D. yakuba and D. melanogaster.

(PDF) Click here for additional data file.

Enrichment analysis of individual motifs near crossover events in D. yakuba.

(PDF) Click here for additional data file.

Data used to study tetrad frequencies in D. melanogaster.

(PDF) Click here for additional data file.

Frequency distribution of number of crossovers per gamete.

A) Crossovers observed in all three crosses analyzed not including data from chromosome arm 2R, and B) crossovers observed in all chromosome arms for the two crosses without the heterozygous 2R inversion (TZ043 x TZ020 and Rain5 x Cost1235.2). (TIF) Click here for additional data file.

Frequency of parental and recombinant haplotypes as a function of genetic distance (cM) along the X chromosome.

Frequencies shown as average within overlapping 1-cM windows with increments of 0.1 cM. For recombinant haplotypes, the order of the two parental genotypes is shown from telomere to centromere. (TIF) Click here for additional data file.

Motif logos for each of the short DNA motifs associated with crossovers in D. yakuba.

Logos and notation in parenthesis are from [185]. (TIF) Click here for additional data file.

Description of gaps in the D. yakuba reference genome 2.0 solved or improved in this study.

A) Number of gaps completely solved or improved per chromosome arm. B) Genomic locations of completely solved (dark red lines) or improved (light red lines) gaps. TE presence in added sequences are indicated as blue lines. (TIF) Click here for additional data file.

Supplemental materials and methods.

(PDF) Click here for additional data file.

Table 1

Observed number of meiotic events in D. yakuba.

	Chromosome
CO Class¹	X	2L	2R ²	3L	3R	Total
NCO	447	835	463	694	683	3122
1CO	901	708	312	781	829	3531
2CO	200	116	50	209	107	682
3CO	74	2	14	17	15	122
4CO	0	0	2	0	1	3
Chromatids	1622	1661	841	1701	1635	7460

NCO: zero crossover, 1CO: single crossover, 2CO: double crossover, 3CO: triple crossover, 4CO: 4 crossovers in a single chromatid.

2The Sn20 x Sn17 cross is heterozygous for an inversion on 2R ( and ) and, therefore, no crossovers were analyzed for this chromosome arm. For comparison, meiotic events in D. melanogaster analyzed in this study are shown in .

Table 2

Crossover interference in D. yakuba.

	X	2L	2R	3L	3R	All
Number of chromatids with 2 COs	200	116	50	209	107	682
Av. inter-crossover distance (ICD¹; in kb)	7,068.3	7,039.1	6,379.9	8,437.6	7,480.4	7,497.1
Min ICD	1,486.9	1,163.0	1,880.1	819.9	3,286.4	819.9
Expected ICD²	6,059.3	4,585.1	5,923.6	5,594.7	5,336.5	5,499.9
Ratio Observed/Expected ICD	1.17	1.54	1.08	1.51	1.40	1.36
Prob (Observed ≤ Expected ICD)	0.0003	<1×10⁻⁶	n.s.	<1×10⁻⁶	<1×10⁻⁶	0.0464
Expected ν (gamma) for ICD	1.97	2.03	1.64	1.87	1.99	1.91
Observed ν (gamma) for ICD	3.87	5.57	5.14	5.18	9.5	4.90

1 Observed average inter-crossover distance (ICD; in kb) in 2CO chromatids.

2 Expected ICD based on two randomly chosen crossovers from 1CO chromatids (see text for details).

Table 3

Enrichment analysis of DNA motifs near crossover events in D. yakuba.

	P-value
Motif	5kb	3kb	1kb
[A]_N	<3.3×10⁻³⁰⁸	1.7×10⁻⁷¹	3.2×10⁻³⁶
[CA]_N	<3.3×10⁻³⁰⁸	1.4×10⁻⁶⁵	1.4×10⁻⁵
[TA]_N	5.2×10⁻⁹²	5.1×10⁻⁴⁵	1.6×10⁻⁴
[GCA]_N	<3.3×10⁻³⁰⁸	1.0E×10⁻¹⁹⁹	4.4×10⁻³
[CYCYYY]_N	1.5×10⁻⁶⁰	2.3×10⁻⁴⁰	9.6×10⁻⁵

1 Probabilities obtained by comparing the number of times this motif is identified in sequences containing a crossover event and expectations based on sequences of equivalent length randomly chosen across the genome. Three sequence datasets were analyzed based on the distance between diagnostics SNPs around a crossover event (5-kb or less, 3-kb or less, and 1-kb or less).

228 in total

Review 1. Biased gene conversion: implications for genome and sex evolution.

Authors: Gabriel Marais
Journal: Trends Genet Date: 2003-06 Impact factor: 11.639

2. A Possible Influence of the Spindle Fibre on Crossing-Over in Drosophila.

Authors: G W Beadle
Journal: Proc Natl Acad Sci U S A Date: 1932-02 Impact factor: 11.205

3. Direct evidence of a role for heterochromatin in meiotic chromosome segregation.

Authors: A F Dernburg; J W Sedat; R S Hawley
Journal: Cell Date: 1996-07-12 Impact factor: 41.582

4. The rapid evolution of X-linked male-biased gene expression and the large-X effect in Drosophila yakuba, D. santomea, and their hybrids.

Authors: Ana Llopart
Journal: Mol Biol Evol Date: 2012-07-26 Impact factor: 16.240

5. The centromere1 (CEN1) region of Arabidopsis thaliana: architecture and functional impact of chromatin.

Authors: W Haupt; T C Fischer; S Winderl; P Fransz; R A Torres-Ruiz
Journal: Plant J Date: 2001-08 Impact factor: 6.417

6. Correlated variation and population differentiation in satellite DNA abundance among lines of Drosophila melanogaster.

Authors: Kevin H-C Wei; Jennifer K Grenier; Daniel A Barbash; Andrew G Clark
Journal: Proc Natl Acad Sci U S A Date: 2014-12-15 Impact factor: 11.205

7. Population dynamics of PIWI-interacting RNAs (piRNAs) and their targets in Drosophila.

Authors: Jian Lu; Andrew G Clark
Journal: Genome Res Date: 2009-11-30 Impact factor: 9.043

8. Prdm9 controls activation of mammalian recombination hotspots.

Authors: Emil D Parvanov; Petko M Petkov; Kenneth Paigen
Journal: Science Date: 2009-12-31 Impact factor: 47.728

9. The Time Scale of Recombination Rate Evolution in Great Apes.

Authors: Laurie S Stevison; August E Woerner; Jeffrey M Kidd; Joanna L Kelley; Krishna R Veeramah; Kimberly F McManus; Carlos D Bustamante; Michael F Hammer; Jeffrey D Wall
Journal: Mol Biol Evol Date: 2015-12-15 Impact factor: 16.240

Review 10. FlyBase 2.0: the next generation.

Authors: Jim Thurmond; Joshua L Goodman; Victor B Strelets; Helen Attrill; L Sian Gramates; Steven J Marygold; Beverley B Matthews; Gillian Millburn; Giulia Antonazzo; Vitor Trovisco; Thomas C Kaufman; Brian R Calvi
Journal: Nucleic Acids Res Date: 2019-01-08 Impact factor: 16.971

1 in total

1. Background selection under evolving recombination rates.

Authors: Tom R Booker; Bret A Payseur; Anna Tigano
Journal: Proc Biol Sci Date: 2022-06-22 Impact factor: 5.530

1 in total