Intragenic Meiotic Crossovers Generate Novel Alleles with Transgressive Expression Levels.

Sanzhen Liu¹,², James C Schnable³, Alina Ott²,⁴, Cheng-Ting Eddy Yeh², Nathan M Springer⁵, Jianming Yu², Gary Muehlbauer⁶, Marja C P Timmermans⁷, Michael J Scanlon⁸, Patrick S Schnable².

Abstract

Meiotic recombination is an evolutionary force that generates new genetic diversity upon which selection can act. Whereas multiple studies have assessed genome-wide patterns of recombination and specific cases of intragenic recombination, few studies have assessed intragenic recombination genome-wide in higher eukaryotes. We identified recombination events within or near genes in a population of maize recombinant inbred lines (RILs) using RNA-sequencing data. Our results are consistent with case studies that have shown that intragenic crossovers cluster at the 5′ ends of some genes. Further, we identified cases of intragenic crossovers that generate transgressive transcript accumulation patterns, that is, recombinant alleles displayed higher or lower levels of expression than did nonrecombinant alleles in any of ∼100 RILs, implicating intragenic recombination in the generation of new variants upon which selection can act. Thousands of apparent gene conversion events were identified, allowing us to estimate the genome-wide rate of gene conversion at SNP sites (4.9 × 10⁻⁵). The density of syntenic genes (i.e., those conserved at the same genomic locations since the divergence of maize and sorghum) exhibits a substantial correlation with crossover frequency, whereas the density of nonsyntenic genes (i.e., those which have transposed or been lost subsequent to the divergence of maize and sorghum) shows little correlation, suggesting that crossovers occur at higher rates in syntenic genes than in nonsyntenic genes. Increased rates of crossovers in syntenic genes could be either a consequence of the evolutionary conservation of synteny or a biological process that helps to maintain synteny.

Year: 2018 PMID: 30184112 PMCID: PMC6231493 DOI: 10.1093/molbev/msy174

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Introduction

Adaptation, speciation, and selection (both natural and artificial) depend upon both independent assortment of whole chromosomes, as well as meiotic recombination’s ability to create novel haplotypes within individual chromosomes, via crossovers and/or gene conversion (GC; Muller 1964; Gaut et al. 2007; Lu et al. 2012). These haplotypes can be novel combinations of alleles carried by the same chromosome and/or recombinant alleles of individual genes with novel functionalities, for example, by shuffling promoter or enhancer sequences or protein domains of varying efficiencies. In addition, recombination is essential for identifying associations between genic and phenotypic variation (Salome et al. 2012). In many organisms, meiotic recombination events are not uniformly distributed across the genome (Myers et al. 2005; Giraut et al. 2011; Pan et al. 2011). As such, the cumulative effects of recombination over evolutionary time have left signatures that can be observed in existing populations. Favorable haplotype blocks can be maintained over evolutionary time in chromosomal regions that experience low rates of recombination (Wijnker et al. 2012). Conversely, low rates of recombination can reduce the probability of such favorable haplotype blocks being created in the first place. Low rates of recombination also contribute to linkage drag, decreasing the efficiency of both natural and artificial selection (e.g., breeding) causing selective sweeps (Brinkman and Frey 1977). Although double-strand breaks are widely distributed across the maize genome (He et al. 2017), meiotic recombination events per se are generally suppressed in pericentromeric regions (Gore et al. 2009; Liu et al. 2009; Rodgers-Melnick et al. 2015). The lack of recombination between loci can contribute to heterosis via pseudo-overdominance (Schnable and Springer 2013). 
The reduced rate of recombination in pericentromeric regions may contribute to the accumulation of deleterious alleles in these regions, and regions with higher rates of recombination tend to have reduced genetic load (Rodgers-Melnick et al. 2015). Studies in maize indicate that meiotic recombination occurs at high frequency in nonrepetitive genic regions (Fu et al. 2001; Rodgers-Melnick et al. 2015). Case studies of individual genes have suggested that meiotic recombination events occurring within genes (intragenic recombination) occur more often in 5′ ends of some maize genes (Patterson et al. 1995; Xu et al. 1995; Choi et al. 2013; Shilo et al. 2015), more often in 3′ ends of some other genes (Eggleston et al. 1995), or show no obvious preference (Dooner and Martínez-Férez 1997). A study of maize tetrads showed that cross-overs were more likely to occur in the 5′ end of genes, followed by the 3′ end and the middle of genes (Li et al. 2015). However, this study was limited to a handful of genes due to the small number of tetrads observed. More global analyses conducted in the Arabidopsis (Choi et al. 2013; Shilo et al. 2015) and human (Kong et al. 2010) genomes reached similar conclusions. The phenotypic importance of intragenic recombination has been shown in studies in multiple species but has so far been limited to indirect associations or single genes (Freeling 1976; Hagblom et al. 1985; McDowell et al. 1998; Ogasawara et al. 1998). Alleles derived via intragenic recombination can have higher expression in segregating populations and be more likely to be present in genomic association intervals for phenotypic traits than randomly selected genes in a maize diversity panel (Pan et al. 2016). Many plant genomes have undergone evolutionarily recent whole genome duplication events (Jiao et al. 2011). Maize underwent a recent polyploidization event between 5 and 12 Ma that is not shared by sorghum (Swigonova et al. 2004) and that was followed by fractionation. 
Both the maize and sorghum genomes have been sequenced (Paterson et al. 2009; Schnable et al. 2009), making comparisons between these two species an excellent model to assess the effects of genome duplication. Maize contains two subgenomes relative to the unduplicated sorghum, termed maize1 and maize2. After the whole genome duplication, many maize genes were lost apparently via intrachromosomal nonhomologous recombination (Woodhouse et al. 2010). As a consequence of differential loss, the maize1 subgenome shares more genes with sorghum than does maize2 (Schnable et al. 2011). Genes retained at syntenic (ancestral, collinear) positions have distinct genomic properties as compared with nonsyntenic genes. For example, syntenic genes are more likely to be expressed and have lower levels of methylation than nonsyntenic genes (Eichten et al. 2011), consistent with mutation analyses which suggest that mutants in syntenic genes are more likely to yield a visible phenotype than mutations in nonsyntenic genes (Schnable and Freeling 2011). In addition, functional differentiation is also observed between the maize subgenomes. Maize1 genes explain more phenotypic variation than maize2 genes as assessed via mutant and association analyses (Schnable and Freeling 2011; Renny-Byfield et al. 2017), which corresponds to higher maize1 gene expression than maize2 though no difference in methylation (Eichten et al. 2011). In this study, we identified 176,279 polymorphic SNPs from RNA-seq data on 105 RILs, 7,574 recombination breakpoint intervals (RBIs), of which 848 are intragenic crossovers, and 3,014 apparent gene conversions (aGCs). Importantly, we demonstrated that crossovers within these genes can generate alleles with novel levels of transcript accumulation. Global GC events enabled us to provide the first genome-wide estimate for the rate of GC in maize. 
Finally, we demonstrated that syntenic genes are enriched for recombination relative to nonsyntenic genes, which could be either a consequence of the evolutionary conservation of synteny across maize haplotypes or a process that helps to maintain synteny.

Results

Genotyping the IBM Population by RNA-Seq and De Novo Construction of a Genetic Map

The intermated B73 and Mo17 recombinant inbred lines (IBM RILs) were developed by crossing the inbred lines B73 and Mo17, followed by several generations of random mating and multiple generations of self pollination (Lee et al. 2002). Using RNA-seq data from vegetative apices (Barbazuk et al. 2007), a set of high confidence SNPs (N = 176,279) that segregated among the RILs was identified (Materials and Methods). 162,356 of these SNPs are in annotated genes, providing a polymorphism rate of 1.3 SNPs per kb of genic space. The SNP genotypes of all the RILs were subjected to segmentation (Olshen et al. 2004) to obtain a minimum set of genetic markers describing all crossovers observed within the RIL population (fig. 1, Materials and Methods). A total of 7,856 segmental markers representing single genetic markers genotyped as the B73, the Mo17, or a recombinant haplotype were identified (supplementary fig. S1, Supplementary Material online). These markers were used to generate a genetic map (supplementary table S1, Supplementary Material online), which was shown to exhibit a high level of consistency with the physical map (correlation coefficient >0.999). Segments in a particular RIL where a crossover event occurred are called RBIs (N = 7,574; fig. 1). Given their size and the numbers of consistent flanking markers (Materials and Methods), RBIs are presumed to be the consequence of crossovers (fig. 1). The numbers of detected crossovers in each RIL range from 60 to 125 (fig. 2), with an average of 82, which is similar to previous estimates in the IBM RILs (Fu et al. 2006; Esch et al. 2007). Some markers within segments have a parental genotype that does not agree with the evidence from the rest of the segment and therefore represent aGCs (fig. 1).
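The relationship between SNP calls, segments, RBIs, and aGCs can be illustrated with a minimal sketch. It is not the segmentation method used in the study (Olshen et al. 2004); it swaps in a simple run-length grouping, and the function names, the `'A'`/`'B'` genotype codes for a single RIL, and the `min_flank` threshold are all hypothetical choices made for illustration.

```python
from itertools import groupby

def segments(genotypes):
    """Collapse ordered per-SNP calls ('A' = B73-like, 'B' = Mo17-like)
    into (genotype, first_index, last_index) runs."""
    runs, start = [], 0
    for g, grp in groupby(genotypes):
        n = len(list(grp))
        runs.append((g, start, start + n - 1))
        start += n
    return runs

def call_events(genotypes, min_flank=3):
    """Return (breakpoint_intervals, agc_indices) for one RIL.

    A single discordant SNP flanked on both sides by runs of the other
    genotype (each at least min_flank SNPs long) is treated as an
    apparent gene conversion; remaining genotype switches are reported
    as breakpoint intervals between the two flanking SNP indices.
    """
    runs = segments(genotypes)
    smoothed = list(genotypes)
    agcs = []
    for k in range(1, len(runs) - 1):
        g, s, e = runs[k]
        left, right = runs[k - 1], runs[k + 1]
        long_flanks = (left[2] - left[1] + 1 >= min_flank
                       and right[2] - right[1] + 1 >= min_flank)
        if s == e and left[0] == right[0] and long_flanks:
            agcs.append(s)
            smoothed[s] = left[0]  # mask the aGC before calling breakpoints
    merged = segments(smoothed)
    rbis = [(merged[k][2], merged[k + 1][1]) for k in range(len(merged) - 1)]
    return rbis, agcs

rbis, agcs = call_events(list("AAAABAAAABBBBB"))
```

In this toy RIL the isolated Mo17-like call at index 4 is flagged as an aGC, and the single switch from B73-like to Mo17-like is reported as one breakpoint interval between indices 8 and 9.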

Fig. 1.

Genotyping, segmentation, recombination breakpoint intervals, and gene conversions in the IBM RILs. (A) SNPs (vertical bars) were genotyped as B73-like (black) or Mo17-like (gray) in each RIL (horizontal bars). (B) Segmental markers were defined as genomic regions that originate from a single parent in each RIL. (C) Segments were genotyped as B73-like (A), Mo17-like (B), or recombinant (R). (D) Recombination breakpoint intervals (RBIs) are individual or multiple adjacent recombinant segments. RBIs that occur within a gene are intragenic RBIs. (E) Individual markers within a segment may disagree with the segmental genotype. These markers are potentially gene conversions.

Fig. 2.

Crossovers in the RILs. (A) The distribution of the number of crossovers per RIL (N = 105). (B) The distribution of RBIs in genes based on the number of RILs affected by an RBI in a particular gene.

Chromosomal Level Distribution of Crossovers

Overall, the genetic lengths of the ten maize chromosomes are highly positively correlated with the physical lengths of chromosomes (correlation coefficient = 0.93; supplementary fig. S2, Supplementary Material online). Consistent with previous genetic maps of the IBM RILs and other maize populations, the middles of all chromosomes show low crossover frequency, whereas the arms of chromosomes exhibit high crossover frequency (supplementary fig. S3, Supplementary Material online; Esch et al. 2007; Liu et al. 2009; Rodgers-Melnick et al. 2015). 72% of all crossovers occur in the ∼30% of the genome that exhibits a recombination frequency >1 cM/Mb across generations.

The Relationship between Crossovers and Synteny

Many studies suggest that crossovers occur more often in genic versus nongenic regions (Brown and Sundaresan 1991; Civardi et al. 1994; Fu et al. 2001; Yao et al. 2002; Rodgers-Melnick et al. 2015). As a consequence of the RNA-seq strategy used for genotyping, most of our markers are located within genes. Hence, we were able to focus our investigations on differences in crossover frequency between classes of genes. A set of genes that are syntenic between maize and sorghum—that is, retained in ancestral, collinear positions in both species—has been defined (Schnable et al. 2011). We analyzed crossovers and synteny for 1 Mb bins genome-wide. The numbers of base pairs in each bin that could be assigned to syntenic genes, nonsyntenic genes or nongenic sequences was determined and compared with the crossover frequency. Crossover frequencies of bins are more strongly correlated with the size of the syntenic gene space within the bin (fig. 3, Spearman’s Rank correlation ρ = 0.58) than with the size of the nonsyntenic gene space (fig. 3, Spearman’s Rank correlation ρ = 0.27), suggesting that crossovers occur at higher rates in syntenic genes than in nonsyntenic genes.
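The bin-level comparison above rests on Spearman's rank correlation. The sketch below shows how such a coefficient can be computed for per-bin values; the input lists are invented toy numbers, not data from the study, and the helper names are hypothetical.

```python
def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Toy values for five 1 Mb bins (illustrative only).
syntenic_bp = [0, 12000, 30000, 45000, 80000]
cm_per_mb = [0.1, 0.4, 0.3, 1.2, 2.0]
rho = spearman(syntenic_bp, cm_per_mb)
```

Running the same function once with syntenic gene space and once with nonsyntenic gene space per bin would give the two coefficients being compared.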

Fig. 3.

Synteny and crossover frequency. The genome was divided into nonoverlapping 1 Mb bins (black dots, N = 2,064 for each panel) and the amount of sequence from syntenic and nonsyntenic genes was determined. The crossover frequency across each bin was estimated in cM per Mb. (A) Bins containing more syntenic genes have a stronger positive correlation with crossover frequency. (B) Bins containing more nonsyntenic genes have a reduced correlation with crossover frequency.

Pericentromeric regions contain more nonsyntenic genes (Schnable et al. 2012). The enrichment of crossovers in syntenic regions could potentially be a result of gross chromosomal arrangements elsewhere in the genome, rather than a function of the properties of syntenic regions. To control for this possibility, the relationship between crossovers and synteny was assessed separately in pericentromeric (25 kb flanking the centromere) and nonpericentromeric regions. The correlations between synteny and crossovers in nonpericentromeric regions were very similar to the genome-wide correlations: ρ = 0.52 versus 0.58 for syntenic genes and ρ = 0.25 versus 0.27 for nonsyntenic genes. In contrast, within pericentromeric regions there was no discernable correlation between crossovers and either syntenic or nonsyntenic genes (ρ = −0.10 and ρ = 0.05, respectively). We therefore conclude that the negative correlations of both recombination and syntenic gene density with pericentromeric identity are not sufficient to explain the correlation between these two characteristics of genomic regions. Relative to sorghum, maize is an ancient tetraploid (Swigonova et al. 2004) and any retained syntenic orthologous genes between these species can be classified as belonging to either the maize1 or maize2 subgenome based on their genomic positions (Schnable et al. 2011). 
While there are differences between the subgenomes, such as a bias towards higher gene expression in maize1, higher gene loss across maize lines from maize2 (Schnable et al. 2011), and a larger proportion of phenotypic variation explained by maize1 singleton genes (Renny-Byfield et al. 2017), no difference in crossover frequency was detected between the two subgenomes (supplementary fig. S4, Supplementary Material online, Spearman’s Rank correlation ρ = 0.56 for maize1 and ρ = 0.57 for maize2).

Distribution of Crossovers among and within Genes

Previous case studies of individual genes showed that many genes are hotspots for crossovers (Brown and Sundaresan 1991; Civardi et al. 1994; Eggleston et al. 1995; Patterson et al. 1995; Xu et al. 1995; Dooner and Martínez-Férez 1997). To understand the distribution of crossovers among and within genes, we first identified RBIs encompassed by a single gene. The size of RBIs is dependent on the regional density of SNP markers; hence the estimated sizes of most RBIs are expected to be larger than the actual sizes of these RBIs. The 7,574 RBIs detected in this study range in size from 1 bp to ∼30 Mb, with a median of 104.6 kb. Maize genes, as defined in this study (Materials and Methods), have a median length of 4.5 kb, and 99% of genes are <30 kb. Even so, across all RILs, 848 RBIs were located within a gene region extended by 1 kb upstream and downstream of the gene, including 793 unique RBIs encompassed by 561 nonredundant genes (fig. 1, supplementary table S2, Supplementary Material online, Materials and Methods). Most RBIs (7,535/7,574) overlap with a gene, indicating few RBIs were encompassed only by the intergenic regions that comprise >85% of the genome. The majority of genes (404/561) that contain an RBI (i.e., an apparent intragenic crossover) experienced a crossover in only a single RIL. But some genes appear to be “hotspots” such that crossovers were observed in at least two RILs (fig. 2). Permutation testing (Materials and Methods) indicated that the number of “hotspot” genes was significantly more than expected (P-value < 0.001). These observations support both the previous finding that recombination is more likely to occur in genic than nongenic regions and that certain genes experience recombination events much more frequently than do others. We also found that intragenic crossovers were 1.9× more likely to occur within syntenic genes than within nonsyntenic genes (Fisher’s Exact Test, P-value = 5.9 × 10⁻⁴). 
Syntenic genes are more likely to be expressed than are nonsyntenic genes and therefore intragenic crossovers in syntenic genes may be more likely to be detected using our RNA-seq based genotyping strategy. However, roughly equal proportions of syntenic and nonsyntenic genes contained at least two SNPs that passed all quality filtering parameters (49% and 48% respectively) and therefore had the potential for detectable intragenic crossover events. Hence, the enrichment of intragenic crossovers within syntenic genes is unlikely to have resulted from an ascertainment bias and instead provides additional support for the conclusion that syntenic genes experience more crossovers than do nonsyntenic genes. A preference for crossovers to cluster in the 5′-ends of genes has been observed in several case studies of maize (e.g., Patterson et al. 1995; Xu et al. 1995) and later more globally in Arabidopsis (Choi et al. 2013; Shilo et al. 2015). To explore where in genes intragenic crossovers occur, each gene was divided into three equal parts: 5′-end, middle, and 3′-end (Materials and Methods). For 301 RBIs the entire interval is within one of these three parts: 141 (46.8%), 79 (26.3%), and 81 (26.9%) RBIs occur at the 5′-end, middle and the 3′-end, respectively. The overall distribution of SNPs identified by RNA-seq among the three segments is 25,486 (24%) 5′-end, 41,356 (38%) middle, 41,117 (38%) 3′-end, significantly different from the distribution of RBIs (χ² = 89.7, df = 2, P-value < 2.2e–16); thus, the observed distribution of intragenic RBIs is not an artifact of the distribution of SNPs.
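The comparison of RBI and SNP distributions across gene thirds can be checked with a short calculation. The text does not say which chi-square variant was used; the sketch below assumes a test of homogeneity on the 2 × 3 table of RBI and SNP counts, which yields a statistic close to the reported 89.7 with df = 2. The function name is hypothetical, and the P-value is not computed here.

```python
def chi_square_homogeneity(table):
    """Pearson chi-square statistic and degrees of freedom
    for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return stat, df

# Counts taken from the text: 5'-end, middle, 3'-end.
rbi_counts = [141, 79, 81]
snp_counts = [25486, 41356, 41117]
stat, df = chi_square_homogeneity([rbi_counts, snp_counts])
```

With nearly half of the fully contained RBIs in the 5′ third but under a quarter of the SNPs there, most of the statistic comes from the 5′-end column.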

Intragenic Crossovers Can Generate Novel Alleles with Transgressive Levels of Transcript Accumulation

Intergenic crossovers can produce novel haplotypes by shuffling existing alleles. Similarly, intragenic crossovers can shuffle parental polymorphisms to create novel alleles (Schnable et al. 1998). Novel alleles created by intragenic crossovers have the potential to exhibit novel functionality. To test this hypothesis, we compared the transcript accumulation levels, represented by RPKM (reads per kilobase of genic region per million total reads), of alleles derived from intragenic crossovers to the transcript accumulation levels of their parental alleles. The transcript accumulation levels of each of the 848 recombinant alleles generated by intragenic crossovers were compared with the transcript accumulation levels of the nonrecombinant (i.e., parental) alleles that were segregating in the other RILs, as well as in the parental inbreds, B73 and Mo17. This comparison identified a statistically significant number of recombinant alleles (N = 20) that exhibited transgressive transcript accumulation in which the recombinant allele is expressed 5% higher or lower than the most extreme value observed across the RIL set (and their parents) for nonrecombinant parental alleles (fig. 4, P-value = 0.004, see Materials and Methods for the permutation test).
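The transgressive criterion described above can be written as a small decision rule. This is a sketch under one reading of the text, namely that "5% higher or lower" means beyond the nonrecombinant maximum multiplied by 1.05 or below the nonrecombinant minimum multiplied by 0.95; the function name, argument names, and example RPKM values are hypothetical.

```python
def transgressive_direction(recombinant_rpkm, nonrecombinant_rpkms, margin=0.05):
    """Classify a recombinant allele's expression relative to the range
    seen for nonrecombinant alleles (other RILs plus both parents).

    Returns "high", "low", or None when the value is not transgressive.
    The multiplicative margin is an assumed reading of the 5% criterion.
    """
    upper = max(nonrecombinant_rpkms) * (1 + margin)
    lower = min(nonrecombinant_rpkms) * (1 - margin)
    if recombinant_rpkm > upper:
        return "high"
    if recombinant_rpkm < lower:
        return "low"
    return None

# Illustrative values only, not data from the study.
parental_range = [4.0, 10.0, 7.5]
calls = [transgressive_direction(v, parental_range) for v in (12.0, 3.0, 10.2)]
```

A value of 10.2 exceeds the largest nonrecombinant value but not by the 5% margin, so it is not called transgressive under this rule.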

Fig. 4.

Transgressive transcript accumulation of recombinant alleles. Shown are 18 genes with at least one recombinant allele (red triangles) that exhibit transgressive transcript accumulation. Blue dots indicate the transcript accumulation of RILs with a B73 allele, whereas orange dots indicate the expression of RILs with a Mo17 allele. The transcript accumulation of each gene is scaled such that the maximum transcript accumulation is 10 and no transcript accumulation is 0. The blue and orange bars are the transcript accumulation levels of the B73 and Mo17 parents, respectively.

Among these 20 recombinant alleles with transgressive transcript accumulation in 18 genes, 9 and 11 alleles exhibit high and low transgressive transcript accumulation, respectively. Five, two, and one RBIs were located at 5′ ends, the middle, and 3′ ends of the genes. We conducted a permutation test using randomly sampled genes to demonstrate that the probability that at least 20 of the 793 intragenic recombination alleles identified in this study would exhibit transgressive segregation is ∼1/250 or 0.4%. Although this test does not provide certainty that any given recombinant allele exhibits transgressive segregation, it does provide strong evidence that as a group at least some of the identified recombinant alleles exhibit transgressive segregation. Because transgressive expression resulting from segregation of trans-regulatory factors should be equally common for alleles that experienced intragenic recombination and for those that did not, it is unlikely that segregation of trans-regulatory factors explains the amount of transgressive expression we observed for alleles produced by intragenic recombination. Two genes contain more than one recombinant allele with transgressive transcript accumulation. In the gene GRMZM2G074238, a putative alpha/beta-hydrolase, two intragenic recombinant alleles exhibited transgressive transcript accumulation lower than other alleles. 
Both alleles had the same haplotype and were the only two recombinant alleles recovered in this gene. In the gene GRMZM2G094579, a putative glycosyltransferase, one allele with transgressive transcript accumulation exhibits higher transcript accumulation but the other exhibits lower transcript accumulation. Consistent with the hypothesis that transgressive expression of these alleles arises via an interaction of the alternative alleles at the 5′ and 3′ ends of the gene, in one of these alleles the 5′ end of the gene was derived from B73 and in the other allele the 5′ end was derived from Mo17. The genes for which we identified alleles with transgressive segregation are not enriched for those with low parental transcript accumulation; therefore, transgressive transcript accumulation is not simply a consequence of small amounts of variation in genes that are expressed at low levels (supplementary fig. S5, Supplementary Material online). In summary, these findings demonstrate that crossover between polymorphic alleles can generate novel alleles with different patterns of gene regulation and, potentially, function.

GC Events

Few studies of GC, a form of meiotic recombination that involves a nonreciprocal exchange of genetic information, have been conducted genome-wide in plants (Wijnker et al. 2013; Li et al. 2015). In yeast, each haploid spore of a tetrad can be separated and germinated to form a spore colony that provides sufficient material for sequencing to detect evidence of GC genome-wide (Fogel and Hurst 1967). In plants where tetrad analysis is technically challenging, putative GC events are typically detected through indirect methods, such as assessing the segregation of markers (Shi et al. 2010). Only a single tetrad study of maize microsporocytes has been conducted to assess GCs, which was limited by sequencing coverage (1.4× for 41% of the maize genome) and the number of tetrads (N = 24 tetrads which resulted in the identification of 160 GC tracts, 150 of which were >10 kb due to low marker density; Li et al. 2015). Two GC events were identified in the IBM RIL centromeres, and these events combined with an indirect estimate of GC based on haplotypes in a diversity panel were used to estimate the overall rate of GC in maize centromeres as ∼1 × 10⁻⁵ per marker per generation (Shi et al. 2010). To estimate the frequency of GCs in maize, we identified aGCs among the IBM RILs. Evidence from Arabidopsis and maize suggests GC tracts are typically <2 kb (Yandeau-Nelson et al. 2005; Lu et al. 2012). SNPs that exhibited the opposite genotype from the chromosomal segment in which they were embedded were classified as aGCs (fig. 1). After stringent filtering of RILs, SNPs, and aGCs to control for potential systemic errors described by Qi et al. (2014) (Materials and Methods), we were left with 97 RILs, 10,289,873 SNPs and 3,014 aGC SNPs covering 2,634 SNP sites, or an average of 31 aGC SNPs per RIL. 4.9 × 10⁻⁵ exchanges were estimated to occur per marker per generation, that is, roughly fivefold higher than the rate in maize centromeres (Shi et al. 2010). 
Because we used RNA-seq data to discover aGCs, we considered the possibility that we were inadvertently detecting cases of RNA–DNA differences (Li et al. 2011). Although the frequency of RNA–DNA differences has been brought into question (Piskol et al. 2013), to rule out what we believed was a remote possibility, genomic sequence data from two RILs, M0022 and M0023 (Liu et al. 2012) were used to successfully cross-validate 5/6 SNPs associated with aGCs, indicating that at least most aGCs are not the result of RNA–DNA differences. Another potential explanation for the origin of aGCs would be single-nucleotide mutations. The probability of two such mutations resulting in a SNP that matches the nonparental SNP (∼1.3 × 10⁻¹¹; Materials and Methods) is much lower than the rate of aGC. In addition, mutations would not be expected to cluster as aGCs do. The aGCs are not randomly distributed across the genome. Indeed, 96% (2,518/2,634) of the aGCs cluster within 2 kb of another aGC SNP, and 636 aGC SNPs are located adjacent to other aGC SNPs within a RIL. Segmentation (Materials and Methods) of those aGC SNPs identified 143 genomic regions that harbor multiple aGC SNPs. Given the RNA-seq strategy used to detect aGCs it is not surprising that they cluster in genes and genes are not randomly distributed across the genome. But setting aside the distribution of aGCs in genic versus nongenic regions, some genes are enriched in aGCs. The 2,634 aGCs are located in only 946 genes, fewer genes than would be expected by chance (permutation test; P < 0.001). aGCs also exhibit little difference in frequencies in the 5′, middle and 3′ portions of genes (supplementary table S3, Supplementary Material online).
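The clustering statistic quoted above (the share of aGC SNPs lying within 2 kb of another aGC SNP) can be sketched as follows. The function name and example coordinates are hypothetical, and the sketch assumes all positions are base-pair coordinates on a single chromosome; how sites were pooled across RILs and chromosomes in the study is not specified here.

```python
def fraction_clustered(positions, window=2000):
    """Fraction of positions that have at least one other position
    within `window` bp (single chromosome, coordinates in bp)."""
    pos = sorted(positions)
    if not pos:
        return 0.0
    clustered = 0
    for i, p in enumerate(pos):
        # After sorting, only the immediate neighbors need checking.
        near_left = i > 0 and p - pos[i - 1] <= window
        near_right = i + 1 < len(pos) and pos[i + 1] - p <= window
        if near_left or near_right:
            clustered += 1
    return clustered / len(pos)

# Illustrative coordinates only: two tight pairs and one isolated site.
example_sites = [100, 900, 50000, 51500, 200000]
share = fraction_clustered(example_sites)
```

Applied per chromosome to the filtered aGC sites, a calculation of this kind would give a proportion comparable to the 2,518/2,634 reported in the text.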

Discussion

By conducting RNA-seq on 105 RILs we identified 176,279 polymorphic SNPs, 7,574 RBIs, of which 848 are intragenic crossovers, and 3,014 aGCs.

Intragenic Crossover Generates Novel Alleles with Functional Differences

Mutations create genetic diversity upon which selection can act. However, even in the absence of mutation, meiotic recombination can create novel haplotypes that may prove to be adaptive, though few examples have been reported at the gene level. Previous investigations have identified intragenic recombination events (Brown and Sundaresan 1991; Civardi et al. 1994; Eggleston et al. 1995; Patterson et al. 1995; Xu et al. 1995; Dooner and Martínez-Férez 1997) and suggest that genes are recombination hotspots (Brown and Sundaresan 1991; Civardi et al. 1994; Fu et al. 2001; Yao et al. 2002; Rodgers-Melnick et al. 2015). Further, a handful of studies have shown for single genes across multiple species that novel alleles created via intragenic recombination can exhibit alterations in gene function or expression (Freeling 1976; Hagblom et al. 1985; McDowell et al. 1998; Ogasawara et al. 1998). Similarly, alleles derived via intragenic recombination at 31 loci were located within chromosomal intervals that have been shown via GWAS to exhibit novel functionality (Pan et al. 2016). The limitation of this approach for ascertaining functional novelty of intragenic recombinants is that GWAS typically does not provide gene level mapping resolution. In contrast, in the current study, we directly demonstrated that intragenic recombinants had novel levels of gene transcript accumulation. We used SNPs derived from RNA-seq data to map 848 intragenic crossovers to 561 nonredundant genes in a biparental population. A statistically significant number of these recombinant alleles exhibit transgressive levels of transcript accumulation relative to the parental alleles from which they were derived, thus providing direct evidence for their novel functionality. In this study, transcript accumulation, as measured by RPKM was used as a proxy for gene expression. 
Transcript accumulation can be affected by sequences at both ends of genes, for example, promoters at the 5′ end of the gene (Biłas et al. 2016) and 3′ regions that affect RNA stability (Wang et al. 2009). Hence, an intragenic recombinant allele that combines a strong 5′ promoter with a stabilizing 3′ polymorphism could exhibit transgressive levels of transcript accumulation. Indeed, we identified one gene in which intragenic recombinant alleles that consisted of 5′ regions derived from B73 and 3′ regions derived from Mo17 had high transcript accumulation, whereas an intragenic recombinant allele with opposite composition (i.e., a 5′ region derived from Mo17 and a 3′ region derived from B73) had low transcript accumulation. Hence, our data demonstrate that intragenic crossovers can generate novel alleles with unique patterns of gene expression. While our data do not link transgressive transcript accumulation with a specific phenotype, variation in transcript accumulation has been shown to be associated with phenotypic variation (Lin et al. 2017). The new alleles generated by intragenic recombination have the potential to create more extreme phenotypes which may be acted on by selection. Whereas the number of recombinant alleles identified with transgressive expression was small, over generations and across large populations, these rare recombination events can generate new alleles at a higher rate than does mutation.

Rate and Distribution of GC in Maize

Genome-wide discovery of GC events relies on the accuracy of identification of DNA polymorphisms. Qi et al. (2014) identified potential sources causing artifactual GC events, including misalignments of sequences from paralogs, regions exhibiting structural variation, and some tandem repeats, as well as random sequencing errors resulting in an apparent conversion from a homozygous genotype from one parent to a heterozygous genotype. In our RIL population, alignment issues are predicted to result in repeated observations of aGC at the same sites in multiple RILs. Here, we excluded aGC SNP sites observed in five or more independent RILs and regions that were >2 kb and that contained clustered aGC SNPs, which controlled for false GCs derived from misalignments. In our analysis, we only identified GCs that were fixed in the homozygous state. This is expected to substantially reduce the likelihood of false GCs caused by random sequencing errors that were discussed by Qi et al. (2014). Our ability to confirm 5/6 aGCs via independent sequencing indicated that the rate of false positive aGCs in this study was low. On the basis of our analysis of ∼3,000 SNPs associated with aGCs, we estimated the rate of GC in genic regions to be 4.9 × 10⁻⁵ per marker per generation, which is higher than the genome-wide rate (3.6 × 10⁻⁶) reported for Arabidopsis (Wijnker et al. 2013) and the rate estimated for maize centromeres (∼1 × 10⁻⁵ per marker per generation; Shi et al. 2010), but within the range of estimates reported in other species (Korunes and Noor 2017). However, as no standard for estimating GC has been determined, the variation in reported GC rates might result from variation due to experimental design, error, or biology. It is well established (and this study confirms) that crossovers exhibit nonrandom distributions within genes (i.e., 5′ enrichment). 
The distribution of GCs within genes is less well established, in part because of the danger of ascertainment bias based on the distributions of markers within the limited numbers of genes that have been studied. However, Dooner and He (2014) reported that GC events occur more frequently at the two ends (5′ and 3′) than in the middle of the maize bz1 gene. In other species, other intragenic distributions have been observed (Schultes and Szostak 1990; Malone et al. 1994). Analysis of our collection of 3,014 aGCs affecting 946 genes provides no evidence to support a nonrandom distribution within genes (supplementary table S3, Supplementary Material online). The difference in distribution of crossovers and aGC is not surprising given our current understanding of meiotic recombination (Guillon et al. 2005; Yandeau-Nelson et al. 2005), though an apparently random intragenic distribution of aGC has not been previously reported.

Synteny and Crossover

As a consequence of the evolutionary history of the maize genome, its two subgenomes have undergone subsequent gene loss and rearrangement (Schnable et al. 2011). There is a growing body of evidence that genes that have remained syntenic and those that are not syntenic differ in multiple respects (Schnable 2015), including methylation (Eichten et al. 2011), transcript accumulation levels (Eichten et al. 2011) and functional importance as defined by mutant analysis and studies of natural phenotypic variation (Schnable and Freeling 2011; Renny-Byfield et al. 2017). Our results provide strong evidence that syntenic genes experience higher rates of crossover than do nonsyntenic genes and as such provide additional support for functional differences between those genes that have been retained at syntenic positions and those that have not. Several studies have identified negative correlations between cytosine methylation and recombination in maize (Liu et al. 2009; Rodgers-Melnick et al. 2015) and later in Arabidopsis (Choi et al. 2013; Shilo et al. 2015). Hence, the differences in crossover frequency we observed between syntenic and nonsyntenic genes may be related to the observed differences in methylation between these classes of genes (Eichten et al. 2011). This could come about because following polyploidization, “genome-shock” can lead to the expression and translocation of transposons (Comai 2000), which in turn can disrupt synteny. Transposed genes that are no longer syntenic may undergo increases in cytosine methylation due to the proximity of transposons, which inhibits further movement and recombination (Eichten et al. 2012, 2013; Sehrish et al. 2014). These changes in methylation are thought to reduce transcript accumulation levels. Hence, it is intriguing to hypothesize that these changes in methylation may also be responsible for the reduced crossover frequency experienced by nonsyntenic genes (Eichten et al. 2011). 
It is, of course, also possible that the reduced crossover frequency is a cause, rather than an effect, of altered patterns of methylation. A stronger case can be made that the alterations in gene order and content that occur following polyploidization and subsequent fractionation may disrupt rates of crossover in nonsyntenic genes (Soltis et al. 2012). It is known, for example, that large structural rearrangements in the maize genome can reduce crossover frequency (Rhoades and Dempsey 1953; Rodgers-Melnick et al. 2015). Perhaps more relevant is the finding that crossovers are suppressed in genomic regions of mice that are polymorphic for copy number variation (Morgan et al. 2017). Furthermore, crossover frequency is suppressed in maize genes that are hemizygous for transposon insertions (Yao et al. 2002; Dooner and He 2008). Hence, it is reasonable to hypothesize that the reduced sequence conservation surrounding nonsyntenic genes directly contributes to their reduced crossover frequency.

Materials and Methods

Genetic Stocks and Illumina RNA-Seq

Tissue was collected from apices of 10-day-old seedlings of B73, Mo17, and 105 RILs extracted from the Intermated B73 × Mo17 (IBM) Syn4 population (Lee et al. 2002) and subjected to single-end, 75 bp RNA-sequencing (Li et al. 2013). Raw reads (between 18.8 and 46.8 million per RIL) were scanned for low-quality bases, and bases with PHRED quality values <15 (out of 40) were removed from each end of each read. The remaining nucleotides were then scanned using overlapping windows of 10 bp, and sequences beyond the last window with an average quality value less than the specified threshold were truncated.
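The two-step trimming described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function name `trim_read`, the window step of 1 bp, and the choice to truncate at the first window whose mean quality falls below the threshold are all assumptions made for illustration.

```python
def trim_read(quals, min_q=15, window=10):
    """Return (start, end) of the retained slice given per-base PHRED scores.

    Hypothetical helper: the thresholds follow the text, but the exact
    truncation rule is an assumption.
    """
    start, end = 0, len(quals)
    # Step 1: strip low-quality bases from both ends of the read.
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    # Step 2: slide overlapping windows (step of 1 bp, assumed) over the
    # remainder and truncate at the first window with a low mean quality.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < min_q:
            end = i
            break
    return start, end


# Example: two poor leading bases, a good stretch, then a low-quality run.
example = [10, 12] + [30] * 20 + [3] * 10 + [30] * 5
kept = trim_read(example)
```

Under these assumptions the retained slice ends where the first failing window begins, so a short internal run of poor bases removes everything downstream of it.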

Alignment of Reads to the Reference Genome

Trimmed reads (96–98% of the raw reads per RIL) were aligned to the B73 reference genome version 2 (RefGen2) using GSNAP (Wu and Nacu 2010), which allows for gapped alignments, including intron-spanning alignments. Reads were retained if they mapped uniquely in the genome, allowing two or fewer mismatches every 36 bp and fewer than five bases for every 75 bp in read length as unaligned “tails” (69–80% of the trimmed reads per RIL). The read depth of each gene was computed based on the coordinates of mapped reads and the annotated transcript start and stop locations of genes in the reference genome (ZmB73_5b_FGS).

SNP Discovery

The coordinates of unique alignments were used for SNP discovery. Polymorphisms at each potential SNP site were examined and putative homozygous SNPs were identified after ignoring the first and last three aligned bases of each read. A polymorphic base was required to have a PHRED base quality value of at least 20, and each SNP call was required to be supported by at least five unique reads.

Filtering to Extract Segregating SNPs

The IBM RILs are expected to be segregating 1:1 for each allele if both alleles are expressed. To obtain a set of confident SNPs that were expressed in the majority of RILs and were segregating in the population, we required that at least 20 RILs exhibited the B73 genotype and at least 20 RILs exhibited the Mo17 genotype. To remove SNPs showing extreme segregation distortion, we also required that the ratio of the number of RILs carrying the more frequent allele to the number carrying the less frequent allele not exceed 2.5. We further filtered out any segregating SNPs that were not discovered in the Mo17 RNA-seq data. After filtering, 176,279 SNPs remained.
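The segregation filter can be expressed compactly in code. The following Python sketch is illustrative only; the function name `is_segregating` and the genotype coding ("B" for B73, "M" for Mo17, `None` for missing) are assumptions, whereas the two thresholds are taken from the text.

```python
def is_segregating(genotypes, min_per_allele=20, max_ratio=2.5):
    """Apply the per-SNP segregation criteria to a list of RIL genotypes.

    Genotype coding is hypothetical: "B" = B73, "M" = Mo17, None = missing.
    """
    n_b73 = genotypes.count("B")
    n_mo17 = genotypes.count("M")
    # Each parental allele must be observed in at least 20 RILs.
    if n_b73 < min_per_allele or n_mo17 < min_per_allele:
        return False
    # The major:minor allele ratio must not exceed 2.5.
    return max(n_b73, n_mo17) / min(n_b73, n_mo17) <= max_ratio


# A SNP scored in 100 of 105 RILs with a 60:40 split passes both criteria.
passes = is_segregating(["B"] * 60 + ["M"] * 40 + [None] * 5)
```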

Segmentation to Identify Chimeric Chromosomal Structure of Each RIL

The scores of each filtered SNP marker were converted to 1 (B73 genotype) or 0 (Mo17 genotype). The converted binary data were subjected to segmentation with the R package DNAcopy (Olshen et al. 2004) using the following parameters: alpha = 0.01, nperm = 10000, p.method = “perm”, eta = 0.01, min.width = 3. These parameters required each generated segment to contain at least three SNP markers and a median absolute deviation equal to 0. We also required each segment to be at least 200 kb and to have a mean of ≥0.9 for a B73 segment or ≤0.1 for a Mo17 segment. During segmentation, any segments >2 Mb that were adjacent to a segment of the same genotype were merged into a single segment. This generated 8,624 segments summed across the 105 RILs. Segments from all RILs were merged, and unique marker positions across the RILs were used to define the beginning and end of 7,867 segmental markers. A merged segment contained only a single genotype in each RIL: B73, Mo17, or recombinant. Adjacent recombinant segments are termed recombinant breakpoint intervals (RBI).
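The post-segmentation criteria can be sketched as a simple classifier applied to each segment returned by DNAcopy. This Python sketch does not perform the segmentation itself, and it omits the median absolute deviation criterion; the function name `classify_segment` and its arguments are hypothetical.

```python
def classify_segment(start_bp, end_bp, marker_scores,
                     min_markers=3, min_length=200_000):
    """Assign a parental genotype to one segment, or None if it fails.

    marker_scores holds the binary SNP scores in the segment
    (1 = B73, 0 = Mo17). The MAD = 0 criterion is not checked here.
    """
    if len(marker_scores) < min_markers:
        return None
    if end_bp - start_bp < min_length:
        return None
    mean_score = sum(marker_scores) / len(marker_scores)
    if mean_score >= 0.9:
        return "B73"
    if mean_score <= 0.1:
        return "Mo17"
    return None


# A 500 kb segment with 19 of 20 markers scored as B73 is called B73.
call = classify_segment(0, 500_000, [1] * 19 + [0])
```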

Constructing the Genetic Map

The software MSTmap (http://alumni.cs.ucr.edu/~yonghui/mstmap.html, last accessed September 11, 2018) was used to build a genetic map with the 7,867 segmental markers with B73 and Mo17 genotypes (recombinant genotypes were coded as missing) with the following parameters: population_type DH, distance_function kosambi, cut_off_p_value 0.000001, no_map_dist 15.0, no_map_size 2, missing_threshold 0.6, estimation_before_clustering yes, detect_bad_data yes, and objective_function COUNT. A total of 13 markers had large disagreements between their physical and genetic positions and were removed from the segmental marker file. We then reran MSTmap with the filtered 7,854 segmental markers using the same parameters. All 7,854 segmental markers remained in the constructed genetic map, which contained 11 linkage groups. Two separate groups in which all markers were from chromosome 5 were concatenated.

Exploration of the Relationship between Crossover and Syntenic Retention

Genes within syntenic blocks (syntenic genes) between maize and sorghum have previously been determined (Schnable et al. 2011). Each chromosome of the B73 reference genome was divided into nonoverlapping 1 Mb bins. The total syntenic and nonsyntenic gene space in each bin was determined (ZmB73_5b_FGS). The amount of recombination in each bin, representing primarily crossovers, was determined by inferring the genetic start and end positions of each bin using a GAM function (Liu et al. 2009). The genetic distance between the start point and the end point was used to represent the crossover frequency of each bin.
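The per-bin calculation can be illustrated as follows. Note that this Python sketch substitutes simple linear interpolation between mapped markers for the GAM fit used in the study, so it conveys only the structure of the computation; the function names and the toy marker positions are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(pos, phys, gen):
    """Genetic position (cM) at physical position pos by linear interpolation.

    phys and gen are parallel, sorted lists of marker positions (bp, cM).
    Linear interpolation stands in for the GAM fit; this is an assumption.
    """
    if pos <= phys[0]:
        return gen[0]
    if pos >= phys[-1]:
        return gen[-1]
    j = bisect_right(phys, pos)
    frac = (pos - phys[j - 1]) / (phys[j] - phys[j - 1])
    return gen[j - 1] + frac * (gen[j] - gen[j - 1])


def bin_crossover_frequency(phys, gen, chrom_length, bin_size=1_000_000):
    """Genetic length (cM) of each nonoverlapping bin along one chromosome."""
    lengths = []
    for start in range(0, chrom_length, bin_size):
        end = min(start + bin_size, chrom_length)
        lengths.append(interpolate_cm(end, phys, gen)
                       - interpolate_cm(start, phys, gen))
    return lengths


# Toy chromosome: 4 Mb, three markers, recombination higher in the distal half.
cm_per_bin = bin_crossover_frequency(
    [0, 2_000_000, 4_000_000], [0.0, 1.0, 5.0], 4_000_000)
```

The resulting per-bin genetic lengths can then be correlated with the syntenic and nonsyntenic gene space in the same bins.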

Identification of Recombinant Alleles

The segmentation results from each individual RIL were used to identify RBIs. A total of 7,574 RBIs were identified across the RILs. The number of RBIs is lower than the number of segmental markers because an RBI may contain multiple segments due to missing data. The sizes of RBIs vary depending on the surrounding informative SNP markers. Intragenic crossovers were defined as RBIs that occurred within a gene (ZmB73_5b_FGS). In this case, genes were defined as including an additional 1 kb of sequence upstream and downstream to include regulatory elements. Recombinant alleles were defined as alleles of these genes that include both B73 and Mo17 sequences. Most RBIs overlap with a gene (7,535/7,574) and a substantial fraction map to within a gene (848/7,574). Some RBIs occur within more than one overlapping gene. These RBIs were only counted once. Enrichment of RBIs in hotspot genes (genes with two or more intragenic recombination events, N = 13,298) was tested by permutation. Only genes with two or more SNPs were included as potential genes with intragenic crossover. The probability of a gene having an RBI was determined in two ways. First, the probability was calculated as the distance between the first and last SNP in a gene divided by the sum of this distance for all genes. This method assumes crossovers are distributed equally across a gene. However, we show that crossovers are enriched at the 5′ end of genes. Therefore, the second probability was modified in that any base pairs in the 5′ end of a gene were counted twice, and the sum of the distance between all SNPs was adjusted accordingly. For each RIL, genes were randomly selected from this filtered set without replacement, weighted by the probability of each gene having an RBI, with the number of genes drawn equal to the number of intragenic RBIs observed in that RIL. The number of random genes with an RBI event and the number of random hotspot genes were recorded. This process was repeated 1,000 times.
The P-value for the proportion of hotspot genes was determined as the fraction of permutations in which the proportion of hotspot genes among all randomly selected genes with intragenic crossovers was greater than the proportion of hotspot genes among all observed genes with intragenic crossovers.
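The permutation scheme described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used in the study: weighted sampling without replacement is implemented with random keys (the Efraimidis-Spirakis approach), a permutation counts toward the P-value when its hotspot proportion is at least as large as the observed one, and all function and argument names are hypothetical.

```python
import random
from collections import Counter


def weighted_sample_without_replacement(rng, genes, weights, k):
    """Draw k distinct genes with probability related to positive weights."""
    # Each gene gets the key u ** (1 / w); the k largest keys are selected.
    keyed = sorted(
        ((rng.random() ** (1.0 / w), g) for g, w in zip(genes, weights)),
        reverse=True,
    )
    return [g for _, g in keyed[:k]]


def hotspot_permutation_p(gene_spans, rbis_per_ril, observed_fraction,
                          n_perm=1000, seed=0):
    """Fraction of permutations whose hotspot proportion >= observed.

    gene_spans: dict of gene -> distance (bp) between its first and last SNP.
    rbis_per_ril: number of intragenic RBIs observed in each RIL.
    """
    rng = random.Random(seed)
    genes = list(gene_spans)
    weights = [gene_spans[g] for g in genes]
    n_extreme = 0
    for _ in range(n_perm):
        hits = Counter()
        for k in rbis_per_ril:
            hits.update(
                weighted_sample_without_replacement(rng, genes, weights, k))
        # Hotspot genes are those drawn in two or more RILs.
        n_hotspot = sum(1 for count in hits.values() if count >= 2)
        fraction = n_hotspot / len(hits) if hits else 0.0
        if fraction >= observed_fraction:
            n_extreme += 1
    return n_extreme / n_perm
```

The 5′-weighted variant of the test would change only the values supplied in `gene_spans`, with base pairs at the 5′ end of each gene counted twice.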

Transgressive Transcript Accumulation of Recombinant Alleles

The transcript accumulation of each recombinant allele in a particular RIL was compared with the transcript accumulations of the nonrecombinant alleles contained in other RILs and the two parental inbreds, B73 and Mo17. Recombinant alleles were defined as exhibiting transgressive transcript accumulation if their transcript accumulation was >5% above that of the highest nonrecombinant allele or >5% below that of the lowest nonrecombinant allele. To determine if transgressive transcript accumulation occurs by chance, permutations were performed 1,000 times. For all genes with one or more recombinant alleles, the transcript accumulation of an equal number of alleles randomly selected from all RILs was tested for transgressive transcript accumulation. The number of random alleles with transgressive transcript accumulation for each gene was recorded and used as the null distribution. The observed number of genes with transgressive transcript accumulation from the real data was compared with this null distribution to obtain a P-value. The same process was performed to test for enrichment of up or down transgressive transcript accumulation separately.
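The transgressive criterion can be written as a short function. The sketch below assumes that the 5% margin is applied relative to the most extreme nonrecombinant values (i.e., more than 5% above the maximum or more than 5% below the minimum); the function name `classify_transgressive` and the example values are hypothetical.

```python
def classify_transgressive(recombinant_expr, nonrecombinant_exprs, margin=0.05):
    """Return "up", "down", or None for one recombinant allele.

    nonrecombinant_exprs: transcript accumulation of the nonrecombinant
    alleles of the same gene in the other RILs and the two parents.
    """
    highest = max(nonrecombinant_exprs)
    lowest = min(nonrecombinant_exprs)
    if recombinant_expr > highest * (1 + margin):
        return "up"
    if recombinant_expr < lowest * (1 - margin):
        return "down"
    return None


# Nonrecombinant alleles range from 2 to 10 (arbitrary expression units).
status = classify_transgressive(11.0, [2.0, 5.0, 10.0])
```

In a permutation test, the same function would be applied to randomly selected alleles to build the null distribution.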

Discovery of GC Events

To discover a set of possible GC events, that is, nonreciprocal exchanges of genetic information, the SNPs with <5% missing data were further filtered. Identification of aGC events is highly dependent on missing data in a particular RIL; thus, only the 97 RILs with >100k genotyped SNPs were included in this analysis. The genotype of each SNP was compared with the genotype of the segment in which the SNP was located. A SNP showing the alternate genotype to the corresponding segment is a SNP affected by an aGC. aGCs that were observed in >5 RILs were removed as they may be due to systematic genotyping or sequencing errors. To obtain a consensus genotype for aGCs, the aGCs of each RIL were subjected to segmentation using DNAcopy software with the criteria (alpha = 0.01, nperm = 10000, p.method = “perm”, eta = 0.01, min.width = 2). The input data for DNAcopy are 1 and 0, representing aGC and not aGC of each of the SNP markers, respectively. Clusters of aGCs occurring in blocks >2 kb were excluded as these likely represent either double crossovers or regions incorrectly placed within the current maize pseudomolecules. The rate of GC per generation was calculated as the number of markers with an aGC divided by the total number of successfully genotyped markers divided by the number of generations. As RILs become more inbred, aGCs are less likely to be detected. Thus, the number of generations in the RILs was modified as the number of rounds of intermating (N = 4) plus a modified value for the number of generations of selfing, calculated as Σ(1/2)^i summed over the selfing generations i, as described by Shi et al. (2010) (N = 2), for a total of six generations. The permutation tests to check for enrichment of GC events in certain genes were performed on RILs with <5% missing data and a filtered SNP set that only contains SNPs with a Mo17 genotype and a nearby SNP within 2 kb. Genes with SNPs that met these criteria were selected as genes with the potential for aGC.
Permutations were performed by selecting a random set of genes equal to the observed number of genes with aGC per RIL. The number of unique genes randomly selected across RILs was determined and compared with the actual number of unique genes identified with GC in all RILs. This process was repeated 1,000 times. The mutation rate of the population was estimated using the set of 97 RILs having low missing data (see above). Only SNP sites with <5% missing data and for which the Mo17 and B73 inbreds carry the same allele were used for this estimate. Sites at which a nonB73 allele was identified and was present in ≤5 RILs were treated as putative mutations. The mutation rate was calculated as the number of putative mutations divided by the total number of genotyped sites with enough reads to make a SNP call (32,105/2,974,824,916 = 1.08 × 10⁻⁵). For mutation to cause a GC-like event, at least two sites must be mutated (1.08 × 10⁻⁵ squared) and each site must be mutated to the same allele (multiplied by 1/2 squared). This estimate assumes that mutations occur independently, which may not be accurate, and does not take into account the fact that the two mutations must be adjacent for a GC event to be called. It should thus be treated as a rough estimate of the improbability of mutations causing an aGC event in this population.
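The arithmetic behind these rate estimates can be made explicit with a short Python sketch. The mutation counts are those reported above; the function names are hypothetical, and the GC-rate function is shown without data because the total number of genotyped markers is not restated in this section.

```python
def effective_generations(intermating_rounds=4, selfing_equivalent=2):
    """Generations used as the denominator: 4 intermating + 2 selfing-adjusted."""
    return intermating_rounds + selfing_equivalent


def gc_rate(n_agc_markers, n_genotyped_markers, generations):
    """GC rate per marker per generation, as defined in the text."""
    return n_agc_markers / n_genotyped_markers / generations


# Per-site mutation rate from the reported counts.
mutation_rate = 32_105 / 2_974_824_916

# Probability that two independent mutations mimic a GC event: both sites
# mutate (rate squared) and both mutate to the same allele (1/2 squared).
p_mutation_mimics_gc = mutation_rate ** 2 * (1 / 2) ** 2

print(f"generations = {effective_generations()}")
print(f"mutation rate = {mutation_rate:.3g}")
print(f"P(two mutations mimic an aGC) = {p_mutation_mimics_gc:.2g}")
```

Under the stated independence assumption, the resulting probability is on the order of 10⁻¹¹, several orders of magnitude below the estimated GC rate.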

70 in total

1. The recombination landscape in Arabidopsis thaliana F2 populations.

Authors: P A Salomé; K Bomblies; J Fitz; R A E Laitinen; N Warthmann; L Yant; D Weigel
Journal: Heredity (Edinb) Date: 2011-11-09 Impact factor: 3.821

2. Genetic dissection of intermated recombinant inbred lines using a new genetic map of maize.

Authors: Yan Fu; Tsui-Jung Wen; Yefim I Ronin; Hsin D Chen; Ling Guo; David I Mester; Yongjie Yang; Michael Lee; Abraham B Korol; Daniel A Ashlock; Patrick S Schnable
Journal: Genetics Date: 2006-09-01 Impact factor: 4.562

3. Decreasing gradients of gene conversion on both sides of the initiation site for meiotic recombination at the ARG4 locus in yeast.

Authors: N P Schultes; J W Szostak
Journal: Genetics Date: 1990-12 Impact factor: 4.562

4. Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss.

Authors: James C Schnable; Nathan M Springer; Michael Freeling
Journal: Proc Natl Acad Sci U S A Date: 2011-02-22 Impact factor: 11.205

5. Meiotic recombination break points resolve at high rates at the 5' end of a maize coding sequence.

Authors: X Xu; A P Hsia; L Zhang; B J Nikolau; P S Schnable
Journal: Plant Cell Date: 1995-12 Impact factor: 11.277

Review 6. Progress toward understanding heterosis in crop plants.

Authors: Patrick S Schnable; Nathan M Springer
Journal: Annu Rev Plant Biol Date: 2013-02-06 Impact factor: 26.379

7. Epigenetic and genetic influences on DNA methylation variation in maize populations.

Authors: Steven R Eichten; Roman Briskine; Jawon Song; Qing Li; Ruth Swanson-Wagner; Peter J Hermanson; Amanda J Waters; Evan Starr; Patrick T West; Peter Tiffin; Chad L Myers; Matthew W Vaughn; Nathan M Springer
Journal: Plant Cell Date: 2013-08-06 Impact factor: 11.277

8. Maize genome structure variation: interplay between retrotransposon polymorphisms and genic recombination.

Authors: Hugo K Dooner; Limei He
Journal: Plant Cell Date: 2008-02-22 Impact factor: 11.277

9. Structure of a viral cap-independent translation element that functions via high affinity binding to the eIF4E subunit of eIF4F.

Authors: Zhaohui Wang; Krzysztof Treder; W Allen Miller
Journal: J Biol Chem Date: 2009-03-10 Impact factor: 5.157

10. Spreading of heterochromatin is limited to specific families of maize retrotransposons.

Authors: Steven R Eichten; Nathanael A Ellis; Irina Makarevitch; Cheng-Ting Yeh; Jonathan I Gent; Lin Guo; Karen M McGinnis; Xiaoyu Zhang; Patrick S Schnable; Matthew W Vaughn; R Kelly Dawe; Nathan M Springer
Journal: PLoS Genet Date: 2012-12-13 Impact factor: 5.917
