High expression in maize pollen correlates with genetic contributions to pollen fitness as well as with coordinated transcription from neighboring transposable elements.

Cedar Warman, Kaushik Panda, Zuzana Vejlupkova, Sam Hokin, Erica Unger-Wallace, Rex A Cole, Antony M Chettoor, Duo Jiang, Erik Vollbrecht, Matthew M S Evans, R Keith Slotkin, John E Fowler.

Abstract

In flowering plants, gene expression in the haploid male gametophyte (pollen) is essential for sperm delivery and double fertilization. Pollen also undergoes dynamic epigenetic regulation of expression from transposable elements (TEs), but how this process interacts with gene expression is not clearly understood. To explore relationships among these processes, we quantified transcript levels in four male reproductive stages of maize (tassel primordia, microspores, mature pollen, and sperm cells) via RNA-seq. We found that, in contrast with vegetative cell-limited TE expression in Arabidopsis pollen, TE transcripts in maize accumulate as early as the microspore stage and are also present in sperm cells. Intriguingly, coordinate expression was observed between highly expressed protein-coding genes and their neighboring TEs, specifically in mature pollen and sperm cells. To investigate a potential relationship between elevated gene transcript level and pollen function, we measured the fitness cost (male-specific transmission defect) of GFP-tagged coding sequence insertion mutations in over 50 genes identified as highly expressed in the pollen vegetative cell, sperm cell, or seedling (as a sporophytic control). Insertions in seedling genes or sperm cell genes (with one exception) exhibited no difference from the expected 1:1 transmission ratio. In contrast, insertions in over 20% of vegetative cell genes were associated with significant reductions in fitness, showing a positive correlation of transcript level with non-Mendelian segregation when mutant. Insertions in maize gamete expressed2 (Zm gex2), the sole sperm cell gene with measured contributions to fitness, also triggered seed defects when crossed as a male, indicating a conserved role in double fertilization, given the similar phenotype previously demonstrated for the Arabidopsis ortholog GEX2. 
Overall, our study demonstrates a developmentally programmed and coordinated transcriptional activation of TEs and genes in pollen, and further identifies maize pollen as a model in which transcriptomic data have predictive value for quantitative phenotypes.

Year:  2020        PMID: 32236090      PMCID: PMC7112179          DOI: 10.1371/journal.pgen.1008462

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   5.917


Introduction

Sexual reproduction enables the segregation and recombination of genetic material, which increases genetic diversity in populations and contributes to the vast diversity of eukaryotes. In flowering plants, sexual reproduction requires the development of reduced, haploid gametophytes from sporophytic, diploid parents. The mature female gametophyte, the embryo sac, includes the binucleate central cell and the egg cell (reviewed in [1,2]), each of which is fertilized by a sperm cell to generate the triploid endosperm and diploid embryo, respectively. The mature male gametophyte, pollen, consists of a vegetative cell harboring two sperm cells (reviewed in [3,4]). In maize, male gametophytes arise from microspore mother cells in the tassel primordium. The transition from diploid sporophyte to haploid gametophyte occurs when these cells undergo meiosis, each resulting in four haploid microspores. Each microspore then undergoes two rounds of mitosis to produce the pollen grain, first generating the large vegetative cell and a smaller generative cell via asymmetric division, and then producing the two sperm from the generative cell. After the arrival of the pollen grain on the floral stigma, the vegetative cell transports the two sperm cells to the female gametophyte via pollen tube growth (reviewed in [5,6]). Accurate navigation of the pollen tube as it grows down the style is dependent on the architecture of the style's transmitting tract [7] and additional signaling and recognition mechanisms that are poorly understood [8]. The final stages of pollen tube growth depend on a complex interplay of signals to guide the pollen tube to the micropyle of the ovule for sperm delivery to the embryo sac (reviewed in [9]). In maize, a pollen tube must grow up to 30 cm through the silk to reach the female gametophyte, often competing with multiple pollen tubes to eventually enter the embryo sac and release its sperm cells for fertilization (reviewed in [2,5]). 
Across the angiosperms, this competitive context for pollen tube development differs, depending on the pollen population as well as sporophytic characters (reviewed in [10]). In a highly competitive environment, successful fertilization is likely enhanced by pollen tubes functioning at full capacity [11-13], as generally only the first tube to reach the micropyle is permitted to enter the female gametophyte. The mechanisms preventing entry of multiple pollen tubes, known as the polytubey block, are not well understood, but presumably act to reduce polyspermy, which typically leads to sterile offspring [6]. In maize, mutations in the genes MATRILINEAL/NLD/ZmPLA1 and ZmDMP have been linked to pollen-induced production of haploid embryos and other seed defects, which are likely associated with aberrant events at fertilization [14-17] or soon after [18]. Thus, many mechanisms associated with both pollen tube growth and fertilization remain enigmatic.
Given their specialized biological functions and well-defined developmental stages, gametophytes are prime targets for transcriptome analysis. Initial studies of plant gametophytic transcriptomes in Arabidopsis pollen [19,20] and embryo sacs [21,22] described a limited and specialized set of transcripts and identified numerous candidate genes for gametophytic function. In maize, the first RNA-seq study of male and female gametophyte transcriptomes (mature pollen and embryo sacs) similarly identified subsets of developmentally specific genes, with pollen showing the most specialized transcriptome relative to other tissues assessed [23]. More recently, RNA-seq has been carried out on additional stages of maize reproductive development, including pre-meiotic and meiotic anther cells [24-26], as well as sperm cells, egg cells, and early stages of zygotic development [27].
Gametophytic tissues are known to show dynamic expression of transposable elements (TEs). In Arabidopsis, global TE expression is derepressed at the late stages of pollen development, occurring in the pollen vegetative nucleus only after pollen mitosis II [28]. The pollen vegetative nucleus undergoes a programmed loss of heterochromatin, resulting in TE activation, TE transposition and subsequent increased RNA-directed DNA methylation [28-32]. A variety of functions have been ascribed to this male gametophytic "developmental relaxation of TE silencing" (DRTS) event [33], including the generation of TE small interfering RNAs that are mobilized to the sperm cells [34], and control of imprinted gene expression after fertilization [35]. However, the dynamics of TE expression during gametophytic development in a transposable element-rich species such as maize have not been investigated.
To provide a fuller description of transcriptome dynamics across maize male reproductive development, including TE transcriptional activity, we generated RNA-seq datasets from tassel primordia, microspores, mature pollen, and isolated sperm cells. Using these data, we describe differential expression patterns of genes and TEs across these stages, uncovering a coordinated regulation of TEs and their neighboring genes in pollen grains. Then, within a framework provided by the transcriptome data, we conducted a functional validation of highly expressed genes by testing over fifty insertional mutations for male-specific fitness effects. Finally, the same transcriptome data guided the discovery of mutant alleles in the sperm cell-enriched gex2, which induces seed development defects when present in the pollen parent, implying a role in fertilization.

Results

Experimental design and gene expression during maize male reproductive development

RNA-seq was performed on four tissues representing integral stages in maize male gametophyte development: immature tassel primordia (TP), isolated unicellular microspores (MS), mature pollen (MP), and isolated sperm cells (SC) (Fig 1A). Techniques were developed to efficiently isolate RNA from TP, MS, and SC (see Methods). RNA was extracted from the inbred maize line B73, with four biological replicates for each tissue. In addition, a single RNA replicate was isolated for the bicellular stage of pollen development (MS-B). Libraries were sequenced using Illumina sequencing (100 bp paired-end reads) and mapped to the B73 AGPv4 reference genome [36]. Principal Component Analysis (PCA) showed samples from each tissue clustering together along PC1 and PC2, which together explained 49.8% of the variance between samples (Fig 1B). One sample, SC1, had significant levels of ribosomal RNA (rRNA) contamination, as well as the lowest number of mapped reads (approximately 1 million). However, to maintain a balanced experimental design with a consistent false discovery rate (FDR), we chose to include SC1 in our analysis of gene expression patterns.
Fig 1

Experimental design and characteristics of maize male reproductive transcriptomes.

(A) mRNA was isolated from four developmental stages of maize male reproductive development, with four biological replicates for each: pre-meiotic tassel primordia (TP), post-meiotic, unicellular microspores (MS), mature pollen (MP), and sperm cells (SC). A single biological replicate of mRNA from the bicellular stage of pollen development was also isolated and sequenced (MS-B, not shown here). Nuclei were stained with either DAPI or Dyecycle green. (B) Principal component analysis of genic transcriptomic data generated by this study, showing the 2 major components (explaining 49.8% of the variance) on x- and y-axis. The four biological replicates of each of the four sequenced tissues clustered with other replicates from the same tissue. TP and MS were clearly separated in principal component space, whereas SC and MP samples displayed less separation from each other. (C) High expression levels are associated with developmental specificity: approximately 2/3 of the genes associated with the highest FPKM values in each of the four sample types are highly expressed in only that sample type.

Differential gene expression was defined in two ways: in the first, gene expression in later developmental stages was compared to the premeiotic, diploid tassel primordia (TP vs MS, TP vs MP, and TP vs SC); in the second, gene expression was compared between all adjacent developmental stages (TP vs MS, MS vs MP, MS vs SC, MP vs SC) (S2 and S11 Tables). Enriched GO terms highlighted the differences in gene expression among developmental stages and suggested consistency with the established functions of each tissue [19,23,27]. GO terms in MS were consistent with a post-meiotic tissue still at an early stage of development, with terms related to protein synthesis and transport, morphogenesis, and reproduction showing enrichment. MP showed more specific enriched GO terms, including those related to pollen tube growth, signaling, and actin filament-based movement.
SC shared many GO terms with MP when compared to MS, but was uniquely enriched for GO terms related to epigenetic regulation of gene expression, such as histone H3K9 demethylation and gene silencing by RNA, potential mechanisms involved in differential regulation of TEs. Comparison of the most highly expressed genes from all four sample types showed that, generally, such transcripts were highly enriched at a single developmental stage (Fig 1C). Notably, no single gene was highly expressed in all four tissues, and fewer than twenty were highly expressed in three out of the four stages assessed. These data suggest a simple hypothesis in which the expression level for protein-coding genes reflects, at some level, functional importance, i.e., a high expression level at a specific developmental stage implies an increased contribution by the gene’s encoded function at that particular stage. Alternatively, or in addition, high expression could be reflective of regulatory mechanisms specific to each stage, each primarily influencing specific subsets of genes. We were interested to explore the possibility of some regulatory linkage between those genes and TEs highly expressed in the male gametophyte.

A subset of transposable elements in the maize genome show developmentally dynamic expression

To obtain a baseline view of TE expression throughout maize development, our RNA-seq data for maize male reproductive development (samples with asterisks, S1 Fig) were combined with publicly available datasets from nine-day-old above-ground seedlings, juvenile leaves, ovules, another set of independently isolated sperm cells, and three independent studies of pollen RNA-seq ([23,27,37,38] & SRP067853). The complete list of samples, their sequencing statistics, references and data availability can be found in S3 Table. All of the raw data were remapped using the same parameters (see Methods). Principal component analysis demonstrates that replicates of the same tissue and growth state typically group together (S1 Fig). We aimed to identify the set of dynamically expressed TEs within the tissues sampled, and thus calculated expression levels for each individual TE in the genome located more than 2 kb away from annotated non-TE genes. Our rationale was to avoid false positive signals of TE expression due to a TE residing within a gene, and to minimize the influence of read-through transcription from a nearby gene, which could not be distinguished from TE-initiated transcription. Because we concentrated on individual elements, and not TE families, the majority of annotated TEs were not assessed in this analysis (55%; Fig 2A ‘not covered’), either because no expression was detected in any dataset, or because their sequence lacks the polymorphisms necessary for mapping reads to a specific TE. To relate TE expression comparatively across development, we used seedling tissue as a baseline against which other tissues were measured. Seedling was chosen for several reasons: it is not a reproductive tissue, it has low to average levels of TE expression, and a large number of TEs show no evidence of expression in this tissue (S2 Fig).
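The 2 kb distance criterion described above can be illustrated with a minimal sketch. This is not the pipeline used in the study: the tuple layout, the function names `gap` and `partition_tes`, and the handling of a TE lying exactly 2,000 bp from a gene (treated here as 'near genes') are all assumptions made for illustration.

```python
def gap(a_start, a_end, b_start, b_end):
    """Distance in bp between two intervals; 0 if they overlap."""
    return max(0, b_start - a_end, a_start - b_end)


def partition_tes(tes, genes, min_gap=2000):
    """Split TEs into those farther than min_gap bp from every gene
    and those within min_gap bp of at least one gene.

    tes, genes: lists of (chrom, start, end, name) tuples (hypothetical layout).
    Returns (far_names, near_names).
    """
    far, near = [], []
    for chrom, start, end, name in tes:
        # Only genes on the same chromosome can be neighbors.
        gaps = [gap(start, end, g_start, g_end)
                for g_chrom, g_start, g_end, _ in genes if g_chrom == chrom]
        nearest = min(gaps, default=float("inf"))
        # Assumption: exactly min_gap bp away counts as 'near genes'.
        (far if nearest > min_gap else near).append(name)
    return far, near
```

Under these assumptions, the 'far' list corresponds to the elements assessed for developmentally dynamic expression, and the 'near' list to the 'near genes' set that is analyzed separately.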
Fig 2

Characterization of developmentally dynamic transcription from transposable elements (TEs).

(A) Distribution of different categories of TEs based on their expression. Numbers of TEs are in parentheses. (B) Length of TEs in the different TE categories from panel A. The violin plots around the box show the kernel probability density of the data. The box represents lower and upper quartile, the line is the median, and the whiskers represent 10–90% range. Red asterisk denotes the mean. (C) Observed / expected Log2 ratios of TE family proportions in the different TE categories from panel A. Grey indicates no data. (D) Distance from the annotated centromere for different TE categories from panel A.

Apart from the 18.3% of annotated TEs that are near genes and analyzed separately (see below), we calculated the number of TEs with statistically significant expression differences in each tissue compared to the seedling reference. This identified the subset of TEs that are developmentally dynamic, meaning that they show differential expression in at least one tissue in our dataset compared to the seedling reference. Only 4.4% of all maize annotated TEs are developmentally dynamic, whereas 22.2% of TEs have detectable expression but do not change in our dataset and therefore are developmentally ‘static’ (Fig 2A). Each TE category was interrogated for feature overrepresentation. Both dynamic and static TEs are longer than the genome average, and longer than the sets of TEs ‘not covered’ or ‘near genes’ (Fig 2B). To determine if one long family of TEs was contributing this difference, we performed this analysis for each TE superfamily and found that for dynamic TEs, this observation is not specific to one TE type (S3 Fig). The finding that expressed TEs as a group are longer correlates with Arabidopsis data where longer TE transcripts are overrepresented and differentially regulated when epigenetic repression is lost [39]. Expressed TEs show an under-representation for DNA transposon and SINE families, which are mainly within the ‘near genes’ set (Fig 2C).
In contrast, the ‘LTR unknown’ TE annotation is over-represented in the dynamic TE set (Fig 2C). Since some LTR retrotransposons are enriched in the pericentromere [40], we tested if the dynamic TE set is enriched in the pericentromere compared to the genome average, but did not detect any correlation (Fig 2D). Therefore, we conclude that expressed TEs are generally longer elements, and the subset of developmentally dynamic TEs are enriched for uncharacterized LTR retrotransposons located throughout the genome.
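The over- and under-representation of TE families referred to above (Fig 2C) is expressed as a log2 observed/expected ratio. The following is a minimal sketch of that calculation, assuming one family label per TE; the function name `log2_obs_exp` and the input layout are hypothetical, and the choice to return `None` for a family absent from a category (rather than an undefined log of zero) is an assumption.

```python
import math
from collections import Counter


def log2_obs_exp(category_families, genome_families):
    """Log2 ratio of a family's proportion in one TE category (observed)
    to its proportion among all TEs considered (expected).

    Both arguments are lists holding one family label per TE.
    Families absent from the category map to None.
    """
    if not category_families or not genome_families:
        raise ValueError("both TE lists must be non-empty")
    cat = Counter(category_families)
    gen = Counter(genome_families)
    n_cat, n_gen = len(category_families), len(genome_families)
    ratios = {}
    for family, genome_count in gen.items():
        observed = cat.get(family, 0) / n_cat
        expected = genome_count / n_gen
        ratios[family] = math.log2(observed / expected) if observed > 0 else None
    return ratios
```

A positive value indicates that a family makes up a larger share of the category than of the background set, and a negative value indicates the reverse.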

Transposable element transcript levels are up-regulated in the post-meiotic male reproductive lineage

From the developmentally dynamic TE set, we calculated the number of differentially expressed TEs in each tissue/stage compared to the seedling reference. In some tissues, such as tassel primordia and ovules, we observed a similar number of TEs up-regulated and down-regulated (Fig 3A), demonstrating that while there are shifts in which TEs are expressed, a genome-scale change in TE expression does not occur. In other tissues, such as juvenile leaves, there is a skew towards increased TE expression. The largest TE up-regulation occurs in the tissues of the male reproductive lineage, including unicellular and bicellular microspores, mature pollen and isolated sperm cells (Fig 3A). Our data confirm the recent finding that the tissue with the largest number of TE families activated is mature pollen [41]. The number of up-regulated TEs compared to down-regulated TEs in these tissues suggests that there is a genome-wide activation of TE expression, similar to the DRTS event that occurs in Arabidopsis pollen [28,33]. One important distinction is that TE expression is present in maize sperm cells (Fig 3A), whereas it is not detected in Arabidopsis sperm cells [28]. To verify this finding, we compared our sperm cell RNA-seq data to an independent maize sperm cell dataset [27]. We found that TEs are also significantly expressed in this independent dataset, and 70% of those expressed TEs are also detected in our dataset (p<0.001) (Fig 3B). This shared set of 810 sperm cell-expressed TEs (38% of those detected in our dataset) supports the conclusion that significant expression of TEs occurs in maize sperm cells. Of the sperm cell-expressed TEs, 36% were not observable in total pollen, but rather required the isolation and enrichment of sperm cells for detection (Fig 3B). Overall, we detect 157 TEs expressed in both sperm cell datasets that are not expressed throughout development, but specifically in the sperm cells (sperm-cell exclusive).
Fig 3

High TE expression in the maize male gametophyte lineage.

(A) Number of differentially expressed TEs in seven tissues compared to seedlings. The inset volcano plot shows for mature pollen how differentially expressed TEs were identified. Green and red numbers within the volcano plot indicate how many TEs were statistically up- or down-regulated, respectively. (B) Number of up-regulated TEs in mature pollen compared to isolated sperm cells from this study and a previously published distinct isolation and sequencing of sperm cell mRNA. (C) Starting with TEs differentially up-regulated in unicellular microspores (boxed, far left volcano plot), we determined how many of these same TEs are expressed at other developmental time points. (D) Raw distribution of expressed TE family annotations. ‘Male’ refers to the set of TEs expressed in any male lineage dataset (MS, MS-B, MP, SC). ‘Male specific’ are TEs expressed in only the male lineage (not other tissues / timepoints). ‘Male extensive’ TEs are expressed in all of the male lineage datasets, and ‘male extensive + specific’ refers to TEs expressed in all male lineage datasets and not other tissues / timepoints. ‘High-confidence sperm’ refers to TEs identified as expressed in both analyzed sperm cell datasets from part B. (E) Expressed TE family annotations normalized to the genome-wide TE distribution of TEs >2 kb from genes. Categories are the same as part D.

A second notable difference between maize and Arabidopsis is the activation of TE expression early in the male gametophytic phase of maize. A genome-wide increase in TE transcript levels is detected at the earliest post-meiotic stage tested, the microspore, in contrast to low TE expression in the sporophytic tassel primordia (Fig 3A). Arabidopsis TE expression occurs only late in pollen development, after pollen mitosis I when the vegetative cell is generated [28].
To determine if TEs were indeed activated early in maize male reproductive development, we asked if the same TEs that we identified as expressed in the unicellular microspore remain active throughout the male reproductive lineage. We used the set of differentially expressed up-regulated TEs in unicellular microspores (3,335) and found that 62% are still expressed in bicellular microspores and 54% in mature pollen (Fig 3C), demonstrating that once TEs are activated early in development, expression and/or steady-state mRNA frequently remains through pollen maturation. Only some of these male-lineage expressed TEs continue to be expressed in sperm cells (32%), raising the possibility that many TEs with active expression in the early gametophytic stages are under negative/repressive regulation in the gametes. This large-scale developmental activation is potentially limited to the male lineage, as ovules express relatively few TEs (Fig 3A) and only 14% of the male lineage-expressed elements (Fig 3C). Together, our data demonstrate conserved activation of TE expression in the male gametophytes of maize and Arabidopsis, with key differences such as the developmental timing and localization of TE expression in the gamete cells. We determined what types of TEs activate in the male reproductive lineage and sperm cells and compared these to the whole-genome distribution of TEs analyzed. Overall, both male lineage-expressed TEs and sperm cell-expressed TEs reflect the genome-wide TE distribution (Fig 3D). This suggests that TE family type does not have a determining role in the developmental regulation of TE expression. One notable exception is the enrichment of Mutator family TE expression in sperm cells (Fig 3D). When normalized for genome-wide TE distribution, Mutator element expression is highly enriched across the male lineage, including in sperm cells (Fig 3E). 
The expression of some Mutator TEs in sperm cells is both high confidence (present in both sperm cell datasets) and specific to only that tissue (high confidence sperm cell specific, Fig 3E). LINE L1 elements are also expressed throughout the male lineage and sperm cells, but their expression is general and not specific to these cell types (Fig 3E). Our data demonstrate that there is a general (TE family-independent) activation of TE expression in the male reproductive lineage, with one observable bias towards Mutator family expression in both the male lineage and sperm cells.

Mature pollen and sperm cells display coexpression of highly expressed genes and their neighboring TEs

To determine if TEs have an effect on neighboring gene expression, or vice versa, we next analyzed the set of 36,945 assayable TEs within 2kb of genes (Fig 2A). We calculated the absolute expression level of each genic isoform and categorized them into 100 bins of expression levels for each developmental stage (Fig 4). We found no significant relationship between highly expressed genes and the number of up- or down-regulated TEs in tassel primordia or microspores (grey bars, top row, Fig 4). In contrast, in both mature pollen and isolated sperm cells there is a statistically significant (p < 1E-6) positive association between highly expressed genes and the number of up-regulated TEs within 2kb of those genes (grey bars, bottom row, Fig 4). Similarly, there is a negative correlation (p < 1E-6) between high gene expression and the number of down-regulated TEs in the same samples (grey bars, bottom row, Fig 4). This relationship is not due to the fact that pollen or sperm cell-expressed genes are more likely to be located near a TE (S4A Fig), nor due to sample contamination between these two datasets (S4B Fig). To determine if our analysis is biased by the presence of multiple TEs close to just a few highly expressed genes, we also counted the number of genes with at least one differentially expressed TE within 2kb (black dots, Fig 4 and S4 Fig). We found that a similar number of genes were adjacent to differentially expressed TEs (compare grey bars to black dots, Fig 4 and S4 Fig), demonstrating that a small number of genes was not biasing our dataset. We conclude that specifically in the mature male gametophyte the most highly expressed genes tend to be near actively expressing TEs. However, it remains unclear whether gene expression is influencing TE expression, or TE expression is affecting gene regulation, or alternatively, some global regulatory mechanism is influencing both.
Fig 4

Co-regulation of TE and gene expression in the male gametophyte.

For each tissue type, the top 20,000 most highly expressed genes are distributed along the X-axis in bins of 200, with the most highly expressed bin on the far left. For each bin the number of up- and down-regulated TEs near (<2kb) that bin’s genes is then summed on the Y-axis (shown in grey bar). For each bin, the number of genes with at least 1 up- or down-regulated TE within 2kb is displayed as black dots. In unicellular microspores (top right) there is little correlation, whereas in mature pollen and sperm cells (bottom panels) the most highly expressed genes are near primarily up-regulated TEs. To check if the perceived correlations are statistically significant, we performed Kendall’s Tau rank correlation tests and found significant correlations (p < 1E-6) only for mature pollen and sperm cells for both number of TEs and number of genes. For mature pollen, gene expression has a positive correlation with up-regulated TEs (#TEs: τ = 0.65, #genes: τ = 0.64) and a negative correlation with down-regulated TEs (#TEs: τ = -0.55, #genes: τ = -0.55). Similarly for sperm cells, gene expression has a positive correlation with up-regulated TEs (#TEs: τ = 0.52, #genes: τ = 0.52) and a negative correlation with down-regulated TEs (#TEs: τ = -0.42, #genes: τ = -0.44). Only TE (not genes) rank correlation coefficients (τ) are displayed in the figure. The bin location of gex2 (see Fig 7) is annotated in the sperm cell data.
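The binning and rank correlation described in this legend can be sketched as follows. This is an illustration under stated assumptions, not the code used for Fig 4: the input layout (one `(fpkm, n_flanking_tes)` pair per gene), the function names, and the use of the tau-b variant for tie handling are all hypothetical choices, and no p-value is computed here.

```python
from math import sqrt


def te_counts_per_bin(genes, bin_size=200):
    """Rank genes by expression and sum flanking TE counts per bin.

    genes: list of (fpkm, n_flanking_tes) pairs (hypothetical layout).
    Returns a list of (mean_fpkm, summed_te_count), highest-expression bin first.
    """
    ranked = sorted(genes, key=lambda g: g[0], reverse=True)
    bins = []
    for i in range(0, len(ranked), bin_size):
        chunk = ranked[i:i + bin_size]
        mean_fpkm = sum(f for f, _ in chunk) / len(chunk)
        bins.append((mean_fpkm, sum(c for _, c in chunk)))
    return bins


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation, computed over all pairs (O(n^2))."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: excluded from every term
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = sqrt((concordant + discordant + ties_x)
                 * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

With 20,000 genes and `bin_size=200` this yields the 100 bins described above; passing the per-bin mean expression and per-bin TE count to `kendall_tau_b` gives a positive value when more highly expressed bins have more flanking TEs.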

Fig 7

Mutations in the sperm cell-specific gex2 gene cause aberrant seed development.

(A) The exon/intron structure of gex2 (Zm00001d005781/GRMZM2G036832), showing the locations of the two independent Ds-GFP insertion mutants. (B) Ear projections of gex2 mutant outcrosses. Top: heterozygote outcrossed as female, showing 1:1 transmission of the GFP-tagged allele. Middle: heterozygote outcrossed as a male, with 26.1% transmission of the mutant allele. Additionally, small seeds and occasional, small gaps between seeds are visible. Bottom: homozygous mutant outcrossed as a male, with many small seeds and large gaps, despite heavy pollination. (C) Genomic neighborhood of the GEX2 locus, with two nearby TEs, and their RNA-seq expression levels across male reproductive development. (D) Predicted domain structure of Zm GEX2; the amino acid sequence shows 44.2% similarity with Arabidopsis GEX2. (E) Quantification of small/aborted seeds resulting from pollination by gex2 mutant plants and controls. Controls included two Ds-GFP lines that did not show transmission defects (tdsgR12H07 and tdsgR46C04) and one Ds-GFP line that showed a strong transmission defect in the vegetative cell group (tdsgR96C12; 29.5% transmission). A higher percentage of small/aborted seeds was present following pollination by heterozygous gex2 plants representing the two mutant alleles (tdsgR82A03 and tdsgR84A12), and pollination by homozygous gex2-tdsgR84A12 plants further increased the percentage of small/aborted seeds.

Large-scale insertional mutagenesis supports a relationship between transcript level and fitness contribution for vegetative cell-expressed genes

Using the quantitative framework provided by our transcriptome dataset, we next tested the simple hypothesis that highly expressed genes contribute to male gametophytic function–i.e., to reproductive success (pollen fitness). The functional validation approach we used relied on a large, sequence-indexed collection of green fluorescent protein (GFP)-marked transposable element (Ds-GFP) insertion mutants [42], enabling assessment of the effects of mutations in select genes (Fig 5). We focused on expression data from the MP and SC stages, as these have distinctive cell fates and roles in reproduction: the vegetative cell generates the pollen tube for competitive delivery of gametes, and the sperm cells accomplish double fertilization. Expression data from seedlings [23] were used to design a sporophytic control. Highly expressed genes, operationally defined as those in the top 20% for a tissue by FPKM, were grouped into three mutually exclusive classes: Seedling, Sperm Cell, and Vegetative Cell. The Seedling group also excluded any gene highly expressed in either MP or SC. Because many genes were highly expressed in both MP and SC, we compared expression values to assign each of these genes to a single class. Vegetative Cell genes were not only highly expressed in MP, but were also associated with an FPKM greater in MP than in SC, and vice versa for Sperm Cell genes (S5 Table). All genes in these classes were then cross-referenced with Ds-GFP insertion locations to identify potential mutant alleles for study, restricting the search to insertions in coding sequence (CDS), as these were reasoned to be the most likely to generate loss-of-function effects. Finally, to ensure our results were as generalizable as possible, each class list was randomized to identify the specific subset of Ds-GFP lines for study. 
Insertion locations were verified by PCR for 64 of 83 alleles obtained (S6 Table) (see Methods), of which 56, representing mutations in 52 genes, generated sufficient transmission data to include in our final analysis.
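The class assignment described above can be sketched in a few lines of Python. This is an illustrative reading of the text rather than the authors' pipeline: the function names are hypothetical, the inputs are assumed to be dictionaries mapping gene IDs to FPKM values, and the handling of exact ties (left unassigned) is an assumption, since the text does not specify it.

```python
def top_fraction(fpkm, fraction=0.20):
    """Return the set of gene IDs in the top `fraction` of genes by FPKM."""
    ranked = sorted(fpkm, key=fpkm.get, reverse=True)
    n_top = max(1, int(len(ranked) * fraction))
    return set(ranked[:n_top])


def assign_classes(mp, sc, seedling, fraction=0.20):
    """Assign highly expressed genes to mutually exclusive classes.

    mp, sc, seedling: dicts of gene ID -> FPKM for mature pollen (MP),
    sperm cells (SC), and seedling, respectively.
    """
    high_mp = top_fraction(mp, fraction)
    high_sc = top_fraction(sc, fraction)
    high_seedling = top_fraction(seedling, fraction)

    classes = {}
    for gene in high_mp | high_sc:
        mp_level, sc_level = mp.get(gene, 0.0), sc.get(gene, 0.0)
        # Highly expressed in MP and higher in MP than in SC
        if gene in high_mp and mp_level > sc_level:
            classes[gene] = "Vegetative Cell"
        # Highly expressed in SC and higher in SC than in MP
        elif gene in high_sc and sc_level > mp_level:
            classes[gene] = "Sperm Cell"
    # Seedling class excludes any gene highly expressed in MP or SC
    for gene in high_seedling - high_mp - high_sc:
        classes[gene] = "Seedling"
    return classes
```

The resulting class lists would then be intersected with CDS insertion locations and randomized, as described in the text; those steps are not shown here.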
Fig 5

Large-scale tracking of seed marker transmission frequencies was accomplished by generating ear projections with a custom built rotational scanner.

(A) When crossed either through the male or the female, Ds-GFP mutant allele tdsgR107C12 (in gene Zm00001d012382), marked by green fluorescent seeds, shows 1:1 Mendelian inheritance (50% transmission of the GFP seed marker). Images captured in blue light with an orange filter. (B) Mutant alleles in other genes, such as tdsgR102H01 (Zm00001d037695), showed non-Mendelian segregation when crossed through the male (37.5% GFP transmission). Segregation through the female remained Mendelian, indicating a male-specific transmission defect. (C) For some mutant alleles (~10% of lines in this study), the anthocyanin transgene C1 was tightly linked to the insertion mutant. In these cases, seeds carrying a mutant allele of a gene of interest could be tracked by their purple color. Here, insertion tdsgR96C12 (Zm00001d015901) shows a strong male-specific transmission defect (24.8% C1 transmission through the male). Images captured in full spectrum visible light.

Mendelian inheritance predicts 50% transmission of mutant and wild-type alleles when a heterozygous mutant is outcrossed to a wild-type plant. However, a mutation that alters the function of a gene expressed during the haploid gametophytic phase can result in a reduced transmission rate if that gene contributes to the fitness of the male gametophyte–i.e., to its ability to succeed in the highly competitive process of pollen tube growth, given that 50% of the pollen population will be wild-type for the same gene. Thus, reduced transmission of a mutant through the male (a male transmission defect) provides not only evidence for gene function in the gametophyte, but also a measure of the mutated gene’s contribution to fitness. Transmission rates through the female serve as a control, as 50% transmission through the female would confirm both a single Ds-GFP insertion in the genome and male-specificity for any defect identified. 
To measure the fitness cost of each Ds-GFP insertion, heterozygous mutant plants were reciprocally outcrossed with a heavy pollen load to a wild-type plant, maximizing pollen competition within each silk. Transmission rates were then quantified by assessing the ratio of non-mutant to mutant progeny using a novel scanning system and image analysis pipeline (Fig 5) (see Methods) [43]. Mutant alleles were tracked using linked endosperm markers: either the GFP encoded by the inserted transposable element (Fig 5A and 5B), or, in ~10% of the lines, a tightly linked C1 anthocyanin transgene (present due to the initial Ds-GFP generation protocol) (Fig 5C, S7 Table). Transmission rates for all groups were tested through quasi-likelihood tests on generalized linear models with a logit link function for binomial counts (see Methods, S8 Table). When crossed through the female, no genes showed significant differences from Mendelian inheritance (Fig 6A). When crossed through the male, no genes with insertion alleles in the Seedling category (n = 10) showed evidence of abnormal transmission rates (Fig 6B). Most Sperm Cell genes (9 of 10, 90%) showed no statistically significant transmission defects, with one notable exception (two independent alleles of the gex2 gene, described in detail below) (Fig 6C). However, among the Vegetative Cell genes tested (n = 32), a larger proportion (7 out of 32, or 21.9%) showed significant male transmission defects (quasi-likelihood test, adjusted p-value < 0.05) (Fig 6D). The proportions of genes with transmission defects in the three classes were not significantly different by Fisher's exact test (Seedling vs Sperm Cell p-value = 0.500, Seedling vs Vegetative Cell p-value = 0.125, Vegetative Cell vs Sperm Cell p-value = 0.374), likely due to the small number of mutations assessed in the Seedling and Sperm Cell classes. 
For the insertion alleles tested, a summary description of genes showing non-Mendelian inheritance can be found in Table 1, whereas a description of those showing Mendelian inheritance can be found in Table 2.
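As a rough illustration of how a transmission rate is compared with the Mendelian expectation and then corrected for multiple testing, the sketch below uses a simple normal approximation to the binomial, followed by the Benjamini-Hochberg adjustment. The binomial test is a simplified stand-in, assumed here for illustration only: the study itself used quasi-likelihood tests on generalized linear models, which additionally account for overdispersion among ears. The function names and the seed counts in the usage example are hypothetical.

```python
from math import erfc, sqrt


def transmission_test(mutant_seeds, total_seeds, expected=0.5):
    """Return (transmission rate, two-sided p-value) against `expected`.

    Uses a normal approximation to the binomial; NOT the quasi-likelihood
    GLM test used in the study.
    """
    rate = mutant_seeds / total_seeds
    se = sqrt(expected * (1 - expected) / total_seeds)
    z = (rate - expected) / se
    return rate, erfc(abs(z) / sqrt(2))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    for rank in range(n, 0, -1):          # walk from largest p to smallest
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical seed counts (marker-positive seeds, total seeds) per allele
counts = [(1400, 2800), (1230, 2800), (1350, 2800)]
raw_p = [transmission_test(m, t)[1] for m, t in counts]
adj_p = benjamini_hochberg(raw_p)
```

Because several thousand seeds were scored per allele, even mild departures from 50% transmission can be detected; an allele would be called non-Mendelian when its adjusted p-value falls below 0.05.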
Fig 6

Functional validation of highly-expressed gametophyte genes by quantification of transmission rates in Ds-GFP insertional mutants.

Alleles with CDS insertions were tested for differences from Mendelian inheritance using a quasi-likelihood test, with p-values corrected for multiple testing using the Benjamini-Hochberg procedure; alleles with quasi-likelihood adjusted p-value < 0.05 are represented in pink. Alleles are plotted by the log2(FPKM) of their respective gene according to that gene’s expression class (Seedling, Vegetative Cell, or Sperm Cell). Insertion alleles distributed among the classes as follows: Vegetative Cell, 35 alleles; Sperm Cell, 11 alleles; Seedling, 10 alleles. The number of seeds categorized for each allele ranged from 1,522 to 5,219, with an average of 2,807. Genes represented by two independent insertion alleles are enclosed by dotted lines. (A) Transmission rates of 56 mutant allele seed markers for heterozygous Ds-GFP mutant plants crossed through the female. (B) Transmission rates for alleles in the negative control Seedling class when crossed through the male. (C) For genes belonging to the Sperm Cell class, one out of ten (10%) was associated with significant non-Mendelian inheritance. The single gene with a male transmission defect in this group (gex2) showed a strong defect for both of the independent alleles tested. (D) An increased proportion of the genes in the Vegetative Cell class were associated with significant non-Mendelian inheritance when mutant (7/32 genes, 21.9%). In this class, an increase in log2(FPKM) was significantly associated with a decrease in marker transmission (linear regression, p = 0.0120).

Table 1

Characteristics of genes showing non-Mendelian inheritance.

| Category | Gene designation (v3) | Gene designation (v4) | Gene Type | Ds-GFP allele | Male transmission rate | Adjusted p-value | Best BLAST Hit, A. thaliana | Predicted Function (B73v4 Gramene) | Cellular process (inferred) |
|---|---|---|---|---|---|---|---|---|---|
| Vegetative Cell | GRMZM2G359879 | Zm00001d028437 | singleton | tdsgR04A02 | 43.84% | 3.29E-04 | AT3G61050 | Calcium-dependent lipid-binding (CaLB domain) family protein | Cell signaling |
| Vegetative Cell | GRMZM2G350802 | Zm00001d037695 | singleton | tdsgR102H01 | 45.50% | 3.17E-02 | AT1G52080 | Actin binding protein family | Cytoskeleton |
| Vegetative Cell | GRMZM2G039583 | Zm00001d022250 | singleton | tdsgR33F03 | 43.95% | 1.38E-04 | AT2G02370 | SNARE associated Golgi protein family | Vesicle trafficking |
| Vegetative Cell | GRMZM2G012328 | Zm00001d003431 | syntelog | tdsgR49F11 | 43.87% | 1.38E-04 | AT3G05610 | Pectinesterase 5 | Cell wall modification |
| Vegetative Cell | GRMZM2G135570 | Zm00001d014731 | singleton | tdsgR67C09 | 44.09% | 1.06E-03 | AT2G29960 | Peptidyl-prolyl cis-trans isomerase CYP20-1 | Protein folding |
| Vegetative Cell | GRMZM2G153987 | Zm00001d014782 | singleton | tdsgR92F08 | 45.06% | 1.41E-02 | AT1G19940 | Endoglucanase 2 | Cell wall modification |
| Vegetative Cell | GRMZM2G082517 | Zm00001d015901 | singleton | tdsgR96C12 | 29.51% | 0.00E+00 | AT2G24450 | Fasciclin-like arabinogalactan protein 3 | Cell wall modification |
| Sperm Cell | GRMZM2G036832 | Zm00001d005781 | singleton | tdsgR82A03 | 33.43% | 4.15E-14 | AT5G49150 | Protein GAMETE EXPRESSED 2 | Fertilization |
| Sperm Cell | GRMZM2G036832 | Zm00001d005781 | singleton | tdsgR84A12 | 23.14% | 0.00E+00 | AT5G49150 | Protein GAMETE EXPRESSED 2 | Fertilization |
Table 2

Characteristics of genes showing Mendelian inheritance.

| Category | Gene designation (v3) | Gene designation (v4) | Gene Type | Ds-GFP allele | Male transmission rate | Adjusted p-value | Best BLAST Hit, A. thaliana | Predicted Function (B73v4 Gramene) |
|---|---|---|---|---|---|---|---|---|
| Seedling Only | GRMZM2G000052 | Zm00001d002266 | syntelog | tdsgR63F09 | 49.37% | 6.99E-01 | AT2G43020 | Lysine-specific histone demethylase 1 |
| Seedling Only | GRMZM2G111143 | Zm00001d004768 | singleton | tdsgR80E09 | 49.09% | 6.99E-01 | AT5G24318 | Glycosyl hydrolase superfamily protein |
| Seedling Only | GRMZM2G007283 | Zm00001d005036 | syntelog | tdsgR83H05 | 49.22% | 6.99E-01 | AT5G01020 | Serine/threonine-protein kinase |
| Seedling Only | GRMZM2G129209 | Zm00001d007228 | singleton | tdsgR65E02 | 48.00% | 6.83E-01 | AT5G05580 | Omega-3 fatty acid desaturase |
| Seedling Only | GRMZM2G051403 | Zm00001d013295 | singleton | tdsgR46C04 | 51.28% | 6.99E-01 | AT3G55250 | Nuclear pore complex protein Nup214 |
| Seedling Only | AC217975.3_FG001 | Zm00001d022274 | singleton | tdsgR106F04 | 49.37% | 6.99E-01 | AT3G06483 | pyruvate orthophosphate dikinase4 |
| Seedling Only | GRMZM2G100288 | Zm00001d029047 | syntelog | tdsgR76E07 | 49.79% | 8.83E-01 | AT3G51550 | Receptor-like protein kinase FERONIA |
| Seedling Only | GRMZM2G127798 | Zm00001d035925 | singleton | tdsgR53F11 | 48.87% | 6.99E-01 | AT3G02360 | 6-phosphogluconate dehydrogenase1 |
| Seedling Only | GRMZM2G342243 | Zm00001d036283 | syntelog | tdsgR52B09 | 48.45% | 6.99E-01 | AT1G45688 | Late embryogenesis abundant protein group 2 |
| Seedling Only | GRMZM2G044882 | Zm00001d051110 | singleton | tdsgR12H07 | 49.43% | 6.99E-01 | AT5G18430 | GDSL esterase/lipase LTL1 |
| Vegetative Cell | GRMZM5G876898 | Zm00001d002258 | singleton | tdsgR81G05 | 51.26% | 6.80E-01 | AT1G11860 | Aminomethyltransferase |
| Vegetative Cell | GRMZM2G142863 | Zm00001d003947 | syntelog | tdsgR83B04 | 48.73% | 7.03E-01 | AT5G65750 | 2-oxoglutarate dehydrogenase E1 component |
| Vegetative Cell | GRMZM5G827174 | Zm00001d007845 | syntelog | tdsgR52E07 | 47.09% | 2.88E-01 | AT1G10020 | Formin-like protein 18 |
| Vegetative Cell | GRMZM2G045278 | Zm00001d012382 | singleton | tdsgR107C12 | 48.33% | 4.26E-01 | AT3G53990 | Adenine nucleotide alpha hydrolases-like superfamily protein |
| Vegetative Cell | GRMZM2G045278 | Zm00001d012382 | singleton | tdsgR34C11 | 48.78% | 6.80E-01 | AT3G53990 | Adenine nucleotide alpha hydrolases-like superfamily protein |
| Vegetative Cell | GRMZM2G102912 | Zm00001d015242 | singleton | tdsgR99B02 | 49.27% | 9.07E-01 | AT2G24390 | AIG2-like protein |
| Vegetative Cell | GRMZM2G056252 | Zm00001d017840 | syntelog | tdsgR41F01 | 49.62% | 9.36E-01 | AT3G12120 | Delta(12)-fatty-acid desaturase |
| Vegetative Cell | GRMZM5G872068 | Zm00001d017958 | syntelog | tdsgR98H09 | 47.47% | 1.69E-01 | AT1G66200 | Glutamine synthetase root isozyme 3 |
| Vegetative Cell | GRMZM2G136508 | Zm00001d025437 | singleton | tdsgR31H05 | 50.18% | 9.68E-01 | AT2G01170 | Amino acid permease |
| Vegetative Cell | GRMZM2G126858 | Zm00001d026303 | syntelog | tdsgR23D05 | 51.10% | 7.26E-01 | AT1G56145 | Putative leucine-rich repeat receptor-like protein kinase family protein |
| Vegetative Cell | GRMZM2G120136 | Zm00001d026445 | syntelog | tdsgR24D03 | 49.47% | 9.36E-01 | AT1G45180 | E3 ubiquitin-protein ligase MBR2 |
| Vegetative Cell | GRMZM2G006894 | Zm00001d026490 | syntelog | tdsgR02D02 | 49.57% | 9.68E-01 | AT4G30190 | proton-exporting ATPase4 |
| Vegetative Cell | GRMZM2G172751 | Zm00001d027590 | singleton | tdsgR35A08 | 48.21% | 6.80E-01 | AT2G33420 | Protein of unknown function (DUF810 domain) |
| Vegetative Cell | GRMZM2G035243 | Zm00001d027856 | syntelog | tdsgR72D11 | 49.84% | 9.68E-01 | AT1G14330 | Kelch motif family protein |
| Vegetative Cell | GRMZM2G016734 | Zm00001d028820 | syntelog | tdsgR27E01 | 48.34% | 6.80E-01 | AT1G56300 | Chaperone DnaJ-domain superfamily protein |
| Vegetative Cell | GRMZM2G142249 | Zm00001d032279 | singleton | tdsgR77F09 | 49.17% | 8.88E-01 | AT3G47730 | ABC2 homolog 15 |
| Vegetative Cell | GRMZM2G114093 | Zm00001d032310 | singleton | tdsgR04G10 | 49.57% | 9.36E-01 | AT5G58950 | Protein kinase superfamily protein |
| Vegetative Cell | GRMZM2G124434 | Zm00001d032950 | singleton | tdsgR01G01 | 48.39% | 6.80E-01 | AT5G28840 | GDP-mannose 3,5-epimerase |
| Vegetative Cell | GRMZM5G878153 | Zm00001d034799 | singleton | tdsgR103E04 | 49.33% | 9.36E-01 | AT3G03320 | RNA-binding ASCH domain protein |
| Vegetative Cell | GRMZM2G134054 | Zm00001d034839 | singleton | tdsgR32B05 | 50.28% | 9.68E-01 | AT3G07130 | Purple acid phosphatase 15 |
| Vegetative Cell | GRMZM2G134054 | Zm00001d034839 | singleton | tdsgR45E04 | 50.00% | 1.00E+00 | AT3G07130 | Purple acid phosphatase 15 |
| Vegetative Cell | GRMZM2G307402 | Zm00001d036330 | singleton | tdsgR69C04 | 48.26% | 6.80E-01 | AT2G28200 | C2H2-type zinc finger family protein |
| Vegetative Cell | GRMZM2G012263 | Zm00001d037061 | singleton | tdsgR101B03 | 50.08% | 9.80E-01 | AT5G14130 | Peroxidase 64 |
| Vegetative Cell | GRMZM2G012263 | Zm00001d037061 | singleton | tdsgR81E02 | 50.39% | 9.36E-01 | AT5G14130 | Peroxidase 64 |
| Vegetative Cell | GRMZM2G018372 | Zm00001d041514 | syntelog | tdsgR88B08 | 50.55% | 9.36E-01 | AT1G18670 | Protein kinase superfamily protein IBS1-like |
| Vegetative Cell | GRMZM2G095206 | Zm00001d046483 | singleton | tdsgR92A10 | 52.43% | 2.94E-01 | AT4G09750 | NAD(P)-binding Rossmann-fold superfamily protein |
| Vegetative Cell | GRMZM2G089699 | Zm00001d048384 | complex* | tdsgR39B06 | 51.04% | 8.88E-01 | AT1G65680 | beta expansin10a |
| Vegetative Cell | GRMZM5G845021 | Zm00001d048785 | singleton | tdsgR08A07 | 47.30% | 4.25E-01 | AT3G03900 | Adenylyl-sulfate kinase 3 |
| Sperm Cell | GRMZM2G365613 | Zm00001d006218 | singleton | tdsgR60D10 | 51.47% | 7.46E-01 | AT5G42560 | HVA22-like protein i |
| Sperm Cell | GRMZM2G100318 | Zm00001d012128 | syntelog | tdsgR87A03 | 50.03% | 9.79E-01 | AT1G67710 | Putative two-component response regulator family protein |
| Sperm Cell | GRMZM2G172726 | Zm00001d021974 | syntelog | tdsgR35A03 | 51.10% | 7.46E-01 | AT1G19360 | Arabinosyltransferase RRA3 |
| Sperm Cell | GRMZM2G160069 | Zm00001d025834 | singleton | tdsgR31B01 | 47.96% | 7.46E-01 | AT4G16480 | Inositol transporter 4 |
| Sperm Cell | GRMZM2G114899 | Zm00001d034788 | syntelog | tdsgR91F11 | 49.24% | 7.46E-01 | AT1G77280 | Protein kinase protein with adenine nucleotide alpha hydrolases-like domain |
| Sperm Cell | GRMZM2G007659 | Zm00001d042810 | syntelog | tdsgR26G07 | 51.03% | 7.46E-01 | AT2G29680 | Cell division control protein 6 homolog B |
| Sperm Cell | GRMZM2G038252 | Zm00001d043076 | syntelog | tdsgR53C03 | 50.99% | 7.46E-01 | AT4G35550 | WUSCHEL-related homeobox 13a |
| Sperm Cell | GRMZM2G099382 | Zm00001d044109 | singleton | tdsgR106G12 | 49.33% | 7.46E-01 | AT5G47560 | Tonoplast dicarboxylate transporter |
| Sperm Cell | GRMZM2G352898 | Zm00001d048434 | singleton | tdsgR37A04 | 51.31% | 7.46E-01 | AT3G63240 | Type IV inositol polyphosphate 5-phosphatase 7 |

*Beta expansins have proliferated in the maize genome, including by tandem duplication [77]. Thus, this gene cannot be strictly characterized as a syntelog; however, the presence of multiple paralogs in the genome indicates that it should not be categorized as a singleton gene.

The majority of transmission defects in the Vegetative Cell class genes (six of the seven with significant effects) were mild, at approximately 45% transmission, with only one reducing transmission by a moderate amount, to ~30%. Notably, six of the genes associated with significant defects were measured at a log2(FPKM) > 8 (i.e., in the top 5% of Vegetative Cell genes by FPKM). Given that twelve genes above this threshold were tested, these most highly expressed Vegetative Cell genes were significantly more likely to be associated with non-Mendelian transmission (6 out of 12) than the group of Vegetative Cell genes below this expression threshold (1 out of 20) (Fisher's exact test, p-value = 0.00572). Consistent with this observation, an increase in log2(FPKM) was associated with both reduced transmission rate and an increase in -log10(p-value) (linear regression, p-value = 0.0120, 0.0255, respectively). Thus, our data indicate that transcript level in the Vegetative Cell does provide some limited predictive power for identifying gene-specific contributions to male gametophytic fitness (adjusted R2 = 0.151, 0.116, respectively). Vegetative Cell genes associated with non-Mendelian inheritance had a range of predicted cellular functions, including cell wall modification, cell signaling, protein folding, vesicle trafficking, and actin binding (Table 1). To ensure the experimental design was robust, we examined two potential confounding variables: the presence of the wx1-m7::Ac allele in a subset of lines tested and the potential for epigenetic silencing of GFP transgenes (see S1 Methods). We found no evidence that the presence of wx1-m7::Ac significantly impacted the overall conclusions drawn from the dataset, nor evidence of epigenetic silencing of GFP transgenes.
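The comparison between the most highly expressed Vegetative Cell genes and the remainder is a 2×2 contingency test, which can be reproduced with a short standard-library sketch. The function name `fisher_exact_two_sided` is hypothetical, and the two-sided definition used here (summing all tables no more probable than the observed one) is an assumption about how the reported test was configured.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest, highest = max(0, col1 - row2), min(row1, col1)
    p_observed = prob(a)
    # Sum every table at least as extreme (no more probable) as the observed
    return sum(p for p in map(prob, range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Rows: log2(FPKM) > 8 vs. below; columns: non-Mendelian vs. Mendelian genes
p_value = fisher_exact_two_sided(6, 6, 1, 19)
print(f"{p_value:.4f}")  # approximately 0.0057
```

Under these assumptions the result agrees with the p-value reported above for 6 of 12 versus 1 of 20 genes.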

Insertional mutants in the sperm cell-expressed Zm gex2 cause paternally triggered aberrant seed development

The male-specific transmission defect for the sole affected gene in the Sperm Cell class, Zm00001d005781 (GRMZM2G036832), was notably more severe than the average defect across all Ds-GFP mutants identified with decreased transmission through the male (Table 1). This gene is hereafter referred to as Zm gex2 or gex2, for reasons detailed below. The two independent alleles assessed, gex2-tdsgR82A03 and gex2-tdsgR84A12, were associated with transmission rates of 33.4% and 23.1%, respectively. Sequencing confirmed that these Ds-GFP elements were inserted into their predicted CDS locations (Fig 7A). In addition to the transmission defect, both alleles, when crossed through the male, conditioned unusual phenotypes: underdeveloped or aborted seeds, as well as ovules with no apparent seed development despite heavy pollination (Fig 7B). These features motivated further investigation of this gene.

Across maize tissues, gex2 is highly and specifically expressed in sperm cells [38] (Fig 7C). Like many highly expressed genes in mature pollen, it is within 2kb of a transcriptionally active TE, a downstream RLG retrotransposon that displays sperm cell-specific activation (Fig 7C). The Zm gex2 gene was first identified via EST sequencing of maize sperm cells [44], and subsequently used to isolate the Arabidopsis ortholog, named GAMETE EXPRESSED2 (GEX2) [45]. 
In Arabidopsis, GEX2 is necessary for effective double fertilization, causing seed abortion and empty spaces in the silique when a mutant allele is inherited through the male [46]. Zm gex2 encodes a protein with similar structure and amino acid sequence to its Arabidopsis ortholog (Fig 7D). Small and aborted seeds were quantified for both gex2 insertion alleles when outcrossed to wild-type plants (S10 Table, S5A Fig). Two different Ds-GFP insertion lines that were not associated with transmission defects (tdsgR12H07, tdsgR46C04), as well as the Ds-GFP associated with the strongest male transmission defect in the Vegetative Cell class (tdsgR96C12), were used as controls. Pollination with both gex2::Ds-GFP insertion alleles was associated with increased percentages of small or aborted seeds, significantly so in gex2-tdsgR84A12 (pairwise t-test against Ds-GFP controls, all p-values < 0.05), and pollination from gex2-tdsgR84A12 homozygotes approximately doubled the percentage of aberrant seeds from heterozygotes (pairwise t-test against Ds-GFP controls, all p-values < 0.01) (Fig 7E). From the heterozygous gex2::Ds-GFP crosses, small seeds with endosperm large enough for DNA preparation were genotyped, and 79.2% were found to harbor the gex2 mutation, whereas the tdsgR46C04 control showed Mendelian segregation in small seeds (S5B Fig). These data support the hypothesis that aberrant seed development is induced by fertilization by gex2::Ds-GFP sperm. If the Zm GEX2 protein acts to promote double fertilization similarly to its Arabidopsis ortholog, the arrival of a gex2::Ds-GFP pollen tube at the embryo sac could lead to failure of one or both fertilization events. Given an active polytubey block, this could produce the observed gaps between seeds on the ear, resulting from ovules associated with completely failed fertilization, or with very early seed abortion due to single fertilization. 
Consistent with this possibility, pollination with both heterozygous and homozygous gex2-tdsgR82A03 alleles resulted in increased seedless area relative to controls (S6 Fig). To test for aberrant fertilization more directly, seed development was assessed at 4 days post-pollination with either wild-type or gex2-tdsgR84A12 homozygous pollen (Fig 8, Table 3). Typical embryo and endosperm development, as well as indication of the polytubey block (i.e., arrival of only single pollen tubes at the embryo sac), was observed in all ovules assessed from wild-type pollination. In contrast, half of the ovules assessed following pollination with gex2::Ds-GFP showed significant evidence of abnormal double fertilization, demonstrating single fertilization of either embryo or endosperm or indication of arrival of more than one pollen tube at the embryo sac (Fisher's exact test, p-value = 0.000241). We conclude that in maize, similarly to Arabidopsis, Zm GEX2 is part of the sperm cell machinery that helps ensure proper double fertilization.
Fig 8

Pollination by gex2-tdsgR84A12 leads to aberrant fertilization events and developing seed phenotypes.

(A) Seed development in a typical ovule pollinated by wild-type pollen, with one synergid penetrated by a pollen tube, and both embryo and endosperm development initiated. (B-D) Abnormal phenotypes seen following gex2-tdsgR84A12 pollination. (B) Ovule with developing (cellularizing) endosperm but unfertilized egg cell. (C) Ovule with developing embryo but unfertilized central cell. (D) Ovule with both synergids penetrated by a pollen tube, and a developing embryo and unfertilized central cell. emb = embryo; endo = endosperm; ps = synergid penetrated by a pollen tube; PN = polar nuclei.

Table 3

Seed development at 4 days after pollination by wild-type or gex2::Ds-GFP pollen.

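| Pollen parent | One synergid penetrated by a pollen tube: both embryo and endosperm | One synergid penetrated: endosperm without embryo | One synergid penetrated: embryo without endosperm | Both synergids penetrated by a pollen tube: both embryo and endosperm | Both synergids penetrated: endosperm without embryo | Both synergids penetrated: embryo without endosperm |
|---|---|---|---|---|---|---|
| gex2-tdsgR84A12/gex2-tdsgR84A12 | 6 | 2 | 2 | 0 | 0 | 2 |
| Wild-type | 28 | 0 | 0 | 0 | 0 | 0 |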

Discussion

The Zm gex2 gene has a conserved role in promoting double fertilization

The generation of a well-replicated developmental time course of transcriptomic data enabled the targeting of highly expressed genes in vegetative and sperm cells for mutational screening. Two independent insertions in the highly and specifically expressed maize sperm cell gene gex2 led not only to severely reduced transmission through the male, but also, in contrast to other mutations analyzed in this study, to paternally triggered post-fertilization defects. Zm gex2 was first identified in maize by sperm cell EST sequencing [44], which led to identification of the orthologous gene in Arabidopsis, GEX2, and its sperm cell-specific promoter [45]. In Arabidopsis, single fertilization events were observed at increased frequency in GEX2 mutant-pollinated plants, both for the egg cell and the central cell, leading to an observed increase in aberrant seed development and abortion [46]. Our results in maize are similar, with gex2 mutant pollen resulting in unfilled ovules, single fertilization events in embryo sacs, and aberrant early seed development from embryo sacs fertilized by gex2::Ds-GFP sperm cells. In Arabidopsis, the interaction between the plasma membrane-localized GEX2 and either the female egg or central cells has been suggested to contribute to gamete attachment. The two orthologues share a predicted domain structure, including a large N-terminal non-cytoplasmic region containing filamin repeat domains potentially contributing to this interaction [46], raising the possibility that Zm GEX2 acts similarly during double fertilization. With conserved GEX2-like genes widely distributed throughout the currently sequenced Embryophyta taxa, our results support the idea that, in flowering plants, these genes play a crucial role in double fertilization.

Maize pollen provides a powerful model for quantifying gene-specific contributions to fitness

The transcriptome dataset also provided a framework to ask broader questions regarding potential relationships between elevated expression and gene function, i.e., utilizing Ds-GFP insertions not merely in a genetic screen, but in a mutational interrogation of gene function guided by quantitative hypotheses. Despite the explosion of omic-scale methods to characterize genomes and to measure molecular characters (e.g., transcript levels), our ability to predict phenotypic relevance for specific genes is limited, particularly in multicellular organisms. A simple hypothesis is that a high transcript level at a particular developmental stage implies functional relevance for the associated gene at that stage, and thus potential for phenotypic influence, a hypothesis supported by observations in mice [47]. This study addresses this hypothesis in plants with a systematic assessment of the functional relevance of highly expressed genes in maize pollen, taking advantage of the ease of reciprocal outcross pollination in maize, the availability of a sufficient number of marked and likely null mutations, and the development of an imaging technique that enables sensitive quantitation. In the post-pollination progamic stage, pollen grains, as independent, genetically distinct organisms, compete to be the first to deliver the sperm cells to the embryo sac for double fertilization. In an outcrossing plant with an extensive stigma and style like maize, there is likely a heightened context for competition among these individuals [10], thus providing a milieu that may be particularly sensitive to genetic perturbation. We reasoned that genes highly expressed in the vegetative cell would tend to contribute to a competitive advantage at this stage, which is responsible for pollen tube germination and growth. 
Indeed, we found that CDS-insertion alleles for 7 out of 32 (21.9%) tested genes in this class are associated with mild to moderate male-specific transmission defects, with the majority of these defects classified as mild (~45% transmission) and thus detectable only by assessing large populations. In this class, transcript level was significantly correlated with both reduced transmission rate and an increased likelihood of significant non-Mendelian transmission, although the explanatory power is limited (R² < 0.2), as expected for a complex biological system. Genetic redundancy could also be predicted to influence the phenotypic outcome of mutating single genes, and there is a suggestive trend consistent with this idea in our dataset (Fisher's exact test, p-value = 0.2117), with 6/7 non-Mendelian alleles classified as singletons in the maize genome (86%), compared to 14/25 singletons in genes harboring Mendelian alleles (56%) (Tables 1 and 2). In addition, the variety of biological processes predicted for genes with documented fitness contributions (Table 1) is consistent with the idea that competitive pollen tube growth requires an array of cellular functions. It should be noted that our approach relies on the availability of Ds-GFP-marked insertion alleles, and it seems likely that such availability is biased against genes with severe transmission phenotypes, as these would be selected against in a transposon-mutagenized population. In fact, two large Mutator transposon populations show a statistically significant deficit in insertions in genes associated with gametophyte-enriched expression [23]. Notably, both previously described maize genes associated with severe male transmission defect mutants (rop2, apt1; transmission <15% [11,48]) also would be classified as highly expressed in the vegetative cell (log2(FPKM) > 8), suggesting that the trend we found is applicable even to the types of genes most likely to be absent from the Ds-GFP insertion population.
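As an illustration of the singleton comparison above, the following minimal sketch computes a two-sided Fisher's exact test on the 2x2 table built from the counts given in the text (6 singletons and 1 non-singleton among non-Mendelian alleles; 14 singletons and 11 non-singletons among Mendelian alleles). The function name and the table layout are assumptions made for this example; the analysis code used in the study is in the authors' repository and is not reproduced here.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables (with the same
    margins) that are no more likely than the observed table.
    """
    row1 = a + b
    col1 = a + c
    total = a + b + c + d
    denom = comb(total, row1)

    def prob(k):
        # Probability that k of the row-1 items fall in column 1
        return comb(col1, k) * comb(total - col1, row1 - k) / denom

    k_min = max(0, row1 - (total - col1))
    k_max = min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: non-Mendelian alleles, Mendelian alleles
# Columns: singleton genes, non-singleton genes
p_value = fisher_exact_two_sided(6, 1, 14, 11)
print(f"p = {p_value:.4f}")
```

With these counts the sketch gives a p-value of approximately 0.21, in line with the value reported above.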
Few studies have investigated the relationship between transcriptomic data and quantitative mutant phenotypes, particularly in multicellular organisms. Large-scale screening of the Arabidopsis T-DNA mutant collection for leaf or reproduction-related phenotypes has identified effects in ~4% of lines assessed [49,50], but these efforts were not guided by transcriptome data. Our use of a sensitive phenotypic assay, combined with a focus on sampling mutations in genes that are most highly expressed at a developmental stage relevant to the phenotype tested, seems likely to have contributed to the higher frequency of phenotypic effects we found. In mice, a meta-analysis found a relationship similar to the one we observed, with aberrant phenotypes more likely to be associated with genes highly transcribed in the tissue exhibiting the phenotype [47]. In contrast to these results, genome-scale measurement of the fitness costs of gene knockouts via competitive assays in yeast [51,52] and bacteria [53,54] found that, for particular environmental conditions, there was little to no correlation between the expression level of a gene and its impact on fitness in that condition. This could be indicative of differences between the mechanisms underlying single-celled organisms’ response to the environment versus those underlying developmental complexity in multicellular organisms. Interestingly, our results are consistent with a recent meta-analysis that identified higher mRNA expression levels as a feature distinguishing gene models with known mutant phenotypes from the overall population of gene models defined by molecular approaches [55].

Transposable element dynamics in the maize male gametophyte

The transcriptomic time course enabled exploration of the dynamic relationships among developmental progression, gene expression levels, and transcriptional activation of TEs. Understanding TE expression during maize male reproductive development provides an informative comparison to similar analyses in Arabidopsis, an evolutionarily distant plant with a genome landscape that is distinct from maize. Although maize has a higher number and percentage of its genome occupied by TEs compared to Arabidopsis, we found that only a fraction of maize TEs are developmentally dynamic with regards to transcript accumulation. These ‘dynamic’ TEs tend to be longer elements than average, which suggests that they have protein coding and transpositional potential. From this dynamic TE set, we were able to identify individual elements that are expressed in a number of specific tissues. However, more globally, there is a trend towards activation of TE transcription over the course of the development of the male gametophyte. This finding confirms that both monocots and eudicots have developmental activation of TE expression in pollen. Consistent with our findings, a recent study found that spontaneous retrotransposon mutations are much more frequent through the male than the female in certain maize lines [31]. This conservation suggests that the roles of TE and TE-induced small RNAs during reproductive development may also be conserved between monocots and eudicots [35,56,57]. Although TE activation is conserved in maize and Arabidopsis pollen, we have identified key differences in the timing and location. Maize TE activation is detected earlier (in the unicellular microspore) compared to when it is thought to occur in Arabidopsis [28]. Transcripts from these early-activated TEs in the microspores typically remain detectable through pollen development and in the mature pollen grain, which may be due to continued expression or transcript stability. 
A second distinction is the location of activation, which in Arabidopsis is confined to the pollen vegetative cell nucleus [28,32], whereas in maize also occurs in sperm cells. Mutator family TEs are overrepresented in the pool of sperm-cell transcripts, suggesting that this family of TEs may have evolved (or co-opted) specific regulatory mechanism(s) such as an enhancer element that confers expression in this cell type. Given our results indicating a linkage between elevated gene expression levels and functional relevance, we also assessed whether similar correlations exist between gene and TE expression locally in the maize genome. We found that in mature pollen and sperm cells there is a positive correlation: the more highly expressed a gene is, the more likely it is to have an up-regulated TE nearby. This tissue-specific correlation is a developmentally-specific co-regulation of gene and TE expression. Notably, there does not appear to be any strong trend linking this co-regulation to gene function. We find instances of local gene/TE coordinate regulation are present in similar proportions in genes with documented transmission defects vs. those showing Mendelian segregation when mutant (14% vs 28% respectively, S5 Table). Therefore, although our data indicate the population of highly expressed maize pollen genes has a tendency to contribute to pollen fitness, and a tendency to be adjacent to pollen-expressed TEs, these two characteristics appear to be independent. Several potential mechanisms may account for the coordinated gene/TE expression. First, programmed activation of TE expression may influence chromatin, enhancer, or other regulatory functions that influence neighboring genes. Second, the genes and TEs may be directly controlled by the same mechanism of large-scale epigenetic activation, limiting the expression of both to a specific tissue or developmental time point. 
Third, gene activation may influence the expression of the neighboring TE via read-through transcription. Future studies using alternative transcriptomic approaches (e.g., CAGE or long-read RNA sequencing) will enable the dissection of these possible mechanisms.

Methods

Plant materials

Maize inbred line B73 was used for all RNA isolations. Plants were grown in a controlled greenhouse environment (16 h light, 8 h dark, 80 °F day/70 °F night) and in the field at the Botany & Plant Pathology Field Lab (Oregon State University, Corvallis, OR) using standard practices. Lines containing Ds-GFP insertion alleles were acquired from the Maize Genetics Cooperation Stock Center.

RNA isolation, library preparation and sequencing

Detailed methods are available in S1 Methods. Briefly, tissue was isolated either by dissection (TP), differential density centrifugation (MS, MS-B and SC), or collection at anthesis (MP). Total RNA from TP, MS, and MP was extracted using a modified Trizol Reagent (Life Technologies) protocol; SC total RNA was extracted via a phenol/chloroform protocol. Poly-A RNA (mRNA) was isolated using streptavidin magnetic beads (New England Biolabs, # S1420S) and a biotin-linked poly-T primer. RNA libraries were prepared and sequenced by the Central Services Lab (CSL) at the Center for Genome Research and Biocomputing (CGRB, Oregon State University) using WaferGen robotic strand specific RNA preparation (WaferGen Biosystems) with an Illumina TruSeq RNA LT (single index) prep kit and run on an Illumina HiSeq 3000 with 100 bp paired-end reads.

Mapping reads to genes, differential expression assessment and GO enrichment analysis

Ribosomal reads (rRNA) were removed from all samples using STAR, version 2.5.1b [58] to map reads to a repository of maize rRNA sequences (parameters: --outSAMunmapped Within --outSAMattributes NH HI AS NM MD --outSAMstrandField intronMotif --limitBAMsortRAM 50000000000 --outReadsUnmapped Fastx). The number of mappable reads generated from each sample after rRNA removal ranged from approximately 1 million to approximately 41 million, with an average of approximately 18 million mappable reads per sample. Total reads, mapped reads, rRNA contamination, and other statistics are summarized in S1 Table. rRNA-filtered sequences were mapped to the maize reference genome, version B73 RefGen_v4.33 [36] using STAR, keeping only unique alignments (parameters: --outSAMunmapped Within --outSAMattributes NH HI AS NM MD --outSAMstrandField intronMotif --outFilterMultimapNmax 1 --limitBAMsortRAM 50000000000). Transcript levels of annotated gene isoforms were measured using Cufflinks, version 2.2.1 [59]. FPKM (fragments per kilobase of transcript per million mapped reads) values are shown in S4 Table. Differential expression was calculated between each tissue with Cuffdiff, version 1.0.2, using default parameters. FPKM counts were normalized using the geometric library normalization method. A pooled dispersion method was used by Cuffdiff to model variance. Differential expression results are summarized in S11 Table. Gene ontology (GO) terms were found for enriched genes in each tissue using the AgriGO 2: GO Analysis Toolkit [60]. Enriched genes were defined as the top 300 significantly differentially expressed genes (q-value) from Cuffdiff output, with ties broken by log2 fold change. Enriched sets were split into up- and down-expressed genes. GO term enrichment was calculated using the singular enrichment analysis method with a Fisher test and Yekutieli multi-test adjustment. GO annotations were based on the maize-GAMER annotation set [61].
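For readers unfamiliar with the FPKM unit defined above, the following sketch shows the basic definition only. It is not a description of Cufflinks, which estimates isoform abundances with its own statistical model and normalization; the function name and the example numbers are hypothetical.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments.

    Basic definition only; abundance estimators such as Cufflinks apply
    additional modelling that is not represented here.
    """
    kilobases = transcript_length_bp / 1_000
    millions = total_mapped_fragments / 1_000_000
    return fragments / (kilobases * millions)


# Hypothetical transcript: 4,608 fragments on a 1 kb transcript in a
# library of 18 million mapped fragments
value = fpkm(4_608, 1_000, 18_000_000)
log2_value = math.log2(value)
print(f"FPKM = {value:.1f}, log2(FPKM) = {log2_value:.2f}")
```

In this hypothetical case the transcript sits at FPKM 256, that is, log2(FPKM) of 8.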

Mapping reads to transposable elements

The rRNA-filtered reads were quality trimmed (QC30) and adapter sequences were removed using BBDUK2 [62]. The remaining sequences were mapped to the whole genome using STAR, allowing mapping to at most 100 ‘best’ matching loci (parameters: --outMultimapperOrder Random --outSAMmultNmax -1 --outFilterMultimapNmax 100). For paired-end reads, the unmapped reads were re-mapped using a single-end approach to maximize the number of mappable reads. The mapping percentage is reported in S3 Table. Because 19% of the total reads in the dataset mapped to more than one location, such reads were mapped only to their best match in the genome; when multiple best matches existed, they were mapped to all of these loci and then counted fractionally. For example, if one read maps to 4 TE locations equally well, each TE would receive a weighted value of 0.25 mapped reads. Because the TE expression of the aberrant SC1 biological replicate did not cluster with the other three SC replicates (S1 Fig), it was removed from all subsequent analysis of TE expression.
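The fractional counting rule described above can be sketched as follows. This is a simplified illustration, assuming each read has already been reduced to the list of TE loci it matches equally well; the read and TE identifiers are hypothetical, and in the study itself the fractional assignment was carried out with featureCounts.

```python
from collections import defaultdict


def fractional_counts(best_matches):
    """Distribute each read evenly across its equally good loci.

    best_matches maps a read ID to the list of TE IDs that are tied as
    its best alignment. A read with n tied loci adds 1/n to each locus.
    """
    counts = defaultdict(float)
    for read_id, loci in best_matches.items():
        if not loci:
            continue  # unmapped read contributes nothing
        weight = 1.0 / len(loci)
        for te_id in loci:
            counts[te_id] += weight
    return dict(counts)


# Hypothetical input: one read tied across four TEs, one uniquely mapped
example = {
    "read_1": ["TE_A", "TE_B", "TE_C", "TE_D"],
    "read_2": ["TE_A"],
}
te_counts = fractional_counts(example)
print(te_counts)
```

Here each of the four TEs receives 0.25 from the multi-mapped read, and TE_A additionally receives 1.0 from the uniquely mapped read.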

Principal component analysis (PCA)

Using the maize gene and TE annotation file available from Ensembl Genomes (v38) [36], a combined annotation file was generated for both genes and TEs to run PCA for all samples. featureCounts [63] was used to calculate the accumulation of each gene and TE in all samples following fractional assignment of reads (parameters: -O --largestOverlap -M --fraction -p -C). This counts file was used in DESeq2 [64] to generate the PCA plot.

Analysis of transposable elements

The featureCounts output file (described above) was used as input for differential expression analysis using DESeq2. Since DESeq2 accepts only integers as raw counts, we used the ‘round’ function of R to round the counts to their nearest integers. For differential expression using default parameters for normalization in DESeq2, we only included TEs with a sum total of ≥ 10 read counts across samples; the rest were categorized as ‘not covered’ TEs. First, normalized read counts for all TEs were obtained (data in S2 Fig). Then, only TEs farther than 2 kb from genes were retained for further investigation, whereas TEs less than 2 kb away from a gene were categorized as ‘near genes’. After selecting seedling as the reference tissue, pairwise volcano plots were generated for all samples against the reference seedling tissue. Each pairwise comparison with the seedling tissue yielded a set of TEs with an adjusted p-value of either ‘NA’ or a real number. The set of TEs with an adjusted p-value of ‘NA’ in all pairwise comparisons was added to the ‘not covered’ category, since there was not enough statistical power to call differential expression in any of the tissues. The number of TEs statistically significantly up- and down-regulated (adjusted p-value < 0.05) in each tissue was calculated, and these TEs were categorized as ‘dynamic TEs’ and plotted (Fig 3). Additionally, TEs with an adjusted p-value ≥ 0.05 were categorized as developmentally ‘static TEs’, as no evidence of a change in TE expression was observed across the developmental time points analyzed. For all categories, the length, family, and distance from the centromere were calculated based on the published TE annotation file.
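The categorization logic described above can be summarized in a short sketch. It rests on several assumptions that the text does not state explicitly: that a TE counts as ‘dynamic’ if it is significant in at least one comparison against seedling, that a TE exactly 2 kb from a gene is not treated as ‘near genes’, and that the coverage filter is applied before the distance filter. The function name, argument names, and example values are hypothetical, and `None` stands in for an adjusted p-value of ‘NA’.

```python
def categorize_te(fractional_counts, distance_to_gene_bp, adjusted_pvalues,
                  min_total=10, near_gene_bp=2_000, alpha=0.05):
    """Assign a TE to one of the categories described in the text.

    fractional_counts: per-sample fractional read counts for the TE
    distance_to_gene_bp: distance to the nearest annotated gene
    adjusted_pvalues: one value per comparison against seedling,
        with None standing in for 'NA'
    """
    # Counts are rounded to integers before differential expression
    integer_counts = [round(c) for c in fractional_counts]
    if sum(integer_counts) < min_total:
        return "not covered"
    if distance_to_gene_bp < near_gene_bp:
        return "near genes"
    tested = [p for p in adjusted_pvalues if p is not None]
    if not tested:
        # 'NA' in every comparison: no power to call differences
        return "not covered"
    if any(p < alpha for p in tested):
        return "dynamic"  # assumption: significant in any comparison
    return "static"


# Hypothetical TEs
print(categorize_te([0.25, 3.5, 8.0], 5_000, [0.01, None]))
print(categorize_te([40.0, 55.0, 61.0], 5_000, [0.40, 0.90]))
print(categorize_te([40.0, 55.0, 61.0], 500, [0.01, 0.02]))
print(categorize_te([1.0, 2.0, 3.0], 5_000, [0.01, 0.02]))
```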

Validation of Ds-GFP insertion sites

A FASTA file containing 2 kb of genomic sequence surrounding each Ds-GFP insertion site was used as input to a primer3-based tool to generate a pair of specific primers to genotype individual plants from each line (https://vollbrechtlab.gdcb.iastate.edu/tools/primer-server/). The primers used for each Ds-GFP line are listed in S6 Table. To genotype the plants, two 7 mm discs of leaf tissue were collected from each plant using a modified paper punch. The samples were collected in 1.2 ml tubes that fit within a labeled 96 well plate/rack (https://vollbrechtlab.gdcb.iastate.edu/tools/tissue-sample-plate-mapper/) (Phenix Research Products, Candler, NC; M845 and M845BR or equivalent). Genomic DNA was isolated from the leaf punches [65] with the following modifications. An additional centrifugation (3,000 g for 10 min.) was added to clear the leaf extracts prior to loading onto a 96-well glass fiber filter plate (Pall, 8032). DNA was eluted from filter plates in 125 μL water, and 2 μL was used as template for PCR. Amplification followed standard PCR conditions using GoTaq Green Master Mix (Promega) with 4% DMSO (v/v) and amplicons were resolved using agarose gel electrophoresis. Lines were genotyped using the pair of Ds-GFP line gene-specific primers plus one Ds-specific primer (JSR01 GTTCGAAATCGATCGGGATA or JGP3 ACCCGACCGGATCGTATCGG). All lines were also screened by PCR for the presence of wx1-m7::Ac using primers for wx1 (CACAGCACGTTGCGGATTTC) and Ac (CCGGATCGTATCGGTTTTCG). Followup PCR to test for co-segregation of GFP fluorescence with the presence of the insertion used the appropriate set of three PCR primers (two gene-specific and one Ds-specific) and DNA prepared either from endosperm or seedling leaves [66].

Insertional mutagenesis transmission quantification and statistics

Heterozygous lines with PCR-validated Ds-GFP insertion alleles were planted in the Botany & Plant Pathology Field Lab (Oregon State University, Corvallis, OR). All insertions were in coding sequence (CDS) sites. Heterozygous Ds-GFP plants were outcrossed to tester plants (c1/c1 wx1/wx1 or c1/c1 genetic background) through both the female and the male, with male pollinations made with a heavy pollen load on extended silks (silks that had been allowed to grow for at least two days following cutback). Following harvest, resulting ears were imaged using a custom rotational scanner in the presence of a blue light source and orange filter for GFP seed illumination [43]. Briefly, videos were captured of rotating ears, which were then processed to generate flat cylindrical projections covering the surface of the ear (for examples, see Figs 5 and 7). Seeds were manually counted using the Cell Counter plugin of the Fiji distribution of ImageJ [67]. Ears showing evidence of more than a single Ds-GFP insertion (~75% GFP transmission) were excluded from further analysis. For an allele to be included in the final dataset, we required a minimum of three independent male outcrosses from two different plants. Seed transmission rates of remaining ears were quantified using a generalized linear model with a logit link function for binomial counts and a quasi-binomial family to correct for overdispersion between parent lines. By incorporating overdispersion, we allowed for the possibility that seeds on the same ear were not completely independent, and for varying transmission rates between ears associated with a given mutation (e.g. by environmental or maternal effects). A quasi-likelihood approach is more realistic than a simple chi-square test, which assumes that all seeds are independent and transmission rates between ears associated with a given mutation are the same. 
The dispersion parameter for Sperm Cell and Vegetative Cell categories was approximately 1.8, indicating substantially more heterogeneity among seeds on different ears than is expected in a model which assumes independence. Non-Mendelian inheritance was assessed with a quasi-likelihood test with p-values corrected for multiple testing using the Benjamini-Hochberg procedure to control the false discovery rate at 0.05. Significant non-Mendelian segregation was defined with an adjusted p-value < 0.05. Proportions of genes with male-specific transmission defects in the Seedling, Sperm Cell, and Vegetative Cell categories were compared using a two-sided Fisher's exact test, with significance defined as a p-value < 0.05. A two-sided Fisher's exact test was also used to compare the proportions of male-specific transmission defects in the most highly expressed genes and the less highly expressed in the vegetative cell category. A two-sided test for equality of proportions with continuity correction was used to compare transmission rates in families with partial Ac presence. A Git repository containing statistical tests and plotting information for this portion of the study can be found at https://github.com/fowler-lab-osu/maize_gametophyte_transcriptome.
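Two steps of the procedure described above can be illustrated in a minimal sketch: the Pearson-based dispersion estimate that a quasi-binomial model uses to quantify ear-to-ear heterogeneity, and the Benjamini-Hochberg adjustment of p-values. The dispersion function assumes an intercept-only model for a single allele, which is a simplification and not necessarily the model specification used in the study; all function names and example numbers are hypothetical. The authors' actual analysis is in the repository cited above.

```python
def pearson_dispersion(ears):
    """Pearson dispersion estimate for an intercept-only binomial model.

    ears: list of (gfp_seeds, total_seeds) tuples, one per ear.
    Values near 1 match plain binomial sampling; larger values indicate
    extra ear-to-ear variation (overdispersion).
    """
    total_gfp = sum(g for g, _ in ears)
    total_seeds = sum(n for _, n in ears)
    p_hat = total_gfp / total_seeds  # pooled transmission rate
    chi_sq = sum((g - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
                 for g, n in ears)
    return chi_sq / (len(ears) - 1)  # residual degrees of freedom


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical ears for one allele: (GFP seeds, total seeds)
phi = pearson_dispersion([(180, 400), (200, 400), (160, 400)])
# Hypothetical raw p-values for four alleles
adj = benjamini_hochberg([0.01, 0.04, 0.03, 0.5])
print(f"dispersion = {phi:.2f}")
print([round(p, 4) for p in adj])
```

An allele would then be called significantly non-Mendelian when its adjusted p-value falls below 0.05, as stated above.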

Zm gex2 sequence analysis and phenotype characterization

Protein sequence encoded by Zm gex2 (Zm00001d005781_T002) was retrieved from the Maize Genetics and Genomics Database (MaizeGDB) hosting of the B73 v4 genome [36,68]. Arabidopsis GEX2 protein sequence (AT5G49150.3) was retrieved from the Arabidopsis Information Portal (ARAPORT) Col-0 Araport11 release [69,70]. Maize and Arabidopsis GEX2 protein domains were predicted by InterPro [71], with transmembrane helix predictions by TMHMM [72]. Prediction of land plant species GEX2 conservation was retrieved from PLAZA, gene family HOM04M006791 [73]. Zm gex2 gene duplication searches were performed using BLAST [74] and the B73 v4 genome. To confirm the predicted insertion sites for the two gex2::Ds-GFP alleles, flanking insertion site fragments were PCR-amplified with a gene-specific primer and a Ds-GFP-specific primer (DsGFP_3UTR–TGCAAGCTCGAGTTTCTCCA) and sequenced via Sanger sequencing. To quantify the small seed phenotype, mature, dried-down maize ears were imaged prior to seed removal from the ear. For small seed selection, the ear was first visually scanned row by row from the top to the bottom of the ear. Seeds that were noticeably smaller than their surrounding (regular-sized) seeds were carefully removed from the ear using a pin tool. This sometimes required the removal of regular-sized surrounding seeds, which were saved for later counting. A second visual inspection of the ear often resulted in additional small seeds and is recommended. All remaining seeds were then removed from the ear by hand or using a hand corn sheller tool (Seedburo Equip. Co., Chicago, IL). The ear was screened again for any small (flat/tiny) seeds that could have been missed previously. The cob was inspected prior to discarding, and if any small seed was left behind it was removed and accounted for. Small/smaller seeds and regular-sized seeds were counted and counts were recorded (S10 Table). To measure seedless area, ears were scanned as previously described to create flat surface projections.
"Seedless area" was defined as ear surface area that lacked mature or partially developed seeds. Seedless area was quantified as a percentage of total area, as measured with the "Freehand selection" tool of the Fiji distribution of ImageJ [67]. A Git repository containing statistical tests and plotting information for this portion of the study can be found at https://github.com/fowler-lab-osu/maize_gametophyte_transcriptome. For analysis of embryo sacs by confocal microscopy, tissues were stained with acriflavine, followed by propidium iodide staining [75,76]. After staining, samples were dehydrated in an ethanol series and cleared in methyl salicylate. Samples were visualized on a Leica SP8 point-scanning confocal microscope using excitations of 436 nm and 536 nm and emissions of 540 ± 20 nm and 640 ± 20 nm.

Principal component analysis of gene and transposable element (TE) expression levels.

Two major components, on the x- and y-axes, explain 89% of the variance in gene and TE expression levels. An asterisk (*) marks samples generated as part of this study, whereas other datasets are publicly available. For the sperm cells isolated in this study (SC), the TE expression of one biological replicate (SC1) did not cluster with the other three, and it was therefore removed from subsequent analyses of expression from TEs. MP-2014, SE, and OV are from [23]; MP-WEB is from [38]; LF is from [37]; MP-LM is from NCBI BioProject 306885 (2015); SC-TD is from [27]. (TIF)

Seedling tissue is the appropriate reference for comparison of TE activity.

(A) Steady-state mRNA accumulation of all TEs in different tissues. Datasets generated in this study are marked with an asterisk. (B) The number of TEs with zero or near-zero expression levels in different tissues. Seedlings (SE) have the most TEs with low expression levels. (TIF)

Length distribution of categorized TEs subdivided by superfamilies.

Length of TEs in the different TE categories from Fig 2A but further subcategorized by different superfamilies. The violin plots around the box show the kernel probability density of the data. The box represents lower and upper quartile, the line is the median, and the whiskers represent 10–90% range. Red asterisk denotes the mean. ‘n’ shows the number of TEs in each category for each superfamily. (TIF)

Abundance of TEs near genes in each tissue.

(A) For each tissue type, the top 20,000 expressed genes are distributed along the X-axis in bins of 200, with the highest expressed bin on the far left. The number of TEs near (<2kb) these genes is then counted on the Y-axis (shown in grey bar) and the number of genes with at least 1 TE within 2kb is displayed as black dots. (B) Genes filtered for either higher expression in pollen (MP) over sperm cells (SC) (left) or SC>MP (right) were used to determine if the association in Fig 4 is due to sample contamination between SC and MP. Once genes were filtered, the top expressed genes in that tissue were distributed along the X-axis in bins of 200 based on their expression values, with the highest expressed bin on the far left. The number of up- and down-regulated TEs near (<2kb) these genes is then counted on the Y-axis (shown in grey bar) and the number of genes with at least 1 TE within 2kb is displayed as black dots. (TIF)

gex2 mutant pollen is associated with increased small and aborted seeds in outcross progeny.

(A) Seeds were removed from ears, arranged according to size, and counted. Images of representative seed populations are shown, with the top two rows in each image showing representative fully developed seeds. Rows below the top two contain all of the smaller or aborted seeds from that particular ear. (B) PCR genotyping of small endosperm seeds from two independent crosses for the two gex2 alleles shows that the majority of small seeds harbor the gex2::Ds-GFP allele, despite overall reduced transmission of the insertion alleles through the male. Small seeds from control tsdgR46C04 crosses segregate in a Mendelian fashion. (TIF)

Characterization of gex2 seedless ear area.

Seedless area was quantified from scanned ear images for gex2 Ds-GFP alleles and Ds-GFP controls. Pollen from heterozygous gex2 plants did not show significantly increased seedless area (gex2-tdsgR82A03 pairwise t-test p-values relative to GFP line 1, GFP line 2, and VC mutant 0.95, 0.96, and 0.74, respectively; gex2-tdsgR84A12 pairwise t-test p-values 0.19, 0.13, and 0.06, respectively), whereas pollen from homozygous gex2-tdsgR84A12 plants had significantly increased seedless area (pairwise t-test against Ds-GFP controls separately, all p-value < 0.0001). (TIF)

Tissue sample preparation, RNA extraction, and analysis of potential confounding variables in insertional mutagenesis lines.

(PDF)

Gene sequencing statistics and availability.

Summary statistics for sequencing data generated in the study. (XLSX)

GO term enrichment results.

Differentially expressed genes in developmental categories examined in the study, as well as significantly enriched GO terms associated with these genes. (XLSX)

Transposable element sequencing statistics and availability.

Summary statistics and availability for expression datasets used in the analysis of transposable element expression. (XLSX)

Genic isoform abundance (FPKM) across developmental stages.

Cufflinks output describing isoform expression by developmental stages, separated by biological replicate. (TXT)

Top 20% transcripts by FPKM in Mature Pollen, Sperm Cell and Seedling datasets.

List of top 20% highly expressed genes assigned to the Vegetative Cell, Sperm Cell or Seedling Only classes. (XLSX)

Insertional mutagenesis alleles and primers.

List of alleles tested for the presence of Ds-GFP insertions by PCR, including primer sequences. (XLSX)

Insertional mutagenesis results by line.

Insertional mutagenesis results, separated by line, including marker transmission rates and expression category information. (XLSX)

Insertional mutagenesis results by allele.

Insertional mutagenesis results, separated by allele, including marker transmission rates and expression category information. (XLSX)

Concordance of seed phenotype with DsGFP genotype.

PCR results from testing Ds-GFP presence in GFP and non-GFP seeds for selected alleles. (XLSX)

gex2 small seed phenotyping.

gex2 small seed counting and seedless area results. (XLSX)

Differential expression results.

Cuffdiff output comparing expression between tissues examined in this study. (TXT)

10 Dec 2019

Dear Dr Fowler,

Thank you very much for submitting your Research Article entitled 'Highly expressed maize pollen genes display coordinated expression with neighboring transposable elements and contribute to pollen fitness' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important problem, but raised some substantial concerns about the current manuscript. Based on the reviews, we will not be able to accept this version of the manuscript, but we would be willing to review again a much-revised version. We cannot, of course, promise publication at that time.

There are several very interesting observations presented in this manuscript. Several of the reviewers (and I) noted that the paper is a bit difficult to follow and summarize as it really is two distinct studies that are not all that well linked. You should consider how to better link these studies of TEs/expression and fitness as they are both interesting but are difficult to push together in a cohesive narrative. In addition, the reviewers pointed out a number of specific issues that will need to be addressed either through providing more clarity or altering some analyses.

Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org.
[LINK]

We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Nathan M. Springer
Associate Editor
PLOS Genetics

Gregory P. Copenhaver
Editor-in-Chief
PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: I think the title is problematic, because of the word “and”. The paper has two parts: in the first they demonstrate TE expression in male development similar to what was shown for Arabidopsis, but find key differences in timing and location. There is no connection between the TE part of the paper and the transmission defects part of the paper; it could be a coincidence, nothing more. Much of the work on the GEX2 mutant is duplicative of what was already published in Arabidopsis. Transmission defects for vegetative-expressed genes are modest, except for one; I am surprised they found anything, as many pollen proteins are members of multi-gene families (all expressed in pollen), so gene redundancy might be expected to cover for any defect in another. They should provide information about possible gene redundancy to explain the lack or presence of phenotypes for the genes they chose to test. I am not convinced by their idea that highly expressed genes are likely to be more important for function. The paper is not user-friendly: for example, if a reader is interested in knowing which genes were tested and which ones did not show transmission defects, the supplemental tables don’t help, as they contain long gene identifier numbers and nothing else. A table of annotations for all tested genes should be in the body of the paper.

Line 2: it is strange to say that the male gametophyte is essential for subsequent initiation of seed development; with such an argument, it must be important for every subsequent step in the life cycle. Just say pollen tube growth and sperm delivery.
Line 45: insert the word “the” in front of the word “pollen”.
Lines 50-52: a role for GEX2 was already demonstrated in Arabidopsis (reference 47); this should be mentioned here. Similarly, in line 69, a role for GEX2 in fertilization is already established; if the authors insist on this phrasing, they need to say maize gex2 throughout. Similarly in lines 138-140.
Line 61: what does “this process” refer to? Pollen tube growth?
Line 240: the word “somatic” is not needed.
Line 272: vice, not vise.
Lines 283-285: they should also admit “or neither”, i.e. expression levels of TEs and genes could be a coincidence, neither affecting the other.
Line 289: there is no such thing as high or low tissue-specificity; either something is specific or it is not. Otherwise use the word “enriched”.
Line 307: “competitive delivery of gametes” is unusual phrasing; the competition is at the level of pollen tube growth.
Lines 356-362: I don’t understand the math: 6 of 7 and then 6 of 12.
Line 394: cite ref. 46 after “EST sequencing”; the reference for the name GEX2 (ref. 50) is missing. Neither ref should be at the end of the sentence, as “no mutant has been described” is not mentioned in either ref. 46 or ref. 50. The statements on lines 495-497 are better and accurate.
Line 400: “has been described” is passive voice; ref. 47 shows it.
Line 401: of course it encoded a protein similar to Arabidopsis Gex2; Gex2 was identified by using maize ESTs (ref. 46) to find the Arabidopsis homolog (ref. 50).
Lines 405-441: these essentially show in maize what was already shown (ref. 47) in Arabidopsis. This probably does not merit this much space.
Lines 507-508: why mention “we cannot rule out….”?
Lines 512-520: why discuss these other mutants? They don’t bear on the work in this paper.
Line 555: “mild to moderate” is an overstatement here; 5 were mild, only one was moderate.
Line 560: delete the extrapolation comment; it is pointless from such a small sample.
The Discussion is much too long and could be cut by 40%.
Reviewer #2: In this manuscript by Warman et al., the authors performed RNA-seq at different stages of pollen development to analyze the expression patterns of both genes and transposable elements and then screened mutants in highly expressed genes for transmission defects that imply functional importance of these genes. The highlight of this manuscript was the use of GFP-marked insertion mutants followed by clever imaging to identify genes that reduce pollen fitness. The authors then characterized the aborted seed phenotype seen in paternally inherited gex2 mutants, showing that this gene is involved in double fertilization. I do have some concerns about this manuscript, specifically with the analysis of the transcriptome data and the integration of transcriptome data with the analysis of transmission defects. For example, the title is misleading. The authors show that several highly expressed maize pollen genes contribute to pollen fitness, but as far as I could tell, only one of these functionally tested genes is near a TE with coordinated expression. Furthermore, the result that highly expressed genes are enriched for up-regulated TEs nearby is interesting but is not well supported by the data presented. What I could not determine from the figures is how many genes have at least one up-regulated TE within 2 kb or how TEs that overlap genes are treated. In Figure 4A, the y-axis shows the number of TEs within 2 kb of bins of 200 genes, but since a single gene may have multiple up-regulated TEs within that vicinity, it is not clear how many genes are associated with up-regulated TEs. With respect to overlap, the authors acknowledge earlier in the manuscript that TEs within 2 kb of a gene introduce complications due to overlaps and the potential for read-through transcription, so it is not clear why this major conclusion of the paper is not more thoroughly assessed to rule out this possibility. 
The conclusion that TE transcription in maize is largely static across development is surprising and contradicts prior studies in maize using EST or RNA-seq data, but this difference is not addressed in the manuscript. In looking through the methods, I am wondering if the number of “static” transcripts might be exaggerated by not using an expression cutoff to filter out lowly expressed transcripts that lack power to be called DE and by partially counting ambiguous reads to multiple locations. However, I am not entirely sure from the methods how this analysis was performed, since DE TEs were defined using DESeq2, which only allows integer counts, while FeatureCounts is set to output fractions for multi-mapped reads. The methods should also describe how read counts were normalized and what parameters were used to call significantly DE features. Finally, in Figure 2B the authors conclude that dynamic TEs tend to be longer than all TEs. However, it is not clear from this plot how much of this difference in length distribution results from the difference in TE types in the dynamic set. Since LTR Unknown TEs are on average larger than DNA transposons, the increased presence of LTRs alone could explain the pattern. If the authors want to conclude something about TE lengths in the different sets, TE lengths should be compared within each TE type.

Reviewer #3: This was a fascinating and clearly written paper. The authors describe an RNA-seq based analysis to confirm that the derepression of transposons in male gametophytes previously reported in Arabidopsis appears to be a more general property of angiosperms. They demonstrate the use of a novel screen to quantitatively estimate the effect of KO mutants on pollen fitness in a high throughput fashion and use this screen to qualify fitness effects for 30 genes.
There is suggestive but not entirely conclusive evidence that genes which are highly expressed in the pollen vegetative cell are more likely to play roles in determining pollen fitness. Finally, the authors identify and characterize a novel mutant that appears to play a role in controlling the odds of proper double fertilization of both the central cell and egg cell of megagametophytes. The last is an area of growing interest from both a flowering plant evo/devo perspective as well as being an area of economic impact and applied research interest. As I think is clear from my summary, I found this to be a very strong manuscript that will be of interest to a broad audience. I do, however, have a couple of minor concerns/questions:

1) Around line 343 the authors mention that they tested for changes in transmission rate using "quasi-likelihood tests on generalized linear models with a logit link function for binomial counts." It would be good to include a little discussion of why this test was employed rather than a simple chi-square test.
2) For Figure 2B and 2D, violin plots may be more informative than simple box plots.
3) Unless I missed it, it doesn't seem like there is any discussion of the fact that genes with significant transmission defects, even only in the male direction, are likely underrepresented in most reverse genetics populations. I would strongly suggest touching on this point in the discussion. If the data are available, it would also be quick and interesting to check whether the genes the authors identify as highly expressed in the pollen vegetative cell are indeed over- or underrepresented in the reverse genetics population employed by the authors (relative to genes highly expressed in seedlings or pollen sperm nuclei). But the test is a nice to have, not a must have.

**********

Have all data underlying the figures and results presented in the manuscript been provided?
Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: None
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

28 Jan 2020

Submitted filename: Response to Reviewers_Revision1.docx

17 Feb 2020

Dear Dr Fowler,

Thank you very much for submitting your Research Article entitled 'High expression in maize pollen correlates with genetic contributions to pollen fitness as well as with coordinated transcription from neighboring transposable elements' to PLOS Genetics. Your manuscript was fully evaluated at the editorial level and by independent peer reviewers. The reviewers appreciated the attention to an important topic but identified just a couple of minor aspects of the manuscript that should be improved.

Two reviewers were satisfied with the revisions but one reviewer noted some minor issues. In addition, I think there may be one other detail that should be added relative to your analysis of the transmission phenotypes. I did not find the specific location of the insertion sites (5' UTR, coding, 3' UTR). I might have missed this, but if this is not present could you please add this.
This will help to confirm that there are similar frequencies of coding insertions and also will reveal if coding/UTR insertions show any differences in frequency of transmission defects. We therefore ask you to modify the manuscript according to the review recommendations before we can consider your manuscript for acceptance. Your revisions should address the specific points made by each reviewer. In addition we ask that you: 1) Provide a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript. 2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images. We hope to receive your revised manuscript within the next 30 days. We do not anticipate a need to send this back out for another evaluation by external reviewers.  If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. 
PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] Please let us know if you have any questions while making these revisions. Yours sincerely, Nathan M. Springer Associate Editor PLOS Genetics Gregory P. Copenhaver Editor-in-Chief PLOS Genetics Reviewer's Responses to Questions Comments to the Authors: Please note here if the review is uploaded as an attachment. Reviewer #1: The authors have sufficiently addressed most of my comments. On line 525, I don't see the necessity to state "To our knowledge, this is the largest yet...." In the same way that I don't like claims of first. I suggest removing this phrasing: let the paper stand on its merits, without claims of size, priority, etc. line 584, they state here that the two points in the original title are independent, although this is somewhat burying the lede. 
In this section of the discussion they use the word "tendency" whereas the title, using the word correlates, sounds more definitive - so I think the title could be further revised to reflect reality, as stated in the discussion, i.e. tendencies, and independent. In their response to reviews they state: "In addition, we note that the maize and Arabidopsis nomenclature conventions also distinguish the genes, as maize uses all lower case letters (gex2), whereas Arabidopsis uses all upper case (GEX2)" The average reader will not be familiar with the nomenclature rules of both maize and Arabidopsis (lower case, upper case), so it is important (for readability/understanding) to distinguish them throughout. Reviewer #2: I appreciate the changes that the authors made to clarify the separate findings related to the transcriptome analyses and transmission defect / phenotypic characterizations. My technical concerns with the manuscript have been appropriately addressed and the authors have added sufficient clarifications on methods where needed. Reviewer #3: The authors have done an excellent job of addressing my concerns from the first round of review. ********** Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information. Reviewer #1: Yes Reviewer #2: Yes Reviewer #3: None ********** PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? 
For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No Reviewer #3: No 27 Feb 2020 Dear Dr Fowler, We are pleased to inform you that your manuscript entitled "High expression in maize pollen correlates with genetic contributions to pollen fitness as well as with coordinated transcription from neighboring transposable elements" has been editorially accepted for publication in PLOS Genetics. Congratulations! Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional accept, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  
This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. If you have a press-related query, or would like to know about one way to make your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date. Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics! Yours sincerely, Nathan M. Springer Associate Editor PLOS Genetics Gregory P. Copenhaver Editor-in-Chief PLOS Genetics www.plosgenetics.org Twitter: @PLOSGenetics ---------------------------------------------------- Comments from the reviewers (if applicable): ---------------------------------------------------- Data Deposition If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website. The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-19-01653R2 More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support. 
Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present. ---------------------------------------------------- Press Queries If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org. 13 Mar 2020 PGENETICS-D-19-01653R2 High expression in maize pollen correlates with genetic contributions to pollen fitness as well as with coordinated transcription from neighboring transposable elements Dear Dr Fowler, We are pleased to inform you that your manuscript entitled "High expression in maize pollen correlates with genetic contributions to pollen fitness as well as with coordinated transcription from neighboring transposable elements" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. 
Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work! With kind regards, Jason Norris PLOS Genetics On behalf of: The PLOS Genetics Team Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom plosgenetics@plos.org | +44 (0) 1223-442823 plosgenetics.org | Twitter: @PLOSGenetics
