Literature DB >> 32963085

Network Analysis Prioritizes DEWAX and ICE1 as the Candidate Genes for Major eQTL Hotspots in Seed Germination of Arabidopsis thaliana.

Margi Hartanto¹, Ronny V L Joosen², Basten L Snoek³, Leo A J Willems², Mark G Sterken⁴, Dick de Ridder⁵, Henk W M Hilhorst², Wilco Ligterink², Harm Nijveen¹.

Abstract

Seed germination is characterized by a constant change of gene expression across different time points. These changes are related to specific processes, which eventually determine the onset of seed germination. To get a better understanding on the regulation of gene expression during seed germination, we performed a quantitative trait locus mapping of gene expression (eQTL) at four important seed germination stages (primary dormant, after-ripened, six-hour after imbibition, and radicle protrusion stage) using Arabidopsis thaliana Bay x Sha recombinant inbred lines (RILs). The mapping displayed the distinctness of the eQTL landscape for each stage. We found several eQTL hotspots across stages associated with the regulation of expression of a large number of genes. Interestingly, an eQTL hotspot on chromosome five collocates with hotspots for phenotypic and metabolic QTL in the same population. Finally, we constructed a gene co-expression network to prioritize the regulatory genes for two major eQTL hotspots. The network analysis prioritizes transcription factors DEWAX and ICE1 as the most likely regulatory genes for the hotspot. Together, we have revealed that the genetic regulation of gene expression is dynamic along the course of seed germination.

Entities: Chemical Disease Gene Species

Keywords: Arabidopsis; eQTL; network analysis; seed germination

Mesh：

Substances：

Year: 2020 PMID： 32963085 PMCID： PMC7642920 DOI： 10.1534/g3.120.401477

Source DB: PubMed Journal: G3 (Bethesda) ISSN： 2160-1836 Impact factor: 3.154

Seed germination involves a series of events starting with the transition of quiescent to physiologically active seeds and ends with the emergence of the embryo from its surrounding tissues. Germination is initiated when seeds become imbibed by water, leading to the activation of seed physiological activities (Bewley ; Nonogaki ). Major metabolic activities occur after seeds become hydrated, for example, restoration of structural integrity, mitochondrial repair, initiation of respiration, and DNA repair (Bewley ; Nonogaki ). For some species such as Arabidopsis thaliana, germination can be blocked by seed dormancy. Dormant seeds need to sense and respond to environmental cues to break their dormancy and complete germination. In Arabidopsis thaliana, seed dormancy can be alleviated by periods of dry after-ripening or moist chilling (Bewley ). Soon after dormancy is broken, the storage reserves are broken down, and germination-associated proteins are synthesized. Lastly, further water uptake followed by cell expansion leads to radicle protrusion through endosperm and seed coat, which marks the end of germination (Bewley ). A major determinant for the completion of seed germination is the transcription and translation of mRNAs. The activity of mRNA transcription is low in dry, mature seeds (Comai and Harada 1990; Leubner-Metzger 2005), and drastically increases after seeds become rehydrated (Bewley ). Nevertheless, stored mRNAs of more than 12,000 genes with various functions are already present in dry seeds. These mRNAs are not only remnants from the seed developmental process, but also mRNAs for genes related to metabolism as well as protein synthesis and degradation required in early seed germination (Nakabayashi ; Rajjou ). Later in after-ripened seeds, only a slight change in transcript composition was detected compared to the dry seeds (Finch-Savage ). The major shift in transcriptome takes place after water imbibition (Nakabayashi ). Interestingly, the transcriptome at the imbibition stage depends on the status of dormancy. For non-dormant seeds, most of the transcripts are associated with protein synthesis, while for dormant seeds, the transcripts are dominated by genes associated with stress-responses (Finch-Savage ; Buijs ). Even the transcript composition in primary dormant seeds, which occurs when the dormancy is initiated during development, is different from that of secondary dormant seeds, which occurs when the dormancy is reinduced (Cadman ). These findings show the occurrence of phase transitions in transcript composition along the course from dormant to germinated seed. As omics technology becomes more widely available, several transcriptomics studies in seed germination processes have been conducted on a larger-scale. More developmental stages, e.g., stratification and seedling stage, and even spatial analyses have been included in these studies, resulting in the identification of gene co-expression patterns as well as the predicted functions of hub-genes (Narsai ; Silva ; Dekkers ; Bassel ). Through guilt-by-association, these co-expression based studies can be used for the identification of regulatory genes that are involved in controlling the expression of downstream genes. These regulatory genes can be subjected to further studies by reverse genetics to provide more insight into the molecular mechanisms of gene expression in seed germination (e.g., Silva ). Nevertheless, this approach still has limitations. Uygun argued that co-expressed genes do not always have similar biological functions. On the other hand, genes involved in the same function are not always co-expressed since gene expression regulation could be the result of post-transcriptional or other layers of regulation (Lelli ). Further, Uygun emphasized the importance of combining the expression data with multiple relevant datasets to maximize the effort in the prioritization of candidate regulatory genes. Genetical genomics is a promising approach to study the regulation of gene expression by combining genome-wide expression data with genotypic data of a segregating population (Jansen and Nap 2001). To enable this strategy, the location of markers associated with variation in gene expression is mapped on the genome, which results in the identification of expression quantitative trait loci (eQTLs). Relative to the location of the associated gene, the eQTL can be locally or distantly mapped, known as local and distant eQTLs (Rockman and Kruglyak 2006; Brem ). Local eQTLs mostly arise because of variations in the corresponding gene or a cis-regulatory element. In contrast, distant eQTLs typically occur due to polymorphism on trans-regulatory elements located far away from the target genes (Rockman and Kruglyak 2006). Therefore, given the positional information of distant eQTLs, one can identify the possible regulators of gene expression. However, the eQTL interval typically spans a large area of the genome and harbors hundreds of candidate regulatory genes. A large number of candidate genes would cause the experimental validation (e.g., using knockout or overexpression lines) to be costly and take a long time. Therefore, a prioritization method is needed to narrow down the list of candidate genes underlying eQTLs, particularly on distant eQTL hotspots. A distant eQTL hotspot is a genomic locus where a large number of distant eQTLs are collocated (Breitling ). The common assumption is that the hotspot arises due to one or more polymorphic master regulatory genes affecting the expression of multiple target genes (Breitling ). Therefore, the identification of master regulatory genes becomes the center of most genetical genomics studies as the findings might improve our understanding of the regulation of gene expression (i.e., in Keurentjes ; Jimenez-Gomez ; Sterken ; Valba ; Terpstra ). In this study, we carried out eQTL mapping to reveal loci controlling gene expression in seed germination. To capture whole transcriptome changes during seed germination, we included four important seed germination stages, which are primary dormant seeds (PD), after-ripened seeds (AR), six-hours imbibed seeds (IM), and seeds with radicle protrusion (RP). In total, 160 recombinant inbred lines (RILs) from a cross between genetically distant ecotypes Bayreuth and Shahdara (Bay x Sha) were used in this study (Loudet ). Our results show that each seed germination stage has a unique eQTL landscape, confirming the stage-specificity of gene regulation, particularly for distant regulation. Based on network analysis, we identify the transcription factors ICE1 and DEWAX as prioritized candidate regulatory genes for two major eQTL hotspots in PD and RP, respectively. Finally, the resulting dataset complements the previous phenotypic QTL (Joosen ) and metabolite QTL (Joosen ) datasets, allowing systems genetics studies in seed germination. The identified eQTLs are available through the web-based AraQTL (http://www.bioinformatics.nl/AraQTL/) workbench (Nijveen ).

Materials and Methods

Plant materials

In this study, we used 164 recombinant inbred lines (RILs) derived from a cross between the Bayreuth and Shahdara Arabidopsis ecotypes (Loudet ) provided by the Versailles Biological Resource Centre for Arabidopsis http://publiclines.versailles.inra.fr/. The plants were sown in a fully randomized setup on 4x4 cm rockwool plugs (MM40/40, Groudan B. V.) and hydrated with 1 g/l Hyponex (NPK = 7:6:19, http://www.hyponex.co.jp) in a climate chamber (20° day, 18° night) with 16 hr of light (35 W/m2) at 70% relative humidity. Seeds from four to seven plants per RIL were bulk harvested for the experiment (see also Joosen ; Joosen ). The genotypic data consisting of 1,059 markers per line was obtained from Serin . However, the genotypic data are available only for 160 RILs; therefore, we used this number of lines for eQTL mapping.

Experimental setup

The RIL population was grouped into four subpopulations, each one representing one of the four different seed germination stages. We used the designGG-package (Li ) in R (version 3.6.0 Windows x64) to aid the grouping so that the distribution of Bay-0 and Sha alleles between sub-populations is optimized. The first stage is the primary dormant (PD) stage when the seeds were harvested and stored at -80° after one week at ambient conditions. The second stage is after-ripened (AR) seeds that obtained maximum germination potential after five days of imbibition by storing at room temperature and ambient relative humidity. The third stage is the 6 hr imbibition (IM) stage. For this stage, the seeds were after-ripened and imbibed for six hours on water-saturated filter paper at 20° and immediately transferred to a dry filter paper for 1 min to remove the excess of water. The fourth stage is the radicle protrusion (RP) stage. To select seeds at this stage, we used a binocular to observe the presence of a protruded radicle tip.

RNA isolation

Total RNA was extracted according to the hot borate protocol modified from Wan and Wilkins (1994). For each treatment, 20 mg of seeds were homogenized and mixed with 800 μl of extraction buffer (0.2M Na boratedecahydrate (Borax), 30 mM EGTA, 1% SDS, 1% Na deoxycholate (Na-DOC)) containing 1.6 mg DTT and 48 mg PVP40 which had been heated to 80°. Then, 1 mg proteinase K was added to this suspension and incubated for 15 min at 42°. After adding 64 μl of 2 M KCL, the samples were incubated on ice for 30 min and subsequently centrifuged for 20 min at 12,000 g. Ice-cold 8 M LiCl was added to the supernatant in a final concentration of 2 M, and the tubes were incubated overnight on ice. After centrifugation for 20 min at 12,000 g at 4°, the pellets were washed with 750 μl ice-cold 2 M LiCl. The samples were centrifuged for another 10 min at 10,000 g at 4°, and the pellets were re-suspended in 100 μl DEPC treated water. The samples were phenol-chloroform extracted, DNAse treated (RQ1 DNase, Promega), and further purified with RNeasy spin columns (Qiagen) following the manufacturer’s instructions. The RNA quality and concentration were assessed by agarose gel electrophoresis and UV spectrophotometry.

Microarray analysis

RNA was processed for use on Affymetrix Arabidopsis SNPtile array (atSNPtilx520433), as described by the manufacturer. Briefly, 1 mg of total RNA was reverse transcribed using a T7-Oligo(dT) Promoter Primer in the first-strand cDNA synthesis reaction. Following RNase H-mediated second-strand cDNA synthesis, the double-stranded cDNA was purified and served as a template in the subsequent in vitro transcription reaction. The reaction was carried out in the presence of T7 RNA polymerase and a biotinylated nucleotide analog/ribonucleotide mix for complementary RNA (cRNA) amplification and biotin labeling. The biotinylated cRNA targets were then cleaned up, fragmented, and hybridized to the SNPtile array. The hybridization data were extracted using a custom R script with the help of an annotation-file based on TAIR10. Intensity data were log-transformed and normalized using the normalizeBetweenArrays function with the quantile method from Bioconductor package limma (Ritchie ). Then, for each annotated gene, the log-intensities of anti-sense exon probes were averaged.

Clustering analysis

Principal component analysis for log-intensities of all parents and RIL population samples was done using the pr.comp function in R where the unscaled log intensities are shifted to be zero centered. For hierarchical clustering, we only selected genes with a minimal fold change of 2 between any pair of consecutive stages (PD to AR, AR to IM, or IM to RP). Then, the distance matrices of filtered genes and all samples were calculated using the absolute Pearson correlation. These matrices were clustered using Ward’s method. We manually set the number of clusters to eight and performed gene ontology enrichment for each of the clusters using the weight algorithm of the topGO package in R and used 29,913 genes detected by hybridization probes as the background (Alexa ).

eQTL mapping

For eQTL mapping, we used 160 RILs separated into four subpopulations, each representing one specific seed germination stage. For each stage separately, eQTLs were mapped using a single-marker model, as in Sterken . The gene expression data were fitted to the linear modelwhere y is the log-intensity representing the expression of a gene ( = 1, 2, ..., 29,913) of RIL ( = 1, 2, ..., 160) explained by the parental allele on marker location ( = 1, 2, ..., 1,059). The random error in the model is represented by . To account for the multiple-testing burden in this analysis, we determined the genome-wide significant threshold using a permutation approach (e.g., see Sterken ). A permuted dataset was created by randomly distributing the log-intensities of the gene under study over the genotypes. Then, the previous eQTL mapping model was performed on this permuted dataset. This procedure was repeated 100 times for each stage. The threshold was determined using:where, at a specific significance level, the false discoveries () were the averaged permutation result, and real discoveries () were the outcome of the eQTL mapping using the unpermuted dataset. The number of true hypotheses tested () was 29,913 - and the number of hypotheses () tested was the number of genes, which was 29,913. For the -value, we used a threshold of 0.05. As a result, we got a threshold of 4.2 for PD and AR, 4.1 for IM, and 4.3 for RP. The confidence interval of an eQTL was determined based on a -log10(p-value) drop of 1.5 compared to the peak marker (as in Keurentjes ; Cubillos ). We determine an eQTL as local if the peak marker or the confidence interval lies within 1 Mb or less from the target gene location (as in Cubillos ). All eQTLs that did not meet this criterion were defined as distant. We defined a region as an eQTL hotspot if the number of distant-eQTLs mapped to a particular genomic region significantly exceeded the expectation. First, we divided the genome into bins of 2 Mb. Then, we determined the expected number of distant-eQTLs per genomic bin by dividing the total number of distant-eQTLs by the total number of bins. Based on a Poisson distribution, any bin having an actual number of distant-eQTLs larger than expected (P < 0.0001) was then considered as an eQTL hotspot.

Gene regulatory network inference and candidate genes prioritization of eQTL hotspot

We used a community-based approach to infer regulatory networks of genes with an eQTL on a hotspot location using expression data. In this approach, we assume the hotspot is caused by a polymorphism in or near one or more regulatory genes causing altered expression that can be detected as a local eQTL (Joosen ; Breitling ; Jimenez-Gomez ; Serin ). Based on this assumption, we labeled all genes with a local eQTL on a hotspot as candidate regulators and genes with a distant eQTL as targets. The expression of these genes was subjected to five different network inference methods to predict the interaction weight. The methods used were TIGRESS (Haury ), Spearman correlation, CLR (Faith ), ARACNE (Margolin ), and GENIE3 (Huynh-Thu ). The predictions from GENIE3 were used to establish the direction of the interaction by removing the one that has the lowest variable importance to the expression of the target genes between two pairs of genes. For instance, if the importance of genei – genej is smaller than genej – genei, then the former is removed. By averaging the rank, the predictions of all inference methods were integrated to produce a robust and high performance prediction (Marbach ). The threshold was determined as the minimum average rank where all nodes are included in the network. Finally, the network was visualized using Cytoscape (version 3.7.1) (Shannon ), and network properties were calculated using the NetworkAnalyzer tool (Assenov ). The candidate genes for each eQTL hotspot were prioritized based on their outdegree and closeness centrality (Pavlopoulos ).

Data availability

The list of genetic markers, genotype, and gene expression data used in this study are given in Table S7, Table S8, and Table S9, respectively. Cel files of microarray data have been deposited in the ArrayExpress database at EMBL-EBI (www.ebi.ac.uk/arrayexpress) under accession number E-MTAB-9080. The phenotype and metabolite measurement can be found in Table S10 and Table S11. The list of differentially expressed genes is in Table S12. All QTL mapping results are given in Table S13 (expression QTL), Table S14 (phenotype QTL), and Table S15 (metabolite QTL). The code for the analysis and visualization is available in the form of R scripts at the Wageningen University GitLab repository (https://git.wur.nl/harta003/seed-germination-qtl). Supplemental material available at figshare: https://doi.org/10.25387/g3.12844358.

Results

Major transcriptional shifts take place after water imbibition and radicle protrusion

To visualize the transcriptional states of the parental lines and the RILs at the four seed germination stages, we performed a principal component analysis using the log-intensities of all expressed genes (Figure 1). The first principal component explains 55.6% of the variation and separates the samples into three groups. Germination progresses from left to right with the PD and AR seeds grouping together, indicating that the after-ripening treatment does not induce a considerable change in global transcript abundance. The large-scale transcriptome change only happens after water imbibition and radicle protrusion. This event was also observed by Finch-Savage and Silva . The second principal component on the PCA explains 14.2% variance in the data and separates the RILs within each of the three clusters but not the parents. The source of this variation may be the genetic variation among samples and shows transgressive segregation of gene expression in RILs due to genetic reshuffling of the parental genomes during crossing and generations of selfing.

Figure 1

Principal component plot derived from transcriptome measurements of 164 RILs, and the Bay-0 and Sha parental lines taken at primary dormant seed (PD), after-ripened seed (AR), six-hours after imbibition (IM), and at the time when the radicle is protruded (RP). To identify specific expression patterns among genes in the course of seed germination, we performed an additional analysis of the transcriptome data using hierarchical clustering (Figure 2). For this analysis, we only selected the 990 genes with a minimal fold change of two between any two consecutive stages (PD to AR, AR to IM, IM to RP). We then clustered both the genes and the seed samples. As shown in the figure, the clustering of samples shows similar grouping as in the previous PCA plot; three clusters were formed with one cluster containing both PD and AR, while IM and RP form separate clusters.

Figure 2

Hierarchical clustering of Bay-0, Sha, and 164 RILs transcriptome samples measured at four different seed germination stages (top) and 990 genes differentially expressed between two consecutive stages (left). Listed genes are the sample of genes for each cluster. Some enriched gene ontology terms for gene clusters are listed on the right. The clustering of genes shows at least three distinctive gene expression patterns. In the first pattern, transcript abundance is highest in the last stage, radicle protrusion. A GO enrichment test suggests that transcripts with this expression pattern are involved in the transition from the heterotrophic seed to the autotrophic seedling stage, with enriched processes such as photosynthesis, response to various light, and response to temperature. This is in agreement with Rajjou , who showed that genes required for seedling growth are expressed after water imbibition. The second pattern shows an opposite trend with higher transcript abundances in the first three stages and lower expression at the end of the seed germination process. Some of these transcripts may be the remnant of seed development since the GO term related to this process is overrepresented. Moreover, transcripts involved in response to hydrogen peroxide were also overrepresented, which provides more evidence for the importance of reactive oxygen species in seed germination (for review see Wojtyla ). The last pattern represents genes that are upregulated at the IM stage. Genes with this pattern are functionally enriched in the catabolism of fatty acids, a likely source of energy for seedling growth (Bewley ). Altogether, these results suggest that co-expression patterns of genes reflect particular functions during the seed germination process.

Distant eQTLs explain less variance than local eQTLs and are more specific to a seed germination stage

To map loci associated with gene expression levels, we performed eQTL mapping of 29,913 genes for each seed population representing four seed germination stages (Table 1). We found eQTLs, numbers ranging from 1,335 to 1,719 per stage (FDR = 0.05), spread across the genome. Among the genes with an eQTL, only a few (less than 1%) had more than one. We then categorized the eQTLs into local and distant based on the distance between the target gene and the eQTL peak marker or the confidence interval. Based on this criterion, over 72% of the eQTLs per stage were categorized as local (located within 1 Mb of the gene), while the remainder were distant. Although the total of the identified eQTLs was different between the stages, the ratio of distant to local eQTLs was relatively similar for all stages. We then calculated the fraction of the total variation that is explained by the simple linear regression model for each eQTL. By comparing the density distributions (Figure S1), we showed that local eQTLs generally explain a more substantial fraction of gene expression variation than distant eQTLs. Finally, we determined the number of specific and shared eQTLs across stages (Figure 3). Here, we show that distant eQTLs are more specific to seed germination stages. Local eQTLs, on the other hand, are commonly shared between two or more stages, which is in line with previous experiments showing overlapping local eQTLs and specific distant eQTLs across different developmental stages (Vinuela ), environments (Snoek ; Snoek ; Lowry ) and populations (Cubillos ).

Table 1

Summary of the eQTL mapping for the four different seed germination stages

stage	eQTLs	genes with an eQTL	eQTL type	total	proportion
primary dormant	1,335	1,328	local	955	0.72
primary dormant	1,335	1,328	distant	380	0.28
after-ripened	1,395	1,377	local	1,089	0.78
after-ripened	1,395	1,377	distant	306	0.22
six hours after imbibition	1,719	1,702	local	1,320	0.77
six hours after imbibition	1,719	1,702	distant	399	0.23
radicle protrusion	1,426	1,418	local	1,096	0.77
radicle protrusion	1,426	1,418	distant	330	0.23

Figure 3

Shared local and distant eQTLs per seed germination stage.

An eQTL hotspot on chromosome 5 is associated with genes related to seed germination and collocates with multiple metabolic and phenotypic QTL

To get an overview of how the eQTLs were mapped over the genome, we visualized the eQTL locations and their associated genes on a local/distant eQTL plot (Figure 4A). Here, the local eQTLs are aligned across the diagonal and spread relatively equally across the genome, while it is not the case for the distant eQTLs. Furthermore, specific loci show clustering of eQTLs, which could indicate the presence of major regulatory genes that cause genome-wide gene expression changes. We identified ten so-called (distant-) eQTL hotspots, with at least two hotspots per stage (Table 2). The number of distant eQTLs located within these hotspots ranges from 16 to 96. The major eQTL hotspots are PD2, IM2, and RP4, with 69, 69, and 96 distant eQTLs co-locating, respectively. Moreover, the landscape of the eQTL hotspots (Figure 4B) differs for every stage, including PD and AR, which is surprising since these two stages have a relatively similar transcriptome profile (Figure 1).

Figure 4

Table 2

Distant eQTL hotspots of the four seed germination stages. These hotspots were identified by dividing the genome into bins of 2 Mbp and performing a test to determine whether the number of distant eQTLs on a particular bin is higher than expected (P > 0.0001) assuming a Poisson distribution. Seed germination phenotype and metabolite data were taken from Joosen and Joosen , respectively. Detailed information about enriched GO terms, metabolite, and phenotype can be seen on Supplemental Table S2 in the Supplementary Material

hotspot ID	position	distant eQTLs	enriched GO terms	metabolite QTL	phenotype QTL
PD1	ch1:6-10 Mb	43	11	1	4
PD2	ch3:8-12 Mb	69	3	2	1
AR1	ch2:12-14 Mb	16	0	0	0
AR2	ch3:2-4 Mb	20	9	1	1
IM1	ch5:6-8 Mb	19	2	24	1
IM2	ch5:22-26 Mb	69	6	6	31
RP1	ch1:0-2 Mb	23	1	0	1
RP2	ch1:6-8 Mb	18	0	0	3
RP3	ch5:14-16 Mb	21	29	0	1
RP4	ch5:24-26Mb	96	18	20	25

eQTL mapping from four different seed germination stages. The local-distant eQTL plot is shown on top (A). The positions of eQTLs are plotted along the five chromosomes on the x-axis and the location of the genes with an eQTL is plotted on the y-axis. The black dots (●) represent local eQTLs (located within 1 Mb of the associated gene) and the colored dots represent distant eQTLs (located far from the associated gene). The gray horizontal lines next to each dot indicate the confidence interval of the eQTL location based on a 1.5 drop in -log10(p-value). The histogram of the number of eQTLs per genomic location is shown at the bottom (B). The horizontal dashed black lines mark the significance threshold for an eQTL hotspot. We remapped the QTL for previously studied seed germination phenotypes (Joosen ) and metabolites (Joosen ) using the RNA-seq based genetic map (Serin ). We then visualized the resulting QTL count histograms alongside the eQTL histogram (Figure 5). The histogram shows that several eQTL hotspots collocate with hotspots for phenotype and metabolite QTL (phQTLs and mQTLs, respectively). The most striking example is the collocation of QTL on chromosome 5 around 24—25 Mb (IM2 and RP4) at the last two stages of seed germination. We performed gene ontology (GO) term enrichment analysis for genes with an eQTL mapping to these hotspots, and found ‘seed germination’ enriched among other terms (Table 2). These findings taken together indicate that the IM2 and RP4 hotspots harbor one or more important genes affecting gene expression during seed germination. Therefore, the identification of the regulatory gene(s) for one of these hotspots can give us more insight into the trans-regulation of gene expression during seed germination.

Figure 5

Hotspots for phQTLs, mQTLs, and eQTLs. A region of interest is located on chromosome 5 (around 24—26 Mb) where hotspots from different QTL levels collocate.

Transcription factors were prioritized as the candidate genes for major eQTL hotspots

To prioritize the candidate regulatory genes underlying eQTL hotspots in this study, we constructed a network based on the expression of genes with eQTLs on the hotspot location. We built the network for two hotspots: RP4, where QTL for expression, metabolite, and phenotype are collocated; and PD2, another major eQTL hotspot in this study. For RP4, the total number of genes used to construct the network was 116, of which 20 had a local eQTL at the hotspot, whereas for PD2, 114 genes were identified, of which 45 with a local eQTL. The genes with local eQTLs were then labeled as candidates. The networks were constructed by integrating predictions from several gene regulatory network inference methods to ensure the robustness of the result (Marbach ). The direction of the edges in the network is predicted using the GENIE3 method (Huynh-Thu ). For each candidate gene, we calculated the outdegree, indicating the number of outgoing edges of a gene to other genes in the network, and the closeness centrality of the candidate gene nodes, which shows the efficiency of the gene in spreading information to the rest of the genes in the network (Pavlopoulos ). Finally, these two network properties were used to prioritize the most likely regulator of the distant eQTL hotspot. In the resulting network, genes encoding the transcription factors DECREASE WAX BIOSYNTHESIS/DEWAX (AT5G61590), and INDUCER OF CBP EXPRESSION 1/ICE1 (AT3G26744) were prioritized as the most likely candidate genes for RP4 (Figure 6) and PD2 (Figure 7), respectively. As many as 15 genes were predicted to be associated with DEWAX and 32 genes with ICE1. Note that these numbers depend on the chosen threshold; nonetheless, the current candidates are robust to changes when the parameter was changed (Table S3 and Table S4). Furthermore, these two genes also had the highest closeness centrality among the other candidates, showing that these genes have a strong influence within the network. We assessed the Bay x Sha SNP data (Genomes Consortium. Electronic address and 1001 Genomes Consortium 2016) and found several SNPs between the Bay and Sha parents in both the DEWAX and ICE1 genes, including two that affect the amino acid sequence of the corresponding proteins (Table S5 and Table S6). Also, querying for DEWAX and ICE1 on AraQTL showed a local eQTL for both genes in an experiment using the same RIL population on leaf tissue (West ). This evidence supports the hypothesis that polymorphisms between the Bay and Sha alleles of DEWAX and ICE1 are responsible for the steadily occurring local eQTLs at three stages (PD, IM, RP) for DEWAX and all four stages for ICE1.

Figure 6

Figure 7

The prioritization of candidate genes for the PD2 eQTL hotspot. The network of genes associated with PD2 is visualized in A. The genes in the network are represented by nodes with various sizes according to the outdegree. The unlabeled gray nodes are the targets (genes with a distant eQTL) and labeled green nodes are the candidates (genes with a local eQTL). Nodes with a red border are transcription factors. The yellow node is ICE1 (AT3G26744). The list of top ten candidate genes for the hotspot is shown in B. The expression of ICE1 in 160 RILs across the four seed germination stages is visualized in C. The RILs with the Sha allele of the gene are depicted in blue, the ones with the Bay-0 allele of ICE1 are depicted in red.

The prioritization of candidate genes for RP4 eQTL hotspot. The network of genes associated with RP4 is visualized in A. The genes in the network are represented by nodes with various sizes according to the outdegree. The unlabeled gray nodes are the targets (genes with a distant eQTL) and the labeled green nodes are the candidates (genes with a local eQTL). Nodes with a red border are transcription factors. The yellow node is DEWAX (AT5G61590). The list of top ten candidate genes for the hotspot is shown in B. The expression of DEWAX in 160 RILs across the four seed germination stages is visualized in C. The RILs with the Sha allele of the gene are depicted in blue, the ones with the Bay-0 allele of DEWAX are depicted in red. The prioritization of candidate genes for the PD2 eQTL hotspot. The network of genes associated with PD2 is visualized in A. The genes in the network are represented by nodes with various sizes according to the outdegree. The unlabeled gray nodes are the targets (genes with a distant eQTL) and labeled green nodes are the candidates (genes with a local eQTL). Nodes with a red border are transcription factors. The yellow node is ICE1 (AT3G26744). The list of top ten candidate genes for the hotspot is shown in B. The expression of ICE1 in 160 RILs across the four seed germination stages is visualized in C. The RILs with the Sha allele of the gene are depicted in blue, the ones with the Bay-0 allele of ICE1 are depicted in red.

Discussion

The function of DEWAX may be related to seed cuticular wax biosynthesis

In this study, we constructed a network of genes associated with the RP4 eQTL hotspot and showed that DEWAX was prioritized as the candidate gene for the hotspot. DEWAX encodes an AP2/ERF-type transcription factor that is well-known as a negative regulator of cuticular wax biosynthesis (Go et al. 2014; Suh and Go 2014; Cui et al., 2016; Li et al., 2019) and a positive regulator of defense response against biotic stress (Froschel ; Ju ). This gene also seems to be involved in drought stress response (Huang ) by inducing the expression of genes that confer drought tolerance (Sun ), some of which (LEA4-5, LTI-78) have a distant eQTL at the RP4 hotspot. Moreover, the overexpression of DEWAX in Arabidopsis increases the seed germination rate (Sun ). The role of DEWAX in seed germination is still unknown but may be related to cuticular wax biosynthesis. Wax is a mixture of hydrophobic lipids, which is part of the plant cuticle together with cutin and suberin (Yeats and Rose 2013). Previous studies have demonstrated that the biosynthesis of wax in the cuticular layer of stems and leaves is negatively regulated by DEWAX (Go ; Suh and Go 2014; Cui ; Li ). Although the function of this gene has never been reported in seeds, the presence of a cuticular layer indeed plays a significant role in maintaining seed dormancy (Nonogaki 2019; De Giorgi ). In Arabidopsis seeds, the thick cuticular structure covering the endosperm prevents cell expansion and testa rupture that precede radicle protrusion. Besides, this layer also reduces the diffusion of oxygen into the seed, thus preventing oxidative stress that may cause rapid seed aging and loss of dormancy (De Giorgi ). Besides DEWAX, MUM2 is another possible regulatory gene for the RP4 hotspot based on QTL confirmation of an imbibed seed size phenotype using a heterogeneous inbred family approach (Joosen ). In our study, we also discovered that most eQTLs on the RP4 hotspot peak at the marker located closely to the MUM2 location (Figure S2), which provides more evidence for this gene as the regulator for the hotspot. MUM2 encodes a cell-wall modifying beta-galactosidase involved in seed coat mucilage biosynthesis, and the mum2 mutant is characterized by a failure in extruding mucilage after water imbibition (Dean ). In our analysis, MUM2 did not have a distant eQTL on the RP4 hotspot; thus, it is not prioritized as a prominent candidate, pointing out a limitation of our approach in prioritizing candidate eQTL hotspot genes which will be discussed later. Nonetheless, we found some evidence connecting DEWAX to MUM2. First, Shi found out that the mutant of CPL2, another gene involved in wax biosynthesis, showed a delayed secretion of the enzyme encoded by MUM2 that disrupts seed coat mucilage extrusion. In the same study, they revealed that CPL2 encodes a phosphatase involved in secretory protein trafficking required for the secretion of extracellular matrix materials, including wax and the cell wall-modifying enzyme MUM2. Although no direct connection between DEWAX and CPL2 has been reported, a recent study by Xu et al. did identify DEWAX as a putative regulator of cell-wall-loosening EXPANSIN (EXPA) genes involved in germination (Xu ). These findings provide a link between wax biosynthesis and cell-wall modifying enzymes, and possibly between the genes involved in these processes. Second, the expression of DEWAX may be the consequence of the disruption of seed mucilage extrusion. Penfield suggest that seed mucilage helps enhance water uptake to ensure efficient germination in the condition of low water potential. This is supported by the evidence that the mucilage-impaired mutant showed reduced maximum germination only on osmotic polyethylene glycol solutions (Penfield ). Therefore, the absence of mucilage in imbibed seed under low water potential may cause osmotic stress in the seed and, in turn, induce the expression of DEWAX, which is known to play a role in the response of plants against osmotic stress (Sun ). If this is the case, then a scenario could be that DEWAX acts downstream of MUM2, and the expression variation of these two genes lead to the emergence of the RP4 eQTL hotspot.

Network analysis shows the involvement of ICE1 as a regulator of gene expression during seed germination

ICE1 is an MYC-like basic helix-loop-helix (bHLH) transcription factor that shows pleiotropic effects in plants. Earlier studies of ICE1 mostly focus on the protein function in the acquisition of cold tolerance (Lee ; Chinnusamy ) and stomatal lineage development (Kanaoka ). Recently, ICE1 was also shown to form a heterodimer with ZOU, another bHLH transcription factor, to regulate endosperm breakdown required for embryo growth during seed development (Denay ). At a later stage, ICE1 negatively regulates ABA-dependent pathways to promote seed germination and seedling establishment (Liang and Yang 2015). This process involves repressing the expression of transcription factors in ABA signaling, such as ABI3 and ABI5, and ABA-responsive genes, such as EM6 and EM1, thus initiating seed germination and subsequent seedling establishment (Hu ; MacGregor ). Loss of ice1 has been reported to lead to reduced germination (MacGregor ) In this study, we performed a network analysis for genes having distant eQTLs on the PD2 hotspot and prioritized ICE1 as the most likely regulator using network analysis. The high connectivity of ICE1 with the other genes in the network could reflect an essential regulatory function of this gene during seed germination. However, we did not find any of the known ICE1 target genes (e.g., ABI3, ABI5, EM1, and EM6) nor seed germination phenotype (Figure 5) having a QTL at the ICE1 locus. It could be that the ICE1 polymorphism is not severe enough to cause considerable trait variation, especially to break a robust biological system where several buffering mechanisms exist to prevent small molecular perturbation from propagating to the phenotypic level (Signor and Nuzhdin 2018; Fu ). A good strategy to validate that a predicted candidate gene indeed causes a QTL hotspot would be to test one parent’s allele of the gene in the genetic background of the other parent. This could be achieved by generating near-isogeneic lines, although rapid developments in site directed mutagenesis might offer a more feasible high-throughput approach for future studies. Next, being able to convert one parent’s gene into the other parent’s gene one SNP at a time would even allow identification of causal SNPs.

Limitations of co-expression network in identifying candidate genes of eQTL hotspots

The construction of a co-expression network is a promising approach to prioritize candidate eQTL genes (Serin ). Despite its potential, there is a major limitation in using a co-expression network. The network is based on gene expression data; hence the identified causal genes are those that directly affect gene expression. For example, as we described above, our approach did not prioritize MUM2 for the RP4 hotspot, possibly because the gene does not cause variation in the target gene expression but rather causes differences at another level of target gene regulation (e.g., enzyme biosynthesis) between two parental alleles in the RIL population. Other studies reported similar results where a known causal gene was not detected as a hub in the network (Jimenez-Gomez ; Sterken ). To overcome this, future work should focus on networks that are built upon multi-omics data by including metabolic, proteomic, and, more importantly, phenotypic measurement data (Hawe ). Moreover, prior biological knowledge, including protein-protein interaction (Szklarczyk ), transcription factor binding-site (Kulkarni ), and other types of interactions (for review see Kulkarni and Vandepoele 2019) can be incorporated to construct data-driven interaction networks. Nevertheless, our approach offers a simple and straightforward way to prioritize candidate genes underlying eQTL hotspots from a limited amount of resources.

77 in total

1. Computing topological parameters of biological networks.

Authors: Yassen Assenov; Fidel Ramírez; Sven-Eric Schelhorn; Thomas Lengauer; Mario Albrecht
Journal: Bioinformatics Date: 2007-11-15 Impact factor: 6.937

2. MYB61 is required for mucilage deposition and extrusion in the Arabidopsis seed coat.

Authors: S Penfield; R C Meissner; D A Shoue; N C Carpita; M W Bevan
Journal: Plant Cell Date: 2001-12 Impact factor: 11.277

3. Diurnal Regulation of Plant Epidermal Wax Synthesis through Antagonistic Roles of the Transcription Factors SPL9 and DEWAX.

Authors: Rong-Jun Li; Lin-Mao Li; Xiu-Lin Liu; Jang-Chol Kim; Matthew A Jenks; Shiyou Lü
Journal: Plant Cell Date: 2019-09-04 Impact factor: 11.277

4. limma powers differential expression analyses for RNA-sequencing and microarray studies.

Authors: Matthew E Ritchie; Belinda Phipson; Di Wu; Yifang Hu; Charity W Law; Wei Shi; Gordon K Smyth
Journal: Nucleic Acids Res Date: 2015-01-20 Impact factor: 16.971

5. The Arabidopsis MUM2 gene encodes a beta-galactosidase required for the production of seed coat mucilage with correct hydration properties.

Authors: Gillian H Dean; Huanquan Zheng; Jagdish Tewari; Jun Huang; Diana S Young; Yeen Ting Hwang; Tamara L Western; Nicholas C Carpita; Maureen C McCann; Shawn D Mansfield; George W Haughn
Journal: Plant Cell Date: 2007-12-28 Impact factor: 11.277

6. Dissecting Abscisic Acid Signaling Pathways Involved in Cuticle Formation.

Authors: Fuqiang Cui; Mikael Brosché; Mikko T Lehtonen; Ali Amiryousefi; Enjun Xu; Matleena Punkkinen; Jari P T Valkonen; Hiroaki Fujii; Kirk Overmyer
Journal: Mol Plant Date: 2016-04-07 Impact factor: 13.164

7. Seed dormancy release in Arabidopsis Cvi by dry after-ripening, low temperature, nitrate and light shows common quantitative patterns of gene expression directed by environmentally specific sensing.

Authors: William E Finch-Savage; Cassandra S C Cadman; Peter E Toorop; James R Lynn; Henk W M Hilhorst
Journal: Plant J Date: 2007-04-25 Impact factor: 6.417

8. Genome-wide network model capturing seed germination reveals coordinated regulation of plant cellular phase transitions.

Authors: George W Bassel; Hui Lan; Enrico Glaab; Daniel J Gibbs; Tanja Gerjets; Natalio Krasnogor; Anthony J Bonner; Michael J Holdsworth; Nicholas J Provart
Journal: Proc Natl Acad Sci U S A Date: 2011-05-18 Impact factor: 11.205

9. TIGRESS: Trustful Inference of Gene REgulation using Stability Selection.

Authors: Anne-Claire Haury; Fantine Mordelet; Paola Vera-Licona; Jean-Philippe Vert
Journal: BMC Syst Biol Date: 2012-11-22

Review 10. Learning from Co-expression Networks: Possibilities and Challenges.

Authors: Elise A R Serin; Harm Nijveen; Henk W M Hilhorst; Wilco Ligterink
Journal: Front Plant Sci Date: 2016-04-08 Impact factor: 5.753