
Comprehensive analysis of Arabidopsis expression level polymorphisms with simple inheritance.

Stephanie Plantegenet, Johann Weber, Darlene R Goldstein, Georg Zeller, Cindy Nussbaumer, Jérôme Thomas, Detlef Weigel, Keith Harshman, Christian S Hardtke.

Abstract

In Arabidopsis thaliana, gene expression level polymorphisms (ELPs) between natural accessions that exhibit simple, single locus inheritance are promising quantitative trait locus (QTL) candidates to explain phenotypic variability. It is assumed that such ELPs overwhelmingly represent regulatory element polymorphisms. However, comprehensive genome-wide analyses linking expression level, regulatory sequence and gene structure variation are missing, preventing definite verification of this assumption. Here, we analyzed ELPs observed between the Eil-0 and Lc-0 accessions. Compared with non-variable controls, 5' regulatory sequence variation in the corresponding genes is indeed increased. However, approximately 42% of all the ELP genes also carry major transcription unit deletions in one parent as revealed by genome tiling arrays, representing a >4-fold enrichment over controls. Within the subset of ELPs with simple inheritance, this proportion is even higher and deletions are generally more severe. Similar results were obtained from analyses of the Bay-0 and Sha accessions, using alternative technical approaches. Collectively, our results suggest that drastic structural changes are a major cause for ELPs with simple inheritance, corroborating experimentally observed indel preponderance in cloned Arabidopsis QTL.

Year:  2009        PMID: 19225455      PMCID: PMC2657532          DOI: 10.1038/msb.2008.79

Source DB:  PubMed          Journal:  Mol Syst Biol        ISSN: 1744-4292            Impact factor:   11.429


Introduction

Recent advances in high-throughput technologies have had a major impact on quantitative genetic analyses, enabling the interrogation of whole genomes for characteristics such as gene expression levels, single nucleotide polymorphisms (SNPs) or structural genome variation (Keurentjes ). Among these approaches, microarray-based discovery of genetically controlled gene expression level differences has identified numerous expression quantitative trait loci (eQTL) in humans and model organisms (Brem ; Morley ; Doss ; Li ; West ; Stranger ; Potokina ). eQTL can be divided principally into two classes (Gibson and Weir, 2005; Rockman and Kruglyak, 2006; Hansen ). Trans-acting eQTL (trans-eQTL) control the expression of other loci, whereas cis-acting eQTL (cis-eQTL) coincide with the loci whose expression varies. The latter represent ∼20–50% of eQTL in various systems (Morley ; Li ; Stranger ; Potokina ) and, additively, often explain significant portions of observed phenotypic variability (Li ; Petretto ; Keurentjes ; Wentzell ; Stranger ). In this study, we focused on expression level polymorphisms (ELPs) that are already observed between parental lines and display simple, single locus inheritance. Such loci constitute a highly heritable subset of cis-eQTL and, because of their simple inheritance, can be exploited as markers (Doss ; Petretto ; West , 2007; Keurentjes ; Stranger ; Potokina ). They can, for instance, replace SNPs in genotyping, a particularly interesting application in systems with poorly characterized genomes (West ; Potokina ). Despite the abundance of ELPs with simple inheritance, little is known about their molecular basis. In principle, they could represent trans-eQTL that are tightly linked to the locus they control, and this scenario might account for a significant fraction of heritable ELPs in large, complex genomes that are difficult to analyze at high resolution.
Generally, however, it appears more likely that ELPs with simple inheritance represent large effect cis-acting polymorphisms in individual genes (Ronald ; Stranger ; Hansen ). These could include polymorphisms that affect gene expression at the transcriptional, post-transcriptional or post-translational level. For instance, mutations might alter transcript stability, or activity of the encoded protein, which could in turn affect RNA levels in cases of auto-regulatory feedback. Generally, however, cis-eQTL and thus ELPs with simple inheritance are assumed to reflect sequence variation in regulatory elements of the corresponding genes (Jansen and Nap, 2001; Cowles ; Schadt ; Pastinen and Hudson, 2004; Ronald ; Williams ), although only few studies have addressed this issue systematically (Cowles ; GuhaThakurta ). Somewhat counter to the idea that regulatory polymorphisms are major determinants of phenotypic variability, in Arabidopsis thaliana, quantitative trait locus (QTL) cloning over the last years has often identified knockout mutations that affect the transcript and/or protein as the underlying molecular cause (e.g. Aukerman ; Grant ; Johanson ; Kliebenstein ; Kroymann ; Kroymann ; Mouchel ; Werner ). Even if many of these loci represent ELPs, generally, a preponderance of indels, whether in regulatory or transcript regions, is observed among these drastic mutations (Koornneef ). However, because of the considerable sequence polymorphisms distinguishing naturally occurring isogenic Arabidopsis strains (so-called accessions), identification of the precise change underlying a QTL is often difficult, and structural changes are the easiest to discover. Thus, the successful reports of QTL isolation might reflect a bias in the ease with which such changes are detected. 
Indeed, recent studies that exploited recombinant inbred line (RIL) populations created from Arabidopsis accessions have identified numerous eQTL by microarray analyses (Keurentjes ; West ), including a varying portion of cis-eQTL. Among the cis-eQTL, a sizable fraction of loci represented parental ELPs with simple inheritance, which are strong QTL candidates to explain morphological or physiological variation between the parental lines. In this study, we analyzed the molecular basis of such ELPs in greater detail, by comprehensive comparison of gene expression, sequence variation and gene structure. Corroborating the experimental evidence from published reports of QTL cloning, we found again a preponderance of sizable indels, suggesting that QTL representing more subtle regulatory polymorphisms might be less common than anticipated.

Results and discussion

Expression level polymorphisms between the Eil-0 and Lc-0 accessions

To identify parental ELPs, we determined transcript level variation between Eil-0 and Lc-0 seedlings by microarray analyses. Arrays based on short oligonucleotide probes are particularly sensitive to SNPs in parental transcripts, resulting in spurious eQTL and overestimation of cis-eQTL (Doss ; Alberts ), although this appears to depend on various factors, such as array design (Luo ). In the absence of detailed genomic sequence information on the Eil-0 and Lc-0 accessions, in this study, we used arrays based on gene-specific probes of 150–500 bp lengths (Allemeersch ). Genomic DNA hybridizations have previously shown that such two-color arrays are largely insensitive to potential hybridization efficiency biases introduced by minor sequence polymorphisms (Keurentjes ). Moreover, they also offer the advantage of direct sample comparison, allowing immediate ELP assessment rather than ELP inference from statistical comparison of single sample hybridizations of oligonucleotide-based arrays (West ). Nevertheless, in this study, we chose to follow established recommendations for the analysis of two-color arrays, which includes a statistical component (Shi ). Based on duplicate dye swap comparison of three independent RNA samples, 499 ELPs (P<0.005 with Benjamini–Hochberg false discovery rate multiple testing correction and fold change ⩾2) representing 480 genes distributed across the genome were observed (Supplementary Table 1). Comparable numbers of parental ELPs have been found for other pairs of Arabidopsis accessions (West ; Keurentjes ).
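The two-part selection criterion described above (Benjamini–Hochberg corrected P<0.005 combined with a fold change ⩾2) can be sketched in Python as follows. This is a minimal illustration and not the code used in the study, which relied on R/BioConductor; the moderated t-test itself is not reproduced, so P-values are taken as given input, and the function names `benjamini_hochberg` and `call_elps` are hypothetical.

```python
import math

def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def call_elps(log2_ratios, pvalues, fdr=0.005, min_fold=2.0):
    """Indices of features passing both the FDR and the fold-change cutoff."""
    adjusted = benjamini_hochberg(pvalues)
    min_abs_log2 = math.log2(min_fold)  # 2-fold corresponds to |log2 ratio| >= 1
    return [
        i
        for i, (ratio, q) in enumerate(zip(log2_ratios, adjusted))
        if q < fdr and abs(ratio) >= min_abs_log2
    ]

# Hypothetical values for four array features (log2 Eil-0/Lc-0 and raw P-values).
example_ratios = [1.5, 0.4, -2.0, 0.1]
example_pvalues = [0.0001, 0.0004, 0.03, 0.5]
example_elps = call_elps(example_ratios, example_pvalues)
```

In this toy input only the first feature passes: the second meets the FDR cutoff but not the fold-change cutoff, and the third shows a large fold change but fails the FDR cutoff.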

Determination of expression level polymorphisms with simple inheritance by microarray analysis of recombinant inbred lines

To determine which of these ELPs show simple inheritance over several generations, we took advantage of a RIL population that had been derived by single-seed descent over seven generations, starting with F2 individuals from an Eil-0 (♀) × Lc-0 (♂) cross (Sibout ). Notably, it was evident from earlier studies that detection of parental ELPs with simple inheritance does not require full-scale eQTL analysis of RIL populations, as they represent a subgroup of cis-eQTL that display firm allele-dependent inheritance of differential expression through all generations starting from the parents. Thus, dye swap comparisons between a few RILs and their two parents in microarray analyses were sufficient for their detection (Figure 1A). The RILs were chosen to represent the genetic diversity of the population based on genotyping data from 79 segregating genome-wide SNP markers (Warthmann ; Sibout ) (Supplementary Table 2), such that each locus would be derived typically from the same parent in at least three RILs. Thus, seven RILs were chosen for detailed analyses. RNA from these lines was hybridized against RNA from either parent in a dye swap layout. To assess the heritability of parental ELPs, we compared expected and observed differential expression, taking into account the genotyping data (Figure 1A). On average, ∼60% of predicted ELPs were recovered in a given RIL versus parent hybridization (Figure 1B, Supplementary Table 3), similar to proportions found in other studies (Keurentjes ). The absence of differential expression was an even better predictor, matching ∼91% of observations. This discrepancy is likely due to the fact that the 2-fold change in expression represents a rather stringent but also arbitrary selection criterion. Overall, predictions of presence and absence of differential expression matched better, if data were treated according to 5% false discovery rate. 
However, because an extensive study of two-color microarray hybridizations recommended combining a 2-fold change with a false discovery rate criterion for scoring differential signals (Shi ), we used the analysis of our data according to those criteria as the baseline in the following. Similar to earlier studies (West ; Keurentjes ), the parental ELPs could be used for RIL genotyping, delivering higher resolution than the SNP data (Supplementary Figure 1).
Figure 1

Assessment of ELP heritability by microarray analyses. RILs from a cross between Eil-0 and Lc-0 were genotyped with a set of 79 genome-wide SNP markers (Warthmann ), defining the parental origin of chromosome segments. (A) Principles for the assessment of the heritability of ELPs observed between the Eil-0 and Lc-0 parents. Genotyped RILs from the S6 generation were compared with both parents in dye swap replicates. Based on the RIL genotype for a particular chromosome segment as determined by the flanking SNP markers, differential expression of a parental ELP locus on this chromosome segment was not expected in hybridizations of the RIL against the parent from which the segment was inherited. However, differential expression (>2-fold) was expected in hybridizations against the other parent. ELPs located in regions of ambiguous genotype, i.e. heterozygous regions or segments spanning recombination breakpoints, were omitted from the analysis of that particular RIL. (B) Summary of parental ELP behavior in the hybridizations of the seven RILs (EL lines) against the two parent lines based on the principles outlined in (A). (C) Percentage of parental ELPs matching predictions across all RIL-parent hybridizations at a given frequency (100% or the 10% intervals below).
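The prediction logic of panel (A) can be written down compactly. The Python sketch below is an illustration under simplifying assumptions: only the fold-change criterion is applied (the statistical component is omitted), ambiguous segments are encoded as `None`, and the function names are hypothetical.

```python
def expected_differential(ril_genotype, hybridized_parent):
    """Expected outcome for a parental ELP locus in one RIL-versus-parent hybridization.

    ril_genotype: "Eil-0", "Lc-0", or None for an ambiguous segment
    (heterozygous region or segment spanning a recombination breakpoint).
    Returns True if differential expression is expected, False if not,
    and None if the locus is omitted for this RIL.
    """
    if ril_genotype is None:
        return None
    # A difference is expected only against the parent that did NOT donate the segment.
    return ril_genotype != hybridized_parent

def prediction_met(ril_genotype, hybridized_parent, observed_fold, min_fold=2.0):
    """Compare the expectation with the observed absolute fold change."""
    expected = expected_differential(ril_genotype, hybridized_parent)
    if expected is None:
        return None
    observed = observed_fold >= min_fold
    return observed == expected
```

For example, a locus inherited from Lc-0 that shows a 3.1-fold difference against Eil-0 meets the prediction, whereas the same fold change against Lc-0 would contradict it.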

Comparison of the patterns of individual genes corresponding to parental ELPs, across all RIL hybridizations, enabled us to classify them according to the frequency at which predictions were met. This analysis identified a group representing ∼20% of parental ELPs that perfectly matched predictions (Figure 1C, Supplementary Table 4) and, thus, can be considered to have simple cis-inheritance. Notably, as many of the other loci frequently missed our cutoff criteria for differential expression only narrowly, particularly the 2-fold criterion (see above), this is a conservative estimate. Overall, ELPs whose hybridization pattern matched 80% of predictions or more represented ∼44% of all parental ELPs.
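The classification by frequency of matched predictions can be illustrated as follows. This Python sketch assumes that each locus comes with a list of per-hybridization outcomes (`True` for a met prediction, `False` for a missed one, `None` for an omitted hybridization); the handling of interval boundaries and the function name `heritability_class` are assumptions made for illustration.

```python
def heritability_class(outcomes):
    """Assign a locus to the 100% class or to a 10% interval of matched predictions.

    outcomes: list of True / False / None, one entry per RIL-parent hybridization.
    Returns "100%" for a perfect match, a label such as "80-90%" otherwise,
    or None if no hybridization could be scored.
    """
    scored = [o for o in outcomes if o is not None]
    if not scored:
        return None
    met, total = sum(scored), len(scored)
    if met == total:
        return "100%"  # candidate ELP with simple inheritance
    # Integer arithmetic avoids floating-point surprises at interval boundaries.
    lower = (10 * met // total) * 10
    return f"{lower}-{lower + 10}%"
```

With seven RILs hybridized against two parents, a locus allows at most 14 predictions; a locus that meets 12 of them would fall into the 80-90% class under this scheme.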

Sequence analysis of regulatory regions in genes representing ELPs with simple inheritance

To determine whether ELPs with simple inheritance are associated with increased sequence variation in regulatory regions, as observed in other systems (Cowles ; GuhaThakurta ), we compared a sample of 61 genes chosen from the ELP group that matched at least 90% of predictions with a control group of 85 genes that displayed very low variability and differential expression across all microarray experiments (see Methods). Notably, in Arabidopsis, regulatory elements controlling gene expression are generally found in the 5′ vicinity of the transcription start sites and the 5′ leader sequences (Lee ). Thus, we isolated 1 kb fragments immediately upstream of the start codon for each of the sample and control group genes from both Eil-0 and Lc-0. Sequence information was obtained for ∼44 kb of stably heritable ELP loci and ∼62 kb of control loci (Supplementary Table 5, Supplementary sequence alignments). Sequence diversity between Eil-0 and Lc-0 was considerably higher in the ELPs with simple inheritance as compared with the control group (Figure 2A, Supplementary Table 5). Overall, SNP frequency was increased >4.5-fold, indel number >4.7-fold and the number of bp affected by indels >9.0-fold (Figure 2B and C). Generally, SNPs were biased towards the promoter as compared with the leader sequences. These results support the idea that ELPs with simple inheritance are associated with increased sequence diversity in the regulatory regions of the corresponding genes.
Figure 2

Sequence analysis of regulatory regions of a sample of 61 genes representing parental ELPs with simple inheritance and a control group of 85 genes, which displayed very low variability and differential expression (see Supplementary Materials and methods) in the array experiments (‘controls’). For the ELPs with simple inheritance, only genes which perfectly matched predictions (see Figure 1C), and for which at least 10 precise predictions could be made (i.e. loci located in unambiguous chromosome segments in at least five RILs) were included. (A) Summary of sequence analyses of regulatory regions from 61 ELPs with simple inheritance and 85 control genes. Observed total absolute values (tot. line), per gene average values (av. line) and median values (me. line) are indicated. Note that numbers for promoter sequences and 5′ leader sequences do not add up to the total, because leader sequences were not defined for all genes investigated. (B) Relative abundance of SNPs (based on total sequence investigated). (C) Relative amount of bp affected by indels (based on total sequence investigated). Asterisks indicate t-test significance between the ELPs with simple inheritance and the control group (*P<0.05; **P<0.01; NS, not significant).

Genome tiling array analyses of the Eil-0 and Lc-0 genomes

Analyses of Arabidopsis genome variation have discovered unexpectedly high levels of accession-specific indels, which often impair gene function (Clark ; Zeller ). Such indels can, for instance, be identified by probing whole genome tiling arrays with genomic DNA (Hinds ; Clark ; Yazaki ). As we failed to amplify the 5′ regions of at least one parent for ∼34% of all loci initially targeted for sequencing in the ELP group and ∼12% in the control group, we sought to determine whether this could be explained by indels. To this end, duplicate samples of genomic Col-0, Eil-0 and Lc-0 DNA were hybridized to Affymetrix Arabidopsis tiling 1.0R arrays, which represent the Col-0 genome as a tile of 25mer oligonucleotides with 10 bp spacing. Thresholds for detection of deletions (⩾2.8-fold drop in hybridization signal over ⩾35 bp, maximum allowed gap 150 bp) in Eil-0 and Lc-0 were determined empirically. This was done using deletions identified in the sequencing data (Figure 3). These threshold criteria consistently allowed detection of indels greater than ∼30 bp, whereas at the same time ruling out the possibility that deletion calls could represent spurious differential signals because of SNPs or smaller indels (Figure 3) as detected in other studies (Li ; West ; Alberts ; Borevitz ; Clark ). The genes representing parental ELPs as well as the control genes were inspected individually and only indels that were detected consistently in both replicate hybridizations were considered real. Even using these stringent criteria, numerous indels of various sizes were identified in both Eil-0 and Lc-0 (e.g. Figure 4; bar files for viewing tiling paths are provided in the Supplementary information). However, although ∼42% of all parental ELP genes displayed indels when comparing their structure in Eil-0 versus Lc-0, only 9% of control genes did (Figure 5A; Supplementary Table 6); thus representing a >4-fold enrichment. 
Moreover, in the control group, deletions were usually small and affected mostly intron or leader sequences. As it appeared possible that the low expression variability of the control group genes could reflect the effect of purifying selection, we also analyzed a non-redundant random set of genes, which yielded essentially quantitatively similar results (Supplementary Table 6). By contrast, in the ELP group, generally multiple indels per gene were detected, and these were often larger and frequently affected exons. Moreover, in the ELP group, gene deletions (defined as uninterrupted deletion detection signal spanning >50% of the transcript region) were observed for nearly 10% of loci. Gene deletions were never observed in either control group.
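The deletion-calling thresholds described above can be illustrated with a small Python sketch. It reflects one plausible reading of the parameters, stated here as assumptions: probes whose log2 signal ratio relative to Col-0 is at or below –1.5 (a ⩾2.8-fold drop) are flagged, flagged probes no more than 150 bp apart are merged into one run, and runs spanning at least 35 bp are reported. Probe length and the requirement for agreement between replicate hybridizations are not modeled, and the function name `call_deletions` is hypothetical.

```python
def call_deletions(positions, log2_ratios, max_log2=-1.5, min_run=35, max_gap=150):
    """Return (start, end) intervals of putative deletions along a tiling path.

    positions: sorted probe start coordinates in bp
    log2_ratios: log2(accession signal / Col-0 signal) for each probe
    """
    # Keep only probes whose signal dropped by at least ~2.8-fold.
    low = [p for p, r in zip(positions, log2_ratios) if r <= max_log2]
    segments = []
    for pos in low:
        if segments and pos - segments[-1][1] <= max_gap:
            segments[-1][1] = pos        # extend the current run
        else:
            segments.append([pos, pos])  # start a new run
    # Discard runs that are too short to be called reliably.
    return [(start, end) for start, end in segments if end - start >= min_run]

# Hypothetical tiling path with probes every 10 bp.
example_positions = list(range(0, 1000, 10))
example_ratios = [
    -2.0 if (100 <= p <= 140 or p == 500) else 0.1 for p in example_positions
]
example_calls = call_deletions(example_positions, example_ratios)
```

In this toy example, the five consecutive low-signal probes yield one call, whereas the isolated low-signal probe at position 500 is too far from the first run to be merged and too short to be called on its own.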
Figure 3

Genomic tiling array analysis of the Eil-0 and Lc-0 genomic DNA hybridized against a tile of the Col-0 genome. Two independent hybridizations were performed for each genotype. For classification of deletions, thresholds were determined by an empirical approach based on the promoter sequencing data described in Figure 2. The deduced settings (a signal drop of at least 2.8-fold, i.e. below –1.5 on a log2 scale; a minimum run >35 bp; and a maximum gap ⩽150 bp) allowed detection of indels >30 bp, but detected neither smaller indels nor SNPs. Examples are shown for tiles of individual sequenced regions. (A) Promoter region of At1g29030. No polymorphisms were observed among Lc-0 or Eil-0 as compared with Col-0 or each other. (B) Promoter region of At1g05830. Sequencing revealed a few dispersed small indels and SNPs between the three genotypes. (C) Promoter region of At1g13650. Sequencing revealed an extended stretch of many small indels and SNPs. (D) Promoter region of At1g33480. Sequencing revealed several small indels and SNPs. Only a 39 bp deletion in Eil-0 is picked up as a positive (red horizontal bars) by our settings. Gene structure is shown at the bottom of each panel (thick yellow blocks, exons; small yellow blocks, UTRs; yellow lines, introns). Difference in hybridization signal between Lc-0 or Eil-0 versus Col-0 along the oligonucleotides representing the tiling path is shown as white vertical bars. Upward deviation from the base line indicates positive hybridization signal, downward deviation negative hybridization signal.

Figure 4

Indel analysis of the Eil-0 and Lc-0 genomes using genome tiling arrays. Genomic DNA of genotypes was hybridized against a tile of the Col-0 genome. Two independent hybridizations were performed for each genotype. Indels were deduced using threshold settings (signal drop ⩾2.8-fold, min run >35, max gap ⩽150) determined empirically as described in Figure 3. Examples are shown for tiles of individual genes. (A) At1g59900. No polymorphisms were observed in Lc-0 or Eil-0 as compared with Col-0 or each other. (B) At1g63900. Various deletions were detected in Eil-0 as compared with Lc-0 and Col-0. (C) At1g12220. A large-scale deletion likely covering the whole gene as indicated by a continuous detection bar was observed in Lc-0.

Figure 5

Summary of indel analyses. (A, B) Indel analysis of genes representing parental ELPs between the Eil-0 and Lc-0 accessions. (A) Correlation between strict ELP heritability (matching of hybridization predictions, see Figure 1C) and presence of deletions in the corresponding genes in one of the parents. Percentage of genes in each class displaying structural changes between Eil-0 and Lc-0 (‘indels') or not (‘similar'). Controls represent an extended group of 97 genes as described in Figure 2. (B) Detailed classification of the parental ELPs and controls shown in (A). None: no indels detected in Eil-0 as compared with Lc-0; introns: indel(s) detected in intron(s) of one parent as compared with the other; UTRs: indel(s) detected in UTR(s) or UTR(s) and intron(s) of one parent as compared with the other; exons: indel(s) detected in exon(s) or exons, UTR(s) and/or intron(s) of one parent as compared with the other; whole gene: >50% of gene deleted or duplicated in one parent as compared with the other. (C) Correlation between the presence of indels in the coding region and increased sequence variation in the corresponding 5′ regulatory regions in the parental ELP genes. The quartiles as well as the average (wider line) are indicated. The distribution between the two groups is statistically significant (P<0.0390, t-test). (D) Expression microarray hybridization signal distribution of all genes in the Eil-0 versus Lc-0 parent comparison. (E) As in (D), shown for the parental ELPs with simple inheritance.
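The severity classes used in panel (B) can be expressed as a simple decision rule. The Python sketch below is illustrative only: it assumes half-open genomic intervals, a simplified feature annotation with the kinds "exon", "UTR" and "intron", and a hypothetical function name `classify_gene`; the >50% rule for the whole-gene class is applied per uninterrupted deletion call.

```python
def classify_gene(transcript, features, deletions):
    """Assign a gene to the most severe indel class observed.

    transcript: (start, end) of the transcript region
    features: list of (kind, start, end), kind in {"exon", "UTR", "intron"}
    deletions: list of (start, end) deletion calls in one parent
    Intervals are half-open: start is included, end is excluded.
    """
    t_start, t_end = transcript
    t_len = t_end - t_start
    # Whole gene: a single uninterrupted call spanning >50% of the transcript.
    for d_start, d_end in deletions:
        overlap = min(d_end, t_end) - max(d_start, t_start)
        if overlap > 0.5 * t_len:
            return "whole gene"
    # Otherwise, collect every feature kind touched by any deletion.
    hit = set()
    for kind, f_start, f_end in features:
        if any(d_start < f_end and f_start < d_end for d_start, d_end in deletions):
            hit.add(kind)
    # Report the most severe class: exons over UTRs over introns.
    for kind in ("exon", "UTR", "intron"):
        if kind in hit:
            return kind + "s"
    return "none"

# Hypothetical gene model of 1 kb with two exons, one intron and two UTRs.
example_transcript = (0, 1000)
example_features = [
    ("UTR", 0, 100),
    ("exon", 100, 400),
    ("intron", 400, 600),
    ("exon", 600, 900),
    ("UTR", 900, 1000),
]
```

Under this rule, a gene with deletions in both a UTR and an intron is counted in the UTR class, in line with the category definitions given in the legend.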

The majority of genes representing ELPs with simple inheritance display uni-parental structural changes

Analysis of deletions according to ELP class with respect to matched predictions revealed a clear trend towards more severe indel types in ELP loci with simple inheritance. For instance, in the class of ELPs that perfectly matched predictions, 20% of loci displayed uni-parental gene deletions, whereas 25% of loci carried deletions in exons (Figure 5B). Still within the group of ELPs that matched at least 80% of predictions, the majority of loci displayed major uni-parental deletions. By contrast, the proportion of loci, for which no structural difference was observed between Eil-0 and Lc-0, continuously increased in the parental ELP classes that matched predictions less and less faithfully, accompanied by a decrease in the severity of deletions observed. Thus, indels that are likely to impair or even abolish gene function appear to be much more frequent in genes representing ELPs, with simple inheritance, than in genes representing less heritable parental ELPs or invariable (or random) controls. These data suggest that the majority of ELPs with simple inheritance reflect a uni-parental impairment or even loss of gene function. Importantly, in the vast majority of cases, over 90%, deletions were in phase with the direction of expression difference between the parents, such that the allele that carried deletions was expressed at a lower level. This observation would be consistent with the idea that the majority of deletions negatively affect gene function, thus leading to a loss of selection on gene maintenance and consequently gene expression. Supporting this notion, those parental ELPs that carried indels in their coding region also displayed a higher level of sequence variation in their 5′ regulatory regions (Figure 5C). However, the observation that alleles carrying deletions were expressed at lower levels could also simply reflect a difference in hybridization signal because of deletions in one allele. 
Although this appears likely for loci that displayed uni-parental gene deletion, this explanation might not be generally applicable to loci that carried partial deletions. Such loci might still yield detectable although potentially aberrant transcripts, even if those would not encode functional protein. In fact, deletions were not evident from our expression arrays, as documented by the signal strength distribution of parental ELPs, which resembles the one for all genes (Figure 5D–E). Moreover, as background noise is difficult to define in the two-color array hybridizations employed in our study, absence of hybridization signal is hard to establish, in particular for genes that are expressed at low levels (Czechowski ). Finally, an earlier study used two-color arrays as well, and the authors entertained the notion that ELPs might reflect deletions (Keurentjes ). To test this, they hybridized their arrays with competing genomic DNA from the parental accessions, Ler and Cvi-0, to identify a total of 159 indels. Of those, 14 coincided with cis-eQTL that mostly reflected ELPs with simple inheritance observed between the parents. However, as their study identified 922 parental ELPs, this would mean that there are either significantly fewer structural differences between the Ler and Cvi-0 genomes than between the Eil-0 and Lc-0 genomes, or that indels were underestimated as compared with our study.
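The in-phase comparison discussed above, i.e. whether the allele carrying deletions is also the allele expressed at the lower level, can be sketched as follows. The input format, the example values and the function name `in_phase_fraction` are assumptions for illustration and do not reproduce the study's data.

```python
def in_phase_fraction(loci):
    """Fraction of loci where the deletion-carrying allele is the lower-expressed one.

    loci: list of (deleted_parent, log2_eil_over_lc) tuples, where deleted_parent
    is "Eil-0" or "Lc-0" and the ratio is log2(Eil-0 signal / Lc-0 signal).
    Returns None for an empty list.
    """
    if not loci:
        return None
    in_phase = 0
    for deleted_parent, log2_ratio in loci:
        # A negative ratio means Eil-0 is expressed at the lower level.
        # Ratios of exactly 0 do not occur for ELPs, which differ at least 2-fold.
        lower_parent = "Eil-0" if log2_ratio < 0 else "Lc-0"
        if deleted_parent == lower_parent:
            in_phase += 1
    return in_phase / len(loci)

# Hypothetical loci: three in phase, one out of phase.
example_loci = [("Eil-0", -2.0), ("Lc-0", 1.5), ("Eil-0", 1.2), ("Lc-0", 3.0)]
example_fraction = in_phase_fraction(example_loci)
```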

Independent analysis of ELPs with simple inheritance between Bay-0 and Sha

To corroborate independently the validity of our approach, we made use of other studies, in which expression differences between the Arabidopsis Bay-0 and Sha accessions were reported (West , 2007). Importantly, these data were extracted from full-scale single-feature polymorphism and eQTL analysis of a population of more than 200 RILs, which, compared with our Eil-0 × Lc-0 analysis, was characterized using a different, short oligonucleotide microarray platform (Affymetrix) and a different conceptual approach to extract heritable gene expression differences. Thus, 187 genes representing parental ELPs with simple inheritance between Bay-0 and Sha were identified. We performed two independent hybridizations of genomic DNA of both Bay-0 and Sha to genome tiling arrays and analyzed the data as outlined above for Eil-0 and Lc-0. Again, we observed a strong preponderance of indels in the 187 ELPs with simple inheritance (>6-fold enrichment) as compared with the same control group used above (Figure 6A, Supplementary Table 7). The two gene sets did not overlap; importantly, the control genes had been selected from the Eil-0 × Lc-0 analysis according to the indicated threshold criteria, but also on the basis that they were monitored in all hybridizations and that they were not among the gene expression markers in the Bay-0 × Sha eQTL analysis. Similar to our results for Eil-0 and Lc-0, the majority of the deletions in the ELPs with simple inheritance were observed at the level of exons (∼33% of loci) or genes (18%) (Figure 6B).
Figure 6

Indel analysis and polymorphic region prediction (PRP) analysis of genes representing ELPs with simple inheritance and controls between the Bay-0 and Sha accessions. (A) Percentage of genes representing ELPs with simple inheritance or controls (same group Figure 5) that carry indels in one parent as compared with the other or that display similar gene structure. (B) Detailed classification of genes shown in (A), categories similar to Figure 5B. (C, D) PRP predictions. The graphs (logarithmic scale) represent total PRP size observed in a given gene (in bp, equaling sum of all individual PRPs with respect to the gene model Atxgyyyyy.1, TAIR 7.0 annotation) detected in one accession plotted against the same value for the other accession. Classification of genes is similar to Figure 5B. (C) Genes representing ELPs with simple inheritance. (D) Control genes.

The Bay-0 and Sha accessions were also part of a recent genome re-sequencing effort using high-density oligonucleotide arrays that interrogate SNP polymorphisms at every single base of the Arabidopsis genome (Clark ). These data offered us the opportunity to independently verify our results. To this end, we analyzed the ELPs with simple inheritance and control group genes by a recently developed algorithm (Zeller ) to identify polymorphic region predictions (PRPs), i.e. reduced hybridization signal over extended tracts of sequence. Such PRPs could result from an accumulation of SNPs or indels. Matching our tiling array analysis, PRPs were dramatically more frequent and generally more extended in the ELPs with simple inheritance as compared with the controls (Supplementary Table 7). This is illustrated by comparison of the combined PRP lengths in the Bay-0 versus the Sha alleles, which also revealed a marked asymmetry in PRP size in the genes representing ELPs with simple inheritance (Figure 6C), but not in the control genes (Figure 6D). In nearly all cases, increased PRP size matched the presence of deletions as detected by the tiling array approach.

Conclusions

In summary, our data suggest that ELPs with simple inheritance in Arabidopsis primarily reflect the consequences of structural differences in the corresponding genes, rather than variation in regulatory elements, even if such a variation is observed. Notably, association of increased SNP variability and proximal deletions has also been observed in the human genome (Hinds ). The large majority of deletions detected in ELPs with simple inheritance affected open reading frames or even complete genes, suggesting that they could frequently lead to loss of gene function. Moreover, we repeatedly observed major deletions of flanking regulatory regions. Even if those deletions leave transcription units intact, they might lead to reduced or abolished gene expression, resulting in a de facto loss of gene function. It remains to be determined whether Arabidopsis suffers from a particularly heavy mutational load because of inbreeding, as suggested before (Bustamante ), or whether our findings apply more broadly. The similarity in ELP behavior across systems and the finding that copy number variation can explain significant portions of quantitative traits (Cutler ; Stranger ) suggest that this could be the case. Furthermore, although functional variation in cis-regulatory elements clearly contributes to phenotypic variation (Bentsink ; Rus ; Sibout ), large-effect changes that impact the integrity of transcribed regions should be considered as an equally valid explanation for expression variation. Indeed, such mutations have been shown to underlie phenotypic variation in natural strains of Arabidopsis (Grant ; Aukerman ; Johanson ; Kliebenstein ; Kroymann , 2003; Koornneef ; Werner ). Finally, the prevalence of indels in ELPs with simple inheritance mirrors the preponderance of indels with a drastic effect on gene integrity underlying cloned QTL, suggesting that the latter do not reflect a technical bias in the ease of detection.
Thus, Arabidopsis QTL representing more subtle regulatory polymorphisms might be less common than anticipated.

Materials and methods

Plant materials

Seeds of Arabidopsis accessions were obtained from the Arabidopsis Biological Resources Center (Ohio State University, USA). Sterilized seeds were stratified for 48 h at 4°C, and seedlings were germinated and grown in tissue culture on a basic solid medium with macro and micronutrients (0.5 × MS) and 0.9% agar (Duchefa, the Netherlands), supplemented with 2% sucrose at 21°C under continuous light of 130 μE intensity. The Eil-0 × Lc-0 RIL population was derived from a cross between those parents in which Eil-0 served as the mother, after seven generations of single-seed descent starting from the segregating F2 generation (Sibout ). Plant material for RNA analysis was harvested at 9 days after germination, typically from pools of 20 seedlings per line.

SNP genotyping

Genomic DNA of the EL RIL F5 population was isolated with Plant DNeasy™ kits (QIAGEN, the Netherlands) according to the manufacturer's instructions. Genotyping with a set of 289 SNPs was carried out by Genaissance Pharmaceuticals, Inc. (New Haven, CT, USA). Of those SNPs, 79 were polymorphic between Eil-0 and Lc-0.

Microarray hybridizations

Total RNA was isolated using the Plant RNeasy™ kit (QIAGEN, the Netherlands) according to the manufacturer's instructions. Total RNA from the seedling pools was amplified using the MessageAmp™ aRNA II kit (Ambion, TX, USA). Five micrograms of amplified RNA were reverse transcribed into cyanine 3- or cyanine 5-labeled cDNA, purified with Qiaquick™ columns (Qiagen, the Netherlands) and hybridized on microarrays produced by the Lausanne DNA Array Facility (GEO accession number GPL6147) containing 25 000 gene-specific tags for the A. thaliana genome (Hilson ). To analyze ELPs between the accessions Eil-0 and Lc-0, three independently grown seedling pools were analyzed by two-color co-hybridization of the labeled cDNAs in dye swap experiments, giving a total of six slides. These experiments can be found in the GEO database under entry GSE13628.

Statistical analyses

Statistical analyses of gene expression measures were carried out with open source R software packages available as part of the BioConductor project (http://www.bioconductor.org). Raw data from the microarrays were normalized by print tip lowess normalization (Yang ), without applying background subtraction. To identify differentially expressed genes, we computed single gene moderated t-statistics (Smyth, 2004) using the limma package (Smyth, 2005). Genes were ranked according to their mod-t P-value and a cutoff was set at a maximum false discovery rate (Benjamini–Hochberg multiple testing correction; Benjamini and Hochberg, 1995) of 0.005. From these genes, those with a minimum 2-fold expression difference between Eil-0 and Lc-0 qualified as parental ELPs. For the analysis of RIL gene expression, each RIL sample was co-hybridized with each parent (Eil-0 and Lc-0) in a dye swap, resulting in two slides per parent versus RIL comparison. Genes with large mod-t and an expression difference of at least 2-fold in the RIL-parent comparison were considered differentially expressed. To select genes that show no ELP (control genes), we selected genes that had a maximum fold change of 1.3 in at least 13 out of 17 conditions tested (all RIL versus parent comparisons and Eil-0 versus Lc-0 comparisons). These genes were ranked according to their signal intensities and genes with an A-value <8 were excluded. From the remaining medium to high-intensity genes (134), 97 were selected for promoter and tiling array analysis.
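The selection criteria described above can be illustrated with a short sketch. This is not the authors' code, which used R and the limma package. It is a minimal Python illustration that assumes the moderated-t P-values and log2 fold changes have already been computed. The function names (`benjamini_hochberg`, `call_parental_elps`, `is_control_gene`) and the input layout are hypothetical.

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_parental_elps(genes, pvalues, log2_fold_changes,
                       fdr=0.005, min_fold=2.0):
    """Keep genes passing the FDR cutoff with at least a 2-fold difference."""
    adjusted = benjamini_hochberg(pvalues)
    min_abs_log2 = math.log2(min_fold)
    return [g for g, q, lfc in zip(genes, adjusted, log2_fold_changes)
            if q <= fdr and abs(lfc) >= min_abs_log2]


def is_control_gene(log2_fold_changes, a_value,
                    max_fold=1.3, min_conditions=13, min_a=8.0):
    """Non-variable control: fold change <= 1.3 in at least 13 conditions
    and an A-value of at least 8 (genes with A < 8 are excluded)."""
    limit = math.log2(max_fold)
    stable = sum(1 for lfc in log2_fold_changes if abs(lfc) <= limit)
    return stable >= min_conditions and a_value >= min_a


if __name__ == "__main__":
    genes = ["g1", "g2", "g3", "g4"]          # hypothetical gene labels
    pvals = [0.001, 0.0001, 0.5, 0.04]        # made-up moderated-t P-values
    lfcs = [1.5, 0.5, 2.0, -3.0]              # made-up log2 fold changes
    print(call_parental_elps(genes, pvals, lfcs))
```

The thresholds (0.005, 2-fold, 1.3-fold, 13 of 17 conditions, A-value of 8) come from the text. Expressing the fold changes on a log2 scale and using a single A-value per gene are assumptions of this sketch.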

Sequencing

For sequence analyses of regulatory elements, 1 kb fragments of 85 control genes and 65 stable ELP genes spanning the region 5′ to the start codon were isolated by PCR with KOD Hot Start Polymerase® (Novagen™) following the manufacturer's instructions. PCR-amplified fragments were purified using QiaQuick columns (Qiagen, the Netherlands) and sequenced by Macrogen Inc. (Republic of South Korea). Obtained sequences were analyzed using MacVector™ 7.2.2 software. The sequences have been submitted to the GenBank database (accession numbers FJ441298-FJ441589).

Tiling arrays

Genomic DNA was extracted from pools of three plants for each accession (Col-0, Eil-0, Lc-0, Bay-0, Sha) with Plant DNeasy™ kits (QIAGEN, the Netherlands) according to the manufacturer's instructions. Biotin-labeled target DNA was generated from this genomic DNA as described (Borevitz, 2006). Labeled targets were hybridized on Affymetrix GeneChip® Arabidopsis Tiling 1.0R Arrays and processed according to the supplier's protocols. CEL files were processed by Affymetrix tiling analysis software to generate normalized signal bar files. Tiling analysis software settings were quantile normalization and a bandwidth for probe analysis of 50 bp. To determine structural variations in the genomes of Eil-0, Lc-0, Bay-0 and Sha, two independent DNA isolates of each accession were compared with Columbia DNA. The resulting bar files were loaded into the Affymetrix Integrated Genome Browser software and analyzed manually for the genes of interest. To qualify as deletions, the Integrated Genome Browser signals had to be below a cutoff of −1.5 (log2 scale), with the min run setting at >35 and the max gap setting at ≤150. These parameters were determined empirically (see text and Figure 3). The TAIR Arabidopsis genome annotation version 7.0 was used for analysis. The tiling array raw data have been deposited at the ArrayExpress database under accession number E-MEXP-1888.

Supplementary material: Supplementary Figure 1; Supplementary Tables 1–7; supplementary sequence alignments; normalized signal bar files generated by Affymetrix tiling analysis software (TAS).
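The deletion criteria given for the tiling arrays (signal below −1.5 on a log2 scale, min run >35, max gap ≤150) can be approximated programmatically. The authors inspected the signals manually in the genome browser, so the following is only a rough Python sketch of threshold-based interval calling. The function `call_deletions`, the probe-coordinate input and the reading of min run and max gap as base-pair distances are all assumptions.

```python
def call_deletions(positions, log2_signals,
                   cutoff=-1.5, min_run=35, max_gap=150):
    """Call candidate deletions from tiling-array probe signals.

    positions    -- sorted probe coordinates in bp (assumed input layout)
    log2_signals -- accession versus Col-0 signal per probe, log2 scale
    A probe passes if its signal is below `cutoff`. Passing probes are
    merged into one interval when they are at most `max_gap` bp apart,
    and an interval is reported only if it spans more than `min_run` bp.
    """
    deletions = []
    start = end = None
    for pos, signal in zip(positions, log2_signals):
        if signal >= cutoff:
            continue  # probe does not indicate a deletion
        if start is not None and pos - end <= max_gap:
            end = pos  # extend the current interval across a tolerated gap
        else:
            if start is not None and end - start > min_run:
                deletions.append((start, end))
            start = end = pos  # open a new interval
    if start is not None and end - start > min_run:
        deletions.append((start, end))
    return deletions


if __name__ == "__main__":
    # Made-up probe coordinates and signals for illustration only.
    probe_pos = [100, 135, 170, 205, 240, 1000, 1035, 2000]
    probe_sig = [-2.0, -2.0, -0.2, -2.0, -2.0, -2.0, -2.0, -2.0]
    print(call_deletions(probe_pos, probe_sig))
```

In this toy input, the first five probes form one interval because the single above-cutoff probe leaves a gap well under 150 bp. The later probes form intervals too short to exceed the min run and are discarded.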
  53 in total

1.  Molecular analysis of FRIGIDA, a major determinant of natural variation in Arabidopsis flowering time.

Authors:  U Johanson; J West; C Lister; S Michaels; R Amasino; C Dean
Journal:  Science       Date:  2000-10-13       Impact factor: 47.728

2.  Detection of regulatory variation in mouse genes.

Authors:  Christopher R Cowles; Joel N Hirschhorn; David Altshuler; Eric S Lander
Journal:  Nat Genet       Date:  2002-10-15       Impact factor: 38.330

Review 3.  The quantitative genetics of transcription.

Authors:  Greg Gibson; Bruce Weir
Journal:  Trends Genet       Date:  2005-09-08       Impact factor: 11.639

4.  Flowering as a condition for xylem expansion in Arabidopsis hypocotyl and root.

Authors:  Richard Sibout; Stéphanie Plantegenet; Christian S Hardtke
Journal:  Curr Biol       Date:  2008-03-25       Impact factor: 10.834

5.  Global eQTL mapping reveals the complex genetic architecture of transcript-level variation in Arabidopsis.

Authors:  Marilyn A L West; Kyunga Kim; Daniel J Kliebenstein; Hans van Leeuwen; Richard W Michelmore; R W Doerge; Dina A St Clair
Journal:  Genetics       Date:  2006-12-18       Impact factor: 4.562

6.  A gene controlling variation in Arabidopsis glucosinolate composition is part of the methionine chain elongation pathway.

Authors:  J Kroymann; S Textor; J G Tokuhisa; K L Falk; S Bartram; J Gershenzon; T Mitchell-Olds
Journal:  Plant Physiol       Date:  2001-11       Impact factor: 8.340

7.  Significant gene content variation characterizes the genomes of inbred mouse strains.

Authors:  Gene Cutler; Lisa A Marshall; Ni Chin; Helene Baribault; Paul D Kassner
Journal:  Genome Res       Date:  2007-11-07       Impact factor: 9.043

8.  Natural variants of AtHKT1 enhance Na+ accumulation in two wild populations of Arabidopsis.

Authors:  Ana Rus; Ivan Baxter; Balasubramaniam Muthukumar; Jeff Gustin; Brett Lahner; Elena Yakubova; David E Salt
Journal:  PLoS Genet       Date:  2006-10-26       Impact factor: 5.917

9.  Cis-regulatory variations: a study of SNPs around genes showing cis-linkage in segregating mouse populations.

Authors:  Debraj GuhaThakurta; Tao Xie; Manish Anand; Stephen W Edwards; Guoya Li; Susanna S Wang; Eric E Schadt
Journal:  BMC Genomics       Date:  2006-09-15       Impact factor: 3.969

10.  Linking metabolic QTLs with network and cis-eQTLs controlling biosynthetic pathways.

Authors:  Adam M Wentzell; Heather C Rowe; Bjarne Gram Hansen; Carla Ticconi; Barbara Ann Halkier; Daniel J Kliebenstein
Journal:  PLoS Genet       Date:  2007-08-01       Impact factor: 5.917

