Literature DB >> 36084135

Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes.

Eriko Sasaki1,2, Joanna Gunis1, Ilka Reichardt-Gomez1,3, Viktoria Nizhynska1, Magnus Nordborg1.   

Abstract

Genome-wide association studies (GWAS) have revealed that the striking natural variation for DNA CHH-methylation (mCHH; H is A, T, or C) of transposons has oligogenic architecture involving major alleles at a handful of known methylation regulators. Here we use a conditional GWAS approach to show that CHG-methylation (mCHG) has a similar genetic architecture-once mCHH is statistically controlled for. We identify five key trans-regulators that appear to modulate mCHG levels, and show that they interact with a previously identified modifier of mCHH in regulating natural transposon mobilization.

Entities:  

Mesh:

Substances:

Year:  2022        PMID: 36084135      PMCID: PMC9491579          DOI: 10.1371/journal.pgen.1010345

Source DB:  PubMed          Journal:  PLoS Genet        ISSN: 1553-7390            Impact factor:   6.020


Introduction

Organisms have developed defense systems to protect the genome from transposable elements, which can act as ‘selfish genes’ and cause considerable damage [1,2]. DNA methylation is a major component of genome defense that is found in both mammals and plants, albeit with significant differences [2,3]. For instance, whereas mammals mostly have CG methylation (mCG), methylation in plants also occurs in the CHG and CHH contexts (H is A, T, or C). mCG in plants is known to be maintained through both mitosis and meiosis by DNA METHYLTRANSFERASE 1 (MET1; DNMT1 in humans)—in contrast to CHH methylation (mCHH), which is re-established after cell division by several pathways, including the RNA-directed DNA methylation (RdDM) pathway and the CHROMOMETHYLASE 2 (CMT2) pathway [4,5]. Unlike mCG, which is stably inherited, mCHH behaves like a molecular phenotype and is strongly influenced by the environment, such as growth temperature [6] and stress [7]. CHG methylation (mCHG) falls somewhere between mCG and mCHH in the sense that it can be maintained via positive feedback between CHROMOMETHYLASE3 (CMT3) and KRYPTONITE (KYP), which recognize dimethylation of histone 3 lysine 9 (H3K9me2) and mCHG, respectively [8-11]. Molecular mechanisms aside, the forces shaping variation in DNA methylation remain obscure, and the same is true for its biological significance [7,12-19]. Previously, we demonstrated the existence of large-scale geographic clines for DNA methylation in A. thaliana, and used genome-wide association studies (GWAS) to show that mCHH on transposons is heavily influenced by major polymorphisms at trans-acting modifiers corresponding to known DNA methylation regulators in silencing pathways: CMT2, NUCLEAR RNA POLYMERASE D1B (NRPE1), ARGONAUTE 1 (AGO1), and AGO9 [6,20]. In particular, an allele of NRPE1 (named NRPE1’) strongly affects mCHH levels in RdDM-targeted transposons. NRPE1, which is the largest subunit of RNA polymerase V, localizes to promoter regions of relatively young transposons [21], and Baduel et al. [22] recently showed that the NRPE1’ allele is associated with recent transposon mobilization, presumably through its effect on DNA methylation. All in all, these findings point to a highly variable genome defense system. While both mCHG and mCG showed high heritability, GWAS yielded little in terms of significant associations. This might be because these “traits” are highly polygenic, or because they are at least partly transgenerationally inherited, and hence do not behave like standard phenotypes (i.e. the observed trait does not necessarily reflect current genetics or environment). In this study, we revisit mCHG variation and use a conditional GWAS approach to reveal that multiple major alleles at trans-acting regulators modify this type of methylation, and are also associated with transposon mobilization.

Results

mCHG is strongly correlated with mCHH

Our starting point is the observation that mCHG and mCHH levels on transposons are strongly correlated in the 1001 Epigenomes data set [20], especially for RdDM-targeted transposons (Fig 1A; see Methods). Much of the variation in these two traits is due to differences in the environment—including tissue, which can be viewed as a cellular environment. It is important to note that these are not single-cell data, and that differences in DNA methylation levels may arise simply because of different levels of cell division states between samples [23], although hyper-mCHG in flower tissue samples probably also reflect tissue-specific CMT3 expression [24-26]. At the same time, variation across individuals is enormous even when controlling for known tissue and environmental effects, as observed in the largest leaf sample data set (“Leaf SALK ambient temperature”; n = 846).
Fig 1

Covariance between mCHH and mCHG levels across individuals.

(A) Correlation of genome-wide average mCHH and mCHG levels in RdDM- and CMT2-targeted transposons across 1028 natural inbred lines measured in different conditions (including 79 lines measured in more than one condition; see [20]) Colors correspond to environments/tissues. Plots on axes show marginal densities. Circled lines carry known natural null alleles of CMT2. (B) The same correlation as in A, but for 84 epigenetics-related loss-of-function mutants [27]. Gray areas indicate 99% confidence intervals around the linear regression lines. Green points denote “wildtypes” (see [27]); magenta, major DNA methyltransferases. Blue points denote lines with a highly significant effect on mCHG.

Covariance between mCHH and mCHG levels across individuals.

(A) Correlation of genome-wide average mCHH and mCHG levels in RdDM- and CMT2-targeted transposons across 1028 natural inbred lines measured in different conditions (including 79 lines measured in more than one condition; see [20]) Colors correspond to environments/tissues. Plots on axes show marginal densities. Circled lines carry known natural null alleles of CMT2. (B) The same correlation as in A, but for 84 epigenetics-related loss-of-function mutants [27]. Gray areas indicate 99% confidence intervals around the linear regression lines. Green points denote “wildtypes” (see [27]); magenta, major DNA methyltransferases. Blue points denote lines with a highly significant effect on mCHG. Interestingly, the covariance between mCHH and mCHG showed the same pattern in data generated by knocking out known or potential DNA methylation regulators in the same genetic background (Fig 1B) [27]. This demonstrates strong co-regulation of these types of methylation, in particular for RdDM-targeted transposons. Loss of RdDM regulation, such as in the double mutant of DOMAINS REARRANGED METHYLASE (DRM) drm1 drm2, causes loss of almost all mCHH and mCHG because mCHG in these regions is mainly established via siRNA [28]. At the same time, the data demonstrate that some genetic perturbations can affect one type of methylation much more than the other. For example, knocking out mCHG-regulators, like CMT3 or KYP, affects mCHG levels much more than mCHH levels. Based on these observations, it is clear that a substantial fraction of variation in non-CG methylation is due to factors that affect both mCHH and mCHG. Without replicate measurements, it is difficult to say what fraction of these factors is genetic and what is environmental, but, irrespective of this, we hypothesized that the substantial covariance could reduce the power of GWAS for either mCHH or mCHG (when using a standard univariate model), and that an analysis accounting for this covariance might perform better [29-31]. In essence, we sought to simplify a complex trait by breaking it into constituent parts [32-34]. This worked, as we shall see below.

Conditional GWAS reveals new associations

Multi-trait GWAS can be carried out using transformations (e.g., the difference or ratio between traits), conditional approaches in which one or more traits are accounted for as covariates, or full multivariate models [29-31]. We tried several different approaches, including a multivariate GWAS, but this paper will focus on results from a conditional GWAS of mCHG with mCHH as a covariate (denoted mCHG|mCHH) because this gave the clearest results. Other approaches produced consistent results, and we will briefly discuss them later. Fig 2 contrasts the performance of univariate and conditional GWAS. As previously noted, univariate GWAS of mCHG does not yield any significant associations despite moderate SNP-heritability (37% and 38% for RdDM- and CMT2-targeted transposons, respectively). As expected given the very strong correlation between mCHH and mCHG on RdDM-targeted transposons, the strong associations at AGO1 and NRPE1 found in univariate GWAS of mCHH [20] give rise to associations for mCHG as well, but they are not genome-wide significant (AGO1 Chr1:17895231, -log10p-value = 6.51 and NRPE1 Chr2:16714815, -log10p-value = 5.23). The previously identified CMT2-association with mCHH on CMT2-targeted transposons [6,20,35,36] is not apparent, consistent with the much weaker correlation between mCHH and mCHG on these transposons.
Fig 2

Comparison of univariate and conditional GWAS of mCHG.

The analysis was done separately for (A) RdDM-targeted and (B) CMT2-targeted transposons, using the 774 lines in the global panel from the 1001 Epigenomes Project (“SALK leaf in ambient temperature”; see Fig 1). For each case, the upper Manhattan plot shows univariate GWAS of mCHG and the lower GWAS of mCHG controlling for mCHH. Horizontal gray lines show genome-wide significance (p = 0.05 after Bonferroni-correction). The line plots show enrichment of a priori genes and FDR (see text), with horizontal dashed lines indicating an FDR of 20%.

Comparison of univariate and conditional GWAS of mCHG.

The analysis was done separately for (A) RdDM-targeted and (B) CMT2-targeted transposons, using the 774 lines in the global panel from the 1001 Epigenomes Project (“SALK leaf in ambient temperature”; see Fig 1). For each case, the upper Manhattan plot shows univariate GWAS of mCHG and the lower GWAS of mCHG controlling for mCHH. Horizontal gray lines show genome-wide significance (p = 0.05 after Bonferroni-correction). The line plots show enrichment of a priori genes and FDR (see text), with horizontal dashed lines indicating an FDR of 20%. The contrast between these univariate GWAS results and the conditional analysis is striking. GWAS for mCHG while controlling for mCHH (mCHG|mCHH) revealed five clear peaks for RdDM-targeted transposons, three of which were also found for CMT2-targeted transposons. The peaks are above or near genome-wide significance using a conservative threshold (Figs 2 and S1 and S1 Table). As will be discussed further below, these associations jointly explain 30.5% and 14.8% of mCHG|mCHH in RdDM- and CMT2-targeted transposons, respectively. The previously mentioned AGO1 and NRPE1 associations disappear completely, consistent with their being strongly correlated with mCHH, and hence controlled for. The improved performance of conditional GWAS is also evident from a massive enrichment of associations near a priori candidates (see Methods and [37]). With conditional GWAS, we observed an up to 45-fold excess of associations near annotated epigenetic modifiers [20,35], compared to an 4- to 10-fold excess for the univariate analyses (Fig 2). The enrichment for a priori candidates also allows us to estimate a False Discovery Rate (FDR) for this set of genes [37]. Four of the five major associations can be identified with a priori candidates at very low FDRs (e.g. FDR < 0.05 using a significance threshold of -log10p-value > 6; see Fig 2), but it is notable that FDR is low even for associations that are nowhere near genome-wide significance. This is not true for the univariate analyses. For example, at an FDR of 20%, univariate GWAS of mCHG identifies only AGO1 and NRPE1 for RdDM-targeted transposons, and nothing for CMT2-targeted transposons (S2 Table), whereas conditional GWAS of mCHG|mCHH generates a long list of known regulators of mCHG (S2 Table). For RdDM-targeted transposons, we find four a priori genes above or near genome-wide significance: the previously mentioned CMT2 and CMT3, plus MIR823a, which encodes a microRNA targeting CMT3 [26,38], and MULTICOPY SUPPRESSOR OF IRA1 (MSI1), likely a component of chromatin assembly factor (CAF-1) responsible for proper heterochromatin formation with FASCIATA1 (FAS1) and FASCIATA2 (FAS2) [39,40]. At lower significance levels, we find REPRESSOR OF SILENCING 3 (ROS3) and DNA METHYLTRANSFERASE 2 (DNMT2). ROS3 is a DNA demethylase that interacts with REPRESSOR OF SILENCING 1 (ROS1) [41] and DNMT2 is associated with histone deacetylation [42] and RNA methyltransferase activity [43] (and could therefore be a false positive since the effects on DNA methylation is under debate; see [42,43]). For CMT2-targeted transposons, the list of associated a priori candidates includes four genes also associated with RdDM-targeted transposons, namely MIR823A, CMT2, ROS3, and MSI1, but there are also two specific genes, FAS1 and DECREASED DNA METHYLATION 1 (DDM1), a chromatin-remodeler responsible for heterochromatin formation [44,45]. As shown in Fig 1B, ddm1 strongly reduces mCHG, and fas2, a functionally redundant FAS1 homolog, increases CMT2-targeted mCHG in an mCHH-independent manner [27].

Genes underlying major associations

On searching for causal genes

Identifying causal genes and mechanisms from GWAS results is notoriously difficult [46]. Peaks of association often cover multiple genes—this is certainly true in the gene-dense genome of A. thaliana [37]—and functional annotation is of limited use for complex traits with unknown genetic basis. We are in a stronger position, however, because, just like in our previous work on mCHH [35], the significant GWAS peaks (Fig 2) are narrow, and clearly pinpoint genes a priori known to be specifically involved in well-defined molecular phenotypes (Table 1 and Fig 3). Hence we will discuss these candidates, and the indirect evidence supporting their causal role.
Table 1

Significant conditional GWAS hits for mCHG|mCHH*.

SNPs-log10(p-value)MACDistance froma priori gene (bp)Genes within 15K bp*
mCHG|mCHH in RdDM-targeted transposons
Chr1:403502210.177NAAT1G11930, AT1G11940, AT1G11950 (JMJ26), AT1G11960
Chr1:262536307.3829050 AT1G69770 (CMT3)
Chr3:449662610.48207196AT3G13700, AT3G13710 (PRA1.F4), AT3G13720 (PRA1.F3), AT3G13724 (MIR823A), AT3G13730 (SYP90D1)
Chr4:104232786.853012069AT4G19020 (CMT2), AT4G19030 (NLM1), AT4G19035 (LCR7)
Chr5:235535067.14992506AT5G58200 (ECT10), AT5G58200, AT5G58210, AT5G58220 (TTL), AT5G58230 (MSI1), AT5G58240 (FHIT)
mCHG|mCHH in CMT2-targeted transposons
Chr3:44960476.42210775AT3G13700, AT3G13710 (PRA1.F4), AT3G13720 (PRA1.F3), AT3G13724 (MIR823A), AT3G13730 (SYP90D1)
Chr4:104232786.863012069AT4G19020 (CMT2), AT4G19030 (NLM1), AT4G19035 (LCR7)
Chr5:235559109.1764102AT5G58200, AT5G58210, AT5G58220 (TTL), AT5G58230 (MSI1), AT5G58240 (FHIT), AT5G58250, AT5G58260 (NdhN)

* Bold denotes a priori epigenetic regulators

Fig 3

Candidate loci underlying mCHG|mCHH variation.

Zoomed-in Manhattan plots around conditional GWAS peaks in Fig 2 for (A) RdDM- and (B) CMT2-targeted transposons. Gene models below the plots show the candidate genes in orange (see Table 1). Colored dots represent SNPs in strong linkage disequilibrium with top SNPs.

Candidate loci underlying mCHG|mCHH variation.

Zoomed-in Manhattan plots around conditional GWAS peaks in Fig 2 for (A) RdDM- and (B) CMT2-targeted transposons. Gene models below the plots show the candidate genes in orange (see Table 1). Colored dots represent SNPs in strong linkage disequilibrium with top SNPs. * Bold denotes a priori epigenetic regulators

Multi-layered direct CMT3 regulation affects mCHG variation

Based on mutant phenotypes, CMT3 is a strong a priori candidate for regulating mCHG (Fig 1). Consistent with this, one of the five major associations from GWAS of mCHG|mCHH pinpoints CMT3 and another MIR823a, a gene encoding a microRNA that directly down-regulates CMT3 by cleaving its transcripts during early embryogenesis [26]. The most significant peak for mCHG|mCHH on RdDM-targeted transposons was located 196 bp downstream of MIR823a, while the peak corresponding to CMT3 was located 50 bp upstream of the gene in a potential regulatory region (Figs 4A and S4A and Table 1). GWAS for CMT2-targeted transposons also pinpointed MIR823a, although the association is weaker, and the most significant SNP is further away (Fig 3B and Table 1). This association was also recently found by Hüther et al. [47] using GWAS for unconditional mCHG levels of individual transposons.
Fig 4

Effects of MIR823A on mCHG levels.

(A) Polymorphisms in the miRNA region of MIR823A and the target in CMT3 in 1135 natural lines. Differences from the CMT3 reference sequence are shown in light blue. Haplotype counts are shown in parentheses. (B) Estimated effects of MIR823A alleles on CMT3 expression in natural lines with individual values (Welch’s t-test, two-tailed, CMT3 expression was collected from transcriptome data published in [20]). (C) Scatter plot comparing the average allelic effect of the MIR823A polymorphism with the effect of cmt3 in the Col-0 background. Dots represent individual transposons, and the colors show the significance of the allelic effects as -log10p-value in GWAS. (D) CMT3 expression in early embryos with standard errors in WT (Col-0) and mir823a. Significance of the difference in mean tested using Welch’s t-test (n = 3, one-tailed). (E) The distribution of mCHG levels in RdDM and CMT2-targeted transposons, comparing knock-outs of MIR823A to wild-type in different backgrounds.

Effects of MIR823A on mCHG levels.

(A) Polymorphisms in the miRNA region of MIR823A and the target in CMT3 in 1135 natural lines. Differences from the CMT3 reference sequence are shown in light blue. Haplotype counts are shown in parentheses. (B) Estimated effects of MIR823A alleles on CMT3 expression in natural lines with individual values (Welch’s t-test, two-tailed, CMT3 expression was collected from transcriptome data published in [20]). (C) Scatter plot comparing the average allelic effect of the MIR823A polymorphism with the effect of cmt3 in the Col-0 background. Dots represent individual transposons, and the colors show the significance of the allelic effects as -log10p-value in GWAS. (D) CMT3 expression in early embryos with standard errors in WT (Col-0) and mir823a. Significance of the difference in mean tested using Welch’s t-test (n = 3, one-tailed). (E) The distribution of mCHG levels in RdDM and CMT2-targeted transposons, comparing knock-outs of MIR823A to wild-type in different backgrounds. The MIR823A polymorphism appears to almost exclusively affect mCHG (S2 and S3 Figs), primarily targeting the same transposons as a CMT3 knock-out, as expected if the former directly regulates the latter (Fig 4C). Consistent with this interpretation, lines carrying the non-reference MIR823A allele and a CMT3 reference allele showed lower CMT3 expression as well as lower mCHG (Fig 4B). The specificity of the phenotypic effects and known regulatory mechanism provides strong evidence for a direct causal role for these genes. Knock-outs of MIR823A in several backgrounds affected CMT3 expression and mCHG in a manner consistent with this regulatory model [26], although the effects on methylation were very weak (Figs 4D, 4E and S4D). The natural polymorphism almost certainly does not involve a loss of function, as there is no common polymorphism in the 21-nt mature miRNA region of MIR823A, nor in the target region of CMT3 (Fig 4A). There are also no other non-synonymous polymorphisms in CMT3 significantly associated with the different alleles. There is no SNP in the stem-loop region of MIR823A that is significantly associated with mCHG|mCHH, and the most significant SNP (Chr3:4496626) does not appear to affect predicted RNA secondary structure [48] (S4C Fig).

Further evidence for allelic heterogeneity at CMT2

In addition to de novo establishment of methylation in heterochromatic regions, CMT2 plays a role in mCHG maintenance through regulation of H3K9 methylation via a self-reinforcing loop [11,49]. Previous work has identified two common natural alleles, CMT2a’ and CMT2b’, affecting mCHH [6,20,35], but neither appears to affect mCHG (S2, S3 and S5 Figs). Here we identify a new association, 2 kb downstream of CMT2 (Fig 3), that does appear to affect mCHG (on both RdDM- and CMT2-targeted transposons) (S5 Fig). The top SNP is only weakly correlated with CMT2a’ and CMT2b’ (r = 0.37; r = 0.24), but caution is needed when interpreting associations in regions that apparently harbor multiple causal polymorphisms [50].

A complex association on chromosome 5 includes two a priori genes

The major peak on chromosome 5 was associated with mCHG|mCHH on both RdDM- and CMT2-targeted transposons, although the shape of the peak differs slightly. The strongest association was found for CMT2-targeted transposons and is located 102 bp upstream of a known epigenetic regulator, MSI1, but substantial linkage disequilibrium extends over a 30 kb region which also includes another a priori gene, ROS3 (Figs 3 and S6A). The loss-of-function mutant ros3 does not show altered mCHG in leaves (Fig 1B)[27]. Loss of MSI1 causes embryonic lethality [51], but the heterozygous mutant msi1-2 does not show a significant effect on mCHG levels in leaves either. However, MSI1 has been copurified with a tesmin/TSO1-like CXC domain-containing protein, a component of a DREAM complex that controls mCHG via repression of CMT3, KYP, MET1, and DDM1 [52]. In addition, a loss of FAS2, a component of CAF-1, which also includes MSI1, induces mCHG hypermethylation in CMT2-targeted TEs (Fig 1B right) [27,53]. Therefore, MSI1 remains a strong candidate. Furthermore, the strongly associated SNP at Chr5:23555910 could explain almost all phenotypic variation associated with the chromosome 5 peak (S6B Fig), making MSI1 the top candidate, although further experiments are clearly needed. The allelic variation is presumably regulatory, since the SNP is not correlated with any non-synonymous SNPs.

A jmjC gene is a novel modifier of mCHG in RdDM-targeted transposons

The final peak (Chr1:4035022; -log10p-value = 10.1) in our study did not pinpoint any a priori gene. However, it is highly significant and narrowly centered on the coding region of a jmjC gene, JUMONJI26 (JMJ26) [54]. JMJ26 is a close homolog of JMJ25, also known as INCREASE IN BONSAI METHYLATION 1 (IBM1), a histone H3K9m demethylase targeting genic regions in a KYP/SUVH4- and CMT3-dependent manner [55,56]. In contrast to IBM1, the function of JMJ26 has barely been studied. Among the most significant associations were two non-synonymous SNPs resulting in amino acid changes (Chr1:4035683, Glu678Val; Chr1:4035690, Leu676Val) in the conserved JMJ domain, suggesting allelic effects on the enzymatic activity (Figs 5 and S7).
Fig 5

Effects of JMJ26 on mCHG levels.

(A) Read density and SNP across the gene model. (B) Volcano plots showing the effects on DNA methylation levels of jmj26 on RdDM- and CMT2-targeted transposons. (C) Violin plots showing the distribution of mCHG levels of RdDM- and CMT2-targeted transposons in WT (Col-0), jmj26, and two lines with JMJ26ref or JMJ26alt transgenes in Col-0 jmj26 background. The gray horizontal line shows the mean value of WT. (D) The genome-wide distribution of transposons affected by JMJ26. Black lines show differential mCHG levels of jmj26 from Col-0 in 30 kbp sliding windows. Orange dots show transposons with significantly changed mCHG levels (n = 2, p<0.01 by two-tailed Welch’s t-test & differential mCHG > 0.1 or < -0.1). (E) Heat map comparing JMJ26 and IBM1 targets. Rows correspond to standardized mCHG levels in each gene or transposon for jmj26, ibm1, and WTs.

To explore the function of JMJ26, we measured mCHG levels of a loss-of-function mutant, jmj26 (Fig 5). Unlike IBM1 loss-of-function mutants, jmj26 did not show any morphological phenotype (S8A Fig), but mCHG levels increased significantly in RdDM-targeted transposons (Fig 5B and 5C). Furthermore, jmj26 showed increased mCHG in pericentromeric transposons, whereas ibm1-like hypermethylation was not observed in genic regions (Fig 5D and 5E). Gene expression was barely affected except for a few DNA methylation sensitive genes, including transposon genes (AT5G35935, AT4G01525) (S3 Table). These observations support the functionality of JMJ26 as a histone demethylase with different targets from IBM1, and also make it a strong a posteriori candidate for causing the GWAS peak on the left end of chromosome 1.

Effects of JMJ26 on mCHG levels.

(A) Read density and SNP across the gene model. (B) Volcano plots showing the effects on DNA methylation levels of jmj26 on RdDM- and CMT2-targeted transposons. (C) Violin plots showing the distribution of mCHG levels of RdDM- and CMT2-targeted transposons in WT (Col-0), jmj26, and two lines with JMJ26ref or JMJ26alt transgenes in Col-0 jmj26 background. The gray horizontal line shows the mean value of WT. (D) The genome-wide distribution of transposons affected by JMJ26. Black lines show differential mCHG levels of jmj26 from Col-0 in 30 kbp sliding windows. Orange dots show transposons with significantly changed mCHG levels (n = 2, p<0.01 by two-tailed Welch’s t-test & differential mCHG > 0.1 or < -0.1). (E) Heat map comparing JMJ26 and IBM1 targets. Rows correspond to standardized mCHG levels in each gene or transposon for jmj26, ibm1, and WTs.

Major loci additively affect mCHG and influence transposon activity

Finally, we investigated how the five major loci jointly shape mCHG variation at the population level. Although some of these genetic variants are clearly interacting at the molecular level (e.g., MIR823A directly regulates CMT3), the phenotypic effects appear to be additive (Fig 6A), and jointly explain a large fraction of the variation: 30.5% and 14.8% of mCHG|mCHH in RdDM- and CMT2-targeted transposons, respectively, while SNP-based heritabilities of these traits are 65% and 69%. mCHG levels decrease linearly with the number of mCHG-decreasing alleles, and the single line (out of 774) harboring mCHG-decreasing alleles at all five loci exhibits strong hypo-mCHG|mCHH in RdDM-targeted transposons (Fig 6A).
Fig 6

Cumulative effects and geographic distribution of mCHG-decreasing alleles.

(A) Cumulative effect on mCHG|mCHH. Chr1:4035022 (JMJ26) was excluded from the analysis of CMT2-targeted transposons as it has no effect. Chr5:2355910 was used for MSI1. Inserted plots show mCHG levels around transposons, calculated using 20 sliding windows across the transposon body and flanking regions. (B) Longitudinal frequency distribution of mCHG-decreasing and NRPE1’ alleles (top); geographical distribution of the number of mCHG-decreasing alleles (middle), and mCHG|mCHH levels in RdDM-targeted transposons (bottom). Triangles correspond to lines in the inserted plot of panel D. (C) The frequency of distribution of the number of mCHG-decreasing alleles, separately for NRPE1’ genotypes. (D) The number of recent transposon insertions [22] as a function of NRPE1’ genotype and the number of mCHG-decreasing alleles. Significance was tested using a permutation test (see Methods; *** is p < 0.01, * is p < 0.1). The insert shows the six observed genotypes with three mCHG-decreasing and the NRPE1’ non-reference allele. Filled circles indicate mCHH- or mCHG-decreasing alleles. Mapping and statistical testing were performed in R version 3.5.3. Maps were generated using the following packages: maps, ggridges, ggplot, rasrer, tidyverse.

These five loci also affect the geographic distribution of mCHG for both RdDM- and CMT2-targeted transposons. In European lines (n = 971), mCHG-decreasing alleles are almost exclusively found in the west, from 0 to 30°E (Figs 6B, S9 and S10). Lines carrying more than three mCHG-decreasing alleles are rare (2.5%), and the one line carrying all five such alleles is from the Iberian peninsula: IP-Mun-0 (9561), an admixed line of relict and western European descent [57]. Interestingly, the longitudinal gradient in the number of mCHG-decreasing alleles is opposite to the longitudinal cline of NRPE1’ associated with lower mCHH [35] and higher transposon mobilization [22] (Fig 6B). No line was found carrying NRPE1’ with more than three mCHG-decreasing alleles, and a permutation test shows that this negative correlation is unlikely to be due to population structure (Figs 6C and S10C). A plausible alternative is some form of stabilizing selection on methylation levels: multiple mCHG-decreasing alleles in lines with lower mCHH levels might affect fitness analogously to double knockout mutants of the RdDM and CMT3 pathways that show pleiotropic morphological phenotypes and high transposon transcriptional activity [10,28,49]. To explore this further, we examined the effect of mCHG-negative alleles on transposon mobilization by intersecting our data with those of Baduel et al. [22], who found a significant correlation between recent transposon mobilization and the NRPE1’ non-reference allele. We found that this effect reflects an epistatic interaction with the five loci reported here: significantly increased transposon mobilization is only found in lines carrying the NRPE1’ non-reference allele and two or more mCHG-decreasing alleles (p-value < 0.01; Figs 6D and S11). In particular, lines carrying three mCHG-decreasing alleles and the NRPE1’ non-reference allele have about thirty times more recent transposon insertions than lines carrying no mCHG-decreasing alleles and the NRPE1’ non-reference allele. Such lines are not closely related and carry different transposon insertions, mostly Copia and MuDR. They are found in two distinct regions: north-western Africa, including the Canary Islands, and the south-eastern Balkans (Fig 6B and S4 Table).

Cumulative effects and geographic distribution of mCHG-decreasing alleles.

(A) Cumulative effect on mCHG|mCHH. Chr1:4035022 (JMJ26) was excluded from the analysis of CMT2-targeted transposons as it has no effect. Chr5:2355910 was used for MSI1. Inserted plots show mCHG levels around transposons, calculated using 20 sliding windows across the transposon body and flanking regions. (B) Longitudinal frequency distribution of mCHG-decreasing and NRPE1’ alleles (top); geographical distribution of the number of mCHG-decreasing alleles (middle), and mCHG|mCHH levels in RdDM-targeted transposons (bottom). Triangles correspond to lines in the inserted plot of panel D. (C) The frequency of distribution of the number of mCHG-decreasing alleles, separately for NRPE1’ genotypes. (D) The number of recent transposon insertions [22] as a function of NRPE1’ genotype and the number of mCHG-decreasing alleles. Significance was tested using a permutation test (see Methods; *** is p < 0.01, * is p < 0.1). The insert shows the six observed genotypes with three mCHG-decreasing and the NRPE1’ non-reference allele. Filled circles indicate mCHH- or mCHG-decreasing alleles. Mapping and statistical testing were performed in R version 3.5.3. Maps were generated using the following packages: maps, ggridges, ggplot, rasrer, tidyverse.

Discussion

The genetics of non-CG transposon methylation

Non-CG methylation in plants is associated with gene silencing, especially for transposons. In previous studies, we identified an oligogenic architecture for mCHH on transposons involving major polymorphism at CMT2, NRPE1, AGO1, and AGO9—all involved in DNA methylation pathways [6,20,35]. Here we complement this with a conditional GWAS approach and identify five major polymorphisms affecting mCHG, presumably acting via methylation maintenance pathways: CMT2 (different alleles), CMT3, MIR823A (directly regulating CMT3), MSI1 (probably), and JMJ26. All had previously been shown to affect DNA methylation except JMJ26. We observed major polymorphisms at several of the layers of regulation controlling mCHG, including methylation, de-methylation, and histone modification—even post-transcriptional regulation, in the form of a microRNA targeting the CMT3 pathway directly. The DDM1 and FAS1 alleles detected using a non-stringent threshold would further contribute to the fine-tuning of DNA methylation over transposons genome-wide. The genetic architecture revealed by these studies is very different from the “omnigenic” model [58,59] which has come to represent a typical human GWAS result. It is also different from trans-regulation for reported human mCG variation [60,61]. Rather than a diffuse genetic architecture involving huge numbers of loci of small effect and unknown causal relationship to the phenotype, we find a small number of major loci with highly plausible mechanistic connection to the phenotype—and a diffuse background for which we lack the power to dissect further. The results are reminiscent of those found for other clinal traits, like flowering time [37,62], flower color [63], or eye and coat color in mammals [64-66], suggesting that transposon methylation is likewise under selection, most likely as part of a plastic and an environmentally sensitive genome defense system—a notion further supported by the association with transposon mobility reported here and be Baduel et al. [22].

The power and complexity of conditional GWAS

The performance of GWAS relies on using the right model for the relation between genotype and phenotype. As with other statistical methods, using the wrong model may lead to unpredictable results. However, lack of prior knowledge makes modeling difficult, and univariate linear models are therefore the default in GWAS. Here we focus on correlated traits. Correlations between traits may arise for a variety of reasons, most obviously a shared genetic basis (pleiotropy), and it is clear that such traits should, in principle, be analyzed jointly using some kind of multivariate analysis [29-31,67]. Our results provide a dramatic illustration of the potential benefits of doing so. Despite high heritability, univariate GWAS of mCHG variation failed to detect any significant associations, leading us to conclude that the trait was simply too polygenic [20]. In contrast, GWAS of the same data using a conditional model that controlled for mCHH revealed an oligogenic architecture with five major loci—qualitatively similar to what we had previously seen for mCHH using univariate GWAS, also in that it mainly identified loci corresponding to biologically meaningful a priori candidates [6,20,35]. However, while our conditional GWAS approach clearly improved the power to reject the null hypothesis of no association, interpreting the result in terms of causality is more difficult. By controlling for mCHH we have effectively simplified the trait, presumably revealing genetic factors affecting mCHG only, perhaps by affecting the maintenance of this type of DNA methylation. There is considerable background experimental evidence to support such a model. For example, the RdDM pathway would affect both mCHH and mCHG via de novo methylation, while mCHG (but not mCHH) would be also maintained by the CMT3 pathway, and by controlling the former, we would reveal the latter. Our GWAS results are consistent with this interpretation. Polymorphism at NRPE1, a key component of the RdDM pathway, was revealed in univariate GWAS, while variation in CMT3 (and its regulator, MIR823a) was only found after controlling for mCHH. At the same time, it seems clear that reality is more complex, and that both mCHH and mCHG are regulated by multiple homeostatic mechanisms that also involve factors not included in our model, like histone modifications [68]. It is therefore not surprising that most of the polymorphisms we have identified seem to affect both traits, albeit not to the same extent (exceptions include the one at MIR823A, which only seems to affect mCHG, and the previously identified CMT2a’ and CMT2b’ polymorphisms, which only affect mCHH on CMT2-targeted transposons; see S2 and S3 Figs). Two other GWAS models produced consistent results, but had less power (S12 and S13 Figs). A fully parameterized multi-trait mixed model for mCHH and mCHG (MTMM; see [29]) confirmed that the new associations presented in this paper were “specific” to one of the two traits, while a conditional model for mCHH while controlling for mCHG (mCHH|mCHG) also identified most of these loci, plus the previously identified CMT2 associations [6,35], which affect mCHH only. In conclusion, we agree with Stephens [31] that multi-trait association methods can have much greater power than univariate methods, but require an appropriate statistical framework for interpretation. However, in a model organism like A. thaliana, further experiments should guide the construction of such a framework.

Functional importance of non-CG transposon methylation

While genomics has revealed striking geographic variation in DNA methylation, and an equally striking genetic architecture underlying this, the functional importance of all this variation has been rather less clear. Based on decades of molecular biology studies, it is reasonable to speculate that it must play some role in regulating transposon activity [5,27,69,70]. Non-CG methylation is part of a redundant system with strong epistasis between mCHH and mCHG in keeping transposons transcriptionally silent [10,28,49]. However, direct evidence based on transposon mobility in nature has been missing. Now, the evidence is gathering that the allelic variants responsible for transposon methylation variation do indeed affect transposon mobility. Baduel et al [22] showed that the NRPE1’ polymorphism identified as affecting mCHH on RdDM-targeted transposons by Sasaki et al [35] is associated with recent transposon mobility, and we show here that the multiple loci controlling mCHG appear to play a similar role. Our findings connect complex genome defense systems proposed in molecular biology studies with natural populations. Furthermore, the geographic distribution of the relevant multi-locus genotypes suggests that selection may be acting to maintain the appropriate level of transposon silencing—for unknown reasons. Further studies, such as crosses with new allele combinations and common garden experiments, will be required to address these questions.

Materials and methods

Analyzed data sets

DNA methylation data

The DNA methylation data sets are summarized in S5 Table. Briefly, we analyzed a bisulfite-sequencing data set published in the 1001 epigenome project [20], 85 epigenetic-related mutants [27], and our own sequence data described below. All reads were mapped on the appropriate pseudo-genome provided by the 1001 genome project (1001 Genomes Consortium., 2016) using a Methylpy pipeline v1.2 [71]. DNA methylation levels were estimated as weighted methylation levels for each transposon defined in Araport11 annotation. CMT2- and RdDM-targeted transposons were defined as having differential levels of methylation (> 0.1) between wild-type and cmt2 or drm1drm2 in Col-0 previously described in [20]. For each line, average DNA methylation was calculated using all transposons for which at least one read was mapped.

RNA-seq data

The RNA-seq data are summarized in S5 Table. Quality control and adapter trimming of all RNA-seq data were conducted using FASTP [72]. Adapter trimmed reads were aligned on the reference genome Araport11 [73] using STAR v2.7 [74] with default settings. For all annotated genes, mapped read counts were calculated using featureCounts v2.0.1 in the Subread package [75], accepting reads that were mapped on multiple genes (option -M). Using the count data, differentially expressed genes were detected by edgeR v3.1 [76] with glmQLFTest() function (FDR < 0.05). For transposon expression, reads were mapped as described above but allowing multiple hits (STAR—outFilterMultimapNmax 100 and—winAnchorMultimapNmax 100). Mapped reads were calculated by TEtoolkits [77] using the TEcount() function, and differentially expressed transposons were detected using edgeR. CMT3 expression in natural populations were downloaded from the 1001 epigenome project data [20].

Statistics

Genome-wide association studies

Univariate and conditional GWAS were carried out using LIMIX [78] version 3.0.4 with a full genome SNP matrix for 774 lines [20] from the 1001 genome project (10,709,949 SNPs) and the following linear mixed model, where Y is the n×1 vector of a phenotype, fixed terms, L and X, are the n×1 vectors corresponding to a covariate and a genotype to be tested (SNP) with the parameters α and β, respectively. and are random terms, including the identity-by-state kinship matrix K representing the genetic relatedness [79,80] and the residuals, respectively. Y, L, and X were z-scored. Univariate models without cofactor take α = 0. SNPs that satisfied minor allele frequency (MAF) > 5% were used for association studies. Bonferroni-correction was used for multiple-testing correction after excluding all SNPs less than MAF 5%. Multi-trait linear mixed models (S12 Fig) were performed to identify common genetic effects (common) and differing genetic effects (specific) between two traits using LIMIX [78] following models described in [29].

Enrichment

To assess the amount of signal in the data, we used the enrichment approach introduced in [37]. The rationale behind this approach is that loci that are a priori candidates based on independent data should be more likely to be associated than loci that are not. Enrichment thus simultaneously demonstrates that the a priori loci are a meaningful subset and that there is a real signal of causality in the GWAS. Importantly, the method relies on the empirical p-value distribution, and is insensitive to these being well-calibrated (a major advantage when dealing with population structure). Let x be the fraction of a priori loci that have p-values smaller than p, and let y be the same fraction among the remaining loci. Note that the loci do not have to be genes, but can also be genomic regions (e.g. coding vs non-coding), and that the method is not limited to p-values, but can be used for any ordinal statistic. Enrichment is then x/y. If desired, significance of the observed enrichment can be assessed using a suitable permutation scheme, like the “genome rotation” method introduced in [81]. In this study, we focused on genes, and assigned each gene a p-value using the strongest SNP (MAF > 5%) association within 15 kb (upstream and downstream). Our a priori set was the 80 epigenetic regulators from [20]. As described in [37], the enrichment can also be used to calculate an upper bound for the False Discovery Rate (FDR) among the a priori loci by assuming that all non a priori associations are false (if this is false, the true FDR will be lower, hence it is an upper bound). The FDR is calculated as 1−(x−y)/x = y/x, and we emphasize again that it only applies to the a priori set of loci.

Heritability

SNP-heritability was estimated using REML implemented in mixmogam (https://github.com/bvilhjal/mixmogam/blob/master/phenotypeData.py). The proportion of phenotypic variation explained by identified genetic variants was calculated using r2 [82] using R scripts published in [83].

Correlation of the allelic effects and molecular phenotypes

For 9329 transposons common across 774 lines, differential mCHG levels induced by alleles were estimated for each transposon as the differential average methylation level between lines carrying the reference (Col-0) and the alternative allele. Differential mCHG levels induced by mutants were calculated analogously [27]. The details are described in [35].

Gene effect of loss-of-function mutants

Gene functions of MIR823A and JMJ26 were tested using transgenic lines described below. We measured DNA methylation levels for two or three independently grown plants, and the significance of differential mean values across RdDM- and CMT2-targeted transposons was tested by Welch’s t-test (two-tailed). DNA methylation pattern of jmj26 was compared to ibm1 and the control samples from published data sets [27].

Permutation tests

We evaluated the effects of mCHG-decreasing alleles using permutation tests with 3000 randomly chosen SNP sets with the same allele frequencies as the identified alleles along the genome. For correlations between the number of mCHG-decreasing alleles and longitude or linkage disequilibrium between the five alleles and NRPE1’ alleles, we compared the observed to the 3000 randomly chosen samples directly. For evaluation of epistatic effects between NRPE1’ and mCHG-decreasing alleles on transposon activity, we randomly chose 3000 SNPs with the same allele frequency as NRPE1’ and tested the effect of zero to five mCHG-decreasing alleles.

Plant materials

Plants were grown at 21°C with a 16-h light/8-h dark cycle and humidity 60%. Whole plant rosettes were harvested individually when they reached the 9-true-leaf stage of development. The t-DNA insertion line of jmj26 (SALKseq_069498.1) and a gamma-ray mutant msi1-2 [53] were purchased from Nottingham Arabidopsis Stock Center (http://arabidopsis.info/) for the functional analysis. Homozygous jmj26 was used for bisulfite sequencing after the second generation. msi1-2 was maintained as a segregating population, and heterozygous lines were selected by genotyping and measuring MIS1 expression by qRT-PCR.

Transgenic lines

MIR823A CRISPR/CAS9 mutant lines

Loss-of-function mutants mir823a for Col-0 and 9771 were generated using CRISPR/CAS9 methods described in [84]. Briefly, to generate a vector carrying two sgRNAs targeting MIR823A, the target sites were incorporated into forward and reverse PCR primers, and the fragment amplified from pCBC-DT1T2 (S6 Table). Subsequently, an amplified and purified fragment was assembled using the Golden Gate reaction into pHSE401E modified to contain a mCherry marker. The resulting vector was transformed into Agrobacterium tumefaciens GV3101/pSOUP, then transformed into A. thaliana by the floral dip method [85]. T1 seeds were screened by mCherry marker under the fluorescence stereomicroscope (Discovery.V8, Zeiss), and the mutations were genotyped by Sanger sequencing. T2 seeds without mCherry signal were kept to amplify the stable T3 generations, which were used for further analyses.

Rescue analyses of JMJ26

The gene function of JMJ26 was confirmed by rescue experiments. The protein-coding genomic sequence including 3 kbp promoter sequence was PCR-amplified and cloned into the pGreen0029 binary vector [86] using In-Fusion Cloning system (Takara Bio Europe, Saint-Germain-en-Laye, France) for Col-0 and DraIV 6–22 (5984) carrying the reference and the alternative allele, respectively. These constructs were used for floral dipping transformation to Col-0 background jmj26 (SALKseq_069498.1) to create transgenic lines [85].

mRNA abundance

Total RNA was extracted using ZR Plant MiniPrep Kit (Zymo Research) including a DNase I treatment and was quantified by fluorometer Qubit 4 (Invitrogen). cDNA was synthesized using the SuperScript III First-Strand Synthesis System (Invitrogen) according to the manufacturer’s protocols. qRT-PCR was performed using the LightCycler 96 system (Roche) with FastStart Essential DNA Green Master (Roche). The transcript abundance was estimated by ddCT methods with an internal control SAND (AT2G28390) [87]. Primer information is listed in S6 Table.

MicroRNA abundance and secondary structure

Material collection and RNA extraction

Siliques were collected five days after pollination when the embryos reached an early torpedo stage according to the previous report [26]. Using the tungsten needles, seeds were dissected from siliques and collected in 1.5 ml tubes, and washed 3–4 times with fresh 10% RNA later solution (Thermo Fisher Scientific). After the last wash, the buffer was replaced with 500 μl TRIzol Reagent (Ambion) and total RNA was isolated as described in [88].

qRT-PCR Analysis

The quantification was conducted according to the protocol in [26]. Briefly. cDNA synthesis was performed using SuperScript III System (Invitrogen), and the corresponding stem-loop primers were added to the reverse transcription reaction. qRT-PCR analysis was performed on the LightCycler 96 Instrument (Roche) with the FastStart Essential DNA Green Master (Roche). The transcript abundance was estimated by ddCT methods with an internal control miRNA160.

RNA secondary structure

RNA secondary structure was predicted by Vienna RNAFold [48] (http://rna.tbi.univie.ac.at/cgi-bin/RNAWebSuite/RNAfold.cgi) based on Minimum of Free Energy (MEF) values [89]. We applied sanger sequences of Col-0 and 9771 carrying the alternative allele for the prediction.

RNA sequencing

RNA was extracted in the same protocol with qRT-PCR for mRNA.Total RNA was treated with the poly(A) RNA Selection Kit (Lexogen). Libraries were prepared according to the manufacturer’s protocol in NEBNext Ultra II Directional RNA Library Prep Kit (New England BioLabs) and individually indexed with NEBNext Multiplex Oligos for Illumina (New England BioLabs). The quantity and quality of each amplified library were analyzed by using Fragment Analyzer (Agilent) and HS NGS Fragment Kit (Agilent).

Bisulfite sequencing

Genomic DNA was extracted using GeneJET Plant Genomic DNA Purification Kit (Thermo Scientific) and sheared by E220 Focused-ultrasonicator (Covaris) to target average fragments around 350 bp. Sequencing libraries were prepared using NebNext Ultra II DNA Library Prep Kit (New England BioLabs) with Methylated Adaptor (New England BioLabs). Adapter-ligated DNA was carried through the bisulfite conversion using EZ-96 DNA Methylation-Gold MagPrep Kit (Zymo Research). Bisulfite-treated samples were amplified by EpiMark Hot Start Taq DNA Polymerase and indexed with NEBNext Multiplex Oligos for Illumina (New England BioLabs). All libraries were sequenced by Illumina NextSeq550 or Hiseq2500.

Linkage disequilibrium between five loci.

(XLSX) Click here for additional data file.

a priori genes identified by enrichment tests (FDR = 0.2).

(XLSX) Click here for additional data file.

The list of genes significantly affected in jmj26.

(XLSX) Click here for additional data file.

Lines carrying three mCHG-decreasing alleles with mCHH-decreasing NRPE1’ allele.

(XLSX) Click here for additional data file.

Key resources.

(XLSX) Click here for additional data file.

Primer information.

(XLSX) Click here for additional data file.

Distribution of p-values for two GWAS models.

QQ plots for univariate models for mCHG levels (A) and conditional models for mCHG|mCHH (B) in RdDM-targeted transposons (left) and CMT2-targeted transposons (right). (PDF) Click here for additional data file.

Distribution of phenotypes and the allelic effects.

The scatter plots illustrate the allelic effects on mCHH (A from [1]) and mCHG|mCHH in RdDM- (B) and CMT2-targeted transposons (C). Marginal phenotypic distributions for the reference and the alternative allele are plotted in blue and yellow, for univariate and conditional distributions. Red vectors show shifts of the mean value from the reference to the alternative allele in 2-dimensional phenotype space. (PDF) Click here for additional data file.

Comparison of the allelic effects on mCHH and mCHG.

Each arrow shows the mean value shift of mCHH and mCHG levels from reference to the alternative allele. Left and right plot correspond to RdDM- and CMT2-targeted transposons, respectively. Gray arrows correspond to the genetic effects of randomly sampled SNPs, keeping the allele frequency of each tested SNP (x 3000 for each allele). Diagonal lines correspond to linear regression lines of mCHH and mCHG. (PDF) Click here for additional data file.

Characterization of MIR823A.

(A) Sanger sequences around MIR823A region for lines carrying reference allele (Col-0) and the alternative allele (763, 9771). (B) MIR823A expression in a CRISPR/CAS9 mir823a mutant (Col-0) by qRT-PCR. Error bar shows standard deviation (n = 3). (C) Predicted secondary structure of MIR823A for reference (6909) and the alternative allele (9711). Secondary structures were plotted for the MIR823A coding region, and mountain plots were corresponding to the expanded regions, including 150 bp upstream and 200 bp downstream. (D) Effects of MIR823A on mCHH and mCG levels in RdDM and CMT2-targeted transposons in mir823a Col-0 and 9771 background. (PDF) Click here for additional data file.

The effects of CMT2 alleles on non-CG methylation.

(A) Genome structure of three CMT2 alleles associated with non-CG methylation variation. The CMT2 region was illustrated by mapped short-read DNA-seq data (IGV browser) for reference line (Col-0), CMT2a’ (10018), CMT2b’ (6969), and CMT2c (10023). Vertical colored lines in the IGV plots indicate SNPs. (B) The allelic effects on genome-wide average mCHH and mCHG levels in CMT2-targeted transposons. Only lines carrying one allele were compared with the reference line. Horizontal gray lines show median values of the reference lines. (PDF) Click here for additional data file.

Genetic variation around MSI1 and ROS3.

(A) Zoom-in Manhattan plots (Fig 3) and the genome structure around Chr5:23553506, 23555910 (top), and 23522001 (bottom) illustrated by mapped short-read DNA-seq data (IGV browser). Vertical colored lines in the IGV plots show SNPs. (B) Conditional GWAS for mCHG in RdDM- and CMT2-targeted transposons. mCHH and Chr5:23555910 were both used as co-factors. Gray vertical lines indicate the Chr5:2355910 position, and horizontal lines show the genome-wide significance (p = 0.05 by Bonferroni correction). r2 was calculated from chr5:23553506 and chr5:23555910 for mCHGRdDM and mCHGCMT2, respectively. (PDF) Click here for additional data file.

Characterization of JMJ26 allele.

Predicted amino acid sequences around jmjC domain of reference (6909) and the alternative allele (DraIV 6–22)(A) and the conserved region (B). Orange bars indicate nonsynonymous mutations associated with mCHG|mCHH in RdDM-targeted TEs. The sequences were predicted based on polymorphism data provided by the 1001 genome project. The domain information follows the TAIR10 annotation. (PDF) Click here for additional data file.

Molecular phenotypes of jmj26.

(A) Characterization of loss-of-function mutants, jmj26. Both #1 and #2 were jmj26 homozygous lines isolated from SALKseq_069498.1 and propagated separately. WT is a segregated line carrying active JMJ26 in the same stock. Morphology of jmj26 and Col-0 (left) and JMJ26 expression in leaves of jmj26 and the wild type (right). Expression was measured by qRT-PCR. (B) Volcano plots show the effects on transposon transcripts highlighted RdDM- and CMT2-targeted transposons. (C) The scatter plot shows the effect of mCHG methylation on transposon transcripts. The gray line shows the linear regression line. (PDF) Click here for additional data file.

Geographical distribution of mCHG-decreasing alleles.

The plots show the origin of lines carrying mCHG-increasing or decreasing alleles. (PDF) Click here for additional data file.

Geographical distribution of cumulative mCHG-decreasing alleles.

(A) Longitudinal frequencies of cumulative mCHG-decreasing alleles (corresponding to five alleles in Fig 6A). (B) The histogram shows how the cumulative mCHG-decreasing allele number is correlated with longitude of the origins. The blue histogram shows permuted Pearson’s correlation coefficients (R) between numbers of mCHG-decreasing alleles and longitude of the origin, maintaining the allele frequencies. The permutation tests were repeated 3000 times for 971 lines ranging from longitude -20° to 100°. The orange vertical line indicates the observed value. (C) Histogram similarly shows permuted Pearson’s correlation coefficients between number of mCHG-decreasing alleles and NRPE1’ genotype. The orange line is the observation. mCHH-increasing and decreasing NRPE1’ alleles are 0 and 1, respectively. (D) The geographic distribution of mCHG|mCHH levels in CMT2-targeted transposons. (PDF) Click here for additional data file.

Function of mCHG-decreasing alleles on transposon activities.

(A) Effects of the cumulative mCHG-decreasing alleles in whole populations (left) and combination with NRPE1’ allele (right). (+) and (-) are lines carrying NRPE1’ reference (mCHH-increasing) and the alternative (mCHH-decreasing) alleles, respectively. (B) Effects of the five major mCHG-decreasing alleles with NRPE1’ reference (left) and the alternative (right) populations. (R) and (A) are reference and the alternative alleles, respectively. (PDF) Click here for additional data file.

The genetic basis of mCHG and mCHH analyzed by MTMM.

(A) Manhattan plots for any, common, specific SNP effects on mCHG and mCHH in RdDM and CMT2-targeted transposons (see Methods). Vertical lines correspond to genome-wide significance (p = 0.05 by Bonferroni correction). (B) Enrichment of a priori genes and FDR for each GWAS result. (PDF) Click here for additional data file.

Conditional GWAS for mCHH|mCHG.

The genetic effects on mCHH in RdDM- and CMT2-targeted transposons were analyzed by the conditional GWAS model with mCHG as cofactor. Vertical lines correspond to genome-wide significance (p = 0.05 by Bonferroni correction). Orange arrows indicate peaks reported in previous studies as affecting mCHH ([1]). Each GWAS result was assessed by enrichment of a priori genes and FDR. (PDF) Click here for additional data file. 22 Jun 2022 Submitted filename: 124661_1_rebuttal_2374828_rcql0d.docx Click here for additional data file. 5 Jul 2022 Dear Dr Nordborg, Thank you very much for submitting your Research Article entitled 'Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes' to PLOS Genetics. The manuscript was fully evaluated at the editorial level. The editors evaluated the manuscript as well as the prior reviews and responses to these reviews. Our opinion was that the revisions had largely addressed the key concerns of the reviewers.  In addition, we felt that this work provided novel insights into approaches for monitoring natural variation in DNA methylation and identified novel natural variants that influence mCHG.  The writing is clear and summarizes a complex set of findings in an accessible fashion. However, there was one aspect of the research that was not well described in the current manuscript. The GWAS identifies natural variant alleles at several genes including prior characterized components of DNA methylation machinery as well as new components such as jmj26. In some cases, such as cmt2, the work suggests that this study identifies novel allelic variants. The current text does a nice job of summarizing evidence to support the candidate genes either through discussions of prior findings or through new experiments using knock-out alleles.  However, there is quite limited discussion of the potential basis for the natural variant alleles. At msi1/ros3, cmt2 and cmt3 is seems that the highly associated SNPs are not located within the coding region but this was not fully clear. The jmj26 associated variants do seem to overlap the gene but it is not clear if any of these result in protein sequence changes.  It would be useful to include a bit more description of the natural variant alleles to describe if there are potential changes in proteins, transcript levels or transcript stability that might explain the underlying basis for the natural variation. We realize that these might be speculative in some cases but at least the potential explanations could be provided. In addition we ask that you: 1) Provide a description of the changes you have made in the manuscript. 2) Upload a Striking Image with a corresponding caption to accompany your manuscript if one is available (either a new image or an existing one from within your manuscript). If this image is judged to be suitable, it may be featured on our website. Images should ideally be high resolution, eye-catching, single panel square images. For examples, please browse our archive. If your image is from someone other than yourself, please ensure that the artist has read and agreed to the terms and conditions of the Creative Commons Attribution License. Note: we cannot publish copyrighted images. We hope to receive your revised manuscript within the next 30 days. If you anticipate any delay in its return, we would ask you to let us know the expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments should be included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below. You can use this link to log into the system when you are ready to submit a revised version, having first consulted our Submission Checklist. While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please be aware that our data availability policy requires that all numerical data underlying graphs or summary statistics are included with the submission, and you will need to provide this upon resubmission if not already present. In addition, we do not permit the inclusion of phrases such as "data not shown" or "unpublished results" in manuscripts. All points should be backed up by data provided with the submission. To enhance the reproducibility of your results, we recommend that you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. Additionally, PLOS ONE offers an option to publish peer-reviewed clinical study protocols. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice. PLOS has incorporated Similarity Check, powered by iThenticate, into its journal-wide submission system in order to screen submitted content for originality before publication. Each PLOS journal undertakes screening on a proportion of submitted articles. You will be contacted if needed following the screening process. To resubmit, you will need to go to the link below and 'Revise Submission' in the 'Submissions Needing Revision' folder. [LINK] Please let us know if you have any questions while making these revisions. Yours sincerely, Nathan M. Springer Associate Editor PLOS Genetics Wendy Bickmore Section Editor: Epigenetics PLOS Genetics 8 Jul 2022 Submitted filename: Cover letter.pdf Click here for additional data file. 16 Jul 2022 Dear Dr Nordborg, We are pleased to inform you that your manuscript entitled "Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes" has been editorially accepted for publication in PLOS Genetics. Congratulations! Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email. Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made. Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org. In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field.  This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date. Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics! Yours sincerely, Nathan M. Springer Associate Editor PLOS Genetics Wendy Bickmore Section Editor: Epigenetics PLOS Genetics www.plosgenetics.org Twitter: @PLOSGenetics ---------------------------------------------------- Comments from the reviewers (if applicable): ---------------------------------------------------- Data Deposition If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website. The following link will take you to the Dryad record for your article, so you won't have to re‐enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-22-00735R1 More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support. Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present. ---------------------------------------------------- Press Queries If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version. PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org. 30 Aug 2022 PGENETICS-D-22-00735R1 Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes Dear Dr Nordborg, We are pleased to inform you that your manuscript entitled "Conditional GWAS of non-CG transposon methylation in Arabidopsis thaliana reveals major polymorphisms in five genes" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course. The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript. Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers. Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work! With kind regards, Andrea Szabo PLOS Genetics On behalf of: The PLOS Genetics Team Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom plosgenetics@plos.org | +44 (0) 1223-442823 plosgenetics.org | Twitter: @PLOSGenetics
  84 in total

1.  THE MEASUREMENT OF SELECTION ON CORRELATED CHARACTERS.

Authors:  Russell Lande; Stevan J Arnold
Journal:  Evolution       Date:  1983-11       Impact factor: 3.694

2.  Spontaneous epigenetic variation in the Arabidopsis thaliana methylome.

Authors:  Claude Becker; Jörg Hagmann; Jonas Müller; Daniel Koenig; Oliver Stegle; Karsten Borgwardt; Detlef Weigel
Journal:  Nature       Date:  2011-09-20       Impact factor: 49.962

3.  Control of genic DNA methylation by a jmjC domain-containing protein in Arabidopsis thaliana.

Authors:  Hidetoshi Saze; Akiko Shiraishi; Asuka Miura; Tetsuji Kakutani
Journal:  Science       Date:  2008-01-25       Impact factor: 47.728

4.  Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines.

Authors:  Susanna Atwell; Yu S Huang; Bjarni J Vilhjálmsson; Glenda Willems; Matthew Horton; Yan Li; Dazhe Meng; Alexander Platt; Aaron M Tarone; Tina T Hu; Rong Jiang; N Wayan Muliyati; Xu Zhang; Muhammad Ali Amer; Ivan Baxter; Benjamin Brachi; Joanne Chory; Caroline Dean; Marilyne Debieu; Juliette de Meaux; Joseph R Ecker; Nathalie Faure; Joel M Kniskern; Jonathan D G Jones; Todd Michael; Adnane Nemri; Fabrice Roux; David E Salt; Chunlao Tang; Marco Todesco; M Brian Traw; Detlef Weigel; Paul Marjoram; Justin O Borevitz; Joy Bergelson; Magnus Nordborg
Journal:  Nature       Date:  2010-03-24       Impact factor: 49.962

5.  Loss of Small-RNA-Directed DNA Methylation in the Plant Cell Cycle Promotes Germline Reprogramming and Somaclonal Variation.

Authors:  Filipe Borges; Mark T A Donoghue; Chantal LeBlanc; Emily E Wear; Milos Tanurdžić; Benjamin Berube; Ashley Brooks; William F Thompson; Linda Hanley-Bowdoin; Robert A Martienssen
Journal:  Curr Biol       Date:  2020-12-03       Impact factor: 10.900

6.  A CRISPR/Cas9 toolkit for multiplex genome editing in plants.

Authors:  Hui-Li Xing; Li Dong; Zhi-Ping Wang; Hai-Yan Zhang; Chun-Yan Han; Bing Liu; Xue-Chen Wang; Qi-Jun Chen
Journal:  BMC Plant Biol       Date:  2014-11-29       Impact factor: 4.215

7.  fastp: an ultra-fast all-in-one FASTQ preprocessor.

Authors:  Shifu Chen; Yanqing Zhou; Yaru Chen; Jia Gu
Journal:  Bioinformatics       Date:  2018-09-01       Impact factor: 6.937

8.  A mixed-model approach for genome-wide association studies of correlated traits in structured populations.

Authors:  Arthur Korte; Bjarni J Vilhjálmsson; Vincent Segura; Alexander Platt; Quan Long; Magnus Nordborg
Journal:  Nat Genet       Date:  2012-08-19       Impact factor: 38.330

9.  Retinoblastoma and its binding partner MSI1 control imprinting in Arabidopsis.

Authors:  Pauline E Jullien; Assaf Mosquna; Mathieu Ingouff; Tadashi Sakata; Nir Ohad; Frédéric Berger
Journal:  PLoS Biol       Date:  2008-08-12       Impact factor: 8.029

10.  Arabidopsis shoot stem cells display dynamic transcription and DNA methylation patterns.

Authors:  Ruben Gutzat; Klaus Rembart; Thomas Nussbaumer; Falko Hofmann; Rahul Pisupati; Gabriele Bradamante; Nina Daubel; Angelika Gaidora; Nicole Lettner; Mattia Donà; Magnus Nordborg; Michael Nodine; Ortrun Mittelsten Scheid
Journal:  EMBO J       Date:  2020-08-20       Impact factor: 14.012

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.