Integrative epigenomic mapping defines four main chromatin states in Arabidopsis.

François Roudier, Ikhlak Ahmed, Caroline Bérard, Alexis Sarazin, Tristan Mary-Huard, Sandra Cortijo, Daniel Bouyer, Erwann Caillieux, Evelyne Duvernois-Berthet, Liza Al-Shikhley, Laurène Giraut, Barbara Després, Stéphanie Drevensek, Frédy Barneche, Sandra Dèrozier, Véronique Brunaud, Sébastien Aubourg, Arp Schnittger, Chris Bowler, Marie-Laure Martin-Magniette, Stéphane Robin, Michel Caboche, Vincent Colot.

Abstract

Post-translational modification of histones and DNA methylation are important components of chromatin-level control of genome activity in eukaryotes. However, principles governing the combinatorial association of chromatin marks along the genome remain poorly understood. Here, we have generated epigenomic maps for eight histone modifications (H3K4me2 and 3, H3K27me1 and 2, H3K36me3, H3K56ac, H4K20me1 and H2Bub) in the model plant Arabidopsis and we have combined these maps with others, produced under identical conditions, for H3K9me2, H3K9me3, H3K27me3 and DNA methylation. Integrative analysis indicates that these 12 chromatin marks, which collectively cover ∼90% of the genome, are present at any given position in a very limited number of combinations. Moreover, we show that the distribution of the 12 marks along the genomic sequence defines four main chromatin states, which preferentially index active genes, repressed genes, silent repeat elements and intergenic regions. Given the compact nature of the Arabidopsis genome, these four indexing states typically translate into short chromatin domains interspersed with each other. This first combinatorial view of the Arabidopsis epigenome points to simple principles of organization as in metazoans and provides a framework for further studies of chromatin-based regulatory mechanisms in plants.

Year: 2011 PMID: 21487388 PMCID: PMC3098477 DOI: 10.1038/emboj.2011.103

Source DB: PubMed Journal: EMBO J ISSN: 0261-4189 Impact factor: 11.598

Introduction

Packaging of DNA into chromatin is pivotal for the regulation of genome activity in eukaryotes. The basic unit of chromatin is the nucleosome, which is composed of 147 bp of DNA wrapped around a protein octamer composed of two molecules each of the core histones H2A, H2B, H3 and H4. Covalent modifications of histones, DNA methylation, incorporation of histone variants, and other factors, such as chromatin-remodelling enzymes or small RNAs, all contribute to defining distinct chromatin states that modulate access to DNA (Berger, 2007; Kouzarides, 2007). In particular, different histone modifications are thought to act sequentially or in combination in order to confer distinct transcriptional outcomes (Strahl and Allis, 2000; Jenuwein and Allis, 2001; Berger, 2007; Lee et al, 2010a). More generally, it is now well established that the precise composition of chromatin along the genome, which defines the epigenome, participates in the selective readout of the genomic sequence. Thanks in part to a compact, almost fully sequenced and well-annotated genome, the flowering plant Arabidopsis thaliana has become a model of choice for exploring the epigenomes of multicellular organisms and the contribution of chromatin to the regulation of genome activity during development or in response to the environment. Indeed, epigenomic profiling in Arabidopsis has begun to provide insights into the relationship between transcriptional activity and localization of chromatin marks or histone variants (Roudier et al, 2009; Feng and Jacobsen, 2011). For instance, H3K4me3 and H3K36me2 are detected at the 5′- and 3′-ends of actively transcribed genes, respectively (Oh et al, 2008; Zhang et al, 2009), while H3K27me3 broadly marks repressed genes (Turck et al, 2007; Zhang et al, 2007; Oh et al, 2008). In contrast, cytosine methylation (5mC) has a dual localization. 
It is present predominantly over silent transposable elements (TEs) and other repeats, where it is associated with H3K9me2 and H3K27me1, but also in the body of ∼30% of genes, many of which are characterized by moderate expression levels (Lippman et al, 2004; Zhang et al, 2006; Zilberman et al, 2006; Turck et al, 2007; Vaughn et al, 2007; Bernatavichute et al, 2008; Cokus et al, 2008; Lister et al, 2008; Jacob et al, 2010). Furthermore, the variant histone H2A.Z, which is preferentially deposited near the 5′-end of genes and promotes transcriptional competence, antagonizes DNA methylation and vice versa (Zilberman et al, 2008). However, extensive combinatorial analyses of these and other chromatin marks have not been performed so far in Arabidopsis and meta-analysis of published data is complicated by the fact that biological materials and methodologies often differ between studies. Here, we report the epigenomic profiles of eight histone modifications (H3K4me2, H3K4me3, H3K27me1, H3K27me2, H3K36me3, H3K56ac, H4K20me1 and H2Bub). Integrative analyses of these and other profiles, previously obtained under identical conditions for DNA methylation, H3K9me2, H3K9me3 and H3K27me3 (Turck et al, 2007; Vaughn et al, 2007), indicate a low combinatorial complexity of chromatin marks in Arabidopsis, as recently reported for metazoans (Wang et al, 2008; Hon et al, 2009; Ernst and Kellis, 2010; Gerstein et al, 2010; Roy et al, 2010; Kharchenko et al, 2011; Liu et al, 2011; Riddle et al, 2011; Zhou et al, 2011). Furthermore, our study identifies four main chromatin states in Arabidopsis, which have distinct indexing functions and which typically form short domains interspersed with each other. This first comprehensive view of the Arabidopsis epigenome suggests simple principles of organization, as recently proposed for Drosophila (Filion et al, 2010), and provides a resource to refine our understanding of the control of genome activity at the level of chromatin.

Results

Epigenomic profiling of 12 chromatin marks

Epigenomic maps were generated for eight histone modifications (H3K4me2, H3K4me3, H3K27me1, H3K27me2, H3K36me3, H3K56ac, H2Bub and H4K20me1) using chromatin extracted from young seedlings and immunoprecipitation followed by hybridization to a tiling microarray that covers the entire chromosome 4 of Arabidopsis at ∼900 bp resolution (Turck et al, 2007). Data previously obtained for 5mC (Vaughn et al, 2007), H3K9me2, H3K9me3 and H3K27me3 (Turck et al, 2007) using similar materials and methodologies were also considered. Epigenomic profiling was additionally performed for seven of these marks (H3K4me2, H3K4me3, H3K27me1, H3K27me3, H3K36me3, H2Bub and 5mC) using a tiling microarray covering the whole-genome sequence at 165 bp resolution. Chromosome 4 and whole-genome maps were also obtained for histone H3 to control for nucleosome occupancy. The 12 marks were chosen because they were shown in previous studies to be associated with distinct transcriptional activities or subnuclear localization in Arabidopsis. In addition, our selection was focussed to a large extent on histone lysine methylation, which exists in three forms (mono-, di- and tri-methylation) and therefore has a versatile indexing potential (Sims and Reinberg, 2008). Collectively, the 12 chromatin marks cover almost all of the regions that are detectably associated with histone H3, which amount to ∼90% of the total genome sequence (data not shown; Chodavarapu et al, 2010). The distribution of each chromatin modification was characterized in detail along chromosome 4. In agreement with previous reports (Lippman et al, 2004; Turck et al, 2007; Zhang et al, 2007, 2009; Oh et al, 2008; Tanurdzic et al, 2008), H3K4me2, H3K4me3, H3K9me3, H3K27me3 and H3K56ac are mostly found in euchromatin (Figure 1A; Supplementary Figure S1; Supplementary Table I), which reflects the fact that these different modifications are associated almost exclusively with genes (Figure 1B). 
H2Bub and H3K36me3, for which no epigenomic maps have been reported to date in plants, are also characterized by a predominant distribution over genes. In contrast, H4K20me1 is found in heterochromatin mainly and associates with TE and other repeat element sequences (Figure 1B), like H3K9me2 (Lippman et al, 2004; Bernatavichute et al, 2008). The present analysis reveals in addition that, like 5mC (Zhang et al, 2006; Zilberman et al, 2006), H3K27me1 and H3K27me2 are dual marks associated not only with TEs but also with a fraction of genes (Supplementary Tables II–IV).

Figure 1

Genomic distribution of chromatin marks. (A) Relative coverage of chromatin marks in the euchromatin and heterochromatin of chromosome 4. Coordinates for heterochromatin are 1.61–2.36 Mb (knob) and 2.78–5.15 Mb (pericentromeric regions). (B) Chromosome-wide distribution of chromatin marks over annotated features. Tiles that overlap annotated genes or transposable elements (TAIR8) by at least 50 bp were assigned to the corresponding annotation and otherwise called ‘intergenic’. (C) Pairwise association analysis of the 12 chromatin marks along chromosome 4. Mean association values were calculated for each pair of modifications over all marked tiles and are shown as a directional heat map organized by hierarchical clustering using Pearson's correlation distances.
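The directional pairwise analysis of panel C can be illustrated with a minimal sketch. It assumes that each tile is encoded as a binary vector of mark calls and that the association of mark A with mark B is the fraction of A-marked tiles that also carry B. The exact association measure is not given in this section, so this definition, the function name `association_matrix` and the toy data are illustrative assumptions only.

```python
import numpy as np

def association_matrix(calls):
    """Directional co-occurrence between marks (assumed definition).

    calls: array of shape (n_tiles, n_marks) holding 0/1 mark calls.
    Entry [i, j] is the fraction of tiles carrying mark i that also carry mark j.
    """
    calls = np.asarray(calls, dtype=float)
    co = calls.T @ calls                 # co[i, j] = number of tiles with both marks
    n_marked = np.diag(co).copy()        # number of tiles carrying each mark
    with np.errstate(divide="ignore", invalid="ignore"):
        assoc = co / n_marked[:, None]   # normalise each row by its own mark count
    return np.nan_to_num(assoc)          # marks never observed give rows of zeros

# Toy data (invented): 5 tiles x 3 marks, columns A, B, C.
calls = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 0],
    [1, 0, 1],
    [0, 0, 1],
])
assoc = association_matrix(calls)
```

In this toy example the matrix is asymmetric: every B-marked tile also carries A, whereas only half of the A-marked tiles carry B, which is why a directional heat map is needed to display such values.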

Each chromatin mark defines domains of contiguous tiles and the number of these domains ranges from 306 for H3K9me2 to 1163 for H3K4me3. For H3K4me3, H3K36me3, H3K56ac, H3K9me3, H2Bub or H3K27me3, domains have similar median length between euchromatin and heterochromatin and mostly coincide with single transcription units (Supplementary Table II; Supplementary Figure S2). By contrast, H3K9me2, H4K20me1, H3K27me1, H3K27me2 and 5mC form small domains in euchromatin but large domains in heterochromatin, as a result of the dense clustering of TE and other repeat sequences in the latter (Supplementary Figure S2; Supplementary Table II).
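How domains of contiguous tiles might be derived from per-tile calls can be sketched as follows. The function `call_domains`, the `max_gap` parameter and the toy coordinates are hypothetical; the actual domain-calling procedure is not described in this section.

```python
import statistics

def call_domains(tiles, max_gap=0):
    """Merge consecutive marked tiles into domains.

    tiles: list of (start, end, marked) tuples sorted by start, in bp.
    max_gap: largest distance in bp tolerated between two consecutive
             marked tiles of the same domain (hypothetical parameter).
    Returns a list of (start, end) domains.
    """
    domains = []
    current = None
    for start, end, marked in tiles:
        if not marked:
            current = None               # an unmarked tile always closes a domain
            continue
        if current is not None and start - current[1] <= max_gap:
            current[1] = end             # extend the open domain
        else:
            current = [start, end]       # open a new domain
            domains.append(current)
    return [tuple(d) for d in domains]

# Toy data (invented): four 900-bp tiles, the third one unmarked.
toy = [(0, 900, True), (900, 1800, True), (1800, 2700, False), (2700, 3600, True)]
domains = call_domains(toy)
median_length = statistics.median(end - start for start, end in domains)
```

Counting the returned domains and taking the median of their lengths gives the kind of per-mark summary statistics reported above for euchromatin and heterochromatin.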

Combinatorial analysis of chromatin marks

As a first step in exploring the combinatorial deposition patterns of chromatin marks, unbiased pairwise association analyses were carried out. A heat map generated from the calculated association values (Supplementary Table V) and organized by hierarchical clustering reveals two clear groups of correlated pairs that distinguish genes from TE sequences (Figure 1C). Next, co-occurrence of marks was registered over each of the ∼20 000 tiles of the chromosome 4 array. Of the 2¹²=4096 combinations theoretically possible, only 665 were observed and among these, only 38 concerned at least 100 tiles (Supplementary Figure S3A). This therefore indicates a limited repertoire of chromatin signatures in Arabidopsis, as in other eukaryotes (Ernst and Kellis, 2010; Kharchenko et al, 2011; Liu et al, 2011). The four prevalent combinations of marks are H3K27me1+5mC+H3K9me2+H4K20me1+H3K27me2, H3K56ac+H2Bub+H3K4me3+H3K4me2+H3K9me3+H3K36me3, H3K27me3+H3K27me2+H3K4me2 and H3K27me3+H3K27me2, which cover 10.9, 6.8, 4.7 and 4.6% of the tiling array, respectively. Whereas the first combination is almost exclusively associated with TE sequences, the other three are mainly present over genes (Supplementary Figure S3B). Furthermore, like H3K27me3+H3K27me2, most of the remaining combinations represented by at least 100 tiles are subcombinations of the three prevalent ones (Supplementary Figure S3B and data not shown). To complement this tile-centric analysis and to identify the prevalent combinatorial patterns of the 12 chromatin marks, unsupervised c-means clustering was performed. The number of clusters (k) was varied from 2 to 11 and k=4 was determined to be optimal in maximizing homogeneity within clusters and heterogeneity between them. The four chromatin states (CS1–CS4) defined by these four clusters are also identified by principal component analysis (PCA; data not shown), thus reinforcing their significance. 
Whereas CS1 regroups ∼90% of the tiles associated with H3K4me3, H3K36me3, H3K9me3 and H2Bub as well as the majority of H3K4me2- and H3K56ac-marked sequences, H3K27me3 and H3K27me2 are the most prevalent modifications in CS2 (Figure 2A). As expected from their composition, CS1 and CS2 are mainly associated with genes (Figure 2B) and have antagonistic indexing functions, being prevalent among active and repressed/lowly expressed genes, respectively (Figure 2C). CS3, which is associated predominantly with TE sequences (Figure 2B), regroups most of the tiles marked by H3K9me2, H4K20me1 and H3K27me1 as well as ∼50% of those marked by H3K27me2 and 5mC (Figure 2A). In contrast to the other three chromatin states, CS4 is not particularly enriched in any chromatin mark (Figure 2A) and is found mainly outside of genes and TE sequences (Figure 2B). Nonetheless, CS4 also marks ∼10% of genes, most of which display low expression (Figure 2C). In keeping with the domain layout of individual marks, CS1–CS4 typically form small domains interspersed with each other, except in cytologically defined heterochromatin, where CS3 forms larger domains as a result of the clustering of TE sequences (Figure 2D; Supplementary Figure S4).
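The tile-centric count of mark combinations described above can be sketched in a few lines. With 12 marks that are each present or absent, there are 4096 possible combinations, including the unmarked one. The helper names `count_combinations` and `prevalent` and the toy tiles are illustrative assumptions; only the list of marks comes from the study.

```python
from collections import Counter

# The 12 marks analysed in the study, in an arbitrary order.
MARKS = ["H3K4me2", "H3K4me3", "H3K9me2", "H3K9me3", "H3K27me1", "H3K27me2",
         "H3K27me3", "H3K36me3", "H3K56ac", "H4K20me1", "H2Bub", "5mC"]

def count_combinations(tile_marks):
    """Count how many tiles carry each exact combination of marks.

    tile_marks: iterable of collections of mark names, one per tile.
    """
    return Counter(frozenset(marks) for marks in tile_marks)

def prevalent(counts, min_tiles):
    """Keep combinations seen on at least min_tiles tiles, most frequent first."""
    kept = [(combo, n) for combo, n in counts.items() if n >= min_tiles]
    return sorted(kept, key=lambda item: -item[1])

# Each mark is either present or absent, hence 2**12 possible combinations.
n_possible = 2 ** len(MARKS)

# Toy data (invented): six tiles carrying three distinct combinations.
toy_tiles = [
    {"H3K27me3", "H3K27me2"},
    {"H3K27me3", "H3K27me2"},
    {"H3K27me3", "H3K27me2"},
    {"H3K27me1", "5mC", "H3K9me2", "H4K20me1", "H3K27me2"},
    {"H3K27me1", "5mC", "H3K9me2", "H4K20me1", "H3K27me2"},
    set(),  # unmarked tile
]
counts = count_combinations(toy_tiles)
top = prevalent(counts, min_tiles=2)
```

On the real array the same count would be run over the ∼20 000 tiles of chromosome 4 with `min_tiles=100`, the threshold used in the text.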

Figure 2

The Arabidopsis epigenome contains four predominant chromatin states. (A) The table on the left indicates the composition of the four predominant chromatin states (CS) identified by c-means clustering. The distribution of the 12 chromatin marks over the four CS is indicated as a heat map for values ranging from 25% (light purple) to 100% (dark purple). The degree of homogeneity of each CS is indicated by the percentage of tiles assigned to it that are associated with each of the 12 chromatin marks (numbers inside cells). Note that no single mark is present over >20% of the tiles assigned to CS4, in contrast to what is observed for CS1–CS3. The percentage of genes indexed by CS1, CS2 and CS4 and the percentage of TE annotations indexed by CS3 are also shown. Pie charts indicate the relative genomic coverage of the four CS and the number of domains that they each form. Grey colour corresponds to tiles that cannot be unambiguously assigned to any of the four CS (see Materials and methods). (B) Relative proportion of genomic features within each CS. Tiles that overlap annotated genes or transposable elements (TAIR8) by at least 50 bp were assigned the corresponding annotation. All other tiles were considered as ‘intergenic'. (C) Relationship between chromatin states and gene expression level. The percentage of tiles associated with a given CS is represented according to expression level. The dashed line represents the distribution of all annotated genes of chromosome 4. Expression data (Schmid et al, 2005) were obtained by averaging appropriate developmental stages. (D) Distribution of the four CS along chromosome 4. For each tile, membership to a given CS is colour coded. K: heterochromatic knob. The non-sequenced part of the centromere (C) is represented by the vertical black line. The high interspersion of chromatin states seen outside of heterochromatin is highlighted in a genome browser view of a 30-kb euchromatic region (positions 0.95–0.98 Mb).
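The clustering step and the grey "unassigned" tiles of panel A can be illustrated with a small NumPy sketch of fuzzy c-means, one common form of c-means clustering. Whether the study used this exact variant, and which threshold defined an unambiguous assignment, are not stated in this section; the fuzzifier `m`, the `min_membership` cut-off and the toy data are therefore assumptions.

```python
import numpy as np

def fuzzy_c_means(X, k, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means. Returns (centers, U), where U[i, j] is the
    membership of sample i in cluster j and each row of U sums to 1."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], k))
    U /= U.sum(axis=1, keepdims=True)              # random initial memberships
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted means
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)             # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)   # standard membership update
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

def assign_states(U, min_membership=0.5):
    """Hard assignment; samples whose best membership is below the
    (hypothetical) threshold are labelled -1, i.e. left unassigned."""
    best = U.argmax(axis=1)
    best[U.max(axis=1) < min_membership] = -1
    return best

# Toy data (invented): two tight groups plus one point lying between them.
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10],
              [5.3, 5.3]])
centers, U = fuzzy_c_means(X, k=2)
states = assign_states(U, min_membership=0.8)
```

In the study the input would instead be the tiles described by their 12 marks, with k varied from 2 to 11 as stated in the Results. The intermediate point in the toy data has comparable membership in both clusters and is left unassigned, which mirrors the tiles shown in grey.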

Chromatin signatures of genes

To investigate further the chromatin indexing of genes, pairwise analysis of chromatin modifications was carried out specifically over genic tiles, which revealed a tight association between H3K4me3 and H3K56ac, between H3K36me3, H3K9me3 and H2Bub and between H3K27me2 and H3K27me3 (Figure 3A). Next, average enrichment levels were calculated within and around genes for all marks except H3K9me2 and H4K20me1, which are almost exclusively associated with TE and other repeat sequences. As shown in Figure 3B, values are highest within the transcribed region for the 10 chromatin modifications considered and are typically lowest upstream or downstream of it. However, distribution patterns vary substantially between marks, as previously established in several instances (Turck et al, 2007; Zhang et al, 2007; Jacob et al, 2010). H3K4me3, H3K56ac, H3K4me2, H3K36me3 and H3K9me3 all peak at the 5′-end of the transcribed region, but the first two marks more sharply than the other three (Figure 3B). In contrast, H2Bub as well as H3K27me1 are highest more centrally, 5mC is most enriched in the 3′-half of the transcribed region and both H3K27me2 and H3K27me3 show an even distribution across transcribed regions. Finally, H3K27me2 differs from all other marks including H3K27me3 in that it remains high in flanking regions, a difference which does not result from the presence of H3K27me2-marked TE sequences adjacent to genes nor from the lower signal-to-noise ratio measured for this mark (see legend of Figure 3A, data not shown). Using the genome-wide profiles obtained for seven chromatin modifications, we could show in addition that contrary to H3K27me3, which preferentially marks small genes as noted before (Luo and Lam, 2010), H2Bub, H3K36me3, 5mC and, to a lesser extent, H3K4me2 as well as H3K4me3 tend to be associated with longer genes (Figure 3C). Unlike these chromatin modifications, H3K27me1 does not exhibit preferential association in relation to gene length (Figure 3C).

Figure 3

Distribution of chromatin marks over genes. (A) Pairwise association analysis of the 12 chromatin marks along chromosome 4. Mean association values were calculated for each pair of modifications over all marked genic tiles and are shown as a directional heat map organized by hierarchical clustering using Pearson's correlation distance. (B) Mean enrichment levels relative to histone H3 are plotted along marked genes (transcribed region, scaled to accommodate different gene lengths, bin size of 1%) as well as up to 1 kb of upstream and downstream sequences (bin size of 10 bp). Maximum value for any given mark is arbitrarily set to 1. Data were obtained using the chromosome 4 tiling array. Note that values for H3K27me2 in upstream and downstream regions are significantly higher than for unmarked genes (>0.9 versus ∼0.6, not shown). (C) Left panels: Enrichment levels relative to histone H3 for marked genes sorted by length. Each line represents a single gene as well as 1 kb of upstream and downstream sequences. Enrichment is indicated as a heat map, with maximal (red) and minimal (green) values set to 1 and 0, respectively. Right panels: Frequency distribution of marked (red line) and all genes (black dashed line) according to their length. Data were obtained using the whole-genome tiling array.
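The length scaling used for panel B can be sketched as follows: each probe position is converted to a fraction of the transcribed region, read 5′ to 3′, and averaged into equal-width bins before the profile is normalised to a maximum of 1. The data layout (dictionaries with `start`, `end`, `strand` and `probes`) and the function name are hypothetical, and the 10-bp bins of the flanking regions are left out for brevity.

```python
import numpy as np

def gene_body_profile(genes, n_bins=100):
    """Average enrichment along length-scaled transcribed regions.

    genes: list of dicts with 'start', 'end' (bp), 'strand' ('+' or '-') and
           'probes', a list of (position, enrichment) pairs inside the gene.
    Returns an array of n_bins values scaled so that the maximum is 1;
    bins that receive no probe are NaN.
    """
    sums = np.zeros(n_bins)
    counts = np.zeros(n_bins)
    for gene in genes:
        length = gene["end"] - gene["start"]
        for pos, value in gene["probes"]:
            frac = (pos - gene["start"]) / length
            if gene["strand"] == "-":
                frac = 1.0 - frac                  # always read 5' to 3'
            b = min(max(int(frac * n_bins), 0), n_bins - 1)
            sums[b] += value
            counts[b] += 1
    mean = np.divide(sums, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    return mean / np.nanmax(mean)                  # maximum arbitrarily set to 1

# Toy data (invented): two genes of different length and orientation,
# both with higher enrichment near the 5' end.
toy_genes = [
    {"start": 0, "end": 1000, "strand": "+", "probes": [(100, 4.0), (900, 1.0)]},
    {"start": 5000, "end": 7000, "strand": "-", "probes": [(6900, 2.0), (5100, 1.0)]},
]
profile = gene_body_profile(toy_genes, n_bins=4)
```

With `n_bins=100` each bin corresponds to 1% of the transcribed region, as in the figure. The toy call uses four bins only to keep the output readable.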

It has been established that H3K4me3 and H3K56ac mark genes that are highly and broadly expressed (Oh et al, 2008; Tanurdzic et al, 2008; Zhang et al, 2009). Conversely, H3K27me3 is preferentially associated with genes that are expressed at low levels or in a tissue-specific manner (Turck et al, 2007; Zhang et al, 2007; Oh et al, 2008; Jacob et al, 2010) and 5mC tends to mark moderately expressed genes (Zilberman et al, 2006; Vaughn et al, 2007). Our analysis confirms these results and indicates in addition that H2Bub, H3K36me3 and H3K9me3 tend to mark highly expressed genes, like H3K4me3 and H3K56ac (Figure 4A). On the other hand, H3K4me2 does not appear to index genes in relation to their expression level and H3K27me1 as well as H3K27me2 tend to be associated with genes that are expressed at low level or in a tissue-specific manner, like H3K27me3 (Figure 4A and B). However, H3K27me1 and H3K27me2/3 mark largely non-overlapping sets of genes with different ontologies (Figure 3A; Supplementary Tables III, IV, VI and VII), which suggests the existence of two distinct gene repression systems associated with methylation of H3K27. For most chromatin marks, average enrichment levels correlate either positively or negatively with expression levels (Figure 4C). Thus, values for H3K4me3, H3K56ac, H3K36me3, H2Bub and H3K9me3 increase gradually with gene expression, at least up to mid expression levels, whereas values for H3K27me1, H3K27me2 and H3K27me3 show an opposite trend. Whether these correlations reflect expression of genes in a variable number of cells, or true differential enrichment in relation to expression level, remains to be determined.

Figure 4

Chromatin indexing in relation to gene expression. (A) Distribution density of marked genes according to expression percentiles. Genes were binned according to their absolute expression values in whole seedlings. The dashed line indicates the distribution of all annotated genes on chromosome 4 across all expression percentiles. Expression data (Schmid et al, 2005) were obtained by averaging appropriate developmental stages. (B) Tissue specificity of marked genes as estimated by Shannon entropy calculation. Low entropy values indicate high tissue specificity. The fraction of marked genes associated with a given entropy value is plotted for each chromatin modification. (C) Relationship between gene expression and enrichment level for each chromatin modification. Maximum enrichment level is set to 1 in each case.
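The entropy measure of panel B can be made concrete with a short sketch. It assumes that the expression values of a gene across tissues are normalised to proportions and that entropy is expressed in bits; the exact normalisation and logarithm base used in the study are not stated here.

```python
import math

def shannon_entropy(expression):
    """Shannon entropy (in bits) of a gene's expression across tissues.

    expression: non-negative expression values, one per tissue.
    Low values indicate expression concentrated in few tissues.
    """
    total = sum(expression)
    if total <= 0:
        raise ValueError("gene has no detectable expression")
    entropy = 0.0
    for value in expression:
        if value > 0:
            p = value / total            # share of total expression in this tissue
            entropy -= p * math.log2(p)
    return entropy

# Toy data (invented): a broadly expressed gene and a tissue-specific one.
broad = shannon_entropy([1, 1, 1, 1, 1, 1, 1, 1])
specific = shannon_entropy([5, 0, 0, 0, 0, 0, 0, 0])
```

A gene expressed equally in eight tissues reaches the maximum of 3 bits, whereas a gene expressed in a single tissue has an entropy of 0, which is the sense in which low entropy indicates high tissue specificity.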

Collectively, our findings indicate that H3K4me3 and H3K27me3 are diagnostic of two antagonist chromatin states that are associated with most active and repressed genes, respectively. However, ∼13% (3433 out of 27 294) of genes marked by H3K4me3 or H3K27me3 in whole seedlings present both marks, in agreement with previous observations (Oh et al, 2008; Zhang et al, 2009). To explore this further, H3K4me3 and H3K27me3 were mapped genome-wide using chromatin extracted from roots and profiles were compared with those obtained for whole seedlings (this study) or aerial parts only (Oh et al, 2008). Out of the 3433 genes with both marks in whole seedlings, 284 genes (8.3%) are only marked by H3K4me3 in roots and by H3K27me3 in aerial parts or vice versa (Figure 5A; Supplementary Table VIII). Correspondingly, a majority of these genes show differential expression between roots and aerial parts (Figure 5B), which is in contrast to genes with persistent co-marking in both plant parts (Figure 5A and C). Thus, it can be concluded that co-marking in whole seedlings results for a number of genes from the mixing of cells with opposite chromatin indexing in the two plant parts. By extension, it is likely that persistent co-marking in one or the other plant parts (Figure 5A) reflects similar mixing of cells with distinct epigenomes, but this time within organs. Co-marking could nevertheless correspond to bona fide bivalent marking in some cases, as originally reported in mammals for key regulatory genes poised for activation (Wang et al, 2009) and as also described in Arabidopsis for a small number of genes encoding transcription factors (Jiang et al, 2008; Berr et al, 2010). In this respect, it is noteworthy that ontology analysis of the 224 genes with persistent co-marking in both roots and aerial parts (Figure 5A) indicates significant enrichment for terms associated with regulation of transcription (data not shown).

Figure 5

Analysis of genes co-marked with H3K27me3 and H3K4me3 in whole seedlings. (A) The 3433 genes co-marked in whole seedlings were split into different classes according to their marking in roots (R; this study) and aerial parts (AP; Oh et al, 2008). ‘Others’ indicates genes with other marking patterns in the two plant parts. This class, which is not expected based on the co-marking observed in whole seedlings, can be explained in part by the fact that the different data sets were not all generated using the same conditions and methodologies. (B) Expression analysis in roots and shoots (Schmid et al, 2005) for the 284 genes showing opposite marking in roots and aerial parts. Brown dots indicate genes with H3K4me3 in roots and H3K27me3 in aerial parts, green dots indicate genes with the opposite marking pattern. (C) Expression analysis in roots and shoots (Schmid et al, 2005) for the 224 genes showing persistent co-marking in roots and aerial parts.
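The partition of co-marked genes shown in panel A can be sketched as a simple classification of each gene by the marks detected in roots and in aerial parts. The class labels and the function `classify_comarked` are illustrative; only the gene counts used in the final line are taken from the text.

```python
PAIR = frozenset({"H3K4me3", "H3K27me3"})

def classify_comarked(root_marks, aerial_marks):
    """Classify a gene co-marked in whole seedlings by its marking
    in roots and in aerial parts (labels are illustrative)."""
    r = frozenset(root_marks) & PAIR
    a = frozenset(aerial_marks) & PAIR
    if r == PAIR and a == PAIR:
        return "co-marked in both parts"
    if len(r) == 1 and len(a) == 1 and r != a:
        return "opposite marking"            # one mark in roots, the other in aerial parts
    if r == PAIR or a == PAIR:
        return "co-marked in one part"
    return "others"

# Toy calls (invented) for four hypothetical genes.
examples = {
    "geneA": classify_comarked({"H3K4me3"}, {"H3K27me3"}),
    "geneB": classify_comarked({"H3K4me3", "H3K27me3"}, {"H3K4me3", "H3K27me3"}),
    "geneC": classify_comarked({"H3K4me3", "H3K27me3"}, {"H3K27me3"}),
    "geneD": classify_comarked({"H3K4me3"}, {"H3K4me3"}),
}

# Share of the 3433 co-marked genes that show opposite marking (284 genes).
pct_opposite = round(100 * 284 / 3433, 1)
```

The last line reproduces the 8.3% quoted in the Results for genes marked by H3K4me3 in one plant part and by H3K27me3 in the other.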

Discussion

A small number of prevalent chromatin states index the Arabidopsis genome

Using an integrative analysis of the distribution of 12 chromatin marks, we show that the Arabidopsis epigenome is organized around four predominant chromatin states with distinct biochemical, transcriptional and sequence properties. This representation refines the classical segmentation between cytologically defined heterochromatin and euchromatin. A first chromatin state (CS1) corresponds to transcriptionally active genes and is typically enriched in the trimethylated forms of H3K4 and H3K36. Two further states correspond to two distinct types of repressive chromatin. H3K27me3-marked repressive chromatin (CS2) is mainly associated with genes under PRC2-mediated repression (Turck et al, 2007; Zhang et al, 2007), while H3K9me2- and H4K20me1-marked repressive chromatin (CS3) corresponds to classical heterochromatin and is almost exclusively located over silent TEs (Lippman et al, 2004; Bernatavichute et al, 2008). A fourth chromatin state (CS4) is characterized by the absence of any prevalent mark and is associated with weakly expressed genes and intergenic regions. This rather simple organization of Arabidopsis chromatin into four main states shows similarities with that recently reported for Drosophila cells. Indeed, based on the integration of epigenomic maps obtained for 53 chromatin proteins, it was concluded that the Drosophila epigenome is organized into a mosaic of five principal chromatin types that display distinct functional properties (Filion et al, 2010). Specifically, Arabidopsis CS2 and CS3 are similar to Drosophila ‘BLUE' and ‘GREEN' chromatin types, which correspond to repressive chromatin associated with the Polycomb pathway and classical heterochromatin, respectively. Furthermore, CS4, which has no prevalent chromatin mark and indexes some weakly expressed genes as well as intergenic regions is reminiscent of Drosophila ‘BLACK' chromatin, which is relatively gene poor and constitutes a repressive environment distinct from heterochromatin. 
In contrast, transcriptionally active chromatin is represented by a single chromatin state in Arabidopsis (CS1) but by two distinct types in Drosophila that differ in several ways, including the enrichment of H3K36me3 in ‘YELLOW' but not in ‘RED' chromatin. Other large-scale epigenomic studies have been performed in yeast (Liu et al, 2005), C. elegans (Gerstein et al, 2010; Liu et al, 2011), Drosophila (Kharchenko et al, 2011; Roy et al, 2010; Riddle et al, 2011) and human cells (Wang et al, 2008; Hon et al, 2009; Ernst and Kellis, 2010; Zhou et al, 2011), which all indicate a relatively low combinatorial complexity of chromatin marks. Furthermore, the two main repressive chromatin states defined in Arabidopsis (CS2 and CS3) have similar counterparts in metazoans, indicating that they are highly conserved between plants and animals. On the other hand, the single predominant chromatin state (CS1) that we have identified for transcriptionally active genes in Arabidopsis has no obvious equivalent in these other organisms. Instead, several chromatin states have been associated with expressed genes in other organisms. This discrepancy likely results from the smaller size of genes and intergenic regions in Arabidopsis (∼2 kb each on average), as well as the relatively lower resolution of our data. Indeed, our analysis shows that distribution patterns vary substantially between chromatin marks associated with active genes (Figure 3B), which suggests that CS1 could be further refined into at least two additional chromatin signatures, specific to the promoter and transcribed region of these genes. Although the number of chromatin states identified via this type of integrative approach may appear surprisingly low, such analyses aim to identify prevalent combinations of chromatin marks or chromatin proteins. 
Furthermore, the heterogeneity of the biological material used in many of these studies, including ours, likely hampered the detection of certain chromatin states such as those that are specific to rare cell types. Ultimately, only a knowledge of the epigenomes of individual cell types will enable a full understanding of the functional impact of chromatin-level regulation on genome activity.

Chromatin indexing of genes in Arabidopsis

Our work indicates that the Arabidopsis epigenome is mainly organized at the level of single transcription units and that the distribution of chromatin marks along genes is linked to the transcription process (Figures 2 and 3). For example, H3K4me3 peaks around the transcription start site of actively expressed genes, as observed in all other eukaryotes examined to date (Rando and Chang, 2009). Similarly, H3K56ac is specifically located at gene promoters and shows preferential marking of active genes, suggesting that, as in yeast, it could facilitate rapid transcriptional activation (Williams et al, 2008). In contrast to H3K4me3, H3K4me2 shows no particular association with highly expressed genes or with specific parts of genes. Rather than being a constitutive mark of transcription, H3K4me2 may be implicated in fine tuning of tissue-specific expression, as recently reported in mammals (Pekowska et al, 2010). The distribution of H3K36me3, H3K9me3 and H2Bub over the transcribed regions of expressed genes suggests that these modifications are linked with transcriptional elongation. In the case of H2Bub, this is in agreement with the distribution reported in mammals and yeast (Minsky et al, 2008; Schulze et al, 2009). For H3K9me3, enrichment over the coding region of expressed genes in Arabidopsis (this study; Caro et al, 2007; Turck et al, 2007; Charron et al, 2009) contrasts with the enrichment predominantly over heterochromatin in animals. However, association with the transcribed regions of some active genes has been reported in mammals (Vakoc et al, 2005, 2006; Squazzo et al, 2006). Whether H3K9me3 could serve different outcomes depending on genomic and/or chromatin context and whether it has any role in transcription regulation in plants remain to be determined. 
Given the discrepancy between the low amounts of H3K9me3 reported in bulk histones (Jackson et al, 2004; Johnson et al, 2004) and its apparent abundance reported by ChIP-chip, it is also possible that the H3K9me3 antibody we used recognizes another modification in Arabidopsis, which would be H3K36me3 based on our epigenomic analysis. However, in vitro competition assays using an H3K36me3 peptide suggest that this is unlikely (Supplementary Figure S5). H3K36me3 preferentially marks exons of transcribed genes in yeast, C. elegans and mammals (Kolasinska-Zwierz et al, 2009) and it was shown to be involved in the control of alternative splicing in mammals (Luco et al, 2010). In Arabidopsis, however, H3K36me3 peaks in the first half of the coding region, which is in contrast to the 3′-end enrichment reported in other organisms (Wang et al, 2009). This preferential enrichment at the 5′-end, which is not dependent on gene length, could indicate that the principles governing H3K36me3 deposition differ between plants and other eukaryotes. In fact, H3K36me3 distribution in Arabidopsis resembles that of H3K79me3 in mammals (Wang et al, 2009). As Arabidopsis lacks a clear homologue of the H3K79 methyltransferase Dot1 and has no H3K79me3 (Zhang et al, 2007), it is possible that H3K36me3 in plants serves a function equivalent to H3K79me3 in other eukaryotes. Furthermore, H3K36me2 could have a role similar to that attributed to H3K36me3 in other eukaryotes, as it peaks at the 3′-end of expressed genes in Arabidopsis (Oh et al, 2008). Chromatin marks associated with transcription have been proposed to cross talk and serve as checkpoints in budding yeast and mammals (Suganuma and Workman, 2008; Weake and Workman, 2008; Lee et al, 2010a). 
A similar scenario could be envisioned in Arabidopsis based on the chromatin marks that predominate in CS1, whereby the RNA polymerase II-associated factor 1 complex would induce mono-ubiquitylation of H2B via the activity of the Rad6-Bre1 ubiquitin ligase homologues UBC1, 2 and 3 as well as HUB1 and 2, as shown at the FLC gene (Cao et al, 2008; Gu et al, 2009; Schmitz et al, 2009). H2Bub deposition would in turn help recruit COMPASS (COMplex Proteins ASsociated with Set1), thus mediating deposition of H3K4me3 and potentially H3K36me3 (in place of H3K79me3) as well as H3K36me2. Similarly to other eukaryotes, initiation of another round of transcription would require the activity of the Ubp8 ubiquitin protease homologue, UBP26, which catalyses H2B deubiquitylation (Sridhar et al, 2007). Consistent with this, H3K36me3 but not H3K36me2 nor H3K4me3 is almost lost at the 5′-end of the gene FLC in ubp26 mutant plants and this loss is associated with a reduction of FLC expression (Schmitz et al, 2009). The steady-state distribution pattern of H2Bub observed over expressed genes presumably results from targeted deubiquitylation of H2B at the 5′-end and probably 3′-end of the transcribed region, rather than from an increased ubiquitylation of H2B towards the middle of the gene. Our epigenomic profiling of the three forms of H3K27 indicates that methylation of this lysine residue is generally associated with repressive chromatin and that its indexing function depends on the degree of modification (mono-, di- and tri-methylation). Thus, in agreement with previous studies, H3K27me3, which is the hallmark of CS2, is almost exclusively present over transcriptionally repressed genes (Turck et al, 2007; Zhang et al, 2007), while H3K27me1 is prevalent over silent TEs in pericentromeric regions, where it is thought to prevent over-replication (Jacob et al, 2009, 2010). 
Our analysis reveals in addition that H3K27me2 is enriched over H3K27me3-marked genes, as well as over TE sequences. Although immunolocalization of H3K27me2 at chromocenters (Fuchs et al, 2006) was proposed to result from cross-reactivity of antibodies with H3K27me1 in Arabidopsis (Jacob et al, 2009), we did not observe extensive cross-reactivity of the H3K27me2 antibodies used in our study with H3K27me1 (Supplementary Figure S5). Moreover, while all forms of methylated H3K27 can be found over genes and are associated with transcriptional repression, little overlap is observed between the small group of genes marked by H3K27me1 and the much larger set of genes marked by H3K27me2/3, suggesting that these modifications define two repressive pathways with distinct gene targets (Supplementary Tables VI and VII). Whereas H3K27me3 deposition is catalysed by the evolutionarily conserved Polycomb Repressive Complexes 2 (Kohler and Hennig, 2010; Bouyer et al, 2011), H3K27me1 deposition over TE sequences is partly dependent on the activity of the two SET-domain proteins ATXR5 and ATXR6 (Jacob et al, 2009). Whether H3K27me1 deposition over genes requires the same or different histone methyltransferases and whether it is associated with the control of DNA replication remain to be determined. Irrespective of the mechanisms involved, it is noteworthy that whereas H3K27me1-marked TE sequences are also co-marked with H3K9me2 and 5mC, this is not the case for H3K27me1-marked genes. Acetylation of H3K56 is another chromatin mark that has been linked with the replication process. In Arabidopsis cell cultures, early replicating sequences form broad domains of H3K56ac (Lee et al, 2010b). Our epigenomic profiling of H3K56ac reveals mostly short domains located at the 5′-end of expressed genes, which correspond to the replication-independent incorporation of acetylated H3K56. However, a few large domains (∼20 kb) are also detected, which span several genes, intergenic regions and TEs. 
As our epigenomic maps have been derived from whole seedlings that comprise only a small proportion of mitotic cells, these large H3K56ac domains might correspond to sequences frequently used as endoreplication origins. Although most Arabidopsis genes are associated with chromatin states CS1 or CS2, ∼10% are instead associated with CS4, which is characterized by the absence of any prevalent chromatin mark among the 12 that were analysed in this work (Figure 3). Analysis of additional chromatin marks and proteins will be required to determine more precisely the nature of CS4 and notably the extent of its similarity to the repressive chromatin type BLACK of Drosophila (Filion et al, 2010). To conclude, the first integrative view of the Arabidopsis epigenome provided here could be compared with a first sketch, which is progressively refined until a complete blueprint is produced. Importantly, key aspects of the Arabidopsis epigenome are already apparent in this first sketch, like the relative simplicity of designing principles, which appears to be shared with metazoans.

Materials and methods

Immunoprecipitation of chromatin and methylated DNA, labelling and microarray hybridization

All experiments were performed using wild-type Arabidopsis thaliana accession Columbia seedlings grown for 10 days either in liquid MS (whole seedlings) or on MS agar plates (roots and aerial parts) supplemented with 1% sucrose under long day conditions. ChIP and Me-DIP assays were carried out essentially as described (Lippman et al, 2005) using commercially available antibodies (Supplementary Table IX; Supplementary Figure S5). Specificity of the H3K27me2 and H3K9me3 antibodies was tested by peptide competition and western blotting analysis on nuclear extracts (Supplementary Figure S5) as described in Bouyer et al (2011) using H3K27me3, H3K27me2, H3K27me1, H3K9me3 and H3K36me3 peptides (Millipore, 12-565, 12-566, 12-567, 12-568 and Diagenode sp-058-050, respectively). Immunoprecipitated DNA (IP) and input DNA (INPUT) were amplified, differentially labelled and co-hybridized in dye-swap experiments as described (Lippman et al, 2004; Turck et al, 2007) for the chromosome 4 tiling array or according to the manufacturer's instructions for the Roche NimbleGen whole-genome tiling array. Two biological replicates were analysed (two dye-swaps). The chromosome 4 array contains 21 800 printed features, on average ∼900 bp in size. The heterochromatic knob on the short arm and several megabases of pericentromeric heterochromatin are included and account for 16% of the 18.6 Mb covered by the array. Details of array design and production are described in Vaughn et al (2007). This platform has been deposited to GEO under accession number GPL10172. The whole-genome tiling array consists of 50–75 nt tiles, with 110 nt spacing on average, that are tiled across the entire genome sequence (TAIR7), without repeat masking. Tiles have a melting temperature of 74°C on average and 88% of them match a unique position in the genome. 
This custom design was either split into two arrays of 360 718 tiles each, with every other tile on each array (GEO accessions GPL10911 and GPL10918) or synthesized in triplicates of 711 320 tiles each on a single array (GEO accession GPL11005).

ChIP- and Me-DIP-chip data analysis

Hybridization data were normalized as described previously for the chromosome 4 array (Turck et al, 2007) or, for data obtained using the whole-genome array, by applying an ANOVA model to remove technical biases. Data were averaged on the dye-swap to remove tile-specific dye bias. Normalized data were analysed using the ChIPmix method (Martin-Magniette et al, 2008), which was adapted to handle multiple biological replicates simultaneously. This method is based on a mixture model of regressions, the parameters of which are estimated using the EM algorithm. For each tile, a posterior probability is defined as the probability to be enriched given the log(Input) and log(IP) intensities, and is used to assign each tile to a normal or enriched class. A false-positive risk is determined by defining the probability of obtaining a posterior probability at least as extreme as the one that is actually observed when the tile is normal. False-positive risks are then adjusted by the Benjamini–Hochberg procedure and tiles for which the adjusted false-positive risk is <0.01 are declared enriched. Previously published data (Turck et al, 2007; Vaughn et al, 2007) were re-analysed using the same procedure. Neighbouring enriched tiles are joined into domains by requiring a minimal run of 1.6 kb or 400 bp and allowing a maximal gap of 800 or 200 bp for data obtained using the chromosome 4 or whole-genome arrays, respectively. Thus, 'singletons' are not considered for further analyses.
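The last two steps of this procedure (Benjamini–Hochberg adjustment, then joining enriched tiles into domains) can be illustrated with a minimal Python sketch. It is not the ChIPmix implementation: the function names `benjamini_hochberg` and `join_tiles` are hypothetical, tiles are assumed to be `(start, end)` pairs sorted by position on a single chromosome, and the 'minimal run' is interpreted here as the total span of the merged domain, which is an assumption.

```python
from typing import List, Tuple


def benjamini_hochberg(risks: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted values, in the input order."""
    m = len(risks)
    order = sorted(range(m), key=lambda i: risks[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, risks[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def join_tiles(
    tiles: List[Tuple[int, int]], min_run: int, max_gap: int
) -> List[Tuple[int, int]]:
    """Merge enriched tiles separated by at most max_gap bp, then keep
    only domains spanning at least min_run bp (assumed interpretation)."""
    domains: List[List[int]] = []
    for start, end in tiles:
        if domains and start - domains[-1][1] <= max_gap:
            domains[-1][1] = max(domains[-1][1], end)  # extend current domain
        else:
            domains.append([start, end])  # open a new domain
    return [(s, e) for s, e in domains if e - s >= min_run]


# Illustrative values only: four false-positive risks and six enriched tiles.
adjusted = benjamini_hochberg([0.001, 0.2, 0.004, 0.03])
enriched_flags = [a < 0.01 for a in adjusted]

# Whole-genome array settings from the text: 400 bp minimal run, 200 bp gap.
tiles = [(0, 60), (110, 170), (220, 280), (330, 390), (440, 500), (2000, 2060)]
domains = join_tiles(tiles, min_run=400, max_gap=200)
```

In this toy example the isolated tile at position 2000 forms a 60 bp domain on its own and is discarded, which mirrors the exclusion of singletons described above.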

Computational analyses

General bioinformatics methods including positional, quantitative and class-based computations were conducted in Excel and using ad hoc scripts written in R, PERL or Python. Genes and transposable elements were annotated based on TAIR8 and other sequences are assumed to be intergenic. Gene Ontology analyses were done using GOrilla (Eden et al, 2009) with an additional correction for multiple testing of the P-values. Pairwise association analysis, which is directional unlike correlation analysis, was performed by scoring the frequency of co-occurrence of pairs of chromatin modifications among the 12 marks analysed on the chromosome 4 tiling array. Whole-seedling transcriptome data were retrieved from Schmid et al (2005) and genes were binned into 20 expression percentiles according to their absolute expression values. Within each expression percentile, the number of genes marked by a given chromatin modification was calculated and represented as a percentage of all the genes marked by this modification. Shannon entropy for each set of marked genes was calculated as described (Zhang et al, 2006) using publicly available developmental expression series (Schmid et al, 2005), after filtering genes that showed no expression in any conditions. Fuzzy c-means clustering using the R MCLUST package was performed to classify tiles into principal chromatin states based on the 12 epigenomic maps. Fuzzy c-means clustering computes membership values for each tile towards all the clusters, and all the membership values of a tile add up to 1. Each tile was assigned to one cluster only, based on a membership value equal to or higher than 0.5. To identify the optimal number of clusters (k), a cluster validity value, which is an estimate of homogeneity within the clusters and heterogeneity between them, was calculated for k=2–11.
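The membership-based assignment rule can be sketched in Python as follows. This is only an illustration of the standard fuzzy c-means membership formula, not the R code used in this work: the cluster centres, the fuzzifier value `m = 2` and the function names `memberships` and `assign_state` are assumptions, and real tiles would be described by 12 values (one per chromatin mark) rather than the 2 used here.

```python
import math
from typing import List, Optional, Sequence


def memberships(
    point: Sequence[float], centers: Sequence[Sequence[float]], m: float = 2.0
) -> List[float]:
    """Standard fuzzy c-means memberships of one point towards each centre.

    The returned values sum to 1. The fuzzifier m is an assumed setting.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centres belongs fully to them.
    if 0.0 in dists:
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((d_j / d_k) ** power for d_k in dists) for d_j in dists]


def assign_state(u: Sequence[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the cluster whose membership reaches the
    threshold, or None when no cluster does (tile left unassigned)."""
    best = max(range(len(u)), key=lambda j: u[j])
    return best if u[best] >= threshold else None


# Toy example: a tile close to the first of two hypothetical centres.
u_clear = memberships((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
state_clear = assign_state(u_clear)

# A tile equidistant from three hypothetical centres stays unassigned.
u_ambiguous = memberships((0.0, 0.0), [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0)])
state_ambiguous = assign_state(u_ambiguous)
```

Because memberships sum to 1, at most one cluster can exceed 0.5, so the threshold yields a single assignment per tile, apart from an exact 0.5/0.5 tie, which this sketch resolves in favour of the first cluster.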

Data availability

Raw and processed data have been deposited to NCBI's Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) under the super-series accession GSE24710 and to CATdb (http://urgv.evry.inra.fr/CATdb) (Samson et al, 2004; Gagnot et al, 2008). In addition, array data and genome annotation are displayed using a Generic Genome Browser, available for visualization at http://epigara.biologie.ens.fr/index.html.

69 in total

1. Plasticity in patterns of histone modifications and chromosomal proteins in Drosophila heterochromatin.

Authors: Nicole C Riddle; Aki Minoda; Peter V Kharchenko; Artyom A Alekseyenko; Yuri B Schwartz; Michael Y Tolstorukov; Andrey A Gorchakov; Jacob D Jaffe; Cameron Kennedy; Daniela Linder-Basso; Sally E Peach; Gregory Shanower; Haiyan Zheng; Mitzi I Kuroda; Vincenzo Pirrotta; Peter J Park; Sarah C R Elgin; Gary H Karpen
Journal: Genome Res Date: 2010-12-22 Impact factor: 9.043

2. Broad chromosomal domains of histone modification patterns in C. elegans.

Authors: Tao Liu; Andreas Rechtsteiner; Thea A Egelhofer; Anne Vielle; Isabel Latorre; Ming-Sin Cheung; Sevinc Ercan; Kohta Ikegami; Morten Jensen; Paulina Kolasinska-Zwierz; Heidi Rosenbaum; Hyunjin Shin; Scott Taing; Teruaki Takasaki; A Leonardo Iniguez; Arshad Desai; Abby F Dernburg; Hiroshi Kimura; Jason D Lieb; Julie Ahringer; Susan Strome; X Shirley Liu
Journal: Genome Res Date: 2010-12-22 Impact factor: 9.043

3. ChIPmix: mixture model of regressions for two-color ChIP-chip analysis.

Authors: Marie-Laure Martin-Magniette; Tristan Mary-Huard; Caroline Bérard; Stéphane Robin
Journal: Bioinformatics Date: 2008-08-15 Impact factor: 6.937

4. Acetylation in the globular core of histone H3 on lysine-56 promotes chromatin disassembly during transcriptional activation.

Authors: Stephanie K Williams; David Truong; Jessica K Tyler
Journal: Proc Natl Acad Sci U S A Date: 2008-06-24 Impact factor: 11.205

5. Combinatorial patterns of histone acetylations and methylations in the human genome.

Authors: Zhibin Wang; Chongzhi Zang; Jeffrey A Rosenfeld; Dustin E Schones; Artem Barski; Suresh Cuddapah; Kairong Cui; Tae-Young Roh; Weiqun Peng; Michael Q Zhang; Keji Zhao
Journal: Nat Genet Date: 2008-06-15 Impact factor: 38.330

Review 6. Epigenetic modifications in plants: an evolutionary perspective.

Authors: Suhua Feng; Steven E Jacobsen
Journal: Curr Opin Plant Biol Date: 2011-01-11 Impact factor: 7.834

7. Monoubiquitinated H2B is associated with the transcribed region of highly expressed genes in human cells.

Authors: Neri Minsky; Efrat Shema; Yair Field; Meromit Schuster; Eran Segal; Moshe Oren
Journal: Nat Cell Biol Date: 2008-03-16 Impact factor: 28.824

8. Highly integrated single-base resolution maps of the epigenome in Arabidopsis.

Authors: Ryan Lister; Ronan C O'Malley; Julian Tonti-Filippini; Brian D Gregory; Charles C Berry; A Harvey Millar; Joseph R Ecker
Journal: Cell Date: 2008-05-02 Impact factor: 41.582

9. Polycomb repressive complex 2 controls the embryo-to-seedling phase transition.

Authors: Daniel Bouyer; Francois Roudier; Maren Heese; Ellen D Andersen; Delphine Gey; Moritz K Nowack; Justin Goodrich; Jean-Pierre Renou; Paul E Grini; Vincent Colot; Arp Schnittger
Journal: PLoS Genet Date: 2011-03-10 Impact factor: 5.917

10. Comprehensive analysis of the chromatin landscape in Drosophila melanogaster.

Authors: Peter V Kharchenko; Artyom A Alekseyenko; Yuri B Schwartz; Aki Minoda; Nicole C Riddle; Jason Ernst; Peter J Sabo; Erica Larschan; Andrey A Gorchakov; Tingting Gu; Daniela Linder-Basso; Annette Plachetka; Gregory Shanower; Michael Y Tolstorukov; Lovelace J Luquette; Ruibin Xi; Youngsook L Jung; Richard W Park; Eric P Bishop; Theresa K Canfield; Richard Sandstrom; Robert E Thurman; David M MacAlpine; John A Stamatoyannopoulos; Manolis Kellis; Sarah C R Elgin; Mitzi I Kuroda; Vincenzo Pirrotta; Gary H Karpen; Peter J Park
Journal: Nature Date: 2010-12-22 Impact factor: 49.962
