Alexandra Essebier1, Patricia Vera Wolf1, Minh Duc Cao1, Bernard J Carroll1, Sureshkumar Balasubramanian2, Mikael Bodén1.
Abstract
More than 30 human genetic diseases are linked to tri-nucleotide repeat expansions. There is no known mechanism that explains repeat expansions in full, but changes in the epigenetic state of the associated locus have been implicated in the disease pathology for a growing number of examples. A comprehensive comparative analysis of the genomic features associated with diverse repeat expansions has been lacking. Here, in an effort to decipher the propensity of repeats to undergo expansion and result in a disease state, we determine the genomic coordinates of tri-nucleotide repeat tracts at base pair resolution and computationally establish epigenetic profiles around them. Using three complementary statistical tests, we reveal that several epigenetic states are enriched around repeats that are associated with disease, even in cells that do not harbor expansion, relative to a carefully stratified background. Analysis of over one hundred cell types reveals that epigenetic states generally tend to vary widely between genic regions and cell types. However, there is qualified consistency in the epigenetic signatures of repeats associated with disease, suggesting that changes to the chromatin and the DNA around an expanding repeat locus are likely to be similar. These epigenetic signatures may be exploited further to develop models that could explain the propensity of repeats to undergo expansions.
Keywords: DNA methylation; bioinformatics; epigenetics; genome sequence; histone modification; short tandem repeat
Year: 2016 PMID: 27013954 PMCID: PMC4782033 DOI: 10.3389/fnins.2016.00092
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 4.677
Disease-associated tri-nucleotide repeats examined in this study.
| Disease | Disease ID | Gene | Location (hg19) | Repeat |
| Spinobulbar muscular atrophy | SBMA | AR | X:66765160 | CAG |
| Huntington's disease | HD | HTT | 4:3076604 | CAG |
| Dentatorubral-pallidoluysian atrophy | DRPLA | ATN1 | 12:7045880 | CAG |
| Spinocerebellar ataxia type 1 | SCA1 | ATXN1 | 6:16327213 | CAG |
| Spinocerebellar ataxia type 2 | SCA2 | ATXN2 | 12:112037083 | CAG |
| Spinocerebellar ataxia type 3 | SCA3 | ATXN3 | 14:92537280 | CAG |
| Spinocerebellar ataxia type 6 | SCA6 | CACNA1A | 19:13318283 | CAG |
| Spinocerebellar ataxia type 7 | SCA7 | ATXN7 | 3:63898362 | CAG |
| Spinocerebellar ataxia type 17 | SCA17 | TBP | 6:170870996 | CAG |
| Potassium channel gene | KCNN3 | KCNN3 | 1:154841700 | CAG |
| Amplified in breast cancer 1 | AIB1 | NCOA3 | 20:46279816 | CAG |
| Synpolydactyly type 1 | SPD1 | HOXD13 | 2:176957782 | GCG |
| Oculopharyngeal muscular dystrophy | OPMD | PABPN1 | 14:23790681 | GCG |
| Associated with cleidocranial dysplasia, reduced bone mineral density and bone fracture | CBFA1 | CBFA1/RUNX2 | 6:45390487 | GCG |
| Holoprosencephaly | ZIC2 | ZIC2 | 13:100637703 | GCG |
| Associated with hand-foot-genital syndrome | HOXA13 | HOXA13 | 7:27238886 | GCG |
| Associated with blepharophimosis syndrome (BPES) | FOXL2 | FOXL2 | 3:138665094 | GCG |
| Early infantile epileptic encephalopathy type 1 | EIEE1 | ARX | X:25031140 | GCG |
| Associated with pseudoachondroplasia (PSACH) & multiple epiphyseal dysplasia (MED) | COMP | COMP | 19:18896872 | GAC |
| Associated with Fuchs' endothelial corneal dystrophy (FECD) | CTG18.1 | TCF4 | 18:53253401 | CAG |
| Myotonic dystrophy type 1 | DM1 | DMPK | 19:46273199 | CAG |
| Friedreich ataxia | FRDA | FXN | 9:71652203 | GAA |
| Spinocerebellar ataxia type 8 | SCA8 | ATXN8OS | 13:70681356 | CAG |
| Spinocerebellar ataxia type 12 | SCA12 | PPP2R2B | 5:146258400 | CAG |
| Huntington's disease-like 2 | HDL2 | JPH3 | 16:87637894 | CAG |
| Not currently associated with a phenotypic abnormality | MAB21L1 | MAB21L1 | 13:36050618 | CAG |
| Candidate gene for autism spectrum disorders (ASDs) | RELN | RELN | 7:103629939 | GCG |
| Fragile X syndrome/Fragile X tremor ataxia syndrome | FRAXA/FXTAS | FMR1 | X:146993569 | GCG |
| Folate-sensitive fragile site FRA10A | FRA10A | FRA10AC1 | 10:95462158 | GCG |
| Fragile X syndrome | FRAXE | FMR2 | X:147582153 | GCG |
| Fragile X syndrome | FRAXF | FAM11A | X:148713314 | GCG |
| Folate-sensitive fragile site FRA11B | FRA11B | CBL2 | 11:119077000 | GCG |
Columns include disease, disease ID/abbreviation, causal or associated gene for the disease, the genomic location of the repeat, and the repeat sequence. The genomic location is based on the reference hg19 and includes chromosome and starting position. See Supplementary Table 1 for references to supporting literature.
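As a minimal sketch of how the location column could be consumed programmatically, the snippet below splits a `chromosome:start` string into its parts and derives the span of a repeat tract. The names `Locus`, `parse_locus` and `repeat_interval` are hypothetical, the tract length in repeat units must be supplied by the caller, and treating the listed position as the first base of a half-open interval is an assumption, since the coordinate convention is not stated here.

```python
from typing import NamedTuple, Tuple


class Locus(NamedTuple):
    chrom: str   # chromosome name as written in the table, e.g. "X" or "12"
    start: int   # starting position of the repeat (hg19)


def parse_locus(text: str) -> Locus:
    """Split a 'chromosome:start' string such as 'X:66765160'."""
    chrom, _, start = text.strip().partition(":")
    if not chrom or not start.isdigit():
        raise ValueError(f"not a chromosome:start locus: {text!r}")
    return Locus(chrom, int(start))


def repeat_interval(locus: Locus, n_units: int, unit_len: int = 3) -> Tuple[str, int, int]:
    """Half-open interval covered by the tract.

    Assumption: `start` is the first base of the tract and the tract is
    n_units uninterrupted copies of a unit_len-base unit.
    """
    return (locus.chrom, locus.start, locus.start + n_units * unit_len)


# Example with the HTT row; the tract length of 21 units is supplied by the caller.
htt = parse_locus("4:3076604")
print(repeat_interval(htt, 21))
```

Keeping the chromosome as a string avoids special-casing X, and the half-open convention makes the tract length simply `end - start`.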
Epigenetic marks analyzed in this study.
| Mark | Description | Cell types | Source |
| H3K4me1 | A mark of regulatory elements associated with enhancer regions | 111 | ENCODE Project Consortium, |
| H3K4me3 | A mark of regulatory elements associated with promoter regions | 111 | ENCODE Project Consortium, |
| H3K9me3 | A repressive mark associated with heterochromatin regions | 111 | ENCODE Project Consortium, |
| H3K27me3 | A repressive mark associated with Polycomb complex activity | 111 | ENCODE Project Consortium, |
| H3K36me3 | An elongation mark associated with transcribed regions | 111 | ENCODE Project Consortium, |
| H3K9ac | A mark of active regulatory elements associated with increased activation of enhancer and promoter regions | 44 | ENCODE Project Consortium, |
| H3K27ac | A mark of active regulatory elements associated with increased activation of enhancer and promoter regions | 82 | ENCODE Project Consortium, |
| DNAm | A mark associated with chromatin structure, silencing gene expression and maintaining stability in repetitive DNA | 40 | Robertson, |
| DNase | Denoting regions of accessible chromatin commonly associated with regulatory DNA regions | 37 | ENCODE Project Consortium, |
The columns include the standard abbreviation of the epigenetic mark, its biological description, the number of cell types explored in this study along with the data source.
Figure 1. A graphical representation of the three methods used to measure enrichment of the epigenetic environment around DA-TNRs. M1 measures an association between a group of DA-TNRs and an epigenetic mark on the basis of overlap within each cell type. M2 measures an association between a specific DA-TNR and an epigenetic mark on the basis of their overlap when cell type is ignored. M3 uses the distance from the centre of the repeat to the centre of the epigenetic mark to measure enrichment. Each method makes different use of assays from available cell types, foregrounds and statistical tests to measure enrichment.
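The two primitives the methods rely on, overlap between a repeat and a mark, and centre-to-centre distance, can be sketched as below, together with one possible enrichment test. The statistical tests actually used by M1 to M3 are not named in this section, so the hypergeometric upper tail here is only an assumed stand-in, and all function names are hypothetical.

```python
from math import comb


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def centre_distance(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Distance between interval midpoints (the quantity an M3-style test would compare)."""
    return abs((a_start + a_end) / 2 - (b_start + b_end) / 2)


def hypergeom_upper_tail(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) when n repeats are drawn from N, of which K overlap the mark.

    Assumed stand-in for an over-enrichment test: k is the number of
    foreground repeats overlapping the mark, n the foreground size,
    K the overlapping repeats in foreground plus background, N the total.
    """
    total = comb(N, n)
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Toy numbers, not taken from the study: 4 of 5 foreground repeats overlap a mark
# that covers 20 of 100 repeats overall.
p_value = hypergeom_upper_tail(4, 20, 5, 100)
print(f"{p_value:.3g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch free of third-party dependencies; a real analysis would more likely call an established statistics library.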
Percent of cell types that display over- or under-enrichment for specific epigenetic marks around DA-TNRs per M1 (see Figure 1).
| H3K4me1 | 2 | 62 | 7 | 1 | 26 | 10 | 5 | 16 | 7 | 1 | 13 | 0 |
| H3K4me3 | 65 | 100 | 46 | 0 | 95 | 0 | 53 | 82 | 77 | 52 | 0 | 7 |
| H3K9ac | 36 | 97 | 3 | 0 | 75 | 0 | 28 | 53 | 53 | 28 | *6 | 0 |
| H3K27ac | 17 | 89 | 4 | 0 | 70 | *6 | 18 | 32 | 48 | 12 | *9 | 0 |
| Any | 32 | 95 | 32 | 1 | 69 | 1 | 17 | 33 | 40 | 6 | 1 | 0 |
| H3K9me3 | 4 | 10 | 7 | 6 | *8 | 17 | 1 | 5 | *3 | 1 | 20 | 0 |
| H3K27me3 | 2 | 23 | 20 | 2 | 1 | 32 | 1 | 7 | *1 | 0 | 51 | 0 |
| H3K36me3 | 0 | 4 | *11 | 0 | 12 | 2 | 0 | 1 | 14 | 1 | 2 | 0 |
| Any | 0 | 7 | 3 | 3 | 5 | 7 | 1 | 3 | 5 | 1 | 8 | 0 |
| DNAm | 68 | 100 | 50 | 3 | 90 | 5 | 65 | 85 | 85 | 33 | 3 | 98 |
| DNase | 46 | 100 | 41 | 0 | 92 | 0 | 41 | 57 | 65 | 38 | *3 | 5 |
A row represents a specific epigenetic mark, or when marked “Any,” the union of all marks listed immediately above. Each column represents DA-TNRs grouped by sequence and/or genic region (as indicated) with the absolute number of DA-TNRs in brackets. For each epigenetic mark and DA-TNR group, the percentage of cell types containing a statistically enriched epigenetic mark at p ≤ 0.05 is shown (over-enrichment by default; under-enrichment is marked with asterisk; if both over- and under-enrichment were found for the same epigenetic mark, only the dominant type is shown). bg-TNRs were predicted by Tandem Repeats Finder and stratified by sequence and/or genic region as per DA-TNRs.
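The way a single table cell is derived from per-cell-type results can be sketched as follows. This is an illustration under assumptions: the input format, the function names, and the tie-breaking rule (ties go to over-enrichment) are hypothetical and not taken from the source; only the p ≤ 0.05 threshold, the asterisk for under-enrichment, and the "dominant type" rule come from the caption.

```python
from typing import List, Optional, Tuple


def summarise_cell_types(
    results: List[Tuple[str, float]], alpha: float = 0.05
) -> Tuple[int, Optional[str]]:
    """Collapse per-cell-type tests into one (percent, direction) pair.

    results: one (direction, p_value) tuple per cell type, where direction
    is "over" or "under". Only the dominant significant direction is kept.
    """
    over = sum(1 for d, p in results if d == "over" and p <= alpha)
    under = sum(1 for d, p in results if d == "under" and p <= alpha)
    if over == 0 and under == 0:
        return 0, None
    # Assumption: a tie is reported as over-enrichment.
    direction = "over" if over >= under else "under"
    percent = round(100 * max(over, under) / len(results))
    return percent, direction


def format_cell(percent: int, direction: Optional[str]) -> str:
    """Render a cell as in the table: under-enrichment carries a leading asterisk."""
    return f"*{percent}" if direction == "under" else str(percent)


# Toy input for four cell types, not taken from the study.
toy = [("over", 0.01), ("over", 0.20), ("under", 0.03), ("over", 0.04)]
print(format_cell(*summarise_cell_types(toy)))
```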
Over- and under-enriched epigenetic marks identified by M2 for individual DA-TNRs (see Figure 1).
| AR | X:66765160 | 34 | CAG | • | ◦ | ⇑ | ⇑ | ⇓ | ⇑ | ↑ | 36 | ||||||
| HTT | 4:3076604 | 21 | CAG | • | ⇑ | ↓ | ↓ | ↑ | ⇑ | ⇑ | 427 | ||||||
| ATN1 | 12:7045880 | 20 | CAG | • | ⇑ | ⇑ | ⇑ | ↓ | ⇑ | ↓ | ↑ | 427 | |||||
| ATXN1 | 6:16327213 | 29 | CAG | • | ◦ | ↓ | ⇓ | ⇑ | ↑ | ↓ | 45 | ||||||
| ATXN2 | 12:112037083 | 23 | CAG | • | ◦ | ◦ | ⇓ | ↑ | ↑ | ↑ | 7 | ||||||
| ATXN3 | 14:92537280 | 27 | CAG | • | ◦ | ◦ | ↓ | 3 | |||||||||
| CACNA1A | 19:13318283 | 13 | CAG | • | ◦ | ⇓ | ⇑ | 36 | |||||||||
| ATXN7 | 3:63898362 | 10 | CAG | • | ◦ | ⇑ | ⇓ | ↑ | ⇑ | ⇑ | ⇑ | 36 | |||||
| TBP | 6:170870996 | 47 | CAG | • | ⇑ | ⇓ | ↓ | 427 | |||||||||
| KCNN3 | 1:154841700 | 17 | CAG | • | ⇑ | ⇑ | ⇓ | ⇑ | ↑ | 427 | |||||||
| NCOA3 | 20:46279816 | 29 | CAG | • | ↓ | ⇓ | ⇑ | ↓ | ↓ | 427 | |||||||
| HOXD13 | 2:176957782 | 15 | GCG | • | ⇑ | ⇑ | ↓ | ⇓ | 994 | ||||||||
| PABPN1 | 14:23790681 | 7 | GCG | • | ◦ | ⇓ | ⇓ | ⇑ | 124 | ||||||||
| RUNX2 | 6:45390487 | 15 | GCG | • | 994 | ||||||||||||
| ZIC2 | 13:100637703 | 18 | GCG | • | ⇓ | ⇑ | ↓ | 994 | |||||||||
| HOXA13 | 7:27238886 | 14 | GCG | • | ⇑ | ↑ | ⇑ | ↓ | 994 | ||||||||
| FOXL2 | 3:138665094 | 14 | GCG | • | ⇑ | ⇑ | 994 | ||||||||||
| ARX | X:25031140 | 15 | GCG | • | ↑ | ⇑ | ↓ | 994 | |||||||||
| COMP | 19:18896872 | 7 | GAC | • | ↑ | ↓ | ⇑ | 478 | |||||||||
| TCF4 | 18:53253401 | 24 | CAG | • | ⇑ | ⇑ | ⇑ | ⇑ | ⇑ | 760 | |||||||
| DMPK | 19:46273199 | 20 | CAG | • | ↑ | ⇑ | ↓ | ↑ | ⇑ | ⇑ | 89 | ||||||
| FXN | 9:71652203 | 6 | GAA | • | ⇑ | ⇑ | ↓ | ⇑ | ↑ | ⇑ | 3664 | ||||||
| ATXN8OS | 13:70681356 | 15 | CAG | ◦ | • | ⇑ | ↓ | ⇑ | ⇓ | ⇓ | 48 | ||||||
| PPP2R2B | 5:146258400 | 11 | CAG | ◦ | • | ⇑ | ⇑ | 30 | |||||||||
| JPH3 | 16:87637894 | 14 | CAG | ◦ | ⇑ | ⇑ | ⇑ | ⇑ | 785 | ||||||||
| MAB21L1 | 13:36050618 | 19 | CAG | ◦ | • | ↑ | ⇑ | ↑ | ⇑ | 30 | |||||||
| RELN | 7:103629939 | 8 | GCG | • | ⇑ | ⇓ | 1354 | ||||||||||
| FMR1 | X:146993569 | 20 | GCG | ◦ | • | ◦ | ↓ | 13 | |||||||||
| FRA10AC1 | 10:95462158 | 8 | GCG | ◦ | • | 237 | |||||||||||
| FMR2 | X:147582153 | 19 | GCG | • | ⇓ | 1354 | |||||||||||
| FAM11A | X:148713314 | 12 | GCG | • | 1354 | ||||||||||||
| CBL2 | 11:119077000 | 11 | GCG | • | 1354 | ||||||||||||
Columns include position in the reference genome (hg19; chromosome and locus), repeat tract length [R#] and [Seq]uence unit, and whether the repeat occurs in an [E]xon (coding), [I]ntron, [5]′ or [3]′ UTR region (◦ as annotated by RefSeq; • as reported in the literature). Enrichment of epigenetic marks is based on multiple cell types from the Roadmap Epigenomics project and is assumed to be independent of cell type, relative to bg-TNRs grouped by identical genic region and sequence composition. Arrows indicate over- (blue arrow) or under- (red arrow) enrichment, at a corrected p ≤ 10⁻⁵ (single arrow) or p ≤ 10⁻¹⁰ (double arrow). The sample size of the background is shown in the column “Sample.” The lines separate poly-Gln, poly-Ala, poly-Asp and non-coding repeats. See Supplementary Table 1 for disease identifiers. DNAm is DNA methylation.
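The arrow notation can be read as a simple threshold rule on the corrected p-value, sketched below. Two points are assumptions rather than facts from the source: that upward arrows denote over-enrichment and downward arrows under-enrichment (the colour coding did not survive in this text), and the use of a Bonferroni correction, which stands in for the unspecified correction method. The function names are hypothetical.

```python
def bonferroni(p_value: float, n_tests: int) -> float:
    """Assumed stand-in correction: multiply by the number of tests, capped at 1."""
    return min(1.0, p_value * n_tests)


def arrow(direction: str, corrected_p: float) -> str:
    """Map a corrected p-value to the table's arrow notation.

    Double arrow at p <= 1e-10, single arrow at p <= 1e-5, empty otherwise.
    Assumption: "over" maps to upward arrows and "under" to downward arrows.
    """
    if direction not in ("over", "under"):
        raise ValueError(f"unknown direction: {direction!r}")
    if corrected_p <= 1e-10:
        return "⇑" if direction == "over" else "⇓"
    if corrected_p <= 1e-5:
        return "↑" if direction == "over" else "↓"
    return ""


# Toy value, not taken from the study: raw p of 1e-7 corrected over 9 marks.
print(arrow("over", bonferroni(1e-7, 9)))
```

Checking the stricter threshold first ensures that a value passing both cut-offs is shown with the double arrow.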
Proximity-enriched epigenetic marks in human ESCs identified by M3 for individual DA-TNRs (see Figure 1).
| AR | CAG | • | ◦ | H4K20me1 | ||
| HTT | CAG | • | H3K4me3, H3K9ac | |||
| ATN1 | CAG | • | H4K20me1 | |||
| ATXN1 | CAG | • | ◦ | DNAm | ||
| ATXN2 | CAG | • | ◦ | ◦ | DNAm | |
| ATXN3 | CAG | • | ◦ | ◦ | ||
| CACNA1A | CAG | • | ◦ | DNAm | ||
| ATXN7 | CAG | • | ◦ | CTCF | ||
| TBP | CAG | • | ||||
| KCNN3 | CAG | • | H3K9ac, HDAC2 | |||
| NCOA3 | CAG | • | H2az | |||
| HOXD13 | GCG | • | ||||
| PABPN1 | GCG | • | ◦ | H3K36me3, FAIRE | ||
| RUNX2 | GCG | • | H2az | |||
| ZIC2 | GCG | • | H3K36me3 | |||
| HOXA13 | GCG | • | DNase | |||
| FOXL2 | GCG | • | CTCF | |||
| ARX | GCG | • | DNase, FAIRE | |||
| COMP | GAC | • | H4K20me1 | |||
| TCF4 | CAG | • | ||||
| DMPK | CAG | • | H4K20me1, HDAC2 | |||
| FXN | GAA | • | H2az, H3K27ac, H3K4me1, H3K4me2, H3K4me3, H3K9ac, CTCF, DNase | |||
| ATXN8OS | CAG | ◦ | • | H3K4me1 | ||
| PPP2R2B | CAG | ◦ | • | H3K27ac, DNase | ||
| JPH3 | CAG | ◦ | H3K27me3, H3K79me2, HDAC2 | |||
| MAB21L1 | CAG | ◦ | • | H4K20me1 | ||
| RELN | GCG | • | H4K20me1 | |||
| FMR1 | GCG | ◦ | • | ◦ | ||
| FRA10AC1 | GCG | ◦ | • | CTCF (x2), H3K4me3, H3K79me2, H3K9ac | ||
| FMR2 | GCG | • | H3K27me3 | |||
| FAM11A | GCG | • | ||||
| CBL2 | GCG | • | ||||
Columns include gene name and sequence unit, whether the repeat occurs in an [E]xon (coding), [I]ntron, [5]′ or [3]′ UTR region (◦ as annotated by RefSeq; • as reported in the literature), and which epigenetic marks are significantly closer to DA-TNRs than to bg-TNRs (p ≤ 0.05) in human ESCs.
Figure 2. The epigenetic environment of FRDA. A representation of epigenetic marks with their location around the Frataxin gene (NM_000144) across three experimental data sets (named in the right margin). Upper panel: Marks are displayed if they are significantly close to the GAA repeat according to M3. Middle panel: The full-width peaks for each mark around the GAA repeat according to the Roadmap Epigenome for healthy brain tissue. Lower panel: Changes in epigenetic marks as observed in brain tissue, based on differences between normal and diseased state (Al-Mahdawi et al., 2008). Locations where DNAm and histone assays were performed are shown. Arrows indicate a significant increase or decrease in methylation and histone binding at these locations. The size of the GAA repeat expansion in the disease state is also indicated.