Fragile X-associated tremor/ataxia syndrome (FXTAS) and fragile X syndrome (FXS) are primary examples of fragile X-related disorders (FXDs) caused by abnormal expansion of CGG repeats above a certain threshold in the 5'-untranslated region of the fragile X mental retardation (FMR1) gene. Both diseases have distinct clinical manifestations and molecular pathogenesis. FXTAS is a late-adult-onset neurodegenerative disorder caused by a premutation (PM) allele (CGG expansion of 55-200 repeats), resulting in FMR1 gene hyperexpression. On the other hand, FXS is a neurodevelopmental disorder that results from a full mutation (FM) allele (CGG expansions of ≥200 repeats) leading to heterochromatization and transcriptional silencing of the FMR1 gene. The main challenge is to determine how CGG repeat expansion affects the fundamentally distinct nature of FMR1 expression in FM and PM ranges. Abnormal CGG repeat expansions form a variety of non-canonical DNA and RNA structures that can disrupt various cellular processes and cause distinct effects in PM and FM alleles. Here, we review these structures and how they are related to underlying mutations and disease pathology in FXS and FXTAS. Finally, as new CGG expansions within the genome have been identified, it will be interesting to determine their implications in disease pathology and treatment.
CGG repeats are a type of microsatellite or short tandem repeat (STR) found in the human genome, with the majority located in the 5′-untranslated regions (5′-UTRs), suggesting that they may play a role in transcriptional regulation or translation initiation (Bagshaw, 2017). Abnormal expansion of CGG repeat tracts above a certain threshold confers instability and chromosome fragility, resulting in various clinical manifestations. CGG expansion in the FRAXA (folate-sensitive fragile site, X chromosome, A) region has distinct effects on the fragile X mental retardation1 (FMR1) gene located on the X chromosome (Xq27.3) (Verkerk et al., 1991). The structure of FMR1 is shown in Figure 1. It contains 56 CpG sites spread across 1 kb of its promoter and a naturally occurring CGG triplet-repeat region in its first exon (Oberlé et al., 1991; Naumann et al., 2014). However, in the general population, there are some polymorphisms in the CGG repeat region in terms of the length and content of AGG repeats, which are often interspersed with a periodicity of the 9th to 11th repeats. AGG interruptions significantly increase the stability of CGG repeats (Nolin et al., 2013; Yrigollen et al., 2014). Carriers of FMR1 alleles that are either normal (<55 repeats) or have 55–200 repeats (premutation (PM) alleles) have much lower rates of chromosomal fragility. Longer CGG repeats are extremely unstable during intergenerational transmission (Nolin et al., 2019) and in somatic cells, resulting in CGG repeat expansion (Lokanga et al., 2013). Therefore, chromosome fragility is prominent in carriers of the FMR1 allele with massive CGG repeat expansions of >200 repeats (full mutation (FM) alleles) (Mila et al., 2018). The FM allele is usually accompanied by heterochromatization, transcriptional silencing, and subsequent loss of FMR1 protein (FMRP) expression, resulting in fragile X syndrome (FXS; OMIM #300624) (Mila et al., 2018). 
FXS is the most common form of inherited intellectual disability (ID) and is the leading genetic cause of autism (Hagerman et al., 2010). However, PM alleles are associated with transcriptional increases in FMR1, which could be related to euchromatization of the FMR1 locus and an upstream shift in the transcription start from the transcription start site (TSS-I) of FMR1 (Tassone et al., 2000; Hagerman, 2012; Schneider et al., 2020). Such hyperexpression of the PM allele paradoxically leads to a relatively normal or gradual reduction in FMRP with increasing repeat length (Hagerman and Hagerman, 2004). Hyperexpression of PM alleles is associated with specific disorders, including fragile X premature ovarian insufficiency (FXPOI; OMIM #311360), a condition associated with menopause in women aged <40 years (Sullivan et al., 2011; Sherman et al., 2014), and fragile X-associated tremor/ataxia syndrome, a neurodegenerative disorder (FXTAS; OMIM #300623) that affects PM carriers, mostly men over the age of 50 years, with clinical manifestations such as action tremors, gait ataxia, Parkinsonism, and cognitive decline (Hagerman and Hagerman, 2016). In model systems, hyperexpression of riboCGG repeats in the PM range leads to defects in cell development and cell toxicity (Hagerman, 2012; Hagerman et al., 2018; Bhat et al., 2021). Unlike FM alleles, PM alleles alter RNA-processing mechanisms, which may be related to unusual secondary structures formed by DNA strands and RNA containing CGG and antisense CCG repeats (Zhao and Usdin, 2021). Such unusual secondary structures can potentially impede translation within the PM range through an obscure mechanism (Zhao and Usdin, 2021). In addition, such secondary structures can sequester specific proteins from their normal biological functions and/or undergo repeat-associated non-AUG (RAN) translation from both sense and antisense strands into toxic homopolymeric peptides (Todd et al., 2013). 
Homopolymeric peptides, such as polyglycine (FMRpolyG), have been identified in neuronal inclusions of FXTAS patients (Buijsen et al., 2014). Additionally, FMRpolyG overexpression is toxic to cells in various FXTAS model systems (Kearse et al., 2016; Sellier et al., 2017). This review focuses on how secondary structures are related to the PM and FM alleles and their associated diseases. Recent years have seen a flurry of papers reporting novel CGG repeat expansions within the genome, and some have been cloned and associated with neurodevelopmental or neurodegenerative diseases (Deng et al., 2019; Ishiura et al., 2019; Okubo et al., 2019; Sone et al., 2019; Tian et al., 2019; Jiao et al., 2020; Ma et al., 2020; Sun et al., 2020; Annear et al., 2021). Some share common genetic and clinical features, allowing for a better understanding of disease mechanisms and the development of therapeutic strategies.
FIGURE 1
Representation of the canonical structure of the FMR1 gene and its alleles (normal, intermediate, PM, FM) as a result of CGG repeat expansion in the 5′-UTR. Exons 1 to 17 that can be spliced in different ways, as well as sites for binding transcription factors and transcription start sites (TSS-I, TSS-II, and TSS-III).
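The allele classes described above are defined purely by CGG repeat number, so the classification can be written as a short function. The following is a minimal sketch, not a diagnostic tool: the names `AlleleClass` and `classify_fmr1_allele` are hypothetical, the thresholds are those quoted in the introduction (normal <55, PM 55–200, FM >200), and assigning exactly 200 repeats to the PM class is an assumption of this sketch. The intermediate class shown in Figure 1 is omitted because no threshold for it is given in the text.

```python
from enum import Enum


class AlleleClass(Enum):
    NORMAL = "normal"
    PREMUTATION = "premutation (PM)"
    FULL_MUTATION = "full mutation (FM)"


# Thresholds as quoted in the introduction. Treating exactly 200 repeats
# as PM is an assumption made for this sketch.
PM_MIN = 55
PM_MAX = 200


def classify_fmr1_allele(cgg_repeats: int) -> AlleleClass:
    """Classify an FMR1 allele by its number of CGG repeats."""
    if cgg_repeats < 0:
        raise ValueError("repeat count must be non-negative")
    if cgg_repeats < PM_MIN:
        return AlleleClass.NORMAL
    if cgg_repeats <= PM_MAX:
        return AlleleClass.PREMUTATION
    return AlleleClass.FULL_MUTATION


if __name__ == "__main__":
    for n in (30, 54, 55, 120, 200, 201, 800):
        print(n, classify_fmr1_allele(n).value)
```

Repeat number alone does not determine the methylation or expression state of an allele, so a classifier of this kind only captures the size categories used in the text.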
Structural polymorphism of CGG/CCG repeats in the FMR1 gene
As shown in Figure 2, individual strands of expanded CGG repeats form a variety of stable non-canonical DNA and RNA structures during processes involving transient DNA unwinding such as replication, repair, transcription, and/or recombination. There is conflicting evidence regarding the secondary structural preference of DNA and RNA strands. CGG stem-loop/hairpins are relatively stable and easily formed in vitro and in vivo using Watson-Crick G:C and Hoogsteen G:G base pairs (Figure 2A) (Chen et al., 1995; Mitas et al., 1995; Nadel et al., 1995; Usdin and Woodford, 1995; Yu et al., 1997; Handa et al., 2003; Sobczak et al., 2003; Zumwalt et al., 2007; Ciesiolka et al., 2017; Ajjugal et al., 2021; Poggi and Richard, 2021). However, in the presence of physiological K+ concentrations, stable G-quadruplex (G4) and intercalated-motif (i-motif) structures are formed from CGG and CCG repeat strands, respectively (Figure 2B) (Kettani et al., 1995; Fojtík and Vorlícková, 2001; Weisman-Shomer et al., 2002; Weisman-Shomer et al., 2003; Khateb et al., 2007; Renčiuk et al., 2011; Krzyzosiak et al., 2012; Loomis et al., 2014; Malgowska et al., 2014; Yang and Rodgers, 2014; Chen et al., 2018; Asamitsu et al., 2021). The formation of hairpins or tetrahelical structures (dimerization of hairpins) is altered by AGG interruption and cell type (Jarem et al., 2010). The strands of CCG repeats are also unpaired and form stable pathological R-loops, which are RNA:DNA hybrid duplexes that are formed in the transcribed region during transcription (Figure 2C) (Abu Diab et al., 2018; Crossley et al., 2019). A hairpin in the non-template strand could reduce duplex reannealing behind the advancing transcription complex, and thus aid in R-loop formation. The persistence of the R loop, on the other hand, may favor the development of the hairpin on the non-template strand. 
R-loop structures composed of a G-rich RNA strand and a C-rich DNA template are thermodynamically favorable and more stable than the corresponding DNA duplexes (Roberts and Crothers, 1992; Belotserkovskii et al., 2013; Takahashi and Sugimoto, 2020). Interestingly, unlike hairpins, the formation of the R-loop is not affected by AGG interruption within the CGG repeat tract (Reddy et al., 2011), suggesting that R-loops are formed in most repeat expansion disorders (REDs) that become heterochromatinized.
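Because AGG interruptions recur in this review as a modifier of repeat stability and hairpin formation, it can be useful to locate them in a repeat tract. The following is a minimal sketch under simplifying assumptions: the function name `summarize_repeat_tract` is hypothetical, the input is assumed to be an in-frame DNA string consisting only of CGG and AGG triplets, and no sequencing errors or flanking sequence are handled.

```python
def summarize_repeat_tract(tract: str) -> dict:
    """Summarize a CGG repeat tract that may contain AGG interruptions.

    Returns the total number of triplets, the 1-based positions of AGG
    interruptions, and the length of the longest uninterrupted CGG run.
    """
    seq = tract.upper()
    if len(seq) % 3 != 0:
        raise ValueError("tract length must be a multiple of 3")
    triplets = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    unexpected = sorted(set(triplets) - {"CGG", "AGG"})
    if unexpected:
        raise ValueError(f"unexpected triplets in tract: {unexpected}")

    agg_positions = [i + 1 for i, t in enumerate(triplets) if t == "AGG"]

    longest = current = 0
    for t in triplets:
        current = current + 1 if t == "CGG" else 0
        longest = max(longest, current)

    return {
        "total_repeats": len(triplets),
        "agg_positions": agg_positions,
        "longest_pure_cgg_run": longest,
    }


if __name__ == "__main__":
    # Illustrative tract: 9 CGG, AGG, 9 CGG, AGG, 9 CGG
    example = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 9
    print(summarize_repeat_tract(example))
```

The longest pure CGG run is reported separately because the text links the loss of AGG interruptions, rather than total length alone, to reduced stability of the repeat.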
FIGURE 2
Representation of non-canonical secondary structures formed by CGG (blue) or CCG (red) repeat expansions on the respective strands of FMR1. (A) Hairpin created on sense (blue) and antisense (red) strands, (B) a G-quadruplex formed on a sense (blue) strand or an i-motif structure formed on the antisense strand (red), and (C) an R-loop formed by the annealing of nascent RNA and non-template strand (red). The unpaired loops are shown in green.
CGG/CCG repeat associated secondary structures play a role in the expansion of CGG repeats in the FMR1 gene
FMR1 is flanked by two origins of replication (ORIs): one 45 kb upstream and one 45 kb downstream (Gerhardt et al., 2014). Inactivation of upstream ORIs in FM human embryonic stem cells (hESCs) and PM cells most likely occurs during germ cell generation and the early stages of embryogenesis, when cell division is rapid and more ORIs are required simultaneously to complete genome replication. During replication, hairpin and tetrahelical structures have been observed to pause DNA polymerases in both in vitro and in vivo studies (Viguera et al., 2001; Murat et al., 2020), increasing the probability of replication irregularities and repeat instability. When such structures are formed on the Okazaki fragments of the lagging strand, the polymerase slips, resulting in the expansion of repeats in the daughter strand (Figure 3A). In contrast, a hairpin on the template of the leading strand causes the polymerase to skip the loop, resulting in contraction of the repeat in the daughter strand (Kim and Mirkin, 2013). Replication difficulties may also explain why offspring from male PM carriers do not inherit expanded or FM alleles. This is because, unlike post-mitotic oocytes, sperm cells undergo multiple rounds of replication before fertilization, which could provide selective pressure against expansion in male PM carriers compared to female PM carriers.
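The two outcomes of the slippage model described above can be illustrated with a deliberately simple calculation. This is a toy sketch, not a quantitative model of replication: the function `replicate_repeat` and its parameter names are hypothetical, and it assumes that a hairpin of a given size adds or removes exactly that many repeats in the daughter strand.

```python
def replicate_repeat(template_repeats: int, hairpin_repeats: int,
                     hairpin_on: str) -> int:
    """Return the daughter-strand repeat count after one round of replication.

    hairpin_on:
      "okazaki_fragment" - hairpin on the nascent lagging strand; the
                           polymerase slips and the looped-out repeats are
                           copied again (expansion).
      "leading_template" - hairpin on the template; the polymerase skips the
                           loop (contraction).
      "none"             - no secondary structure; faithful copying.
    """
    if template_repeats < 0 or hairpin_repeats < 0:
        raise ValueError("repeat counts must be non-negative")
    if hairpin_repeats > template_repeats:
        raise ValueError("hairpin cannot contain more repeats than the tract")

    if hairpin_on == "okazaki_fragment":
        return template_repeats + hairpin_repeats
    if hairpin_on == "leading_template":
        return template_repeats - hairpin_repeats
    if hairpin_on == "none":
        return template_repeats
    raise ValueError(f"unknown hairpin location: {hairpin_on!r}")


if __name__ == "__main__":
    for location in ("okazaki_fragment", "leading_template", "none"):
        print(location, replicate_repeat(60, 5, location))
```

In cells, the outcome also depends on factors this sketch ignores, such as ORI usage, AGG interruptions, and how the looped-out structure is repaired.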
FIGURE 3
Repeat instability models. (A) Model of repeat instability based on the Ori-switch. The absence of a replication ORI upstream of the CGG repeat tract causes formation of hairpin-like secondary structures on the lagging strand, leading to polymerase slippage and resulting in repeat expansion in the new daughter strand. (B) Model of repeat instability based on mismatch repair (MMR). Repeat instability occurs when a nick is introduced at the base of loop-outs that are bound by the mismatch repair factors MutSβ or MutLγ; the loop-outs are then processed via a DSB to generate expansions. MutLγ endonuclease activity can be directed by a nick to cleave the opposite strand in a concerted manner to create a DSB. Out-of-register annealing could result in the activation of the non-homologous end joining (NHEJ) or microhomology-mediated end joining (MMEJ) pathway. (C) Base excision repair (BER) model of repeat instability. DNA glycosylase recognizes the oxidized base and APE1 creates a nick, leaving a single-strand break. Strand slippage results in the formation of repeat-associated hairpins either on the lesion strand or on the opposite non-lesion strand, leading to a multinucleotide gap. (D) Nucleotide excision repair (NER) model of repeat instability. RNA polymerase stalls because of R-loop formation and/or the formation of secondary structures on the non-template strand. Stalled transcription recruits transcription arrest factors, including CSB and XPG, which nick the repeat region at two different sites and thus remove this fragment. A DNA polymerase then refills this gap via transcription-coupled NER (TC-NER). Repetitive regions in DNA (green) and RNA (yellow) are shown.
Repeat expansion can also occur during the repair of secondary structures through redundant repair events that are not protective but are harmful either by leading to repeat expansion or contraction (Salinas-Rios et al., 2011; Pluciennik et al., 2013). 
Genome-wide association studies (GWAS) in patient cohorts with various repeat expansion disorders (REDs) have implicated a variety of mismatch repair (MMR) proteins, such as mutS homolog 3 (MSH3), mutL homolog 1 (MLH1), and mutL homolog 3 (MLH3), as important modifiers of repeat expansion and disease severity. These proteins are required for repeat expansion in FXDs and in several RED mouse models (Schmidt and Pearson, 2016; Kadyrova et al., 2020). For example, in an FX PM mouse model, overexpression of MSH2 increased the frequency of both intergenerational CGG repeat expansion and somatic expansion, whereas ablation of MSH2 reduced both repeat number and expansion frequency in a dose-dependent manner (Lokanga et al., 2013; Lokanga et al., 2014). Similarly, in mESCs derived from FX PM mice, the point mutation D1185N in the endonuclease domain of MLH3 precludes repeat expansion, suggesting its importance in this process (Hayward et al., 2020). In addition to MutSβ (an MSH2 and MSH3 heterodimer) and MutLγ (an MLH1 and MLH3 heterodimer) (López Castel et al., 2010; Zhao et al., 2015; Zhao et al., 2016), three other mammalian protein complexes, MutSα, MutLα, and MutLβ, play important roles in expansion (Figure 3B) (López Castel et al., 2010; Miller et al., 2020; Zhao and Usdin, 2021). Although these studies have suggested that the MMR pathway plays a role in repeat expansion, the mechanism by which MMR substrates are generated remains unclear. Secondary structures formed during replication or transcription are vulnerable to oxidative damage, and the most common oxidation product is 7,8-dihydro-8-oxoguanine (8-oxoG) (Jarem et al., 2011). Base excision repair (BER) of 8-oxoG involves strand displacement synthesis which, through polymerase slippage, results in the formation of repeat-associated hairpins on either the lesion strand or the opposite non-lesion strand (Lokanga et al., 2015) (Figure 3C). 
Therefore, repairing one lesion increases the possibility of generating additional oxidized bases and of entering a cycle of repeat instability (Jarem et al., 2011). The observation that the treatment of FXD mouse models with potassium bromate (KBrO3) resulted in a significant increase in both 8-oxoG and the frequency of germline expansion supports the role of oxidative damage in CGG repeat expansion (Entezam et al., 2010). However, this study did not provide any evidence of somatic expansion. MutLγ function is partly dependent on cytosine deamination and AP endonuclease 1 (Apn1) activity, which act on dsDNA (Su and Freudenreich, 2017; Zhao et al., 2018). Therefore, R-loop displacement may act as a substrate for MutLγ, resulting in a slipped-strand structure with hairpins on both strands (Reddy et al., 2014). Moreover, in FXS, MutLγ recognizes hairpin junctions as Holliday junctions and nicks both strands, resulting in a double-strand break (DSB) in the CGG repeats (Gazy et al., 2019). In addition, a nick can direct MutLγ endonuclease activity to cleave the opposite strand in a concerted manner to generate DSBs (Figure 3B). A recent study has reported that FXS cells show more DSBs that colocalize with R-loop-forming sequences. These R-loop-induced DSBs decrease in number once exogenous FMRP is expressed in FXS cells, suggesting that FMRP prevents the gene from forming an R-loop (Chakraborty et al., 2021).
CGG/CCG repeat associated secondary structures play a role in the pathogenesis of FXTAS
Normal FMR1 alleles are transcriptionally active and are correlated with normal FMRP production (Figure 5A). The PM allele is associated with euchromatization and transcriptional activation of the FMR1 gene in PM-related disorders, such as FXTAS and FXPOI. This is evidenced by the increased levels of CGG-containing FMR1 mRNA (up to eight-fold) with relatively unchanged or slightly reduced FMRP levels in PM carriers. FMR1 RNA transcripts are present in the nuclear inclusions (NIs) of postmortem FXTAS brains (Tassone et al., 2004). Related inclusions have been found in FXTAS disease model systems. Although higher RNA levels are associated with increased transcription initiation, rather than increased transcript stability (Tassone et al., 2007), the exact mechanism of hyperexpression remains unknown. Several lines of evidence may explain hyperexpression of the PM allele. First, both in vitro and in vivo studies have linked PM alleles, as well as long tracts of CGG/CCG repeats, to a transcriptionally active euchromatic configuration of the FMR1 locus (Figure 4A). This may increase the accessibility of transcription factors or chromatin modifiers to promote transcription initiation. Consistent with this, the FMR1 promoter in PM alleles showed almost two times higher acetylation of histones H3 and H4 compared to normal alleles (Todd et al., 2010). Second, FMR1 mRNAs with CGG repeats in the PM range form hairpin structures. These structures may directly bind to factors that remodel chromatin to regulate FMR1 transcription or cause stalling of the 40S ribosomal subunits, resulting in altered transcription start sites and decreased FMRP levels (Usdin and Woodford, 1995). Third, unlike the stable R-loops found in FXS, R-loops associated with PM alleles are susceptible to chromatin decondensation (Wang et al., 1996; Wang, 2007; Powell et al., 2013). 
As nascent FMR1 RNA and R-loops have been identified as targets of DNA methyltransferase 1 (DNMT1), nascent FMR1 RNA and co-transcriptional R-loop structures may interact with DNMT1, preventing it from performing normal DNA methylation at the FMR1 locus (Di Ruscio et al., 2013). The absence of FXTAS symptoms in FXS patients and the absence of FXS in older FXTAS patients suggest that FMR1 mRNA repeats play a direct role in FXTAS pathology. In model systems, ectopic expression of riboCGG repeats leads to the production of inclusions, disruption of the nuclear lamin A/C architecture, and induction of cell toxicity (Hagerman, 2013). Several mutually non-exclusive molecular mechanisms have been proposed for FXTAS (Figure 5B). The first is an RNA gain-of-function, or sequestration, mechanism of the kind proposed for REDs such as spinocerebellar ataxia type 8 (SCA8) and myotonic dystrophy type 1 (DM1) (La Spada and Taylor, 2010; Todd and Paulson, 2010). According to this model, cellular toxicity is caused by partial sequestration of specific RNA-binding proteins (RBPs) from their normal functions by hairpin structures (Figure 5B). The sequestered proteins identified in FXTAS patients and model systems include heterogeneous nuclear ribonucleoprotein A2/B1 (hnRNP A2/B1) and Pur α (Jin et al., 2007; Sofola et al., 2007; Hagerman and Hagerman, 2016), which are involved in various processes of DNA metabolism, including transcriptional activation. Sequestration of muscleblind-like splicing regulator 1 (MBNL1) and Src-associated in mitosis 68 kDa protein (Sam68) is implicated in mRNA splicing defects in FXTAS cellular models (Sellier et al., 2010). Similarly, the Drosha and DiGeorge syndrome critical region 8 complex (Drosha-DGCR8), which processes miRNA precursors in the nucleus, is sequestered (Sellier et al., 2013), and this has been linked to the reduced generation of mature miRNAs in the brains of FXTAS patients. 
Moreover, overexpression of most of these RBPs has been shown to reduce RNA toxicity and improve phenotypes in FXTAS disease models (Hagerman, 2012). Recently, it has been found that various DNA helicases, such as human DNA helicase B, remove CGG repeat-associated secondary structures by unwinding (Guler et al., 2018). Consistent with this, R-loop formation can be prevented by RNA helicases, as overexpression of the Drosophila ortholog of p68/DDX5 RNA helicase, Rm62 (one of the sequestered proteins along with Pur α), prevents neurodegeneration in transgenic flies expressing riboCGG repeats within the PM range (Qurashi et al., 2011). Another proposed mechanism for FXTAS pathogenesis is RAN translation, which is thought to be triggered by RNA hairpins acting as impediments to ribosomes, favoring noncanonical translation at suboptimal initiation codons upstream of the true initiation codon. In FXTAS, the non-coding region of FMR1 mRNA is translated into multiple RAN translation products, including homopolymeric proteins such as FMRpolyG, whose length correlates with the number of CGG repeats (Todd et al., 2013). RAN translation has been detected in several other REDs, such as amyotrophic lateral sclerosis/frontotemporal dementia (ALS/FTD) and SCA8, suggesting shared disease mechanisms (Cleary and Ranum, 2013). The FMRpolyG peptide was found in ubiquitin-positive inclusions in the brains of FXTAS patients, and has been directly linked to CGG repeat-associated toxicity in FXTAS disease models (Todd et al., 2013; Buijsen et al., 2014; Sellier et al., 2017). FMRpolyG binds to CGG-RNA quadruplex structures in vitro, promotes aggregate formation, and alters the ubiquitin-proteasome system (UPS) in an FXTAS model system. In addition, FMRpolyG interacts with lamina-associated polypeptide 2 beta (LAP2β), a nuclear membrane protein, and LAP2β overexpression rescues neuronal cell death in a mouse FXTAS model (Sellier et al., 2013; Todd et al., 2013; Hoem et al., 2019). 
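The statement that RAN translation of the repeat yields homopolymeric peptides whose length tracks repeat number follows directly from the genetic code, and can be checked by translating a pure CGG tract in each of its three reading frames. The following is a minimal sketch: the function name `translate_repeat_frames` is hypothetical, only the three codons that occur in a pure CGG tract are included, and initiation context, flanking sequence, and stop codons are not modelled.

```python
# Standard genetic code entries for the only codons present in (CGG)n.
# These codons contain no T/U, so DNA and RNA spellings are identical.
CODON_TABLE = {
    "CGG": "R",  # arginine
    "GGC": "G",  # glycine
    "GCG": "A",  # alanine
}


def translate_repeat_frames(n_repeats: int) -> dict:
    """Translate a pure (CGG)n tract in each of its three reading frames.

    Returns a dict mapping frame offset (0, 1, 2) to the peptide encoded
    by the complete codons available in that frame.
    """
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")
    rna = "CGG" * n_repeats
    products = {}
    for frame in range(3):
        codons = [rna[i:i + 3] for i in range(frame, len(rna) - 2, 3)]
        products[frame] = "".join(CODON_TABLE[c] for c in codons)
    return products


if __name__ == "__main__":
    for frame, peptide in translate_repeat_frames(4).items():
        print(frame, peptide)
```

The frame that reads GGC codons yields polyglycine, consistent with FMRpolyG, and the peptide length grows with the number of repeats in every frame. Which frame is used in cells depends on the upstream initiation site, which this sketch does not attempt to represent.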
The third proposed molecular mechanism involves an altered DNA damage response (DDR) signalling pathway due to co-transcriptional R-loops (Aguilera and García-Muse, 2012; García-Muse and Aguilera, 2019). Such R-loops are susceptible to single- and double-strand breaks (Cristini et al., 2019). Corroborating this, γH2AX, a marker of DSBs, has been identified in NIs in FXTAS brains (Iwahashi et al., 2006; Garcia-Arocena and Hagerman, 2010; Hoem et al., 2011). Similarly, DSB-activated ataxia-telangiectasia mutated kinase (ATM) has been observed in FXTAS animal models (Robin et al., 2017).
FIGURE 5
The non-canonical DNA and RNA structures are linked to FXTAS and FXS. (A) The presence of normal alleles results in normal transcription and FMRP synthesis. (B) The PM allele causes the formation of R-loops in DNA and hairpins (in DNA or RNA). Hairpin-containing FMR1 transcripts can bind and sequester rCGG specific RBPs or induce RAN translation. (C) The development of a longer R-loop permits the recruitment of PRC2 to the promoter for repressive histone modification.
FIGURE 4
Regions and epigenetic modifications in the FMR1 promoter are shown. The FREE1 region (blue), CpG island (red), CGG repeat (yellow), exon 1, and FREE2 intron 1 segment (yellow) are highlighted. (A) In normal and PM alleles, the CGG repeats in the promoter region are flanked by stable 5′ and 3′ epigenetic boundaries (lower DNA methylation and lower repressive histone marks), allowing transcription of the FMR1, ASFMR1, FMR4, FMR5, and FMR6 genes. (B) The 5′ and 3′ epigenetic boundaries are abolished in FM alleles, allowing DNA methylation to spread throughout the promoter region (higher DNA methylation and higher repressive histone marks).
CGG/CCG repeat associated secondary structures play a role in the pathogenesis of FXS
The transcriptionally inactive FM allele is linked to the heterochromatic status of FMR1, as has been observed in individuals with FXS. During embryonic development, de novo DNA methyltransferases (DNMTs) establish cytosine methylation across the entire promoter, including the fragile X related element 1 (FREE1), CpG island, CGG repeat, and fragile X related element 2 (FREE2) regions of the FMR1 gene (Oberlé et al., 1991; Naumann et al., 2009). However, in rare FXS individuals, the unmethylated full mutation (UFM) allele may represent the methylation status prior to FMR1 silencing, which occurs around 11 weeks of gestation (Willemsen et al., 2002; Colak et al., 2014; Mor-Shaked and Eiges, 2018). Thus, the extent to which silencing occurs in early FXS embryos remains an important open question. In addition, FMR1 silencing may require several other epigenetic regulatory mechanisms. Histone modifications occur in FMR1 promoter-associated chromatin, with more inhibitory histone marks (H3K9me2, H3K9me3, H3K27me3, and H4K20me3) and fewer active histone marks (H3K9ac and H4K16ac), catalyzed by histone methyltransferases (HMTs) and histone deacetylases, respectively (Figure 4B) (Coffee et al., 1999; Biacsi et al., 2008; Li et al., 2018). Polycomb group proteins cause the trimethylation of histones, such as H3K9me3, H3K27me3, and H4K20me3. Specifically, polycomb repressive complex 2 (PRC2), a transcriptional repressor complex, is required for histone 3 trimethylation at lysine 27 (H3K27me3), which is a late modification required for gene silencing. Consequently, PRC2 inhibition prevents H3K27me3 in the FMR1 5′-UTR (Kumari and Usdin, 2014). PRC2 binds to G-rich RNAs, specifically G4-forming RNA sequences and R-loops, to mediate gene silencing at multiple loci (Skourti-Stathaki et al., 2019). Therefore, it is possible that the R-loops and the FMR1 transcript aid gene silencing by facilitating PRC2 recruitment, either directly or indirectly (Figure 5C). 
Accordingly, disrupting the interaction between FMR1 mRNA and the CGG-repeat DNA, and thus R-loop formation, prevents PRC2-mediated gene silencing during the neuronal differentiation of embryonic stem cells (Colak et al., 2014). In addition, PRC2 recruitment to FM alleles is decreased when they are reactivated by 5-azadeoxycytidine (Kumari and Usdin, 2014). It is worth noting that, given the proposed role of R-loops in hyperexpression of the PM allele, the role of the R-loop in gene silencing in the case of the FM allele appears paradoxical. The R-loops associated with FM alleles are more stable and longer, and their effects may differ with repeat length, transcriptional rate, protein expression, and cell stage (Colak et al., 2014; Groh and Gromak, 2014; Loomis et al., 2014). As a result, this R-loop may further promote the loss of active chromatin marks in the flanking regions of the FMR1 promoter, transcriptional termination, and DNA damage.
Novel CGG/CCG repeats in the human genome suggest their broad involvement in neurological diseases
Long-read and whole-genome sequencing have revealed additional STRs within the genome that are more widespread than previously thought (Depienne and Mandel, 2021). A small subset of these STRs has identical sequences, sizes, and genomic locations. In addition, they may be unstable during intergenerational transmission and exhibit expansions or contractions that result in neurological disorders with related clinical manifestations and pathogenic mechanisms (Liufu et al., 2022). For example, similar to FXS, expanded CGG repeats are a causative genetic contributor to Desbuquois dysplasia 2 (DBQD2) and Baratela-Scott syndrome (BSS). DBQD2 and BSS are characterised by skeletal dysplasia and share several clinical features. In both cases, CGG expansion in the 5′-UTR of XYLT1 leads to gene silencing through hypermethylation (LaCroix et al., 2019). Similarly, hypermethylation caused by CGG expansion in the 5′-UTR of disco-interacting protein 2 homologue B (DIP2B) (Winnepenninckx et al., 2007) and AF4/FMR2 family member 3 (AFF3) causes FRA12A-related neurocognitive and ID disorders (Knight et al., 1993). Similar clinical manifestations have been observed in individuals with deletions or other loss-of-function mutations in these genes, further supporting the hypothesis that CGG expansion in these genes is pathogenic via a loss-of-function mechanism.
CGG expansion in several other genes can also manifest as dominant neurodegenerative disorders via mechanisms similar to those described for FXTAS. The GGC repeat, located in the 5′-UTR of NOTCH2NLC, is a causative genetic contributor to neuronal intranuclear inclusion disease (NIID) (Deng et al., 2019; Ishiura et al., 2019). 
Pathogenic NOTCH2NLC expansions have been identified in patients with essential tremor (ETM6, MIM #618866), C9ORF72-associated amyotrophic lateral sclerosis/frontotemporal dementia (ALS/FTD) (Tian et al., 2019; Jiao et al., 2020), Parkinsonism (Ma et al., 2020), and multiple system atrophy (Fang et al., 2020). In addition, oculopharyngodistal myopathy types 1–4 (OPDM), a group of adult-onset inherited neuromuscular disorders, are caused by CGG repeat expansions in the 5′-UTR of LRP12 (Ishiura et al., 2019), GIPC1 (Deng et al., 2020), NOTCH2NLC (Yu et al., 2021), and RILPL1 (Yu et al., 2022), respectively. Similarly, CGG expansion in NUTM2B-AS1 has been identified as the causative agent of oculopharyngeal myopathy with leucoencephalopathy (OPML) (Ishiura et al., 2019). Interestingly, NIID, OPDM, and OPML resemble FXTAS in terms of clinical symptoms, radiological imaging, and histological characteristics, such as the presence of distinctive eosinophilic ubiquitin-positive NIs (Viguera et al., 2001). In patients with NIID, RNA molecules with expanded CGG repeats form RNA foci that sequester RBPs into p62-positive NIs (Mori et al., 2012). In addition, similar to FXTAS patients, the translation of expanded GGC repeats resulted in the accumulation of polyG-containing proteins in the NIs in both the NIID model system and patients. Together, these results suggest a pathological mechanism involving toxic gain-of-function at the RNA level and/or RAN translation. Although the formation of polyG in OPML and OPDM has not yet been elucidated, in C9ORF72-associated ALS/FTD, translation of the polyglycine-alanine dipeptide repeat (polyGA DPR) protein occurs because of G4C2 repeats located in the first intron of the C9ORF72 gene (Tabet et al., 2018). While these examples demonstrate common pathogenic mechanisms in several distinct diseases, it remains unclear whether they reflect general disease mechanisms.
It is worth noting that DNA methylation may be protective in some NOTCH2NLC-associated NIIDs (Ishiura and Tsuji, 2020); however, it increases RNA and peptide toxicity in C9ORF72-associated ALS/FTD (Zhu et al., 2020). Therefore, understanding how DNA methylation affects the progression of such disorders can lead to improved treatments, such as those based on Cas9 methylation editing, which has recently been proposed for FXS (Liu et al., 2018).
Conclusion and perspective
FXTAS and FXS are two primary diseases caused by dynamic mutations in FMR1, and they have distinct clinical manifestations and molecular pathogenesis. FXTAS is a late-onset neurodegenerative disorder that typically affects men >50 years of age. On the other hand, FXS is a neurodevelopmental disease and the most common type of inherited intellectual disability. Both are caused by the abnormal expansion of CGG repeats beyond the normal range in the 5′-UTR of the FMR1 gene. PM alleles (CGG expansions of 55–200 repeats) are associated with elevated FMR1 mRNA levels and relatively normal FMRP levels. In contrast, FM alleles (CGG expansions of ≥200 repeats) typically result in transcriptional silencing and, consequently, the loss of FMR1 protein (FMRP). Abnormal CGG expansions form a variety of secondary structures that are linked to the pathology and transmission risk in both diseases. As CGG/CCG/GGC repeats with characteristics similar to those of CGG repeat expansions associated with FXS or FXTAS are abundant in the human genome, these studies suggest that CGG repeats are broadly involved in neurological diseases. Although several rare folate-sensitive fragile sites associated with neurodevelopmental diseases have been cloned as expanded CGG repeats, the number of studies of CGG/GGC repeat-related disorders has increased in recent years. A recent study using whole-genome STR analysis discovered hundreds of unique CGG repeats with highly variable repeat lengths and intergenerational instability, most of which are linked to known neurodevelopmental disease genes or strong candidate genes (Annear et al., 2021). Furthermore, several GGC repeat-related disorders, such as essential tremor (ET) and NIID, have been identified to have clinical and molecular overlaps with FXTAS (Xu et al., 2021).
In these diseases, GGC repeats occur at the 5′-UTR of the respective gene and do not involve the open reading frame of the gene, implying that GGC and CGG repeat RNAs share the same secondary structures that may play an important role in disease pathogenesis and are thus amenable to pharmacological or molecular therapy. In this context, antisense oligonucleotides (ASOs) containing CCG repeats have been shown to reduce R-loop formation and alleviate the downstream effects of RNA hairpin formation (Derbis et al., 2021). Similarly, small molecules that inhibit protein binding to the hairpin structure or reduce RBP sequestration or RAN translation have been shown to alleviate disease pathology in FXTAS model systems (Disney et al., 2012; Hagihara et al., 2012; Qurashi et al., 2012; Tran et al., 2014; Verma et al., 2019; Verma et al., 2020). Despite these encouraging results, a deeper understanding of the underlying pathophysiology of these diseases is still required. Recently, small molecules that reprogram the epigenetically determined transcriptional state of key genes by stabilizing G4 structures in DNA have been used to develop epigenetic therapies (Guilbaud et al., 2017). Therefore, understanding the secondary structures formed by CGG/GGC repeats and their downstream effects may lead to a better understanding of disease pathology as well as the development of therapeutics to alleviate their pathological effects.
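As an illustration of the repeat-length ranges summarised above (PM alleles of 55–200 CGG repeats, FM alleles of ≥200 repeats), the following minimal Python sketch counts the longest uninterrupted CGG run in a sequence and assigns it to an allele class. It is not a diagnostic method: the function names, the toy sequence, and the label "below premutation range" are hypothetical, any interrupting triplets are ignored, and because the stated ranges both include exactly 200 repeats, the sketch assumes that 200 is treated as a full mutation.

```python
import re

# Thresholds taken from the text: PM = 55-200 CGG repeats, FM = >=200 repeats.
# Assumption: a count of exactly 200 is classified as a full mutation.
PM_MIN = 55
FM_MIN = 200


def longest_cgg_run(sequence: str) -> int:
    """Return the number of units in the longest uninterrupted CGG run."""
    runs = re.findall(r"(?:CGG)+", sequence.upper())
    return max((len(run) // 3 for run in runs), default=0)


def classify_allele(repeat_count: int) -> str:
    """Map a CGG repeat count to the allele classes named in the review."""
    if repeat_count < 0:
        raise ValueError("repeat count must be non-negative")
    if repeat_count >= FM_MIN:
        return "full mutation (FM)"
    if repeat_count >= PM_MIN:
        return "premutation (PM)"
    # Hypothetical label: the text does not subdivide counts below 55.
    return "below premutation range"


if __name__ == "__main__":
    # Hypothetical toy sequence: 60 CGG units with arbitrary flanking bases.
    toy_allele = "GCG" + "CGG" * 60 + "CTG"
    count = longest_cgg_run(toy_allele)
    print(count, classify_allele(count))
```

Keeping the thresholds as named constants makes the boundary assumption explicit and easy to change if a different convention for the 200-repeat boundary is preferred.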