What is the Pathogenic CAG Expansion Length in Huntington's Disease?

Jasmine Donaldson¹, Sophie Powell¹, Nadia Rickards¹, Peter Holmans¹, Lesley Jones¹.

Abstract

Huntington's disease (HD) (OMIM 143100) is caused by an expanded CAG repeat tract in the HTT gene. The inherited CAG length is known to expand further in somatic and germline cells in HD subjects. Age at onset of the disease is inversely correlated with the inherited CAG length, but is further modulated by a series of genetic modifiers that most likely act on the CAG repeat in HTT by permitting it to expand further. Longer repeats are more prone to expansion, and this expansion is age-dependent and tissue-specific. Given that the inherited tract expands through life and most subjects develop disease in mid-life, this implies that in cells that degenerate, the CAG length is likely to be longer than the inherited length. These findings suggest two thresholds: the inherited CAG length which permits further expansion, and the intracellular pathogenic threshold, above which cells become dysfunctional and die. This two-step mechanism has been previously proposed and modelled mathematically to give an intracellular pathogenic threshold at a tract length of 115 CAG (95% confidence interval 70–165 CAG). Empirically, the intracellular pathogenic threshold is difficult to determine. Clues from studies of people and models of HD, and from other diseases caused by expanded repeat tracts, place this threshold between 60 and 100 CAG, most likely towards the upper part of that range. We assess this evidence and discuss how the intracellular pathogenic threshold in manifest disease might be better determined. Knowing the cellular pathogenic threshold would be informative for both understanding the mechanism in HD and deploying treatments.

Keywords: DNA repair; DNA repeat expansion; Huntington’s disease; spinocerebellar ataxias; trinucleotide repeat

Year: 2021 PMID: 33579866 PMCID: PMC7990448 DOI: 10.3233/JHD-200445

Source DB: PubMed Journal: J Huntingtons Dis ISSN: 1879-6397

BACKGROUND

Huntington’s disease (HD) is one of > 50 diseases caused by expanded short tandem repeats [1, 2]. In those diseases where the repeat is coding, as in HD, the repeat unit is usually CAG and this is translated to a homopolymeric glutamine tract in the encoded protein. There are nine such diseases, often referred to collectively as the polyglutamine diseases. The sections of these proteins containing expanded glutamine form cellular aggregates [3]. The polyglutamine diseases have disease-causing expansion lengths that are much shorter than those in diseases where the repeats causing the expansion are not translated [4, 5], implying a possible constraint on length at the level of the protein. While somatic expansion is critical in reaching the intracellular pathogenic CAG length threshold, the subsequent events leading to cell dysfunction and death have not been conclusively defined (Fig. 1). Much attention has focused on the expanded glutamine tract in the protein but it has never been conclusively proven that this elicits toxicity in cells in human disease, and the genetic evidence implicates CAG length rather than polyglutamine length as critical in HD pathogenesis [6– 8]. Other potential pathogenic mechanisms that cannot be precluded include RNA-based toxicity as in myotonic dystrophy (OMIM 160900) [9], RAN translation [10] and aberrant exon 1 splicing [11]: all of these mechanisms would also be exacerbated by somatic expansion of the repeat in individual cells. Recent evidence of neurodevelopmental effects in HD [12], and early phenotypes in peripheral blood mononuclear cells [13, 14], may indicate other pathways impacted by the unexpanded CAG length, but the genetic evidence in HD subjects very clearly points to somatic expansion as likely to be important in disease manifestation.

Fig. 1

A model for the pathogenic threshold in HD. A) HD pathogenesis is largely determined by an expanded cytosine-adenine-guanine (CAG) trinucleotide repeat within exon 1 of the huntingtin (HTT) gene, which is translated into an expanded polyglutamine tract in the corresponding HTT protein. Wild-type HTT possesses 5–35 CAG repeats (non-expanded HTT gene), and can undergo expansion into the disease range in the germline to create apparent de novo HD subjects, but ≥36 repeats is associated with a significantly increased risk of developing HD (expanded HTT gene). B) An expanded HTT allele with 36 or more repeats is unstable and licenced to further expand in cells over the lifespan of the HD at-risk individual. HD symptoms would manifest and progress as increasing numbers of disease-relevant cells undergo somatic expansion beyond an unknown intracellular pathogenic threshold that renders the gene toxic in those cells. Figures created using BioRender.com. Figure 1A adapted from a figure by National Institute of General Medical Sciences, National Institutes of Health.

In HD, age at onset of disease is largely determined by the length of the CAG tract [15–17]. More recently, however, age at onset has been shown to be modulated by a series of genetic modifiers whose discovery has revolutionised the way we think about HD pathogenesis [7, 18–20]. Two types of genetic modifier revealed so far have provided evidence that has made us rethink our notions of HD pathogenesis. First, some of the encoded proteins translated from these modifiers act directly on DNA and are most likely to exert their effect at the level of the mutated expanded DNA, through modulating the length of the CAG tract in both somatic and germline cells [18–21], and indeed, this has been shown experimentally in cell culture [22, 23]. Second, the exact sequence at the HTT CAG locus exerts a strong influence on age at onset of disease [6–8]. The length of the uninterrupted CAG tract correlates most closely with age at onset of disease, and any interrupting bases that disrupt the continuous repeated sequence delay age at onset. Not all of the difference in onset age appears to be determined by uninterrupted CAG length at the HTT locus: in particular, having no CAA interruptions appears to hasten onset, while the effect of having more than one CAA at the 3’ end of the repeat tract is unclear [6, 7]. Together, these findings refocus attention on the causative repeat expansion at the level of the DNA, rather than the expanded glutamine tract in the mutant huntingtin protein that is translated from it.
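The distinction between uninterrupted CAG length and encoded glutamine length can be made concrete with a short sketch. The Python below is illustrative only: the function name `repeat_summary`, the assumption that the input sequence starts in-frame at the first repeat codon, and the two example alleles are hypothetical and are not taken from the cited studies.

```python
def repeat_summary(seq: str) -> dict:
    """Summarise a glutamine-encoding repeat, read in-frame from position 0.

    Assumption (ours): `seq` begins at the first codon of the repeat tract.
    Counting stops at the first codon that is neither CAG nor CAA.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    polyq = 0      # glutamine codons (CAG or CAA) in the leading tract
    caa = 0        # number of CAA interruptions in that tract
    longest = 0    # longest uninterrupted run of CAG codons
    run = 0        # current run of CAG codons
    for codon in codons:
        if codon == "CAG":
            run += 1
            longest = max(longest, run)
        elif codon == "CAA":
            caa += 1
            run = 0
        else:
            break
        polyq += 1
    return {"polyQ": polyq, "pure_CAG": longest, "CAA": caa}


# Two hypothetical alleles encoding the same number of glutamines:
# a canonical (CAG)n(CAA)(CAG) tract and one that has lost the CAA.
canonical = "CAG" * 42 + "CAA" + "CAG" + "CCGCCA"
no_interruption = "CAG" * 44 + "CCGCCA"

print(repeat_summary(canonical))
print(repeat_summary(no_interruption))
```

Both alleles encode 44 glutamines, but the uninterrupted CAG run differs by two repeats, which is the kind of difference the genetic data above associate with age at onset.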

A TWO-STAGE HYPOTHESIS OF DISEASE PATHOGENESIS

We know that further expansion of the HTT CAG tract occurs in the brain cells of HD subjects [24– 26], that longer CAG tracts are more likely to expand [27, 28], and that the greater the expansion in brain the earlier the onset of disease [29]. These observations in humans validate previous findings in mice and provide a link between genetic data from human subjects and disease models of repeat instability [18, 20, 30– 32]. These genetic data imply that at the point at which the CAG tract elicits toxicity in susceptible cells, which we refer to as the intracellular pathogenic CAG length, it is likely to be longer than the diagnostic disease-causing length of 36 CAG or more measured in the subjects’ blood. It is plausible that the inherited disease-causing threshold of 36 CAG in HD is the threshold beyond which further CAG repeat expansions are permissible in susceptible somatic cells (Fig. 1). It is also possible that there is a vicious cycle where alterations in DNA damage and repair exacerbate somatic expansion of the repeat [33]. So how can we define the intracellular pathogenic repeat length in such cells? Or at least the length that is pathogenic in a high enough proportion of the appropriate cell types to cause disease symptoms? This important information is lacking: defining such a threshold would be useful in clinical prognosis and clinical trials. A version of this interesting idea has been proposed previously by Kaplan and colleagues [34]. They postulated a mechanism where onset and progression of the disease are determined by the rate of expansion of the repeat tract in specific cells in the subject, and that the disease manifests only when the repeat tract has expanded beyond a certain threshold in a sufficient number of these cells. Progression is then determined by the rate at which more cells cross the pathogenic threshold for repeat length. 
The expansion rate and pathogenic threshold are estimated from clinical data (onset and repeat length of the inherited allele). The model assumes that the expansion rate is linear with respect to repeat length above an initial threshold, determined from clinical data (37 for HD). The assumption of linear expansion rate has not been confirmed experimentally, although Kaplan et al. [34] found that the onset predictions given by their model appeared to fit observed data for a number of repeat disorders, including HD, the polyglutamine tract mediated spinocerebellar ataxias (SCAs), myotonic dystrophy 1 (DM1, OMIM 160900) and Friedreich’s ataxia (FA, OMIM 229300). This model also predicts that earlier onset and longer CAG are correlated with faster progression, consistent with recent clinical observations [19, 35]. It may also explain why pre-clinical symptoms are evident up to 15 years prior to disease onset [36– 39]. This premanifest period reflects a proportion of cells that have somatic expansion which induces neuronal dysfunction, but too few to manifest overt clinical symptoms (Fig. 2). It is also likely that with a slow degeneration of neurons during this period, there is functional compensation in the CNS [40, 41]. The new genetic evidence makes this a compelling hypothesis in the pathogenesis of HD and other repeat disorders. This prescient paper has notably only been cited eleven times in 13 years [34].
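The two-stage idea can be sketched numerically. The Python below is a deliberately simplified, deterministic reading of the model, not the implementation of Kaplan et al. [34]: it assumes the expansion rate is proportional to the excess of the current repeat length over the initial threshold, dL/dt = k(L − L0), which gives a closed-form time to reach the pathogenic threshold. The initial threshold (37) and pathogenic threshold (115) are the values quoted in this review; the rate constant `k` and the function name are hypothetical and chosen only for illustration.

```python
import math


def years_to_threshold(inherited: float,
                       pathogenic: float = 115.0,
                       initial: float = 37.0,
                       k: float = 0.05) -> float:
    """Time for a cell's repeat to grow from `inherited` to `pathogenic`.

    Assumes dL/dt = k * (L - initial) for L > initial, so that
    t = ln((pathogenic - initial) / (inherited - initial)) / k.
    `k` (per year) is a made-up illustrative value, not a fitted estimate.
    """
    if inherited <= initial:
        return math.inf          # at or below the initial threshold: no expansion
    if inherited >= pathogenic:
        return 0.0               # already at or beyond the pathogenic threshold
    return math.log((pathogenic - initial) / (inherited - initial)) / k


# Longer inherited repeats reach the threshold sooner under this assumption.
for cag in (40, 45, 50, 60):
    print(cag, round(years_to_threshold(cag), 1))
```

Under these assumptions the time to threshold falls steeply as the inherited length increases, which mirrors the inverse relationship between inherited CAG length and age at onset. A population-level version would additionally need a distribution of rates across cells and a rule for how many cells must cross the threshold before symptoms appear.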

Fig. 2

Potential relationship of CAG tract expansion and clinical Huntington’s disease events. The premanifest period of the disease may reflect the presence of a proportion of disease-relevant cells with sufficient somatic expansion to induce neuronal dysfunction, but too few to manifest overt clinical symptoms. Premanifest HD includes a presymptomatic period where no signs or symptoms are present, and prodromal HD, characterised by the onset of subtle signs and symptoms, which may be the result of the HTT CAG length expanding beyond an unknown pathogenic threshold in increasing numbers of disease-relevant cells. Manifest HD, characterised by chorea and gradual worsening of motor and cognitive difficulties, may then arise once a significant number of disease-relevant cells have passed this threshold. Somatic expansion in susceptible cell populations is likely to be occurring throughout the premanifest and prodromal stages of disease as indicated by the hypothetical dashed line, although the actual trajectory of this expansion will depend on the inherited repeat length and is likely to differ in different cell types. Therefore, the relationship between the trajectory of somatic expansion and clinical phenotypes is currently hypothetical. Figure adapted from Ross et al. [45] and Bates et al. [3] and created using BioRender.com.

The in silico modelling of Kaplan et al. [34] predicted that the intracellular pathogenic CAG length threshold in HD was 115 CAG. However, the confidence interval for this threshold is wide (95% CI 70–165 CAG), possibly due to the model being fitted to a relatively small sample (n = 336). Do we have any evidence that would allow us to determine whether 115 CAG is a reasonable estimate of the threshold, or to refine that wide confidence interval? 
This is critical, as under this scenario, the period of an HD at-risk subject’s life before such a threshold is reached is a window of opportunity for therapies that address the expansion of the repeat (Fig. 2). It is potentially a large window as expansion is likely to occur throughout life [26, 42– 44] and at-risk subjects remain largely indistinguishable from their non-HD at-risk peers for a substantial period of that time [39]. There are some potential clues to the intracellular pathogenic threshold. We might be able to improve the definition of the edge of the pathogenic thresholds using data from mouse models. In mouse models the repeat is normally expanded to 100 CAG or more in order to induce a disease-like phenotype in the short-lived mouse [46, 47]. Even in the presence of tracts of over 100 CAG in their Htt gene, mice may only develop subtle phenotypes late in their lifespan. Other diseases where a CAG tract is translated to a polyglutamine tract may also offer some clues about thresholds for intracellular CAG toxicity [4, 48– 50]. Possible inferences from these sources of evidence are discussed below.

EVIDENCE FROM HD ANIMAL MODELS

There are many animal models of HD generated in a number of different ways (Table 1). They can be divided into those expressing transgenes with a truncated section of human HTT carrying the CAG tract, or full length human HTT, and those with long CAG tracts replacing mouse Htt in one way or another [3, 46, 47, 51]. Instability of the repeat sequence has been seen in many of the mouse models and was noted in the first HD models ever reported, the R6 series [52, 53].

Table 1

Animal models of Huntington’s disease with up to 100 CAG repeats

Animal model	Construct	Promoter	PolyQ repeat length	Pure repeat length/interruptions	Phenotype	Somatic expansion?	References
YAC46 (mouse)	FL human HTT gene within a YAC	HTT	Q46	46 mixed CAA/CAG	Increased NMDA-induced Ca²⁺ response, but no behavioural or cognitive phenotype	Not reported	[54–56]
HD46	N-terminal 3 kb of human HTT cDNA	Rat NSE	Q46	(CAG)_n(CAA)(CAG)	Increased incidence of clasping, abnormal gait and abnormal activity- though time-points are not clear. HTT inclusions in cortex and striatum.	Not reported	[57]
HD48	FL human HTT	CMV	Q48	(CAG)_n(CAA)(CAG)	Limb clasping from 8– 24 weeks, hyperactivity from 20 weeks, reduced exploration from 24 weeks. Neuronal loss and fibrillary astrocytosis in the striata. HTT aggregates.	Not reported	[58]
HdhQ50	Chimeric human exon 1/ mouse Htt	Htt	Q50	(CAG)₄₈(CAA)(CAG)	No behavioural or neuropathological changes observed at 6 months of age.	Not reported	[59, 60]
HDQ50	CAG-only tract knocked into mouse exon 1	Htt	Q52	(CAG)₅₀(CAA)(CAG)	No behavioural or pathological changes observed.	Not reported but germline expansion is present. The allelic line HDQ150 has somatic expansion.	[25, 61, 62]
Tg51 (rat)	1962 bp rat Htt cDNA fragment	Htt	Q51	Not clear	Reduced anxiety from 2 months, impaired coordination from 10 months, nuclear inclusions from 12 months.	Not reported	[63]
CAG71	Chimeric human exon 1/ mouse Htt	Htt	Q71	(CAG)₇₁ (R42) Arginine residue at position 42	No behavioural abnormalities.	Not reported	[64, 65]
YAC72 (mouse)	Full length human HTT gene within a yeast artificial chromosome	HTT	Q72	(CAG)_n(CAA)(CAG)	Circling and foot-clasping from 9 months, hyperactivity at 7 months, HTT aggregates and striatal degeneration from 12 months.	Not reported	[54, 56, 66]
Hdh6/Q72	Chimeric human exon 1/ mouse Htt	Htt	Q72	(CAG)_n(CAA)(CAG)	Hyperaggressive behaviour from 3 months. No neuropathological changes.	Yes. Multiple tissues. Expansion bias in striatum where > 80% of cells showed expansions greater than +5 CAG.	[67]
Htt-Q79	Chimeric human exon 1/ mouse Htt	Htt	Q77	(CAG)_n(CAA)(CAG)	Aggressive behaviour. Reactive gliosis from 40 weeks, nuclear inclusions in the striatum from 80 weeks.	Yes. Multiple tissues (brain, liver, kidney and stomach).	[68, 69]
Hdh4/Q80	Chimeric human exon 1/ mouse Htt	Htt	Q80	(CAG)_n(CAA)(CAG)	Hyperaggressive behaviour from 3 months. Diffuse nuclear staining in the striatum from 17 weeks, HTT aggregates from 48 weeks.	Yes, high levels observed in the striatum at 24 months. Expansions were also observed in the cortex, cerebellum, hippocampus, hindbrain, spinal cord, olfactory bulb, kidney and eye.	[67, 70– 72]
N171-82Q	N-terminal 171 amino acids of human HTT cDNA	PrP	Q82	(CAG)_n(CAA)(CAG)	Motor deficit from 5 months, HTT nuclear inclusions.	Not reported	[73]
N586-82Q	N-terminal 586 amino acids of human HTT cDNA	PrP	Q82	(CAG)_n(CAA)(CAG)	Rotarod deficit from 4 months, hyperactivity from 5 months, HTT aggregates and cognitive impairment from 8 months.	Not reported	[74]
HD89	FL human HTT	CMV	Q89	(CAG)_n(CAA)(CAG)	Limb clasping from 8 weeks, hyperactivity from 20 weeks, less exploration from 24 weeks. Neuronal loss and fibrillary astrocytosis in the striata.	Not reported	[58]
R6/1-89Q	Human HTT exon 1	HTT	Q89	(CAG)_n(CAA)(CAG)	Clasping behaviour from 24 weeks, diffuse nuclear staining in cerebral cortex and hippocampus from 11 weeks, body weight loss from 28 weeks.	Yes. Expansions in motor cortex and hippocampus from 9 weeks.	[75]
HdhQ92	Chimeric human exon 1/mouse Htt	Htt	Q92	(CAG)_n(CAA)(CAG)	Cognitive deficits from 4 months, mild motor deficit and HTT aggregates from 6 months, striatal cell loss from 8 months.	Yes, in striatum and liver	[60]
CAG94	Chimeric human exon 1/mouse Htt	Htt	Q94	(CAG)₉₄ (R42) Arginine residue at position 42	Increased sensitivity to NMDA from 7 weeks. Increased rearing from 9 weeks, decreased motor and exploratory activity from 18 weeks.	Not reported	[64, 65]
Hu97/18 (mouse)	Two full human HTT alleles, one mutant and one wild type	HTT	Q97	(CAG)_n(CAA)(CAG)	Learning and motor deficit from 2 months, progressive cognitive deficits from 6 months, reduced cortical and striatal volume from 12 months.	Not reported	[76, 77]
BAC HD (mouse)	FL human HTT	HTT	Q97	97 mixed CAA/CAG	Motor deficit at 2 months significant by 6 months, no HTT aggregates.	No	[78– 80]
BAC HD (rat)	FL human HTT	HTT	Q97	97 mixed CAA/CAG	Motor deficit from 2 months, hypoactivity from 4 months, learning deficit from 6 months.	No	[81– 83]
HD100	N-terminal 3 kb of human HTT cDNA	Rat NSE	Q100	(CAG)₉₈(CAA)(CAG)	Rotarod impairment, clasping, abnormal gait from 13– 78 weeks. HTT inclusions from 13 weeks.	Not reported	[57]
HDQ100	CAG tract knocked into mouse exon 1	Htt	Q100	(CAG)₁₀₀(CAA)(CAG)	No behavioural or pathological changes observed.	Not reported but germline expansion is present. The allelic line HDQ150 has somatic expansion.	[25, 61, 62]

NSE, neuron-specific enolase; CMV, cytomegalovirus; PrP, prion gene promoter. Not reported means no data were available. No means somatic expansion was investigated and not seen.

Although there are multiple rodent models which have been deployed to help us understand the biology of HD and begin the search for therapies, many are limited in their ability to inform us of the effects of genetic modifiers of disease, as they often present with repeats well above the presumed intracellular pathogenic threshold and a severe phenotype. The most useful are those with relatively short repeats (Table 1), though they have differences in their genetic manipulations that make straightforward inferences about the threshold for intracellular pathogenesis complex. They all have either full length human HTT transgenes or human exon 1 replacing mouse exon 1. This may lead to differences in transcription and translation compared with humans. Often several copies of a transgene are present in a genome and RNA and protein expression levels are variable. While the human disease is completely dominant and expression levels of HTT seem irrelevant to disease manifestation and course [84], that is not true in all animal models where expression levels do appear to influence phenotype, as discussed below. Perhaps most challenging is that in many of these models a human HTT gene has been used with a long pure CAG tract. This will almost invariably lead to somatic expansion, so the intracellular pathogenic threshold we are interested in is a moving target: it will have expanded in individual cells from the inherited or engineered repeat length but by an unknown amount. Finally, somatic expansion itself can be transcriptionally mediated [85–87], meaning that alterations in transcription across an expansion-prone repeat may themselves alter levels of expansion. 
An added complication is that very long repeats appear less pathogenic than shorter disease-causing repeats, and more prone to contraction than expansion [88], though it is not clear why [89, 90]. The earliest onset and most deleterious phenotypes are seen around 150 CAG with longer CAG tracts giving later phenotypic changes [89– 91] though it should be noted that in mice with an inherited ∼150 CAG there is also somatic expansion and the repeat length in the susceptible cells is likely to be longer than 150 CAG. Very long repeat tracts form unusual DNA structures [2] that can inhibit transcription or translation of HTT, though there is evidence that the somatically expanded CAG tracts in DNA are transcribed and translated into expanded polyglutamine-containing mutant HTT (mHTT) [92]. Such very long repeat tracts might prevent downstream events that promote pathogenesis such as production of exon 1 fragments [11, 93, 94] or nucleo-cytoplasmic shuttling [95]. Landles et al. [96] demonstrated that a version of the R6/2 mouse with 90 CAG, R6/2(CAG)90, showed earlier mHTT nuclear aggregation than an R6/2 line with 200 CAG, R6/2(CAG)200, but later phenotype onset. The R6/2(CAG)90 brains contained nuclear aggregates that had a diffuse punctate appearance and remained partly detergent soluble, which correlated with the onset of transcriptional changes, whereas the R6/2(CAG)200 brains had cytoplasmic aggregates that gave larger inclusion bodies which correlated with behavioural changes. Both lines of mice showed somatic expansion of the CAG tract, therefore the exact CAG tract length giving rise to these different molecular sequelae remains unknown. Further detailed studies in animals with less extreme repeat lengths such as this might well yield more insights into the pathogenic mechanism and threshold. A number of models, still encoding glutamine but using a mixed CAACAG rather than a pure CAG tract, can help to establish a window for a pathogenic repeat length. 
The mixed CAACAG stabilises the repeat tract [78], preventing germline and somatic expansion. The BAC HD model with 97 glutamines encoded by a mixed CAACAG tract fulfils this criterion: the mixed CAACAG tract prevents both germline and somatic expansion in mice but is still pathogenic (Table 1) [78]. These mice have 5 copies of the transgene integrated into their genome and express the BAC HD HTT at higher than endogenous levels, estimated at three-fold the level of transcript and 1.5–2-fold the level of protein. They also notably show functional deficits, but no HTT-positive inclusions, and the translated mHTT protein is largely full-length and mainly located in the cytoplasm [78]. Other sequelae of the mixed tract may exert effects unrelated to CAG expansion: the CAACAG tract is likely to form different DNA structures to pure CAG tracts and this may well affect transcription at the locus. Differential codon usage may further affect translation efficiency of HTT: CAG is used to encode glutamine three times as frequently as CAA in brain tissue [97]. Given that the BAC HD line with a stable tract of 97 glutamine-encoding codons has a phenotype [78], this sets an upper bound to the likely intracellular pathogenic length (Table 1). The HdhQ92 mouse with a human exon 1 pure CAG tract knocked into mouse Htt has a late behavioural phenotype and mHTT inclusions. This is one of the minority of HD mouse models that has been assessed systematically for the presence of somatic instability, which is seen especially in brain and liver [60, 98, 99]. This means that the CAG tract length associated with the intracellular pathogenic threshold in these mice is unknown. Similarly, the HdhQ72 and HdhQ80 models show some mild phenotypes but also show substantial somatic instability, again with high levels of instability in the striatum, consistent with the HdhQ92 mice [28, 67, 72]. Other rodent lines with CAG tracts of less than 100 repeats might provide further clarification. 
However, in most of these, somatic expansion is likely to occur, though it has not been reported, as a human HTT transgene with a long pure CAG tract was used to generate these mice (Table 1) [54, 58, 63, 73, 74, 100]. If expansion is occurring, in the germline or somatically, it is likely this will provoke changes in phenotype within a cohort. More interesting are the models reported by Levine et al. (1999) with 71 and 94 CAGs. Both have a (coding) interruption in the middle of the CAG tract [64]. The 71 CAG mice have no reported abnormal behaviour and the 94 CAG mice only some minor changes [64]. Given the interruption in the CAG array of the transgene it is likely that somatic expansion is attenuated in these mouse models. These animal models may well be either side of a threshold that defines pathogenesis, so it would be useful to investigate whether they display somatic expansion. There are additional limitations in extrapolation from mice and other animal models to people [101]. Expression levels of the gene and protein are not necessarily at endogenous levels. Genetically the most accurate animal models are those with long CAGs knocked into their mouse Htt gene (Table 1). Most contain a human exon 1/mouse Htt chimeric sequence which has the disadvantage of not being a gene that appears in nature at all. There is an allelic series where only the mouse CAGCAA sequence encoding 7 glutamines is replaced in mouse exon 1 [102] (Table 1), but this may not show appropriate human-relevant downstream pathogenic events. Animals are not humans and have inherent limitations such as short lifespans and differences in underlying biology, seen in DNA repair systems and oxidative damage [103], that may well be important in determining the pathogenic threshold in particular models. 
While both people with HD and the animal models of disease develop phenotypic changes over time, animals do not have an age at onset of manifest disease, as defined at clinical diagnosis in humans. In both people and models the changes seen depend on what phenotypes are examined and how they are measured [38, 104, 105]. The differences in disease manifestation in people are not reflected in mice, because laboratory mice are much less genetically diverse and live in a more uniform environment. Genetic variation in HD subjects influences the presentation of many non-motor symptoms, for instance [106]. Most HD mouse models, despite possession of a repeat length that would give juvenile HD with its different clinical presentation, show a similar motor phenotype (though this may be an artefact of how this is measured) (Table 1) [47]. They also display very little frank neurodegeneration, though they often have smaller and lighter brains than their wildtype counterparts [46]. A series of matched knock-in lines with identical glutamine-encoding stretches in Htt have been generated: one with pure CAG tracts and a parallel line with CAACAG alternating tracts [51]. These lines encode 45, 80 and 105 glutamines and should reveal the pathogenic threshold in mice provided expression levels are similar in the parallel lines, though the processing of mHtt may still be different in the mouse gene from the human gene. If extrapolating the pathogenic repeat threshold from mouse models is difficult, is it then possible to garner more relevant information from other human diseases caused by similar repeat expansions?

EVIDENCE FROM OTHER DISEASES

Repeat sequences are common in the genome and biologically functional [107] and there is a growing list of diseases caused by expanded repeated sequences in DNA [1, 2, 108]. A series of neurodegenerative diseases are caused by expanded CAG sections in their coding sequence, invariably translated to a polyglutamine tract [109]. These diseases have some striking similarities: the repeat threshold at which disease is caused is in most cases a similar length [4, 48], they show a strong relationship of repeat length with age at onset of disease, many show somatic and germline expansion of their causative repeat [31, 48, 110– 112] and they have similar genetic modifiers of their ages at onset [113] (Table 2). This implies that the underlying events leading to expansion of the CAG tracts in these diseases might have common mechanisms that can be used to inform all of these diseases, though the molecular pathogenic events downstream of the CAG tract may be specific to each disease.

Table 2

Evidence from human CAG-repeat disorders

Disease	Normal CAG range	Disease CAG range	Pure repeat length/interruptions	Phenotype	Somatic expansion observed?	References
HD	6– 35	36– 200+	(CAG)_n(CAA)(CAG) CAG tract usually ends CAACAG. Alleles without CAA or with multiple CAAs observed.	Age at onset 2– 80 years (n = 40 years). Pure CAG alleles give earlier onset, multiple CAA interruptions later onset.	Yes, in neural and peripheral tissues. More CAG expansion in brain, especially striatum, globus pallidus and cerebral cortex, than peripheral tissues (except the cerebellar cortex). Repeat expansion is greater in neurons than glial cells.	[7, 8, 24, 26– 29, 122]
SCA1	6– 35	39+(no CAT) 45– 81+	(CAG)_n(CAT)(CAG)_n OR (CAG)_n(CAT)(CAG)(CAT)(CAG)_n CAG tract is usually interrupted by CAT or CATCAGCAT. Alleles without interruptions or with additional CATs observed.	Age at onset 4– 74 years (n = 40 years). Pure CAG alleles give earlier onset, CAT interruptions later onset.	Yes, in neural and peripheral tissues. More CAG expansion in brain than the peripheral tissues (except the cerebellar cortex).	[116, 123– 136]
SCA2	14– 31	32– 270+	(CAG)_n(CAA)(CAG)_n OR (CAG)_n(CAA)(CAG)₄(CAA)(CAG)_n OR (CAG)_n(CAA)(CAG)₄(CAA)(CAG)₄ (CAA)(CAG)_n CAG tract is usually interrupted by 1– 3 CAAs separated by 4x CAGs. Alleles with no CAA, or with no CAAs and a CCG or CCGCCG interruption have also been observed.	Age at onset 1– 66 years (n = 32 years). Pure CAG alleles – ataxic phenotype, CAA interruptions – Parkinsonism (SCA2-P) or ALS phenotype. Lack of CAA plus a CCG or CCGCCG exclusively observed in disease alleles	Yes, in blood. No evidence of somatic expansion has been found in alleles with interrupted repeats involved in SCA2-P phenotype.	[116, 125, 137– 150]
SCA3	12– 44	52– 86+	(CAG)₂(CAA)(AAG)(CAG)(CAA)(CAG)_n(CGG) OR (CAG)₂(CAA)(AAG)(CAG)(CAA)(CAG)_n(GGG) CAG tract ends with either CGG or GGG and is usually interrupted by two CAAs and one AAG. Alleles with one less CAA have been observed.	Age at onset 4– 70 years (n = 36 years). No relationship between repeat interruptions and phenotype. 3’ flank CGG rather than GGG is associated with disease	Yes, in neural and peripheral tissues. More CAG expansion in brain than the peripheral tissues (except the cerebellar cortex).	[116, 131– 134, 151– 155]
SCA6	7– 16	21– 28+	(CAG)_n No interruptions in disease or normal allele.	Age at onset 19– 71 years (n = 48 years).	No	[115– 117]
SCA7	7– 35	34– 460+	(CAG)_n No interruptions in disease or normal allele.	Age at onset 1– 45 years (n = 20 years).	Yes, in blood, kidney, and skeletal muscle (limited studies)	[116, 156– 162]
SCA12	7– 32	55– 78+	(CAG)_n No interruptions in disease or normal allele.	Age at onset 8– 55 years (n = 33 years).	Yes, in blood (limited studies)	[116, 163– 165]
SCA17	25– 44	47– 63+	(CAG)₃(CAA)₃(CAG)_n(CAA)(CAG)(CAA) (CAG)_n(CAA)(CAG) CAG tract typically begins with (CAG)₃(CAA)₃, ends with CAACAG and has a central CAACAGCAA interruption. Alleles without CAACAGCAA have been observed.	Age at onset 3– 75 years (n = 35 years). Alleles without CAACAGCAA are associated with disease phenotype	Yes, in blood (limited studies). Mosaicism and therefore mutation frequency in patients with pure CAG repeats (without the central CAACAGCAA) is 2-3 fold of those with CAA interruptions.	[116, 166, 167]
DRPLA	6– 35	48– 83+	(CAG)_n No interruptions in disease or normal allele.	Age at onset 1– 69 years (n = 30 years).	Yes, in neural and peripheral tissues. More CAG expansion in the brain than the peripheral tissues (except the cerebellar cortex). Repeat expansion is greater in glial cells than neurons. In cerebellum Purkinje cells and cerebral neuronal cells show more expansion than cerebellar granule cells.	[133, 153, 168– 173]
SBMA	9– 36	38– 62+	(CAG)_n No interruptions in disease or normal allele.	Age at onset 14– 75 years (n = 43 years).	Yes, in neural and peripheral tissues. Skeletal and cardiac muscle showed the most expansion whereas central nervous system tissues, liver and spleen showed the least.	[122, 174– 178]

Evidence from human CAG-repeat disorders

Table 2 shows the diseases caused by expanded CAG tracts where the repeat is definitely or likely to be translated to a polyglutamine tract in the cognate protein; it is perhaps of relevance that most of the polyglutamine protein products have a role in DNA repair [33, 109, 114]. Only spinocerebellar ataxia 6 (SCA6, OMIM 183086) shows no evidence of somatic expansion of the CAG tract, though there is genetic anticipation in families, implicating germline expansion [115–117]. SCA6 may therefore be an exception, not requiring intracellular somatic expansion to elicit pathogenesis. The CAG tract disease range is shorter than in the other diseases, and the repeat occurs in CACNA1A, encoding a calcium channel. Nuclear inclusions are seldom observed [118–120]; as in SCA2 (OMIM 183090), inclusions are mainly cytoplasmic, so it is likely that the cell toxicity in this disease is mediated through other, protein-based, mechanisms [121]. SCA7 (OMIM 164500), SCA12 (OMIM 604326), dentatorubral-pallidoluysian atrophy (DRPLA, OMIM 125370) and spinal and bulbar muscular atrophy (SBMA, OMIM 313200) show no repeat tract interruptions and all have at least some evidence of somatic expansion (Table 2). Interruptions are present in the CAG tracts of the causative genes for SCA1 (OMIM 164400), SCA2 (OMIM 183090), SCA3 (also known as Machado-Joseph disease, OMIM 109150), and SCA17 (OMIM 607136), though loss of interruptions is associated with disease-causing alleles in SCA2. These diseases are perhaps most informative in our quest to define the pathogenic CAG length range, as interruptions are known to stabilise expanded tracts, such that the inherited allele repeat length is likely to be maintained in most cells. Information on the extent to which interruptions reduce the rate of expansion and delay age at onset could be used to modify the Kaplan model and thus improve the estimate of the pathogenic threshold.

SCA1 disease-causing expanded CAG tracts are 39 CAGs or more with no interruption, or 45–81 with interruptions. Lack of interruptions gives earlier disease onset [135] and in uninterrupted alleles there is a strong length correlation with age at onset [127]. The interruptions are CAT, encoding histidine rather than glutamine, and the later onset of disease was assumed to be mediated by the resulting change in the protein [126], but it appears more likely to be mediated at the level of DNA by the somatic expansion widely seen in this disease [123, 124, 129, 131–133, 136, 179]. The pathology of SCA1 is concentrated in the cerebellum with a characteristic early and severe degeneration of the Purkinje cells [4], although recent evidence shows that subjects have widespread degeneration in deep cerebellar structures and the brainstem as well as cerebral pathology [180]. In postmortem SCA1 human brain, the highest levels of somatic expansion are not seen in the cerebellar regions and the Purkinje cells most affected in the disease [129], though at the end stage of disease the earliest affected cells may have been lost. Additionally, Purkinje cells are low in number compared with other cerebellar neurons [181], and thus rare, large expansions in these cells are likely to be underestimated when looking at whole cerebellar tissue. However, elegant work in mice has shown that it is likely to be protein interactions, particularly with capicua, that drive cell-specific intracellular pathogenesis in the Purkinje cells [182, 183]. Nevertheless, somatic expansion may drive other pathogenic events in SCA1: a similar genetic modification signal was seen in SCA1 as in HD, implying that age at onset is at least partly modulated by similar events in both diseases [113].

SCA2 is more complicated. Most CAG tract alleles have CAA interruptions, but may also be interrupted by CCG, encoding proline. Pure CAG tracts over 34 CAG cause the ataxic phenotype of SCA2 [5, 50, 145], but interrupted alleles in what would normally be considered the long normal or low SCA2 range (see Table 2) give a Parkinsonian or amyotrophic lateral sclerosis phenotype [138, 141, 148, 184]. No evidence of somatic expansion has been seen in the phenotypes associated with interruptions [149, 150] but it is seen in SCA2 [140].

SCA3 is perhaps the most interesting and informative of the SCAs with respect to the CAG length pathogenic threshold. Normal alleles may have repeat lengths up to 44 CAG, whereas disease-associated alleles range from 52–75 CAG, with most disease alleles harbouring repeat lengths of over 60 CAG (Table 2) [48]. There is a window where no repeat lengths have been reported between the normal and disease ranges in SCA3, as in DRPLA, SCA12 and SCA17. The CAG tract is usually interrupted by two CAAs and there does not seem to be an association between the presence of interruptions and phenotype. Notably the somatic mosaicism observed is of the order of a few repeats even in the presence of CAG tracts of 70–80, and expansions are more prevalent in peripheral tissues than in nervous tissue [153, 154]. Though these analyses are in relatively few brains and do not use techniques that would reveal individual large expansions, this nevertheless appears to be a more stable CAG repeat tract than in HD or SCA1, for instance, especially given the CAG tract length. This provides a repeat tract length for neurodegeneration of a minimum of 60 CAG in SCA3.

SCA17 is caused by an expanded mixed CAA/CAG tract in TBP. The normal repeat number in TBP is up to 40 CAG/CAA, reduced penetrance alleles have 41–48 repeats, and full penetrance alleles carry more than 48 CAG/CAA repeats. These interrupted repeats have a complex structure [185, 186] (Table 2). SCA17 predominantly occurs in subjects who do not carry a central CAACAGCAA interrupting sequence. The instability of the expanded CAG repeat is dependent on repeat configuration, and CAA interruption is a limiting factor for further CAG repeat expansion [166]. There is some germline instability, raising the possibility that there is some somatic expansion also occurring despite the non-pure CAG tract.

There are limitations to extrapolating from other diseases. They have different pathologies and different susceptible cell types. Notably, in most of these diseases regional pathology and somatic expansion are not correlated, but relatively few subjects have been analysed in anatomical detail and only one study has been conducted at the single cell level. This study, measuring somatic expansion in single cells in DRPLA, compared somatic mosaicism in cerebellar structures in early versus late onset patients [173]. Higher rates of expansion were more evident in late onset cases than in early onset cases, though this may well be a function of age [88, 187]. The frequency of expansions was highest in glial cells, with Purkinje cells lower and granular cells lower again. Relative levels of expression of the cognate genes in the most susceptible cells are not known, but are assumed to underlie differential spatial pathogenesis [121, 188], and transcription appears to be important in promoting somatic repeat tract length changes [85–87, 189–192]. Finally, surviving cells that are examined in post-mortem human brain may be resistant to the ongoing toxicity mechanisms and therefore uninformative about the intracellular pathogenic repeat length threshold.
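The window between normal and disease ranges discussed above can be checked directly against the bounds listed in Table 2. The following is a minimal sketch, not part of the original analysis: the dictionary simply transcribes the normal range and the lower disease bound for each disease shown in Table 2 (SCA1 is omitted because its row is not reproduced here), and the function names `unreported_window` and `classify` are illustrative assumptions.

```python
# Normal-range bounds and lower disease bound, transcribed from Table 2.
# Open-ended disease ranges (the "+" in the table) are represented only by
# their lower bound, which is all that is needed to locate the window.
TABLE2_RANGES = {
    "SCA2": {"normal": (14, 31), "disease_min": 32},
    "SCA3": {"normal": (12, 44), "disease_min": 52},
    "SCA6": {"normal": (7, 16), "disease_min": 21},
    "SCA7": {"normal": (7, 35), "disease_min": 34},
    "SCA12": {"normal": (7, 32), "disease_min": 55},
    "SCA17": {"normal": (25, 44), "disease_min": 47},
    "DRPLA": {"normal": (6, 35), "disease_min": 48},
    "SBMA": {"normal": (9, 36), "disease_min": 38},
}


def unreported_window(disease):
    """Return (first, last) repeat length lying strictly between the normal
    and disease ranges, or None if the ranges touch or overlap."""
    normal_max = TABLE2_RANGES[disease]["normal"][1]
    disease_min = TABLE2_RANGES[disease]["disease_min"]
    if disease_min - normal_max > 1:
        return (normal_max + 1, disease_min - 1)
    return None


def classify(disease, repeats):
    """Place an inherited repeat length relative to the Table 2 ranges."""
    normal_min, normal_max = TABLE2_RANGES[disease]["normal"]
    disease_min = TABLE2_RANGES[disease]["disease_min"]
    in_normal = normal_min <= repeats <= normal_max
    in_disease = repeats >= disease_min
    if in_normal and in_disease:
        return "overlap"  # e.g. SCA7, where the tabulated ranges overlap
    if in_disease:
        return "disease range"
    if in_normal:
        return "normal range"
    if repeats < normal_min:
        return "below normal range"
    return "unreported window"


if __name__ == "__main__":
    for name in TABLE2_RANGES:
        print(name, unreported_window(name))
```

For example, `classify("SCA3", 48)` falls in the unreported window, whereas `classify("SCA3", 60)` falls in the disease range. The result depends only on the tabulated bounds, so it says nothing about penetrance or about tract interruptions.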

MOUSE MODELS IN OTHER REPEAT DISORDERS

There are multiple mouse models of each non-HD polyglutamine repeat disorder, most of which have not had somatic expansion of the repeat surveyed systematically (Table 3). Most alleles were cloned from patients as transgenes or knocked into the endogenous mouse genes, and often required longer CAG repeat lengths than in humans to evoke a phenotype. As in HD animal models, transgenic mouse models of these diseases often demonstrate severe early-onset neuropathology and behavioural syndromes whilst knock-in mouse models tend to show milder late-onset phenotypes that perhaps parallel the disease more accurately, but are slower to produce phenotypes. Consistent with animal models of HD, animal models of other triplet repeat disorders tend to show increased disease phenotype as CAG repeat length increases, though this is influenced by the promoter used, transgene copy number and resultant transgene expression. Cemal et al. [193] generated a series of eight YAC SCA3 models and found that disease severity increased both with an increased CAG repeat tract length and an increased transgene copy number such that an animal with 72 CAG and one copy of the transgene developed symptoms later than an animal with 67 CAG repeats and two copies of the transgene.

Table 3

Animal models in other repeat disorders

Disease	Animal model	Construct	Promoter	PolyQ repeat length	Pure repeat length/interruptions	Phenotype	Somatic expansion observed?	References
DRPLA	AT-FL-65Q	FL human ATN1	Prp	Q65	(CAG)_n	Motor deficits from 9 weeks. NIIs from 19 weeks.	Not reported	[194, 195]
	ATN-1[Q76]	FL human ATN1	ATN1	Q76	(CAG)_n	Motor deficits from 8 weeks. DNS from 14 weeks.	Yes at 64 weeks. High levels of somatic expansion in cortex, liver, kidney. Low levels in cerebellum and heart.	[196– 198]
	Q96	FL human ATN1	ATN1	Q96	(CAG)_n	Motor deficits from 8 weeks. Decreased life span (54 weeks).	Not reported	[197]
	Q113	FL human ATN1	ATN1	Q113	(CAG)_n	Motor deficits from 7 weeks. Decreased life span from 21 weeks.	Not reported	[197]
	Atro-118Q	FL human ATN1	NSE	Q118	(CAG)_n	Motor deficits from 5 weeks. DNS and NIIs from 12 weeks. Decreased lifespan from 14 weeks.	Not reported	[199]
	Tg(ATN1)Q129Stsu	FL human ATN1	ATN1	Q129	(CAG)_n	Motor deficits from 3 weeks. DNS from day 4. NIIs at 9 weeks. Decreased lifespan from 10 weeks.	Somatic instability not reported. Germline ATN1 instability reported.	[197, 198, 200]
SCA1	Sca178Q	Chimeric human exon 8/ mouse Atxn1	Atxn1	Q78	(CAG)_n	Motor deficits from 9 months. No neuropathological changes up to 18 months.	Somatic instability not reported. Intergenerational repeat instability (+2 to – 6 CAG)	[201]
	PS-82 or B05	FL human ATXN1	Pcp2/L7	Q82	(CAG)_n	Motor deficits from 5 weeks. NIIs in Purkinje cells at 6 weeks. Purkinje cell loss 24 weeks.	Not reported. Intergenerationally stable.	[202– 204]
	PrP-tTA/TRE-SCA1	FL human ATXN1	Tet-Prp	Q100	(CAG)_n	Motor deficits from 5 weeks. Purkinje cell loss, dendritic degeneration and reactive gliosis from 20 weeks	Not reported	[205]
	Sca1154Q	Chimeric human exon 8/ mouse Atxn1	Atxn1	Q154	(CAG)_n	Motor deficits from 8 weeks. NIIs in hippocampus and cerebral cortex from 7 weeks. NIIs in cerebellum at 40 weeks.	Yes, observed in most tissues at 40 weeks. Considerable expansions in striatum (>210 CAG), spinal cord and pons.	[206, 207]
SCA2	Atxn2-CAG42-Knock-In	Expanded CAG in exon 2 of mouse Atxn2	Atxn2	Q42	(CAG)_n	Motor deficit from 18 months. DCS and CI in cerebellum at 13 months.	No somatic instability or intergenerational instability.	[208]
	58Q-5B	FL human ATXN2	Pcp2/L7	Q58	(CAG)_n	Motor deficits from 16 weeks. CIs and decreased number of Purkinje cells at 24 weeks.	Not reported	[209, 210]
	BAC-Q72	FL human ATXN2	Atxn2	Q72	(CAG)_n	Motor deficits from 8 weeks. Morphological changes in Purkinje cells at 24 weeks.	Not reported	[211]
	75Q-SCA2	FL human ATXN2	Atxn2	Q75	(CAG)_n	Motor deficits from 6 weeks. Purkinje cell degeneration from 52 weeks	Not reported. Intergenerationally stable.	[212]
	Atxn2-CAG100-KIN	Expanded CAG in exon 2 of mouse Atxn2	Atxn2	Q100	(CAG)_n	Motor deficits from 20 weeks. Progressive brain atrophy and neuronal aggregation in multiple brain regions from 13 weeks.	Yes, expansions > 128 CAG observed. Somatic mosaicism observed in multiple mice.	[213]
	ATXN2	FL human ATXN2	Pcp2/L7	Q127	(CAG)_n	Motor deficits from 8 weeks. Extranuclear aggregates and DCS in the cerebellum from 4 weeks. Purkinje cell loss from 12 weeks.	Not reported	[214]
SCA3	MJD 64.8	FL human ATXN3	ATXN3	Q64	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₅₈	Motor deficits from 3 weeks. Severe neuronal loss and gliosis. Widespread NIIs from 5 months.	Not reported	[193]
	MJD67.2	FL human ATXN3	ATXN3	Q67	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₆₁	Motor deficits from 4 weeks. NIIs in multiple brain regions.	Not reported	[193]
	Pcp2/ATXN3_69CAG	c-terminus human ATXN3	Pcp2/L7	Q69	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₆₃	Motor deficits from 3 weeks. Cerebellar atrophy from 3 weeks. DCS in cerebellum at day 5, NIIs at 7 weeks.	Not reported	[215]
	70.61 CAG	FL human ATXN3	Prp	Q70	Not reported	Motor deficits from 12 weeks. Widespread NIIs at 3-4 months	Not reported	[216, 217]
	Q71B/C	FL human ATXN3	Prp	Q71	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₆₅	Motor deficits from 11 weeks. Widespread NIIs at 2-3 months	Not reported	[218]
	MJD72.1	FL human ATXN3	ATXN3	Q72	(CAG)₄(CAA)(CAG)₆₇	Motor deficits from 34 weeks. NIIs in 10% pontine and dentate neurons	Not reported	[193]
	MJD76.1 (founder)	FL human ATXN3	ATXN3	Q76	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₇₀	Motor deficits from 4 weeks.	Not reported	[193]
	PrP/MJD77-het/hom	FL human ATXN3	Tet-Prp	Q77	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₇₂	Motor deficits from 9 weeks. NIIs in cerebral cortex at 4 weeks.	Not reported	[219]
	HA-Q79	FL human ATXN3	Pcp2/L7	Q79	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)_n	No behavioural phenotype or neurological phenotype at 23 weeks	Not reported	[220]
	HA-Q79	Truncated ATXN3	Pcp2/L7	Q79	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)_n	Ataxia from 4 weeks, failed to rear	Reported as stable on transmission.	[220]
	Ataxin-3-Q79HA	FL human ATXN3	Prp	Q79	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)_n	Motor deficits from 20 weeks. Widespread NIIs at 43 weeks.	Not reported	[221]
	Atxn3Q82/Q6	Chimeric human exon 10/ mouse Atxn3	ATXN3	Q86	(CAG)₂(CAAAAG) (CAG)₈₂	No motor impairment at 52 weeks. NIIs in deep cerebellar nuclei at 10 weeks. Extranuclear inclusions in striatum, hippocampus.	Somatic instability not reported but intergenerational instability observed	[222]
	MJD84.2	FL human ATXN3 (two copies)	ATXN3	Q84	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₇₈	Motor deficits from 4 weeks. Widespread DNS (52 weeks). NIIs cerebellum and pons. Purkinje cell loss (80 weeks).	Not reported	[193, 223]
	MJD22.1/84.1	FL human ATXN3	ATXN3	Q84	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₁₂ (CAA)(CAG)₆₅	Motor deficits not reported, mild neuronal loss and gliosis. NIIs in pontine and dentate.	Not reported	[193]
	Ki91	Chimeric human exon 7– 11/ mouse Atxn3	ATXN3	Q91	Not reported	Motor deficits from 90 weeks. Cerebellar degeneration and mild loss of Purkinje cells from 52 weeks.	Yes, age-dependent expansions prominent in the striatum, pons and testes at 40 weeks. Intergenerational instabilities also observed.	[224]
	Hemi-CMVMJD94	FL human ATXN3	CMV	Q94	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₇₇	Motor deficits from 16 weeks. Mild neuronal atrophy in pontine and dentate nuclei (16 weeks). No NIIs at 12 months.	Yes, age-dependent increase observed in multiple tissues. Pontine nuclei, substantia nigra, striatum and liver showed the highest rates.	[225]
	CMVMJD135	FL human ATXN3	CMV	Q135	(CAG)₂(CAA)(AAG) (CAG)(CAA)(CAG)₁₂₉	Motor deficits from 10 weeks. NIIs in pons and spinal cord from 20 weeks. Neuronal loss in the pons from 60 weeks.	Somatic mosaicism observed. Intergenerational instability in 54– 79% of transmissions.	[226]
	148.19 CAG	FL human ATXN3	Prp	Q148	Not clear	Motor deficits from 9 weeks. Widespread NIIs (3-4 months)	Not reported	[217]
	HDPromMJD148	FL human ATXN3	Htt	Q148	Not clear	Motor deficits from 57 weeks. NIIs in red nucleus, pons and the cerebellum from 78 weeks.	Intergenerational instability	[227]
	SCA3-KI	Expanded CAG in exon 10 of mouse Atxn3	ATXN3	Q304	(CAACAGCAG)_n	Motor deficits and widespread NIIs from 18 months	Not reported	[228]
SCA6	CT-longQ27PC	FL human CACNA1A	CMV/ ACTB	Q27	(CAG)_n	Motor deficit from 8 months. Purkinje cell degeneration.	Not reported	[229]
	Ca_v2.1 knock-in mice	Insertion of CAG into exon 47	CACNA1A	Q28	(CAG)_n	Motor deficits from 6 months	Not reported	[230]
	SCA6_30Q	Chimeric human exon 47/ mouse Cacna1a	CACNA1A	Q30	(CAG)_n	No motor impairment	Not reported	[231]
	SCA6_84Q	Chimeric human exon 47/ mouse Cacna1a	CACNA1A	Q84	(CAG)_n	Motor deficits from 30 weeks. NIIs in Purkinje cells at 22 months.	Not reported	[231, 232]
	MPI-118Q	Chimeric exon 47 with splicing site mutation	CACNA1A	Q118	4 CAA interruptions	Motor deficits from 6 weeks. Purkinje cell number decrease from 6 weeks and cytoplasmic inclusions from 7 weeks.	Not reported	[233]
SCA7	ataxin-7-Q52	FL human ATXN7	PDGF-B	Q52	(CAG)_n	Motor deficits from 7 months. DNS at 39 weeks, NIIs in cerebellum from 43 weeks	Not reported	[234, 235]
	P7E	FL human ATXN7	Pcp2	Q90	(CAG)_n	Motor deficits from 48 weeks. DNS in Purkinje cells from 3 weeks and NIIs from 35 weeks	Not reported	[236]
	R7E	FL human ATXN7	Rhodopsin	Q90	(CAG)_n	No motor deficits reported. DNS in eye retina from 2 weeks and DCS from 4 weeks	Not reported	[236, 237]
	PrP-SCA7-c92Q	FL human ATXN7	Prp	Q92	(CAG)_n	Motor deficits from 8 weeks. NIIs in photoreceptors (11 weeks) and in the pons, hippocampus and medulla (19 weeks).	Yes, mild somatic mosaicism (100 CAGs).	[238– 240]
	Gfa2-SCA7-92Q	FL human ATXN7	GFAP (Gfa2)	Q92	(CAG)_n	Motor deficits from 34 weeks. Reactive gliosis, dendritic degeneration (39 weeks), DNS and DCS in cerebellum (52 weeks)	Not reported	[241]
	PrP-floxed-SCA7-92Q BAC	FL human ATXN7	Prp	Q92	(CAG)_n	Motor deficits from 21 weeks. Loss of cells in molecular layer and loss of glial processes (40 weeks)	Not reported	[242]
	Atxn7 100Q	Chimeric human exon 3/ mouse Atxn7	Atxn7	Q100	(CAG)_n	Cerebellar molecular layer atrophy at 8-9 months.	Not reported	[243]
	B7E2	FL human ATXN7	PDGF-B	Q128	(CAG)_n	Motor deficits from 22 weeks. DCS in cerebellum and cerebral cortex (4 weeks), DNS (9 weeks) and NIIs (17 weeks).	Not reported	[244]
	Sca7 266Q/5Q	Chimeric human exon 3/ mouse Atxn7	Atxn7	Q266	(CAG)_n	Motor deficits from 5 weeks. Widespread NIIs from 5 weeks. Brain volume decrease, apoptosis increase, cell loss from 12– 15 weeks.	Not reported. Intergenerational instability reported.	[245]
SCA17	TBPQ64 (rat)	FL human TBP	Prp	Q64	64 mixed CAA/CAG	Motor deficits from 3.5 months. Degeneration of Purkinje cell layers at 9 months.	Not reported. Intergenerational stability across three generations	[246]
	TBP-71Q-27	FL human TBP	Prp	Q71	71 mixed CAA/CAG	Motor deficits from 6 weeks. DNS in multiple brain regions from 15 weeks. Purkinje cell degeneration.	Not reported	[247]
	TBP-105Q	FL human TBP	Prp	Q105	105 mixed CAA/CAG	Motor deficits from 6 weeks. NIIs in cerebellum from 10 weeks.	Not reported	[247]
	L7-hTBP	FL human TBP	Pcp2/L7	Q109	109 mixed CAA/CAG	Motor deficits from 9 weeks. Purkinje cell loss (4 weeks). DNS in brainstem, cerebellum, cerebral cortex and striatum (22 weeks).	Not reported	[248]
SBMA	Mx-AR	FL human AR	Mx	Q45	(CAG)_n	No behavioural phenotype.	Somatic instability not reported. No change in repeat across 4 generations.	[249]
	NSE-AR	FL human AR	NSE	Q45	(CAG)_n	No behavioural phenotype.	Somatic instability not reported. No change in repeat across 4 generations.	[249]
	AR YAC CAG 45	450-kb YAC construct with 100-kb human AR gene	AR	Q45	(CAG)_n	No behavioural phenotype.	No somatic instability. Intergenerational repeat instability in 10% of transmissions.	[250]
	AR48Q	Chimeric human exon 1/ mouse Ar	mAR	Q48	(CAG)_n	No behavioural phenotype at 23 months. Gene expression alteration in testes at 26 weeks.	Not reported	[251]
	AR65	FL human AR	CMV	Q65	(CAG)_n	Motor deficits from 16 weeks.	Not reported	[252]
	AR-97Q	FL human AR	Chicken β-actin	Q97	(CAG)_n	Motor deficits from 9 weeks. Widespread DNS and NIIs from 15 weeks	Not reported	[253]
	AR100	FL human AR	hAR	Q100	(CAG)_n	Motor deficits in males at 48 weeks. NIIs and DNS in spinal cord and hypothalamus (61 weeks).	Not reported	[254]
	112Q	FL human AR	Prp	Q112	(CAG)_n	Motor deficits from 6 weeks. NIIs in spinal cord (6 weeks)	Not reported	[255]
	AR113Q	Chimeric human exon 1/ mouse Ar	mAR	Q113	(CAG)_n	Motor deficits from 8 weeks. NIIs in skeletal muscle (10 weeks). Reduced fertility.	Not reported	[256, 257]
	AR120	FL human AR	CMV	Q120	(CAG)_n	Motor deficits from 3 weeks. Cell loss in the spinal cord from 13 weeks, brain volume decreases from 26 weeks	Not reported	[252]
	AR239Q	FL human AR	hAR	Q239	(CAG)_n	Motor deficits from 4 weeks. Widespread NIIs (8 weeks).	Somatic instability not reported. Minor intergenerational instability.	[258]

FL, full length; Prp, prion protein; NSE, neuron specific enolase; NIIs, neuronal intranuclear inclusions; DCS, diffuse cerebellar staining; DNS, diffuse nuclear staining; CI, cerebellar inclusions.

In some cases, allelic series have been ‘naturally’ generated through intergenerational expansions or contractions following extensive breeding [196–198, 225]. These models allow us to explore the effect of CAG repeat length in a well-controlled system. One such system is a series of transgenic DRPLA mouse models carrying 76, 96, 113 and 129 CAG, whose motor deficits and cognition worsen with CAG repeat length and age. High levels of somatic expansion were observed in the cortex, liver and kidney of the Q76 mice [196], and although no behavioural phenotype was initially reported in the Q76 after 64 weeks, they showed reduced survival and body weight when compared with non-transgenic littermates [197] as well as neuronal intranuclear accumulation [196]. Again, repeat instability is likely to occur in all models, but was only examined in Q76 animals. Genomic context is an important driver of repeat instability in these models. Early studies of independent transgenic mouse models of SBMA with 45 AR CAG repeats revealed no behavioural phenotype at this repeat length, but did highlight the importance of genomic context in mediating intergenerational repeat instability in mice possessing repeats at the lower end of the pathogenic range [249, 250]. Consistent with this, a knock-in mouse model of SBMA with 48 AR CAG repeats demonstrated no overt behavioural phenotype at 23 months but did show changes in gene expression. Genomic context also appears to govern intergenerational and somatic expansion in a mouse model of SCA7 [240]. Marked repeat instability has been observed in a transgenic mouse model of SCA3, CMVMJD94, which carries 94 CAG repeats [225]. 
Expansion was observed in multiple tissues, but within the brain mosaicism was most notable in the pontine nuclei, substantia nigra and striatum. Somatic instability correlated well with neuronal atrophy and gliosis in the pontine nuclei and substantia nigra, but pathological involvement was not seen in the striatum [225]. Another mouse model of SCA3, Ki91, and a mouse model of SCA1, Sca1154Q/2Q, also demonstrate similar tissue-specific patterns of repeat expansions, with notable expansions in the striatum [207, 224]. This extends to other repeats: the same tissue distribution of expansion is seen in models of myotonic dystrophy [187]. These data suggest that whilst repeat instability is not associated with cerebellar neuronal vulnerability in models of SCA, it is likely that repeat instability in areas other than the cerebellum might contribute to disease pathogenesis [259]. Intergenerational instability has been observed in numerous models of SCA3 despite the interrupted CAG tract; this could be due to the presence of a long uninterrupted stretch of CAG at the 3’ end of the tract [222, 224, 226, 227]. These findings suggest that somatic instability is occurring in these model systems. Some models have allowed us to examine the substantial effect that inheriting only 1 or 2 additional CAG repeats may have on phenotype [225]. SCA3 mice with 83 CAG repeats did not demonstrate behavioural differences, yet SCA3 mice with 94 CAG repeats and similar expression levels demonstrated rotarod deficits and behavioural abnormalities from 16 weeks. It was concluded that the threshold for disease in this model was between 84–94 CAG repeats. Analysis of data from two cohorts of Q94 also revealed an inverse correlation between the length of the CAG repeat tract and the time spent on the rotarod [225]. 
Whilst animal models have been invaluable in examining pathogenesis in these diseases, as in HD models, to date, it has been difficult to show directly that somatic expansions are causative to neuronal dysfunction, earlier age at onset and faster disease progression. Interpretation of results is difficult when repeat sequence and length are not clearly defined or have not been examined. Many of the issues that arise in the HD animal models also arise in animal models of other repeat disorders and for many of the same reasons. However, the conclusion from human CAG repeat disorders, and also the corresponding mouse models, would indicate that a repeat length of less than 100 CAGs is toxic to cells— at the shorter end of that estimated by Kaplan et al. [34]. Exactly where the intracellular pathogenic threshold falls remains unclear, but the evidence would place it at over 60 CAG. The question remains whether it is possible to define the intracellular pathogenic threshold more accurately.

WHAT EVIDENCE DO WE NEED TO REFINE OUR DEFINITION OF THE INTRACELLULAR PATHOGENIC THRESHOLD?

The parameters used to establish the CAG-length threshold for HD pathogenesis by Kaplan et al. [34] included the CAG size threshold for disease to arise, the subject’s inherited repeat length as measured in blood, and their current age: these data are available. However, they also require a measure of the cell group critical portion— of the most susceptible cell population(s), what proportion have died, or are dysfunctional, at onset of clinical disease? The final unknown, for HD and the other repeat diseases, is the basal expansion rate of the repeat over time. In HD, the cell group critical portion can be estimated from previous work that showed around half of the most susceptible D2R-expressing medium spiny neurons in the striatum have been lost at onset [260– 262]. This parameter could likely be estimated in living subjects from imaging data, as recent well-standardised structural imaging and clinical data has been collected in prospective studies in both manifest and premanifest subjects [38, 263]. The basal expansion rate of the repeat in the most susceptible cells is much more difficult to measure or to derive from existing data. Given the likely stochastic nature of the allele expansion process and the data available in human brain which indicates very long repeats in some cells [24], this will be hard to estimate. However, the very long repeats could be rare events and indeed, could be protective in those surviving cells, as such repeat lengths are seen to reduce phenotype severity and delay onset in mice [90]. The most useful data are likely to come from single cell approaches in a combination of human and mouse brain. It would be ideal if all the data we needed could be derived from human brain, but this is unlikely to be sufficient as human postmortem brain is at the end stage of disease, and the only cells that can be surveyed are those that have survived. 
These are likely not representative of those that died earlier, and they may well themselves have been dysfunctional at death. Nevertheless, given this is likely a stochastic process there might be surviving cells at different points in the pathological trajectory that could be used in single cell experiments to define the pathogenic CAG tract length threshold. There are methods to sequence and size the HTT CAG tract accurately [264, 265], which could potentially be applied to single cells, but these would have to be tied to the single cell RNA gene expression data— achievable, but technically challenging. Mouse brain is likely to offer a clearer picture of the dynamics of the pathological process, as tissues can be taken across the lifespan of the mouse and can be processed immediately to generate high quality single cell data. One major disadvantage of most HD mouse models is that they show little frank neurodegeneration, and in this respect do not recapitulate the human disease, but rather display neuronal dysfunction. However, for some analyses this is an advantage. Current data indicates that HD cellular dysfunction can be measured by single cell RNA-seq [266– 268], though the disconnect between behavioural and gene expression changes observed by Landles et al. [96] may make this difficult to interpret. The barrier here is gaining a measure of HTT CAG tract length in individual cells and matching that up with the gene expression signature of the same cells— the same technical challenge as noted above in human brain but perhaps easier to overcome in mouse brain. Single cell studies in mouse brain would also be a way to answer a long-standing question in HD: do vulnerable neurons die because of intracellular pathological events induced by the CAG tract, as we argue here, or do they die because of aberrant intercellular events, or both [269– 271]? 
In addition, mice with a shorter repeat length than those currently widely used in HD research will be necessary to examine the intracellular pathogenic threshold as they will need to start with a repeat length below that threshold. Another advantage of mice is that blood and brain somatic instability can be directly correlated. If this relationship can be established then it may be possible to extrapolate to human subjects where only blood is available. Using the age at onset genetic modifier data obtained in people might help to establish the pathogenic threshold. The effect sizes and directions of the known modifiers can be used to construct a polygenic risk score, which here consists of the sum of all known modifier alleles, weighted by the effect of each allele on onset [272]. This score can be used to predict somatic expansion in individuals without requiring expansion to be measured directly, thereby greatly increasing sample size, and may be incorporated into the Kaplan model [34]. This assumes that age at onset is a surrogate for measuring somatic expansion: Ciosi et al. [6] showed that individuals with higher blood DNA HTT CAG expansion have earlier HD onset and that the level of expansion in blood was associated with variants in selected genome-wide significant DNA repair genes from the modifier GWAS [7]. The question then becomes what is the relationship between blood DNA HTT CAG expansion and brain DNA HTT CAG expansion? Again, mice will be the model best able to test this directly. It would also be useful to have a prospective study of somatic expansion in blood in manifest and premanifest carriers to investigate how expansion increases over time and with proximity to onset: this could be performed in TRACK and TRACK-ON [38, 273], though studies of longer duration might be needed to establish longitudinal trajectories of repeat length. 
Then, if the relationship between expansion in blood and brain can be established experimentally, these trajectories may help infer the pathogenic HTT CAG threshold in the most susceptible cell populations in brain, likely to be neurons [25]. As larger genetic studies are performed in HD the various polygenic risk scores will become more accurate and account for more phenotypic variation. In particular, risk scores estimating somatic expansion directly will be important in improving inferences about the pathogenic CAG length threshold via the Kaplan model. This will require genome-wide association studies of somatic expansion with large sample sizes. Given the recent interest in targeting somatic expansion of the expanded CAG in HTT, it behoves us to understand just how much of that expansion we need to prevent in order to substantially delay onset and slow the progression of disease. Evidence from animal models of HD and from other diseases caused by expanded CAG tracts places the threshold for cellular dysfunction above 60 CAG but below 100 CAG. Modelling in blood gives an expansion parameter that predicts age at onset [6] and this may prove a useful biomarker, especially in clinical trials that target somatic expansion. While the prediction works well at a population level it is not currently clinically useful for individuals. There are substantial differences between individuals, and prediction could well be improved by using other peripheral tissues to measure expansion or by improving the model using genetics. As a first step, the CAG length and sequence themselves need to be more accurately measured [20, 274]. It should further be borne in mind that there are likely to be different repeat length dynamics in different cell types and the nature of the rate of expansion in different cell types is unknown. Cellular toxicity may be driven by different mechanisms in different cell types and different diseases. 
Specifically, in HD, to improve power in clinical trials, it would be very helpful to know how the HTT CAG expansion measured in blood relates to expansions in specific vulnerable cell types in the brain, and what influences that relationship. This might allow us to increase the power of clinical trials and shorten them by using a model that includes HTT CAG expansion data from blood or other peripheral tissue, genetic data and imaging data. This could speed up getting treatments to the clinic [275].
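The polygenic risk score described earlier in this section, a sum of modifier alleles weighted by each allele's effect on onset, can be illustrated with a minimal sketch. The variant labels, effect sizes and allele counts below are hypothetical placeholders chosen for illustration, not values from the modifier GWAS, and the function name is an assumption rather than part of any published pipeline.

```python
from typing import Mapping


def polygenic_score(dosages: Mapping[str, float],
                    weights: Mapping[str, float]) -> float:
    """Weighted sum of modifier allele counts for one individual.

    dosages: number of copies (0, 1 or 2) of each modifier allele carried.
    weights: effect of one copy of each allele on age at onset.
    """
    missing = set(weights) - set(dosages)
    if missing:
        # Refuse to score an individual with ungenotyped modifiers.
        raise KeyError(f"no genotype for: {sorted(missing)}")
    return sum(weights[v] * dosages[v] for v in weights)


# Hypothetical effect sizes (years of onset change per allele copy).
weights = {"variant_A": -1.5, "variant_B": 0.75, "variant_C": -0.5}
# Hypothetical allele counts for a single individual.
dosages = {"variant_A": 2, "variant_B": 1, "variant_C": 1}

score = polygenic_score(dosages, weights)
print(score)  # -2.75
```

With these placeholder values, a negative score corresponds to a predicted hastening of onset relative to that expected from inherited CAG length alone. In practice the weights would come from the effect estimates of a modifier association study.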

CONFLICT OF INTEREST

LJ is a member of the Scientific Advisory Boards of LoQus23 Therapeutics and Triplet Therapeutics. JD, SP, NR and PH have no conflicts of interest.

269 in total

1. Gcn5 loss-of-function accelerates cerebellar and retinal degeneration in a SCA7 mouse model.

Authors: Yi Chun Chen; Jennifer R Gatchel; Rebecca W Lewis; Chai-An Mao; Patrick A Grant; Huda Y Zoghbi; Sharon Y R Dent
Journal: Hum Mol Genet Date: 2011-10-14 Impact factor: 6.150

2. Ubiquitous expression of human SCA2 gene under the regulation of the SCA2 self promoter cause specific Purkinje cell degeneration in transgenic mice.

Authors: Jorge Aguiar; Julio Fernández; Anselmo Aguilar; Yssel Mendoza; María Vázquez; José Suárez; Jorge Berlanga; Silian Cruz; Gerardo Guillén; Luis Herrera; Luis Velázquez; Nieves Santos; Nelson Merino
Journal: Neurosci Lett Date: 2005-10-03 Impact factor: 3.046

3. Somatic mosaicism of the expanded CAG trinucleotide repeat in mRNAs for the responsible gene of Machado-Joseph disease (MJD), dentatorubral-pallidoluysian atrophy (DRPLA), and spinal and bulbar muscular atrophy (SBMA).

Authors: Y Ito; F Tanaka; M Yamamoto; M Doyu; M Nagamatsu; S Riku; T Mitsuma; G Sobue
Journal: Neurochem Res Date: 1998-01 Impact factor: 3.996

4. Common origin of pure and interrupted repeat expansions in spinocerebellar ataxia type 2 (SCA2).

Authors: Eliana Marisa Ramos; Sandra Martins; Isabel Alonso; Vanessa E Emmel; Maria Luiza Saraiva-Pereira; Laura Bannach Jardim; Paula Coutinho; Jorge Sequeiros; Isabel Silveira
Journal: Am J Med Genet B Neuropsychiatr Genet Date: 2010-03-05 Impact factor: 3.568

5. Regional differences of somatic CAG repeat instability do not account for selective neuronal vulnerability in a knock-in mouse model of SCA1.

Authors: Kei Watase; Koen J T Venken; Yaling Sun; Harry T Orr; Huda Y Zoghbi
Journal: Hum Mol Genet Date: 2003-09-02 Impact factor: 6.150

6. Somatic expansion of the Huntington's disease CAG repeat in the brain is associated with an earlier age of disease onset.

Authors: Meera Swami; Audrey E Hendricks; Tammy Gillis; Tiffany Massood; Jayalakshmi Mysore; Richard H Myers; Vanessa C Wheeler
Journal: Hum Mol Genet Date: 2009-05-23 Impact factor: 6.150

7. Continuous and periodic expansion of CAG repeats in Huntington's disease R6/1 mice.

Authors: Linda Møllersen; Alexander D Rowe; Elisabeth Larsen; Torbjørn Rognes; Arne Klungland
Journal: PLoS Genet Date: 2010-12-09 Impact factor: 5.917

8. Increased nuclear DNA damage precedes mitochondrial dysfunction in peripheral blood mononuclear cells from Huntington's disease patients.

Authors: Georgina Askeland; Zaneta Dosoudilova; Marie Rodinova; Jiri Klempir; Irena Liskova; Anna Kuśnierczyk; Magnar Bjørås; Gaute Nesse; Arne Klungland; Hana Hansikova; Lars Eide
Journal: Sci Rep Date: 2018-06-29 Impact factor: 4.379

9. Assessment of motor function, sensory motor gating and recognition memory in a novel BACHD transgenic rat model for huntington disease.

Authors: Yah-Se K Abada; Huu Phuc Nguyen; Rudy Schreiber; Bart Ellenbroek
Journal: PLoS One Date: 2013-07-11 Impact factor: 3.240

10. Genetic Risk Underlying Psychiatric and Cognitive Symptoms in Huntington's Disease.

Authors: Natalie Ellis; Amelia Tee; Branduff McAllister; Thomas Massey; Duncan McLauchlan; Timothy Stone; Kevin Correia; Jacob Loupe; Kyung-Hee Kim; Douglas Barker; Eun Pyo Hong; Michael J Chao; Jeffrey D Long; Diane Lucente; Jean Paul G Vonsattel; Ricardo Mouro Pinto; Kawther Abu Elneel; Eliana Marisa Ramos; Jayalakshmi Srinidhi Mysore; Tammy Gillis; Vanessa C Wheeler; Christopher Medway; Lynsey Hall; Seung Kwak; Cristina Sampaio; Marc Ciosi; Alastair Maxwell; Afroditi Chatzi; Darren G Monckton; Michael Orth; G Bernhard Landwehrmeyer; Jane S Paulsen; Ira Shoulson; Richard H Myers; Erik van Duijn; Hugh Rickards; Marcy E MacDonald; Jong-Min Lee; James F Gusella; Lesley Jones; Peter Holmans
Journal: Biol Psychiatry Date: 2019-12-17 Impact factor: 12.810

8 in total

1. Suppression of trinucleotide repeat expansion in spermatogenic cells in Huntington's disease.

Authors: In K Cho; Charles A Easley; Anthony W S Chan
Journal: J Assist Reprod Genet Date: 2022-09-06 Impact factor: 3.357

Review 2. Huntington's disease iPSC models-using human patient cells to understand the pathology caused by expanded CAG repeats.

Authors: Julia Kaye; Terry Reisine; Steven Finkbeiner
Journal: Fac Rev Date: 2022-06-28

3. Special Issue: DNA Repair and Somatic Repeat Expansion in Huntington's Disease.

Authors: Lesley Jones; Vanessa C Wheeler; Christopher E Pearson
Journal: J Huntingtons Dis Date: 2021

Review 4. FAN1, a DNA Repair Nuclease, as a Modifier of Repeat Expansion Disorders.

Authors: Amit L Deshmukh; Antonio Porro; Mohiuddin Mohiuddin; Stella Lanni; Gagan B Panigrahi; Marie-Christine Caron; Jean-Yves Masson; Alessandro A Sartori; Christopher E Pearson
Journal: J Huntingtons Dis Date: 2021

5. Approaches to Sequence the HTT CAG Repeat Expansion and Quantify Repeat Length Variation.

Authors: Marc Ciosi; Sarah A Cumming; Afroditi Chatzi; Eloise Larson; William Tottey; Vilija Lomeikaite; Graham Hamilton; Vanessa C Wheeler; Ricardo Mouro Pinto; Seung Kwak; A Jennifer Morton; Darren G Monckton
Journal: J Huntingtons Dis Date: 2021

Review 6. The Contribution of Somatic Expansion of the CAG Repeat to Symptomatic Development in Huntington's Disease: A Historical Perspective.

Authors: Darren G Monckton
Journal: J Huntingtons Dis Date: 2021

7. Huntingtin structure is orchestrated by HAP40 and shows a polyglutamine expansion-specific interaction with exon 1.

Authors: Rachel J Harding; Justin C Deme; Johannes F Hevler; Sem Tamara; Alexander Lemak; Jeffrey P Cantle; Magdalena M Szewczyk; Nola Begeja; Siobhan Goss; Xiaobing Zuo; Peter Loppnau; Alma Seitova; Ashley Hutchinson; Lixin Fan; Ray Truant; Matthieu Schapira; Jeffrey B Carroll; Albert J R Heck; Susan M Lea; Cheryl H Arrowsmith
Journal: Commun Biol Date: 2021-12-08

Review 8. Drugging DNA Damage Repair Pathways for Trinucleotide Repeat Expansion Diseases.

Authors: Caroline L Benn; Karl R Gibson; David S Reynolds
Journal: J Huntingtons Dis Date: 2021
