Stefanie L Butland, Rebecca S Devon, Yong Huang, Carri-Lyn Mead, Alison M Meynert, Scott J Neal, Soo Sen Lee, Anna Wilkinson, George S Yang, Macaire M S Yuen, Michael R Hayden, Robert A Holt, Blair R Leavitt, B F Francis Ouellette.
Abstract
BACKGROUND: Expansion of polyglutamine-encoding CAG trinucleotide repeats has been identified as the pathogenic mutation in nine different genes associated with neurodegenerative disorders. The majority of individuals clinically diagnosed with spinocerebellar ataxia do not have mutations within known disease genes, and it is likely that additional ataxias or Huntington disease-like disorders will be found to be caused by this common mutational mechanism. We set out to determine the length distributions of CAG-polyglutamine tracts for the entire human genome in a set of healthy individuals in order to characterize the nature of polyglutamine repeat length variation across the human genome, to establish the background against which pathogenic repeat expansions can be detected, and to prioritize candidate genes for repeat expansion disorders.
Year: 2007 PMID: 17519034 PMCID: PMC1896166 DOI: 10.1186/1471-2164-8-126
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Q-tract length variation in genes containing polyglutamine-encoding CAG-type trinucleotide repeats, sorted by Q-tract
| Chromosome | Gene | CAG-tract | Q-tract | N | Observed Q-tract length (min – max) | Q-tract mean | Q-tract variance |
|---|---|---|---|---|---|---|---|
| 17p13.2 | MINK1* | G4N1 | Q4LQ5 (SwP) | 162 | 5 – 5 | 5.0 | 0 |
| 9q34.11 | CIZ1 | | Q6 | 154 | 6 – 6 | 6.0 | 0 |
| 7q36.2 | PAXIP1L* | | Q7 | 168 | 7 – 7 | 7.0 | 0 |
| 11q24.3 | PRDM10 | | Q8 | 172 | 8 – 8 | 8.0 | 0 |
| 4q31.1 | MAML3a* | | Q9 | 156 | 8 – 8 | 8.0 | 0 |
| 6p21.1 | TFEB | | Q10 | 162 | 10 – 10 | 10.0 | 0 |
| 19p13.11 | CHERP | | Q12 | 192 | 12 – 12 | 12.0 | 0 |
| 12q21.2 | PHLDA1 | G5A1 | Q15 | 212 | 14 – 14 | 14.0 | 0 |
| 16p13.3 | CREBBP | | Q18 | 158 | 18 – 18 | 18.0 | 0 |
| 4q31.1 | MAML3b* | G3A1G3A1G1A1 | Q18 | 166 | 18 – 18 | 18.0 | 0 |
| 20q11.22 | NCOA6* | G4A4 | Q25 | 166 | 25 – 25 | 25.0 | 0 |
| Xq13.1 | MED12* | G5A1G2A1G1A1G5A1G1A1 | Q26X4Q6 | 205 | 26 – 27 | 26.0 | 0 |
| 20q13.12 | PRKCBP1 | | Q8 | 152 | 8 – 9 | 8.0 | 0.01 |
| 15q24.1 | ARID3B | | Q11 | 212 | 11 – 12 | 11.0 | 0.01 |
| 22q11.21 | PCQAPa | G4A1G3N1G5N3G7A1G3N8G3N5G5N1 | Q8FQ5X3Q11X16Q5LQ8 | 152 | 11 – 12 | 11.0 | 0.01 |
| 3p24.3 | SATB1 | G1A1G3A1G1A1 | Q15 | 174 | 15 – 16 | 15.0 | 0.01 |
| 6q16.2 | POU3F2 | G3A1G1A1G3A1G2A1 | Q21 | 148 | 21 – 22 | 21.0 | 0.01 |
| Xq22.3 | FRMPD3 | G4A3G4A3G3A3 | Q27 (SwP) | 184 | 26 – 27 | 27.0 | 0.01 |
| 2q35 | TNS | | Q9 | 178 | 9 – 11 | 9.0 | 0.02 |
| 19p13.12 | BRD4 | | Q5RQEQ8 | 140 | 8 – 9 | 8.0 | 0.03 |
| 12p13.31 | PHC1 | | Q15 | 170 | 13 – 15 | 15.0 | 0.05 |
| 9q32 | C9orf43 | | Q8 | 168 | 8 – 9 | 8.1 | 0.07 |
| 1q21.3 | TNRC4 | A1 | Q15 | 150 | 15 – 18 | 15.0 | 0.08 |
| 17q12 | SOCS7 | | Q8 (SwP) | 134 | 8 – 9 | 8.1 | 0.12 |
| 1p31.1 | ST6GALNAC5 | | Q12 | 150 | 12 – 14 | 12.1 | 0.13 |
| 15q26.1 | POLG | | Q13 | 164 | 13 – 15 | 13.1 | 0.16 |
| 22q13.1 | TNRC6B | | Q8 | 166 | 7 – 8 | 7.8 | 0.17 |
| 12q13.12 | MLL2* | G5N1A1G1A1G1A1N1 | Q5LQ5LQ7LQ5LQ4LQ8LQ7LQ6LQ10FQ8 | 184 | 8 – 11 | 10.2 | 0.21 |
| 7p14.1 | POU6F2 | | Q10 | 168 | 6 – 11 | 10.0 | 0.22 |
| Xq28 | CXorf6 | G1A1 | Q11X92Q10 | 168 | 11 – 12 | 11.6 | 0.25 |
| 12p13.33 | DCP1B | | Q10 | 136 | 10 – 12 | 10.5 | 0.26 |
| 17q23.2 | VEZF1 | | Q13 (through intron) | 176 | 8 – 15 | 13.1 | 0.29 |
| 22q11.21 | PCQAPb | G3A1G2N9A2G1A1 | Q6X9Q16 | 152 | 12 – 18 | 16.1 | 0.34 |
| 3p14.1 | MAGI1 | G5A1G3A1 | Q20 | 168 | 16 – 21 | 20.3 | 0.36 |
| 4q21.21 | BMP2K | G8A1G1A1G4A1G1A1 | Q27 | 148 | 23 – 28 | 26.9 | 0.36 |
| 16q22.1 | NFAT5* | | Q17 | 168 | 11 – 19 | 17.0 | 0.37 |
| 12p13.31 | ZNF384 | | Q16 | 214 | 11 – 20 | 15.2 | 0.47 |
| 22q12.1 | MN1* | A1 | Q28 | 180 | 26 – 30 | 28.7 | 0.53 |
| 12q24.33 | EP400 | G6A2 | Q29 | 158 | 28 – 31 | 28.8 | 0.53 |
| 12q23.2 | ASCL1 | | Q12 | 148 | 9 – 15 | 12.3 | 0.65 |
| 6q25.3 | ARID1B | | Q18 | 152 | 16 – 23 | 17.7 | 0.69 |
| 11q21 | MAML2 | G1A1G2A1 | Q31X5Q7X5Q27 (through intron) | 168 | 27 – 31 | 28.3 | 0.75 |
| 12q24.12 | | | Q23 | 124 | 17 – 27 | 22.2 | 0.79 |
| 9p24.3 | SMARCA2 | G1A2G3A1 | Q23 | 130 | 18 – 24 | 22.7 | 0.79 |
| 20q13.12 | NCOA3 | G6A1 | Q29 | 150 | 26 – 31 | 28.4 | 0.80 |
| 17p11.2 | RAI1 | | Q14 | 184 | 11 – 17 | 14.6 | 0.84 |
| 7q31.1 | FOXP2* | G4A1G4A2G2A2G3A5G2A2 | Q40 | 100 | 34 – 40 | 39.8 | 0.85 |
| 3p14.1 | | | Q10 | 184 | 7 – 14 | 10.4 | 0.89 |
| 19q13.2 | NUMBL | G6A1G1A1 | Q20 | 156 | 18 – 20 | 18.7 | 0.93 |
| 12q24.31 | NCOR2 | G3A2 | Q16 (through intron) | 172 | 13 – 20 | 16.9 | 0.95 |
| 15q26.3 | MEF2A | | Q11 | 174 | 8 – 16 | 10.2 | 1.13 |
| 14q24.3 | C14orf4 | A1G1A1G1A1G6A1 | Q25 (through intron) | 150 | 20 – 31 | 23.4 | 1.17 |
| 3q13.2 | KIAA2018 | | Q14 (through intron) | 150 | 11 – 16 | 12.6 | 1.44 |
| 1q21.3 | DENND4B | A1G5A1 | Q16 | 156 | 13 – 17 | 15.2 | 2.04 |
| 6p22.3 | | G12T1G1T1 | Q12HQHQ14 | 130 | 11 – 21 | 14.6 | 2.23 |
| 6q27 | | G3A3G8A1G1A1 | Q38 | 158 | 30 – 41 | 36.9 | 2.26 |
| 19p13.3 | | | Q13 | 112 | 7 – 16 | 12.1 | 2.42 |
| 16p12.1 | TNRC6A | | Q8 | 166 | 4 – 8 | 7.2 | 2.50 |
| 6p21.1 | RUNX2 | A1G3A1G4A1 | Q23 | 100 | 18 – 30 | 22.5 | 3.04 |
| 16q22.1 | THAP11 | G3A1G5A1G2A1G5A1 | Q29 | 170 | 18 – 30 | 28.5 | 3.12 |
| 1q22 | KCNN3 | G7A1G4N25 | Q12X25Q14 | 170 | 15 – 25 | 20.3 | 3.98 |
| 4p16.3 | | | Q21 | 252 | 9 – 33 | 17.2 | 7.18 |
| Xq12 | | | Q23X5Q6 | 180 | 14 – 33 | 23.7 | 9.34 |
| 12p13.31 | | G1A1G1A1 | Q19 | 168 | 11 – 27 | 17.6 | 11.6 |
| 14q32.12 | | G2A1N1G1A1 | Q3KQ10 | 168 | 10 – 27 | 17.8 | 29.2 |
| 2q37.1 | TNRC15 | | Q6 | n.d. | n.d. | n.d. | n.d. |
(a) Boldface text marks a gene known to cause disease by expansion of a polyglutamine-encoding CAG trinucleotide repeat. 'a' and 'b' after MAML3 and PCQAP denote two targets within these genes. Genes marked with an asterisk (*) contain an additional repeat target that was not screened in this study.
(b) G denotes "CAG", A denotes "CAA" and N denotes a non-glutamine codon, each followed by the number of tandem repeats of that codon. Boldface text marks the longest uninterrupted CAG-tract.
(c) X indicates a non-glutamine amino acid; SwP indicates peptide sequence obtained from SwissProt record.
(d) N denotes number of alleles screened.
(e) Data for N, Observed Q-tract Length Min-Max, Q-tract Mean, Q-tract Variance, taken from Andres et al. [26].
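The codon notation described in the footnotes (G for CAG, A for CAA, N for a non-glutamine codon, each followed by a repeat count) can be read mechanically. The sketch below is only an illustration under stated assumptions: the function names are hypothetical, the example pattern is invented rather than copied from the table, and any letter other than G or A (including the T that appears in one row without a definition) is assumed to be a non-glutamine interruption.

```python
import re

# One token is a single capital letter followed by its repeat count, e.g. "G4".
TOKEN = re.compile(r"([A-Z])(\d+)")

def parse_pattern(pattern):
    """Split a pattern such as 'G4A4' into (code, count) pairs."""
    tokens = TOKEN.findall(pattern)
    # Reject strings that contain anything besides letter+count tokens.
    if "".join(code + count for code, count in tokens) != pattern:
        raise ValueError(f"unrecognised pattern: {pattern!r}")
    return [(code, int(count)) for code, count in tokens]

def expand(pattern):
    """Expand 'G2A1' into one letter per codon: 'GGA'."""
    return "".join(code * count for code, count in parse_pattern(pattern))

def longest_run(pattern, codes):
    """Length of the longest stretch made up only of the given codon codes."""
    best = current = 0
    for code in expand(pattern):
        current = current + 1 if code in codes else 0
        best = max(best, current)
    return best

# Invented example pattern (an assumption, not a table row).
example = "G4A1G3N1G5"
longest_cag = longest_run(example, "G")         # uninterrupted CAG codons only
longest_glutamine = longest_run(example, "GA")  # CAG or CAA, i.e. glutamine
```

Passing `"G"` restricts the run to pure CAG codons, while `"GA"` measures the glutamine tract that the two codons encode together.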
Figure 1. Relationship between length of longest uninterrupted CAG-tract and Q-tract length variance. (A) All targets. HD Q-tract length variance from Andres et al. [26]. Correlation = 0.62, not including ATXN3. (B) Higher resolution view of targets with Q-tract length variance < 4.0. Dashed lines at 10 CAG and 0.79 variance represent the cutoff for identifying candidate genes for polyglutamine expansion disorders. See text for list of genes falling in this area.
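The two quantities in the Figure 1 caption, a correlation coefficient and a pair of cutoffs, can be sketched as follows. This is a minimal illustration, not the authors' procedure: the caption does not say which correlation coefficient was used (Pearson is assumed here), the direction of the cutoff (at or above both dashed lines) is also an assumption, and the target names and values are invented.

```python
from math import sqrt

# Cutoffs quoted in the Figure 1 caption.
MIN_CAG_RUN = 10
MIN_VARIANCE = 0.79

def is_candidate(longest_cag_run, variance,
                 min_run=MIN_CAG_RUN, min_variance=MIN_VARIANCE):
    """Assumption: a target qualifies when it is at or above both cutoffs."""
    return longest_cag_run >= min_run and variance >= min_variance

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented (longest CAG run, Q-tract variance) pairs, for illustration only.
targets = {"target_1": (6, 0.0), "target_2": (12, 0.95), "target_3": (21, 3.5)}

candidates = sorted(name for name, (run, var) in targets.items()
                    if is_candidate(run, var))
r = pearson([run for run, _ in targets.values()],
            [var for _, var in targets.values()])
```

Keeping the cutoffs as keyword arguments makes it easy to see how the candidate list changes if either dashed line is moved.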
Figure 2. Example distributions of normal Q-tract lengths. (A) ATXN3, ataxin 3. (B) RAI1, retinoic acid induced 1.
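The per-target summary columns in the Q-tract table (number of alleles, observed minimum and maximum, mean, variance) are all derived from a list of allele lengths like those plotted in Figure 2. The sketch below shows one way to compute them; the function name and the allele lists are hypothetical, and the use of the sample variance (n - 1 denominator) is an assumption, since the source does not state which estimator was used.

```python
from statistics import mean, variance

def summarize_alleles(lengths):
    """Summarise a list of Q-tract lengths, one entry per screened allele."""
    if len(lengths) < 2:
        raise ValueError("need at least two alleles to estimate variance")
    return {
        "n": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": mean(lengths),
        # Assumption: sample variance (n - 1 denominator).
        "variance": variance(lengths),
    }

# Invented allele lengths, for illustration only.
invariant = summarize_alleles([8] * 6)       # every allele identical
variable = summarize_alleles([8, 8, 9, 8])   # one allele one repeat longer
```

An invariant target yields a variance of zero and identical minimum and maximum, which is the pattern seen in the first rows of the table.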
Functional classification of CAGpolyQ genes: Gene Ontology over-representation analysis.
| regulation of biological process ( | 37 | 2.3 |
| regulation of physiological process ( | 36 | 2.5 |
| regulation of metabolism ( | 29 | 3.0 |
| positive regulation of metabolism ( | 7 | 6.5 |
| nucleobase, nucleoside, nucleotide and nucleic acid metabolism (4) GO:0006139 | 34 | 2.5 |
| transcription regulator activity ( | 24 | 4.0 |
| transcription cofactor activity ( | 11 | 8.8 |
| transcription coactivator activity ( | 9 | 13.4 |
| nucleic acid binding (2) GO:0003676 | 35 | 2.8 |
| DNA binding (3) GO:0003677 | 28 | 3.1 |
| transcription factor binding (3) GO:0008134 | 12 | 8.3 |
| organelle ( | 43 | 1.7 |
| membrane-bound organelle (2) GO:0043227 | 43 | 1.9 |
| intracellular (2) GO:0005622 | 47 | 1.5 |
| intracellular organelle ( | 43 | 1.7 |
| intracellular membrane-bound organelle ( | 43 | 1.9 |
| nucleus (3, | 41 | 2.7 |
| nucleoplasm (4, | 11 | 4.1 |
All levels for each GO term are indicated, with boldface indicating one path through the GO
*p < 0.00004 for all GO terms listed except nucleoplasm, p = 0.0001.
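For readers unfamiliar with over-representation analysis, the sketch below shows one common way such figures are obtained. It is an assumption-laden illustration: the source does not state which statistical test produced the p-values, so a hypergeometric upper tail is used here as a stand-in, and the function names and the small example counts are invented.

```python
from math import comb

def fold_enrichment(k, n, K, N):
    """Ratio of the annotated fraction in the gene set to that in the background.

    k: genes in the set carrying the GO term, n: size of the gene set,
    K: background genes carrying the term, N: size of the background.
    """
    return (k / n) / (K / N)

def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn without replacement from N, K annotated."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Invented counts: 2 of 2 selected genes annotated, 5 of 10 in the background.
fold = fold_enrichment(2, 2, 5, 10)
p_value = hypergeom_upper_tail(2, 2, 5, 10)
```

Because many GO terms are tested at once, a real analysis would normally also apply a multiple-testing correction, which this sketch leaves out.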
Figure 3. Functional classification of CAGpolyQ genes: shared Gene Ontology term analysis. Known disease genes are marked with a 'D', candidate disease genes are marked with a 'C' and genes with invariant Q-tracts (Table 1) are marked with an 'I'. Clusters of genes are labeled with the GO terms that best described each cluster. GO terms shared by gene pairs are listed in Additional file 7 and Additional file 8. Genes not represented in a graph either had no annotation under that GO namespace or did not share a GO term with a score above the 99th percentile. (A) Biological process. Genes not represented: ARID1B, ATXN1, ATXN2, BRD4, C9ORF43, DCP1B, HD, DENND4B, FRMPD3, MAML2, PAXIP1L, PHC1, PHLDA1, SOCS7, THAP11, TNRC15, TNRC6A, TNRC6B and TNS. (B) Molecular function. Genes not represented: ATN1, ATXN1, ATXN3, BRD4, C14ORF4, C9ORF43, CHERP, DCP1B, KCNN3, DENND4B, FRMPD3, MAML2, NUMBL, PAXIP1L, PCQAP, PHLDA1, SOCS7, ST6GALNAC5, TNRC15, TNRC6A, TNRC6B and TNS.