April L Darling, Vladimir N Uversky.
Abstract
Intrinsically disordered proteins and proteins with intrinsically disordered regions have been shown to be highly prevalent in disease. Furthermore, disease-causing expansions of the regions containing tandem amino acid repeats often push repetitive proteins towards formation of irreversible aggregates. In fact, in disease-relevant proteins, increased repeat length often correlates positively with aggregation efficiency, disease severity, and penetrance, and negatively with the age of disease onset. The major categories of repeat extensions involved in disease include poly-glutamine and poly-alanine homorepeats, which are often located in intrinsically disordered regions, as well as repeats in non-coding regions of genes that typically encode proteins with ordered structures. Repeats in such non-coding regions can be expressed at the mRNA level. Although they can affect the expression levels of the encoded proteins, they are not translated as part of the affected protein and have no effect on its structure. However, in some cases, the repetitive mRNAs can be translated in a non-canonical manner, generating highly repetitive peptides of different lengths and amino acid compositions. The repeat extension-caused aggregation of a repetitive protein may represent a pivotal step in its transformation into a proteotoxic entity that can lead to pathology. The goals of this article are to systematically analyze the molecular mechanisms of the proteinopathies caused by poly-glutamine and poly-alanine homorepeat expansions, as well as by the polypeptides generated as a result of microsatellite expansions in non-coding gene regions, and to examine the related proteins. We also present results of the analysis of the prevalence and functional roles of intrinsic disorder in proteins associated with pathological repeat expansions.
Keywords: homorepeats; intrinsically disordered protein; intrinsically disordered protein region; protein aggregation; protein repeat expansion; proteinopathies
Year: 2017 PMID: 29186753 PMCID: PMC6149999 DOI: 10.3390/molecules22122027
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Major characteristics of genes with pathological repeat expansions and proteins they encode.
| Repeat Type | Location | Gene | Disease a | Repeat Sequence | WT Length | Pathogenic Length b | % Disorder c |
|---|---|---|---|---|---|---|---|
| Poly-alanine | Exon | | SPD II | GCG | 15 | >21 | 33.82 |
| | Exon | | HFGS | GCG | 12 | >17 | 34.28 |
| | Exon | | CCD | GCG | 17 | >26 | 62.96 |
| | Exon | | HPE | GCG | 9 | -- | 54.89 |
| | Exon | | CCHS | GCG | 20 | -- | 54.15 |
| | Exon-X chrom. | | XLMR + GHD | GCG | 15 | >25 | 51.12 |
| | Exon-X chrom. | | XLMR | GCG | 16 | >17, >22 | 59.07 |
| | Exon | | BPEIS | GCG | 14 | >21, >24 | 47.34 |
| | Exon | | OPMD | GCG | 10 | >11, >16 | 59.80 |
| Poly-glutamine | Exon | | Schizo. d | CAG | -- | -- | 37.50 |
| | Exon | | HDL2 | CAG/CTG | 6 to 28 | >41 | 51.07 |
| | Exon | | HD | CAG | 6 to 35 | >35 | 19.10 |
| | Exon | | DRPLA | CAG | 3 to 36 | >48 | 86.05 |
| | Exon | | SBMA | CAG | 9 to 36 | >37 | 54.13 |
| | Exon | | SCA1 | CAG | 6 to 39 | >39 | 54.97 |
| | Exon | | SCA2 | CAG | 14 to 32 | >33 | 79.13 |
| | Exon | | SCA3 | CAG | 12 to 40 | >54 | 42.03 |
| | Exon | | SCA6 | CAG | 4 to 18 | >20 | 42.08 |
| | Exon | | SCA7 | CAG | 7 to 17 | >33 | 71.30 |
| | Exon | | SCA17 | CAG | 25 to 42 | >44 | 46.31 |
| Non-coding | 5′ UTR | | SCA12 | CAG | 7 to 32 | >54 | 7.67 |
| | 5′ UTR-X chrom. | | FXMR, FXTAS | CGG | 6 to 55 | >200, >55 | 38.29 |
| | 5′ UTR | | FRA12A MR | CGG | 6 to 23 | >200 | 19.73 |
| | 5′ UTR-X chrom. | | FRAXE MR | GCC | -- | >200 | 58.12 |
| | 5′ UTR | | C9ALS/FTD | GGGGCC | -- | Unknown | 2.70 |
| | Intron | | FRDA | GAA | 7 to 22 | >66 | 40.00 |
| | Intron | | DM2 | CCTG | <27 | >75 | 19.77 |
| | Intron | | SCA10 | ATTCT | 10 to 29 | >279 | 5.26 |
| | Intron | | SCA36 | GGCCTG | 3 to 8 | >1500 | 26.26 |
| | Intron | | FECD | CTG | -- | >50 | 88.01 |
| | 3′ UTR | | DM1 | CTG | 5 to 37 | >50 | 14.63 |
| | 3′ UTR | | SCA8 e | CTG | 6 to 37 | >74 | 68.00 f |
| | Exon | | SCA8 e | CAG | 15 to 50 | 71 to 1300 | 100.00 f |
| | Promoter | | EPM1 | CCCCGCCCCGCG | 2 to 3 | >14 | 40.82 |
a SPD II, synpolydactyly; HFGS, hand-foot-genital syndrome; CCD, cleidocranial dysplasia; HPE, holoprosencephaly cephalic disorder; CCHS, congenital central hypoventilation syndrome; XLMR + GHD, X-linked mental retardation with isolated growth hormone deficiency; XLMR, ARX-nonsyndromic X-linked mental retardation; BPEIS, blepharophimosis, ptosis, and epicanthus inversus syndrome; OPMD, oculopharyngeal muscular dystrophy; Schizo., schizophrenia; HDL2, Huntington’s disease-like 2; HD, Huntington’s disease; DRPLA, dentatorubral-pallidoluysian atrophy; SBMA, spinal and bulbar muscular atrophy; SCA, spinocerebellar ataxia; FXMR, fragile X mental retardation; FXTAS, fragile X-associated tremor/ataxia syndrome; FRA12A MR, FRA12A (fragile site 12A) mental retardation; FRAXE MR, fragile XE mental retardation; ALS, amyotrophic lateral sclerosis; FTD, frontotemporal dementia; FRDA, Friedreich ataxia; DM, myotonic dystrophy; FECD, Fuchs endothelial corneal dystrophy; EPM1, myoclonus epilepsy of Unverricht-Lundborg type. WT and pathogenic lengths refer to the number of sequence repeats.
b Pathogenic length indicates the repeat-length threshold above which the carrier protein or gene causes pathology.
c MobiDB-based predicted consensus disorder content is shown for query proteins (http://mobid.bio.unipd.it/) [127,128].
d Although CAG repeat tract length in KCNN3 was correlated with schizophrenia, this is neither a pathological repeat expansion nor a cause of disease.
e SCA8 is caused by bidirectional transcription at the SCA8 locus, which contains the ATXN8OS and ATXN8 genes, and is therefore considered a ′CTG*CAG′ repeat expansion disease, referring to the complementary base pairs of the ATXN8OS and ATXN8 genes.
f For the ATXN8OS and ataxin-8 proteins, disorder content was calculated as the average of the overall percentages of residues predicted to be disordered by PONDR® VLXT, PONDR® VL3 and PONDR® VSL2.
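The threshold logic of footnote b can be illustrated with a minimal sketch. The repeat counts below are copied from four rows of the table; the `RepeatRange` type, the `classify` function, and the "intermediate" label for counts that fall between the WT upper bound and the pathogenic threshold are illustrative assumptions, not terms used by the authors.

```python
from typing import NamedTuple


class RepeatRange(NamedTuple):
    wt_max: int            # upper end of the WT repeat range listed in the table
    pathogenic_above: int  # pathology is reported above this repeat count


# Values copied from the table above (only a few rows, for illustration).
THRESHOLDS = {
    "HD": RepeatRange(wt_max=35, pathogenic_above=35),
    "SCA3": RepeatRange(wt_max=40, pathogenic_above=54),
    "DM1": RepeatRange(wt_max=37, pathogenic_above=50),
    "FRDA": RepeatRange(wt_max=22, pathogenic_above=66),
}


def classify(disease: str, repeats: int) -> str:
    """Place a repeat count relative to the table's thresholds.

    The "intermediate" label is an assumption made for this sketch; the
    table itself only lists a WT range and a pathogenic threshold.
    """
    limits = THRESHOLDS[disease]
    if repeats > limits.pathogenic_above:
        return "pathogenic"
    if repeats <= limits.wt_max:
        return "non-pathogenic"
    return "intermediate"
```

For example, `classify("SCA3", 45)` falls between the listed WT maximum of 40 and the pathogenic threshold of 54, so this sketch reports it as "intermediate".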
Figure 1. Major known functions of proteins with pathogenic repeat expansions. The functions of these proteins vary with the type of expansion present. Proteins with poly-alanine (poly-Ala) expansions show the least functional variability, with 8 of the 9 participating in some form of transcription regulation. Proteins with pathogenic poly-glutamine (polyQ) expansions have more varied functions, but the majority also participate in transcription regulation. Pathogenic repeats in non-coding regions occur in genes encoding proteins with the most varied functions, ranging from catalytic proteins to receptors. Since the repeat extension occurs in a non-coding region of the gene and does not alter the protein itself, it is not surprising that these proteins share few common functions.
Figure 2. Evaluation of intrinsic disorder propensities of 33 proteins associated with the proteinopathies caused by nucleotide repeat expansions. Intrinsic disorder predisposition was evaluated with the PONDR® VSL2 predictor, which is one of the more accurate stand-alone tools for predicting the intrinsic disorder status of a target protein. This tool is known to be statistically better for proteins containing both ordered and disordered regions [208,209]. (A) Homeobox protein HOXD13 (UniProt ID: P35453); (B) Homeobox protein HOXA13 (UniProt ID: P31271); (C) Runt-related transcription factor 2, RUNX2 (UniProt ID: Q13950); (D) Zinc finger protein ZIC2 (UniProt ID: O95409); (E) Paired mesoderm homeobox protein 2B (UniProt ID: Q99453); (F) Transcription factor SOX3 (UniProt ID: P41225); (G) Homeobox protein ARX (UniProt ID: Q96QS3); (H) Human FOXL2 (UniProt ID: P58012); (I) PABP2/PABPN1 (UniProt ID: Q86U42); (J) Small conductance calcium-activated potassium channel protein 3 (SK3, UniProt ID: Q9UGI6); (K) Human junctophilin-3 (JP-3, UniProt ID: Q8WXH2); (L) Human huntingtin (UniProt ID: P42858); (M) Atrophin-1 (UniProt ID: P54259); (N) Human androgen receptor (AR, UniProt ID: P10275); (O) Ataxin-1 (UniProt ID: P54253); (P) Ataxin-2 (UniProt ID: Q99700); (Q) Human ataxin-3 (UniProt ID: P54252); (R) Voltage-dependent P/Q-type calcium channel subunit α1A (CACNA1A, UniProt ID: O00555); (S) Ataxin-7 (UniProt ID: O15265); (T) TATA-box-binding protein (TBP, UniProt ID: P20226); (U) Synaptic functional regulator FMR1 (UniProt ID: Q06787); (V) Disco-interacting protein 2 homolog B (UniProt ID: Q9P265); (W) AF4/FMR2 family member 2 (UniProt ID: P51816); (X) C9orf72 (UniProt ID: Q96LT7); (Y) Frataxin (UniProt ID: Q16595); (Z) Cellular nucleic acid-binding protein (CNBP, UniProt ID: P62633); (a) Ataxin-10 (UniProt ID: Q9UBB4); (b) Nucleolar protein 56 (UniProt ID: O00567); (c) Transcription factor 4 (UniProt ID: P15884); (d) Myotonin-protein kinase (UniProt ID: Q09013); (e) Ataxin-8 (UniProt ID: Q156A1); (f) ATXN8OS protein (UniProt ID: P0DMR3); (g) Cystatin-B (UniProt ID: P04080); (h) Serine/threonine-protein phosphatase 2A 55 kDa regulatory subunit B β isoform (PPP2R2B, UniProt ID: Q00005). In this analysis, scores above 0.5 correspond to intrinsic disorder.
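The 0.5 cutoff in the caption, together with the averaging across predictors described in the table footnote for the SCA8 proteins, suggests how a percent-disorder value can be derived from per-residue scores. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the score profiles are made-up numbers rather than real PONDR® output, and the exact procedure used by MobiDB or by the authors may differ.

```python
from statistics import mean
from typing import Mapping, Sequence

DISORDER_CUTOFF = 0.5  # from the caption: scores above 0.5 count as disordered


def percent_disorder(scores: Sequence[float], cutoff: float = DISORDER_CUTOFF) -> float:
    """Percentage of residues whose per-residue score exceeds the cutoff."""
    if not scores:
        raise ValueError("empty score profile")
    disordered = sum(1 for score in scores if score > cutoff)
    return 100.0 * disordered / len(scores)


def averaged_percent_disorder(profiles: Mapping[str, Sequence[float]]) -> float:
    """Average the percent disorder over several predictors for one protein."""
    return mean(percent_disorder(scores) for scores in profiles.values())


# Made-up per-residue scores for a 4-residue toy sequence (NOT real predictions).
toy_profiles = {
    "predictor_1": [0.9, 0.8, 0.4, 0.2],  # 2 of 4 residues above the cutoff
    "predictor_2": [0.7, 0.6, 0.6, 0.1],  # 3 of 4 residues above the cutoff
    "predictor_3": [0.9, 0.3, 0.2, 0.1],  # 1 of 4 residues above the cutoff
}
```

With these toy profiles the three predictors give 50%, 75%, and 25% disorder, so `averaged_percent_disorder(toy_profiles)` evaluates to 50.0.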
Potential translation products of non-coding repeat expansions.
| Gene | Repeat Sequence | Sense Translation | Antisense Translation |
|---|---|---|---|
| | CGG-CGG-CGG-CGG | R-R-R-R | A-A-A-A |
| | GGC-GGC-GGC-GGC | G-G-G-G | P-P-P-P |
| | GCG-GCG-GCG-GCG | A-A-A-A | R-R-R-R |
| | CGG-CGG-CGG-CGG | R-R-R-R | A-A-A-A |
| | GGC-GGC-GGC-GGC | G-G-G-G | P-P-P-P |
| | GCG-GCG-GCG-GCG | A-A-A-A | R-R-R-R |
| | GCC-GCC-GCC-GCC | A-A-A-A | R-R-R-R |
| | CCG-CCG-CCG-CCG | P-P-P-P | G-G-G-G |
| | CGC-CGC-CGC-CGC | R-R-R-R | A-A-A-A |
| | GGG-GCC-GGG-GCC | G-A-G-A | P-R-P-R |
| | GGG-CCG-GGG-CCG | G-P-G-P | P-G-P-G |
| | GGC-CGG-GGC-CGG | G-R-G-R | P-A-P-A |
| | GAA-GAA-GAA-GAA | E-E-E-E | L-L-L-L |
| | AAG-AAG-AAG-AAG | K-K-K-K | F-F-F-F |
| | AGA-AGA-AGA-AGA | R-R-R-R | S-S-S-S |
| | CCT-GCC-TGC-CTG | P-A-C-L | G-R-T-G-G |
| | CTG-CCT-GCC-TGC | L-P-A-C | G-G-R-T-G |
| | TGC-CTG-CCT-GCC | C-L-P-A | T-G-G-G-R |
| | ATT-CTA-TTC-TAT-TCT | I-L-F-Y-S | STOP-D-K-I-R |
| | TTC-TAT-TCT-ATT-CTA | F-Y-S-I-L | K-I-R-STOP-D |
| | TCT-ATT-CTA-TTC-TAT | S-I-L-F-Y | R-STOP-D-K-I |
| | CTG-CTG-CTG-CTG | L-L-L-L | D-D-D-D |
| | TGC-TGC-TGC-TGC | C-C-C-C | P-P-P-P |
| | GCT-GCT-GCT-GCT | A-A-A-A | R-R-R-R |
| | CTG-CTG-CTG-CTG | L-L-L-L | D-D-D-D |
| | TGC-TGC-TGC-TGC | C-C-C-C | P-P-P-P |
| | GCT-GCT-GCT-GCT | A-A-A-A | R-R-R-R |
| | CTG-CTG-CTG-CTG | L-L-L-L | D-D-D-D |
| | TGC-TGC-TGC-TGC | C-C-C-C | P-P-P-P |
| | GCT-GCT-GCT-GCT | A-A-A-A | R-R-R-R |
| | CTG-CTG-CTG-CTG | L-L-L-L | D-D-D-D |
| | TGC-TGC-TGC-TGC | C-C-C-C | P-P-P-P |
| | GCT-GCT-GCT-GCT | A-A-A-A | R-R-R-R |
| | CCC-CGC-CCC-GCG | P-R-P-A | G-A-G-R |
| | CCC-GCC-CCG-CGC | P-A-P-R | G-R-G-A |
| | CCG-CCC-CGC-GCC | P-P-R-A | G-G-A-R |
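The reading-frame logic behind this table can be sketched in a few lines. The example below tiles a repeat unit, reads it from each of the three offsets, and translates the codons with the standard genetic code. The function names are hypothetical. How the opposite strand is read is an assumption exposed as a parameter: `reverse=True` takes the reverse complement (the usual 5′→3′ reading of the opposite strand), while `reverse=False` reads the plain complement left to right, which reproduces entries such as D-D-D-D for CTG in the table.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def opposite_strand(unit: str, reverse: bool = True) -> str:
    """Complement a repeat unit; optionally reverse it (reverse complement)."""
    complement = unit.translate(COMPLEMENT)
    return complement[::-1] if reverse else complement


def frame_translations(unit: str, n_codons: int = 4) -> list[str]:
    """Translate n_codons codons of a tiled repeat in each of the 3 frames."""
    needed = 3 * n_codons + 2          # bases required to read the last frame
    copies = -(-needed // len(unit))   # ceiling division
    tract = unit * copies
    frames = []
    for offset in range(3):
        codons = [tract[i:i + 3] for i in range(offset, offset + 3 * n_codons, 3)]
        residues = ["STOP" if CODON_TABLE[c] == "*" else CODON_TABLE[c] for c in codons]
        frames.append("-".join(residues))
    return frames
```

For instance, `frame_translations("GGGGCC")` returns `['G-A-G-A', 'G-P-G-P', 'G-R-G-R']`, matching the sense column for the GGGGCC repeat, and `frame_translations(opposite_strand("CTG"))` starts with `'Q-Q-Q-Q'` because the reverse complement of CTG is CAG.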