Martin H Schaefer, Erich E Wanker, Miguel A Andrade-Navarro.
Abstract
Expanded runs of consecutive trinucleotide CAG repeats encoding polyglutamine (polyQ) stretches are observed in the genes of a large number of patients with different genetic diseases such as Huntington's disease and several ataxias. Protein aggregation, which is a key feature of most of these diseases, is thought to be triggered by these expanded polyQ sequences in disease-related proteins. However, polyQ tracts are a normal feature of many human proteins, suggesting that they have an important cellular function. To clarify the potential function of polyQ repeats in biological systems, we systematically analyzed available information stored in sequence and protein interaction databases. By integrating genomic, phylogenetic, protein interaction network and functional information, we obtained evidence that polyQ tracts in proteins stabilize protein interactions. This most likely happens through structural changes whereby the polyQ sequence extends a neighboring coiled-coil region to facilitate its interaction with a coiled-coil region in another protein. Alteration of this important biological function due to polyQ expansion results in a gain of abnormal interactions, leading to pathological effects like protein aggregation. Our analyses suggest that research on polyQ proteins should shift focus from expanded polyQ proteins to the characterization of the influence of the wild-type polyQ on protein interactions.
Year: 2012 PMID: 22287626 PMCID: PMC3378862 DOI: 10.1093/nar/gks011
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 2. Relative amount of polyQ proteins in a representative set of species. The graph represents the fraction of proteins of each species’ available proteome that contains a polyQ tract. Species included had more than 1000 protein sequences in the UniProt/Swiss-Prot database (version 15.6) (20). For simplicity, just two bacterial species were included in the plot since all of those analyzed had very few or no polyQ proteins.
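The per-species fraction plotted in Figure 2 can be illustrated with a minimal sketch. The minimum tract length (`min_len=5`), the function names and the toy sequences are assumptions made for illustration only; the excerpt does not state the exact polyQ definition used in the study.

```python
import re

def has_polyq(sequence, min_len=5):
    """True if the sequence contains at least min_len consecutive glutamines (Q).

    min_len=5 is an illustrative assumption, not the paper's stated threshold.
    """
    return re.search("Q{%d,}" % min_len, sequence.upper()) is not None

def polyq_fraction(proteome, min_len=5):
    """Fraction of protein sequences in a proteome that contain a polyQ tract."""
    sequences = list(proteome)
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if has_polyq(seq, min_len))
    return hits / len(sequences)

# Toy proteome with made-up sequences (not real proteins).
toy = ["MKQQQQQQLS", "MAAGLKQQT", "MSTQQQQQQQQPP", "MLLKV"]
print(polyq_fraction(toy))  # 0.5
```

In practice the sequences would come from a proteome file such as a Swiss-Prot FASTA export, parsed into plain strings before being passed to `polyq_fraction`.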
Correlation of domains to polyQ presence over species
| Pfam identifier | Description | Class | Correlation on eukaryotic subset |
|---|---|---|---|
| PF03810 | Importin-beta N-terminal domain | | 0.530 |
| PF01302 | CAP-Gly domain | | 0.522 |
| PF12171 | Zinc-finger double-stranded RNA-binding | ZF | 0.494 |
| PF02207 | Putative zinc finger in N-recognin (UBR box) | UBX, ZF | 0.493 |
| PF01363 | FYVE zinc finger | PI, ZF | 0.479 |
| PF03731 | Ku70/Ku80 N-terminal alpha/beta domain | | 0.476 |
| PF01151 | GNS1/SUR4 family | | 0.470 |
| PF08389 | Exportin 1-like protein | | 0.464 |
| PF00787 | PX domain | PI | 0.447 |
| PF00153 | Mitochondrial carrier protein | | 0.435 |
| PF00169 | PH domain | PI | 0.432 |
| PF00613 | Phosphoinositide 3-kinase family, accessory domain (PIK domain) | PI | 0.431 |
| PF09336 | Vps4 C terminal oligomerization domain | | 0.428 |
| PF01585 | G-patch domain | | 0.423 |
| PF05047 | Mitochondrial ribosomal protein L51 / S25 / CI-B8 domain | | 0.417 |
| PF00620 | RhoGAP domain | | 0.408 |
| PF00566 | TBC domain | | 0.400 |
Columns are (1) Pfam identifier, (2) Pfam description, (3) functional and structural classes: zinc finger (ZF), ubiquitin (UBX) or phosphatidylinositol (PI), (4) (Spearman) correlation over 133 eukaryotic species.
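A Spearman correlation of the kind reported in the last column can be sketched in plain Python as the Pearson correlation of tie-averaged ranks. The per-species vectors below are hypothetical; the excerpt does not specify which quantities (counts or fractions) were correlated across the 133 species.

```python
def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical per-species values: polyQ proteins vs. proteins with one domain.
polyq_per_species = [0, 2, 5, 9, 20]
domain_per_species = [1, 3, 4, 10, 30]
print(spearman(polyq_per_species, domain_per_species))  # 1.0
```

Working on ranks makes the measure insensitive to the large differences in proteome size between species, which is one plausible reason to prefer it over a Pearson correlation on raw values.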
Figure 1. Frequency of trinucleotide repeats in the human genome. The y-axis represents the log2 of the ratio between the relative number of repeat runs observed (considering runs of at least 10 consecutive trinucleotides) and the proportion of the genome that is covered by the respective genomic region type.
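The quantity on the y-axis of Figure 1 can be sketched as follows. This is one reading of the caption, in which the "relative number" is the share of all runs that fall in a region type; the function names, the example sequence and the counts are illustrative assumptions, not data from the study.

```python
import math
import re

def count_repeat_runs(dna, unit="CAG", min_units=10):
    """Count non-overlapping runs of at least min_units consecutive copies of unit."""
    pattern = "(?:%s){%d,}" % (re.escape(unit), min_units)
    return len(re.findall(pattern, dna.upper()))

def log2_enrichment(runs_in_region, runs_total, region_bp, genome_bp):
    """log2 of (share of runs in the region type) / (share of genome it covers)."""
    observed = runs_in_region / runs_total
    expected = region_bp / genome_bp
    return math.log2(observed / expected)

# Made-up sequence: a 12-unit run, a 3-unit run (too short), and a 10-unit run.
dna = "AT" + "CAG" * 12 + "TTT" + "CAG" * 3 + "GG" + "CAG" * 10
print(count_repeat_runs(dna))  # 2

# Made-up counts: 25% of runs fall in a region type covering 6.25% of the genome.
print(log2_enrichment(25, 100, 6_250_000, 100_000_000))  # 2.0
```

A value above zero means the repeat is found in that region type more often than its genomic coverage alone would predict, and a value below zero means it is depleted there.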
Figure 3. Protein families with multiple events of polyQ insertion. (A) Fragments of a multiple sequence alignment of huntingtin orthologs from several species, with glutamines and prolines marked in red and green, respectively. Left box: N-terminal polyQ region progressively enlarged along the chordate lineage and missing in Drosophilae. Note how this region is followed by polyP in some species where the polyQ length is above four. Right box: very variable polyQ rich insertion specific to Drosophilae at another, distant position in huntingtin. (B) A total of 4759 protein families with members in human, zebrafish and fly was studied. We found 75 families having at least one human protein with a polyQ stretch, 354 families having at least one fly protein with a polyQ stretch, and 4293 having no Q-rich region in the fish proteins (see main text for details). For a total of 14 families (including huntingtin), both the human and the fly sequences had polyQ tracts (red boxes within the blue boxes) but not the zebrafish one, indicating multiple events of polyQ insertion along separate lineages (stars). By randomizing the identity of the polyQ sets in human and fly, we found the number of selected families to be significantly higher than random expectation (P < 0.05).
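The randomization described in panel B can be approximated with a simple resampling sketch. It assumes that the human and fly polyQ family sets are redrawn uniformly at random with their observed sizes while the fish set is held fixed; the exact procedure is in the main text, not in this caption, and the family identifiers and the composition of the fish set below are placeholders.

```python
import random

def overlap_permutation_test(n_families, n_human, n_fly, fish_free, observed,
                             n_iter=2000, seed=0):
    """Empirical P-value for seeing at least `observed` families that are
    polyQ in human and fly and Q-free in fish, under random set identity."""
    rng = random.Random(seed)
    families = range(n_families)
    fish_free = set(fish_free)
    at_least = 0
    for _ in range(n_iter):
        # Redraw which families carry polyQ in human and in fly (assumed null model).
        human = set(rng.sample(families, n_human))
        fly = rng.sample(families, n_fly)
        shared = sum(1 for f in fly if f in human and f in fish_free)
        if shared >= observed:
            at_least += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least + 1) / (n_iter + 1)

# Sizes taken from the caption; which families are Q-free in fish is a placeholder.
fish_free = random.Random(1).sample(range(4759), 4293)
p = overlap_permutation_test(4759, 75, 354, fish_free, observed=14)
print(p < 0.05)
```

The sketch only mirrors the set sizes reported in the caption, so its P-value illustrates the logic of the test and is not a reproduction of the published result.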
Figure 4. Protein interaction degree distribution for different protein sets. Box plots of the distribution of protein interaction partners for different protein sets. (A and B) Comparison of polyQ proteins, transcription factors without polyQ, large proteins without polyQ and all non-polyQ proteins, for human and yeast, respectively. (C and D) Comparison of proteins with long polyQ, short polyQ, or no polyQ, for human and yeast, respectively. All pairwise differences within a species were significant (P < 0.01) except for the comparison between medium and long polyQ length in yeast (P-value 0.056). This exception was due to an outlier in the medium set: one of the proteins has a degree of 2549 which is more than twice as high as the second highest degree. Removing it results in significant differences for all comparisons.
Frequently overrepresented functional annotations among polyQ proteins from 11 eukaryotic species
| Category (b) | HS | BT | RN | MM | DR | DM | CE | AT | SC | DD | NC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Transcription-related | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Nucleus | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| (RNA and nitrogen) metabolic or biosynthetic process | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| Compositionally biased region (Ser,Gly,Pro,Ala) | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | |
| Protein phosphorylation | ✓ | ✓ | ✓ | ✓ | ✓ | | | | | | |
| Alternative splicing | ✓ | ✓ | ✓ | ✓ | | | | | | | |
| Protein dimerization activity | ✓ | ✓ | ✓ | | | | | | | | |
| Developmental protein | ✓ | ✓ | ✓ | | | | | | | | |
(a) Using the web tool DAVID (25).
(b) We merged the resulting species-specific lists of functional terms (applying a P-value threshold of 0.05 after multiple testing correction with the Benjamini–Hochberg method) and replaced similar terms by representative substitutes.
HS = Homo sapiens, BT = Bos taurus, RN = Rattus norvegicus, MM = Mus musculus, DR = Danio rerio, DM = Drosophila melanogaster, CE = Caenorhabditis elegans, AT = Arabidopsis thaliana, SC = Saccharomyces cerevisiae, DD = Dictyostelium discoideum, NC = Neurospora crassa.
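The Benjamini–Hochberg correction mentioned in the table footnote can be sketched as follows. The P-values are invented for illustration; the enrichment tool computes the corrected values itself, so this only shows how the 0.05 threshold acts on adjusted values.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Invented raw P-values for five functional terms in one species.
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(raw)
kept = [i for i, q in enumerate(adj) if q <= 0.05]
print(kept)  # [0, 1]
```

Note that the third and fourth terms pass an uncorrected 0.05 cutoff but are dropped after correction, which is the effect the thresholding step in the footnote relies on.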
Domains overrepresented in proteins that interact with human polyQ proteins
| Domain name | P-value | Pfam accs. | Reason for merging | Interactions |
|---|---|---|---|---|
| Nuclear hormone receptor associated | 0 | PF00104 | Colocalization | 95 |
| EGF | 0 | PF00008, PF07645 | Colocalization and overlap | 29 |
| Zinc finger MIZ type | 0.0016 | PF02891, PF11789 | Overlap | 12 |
| ATPase family associated with various cellular activities (AAA) | 0.00 | PF00004, PF05496 | Colocalization and overlap | 24 |
| Ubiquitin family | 0.002 | PF00240, PF11976 | Overlap | 25 |
| MH1 domain | 0.007 | PF03165, PF03166, PF10401 | Colocalization and overlap | 30 |
| Basic Leucine Zipper Domain (bZIP) | 0.0088 | PF00170, PF07716 | Overlap | 37 |
(a) Details in Supplementary Methods.
(b) P-value remains under a significance level of 0.05 even after Benjamini and Hochberg correction for multiple testing.
(c) Over-represented in polyQ proteins (Supplementary Table S5).
(d) Result is reproducible in Drosophila melanogaster.
Figure 5. Cartoon of proposed polyQ function in protein interaction. Left: a polyQ protein contains a coiled-coil (blue), followed by a polyQ region (red) and a polyP region (green). In the unbound state, the polyQ region is disordered. Right: upon interaction with a protein partner X, the polyQ region adopts a coiled-coil structure that extends the original coiled-coil. The polyP region remains unstructured, precisely capping the extension of the coiled-coil.