Elodie Monsellier, Matteo Ramazzotti, Niccolò Taddei, Fabrizio Chiti.
Abstract
Formation of amyloid-like fibrils is involved in numerous human protein deposition diseases, but is also an intrinsic property of polypeptide chains in general. Progress achieved recently now allows the aggregation propensity of proteins to be analyzed over large scales. In this work we used a previously developed predictive algorithm to analyze the propensity of the 34,180 protein sequences of the human proteome to form amyloid-like fibrils. We show that long proteins have, on average, less intense aggregation peaks than short ones. Human proteins involved in protein deposition diseases do not differ extensively from the rest of the proteome, further demonstrating the generality of protein aggregation. We were also able to reproduce some of the results obtained with other algorithms, demonstrating that they do not depend on the type of computational tool employed. For example, proteins with different subcellular localizations were found to have different aggregation propensities, in relation to the various efficiencies of quality control mechanisms. Membrane proteins, intrinsically disordered proteins, and folded proteins were confirmed to have very different aggregation propensities, as a consequence of their different structures and cellular microenvironments. In addition, gatekeeper residues at strategic positions of the sequences were found to protect human proteins from aggregation. The results of these comparative analyses highlight the existence of intimate links between the propensity of proteins to form aggregates with beta-structure and their biology. In particular, they emphasize the existence of a negative selection pressure that finely modulates protein sequences in order to adapt their aggregation propensity to their biological context.
Year: 2008 PMID: 18927604 PMCID: PMC2557143 DOI: 10.1371/journal.pcbi.1000199
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Definition of the aggregation parameters calculated for a sequence.
The aggregation propensity profile and the parameters shown in the figure refer to the peptide A-Dan, used here as an example. Red area: surface of the aggregation peaks (S agg); green: flanking positions.
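As a rough illustration of the quantities named in this caption, the sketch below locates aggregation peaks, their surface and their flanking positions in a per-residue profile. It is not the authors' algorithm: the threshold value, the definition of a peak as a contiguous run above that threshold, the surface as the summed excess over the threshold, and the flanks as the single residues adjacent to each peak are all assumptions made here for illustration, as are the function names.

```python
def find_peaks(profile, threshold):
    """Return (start, end) pairs, end exclusive, for runs above the threshold.

    Assumption: a peak is a contiguous run of positions whose profile
    value is strictly greater than the (hypothetical) threshold.
    """
    peaks = []
    start = None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i                      # a new peak begins here
        elif value <= threshold and start is not None:
            peaks.append((start, i))       # the current peak ends before i
            start = None
    if start is not None:                  # peak running to the end of the sequence
        peaks.append((start, len(profile)))
    return peaks


def peak_surface(profile, threshold, peaks):
    """Sum of the profile excess over the threshold inside all peaks.

    Assumption: this stands in for the peak surface (S agg) of the caption.
    """
    return sum(profile[i] - threshold for s, e in peaks for i in range(s, e))


def flanking_positions(peaks, length):
    """Positions immediately before and after each peak (assumed flank width of 1)."""
    flanks = set()
    for s, e in peaks:
        if s - 1 >= 0:
            flanks.add(s - 1)
        if e < length:
            flanks.add(e)
    return sorted(flanks)


# Toy profile, not real data
profile = [0.0, 2.0, 3.0, 0.5, 0.0, 1.5, 0.2]
peaks = find_peaks(profile, threshold=1.0)
surface = peak_surface(profile, 1.0, peaks)
flanks = flanking_positions(peaks, len(profile))
```

With this toy profile the two runs above the threshold are reported as separate peaks, and the flanks are the positions bordering them.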
Figure 2. Dependence of the aggregation parameters on protein length.
(A) All proteins are reported with their individual values of Z agg and protein length. For graphical convenience, the logarithm of the protein length is plotted. (B–F) Each point represents the average value over all the sequences whose length falls within an interval of 50 residues. Membrane proteins are excluded from all the analyses reported in the figure. Solid lines (E–F) represent the best fits to an exponential function. The sizes of the substrates typically targeted by the chaperones Hsc70 and TriC [34]–[35] are indicated as horizontal solid lines (E–F).
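The length binning described for panels B–F can be sketched as follows. This is a minimal illustration under stated assumptions: the bin boundaries (0–49, 50–99, and so on) and the function name `average_by_length_bin` are hypothetical, since the caption only gives the interval width of 50 residues.

```python
from collections import defaultdict


def average_by_length_bin(lengths, values, bin_width=50):
    """Average a per-protein parameter over bins of protein length.

    Returns a dict mapping the lower bound of each length bin to the
    mean parameter value of the proteins falling in that bin.
    Assumption: bins start at 0 and are half-open, [k*w, (k+1)*w).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for length, value in zip(lengths, values):
        bin_start = (length // bin_width) * bin_width
        sums[bin_start] += value
        counts[bin_start] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Toy values, not real data
binned = average_by_length_bin(
    lengths=[10, 40, 60, 120, 130],
    values=[1.0, 3.0, 5.0, 2.0, 4.0],
)
```

Each key of the result would correspond to one plotted point in a panel of this kind.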
Figure 3. Percentages of proteins without aggregation peaks in different populations.
The number of sequences composing each population is given in parentheses.
Figure 4. Cumulative distributions of the aggregation parameters in populations grouping proteins from different subcellular localizations.
Black: proteome without membrane proteins (28,901 sequences); purple: proteins from the endoplasmic reticulum (331 sequences); dark blue: proteins from the Golgi apparatus (93 sequences); light blue: proteins from the extracellular media (499 sequences); green: proteins from lysosomes (113 sequences); grey: mitochondrial proteins (667 sequences); yellow: nuclear proteins (4,898 sequences); orange: proteins of the cytoskeleton (456 sequences); red: ribosomal proteins (163 sequences). The membrane proteins were excluded from all the subcellular populations analyzed.
Comparisons between the aggregation parameters in different populations.
| Analyzed population | Reference population | | | | | | |
|---|---|---|---|---|---|---|---|
| Endoplasmic reticulum | all except membrane proteins | +++ | +++ | +++ | +++ | +++ | +++ |
| Golgi apparatus | all except membrane proteins | +++ | n.s. | n.s. | +++ | n.s. | ++ |
| Extracellular media | all except membrane proteins | − − − | +++ | +++ | +++ | +++ | +++ |
| Lysosomes | all except membrane proteins | +++ | +++ | +++ | +++ | +++ | +++ |
| Mitochondria | all except membrane proteins | − − − | +++ | +++ | − − | n.s. | − − |
| Nucleus | all except membrane proteins | +++ | − − − | − − − | − − − | − − − | − − − |
| Cytoskeleton | all except membrane proteins | +++ | − − − | − − − | − − − | − − − | − − − |
| Ribosomes | all except membrane proteins | − − − | − − − | − − − | − − − | − − − | − − − |
| Membrane proteins | all | +++ | +++ | +++ | +++ | +++ | +++ |
| Intrinsically disordered proteins | all except membrane proteins | − − − | − − − | − − − | − − − | − − − | − − − |
| Folded proteins | all except membrane proteins | − − − | +++ | +++ | − − − | +++ | − − − |
| Proteins forming fibrillar aggregates in vivo | all except membrane proteins | − | +++ | ++ | n.s. | + | n.s. |
| Proteins from the extracellular media forming fibrillar aggregates in vivo | extracellular media | − − − | + | n.s. | n.s. | n.s. | − − |
| Proteins from the cytoskeleton forming fibrillar aggregates in vivo | cytoskeleton | n.s. | n.s. | n.s. | n.s. | n.s. | − |
| Folded proteins forming fibrillar aggregates in vivo | folded proteins | n.s. | n.s. | n.s. | n.s. | n.s. | n.s. |
| Folded proteins forming fibrillar aggregates in vitro | folded proteins | − | n.s. | n.s. | n.s. | n.s. | n.s. |
The distributions of the aggregation parameter values of the analyzed population are compared to the ones of the reference population using statistical tests (see Methods).
+++ and −−− indicate that the analyzed population has a distribution significantly (p<0.001) shifted to higher or lower values, respectively, than the reference population in the statistical tests performed. ++ and −− indicate the same at p<0.01; + and − indicate the same at p<0.05.
n.s., the distributions of the analyzed and reference populations are not significantly different (p>0.05).
The results remain unchanged when the sequences without signal peptides of the corresponding subcellular districts (membrane proteins excluded) are compared with a reference database composed of all the human non-membrane protein sequences without the identified signal peptides.
Proteins forming amyloid fibrils and intracellular inclusions with amyloid-like characteristics.
The distributions of the two populations differ significantly although their median values are not significantly different (significant differences in the Kolmogorov-Smirnov test and not in the Mann-Whitney test).
The difference was significant in the Mann-Whitney test but not in the Kolmogorov-Smirnov test (parameter lacking a defined distribution).
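The significance notation defined in the table footnotes above can be expressed as a small lookup. The sketch below only encodes the thresholds stated there; the function name and its arguments are hypothetical, and treating a p-value of exactly 0.05 as not significant is an assumption, since the footnotes specify only p<0.05 and p>0.05.

```python
def significance_symbol(p_value, shifted_higher):
    """Map a p-value and a shift direction to the table's notation.

    p < 0.001 -> three symbols, p < 0.01 -> two, p < 0.05 -> one,
    otherwise "n.s.". "+" marks a shift to higher values than the
    reference population, "−" a shift to lower values.
    """
    if p_value < 0.001:
        count = 3
    elif p_value < 0.01:
        count = 2
    elif p_value < 0.05:
        count = 1
    else:
        return "n.s."  # assumption: p == 0.05 is treated as not significant
    return ("+" if shifted_higher else "−") * count


# Toy p-values, not results from the study
strong_up = significance_symbol(0.0004, shifted_higher=True)
moderate_down = significance_symbol(0.003, shifted_higher=False)
not_significant = significance_symbol(0.2, shifted_higher=True)
```

The p-values themselves would come from the statistical tests named in the footnotes (Kolmogorov-Smirnov and Mann-Whitney), which are not reimplemented here.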
Figure 5. Cumulative distributions of the aggregation parameters in different populations.
Grey: all proteome (34,180 sequences); pink: membrane proteins (5,279 sequences); black: proteome without membrane proteins (28,901 sequences); green: folded proteins (ASTRAL40 database; 1,391 sequences); blue: intrinsically disordered proteins (43 sequences); red: proteins forming amyloids or related intracellular inclusions in vivo and associated with human diseases (31 sequences); orange: folded proteins forming amyloids or related intracellular inclusions in vivo and associated with human diseases (15 sequences).
Figure 6. Gatekeeper residues in the human proteome.
(A) Amino acid frequencies at different positions, relative to their global frequencies in the human proteome. A relative frequency of 1.0 for a given residue at a given position means that the residue occupies that position with a frequency identical to that in the whole human proteome. Black: inside the aggregation peaks; grey: at the flanking positions; white: outside the aggregation peaks and far from the flanks (“valleys”). (B) Dependence of the frequencies of the gatekeepers at the flanks on the length of the aggregation peak. Filled circles: average frequencies of Pro, Arg and Lys; empty circles: average frequencies of Asp and Glu. The membrane proteins are removed from the database.
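The relative frequency described for panel A can be sketched as the ratio between the frequency of a residue in a set of positions and its frequency in the whole proteome. The function name and the representation of both sets as plain strings of one-letter codes are assumptions made for illustration.

```python
from collections import Counter


def relative_frequencies(subset_residues, proteome_residues):
    """Frequency of each residue in a subset, relative to its global frequency.

    A value of 1.0 means the residue is as frequent in the subset
    (for example, the flanking positions) as in the whole proteome.
    """
    if not subset_residues or not proteome_residues:
        raise ValueError("both residue collections must be non-empty")
    subset_counts = Counter(subset_residues)
    global_counts = Counter(proteome_residues)
    n_subset = len(subset_residues)
    n_global = len(proteome_residues)
    return {
        aa: (subset_counts[aa] / n_subset) / (global_counts[aa] / n_global)
        for aa in global_counts
    }


# Toy sequences, not real data
rel = relative_frequencies(subset_residues="KKPA", proteome_residues="AAKKPPGG")
```

In this toy case Lys is twice as frequent in the subset as in the reference, while Gly is absent from the subset.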