Michail Yu Lobanov, Eugeniya I Furletova, Natalya S Bogatyreva, Michail A Roytberg, Oxana V Galzitskaya.
Abstract
Intrinsically disordered regions serve as molecular recognition elements, which play an important role in the control of many cellular processes and signaling pathways. It is useful to be able to predict positions of disordered regions in protein chains. The statistical analysis of disordered residues was done considering 34,464 unique protein chains taken from the PDB database. In this database, 4.95% of residues are disordered (i.e. invisible in X-ray structures). The statistics were obtained separately for the N- and C-termini as well as for the central part of the protein chain. It has been shown that frequencies of occurrence of disordered residues of 20 types at the termini of protein chains differ from the ones in the middle part of the protein chain. Our systematic analysis of disordered regions in PDB revealed 109 disordered patterns of different lengths. Each of them has disordered occurrences in at least five protein chains with identity less than 20%. The vast majority of all occurrences of each disordered pattern are disordered. This allows one to use the library of disordered patterns for predicting the status of a residue of a given protein to be ordered or disordered. We analyzed the occurrence of the selected patterns in three eukaryotic and three bacterial proteomes.
Year: 2010 PMID: 20976197 PMCID: PMC2954861 DOI: 10.1371/journal.pcbi.1000958
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Illustration of the definition of the disordered fraction.
The given protein chain occurs in two PDB files, 1i8f and 1lnx. The C-terminal glycine is disordered in 9 of 14 cases, so its weight for being disordered is 9/14 and its weight for being ordered is 5/14. In this example, the chain has 8.7 disordered residues on average.
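The weighting described in this caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `disorder_weights`, the `"D"`/`"O"` encoding of each observed copy, and the three-residue toy chain are all hypothetical. Only the 9-of-14 count for the C-terminal residue comes from the caption.

```python
from fractions import Fraction


def disorder_weights(observations):
    """Return, for each position, the fraction of copies in which it is disordered.

    observations: list of equal-length strings, one per observed copy of the
    same chain, with 'D' for a disordered residue and 'O' for an ordered one
    (a hypothetical encoding chosen for this sketch).
    """
    if not observations:
        raise ValueError("at least one observation is required")
    length = len(observations[0])
    if any(len(obs) != length for obs in observations):
        raise ValueError("all observations must have the same length")
    total = len(observations)
    return [
        Fraction(sum(obs[i] == "D" for obs in observations), total)
        for i in range(length)
    ]


# Toy chain of three residues seen in 14 copies; the last residue is
# disordered in 9 of them, matching the glycine count in the caption.
copies = ["DOD"] * 9 + ["DOO"] * 5
weights = disorder_weights(copies)

print(weights[-1])      # weight of the last residue for being disordered
print(1 - weights[-1])  # weight of the last residue for being ordered
print(sum(weights))     # average number of disordered residues in the toy chain
```

Summing the per-residue weights gives the average number of disordered residues per chain, which is the kind of quantity the caption reports for the real chain.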
Figure 2. Length distribution of disordered regions in protein chains from the DRDB.
Distribution of disordered amino acid residues in protein structures from the DRDB.
| | Fraction of all residues | Fraction of disordered residues |
| 40 residues near the | 15% | 42% |
| 40 residues near the | 15% | 30% |
Figure 3. Fraction of disordered amino acid residues for each of the 20 types in the middle part of a protein chain.
The dashed line shows the total fraction of disordered residues in the middle part of the protein chain.
Fraction of disordered amino acid residues for each of the 20 types in the termini, in the middle part of protein chains, and in the whole proteins.
| a.a. | TRP | ILE | PHE | CYS | TYR | LEU | VAL | MET | ALA | HIS |
| | 0.032 | 0.054 | 0.061 | 0.044 | 0.055 | 0.077 | 0.069 | 0.351 | 0.134 | 0.427 |
| | 0.029 | 0.046 | 0.045 | 0.047 | 0.038 | 0.063 | 0.054 | 0.065 | 0.090 | 0.376 |
| middle | 0.009 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.013 | 0.015 | 0.019 | 0.020 |
| whole | 0.015 | 0.022 | 0.022 | 0.022 | 0.021 | 0.028 | 0.027 | 0.093 | 0.046 | 0.166 |
Figure 4. Number of patterns correlated with the considered pattern in the DRDB.
Two patterns are correlated if there are at least 4 proteins containing both patterns and the identity between the proteins is no more than 20%.
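The correlation criterion above can be sketched as a co-occurrence count. The sketch rests on a simplifying assumption that is not stated in the source: each protein already carries a precomputed group identifier, and proteins in different groups share no more than 20% identity, so counting distinct groups stands in for counting mutually dissimilar proteins. The function name `correlated_pairs`, the `(group_id, sequence)` input layout, and the toy sequences are hypothetical.

```python
from itertools import combinations


def correlated_pairs(proteins, patterns, min_groups=4):
    """Return the set of pattern pairs that co-occur in at least `min_groups` groups.

    proteins: iterable of (group_id, sequence) tuples, where group_id is an
    assumed, precomputed identity-cluster label.
    patterns: iterable of pattern strings, matched here as exact substrings.
    """
    groups_sharing = {}
    for group_id, sequence in proteins:
        present = sorted(p for p in patterns if p in sequence)
        for pair in combinations(present, 2):
            groups_sharing.setdefault(pair, set()).add(group_id)
    return {
        pair for pair, groups in groups_sharing.items()
        if len(groups) >= min_groups
    }


# Toy data: PPPPPP and QPPPPP co-occur in four different groups, while
# HHHHHH and PPPPPP co-occur in only two (groups 5 and 6).
toy_proteins = [
    (1, "MKQPPPPPPLA"),
    (2, "GQPPPPPPSE"),
    (3, "TTQPPPPPPR"),
    (4, "QPPPPPPKK"),
    (5, "HHHHHHSGPPPPPP"),
    (5, "MHHHHHHPPPPPPA"),
    (6, "HHHHHHAAPPPPPP"),
]
toy_patterns = ["PPPPPP", "QPPPPP", "HHHHHH"]
pairs = correlated_pairs(toy_proteins, toy_patterns)
print(pairs)
```

Counting groups rather than proteins keeps two near-identical chains from being counted twice toward the threshold of four.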
Occurrence of patterns in the eukaryotic proteomes.
| Pattern | Number of groups, identity inside group >20% | Fraction of disordered residues in the patterns from the DRDB | Probability of occurrence of the patterns in protein | Occurrence in the human proteome/in the DRDB | Occurrence in the fruit fly proteome/in the DRDB | Occurrence in the nematode worm proteome/in the DRDB |
| PPPPPP | 15 | 0.70 | 0.00017 | 703/32 | 304/32 | 247/32 |
| QQQQQQ | 11 | 0.66 | 0.00004 | 331/17 | 869/17 | 249/17 |
| EEEDEE | 55 | 0.65 | 0.00015 | 242/55 | 42/55 | 54/55 |
| QPPPPP | 9 | 0.74 | 0.00013 | 163/16 | 66/16 | 32/16 |
| APAPAP | 17 | 0.51 | 0.00067 | 121/30 | 44/30 | 34/30 |
| HHHHHH | 1227 | 0.93 | 0.00002 | 99/5423 | 133/5423 | 57/5423 |
| EDEDEE | 23 | 0.64 | 0.00014 | 97/29 | 27/29 | 42/29 |
| DEEEED | 12 | 0.68 | 0.00014 | 83/16 | 26/16 | 39/16 |
| GGGGGSG | 17 | 0.65 | 0.00028 | 78/29 | 80/29 | 8/29 |
| GSSGSS | 66 | 0.68 | 0.00120 | 67/93 | 35/93 | 19/93 |
| PPPPPK | 18 | 0.81 | 0.00027 | 62/31 | 24/31 | 32/31 |
| DDEDED | 14 | 0.64 | 0.00013 | 53/16 | 31/16 | 26/16 |
| SGGGGSG | 10 | 0.82 | 0.00022 | 31/29 | 19/29 | 2/29 |
| KKKGKK | 26 | 0.55 | 0.00181 | 27/56 | 8/56 | 13/56 |
| EEEEAP | 12 | 0.66 | 0.00028 | 26/21 | 6/21 | 9/21 |
| KKRKRK | 12 | 0.54 | 0.00067 | 25/19 | 6/19 | 7/19 |
| SGGGSGG | 12 | 0.68 | 0.00024 | 20/17 | 17/17 | 5/17 |
| SHHHHH | 558 | 0.98 | 0.00005 | 19/1566 | 27/1566 | 12/1566 |
| GGSGSGG | 17 | 0.77 | 0.00027 | 14/50 | 23/50 | 6/50 |
| NHHHHH | 19 | 0.83 | 0.00003 | 10/25 | 14/25 | 8/25 |
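As an illustration of how a pattern library like the one in this table might be applied to a new sequence, the following sketch flags every residue covered by an exact occurrence of a listed pattern. It is a hedged example rather than the authors' predictor: the function name `mark_pattern_residues`, the exact-substring matching rule, the `D`/`-` output encoding, and the toy sequence are assumptions. Because the disordered fractions in the table range from 0.51 to 0.98, a flag should be read as "likely disordered", not as a certainty.

```python
def mark_pattern_residues(sequence, patterns):
    """Return a string with 'D' where a pattern occurrence covers the residue, else '-'.

    Matching is by exact substring, including overlapping occurrences
    (an assumption made for this sketch).
    """
    flagged = [False] * len(sequence)
    for pattern in patterns:
        start = sequence.find(pattern)
        while start != -1:
            for i in range(start, start + len(pattern)):
                flagged[i] = True
            # Advance by one position so overlapping occurrences are found too.
            start = sequence.find(pattern, start + 1)
    return "".join("D" if f else "-" for f in flagged)


# A few patterns taken from the table; the sequence itself is a made-up example.
library = ["HHHHHH", "PPPPPP", "GSSGSS"]
toy_sequence = "MAHHHHHHHSGK"
mask = mark_pattern_residues(toy_sequence, library)
print(toy_sequence)
print(mask)
```

The mask has the same length as the input, so it can be printed under the sequence or compared position by position with other annotations.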