Abstract
Although it is reasonable to expect that the frequency of a generic dipeptide XY in proteins is the same as that of its counterpart YX, an accurate statistical analysis of a large number of protein sequences shows that some dipeptides XY are considerably more frequent than their mirror images YX, referred to here as antidipeptides. Because this unexpected anisotropic frequency of occurrence was verified to be unbiased by the type of protein sequences analyzed, it can be concluded that the phenomenon is genuine. Nevertheless, it was impossible to find the underlying mechanism, which does not seem to be related to different conformational propensities, to different conformational flexibility of the dipeptide/antidipeptide pair, to different solvent accessibility, or to random gene mutations.
Keywords: Amino acid composition; Antidipeptide; Dipeptide; Protein sequence
Year: 2013 PMID: 24688741 PMCID: PMC3962099 DOI: 10.5936/csbj.201308001
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Table 1. List of the ensembles of protein sequences used in the present study.
| Dataset | Description | Number of proteins / Number of residues | Notes |
|---|---|---|---|
| Any | Any protein | 39,029/19,363,703 | |
| | Proteins of | 10,290/6,690,249 | |
| | Proteins of | 204/63,809 | |
| Mono | Monomeric proteins | 1,307/542,334 | |
| Homo | Protein chains that form homo-oligomeric complexes | 3,374/1,455,147 | |
| Hetero | Protein chains that form hetero-oligomeric complexes | 1,490/721,979 | |
| Cyto | Cytoplasmic proteins | 9,421/5,622,499 | Only proteins whose subcellular location was proven experimentally |
| Memb | Membrane proteins | 8,757/5,022,480 | Only proteins whose subcellular location was proven experimentally |
| Extra | Secreted / Extracellular proteins | 344/295,553 | The subcellular location was proven experimentally |
Figure 1. C values for the dipeptides AB (with A ≠ B). Values are colored according to the following scheme: white if C ≤ 10, light gray if 10 < C ≤ 20, dark gray if 20 < C ≤ 30, and black if C > 30.
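The shading thresholds in the caption can be written as a small lookup. This is only an illustrative sketch; the function name `shade_for_c` is hypothetical and does not come from the source.

```python
def shade_for_c(c: float) -> str:
    """Return the shade given in the Figure 1 caption for a C value."""
    if c <= 10:
        return "white"
    if c <= 20:
        return "light gray"
    if c <= 30:
        return "dark gray"
    return "black"


# A C value of 24.14 falls in the 20 < C <= 30 band.
print(shade_for_c(24.14))
```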
Table 2. Average C values for the dipeptides that contain the amino acid X and another amino acid different from X. Standard errors of the average values are given in parentheses.
| X | Average C |
|---|---|
| A | 9.03(1.45) |
| C | 9.86(1.87) |
| D | 9.76(1.74) |
| E | 5.21(0.84) |
| F | 4.17(0.73) |
| G | 4.95(1.01) |
| H | 4.77(1.09) |
| I | 5.57(1.14) |
| K | 7.60(1.58) |
| L | 5.65(0.89) |
| M | 10.45(1.53) |
| N | 7.12(1.19) |
| P | 11.86(2.23) |
| Q | 6.02(0.83) |
| R | 4.51(0.80) |
| S | 5.19(0.94) |
| T | 4.89(1.10) |
| V | 3.47(0.72) |
| W | 8.43(1.64) |
| Y | 4.80(0.85) |
Table 3. The seven dipeptide/antidipeptide pairs with the highest C values.
| Dipeptide | No. of observations | Antidipeptide | No. of observations | C |
|---|---|---|---|---|
| EP | 5384 | PE | 7571 | 33.76 |
| PW | 1113 | WP | 873 | 24.14 |
| MW | 403 | WM | 513 | 24.12 |
| GP | 5965 | PG | 7486 | 22.61 |
| AM | 2933 | MA | 3652 | 21.85 |
| IP | 5417 | PI | 4384 | 21.07 |
| CP | 1840 | PC | 1490 | 20.97 |
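This excerpt does not define C. As a hedged illustration, the sketch below assumes that C is the absolute difference between the dipeptide and antidipeptide counts, expressed as a percentage of their mean. Under that assumption the EP/PE row gives 33.76, and the other rows agree with the tabulated values to within about 0.1, so the definition is plausible but not confirmed by the source. The function names `count_dipeptides` and `asymmetry_c` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_dipeptides(sequences: Iterable[str]) -> Counter:
    """Count every pair of adjacent residues in a set of sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        for i in range(len(seq) - 1):
            counts[seq[i:i + 2]] += 1
    return counts


def asymmetry_c(n_xy: int, n_yx: int) -> float:
    """Assumed definition: |n_xy - n_yx| as a percentage of the mean count."""
    mean = (n_xy + n_yx) / 2
    if mean == 0:
        return 0.0
    return 100.0 * abs(n_xy - n_yx) / mean


# EP/PE counts taken from the table above.
print(round(asymmetry_c(5384, 7571), 2))  # 33.76
```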
Table 4. C values computed with the sequence sets summarized in Table 1. Only the C values for the seven dipeptide/antidipeptide pairs of Table 3 are reported.
| Pair | | | Cyto | Memb | Extra | Mono | Homo | Hetero |
|---|---|---|---|---|---|---|---|---|
| EP/PE | 30.31 | 59.39 | 34.76 | 31.27 | 15.11 | 45.96 | 37.5 | 34.92 |
| PW/WP | 27.51 | 31.25 | 26.54 | 26.27 | 31.06 | 8.79 | 29.19 | 41.51 |
| MW/WM | 19.13 | 20.5 | 21.47 | 20.02 | 23.73 | 50.6 | 30.46 | 42.49 |
| GP/PG | 20.46 | 50 | 18.33 | 19.76 | 31.01 | 29.18 | 27.08 | 23.31 |
| AM/MA | 32.82 | 31.58 | 18.46 | 19.27 | 48.23 | 27.86 | 22.5 | 31.53 |
| IP/PI | 29.93 | 52.48 | 23.42 | 23.22 | 15.21 | 29.24 | 20.12 | 25.37 |
| CP/PC | 19.69 | 30.3 | 15.72 | 12.18 | 17.11 | 17.77 | 23.92 | 27.23 |
Table 5. C values computed with protein subsets of increasing size (1,000, 3,000, 6,000, and 12,000 proteins) and with all sequences. Standard errors are given in parentheses.
| Pair | 1,000 | 3,000 | 6,000 | 12,000 | All |
|---|---|---|---|---|---|
| EP/PE | 34.29(0.90) | 33.99(1.22) | 33.89(1.28) | 33.65(0.62) | 33.76 |
| PW/WP | 23.81(2.04) | 23.57(2.65) | 23.59(1.39) | 23.71(0.25) | 24.14 |
| MW/WM | 25.11(2.53) | 23.52(3.28) | 23.78(3.21) | 23.78(0.57) | 24.12 |
| GP/PG | 23.03(1.21) | 22.28(0.86) | 22.17(0.69) | 22.13(0.58) | 22.61 |
| AM/MA | 21.82(1.16) | 22.11(1.04) | 21.98(1.14) | 22.06(0.91) | 21.85 |
| IP/PI | 21.11(0.90) | 21.16(0.90) | 21.05(0.78) | 21.13(0.68) | 21.07 |
| CP/PC | 22.69(1.46) | 21.65(1.54) | 20.82(1.24) | 20.83(0.96) | 20.97 |
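The excerpt does not say how the subsets were drawn or how the standard errors were obtained. One plausible procedure, sketched below purely as an assumption, is to draw several random subsets of a given size, compute C on each, and report the mean together with the standard error of the mean. The C formula used here (absolute count difference as a percentage of the mean count) is itself an assumption, and the names `c_value` and `subsample_c` are hypothetical.

```python
import random
import statistics
from collections import Counter
from typing import List, Tuple


def c_value(seqs: List[str], x: str, y: str) -> float:
    """Assumed C: |n_xy - n_yx| as a percentage of the mean of the two counts."""
    counts = Counter(s[i:i + 2] for s in seqs for i in range(len(s) - 1))
    n_xy, n_yx = counts[x + y], counts[y + x]
    mean = (n_xy + n_yx) / 2
    return 100.0 * abs(n_xy - n_yx) / mean if mean else 0.0


def subsample_c(seqs: List[str], x: str, y: str, size: int,
                repeats: int = 10, seed: int = 0) -> Tuple[float, float]:
    """Mean C and its standard error over `repeats` random subsets.

    Requires size <= len(seqs) and repeats >= 2.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    values = [c_value(rng.sample(seqs, size), x, y) for _ in range(repeats)]
    sem = statistics.stdev(values) / len(values) ** 0.5
    return statistics.mean(values), sem
```

With real data, `seqs` would be the list of protein sequences and `size` would take the values 1,000, 3,000, 6,000, and 12,000.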
Table 6. The seven dipeptide/antidipeptide pairs with the largest difference in the propensity of a residue to be followed by another residue. In the first row, for example, the propensity of a methionine to be followed by a tryptophan is 0.83, while the propensity of a tryptophan to be followed by a methionine is 1.17.
| Dipeptide | Propensity | Antidipeptide | Propensity | Difference |
|---|---|---|---|---|
| MW | 0.83 | WM | 1.17 | 0.34 |
| PE | 1.05 | EP | 0.75 | 0.3 |
| GP | 0.89 | PG | 1.12 | 0.23 |
| CP | 1.08 | PC | 0.87 | 0.21 |
| PW | 0.93 | WP | 0.73 | 0.2 |
| EN | 1.1 | NE | 0.9 | 0.2 |
| IP | 1 | PI | 0.81 | 0.19 |
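The excerpt does not give the formula for the propensity of a residue to be followed by another. A common choice, used in the sketch below as an explicit assumption, is the ratio between the observed fraction of X residues followed by Y and the overall fraction of Y in the second position of all dipeptides, so that a value of 1 means no preference. The function name `follow_propensity` is hypothetical.

```python
from collections import Counter
from typing import Iterable


def follow_propensity(sequences: Iterable[str], x: str, y: str) -> float:
    """Assumed propensity: P(next is y | current is x) / P(next is y)."""
    pairs: Counter = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pairs[a + b] += 1
    total = sum(pairs.values())
    starts_x = sum(n for p, n in pairs.items() if p[0] == x)
    ends_y = sum(n for p, n in pairs.items() if p[1] == y)
    if total == 0 or starts_x == 0 or ends_y == 0:
        return float("nan")  # propensity is undefined without observations
    return (pairs[x + y] / starts_x) / (ends_y / total)


# Toy sequence: the pairs are AB, BA, AB.
print(follow_propensity(["ABAB"], "A", "B"))  # 1.5
```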
Table 7. Analysis of the dipeptide/antidipeptide pairs (AB/BA) reported in Tables 3 and 6 in the form A(X)5B, where five amino acids (of any type) are intercalated between A and B.
| Dipeptide / antidipeptide | C | Prop. Dipept. | Prop. Antidip. | Difference |
|---|---|---|---|---|
| EP/PE | 2.74 | 0.9 | 0.87 | 0.03 |
| PW/WP | 2.65 | 1.04 | 1.02 | 0.02 |
| MW/WM | 3.71 | 0.96 | 1.02 | 0.06 |
| GP/PG | 3.85 | 1.02 | 0.98 | 0.04 |
| AM/MA | 10.13 | 1 | 1 | 0 |
| IP/PI | 3.94 | 0.89 | 0.92 | 0.03 |
| CP/PC | 0.21 | 1.03 | 1.02 | 0.01 |
| EN/NE | 2.81 | 0.92 | 0.95 | 0.03 |
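The A(X)5B pattern can be counted in the same way as adjacent dipeptides, by pairing each residue with the one six positions downstream. The sketch below is a minimal illustration of that counting step only; the function name `count_gapped_pairs` and its `gap` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_gapped_pairs(sequences: Iterable[str], gap: int = 5) -> Counter:
    """Count pairs A...B separated by exactly `gap` residues of any type.

    With gap=0 this reduces to counting ordinary adjacent dipeptides.
    """
    counts: Counter = Counter()
    step = gap + 1  # distance between the two residues of the pair
    for seq in sequences:
        for i in range(len(seq) - step):
            counts[seq[i] + seq[i + step]] += 1
    return counts


# E and P separated by five residues: one E(X)5P occurrence, no P(X)5E.
counts = count_gapped_pairs(["EAAAAAP"])
print(counts["EP"], counts["PE"])  # 1 0
```

The counts for AB and BA obtained this way can then be compared exactly as for the adjacent pairs.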