Sandeep Chakraborty1, Ravindra Venkatramani2, Basuthkar J Rao1, Bjarni Asgeirsson3, Abhaya M Dandekar4.
Abstract
The structure of a protein provides insight into its physiological inter[...]
Year: 2013 PMID: 25506420 PMCID: PMC4257144 DOI: 10.12688/f1000research.2-243.v3
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Electrostatic potential differences (EPD) between the Cα atoms of consecutive residue pairs that include proline.
While these pairs have a low standard deviation (SD), like all other pairs, the absolute value of their mean is higher than that of any pair that does not include a proline. This also highlights the unique nature of proline in protein structures.
| Pair | Mean EPD | SD | Count |
|---|---|---|---|
| AP | -167.3 | 28.6 | 328 |
| CP | -153 | 30.3 | 45 |
| DP | -184.5 | 29.5 | 290 |
| EP | -176.6 | 27.3 | 346 |
| FP | -160.4 | 25.3 | 173 |
| GP | -165.3 | 29.2 | 339 |
| HP | -162.7 | 34.6 | 92 |
| IP | -161.9 | 27.2 | 175 |
| KP | -156.6 | 29.6 | 203 |
| LP | -165.2 | 28.3 | 323 |
| MP | -161.3 | 29.5 | 70 |
| NP | -159.6 | 27.6 | 168 |
| PQ | 168.5 | 26.1 | 131 |
| PR | 156.2 | 31.5 | 184 |
| PS | 172.3 | 25.9 | 269 |
| PT | 170.8 | 27.6 | 218 |
| PV | 164.4 | 30.4 | 299 |
| PW | 158.5 | 29.7 | 70 |
| PY | 155.5 | 29 | 141 |
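The per-pair mean, SD and count columns in a table like the one above can be produced by grouping individual EPD observations by pair label. The following is a minimal sketch under stated assumptions: the input format (a list of `(pair_label, epd_value)` tuples) and the function name `pair_statistics` are hypothetical, the sample SD is used because the source does not say which SD variant was computed, and the toy values are invented for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev


def pair_statistics(observations):
    """Group EPD observations by residue-pair label and summarise each group.

    observations: iterable of (pair_label, epd_value) tuples (hypothetical format).
    Returns {pair_label: (mean, sample SD, count)}; SD is None for a single value.
    """
    grouped = defaultdict(list)
    for pair, epd in observations:
        grouped[pair].append(epd)

    stats = {}
    for pair, values in grouped.items():
        # stdev() needs at least two data points, so guard the single-value case.
        sd = stdev(values) if len(values) > 1 else None
        stats[pair] = (mean(values), sd, len(values))
    return stats


# Made-up values for illustration only; they are not taken from the paper's data.
toy = [("AP", -170.0), ("AP", -160.0), ("PS", 175.0), ("PS", 165.0), ("PS", 170.0)]
summary = pair_statistics(toy)
for pair, (m, sd, n) in sorted(summary.items()):
    print(pair, round(m, 1), round(sd, 1), n)
```

Grouping first and summarising afterwards keeps the sketch independent of how the EPD values themselves are computed, which the table captions do not specify.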
Figure 1. Electrostatic potential differences (EPD) for the C-N peptide bond.
AA: Alanine/Alanine, AC: Alanine/Cysteine, HS: Histidine/Serine, DF: Aspartic-acid/Phenylalanine. (a) Probability distribution for four pairs of amino acids. (b) Scatter plot for all pairs of amino acids. The mean and SD are essentially the same for all pairs of amino acids. Further, the variance is large (SD ≈ 50), indicating that this feature is not tightly constrained in peptide structures.
Figure 2. Electrostatic potential differences (EPD) for consecutive residue pairs for Cα atoms.
AA: Alanine/Alanine, AC: Alanine/Cysteine, HS: Histidine/Serine, DF: Aspartic-acid/Phenylalanine. (a) Probability distribution for four pairs of amino acids. (b) Scatter plot for all pairs of amino acids. Pairs of amino acids that include proline have a mean of larger absolute value, although the SD is of the same magnitude.
Figure 3. Electrostatic potential differences (EPD) for consecutive residue pairs for Cβ atoms.
AA: Alanine/Alanine, AD: Alanine/Aspartic-acid, AE: Alanine/Glutamic-acid, DF: Aspartic-acid/Phenylalanine, DY: Aspartic-acid/Tyrosine, HT: Histidine/Threonine, HS: Histidine/Serine. (a) Probability distribution for seven pairs of amino acids. (b) Scatter plot for all pairs of amino acids. The pairs that include cysteine have a high standard deviation. The mean varies much more across pairs than it does for the Cα atoms and the C-N peptide bond.
Electrostatic potential differences (EPD) between the Cβ atoms of consecutive residue pairs that include cysteine.
These pairs have seemingly random mean values and a high standard deviation (SD), with the exception of the pair ‘CC’ (not the disulfide bond), which has a low mean value and SD. Consequently, these values cannot discriminate between pairs of amino acids.
| Pair | Mean EPD | SD | Count |
|---|---|---|---|
| AC | -53.7 | 86.9 | 178 |
| CC | -7.1 | 30.4 | 36 |
| CD | 103.8 | 92.7 | 154 |
| CE | 96.8 | 94.7 | 121 |
| CF | -21.4 | 84.2 | 85 |
| CH | -12 | 93.3 | 97 |
| CI | 32.8 | 80.9 | 136 |
| CK | 50.2 | 93.9 | 131 |
| CL | 42.8 | 90.6 | 224 |
| CM | 61.9 | 100 | 39 |
| CN | 63.7 | 96.1 | 115 |
| CP | 24.9 | 88 | 45 |
| CQ | 66.7 | 92.1 | 95 |
| CR | 35.4 | 95.1 | 144 |
| CS | 106.3 | 98.1 | 184 |
| CT | 109.9 | 97.5 | 173 |
| CV | 54.9 | 90.7 | 183 |
| CW | -0.2 | 85.9 | 43 |
| CY | 8.5 | 91.5 | 96 |
Electrostatic potential differences (EPD) in a sample of consecutive residue pairs of Cβ atoms.
These pairs are used for discriminating predicted structures in order to obtain the native structure. The complete set is available at https://github.com/sanchak/mqap.
| Pair | Mean EPD | SD | Count |
|---|---|---|---|
| DF | -108.9 | 29.5 | 481 |
| DY | -107.4 | 30.7 | 442 |
| DH | -105.2 | 33.5 | 242 |
| DW | -104.1 | 27.7 | 209 |
| EH | -98.5 | 28.5 | 200 |
| EY | -96.5 | 28 | 378 |
| EW | -94.2 | 29.8 | 184 |
| SY | -93.5 | 27.5 | 403 |
| EF | -93.1 | 27.6 | 439 |
| TY | -93 | 28.6 | 384 |
| TW | -90.8 | 28.7 | 144 |
| SW | -89.2 | 27.7 | 169 |
| FT | 89.2 | 26.8 | 436 |
| FS | 92.3 | 28.4 | 453 |
| HS | 93.7 | 31.8 | 235 |
| HT | 95.1 | 31.5 | 235 |
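One way such reference values could be used to rank candidate structures is to measure how far each model's observed EPDs fall from the reference means, and to prefer the model with the smallest average deviation. The sketch below is an assumption for illustration only: the source does not give the scoring formula, so `deviation_score` (a mean absolute deviation) is not presented as the paper's PDScore, and the per-model EPD values are hypothetical. Only the four reference means are copied from the table above.

```python
# Reference means copied from four rows of the table above.
REFERENCE_MEAN = {"DF": -108.9, "DY": -107.4, "HS": 93.7, "HT": 95.1}


def deviation_score(observed):
    """Mean absolute deviation of observed EPDs from the reference means.

    observed: list of (pair_label, epd_value) tuples (hypothetical format).
    Pairs without a reference value are skipped.
    This scoring rule is an assumption, not the paper's stated PDScore.
    """
    deviations = [
        abs(epd - REFERENCE_MEAN[pair])
        for pair, epd in observed
        if pair in REFERENCE_MEAN
    ]
    if not deviations:
        raise ValueError("no observed pair has a reference value")
    return sum(deviations) / len(deviations)


# Hypothetical EPD values for two candidate models of the same sequence.
model_a = [("DF", -100.0), ("HS", 90.0), ("AA", 5.0)]
model_b = [("DF", -60.0), ("HS", 140.0), ("AA", 5.0)]

candidates = [("model_a", model_a), ("model_b", model_b)]
best = min(candidates, key=lambda item: deviation_score(item[1]))
print(best[0])
```

Restricting the score to pairs with a tight reference distribution follows the table's premise that only low-SD pairs are informative; high-SD pairs such as those containing cysteine would add noise rather than signal.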
Figure 4. Standard deviation (SD) of the electrostatic potential difference between Cβ atoms increases with increasing sequence distance for amino acid pairs.
Each sequence distance has at least 30 sample points. DF: Aspartic-acid/Phenylalanine, HS: Histidine/Serine. As expected, there is less correlation in the EPD values between the shown amino acid pairs ‘DF’ and ‘HS’ as the sequence distance between the residues increases. The SD for distance 1 (i.e. consecutive residues) is 29.8 EPD units and 31.8 EPD units for ‘DF’ and ‘HS’, respectively, and rises to around 60 EPD units with increasing sequence distance.
Misfold decoy set.
This decoy set has ~20 protein structures - each of which has a correct and an incorrect structure specified. The PDBs are sorted based on the number of residues in the structure (NRes). Three of the structures (1CBH, 1FDX and 2SSI) have a lower PDScore for the incorrect structure.
| PDB | NRes | Correct | Incorrect | Specificity |
|---|---|---|---|---|
| 1CBH | 36 | 18.7 | 12.6 | 0 |
| 1PPT | 36 | 18 | 33.5 | 1 |
| 1FDX | 54 | 33 | 30.9 | 0 |
| 5RXN | 54 | 25.1 | 35 | 1 |
| 1SN3 | 65 | 20.7 | 30.3 | 1 |
| 2CI2 | 65 | 19.9 | 35.2 | 1 |
| 2CRO | 65 | 26.7 | 43.4 | 1 |
| 1HIP | 85 | 19.1 | 36.8 | 1 |
| 2B5C | 85 | 22.1 | 34.3 | 1 |
| 2CDV | 107 | 17.4 | 40.9 | 1 |
| 2SSI | 107 | 22.6 | 20 | 0 |
| 1BP2 | 123 | 21 | 44.1 | 1 |
| 2PAZ | 123 | 19.3 | 27.3 | 1 |
| 1P2P | 124 | 28.6 | 29 | 1 |
| 1RN3 | 124 | 20.7 | 28.8 | 1 |
| 1LH1 | 153 | 18 | 26.2 | 1 |
| 2I1B | 153 | 19 | 27.6 | 1 |
| 1REI | 212 | 16.5 | 21.8 | 1 |
| 5PAD | 212 | 18.8 | 32 | 1 |
| 1RHD | 293 | 23.2 | 31.9 | 1 |
| 2CYP | 293 | 21.2 | 35.8 | 1 |
| 2TMN | 316 | 27.2 | 32.2 | 1 |
| 2TS1 | 317 | 21.3 | 28.2 | 1 |
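The Specificity column above follows directly from comparing the two scores: a structure counts as discriminated when the correct model has the lower score. The short check below recomputes that column from the table. The scores are copied from the table; the variable and function names are illustrative and not from the source.

```python
# (PDB, score of correct structure, score of incorrect structure), from the table above.
MISFOLD = [
    ("1CBH", 18.7, 12.6), ("1PPT", 18.0, 33.5), ("1FDX", 33.0, 30.9),
    ("5RXN", 25.1, 35.0), ("1SN3", 20.7, 30.3), ("2CI2", 19.9, 35.2),
    ("2CRO", 26.7, 43.4), ("1HIP", 19.1, 36.8), ("2B5C", 22.1, 34.3),
    ("2CDV", 17.4, 40.9), ("2SSI", 22.6, 20.0), ("1BP2", 21.0, 44.1),
    ("2PAZ", 19.3, 27.3), ("1P2P", 28.6, 29.0), ("1RN3", 20.7, 28.8),
    ("1LH1", 18.0, 26.2), ("2I1B", 19.0, 27.6), ("1REI", 16.5, 21.8),
    ("5PAD", 18.8, 32.0), ("1RHD", 23.2, 31.9), ("2CYP", 21.2, 35.8),
    ("2TMN", 27.2, 32.2), ("2TS1", 21.3, 28.2),
]


def is_discriminated(correct, incorrect):
    """True when the correct structure has the lower (better) score."""
    return correct < incorrect


failures = [pdb for pdb, c, i in MISFOLD if not is_discriminated(c, i)]
hits = len(MISFOLD) - len(failures)

print(failures)                      # structures where the incorrect model scores lower
print(f"{hits}/{len(MISFOLD)} discriminated")
```

Running this reproduces the three exceptions named in the caption (1CBH, 1FDX and 2SSI), giving 20 of the 23 listed structures correctly discriminated.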
hg_structal, 4state_reduced and fisa decoy sets.
The PDBs are sorted based on specificity. (A) The hg_structal decoy set has ~30 protein structures, each of which has 30 structures. The average specificity obtained for the set is 0.91. (B) The 4state_reduced decoy set has 7 protein structures, each of which has ~600 structures. The average specificity obtained for the set is 0.94. (C) The fisa set has 4 protein structures, each of which has 500 structures. The electrostatic discriminator has low specificities in this case. We have previously demonstrated that this decoy set can be discriminated by a distance-based criterion. It consists of physically nonviable structures, thus rendering an electrostatic analysis meaningless. NRes = number of residues, NStructures = number of structures in the decoy set.
| PDB | NRes | NStructures | Specificity | |
|---|---|---|---|---|
| (A) | | | | |
| 2PGHA | 141 | 30 | 0.2 | |
| 1MBS | 153 | 30 | 0.5 | |
| 2DHBA | 141 | 30 | 0.6 | |
| 1HDAB | 145 | 30 | 0.9 | |
| 1MYT | 146 | 30 | 0.9 | |
| 1HLM | 158 | 30 | 0.9 | |
| 1HSY | 153 | 30 | 0.9 | |
| 1MBA | 146 | 30 | 0.9 | |
| 1MYGA | 153 | 30 | 0.9 | |
| 1MYJA | 153 | 30 | 0.9 | |
| 1ASH | 147 | 30 | 1 | |
| 1BABB | 146 | 30 | 1 | |
| 1COLA | 197 | 30 | 1 | |
| 1CPCA | 162 | 30 | 1 | |
| 1ECD | 136 | 30 | 1 | |
| 1EMY | 153 | 30 | 1 | |
| 1FLP | 142 | 30 | 1 | |
| 1GDM | 153 | 30 | 1 | |
| 1HBG | 147 | 30 | 1 | |
| 1HBHA | 142 | 30 | 1 | |
| 1HBHB | 146 | 30 | 1 | |
| 1HDAA | 141 | 30 | 1 | |
| 1HLB | 157 | 30 | 1 | |
| 1ITHA | 141 | 30 | 1 | |
| 1LHT | 153 | 30 | 1 | |
| 2DHBB | 146 | 30 | 1 | |
| 2LHB | 149 | 30 | 1 | |
| 2PGHB | 146 | 30 | 1 | |
| 4SDHA | 145 | 30 | 1 | |
| (B) | | | | |
| 2CRO | 65 | 675 | 0.8 | |
| 3ICB | 75 | 654 | 0.9 | |
| 4RXN | 54 | 677 | 0.9 | |
| 4PTI | 118 | 688 | 1 | |
| 1CTF | 131 | 631 | 1 | |
| 1R69 | 97 | 676 | 1 | |
| 1SN3 | 65 | 661 | 1 | |
| (C) | | | | |
| 4ICB | 76 | 501 | 0 | |
| 1FC2 | 44 | 501 | 0.4 | |
| 1HDDC | 57 | 501 | 0.1 | |
| 2CRO | 65 | 501 | 0.7 | |
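The average specificity quoted for each decoy set is the arithmetic mean of the per-structure values in the table. As a quick check, the sketch below recomputes it for the two smaller sets, (B) and (C). The values are copied from the table; because the table rounds each value to one decimal place, a recomputed average can differ in the last digit from the figure in the caption. The dictionary layout and names are illustrative only.

```python
from statistics import mean

# Per-structure specificities copied from parts (B) and (C) of the table above.
SPECIFICITY = {
    "4state_reduced": {
        "2CRO": 0.8, "3ICB": 0.9, "4RXN": 0.9, "4PTI": 1.0,
        "1CTF": 1.0, "1R69": 1.0, "1SN3": 1.0,
    },
    "fisa": {"4ICB": 0.0, "1FC2": 0.4, "1HDDC": 0.1, "2CRO": 0.7},
}

# Mean specificity per decoy set.
averages = {name: mean(values.values()) for name, values in SPECIFICITY.items()}

for name, avg in averages.items():
    print(f"{name}: {avg:.2f}")
```

For the 4state_reduced set this reproduces the 0.94 stated in the caption, and for the fisa set it gives 0.30, consistent with the caption's remark that the electrostatic discriminator has low specificities there.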
Proteins from the PISCES database used for learning values.
Set of 1000 proteins from the PISCES database with percentage identity cutoff of 20%, resolution cutoff of 1.6 Å, R-factor cutoff of 0.25, and a RDCC cutoff of 0.012 Å used to learn feature values.
1A62 1AH7 1AHO 1AIE 1ARB 1ATG 1B5E 1BGF 1BKR 1BX7 1C1K 1C4Q 1C5E 1C75 1C7K 1CC8 1CCW
ig_structal decoy set.
The PDBs are sorted based on specificity. The ig_structal decoy set has ~61 protein structures, each of which has 61 structures. The average specificity obtained for the set is 0.97. NRes = number of residues, NStructures = number of structures in the decoy set.
| PDB | NRes | NStructures | Specificity |
|---|---|---|---|
| 1FPT | 11 | 61 | 0 |
| 1IKF | 233 | 61 | 0.5 |
| 1IGM | 115 | 61 | 0.9 |
| 1ACY | 211 | 61 | 1 |
| 1BAF | 214 | 61 | 1 |
| 1BBD | 219 | 61 | 1 |
| 1BBJ | 211 | 61 | 1 |
| 1DBB | 211 | 61 | 1 |
| 1DFB | 212 | 61 | 1 |
| 1DVF | 107 | 61 | 1 |
| 1EAP | 213 | 61 | 1 |
| 1FAI | 214 | 61 | 1 |
| 1FBI | 214 | 61 | 1 |
| 1FGV | 107 | 61 | 1 |
| 1FIG | 214 | 61 | 1 |
| 1FLR | 219 | 61 | 1 |
| 1FOR | 210 | 61 | 1 |
| 1FRG | 217 | 61 | 1 |
| 1FVC | 109 | 61 | 1 |
| 1FVD | 214 | 61 | 1 |
| 1GAF | 214 | 61 | 1 |
| 1GGI | 211 | 61 | 1 |
| 1GIG | 210 | 61 | 1 |
| 1HIL | 211 | 61 | 1 |
| 1HKL | 214 | 61 | 1 |
| 1IAI | 214 | 61 | 1 |
| 1IBG | 213 | 61 | 1 |
| 1IGC | 213 | 61 | 1 |
| 1IGF | 214 | 61 | 1 |
| 1IGI | 213 | 61 | 1 |
| 1IND | 211 | 61 | 1 |
| 1JEL | 230 | 61 | 1 |
| 1JHL | 108 | 61 | 1 |
| 1KEM | 217 | 61 | 1 |
| 1MAM | 214 | 61 | 1 |
| 1MCP | 220 | 61 | 1 |
| 1MFA | 113 | 61 | 1 |
| 1MLB | 214 | 61 | 1 |
| 1MRD | 211 | 61 | 1 |
| 1NBV | 219 | 61 | 1 |
| 1NCB | 386 | 61 | 1 |
| 1NGQ | 211 | 61 | 1 |
| 1NMB | 385 | 61 | 1 |
| 1NSN | 213 | 61 | 1 |
| 1OPG | 214 | 61 | 1 |
| 1PLG | 215 | 61 | 1 |
| 1RMF | 219 | 61 | 1 |
| 1TET | 211 | 61 | 1 |
| 1UCB | 211 | 61 | 1 |
| 1VFA | 108 | 61 | 1 |
| 1VGE | 214 | 61 | 1 |
| 1YUH | 211 | 61 | 1 |
| 2CGR | 219 | 61 | 1 |
| 2FB4 | 214 | 61 | 1 |
| 2FBJ | 213 | 61 | 1 |
| 2GFB | 214 | 61 | 1 |
| 3HFL | 223 | 61 | 1 |
| 3HFM | 214 | 61 | 1 |
| 6FAB | 214 | 61 | 1 |
| 7FAB | 204 | 61 | 1 |
| 8FAB | 206 | 61 | 1 |