Abstract
BACKGROUND: Protein-carbohydrate interactions are crucial in many biological processes, with implications for drug targeting and gene expression. The nature of protein-carbohydrate interactions may be studied at the level of individual residues by analyzing local sequence and structure environments in binding regions and comparing them with non-binding regions, which provide an inherent control for such analyses. With the ultimate aim of predicting binding sites from sequence and structure, overall statistics of binding regions need to be compiled. Sequence-based prediction of binding sites has been applied successfully to DNA-binding proteins in our earlier work. We aim to apply a similar analysis to carbohydrate-binding proteins. However, because a much smaller region of the protein takes part in such interactions, the methodology and results are significantly different. Protein-carbohydrate complexes have also been compared with other protein-ligand complexes.
Year: 2007 PMID: 17201922 PMCID: PMC1780050 DOI: 10.1186/1472-6807-7-1
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. Some typical protein-carbohydrate interactions. All the atoms making hydrogen-bonded contacts between sugars and amino acids are labelled.
Figure 2. Comparison of the binding-site propensity of each residue in Procarb40, PDNA62 and PLD116. A residue was marked as binding if any of its atoms fell within 3.5 Å of any atom of the ligand, DNA or carbohydrate. Propensity values were obtained by pooling all residues of the same type in all proteins into a single database of binding and non-binding sites. To compute the error bars, propensity values were calculated for each protein separately, and the standard deviation of these values was used as the error bar.
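The labelling rule in the Figure 2 caption (a residue is binding if any of its atoms lies within 3.5 Å of any ligand atom) can be illustrated with a minimal sketch. The function name `binding_residues`, the dictionary layout and the toy coordinates are assumptions made for illustration; they are not taken from the paper, and a real analysis would read the coordinates from structure files.

```python
import math

CUTOFF = 3.5  # Å, the cut-off distance stated in the caption


def binding_residues(residue_atoms, ligand_atoms, cutoff=CUTOFF):
    """Return the ids of residues with any atom within `cutoff` of any ligand atom.

    residue_atoms: dict mapping a residue id to a list of (x, y, z) tuples
    ligand_atoms:  list of (x, y, z) tuples for the ligand/DNA/carbohydrate
    """
    binding = set()
    for res_id, atoms in residue_atoms.items():
        # One close atom pair is enough to mark the whole residue as binding.
        if any(math.dist(a, l) <= cutoff for a in atoms for l in ligand_atoms):
            binding.add(res_id)
    return binding


# Hypothetical toy coordinates, not real structure data.
residues = {
    "TRP1": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    "ALA2": [(10.0, 0.0, 0.0)],
}
ligand = [(4.0, 0.0, 0.0)]

print(binding_residues(residues, ligand))
```

The all-pairs loop is quadratic in the number of atoms, which is adequate for a sketch; a spatial index would be the usual choice for large structures.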
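Propensities of Procarb40, PDNA62 and PLD116 along with their binding and non-binding data
| Procarb40 propensity | Procarb40 binding | Procarb40 non-binding | PDNA62 propensity | PDNA62 binding | PDNA62 non-binding | PLD116 propensity | PLD116 binding | PLD116 non-binding | |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |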
| 0.43 | 9 | 494 | 0.64 | 42 | 389 | 0.79 | 109 | 2684 | |
| 0.00 | 0 | 29 | 0.34 | 7 | 143 | 1.07 | 24 | 436 | |
| 1.41 | 27 | 433 | 0.36 | 18 | 292 | 0.79 | 84 | 2009 | |
| 1.81 | 29 | 356 | 0.39 | 32 | 510 | 0.92 | 92 | 1952 | |
| 0.66 | 9 | 318 | 0.77 | 33 | 245 | 1.09 | 70 | 1346 | |
| 0.80 | 20 | 581 | 0.71 | 46 | 372 | 1.26 | 176 | 2633 | |
| 1.58 | 8 | 114 | 1.08 | 39 | 194 | 2.09 | 81 | 712 | |
| 0.12 | 2 | 392 | 0.48 | 30 | 373 | 0.72 | 70 | 1837 | |
| 1.40 | 26 | 419 | 1.95 | 180 | 423 | 0.59 | 65 | 2053 | |
| 0.34 | 8 | 561 | 0.38 | 39 | 624 | 0.81 | 120 | 2872 | |
| 0.19 | 1 | 124 | 0.54 | 14 | 149 | 1.11 | 42 | 716 | |
| 1.96 | 38 | 429 | 1.45 | 74 | 260 | 1.17 | 92 | 1485 | |
| 0.40 | 5 | 297 | 0.66 | 35 | 307 | 0.45 | 38 | 1597 | |
| 1.54 | 18 | 263 | 1.19 | 61 | 272 | 0.74 | 46 | 1123 | |
| 2.77 | 32 | 246 | 2.41 | 208 | 360 | 1.80 | 139 | 1450 | |
| 0.43 | 9 | 499 | 1.33 | 91 | 355 | 1.03 | 112 | 2049 | |
| 0.70 | 15 | 499 | 1.36 | 85 | 325 | 0.87 | 90 | 2030 | |
| 0.00 | 0 | 472 | 0.59 | 40 | 399 | 0.73 | 92 | 2315 | |
| 3.31 | 23 | 144 | 1.40 | 22 | 81 | 2.30 | 67 | 518 | |
| 1.68 | 25 | 333 | 1.19 | 43 | 189 | 1.88 | 125 | 1189 | |
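The propensity values in the table can be reproduced from the binding and non-binding counts. The sketch below assumes that propensity is the fraction of a residue type found in binding sites divided by the overall binding fraction of the pooled dataset. This formula is not stated in the text above; it is an assumption that happens to reproduce the first three-column group of the table (0.43 for the first row, 3.31 for the nineteenth). The name `propensities` is illustrative.

```python
# (binding, non-binding) counts from the first three-column group of the table,
# listed in table row order.
COUNTS = [
    (9, 494), (0, 29), (27, 433), (29, 356), (9, 318),
    (20, 581), (8, 114), (2, 392), (26, 419), (8, 561),
    (1, 124), (38, 429), (5, 297), (18, 263), (32, 246),
    (9, 499), (15, 499), (0, 472), (23, 144), (25, 333),
]


def propensities(counts):
    """Assumed definition: (b / (b + n)) / (B / (B + N)).

    b, n are the binding and non-binding counts of one residue type;
    B, N are the same counts summed over all residue types.
    """
    total_binding = sum(b for b, _ in counts)
    total_all = sum(b + n for b, n in counts)
    overall_fraction = total_binding / total_all
    return [(b / (b + n)) / overall_fraction for b, n in counts]


values = propensities(COUNTS)
for row, value in enumerate(values, start=1):
    print(f"row {row:2d}: {value:.2f}")
```

Under this definition a value above 1 means the residue type is over-represented in binding sites relative to the dataset as a whole, and a value of 0 means it was never observed in a binding site.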
Figure 3. Comparison between mean ASA values of residues in binding and non-binding sites for Procarb40. Error bars are taken from the standard deviation in each protein. The graph does not contain cysteine and valine data, as none of these residues was found in the binding regions.
Comparison of binary and PSSM prediction results using the jackknife (leave-one-out) method. Binding sites were labeled at a 3.5 Å cut-off distance between carbohydrate and protein atoms.
| Dataset | Method | Sensitivity | Specificity | Mean of sensitivity and specificity | P-value |
| --- | --- | --- | --- | --- | --- |
| GalBind18 | Leave-one-out (using PSSM) | 0.63 | 0.79 | 0.71 | 0.08859 |
| GalBind18 | Leave-one-out (using single sequences) | 0.62 | 0.68 | 0.65 | |
| Procarb40 | Leave-one-out (using PSSM) | 0.87 | 0.23 | 0.55 | 0.00209 |
| Procarb40 | Leave-one-out (using single sequences) | 0.68 | 0.55 | 0.61 | |
Because a large number of iterations is required in the leave-one-out method, the prediction performance has a significant standard deviation, which is shown in brackets. P-values are from a two-tailed t-test conducted to distinguish between the prediction performance of single sequences and that of evolutionary information encoded by PSSMs. In Procarb40, evolutionary profiles give a significantly poorer result than single sequences, owing to a high false-positive rate (low specificity).
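The evaluation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline. It assumes that the first two numeric columns of the table are sensitivity and specificity and that the third is their mean, which is consistent with the tabulated values (for example, (0.63 + 0.79) / 2 = 0.71). The names `sensitivity_specificity` and `leave_one_out` and the toy labels are hypothetical.

```python
def sensitivity_specificity(true_labels, predicted_labels):
    """Compute sensitivity and specificity from per-residue binary labels."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1
        elif truth and not pred:
            fn += 1
        elif not truth and pred:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


def leave_one_out(items):
    """Yield (held_out, training) pairs, holding out each item exactly once."""
    for i, held_out in enumerate(items):
        yield held_out, items[:i] + items[i + 1:]


# Hypothetical per-residue labels for one held-out protein (1 = binding).
truth = [1, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 0]

sens, spec = sensitivity_specificity(truth, predicted)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} mean={(sens + spec) / 2:.2f}")

# Each protein is held out once; a model would be trained on the remainder.
for held_out, training in leave_one_out(["protA", "protB", "protC"]):
    print(held_out, training)
```

A many-false-positives predictor scores high on sensitivity and low on specificity, which is the pattern reported for the PSSM row of Procarb40.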
The Procarb40 dataset. Some cells are left empty because no Pfam ID could be found for them.
| Pfam family | Ligand name | Formula | Ligand code | |
| --- | --- | --- | --- | --- |
| Serpins | ALPHA-D-MANNOSE | C6 H12 O6 | MAN | |
| Interferon | ZINC ION | ZN1 2+ | ZN | |
| Fibroblast Growth Factors | SELENOMETHIONINE | 6(C5 H11 N1 O2 SE1) | MSE | |
| Lectin_legB | CALCIUM ION | 4(CA1 2+) | CA | |
| CBM_5_12 | GLYCEROL | C3 H8 O3 | GOL | |
| Toxin_R_bind_C | GLUCOSE | 2(C6 H12 O6) | GLC | |
| Ricin_B_lectin | FUCOSE | C6 H12 O5 | FUC | |
| EGF | FUCOSE | C6 H12 O5 | FUC | |
| Annexin | CALCIUM ION | 9(CA1 2+) | CA | |
| Kringle | O2-SULFO-GLUCURONIC ACID | 3(C6 H10 O10 S1) | IDS | |
| CBM4/9 | CALCIUM ION | CA1 2+ | CA | |
| Family 29 carbohydrate binding module | GLUCOSE | C6 H14 O6 | GLC | |
| Bac_rhodopsin | GLUCOSE | C6 H12 O6 | GLC | |
| PapG_N | GLUCOSE | C6 H12 O6 | GLC | |
| B_lectin | ALPHA-D-MANNOSE | 8(C6 H12 O6) | MAN | |
| Lectin_legB | FUCOSE | C6 H12 O5 | FUC | |
| | ALPHA-D-MANNOSE | 8(C6 H12 O6) | MAN | |
| | CALCIUM ION | CA1 2+ | CA | |
| Lectin_legB | CALCIUM ION | 2(CA1 2+) | CA | |
| V-set | GLUCOSE | 2(C6 H12 O6) | GLC | |
| Sushi | O2-SULFO-GLUCURONIC ACID | 8(C6 H10 O10 S1) | IDS | |
| Stap_Strp_tox_C | GLUCOSE | C6 H12 O6 | GLC | |
| Lectin_C | CALCIUM ION | 3(CA1 2+) | CA | |
| Gal-bind_lectin | D-GALACTOSE | 4(C6 H12 O6) | GAL | |
| Chitin_bind_1 | AMINO GROUP | H2 N1 | NH2 | |
| Sulfotransfer_1 | SODIUM ION | 2(NA1 1+) | NA | |
| | D-GALACTOSE | 4(C6 H12 O6) | GAL | |
| CBM_6 | CALCIUM ION | 2(CA1 2+) | CA | |
| | SODIUM ION | NA1 1+ | NA | |
| CBM_6 | CALCIUM ION | 4(CA1 2+) | CA | |
| | ALPHA-D-MANNOSE | 20(C6 H12 O6) | MAN | |
| Polyoma Coat | D-GALACTOSE | 5(C6 H12 O6) | GAL | |
| CBM_6 | SODIUM ION | 6(NA1 1+) | NA | |
| Toxin_1 | CITRIC ACID | C6 H8 O7 | CIT | |
| SLT beta | BUTYL GROUP | 3(C4 H9) | BUT | |
| Plug | GLUCOSE | C6 H12 O6 | GLC | |
| LamB | GLUCOSE | C6 H12 O6 | GLC | |
| Enterotoxin b | GLUCOSE | 5(C6 H12 O6) | GLC | |
| Cellulase | ALPHA-D-MANNOSE | 3(C6 H12 O6) | MAN | |
| | GLUCOSE | 3(C6 H12 O6) | GLC | |