Hemajit Singh, Shandar Ahmad.
Abstract
BACKGROUND: The solvent accessibility (ASA) of amino acid residues is often converted from absolute values of exposed surface area to normalized relative values. This normalization typically assumes a highest-exposure conformation based on the extended state of the residue when it is flanked by Ala or Gly on both sides, i.e. the solvent-exposed area of Ala-X-Ala or Gly-X-Gly. The exact sequence context, the folding state of the residues, and the actual environment of a folded protein all impose additional constraints on the highest possible (or highest observed) values of ASA, but they are currently ignored. Here, we analyze the statistics of these constraints and examine how normalizing absolute ASA values by the context-dependent Highest Observed ASA (HOA), instead of the context-free extended state ASA (ESA) of residues, can influence the performance of sequence-based prediction of solvent accessibility. Characterization of buried and exposed states of residues based on this normalization has also been shown to provide better enrichment of DNA-binding sites in exposed residues.
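The difference between the two normalizations can be illustrated with a minimal sketch. The reference values, the tripeptide key, and the function name `relative_asa` below are illustrative assumptions, not numbers or code from the paper; the only point is that ESA uses one reference per residue type while HOA uses one reference per tripeptide context.

```python
def relative_asa(absolute_asa, reference_asa):
    """Return relative ASA as a percentage of the chosen reference value."""
    if reference_asa <= 0:
        raise ValueError("reference ASA must be positive")
    return 100.0 * absolute_asa / reference_asa


# Hypothetical reference values, chosen only to make the arithmetic easy.
ESA_REFERENCE = {"S": 120.0}              # context-free: one value per residue type
HOA_REFERENCE = {("G", "S", "A"): 150.0}  # context-dependent: one value per tripeptide

absolute = 135.0  # hypothetical absolute ASA of the central Ser in Gly-Ser-Ala

esa_rel = relative_asa(absolute, ESA_REFERENCE["S"])
hoa_rel = relative_asa(absolute, HOA_REFERENCE[("G", "S", "A")])

print(f"ESA-normalized: {esa_rel:.1f}%")  # exceeds 100% because 135 > 120
print(f"HOA-normalized: {hoa_rel:.1f}%")  # stays below 100% because 135 < 150
```

With an ESA reference smaller than the observed value, the relative ASA exceeds 100%, which is the over-exposure case the abstract refers to. The HOA reference is a maximum over observed structures, so it bounds the relative value at 100% for the data it was derived from.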
Year: 2009 PMID: 19397821 PMCID: PMC2685369 DOI: 10.1186/1472-6807-9-25
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. Histogram of highest observed ASA (HOA) of X-Z-Y tripeptides for residue Z, where X and Y are the flanking amino acid residues in actual structures. The vertical line indicates the (currently used) extended state ASA (ESA) of Ala-X-Ala. Subplots are arranged alphabetically by one-letter code (Ala to Tyr). The X-axis shows the HOA and the Y-axis shows the number of tripeptides (out of 400 possible combinations) whose HOA falls in that range.
Figure 2. Percentage of residues with ASA values above 100% when normalized by their extended state ASA (ESA).
Number of over-exposed residues with their exposed surface area (ASA) greater than the Ala-X-Ala extended state ASA (ESA).
| Freq | Freq | Freq | Freq | ||||
| 0.66 | 2.35 | 0.36 | 1.91 | ||||
| 0.10 | 0.62 | 3.31 | 0.94 | ||||
| 3.45 | 0.05 | 0.45 | 0.15 | ||||
| 2.54 | 0.76 | 1.22 | 0.11 | ||||
| 0.15 | 0.09 | 0.88 | 0.20 |
Typically, polar and charged residues have a significant number of such cases.
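Counting such over-exposed cases is straightforward once absolute ASA values and ESA references are available. The following is a minimal sketch under stated assumptions: the function name, the record layout `(residue, absolute_asa)`, and all numbers are hypothetical and are not taken from the paper's data set.

```python
from collections import defaultdict


def percent_over_exposed(records, esa_reference):
    """Per residue type, return the percentage of observations whose
    absolute ASA exceeds the extended-state (ESA) reference."""
    total = defaultdict(int)
    over = defaultdict(int)
    for residue, asa in records:
        total[residue] += 1
        if asa > esa_reference[residue]:
            over[residue] += 1
    return {res: 100.0 * over[res] / total[res] for res in total}


# Hypothetical observations: (one-letter residue code, absolute ASA).
records = [
    ("K", 210.0), ("K", 100.0), ("K", 50.0), ("K", 220.0),
    ("L", 90.0), ("L", 30.0),
]
esa_reference = {"K": 200.0, "L": 180.0}  # hypothetical ESA values

over_exposed = percent_over_exposed(records, esa_reference)
print(over_exposed)
```

In this toy input, two of the four Lys observations exceed the reference and none of the Leu observations do.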
Frequency of residues in Ala-X-Ala conformations, their extended state ASA (ESA) values and highest observed ASA (HOA) obtained from the entire data set of proteins (8.9 million residues, overall including residues with different sequence neighbors).
| Freq | ESA | HOA | CC | Freq | ESA | HOA | CC | ||
| 998 | 140.4 | 120 | 0.9200 | 7428 | 183.1 | 173 | 0.9885 | ||
| 889 | 240.5 | 217 | 0.9633 | 3694 | 117.2 | 142 | 0.9885 | ||
| 1554 | 200.1 | 181 | 0.9740 | 3049 | 138.7 | 149 | 0.9891 | ||
| 1857 | 213.7 | 223 | 0.9813 | 2810 | 178.6 | 204 | 0.9893 | ||
| 2453 | 200.7 | 202 | 0.9838 | 2910 | 146.4 | 176 | 0.9895 | ||
| 4114 | 185.0 | 167 | 0.9843 | 3297 | 229.0 | 280 | 0.9895 | ||
| 1849 | 181.9 | 200 | 0.9845 | 2528 | 141.9 | 138 | 0.9901 | ||
| 5328 | 78.7 | 96 | 0.9857 | 4192 | 205.7 | 219 | 0.9916 | ||
| 8032 | 110.2 | 120 | 0.9873 | 4962 | 174.7 | 201 | 0.9917 | ||
| 5704 | 153.7 | 148 | 0.9875 | 4567 | 144.1 | 165 | 0.9934 |
The last column gives the correlation coefficient (CC) between the relative ASA obtained by normalizing with the Ala-X-Ala extended state ASA (ESA) and that obtained by normalizing with the tripeptide highest observed ASA (HOA, the context-dependent reference). CC values are based on all residues, including those with different sequence neighbors. Residues are arranged in order of increasing CC.
Abbreviations: Res: residue ID, Freq: number of residues in Ala-X-Ala conformation, ESA: extended state ASA of Ala-X-Ala conformation, HOA: highest observed ASA in Y-X-Z conformation (any neighbor), CC: correlation coefficient between ESA-normalized and HOA-normalized values in the benchmark data set.
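The CC column compares two normalizations of the same absolute ASA values. A minimal sketch of that comparison for one residue type follows; it uses a plain Pearson correlation and hypothetical tripeptide contexts and reference values, so it shows the idea and is not a reproduction of the paper's calculation.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical Lys observations: (tripeptide context, absolute ASA).
observations = [("AKA", 120.0), ("GKD", 60.0), ("AKA", 30.0),
                ("LKE", 150.0), ("GKD", 90.0)]
esa = 200.0                                       # one reference for all contexts
hoa = {"AKA": 180.0, "GKD": 210.0, "LKE": 220.0}  # one reference per context

esa_rel = [100.0 * asa / esa for _, asa in observations]
hoa_rel = [100.0 * asa / hoa[ctx] for ctx, asa in observations]

cc = pearson(esa_rel, hoa_rel)
print(f"CC between ESA- and HOA-normalized values: {cc:.4f}")
```

Because the HOA denominator changes with context while the ESA denominator does not, the two relative scales are not simple multiples of each other, and the correlation falls slightly below 1.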
Figure 3. Typical distribution of ASA for a residue in similar tripeptide environment.
Distribution of the 8000 HOA residue environments across secondary structures.
| Secondary structure | % | Count |
| Alpha-helix | 7.73 | 672 |
| Strand | 3.11 | 271 |
| Beta Bridge | 0.21 | 18 |
| 3–10 helix | 3.86 | 336 |
| Bend | 21.50 | 1874 |
| Turn | 30.40 | 2650 |
| Coil | 33.00 | 2870 |
For each tripeptide environment, the (central) residue with the highest ASA is selected, and the distribution of these residues across secondary structures is calculated.
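The selection described above can be sketched as a single pass that keeps the maximum-ASA observation per tripeptide and then tallies the secondary structure labels of those maxima. The record layout, function names, and sample values are assumptions for illustration; ties are resolved here by keeping the first maximum seen, which may differ from the authors' handling.

```python
from collections import Counter


def highest_observed(records):
    """Map each tripeptide to the (ASA, secondary structure) of its
    highest-ASA observation. The first maximum wins on ties."""
    best = {}
    for tripeptide, asa, sec_struct in records:
        if tripeptide not in best or asa > best[tripeptide][0]:
            best[tripeptide] = (asa, sec_struct)
    return best


def secondary_structure_percentages(best):
    """Percentage of HOA environments found in each secondary structure."""
    counts = Counter(sec_struct for _, sec_struct in best.values())
    total = sum(counts.values())
    return {ss: 100.0 * n / total for ss, n in counts.items()}


# Hypothetical observations: (tripeptide, absolute ASA of central residue, state).
records = [
    ("AKA", 120.0, "Alpha-helix"), ("AKA", 150.0, "Turn"),
    ("GSD", 95.0, "Coil"), ("GSD", 80.0, "Strand"),
    ("LKE", 130.0, "Turn"), ("VLI", 60.0, "Strand"),
]

best = highest_observed(records)
distribution = secondary_structure_percentages(best)
print(distribution)
```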
Figure 4. Distribution of residues' tripeptide environments with highest ASA in various secondary structures and.
Figure 5. A Ser residue in .
Comparison of the prediction performance obtained by using ESA- and HOA-normalized target ASA values.
| Res | Count | MAE Rel | MAE Rel | MAE Abs | MAE Abs | CC Rel | CC Rel | CC Abs | CC Abs | t-test | u-test |
| A | 30972 | 14.81 | 14.13 | 16.22 | 15.64 | 0.63 | 0.65 | 0.63 | 0.66 | 0.002 | 5.0e-10 |
| C | 5156 | 10.39 | 13.28 | 14.53 | 12.25 | 0.33 | 0.35 | 0.33 | 0.37 | 8.39e-13 | 0 |
| D | 22482 | 21.04 | 17.30 | 30.31 | 29.73 | 0.49 | 0.50 | 0.49 | 0.52 | 0.018 | 0.579 |
| E | 25303 | 20.09 | 17.00 | 35.11 | 34.12 | 0.49 | 0.50 | 0.49 | 0.52 | 5.21e-04 | 0.027 |
| F | 15016 | 11.60 | 11.63 | 23.22 | 21.71 | 0.40 | 0.44 | 0.40 | 0.45 | 7.28e-06 | 0 |
| G | 27935 | 20.25 | 17.60 | 15.31 | 15.90 | 0.51 | 0.52 | 0.50 | 0.54 | 0.621 | 2.3e-4 |
| H | 8596 | 16.22 | 16.36 | 29.50 | 29.28 | 0.54 | 0.55 | 0.54 | 0.56 | 0.825 | 0.581 |
| I | 21345 | 10.80 | 11.21 | 20.05 | 17.55 | 0.46 | 0.50 | 0.46 | 0.50 | 2.79e-13 | 0 |
| K | 21997 | 17.82 | 17.46 | 36.64 | 36.97 | 0.43 | 0.43 | 0.43 | 0.45 | 0.015 | 0.008 |
| L | 34119 | 11.40 | 11.27 | 20.86 | 19.34 | 0.47 | 0.51 | 0.47 | 0.51 | 1.58e-12 | 0 |
| M | 8130 | 12.20 | 12.99 | 24.40 | 21.84 | 0.48 | 0.52 | 0.48 | 0.53 | 3.44e-10 | 0 |
| N | 16660 | 20.87 | 17.63 | 30.55 | 30.38 | 0.53 | 0.52 | 0.53 | 0.54 | 0.621 | 0.148 |
| P | 17214 | 17.96 | 18.39 | 25.47 | 25.23 | 0.52 | 0.53 | 0.52 | 0.54 | 0.514 | 0.426 |
| Q | 14334 | 18.73 | 17.21 | 33.46 | 33.31 | 0.52 | 0.52 | 0.52 | 0.53 | 0.993 | 0.194 |
| R | 18302 | 17.57 | 16.21 | 40.23 | 39.85 | 0.46 | 0.46 | 0.46 | 0.47 | 0.243 | 0.647 |
| S | 22408 | 18.62 | 16.48 | 21.83 | 21.64 | 0.56 | 0.57 | 0.56 | 0.58 | 0.944 | 0.092 |
| T | 20973 | 16.71 | 15.61 | 23.17 | 22.97 | 0.56 | 0.58 | 0.56 | 0.58 | 0.710 | 0.343 |
| V | 26399 | 11.72 | 11.69 | 18.04 | 16.56 | 0.52 | 0.55 | 0.51 | 0.55 | 0.000 | 0 |
| W | 5625 | 11.77 | 13.80 | 28.35 | 26.76 | 0.36 | 0.37 | 0.36 | 0.38 | 0.008 | 9.93e-7 |
| Y | 13636 | 13.24 | 13.45 | 28.29 | 27.73 | 0.42 | 0.43 | 0.42 | 0.43 | 0.110 | 0.040 |
Abbreviations: Res: Residue ID, MAE: mean absolute error in prediction measured in percentage points, CC: Correlation coefficient, Rel: relative ASA values, Abs: using absolute area units, t-test: Student's t-test statistics, u-test: Mann-Whitney u-test
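The table reports errors both in relative units (percentage points) and in absolute area units. A minimal sketch of how the two relate is shown below: a prediction made on the relative scale is converted back to absolute area by multiplying by the reference used for normalization. The function names and all numbers are hypothetical, and the predictor itself is outside the scope of this sketch.

```python
def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


def to_absolute(relative_percent, references):
    """Convert relative ASA (percent) back to absolute area units."""
    return [r * ref / 100.0 for r, ref in zip(relative_percent, references)]


# Hypothetical data for three residues.
observed_abs = [30.0, 90.0, 120.0]   # observed absolute ASA
references = [150.0, 150.0, 200.0]   # reference (ESA or HOA) per residue
predicted_rel = [25.0, 55.0, 65.0]   # predictor output on the relative scale

observed_rel = [100.0 * a / ref for a, ref in zip(observed_abs, references)]
predicted_abs = to_absolute(predicted_rel, references)

mae_rel = mean_absolute_error(predicted_rel, observed_rel)
mae_abs = mean_absolute_error(predicted_abs, observed_abs)
print(f"MAE (relative): {mae_rel:.2f} percentage points")
print(f"MAE (absolute): {mae_abs:.2f} area units")
```

Relative errors obtained with different reference states are on different scales, which is one reason to also compare the two normalizations in absolute units.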
Figure 6. Improvement in the prediction of ASA using HOA- and ESA-normalized data sets. Both MAE and the correlation coefficient between absolute ASA values are shown, and improvement is defined relative to the MAE in the ESA-normalized predictions.
Figure 7. Difference between the frequency of DNA-binding residues and non-binding residues in various ASA ranges taken from protein-DNA complexes. This difference is best separated for buried and exposed regions defined in terms of HOA-normalized relative ASA. According to that classification, we can say with the best confidence that exposed regions contain many more DNA-binding residues than buried regions.
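The quantity plotted in Figure 7 can be sketched as the per-bin difference between the percentage of binding residues and the percentage of non-binding residues. The sketch below assumes a hypothetical 25% threshold separating buried from exposed residues, a hypothetical function name, and made-up input; none of these come from the paper.

```python
from bisect import bisect_right


def binding_frequency_difference(residues, lower_edges):
    """For each ASA bin, return (% of binding residues in the bin) minus
    (% of non-binding residues in the bin). Bins are defined by their
    lower edges, and the last bin is open-ended."""
    binding = [0] * len(lower_edges)
    non_binding = [0] * len(lower_edges)
    for rel_asa, is_binding in residues:
        index = bisect_right(lower_edges, rel_asa) - 1
        if index < 0:
            continue  # value below the first edge; ignored in this sketch
        if is_binding:
            binding[index] += 1
        else:
            non_binding[index] += 1
    total_binding = sum(binding)
    total_non_binding = sum(non_binding)
    if total_binding == 0 or total_non_binding == 0:
        raise ValueError("need at least one residue of each class")
    return [100.0 * b / total_binding - 100.0 * n / total_non_binding
            for b, n in zip(binding, non_binding)]


# Hypothetical residues: (relative ASA in percent, is DNA-binding).
residues = [(5.0, False), (10.0, False), (15.0, True), (40.0, False),
            (60.0, True), (80.0, True), (90.0, True), (30.0, False)]

# Two bins: buried [0, 25) and exposed [25, inf); the threshold is an assumption.
difference = binding_frequency_difference(residues, [0.0, 25.0])
print(difference)
```

A positive value in the exposed bin and a negative value in the buried bin indicate enrichment of binding residues among exposed residues. Running the same function on ESA- and HOA-normalized values would allow the two separations to be compared.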