Chen Cao, Lincong Wang, Xiaoyang Chen, Shuxue Zou, Guishen Wang, Shutan Xu.
Abstract
Several secondary structures, such as the π-helix and the left-handed helix, have frequently been identified at protein ligand-binding sites. A secondary structure is considered to be constrained to a specific region of dihedral angles. However, a comprehensive analysis of the correlation between main-chain dihedral angles and ligand-binding sites has not been performed. We undertook an extensive analysis of the relationship between dihedral angles in proteins and their distance to ligand-binding sites, frequency of occurrence, molecular potential energy, amino acid composition, van der Waals contacts, and hydrogen bonds with ligands. The results showed that residues with dihedral angles in certain regions of the Ramachandran plot are strongly preferred at ligand-binding sites. We found that amino acids preceding the ligand-prefer ϕ/ψ box residues are more exposed to solvent, whereas amino acids following the ligand-prefer ϕ/ψ box residues form more hydrogen bonds and van der Waals contacts with ligands. When just one ligand-binding site was predicted, our method performed similarly to the program Ligsite-csc for both ligand-bound and ligand-free structures. These results should be useful for predicting protein ligand-binding sites and for analysing the relationship between structure and function.
Year: 2015 PMID: 26491686 PMCID: PMC4602322 DOI: 10.1155/2015/757495
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. Probability of (a) left-handed helix residues and (b) non-left-handed helix residues in the same region being observed at a ligand-binding site. The three ligands most frequently contacted by the residues in (a) and (b) are labelled with their three-letter codes. In (a) and (b), the probability, expressed as a percentage, is the number of residues detected in the ligand-binding site divided by the total number of residues observed in the 5° × 5° Ramachandran box. (c) and (d) show examples of non-left-handed helix residues (coloured residues) at a ligand-binding site in O-succinylbenzoate synthase, in the ligand-free form in (c) (pdbid: 2opj) and the ligand-bound form in (d) (pdbid: 2qvh). The dihedral angles of the residues are noted in bold font, and the ligand is indicated by purple spheres.
Figure 2. Observed probabilities at the ligand-binding site for (a) 5° × 5° Ramachandran boxes and (b) ligand-prefer Ramachandran boxes in nine regions. The probability increases from white to yellow to orange to black; boxes with probabilities > 0.6 are also shown in black in both panels. Angles are shown in degrees. Detailed ϕ/ψ boundaries for the nine regions are given in Table S1.
Figure 3. Observed distribution for (a) 5° × 5° Ramachandran boxes and (b) level 2 binding site residues. In both panels, the x-axis indicates the probability observed at the ligand-binding site and the y-axis gives the corresponding count.
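The per-box probability defined in the Figure 1 caption (residues detected in the ligand-binding site divided by all residues in the same 5° × 5° Ramachandran box) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input tuple layout, and the convention that ϕ and ψ lie in [-180°, 180°] are assumptions.

```python
import math
from collections import defaultdict

BOX = 5  # box edge in degrees, matching the 5° × 5° Ramachandran boxes


def box_index(phi, psi, box=BOX):
    """Map (phi, psi) in degrees, assumed to lie in [-180, 180], to a (column, row) box index."""
    n = 360 // box
    # An angle of exactly +180 is folded into the last box instead of opening a new one.
    col = min(math.floor((phi + 180.0) / box), n - 1)
    row = min(math.floor((psi + 180.0) / box), n - 1)
    return col, row


def box_probabilities(residues, box=BOX):
    """residues: iterable of (phi, psi, in_binding_site) tuples (hypothetical layout).

    Returns {box index: residues in binding site / all residues in that box}.
    """
    total = defaultdict(int)
    bound = defaultdict(int)
    for phi, psi, in_site in residues:
        key = box_index(phi, psi, box)
        total[key] += 1
        if in_site:
            bound[key] += 1
    return {key: bound[key] / total[key] for key in total}


# Hypothetical residues: three fall in one box, one falls in another.
example = [
    (61.0, 32.0, True),
    (62.5, 33.0, False),
    (64.9, 34.9, True),
    (-60.0, -45.0, False),
]
probabilities = box_probabilities(example)
```

Multiplying each value by 100 gives the percentage form used in Figure 1; boxes with no residues are simply absent from the result.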
Distribution of amino acids in nine ligand-prefer Ramachandran regions.
| Region | Number 1¹ | Number 2² | Probability³ | Top three AA occurrences⁴ | Top three AA observed at ligand site⁵ | Secondary structure features⁶ | Top three most frequent ligands⁷ |
|---|---|---|---|---|---|---|---|
| I | 1,525 | 5,366 | 0.284 | D(207), H(138), S(132) | H(45.8%), E(38.5%), G(36.3%) | EBS(24.7%), NBS(23%), | NAG(4.9%), FAD(3.8%), |
| II | 1,269 | 4,262 | 0.297 | D(150), N(140), H(123) | H(41.8%), E(41.2%), W(39.0%) | TC(18.2%) | FAD(6.8%), NAG(4.6%), |
| III | 1,895 | 6,779 | 0.279 | D(329), N(218), H(153) | H(53.7%), G(42.3%), M(36.1%) | GXT(40.1%) | FAD(6.4%), HEM(4.3%), |
| IV | 2,693 | 9,373 | 0.287 | D(327), H(224), N(295) | H(45.6%), D(39.5%), A(37.7%) | TC(16.8%), SCH(11.9%), | FAD(8.4%), NAG(5.8%), |
| V | 2,164 | 7,815 | 0.276 | T(258), D(157), V(141) | C(73.2%), H(43.5%), D(39.6%) | BU(20.7%), SCH(15.1%), | HEM(5.8%), FAD(4.1%), |
| VI | 2,369 | 7,714 | 0.307 | V(247), I(232), L(217) | C(63.3%), H(44.7%), W(33.1%) |  | HEM(6.4%), FAD(4.7%), |
| VII | 1,690 | 5,208 | 0.325 | G(837), D(207), N(113) | C(67.3%), H(51.2%), R(41.8%) | HC(24.0%), PP(12.5%), | FAD(9.5%), NAD(4.8%), |
| VIII | 5,974 | 20,007 | 0.298 | G(1171), N(840), D(695) | C(67.1%), H(51.3%), R(41.8%) | SCH(31.5%), LHH(17.8%), | NAG(7.5%), FAD(3.4%), |
| IX | 1,482 | 4,609 | 0.322 | G(1067), S(101), A(46) | C(72.2%), H(69.7%), S(59.4%) | LT2(51.1%) | FAD(9.2%), SAH(6.6%), |
| Other⁸ | 171,545 | 901,640 | 0.190 | L(90,807), A(72,088), V(69,604) | H(29.2%), C(27.6%), W(27.2%) | ALH(28.0%), EBS(12.9%), | FAD(5.4%), HEM(4.9%), |
¹ Total number of level 2 residues in the ligand-prefer Ramachandran boxes of region n, where n ranges from I to IX.
² Total number of level 3 residues in the ligand-prefer Ramachandran boxes of region n, where n ranges from I to IX.
³ The number in column 2 divided by the number in column 3.
⁴ The three most frequent amino acids among level 2 residues in the ligand-prefer Ramachandran boxes of region n.
⁵ Probability is the number of level 2 residues in the ligand-prefer Ramachandran boxes of region n divided by the number of level 3 residues in the same boxes.
⁶ Only residues that are not assigned as “undefined” are listed; for additional information about the DISICL assignment, refer to Table S2.
⁷ The three ligands (three-letter code in the PDB file) most frequently contacted by level 2 residues in the region.
⁸ “Other” is the remainder of the Ramachandran plot outside the nine regions.
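As a quick check of footnote 3, the Probability column can be recomputed from the two count columns. The sketch below copies the counts and reported probabilities from the table; the variable and function names are illustrative only.

```python
# (region, Number 1, Number 2, reported Probability), copied from the table above
rows = [
    ("I", 1525, 5366, 0.284),
    ("II", 1269, 4262, 0.297),
    ("III", 1895, 6779, 0.279),
    ("IV", 2693, 9373, 0.287),
    ("V", 2164, 7815, 0.276),
    ("VI", 2369, 7714, 0.307),
    ("VII", 1690, 5208, 0.325),
    ("VIII", 5974, 20007, 0.298),
    ("IX", 1482, 4609, 0.322),
    ("Other", 171545, 901640, 0.190),
]


def recompute(rows):
    """Return {region: Number 1 / Number 2} for every row."""
    return {region: n1 / n2 for region, n1, n2, _ in rows}


def max_abs_difference(rows):
    """Largest absolute gap between the recomputed ratio and the reported value."""
    return max(abs(n1 / n2 - reported) for _, n1, n2, reported in rows)


ratios = recompute(rows)
# The recomputed ratios agree with the reported column to within 0.001.
largest_gap = max_abs_difference(rows)
```

Every nine-region ratio (roughly 0.28 to 0.32) sits well above the ratio for the rest of the plot (about 0.19), which is the contrast the table is meant to show.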
Molecular potential energy for 20 amino acids in nine ligand-prefer Ramachandran regions (values in kJ/mol).
| AA | Average¹ | I | II | III | IV | V | VI | VII | VIII | IX |
|---|---|---|---|---|---|---|---|---|---|---|
| I | 69.8 | 72.1 | 84.7 | 84.4 | 69.8 | 82.4 | 77.6 | 64.8 | 71.4 | 78.4 |
| V | 47.4 | 45.5 | 59.0 | 58.2 |  |  | 55.7 |  |  |  |
| L | 70.6 | 73.9 | 74.7 | 71.9 |  | 69.7 | 73.8 | 82.1 | 73.2 | 79.1 |
| F | 80.0 | 91.2 | 90.3 | 90.6 |  |  | 91.7 | 90.0 | 82.0 | 81.3 |
| C | 38.0 | 37.7 | 42.2 | 42.3 | 43.6 | 43.9 |  | 42.9 | 40.4 |  |
| M | 59 | 63.5 | 55.4 | 67.2 | 71.7 | 70.0 | 67.5 |  | 61.8 |  |
| A | 31.9 | 31.7 | 37.6 | 33.4 | 31.9 | 38.6 | 30.7 | 32.8 | 33.4 | 29.4 |
| G | 27.2 | 24.3 | 30.6 | 30.9 | 36.9 | 35.9 | 37.2 | 31.4 | 34.1 | 31.3 |
| T | 49.7 | 44.5 | 51.3 | 53.3 | 54.5 | 46.4 | 51.2 | 56.0 |  | 47.5 |
| S | 42.6 | 44.4 | 44.3 | 45.5 | 48.5 | 47.7 | 47.4 | 45.6 | 41.8 | 50.3 |
| W | 203.7 | 203.3 | 208.7 | 208.3 | 211.4 | 212.8 | 210.5 | 214.2 | 206.5 | 211.9 |
| Y | 82.1 | 92.9 | 82.3 | 82.1 | 91.9 | 91.3 | 87.7 | 87.5 | 87.5 | 84.5 |
| P | 109.5 | — | — | — | 115.2 | 109.9 | — | 117.4 | — | — |
| H | 193.4 | 187.6 | 193.2 | 190.4 | 192.8 | 190.0 | 193.5 | 200.4 | 193.1 | 190.7 |
| E | 68.2 | 60.5 | 73.1 | 73.4 | 64.3 | 69.5 | 70.6 | 79.8 | 66.0 | 52.5 |
| Q | 36.6 | 42.4 | 46.7 | 47.4 | 37.6 |  | 37.2 | 40.9 | 38.7 |  |
| D | 71.1 | 67.8 | 74.8 | 71.7 | 85.1 | 76.4 | 72.0 | 84.5 | 73.2 | 73.0 |
| N | 41.8 | 35.2 | 46.8 | 42.1 | 52.2 | 46.3 | 39.0 |  | 45.8 | 43.7 |
| K | 74.6 | 80.5 | 76.8 | 76.2 | 74.0 | 79.3 | 79.4 | 71.5 | 71.9 | 62.4 |
| R | 208.6 | 206.8 | 207.9 | 204.4 | 208.9 | 209.7 | 209.1 | 209.5 | 208.1 |  |
¹ Average energy is calculated from residues that are not in the nine ligand-prefer Ramachandran regions; outlier energy calculations (E > 1000 kJ/mol) are excluded.
² Energy values that are 15 kJ/mol higher than the value in the second column are denoted in bold.
“—” represents regions in which Pro does not occur.
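The highlighting rule in footnote 2 can be expressed as a simple threshold test against the Average column. The sketch below is illustrative only: the function name is hypothetical, and treating "15 kJ/mol higher" as "at least 15 kJ/mol higher" is an assumption, since the footnote does not say whether the boundary is inclusive.

```python
THRESHOLD_KJ_PER_MOL = 15.0  # from footnote 2


def flag_elevated(average, region_energies, threshold=THRESHOLD_KJ_PER_MOL):
    """Return the regions whose energy exceeds the average by at least `threshold`.

    region_energies maps a region label to an energy in kJ/mol, or to None
    where the amino acid does not occur in that region. The inclusive
    comparison (>=) is an assumption.
    """
    return [
        region
        for region, energy in region_energies.items()
        if energy is not None and energy - average >= threshold
    ]


# Isoleucine row from the table above (Average = 69.8 kJ/mol).
ile = {
    "I": 72.1, "II": 84.7, "III": 84.4, "IV": 69.8, "V": 82.4,
    "VI": 77.6, "VII": 64.8, "VIII": 71.4, "IX": 78.4,
}
ile_flagged = flag_elevated(69.8, ile)

# Hypothetical values, used only to show the boundary behaviour.
demo_flagged = flag_elevated(50.0, {"I": 65.0, "II": 64.9, "III": None})
```

For the isoleucine row no region is flagged: the largest gap is 84.7 − 69.8 = 14.9 kJ/mol, just under the threshold.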
Figure 4. Average relative accessibility for ligand-prefer box residues (i) and their neighbours (at positions i − 1 and i + 1).
Figure 5. Number of residue-ligand VDW contacts (a) and residue-ligand hydrogen bonds (b) for ligand-prefer Ramachandran box residues (i) and their neighbours at positions i + 1 and i − 1.
Figure 6. Performance of ligand-binding site prediction by Ligsite-csc, our method, and a random baseline predictor for ligand-bound structures (a) and ligand-free structures (b). The y-axis represents the success rate, that is, the fraction of predictions for which the nearest distance between the predicted binding site and any atom of a ligand is less than or equal to the distance labelled on the x-axis.
Figure 7. An example of the ligand site prediction performance of Ligsite-csc and our method (pdbid: 2f48). The top-scoring binding site predicted by Ligsite-csc is denoted by a blue sphere, the four additional binding sites among the five top-scoring grids predicted by Ligsite-csc are denoted by light blue spheres, and the site predicted by our method is denoted by a red sphere. The protein surface is depicted in grey, and the ligand is shown as a purple stick. The ϕ/ψ angles of the four ligand-prefer Ramachandran box residues around the red sphere are indicated in bold font.
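The success rate plotted in Figure 6 can be sketched as follows: for each structure, take the nearest distance between the predicted site and any ligand atom, then report the fraction of structures whose distance is within each cutoff. This is a minimal sketch under stated assumptions, not the evaluation code used in the study: the function names, the coordinate tuples, and the example distances are all hypothetical.

```python
import math


def nearest_distance(predicted_site, ligand_atoms):
    """Smallest Euclidean distance from the predicted site to any ligand atom.

    predicted_site is an (x, y, z) tuple; ligand_atoms is a non-empty list of them.
    """
    return min(math.dist(predicted_site, atom) for atom in ligand_atoms)


def success_rate(nearest_distances, cutoff):
    """Fraction of structures whose nearest distance is <= cutoff."""
    hits = sum(1 for d in nearest_distances if d <= cutoff)
    return hits / len(nearest_distances)


def success_curve(nearest_distances, cutoffs):
    """(cutoff, success rate) pairs, i.e. the points of one curve in a Figure 6 style plot."""
    return [(c, success_rate(nearest_distances, c)) for c in cutoffs]


# Hypothetical nearest distances (in Å) for four structures.
distances = [2.0, 5.0, 9.0, 12.0]
curve = success_curve(distances, [4, 8, 12])
```

Because the comparison is inclusive, a structure whose nearest distance equals the cutoff counts as a success, matching the "less than or equal to" wording of the caption.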