Seyed Majid Saberi Fathi, Jack A Tuszynski.
Abstract
BACKGROUND: This paper provides a simple and rapid method for a protein-clustering strategy. The basic idea implemented here is to use computational geometry methods to predict and characterize ligand-binding pockets of a given protein structure. In addition to geometrical characteristics of the protein structure, we consider some simple biochemical properties that help recognize the best candidates for pockets in a protein's active site.
Year: 2014 PMID: 25038637 PMCID: PMC4112621 DOI: 10.1186/1472-6807-14-18
Source DB: PubMed Journal: BMC Struct Biol ISSN: 1472-6807
Figure 1. The 3D polyhedron (convex hull) for the PDB:1ABT structure.
Figure 2. A given triangle on the convex hull for the PDB:1ABT structure. The three vertices are labeled 1, 2, and 3. The point p is determined by the extreme values of x, y, and z of these three vertices. The distance of atom i to the triangle is obtained as follows. First, obtain the normal vector to the triangle, $\mathbf{N} = (\mathbf{x}_2 - \mathbf{x}_1) \times (\mathbf{x}_3 - \mathbf{x}_1)$, where $\mathbf{x}_1$, $\mathbf{x}_2$, and $\mathbf{x}_3$ are the vectors from the origin of the Cartesian coordinate system to the three vertices. Then, calculate the angle between the normal vector and the line passing through atom i and one of the vertices of this triangle using the relation $\cos\theta = \mathbf{N} \cdot (\mathbf{x}_i - \mathbf{x}_1) / (|\mathbf{N}|\,|\mathbf{x}_i - \mathbf{x}_1|)$. Finally, compute the distance as $d_i = |\mathbf{x}_i - \mathbf{x}_1| \cos\theta$, where $\mathbf{x}_i$ is the vector joining the origin and a given point in this volume.
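The distance construction described in the Figure 2 caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the function name `distance_to_triangle_plane`, the helper names, and the choice of vertex 1 as the reference vertex are assumptions. The value returned is the signed distance from the atom to the plane containing the triangle, positive on the side the normal points to.

```python
import math

def sub(a, b):
    """Component-wise difference a - b of two 3D points."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    """Cross product a x b."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def distance_to_triangle_plane(atom, v1, v2, v3):
    """Signed distance from `atom` to the plane through v1, v2, v3.

    Assumption: vertex 1 is used as the reference vertex and the
    normal is N = (v2 - v1) x (v3 - v1).
    """
    n = cross(sub(v2, v1), sub(v3, v1))   # normal vector N
    r = sub(atom, v1)                     # vector from vertex 1 to the atom
    if norm(r) == 0.0:                    # atom coincides with the vertex
        return 0.0
    cos_theta = dot(n, r) / (norm(n) * norm(r))
    return norm(r) * cos_theta            # |x_i - x_1| cos(theta)

# Hypothetical triangle lying in the z = 0 plane.
d = distance_to_triangle_plane((0.2, 0.3, 2.0), (0, 0, 0), (1, 0, 0), (0, 1, 0))
```

Taking the absolute value of the result gives an unsigned distance if the side of the triangle does not matter.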
Figure 3. The steps of the algorithm illustrated (in 2D for clarity) using the PDB:1ABT structure. The red dots represent empty voxels and the blue dots are voxels containing protein atoms. The atom positions have been averaged along the z-axis. (a) A convex hull enclosing the protein atoms is generated. (b) A line (a triangle in 3D) on the surface of the hull is selected. The part of a given pocket that lies inside the convex hull is shown.
Figure 4. Schematic illustration of the overlap between two pockets.
Main biochemical interactions of atoms and residues in the proteins [49,51,52]
| THR | |
| SER | |
| GLN | |
| ASN | |
| TYR | |
| CYS | |
| MET | |
| ALA | |
| PRO | |
| LEU | |
| VAL | |
| ILE | |
| ASP | |
| GLU | |
| LYS | |
| ARG | |
| HIS | |
| PHE | |
| TRP | |
| TYR | |
| GLY | No participation |
Abbreviations used: HBA: Hydrogen bond acceptor, HBD: Hydrogen bond donor, vdW: van der Waals interaction, Ion: Ionic interaction, Sul: Sulfur interaction.
Ligand biochemistry
| Unprotonated atoms in ligand | 1) O has a connection with N, P or Zn |
| | 2) O only has a connection with C |
| Protonated atoms in ligand | 1) Ca |
| | 2) N has only two connections with C |
The bond list is given in the PDB file CONECT lines.
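As a rough illustration of how the bond list might be read, the sketch below parses CONECT records using the fixed-column layout of the PDB format (atom serial in columns 7-11, bonded atom serials in columns 12-31). The function name `parse_conect` and the example lines are hypothetical, and this is not the authors' implementation.

```python
from collections import defaultdict

def parse_conect(pdb_lines):
    """Return {atom_serial: sorted list of bonded atom serials}.

    Reads only CONECT records. Columns 7-11 hold the atom serial and
    columns 12-16, 17-21, 22-26, 27-31 hold the bonded atom serials.
    """
    bonds = defaultdict(set)
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        # Five 5-character fields starting at column 7 (index 6).
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [int(f) for f in fields if f]
        if not serials:
            continue
        atom, partners = serials[0], serials[1:]
        for partner in partners:
            bonds[atom].add(partner)
            bonds[partner].add(atom)   # store each bond in both directions
    return {serial: sorted(p) for serial, p in bonds.items()}

def conect_line(*serials):
    """Build a CONECT record for the example (hypothetical helper)."""
    return "CONECT" + "".join(f"{s:>5}" for s in serials)

# Hypothetical records: atom 1 bonded to atoms 2 and 3.
example = [conect_line(1, 2, 3), conect_line(2, 1)]
bond_list = parse_conect(example)
```

Combined with the element of each atom (taken from the HETATM records), such a bond list is enough to test rules like "O only has a connection with C".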
Figure 5. Three-dimensional structural representation of 1A6U. The atoms are shown with yellow dots and the surface atoms of a given pocket are shown with red crosses.
Pockets and their characteristics recognized by our method for 1A6U protein atoms
| 1 | 63 | 401 | 116.25 | 28.40 | 5 | 8 | 0 | 1 | 0 | 20 | 0.31 | 0.33 |
| 5 | 80 | 481 | 21.83 | 38.66 | 2 | 3 | 10 | 2 | 0 | 2 | 0 | 0 |
| 18 | 101 | 648 | 187.27 | 25.83 | 5 | 7 | 6 | 2 | 0 | 14 | 0.12 | 0.11 |
| 19 | 67 | 411 | 84.36 | 19.35 | 1 | 2 | 5 | 0 | 0 | 2 | 0 | 0 |
| 38 | 44 | 266 | 138.90 | 20.63 | 1 | 4 | 1 | 0 | 0 | 6 | 0 | 0 |
| 39 | 85 | 499 | 82.58 | 28.26 | 3 | 5 | 2 | 0 | 0 | 14 | 0.31 | 0.22 |
| 40 | 21 | 127 | 77.97 | 14.53 | 2 | 3 | 0 | 0 | 0 | 4 | 0.06 | 0 |
| 58 | 118 | 765 | 340.90 | 29.83 | 5 | 4 | 7 | 3 | 0 | 3 | 0 | 0 |
| 59 | 86 | 529 | 253.20 | 26.72 | 4 | 4 | 4 | 2 | 0 | 6 | 0.06 | 0 |
| 85 | 226 | 1360 | 370.14 | 36.18 | 7 | 7 | 26 | 3 | 1 | 27 | 0 | 0 |
| 89 | 21 | 141 | 212.35 | 21.47 | 0 | 1 | 4 | 1 | 0 | 4 | 0 | 0 |
| 90 | 92 | 573 | 293.28 | 28.54 | 4 | 2 | 15 | 2 | 0 | 11 | 0 | 0 |
| 112 | 44 | 241 | 36.33 | 27.39 | 1 | 2 | 1 | 0 | 0 | 6 | 0.06 | 0 |
| 117 | 38 | 215 | 76.66 | 17.42 | 1 | 3 | 0 | 0 | 0 | 8 | 0 | 0 |
| 137 | 15 | 99 | 127.57 | 17.53 | 2 | 4 | 0 | 0 | 0 | 3 | 0.25 | 0.33 |
| 143 | 55 | 354 | 259.10 | 24.24 | 4 | 8 | 0 | 1 | 0 | 20 | 0.43 | 0.55 |
*Pocket number is the row index in the list of convex hull surface triangles computed from the protein's atomic positions; each row corresponds to the three vertices of one triangle.
**NoA means the number of atoms.
***vdW means van der Waals.
HA means hydrogen bond acceptor.
HD means hydrogen bond donor.
These are the cf-values (ratio of the number of correct residues to the total number of residues in the active site). For 1A6W in PDB two active sites (AS) are reported as HAP and AC1.
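The cf-value defined above is a simple set ratio. A minimal sketch follows, assuming residues are compared by an identifier string; the function name `cf_value` and the residue labels in the example are hypothetical.

```python
def cf_value(pocket_residues, active_site_residues):
    """Fraction of active-site residues that also appear in the pocket.

    cf = (number of correct residues) / (total residues in the active site)
    """
    site = set(active_site_residues)
    if not site:
        raise ValueError("active site has no residues")
    correct = site & set(pocket_residues)   # residues found in both sets
    return len(correct) / len(site)

# Hypothetical labels: 2 of the 4 active-site residues are in the pocket.
example_cf = cf_value({"RES A", "RES B", "RES C"},
                      {"RES B", "RES C", "RES D", "RES E"})
```

Under this definition, a pocket that contains many residues outside the active site is not penalized; only coverage of the active site is measured.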
1A6U best pockets with residues in common with the 2 active sites, HAP and AC1
| ASN 354H (11.61) | SER 331H (10.79) | TYR 34 L (4.27) |
| ASP 352H (7.07) | THR 328H (14.41) | TYR 332H (8.34) |
| ILE 351H (6.25) | THR 330H (12.29) | TYR 401H (2.92) |
| SER 32 L (6.81) | TRP 333H (1.734) | TYR 402H (5.75) |
| ALA 2 L (15.1365) | HIS 97 L (6.8477) | THR 26 L (15.7431) |
| ARG 350H (2.89) | ILE 348H (9.34) | TRP 98 L (3.24) |
| ASN 96 L (7.12) | LYS 359H (5.38) | TRP 347H (4.78) |
| ASN 361H (9.75) | LYS 365H (14.84) | TYR 94 L (7.84) |
| GLU 362H (12.30) | PHE 364H (13.46) | TYR 360H (8.34) |
| GLY 349H (6.45) | SER 366H (17.38) | VAL 99 L (9.69) |
| ASP 400H (5.44) | THR 31 L (8.29) | TYR 401H (2.92) |
| SER 405H (3.65) | TYR 34 L (4.27) | TYR 402H (5.75) |
| ARG 350H (2.89) | SER 95 L (5.42) | TYR 332H (8.34) |
| ASN 354H (11.61) | SER 331H (10.79) | TYR 401H (2.92) |
| ASP 352H (7.07) | TRP 93 L (3.36) | TYR 402H (5.75) |
| ILE 351H (6.25) | TRP 333H (1.73) | |
| SER 32 L (6.81) | TYR 34 L (4.27) | |
There are four predicted pockets that share more than 25% of their residues with the active sites. The values in parentheses are the minimum distances from each 1A6U residue to the atoms of the ligand NIP, as reported in the heteroatom (HETATM) lines of the PDB file of 1A6W.
Figure 6. 1A6W and its ligand. From the PDB website.
Figure 7. Histogram of the 86-element data set. Because of RAM limits, protein number 55 in the 86-element data set list (PDB structures 2NGR and 1KZ7) was not included, so the results are reported for the remaining 85 elements. The horizontal axis is the percentage of correctly predicted residues. The vertical axis is the number of proteins. Predicted pockets include more than half of the active-site residues for 66 proteins (78% of the data set). The overlap threshold between pockets is 0.8.
Figure 8. Histogram of the 48-element data set. The horizontal axis is the percentage of correctly predicted residues. The vertical axis is the number of proteins. Predicted pockets include more than half of the active-site residues for 24 proteins (50% of the data set). The overlap threshold between pockets is 0.8.
Performance comparison of our results with the other methods CASTp, LIGSITE, PASS, SURFNET and VISGRID
| Method | 48-element data set | 86-element data set |
|---|---|---|
| CAST | 31 (64.6%) | 66 (76.7%) |
| LIGSITE | 36 (75.0%) | 69 (80.2%) |
| PASS | 27 (56.3%) | 54 (62.8%) |
| SURFNET | 19 (39.6%) | 63 (73.3%) |
| VISGRID: Top 0.8% voxels | 34 (70.8%) | 55 (64.0%) |
| Our method: Overlap 0.8 | 24 (50%) | 66 (78%) |
The results for the other methods are those reported in Table III of Li et al. [20].
Figure 9. Histogram of the 130-element data set. The horizontal axis is the percentage of correctly predicted residues. The vertical axis is the number of proteins. The overlap threshold between pockets is 0.8.