Leif Ellingson, Jinfeng Zhang.
Abstract
Comparison of protein binding sites is an effective means of predicting protein function from structural information. Despite the importance of this problem and much past research, predicting the binding ligands from the atomic structures of protein binding sites remains very challenging. Here, we designed a new algorithm, TIPSA (Triangulation-based Iterative-closest-point for Protein Surface Alignment), based on the iterative closest point (ICP) algorithm. TIPSA aims to find the maximum number of atoms that can be superposed between two protein binding sites, where every pair of superposed atoms is closer than a given distance threshold. The search starts from similar tetrahedra in the two binding sites, obtained from 3D Delaunay triangulation, and uses the Hungarian algorithm to find additional matched atoms. We found that, because protein binding sites are plastic, rigid-body matching of their atomic point clouds is not adequate for satisfactory binding ligand prediction. We therefore incorporated global geometric information, namely the radius of gyration of the binding site atoms, and used nearest neighbor classification for binding site prediction. Tested on benchmark data, our method achieved performance comparable to the best methods in the literature, while also providing the common atom set and the atom correspondences.
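The objective stated above can be made concrete for two binding sites that are already superposed. The sketch below is a minimal illustration under stated assumptions, not TIPSA's implementation: each atom is a hypothetical `(type, (x, y, z))` tuple, atoms may pair only if their types (from the atom-type table) agree, and `max_superposed_atoms` is a hypothetical name. The paper reports using the Hungarian algorithm for this step; the sketch swaps in maximum-cardinality bipartite matching with augmenting paths (Kuhn's algorithm), which maximizes the number of pairs closer than the threshold but, unlike a Hungarian assignment on a distance cost matrix, does not prefer shorter distances among equally large matchings.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical atom layout: (atom type, (x, y, z)).
Atom = Tuple[int, Tuple[float, float, float]]


def max_superposed_atoms(site_a: Sequence[Atom], site_b: Sequence[Atom],
                         threshold: float) -> List[Tuple[int, int]]:
    """Largest one-to-one set of (i, j) pairs with equal atom type and
    distance < threshold, found with augmenting paths (Kuhn's algorithm)."""
    # Candidate partners in site_b for every atom of site_a.
    candidates: List[List[int]] = [
        [j for j, (tb, xb) in enumerate(site_b)
         if ta == tb and math.dist(xa, xb) < threshold]
        for ta, xa in site_a
    ]
    match_b: Dict[int, int] = {}  # index in site_b -> index in site_a

    def augment(i: int, seen: set) -> bool:
        for j in candidates[i]:
            if j in seen:
                continue
            seen.add(j)
            # Take j if it is free, or if its current partner can move elsewhere.
            if j not in match_b or augment(match_b[j], seen):
                match_b[j] = i
                return True
        return False

    for i in range(len(site_a)):
        augment(i, set())
    return sorted((i, j) for j, i in match_b.items())


if __name__ == "__main__":
    site_a = [(1, (0.0, 0.0, 0.0)), (2, (1.0, 0.0, 0.0)), (2, (1.5, 0.0, 0.0))]
    site_b = [(1, (0.2, 0.0, 0.0)), (2, (1.2, 0.0, 0.0)), (2, (0.5, 0.0, 0.0))]
    print(max_superposed_atoms(site_a, site_b, 0.6))  # [(0, 0), (1, 2), (2, 1)]
```

In the usage line, atom 1 of the first site is re-routed to its second candidate so that atom 2 can also be matched; a greedy nearest-first pairing would stop at two pairs.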
Year: 2012 PMID: 22815760 PMCID: PMC3398928 DOI: 10.1371/journal.pone.0040540
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Atom types used in binding site matching.
| Element | Atoms | Type |
| Carbon (C) | Carbonyl C | 1 |
| | Aliphatic C, CA, other sp3 C | 2 |
| | Aromatic C | 3 |
| Oxygen (O) | Backbone O and carbonyl O in Asn and Gln; carboxyl O in Asp and Glu | 4 |
| | Hydroxyl O in Ser, Thr and Tyr | 5 |
| Nitrogen (N) | Backbone N; side-chain Trp NE1, Gln NE2, Asn ND2, Arg NE NE1 NE2, Lys NZ | 6 |
| | His side chain NE1, NE2 | 7 |
| Hydrogen (H) | Polar H | 8 |
| Sulfur (S) | Disulfide bond S, Met S, Cys S | 2 |
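The atom-type table can be turned into a lookup function. This is a partial, hypothetical encoding: only some residues and atom names from the table are covered, side-chain carbonyl carbons are not listed, and treating every remaining carbon as aliphatic (type 2) is an assumption made here, not a rule stated in the table. Atom names follow PDB conventions.

```python
from typing import Dict, Set, Tuple

# Hypothetical, partial encoding of the atom-type table: (residue, atom) -> type.
SIDE_CHAIN_TYPES: Dict[Tuple[str, str], int] = {
    ("ASN", "OD1"): 4, ("GLN", "OE1"): 4,                    # carbonyl O
    ("ASP", "OD1"): 4, ("ASP", "OD2"): 4,                    # carboxyl O
    ("GLU", "OE1"): 4, ("GLU", "OE2"): 4,
    ("SER", "OG"): 5, ("THR", "OG1"): 5, ("TYR", "OH"): 5,   # hydroxyl O
    ("ASN", "ND2"): 6, ("GLN", "NE2"): 6, ("LYS", "NZ"): 6, ("TRP", "NE1"): 6,
    ("MET", "SD"): 2, ("CYS", "SG"): 2,                      # sulfur shares type 2
}
AROMATIC_C: Dict[str, Set[str]] = {
    "PHE": {"CG", "CD1", "CD2", "CE1", "CE2", "CZ"},
    "TYR": {"CG", "CD1", "CD2", "CE1", "CE2", "CZ"},
}
BACKBONE_TYPES: Dict[str, int] = {"C": 1, "CA": 2, "O": 4, "N": 6}


def atom_type(residue: str, atom: str) -> int:
    """Return the matching type for one atom; raises KeyError if not covered."""
    residue = residue.upper()
    if atom in BACKBONE_TYPES:
        return BACKBONE_TYPES[atom]
    if (residue, atom) in SIDE_CHAIN_TYPES:
        return SIDE_CHAIN_TYPES[(residue, atom)]
    if atom in AROMATIC_C.get(residue, set()):
        return 3
    if atom.startswith("C"):
        return 2  # assumption: any other carbon counts as aliphatic/sp3
    raise KeyError(f"no type assigned for {residue} {atom}")
```

Such a lookup is what the type-equality check in a threshold-matching step would consult before allowing two atoms to pair.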
Classification error (CE) for the studied combinations of search radius and classifier.
| | 1.0 Å | 1.5 Å | 2.0 Å | 2.5 Å | 3.0 Å |
| 1 | 0.56 | 0.53 | 0.45 | | 0.56 |
| 3 | 0.58 | 0.53 | 0.51 | | 0.53 |
| 4 | 0.78 | 0.70 | 0.66 | | 0.65 |
| 5 | 0.78 | 0.70 | 0.66 | | 0.65 |
Classification error (CE) for combinations of factors at various values of k in k-nearest neighbor classification.
| TI + Gyr | 0.29 | 0.33 | 0.53 |
| TI + RMSD4 | 0.43 | 0.41 | 0.66 |
| TI + HydProp | 0.36 | 0.40 | 0.64 |
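The tables report the classification error (CE) of nearest neighbor classification at several values of k. A minimal sketch of how such an error could be computed, assuming leave-one-out evaluation over a precomputed pairwise similarity matrix (larger values meaning more similar) and simple majority voting; the evaluation protocol, the tie-breaking rule and the function name `loo_knn_error` are assumptions, not details taken from the paper.

```python
from collections import Counter
from typing import Sequence


def loo_knn_error(similarity: Sequence[Sequence[float]],
                  labels: Sequence[str], k: int) -> float:
    """Leave-one-out k-NN classification error from a square similarity matrix."""
    n = len(labels)
    wrong = 0
    for i in range(n):
        # Rank every other binding site by decreasing similarity to site i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: -similarity[i][j])
        neighbours = [labels[j] for j in others[:k]]
        votes = Counter(neighbours)
        best = max(votes.values())
        # Tie-break (assumption): the tied label whose neighbour ranks highest wins.
        predicted = next(lab for lab in neighbours if votes[lab] == best)
        wrong += predicted != labels[i]
    return wrong / n
```

With only two sites per class, k = 3 lets the other class outvote the single same-class neighbour, which illustrates why CE can rise with k on small ligand groups.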
Results of k-nearest neighbor classification for the Kahraman (5.3 Å) data set.
| Method | Classification Error |
| TIPSA-TI | 0.43 |
| TIPSA-TI + Gyr | |
| TIPSA-TI + RMSD4 | 0.43 |
| TIPSA-TI + HydProp | 0.36 |
| TIPSA-TI + Gyr + HydProp | |
| Gyr | 0.54 |
| RMSD4 | 0.71 |
| HydProp | 0.64 |
| Sup-CK | 0.36 |
| Sup-CK + Vol | 0.34 |
| Sup-CKL | |
| Sup-CKL + Vol | 0.26 |
| Vol | 0.39 |
| Sup-TI | 0.42 |
| MultiBind | 0.42 |
| Random (No Assumptions) | 0.90 |
| Random (Known Proportions) | 0.87 |
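The two random baselines give a sense of scale for the errors above. Assuming the ten ligand groups listed in the misclassification table (AMP, ATP, FAD, FMN, GLC, HEM, NAD, PO4, AND, EST), a uniform random guess is correct with probability 1/10, so its expected error is 1 - 1/10 = 0.90, matching the "Random (No Assumptions)" row. For "Random (Known Proportions)", one plausible reading (an assumption; the section does not define it) is guessing each label with its frequency p_c in the data set, which gives an expected error of 1 - Σ p_c². The sketch computes both; any class counts passed to it are hypothetical.

```python
from typing import Mapping


def random_guess_error(class_counts: Mapping[str, int], use_proportions: bool) -> float:
    """Expected error of a classifier that ignores its input and guesses at random."""
    if not use_proportions:
        # Uniform guess over C classes: correct with probability 1/C.
        return 1.0 - 1.0 / len(class_counts)
    total = sum(class_counts.values())
    # Guess drawn with the data-set proportions p_c: correct with probability sum(p_c^2).
    return 1.0 - sum((n / total) ** 2 for n in class_counts.values())
```

Under this reading, an unbalanced data set lowers the proportional baseline below 0.90, consistent in direction with the 0.87 reported.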
Results of k-nearest neighbor classification for the Homogeneous (5.3 Å) data set.
| Method | Classification Error |
| TIPSA-TI | 0.49 |
| TIPSA-TI + Gyr | 0.44 |
| TIPSA-TI + RMSD4 | 0.49 |
| TIPSA-TI + HydProp | 0.42 |
| TIPSA-TI + Gyr + HydProp | |
| Gyr | 0.77 |
| RMSD4 | 0.58 |
| HydProp | 0.83 |
| Sup-CK | 0.47 |
| Sup-CK + Vol | 0.46 |
| Sup-CKL | |
| Sup-CKL + Vol | |
| Vol | 0.89 |
| Sup-TI | 0.47 |
| MultiBind | 0.48 |
| Random | 0.90 |
Classification error (CE) on the Kahraman (5.3 Å) data set for alternative versions of the method, using TI + Gyr to measure similarity.
| Method | Classification Error |
| Standard | |
| No Hungarian | 0.33 |
| No Iterative Alignment | 0.30 |
| No Hungarian and Iterative Alignment | 0.32 |
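TIPSA builds on the iterative closest point (ICP) algorithm, and the ablation rows above switch off the iterative alignment and the Hungarian step. For orientation, the sketch below shows a plain ICP loop using NumPy: each atom of the moving site is paired with its nearest atom in the fixed site, and the optimal rigid superposition of those pairs is found with the Kabsch (SVD) method. It is a generic textbook version under stated assumptions (all atoms paired, no distance cut-off, no atom types, a fixed number of iterations), not TIPSA's alignment, which starts from seed tetrahedra and uses the Hungarian algorithm to extend the match.

```python
from typing import Tuple

import numpy as np


def kabsch(p: np.ndarray, q: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """Rotation r and translation t minimizing sum ||r @ p_i + t - q_i||^2 over paired rows."""
    pc, qc = p.mean(axis=0), q.mean(axis=0)
    h = (p - pc).T @ (q - qc)               # 3x3 cross-covariance of centered pairs
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # d = -1 would be a reflection; flip it away
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return r, qc - r @ pc


def icp(moving: np.ndarray, fixed: np.ndarray, iterations: int = 20) -> np.ndarray:
    """Plain ICP: pair every moving atom with its nearest fixed atom, superpose, repeat."""
    current = moving.copy()
    for _ in range(iterations):
        # Squared distances between all moving/fixed atom pairs, shape (n_moving, n_fixed).
        d2 = ((current[:, None, :] - fixed[None, :, :]) ** 2).sum(axis=2)
        nearest = d2.argmin(axis=1)
        r, t = kabsch(current, fixed[nearest])
        current = current @ r.T + t
    return current
```

Plain ICP only converges to the right superposition from a nearby starting pose, which is why a seeding step such as matched tetrahedra matters.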
The misclassified binding sites (predicted ligands in parentheses).
| Ligand | Misclassified binding sites |
| AMP | 12as (ATP), 1amu (EST), 1c0a (GLC), 1jp4 (ATP), 1kht (EST), 1tb7 (AND) |
| ATP | 1b8a (AMP), 1dy3 (FMN), 1esq (AMP), 1gn8 (FMN) |
| FAD | |
| FMN | 1f5v (ATP), 1ja1 (EST) |
| GLC | |
| HEM | |
| NAD | |
| PO4 | 1e9g (GLC) |
| AND | 1e3r (EST) |
| EST | 1fds (AND) |
Figure 1. Similarity matrix computed using the TIPSA-TI + Gyr model.
Dark pixels represent greater similarity and light pixels less similarity. The steroid group consists of both the AND and EST ligand groups, in accordance with Kahraman et al. (2007).
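Figure 1 combines an alignment score with the radius of gyration, and both ingredients are simple to compute. The sketch below assumes that TI denotes a Tanimoto-style index (matched atoms divided by the size of the union of the two atom sets) and computes an unweighted radius of gyration; the function names are hypothetical, and because this section does not say how TIPSA weights or combines the two terms, no combination is shown.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def radius_of_gyration(coords: Sequence[Point]) -> float:
    """Root-mean-square distance of the atoms from their unweighted centroid."""
    n = len(coords)
    centroid = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return math.sqrt(sum(math.dist(c, centroid) ** 2 for c in coords) / n)


def tanimoto_index(n_matched: int, n_a: int, n_b: int) -> float:
    """Matched atoms over the size of the union of the two atom sets (assumed form of TI)."""
    return n_matched / (n_a + n_b - n_matched)
```

The radius of gyration summarizes the overall size of a site independently of any superposition, so it can separate sites that a rigid atom-by-atom match scores alike.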
Results of k-nearest neighbor classification for the Kahraman and Homogeneous data sets.
| Method | Classification Error (Kahraman) | Classification Error (Homogeneous) |
| TIPSA-TI | 0.53 | 0.52 |
| TIPSA-TI + Gyr | 0.41 | 0.49 |
| TIPSA-TI + RMSD4 | 0.38 | 0.46 |
| TIPSA-TI + HydProp | 0.38 | 0.48 |
| TIPSA-TI + Gyr + HydProp | | 0.45 |
| TIPSA-TI + Gyr + HydProp + RMSD4 | | |
| Gyr | 0.55 | 0.84 |
| RMSD4 | 0.70 | 0.52 |
| HydProp | 0.80 | 0.81 |
Binding sites consist of all atoms within 7 Å of the ligand.
Classification error (CE) and average runtime per alignment for various limits on the number of seed pairs.
| Number of Seed Pairs | Classification Error | Average Runtime (sec) |
| 1000 | 0.29 | 11.4 |
| 500 | 0.29 | 6.1 |
| 450 | 0.33 | 4.7 |
| 400 | 0.32 | 4.1 |
| 300 | 0.27 | 2.3 |
| 100 | 0.33 | 1.2 |
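The seed pairs counted in this table are pairs of similar tetrahedra taken from the 3D Delaunay triangulations of the two binding sites, and the table shows the effect of capping how many are kept. A minimal sketch of such a cap, assuming the tetrahedra are already available (for example from a Delaunay routine) and scoring each pair by the root-mean-square difference of their sorted edge lengths; that similarity measure and the function names are hypothetical choices, not necessarily TIPSA's criterion.

```python
import heapq
import itertools
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Tetra = Tuple[Point, Point, Point, Point]


def edge_signature(tet: Tetra) -> List[float]:
    """Sorted lengths of the six edges of a tetrahedron (rotation/translation invariant)."""
    return sorted(math.dist(a, b) for a, b in itertools.combinations(tet, 2))


def top_seed_pairs(tets_a: Sequence[Tetra], tets_b: Sequence[Tetra],
                   max_pairs: int) -> List[Tuple[int, int]]:
    """Keep the max_pairs tetrahedron pairs whose edge signatures differ least."""
    sig_a = [edge_signature(t) for t in tets_a]
    sig_b = [edge_signature(t) for t in tets_b]
    scored = (
        # RMS difference between corresponding sorted edge lengths.
        (math.sqrt(sum((x - y) ** 2 for x, y in zip(sa, sb)) / 6), i, j)
        for i, sa in enumerate(sig_a)
        for j, sb in enumerate(sig_b)
    )
    return [(i, j) for _, i, j in heapq.nsmallest(max_pairs, scored)]
```

Sorted edge lengths ignore atom types and chirality, so a fuller scoring could add those; lowering `max_pairs` trades search breadth for runtime, which is the trade-off the table measures.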
Atom pairs matched between the AMP binding site of 1ct9 and the APC binding site of 1q19 that were found by only one of SitesBase and TIPSA.
| | 1ct9 AMP | 1q19 APC (SitesBase) | 1q19 APC (TIPSA) |
| 2 | CA 232 | C 244 | CA 244 |
| 3 | C 232 | C 244 | |
| 6 | CG 232 | CG 244 | CD 244 |
| 7 | CD1 232 | CG 244 | |
| 19 | CB 238 | CB 250 | |
| 20 | CG 238 | CG 250 | |
| 21 | OD1 238 | OD2 250 | |
| 25 | OG 239 | OG 251 | |
| 31 | O 271 | O 269 | |
| 39 | CB 279 | CB 277 | |
| 40 | CG 279 | CG 277 | |
| 41 | OD1 279 | OE1 277 | |
| 42 | OD2 279 | OE1 277 | |
| 43 | SD 329 | CG 327 | |
| 44 | SD 332 | CG2 330 | |
| 47 | N 346 | N 343 | |
| 48 | CA 346 | CB 343 | CA 343 |
| 50 | O 346 | O 343 | |
| 51 | CB 346 | CG2 343 | CB 343 |
| 59 | CB 348 | CG 345 | CB 345 |
| 60 | CG 348 | CG 345 | |
| 71 | NZ 449 | N 444 |
Figure 2. The atoms common to the ATP binding site of 1ct9 and the APC binding site of 1q19, as found using TIPSA (top) and SitesBase (bottom).
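The matched-atom tables list, for atoms of the first site, the partners proposed by SitesBase and by TIPSA whenever the two disagree, including atoms that only one method matched. Given two correspondence maps, such rows can be extracted as sketched below; keying the maps by the atom number in the first table column is an assumption, and `method_only_rows` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: atom number in site A -> matched atom label in site B.
Correspondence = Dict[int, str]


def method_only_rows(sitesbase: Correspondence,
                     tipsa: Correspondence) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Rows (atom, SitesBase partner, TIPSA partner) where the two methods disagree."""
    rows = []
    for atom in sorted(set(sitesbase) | set(tipsa)):
        a, b = sitesbase.get(atom), tipsa.get(atom)
        if a != b:  # keep atoms matched differently or by only one method
            rows.append((atom, a, b))
    return rows
```

A `None` entry corresponds to an empty cell in the tables, meaning that method left the atom unmatched.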
Atom pairs matched between the ATP binding site of 1ayl and the ATP binding site of 1e2q that were found by only one of SitesBase and TIPSA.
| | 1ayl ATP | 1e2q ATP (SitesBase) | 1e2q ATP (TIPSA) |
| 37 | CA 254 | CA 19 | |
| 41 | CD 254 | CD 19 | |
| 58 | C 256 | C 21 | |
| 59 | CB 256 | CG2 21 | |
| 60 | OG1 256 | OG1 21 | |
| 61 | CG2 256 | CB 21 | |
| 78 | CE 288 | CG 16 | |
| 98 | CA 441 | C 180 | |
| 99 | C 441 | C 180 | |
| 100 | O 441 | O 180 | |
| 108 | NE 449 | NE 143 | NH2 143 |
| 110 | NH1 449 | NE 143 | |
| 111 | NH2 449 | NH1 143 | |
| 119 | C 450 | C 182 | |
| 120 | O 182 | O 182 | |
| 126 | N 451 | N 183 | |
| 127 | CA 451 | CA 183 | |
| 128 | C 451 | C 183 | |
| 129 | O 451 | O 183 | |
| 131 | N 452 | N 184 | |
| 132 | CA 452 | CA 184 | |
| 135 | CB 452 | CB 184 | |
| 136 | CG1 452 | CG1 184 | |
| 139 | CB 455 | CG2 187 | |
| 142 | CG2 455 | CB 187 |
Figure 3. The atoms common to the ATP binding sites of 1ayl and 1e2q, as found using TIPSA (top) and SitesBase (bottom).