Jordan J Clark, Zachary J Orban, Heather A Carlson.
Abstract
We present the application of seven binding-site prediction algorithms to a meticulously curated dataset of ligand-bound and ligand-free crystal structures for 304 unique protein sequences (2528 crystal structures). We probe the influence of starting protein structures on the results of binding-site prediction, so the dataset contains a minimum of two ligand-bound and two ligand-free structures for each protein. We use this dataset in a brief survey of five geometry-based methods, one energy-based method, and one machine-learning-based method: Surfnet, Ghecom, LIGSITEcsc, Fpocket, Depth, AutoSite, and Kalasanty. Distributions of the F scores and Matthews correlation coefficients (MCCs) for ligand-bound (holo) and ligand-free (apo) structures show no statistically significant difference in performance between the two structure types for most methods. Only Fpocket showed a statistically significant but low-magnitude enhancement in performance for holo structures. Lastly, we found that most methods will succeed on some crystal structures and fail on others within the same protein family, even though all structures are of relatively high quality and show low structural variation. We expected better consistency across varying protein conformations of the same sequence. Interestingly, the success or failure of a given structure cannot be predicted by quality metrics such as resolution, Cruickshank Diffraction Precision index, or unresolved residues. Cryptic sites were also examined.
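As an illustration of the two performance measures named in the abstract, the following minimal sketch computes an F score and an MCC from residue-level overlap between a predicted site and a known site. It assumes that the F score is the balanced F1 measure and that predictions are compared as sets of residue identifiers; neither detail is stated in this record, and the function names and the 20-residue toy protein are hypothetical.

```python
from math import sqrt


def confusion_counts(predicted, actual, all_residues):
    """Return (TP, FP, FN, TN) over sets of residue identifiers."""
    predicted, actual = set(predicted), set(actual)
    all_residues = set(all_residues)
    tp = len(predicted & actual)               # predicted and truly in the site
    fp = len(predicted - actual)               # predicted but not in the site
    fn = len(actual - predicted)               # in the site but missed
    tn = len(all_residues - predicted - actual)  # correctly left out
    return tp, fp, fn, tn


def f_score(tp, fp, fn):
    """Balanced F score (F1, an assumption): 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient; 0.0 when the denominator vanishes."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy example: a 20-residue protein.
all_residues = range(1, 21)
actual_site = {3, 4, 5, 8, 9}
predicted_site = {4, 5, 8, 12}

tp, fp, fn, tn = confusion_counts(predicted_site, actual_site, all_residues)
print(f"F = {f_score(tp, fp, fn):.2f}, MCC = {mcc(tp, fp, fn, tn):.2f}")
```

Unlike the F score, the MCC also uses the true negatives, so it rewards a method for leaving residues outside the site unpredicted.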
Year: 2020 PMID: 32985584 PMCID: PMC7522209 DOI: 10.1038/s41598-020-72906-7
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (A) Distribution of the sizes of unified binding sites for the 304 protein families in this dataset, as % frequency. (B) Distribution of amino acid composition of the 304 unified binding sites.
Figure 2. Analyses of maximum and mean backbone RMSD for each protein family. Each point represents the maximum or mean observed in one protein family, and the number of points in each section is labeled in black (numbers in parentheses are points with values > 3.5 Å). (A) The maximum backbone RMSD across the apo-apo pairs is compared to the maximum of the holo-holo pairs; 206 proteins display RMSD ≤ 1 Å for both groups. (B) The mean backbone RMSD across the apo-apo pairs is compared to the mean of the holo-holo pairs; 247 proteins display RMSD ≤ 1 Å for both groups. (C) The maximum unified binding site (UBS) RMSD across the apo-apo pairs is compared to the maximum of the holo-holo pairs; 206 proteins display RMSD ≤ 1 Å for both groups. (D) The mean UBS RMSD across the apo-apo pairs is compared to the mean of the holo-holo pairs; 235 proteins display RMSD ≤ 1 Å for both groups.
Median of family median F scores and MCCs for apo and holo datasets for all seven LBS-prediction methods.
| Method | Apo F | Holo F | Wilcoxon | Apo MCC | Holo MCC | Wilcoxon |
|---|---|---|---|---|---|---|
| Surfnet | 0.23 | 0.23 | 0.90 | 0.22 | 0.23 | 0.63 |
| Ghecom | 0.48 | 0.54 | 0.20 | 0.50 | 0.53 | 0.17 |
| LIGSITEcsc | 0.49 | 0.52 | 0.56 | 0.47 | 0.50 | 0.60 |
| Fpocket | 0.42 | 0.53 | **0.04** | 0.43 | 0.52 | **0.03** |
| Depth | 0.40 | 0.42 | 0.32 | 0.38 | 0.40 | 0.17 |
| AutoSite | 0.36 | 0.45 | 0.13 | 0.34 | 0.42 | 0.10 |
| Kalasanty | 0.49 | 0.51 | 0.12 | 0.48 | 0.54 | 0.11 |
Wilcoxon p values are the same as those found in Figs. 3 and 4.
The bold values are the only ones that meet the statistical limit of p < 0.05.
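The "median of family medians" reported in the table can be sketched as a two-stage aggregation: first take the median score within each protein family, then take the median of those per-family values. The snippet below is only an illustration under that reading; the function name, family labels, and scores are hypothetical and are not data from the study.

```python
from statistics import median


def median_of_family_medians(scores_by_family):
    """Return (overall median, per-family medians) for a dict of score lists.

    Families with no scores are skipped.
    """
    family_medians = {
        family: median(scores)
        for family, scores in scores_by_family.items()
        if scores
    }
    return median(family_medians.values()), family_medians


# Hypothetical F scores for the apo structures of three families.
apo_scores = {
    "family_A": [0.40, 0.50, 0.60],
    "family_B": [0.10, 0.30],
    "family_C": [0.70, 0.80, 0.90, 0.20],
}

overall, per_family = median_of_family_medians(apo_scores)
print(f"median of family medians = {overall:.2f}")
```

Aggregating per family first means each family contributes one value, so a family with many deposited structures does not dominate the summary.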
Figure 3. Distribution of family median F scores of apo and holo protein structures for (A) Surfnet (p = 0.90), (B) Ghecom (p = 0.20), (C) LIGSITEcsc (p = 0.56), (D) Fpocket (p = 0.04), (E) Depth (p = 0.32), (F) AutoSite (p = 0.13), and (G) Kalasanty (p = 0.12).
Figure 4. Distribution of family median Matthews correlation coefficients (MCCs) of apo and holo protein structures for (A) Surfnet (p = 0.63), (B) Ghecom (p = 0.17), (C) LIGSITEcsc (p = 0.60), (D) Fpocket (p = 0.03), (E) Depth (p = 0.17), (F) AutoSite (p = 0.10), and (G) Kalasanty (p = 0.11).
Figure 5. Family median F scores of apo and holo protein structures for (A) Surfnet, (B) Ghecom, (C) LIGSITEcsc, (D) Fpocket, (E) Depth, (F) AutoSite, and (G) Kalasanty, where the error bars are constructed from the family minima and maxima. Line: y = x.
Figure 6. Family median MCCs of apo and holo protein structures for (A) Surfnet, (B) Ghecom, (C) LIGSITEcsc, (D) Fpocket, (E) Depth, (F) AutoSite, and (G) Kalasanty, where the error bars are constructed from the family minima and maxima. Line: y = x.
PDBids for structures which resulted in system errors for the various LBS-prediction methods.
Apo structures are denoted in orange, holo structures are denoted in blue.
PDBids for structures which resulted in no predicted pockets for the various LBS-prediction methods.
| Method (data) | Structures with no pockets |
|---|---|
| Ghecom (Apo) | 1g7b,4ey1 |
| Ghecom (Holo) | 1tym,4ajz |
| LIGSITEcsc (Apo) | 1ve6,3o4g |
| LIGSITEcsc (Holo) | 2hu5,2hu7,2ogz |
| Fpocket (Apo) | 1aki,1b2d,1g7b,1guj,1mi7,1rnu,1u1t,1uoj,1yy6,1zz6,2rh2,2vjz,3a93,3az5,3w3b,4bwo,4f4t |
| Fpocket (Holo) | 1a7x,1b0d,1j4h,1our,1tym,1uzv,1zt9,2boj,2oly,2olz,2z3h,3dcq,3ipe,3qe8,4ajx,4ajz,4b4q,4b4r,4joj,4jor,4lkd,4tun,4tz8 |
| Depth (Holo) | 2olz |
| AutoSite (Apo) | 1b2d,1n40,1vie,2vjz |
| AutoSite (Holo) | 1uof,1vif,2oly,2rk2,3lb2,4ajz |
| Kalasanty (Apo) | 1alv,1b2d,1bmz,1dq2,1ed8,1f41,1fz2,1fz7,1fz8,1g7b,1gmq,1gwg,1hfj,1ier,1ird,1l7l,1m47,1mi7,1mmi,1mso,1n1z,1nxd,1ous,1oux,1pw9,1r13,1r14,1r7i,1sar,1tta,1u6j,1u94,1uoj,1w6l,1w8e,1yy6,1yze,2ajs,2cm3,2duo,2g4g,2gqv,2gt7,2i3u,2i4e,2j46,2noy,2pol,2ptx,2rh2,2vjz,2wlc,2wld,2x88,2yf3,2yf4,2yf9,3a4d,3c95,3d5g,3d7p,3e8m,3enr,3exx,3f32,3gxm,3kv7,3kx7,3o7s,3par,3q4j,3q6e,3rnt,3ssw,3vaf,3vag,3vaj,3wne,4b4p,4bwo,4clf,4ey1,4f4t,4i2g,4j0c,4k3s,4lse,4lsf,4lsh,4ovh,4usv,8rnt,9rnt |
| Kalasanty (Holo) | 1alw,1eta,1ew8,1ew9,1fy5,1gic,1gmr,1i3h,1m49,1n20,1n22,1ona,1ovs,1rnt,1rsn,1tym,1uzv,1wav,1wpg,1wrp,1xgi,1xms,1xvd,1xz3,1yvx,1zt9,2ajz,2boj,2bp6,2duq,2dur,2flm,2foj,2foo,2fop,2oly,2olz,2omg,2omi,2oz9,2r1x,2r1y,2r2b,2rk2,2roy,2sar,2wle,2wlf,2wlg,2wos,2yfd,2ys6,3bpc,3d1f,3d1g,3d5i,3dcq,3dh2,3eio,3f33,3f34,3f35,3f37,3f38,3hl8,3ikn,3ikp,3ikq,3ikr,3imu,3iqf,3kw1,3paq,3qce,3qcf,3sy0,3t4y,3vq5,3vq8,3vqe,4ajx,4ajz,4akj,4b4q,4b4r,4bu4,4gcq,4hjt,4i87,4j0i,4k3m,4k3r,4l6o,4lk7,4lkd,4lke,4lkf,4mjq,4mjr,4n94,4n97,4n9a,4usu,5cna,6rnt |