Jun Gao, Qi Liu, Hong Kang, Zhiwei Cao, Ruixin Zhu.
Abstract
Although many ligand-binding site prediction methods have been developed in recent years, there is still a strong demand for better prediction accuracy and for comparisons that evaluate the performance of different prediction algorithms. In this work, to improve the protein-ligand binding site prediction method presented in our former study, we compared different binding site ranking lists. Four properties, i.e., pocket size, distance from the protein centroid, sequence conservation and the number of hydrophobic residues, were each used as a ranking criterion. Our studies show that sequence conservation ranks the real pockets more accurately than the other properties. Pocket size and the distance of the binding site from the protein centroid are also found to be helpful. In addition, a multi-view ranking aggregation method, which combines the information from these four properties, was applied. The results show that aggregating complementary properties achieves better performance in the prediction of ligand-binding sites.
Keywords: prediction; protein-ligand binding site; ranking aggregation
Year: 2012 PMID: 22942732 PMCID: PMC3430263 DOI: 10.3390/ijms13078752
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 6.208
Prediction success rate presented by different ranking methods.
| Methods | Bound: TOP1 | Bound: MCC for TOP1 | Bound: TOP3 | Unbound/bound: TOP1 | Unbound/bound: MCC for TOP1 | Unbound/bound: TOP3 |
|---|---|---|---|---|---|---|
| Conservation score | 59% | 0.53 | 73% | 57% | 0.53 | 72% |
| Distance | 48% | 0.53 | 66% | 56% | 0.53 | 70% |
| Volume | 47% | 0.50 | 69% | 44% | 0.53 | 59% |
| Hydrophobic | 39% | 0.51 | 62% | 30% | 0.51 | 48% |
| SURFNET (Control) | 42% | ~ | 57% | ~ | ~ | ~ |
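As an illustration of how a TOP1 or TOP3 success rate like those above could be computed, here is a minimal sketch. It assumes each protein has one ranked pocket list and one known true pocket; the function name `top_n_success_rate`, the protein labels and the pocket labels are hypothetical and do not come from the study.

```python
def top_n_success_rate(ranked_pockets, true_pockets, n):
    """Fraction of proteins whose true pocket is among the first n ranked pockets."""
    hits = sum(
        1
        for protein, ranking in ranked_pockets.items()
        if true_pockets[protein] in ranking[:n]
    )
    return hits / len(ranked_pockets)


# Hypothetical rankings (best pocket first) for four made-up proteins.
rankings = {
    "protA": ["p3", "p1", "p7"],
    "protB": ["p2", "p5", "p0"],
    "protC": ["p4", "p9", "p6"],
    "protD": ["p8", "p1", "p2"],
}
# Hypothetical true binding pocket for each protein.
truth = {"protA": "p3", "protB": "p0", "protC": "p1", "protD": "p1"}

top1 = top_n_success_rate(rankings, truth, 1)
top3 = top_n_success_rate(rankings, truth, 3)
print(f"TOP1 = {top1:.0%}, TOP3 = {top3:.0%}")
```

By construction TOP3 can never be lower than TOP1 for the same ranking, which matches the pattern in every row of the table.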
Prediction success rate of ranking aggregation.
| Methods | Bound: TOP1 | Bound: MCC | Bound: TOP3 | Unbound/bound: TOP1 | Unbound/bound: MCC for TOP1 | Unbound/bound: TOP3 |
|---|---|---|---|---|---|---|
| CON + DIS | 57% | 0.52 | 74% | 61% | 0.53 | 74% |
| VOL + DIS | 52% | 0.51 | 73% | 54% | 0.53 | 74% |
| CON + VOL | 52% | 0.52 | 72% | 48% | 0.54 | 65% |
| VOL + HYDRO | 46% | 0.50 | 67% | 39% | 0.53 | 61% |
| DIS + HYDRO | 47% | 0.51 | 68% | 44% | 0.49 | 63% |
| CON + HYDRO | 53% | 0.51 | 70% | 39% | 0.53 | 61% |
| DIS + CON + HYDRO | 53% | 0.50 | 72% | 48% | 0.51 | 67% |
| VOL + CON + HYDRO | 51% | 0.52 | 71% | 41% | 0.55 | 63% |
| VOL + DIS + HYDRO | 50% | 0.52 | 71% | 46% | 0.50 | 67% |
| VOL + DIS + CON | 54% | 0.51 | 73% | 52% | 0.53 | 74% |
| VOL + DIS + CON + HYDRO | 53% | 0.52 | 72% | 48% | 0.53 | 67% |
The one-sided Wilcoxon signed-rank test was applied to the Matthews Correlation Coefficient (MCC) scores obtained for each protein. The p-values for the comparison of different methods are listed in the Supporting Information (Table S1 for the bound test set, Table S2 for the unbound/bound test set).
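For reference, the MCC is computed from the four cells of a binary confusion matrix. The sketch below uses the standard formula; treating the counts as per-residue labels (binding versus non-binding) is an assumption made here for illustration, since this record does not state how the counts were defined. The helper name `mcc` and the example counts are hypothetical.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for one protein: 10 binding residues found,
# 80 non-binding residues correctly rejected, 5 false alarms, 5 misses.
score = mcc(tp=10, tn=80, fp=5, fn=5)
print(f"MCC = {score:.2f}")
```

The score ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), so one such value per protein yields the paired samples that a signed-rank test compares between two methods.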
Partial results obtained with the different ranking methods: volume (VOL), distance of presumed binding sites from the protein centroid (DIS), rank aggregation (REG) of VOL and DIS, and conservation score (CONS).
| Rank | VOL | DIS | REG | CONS |
|---|---|---|---|---|
| 1 | Pocket 0 | Pocket 12 | ||
| 2 | Pocket 0 | Pocket 5 | ||
| 3 | Pocket 5 | Pocket 0 | Pocket 10 | Pocket 0 |
| 4 | Pocket 10 | Pocket 7 | Pocket 12 | Pocket 2 |
Pocket 9 corresponds to the observed binding site.
Figure 1. The surface position of Pocket 9 in the protein structure. PDB ID: 2SIM. (Red points: water molecules; Light blue: the whole protein; Golden: molecular ligand; Purple: predicted binding site constituted by amino acids).
Figure 2. The concept of multi-view ranking aggregation.
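To make the idea of aggregating several ranking lists concrete, the sketch below merges per-property rankings with a Borda count. This is a stand-in chosen for illustration only: this record does not specify the aggregation algorithm used in the study, and the function name `borda_aggregate`, the pocket labels and the three example lists are all hypothetical.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Merge several best-first rankings into one consensus order.

    Each list awards len(list) points to its first item, one fewer to the
    next, and so on. Ties are broken alphabetically for determinism.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, pocket in enumerate(ranking):
            scores[pocket] += n - position
    return sorted(scores, key=lambda p: (-scores[p], p))


# Hypothetical rankings of the same four pockets under three properties.
vol = ["pocket_a", "pocket_b", "pocket_c", "pocket_d"]  # by volume
dis = ["pocket_b", "pocket_c", "pocket_a", "pocket_d"]  # by centroid distance
con = ["pocket_b", "pocket_a", "pocket_d", "pocket_c"]  # by conservation

consensus = borda_aggregate([vol, dis, con])
print(consensus)
```

In this toy input, `pocket_b` is not first by volume but is favoured by the other two views, so it leads the consensus list. That is the kind of complementary effect the aggregation is meant to capture.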