Xiuzhen Hu, Kai Wang, Qiwen Dong.
Abstract
BACKGROUND: Prediction of ligand binding sites is important to elucidate protein functions and is helpful for drug design. Although much progress has been made, many challenges still need to be addressed. Prediction methods need to be carefully developed to account for chemical and structural differences between ligands.
Keywords: Binding residue prediction; Ensemble classifier; Protein function
Year: 2016 PMID: 27855637 PMCID: PMC5114821 DOI: 10.1186/s12859-016-1348-3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Composition of the dataset for the 9 types of ligands
| Ligand category | Ligand ID^a | No. of proteins | No. of positives^b | No. of negatives^c |
|---|---|---|---|---|
| Metal ions | CU | 110 | 535 | 38488 |
| | FE | 227 | 1115 | 73813 |
| | FE2 | 103 | 439 | 34113 |
| | ZN | 933 | 4317 | 367292 |
| Acid radical ions | SO4 | 303 | 2125 | 99729 |
| | PO4 | 339 | 2168 | 112279 |
| Nucleotides | ATP | 261 | 3631 | 100848 |
| | FMN | 95 | 1552 | 30244 |
| HEME | HEM and HEC | 228 | 5821 | 69155 |

^a The ligand ID in the BioLip database
^b The number of binding residues
^c The number of non-binding residues
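
The counts above imply a strong class imbalance. The short Python calculation below (a worked example, not part of the original study) computes the fraction of binding residues for each ligand and the accuracy that a trivial predictor labeling every residue as non-binding would reach. This baseline is useful when reading the accuracy values in the tables that follow.

```python
# Binding (positive) and non-binding (negative) residue counts copied from the
# dataset composition table above.
counts = {
    "CU": (535, 38488),
    "FE": (1115, 73813),
    "FE2": (439, 34113),
    "ZN": (4317, 367292),
    "SO4": (2125, 99729),
    "PO4": (2168, 112279),
    "ATP": (3631, 100848),
    "FMN": (1552, 30244),
    "HEME": (5821, 69155),
}


def positive_fraction(pos: int, neg: int) -> float:
    """Fraction of residues that bind the ligand."""
    return pos / (pos + neg)


def trivial_accuracy(pos: int, neg: int) -> float:
    """Accuracy of a predictor that labels every residue as non-binding."""
    return neg / (pos + neg)


for ligand, (pos, neg) in counts.items():
    print(f"{ligand:5s} positives={positive_fraction(pos, neg):6.2%} "
          f"all-negative accuracy={trivial_accuracy(pos, neg):6.2%}")
```

Binding residues make up roughly 1% (ZN) to 8% (HEME) of all residues, so the all-negative baseline already reaches about 92% to 99% accuracy; sensitivity and MCC are therefore the more informative columns.
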
Fig. 1. The flowchart of the proposed TargetSeq (a) and TargetCom (b) methods for protein-ligand binding site prediction
Performance of the proposed sequence-based method (TargetSeq) on the 9 types of ligands under five-fold cross-validation, compared with S-SITE
| Ligand | Window^a | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|---|---|
| CU | 15 | TargetSeq | 99.02 | 51.40 | 99.69 | 0.59 |
| | | S-SITE | 97.98 | 60.37 | 98.50 | 0.46 |
| FE | 9 | TargetSeq | 98.83 | 53.54 | 99.52 | 0.57 |
| | | S-SITE | 96.93 | 59.55 | 97.49 | 0.38 |
| FE2 | 9 | TargetSeq | 99.20 | 51.36 | 99.81 | 0.63 |
| | | S-SITE | 98.28 | 42.14 | 99.00 | 0.37 |
| ZN | 11 | TargetSeq | 99.01 | 41.78 | 99.68 | 0.50 |
| | | S-SITE | 97.71 | 56.43 | 98.20 | 0.38 |
| SO4 | 13 | TargetSeq | 97.79 | 10.07 | 99.66 | 0.19 |
| | | S-SITE | 96.98 | 14.4 | 98.73 | 0.15 |
| PO4 | 7 | TargetSeq | 98.09 | 20.18 | 99.59 | 0.31 |
| | | S-SITE | 97.29 | 27.86 | 98.63 | 0.27 |
| ATP | 19 | TargetSeq | 97.14 | 36.81 | 99.31 | 0.48 |
| | | S-SITE | 96.73 | 48.09 | 98.48 | 0.49 |
| FMN | 17 | TargetSeq | 97.23 | 56.59 | 99.32 | 0.66 |
| | | S-SITE | 96.39 | 66.56 | 97.92 | 0.62 |
| HEME | 17 | TargetSeq | 92.62 | 61.27 | 95.26 | 0.53 |
| | | S-SITE | 93.63 | 58.24 | 96.61 | 0.55 |

^a The optimal window length
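
The odd window lengths in the table suggest that TargetSeq describes each residue by the features of a window of neighboring residues centered on it. The sketch below shows one common way to build such a window vector. The per-residue features (for example, PSSM rows), the function name `window_features`, and the zero padding at the sequence termini are assumptions for illustration, not details given in this record.

```python
from typing import List


def window_features(profile: List[List[float]], center: int, window: int) -> List[float]:
    """Concatenate the feature rows of the residues in a window centered on `center`.

    `profile` holds one row of per-residue features (hypothetical, e.g. PSSM scores).
    Positions that fall outside the sequence are zero-padded (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window length must be odd so the residue sits in the middle")
    half = window // 2
    n_feat = len(profile[0])
    vector: List[float] = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(profile):
            vector.extend(profile[pos])
        else:
            vector.extend([0.0] * n_feat)  # padding beyond either terminus
    return vector


# Toy profile: 5 residues with 2 features each (hypothetical values).
toy = [[1.0, 0.1], [2.0, 0.2], [3.0, 0.3], [4.0, 0.4], [5.0, 0.5]]
print(window_features(toy, center=0, window=3))  # [0.0, 0.0, 1.0, 0.1, 2.0, 0.2]
print(len(window_features(toy, center=2, window=15)))  # 30
```

With a window of length w and f features per residue, each residue becomes a vector of w × f values, which is then passed to a classifier.
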
Performance of the proposed combined method (TargetCom) on the 9 types of ligands under five-fold cross-validation, compared with COACH
| Ligand | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|---|
| CU | TargetCom | 99.21 | 57.94 | 99.78 | 0.67 |
| | COACH | 98.86 | 61.12 | 99.39 | 0.59 |
| FE | TargetCom | 98.73 | 59.73 | 99.32 | 0.58 |
| | COACH | 97.95 | 66.82 | 98.42 | 0.50 |
| FE2 | TargetCom | 99.27 | 67.73 | 99.68 | 0.70 |
| | COACH | 99.20 | 62.41 | 99.67 | 0.66 |
| ZN | TargetCom | 98.99 | 56.18 | 99.50 | 0.56 |
| | COACH | 98.65 | 57.38 | 99.14 | 0.50 |
| SO4 | TargetCom | 97.72 | 15.11 | 99.48 | 0.23 |
| | COACH | 97.21 | 19.15 | 98.87 | 0.21 |
| PO4 | TargetCom | 97.99 | 32.03 | 99.26 | 0.37 |
| | COACH | 97.52 | 35.33 | 98.72 | 0.34 |
| ATP | TargetCom | 97.17 | 59.26 | 98.54 | 0.58 |
| | COACH | 96.99 | 56.27 | 98.46 | 0.55 |
| FMN | TargetCom | 97.66 | 79.61 | 98.58 | 0.76 |
| | COACH | 96.75 | 70.36 | 98.11 | 0.66 |
| HEME | TargetCom | 94.96 | 69.92 | 97.07 | 0.66 |
| | COACH | 94.48 | 61.60 | 97.25 | 0.60 |
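
All tables report accuracy, sensitivity, specificity, and the Matthews correlation coefficient (MCC). The sketch below uses the standard definitions of these metrics, which we assume match those used in the study, and shows why MCC is more informative than accuracy for such imbalanced data: a predictor that never predicts a binding residue still reaches high accuracy but gets an MCC of 0. The function name `binary_metrics` and the toy counts are hypothetical.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard per-residue metrics from a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall on binding residues
    specificity = tn / (tn + fp) if tn + fp else 0.0  # recall on non-binding residues
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal is zero; reporting 0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Toy example: 10 binding and 990 non-binding residues (hypothetical numbers).
print(binary_metrics(tp=6, fp=4, tn=986, fn=4))
# A predictor that never predicts "binding" still scores 99% accuracy, but MCC 0.
print(binary_metrics(tp=0, fp=0, tn=990, fn=10))
```
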
Fig. 2. Head-to-head comparisons between TargetCom and the individual component methods on the proteins of all ligands. CC is the Pearson’s correlation coefficient between the MCCs of the two compared methods
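
The CC in Fig. 2 is a Pearson correlation computed over pairs of per-protein MCCs. A minimal pure-Python sketch follows; the function name `pearson` and the MCC lists are made-up illustrations, not data from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Hypothetical per-protein MCCs for two predictors (not data from the paper).
mcc_a = [0.10, 0.45, 0.60, 0.72, 0.30]
mcc_b = [0.05, 0.40, 0.66, 0.70, 0.20]
print(round(pearson(mcc_a, mcc_b), 3))
```

On Python 3.10 or later, `statistics.correlation` computes the same quantity.
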
p-values of Student’s t-tests for the differences in MCC scores between each pair of predictors on the proteins of all ligands
| Method | TargetCom | TargetSeq | COACH | COFACTOR | TM-SITE |
|---|---|---|---|---|---|
| TargetSeq | 8.17562E-32 | | | | |
| COACH | 2.09639E-44 | 1.17438E-10 | | | |
| COFACTOR | 4.366E-117 | 4.27612E-34 | 5.84742E-79 | | |
| TM-SITE | 2.62453E-79 | 2.29774E-06 | 6.07923E-42 | 2.70076E-15 | |
| S-SITE | 7.1097E-112 | 2.47659E-10 | 5.31899E-68 | 7.66908E-11 | 0.0924092 |
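
The record does not say whether these t-tests were paired. Because every predictor is scored on the same proteins, a paired test on per-protein MCCs is a natural reading, and that is what the sketch below assumes. To stay dependency-free, it approximates the two-sided p-value with the standard normal distribution, which is only reasonable when many proteins are compared; `scipy.stats.ttest_rel` gives the exact Student's t p-value. The function name `paired_t_test` and the MCC values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def paired_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Paired t statistic and a two-sided p-value from the normal approximation.

    The normal approximation is adequate only for large samples (many proteins).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need paired samples of equal length (>= 2)")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance
    if var == 0:
        raise ValueError("differences are constant; t is undefined")
    t = mean / math.sqrt(var / n)
    p = math.erfc(abs(t) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return t, p


# Hypothetical per-protein MCCs of two predictors on the same six proteins.
mcc_new = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59]
mcc_old = [0.58, 0.50, 0.70, 0.41, 0.60, 0.57]
t, p = paired_t_test(mcc_new, mcc_old)
print(f"t = {t:.2f}, approximate p = {p:.3g}")
```

With only six proteins, as in this toy example, the normal approximation understates the p-value; the real comparisons involve far more proteins.
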
Performance comparison of SVM-PSSM on the ATP168 dataset with different definitions of ligand binding sites
| Definition^a | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|
| LPC | 96.00 | 33.40 | 99.28 | 0.47 |
| BioLip | 95.14 | 22.34 | 98.10 | 0.24 |

^a Binding sites were defined using the LPC and BioLip databases, respectively
Performance of all methods on the “hard” target proteins for each type of ligand
| Ligand | N^a | Method | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC (%)^b |
|---|---|---|---|---|---|---|
| CU | 3 | TargetCom | 98.76 | 16.67 | 100 | 23.42 |
| | | COACH | 98.76 | 16.67 | 100 | 23.42 |
| | | S-SITE | 98.76 | 33.33 | 99.56 | |
| | | TargetSeq | 98.33 | 0 | 100 | 0 |
| | | COFACTOR | 98.33 | 0 | 100 | 0 |
| | | TM-SITE | 98.33 | 0 | 100 | 0 |
| FE | 3 | TargetCom | 98.48 | 16.67 | 100 | 23.39 |
| | | COACH | 98.48 | 16.67 | 100 | 23.39 |
| | | S-SITE | 98.73 | 25.00 | 100 | |
| | | TargetSeq | 98.26 | 12.5 | 99.9 | 20.03 |
| | | COFACTOR | 97.98 | 0 | 100 | 0 |
| | | TM-SITE | 97.98 | 0 | 100 | 0 |
| ZN | 30 | TargetCom | 98.38 | 40.63 | 99.42 | 37.49 |
| | | COACH | 97.65 | 40.91 | 98.66 | 31.86 |
| | | S-SITE | 97.43 | 43.32 | 98.37 | |
| | | TargetSeq | 97.97 | 7.02 | 99.68 | 11.43 |
| | | COFACTOR | 98.14 | 0 | 100 | 0 |
| | | TM-SITE | 97.99 | 0 | 99.85 | −0.15 |
| SO4 | 5 | TargetCom | 97.11 | 6.67 | 99.61 | 7.61 |
| | | COACH | 97.02 | 6.67 | 99.51 | 7.08 |
| | | S-SITE | 97.11 | 0 | 100 | 0 |
| | | TargetSeq | 97.11 | 0 | 100 | 0 |
| | | COFACTOR | 97.39 | 6.67 | 99.9 | 1 |
| | | TM-SITE | 97.02 | 6.67 | 99.51 | |
| PO4 | 8 | TargetCom | 97.68 | 4.17 | 99.59 | 4.52 |
| | | COACH | 97.54 | 5 | 99.43 | 2.9 |
| | | S-SITE | 97.81 | 0 | 99.76 | −0.36 |
| | | TargetSeq | 98.08 | 13.39 | 99.66 | |
| | | COFACTOR | 97.91 | 0 | 99.86 | −0.17 |
| | | TM-SITE | 98.05 | 0 | 100 | 0 |
| ATP | 4 | TargetCom | 93.92 | 5 | 97.14 | 1.38 |
| | | COACH | 93.92 | 5 | 97.14 | 1.38 |
| | | S-SITE | 96.42 | 0 | 100 | 0 |
| | | TargetSeq | 97.32 | 27.08 | 99.85 | |
| | | COFACTOR | 96.42 | 0 | 100 | 0 |
| | | TM-SITE | 93.69 | 5 | 96.9 | 1.19 |
| HEME | 9 | TargetCom | 92.74 | 16.7 | 99.21 | 25.02 |
| | | COACH | 91.8 | 3.98 | 99.3 | 7.3 |
| | | S-SITE | 92.43 | 14.9 | 99.2 | |
| | | TargetSeq | 89.54 | 15.47 | 95.68 | 11.59 |
| | | COFACTOR | 92.12 | 0 | 100 | 0 |
| | | TM-SITE | 92.12 | 0 | 100 | 0 |

^a The number of “hard” target proteins for each type of ligand
^b The numbers shown in bold are the best values among the non-combination-based methods
Performance comparison of the general-purpose and ligand-specific models of the TargetSeq method on the 9-ligand dataset under five-fold cross-validation
| Ligand type | Model type | Accuracy (%) | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|---|
| CU | General | 86.98 | 79.62 | 87.09 | 0.22 |
| | Specific | 99.02 | 51.40 | 99.69 | 0.59 |
| FE | General | 90.14 | 85.02 | 90.22 | 0.29 |
| | Specific | 98.83 | 53.54 | 99.52 | 0.57 |
| FE2 | General | 90.67 | 90.89 | 90.67 | 0.30 |
| | Specific | 99.20 | 51.36 | 99.81 | 0.63 |
| ZN | General | 88.50 | 74.29 | 88.66 | 0.27 |
| | Specific | 99.01 | 41.78 | 99.68 | 0.50 |
| SO4 | General | 85.85 | 55.29 | 86.50 | 0.17 |
| | Specific | 97.79 | 10.07 | 99.66 | 0.19 |
| PO4 | General | 86.38 | 71.73 | 86.66 | 0.23 |
| | Specific | 97.29 | 27.86 | 98.63 | 0.27 |
| ATP | General | 87.46 | 71.88 | 88.02 | 0.32 |
| | Specific | 96.73 | 48.09 | 98.48 | 0.49 |
| FMN | General | 88.24 | 76.68 | 88.83 | 0.40 |
| | Specific | 96.39 | 66.56 | 97.92 | 0.62 |
| HEME | General | 86.18 | 73.85 | 87.21 | 0.43 |
| | Specific | 93.63 | 58.24 | 96.61 | 0.55 |
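
The general-purpose model trades specificity for sensitivity, and MCC penalizes this heavily because non-binding residues vastly outnumber binding ones. The sketch below rebuilds an approximate confusion matrix from a reported sensitivity and specificity plus the CU residue counts in the dataset composition table, then recomputes the MCC. It assumes the metrics were pooled over all residues; the function name `mcc_from_rates` is hypothetical.

```python
import math


def mcc_from_rates(sens: float, spec: float, n_pos: int, n_neg: int) -> float:
    """Rebuild an approximate confusion matrix from sensitivity, specificity and
    class sizes, then return the Matthews correlation coefficient."""
    tp = sens * n_pos
    fn = n_pos - tp
    tn = spec * n_neg
    fp = n_neg - tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# CU: 535 binding / 38488 non-binding residues (dataset composition table).
general = mcc_from_rates(0.7962, 0.8709, 535, 38488)
specific = mcc_from_rates(0.5140, 0.9969, 535, 38488)
print(f"general MCC ~ {general:.2f}, specific MCC ~ {specific:.2f}")
```

For CU this gives about 0.22 for the general model and 0.59 for the ligand-specific model, in line with the table. The close match is consistent with, but does not prove, residue-level pooling: about 4,969 false positives from the general model swamp its roughly 426 true positives.
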