Fayyaz ul Amir Afsar Minhas, Asa Ben-Hur.
Abstract
MOTIVATION: Calmodulin (CaM) is a ubiquitously conserved protein that acts as a calcium sensor, and interacts with a large number of proteins. Detection of CaM binding proteins and their interaction sites experimentally requires a significant effort, so accurate methods for their prediction are important.
Year: 2012 PMID: 22962461 PMCID: PMC3436843 DOI: 10.1093/bioinformatics/bts416
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. CaM binding site prediction with MIL. The annotated binding site is shown as a box, and is represented by a ‘bag’ composed of the windows indicated in red above the binding site. The remaining windows, which do not overlap the binding site, are negative examples (shown in blue below the protein). The bottom panel illustrates the desired characteristics of the classifier's discriminant function. The dots indicate the scores of different examples (positives shown as solid red circles, negatives as hollow blue circles). The score from the trained discriminant function for one window in a binding site should be higher than the scores generated for non-binding site windows within that protein.
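The bag construction described in the caption of Fig. 1 can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the window length and the rule that a window joins the bag when it overlaps the site by at least one residue are assumptions made here. The second helper checks the property shown in the bottom panel, read here as "at least one bag window outscores every negative window of the same protein".

```python
from typing import List, Sequence, Tuple


def split_windows(seq_len: int, site: Tuple[int, int], win: int) -> Tuple[List[int], List[int]]:
    """Return start indices of windows in the bag and of negative windows.

    `site` is (start, end) with 0-based start and exclusive end.
    Assumption: a window belongs to the bag if it overlaps the site at all.
    """
    site_start, site_end = site
    bag, negatives = [], []
    for start in range(seq_len - win + 1):
        overlaps = start < site_end and start + win > site_start
        (bag if overlaps else negatives).append(start)
    return bag, negatives


def bag_outranks_negatives(scores: Sequence[float], bag: List[int], negatives: List[int]) -> bool:
    """True if the best-scoring bag window beats every negative window."""
    best_positive = max(scores[i] for i in bag)
    best_negative = max(scores[i] for i in negatives)
    return best_positive > best_negative


# Hypothetical protein of length 12 with a site at residues 4..7 and windows of length 3.
bag, negatives = split_windows(seq_len=12, site=(4, 8), win=3)
scores = [0.1, 0.2, 0.0, 0.3, 0.9, 0.4, 0.2, 0.1, 0.3, 0.5]  # one made-up score per window
ok = bag_outranks_negatives(scores, bag, negatives)
```

Only one window of the bag needs a high score, so low-scoring windows at the edges of the site do not count against the classifier.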
Heuristic algorithm used for training MI-1
| With each binding site |
| Solve the following quadratic programming (QP) problem: |
| such that, ∀ |
| Update (for all binding sites): |
Results across methods and kernels
| Method | Features | AUC | AUC0.1 | TH % | FH % |
|---|---|---|---|---|---|
| Vanilla SVM | 1-Spec | 95.5 | 53.9 | | 2.6 |
| | PD-1 | 95.6 | 54.5 | 64 | 2.5 |
| | Comb. | 95.9 | 55.1 | 65 | 2.1 |
| | Max Std. | 0.16 | 0.59 | 2.2 | 0.15 |
| mi-SVM | 1-Spec | 95.5 | | 64 | 2.6 |
| | PD-1 | 96.0 | 55.8 | 69 | 2.1 |
| | Comb. | 96.2 | 55.6 | 68 | 1.9 |
| MI-1 SVM | 1-Spec | | 54.3 | 62 | |
| | PD-1 | | | | |
| | Comb. | | | | |
| | Max Std. | 0.14 | 0.80 | 3.4 | 0.11 |
| | Gappy | 96.5 | 58.5 | 68 | 1.6 |
The features are 1-spectrum (1-Spec), position-dependent 1-spectrum (PD-1) and the combination (Comb.) of the 1-Spec and PD-1 representations. The Max Std. rows show the maximum standard deviation of each performance metric across the above feature representations. Results with the position-dependent gappy triplet kernel (Gappy) with MI-1 SVM are also reported (for a single run, owing to its longer computation time). Bold numbers indicate the best value (across all methods) for a particular metric using a particular feature representation. (AUC: area under the ROC curve; AUC0.1: area under the ROC curve up to the first 10% of false positives; TH: true hit; FH: false hit.)
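As an illustration of the AUC0.1 metric defined above, the sketch below computes the area under the ROC curve restricted to false-positive rates of at most 10%. It rests on assumptions made here rather than taken from the source: the area is divided by 0.1 so that a perfect ranking scores 1.0 (the table values appear to be percentages), tied scores are not handled specially, and the function name `partial_auc` is hypothetical.

```python
import numpy as np


def partial_auc(labels, scores, max_fpr=0.1):
    """Area under the ROC curve for FPR in [0, max_fpr], divided by max_fpr.

    Assumptions: labels are 0/1 with both classes present; tied scores are
    broken by input order rather than treated as a diagonal ROC segment.
    """
    labels = np.asarray(labels, dtype=bool)
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    labels = labels[order]
    n_pos = labels.sum()
    n_neg = (~labels).sum()
    # TPR reached at the moment each negative is passed in the ranking
    tpr_at_neg = np.cumsum(labels)[~labels] / n_pos
    # The k-th negative spans the FPR interval [k / n_neg, (k + 1) / n_neg]
    left = np.arange(n_neg) / n_neg
    right = left + 1.0 / n_neg
    width = np.clip(np.minimum(right, max_fpr) - left, 0.0, None)
    return float(np.sum(tpr_at_neg * width) / max_fpr)


# Made-up ranking: positive, negative, positive, then nine more negatives.
labels = [1, 0, 1] + [0] * 9
scores = np.linspace(1.0, 0.0, num=len(labels))
auc_01 = partial_auc(labels, scores, max_fpr=0.1)
auc_full = partial_auc(labels, scores, max_fpr=1.0)
```

In this toy ranking the full AUC is high while the restricted area is much lower, which is why the two metrics can rank methods differently.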
Fig. 2. MI-1 discriminant values along the length of a held-out protein with the position-independent (top) and the position-dependent (bottom) 1-spectrum features.
Fig. 3. (a) Weights of different amino acids in the (position-independent) 1-spectrum feature representation; (b) Heat map of the weights of different amino acids versus their position from the MI-1 SVM position-dependent 1-spectrum feature representation; and (c) Top 100 (in terms of their weights) motifs from the position-dependent gappy triplet kernel. The last (numeric) column shows actual weight values.
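The two window representations whose weights appear in Fig. 3(a) and (b) can be sketched as below. This is an assumed reading, not the authors' implementation: the 1-spectrum is taken to be the count of each of the 20 standard amino acids in a window, and the position-dependent 1-spectrum a one-hot indicator of the amino acid at each window position. The function names, the alphabet ordering and the absence of any normalization are choices made for this example.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; the ordering is arbitrary
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_spectrum(window: str) -> np.ndarray:
    """Position-independent counts of each amino acid (length 20)."""
    counts = np.zeros(len(AMINO_ACIDS))
    for aa in window:
        if aa in AA_INDEX:  # non-standard residues are skipped (assumption)
            counts[AA_INDEX[aa]] += 1
    return counts


def position_dependent_one_spectrum(window: str) -> np.ndarray:
    """One-hot amino acid indicator per position (length 20 * len(window))."""
    feats = np.zeros((len(window), len(AMINO_ACIDS)))
    for pos, aa in enumerate(window):
        if aa in AA_INDEX:
            feats[pos, AA_INDEX[aa]] = 1.0
    return feats.ravel()


def combined(window: str) -> np.ndarray:
    """Concatenation of both representations (the 'Comb.' rows, as read here)."""
    return np.concatenate([one_spectrum(window), position_dependent_one_spectrum(window)])


spec = one_spectrum("ACCA")
pd1 = position_dependent_one_spectrum("ACCA")
comb = combined("ACCA")
```

Under this reading, a linear model over the position-dependent features has one weight per (position, amino acid) pair, which is the layout a heat map such as Fig. 3(b) would display.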
Results of CaM binding prediction for Discriminant function scoring and Cascaded classification with an SVM with a Gaussian kernel
| Method | Features | AUC |
|---|---|---|
| Discriminant function scoring | 1-Spec | 71.9 |
| | PD-1 | 70.1 |
| | Comb. | 71.9 |
| Cascaded classification | 1-Spec | |
| | PD-1 | |
| | Comb. | |
The features are 1-spectrum (1-Spec), position-dependent 1-spectrum (PD-1) and the combination (Comb.) of the 1-Spec and PD-1 feature representations. Using Cascaded classification with a linear kernel in the second-stage SVM instead of the Gaussian kernel, the best AUC was 0.72 with 1-spectrum features. (AUC: area under the ROC curve.)