Ebrahim Barzegari Asadabadi, Parviz Abdolmaleki.
Abstract
BACKGROUND: Predicting interaction sites within membrane protein complexes from sequence data is of great importance, because it would find applications in modifying the transport of molecules across membranes, in signaling pathways, and in identifying drug targets for many diseases. Nevertheless, it has received little attention from the protein structural bioinformatics community.
Keywords: Interaction sites; Membrane proteins; Support vector machines
Year: 2013 PMID: 23919118 PMCID: PMC3732864
Source DB: PubMed Journal: Avicenna J Med Biotechnol ISSN: 2008-2835
Weka classifier models and their prediction accuracy on the independent test set
| Classifier | Accuracy (%) | Classifier | Accuracy (%) | Classifier | Accuracy (%) |
|---|---|---|---|---|---|
| NBTree | 77.1435 | Random forest | 71.6437 | Voted perceptron | 60.9159 |
| SMO | 76.9305 | Classification via regression | 71.1627 | IB1 | 60.7905 |
| Decision table | 76.2861 | Bayes Net | 71.1209 | IBk | 60.7905 |
| Attribute selected classifier | 76.2233 | Rotation forest | 70.4308 | Multilayer perceptron | 59.7449 |
| Filtered classifier | 76.1188 | LADTree | 70.2217 | Naïve Bayes multinomial | 58.9921 |
| Bagging | 75.366 | LogitBoost | 69.6989 | Complement naïve Bayes | 58.9293 |
| Decorate | 75.0314 | ADTree | 68.4442 | Naïve Bayes simple | 58.7829 |
| JRip | 74.4458 | FT | 68.2978 | Naïve Bayes | 58.5529 |
| END | 74.3622 | AdaBoostM1 | 67.2313 | Naïve Bayes multinomial updateable | 58.5529 |
| Nested Dichotomies Class Balanced ND | 74.3622 | RandomTree | 66.1857 | Naïve Bayes updateable | 58.5529 |
| Nested Dichotomies Data Near Balanced ND | 74.3622 | Raced incremental LogitBoost | 65.0774 | DMNBtext | 57.7373 |
| Nested Dichotomies ND | 74.3622 | OneR | 63.279 | Threshold selector | 56.3363 |
| Ordinal class classifier | 74.3622 | Conjunctive rule | 62.4843 | RBF network | 56.0435 |
| J48 | 74.3622 | KStar | 61.9824 | VFI | 51.7984 |
| PART | 74.3413 | LWL | 61.857 | Classification via clustering | 50.2928 |
| J48graft | 74.3413 | Decision stump | 61.857 | CV parameter selection | 49.9791 |
| Random sub space | 74.2158 | SPegasos | 61.7942 | Grading | 49.9791 |
| Simple CART | 73.923 | Multi boost AB | 61.7315 | Multi scheme | 49.9791 |
| LMT | 73.6303 | NNge | 61.7315 | Stacking | 49.9791 |
| Ridor | 73.4421 | Bayesian logistic regression | 61.606 | StackingC | 49.9791 |
| BFTree | 73.3166 | Logistic | 61.5851 | Vote | 49.9791 |
| DTNB | 73.1493 | Multi class classifier | 61.5851 | ZeroR | 49.9791 |
| REPTree | 72.3756 | Simple logistic | 61.376 | Hyper pipes | 49.9164 |
| Random committee | 71.9155 | Dagging | 61.2505 | | |
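Several classifiers near the bottom of the table (CV parameter selection, Grading, Multi scheme, Stacking, StackingC and Vote) score exactly the same accuracy as ZeroR, 49.9791%. ZeroR always predicts the majority class, so it is a natural floor, and Hyper pipes falls slightly below it. One way to read the table is to express each accuracy as a margin over ZeroR. The Python sketch below does this for a subset of rows copied from the table; the helper name `margin_over_baseline` is hypothetical and is not part of Weka.

```python
# Accuracies (%) on the independent test set, copied from the table above.
# Only a representative subset is listed; add the remaining rows as needed.
ACCURACY = {
    "NBTree": 77.1435,
    "SMO": 76.9305,
    "Decision table": 76.2861,
    "Random forest": 71.6437,
    "Voted perceptron": 60.9159,
    "Multilayer perceptron": 59.7449,
    "Naïve Bayes": 58.5529,
    "VFI": 51.7984,
    "Stacking": 49.9791,
    "Vote": 49.9791,
    "ZeroR": 49.9791,
    "Hyper pipes": 49.9164,
}


def margin_over_baseline(acc, baseline="ZeroR"):
    """Return (classifier, accuracy minus baseline accuracy) pairs, best first.

    The baseline itself is excluded; margins are rounded to 4 decimals to match
    the precision of the table.
    """
    base = acc[baseline]
    rows = [(name, round(a - base, 4)) for name, a in acc.items() if name != baseline]
    return sorted(rows, key=lambda r: r[1], reverse=True)


for name, margin in margin_over_baseline(ACCURACY):
    print(f"{name:22s} {margin:+8.4f}")
```

A margin of zero means the classifier does no better than always predicting the majority class on this test set.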
Feature selection by RLR and its comparison with the previous study
| Rank | Feature (sorted by RLR weight) | RLR weight | Feature (RF model order of importance) |
|---|---|---|---|
| 1 | Evol. Rate | 1.480945 | Ala |
| 2 | His | 0.989602 | Leu |
| 3 | Ala | 0.255016 | Gly |
| 4 | Cys | 0.251367 | Val |
| 5 | Ile | 0.239659 | Evol. Rate |
| 6 | Lys | 0.231607 | Met |
| 7 | Val | 0.229424 | Phe |
| 8 | Leu | 0.215648 | Ile |
| 9 | Gly | 0.173995 | Trp |
| 10 | Pro | 0.103435 | Ser |
| 11 | Trp | -0.05232 | Arg |
| 12 | Gln | -0.08103 | Lys |
| 13 | Glu | -0.14467 | Thr |
| 14 | Ser | -0.17687 | Asn |
| 15 | Asn | -0.23034 | Cys |
| 16 | Phe | -0.24444 | Pro |
| 17 | Tyr | -0.27445 | His |
| 18 | Asp | -0.35331 | Tyr |
| 19 | Arg | -0.42569 | Gln |
| 20 | Thr | -0.80005 | Asp |
| 21 | Met | -0.83732 | Glu |
All predictor variables are sorted by their RLR-assigned weights, and this order is compared with the order of importance in the previously reported random forest (RF) model (Bordner, 2009). Positive weights indicate that a feature is preferred in interaction sites; negative weights indicate that it is avoided.
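The caption compares the RLR and RF orderings only qualitatively. One way to quantify their agreement is Spearman's rank correlation; this is an illustrative addition, not an analysis from the paper. Because RF importance is unsigned while the RLR ordering runs from preference to avoidance, the Python sketch computes ρ both for the signed order and for an order by |weight|, assuming |weight| is a reasonable proxy for importance (which only holds if the features are on comparable scales). The names `RLR_WEIGHTS`, `RF_ORDER` and `spearman_rho` are hypothetical.

```python
# RLR weights copied from the table above (feature -> weight).
RLR_WEIGHTS = {
    "Evol. Rate": 1.480945, "His": 0.989602, "Ala": 0.255016, "Cys": 0.251367,
    "Ile": 0.239659, "Lys": 0.231607, "Val": 0.229424, "Leu": 0.215648,
    "Gly": 0.173995, "Pro": 0.103435, "Trp": -0.05232, "Gln": -0.08103,
    "Glu": -0.14467, "Ser": -0.17687, "Asn": -0.23034, "Phe": -0.24444,
    "Tyr": -0.27445, "Asp": -0.35331, "Arg": -0.42569, "Thr": -0.80005,
    "Met": -0.83732,
}

# Order of importance in the RF model (Bordner, 2009), most important first.
RF_ORDER = [
    "Ala", "Leu", "Gly", "Val", "Evol. Rate", "Met", "Phe", "Ile", "Trp", "Ser",
    "Arg", "Lys", "Thr", "Asn", "Cys", "Pro", "His", "Tyr", "Gln", "Asp", "Glu",
]


def ranks(order):
    """Map each feature to its 1-based position in an ordered list."""
    return {feature: i + 1 for i, feature in enumerate(order)}


def spearman_rho(order_a, order_b):
    """Spearman rank correlation of two tie-free orderings of the same items,
    using rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    ra, rb = ranks(order_a), ranks(order_b)
    assert set(ra) == set(rb), "orderings must contain the same features"
    n = len(ra)
    d2 = sum((ra[f] - rb[f]) ** 2 for f in ra)
    return 1 - 6 * d2 / (n * (n * n - 1))


# Ordering 1: by signed weight (strongest preference first), as in the table.
signed_order = sorted(RLR_WEIGHTS, key=RLR_WEIGHTS.get, reverse=True)
# Ordering 2: by |weight|, so strong avoidance also counts as "important".
magnitude_order = sorted(RLR_WEIGHTS, key=lambda f: abs(RLR_WEIGHTS[f]), reverse=True)

print(f"rho (signed weights): {spearman_rho(signed_order, RF_ORDER):.4f}")
print(f"rho (|weights|):      {spearman_rho(magnitude_order, RF_ORDER):.4f}")
```

For these 21 features the sketch gives ρ ≈ 0.29 for the signed ordering and ρ ≈ 0.06 for the magnitude ordering, i.e. weak agreement with the RF ranking under either reading. The closed-form formula is valid here because neither ordering contains ties.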
Figure 1. ROC plot illustrating the classification performance of the tuned SVM model on the reference dataset and on the collected independent test set. The corresponding AUC values are 0.812 and 0.786, respectively.
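An AUC of 0.812 can be read as the probability that the classifier scores a randomly chosen positive (interaction-site) example above a randomly chosen negative one. The Python sketch below computes the ROC points by sweeping the decision threshold and the AUC through this pairwise (Mann-Whitney) interpretation, counting ties as one half. The function names and the toy scores are hypothetical and are not the SVM outputs from this study.

```python
from itertools import product


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the positive
    example (label 1) scores higher than the negative one (label 0).
    Tied pairs count as 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through every
    distinct score; an example is called positive when score >= threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Toy example with invented scores (not data from the paper).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 0, 1, 1, 0, 1, 0, 0]
print(roc_points(scores, labels))
print(roc_auc(scores, labels))
```

Integrating the ROC points with the trapezoidal rule gives the same value as the pairwise count, which is why the area under the plotted curve and the ranking probability are interchangeable.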