Abstract
In this study, we attempt to predict the major functions of gram-negative bacterial proteins from their amino acid sequences. The dataset used for training and testing consists of 670 non-redundant gram-negative bacterial proteins (255 of cellular process, 60 of information molecules, 285 of metabolism, and 70 of virulence factors). We first developed SVM-based methods using amino acid composition and dipeptide composition, which achieved overall accuracies of 52.39% and 47.01%, respectively. We then introduced a new concept for classifying proteins based on tetrapeptides, in which we identified the unique tetrapeptides found significantly in a class of proteins. Using these tetrapeptides as input features for predicting the function of a protein gave an overall accuracy of 68.66%. We also developed a hybrid method in which the tetrapeptide information was combined with amino acid composition, which achieved an overall accuracy of 70.75%. Five-fold cross-validation was used to evaluate the performance of these methods. The web server VICMpred has been developed for predicting the function of gram-negative bacterial proteins (http://www.imtech.res.in/raghava/vicmpred/).
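The amino acid and dipeptide composition features mentioned in the abstract can be illustrated with a minimal sketch. It assumes compositions are expressed as percentages over the 20 standard residues and that non-standard characters are skipped; the abstract does not state the exact normalization, and the function names here are hypothetical.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature vectors line up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def amino_acid_composition(seq):
    """Return 20 values: the percentage of each standard residue in seq."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    total = len(residues)
    return [100.0 * residues.count(aa) / total for aa in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Return 400 values: the percentage of each adjacent residue pair."""
    seq = seq.upper()
    # Map each of the 400 possible dipeptides to a fixed vector position.
    index = {a + b: k for k, (a, b) in enumerate(product(AMINO_ACIDS, repeat=2))}
    counts = [0] * len(index)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in index:  # assumption: pairs with non-standard residues are skipped
            counts[index[pair]] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [100.0 * c / total for c in counts]
```

Vectors of this kind (20 or 400 values per protein) are what a composition-based classifier such as an SVM would take as input; the tetrapeptide features are not sketched here because the abstract does not specify how significance was determined.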
Year: 2006 PMID: 16689701 PMCID: PMC5054027 DOI: 10.1016/S1672-0229(06)60015-6
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
The Performance of Various Modules Including SVM Modules Based on Various Features of Protein Sequences and PSI-BLAST
| Approach | Cellular ACC | Cellular MCC | Information ACC | Information MCC | Metabolism ACC | Metabolism MCC | Virulence ACC | Virulence MCC | Overall ACC |
|---|---|---|---|---|---|---|---|---|---|
| Composition-based (A) | 47.06 | 0.12 | 0.12 | 0.41 | 0.41 | 0.31 | 27.14 | 0.32 | 52.39 |
| Dipeptide-based (B) | 45.10 | 0.11 | 15.00 | 0.21 | 60.35 | 0.23 | 27.14 | 0.20 | 47.01 |
| Pattern-based (C) | 70.20 | 0.46 | 48.33 | 0.57 | 72.98 | 0.51 | 62.86 | 0.61 | 68.66 |
| PSI-BLAST | 23.13 | / | 8.33 | / | 28.77 | / | 25.71 | / | / |
| Hybrid 1 (A+C) | 69.41 | 0.48 | 50.00 | 0.59 | 77.19 | 0.54 | 62.86 | 0.65 | 70.30 |
| Hybrid 2 (B+C) | 69.02 | 0.54 | 48.33 | 0.52 | 74.04 | 0.53 | 58.57 | 0.54 | 68.21 |
| Hybrid 3 (A+B+C) | 69.80 | 0.51 | 53.33 | 0.58 | 77.54 | 0.56 | 61.43 | 0.59 | 70.75 |
ACC: Accuracy (%); MCC: Matthews correlation coefficient.
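As a rough illustration of the two measures, the sketch below computes a per-class ACC and MCC from true and predicted labels in a one-vs-rest fashion. It assumes that per-class ACC means the percentage of proteins of that class that were predicted correctly; the source defines it only as "Accuracy (%)". The MCC formula is the standard one, and the function names are hypothetical.

```python
import math

def mcc(tp, tn, fp, fn):
    """Standard Matthews correlation coefficient from confusion counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal total is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def per_class_scores(true_labels, predicted_labels, cls):
    """Return (ACC in %, MCC) for one class, treating all other classes as negative."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    tn = sum(1 for t, p in pairs if t != cls and p != cls)
    # Assumption: ACC is the share of this class's proteins predicted correctly.
    acc = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    return acc, mcc(tp, tn, fp, fn)
```

For example, `per_class_scores(["c", "c", "m", "m"], ["c", "m", "m", "m"], "c")` gives an ACC of 50.0 and an MCC of about 0.577.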
Fig. 1 An outline of the ab initio pattern prediction method.