Mamoon Rashid, Sudipto Saha, Gajendra Ps Raghava.
Abstract
BACKGROUND: In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.
Year: 2007 PMID: 17854501 PMCID: PMC2147037 DOI: 10.1186/1471-2105-8-337
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Prediction of subcellular localization of proteins using BLAST
| BLAST E-value | Cytoplasmic Prob | Cytoplasmic Acc | Integral Membrane Prob | Integral Membrane Acc | Secretory Prob | Secretory Acc | Membrane-attached Prob | Membrane-attached Acc |
|---|---|---|---|---|---|---|---|---|
| | 20.0 | 0.6 | 0.0 | 0.0 | 86.9 | 40.0 | 0.0 | 0.0 |
| | 21.4 | 0.9 | 0.0 | 0.0 | 86.9 | 40.0 | 0.0 | 0.0 |
| | 55.0 | 6.5 | 33.3 | 1.2 | 80.0 | 40.0 | 16.6 | 1.7 |
| | 50.0 | 12.1 | 70.1 | 13.4 | 68.9 | 40.0 | 16.6 | 1.7 |
| | 46.7 | 29.4 | 72.2 | 41.3 | 51.3 | 40.0 | 8.6 | 5.0 |
| | 42.3 | 41.8 | 69.0 | 65.9 | 41.6 | 40.0 | 20.7 | 20.0 |
| | 45.6 | 45.6 | 70.5 | 70.2 | 40.0 | 40.0 | 23.3 | 23.3 |
| | 46.5 | 46.5 | 70.4 | 70.4 | 40.0 | 40.0 | 26.7 | 26.7 |
Prob: Probability of correct hit; Acc: Accuracy
The performance of various SVM models
| Input Pattern | Cytoplasmic ACC ± sd | Cytoplasmic MCC | Integral Membrane ACC ± sd | Integral Membrane MCC | Secretory ACC ± sd | Secretory MCC | Membrane-attached ACC ± sd | Membrane-attached MCC | Overall accuracy (ACC) | Average accuracy (ACC) |
|---|---|---|---|---|---|---|---|---|---|---|
| | 88.82 ± 5.4 | 0.77 | 86.07 ± 7.5 | 0.71 | 44.00 ± 42.2 | 0.57 | 55.00 ± 19.4 | 0.58 | 82.51 | 68.47 |
| | 89.41 ± 7.8 | 0.72 | 81.09 ± 7.5 | 0.67 | 50.00 ± 36.8 | 0.60 | 50.00 ± 17.4 | 0.57 | 80.39 | 67.63 |
| | 94.71 ± 4.8 | 0.85 | 87.81 ± 6.1 | 0.80 | 44.00 ± 42.2 | 0.48 | 68.33 ± 28 | 0.69 | 86.62 | 73.71 |
ACC: Accuracy; MCC: Matthews correlation coefficient; sd: Standard Deviation
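The two summary columns in the table above can be checked against the per-class values. The sketch below assumes, as an inference from the numbers rather than a definition given in this record, that average accuracy is the unweighted mean over the four classes and overall accuracy is the mean weighted by class size (340, 402, 50 and 60 proteins, as listed in the dataset statistics). The `mcc` helper is the standard one-vs-rest Matthews correlation coefficient; all function names are illustrative.

```python
import math

# Class sizes taken from the dataset statistics table in this record.
CLASS_SIZES = {
    "cytoplasmic": 340,
    "integral_membrane": 402,
    "secretory": 50,
    "membrane_attached": 60,
}


def average_accuracy(per_class_acc):
    """Unweighted mean of the per-class accuracies (percent)."""
    return sum(per_class_acc.values()) / len(per_class_acc)


def overall_accuracy(per_class_acc, class_sizes=CLASS_SIZES):
    """Mean of the per-class accuracies weighted by class size (percent)."""
    total = sum(class_sizes[c] for c in per_class_acc)
    return sum(per_class_acc[c] * class_sizes[c] for c in per_class_acc) / total


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for one class against the rest."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Returning 0.0 for a degenerate confusion matrix is a common convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Per-class ACC values from the first data row of the SVM table.
first_row = {
    "cytoplasmic": 88.82,
    "integral_membrane": 86.07,
    "secretory": 44.00,
    "membrane_attached": 55.00,
}

print(round(average_accuracy(first_row), 2))  # 68.47, matches the table
print(round(overall_accuracy(first_row), 2))  # 82.51, matches the table
```

Because the secretory and membrane-attached classes are small, the average accuracy is pulled down by them much more strongly than the overall accuracy, which is dominated by the two large classes.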
Performance of the HMM-based model: sensitivity (percent of correct hits)
| E-value | Cytoplasmic | Integral Membrane | Secretory | Membrane-attached | Overall accuracy | Average accuracy |
|---|---|---|---|---|---|---|
| | 0.29 | 1.99 | 38.00 | 5.00 | 3.63 | 11.32 |
| | 3.82 | 5.47 | 38.00 | 41.67 | 9.26 | 22.24 |
| | 20.59 | 21.39 | 40.00 | 65.00 | 25.23 | 36.74 |
| | 33.82 | 30.60 | 40.00 | 70.00 | 35.21 | 43.60 |
| | 36.18 | 32.84 | 40.00 | 73.33 | 37.44 | 45.58 |
| | 36.18 | 32.84 | 40.00 | 73.33 | 37.44 | 45.58 |
Comparison of the performance of the hybrid model and the MEME/MAST model (percent accuracy)
| E-value | Cytoplasmic | Integral Membrane | Secretory | Membrane-attached | Overall accuracy | Average accuracy |
|---|---|---|---|---|---|---|
| | 94.7 (0.0) | 87.8 (0.2) | 46.0 (40.0) | 65.0 (0.0) | 86.5 (2.4) | 73.4 (10.1) |
| | 94.7 (0.0) | 87.6 (1.5) | 46.0 (40.0) | 65.0 (0.0) | 86.3 (3.0) | 73.4 (10.4) |
| | 94.7 (0.0) | 87.3 (2.7) | 46.0 (40.0) | 65.0 (0.0) | 86.2 (3.6) | 73.3 (10.7) |
| | 93.2 (0.3) | 86.6 (8.5) | 46.0 (40.0) | 65.0 (11.7) | 85.3 (7.3) | 73.2 (15.1) |
| | 79.1 (0.9) | 85.3 (31.6) | 92.0 (100.0) | 91.7 (100.0) | 83.7 (28.1) | 87.0 (58.1) |
| | 75.0 (4.4) | 84.3 (39.1) | 92.0 (100.0) | 91.7 (100.0) | 81.6 (33.1) | 85.8 (60.9) |
Figure 1. A plot of reliability index (RI) and percent coverage versus average accuracy for the PSSM-based SVM module. The Y-axis shows average accuracy and the X-axis shows RI (lower axis) and percent coverage (upper axis). For example, about 62% of sequences, those having RI >= 3, are predicted with 95% accuracy.
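The kind of calculation behind such a plot can be sketched as follows: for a chosen RI threshold, coverage is the share of sequences at or above the threshold, and accuracy is measured only on that share. This record does not say how RI is derived from the SVM output, so the sketch takes RI values as given; the toy predictions are hypothetical, and the function reports plain accuracy rather than the class-averaged accuracy shown in the figure.

```python
def coverage_and_accuracy(predictions, ri_threshold):
    """Return (percent coverage, percent accuracy) at an RI threshold.

    predictions: list of (reliability_index, is_correct) pairs.
    Accuracy is None when no prediction reaches the threshold.
    """
    selected = [ok for ri, ok in predictions if ri >= ri_threshold]
    if not selected:
        return 0.0, None
    coverage = 100 * len(selected) / len(predictions)
    accuracy = 100 * sum(selected) / len(selected)
    return coverage, accuracy


# Hypothetical toy data, not taken from the paper.
toy_predictions = [
    (5, True), (4, True), (3, True), (3, False),
    (2, True), (1, False), (1, True), (0, False),
]

coverage, accuracy = coverage_and_accuracy(toy_predictions, ri_threshold=3)
print(coverage, accuracy)  # 50.0 75.0
```

Raising the threshold trades coverage for accuracy, which is the relationship the figure caption describes.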
Performance of existing methods on the dataset used in this study
| Methods | Predicted Locations | Cytoplasmic [340] | Integral Membrane [402] | Secretory [50] | Membrane-attached [60] |
|---|---|---|---|---|---|
| | Cytoplasm | 300 (88%) | 20 | 3 | 4 |
| | Extracellular | 2 | 4 | 40 (80%) | 4 |
| | Cytoplasmic Membrane | 3 | 326 (81%) | 0 | 11 (18%) |
| | Cell Wall | 0 | 1 | 0 | 1 |
| | Unknown | 35 | 51 | 7 | 40 |
| | Cytoplasm | 323 (95%) | 31 | 0 | 0 |
| | Extracellular | 1 | 117 | 50 (100%) | 50 |
| | Plasma Membrane | 1 | 0 (0%) | 0 | 10 (17%) |
| | No-positive | 15 | 254 | 0 | 0 |
| | Membrane | 1 | 354 (88%) | 40 | 14 (23%) |
| | Non-Membrane | 339 | 48 | 10 | 46 |
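The percentages in parentheses can be reproduced from the counts: each is the number of proteins assigned to the matching location divided by the class size in the column header. The sketch below uses the first five rows of the table, which sum to the class sizes and so appear to belong to one method (its name is not present in this record). The helper names are illustrative.

```python
# Column order: Cytoplasmic, Integral Membrane, Secretory, Membrane-attached.
CLASS_SIZES = [340, 402, 50, 60]

# Counts from the first five rows of the table (method name not recovered).
first_block = {
    "Cytoplasm": [300, 20, 3, 4],
    "Extracellular": [2, 4, 40, 4],
    "Cytoplasmic Membrane": [3, 326, 0, 11],
    "Cell Wall": [0, 1, 0, 1],
    "Unknown": [35, 51, 7, 40],
}


def column_totals(block):
    """Sum the counts per true location across all predicted locations."""
    return [sum(column) for column in zip(*block.values())]


def percent(count, total):
    """Share of a class assigned to one predicted location, rounded."""
    return round(100 * count / total)


print(column_totals(first_block))  # [340, 402, 50, 60]
print(percent(300, 340))           # 88
print(percent(326, 402))           # 81
print(percent(11, 60))             # 18
```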
Statistics of distributions of proteins among different subcellular locations
| Subcellular localizations | Sample Numbers |
|---|---|
| 1. Probably external side of the cell wall | 1 |
| 2. Integral membrane protein | 402 |
| 3. Cytoplasmic | 340 |
| 4. Secreted | 50 |
| 5. Membrane associated | 10 |
| 6. Soluble or peripheral membrane protein | 3 |
| 7. Attached to the membrane by a lipid anchor | 60 |
| 8. Probable peripheral membrane protein | 3 |
| 9. Type-I membrane protein | 2 |
| 10. Surface associated | 2 |
| 11. Membrane bound | 5 |
| 12. Membrane protein | 3 |
| 13. Partially secreted | 1 |
Number of proteins remaining in each location after removing redundant proteins with the program CD-HIT at identity cut-offs of 40%, 60% and 90%
| CD-HIT cut-off (% identity) | Cytoplasmic (340*) | Integral-membrane (402) | Secretory (50) | Membrane-attached (60) |
|---|---|---|---|---|
| | 223 | 262 | 34 | 38 |
| | 118 | 195 | 20 | 29 |
| | 117 | 182 | 17 | 27 |
* The number in brackets is the total number of proteins in that location
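A redundancy-reduction run of this kind might be scripted as below. The record does not give the exact invocation used, so this is only a sketch: it builds a `cd-hit` command line with the `-i`, `-o`, `-c` (identity threshold) and `-n` (word size) options, choosing the word size from the ranges recommended in the CD-HIT user guide. The file names are hypothetical.

```python
def cdhit_word_size(identity):
    """Pick the -n word size recommended for a protein identity threshold."""
    if identity >= 0.7:
        return 5
    if identity >= 0.6:
        return 4
    if identity >= 0.5:
        return 3
    if identity >= 0.4:
        return 2
    raise ValueError("cd-hit does not support protein thresholds below 0.4")


def cdhit_command(fasta_in, fasta_out, identity):
    """Build the argument list for one cd-hit run (does not execute it)."""
    return [
        "cd-hit",
        "-i", fasta_in,
        "-o", fasta_out,
        "-c", f"{identity:.2f}",
        "-n", str(cdhit_word_size(identity)),
    ]


# Hypothetical file names; one command per cut-off named in the caption.
for cutoff in (0.9, 0.6, 0.4):
    print(" ".join(cdhit_command("cytoplasmic.fasta",
                                 f"cytoplasmic_nr{int(cutoff * 100)}.fasta",
                                 cutoff)))
```

The resulting lists can be passed to `subprocess.run` on a machine where CD-HIT is installed.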