Aarti Garg, Gajendra P S Raghava.
Abstract
BACKGROUND: The expansion of raw protein sequence databases in the post-genomic era, together with the availability of newly annotated sequences for the major localizations, motivated us to introduce an improved version of our earlier eukaryotic subcellular localization prediction method, "ESLpred". Because the subcellular localization of a protein offers essential clues about its function, a localization predictor can aid and expedite protein characterization studies. However, the robustness of a predictor depends heavily on the quality of the dataset and of the extracted protein features; it is therefore imperative to improve the presently available method using the latest dataset and more informative input features.
Year: 2008 PMID: 19038062 PMCID: PMC2612013 DOI: 10.1186/1471-2105-9-503
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Detailed performance of various SVM based modules and EuPSI-BLAST on the RH2427 dataset
| | ACC | MCC | ACC | MCC | ACC | MCC | ACC | MCC | ACC | MCC | ACC | MCC |
| AAC-NTerm (A) | 82.2 | 0.76 | 83.5 | 0.82 | 90.3 | 0.80 | 86.2 | 0.86 | 86.5 | 0.80 | 85.5 | 0.81 |
| PSSM based (B) | 84.1 | 0.79 | 71.3 | 0.74 | 95.2 | 0.85 | 93.2 | 0.95 | 88.6 | 0.83 | 86.0 | 0.83 |
| *EuPSI-BLAST (C) | 77.6 | --- | 54.8 | --- | 84.5 | --- | 86.7 | --- | --- | --- | --- | --- |
| Hybrid1 (A+B) | 86.1 | 0.84 | 89.4 | 0.89 | 95.2 | 0.87 | 93.9 | 0.94 | 91.7 | 0.88 | 91.1 | 0.89 |
ACC is accuracy, given as a percentage; MCC is the Matthews correlation coefficient.
*Results obtained from the ESLpred method (reference 20).
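As a minimal sketch of the two measures named in the footnote, the snippet below computes a per-class accuracy and the Matthews correlation coefficient from confusion counts. The source does not spell out its accuracy formula, so the per-class definition used here (correctly predicted proteins of a class divided by all proteins of that class) is an assumption, and the counts are hypothetical.

```python
import math

def class_accuracy(tp, fn):
    """Per-class accuracy in percent.

    Assumption: accuracy for a class is the share of its proteins
    that were predicted correctly (tp) out of all its proteins (tp + fn).
    """
    return 100.0 * tp / (tp + fn)

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

# Hypothetical confusion counts for one localization class (not from the paper).
tp, tn, fp, fn = 80, 90, 10, 20
print(f"ACC = {class_accuracy(tp, fn):.1f}%")
print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")
```

MCC uses all four confusion counts, which is why it can rank methods differently from ACC when class sizes are unbalanced.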
The detailed prediction results of different modules and comparison of performance with the BaCelLo method on non-redundant, organism-specific datasets
| Datasets | Localizations | PSI-BLAST (A) | | (PSSM+AAC-NTerm) (B) | | Hybrid2 (A+B) | | Hybrid2 (10-fold CV) | | #Using BaCelLo strategy (B) |
| | | ACC | MCC | ACC | MCC | ACC | MCC | ACC | MCC | ACC |
| | | 10.9 | ---- | 53.6 | 0.32 | 54.0 | 0.36 | 51.7 | 0.37 | 62.6 |
| | | 12.2 | ---- | 84.0 | 0.75 | 82.5 | 0.77 | 83.5 | 0.77 | 90.4 |
| | | 39.7 | ---- | 73.0 | 0.73 | 78.6 | 0.59 | 80.7 | 0.60 | 74.7 |
| | | 29.6 | ---- | 92.1 | 0.92 | 92.1 | 0.93 | 93.2 | 0.93 | 94.3 |
| | | ---- | * | | | | | | | |
| | | ---- | * | | | | | | | |
| | | 28.7 | ---- | 62.9 | 0.42 | 63.3 | 0.49 | 61.3 | 0.48 | 70.6 |
| | | 17.0 | ---- | 77.1 | 0.75 | 78.2 | 0.77 | 78.7 | 0.77 | 91.5 |
| | | 53.8 | ---- | 69.0 | 0.60 | 77.7 | 0.68 | 79.1 | 0.69 | 72.6 |
| | | 40.9 | ---- | 92.4 | 0.86 | 95.3 | 0.90 | 95.0 | 0.90 | 93.8 |
| | | ---- | * | | | | | | | |
| | | ---- | * | | | | | | | |
| | | 31.4 | ---- | 77.5 | 0.67 | 81.9 | 0.69 | 82.8 | 0.71 | 90.7 |
| | | 6.90 | ---- | 51.7 | 0.50 | 50.0 | 0.53 | 50.0 | 0.50 | 79.3 |
| | | 16.4 | ---- | 67.2 | 0.66 | 65.8 | 0.63 | 70.2 | 0.66 | 67.2 |
| | | 48.8 | ---- | 80.2 | 0.77 | 81.8 | 0.76 | 81.8 | 0.79 | 86.8 |
| | | 26.8 | ---- | 87.8 | 0.65 | 90.2 | 0.70 | 95.1 | 0.76 | 85.4 |
| | | ---- | * | | | | | | | |
| | | ---- | * | | | | | | | |
ACC is accuracy, given as a percentage; MCC is the Matthews correlation coefficient.
*Overall and average accuracy obtained at SVM parameters: Fungi dataset (kernel = RBF, γ = 5, C = 4); Animal dataset (kernel = RBF, γ = 5, C = 2); Plant dataset (kernel = RBF, γ = 9, C = 3).
# SVM parameters obtained for each class using hybrid1 features:
Fungi dataset: Cytoplasm (j = 4, γ = 7, C = 0.4, threshold value = 0.0); Mitochondria (j = 5, γ = 1, C = 1.6, threshold value = 0.0); Nuclear (j = 4, γ = 7, C = 0.54, threshold value = 0.0); Extracellular (j = 3, γ = 1, C = 1, threshold value = 0.0).
Animal dataset: Cytoplasm (j = 3, γ = 9, C = 0.5, threshold value = 0.0); Mitochondria (j = 25, γ = 1, C = 2, threshold value = 0.0); Nuclear (j = 3, γ = 9, C = 0.5, threshold value = 0.0); Extracellular (j = 6, γ = 2, C = 1, threshold value = -0.1).
Plant dataset: Cytoplasm (j = 2, γ = 3, C = 0.7, threshold value = 0.1); Mitochondria (j = 1, γ = 5, C = 75, threshold value = 0.2); Nuclear (j = 2, γ = 3, C = 0.7, threshold value = 0.1); Extracellular (j = 9, γ = 1, C = 1, threshold value = 0.0).
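To make the role of γ in these footnotes concrete, here is a minimal sketch of an RBF kernel evaluated at the dataset-level γ values quoted above. It assumes the common form K(x, y) = exp(-γ‖x − y‖²); the source does not state the exact kernel definition, and the dictionary name, function name, and feature vectors are hypothetical. The cost parameter C, the cost factor j, and the threshold values affect SVM training and decision making and are not modeled here.

```python
import math

# Dataset-level RBF settings quoted in the footnote (gamma and C only).
OVERALL_PARAMS = {
    "Fungi": {"gamma": 5, "C": 4},
    "Animal": {"gamma": 5, "C": 2},
    "Plant": {"gamma": 9, "C": 3},
}

def rbf_kernel(x, y, gamma):
    """RBF kernel, assuming the form exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Hypothetical two-dimensional feature vectors (not from the paper).
x, y = [0.10, 0.20], [0.15, 0.10]
for dataset, params in OVERALL_PARAMS.items():
    k = rbf_kernel(x, y, params["gamma"])
    print(f"{dataset}: gamma = {params['gamma']}, K(x, y) = {k:.4f}")
```

A larger γ makes the kernel value fall off faster with distance, so the Plant setting (γ = 9) treats the same pair of vectors as less similar than the Fungi and Animal settings (γ = 5).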
Detailed evaluation of performance on independent datasets of 707 animal and 179 fungal proteins
Animal proteins (707)
| Localizations | BaCelLo* | LOCtree* | PLOC | MultiLoc |
| Cytoplasm | 54.0 | 38.2 | 23.4 | 60.6 |
| Mitochondria | 68.6 | 60.0 | 54.2 | 65.7 |
| Nuclear | 66.1 | 62.2 | 82.1 | 58.4 |
| Extracellular | 85.5 | 84.9 | 42.4 | 68 |
| Overall accuracy | 68.6 | 63.0 | 59.7 | 61.5 |
| Average accuracy | 68.5 | 61.3 | 50.5 | 63.2 |
Fungal proteins (179)
| Localizations | BaCelLo* | LOCtree* | PLOC | MultiLoc |
| Cytoplasm | 56.7 | 46.7 | 13.3 | 20.0 |
| Mitochondria | 100 | 63.6 | 45.5 | 72.7 |
| Nuclear | 66.4 | 66.4 | 87.7 | 54.9 |
| Extracellular | 93.8 | 81.3 | 62.5 | 75.0 |
| Overall accuracy | 69.2 | 64.3 | 70.4 | 52.0 |
| Average accuracy | 79.2 | 64.3 | 52.3 | 55.7 |
*The values are obtained from reference 18
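The final row of each block in the table above is consistent with an unweighted mean of the four per-class accuracies. The sketch below checks this for the BaCelLo column; the interpretation of that row as an average accuracy is inferred from the arithmetic rather than stated in the source, and the variable and function names are hypothetical.

```python
from statistics import mean

# BaCelLo per-class accuracies (percent) copied from the two table blocks.
bacello_first_block = {
    "Cytoplasm": 54.0, "Mitochondria": 68.6, "Nuclear": 66.1, "Extracellular": 85.5,
}
bacello_second_block = {
    "Cytoplasm": 56.7, "Mitochondria": 100.0, "Nuclear": 66.4, "Extracellular": 93.8,
}

def average_accuracy(per_class):
    """Unweighted mean of per-class accuracies; every class counts equally."""
    return mean(list(per_class.values()))

print(f"{average_accuracy(bacello_first_block):.2f}")   # about 68.55; table row shows 68.5
print(f"{average_accuracy(bacello_second_block):.2f}")  # about 79.22; table row shows 79.2
```

Because each class counts equally, this figure can differ noticeably from an overall accuracy computed over all proteins when one class, such as nuclear proteins, dominates the dataset.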