Chenglei Sun, Xing-Ming Zhao, Weihua Tang, Luonan Chen.
Abstract
BACKGROUND: The fungal pathogen Fusarium graminearum (teleomorph Gibberella zeae) is the causal agent of several destructive crop diseases, in which sets of genes usually work in concert to cause disease. To function properly, the F. graminearum proteins inside a cell must be assigned to different compartments, i.e. subcellular localizations. The subcellular localizations of F. graminearum proteins can therefore provide insights into protein functions and the pathogenic mechanisms of this destructive fungus. Unfortunately, no subcellular localization information is currently available for F. graminearum proteins. Because laboratory experiments are expensive and time-consuming, computational approaches provide an alternative way to predict F. graminearum protein subcellular localizations.
Year: 2010 PMID: 20840726 PMCID: PMC2982686 DOI: 10.1186/1752-0509-4-S2-S12
Source DB: PubMed Journal: BMC Syst Biol ISSN: 1752-0509
Distributions of the fungal proteins with known subcellular localizations, where only localizations with more than 30 annotations are shown for clarity.
| Localization | Proteins in UniProtKB (a) | Proteins_40 (b) |
|---|---|---|
| Extracellular | 272 | 148 |
| Cytoplasm | 1357 | 916 |
| ER | 895 | 561 |
| Golgi apparatus | 276 | 150 |
| Nucleus | 1538 | 1354 |
| Mitochondrion | 1719 | 949 |
| Peroxisome | 120 | 82 |
| Endosome | 105 | 54 |
| Vacuole | 315 | 192 |
| Cell membrane | 351 | 186 |
| Total | 6948 | 4592 |
a Number of proteins with unique localization found in UniProtKB.
b Curated data set with pairwise sequence identity <40%.
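
The Proteins_40 column is strongly skewed (1354 nuclear proteins against 54 endosomal ones), which is the kind of imbalance the re-balancing step in Figure 4 addresses. The sketch below quantifies that skew as negatives per positive for each localization. It assumes a one-vs-rest framing in which every protein outside a localization counts as a negative; that framing and the helper name `imbalance_ratios` are illustrative assumptions, not details stated in this record.

```python
# Proteins_40 counts copied from the table above (non-redundant data set).
proteins_40 = {
    "Extracellular": 148,
    "Cytoplasm": 916,
    "ER": 561,
    "Golgi apparatus": 150,
    "Nucleus": 1354,
    "Mitochondrion": 949,
    "Peroxisome": 82,
    "Endosome": 54,
    "Vacuole": 192,
    "Cell membrane": 186,
}


def imbalance_ratios(counts):
    """Return negatives-per-positive for each localization.

    Assumption (not from the source): one-vs-rest tasks, so the negatives
    for a localization are all proteins annotated with any other one.
    """
    total = sum(counts.values())
    return {loc: (total - n) / n for loc, n in counts.items()}


ratios = imbalance_ratios(proteins_40)
# Print the most imbalanced localizations first.
for loc, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{loc:16s} {ratio:6.1f} negatives per positive")
```

Under this assumption the ratio ranges from roughly 2 negatives per positive (Nucleus) to more than 80 (Endosome), so a classifier trained on the raw data could score well on the rare classes by predicting "negative" almost everywhere.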
The 10-fold cross-validation results by SVM classifiers based on different features and those by ensemble classifiers for 10 locations with respect to AUC scores, where the ensemble classifiers are the optimal combinations of different SVM classifiers trained on features without strikethrough. The numbers with strikethrough indicate the corresponding classifier was not used in the ensemble classifier. The numbers within the brackets denote the corresponding Gap.
| Localization | threAA | N-term | C-term | Gap1 | Gap2 | Ensemble |
|---|---|---|---|---|---|---|
| Extracellular | 0.812 | 0.943(7) | 0.950 | |||
| Cytoplasm | 0.738 | |||||
| ER | 0.827 | |||||
| Golgi apparatus | 0.729(5) | 0.848 | ||||
| Nucleus | 0.721 | |||||
| Mitochondrion | 0.665 | 0.802(13) | 0.833 | |||
| Peroxisome | 0.882 | |||||
| Endosome | 0.895 | |||||
| Vacuole | 0.746(9) | 0.820 | ||||
| Cell membrane | 0.837 |
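
The AUC scores in this table can be read as the probability that a randomly chosen positive protein receives a higher score than a randomly chosen negative one. The sketch below computes AUC that way and then searches classifier subsets for the combination with the best AUC. The combination rule (averaging decision scores), the exhaustive search, the function names, and the toy scores are all assumptions made for illustration; this record does not state how the optimal combinations were actually found.

```python
from itertools import combinations


def auc_score(labels, scores):
    """Rank-based AUC: fraction of (positive, negative) pairs where the
    positive scores higher; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_ensemble(labels, score_table):
    """Try every non-empty subset of classifiers and keep the one whose
    averaged scores give the highest AUC (assumed combination rule)."""
    names = sorted(score_table)
    best_subset, best_auc = None, -1.0
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            averaged = [
                sum(score_table[name][i] for name in subset) / k
                for i in range(len(labels))
            ]
            auc = auc_score(labels, averaged)
            if auc > best_auc:
                best_subset, best_auc = subset, auc
    return best_subset, best_auc


# Hypothetical toy scores for four proteins (two positives, two negatives).
toy_labels = [1, 1, 0, 0]
toy_scores = {
    "threAA": [0.9, 0.2, 0.3, 0.1],
    "N-term": [0.4, 0.8, 0.5, 0.2],
}
subset, auc = best_ensemble(toy_labels, toy_scores)
print(subset, auc)
```

In the toy data each classifier alone misranks one pair, while their average ranks every pair correctly, which is the effect an ensemble aims for. In practice the subset would be chosen on data separate from the folds used to report the final AUC; otherwise the reported score is optimistic.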
Figure 1. The comparison of performance of SVM classifiers without feature selection against those with feature selection and balancing, where the results were obtained with 10-fold cross-validation.
Comparison of FGsub with PLOC and PSLDoc based on 10-fold cross-validation on the fungal data set with respect to AUC scores.
| Localization | PLOC | PSLDoc | FGsub |
|---|---|---|---|
| Extracellular | 0.9220 | 0.9140 | |
| Cytoplasm | 0.6572 | 0.6668 | |
| ER | 0.7813 | 0.8083 | |
| Golgi apparatus | 0.7314 | 0.7090 | |
| Nucleus | 0.7088 | 0.721 | |
| Mitochondrion | 0.7972 | 0.8069 | |
| Peroxisome | 0.6335 | 0.6684 | |
| Endosome | 0.8031 | 0.7993 | |
| Vacuole | 0.7141 | 0.7476 | |
| Cell membrane | 0.7588 | 0.7683 | |
Distributions of the predicted subcellular localizations for 12786 F. graminearum proteins based on ensemble classifier and BLAST.
| Localization | Ensemble classifier | BLAST | Ensemble classifier+BLAST |
|---|---|---|---|
| Extracellular | 3105 | 262 | 3163 |
| Cytoplasm | 4782 | 2050 | 5699 |
| ER | 4016 | 520 | 4166 |
| Golgi apparatus | 1773 | 246 | 1975 |
| Nucleus | 1381 | 1858 | 2868 |
| Mitochondrion | 4115 | 952 | 4484 |
| Peroxisome | 2202 | 154 | 2315 |
| Endosome | 1075 | 52 | 1114 |
| Vacuole | 3377 | 262 | 3505 |
| Cell membrane | 5035 | 346 | 5130 |
| Bud | | 11 | 11 |
| Bud neck | | 36 | 36 |
| Bud tip | | 6 | 6 |
| Lipid-anchor | | 61 | 61 |
| Centromere | | 23 | 23 |
| Kinetochore | | 28 | 28 |
| Telomere | | 19 | 19 |
| Cytoskeleton | | 88 | 88 |
| Spindle | | 48 | 48 |
| Prospore membrane | | 4 | 4 |
| Peripheral membrane | | 280 | 280 |
| Multi-pass membrane | | 968 | 968 |
| Single-pass membrane | | 229 | 229 |
| Preautophagosomal structure membrane | | 4 | 4 |
| Total | 12532 | 4897 | 12786 |
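
For the ten localizations predicted by both methods, the combined count is smaller than the sum of the two individual counts, which suggests that some proteins receive the same label from both. The sketch below derives that implied overlap by inclusion-exclusion. It assumes the "Ensemble classifier+BLAST" column is the union of the two prediction sets; that reading and the helper name `implied_overlap` are assumptions, not statements from this record.

```python
# (ensemble, blast, combined) counts copied from the table above.
predictions = {
    "Extracellular": (3105, 262, 3163),
    "Cytoplasm": (4782, 2050, 5699),
    "ER": (4016, 520, 4166),
    "Golgi apparatus": (1773, 246, 1975),
    "Nucleus": (1381, 1858, 2868),
    "Mitochondrion": (4115, 952, 4484),
    "Peroxisome": (2202, 154, 2315),
    "Endosome": (1075, 52, 1114),
    "Vacuole": (3377, 262, 3505),
    "Cell membrane": (5035, 346, 5130),
}


def implied_overlap(ensemble, blast, combined):
    """Inclusion-exclusion: |A and B| = |A| + |B| - |A or B|.

    Assumes `combined` is the union of the two prediction sets.
    """
    return ensemble + blast - combined


overlaps = {loc: implied_overlap(*row) for loc, row in predictions.items()}
for loc, shared in overlaps.items():
    blast = predictions[loc][1]
    print(f"{loc:16s} {shared:5d} shared ({shared / blast:.0%} of BLAST hits)")
```

Every implied overlap is non-negative and no larger than the BLAST count, so the table is at least consistent with the union reading.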
Figure 2. The distribution of proteins among the top 10 subcellular localizations.
Figure 3. Distributions of functional similarity for protein pairs located in the same subcellular localization and in different subcellular localizations, respectively.
Figure 4. The schematic flowchart of the proposed method for re-balancing the imbalanced data set.
Figure 5. The schematic flowchart of predicting query proteins by ensemble classifier.