Jie Liu, Shuli Kang, Chuanning Tang, Lynda B M Ellis, Tongbin Li.
Abstract
Meta-prediction seeks to harness the combined strengths of multiple prediction programs, with the aim of surpassing the performance of every existing predictor in a defined problem domain. We investigated meta-prediction for the four-compartment eukaryotic subcellular localization problem. We compiled an unbiased subcellular localization dataset of 1693 nuclear, cytoplasmic, mitochondrial and extracellular animal proteins from Swiss-Prot 50.2. Using this dataset, we assessed the performance of 12 predictors from eight independent subcellular localization prediction programs: ELSPred, LOCtree, PLOC, Proteome Analyst, PSORT, PSORT II, SubLoc and WoLF PSORT. The Gorodkin correlation coefficient (GCC) was one of the performance measures. Proteome Analyst was the best individual predictor tested in this four-compartment problem, with GCC = 0.811. A reduced voting strategy that eliminates six of the 12 predictors yields a meta-predictor (RAW-RAG-6) with GCC = 0.856, substantially better than all tested individual predictors (P = 8.2 × 10⁻⁶, Fisher's Z-transformation test). The improvement persists when the meta-predictor is tested with data not used in its development. This and similar voting strategies, when properly applied, are expected to produce meta-predictors with outstanding performance in other life sciences problem domains.
Year: 2007 PMID: 17670799 PMCID: PMC1976432 DOI: 10.1093/nar/gkm562
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Compiling the MetaSCL06 dataset. Nuc: nuclear; Cyt: cytoplasmic; Mit: mitochondrial; Ext: extracellular.
Summary of the 12 element predictors
| Element predictor | Reference | URL | Other subcellular compartments predicted |
|---|---|---|---|
| ELSpred_comp | | | None |
| ELSpred_physico | | | |
| ELSpred_dipeptide | | | |
| ELSpred_EuPSI | | | |
| ELSpred_hybrid | | | |
| LOCtree | | | Organelles |
| PLOC | | | Cytoskeleton, Endoplasmic reticulum, Golgi apparatus, Lysosome, Peroxisome, Plasma membrane |
| Proteome Analyst | | | Endoplasmic reticulum, Golgi apparatus, Lysosome, Peroxisome, Plasma membrane |
| PSORT | | | Endoplasmic reticulum, Golgi apparatus, Lysosome, Microbody, Plasma membrane |
| PSORT II | | | Cytoskeleton, Endoplasmic reticulum, Golgi apparatus, Lysosome, Plasma membrane, Peroxisome, Secretory vesicles |
| SubLoc | | | None |
| WoLF PSORT | | | Cytoskeleton, Endoplasmic reticulum, Golgi apparatus, Lysosome, Peroxisome, Plasma membrane |

a For data features and classification methods, see text.
b Other subcellular compartments besides the four compartments focused on in this study: nuclear, cytoplasmic, mitochondrial and extracellular.
Predicting performance of element predictors using the MetaSCL06 dataset
| Element predictor | Predictions made | Correct predictions | Accuracy | Relative accuracy | GCC | Weights for RAW-RAG-6 |
|---|---|---|---|---|---|---|
| ELSpred_comp | 1693 | 951 | 0.562 | 0.562 | 0.359 | |
| ELSpred_physicochemical | 1693 | 1028 | 0.607 | 0.607 | 0.409 | 0.607 |
| ELSpred_dipeptide | 1693 | 750 | 0.443 | 0.443 | 0.215 | |
| ELSpred_EuPSI | 624 | 549 | 0.324 | 0.880 | 0.458 | 0.880 |
| ELSpred_hybrid | 1693 | 659 | 0.389 | 0.389 | 0.179 | |
| LOCtree | 1649 | 1263 | 0.746 | 0.766 | 0.663 | 0.766 |
| PLOC | 1692 | 1014.5 | 0.599 | 0.600 | 0.465 | |
| Proteome Analyst | 1523 | 1390 | 0.821 | 0.913 | 0.811 | 0.913 |
| PSORT | 1692 | 916 | 0.541 | 0.541 | 0.438 | |
| PSORT II | 1687 | 1013 | 0.598 | 0.600 | 0.464 | 0.601 |
| SubLoc | 1687 | 973 | 0.575 | 0.577 | 0.409 | |
| WoLF PSORT | 1687 | 1240.5 | 0.733 | 0.735 | 0.635 | 0.735 |
Comparison based on the 1693 proteins in the MetaSCL06 dataset.
a The six element predictors used and, for each, the weight it was given in the high-scoring RAW-RAG-6 meta-predictor (see text).
b PLOC and WoLF PSORT output the two most likely subcellular compartments with equal scores for some proteins. If one of them matches the true compartment label, the prediction is deemed 'half correct' and counted as 0.5 of a correct prediction.
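The relationship between the counts, the relative accuracy and the RAW-RAG-6 weights can be illustrated with a minimal Python sketch. It reads the first two numeric columns of the table above as predictions made and correct predictions (an interpretation based on the ratios), and recomputes each weight as correct predictions divided by predictions made, which approximately matches the listed weights. The function names, the handling of missing predictions and the alphabetical tie-break are assumptions for illustration, not details taken from the paper.

```python
from collections import defaultdict

# (predictions made, correct predictions) for the six weighted predictors,
# copied from the table above. Half-correct predictions count as 0.5.
COUNTS = {
    "ELSpred_physicochemical": (1693, 1028),
    "ELSpred_EuPSI": (624, 549),
    "LOCtree": (1649, 1263),
    "Proteome Analyst": (1523, 1390),
    "PSORT II": (1687, 1013),
    "WoLF PSORT": (1687, 1240.5),
}


def relative_accuracy(made, correct):
    """Fraction of correct predictions among the predictions actually made."""
    return correct / made


# Relative-accuracy weights (RAW), one per predictor.
WEIGHTS = {name: relative_accuracy(*counts) for name, counts in COUNTS.items()}


def weighted_vote(predictions, weights):
    """Return the compartment with the largest summed weight.

    predictions maps predictor name -> compartment label, or None when the
    predictor made no prediction (assumed here to contribute no vote).
    """
    scores = defaultdict(float)
    for name, compartment in predictions.items():
        if compartment is not None:
            scores[compartment] += weights[name]
    if not scores:
        return None
    # Ties are broken alphabetically; this rule is an assumption.
    return max(sorted(scores), key=scores.get)


# Hypothetical protein: three predictors say Nuc, two say Cyt, one abstains.
example = {
    "ELSpred_physicochemical": "Nuc",
    "ELSpred_EuPSI": None,
    "LOCtree": "Cyt",
    "Proteome Analyst": "Cyt",
    "PSORT II": "Nuc",
    "WoLF PSORT": "Nuc",
}
print(weighted_vote(example, WEIGHTS))
```

In this hypothetical case the three lower-weighted predictors outvote the two higher-weighted ones because their summed weight is larger.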
Predicting performance of unreduced voting meta-predictors as compared with that of element predictors
| Predictor | Accuracy | GCC |
|---|---|---|
| Average of all element predictors | 0.578 | 0.459 |
| Best element predictor (Proteome Analyst) | 0.821 | 0.811 |
| UV-UR | 0.754 | 0.651 |
| AW-UR | 0.808 | 0.724 |
| RAW-UR | 0.819 | 0.740 |
| GW-UR | 0.838 | 0.767 |
Comparison based on the 1693 proteins in the MetaSCL06 dataset. UV: unweighted voting; AW: accuracy weighted voting; RAW: relative accuracy weighted voting; GW: GCC weighted voting. UR stands for ‘unreduced’.
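For readers unfamiliar with the GCC column, the following Python sketch computes the Gorodkin K-category correlation coefficient from a confusion matrix, using its standard definition (for two classes it reduces to the Matthews correlation coefficient). How the study treated proteins for which a predictor returned no prediction is not stated here, so the sketch simply assumes every protein has both a true and a predicted class. The function name and the example matrix are hypothetical.

```python
import math


def gorodkin_cc(confusion):
    """K-category correlation coefficient from a square confusion matrix.

    confusion[i][j] is the number of items whose true class is i and whose
    predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    true_counts = [sum(confusion[i]) for i in range(k)]
    pred_counts = [sum(confusion[i][j] for i in range(k)) for j in range(k)]

    numerator = correct * total - sum(
        t * p for t, p in zip(true_counts, pred_counts)
    )
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    # Convention assumed here: return 0.0 when the coefficient is undefined.
    return numerator / denominator if denominator else 0.0


# Hypothetical two-class matrix; the result equals the ordinary MCC.
print(gorodkin_cc([[3, 1], [2, 4]]))
```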
Figure 2. Performance of reduced voting meta-predictors (accuracy on the left, GCC on the right) plotted against the number of excluded element predictors. (A) Relative accuracy weighted voting (RAW) combined with three reduction methods: accuracy-guided reduction (AG), relative accuracy-guided reduction (RAG) and GCC-guided reduction (GG), giving rise to three series of meta-predictors. (B) Four voting schemes, unweighted voting (UV), accuracy-weighted voting (AW), relative accuracy-weighted voting (RAW) and GCC-weighted voting (GW), combined with the RAG reduction method, giving four series of meta-predictors. All curves are roughly biphasic: a rising phase followed by a declining phase. Dotted lines indicate the performance of the best element predictor (Proteome Analyst, accuracy: 0.821, GCC: 0.811).
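The reduction step can be sketched in Python under one assumption: that "guided reduction" means ranking the element predictors by the guiding measure and excluding the lowest-ranked ones first. The counts are taken from the element-predictor performance table (predictions made, correct predictions); the function name is hypothetical. With relative accuracy as the guide and six exclusions, this ranking leaves the same six predictors that carry RAW-RAG-6 weights in that table.

```python
# (predictions made, correct predictions) per element predictor, taken from
# the MetaSCL06 performance table. Half-correct predictions count as 0.5.
COUNTS = {
    "ELSpred_comp": (1693, 951),
    "ELSpred_physicochemical": (1693, 1028),
    "ELSpred_dipeptide": (1693, 750),
    "ELSpred_EuPSI": (624, 549),
    "ELSpred_hybrid": (1693, 659),
    "LOCtree": (1649, 1263),
    "PLOC": (1692, 1014.5),
    "Proteome Analyst": (1523, 1390),
    "PSORT": (1692, 916),
    "PSORT II": (1687, 1013),
    "SubLoc": (1687, 973),
    "WoLF PSORT": (1687, 1240.5),
}


def guided_reduction(guide_scores, n_excluded):
    """Drop the n_excluded predictors with the lowest guiding score.

    Assumption: exclusion proceeds strictly from the lowest score upward.
    """
    ranked = sorted(guide_scores, key=guide_scores.get)  # ascending order
    return ranked[n_excluded:]


# Relative accuracy as the guiding measure (RAG).
relative_accuracy = {
    name: correct / made for name, (made, correct) in COUNTS.items()
}

kept = guided_reduction(relative_accuracy, 6)
print(sorted(kept))
```

Sweeping `n_excluded` from 0 to 11 and scoring the resulting meta-predictor at each step would give one curve of the kind shown in the figure.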
Predicting performance of reduced voting meta-predictors
| Voting scheme | AG | RAG | GG |
|---|---|---|---|
| Accuracy | |||
| UV | 0.892 (8) | 0.892 (7) | 0.863 (9) |
| AW | 0.898 (7) | 0.899 (6) | 0.882 (8) |
| RAW | 0.897 (8) | | 0.892 (8) |
| GW | 0.897 (7) | 0.899 (6) | 0.883 (8) |
| GCC | |||
| UV | 0.841 (0.003) | 0.841 (0.003) | 0.816 (0.33) |
| AW | 0.846 (0.00057) | 0.851 (8 × 10⁻⁵) | 0.829 (0.055) |
| RAW | 0.849 (0.00018) | **0.856 (8.2 × 10⁻⁶)** | 0.842 (0.0022) |
| GW | 0.848 (0.00027) | 0.852 (5.2 × 10⁻⁵) | 0.830 (0.045) |
Comparison based on the 1693 proteins in the MetaSCL06 dataset. The best reduced voting meta-predictor (RAW-RAG-6) is shown in bold. UV: unweighted voting; AW: accuracy weighted voting; RAW: relative accuracy weighted voting; GW: GCC weighted voting; AG: accuracy-guided reduction; RAG: relative accuracy-guided reduction; GG: GCC-guided reduction. The number of excluded element predictors is shown in parentheses after each accuracy value. P-values of GCC (Fisher's Z-transformation test) are shown in parentheses after each GCC value.
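The P-values can be approximated with a short Python sketch of a Fisher Z-transformation test. The exact test variant is not spelled out here, so the sketch assumes the common form: each coefficient is transformed with atanh, the difference is divided by a standard error of sqrt(1/(n1 − 3) + 1/(n2 − 3)), and a one-sided upper-tail normal probability is reported. It also treats the two coefficients as independent, even though both are computed on the same 1693 proteins. Under those assumptions, comparing GCC = 0.856 with the best element predictor's GCC = 0.811 gives a value close to the reported 8.2 × 10⁻⁶. The function name is hypothetical.

```python
import math


def fisher_z_p_value(r_new, r_ref, n_new, n_ref):
    """One-sided P-value that r_new exceeds r_ref (Fisher Z-transformation).

    Assumes independent samples and a normal approximation.
    """
    z_new = math.atanh(r_new)  # Fisher transformation of each coefficient
    z_ref = math.atanh(r_ref)
    standard_error = math.sqrt(1.0 / (n_new - 3) + 1.0 / (n_ref - 3))
    z = (z_new - z_ref) / standard_error
    # Upper-tail probability of the standard normal distribution.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Meta-predictor GCC versus Proteome Analyst GCC on the 1693-protein dataset.
p_raw_rag = fisher_z_p_value(0.856, 0.811, 1693, 1693)
p_gw_rag = fisher_z_p_value(0.852, 0.811, 1693, 1693)
print(f"{p_raw_rag:.2e} {p_gw_rag:.2e}")
```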
Predicting performance of element predictors and RAW-RAG-6 using the MetaSCL07 dataset
| Element predictor | Accuracy | GCC |
|---|---|---|
| ELSpred_comp | 0.230 | 0.145 |
| ELSpred_physicochemical | 0.370 | 0.081 |
| ELSpred_dipeptide | 0.434 | 0.274 |
| ELSpred_EuPSI | 0.252 | 0.424 |
| ELSpred_hybrid | 0.332 | 0.136 |
| LOCtree | 0.829 | 0.757 |
| PLOC | 0.446 | 0.324 |
| Proteome Analyst | 0.775 | 0.783 |
| PSORT | 0.494 | 0.427 |
| PSORT II | 0.459 | 0.348 |
| SubLoc | 0.451 | 0.230 |
| WoLF PSORT | 0.654 | 0.565 |
Comparison based on the 579 proteins in the MetaSCL07 dataset. P-values of GCC (Fisher's Z-transformation test) are shown after the GCC value of RAW-RAG-6 (enclosed in parentheses).
Predicting performance of element predictors and RAW-RAG-6 in two-class predictions for the 4 subcellular compartments
| Predictor | Sensitivity | Specificity | Accuracy | MCC |
|---|---|---|---|---|
| Nuclear | ||||
| ELSpred_comp | 0.776 | 0.620 | 0.676 | 0.380 |
| ELSpred_physicochemical | 0.824 | 0.681 | 0.732 | 0.484 |
| ELSpred_dipeptide | 0.901 | 0.369 | 0.560 | 0.291 |
| ELSpred_EuPSI | 0.545 | 0.977 | 0.822 | 0.615 |
| ELSpred_hybrid | 0.572 | 0.444 | 0.490 | 0.015 |
| LOCtree | 0.728 | 0.935 | 0.861 | 0.692 |
| PLOC | 0.890 | 0.622 | 0.718 | 0.495 |
| Proteome Analyst | 0.781 | 0.980 | 0.908 | 0.801 |
| PSORT | 0.611 | 0.944 | 0.825 | 0.611 |
| PSORT II | 0.774 | 0.785 | 0.781 | 0.544 |
| SubLoc | 0.799 | 0.757 | 0.772 | 0.537 |
| WoLF PSORT | 0.768 | 0.931 | 0.872 | 0.719 |
| RAW-RAG-6 | 0.923 | 0.932 | 0.929 | |
| Cytoplasmic | ||||
| ELSpred_comp | 0.515 | 0.916 | 0.875 | 0.391 |
| ELSpred_physicochemical | 0.075 | 0.932 | 0.845 | 0.009 |
| ELSpred_dipeptide | 0.243 | 0.897 | 0.831 | 0.132 |
| ELSpred_EuPSI | 0.434 | 0.972 | 0.917 | 0.485 |
| ELSpred_hybrid | 0.705 | 0.753 | 0.748 | 0.305 |
| LOCtree | 0.607 | 0.886 | 0.857 | 0.402 |
| PLOC | 0.523 | 0.950 | 0.906 | 0.480 |
| Proteome Analyst | 0.728 | 0.944 | 0.922 | 0.617 |
| PSORT | 0.347 | 0.835 | 0.785 | 0.142 |
| PSORT II | 0.538 | 0.842 | 0.811 | 0.289 |
| SubLoc | 0.561 | 0.840 | 0.811 | 0.302 |
| WoLF PSORT | 0.517 | 0.921 | 0.880 | 0.404 |
| RAW-RAG-6 | 0.699 | 0.955 | 0.929 | |
| Mitochondrial | ||||
| ELSpred_comp | 0.189 | 0.971 | 0.868 | 0.247 |
| ELSpred_physicochemical | 0.284 | 0.971 | 0.881 | 0.357 |
| ELSpred_dipeptide | 0.099 | 0.969 | 0.855 | 0.117 |
| ELSpred_EuPSI | 0.144 | 0.997 | 0.885 | 0.331 |
| ELSpred_hybrid | 0.225 | 0.975 | 0.877 | 0.306 |
| LOCtree | 0.761 | 0.950 | 0.926 | 0.686 |
| PLOC | 0.288 | 0.967 | 0.878 | 0.346 |
| Proteome Analyst | 0.730 | 1.000 | 0.965 | 0.837 |
| PSORT | 0.306 | 0.963 | 0.877 | 0.352 |
| PSORT II | 0.243 | 0.967 | 0.872 | 0.296 |
| SubLoc | 0.329 | 0.936 | 0.857 | 0.300 |
| WoLF PSORT | 0.369 | 0.970 | 0.892 | 0.438 |
| RAW-RAG-6 | 0.793 | 0.996 | 0.969 | |
| Extracellular | ||||
| ELSpred_comp | 0.505 | 0.841 | 0.704 | 0.371 |
| ELSpred_physicochemical | 0.654 | 0.826 | 0.756 | 0.489 |
| ELSpred_dipeptide | 0.201 | 0.944 | 0.641 | 0.224 |
| ELSpred_EuPSI | 0.161 | 0.996 | 0.655 | 0.306 |
| ELSpred_hybrid | 0.203 | 0.982 | 0.664 | 0.312 |
| LOCtree | 0.792 | 0.968 | 0.896 | 0.787 |
| PLOC | 0.462 | 0.979 | 0.768 | 0.540 |
| Proteome Analyst | 0.909 | 0.988 | 0.956 | 0.909 |
| PSORT | 0.604 | 0.986 | 0.830 | 0.665 |
| PSORT II | 0.573 | 0.991 | 0.820 | 0.650 |
| SubLoc | 0.460 | 0.888 | 0.714 | 0.393 |
| WoLF PSORT | 0.873 | 0.957 | 0.923 | 0.840 |
| RAW-RAG-6 | 0.970 | 0.993 | 0.983 |
Comparison made based on datasets derived from the MetaSCL06 dataset. P-values of MCC (Fisher's Z-transformation test) are shown together with the MCC values (enclosed in parentheses).
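The four two-class measures in the table follow from a 2 × 2 confusion matrix in which one compartment is the positive class and the other three are pooled as negatives. The Python sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def two_class_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, accuracy and MCC from confusion counts.

    tp/fn: positives predicted correctly/incorrectly;
    fp/tn: negatives predicted incorrectly/correctly.
    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is 0.0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts for one compartment versus the rest.
metrics = two_class_metrics(tp=3, fn=1, fp=2, tn=4)
print(metrics)
```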