Sebastian Briesemeister, Jörg Rahnenführer, Oliver Kohlbacher.
Abstract
MOTIVATION: Protein subcellular localization is pivotal in understanding a protein's function. Computational prediction of subcellular localization has become a viable alternative to experimental approaches. While current machine learning-based methods yield good prediction accuracy, most of them suffer from two key problems: a lack of interpretability and difficulty handling proteins with multiple locations.
Year: 2010 PMID: 20299325 PMCID: PMC2859129 DOI: 10.1093/bioinformatics/btq115
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Performance comparison using two IDSs
| Dataset | YLoc-LowRes | YLoc-HighRes | YLoc+ | MultiLoc2-LowRes | MultiLoc2-HighRes | BaCelLo | LOCTree | WoLF PSORT | Euk-mPloc | KnowPred |
|---|---|---|---|---|---|---|---|---|---|---|
| B Animals | 0.75 (0.79) | 0.69 (0.74) | 0.67 (0.58) | 0.71 (0.68) | 0.66 (0.64) | 0.58 (0.62) | 0.67 (0.70) | 0.54 (0.61) | 0.69 (0.75) | |
| B Fungi | 0.51 (0.56) | 0.51 (0.48) | 0.58 (0.53) | 0.60 (0.57) | 0.43 (0.47) | 0.51 (0.50) | 0.56 (0.60) | 0.56 ( | ||
| B Plants | 0.58 (0.71) | 0.54 (0.58) | 0.49 (0.53) | 0.54 (0.62) | 0.56 (0.69) | 0.58 (0.70) | 0.46 (0.57) | 0.37 (0.46) | 0.23 (0.29) | |
| H Animals | – (–) | 0.34 (0.56) | 0.37 (0.53) | – (–) | – (–) | – (–) | 0.18 (0.36) | 0.24 (0.27) | 0.37 (0.49) |
Performance of the YLoc predictors and other state-of-the-art predictors on the BaCelLo (B) IDS and the Höglund (H) IDS, reported as F1 with ACC in brackets. The performance of YLoc+, WoLF PSORT, Euk-mPloc and KnowPred was measured using the generalized F1 and ACC. The highest-ranking method regarding each measure is highlighted in bold. Note that the WoLF PSORT results differ slightly from those obtained in Blum et al. (2009) due to some changes in the underlying dataset. Also note that KnowPred does not predict chloroplasts.
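As a rough illustration of the two measures reported above, the sketch below computes ACC and an F1 score for single-label predictions. It assumes that F1 is averaged with equal weight over the locations (macro averaging); this section does not spell out the averaging scheme, and the generalized multi-label variants used for YLoc+, WoLF PSORT, Euk-mPloc and KnowPred are not reproduced here. The function names and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of proteins whose predicted location equals the annotated one."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-location F1 scores (assumed averaging scheme)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); define it as 0 when the label never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data, not taken from any dataset in the paper
y_true = ["Nu", "Nu", "Cy", "Mi", "SP", "SP"]
y_pred = ["Nu", "Cy", "Cy", "Mi", "SP", "Nu"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"F1 = {f1:.2f} (ACC = {acc:.2f})")
```

Macro averaging gives rare locations the same weight as frequent ones, which is one reason F1 and ACC can rank predictors differently on the same dataset.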
Performance of YLoc using the BaCelLo animal IDS for different minimum confidence levels
| Predictor | Measure | 0.00 | 0.20 | 0.40 | 0.60 | 0.80 | 0.90 |
|---|---|---|---|---|---|---|---|
| YLoc-LowRes | F1 | 0.75 | 0.76 | 0.78 | 0.80 | 0.84 | 0.95 |
| | ACC | 0.79 | 0.79 | 0.81 | 0.86 | 0.91 | 0.93 |
| | No. Inst. | 576 | 467 | 395 | 299 | 189 | 118 |
| YLoc-HighRes | F1 | 0.69 | 0.74 | 0.76 | 0.76 | 0.77 | 0.77 |
| | ACC | 0.74 | 0.78 | 0.80 | 0.82 | 0.83 | 0.84 |
| | No. Inst. | 576 | 507 | 470 | 428 | 391 | 354 |
| YLoc+ | F1 | 0.67 | 0.69 | 0.72 | 0.77 | 0.76 | 0.81 |
| | ACC | 0.58 | 0.60 | 0.62 | 0.65 | 0.65 | 0.69 |
| | No. Inst. | 576 | 494 | 423 | 324 | 219 | 142 |
For each minimum confidence score the prediction performance is given using F1 and ACC as well as the number of instances that can be predicted with at least this score. The performance of YLoc+ was measured using the generalized F1 and ACC.
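The trade-off described in this caption, higher accuracy on fewer proteins, can be sketched as a simple filter over predictions. The function `evaluate_at_confidence` and the toy `records` are hypothetical and only illustrate the idea; how YLoc derives its confidence score is not covered here. The coverage figures at the end use the YLoc-LowRes instance counts from the table above.

```python
def evaluate_at_confidence(records, min_conf):
    """Keep predictions with confidence >= min_conf; return (count, accuracy).

    records: iterable of (true_location, predicted_location, confidence).
    Accuracy is None when no prediction reaches the threshold.
    """
    kept = [(t, p) for t, p, c in records if c >= min_conf]
    if not kept:
        return 0, None
    acc = sum(t == p for t, p in kept) / len(kept)
    return len(kept), acc


# Hypothetical toy predictions, not data from the paper
records = [
    ("Nu", "Nu", 0.95),
    ("Cy", "Nu", 0.35),
    ("SP", "SP", 0.85),
    ("Mi", "Cy", 0.55),
    ("Cy", "Cy", 0.65),
]

for threshold in (0.0, 0.6):
    n, acc = evaluate_at_confidence(records, threshold)
    print(f"min. confidence {threshold:.2f}: {n} instances, ACC = {acc:.2f}")

# Coverage of YLoc-LowRes on the BaCelLo animal IDS, using the
# "No. Inst." row of the table (576 proteins in total).
lowres_counts = {0.00: 576, 0.20: 467, 0.40: 395, 0.60: 299, 0.80: 189, 0.90: 118}
coverage = {conf: n / 576 for conf, n in lowres_counts.items()}
print(f"coverage at 0.90: {coverage[0.90]:.1%}")
```

Read this way, the table says that YLoc-LowRes reaches a confidence of at least 0.90 for roughly a fifth of the animal proteins.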
Performance comparison using the DBMLoc dataset
| Measures | YLoc+ | Euk-mPloc | WoLF PSORT | KnowPred |
|---|---|---|---|---|
| Single-label | 0.04 (0.05) | 0.03 (0.05) | 0.28 ( | |
| Multi-label | 0.44 (0.41) | 0.52 (0.43) | 0.66 (0.63) |
The performance was measured using F1 and ACC (in brackets). For YLoc+ and WoLF PSORT, only the best-performing version is shown. The highest-ranking method regarding each measure is highlighted in bold.
Fig. 1. The distribution of proteins regarding the secretory pathway signal (SPS) feature of YLoc-LowRes (animal version) is shown. For every discretization interval, the interval borders and an interpretation are given.
YLoc output of an example prediction
| Sequence feature | DS | Nu | Cy | Mi | SP |
|---|---|---|---|---|---|
| Strong secretory pathway sorting signal (high hydrophobic autocorrelation within first 20 amino acids) | 5.72 | 0.01 | 0.00 | 0.02 | 0.69 |
| Barely charged (low overall charge autocorrelation) | 2.89 | 0.10 | 0.16 | 0.02 | 0.28 |
| No mono NLS sorting signal | 2.89 | 0.04 | 0.12 | 0.02 | 0.26 |
| Strong putative mitochondrial or secretory pathway sorting signal [large weighted sum of amino acids typical for Mi and SP (Nakai and Kanehisa)] | 1.68 | 0.58 | 0.62 | 0.16 | 0.84 |
| Very hydrophobic protein [high pseudo-amino acid count of hydrophobic amino acids (CITVWY)] | 2.32 | 0.08 | 0.13 | 0.04 | 0.36 |
| Very hydrophobic N-terminus (high pseudo-amino acid count of very hydrophobic residues within the first 90 amino acids) | 2.06 | 0.09 | 0.05 | 0.08 | 0.41 |
The six most discriminating protein features are displayed in order of their absolute DS. The features are manually annotated with a biological property. A more detailed description of each feature is given in italics. For each location the ratio of proteins having this particular feature is shown.
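To see how the per-location ratios in this table point towards one location, the sketch below multiplies the six ratios in each location column (Nu, Cy, Mi, SP) and normalizes the products. This is a naive Bayes-style combination with equal priors and independent features, assumed here purely for illustration; it is not the DS computation, whose formula is not given in this section. The names `FEATURE_RATIOS` and `combine_ratios` are hypothetical.

```python
from math import prod

LOCATIONS = ["Nu", "Cy", "Mi", "SP"]

# Ratios copied from the table, one row per sequence feature,
# columns in the order Nu, Cy, Mi, SP.
FEATURE_RATIOS = [
    [0.01, 0.00, 0.02, 0.69],  # strong secretory pathway sorting signal
    [0.10, 0.16, 0.02, 0.28],  # barely charged
    [0.04, 0.12, 0.02, 0.26],  # no mono NLS sorting signal
    [0.58, 0.62, 0.16, 0.84],  # putative mitochondrial or SP sorting signal
    [0.08, 0.13, 0.04, 0.36],  # very hydrophobic protein
    [0.09, 0.05, 0.08, 0.41],  # very hydrophobic N-terminus
]


def combine_ratios(rows, locations):
    """Multiply the ratios per location and normalize them to sum to 1.

    Assumes equal priors and independent features (illustrative only).
    """
    scores = {loc: prod(row[i] for row in rows) for i, loc in enumerate(locations)}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all locations have zero support")
    return {loc: score / total for loc, score in scores.items()}


posterior = combine_ratios(FEATURE_RATIOS, LOCATIONS)
best = max(posterior, key=posterior.get)
print(best, {loc: round(p, 4) for loc, p in posterior.items()})
```

Because the table rounds one Cy ratio to 0.00, that column's product collapses to zero; a practical implementation would typically smooth such values rather than use the rounded figures directly.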