Nancy Y Yu, James R Wagner, Matthew R Laird, Gabor Melli, Sébastien Rey, Raymond Lo, Phuong Dao, S Cenk Sahinalp, Martin Ester, Leonard J Foster, Fiona S L Brinkman.
Abstract
MOTIVATION: PSORTb has remained the most precise bacterial protein subcellular localization (SCL) predictor since it was first made available in 2003. However, the recall needs to be improved and no accurate SCL predictors yet make predictions for archaea, nor differentiate important localization subcategories, such as proteins targeted to a host cell or bacterial hyperstructures/organelles. Such improvements should preferably be encompassed in a freely available web-based predictor that can also be used as a standalone program.
Year: 2010 PMID: 20472543 PMCID: PMC2887053 DOI: 10.1093/bioinformatics/btq249
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Table 1. New subcategory SCLs predicted by PSORTb 3.0
| SCL subcategories | Description |
|---|---|
| Host-associated | Any proteins destined to the host cell cytoplasm, cell membrane or nucleus by any of the bacterial secretion systems |
| Type III secretion | Components of the Type III secretion apparatus |
| Fimbrial | Components of a bacterial or archaeal fimbrium or pilus |
| Flagellar | Components of a bacterial or archaeal flagellum |
| Spore | Components of a spore |
Table 2. Performance comparisons for Gram-positive and Gram-negative bacterial SCL prediction software
| Software | Precision (%) | Recall (%) | Accuracy (%) | MCC |
|---|---|---|---|---|
| Gram-positive | | | | |
| PSORTb 3.0 | 98.2 | 93.1 | 97.9 | 0.79 |
| PSORTb 2.0 | 97.0 | 90.0 | 96.8 | 0.76 |
| CELLO 2.5 | 93.7 | 93.7 | 96.9 | 0.76 |
| Gpos-PLoc | 91.2 | 90.7 | 95.5 | 0.64 |
| PA 2.5 | 90.0 | 81.8 | 90.9 | 0.57 |
| Gram-negative | | | | |
| PSORTb 3.0 | 97.3 | 94.1 | 98.3 | 0.85 |
| PA 2.5 | 97.3 | 92.0 | 97.9 | 0.85 |
| PSORTb 2.0 | 95.9 | 85.3 | 96.3 | 0.69 |
| SubcellPredict | 94.3 | 94.3 | 96.0 | 0.52 |
| SLP-Local | 93.8 | 93.8 | 95.9 | 0.59 |
| Gneg-PLoc | 89.6 | 88.9 | 95.7 | 0.65 |
| CELLO 2.5 | 87.5 | 87.5 | 95.0 | 0.61 |
a. PA 3.0 is not included in the analysis since we are unable to determine the degree of overlap between our testing dataset and the training dataset of PA 3.0.
b. Precision = TP/(TP + FP), Recall = TP/(TP + FN), Accuracy = (TP + TN)/(TP + FP + TN + FN) and MCC = (TP × TN − FP × FN)/√[(TP + FP)(TP + FN)(TN + FP)(TN + FN)], where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, FN the number of false negatives and MCC the Matthews correlation coefficient.
c. Software also predicts periplasmic SCL. None of the testing dataset proteins received a periplasmic SCL prediction.
d. Software only predicts cytoplasmic, membrane and extracellular categories. All proteins (including cell wall proteins) submitted to the server will receive one or more of these three localization predictions (or ‘no predictions’).
e. Software only predicts cytoplasmic, periplasmic and extracellular categories. All proteins (including membrane proteins) submitted to the server will receive one of these three SCL predictions.
f. Software also predicts flagellar, fimbrium and nucleoid localizations; however, none of the test dataset proteins received one of these three SCL predictions.
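The four metrics reported in the table above can be computed from confusion-matrix counts using their standard definitions. The following is a minimal sketch, not the authors' evaluation code: the function name `scl_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def scl_metrics(tp, fp, tn, fn):
    """Return precision, recall, accuracy (as percentages) and MCC.

    Uses the standard definitions based on true/false positives and
    negatives. Undefined ratios (zero denominator) are returned as 0.0,
    which is a convention chosen for this sketch.
    """
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / (tp + fn) if (tp + fn) else 0.0
    total = tp + fp + tn + fn
    accuracy = 100.0 * (tp + tn) / total if total else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, recall, accuracy, mcc


# Hypothetical counts for a single localization class (illustration only).
p, r, a, m = scl_metrics(tp=90, fp=5, tn=400, fn=10)
print(f"precision={p:.1f}% recall={r:.1f}% accuracy={a:.1f}% MCC={m:.2f}")
```

Because accuracy counts true negatives, it can stay high even when MCC is modest, which is consistent with the spread between the Accuracy and MCC columns for several tools.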
Table 3. Performance comparison for archaeal proteins between the PSORTb 3.0 archaeal option and software with Gram-positive SCL prediction capability
| Software | Precision (%) | Recall (%) | Accuracy (%) | MCC |
|---|---|---|---|---|
| PSORTb 3.0 | 97.2 | 93.4 | 97.7 | 0.83 |
| PSORTb 2.0 | 95.7 | 81.0 | 94.3 | 0.59 |
| Gpos-PLoc | 92.3 | 92.3 | 96.2 | 0.65 |
| PA 2.5 | 90.0 | 77.5 | 89.6 | 0.38 |
| CELLO 2.5 | 86.5 | 86.5 | 93.2 | 0.46 |
a. PA 3.0 is not included in the analysis since the exact content of the training dataset is unknown and may skew the cross-validation results.
b. See Table 2 footnotes for definitions of the four performance metrics.
c. Software also predicts periplasmic SCL. None of the testing dataset proteins received a periplasmic SCL prediction.
d. Software does not predict cell wall localization.
Table 4. Evaluation of PSORTb 3.0, PSORTb 2.0, PA 3.0 and PA 2.5 using an LC–MS/MS proteomics dataset of proteins found exclusively in the cytoplasmic fraction, as compared with the periplasmic and extracellular fractions, of P. aeruginosa PAO1
| Software | Precision (%) | Recall (%) |
|---|---|---|
| PSORTb 3.0 | 96.3 | 91.8 |
| PA 3.0 | 95.9 | 81.3 |
| PA 2.5 | 90.7 | 51.5 |
| PSORTb 2.0 | 90.3 | 54.4 |
a. Precision in this case refers to TP/(TP + FP), where FP refers to proteins predicted as SCLs other than ‘Cytoplasmic’ or ‘Unknown’.
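The precision definition in the footnote above can be illustrated on a list of predictions for proteins that are all experimentally cytoplasmic. This is a sketch under stated assumptions: the function name and label strings are hypothetical, the example predictions are invented, and recall is assumed here to be TP divided by the total number of proteins in the dataset, which the source does not spell out.

```python
def cytoplasmic_precision_recall(predictions):
    """Score predictions for a dataset of cytoplasmic-only proteins.

    TP: proteins predicted 'Cytoplasmic'.
    FP: proteins predicted as any SCL other than 'Cytoplasmic' or 'Unknown'.
    'Unknown' predictions count against recall only (assumption: recall is
    TP divided by the total number of proteins in the dataset).
    """
    tp = sum(1 for p in predictions if p == "Cytoplasmic")
    unknown = sum(1 for p in predictions if p == "Unknown")
    fp = len(predictions) - tp - unknown
    precision = 100.0 * tp / (tp + fp) if (tp + fp) else 0.0
    recall = 100.0 * tp / len(predictions) if predictions else 0.0
    return precision, recall


# Invented example: 8 correct, 1 'Unknown', 1 wrong localization.
example = ["Cytoplasmic"] * 8 + ["Unknown"] + ["Periplasmic"]
precision, recall = cytoplasmic_precision_recall(example)
print(f"precision={precision:.1f}% recall={recall:.1f}%")
```

Under this scheme a tool that returns ‘Unknown’ rather than guessing keeps its precision but loses recall, which matches the pattern in the table where precision stays above 90% while recall varies widely.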
Fig. 1. Genome coverage prediction for PSORTb 2.0 and PSORTb 3.0 for Gram-negative and Gram-positive bacteria genomes. Chr1 denotes chromosome 1.
Fig. 2. Genome prediction coverage results from combining PSORTb 3.0 and PA 3.0 output. The majority of the ‘disagreement’ cases are boundary localizations (membrane prediction and a neighboring compartment). This likely reflects the true nature of the proteins. Only a small fraction of the disagreements (2.5–5% of the deduced proteome) are non-boundary cases.
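One way to separate boundary from non-boundary disagreements when combining two predictors is sketched below for a Gram-negative cell. This is an illustration, not the procedure used to produce Fig. 2: the compartment ordering, the label strings, the category names and the example pairs are all assumptions made for this sketch.

```python
from collections import Counter

# Assumed inside-to-outside ordering of Gram-negative compartments.
ORDER = ["Cytoplasmic", "CytoplasmicMembrane", "Periplasmic",
         "OuterMembrane", "Extracellular"]
RANK = {name: i for i, name in enumerate(ORDER)}
MEMBRANES = {"CytoplasmicMembrane", "OuterMembrane"}


def compare_predictions(a, b):
    """Classify a pair of SCL predictions from two tools."""
    if a == "Unknown" and b == "Unknown":
        return "no prediction"
    if a == "Unknown" or b == "Unknown":
        return "single prediction"
    if a == b:
        return "agreement"
    ra, rb = RANK.get(a), RANK.get(b)
    # Boundary case: a membrane prediction paired with a neighboring compartment.
    if (ra is not None and rb is not None and abs(ra - rb) == 1
            and (a in MEMBRANES or b in MEMBRANES)):
        return "boundary disagreement"
    return "non-boundary disagreement"


# Invented prediction pairs (tool 1, tool 2) for illustration only.
pairs = [
    ("Cytoplasmic", "Cytoplasmic"),
    ("Cytoplasmic", "CytoplasmicMembrane"),
    ("Periplasmic", "OuterMembrane"),
    ("Cytoplasmic", "Extracellular"),
    ("Unknown", "Periplasmic"),
]
tally = Counter(compare_predictions(a, b) for a, b in pairs)
print(dict(tally))
```

Labels outside the assumed ordering fall through to the non-boundary category, so a Gram-positive or archaeal analysis would need its own compartment list.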