Sébastien Rey, Jennifer L Gardy, Fiona S L Brinkman.
Abstract
BACKGROUND: Identification of a bacterial protein's subcellular localization (SCL) is important for genome annotation, function prediction and drug or vaccine target identification. Subcellular fractionation techniques combined with recent proteomics technology permit the identification of large numbers of proteins from distinct bacterial compartments. However, fractionating a complex structure like the cell into several subcellular compartments is not a trivial task. Contamination from other compartments may occur, and some proteins may reside in multiple localizations. New computational methods reported over the past few years now permit much more accurate, genome-wide analysis of the SCL of protein sequences deduced from genomes. There is a need to compare such computational methods with laboratory proteomics approaches to identify the most effective current approach for genome-wide localization characterization and annotation.
Year: 2005 PMID: 16288665 PMCID: PMC1314894 DOI: 10.1186/1471-2164-6-162
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
PSORTb v.2.0 predicted localization sites for 405 proteins reported in ten subproteome studies.
| Laboratory data | | | PSORTb v.2.0 predicted localization a) | | | | | | | | | | | | |
| Organism | Fraction a) | Total | C | C/CM | C/P | CM | CM/P | P | P/OM | OM | OM/EC | EC | UN | Agreement (%) b) | Coverage (%) c) |
| | C | 23 | 19 | - | - | - | - | 1 | - | - | - | - | 3 | 95.0 | 87.0 |
| | CM | 63 | 13 | 2 | - | 5 | 1 | 6 | - | 5 | 1 | - | 30 | 24.2 | 52.4 |
| | P | 57 | 2 | - | 1 | - | - | 8 | - | 3 | - | - | 43 | 64.3 | 24.6 |
| | OM | 3 | - | - | - | - | - | - | - | 3 | - | - | - | 100.0 | 100.0 |
| | OM | 11 | 2 | - | - | - | - | - | - | 6 | 1 | - | 2 | 77.8 | 81.8 |
| | OM | 39 | 3 | - | - | - | 1 | 3 | - | 22 | 1 | 1 | 8 | 74.2 | 79.5 |
| | OM | 6 | - | - | - | - | - | - | - | 2 | - | 1 | 3 | 66.7 | 50.0 |
| | OM | 33 | 4 | - | - | 1 | - | - | 1 | 22 | 2 | 1 | 2 | 80.6 | 93.9 |
| | EC | 150 | 33 | - | - | 5 | 1 | 33 | - | 9 | 6 | - | 63 | 6.9 | 58.0 |
| | EC | 20 | 3 | - | - | - | - | 2 | - | 4 | 1 | 1 | 9 | 18.2 | 55.0 |
a) C = cytoplasmic, CM = cytoplasmic membrane, P = periplasmic, OM = outer membrane, EC = extracellular, and UN = unknown.
b) Percentage of agreement is defined by $\frac{A}{B} \times 100$, where: A represents the number of proteins of the fraction X predicted by PSORTb to be resident at X and X/Y localization sites.
B represents the total number of proteins of the fraction X predicted as not unknown by PSORTb.
c) Percentage of coverage is defined by $\frac{B}{T} \times 100$, where: B represents the total number of proteins of the fraction X predicted as not unknown by PSORTb.
T represents the total number of proteins identified in the fraction X.
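The agreement and coverage columns can be recomputed from the counts in any row of the table. The following is a minimal sketch, assuming agreement = 100 × A / B and coverage = 100 × B / T with A, B and T as described in footnotes b) and c); these ratios reproduce the tabulated values. The function name `agreement_and_coverage` and the dictionary layout are illustrative choices, not part of PSORTb.

```python
def agreement_and_coverage(fraction, counts):
    """Return (agreement %, coverage %) for one subproteome row.

    fraction: experimental compartment label, e.g. "CM"
    counts:   predicted site label (e.g. "C/CM", "UN") -> number of proteins
    """
    total = sum(counts.values())              # T: all proteins in the fraction
    predicted = total - counts.get("UN", 0)   # B: proteins not predicted as unknown
    # A: predictions naming the fraction, alone (X) or as part of a dual site (X/Y)
    agreeing = sum(
        n for site, n in counts.items()
        if site != "UN" and fraction in site.split("/")
    )
    agreement = 100.0 * agreeing / predicted if predicted else float("nan")
    coverage = 100.0 * predicted / total if total else float("nan")
    return agreement, coverage


# Counts taken from the CM row of the table (63 proteins in total)
cm_row = {"C": 13, "C/CM": 2, "CM": 5, "CM/P": 1, "P": 6, "OM": 5, "OM/EC": 1, "UN": 30}
agreement, coverage = agreement_and_coverage("CM", cm_row)
print(f"agreement={agreement:.1f}% coverage={coverage:.1f}%")  # agreement=24.2% coverage=52.4%
```

Splitting the site label on "/" rather than testing for a substring keeps "C" from matching "CM".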
Estimation of subproteome study error rate.
| Organism | Fraction a) | Total proteins identified | Disagreements b) | Confirmed PSORTb errors c) | Confirmed laboratory errors d) | % Errors e) |
| | C | 23 | 1 | 0 | 0 | 0.0 |
| | CM | 63 | 25 | 0 | 4 | 6.3 |
| | P | 57 | 5 | 0 | 1 | 1.8 |
| | OM | 3 | 0 | 0 | 0 | 0.0 |
| | OM | 11 | 2 | 0 | 2 | 18.2 |
| | OM | 39 | 8 | 0 | 6 | 15.4 |
| | OM | 6 | 1 | 0 | 1 | 16.7 |
| | OM | 33 | 6 | 0 | 3 | 9.1 |
| | EC | 150 | 81 | 2 | 36 | 24.0 |
| | EC | 20 | 9 | 1 | 5 | 25.0 |
| Total | | 405 | 138 | 3 | 58 | 14.3 |
a) C = cytoplasmic, CM = cytoplasmic membrane, P = periplasmic, OM = outer membrane, and EC = extracellular.
b) Disagreement represents the number of proteins of the fraction X predicted by PSORTb not to be resident at X or X/Y localization sites.
c) Confirmed PSORTb error represents the number of disagreeing cases for which the PSORTb predicted localization site was found to be incorrect.
d) Confirmed laboratory error represents the number of disagreeing cases for which the PSORTb predicted localization site was found to be correct.
e) % Errors is calculated as the number of confirmed laboratory errors divided by the total number of proteins identified.
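As a quick check of footnote e), the % Errors column can be recomputed from the table. This is a small sketch that assumes the ratio is expressed as a percentage; the `rows` list simply transcribes the fraction, total and confirmed laboratory error counts from the table, and `percent_errors` is an illustrative helper name.

```python
# (fraction, total proteins identified, confirmed laboratory errors), one tuple per study
rows = [
    ("C", 23, 0), ("CM", 63, 4), ("P", 57, 1), ("OM", 3, 0), ("OM", 11, 2),
    ("OM", 39, 6), ("OM", 6, 1), ("OM", 33, 3), ("EC", 150, 36), ("EC", 20, 5),
]


def percent_errors(total, lab_errors):
    """Confirmed laboratory errors as a percentage of all proteins identified."""
    return 100.0 * lab_errors / total


# Per-study error rates, rounded to one decimal as in the table
per_study = [round(percent_errors(t, e), 1) for _, t, e in rows]

# Overall rate across all ten studies: 58 confirmed errors out of 405 proteins
overall = percent_errors(sum(t for _, t, _ in rows), sum(e for _, _, e in rows))
print(per_study)
print(f"overall: {overall:.1f}%")  # overall: 14.3%
```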
Advantages and disadvantages of computational and subproteomic approaches to localization analysis.
| | Computational approach | Subproteomic approach |
| Advantages | Rapid predictions for all proteins deduced to be encoded in a given sequence | Can be performed under different conditions and provide condition-specific information |
| | Detailed information about specific features of proteins, e.g. signal peptides, TMHs | Confirms expression of hypothetical proteins |
| | Identification of potential contaminants in subproteome analyses | Large-scale source of data on SCL for hypothetical proteins that cannot be easily predicted computationally |
| | Identification of hydrophobic integral membrane proteins | |
| Disadvantages | Does not perform as well (fewer predictions) when analyzing an organism that is not similar to well-studied model organisms | Time-consuming |
| | May miss flagging some multiply localized proteins | Low-abundance and hydrophobic proteins not readily detected |
| | Poorly predicts localizations for which there is little training data, or for which proteins are computationally difficult to differentiate between localizations | Difficult to accurately identify all proteins found on the gel |
| | Cannot provide condition-specific data on SCL, particularly for proteins that change SCL depending on the condition | Only one subcellular fraction analyzed at a time |
| | | Subfractionation often results in contamination |
| | | Cannot identify multiply localized proteins |