Abstract
BACKGROUND: Genome-wide prediction of protein subcellular localization is an important type of evidence used for inferring protein function. While a variety of computational tools have been developed for this purpose, errors in the gene models and use of protein sorting signals that are not recognized by the more commonly accepted tools can diminish the accuracy of their output.
Year: 2011 PMID: 21810203 PMCID: PMC3223724 DOI: 10.1186/1471-2164-12-S1-S1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. General strategy for predicting protein localization in gram negative bacteria.
Core components and associated domains of gram negative export and sorting systems
| Export/Insertion System | Function | Core components | Domains | Localization¹ | Comments |
|---|---|---|---|---|---|
| General (Sec) | Translocation of unfolded proteins across the inner membrane | SecY | TIGR00967 | IM | Signal peptides cleaved by LspA tend to be shorter than those cleaved by LepB |
| | | SecE | TIGR00964 | IM | |
| | | SecG | TIGR00810 | IM | |
| Twin arginine (Tat) | Translocation of folded proteins across the inner membrane | TatA | TIGR01411 | IM | Respiratory proteins that require cytoplasmic enzymes to covalently attach metal cofactors (e.g. iron sulfur, copper, molybdopterin) are expected substrates |
| | | TatB | TIGR01410 | IM | |
| | | TatC | TIGR00945 | IM | |
| Holin | Translocation of phage endolysin across the inner membrane | Holin | Numerous, but genus-specific | IM | Encoded near endolysin in double-strand phage |
| Lol | Insertion of lipoproteins in the outer membrane | LolB | TIGR00548 | LP-OM | Beta and gamma Proteobacteria also have LolA, which carries the TIGR00547 and pfam03548 domains |
| | | LolC | TIGR00548 | IM | |
| | | LolD | TIGR00221 | Cyt-IM assoc | |
| | | LolE | TIGR02212 | IM | |
| Bam | Insertion of beta barrel proteins in the outer membrane | BamA | TIGR03303 | OM | With the exception of proteins having large periplasmic domains, expect a genus-specific C-terminal sorting motif |
| | | BamB | TIGR03300 | LP-OM | |
| | | BamC | TIGR03302 | LP-OM | |
| | | BamD | pfam06804 | LP-OM | |
| | | BamE | pfam04355 | LP-OM | |
1 Abbreviations: inner membrane (IM); outer membrane (OM); cytoplasmic, but associated with the inner membrane (Cyt-IM assoc); lipoprotein localized to the outer membrane (LP-OM).
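One way to put the core-component table to work is to check which components of each system are missing from a genome's domain hits. The sketch below copies the Sec, Tat and Bam accessions from the table; the function name `missing_components` and the idea of passing a flat list of domain accessions are assumptions made for illustration, not part of the published method.

```python
# Core-component domains copied from the table above (Sec, Tat and Bam rows only).
CORE_DOMAINS = {
    "Sec": {"SecY": "TIGR00967", "SecE": "TIGR00964", "SecG": "TIGR00810"},
    "Tat": {"TatA": "TIGR01411", "TatB": "TIGR01410", "TatC": "TIGR00945"},
    "Bam": {"BamA": "TIGR03303", "BamB": "TIGR03300", "BamC": "TIGR03302",
            "BamD": "pfam06804", "BamE": "pfam04355"},
}


def missing_components(genome_domains):
    """Return {system: [components whose domain was not found]}.

    `genome_domains` is a hypothetical flat list of domain accessions
    detected anywhere in one genome; matching is case-insensitive.
    """
    found = {d.lower() for d in genome_domains}
    return {
        system: [name for name, dom in comps.items() if dom.lower() not in found]
        for system, comps in CORE_DOMAINS.items()
    }


if __name__ == "__main__":
    hits = ["TIGR00967", "TIGR00964", "TIGR00810", "TIGR01411", "TIGR00945"]
    print(missing_components(hits))
```

An empty list for a system suggests that its core machinery is present, which is a prerequisite for expecting substrates of that system in the genome.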
Domains that Identify Secretins and Ushers in Shewanella
| Domain | Short Model Descriptor | Secretion System | Proteins Detected | Predicted Localization in Shewanella |
|---|---|---|---|---|
| pfam02321 | Outer membrane efflux protein | T1SS¹ | AggA | OM |
| TIGR01844 | type I secretion outer membrane protein, TolC family | T1SS | TolC, AggA | OM |
| TIGR02519 | pilus (MSHA type) biogenesis protein MshL | T2bSS | MshL | OM lipoprotein |
| pfam07655 | Secretin N-terminal domain | T2bSS | MshL | OM lipoprotein |
| TIGR02515 | type IV pilus secretin (or competence protein) PilQ | T2bSS | PilQ | OM lipoprotein |
| pfam00263 | Bacterial type II and III secretion system protein | T2a-cSS, T3aSS | GspD, PilQ, MshL, YscC, RcpA, SspD | mixed |
| pfam03958 | Bacterial type II/III secretion system short domain | T2a-bSS, T3aSS | GspD, PilQ, YscC | mixed |
| pfam02107 | Flagellar L-ring protein | T3bSS | FlgH | OM lipoprotein |
| pfam03524 | Conjugal transfer protein | T4bSS | TrbG | OM |
| pfam03895 | YadA-like C-terminal region | T5cSS | | OM, extra |
| pfam03797 | Autotransporter beta-domain | T5aSS | | OM, extra |
| pfam06586 | TraK protein | T4bSS | TraK, TrhK | OM |
| pfam07660 | Secretin and TonB N terminus short domain | T2bSS² | PilQ, MshL | OM lipoprotein |
| TIGR02516 | type III secretion outer membrane pore, YscC/HrcC family | T3aSS | YscC | OM |
| TIGR02756 | type-F conjugative transfer system secretin TraK | T4bSS | TraK | OM |
| TIGR03352 | type VI secretion lipoprotein, VC_A0113 family | T6SS | SciN | OM lipoprotein |
| pfam00577 | Fimbrial Usher protein | T7SS | PapC/FimD | OM |
| pfam03783 | Curli production assembly/transport component CsgG | T8SS | CsgG | OM lipoprotein |
1 Also detects the OM component of drug and metal efflux pumps.
2 Also detects some TonB receptor proteins.
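The domain table can be read as a lookup from domain accession to predicted localization. The sketch below encodes a subset of its rows and resolves a protein's hits to a single call. Treating "mixed" as uninformative whenever a more specific domain is also present, and reporting "conflict" when specific calls disagree, are assumptions of this example rather than rules stated in the source.

```python
# Subset of the secretin/usher table: domain accession -> predicted localization.
DOMAIN_LOCALIZATION = {
    "pfam02321": "OM",
    "TIGR01844": "OM",
    "TIGR02519": "OM lipoprotein",
    "pfam07655": "OM lipoprotein",
    "TIGR02515": "OM lipoprotein",
    "pfam00263": "mixed",
    "pfam03958": "mixed",
    "pfam02107": "OM lipoprotein",
    "pfam00577": "OM",
    "pfam03783": "OM lipoprotein",
}


def localization_from_domains(hits):
    """Resolve a protein's domain hits to one localization call, or None."""
    calls = {DOMAIN_LOCALIZATION[h] for h in hits if h in DOMAIN_LOCALIZATION}
    if not calls:
        return None                  # no informative domain found
    specific = calls - {"mixed"}     # assumption: "mixed" yields to specific calls
    if not specific:
        return "mixed"
    if len(specific) == 1:
        return specific.pop()
    return "conflict"                # specific domains disagree; needs curation


if __name__ == "__main__":
    # MshL is listed under pfam00263, TIGR02519 and pfam07655 in the table.
    print(localization_from_domains(["pfam00263", "TIGR02519", "pfam07655"]))
```

Because of the caveats in the footnotes (efflux pumps, TonB receptors), a call made this way is best treated as a starting point for manual curation.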
Figure 2. N-terminal signal peptides detected in Shewanella. Note that some T5SS substrates have been reported to possess an additional n- and h-region at the N-terminus, so the position of the signal peptidase cleavage site would likely go undetected by standard predictors such as SignalP. Arrows indicate the position at which the signal peptide is cleaved by the respective signal peptidase. Conserved sequences are indicated, with X denoting any amino acid. The twin arginine motif is denoted by ZXRRXϕϕ, where Z = hydrophilic residue and ϕ = hydrophobic residue.
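The ZXRRXϕϕ pattern in the caption translates directly into a regular expression. In the sketch below, the residue sets chosen for Z and ϕ, the 40-residue search window, and the example sequences are all assumptions for illustration; dedicated predictors such as TatP or Tatfind apply additional criteria beyond the bare motif.

```python
import re

# Assumed residue classes; the caption only says "hydrophilic" and "hydrophobic".
HYDROPHILIC = "DEHKNQRST"   # stands in for Z
HYDROPHOBIC = "ACFILMVW"    # stands in for ϕ

# Z X R R X ϕ ϕ, with X matching any residue.
TAT_MOTIF = re.compile(f"[{HYDROPHILIC}].RR.[{HYDROPHOBIC}]{{2}}")


def find_tat_motif(protein, n_terminal_window=40):
    """Return (start, matched_text) for the first motif in the N-terminal window, else None."""
    match = TAT_MOTIF.search(protein[:n_terminal_window].upper())
    return (match.start(), match.group()) if match else None


if __name__ == "__main__":
    # Illustrative sequences only, not taken from the source.
    print(find_tat_motif("MNNNDLFQASRRRFLAQLGGLTVAGMLGPSLLTPRRATA"))
    print(find_tat_motif("MKKTAIAIAVALAGFATVAQA"))
```

Restricting the search to an N-terminal window trades sensitivity for specificity: a mispredicted start codon can push a genuine motif outside the window, which is why the window is left as a parameter.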
Characteristics of signal peptidases and target signal peptides
| Model Signal Peptidase | Example Protein | Merops Family & domains | Translocation system | Signal peptide domain | Example substrates | Domains | Comments |
|---|---|---|---|---|---|---|---|
| N-terminal processing | | | | | | | |
| PilD PulO GspO | SO_0414 | A24A | T2aSS | pfam07963 | GspGHIJK | pfam02501 pfam03934 pfam08334 pfam12019 | Signal peptides are similar, except that those in type IVb pili are longer (~25 aa) than the others (~7 aa); GspK, PilX and PilW are not detected by pfam07963 |
| | | | T2bSS | | type IVa pili | | |
| | | | | | type IVb pili | | |
| TadV | Spea_2010 | A24A | T2cSS | | Flp, TadEF | | |
| LspA | SO_3531 | A08 | Sec | | Lipoproteins | | |
| | | | Tat | TIGR01409 pfam10518 | | | |
| NA1 | SO_A0049 | C39 | T1SS | TIGR01847 | class Ia & IIa-b bacteriocin, microcins | pfam01721 pfam10439 pfam10439 | The signal peptidase activity is encoded in the permease component of the T1SS that exports the bacteriocin |
| LepB | SO_1347 | S26A | Sec | | exported proteins | | |
| | | | Tat | TIGR01409 pfam10518 | | | |
| C-terminal processing | | | | | | | |
| ? | | | T4bSS - IncF | | TraA | | Mature TraA is ~68 aa in length with two TM spans and circularizes |
| TrhF | Shewana3_4209 | S26A | T4bSS - IncH | | TrhA HdtZ | | Substrates have a Sec signal peptide that is cleaved by LepB |
| TraF | Sputw3181_1142 | S26A | T4bSS - IncJ | | TraA | TIGR02758 | |
| TraF | Shewana3_1267 | S26C | T4bSS - IncP | | TrbC | pfam04956 | |
Computational Tools used in Studies of Shewanella
| Name | URL | Use | Limitations |
|---|---|---|---|
| LipoP | | primarily prediction of Sec signal peptides that are cleaved by LspA, but also provides prediction of inner membrane or cytoplasmic localization as well as LepB cleavage | does not detect Tat substrates |
| Lipo | | prediction of Sec signal peptides that are cleaved by LspA | does not detect Tat substrates |
| SignalP | | prediction of Sec signal peptides that are cleaved by LepB | does not detect Tat substrates |
| Phobius | | prediction of alpha helices in inner membrane proteins, distinguishing N-terminal TM from signal peptides | |
| TmHmm | | prediction of alpha helices in inner membrane proteins | signal peptides are often erroneously counted as TM spans |
| Bomp | | prediction of beta barrel spans in outer membrane proteins | |
| Cello | | prediction of localization (Cyt, IM, Peri, OM, Extra) | does not predict lipoprotein location in OM or IM |
| Sosui-GramN | | prediction of localization (Cyt, IM, Peri, OM, Extra) in gram negatives only | does not predict lipoprotein location in OM or IM; no scores given |
| Subloc | | prediction of localization (Cyt, Peri, Extra) | not appropriate for membrane bound proteins |
| PsortB | | prediction of localization (Cyt, IM, Peri, OM, Extra) | does not predict lipoprotein location in OM or IM, many proteins assigned |
| TatP | | prediction of Tat and Sec signal peptides | does not detect lipoproteins that have a Tat signal peptide; some very long signal peptides not detected |
| Tatfind | obtained from Dr. Pohlschroder¹ | prediction of Tat signal peptides | does not require the presence of an adjacent LepB or LspA site, or that the motif occurs at the protein N-terminus (though this can be advantageous when the start codon prediction is wrong) |
1 For correspondence. E-mail: pohlschr@sas.upenn.edu
Tool Performance Across 19 Proteins in Each of the 1990 Core Ortholog Groups
| Test | Tool | Groups with no match | Disagree with Curation | Groups with match | Disagree with Curation | Groups with mixed predictions | Curated as having Match |
|---|---|---|---|---|---|---|---|
| Sig Pep cleaved by LspA | LipoP 1.0 | 1911 | 0 | 49 | 0 | 30 | 61 |
| Sig Pep cleaved by LepB | SignalP-NN 3.0 | 1482 | 4 | 158 | 39 | 350 | 169 |
| Sig Pep cleaved by LepB | SignalP-Hmm 3.0 | 1447 | 1 | 247 | 89 | 296 | |
| Sig Pep recognized by TAT | TatP 1.0 | 1962 | 0 | 5 | 2 | 23 | 5 |
| Inner membrane protein | TmHmm 2.0 | 1417 (1)¹ | 1 | 390 (103) | 21 | 183 (29) | 403 (133) |
| Inner membrane protein | Phobius | 1505 (14) | 14 | 349 (72) | 7 | 136 (47) | |
| Outer membrane protein | Bomp | 1934 | 11 | 13 | 2* | 43 | 32 |
1 Values in parentheses indicate the number of proteins predicted to have only one transmembrane span.
Performance of Localization Predictors Across 19 Proteins in Each of the 1990 Core Ortholog Groups
| Subcellular localization | Curated Localization¹ | Cello 2.5² | Sosui-GramN | Subloc | PsortB 3.0² |
|---|---|---|---|---|---|
| Extracellular | 40 | 7 (7)³ | 7 (7) | 24 (2) | 14 (11) |
| Outer Membrane | 32 | 16 (12) | 21 (12) | NA | 20 (14) |
| Periplasm | 176 | 25 (16) | 20 (18) | 72 (24) | 21 (20) |
| Inner Membrane | 403 | 222 (222) | 281 (278) | NA | 349 (294) |
| Cytoplasm | 1339 | 750 (737) | 780 (779) | 1277 (977) | 976 (970) |
| Total | 1990 | 1020 | 1109 | 1373 | 1380 |
1 Lipoproteins localizing to the outer or inner membrane were counted as periplasmic, while those predicted to localize to the cell surface were counted as extracellular. T5SS autotransporters were counted as extracellular.
2 Only Cello values for which a single location was predicted are included in these counts.
3 Numbers in parentheses indicate the number of groups that are in agreement with curated locations.
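The counts in this table can be condensed into two figures per tool: the fraction of the 1990 groups that received a prediction, and the fraction of those predictions that agree with curation. The sketch below copies the counts from the table; the names `coverage` and `agreement` are labels chosen for this example, not metrics defined in the source, and the Subloc rows marked NA are omitted.

```python
# Curated group counts per localization, copied from the table above.
CURATED = {"extracellular": 40, "outer membrane": 32, "periplasm": 176,
           "inner membrane": 403, "cytoplasm": 1339}

# Per tool: localization -> (groups predicted, groups agreeing with curation).
PREDICTIONS = {
    "Cello": {"extracellular": (7, 7), "outer membrane": (16, 12), "periplasm": (25, 16),
              "inner membrane": (222, 222), "cytoplasm": (750, 737)},
    "Sosui-GramN": {"extracellular": (7, 7), "outer membrane": (21, 12), "periplasm": (20, 18),
                    "inner membrane": (281, 278), "cytoplasm": (780, 779)},
    "Subloc": {"extracellular": (24, 2), "periplasm": (72, 24), "cytoplasm": (1277, 977)},
    "PsortB": {"extracellular": (14, 11), "outer membrane": (20, 14), "periplasm": (21, 20),
               "inner membrane": (349, 294), "cytoplasm": (976, 970)},
}


def summarize(tool):
    """Return predicted and agreeing group counts plus the two derived ratios."""
    rows = PREDICTIONS[tool].values()
    predicted = sum(p for p, _ in rows)
    agreed = sum(a for _, a in rows)
    total = sum(CURATED.values())
    return {"predicted": predicted, "agreed": agreed,
            "coverage": predicted / total,     # share of all groups given a call
            "agreement": agreed / predicted}   # share of calls matching curation


if __name__ == "__main__":
    for tool in PREDICTIONS:
        s = summarize(tool)
        print(f"{tool}: coverage={s['coverage']:.2f}, agreement={s['agreement']:.2f}")
```

Reporting the two ratios separately matters here: a tool can agree well with curation on the groups it calls while leaving a large share of groups without any call.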
Figure 3. Decision tree for predicting localization of proteins in gram negative bacteria.