Sandra Renier, Pierre Micheau, Régine Talon, Michel Hébraud, Mickaël Desvaux.
Abstract
Genome-scale prediction of subcellular localization (SCL) is useful not only for inferring protein function but also for supporting proteomic data. In line with the secretome concept, a rational and original analytical strategy mimicking the secretion steps that determine ultimate SCL was developed for Gram-positive (monoderm) bacteria. Based on the biology of protein secretion, a flowchart and decision trees were designed considering (i) membrane targeting, (ii) protein secretion systems, (iii) membrane retention, and (iv) cell-wall retention by domains or post-translocational modifications, as well as (v) incorporation into cell-surface supramolecular structures. Using Listeria monocytogenes as a case study, results were compared with known data sets from SCL predictors and experimental proteomics. While in good agreement with experimental extracytoplasmic fractions, the secretomics-based method outperforms other genomic analyses, which were simply not intended to be as inclusive. Compared to all other localization predictors, this method not only supplies a static snapshot of protein SCL but also offers the full picture of the secretion process dynamics: (i) the protein routing is detailed, (ii) the number of distinct SCL and protein categories is comprehensive, (iii) the description of protein type and topology is provided, (iv) the SCL is unambiguously differentiated from the protein category, and (v) multiple SCL and protein categories are fully considered. In that sense, the secretomics-based method is much more than an SCL predictor. Besides being a major step forward in genomics and proteomics of protein secretion, the secretomics-based method appears as a strategy of choice to generate in silico hypotheses for experimental testing.
Year: 2012 PMID: 22912771 PMCID: PMC3415414 DOI: 10.1371/journal.pone.0042982
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Synoptic view of the secretomics-based method for a monoderm bacterium.
Based on the biology of protein secretion, the coding sequences (CDS) are sequentially analysed in a workflow for (i) signal peptide (SP), (ii) secretory pathway, (iii) membrane retention, (iv) cell-wall retention, and (v) surface appendage. For each step, a combination of different tools is used to define different databases, as indicated in the detailed flowchart (Figure 2). From there, the resulting databases are analysed as depicted in the detailed decision trees (Figure 3). In the end, proteins are discriminated into different categories and different SCL are predicted. Sec-SP: Sec-dependent SP; Unc-SP: uncleaved SP; TMD: transmembrane domain; PGBD: peptidoglycan-binding domain; CWBD: cell-wall binding domain; SLHD: S-layer homology domain; SL-Prot: S-layer protein; Doc/Coh: dockerin/cohesin domain; IMP: integral membrane protein; GO: gene ontology.
Summary of the abbreviations in use in relation to protein secretion.
| Abbreviation | Full name |
| --- | --- |
| | |
| IMP | integral membrane protein |
| ssIMP I/II/III | single-spanning IMP of type I/II/III |
| msIMP | multi-spanning IMP |
| CW-protein | parietal protein |
| | |
| CM | cytoplasmic membrane |
| CW | cell wall |
| CS | cell surface |
| EM | extracellular milieu |
| | |
| SP I/II | signal peptide of type I/II |
| Unc-SP | uncleaved SP |
| TMD | α-helical transmembrane domain |
| CWBD1/2 | cell-wall binding domain of type 1/2 |
| PGBD1/2/3/4 | peptidoglycan-binding domain of type 1/2/3/4 |
| SLHD | S-layer homology domain |
| Sec | secretion |
| Tat | twin-arginine translocation |
| FEA | flagellum export apparatus |
| FPE | fimbrillin-protein exporter |
| ABC | ATP-binding cassette |
| Wss | WXG100 secretion system |
| NC | non-classical secretion |
Figure 2. Comprehensive flowchart of the secretomics-based method in a monoderm bacterium.
The analysis considered the (i) signal peptide (SP), (ii) type of SP (signal anchor, FPE-SP, Tat-SP, ABC-SP, and lipobox), (iii) protein secretion systems, (iv) exported proteins lacking an SP (Export, FEA-, Holin, and Wss-substrates), (v) transmembrane domain (TMD), and (vi) relevant conserved domains (LPXTG, WXL, LysM, CWBD1, etc.). Details of the prediction tools used for the analysis and definition of the databases are provided in the Materials & Methods section.
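As a minimal illustration of one conserved-domain check in the flowchart, the sketch below scans a protein sequence for an LPXTG-like motif with a regular expression. The pattern, the `find_lpxtg` name and the 60-residue C-terminal window are assumptions made for illustration only; the prediction tools actually used are those described in the Materials & Methods section and are not reproduced here.

```python
import re

# Assumption: a bare regular expression stands in for the dedicated
# domain-prediction tools; "." matches any residue at the X position.
LPXTG = re.compile(r"LP.TG")


def find_lpxtg(seq, tail=60):
    """Return 0-based start positions of LPXTG-like motifs.

    Only the last `tail` residues are scanned (hypothetical window),
    because the motif is expected near the C-terminus.
    """
    offset = max(0, len(seq) - tail)
    return [offset + m.start() for m in LPXTG.finditer(seq[offset:])]


if __name__ == "__main__":
    toy = "M" * 50 + "LPETG" + "A" * 20  # toy sequence, not a real protein
    print(find_lpxtg(toy))
```

A regular expression of this kind will over-predict compared with profile-based searches, so any hit would only be a candidate to confirm with the dedicated tools.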
Figure 3. Detailed decision trees for prediction of protein category and SCL of secreted proteins.
(1) Proteins exhibiting an N-terminal SP are extracted from the TMD database (Y). (2) The different types of SP (Tat-SP, ABC-SP and FPE-SP) are extracted (Y). (3) Absence of a signal anchor (N) and export (Y) define Sec-dependent SP (Sec-SP). (1) Proteins with TMD but no predicted SP (N) (4) are checked for uncleaved SP (Unc-SP), i.e. a TMD of at least 7 amino acids within the first 100 N-terminal residues and with Nin-Cout topology (Type II signal) (Y). (3) Unc-SP also comprises proteins with a signal anchor (Y) and SP categorised as non-exported (N). (5) From there, the types of SP are clearly defined. Together with (4) proteins with TMD but no SP (N) and (6) protein substrates of holins, Wss and FEA, (7) the presence of the respective protein secretion systems is checked (Y). (8) When the respective protein secretion system is absent (N) or (9) proteins are not predicted as secreted (N), proteins are considered as cytoproteins and located in the CP. (10) Cytoproteins predicted as exported by NC (Y) are further considered as located extracellularly. (7) Secreted proteins and their respective secretion system are defined from there. (11) Translocated proteins with Unc-SP (Y) are IMPs. (11) Translocated proteins without Unc-SP (N), and (12) with TMD (Y) but no SP (N) are IMPs. (13) Remaining translocated proteins with a cleavable SP (Y) and a single predicted TMD (TMD = 1) (Y) cannot be IMPs, and are checked for (16) the presence of a lipobox. (13) Remaining translocated proteins with more than one TMD (N) are checked for (14) the absence of overlap (N) with SP region and LPXTG domain respectively (TMD = 2 AND TMD = LPXTG) to be IMPs, otherwise (Y) they are checked for (16) the presence of a lipobox. (15) From TMD topology prediction ( ), IMPs are further subcategorised and considered as integral to the CM. (16) Presence of a lipobox (Y) defines lipoproteins anchored to the CM. (17) The presence of a glycine residue at position C+2 (C+2 = G) (Y) indicates potential release into the EM [26], [48].
(18) Presence of cell-wall retention domains defines (19) parietal proteins (CW-proteins) that are further subcategorised ( ) and considered as located at the CW. (20) Proteins with fewer than 2 GW modules are not defined as CW-proteins located at the CW [57], [58]. (21) Proteins that are part of the S-layer, pilus and cellulosome, as well as (22) pseudopilus and flagellum, are defined. (23) Secreted proteins with none of the cell-envelope retention features are defined as exoproteins located in the EM. N: No, Y: Yes.
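The later steps of the decision trees can be sketched as a chain of ordered checks. The code below is a simplified reading of steps (7) to (23), not the authors' implementation: the `Features` fields and the `classify` function are hypothetical names, all flags are assumed to be precomputed by upstream prediction tools, steps (12) to (14) are collapsed into a single `extra_tmd` count (TMDs that do not overlap the SP region or the LPXTG domain), the cell-surface location for appendage subunits is an assumption, and only the first matching category is returned although the method itself considers multiple SCL and protein categories.

```python
from dataclasses import dataclass


@dataclass
class Features:
    """Hypothetical per-protein flags, assumed precomputed upstream."""
    translocated: bool = False   # predicted secreted and system present (7-9)
    nc_exported: bool = False    # non-classical secretion predicted (10)
    unc_sp: bool = False         # uncleaved SP (11)
    extra_tmd: int = 0           # TMDs outside SP region / LPXTG (12-14)
    lipobox: bool = False        # lipobox present (16)
    gly_at_c2: bool = False      # glycine at position C+2 (17)
    has_cw_domain: bool = False  # non-GW cell-wall retention domain (18)
    n_gw_modules: int = 0        # number of GW modules (20)
    in_appendage: bool = False   # S-layer, pilus, flagellum, ... (21-22)


def classify(f):
    """Return (protein category, predicted location) for one protein."""
    if not f.translocated:
        # Steps 8-10: cytoprotein, extracellular only if NC-exported.
        return ("cytoprotein", "EM" if f.nc_exported else "CP")
    if f.unc_sp or f.extra_tmd > 0:
        # Steps 11-15: integral membrane protein.
        return ("IMP", "CM")
    if f.lipobox:
        # Steps 16-17: lipoprotein, possibly released when C+2 is glycine.
        if f.gly_at_c2:
            return ("lipoprotein", "CM, potential release into EM")
        return ("lipoprotein", "CM")
    if f.has_cw_domain or f.n_gw_modules >= 2:
        # Steps 18-20: fewer than 2 GW modules do not define a CW-protein.
        return ("CW-protein", "CW")
    if f.in_appendage:
        # Steps 21-22: location "CS" is an assumption of this sketch.
        return ("appendage subunit", "CS")
    # Step 23: no cell-envelope retention feature.
    return ("exoprotein", "EM")


if __name__ == "__main__":
    print(classify(Features(translocated=True, n_gw_modules=1)))
```

Ordering the checks this way mirrors the caption, in which membrane retention is evaluated before cell-wall retention and exoproteins are whatever remains.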
Protein secretion pathways in L. monocytogenes EGD-e as revealed by the secretomics-based method.
| Secretion pathway | Protein ID | Annotation | Similarity search |
| --- | --- | --- | --- |
| | | | |
| Translocase | Lmo2612 | Sec translocon, subunit SecY | TC#3.A.5, COG0201, IPR002208, PIRSF004557, TIGR00967, SSF103491, PF00345 |
| | Lmo0245 | Sec translocon, subunit SecE | TC#3.A.5, COG0690, IPR001901, IPR005807, TIGR00964, PF00584 |
| | Lmo2451 | Sec translocon, subunit SecG | TC#3.A.5, COG1314, IPR004692, TIGR00810, PF03840 |
| | Lmo1527 | Sec translocon, bifunctional subunit SecDF | TC#2.A.6.4, TC#3.A.5, IPR03335, PF07549, PF02356 |
| | Lmo1529 | Sec translocon, subunit YajC | TC#9.B.18, COG1862, TIGR00739, PF02699 |
| | Lmo2510 | Sec translocase, ATPase, SecA | TC#3.A.5, COG0653, IPR000185, TIGR00963 |
| | Lmo0583 | Sec translocase, ATPase, SecA2 | TC#3.A.5.10, COG0653 |
| | Lmo1803 | Signal recognition particle (SRP) receptor subunit, FtsY | TC#3.A.5, COG0552, IPR004390, IPR000897, TIGR00064, SSF47364 |
| | Lmo1801 | Signal recognition particle (SRP), Ffh | TC#3.A.5, COG0541, IPR004780, TIGR00959, SSF47446 |
| Insertase | Lmo1379 | YidC insertase, OxaA1 (YqjG) | TC#2.A.9, COG0706 |
| | Lmo2854 | YidC insertase, OxaA2 (SpoIIIJ) | TC#2.A.9, COG0706 |
| SPase | Lmo1269 | Signal peptidase of Type I, SipX | COG0681, IPR000223, TIGR02227, SSF51306 |
| | Lmo1270 | Signal peptidase of Type I, SipY | COG0681, IPR000223, TIGR02227, SSF51306 |
| | Lmo1271 | Signal peptidase of Type I, SipZ | COG0681, IPR000223, TIGR02227, SSF51306 |
| | Lmo1844 | Signal peptidase of Type II, lipoprotein signal peptidase, LspA | COG0597, IPR001872, TIGR00077, PF01252 |
| | Lmo1101 | Signal peptidase of Type II, lipoprotein signal peptidase, LspB | COG0597, IPR001872, TIGR00077, PF01252 |
| CM anchoring | Lmo2482 | Prolipoprotein diacylglyceryltransferase, Lgt | COG0682, IPR001640, TIGR00544, PF01790 |
| CW anchoring | Lmo0929 | Sortase A, SrtA | COG3764, IPR005754, TIGR01076, SSF63817, PF04203 |
| | Lmo2181 | Sortase B, SrtB | COG4509, IPR009835, IPR015986, PIRSF030150, TIGR03064, SSF63817, PF07170 |
| | | | |
| | Lmo0362 | Twin-arginine translocase protein A, TatA | TC#2.A.64, COG1826, IPR003369, IPR006312, TIGR01411, PF02416 |
| | Lmo0361 | Twin-arginine translocation protein, TatC | TC#2.A.64, COG0805, IPR002033, TIGR00945, PF00902 |
| | | | |
| | Lmo0607 | ABC-type bacteriocin exporter, peptidase domain, ATP-binding/permease protein | TC#3.A.1, COG2274 |
| | Lmo0608 | ABC-type bacteriocin exporter, peptidase domain, ATP-binding/permease protein | TC#3.A.1, COG2274 |
| | Lmo2580 | ABC-type antimicrobial peptide transport system, ATPase component | TC#3.A.1, COG1136 |
| | Lmo2581 | ABC-type antimicrobial peptide transport system, permease component | TC#3.A.1, COG0577 |
| | Lmo2751 | ABC-type bacteriocin exporter, peptidase domain, ATP-binding/permease protein | TC#3.A.1, COG2274 |
| | Lmo2752 | ABC-type bacteriocin exporter, peptidase domain, ATP-binding/permease protein | TC#3.A.1, COG2274 |
| | Lmo0107 | ABC-type bacteriocin exporter, peptidase domain, ATP-binding/permease protein | TC#3.A.1, COG2274 |
| | Lmo0108 | ABC-type bacteriocin exporter, peptidase domain, ATP-binding/permease protein | TC#3.A.1, COG2274 |
| | | | |
| | Lmo1347 | Fimbrilin-protein exporter, ATPase component, ComGA | TC#3.A.14 |
| | Lmo1346 | Fimbrilin-protein exporter, membrane component, ComGB | TC#3.A.14 |
| | Lmo1550 | Type 4 prepilin peptidase, ComC | COG1989, IPR000045, PF01478 |
| | | | |
| | Lmo0680 | Flagellar export apparatus, membrane subunit FlhA | TC#3.A.6, COG1298, IPR001712, PF00771 |
| | Lmo0679 | Flagellar export apparatus, membrane subunit FlhB | TC#3.A.6, COG1377, IPR006135, PF01312 |
| | Lmo0678 | Flagellar export apparatus, membrane subunit FliR | TC#3.A.6, COG1684, IPR002010, PF01311 |
| | Lmo0677 | Flagellar export apparatus, membrane subunit FliQ | TC#3.A.6, COG1987, IPR002191, PF01313 |
| | Lmo0676 | Flagellar export apparatus, membrane subunit FliP | TC#3.A.6, IPR018035, PF02108 |
| | Lmo0715 | Flagellar export apparatus, peripheral subunit FliH | TC#3.A.6, COG1338, IPR005838, PF00814 |
| | Lmo0716 | Flagellar export apparatus, ATPase subunit FliI | TC#3.A.6, COG1157, IPR005714, TIGR01026 |
| | | | |
| | Lmo0128 | Holin TcdE-like | TC#1.E, COG4824, IPR006480, TIGR01593, PF05105 |
| | Lmo2279 | Holin phage A118 | TC#1.E, IPR009708, PF06946 |
| | | | |
| | Lmo0061 | WXG100 secretion system, ATPase component, YukAB (EssC) | TC#9.A.44, IPR023839, TIGR03928 |
| | Lmo0057 | WXG100 secretion system, membrane component, EsaA | TC#9.A.44, IPR023838, TIGR03929 |
| | Lmo0058 | WXG100 secretion system, membrane component, EssA | TC#9.A.44, IPR018920, PF10661, TIGR03927 |
| | Lmo0060 | WXG100 secretion system, membrane component, YukC (EssB) | TC#9.A.44, IPR018778, PF10140, TIGR03926 |
| | Lmo0059 | WXG100 secretion system, peripheral component, YukD (EsaB) | TC#9.A.44, IPR14921, PIRSF037793, PF08817 |
| | Lmo0062 | WXG100 secretion system, peripheral component, EsaC | TC#9.A.44 |
Protein secretion systems: Sec (Secretion), Tat (Twin-arginine translocation), ABC (ATP-binding cassette), holin (hole forming) and Wss (WXG100 secretion system) pathways.
Some annotations were corrected according to the similarity search performed as described in the Materials & Methods section. More extensive and detailed annotations are available in Table S1.
Similarity searches were based on interrogations of dedicated databases as described in the Materials & Methods section.
Performance evaluation metrics of the secretomics-based method compared to other SCL predictors.
| Tool | Actual location | GO | MCC | Accuracy (%) | Sensitivity (%) | Specificity (%) |
| --- | --- | --- | --- | --- | --- | --- |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.914 | 97.8 | 89.8 | 99.3 |
| | Cell surface | GO:0009986 | 0.965 | 98.2 | 97.0 | 100.0 |
| | Cell surface protein complex | GO:0043234 AND 0009986 | 1.000 | 100.0 | 100.0 | 100.0 |
| | Cell wall | GO:0009275 | 1.000 | 100.0 | 100.0 | 100.0 |
| | Intrinsic to the CM | GO:0031226 | 1.000 | 100.0 | 100.0 | 100.0 |
| | Anchored to the CM | GO:0046658 | 1.000 | 100.0 | 100.0 | 100.0 |
| | Integral to the CM | GO:0005887 | 1.000 | 100.0 | 100.0 | 100.0 |
| | Cytoplasm | GO:0005737 | 1.000 | 100.0 | 100.0 | 100.0 |
| | Overall | | 0.988 | 99.5 | 98.5 | 99.9 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.267 | 70.0 | 65.3 | 70.8 |
| | Cell surface | GO:0009986 | n/a | n/a | n/a | n/a |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | n/a | n/a | n/a | n/a |
| | Intrinsic to the CM | GO:0031226 | n/a | n/a | n/a | n/a |
| | Anchored to the CM | GO:0046658 | n/a | n/a | n/a | n/a |
| | Integral to the CM | GO:0005887 | n/a | n/a | n/a | n/a |
| | Cytoplasm | GO:0005737 | 0.420 | 63.5 | 94.0 | 48.3 |
| | Overall | | 0.396 | 66.7 | 85.5 | 60.8 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.298 | 68.5 | 73.5 | 67.7 |
| | Cell surface | GO:0009986 | n/a | n/a | n/a | n/a |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | n/a | n/a | n/a | n/a |
| | Intrinsic to the CM | GO:0031226 | n/a | n/a | n/a | n/a |
| | Anchored to the CM | GO:0046658 | n/a | n/a | n/a | n/a |
| | Integral to the CM | GO:0005887 | n/a | n/a | n/a | n/a |
| | Cytoplasm | GO:0005737 | 0.556 | 71.5 | 100.0 | 57.0 |
| | Overall | | 0.471 | 70.0 | 92.0 | 63.0 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.030 | 73.8 | 20.4 | 82.8 |
| | Cell surface | GO:0009986 | n/a | n/a | n/a | n/a |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | n/a | 91.2 | 0.0 | 100.0 |
| | Intrinsic to the CM | GO:0031226 | 0.572 | 78.8 | 77.7 | 79.7 |
| | Anchored to the CM | GO:0046658 | n/a | n/a | n/a | n/a |
| | Integral to the CM | GO:0005887 | n/a | n/a | n/a | n/a |
| | Cytoplasm | GO:0005737 | 0.749 | 86.8 | 97.4 | 81.2 |
| | Overall | | 0.553 | 82.6 | 69.5 | 87.1 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.248 | 73.5 | 55.1 | 76.6 |
| | Cell surface | GO:0009986 | n/a | n/a | n/a | n/a |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | 0.230 | 89.3 | 23.3 | 95.8 |
| | Intrinsic to the CM | GO:0031226 | 0.627 | 81.8 | 73.6 | 88.0 |
| | Anchored to the CM | GO:0046658 | n/a | n/a | n/a | n/a |
| | Integral to the CM | GO:0005887 | n/a | n/a | n/a | n/a |
| | Cytoplasm | GO:0005737 | 0.745 | 87.7 | 91.5 | 85.9 |
| | Overall | | 0.571 | 83.1 | 72.7 | 86.6 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.310 | 85.0 | 32.7 | 93.8 |
| | Cell surface | GO:0009986 | 0.691 | 83.5 | 76.4 | 93.6 |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | n/a | n/a | n/a | n/a |
| | Intrinsic to the CM | GO:0031226 | n/a | n/a | n/a | n/a |
| | Anchored to the CM | GO:0046658 | n/a | n/a | n/a | n/a |
| | Integral to the CM | GO:0005887 | n/a | n/a | n/a | n/a |
| | Cytoplasm | GO:0005737 | n/a | n/a | n/a | n/a |
| | Overall | | 0.654 | 84.3 | 67.7 | 93.8 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.280 | 86.9 | 20.4 | 97.4 |
| | Cell surface | GO:0009986 | n/a | n/a | n/a | n/a |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | 0.730 | 95.6 | 76.7 | 97.4 |
| | Intrinsic to the CM | GO:0031226 | 0.680 | 84.1 | 72.3 | 93.2 |
| | Anchored to the CM | GO:0046658 | n/a | n/a | n/a | n/a |
| | Integral to the CM | GO:0005887 | n/a | n/a | n/a | n/a |
| | Cytoplasm | GO:0005737 | 0.831 | 92.4 | 88.9 | 94.2 |
| | Overall | | 0.714 | 89.7 | 70.9 | 95.9 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.378 | 87.5 | 34.7 | 95.8 |
| | Cell surface | GO:0009986 | 0.470 | 70.4 | 54.8 | 90.4 |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | 0.888 | 98.3 | 86.7 | 99.4 |
| | Intrinsic to the CM | GO:0031226 | 0.803 | 90.3 | 90.5 | 90.1 |
| | Anchored to the CM | GO:0046658 | 0.802 | 95.7 | 91.9 | 96.2 |
| | Integral to the CM | GO:0005887 | 0.828 | 92.4 | 86.8 | 95.2 |
| | Cytoplasm | GO:0005737 | 0.819 | 90.6 | 100.0 | 85.9 |
| | Overall | | 0.730 | 89.3 | 77.2 | 94.0 |
| | | | | | | |
| | Single | | | | | |
| | Extracellular milieu | GO:0005876 | 0.392 | 88.6 | 26.5 | 98.4 |
| | Cell surface | GO:0009986 | n/a | n/a | n/a | n/a |
| | Cell surface protein complex | GO:0043234 AND 0009986 | n/a | n/a | n/a | n/a |
| | Cell wall | GO:0009275 | 0.866 | 97.9 | 76.7 | 100.0 |
| | Intrinsic to the CM | GO:0031226 | 0.789 | 89.2 | 94.6 | 85.2 |
| | Anchored to the CM | GO:0046658 | 0.825 | 96.0 | 97.3 | 95.9 |
| | Integral to the CM | GO:0005887 | 0.807 | 91.0 | 93.0 | 90.0 |
| | Cytoplasm | GO:0005737 | 0.814 | 90.3 | 100.0 | 85.5 |
| | Overall | | 0.790 | 92.1 | 87.9 | 93.4 |
Subcellular locations follow the GO (Gene Ontology) terms for cellular component.
Performance was evaluated for single and overall SCL predictions for each tool. MCC (Matthews Correlation Coefficient) and other statistical metrics were calculated as described in the Materials & Methods section. Sensitivity, specificity and accuracy are expressed in %. Detailed performance evaluation metrics are provided in Table S7.
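For orientation, the four metrics can be computed from a confusion matrix as sketched below. The sketch assumes the standard textbook definitions of MCC, accuracy, sensitivity and specificity; the exact formulas used in the study are those of the Materials & Methods section, which is not reproduced here. The function name `confusion_metrics` and the example counts are hypothetical.

```python
import math


def confusion_metrics(tp, tn, fp, fn):
    """Compute MCC plus accuracy, sensitivity, specificity (in %).

    tp/tn/fp/fn are true/false positive/negative counts for one
    location. Undefined ratios are returned as NaN.
    """
    nan = float("nan")
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total if total else nan
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else nan
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else nan
    # MCC denominator is zero when any marginal total is zero,
    # e.g. when a tool never predicts the location (tp + fp == 0).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else nan
    return {
        "mcc": mcc,
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }


if __name__ == "__main__":
    # Made-up counts for illustration, not data from the study.
    print(confusion_metrics(tp=50, tn=40, fp=5, fn=5))
```

Under these definitions, a tool that never assigns a given location has an undefined MCC while its sensitivity is 0 and its specificity is 100, which may be why a row can report "n/a" for MCC alongside numeric values for the other metrics.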