Xinjie Hui1, Zewei Chen1, Junya Zhang1, Moyang Lu2, Xuxia Cai1, Yuping Deng1, Yueming Hu1, Yejun Wang1,3.
Abstract
Gram-negative bacteria harness multiple protein secretion systems and secrete a large proportion of the proteome. Proteins can be exported to the periplasmic space, integrated into a membrane, transported into the extracellular milieu, or translocated into the cytoplasm of contacting cells. Accurate, genome-wide annotation of the secreted proteins and their secretion pathways is therefore important. In this review, we systematically classified the secreted proteins according to the types of secretion systems in Gram-negative bacteria, summarized the known features of these proteins, and reviewed the algorithms and tools for their prediction.
Keywords: Gram-negative bacteria; Prediction; Protein secretion system; Secreted protein; Transmembrane protein
Year: 2021 PMID: 33897982 PMCID: PMC8047123 DOI: 10.1016/j.csbj.2021.03.019
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1 Subcellular localization of Gram-negative bacterial proteins. The dashed arrow shows the translocation process of the proteins.
Fig. 2 Secreted proteins and their transport pathways. The secretion machines are multi-protein complexes with different protein components. The protein transport processes are also indicated: the Sec and Tat pathways secrete proteins from the bacterial cytoplasm to the periplasm or inner membrane; the Lol pathway transports proteins from the periplasmic side of the inner membrane to the periplasmic side of the outer membrane; the Bam and Tam systems transport periplasmic proteins into the outer membrane; T1SSs transport proteins from the bacterial cytoplasm to the extracellular space; T2SSs and T9SSs transport periplasmic proteins to the extracellular space; and T3SSs, T4SSs and T6SSs translocate proteins from the bacterial cytoplasm directly into the host cell cytoplasm. T5SSs are autotransporters that transport themselves extracellularly. The pili and curli proteins are transported across the bacterial outer membrane through T7SSs and T8SSs, respectively. The protein names or component types are shown for each secretion system. OMF, Outer Membrane Factor; MFP, Membrane Fusion Protein; IMC, Inner Membrane Component; SRP, Signal Recognition Particle.
Overview of protein secretion systems and the substrate features in Gram-negative bacteria.
| Secretion system | Secretion step(s) | Membrane spanning | Secretion signal | Substrate state |
|---|---|---|---|---|
| Sec | 1 | Inner | N-terminus | Unfolded |
| Tat | 1 | Inner | N-terminus | Folded |
| T1SS | 1 | Inner + Outer | C-terminus | Unfolded |
| T2SS | 2 (Sec/Tat) | Inner + Outer | N-terminus | Folded |
| T3SS¹ | 1 or 2 (Sec) | Inner + Outer (+Host) | N-terminus | Unfolded |
| T4SS² | 1 | Inner + Outer (+Host) | C-terminus | Unfolded |
| T6SS | 1 | Inner + Outer + Host | N-terminus? | Folded |
| T5SS | 2 (Sec) | Outer | N-terminus | Unfolded |
| Pili/T7SS | 2 (Sec) | Outer | N-terminus | Folded |
| Curli/T8SS | 2 (Sec) | Outer | N-terminus | Unfolded |
| T9SS | 2 (Sec) | Inner + Outer | C-terminus | Folded |
Notes: ¹ T3SSs include non-flagellar T3SSs and flagellar T3SSs. Non-flagellar T3SSs are translocation systems that deliver substrates into host cells in one step, while flagellar T3SSs secrete substrates extracellularly in two steps. ² T4SSs either translocate substrate proteins into host cells, like T3SSs, or transport the proteins into the extracellular milieu.
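As an illustration of how the overview table above might be used programmatically, the following minimal sketch encodes its rows as a lookup structure and filters them by signal location or substrate state. The names `SecretionSystem`, `SYSTEMS` and `systems_with` are hypothetical and introduced only for this example; the values are transcribed from the table.

```python
from typing import List, NamedTuple, Optional


class SecretionSystem(NamedTuple):
    steps: str       # secretion step(s); kept as text because some entries read "1 or 2 (Sec)"
    membranes: str   # membranes spanned
    signal: str      # location of the secretion signal
    substrate: str   # folding state of the substrate during transport


# Rows transcribed from the overview table; the dictionary layout is an assumption.
SYSTEMS = {
    "Sec": SecretionSystem("1", "Inner", "N-terminus", "Unfolded"),
    "Tat": SecretionSystem("1", "Inner", "N-terminus", "Folded"),
    "T1SS": SecretionSystem("1", "Inner + Outer", "C-terminus", "Unfolded"),
    "T2SS": SecretionSystem("2 (Sec/Tat)", "Inner + Outer", "N-terminus", "Folded"),
    "T3SS": SecretionSystem("1 or 2 (Sec)", "Inner + Outer (+Host)", "N-terminus", "Unfolded"),
    "T4SS": SecretionSystem("1", "Inner + Outer (+Host)", "C-terminus", "Unfolded"),
    "T6SS": SecretionSystem("1", "Inner + Outer + Host", "N-terminus?", "Folded"),
    "T5SS": SecretionSystem("2 (Sec)", "Outer", "N-terminus", "Unfolded"),
    "Pili/T7SS": SecretionSystem("2 (Sec)", "Outer", "N-terminus", "Folded"),
    "Curli/T8SS": SecretionSystem("2 (Sec)", "Outer", "N-terminus", "Unfolded"),
    "T9SS": SecretionSystem("2 (Sec)", "Inner + Outer", "C-terminus", "Folded"),
}


def systems_with(signal: Optional[str] = None,
                 substrate: Optional[str] = None) -> List[str]:
    """Return the systems whose table entries match the given filters exactly."""
    return [
        name for name, row in SYSTEMS.items()
        if (signal is None or row.signal == signal)
        and (substrate is None or row.substrate == substrate)
    ]


print(systems_with(signal="C-terminus"))  # ['T1SS', 'T4SS', 'T9SS']
```

Because matching is exact, the uncertain T6SS entry ("N-terminus?") is deliberately not returned by a query for "N-terminus".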
Fig. 3 Sequence features of Sec/Tat SPs. (A) Sec-dependent SPs. There are two types of Sec-SPs: classical (top) and lipoprotein (bottom). Both are composed of an N-terminal region (blue), a hydrophobic region (dark blue) and a C-terminal region (grey). ‘+’ marks the positively charged region. The residue composition patterns of the C-terminal cleavage sites and the corresponding SPases are shown. (B) Tat-dependent SPs. SPs targeted to the Tat pathway have sequence features similar to those of Sec-SPs, but generally have longer N-terminal regions, which often contain a conserved motif with two consecutive arginine residues. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
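The Tat signal peptide features described in this legend (two consecutive arginines in the N-terminal region, followed by a hydrophobic region) can be turned into a crude screening heuristic. The sketch below is only an illustration: the hydrophobic alphabet, the window sizes and the thresholds are assumptions chosen for the example and are not taken from TATFIND, TatP or any other tool.

```python
import re

# Assumed hydrophobic alphabet for this sketch only.
HYDROPHOBIC = set("AVLIMFWC")


def find_twin_arginine(seq: str, n_region_max: int = 35) -> int:
    """Return the index of the first 'RR' in the first n_region_max residues, or -1."""
    match = re.search("RR", seq[:n_region_max])
    return match.start() if match else -1


def has_hydrophobic_core(seq: str, start: int, window: int = 12,
                         min_fraction: float = 0.6, search_len: int = 30) -> bool:
    """True if some `window`-long stretch shortly after `start` is mostly hydrophobic."""
    region = seq[start:start + search_len]
    for i in range(len(region) - window + 1):
        chunk = region[i:i + window]
        if sum(res in HYDROPHOBIC for res in chunk) / window >= min_fraction:
            return True
    return False


def looks_like_tat_sp(seq: str) -> bool:
    """Heuristic: twin arginines near the N-terminus followed by a hydrophobic stretch."""
    seq = seq.upper()
    pos = find_twin_arginine(seq)
    return pos >= 0 and has_hydrophobic_core(seq, pos + 2)


# Made-up toy sequences, not real proteins.
print(looks_like_tat_sp("MSNDQSRRDFLKAALAVAGLAALPASAFAADTPEK"))  # True
print(looks_like_tat_sp("MSNDQSKKDFLKAALAVAGLAALPASAFAADTPEK"))  # False
```

A heuristic like this cannot separate Tat from Sec signal peptides reliably, and it does not predict cleavage sites; the dedicated predictors listed in the tables are meant for that.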
Representative software tools predicting Sec substrates in Gram-negative bacteria.
| Tool | Method | Target | URL or reference |
|---|---|---|---|
| SignalP4 | Artificial Neural Network (ANN) | Sec/SPI; Cleavage site | |
| SignalP5 | Deep Neural Network (DNN) | Sec/SPI; Sec/SPII; Tat/SPI; Cleavage site | |
| Signal-BLAST | BLASTP | Sec/SPI | |
| Signal-3L 2.0 | Hierarchical Mixture Model | Sec/SPI; TMH; Cleavage site | |
| PrediSi | Position Weight Matrix (PWM) | Sec/SPI | |
| Signal-CF | Pseudo Amino Acid Composition; K-Nearest Neighbor Classifier | Sec/SPI; Cleavage site | |
| LipoP | HMM | Sec/SPI; Sec/SPII; TMH | |
| SPEPlip | ANN; Regular Expression Search | Sec/SPI; Lipoprotein; Cleavage site | |
| Phobius/ PolyPhobius | Hidden Markov Model (HMM) | Sec/SPI; Full-protein TM topology | |
| Philius | Dynamic Bayesian Network (DBN) | Sec/SPI; Full-protein TM topology; Protein type | |
| TOPCONS | Consensus prediction | Sec/SPI; Full-protein TM topology; Protein type | |
| SPOCTOPUS | ANN and HMM | Sec/SPI; TMH | |
| MEMSAT3/ MEMSAT-SVM | ANN; Support Vector Machine (SVM) | Sec/SPI; TMH; Re-entrant helix; Protein type | |
| DeepSig | Deep Convolutional Neural Network (DCNN); Grammatical-Restrained Hidden Conditional Random Field | Sec/SPI; Cleavage site | |
| SigUNet | Convolutional Neural Network (CNN) | Sec/SPI | |
| Signal-3L 3.0 | Attention Deep Learning; Window-Based Scoring | Sec/SPI; Cleavage site | |
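Among the methods listed above, a position weight matrix (PWM) is one of the simplest to illustrate. The sketch below builds a log-odds PWM from a handful of aligned windows and scans a sequence for the best-scoring position. The training windows and the query sequence are invented toy data, the uniform background and pseudocount are assumptions, and nothing here reproduces the matrix or scoring scheme of PrediSi or any other listed tool.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pwm(aligned_sites, pseudocount=1.0):
    """Build a log2-odds PWM (uniform background) from equal-length aligned windows."""
    length = len(aligned_sites[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(aligned_sites) + pseudocount * len(AMINO_ACIDS)
    pwm = []
    for pos in range(length):
        counts = Counter(site[pos] for site in aligned_sites)
        pwm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds; only the 20 standard residues are supported."""
    return sum(column[aa] for column, aa in zip(pwm, window))


def best_site(pwm, seq):
    """Return (start, score) of the highest-scoring window in seq."""
    width = len(pwm)
    scored = [(score_window(pwm, seq[i:i + width]), i)
              for i in range(len(seq) - width + 1)]
    score, start = max(scored)
    return start, score


# Invented toy windows and query sequence, for illustration only.
TOY_SITES = ["ALA", "AQA", "AHA", "VSA", "AFA"]
pwm = build_pwm(TOY_SITES)
start, score = best_site(pwm, "MKKTLLGSAQADE")
print(start)  # 8
```

In practice a PWM-based predictor would be trained on many experimentally verified cleavage sites and combined with a score threshold; this sketch only shows the scoring mechanics.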
Representative software tools predicting Tat substrates in Gram-negative bacteria.
| Tool | Method | Target | URL or reference |
|---|---|---|---|
| TATFIND 1.4 | Regular expression pattern; Hydrophobicity analysis | Tat/SPI; Sec/SPI | |
| TatP 1.0 | Regular expression pattern; ANN | Tat/SPI; Sec/SPI; Cleavage site | |
| PRED-TAT | HMM | Tat/SPI; Sec/SPI; Cleavage site | |
| SignalP5 | DNN | Tat/SPI; Sec/SPI; Sec/SPII; Cleavage site | |
Fig. 4 Sequence features of T1SS substrates. T1SSs can be divided into four classes. The substrates of Class 1 T1SSs typically contain N-terminal leader peptides (blue), while those of Classes 2–4 have secretion signal sequences at the C-termini (grey). Consensus sequence motifs are shown for the RTX repeats (light green and pink). RTX repeats are not necessarily present in Class 3 T1SEs.
Representative software tools predicting substrates of T1SSs to T9SSs.
| Secretion System(s) | Tool | Method | URL or reference |
|---|---|---|---|
| T1SS | Linhartova's | Data mining | |
| T1SS | Luo's | Random Forest (RF) | |
| T3SS | SIEVE | SVM | |
| T3SS | SSE-AAC | SVM | |
| T3SS | BPBAac | SVM | |
| T3SS | TEREE | Probability scoring | |
| T3SS | T3SEpre | SVM | |
| T3SS | BEAN/BEAN 2.0 | SVM | |
| T3SS | EffectiveT3 | Naïve Bayes (NB) | |
| T3SS | Modlab | ANN and SVM | |
| T3SS | T3_MM | Markov Model | |
| T3SS | RF model | RF | |
| T3SS | pEffect | PSI-BLAST and SVM | |
| T3SS | GenSET | Voting algorithm | |
| T3SS | DeepT3 | DCNN | |
| T3SS | Bastion3 | Two-layer ensemble model | |
| T3SS | Tbooster | Logistic Regression (LR), RF and SVM | |
| T3SS | orgsissec | Phylogenetic profiles | |
| T3SS | T3SEpp | Multiple features; ensemble models | |
| T3SS | EP3 | Ensemble models | |
| T4SS | S4TE | Motif searching | |
| T4SS | Burstein's | Voting algorithm | |
| T4SS | Lifshitz's | Hidden Semi-Markov Model (HSMM) | |
| T4SS | Chen's | Genetic Screening | |
| T4SS | T4EffPred | SVM | |
| T4SS | T4SEpre | SVM | |
| T4SS | Wang's | SVM | |
| T4SS | PredT4SE-Stack | Stacked generalization | |
| T4SS | Bastion4 | Ensemble model | |
| T4SS | OPT4e | SVM | |
| T4SS | SecReT4 | BLASTp | |
| T4SS | Tbooster | LR, RF and SVM | |
| T4SS | CNN-T4SE | CNN; voting | |
| T4SS | T4SE-XGB | eXtreme gradient boosting (XGBoost) algorithm | |
| T4SS | orgsissec | Phylogenetic profiles | |
| T5SS | twin-HMM | HMM | |
| T5SS | Zude's | Seeded guide trees and HMM | |
| T5SS | Vo's | BLASTp | |
| T6SS | Bastion6 | SVM | |
| T6SS | PyPredT6 | Consensus of MLP, SVM, KNN, NB, RF | |
| T6SS | SecReT6 | BLAST | |
| T6SS | Tbooster | LR, RF and SVM | |
| T6SS | orgsissec | Phylogenetic profiles | |
| T9SS | Veith's | HMM | |
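Many of the methods listed above are general-purpose classifiers (SVM, RF, NB and so on) that need a fixed-length numeric representation of each protein. One simple representation is the amino acid composition of a terminal window, chosen according to where the overview table places the secretion signal (for example, N-terminus for T3SS and C-terminus for T4SS). The sketch below is an assumption-laden illustration: the function names and the 100-residue default window are hypothetical, and it does not reproduce the feature set of any tool in the table.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def terminal_window(seq, length=100, end="N"):
    """Return the first (end='N') or last (end='C') `length` residues of seq."""
    if end not in ("N", "C"):
        raise ValueError("end must be 'N' or 'C'")
    return seq[:length] if end == "N" else seq[-length:]


def aac_vector(seq):
    """Return a 20-element list of residue fractions in AMINO_ACIDS order.

    Non-standard residues are ignored; an empty input yields all zeros.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for res in seq.upper():
        if res in counts:
            counts[res] += 1
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def effector_features(seq, length=100, end="N"):
    """Amino acid composition of the chosen terminal window (hypothetical feature set)."""
    return aac_vector(terminal_window(seq, length, end))


# Toy usage on a made-up sequence: composition of the first two residues, "MS".
features = effector_features("MSSA", length=2, end="N")
print(features[AMINO_ACIDS.index("M")])  # 0.5
```

The resulting vectors could then be passed to any standard classifier; choosing the window length and adding position-specific or evolutionary features are design decisions that this sketch leaves open.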
Fig. 5 Sequence features of the substrates of type 3/4/6 secretion systems. (A) A classical T3SE contains an N-terminus bearing the secretion signal, a C-terminal effector domain, and a chaperone-binding domain (CBD) connecting the two termini. (B) Classical T4SEs (T4aSE/T4bSE) show amino acid preference patterns in the C-terminal regions. Some of these effectors also contain essential translocation-guiding signals in the N-termini. Unlike T4aSS and T4bSS effectors, T4xSS effectors contain a conserved C-terminal domain termed ‘XVIPCD’. (C) Some T6SEs contain a MIX (marker for type six effectors) motif in the N-terminus, which potentially acts as the T6S signal. Other putative catalytic motifs may also be present, as shown in the example proteins (green).
Fig. 6 T5SSs and the features of their substrates. (A) Substrate export by different types of T5SSs. (B) Sequence features of the substrates of different types of T5SSs. The pre-proteins of all these substrates are also Sec substrates and therefore contain SPs at the N-termini. However, autotransporters (T5aSSs), TpsA exoproteins of the two-partner systems (T5bSSs) and trimeric autotransporters (T5cSSs) specifically have extended signal peptide regions (top). A T5aSS contains a passenger domain and a β-barrel translocator domain. Cleavage occurs between the two asparagine residues located between the two domains (red arrow). A T5bSS is composed of two polypeptides, the substrate TpsA and the transporter TpsB. TpsA contains a conserved ‘NPNL’ motif that is essential for its secretion. TpsB and T5dSEs both contain POTRA (polypeptide transport-associated) motifs preceding the putative 16-stranded β-barrel domains at the C-termini. A T5cSS is composed of three polypeptides, while a T5eSS is inverted, with an additional small periplasmic domain at the N-terminus.
Fig. 7 Sequence features and the transport of the T7SS substrates. Pilus subunits contain SPs at the N-termini. The proteins are taken up by their cognate chaperones within the periplasm, and a donor strand complementation (DSC) reaction occurs, in which a motif of four alternating hydrophobic residues (termed P1 to P4) on the chaperone G1 strand is inserted into a hydrophobic groove (known as the P1 to P4 pockets) of the pilus subunit, catalyzing the correct folding of the subunit. Chaperone–usher (CU) pilus subunits also contain a 10–20 residue-long N-terminal extension (Nte) peptide that is conserved in sequence. During CU pilus subunit polymerization, the complementing G1 strand donated by the chaperone is replaced by the Nte of the subunit in the incoming chaperone–subunit complex. This assembly reaction is termed donor strand exchange (DSE). After DSE, the P2 to P5 pockets of the subunit groove are occupied by the hydrophobic residues (termed P2–P5) of the incoming subunit Nte. The P4 Gly residue in Nte sequences is strictly conserved.
Representative software tools predicting TMHs.
| Tool | Method | Target | URL or reference |
|---|---|---|---|
| TOPPred2 | Physicochemical property and statistics based | TMH; TM topology | |
| SOSUI | Physicochemical property and statistics based | TMH | |
| SCAMPI | Physicochemical property and statistics based | TMH; TM topology | |
| PHDhtm | ANN | TMH | |
| MEMSAT3 | ANN | TMH; TM topology; Sec/SPI | |
| SPOCTOPUS | NN + HMM | TMH; TM topology; Sec/SPI | |
| SOMPNN | PNN | TMH | |
| TMSEG | NN + RF | TMH; TM topology | |
| TMHMM 2.0 | HMM | TMH; TM topology | |
| HMMTOP2 | HMM | TMH; TM topology | |
| Phobius/ PolyPhobius | HMM | TMH; TM topology; Sec/SPI | |
| Philius | DBN | TMH; TM topology; Sec/SPI | |
| MEMSAT-SVM | SVM | TMH; TM topology; Sec/SPI; Re-entrant helix | |
| MemBrain | OET-KNN | TMH; Sec/SPI | |
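The physicochemical-property-based entries in the table above rest on the observation that transmembrane helices are long hydrophobic stretches. A minimal way to illustrate the idea is a sliding-window hydropathy scan. The sketch below uses the Kyte-Doolittle hydropathy scale with a 19-residue window and a mean-hydropathy cutoff of 1.6; these settings are commonly quoted for this scale but are treated here as assumptions, and the code does not reproduce TOPPred2, SOSUI, SCAMPI or any other listed tool.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every full window; raises KeyError on non-standard residues."""
    seq = seq.upper()
    return [
        sum(KD[res] for res in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tmh_segments(seq, window=19, threshold=1.6):
    """Merge overlapping windows at or above threshold into half-open (start, end) spans."""
    segments = []
    for i, score in enumerate(hydropathy_profile(seq, window)):
        if score >= threshold:
            if segments and i <= segments[-1][1]:
                # Window overlaps or touches the previous span: extend it.
                segments[-1] = (segments[-1][0], i + window)
            else:
                segments.append((i, i + window))
    return segments


# Made-up toy sequence: a 20-residue hydrophobic core flanked by charged residues.
toy = "DEKR" * 5 + "LLIVA" * 4 + "DEKR" * 5
print(candidate_tmh_segments(toy))
```

Because every window that clears the cutoff is merged, the reported span is wider than the hydrophobic core itself; real predictors refine the helix boundaries and also infer topology, which this sketch does not attempt.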
Representative software tools predicting β-barrel OMPs.
| Tool | Method | Target | URL or reference |
|---|---|---|---|
| Neuwald’s | Motif searching | TM β-strand | |
| Gromiha’s | Physicochemical property, structure and statistics based | TM β-strand | |
| BBF | Physicochemical property, structure and statistics based | β-barrel OMP | |
| BOMP | Physicochemical property, structure and statistics based | β-barrel OMP | |
| transFold | Physicochemical property, structure and statistics based | TM β-barrel; residue side-chain orientations; inter-strand residue contact; strand inclination | |
| HHomp | Sequence similarity searching | β-barrel OMP | |
| Freeman-Wimley | Physicochemical property, structure and statistics based | TM β-barrel | |
| OM-TOPO-PREDICT | NN | TM β-strand; OMP topology | |
| B2TMPRED | NN | TM β-strand; OMP topology | |
| TMBETA-NET | NN | TM β-strand | |
| TMBpro | NN | TM β-barrel; secondary structure; β-contacts; tertiary structure | |
| TBBPred | NN + SVM | TM β-barrel | |
| TMbeta-SVM | SVM (sequential Aac + residue pairs) | β-barrel OMP | |
| PredβTM | SVM (position-specific Aac + residue pairs) | TM β-strand | |
| BOCTOPUS/ BOCTOPUS2 | SVM; HMM | OMP topology | |
| HMMB2TMR | HMM | OMP topology | |
| PROFtmb | HMM (beta-hairpin motifs) | β-barrel OMP; non-β-barrel OMP | |
| PRED-TMBB | HMM | OMP; soluble protein | |
| ConBBPRED | Consensus approaches | β-strand; OMP topology | |
| TMB-Hunt | k-NN | β-barrel TMP; non-β-barrel TMP | |
| IDQD | Quadratic discriminant analysis | β-barrel TMP; TMH; global protein | |
| TMBETADISC-RBF | Radial Basis Function (RBF) Networks | OMP | |
| GRHCRF | Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) | OMP | |
| BetAware | N-to-1 Extreme Learning | β-barrel TMP | |
| BETAWARE | N-to-1 network encoding and ELM training algorithm | β-barrel TMP | |
| MemBrain-TMB | Statistical machine learning | β-barrel TMP | |
| Koehler's | NN | β-barrel TMP; TMH | |