Daniel H Haft, Neha Varghese.
Abstract
The rhomboid family of serine proteases occurs in all domains of life. Its members contain at least six hydrophobic membrane-spanning helices, with an active site serine located deep within the hydrophobic interior of the plasma membrane. The model member GlpG from Escherichia coli is heavily studied through engineered mutant forms, varied model substrates, and multiple X-ray crystal studies, yet its relationship to endogenous substrates is not well understood. Here we describe an apparent membrane anchoring C-terminal homology domain that appears in numerous genera including Shewanella, Vibrio, Acinetobacter, and Ralstonia, but excluding Escherichia and Haemophilus. Individual genomes encode up to thirteen members, usually homologous to each other only in this C-terminal region. The domain's tripartite architecture consists of motif, transmembrane helix, and cluster of basic residues at the protein C-terminus, as also seen with the LPXTG recognition sequence for sortase A and the PEP-CTERM recognition sequence for exosortase. Partial Phylogenetic Profiling identifies a distinctive rhomboid-like protease subfamily almost perfectly co-distributed with this recognition sequence. This protease subfamily and its putative target domain are hereby renamed rhombosortase and GlyGly-CTERM, respectively. The protease and target are encoded by consecutive genes in most genomes with just a single target, but far apart otherwise. The signature motif of the Rhombo-CTERM domain, often SGGS, only partially resembles known cleavage sites of rhomboid protease family model substrates. Some protein families that have several members with C-terminal GlyGly-CTERM domains also have additional members with LPXTG or PEP-CTERM domains instead, suggesting there may be common themes to the post-translational processing of these proteins by three different membrane protein superfamilies.
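The tripartite architecture described in the abstract (motif, transmembrane helix, C-terminal basic cluster) can be illustrated with a rough heuristic scan of a protein's tail. This is only a sketch under stated assumptions, not the detection method used in the paper, which relies on the TIGR03501 model: the loose motif pattern, the tail length, and all thresholds below are hypothetical choices made for illustration.

```python
import re

# Residue classes; the hydrophobic set follows the Figure 1 legend.
HYDROPHOBIC = set("LIVMFWYA")
BASIC = set("RK")

# Hypothetical loose pattern around the frequently observed SGGS motif.
MOTIF = re.compile(r"[SGA]GG[SGA]")


def looks_like_glygly_cterm(protein, tail_len=40, min_tm=12,
                            basic_window=6, min_basic=2,
                            min_hydrophobic_fraction=0.6):
    """Heuristic check for motif + hydrophobic stretch + basic C-terminus.

    All thresholds are illustrative assumptions, not published values.
    """
    tail = protein.upper()[-tail_len:]
    match = MOTIF.search(tail)
    if match is None:
        return False
    rest = tail[match.end():]
    if len(rest) < min_tm + basic_window:
        return False
    middle, basic_tail = rest[:-basic_window], rest[-basic_window:]
    # Part 3: a cluster of basic residues at the very C-terminus.
    if sum(r in BASIC for r in basic_tail) < min_basic:
        return False
    # Part 2: a mostly hydrophobic stretch between motif and basic cluster.
    fraction = sum(r in HYDROPHOBIC for r in middle) / len(middle)
    return fraction >= min_hydrophobic_fraction


# Synthetic sequences, invented for this example only.
candidate = "MKT" + "DESTNQ" * 8 + "SGGS" + "LALLALLGLVALLAL" + "RRRKQK"
control = "MKT" + "DESTNQ" * 12
print(looks_like_glygly_cterm(candidate), looks_like_glygly_cterm(control))
```

A scan like this would be expected to produce many false positives and negatives on real proteomes; a profile model is the appropriate tool for actual annotation.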
Year: 2011 PMID: 22194940 PMCID: PMC3237569 DOI: 10.1371/journal.pone.0028886
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Paralogous family alignment of the GlyGly-CTERM domain from Shewanella baltica OS195.
Six sequences are shown through the C-terminal residue, while four sequences are trimmed by up to three residues. Residues are shown colored by type: yellow is hydrophobic (Leu, Ile, Val, Met, Phe, Trp, Tyr, Ala), light blue is helix-breaking (Gly, Pro), green is basic (Arg, Lys), red is hydrophilic (Ser, Thr, Asp, Asn, Glu, Gln), and dark blue is Cys. Only the top two sequences are homologous outside of the region shown. For computation of percent identity among GlyGly-CTERM domains (boxed), the 13th column (an inserted Ser in one sequence) and the last three columns were removed.
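The color scheme in the Figure 1 legend amounts to a lookup from residue to class. A minimal sketch of that mapping follows; the names `RESIDUE_CLASSES`, `COLOR_OF`, and `color_residues` are hypothetical, and treating residues absent from the legend (such as His) as "uncolored" is an assumption.

```python
# Residue classes and colors as listed in the Figure 1 legend.
RESIDUE_CLASSES = {
    "hydrophobic": ("yellow", "LIVMFWYA"),
    "helix-breaking": ("light blue", "GP"),
    "basic": ("green", "RK"),
    "hydrophilic": ("red", "STDNEQ"),
    "cysteine": ("dark blue", "C"),
}

# Flatten to a one-letter-code -> color lookup.
COLOR_OF = {
    residue: color
    for color, residues in RESIDUE_CLASSES.values()
    for residue in residues
}


def color_residues(sequence):
    """Return (residue, color) pairs; unlisted residues are 'uncolored'."""
    return [(r, COLOR_OF.get(r.upper(), "uncolored")) for r in sequence]


print(color_residues("SGGSLKC"))
```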
Figure 2. Sequence logos showing similar domain architectures for GlyGly-CTERM and PEP-CTERM.
Panel A shows a sequence logo based on the 267-sequence revised seed alignment for GlyGly-CTERM model TIGR03501, after removing two columns of >90% gaps. Panel B shows a sequence logo based on the 66-sequence seed alignment for PEP-CTERM model TIGR02595 after removing three columns of >50% gaps.
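Removing gap-rich columns before drawing a logo, as described for both panels, can be sketched as below. This is an illustration only: the function name `drop_gappy_columns`, the representation of an alignment as equal-length strings, and the treatment of `-` and `.` as gap characters are assumptions, and the tiny alignment is invented.

```python
def drop_gappy_columns(alignment, max_gap_fraction):
    """Drop columns whose gap fraction exceeds max_gap_fraction.

    `alignment` is a list of equal-length aligned strings.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")
    n_rows = len(alignment)
    keep = [
        col for col in range(width)
        if sum(row[col] in "-." for row in alignment) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[col] for col in keep) for row in alignment]


# Toy alignment; column 3 is 75% gaps and is removed at a 50% threshold.
toy = ["SG-GS", "SG-GS", "SGAGS", "S--GS"]
print(drop_gappy_columns(toy, 0.5))
```

With `max_gap_fraction=0.9` the call would mirror the >90% criterion quoted for Panel A, and `0.5` the >50% criterion quoted for Panel B.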
Selected genomes containing proteins with GlyGly-CTERM domains.
| Genome | # | Genome | # |
Figure 3. Tail regions of multiple alignments with GlyGly-CTERM domains.
Panel A shows the C-terminal region of selected members from the S8/S53 family of subtilisin-like extracellular serine metalloproteases. The boxed region shows GlyGly-CTERM domains. Together with a poorly conserved spacer region of about fifteen residues, it represents a suffix region that the bottom two sequences lack. Panel B shows the C-terminal region of a multiple sequence alignment of YP_941517.1 from Psychromonas ingrahamii 37 and selected homologs. The region of sequence similarity has no established homology domain definition, although longer homologs contain protease domains. Members of the alignment with GlyGly-CTERM regions (boxed) show variable-length spacer regions. The GlyGly-CTERM region replaces a longer alternative sequence, as seen in the bottom three sequences. Panel C shows an aligned C-terminal region of proteins that share vault protein Von Willebrand factor type A/inter-alpha-trypsin inhibitor homology. The upper box shows GlyGly-CTERM regions. The middle box shows a PEP-CTERM domain, recognized by model TIGR02595, the cognate sequence for an exosortase in Verrucomicrobium spinosum DSM 4136. The lower box shows three examples of an LPXTG domain, recognized by TIGR01167, cognate sequences for a dedicated, strictly Gram-negative sortase (TIGR03784), encoded by an adjacent gene.
Rhombosortases identified by TIGR03902.
| Genomes with GlyGly-CTERM | 104 out of 108 that have rhombosortase |
| One rhombosortase | 103 genomes |
| Two rhombosortases | |
| GlyGly-CTERM but no Rhombosortase | |
| Rhombosortase but no GlyGly-CTERM | gamma proteobacterium HTCC5015 |
Genomes with Rhombosortase and GlyGly-CTERM adjacent to each other.
| Genome | Number of GlyGly-CTERM member proteins | GI of GlyGly-CTERM | GI of Rhombosortase |
| | 1 | 86157459 | 86157460 |
| | 1 | 197121497 | 197121498 |
| | 1 | 153875496 | (none found) |
| | 1 | 192362347 | (none found) |
| | 1 | 34499249 | (distant) |
| | 1 | 194289234 | 194289233 |
| | 1 | 95930867 | 95930866 |
| | 1 | 255059491 | 255059490 |
| | 1 | 261856296 | (distant) |
| | 1 | 171058456 | (distant) |
| | 1 | 149927708 | (none found) |
| | 1 | 149376022 | 149376023 |
| | 1 | 120556021 | 120556022 |
| | 1 | 87121263 | (distant) |
| | 1 | 124265454 | (distant) |
| | 1 | 89094360 | 89094359 |
| | 1 | 182412621 | 182412622 |
| | 1 | 113867149 | 113867148 |
| | 1 | 73540735 | 73540734 |
| | 1 | 241664068 | 241664070 |
| | 1 | 300703137 | 300703135 |
| | 1 | 17547372 | 17547374 |
| | 1 | 88799867 | (distant) |
| | 1 | 32476015 | 32476016 |
| | 1 | 116751108 | 116751109 |
Figure 4. SIMBAL heat map for the rhombosortase SO_2504 of Shewanella oneidensis MR-1.
Values are calculated for all possible subsequences with lengths from 204 (full length) at the apex of the triangular heat map to 6 along the base. Horizontal numbering represents sequence position, marking the center of each subsequence. SIMBAL scores are calculated as the negative log of the probability, according to the binomial distribution, that a BLAST hit list (at an optimized E-value cutoff) for a subsequence from SO_2504 could so strongly favor matches to rhomboid family proteases from species with GlyGly-CTERM sequences over rhomboid family proteases from species without. The peak score, 57.7, occurs for the fifteen-residue peptide QLLGYVGLSGMLHGL, containing the active site residue, Ser-119, and represents the most extreme red color in the heat map. The positions of several key sequence motifs are indicated. The WRxxS/T motif, in loop L1, falls within a hexapeptide centered at 54.5 with a locally high SIMBAL score of 23.6. The sequence Ser-Gly-Met-Leu-His, where Ser-119 is the active site residue and His-123 is the stacking residue for the active site His, belongs to transmembrane helix TM4. The region 176–184 shows the conserved TM6 motif AHxxGxxxG, with the catalytic His and the GxxxG transmembrane dimerization motif [10].
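The scoring and the triangular layout described in this legend can be sketched in a few lines. This is not the SIMBAL implementation: the use of an upper-tail binomial probability, the base-10 logarithm, the background fraction parameter, and the function names are all assumptions, since the legend specifies only "the negative log of the probability, according to the binomial distribution".

```python
import math


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def simbal_like_score(yes_hits, total_hits, background_yes_fraction):
    """Negative log10 of the chance of a hit list this enriched for 'yes'.

    'yes' hits are proteases from species with GlyGly-CTERM sequences.
    Very small probabilities may underflow to 0.0 in floating point.
    """
    prob = binomial_upper_tail(yes_hits, total_hits, background_yes_fraction)
    return -math.log10(prob)


def window_centers(seq_len, min_len=6):
    """Yield (length, center) for every subsequence, 1-based positions."""
    for length in range(min_len, seq_len + 1):
        for start in range(1, seq_len - length + 2):
            yield length, start + (length - 1) / 2


windows = list(window_centers(204))
print(len(windows))             # number of cells in the triangular map
print((6, 54.5) in windows)     # hexapeptide spanning residues 52-57
print((15, 118.0) in windows)   # fifteen-residue peptide spanning 111-125
print(round(simbal_like_score(10, 10, 0.5), 4))
```

For a 204-residue protein with a minimum window of 6, this enumeration gives 199 windows along the base and a single full-length window at the apex. Even-length windows have half-integer centers, which is why the hexapeptide in the legend is centered at 54.5.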