
Genomic discovery and structural dissection of a novel type of polymorphic toxin system in gram-positive bacteria.

Huan Li¹, Yongjun Tan¹, Dapeng Zhang¹,².

Abstract

Bacteria have developed several molecular conflict systems to facilitate kin recognition and non-kin competition to gain advantages in the acquisition of growth niches and of limited resources. One such example is a large class of so-called polymorphic toxin systems (PTSs), which comprise a variety of toxin proteins secreted via T2SS, T5SS, T6SS, T7SS and many others. These systems are highly divergent in terms of sequence/structure, domain architecture, toxin-immunity association, and organization of the toxin loci, which makes it difficult to identify and characterize novel systems using traditional experimental and bioinformatic strategies. In recent years, we have been developing and utilizing unique genome-mining strategies and pipelines, based on the organizational principles of both domain architectures and genomic loci of PTSs, for an effective and comprehensive discovery of novel PTSs, dissection of their components, and prediction of their structures and functions. In this study, we present our systematic discovery of a new type of PTS (S8-PTS) in several gram-positive bacteria. We show that the S8-PTS contains three components: a peptidase of the S8 family (subtilases), a polymorphic toxin, and an immunity protein. We delineated the typical organization of these polymorphic toxins, in which an N-terminal signal peptide is followed by a potential receptor binding domain, BetaH, and one of 16 toxin domains. We classified each toxin domain by the distinct superfamily to which it belongs, identifying nine BECR ribonucleases, one Restriction Endonuclease, one HNH nuclease, two novel toxin domains homologous to the VOC enzymes, one toxin domain with the Frataxin-like fold, and several other unique toxin families such as Ntox33 and HicA. Accordingly, we identified 20 immunity families and classified them into different classes of folds. Further, we show that the S8-PTS-associated peptidases are analogous to many other processing peptidases found in T5SS, T7SS, T9SS, and many proprotein-processing peptidases, indicating that they function to release the toxin domains during secretion. The S8-PTSs are mostly found in animal and plant-associated bacteria, including many pathogens. We propose S8-PTSs will facilitate the competition of these bacteria with other microbes or contribute to the pathogen-host interactions.


Keywords: Fold recognition; Function prediction; Gram-positive bacteria; Kin selection; Polymorphic toxins; S8 peptidases

Year: 2022 PMID: 36051883 PMCID: PMC9424270 DOI: 10.1016/j.csbj.2022.08.036

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155

Introduction

In the world of microbes, competition between these tiny organisms for resources is extremely fierce [1]. As a result, bacteria have developed a large repertoire of weapons, collectively termed biological conflict systems (such as antibiotics/resistance systems [2], toxin-antitoxin systems [3], CRISPR-Cas systems [4], and polymorphic toxin systems [5]) against their rivals during the course of evolution. The discovery of polymorphic toxin systems (PTSs) a decade ago situates them as one of the most recently discovered biological conflict systems, and their characterization revealed that they are perhaps the most complicated and predominant conflict system in bacteria, making them an important subject of continued biological research [5], [6], [7], [8], [9]. The offensive abilities of PTSs rely on their various proteinaceous toxins. Remarkably, these toxin proteins display modular, but polymorphic domain architectures, including N-terminal regions, central repeats/linkers, and C-terminal domains [5]. The N-terminal regions of PTs often serve as recognition modules for specific secretion pathways to export these toxins, including the signal peptides for Type II Secretion System (T2SS) [10], ESPR and TpsA-SD module for T5SS [7], [11], PAAR, HCP1 and VgrG modules for T6SS [5], [12], and WXG, LXG and LDXD modules for T7SS [7], [13], [14]. There are also several specific N-terminal domains that are linked to other atypical secretion modes, such as MuF [15], [16], PrsW [17], MafB [5], [18] and TANDOR [19]. The C-terminal regions of PTs usually contain the toxin/effector modules that target the rival bacteria or other organisms. Comprehensive bioinformatics analysis of these toxin/effector domains has revealed an extraordinary catalytic spectrum, including nucleases, deaminases, peptidases, lipid- and carbohydrate-modifying domains, ADP-ribosyltransferases (ARTs), nucleotidyltransferases, glycosyltransferases, and many others [5], [6], [7], [20].
Besides the modular toxins, another hallmark of PTSs is the presence of immunity proteins whose encoding genes are typically next to the toxin genes on genomic loci [5], [6], [7]. The immunity proteins are usually intracellular, single-domain proteins which can neutralize the cognate toxins by directly binding with them. When the toxins enter the surrounding bacteria, the kin bacteria encoding the cognate immunity protein remain unaffected, while the non-kin bacteria, which do not carry the same cognate immunity protein, will be destroyed [5], [7], [21], leading to kin selection or self/non-self recognition. Furthermore, several studies have demonstrated that the PTSs typically undergo rampant diversifications through both domain shuffling of toxin proteins and rapid remodeling of genomic loci [5], [6], [7], [22]. As a result, even between closely related strains, the toxin modules, as well as the cognate immunity proteins, may belong to distinct families and display unrelated activities [5], [6], [7], [8], [9], despite other modules of toxin proteins remaining conserved. Recent years have witnessed a great expansion of research on many molecular aspects of PTSs. This collective research has led to the identification of numerous toxins and their cognate immunity proteins [23], [24], [25], determination of their structures [26], [27], [28], [29], characterizations of enzymatic activities [18], [30], [31], [32], [33], [34], [35], [36], [37], dissection of details of secretion and regulation [38], [39], [40], discovery of distinct secretion modes [41], [42], as well as the demonstration of the profound roles of PTSs in kin selection and interbacterial competition [41], [43], pathogenicity [44], [45] and biofilm formation [46]. Notably, many of these studies focus on a unique toxin system and its specific components, between which little homology is observed.
This diversity serves as a testimony to the complexity within the PTS universe and indicates the presence of many more novel systems in nature that have yet to be discovered, reinforcing the need for a systematic discovery of PTSs. However, the traditional discovery process using experimental strategies is usually time-consuming. While sequence-based computational approaches can be useful in genome mining, this strategy often fails to identify novel toxin systems based on the existing PTS proteins, largely due to the high diversity among the PT systems (in terms of sequence, structure, and organization of genomic loci). This is especially true for many bacterial pathogens that have developed their own specific mechanisms of pathogenicity. To address the limitations in traditional toxin research, in recent years, we have been developing novel computational strategies, based on the organizational principles in both domain architectures and genomic loci of PTSs, leading to an effective and comprehensive discovery of novel PTSs, dissection of their components, and prediction of their structures and functions. We are confident that effective computational mining of novel PTSs will continue to accelerate the research in toxin-related studies, as demonstrated by several recent works [5], [6], [7], [16], [19], [42], [47]. In this study, we have discovered a novel PT system, S8-PTS (S8-peptidase associated Polymorphic Toxin System) by using our bioinformatic pipelines. The S8-PTS is featured by the presence of a single-domain S8 peptidase gene on the toxin loci, followed by one pair of polymorphic toxin and immunity genes. All the toxins in this system have a typical N-terminal domain which we refer to as BetaH. We extensively identified the toxin modules, which were classified into 16 distinct families, and the associated immunity proteins, which are found to belong to 20 distinct families. 
We also unified the majority of toxin and immunity domains in this system into superfamilies based on their shared structures/folds, conserved motifs, and similar configuration of the active sites. We show that the S8-PTS-associated peptidases are analogous to many other processing peptidases found in T5SS, T7SS, T9SS, and many proprotein-processing peptidases, indicating that they function to release the toxin domains during secretion. Further, we found that S8-PTS is only present in a clade of firmicutes (Bacillales) and several actinobacteria species, most of which belong to animal and plant-associated bacteria, including many pathogens. We predict that S8-PTS might facilitate the competition of these bacteria with other microbes or contribute to the pathogen-host interactions.

Materials and methods

Improved protein domain-centric strategy

Protein sequences that can be unified using sequence similarity-based methods, such as BLAST [48], PSIBLAST [49] and HMMER [50], are usually classified as a protein family. Historically, at the earlier stage of protein classification, the concept of family was applied to full length proteins. However, this is no longer valid since the discovery of modularity of the multidomain (or mosaic) proteins [51]. Domains are basic structural and functional units of proteins and may have separate evolutionary histories even inside the same protein [52]. This is especially true for proteins in PTSs which have extensive domain shuffling features. Hence, all the analyses in this study use domains instead of the full-length proteins. However, defining the boundary of domains using traditional methods is not always accurate as it typically refers to the overall conservation of the full-length sequence. Therefore, in this project, we will combine both the sequence conservation and structural information provided by AlphaFold2 [53] to help us dissect the domains more accurately and effectively (see below). Another bottleneck of domain analysis is the identification and classification of domain families. Here we will utilize CLANS [54] network analysis and structure-based similarity comparison (see below) to solve this problem.

Homologous sequence search and analysis

In order to retrieve homologous sequences, iterative sequence-profile searches were run for each domain family until convergence via the Position-Specific Iterative Basic Local Alignment Search Tool (PSI-BLAST) [49] against the nonredundant protein (nr) database of the National Center for Biotechnology Information (NCBI), with a cut-off e-value of 0.001. For peptidase_S8, since it is a very large family, we ran PSI-BLAST searches against a custom prokaryotic database (11067 genomes with 40,046,196 protein sequences) built by selecting representative genomes of each species from the NCBI FTP site (ftp.ncbi.nih.gov/genomes/genbank/bacteria/ and ftp.ncbi.nih.gov/genomes/genbank/archaea/) in May 2021. The set of homologous sequences for each domain family was used for downstream analyses, including taxonomic distribution and Multiple Sequence Alignment (MSA). For the former, the taxonomy information of the bacteria that contain the BetaH toxins was extracted from NCBI GenPept files and summarized based on phylum and class/order. We also classified the bacteria based on the information of their isolation source and host, which were extracted from the associated GenBank files (genome files) by using Entrez Direct [55] and a custom script. For the latter, the pairwise-based BLASTCLUST program was first used to select representative sequences for each family by removing relatively similar sequences. Then, representative sequences were used to generate MSAs with the KALIGN [56], MUSCLE [57] or PROMALS3D [58] alignment software. The MSAs of the families that were discovered in this study are available in Supplementary File S1. The consensus pattern and conserved residues of alignments were generated by a custom Perl script based on the classification of amino acids [59]. We used the CHROMA [60] tool for visualizing MSAs, which were further edited in Microsoft Word.
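
The class-based consensus described above can be illustrated with a minimal Python sketch. It is not the authors' Perl script: the class memberships, their priority order, the default threshold and the function name `consensus` are all assumptions chosen for illustration, and only the class symbols (h, l, a, +, -, s, u, b, o) follow the Fig. 1 legend.

```python
# Residue classes, checked from most to least specific. The memberships and
# their order are an assumption (a common Taylor-style grouping), not the
# table used by the authors' script.
CLASSES = [
    ("+", set("HKR")),             # positive
    ("-", set("DE")),              # negative
    ("o", set("ST")),              # alcohol
    ("a", set("FHWY")),            # aromatic
    ("l", set("ILV")),             # aliphatic
    ("u", set("AGS")),             # tiny
    ("s", set("ACDGNPSTV")),       # small
    ("h", set("ACFGHIKLMRTVWY")),  # hydrophobic
    ("b", set("EFHIKLMQRWY")),     # big
]


def consensus(alignment, threshold=0.8):
    """Return one consensus symbol per alignment column.

    A column gets the residue letter if a single amino acid reaches the
    threshold, otherwise the first class in CLASSES whose members reach it,
    otherwise '.'. Gap characters count against the fraction.
    """
    if not alignment:
        return ""
    n = len(alignment)
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for col in range(length):
        column = [row[col].upper() for row in alignment]
        counts = {}
        for aa in column:
            if aa != "-":
                counts[aa] = counts.get(aa, 0) + 1
        symbol = "."
        best = max(counts, key=counts.get) if counts else None
        if best is not None and counts[best] / n >= threshold:
            symbol = best
        else:
            for label, members in CLASSES:
                if sum(1 for aa in column if aa in members) / n >= threshold:
                    symbol = label
                    break
        out.append(symbol)
    return "".join(out)
```

For example, `consensus(["KDSL", "RESI", "KDTV", "HE-L"], threshold=0.75)` labels the four columns as positive, negative, alcohol and aliphatic.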

Guilt by association—Domain architecture analysis

It is intuitive that domains inside the same protein cooperate with each other for a common functional aim. This helps us to infer functions of both the protein and its domains, especially given the conserved “secretion-toxin” domain architecture for toxin proteins from PTSs. In our study, the domain architectures for all proteins will be annotated via the HMMSCAN [50] program against domain profiles from both PFAM [61] and our custom PTS profile database. The regions that lack the annotations of existing domains will be further collected and studied to develop new domains and profiles. Other regions including signal peptide and transmembrane regions will be detected by the Phobius program [62].
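
As a rough illustration of how regions lacking existing annotations might be collected, the sketch below computes the stretches of a protein not covered by any annotated interval. The function name `unannotated_regions`, the coordinate convention (1-based, inclusive) and the `min_length` cutoff are hypothetical; the input intervals are assumed to have been parsed beforehand from the HMMSCAN and Phobius outputs.

```python
def unannotated_regions(protein_length, domain_hits, min_length=50):
    """Return 1-based inclusive (start, end) regions not covered by any hit.

    domain_hits is a list of (start, end) intervals, which may overlap.
    Regions shorter than min_length (a hypothetical cutoff) are ignored.
    """
    gaps = []
    position = 1  # first residue not yet covered by a processed hit
    for start, end in sorted(domain_hits):
        if start - position >= min_length:
            gaps.append((position, start - 1))
        position = max(position, end + 1)
    if protein_length - position + 1 >= min_length:
        gaps.append((position, protein_length))
    return gaps
```

With a 400-residue protein and hits at 30-150 and 140-200, the call `unannotated_regions(400, [(30, 150), (140, 200)])` reports the uncovered C-terminal region 201-400 as a candidate for a new domain.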

Guilt by association—Gene neighborhood analysis

Polycistronic mRNA, a single copy of mRNA that can encode multiple proteins, is the hallmark of prokaryotic genomes, while the mRNAs of eukaryotic genomes are mostly monocistronic, meaning one mRNA encodes one protein [63]. The genes of a polycistronic mRNA are organized under a single promoter and the whole genomic locus is termed an operon [64]. The operon is the key concept of our gene neighborhood analysis since genes inside the same operon are generally functionally linked [65]. Specifically, we ran gene neighborhood analysis for BetaH-containing proteins by collecting their upstream and downstream genes (five genes on each side of the query) using the homologous sequences retrieved via PSI-BLAST as the queries. Then, both the BetaH proteins and proteins encoded by neighbor genes were clustered based on sequence similarities via the BLASTCLUST program. Each cluster and its respective proteins were then annotated by their domain architectures via the HMMSCAN program described above. The genes demonstrating a conserved association are determined by both the frequency and taxonomic distribution of their co-occurrence with BetaH genes. Specifically, for the taxonomic distribution, we look for associations conserved across at least two bacterial phyla or complete conservation in a single bacterial phylum.
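
The neighborhood window and the conservation criterion above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the function names `neighbors` and `conserved_associations`, the input layout (one `(phylum, set of neighbor cluster IDs)` pair per BetaH locus) and the reported frequency are all hypothetical.

```python
from collections import defaultdict


def neighbors(gene_order, query_index, flank=5):
    """Genes within `flank` positions on each side of the query gene."""
    lo = max(0, query_index - flank)
    upstream = gene_order[lo:query_index]
    downstream = gene_order[query_index + 1:query_index + 1 + flank]
    return upstream + downstream


def conserved_associations(loci):
    """Select neighbor clusters that show a conserved association.

    loci: list of (phylum, set_of_neighbor_cluster_ids), one per BetaH gene.
    A cluster is kept if it co-occurs with BetaH genes in at least two phyla,
    or in every locus of a single phylum. Returns {cluster_id: frequency},
    where frequency is the fraction of all loci containing the cluster.
    """
    loci_per_phylum = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))  # cluster -> phylum -> count
    for phylum, clusters in loci:
        loci_per_phylum[phylum] += 1
        for cluster in clusters:
            hits[cluster][phylum] += 1
    result = {}
    for cluster, by_phylum in hits.items():
        multi_phylum = len(by_phylum) >= 2
        complete = any(count == loci_per_phylum[phylum]
                       for phylum, count in by_phylum.items())
        if multi_phylum or complete:
            result[cluster] = sum(by_phylum.values()) / len(loci)
    return result
```

In practice a minimum number of loci per phylum would probably be needed as well, since a phylum represented by a single locus trivially satisfies the "complete conservation" test in this sketch.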

Sequence network analysis

The representative sequences from multiple families were combined to run the CLANS program [54]. CLANS first conducts all-against-all BLASTP comparisons and generates a two-dimensional graph in which nodes represent sequences while edges represent detected pairwise similarities between sequences. Then, the layout of the graph is rearranged by the Fruchterman and Reingold force-directed graph drawing algorithm [66]. In short, the sequences with more intra-similarities will be clustered as a colony to represent a family. The final graph helped us to ensure that the families collected by PSIBLAST were not repeated collections of the same family, as unique families would separate into different colonies; this is especially important for identifying novel domains. However, we considered colonies close to each other as the same family if their sequences were collected via the same PSIBLAST search.
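
The idea that mutually similar sequences group into one colony can be approximated, very crudely, by taking connected components of the pairwise-hit graph. The sketch below does exactly that with a union-find structure; it is a simplified stand-in and does not reproduce the force-directed layout that CLANS computes. The function name `cluster_by_similarity` and the input format (tuples of two sequence IDs and an e-value) are assumptions; the default cutoff of 0.0001 mirrors the value quoted in the Fig. 2 legend.

```python
def cluster_by_similarity(sequence_ids, hits, evalue_cutoff=1e-4):
    """Group sequences linked by pairwise hits at or below the e-value cutoff.

    hits: iterable of (id_a, id_b, evalue) from an all-against-all search.
    Returns a list of sorted ID lists, largest group first.
    """
    parent = {s: s for s in sequence_ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if a != b and evalue <= evalue_cutoff:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    groups = {}
    for s in sequence_ids:
        groups.setdefault(find(s), []).append(s)
    return sorted((sorted(g) for g in groups.values()),
                  key=lambda g: (-len(g), g))
```

Unlike a force-directed layout, connected components merge two families as soon as a single significant hit bridges them, so this sketch is best read as a quick first pass rather than a replacement for visual inspection of the network.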

Protein structure modeling and analysis

The structural models for all selected proteins were predicted by AlphaFold2 [53], [67]. AlphaFold2 is a recently published deep learning tool that can return accurate protein structures from just an amino acid sequence. All the predictions were based on the full length of the proteins, and boundaries of each domain were determined by both the structural models and PAE matrices provided by AlphaFold2. The pdb files of all the modeled structures are provided in Supplementary File S2. Next, to obtain structural homologs for understanding each domain, the domain structures were used as inputs for the DaliLite program [68] which searches against the PDB database [69]. The method of DaliLite we used in this study is a local structural comparison algorithm to search similar structures. This structural comparison strategy stems from the fact that structure is more conserved than sequence over the course of evolution [70]. Therefore, highly diverse homologous sequences that are undetectable by sequence- or profile-based methods can be retrieved by structural similarity, combined with the conserved motifs or active sites. The DaliLite search results can be found in Supplementary File S3. Further analyses and visualization of structures were conducted by PyMOL [71].

Phylogenetic analysis

For the phylogenetic analysis of peptidase_S8 family, we selected representative sequences filtered by the BLASTCLUST program and then added our sequences of interest. The added sequences were determined by their specific species, domain architectures or gene neighborhoods. Other sequences were added if they were necessary for the robustness of the tree. After constructing the MSA for the selected sequences, the phylogenetic calculations were conducted by the FastTree program with default parameters [72]. The final trees were visualized by MEGA7 program [73].

Results and discussion

Identification of a novel N-terminal domain of polymorphic toxins

Over the course of studying the domain architectures of previously identified toxin proteins [5], we found that several different C-terminal toxin domains, such as ColD, ColE3, Ntox21 and Ntox35, were coupled with the same, unknown module in their N-termini (Fig. 1A). Iterative sequence searches using PSIBLAST with this region (named the BetaH domain, explained later) as a query further retrieved ∼700 homologs in the NCBI NR database. A global multiple sequence alignment of these homologs revealed that, despite the conservation of the N-terminal BetaH region, the C-terminal regions of these toxin domains displayed a strong variation, suggesting that they contain different domain families (Fig. 1A). Further, we found that almost all BetaH homologs contain a signal peptide at the N-terminus, suggesting that they are secreted out of the membrane through the Sec pathway. These features, including the secretion signal, C-terminal domain variation, and presence of several known toxin modules suggest that BetaH represents a characteristic domain of a new type of polymorphic toxins.

Fig. 1

(A) Multiple sequence alignment (MSA) of the BetaH-containing toxins. The general domain architecture of the BetaH toxins and a representative secondary structure excluding the C-terminal domains are shown above the MSA, while the consensus of amino acids at each column is shown below the MSA, where ‘h’ stands for hydrophobic residues highlighted in light yellow background, ‘l’ for aliphatic residues in light yellow, ‘a’ for aromatic residues in light yellow, ‘+’ for positive residues in blue, ‘-’ for negative residues in light pink, ‘s’ for small residues in light green, ‘u’ for tiny residues in light green, ‘b’ for big residues in grey, and ‘o’ for alcohol residues in orange. In addition, the positively charged and negatively charged residues are highlighted in light purple and blue backgrounds, respectively. The C-terminal regions that contain known toxin domains are highlighted in different backgrounds. (B) The ribbon representation of the BetaH domains as compared to the receptor-binding domain (RBD) of the Colicin N (1A87_A) and the TipC immunity protein (6DHX_A). One representative of the BetaH domain is shown in front and back views, together with the corresponding electrostatic potential surface views, whereas the other variant is shown only in back view to illustrate the β-hairpin insert in the ending α helix. α-helices are shown in red, β-sheets in green, and loops in grey. The basic residues located at the C-terminus of the BetaH domain are shown as sticks. Surface diagrams are colored by electrostatic potential as calculated by APBS in the PyMol program with positively charged regions (>5 mV) colored in blue and negatively charged regions (<−5 mV) in red. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

The BetaH domains are typically around 120 amino acids in length. Multiple sequence alignment (Fig. 1A and Supplementary File S1) revealed that no position or residue is universally conserved across the family. However, this domain is characterized by many acidic residues. Further, a peptidase cleavage motif, characterized by dibasic residues, can be found at the C-terminus of the domains. A structural model predicted by AlphaFold2 showed that the BetaH domain adopts a globular structure comprised of a seven-stranded sheet that wraps around a C-terminal alpha helix (Fig. 1B). Interestingly, structural comparisons (Fig. 1B) revealed that BetaH shares a striking topological similarity with two structures, the N-terminal receptor-binding domain of the toxin Colicin N (ColN-RBD) [74], and an immunity family TipC found in T7SS systems [75]. Both ColN-RBD and TipC utilize the surface of the β sheet to bind protein components such as the target cell receptor [76] and the coupled toxin protein [75]. Similarly, BetaH is characterized by a negatively charged surface on the β sheet, formed by the above-described acidic residues (Fig. 1B). Additionally, BetaH-containing proteins share a domain organization comparable to that of ColN; both are led by the secretion or translocation (TL) signal, followed by BetaH or RBD, and C-terminal toxins. Thus, both the structural and locational similarities suggest that BetaH might function like ColN-RBD in binding to the membrane receptor of the target cell. Given its structural feature, we named the domain BetaH (β sheet and α helix). In order to comprehensively dissect the toxin systems that are associated with this unique BetaH domain, we next conducted a series of protein sequence, structure and genome analyses.

The BetaH domain is coupled with a variety of toxin domains

First, to identify the potential C-terminal domains of the BetaH-containing proteins, we extracted their C-terminal regions and conducted several PSIBLAST searches to collect their homologs. After removing the highly similar sequences, we used a CLAN network analysis to identify the major groups of the homologous sequences, each of which correspond to a domain family (Fig. 2A). To study the function of each domain family, we selected one representative and modelled its structure using AlphaFold2. The distant structure homologs were identified using the DaliLite program. We also identified the evolutionarily conserved residues for each domain and mapped them to the structure model to understand their potential catalytic activity (Supplementary File S1, Fig. 2B–F and Fig. 3). Altogether, we identified 16 major toxin domains among these proteins. Below we specifically describe the function and features of these toxin domains.

Fig. 2

(A) A network clustering analysis of the C-terminal domains of the BetaH toxins. Each node corresponds to a sequence. Straight lines indicate significant high-scoring segment pairs (HSPs) detected by all-against-all BLASTP searches with scoring matrix BLOSUM62 and an e-value cutoff of 0.0001. The graph was generated by the CLANS program, which uses the Fruchterman and Reingold graph drawing algorithm. The known toxin families are labeled in black while the novel toxin families are labeled in various colors. (B-F) Representative ribbon structures of the toxin families discovered in S8-PTS, including nine families of the BECR ribonucleases (B), HicA (C), XHH (D), S8-REase (E) and Ntox33 (F). The α-helices and β-sheets of the structures are shown in red and green, respectively, while the loop and other non-conserved regions are shown in grey. The predicted catalytic residues for each family are shown as balls and sticks, with carbon atoms in white, nitrogen atoms in blue and oxygen atoms in red. The NCBI accession numbers of the representative protein sequences used in the AlphaFold2 predictions are shown in parentheses. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 3

Representative ribbon structures of the VOC toxins (A), FxnL-Toxin (B), and their distantly-related structural homologs.

Nine ribonuclease toxins of the BECR fold

Among the BetaH-coupled toxin domains, nine belong to the BECR ribonuclease fold (Fig. 2B). BECR ribonucleases are metal-independent RNases that are usually found in PTSs and toxin-antitoxin systems [5]. They share a common structural core with one alpha-helix followed by a four-stranded antiparallel β-sheet (αββββ). Based on the DaliLite searches, the newly identified BECR families display significant structural similarity with several known BECR families (Supplementary File S3).
Specifically, the Ntox21-like and ColD-like families show significant sequence similarity and structural similarity with Ntox21 and Colicin D, respectively (Fig. 2A and B). In terms of catalytic mechanisms, known BECR RNases appear to utilize distinct catalytic configurations to fulfill the general acid-base catalysis in cutting RNA molecules [77]. For example, the typical BECR family, Barnase [78], utilizes one conserved negative residue (Glu) as a general base to activate the 2′-OH of RNA which then acts as a nucleophile to attack the adjacent 3′-phosphate, one conserved positive residue (His) as a general acid to stabilize the leaving 5′-OH, and several positive residues to stabilize the penta-covalent transition state [77]. Many other BECR families, e.g. the XendoU family, have active sites utilizing two conserved histidine residues since histidine can function as both a general acid and general base [79]. Aravind et al. also observed that many BECR RNases, such as Colicin D, contain a new conserved alcoholic residue (S/T) in the fourth strand of the BECR core, in addition to two conserved basic residues [5]. Further, for the RelE toxin, a ribosome-dependent RNase that is mostly found in TA systems, three highly conserved residues, two arginines and a tyrosine, are used to configure the catalytic site [80]. For the nine domains belonging to the BECR superfamily that were identified in the BetaH proteins, we predicted their active site configuration based on evolutionary conservation of residues and clustering of their structural location. The active sites appeared to display unique but comparable configurations, such as those found in Ntox35, S8-BECR1 and ColD-like, which commonly involve two highly conserved histidine residues (Fig. 2B). Hence, we propose that these novel toxin families function as RNases in S8-PTSs to fight against rival bacteria.

HicA RNase

HicA is the other RNase domain found in this system (Fig. 2C).
This domain adopts an αβββα topology of the double-stranded RNA-binding domain (dsRBD) fold and has a conserved histidine as the catalytic residue [81], [82]. HicA is known for its primary presence in the TA systems, together with the HicB antitoxin [83]. However, we found the HicA domain is present at the C-termini of several BetaH-containing toxins and an RHS-type polymorphic toxin (i.e. RXZ81544.1). Sequence searches revealed that these HicA domains are further closely related to many HicA proteins belonging to the TA systems (i.e. approximately 52% identity to WP_173297211.1 and 50% identity to WP_139680861.1). Hence, it is possible that HicA toxin domains were recently acquired by the S8-PTS. This is also an example of component exchanges between PTSs and Toxin-Antitoxin systems.

DNases that belong to the HNH fold and the PD-(D/E)XK (restriction endonuclease) fold

We identified two toxin domains, belonging to the HNH and PD-(D/E)XK superfamilies, respectively. Both HNH and PD-(D/E)XK superfamilies are metal-dependent DNases. In our earlier studies, we identified several distinct versions of HNH nucleases in PT systems [5], [7], [84] which all adopt a typical treble-clef fold as a distinct group of zinc fingers [85], but display extensive variations in both structure and construction of catalytic residues. The XHH domain [24] is found in several BetaH toxins (Fig. 2D) and demonstrates the typical HNH fold (a β-hairpin followed by a C-terminal helix), embedded in several additional helices. Its catalytic site is configured by several residues (Fig. 2D), including a conserved HH dyad at the end of the first strand of the fold, where the first histidine functions as one of the metal-chelating ligands and the other histidine is a general base to activate a water molecule as a nucleophile to hydrolyze a phosphoester.
The PD-(D/E)XK superfamily is also known as a Restriction Endonuclease (REase) superfamily since it is the primary type of nuclease in the restriction-modification systems [86]. This fold consists of a sandwich-like αβββαβ core with the first three strands forming an antiparallel sheet while the fourth strand is parallel with the third strand to form a mixed four-stranded sheet in total [87]. The second and third strands are bent heavily towards each other to form a Y-shaped crevice where the active site is located [88]. The typical active site of the PD-(D/E)XK superfamily is configured by five residues involving D/E-D--D/ExK-Q [88], in which the first D/E is located at the N-terminal α-helix, followed by D on the second strand, a (D/E)xK motif on the third strand, and a Q on a following α-helix. Our previous analysis has found a great diversity in the configuration of catalytic residues between the PD-(D/E)XK nucleases [5], [84]. The identified toxin family, namely S8-REase, represents a new example, as the typical D/E residue on the first helix has been replaced by a conserved residue D232 on the second helix of the core (Fig. 2E).

Toxin domains related to the VOC superfamily

While the majority of the toxin domains display DNase or RNase activities, we also identified two novel classes of toxin domains which appear to deploy previously unrecognized toxic activities. The first novel class includes two unique domains, S8-VOC1 and S8-VOC2 (Fig. 3A). Despite having different structures, structure similarity searches via DaliLite revealed that both domains are related to enzymes of the VOC (Vicinal Oxygen Chelate) superfamily, including several glyoxalases (with PDB IDs: 3E5D, 4QB5 and 3HPV).
The VOC superfamily represents a group of diverse metalloenzymes that catalyze very diverse biochemical reactions on a wide range of molecules, including glyoxalases [89], methylmalonyl-CoA epimerases [90], extradiol dioxygenases [89], α-keto acid oxygenases [91], fosfomycin resistance proteins [92] and bleomycin resistance proteins [89]. These enzymes typically contain multiple copies of a characteristic scaffold with a βαβββ topology (VOC fold) in which a mixed parallel sheet was arranged in an order of 1–4-3–2 [93], such as two tandem VOC copies for the glyoxalase I (PDB: 1F9Z) from E. coli [94] and four tandem copies for the extradiol dioxygenase from Pseudomonas cepacia (PDB: 1HAN) [95] (Fig. 3A). The catalytic mechanisms of these enzymes, as indicated by the name of the superfamily, Vicinal Oxygen Chelate, usually involve a metal-binding center that is configured by three or four conserved polar residues (usually Glu and His), provided by two structurally adjacent VOC units, together with two additional vicinal oxygen atoms from nearby water molecules and/or their substrates [93]. Structural comparison and conservation analysis of both S8-VOC1 and S8-VOC2 support that they are new members of the VOC superfamily. First, both domains contain the core of the VOC fold, despite S8-VOC1 having an N-terminal extension and S8-VOC2 having a β-hairpin inserted directly after the core α helix. Second, the evolutionarily conserved residues of both domains, namely four His residues in S8-VOC1 and residues of Arg-Ser-Lys-Lys in S8-VOC2, are clustered on the surface of the β sheet, resembling the metal chelating site of other known VOC families (Fig. 3A). Based on both structural and conservation features, we propose that S8-VOC1/2 will utilize metal chelating mechanisms, like those of other VOC families, to catalyze the molecules of target cells. Further, we identified another previously unrecognized VOC family in the polymorphic CDI proteins (CdiA-VOC; PDB: 5T86_A). 
Its overall structure is more similar to that of S8-VOC2 in having a β-hairpin insert; however, it uses a different set of conserved residues for its potential metal-chelating site. The identification of three distinct VOC families in PT systems suggests that the diversification of the VOC superfamily is largely unexplored, and that the VOC domains might be another major toxin class in the conflict systems. Further, it should be noted that all the canonical VOC families contain two or more copies of the VOC domain, whereas all the new VOC families have only one VOC domain. Therefore, it will be interesting to investigate the hierarchical relationship among VOC domains, and it is possible that some of these one-copy VOC versions represent the ancestral stage of the VOC superfamily [93], [96].

The toxin domain with the Frataxin-like fold (FxnL-toxin)

The second class of domains with potential novel toxin activity is represented by a toxin domain found in the C-terminal region of the protein WP_206392480.1 (Fig. 3B). Structure similarity searches revealed that this domain is significantly related to several members of the Frataxin-like fold, including the typical Frataxin (Dali Z = 6.6, PDB: 4HS5) [97], the Nqo15 subunit of respiratory complex I from Thermus thermophilus (Dali Z = 6.8, PDB: 2FUG_7) [98] and the YjbR domain (Dali Z = 4.5, PDB: 2KFP) [99]. These proteins share a structural core composed of αβββββα elements with the leading and ending α-helices arranged on one layer and the central β-sheet on the other [100]. Previous studies have shown that human Frataxin and its yeast and bacterial homologs are functionally involved in regulating the Fe-S cluster assembly [101], [102], [103]. This activity appears to be associated with a cluster of several family-specific polar and aromatic residues on the surface of the β-sheet, especially Trp80, which interacts directly with the scaffold protein of the Fe-S cluster complex [102].
Importantly, the FxnL-toxin domain displays a cluster of conserved residues, including the aromatic residue Tyr248, in a configuration similar to that of Frataxin (Fig. 3B). Therefore, it is possible that the FxnL-toxin is used to interfere with the Fe-S cluster formation of the rival bacteria, since Fe-S clusters are common cofactors involved in energy metabolism pathways [104].

Ntox33

The last toxin family is the Ntox33 domain. It is an α + β domain with a conserved H-K-D triad as the potential catalytic residues (Fig. 2F). However, neither profile-based nor structure-based similarity searches identified any significant hits. Therefore, the potential function of Ntox33 remains unclear.

Gene neighborhood analysis identified a variety of immunity proteins associated with the BetaH-containing toxins

In addition to the polymorphic toxins, a major feature of the PTSs is the association of the toxins with the immunity proteins and other secretion components on the toxin loci. We thus conducted a systematic analysis on the genomic loci that contain the above-identified BetaH-containing toxins. Both upstream and downstream genes were extracted, and their encoded protein sequences were clustered and annotated using domain architecture. From this analysis, we identified 20 distinct immunity families (Fig. 4A) which are located downstream of the toxin genes [5], [7]. We also utilized the same sequence/structure analysis pipeline to predict their structures and conservation patterns and classify them into a superfamily when possible (Fig. 4, Fig. 5). Below we will specifically describe the structure and features of the major immunity families and their interactions.
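The neighborhood step described above (collecting genes upstream and downstream of each toxin gene and tallying their annotated families) can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Gene` record, the function names, the window size and the two toy loci are all hypothetical and serve only to show the idea of orienting neighbors by the direction of transcription.

```python
from collections import Counter
from typing import NamedTuple


class Gene(NamedTuple):
    locus: str    # hypothetical contig/replicon identifier
    index: int    # gene order along the locus (0-based)
    strand: str   # "+" or "-"
    family: str   # hypothetical domain/family annotation


def neighbors(genes, anchor, window=3):
    """Return (upstream, downstream) genes within `window` positions of
    `anchor`, oriented relative to the anchor's transcription direction."""
    same_locus = sorted((g for g in genes if g.locus == anchor.locus),
                        key=lambda g: g.index)
    before = [g for g in same_locus
              if anchor.index - window <= g.index < anchor.index]
    after = [g for g in same_locus
             if anchor.index < g.index <= anchor.index + window]
    if anchor.strand == "+":
        return before, after
    # On the minus strand, genes with a higher index lie upstream.
    return after[::-1], before[::-1]


def downstream_family_counts(genes, anchor_family, window=1):
    """Tally the families found downstream of every gene of `anchor_family`."""
    counts = Counter()
    for g in genes:
        if g.family == anchor_family:
            _, down = neighbors(genes, g, window)
            counts.update(n.family for n in down)
    return counts


# Toy loci (invented for illustration, not data from the study).
genes = [
    Gene("locusA", 0, "+", "S8-peptidase"),
    Gene("locusA", 1, "+", "BetaH-toxin"),
    Gene("locusA", 2, "+", "FdL-Imm1"),
    Gene("locusB", 0, "-", "TPR-Imm1"),
    Gene("locusB", 1, "-", "BetaH-toxin"),
    Gene("locusB", 2, "-", "S8-peptidase"),
]

print(downstream_family_counts(genes, "BetaH-toxin"))
```

In a real analysis the family labels would come from domain annotation and clustering of the encoded proteins, and the window would be chosen to span the whole operon rather than a single neighbor.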

Fig. 4

(A) A network clustering analysis of the immunity families that are associated with the BetaH toxins. The known immunity families are shown in black while the newly identified ones are shown in various colors. (B-C) Representative ribbon structures of FdL (B) and sPC4L (C) immunity families and their structural homologs.

Fig. 5

Representative ribbon structures of the immunity families including TPR (A), 4Cys-Imm (B), DSL-Imm (C), FxnL-Imm (D), Imm64 (E), S8-Imm1 (F), S8-Imm2 (G) and S8-Imm3 (H).

Six Ferredoxin-like (FdL) immunity families

We found six novel immunity families, categorized into two major types, that adopt the Ferredoxin-like fold. A typical Ferredoxin-like (FdL) fold, belonging to the alpha and beta (α + β) class according to SCOP classification, displays a βαββαβ topology in which the antiparallel sheet is ordered as 4-1-3-2. Four FdL immunity families, namely FdL-Imm1 to FdL-Imm4 (Fig. 4B), are grouped as the first type because they contain two copies of the FdL domain. Notably, the two copies (or units) are not tandemly arranged; instead, the second FdL domain is inserted into the first domain between the first α-helix and the second β-strand. Two other immunity families, namely FdL-Imm5 (or DUF4279 in Pfam) and FdL-Imm6, are grouped as the second type because they contain only one FdL domain followed by some C-terminal extensions. As revealed by structural similarity, this type also contains a CdiI immunity protein from Burkholderia pseudomallei 1026b (PDB: 4G6V_B) (Fig. 4B) [105]. The Ferredoxin-like fold is a pervasive fold shared by numerous domains with different functions, such as the ACT domain for ligand binding [106], the RNA binding domain [107], the archetypal ferredoxin domain found in the iron–sulfur proteins [108], several toxins such as fungal killer toxin (PDB: 1KP6_A) [109] and fungal effector AvrLm4-7 (PDB: 4FPR_A) [110], and CRISPR Cas6 endonuclease (PDB: 3I4H_A) [111].
This is the first report showing that FdL might represent another major fold of immunity families, in addition to SUKH and SuFu, which we have previously identified [5], [7].

Two sPC4L immunity families

Two immunity families, sPC4L-Imm1 and sPC4L-Imm2, are found to belong to the sPC4L superfamily (Pfam Clan: CL0609). The sPC4L fold features a ββββα-ββββα topology which forms two separate β-sheets with two α-helices packed against each other [112]. Both sPC4L-Imm1 and sPC4L-Imm2 share the typical structure, but they differ from other canonical families in that their first β-sheet is three-stranded because the first strand of the typical fold is missing (Fig. 4C). Further, we found that two other CdiI immunity proteins (PDB: 4NTQ_B; PDB: 5FFP_A) also belong to this superfamily (Fig. 4C), suggesting that the sPC4L fold is another versatile scaffold for binding toxin proteins, in addition to its known ability to bind nucleic acids as observed in many families such as PurA [112], MRP [113] and Whirly [114].

Four TPR (tetratricopeptide repeat) immunity families

We found four novel immunity families that are composed of TPR repeats, namely TPR-Imm1 to TPR-Imm4 (Fig. 5A). TPR proteins typically adopt a solenoid structure formed by multiple tandem repeats of a hairpin of antiparallel α-helices [115]. They are well known for their ability to bind short or long peptides via their concave grooves, or to bind or block other globular domains via different sections of their solenoid structures [115]. Interestingly, three novel TPR immunity families (TPR-Imm1 to TPR-Imm3) have at least one conserved negatively charged residue located on the loop between helices (Fig. 5A). This situation resembles that of the CdiI protein (PDB: 6CP8_C), in which a conserved Asp64 on the loop region directly interacts with an Arg299 of its cognate CdiA toxin [26]. This might indicate that the novel TPR families use these conserved negatively charged residues to form polar contacts with their cognate toxins.
Other immunity families

In addition to the above immunity families, there are eight other immunity families that display unique structures. First, 4Cys-Imm contains four conserved cysteines that are packed against each other like two perpendicular knuckles (Fig. 5B). No disulfide bond is formed in the AlphaFold2 prediction, suggesting that the cysteines might instead be involved in binding a metal, such as zinc. Although DALI returned no reliable structural homologs, 4Cys-Imm shows a local structural similarity to the zinc finger of human transcriptional elongation factor TFIIS (PDB: 1TFI_A) (Fig. 5B). Therefore, we believe 4Cys-Imm might represent a new type of zinc finger [85]. Second, DSL-Imm (Double Sheet Layers) represents another novel immunity family, which is composed of two layers of β-sheets (Fig. 5C). No report was found on this fold, but a hypothetical protein (5V77_A) was revealed as a highly significant hit (Z-score = 7.0) in the DaliLite search (Supplementary File S3). Further, like DSL-Imm, some of the homologs of 5V77_A (e.g. WP_172763861.1) are also coupled with toxin genes, including the MafB toxins, supporting the role of the fold as an immunity protein in PTSs [5]. Third, FxnL-Imm (Imm54) is an immunity family with a Frataxin-like fold, a fold also observed in the above-mentioned FxnL-toxin domain (Fig. 5D). However, unlike the FxnL-toxin, which displays a striking conservation of multiple polar and aromatic residues on the surface of the β-sheet (Fig. 3B), FxnL-Imm has no such feature (Supplementary File S1), highlighting their difference in function. This is an interesting case in which different families adopting the same fold can function as either toxin or immunity proteins in PTSs. Further, Imm64 was found to be structurally related to Imm52 (PDB: 6JDP_A) (Fig. 5E) and a CdiI protein (PDB: 6D7Y_B), whereas S8-Imm1 was found to be structurally related to another CdiI protein (PDB: 5T86_I) (Fig. 5F).
Finally, both S8-Imm2 (Fig. 5G) and S8-Imm3 (Fig. 5H) are unique structures, and no structural homologs were found in DaliLite searches. In summary, on the genomic loci of the BetaH-containing toxins, we have identified many novel immunity proteins that can be classified into 20 families. Based on the similarity of the folds, we could further unify some of them into superfamilies. However, the diversity in overall structure between the families and between the superfamilies is still striking, and provides another typical illustration of the polymorphic nature of these toxin systems.

A peptidase is strongly associated with the toxin-immunity pairs

Importantly, our gene neighborhood analysis also revealed that a group of peptidase genes is frequently coupled with the above-identified toxin and immunity genes (Fig. 6). They are located mostly upstream of the BetaH-containing toxin genes and sometimes downstream of the immunity genes (Fig. 6A). Domain analysis revealed that the encoded proteins are led by a signal peptide, suggesting that they are secreted. The peptidase domain belongs to the S8 peptidase family (Pfam ID: PF00082), which features an α/β fold with a 7-stranded parallel β-sheet and an Asp/Ser/His catalytic triad (Fig. 6B). The S8 peptidases, also known as subtilases, represent the second largest family of serine peptidases according to the MEROPS database [116] and are widely distributed in bacteria, archaea and eukaryotes. They are also known as proprotein-processing endopeptidases, as typified by subtilisins in bacteria [117], kexins in yeast [118], cucumisin and SISBT3 in plants [119] and proprotein convertases in mammals [120]. They usually have a multidomain architecture, including an N-terminal signal peptide, a prodomain, a central peptidase domain, and a highly variable C-terminal extension [121], [122], [123]. The peptidase remains inactive until the prodomain is autocatalytically cleaved, a process called zymogen maturation [120], [124]; the peptidase domain then processes its downstream substrates and facilitates their maturation. Examples include neuropeptides and peptide hormones processed by mammalian proprotein convertases [120], killer toxin precursors processed by fungal kexins [125], and bacterial cytolysin precursors processed by lantibiotic leader peptidases [126].

Fig. 6

(A) Representative gene neighborhoods of the S8-PTS. Each gene is presented as a block arrow, in which the major domains of the encoded protein are separately shown as rectangular segments. The direction of the arrow shows the direction of transcription. The loci are labelled by the NCBI accession numbers of the S8 peptidase genes followed by their species names. (B) The ribbon structure of the S8 peptidase that is associated with the BetaH toxin (WP_061680970.1) shown in the first operonic example. The α-helices (shown as cylinders) and β-sheets of the structure are shown in red and blue, respectively, whereas the loop and other non-conserved regions are shown in grey. The predicted catalytic residues are shown as balls and sticks, with carbon atoms in light blue, nitrogen atoms in blue and oxygen atoms in red. (C) Domain architecture and operonic association network of S8-PTS. Direct domain association is indicated by a solid line, whereas operonic association is indicated by a dashed line. The occurrence frequency of each association is shown along the line. The coloring of each domain corresponds to that of the respective domain architectures shown in (A). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

To understand the role of the BetaH-toxin associated S8 peptidase proteins in the system, we collected the bacterial homologs using our custom database (see Methods) and conducted a systematic analysis of their domain architectures and gene neighborhoods in a phylogenetic context. We found that the majority of the bacterial S8 peptidases retain multidomain architectures, typically including an N-terminal signal peptide, a prodomain of the I9 inhibitor family, and several C-terminal domains, in addition to the S8 peptidase domain (Fig. 7). This suggests that they function as prodomain-processing endopeptidases.
By contrast, BetaH-associated peptidases and several other clades of S8 peptidases display unique domain architectures. Further, by studying the domain architectures and gene neighborhoods of these bacterial homologs, we noticed that these clades are involved in different bacterial secretion pathways. One clade of S8 peptidases is directly fused with the C-terminal β-barrel autotransporter domain, the secretion module for T5SSa [127]; another clade is fused with the Por_Secre_Tail domain of the immunoglobulin-like fold, a marker module for T9SS [128]; a third clade, with a solo peptidase domain, is operonically associated with the components of the T7SS. Several studies have shown that these unique clades of S8 peptidases serve as maturation peptidases in processing the substrates that are secreted by these secretion pathways. For example, the T5SSa-S8 peptidases in Bordetella pertussis, after being secreted via the coupled autotransporter [129], are required for the maturation of the filamentous hemagglutinin adhesins [130]. The mycosins, known as T7SS-specific S8 peptidases [131], are found to constitute the trimeric dome-like structure that caps the cavity of the T7SS complex with catalytic sites facing towards the central lumen of the cavity, supporting the possibility that they process the substrates of T7SS, such as the toxins, that pass through the central channel [132]. Therefore, across the bacterial S8 peptidases, the function of processing the associated substrates appears to be a common theme, shared by both the prodomain-processing peptidases and those associated with secretion pathways such as T5SSa, T7SS and T9SS. We therefore propose that the association of S8 peptidases with different secretion pathways is evolutionarily derived and that these peptidases function as processing peptidases to facilitate the release of the associated components, including the above-identified polymorphic toxins.
Importantly, S8 peptidases, such as proprotein convertases, typically cleave precursors at the motif [R/K]-Xn-[R/K]↓, where X can be any amino acid and n = 0, 2, 4 or 6 [133]. Indeed, we have found that the C-terminal regions of BetaH domains feature a patch of positively charged residues, suggesting that they are potential cleavage sites of these peptidases (Fig. 1).
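To make the motif concrete, the sketch below scans a protein sequence for positions that satisfy [R/K]-Xn-[R/K] with n = 0, 2, 4 or 6 and reports the residue after which cleavage would occur. The function name and the toy sequence are hypothetical, and a match to this pattern only marks a candidate site; it is not evidence that an S8 peptidase cleaves there.

```python
def candidate_cleavage_sites(seq, spacers=(0, 2, 4, 6)):
    """Return sorted 1-based positions of the second basic residue in every
    [R/K]-Xn-[R/K] match; cleavage would occur directly after that residue."""
    seq = seq.upper()
    basic = "RK"
    sites = set()
    for i, aa in enumerate(seq):
        if aa not in basic:
            continue
        for n in spacers:
            j = i + n + 1  # index of the residue n positions after the first basic one
            if j < len(seq) and seq[j] in basic:
                sites.add(j + 1)  # convert to 1-based numbering
    return sorted(sites)


# Toy sequence (invented for illustration): K3 and R4 form a dibasic site (n = 0).
print(candidate_cleavage_sites("AAKRAA"))
```

Because basic residues are common, a scan like this will over-predict; restricting it to the C-terminal region of the BetaH domain, where the positively charged patch is observed, would be a natural refinement.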

Fig. 7

Phylogenetic relationship of bacterial S8 peptidases. The phylogeny was constructed using the FastTree method, with the major branches supported by the SH-like local support values. The major clades that display either specific domain architectures or genomic context are highlighted in different colors. Representative domain architectures and gene neighborhoods are also shown along the clades.

S8-PTS, a novel PT system with three components

In conclusion, with our analysis of the genomic loci and protein components, we have identified a new type of polymorphic toxin system. The system has two defining features. The first is the strong association of the S8 peptidase with the toxin and immunity components typical of other PTSs. Therefore, we term it S8-PTS (S8 peptidase-associated PTS) (Fig. 6). Our previous studies have identified several peptidase families in the PT systems. However, the majority of them are present as a component of the toxin proteins, such as HINT, ZU5, PrsW, and the so-called PVC-metallopeptidase [5]. They are typically used to release the C-terminal toxin domains during toxin secretion. S8-PTS is the first PTS that uses an operonically associated peptidase to release the toxins. The other defining feature is the presence of a new N-terminal domain, BetaH, in its toxins. We found that BetaH is structurally related to the receptor-binding domain of the colicin N toxin and that both toxins (ColN and BetaH-containing ones) share a comparable organization, suggesting that BetaH might function similarly in facilitating entry of the toxin into the target cell cytoplasm. In addition to the above-discussed defining features, we observed a striking level of variation in the composition of both the toxins and immunity proteins (Fig. 6), despite the S8-PTS only being found in a small set of bacterial species. The toxins, sharing a common BetaH domain, employ 16 mechanistically distinct toxin domains. Likewise, their immunity proteins can be classified into 20 families. Further, the domain shuffling appears to be mainly between the BetaH domain and the C-terminal toxin domains, whereas the toxin-immunity domain pairs appear to be stable in most cases (Fig. 6B). Additionally, we did not see a strong association between toxin and immunity domains in terms of their folds.
In other words, toxin domains with the same fold can be neutralized by immunity families that have different folds (Fig. 6). For example, different BECR domains are found to be coupled with FdL immunity domains, TPR-like families, 4Cys-Imm and DSL-Imm. Similarly, the six FdL immunity families are found to block at least three classes of toxin domains: two BECR toxins (Ntox21 and Ntox21-like), S8-VOC1, and Ntox33. Not only did we elucidate the organizational principle of the S8-PTS, but we have also provided confident structure models of the components based on the AlphaFold2 algorithm and predicted the potential catalytic residues for toxin domains and interacting residues for immunity families. The newly discovered toxin types, including VOC and FxnL domains, have extended our understanding of the mechanisms of toxin action. Further, the unification of the immunity families suggests, for the first time, that an extensive evolutionary diversification of immunity proteins from a relatively small set of ancestors has paralleled the expansion of the toxins. Finally, the S8-PTS is present mainly in Firmicutes (Bacillales) and Actinobacteria, two specific groups of gram-positive bacteria (Fig. 8 and Supplementary File S4). Gram-positive bacteria have only a single cytoplasmic membrane and their protein secretion is typically mediated by the Sec secretion pathway, which is consistent with the presence of the signal peptide on these BetaH-containing toxins. Analysis of these bacterial species suggests that they are mostly animal- and plant-associated bacteria (Fig. 8 and Supplementary File S4).
Among them, many are pathogens. These include species of the genus Staphylococcus, such as Staphylococcus aureus and Staphylococcus pseudintermedius, both of which cause severe human and animal diseases [134]; Listeria monocytogenes, which causes listeriosis [135]; Mycobacteroides abscessus, which is responsible for a wide spectrum of skin and soft tissue diseases as well as central nervous system infections [136]; and Curtobacterium flaccumfaciens, which causes bacterial wilt and tan spot of dry beans [137]. There are also many bacteria which appear to be beneficial to the host, such as Bacillus velezensis, a plant growth-promoting bacterium [138], and Terribacillus goriensis, which has antagonistic activity against several phytopathogenic fungi [139]. The majority of the genomes have only one S8-PTS locus, but a few have two to four loci, each with a distinct set of toxin and immunity pairs. We propose that these PTSs might facilitate the competition of these bacteria against other microbes or contribute to pathogen-host interactions.

Fig. 8

Taxonomy distribution of the S8-PTS based on the characteristic BetaH domain sequences (left) and classification of host/isolation source of these bacteria based on the retrieved genomes that contain the S8-PTS (right). The number of instances (either protein sequences or genomes) and their proportions associated with each taxonomy or host/source category are shown in the parentheses.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
