
From Root to Tips: Sporulation Evolution and Specialization in Bacillus subtilis and the Intestinal Pathogen Clostridioides difficile.

Paula Ramos-Silva1,2, Mónica Serrano3, Adriano O Henriques3.   

Abstract

Bacteria of the Firmicutes phylum are able to enter a developmental pathway that culminates in the formation of highly resistant, dormant endospores. Endospores allow environmental persistence and dissemination and, for pathogens, are also infection vehicles. In both the model Bacillus subtilis, an aerobic organism, and the intestinal pathogen Clostridioides difficile, an obligate anaerobe, sporulation mobilizes hundreds of genes. Their expression is coordinated between the forespore and the mother cell, the two cells that participate in the process, and is kept in close register with the course of morphogenesis. The evolutionary mechanisms by which sporulation emerged and evolved in these two species, and more broadly across Firmicutes, remain largely unknown. Here, we trace the origin and evolution of sporulation using the genes known to be involved in the process in B. subtilis and C. difficile, and estimate their gain-loss dynamics in a comprehensive bacterial macroevolutionary framework. We show that sporulation evolution was driven by two major gene gain events, the first at the base of the Firmicutes and the second at the base of the B. subtilis group and within the Peptostreptococcaceae family, which includes C. difficile. We also show that early and late sporulation regulons have been coevolving and that sporulation genes entail greater innovation in B. subtilis, with many Bacilli lineage-restricted genes. In contrast, C. difficile more often recruits new sporulation genes by horizontal gene transfer, which reflects its highly mobile genome, the complexity of the gut microbiota, and an adjustment of sporulation to the gut ecosystem.
© The Author(s) 2019. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.


Keywords:  bacterial genome evolution; horizontal gene transfer; sporulation; taxon-specific genes; Bacillus subtilis; Clostridioides difficile


Year:  2019        PMID: 31350897      PMCID: PMC6878958          DOI: 10.1093/molbev/msz175

Source DB:  PubMed          Journal:  Mol Biol Evol        ISSN: 0737-4038            Impact factor:   16.240


Introduction

Bacterial endospores (spores for simplicity) are among the most resilient cells known. Because of their resistance to extremes of several physical and chemical conditions (Setlow 2014), spores can spread and persist in the environment for long periods of time and are found in a variety of ecosystems (Nicholson et al. 2000), from deep within Earth’s crust (Chivian et al. 2008) and the deep sea (Urios et al. 2004; Dick et al. 2006; Setlow 2014; Fang et al. 2017) to the stratosphere (Shivaji et al. 2006), as well as in plants and animals, where spore formers occur as pathogens, symbionts, or commensals (Flint et al. 2005; Hong et al. 2009; Hutchison et al. 2014; Browne et al. 2016). A large proportion of the human gut microbiota consists of anaerobic spore formers, and several species establish symbiotic or commensal associations thought to be important for health (Chase and Erlandsen 1976; Fujiya et al. 2007; Yano et al. 2015; Browne et al. 2016; Almeida et al. 2019; Forster et al. 2019). On the other hand, Clostridium difficile, recently reclassified as Clostridioides difficile (Lawson et al. 2016), a strict anaerobe, is currently a major nosocomial pathogen that can cause a range of intestinal diseases linked to antibiotic therapy worldwide (Bauer et al. 2011; Lessa et al. 2015; Abt et al. 2016). Spore formation by C. difficile, and more generally by intestinal spore formers, allows efficient dispersal of the organisms through the environment and among hosts (Chase and Erlandsen 1976; Lawley et al. 2009; Browne et al. 2016). Spore formers show a variety of morphologies and metabolic styles that include syntrophy, sulfate reduction, and phototrophy, and form spores of different sizes, shapes, and numbers per sporangial cell (Abecasis et al. 2013; Hutchison et al. 2014, and references therein). In spite of this diversity, the basic architectural plan of the spore is largely conserved.
Spores have a core compartment that encloses the DNA, delimited by a membrane and surrounded by a thin layer of peptidoglycan that will serve as the cell wall of the new cell that emerges from spore germination. A second layer of modified peptidoglycan essential for dormancy, the cortex, is surrounded by several proteinaceous layers (Henriques and Moran 2007; McKenney et al. 2013). In the model organism Bacillus subtilis, a multilayered protein coat and a glycosylated crust form the spore surface (Henriques and Moran 2007; McKenney et al. 2013); in other species, such as Bacillus cereus, Bacillus anthracis, and C. difficile, the spore is further enclosed in a glycosylated exosporium (Stewart 2015). Both the structure and composition of the coat and exosporium vary greatly among species (Stewart 2015). Most spore formers are rods, and sporulation begins with the formation of a polar division that produces two unequal cells, the forespore (or future spore) and the mother cell, each with an identical copy of the chromosome but expressing different genetic programs. At an intermediate stage, the forespore is engulfed by the mother cell and becomes isolated from the external medium. At later stages, while gene expression in the forespore prepares this cell for dormancy, the mother cell drives the assembly of the cortex, coat, and exosporium, and eventually triggers its own lysis to release the mature spore (Henriques and Moran 2007; Higgins and Dworkin 2012; McKenney et al. 2013). Entry into sporulation is triggered by the activation, through phosphorylation, of the transcriptional regulator Spo0A (Stephenson and Lewis 2005; Edwards and McBride 2014). Spo0A is conserved among spore formers and its presence is often used as an indicator of sporulation ability (Galperin 2013; Filippidou et al. 2016). The activity of Spo0A is required for the division that creates the mother cell and the forespore.
Following division, gene expression in the forespore and the terminal mother cell is sequentially activated by cell type-specific RNA polymerase sigma factors (Saujet et al. 2014; Al-Hinai et al. 2015; Fimlaid and Shen 2015); prior to engulfment completion, gene expression is governed by σF in the forespore and by σE in the mother cell; following engulfment completion, σF is replaced by σG and σE is replaced by σK. Like Spo0A, sporulation-specific sigma factors are also conserved among spore formers (de Hoon et al. 2010; Galperin et al. 2012; Abecasis et al. 2013). Genetic studies were initially conducted in B. subtilis, an aerobic organism, representative of the class Bacilli, still the best characterized spore former; global genetic and transcriptomics studies in this organism have established a comprehensive catalog of the genes whose transcription is directly controlled by σF, σE, σG, and σK (Eichenberger et al. 2003, 2004; Feucht et al. 2003; Steil et al. 2005; Wang et al. 2006; Meeske et al. 2016). In turn, C. difficile has emerged in recent years as the anaerobic model for the process and is currently the best characterized spore former within the Clostridia (Paredes-Sabja et al. 2014; Al-Hinai et al. 2015; Zhu et al. 2018). As an obligate anaerobe, C. difficile relies on its sporulation ability to spread horizontally between hosts via fecal and oral transmission (Lawley et al. 2009; Janoir et al. 2013). The role of spores in infection, colonization, and persistence of the organism has motivated a number of studies on the control of sporulation. Importantly, several studies have been directed at the identification of the genes under the control of the four cell type-specific sigma factors (Fimlaid et al. 2013; Saujet et al. 2013; Pishdadian et al. 2015), and several genes have been functionally characterized, highlighting conservation of function of the main regulatory proteins, but also important differences relative to the model B. subtilis in the control of sporulation and in the assembly of key structures such as the spore surface layers, specifically the coat/crust and the exosporium (Paredes-Sabja et al. 2014; Saujet et al. 2014; Al-Hinai et al. 2015; Fimlaid and Shen 2015; Stewart 2015; Zhu et al. 2018).
Based on the extensive knowledge of sporulation in B. subtilis, previous comparative genomics studies have led to the identification of a core (i.e., genes conserved in all spore formers), a signature (present in all spore formers but absent in nonspore-forming lineages), and Bacilli-specific sporulation genes (Onyenwoke et al. 2004; Traag et al. 2010; Galperin et al. 2012; Abecasis et al. 2013). More recently, studies on anaerobic spore formers isolated from the human gut have also led to the identification of a signature for sporulation in the gut (Browne et al. 2016). Overall, these studies have contributed to our knowledge of how conservation patterns are related to gene function. Although restricted to Firmicutes, sporulation is a highly diversified process within the phylum (Flint et al. 2005; Galperin 2013; Yutin and Galperin 2013; Hutchison et al. 2014). Still, the evolutionary mechanisms driving this diversity remain largely unknown. Here, we address this question by comparative genomics of the B. subtilis and C. difficile genetic machinery for sporulation. Special focus is given to those genes whose expression is dependent on the cell type-specific sigma factors σF, σE, σG, and σK and that are hence expressed in either the mother cell or the forespore. We show that the σ regulons are highly divergent between the two species, which share only a small fraction of genes. We then map the presence/absence of both sets of sporulation genes in a solid and comprehensive bacterial macroevolutionary framework in order to estimate the origin and evolution of the gene families, from an ancient bacterial ancestor to the contemporary B. subtilis and C. difficile.
Our analyses show common evolutionary patterns in the lineages of the two species, with two major gene gain events having expanded the cell type-specific regulons. Moreover, at each gain event, the cell type-specific regulons appear to grow in similar proportions, suggesting that sporulation stages have been coevolving along the bacterial phylogeny, whereas multiple independent gene losses emerge in some branches of the tree for both Bacilli and Clostridia. We also show that in the evolution of B. subtilis, gene gains more often involve the emergence of taxonomically restricted genes (TRGs) than in C. difficile, where more sporulation genes have homologs in other bacteria and were likely acquired by horizontal gene transfers (HGTs). Still, the two mechanisms are expected to occur to a greater or lesser extent in both lineages and are fully demonstrated here for the genes coding for an exosporium protein of C. difficile, CdeA (Díaz-González et al. 2015), and for SpoIIIAF, a component of the sporulation-essential mother cell-to-forespore secretion system, the SpoIIIA-SpoIIQ complex, conserved across spore formers (Meisner et al. 2008; Fimlaid et al. 2015; Serrano et al. 2016; Morlot and Rodrigues 2018). Lastly, our analysis supports the hypothesis that sporulation emerged at the base of Firmicutes, some 2.5 billion years ago (Ga), with a first major gene gain event possibly in response to the rise in oxygen levels. This major evolutionary step was followed by a series of intermediate gains that culminated in a second major gene gain event corresponding to the specialization of this developmental program and its adjustment to particular ecosystems.

Results

A Small Core and Large Diversity of Sporulation Genes between B. subtilis and C. difficile

B. subtilis is by far the best characterized spore former in terms of sporulation genes, their regulation, and the molecular mechanisms underlying spore morphogenesis (Fawcett et al. 2000; Eichenberger et al. 2003, 2004; Wang et al. 2006; Mäder et al. 2012). C. difficile is the best characterized within the class Clostridia (Fimlaid et al. 2013; Saujet et al. 2013; Paredes-Sabja et al. 2014; Pishdadian et al. 2015). For this study, we have manually compiled 726 sporulation genes for B. subtilis strain 168 (supplementary table S1, Supplementary Material online, BSU) and 307 sporulation genes from the C. difficile strain 630 (supplementary table S1, Supplementary Material online, CD). Each gene is under the direct control of one or more sporulation-specific sigma factors (fig. 1) and thus part of at least one cell type-specific regulon. The regulons have different sizes (table 1): the σE regulon is the largest followed by the σG- and σK-controlled regulons (of similar sizes) with the σF regulon being the smallest.

Fig. 1. Sporulation stages and spore ultrastructure. (A) Depicts the main stages of sporulation as described for the model organism B. subtilis. At the onset of the process, the rod-shaped cells (a) divide asymmetrically to produce a larger mother cell and a smaller forespore (the future spore) (b). Asymmetric division involves PG synthesis within the septum. The mother cell then starts to engulf the forespore (c), eventually releasing it as a free protoplast inside its cytoplasm (d). Following engulfment completion, the forespore is no longer in contact with the external medium and is separated from the mother cell by a system of two membranes that derive from the asymmetric division septum, the inner and outer forespore membranes. Following engulfment completion, the forespore becomes visible as a phase dark body inside the mother cell (e). Synthesis of the primordial germ cell wall takes place from the forespore, whereas synthesis of the spore cortex PG layer is a function of the mother cell. Development of full spore refractility coincides with the formation of the cortex. The final stages in the assembly of the spore coat and crust are also represented (f). Finally, the spore is released into the environment through autolysis of the mother cell (g). Spores will resume vegetative growth through the process of germination. Spo0A is represented in predivisional cells; the cell in which the cell type-specific sigma factors are active in relation to the stages of sporulation is also represented. (B) Transmission electron microscopy (TEM) image of a thin cross section of a B. subtilis (top) and C. difficile (bottom) spore. The main spore structures are labeled in the diagram. Note that the crust layer of B. subtilis spores is not visible in the microscopy image, but its position, at the edge of the outer coat, is indicated in the diagram. The panels on the right show a magnification of the spore surface layers. The diagram identifies the main structures or compartments normally seen by TEM.

Table 1.

Number of sporulation genes collected in the literature per regulon and described in supplementary table S1, Supplementary Material online.

Sporulation Stage | Expression | Regulon | #Genes B. subtilis 168 | #Genes C. difficile 630
Initiation | Mother cell | Spo0A | 6 | 5
Engulfment | Forespore | σF | 81 | 40
Engulfment | Mother cell | σE | 331 | 203
Maturation | Forespore | σG | 156 | 59
Maturation | Mother cell | σK | 147 | 56
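
As a quick check on the regulon sizes in Table 1, the following minimal Python sketch ranks the four cell type-specific regulons and computes each regulon's share of the summed memberships per species. It assumes the flattened counts read as 81/40 (σF), 331/203 (σE), 156/59 (σG), and 147/56 (σK); the names REGULON_SIZES, regulon_shares, and rank_regulons are illustrative and not from the study.

```python
# Gene counts per sigma regulon as read from Table 1 (assumed split of the
# flattened digits): (B. subtilis 168, C. difficile 630).
REGULON_SIZES = {
    "sigmaF": (81, 40),
    "sigmaE": (331, 203),
    "sigmaG": (156, 59),
    "sigmaK": (147, 56),
}


def regulon_shares(sizes, species_index):
    """Return each regulon's share of the summed regulon memberships for one species."""
    total = sum(counts[species_index] for counts in sizes.values())
    return {name: counts[species_index] / total for name, counts in sizes.items()}


def rank_regulons(sizes, species_index):
    """Return regulon names ordered from largest to smallest for one species."""
    return sorted(sizes, key=lambda name: sizes[name][species_index], reverse=True)


bsu_shares = regulon_shares(REGULON_SIZES, 0)  # index 0: B. subtilis 168
cd_shares = regulon_shares(REGULON_SIZES, 1)   # index 1: C. difficile 630

for label, index in (("B. subtilis", 0), ("C. difficile", 1)):
    print(label, rank_regulons(REGULON_SIZES, index))
```

Because a gene can be controlled by more than one sigma factor, the shares are relative to summed memberships rather than to the number of unique genes.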
Homology mapping between B. subtilis and C. difficile genomes was carried out using a bidirectional BlastP approach, followed by selection and analysis of ortholog groups containing sporulation genes. In figure 2, we show that only a small fraction of sporulation genes have candidate orthologs in the two species. Some of these genes are present in the same regulon (fig. 2, orange; see also supplementary table S2, Supplementary Material online): 2 genes in the σF regulon, 28 in the σE regulon, 11 controlled by σG, and 5 controlled by σK, whereas <1% belong to different regulons (fig. 2, yellow; supplementary table S3, Supplementary Material online). A larger proportion of genes correspond to those having candidate orthologs in both genomes but only reported to be involved in sporulation in one species (fig. 2, light purple; supplementary table S4, Supplementary Material online). These proportions vary depending on the regulon, from 11% to 19% in B. subtilis and from 12% to 35% in C. difficile. Finally, the majority of sporulation genes have no candidate orthologs in the two species (fig. 2, dark purple; supplementary table S5, Supplementary Material online) (>70% for B. subtilis and >50% for C. difficile), highlighting the great genetic diversity of the sporulation process. In order to elucidate the emergence of this diversity in a larger context of bacterial evolution, we extended the homology analysis of the known cell type-specific sporulation genes to 258 bacterial genomes (supplementary fig. S1 and supplementary table S6, Supplementary Material online).
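
The bidirectional BlastP step can be illustrated with a reciprocal-best-hit filter. This is a simplified sketch and not the authors' pipeline, which also groups hits into ortholog clusters; it assumes BLAST tabular output with the default 12 columns (bit score in the last column), and the gene identifiers and helper names are hypothetical.

```python
def best_hits(tabular_lines):
    """Map each query to its highest-bit-score subject from 12-column tabular lines."""
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b_lines, b_vs_a_lines):
    """Keep pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b_lines)
    backward = best_hits(b_vs_a_lines)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


def make_row(query, subject, bitscore):
    """Build a 12-column line with placeholder values in the unused columns."""
    return "\t".join([query, subject] + ["0"] * 9 + [str(bitscore)])


# Hypothetical toy hits: genome A proteins searched against genome B and back.
a_vs_b = [make_row("geneA1", "geneB1", 200), make_row("geneA1", "geneB2", 50),
          make_row("geneA2", "geneB2", 120), make_row("geneA3", "geneB1", 90)]
b_vs_a = [make_row("geneB1", "geneA1", 210), make_row("geneB2", "geneA2", 110),
          make_row("geneB2", "geneA1", 40)]

pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy input, geneA3 is left without a candidate ortholog because its best hit, geneB1, points back to geneA1.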

Fig. 2. Clustering of candidate orthologs from a pairwise genome comparison between B. subtilis 168 and C. difficile 630. Only clusters including genes controlled by sporulation-specific sigma factors were selected for further analyses. The fraction of orthologous genes sharing the same sporulation regulon is marked in orange. The fraction of orthologs in different regulons is in yellow. The fraction of genes without orthologs in the regulons but with orthologs in the genome is in light purple. The fraction of genes without orthologs is in dark purple.


Distribution of Sporulation Genes across Bacteria Reveals a Small Core of Conserved Genes and Lineage Specificity

The presence/absence of sporulation genes was mapped across 258 bacterial genomes (supplementary fig. S1 and supplementary table S6, Supplementary Material online) using an optimized homology mapping approach (supplementary fig. S3, Supplementary Material online) (one vs. all and all vs. one BlastP) applied twice: first using B. subtilis 168 as the reference and then using C. difficile 630 (supplementary fig. S2, Supplementary Material online). The results were represented in supermatrices of gene presence (1 or more, when candidate orthologs are present) and absence (0, when no candidate orthologs are present) (supplementary tables S7 and S8, Supplementary Material online), and used to calculate the distribution of sporulation genes across bacterial genomes (fig. 3). Both distributions show that only a small proportion of sporulation genes are conserved in the bacterial genomes (fig. 3). The distribution for sporulation genes from B. subtilis is consistent with unimodality (D = 0.024757, P value = 0.3179), with a peak around 20%, whereas for C. difficile the distribution is bimodal (D = 0.05604, P value = 7.262e-06), with a peak around 20% and a secondary peak at 42% (fig. 3). These results suggest that B. subtilis has more TRGs for sporulation than C. difficile, whose genes appear more frequently in the bacterial genomic data set. To better elucidate the causes underlying these differences, we conducted an evolutionary analysis of sporulation gene families by posterior probabilities in a phylogenetic birth–death model.
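
One way to go from a presence/absence supermatrix to the distributions described above is sketched here in Python. It assumes that the distribution is taken per genome, as the percentage of reference sporulation genes with at least one candidate ortholog in that genome; the toy matrix, genome names, and function names are hypothetical, and the unimodality test itself is not reproduced.

```python
def percent_genes_present(matrix, genome_names):
    """For each genome (column), percent of reference genes with a count of 1 or more."""
    n_genes = len(matrix)
    result = {}
    for j, genome in enumerate(genome_names):
        present = sum(1 for counts in matrix.values() if counts[j] >= 1)
        result[genome] = 100.0 * present / n_genes
    return result


def histogram(percentages, bin_width=20):
    """Count values per bin; the bin key is its lower edge, and 100% joins the last bin."""
    bins = {}
    for value in percentages:
        start = min(int(value // bin_width) * bin_width, 100 - bin_width)
        bins[start] = bins.get(start, 0) + 1
    return dict(sorted(bins.items()))


# Hypothetical supermatrix: rows are reference genes, columns are genomes,
# entries are candidate ortholog counts (0, 1, or more than 1).
genomes = ["genome1", "genome2", "genome3", "genome4"]
supermatrix = {
    "geneA": [1, 1, 0, 0],
    "geneB": [2, 0, 0, 0],
    "geneC": [1, 1, 1, 0],
    "geneD": [1, 0, 0, 0],
    "geneE": [1, 0, 0, 1],
}

per_genome = percent_genes_present(supermatrix, genomes)
counts_per_bin = histogram(per_genome.values())
print(per_genome)
print(counts_per_bin)
```

Counts above 1 are treated the same as 1 here, so paralogs do not inflate a genome's percentage.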

Fig. 3. Distribution of sporulation genes from B. subtilis and C. difficile across the 258 bacterial genomes sampled from NCBI. Distributions are represented in (A) absolute number of genes and (B) relative percentages. Distributions were tested for unimodality using Hartigan’s dip test in R (D = 0.024757, P value = 0.3179 for B. subtilis; D = 0.05604, P value = 7.262e-06 for C. difficile).


A Bacterial Evolutionary Backbone

Prior to the evolutionary analysis of sporulation gene families, we built a solid and comprehensive bacterial phylogeny based on 70 marker genes conserved across the genomic data set. Overall, the resulting tree topology (fig. 4 and supplementary fig. S4, Supplementary Material online) is similar to recent macroevolutionary bacterial trees (Hug et al. 2016; Marin et al. 2017). The phyla Dictyoglomi, Thermotogae, Aquificae, Synergistetes, and Deinococcus-Thermus form an early diverging clade. The second earliest split in the tree corresponds to the divergence of the taxa termed hydrobacteria (which includes Proteobacteria, Bacteroidetes, Spirochaetes, and Fusobacteria) and terrabacteria (which includes Actinobacteria, Cyanobacteria, and the Firmicutes), estimated to have occurred between 2.83 and 3.54 Ga (Battistuzzi et al. 2004; Battistuzzi and Hedges 2009). Within terrabacteria, Firmicutes (monoderms, low GC, mostly gram-positive) is an early diverging phylum currently divided into seven classes: Bacilli, Clostridia, Negativicutes, Erysipelotrichia, Limnochordia, Thermolitobacteria, and Tissierellia (source: NCBI taxonomy). In our study, we included genomes from four of these classes. The major split is between Bacilli (n = 77) and Clostridia (n = 95), but other groups previously raised to the class level emerge within Clostridia, namely, the Negativicutes (n = 11), which comprise diderm bacteria with a gram negative-type cell envelope, and Erysipelotrichia (n = 1) (supplementary fig. S1, Supplementary Material online). Two spore formers (supplementary fig. S1, Supplementary Material online, and fig. 4), Symbiobacterium thermophilum and Thermaerobacter marianensis (gram-positive, GC > 60%), appear as a distinct lineage from the Firmicutes and have been suggested to be an intermediate phylum between the high GC (Actinobacteria) and the low GC (Firmicutes) groups (Ueda et al. 2004; Han et al. 2010). Here, however, they remain as Firmicutes.

Fig. 4. Sporulation gene gain and loss events across the bacterial phylogeny for (A) Bacilli and (B) Clostridia. The total number of sporulation genes present at the root and at the referential tips is highlighted in gray. Gain events are numbered from 1 to 12. Multigene maximum likelihood (RAxML) tree inferred from an alignment of 70 orthologs, with corresponding bootstrap support values in %. The PROTGAMMALG evolutionary model was used to infer the tree with branch support estimated with 100 bootstrap replicates.
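
To make the notion of gain and loss events on a tree concrete, the sketch below reconstructs events for a single gene with Fitch parsimony. This is a deliberately simpler stand-in for the phylogenetic birth–death model used in the study, not a reimplementation of it; the tree, taxon names, presence profile, and the choice to treat an ambiguous root as absent are all assumptions made for illustration.

```python
def leaves(tree):
    """Return the leaf names below a node, left to right."""
    if isinstance(tree, str):
        return (tree,)
    return tuple(name for child in tree for name in leaves(child))


def fitch_sets(tree, states, sets):
    """Bottom-up pass: record the candidate state set (0 absent, 1 present) per node."""
    if isinstance(tree, str):
        sets[tree] = {states[tree]}
        return sets
    left, right = tree
    fitch_sets(left, states, sets)
    fitch_sets(right, states, sets)
    common = sets[left] & sets[right]
    sets[tree] = common if common else sets[left] | sets[right]
    return sets


def gain_loss_events(tree, sets, parent_state=None, events=None, root_default=0):
    """Top-down pass: fix one state per node and list branches where the state changes."""
    if events is None:
        events = []
    preferred = root_default if parent_state is None else parent_state
    state = preferred if preferred in sets[tree] else next(iter(sets[tree]))
    if parent_state is not None and state != parent_state:
        events.append(("gain" if state == 1 else "loss", leaves(tree)))
    if not isinstance(tree, str):
        for child in tree:
            gain_loss_events(child, sets, state, events, root_default)
    return events


# Hypothetical rooted binary tree (nested tuples) and presence profile for one gene.
tree = (("outgroup_1", "outgroup_2"), (("firm_A", "firm_B"), ("firm_C", "firm_D")))
profile = {"outgroup_1": 0, "outgroup_2": 0,
           "firm_A": 1, "firm_B": 1, "firm_C": 1, "firm_D": 0}

events = gain_loss_events(tree, fitch_sets(tree, profile, {}))
print(events)
```

Each event is reported with the leaves under the branch where it occurs, so a gain at the base of a clade followed by a loss in one tip appears as two separate entries. Unlike a birth–death model, parsimony gives no probabilities and ignores branch lengths.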


A Major Gene Gain Event at the Base of the Firmicutes

Supermatrices of presence/absence of sporulation genes (0, 1, >1) sized 726 × 258 and 307 × 258 from reference strains B. subtilis 168 and C. difficile 630, respectively, were used in an evolutionary analysis of sporulation genes across the bacterial phylogeny (supplementary tables S7 and S8, Supplementary Material online, and fig. 4). We first focused on the gene gain events that occurred from the root to the terminal branches of B. subtilis (fig. 4) and C. difficile (fig. 4). Our approach estimates that throughout the evolution of both species, sporulation genes were acquired in two major evolutionary steps. The first major gene gain event occurred at the base of the Firmicutes, comprising at least 166 genes (fig. 4, branch 2; see also supplementary tables S9 and S10, Supplementary Material online), including those coding for Spo0A and the four cell type-specific sigma factors. Accordingly, these genes are found in most of the species and taxonomic groups of spore formers (supplementary fig. S5, Supplementary Material online). The major gene gain event in branch 2 comprised other core sporulation genes, conserved among members of the phylum (supplementary tables S9 and S10, Supplementary Material online). For B. subtilis, gene gains in branch 2 included many of the sporulation signature genes defined by Abecasis et al. (2013) (supplementary table S9, Supplementary Material online, genes highlighted in yellow) and included those involved in the control of σ factor activation or activity, and genes coding for ancillary transcription factors (fig. 5). They also included some genes involved in the engulfment process, synthesis of the spore cortex, and formation of the coat/crust, as well as synthesis and transport of pyridine-2,6-dicarboxylic acid (or dipicolinic acid, DPA) (supplementary fig. S6, Supplementary Material online). The accumulation of chelates of DPA with divalent cations in spores is important for dormancy and resistance to various DNA-damaging agents (Setlow 2014).
The spoIIE and spoIIAB genes, expressed in the predivisional cell under the control of Spo0A, are involved in the cell type-specific activation of σF, whereas spoIIR, spoIIT, and spoIIGA are required for the activation of σE (Konovalova et al. 2014; Bradshaw and Losick 2015; Meeske et al. 2016; Narula et al. 2016, and references therein). The gain of csfB in branch 2 is interesting. This gene, lost from C. difficile, codes for a dual-specificity antisigma factor. CsfB is first produced under the control of σF (Decatur and Losick 1996) and helps prevent premature activity of σG, prior to engulfment completion (Chary et al. 2007; Karmazyn-Campelli et al. 2008; Rhayat et al. 2009; Serrano et al. 2011; Flanagan et al. 2016). Because of the autoregulatory nature of σG, several mechanisms concur with inhibition by CsfB to prevent its premature activity (Mearls et al. 2018). Later in sporulation, CsfB is produced in the mother cell, under the control of σK, and helps prevent protracted activity of σE, which it also binds and inhibits (Serrano et al. 2015; Martinez-Lumbreras et al. 2018). Thus, CsfB enforces the switch from early to late gene expression in both the forespore and the mother cell.
The spoIIIA operon is conserved in all spore formers (Galperin et al. 2012; Morlot and Rodrigues 2018). In both B. subtilis and C. difficile, spoIIIA is expressed in the mother cell under the control of σE and codes for eight proteins, SpoIIIAA to H (Crawshaw et al. 2014; Morlot and Rodrigues 2018). Together with additional proteins, including the forespore-specific SpoIIQ (gained in branch 3) and the mother cell-specific GerM (gained in branch 2 but absent from the Clostridia), the SpoIIIAA-H proteins are required for the assembly of a transenvelope secretion complex, also known as a channel or “feeding tube”, whose main role is to maintain metabolic potential in the forespore, including σG-dependent transcriptional activity (Meisner et al. 2008; Camp and Losick 2009; Doan et al. 2009; Rodrigues, Henry, et al. 2016; Rodrigues, Ramirez-Guadiana, et al. 2016; Trouve et al. 2018). SpoIIIAH and SpoIIQ directly interact in the interspace and form a direct connection between the two cells (Meisner et al. 2008). The later appearance of SpoIIQ, in branch 3, suggests that initially the secretion complex did not make a direct synapse with the forespore cytoplasm. It is interesting to note two other functions of SpoIIQ. First, the SpoIIIAH-SpoIIQ zipper-like interaction also makes a contribution to engulfment (Blaylock et al. 2004). Second, a conserved membrane-embedded amino acid in SpoIIQ blocks transcription of csfB by σG, maximizing the activity of this sigma factor following engulfment completion (Flanagan et al. 2016).
The appearance of the σG-controlled spoIVB gene, coding for a signaling protease, is concomitant with the appearance of bofA, spoIVFA, and spoIVFB in the σK regulon (fig. 5 and supplementary table S9, Supplementary Material online). σK is produced as an inactive proprotein that associates with the forespore outer membrane; the pro-σK-processing protease, SpoIVFB, is maintained in an inactive complex in the forespore outer membrane by BofA and SpoIVFA (Cutting et al. 1990; Ricca et al. 1992; Resnekov and Losick 1998; Rudner and Losick 2002; Zhou and Kroos 2004). Release of SpoIVFB from inhibition requires production of SpoIVB in the forespore (Cutting et al. 1991; Dong and Cutting 2003; Konovalova et al. 2014; Halder et al. 2017). This ensures that the activation of σK, which drives the final stages in the assembly of the spore protective structures (cortex and coat/crust) (fig. 1) as well as mother cell lysis, is coupled to engulfment completion, which marks the onset of σG activity in the forespore. The bofA, spoIVFA, and spoIVFB genes are absent from the C. difficile genome, in which σK lacks a prosequence (Pereira et al. 2013), as well as from other Clostridia species (Galperin et al. 2012) (fig. 5). The gene for a second forespore-produced protease that contributes to this signaling cascade in B. subtilis, ctpB (Pan et al. 2003; Campo and Rudner 2007; Mastny et al. 2013), is present from the origin (supplementary table S9, Supplementary Material online). Also gained in branch 2 are SpoIIID, a mother cell-specific transcription factor that works with σE, and SpoVT, forespore-specific, which works with σG; SpoIIID and SpoVT both act as either activators or repressors, and define, with their cognate main transcriptional regulators, a series of type I coherent and incoherent feed-forward loops (FFLs) that result in pulses or delayed and protracted gene expression that fine-tunes the genetic programs (Eichenberger et al. 2004; Wang et al. 2006; de Hoon et al. 2010).

Phylogenetic profile showing the conservation of gene presence in the following functional categories: FFLs (dark green) and regulation of activation and activity of σF (orange), σE (purple), σG (pink), and σK (light green). Conservation of gene presence is based on the homology mapping approach described in supplementary figure S2, Supplementary Material online, averaged for collapsed tips that included more than 1 species (e.g., Ruminiclostridium, Actinobacteria) and visualized with EvolView (Zhang et al. 2012). Full circles indicate gene presence above 50%. Half circles indicate gene presence at 50%. No circle indicates gene presence lower than 50% or total absence. On the top, gene reference sequences, names and functional categories were selected based on the B. subtilis sporulation network. Members of the ortholog cluster spoiIIIAF are inside the rectangle.
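
The symbol rule described in this legend can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline: the function names `average_presence` and `presence_symbol` are hypothetical, and the input is assumed to be one presence/absence call per species in a collapsed tip.

```python
def average_presence(calls: list[bool]) -> float:
    """Average presence/absence calls over the species of a collapsed tip."""
    if not calls:
        raise ValueError("a tip must contain at least one species")
    return sum(calls) / len(calls)


def presence_symbol(fraction: float) -> str:
    """Map averaged gene presence (0 to 1) to the legend symbol.

    Hypothetical helper: thresholds follow the legend (above 50%, at 50%,
    below 50% or absent); the return labels are illustrative.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction > 0.5:
        return "full"   # gene presence above 50%
    if fraction == 0.5:
        return "half"   # gene presence at exactly 50%
    return "none"       # below 50% or total absence
```

For a collapsed tip with four species of which two carry the gene, `presence_symbol(average_presence([True, True, False, False]))` falls into the half-circle class.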

In B. subtilis, engulfment requires the concerted action of a three-protein machine, formed by the σE-controlled SpoIID, SpoIIM, and SpoIIP proteins (Abanes-De Mello et al. 2002; Broder and Pogliano 2006; Chastanet and Losick 2007; Morlot et al. 2010; Leidal et al. 2012; Ojkic et al. 2016). These genes are conserved in C. difficile, although spoIIM is largely dispensable for engulfment (Dembek et al. 2018; Ribis et al. 2018). While spoIID, which codes for a lytic transglycosylase (Abanes-De Mello et al. 2002; Chastanet and Losick 2007; Gutierrez et al. 2010; Morlot et al. 2010), is present from the origin and was later recruited to the σE regulon, the spoIIM and spoIIP genes appeared in branch 2 (fig. 5 and supplementary table S9, Supplementary Material online). Membrane fission at the end of the engulfment process requires FisB (Doan et al. 2013), whose coding gene is also found in branch 2 (fig. 5 and supplementary table S9, Supplementary Material online). The spoVA operon (σG regulon) and spoVV (ylbJ, σE regulon), involved in the uptake of DPA into the forespore (Tovar-Rojo et al. 2002; Meeske et al.
2016; Ramirez-Guadiana et al. 2017), also appeared in branch 2; it is interesting to note that gain of the spoVA operon is concomitant with the gain of the genes coding for the DPA synthetase (spoVFA and spoVFB) into the σK regulon of B. subtilis (supplementary fig. S6 and supplementary table S9, Supplementary Material online; see also the Discussion). spoIVA codes for a morphogenetic ATPase required for the formation of a coat basal layer and the targeting of the coat proteins to the surface of the developing spore (Roels et al. 1992; Stevens et al. 1992; Driks et al. 1994; Ramamurthi and Losick 2008; Castaing et al. 2013). Of note, genes required for synthesis of the spore cortex, such as murB, murG, murF, spoVE, spoVD, and spoVR, are present from the origin as they have general functions in the synthesis of peptidoglycan and have a broad taxonomic distribution; they were co-opted to the mother cell-specific line of gene expression (supplementary fig. S6 and supplementary table S9, Supplementary Material online) (Popham and Bernhards 2015). spoVR (ortholog group no. COG2719; Beall and Moran 1994; Huerta-Cepas, Szklarczyk, et al. 2016), for example, is present in Proteobacteria and Archaea. Other genes required for cortex synthesis appeared in branch 2, such as spoVB, and dacB or pdaA, the latter required for the production of muramic δ-lactam, a sporulation-specific modification of the cortex peptidoglycan, in both organisms (Fukushima et al. 2002; Saujet et al. 2013; Popham and Bernhards 2015; Diaz et al. 2018). In general, the presence of the σF activation pathway, the σF to σE, σE to σG, and σG to σK cell–cell signaling/activation pathways, and the presence of CsfB and of the ancillary factors SpoIIID and SpoVT show that key mechanisms enforcing the careful orchestration of the developmental program in the two cells and its coupling to the course of morphogenesis were established early.
Also in general, most of the sporulation genes at branch 2, as expected, are also found in C. difficile (supplementary tables S9 and S10, Supplementary Material online). Together, this analysis indicates that the main morphological plan of sporulation and the required effector proteins were established early, at the basis of the Firmicutes.

A Second Major Gene Gain Event at Terminal Branches

A second major gene gain event occurred at terminal branches, at the base of the B. subtilis group (Berkeley et al. 2002) (fig. 4; +90 genes, branch 7) as well as within members of the family Peptostreptococcaceae (Clostridium cluster XI) (fig. 4; +73 genes, branch 10), at the branch that predates the split between the genera Clostridioides, Paeniclostridium, Paraclostridium, and Terrisporobacter (Galperin et al. 2016). For B. subtilis, genes gained in branch 7 include several σK-controlled determinants of coat/crust assembly such as safA, cotO, cotW, cotV, cotM, and cgeABCD (supplementary table S9, Supplementary Material online). In B. subtilis, the coat is differentiated into a lamellar inner coat, a striated, electron-dense outer coat, and a crust (fig. 1; McKenney et al. 2013). While cotE, acquired in branch 3 (i.e., Bacilli specific, supplementary table S9 and supplementary fig. S6, Supplementary Material online), is required for the assembly of the outer coat (Zheng et al. 1988; Driks et al. 1994; de Francesco et al. 2012; Nunes et al. 2018), safA drives assembly of the inner coat (Takamatsu et al. 1999; Ozin et al. 2000; Ozin et al. 2001) and has an additional role in linking the coat to the underlying cortex (Fernandes et al. 2018; Pereira et al. 2018). Neither cotE nor safA is conserved in C. difficile. Several genes involved in crust assembly appeared in branch 7. cotV and cotW form a transcriptional unit in a cluster of genes (cotVWXYZ, with cotO downstream of cotZ and convergently oriented) required for crust assembly (McKenney et al. 2010; Imamura et al. 2011; Shuster et al. 2019). At the basis of the hierarchy for crust assembly is CotZ, required for the localization of most crust proteins including the products of the cgeABCD operon, which are glycosylation enzymes (Shuster et al. 2019). CotO, in turn, has a role in encasement of the spore by the crust (Shuster et al. 2019).
Since at least cotX appeared in branch 5 (an intermediate Bacilli lineage, excluding Paenibacillaceae and Alicyclobacillaceae; supplementary table S9, Supplementary Material online), the plan for crust assembly may have been laid down at this stage, and the addition of the cotVW and cgeABCD modules represents a specialization of this structure within the B. subtilis group. Several genes were gained at the basis of the Peptostreptococcaceae, in branch 10 (see supplementary table S10, Supplementary Material online). Among them are the genes coding for the spore germination proteins SleC, CspBA, and CspC; these genes are part of the σK regulon and are thus expressed late in the mother cell. cspC is also found in C. perfringens and in C. lentocellum, but we found no evidence that the gene was gained at the basis of the Clostridium clade; the most likely scenario is thus that cspC represents a novelty of the Peptostreptococcaceae. CspBA, on the contrary, has homologs in Paraclostridium bifermentans, Paeniclostridium sordellii, and Terrisporobacter glycolicus and, interestingly, in gut organisms belonging to the segmented filamentous bacteria, Candidatus Arthromitus sp.; the best BlastP hit outside the Peptostreptococcaceae group is with a Candidatus isolate, suggesting HGT from/to this or from/to highly related organisms. Importantly, the appearance of cspC and cspBA in branch 10 seems to represent a clear adaptation to the gut environment as these genes are essential for the germination of C. difficile spores in response to bile salts (Bhattacharjee et al. 2016; Setlow et al. 2017; Zhu et al. 2018). CspBA is subject to interdomain processing by a conserved protease, YabG, which is present from branch 2 and is under the control of σK (supplementary table S9, Supplementary Material online) (Kevorkian and Shen 2017; Shrestha et al. 2019). CspB is a subtilisin-like serine protease involved in the activation, together with CspC, of pro-SleC (Adams et al.
2013; Francis et al. 2013; Kevorkian et al. 2016; Kevorkian and Shen 2017; Shrestha et al. 2019). SleC, in turn, is a hydrolase required for degradation of the cortex peptidoglycan during spore germination (Burns et al. 2010; Gutelius et al. 2014); production of SleC from Pro-SleC is also under the control of YabG (Kevorkian et al. 2016).

Intermediate Gains

Other intermediate gains, between branches 2 and 7/10, occurred in the evolutionary path of B. subtilis and C. difficile. For example, for B. subtilis, several important genes, directly involved either in the morphogenesis of the spore structures or in the regulation of the transcriptional cascade, appeared in branch 3. The cotE gene emerged at the basis of Bacilli; this suggests that a striated electron-dense outer coat-like layer evolved before the inner coat, and that clearly defined inner coat layers are a specialization of the B. subtilis group (supplementary table S9 and supplementary fig. S6, Supplementary Material online; see above). Also, spoVM appeared in branch 3; it codes for a small peptide that recognizes positive curvature and helps in the localization of SpoIVA to the spore surface, and later, together with SpoVID, promotes encasement of the spore by the proteins that form all the layers of the coat/crust, including SpoIVA (Levin et al. 1993; Ramamurthi et al. 2006; Ramamurthi and Losick 2008; Wang et al. 2009; McKenney and Eichenberger 2012). The localization of SpoIVA did not initially rely on SpoVM, a key function of which seems to be in the coordination of encasement. SpoVID interacts with SpoIVA, CotE, and SafA (Costa et al. 2006; Mullerova et al. 2009; de Francesco et al. 2012; Qiao et al. 2012, 2013) and acts as a hub that connects the inner and outer coat modules (Nunes et al. 2018). Thus, branch 3 represents a critical point in the evolution of the spore surface layers. Moreover, two other genes coding for transcription factors that modulate the expression of genes in the σE and σK regulons, many of which are involved in coat/exosporium assembly, also appear in branch 3. These are ylbO, also known as gerR, and gerE.
These genes code for ancillary transcription factors that establish additional incoherent (GerR) and both coherent and incoherent (GerE) type I FFLs that, with SpoIIID (above), further subdivide the σE and σK regulons (Eichenberger et al. 2004; Wang et al. 2006; de Hoon et al. 2010). Also of note is the gain of bofC, coding for a negative regulator of the σG to σK cell–cell signaling pathway (Gomez and Cutting 1997; Wakeley et al. 2000; Wang et al. 2006), in branch 5 (fig. 5 and supplementary table S9, Supplementary Material online). Finally, the yabK gene (also known as fin) appeared in branch 3; YabK is produced under the control of σF and binds to the β′-subunit of RNA polymerase, preventing docking of σF (Camp et al. 2011; Wang Erickson et al. 2017). YabK thus establishes a negative feedback loop that modulates the activity of σF (Wang Erickson et al. 2017). It appears that although the main mechanisms known to control the activity of the cell type-specific sigma factors were established at the basis of the Firmicutes (branch 2; see above), additional control mechanisms continued to be introduced. Of note, SpoVM is found in branch 3 although in C. difficile it is not essential for spore formation (Ribis et al. 2017), whereas SipL (Entrez GI:126701193), which performs a function similar to that of SpoVID in B. subtilis (Putnam et al. 2013; Touchette et al. 2019), emerged later, at an intermediate branch, before the divergence of the Halanaerobiales and Natranaerobiales groups (fig. 4B). Also, CotL (Entrez GI:126698652), important for the assembly of the coat/exosporium (Alves Feliciano et al. 2019), appeared in branch 11, at the base of the Clostridioides. Similarly, several of the genes specifically involved in the assembly of the exosporium layer have appeared in terminal branches from 10 to 12 (supplementary table S10, Supplementary Material online, see below).

Coevolution of Early and Late Sporulation Stages

The cell type-specific sporulation genes belong to at least one regulon and thus it is possible to explore the effect of gene gain events in the size of the regulons as a function of the evolutionary steps for B. subtilis and C. difficile, highlighted in green in figure 4 (branches 1–12). The general pattern for B. subtilis (fig. 6) and C. difficile (fig. 6) is that regulons grow in similar proportions. Larger increases in the size of the regulons corresponded to the major gene gain event at the basis of the Firmicutes (fig. 4, branch 2). In branch 2, we observe a slightly larger increase of the σG regulon controlling late sporulation stages in the forespore,

Increase in size of sporulation regulons represented from root to tips for (A) B. subtilis and (B) C. difficile. Sporulation regulons differ in size, with σE being the largest regulon and also the one acquiring the largest number of genes in every evolutionary step. To suppress this effect, the gains per regulon were normalized by regulon size.
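
One way to read this normalization is to divide the number of genes a regulon gains on a branch by the size of that regulon. The sketch below assumes the denominator is the regulon's total size in the reference species, which the legend does not specify; the function name `normalized_gains` and all counts are hypothetical placeholders, not values from the study.

```python
def normalized_gains(gains: dict[str, int], sizes: dict[str, int]) -> dict[str, float]:
    """Express per-regulon gains on one branch as a fraction of regulon size.

    Assumption: `sizes` holds the total size of each regulon; the legend only
    states that gains were normalized by size.
    """
    result = {}
    for regulon, gained in gains.items():
        size = sizes[regulon]
        if size <= 0:
            raise ValueError(f"regulon {regulon!r} has non-positive size")
        result[regulon] = gained / size
    return result


# Hypothetical counts, for illustration only (not data from the paper).
example_sizes = {"sigF": 50, "sigE": 200, "sigG": 100, "sigK": 100}
example_gains = {"sigF": 10, "sigE": 40, "sigG": 25, "sigK": 20}
example_normalized = normalized_gains(example_gains, example_sizes)
```

With these placeholder numbers the largest regulon gains the most genes in absolute terms, yet its normalized gain is no larger than that of the smaller regulons, which is the effect the normalization is meant to remove.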

Extensive Gene Loss in Spore Forming and Asporogenous Lineages

Besides gene gains, significant losses have also occurred in certain Bacilli and Clostridia lineages. In Bacilli, extensive gene loss (fig. 4; >=90, red triangles) was predicted for the branch ancestral to the asporogenous groups Listeria and Lactobacillales, Solibacillus and Staphylococcaceae, and Exiguobacterium and Erysipelotrichia. In each of these groups, succeeding gene losses have occurred for Staphylococcaceae, Exiguobacterium, Listeria, and Lactobacillales. In the extremophiles Exiguobacterium (Vishnivetskaya and Kathariou 2005; Vishnivetskaya et al. 2009; Moreno-Letelier et al. 2012), spo0A is present, but none of the genes coding for the cell type-specific sigma factors is (supplementary fig. S5, Supplementary Material online). Moreover, the two Exiguobacterium species used here, E. antarcticum and E. sibiricum, have only 21% of the sporulation signature (Abecasis et al. 2013). These observations are in line with the notion that asporogenous Firmicutes have developed alternative ways to persist in diverse and sometimes extreme environments (Sauders et al. 2012; Linke et al. 2014). In addition, multiple gene losses arose independently in other groups, including spore formers, for example, at the base of Alicyclobacillaceae, of the Geobacillus and Anoxybacillus genera and in some other Bacillus species (fig. 4, orange triangles). In Clostridia, extensive gene losses have occurred independently for Candidatus Arthromitus and at the branch preceding the split of the genera Filifactor and Proteocatella (fig. 4B, red triangles). Other significant losses are predicted for the branches preceding the lineages Peptococcaceae and Thermoanaerobacterales groups III and IV, Thermoanaerobacterales, Lachnospiraceae, Ruminiclostridium and Mageeibacillus, and Ruminococcus (fig. 4, yellow triangles). Ruminococcus species, in particular, have lost some of the conserved sporulation genes (fig. 4) and have between 48% and 60% of the sporulation signature (Galperin et al.
2012; Abecasis et al. 2013) (supplementary figs. S1 and S3, Supplementary Material online), in what appears to be an intermediary stage toward becoming asporogenous. Still, these organisms are likely to spread as spores or spore-like structures between hosts (Schloss et al. 2014; Browne et al. 2016; Mukhopadhya et al. 2018), which acts as a counter-evolutionary force to the emergence of asporogenous phenotypes. It should be noted that at least R. bromii has most of the sporulation core genes and was shown to sporulate (Mukhopadhya et al. 2018). In another example, Epulopiscium morphotype B uses the sporulation program to form multiple intracellular offspring; it does not form dormant, fully resistant spores and has lost a significant proportion of the late sporulation genes (Miller et al. 2012; Hutchison et al. 2014; this work). Although it has been reported as present, we failed to observe the sigK gene in the available Epulopiscium morphotype B genome (supplementary fig. S5, Supplementary Material online). Significant gains and losses were also predicted for some terminal branches of Bacilli and Clostridia species/strains. However, gain/loss expectations on terminal branches are highly prone to error because false positives/negatives, or lower quality genomes, have a greater impact on the total estimates. Thus, only events occurring on internal branches were considered for discussion and presented in figure 4.

Sporulation Genes: A Landmark of Firmicutes

Evolutionary analyses of sporulation genes from B. subtilis and C. difficile estimate that 28.2% and 23.8% of their machinery was already present at the last bacterial ancestor (fig. 4 and supplementary tables S9 and S10, Supplementary Material online). These genes are conserved across early diverging bacterial clades and may have been co-opted for sporulation. Subsequent gene gains occurred later in the bacterial phylogeny, at different taxonomic levels and included TRGs, HGTs, and duplications as illustrated in figure 7. In particular, the first major gain event of sporulation genes (fig. 4, branch 2), which correlates with the emergence of the Firmicutes as a phylum, involved a gain of at least 166 genes, from which the vast majority, 80–90%, are TRGs, that is, only found in Firmicutes members (fig. 7 and supplementary tables S9 and S10, Supplementary Material online). A much smaller number of gene gains (10.4–14.3%) are estimated to result from HGT (supplementary tables S9 and S10, Supplementary Material online) having homologs in lineages outside the Firmicutes.

Predicted categories of sporulation gene gains in four branches for (A) B. subtilis and (B) C. difficile. Gene gains were considered to be taxon specific (green) when gained at the branch, present in its descendants but not present or gained in other clades. Alternatively, gene gains were considered non taxon specific (dark blue) when gained along a certain branch but also present in other nondescendant clades. The latter included genes gained by HGT in both directions (from and to the branch/clade). Family expansions (light blue) originated from duplication events.
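
The three categories in this legend can be expressed as a simple decision rule. The following Python sketch is an assumption-laden illustration rather than the method used in the study: the function `classify_gain`, its arguments, and the clade names in the usage example are all hypothetical, and a real analysis would derive them from the gain/loss reconstruction and the ortholog clusters.

```python
def classify_gain(present_in: set[str], descendants: set[str],
                  is_duplicate: bool = False) -> str:
    """Assign a gene gained on a branch to one of the legend's categories.

    present_in:   clades in which the gene (or a homolog) is found
    descendants:  clades descending from the branch where the gain occurred
    is_duplicate: True when the gain arose by duplication of an existing gene
    """
    if is_duplicate:
        return "family expansion"
    if not present_in & descendants:
        raise ValueError("a gained gene must be present in at least one descendant")
    if present_in <= descendants:
        # Found only below the branch: taxon specific (a TRG candidate).
        return "taxon specific"
    # Also found in nondescendant clades: candidate HGT, from or to the clade.
    return "non taxon specific"
```

For example, with hypothetical descendants `{"Clostridioides", "Paraclostridium"}`, a gene found only in `{"Clostridioides"}` is classified as taxon specific, whereas a gene also found in `{"Bacteroidetes"}` is classified as non taxon specific.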

B. subtilis versus C. difficile: Sporulation Driven by Rapid Divergence and HGT

Besides branch 2, we also estimated the possible origins of gene gains at lower taxonomic ranks. In particular, we chose branches in the phylogenetic tree for which we could strictly define descendants and outgroups in agreement with the NCBI taxonomy (release from May 2018). These ranks, shown in figure 7, included branch 3 (at the base of the class Bacilli), branch 5 (all Bacilli families excluding the early diverging Alicyclobacillaceae and Paenibacillaceae), branch 7 (at the base of the B. subtilis group), branch 10 (within the family Peptostreptococcaceae, a subgroup containing the genera Clostridioides, Paeniclostridium, Paraclostridium, and Terrisporobacter), branch 11 (at the base of the genus Clostridioides), and branch 12 (at the base of the species C. difficile). At the base and within Bacilli, the estimated gains of TRGs were dominant over HGTs, representing 61.3% (branch 3), 89.8% (branch 5), and 56.3% (branch 7) of the acquired genes (fig. 7, in green). In sharp contrast, genes acquired along the clostridial lineages were more often present in other clades (fig. 7, in blue) and more likely acquired by HGT, representing 55.6% in branch 10, 77.8% in branch 11, and 64.1% of the total gains in branch 12. Family expansions were significantly lower when compared with other mechanisms of gene acquisition with a maximum of four duplicated genes in branches 10 and 12. Lists of candidate HGTs, TRGs, and expansions by gene duplication are summarized in supplementary tables S9 for Bacilli and S10 for Clostridia, Supplementary Material online. Beyond the differences in rates of HGTs, TRGs, and family expansions, we can find evidence of both mechanisms occurring in Bacilli and Clostridia lineages.

HGT and Rapid Divergence in Clostridia

The exosporium that encases C. difficile spores has unique structural features and an important role in spore adhesion to host cells (Díaz-González et al. 2015; Calderón-Romero et al. 2018). Several exosporium components have been identified in C. difficile, including the CdeA, CdeB, CdeC, and CdeM proteins (Díaz-González et al. 2015; Calderón-Romero et al. 2018). CdeB and CdeM are only found in C. difficile and represent a gene gain in branch 12 (Entrez GI:126699184, fig. 4 and supplementary table S10, Supplementary Material online). CdeC is specific to the Peptostreptococcaceae family (Calderón-Romero et al. 2018) and besides all three C. difficile strains, orthologs are present in the following species of our data set: T. glycolicus, C. mangenotii, and P. sordellii (Entrez GI:126698654, fig. 4 and supplementary table S10, Supplementary Material online). No orthologs are found in other groups suggesting that CdeC is a novelty within the Peptostreptococcaceae family. Strikingly, CdeA is coded by a gene gained in branch 10 (Entrez GI:126699990, supplementary table S10, Supplementary Material online), which has homologs outside the Peptostreptococcaceae family, especially in species inhabiting gut ecosystems (clusters ENOG410XVF3 and HESKKIA from EggNOG 4.5.1 and OMA databases). A phylogenetic tree based on the alignment of the CdeA sequences (fig. 8) does not reflect the species phylogeny suggesting multiple HGT events within the gut environment and through the cycling of Clostridia species among different niches, that is, mostly between sludge/sediments and the gut. Members of the cdeA cluster share two blocks of a four cysteine conserved domain with unknown function (InterPro Ac. no. IPR011437) (fig. 8). CdeA thus provides a clear example of HGT, likely to have occurred in the gut ecosystem and important for spore function during infection. These results suggest that the exosporium proteins of C. difficile result from both gene novelty as well as HGT.

Phylogenetic tree and conserved blocks of CdeA proteins. Orthologous sequences were selected based on the results of the orthology mapping approach and supplemented with orthologs from the cluster ENOG410XVF3 from EggNOG 4.5.1 (Huerta-Cepas, Szklarczyk, et al. 2016). Species are colored by environmental origin/niche. (A) Phylogeny with midpoint rooting; bootstrap probabilities were estimated with RAxML (Stamatakis 2014) using the best protein evolution model (LG + I) selected with ProtTest version 3 (Darriba et al. 2011), and sequences were aligned using MAFFT (Katoh and Standley 2013). Protein blocks were drawn with ETE3 (Huerta-Cepas, Serra, et al. 2016). (B) Protein logos showing the two blocks of the conserved domain with four cysteines were estimated with WEBLOGO (Crooks et al. 2004).

The cotF gene is another example of HGT in C. difficile (Entrez GI:126697784). Like its B. subtilis counterpart, CotF belongs to the coat F family (InterPro Ac. no. IPR012851) with a putative ferritin-like fold (PDB code: 2RBD). In both organisms, cotF is expressed in the mother cell, under the control of σK in B. subtilis and of σE in C. difficile (Cutting et al. 1991; Eichenberger et al. 2004; Steil et al. 2005; Fimlaid et al. 2013; Saujet et al. 2013). The B. subtilis CotF protein bears similarity to the products of four σG-controlled, forespore-specific genes, yraG, yraF, yraE, and yraD, part of an operon with the order yraG-yraF-adhB-yraE-yraD (supplementary fig. S7A, Supplementary Material online) (Wang et al. 2006). Both yraF and yraD code for proteins that resemble the C-terminal moiety of CotF; the yraG and yraE genes, in turn, code for proteins with similarity to the N-terminal part of CotF. Hence, the appearance of the cotF gene in branch 7 possibly results from a duplication and fusion of two yra genes, and is thus specific to the Bacillus group (supplementary table S9 and supplementary fig. S6, Supplementary Material online).
The product of the yraD gene clusters with a single homolog in C. difficile coded by a cotF-like gene, but the genomic context of this gene is quite different from that of the yra genes in B. subtilis (supplementary fig. S7B, Supplementary Material online). This suggests HGT of an ancestral form of the yraD gene and its co-option for the σE regulon in C. difficile. While the B. subtilis and C. difficile cotF genes may code for coat proteins (mother cell specific) in both organisms, the coat F domain also appears, at least in B. subtilis, to be functionally relevant in the forespore. In contrast, the spoIIIAF gene is an example of divergence. spoIIIAF is part of the spoIIIA operon and required for the assembly of the mother cell-to-forespore channel (Crawshaw et al. 2014; Morlot and Rodrigues 2018, see also above). The gene is conserved in all spore formers (Galperin et al. 2012; Crawshaw et al. 2014; Morlot and Rodrigues 2018). In both B. subtilis and C. difficile, spoIIIA is expressed in the mother cell under the control of σE. The requirement of the channel for sporulation is conserved in C. difficile (Fimlaid et al. 2015; Serrano et al. 2015). In C. difficile, spoiIIIAF (Entrez GI:126698793; which corresponds to the spoIIIAF gene of B. subtilis) is the only gene of the spoIIIA operon that has been annotated with an extra “i” in its name for all C. difficile species. Genes named “spoiIIIAF” have diverged to the point that they no longer cluster with their orthologous spoIIIAF genes, resulting in two distinct clusters of orthologous groups by our homology mapping approach (fig. 9) as well as in orthology databases (Huerta-Cepas, Szklarczyk, et al. 2016; Altenhoff et al. 2018). The spoiIIIAF cluster includes orthologs from all strains of C. difficile, C. mangenotii, Paraclostridium bifermentans, P. sordellii, and T. glycolicus (figs. 5 and 9). Because spoiIIIAF is poorly conserved (fig.
9), it is considered here as an innovation “gained” in branch 10, within the Peptostreptococcaceae family (supplementary table S10, Supplementary Material online). An alignment of SpoIIIAF orthologs from both clusters shows two possible transmembrane domains (residues 1–57) and divergence in the remaining sequence, which only shares 16% identity with its B. subtilis counterpart (fig. 9). The exception is a group of hydrophobic residues (fig. 9, residues highlighted in blue) which are part of the hydrophobic core of the protein and are associated with a conserved ring-building motif; this motif may support formation of a multimeric ring, similar to SpoIIIAG and SpoIIIAH (Levdikov et al. 2012; Zeytuni et al. 2017, 2018). Since a structure for B. subtilis SpoIIIAF is available (Zeytuni et al. 2018), we modeled the 3D structure of SpoiIIIAF. Simulations generated five reliable models (supplementary fig. S8A, Supplementary Material online), which are structurally similar to SpoIIIAF and adopt a α1β1β2α2β4 topology (fig. 9). Still, the high divergence between the two orthologs is reflected in changes in the overall electrostatic potential and hydrophobicity of the protein surfaces (fig. 9 and supplementary fig. S8B, Supplementary Material online) and suggests that in the two organisms, SpoIIIAF has been under different evolutionary constraints. Like the other spoIIIA-encoded products, SpoIIIAF is predicted to be located in the outer forespore membrane (Doan et al. 2009; Zeytuni et al. 2017; Morlot and Rodrigues 2018). The high sequence divergence between SpoiIIIAF and SpoIIIAF suggests functional links to distinct proteins or binding partners that need to be further explored both for Bacilli as well as for the members of the Peptostreptococcaceae family.
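
A figure such as the 16% identity quoted above is obtained by counting identical residues over the aligned positions of two sequences. The Python sketch below shows one common convention, in which columns containing a gap in either sequence are excluded from the denominator; the text does not state which convention was used, so this choice, the function name `percent_identity`, and the toy sequences are assumptions for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of a pairwise alignment.

    Assumption: columns with a gap in either sequence are skipped, so the
    denominator is the number of columns where both sequences have a residue.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a.upper() == res_b.upper():
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment (not real SpoIIIAF sequence): 3 identical residues out of
# 4 ungapped columns.
toy_identity = percent_identity("MK-LV", "MKALI")
```

Other conventions (for example, dividing by the full alignment length or by the length of the shorter sequence) give different values for the same alignment, which matters when identity is as low as it is here.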

Comparative analysis between sporulation proteins SpoIIIAF from Bacillus subtilis and SpoiIIIAF from Clostridioides difficile. (A) Ortholog clusters of SpoIIIAF and SpoiIIIAF (n = 64) were merged and aligned using MAFFT (Katoh and Standley 2013) followed by a principal component analysis as implemented in Jalview (Waterhouse et al. 2009) to estimate sequence divergence between clusters. (B) Selected sequences clustering with SpoIIIAF (black square) and SpoiIIIAF (purple square) were aligned with T-coffee (Notredame et al. 2000). (C) Aligned structures of the monomeric forms of SpoIIIAF85-206 (gray) and SpoiIIIAF84-205 (orange) were determined using Modeler homology-based modeling (Eswar et al. 2006) using the structure of B. subtilis SpoIIIAF (PDB code: 6dcs) as the template and MUSCLE (Edgar 2004) as sequence aligner. Structures and electrostatic charge distributions were visualized with Chimera (Pettersen et al. 2004).

Discussion

In this work, we have extended previous comparative genomics analyses based on B. subtilis sporulation genes (Paredes et al. 2005; de Hoon et al. 2010; Galperin et al. 2012; Abecasis et al. 2013; Galperin 2013) by including the sporulation genes identified in the human pathogen C. difficile. Recent advances in C. difficile genetics and the ability to construct sporulation mutants in this anaerobe have enabled the identification of over 300 genes, directly regulated by the cell type-specific sigma factors of sporulation (Fimlaid et al. 2013; Saujet et al. 2013; Pishdadian et al. 2015) (supplementary table S1, Supplementary Material online). By comparing the sporulation-specific regulons between the two species, we identified four main groups of genes based on their homology and role in sporulation. The largest group, representing over 50% of each regulon, consists of genes only present in one of the two reference spore formers (fig. 2 and supplementary table S5, Supplementary Material online), highlighting the diversity and complex origins of the sporulation machinery. In some cases, genes having the same function in both species (i.e., functional equivalents) are annotated with the same name but have distinct evolutionary paths, with different ancestors, and hence belong to distinct ortholog groups. This is the case of the forespore-specific component of the mother cell-to-forespore channel, SpoIIQ, that also participates in engulfment in both B. subtilis (Londono-Vallejo et al. 1997; Camp and Losick 2008; Meisner et al. 2008) and in C. difficile (Fimlaid et al. 2015; Serrano et al. 2016). Both SpoIIQ proteins have a catalytic peptidase M23 motif (InterPro Ac. no. IPR016047) although nonfunctional in B. subtilis (Fimlaid et al. 2015; Serrano et al. 2016). Despite having the same name and similar roles in both species, the two spoIIQ genes are not orthologs; spoIIQ is a recognized case of a non-orthologous gene displacement (Galperin et al. 2012). 
The second largest group of genes corresponds to those with candidate orthologs in the two genomes but only described as part of the sporulation regulons in one species (fig. 2 and supplementary table S4, Supplementary Material online). These genes are involved in other processes and may have been co-opted for roles in sporulation. For instance, the dapA and dapB genes, which are present in many bacterial phyla and involved in the lysine and peptidoglycan biosynthetic pathways (Mukherjee et al. 2017), have been recruited to sporulation in B. subtilis and form the spoVF operon, expressed in the mother cell and involved in the synthesis of DPA (Chen et al. 1993; Daniel and Errington 1993). In C. difficile, dapA and dapB are located in the lysA-asd-dapA-dapB operon (Mukherjee et al. 2017) but are dispensable for spore formation (Dembek et al. 2015; Donnelly et al. 2016). A study on C. perfringens suggests a more ancestral mechanism for DPA synthesis, involving the electron transfer flavoprotein EtfA (Orsburn et al. 2010). In the same group, the uppS gene (coding for undecaprenyl pyrophosphate synthetase, required for synthesis of the lipid carrier undecaprenyl phosphate, needed for peptidoglycan synthesis) is part of the σG regulon in C. difficile (Saujet et al. 2013; Dembek et al. 2015) but a role in B. subtilis sporulation has not yet been determined. Finally, there are two further groups of candidate orthologs in B. subtilis and C. difficile. One group corresponds to genes present in the same regulon in both species (fig. 2 and supplementary table S2, Supplementary Material online). For example, the spmA and spmB genes, involved in spore core dehydration, are under the control of σE in both species (Popham et al. 1995; Steil et al. 2005; Fimlaid et al. 2013; Pereira et al. 2013; Saujet et al. 2013). Other examples include the spoIIIA locus and spoIID, both controlled by σE, and the pdaA gene, in the σG regulon.
Still, despite being candidate orthologous pairs and regulated by the same sigma factors, these genes may show different modes of action in the two species through the link to nonhomologous intermediates (Fimlaid et al. 2015; Serrano et al. 2016). Lastly, there are genes clustering in the same ortholog group that belong to different regulons, as exemplified by the yra/cotF genes (fig. 2 and supplementary table S3 and supplementary fig. S7, Supplementary Material online). We also find that some sporulation genes are conserved across several bacterial phyla, or even extend to other domains of life. For example, several genes with a role in spore cortex biogenesis established in B. subtilis, such as spoVR, are present in Proteobacteria and Archaea (see above). Also, the cotR gene, with a role in spore coat assembly in B. subtilis (McKenney et al. 2013), shows a conserved hydrolase domain (InterPro Ac. no. IPR016035) and has an even broader taxonomic profile, which includes eukaryotes (ortholog group no. COG3621) (Huerta-Cepas, Szklarczyk, et al. 2016). Still, genes of broader taxonomic range constitute less than one-third of the sporulation machinery (fig. 4 and supplementary tables S9 and S10, Supplementary Material online). The great majority of the sporulation genes were acquired in the first major gain event, coincident with the origin of the Firmicutes (fig. 4), and include the genes of the sporulation signature based on the B. subtilis genes (Abecasis et al. 2013) and the gut signature based on the analysis of the genomes of spore formers from the human gut microbiota (Browne et al. 2016). Since the divergence of Firmicutes is estimated to have occurred between 2.5 and 3.0 Ga, upon the emergence of terrestrial ecosystems (2.6 Ga) and after the Great Oxidation Event (2.3 Ga) (Battistuzzi et al. 2004; Battistuzzi and Hedges 2009; Marin et al. 2017), it has been suggested that sporulation is part of an adaptive process to cope with O2 exposure (de Hoon et al.
2010; Abecasis et al. 2013). Gene gain events occurring after the origin of sporulation (highlighted in fig. 4) correspond to lineage specializations of the program. The fact that we only find two larger gain events (≥90) along the evolutionary paths of B. subtilis and C. difficile suggests that sporulation has evolved at discrete moments in the evolutionary history of these two species, tied only to certain lineages. Moreover, we find no evidence of these gains having greater influence on the size of a specific regulon since they have all coevolved proportionally (fig. 6). This and the fact that the four sporulation-specific sigma factors were present in the last common ancestor of Firmicutes (supplementary tables S9 and S10, Supplementary Material online) suggest that both early (engulfment) and late processes (synthesis of the spore cortex, coat/exosporium and preparation of the forespore for dormancy) were defined upon the emergence of sporulation. Interestingly, the single cell analysis of sporulation-specific gene expression in C. difficile revealed a reduced temporal segregation between the activities of the early and late sigma factors and a reduced dependency on the cell–cell signaling pathways controlling sigma factor activity or activation compared with the B. subtilis model (Pereira et al. 2013; Saujet et al. 2013). Possibly, this results from the loss of regulatory genes such as csfB (present at the base of the Firmicutes) and the absence of additional regulatory proteins such as YabK, RsfA, and GerE (fig. 5). Despite the common macroevolutionary patterns in both lineages, our analysis of the possible origins of gene gains suggests distinct preferred modes of gene acquisition. While in B. subtilis most of the genes gained in branches 3, 5, and 7 are taxon specific (fig. 7 and supplementary table S9, Supplementary Material online), in the evolutionary path of C.
difficile, genes were mostly acquired by HGT events, with a consistently smaller proportion of TRGs (fig. 7 and supplementary table S10, Supplementary Material online). This finding is in line with the highly plastic genome of C. difficile 630, where mobile genetic elements (mainly transposons), which make up 11% of the genome, are the main cause for acquisition of foreign genes from other species in the gut microbiota (Sebaihia et al. 2006; He et al. 2010). Genes acquired by HGT are involved in antibiotic resistance and virulence (Sebaihia et al. 2006; He et al. 2010) and, as we now show, sporulation, conferring adaptive advantages to C. difficile in the gut ecosystem. A case study emerging from our analysis is the cdeA gene (fig. 8), coding for an exosporium protein, with the exosporium likely to have an important role during colonization/infection (Díaz-González et al. 2015). In turn, B. subtilis shows a stronger lineage-specific specialization of the program, with genes that are only found in closely related species (fig. 7 and supplementary table S9, Supplementary Material online). The appearance of a higher number of such genes in members of the B. subtilis group may be due to the higher homologous recombination rates observed for Bacilli species when compared with C. difficile (Vos and Didelot 2009; He et al. 2010). A key feature of B. subtilis and close relatives is their natural competence, that is, the ability to enter a physiological state in which cells are able to take up exogenous DNA fragments and recombine them into the chromosome (Dubnau 1991; Haijema et al. 2001; Smits et al. 2005; Chen et al. 2007). Competence has been proposed as a main driver of genome diversity within the B. subtilis group (Brito et al. 2018) and to facilitate homologous recombination (Vos 2009; Mell and Redfield 2014). Another mechanism giving rise to TRGs is a high mutation rate that saturates the signal of homology.
Rapid divergence between orthologs may be driven by exposure to particular lineage-specific environments/niches and to different binding partners. The latter is clearly illustrated for the counterparts of SpoIIIAF from members of the Peptostreptococcaceae family (SpoiIIIAF) (fig. 9). Besides gene gains, there were also extensive gene losses occurring independently throughout the bacterial phylogeny (fig. 4). These losses have resulted in the emergence of asporogenous lineages (i.e., nonspore-forming Firmicutes, Onyenwoke et al. 2004) among spore formers at different taxonomic levels. Within Bacilli, we find the group including Listeria, Lactobacillales, Staphylococcaceae, and Exiguobacterium; in Clostridia, the groups including the species Proteocatella sphenisci and Filifactor alocis, members of the family Lachnospiraceae and the genus Caldicellulosiruptor (within Thermoanaerobacterales group III) (supplementary fig. S1, Supplementary Material online). Although the ability to form spores is often seen as an evolutionary advantage, the process is energetically costly, involving the expression of hundreds of genes and taking several hours to complete (Fujita and Losick 2005; Paredes et al. 2005; Fimlaid et al. 2013). In nutrient-rich environments after 6,000 generations of B. subtilis, neutral processes but also selection were found to facilitate loss of sporulation, through the accumulation of indels or single-nucleotide substitutions (Maughan et al. 2007, 2009; Maughan and Nicholson 2011; Nicholson 2012; Maitra and Dill 2015). In nature, though, sporulation loss is mostly driven by consecutive gene losses as suggested earlier (Galperin et al. 2012; Galperin 2013) and demonstrated in figure 4. As mentioned above, in Epulopiscium morphotype B and Candidatus Arthromitus, as well as in other intestinal spore formers, sporulation has evolved as a reproductive/propagation mechanism, concomitant with some gene loss (fig.
4) and eventually the gain of new genes under distinct evolutionary forces (Miller et al. 2012; Galperin 2013; Hutchison et al. 2014).

Conclusions

In this study, we were able to combine the comprehensive knowledge of sporulation genes in B. subtilis with the emerging model C. difficile. Our analysis validates earlier studies in that sporulation emerged once, at the base of Firmicutes. From this first major gene gain event, we have identified intermediate evolutionary steps combined with a second major lineage-specific gain event that underlies the current specialization and diversity observed among B. subtilis and C. difficile sporulation programs. Furthermore, predominant mechanisms of gene acquisition and innovation differ in the lineages of the two species. While B. subtilis entails greater innovation with many TRGs, C. difficile has acquired new genes mostly through HGTs. We also show that extensive gene losses underlie the emergence of asporogenous lineages and the co-option of sporulation as a reproductive mechanism. Overall, our comparative genomics approach provides a framework with which to elucidate the evolution and diversity observed in the sporulation program across Firmicutes and unveils new evolutionary case studies, such as those described for the cdeA, cotF, and spoIIIAF genes.

Materials and Methods

Data Sets Construction

All known sporulation genes from B. subtilis strain 168 and C. difficile strain 630, the two species that are best characterized in terms of their sporulation machinery, were collected from the literature (Fimlaid et al. 2013; Saujet et al. 2013; Pishdadian et al. 2015; Meeske et al. 2016) and SubtiWiki 2.0 (Mäder et al. 2012). These included the genes coding for the main regulators of sporulation: spo0A, controlling initiation; sigE and sigK, coding for the mother cell early and late specific sigma factors; sigF and sigG, coding for the early and late forespore-specific sigma factors; and the genes under their control (supplementary table S1, Supplementary Material online). Bacterial genomes (n = 258) were collected from the NCBI Database (release from April 2017) including main representative species from the following phyla: Actinobacteria (15), Aquificae (2), Bacteroidetes (3), Cyanobacteria (3), Dictyoglomi (2), Fusobacteria (4), Proteobacteria (30), Spirochaetes (6), Synergistetes (1), Thermotogae (5), and Deinococcus-Thermus (1) (supplementary fig. S1 and supplementary table S6, Supplementary Material online). The selection of genomes from the phylum Firmicutes (186) was more exhaustive in order to include all the diversity of spore-forming bacteria described in the literature (Galperin 2013; Hutchison et al. 2014; Browne et al. 2016). Complete and closed genome assemblies were preferred where possible.

Orthology/Paralogy Mapping and Selection of Homologous Families

The predicted proteins from each genome were clustered into homologous families using the bidirectional best hits (BDBH) algorithm implemented in GET_HOMOLOGUES (Contreras-Moreira and Vinuesa 2013) (supplementary fig. S2, Supplementary Material online). The software is built on top of BLAST+ (Camacho et al. 2009). First, inparalogs, defined as intraspecific BDBHs, are identified in each genome. Second, new genomes are incrementally compared with the reference genome and their BDBHs annotated. The main script was run with BlastP default options, a minimum coverage fixed at 65% for the pairwise alignments (-C), option -s to save memory, and reporting of clusters with at least two sequences (-t set at 2). The choice of the BDBH algorithm and corresponding cutoffs resulted from prior validation with sets and subsets of the bacterial genomes used in this study (supplementary fig. S3, Supplementary Material online). The method was applied twice, the first time with the B. subtilis strain 168 (BSU) genome fixed as reference (-r) and the second time with C. difficile strain 630 (CD) (supplementary fig. S2, Supplementary Material online). Homology mapping resulted in over 3,500 clusters of homologous genes for references BSU and CD. Clusters corresponding to sporulation genes were selected, making a total of 726 and 307 clusters for BSU and CD, respectively. The results were represented in two large-scale profile matrices of presence/absence of genes (0 for absence, 1 for presence, or >1 for multiple presence). The two reference genomes were also compared with each other using the OrthoMCL algorithm implemented in GET_HOMOLOGUES. The main script was run with BlastP default options, -C set at 65%, and reporting all clusters (-t 0).
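The bidirectional best hit criterion described above can be illustrated with a minimal Python sketch. This is not the GET_HOMOLOGUES implementation: the gene names and bit scores are hypothetical toy values, and the 65% coverage filter and inparalog handling used in the study are omitted for brevity.

```python
from typing import Dict, Tuple

Scores = Dict[Tuple[str, str], float]  # (query, subject) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Return, for each query, the subject with the highest score."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(ref_vs_other: Scores, other_vs_ref: Scores) -> Dict[str, str]:
    """Keep reference genes whose best hit points back to them."""
    forward = best_hits(ref_vs_other)
    reverse = best_hits(other_vs_ref)
    return {ref: other for ref, other in forward.items() if reverse.get(other) == ref}


# Hypothetical toy scores between a reference genome and one other genome.
ref_vs_other = {("refA", "x1"): 250.0, ("refA", "x2"): 90.0,
                ("refB", "x2"): 180.0, ("refC", "x2"): 120.0}
other_vs_ref = {("x1", "refA"): 245.0, ("x2", "refB"): 175.0,
                ("x2", "refC"): 110.0}

pairs = bidirectional_best_hits(ref_vs_other, other_vs_ref)
print(pairs)
```

In this toy case refC is not paired, because its best hit x2 has refB as its own best hit in the reverse direction.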

Bacterial Phylogenies

Seventy gene clusters of single-copy orthologs conserved across the genomic data set (adk, alaS, aspS, coaD, cysS, efp, engA, fabG, fmt, frr, gcp, gltX, guaA, infA, infC, ksgA, miaA, mraA, mraW, mraY, murC, murD, murE, murG, obgE, pheS, pheT, prfA, pth, pyrH, rluD, rpe, rplA-F, rplI-L, rplN-P, rplR-X, rpmA, rpoA, rpsB-E, rpsH, rpsJ, rpsK, rpsM, rpsQ, rpsS, ruvB, uppS, uvrB, yabC, yabD, ylbH, and yyaF) were assembled with good occupancy, allowing for a maximum of five sequences missing per cluster. Clusters were aligned independently using MAFFT (Katoh and Standley 2013) or MUSCLE (Edgar 2004) under the default options. The best alignment per cluster was selected and trimmed using trimAl (Capella-Gutiérrez et al. 2009) with the -automated1 option. The 70 alignments were concatenated into a single superalignment comprising 258 species/strains and 14,401 amino acid positions. The alignment was used to construct phylogenetic trees by maximum likelihood and Bayesian methods (supplementary fig. S4, Supplementary Material online). Calculations and tree construction were performed with RAxML 8 (Stamatakis 2014) (-N 100) and MrBayes 3.2.5 (Ronquist et al. 2012) using the best protein evolution model (LG + G) selected with ProtTest version 3 (Darriba et al. 2011). Tree topologies generated with both methods were compared using ete-compare (Huerta-Cepas, Serra, et al. 2016) based on the normalized Robinson–Foulds distance (nRF). The RF score is the total number of leaf bipartitions occurring in one but not in both trees. Since nonnormalized measures of phylogenetic distance typically increase with the number of leaves in the tree, we used the normalized version, nRF, which divides RF by the maximum possible RF value for n leaves (maxRF). For topologically identical trees, nRF will be 0. Our nRF of 0.03 (RF/maxRF = 16.0/512.0) indicates similar tree topologies.
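The nRF calculation described above can be sketched in a few lines of Python. This is an illustration only, not the ete-compare implementation: the five-leaf bipartition sets are hypothetical, each bipartition is represented by one of its two sides as a frozenset, and maxRF is taken as an input exactly as reported by the comparison tool.

```python
from typing import FrozenSet, Set

Bipartition = FrozenSet[str]


def robinson_foulds(tree_a: Set[Bipartition], tree_b: Set[Bipartition]) -> int:
    """Count bipartitions found in one tree but not in both."""
    return len(tree_a ^ tree_b)


def normalized_rf(rf: float, max_rf: float) -> float:
    """Divide RF by the maximum possible RF value (maxRF)."""
    if max_rf <= 0:
        raise ValueError("max_rf must be positive")
    return rf / max_rf


# Hypothetical toy trees on leaves A-E, each described by its internal splits.
tree_1 = {frozenset({"A", "B"}), frozenset({"A", "B", "C"})}
tree_2 = {frozenset({"A", "B"}), frozenset({"A", "B", "D"})}
toy_rf = robinson_foulds(tree_1, tree_2)

# Values reported in the text: RF = 16.0 and maxRF = 512.0.
nrf = normalized_rf(16.0, 512.0)
print(toy_rf, round(nrf, 2))
```

With the reported values the ratio is 0.03125, which rounds to the nRF of 0.03 given in the text.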

Phylogenetic Profiles and Evolutionary Analysis of Sporulation Genes

Sporulation gene clusters were represented in two large-scale profile matrices of presence/absence, sized 726 × 258 and 307 × 258, for references BSU and CD. The profiles were combined with the bacterial phylogeny to estimate gain (0 -> 1), loss (1 -> 0), and duplication events (1 -> 2 or more) in every tree branch with the software Count v 10.04 (Csurös 2010). First, rates were optimized using the gain–loss–duplication model implemented in Count, allowing different gain–loss and duplication–loss rates in all lineages and rate variation across gene families (1–4 gamma categories). One hundred rounds of optimization were computed. Next, a gene family history analysis was computed by posterior probabilities based on a phylogenetic birth-and-death model. Tables with the posterior probabilities based on the two profile matrices (supplementary tables S7 and S8, Supplementary Material online) were extracted for further analysis with a customized Python script available at https://github.com/PaulaRS/ProcessCount (last accessed August 2019) to estimate total number of gene gains, losses and duplications (per branch), and gene presences (per node).
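As a simplified illustration of how gain, loss, and duplication events can be tallied per branch, the following Python sketch classifies transitions between parent and child copy numbers. It is neither the Count software nor the ProcessCount script: Count works with posterior probabilities, whereas this sketch assumes known integer copy numbers at both ends of each branch. The family names, branch labels, and the treatment of transitions such as 0 -> 2 (counted here as a gain) are assumptions made for the example.

```python
from collections import Counter
from typing import Dict, Tuple


def classify_event(parent: int, child: int) -> str:
    """Classify one branch transition from parent and child copy numbers."""
    if parent == 0 and child >= 1:
        return "gain"          # 0 -> 1 (or more, by assumption)
    if parent >= 1 and child == 0:
        return "loss"          # 1 -> 0 (or from more copies, by assumption)
    if parent == 1 and child >= 2:
        return "duplication"   # 1 -> 2 or more
    return "none"              # no change, or a transition not covered above


def tally_events(transitions: Dict[str, Dict[str, Tuple[int, int]]]) -> Dict[str, Counter]:
    """Sum event types per branch over all gene families."""
    per_branch: Dict[str, Counter] = {}
    for branches in transitions.values():
        for branch, (parent, child) in branches.items():
            event = classify_event(parent, child)
            if event != "none":
                per_branch.setdefault(branch, Counter())[event] += 1
    return per_branch


# Hypothetical toy input: family -> branch -> (parent copies, child copies).
toy = {
    "famA": {"branch_2": (0, 1), "branch_10": (1, 0)},
    "famB": {"branch_2": (0, 1), "branch_10": (1, 3)},
    "famC": {"branch_2": (1, 1), "branch_10": (1, 0)},
}
events = tally_events(toy)
print(events)
```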

Estimating Origins of Gene Gains

To estimate the source of gene gains, only events with a posterior probability of ≥0.5 by Count (Csurös 2010) were considered. Gene gains were classified as “taxon specific” (TRGs) when present at a given branch, present in its descendants but not present in other clades. Alternatively, they were considered “non specific” when gained at a certain branch, present in its descendants but also present in other clades of the phylogeny. The latter included genes gained by HGT in both directions (from and to the branch/clade), since it is often challenging to identify the donor (Koonin et al. 2001), especially when dealing with large genomic sets. Lastly, a small fraction of sporulation genes was also gained by gene duplication and classified as “family expansions.” Since our ancestral reconstruction was limited to 258 bacterial genomes, we needed to validate genes classified as “taxon specific” in a broader taxonomic context. To do so, we performed searches against the NCBI nr database (released in January 2018, Cutoffs: E-value 10E-10, 75% query coverage with 50% identity) using BlastP and checked for the best hits outside the clade and corresponding taxonomic distribution. While using this approach we were confronted with several false positives in the NCBI nr database that were removed from the BlastP reports either because of misidentified species, for example, a genome assigned to the species Varibaculum timonense, from Actinobacteria (Guilhot et al. 2017), that in fact belongs to Clostridia, or because of recurring genomic contaminations of spore formers in genome assemblies of other species as was shown for Mycobacterium (Ghosh et al. 2009; Traag et al. 2010). Genes gained in the branches 2, 3, 5, 7, 10, 11, and 12 and classified as “taxon specific” and “non specific” or expansions are listed in supplementary tables S9 and S10, Supplementary Material online, with corresponding cluster names and Entrez GI numbers.
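The classification rules above can be expressed as a small decision function. The Python sketch below is illustrative and is not the authors' script: the genome identifiers are hypothetical, and "present in its descendants" is interpreted here as presence in at least one genome of the clade, which is an assumption.

```python
from typing import Optional, Set


def classify_gain(posterior: float,
                  clade: Set[str],
                  carriers: Set[str],
                  by_duplication: bool = False,
                  min_posterior: float = 0.5) -> Optional[str]:
    """Classify a gain event on the branch leading to `clade`.

    `carriers` is the set of genomes in which the gene family is present.
    Returns None when the event is not considered.
    """
    if posterior < min_posterior:
        return None                    # below the posterior probability cutoff
    if by_duplication:
        return "family expansion"      # gained by gene duplication
    if not (carriers & clade):
        return None                    # absent from all descendants (assumed rule)
    if carriers - clade:
        return "non specific"          # also present in other clades
    return "taxon specific"            # restricted to the clade (TRG)


# Hypothetical toy clade and carrier sets.
clade = {"g1", "g2", "g3"}
trg = classify_gain(0.9, clade, {"g1", "g2"})
shared = classify_gain(0.9, clade, {"g1", "g7"})
expansion = classify_gain(0.8, clade, {"g1", "g2"}, by_duplication=True)
skipped = classify_gain(0.3, clade, {"g1", "g2"})
print(trg, shared, expansion, skipped)
```

A "taxon specific" call from such a function would still need the broader validation against an external database described in the text, since the carrier sets are limited to the genomes in the phylogeny.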

Supplementary Material

Supplementary data are available at Molecular Biology and Evolution online.
  211 in total

1.  A ComGA-dependent checkpoint limits growth during the escape from competence.

Authors:  B J Haijema; J Hahn; J Haynes; D Dubnau
Journal:  Mol Microbiol       Date:  2001-04       Impact factor: 3.501

2.  SpoVID guides SafA to the spore coat in Bacillus subtilis.

Authors:  A J Ozin; C S Samford; A O Henriques; C P Moran
Journal:  J Bacteriol       Date:  2001-05       Impact factor: 3.490

Review 3.  Horizontal gene transfer in prokaryotes: quantification and classification.

Authors:  E V Koonin; K S Makarova; L Aravind
Journal:  Annu Rev Microbiol       Date:  2001       Impact factor: 15.500

4.  Structure of components of an intercellular channel complex in sporulating Bacillus subtilis.

Authors:  Vladimir M Levdikov; Elena V Blagova; Amanda McFeat; Mark J Fogg; Keith S Wilson; Anthony J Wilkinson
Journal:  Proc Natl Acad Sci U S A       Date:  2012-03-19       Impact factor: 11.205

5.  Engulfment during sporulation in Bacillus subtilis is governed by a multi-protein complex containing tandemly acting autolysins.

Authors:  Arnaud Chastanet; Richard Losick
Journal:  Mol Microbiol       Date:  2007-04       Impact factor: 3.501

6.  The forespore line of gene expression in Bacillus subtilis.

Authors:  Stephanie T Wang; Barbara Setlow; Erin M Conlon; Jessica L Lyon; Daisuke Imamura; Tsutomu Sato; Peter Setlow; Richard Losick; Patrick Eichenberger
Journal:  J Mol Biol       Date:  2006-02-08       Impact factor: 5.469

7.  Revisiting the Role of Csp Family Proteins in Regulating Clostridium difficile Spore Germination.

Authors:  Yuzo Kevorkian; Aimee Shen
Journal:  J Bacteriol       Date:  2017-10-17       Impact factor: 3.490

8.  Genome sequence of Symbiobacterium thermophilum, an uncultivable bacterium that depends on microbial commensalism.

Authors:  Kenji Ueda; Atsushi Yamashita; Jun Ishikawa; Masafumi Shimada; Tomo-o Watsuji; Kohji Morimura; Haruo Ikeda; Masahira Hattori; Teruhiko Beppu
Journal:  Nucleic Acids Res       Date:  2004-09-21       Impact factor: 16.971

9.  Dual-specificity anti-sigma factor reinforces control of cell-type specific gene expression in Bacillus subtilis.

Authors:  Mónica Serrano; JinXin Gao; João Bota; Ashley R Bate; Jeffrey Meisner; Patrick Eichenberger; Charles P Moran; Adriano O Henriques
Journal:  PLoS Genet       Date:  2015-04-02       Impact factor: 5.917

Review 10.  Clostridioides difficile Biology: Sporulation, Germination, and Corresponding Therapies for C. difficile Infection.

Authors:  Duolong Zhu; Joseph A Sorg; Xingmin Sun
Journal:  Front Cell Infect Microbiol       Date:  2018-02-08       Impact factor: 5.293
