Evolution of conjugation and type IV secretion systems.

Julien Guglielmini¹, Fernando de la Cruz, Eduardo P C Rocha.

Abstract

Genetic exchange by conjugation is responsible for the spread of resistance, virulence, and social traits among prokaryotes. Recent works unraveled the functioning of the underlying type IV secretion systems (T4SS) and its distribution and recruitment for other biological processes (exaptation), notably pathogenesis. We analyzed the phylogeny of key conjugation proteins to infer the evolutionary history of conjugation and T4SS. We show that single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) conjugation, while both based on a key AAA(+) ATPase, diverged before the last common ancestor of bacteria. The two key ATPases of ssDNA conjugation are monophyletic, having diverged at an early stage from dsDNA translocases. Our data suggest that ssDNA conjugation arose first in diderm bacteria, possibly Proteobacteria, and then spread to other bacterial phyla, including bacterial monoderms and Archaea. Identifiable T4SS fall within the eight monophyletic groups, determined by both taxonomy and structure of the cell envelope. Transfer to monoderms might have occurred only once, but followed diverse adaptive paths. Remarkably, some Firmicutes developed a new conjugation system based on an atypical relaxase and an ATPase derived from a dsDNA translocase. The observed evolutionary rates and patterns of presence/absence of specific T4SS proteins show that conjugation systems are often and independently exapted for other functions. This work brings a natural basis for the classification of all kinds of conjugative systems, thus tackling a problem that is growing as fast as genomic databases. Our analysis provides the first global picture of the evolution of conjugation and shows how a self-transferrable complex multiprotein system has adapted to different taxa and often been recruited by the host. As conjugation systems became specific to certain clades and cell envelopes, they may have biased the rate and direction of gene transfer by conjugation within prokaryotes.

Year: 2012 PMID: 22977114 PMCID: PMC3548315 DOI: 10.1093/molbev/mss221

Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240

Introduction

Prokaryotic genomes adapt quickly to new environmental conditions largely because they can acquire pre-evolved traits by horizontal gene transfer (HGT) (de la Cruz and Davies 2000; Gogarten et al. 2002; Ochman et al. 2005). Conjugation is a mechanism of genetic transfer that allows single-event transfer of large DNA fragments, up to entire chromosomes. Conjugation can transfer nonhomologous genes to the recipient genome and has a broader host range than transduction or transformation (Amabile-Cuevas and Chicurel 1992; Llosa et al. 2002; Chen et al. 2005). Accordingly, recent work suggests that conjugation is the most frequent mechanism of HGT (Halary et al. 2010). Indeed, conjugative systems are major players in the spread of antibiotic resistance, metabolic pathways, symbiotic traits, and other mobile genetic elements (de la Cruz and Davies 2000; Thomas 2000; van der Meer and Sentchilo 2003; Frost et al. 2005; Ding and Hynes 2009; Allen et al. 2010). Conjugation is also involved in the establishment of social processes, promoting biofilm formation (Ghigo 2001) and spreading of cooperative traits (Nogueira et al. 2009; Rankin et al. 2011). There are two known modes of conjugation that differ both in the type of translocated DNA, single-stranded DNA (ssDNA) versus double-stranded DNA (dsDNA), and in the complexity of the transport system (de la Cruz et al. 2010; Vogelmann et al. 2011). Both types of conjugative systems are either encoded by autonomously replicating plasmids or inserted in chromosomes as integrative conjugative elements (ICEs) (Smillie et al. 2010; Wozniak and Waldor 2010). We recently made a large-scale identification of ssDNA conjugation systems, both in plasmids and ICEs, and found them to be essentially short-term variants of otherwise identical backbone elements (Guglielmini et al. 2011). 
In the following, we note proteins from a given genetic element by GIMGE, where GI refers to the gene identification and mobile genetic element (MGE) to the name of the element (e.g., TraCF corresponds to the TraC protein of the F plasmid). Conjugative systems involved in ssDNA conjugation include two major protein complexes: relaxosomes and type IV secretion systems (T4SS) (reviewed in Fronzes et al. 2009; de la Cruz et al. 2010). MGE delivery through the membranes of the donor and recipient cells is done by the T4SS (fig. 1). In Proteobacteria, the T4SS are a large protein complex, including a ubiquitous ATPase (VirB4Ti or the distant homolog TraUR64), mating-pair formation (MPF) proteins that form the transport channel, and a pilus that attaches to the recipient cell (Alvarez-Martinez and Christie 2009; Fronzes et al. 2009). The large (>70 kDa) VirB4 ATPase is highly conserved in sequence and the only protein with clear-sequence homologs in all known T4SS. It is therefore the marker of the presence of a T4SS (Alvarez-Martinez and Christie 2009). VirB4 is thought to energize the assembly or activity of the secretion channel and is essential for pilus biogenesis and substrate transfer (Berger and Christie 1993; Fullner et al. 1996; Wallden et al. 2012). Four MPF families have been described in Proteobacteria: MPFT (based on the T-DNA conjugation system of A. tumefaciens plasmid Ti), MPFF (based on plasmid F), MPFI (based on the IncI plasmid R64), and MPFG (based on ICEHIN1056) (Smillie et al. 2010). These four models describe all functionally studied and nearly all T4SS identified by bioinformatic methods among Proteobacteria, both in plasmids and chromosomes (Guglielmini et al. 2011). The best-studied system is the vir operon (MPFT) from A. tumefaciens Ti plasmid. This small operon encodes 11 VirB proteins (Thompson et al. 1988; Ward et al. 1988), and we use these names as a template for naming the protein families of the MPFT system. 
T4SS from Cyanobacteria, Bacteroides, Firmicutes, Actinobacteria, and Archaea have homologs to VirB4 (Guglielmini et al. 2011). ssDNA-conjugative systems are very diverse, but very few studies have been done on the structure, function, and evolution of T4SS outside Proteobacteria and Firmicutes.

Fig. 1. Scheme of the most-studied T4SS, the vir system of A. tumefaciens Ti plasmid. The VirBX proteins are depicted as BX (e.g., B5 refers to the VirB5 protein). The coupling protein VirD4 (D4) and the mobilization complex, which includes the relaxase (MOB)-DNA complex are also represented. OM: outer membrane; IM: inner membrane.

The two other essential components of the ssDNA conjugation machinery are the relaxosome and the type IV coupling protein (T4CP). The relaxosome is composed of the relaxase (MOB) and often includes auxiliary proteins. It nicks the dsDNA and binds the resulting ssDNA at the origin of transfer. The diversity and evolution of the different families of relaxases has been extensively studied (Garcillan-Barcia et al. 2009). The highly conserved T4CP binds the DNA-relaxase substrate and couples it to the T4SS, possibly using ATP to translocate the complex across the inner membrane (Gomis-Ruth et al. 2004; Tato et al. 2005). The majority of T4CPs belong to the VirD4Ti family, but some T4SS were recently found to lack VirD4 and instead use a distantly related ATPase as T4CP (TcpApCW3) (Parsons et al. 2007; Steen et al. 2009). Protein secretion systems based on T4SS do not require relaxosomes. They usually require T4CP, albeit exceptions have been found in Bordetella pertussis and Brucella spp. (Alvarez-Martinez and Christie 2009). In these systems, proteins are translocated across the inner membrane by other means.

Conjugation of dsDNA takes place in mycelia-producing Actinobacteria (Grohmann et al. 2003; Ghinet et al. 2011). It relies on a single protein: TraBpSG5 that translocates dsDNA between neighboring cells in mycelia (Possoz et al. 2001). This protein resembles, in sequence and function, the essential protein FtsK that segregates sister chromosomes in the last stages of chromosomal replication (Bigot et al. 2007; Vogelmann et al. 2011). 
They are both members of the AAA+ motor ATPase family, which also includes both types of T4CP (VirD4 and TcpA) and both types of ATPases essential for the function of T4SS (VirB4 and TraU). Hence, all key proteins of the dsDNA and ssDNA conjugation systems are evolutionarily related. This association has not yet been clarified from a phylogenetic point of view. T4SS are often recruited by bacterial pathogens to deliver effectors to eukaryotic cells (Weiss et al. 1993; Vogel et al. 1998; Seubert et al. 2003; Nystedt et al. 2008). These MOBless T4SS, called so because they do not contain a relaxase gene, are closely related to the T4SS of conjugative systems. Indeed, several T4SS can perform both conjugation between bacteria and protein delivery (Vogel et al. 1998; Llosa et al. 2003; Schroder et al. 2011). Protein delivery by T4SS is essential for the virulence of many plant and animal pathogens, including Legionella pneumophila, Helicobacter pylori, Bartonella spp., Coxiella burnetii, and A. tumefaciens (reviewed in Seubert et al. 2003; Juhas et al. 2008; Alvarez-Martinez and Christie 2009). Only T4SS among MPFT and MPFI have been experimentally shown to be used for protein delivery. The extreme flexibility of T4SS has allowed at least two other types of exaptations, i.e., evolutionary events in which part of the pre-existing machinery of conjugation was recruited for other functions (Gould and Vrba 1982). H. pylori genomes encode a MOBless T4SS that is used for natural transformation. It is necessary to import environmental DNA (Hofreuter et al. 2001). In Neisseria gonorrhoeae, one T4SS is responsible for DNA export to the extracellular space, an intermediate step in the process of natural transformation among these bacteria (Hamilton et al. 2005). Interestingly, in the case of Neisseria, the locus encodes a T4SS and a MOBH-type relaxase that is necessary for DNA export (Salgado-Pabon et al. 2007). 
A previous analysis of MPFT systems suggests that exaptation of conjugative systems occurred several times in evolution (Frank et al. 2005). Because we recently found that MOBless T4SS are significantly more abundant than previously thought (Guglielmini et al. 2011), this point needs to be reassessed for MPFT and developed for other MPF types. Although studies on conjugation are as old as molecular biology itself (Lederberg and Tatum 1946), several recent works have significantly changed our understanding of this process. These include the discovery of new conjugation systems (Juhas, Crook, et al. 2007), of new key elements in known conjugation systems, e.g., TcpA (Parsons et al. 2007) and of the important role of ICEs (Burrus et al. 2002; Wozniak and Waldor 2010). Recent functional studies explored the diversity of T4SS (Alvarez-Martinez and Christie 2009), and bioinformatics work unraveled the presence of T4SS in several new clades (Guglielmini et al. 2011). Finally, other works highlighted the close structural and functional relationship between T4SS used for protein secretion and conjugation (Fernandez-Gonzalez et al. 2011). This succession of works opens the opportunity to infer a global scenario for the evolution of conjugative systems and T4SS, which is the goal of the present work. To assess the uncertainty in the phylogenetic reconstruction, we used classical methods such as bootstrap analyses. Yet, because these large and deep phylogenetic reconstructions can be sensitive to alignment algorithms and to methods to extract informative positions (Philippe et al. 2011), we also tested the robustness of our results by comparing them with two automatic analyses that we did in parallel. To guide the comparisons between the three sets of analyses, we made an assessment of the quality of the multiple alignments using T-Coffee (Notredame et al. 2000). 
By default, we only mention the results of our expert analysis (typically, the one with highest alignment quality), but highlight differences between methods when they are relevant. The overall structure of the article is the following. First, we analyze the deep branching of the key proteins that have homologs among (nearly) all conjugative systems of a given kind. This allows uncovering the initial split of the proteins that became key to conjugative processes. Then, we focus on the early events of the diversification of ssDNA conjugation, by far the most frequent process among prokaryotes. Finally, we detail the diversification of the best-known conjugation families within ssDNA-based systems with a focus on the evolution of gene repertoires and MOBless T4SS. This analysis provides information that naturally leads to a revision of T4SS classification based on evolutionary biology.

Materials and Methods

Data

Data on complete chromosomes and plasmids of prokaryotes were taken from Genbank Refseq (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/, last accessed November 2011). This included 1,207 chromosomes, 891 plasmids that were sequenced along with these chromosomes, and 1,391 plasmids that were sequenced independently. We used the annotations of the Genbank files, having removed all pseudogenes and proteins with inner stop codons. The information on T4SS was taken from Guglielmini et al. (2011).

Construction of Protein Profiles and Genome Searches

Unless mentioned explicitly, the protein profiles used are those described in Guglielmini et al. (2011). To study the presence/absence of the different components of the vir system, we made additional protein profiles, namely for VirB1, VirB2, VirB5, VirB7, VirB10, and VirB11. We first used PSI-Basic Local Alignment Search Tool (BLAST) (e value < 0.1) to search for distant homologs, using as query each of these genes from the VirB locus of the A. tumefaciens plasmid pTi SAKURA (Refseq entry NC_002147) and the aforementioned databank of completely sequenced replicons. Given the problems of convergence of PSI-BLAST when using complete genomes, and the extensive similarity of plasmid and chromosomal conjugative systems (Guglielmini et al. 2011), we restricted homology searches to plasmid sequences when building protein profiles. We retrieved the proteins with hits for each protein family and built multiple alignments using MUSCLE (Edgar 2004). We removed the few proteins with sizes very different from the average. We then rebuilt the multiple alignments with MUSCLE and trimmed them to remove the sites at the edges that were poorly aligned. We used HMMER 3.0 (Eddy 2011) to produce hidden Markov model (HMM) profiles and to perform searches within genomes. In the analysis of the evolution of the MPFT system, we only considered the hits that colocalized with previously detected vir proteins (VirB3, VirB4, VirB6, VirB8, VirB9). FtsK proteins were retrieved directly by using the PFAM PF01580 profile. TraB proteins, being closely related to FtsK, were retrieved by BLASTP searches of TraB from Streptomyces plasmid pCQ3 (YP_003280879) on the Actinomycetales proteins from the Refseq database. We sampled the top results and then built a protein profile for this protein and searched for its occurrences as for the other profiles. We built a web server to allow running the protein profiles. This is available at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::CONJscan-T4SSscan.
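The size-filtering step described above (removing the few proteins whose sizes differ strongly from the average before realigning) can be illustrated with a minimal sketch. The text gives no cutoff, so the `tolerance` parameter, the function name `filter_by_length`, and the toy sequences below are assumptions made for illustration only, not the authors' procedure.

```python
from statistics import mean


def filter_by_length(seqs, tolerance=0.5):
    """Keep sequences whose length lies within `tolerance` (a fraction)
    of the mean length. The 0.5 default is a hypothetical cutoff; the
    article does not state the threshold that was actually used."""
    avg = mean(len(s) for s in seqs.values())
    return {
        name: s
        for name, s in seqs.items()
        if abs(len(s) - avg) <= tolerance * avg
    }


# Toy hits: three proteins of similar size and one short fragment.
hits = {"p1": "M" * 100, "p2": "M" * 110, "p3": "M" * 95, "p4": "M" * 20}
kept = filter_by_length(hits)
print(sorted(kept))
```

In this toy case the short fragment `p4` is discarded and the three similarly sized proteins are retained for the second round of alignment.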

Phylogenetic Analysis

Unless explicitly stated, all phylogenetic analyses were performed with the following procedure. First, sequences were aligned using MUSCLE with default parameters as implemented in SeaView (Gouy et al. 2010). Second, all columns in the multiple alignment matrix with more than 80% of gaps were removed. Third, 100 replicate trees were built with RAxML 7.2.7 (Stamatakis 2006) using the model GTRGAMMA. We kept the one with the best likelihood. We calculated bootstraps with the standard implementation and used the autoMR stop criterion to obtain confidence values for each node. There were two exceptions to this method. We aligned the ATPases using MAFFT (Katoh and Toh 2010) with the G-INSI algorithm and removed the sites containing more than 60% of gaps. We performed the phylogenetic inference as mentioned earlier and additionally with PhyML 3.0 (Gascuel et al. 2010) under the LG model and with the bioNJ starting tree to get aLRT support values. The alignment of the set of VirB4 and VirD4 was built with MAFFT with the E-INSI algorithm, since these two proteins show different domain organization, and then manually edited. MAFFT was used instead of MUSCLE because it provided better alignments in these cases. The computation of 100 replicates plus hundreds of bootstrap trees was excessively time consuming, given the size of the data set in the VirB4/VirD4 analysis. Thus, we used PhyML 3.0 to build the phylogenetic tree, under the LG model and with the bioNJ starting tree. aLRT support values were also calculated for each node. The support tests we conducted revealed in this last tree some weak support that conflict with the aLRT values. To further investigate this, we used a reduced data set composed of VirB4 proteins, excluding the distant homolog TraU. Using this data set, we performed the tests described later. All multiple alignments and phylogenetic reconstructions are freely available on DRYAD (http://datadryad.org/).
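As an illustration of the second step of the procedure (removing alignment columns with more than 80% of gaps), the following is a minimal sketch. It assumes the alignment is held as a dictionary of equal-length strings with "-" as the gap character; the function name `drop_gappy_columns` and the toy alignment are hypothetical and do not reproduce the SeaView implementation.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.8):
    """Remove columns whose fraction of gaps exceeds max_gap_fraction.

    alignment: dict mapping sequence name -> aligned sequence.
    A column with exactly max_gap_fraction gaps is kept, matching the
    'more than 80%' wording of the text.
    """
    rows = list(alignment.values())
    length = len(rows[0])
    if any(len(r) != length for r in rows):
        raise ValueError("sequences must be aligned to the same length")
    keep = [
        i
        for i in range(length)
        if sum(r[i] == "-" for r in rows) / len(rows) <= max_gap_fraction
    ]
    return {
        name: "".join(seq[i] for i in keep)
        for name, seq in alignment.items()
    }


# Toy alignment: column 3 is all gaps (removed); column 5 has 4/5 gaps (kept).
aln = {
    "s1": "MK-A-",
    "s2": "MK-AL",
    "s3": "M--A-",
    "s4": "MR-A-",
    "s5": "MK-A-",
}
trimmed = drop_gappy_columns(aln)
print(trimmed["s2"])
```

Passing `max_gap_fraction=0.6` would correspond to the stricter threshold mentioned for the ATPase alignment.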

Tests to the Phylogenetic Analysis

To test the robustness of our conclusions based on phylogenetic analysis, we made a number of tests. These analyses aimed at testing the robustness of the conclusions to the multiple alignments, to the identification of informative sites in multiple alignments, and to the use of a protein model matrix. We therefore produced two automatic methods where we make the alignment of the protein using MAFFT and MUSCLE. Informative sites were extracted from the alignments using BMGE (Criscuolo and Gribaldo 2010). We fine-tuned BMGE parameters for each alignment to obtain a good compromise between the quality and the number of informative sites. The best model to analyze the data was chosen with ProtTest (Darriba et al. 2011). Note that ProtTest does not analyze the GTR model for proteins, so we cannot assess whether the model chosen by ProtTest is better than ours. Trees were built as before using RAxML, and we generated 100 bootstrap trees for each analysis. To compare the different analyses, we computed the quality of multiple alignment score using the Core component of T-Coffee (Notredame et al. 2000) for the three methods (our expert analysis, the MAFFT and MUSCLE-based analyses). This score, ranging from 0 to 100, is computed by comparing the consistency of the alignment with a list of precomputed pairwise alignments called library. We used the default “Mproba_pair” library. The key results, e.g., monophyly or basal position of certain clades, were tested for the three methods and are displayed in table 1 and supplementary table S1, Supplementary Material online. Each of these tests has an identification number in the tables. This number is displayed in the respective node in the phylogenetic trees. For example, in figure 2, the node with ID no. 3 refers to the monophyly of TraB and is indicated in table 1 as having 99% bootstrap support in our expert analysis, 100% in the automatic analysis using MAFFT, and 96% in the automatic analysis using MUSCLE. 
In supplementary table S1, Supplementary Material online, it is indicated that for this analysis the best alignment, as given by T-Coffee, is the one of the expert alignment (score 88), followed by MAFFT (76) and then MUSCLE (67). The node no. 3 in figure 2 is thus indicated in a black circle (high bootstrap support).
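The bootstrap support reported for a hypothesis such as "TraB is monophyletic" is the percentage of bootstrap trees in which the group forms a clade. A minimal sketch of that count follows. It assumes rooted trees written as nested tuples of leaf names, which is a simplification; the tree encoding, the function names, and the four toy replicates are illustrative assumptions and are unrelated to the RAxML output format.

```python
def leaf_sets(tree, out):
    """Return the leaves under `tree` and record the leaf set of every
    internal node in `out` (a set of frozensets)."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset().union(*(leaf_sets(child, out) for child in tree))
    out.add(leaves)
    return leaves


def is_monophyletic(tree, group):
    """True if the taxa in `group` are exactly the leaves of one clade."""
    clades = set()
    leaf_sets(tree, clades)
    return frozenset(group) in clades


def bootstrap_support(replicates, group):
    """Percentage of replicate trees in which `group` is monophyletic."""
    hits = sum(is_monophyletic(t, group) for t in replicates)
    return 100.0 * hits / len(replicates)


# Four toy replicates; TraB1 and TraB2 form a clade in three of them.
replicates = [
    ((("TraB1", "TraB2"), "FtsK1"), ("VirB4", "VirD4")),
    ((("TraB1", "TraB2"), "FtsK1"), ("VirB4", "VirD4")),
    (("TraB1", ("TraB2", "FtsK1")), ("VirB4", "VirD4")),
    (("TraB1", "TraB2"), ("FtsK1", ("VirB4", "VirD4"))),
]
support = bootstrap_support(replicates, {"TraB1", "TraB2"})
print(support)
```

For unrooted trees the same idea applies to bipartitions rather than rooted clades, which this sketch does not handle.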

Table 1.

Analysis of the Robustness of Key Phylogenetic Results.

a Proteins included in the data set.

b The different hypotheses for which we present the bootstrap supports.

c When the hypothesis corresponds to what we observe in the reference phylogeny, and if the support value is greater than 50, it is displayed here and in the corresponding figure with a number.

d Bootstrap values for each hypothesis and for each alignment technique.

Fig. 2. Phylogenetic analysis of the AAA+ ATPases associated with conjugation. The position of the root was determined using the AAA+ ATPase VirB11 in a separate analysis. Names along the FtsK tips correspond to the taxonomic origins of each protein, reflecting the width of sampling. Bold vertical black lines represent nodes with a high support value (bootstrap >70% and aLRT >0.7). Bold gray lines represent nodes with high aLRT score (>0.7) but a weaker bootstrap (<70%). The homologs of TcpA are found only in Firmicutes. The homologs of TraB are found only in Actinobacteria. Numbers in circles refer to the analysis of robustness in table 1 (identified in the third column of table 1); black background stands for a high support (≥70% bootstrap in the best-scoring alignment) and gray background for a moderate support (≥50% bootstrap in the best-scoring alignment).

Relative Decrease in Protein Similarity with Divergence

For each pair of T4SS loci, we made pairwise alignments of each of the orthologous pairs of genes. Alignments were done using an end-gap free version of the Needleman–Wunsch algorithm (Mount 2004), with a BLOSUM60 matrix, open penalty of 1.2, and extension penalty of 0.8. We then plotted the percentage of similarity between VirB4 homologs and each of the other pairs of homologs. The points for each scatter plot were then fitted with a spline (λ = 1,500), and the curves were superimposed.
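To make the idea of an end-gap-free global alignment concrete, here is a minimal sketch in which gaps at either end of either sequence are not penalized. It deliberately simplifies the method described above: it uses a flat match/mismatch score instead of the BLOSUM60 matrix, a single linear gap penalty instead of separate open (1.2) and extension (0.8) penalties, and it reports percent identity over the aligned region rather than percent similarity. All function names and parameter values are assumptions for illustration.

```python
def end_gap_free_align(a, b, match=1.0, mismatch=-1.0, gap=-2.0):
    """Needleman-Wunsch variant with free leading and trailing gaps.
    Returns (score, aligned pairs over the overlapping region)."""
    n, m = len(a), len(b)
    # Row 0 and column 0 stay at 0: leading gaps cost nothing.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            move[i][j] = "D" if best == diag else ("U" if best == up else "L")
    # Trailing gaps are free: take the best cell of the last row or column.
    ends = [(score[n][j], n, j) for j in range(m + 1)]
    ends += [(score[i][m], i, m) for i in range(n + 1)]
    best_score, i, j = max(ends)
    pairs = []
    while i > 0 and j > 0:
        if move[i][j] == "D":
            pairs.append((a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif move[i][j] == "U":
            pairs.append((a[i - 1], "-"))
            i -= 1
        else:
            pairs.append(("-", b[j - 1]))
            j -= 1
    pairs.reverse()
    return best_score, pairs


def percent_identity(pairs):
    """Percentage of aligned positions carrying identical residues."""
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# The shorter toy sequence is embedded in the longer one; overhangs are free.
score, pairs = end_gap_free_align("ACGTACGT", "TTACGTACGTTT")
print(score, percent_identity(pairs))
```

Not penalizing end gaps prevents length differences at the termini of homologous proteins from lowering the similarity estimate, which matters when comparing many orthologous pairs of unequal length.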

Results and Discussion

Early Evolutionary Split of the Key Conjugation ATPases

The two families of T4CPs (with prototypes given by the VirD4pTi and TcpApCW3), the two families of ATPases (based on VirB4Ti and TraUR64), the dsDNA conjugation protein TraBpSG5, and FtsK are all part of the superfamily of AAA+ motor ATPases. Hence, we investigated the events at the onset of the natural history of conjugation by analyzing the phylogeny linking the homologs of all these protein profiles among 3,489 replicons (see Materials and Methods). The tree was rooted using the distantly related protein family derived from VirB11Ti (Planet et al. 2001). The monophyly of VirB11 is robust in both the expert and the automatic analyses (table 1). This phylogenetic reconstruction separates a monophyletic VirD4/VirB4 clade (67% bootstrap) from the others. This fits previous genomic and structural analysis showing the similarity between the dsDNA translocators FtsK and TraB on the one hand and between the ssDNA translocators VirD4 and VirB4 on the other (Iyer et al. 2004; Cabezon et al. 2011). This analysis allows rooting the tree and highlights the early split between ssDNA and dsDNA translocases. But the inclusion of the distantly related VirB11 produces a multiple alignment with few sufficiently conserved positions, increasing uncertainty in the process of phylogenetic inference (supplementary table S1, Supplementary Material online). This reduced the power of this data set to robustly resolve the more recent splits. Thus, we excluded VirB11 from the analysis and made a new phylogenetic reconstruction of the remaining five families. This tree shows the same dichotomy at the base (fig. 2), with strong support for all five monophyletic groups with our expert analysis and in the best automatic method (table 1). These results fit our observation that our VirB4 protein profiles often match VirD4 proteins and vice versa, albeit with weak scores, and that none of them significantly matches proteins of the TraB/FtsK families. 
T4CPs and VirB4s show clear structural similarities, underscoring a common functional mechanism (Cabezon et al. 2011). The most conspicuous structural difference between T4CPs and VirB4s is the existence of three alpha helices that are conserved in the C terminus of VirB4 proteins but are absent in T4CPs. Deletion of these helical structures in the VirB4 homolog TrwKR388 resulted in a large increase in its ATPase activity, suggesting that the C-terminal end of VirB4 proteins functions as an autoregulatory element (Pena et al. 2011). Overall, these analyses fit structural work, suggesting that the common ancestor of the VirB4/VirD4 families consisted of a soluble protein engaged in polypeptide transport (as it’s still the case in most studied VirB4 proteins). VirB4 later became membrane bound by association with the VirB3 component of T4SS (as in VirB4R388). This association can be covalent (as in VirB4R6K) (Pena et al. 2011). The protein that specialized in ssDNA transport (T4CP) also acquired an integral-membrane protein domain in its N-terminus. This component is involved in its interaction with another T4SS component, in this case VirB10 (Llosa et al. 2003; de Paz et al. 2010). The other basal branch in the phylogeny includes TraB, TcpA, and FtsK, all with strong to moderate evidence of monophyly (99%, 96%, and 62% bootstraps, respectively) (fig. 2). The relative order of the split between the three clades is different from a previously published one, but its bootstrap support is weak in our tree (and not documented in Parsons et al. 2007). SpoIIIE, a protein involved in segregation of chromosomes during Bacillus subtilis sporulation (Wu and Errington 1998), branches within the FtsK clade (data not shown). The elements of the TraB family are found only in Actinobacteria and are related with FtsK, but they do not emerge from within the FtsK. Instead, they derive independently from the ancestor of this protein. 
FtsK is an essential protein that, contrary to some previous suggestions (Iyer et al. 2004), includes at least one member among Archaea (YP_503307.1). The latter is annotated as FtsK-like protein, and it is not closely related with HerA proteins, which branch closer to the VirD4/B4 branches, and its study falls outside the scope of this article. FtsK phylogeny follows approximately the one of bacteria (Gupta 2004) and thus provides a guideline to the timing of the diversification of these protein families. The tree in figure 2 shows that proteins have widely diverse tip-to-root branch lengths, i.e., the proteins do not evolve according to a strict molecular clock. Therefore, we cannot assume a molecular clock that would allow dating the split of these families and thus presumably that of conjugation processes. Yet, this data does place the origin of ssDNA conjugation extremely early in the history of life. While TraB and TcpA seem to diversify after FtsK, in agreement with their presence only in Firmicutes and Actinobacteria, the diversification of the pair VirB4/VirD4 could be contemporaneous or shortly subsequent to that of FtsK. These results suggest that the two conjugation mechanisms, ssDNA and dsDNA conjugation, are based on ATPases that diverged before the last common ancestor of bacteria.

T4SS Phylogeny

We aligned the proteins matching the VirB4 and TraU profiles to infer the evolutionary history of all VirB4 homologs. We then used VirD4 to root this tree. Despite relatively weak support in the bootstrap tests (48% in the best automatic alignment and 69% in our expert analysis), this rooting shows a good aLRT support value (0.82), consistent with the literature in terms of phylogeny and biochemical function (Iyer et al. 2004; de la Cruz et al. 2010; Smillie et al. 2010) and with the previous analysis of the five ATPases (78% bootstrap). The tree shows that all VirB4 and TraU-related proteins can be classified into eight groups, which are represented by eight well-supported clades (fig. 3). The two basal groups in the VirB4 phylogenetic reconstruction are MPFI followed by a group specific to Cyanobacteria (MPFC). This is in agreement with the low similarity between TraUR64 (MPFI) and VirB4Ti (MPFT) that had prevented previous phylogenetic reconstructions of all VirB4 homologs (Smillie et al. 2010). With the availability of more sequences of these proteins, notably from Cyanobacteria, and the inclusion of the T4CP, we could now reconstruct a reliable phylogeny. However, the position of MPFI at the base of the tree must be taken with care. Our expert method and the two controls place MPFI at the base of the phylogeny but with relatively low support (45% bootstrap in the best automatic alignment) (table 1). The MPFC clade often arises at the base in the bootstrap trees or as a sister clade of MPFI. In any case, this analysis places one of these two clades at the root of the tree in more than 85% of the bootstrap analyses.

Fig. 3. Joint phylogenetic reconstruction of the VirD4 and VirB4/TraU families of proteins from conjugative systems. Bold vertical black lines represent nodes with a high support value (aLRT ≥ 0.9), and bold vertical gray lines represent nodes with a support value between 0.7 and 0.9. Black square brackets indicate the VirB4 and VirD4 clades; colored square brackets on the left delimit the different MPF clades (purple: MPFFATA, orange: MPFFA, red: MPFF, black: MPFB, blue: MPFT, yellow: MPFG, cyan: MPFC, green: MPFI); colored square brackets on the right delimit the relaxase clades within the VirD4 part of the tree (blue: MOBP, green: MOBQ, red: MOBF, purple: MOBB, orange: MOBH, brown: MOBC, red/green dashed brackets: clades with a mix of MOBF and MOBQ; black: mix of MOBP, MOBF and MOBH). Numbers in circles refer to the analysis of robustness in table 1 (identified in the third column of table 1); black background stands for a high support (≥70% bootstrap in the best-scoring alignment) and gray background for a moderate support (≥50% bootstrap in the best-scoring alignment).

Some mobile elements encoding an MPFI, e.g., the R64 plasmid from the MOBP12 family, besides encoding a thick rigid pilus, with homology to MPFT, also encode a thin pilus that is only required for conjugation in liquid and that is homologous to type IV pili (Kim and Komano 1997). This led to the classification of MPFI as T4SSb in opposition to MPFF and MPFT, both classed as T4SSa (Christie and Vogel 2000). However, other MPFI elements, e.g., plasmid CTX-M3, lack a thin pilus and are still able to mate in liquid at high frequency (Golebiewski et al. 2007). Thus, the thin pilus of MOBP12 plasmids is just an additional feature of some MPFI systems, acting probably just as a facilitator of liquid mating and a selector of recipients (Kim and Komano 1997), while the core MPFI machinery forms the basis of this conjugation system. 
In any case, the highly divergent nature of TraUR64 is a signature for this whole family of liquid maters. Nothing is known experimentally about MPFC. Because cyanobacteria diverged early on from Proteobacteria, MPFC might also contain peculiarities relevant to the genetic or physical environment of these organisms. MPFG is the next most basal group in the tree. This system was recently discovered, was identified only in Proteobacteria, and its features are largely unknown (Juhas, Crook, et al. 2007; Juhas, Power, et al. 2007). Interestingly, an MPFG encoding element, the PAPI-1 pathogenicity island of Pseudomonas aeruginosa, has several genes homologous to the thin pilus of R64 (Carter et al. 2010). Hence, the association between MPF and thin pili might be an ancestral trait.

Four groups correspond to the different T4SS families of Proteobacteria (MPFF, MPFG, MPFI, MPFT) (Juhas, Crook, et al. 2007; Smillie et al. 2010). These four groups are clearly separated because they all have strong bootstraps in the analysis of monophyly (table 1), and each contains a set of four to nine genes that are specific, i.e., their protein profiles match loci of a given MPF but not those of the other MPF types (Smillie et al. 2010). Interestingly, 307 out of 327 (94%) of the T4SS of Proteobacteria are classed in one of these four clades. We investigated the loci of the 20 remaining VirB4 proteins. One of them does not colocalize with any of the other conjugation protein profiles, including relaxases and T4CP. The other 19 VirB4 are encoded near genes specific to one, and only one, MPF type. They were not classed as a given MPF just because the number of these specific genes is below the quorum we set up as a minimum for a putative complete T4SS (Guglielmini et al. 2011). Many of these 20 unclassed elements are thus probably inactive, enduring a genetic degradation that results in incomplete loci. Alternatively, they may correspond to highly modified versions of T4SS; the H. pylori Cag-pathogenicity island is notably found within these elements.

A few genomes of species not classed among Proteobacteria encode T4SS classed within MPFF and MPFT. All these bacteria are diderms, i.e., they have both an inner and an outer membrane. This list includes MOBless T4SS in one Aquificae (MPFF) and one Protochlamydia (MPFF), and conjugative T4SS in one Chlorobi (MPFT), one Deferribacteres (MPFF), one Acidobacteria (MPFT), and two Fusobacteria (MPFT). These elements are scattered in the trees of MPFT and MPFF (figs. 4 and 5), suggesting different events of horizontal transfer from Proteobacteria. Indeed, they do not cluster together in the phylogenetic trees (0% in bootstrap trees). The elements of each given bacterial clade are always monophyletic, suggesting one single transfer event, but the very small number of such elements does not allow any robust conclusions for the moment. Only one nonproteobacterial clade, Acidobacteria, is basal in the tree of MPFT (100% bootstraps in the expert analysis and the controls). Acidobacteria are often regarded as a sister clade of Proteobacteria (Ciccarelli et al. 2006), and therefore, we cannot discard the possibility of a diversification of MPFT before the split between Acidobacteria and Proteobacteria. However, since MPFG and MPFI are more basal in the tree of VirB4 (fig. 3), and both only found in Proteobacteria, the scenario of a transfer from Proteobacteria to Acidobacteria remains more parsimonious. Interestingly, all T4SS predicted in these six nonproteobacterial clades were classed among MPFF and MPFT. Nothing is known about conjugation in these clades, but these data suggest they might use mechanisms closely related to, and originating from, those of Proteobacteria.
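The quorum rule invoked above for the 20 unclassed VirB4 loci can be illustrated with a small sketch. It is not the authors' pipeline: the function name `class_mpf`, the input layout (a mapping from MPF type to the set of type-specific genes detected near a VirB4), and the quorum value used in the example are all assumptions made for illustration.

```python
from typing import Dict, Optional, Set


def class_mpf(specific_hits: Dict[str, Set[str]], quorum: int) -> Optional[str]:
    """Class a VirB4 locus by its neighboring MPF-specific genes.

    specific_hits maps an MPF type to the set of type-specific genes
    detected near the VirB4 gene (hypothetical input layout).
    Returns the best-represented MPF type if it reaches the quorum,
    otherwise None (the locus stays unclassed).
    """
    best_type: Optional[str] = None
    best_count = 0
    for mpf_type, genes in specific_hits.items():
        if len(genes) > best_count:
            best_type, best_count = mpf_type, len(genes)
    return best_type if best_count >= quorum else None


# A locus with four MPFT-specific genes passes an assumed quorum of 3 ...
complete = class_mpf({"MPFT": {"virB3", "virB6", "virB8", "virB9"}, "MPFF": set()}, quorum=3)
# ... whereas a degraded locus with a single specific gene does not.
degraded = class_mpf({"MPFT": {"virB8"}, "MPFF": set()}, quorum=3)
```

Under this reading, the 19 unclassed loci described above would behave like `degraded`: their neighboring genes point to one MPF type only, but too few of them are present to count as a putative complete T4SS.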

Phylogenetic analysis of MPFF VirB4 proteins. Bold vertical black lines represent nodes with a high support value (bootstrap >90%), and bold vertical gray lines represent nodes with a support value between 70% and 90%. Green branches correspond to taxa that are not from Proteobacteria (plus the outgroup). Red branches represent the VirB4 not associated to a relaxase (MOBless T4SS). Green and red dotted branches represent MOBless T4SS that are not from Proteobacteria. The bar on the right stands for the chromosomal (black) or plasmidic (white) proteins. Known representative systems are labeled. The GGI DNA release system corresponds to the N. gonorrhoeae gonococcal genetic island (Hamilton et al. 2005). Number in circles refers to the analysis of robustness in table 1 (identified in the third column of table 1); black background stands for a high support (≥70% bootstrap in the best-scoring alignment) and gray background for a moderate support (≥50% bootstrap in the best-scoring alignment).

Phylogenetic analysis of MPFT VirB4 proteins. Bold vertical black lines represent nodes with a high support value (bootstrap > 90%), and bold vertical gray lines represent nodes with a support value between 70% and 90%. Green branches correspond to taxa that are not within Proteobacteria (or the outgroup). Red branches represent VirB4 not associated to a relaxase (MOBless T4SS). The leftmost vertical bar on the right stands for chromosomal (black) or plasmidic (white) proteins. The colored bar represents the different gene order patterns found; the patterns and their corresponding color are depicted at the bottom (the numbers represent the corresponding virB gene); a pattern is attributed to a system if, considering the possibly missing vir genes, the gene order is preserved. For example, a system composed of the genes virB1, virB4, virB6, virB5, virB8, virB9, and virB10 in this order will be assigned to the orange pattern. Unique or atypical patterns are depicted in black. Known representative systems are labeled. Numbers in circles refer to the analysis of robustness in table 1 (identified in the third column of table 1); black background stands for a high support (≥70% bootstrap in the best-scoring alignment) and gray background for a moderate support (≥50% bootstrap in the best-scoring alignment).

Phylogeny of the T4CP at the Light of VirB4 Phylogeny

The trees of VirD4 and VirB4 are not congruent (ELW confidence value: 0, and SH P value < 0.01). Yet, they share many features (fig. 3). The proteins encoded by the virD4 genes colocalizing in replicons with virB4 tend to form similar clades. Notably, the VirD4 associated with each of six of the eight VirB4 clades also clustered in nearly monophyletic clades of T4CP (MPFFA, MPFFATA, MPFB, MPFG, MPFI, and MPFC). VirD4 of the two remaining clades (MPFT and MPFF) are scattered in a small number of clades. Most of the MPFFA use TcpA instead of a VirD4-like T4CP (see later). The few VirD4 proteins found in MPFFA are also monophyletic (orange in the bottom of fig. 3). It was previously shown that plasmid T4CP are sometimes scattered in different groups corresponding to given relaxases (Smillie et al. 2010). This result is still valid with the present much larger data set. For example, the T4CP clade with a mixture of MPFT and MPFF has one type of relaxase in common (MOBF). On the other hand, some relaxase types are scattered among different VirD4 clades that follow MPF types, e.g., the VirD4 associated with MPFC is monophyletic and includes three different relaxases, which are also found in other MPF types. Hence, evolution of conjugation is driven by two main constraints, one acting mainly on the T4SS, represented by VirB4, and other on the relaxosome, represented by the relaxases. T4CP tends to coevolve with both components.

Cell Envelope Adaptation in Monoderms

The most basal clades in both VirB4 and VirD4 phylogenies correspond to bacteria with both inner and outer membranes, i.e., diderms (98–100% of the bootstrap trees in all three analyses). This strongly suggests that ssDNA conjugation was invented among diderms. In this scenario, ssDNA conjugation would have been acquired by monoderm prokaryotes, i.e., organisms devoid of an outer membrane, by HGT. This also fits the observation that all monoderm conjugation systems are in two sister clades: MPFFA and MPFFATA (monophyletic in 67–55% of the bootstrap trees). MPFFATA includes six distinct groups of Firmicutes (monophyly of all Firmicutes supported by 0% of the bootstrap trees, table 1), two of Actinobacteria (monophyly of all Actinobacteria supported by 0% bootstrap trees), one of Tenericutes (monophyly of the clade supported by 96–99% bootstrap trees), and a group of Archaea unlikely to be monophyletic (bootstrap of only 17–29%) with a clear separation between Euryarchaeota and Crenarchaeota (91–96%, respectively, and 100% bootstrap support for each clade) (fig. 6). The deeper relations between these clades are difficult to disentangle, given the low bootstrap supports of the basal nodes. Within the Firmicutes clades, we find the main divisions, i.e., Bacillales, Lactobacillales, and Clostridia, scattered in the tree. This suggests that, once a conjugative system arose in this phylum, it spread early among the main divisions, and transfers between divergent clades were maintained up to a certain moment in evolution. The monophyly of monoderms in the VirB4 tree suggests that monoderms acquired conjugative systems by transfer from diderms. This early acquisition was followed by the adaptation of the T4SS to monoderms. Finally, frequent conjugation between diderms contributed to the scattered distribution of taxa in the phylogenetic tree of MPFFATA and MPFFA.

Phylogenetic analysis of MPFFATA VirB4 proteins. Bold vertical black lines represent nodes with a high support value (bootstrap >90%), and bold vertical gray lines represent nodes with a support value between 70% and 90%. Squared brackets delimit the different taxonomic clades (plus the outgroup). Red branches represent the VirB4 not associated to a relaxase (MOBless T4SS). The bar on the right stands for the chromosomal (black) or plasmidic (white) proteins. Numbers in circles refer to the analysis of robustness in table 1 (identified in the third column of table 1); black background stands for a high support (≥70% bootstrap in the best-scoring alignment) and gray background for a moderate support (≥50% bootstrap in the best-scoring alignment).

The MPFFA clade includes two groups of Actinobacteria intermingled with three groups of Firmicutes (<5% bootstrap support for a net separation of the two clades) (fig. 7). The most basal group (Firmicutes III in fig. 7) is constituted by a few elements from Firmicutes (bootstrap support for this basal position of 52–100%, table 1). This suggests that the ancestral conjugative system might have arisen within Firmicutes from which it was transferred to Actinobacteria. This is consistent with the observation of a basal group, including only Firmicutes and Tenericutes in the sister MPFFATA tree (fig. 6). The subsequent split in the MPFFA group separates a clade with Actinobacteria and Firmicutes II from Firmicutes I (fig. 7). The latter encodes TcpA as a putative T4CP, which further supports the monophyly of Firmicutes I based on VirB4 sequences (52–99% of bootstrap support). Homologs of TcpA were found in the plasmid pCW3 of Clostridium perfringens, in ICEBs1 of B. subtilis, and in Tn916 of Enterococcus faecalis (Teng et al. 2008). We found that 63% of the TcpApCW3 hits were colocalized with VirB4 in MPFFA systems of Firmicutes, and all 47 of these regions lacked a VirD4-like protein. This gives further credit to the hypothesis that TcpA is an alternative T4CP (Parsons et al. 2007; Steen et al. 2009). TcpA-associated systems are, with one single exception, also associated with MOBT. The MOBT relaxase of Tn916 (Orf20), when assisted by the accessory protein Int, produces strand- and sequence-specific cleavage generating a 3′-OH (Rocco and Churchward 2006). Thus, although phylogenetically different, TcpA and VirD4 T4CPs both seem to be alternatives for ssDNA conjugation, suggesting the recruitment of a new dsDNA translocase to carry out ssDNA conjugation in this subclade of MPFFA. This process was concomitant with the acquisition of a very atypical relaxase, which has no similarity with other relaxases, and instead resembles replication initiator factors of phages and plasmids (Garcillan-Barcia et al. 2009). Interestingly, ICEBs1 transfers extremely fast within chains of bacteria (Babic et al. 2011). It is currently unknown if this behavior, reminiscent of TraB (which, as we showed earlier, is a closer homolog of TcpA than VirD4 is), has associated mechanistic analogies, e.g., if TcpA might have maintained a dsDNA translocase activity.

Phylogenetic analysis of MPFFA VirB4 proteins. Bold vertical black lines represent nodes with a high support value (bootstrap >90%), and bold vertical gray lines represent nodes with a support value between 70% and 90%. Squared brackets delimit the different taxonomic clades (plus the outgroup). Red branches represent the VirB4 not associated to a relaxase (MOBless T4SS). The bar on the right stands for the chromosomal (black) or plasmidic (white) proteins. Numbers in circles refer to the analysis of robustness in table 1 (identified in the third column of table 1); black background stands for a high support (≥70% bootstrap in the best-scoring alignment) and gray background for a moderate support (≥50% bootstrap in the best-scoring alignment).

Evolution of MPFT

Except for VirB4, which has homologs in every T4SS, most of our protein profiles for a given MPF type allow identifying homologs only within the respective MPF system. Several of these are nearly ubiquitous within a given MPF type, and we have previously used them to class MPF types in plasmids and chromosomes (Smillie et al. 2010; Guglielmini et al. 2011). To analyze in detail the patterns of presence and absence of MPF-specific genes, we analyzed the MPFT system, the best studied and most frequently found in sequenced genomes. Its prototype is the vir system of the A. tumefaciens plasmid Ti, which encodes 11 genes: virB1 to virB11. We built HMM profiles for each protein and used them to scan plasmids for homologs. We excluded chromosomes from this particular analysis because these are more likely to contain inactivated T4SS undergoing genetic degradation, and this would lead to the introduction of false positives in the analysis. Most systems include between 8 and 11 out of the 11 genes, but not always the same genes are missing (supplementary fig. S1, Supplementary Material online). The only gene nonessential for conjugation in this system, the lytic transglycosylase virB1 (Berger and Christie 1994), is often missing or not identified (absent in 48% of the MPFT). The small VirB7 lipoprotein interacts with VirB9 and performs some sort of stabilizing function (Spudich et al. 1996) and is also often missed in the search (67%). The most basal branches within the MPFT tree show an increasing number of proteins that we fail to detect, most notably the minor component of the pilus VirB5 (missing in 25%). VirB5 and VirB7 are the most exposed proteins at the cell outer membrane (Christie and Vogel 2000; Fronzes et al. 2009) and are cell receptors for phages and the immune system (Haase et al. 1995; Harris and Silverman 2002; Alvarez-Martinez and Christie 2009). They are therefore likely to evolve rapidly because of these two types of selection pressure. 
Accordingly, both VirB5 and VirB7 show evidence of positive selection in the T4SST of Bartonella (Engel et al. 2011). Hence, the patterns of gene absence are probably caused by both genuine gene loss and rapid evolution of some T4SS components, which hinders their detection. The names of the different vir genes correspond to their order within the prototype VirBTi system. This prototype gene order pattern (from 1 to 11 in ascending order) is conserved in a large fraction of the MPFT (fig. 4). For almost all MPFT loci, the order is strictly conserved for a core composed of virB2, virB3, virB4, virB8, virB9, and virB10. As mentioned earlier, virB7 is often missed by our scan. The gene virB11 can be found before virB2, and virB1 after virB10; this defines the gene order depicted in green in figure 4. Importantly, the node separating the two large clades of MPFT relative to gene order is also highly supported by the analysis of the VirB4 phylogeny (98% bootstrap). The genes virB5 and virB6 are sometimes placed after virB10 (fig. 4, in dark blue), which seems a derivation from the previous pattern. These three patterns of gene order represent more than 80% of all the MPFT. Interestingly, the prototype pattern is less often found on chromosomes, the “green” pattern being more represented. It is difficult to say for the moment if this difference is a simple consequence of the higher frequency of chromosomal T4SS in this part of the tree or if this gene order is adaptive in chromosomal loci. Importantly, the clusters of gene order in the tree accurately reflect the phylogeny of VirB4. This is further evidence that recombination of distant VirB4 variants rarely occurs, even within MPF types. Considering the number of possible permutations and the relatively low number of different patterns, these data suggest that the gene order within vir systems is highly constrained in most genes, with four genes often being found in different positions (virB1, virB5, virB6, and virB11). 
The gene succession is also preserved; indeed, the vast majority of virB genes are directly adjacent, suggesting strong counterselection for insertions in the loci (data not shown). Highly conserved gene order at a locus is a sign of selection for a given organization of transcription (Rocha 2006). In the case of large protein complexes, such organization can give rise to an ordered assembly of the complex, as it has been shown for the flagellum (Kutsukake et al. 1990). Gene order conservation thus suggests conservation of a developmental plan. The variants we see, outlined in figure 4, could reflect innovations in this plan.
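The rule used to attribute a gene order pattern (the observed order must be preserved once missing vir genes are allowed for) amounts to a subsequence test. The sketch below is only an illustration of that rule, not the authors' code. The prototype order (virB1 to virB11 ascending) comes from the text; the full "green" order written here (virB11, then virB2 to virB10, then virB1) is an assumed reading of the description above, and the names `PATTERNS`, `is_subsequence`, and `assign_pattern` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Gene orders expressed as virB gene numbers.
PATTERNS: Dict[str, List[int]] = {
    # Prototype order of the Ti plasmid virB operon (from the text).
    "prototype": list(range(1, 12)),
    # Assumed full order for the "green" pattern: virB11 first, virB1 last.
    "green_assumed": [11] + list(range(2, 11)) + [1],
}


def is_subsequence(observed: Sequence[int], pattern: Sequence[int]) -> bool:
    """True if observed appears in pattern in the same order, gaps allowed."""
    remaining = iter(pattern)
    # Each membership test advances the iterator past the matched gene,
    # so later genes must be found further along the pattern.
    return all(gene in remaining for gene in observed)


def assign_pattern(observed: Sequence[int],
                   patterns: Dict[str, List[int]]) -> Optional[str]:
    """Return the first pattern compatible with the observed order, or None."""
    for name, pattern in patterns.items():
        if is_subsequence(observed, pattern):
            return name
    return None


# virB11 before virB2 and virB1 after virB10: only the green order fits.
green_like = assign_pattern([11, 2, 3, 4, 8, 9, 10, 1], PATTERNS)
# virB6 before virB5 fits neither pattern defined here.
atypical = assign_pattern([1, 4, 6, 5, 8, 9, 10], PATTERNS)
```

A locus that retains only the conserved core (virB2, virB3, virB4, virB8, virB9, virB10) is compatible with both orders in this sketch, and `assign_pattern` simply returns the first match; a real analysis would need an explicit rule for such ties.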

T4SS Exaptation

We recently uncovered that a large fraction of T4SS lack neighboring relaxases (Guglielmini et al. 2011). A few observations suggest that most of these are not genetic elements undergoing degradation. First, these MOBless T4SS are more often chromosomal than plasmidic. Second, many of these chromosomal elements lack neighboring integrases. Third, the T4SS known to deliver proteins were classed as MOBless T4SS. These observations suggest that many MOBless T4SS are not undergoing degradation but that, instead, they result from recruitment of conjugation systems for other functions (exaptation). The VirB4 phylogeny confirms that, within MPFT, the loss of the relaxase occurred many times and that this pattern is also found among the other MPF types (figs. 4–7). Just like conjugative systems of ICEs and plasmids are interspersed in the phylogenetic trees (Guglielmini et al. 2011), MOBless T4SS are interspersed with conjugative systems. This shows that MOBless T4SS arose frequently and in independent instances. The only exception concerns the Archaea and the Actinobacteria, for which the lack of known relaxases has been pointed out before (Guglielmini et al. 2011; Garcillan-Barcia et al. 2009). In these clades, it is likely that the abundance of MOBless T4SS predominantly reflects the presence of unknown relaxases. Importantly, the T4SS that are experimentally known to have nonconjugation-related functions are interspersed in the trees of MPFT and MPFF (figs. 4 and 5). This suggests that conjugative T4SS have been frequently recruited for other functions.
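As a rough illustration of what "lacking a neighboring relaxase" could mean operationally, the sketch below flags a T4SS as MOBless when no relaxase gene lies within a given number of genes of its VirB4 gene. This is an assumption-laden toy, not the published procedure: the representation of a replicon by gene ranks, the function name `is_mobless`, and the distance threshold are all hypothetical.

```python
from typing import Iterable


def is_mobless(virb4_rank: int,
               relaxase_ranks: Iterable[int],
               max_gene_distance: int) -> bool:
    """Flag a T4SS as MOBless.

    virb4_rank: position of the VirB4 gene along the replicon (gene rank).
    relaxase_ranks: positions of all relaxase hits on the same replicon.
    max_gene_distance: hypothetical neighborhood size, in genes.
    """
    return not any(abs(rank - virb4_rank) <= max_gene_distance
                   for rank in relaxase_ranks)


# Relaxase 12 genes away from VirB4, assumed neighborhood of 60 genes.
conjugative = is_mobless(100, [112], max_gene_distance=60)
# Only relaxase on the replicon is 900 genes away.
mobless = is_mobless(100, [1000], max_gene_distance=60)
```

Note that a flag of this kind cannot distinguish a true exaptation from a locus whose relaxase belongs to a family not yet described, which is the caveat raised above for Archaea and Actinobacteria.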

An Evolution-Based Classification System for MPF

The lack of an all-encompassing classification scheme for conjugative systems and the extremely diverse gene nomenclature for homologous conjugation genes greatly and unnecessarily complicate the analysis of the literature of the domain. We suggest that the phylogeny of VirB4, the only ubiquitously recognizable protein of T4SS, could be used to class ssDNA conjugative systems and other T4SS. This could be the foundation for the much-needed gene name standardization in the literature and databases. The model systems of the vir operon of A. tumefaciens Ti plasmid (MPFT), F plasmid (MPFF), R64 plasmid (MPFI), and ICEHin1056 (MPFG) could be used for all Proteobacteria and possibly for other diderm clades such as Acidobacteria. Four other MPF types for now cover the diversity of all the other systems insofar as the VirB4 phylogeny is concerned. These would include a type that for the moment only includes Bacteroides (MPFB) and another that includes only Cyanobacteria (MPFC). The classification would also include the two types that are specific to monoderms, the MPFFA and MPFFATA. The MPFFA type, given its heterogeneity in the use of T4CP, might be split into two groups when more is known about the differences in the biochemistry of conjugation in the group. The advantage of this classification is that it is based on evolutionary biology, tends to reflect similarity between elements, and can be done even when one knows yet relatively little of the biochemistry of the elements being classed. We believe there is little risk of an excessive inflation in the number of classes of MPF with the uncovering of new uncultivated bacterial clades. First, all monoderms seem to cluster in only two sister clades. Second, MPF of a string of poorly sampled clades of diderms are classed along with the four common MPF types of Proteobacteria. Some previous classifications of conjugation systems have been based on the type of replicon or on the secretion substrate. 
The former, separating conjugative plasmids from ICEs, are pertinent to class mobile elements but are inadequate to separate conjugative systems because MPF cannot be discriminated based on the type of the host replicon (Guglielmini et al. 2011). Classifications regarding the secretion substrate, i.e., proteins or DNA–protein complexes, pertain to the role of the T4SS and its impact on genetic mobility. They are extremely important to understand the adaptive role of T4SS in a bacterium. However, as shown in this work, they carry little information allowing classification of the T4SS. T4SS were divided on structural grounds in two classes: T4SSa, including elements from MPFT and MPFF, and T4SSb, including elements from MPFI (Christie and Vogel 2000). These two classes can easily be mapped into the VirB4 phylogeny in these three different MPF types. Although this classification reflects important differences in terms of conjugative pili among Proteobacteria, it no longer represents the diversity of T4SS. It is unclear how MPFG or any MPF type not present in Proteobacteria should be classed in this scheme (fig. 3). Our analysis provides a natural classification scheme for T4SS and may also help highlight the commonalities and differences between systems. Together with the classification of relaxases (Garcillan-Barcia et al. 2009; Guglielmini et al. 2011), it can be easily extended to class ssDNA conjugative systems. Furthermore, this classification system can be applied to partial data, e.g., from metagenomics, because it requires the identification of a single gene.

Conclusion

Our work provides a scenario for the evolution of conjugation and T4SS from their origin to recent exaptations (fig. 8). These results suggest that conjugation is a very ancient process that arose in two independent ways for ssDNA and dsDNA mechanisms, starting from ancestrally related AAA+ ATPases involved in DNA translocation. Conjugation of ssDNA is by far the best studied and also the mechanism most frequently found in prokaryotes. It probably appeared very early among bacteria with two cell envelopes, possibly ancient Proteobacteria, and from there it spread to all clades of prokaryotes. The T4SS of monoderms seem less complex, in that they involve fewer genes (Grohmann et al. 2003), and could initially evolve by gene deletion from the larger T4SS of diderms. Our evolutionary scenario links together all known ssDNA conjugative systems, and their T4SS, by the common ancestry of VirB4. Several observations show the validity of the use of this protein for the classification of T4SS. First, it is the only ubiquitous protein in T4SS. Second, its phylogeny closely matches those of other conserved proteins, notably the VirD4. Third, patterns of the presence/absence of MPF specific genes match the VirB4 phylogeny. Fourth, the order of MPF-specific genes, at least in MPFT, also matches the VirB4 phylogeny.

Model for the evolution of conjugation. First, DNA translocases diversify into a number of families that are involved in conjugation (ssDNA for VirB4, VirD4, and TcpA, and dsDNA for TraB). Second, ssDNA conjugation diversified in a series of clades that are the basis of MPF classes. Several of these show a preponderance of Proteobacteria. Transfer of a conjugative system to monoderms led to the diversification and further spread within Firmicutes, Actinobacteria, Archaea, and Tenericutes. Among MPFFA, some elements engaged in a dramatically different system, including TcpA and the relaxase MOBT. Finally, at much shorter evolutionary distances, we observe diversification of conjugative systems among integrative (ICEs) and extrachromosomal (plasmids) elements. Exaptations of the conjugative systems for protein delivery, DNA uptake, and other functions also arose relatively late in the evolutionary scale.

The structure of the VirB4 tree, with its robust separation in eight large clades, reflects in part an effect of the cell envelope. Indeed, once systems arose within a clade with a peculiar membrane structure, they tended to adapt to this cell structure and were not further passed on to other clades. This resulted in large clades of VirB4, including monoderms (such as Archaea or Firmicutes) or diderms with peculiar membrane compositions such as Cyanobacteria (Wada and Murata 1998) or Bacteroides (An et al. 2011). Adaptation of the T4SS to such cell envelopes is likely to increase the efficiency of conjugation within taxa but at the cost of reducing its efficiency between taxa, effectively leading to T4SS specialization. This process has the potential to bias the rate and direction of genetic transfer between prokaryotes and thus shape the networks of gene sharing (Halary et al. 2010; Dagan 2011). Notably, it might contribute to the observed coherence between high bacterial taxonomic ranks (Philippot et al. 2010). 
Surprisingly, one group of ssDNA T4SS has radically changed into a system with a new T4CP (TcpA) and relaxase (MOBT). Although the cognate VirB4 protein fits clearly in our T4SS classification and is presumably representative of the evolutionary history of the remaining proteins of the MPFFA T4SS, the replacement of the T4CP suggests that the evolution of the coupling protein can in certain cases differ radically from that of the T4SS. In several cases (fig. 3), this seems to reflect the double evolutionary constraint of T4CP in adapting to both the T4SS and to the relaxase. Our work also shows that exaptations of T4SS can occur frequently in the evolutionary history. Conjugation consists of the secretion of a nucleoprotein complex. Passing from this function to a protein secretion system can probably occur in few evolutionary steps. Accordingly, several systems are known to transfer both proteins and relaxosomes (Vogel et al. 1998; Fernandez-Gonzalez et al. 2011; Schroder et al. 2011). Furthermore, conjugation systems and MOBless T4SS can interchange components without loss of function (de Paz et al. 2005). The exaptation of the H. pylori comB system is more surprising because this system has evolved into a DNA import mechanism (Hofreuter et al. 2001). Several other protein secretion systems are thought to be exaptations, e.g., nonflagellum T3SS are related to the bacterial flagellum and T6SS show structural homologies with phages (Ginocchio et al. 1994; Pell et al. 2009). Yet, T4SS present an uncommon case in that exaptations occurred multiple times in the evolutionary history. Given the present results, it is not unlikely that novel exaptations, e.g., protein transfer among bacteria, are present among the poorly studied MOBless T4SS of free-living bacteria.

Supplementary Material

Supplementary table S1 and figure S1 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

89 in total

Review 1. Bacterial conjugation: a two-step mechanism for DNA transport.

Authors: Matxalen Llosa; F Xavier Gomis-Rüth; Miquel Coll; Fernando de la Cruz
Journal: Mol Microbiol Date: 2002-07 Impact factor: 3.501

2. The integrase of the conjugative transposon Tn916 directs strand- and sequence-specific cleavage of the origin of conjugal transfer, oriT, by the endonuclease Orf20.

Authors: Jennifer M Rocco; Gordon Churchward
Journal: J Bacteriol Date: 2006-03 Impact factor: 3.490

3. Transcriptional analysis of the flagellar regulon of Salmonella typhimurium.

Authors: K Kutsukake; Y Ohya; T Iino
Journal: J Bacteriol Date: 1990-02 Impact factor: 3.490

4. Conjugative transfer by the virulence system of Legionella pneumophila.

Authors: J P Vogel; H L Andrews; S K Wong; R R Isberg
Journal: Science Date: 1998-02-06 Impact factor: 47.728

5. Intermolecular disulfide bonds stabilize VirB7 homodimers and VirB7/VirB9 heterodimers during biogenesis of the Agrobacterium tumefaciens T-complex transport apparatus.

Authors: G M Spudich; D Fernandez; X R Zhou; P J Christie
Journal: Proc Natl Acad Sci U S A Date: 1996-07-23 Impact factor: 11.205

6. Natural transformation competence in Helicobacter pylori is mediated by the basic components of a type IV secretion system.

Authors: D Hofreuter; S Odenbreit; R Haas
Journal: Mol Microbiol Date: 2001-07 Impact factor: 3.501

7. Parallel evolution of a type IV secretion system in radiating lineages of the host-restricted bacterial pathogen Bartonella.

Authors: Philipp Engel; Walter Salzburger; Marius Liesch; Chao-Chin Chang; Soichi Maruyama; Christa Lanz; Alexandra Calteau; Aurélie Lajus; Claudine Médigue; Stephan C Schuster; Christoph Dehio
Journal: PLoS Genet Date: 2011-02-10 Impact factor: 5.917

8. Genetic complementation analysis of the Agrobacterium tumefaciens virB operon: virB2 through virB11 are essential virulence genes.

Authors: B R Berger; P J Christie
Journal: J Bacteriol Date: 1994-06 Impact factor: 3.490

9. Analysis of the complete nucleotide sequence of the Agrobacterium tumefaciens virB operon.

Authors: D V Thompson; L S Melchers; K B Idler; R A Schilperoort; P J Hooykaas
Journal: Nucleic Acids Res Date: 1988-05-25 Impact factor: 16.971

10. Efficient gene transfer in bacterial cell chains.

Authors: Ana Babic; Melanie B Berkmen; Catherine A Lee; Alan D Grossman
Journal: MBio Date: 2011-03-15 Impact factor: 7.867
