Genetic exchange by conjugation is responsible for the spread of resistance, virulence, and social traits among prokaryotes. Recent work has unraveled the functioning of the underlying type IV secretion systems (T4SS) and their distribution and recruitment for other biological processes (exaptation), notably pathogenesis. We analyzed the phylogeny of key conjugation proteins to infer the evolutionary history of conjugation and T4SS. We show that single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) conjugation, while both based on a key AAA(+) ATPase, diverged before the last common ancestor of bacteria. The two key ATPases of ssDNA conjugation are monophyletic, having diverged at an early stage from dsDNA translocases. Our data suggest that ssDNA conjugation arose first in diderm bacteria, possibly Proteobacteria, and then spread to other bacterial phyla, including monoderms, and to Archaea. Identifiable T4SS fall within eight monophyletic groups, determined by both taxonomy and structure of the cell envelope. Transfer to monoderms might have occurred only once, but followed diverse adaptive paths. Remarkably, some Firmicutes developed a new conjugation system based on an atypical relaxase and an ATPase derived from a dsDNA translocase. The observed evolutionary rates and patterns of presence/absence of specific T4SS proteins show that conjugation systems are often and independently exapted for other functions. This work provides a natural basis for the classification of all kinds of conjugative systems, thus tackling a problem that is growing as fast as genomic databases. Our analysis provides the first global picture of the evolution of conjugation and shows how a self-transferable complex multiprotein system has adapted to different taxa and often been recruited by the host. As conjugation systems became specific to certain clades and cell envelopes, they may have biased the rate and direction of gene transfer by conjugation within prokaryotes.
Prokaryotic genomes adapt quickly to new environmental conditions largely because they can
acquire pre-evolved traits by horizontal gene transfer (HGT) (de la Cruz and Davies 2000; Gogarten et al. 2002; Ochman et al.
2005). Conjugation is a mechanism of genetic transfer that allows single-event
transfer of large DNA fragments, up to entire chromosomes. Conjugation can transfer
nonhomologous genes to the recipient genome and has a broader host range than transduction
or transformation (Amabile-Cuevas and Chicurel
1992; Llosa et al. 2002; Chen et al. 2005). Accordingly, recent work
suggests that conjugation is the most frequent mechanism of HGT (Halary et al. 2010). Indeed, conjugative systems are major players
in the spread of antibiotic resistance, metabolic pathways, symbiotic traits, and other
mobile genetic elements (de la Cruz and Davies
2000; Thomas 2000; van der Meer and Sentchilo 2003; Frost et al. 2005; Ding and Hynes 2009; Allen
et al. 2010). Conjugation is also involved in the establishment of social
processes, promoting biofilm formation (Ghigo
2001) and spreading of cooperative traits (Nogueira et al. 2009; Rankin et al.
2011). There are two known modes of conjugation that differ both in the type of
translocated DNA, single-stranded DNA (ssDNA) versus double-stranded DNA (dsDNA), and in the
complexity of the transport system (de la Cruz et al.
2010; Vogelmann et al. 2011). Both
types of conjugative systems are either encoded by autonomously replicating plasmids or
inserted in chromosomes as integrative conjugative elements (ICEs) (Smillie et al. 2010; Wozniak and Waldor 2010). We recently made a large-scale identification of ssDNA
conjugation systems, both in plasmids and ICEs, and found them to be essentially short-term
variants of otherwise identical backbone elements (Guglielmini et al. 2011).
In the following, we denote proteins from a given genetic element by GIMGE, where
GI refers to the gene identification and MGE (mobile genetic element) to the name of the
element appended to it (e.g., TraCF corresponds to the TraC protein of the F plasmid).
Conjugative systems involved in ssDNA conjugation include two major protein complexes:
relaxosomes and type IV secretion systems (T4SS) (reviewed in Fronzes et al. 2009; de la
Cruz et al. 2010). MGE delivery through the membranes of the donor and recipient
cells is done by the T4SS (fig. 1). In
Proteobacteria, the T4SS is a large protein complex that includes a ubiquitous ATPase
(VirB4Ti or the distant homolog TraUR64), mating-pair formation
(MPF) proteins that form the transport channel, and a pilus that attaches to the recipient
cell (Alvarez-Martinez and Christie 2009; Fronzes et al. 2009). The large (>70 kDa) VirB4
ATPase is highly conserved in sequence and is the only protein with clear sequence homologs in
all known T4SS. It is therefore the marker of the presence of a T4SS (Alvarez-Martinez and Christie 2009). VirB4 is thought to energize
the assembly or activity of the secretion channel and is essential for pilus biogenesis and
substrate transfer (Berger and Christie 1993;
Fullner et al. 1996; Wallden et al. 2012). Four MPF families have been described in
Proteobacteria: MPFT (based on the T-DNA conjugation system of the Agrobacterium
tumefaciens plasmid Ti), MPFF (based on plasmid F), MPFI
(based on the IncI plasmid R64), and MPFG (based on ICEHIN1056) (Smillie et al. 2010). These four models describe
all functionally studied and nearly all T4SS identified by bioinformatic methods among
Proteobacteria, both in plasmids and chromosomes (Guglielmini et al. 2011). The best-studied system is the vir
operon (MPFT) from A. tumefaciens Ti plasmid.
This small operon encodes 11 VirB proteins (Thompson
et al. 1988; Ward et al. 1988), and
we use these names as a template for naming the protein families of the MPFT
system. T4SS from Cyanobacteria, Bacteroides, Firmicutes, Actinobacteria, and Archaea have
homologs to VirB4 (Guglielmini et al. 2011).
ssDNA-conjugative systems are very diverse, but very few studies have been done on the
structure, function, and evolution of T4SS outside Proteobacteria and Firmicutes.
Fig. 1. Scheme of the most-studied T4SS,
the vir system of A. tumefaciens Ti plasmid. The VirBX proteins are
depicted as BX (e.g., B5 refers to the VirB5 protein). The coupling
protein VirD4 (D4) and the mobilization complex, which includes the relaxase (MOB)-DNA
complex, are also represented. OM: outer membrane; IM: inner
membrane.
The two other essential components of the ssDNA conjugation machinery are the relaxosome
and the type IV coupling protein (T4CP). The relaxosome is composed of the relaxase (MOB)
and often includes auxiliary proteins. It nicks the dsDNA and binds the resulting ssDNA at
the origin of transfer. The diversity and evolution of the different families of relaxases
has been extensively studied (Garcillan-Barcia et al.
2009). The highly conserved T4CP binds the DNA-relaxase substrate and couples it to
the T4SS, possibly using ATP to translocate the complex across the inner membrane (Gomis-Ruth et al. 2004; Tato et al. 2005). The majority of T4CPs belong to the
VirD4Ti family, but some T4SS were recently found to lack VirD4 and instead use
a distantly related ATPase as T4CP (TcpApCW3) (Parsons et al. 2007; Steen
et al. 2009). Protein secretion systems based on T4SS do not require relaxosomes.
They usually require a T4CP, although exceptions have been found in Bordetella
pertussis and Brucella spp. (Alvarez-Martinez and Christie 2009). In these systems, proteins are
translocated across the inner membrane by other means.
Conjugation of dsDNA takes place in mycelia-producing Actinobacteria (Grohmann et al. 2003; Ghinet et al. 2011). It relies on a single protein, TraBpSG5, that
translocates dsDNA between neighboring cells in mycelia (Possoz et al. 2001). This protein resembles, in sequence and
function, the essential protein FtsK that segregates sister chromosomes in the last stages
of chromosomal replication (Bigot et al. 2007;
Vogelmann et al. 2011). They are both
members of the AAA+ motor ATPase family, which also includes both types of
T4CP (VirD4 and TcpA) and both types of ATPases essential for the function of T4SS (VirB4
and TraU). Hence, all key proteins of the dsDNA and ssDNA conjugation systems are
evolutionarily related. This association has not yet been clarified from a phylogenetic
point of view.
T4SS are often recruited by bacterial pathogens to deliver effectors to eukaryotic cells
(Weiss et al. 1993; Vogel et al. 1998; Seubert
et al. 2003; Nystedt et al. 2008).
These MOBless T4SS, so called because they do not contain a relaxase gene, are closely
related to the T4SS of conjugative systems. Indeed, several T4SS can perform both
conjugation between bacteria and protein delivery (Vogel et al. 1998; Llosa et al.
2003; Schroder et al. 2011). Protein
delivery by T4SS is essential for the virulence of many plant and animal pathogens,
including Legionella pneumophila, Helicobacter pylori,
Bartonella spp., Coxiella burnetii, and
A. tumefaciens (reviewed in Seubert et al. 2003; Juhas
et al. 2008; Alvarez-Martinez and Christie
2009). Only T4SS among MPFT and MPFI have been experimentally
shown to be used for protein delivery. The extreme flexibility of T4SS has allowed at least
two other types of exaptations, i.e., evolutionary events in which part of the pre-existing
machinery of conjugation was recruited for other functions (Gould and Vrba 1982). H. pylori
genomes encode a MOBless T4SS that is used for natural transformation. It is necessary to
import environmental DNA (Hofreuter et al.
2001). In Neisseria gonorrhoeae, one T4SS is responsible for DNA
export to the extracellular space, an intermediate step in the process of natural
transformation among these bacteria (Hamilton et al.
2005). Interestingly, in the case of Neisseria, the locus encodes
a T4SS and a MOBH-type relaxase that is necessary for DNA export (Salgado-Pabon et al. 2007). A previous analysis of
MPFT systems suggests that exaptation of conjugative systems occurred several
times in evolution (Frank et al. 2005).
Because we recently found that MOBless T4SS are significantly more abundant than previously
thought (Guglielmini et al. 2011), this point
needs to be reassessed for MPFT and developed for other MPF types.
Although studies on conjugation are as old as molecular biology itself (Lederberg and Tatum 1946), several recent works
have significantly changed our understanding of this process. These include the discovery of
new conjugation systems (Juhas, Crook, et al.
2007), of new key elements in known conjugation systems, e.g., TcpA (Parsons et al. 2007) and of the important role of
ICEs (Burrus et al. 2002; Wozniak and Waldor 2010). Recent functional studies explored the
diversity of T4SS (Alvarez-Martinez and Christie
2009), and bioinformatics work unraveled the presence of T4SS in several new clades
(Guglielmini et al. 2011). Finally, other
works highlighted the close structural and functional relationship between T4SS used for
protein secretion and conjugation (Fernandez-Gonzalez
et al. 2011). This succession of works opens the opportunity to infer a global
scenario for the evolution of conjugative systems and T4SS, which is the goal of the present
work. To assess the uncertainty in the phylogenetic reconstruction, we used classical
methods such as bootstrap analyses. Yet, because these large and deep phylogenetic
reconstructions can be sensitive to alignment algorithms and to methods to extract
informative positions (Philippe et al. 2011),
we also tested the robustness of our results by comparing them with two automatic analyses
that we did in parallel. To guide the comparisons between the three sets of analyses, we
made an assessment of the quality of the multiple alignments using T-Coffee (Notredame et al. 2000). By default, we only
mention the results of our expert analysis (typically, the one with highest alignment
quality), but highlight differences between methods when they are relevant. The article
is structured as follows. First, we analyze the deep branching of the key
proteins that have homologs among (nearly) all conjugative systems of a given kind. This
uncovers the initial split of the proteins that became key to conjugative
processes. Then, we focus on the early events of the diversification of ssDNA conjugation,
by far the most frequent process among prokaryotes. Finally, we detail the diversification
of the best-known conjugation families within ssDNA-based systems with a focus on the
evolution of gene repertoires and MOBless T4SS. This analysis provides information that
naturally leads to a revision of T4SS classification based on evolutionary biology.
Materials and Methods
Data
Data on complete chromosomes and plasmids of prokaryotes were taken from Genbank Refseq
(ftp://ftp.ncbi.nih.gov/genomes/Bacteria/, last accessed November 2011). This
included 1,207 chromosomes, 891 plasmids that were sequenced along with these chromosomes,
and 1,391 plasmids that were sequenced independently. We used the annotations of the
Genbank files, having removed all pseudogenes and proteins with inner stop codons. The
information on T4SS was taken from Guglielmini et
al. (2011).
Construction of Protein Profiles and Genome Searches
Unless mentioned explicitly, the protein profiles used are those described in Guglielmini et al. (2011). To study the
presence/absence of the different components of the vir system, we made
additional protein profiles, namely for VirB1, VirB2, VirB5, VirB7, VirB10, and VirB11. We
first used PSI-Basic Local Alignment Search Tool (BLAST) (e value <
0.1) to search for distant homologs, using as query each of these genes from the VirB
locus of the A. tumefaciens plasmid pTi SAKURA (Refseq entry NC_002147)
and the aforementioned databank of completely sequenced replicons. Given the problems of
convergence of PSI-BLAST when using complete genomes, and the extensive similarity of
plasmid and chromosomal conjugative systems (Guglielmini et al. 2011), we restricted homology searches to plasmid sequences
when building protein profiles. We retrieved the proteins with hits for each protein
family and built multiple alignments using MUSCLE (Edgar 2004). We removed the few proteins with sizes very different from the
average. We then rebuilt the multiple alignments with MUSCLE and trimmed them to remove
the sites at the edges that were poorly aligned. We used HMMER 3.0 (Eddy 2011) to produce hidden Markov model (HMM) profiles and to
perform searches within genomes. In the analysis of the evolution of the MPFT
system, we only considered the hits that colocalized with previously detected
vir proteins (VirB3, VirB4, VirB6, VirB8, VirB9). FtsK proteins were
retrieved directly by using the PFAM PF01580 profile. TraB proteins, being closely related
to FtsK, were retrieved by BLASTP searches of TraB from Streptomyces
plasmid pCQ3 (YP_003280879) on the Actinomycetales proteins from the Refseq database. We
sampled the top results and then built a protein profile for this protein and searched for
its occurrences as for the other profiles. We built a web server to allow running the
protein profiles. This is available at http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::CONJscan-T4SSscan.
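As an illustration of the size-filtering step applied before rebuilding the alignments, the following Python sketch discards sequences whose length departs strongly from the mean. The function name `filter_by_length` and the 50% tolerance are assumptions made for this example; the cutoff actually applied is not stated in the text.

```python
from statistics import mean


def filter_by_length(seqs, tolerance=0.5):
    """Keep sequences whose length lies within `tolerance` (a fraction
    of the mean length) of the mean length of the whole set.

    `seqs` maps sequence names to sequence strings. The default
    tolerance is a hypothetical value chosen for illustration only.
    """
    avg = mean(len(s) for s in seqs.values())
    return {
        name: s
        for name, s in seqs.items()
        if abs(len(s) - avg) <= tolerance * avg
    }


# Toy example: three sequences of similar size and one short fragment.
hits = {"p1": "M" * 100, "p2": "M" * 110, "p3": "M" * 90, "frag": "M" * 10}
kept = filter_by_length(hits)
print(sorted(kept))
```

In this toy set, only the short fragment falls outside the tolerance window; a stricter or looser window can be explored by changing `tolerance`.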
Phylogenetic Analysis
Unless explicitly stated, all phylogenetic analyses were performed with the following
procedure. First, sequences were aligned using MUSCLE with default parameters as
implemented in SeaView (Gouy et al. 2010).
Second, all columns in the multiple alignment matrix with more than 80% of gaps
were removed. Third, 100 replicate trees were built with RAxML 7.2.7 (Stamatakis 2006) using the model GTRGAMMA. We
kept the one with the best likelihood. We calculated bootstraps with the standard
implementation and used the autoMR stop criterion to obtain confidence values for each
node. There were two exceptions to this method. We aligned the ATPases using MAFFT (Katoh and Toh 2010) with the G-INSI algorithm
and removed the sites containing more than 60% of gaps. We performed the
phylogenetic inference as mentioned earlier and additionally with PhyML 3.0 (Gascuel et al. 2010) under the LG model and with
the bioNJ starting tree to get aLRT support values. The alignment of the set of VirB4 and
VirD4 was built with MAFFT with the E-INSI algorithm, since these two proteins show
different domain organization, and then manually edited. MAFFT was used instead of MUSCLE
because it provided better alignments in these cases. The computation of 100 replicates
plus hundreds of bootstrap trees was excessively time consuming, given the size of the
data set in the VirB4/VirD4 analysis. Thus, we used PhyML 3.0 to build the phylogenetic
tree, under the LG model and with the bioNJ starting tree. aLRT support values were also
calculated for each node.
The support tests we conducted revealed some weakly supported nodes in this last tree, in conflict
with the aLRT values. To further investigate this, we used a reduced data set composed of
VirB4 proteins, excluding the distant homolog TraU. Using this data set, we performed the
tests described later. All multiple alignments and phylogenetic reconstructions are freely
available on DRYAD (http://datadryad.org/).
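The gap-based column filter described above (removal of columns with more than 80% gaps, or 60% for the ATPase alignment) can be sketched in a few lines of Python. This is a minimal illustration, not the procedure implemented in SeaView or any other tool; the function name `drop_gappy_columns` and the representation of the alignment as a list of equal-length strings are assumptions.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.8, gap_chars="-"):
    """Remove alignment columns whose fraction of gaps exceeds
    `max_gap_fraction`.

    `alignment` is a list of aligned sequences (equal-length strings).
    Columns with a gap fraction equal to the threshold are kept,
    matching the "more than 80%" wording of the text.
    """
    if not alignment:
        return []
    n_rows = len(alignment)
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = [
        col
        for col in range(length)
        if sum(row[col] in gap_chars for row in alignment) / n_rows
        <= max_gap_fraction
    ]
    return ["".join(row[c] for c in keep) for row in alignment]


# Toy alignment: the last column is a gap in 5 of 6 sequences (83%).
toy = ["MKA-", "MKA-", "M-A-", "MKA-", "MKA-", "MKAL"]
print(drop_gappy_columns(toy))                        # 80% threshold
print(drop_gappy_columns(toy, max_gap_fraction=0.6))  # 60% threshold
```

Both thresholds remove the last column of the toy alignment and keep the second column, which contains a single gap.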
Tests to the Phylogenetic Analysis
To test the robustness of our conclusions based on phylogenetic analysis, we performed a
number of tests. These analyses aimed at testing the robustness of the conclusions to the
multiple alignments, to the identification of informative sites in multiple alignments,
and to the use of a protein model matrix. We therefore set up two automatic procedures
in which the proteins were aligned using MAFFT and MUSCLE, respectively. Informative sites were
extracted from the alignments using BMGE (Criscuolo
and Gribaldo 2010). We fine-tuned BMGE parameters for each alignment to obtain a
good compromise between the quality and the number of informative sites. The best model to
analyze the data was chosen with ProtTest (Darriba
et al. 2011). Note that ProtTest does not analyze the GTR model for proteins, so
we cannot assess whether the model chosen by ProtTest is better than ours. Trees were
built as before using RAxML, and we generated 100 bootstrap trees for each analysis. To
compare the different analyses, we computed a multiple alignment quality score using
the Core component of T-Coffee (Notredame et al.
2000) for the three methods (our expert analysis and the MAFFT- and MUSCLE-based
analyses). This score, ranging from 0 to 100, is computed by comparing the consistency of
the alignment with a list of precomputed pairwise alignments called a library. We used the
default “Mproba_pair” library. The key results, e.g., monophyly or basal
position of certain clades, were tested for the three methods and are displayed in table 1 and supplementary
table S1, Supplementary
Material online. Each of these tests has an identification number in the
tables. This number is displayed in the respective node in the phylogenetic trees. For
example, in figure 2, the node with ID no. 3
refers to the monophyly of TraB and is indicated in table 1 as having 99% bootstrap support in our expert analysis,
100% in the automatic analysis using MAFFT, and 96% in the automatic
analysis using MUSCLE. In supplementary
table S1, Supplementary
Material online, it is indicated that for this analysis the best alignment,
as given by T-Coffee, is the one of the expert alignment (score 88), followed by MAFFT
(76) and then MUSCLE (67). The node no. 3 in figure
2 is thus indicated in a black circle (high bootstrap support).
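To make the logic of these tests concrete, the following Python sketch computes the percentage of bootstrap trees in which a set of tips forms one side of a split, and maps that percentage to the circle shading used in the figures (black for ≥70%, gray for ≥50%). It is only an illustration under simplifying assumptions: trees are written as nested tuples of tip names rather than parsed from RAxML output, and all function names and toy tip labels are hypothetical.

```python
def leaf_set(tree):
    """Return the frozenset of tip names below a nested-tuple node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def clades(tree):
    """Yield the tip set of every internal node of the tree."""
    if isinstance(tree, str):
        return
    yield leaf_set(tree)
    for child in tree:
        yield from clades(child)


def is_monophyletic(tree, group):
    """True if `group` is exactly one side of some split of the tree.

    Testing both a clade and its complement makes the check
    independent of where the nested-tuple tree happens to be rooted.
    """
    group = frozenset(group)
    tips = leaf_set(tree)
    return any(c == group or tips - c == group for c in clades(tree))


def bootstrap_support(trees, group):
    """Percentage of replicate trees in which `group` is monophyletic."""
    return 100.0 * sum(is_monophyletic(t, group) for t in trees) / len(trees)


def circle_style(support):
    """Shading of the numbered circles: black (>=70), gray (>=50)."""
    if support >= 70:
        return "black"
    if support >= 50:
        return "gray"
    return None


# Four toy replicate trees; TraB tips group together in three of them.
t1 = ((("TraB1", "TraB2"), "FtsK1"), ("VirB4", "VirD4"))
t2 = (("TraB1", "FtsK1"), ("TraB2", ("VirB4", "VirD4")))
t3 = (("TraB1", "TraB2"), ("FtsK1", ("VirB4", "VirD4")))
replicates = [t1, t2, t3, t3]
value = bootstrap_support(replicates, {"TraB1", "TraB2"})
print(value, circle_style(value))
```

In practice the replicate trees would be read from the bootstrap output of the inference program, and the same test would be repeated for each alignment method.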
Table 1. Analysis of the Robustness of Key Phylogenetic Results.
a Proteins included in the data set.
b The different hypotheses for which we present the bootstrap supports.
c When the hypothesis corresponds to what we observe in the reference phylogeny, and if the support value is greater than 50, it is displayed here and in the corresponding figure with a number.
d Bootstrap values for each hypothesis and for each alignment technique.
Fig. 2. Phylogenetic analysis of the
AAA+ ATPases associated with conjugation. The position of the root was
determined using the AAA+ ATPase VirB11 in a separate analysis. Names along the
FtsK tips correspond to the taxonomic origins of each protein, reflecting the width
of sampling. Bold vertical black lines represent nodes with a high support value
(bootstrap >70% and aLRT >0.7). Bold gray lines represent nodes with
high aLRT score (>0.7) but a weaker bootstrap (<70%). The homologs of
TcpA are found only in Firmicutes. The homologs of TraB are found only in
Actinobacteria. Numbers in circles refer to the analysis of robustness in table 1 (identified in the third column of
table 1); black background stands
for a high support (≥70% bootstrap in the best-scoring alignment) and
gray background for a moderate support (≥50% bootstrap in the
best-scoring alignment).
Relative Decrease in Protein Similarity with Divergence
For each pair of T4SS loci, we made pairwise alignments of each of the orthologous
pairs of genes. Alignments were done using an end-gap free version of the
Needleman–Wunsch algorithm (Mount
2004), with a BLOSUM60 matrix, open penalty of 1.2, and extension penalty of
0.8. We then plotted the percentage of similarity between VirB4 homologs and each of the
other pairs of homologs. The points for each scatter plot were then fitted with a spline
(λ = 1,500), and the curves were superimposed.
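The end-gap free variant of the Needleman–Wunsch algorithm can be sketched as follows. This is a minimal illustration rather than the implementation used here: it returns only the alignment score, replaces the BLOSUM60 matrix with a simple match/mismatch score, and assumes that the first residue of a gap costs the opening penalty (1.2) and each additional residue the extension penalty (0.8). The function name `semiglobal_score` and the match/mismatch values are assumptions.

```python
def semiglobal_score(a, b, match=1.0, mismatch=-1.0,
                     gap_open=1.2, gap_extend=0.8):
    """Score of the best end-gap free global alignment of `a` and `b`.

    Affine gap costs: a gap of length k costs
    gap_open + (k - 1) * gap_extend (assumed convention).
    Gaps at either end of either sequence are not penalized.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H[i][j]: best score of an alignment ending at a[:i], b[:j].
    # The first row and column stay at 0: leading end gaps are free.
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    # E: alignment ends with a gap in `a`; F: with a gap in `b`.
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_extend)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_extend)
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + sub, E[i][j], F[i][j])
    # Trailing end gaps are free: take the best cell of the last row
    # or last column instead of the bottom-right corner only.
    return max(max(H[n]), max(row[m] for row in H))


print(semiglobal_score("GGGACGT", "ACGT"))      # overhang is not penalized
print(semiglobal_score("ACGTACGT", "ACGTCGT"))  # one internal gap
```

Computing the percentage of similarity plotted in the figures would additionally require a traceback to recover the aligned residue pairs, which is omitted here for brevity.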
Results and Discussion
Early Evolutionary Split of the Key Conjugation ATPases
The two families of T4CPs (with prototypes given by the VirD4pTi and
TcpApCW3), the two families of ATPases (based on VirB4Ti and
TraUR64), the dsDNA conjugation protein TraBpSG5, and FtsK are all
part of the superfamily of AAA+ motor ATPases. Hence, we investigated the
events at the onset of the natural history of conjugation from the analysis of the
phylogeny linking homologs of all these protein profiles among 3,489 replicons (see
Materials and Methods). The tree was rooted using the distantly related protein family
derived from VirB11Ti (Planet et al.
2001). The monophyly of VirB11 is robust in both the expert and the automatic
analyses (table 1). This phylogenetic
reconstruction separates a monophyletic VirD4/VirB4 clade (67% bootstrap) from the
others. This fits previous genomic and structural analysis showing the similarity between
the dsDNA translocators FtsK and TraB on the one hand and between the ssDNA translocators
VirD4 and VirB4 on the other (Iyer et al.
2004; Cabezon et al. 2011).
The previous analysis allows rooting the tree and highlights the early split between
ssDNA and dsDNA translocases. But the inclusion of the distantly related VirB11 produces a
multiple alignment with few sufficiently conserved positions, increasing uncertainty in
the process of phylogenetic inference (supplementary
table S1, Supplementary
Material online). This reduced the power of this data set to robustly resolve
the more recent splits. Thus, we excluded VirB11 from the analysis and made a new
phylogenetic reconstruction of the remaining five families. This tree shows the same
dichotomy at the base (fig. 2), with strong
support for all five monophyletic groups with our expert analysis and in the best
automatic method (table 1). These results
fit our observation that our VirB4 protein profiles often match VirD4 proteins and vice
versa, albeit with weak scores, and that none of these profiles significantly matches proteins from
the TraB/FtsK families. T4CPs and VirB4s show clear structural similarities, underscoring
a common functional mechanism (Cabezon et al.
2011). The most conspicuous structural difference between T4CPs and VirB4s is the
existence of three alpha helices that are conserved in the C terminus of VirB4 proteins
but are absent in T4CPs. Deletion of these helical structures in the VirB4 homolog
TrwKR388 resulted in a large increase in its ATPase activity, suggesting that
the C-terminal end of VirB4 proteins functions as an autoregulatory element (Pena et al. 2011). Overall, these analyses fit
structural work, suggesting that the common ancestor of the VirB4/VirD4 families consisted
of a soluble protein engaged in polypeptide transport (as is still the case in
most studied VirB4 proteins). VirB4 later became membrane bound by association with the
VirB3 component of T4SS (as in VirB4R388). This association can be covalent (as
in VirB4R6K) (Pena et al. 2011).
The protein that specialized in ssDNA transport (T4CP) also acquired an integral-membrane
protein domain in its N-terminus. This component is involved in its interaction with
another T4SS component, in this case VirB10 (Llosa
et al. 2003; de Paz et al.
2010).
The other basal branch in the phylogeny includes TraB, TcpA, and FtsK, all with strong to
moderate evidence of monophyly (99%, 96%, and 62% bootstraps,
respectively) (fig. 2). The relative order of
the split between the three clades is different from a previously published one, but its
bootstrap support is weak in our tree (and not documented in Parsons et al. 2007). SpoIIIE, a protein involved in segregation
of chromosomes during Bacillus subtilis sporulation (Wu and Errington 1998), branches within the FtsK
clade (data not shown). The elements of the TraB family are found only in Actinobacteria
and are related to FtsK, but they do not emerge from within the FtsK clade. Instead, they
derive independently from the ancestor of this protein. FtsK is an essential protein that,
contrary to some previous suggestions (Iyer et al.
2004), includes at least one member among Archaea (YP_503307.1). The latter is
annotated as an FtsK-like protein, and it is not closely related to HerA proteins, which
branch closer to the VirD4/B4 branches, and its study falls outside the scope of this
article. FtsK phylogeny approximately follows that of bacteria (Gupta 2004) and thus provides a guideline to the timing of the
diversification of these protein families. The tree in figure 2 shows that proteins have widely diverse tip-to-root
branch lengths, i.e., the proteins do not evolve according to a strict molecular clock.
Therefore, we cannot assume a molecular clock that would allow dating the split of these
families and thus presumably that of conjugation processes. Yet, these data do place the
origin of ssDNA conjugation extremely early in the history of life. While TraB and TcpA
seem to diversify after FtsK, in agreement with their presence only in Firmicutes and
Actinobacteria, the diversification of the pair VirB4/VirD4 could be contemporaneous or
shortly subsequent to that of FtsK. These results suggest that the two conjugation
mechanisms, ssDNA and dsDNA conjugation, are based on ATPases that diverged before the
last common ancestor of bacteria.
T4SS Phylogeny
We aligned the proteins matching the VirB4 and TraU profiles to infer the evolutionary
history of all VirB4 homologs. We then used VirD4 to root this tree. Despite relatively
weak support in the bootstrap tests (48% in the best automatic alignment and
69% in our expert analysis), this rooting shows a good aLRT support value (0.82),
consistent with the literature in terms of phylogeny and biochemical function (Iyer et al. 2004; de la Cruz et al. 2010; Smillie et al. 2010) and with the previous analysis of the five ATPases
(78% bootstrap). The tree shows that all VirB4 and TraU-related proteins can be
classified into eight groups, which are represented by eight well-supported clades (fig. 3). The two basal groups in the VirB4
phylogenetic reconstruction are MPFI followed by a group specific to
Cyanobacteria (MPFC). This is in agreement with the low similarity between
TraUR64 (MPFI) and VirB4Ti (MPFT) that had
prevented previous phylogenetic reconstructions of all VirB4 homologs (Smillie et al. 2010). With the availability of
more sequences of these proteins, notably cyanobacteria, and the inclusion of the T4CP, we
could now reconstruct a reliable phylogeny. However, the position of MPFI at
the base of the tree must be interpreted with care. Our expert method and the two controls
place MPFI at the base of the phylogeny but with relatively low support
(45% bootstrap in the best automatic alignment) (table 1). The MPFC clade often arises at the base in
the bootstrap trees or as a sister clade of MPFI. In any case, this analysis
places one of these two clades at the root of the tree in more than 85% of the
bootstrap analyses.
Fig. 3. Joint
phylogenetic reconstruction of the VirD4 and VirB4/TraU families of proteins from
conjugative systems. Bold vertical black lines represent nodes with a high support
value (aLRT ≥ 0.9), and bold vertical gray lines represent nodes with a support
value between 0.7 and 0.9. Black square brackets indicate the VirB4 and VirD4
clades; colored square brackets on the left delimit the different MPF clades
(purple: MPFFATA, orange: MPFFA, red: MPFF, black:
MPFB, blue: MPFT, yellow: MPFG, cyan:
MPFC, green: MPFI); colored square brackets on the right
delimit the relaxase clades within the VirD4 part of the tree (blue:
MOBP, green: MOBQ, red: MOBF, purple:
MOBB, orange: MOBH, brown: MOBC, red/green dashed
brackets: clades with a mix of MOBF and MOBQ; black: mix of
MOBP, MOBF and MOBH). Numbers in circles refer to
the analysis of robustness in table 1
(identified in the third column of table
1); black background stands for a high support (≥70% bootstrap
in the best-scoring alignment) and gray background for a moderate support
(≥50% bootstrap in the best-scoring alignment).
Some mobile elements encoding an MPFI, e.g., the R64 plasmid from the
MOBP12 family, besides encoding a thick rigid pilus, with homology to
MPFT, also encode a thin pilus that is only required for conjugation in
liquid and that is homologous to type IV pili (Kim
and Komano 1997). This led to the classification of MPFI as T4SSb in
opposition to MPFF and MPFT, both classed as T4SSa (Christie and Vogel 2000). However, other
MPFI elements, e.g., plasmid CTX-M3, lack a thin pilus and are still able to
mate in liquid at high frequency (Golebiewski et
al. 2007). Thus, the thin pilus of MOBP12 plasmids is an
additional feature of some MPFI systems, probably acting only as a facilitator
of liquid mating and a selector of recipients (Kim
and Komano 1997), while the core MPFI machinery forms the basis of
this conjugation system. In any case, the highly divergent nature of TraUR64 is
a signature for this whole family of liquid maters. Nothing is known experimentally about
MPFC. Because cyanobacteria diverged early on from Proteobacteria,
MPFC might also contain peculiarities relevant to the genetic or physical
environment of these organisms. MPFG is the next most basal group in the tree.
This system was recently discovered, was identified only in Proteobacteria, and its
features are largely unknown (Juhas, Crook, et al.
2007; Juhas, Power, et al. 2007).
Interestingly, an MPFG encoding element, the PAPI-1 pathogenicity island of
Pseudomonas aeruginosa, has several genes homologous to the thin pilus
of R64 (Carter et al. 2010). Hence, the
association between MPF and thin pili might be an ancestral trait.
Four groups correspond to the different T4SS families of Proteobacteria (MPFF,
MPFG, MPFI, MPFT) (Juhas, Crook, et al. 2007; Smillie et al. 2010). These four groups are clearly separated
because they all have strong bootstraps in the analysis of monophyly (table 1), and each contains a set of four to
nine genes that are specific, i.e., their protein profiles match loci of a given MPF but
not those of the other MPF types (Smillie et al.
2010). Interestingly, 307 out of 327 (94%) of the T4SS of Proteobacteria
are classed in one of these four clades. We investigated the loci of the 20 remaining
VirB4 proteins. One of them does not colocalize with any of the other conjugation protein
profiles, including relaxases and T4CP. The other 19 VirB4 are encoded near genes specific
to one, and only one, MPF type. They were not classed as a given MPF only because the
number of these specific genes is below the quorum we set as a minimum for a putative
complete T4SS (Guglielmini et al. 2011).
Many of these 20 unclassed elements are thus probably inactive, undergoing genetic
degradation that results in incomplete loci. Alternatively, they may correspond to highly
modified versions of T4SS; the H. pylori
Cag-pathogenicity island is notably found within these elements.
A few genomes of species not classed among Proteobacteria encode T4SS classed within
MPFF and MPFT. All these bacteria are diderms, i.e., they have
both an inner and an outer membrane. This list includes MOBless T4SS in one Aquificae
(MPFF) and one Protochlamydia (MPFF), and conjugative T4SS in one
Chlorobi (MPFT), one Deferribacteres (MPFF), one Acidobacteria
(MPFT), and two Fusobacteria (MPFT). These elements are scattered
in the trees of MPFT and MPFF (figs. 4 and 5), suggesting different
events of horizontal transfer from Proteobacteria. Indeed, they do not cluster together in
the phylogenetic trees (0% in bootstrap trees). The elements of each given
bacterial clade are always monophyletic, suggesting one single transfer event, but the
very small number of such elements does not allow any robust conclusions for the moment.
Only one nonproteobacterial clade, Acidobacteria, is basal in the tree of MPFT
(100% bootstraps in the expert analysis and the controls). Acidobacteria are often
regarded as a sister clade of Proteobacteria (Ciccarelli et al. 2006), and therefore, we cannot discard the possibility of a
diversification of MPFT before the split between Acidobacteria and
Proteobacteria. However, since MPFG and MPFI are more basal in the
tree of VirB4 (fig. 3), and both only found in
Proteobacteria, the scenario of a transfer from Proteobacteria to Acidobacteria remains
more parsimonious. Interestingly, all T4SS predicted in these six nonproteobacterial
clades were classed among MPFF and MPFT. Nothing is known about
conjugation in these clades, but these data suggest that they might use mechanisms closely
related to, and originating from, those of Proteobacteria.
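The quorum rule described above for the unclassed VirB4 loci can be illustrated with a minimal Python sketch. It assumes that each locus has already been reduced to a list of MPF type labels, one per neighboring gene that matched a type-specific protein profile. The function name, the labels, and the quorum value are hypothetical; the actual quorum is the one defined in Guglielmini et al. (2011) and is not restated here.

```python
from collections import Counter

def classify_mpf(specific_hits, quorum):
    """Assign an MPF type to a VirB4 locus from its type-specific profile hits.

    specific_hits: iterable of MPF type labels (e.g. "T", "F"), one per
                   neighboring gene matching a type-specific profile.
    quorum:        minimum number of type-specific genes required to call a
                   putative complete T4SS (hypothetical, chosen by the caller).
    """
    counts = Counter(specific_hits)
    if not counts:
        return "no type-specific gene nearby"
    if len(counts) > 1:
        return "ambiguous"  # hits to more than one MPF type
    (mpf_type, n), = counts.items()
    if n >= quorum:
        return f"MPF{mpf_type}"
    return f"unclassed (MPF{mpf_type}-like, below quorum)"


# Illustrative loci only; these are not data from the study.
print(classify_mpf(["T"] * 5, quorum=4))
print(classify_mpf(["F", "F"], quorum=4))
```

Under this sketch, a locus with too few type-specific genes is still reported with the MPF type it resembles, which mirrors the observation that the 19 unclassed VirB4 lie near genes specific to one, and only one, MPF type.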
Fig. 4. Phylogenetic analysis of MPFT
VirB4 proteins. Bold vertical black lines represent nodes with a high support value
(bootstrap > 90%), and bold vertical gray lines represent nodes with a
support value between 70% and 90%. Green branches correspond to taxa
that are not within Proteobacteria (or the outgroup). Red branches represent VirB4
not associated to a relaxase (MOBless T4SS). The leftmost vertical bar on the right
stands for chromosomal (black) or plasmidic (white) proteins. The colored bar
represents the different gene order patterns found; the patterns and their
corresponding color are depicted at the bottom (the numbers represent the
corresponding virB gene); a pattern is attributed to a system if,
considering the possibly missing vir genes, the gene order is
preserved. For example, a system composed of the genes virB1,
virB4, virB6, virB5,
virB8, virB9, and virB10 in
this order will be assigned to the orange pattern. Unique or atypical patterns are
depicted in black. Known representative systems are labeled. Numbers in circles
refer to the analysis of robustness in table
1 (identified in the third column of table 1); black background stands for a high support (≥70%
bootstrap in the best-scoring alignment) and gray background for a moderate support
(≥50% bootstrap in the best-scoring alignment).
Fig. 5. Phylogenetic analysis of
MPFF VirB4 proteins. Bold vertical black lines represent nodes with a
high support value (bootstrap >90%), and bold vertical gray lines
represent nodes with a support value between 70% and 90%. Green
branches correspond to taxa that are not from Proteobacteria (plus the outgroup).
Red branches represent the VirB4 not associated to a relaxase (MOBless T4SS). Green
and red dotted branches represent MOBless T4SS that are not from Proteobacteria. The
bar on the right stands for the chromosomal (black) or plasmidic (white) proteins.
Known representative systems are labeled. The GGI DNA release system corresponds to
the N. gonorrhoeae gonococcal genetic island (Hamilton et al. 2005). Numbers in circles refer to the
analysis of robustness in table 1
(identified in the third column of table
1); black background stands for a high support (≥70% bootstrap
in the best-scoring alignment) and gray background for a moderate support
(≥50% bootstrap in the best-scoring alignment).
Phylogeny of the T4CP in the Light of VirB4 Phylogeny
The trees of VirD4 and VirB4 are not congruent (ELW confidence value: 0, and SH
P value < 0.01). Yet, they share many features (fig. 3). The proteins encoded by the virD4 genes
colocalizing in replicons with virB4 tend to form similar clades.
Notably, the VirD4 associated with each of six of the eight VirB4 clades also clustered in
nearly monophyletic clades of T4CP (MPFFA, MPFFATA, MPFB,
MPFG, MPFI, and MPFC). VirD4 of the two remaining
clades (MPFT and MPFF) are scattered in a small number of clades.
Most of the MPFFA use TcpA instead of a VirD4-like T4CP (see later). The few
VirD4 proteins found in MPFFA are also monophyletic (orange in the bottom of
fig. 3). It was previously shown that
plasmid T4CP are sometimes scattered in different groups corresponding to given relaxases
(Smillie et al. 2010). This result is
still valid with the present much larger data set. For example, the T4CP clade with a
mixture of MPFT and MPFF has one type of relaxase in common
(MOBF). On the other hand, some relaxase types are scattered among different
VirD4 clades that follow MPF types, e.g., the VirD4 associated with MPFC is
monophyletic and includes three different relaxases, which are also found in other MPF
types. Hence, evolution of conjugation is driven by two main constraints, one acting
mainly on the T4SS, represented by VirB4, and the other on the relaxosome, represented by the
relaxases. T4CP tends to coevolve with both components.
Cell Envelope Adaptation in Monoderms
The most basal clades in both VirB4 and VirD4 phylogenies correspond to bacteria with
both inner and outer membranes, i.e., diderms (98–100% of the bootstrap trees
in all three analyses). This strongly suggests that ssDNA conjugation was invented among
diderms. In this scenario, ssDNA conjugation would have been acquired by monoderm
prokaryotes, i.e., organisms devoid of an outer membrane, by HGT. This also fits the
observation that all monoderm conjugation systems are in two sister clades:
MPFFA and MPFFATA (monophyletic in 67–55% of the
bootstrap trees).
MPFFATA includes six distinct groups of Firmicutes (monophyly of all
Firmicutes supported by 0% of the bootstrap trees, table 1), two of Actinobacteria (monophyly of all Actinobacteria
supported by 0% bootstrap trees), one of Tenericutes (monophyly of the clade
supported by 96–99% bootstrap trees), and a group of Archaea unlikely to be
monophyletic (bootstrap of only 17–29%) with a clear separation between
Euryarchaeota and Crenarchaeota (91–96% and 100%
bootstrap support for each clade, respectively) (fig. 6).
The deeper relations between these clades are difficult to disentangle, given the low
bootstrap supports of the basal nodes. Within the Firmicutes clades, we find the main
divisions, i.e., Bacillales, Lactobacillales, and Clostridia, scattered in the tree. This
suggests that, once a conjugative system arose in this phylum, it spread early among the
main divisions, and transfers between divergent clades were maintained up to a certain
point in evolution. The monophyly of monoderms in the VirB4 tree suggests that monoderms
acquired conjugative systems by transfer from diderms. This early acquisition was followed
by the adaptation of the T4SS to monoderms. Finally, frequent conjugation between diderms
contributed to the scattered distribution of taxa in the phylogenetic tree of
MPFFATA and MPFFA.
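The monophyly supports quoted above are the percentage of bootstrap trees in which a given set of taxa forms a clade. A minimal Python sketch of that count is given below. It assumes rooted trees encoded as nested tuples of leaf names; the helper names and the toy trees are hypothetical and do not reproduce the software used in this study. For unrooted trees, one would test bipartitions instead.

```python
def leaves(tree):
    """Return the set of leaf labels of a rooted tree given as nested tuples."""
    if isinstance(tree, str):
        return {tree}
    result = set()
    for child in tree:
        result |= leaves(child)
    return result

def is_monophyletic(tree, group):
    """True if some subtree (or single leaf) has exactly `group` as its leaf set."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)

def monophyly_support(bootstrap_trees, group):
    """Percentage of bootstrap trees in which `group` is monophyletic."""
    hits = sum(is_monophyletic(t, group) for t in bootstrap_trees)
    return 100.0 * hits / len(bootstrap_trees)


# Four toy bootstrap replicates with hypothetical taxa a1, a2, b1, and out.
toy_trees = [
    ((("a1", "a2"), "b1"), "out"),
    ((("a1", "b1"), "a2"), "out"),
    (("a1", "a2"), ("b1", "out")),
    ((("a1", "a2"), "b1"), "out"),
]
print(monophyly_support(toy_trees, {"a1", "a2"}))
```

A support of 0%, as reported for all Firmicutes or all Actinobacteria within MPFFATA, would correspond to no replicate in which the group's taxa form a clade that excludes all other taxa.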
Fig. 6. Phylogenetic analysis of MPFFATA VirB4 proteins. Bold
vertical black lines represent nodes with a high support value (bootstrap
>90%), and bold vertical gray lines represent nodes with a support value
between 70% and 90%. Square brackets delimit the different taxonomic
clades (plus the outgroup). Red branches represent the VirB4 not associated to a
relaxase (MOBless T4SS). The bar on the right stands for the chromosomal (black) or
plasmidic (white) proteins. Numbers in circles refer to the analysis of robustness
in table 1 (identified in the third
column of table 1); black background
stands for a high support (≥70% bootstrap in the best-scoring alignment)
and gray background for a moderate support (≥50% bootstrap in the
best-scoring alignment).
The MPFFA clade includes two groups of Actinobacteria intermingled with three
groups of Firmicutes (<5% bootstrap support for a net separation of the two
clades) (fig. 7). The most basal group
(Firmicutes III in fig. 7) is constituted by a
few elements from Firmicutes (bootstrap support for this basal position of
52–100%, table 1). This
suggests that the ancestral conjugative system might have arisen within Firmicutes from
which it was transferred to Actinobacteria. This is consistent with the observation of a
basal group, including only Firmicutes and Tenericutes in the sister MPFFATA
tree (fig. 6). The subsequent split in the
MPFFA group separates a clade with Actinobacteria and Firmicutes II from
Firmicutes I (fig. 7). The latter encodes TcpA
as a putative T4CP, which further supports the monophyly of Firmicutes I based on VirB4
sequences (52–99% of bootstrap support). Homologs of TcpA were found in the
plasmid pCW3 of Clostridium perfringens, in ICEBs1 of
B. subtilis, and in Tn916 of Enterococcus
faecalis (Teng et al. 2008). We
found that 63% of the TcpApCW3 hits were colocalized with VirB4 in
MPFFA systems of Firmicutes, and all 47 of these regions lacked a VirD4-like
protein. This gives further credit to the hypothesis that TcpA is an alternative T4CP
(Parsons et al. 2007; Steen et al. 2009). TcpA-associated systems are,
with one single exception, also associated with MOBT. The MOBT
relaxase of Tn916 (Orf20), when assisted by the accessory protein Int, produces strand-
and sequence-specific cleavage generating a 3′-OH (Rocco and Churchward 2006). Thus, although phylogenetically
different, TcpA and VirD4 T4CPs both seem to be alternatives for ssDNA conjugation,
suggesting the recruitment of a new dsDNA translocase to carry out ssDNA conjugation in this
subclade of MPFFA. This process was concomitant with the acquisition of a very
atypical relaxase, which has no similarity with other relaxases, and instead resembles
replication initiator factors of phages and plasmids (Garcillan-Barcia et al. 2009). Interestingly,
ICEBs1 transfers extremely fast within chains of bacteria (Babic et al. 2011). It is currently unknown whether
this behavior, reminiscent of TraB (which, as we showed earlier, is a closer homolog of TcpA
than VirD4 is), has associated mechanistic analogies, e.g., whether TcpA might have maintained a
dsDNA translocase activity.
Fig. 7. Phylogenetic analysis of MPFFA VirB4 proteins. Bold
vertical black lines represent nodes with a high support value (bootstrap
>90%), and bold vertical gray lines represent nodes with a support value
between 70% and 90%. Square brackets delimit the different taxonomic
clades (plus the outgroup). Red branches represent the VirB4 not associated to a
relaxase (MOBless T4SS). The bar on the right stands for the chromosomal (black) or
plasmidic (white) proteins. Numbers in circles refer to the analysis of robustness
in table 1 (identified in the third
column of table 1); black background
stands for a high support (≥70% bootstrap in the best-scoring alignment)
and gray background for a moderate support (≥50% bootstrap in the
best-scoring alignment).
Evolution of MPFT
Except for VirB4, which has homologs in every T4SS, most of our protein profiles for a
given MPF type allow identifying homologs only within the respective MPF system. Several
of these are nearly ubiquitous within a given MPF type, and we have previously used them
to class MPF types in plasmids and chromosomes (Smillie et al. 2010; Guglielmini et al.
2011). To analyze in detail the patterns of presence and absence of MPF specific
genes, we analyzed the MPFT system, the best studied and most frequently found
in sequenced genomes. Its prototype is the vir system of the A.
tumefaciens plasmid Ti, which encodes 11 genes: virB1 to
virB11. We built HMM profiles for each protein and used them to scan
plasmids for homologs. We excluded chromosomes from this particular analysis because these
are more likely to contain inactivated T4SS undergoing genetic degradation, and this would
lead to the introduction of false positives in the analysis. Most systems include between
8 and 11 out of the 11 genes, but the missing genes are not always the same (supplementary
fig. S1, Supplementary
Material online). The only gene nonessential for conjugation in this system,
the lytic transglycosylase virB1 (Berger and Christie 1994), is often missing or not identified (absent in
48% of the MPFT). The small VirB7 lipoprotein interacts with VirB9 and
performs some sort of stabilizing function (Spudich
et al. 1996) and is also often missed in the search (67%). The most basal
branches within the MPFT tree show an increasing number of proteins that we
fail to detect, most notably the minor component of the pilus VirB5 (missing in
25%). VirB5 and VirB7 are the most exposed proteins at the cell outer membrane
(Christie and Vogel 2000; Fronzes et al. 2009) and are cell receptors for
phages and the immune system (Haase et al.
1995; Harris and Silverman 2002;
Alvarez-Martinez and Christie 2009). They
are therefore likely to evolve rapidly because of these two types of selection pressure.
Accordingly, both VirB5 and VirB7 show evidence of positive selection in the
T4SST of Bartonella (Engel et al. 2011). Hence, the patterns of apparent gene absence are probably caused by
both genuine gene loss and rapid evolution of some T4SS components.
The names of the different vir genes correspond to their order within
the prototype VirBTi system. This prototype gene order pattern (from 1 to 11 in
ascending order) is conserved in a large fraction of the MPFT (fig. 4). For almost all MPFT loci, the
order is strictly conserved for a core composed of virB2,
virB3, virB4, virB8,
virB9, and virB10. As mentioned earlier,
virB7 is often missed by our scan. The gene virB11 can
be found before virB2, and virB1 after
virB10; this defines the gene order depicted in green in figure 4. Importantly, the node separating the two
large clades of MPFT relative to gene order is also highly supported by the
analysis of the VirB4 phylogeny (98% bootstrap). The genes virB5
and virB6 are sometimes placed after virB10 (fig. 4, in dark blue), which seems a derivation
from the previous pattern. These three patterns of gene order represent more than
80% of all the MPFT. Interestingly, the prototype pattern is less often
found on chromosomes, the “green” pattern being more represented. It is
difficult to say for the moment if this difference is a simple consequence of the higher
frequency of chromosomal T4SS in this part of the tree or if this gene order is adaptive
in chromosomal loci. Importantly, the clusters of gene order in the tree accurately
reflect the phylogeny of VirB4. This is further evidence that recombination of distant
VirB4 variants rarely occurs, even within MPF types.
Considering the number of possible permutations and the relatively low number of
different patterns, these data suggest that the gene order within vir
systems is highly constrained for most genes, with four genes often being found in
different positions (virB1, virB5,
virB6, and virB11). The gene succession is also
preserved; indeed, the vast majority of virB genes are directly adjacent,
suggesting strong counterselection for insertions in the loci (data not shown). Highly
conserved gene order at a locus is a sign of selection for a given organization of
transcription (Rocha 2006). In the case of
large protein complexes, such organization can give rise to an ordered assembly of the
complex, as it has been shown for the flagellum (Kutsukake et al. 1990). Gene order conservation thus suggests conservation of a
developmental plan. The variants we see, outlined in figure 4, could reflect innovations in this plan.
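The rule used to assign a gene order pattern (the order must be preserved once possibly missing vir genes are set aside) amounts to a subsequence test. The Python sketch below illustrates it under stated assumptions. Only two patterns are encoded: the prototype order virB1 to virB11, and an assumed reading of the "green" pattern in which virB11 precedes virB2 and virB1 follows virB10. The exact pattern definitions are those of figure 4, and the names used here are hypothetical.

```python
# Gene orders are lists of virB gene numbers in their order along the locus.
PROTOTYPE = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
GREEN = [11, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1]  # assumed reading of the text

PATTERNS = {"prototype": PROTOTYPE, "green": GREEN}

def preserves_order(observed, pattern):
    """True if `observed` is a subsequence of `pattern` (missing genes allowed)."""
    remaining = iter(pattern)
    # `gene in remaining` consumes the iterator up to the first match, so each
    # observed gene must be found after the previous one in the pattern.
    return all(gene in remaining for gene in observed)

def matching_patterns(observed, patterns=PATTERNS):
    """Names of all patterns compatible with the observed order."""
    names = [name for name, p in patterns.items() if preserves_order(observed, p)]
    return names or ["atypical"]


print(matching_patterns([1, 2, 3, 4, 8, 9, 10, 11]))
print(matching_patterns([11, 2, 3, 4, 8, 9, 10, 1]))
print(matching_patterns([2, 3, 4, 8, 9, 10]))
```

A system that retains only the strictly conserved core (virB2, virB3, virB4, virB8, virB9, virB10) is compatible with both encoded patterns, so the function returns every match rather than forcing a single assignment. Patterns not encoded in this sketch, such as the orange and dark blue ones, would be reported as atypical until they are added to the table.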
T4SS Exaptation
We recently uncovered that a large fraction of T4SS lack neighboring relaxases (Guglielmini et al. 2011). A few observations
suggest that most of these are not genetic elements undergoing degradation. First, these
MOBless T4SS are more often chromosomal than plasmidic. Second, many of these chromosomal
elements lack neighboring integrases. Third, the T4SS known to deliver proteins were
classed as MOBless T4SS. These observations suggest that many MOBless T4SS are not
undergoing degradation but that, instead, they result from recruitment of conjugation
systems for other functions (exaptation). The VirB4 phylogeny confirms that, within
MPFT, the loss of the relaxase occurred many times and that this pattern is
also found among the other MPF types (figs. 4–7). Just like
conjugative systems of ICEs and plasmids are interspersed in the phylogenetic trees (Guglielmini et al. 2011), MOBless T4SS are
interspersed with conjugative systems. This shows that MOBless T4SS arose frequently and
in independent instances. The only exception concerns the Archaea and the Actinobacteria,
for which the lack of known relaxases has been pointed out before (Garcillan-Barcia et al. 2009). In these clades, it is likely
that the abundance of MOBless T4SS predominantly reflects the presence of unknown
relaxases. Importantly, the T4SS that are experimentally known to have
nonconjugation-related functions are interspersed in the trees of MPFT and MPFF (figs. 4 and 5). This suggests that conjugative T4SS have been frequently recruited for other
functions.
An Evolution-Based Classification System for MPF
The lack of an all-encompassing classification scheme for conjugative systems and the
extremely diverse gene nomenclature for homologous conjugation genes greatly and
unnecessarily complicate the analysis of the literature of the domain. We suggest that
the phylogeny of VirB4, the only ubiquitously recognizable protein of T4SS, could be used
to class ssDNA conjugative systems and other T4SS. This could be the foundation for the
much-needed gene name standardization in the literature and databases. The model systems
of the vir operon of A. tumefaciens Ti plasmid
(MPFT), F plasmid (MPFF), R64 plasmid (MPFI), and
ICEHin1056 (MPFG) could be used for all Proteobacteria and possibly for other
diderm clades such as Acidobacteria. Four other MPF types for now cover the diversity of
all the other systems in so far as the VirB4 phylogeny is concerned. These would include a
type that for the moment only includes Bacteroides (MPFB) and another that
includes only Cyanobacteria (MPFC). The classification would also include the
two types that are specific to monoderms, the MPFFA and MPFFATA. The
MPFFA type, given its heterogeneity in the use of T4CP, might be split into
two groups when more is known about the differences in the biochemistry of conjugation in
the group. The advantage of this classification is that it is based on evolutionary
biology, tends to reflect similarity between elements, and can be done even when relatively
little is known of the biochemistry of the elements being classed.
We believe there is little risk of an excessive inflation in the number of classes of MPF
with the uncovering of new uncultivated bacterial clades. First, all monoderms seem to
cluster in only two sister clades. Second, MPF of a string of poorly sampled clades of
diderms are classed along with the four common MPF types of Proteobacteria. Some previous
classifications of conjugation systems have been based on the type of replicon or on the
secretion substrate. The former, separating conjugative plasmids from ICEs, are pertinent
to class mobile elements but are inadequate to separate conjugative systems because MPF
cannot be discriminated based on the type of the host replicon (Guglielmini et al. 2011). Classifications regarding the
secretion substrate, i.e., proteins or DNA–protein complexes, pertain to the role of
the T4SS and its impact on genetic mobility. They are extremely important to understand
the adaptive role of T4SS in a bacterium. However, as shown in this work, they carry
little information allowing classification of the T4SS.
T4SS were divided on structural grounds into two classes: T4SSa, including elements from
MPFT and MPFF, and T4SSb, including elements from MPFI
(Christie and Vogel 2000). These two
classes can easily be mapped onto the VirB4 phylogeny in these three different MPF types.
Although this classification reflects important differences in terms of conjugative pili
among Proteobacteria, it no longer represents the diversity of T4SS. It is unclear how
MPFG or any MPF type not present in Proteobacteria should be classed in this
scheme (fig. 3). Our analysis provides a
natural classification scheme for T4SS and may also help highlight the commonalities and
differences between systems. Together with the classification of relaxases (Garcillan-Barcia et al. 2009; Guglielmini et al. 2011), it can be easily
extended to class ssDNA conjugative systems. Furthermore, this classification system can
be applied to partial data, e.g., from metagenomics, because it requires the
identification of a single gene.
Conclusion
Our work provides a scenario for the evolution of conjugation and T4SS from their origin to
recent exaptations (fig. 8). These results
suggest that conjugation is a very ancient process that arose in two independent ways for
ssDNA and dsDNA mechanisms, starting from ancestrally related AAA+ ATPases involved in
DNA translocation. Conjugation of ssDNA is by far the best studied and also the mechanism
most frequently found in prokaryotes. It probably appeared very early among bacteria with
two cell envelopes, possibly ancient Proteobacteria, and from there it spread to all clades
of prokaryotes. The T4SS of monoderms seem less complex, in that they involve fewer genes
(Grohmann et al. 2003), and could initially
evolve by gene deletion from the larger T4SS of diderms. Our evolutionary scenario links
together all known ssDNA conjugative systems, and their T4SS, by the common ancestry of
VirB4. Several observations show the validity of the use of this protein for the
classification of T4SS. First, it is the only ubiquitous protein in T4SS. Second, its
phylogeny closely matches those of other conserved proteins, notably the VirD4. Third,
patterns of the presence/absence of MPF specific genes match the VirB4 phylogeny. Fourth,
the order of MPF-specific genes, at least in MPFT, also matches the VirB4
phylogeny.
Fig. 8. Model for the
evolution of conjugation. First, DNA translocases diversify into a number of families
that are involved in conjugation (ssDNA for VirB4, VirD4, and TcpA, and dsDNA for
TraB). Second, ssDNA conjugation diversified in a series of clades that are the basis
of MPF classes. Several of these show a preponderance of Proteobacteria. Transfer of a
conjugative system to monoderms led to the diversification and further spread within
Firmicutes, Actinobacteria, Archaea, and Tenericutes. Among MPFFA, some
elements engaged in a dramatically different system, including TcpA and the relaxase
MOBT. Finally, at much shorter evolutionary distances, we observe
diversification of conjugative systems among integrative (ICEs) and extrachromosomal
(plasmids) elements. Exaptation of the conjugative systems for protein delivery, DNA
uptake and other, also arise relatively late in the evolutionary
scale.
The structure of the VirB4 tree, with its robust separation in eight large clades, reflects
in part an effect of the cell envelope. Indeed, once systems arose within a clade with a
peculiar membrane structure, they tended to adapt to this cell structure and were not
further passed on to other clades. This resulted in large VirB4 clades specific to
monoderms, such as Archaea or Firmicutes, or to diderms with peculiar membrane
compositions, such as Cyanobacteria (Wada and Murata 1998) or Bacteroides (An et al. 2011).
Adaptation of the T4SS to such cell envelopes is likely to increase the efficiency of
conjugation within taxa but at the cost of reducing its efficiency between taxa, effectively
leading to T4SS specialization. This process has the potential to bias the rate and
direction of genetic transfer between prokaryotes and thus shape the networks of gene
sharing (Halary et al. 2010; Dagan 2011). Notably, it might contribute to the
observed coherence between high bacterial taxonomic ranks (Philippot et al. 2010).
Surprisingly, one group of ssDNA T4SS has radically changed into a system with a new T4CP
(TcpA) and relaxase (MOBT). Although the cognate VirB4 protein fits clearly in
our T4SS classification and is presumably representative of the evolutionary history of the
remaining proteins of the MPFFA T4SS, the replacement of the T4CP suggests that
the evolution of the coupling protein can in certain cases differ radically from that of
the T4SS. In several cases (fig. 3), this seems
to reflect the double evolutionary constraint of T4CP in adapting to both the T4SS and
the relaxase.
Our work also shows that exaptations of T4SS can occur frequently during evolutionary
history. Conjugation consists of the secretion of a nucleoprotein complex. Passing from this
function to a protein secretion system can probably occur in a few evolutionary steps.
Accordingly, several systems are known to transfer both proteins and relaxosomes (Vogel et al. 1998; Fernandez-Gonzalez et al. 2011; Schroder et al. 2011). Furthermore, conjugation systems and
MOBless T4SS can interchange components without loss of function (de Paz et al. 2005). The exaptation of the H. pylori
comB system is more surprising because this system has evolved into a DNA import
mechanism (Hofreuter et al. 2001). Several
other protein secretion systems are thought to be exaptations, e.g., nonflagellar T3SS are
related to the bacterial flagellum, and T6SS show structural homology with phages (Ginocchio et al. 1994; Pell et al. 2009). Yet, T4SS present an uncommon case in that
exaptations occurred multiple times in their evolutionary history. Given the present results,
it is not unlikely that novel exaptations, e.g., protein transfer among bacteria, are
present among the poorly studied MOBless T4SS of free-living bacteria.
Supplementary Material
Supplementary table S1 and figure S1 are available at Molecular Biology and Evolution online
(http://www.mbe.oxfordjournals.org/).