Literature DB >> 35798888

The evolutionary origin of host association in the Rickettsiales.

Max E Schön1, Joran Martijn1,2,3, Julian Vosseberg1,4, Stephan Köstlbacher5, Thijs J G Ettema6,7.   

Abstract

The evolution of obligate host-association of bacterial symbionts and pathogens remains poorly understood. The Rickettsiales are an alphaproteobacterial order of obligate endosymbionts and parasites that infect a wide variety of eukaryotic hosts, including humans, livestock, insects and protists. Induced by their host-associated lifestyle, Rickettsiales genomes have undergone reductive evolution, leading to small, AT-rich genomes with limited metabolic capacities. Here we uncover eleven deep-branching alphaproteobacterial metagenome assembled genomes from aquatic environments, including data from the Tara Oceans initiative and other publicly available datasets, distributed over three previously undescribed Rickettsiales-related clades. Phylogenomic analyses reveal that two of these clades, Mitibacteraceae and Athabascaceae, branch sister to all previously sampled Rickettsiales. The third clade, Gamibacteraceae, branch sister to the recently identified ectosymbiotic 'Candidatus Deianiraea vastatrix'. Comparative analyses indicate that the gene complement of Mitibacteraceae and Athabascaceae is reminiscent of that of free-living and biofilm-associated bacteria. Ancestral genome content reconstruction across the Rickettsiales species tree further suggests that the evolution of host association in Rickettsiales was a gradual process that may have involved the repurposing of a type IV secretion system.
© 2022. The Author(s).

Entities:  

Mesh:

Year:  2022        PMID: 35798888      PMCID: PMC9352585          DOI: 10.1038/s41564-022-01169-x

Source DB:  PubMed          Journal:  Nat Microbiol        ISSN: 2058-5276            Impact factor:   30.964


Main

Obligate host-associated bacteria include pathogens that represent a leading cause of human, livestock and crop disease, resulting in considerable economic loss worldwide. While the molecular and cellular underpinnings of host association have been described in considerable detail[1], their evolutionary origin remains generally poorly understood. The Rickettsiales represent a widespread and diverse order of obligate host-associated alphaproteobacteria that have been estimated to have originated >1.7 Ga, similar to a conservative estimated age of eukaryotes[2,3]. Rickettsiales infect a wide variety of eukaryotic species, including protists, leeches, cnidarians, arthropods and mammals[1]. Well-known examples include Rickettsia prowazekii, the causative agent of epidemic typhus in humans[4], and Wolbachia, a genus of bacteria infecting over two-thirds of arthropods and nearly all filarial nematodes[5]. Genomes of Rickettsiales are shaped by ongoing reductive evolution and are typically small (<1.5 Mb), rich in A+T nucleotides (<40% G+C), display a low coding density (<85%), lack metabolite biosynthesis genes and display a high degree of pseudogenization[6,7]. The host’s nutrient-rich cytoplasm rendered biosynthetic genes redundant, and genetic drift enhanced by small effective population sizes and frequent bottlenecks resulted in further genomic deterioration[8]. Rickettsiales employ various host-interaction factors, including a characteristic P-type (or Rickettsiales vir homologue, rvh[9]) type IV secretion system (T4SS), host-cell manipulating effector proteins[1,10-12] and an ATP/ADP translocase. The latter facilitates energy parasitism by exchanging host-cell ATP for endogenous ADP[13,14] and is commonly found in host-associated bacteria[15]. The currently recognized Rickettsiales families (Rickettsiaceae, Anaplasmataceae, Midichloriaceae and Deianiraeaceae) have each adopted specific lifestyles to interact with their respective host-cell environment[1,16-19]. While these observations give us insights into how the Rickettsiales adapted to the intracellular environment, it is still unclear how and when their last free-living ancestor became host-associated initially. Here we describe the discovery of genomes of deeply branching rickettsial lineages. In-depth analyses of these genomes suggest that their lifestyle is reminiscent of that of free-living and biofilm-colonizing planktonic bacteria. Subsequent ancestral genome content analysis across Rickettsiales provides new insights about the emergence of host association, a key step in the evolution of various host relationships displayed by this bacterial clade, including pathogenicity, mutualism and reproductive parasitism.

Results

Metagenomic identification of previously undescribed Rickettsiales

To shed light on the early evolution of Rickettsiales and the emergence of host association within this clade, we screened publicly available metagenomic repositories for deep-branching Rickettsiales (Supplementary Data 1 and 2). We reconstructed three metagenome assembled genomes (MAGs) from the Tara Oceans metagenome data[20] and identified another eight MAGs from public data derived from aquatic (marine, lake, aquifer and tailings water) environments (Supplementary Fig. 1 and Data 3)[21-24]. To assess their phylogenetic affiliation with other alphaproteobacteria and mitochondria, we inserted them into a previously established dataset of 24 genes highly conserved in alphaproteobacterial and gene-rich mitochondrial genomes[25]. We found that the identified lineages represented three distinct, previously undescribed Rickettsiales clades unrelated to mitochondria (Fig. 1 and Extended Data Fig. 1).
Fig. 1

Identification and phylogenomics of previously undescribed Rickettsiales-associated alphaproteobacteria.

Phylogenetic tree based on the 24 ‘alphamitoCOGs’ dataset including the Rickettsiales MAGs and other recently sequenced genomes. The 20% most heterogeneous sites were removed before tree inference with IQTREE under the PMSF approximation of the LG+C60+F+Γ4 model and 100 non-parametric bootstraps. The tree was rooted with Beta-, Gammaproteobacteria and Magnetococcales. See Extended Data Fig. 1 for the uncollapsed tree.

Extended Data Fig. 1

Maximum likelihood phylogeny of Alphaproteobacteria and mitochondria.

Based on the χ2-trimmed (20% most heterogeneous sites removed) concatenated alignment of 24 highly conserved mitochondrially encoded proteins. ML tree was inferred under the PMSF approximation of LG+C60+F+Γ4 with 100 non-parametric bootstraps as implemented by IQ-TREE. Distinct alphaproteobacterial clades are given their own unique color, including the Rickettsiales (orange) and mitochondria (salmon). Rickettsiales MAGs identified and/or reconstructed in this study are given in black.

Identification and phylogenomics of previously undescribed Rickettsiales-associated alphaproteobacteria.

Phylogenetic tree based on the 24 ‘alphamitoCOGs’ dataset including the Rickettsiales MAGs and other recently sequenced genomes. The 20% most heterogeneous sites were removed before tree inference with IQTREE under the PMSF approximation of the LG+C60+F+Γ4 model and 100 non-parametric bootstraps. The tree was rooted with Beta-, Gammaproteobacteria and Magnetococcales. See Extended Data Fig. 1 for the uncollapsed tree.

Phylogenomic placement of obtained clades

To pinpoint the phylogenetic position of the retrieved Rickettsiales-associated clades more confidently, we performed in-depth phylogenomic analyses with a dataset comprising 116 marker genes highly conserved among Rickettsiales[26] (Supplementary Fig. 2). These analyses confirmed that two of the three clades (hereafter Mitibacteraceae and Athabascaceae; Supplementary Text) branched as distinct groups sister to all other sampled Rickettsiales (Supplementary Figs. 3 and 4, and Data 1). Of the two, the Mitibacteraceae represented the deepest branching clade. The third clade (hereafter Gamibacteraceae; Supplementary Text) branched sister to the recently described ‘Candidatus Deianiraea vastatrix’ (Fig. 2a), a host-associated bacterium able to replicate outside their host cells[17]. The placement of the clade comprising Gamibacteraceae and Deianiraeaceae within the Rickettsiales species tree was not fully resolved. In agreement with Castelli et al.[17], this clade branched sister to the Anaplasmataceae with maximum branch support in both Bayesian (Supplementary Fig. 3) and maximum likelihood (Supplementary Fig. 4) trees. However, alphaproteobacterial and possibly rickettsial phylogenies[25,27-29] are known to be affected by long branch and compositional bias artefacts. We tested for such artefacts by separately removing the long-branched ‘Ca. D. vastatrix’ and the most heterogeneous sites (Methods and Supplementary Text). The overall topology was robust to either treatment (Supplementary Figs. 5 and 6), except for the placement of the Gamibacteraceae-Deianiraeaceae clade. When removing the most heterogeneous sites, they branched sister to a clade comprising the Midichloriaceae and Anaplasmataceae with near maximum branch support in the Bayesian tree (Fig. 2a, Supplementary Fig. 7 and Table 1) but with insignificant branch support in the maximum likelihood tree (Supplementary Text and Fig. 8). On the basis of the resolved Bayesian phylogeny, we suggest that this topology more probably reflects Rickettsiales evolutionary history and that the placement of the Gamibacteraceae-Deianiraeaceae clade sister to Anaplasmataceae is the result of a phylogenetic artefact.
Fig. 2

Phylogeny and comparative analysis of deep-branching Rickettsiales genomes.

a, Rickettsiales species tree based on a dataset of 116 marker genes of which the most heterogeneous sites were removed, highlighting the phylogenetic position of Gamibacteraceae (orange shading), Athabascaceae (blue shading) and Mitibacteraceae (green shading). The tree was inferred using PhyloBayes (CAT+LG+Γ4, 20,000 generations with a burn-in of 5,000 generations). Support values correspond to posterior probabilities and are only shown for values below 1. See Supplementary Fig. 7 for the complete tree including the outgroup. b–f, Distribution of environmental Gamibacteraceae (b) and Mitibacteraceae sequences (c) obtained from the NCBI nt database. Colours represent the sequences’ source environments. For each genome, the observed genome size in Mbp (d), with estimated genome size in dark grey shading, the GC content (e) and the coding density, calculated as the total length of protein-coding genes divided by the genome size (f), are shown. Source data for graphs in d–f can be found in Supplementary Data 3.

Phylogeny and comparative analysis of deep-branching Rickettsiales genomes.

a, Rickettsiales species tree based on a dataset of 116 marker genes of which the most heterogeneous sites were removed, highlighting the phylogenetic position of Gamibacteraceae (orange shading), Athabascaceae (blue shading) and Mitibacteraceae (green shading). The tree was inferred using PhyloBayes (CAT+LG+Γ4, 20,000 generations with a burn-in of 5,000 generations). Support values correspond to posterior probabilities and are only shown for values below 1. See Supplementary Fig. 7 for the complete tree including the outgroup. b–f, Distribution of environmental Gamibacteraceae (b) and Mitibacteraceae sequences (c) obtained from the NCBI nt database. Colours represent the sequences’ source environments. For each genome, the observed genome size in Mbp (d), with estimated genome size in dark grey shading, the GC content (e) and the coding density, calculated as the total length of protein-coding genes divided by the genome size (f), are shown. Source data for graphs in d–f can be found in Supplementary Data 3.

Environmental distribution of added Rickettsiales

To assess the environmental distribution of the newly obtained alphaproteobacterial clades, we used the 16S ribosomal RNA gene sequences of Gamibacteraceae and Mitibacteraceae MAGs to query public sequence databases. Athabascaceae MAGs did not contain 16S rRNA genes and could therefore not be included in this analysis. Highly similar sequences were exclusively found in datasets obtained from aquatic habitats. Sequences closely related to the Gamibacteraceae 16S rRNA gene were associated with a diverse set of environments, including lakes and aquifers as well as marine systems. Interestingly, two recovered 16S rRNA gene sequences were obtained from sequencing datasets of cells of the freshwater cnidarian Hydra vulgaris and of the marine ciliate Hemigastrostyla elongata (Fig. 2b and Supplementary Fig. 9)[30,31]. The two sequences are unlikely to be contaminants as rigorous efforts were made to minimize contamination[30,31]. Their detection in a ciliate microbiome could hint at a conserved lifestyle of Gamibacteraceae and their closest characterized relative ‘Ca. D. vastatrix’, which was described to colonize the cell surface of its ciliate host Paramecium primaurelia[17]. Apart from Gamibacteraceae, Deianiraeaceae-related 16S rRNA gene sequences were also detected in the Hydra vulgaris microbiome (Supplementary Fig. 9)[17,30]. Most probably, the observed interactions of Gamibacteraceae and Deianiraeaceae with Hydra can be explained by the presence of host ciliates in the polyp microbiome. Alternatively, Hydra could represent a direct host for these lineages as well. We could not detect potential host organisms for Mitibacteraceae, as all ‘Ca. Mitibacter marchionensis’-related 16S rRNA gene sequences were exclusively associated with marine, non-host-associated environments (Fig. 2c and Supplementary Fig. 9).

Inferred physiology of Mitibacteraceae and Athabascaceae

The reconstructed genomes of Mitibacteraceae and Athabascaceae are on average larger in size, display a higher GC content and a higher protein-coding density (Fig. 2d–f and Supplementary Data 3) compared with those of the known obligate host-associated Rickettsiales (hereafter referred to as ‘classical Rickettsiales’, including Gamibacteraceae). Mitibacteraceae and Athabascaceae, like most classical Rickettsiales, encode an incomplete glycolysis pathway and a complete tricarboxylic acid cycle (Extended Data Fig. 2 and Supplementary Data 4). However, unlike all classical Rickettsiales, they encode complete pathways to synthesize all amino acids and nucleotides (Fig. 3a, Extended Data Fig. 2 and Supplementary Data 4). Even the ectosymbiotic ‘Ca. D. vastatrix’, by far the richest classical Rickettsiales described so far in terms of amino acid biosynthesis potential, can only synthesize 15 amino acids and lacks the genetic potential for de novo nucleotide biosynthesis (Fig. 3a, Extended Data Fig. 2 and Supplementary Data 4)[17]. The two lineages further encode nearly all subunits of the main oxidative phosphorylation complexes and a complete glyoxylate cycle. The latter may allow them to use substrates such as acetate as a sole carbon source (Extended Data Fig. 2 and Supplementary Data 4). They further uncharacteristically encode functions more commonly found in free-living bacteria, such as a sulfate uptake system and a complete (Athabascaceae) or partial (Mitibacteraceae) assimilatory sulfate reduction pathway, with the potential to provide sulfide for the de novo biosynthesis of cysteine (Extended Data Fig. 2 and Supplementary Data 4)[32,33]. Two Mitibacteraceae and one Athabascaceae MAG also feature ars gene clusters (Supplementary Data 4), which are lacking in all classical Rickettsiales. These clusters facilitate arsenite export (Extended Data Fig. 2) and so confer arsenic resistance. Arsenic is the most prevalent toxic element in the environment[34] and resistance mechanisms are almost ubiquitously found in prokaryotes[35]. Mitibacteraceae further encode an ammonium transporter which is an important inorganic nutrient uptake system in marine bacteria and is not found in other Rickettsiales[36]. Despite being aquatic alphaproteobacteria, the Mitibacteraceae and Athabascaceae do not encode proteorhodopsin-related proteins (Supplementary Data 4), indicating that they are unable to harness energy from a light-driven H+ gradient. However, they do encode homologues of hydroperoxidase KatG (not found in other Rickettsiales families) and superoxide dismutase SOD2 (Extended Data Fig. 2, Supplementary Data 4 and 6), suggesting that they experience stress from UV light-induced reactive oxygen species.
Extended Data Fig. 2

Overview of conservation of central metabolism, transporters, and electron transport chain (ETC) in Rickettsiales families and key ancestors.

All enzymatic steps are depicted with enzyme commision (EC) numbers. The corresponding enzyme name and distribution in Rickettsiales can be found in Supplementary Data 4. Arrows between molecules and arrow heads indicate catalyzed enzymatic reactions and directionality, respectively. Presence/absence is indicated for extant genomes of Anaplasmataceae (light blue), Midichloriaceae (dark blue), Deianiraeaceae (light yellow), Gamibacteraceae (dark yellow), Rickettsiaceae (light violet), Athabascaceae (light orange), Mitibacteraceae (dark orange) as well as the reconstructed genomes of the ancestor of all obligate host-associated (classical) Rickettsiales (LhRCA, dark violet) and the ancestor of all Rickettsiales including Mitibacteraceae and Athabascaceae (LRCA, green). For the seven families, full squares represent absence in at most one member, half squares absence in more then one member but presence in some, and empty squares absence in all members. For reconstructed ancestral genomes, genes are inferred as present if the ALE program inferred an ancestral relative frequency of at least 0.3.

Fig. 3

Distribution and ancestral inference of phenotypic traits and pathways across Rickettsiales.

a, Presence (full circle) and absence (empty circle) of genes or pathways in the seven Rickettsiales families (including Gamibacteraceae, Athabascaceae and Mitibacteraceae), the alphaproteobacterial outgroup as well as the last common ancestors of the classical, obligate host-associated Rickettsiales (LhRCA) and of all Rickettsiales (LRCA). Pathways are grouped into ‘secretion’ (yellow), ‘motility’ (blue), ‘biosynthesis and transport’ (orange) and ‘electron transport chain’ (green). In some cases, pathways are either partial (one or several genes missing) or are absent from some species, but present in others (half circle). Numbers inside the circles represent the number of complete pathways for the biosynthesis of amino acids. The Gln transporter could potentially also transport other polar amino acids such as Ser, Thr and Asn. b, Synteny of the Rickettsiales vir homologue (rvh) T4SS in classical (black) and environmental (red) Rickettsiales. Taxa are ordered according to the species tree in a. One representative per family except Athabascaceae and Mitibacteraceae with two representatives each. c, Schematic phylogenetic tree showing the common duplication of the virB4 gene in LRCA. See Supplementary Fig. 10 for the uncollapsed tree. P-T4SS, type IV secretion system of the P type; T1SS, type I secretion system; CONJ, conjugation system; T4P, type 4 pilus; Tad, tight adherence system.

Distribution and ancestral inference of phenotypic traits and pathways across Rickettsiales.

a, Presence (full circle) and absence (empty circle) of genes or pathways in the seven Rickettsiales families (including Gamibacteraceae, Athabascaceae and Mitibacteraceae), the alphaproteobacterial outgroup as well as the last common ancestors of the classical, obligate host-associated Rickettsiales (LhRCA) and of all Rickettsiales (LRCA). Pathways are grouped into ‘secretion’ (yellow), ‘motility’ (blue), ‘biosynthesis and transport’ (orange) and ‘electron transport chain’ (green). In some cases, pathways are either partial (one or several genes missing) or are absent from some species, but present in others (half circle). Numbers inside the circles represent the number of complete pathways for the biosynthesis of amino acids. The Gln transporter could potentially also transport other polar amino acids such as Ser, Thr and Asn. b, Synteny of the Rickettsiales vir homologue (rvh) T4SS in classical (black) and environmental (red) Rickettsiales. Taxa are ordered according to the species tree in a. One representative per family except Athabascaceae and Mitibacteraceae with two representatives each. c, Schematic phylogenetic tree showing the common duplication of the virB4 gene in LRCA. See Supplementary Fig. 10 for the uncollapsed tree. P-T4SS, type IV secretion system of the P type; T1SS, type I secretion system; CONJ, conjugation system; T4P, type 4 pilus; Tad, tight adherence system. Furthermore, we found that Mitibacteraceae and Athabascaceae genomes encode a complete flagellum and associated chemotaxis machinery (Fig. 3a), suggesting a possible motile lifestyle[17,18]. Their genomes also encode Type 4 pili (T4P; Fig. 3a), which have been implicated in motility, surface attachment and biofilm formation[37], but can also function in evasion of protist predators[38]. Mitibacteraceae genomes additionally encoded a tight adherence pilus (Tad) and an extracellular polysaccharide biosynthesis pathway via pel gene clusters[39,40]. The latter was also found in the Athabascaceae MAG UBA6187 (Fig. 3a and Extended Data Fig. 3a,b). Both Tad pili and exopolysaccharide biosynthesis capacity have been shown to play important roles in biofilm formation[41,42] and together with the presence of T4P, suggest that Mitibacteraceae, and perhaps Athabascaceae, can form biofilms.
Extended Data Fig. 3

Gene synteny of and domain architecture of derived pel exopolysaccharide gene cluster in free-living Rickettsiales.

(a) Gene synteny plot of pelABCDEFG gene cluster in the model organism Pseudomonas aeruginosa PAO, and related clusters in selected free-living bacteria, and free-living Rickettsiales. Arrows indicate the genomic orientation and size of genes. Scale bar indicates a length of 2.5 kb. (b) Domain architecture of likely functional equivalents of the P. aeruginosa PelB and PelE proteins in the Rickettsiales bacterium Candidatus Mitibacter marchionensis.

We failed to detect genes encoding an ATP/ADP translocase or any other gene typical of intracellular or parasitic lifestyles in Mitibacteraceae and Athabascaceae, except for those encoding the rvh T4SS, including the duplication of several components[43] as well as several distinct modifications compared with the vir T4SS of Agrobacterium[10] (Fig. 3b,c and Supplementary Fig. 10). In classical Rickettsiales, this T4SS is typically used for the manipulation of the host cell via translocation of effector proteins[10-12]. Besides a few homologues of rickettsial ankyrin repeat protein 1 (RARP-1), a Sec-TolC-secreted effector of Rickettsia typhi[44], and RARP-2 which may be secreted by the T4SS, we were neither able to identify any of the experimentally verified T4SS effector proteins of classical Rickettsiales[11,12], nor could we detect similar numbers of putative effector proteins containing eukaryotic-like repeat domains as in other Rickettsiales genomes (ankyrin-, leucine rich- and tetratricopeptide repeats; Supplementary Text, Extended Data Fig. 4 and Supplementary Data 4). In contrast, we detected several proteins containing peptidoglycan-binding (PGB) domains in Mitibacteraceae genomes. These proteins, which are otherwise relatively rare in Rickettsiales (Supplementary Text and Extended Data Fig. 4), have been shown to serve as effectors of a ‘bacteria killing’-type T4SS in other proteobacteria[45,46]. However, PGB domain proteins are also known to operate in peptidoglycan synthesis and cell shape remodelling[47], and may thus allow the Mitibacteraceae to evade predatory protists by providing cell shape plasticity[48].
Extended Data Fig. 4

Frequency of proteins containing selected PFAM domain families.

Domains typical of Rickettsiales effectors (Ank, LRR, TPR) and domains often found in Xanthomonas bacterial-killing effectors (Peptidoglycan binding domain PGBD, Peptidase M23) in the selected taxa used for phylogenomic and ancestral reconstruction. Domains were identified using HMMER hmmsearch (-E 1e-05) of the corresponding Pfam domain profiles.

Based on their general genome structure and gene content, Mitibacteraceae and Athabascaceae are atypical compared with the classical obligate intracellular Rickettsiales. Instead, they possibly exhibit a lifestyle similar to aquatic copiotrophic, biofilm-associated and free-living bacteria[49-51], or perhaps a facultative host-associated (probably extracellular) lifestyle (Figs. 2d–f and 3a, Extended Data Fig. 2 and Supplementary Data 4). The suggestion that Mitibacteraceae and Athabascaceae are possibly free-living bacteria is reinforced by an analysis with PhenDB[52], a machine learning algorithm that has been trained to predict phenotypic traits, including obligate host-association, on the basis of gene content. PhenDB predicted that Mitibacteraceae and Athabascaceae are not obligate intracellular symbionts, in contrast to classical Rickettsiales (Supplementary Table 2).

Emergence of host association in Rickettsiales

The deep-branching nature of the Mitibacteraceae and Athabascaceae allowed us to study the origin of the host-associated lifestyle of the Rickettsiales in unprecedented detail. We reconciled[53,54] 4,240 single-gene trees with the previously obtained species tree (Fig. 2a) to reconstruct the gene family complement as well as gene duplications, transfers, losses and origination events along the entirety of the species tree (Extended Data Fig. 5, Supplementary Fig. 11, and Data 4 and 5). Below, we focus on two ancestors that are central to understanding the emergence of host association: the last Rickettsiales common ancestor (LRCA) and the last obligate host-associated Rickettsiales common ancestor (LhRCA; Fig. 3a).
Extended Data Fig. 5

Key gains, losses and characteristics of host-associated Rickettsiales ancestors.

Schematic phylogenetic tree depicting the relationships of the last common ancestor of host-associated Rickettsiales (LhRCA) and Rickettsiales families ancestors. Last common ancestor of Anaplasmataceae, Midichloriaceae, Deianiraea and Gamibacteraceae: LMiGDACA; Last common ancestor of Anaplasmataceae and Midichloriaceae: LMiACA; Last common ancestor of Anaplasmataceae: LACA, Midichloriaceae: LMiCA, Gamibacteraceae: LGCA; Rickettsiaceae: LRiCA. Gains (violet) and losses (green) of genes encoding for key characteristics are written above and below the corresponding branch, respectively. The ancestors are depicted as cells with inferred features such as the presence of a flagellum, Type 4 pili, the capacity to synthesize amino acids and the number of inferred ancestral genes. Amino acid biosynthesis pathways are represented by the three-letter code of the produced amino acid. Proteins related to disulfide bond formation (DsbB); filamentous hemagglutinin production (FhaB), secretion and activation (FhaC); rickettsial ankyrin repeat protein (RARP-1); surface cell antigen (Sca); polyhydroxybutyrate (PHB) synthesis.

The LRCA was inferred to feature at least 1,432 protein-coding genes, considerably more than most extant classical Rickettsiales (Supplementary Data 6). This is most probably an underestimate as LRCA gene families could have gone extinct or were not sampled in the analysed genomes. Only a few classical Rickettsiales encode a similar number or more protein-coding genes. For example, Orientia tsutsugamushi str. Boryong encodes 2,120 protein-coding genes, but this large number can be explained by a recent massive proliferation of transposons, conjugative T4SS genes and other host-cell interaction genes[55]. Furthermore, this expanded gene repertoire of O. tsutsugamushi also includes around 800 pseudogenes, most of which are duplicates of functional genes[55,56]. The rvh T4SS in LRCA was most probably acquired via horizontal gene transfer from an ancestral proteobacterial donor (Fig. 3c and Supplementary Fig. 10). We were unable to assign any verified Rickettsiales effector proteins[1,11,12] related to eukaryotic host-association to this T4SS, suggesting that it could have served an alternative function. As we inferred LRCA to contain at least four PGB domain-containing proteins, which are putative effectors of such secretion systems, it could have acted instead as a ‘bacteria killing’ type T4SS[45,46]. Alternatively, the ancestral T4SS could also have acted in defence against predatory protists. The LRCA was further inferred to encode a very similar gene profile as observed for the Mitibacteraceae and Athabascaceae, hence including many genes affiliated to a free-living lifestyle (for example, full complement of nucleotide and amino acid biosynthesis genes; Fig. 3a, Extended Data Fig. 2 and Supplementary Data 4) and lacking several genes related to the typical symbiotic lifestyle of classical Rickettsiales (for example, an ATP/ADP translocase and several conserved amino acid transporters; Fig. 3a). Besides the capacity to synthesize its own nucleotides and amino acids, LRCA had the genetic potential to take up sulfate and ammonium, carry out assimilatory sulfate reduction and confer arsenic resistance (Fig. 4, Extended Data Fig. 2 and Supplementary Data 4). It had a motile lifestyle, enabled by a flagellum, a T4P and a functioning chemotaxis system. Finally, it most probably was able to initiate biofilm formation using the pel and tad systems (Fig. 4 and Supplementary Data 4). On the basis of these observations, we propose that LRCA was not an obligate intracellular symbiont, but rather a free-living or perhaps facultative host-associated organism. This would avoid the need for an unprecedented and highly complex intracellular-to-extracellular transition to explain the extracellular lifestyle of ‘Ca. D. vastatrix’, as originally argued by Castelli et al.[17]. However, a stage of facultative intracellular host-association cannot be fully excluded either.
Fig. 4

Evolutionary transition from a free-living to a host-associated lifestyle in Rickettsiales ancestors.

Proposed lifestyle of LRCA and LhRCA in blue boxes at the bottom indicating marine environments. Rickettsiales cells are depicted as ellipsoid cell models in brown and blue for LRCA and LhRCA, respectively. Reconstructions of key genomic features of ancestors are depicted in cell models in the respective boxes. Arrow in the centre indicates evolutionary transition from LRCA to LhRCA and grey boxes above and below show gains and losses of genomic traits, respectively. Orange and blue arrows indicate metabolic pathways related to sulfate assimilation or central carbon metabolism, respectively. Abbreviations in the figure refer to genes encoding for the tight adherence (Tad) pili; type 4 pili (T4P); type I (T1SS) or rvh type IV (T4SS) secretion systems; Pel-like exopolysaccharide (EPS) synthesis; tricarboxylic acid (TCA) cycle; electron transport chain complexes (C) I-V; Sec-, Tat- and TolC-mediated protein secretion pathways (Sec, Tat and TolC, respectively).

Evolutionary transition from a free-living to a host-associated lifestyle in Rickettsiales ancestors.

Proposed lifestyle of LRCA and LhRCA in blue boxes at the bottom indicating marine environments. Rickettsiales cells are depicted as ellipsoid cell models in brown and blue for LRCA and LhRCA, respectively. Reconstructions of key genomic features of ancestors are depicted in cell models in the respective boxes. Arrow in the centre indicates evolutionary transition from LRCA to LhRCA and grey boxes above and below show gains and losses of genomic traits, respectively. Orange and blue arrows indicate metabolic pathways related to sulfate assimilation or central carbon metabolism, respectively. Abbreviations in the figure refer to genes encoding for the tight adherence (Tad) pili; type 4 pili (T4P); type I (T1SS) or rvh type IV (T4SS) secretion systems; Pel-like exopolysaccharide (EPS) synthesis; tricarboxylic acid (TCA) cycle; electron transport chain complexes (C) I-V; Sec-, Tat- and TolC-mediated protein secretion pathways (Sec, Tat and TolC, respectively). The transition from the LRCA to the last common ancestor of all classical Rickettsiales and the Athabascaceae (LhRAtCA) saw little net change to the number of inferred protein-coding genes (from n = 1,432 in the LRCA to n = 1,430 in LhRAtCA). Besides the loss of several genes encoding components of the Tad pilus in LhRAtCA, no genes of note were lost or gained (Fig. 3a). In contrast, the LhRCA had considerably fewer genes (n = 1,165) and had lost genes involved in alanine, cysteine and serine biosynthesis, sulfate assimilation, sulfate transporters and ammonium transporters (Fig. 4). It furthermore lost the pel genes, suggesting that the ability to form biofilms was lost at this stage (Figs. 3a and 4). The characteristic ATP/ADP translocase which enables energy parasitism was gained here, as well as a transporter for polar amino acids, such as glutamine, and a tyrosine or tryptophan transporter. Thus, the LhRCA, as its extant descendants, most probably relied on a host for certain key metabolites but was also still able to synthesize a wide range of metabolites itself (Fig. 4 and Supplementary Data 4). We therefore hypothesize that the LhRCA was the first Rickettsiales to be an obligate symbiont. It did retain all flagellum and T4P genes, many of the chemotaxis-related genes and the ability to synthesize nucleotides de novo. Taken together, we suggest that it exhibited an obligate symbiotic yet extracellular lifestyle, which was conserved in its descendant ‘Ca. D. vastatrix’[17], and possibly also in the Gamibacteraceae. However, based on current data, a facultative intracellular lifestyle cannot be excluded either. Its host was most probably a unicellular eukaryote, as the LhRCA was previously estimated, via molecular dating, to predate the origin of Metazoa by ~700 Myr and predicted, via ancestral trait reconstruction, to be protist-associated[2]. Irrespectively, all known extant descendants of the LhRCA retained this obligate symbiotic lifestyle, but diversified to occupy distinct ecological niches.

Key transitions in classical Rickettsiales

From the LhRCA onwards, the evolutionary history of the classical Rickettsiales is generally characterized by reduction of central metabolic capabilities (Extended Data Figs. 2 and 5, and Supplementary Data 6). This trend is typical for obligate host-associated clades[8,19,57]. Yet, a number of genes linked to host association were gained as well. Below we discuss several observations that are of particular interest with respect to the evolution of host association. Assuming an extracellular nature of the LhRCA, one of the key transitions that occurred during Rickettsiales evolution was the emergence of an intracellular lifestyle. Castelli et al.[17] proposed that intracellularity in Rickettsiales evolved multiple times independently. Given our species tree (Fig. 2a), this would entail independent transitions to intracellularity in the last common ancestor of Midichloriaceae and Anaplasmataceae (LMiACA), and in the last common ancestor of the Rickettsiaceae (LRiCA). The extensive loss of a largely overlapping set of amino acid biosynthesis genes coincided with these transitions (Extended Data Fig. 5): the LMiACA lost the capacity to synthesize nine amino acids, and the LRiCA eleven amino acids. In contrast, the possibly ectosymbiotic Gamibacteraceae retained its amino acid biosynthesis pathways and the ectosymbiont ‘Ca. D. vastatrix’ only lost three such pathways. This suggests that switching to a predominantly intracellular environment predisposed these lineages to losing many amino acid biosynthesis genes. Similar to the emergence of an intracellular lifestyle, the association with animal hosts evolved independently in several lineages of classical Rickettsiales. Like the LhRCA, which was inferred to be a protist symbiont[2,17], species from the Rickettsiaceae and Midichloriaceae and ‘Ca. D. vastatrix’ are known to be associated with protists such as ciliates and amoeba. Wang and Luo[2] inferred that the highly specialized insects and mammal symbionts and pathogens of the Anaplasmataceae, Midichloriaceae and Rickettsiaceae evolved by frequent and independent transitions from protist-associated ancestors[2,17]. A similar host switch could have occured in Gamibacteraceae as well (see above). The hallmark feature of energy parasitism, the ATP/ADP translocase[13-15], was lost twice independently as well: once in the last common ancestor of the Gamibacteraceae (LGCA) and once in the last common ancestor of the Anaplasmataceae (LACA). The Anaplasmataceae can produce their own ATP via oxidative phosphorylation using the electron transport chain[1] and the Gamibacteraceae have the genetic potential for this as well (Extended Data Fig. 2). The lack of an ATP/ADP translocase in Gamibacteraceae suggests that this group may be less dependent on a host for energy and nutrients (Fig. 3a and Extended Data Fig. 2). The genetic potential for storing carbon as polyhydroxybutyrate (PHB) was lost twice independently as well: once in the last common ancestor of the Gamibacteraceae and ‘Ca. D. vastatrix’ (LGDCA), and once in the LACA (Extended Data Figs. 2 and 5, and Supplementary Data 4). Rickettsial species may store carbon as PHB when host energy sources are unavailable[7], for instance, when they are horizontally transmitted to a new host. PHB carbon storage is otherwise commonly found in marine heterotrophic bacteria[58]. Intriguingly, losing PHB carbon storage coincided in both cases with the loss of the flagellum (Extended Data Fig. 5) and may indicate a diminished capability for both ancestors to move and survive independent of a host. DsbB, a protein previously proposed to be involved in oxidative folding of secreted Wolbachia effectors[59], was gained by the LMiACA, as was a DnaJ-like chaperone. Genomes of the endosymbiotic Midichloriaceae and Anaplasmataceae experience an increased deleterious mutational load due to their intracellular nature[8,60]. This chaperone could potentially help stabilize enzymes to retain their function despite an accumulation of destabilizing mutations in their respective genes[60,61]. The LRiCA additionally gained several genes involved in host entry and manipulation, including the known family of autotransporters called surface cell antigens (Sca)[4,62], the known Sec-TolC-secreted effector RARP-1[44] and several copies of related ankyrin repeat-containing proteins (Extended Data Fig. 5 and Supplementary Data 4). FhaB and FhaC, proteins involved in adhesin production and export[63], were gained by the LGDCA (Extended Data Fig. 5 and Supplementary Data 4). The homologues in the ectosymbiont ‘Ca. D. vastatrix’ (DeiVas_00243/WP_146820351.1 and DeiVas_00245/WP_161982782.1) were previously proposed to be involved in adhesion to cells of the host Paramecium primaurelia[17], as they are essential for host-cell adhesion in Bordetella species as well[63]. Their conservation in ‘Ca. D. vastatrix’ and Gamibacteraceae MAGs underpins a probably ancestral ectosymbiotic lifestyle of this group.

Discussion

The origin and emergence of host association during the early evolution of Rickettsiales remains poorly understood. Here we identify and reconstruct eleven MAGs distributed over the proposed deep-branching Mitibacteraceae and Athabascaceae and the more nested Gamibacteraceae. We propose that the Mitibacteraceae and Athabascaceae exhibit free-living or possibly facultative host-associated lifestyles, and are capable of forming biofilms. Their genomes lack the signatures of reductive evolution and contain the genetic repertoire for chemotactic motility, biofilm formation and, most importantly, a rich metabolism with complete biosynthetic pathways for all amino acids and nucleotides. Subsequent ancestral genome reconstruction analyses indicate a similar, free-living lifestyle for the LRCA, although a facultative host-association cannot be fully excluded. The rvh-type T4SS, characteristic for Rickettsiales, was acquired by the LRCA through horizontal gene transfer (HGT) from a proteobacterial donor. This ancestral T4SS may have functioned in so-called ‘bacteria killing’[45,46] as indicated by the presence of several PGB-containing proteins in the LRCA, and/or in defence against protist predation. Evasion of predatory protists may have additionally been facilitated by the aforementioned PGB proteins, which are known to function in cell shape remodelling[47], and by biofilm formation, which was shown to allow some aquatic prokaryotes to evade protist grazing[48,64]. As the Rickettsiales evolved into obligate symbionts of ancestral protists, we propose that the rvh T4SS was repurposed from a system for interbacterial competition (‘bacteria killing’) and/or defence against protist predation into the host-cell manipulating secretion system found in present-day Rickettsiales symbionts. In addition to the proposed repurposing of the T4SS, the loss of biofilm-forming capacity and key metabolic genes, mirrored by the gain of several amino acid transporters and an ATP/ADP translocase in the LhRCA, represents key events in the transition from a free-living or facultative host-associated lifestyle to an obligate symbiotic lifestyle during Rickettsiales evolution. We propose the LhRCA to have been an ectosymbiont of an ancestral protist, a lifestyle that is conserved in ‘Ca. D. vastatrix’ and possibly also in Gamibacteraceae. Alternatively, it could have exhibited a facultative intracellular lifestyle, and its known descendants either reverted to extracellularity or became obligately intracellular. The LhRCA possibly used its vertically inherited T4P to attach to its host, perhaps in a fashion similar to how the predatory bacterium Vampirococcus lugosii attaches to its prey[65]. It may further have used its flagellum and chemotaxis machinery, conserved in some extant members of the Midichloriaceae and Rickettsiaceae[16,26], for motility and to navigate towards host cells. Our results show that subsequent rickettsial evolution was dominated by further genome reduction and specialization towards different host organisms and niches within or outside host cells, giving rise to the diverse lifestyles displayed by extant rickettsial symbionts and pathogens. The current identification of previously undescribed environmental clades of Rickettsiales has allowed us to reconstruct the early evolution and emergence of host association in this bacterial clade. While the inferred genome content of the LRCA is compatible with a free-living lifestyle, we currently cannot exclude the possibility that it displayed a facultative extracellular or even intracellular host-association. As our ancestral genome reconstruction analyses indicate the inferred genomic content of the LRCA to be largely congruent with that of present-day representatives of the Mitibacteraceae and Athabascaceae, future studies aiming to characterize the physiology and lifestyles of these lineages might provide further insights into the nature of the last common ancestor of Rickettsiales.

Methods

Sample selection

All publicly available Tara Oceans assemblies[20] were screened with the RP15 pipeline[66,67] for the presence of Rickettsiales-related lineages, as previously described[25]. The RP15 pipeline approximates the phylogenetic position of all taxa present in a metagenome assembly for which at least 5 out of 15 well-conserved ribosomal proteins are encoded on a single contig. In the end, two assemblies (125_MIX_0.22-3 and 067_SRF_0.22-0.45) were identified (Supplementary Data 2).

Raw sequence data

The raw sequence data from all samples corresponding to the two selected assemblies and from additional samples (Supplementary Data 2) were downloaded from the Tara Oceans project ERP001736 on the EBI Metagenomics portal.

Read preprocessing

All reads were preprocessed as previously described[25]. SEQPREP v1.3.2 (https://github.com/jstjohn/SeqPrep) was used to merge overlapping read pairs into single reads and remove read-through Illumina adapters. TRIMMOMATIC v0.35[68] was used to remove residual Illumina adapters, trim low-quality base-calls at starts and ends of reads, remove short reads and finally remove reads that had a low average phred score. The overall quality and presence of adapter sequences of processed and unprocessed reads were assessed with FASTQC v0.11.4[69].

Metagenome assembly

The preprocessed metagenomic reads from the two selected samples (125_MIX_0.22-3 and 067_SRF_0.22-0.45) were re-assembled with metaSPAdes[70], a mode of SPAdes 3.7.0 with k-mers 21,33,55,77. In case a sample was associated with multiple sequencing runs, all preprocessed reads from the different sequencing runs were pooled before assembly.

Phylogenetic diversity in metagenome assemblies and public MAG datasets

The RP15 pipeline was used to estimate the phylogenetic diversity of alphaproteobacterial lineages present in the two re-assembled metagenomes and published MAGs of Parks et al.[23] (PRJNA348753), Tully et al.[24] (PRJNA391943), Delmont et al.[22] (10.6084/m9.figshare.4902923) and Anantharaman et al.[21] (PRJNA288027). Only the MAGs categorized by the aforementioned studies as ‘Alphaproteobacteria’ or ‘Rickettsiales’ were considered. Protein-coding sequences were predicted with Prodigal v2.60[71]. The RP15 pipeline was first run with a reference set of 90 representative bacteria and archaea to identify alphaproteobacterial contigs (Supplementary Data 1)[25]. The ribosomal proteins encoded on these contigs were then incorporated in a second RP15 dataset consisting of their orthologues in 84 representative alphaproteobacteria, 12 mitochondria, 2 MarineProteo1, 2 magnetococcales, 4 betaproteobacteria and 4 gammaproteobacteria. A concatenated supermatrix alignment was prepared (alignment: MAFFT L-INS-i v7.471, alignment trimming: trimAl v1.4.rev15 -gt 0.5). To reduce compositional bias—a phylogenetic artefact to which alphaproteobacteria are particularly sensitive[25,27,28,72]—we removed 20% of the sites that contributed most to compositional heterogeneity[28]. A phylogenetic tree was inferred from the compositionally trimmed supermatrix alignment with IQTREE v1.6.9 (-m LG+C60+F+G, selected by ModelFinder, -bb 1000 -nm 250; Supplementary Fig. 1)[73].

Binning of metagenomic contigs

For each metagenome assembly, contigs larger than 2 kb were grouped into bins on the basis of differential coverage across samples, tetranucleotide frequency profiles, GC composition and read-pair linkage as previously described[25]. The contigs were cut every 10 kb, unless the remaining fragment was shorter than 20 kb. Then the preprocessed reads of a set of sequencing runs (125_MIX_0.22-3: all sequencing runs listed in Supplementary Data 2; 067_SRF_0.22-0.45: sequencing runs ERR598994, -599144, -594313, -594325, -594395 and -594404; Supplementary Data 2) were mapped onto the fragmented contigs with KALLISTO v0.42.5[74], yielding differential coverage profiles per fragmented contig. This was then used together with tetranucleotide frequency information by CONCOCT v0.4.0[75] to group the fragmented contigs into bins. Bins containing the Rickettsiales ribocontigs (Fig. 1) were then assessed and cleaned with MMGENOME (accessed June 2016)[76] using differential coverage, GC composition, read-pair linkage and presence of 139 genes well-conserved across Bacteria. Finally, the fragmented contigs of the cleaned bins were replaced by their corresponding full-length contigs. In case not all fragmented contigs from a corresponding full-length contig were present in a cleaned bin, the full-length contig would only be included in the final bin if the majority of the fragmented contigs were present. This yielded three draft genome bins: ‘BIN125’, ‘BIN67-1’ and ‘BIN67-3’. We aimed to improve the quality of the draft bins by recruiting reads from all Tara Oceans metagenomes that had sequence coverage for the (BIN125: ERR594323, -599156, -594338, -594339, -59434; BIN67-1 and BIN67-3: ERR594395, -594404, -598994, -599144, -594313, -594325) and performing a second round of assembly and binning. This was completed as follows. Preprocessed reads putatively derived from the genomes of interest were recruited from the selected metagenomes by classifying them with Bowtie2 v2.3[77] and CLARK-S v1.2.3[78] using a set of reference Rickettsiales genomes. The recruited reads were combined in two separate pools, one for BIN125 samples and another for BIN67-1/BIN67-3 samples. Each pool was assembled separately with SPAdes (–careful)[79]. The final BIN125 bin was obtained by removing all contigs <3,300 bp. The final BIN67-1 and BIN67-3 bins were obtained by separating the contigs ≥1,500 bp into two groups with CONCOCT (–clusters 4)[75].

Completeness and redundancy estimates

We used the miComplete tool v.1.1.1[80] to estimate the completeness and redundancy of the MAGs as well as the reference genomes using a set of bacterial marker genes.

Annotation

All bins were annotated with prokka v1.12[81], which was altered to allow for partial gene predictions on contig-edges (GitHub pull request no. 219), with the options–compliant,–partialgenes,–cdsrnaolap and–evalue 1e-10, and with barrnap (https://github.com/tseemann/barrnap) as the rRNA predictor. We used eggNOG­mapper v1.0.3[82] to get annotations from the EggNOG database 4.5.1[83] (from which we gathered the alphaNOGs). We assigned KEGG[84] orthology (KO) and enzyme commision (EC) numbers using GhostKOALA v2.2[85]. Additionally, we annotated the proteins using the Carbohydrate­Active enZYmes Database[86] (CAZY, using HMMER v3.3[87]), the Transporter Classification Database[88] (TCDB, using BlastP 2.8.1+[89]) and used InterProScan v5.42-78.0[90] to annotate the proteins with PFAM[91], TIGRFAM[92] and IPR domains. For detailed annotation of secretion systems and filamentous structures, we screened proteomes using MacSyFinder[93,94] v2.0rc1 with the ‘TXSScan’[93,94] and ‘TFF-SF’[95] HMM models with ‘—db_type unordered’. Finally, we used DIAMOND v2.0.6.144[96] to perform similarity searches of the proteins against the non-redundant protein sequence database and recorded the taxonomic annotation of the last common ancestor of all hits within 2% of the best score. Further searches of PFAM[91] domain profiles were performed using HMMER (-E 1e-05)[87] for eukaryotic domains commonly found in Rickettsiales T4SS effectors (Ankyrin repeats: CL0465; leucin rich repeats: CL0022; Tetratricopeptide repeats: CL0020, Pentapeptide: CL0505)[11,18] as well as in Xanthomonas bacterial-killing T4SS effectors (PGB domains: CL0244; Peptidase_M23: PF01551)[45,46]. All annotations are available from the Figshare repository (10.6084/m9.figshare.c.5494977, see Data availability). We finally searched a selection of genomes for candidate sequences of virB7 homologues using tblastN[89] and references (‘rickettsiales+AND+virB7’) from Uniprot[97].

Alphaproteobacteria and mitochondria species tree

A phylogenomics dataset was constructed by updating the ‘24 alphamitoCOGs with more diverse mitochondria’ dataset from Martijn et al.[25] (henceforth ‘alphamito24’) with the 3 Rickettsiales MAGs reconstructed in this study, eight Rickettsiales MAGs identified from public MAG datasets[21-24], ‘Ca. Deianiraea vastatrix’[17], endosymbiont of Peranema trichophorum, and endosymbiont of Stachyamoeba lipophora[29] and ‘Ca. Phycorickettsia trachydisci’[98]. Orthologues were identified by PSI-BLAST (E-value cut-off: 1 × 10−6)[89] using the gene alignments of the alphamito24 dataset as queries. Non-orthologues were detected and removed via single-gene tree inspections (alignment: MAFFT L-INS-i v7.471[99]; alignment trimming: trimAl v1.4.rev15 -gappyout[100]; phylogenetic inference: IQTREE v1.6.12 -fast -m LG+F+G[73]). A supermatrix alignment was prepared from the updated orthologous groups by re-aligning with MAFFT L-INS-i[99] and trimming the alignments with BMGE v1.12 -m BLOSUM30[101] before concatenation. The alignment was finalized by removing the 20% most compositionally heterogeneous sites with the χ2-trimmer[25,28]. A phylogenetic tree was inferred under the posterior mean site frequency (PMSF) approximation[102] of the LG+C60+F+Γ4 model (selected by ModelFinder[103]; guidetree: LG+G+F, 100 non-parametric bootstraps).

Genome-based obligate intracellular lifestyle prediction

We uploaded all Rickettsiales MAG and reference genome proteome files to the PhenDB[52] web server (http://phendb.org/) and used default parameters for prediction of obligate intracellular lifestyles.

Rickettsiales species tree

The ‘129-panorthologues’ phylogenomics dataset of Martijn et al.[26] was updated with the same MAGs and genomes that were used to update the alphamito24 dataset, as well as the recently sequenced Rickettsiales ‘UBA6177’[23], ‘Ca. Xenolissoclinum pacificiensis’[104], ‘Ca. Fokinia solitaria’[105], ‘Ca. Neoehrlichia lotoris’ (ASM96479v1), Neorickettsia helminthoeca (ASM63298v1), ‘Ca. Jidaibacter acanthamoeba’[18], endosymbiont of Acanthamoeba UWC8[72], Occidentia massiliensis Os18[106], Rickettsiales bacterium Ac37b and alphaproteobacterial outgroups Caulobacter crescentus CB15, ‘Ca. Puniceispirillum marinum’ IMCC1322[107], Azospirillum brasiliense Sp245 (now Azospirillum baldaniorum)[108], MarineAlpha3 Bin5, MarineAlpha3 Bin2, MarineAlpha12 Bin1, MarineAlpha11 Bin1, MarineAlpha9 Bin6 and MarineAlpha10 Bin2[25]. Orthologues were identified through PSI-BLAST v2.8.1+ (E-value cut-off: 1 ×10−6) searches using the 129 gene alignments as a query, and non-orthologues were detected and removed via single-gene tree inspections as described above. A discordance filter[109] was applied to remove the most discordant genes as follows: (i) single-gene alignments were prepared with MAFFT E-INS-i[99] and trimAl v1.4.rev15 -gappyout[100], (ii) single-gene trees were inferred with IQTREE v.1.6.9 (with -bnni)[73,110], (iii) bipartition count profiles were constructed from the bootstraps (tre_make_splits.pl) and compared between all possible gene pairs to calculate discordance scores (tre_discordance_two.pl). The top 13 most discordant genes were removed from the dataset (Supplementary Fig. 2). After preliminary phylogenomics analyses with the remaining 116 genes, we decided to omit extremely long-branching taxa ‘Ca. Xenolissoclinum pacificiensis’ and ‘Ca. Fokinia solitaria’ and phylogenetically unstable taxa (endosymbiont of Stachyamoeba lipophora and UBA6177) from downstream phylogenomics analyses (Supplementary Data 1). The resultant 116 orthologous groups were first aligned by applying PREQUAL v1.01[111] (masking of putative non-homologous sites), MAFFT E-INS-i (multiple sequence alignment) and Divvier -partial[112] (alignment ‘divvying’) and then concatenated into an ‘untreated’ supermatrix alignment. Another ‘no-Deianiraea’ supermatrix alignment was prepared in an identical manner but with the long-branched ‘Ca. D. vastatrix’ omitted. From the untreated supermatrix, an iterative χ2-trimmed alignment (47 rounds of removing the top 1% most heterogeneous sites as determined by χ2-score[28]) was prepared. The iterative χ2-trimmer was found to be more efficient at reducing compositional heterogeneity compared with the standard χ2-trim method[25,28,113]. These were used for phylogenetic reconstruction under the CAT+GTR+Γ4 (untreated) and CAT+LG+Γ4 (iterative χ2-trimmed) models with PhyloBayes MPI v1.8[114]. Four independent Markov chain Monte Carlo (MCMC) chains were run until convergence was reached (maxdiff <0.3) or a sufficient effective sample size was reached (effsize >300), while using a burn-in of at least 5,000 generations. Posterior predictive checks were performed to check to what degree the inferred phylogenetic models captured the across-taxa compositional heterogeneity and site-specific pattern diversity present in the alignments. Parameter configurations were sampled every 50 generations after the burn-in. Maximum likelihood phylogenetic reconstructions were done under the PMSF approximation (with 100 non-parametric bootstraps; guidetree under LG+G+F) of the LG+C60+F+Γ4 model (selected by ModelFinder) for both supermatrix alignments with IQTREE v1.6.5.

Gene family trees

We used the annotation of AlphaNOGs from EggNOG 4.5.1[83] to assign proteins from the MAGs and reference genomes into clusters. All proteins without AlphaNOG annotation were subjected to all-versus-all BLASTP analysis and subsequent de novo clustering with Silix (overlap 90, identity 60)[115], resulting in a total of 34,361 clusters. We computed alignments for all protein clusters with at least 4 members (4,240), using PREQUAL[111] to prefilter unaligned sequences, MAFFT E-INS-i[99] to perform alignments and Divvier v1.01 (-divvygap)[112] to filter alignments. Single-gene trees were inferred for the 4,240 alignments with IQTREE v1.6.9[73] with 1,000 ultrafast bootstraps (-bb 1000 -wbtl)[110]. A model test[103] was performed (-m TESTNEW -mset LG -madd LG+C10,..,LG+C60) for each tree inference. For clusters with only 2 or 3 members, we created ‘dummy’ bootstraps that represent the only possible topology for an unrooted tree.

Gene tree-species tree reconciliation

The 4,240 single-gene trees were reconciled with the species tree derived from the iterative χ2-trimmed alignment with CAT+LG+Γ4 using the ALEml_undated algorithm of the ALE suite v0.4[54]. ALE infers gene duplications, losses, transfers, originations and ancestral gene contents along a species tree[53,54]. It also takes into account the completeness of the extant genomes. Per node, a gene family was considered present if ‘copies’ are ≥0.3 in the ALE output. In addition, gene family evolution events (transfers, originations and losses) were counted similarly, with a threshold of 0.3. Copy numbers and evolutionary events per node in the species tree and gene family are reported by ALE as relative frequencies. These relative frequency values express the support of whether and how many times an evolutionary event occurred in a node (or a gene family was present), while incorporating all the uncertainty of the reconstructed gene tree sample. Therefore, standard statistical thresholds do not apply. We selected 0.3 as a minimal relative frequency due to its high signal to noise ratio (Supplementary Fig. 14). This threshold ensures that events that were reconstructed with a low frequency are still detected, since the signal in single-gene trees can sometimes be very low and events could be overlooked at more stringent thresholds[113,116]. Singleton clusters were counted as originations for the corresponding species.

Environmental diversity

The 16S rRNA gene sequences of the MAGs (Gamibacteraceae and Mitibacteraceae, no 16S sequences were found for Athabascaceae) were used to search the NCBI nt database with BLASTN v2.8.1+ (E-value <0.05, length >700 bp). An alignment was prepared for Rickettsiales and an alphaproteobacterial outgroup using MAFFT E-INS-i[99], and trimmed with trimAl v1.4.rev15 (-automated1)[100]. A phylogenetic tree was inferred with IQTREE[73] with 100 non-parametric bootstraps under the GTR+F+R8 model (selected by IQTREE’s model test[103]).

T4SS subunit gene trees

Trees for the T4SS subunits virB1-6,8-11,D4 were prepared using the corresponding eggNOG v.5.0[117] clusters for all Bacteria. First, we removed all Rickettsiales sequences from these clusters to avoid self-alignment. We then used the sequences from our taxon selection as queries to search the COGs using DIAMOND (–top 50–ultra-sensitive)[96] and clustered the obtained sequences using cd-hit[118] at 80% identity. We then aligned these reference sequences together with the sequences from our taxon sampling using MAFFT-E-INS-i[99] and applied a light gap trimming (trimAl v1.4.rev15 -gt 0.01[100]). Finally, trees were inferred in IQTREE[119] with automatic model selection[103] and support values were estimated by 1,000 ultrafast bootstraps (see Data availability).

Ancestral gene content reconstruction

We reconstructed ancestral gene family repertoires with the results from ALE by selecting all gene families predicted to be present at a given node with a frequency ≥0.3. Consensus annotations for gene families were inferred on the basis of the abovementioned gene annotations. Metabolic pathways or enzyme complexes were inferred as present if at least half of the necessary genes were present. We assessed the metabolic capabilities of ancestral genomes using the KEGG[84] Module tool or MetaCyc pathways[120].
  112 in total

Review 1.  Evolutionary microbial genomics: insights into bacterial host adaptation.

Authors:  Christina Toft; Siv G E Andersson
Journal:  Nat Rev Genet       Date:  2010-07       Impact factor: 53.242

Review 2.  For better or worse: genomic consequences of intracellular mutualism and parasitism.

Authors:  Jennifer J Wernegreen
Journal:  Curr Opin Genet Dev       Date:  2005-10-17       Impact factor: 5.578

Review 3.  Wolbachia: master manipulators of invertebrate biology.

Authors:  John H Werren; Laura Baldo; Michael E Clark
Journal:  Nat Rev Microbiol       Date:  2008-10       Impact factor: 60.633

Review 4.  Phylogenomics reveals a diverse Rickettsiales type IV secretion system.

Authors:  Joseph J Gillespie; Kelly A Brayton; Kelly P Williams; Marco A Quevedo Diaz; Wendy C Brown; Abdu F Azad; Bruno W Sobral
Journal:  Infect Immun       Date:  2010-02-22       Impact factor: 3.441

Review 5.  Cells within cells: Rickettsiales and the obligate intracellular bacterial lifestyle.

Authors:  Jeanne Salje
Journal:  Nat Rev Microbiol       Date:  2021-02-09       Impact factor: 60.633

6.  The genome sequence of Rickettsia prowazekii and the origin of mitochondria.

Authors:  S G Andersson; A Zomorodipour; J O Andersson; T Sicheritz-Pontén; U C Alsmark; R M Podowski; A K Näslund; A S Eriksson; H H Winkler; C G Kurland
Journal:  Nature       Date:  1998-11-12       Impact factor: 49.962

7.  A molecular timescale for eukaryote evolution with implications for the origin of red algal-derived plastids.

Authors:  Jürgen F H Strassert; Iker Irisarri; Tom A Williams; Fabien Burki
Journal:  Nat Commun       Date:  2021-03-25       Impact factor: 14.919

8.  Wholly Rickettsia! Reconstructed Metabolic Profile of the Quintessential Bacterial Parasite of Eukaryotic Cells.

Authors:  Timothy P Driscoll; Victoria I Verhoeve; Mark L Guillotte; Stephanie S Lehman; Sherri A Rennoll; Magda Beier-Sexton; M Sayeedur Rahman; Abdu F Azad; Joseph J Gillespie
Journal:  mBio       Date:  2017-09-26       Impact factor: 7.867

9.  Dating Alphaproteobacteria evolution with eukaryotic fossils.

Authors:  Sishuo Wang; Haiwei Luo
Journal:  Nat Commun       Date:  2021-06-03       Impact factor: 14.919

10.  An anomalous type IV secretion system in Rickettsia is evolutionarily conserved.

Authors:  Joseph J Gillespie; Nicole C Ammerman; Sheila M Dreher-Lesnick; M Sayeedur Rahman; Micah J Worley; Joao C Setubal; Bruno S Sobral; Abdu F Azad
Journal:  PLoS One       Date:  2009-03-12       Impact factor: 3.240

View more
  1 in total

1.  Origin of rickettsial host dependency unravelled.

Authors:  Victoria I Verhoeve; Joseph J Gillespie
Journal:  Nat Microbiol       Date:  2022-08       Impact factor: 30.964

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.