Phylogenomics and signature proteins for the alpha proteobacteria and its main groups.

Radhey S Gupta, Amy Mok.

Abstract

BACKGROUND: Alpha proteobacteria are one of the largest and most extensively studied groups within bacteria. However, for these bacteria as a whole and for all of its major subgroups (viz. Rhizobiales, Rhodobacterales, Rhodospirillales, Rickettsiales, Sphingomonadales and Caulobacterales), very few or no distinctive molecular or biochemical characteristics are known.
RESULTS: We have carried out comprehensive phylogenomic analyses by means of Blastp and PSI-Blast searches on the open reading frames in the genomes of several alpha-proteobacteria (viz. Bradyrhizobium japonicum, Brucella suis, Caulobacter crescentus, Gluconobacter oxydans, Mesorhizobium loti, Nitrobacter winogradskyi, Novosphingobium aromaticivorans, Rhodobacter sphaeroides 2.4.1, Silicibacter sp. TM1040, Rhodospirillum rubrum and Wolbachia (Drosophila) endosymbiont). These studies have identified several proteins that are distinctive characteristics of all alpha-proteobacteria, as well as numerous proteins that are unique repertoires of all of its main orders (viz. Rhizobiales, Rhodobacterales, Rhodospirillales, Rickettsiales, Sphingomonadales and Caulobacterales) and many families (viz. Rickettsiaceae, Anaplasmataceae, Rhodospirillaceae, Acetobacteraceae, Bradyrhizobiaceae, Brucellaceae and Bartonellaceae). Many other proteins that are present at different phylogenetic depths in alpha-proteobacteria provide important information regarding their evolution. The evolutionary relationships among alpha-proteobacteria as deduced from these studies are in excellent agreement with their branching pattern in the phylogenetic trees and character compatibility cliques based on concatenated sequences for many conserved proteins. These studies provide evidence that the major groups within alpha-proteobacteria have diverged in the following order: (Rickettsiales(Rhodospirillales (Sphingomonadales (Rhodobacterales (Caulobacterales-Parvularculales (Rhizobiales)))))). We also describe two conserved inserts in DNA Gyrase B and RNA polymerase beta subunit that are distinctive characteristics of the Sphingomonadales and Rhodospirillales species, respectively. The results presented here also provide support for the grouping of Hyphomonadaceae and Parvularcula species with the Caulobacterales and the placement of Stappia aggregata with the Rhizobiaceae group.
CONCLUSION: The alpha-proteobacteria-specific proteins and indels described here provide novel and powerful means for the taxonomic, biochemical and molecular biological studies on these bacteria. Their functional studies should prove helpful in identifying novel biochemical and physiological characteristics that are unique to these bacteria.

Year:  2007        PMID: 18045498      PMCID: PMC2241609          DOI: 10.1186/1471-2180-7-106

Source DB:  PubMed          Journal:  BMC Microbiol        ISSN: 1471-2180            Impact factor:   3.605


Background

The α-proteobacteria form one of the largest groups within bacteria, which includes numerous phototrophs, chemolithotrophs, chemoorganotrophs and aerobic photoheterotrophs [1]. They are abundant constituents of various terrestrial and marine environments [2]. Pelagibacter ubique, which is the smallest known free-living bacterium, is believed to be the most numerous bacterium on this planet (about 10^28 cells), comprising about 25% of all microbial cells in the oceans [2]. The intimate association that many α-proteobacteria exhibit with eukaryotic organisms is of central importance from agricultural and medical perspectives [3,4]. Symbiotic association of the Rhizobiaceae family members with plant root nodules is responsible for most of the atmospheric nitrogen fixation by plants [4-6]. Many other α-proteobacteria such as Rickettsiales, Brucella and Bartonella have adopted intracellular life styles and they constitute important human and animal pathogens [3,7-9]. Additionally, the α-proteobacteria have also played a seminal role in the origin of the eukaryotic cell [10,11]. In the current taxonomic scheme based on 16S rRNA, α-proteobacteria are recognized as a Class within the phylum Proteobacteria [1,12,13]. They are subdivided into 7 main subgroups or orders (viz. Caulobacterales, Rhizobiales, Rhodobacterales, Rhodospirillales, Rickettsiales, Sphingomonadales and Parvularculales) [12]. The α-proteobacteria and their main subgroups are presently distinguished from each other and from other bacteria primarily on the basis of their branching in phylogenetic trees [14,15]. However, we have previously described several conserved inserts and deletions (indels), as well as whole proteins, that are specific for these bacteria [16,17]. In the past 3–4 years, the number of α-proteobacterial genomes that have been sequenced has increased markedly. In addition to > 60 complete genomes (Table 1), sequence information is available for a large number of other species. 
These genomes cover all of the main groups within α-proteobacteria (Table 1) and provide an enormously valuable resource for identifying molecular characteristics that are unique to them. This information can be used to identify unique sets of genes or proteins that are distinctive characteristics of various higher taxonomic groups (e.g. families, orders, etc.) within α-proteobacteria. Such genes/proteins provide valuable tools for taxonomic, biochemical and molecular biological studies [17-26]. With this goal in mind, in the present work, we have performed comprehensive phylogenomic analyses of α-proteobacterial genomes to identify proteins/ORFs that are distinctive characteristics of the various higher taxonomic groups within α-proteobacteria. The phylogenomic distribution of these proteins is also compared with the branching patterns of these species in phylogenetic trees and in the character compatibility cliques to develop a reliable picture of α-proteobacterial evolution.
Table 1

Genome sizes, protein numbers and GC contents of sequenced alpha-proteobacteria

Species Name | Order | Family | Genome Size (Mb) | GC content (%) | Protein Number | Reference
Bartonella bacilliformis KC583 | Rhizobiales | Bartonellaceae | 1.4 | 38.2 | 1283 | TIGR
Bartonella henselae str. Houston-1 | Rhizobiales | Bartonellaceae | 1.93 | 38.2 | 1488 | [7]
Bartonella quintana str. Toulouse* | Rhizobiales | Bartonellaceae | 1.58 | 38.8 | 1142 | [7]
Bradyrhizobium japonicum USDA 110* | Rhizobiales | Bradyrhizobiaceae | 9.1 | 64.1 | 8317 | [5]
Bradyrhizobium japonicum BT Ail | Rhizobiales | Bradyrhizobiaceae | 8.53 | 64.8 | 7622 | DOE-JGI
Bradyrhizobium sp. ORS278 | Rhizobiales | Bradyrhizobiaceae | 7.5 | 65.5 | 6717 | Genoscope
Nitrobacter hamburgensis X14 | Rhizobiales | Bradyrhizobiaceae | 5.01 | 61.6 | 4326 | DOE-JGI
Nitrobacter winogradskyi Nb-255* | Rhizobiales | Bradyrhizobiaceae | 3.4 | 2.5 | 3122 | [42]
Rhodopseudomonas palustris BisA53 | Rhizobiales | Bradyrhizobiaceae | 5.51 | 64.4 | 4878 | DOE-JGI
Rhodopseudomonas palustris BisB18 | Rhizobiales | Bradyrhizobiaceae | 5.51 | 65.0 | 4886 | DOE-JGI
Rhodopseudomonas palustris BisB5 | Rhizobiales | Bradyrhizobiaceae | 4.89 | 64.8 | 4397 | DOE-JGI
Rhodopseudomonas palustris CGA009 | Rhizobiales | Bradyrhizobiaceae | 5.47 | 65.0 | 4820 | [41]
Rhodopseudomonas palustris HaA2 | Rhizobiales | Bradyrhizobiaceae | 5.33 | 66.0 | 4683 | DOE-JGI
Brucella abortus biovar 1 str. 9–941 | Rhizobiales | Brucellaceae | 3.29 | 57.2 | 3085 | [69]
Brucella melitensis 16M | Rhizobiales | Brucellaceae | 3.29 | 57.2 | 3198 | [38]
Brucella melitensis biovar Abortus 2308 | Rhizobiales | Brucellaceae | 3.28 | 57.2 | 3034 | [39]
Brucella suis 1330* | Rhizobiales | Brucellaceae | 3.32 | 57.3 | 2123 | [40]
Brucella ovis ATCC 25840 | Rhizobiales | Brucellaceae | 3.3 | 57.2 | 2892 | TIGR
Mesorhizobium loti MAFF303099* | Rhizobiales | Phyllobacteriaceae | 7.6 | 62.5 | 7372 | [6]
Mesorhizobium sp. BNC1 | Rhizobiales | Phyllobacteriaceae | 4.94 | 61.1 | 4543 | DOE-JGI
Agrobacterium tumefaciens str. C58 | Rhizobiales | Rhizobiaceae | 5.67 | 59.0 | 4661 | [37]
Rhizobium etli CFN 42 | Rhizobiales | Rhizobiaceae | 6.53 | 61.0 | 5963 | [35]
Rhizobium leguminosarum bv. viciae 3841 | Rhizobiales | Rhizobiaceae | 7.75 | 61.0 | 7263 | [81]
Sinorhizobium meliloti 1021 | Rhizobiales | Rhizobiaceae | 6.69 | 62.2 | 6205 | [36]
Caulobacter crescentus CB15* | Caulobacterales | Caulobacteraceae | 4.02 | 67.2 | 3737 | [51]
Hyphomonas neptunium ATCC 15444 | Caulobacterales+ | Hyphomonadaceae | 3.71 | 61.9 | 3505 | [46]
Maricaulis maris MCS10 | Caulobacterales+ | Hyphomonadaceae | 3.37 | 62.7 | 3063 | DOE-JGI
Jannaschia sp. CCS1 | Rhodobacterales | Rhodobacteraceae | 4.4 | 62.2 | 4212 | DOE-JGI
Paracoccus denitrificans PD1222 | Rhodobacterales | Rhodobacteraceae | 5.25 | 66.8 | 5077 | DOE-JGI
Rhodobacter sphaeroides 2.4.1* | Rhodobacterales | Rhodobacteraceae | 4.45 | 68.8 | 4242 | DOE-JGI
Rhodobacter sphaeroides ATCC 17025 | Rhodobacterales | Rhodobacteraceae | 4.54 | 68.2 | 4333 | DOE-JGI
Rhodobacter sphaeroides ATCC 17029 | Rhodobacterales | Rhodobacteraceae | 4.42 | 69.0 | 4132 | DOE-JGI
Roseobacter denitrificans OCh 114 | Rhodobacterales | Rhodobacteraceae | 4.3 | 58.9 | 4129 | [44]
Silicibacter pomeroyi DSS-3 | Rhodobacterales | Rhodobacteraceae | 4.6 | 64.1 | 4252 | [45]
Silicibacter sp. TM1040* | Rhodobacterales | Rhodobacteraceae | 4.15 | 60.1 | 3864 | DOE-JGI
Gluconobacter oxydans 621H* | Rhodospirillales | Acetobacteraceae | 2.92 | 60.8 | 2664 | [56]
Granulibacter bethesdensis CGDNIH1 | Rhodospirillales | Acetobacteraceae | 2.7 | 59.1 | 2437 | Rocky Mountain Lab
Acidiphilium cryptum JF-5 | Rhodospirillales | Acetobacteraceae | 3.97 | 67.1 | 3564 | DOE-JGI
Magnetospirillum magneticum AMB-1 | Rhodospirillales | Rhodospirillaceae | 4.97 | 65.1 | 4559 | [57]
Rhodospirillum rubrum ATCC 11170* | Rhodospirillales | Rhodospirillaceae | 4.41 | 65.4 | 3841 | DOE-JGI
Candidatus Pelagibacter ubique HTCC1062 | Rickettsiales | - | 1.3 | 29.7 | 1354 | [2]
Anaplasma marginale str. St. Maries | Rickettsiales | Anaplasmataceae | 1.2 | 49.8 | 949 | [70]
Anaplasma phagocytophilum HZ | Rickettsiales | Anaplasmataceae | 1.47 | 41.6 | 1264 | [71]
Ehrlichia canis str. Jake | Rickettsiales | Anaplasmataceae | 1.32 | 29.0 | 925 | DOE-JGI
Ehrlichia chaffeensis str. Arkansas | Rickettsiales | Anaplasmataceae | 1.18 | 30.1 | 1105 | [71]
Ehrlichia ruminantium str. Gardel | Rickettsiales | Anaplasmataceae | 1.5 | 27.5 | 950 | [72]
Ehrlichia ruminantium str. Welgevonden | Rickettsiales | Anaplasmataceae | 1.52 | 27.5 | 888 | [73]
Neorickettsia sennetsu str. Miyayama | Rickettsiales | Anaplasmataceae | 0.86 | 41.1 | 932 | [71]
Rickettsia bellii RML369-C | Rickettsiales | Rickettsiaceae | 1.52 | 31.6 | 1429 | [74]
Rickettsia conorii str. Malish 7 | Rickettsiales | Rickettsiaceae | 1.27 | 32.4 | 1374 | [75]
Rickettsia felis URRWXCal2 | Rickettsiales | Rickettsiaceae | 1.59 | 32.5 | 1512 | [76]
Rickettsia prowazekii str. Madrid E | Rickettsiales | Rickettsiaceae | 1.11 | 29.0 | 835 | [77]
Rickettsia typhi str. Wilmington | Rickettsiales | Rickettsiaceae | 1.11 | 28.9 | 838 | [78]
Wolbachia endosymbiont (Drosophila)* | Rickettsiales | Rickettsiaceae | 1.27 | 35.2 | 1195 | [79]
Wolbachia endosymbiont (Brugia malayi) | Rickettsiales | Rickettsiaceae | 1.08 | 34.2 | 805 | [80]
Erythrobacter litoralis HTCC2594 | Sphingomonadales | Erythrobacteraceae | 3.05 | 63.1 | 3011 | GBM Foundation
Novosphingobium aromaticivorans DSM 12444* | Sphingomonadales | Sphingomonadaceae | 3.56 | 65.2 | 3937 | DOE-JGI
Sphingopyxis alaskensis RB2256 | Sphingomonadales | Sphingomonadaceae | 3.37 | 65.5 | 3195 | DOE-JGI
Sphingomonas wittichii RW1 | Sphingomonadales | Sphingomonadaceae | 5.93 | 67.9 | 5345 | DOE-JGI
Zymomonas mobilis subsp. mobilis ZM4 | Sphingomonadales | Sphingomonadaceae | 2.06 | 46.3 | 1998 | [53]

* indicates species that were used in blast searches

+ Although these species are classified as Rhodobacterales, results presented here and elsewhere [29,47] suggest their placement in the Caulobacterales.

DOE-JGI, Department of Energy, Joint Genome Institute; GBM, The Gordon and Betty Moore Foundation Marine Microbiology Initiative; TIGR, The Institute for Genomic Research.

Results and discussion

Phylogeny of alpha proteobacteria

To compare and interpret the results of the phylogenomic analysis, it was first necessary to examine the evolutionary relationships among α-proteobacteria in phylogenetic trees. Phylogenetic analyses for α-proteobacteria were carried out based on concatenated sequences for 12 highly conserved proteins (see Methods). The relationships among these species were examined by both traditional phylogenetic methods (viz. neighbour-joining (NJ), maximum parsimony (MP) and maximum-likelihood (ML)) and by the character compatibility approach [27]. Figure 1 presents a neighbour-joining distance tree for α-proteobacteria, showing the bootstrap scores for various nodes using the NJ, MP and ML methods. In this tree, all of the main groups or orders within α-proteobacteria (viz. Caulobacterales, Rhizobiales, Rhodobacterales, Rhodospirillales and Sphingomonadales), except the Rickettsiales, formed well-resolved clades by different methods. For the Parvularculales, sequence information was available from a single species and it branched with the Caulobacterales. The two main families of the Rickettsiales, i.e. Rickettsiaceae and Anaplasmataceae, did not form a monophyletic clade, although they constituted the deepest branching lineages within α-proteobacteria. The relative branching of the different main groups within α-proteobacteria was resolved only by the NJ method and not by the other methods. The branching order of the different groups seen here is similar to that observed previously in the rRNA trees [1,15,16,28]. Recently, after this work was completed, Williams et al. [29] reported similar results based on phylogenetic analysis of a different large set of protein sequences.
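The concatenation step described above can be illustrated with a minimal Python sketch. It assumes each protein alignment is held as a dictionary mapping species name to aligned sequence. The function names, the gap-padding of species that lack a protein, and the toy sequences are all illustrative assumptions and are not taken from the Methods. The uncorrected p-distance shown is only one simple input that a neighbour-joining program could use; it is not necessarily the distance measure behind Figure 1.

```python
def concatenate_alignments(alignments, gap="-"):
    """Join per-protein alignments (dicts of species -> aligned sequence)
    into one concatenated alignment; species missing from a protein get gaps."""
    species = sorted({sp for aln in alignments for sp in aln})
    parts = {sp: [] for sp in species}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = lengths.pop()
        for sp in species:
            parts[sp].append(aln.get(sp, gap * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of differing residues over columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no comparable columns")
    return sum(a != b for a, b in pairs) / len(pairs)


# Toy data (invented sequences, two "proteins" instead of twelve).
toy = [
    {"Caulobacter": "MKTA", "Brucella": "MKSA", "Rickettsia": "MRSA"},
    {"Caulobacter": "GLV", "Brucella": "GLV"},
]
concat = concatenate_alignments(toy)
dist_cb = p_distance(concat["Caulobacter"], concat["Brucella"])
dist_cr = p_distance(concat["Caulobacter"], concat["Rickettsia"])
```

Padding with gaps keeps every species in the concatenated alignment even when one protein is unavailable, and the distance function then ignores the padded columns.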
Figure 1

A neighbour-joining distance tree based on concatenated sequences for 12 conserved proteins. The numbers on the nodes indicate bootstrap scores (out of 100) observed in the neighbour-joining (NJ), maximum parsimony (MP) and maximum-likelihood (ML) analyses (NJ/MP/ML). The species marked with * are presently not part of the Caulobacterales order, but the results of phylogenetic and phylogenomic studies presented here suggest their placement in this group.

The evolutionary relationships among α-proteobacteria were also examined using the character compatibility approach [27]. This method removes all homoplasic and fast-evolving characters from the dataset [27,30,31] and it has proven useful in obtaining correct topology in cases which have proven difficult to resolve by other means [31-33]. These analyses were carried out on a smaller set of 27 species containing all main groups of α-proteobacteria and two ε-proteobacteria. The concatenated sequence alignment for the 12 proteins contained 896 positions that were useful for these studies (i.e. those sites where only two amino acids were found with each present in at least two species). The compatibility analysis of these sites resulted in 12 largest cliques each containing 350 mutually compatible characters. The two main relationships observed in these cliques are shown in Fig. 2. The other cliques differed from those shown only in the relative branching positions of various Rhizobiaceae species, which varied from each other by a single character and are shown as unresolved in Fig. 2. In both these cliques, the species from all main orders within α-proteobacteria were clearly distinguished by multiple unique characters. Further, in contrast to the phylogenetic trees where Rickettsiaceae and Anaplasmataceae did not form a distinct clade (Fig. 1), their monophyletic grouping was strongly supported by 9 unique shared characters (Fig. 2). 
The Rickettsiales and Rhodospirillales formed the deepest branching lineages in these cliques and other groups within α-proteobacteria branched in the following order: (Sphingomonadales (Rhodobacterales (Caulobacterales (Rhizobiales)))). This branching order was supported by multiple unique characters at each node giving confidence in the results.
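The site-selection rule quoted above (two amino acids only, each present in at least two species) and the notion of pairwise compatibility can be sketched in Python as follows. This is a hedged illustration, not the software used for the analysis: the function names and the toy alignment are assumptions, gaps are not treated specially, and the pairwise test used here is the standard criterion for two-state characters (two sites conflict only when all four combinations of states occur). Finding the largest cliques of mutually compatible sites is a separate, harder step that is not shown.

```python
from collections import Counter


def two_state_sites(alignment):
    """Indices of columns with exactly two residues, each seen in >= 2 species."""
    seqs = list(alignment.values())
    sites = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        if len(counts) == 2 and min(counts.values()) >= 2:
            sites.append(i)
    return sites


def compatible(alignment, i, j):
    """Two two-state sites are compatible unless all four state pairs occur."""
    combos = {(seq[i], seq[j]) for seq in alignment.values()}
    return len(combos) < 4


# Toy alignment with invented sequences for five hypothetical species.
toy_alignment = {
    "sp_a": "AKGLM",
    "sp_b": "AKGIM",
    "sp_c": "SKGLM",
    "sp_d": "SRGIM",
    "sp_e": "SRTLM",
}
informative = two_state_sites(toy_alignment)
pair_01 = compatible(toy_alignment, 0, 1)
pair_03 = compatible(toy_alignment, 0, 3)
```

In the toy data, column 2 is rejected because one of its two residues occurs in a single species, and column 4 is rejected because it is invariant.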
Figure 2

Character compatibility cliques showing the two largest cliques of mutually compatible characters based on the two-state sites in the concatenated sequence alignment for 12 conserved proteins. The cliques consisted of 350 mutually compatible characters. The numbers of characters that distinguished different clades are indicated on the nodes. Rooting was done using the sequences for Helicobacter pylori and Campylobacter jejuni. *, as in Figure 1.

The two main cliques obtained in these analyses differed from each other in the branching position of the species belonging to the order Rhodospirillales. In one clique (Fig. 2A), the Rhodospirillales branched with the Rickettsiales, whereas in the other this group of species branched after the Rickettsiales and formed an outgroup of the other α-proteobacteria (Fig. 2B). However, only a single character supported the former relationship, indicating that it was not reliable. Although the two families within the Rhodospirillales order (Rhodospirillaceae and Acetobacteraceae) branched close to each other, no unique character common to them was identified, indicating that they are highly divergent. The exact branching position of Xanthobacter was also not resolved in these cliques. In some cliques, it appeared as an outgroup of the Bradyrhizobiaceae (as shown by the dotted lines in Fig. 2), whereas in others it was placed between the Rhizobiaceae and Bradyrhizobiaceae families (as seen for Aurantimonas).

Phylogenomic analyses of alpha proteobacteria

Table 1 lists some characteristics of various α-proteobacterial genomes that have been sequenced. The genomes vary in size from less than 1 Mb for Neorickettsia sennetsu to more than 9.0 Mb for Bradyrhizobium japonicum. To identify proteins that are distinguishing features of various higher taxonomic groups within α-proteobacteria, systematic Blastp searches were performed on each ORF in the genomes of B. japonicum USDA 110, Brucella suis 1330, Caulobacter crescentus CB15, Gluconobacter oxydans 621H, Mesorhizobium loti MAFF303099, Nitrobacter winogradskyi Nb-255, Novosphingobium aromaticivorans DSM 12444, Rhodobacter sphaeroides 2.4.1, Silicibacter sp. TM1040, Rhodospirillum rubrum ATCC 11170 and Wolbachia (Drosophila melanogaster) endosymbiont (see Methods section). The genomes chosen covered all main taxonomic groups within the sequenced α-proteobacteria. These analyses have identified large numbers of proteins that are uniquely found in particular groups of α-proteobacteria. A brief description of these results is given below.
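The kind of decision applied to each ORF after a Blastp search can be sketched as below. This is only an illustrative sketch: the hit format (species name plus E-value), the function name and the E-value cutoff of 1e-3 are assumptions made for the example, and the criteria actually used are those given in the Methods section. The species names are real but the E-values are invented toy numbers.

```python
def is_group_specific(hits, ingroup, e_cutoff=1e-3):
    """True if at least one hit passes e_cutoff (a hypothetical threshold)
    and every hit that passes it comes from a species in `ingroup`."""
    significant = [species for species, e_value in hits if e_value <= e_cutoff]
    return bool(significant) and all(sp in ingroup for sp in significant)


# Hypothetical ingroup and invented hit lists for two query ORFs.
alpha_species = {"Caulobacter crescentus", "Brucella suis", "Rhodospirillum rubrum"}

toy_hits_specific = [
    ("Caulobacter crescentus", 1e-80),
    ("Brucella suis", 3e-40),
    ("Escherichia coli", 2.5),      # weak hit, above the cutoff, so ignored
]
toy_hits_shared = toy_hits_specific + [("Bacillus subtilis", 1e-20)]

specific = is_group_specific(toy_hits_specific, alpha_species)
shared = is_group_specific(toy_hits_shared, alpha_species)
```

A query with no significant hits at all returns False, so an ORF is never called group-specific merely because the search found nothing.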

Proteins that are distinguishing features of all (or most) alpha proteobacteria

We previously described 6 proteins (viz. CC1365, CC1725, CC1887, CC2102, CC3292 and CC3319) that appeared to be distinctive characteristics of α-proteobacteria [17]. In that earlier work, the α-proteobacterial specificity of these proteins was assessed only by means of Blastp searches and was not confirmed by PSI-Blast, as in the present work (see Methods). Further, since the earlier work, the number of sequenced α-proteobacterial and other genomes has more than doubled. Hence, it was important to confirm the α-proteobacterial specificity of these proteins. Our results reveal that four of these proteins, viz. CC1365, CC2102, CC3292 and CC3319, are indeed specific for the α-proteobacteria as a whole, whereas for the remaining two proteins, homologs showing significant similarities are also found in other bacteria. Of these four proteins, CC3292 is present in all sequenced α-proteobacteria including Candidatus Pelagibacter ubique, which is the smallest known free-living bacterium [2]. The protein CC2102 is only missing in P. ubique, while the other two proteins are only missing in 1–2 rickettsiae species (CC1365) and P. ubique (CC3319). These proteins, which are uniquely present in virtually all α-proteobacteria, provide distinguishing markers for this Class of bacteria (Fig. 3).
Figure 3

Summary of the phylogenomic analyses showing the species distribution of various α-proteobacteria-specific proteins and the suggested evolutionary stages where the genes for these proteins have likely evolved. The gene IDs for some proteins described in earlier work are indicated [17]. The information for all other proteins can be found in the indicated Tables or Additional files. A large number of conserved indels that are specific for different groups or clades within α-proteobacteria shown here have also been identified in our earlier work [16] (not shown here). The branching order of α-proteobacteria relative to other bacteria has been established in earlier work [32,58].

In earlier work, 9 proteins were identified that were present in nearly all α-proteobacteria, except the Rickettsiales [17]. These latter species are all intracellular bacteria that have lost many genes that are not required under these conditions [3,34]. The Blastp and PSI-Blast reexamination of these proteins has confirmed that 7 of them (CC0100, CC0520, CC1211, CC1886, CC2245, CC3010 and CC3470) exhibit the indicated specificity. Except for CC0520 and CC3470, the other five proteins are also found in P. ubique, providing evidence for its placement within α-proteobacteria [2]. These results also provide evidence that P. ubique is not specifically related to the Rickettsiales. The phylogenomic distributions of these genes/proteins can be explained either by their evolution after the divergence of the Rickettsiales (Fig. 3) or by gene loss from this lineage.
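The reasoning used to place a protein on the tree can be illustrated with a small Python sketch. It assumes the ladder-like branching order supported by the compatibility analysis (Rickettsiales first, then Rhodospirillales, Sphingomonadales, Rhodobacterales, Caulobacterales and Rhizobiales) and assumes no lateral gene transfer; the function name and the toy distributions are hypothetical. The sketch returns the deepest-branching order in which a protein is found, together with the later-branching orders that lack it and would therefore require gene loss.

```python
# Branching order assumed for the illustration, earliest-diverging first.
LADDER = [
    "Rickettsiales",
    "Rhodospirillales",
    "Sphingomonadales",
    "Rhodobacterales",
    "Caulobacterales",
    "Rhizobiales",
]


def infer_origin(present_in, ladder=LADDER):
    """Return (origin, putative_losses) for a protein's order-level distribution.

    origin: the earliest-branching order that contains the protein.
    putative_losses: later-branching orders in which the protein is absent.
    """
    if not present_in:
        raise ValueError("protein must be present in at least one order")
    first = min(ladder.index(order) for order in present_in)
    losses = [order for order in ladder[first:] if order not in present_in]
    return ladder[first], losses


# Toy distribution: found everywhere except the Rickettsiales.
all_but_rickettsiales = set(LADDER) - {"Rickettsiales"}
origin_a, losses_a = infer_origin(all_but_rickettsiales)

# Toy distribution with gaps that would imply losses under this model.
origin_b, losses_b = infer_origin({"Rhodospirillales", "Rhizobiales"})
```

For the first toy distribution the sketch reports an origin after the divergence of the Rickettsiales with no losses; presence/absence data alone cannot rule out the alternative of an earlier origin followed by loss from the Rickettsiales.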

Proteins that are distinguishing features of the Rhizobiales species

The Rhizobiales species comprise more than one-third of the sequenced α-proteobacterial genomes (see Table 1). This order includes a wide assortment of species, many of which interact with eukaryotic organisms to produce diverse effects. This group includes various rhizobia (Rhizobium, Mesorhizobium, Sinorhizobium) and Bradyrhizobium species that induce root nodules in plants and live symbiotically within them to enable nitrogen fixation [4-6,35,36]. Another Rhizobiaceae species, A. tumefaciens, induces crown gall disease (tumors) in plants [37]. Bartonella and Brucella species are intracellular pathogens responsible for a number of diseases in humans and animals including trench fever and brucellosis [7,38-40]. Other members of this order exhibit enormous versatility in terms of their metabolic capabilities and life styles [1,41,42]. Earlier studies identified six proteins that appeared to be distinctive characteristics of the Rhizobiales species [17]. Reexamination of these proteins by the more stringent criteria used in the present work confirmed that three of these proteins (BQ00720, BQ07670 and BQ12030) are indeed specific for the Rhizobiales and they are present in virtually all sequenced species from this large order. In addition to the Rhizobiales, these proteins are also present in Stappia aggregata, which is presently grouped with the Rhodobacterales [12], but was originally known as a strain of Agrobacterium [43]. New blast searches on the genomes of B. japonicum, B. suis, M. loti and N. winogradskyi have identified a large number of other proteins that are specific for different subgroups within the Rhizobiales order. Seven proteins listed in Table 2A are uniquely present in most of the sequenced species belonging to the Rhizobiaceae (Rhizobium, Sinorhizobium, Agrobacterium), Brucellaceae, Phyllobacteriaceae (Mesorhizobium) and Aurantimonadaceae families. The absence of many of these proteins in Bartonella species is probably due to gene loss [7,34]. 
These groups form a well-defined clade with high bootstrap scores in the NJ, MP and ML trees (see Fig. 1) and the genes for them likely evolved in a common ancestor of these genera/families (Fig. 3). Another 9 proteins (Table 2B) are uniquely present in most of the above species except Aurantimonas, which forms an outgroup of the Rhizobiaceae, Brucellaceae, Bartonellaceae and Phyllobacteriaceae families (Figs. 1 and 2) [29]. Thus, the genes for these proteins likely evolved after the branching of Aurantimonadaceae (Fig. 3). For six of the proteins in Tables 2A and 2B (marked with 4), homologs with low E values are also found in S. aggregata, providing further evidence for its relatedness to the Rhizobiaceae family. Nine other proteins in Table 2C are found in most Rhizobiaceae species and M. loti, but they are missing in Bartonella and Brucella species. Although these proteins suggest a closer relationship between the Rhizobiaceae and Phyllobacteriaceae families, it is more likely that their genes have been lost from the Bartonella and Brucella species [34], which are intracellular bacteria. Additionally, some proteins were only found in either Mesorhizobium and Rhizobium, or Mesorhizobium and Sinorhizobium (Table 2D). These analyses have also identified 43 proteins that are uniquely present in all four sequenced Brucella species and many other proteins that are present in either three or two of the sequenced Brucella species (see Additional file 1).
Table 2

Proteins specific for the Rhizobiaceae and related species

Gene ID | Accession Number | Function | Gene ID | Accession Number | Function

A. Proteins unique to Aurantimonas, Mesorhizobium, Sinorhizobium, Rhizobium, Agrobacterium, Bartonella and Brucella

mll0062^1 | NP_101943 | Hypothetical | mlr0789^1 | NP_102519 | Hypothetical
mll4068^1 | NP_105027 | Hypothetical | mlr3016 | NP_104217 | Omp10
mll7791^1,4 | NP_108034 | Hypothetical | msl6526^1 | NP_107016 | Hypothetical
mlr0777^1 | NP_102510 | Hypothetical

B. Proteins unique to Mesorhizobium, Sinorhizobium, Rhizobium, Agrobacterium, Bartonella and Brucella (Missing in Aurantimonas)

mll0122^1,4 | NP_101988 | Hypothetical | mll5001^1,4 | NP_105743 | Hypothetical
mll1268^2 | NP_102895 | Hypothetical | mll8359^1,4 | NP_108472 | Hypothetical
mll2847^2,3 | NP_104087 | Hypothetical | mlr1823^1 | NP_103319 | Hypothetical
mll2898^4 | NP_104130 | Hypothetical | mlr0094^1,4 | NP_101965 | MhpC (COG0596)
mll4298^1,2 | NP_105201 | Hypothetical

C. Proteins unique to Mesorhizobium and Rhizobiaceae species

mll0080 | NP_101954 | Hypothetical | mll6703^4 | NP_107159 | Hypothetical
mll0867 | NP_102577 | Hypothetical | mlr1904 | NP_103376 | Hypothetical
mll9619 | NP_109472 | Hypothetical | mlr3274 | NP_104418 | Hypothetical
mlr5174^4 | NP_105883 | Hypothetical | mlr4951 | NP_105704 | NodF
mll6303 | NP_106835 | Hypothetical

D. Proteins specific to Mesorhizobium and either Rhizobium or Sinorhizobium

mll0459 | NP_102252 | Hypothetical | mll2007 | NP_103455 | Hypothetical
mll1779 | NP_103286 | Hypothetical | mlr1999 | NP_103450 | Hypothetical
mll6195 | NP_106741 | Hypothetical | mlr2029 | NP_103476 | Hypothetical
mll8758 | NP_106740 | Hypothetical | mlr6601 | NP_107075 | Hypothetical
mlr3037 | NP_104236 | Transcriptional regulator

1 Missing in Bartonella

2 Missing in Agrobacterium

3 Missing in Rhizobium.

4 Also found in Stappia aggregata

The analyses of proteins in the genomes of B. japonicum and N. winogradskyi have identified 12 proteins that are uniquely present in all (or most) of the sequenced Bradyrhizobiaceae species as well as X. autotrophicus (Table 3A). The species from these two families form a strongly supported clade in the phylogenetic tree (Fig. 1) [29]. Sixty-two additional proteins in Table 3B are uniquely present in various species belonging to the Bradyrhizobiaceae family (i.e. Bradyrhizobium, Nitrobacter, Rhodopseudomonas). Many other proteins (see Additional file 2) are only found in two of the three Bradyrhizobiaceae genera and their distributions can result from gene losses, lateral gene transfers (LGTs), or other mechanisms.
Table 3

Proteins that are specific for the Bradyrhizobiaceae group

Gene ID | Accession Number | Function | Gene ID | Accession Number | Function

A. Proteins Unique to Bradyrhizobiaceae Family and Xanthobacter

bll6014^1,2 | NP_772654 | Putative general secretion pathway protein M | Nwi_2179 | YP_318785 | Hypothetical
Nwi_1093 | YP_317707 | Hypothetical | Nwi_2432^1 | YP_319038 | Hypothetical
Nwi_1227 | YP_317841 | Hypothetical | Nwi_2476^1 | YP_319081 | Putative bacterioferritin
Nwi_1786^1 | YP_318399 | Hypothetical | Nwi_2572 | YP_319177 | Hypothetical
Nwi_1788 | YP_318401 | Hypothetical | Nwi_2623 | YP_319228 | Hypothetical
Nwi_2147^1,3 | YP_318753 | Hypothetical | Nwi_2707 | YP_319312 | Hypothetical

B. Proteins Unique to Bradyrhizobiaceae Family

bll5899^2 | NP_772539 | Hypothetical | Nwi_2021 | YP_318632 | Hypothetical
blr6106^1,2 | NP_772746 | Hypothetical | Nwi_2063 | YP_318673 | Hypothetical
Nwi_0278 | YP_316897 | Hypothetical | Nwi_2064 | YP_318674 | Hypothetical
Nwi_0503 | YP_317122 | Hypothetical | Nwi_2163 | YP_318769 | Hypothetical
Nwi_0528 | YP_317147 | Hypothetical | Nwi_2173 | YP_318779 | Hypothetical
Nwi_0605^1 | YP_317224 | Hypothetical | Nwi_2183^1 | YP_318789 | Hypothetical
Nwi_0710^1,d | YP_317328 | Hypothetical | Nwi_2208^3 | YP_318814 | Hypothetical
Nwi_0925 | YP_317539 | Hypothetical | Nwi_2244 | YP_318850 | Hypothetical
Nwi_0966^1,d | YP_317580 | Hypothetical | Nwi_2247 | YP_318853 | Hypothetical
Nwi_1084 | YP_317698 | Hypothetical | Nwi_2379 | YP_318985 | Hypothetical
Nwi_1092 | YP_317706 | Hypothetical | Nwi_2381^2 | YP_318987 | Hypothetical
Nwi_1107^3 | YP_317721 | Hypothetical | Nwi_2414^3 | YP_319020 | Hypothetical
Nwi_1108 | YP_317722 | Hypothetical | Nwi_2489 | YP_319094 | Hypothetical
Nwi_1336 | YP_317949 | Hypothetical | Nwi_2492^1,3 | YP_319097 | Hypothetical
Nwi_1139 | YP_317753 | Hypothetical | Nwi_2500 | YP_319105 | Hypothetical
Nwi_1247^3 | YP_317861 | Hypothetical | Nwi_2506^3 | YP_319111 | Hypothetical
Nwi_1270 | YP_317883 | Hypothetical | Nwi_2509^3 | YP_319114 | Hypothetical
Nwi_1275^3 | YP_317888 | Hypothetical | Nwi_2531 | YP_319136 | Hypothetical
Nwi_1454^1 | YP_318067 | Hypothetical | Nwi_2575 | YP_319180 | Hypothetical
Nwi_1498 | YP_318111 | Hypothetical | Nwi_2577 | YP_319182 | Hypothetical
Nwi_1512 | YP_318125 | Hypothetical | Nwi_2588^1,3 | YP_319193 | Hypothetical
Nwi_1581^3 | YP_318194 | Hypothetical | Nwi_2630 | YP_319235 | Hypothetical
Nwi_1582 | YP_318195 | Hypothetical | Nwi_2676 | YP_319281 | Hypothetical
Nwi_1586 | YP_318199 | Dihydrofolate reductase | Nwi_2677^3 | YP_319282 | Hypothetical
Nwi_1649^1,3 | YP_318262 | Hypothetical | Nwi_2769 | YP_319374 | Hypothetical
Nwi_1674 | YP_318287 | Hypothetical | Nwi_2789^1 | YP_319394 | Hypothetical
Nwi_1705 | YP_318318 | Hypothetical | Nwi_2984^3 | YP_319586 | Hypothetical
Nwi_1711 | YP_318324 | Hypothetical | Nwi_2959 | YP_319561 | Hypothetical
Nwi_1785 | YP_318398 | Hypothetical | Nwi_3035 | YP_319637 | Hypothetical
Nwi_1793 | YP_318406 | Hypothetical | Nwi_3140 | YP_319739 | Hypothetical
Nwi_1800^1,2 | YP_318413 | Hypothetical | Nwi_3141^1 | YP_319740 | Hypothetical

1 Missing in one or more strains of Rhodopseudomonas

2 Missing in one or more species of Nitrobacter

3 Missing in Bradyrhizobium japonicum USDA 110

Proteins that are distinguishing features of the Rhodobacterales species

The order Rhodobacterales is a heterogeneous lineage of bacteria that exhibit much diversity in terms of their metabolism and cell division cycles [1,13]. This group includes many photosynthetic bacteria that are capable of CO2 as well as nitrogen fixation and also many chemoorganotrophs that can metabolize various sulfur-containing compounds [44,45]. A number of budding, stalk-forming and prosthecate bacteria also belong to this group [46]. In addition to many completely sequenced genomes (see Table 1), information for several other species belonging to this order (e.g. Sulfitobacter, Oceanicola, Loktanella, Jannaschia, Dinoroseobacter, Roseovarius and Sagittula) is available in the NCBI database. To identify proteins that are specific for the Rhodobacterales, phylogenomic analyses were carried out on various ORFs in the genomes of R. sphaeroides 2.4.1 and Silicibacter sp. TM1040. These studies have identified 29 proteins that are present in all available Rhodobacterales species (Table 4A), but these proteins as well as those listed in Tables 4B and 4C are not found in H. neptunium, Oceanicaulis alexandrii, Maricaulis maris or Stappia aggregata. These latter species are presently grouped with the Rhodobacterales [12]; however, the absence of various Rhodobacterales-specific proteins in them, together with phylogenetic studies, indicates that the placement of these species within this order is incorrect and needs to be revised. In phylogenetic trees based on concatenated sequences for many proteins, O. alexandrii and M. maris consistently branched with Caulobacter rather than with the well-defined clade of Rhodobacterales (Figs. 1 and 2) [29]. The studies by Badger et al. [47] also provide strong evidence for the grouping of H. neptunium with the Caulobacterales.
Table 4

Proteins that are specific for the Rhodobacterales

Gene ID | Accession Number | Function | Gene ID | Accession Number | Function
(Markers of the form ^n after a gene ID refer to the numbered footnotes below the table.)

A. Proteins specific for Rhodobacterales (Oceanicola, Loktanella, Paracoccus, Roseovarius, Roseobacter, Jannaschia, Silicibacter, Sulfitobacter, Dinoroseobacter, Sagittula)

TM1040_0093^1 | YP_612088 | Hypothetical | TM1040_1842 | YP_613837 | Phasin, PhaP
TM1040_0184^2 | YP_612179 | Hypothetical | TM1040_1967^3 | YP_613961 | Putative CheA signal transduction
TM1040_0236^1,2 | YP_612231 | Hypothetical | TM1040_1988 | YP_613982 | Hypothetical
TM1040_0471 | YP_612466 | Putative rod shape-determining protein MreD | TM1040_2263^2 | YP_614257 | Hypothetical
TM1040_0586^2 | YP_612581 | Hypothetical | TM1040_2370 | YP_614364 | Hypothetical
TM1040_0587^2 | YP_612582 | Hypothetical | TM1040_2425^3 | YP_614419 | Hypothetical
TM1040_0697 | YP_612692 | Hypothetical | TM1040_2466 | YP_614460 | GCN5-related N-acetyltransferase COG045
TM1040_0750^2,4 | YP_612745 | Hypothetical | TM1040_2487^2 | YP_614481 | Hypothetical
TM1040_0752^1 | YP_612747 | Lipoprotein, putative | TM1040_2582^3 | YP_614576 | Hypothetical
TM1040_1063 | YP_613058 | Gene transfer agent | TM1040_2999 | YP_614993 | Hypothetical
TM1040_1064 | YP_613059 | Gene transfer agent | TM1040_3077^3 | YP_611313 | Hypothetical
TM1040_1247^3 | YP_613242 | Hypothetical | TM1040_3749 | YP_611978 | Hypothetical
TM1040_1350^2 | YP_613345 | Hypothetical | TM1040_3759 | YP_611988 | Lipoprotein, putative
TM1040_1406 | YP_613401 | Outer membrane chaperone Skp (OmpH) | TM1040_3764^3 | YP_611993 | Putative transmembrane protein
TM1040_1567 | YP_613562 | Hypothetical | | |

B. Proteins unique to various Rhodobacterales but missing in Rhodobacter and Paracoccus

TM1040_1558^1 | YP_613553 | Hypothetical | TM1040_2157^4 | YP_614151 | Hypothetical
TM1040_1735^1 | YP_613730 | Hypothetical | TM1040_2443^1,4 | YP_614437 | Lipolytic enzyme, G-D-S-L
TM1040_1844^1,5,6 | YP_613839 | Hypothetical | TM1040_2680^7 | YP_614674 | Hypothetical

C. Proteins unique to Silicibacter and Roseobacter

TM1040_1099^8 | YP_613094 | Hypothetical | TM1040_3189 | YP_611425 |
TM1040_1423^8 | YP_613418 | Hypothetical | TM1040_3202^8 | YP_611438 |
TM1040_1451 | YP_613446 | Hypothetical | TM1040_3208^8 | YP_611444 |
TM1040_1986^8 | YP_613980 | Hypothetical | TM1040_3226^8 | YP_611462 |
TM1040_2106 | YP_614100 | Hypothetical | TM1040_3529^8 | YP_611763 |
TM1040_2139^8 | YP_614133 | Hypothetical | TM1040_3626^8 | YP_611855 |
TM1040_3075^8 | YP_611311 | Hypothetical | | |

1 Missing in Loktanella vestfoldensis SKA53

2 Missing in one or more Rhodobacter sphaeroides strains

3 Missing in Paracoccus denitrificans PD1222

4 Missing in Rhodobacterales bacterium HTCC2654

5 Missing in one or more species of Roseovarius

6 Missing in Oceanicola batsensis HTCC2597

7 Missing in one or more species of Roseobacter

8 Missing in Silicibacter pomeroyi DSS-3

Six additional proteins in Table 4B are present in most of the Rhodobacterales species, but they are missing in R. sphaeroides and P. denitrificans, which form a distinct clade that appears as the outgroup of the other Rhodobacterales species (Fig. 1) [29]. Thus, the genes for these proteins have likely evolved after the branching of these two genera. Thirteen additional proteins, which are found only in the Silicibacter and Roseobacter genera (Table 4C), support a close relationship between them, as seen in phylogenetic trees (Fig. 1). Of the proteins that are specific for Rhodobacterales, two (YP_613058 and YP_613059 in Table 4A, corresponding to proteins ABK27256 and ABK27255 in the R. capsulatus genome) were previously identified as part of a complex referred to as the gene transfer agent [48], based on similarity to certain virus-like elements. Another protein in the same category (viz. ABK27253) is specific for the Rhodobacter genus. The significance of these results is presently unclear.

Proteins that are distinctive characteristics of the Caulobacterales

The order Caulobacterales is comprised of a single family with only four genera [12,49]. These chemoorganotrophic bacteria are commonly found in marine aerobic environments and they are distinguished by their ability to form stalked cells and by their unusual cell division cycle [49,50]. Only the complete genome of C. crescentus, which is the best-studied organism from this group, is presently available [51]. However, as discussed above, a number of other species that are presently classified as Rhodobacterales (viz. O. alexandrii, M. maris and H. neptunium), and also Parvularcula bermudensis, consistently branch with C. crescentus in different phylogenetic trees (Figs. 1 and 2) [29,47]. Thus, phylogenomic analysis of the ORFs from the C. crescentus genome was of much interest. These analyses have identified 2 proteins (CC0486 and CC2480) that are uniquely present in this species as well as in O. alexandrii, M. maris, H. neptunium and P. bermudensis (Table 5). One additional protein, CC2764, is present in all of these species except H. neptunium. The remaining eight proteins in Table 5 are found only in C. crescentus, O. alexandrii and M. maris, indicating that the latter two species are more closely related to C. crescentus than either H. neptunium or P. bermudensis is. These results are strongly supported by the branching patterns of these species in phylogenetic trees (Figs. 1 and 2) [29,47]. Previously, Badger et al. [47] reported the identification of 62 proteins that were found only in C. crescentus and H. neptunium. However, the blast threshold used in that study to infer the absence of these proteins in other species was very stringent (i.e. an E value of 1e-10). By the criteria used in the present work (see Methods), none of these 62 proteins was found to be unique to these two species.
Table 5

Proteins specific for the Caulobacter and related species

Proteins Unique to Caulobacter, Oceanicaulis and Maricaulis (Some also found in Hyphomonas and Parvularcula)
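Gene ID | Accession Number | Function | Gene ID | Accession Number | Function
(Markers of the form ^n after a gene ID refer to the numbered footnotes below the table.)

CC0486^1 | NP_419305 | Hypothetical | CC1066 | NP_419882 | Hypothetical
CC2480^1 | NP_421283 | Hypothetical | CC1586 | NP_420397 | Hypothetical
CC2764^2 | NP_421560 | Hypothetical | CC2207 | NP_421010 | Hypothetical
CC3101 | NP_421895 | Hypothetical | CC2628 | NP_421428 | hfaA protein
CC0512 | NP_419331 | Hypothetical | CC2639 | NP_421438 | Hypothetical
CC1064 | NP_419880 | Hypothetical | | |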

All of the proteins in this Table are present in C. crescentus as well as in O. alexandrii and M. maris.

1 These proteins are also present in H. neptunium and P. bermudensis.

2 Also found in P. bermudensis.
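The species distribution described for Table 5 can be tallied mechanically. The following is a minimal Python sketch, not the procedure used in this work: the presence map is transcribed by hand from Table 5 and its footnotes, and the names `PRESENCE`, `CORE`, `ALL_FIVE` and `group_by_profile` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict

# Species in which every Table 5 protein is present.
CORE = frozenset({"C. crescentus", "O. alexandrii", "M. maris"})
# Footnote 1: also present in H. neptunium and P. bermudensis.
ALL_FIVE = CORE | {"H. neptunium", "P. bermudensis"}

# Presence map transcribed from Table 5 (illustrative data structure).
PRESENCE = {
    "CC0486": ALL_FIVE,
    "CC2480": ALL_FIVE,
    "CC2764": CORE | {"P. bermudensis"},  # footnote 2
    "CC3101": CORE, "CC0512": CORE, "CC1064": CORE, "CC1066": CORE,
    "CC1586": CORE, "CC2207": CORE, "CC2628": CORE, "CC2639": CORE,
}


def group_by_profile(presence):
    """Group protein IDs by the exact set of species that carry them."""
    groups = defaultdict(list)
    for protein, species in presence.items():
        groups[frozenset(species)].append(protein)
    return dict(groups)


if __name__ == "__main__":
    # Print the widest distribution first.
    ordered = sorted(group_by_profile(PRESENCE).items(), key=lambda kv: -len(kv[0]))
    for species, proteins in ordered:
        print(len(species), "species:", ", ".join(sorted(proteins)))
```

Grouping by exact species set makes the nested pattern explicit: two proteins shared by all five species, one shared by four, and eight restricted to C. crescentus, O. alexandrii and M. maris.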


Proteins that are distinguishing characteristics of the Sphingomonadales

The species belonging to the order Sphingomonadales are present in both terrestrial and aquatic environments [52]. A distinguishing characteristic of many species from this group is the presence of glycosphingolipids, rather than lipopolysaccharides, in their cell envelope [52,53]. Several species from this group (e.g. N. aromaticivorans) can degrade a wide variety of aromatic hydrocarbons [52], whereas others, such as Zymomonas mobilis, can ferment sugar to ethanol highly efficiently [53], making them of much interest and importance from a biotechnological standpoint. This group also includes phototrophic organisms (e.g. Erythrobacter litoralis), which contain bacteriochlorophyll a and can derive a significant fraction of their metabolic energy via anaerobic photosynthesis [54]. The complete genomes of 5 species from this order are now available (see Table 1). In addition, large numbers of sequences for Sphingomonas sp. SKA58 are also available in the NCBI database. Blast searches on the ORFs in the genome of N. aromaticivorans have identified 16 proteins (Table 6A) that are uniquely present in all 6 of the Sphingomonadales species for which information is available. Thirteen additional proteins (Table 6B) are present in all other Sphingomonadales species except Z. mobilis, which is the deepest branching species within this group (Fig. 1). The genes for these proteins likely evolved after the branching of Z. mobilis. Many other proteins are present in only 3 or 4 of these species (viz. N. aromaticivorans, E. litoralis, S. alaskensis, S. wittichii and Sphingomonas sp. SKA58) (see Additional file 3), and their phylogenomic distribution can result from a variety of mechanisms including shared ancestry, gene loss and LGTs among these species.
Table 6

Proteins that are specific for the Sphingomonadales group of species

Gene ID | Accession Number | Function | Gene ID | Accession Number | Function
(Markers of the form ^n after a gene ID refer to the numbered footnotes below the table.)

A. Proteins Unique to Sphingomonadales Order (Novosphingobium, Erythrobacter, Sphingomonas, Sphingopyxis and Zymomonas)

Saro_0018 | YP_495301 | Hypothetical | Saro_1291 | YP_496569 | Hypothetical
Saro_0052^1 | YP_495335 | Hypothetical | Saro_1378 | YP_496656 | Hypothetical
Saro_0087 | YP_495370 | Hypothetical | Saro_1914^1 | YP_497188 | Hypothetical
Saro_0150 | YP_495433 | Hypothetical | Saro_2130 | YP_497403 | Hypothetical
Saro_0232 | YP_495514 | Hypothetical | Saro_2788 | YP_498058 | Hypothetical
Saro_0409 | YP_495691 | Hypothetical | Saro_2958 | YP_498227 | Hypothetical
Saro_1088 | YP_496367 | Hypothetical | Saro_3138 | YP_498407 | Hypothetical
Saro_1144 | YP_496423 | Hypothetical | Saro_3213 | YP_498482 | Hypothetical

B. Proteins Unique to Sphingomonadales but missing in Zymomonas

Saro_0044 | YP_495327 | Hypothetical | Saro_1748^2 | YP_497022 |
Saro_0154 | YP_495437 | Hypothetical | Saro_1785 | YP_497059 |
Saro_0415^3 | YP_495697 | Hypothetical | Saro_1972 | YP_497246 |
Saro_0458 | YP_495740 | Hypothetical | Saro_2036 | YP_497309 |
Saro_1078 | YP_496357 | Hypothetical | Saro_2037 | YP_497310 |
Saro_1126 | YP_496405 | Hypothetical | Saro_2333 | YP_497604 |
Saro_1160 | YP_496439 | Hypothetical | Saro_2548 | YP_497818 |
Saro_1163 | YP_496442 | Hypothetical | | |

1 Missing in Sphingomonas wittichii

2 Blast scores for C. crescentus and H. neptunium are also significant

3 Significant blast scores for C. crescentus

We have also identified a 4 aa insert in a highly conserved region of Gyrase B that is mainly specific for the species from this order (Fig. 4). This indel is present in all available Sphingomonadales species, but it is not found in most other alpha proteobacteria or in other bacteria (results not shown). Besides the Sphingomonadales, an indel of similar size is also present in three other species (viz. C. leidyia, R. blasticus and Pseudorhodobacter incheonensis). Because other Rhodobacterales or Caulobacter species lack this insert, its presence in these three species could be due either to LGTs or possibly to taxonomic misclassification of these species.
Figure 4

Partial sequence alignments of DNA Gyrase B showing a 4 aa insert that is mainly specific for the Sphingomonadales species. A 4–5 aa insert present in some other α-proteobacteria could be due to either LGTs or taxonomic anomalies. The dashes (-) denote identity with the amino acid on the top line. Sequence information for other groups of bacteria (which do not contain this insert) is not shown.


Proteins and indels that are specific for the Rhodospirillales species

The order Rhodospirillales is comprised of diverse species including some photosynthetic and magnetotactic bacteria, some acidophiles, as well as other bacteria commonly associated with flowers, fruits and fermented beverages that are involved in the partial oxidation of carbohydrates and alcohols [55]. The order is made up of two main families, Rhodospirillaceae and Acetobacteraceae [12,55]. The complete genomes of four species, two from each family, Rhodospirillaceae (Magnetospirillum magneticum, R. rubrum) and Acetobacteraceae (Acidiphilium cryptum and G. oxydans), are available (Table 1) [56,57]. Phylogenomic analyses of various ORFs in the genomes of G. oxydans and R. rubrum have led to the identification of one protein, GOX0963, which is uniquely found in all of these species (Table 7A). Three other proteins in this Table are present in at least 3 of the 4 species from this order. This table also lists 14 proteins each that are distinctive characteristics of either the Acetobacteraceae (Table 7B) or the Rhodospirillaceae (Table 7C) families, providing molecular markers for these families. We have also identified a 25 aa insert in a conserved region of the RNA polymerase beta subunit (RpoB) that is unique to various sequenced Rhodospirillales species, but not found in any other bacteria (Fig. 5).
Table 7

Proteins that are specific for the Rhodospirillales group

Gene ID | Accession Number | Function | Gene ID | Accession Number | Function
(Markers of the form ^n after a gene ID refer to the numbered footnotes below the table.)

A. Proteins Unique to Rhodospirillales Order (Gluconobacter, Magnetospirillum, Rhodospirillum and Acidiphilium)

GOX0633^1 | AAW60410 | Hypothetical | GOX0963 | AAW60735 | Hypothetical
GOX0695^2 | AAW60472 | Hypothetical | GOX1258^3 | AAW61019 | Hypothetical

B. Proteins Unique to Acetobacteraceae Family (Gluconobacter and Acidiphilium)

GOX0143 | AAW59936 | Hypothetical | GOX1616 | AAW61357 | Hypothetical
GOX0343 | AAW60126 | ANK, ankyrin repeats | GOX2216 | AAW61951 | Hypothetical
GOX1212 | AAW60973 | Phage portal protein | GOX2275 | AAW62008 | Hypothetical
GOX1215 | AAW60976 | Putative phage protein | GOX2316 | AAW62049 | Hypothetical
GOX1222 | AAW60983 | Putative phage protein | GOX2452 | AAW62183 | Hypothetical
GOX1224 | AAW60985 | Putative phage protein | GOX2454 | AAW62185 | Hypothetical
GOX1233 | AAW60994 | Hypothetical | GOX2456 | AAW62187 | Hypothetical

C. Proteins Unique to Rhodospirillaceae Family (Rhodospirillum and Magnetospirillum)

Rru_A0125 | YP_425217 | Putative diguanylate phosphodiesterase | Rru_A2592 | YP_427676 | Hypothetical
Rru_A0152 | YP_425244 | Hypothetical | Rru_A2828 | YP_427912 | Hypothetical
Rru_A0531 | YP_425622 | Hypothetical | Rru_A3562 | YP_428643 | Hypothetical
Rru_A1689 | YP_426776 | Hypothetical | Rru_A3636 | YP_428717 | Hypothetical
Rru_A1756 | YP_426843 | Hypothetical | Rru_A3662 | YP_428743 | Hypothetical
Rru_A2112 | YP_427199 | Hypothetical | Rru_A3739 | YP_428820 | Hypothetical
Rru_A2510 | YP_427597 | Predicted transcriptional regulator | Rru_A3800 | YP_428881 | Hypothetical

1 Missing in Rhodospirillum

2 Missing in Acidiphilium

3 Missing in Magnetospirillum

Figure 5

Partial sequence alignments of RNA polymerase β subunit (RpoB) showing a large insert (boxed) that is a distinctive characteristic of various Rhodospirillales species and not found in any other bacteria. There are two homologs of RpoB in Magnetospirillum (Mag.) magneticum and only one of these contains the insert. The dashes (-) denote identity with the amino acid on the top line.


Proteins that are specific for the Rickettsiales

The Rickettsiales species are intracellular pathogens responsible for a number of diseases in humans and other animals [3,34]. This order is comprised of three families, Rickettsiaceae, Anaplasmataceae and Holosporaceae. All of the sequenced genomes are from the first two families. Phylogenomic analysis of various ORFs from the genome of the Wolbachia (D. melanogaster) endosymbiont has identified 3 proteins, viz. WD0161, WD0715 and WD0771 (Table 8A), that are specific for the entire Rickettsiales order. Five other proteins in Table 8B (viz. WD0083, WD0157, WD0821, WD0827 and WD0863) are present in all sequenced Ehrlichia, Anaplasma, Wolbachia and Neorickettsia species (belonging to the Anaplasmataceae family), but they are not found in any of the Rickettsiaceae or other bacteria. Ten additional proteins (Table 8C) are uniquely present in the Ehrlichia, Anaplasma and Wolbachia species, but are absent in Neorickettsia. In view of the deep branching of Neorickettsia in comparison to these other genera (Fig. 1), the genes for these proteins have likely evolved after the branching of Neorickettsia. In earlier work, a number of proteins that appeared specific for the Rickettsiaceae family were also identified [17]. Additional Blastp and PSI-Blast searches on these proteins confirm that three of them, viz. RP030, RP187 and RP192, are indeed specific for the Rickettsiaceae family. Of these, RP030 is also found in Orientia tsutsugamushi.
Table 8

Proteins that are specific for the Rickettsiales group of species

Gene ID | Accession Number | Function | Gene ID | Accession Number | Function
(Markers of the form ^n after a gene ID refer to the numbered footnotes below the table.)

A. Proteins Unique to Rickettsiales (Wolbachia, Ehrlichia, Anaplasma, Rickettsia and Neorickettsia)

WD0161 | NP_965979 | Hypothetical | WD0771^1 | NP_966526 | Hypothetical
WD0715 | NP_966474 | Hypothetical | | |

B. Proteins Unique to Anaplasmataceae Family (Wolbachia, Ehrlichia, Anaplasma and Neorickettsia)

WD0083 | NP_965909 | Hypothetical | WD0827 | NP_966580 | Hypothetical
WD0157 | NP_965975 | Hypothetical | WD0863 | NP_966613 | Hypothetical
WD0821 | NP_966574 | Hypothetical | | |

C. Proteins Unique to Anaplasmataceae Family but missing in Neorickettsia

WD0148 | NP_965966 | Hypothetical | WD0772 | NP_966527 | Hypothetical
WD0412 | NP_966202 | Hypothetical | WD1025 | NP_966750 | Hypothetical
WD0467 | NP_966253 | Preprotein translocase, SecG | WD1056 | NP_966779 | Hypothetical
WD0757 | NP_966513 | Hypothetical | WD1220 | NP_966932 | Hypothetical
WD0764 | NP_966520 | Hypothetical | WD1230 | NP_966942 | Hypothetical

1 Missing in Neorickettsia and Orientia
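The reasoning applied to Table 8, in which a protein is assigned to the least inclusive clade that contains all of the genera carrying it, can be sketched as follows. This is only an illustration under simplifying assumptions: the nested clade list is reduced to the genera named in the Table 8 headings, gene loss and LGT are ignored, and `CLADES` and `smallest_clade` are hypothetical names that are not part of the original analysis.

```python
# Nested clades ordered from most to least inclusive, using the genera
# named in the Table 8 section headings (a simplified, assumed hierarchy).
CLADES = [
    ("Rickettsiales",
     {"Wolbachia", "Ehrlichia", "Anaplasma", "Neorickettsia", "Rickettsia"}),
    ("Anaplasmataceae",
     {"Wolbachia", "Ehrlichia", "Anaplasma", "Neorickettsia"}),
    ("Anaplasmataceae excluding Neorickettsia",
     {"Wolbachia", "Ehrlichia", "Anaplasma"}),
]


def smallest_clade(present_in):
    """Return the least inclusive clade containing every genus in present_in.

    Returns None when a genus falls outside all listed clades, i.e. the
    protein is not specific to the order under this simplified scheme.
    """
    present = set(present_in)
    best = None
    for name, members in CLADES:
        if present <= members:
            best = name  # later entries are nested inside earlier ones
    return best


if __name__ == "__main__":
    # A Table 8C-type distribution: absent from Neorickettsia.
    print(smallest_clade({"Wolbachia", "Ehrlichia", "Anaplasma"}))
```

A protein such as WD0771, which is missing in Neorickettsia but present in Rickettsia, still maps to the whole order under this scheme, which is consistent with its listing in Table 8A.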


Conclusion

In this work, we have used a combined phylogenetic and phylogenomic approach to examine the evolutionary relationships among α-proteobacteria. Our analyses have identified large numbers of genes/proteins that are uniquely found in α-proteobacteria at various phylogenetic depths (Fig. 3). These include several proteins that are distinctive characteristics of all α-proteobacteria, as well as many proteins that constitute the unique repertoires of either all of the main orders of α-proteobacteria (viz. Rhizobiales, Rhodobacterales, Rhodospirillales, Rickettsiales, Sphingomonadales and Caulobacterales) or its different families (viz. Rickettsiaceae, Anaplasmataceae, Rhodospirillaceae, Acetobacteraceae, Bradyrhizobiaceae, Brucellaceae and Bartonellaceae). In addition, numerous other α-proteobacteria-specific proteins are present at different phylogenetic depths and they provide important information regarding the evolution of these bacteria. This work also describes two novel conserved indels in important housekeeping genes (viz. Gyrase B and RpoB) that are distinctive characteristics of the Sphingomonadales and Rhodospirillales orders, respectively. These indels are in addition to many other α-proteobacteria-specific indels that have been described in earlier work [16,58]. Based upon these α-proteobacteria-specific proteins and conserved indels, it is now possible to define nearly all of the higher taxonomic groups (i.e. most orders and many families) within α-proteobacteria in clear and definitive molecular terms based upon multiple characteristics (Fig. 3). The species distribution profiles of these α-proteobacteria-specific proteins and indels also provide important information regarding their branching order and interrelationships, which are highly concordant with each other (cf. Fig. 26 in ref. [16] with Fig. 3 in this work).

Importantly, the relationships that emerge from these phylogenomic analyses (Fig. 3) are in excellent agreement with the branching patterns of these species in different phylogenetic trees (Figs. 1 and 2) [29], giving a high degree of confidence in the derived inferences. It should be noted that both in our work (Fig. 1) and in that of Williams et al. [29], when analyses were performed using the traditional phylogenetic methods (viz. NJ, MP or ML analyses), the branching of Caulobacterales with respect to Rhizobiales and Rhodobacterales was not resolved. In contrast, using the character compatibility approach, the Caulobacterales and related species were found to consistently branch in between the Rhizobiales and the Rhodobacterales (Fig. 2). Previously, this approach has also proven useful in clarifying the phylogenetic placement of Salinibacter ruber, which was not resolved by other methods [33,59].

The phylogenetic studies presented here reveal that a number of species belonging to the Hyphomonadaceae family (viz. M. maris, O. alexandrii and H. neptunium), for which sequence information is available and which are presently grouped with the Rhodobacterales, branch reliably with the Caulobacterales (Figs. 1 and 2) [29,47]. The same is also true for P. bermudensis, which is the only species from the Parvularculales order for which sequence information is available. The grouping of these species with the Caulobacterales is independently and strongly supported by phylogenomic studies, in which a number of proteins that are unique to C. crescentus were present in these species while many proteins that are distinctive characteristics of the Rhodobacterales were absent in them. These results make a strong case for the transfer of these Hyphomonadaceae species, and also P. bermudensis, to an expanded Caulobacterales order [29,47]. Another taxonomic anomaly identified by the present study concerns the phylogenetic position of Stappia aggregata. This species was originally identified as an Agrobacterium-related species, but was later transferred to the Rhodobacterales order [43]. However, the shared presence of many Rhizobiales-specific proteins in S. aggregata and the absence of various Rhodobacterales-specific proteins in it strongly suggest that it should be regrouped with the Rhizobiaceae-Phyllobacteriaceae species.

The overwhelming majority of the identified α-proteobacteria-specific proteins do not have a homolog showing significant similarity in any other bacteria. The group-specificities of these proteins indicate that their genes have evolved in a common ancestor of these particular groups or clades of α-proteobacteria (Fig. 3). The clade specificities of these proteins also provide evidence that, following their evolution, their genes have been transmitted primarily in a vertical manner, and that non-specific mechanisms such as LGTs have not played a significant role in their species distribution. Similar inferences have been reached in earlier studies for proteins that are specific for other higher taxa of bacteria [20,22-24,33,60]. Most of the α-proteobacteria-specific proteins identified in the present work are of unknown function. A number of these proteins are present in the genomes in clusters of two or three, suggesting that they may form functional units and could be involved in related functions [18,26,48,61]. The retention of these α-proteobacteria-specific proteins and conserved indels by the indicated clades of α-proteobacteria over long evolutionary periods strongly suggests that they serve essential functions in these groups of bacteria. Hence, studies on their cellular functions may lead to the discovery of novel biochemical and physiological characteristics that are distinctive of either all α-proteobacteria or their particular subgroups. Lastly, the primary sequences of many of these genes/proteins are highly conserved and they provide novel means for the identification and characterization of these bacteria by PCR-based and immunological methods.

Methods

Identification of proteins that are specific for alpha proteobacteria

The Blastp searches were carried out on each ORF in the genomes of B. japonicum USDA 110, B. suis 1330, C. crescentus CB15, G. oxydans 621H, M. loti MAFF303099, N. winogradskyi Nb-255, N. aromaticivorans DSM 12444, R. sphaeroides 2.4.1, Silicibacter sp. TM1040, R. rubrum ATCC 11170 and Wolbachia (D. melanogaster) endosymbiont to identify proteins that are uniquely present in α-proteobacteria species at different phylogenetic depths. The blast searches were performed against all organisms (i.e. the non-redundant (nr) database) using the default parameters, without the low complexity filter [62]. The proteins of interest were those for which either all significant hits were from the indicated groups (or orders) of α-proteobacteria, or for which there was a large increase in E values from the last hit belonging to a particular group to the first hit from any other group and the E values for the latter hits were > 1e-4, indicating weak similarity that could occur by chance. However, higher E values were often considered significant for smaller proteins, as the magnitude of the E value depends upon the length of the query sequence [62]. All promising proteins were further analyzed using the position-specific iterated (PSI)-Blast program [62]. In this study, we have also retained a few proteins for which 1 or 2 isolated species from other groups had acceptable E values, as they provide possible cases of LGTs. For all of the proteins that are specific for α-proteobacteria, their protein IDs, accession numbers and any information regarding cellular functions (such as COG number or presence of any conserved domain) were tabulated and are presented. In describing various proteins in the text, "bll, bsl, blr", "BQ", "BR or BRA", "CC", "GOX", "ml", "Nwi", "Saro", "RSP", "TM1040", "Rru" and "WD" indicate the identification numbers of the proteins in the genomes of B. japonicum, Bartonella quintana, B. suis, C. crescentus, G. oxydans, M. loti, N. winogradskyi Nb-255, N. aromaticivorans, R. sphaeroides 2.4.1, Silicibacter sp. TM1040, R. rubrum and Wolbachia (Dros.) endosymbiont, respectively.
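The selection criteria described above can be expressed as a simple filter over a list of blast hits. The sketch below is an illustration only and not the authors' pipeline: the hit format (taxon, E value), the function name `is_group_specific` and, in particular, the `min_jump_log10` parameter are assumptions, since the text specifies only a "large increase" in E values without a number. The 1e-4 cutoff for weak out-group hits is taken from the text.

```python
import math

_FLOOR = 1e-300  # blast reports 0.0 for extremely small E values; avoid log10(0)


def _log10(e_value):
    return math.log10(max(e_value, _FLOOR))


def is_group_specific(hits, group, weak_e=1e-4, min_jump_log10=5.0):
    """Apply the two selection criteria to one query protein.

    hits  : iterable of (taxon, e_value) pairs from a Blastp search.
    group : set of taxa making up the clade of interest.
    weak_e: out-group hits must have E values above this (1e-4 in the text).
    min_jump_log10: ASSUMED size of the "large increase" in E value,
        in orders of magnitude; the text gives no number.
    """
    in_e = [e for taxon, e in hits if taxon in group]
    out_e = [e for taxon, e in hits if taxon not in group]
    if not in_e:
        return False
    if not out_e:
        return True  # every reported hit is from the group
    first_out = min(out_e)
    # In-group hits ranked ahead of the first out-group hit.
    ahead = [e for e in in_e if e <= first_out]
    if not ahead:
        return False
    last_in = max(ahead)
    jump = _log10(first_out) - _log10(last_in)
    return first_out > weak_e and jump >= min_jump_log10


if __name__ == "__main__":
    group = {"Rhodobacter", "Silicibacter"}
    # Made-up E values, for illustration only.
    hits = [("Rhodobacter", 1e-60), ("Silicibacter", 3e-45), ("Escherichia", 0.5)]
    print(is_group_specific(hits, group))
```

Because higher E values were sometimes accepted for small proteins, both thresholds are left as parameters rather than fixed constants; candidates passing such a filter would still need the PSI-Blast follow-up described in the text.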

Phylogenetic analysis

The amino acid sequences for 12 conserved proteins (viz. RNA polymerase β and β' subunits, alanyl-tRNA synthetase, phenylalanyl-tRNA synthetase, arginyl-tRNA synthetase, protein synthesis elongation factors EF-Tu and EF-G, RecA, Gyrase A, Gyrase B, Hsp60 and Hsp70) for different species were downloaded from the NCBI database and aligned using the CLUSTAL X program [63]. In addition to the sequences for 50 α-proteobacteria species, sequences for two deep-branching species, viz. Helicobacter pylori and Campylobacter jejuni [58], were also included for rooting purposes. The sequence alignments for all 12 proteins were concatenated into a single large file and poorly aligned regions were removed using the Gblocks 0.91b program [64]. The final sequence alignment that was used for phylogenetic analyses contained 7652 aligned positions. A neighbour-joining tree based on this alignment was constructed from Kimura's two parameter model distances using the TREECON program [65]. Maximum-likelihood and MP trees were computed using the WAG+F model plus a gamma distribution with four categories, using the TREE-PUZZLE [66] and Mega 3.1 [67] programs, respectively. All trees were bootstrapped 100 times [68], unless otherwise indicated. The character compatibility analysis was performed on the concatenated sequences for the above 12 proteins for 25 α-proteobacteria species representing all its main orders plus two outgroup species (i.e. H. pylori and C. jejuni) [32]. Using the program "DUALSITE" [32], all sites in the sequence alignments where only two amino acid states were found, with each state present in at least two species, were selected. All columns where a gap was present in any of the species were omitted. The useful two-state sites were converted into a binary file of "0, 1" characters using the DUALSITE program and this file was used for compatibility analysis [32]. The compatibility analysis was carried out using the CLIQUE program from the PHYLIP (ver. 3.5c) program package [68] to identify the largest clique(s) of compatible characters. The cliques were drawn and the numbers of characters that distinguished different nodes were indicated.
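The site-selection step of the character compatibility analysis can be illustrated with a short sketch. This is not the DUALSITE or CLIQUE code: the function names are hypothetical, the choice to code the first species' state as "0" is an assumption (the text does not state how states are assigned), and the pairwise test shown is the standard rule that two binary characters are compatible unless all four state combinations occur. Finding the largest clique of mutually compatible characters is left to CLIQUE.

```python
from collections import Counter


def two_state_sites(alignment):
    """Select gap-free columns with exactly two states, each in >= 2 species.

    alignment: dict mapping species name -> aligned sequence (equal lengths).
    Returns a list of (column_index, binary_string); the state carried by
    the first species is coded "0" (an assumed convention).
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    sites = []
    for col in range(length):
        column = [alignment[name][col] for name in names]
        if "-" in column:
            continue  # omit any column containing a gap
        counts = Counter(column)
        if len(counts) == 2 and min(counts.values()) >= 2:
            ref = column[0]
            coded = "".join("0" if aa == ref else "1" for aa in column)
            sites.append((col, coded))
    return sites


def compatible(char_a, char_b):
    """Two binary characters are compatible unless all four of the
    combinations 00, 01, 10 and 11 are observed across species."""
    return len(set(zip(char_a, char_b))) < 4


if __name__ == "__main__":
    # Toy alignment, for illustration only.
    toy = {
        "sp1": "AKGL",
        "sp2": "AKGM",
        "sp3": "SRGM",
        "sp4": "SRGL",
        "sp5": "SR-L",
    }
    for col, coded in two_state_sites(toy):
        print(col, coded)
```

In the toy alignment, columns 0 and 1 support the same split of the species and are compatible with each other, whereas column 3 conflicts with both and would be excluded from their clique.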

Identification of conserved indels specific for α-proteobacteria subgroups

Multiple sequence alignments for various proteins constructed in this work were visually inspected to search for any indels in conserved regions that were unique to particular subgroups or orders of α-proteobacteria. The group-specificity of any indel was evaluated by carrying out blast searches against the non-redundant database on a short segment of the sequence (80–120 aa) containing the indel and its flanking conserved regions. The sequence information for various α-proteobacteria was compiled into signature files that are presented. Sequence information for all other groups of bacteria, which lack these inserts, is not shown.
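Cutting out the short query segment used to test an indel's specificity can be sketched as below. This is a minimal illustration under stated assumptions: the function name `indel_query_segment`, the default target length of 100 aa and the choice to centre the indel within the segment are not taken from the text, which specifies only that the segment is 80–120 aa long and includes the indel with its flanking conserved regions.

```python
def indel_query_segment(sequence, indel_start, indel_len, target_len=100):
    """Return a segment of about target_len residues containing the indel.

    sequence   : ungapped protein sequence carrying the insert.
    indel_start: 0-based position of the first residue of the insert.
    indel_len  : length of the insert in residues.
    target_len : desired segment length; kept within the 80-120 aa range
                 given in the text (the default of 100 is an assumption).
    """
    if not 80 <= target_len <= 120:
        raise ValueError("target_len should lie between 80 and 120 aa")
    if indel_start < 0 or indel_start + indel_len > len(sequence):
        raise ValueError("indel lies outside the sequence")
    # Split the remaining length evenly between the two flanks.
    left_flank = max(target_len - indel_len, 0) // 2
    start = max(indel_start - left_flank, 0)
    end = min(start + target_len, len(sequence))
    start = max(end - target_len, 0)  # shift left if we hit the C-terminus
    return sequence[start:end]


if __name__ == "__main__":
    # Toy sequence with a 4 aa insert, for illustration only.
    toy = "A" * 200 + "GGGG" + "C" * 200
    segment = indel_query_segment(toy, indel_start=200, indel_len=4)
    print(len(segment), segment.index("GGGG"))
```

The resulting segment would then be used as the blast query against the non-redundant database, as described in the text.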

Authors' contributions

The initial blastp searches on various genomes were carried out by RSG with computer assistance provided by Venus Wong. AM analyzed the results of these blast searches to identify various group-specific proteins and confirmed their specificities by means of PSI-blast and genomic blasts. Phylogenetic analyses and identification of conserved indels were done by RSG. RSG was also responsible for directing this study, for final evaluation of the results, and for writing this manuscript. All authors have read and approved the submitted manuscript.

Additional file 1

Proteins that are specific for the Brucella species. Many of these proteins are specific for all sequenced Brucella species (viz. B. abortus, B. melitensis, B. ovis and B. suis), whereas others are present in only 2 or 3 of these species. The proteins only found in a single Brucella species are not listed here.

Additional file 2

Bradyrhizobiaceae-specific proteins that are missing in some species. For the proteins listed in this table, all significant hits in Blastp and PSI-Blast searches are from Bradyrhizobiaceae species. However, unlike the proteins listed in Table 3, which are present in all sequenced Bradyrhizobiaceae species belonging to the genera Bradyrhizobium, Nitrobacter and Rhodopseudomonas, these proteins are generally missing in species from one of these three genera.

Additional file 3

Proteins that are specific for the Sphingomonadales but missing in one or more species. The proteins listed in part (A) of this table are present in three of the following four Sphingomonadales species (Novosphingobium, Erythrobacter, Sphingomonas and Sphingopyxis), whereas those listed in part (B) are present in Novosphingobium and either Sphingomonas or Erythrobacter.
References (68 in total)

1.  Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis.

Authors:  J Castresana
Journal:  Mol Biol Evol       Date:  2000-04

2.  On the origin of mitochondria: a genomics perspective.

Authors:  Siv G E Andersson; Olof Karlberg; Björn Canbäck; Charles G Kurland
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2003-01-29

3.  A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history.

Authors:  Vincent Daubin; Manolo Gouy; Guy Perrière
Journal:  Genome Res       Date:  2002-07

4.  Complete genome sequence of Rickettsia typhi and comparison with sequences of other rickettsiae.

Authors:  Michael P McLeod; Xiang Qin; Sandor E Karpathy; Jason Gioia; Sarah K Highlander; George E Fox; Thomas Z McNeill; Huaiyang Jiang; Donna Muzny; Leni S Jacob; Alicia C Hawes; Erica Sodergren; Rachel Gill; Jennifer Hume; Maggie Morgan; Guangwei Fan; Anita G Amin; Richard A Gibbs; Chao Hong; Xue-Jie Yu; David H Walker; George M Weinstock
Journal:  J Bacteriol       Date:  2004-09       Impact factor: 3.490

5.  The genome sequence of the facultative intracellular pathogen Brucella melitensis.

Authors:  Vito G DelVecchio; Vinayak Kapatral; Rajendra J Redkar; Guy Patra; Cesar Mujer; Tamara Los; Natalia Ivanova; Iain Anderson; Anamitra Bhattacharyya; Athanasios Lykidis; Gary Reznik; Lynn Jablonski; Niels Larsen; Mark D'Souza; Axel Bernal; Mikhail Mazur; Eugene Goltsman; Eugene Selkov; Philip H Elzer; Sue Hagius; David O'Callaghan; Jean-Jacques Letesson; Robert Haselkorn; Nikos Kyrpides; Ross Overbeek
Journal:  Proc Natl Acad Sci U S A       Date:  2001-12-26       Impact factor: 11.205

Review 6.  Aerobic anoxygenic phototrophic bacteria.

Authors:  V V Yurkov; J T Beatty
Journal:  Microbiol Mol Biol Rev       Date:  1998-09       Impact factor: 11.056

7.  Comparative genomic evidence for a close relationship between the dimorphic prosthecate bacteria Hyphomonas neptunium and Caulobacter crescentus.

Authors:  Jonathan H Badger; Timothy R Hoover; Yves V Brun; Ronald M Weiner; Michael T Laub; Gladys Alexandre; Jan Mrázek; Qinghu Ren; Ian T Paulsen; Karen E Nelson; Hoda M Khouri; Diana Radune; Julia Sosa; Robert J Dodson; Steven A Sullivan; M J Rosovitz; Ramana Madupu; Lauren M Brinkac; A Scott Durkin; Sean C Daugherty; Sagar P Kothari; Michelle Gwinn Giglio; Liwei Zhou; Daniel H Haft; Jeremy D Selengut; Tanja M Davidsen; Qi Yang; Nikhat Zafar; Naomi L Ward
Journal:  J Bacteriol       Date:  2006-10       Impact factor: 3.490

Review 8.  Protein signatures distinctive of alpha proteobacteria and its subgroups and a model for alpha-proteobacterial evolution.

Authors:  Radhey S Gupta
Journal:  Crit Rev Microbiol       Date:  2005       Impact factor: 7.624

Review 9.  The genome of Rhizobium leguminosarum has recognizable core and accessory components.

Authors:  J Peter W Young; Lisa C Crossman; Andrew W B Johnston; Nicholas R Thomson; Zara F Ghazoui; Katherine H Hull; Margaret Wexler; Andrew R J Curson; Jonathan D Todd; Philip S Poole; Tim H Mauchline; Alison K East; Michael A Quail; Carol Churcher; Claire Arrowsmith; Inna Cherevach; Tracey Chillingworth; Kay Clarke; Ann Cronin; Paul Davis; Audrey Fraser; Zahra Hance; Heidi Hauser; Kay Jagels; Sharon Moule; Karen Mungall; Halina Norbertczak; Ester Rabbinowitsch; Mandy Sanders; Mark Simmonds; Sally Whitehead; Julian Parkhill
Journal:  Genome Biol       Date:  2006-04-26       Impact factor: 13.583

10.  Evolutionary origins of genomic repertoires in bacteria.

Authors:  Emmanuelle Lerat; Vincent Daubin; Howard Ochman; Nancy A Moran
Journal:  PLoS Biol       Date:  2005-04-05       Impact factor: 8.029

View more
