Polyphyly in 16S rRNA-based LVTree Versus Monophyly in Whole-genome-based CVTree.

Guanghong Zuo, Ji Qi, Bailin Hao.

Abstract

We report an important but long-overlooked manifestation of the low resolving power of 16S rRNA sequence analysis at the species level: in 16S rRNA-based phylogenetic trees, polyphyletic placements of closely-related species are abundant compared to those in genome-based phylogeny. This phenomenon makes the demarcation of genera within many families ambiguous in the 16S rRNA-based taxonomy. In this study, we reconstructed phylogenetic relationships for more than ten thousand prokaryote genomes using the CVTree method, which is based on whole-genome information. Many genera that are polyphyletic in 16S rRNA-based trees are well resolved as monophyletic clusters by CVTree. We believe that, with genome sequencing of prokaryotes becoming commonplace, genome-based phylogeny is destined to play a definitive role in the construction of a natural and objective taxonomy.
Copyright © 2018 The Authors. Production and hosting by Elsevier B.V. All rights reserved.

Keywords:  16S rRNA sequence; Archaea and bacteria taxonomy; CVTree; Phylogeny; Whole-genome sequence

Year:  2018        PMID: 30550857      PMCID: PMC6364046          DOI: 10.1016/j.gpb.2018.06.005

Source DB:  PubMed          Journal:  Genomics Proteomics Bioinformatics        ISSN: 1672-0229            Impact factor:   7.691


Introduction

The use of small subunit (SSU) rRNA as a molecular marker by Carl Woese and coworkers in the 1970s [1] has been a great success in prokaryotic taxonomy. Nowadays, the major references for prokaryotic taxonomy, such as Bergey’s Manual, including both the 2nd hardcopy edition [2] and the online electronic edition (BMSAB) [3], the multi-volume treatise The Prokaryotes IV [4], and the List of Prokaryotic Names with Standing in Nomenclature (LPSN) [5], are all based on 16S rRNA sequence analysis. At the same time, it has been recognized that SSU rRNA sequences lack resolution at the species level and below (see, e.g., [6], [7], [8], [9]). However, to the best of our knowledge, a more severe consequence of the low resolution of 16S rRNA sequence analysis has not been reported in the literature so far: abundant polyphyletic placements of species in 16S rRNA trees prevent the correct definition of many genera. In contrast, many such genera are well-defined as monophyletic clusters in whole-genome-based phylogeny. In the present work, we demonstrate this phenomenon with a number of examples.

Methods

We use the All-Species Living Tree [9], [10], [11], abbreviated as LVTree, as the reference for phylogenetic information from 16S rRNA sequence analysis. The latest release of LVTree, LTPs128 of February 2017, was based on 475 archaeal and 12,478 bacterial 16S rRNA sequences. We display and manipulate LVTree using the LVTree Viewer [12]. Whole-genome-based phylogenetic trees were constructed with the alignment-free Composition Vector approach [13], [14], [15], [16]. To generate the data for this paper, we used a more powerful version of the publicly-available CVTree3 Web Server [16], which can handle 10,000–15,000 genomes in a single run within reasonable CPU time. These genomes were selected from a collection of more than 125,000 prokaryotic genomes downloaded from IMG [17], RefSeq [18], NCBI [19], and occasionally, PATRIC [20] or EzBioCloud [21]. It is good practice to place any group of species under study against the background of a large number of genomes with a wide taxonomic distribution. A typical CVTree job used in the present study contains 254 archaeal and 8036 bacterial genomes with K = 6.

A guiding principle in evaluating the quality of a taxon is monophyly. Historically, the notion of monophyly originated in zoology and was associated with sexual reproduction. We apply it to prokaryotes in a pragmatic way by restricting the discussion to an input dataset and a reference taxonomy. A tree branch is said to be monophyletic if it contains exclusively species from a given taxon according to the reference taxonomy. For example, if all 144 leaves of a branch come from the same family, say, Acetobacteraceae, and no members of this family appear in other branches, we write the family as Acetobacteraceae{144}, where 144 is the number of 16S rRNA sequences in LVTree or the number of genomes in CVTree. A taxon is said to be well-defined if it is monophyletic.
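The monophyly criterion just defined can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the CVTree Web Server or the LVTree Viewer: the rooted tree is written as nested tuples of leaf names, and `taxon_of` is a hypothetical mapping from each leaf to its taxon at one rank of the reference taxonomy.

```python
from typing import Dict, FrozenSet, List, Union

# Assumption: a rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def clade_leaf_sets(tree: Tree, out: List[FrozenSet[str]]) -> FrozenSet[str]:
    """Append the leaf set of every branch (subtree) to `out`; return this one."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = frozenset().union(*(clade_leaf_sets(child, out) for child in tree))
    out.append(leaves)
    return leaves


def is_monophyletic(tree: Tree, taxon_of: Dict[str, str], taxon: str) -> bool:
    """True if one branch contains all members of `taxon` and nothing else."""
    clades: List[FrozenSet[str]] = []
    all_leaves = clade_leaf_sets(tree, clades)
    members = frozenset(
        leaf for leaf in all_leaves if taxon_of.get(leaf) == taxon
    )
    return bool(members) and members in clades


# Toy data (hypothetical names): family A forms one branch, family B is split by c1.
toy_tree = ((("a1", "a2"), "a3"), ("b1", ("b2", "c1")))
toy_taxonomy = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "c1": "C"}

for family in ("A", "B", "C"):
    print(family, is_monophyletic(toy_tree, toy_taxonomy, family))
```

A taxon with a single leaf, such as C here, passes the test trivially, which corresponds to the trivially monophyletic monospecific families.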
Both CVTree Web Server and LVTree Viewer report automatically whether a taxon is monophyletic or not, at all taxonomic ranks from phylum down to species. Comparison of CVTree and LVTree phylogenies with taxonomy is carried out in a family-by-family manner. LVTree Release 128 contains 358 families. Among them, 68 monospecific families are trivially monophyletic containing only a single species, 180 are monophyletic, and the remaining 110 families are non-monophyletic. The aforementioned typical CVTree job contains 313 families, of which 76 are trivially monophyletic, 139 monophyletic, and 98 non-monophyletic. Some non-monophyletic families may become well-defined by making just a few obvious lineage modifications. Table 1 lists a number of families containing a comparatively large number of subordinate genera and species.
Table 1

Number of organisms in some families well-defined in both LVTree and CVTree up to probable minor lineage modifications

Family               LVTree  CVTree  Remark                                        Ref.
Acetobacteraceae     144     233     Stella transferred to Rhodospirillaceae       [22]
Bifidobacteriaceae   68      119
Caulobacteraceae     51      85
Corynebacteriaceae   98      103
Flavobacteriaceae    671     188
Leuconostocaceae     46      75
Methanobacteriaceae  46      70      Re-assigning Methanothermus, see text
Pasteurellaceae      83      97
Staphylococcaceae    95      88
Streptococcaceae     118     222
Veillonellaceae      74      151     Retained as part of Negativicutes, see text

In order to demonstrate the main conclusion of this paper, namely, that polyphyletic placements of species across genera are abundant in LVTree whereas genera are predominantly monophyletic in CVTree, we elaborate three groups of examples: (1) straightforward cases that do not invoke lineage modifications; (2) cases requiring minor lineage modifications; and (3) a case that at first glance speaks in favor of LVTree but that a recent taxonomic proposal has eventually turned into support for CVTree.

Results

Straightforward cases

Example 1 Caulobacteraceae

According to BMSAB [3], Caulobacteraceae is the only family in the order Caulobacterales in the class Alphaproteobacteria of the phylum Proteobacteria. Organisms of this family were grouped together owing to their specific way of asymmetric cell division long before molecular means of characterizing bacteria had been developed. As this is the first example of this study, we present some more details behind the construction of phylogenetic trees. The family Caulobacteraceae contains four genera, but major taxonomic references list different numbers of species, as shown in Table 2. A few comments on Table 2 are appropriate:
Table 2

Number of species in the constituent genera of Caulobacteraceae as listed in major taxonomic references

Genus             BMSAB [3]   The Prokaryotes IV [4]  LPSN [5]   EzBioCloud [21]
                  2005/2017   2015                    Dec 2017   Oct 2017
Asticcacaulis     2           4                       6          6
Brevundimonas     9           21                      28         29
Caulobacter       4           6                       9          9
Phenylobacterium  1           7+1*                    11         11+1*

Note: 1* denotes the species Phenylobacterium zucineum, which is not considered validly published by BMSAB and LPSN. BMSAB, Bergey’s Manual of Systematics of Archaea and Bacteria; LPSN, List of Prokaryotic Names with Standing in Nomenclature.

First, the electronic edition of BMSAB [3] appeared online in 2015, but most of its text remained the same as in the volumes of Bergey’s Manual of Systematic Bacteriology, 2nd edition [2]. Though partial updates of the electronic edition have been released four times a year, it may take many years to have all parts of BMSAB updated. In particular, the files related to Caulobacteraceae in BMSAB were identical to those of Bergey’s Manual of 2005. This explains why the numbers of species in the first column of Table 2 are the lowest ones. Second, the corresponding volume of The Prokaryotes IV [4], published in 2014, was organized by families and contained more up-to-date information. In particular, the genus Phenylobacterium included a species P. zucineum [23], which is considered to be not validly published by BMSAB and LPSN, despite the fact that its finished genome has been available for almost 10 years [24]. This is marked by “+1” in the last row of Table 2. Third, although both LPSN [5] and EzBioCloud [21] reflect the content of International Journal of Systematic and Evolutionary Microbiology, EzBioCloud adds more information on sequenced prokaryotic genomes, which is useful for the inspection of whole-genome-based CVTree. While BMSAB and LPSN contain only validly-published names, especially those of type strains, the dataset behind CVTree includes many genomes with unclassified lineages. For example, Caulobacterales_bacterium_RIFOXYB1_FULL_67_16 is classified only to the order and Caulobacteraceae_bacterium_PMMR1 only to the family level.
There are many more genomes classified to the species level without validly-published names, e.g., Brevundimonas_sp_Root1423. CVTree is capable of assigning most of them to a proper genus, as summarized in Table 3.
Table 3

Number of representatives in the constituent genera of Caulobacteraceae in LVTree and CVTree used in the present work

Genus             No. of 16S rRNA sequences in LVTree  No. of genomes in CVTree
Asticcacaulis     6                                    4 genomes from 4 species; 4 genomes from unclassified species
Brevundimonas     27                                   16 genomes from 14 species; 17 genomes from unclassified species
Caulobacter       9                                    12 genomes from 4 species; 21 genomes from unclassified species
Phenylobacterium  9                                    3 genomes from 3 species; 8 genomes from unclassified species

Figure 1 shows the maximally-collapsed Caulobacteraceae branch in both LVTree (Figure 1A) and CVTree (Figure 1B). Only numbers of organisms are indicated in the figure. The detailed names with strain tags can be found in the fully-expanded figures (Figures S1 and S2). In order to avoid confusion, a remark must be made concerning Streptomyces longisporoflavus, which appears among the 27 species of the genus Brevundimonas in LVTree. Its 16S rRNA sequence (GenBank accession No. DQ442520, 2006) apparently came from a Brevundimonas strain mislabeled as a Streptomyces. Although the authors of the original 16S rRNA sequence submitted a new sequence (GenBank accession No. NR_115963) in 2015, they did not make a formal emendation to replace the old one. This problem was pointed out in Chapter 7 of The Prokaryotes IV [4] without drawing a conclusion. We have performed a BLAST comparison of the two 16S rRNA sequences and confirmed the correctness of NR_115963 for Streptomyces longisporoflavus [12]. However, a piece of validly-published information, though incorrect, may persist as long as no one makes a formal emendation. Therefore, the wrong Streptomyces longisporoflavus label still exists in the literature, e.g., in Figure 7.1 of The Prokaryotes IV [4]. We mention in passing that all four genera in Figure 7.1 of The Prokaryotes IV [4] are monophyletic, contradicting the LVTree (Figure 1A) but agreeing with the CVTree (Figure 1B). In this regard, it should be noted that in almost all phylogenetic trees given in The Prokaryotes IV [4], the input data and method of tree inference are indicated in the figure captions, except for Figure 7.1. Therefore, one must assume that this figure represents a consensus branching scheme rather than the result of a single phylogenetic tree based on 16S rRNA sequence analysis.
Figure 1

Collapsed trees of families Caulobacteraceae and Leuconostocaceae

Branches are collapsed at genus level (denoted by G) for both 16S rRNA-based LVTree and whole-genome-based CVTree for every family. A solid circle at the end of a branch denotes that there is more than one genome in the branch. Numbers in brackets give the number of taxa included in the branch (numerator) and the total number of taxa in the genus (denominator); only the total number of taxa is shown when a branch is monophyletic.

The contrast between LVTree and CVTree is noticeable in Figure 1A and B. While in the 16S rRNA-based LVTree only one genus, Asticcacaulis, is monophyletic, all four genera are well-defined in the whole-genome-based CVTree.

Example 2 Leuconostocaceae

Now we turn to the family Leuconostocaceae, which is represented by 46 16S rRNA sequences in LVTree (Figure 1C) and by 75 genomes in CVTree (Figure 1D). According to LPSN, there are five valid genera in this family: Convivina, Fructobacillus, Leuconostoc, Oenococcus, and Weissella. The genus Convivina was not included in this analysis, as only one genome of the genus was published recently [25]. Among the remaining four genera, only Oenococcus and Fructobacillus are monophyletic in the 16S rRNA-based LVTree, while the other two genera are polyphyletic and appear as Leuconostoc{17+1} and Weissella{16+4}. In contrast, all four genera are monophyletic in CVTree. It is worth noting that an unclassified species, Leuconostocaceae sp. R53105, is placed as a sister branch of the genus Fructobacillus, implying its possible classification as a member of Fructobacillus or of a new genus. Expanded versions of these two phylogenetic trees with full names and strain tags are given in Figures S3 and S4.

Example 3 Staphylococcaceae

The family Staphylococcaceae contains the notorious species Staphylococcus aureus whose methicillin-resistant strains (MRSA) cause severe cross-infections in hospitals. Owing to its clinical importance, more than 8000 genomes of this species have been sequenced. It is remarkable that all these genomes form a monophyletic cluster in CVTree. However, as epidemiologic studies of pathogens go beyond the scope of this work, we only retain a few tens of S. aureus strains as members of the genus Staphylococcus. In 16S rRNA-based LVTree, although the family Staphylococcaceae{95} appears as a monophyletic cluster, it does contain two polyphyletically-placed genera, Salinicoccus and Jeotgalicoccus. Contrary to LVTree (Figure 2A), in whole-genome-based CVTree (Figure 2B), all subordinate genera in the family Staphylococcaceae{115} appear monophyletic on their own.
Figure 2

Collapsed trees of families Staphylococcaceae and Streptococcaceae

Branches are collapsed at genus level for both 16S rRNA-based LVTree and whole-genome-based CVTree for every family.


Example 4 Streptococcaceae

This is a trivial case. In LVTree, the main cluster of the family Streptococcaceae{118} consists of three genera: Streptococcus{102}, Lactococcus{5+10}, and Lactovum{1} (Figure 2C). However, in CVTree, the family Streptococcaceae{222} consists of a monophyletic cluster made of two monophyletic genera: Streptococcus{181} and Lactococcus{41} (Figure 2D). In LVTree, the monophyly of Lactococcus is violated by the insertion of the monospecific genus Lactovum, which was proposed in 2005. There are two possibilities for Lactovum: either it is a disguised Lactococcus, or it actually constitutes a new genus, thus causing the placement of Lactococcus species to be polyphyletic. Since no sequenced genome is available so far, there is not enough information to draw a conclusion.

Example 5 Corynebacteriaceae

This is another trivial case, as the family essentially contains only a single genus, Corynebacterium. A monospecific genus Turicella was proposed in 1994, which violated the monophyly of the genus Corynebacterium in both LVTree and CVTree. As we have pointed out recently [26], Turicella cannot stand as an independent genus and should be considered a synonym of Corynebacterium. Therefore, the family Corynebacteriaceae contains only a single monophyletic genus, Corynebacterium, in both LVTree and CVTree, and there is no polyphyly in either tree. The comparisons in all five examples above are made under the assumption that the corresponding taxonomy is correct and no lineage modifications are needed. However, as taxonomy has always been a work in progress, revisions happen constantly. Therefore, we turn to the second group of examples, which require minor lineage modifications. In fact, this second group of examples represents a commonplace situation in prokaryotic taxonomy.

Cases requiring minor lineage modifications

Example 6 Methanobacteriaceae

Our next example comes from Archaea. In LVTree the family Methanobacteriaceae{44} consists of a monophyletic cluster made of four genera: Methanosphaera{1}, Methanobrevibacter{14}, Methanothermobacter{2+6}, and Methanobacterium{1+3+17} (Figure 3A). The last two genera turn out to be polyphyletic. For example, Methanothermobacter{2+6} means that the genus Methanothermobacter comprises two parallel branches represented by 2 and 6 sequences of 16S rRNA, respectively. Please note that next to the monophyletic cluster Methanobacteriaceae{44}, there is a genus Methanothermus{2}, belonging to the family Methanothermaceae, which was proposed in 1981 [27] together with its type genus Methanothermus. Since then, no new genus has been discovered and described in the family.
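The brace notation used above, such as Methanothermobacter{2+6}, can be derived mechanically from a tree. The sketch below is an illustration only, under the same assumptions as before (a rooted tree as nested tuples and a hypothetical `taxon_of` mapping); it is not taken from the LVTree Viewer or the CVTree Web Server. It lists the sizes of the maximal branches that contain only members of one genus.

```python
from typing import Dict, List, Union

# Assumption: a rooted tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_names(tree: Tree) -> List[str]:
    """Return all leaf names under a node, in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [name for child in tree for name in leaf_names(child)]


def pure_branch_sizes(tree: Tree, taxon_of: Dict[str, str], taxon: str) -> List[int]:
    """Sizes of the maximal branches made up exclusively of `taxon` members."""
    names = leaf_names(tree)
    hits = [taxon_of.get(name) == taxon for name in names]
    if all(hits):
        return [len(names)]      # whole branch is pure: count it, do not descend
    if not any(hits) or isinstance(tree, str):
        return []                # no member of the taxon below this node
    sizes: List[int] = []
    for child in tree:
        sizes.extend(pure_branch_sizes(child, taxon_of, taxon))
    return sizes


def brace_label(tree: Tree, taxon_of: Dict[str, str], taxon: str) -> str:
    """Format the sizes in the notation of the text, e.g. 'Genus{2+3}'."""
    sizes = pure_branch_sizes(tree, taxon_of, taxon)
    return taxon + "{" + "+".join(str(n) for n in sizes) + "}"


# Toy data (hypothetical names): genus M is split into two branches by x1.
toy_tree = (("m1", "m2"), ("x1", ("m3", "m4", "m5")))
toy_genus = {"m1": "M", "m2": "M", "m3": "M", "m4": "M", "m5": "M", "x1": "X"}

print(brace_label(toy_tree, toy_genus, "M"))
print(brace_label(toy_tree, toy_genus, "X"))
```

A label with a single number corresponds to a monophyletic genus; a label with several summands corresponds to a polyphyletic one. The order of the summands here simply follows the left-to-right order of the toy tree.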
Figure 3

Collapsed trees of families Methanobacteriaceae and Flavobacteriaceae

Branches are collapsed at genus level for both 16S rRNA-based LVTree and whole-genome-based CVTree for every family. F and G denote family and genus, respectively.

In whole-genome-based CVTree, the family Methanobacteriaceae is represented by 68 genomes from four genera: Methanothermobacter{5}, Methanobacterium{14}, Methanobrevibacter{45}, and Methanosphaera{4} (Figure 3B). However, these four genera do not form a monophyletic cluster, as the family Methanothermaceae with its only genus, the type genus Methanothermus, is nested deep inside this cluster. With monophyly as the guiding principle, this fact suggests a plausible revision: including Methanothermus in the family Methanobacteriaceae and dropping the family name Methanothermaceae from the prokaryotic nomenclature. This lineage modification does not contradict the branching scheme in LVTree, i.e., it is acceptable in both LVTree and CVTree. This explains the numbers 46 and 70 in the Methanobacteriaceae row of Table 1.

Example 7 Bifidobacteriaceae

An inspection of the family Bifidobacteriaceae{58} in LVTree clearly reveals the polyphyly of the genus Bifidobacterium{1+1+6+1+3+8+8+17+10+1} (Figure 3C). In sharp contrast, the genus Bifidobacterium{82} in CVTree is manifestly monophyletic (Figure 3D). A few words are in order on the monospecific genus Gardnerella. Ever since the genus and species were proposed in 1980 [28], Gardnerella has remained monospecific. In LVTree, it is nested deep inside the genus Bifidobacterium. In CVTree, it stands next to the monophyletic Bifidobacterium cluster and might be absorbed into the latter without causing taxonomic contradiction. As this is not related to the main theme of this paper, we leave the problem open. Another part of the family Bifidobacteriaceae is made up of several genera from the Scardovia group, which are mostly polyphyletic in LVTree (Figure 3C) and seemingly monophyletic in CVTree (Figure 3D). A convincing elucidation of the situation requires more data.

Example 8 Acetobacteraceae

Now let us consider the family Acetobacteraceae. In both LVTree (Figure 4A) and CVTree (Figure 4B), species from the two genera Gluconacetobacter and Komagataeibacter are heavily intermixed. The genus Gluconacetobacter was proposed in 1997 [29]. Later on, some species of this genus were taken out to form a new genus Komagataeibacter as new combinations [30], and transfers from the former to the latter continued, e.g., in 2014 [31]. All these proposals were made by the same lead author, Y. Yamada, and his collaborators by comparing incomplete 16S rRNA sequences [29-31]. However, it is a sobering fact that in CVTree, species from the two genera Gluconacetobacter and Komagataeibacter, taken together, do form a monophyletic cluster. This fact strongly suggests merging the two genera into a single one and retaining only the name Gluconacetobacter, which has priority because it was introduced first [29]. With this lineage modification done, the Acetobacteraceae branch appears as shown for LVTree (Figure 4A) and CVTree (Figure 4B), respectively.
Figure 4

Collapsed trees of families Acetobacteraceae and Pasteurellaceae

Branches are collapsed at genus level for both 16S rRNA-based LVTree and whole-genome-based CVTree for every family. G and S denote genus and species, respectively.

Although the genus Gluconacetobacter{45} comes out as a monophyletic group in CVTree, its counterpart in LVTree appears as six juxtaposed polyphyletic branches, or, in our notation, as Gluconacetobacter{15+2+1+1+3+3}. It seems that this fact misled the original authors into introducing a new genus, which still did not resolve the problem. Another non-monophyletic group in both LVTree and CVTree is formed by Roseomonas species interspersed with organisms from other genera. In particular, LVTree contains many genus names that are absent from CVTree, due to the lack of sequenced genomes. One must await new data to complete the evaluation of the branching schemes in LVTree and CVTree. Nonetheless, for the time being, CVTree behaves “better” by accommodating only one polyphyletic cluster, that of Roseomonas.

Example 9 Pasteurellaceae

Now we turn to a more complicated case. As shown in Figure 4C and D, the family Pasteurellaceae is represented by 83 taxa in LVTree and by 97 in CVTree; this is the most intricate branching figure given explicitly in this paper. It suffices to look at how species from the three genera Pasteurella, Haemophilus, and Actinobacillus are mixed up in LVTree. Their interrelationship cannot be simply characterized as polyphyletic. However, the branching scheme in CVTree is more enlightening. The genus Actinobacillus{11} is monophyletic, and the genus Haemophilus{9/10} is de facto monophyletic if one takes into account the assignment of Haemophilus ducreyi to a new unclassified genus by EzBioCloud [21]. Only the Pasteurella species come out as polyphyletic. There is good hope that, based on whole-genome analysis, the taxonomy of Pasteurellaceae will be brought into better shape. In addition, we note that the newly proposed genus Rodentibacter [22] reduces the number of Pasteurella species in both LVTree and CVTree.

Example 10 Flavobacteriaceae

Now we look at an even more complicated case, Flavobacteriaceae. In LVTree, this family is represented by 671 species from 131 genera after assigning Pibocella to the genus Maribacter according to EzBioCloud [21]. The branching scheme is not shown because even the maximally-collapsed tree contains 189 lines. Although about 1/3 of the genera present in LVTree do not have a sequenced genome, there are many sequenced genomes that are classified only to the species level without a validly-published name. These organisms are excluded from the LVTree dataset by design. However, as they do not violate the monophyly of many genera in CVTree, it is easy to construct a whole-genome-based tree with a total number of genomes comparable to the number of 16S rRNA sequences present in LVTree (671) (Figure S5). In fact, we have a monophyletic family Flavobacteriaceae{818} in a CVTree (Figure S6). In order to highlight the difference between these two kinds of trees, it is instructive to pay attention to a local part. For example, Figure 5A shows the vicinity of the two genera Flavobacterium and Myroides in LVTree. The insertion of the genus Myroides splits the genus Flavobacterium into eight groups. The Flavobacterium species are clearly polyphyletic compared to the same vicinity in CVTree (Figure 5B). In any case, CVTree comes out closer to monophyly than LVTree does.
Figure 5

Collapsed tree of two genera,

Branches are collapsed at genus level (denoted by G) for both 16S rRNA-based LVTree and whole-genome-based CVTree. The collapsed trees of LVTree and CVTree for all genera of the family are shown in Figures S5 and S6, respectively.


The special case of class Negativicutes

Staining Gram-positive is an important part of the definition of species in the phylum Firmicutes. However, there is a group of Gram-negative organisms embedded in the generally Gram-positive sea of Firmicutes. The taxonomic placement of this group was debated for a long time and, eventually, a new class Negativicutes in the phylum Firmicutes was proposed in 2010 [32]. As the last example in this paper, we consider the class Negativicutes. Not long ago, the 16S rRNA-based LVTree (Release 123; September 2015) followed the taxonomy in which this class consisted of a single order Selenomonadales, which in turn was made up of two monophyletic families, Acidaminococcaceae and Veillonellaceae (Figure 6A). In contrast, under this taxonomy, the whole-genome-based CVTree led to a polyphyletic family Veillonellaceae (Figure 6B). Therefore, LVTree seems to be “better” than CVTree in the sense of monophyly of the family Veillonellaceae. However, this was caused by the fact that the placement of about 20 genera in Veillonellaceae was questionable. These genera should be considered as Selenomonadales Incertae sedis, as indicated in Figure 35.1 on p. 434 of the corresponding volume of The Prokaryotes IV [33], but this was ignored in the dataset behind LVTree. This was the situation when the class Negativicutes was defined as containing only a single order Selenomonadales.
Figure 6

Collapsed tree of class Negativicutes before and after taxonomic revision

Branches are collapsed at family level (denoted by F) for 16S rRNA-based LVTree and whole-genome-based CVTree.

About the same time, a detailed taxonomic analysis using genomic data [34] arrived at the conclusion that the class Negativicutes actually contains three orders instead of one, that is, Veillonellales, Acidaminococcales, and Selenomonadales, with the last one consisting of two families, Selenomonadaceae and Sporomusaceae. At present, both LVTree Release 128 (February 2017) and CVTree have adopted this validly published classification. With this change, the collapsed trees shown in Figure 6A and B transform into those shown in Figure 6C and D, respectively. In CVTree, all orders and families are now monophyletic. However, with this new classification, the family Sporomusaceae in LVTree becomes polyphyletic. Therefore, the taxonomic proposal [34] again makes CVTree superior to LVTree. In other words, it supports our statement that whole-genome-based phylogeny agrees better with taxonomy in the sense of accommodating more monophyletic taxa.

Discussion

In this study, phylogenetic relationships for ten families and one class of prokaryotes were reconstructed by alignment-free analysis of whole-proteome information using CVTree, to provide detailed information for comparison with the 16S rRNA-based phylogeny of the same taxa. This work is not simply a collection of examples. Using these examples, we intend to call attention to some general principles and problems in prokaryotic phylogeny and taxonomy. In 1987, an Ad Hoc Committee wrote in its report [35]: “There was general agreement that the complete deoxyribonucleic acid (DNA) sequence would be the reference standard to determine phylogeny and that phylogeny should determine taxonomy. Furthermore, nomenclature should agree with (and reflect) genomic information.” Taxonomy came much earlier than phylogeny. Taxonomy is the classification of organisms by assigning them to discrete levels, i.e., from domain to species. A great achievement was made by Carl Woese and his colleagues [36], who proposed the division of life into three domains based on small subunit rRNA sequences. The proposal greatly enhanced acceptance of “the tree of life”, although the growing number of bacterial genomes sequenced since the end of the last century has raised strong controversies about it instead of providing support [37]. As different genes may tell different stories, horizontal gene transfer, gene duplication and loss, incomplete lineage sorting, and other processes together pose challenges to the development of an objective taxonomic system guided by whole-genome information. Compared with taxonomy, phylogeny is more definitive in nature.
Given an input dataset, be it a collection of 16S rRNA sequences or a collection of genomes, and a fixed method of inferring phylogenetic information, be it alignment-based or alignment-free, it produces a phylogenetic tree, i.e., a branching scheme of the input data. There is no way to fine-tune the input data or the final results. Phylogeny cannot produce nomenclature on its own, but it provides a standard for the hierarchical classification of organisms according to their evolutionary histories. Does phylogeny represent relations among individual organisms or among populations? The notion of type strain is associated with individual organisms, but taxonomy always deals with populations. In the long run “type strains” may be replaced by “type genomes”. By defining a distance between genomes in the genome space, it is possible to make this approach quantitative. DNA–DNA hybridization gives some “distance” between genomes, but cannot be used incrementally to build an entire distance matrix, while CVTree can. We will elaborate on this point in forthcoming publications.
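To illustrate what building a distance matrix incrementally can look like, here is a deliberately simplified Python sketch. It is not the CVTree implementation: it compares raw counts of overlapping K-strings with a cosine-based dissimilarity and omits any background correction that the published Composition Vector method [13], [14], [15], [16] applies. The class name `DistanceMatrix`, its methods, and the toy sequences are all hypothetical.

```python
from collections import Counter
from math import sqrt
from typing import Dict, Tuple


def kmer_counts(sequence: str, k: int) -> Counter:
    """Count every overlapping K-string in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def dissimilarity(a: Counter, b: Counter) -> float:
    """(1 - cosine similarity) / 2, which lies in [0, 0.5] for count vectors."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a sequence is shorter than k")
    return (1.0 - dot / (norm_a * norm_b)) / 2.0


class DistanceMatrix:
    """Pairwise dissimilarities that grow by one row per added genome."""

    def __init__(self, k: int) -> None:
        self.k = k
        self.vectors: Dict[str, Counter] = {}
        self.distances: Dict[Tuple[str, str], float] = {}

    def add(self, name: str, sequence: str) -> None:
        """Compare the new genome with the stored ones only; old pairs are reused."""
        new_vector = kmer_counts(sequence, self.k)
        for other, vector in self.vectors.items():
            pair = tuple(sorted((name, other)))
            self.distances[pair] = dissimilarity(new_vector, vector)
        self.vectors[name] = new_vector

    def get(self, a: str, b: str) -> float:
        if a == b:
            return 0.0
        return self.distances[tuple(sorted((a, b)))]


# Toy usage with made-up peptide strings and K = 2.
matrix = DistanceMatrix(k=2)
matrix.add("g1", "MKVLAAGIV")
matrix.add("g2", "MKVLAAGIV")
matrix.add("g3", "WWWWYYYY")
print(matrix.get("g1", "g2"), matrix.get("g1", "g3"))
```

Adding the n-th genome costs n - 1 comparisons and leaves all earlier entries untouched, which is the sense in which such a matrix can be extended incrementally.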

Authors’ contributions

BH designed the study. GZ built and maintained the web server, collected data, and carried out the calculation. GZ and BH performed the analysis. GZ, JQ, and BH wrote the manuscript.

Competing interests

The authors have declared that no competing interests exist.
References (10 of 29 listed)

1.  Yuzo Yamada; Pattaraporn Yukphan; Huong Thi Lan Vu; Yuki Muramatsu; Duangjai Ochaikul; Somboon Tanasupawat; Yasuyoshi Nakagawa. Description of Komagataeibacter gen. nov., with proposals of new combinations (Acetobacteraceae). J Gen Appl Microbiol. 2012.

2.  Y Yamada; K Hoshino; T Ishikawa. The phylogeny of acetic acid bacteria based on the partial sequences of 16S ribosomal RNA: the elevation of the subgenus Gluconoacetobacter to the generic level. Biosci Biotechnol Biochem. 1997-08.

3.  P H Sneath. Evidence from Aeromonas for genetic crossing-over in ribosomal sequences. Int J Syst Bacteriol. 1993-07.

4.  Yuzo Yamada. Transfer of Gluconacetobacter kakiaceti, Gluconacetobacter medellinensis and Gluconacetobacter maltaceti to the genus Komagataeibacter as Komagataeibacter kakiaceti comb. nov., Komagataeibacter medellinensis comb. nov. and Komagataeibacter maltaceti comb. nov. Int J Syst Evol Microbiol. 2014-02-12.

5.  Yingfeng Luo; Xiaoli Xu; Zonghui Ding; Zhen Liu; Bing Zhang; Zhiyu Yan; Jie Sun; Songnian Hu; Xun Hu. Complete genome of Phenylobacterium zucineum--a novel facultative intracellular bacterium isolated from human erythroleukemia cell line K562. BMC Genomics. 2008-08-13.

6.  Alice R Wattam; James J Davis; Rida Assaf; Sébastien Boisvert; Thomas Brettin; Christopher Bun; Neal Conrad; Emily M Dietrich; Terry Disz; Joseph L Gabbard; Svetlana Gerdes; Christopher S Henry; Ronald W Kenyon; Dustin Machi; Chunhong Mao; Eric K Nordberg; Gary J Olsen; Daniel E Murphy-Olson; Robert Olson; Ross Overbeek; Bruce Parrello; Gordon D Pusch; Maulik Shukla; Veronika Vonstein; Andrew Warren; Fangfang Xia; Hyunseung Yoo; Rick L Stevens. Improvements to PATRIC, the all-bacterial Bioinformatics Database and Analysis Resource Center. Nucleic Acids Res. 2016-11-29.

7.  Seok-Hwan Yoon; Sung-Min Ha; Soonjae Kwon; Jeongmin Lim; Yeseul Kim; Hyungseok Seo; Jongsik Chun. Introducing EzBioCloud: a taxonomically united database of 16S rRNA gene sequences and whole-genome assemblies. Int J Syst Evol Microbiol. 2017-05-30.

8.  Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2018-01-04.

9.  Zhao Xu; Bailin Hao. CVTree update: a newly designed phylogenetic study platform using composition vectors and whole genomes. Nucleic Acids Res. 2009-04-26.

10.  Aidan C Parte. LPSN--list of prokaryotic names with standing in nomenclature. Nucleic Acids Res. 2013-11-15.