Sarah Bello, Bashudev Rudra, Radhey S Gupta
Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario, L8N 3Z5, Canada.
Abstract
Keywords:
Fructobacillus, Oenococcus and Weissella; conserved signature indels (CSIs); description of the genus Periweissella; molecular signatures specific for Leuconostoc; phylogenomic and comparative genomic analyses
Supplementary data for this manuscript can be found at https://doi.org/10.6084/m9.figshare.18866273.v1 [1].
Introduction
The family
[2, 3], comprises Gram-positive, non-spore-forming, anaerobic or aerotolerant bacteria, which are usually found in nutrient-rich environments such as milk, meat, vegetable products, roots, foods and fermented products [3-5]. Similar to the other lactic acid bacteria, the major end products of their heterofermentative carbohydrate metabolism include lactic acid, CO2, ethanol and/or acetate [3, 4, 6]. Until recently, the family
consisted of five genera, namely
[7],
[8],
[4, 9, 10],
[11] and
[5, 12], which have now been merged within the family
[13]. However, as the name
remains a valid name, for the sake of convenience, in the present study we will be referring to this group of species by this family name. Most of the genera within
have originated from the taxonomic reassignments of species from the genus
[2]. Earlier studies based on phylogenetic analysis of 16S rRNA and 23S rRNA gene sequences [12, 14] led to the transfer of a number of
species into two novel genera viz.
[12] and
[11]. In 2008, Endo and Okada [8], based on their analysis of the genus
using 16S rRNA, 16S–23S rRNA gene intergenic spacer region, and rpoC and recA genes, transferred four additional
species into a new genus
[8]. Members of the genus
also differ from other
species in terms of their morphology, preference for growth in presence of fructose, and several genomic characteristics [15, 16]. Later, Praet et al. [7], based on their analysis of 16S rRNA gene sequences and the G+C content, proposed the creation of the genus
, which branches in between the genera
and
in the 16S rRNA gene tree. However, in contrast to the other members of the family
, whose evolutionary relationships have been extensively studied based on genome sequences [17-20], our current understanding of the evolutionary relationships and classification of species within the family
is based primarily on analysis of the 16S rRNA gene and a limited number of other gene sequences [4, 5, 16, 21–23], and it requires further investigation. In the published 16S rRNA gene trees, the genera
,
and
form a strongly supported clade [7, 8]. Additionally, in most of the phylogenetic studies of
species [24-30], several species from this genus (viz.
,
,
,
and
) branch distinctly from the main clade of
species containing the type species (
) of this genus [12]. These studies suggest that the
species are likely to comprise two phylogenetically distant clades, but this inference needs to be confirmed and supported by other more reliable means. Thus, it is important to carry out detailed phylogenomic and comparative genomic studies of members of the family
to reliably discern their evolutionary relationships.
The family
presently contains 49 validly published and three non-validly published species [31]. In the past few years, as a result of several major genomic-sequence projects [32-34], genome sequences for 47 of the
species have become available in the NCBI database (www.ncbi.nlm.nih.gov/genome/). These genomes provide a comprehensive resource for undertaking detailed studies to clarify the evolutionary relationships among
species. Using these genome sequences, we have reconstructed a highly resolved phylogenetic tree based on concatenated sequences of 498 core proteins for this family. The sequence alignments of the core proteins were also utilized to determine the pairwise average amino acid identity (AAI) [35] for different members of this family. In addition, we have performed comparative genomic analyses to identify molecular signatures in the form of conserved signature indels (CSIs) in protein sequences, which are specific for different main clades within the family
. Molecular markers such as the CSIs, which are uniquely shared by a given group of organisms, provide strong evidence independent of phylogenetic analyses of the monophyly and genetic cohesiveness of different observed clades. Furthermore, the CSIs specific for a given clade provide reliable means for the circumscription of these clades in molecular terms [36-40]. The results of these analyses have identified 46 CSIs in diverse proteins that are specific for different strongly supported clades within the family
, including those that are specific for the genera
,
and
. In addition, the results presented here provide strong evidence that the genus
is polyphyletic and these species form two distinct clades. The members from these two clades can be reliably distinguished from each other based on their branching in phylogenetic trees and multiple identified CSIs that are exclusively shared by the species from these two clades. Based on the compelling evidence obtained from these studies, we are proposing a division of the genus
into an emended genus
and a novel genus Periweissella gen. nov.
Methods
Reconstruction of phylogenetic trees
Genome sequences for 47
species, whose annotated protein sequences were available in the NCBI genome database, were downloaded. In addition, the genomes of three
species (
,
and
) were included in our dataset for rooting the tree. Using these genome sequences, a rooted phylogenomic tree was reconstructed based on concatenated sequences of all core proteins from the family
by methods detailed in our earlier work [38, 41, 42]. Briefly, the CD-HIT program was used to identify protein families where the proteins were present in at least 80 % of the genomes in the dataset and shared at least 50 % of sequence length and identity [43]. The Clustal Omega program [44] was then used to generate multiple sequence alignments (MSAs) of the proteins. These MSAs were converted into profile hidden Markov models [45], which were then used to search for other members of the protein families in the input genomes. The alignments obtained were trimmed using the trimAl program [46] to remove poorly aligned sections and to create a core protein alignment. The final alignment used for phylogenetic analysis was based on 498 proteins and contained 163 109 aligned positions. A maximum-likelihood tree based on this sequence alignment was initially reconstructed with FastTree 2 [47], based on the Whelan and Goldman model [48], and was then optimized using RAxML based on the Le and Gascuel model [49]. RAxML was also used to calculate Shimodaira–Hasegawa (SH)-like statistical support values for each node. The resultant phylogenetic tree was drawn using MEGA X [50]. The sequence alignment of the 498 core proteins was also used to determine the pairwise average amino acid sequence identity (AAI) between the type species of different genera within the family
[51].
A 16S rRNA gene tree was also reconstructed based on sequences of all
species/type strains obtained from the SILVA ribosomal RNA database [52] and the NCBI genome database (www.ncbi.nlm.nih.gov).
species
and
were included in the dataset for rooting purposes. The sequences were aligned using the MUSCLE program in MEGA X [50]. The non-conserved regions as well as regions with gaps were removed, leaving 1269 positions in the final aligned dataset. A maximum-likelihood phylogenetic tree based on this dataset was created using MEGA X [50], employing the Tamura–Nei model [53], based on 100 bootstrap replicates.
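The core-protein selection and the identity calculation described in this section can be illustrated with a minimal sketch. It is not the pipeline used in this study (which relied on CD-HIT, Clustal Omega, profile HMMs, trimAl and RAxML); the function names and the input layout (a mapping from family to genomes, and a mapping from genome to aligned sequence) are assumptions made for illustration only. The 0.8 default mirrors the 80 % presence cut-off stated above.

```python
from itertools import combinations

GAP = "-"


def core_families(family_to_genomes, n_genomes, min_fraction=0.8):
    """Return the families found in at least min_fraction of the genomes.

    family_to_genomes: hypothetical mapping of family id -> iterable of
    genome ids in which a member of that family was detected.
    """
    threshold = min_fraction * n_genomes
    return sorted(
        family
        for family, genomes in family_to_genomes.items()
        if len(set(genomes)) >= threshold
    )


def pairwise_identity(seq_a, seq_b):
    """Fraction of identical residues over columns where neither row is a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == GAP or b == GAP:
            continue  # skip columns with a gap in either sequence
        compared += 1
        matches += a == b
    return matches / compared if compared else 0.0


def identity_matrix(alignment):
    """alignment: mapping of genome id -> aligned (concatenated) sequence.

    Returns a dict keyed by sorted genome-id pairs.
    """
    return {
        (x, y): pairwise_identity(alignment[x], alignment[y])
        for x, y in combinations(sorted(alignment), 2)
    }
```

Applied to a concatenated core-protein alignment, `identity_matrix` yields one identity value per genome pair, which is the kind of quantity summarized in an AAI matrix. Published AAI tools differ in how they handle gaps and partial alignments, so values from this sketch should not be expected to match them exactly.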
Identification of CSIs
The identification of CSIs was carried out as described in detail in earlier work [37, 54]. Briefly, blastp searches using the NCBI non-redundant database were carried out on all proteins from the genomes of
,
and
. Based on these BLAST searches, protein sequences were obtained for 10–15 divergent
species and 8–10 species from other bacterial taxa. The multiple sequence alignments of various proteins were created using ClustalX 2.1. These alignments as well as the alignment for various protein families obtained from the CD-HIT program were examined for insertions or deletions of fixed length that were present in conserved regions [i.e. flanked on both sides by at least 4–5 conserved amino acids (aa) in the neighbouring 40–50 aa] and specifically shared by species from different main clades of
species in the core genome tree. The query sequences of interest containing the identified conserved indels and their flanking 30–50 aa (generally beginning and ending with a stretch of completely conserved amino acid residues) were searched again by blastp against the NCBI nr (non-redundant) database and the top 500 hits were examined. Based on these blastp searches, conserved indels which were specifically shared by all or most of the species from different main clades of
were identified and further formatted using the SIG_CREATE and SIG_STYLE programs (available from http://gleans.net/) [37]. Due to space constraints, sequence information is presented in the main figures for only a limited number of species. However, unless otherwise stated, the CSIs described here are exclusively shared by the indicated groups of
and absent in all other bacterial homologues in the top 500 blastp hits examined. More detailed information for different CSIs is provided in the supplementary data [55].
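The screening criterion described above (an indel of fixed length, shared by one group only, and flanked on both sides by conserved residues) can be sketched as a simple scan over a multiple sequence alignment. This is an illustrative approximation and not the procedure or software used in this study: the function names, the dictionary input and the rule that a "conserved" column must be identical in every sequence are assumptions. The defaults follow the lower bounds quoted in the text (at least 4 conserved positions within the neighbouring 40 columns). The follow-up blastp search of each candidate is not modelled.

```python
def conserved_columns(seqs):
    """Indices of columns where all sequences carry the same non-gap residue."""
    return {
        i
        for i, column in enumerate(zip(*seqs))
        if column[0] != "-" and all(c == column[0] for c in column)
    }


def find_candidate_indels(alignment, ingroup, min_flank_conserved=4, flank_window=40):
    """Scan an alignment for group-specific indels in conserved regions.

    alignment: mapping of sequence name -> aligned sequence (equal lengths).
    ingroup:   set of names forming the clade of interest.
    Returns (start, end, kind) tuples, end exclusive; kind is 'insertion' or
    'deletion' relative to the ingroup.
    """
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    in_seqs = [s for name, s in alignment.items() if name in ingroup]
    out_seqs = [s for name, s in alignment.items() if name not in ingroup]
    if not in_seqs or not out_seqs:
        raise ValueError("both ingroup and outgroup sequences are required")

    def column_kind(i):
        in_gaps = [s[i] == "-" for s in in_seqs]
        out_gaps = [s[i] == "-" for s in out_seqs]
        if all(in_gaps) and not any(out_gaps):
            return "deletion"   # residues missing only in the ingroup
        if not any(in_gaps) and all(out_gaps):
            return "insertion"  # residues present only in the ingroup
        return None

    conserved = conserved_columns(seqs)
    hits = []
    i = 0
    while i < length:
        kind = column_kind(i)
        if kind is None:
            i += 1
            continue
        j = i
        while j < length and column_kind(j) == kind:
            j += 1  # extend the run of columns with the same pattern
        left = sum(1 for c in range(max(0, i - flank_window), i) if c in conserved)
        right = sum(1 for c in range(j, min(length, j + flank_window)) if c in conserved)
        if left >= min_flank_conserved and right >= min_flank_conserved:
            hits.append((i, j, kind))
        i = j
    return hits
```

Requiring the gap pattern to be identical across the whole ingroup is what enforces the "fixed length" condition. A real screen would tolerate some variation in the flanking columns and would allow a few exceptions among the ingroup, as the text notes for indels shared by "all or most" species.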
Results
Phylogenetic analysis of the species from the family
To elucidate the evolutionary relationships among members of the family
, we have reconstructed a maximum-likelihood phylogenomic tree based on the genomes of 47
species whose sequences were available in the NCBI database. The accession numbers and some other characteristics of the genomes that were utilized for this tree reconstruction are provided in Table S1 (available in the online version of this article) [55]. The resulting tree, which is based on concatenated sequences for 498 proteins that are commonly shared by the species from
genomes, is shown in Fig. 1. This tree, which will be referred to as the core genome tree, was rooted using the sequences for representative
species (see Methods). As seen from Fig. 1, all of the nodes in this core genome tree are supported by 100 % SH-like statistical support values (similar to the bootstrap scores), which indicates that the evolutionary relationships amongst
species as seen here are reliable. The tree shown in Fig. 1 provides several important insights into the evolutionary relationships among members of the family
. First, the tree shows that species from the genera
,
and
form strongly supported clades. For the genus
, in addition to the genomes for named species, genome sequences are also available for a number of unnamed species. Information for these
species is included in a phylogenetic tree presented in Fig. S1 [55] and these species also grouped reliably with the other members of the genus
. Second, the species
, which is the sole species in the genus
, branches distinctly in between the genera
and
. Third, the tree also shows that the species from the genera
,
and
form a strongly supported clade, which is separated from the neighbouring genus
by a long branch. We will be referring to this clade comprising the genera
,
and
as the ‘larger
clade’. Lastly, a fourth important aspect of this tree is that it shows that species from the genus
do not form a monophyletic lineage, but they are separated into two distinct clades/lineages. Of the two
species clades, the larger clade contains the species
, which is the type species of this genus. Hence, we have designated this clade as the ‘
main clade’. The second
clade comprises three genome-sequenced species (
,
,
) and this clade forms an outgroup of the remainder of the
species as well as other
genera. We will be referring to this smaller clade as the ‘
clade 2’.
Fig. 1.
A bootstrapped maximum-likelihood tree for 47 genome-sequenced
species based on concatenated sequences for 498 core proteins. The statistical support values for different branches are indicated on the nodes. This tree was rooted by using species from the genus Lactobacillus. Non-validly published species are shown within quotation (“ ”) marks. Different main species clades observed in the tree are identified by the names of the genera or other designated clade names.
In addition to the core genome tree, we have also reconstructed a phylogenetic tree based on 16S rRNA gene sequences for the type strains of all species from the family
(Fig. 2). The overall evolutionary relationships among the
species in the 16S rRNA gene tree are very similar to those seen in the core genome tree (Fig. 1). The species from the genera
and
formed monophyletic clades. All species from the genus
, except
which branched more deeply, also formed a well-supported clade. Similar branching of
in the 16S rRNA gene tree has also been observed in an earlier study [8]. Furthermore, a close relationship of the species from the genera
,
and
is also observed in this tree. In addition, species from the genus
also formed two distinct clades, which were separated by long branches. Of these two
species clades, one clade consisting of three genome-sequenced species, namely
,
and
, and two other species (viz.
and
) formed a sister lineage of the remainder of the
species and other
genera. As the tree based on core genome proteins is more reliable and provides higher resolution than the 16S rRNA gene tree, we have generally relied on it for most of the phylogenetic inferences derived in this study.
Fig. 2.
A maximum-likelihood phylogenetic tree based on 16S rRNA gene sequences for the type strains of all validly published
species.
species
and
were used to root the tree. The accession numbers of the 16S rRNA gene sequences are given in brackets after each species in the tree. Different main clades within the tree are marked with the names of the genera or other given names.
The sequence alignment of the core genome proteins from members of the family
was also used to calculate pairwise AAI, which provides a measure of the overall genetic relatedness among different species [51]. The matrix depicting the pairwise AAI information for members of the family
is presented in Fig. 3. Detailed information regarding pairwise AAI is provided in Table S2. In Fig. 3, the genome pairs exhibiting higher sequence similarities are shown by a darker shade of green. As seen from Fig. 3, based on the AAI similarity data, species from various genera within the family
exhibit higher intra-genus AAI values in comparison to the inter-group AAI values (see Table S2) [55]. Based on the AAI values, a closer relationship is also observed between members of the genera
,
and Fructobacillus. The intra-group AAI value for this larger
clade is 0.73 in comparison to the inter-generic AAI values, which are in the range of 0.64–0.69. Based on AAI analysis, species from the two
clades are also more closely related to each other than to the other
genera. However, AAI values are not a reliable tool for the demarcation of genera as there is no established threshold for distinction between adjacent bacterial taxa [35, 56].
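The comparison of intra-group and inter-group AAI made in this paragraph can be sketched as follows. The helper below is a hypothetical illustration, not the calculation used to produce Fig. 3 or Table S2; the function name and the pairwise-dictionary input are assumptions, and no values from this study are embedded in it.

```python
from statistics import mean


def group_aai_summary(pairwise_aai, groups):
    """Summarize pairwise AAI values by group membership.

    pairwise_aai: mapping of (species_a, species_b) -> AAI value.
    groups:       mapping of species -> group label (e.g. genus or clade).
    Returns (mean intra-group AAI, mean inter-group AAI); an entry is None
    when no pair of that kind exists.
    """
    intra, inter = [], []
    for (a, b), value in pairwise_aai.items():
        # a pair is intra-group when both species carry the same label
        (intra if groups[a] == groups[b] else inter).append(value)
    return (
        mean(intra) if intra else None,
        mean(inter) if inter else None,
    )
```

A higher intra-group than inter-group mean is consistent with a cohesive clade, but, as noted above, it does not by itself establish a genus boundary because no accepted AAI threshold exists.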
Fig. 3.
A matrix indicating the pairwise percentage average amino acid identities of the species from different genera within the family
. Genome pairs sharing higher amino acid identity are shaded more darkly (green). The regions of the matrix corresponding to different clades have been marked and labelled.
Identification of molecular markers specific for different clades within the family
The results of our phylogenomic studies and AAI analysis indicate that the family
comprises a number of distinct clades including some novel species groupings. However, based upon the branching of the species in phylogenetic trees, it is often difficult to reliably delimit the boundaries of different clades [39]. Hence, we have also conducted detailed comparative studies on protein sequences from
genomes to identify molecular markers in the form of CSIs, which are uniquely shared by members of different observed clades. The CSIs in gene/protein sequences, which are specifically shared by the members of a given clade, constitute synapomorphic characteristics and they provide an important class of molecular markers for evolutionary and taxonomic studies [38, 41, 42, 57]. Our analyses of protein sequences from
genomes have identified 46 CSIs that are specific for different clades within this family and provide important means for their demarcation in molecular terms. The group-specificities and characteristics of the identified CSIs are described below.
CSIs specific for the genera
and
and for the larger
clade
and
are two of the main genera within the family
. The work that we have carried out has identified five CSIs each in different proteins that are specifically shared by the members of each of these two genera. In Fig. 4(a), we present an example of a CSI consisting of a 2 aa insert in an RNA-binding transcriptional accessory protein, which is uniquely shared by all species from the genus
, but not found in any other bacteria within the top 500 blastp hits. Interestingly, the homologs of this protein were not found in
,
and
species. More detailed information for this CSI and the sequence information for four other CSIs, which are also shown to be specific for the genus
, is provided in Figs S2–S6 and some of their characteristics are summarized in Table 1. In Fig. 4(b), we show the partial sequence alignment of the protein Asp-tRNA(Asn)/Glu tRNA(Gln) amidotransferase subunit (GatB). The 4 aa insert highlighted in this figure is specific for all members of the genus
including a number of unnamed
species. Besides the
species, this insert is again not found in any other
species or other bacteria within the top 500 blastp hits. In addition to the CSI shown in Fig. 4(b), our analysis has identified four additional CSIs in other proteins, which are also specific for members of the genus
. Detailed sequence information for all of the
-specific CSIs is provided in Figs S7–S11 [55] and some of their characteristics are summarized in Table 1.
Fig. 4.
(a) Partial sequence alignment of the RNA-binding transcriptional accessory protein showing a 2 aa insertion (boxed) that is exclusively shared by all species from the genus
. Detailed sequence information for this CSI as well as four other CSIs specific for the genus
are presented in Figs S2–S6 and some of their characteristics are summarized in Table 1. (b) Excerpts from the sequence alignment of the protein Asp-tRNA(Asn)/Glu tRNA(Gln) amidotransferase subunit (GatB) showing a 4 aa insertion in a conserved region that is specific for all species from the genus
. Detailed sequence information for this CSI as well as four other CSIs specific for the genus
are presented in Figs S7–S11 [55] and some of their characteristics are summarized in Table 1.
Table 1.
Conserved signature indels specific for different clades of the family
| Protein name | Accession no. | Indel size | Indel position | Figure no. | Specificity |
|---|---|---|---|---|---|
| RNA-binding transcriptional accessory protein | WP_135197391 | 2 aa Ins | 205–243 | Figs 4(a) and S2 | Genus of Fig. 4(a) |
| BMP family protein* | WP_150280547 | 1 aa Ins | 51–98 | Fig. S3 | Genus of Fig. 4(a) |
| BMP family protein* | WP_150280547 | 1 aa Ins | 90–122 | Fig. S4 | Genus of Fig. 4(a) |
| Universal stress protein | SPJ43140 | 2 aa Del | 5–41 | Fig. S5 | Genus of Fig. 4(a) |
| Copper resistance protein | WP_150259299 | 2 aa Del | 37–74 | Fig. S6 | Genus of Fig. 4(a) |
| Asp-tRNA(Asn)/Glu tRNA(Gln) amidotransferase subunit GatB | WP_203617595 | 4 aa Ins | 323–362 | Figs 4(b) and S7 | Genus of Fig. 4(b) |
| Xanthine phosphoribosyltransferase | WP_059378047 | 1 aa Ins | 136–182 | Fig. S8 | Genus of Fig. 4(b) |
| ABC transporter ATP-binding protein/permease | WP_059376430 | 1 aa Del | 326–360 | Fig. S9 | Genus of Fig. 4(b) |
| NCS2 family nucleobase:cation symporter | WP_187753602 | 3 aa Ins | 205–245 | Fig. S10 | Genus of Fig. 4(b) |
| Ribonuclease J | WP_059375728 | 1 aa Ins | 186–215 | Fig. S11 | Genus of Fig. 4(b) |
| Mevalonate kinase | WP_203618360 | 1 aa Del | 146–183 | Figs 5(a) and S12 | Two genera of Fig. 5(a) |
| d-Alanyl-lipoteichoic acid biosynthesis protein DltB† | WP_061992753 | 1 aa Ins | 36–71 | Fig. S13 | Two genera of Fig. 5(a) |
| Phenylalanine tRNA ligase subunit beta | WP_091502306 | 5 aa Ins | 84–127 | Figs 5(b) and S14 | Larger clade (three genera) |
| Valine tRNA ligase | WP_010386363 | 2 aa Del | 650–689 | Fig. S15 | Larger clade (three genera) |
| Phenylalanine tRNA ligase subunit alpha | WP_010692108 | 4 aa Ins | 110–148 | Fig. S16 | Larger clade (three genera) |
| Diphosphomevalonate decarboxylase protein | WP_011680078 | 1 aa Ins | 210–249 | Fig. S17 | Larger clade (three genera) |
| Single-stranded-DNA-specific exonuclease RecJ | WP_059377347 | 1 aa Ins | 254–295 | Fig. S18 | Larger clade (three genera) |
| RluA family pseudouridine synthase | WP_089937871 | 3 aa Ins | 234–376 | Fig. S19 | Larger clade (three genera) |
| ATP-dependent Clp protease ATP-binding subunit ClpX | WP_089938723 | 1 aa Ins | 13–42 | Fig. S20 | Larger clade (three genera) |
| PolC-type DNA polymerase III | WP_089939457 | 1 aa Ins | 926–957 | Fig. S21 | Larger clade (three genera) |
| Transcription-repair coupling factor | WP_091502582 | 2 aa Ins | 531–576 | Fig. S22 | Larger clade (three genera) |
| Chromosome segregation protein SMC | WP_148606465 | 2 aa Del | 565–606 | Fig. S23 | Larger clade (three genera) |

*Not shared by Leuconostoc fallax.
†Also shared by two Lactobacillaceae species.

Fig. 5.
(a) A partial sequence alignment of the protein mevalonate kinase showing a 1 aa deletion (boxed) that is exclusively shared by all species from the genera
and
. Sequence information for one more CSI specific for these two genera is presented in Fig. S13 (Table 1). (b) Excerpts from the sequence alignment of the protein phenylalanine tRNA ligase subunit beta showing a 5 aa insertion in a conserved region that is specifically present in all species from the genera
,
and
. Sequence information for nine other CSIs showing similar specificities are presented in Figs S15–S23 [55] and some of their characteristics are summarized in Table 1.

In our core genome tree as well as in the 16S rRNA gene tree (Figs 1 and 2), the species
forms an outgroup of the genus Fructobacillus. A close relationship of
to
has also been observed in earlier studies [7]. Our analysis has identified two CSIs that are commonly and exclusively shared by the members of these two genera. Sequence information for one of these CSIs, consisting of a 1 aa deletion in the protein mevalonate kinase is presented in Fig. 5(a). As can be seen, the identified CSI is specifically shared by
and all of the
species including unnamed members of this genus. More detailed information for this CSI, as well as sequence information for another CSI showing similar specificity found in the protein d-alanyl-lipoteichoic acid biosynthesis protein (DltB), is provided in Figs S12 and S13 and some of their characteristics summarized in Table 1.
In our core genome tree, members of the
,
and
form a strongly supported clade, which we have designated as the ‘Larger
clade’. A close relationship of the species from these three genera is also evident from the results of the AAI matrix (Fig. 3). Furthermore, a close relationship of the species from these three genera is independently strongly supported by our identification of 10 CSIs in different proteins, which are specifically shared by the members of these three genera. One example of a CSI specific for these three genera is presented in Fig. 5(b), where a 5 aa insert in the protein phenylalanine tRNA ligase beta subunit is exclusively present in all members of these three genera, but it is not found in any other
genera or other bacteria. More detailed information for this CSI and sequence information for the other nine CSIs, which are also specific for the larger
clade, is provided in Figs S14–S23 [55] and some of their characteristics are summarized in Table 1. The results from these CSIs provide strong evidence that the members of these three genera (viz.
,
and
) shared a common ancestor exclusive of all other bacteria and they provide reliable means to demarcate the species from this clade in molecular terms.
CSIs specific for the genus
and for the two clades of
species
In our core genome tree, members of the genus
form a strongly supported clade, which is separated from all other
genera by a long branch (Fig. 1). The distinctness of the genus
from all other
genera is also strongly supported by 13 identified CSIs, which are exclusively shared by the members of this genus. Sequence information for one of these CSIs is shown in Fig. 6(a), where a 4 aa insertion in the protein DNA-directed RNA polymerase beta subunit is exclusively present in all
species but not found in the protein homologs from any other bacteria in the top 500 blastp hits. Detailed sequence information for this CSI as well as 12 others, which are also specific for the genus
, is presented in Figs S24–S36 [55] and some of their characteristics are summarized in Table 2.
Fig. 6.
(a) Partial sequence alignment of the protein DNA-directed-RNA polymerase subunit beta showing a 4 aa insertion (boxed) that is exclusively present in all species from the genus
. Sequence information for 12 other CSIs specific for the genus
are presented in Figs S25–S36 and some of their characteristics are summarized in Table 2. (b) Excerpts from the sequence alignment of the protein phospho-N-acetylmuramoyl-pentapeptide-transferases showing an 8 aa insertion in a conserved region that is exclusively shared by all species from the
main clade. Sequence information for five other CSIs specific for this clade are presented in Figs S38–S42 [55] and some of their characteristics are summarized in Table 2. (c) Partial sequence alignment of the protein DEAD/DEAH box helicase showing a 3 aa deletion in a conserved region that is specifically present in all species from the
clade 2. Sequence information for four other CSIs showing similar specificity is presented in Figs S44–S47 and some of their characteristics are summarized in Table 2.
Table 2.
Conserved signature indels specific for the genus
and for the two
species clades
| Protein name | Accession no. | Indel size | Indel position | Figure no. | Specificity |
|---|---|---|---|---|---|
| DNA-directed-RNA polymerase subunit beta | WP_011677688 | 4 aa Ins | 366–408 | Figs 6(a) and S24 | Genus of Fig. 6(a) |
| Preprotein translocase subunit SecY | WP_071450607 | 1 aa Ins | 31–72 | Fig. S25 | Genus of Fig. 6(a) |
| Preprotein translocase subunit SecY | WP_096866568 | 2 aa Ins | 358–408 | Fig. S26 | Genus of Fig. 6(a) |
| 50S ribosomal protein L13 | WP_096877109 | 1 aa Ins | 62–98 | Fig. S27 | Genus of Fig. 6(a) |
| Molecular chaperone DnaK | WP_002816776 | 4 aa Del | 244–287 | Fig. S28 | Genus of Fig. 6(a) |
| Riboflavin kinase | WP_180370943 | 5 aa Ins | 22–83 | Fig. S29 | Genus of Fig. 6(a) |
| Sua5/YciO/YrdC/YwlC family protein | WP_143795362 | 1 aa Ins | 54–89 | Fig. S30 | Genus of Fig. 6(a) |
| Amino acid permease | WP_143805135 | 2 aa Ins | 24–63 | Fig. S31 | Genus of Fig. 6(a) |
| RluA family pseudouridine synthase | WP_007746277 | 1 aa Ins | 145–189 | Fig. S32 | Genus of Fig. 6(a) |
| TatD family hydrolase | WP_180369227 | 2 aa Ins | 48–70 | Fig. S33 | Genus of Fig. 6(a) |
| YidC/Oxa1 family membrane protein insertase | WP_180369397 | 1 aa Ins | 19–76 | Fig. S34 | Genus of Fig. 6(a) |
| Class I SAM-dependent RNA methyltransferase | WP_002824330 | 1 aa Ins | 73–111 | Fig. S35 | Genus of Fig. 6(a) |
| GTPase HflX | WP_071438355 | 2 aa Ins | 67–135 | Fig. S36 | Genus of Fig. 6(a) |
| Phospho-N-acetylmuramoyl-pentapeptide-transferases | WP_075270024 | 8 aa Ins | 281–315 | Figs 6(b) and S37 | Main clade |
| APC family permease | WP_070229375 | 1 aa Del | 579–609 | Fig. S38 | Main clade |
| Alanine tRNA ligase* | WP_075269877 | 1 aa Del | 368–401 | Fig. S39 | Main clade |
| Cytochrome d ubiquinol oxidase subunit II | WP_070230395 | 1 aa Ins | 165–205 | Fig. S40 | Main clade |
| Response regulator transcription factor | WP_070230808 | 1 aa Ins | 178–211 | Fig. S41 | Main clade |
| Endonuclease MutS2 | WP_115471653 | 2 aa Ins | 614–655 | Fig. S42 | Main clade |
| DEAD/DEAH box helicase | WP_168722201 | 3 aa Del | 86–120 | Figs 6(c) and S43 | Clade 2 |
| Hydroxyethylthiazole kinase | WP_133364496 | 2 aa Ins | 48–93 | Fig. S44 | Clade 2 |
| ArgR family transcriptional regulator | WP_133363833 | 1 aa Ins | 3–45 | Fig. S45 | Clade 2 |
| Flp pilus assembly complex ATPase component TadA | WP_168722024 | 1 aa Ins | 136–174 | Fig. S46 | Clade 2 |
| Amidophosphoribosyl transferase protein | WP_133362570 | 1 aa Ins | 64–102 | Fig. S47 | Clade 2 |

*Also shared by a few Bacillales species.

The
species form two distinct clades in the core genome tree as well as in the tree based on 16S rRNA gene sequences. Our comparative genomic analyses have identified multiple CSIs that are specific for the members of these two clades, reliably distinguishing them from each other as well as from other bacteria. Of these CSIs, six are exclusively found in the species from the ‘
main clade’. One example of such a CSI is presented in Fig. 6(b), where an 8 aa insert in the protein phospho-N-acetylmuramoyl-pentapeptide-transferases is exclusively present in all species from the
main clade but not found in the protein homologs from other
species or any other bacteria in the top 500 blastp hits. Detailed sequence information for this CSI and five other CSIs specific for the
main clade is presented in Figs S37–S42 [55] and some of their characteristics are summarized in Table 2. The
clade 2 comprises three genome-sequenced species (viz.
,
,
) and five of the identified CSIs are exclusively shared by these three species. Sequence information for one of the CSIs specific for
clade 2 is presented in Fig. 6C. In the example shown, a 3 aa deletion in the protein DEAD/DEAH box helicase is exclusively present in all three members of
clade 2, but not found in the protein homologs from any other bacteria in the top 500 blastp hits. Detailed sequence information for this CSI and four other CSIs that are also specific for
clade 2 is presented in Figs S43–S47 [55] and some of their characteristics are summarized in Table 2.
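The clade-specificity criterion described above (an indel present in every member of a clade and absent from all other homologs examined) can be illustrated with a minimal sketch. It assumes a pre-computed multiple sequence alignment held in a dictionary; the helper names `has_insert` and `is_clade_specific_insert`, the toy sequences, the species labels and the column range are all hypothetical and are not taken from the figures or from the authors' pipeline.

```python
def has_insert(aligned_seq, start, end):
    """Return True if the aligned sequence has residues (no gap characters)
    in every alignment column from start up to, but not including, end."""
    region = aligned_seq[start:end]
    return len(region) > 0 and "-" not in region


def is_clade_specific_insert(alignment, clade, start, end):
    """An insert counts as clade-specific when every clade member carries it
    and no sequence outside the clade does."""
    inside = [has_insert(seq, start, end)
              for name, seq in alignment.items() if name in clade]
    outside = [has_insert(seq, start, end)
               for name, seq in alignment.items() if name not in clade]
    return bool(inside) and all(inside) and not any(outside)


# Toy alignment (hypothetical): columns 4-6 hold a 3 aa insert.
alignment = {
    "clade_sp1": "MKTLAGQVIDE",
    "clade_sp2": "MKTLSGQVIDE",
    "other_sp1": "MKTL---VIDE",
    "other_sp2": "MRTL---VLDE",
}
clade = {"clade_sp1", "clade_sp2"}

result = is_clade_specific_insert(alignment, clade, 4, 7)
print(result)
```

A clade-specific deletion is the mirror case: the gap is confined to the clade members while all other homologs carry residues in those columns.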
Discussion
Members of the family
belong to a group of bacteria commonly referred to as lactic acid bacteria. Most of these bacteria generally produce lactic acid as a byproduct of sugar degradation [3]. Because of this trait, these bacteria have found widespread usage in food manufacturing for the purpose of various fermentation processes/products [3]. These bacteria are commonly present in human and animal gastrointestinal tracts, plants, dairy products and some beverages and some of them are also of clinical significance [3, 23]. Due to these characteristics, it is important to understand the evolutionary relationships among these bacteria and to identify novel and reliable means for the identification of different groups within these bacteria [3, 4, 13, 15]. In the present work, we have examined the evolutionary relationships among members of the family
based on phylogenetic and comparative analyses of protein sequences from whole genomes. Although the family
has recently been merged within the family
[13], for the sake of convenience, this group is referred to here by the name
, which remains a valid name under the prokaryotic code [58]. Unlike other members of the family
, whose evolutionary relationships have been studied in detail based on genomic sequences [13], our current understanding of the evolutionary relationships among members of the family ‘
’ is mainly based on phylogenetic analysis of 16S rRNA gene and in some cases a few housekeeping genes [11, 21, 22, 59]. Thus, the focus of the present work was to examine the evolutionary relationships among members of the family
based on genome sequence data.
The present work reports comprehensive phylogenomic and comparative analyses on the genome sequences for most of the species (47 of the 52 named species) from the
family using a number of different approaches. The approaches used include: (i) reconstruction of a phylogenetic tree based on concatenated sequences of 498 core proteins from their genomes (Fig. 1); (ii) reconstruction of a phylogenetic tree based on 16S rRNA gene sequences for all
species with validly published names (Fig. 2); (iii) reconstruction of a pairwise AAI matrix for different species based on core genome proteins (Fig. 3), and (iv) detailed analyses of protein sequences from
species to identify CSIs that are specific for members of different clades. These latter studies have identified 46 novel CSIs that are uniquely shared characteristics of different main clades of
species observed in the core genome tree, providing reliable means for their demarcation in molecular terms. The results from all these analyses present a consistent picture concerning the evolutionary relationships among different
species/genera. A conceptual diagram summarizing the results of these studies as well as the numbers and clade specificities of different identified CSIs is presented in Fig. 7.
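As an illustration of how a pairwise AAI matrix of the kind mentioned in approach (iii) can be summarized for a clade, the following minimal sketch averages the AAI values over all pairs of clade members. The function name `mean_pairwise_aai`, the species labels and the AAI values are hypothetical placeholders, not data from Fig. 3.

```python
from itertools import combinations


def mean_pairwise_aai(aai, members):
    """Average AAI over all unordered pairs of the given clade members.

    `aai` maps a frozenset of two species labels to their AAI value."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        raise ValueError("at least two members are needed")
    return sum(aai[frozenset(pair)] for pair in pairs) / len(pairs)


# Hypothetical pairwise AAI values expressed as fractions.
aai = {
    frozenset(("sp_A", "sp_B")): 0.75,
    frozenset(("sp_A", "sp_C")): 0.70,
    frozenset(("sp_B", "sp_C")): 0.74,
    frozenset(("sp_A", "sp_D")): 0.55,
    frozenset(("sp_B", "sp_D")): 0.56,
    frozenset(("sp_C", "sp_D")): 0.54,
}

clade_mean = mean_pairwise_aai(aai, {"sp_A", "sp_B", "sp_C"})
print(round(clade_mean, 2))
```

Keying the matrix by unordered pairs avoids storing each value twice and makes the lookup independent of species order.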
Fig. 7.
A conceptual diagram summarizing the results of our phylogenomic and comparative genomic studies on members of the family
. The numbers of CSIs, which constitute molecular synapomorphies, that are specifically shared by members of different observed clades are shown on the nodes. Members of the Weissella clade 2 are proposed as a novel genus Periweissella gen. nov. The * indicates that the placement of these non-genome sequenced species into the genus Periweissella is based on branching in the 16S rRNA gene tree.
As seen from Fig. 7, the results from different studies support the monophyletic grouping of the species from the genera
,
and
. The members of these genera can also be reliably distinguished from each other as well as from all other bacteria on the basis of five, five and 13 CSIs, respectively, identified in the present work that are uniquely shared properties of the members of these genera. The genus
contains only a single species, which branches in between the genera
and
as an outgroup of the genus
. A close relationship of
to
is also supported by two CSIs that are uniquely shared by these two groups of species. The results presented here also show that members of the genera
,
and
form a strongly supported clade in the core genome tree. A specific grouping of the species from these three genera is also supported by 10 identified CSIs, which are commonly and uniquely shared by the members of these three genera. Furthermore, in an AAI matrix reconstructed based on the core proteins from the family
, the clade consisting of these three genera has an average AAI value of 0.73, which is comparable to the AAI values seen for some other genera viz.
. Thus, based on their phylogenetic grouping, AAI value, and the sharing of large numbers of CSIs, a case can be made for combining species from all three genera into the genus Leuconostoc. However, we do not favour the amalgamation of these three genera, as extensive work on
species provides compelling evidence that they differ from
species in terms of both their morphology and a large number of biochemical characteristics, including their preference for fructose and their need for an electron acceptor for glucose assimilation [3, 8, 15, 16]. In addition,
species have smaller genome sizes and lower G+C content in comparison to the
species [3, 8, 15, 16]. Furthermore, multiple CSIs identified in the present work, which are exclusively shared by either the
or
species, also strongly support the distinctness of these two groups of bacteria.
The results presented here also provide compelling evidence that the members of the genus
do not constitute a monophyletic grouping but instead comprise two distinct unrelated clades, designated in this work as the ‘
main clade’ and ‘
clade 2’. The branching of
species into two distantly related clades is also observed in earlier studies on the members of this genus [24-29]. In the present work, we have identified six and five CSIs that are exclusively found in either different species from the
main clade, or which are specific for the
clade 2 species. In contrast, no CSI was identified that is commonly shared by all
species. The identified CSIs provide strong independent evidence supporting the distinctness of these two clades. It should be noted that the
species are also assigned into different clades in the Genome Taxonomy Database [60], which is now a widely used resource for taxonomic studies. Based upon these results, we are proposing division of the genus
into two genera, an emended genus
corresponding to the
main clade, which contains the type species of this genus Weissella viridescens [61], and a new genus Periweissella gen. nov., harbouring various species from the
clade 2.
The CSIs in protein sequences result from rare genetic changes [37, 39, 62]. Hence, the shared presence of these molecular synapomorphies by a given clade of species provides strong evidence, independently of the phylogenetic tree, that the species from that clade shared a common ancestor exclusive of all other bacteria and that they are specifically related to each other [37, 39, 62]. Additionally, earlier work on CSIs provides evidence that these molecular markers possess a high degree of predictive ability to be found in other unidentified or uncharacterized members of these clades [39, 41, 63]. In the present work, the CSIs specific for the genus
are not only commonly shared by all named species from this genus, but are also found in several unnamed strains/species of
, demonstrating the predictive ability of these markers to be present in other novel or uncharacterized members of a given group. In view of these characteristics, the CSIs that are specific for different clades now provide novel and reliable means for the demarcation of different clades of organisms in molecular terms and have proven very useful for evolutionary/taxonomic studies [41, 42, 57, 63]. To incorporate the information for the CSIs that are specific for the genera
,
and
, emended descriptions of these taxa are also provided. The descriptions of the emended and novel taxa are given below.
Emended description of the genus
van Tieghem 1878 (Approved lists 1980)
(Leu.co.nos’toc. Gr. masc. adj. leukos, clear, light; N.L. neut. n.
, algal generic name; N.L. neut. n.
, colourless
).
The description of this genus is partially based on the original description by van Tieghem et al. [9] and Björkroth et al. [4]. Cells are Gram-positive, non-spore-forming, non-fructophilic, facultatively anaerobic, heterofermentative, non-motile, catalase- and oxidase-negative, ovoid or coccus-shaped bacteria. Most species have been isolated from fermented dairy and legumes. They grow within the temperature range of 10–40 °C with optimum growth around 25–30 °C in medium with pH between 6 and 7. Growth occurs at NaCl concentrations of 0–6 % (w/v), with optimal growth achieved at 3 and 4 % NaCl for most species. Genome size ranges between 1.6 and 2.1 Mbp and G+C content ranges between 35.4 and 44.0 mol%. The majority of known species can utilize d-glucose, d-fructose, d-mannose and lactose to produce lactic acid and CO2 gas as the end products. Species from this genus are used in dairy industries to produce aroma. Members of this genus form a monophyletic clade in phylogenetic trees based on concatenated sequences for large datasets of core proteins. In addition, they can be reliably distinguished from all other
and
genera by the shared presence of five identified CSIs (Table 1) in the following four proteins: BMP family protein, copper resistance protein, RNA binding transcriptional accessory protein and universal stress protein. These CSIs, in most cases, are exclusively shared by either all or most members of this genus.
The type species is
(Approved Lists) [10].
Emended description of the genus
Endo and Okada 2008
(Fruc.to.ba.cil’lus. N.L. masc. n.
, arbitrarily derived from fructose and
, intended to mean fructose-loving lactic acid-producing bacillus).
The description of this genus is modified from the original description by Endo and Okada [8]. Cells are facultatively anaerobic, short rod-shaped, non-spore-forming, non-motile bacteria. Catalase activity varies between species. Members are heterofermentative and produce acetic acid, CO2 and lactic acid from d-glucose supplemented with electron acceptors and d-fructose. They can be differentiated from other genera based on their fructophilic metabolism, indicating a preference for d-fructose. Temperature range for growth is 5–40 °C with optimal growth at around 30 °C. The pH range for the growth of these species is pH 4–8 with an optimum around pH 6.5. These bacteria have been isolated from fructose-rich environments ranging from flowers to fruits. Most species require NaCl (2.5–8.0 % w/v). The genome size of the known
species ranges between 1.30–1.70 Mbp and their G+C content ranges from 43.90 to 44.70 mol%. Members of this genus form a monophyletic clade in phylogenetic trees based on 16S rRNA gene sequences and concatenated sequences for several large datasets of proteins. In addition, members of this genus can reliably be distinguished from all other
and
genera by the five CSIs described in this work (Table 1), found within the following proteins: Asp-tRNA(Asn)/Glu tRNA(Gln) amidotransferase subunit (GatB), xanthine phosphoribosyltransferase, ABC transporter ATP-binding protein/permease, NCS2 family nucleobase:cation symporter and ribonuclease J. These CSIs, in most cases, are exclusively shared by either all or most members of this genus.
The type species is
[8].
Emended description of the genus
Dicks et al. 1995
(Oe.no.coc’cus. Gr. masc. n. oînos, wine; N.L. masc. n. coccus, berry; from Gr. masc. n. kokkos, grain; N.L. masc. n.
, coccus from wine).
The description of this genus is modified from that given by Dicks et al. [11]. Cells are Gram-positive ellipsoidal cocci that usually appear in pairs. They can grow under either anaerobic or aerobic conditions. Cells are obligately heterofermentative, non-motile, non-spore-forming, and oxidase- and catalase-negative. Member species have been isolated from various alcoholic beverages, which is attributed to the ability of most species to carry out the malolactic fermentation needed for alcohol fermentation. Most species are mesophilic and grow at NaCl concentrations of 0–2.5 % (w/v). Due to their acidophilic nature, they grow at pH values ranging from pH 3.5 to 7.5, with an optimal pH of 6.0–6.8. Temperature range for growth is 5–40 °C, with optimum growth at around 25–30 °C. The DNA G+C content ranges from 37.60 to 42.70 mol%. Members of this genus form a monophyletic clade in phylogenetic trees based on 16S rRNA gene sequences and on concatenated sequences for several large datasets of proteins. In addition, the members of this genus can reliably be distinguished from all other
and
species by the 13 CSIs described in this work in the following proteins: DNA-directed-RNA polymerase subunit beta, two different CSIs in preprotein translocase subunit SecY, 50S ribosomal protein L13, molecular chaperone DnaK, riboflavin kinase, Sua5/YciO/YrdC/YwlC family protein, RluA family pseudouridine synthase, amino acid permease, TatD family hydrolase, YidC/Oxa1 family membrane protein insertase, class I SAM-dependent RNA methyltransferase and GTPase HflX. The described CSIs in most cases are exclusively shared by either all or most members of this genus.
The type species is
[11].
Emended description of the genus
Collins et al. 1994
(Weiss.el’la. N.L. fem. dim. n.
, named after Norbert Weiss, a German microbiologist known for his many research contributions to the taxonomy of the lactic acid bacteria).
The description of this genus is partially based on the original description by Collins et al. [12], as emended by Padonou et al. [24]. Some other characteristics of this genus are reviewed by Björkroth et al. [5]. Cells are Gram-positive, obligately heterofermentative, non-spore-forming, non-motile short rods or cocci. Growth occurs at pH ranging from pH 3 to 8 and at 10–37 °C (optimum growth for most species at 18–25 °C). Genome sizes range between 1.33 and 2.51 Mbp and their G+C content varies between 35.40 and 45.40 mol%. Although these species are generally non-pathogenic, some
species (
,
, W. confusa, W. cibaria) are opportunistic pathogens that infect post-operative patients as well as some animals. Many of these species are unable to hydrolyse arginine and have been isolated from fermenting meat, dairy and vegetables. Members of this genus form a monophyletic clade, distinct from all other bacteria including those from the genus Periweissella, in the 16S rRNA gene tree and in a phylogenetic tree based on concatenated sequences for large datasets of core genome proteins. In addition, members of this genus can be reliably distinguished from all other
and
genera, including Periweissella, by the shared presence of six CSIs identified in the present work (Table 2) in the following proteins: phospho-N-acetylmuramoyl-pentapeptide-transferases, APC family permease, alanine tRNA ligase, cytochrome d ubiquinol oxidase subunit II, response regulator transcription factor and endonuclease MutS2. These CSIs, in most cases, are exclusively shared by either all or most members of this genus.
The type species is
Collins et al. [61]
Description of Periweissella gen. nov.
Periweissella (Pe.ri.weiss.el’la. Gr. prep. peri, about, around or nearby; N.L. fem. dim. n.
, a bacterial genus named after Norbert Weiss, a German microbiologist; N.L. fem. dim. n. Periweissella, a genus about or nearby
).
Cells are Gram-positive, obligately heterofermentative, non-spore-forming, short rods or cocci. Most species within this genus are non-motile apart from
. Growth occurs in the presence of 0–5 % NaCl (w/v) in the pH range from pH 3.9 to 9.0 (optimum, pH 6.0–7.0) across different species. Genome sizes range between 1.80 and 3.10 Mbp and G+C content varies between 35.40 and 41.10 mol%. Most species grow at temperatures ranging from 15 to 37 °C (optimum, 28–30 °C). Several of these species are able to hydrolyse arginine and have been isolated from fermenting cocoa or cassava. Members of this genus form a monophyletic clade, distinct from all other bacteria including those from the genus
, in a 16S rRNA gene tree and in a phylogenetic tree based on concatenated sequences for large datasets of core genome proteins. In addition, members of this genus can be reliably distinguished from all other
and
genera, including
, by the shared presence of five CSIs identified in the present work (Table 2) in the following proteins: amidophosphoribosyl transferase protein, DEAD/DEAH box helicase, ArgR family transcriptional regulator, Flp pilus assembly complex ATPase component (TadA) and hydroxyethylthiazole kinase. Most of these CSIs are exclusively shared by either all or most members of this genus.
The type species is Periweissella ghanensis.
Description of Periweissella ghanensis comb. nov.
Periweissella ghanensis (gha.nen’sis. N.L. fem. adj. ghanensis, pertaining to Ghana).
Basonym: Weissella ghanensis De Bruyne et al. 2008.
The description of this species is as provided by De Bruyne et al. [64] for Weissella ghanensis.
Type strain: 215T=DSM 19935T=LMG 24286T
Description of Periweissella beninensis comb. nov.
Periweissella beninensis (be.nin.en’sis. N.L. fem. adj. beninensis, pertaining to Benin, where the type strain was isolated).
Basonym: Weissella beninensis Padonou et al. 2010.
The description of this species is as provided by Padonou et al. [24] for Weissella beninensis.
Type strain: 2L24P13T=DSM 22752T=LMG 25373T
Description of Periweissella cryptocerci comb. nov.
Periweissella cryptocerci (cryp.to.cer’ci. N.L. gen. n. cryptocerci, of Cryptocercus, a genus of insect from which the species was isolated).
Basonym: Weissella cryptocerci Heo et al. 2019.
The description of this species is as provided by Heo et al. [27] for Weissella cryptocerci.
Type strain: 26KH-42T=KACC 18423T=NBRC 113066T
Description of Periweissella fabalis comb. nov.
Periweissella fabalis (fa.ba’lis. L. fem. adj. fabalis, of or belonging to beans).
Basonym: Weissella fabalis Snauwaert et al. 2013.
The description of this species is as provided by Snauwaert et al. [26] for Weissella fabalis.
Type strain: CCUG 61472T=DSM 28407T=LMG 26217T=M75T
Description of Periweissella fabaria comb. nov.
Periweissella fabaria (fa.ba’ri.a. L. fem. adj. fabaria, of or belonging to beans).
Basonym: Weissella fabaria De Bruyne et al. 2010.
The description of this species is as provided by De Bruyne et al. [25] for Weissella fabaria.
Type strain: 257T=DSM 21416T=LMG 24289T