Literature DB >> 29932393

Using glycolysis enzyme sequences to inform Lactobacillus phylogeny.

Katelyn Brandt^1,2, Rodolphe Barrangou^1,2.

Abstract

The genus Lactobacillus encompasses a diversity of species that occur widely in nature and encode a plethora of metabolic pathways reflecting their adaptation to various ecological niches, including humans, animals, plants and food products. Accordingly, their functional attributes have been exploited industrially and several strains are commonly formulated as probiotics or starter cultures in the food industry. Although divergent evolutionary processes have yielded the acquisition and evolution of specialized functionalities, all Lactobacillus species share a small set of core metabolic properties, including the glycolysis pathway. Thus, the sequences of glycolytic enzymes afford a means to establish phylogenetic groups with the potential to discern species that are too closely related from a 16S rRNA standpoint. Here, we identified and extracted glycolysis enzyme sequences from 52 species, and carried out individual and concatenated phylogenetic analyses. We show that a glycolysis-based phylogenetic tree can robustly segregate lactobacilli into distinct clusters and discern very closely related species. We also compare and contrast evolutionary patterns with genome-wide features and transcriptomic patterns, reflecting genomic drift trends. Overall, results suggest that glycolytic enzymes provide valuable phylogenetic insights and may constitute practical targets for evolutionary studies.

Entities: CellLine Chemical Disease Species

Keywords: Lactobacillus; evolution; glycolysis; phylogeny

Mesh：

Substances：
RNA, Ribosomal, 16S

Year: 2018 PMID： 29932393 PMCID： PMC6096939 DOI： 10.1099/mgen.0.000187

Source DB: PubMed Journal: Microb Genom ISSN： 2057-5858

Data Summary

RNA sequencing data has been deposited at the National Center for Biotechnology Information, BioProject PRJNA420353. Though 16S rRNA-based phylogeny methods have been broadly used, they have a limited ability to precisely ascribe genus species across the prokaryotic branch of the tree of life. In this study, we have shown that using glycolysis enzyme sequences for phylogenetic analyses can be applied to the diverse genus Lactobacillus, and is able to consistently unravel phylogenetic groups and precisely ascertain relatedness, even between species nearly identical on the classical ribosomal tree. Because of their universal presence and greater diversity compared to 16S rRNA sequences, we posit that these sequences could be valuable markers in future phylogenetic and microbiome studies, specifically by providing connections to the other major branches, and enabling increased resolution. This can also be used to help identify unknown and un-culturable species, as the glycolysis enzymes are widespread, variable and allow for greater discriminatory power. Importantly, variability within some of the hypervariable regions within glycolytic sequences can also provide discrimination within a species. Looking forward, expanding this analysis to other genera and phylogenetic branches could open new avenues for evolutionary studies, and for investigating the phylogeny, composition and diversity of microbial populations in complex microbiomes.

Introduction

Genome adaptation is an important feature for speciation, and evolutionary processes balance various adaptive techniques for optimal growth and survival. At the genome level, adaptation features may include gene synteny conservation, G+C mol% drift, as well as codon bias optimization [1-3]. A working balance of these and other forces enable an organism to become uniquely adapted to its niche, and build up competitive advantages in shifting environmental conditions, or overcome predators and competitors. Such unique adaptations are the basis of phylogenetic studies and allow researchers various degrees of discrimination. At the genus and species levels, additions and deletions of genes can be used to define the pan- and core-genome and genome architecture can be used to evaluate synteny [4]. At the strain level, nucleotide polymorphisms afford the highest resolution opportunities, with the ability to compare and contrast nearly identical isolates and even clonal relatives [5, 6]. For prokaryotic species, various tools and methodologies have been used to compare and contrast genomes, but the challenges are often genus- or species-specific, and approaches can vary depending on the desired resolution and encompassed genetic diversity [7]. In some cases where within genus diversity is extensive, such as in bifidobacteria and lactobacilli, using canonical housekeeping genes or universal markers (i.e. 16S rRNA) has proven difficult or limited [8-11]. Also, there has yet to be defined a consistent set of genes to be utilized for multilocus sequence typing studies. Indeed, while universally conserved 16S rRNA sequences afford opportunities for metagenomic analyses, their shortcomings and biases are increasingly under scrutiny [12-14]. For some genera, it has become obvious that the 16S rRNA resolution limit has been met and a new set of criteria must be established. One such genus is Lactobacillus. Belonging to the lactic acid bacteria (LAB) group, this genus is composed of over 150 Gram-positive, low G+C species [15, 16]. Lactobacilli have been used as starter cultures in the food industry for decades, and by humankind for millennia, and as such have been labelled generally regarded as safe (GRAS) and benefit from the qualified presumption of safety (QPS) [17]. Food-related studies have led to the assertion that some strains in select species are to be considered probiotic (‘live microorganisms which when administered in adequate amounts confer a health benefit on the host’) [18] and, as such, are now predominantly featured in dairy foods and widely formulated in probiotic dietary supplements [19]. Recently, the advent of microbiome studies has revealed that microbial populations are more numerous, diverse and variable than originally thought [20, 21]. With both qualitative and quantitative considerations, associations and sometimes even correlations have been established between members of the microbiome and host health, though the accuracy and precision with which bacteria are identified vary widely and are not universally satisfactory. One such instance concerns the genus Lactobacillus, which has been established as an important colonizer of the human gastrointestinal tract [22]. Additional research is thus needed in this area, as researchers better grasp the role of this genus in health and disease [23-28]. Some lactobacilli are already being exploited, for example, as a tool to deliver vaccines [29]. Arguably, we are far from exhausting all the possible uses of this functional genus. However, in order to be able to fully utilize the numerous functions of Lactobacillus, we must first establish a method that enables us to properly identify and relate the many diverse species within this genus. While 16S rRNA sequencing has gotten us this far, it has a limited ability to distinguish between closely related species and represent overall genomic content and reflect genome-wide trends. These shortcomings are certainly not unique to Lactobacillus, and with the ever-increasing expansion of our understanding of the microbial world [30], there is a need to identify 16S rRNA-independent genomic features that capture diversity on a more granular level. Thus, it is imperative that a standard method be developed that allows the proper identification of species. In order to achieve this, we assessed the potential of the widespread glycolysis pathway enzyme sequences to inform phylogeny. In this paper, we applied a previously described method of phylogenetic analysis using the classical glycolysis enzymes as phylogenetic markers [31] to a diverse set of Lactobacillus species in order to establish its effect on a complicated genus. Though previous studies had used glycolysis as an expansion of ribosomal trees [32], we determined how a broad glycolysis-based phylogeny compares to the ribosomal tree. Specifically, previous studies have applied glycolysis-based approaches to LAB in order to define an evolutionary pathway. By adding data from the entirety of the glycolysis and pentose phosphate pathways, Salvetti et al. [32] were able to apply phenotypic data to explain the branching of the LAB tree, as well as highlight some areas of misclassification in the 16S rRNA tree [32]. Here, we propose using the entirety of the canonical glycolysis pathway as a replacement phylogenetic marker for the 16S rRNA. Conveniently, the glycolysis pathway, much like the 16S rRNA, is universally present, at least partially, conserved, and constitutes a set of suitable candidates for phylogenetic analyses [33, 34]. Here, we demonstrate that this method can assign phylogenetic relationships consistent with what is known from the 16S rRNA marker, though at a much higher discriminatory power. Specifically, we compared sequence-based alignment trees of a representative set of lactobacilli using 16S rRNA- and glycolysis-based approaches. We also analysed the occurrence and location, expression, and G+C mol% of each glycolysis gene. The location and transcriptional profiles confirm that these genes are conserved and highly transcribed with varying levels of drift.

Methods

Genomes

We selected 52 diverse species and subspecies of Lactobacillus for analysis, sampled across and throughout the 16S rRNA and core- and pan-genome tree (Table 1). We ensured this set was representative of this paraphyletic genus and included species from various niches, as previously established [16]. The genomes were mined using Geneious version 9.0.5 [35] to identify the classical glycolysis genes in each species (Figs S1 and S2, available with the online version of this article). Four reference genomes were used to make a curated database for the glycolysis genes, namely Lactobacillus acidophilus, Lactobacillus gasseri, Lactobacillus reuteri and Lactobacillus rhamnosus. The Annotate from Database feature was used to annotate the other genomes. To validate the glycolysis annotations, especially in the case of multiple hits, a combination of blast, get_homologues and mRNA-Seq (mRNA sequencing) data was used [36, 37]. The 16S rRNA sequences were extracted from the genomes and blast was used to validate any cases where there were multiple hits. Once annotated and curated, the genes were extracted from the genome. The glycolysis genes were then translated and confirmed by ExPASy [38]. For the concatenated tree, the amino acid sequences were joined together in order of their presence in the glycolysis pathway (Fig. S1).

Table 1.

Species and genomes list

This shows the representative set of 52 Lactobacillus species and sub-species used in this study. Accession numbers and naming conventions are included.

Genus	Species	Subspecies	Strain	Accession no.	Naming convention	Locus tag
Lactobacillus	acidipiscis		KCTC 13900	NZ_BACS00000000	L_acidipiscis	GSS
Lactobacillus	acidophilus		NCFM	NC_006814	L_acidophilus	LBA
Lactobacillus	algidus		DSM 15638	NZ_AZDI00000000	L_algidus	FC66
Lactobacillus	amylolyticus		DSM 11664	NZ_ADNY00000000	L_amylolyticus	HMPREF0493
Lactobacillus	amylovorus		GRL1118	NC_017470	L_amylovorus	LAB52
Lactobacillus	animalis		DSM 20602	NZ_AEOF00000000	L_animalis	LACAN
Lactobacillus	aquaticus		DSM 21051	NZ_AYZD00000000	L_aquaticus	FC19
Lactobacillus	brevis		ATCC 367	NC_008497	L_brevis	LVIS
Lactobacillus	buchneri		CD034	NC_018610	L_buchneri	LBUCD034
Lactobacillus	cacaonum		DSM 21116	NZ_AYZE00000000	L_cacaonum	FC80
Lactobacillus	casei		DSM 20011	NZ_AZCO00000000	L_casei	FC13
Lactobacillus	coryniformis	torquens	DSM 20004	NZ_AEOS00000000	L_coryniformis_t	EWE
Lactobacillus	crispatus		ST1	NC_014106	L_crispatus	LCRIS
Lactobacillus	curvatus		CRL 705	NZ_AGBU00000000	L_curvatus	CRL705
Lactobacillus	delbrueckii	bulgaricus	ATCC BAA-365	NC_008529	L_delbrueckii_b	LBUL
Lactobacillus	farciminis		DSM 20184	NZ_AEOT00000000	L_farciminis	LACFC
Lactobacillus	fermentum		CECT 5716	NC_017465	L_fermentum	LC40
Lactobacillus	floricola		DSM 23037	NZ_AYZL00000000	L_floricola	FC86
Lactobacillus	gallinarum		DSM 10532	NZ_BALB00000000	L_gallinarum	JCM2011
Lactobacillus	gasseri		ATCC 33323	NC_008530	L_gasseri	LGAS
Lactobacillus	helveticus		CNRZ32	NC_021744	L_helveticus	LHE
Lactobacillus	hilgardii		DSM 20176	NZ_ACGP00000000	L_hilgardii	HMPREF0519
Lactobacillus	hominis		DSM 23910	NZ_CAKE00000000	L_hominis	BN55
Lactobacillus	iners		DSM 13335	NZ_ACLN00000000	L_iners	HMPREF0520
Lactobacillus	jensenii		DSM 20557	NZ_AYYU00000000	L_jensenii	FC45
Lactobacillus	johnsonii		NCC 533	NC_005362	L_johnsonii	LJ
Lactobacillus	kimchicus		JCM_15530	NZ_AZCX00000000	L_kimchicus	FC96
Lactobacillus	lindneri		DSM 20690	NZ_JQBT00000000	L_lindneri	IV52
Lactobacillus	mali		DSM 20444	NZ_AKKT00000000	L_mali	LMA
Lactobacillus	mindensis		DSM 14500	NZ_AZEZ00000000	L_mindensis	FD29
Lactobacillus	mucosae		LM1	NZ_CP011013	L_mucosae	LBLM1
Lactobacillus	nasuensis		JCM_17158	NZ_AZDJ00000000	L_nasuensis	FD02
Lactobacillus	oeni		DSM 19972	NZ_AZEH00000000	L_oeni	FD46
Lactobacillus	oris		F0423	NZ_AFTL00000000	L_oris	HMPREF9102
Lactobacillus	otakiensis		DSM 19908	NZ_BASH00000000	L_otakiensis	LOT
Lactobacillus	parabuchneri		DSM 5707	NZ_AZGK00000000	L_parabuchneri	FC51
Lactobacillus	paracasei		N1115	NZ_CP007122	L_paracasei	AF91
Lactobacillus	pasteurii		DSM 23907	NZ_CAKD00000000	L_pasteurii	BN53
Lactobacillus	pentosus		DSM 20314	NZ_AZCU00000000	L_pentosus	FD24
Lactobacillus	plantarum		16	NC_021514	L_plantarum	LP16
Lactobacillus	reuteri		DSM 20016	NC_009513	L_reuteri	LREU
Lactobacillus	rhamnosus		GG	NC_013198	L_rhamnosus	LGG
Lactobacillus	rossiae		DSM 15814	NZ_AZFF00000000	L_rossiae	FD35
Lactobacillus	ruminis		ATCC 27782	NC_015975	L_ruminis	LRC
Lactobacillus	sakei	sakei	DSM 20017	NZ_BALW00000000	L_sakei_s	JCM1157
Lactobacillus	salivarius		CECT 5713	NC_017481	L_salivarius	CECT 5713
Lactobacillus	sanfranciscensis		TMW 1.1304	NC_015978	L_sanfranciscensis	LSA
Lactobacillus	suebicus		DSM 5007	NZ_BACO00000000	L_suebicus	GSK
Lactobacillus	sunkii		DSM 19904	NZ_AZEA00000000	L_sunkii	FD17
Lactobacillus	vaginalis		DSM 5837	NZ_ACGV00000000	L_vaginalis	HMPREF0549
Lactobacillus	versmoldensis		DSM 14857	NZ_BACR00000000	L_versmoldensis	GSQ
Lactobacillus	zymae		DSM 19395	NZ_AZDW00000000	L_zymae	FD38

Species and genomes list

This shows the representative set of 52 Lactobacillus species and sub-species used in this study. Accession numbers and naming conventions are included.

Transcriptional profiles of glycolysis genes

We analysed RNA transcription profiles from mRNA-Seq data for six species (L. acidophilus, Lactobacillus amylovorus, Lactobacillus crispatus, Lactobacillus delbrueckii subsp. bulgaricus, L. gasseri, and Lactobacillus helveticus) with the previously published isolation method, mRNA sequencing and analyses [39]. Briefly, we used mRNA-Seq data generated in our laboratory to determine the boundaries and quantitative amounts of RNA transcripts for glycolysis genes as previously described. Samples were grown to mid-log phase and flash-frozen. Single-read RNA sequencing was performed on the extracted RNA using an Illumina HiSeq 2500. Data was then quality assessed, trimmed, filtered and mapped on the reference genomes. Presumably, levels of constitutive transcription reflect biological relevance in the tested conditions and transcript boundaries inform on co-transcribed functional pairs.

Alignments and trees

Alignments and trees were generated using a previously described methodology [31]. Briefly, once curated sequences were extracted, we aligned the sequences using clustalw (IUB, gap penalty of 15, gap extension of 6.66), muscle (eight iterations), Geneious [global alignment with free end gaps, cost matrix was BLOSUM62 (amino acids) or 65 % similarity (nucleotide)] and mafft [algorithm was auto, scoring matrix was BLOSUM62 and BLOSUM80 (amino acids) or 100PAM and 200PAM (nucleotide), gap penalty of 1.53, offset 0.123], then used trimAl (compareset and automated1) to find a consistent alignment [35, 40–43]. Trees were then generated using RaxML [CAT BLOSUM62 (amino acids) or CAT GTR (nucleotide), Bootstrap using rapid hill climbing with random seed 1, replicates were 100] [44]. A consensus tree was then established using a 50 % threshold level.

R analyses

Statistical analyses were performed using R version 3.2.2. [45]. R was used to create plots, graphs and quantitative data. Statistical tests used included a two-tailed t-test for comparing G+C contents. Default settings were used to preform statistical analyses and assess quantitative distributions.

Results

16S rRNA phylogeny

We first generated a 16S rRNA-based tree to use as a reference for our subsequent analyses. A phylogenetic tree based on the alignment of the 16S rRNA sequences from a representative set of 52 species and sub-species of Lactobacillus is depicted in Fig. 1. Six phylogenetic groups were identified based on their branching: the Lactobacillus animalis group, the Lactobacillus vaginalis group, the Lactobacillus buchneri group, the L. rhamnosus group, the L. acidophilus group and the L. gasseri group. These groupings are consistent with historically established relationships, as well as recent core-genome analyses [16, 46]. Some of these groups also encompass species that have been historically associated with distinct niches and points of isolation (i.e. mucosal vs intestinal vs dairy origins) [16]. The groups ranged in size from four to nine genomes with the L. rhamnosus group as the smallest and the L. animalis group as the largest. The bootstrap values for the 16S rRNA tree ranged from 51 to 100. There were 27 nodes that had a bootstrap of 70 or greater (Fig. S3). We used these six phylogenetic groups as references for our subsequent analyses, though some species were not assigned to one of these six groups.

Fig. 1.

16S rRNA tree. Tree based on the alignment of the 16S rRNA sequences using RaxML. Bootstrap values are recorded on the nodes. Groups are coloured as follows: the L. animalis group in purple, the L. vaginalis group in green, the L. buchneri group in red, the L. rhamnosus group in yellow, the L. acidophilus group in maroon, and the L. gasseri group in blue. The representative species in each group is in bold. Species names follow the naming convention shown in Table 1.

Glycolysis gene expression

Before using the glycolysis enzymes as phylogenetic markers, we first explored their genetic properties in Lactobacillus. Of the 52 Lactobacillus species and sub-species selected, 35 species encoded all ten of the classical glycolytic genes. In contrast, 16 species (encompassing the L. vaginalis and L. buchneri groups) presented eight of the canonical genes (missing pfk and fba) (Fig. S2). In such cases, alternative metabolic pathways may be utilized, such as the pentose phosphate pathway (Lactobacillus fermentum) or the phosphoketolase pathway (L. buchneri) [47, 48]. L. reuteri uses a mixture of the Embden–Meyerhof pathway and phosphoketolase pathway and, thus, was the only species with six of the glycolysis genes (Fig. S2) [49]. Next, we characterized the transcripts of glycolysis genes in Lactobacillus. Chromosome location and mRNA sequence data were analysed from six species: L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. These six species fall into the L. acidophilus and L. gasseri groups, and all six species contain the complete complement of glycolysis genes, allowing for inferences on all of the genes in this study, instead of just a subset. Fig. 2 depicts the location of the glycolysis genes on normalized chromosomes for each of these six species. It is noteworthy that two operons can be visualized: the gap, pgk and tpi operon, as well as the pfk and pyk operon. Furthermore, the operon boundaries are clearly seen in the mRNA coverage data for each of the six species (Fig. 3). The remaining five genes have clear start and stop boundaries. Notably, L. helveticus has a unique arrangement of the glycolysis genes compared to the other five species, possibly due to the large number of IS elements leading to genome decay; however, the operons remain conserved [50]. Next, we compared the expression levels of the glycolysis genes to the whole transcriptome. We found that the glycolysis genes are among the most highly expressed genes. Indeed, considering the top 10 % of the most highly expressed genes in the cell, nine of the ten glycolysis genes are listed (Fig. 4). The only gene absent from the top 10 % is pgm. Strikingly, the gap gene is consistently among the top three most highly expressed genes in all six species. Such a consistently high transcription level indicates that the gap gene is critical to the functionality of the cell and perhaps, as such, less susceptible to changes. This is also reflected by the conserved location of gap in the genome and operon structure amongst the strains studied (Fig. 2), potentially indicating uses for gap in identification. These results demonstrate that glycolysis genes are genomically conserved, organizationally syntenous and transcriptionally important, showcasing their use as potential phylogenetic markers.

Fig. 2.

Fig. 3.

Glycolysis genes transcription. Each plot represents the mRNA-Seq coverage, log2 transformed, for the corresponding glycolysis gene over its length; ±100 represents the number of bases away from the start/end of the gene. The species are plotted as follows: L. acidophilus is red, L. amylovorus in blue, L. crispatus in yellow, L. delbrueckii subsp. bulgaricus in green, L. gasseri in purple and L. helveticus in maroon.

Fig. 4.

Ranked order of mRNA expression. Top 10 % most highly expressed genes in L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. Data is represented as a log2 transformed RPKM (Reads Per Kilobase of transcript, per Million mapped reads). Transcripts are ranked from most abundant to least abundant. Glycolysis genes are coloured as follows: pgm in red, pgi in blue, pfk in yellow, fba in dark green, tpi in purple, gap in maroon, pgk in navy, gpm in mustard, eno in light green and pyk in lavender.

Genomic location. Normalized glycolysis gene locations in L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. Normalization was calculated by dividing the location on the genome by the total genome size. Right arrows indicate forward direction, left reverse direction. The genomes are organized in the 5′ to 3′ direction. Colours are as follows: pgm in red, pgi in blue, pfk in yellow, fba in dark green, tpi in purple, gap in maroon, pgk in navy, gpm in mustard, eno in light green and pyk in lavender. Glycolysis genes transcription. Each plot represents the mRNA-Seq coverage, log2 transformed, for the corresponding glycolysis gene over its length; ±100 represents the number of bases away from the start/end of the gene. The species are plotted as follows: L. acidophilus is red, L. amylovorus in blue, L. crispatus in yellow, L. delbrueckii subsp. bulgaricus in green, L. gasseri in purple and L. helveticus in maroon. Ranked order of mRNA expression. Top 10 % most highly expressed genes in L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. Data is represented as a log2 transformed RPKM (Reads Per Kilobase of transcript, per Million mapped reads). Transcripts are ranked from most abundant to least abundant. Glycolysis genes are coloured as follows: pgm in red, pgi in blue, pfk in yellow, fba in dark green, tpi in purple, gap in maroon, pgk in navy, gpm in mustard, eno in light green and pyk in lavender.

Glycolysis-based phylogeny

To create a glycolysis-based phylogeny for the 52 selected Lactobacillus species and subspecies, the concatenated amino acid sequences of the glycolysis enzymes were used (Fig. 5). The enzymes were concatenated in their order of occurrence in the glycolysis pathway (Fig. S1). For organisms with all enzymes present, this meant ten sequences were concatenated together, whereas only six to eight amino acid sequences were concatenated for the other species (Fig. S2). The six phylogenetic groups identified from the 16S rRNA reference tree, namely L. animalis, L. vaginalis, L. buchneri, L. rhamnosus, L. acidophilus and L. gasseri, were also identified in the concatenated tree and follow the same clustering (colouring) scheme. The bootstrap values for the concatenated tree ranged from 52 to 100. Nodes with bootstrap values equal to or greater than 70 numbered 43, a 59 % increase from that of the 16S rRNA tree. Overall, the concatenated tree correctly assigned the phylogenetic groups established from the 16S rRNA tree. In addition, the concatenated tree better discerned how the phylogenetic groups relate to one another, even within groups. This is supported by the higher bootstrap values (Fig. S3). Trees based on the individual glycolysis enzymes can be found in Figs S4–S13. The sum of branch lengths for each tree can be found in Table S1. A detailed comparative analysis of various trees structures revealed that overall there is high congruence in clustering both between and within the six established groups, though with various levels of discrimination across each protein sequence. Repeatedly, glycolysis-based trees provided more discriminatory power than the 16S rRNA tree.

Fig. 5.

Concatenated glycolysis tree. Tree based on the alignment of concatenated amino acid sequences of glycolysis enzymes using RaxML. Bootstrap values are recorded on the nodes. Groups are coloured as follows: the L. animalis group in purple, the L. vaginalis group in green, L. buchneri group in red, the L. rhamnosus group in yellow, the L. acidophilus group in maroon, and the L. gasseri group in blue. The representative species in each group is in bold. Species name follows the naming convention shown in Table 1.

G+C content analyses

Next, we looked at the G+C mol% and genomic drift of the glycolysis genes across the various species. Fig. 6 shows notched boxplots comparing the G+C mol% of each sequence set (the 16S rRNA sequence, the 10 genes and the concatenated sequences) in this study, compared to the genome-wide G+C mol%, ranked in increasing order. The G+C mol% of the pgm gene is closest to that of the total genome, while the 16S rRNA gene is the farthest. The notches are indicative of strong evidence that the medians differ when the notches do not overlap [51]. The 16S rRNA gene does not overlap with any other gene. In fact, a two-tailed t-test with a P value less than 0.001 (2.2×10−16) revealed that the G+C mol% of the 16S rRNA sequence was statistically distinct from that of the total genome G+C mol%. This indicates that the 16S rRNA gene is not matching the pace of drift of the total genome with regards to G+C mol%. In contrast, all of the glycolysis genes, with the exception of pfk and eno, were not statistically different from the total genome G+C mol% (P value greater than 0.01), indicating that G+C mol% drift for glycolysis genes provide insights into the genome-wide G+C mol% drift. This further supports glycolytic sequences as intriguing candidates for both phylogenetic studies and representatives of genome-wide trends.

Fig. 6.

G+C mol% analysis of Lactobacillus glycolysis genes. Depicted are notched boxplots of G+C mol% for each glycolysis gene, concatenated genes, 16S rRNA and total genome. Genes are placed in order of increasing median. If two notches do not overlap, it is an indication of strong evidence for differing medians. The genome sizes in this study ranged from 1.28 Mb (Lactobacillus iners) to 3.65 Mb (Lactobacillus pentosus), again reflecting the extensive genomic diversity within this genus. The total G+C mol% ranged from 32.50 % (L. iners) to 57.00 % (Lactobacillus nasuensis), which is intriguing given the general assumption that all lactobacilli are low G+C mol% organisms. Nevertheless, the mean G+C mol% was 40.70 %, consistent with Lactobacillus being generally perceived as low G+C mol% organisms. Splitting the species into high, medium and low categories, it becomes apparent that most species are trending towards the lower end of the spectrum, and away from the higher G+C mol% range (Fig. 7a). Some of the phylogenetic groups are closely clustered, such as the L. acidophilus group, L. gasseri group and the L. rhamnosus group, with the exception of L. delbrueckii subsp. bulgaricus (a dairy bacterium) and L. nasuensis (an aforementioned ex in G+C mol%). The L. animalis group and L. buchneri group are similarly clustered, albeit more loosely. These observations hold true when comparing the G+C mol% of all the individual genes in their respective genomes, perhaps reflecting a consistent and genome-wide pace of drift, rather than variable speeds of drift for each gene (Fig. 7b). Again, the 16S rRNA sequence has a much higher G+C mol% than most of the other studied genes, with the outlier L. nasuensis deviating from the consensus. The G+C mol% of the glycolysis genes within clusters are often times very close, as exemplified by the L. acidophilus group.

Fig. 7.

G+C mol% analysis of Lactobacillus genomes. (a) shows the total G+C mol% for each species. Species are coloured according to their phylogenetic group. (b) shows the G+C mol% of the glycolysis genes, the concatenated glycolysis genes, the 16S rRNA and total G+C mol% for each species. Species are named according to Table 1.

Discussion

The genomic and functional attributes of Lactobacillus render it a pervasive genus, both in research and in industry. The benefits and uses of this diverse set of species are well-established and exhaustive, and yet, the list continues to grow. Many Lactobacillus strains are now considered to be health-promoting in the form of probiotics and are often found to be a part of a healthy microbiome [26]. They are also being engineered to promote healthy host–microbe interactions and deliver bioactive compounds such as vaccines [52]. As microbiome studies expand, we anticipate that the interest in Lactobacillus is set to increase, especially given their occurrence in several human-associated microbiomes, encompassing intestinal, vaginal, oral and skin communities [21]. Many studies have been published discussing the role of Lactobacillus in the microbiota, including research into the microbiota changes through disease, enhancing the microbiome as a form of treatment, and how the microbiome reacts to drugs [53-55]. The continuously expanding list of uses and studies just illustrates how important it is to accurately identify Lactobacillus species. While all species of Lactobacillus share some classical features of LAB organisms, notably their ability to produce lactic acid, the similarities between species are relatively few. In fact, even basic characteristics such as niche and isolation source can vary radically. Proper identification is an increasing concern especially when it comes to disease modelling in the human microbiome, as well as the formulation, tracking and efficacy of probiotic strains. Innovative techniques are continuously being developed and often use a combination of 16S rRNA with developing technologies, such as MALDI-TOF [56]. However, these tools are not broadly accessible and still rely partially on the sometimes unsatisfactory 16S rRNA. Here, we provide a practical alternative to the classical use of 16S rRNA sequencing. In this paper, we applied the previously proposed methodology of using glycolysis sequences to perform phylogenetic studies [31] in the genus Lactobacillus. We demonstrated that this method is a practical and robust approach for Lactobacillus. Compared to the traditional 16S rRNA method, this approach was able to consistently identify phylogenetic groupings, with notably high-resolution between closely related species. While the 16S rRNA-based tree was able to identify the six phylogenetic groups, the concatenated tree was able to add more discrimination both between and within groups, evidenced by the higher bootstrap values in the glycolysis-based tree. Our grouping is consistent with a previous study using glycolysis sequences for phylogenetic analysis of Lactobacillus species [32]. Further analyses based on genomic content revealed clues as to why the glycolysis-based tree was better able to assign species. First, looking at the organization of the genes in the genomes revealed two conserved operons in Lactobacillus, the gap operon and the pfk operon, with the remaining enzymes showing clear start and stop boundaries. This shared synteny emphasizes the importance of glycolysis gene conservation. Next, we looked at expression level. The glycolysis genes were consistently among the most highly expressed genes in the cell, with the gap gene always in the top three most abundant transcripts. These high expression levels indicate a great use and energy expenditure and, thus, arguably reflect the biological importance of this gene to the cell. Because of this importance, the glycolysis genes are much less likely to be subjected to loss. The operon structures and expression levels of the glycolysis genes are significant because a main criterion for selecting the 16S rRNA as a phylogenetic marker was its high conservation among species [57]. Next, we looked at how the glycolysis genes reflected genomic drift in terms of G+C mol%. First, it would appear that the genus is reaching a stabilizing point in its G+C mol% drift, though some species with high G+C mol% still have margin for extending the trend (L. nasuensis, Lactobacillus zymae, and L. fermentum). Next, we saw that the glycolysis gene G+C mol% was extremely close to that of the genome-wide G+C mol%, while the 16S rRNA was startlingly higher (P<0.001), underscoring the fact that the 16S rRNA is by all accounts much different than that of the total genome, whereas the majority of the glycolysis genes are significantly similar to the total genome G+C mol% (Fig. 6). This provides a possible explanation for the reason why the 16S rRNA analyses have been limited at a high-resolution level in Lactobacillus and why the glycolysis-based tree was able to reach a higher-resolution level. In fact, it has long been noted that 16S rRNA is unable to discriminate between species of lactobacilli due to its high similarity amongst them [58]. The individual glycolysis genes are much more similar to the genome as a whole (Fig. 6). Additionally, individual glycolysis genes are also able to accurately assign species to groups with a high resolution (Figs S4–S13). The gap gene is of particular note, due to its presence in an operon, consistently high expression, G+C mol% and ability to accurately define species groups. Overall, the glycolysis-based approach was able to provide a highe-resolution phylogeny for Lactobacillus, due in part to its conservation, expression and reflection of genomic drift.

55 in total

1. How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity.

Authors: G E Fox; J D Wisotzkey; P Jurtshuk
Journal: Int J Syst Bacteriol Date: 1992-01

2. Conservation of gene order: a fingerprint of proteins that physically interact.

Authors: T Dandekar; B Snel; M Huynen; P Bork
Journal: Trends Biochem Sci Date: 1998-09 Impact factor: 13.807

3. Expert consensus document. The International Scientific Association for Probiotics and Prebiotics consensus statement on the scope and appropriate use of the term probiotic.

Authors: Colin Hill; Francisco Guarner; Gregor Reid; Glenn R Gibson; Daniel J Merenstein; Bruno Pot; Lorenzo Morelli; Roberto Berni Canani; Harry J Flint; Seppo Salminen; Philip C Calder; Mary Ellen Sanders
Journal: Nat Rev Gastroenterol Hepatol Date: 2014-06-10 Impact factor: 46.802

4. The complete genomes of Lactobacillus plantarum and Lactobacillus johnsonii reveal extensive differences in chromosome organization and gene content.

Authors: Jos Boekhorst; Roland J Siezen; Marie-Camille Zwahlen; David Vilanova; Raymond D Pridmore; Annick Mercenier; Michiel Kleerebezem; Willem M de Vos; Harald Brüssow; Frank Desiere
Journal: Microbiology Date: 2004-11 Impact factor: 2.777

5. Comparative sequence analysis of a recA gene fragment brings new evidence for a change in the taxonomy of the Lactobacillus casei group.

Authors: G E Felis; F Dellaglio; L Mizzi; S Torriani
Journal: Int J Syst Evol Microbiol Date: 2001-11 Impact factor: 2.747

Review 6. Genomics of the Genus Bifidobacterium Reveals Species-Specific Adaptation to the Glycan-Rich Gut Environment.

Authors: Christian Milani; Francesca Turroni; Sabrina Duranti; Gabriele Andrea Lugli; Leonardo Mancabelli; Chiara Ferrario; Douwe van Sinderen; Marco Ventura
Journal: Appl Environ Microbiol Date: 2015-11-20 Impact factor: 4.792

7. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors: Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal: Bioinformatics Date: 2012-04-27 Impact factor: 6.937

8. Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences.

Authors: Katelyn Brandt; Rodolphe Barrangou
Journal: Front Microbiol Date: 2016-05-09 Impact factor: 5.640

9. Novel approaches for the taxonomic and metabolic characterization of lactobacilli: Integration of 16S rRNA gene sequencing with MALDI-TOF MS and 1H-NMR.

Authors: Claudio Foschi; Luca Laghi; Carola Parolin; Barbara Giordani; Monica Compri; Roberto Cevenini; Antonella Marangoni; Beatrice Vitali
Journal: PLoS One Date: 2017-02-16 Impact factor: 3.240

3. Comparative genomics of Leuconostoc lactis strains isolated from human gastrointestinal system and fermented foods microbiomes.

Authors: Ismail Gumustop; Fatih Ortakci
Journal: BMC Genom Data Date: 2022-08-02

4. Adaptive response to iterative passages of five Lactobacillus species in simulated vaginal fluid.

Authors: Katelyn Brandt; Rodolphe Barrangou
Journal: BMC Microbiol Date: 2020-11-10 Impact factor: 3.605

4 in total