Using glycolysis enzyme sequences to inform Lactobacillus phylogeny.

Katelyn Brandt1,2, Rodolphe Barrangou1,2.   

Abstract

The genus Lactobacillus encompasses a diversity of species that occur widely in nature and encode a plethora of metabolic pathways reflecting their adaptation to various ecological niches, including humans, animals, plants and food products. Accordingly, their functional attributes have been exploited industrially and several strains are commonly formulated as probiotics or starter cultures in the food industry. Although divergent evolutionary processes have yielded the acquisition and evolution of specialized functionalities, all Lactobacillus species share a small set of core metabolic properties, including the glycolysis pathway. Thus, the sequences of glycolytic enzymes afford a means to establish phylogenetic groups with the potential to discern species that are too closely related from a 16S rRNA standpoint. Here, we identified and extracted glycolysis enzyme sequences from 52 species, and carried out individual and concatenated phylogenetic analyses. We show that a glycolysis-based phylogenetic tree can robustly segregate lactobacilli into distinct clusters and discern very closely related species. We also compare and contrast evolutionary patterns with genome-wide features and transcriptomic patterns, reflecting genomic drift trends. Overall, results suggest that glycolytic enzymes provide valuable phylogenetic insights and may constitute practical targets for evolutionary studies.

Keywords:  Lactobacillus; evolution; glycolysis; phylogeny

Year:  2018        PMID: 29932393      PMCID: PMC6096939          DOI: 10.1099/mgen.0.000187

Source DB:  PubMed          Journal:  Microb Genom        ISSN: 2057-5858


Data Summary

RNA sequencing data has been deposited at the National Center for Biotechnology Information, BioProject PRJNA420353.

Though 16S rRNA-based phylogeny methods have been broadly used, they have a limited ability to precisely ascribe genus and species across the prokaryotic branch of the tree of life. In this study, we have shown that using glycolysis enzyme sequences for phylogenetic analyses can be applied to the diverse genus Lactobacillus, and is able to consistently unravel phylogenetic groups and precisely ascertain relatedness, even between species nearly identical on the classical ribosomal tree. Because of their universal presence and greater diversity compared to 16S rRNA sequences, we posit that these sequences could be valuable markers in future phylogenetic and microbiome studies, specifically by providing connections to the other major branches, and enabling increased resolution. This can also be used to help identify unknown and un-culturable species, as the glycolysis enzymes are widespread, variable and allow for greater discriminatory power. Importantly, variability within some of the hypervariable regions within glycolytic sequences can also provide discrimination within a species. Looking forward, expanding this analysis to other genera and phylogenetic branches could open new avenues for evolutionary studies, and for investigating the phylogeny, composition and diversity of microbial populations in complex microbiomes.

Introduction

Genome adaptation is an important feature for speciation, and evolutionary processes balance various adaptive techniques for optimal growth and survival. At the genome level, adaptation features may include gene synteny conservation, G+C mol% drift, as well as codon bias optimization [1-3]. A working balance of these and other forces enables an organism to become uniquely adapted to its niche, and build up competitive advantages in shifting environmental conditions, or overcome predators and competitors. Such unique adaptations are the basis of phylogenetic studies and allow researchers various degrees of discrimination. At the genus and species levels, additions and deletions of genes can be used to define the pan- and core-genome and genome architecture can be used to evaluate synteny [4]. At the strain level, nucleotide polymorphisms afford the highest resolution opportunities, with the ability to compare and contrast nearly identical isolates and even clonal relatives [5, 6]. For prokaryotic species, various tools and methodologies have been used to compare and contrast genomes, but the challenges are often genus- or species-specific, and approaches can vary depending on the desired resolution and encompassed genetic diversity [7]. In some cases where within-genus diversity is extensive, such as in bifidobacteria and lactobacilli, using canonical housekeeping genes or universal markers (e.g. 16S rRNA) has proven difficult or limited [8-11]. Also, a consistent set of genes for multilocus sequence typing studies has yet to be defined. Indeed, while universally conserved 16S rRNA sequences afford opportunities for metagenomic analyses, their shortcomings and biases are increasingly under scrutiny [12-14]. For some genera, it has become obvious that the 16S rRNA resolution limit has been met and a new set of criteria must be established. One such genus is Lactobacillus.
Belonging to the lactic acid bacteria (LAB) group, this genus is composed of over 150 Gram-positive, low G+C species [15, 16]. Lactobacilli have been used as starter cultures in the food industry for decades, and by humankind for millennia, and as such have been labelled generally recognized as safe (GRAS) and benefit from the qualified presumption of safety (QPS) [17]. Food-related studies have led to the assertion that some strains in select species are to be considered probiotic (‘live microorganisms which when administered in adequate amounts confer a health benefit on the host’) [18] and, as such, are now predominantly featured in dairy foods and widely formulated in probiotic dietary supplements [19]. Recently, the advent of microbiome studies has revealed that microbial populations are more numerous, diverse and variable than originally thought [20, 21]. With both qualitative and quantitative considerations, associations and sometimes even correlations have been established between members of the microbiome and host health, though the accuracy and precision with which bacteria are identified vary widely and are not universally satisfactory. One such instance concerns the genus Lactobacillus, which has been established as an important colonizer of the human gastrointestinal tract [22]. Additional research is thus needed in this area, as researchers better grasp the role of this genus in health and disease [23-28]. Some lactobacilli are already being exploited, for example, as a tool to deliver vaccines [29]. Arguably, we are far from exhausting all the possible uses of this functional genus. However, in order to be able to fully utilize the numerous functions of Lactobacillus, we must first establish a method that enables us to properly identify and relate the many diverse species within this genus.
While 16S rRNA sequencing has gotten us this far, it has a limited ability to distinguish between closely related species, represent overall genomic content and reflect genome-wide trends. These shortcomings are certainly not unique to Lactobacillus, and with the ever-increasing expansion of our understanding of the microbial world [30], there is a need to identify 16S rRNA-independent genomic features that capture diversity on a more granular level. Thus, it is imperative that a standard method be developed that allows the proper identification of species. In order to achieve this, we assessed the potential of the widespread glycolysis pathway enzyme sequences to inform phylogeny. In this paper, we applied a previously described method of phylogenetic analysis using the classical glycolysis enzymes as phylogenetic markers [31] to a diverse set of Lactobacillus species in order to establish its effect on a complicated genus. Though previous studies had used glycolysis as an expansion of ribosomal trees [32], we determined how a broad glycolysis-based phylogeny compares to the ribosomal tree. Specifically, previous studies have applied glycolysis-based approaches to LAB in order to define an evolutionary pathway. By adding data from the entirety of the glycolysis and pentose phosphate pathways, Salvetti et al. [32] were able to apply phenotypic data to explain the branching of the LAB tree, as well as highlight some areas of misclassification in the 16S rRNA tree. Here, we propose using the entirety of the canonical glycolysis pathway as a replacement phylogenetic marker for the 16S rRNA. Conveniently, the glycolysis pathway, much like the 16S rRNA, is universally present, at least partially conserved, and constitutes a set of suitable candidates for phylogenetic analyses [33, 34]. Here, we demonstrate that this method can assign phylogenetic relationships consistent with what is known from the 16S rRNA marker, though with much higher discriminatory power.
Specifically, we compared sequence-based alignment trees of a representative set of lactobacilli using 16S rRNA- and glycolysis-based approaches. We also analysed the occurrence and location, expression, and G+C mol% of each glycolysis gene. The location and transcriptional profiles confirm that these genes are conserved and highly transcribed with varying levels of drift.

Methods

Genomes

We selected 52 diverse species and subspecies of Lactobacillus for analysis, sampled across and throughout the 16S rRNA and core- and pan-genome tree (Table 1). We ensured this set was representative of this paraphyletic genus and included species from various niches, as previously established [16]. The genomes were mined using Geneious version 9.0.5 [35] to identify the classical glycolysis genes in each species (Figs S1 and S2, available with the online version of this article). Four reference genomes were used to make a curated database for the glycolysis genes, namely Lactobacillus acidophilus, Lactobacillus gasseri, Lactobacillus reuteri and Lactobacillus rhamnosus. The Annotate from Database feature was used to annotate the other genomes. To validate the glycolysis annotations, especially in the case of multiple hits, a combination of BLAST, get_homologues and mRNA-Seq (mRNA sequencing) data was used [36, 37]. The 16S rRNA sequences were extracted from the genomes and BLAST was used to validate any cases where there were multiple hits. Once annotated and curated, the genes were extracted from the genome. The glycolysis genes were then translated and confirmed by ExPASy [38]. For the concatenated tree, the amino acid sequences were joined together in order of their presence in the glycolysis pathway (Fig. S1).
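As an illustration of the concatenation step, the following Python sketch joins translated enzyme sequences in a fixed pathway order and skips genes that a genome lacks. The gene order is an assumption, taken from the order in which the figure legends list the genes (Fig. S1 is not reproduced here). The function name and the toy peptide strings are hypothetical, and the sketch is not the Geneious workflow used in the study.

```python
# Assumed pathway order (from the order used in the figure legends).
PATHWAY_ORDER = ["pgm", "pgi", "pfk", "fba", "tpi", "gap", "pgk", "gpm", "eno", "pyk"]


def concatenate_enzymes(proteins):
    """Join the available enzyme sequences in pathway order.

    `proteins` maps a gene name to its amino acid sequence. Genes absent
    from the mapping (for example pfk and fba in some species) are skipped.
    """
    return "".join(proteins[gene] for gene in PATHWAY_ORDER if gene in proteins)


# Toy three-residue peptides, made up for illustration only.
complete = {
    "pgm": "MSA", "pgi": "MTH", "pfk": "MKR", "fba": "MAL", "tpi": "MRT",
    "gap": "MTV", "pgk": "MAK", "gpm": "MVK", "eno": "MSI", "pyk": "MKK",
}
# A hypothetical genome with eight genes, lacking pfk and fba.
partial = {g: s for g, s in complete.items() if g not in ("pfk", "fba")}

print(len(concatenate_enzymes(complete)))  # 30 residues from ten enzymes
print(len(concatenate_enzymes(partial)))   # 24 residues from eight enzymes
```

Skipping absent genes, rather than padding them, means species with six or eight genes yield shorter concatenated sequences; the aligner then has to introduce gaps for the missing enzymes.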
Table 1.

Species and genomes list

This shows the representative set of 52 Lactobacillus species and sub-species used in this study. Accession numbers and naming conventions are included.

Genus | Species | Subspecies | Strain | Accession no. | Naming convention | Locus tag
Lactobacillus | acidipiscis | - | KCTC 13900 | NZ_BACS00000000 | L_acidipiscis | GSS
Lactobacillus | acidophilus | - | NCFM | NC_006814 | L_acidophilus | LBA
Lactobacillus | algidus | - | DSM 15638 | NZ_AZDI00000000 | L_algidus | FC66
Lactobacillus | amylolyticus | - | DSM 11664 | NZ_ADNY00000000 | L_amylolyticus | HMPREF0493
Lactobacillus | amylovorus | - | GRL1118 | NC_017470 | L_amylovorus | LAB52
Lactobacillus | animalis | - | DSM 20602 | NZ_AEOF00000000 | L_animalis | LACAN
Lactobacillus | aquaticus | - | DSM 21051 | NZ_AYZD00000000 | L_aquaticus | FC19
Lactobacillus | brevis | - | ATCC 367 | NC_008497 | L_brevis | LVIS
Lactobacillus | buchneri | - | CD034 | NC_018610 | L_buchneri | LBUCD034
Lactobacillus | cacaonum | - | DSM 21116 | NZ_AYZE00000000 | L_cacaonum | FC80
Lactobacillus | casei | - | DSM 20011 | NZ_AZCO00000000 | L_casei | FC13
Lactobacillus | coryniformis | torquens | DSM 20004 | NZ_AEOS00000000 | L_coryniformis_t | EWE
Lactobacillus | crispatus | - | ST1 | NC_014106 | L_crispatus | LCRIS
Lactobacillus | curvatus | - | CRL 705 | NZ_AGBU00000000 | L_curvatus | CRL705
Lactobacillus | delbrueckii | bulgaricus | ATCC BAA-365 | NC_008529 | L_delbrueckii_b | LBUL
Lactobacillus | farciminis | - | DSM 20184 | NZ_AEOT00000000 | L_farciminis | LACFC
Lactobacillus | fermentum | - | CECT 5716 | NC_017465 | L_fermentum | LC40
Lactobacillus | floricola | - | DSM 23037 | NZ_AYZL00000000 | L_floricola | FC86
Lactobacillus | gallinarum | - | DSM 10532 | NZ_BALB00000000 | L_gallinarum | JCM2011
Lactobacillus | gasseri | - | ATCC 33323 | NC_008530 | L_gasseri | LGAS
Lactobacillus | helveticus | - | CNRZ32 | NC_021744 | L_helveticus | LHE
Lactobacillus | hilgardii | - | DSM 20176 | NZ_ACGP00000000 | L_hilgardii | HMPREF0519
Lactobacillus | hominis | - | DSM 23910 | NZ_CAKE00000000 | L_hominis | BN55
Lactobacillus | iners | - | DSM 13335 | NZ_ACLN00000000 | L_iners | HMPREF0520
Lactobacillus | jensenii | - | DSM 20557 | NZ_AYYU00000000 | L_jensenii | FC45
Lactobacillus | johnsonii | - | NCC 533 | NC_005362 | L_johnsonii | LJ
Lactobacillus | kimchicus | - | JCM_15530 | NZ_AZCX00000000 | L_kimchicus | FC96
Lactobacillus | lindneri | - | DSM 20690 | NZ_JQBT00000000 | L_lindneri | IV52
Lactobacillus | mali | - | DSM 20444 | NZ_AKKT00000000 | L_mali | LMA
Lactobacillus | mindensis | - | DSM 14500 | NZ_AZEZ00000000 | L_mindensis | FD29
Lactobacillus | mucosae | - | LM1 | NZ_CP011013 | L_mucosae | LBLM1
Lactobacillus | nasuensis | - | JCM_17158 | NZ_AZDJ00000000 | L_nasuensis | FD02
Lactobacillus | oeni | - | DSM 19972 | NZ_AZEH00000000 | L_oeni | FD46
Lactobacillus | oris | - | F0423 | NZ_AFTL00000000 | L_oris | HMPREF9102
Lactobacillus | otakiensis | - | DSM 19908 | NZ_BASH00000000 | L_otakiensis | LOT
Lactobacillus | parabuchneri | - | DSM 5707 | NZ_AZGK00000000 | L_parabuchneri | FC51
Lactobacillus | paracasei | - | N1115 | NZ_CP007122 | L_paracasei | AF91
Lactobacillus | pasteurii | - | DSM 23907 | NZ_CAKD00000000 | L_pasteurii | BN53
Lactobacillus | pentosus | - | DSM 20314 | NZ_AZCU00000000 | L_pentosus | FD24
Lactobacillus | plantarum | - | 16 | NC_021514 | L_plantarum | LP16
Lactobacillus | reuteri | - | DSM 20016 | NC_009513 | L_reuteri | LREU
Lactobacillus | rhamnosus | - | GG | NC_013198 | L_rhamnosus | LGG
Lactobacillus | rossiae | - | DSM 15814 | NZ_AZFF00000000 | L_rossiae | FD35
Lactobacillus | ruminis | - | ATCC 27782 | NC_015975 | L_ruminis | LRC
Lactobacillus | sakei | sakei | DSM 20017 | NZ_BALW00000000 | L_sakei_s | JCM1157
Lactobacillus | salivarius | - | CECT 5713 | NC_017481 | L_salivarius | CECT 5713
Lactobacillus | sanfranciscensis | - | TMW 1.1304 | NC_015978 | L_sanfranciscensis | LSA
Lactobacillus | suebicus | - | DSM 5007 | NZ_BACO00000000 | L_suebicus | GSK
Lactobacillus | sunkii | - | DSM 19904 | NZ_AZEA00000000 | L_sunkii | FD17
Lactobacillus | vaginalis | - | DSM 5837 | NZ_ACGV00000000 | L_vaginalis | HMPREF0549
Lactobacillus | versmoldensis | - | DSM 14857 | NZ_BACR00000000 | L_versmoldensis | GSQ
Lactobacillus | zymae | - | DSM 19395 | NZ_AZDW00000000 | L_zymae | FD38

Transcriptional profiles of glycolysis genes

We analysed RNA transcription profiles from mRNA-Seq data for six species (L. acidophilus, Lactobacillus amylovorus, Lactobacillus crispatus, Lactobacillus delbrueckii subsp. bulgaricus, L. gasseri, and Lactobacillus helveticus) with the previously published isolation method, mRNA sequencing and analyses [39]. Briefly, we used mRNA-Seq data generated in our laboratory to determine the boundaries and quantitative amounts of RNA transcripts for glycolysis genes as previously described. Samples were grown to mid-log phase and flash-frozen. Single-read RNA sequencing was performed on the extracted RNA using an Illumina HiSeq 2500. Data was then quality assessed, trimmed, filtered and mapped on the reference genomes. Presumably, levels of constitutive transcription reflect biological relevance in the tested conditions and transcript boundaries inform on co-transcribed functional pairs.

Alignments and trees

Alignments and trees were generated using a previously described methodology [31]. Briefly, once curated sequences were extracted, we aligned the sequences using ClustalW (IUB, gap penalty of 15, gap extension of 6.66), MUSCLE (eight iterations), Geneious [global alignment with free end gaps, cost matrix was BLOSUM62 (amino acids) or 65 % similarity (nucleotide)] and MAFFT [algorithm was auto, scoring matrix was BLOSUM62 and BLOSUM80 (amino acids) or 100PAM and 200PAM (nucleotide), gap penalty of 1.53, offset 0.123], then used trimAl (compareset and automated1) to find a consistent alignment [35, 40–43]. Trees were then generated using RAxML [CAT BLOSUM62 (amino acids) or CAT GTR (nucleotide), Bootstrap using rapid hill climbing with random seed 1, replicates were 100] [44]. A consensus tree was then established using a 50 % threshold level.
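The thresholded consensus step can be pictured with a small Python sketch. It assumes each bootstrap replicate has already been reduced to a set of clades (each clade a frozenset of tip names) and keeps the clades whose support exceeds the threshold. The function names and the four toy replicates are hypothetical; the study itself built its consensus with the tools cited above, and whether their cutoff is strict or inclusive is not stated, so the strict comparison here is an assumption.

```python
from collections import Counter


def clade_support(replicates):
    """Fraction of replicate trees that contain each clade."""
    counts = Counter(clade for tree in replicates for clade in set(tree))
    total = len(replicates)
    return {clade: count / total for clade, count in counts.items()}


def consensus_clades(replicates, threshold=0.5):
    """Keep clades whose support is strictly greater than `threshold`.

    A strict cutoff at 0.5 guarantees the retained clades are mutually
    compatible, which is the usual majority-rule argument.
    """
    support = clade_support(replicates)
    return {clade: s for clade, s in support.items() if s > threshold}


# Four made-up replicates over tips A-D; not data from the study.
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
]

for clade, support in consensus_clades(replicates).items():
    print(sorted(clade), f"{support:.0%}")
```

With 100 replicates, the support fraction multiplied by 100 corresponds to the bootstrap value recorded on a node.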

R analyses

Statistical analyses were performed using R version 3.2.2 [45]. R was used to create plots, graphs and quantitative data. Statistical tests used included a two-tailed t-test for comparing G+C contents. Default settings were used to perform statistical analyses and assess quantitative distributions.

Results

16S rRNA phylogeny

We first generated a 16S rRNA-based tree to use as a reference for our subsequent analyses. A phylogenetic tree based on the alignment of the 16S rRNA sequences from a representative set of 52 species and sub-species of Lactobacillus is depicted in Fig. 1. Six phylogenetic groups were identified based on their branching: the Lactobacillus animalis group, the Lactobacillus vaginalis group, the Lactobacillus buchneri group, the L. rhamnosus group, the L. acidophilus group and the L. gasseri group. These groupings are consistent with historically established relationships, as well as recent core-genome analyses [16, 46]. Some of these groups also encompass species that have been historically associated with distinct niches and points of isolation (i.e. mucosal vs intestinal vs dairy origins) [16]. The groups ranged in size from four to nine genomes with the L. rhamnosus group as the smallest and the L. animalis group as the largest. The bootstrap values for the 16S rRNA tree ranged from 51 to 100. There were 27 nodes that had a bootstrap of 70 or greater (Fig. S3). We used these six phylogenetic groups as references for our subsequent analyses, though some species were not assigned to one of these six groups.
Fig. 1.

16S rRNA tree. Tree based on the alignment of the 16S rRNA sequences using RAxML. Bootstrap values are recorded on the nodes. Groups are coloured as follows: the L. animalis group in purple, the L. vaginalis group in green, the L. buchneri group in red, the L. rhamnosus group in yellow, the L. acidophilus group in maroon, and the L. gasseri group in blue. The representative species in each group is in bold. Species names follow the naming convention shown in Table 1.

Glycolysis gene expression

Before using the glycolysis enzymes as phylogenetic markers, we first explored their genetic properties in Lactobacillus. Of the 52 Lactobacillus species and sub-species selected, 35 species encoded all ten of the classical glycolytic genes. In contrast, 16 species (encompassing the L. vaginalis and L. buchneri groups) presented eight of the canonical genes (missing pfk and fba) (Fig. S2). In such cases, alternative metabolic pathways may be utilized, such as the pentose phosphate pathway (Lactobacillus fermentum) or the phosphoketolase pathway (L. buchneri) [47, 48]. L. reuteri uses a mixture of the Embden–Meyerhof pathway and phosphoketolase pathway and, thus, was the only species with six of the glycolysis genes (Fig. S2) [49]. Next, we characterized the transcripts of glycolysis genes in Lactobacillus. Chromosome location and mRNA sequence data were analysed from six species: L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. These six species fall into the L. acidophilus and L. gasseri groups, and all six species contain the complete complement of glycolysis genes, allowing for inferences on all of the genes in this study, instead of just a subset. Fig. 2 depicts the location of the glycolysis genes on normalized chromosomes for each of these six species. It is noteworthy that two operons can be visualized: the gap, pgk and tpi operon, as well as the pfk and pyk operon. Furthermore, the operon boundaries are clearly seen in the mRNA coverage data for each of the six species (Fig. 3). The remaining five genes have clear start and stop boundaries. Notably, L. helveticus has a unique arrangement of the glycolysis genes compared to the other five species, possibly due to the large number of IS elements leading to genome decay; however, the operons remain conserved [50]. Next, we compared the expression levels of the glycolysis genes to the whole transcriptome. 
We found that the glycolysis genes are among the most highly expressed genes. Indeed, considering the top 10 % of the most highly expressed genes in the cell, nine of the ten glycolysis genes are listed (Fig. 4). The only gene absent from the top 10 % is pgm. Strikingly, the gap gene is consistently among the top three most highly expressed genes in all six species. Such a consistently high transcription level indicates that the gap gene is critical to the functionality of the cell and perhaps, as such, less susceptible to changes. This is also reflected by the conserved location of gap in the genome and operon structure amongst the strains studied (Fig. 2), potentially indicating uses for gap in identification. These results demonstrate that glycolysis genes are genomically conserved, organizationally syntenous and transcriptionally important, showcasing their use as potential phylogenetic markers.
Fig. 2.

Genomic location. Normalized glycolysis gene locations in L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. Normalization was calculated by dividing the location on the genome by the total genome size. Right arrows indicate forward direction, left reverse direction. The genomes are organized in the 5′ to 3′ direction. Colours are as follows: pgm in red, pgi in blue, pfk in yellow, fba in dark green, tpi in purple, gap in maroon, pgk in navy, gpm in mustard, eno in light green and pyk in lavender.

Fig. 3.

Glycolysis genes transcription. Each plot represents the mRNA-Seq coverage, log2 transformed, for the corresponding glycolysis gene over its length; ±100 represents the number of bases away from the start/end of the gene. The species are plotted as follows: L. acidophilus in red, L. amylovorus in blue, L. crispatus in yellow, L. delbrueckii subsp. bulgaricus in green, L. gasseri in purple and L. helveticus in maroon.

Fig. 4.

Ranked order of mRNA expression. Top 10 % most highly expressed genes in L. acidophilus, L. amylovorus, L. crispatus, L. delbrueckii subsp. bulgaricus, L. gasseri and L. helveticus. Data is represented as a log2 transformed RPKM (Reads Per Kilobase of transcript, per Million mapped reads). Transcripts are ranked from most abundant to least abundant. Glycolysis genes are coloured as follows: pgm in red, pgi in blue, pfk in yellow, fba in dark green, tpi in purple, gap in maroon, pgk in navy, gpm in mustard, eno in light green and pyk in lavender.
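For readers unfamiliar with the unit, the ranking shown in Fig. 4 can be sketched in Python as follows. The RPKM formula follows the definition given in the legend; the read counts, gene lengths and gene names other than gap and eno are hypothetical, and the helper names are not taken from the published pipeline.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads Per Kilobase of transcript, per Million mapped reads."""
    return read_count / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


def top_fraction(expression, fraction=0.10):
    """Gene names in the top `fraction` of the ranking, most abundant first."""
    ranked = sorted(expression, key=expression.get, reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical (read count, gene length in bp) pairs; not data from the study.
counts = {
    "gap": (90_000, 1_000),
    "eno": (40_000, 1_300),
    "geneX": (500, 900),
    "geneY": (50, 2_000),
}
total_reads = sum(reads for reads, _ in counts.values())
log2_rpkm = {
    gene: math.log2(rpkm(reads, length, total_reads))
    for gene, (reads, length) in counts.items()
}

# With only four toy genes, the top half is used instead of the top 10 %.
print(top_fraction(log2_rpkm, fraction=0.5))  # ['gap', 'eno']
```

Dividing by gene length before ranking matters here: without it, long genes would rise in the ranking simply because they collect more reads.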

Glycolysis-based phylogeny

To create a glycolysis-based phylogeny for the 52 selected Lactobacillus species and subspecies, the concatenated amino acid sequences of the glycolysis enzymes were used (Fig. 5). The enzymes were concatenated in their order of occurrence in the glycolysis pathway (Fig. S1). For organisms with all enzymes present, this meant ten sequences were concatenated together, whereas only six to eight amino acid sequences were concatenated for the other species (Fig. S2). The six phylogenetic groups identified from the 16S rRNA reference tree, namely L. animalis, L. vaginalis, L. buchneri, L. rhamnosus, L. acidophilus and L. gasseri, were also identified in the concatenated tree and follow the same clustering (colouring) scheme. The bootstrap values for the concatenated tree ranged from 52 to 100. Nodes with bootstrap values equal to or greater than 70 numbered 43, a 59 % increase over the 27 such nodes in the 16S rRNA tree. Overall, the concatenated tree correctly assigned the phylogenetic groups established from the 16S rRNA tree. In addition, the concatenated tree better discerned how the phylogenetic groups relate to one another, even within groups. This is supported by the higher bootstrap values (Fig. S3). Trees based on the individual glycolysis enzymes can be found in Figs S4–S13. The sum of branch lengths for each tree can be found in Table S1. A detailed comparative analysis of various tree structures revealed that overall there is high congruence in clustering both between and within the six established groups, though with various levels of discrimination across each protein sequence. Repeatedly, glycolysis-based trees provided more discriminatory power than the 16S rRNA tree.
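The 59 % figure follows directly from the two node counts reported in the text (27 well-supported nodes in the 16S rRNA tree, 43 in the concatenated tree). A minimal Python check, with hypothetical helper names and a made-up list of bootstrap values for the counting function:

```python
def count_supported(bootstraps, cutoff=70):
    """Number of nodes whose bootstrap value is at or above the cutoff."""
    return sum(1 for value in bootstraps if value >= cutoff)


def percent_increase(old, new):
    """Relative increase from `old` to `new`, in percent."""
    return (new - old) / old * 100


# Node counts reported in the text for the two trees.
print(round(percent_increase(27, 43)))  # 59

# Made-up bootstrap values, only to show the counting rule.
print(count_supported([51, 70, 100, 69]))  # 2
```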
Fig. 5.

Concatenated glycolysis tree. Tree based on the alignment of concatenated amino acid sequences of glycolysis enzymes using RAxML. Bootstrap values are recorded on the nodes. Groups are coloured as follows: the L. animalis group in purple, the L. vaginalis group in green, the L. buchneri group in red, the L. rhamnosus group in yellow, the L. acidophilus group in maroon, and the L. gasseri group in blue. The representative species in each group is in bold. Species names follow the naming convention shown in Table 1.

G+C content analyses

Next, we looked at the G+C mol% and genomic drift of the glycolysis genes across the various species. Fig. 6 shows notched boxplots comparing the G+C mol% of each sequence set (the 16S rRNA sequence, the 10 genes and the concatenated sequences) in this study, compared to the genome-wide G+C mol%, ranked in increasing order. The G+C mol% of the pgm gene is closest to that of the total genome, while the 16S rRNA gene is the farthest. Non-overlapping notches are strong evidence that the medians differ [51]. The notch of the 16S rRNA gene does not overlap with that of any other gene. In fact, a two-tailed t-test with a P value less than 0.001 (2.2×10−16) revealed that the G+C mol% of the 16S rRNA sequence was statistically distinct from that of the total genome G+C mol%. This indicates that the 16S rRNA gene does not match the pace of drift of the total genome with regard to G+C mol%. In contrast, all of the glycolysis genes, with the exception of pfk and eno, were not statistically different from the total genome G+C mol% (P value greater than 0.01), indicating that G+C mol% drift for glycolysis genes provides insights into the genome-wide G+C mol% drift. This further supports glycolytic sequences as intriguing candidates for both phylogenetic studies and representatives of genome-wide trends.
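The comparison described above rests on two simple quantities: the G+C mol% of a sequence and a two-sample t statistic. The Python sketch below computes both using only the standard library. It assumes the unpaired Welch form of the test, which is what R's t.test computes by default; the text states only that default settings were used, so this is an assumption. The function names and toy inputs are hypothetical, and the P value (which needs the t distribution) is left out.

```python
import math
from statistics import mean, variance


def gc_percent(sequence):
    """G+C content as a percentage of unambiguous bases (A, C, G, T)."""
    sequence = sequence.upper()
    gc = sequence.count("G") + sequence.count("C")
    acgt = gc + sequence.count("A") + sequence.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / acgt


def welch_t(sample_a, sample_b):
    """Welch's t statistic and its approximate degrees of freedom."""
    va = variance(sample_a) / len(sample_a)
    vb = variance(sample_b) / len(sample_b)
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (
        va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1)
    )
    return t, df


print(gc_percent("GGCCAATT"))  # 50.0

# Made-up G+C values for two groups of sequences; not data from the study.
t, df = welch_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(round(t, 4), round(df, 1))  # -1.2247 4.0
```

In practice each sample would hold one G+C value per species, for example the 52 16S rRNA values against the 52 genome-wide values.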
Fig. 6.

G+C mol% analysis of Lactobacillus glycolysis genes. Depicted are notched boxplots of G+C mol% for each glycolysis gene, concatenated genes, 16S rRNA and total genome. Genes are placed in order of increasing median. If two notches do not overlap, it is an indication of strong evidence for differing medians.
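To make the notch rule concrete, the sketch below computes a notch interval and tests two groups for overlap. It assumes the convention documented for R's boxplot, in which notches extend roughly ±1.58 × IQR/√n around the median. Quartiles are taken with Python's statistics.quantiles, which can differ slightly from the hinges R uses, so the numbers are illustrative only. The function names and input values are hypothetical.

```python
import math
from statistics import median, quantiles


def notch_interval(values):
    """Approximate notch bounds: median -/+ 1.58 * IQR / sqrt(n)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    half_width = 1.58 * (q3 - q1) / math.sqrt(len(values))
    centre = median(values)
    return centre - half_width, centre + half_width


def notches_overlap(values_a, values_b):
    """True when the two notch intervals share at least one point."""
    low_a, high_a = notch_interval(values_a)
    low_b, high_b = notch_interval(values_b)
    return low_a <= high_b and low_b <= high_a


# Made-up values standing in for per-species G+C mol% of two sequence sets.
group_a = [1, 2, 3, 4, 5]
group_b = [10, 11, 12, 13, 14]
print(notches_overlap(group_a, group_b))  # False
print(notches_overlap(group_a, group_a))  # True
```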

The genome sizes in this study ranged from 1.28 Mb (Lactobacillus iners) to 3.65 Mb (Lactobacillus pentosus), again reflecting the extensive genomic diversity within this genus. The total G+C mol% ranged from 32.50 % (L. iners) to 57.00 % (Lactobacillus nasuensis), which is intriguing given the general assumption that all lactobacilli are low G+C mol% organisms. Nevertheless, the mean G+C mol% was 40.70 %, consistent with Lactobacillus being generally perceived as low G+C mol% organisms. Splitting the species into high, medium and low categories, it becomes apparent that most species are trending towards the lower end of the spectrum, and away from the higher G+C mol% range (Fig. 7a). Some of the phylogenetic groups are closely clustered, such as the L. acidophilus group, L. gasseri group and the L. rhamnosus group, with the exception of L. delbrueckii subsp. bulgaricus (a dairy bacterium) and L. nasuensis (the aforementioned outlier in G+C mol%). The L. animalis group and L. buchneri group are similarly clustered, albeit more loosely. These observations hold true when comparing the G+C mol% of all the individual genes in their respective genomes, perhaps reflecting a consistent and genome-wide pace of drift, rather than variable speeds of drift for each gene (Fig. 7b). Again, the 16S rRNA sequence has a much higher G+C mol% than most of the other studied genes, with the outlier L. nasuensis deviating from the consensus. The G+C mol% values of the glycolysis genes within clusters are often very close, as exemplified by the L. acidophilus group.
Fig. 7.

G+C mol% analysis of Lactobacillus genomes. (a) shows the total G+C mol% for each species. Species are coloured according to their phylogenetic group. (b) shows the G+C mol% of the glycolysis genes, the concatenated glycolysis genes, the 16S rRNA and total G+C mol% for each species. Species are named according to Table 1.

Discussion

The genomic and functional attributes of Lactobacillus render it a pervasive genus, both in research and in industry. The benefits and uses of this diverse set of species are well-established and exhaustive, and yet, the list continues to grow. Many Lactobacillus strains are now considered to be health-promoting in the form of probiotics and are often found to be a part of a healthy microbiome [26]. They are also being engineered to promote healthy host–microbe interactions and deliver bioactive compounds such as vaccines [52]. As microbiome studies expand, we anticipate that the interest in Lactobacillus is set to increase, especially given their occurrence in several human-associated microbiomes, encompassing intestinal, vaginal, oral and skin communities [21]. Many studies have been published discussing the role of Lactobacillus in the microbiota, including research into the microbiota changes through disease, enhancing the microbiome as a form of treatment, and how the microbiome reacts to drugs [53-55]. The continuously expanding list of uses and studies just illustrates how important it is to accurately identify Lactobacillus species. While all species of Lactobacillus share some classical features of LAB organisms, notably their ability to produce lactic acid, the similarities between species are relatively few. In fact, even basic characteristics such as niche and isolation source can vary radically. Proper identification is an increasing concern especially when it comes to disease modelling in the human microbiome, as well as the formulation, tracking and efficacy of probiotic strains. Innovative techniques are continuously being developed and often use a combination of 16S rRNA with developing technologies, such as MALDI-TOF [56]. However, these tools are not broadly accessible and still rely partially on the sometimes unsatisfactory 16S rRNA. Here, we provide a practical alternative to the classical use of 16S rRNA sequencing. 
In this paper, we applied the previously proposed methodology of using glycolysis sequences to perform phylogenetic studies [31] to the genus Lactobacillus. We demonstrated that this method is a practical and robust approach for Lactobacillus. Compared to the traditional 16S rRNA method, this approach was able to consistently identify phylogenetic groupings, with notably high resolution between closely related species. While the 16S rRNA-based tree was able to identify the six phylogenetic groups, the concatenated tree added more discrimination both between and within groups, as evidenced by the higher bootstrap values in the glycolysis-based tree. Our grouping is consistent with a previous study using glycolysis sequences for phylogenetic analysis of Lactobacillus species [32]. Further analyses based on genomic content revealed clues as to why the glycolysis-based tree was better able to assign species. First, the organization of the genes in the genomes revealed two conserved operons in Lactobacillus, the gap operon and the pfk operon, with the genes for the remaining enzymes showing clear start and stop boundaries. This shared synteny emphasizes the importance of glycolysis gene conservation. Second, we looked at expression level. The glycolysis genes were consistently among the most highly expressed genes in the cell, with the gap gene always among the three most abundant transcripts. These high expression levels indicate heavy use and energy expenditure and, thus, arguably reflect the biological importance of these genes to the cell. Because of this importance, the glycolysis genes are much less likely to be subjected to loss. The operon structures and expression levels of the glycolysis genes are significant because a main criterion for selecting the 16S rRNA as a phylogenetic marker was its high conservation among species [57], and these features suggest that the glycolysis genes are similarly conserved. Finally, we looked at how the glycolysis genes reflected genomic drift in terms of G+C mol%.
First, it would appear that the genus is reaching a stabilizing point in its G+C mol% drift, though some species with high G+C mol% still have margin for extending the trend (L. nasuensis, Lactobacillus zymae and L. fermentum). Next, we saw that the glycolysis gene G+C mol% was extremely close to the genome-wide G+C mol%, while that of the 16S rRNA was significantly higher (P<0.001). The G+C mol% of the 16S rRNA therefore differs markedly from that of the total genome, whereas that of the majority of the glycolysis genes closely tracks the total genome G+C mol% (Fig. 6). This provides a possible explanation for why the 16S rRNA analyses have been limited at a high-resolution level in Lactobacillus and why the glycolysis-based tree was able to reach a higher resolution. In fact, it has long been noted that 16S rRNA is unable to discriminate between species of lactobacilli due to its high similarity amongst them [58]. By contrast, the individual glycolysis genes are much more similar to the genome as a whole (Fig. 6). Additionally, individual glycolysis genes are also able to accurately assign species to groups with a high resolution (Figs S4–S13). The gap gene is of particular note, due to its presence in an operon, consistently high expression, G+C mol% and ability to accurately define species groups. Overall, the glycolysis-based approach was able to provide a higher-resolution phylogeny for Lactobacillus, due in part to the conservation, expression and reflection of genomic drift of the glycolysis genes.
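
The G+C mol% comparison discussed above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names `gc_mol_percent` and `gc_deviation` are hypothetical, the toy sequences are invented for illustration only, and the sketch assumes that ambiguous bases (such as N) are simply excluded from the denominator.

```python
def gc_mol_percent(seq: str) -> float:
    """Return G+C content as a percentage of unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    at = sum(seq.count(base) for base in "ATU")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / total


def gc_deviation(marker_seq: str, genome_seq: str) -> float:
    """Marker G+C mol% minus genome-wide G+C mol%, in percentage points."""
    return gc_mol_percent(marker_seq) - gc_mol_percent(genome_seq)


# Hypothetical toy sequences, for illustration only (not data from the study).
toy_genome = "ATGCATTAAT" * 10   # 2 of every 10 bases are G or C
toy_marker = "GGCCATGCGC"        # 8 of 10 bases are G or C

print(f"genome G+C mol%: {gc_mol_percent(toy_genome):.1f}")
print(f"marker G+C mol%: {gc_mol_percent(toy_marker):.1f}")
print(f"deviation (percentage points): {gc_deviation(toy_marker, toy_genome):.1f}")
```

In this framing, a marker whose deviation from the genome-wide value is close to zero (as reported here for most glycolysis genes) mirrors genomic drift, whereas a large positive deviation (as reported for the 16S rRNA) does not.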
  55 in total

1.  How close is close: 16S rRNA sequence identity may not be sufficient to guarantee species identity.

Authors:  G E Fox; J D Wisotzkey; P Jurtshuk
Journal:  Int J Syst Bacteriol       Date:  1992-01

2.  Conservation of gene order: a fingerprint of proteins that physically interact.

Authors:  T Dandekar; B Snel; M Huynen; P Bork
Journal:  Trends Biochem Sci       Date:  1998-09       Impact factor: 13.807

3.  Expert consensus document. The International Scientific Association for Probiotics and Prebiotics consensus statement on the scope and appropriate use of the term probiotic.

Authors:  Colin Hill; Francisco Guarner; Gregor Reid; Glenn R Gibson; Daniel J Merenstein; Bruno Pot; Lorenzo Morelli; Roberto Berni Canani; Harry J Flint; Seppo Salminen; Philip C Calder; Mary Ellen Sanders
Journal:  Nat Rev Gastroenterol Hepatol       Date:  2014-06-10       Impact factor: 46.802

4.  The complete genomes of Lactobacillus plantarum and Lactobacillus johnsonii reveal extensive differences in chromosome organization and gene content.

Authors:  Jos Boekhorst; Roland J Siezen; Marie-Camille Zwahlen; David Vilanova; Raymond D Pridmore; Annick Mercenier; Michiel Kleerebezem; Willem M de Vos; Harald Brüssow; Frank Desiere
Journal:  Microbiology       Date:  2004-11       Impact factor: 2.777

5.  Comparative sequence analysis of a recA gene fragment brings new evidence for a change in the taxonomy of the Lactobacillus casei group.

Authors:  G E Felis; F Dellaglio; L Mizzi; S Torriani
Journal:  Int J Syst Evol Microbiol       Date:  2001-11       Impact factor: 2.747

Review 6.  Genomics of the Genus Bifidobacterium Reveals Species-Specific Adaptation to the Glycan-Rich Gut Environment.

Authors:  Christian Milani; Francesca Turroni; Sabrina Duranti; Gabriele Andrea Lugli; Leonardo Mancabelli; Chiara Ferrario; Douwe van Sinderen; Marco Ventura
Journal:  Appl Environ Microbiol       Date:  2015-11-20       Impact factor: 4.792

7.  Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors:  Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal:  Bioinformatics       Date:  2012-04-27       Impact factor: 6.937

8.  Phylogenetic Analysis of the Bifidobacterium Genus Using Glycolysis Enzyme Sequences.

Authors:  Katelyn Brandt; Rodolphe Barrangou
Journal:  Front Microbiol       Date:  2016-05-09       Impact factor: 5.640

9.  Novel approaches for the taxonomic and metabolic characterization of lactobacilli: Integration of 16S rRNA gene sequencing with MALDI-TOF MS and 1H-NMR.

Authors:  Claudio Foschi; Luca Laghi; Carola Parolin; Barbara Giordani; Monica Compri; Roberto Cevenini; Antonella Marangoni; Beatrice Vitali
Journal:  PLoS One       Date:  2017-02-16       Impact factor: 3.240

10.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

