Comparative genome analysis of Lactobacillus mudanjiangensis, an understudied member of the Lactobacillus plantarum group.

Sander Wuyts¹,², Camille Nina Allonsius², Stijn Wittouck², Sofie Thys³, Bart Lievens⁴, Stefan Weckx¹, Luc De Vuyst¹, Sarah Lebeer².

Abstract

The genus Lactobacillus is known to be extremely diverse and consists of different phylogenetic groups that show a diversity that is roughly equal to the expected diversity of a typical bacterial genus. One of the most prominent phylogenetic groups within this genus is the Lactobacillus plantarum group, which contains the understudied Lactobacillus mudanjiangensis species. Before this study, only one L. mudanjiangensis strain, DSM 28402T, had been described, but without whole-genome analysis. In this study, three strains classified as L. mudanjiangensis were isolated from three different carrot juice fermentations and their whole-genome sequence was determined, together with the genome sequence of the type strain. The genomes of all four strains were compared with publicly available L. plantarum group genome sequences. This analysis showed that L. mudanjiangensis harboured the second largest genome size and gene count of the whole L. plantarum group. In addition, all members of this species showed the presence of a gene coding for a cellulose-degrading enzyme. Finally, three of the four L. mudanjiangensis strains studied showed the presence of pili on scanning electron microscopy (SEM) images, which were linked to conjugative gene regions, coded on a plasmid in at least two of the strains studied.

Keywords: Lactobacillus; cellulase; comparative genomics; conjugation; scanning electron microscopy

Year: 2019 PMID: 31368886 PMCID: PMC6807380 DOI: 10.1099/mgen.0.000286

Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858

The sequencing data and genome assemblies are available at the European Nucleotide Archive under accession number PRJEB29655 (https://www.ebi.ac.uk/ena/data/view/PRJEB29655). The complete data analysis pipeline can be found on GitHub (https://github.com/swuyts/mudAnalysis).

Members of the bacterial genus Lactobacillus are well known because of their use in foods and probiotics. Currently more than 200 species have been described as members of this genus and every year several new species are discovered. One of them is Lactobacillus mudanjiangensis, first described in 2013. Since its first description, no other study has provided additional characterization or reported the isolation of other strains of the L. mudanjiangensis species. In this study, three new strains, isolated from fermented carrot juice, are reported and the first comparative genome analysis of this species is presented. This resulted in the discovery of a cellulose-degrading enzyme, which has not been found in any other Lactobacillus species, and which could be useful for several industrial applications wherein breakdown of this important skeletal plant component is necessary. Furthermore, scanning electron microscopy detected the presence of pili, which were linked to bacterial conjugation, a process in which DNA is transferred from one bacterial cell to another. In general, this genome-based study of L. mudanjiangensis thus provides the first insights into the biology of this species, which could lead to novel applications.

Introduction

The genus Lactobacillus is known to be extremely diverse [1]. Furthermore, it has been shown that different phylogenetic groups within this genus display a diversity that is roughly equal to the expected diversity of a typical bacterial genus [2-6]. Each of these phylogenetic groups can be recognized as an entity with unique properties and a distinct natural history, ecology, function and physiology [5]. Therefore, studying each of these phylogenetic groups separately, as if it were a genus in its own right, can be an interesting approach that might reveal new, previously overlooked, phylogenetic relationships and functional properties.

One of the more abundantly studied species within the genus Lactobacillus is L. plantarum. Previous genome-based phylogenetic studies have defined L. plantarum as a member of the L. plantarum group, together with L. fabifermentans, L. paraplantarum, L. pentosus and L. xiangfangensis [1, 7]. In addition, the species L. herbarum [8], L. plajomi [9], L. modestisalitolerans [9] and L. mudanjiangensis [10] are closely related to L. plantarum and thus should be regarded as members of the L. plantarum group. L. mudanjiangensis is a species that was described for the first time in 2013 and that was isolated from a traditional pickle fermentation in the Heilongjiang province in China [10]. Since its first description, no other study has provided additional characterization or reported the isolation of other strains of the L. mudanjiangensis species. Therefore, before this study, not a single genome assembly of this species was publicly available. However, in this study, three strains isolated from three different spontaneous carrot juice fermentations [11] were putatively classified as members of this species.

Since the discovery of the mucus-binding pili (also termed fimbriae or adhesins) in Lactobacillus rhamnosus GG [12, 13], several comparative genomic studies have focused on exploring similar gene clusters in other lactobacilli, including members of the L. plantarum group [1, 14–18]. Whereas these specific pili play an important role in cell surface adhesion, pili can be of importance for an array of other functions as well, ranging from biofilm formation to the uptake of extracellular DNA via natural competence (type IV pili) or facilitation of DNA transfer via conjugation [19-21]. The latter is a process that uses conjugative pili to bring bacterial cells together and provide an interface for the exchange of macromolecules, such as DNA or DNA–protein complexes [20]. Historically, these conjugation systems and their pili have only been associated with conjugative plasmids [22], one of the main drivers of horizontal gene transfer [23, 24]. However, recently, integrative and conjugative elements (ICEs), which harbour conjugation systems as well, have also been found to be an important driver of horizontal gene transfer [24-26].

This study aimed to provide more insights into the genomic features of the understudied L. mudanjiangensis species, in relation to the other members of the L. plantarum group, using a comparative genomics approach. Therefore, the genome of the type strain of L. mudanjiangensis was sequenced together with three strains isolated from fermented carrot juice. These and other publicly available genome sequences were used to screen for species-specific properties, which included an analysis for the presence of genes related to pili formation and conjugation. In total, 304 genomes were subjected to an in-depth analysis focusing on the phylogenetic relationships as well as the predicted functional capacity of these strains.

Methods

Sequencing of the type strain and downloading of publicly available assemblies

The type strain of L. mudanjiangensis [DSM 28402T (=LMG 27194T=CCUG 62991T)] was purchased from a public micro-organism collection (BCCM-LMG, Ghent, Belgium). This strain and L. mudanjiangensis AMBF197, AMBF209 and AMBF249 were grown overnight in de Man–Rogosa–Sharpe (MRS) medium (Carl Roth, Karlsruhe, Germany) and DNA was extracted using the NucleoSpin 96 tissue kit (Macherey-Nagel, Düren, Germany), with an extra cell lysis step using 20 mg ml−1 of lysozyme (Sigma-Aldrich, St Louis, MO, USA) and 100 U ml−1 of mutanolysin (Sigma-Aldrich). Whole-genome sequencing was performed using the Nextera XT DNA Sample Preparation kit (Illumina, San Diego, CA, USA) and the Illumina MiSeq platform, using 2×250 cycles at the Laboratory of Medical Microbiology (University of Antwerp, Antwerp, Belgium) for strains AMBF197, AMBF249 and DSM 28402T, or 2×300 cycles at the Center of Medical Genetics Antwerp (University of Antwerp) for strain AMBF209. De novo assembly of the genome sequences was performed using SPAdes v3.12.0 [27]. In addition, all genome sequences annotated as L. fabifermentans, L. herbarum, L. paraplantarum, L. pentosus, L. plantarum and L. xiangfangensis were downloaded from the National Center for Biotechnology Information (NCBI) Assembly database on 24 July 2018 using in-house scripts. In total, 310 genomes were used as an input for quality control.

Quality control and annotation

Basic genome characteristics, including genome size, GC content, completeness and N50 value, were estimated using Quast 4.6.3 [28] and CheckM v1.0.12 [29]. The quality of the genome assemblies was evaluated using the Quast and CheckM output. After visualization of several quality control parameters using ggplot2 [30], genomes with an N50 value <25 000 bp, a number of undefined nucleotides (N) per 100 000 bases >500 or a completeness <94 % were discarded. Furthermore, one genome with an extremely low total genome length (GCA_001660655) was also discarded. An overview of all the genome sequences and strains that passed this quality control (304 assemblies) can be found in Table S1 (available in the online version of this article). Finally, Prokka 1.12 [31] was used to predict and annotate genes for all genome sequences, including an estimation of the number of tRNA and rRNA sequences. In addition to its internal databases, a customized genus-specific blast database was used for higher quality annotation with Prokka's --usegenus option. This database was created using blast [32, 33] and all complete Lactobacillus genomes found in the NCBI Assembly database.
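The quality filter described above can be sketched in a few lines of Python. This is only an illustration, assuming that an assembly is discarded as soon as it fails any one of the three thresholds; the `AssemblyStats` class, the function name and the example values are hypothetical and are not taken from the published pipeline.

```python
from dataclasses import dataclass


@dataclass
class AssemblyStats:
    """Per-assembly metrics (hypothetical container, not from the paper)."""
    accession: str
    n50: int               # N50 in bp, e.g. as reported by Quast
    ns_per_100kb: float    # undefined nucleotides (N) per 100 000 bases
    completeness: float    # completeness in percent, e.g. as reported by CheckM


def passes_qc(stats, min_n50=25_000, max_ns_per_100kb=500.0, min_completeness=94.0):
    """Return True when the assembly meets all three thresholds from the text."""
    return (
        stats.n50 >= min_n50
        and stats.ns_per_100kb <= max_ns_per_100kb
        and stats.completeness >= min_completeness
    )


# Made-up example values, for illustration only.
good = AssemblyStats("example_1", n50=180_000, ns_per_100kb=0.0, completeness=99.1)
fragmented = AssemblyStats("example_2", n50=12_000, ns_per_100kb=0.0, completeness=98.5)
kept = [s.accession for s in (good, fragmented) if passes_qc(s)]
```

The single assembly removed for its extremely low total length would need a separate, manual exclusion, since no numeric cut-off for genome length is given.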

Defining the pangenomes of all L. plantarum group species

To define the pangenome, all genes were clustered into orthogroups using OrthoFinder 2.2.6 [34] and further analysed in R [35]. Here, a core orthogroup is defined as an orthogroup present in more than 95 % of a set of genomes. All other orthogroups are defined as accessory orthogroups. An upset plot was created using the R package UpSetR [36]. Unique orthogroups belonging to L. mudanjiangensis were further annotated using EggNOG-mapper [37] and visualized using ggplot2 [30].
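The core/accessory definition used here amounts to a simple presence fraction per orthogroup. A minimal sketch, assuming a mapping from orthogroup identifiers to the set of genomes in which they occur (the function name and the toy identifiers are hypothetical):

```python
def classify_orthogroups(presence, n_genomes, core_fraction=0.95):
    """Label each orthogroup as 'core' or 'accessory'.

    presence: dict mapping orthogroup id -> set of genome ids containing it.
    An orthogroup is core when it occurs in MORE than `core_fraction`
    of the genomes, following the definition in the text.
    """
    labels = {}
    for orthogroup, genomes in presence.items():
        fraction = len(genomes) / n_genomes
        labels[orthogroup] = "core" if fraction > core_fraction else "accessory"
    return labels


# Toy example with 20 genomes (identifiers are made up).
all_genomes = {f"g{i}" for i in range(20)}
presence = {
    "OG_a": set(all_genomes),               # 20/20 genomes
    "OG_b": {f"g{i}" for i in range(18)},   # 18/20 genomes
}
labels = classify_orthogroups(presence, n_genomes=len(all_genomes))
```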

Phylogenetic tree construction

Single-copy core orthogroups found by OrthoFinder were used as input for the construction of a phylogenetic tree. Strain DSM 15638 (NCBI Assembly accession number GCA_001434695) served as an outgroup, as it belongs to the species most closely related to the L. plantarum group [1]. The first protein sequence of each fasta file of the single-copy core orthogroups was compared with a blast database of all proteins of the outgroup's genome sequence. All hits with a coverage >75 % and a percentage similarity >50 % were added to the alignment of each orthogroup. These alignments, on the amino acid level, were concatenated into a supermatrix that was used in RAxML 8.2.9 [38] to build a maximum-likelihood phylogenetic tree with the -f a option, which combines a rapid bootstrap algorithm with an extensive search of the tree space, starting from multiple different starting trees. The tree and subtrees were plotted with the R package ggtree [39].
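The concatenation of per-orthogroup alignments into a supermatrix can be illustrated as follows. This is a sketch under the assumption that each alignment is available as a mapping from taxon name to aligned sequence and that a taxon missing from an alignment (for example an outgroup without a qualifying hit) is padded with gaps; it is not the code used in the study.

```python
def concatenate_alignments(alignments, taxa):
    """Concatenate aligned orthogroups into one supermatrix.

    alignments: list of dicts mapping taxon -> aligned amino acid sequence;
                all sequences within one dict have equal length.
    taxa:       every taxon that must appear in the supermatrix.
    Taxa absent from an alignment are filled with gap characters.
    """
    parts = {taxon: [] for taxon in taxa}
    for alignment in alignments:
        length = len(next(iter(alignment.values())))
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, "-" * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments (sequences are made up).
alignments = [
    {"strainA": "MK-", "strainB": "MKL"},
    {"strainA": "GG", "strainB": "GA", "outgroup": "GG"},
]
supermatrix = concatenate_alignments(alignments, ["strainA", "strainB", "outgroup"])
```

Every row of the resulting matrix has the same length, which is what tree-building programs expect from a concatenated alignment.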

Average nucleotide identity

All pairwise ANI values were calculated with the Python pyani package [40] using a blastn approach [32, 33] based on the methodology described by Goris et al. [41].

Characterization of a cellulase-encoding gene

The predicted cellulase-encoding genes were further characterized by performing a mafft [42] alignment on the nucleotide and protein level to assess their similarity. Furthermore, a blastp search against the NCBI protein (nr) database was performed [32, 33]. Only hits with a query coverage >80 % and a percentage identity >60 % were retained. A conserved domain analysis was performed using the NCBI Conserved Domain web interface [43]. Genes encoding glycosyl hydrolases (GHs) were detected by scanning all L. mudanjiangensis genomes and all genomes of the Lactobacillus genus complex (LGC), as described by Wittouck et al. [44] (genome sequences downloaded 18 December 2018), against hidden Markov model (HMM) profiles of the CAZyme families [45]. The profiles were downloaded from the dbCAN webserver [46] and queried using hmmscan [47]. An E-value of 1×10⁻¹⁵ and a coverage of 0.35 were used as cut-offs, similar to what has been described before [48].

Experimental validation of the cellulase activity was performed by growing the bacterial strains AMBF197, AMBF209, AMBF249, DSM 28402T and CMPG5300 overnight in MRS medium with the addition of 0.5 % carboxymethyl cellulose (CMC) at 37 °C. The overnight cultures were centrifuged for 10 min at 4000 g, the supernatant was removed and the cells were resuspended in 8 ml phosphate-buffered saline (PBS). Next, the cells were centrifuged again, the supernatant was removed and the cells were finally resuspended in 80 µl PBS [56 g of NaCl, 1.4 g of KCl, 10.48 g of Na2HPO4 and 1.68 g of KH2PO4 (pH 7.4) l−1]. Then 60 µl of each culture was spotted in the middle of a CMC agar medium plate (5 g of CMC, 1 g of NaNO3, 1 g of K2HPO4, 1 g of KCl, 0.5 g of MgSO4, 0.5 g of yeast extract and 15 g of agar l−1), followed by a 3-day incubation at 37 °C. Finally, to visualize halo formation, the agar plates were stained for 30 min with 3 ml of a 0.1 % Congo red solution and destained twice for 15 min with 3 ml of 1 M NaCl.
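The cut-offs applied to the HMM hits can be expressed as a small filter. The sketch below assumes that coverage is measured as the aligned fraction of the HMM profile, which is a common convention for CAZyme annotation but is not spelled out in the text; the function name and parameter names are hypothetical.

```python
def keep_cazyme_hit(evalue, hmm_from, hmm_to, hmm_length,
                    max_evalue=1e-15, min_coverage=0.35):
    """Apply the E-value and coverage cut-offs to one HMM hit.

    hmm_from / hmm_to are 1-based alignment coordinates on the profile.
    Coverage is ASSUMED to be the aligned fraction of the profile length.
    """
    coverage = (hmm_to - hmm_from + 1) / hmm_length
    return evalue <= max_evalue and coverage >= min_coverage


# Made-up hits: (E-value, profile start, profile end, profile length).
hits = [
    (1e-40, 10, 250, 280),   # strong hit covering most of the profile
    (1e-40, 10, 60, 280),    # significant but short alignment
    (1e-5, 1, 280, 280),     # full-length but weak hit
]
retained = [hit for hit in hits if keep_cazyme_hit(*hit)]
```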

Scanning electron microscopy (SEM)

To assess the presence or absence of pili or fimbriae on the cell surface of strains AMBF197, AMBF209, AMBF249 and DSM 28402T, SEM was performed. To this end, the bacterial strains were grown overnight (MRS medium, 37 °C), gently washed with PBS and spotted on a gold-coated membrane [approximately 5×10⁷ colony-forming units (c.f.u.) per membrane]. Bacterial spots were fixed with 2.5 % (m/v) glutaraldehyde in 0.1 M sodium cacodylate buffer (2.5 % glutaraldehyde, 0.1 M sodium cacodylate, 0.05 % CaCl2·2H2O; pH 7.4) by gently shaking the membrane for 1 h at room temperature, followed by a further overnight fixation at 4 °C. After fixation, the membranes were washed three times for 20 min with cacodylate buffer [containing 7.5 % (m/v) saccharose]. Subsequently, the bacteria were dehydrated in an ascending series of ethanol (50, 70, 90 and 95 %, each for 30 min at room temperature, and 100 % for 2×1 h and 1×30 min) and dried in a Leica EM CPD030 (Leica Microsystems Belgium, Diegem, Belgium). The membranes were mounted on a stub and coated with 5 nm of carbon in a Leica EM Ace 600 coater (Leica Microsystems Belgium). SEM imaging was performed using a Quanta FEG250 SEM system (Thermo Fisher, Asse, Belgium) at the Antwerp Centre for Advanced Microscopy (ACAM, University of Antwerp) and the Electron Microscopy for Material Science group (EMAT, University of Antwerp).

Detection of genomic clusters encoding pili or fimbriae

To screen for the presence of the spaCBA gene cluster, the gene cluster that is responsible for expression of the fimbriae in L. rhamnosus GG [12, 13], a blast search [32, 33] on the protein level was performed against a blast database constructed for each genome separately. The protein sequences of SpaA (NCBI GenBank accession number BAI40953.1), SpaB (BAI40954.1) and SpaC (BAI40955.1) were used as queries. Furthermore, the genomes were screened for genes encoding pili-related protein secretion systems using the predicted amino acid sequences as a query and the TXSScan definitions and profile models [49] as references in MacSyFinder v1.0.5 [50]. As only genes related to conjugation systems were found, all protein sequences of all genomes were scanned again with MacSyFinder, this time using the CONJScan definitions and profile models [22, 24]. In brief, a conjugation region was only considered if the conjugation genes were separated by fewer than 31 genes, except for genes encoding relaxases, which could be separated by at most 60 genes. The region was considered to be conjugative when it contained genes coding for (i) a VirB4/TraU homologue, (ii) a relaxase, (iii) a type 4 coupling protein (T4CP) and (iv) a minimum number of type 4 secretion system (T4SS) type-specific genes [24]. For both scans, hits with alignments covering >50 % of the protein profile and with an independent E-value <10⁻³ were kept for further analysis (default parameters) in R [35]. Conserved domain analysis of genes of interest was performed using the NCBI Conserved Domain web interface [43]. The gene regions were visualized using the R package gggenes (available at https://github.com/wilkox/gggenes).
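The spacing and completeness rules for a conjugation region can be made concrete with a simplified sketch. It does not reproduce MacSyFinder or the CONJScan models; it only encodes the rules as stated in the text, assuming hits are given as gene positions on one contig. The role labels and the `min_mpf_genes` parameter are hypothetical, since the required number of T4SS type-specific genes is not given here.

```python
def is_conjugative_region(hits, min_mpf_genes=1, max_gap=30, max_relaxase_gap=60):
    """Check a candidate region against the rules described in the text.

    hits: list of (gene_index, role) tuples on a single contig, with role in
          {'virb4', 'relaxase', 't4cp', 'mpf'} (labels chosen for this sketch).
    Neighbouring hits may be separated by fewer than 31 intervening genes,
    or by at most 60 when one of the two is a relaxase.
    """
    ordered = sorted(hits)
    for (pos_a, role_a), (pos_b, role_b) in zip(ordered, ordered[1:]):
        intervening = pos_b - pos_a - 1
        limit = max_relaxase_gap if "relaxase" in (role_a, role_b) else max_gap
        if intervening > limit:
            return False
    roles = [role for _, role in ordered]
    has_mandatory = all(r in roles for r in ("virb4", "relaxase", "t4cp"))
    return has_mandatory and roles.count("mpf") >= min_mpf_genes


# Made-up gene positions for illustration.
complete = [(0, "relaxase"), (50, "t4cp"), (55, "virb4"), (60, "mpf")]
too_spread = [(0, "t4cp"), (50, "virb4"), (52, "relaxase"), (54, "mpf")]
no_relaxase = [(0, "t4cp"), (5, "virb4"), (9, "mpf")]
```

In this toy data, `complete` satisfies all rules, `too_spread` fails on spacing and `no_relaxase` lacks a mandatory component.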

Plasmid identification

Detection and reconstruction of plasmids in the different strains was performed using Recycler v0.7 [51], with the original fastq files and SPAdes assembly graphs as input. In addition, plasmidSPAdes (SPAdes v3.12.0) [52] was used and only circular sequences detected with both assembly strategies were retained for further analysis. The assembled plasmids were annotated with Prokka and further characterized by scanning against the EggNOG database, as described above. The presence of a conjugation system was confirmed with CONJScan, as described above. The percentage identity between the different plasmids found was assessed using blast [32, 33]. The similarity with any previously described plasmid was checked by performing a Mash (v2.1.1) [53] distance search against the PLSDB database with the PLSDB webserver (v0.4.1–2) [54]. Only plasmids with a maximum distance of 0.05 were kept for further analysis. The presence of shared genes was assessed by performing a blast search [32, 33] for all predicted plasmid gene sequences against the reference plasmid sequences. Only hits with a nucleotide identity >80 % and a query coverage >70 % were considered true hits. A plasmid map was created using Geneious v8 [32, 33].
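The two numeric filters in this section (the Mash distance cut-off for reference plasmids and the identity/coverage cut-offs for shared genes) can be sketched as below. The function names and the example records are hypothetical; only the thresholds come from the text.

```python
def close_reference_plasmids(mash_hits, max_distance=0.05):
    """Keep reference plasmids whose Mash distance does not exceed the cut-off.

    mash_hits: list of (reference_name, mash_distance) tuples.
    """
    return [name for name, distance in mash_hits if distance <= max_distance]


def is_true_hit(identity_pct, query_coverage_pct,
                min_identity=80.0, min_coverage=70.0):
    """Shared-gene criterion: nucleotide identity >80 % and query coverage >70 %."""
    return identity_pct > min_identity and query_coverage_pct > min_coverage


# Made-up records for illustration.
mash_hits = [("ref_plasmid_1", 0.01), ("ref_plasmid_2", 0.20)]
blast_hits = {"gene_1": (95.0, 100.0), "gene_2": (85.0, 40.0), "gene_3": (60.0, 90.0)}

references = close_reference_plasmids(mash_hits)
shared_genes = [gene for gene, (ident, cov) in blast_hits.items() if is_true_hit(ident, cov)]
```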

Delimitation of integrative and conjugative elements

The presence of ICEs was explored by using an approach similar to the pipeline described previously [24]. Briefly, all strict core genes, i.e. genes present in all strains of L. mudanjiangensis, were found using the OrthoFinder output (see above). Next, all flanking core genes of each conjugative region were identified. Since within one species an ICE is expected to be found between the same core orthogroups, the flanking core genes of each conjugative region found were evaluated to determine whether or not the region could be defined as an ICE.
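The idea of comparing flanking core genes can be illustrated with a short sketch. It assumes each contig is available as an ordered list of orthogroup identifiers and that a region counts as a candidate ICE when it sits between the same pair of core orthogroups in every strain, irrespective of orientation; this is one reading of the criterion, not the published implementation, and all identifiers are made up.

```python
def flanking_core(contig, start, end, core):
    """Return the nearest core orthogroups left and right of a region.

    contig: ordered list of orthogroup ids along one contig.
    start, end: inclusive indices of the conjugative region on that contig.
    core: set of strict core orthogroup ids.
    A side without any core gene on the contig yields None.
    """
    left = next((og for og in reversed(contig[:start]) if og in core), None)
    right = next((og for og in contig[end + 1:] if og in core), None)
    return left, right


def shares_insertion_site(flanks):
    """True when all regions lie between the same two core orthogroups."""
    sites = {frozenset(pair) for pair in flanks}
    return len(sites) == 1 and None not in next(iter(sites))


core = {"core_1", "core_2", "core_3"}
contig_a = ["core_1", "acc_1", "conj_1", "conj_2", "acc_2", "core_2"]
contig_b = ["core_2", "conj_2", "conj_1", "core_1"]   # same site, inverted

flanks_a = flanking_core(contig_a, 2, 3, core)
flanks_b = flanking_core(contig_b, 1, 2, core)
```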

Results

The assembled genome of the L. mudanjiangensis type strain DSM 28402T was analysed together with the genome sequences of three putative L. mudanjiangensis strains isolated from carrot juice fermentations, namely AMBF197, AMBF209 and AMBF249, to confirm their putative classification as members of this species. Furthermore, to allow comparison with other closely related species and the detection of species-specific properties, all publicly available genome sequences (NCBI Assembly database, 24 July 2018) of L. plantarum group members were included in this comparative genomics study, and 304 genomes in total were analysed (Table 1).

Table 1.

An overview of the studied species and strains

Public data
Species	No. of genomes	Type strain	Reference
L. fabifermentans	2	DSM 21115^T	[80]
L. herbarum	1	DSM 100358^T	[8]
L. mudanjiangensis	0	DSM 28402^T	[10]
L. paraplantarum	5	DSM 10667^T	[81]
L. pentosus	13	DSM 20314^T	[82]
L. plantarum	278	DSM 20174^T	[83]
L. xiangfangensis	1	DSM 27103^T	[84]
Type strains sequenced in this study
Species	No. of genomes	Type strain	Reference
L. mudanjiangensis	1	DSM 28402^T	[85]
In-house isolates
Species	No. of genomes	Isolation source	Reference
L. mudanjiangensis	3	Spontaneously fermented carrot juice	[11]

Phylogeny of the L. plantarum group

To obtain a detailed view on the phylogeny of L. mudanjiangensis in relation to the whole L. plantarum group, a maximum-likelihood phylogenetic tree was constructed, based on 612 single-copy core orthogroups found with OrthoFinder (Figs 1 and S1). The resulting topology of this tree showed seven major clades, mostly following the species annotation as described in the NCBI Assembly database. However, these results exposed a few wrongly annotated genome assemblies. For example, strains MPL16 and AY01 had both previously been annotated as L. plantarum, whereas here they were found within a clade that contained the L. paraplantarum type strain. Similarly, strain EGD-AQ4 was found within the clade of the L. pentosus type strain, whereas it had previously been annotated as L. plantarum. These strains were reclassified for subsequent analysis. Furthermore, the type strain of L. mudanjiangensis formed a separate clade together with the strains AMBF197, AMBF209 and AMBF249 (Fig. 1). Based on its single-copy core orthogroups, this species was phylogenetically the least similar to , whereas its most similar relative was , followed by .

Fig. 1.

Maximum-likelihood phylogenetic tree of the L. plantarum group. The tree is based on the amino acid sequences of 612 single-copy marker genes. Strain DSM 15638 was used as an outgroup. The tree was pruned to only keep 70 L. plantarum strains to avoid over-plotting. A complete tree can be found in Fig. S1. The branch length of the outgroup was shortened for better visualization. Each tip is coloured based on its species name as annotated in the NCBI Assembly database (where applicable), using dark blue, light green, light blue, pink, light orange and dark orange for the six species with public assemblies, and red for the isolates obtained from the carrot juice fermentations. The type strains of each species are annotated with a triangle (NCBI) or a square (sequenced in this study).

To confirm that each major phylogenetic clade represented at least one different species, the pairwise average nucleotide identity (ANI) values of all genome assemblies were calculated (Fig. 2). The intraclade ANI values all exceeded the commonly used 95 % species-level threshold [55] for (99.0–99.4 %), (99.7–99.9 %) and (99.7–99.9 %), whereas their interclade ANI values were far below this threshold, showing that these clades each represented a single species. However, this was not the case for and , for which multiple pairwise comparisons led to intraclade ANI values below this threshold, suggesting that these phylogenetic clades contained at least two species (Figs 2 and S2). Nevertheless, further analysis described elsewhere [44] showed that this was not the case and therefore none of the major clades were split for subsequent analysis. Finally, for L. herbarum and L. xiangfangensis, no intraclade comparisons could be performed, as only one genome assembly was available for these species.

Fig. 2.

Density plot of all pairwise average nucleotide identity (ANI) comparisons for each L. plantarum group species. All interclade comparisons are shown in green, whereas all intraclade comparisons are shown in orange. The vertical red line shows the commonly used 95 % species-level delimitation threshold [55]. No intraclade comparisons could be performed for L. herbarum and L. xiangfangensis, as only one genome assembly was available for these species.
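The intraclade versus interclade comparison summarized in Fig. 2 can be sketched as follows, assuming pairwise ANI values are available as a mapping from genome pairs to percentages and each genome has been assigned to a clade. The function names and the example values are hypothetical and do not correspond to the values measured in this study.

```python
def split_by_clade(ani, clade_of):
    """Split pairwise ANI values into intraclade and interclade lists.

    ani: dict mapping (genome_a, genome_b) -> ANI in percent.
    clade_of: dict mapping genome -> clade label.
    """
    intra, inter = [], []
    for (genome_a, genome_b), value in ani.items():
        same_clade = clade_of[genome_a] == clade_of[genome_b]
        (intra if same_clade else inter).append(value)
    return intra, inter


def consistent_with_single_species(intraclade_values, threshold=95.0):
    """True when every intraclade value exceeds the species-level threshold.

    Note: for a clade with a single genome the list is empty and the
    function returns True; such clades cannot be evaluated this way.
    """
    return all(value > threshold for value in intraclade_values)


# Made-up values for illustration.
clade_of = {"a1": "A", "a2": "A", "b1": "B"}
ani = {("a1", "a2"): 99.2, ("a1", "b1"): 80.0, ("a2", "b1"): 80.3}
intra, inter = split_by_clade(ani, clade_of)
```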

Genomic features of L. mudanjiangensis

The above results confirmed that strains AMBF197, AMBF209 and AMBF249 were members of the L. mudanjiangensis species. Therefore, the first four genomes of this species are presented here. Their estimated genome size varied between 3.4 Mb (strain DSM 28402T) and 3.6 Mb (strain AMBF209), whereas their GC content varied between 42.73 % (strain AMBF209) and 43.06 % (strain DSM 28402T) (Table 2). Finally, a high number of transfer RNA (tRNA) genes was found in all four strains (63 to 71 per genome; Table 2).

Table 2.

Genome characteristics of L. mudanjiangensis strains

Genome	Total length (bp)	# contigs	GC content (%)	Coding sequences	# tRNA genes	# rRNA genes	Accession no.
AMBF197	3 501 388	52	42.85	3463	65	8	GCA_900617905
AMBF209	3 589 692	111	42.73	3586	63	8	GCA_900617925
AMBF249	3 554 025	69	42.83	3503	71	8	GCA_900617935
DSM 28402^T	3 389 962	42	43.06	3346	66	8	GCA_900617945
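As a quick consistency check, the summary statistics for the four genomes can be recomputed from the values in Table 2. The snippet below is only a sketch using the Python standard library; it assumes that the species-level genome size is reported as a median and the GC content as a mean over the four strains.

```python
from statistics import mean, median

# Values copied from Table 2: total length (bp) and GC content (%).
table2 = {
    "AMBF197": (3_501_388, 42.85),
    "AMBF209": (3_589_692, 42.73),
    "AMBF249": (3_554_025, 42.83),
    "DSM 28402T": (3_389_962, 43.06),
}

lengths_mb = [length / 1e6 for length, _ in table2.values()]
gc_values = [gc for _, gc in table2.values()]

median_size_mb = median(lengths_mb)   # middle of the four genome sizes
mean_gc = mean(gc_values)             # average GC content
size_range_mb = (min(lengths_mb), max(lengths_mb))

print(f"median genome size: {median_size_mb:.2f} Mb")
print(f"mean GC content: {mean_gc:.1f} %")
```

Under these assumptions the median genome size rounds to 3.53 Mb and the mean GC content to 42.9 %.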

A substantial difference in total genome length between the different species of the L. plantarum group was found (Fig. 3a). L. mudanjiangensis showed a median estimated genome size of 3.53 Mb, making it the second largest of the whole L. plantarum group up to now. showed the largest median estimated genome size (3.74 Mb), followed by L. mudanjiangensis (3.53 Mb) and (3.43 Mb), whereas for (3.0 Mb) and (2.9 Mb) it was much smaller. A high spread in genome length was found within strains belonging to the species, as their genome size ranged between 2.9 and 3.8 Mb. Furthermore, L. mudanjiangensis showed a GC content of 42.9 %, the lowest value within the whole L. plantarum group (Fig. 3b). Finally, regarding median gene count, similar trends to those for the genome length were found, with showing the highest count, followed by and , whereas and harboured the lowest numbers of genes (Fig. 3c).

Fig. 3.

Estimated genome sizes, GC content and gene counts for all genomes of the L. plantarum group species studied, and predicted functional capacity of all unique L. mudanjiangensis orthogroups. (a) Total genome size, (b) GC content and (c) gene counts for all genomes studied, coloured by species. (d) Upset plot comparing shared orthogroup counts between the seven L. plantarum group species. Species-specific orthogroups for L. mudanjiangensis are coloured in blue and the inset shows their functional category based on EggNOG classification. Uniquely shared orthogroups between L. mudanjiangensis and one other L. plantarum group member are coloured in orange, whereas uniquely shared orthogroups between L. mudanjiangensis and two other species are coloured in red.

In total, 947 588 genes were found in the whole L. plantarum group, with an average of 3110 genes per genome. These genes were further clustered into 8005 different orthogroups, leading to an average count of 2924 orthogroups per genome. The differences between these numbers were due to the fact that some genes were found in multiple copies within one genome, which clustered together in a single orthogroup. Of all these orthogroups, 2172 were defined as core orthogroups and 5833 as accessory orthogroups. A detailed overview of the number of genes and core and accessory orthogroups can be found in Table S2.

Subsequently, the distribution of orthogroups between the different L. plantarum group members was explored (Fig. 3d). The species with the highest number of species-specific orthogroups was L. plantarum. With 2065 species-specific orthogroups, it greatly outnumbered all other species, although this number was most probably biased by the higher number of sequenced genomes available for L. plantarum compared with the other L. plantarum group species. It was followed by L. mudanjiangensis (Fig. 3d, blue), which contained 372 species-specific orthogroups, and , harbouring 279 species-specific orthogroups. Furthermore, and shared the highest number of uniquely shared orthogroups (426), followed by the combination of and (219 uniquely shared orthogroups), which seemed to be in line with the phylogeny described in Fig. 1. In contrast, L. mudanjiangensis shared more unique orthogroups with the phylogenetically distant (166 orthogroups; Fig. 3d, orange) than it did with its most similar species, (59 orthogroups).

To obtain more insights into the unique properties of L. mudanjiangensis, all 372 species-specific orthogroups were further classified using the EggNOG database (inset Fig. 3d). However, this resulted in the vast majority of orthogroups (304) being classified under 'category S: function unknown', showing that further experimental work on functional gene validation is necessary. Of the remaining orthogroups, most belonged to category G (carbohydrate transport and metabolism; 14 orthogroups), followed by category M (cell wall/membrane/envelope biogenesis; 13 orthogroups).
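The counts shown in an upset plot (species-specific orthogroups and orthogroups uniquely shared by a given combination of species) follow directly from the set of species in which each orthogroup occurs. A minimal sketch, with hypothetical function names and toy data:

```python
from collections import Counter


def sharing_profile(species_by_orthogroup):
    """Count orthogroups per exact combination of species.

    species_by_orthogroup: dict mapping orthogroup id -> set of species
    in which at least one genome carries that orthogroup.
    """
    return Counter(frozenset(species) for species in species_by_orthogroup.values())


def species_specific_counts(profile):
    """Extract the counts of orthogroups found in exactly one species."""
    return {next(iter(combo)): count
            for combo, count in profile.items() if len(combo) == 1}


# Toy data (species labels and orthogroup ids are made up).
species_by_orthogroup = {
    "OG_1": {"sp_A"},
    "OG_2": {"sp_A"},
    "OG_3": {"sp_A", "sp_B"},
    "OG_4": {"sp_B"},
}
profile = sharing_profile(species_by_orthogroup)
specific = species_specific_counts(profile)
uniquely_shared_ab = profile[frozenset({"sp_A", "sp_B"})]
```

As noted in the text, such counts depend on how many genomes are available per species, so they are best read as descriptive rather than as a measure of true species-specific gene content.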

L. mudanjiangensis harbours a cellulose-degrading enzyme

Carbohydrate transport and metabolism (category G) was found to be the most abundant characterized category among the L. mudanjiangensis-specific orthogroups. Further examination of the 14 unique orthogroups that were detected in this category revealed the presence of a gene in all four L. mudanjiangensis strains annotated as endoglucanase E1, which is involved in the conversion of cellulose polymers into simple saccharides [56]. Alignment of both the nucleotide and predicted protein sequences (GenBank accession numbers: VDG21000, VDG22783, VDG26647 and VDG31879) showed that the sequences were identical between all four strains studied. Further analysis revealed the presence of a conserved domain in the endoglucanase E1 gene of all strains, annotated as cellulase/glycosyl hydrolase family 5 (NCBI Conserved Domains ID: pfam00150), supporting the view that this gene is related to cellulose degradation. Since cellulases/endoglucanases are thus classified as glycosyl hydrolases (GHs), GHs were predicted for all four L. mudanjiangensis strains, with all LGC genomes as additional reference. Indeed, for all four strains, this endoglucanase E1 gene was classified as belonging to the GH5_1 family, a GH subfamily that was uniquely found in L. mudanjiangensis and not in any other member of the LGC. Although this GH5_1 family shows some degree of polyspecificity, the majority of enzymes (22 out of 24 enzymes characterized) are reported to be endoglucanases/cellulases [57]. Finally, a blastp search of the predicted protein sequences of these endoglucanase E1 genes against the NCBI nr database resulted in three additional hits to three predicted proteins of poorly characterized unclassified lactobacilli (GenBank accession numbers: WP_137634547, WP_137628413 and WP_137639555), which were not included in the above-described GH analysis, indicating that L. mudanjiangensis was not the only member of the LGC harbouring putative cellulase genes.
To confirm the in silico prediction of the putative cellulase-encoding genes, all four L. mudanjiangensis strains (AMBF197, AMBF209, AMBF249 and DSM 28402T) were grown on CMC agar, together with strain CMPG5300 as a negative control, as this strain lacks the endoglucanase E1 gene. Staining with Congo red, which binds to cellulose, revealed halo formation on plates containing strains AMBF197, AMBF209 and AMBF249 (Fig. 4). In contrast, no halo was formed on CMC agar plates containing DSM 28402T and CMPG5300. These results showed that, while a putative cellulase-encoding gene was found in the genome sequences of all four L. mudanjiangensis strains, unexpectedly, only three of them showed cellulose-degrading activity on CMC agar plates. Together, these results thus pointed towards the presence of a novel cellulose-degrading enzyme in three L. mudanjiangensis strains.

Fig. 4.

Cellulose degradation by L. mudanjiangensis. Strains AMBF197, AMBF209, AMBF249 and DSM 28402T were grown on carboxymethyl cellulose (CMC) agar and cellulose degradation was visualized by Congo red staining. Strain CMPG5300 was used as a negative control.

Presence of a putative conjugative system in L. mudanjiangensis

To characterize potential novel cell surface macromolecules associated with this species, the cell surfaces of the four L. mudanjiangensis strains AMBF197, AMBF209, AMBF249 and DSM 28402T were visualized using SEM. This analysis revealed that three of the four strains (DSM 28402T, AMBF209 and AMBF249) formed pili or fimbriae, connecting different cells to each other as well as cells to an undefined structure (Fig. 5a).

Fig. 5.

Scanning electron microscopy (SEM) and genes related to conjugation. (a) SEM images of all four L. mudanjiangensis strains studied. White arrows indicate putative conjugative pili. (b) Gene clusters encoding a putative conjugation system, coloured according to their potential function, as classified by CONJScan. The text above each gene shows its matching orthogroup. (c) Schematic model representing the process of bacterial conjugation with all three mandatory elements. The conjugation system is coloured based on the figure legend of (b). (R, relaxase; T4CP, type IV coupling protein; T4SS, type IV secretion system.) Adapted from [22].

To identify the genes encoding these pili, all L. mudanjiangensis genome sequences were screened for genes associated with this kind of phenotype. These included the spaCBA gene cluster, as well as pilus-based secretion systems, such as the type II and type IV secretion systems [12, 13, 49]. No spaCBA gene cluster was found. However, further exploration revealed the presence of a conjugation system in at least three of the four strains examined (AMBF209, AMBF249 and DSM 28402T). In general, a conjugation system consists of three major components: (i) a relaxase (Fig. 5c, green) that binds and nicks the DNA at the origin of transfer; (ii) a coupling protein (T4CP; Fig. 5c, orange) that couples the relaxase–DNA complex to (iii) a type IV secretion system (T4SS; Fig. 5c, blue), which ultimately transfers the whole complex to the recipient cell [22, 23, 49, 58]. Two complete conjugation systems containing all three mandatory parts (Fig. 5c) were found in each of AMBF209 and AMBF249, whereas one complete conjugation system was found in DSM 28402T (Fig. 5b).

All components of these conjugation systems were further classified following Guglielmini et al. [58]. For all three strains, the relaxase gene (Fig. 5b, c, green) was classified as a member of the MOBQ class, whereas the coupling protein (Fig. 5b, c, orange) was classified as T4CP2. The T4SS, which harboured the genes possibly related to the observed pilus, was classified as belonging to class MPF_FATA, which groups the conjugation-related T4SSs of Gram-positive bacteria [58]. The ATPase motor of this T4SS was identified as a VirB4 homologue (first described in Agrobacterium tumefaciens). Furthermore, this T4SS contained three accessory genes (trsC, trsD and trsJ) in AMBF209 and AMBF249, whereas four accessory genes were annotated in DSM 28402T (trsC, trsD, trsF and trsJ) (Fig. 5b). Homologues have previously been identified for two of these accessory genes: trsC codes for a VirB3 homologue, which is linked to the formation of the membrane pore, and trsD codes for another homologue of VirB4, the conjugation ATPase [58]. In contrast, both trsF and trsJ are poorly characterized.

Further analysis of the genes surrounding the annotated conjugation genes showed that this genomic region contained 18 to 19 open reading frames, most of them annotated as hypothetical proteins (Fig. 5b and Table S3). However, a conserved domain analysis revealed a bacteriophage peptidoglycan hydrolase domain in orthogroup OG0002812, annotated as a hypothetical protein, in both AMBF209 conjugation region 1 (AMBF209_CR1) and AMBF249 conjugation region 2 (AMBF249_CR2), making it a VirB1-like protein [58]. In A. tumefaciens, the VirB1 protein provides localized lysis of the peptidoglycan cell wall to allow insertion of the T4SS [59]. A similar domain, also known to harbour peptidoglycan lytic activity, was found in DSM28402_CR1 (orthogroup OG0002812). Finally, another conserved domain, annotated as T4SS_CagC and previously shown to be a VirB2 homologue, was found in all five gene regions, clustered in orthogroup OG0003012.
VirB2 is the major pilus component of the type IV secretion system of Agrobacterium tumefaciens and the main building block for extension and retraction of the conjugative pilus [60-62]. Taken together, these results showed the presence of pili in three strains (AMBF209, AMBF249 and DSM 28402T), which genomic analysis suggested to be part of a poorly characterized conjugation system. Finally, genome analysis of all other L. plantarum group members showed that the presence of a complete conjugation system was not unique to L. mudanjiangensis (Table S1). All three necessary genes were also found in 58 out of 275 L. plantarum strains, 2 out of 7 strains and 4 out of 14 strains. In contrast, the system was completely absent in , and .
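
The completeness criterion used above (a relaxase, a T4CP and a T4SS must all be present) can be illustrated with a minimal sketch. This is not the CONJScan implementation; the function names and the data layout are assumptions made for illustration. The five named regions follow the text, while `hypothetical_region` is invented to show a failing case.

```python
# Minimal sketch of the "three mandatory parts" rule described in the text.
# Not the CONJScan algorithm; names and data layout are illustrative assumptions.

MANDATORY = frozenset({"relaxase", "T4CP", "T4SS"})


def is_complete(components):
    """Return True if all three mandatory component types are present."""
    return MANDATORY.issubset(components)


def missing_parts(components):
    """Return the mandatory component types that are absent, sorted by name."""
    return sorted(MANDATORY - set(components))


# Component types per conjugation region. The five named regions are reported
# as complete in the text; "hypothetical_region" is an invented counter-example.
regions = {
    "AMBF209_CR1": {"relaxase", "T4CP", "T4SS"},
    "AMBF209_CR2": {"relaxase", "T4CP", "T4SS"},
    "AMBF249_CR1": {"relaxase", "T4CP", "T4SS"},
    "AMBF249_CR2": {"relaxase", "T4CP", "T4SS"},
    "DSM28402_CR1": {"relaxase", "T4CP", "T4SS"},
    "hypothetical_region": {"relaxase", "T4SS"},
}

complete = sorted(name for name, parts in regions.items() if is_complete(parts))

for name, parts in regions.items():
    status = "complete" if is_complete(parts) else f"missing {missing_parts(parts)}"
    print(f"{name}: {status}")
```

Accessory genes such as trsC, trsD, trsF and trsJ are deliberately left out of the check, since the text treats only the three component types as mandatory.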

Plasmid reconstruction from genome data

Many conjugation systems are encoded on plasmids [25]. Therefore, all four genomes were screened for plasmid presence. Plasmids were found in only two of the four genome assemblies, namely AMBF209 and AMBF249 (Fig. S3). Both strains harboured a plasmid of 27.3 kb with 33 predicted genes, and pairwise alignment showed the two plasmids to be identical. Subsequently, the presence of a conjugation system on these plasmids was confirmed using CONJScan. Further examination showed that the plasmid exactly matched the above-described AMBF209_CR2 and AMBF249_CR1 gene regions (Fig. 5b). Regarding annotated genes, 13 out of 33 gene products were predicted to be hypothetical proteins by Prokka. Further annotation using the EggNOG database revealed that most genes were mapped to category S (function unknown), followed by category L (replication, recombination and repair) and category C (energy production and conversion). Finally, a Mash search was performed against the PLSDB database to explore whether a similar plasmid had already been described in the literature. Three plasmids showed a high similarity (Mash distance <0.05) to the predicted plasmid, namely MFPC16A2803 plasmid pMFPC16A2803B (GenBank accession number: LT991587), Leuconostoc carnosum JB16 plasmid pKLC4 (NC_018699) and SRCM103356 plasmid unnamed1 (NZ_CP035140). Out of 33 L. mudanjiangensis plasmid genes, 18 genes were shared with pMFPC16A2803B and unnamed1, while 21 genes were shared with pKLC4 (Table S4). All plasmid-borne conjugation genes (AMBF209_CR2 and AMBF249_CR1) were also detected in all three reference plasmids. Taken together, these results showed that strains AMBF209 and AMBF249 carried the same conjugative plasmid, for which the encoded gene functions are poorly characterized.

Since only two out of five conjugation regions (Fig. 5, AMBF209_CR2 and AMBF249_CR1) were plasmid-encoded, an additional analysis was performed to assess whether the other three conjugation systems could be part of an integrative and conjugative element (ICE). For this, all four genomes were analysed using a method similar to one that was recently described [24]. However, ICE regions usually contain repeats, such as transposases, which leads to fragmentation of these regions when short-read sequencing technology is used [24]. These analysis methods therefore usually require a complete genome for proper ICE identification. The assembly state of the four genomes thus made it hard to interpret the results correctly.
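
The plasmid comparison reported in this section combines a Mash distance cut-off with a count of shared genes. The sketch below shows how such a summary could be computed; it is not the authors' pipeline. The cut-off (0.05), the plasmid size (33 genes) and the shared-gene counts come from the text, whereas the function names are assumptions and no Mash distances are included because the text does not report the individual values.

```python
# Minimal sketch of the plasmid similarity summary, not the authors' pipeline.
# Numbers come from the text; function names are illustrative assumptions.

PLASMID_GENES = 33   # predicted genes on the L. mudanjiangensis plasmid
MASH_CUTOFF = 0.05   # hits below this Mash distance were treated as highly similar

# Genes shared between the predicted plasmid and each reference plasmid (Table S4).
shared_genes = {
    "pMFPC16A2803B": 18,
    "pKLC4": 21,
    "unnamed1": 18,
}


def is_similar(distance, cutoff=MASH_CUTOFF):
    """Return True if a Mash distance falls below the similarity cut-off."""
    return distance < cutoff


def shared_fraction(n_shared, total=PLASMID_GENES):
    """Fraction of the plasmid's predicted genes shared with a reference plasmid."""
    return n_shared / total


fractions = {name: round(shared_fraction(n), 3) for name, n in shared_genes.items()}
best_match = max(shared_genes, key=shared_genes.get)

for name, frac in fractions.items():
    print(f"{name}: {shared_genes[name]}/{PLASMID_GENES} genes shared ({frac:.1%})")
print(f"Most shared genes: {best_match}")
```

Under these numbers, pKLC4 shares the largest fraction of genes (21 of 33), which matches the emphasis placed on this plasmid in the Discussion.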

Discussion

In this study, the genome sequence of the type strain DSM 28402T was presented together with the genomes of three new strains, AMBF197, AMBF209 and AMBF249, which were isolated from three different spontaneous carrot juice fermentations [11]. To gain more insight into this understudied species, the genome sequences were compared with all publicly available genome sequences of the closely related species belonging to the L. plantarum group. This resulted in the discovery of a putative cellulose-degrading enzyme, annotated as endoglucanase E1, in all four L. mudanjiangensis strains and three uncharacterized Lactobacillus sp. genomes. To date, such an enzyme has never been found in any other LGC genome. Moreover, cellulose-degrading enzymes have never previously been found in lactic acid bacteria (LAB). Cellulose is the most abundant organic polymer on Earth, the most important skeletal component in plants in general [63] and the most abundant crude fibre in carrots [64]. From an industrial perspective, degradation of this polysaccharide is considered valuable for the production of high value-added products such as biofuels and lactic acid [65]. For this reason, and because of the widespread industrial use of LAB, much effort has been devoted to the construction of recombinant cellulolytic LAB strains [65]. In particular, L. plantarum strains have been genetically modified to express cellulases due to their added benefit of high acid and ethanol tolerance [65-72]. This means that the discovery of a natural cellulolytic species, namely L. mudanjiangensis, which is a member of the L. plantarum group, may be valuable for industrial purposes. However, it should be noted that efficient hydrolysis of lignocellulose requires the cooperation of multiple enzymes [65] and therefore further characterization of the detected endoglucanase E1 is necessary. For example, as this enzyme was found in four strains but only three strains showed CMC degradation on CMC agar plates, future studies should focus on unravelling the conditions under which it is expressed.
SEM analysis revealed the presence of pili or fimbriae in three of the four strains studied. In this study, the observation of pili in L. mudanjiangensis was associated with bacterial conjugation. Conjugation is one of the main drivers of horizontal gene transfer and is commonly associated with conjugative plasmids [23, 24]. Here, two of the five conjugative regions found were plasmid-associated and the plasmids found in AMBF209 and AMBF249 were identical, although these strains were isolated from different household carrot juice fermentations [11]. Previous studies also identified and described conjugative plasmids in other species, such as [73], [74], [75], [76], [77] and [78]. Genes on these plasmids often code for proteins involved in detoxification, virulence, antibiotic resistance and ecological interactions [23], which could give their hosts a fitness advantage in certain environments. Here, apart from the conjugation-related genes, many genes on the conjugative plasmid were annotated as hypothetical proteins, showing that the detected plasmid was poorly characterized. However, since this plasmid showed high similarity and a high number of shared genes with a plasmid from a Leuconostoc carnosum strain, which was isolated from fermented kimchi [79], it could potentially harbour genes that are beneficial for survival on plants or in a fermented vegetable environment.

In conclusion, in this study, the genome sequences of four L. mudanjiangensis strains were studied in relation to the closely related members of the L. plantarum group. Comparative genome analysis showed that L. mudanjiangensis harboured one of the largest genomes and one of the highest gene counts of the L. plantarum group. Furthermore, a cellulose-degrading enzyme was detected in all four strains studied, but only three of these showed in vitro cellulase activity. Finally, three of the four strains studied showed the presence of pili on SEM images, and these were linked to conjugative gene regions. For two strains, AMBF209 and AMBF249, these regions were plasmid-associated. Further experimental studies, such as phenotypic growth curve-based screenings, conjugation experiments and the creation of knock-out mutants, are necessary to characterize the plasmid found and to confirm the link between the pili observed and this conjugation gene region.

Data bibliography

1. Wuyts S. European Nucleotide Archive. Study Accession number: PRJEB29655


Review 1. The mating pair formation system of conjugative plasmids-A versatile secretion machinery for transfer of proteins and DNA.

Authors: Gunnar Schröder; Erich Lanka
Journal: Plasmid Date: 2005-07 Impact factor: 3.466

Review 2. Pili in Gram-positive bacteria: assembly, involvement in colonization and biofilm development.

Authors: Anjali Mandlik; Arlene Swierczynski; Asis Das; Hung Ton-That
Journal: Trends Microbiol Date: 2008-01 Impact factor: 17.079

3. Lactobacillus phylogenomics--towards a reclassification of the genus.

Authors: Marcus J Claesson; Douwe van Sinderen; Paul W O'Toole
Journal: Int J Syst Evol Microbiol Date: 2008-12 Impact factor: 2.747

4. plasmidSPAdes: assembling plasmids from whole genome sequencing data.

Authors: Dmitry Antipov; Nolan Hartwick; Max Shen; Mikhail Raiko; Alla Lapidus; Pavel A Pevzner
Journal: Bioinformatics Date: 2016-07-27 Impact factor: 6.937

5. Assembly of Synthetic Functional Cellulosomal Structures onto the Cell Surface of Lactobacillus plantarum, a Potent Member of the Gut Microbiome.

Authors: Johanna Stern; Sarah Moraïs; Yonit Ben-David; Rachel Salama; Melina Shamshoum; Raphael Lamed; Yuval Shoham; Edward A Bayer; Itzhak Mizrahi
Journal: Appl Environ Microbiol Date: 2018-04-02 Impact factor: 4.792

6. Complete genome sequence of Leuconostoc carnosum strain JB16, isolated from kimchi.

Authors: Ji Young Jung; Se Hee Lee; Che Ok Jeon
Journal: J Bacteriol Date: 2012-12 Impact factor: 3.490

Review 7. Cellulose: fascinating biopolymer and sustainable raw material.

Authors: Dieter Klemm; Brigitte Heublein; Hans-Peter Fink; Andreas Bohn
Journal: Angew Chem Int Ed Engl Date: 2005-05-30 Impact factor: 15.336

8. MacSyFinder: a program to mine genomes for molecular systems with an application to CRISPR-Cas systems.

Authors: Sophie S Abby; Bertrand Néron; Hervé Ménager; Marie Touchon; Eduardo P C Rocha
Journal: PLoS One Date: 2014-10-17 Impact factor: 3.240

9. Carrot Juice Fermentations as Man-Made Microbial Ecosystems Dominated by Lactic Acid Bacteria.

Authors: Sander Wuyts; Wannes Van Beeck; Eline F M Oerlemans; Stijn Wittouck; Ingmar J J Claes; Ilke De Boeck; Stefan Weckx; Bart Lievens; Luc De Vuyst; Sarah Lebeer
Journal: Appl Environ Microbiol Date: 2018-05-31 Impact factor: 4.792

10. Comparative genomic and functional analysis of 100 Lactobacillus rhamnosus strains and their comparison with strain GG.

Authors: François P Douillard; Angela Ribbera; Ravi Kant; Taija E Pietilä; Hanna M Järvinen; Marcel Messing; Cinzia L Randazzo; Lars Paulin; Pia Laine; Jarmo Ritari; Cinzia Caggia; Tanja Lähteinen; Stan J J Brouns; Reetta Satokari; Ingemar von Ossowski; Justus Reunanen; Airi Palva; Willem M de Vos
Journal: PLoS Genet Date: 2013-08-15 Impact factor: 5.917
