Tamara Smokvina, Michiel Wels, Justyna Polka, Christian Chervaux, Sylvain Brisse, Jos Boekhorst, Johan E T van Hylckama Vlieg, Roland J Siezen.
Abstract
Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800–3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups (OGs) that comprise the pan-genome of this species, of which about 1800 make up the conserved core. Several factors previously associated with host-microbe interactions, such as pili, cell-envelope proteinase, hydrolases p40 and p75, or the capacity to produce short branched-chain fatty acids (bkd operon), are part of the L. paracasei core genome, present in all analysed strains. The variome (variable genome) consists mainly of hypothetical proteins, phages, plasmids and transposon/conjugative elements, together with known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes was identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome content; from this tree, together with an analysis of horizontal gene transfer events, we conclude that the evolution of these L. paracasei strains is complex and not always related to niche adaptation.
The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis, in order to link the distribution pattern of a specific phenotype to the presence/absence of specific sets of genes.
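One simple way to relate a growth phenotype to OG presence/absence is a per-OG association test. The sketch below uses a two-sided Fisher's exact test on a 2×2 table (OG present/absent × growth/no growth) and ranks OGs by p-value. It is an illustrative stand-in under that assumption, not necessarily the gene-trait matching method used in the study; the functions `fisher_exact_two_sided` and `rank_ogs` and the strain and OG names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: OG present / OG absent; columns: growth / no growth.
    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    n = a + b + c + d
    row_present = a + b  # strains carrying the OG
    col_growth = a + c   # strains that grow on the sugar

    def prob(x: int) -> float:
        # P(top-left cell == x) given the fixed row and column totals
        return comb(col_growth, x) * comb(n - col_growth, row_present - x) / comb(n, row_present)

    p_obs = prob(a)
    lo = max(0, row_present + col_growth - n)
    hi = min(row_present, col_growth)
    # small relative tolerance so tables tied with the observed one are counted
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def rank_ogs(presence: dict[str, set[str]], growth: dict[str, bool]) -> list[tuple[str, float]]:
    """Score every OG by its association with growth; smallest p-value first."""
    all_ogs = set().union(*presence.values())
    scores = []
    for og in sorted(all_ogs):
        a = sum(1 for s, g in growth.items() if g and og in presence[s])
        b = sum(1 for s, g in growth.items() if not g and og in presence[s])
        c = sum(1 for s, g in growth.items() if g and og not in presence[s])
        d = sum(1 for s, g in growth.items() if not g and og not in presence[s])
        scores.append((og, fisher_exact_two_sided(a, b, c, d)))
    return sorted(scores, key=lambda item: item[1])


# Hypothetical toy data: six strains, two OGs
presence = {
    "S1": {"og1", "og2"}, "S2": {"og1"}, "S3": {"og1", "og2"},
    "S4": {"og2"}, "S5": set(), "S6": {"og2"},
}
growth = {"S1": True, "S2": True, "S3": True, "S4": False, "S5": False, "S6": False}
print(rank_ogs(presence, growth))
```

Here `og1` is present in exactly the three growing strains and ranks first (p = 0.1), while `og2` shows no association. With thousands of OGs tested at once, the raw p-values would additionally need a multiple-testing correction.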
Year: 2013 PMID: 23894338 PMCID: PMC3716772 DOI: 10.1371/journal.pone.0068731
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview of Lactobacillus paracasei strains and properties.
| | | | | | CRISPR system | | | EPS biosynthesis cluster | | | | |
| strain code | other strain codes | source | year of isolation | (ML)ST type | Lcas1 CRISPR | Lcas1 | Lcas2 | EPS-1 | EPS-2 | EPS-3A | EPS-3B# | EPS-4 |
| |
| Lpp7 | | | 1982 | Y | 9 |
| Lpp14 | | | 1989 | Y | Y | 15 |
| Lpp17 | D640 | | 1987 | 14 | 3 | Y | Y | Y | 4 |
| Lpp22 | D645 | | 1987 | 16 | 0 | Y | 4 |
| CNCM I–4648 | D647 | | 1988 | 21 | 0 | 7 |
| Lpp37 | D657, ATCC27092 | | 1994 | 1 | 1 | Y | Y | Y | 13 |
| Lpp41 | D661 | | 1995 | 17 | 0 | Y | 7 |
| Lpp43 | D662 | | 1995 | 9 | 5 | Y | Y | 13 |
| Lpp74 | | | 2000 | Y | Y | 7 |
| Lpp120 | D695 | | 2003 | 29 | 0 | Y | Y | 15 |
| Lpp122 | D697 | | 2003 | 18 | 0 | Y | 10 |
| Lpp123 | D693 | | 2003 | 28 | 7 | Y | Y | 7 |
| CNCM I–4649 | | | 2003 | ? | Y | Y | Y | 6 |
| Lpp125 | D698 | | 2003 | 6 | 0 | Y | Y | 12 |
| Lpp126 | D699 | | 2003 | 30 | 9 | Y | Y | Y | 9 |
| Lpp226 | | | 2009 | Y | Y | 14 |
| Lpp219 | | | 2008 | 13 | Y | Y | Y | 3 |
| Lpp221 | | | 2008 | ? | Y | Y | Y | 12 |
| Lpp223 | | | 2008 | 1/2 | Y | Y | Y | 12 |
| Lpp225 | | | 2009 | 10 | Y | Y | Y | 5 | Y |
| Lpp227 | | | 2008 | 12 | Y | Y | Y | 12 |
| Lpp228 | | | 2008 | Y | Y | 14 |
| Lpp229 | ATCC4009 | | ? | Y | Y | 8 |
| Lpp230 | ATCC11582 | | ? | Y | Y | 7 |
| Lpp46 | D664, DSM2649 | | 1996 | 11 | 4 | Y | Y | Y | 17 |
| Lpp48 | | | 1996 | 23 | ? | Y | Y | Y | 4 |
| Lpp49 | D667 | | 1996 | 24 | 0 | Y | Y | 14 |
| Lpp70 | | | 1999 | Y | 14 |
| Lpp71 | D679 | | 1999 | 19 | 0 | 8 |
| CNCM I–4270 | D685 | | 2000 | 26 | 0 | Y | Y | 10 |
| CNCM I–2877 | D686 | | 2000 | 27 | 6 | Y | Y | 9 |
| Lpp189 | | | 2005 | Y | Y | 14 |
| Lpl7 | | | ? | 11 | Y | Y | Y | 16 |
| Lpl14 | | | 1996 | 11 | Y | Y | Y | 16 |
| |
| ATCC334 | D671 | | ? | 25 | 0 | Y | Y | Y | 13 |
| BL23 | D692 | | ? | 1 | 1 | Y | Y | Y | 13 |
| Zhang | | | ? | 1 | Y | Y | Y | 16 |
Y = present; # = number of OGs; ? = unknown.
Figure 1. Pan-genome prediction.
The number of pan-genome OGs (blue) and core-genome OGs (red) is shown as a function of the number of genomes added to the pan-genome. OGs present in only one annotated genome were excluded if they appeared to represent gene fragments or overpredicted small genes.
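A pan-genome curve of this kind can be computed from per-genome OG sets: the pan-genome after k genomes is the union of their OG sets and the core genome is the intersection. The sketch below averages both sizes over random orders of genome addition; the averaging scheme, the function name `pan_core_curve` and the toy data are assumptions for illustration, and the singleton-fragment filter described in the legend is not applied.

```python
import random
from statistics import mean


def pan_core_curve(genomes: list[set[str]], permutations: int = 100,
                   seed: int = 0) -> list[tuple[int, float, float]]:
    """Mean pan-genome and core-genome sizes after adding k genomes.

    Each genome is a set of OG identifiers. Genomes are added in random
    orders and the sizes are averaged over `permutations` orders.
    """
    rng = random.Random(seed)
    n = len(genomes)
    pan_sizes: list[list[int]] = [[] for _ in range(n)]
    core_sizes: list[list[int]] = [[] for _ in range(n)]
    for _ in range(permutations):
        order = rng.sample(genomes, n)  # one random order of genome addition
        pan: set[str] = set()
        core = set(order[0])
        for k, ogs in enumerate(order):
            pan |= ogs   # union: OGs found in at least one genome so far
            core &= ogs  # intersection: OGs found in every genome so far
            pan_sizes[k].append(len(pan))
            core_sizes[k].append(len(core))
    return [(k + 1, mean(pan_sizes[k]), mean(core_sizes[k])) for k in range(n)]


# Hypothetical toy genomes
toy = [{"og1", "og2", "og3"}, {"og1", "og2", "og4"}, {"og1", "og5"}]
for k, pan, core in pan_core_curve(toy):
    print(k, round(pan, 2), round(core, 2))
```

Whatever the order, the last point always equals the full union (5 OGs) and the full intersection (1 OG); averaging over orders only smooths the intermediate points, which is why such curves rise (pan) and fall (core) monotonically.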
Figure 2. Genetic relatedness of strains.
(A) phylogenetic tree based on sequence similarity of 183 orthologous genes present in all strains; (B) pan-genome tree based on total genome content. Red = dairy strains; green = plant origin strains; black = human/animal origin strains; blue = unknown origin.
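The legend does not state how the pan-genome tree in panel B was built from total genome content. A common approach, assumed here purely for illustration, is to compute a Jaccard distance between the OG sets of each pair of strains and cluster them with UPGMA (average linkage), writing the result as a Newick string. The function names and the toy strains A, B and C are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: set[str], b: set[str]) -> float:
    """1 - |A ∩ B| / |A ∪ B|: 0 for identical OG content, 1 for none shared."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def upgma_newick(genomes: dict[str, set[str]]) -> str:
    """Cluster genomes by gene-content distance with UPGMA; return a Newick tree."""
    if not genomes:
        raise ValueError("no genomes given")
    # cluster id -> (Newick subtree, number of genomes, node height)
    clusters = {name: (name, 1, 0.0) for name in genomes}
    dist = {frozenset(p): jaccard_distance(genomes[p[0]], genomes[p[1]])
            for p in combinations(genomes, 2)}
    next_id = 0
    while len(clusters) > 1:
        pair = min(dist, key=dist.get)  # closest pair of clusters
        i, j = sorted(pair)
        height = dist.pop(pair) / 2     # ultrametric: node sits at half the distance
        tree_i, size_i, h_i = clusters.pop(i)
        tree_j, size_j, h_j = clusters.pop(j)
        new = f"_node{next_id}"         # hypothetical internal cluster label
        next_id += 1
        for k in clusters:
            # UPGMA update: size-weighted average of the merged clusters' distances
            d_ik = dist.pop(frozenset((i, k)))
            d_jk = dist.pop(frozenset((j, k)))
            dist[frozenset((new, k))] = (size_i * d_ik + size_j * d_jk) / (size_i + size_j)
        subtree = f"({tree_i}:{height - h_i:.4f},{tree_j}:{height - h_j:.4f})"
        clusters[new] = (subtree, size_i + size_j, height)
    tree, _, _ = next(iter(clusters.values()))
    return tree + ";"


toy_genomes = {"A": {"og1", "og2", "og3"}, "B": {"og1", "og2", "og3", "og4"}, "C": {"og5", "og6"}}
print(upgma_newick(toy_genomes))  # (C:0.5000,(A:0.1250,B:0.1250):0.3750);
```

A and B share three of four OGs (distance 0.25) and are joined first; C shares nothing with either and joins at the root. Unlike panel A, which uses sequence similarity of shared genes, this kind of tree reflects only gene presence/absence.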
Figure 3. Genetic potential of L. paracasei to produce short branched-chain fatty acids from branched-chain α-keto acids (BCKA).
(a) Organization of the bkd operon in the L. casei strains and its genetic context in other lactobacilli. The functions encoded by the bkd genes (yellow) are: Ptb, Phosphate butyryltransferase; Buk, Butyrate kinase; BkdD, Dihydrolipoamide dehydrogenase; BkdA, 2-oxoisovalerate dehydrogenase α subunit; BkdB, 2-oxoisovalerate dehydrogenase β subunit; BkdC, Lipoamide acyltransferase component of the BKDH complex; PanE, Ketopantoate reductase PanE/ApbA. The locus tags of the respective 8 bkd genes in the reference genomes are LSEI_1441–1448 in L. casei ATCC 334, LCABL_16640–16710 in L. casei BL23 and LCAZH_1429–1436 in L. casei Zhang. The black arrow and the stem-loop indicate a potential promoter and a ρ-independent terminator, respectively. The genetic environment around the bkd operon of L. casei is conserved among other lactobacilli: orthologous genes are shown in the same colour. PyrAB, Carbamoylphosphate synthase large subunit; PyrD, Dihydroorotate dehydrogenase; PyrF, Orotidine-5′-phosphate decarboxylase; PyrE, Orotate phosphoribosyltransferase; FbpA, Fibronectin-binding protein; hypothetical protein LSEI_1438. (b) Catabolism of branched-chain amino acids (BAA) to fatty acids, adapted from [61]. BAA are converted into BCKA by a BAA aminotransferase. The branched-chain α-keto acid dehydrogenase (BKDH) complex is composed of BkdA, BkdB, BkdC and BkdD.
Figure 4. Bar plot of OG presence/absence for the L. paracasei strains, ordered according to the reference genomes.
This figure shows all pan-genome OGs found to be present (white bar) or absent (black bar) in the genomes. The box at the bottom contains OGs on contigs presumed to be plasmids. The pan-genome tree is shown at the top. The scale at the left represents pseudoassembly location relative to the reference genomes. A description of highly variable regions is shown at the right. The GC content is shown in the middle (wavy line) on a scale from 30% to 60% (left to right).
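A GC-content trace like the wavy line in this figure is typically computed in sliding windows along the sequence. The window and step sizes used for the figure are not given, so the defaults below (1000 and 500 bp) are assumptions, as is the choice to leave ambiguous bases such as N out of the denominator; `gc_profile` is a hypothetical helper.

```python
def gc_profile(seq: str, window: int = 1000, step: int = 500) -> list[tuple[int, float]]:
    """Return (window start, GC %) for each full window along a DNA sequence.

    Ambiguous bases (e.g. N) are left out of the denominator.
    """
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        gc = chunk.count("G") + chunk.count("C")
        called = gc + chunk.count("A") + chunk.count("T")
        profile.append((start, 100.0 * gc / called if called else float("nan")))
    return profile


print(gc_profile("GGCCAATTgcat", window=4, step=4))  # [(0, 100.0), (4, 0.0), (8, 50.0)]
```

Plotted against position, windows with markedly lower or higher GC than the genome average can hint at horizontally acquired regions, which is one reason such a trace is shown next to the variable regions.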
Figure 5. Summary of sugar utilization cassettes.
Each row represents the presence (green) or absence (red) of a sugar utilization cassette in the strains listed at the top; D = dairy origin, P = plant origin, M = mammalian origin, U = unknown origin. The putative sugar(s) utilized, the type of transport system and the number of genes in each cassette are listed in the last three columns. Group A, group B and group C strains refer to Figure. Chromosomal location: A = cassette in sugar island A; B = cassette in sugar island B.
Main surface-associated and secreted proteins in 37 L. (para)casei strains.
| protein/complex/cluster | genes | strains present | strains absent | notes | reference |
| Pili gene cluster | | | | | |
| pilus proteins, pilus-specific sortase | | 36 | 1 | absent in strain Lpp125 | |
| pilus proteins, pilus-specific sortase | | 37 | 0 | | |
| Csc cell-surface complex CscABCD(a) | | | | | |
| Csc cluster 1 | | 37 | 0 | | |
| Csc cluster 2 | | 36 | 1 | absent in strain Lpp219 | |
| Csc cluster 3 | | 37 | 0 | | |
| Csc cluster 4 | | 14 | 23 | | |
| Csc cluster 5 | | 33 | 4 | absent in strains Lpp17, Lpp46, Lpp230, Zhang | |
| Collagen/fibronectin adhesion proteins | | | | | |
| collagen-binding protein CnbA | | 37 | 0 | ∼2700 AA; 11 CnaB domains; LPSTE anchor | |
| collagen-binding protein CnbB | | 12 | 25 | ∼750 AA; 3–4 CnaB domains; MPQTG anchor | |
| collagen-binding protein CnbC | | 7 | 30 | ∼900 AA; 4–5 CnaB domains; LPQTG anchor; only plasmid-encoded? | |
| fibronectin-binding protein FbpA | | 37 | 0 | ∼576 AA; FbpA and DUF814 domains | |
| Cell-wall hydrolases | | | | | |
| Msp2/p40 | | 37 | 0 | | |
| Msp1/p75 D-glutamyl-L-lysyl endopeptidase | | 37 | 0 | | |
| Cell-envelope proteinases | subtilase family serine proteinases | | | | |
| proteinase PrtP (and its maturase PrtM) | | 37 | 0 | ∼1900 AA; LPKTA anchor | |
| proteinase PrtR1, inactive variant | | 37 | 0 | ∼1800 AA; LPQMA anchor | |
| proteinase PrtR2 | | 37 | 0 | ∼2230 AA; LPPMG anchor | |
| proteinase PrtR3 | | 2 | 35 | ∼1500 AA; MPQAG anchor; only in strains Lpp120, Lpp122; plasmid-encoded | |
| Glycoprotein gene cluster | 11 genes, also encodes 3 glycosyltransferases | | | | |
| Ser/Ala-rich glycoprotein | | 10 | 27 | ∼2700 AA; LPQTG anchor | |
| extracellular protein, unknown function | | 10 | 27 | ∼580 AA; 2 Ig-like and 1 SCP domains | |
| extracellular protein, unknown function | | 10 | 27 | ∼900 AA; 2 Ig-like domains | |
| Wss secretion gene cluster | | 2 | 35 | 6 genes, WXG100 secretion system; only in strains Lpp17 and Lpp230 | |
| Extracellular proteins gene cluster | 4 genes, only in strains Lpp46, Zhang | | | | |
| 3 extracellular proteins, unknown function | | 2 | 35 | no LPxTG-type peptidoglycan anchors | |
(a) The csc gene cluster can encode different combinations of A, B, C and D subunits [41].
Refers to LPXTG-type peptidoglycan anchors [104].
PrtR1 and PrtR2 are encoded on adjacent genes.
Overview of significant gene-trait matching results corresponding to growth/no growth of 34 strains in the presence of different sugars.
| sugar | strains with growth | strains with no growth | regions, OG functions |
| lactose | 21 | 13 | |
| | | | > lactose PTS transport system |
| | | | > 6-phospho-beta-galactosidase |
| | | | > beta-glucoside bgl operon antiterminator |
| saccharose (= sucrose) | 17 | 17 | > sucrose-6-phosphate hydrolase |
| galactitol (= dulcitol) | 2 | 32 | |
| | | | > galactitol PTS |
| | | | > tagatose-6-phosphate kinase |
| | | | > sorbitol-6-phosphate 2-dehydrogenase |
| | | | |
| | | | > L-proline/glycine-betaine ABC transporter |
| mannitol | 31 | 3 | |
| | | | > galactosamine PTS |
| | | | > galactosamine-6-phosphate isomerase |
| | | | > N-acetylglucosamine-6-phosphate deacetylase |
| | | | > glycosyl hydrolase, family 35 (beta galactosidase 3?) |
| | | | |
| | | | > phosphonate/sulfonate ABC transporters |
| cellobiose | 29 | 5 | |
| | | | > alpha-glucosides PTS |
| | | | > maltose-6′-phosphate glucosidase |
| ribose | 26 | 8 | |
| | | | > part of ribose utilization operon |
| | | | > alpha-glucosides PTS |
| | | | > maltose-6′-phosphate glucosidase |
| sorbitol (= glucitol) | 19 | 15 | |
| | | | > galactitol PTS |
| | | | > galactosamine PTS |
| | | | > ascorbate PTS |
| | | | > cellobiose PTS |
| | | | > fructose/mannitol PTS |
| sorbose | 21 | 13 | |
| | | | > sorbose PTS |
| | | | > many other sugar PTS |
Figure 6. Example of the gene-trait matching (GTM) output.
The first column lists the sugar tested, and the second and third columns give the number of strains that grow (positive) or do not grow (negative) on that sugar. Relevant OGs and their annotations are listed in columns four and five. All coloured cells indicate OGs that are important for classifying the specified phenotype (at top). Green cells indicate that the OG is present (in >75% of the strains), red that it is absent (in >75% of the strains). OGs that are important for classifying the phenotype but are neither present in nor absent from more than 75% of the strains are coloured black.
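The colour rule in this legend can be written as a small function. The sketch assumes that the 75% threshold is strict and is evaluated separately within the growing and the non-growing strains; both assumptions, the helper names `cell_colour` and `gtm_row`, and the OG identifier `og_lacPTS` are hypothetical. It only colours cells and does not decide which OGs are "important for classification".

```python
def cell_colour(og: str, strains: list[str], presence: dict[str, set[str]],
                threshold: float = 0.75) -> str:
    """Colour one OG x phenotype-group cell following the legend's rule."""
    if not strains:
        raise ValueError("empty phenotype group")
    frac_present = sum(og in presence[s] for s in strains) / len(strains)
    if frac_present > threshold:
        return "green"  # OG present in more than 75% of the group
    if 1.0 - frac_present > threshold:
        return "red"    # OG absent from more than 75% of the group
    return "black"      # neither clearly present nor clearly absent


def gtm_row(og: str, growth: dict[str, bool], presence: dict[str, set[str]]) -> dict[str, str]:
    """Colours for the 'growth' and 'no growth' groups of one OG."""
    grows = [s for s, g in growth.items() if g]
    no_growth = [s for s, g in growth.items() if not g]
    return {"growth": cell_colour(og, grows, presence),
            "no growth": cell_colour(og, no_growth, presence)}


# Hypothetical toy data: S1-S4 grow and carry the OG, S5-S8 neither grow nor carry it
presence = {f"S{i}": ({"og_lacPTS"} if i <= 4 else set()) for i in range(1, 9)}
growth = {f"S{i}": i <= 4 for i in range(1, 9)}
print(gtm_row("og_lacPTS", growth, presence))  # {'growth': 'green', 'no growth': 'red'}
```

Because the threshold is strict, an OG present in exactly three of four strains in a group (75%) falls into the black category rather than green.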