
Genome-scale diversity and niche adaptation analysis of Lactococcus lactis by comparative genome hybridization using multi-strain arrays.

Roland J Siezen, Jumamurat R Bayjanov, Giovanna E Felis, Marijke R van der Sijde, Marjo Starrenburg, Douwe Molenaar, Michiel Wels, Sacha A F T van Hijum, Johan E T van Hylckama Vlieg.

Abstract

Lactococcus lactis produces lactic acid and is widely used in the manufacturing of various fermented dairy products. However, the species is also frequently isolated from non-dairy niches, such as fermented plant material. Recently, these non-dairy strains have gained increasing interest, as they have been described to possess flavour-forming activities that are rarely found in dairy isolates and have diverse metabolic properties. We performed an extensive whole-genome diversity analysis on 39 L. lactis strains, isolated from dairy and plant sources. Comparative genome hybridization analysis with multi-strain microarrays was used to assess presence or absence of genes and gene clusters in these strains, relative to all L. lactis sequences in public databases, whereby chromosomal and plasmid-encoded genes were computationally analysed separately. Nearly 3900 chromosomal orthologous groups (chrOGs) were defined on the basis of four sequenced chromosomes of L. lactis strains (IL1403, KF147, SK11, MG1363). Of these, 1268 chrOGs are present in at least 35 strains and represent the presently known core genome of L. lactis, and 72 chrOGs appear to be unique for L. lactis. Nearly 600 and 400 chrOGs were found to be specific for either the subspecies lactis or subspecies cremoris respectively. Strain variability was found in presence or absence of gene clusters related to growth on plant substrates, such as genes involved in the consumption of arabinose, xylan, α-galactosides and galacturonate. Further niche-specific differences were found in gene clusters for exopolysaccharide biosynthesis, stress response (iron transport, osmotolerance) and bacterial defence mechanisms (nisin biosynthesis). Strain variability of functions encoded on known plasmids included proteolysis, lactose fermentation, citrate uptake, metal ion resistance and exopolysaccharide biosynthesis. The present study supports the view of L. lactis as a species with a very flexible genome.
© 2011 The Authors. Journal compilation © 2011 Society for Applied Microbiology and Blackwell Publishing Ltd.

Year: 2011; PMID: 21338475; PMCID: PMC3818997; DOI: 10.1111/j.1751-7915.2011.00247.x

Source DB: PubMed; Journal: Microb Biotechnol; ISSN: 1751-7915; Impact factor: 5.813


Introduction

The Gram‐positive bacterium Lactococcus lactis has been an important model organism for low‐GC Gram‐positive bacteria for many years. The primary reason for the interest in this species is the extraordinary industrial importance of L. lactis strains as primary components of dairy starter cultures. Genetic techniques have been widely applied in recent years to unravel the molecular basis of industrially important phenotypic traits. Complete genome sequences of three different L. lactis strains of dairy origin have been published, further improving our knowledge of strains used in dairy technology (Bolotin ; Makarova ; Wegmann ). The abundant occurrence of L. lactis strains outside the dairy environment has been known for decades (Sandine, 1972), but it has recently been rediscovered owing to ecological interest in non‐dairy strains and their technological properties in an applied context (van Hylckama Vlieg ). The complete genome sequence of an L. lactis plant isolate has recently been determined and has provided a more complete view of the genomic diversity of the species L. lactis (Siezen ). The existence of many plasmids reported for L. lactis further enlarges the genetic pool and thereby the number of possible phenotypic manifestations from different combinations of chromosomal and plasmid pools (Campo ; Bolotin ; Siezen ).

Taxonomically, three subspecies (ssp. lactis, ssp. cremoris and ssp. hordniae) and one biovar (ssp. lactis biovar diacetylactis) are recognized. These are the results of reclassification of now discontinued taxa, first recognized as different species (Streptococcus lactis, Streptococcus cremoris and Lactobacillus hordniae), subsequently united under the genus Lactococcus and species lactis (the historical summary of species naming is reported in van Hylckama Vlieg ). The discrimination between subspecies is formally linked to a few phenotypic tests (i.e. growth at 40°C, growth at 4% NaCl, deamination of arginine, and acid production from maltose, lactose, galactose and ribose) (Rademaker ). However, phenotypic and genetic relationships do not always correlate among strains of the same subspecies, leading to considerable confusion in taxonomy (Tailliez ). In fact all possible combinations of lactis and cremoris phenotypes and genotypes have been reported, although with different incidence (Kelly and Ward, 2002).

In recent years, comparative genome hybridization (CGH), sometimes referred to as genomotyping, has been increasingly applied to unravel the gene content of bacterial strains (Molenaar ; Peng ; Earl ; Han ; La ; McBride ; Wang ; Rasmussen ; Siezen ). A recent CGH analysis of five L. lactis ssp. cremoris strains provided a first insight into diversity of genes and gene clusters, but was limited by the fact that the DNA microarray used for CGH specified only 1030 genes selected from the genome of a single strain, L. lactis ssp. cremoris SK11, which is less than half of the genes encoded in its genome (Taibi ). Therefore many of the potential genomic variations were not assessed. Chromosomal diversity of a large collection of L. lactis strains was recently screened on the basis of their phenotype and the macrorestriction patterns produced from pulsed‐field gel electrophoresis (PFGE) analysis of SmaI digests of genomic DNA, providing insight into chromosomal size and architecture variation (Kelly ).

In the current study, we performed a CGH analysis of 39 L. lactis strains using a multi‐strain, high‐resolution NimbleGen microarray, in an attempt to cover the presently known L. lactis pan‐genome. These strains were selected from a much larger set of phenotypically and genotypically characterized L. lactis strains (Rademaker ). The strains represent different subspecies (cremoris, lactis, hordniae) and different phenotypic groups, and were isolated from different environmental niches. They are therefore believed to be a representative sample of the diversity of the species (Table 1).
Table 1

Strains included in the analysis.

Strain code | Internal collection code | Isolation source | Other information

Lactococcus lactis ssp. lactis genotype and a L. lactis ssp. lactis phenotype
ATCC19435T | NIZO 29T | Milk (dairy starter) |
Li‐1 | NIZO 1156 | Grass |
E34 | NIZO 1173 | Silage |
N42 | NIZO 1230 | Soil and grass |
DRA4 | NIZO 1592 | Dairy starter A | L. lactis ssp. lactis biovar diacetylactis
ML8 | NIZO 20 | Dairy starter |
LMG9446, NCFB1867 | NIZO 2123 | Frozen peas |
LMG9449, NCFB1868 | NIZO 2124 | Frozen peas |
K231 | NIZO 2199 | White kimchi |
K337 | NIZO 2202 | White kimchi |
NCDO895, NCIMB700895 | NIZO 2211 | Dairy starter |
KF7 | NIZO 2219 | Alfalfa sprouts |
KF24 | NIZO 2220 | Alfalfa sprouts |
KF67 | NIZO 2223 | Grapefruit juice |
KF134 | NIZO 2226 | Alfalfa and radish sprouts |
KF146 | NIZO 2229 | Alfalfa and radish sprouts |
KF147 | NIZO 2230 | Mung bean sprouts |
KF196 | NIZO 2236 | Japanese kaiware shoots |
KF201 | NIZO 2238 | Sliced mixed vegetables |
B2244B | NIZO 3919 | Mustard and cress |
KF282 | NIZO 3920 | Mustard and cress |
LMG14418 | NIZO 2424 | Bovine milk |
IL1403 | NIZO 2441 | Dairy starter | Plasmid‐free derivative of L. lactis ssp. lactis biovar diacetylactis CNRZ157 (IL594)
LMG8526, NCFB2091 | NIZO 26 | Chinese radish seeds |
UC317 | NIZO 644 | Dairy starter |
M20 | NIZO 844 | Soil | L. lactis ssp. lactis biovar diacetylactis
P7304 | NIZO 2207 | Litter on pastures | rRNA most related to isolates from prawns
P7266 | NIZO 2206 | Litter on pastures | rRNA most related to isolates from prawns

Lactococcus lactis ssp. cremoris genotype and a L. lactis ssp. lactis phenotype
V4 | NIZO 1157 | Raw sheep milk |
KW10 | NIZO 2249 | Kaanga way |
NCDO763, ML3 | NIZO 643 | Dairy starter | Derivative of NCDO712
MG1363 | NIZO 1492 | Cheese starter | Plasmid‐free derivative of NCDO712
N41 | NIZO 1175 | Soil and grass |

Lactococcus lactis ssp. cremoris genotype and a L. lactis ssp. cremoris phenotype (‘true cremoris’ strains)
LMG6897T | NIZO 2418T | Cheese starter | Subculture of strain HP
FG2 | NIZO 2252 | Dairy starter |
AM2 | NIZO 33 | Dairy starter |
HP | NIZO 42 | Dairy starter |
SK11 | NIZO 32 | Dairy starter | Phage‐resistant derivative of AM1

Lactococcus lactis ssp. hordniae
LMG8520T | NIZO 24T | Leaf hopper (insect) |
Our objectives were (i) to gain insight into the genetic diversity based on whole‐genome gene content, and to compare it with the results of other techniques (e.g. genome fingerprints and MLSA analysis; Rademaker ), (ii) to compare chromosomal and plasmid diversity, (iii) to estimate and characterize the core genome of the species, and (iv) to analyse genes and gene clusters specific for subclades of strains. These results contribute to a more complete insight into the diversity and niche adaptation of the species.

Results

Diversity in gene distribution and population structure

A CGH analysis was performed to investigate the gene content of 39 strains of L. lactis. Analysis of all core genes from sequenced genomes shows that nucleotide sequence identity between strains from the same subspecies is high: sequence identity is 99% between L. lactis ssp. lactis strains IL1403 and KF147, and it is 98% between L. lactis ssp. cremoris strains SK11 and MG1363. This is in sharp contrast to the average sequence identity of only 88% that was observed between ssp. lactis and ssp. cremoris strains. Because strains from different subspecies can be quite diverse in sequence conservation and gene content (Lan and Reeves, 2000; Medini ), we used a multi‐strain microarray instead of a single reference strain array. This multi‐strain array based on NimbleGen technology contains multiple overlapping probes targeting all known L. lactis genes in the NCBI database and is therefore better suited to detect the expected relatively large differences in nucleotide sequence identity. As with any CGH analysis, its limitation remains that novel genes that are not represented on the array will not be detected. The hybridization of DNA from the query genomes to the probes on the multi‐strain array was translated into absence or presence of genes in orthologous groups. The hybridization efficiency of DNA from the four reference strains shows that 96–99% of the known genes in these genomes were positively identified using our PanCGH algorithm (Table 2).
Table 2

Hybridization and genotyping accuracy for the four reference strains.

Genotyping | IL1403 | KF147 | MG1363 [b] | SK11
OGs with at least one gene from reference strain | 2286 | 2428 | 2406 | 2289
OGs with score NA [a] | 132 | 181 | 274 | 109
OGs correctly identified as ‘present’ (true positives) | 2101 | 2226 | 2056 | 2130
OGs incorrectly identified as ‘absent’ (false negatives) | 53 | 21 | 76 | 50
True‐positive rate | 97.5% | 99.1% | 96.4% | 97.7%
False‐negative rate | 2.5% | 0.9% | 3.6% | 2.3%

[a] NA means that the presence/absence of an OG could not be calculated, either because the corresponding genes were not represented on the microarray, or due to an insufficient number of probes matching to members of this OG (by default at least 10 probes must be aligned).

[b] Note that strain MG1363 was not used in the CGH array design, and therefore the positive recall for this strain was slightly lower than for the other three reference strains.
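The rates in Table 2 follow directly from the counts: OGs scored NA are left out, and the true‐positive and false‐negative rates are taken over the remaining OGs. The short sketch below reproduces that arithmetic from the published counts; the dictionary layout and the function name `rates` are illustrative choices, not part of the PanCGH software.

```python
# Counts copied from Table 2; the dictionary layout is illustrative only.
table2 = {
    "IL1403": {"total": 2286, "na": 132, "tp": 2101, "fn": 53},
    "KF147":  {"total": 2428, "na": 181, "tp": 2226, "fn": 21},
    "MG1363": {"total": 2406, "na": 274, "tp": 2056, "fn": 76},
    "SK11":   {"total": 2289, "na": 109, "tp": 2130, "fn": 50},
}


def rates(row):
    """Return (true-positive rate, false-negative rate) in percent."""
    scored = row["tp"] + row["fn"]  # OGs for which a call could be made
    return 100.0 * row["tp"] / scored, 100.0 * row["fn"] / scored


for strain, row in table2.items():
    # Each OG is either unscored (NA), a true positive or a false negative.
    assert row["na"] + row["tp"] + row["fn"] == row["total"]
    tpr, fnr = rates(row)
    print(f"{strain}: TPR {tpr:.1f}%  FNR {fnr:.1f}%")
```

For each reference strain the three categories add up to the number of OGs with at least one gene from that strain, which is a useful consistency check on the table.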

Phylogenetic relationships of strains are basically reflected in differences in chromosomal sequence and content, although adaptation to different environmental niches is also related to acquisition or loss of mobile elements (plasmids, phages, IS elements, transposons, etc.), and the interchange between mobile elements and the chromosome is well documented in lactococci. We analysed chromosomal orthologous groups (chrOGs) separately from plasmid orthologous groups (pOGs). For chrOGs, the PanCGH algorithm was used to translate hybridization signals into presence or absence of orthologous groups, rather than individual genes (Bayjanov ). In total, 3877 chrOGs were defined on the basis of presence of genes in chromosomes of the four fully sequenced strains (IL1403, KF147, SK11 and MG1363). A total of 622 chrOGs were targeted by fewer than 10 probes per chrOG, and therefore excluded by the PanCGH algorithm from the analysis, reducing the total number of chrOGs investigated to 3255 (Table 3).
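The probe‐count filter described here can be illustrated with a minimal sketch. It only shows the exclusion step (fewer than 10 aligned probes), not how PanCGH scores presence or absence; the function name, the OG identifiers and the toy probe counts are invented for illustration.

```python
MIN_PROBES = 10  # default threshold stated in the text and the Table 2 footnote


def analysable(probe_counts, min_probes=MIN_PROBES):
    """Split OG ids into (kept, excluded) by number of aligned probes."""
    kept = {og for og, n in probe_counts.items() if n >= min_probes}
    return kept, set(probe_counts) - kept


# Toy probe counts per chrOG (hypothetical ids and values).
toy = {"og1": 42, "og2": 9, "og3": 10, "og4": 0}
kept, excluded = analysable(toy)
print(sorted(kept), sorted(excluded))

# Bookkeeping for the numbers reported in the text: 3877 defined, 622 excluded.
assert 3877 - 622 == 3255
```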
Table 3

Chromosomal orthologous groups (chrOGs), derived from pan‐genome CGH analysis, and their presence in L. lactis strains according to different criteria.

Analysed groups | Number | Other information
Total orthologous groups | 3877 | Based on four sequenced L. lactis genomes [a]
Core chrOGs for sequenced genomes | 1513 | Based on four sequenced L. lactis genomes [a]
Number of groups reliably analysable by CGH | 3255 | 622 OGs not on array or not analysed
Core chrOGs for the species L. lactis (37 strains) | 1121 | Strains P7266 and P7304 omitted
Core chrOGs for the species L. lactis (35 strains) | 1268 | Strains KW10 and KF282 also omitted; see Table S1
Core chrOGs linked to LaCOGs | 1246 | Table S1
Core chrOGs only in L. lactis | 72 | Table S2; not in other LAB
Variable chrOGs in 35 strains | 1987 | See distribution in Fig. 2

[a] Lactococcus lactis ssp. cremoris strains SK11 and MG1363, L. lactis ssp. lactis strains IL1403 and KF147.

The complete data set of chrOGs was used to cluster the L. lactis strains on the basis of presence/absence of chrOGs (Fig. 1). Strains were clearly separated into two major clades corresponding to the subspecies lactis and cremoris. This confirms previous results of genotypic and phenotypic studies on these Lactococcus strains (Rademaker ). Our whole chromosome‐based tree is most similar to their tree based on a five‐locus MLST cluster analysis, but our tree contains much more genomic information on strain diversity, as demonstrated below. The two major subspecies groups are further subdivided into subclades in the whole‐genome tree (Fig. 1). For the ssp. lactis strains, dairy and plant lactis isolates are in separate subclades, while in the ssp. cremoris strains, the two subclades correspond to the two different phenotypes, i.e. the lactis‐like and cremoris‐like phenotypes. The type strain LMG8520T of L. lactis ssp. hordniae, isolated from leaf hoppers, appears to have a lactis‐like genomic content, and is grouped with plant isolates.
Figure 1

Whole‐genome content‐based tree. Hierarchical clustering tree of L. lactis strains based on presence/absence of all chromosomal orthologous groups (chrOGs) in these strains. The binary distance metric was used in combination with the average linkage clustering algorithm. Solid rectangles mark dairy isolates; the other strains are mainly of plant origin. The top clade of 10 strains corresponds to the ssp. cremoris genotype and is further divided into two subclades corresponding to the two phenotypes, i.e. cremoris‐like (upper subclade) and lactis‐like (lower subclade). The lower clade of 27 strains contains only L. lactis ssp. lactis strains and the ssp. hordniae type strain LMG8520T. This clade contains subclades corresponding to isolation source (dairy versus non‐dairy). Strains P7266 and P7304 cluster far apart from the other strains with a lactis genotype.
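The clustering named in the caption (binary distance with average linkage) can be sketched on toy data. This is a minimal illustration under stated assumptions, not the authors' pipeline: the "binary" distance is assumed to be the Jaccard‐type distance (mismatching positions divided by positions where at least one strain has the OG), and the strain labels and presence/absence vectors are invented.

```python
from itertools import combinations


def binary_distance(a, b):
    """Jaccard-type distance between two 0/1 presence/absence vectors."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    differ = sum(1 for x, y in zip(a, b) if x != y)
    return differ / either if either else 0.0


def average_linkage(profiles):
    """Agglomerative clustering with average linkage; returns the merges."""
    clusters = {name: (name,) for name in profiles}  # label -> member strains
    merges = []
    while len(clusters) > 1:
        def mean_dist(pair):
            a, b = pair
            ds = [binary_distance(profiles[i], profiles[j])
                  for i in clusters[a] for j in clusters[b]]
            return sum(ds) / len(ds)

        # Merge the two clusters with the smallest mean pairwise distance.
        a, b = min(combinations(sorted(clusters), 2), key=mean_dist)
        merges.append((a, b, round(mean_dist((a, b)), 3)))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges


# Hypothetical presence (1) / absence (0) calls for six chrOGs.
profiles = {
    "lactis_1":   [1, 1, 1, 0, 0, 1],
    "lactis_2":   [1, 1, 1, 0, 1, 1],
    "cremoris_1": [1, 0, 0, 1, 1, 1],
    "cremoris_2": [1, 0, 0, 1, 0, 1],
}
merges = average_linkage(profiles)
for step in merges:
    print(step)
```

On this toy input the two "lactis" profiles merge first, then the two "cremoris" profiles, and the two groups join last, which mirrors the two‐clade structure described for Fig. 1. For real data, an established library implementation would be preferable to this quadratic‐time loop.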


Core genes of L. lactis

Core genes are those that are conserved in all strains and are typically involved in the essential cellular processes of a species. Strains P7304 and P7266 were not included in this analysis, because their chromosomal sequences deviate too much from the other strains, resulting in too many false negatives in the hybridization signals (see the text in Supporting information). The presence distribution shows that 1121 chrOGs are present in all 37 remaining L. lactis strains (Fig. 2A); we term these ‘core chrOGs’.
Figure 2

Distribution of chrOGs in the strains. Distribution of chromosomal orthologous groups (chrOGs) in 37 strains (A) and in 35 strains (B). Strains P7304 and P7266 are omitted in (A) and strains KW10 and KF282 are also omitted in (B), due to ambiguities in hybridization efficiencies (see text). The bar on the outer right represents the total number of chrOGs in the core genome. Shading indicates whether the chrOGs are present only in ssp. cremoris strains (black), only in ssp. lactis strains (white) or in both subspecies (grey).

Another 2134 chrOGs contain genes which do not appear to be present in all strains, and of these, 280 chrOGs are found in 36 strains and 79 chrOGs in 35 strains. Of the chrOGs that are missing from only one strain, most are absent in KW10 (72 chrOGs) or in KF282 (70 chrOGs), possibly due to chromosomal sequence variations leading to poor hybridization signals. Since strains KW10 and KF282 show an aberrant gene presence/absence pattern compared with strains of the same genotype, the core genome becomes considerably larger when these strains are also left out of the analysis (Fig. 2B). When considering only the remaining 35 strains, 1268 chrOGs constitute the core genome; a full list of these core genes in the four reference genomes and their encoded functions is presented in Table S1 in Supporting information. Remarkably, about 180 core chrOGs (14%) consist of proteins with as yet unknown function (hypothetical proteins), and many more encode proteins with only a general function annotated (e.g. general enzyme or transporter family predicted only). These results show that much remains unknown about the core gene functions of lactococci.
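How the core set grows when poorly hybridizing strains are left out can be shown with a small sketch. The matrix layout, function names and OG identifiers below are hypothetical; only the strain names KW10 and KF282 and the totals checked at the end come from the text and Table 3.

```python
from collections import Counter


def core_ogs(matrix, strains, exclude=()):
    """OGs called present in every strain that is not excluded."""
    keep = [s for s in strains if s not in exclude]
    return {og for og, calls in matrix.items() if all(calls[s] for s in keep)}


def presence_distribution(matrix):
    """Map 'number of strains carrying the OG' to 'number of such OGs'."""
    return Counter(sum(calls.values()) for calls in matrix.values())


# Toy presence (1) / absence (0) calls; "A" and "B" stand for other strains.
strains = ["A", "B", "KW10", "KF282"]
matrix = {
    "og1": {"A": 1, "B": 1, "KW10": 1, "KF282": 1},
    "og2": {"A": 1, "B": 1, "KW10": 0, "KF282": 1},
    "og3": {"A": 1, "B": 1, "KW10": 1, "KF282": 0},
    "og4": {"A": 1, "B": 0, "KW10": 1, "KF282": 1},
}

core_all = core_ogs(matrix, strains)
core_reduced = core_ogs(matrix, strains, exclude=("KW10", "KF282"))
print(sorted(core_all), sorted(core_reduced), dict(presence_distribution(matrix)))

# Totals reported in the text and Table 3 (core plus variable = analysable).
assert 1121 + 2134 == 3255
assert 1268 + 1987 == 3255
```

In the toy example, dropping the two strains promotes `og2` and `og3` to the core set, the same effect that raises the reported core genome from 1121 to 1268 chrOGs.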

Linking core chrOGs to LaCOGs (Lactobacillales‐specific Clusters of Orthologous Genes)

The 1268 L. lactis core chrOGs were compared with the LaCOGs (Lactobacillales‐specific Clusters of Orthologous Genes), which represent groups of genes present in at least two out of 12 sequenced LAB genomes (Makarova ; Makarova and Koonin, 2007) and were recently updated to cover 26 sequenced LAB genomes (Zhou ). The vast majority of our core chrOGs (1246 of 1268, or 98%) were unambiguously linked to the LaCOGs (Table 3 and Table S1 in Supporting information). Interestingly, in the initial definition of LaCOGs (Makarova ), L. lactis strains IL1403 and SK11 were considered as separate organisms although they belong to the same species. Therefore, LaCOGs actually include a number of OGs that are specific for the species L. lactis (see below). Based on our CGH analysis of 35 strains, we have now identified 72 core chrOGs/LaCOGs which are specific for the L. lactis species, in the sense that they are found in all L. lactis strains, but do not have homologues in other LAB genome sequences (Table 4; full details in Table S2).
Table 4

Lactococcus lactis‐specific core genes with predicted functions [a] in 35 strains.

chrOG id | LaCOG id | Size (AA) [b] | Consensus function | Best hit in non‐LAB organism
1626 | LaCOG02385 | 162–180 | Acetyltransferase, GNAT family | Streptococcus sp.
336 | LaCOG02698 | 152 | Acetyltransferase, GNAT family | Bacteroides
1134 | LaCOG02731 | 1436 | Activator of (R)‐2‐hydroxyglutaryl‐CoA dehydratase | Streptococcus sp.
1884 | LaCOG02722 | 213 | Aminoglycoside phosphotransferase | Bacillus sp.
350 | LaCOG02578 | 379–383 | ATP/GTP‐binding protein | Enterococcus sp.
202 | LaCOG02425 | 784 | Carbon starvation protein A | Propionibacterium freudenreichii
1125 | LaCOG02464 | 134–151 | Dinucleoside polyphosphate hydrolase | Caminibacter mediatlanticus
262 | LaCOG02721 | 251–261 | Metallophosphoesterase | Enterococcus sp.
463 | LaCOG02619 | 462–465 | MF superfamily multidrug resistance protein | Listeria grayi
174 | LaCOG02554 | 443 | NAD(FAD)‐utilizing dehydrogenase | Turicibacter sp.
2067 | LaCOG02712 | 535 | NADH dehydrogenase | Paenibacillus sp.
1192 | LaCOG02661 | 101 | O6‐methylguanine‐DNA methyltransferase | Bacillus sp.
339 | LaCOG02380 | 145 | Osmotically inducible protein C | Pseudomonas sp.
1256 | LaCOG02428 | 1190–1223 | Pyruvate‐flavodoxin oxidoreductase | Enterococcus sp.
1483 | LaCOG02566 | 276–296 | Rgg/GadR/MutR family transcriptional regulator | Streptococcus sp.
2408 | LaCOG02658 | 160–163 | SUF system FeS assembly protein | Nakamurella multipartita
1011 | LaCOG02734 | 151 | Transporter | None
2258 | LaCOG02509 | 143 | Universal stress protein | Enterococcus sp.
636 | LaCOG02670 | 141 | Universal stress protein A | Enterococcus sp.
1370 | LaCOG02404 | 269–303 | Zinc‐binding dehydrogenase | Streptomyces sp.

[a] For a full list of the 72 L. lactis‐specific chrOGs see Table S2.

[b] Size (in AA) of protein in four reference L. lactis genomes.


Diversity of chromosomal genes of L. lactis

The occurrence of numerous chrOGs in only a few strains (Fig. 2) supports the hypothesis that the species L. lactis is genetically extremely flexible. Therefore, we investigated in more detail the genetic signatures, i.e. chrOGs, genes and gene clusters, linked to the different genomic subclades and to the different isolation niches. Based on total chromosomal gene content, the 37 strains investigated can be divided into two clusters, each including the type strain of the corresponding subspecies (Fig. 1). In the following analysis, we first focused on the chrOGs specific for each subspecies clade. Nearly 600 and 400 chrOGs were found to be specific for either the subspecies lactis or subspecies cremoris respectively, of which nearly half specified hypothetical proteins of unknown function; full details of these subspecies‐specific chrOGs and genes are listed in Table S3. Based on our CGH analysis, a small subset of these subspecies‐specific chrOGs appears to be present in all tested cremoris strains (151 chrOGs) or all lactis strains (72 chrOGs), and hence these could be used as genotypic marker genes to distinguish the lactis and cremoris subspecies. Many of these subspecies‐specific genes are organized in gene clusters in the reference genomes, and the functions specified by these gene clusters could be used in phenotypic typing. A short summary of the largest gene clusters and their predicted functions is presented in Table 5.
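The marker criterion used here (present in every strain of one subspecies, absent from every strain of the other) is easy to state as code. The sketch below is an illustration on invented OG identifiers; the four strain names and their subspecies assignments are the reference strains named in the text, and the function name is hypothetical.

```python
def subspecies_markers(calls, subspecies):
    """Return, per subspecies, OGs present in all of its strains and in no other strain.

    calls:      OG id -> set of strains in which the OG was called present
    subspecies: strain -> subspecies label
    """
    groups = {}
    for strain, ssp in subspecies.items():
        groups.setdefault(ssp, []).append(strain)

    markers = {ssp: set() for ssp in groups}
    for og, present_in in calls.items():
        for ssp, members in groups.items():
            others = [s for s in subspecies if s not in members]
            in_all_members = all(s in present_in for s in members)
            in_no_others = not any(s in present_in for s in others)
            if in_all_members and in_no_others:
                markers[ssp].add(og)
    return markers


subspecies = {"IL1403": "lactis", "KF147": "lactis",
              "SK11": "cremoris", "MG1363": "cremoris"}
# Hypothetical OG ids and calls.
calls = {
    "og_shared":   {"IL1403", "KF147", "SK11", "MG1363"},
    "og_lactis":   {"IL1403", "KF147"},
    "og_cremoris": {"SK11", "MG1363"},
    "og_partial":  {"KF147"},
}
markers = subspecies_markers(calls, subspecies)
print(markers)
```

An OG such as `og_partial`, found in only some lactis strains, is subspecies‐specific in the looser sense used for the "nearly 600 and 400" counts but does not qualify as a marker under the stricter all‐strains criterion.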
Table 5

Main subspecies‐specific conserved gene clusters and functions.

Locus [a] | Gene | Function | Comment

(A) Subspecies lactis‐specific
LLKF_0567 | umuC | ImpB/MucB/SamB family protein |
LLKF_0568 | yfiC | Acetyltransferase, GNAT family |
LLKF_0569 | rmaJ | Transcriptional regulator, MarR family |
LLKF_0570 | yfiE | Organic hydroperoxide resistance family protein |
LLKF_1314 | nhaP | NhaP‐type Na+/H+ and K+/H+ antiporter | Cluster not in UC317, LMG8520
LLKF_1315 | ymhC | Hypothetical protein |
LLKF_1316 | amyL | Alpha‐amylase |
LLKF_1317 | lctO | L‐lactate oxidase |
LLKF_1605 | ypcCD | Endo‐beta‐N‐acetylglucosaminidase (EC 3.2.1.96) | Arabinose gene cluster is inserted between ptk–xylT in some strains
LLKF_1606 | dexB | Glucan 1,6‐alpha‐glucosidase (EC 3.2.1.70) |
LLKF_1607 | lnbA | Lacto‐N‐biosidase (EC 3.2.1.140) |
LLKF_1608 | ypcG | Sugar ABC transporter, substrate‐binding protein |
LLKF_1609 | ypcH | Sugar ABC transporter, permease protein |
LLKF_1610 | ypdA | Sugar ABC transporter, permease protein |
LLKF_1611 | ypdB | Alpha‐mannosidase (EC 3.2.1.24) |
LLKF_1612 | ypdC | Hypothetical protein |
LLKF_1613 | rliB | Transcriptional regulator, GntR family |
LLKF_1614 | ypdD | Alpha‐1,2‐mannosidase (EC 3.2.1.24) |
LLKF_1615 | ptk | Phosphoketolase (EC 4.1.2.9) |
LLKF_1623 | xylT | D‐xylose‐proton symporter |
LLKF_1624 | xylX | Acetyltransferase (EC 2.3.1.‐) |
LLKF_1625 | xynB | Beta‐1,4‐xylosidase |
LLKF_1626 | xynT | Xyloside transporter |
LLKF_1627 | xylM | Aldose‐1‐epimerase (EC 5.1.3.3) |
LLKF_1628 | xylB | Xylulose kinase (EC 2.7.1.17) |
LLKF_1859 | arcC | Carbamate kinase (EC 2.7.2.2) | Cluster partially absent in LMG9449; there are other copies of carbamate kinase
LLKF_1860 | aguA | Agmatine deiminase (EC 3.5.3.12) |
LLKF_1861 | yrfD | Agmatine/putrescine antiporter |
LLKF_1862 | pctA | Putrescine carbamoyltransferase (EC 2.1.3.6) |
LLKF_1863 | | Transcriptional regulator, LuxR family |
LLKF_2026 | corC | Magnesium and cobalt efflux protein |
LLKF_2027 | pacB | Penicillin acylase (EC 3.5.1.11) |
LLKF_2028 | ytaD | Protein‐tyrosine phosphatase (EC 3.1.3.48) |
LLKF_2164 | lacZ | Beta‐galactosidase (EC 3.2.1.23) |
LLKF_2165 | thgA | Galactoside O‐acetyltransferase (EC 2.3.1.18) |

(B) Subspecies cremoris‐specific
LACR_0451 | | Antibiotic export permease protein | Inserted relative to IL1403, KF147
LACR_0452 | | Antibiotic export ATP‐binding protein |
LACR_0453 | | Transcriptional regulator, MarR family |
LACR_0498 | | Hypothetical protein | Cluster unique for L. lactis
LACR_0501 | | Hypothetical protein | Gene absent in FG2, HP
LACR_0502 | | Hypothetical protein |
LACR_0505 | | Hypothetical protein |
LACR_0506 | | Hypothetical protein |
LACR_0507 | | Hypothetical protein |
LACR_0508 | | Hypothetical protein |
LACR_0509 | | Hypothetical protein |
LACR_0754 | | Hypothetical protein |
LACR_0755 | | Cold‐shock DNA‐binding protein family protein |
LACR_0756 | | Cold‐shock DNA‐binding protein family protein |
LACR_0761 | | Sugar ABC transporter permease | In IL1403 a transposase at this position
LACR_0762 | | Sugar ABC transporter permease |
LACR_0763 | | Oligosaccharide‐binding protein |
LACR_0764 | | Integral membrane protein |
LACR_0765 | | Alpha‐glucosidase (EC 3.2.1.30) |
LACR_1288 | | Transcriptional regulator, AraC family | Glycan degradation; similar clusters in Leuconostoc mesenteroides, Clostridium difficile, Bifidobacteria, Ruminococcus obeum
LACR_1289 | | Major facilitator superfamily permease | Gene absent in FG2, HP
LACR_1290 | | Glucan 1,3‐beta‐glucosidase (EC 3.2.1.58) | Gene absent in FG2, HP, LMG6897T
LACR_1291 | | Beta‐xylosidase (EC 3.2.1.37) |
LACR_1632 | | PTS system cellobiose‐specific, IIC component | Whole gene cluster absent in V4, KW10
LACR_1633 | | Transcriptional regulator, AraC family | Gene absent in FG2, HP, LMG6897T
LACR_1636 | | Glucokinase (EC 2.7.1.2)/transcription regulator | Gene absent in FG2, HP, LMG6897T
LACR_1637 | | 6‐Phospho‐beta‐glucosidase (EC 3.2.1.86) |
LACR_1638 | rpiB | Ribose‐5‐phosphate isomerase B (EC 5.3.1.6) |
LACR_1639 | rpe | Ribulose‐5‐phosphate 3‐epimerase (EC 5.1.3.1) |
LACR_1640 | | Transcription regulator, LacI family |
LACR_2591 | | Hypothetical protein |
LACR_2592 | | Hypothetical protein |
LACR_2593 | | Hypothetical protein |
LACR_2594 | | Hypothetical protein |

[a] For the conserved OGs, members from a reference genome are listed, i.e. LLKF = L. lactis ssp. lactis KF147; LACR = L. lactis ssp. cremoris SK11. Numbering indicates genomic position relative to other chromosomal genes, where consecutively numbered genes are generally in an operon.

These genes are predicted to be present in all strains of a subspecies, either lactis or cremoris, and absent in all strains of the other subspecies. Exceptions are indicated.

Gene clusters unique for all ssp. lactis strains (and not present in any ssp. cremoris strain) include a large cluster of 17 genes for glycan (xylan, mannan or glucan) and xylose metabolism (Table 5). This cluster is typical for plant‐derived lactis strains, which can use these plant cell‐wall components for growth, but it is apparently also maintained in dairy lactis strains. In some lactis strains, the arabinose‐utilization genes are also part of this gene cluster (see below). The thgA–lacZ genes for galactose metabolism appear to be present in all lactis strains but absent from all ssp. cremoris strains. Another lactis‐unique cluster is predicted to be involved in nitrogen metabolism of agmatine and putrescine, both breakdown products of arginine. Several other lactis‐specific genes are predicted to be involved in stress response (Table 5). Gene clusters unique for (almost) all ssp. cremoris strains (and not present in any ssp. lactis strain) include genes for antibiotic resistance and sugar metabolism (α‐glucosides, β‐glucosides, ribose), but also many hypothetical proteins (Table 5). Many of the cremoris‐specific gene clusters are identified as pseudogenes in the reference cremoris genomes, which could indicate ongoing degeneration of genes and encoded functions.

Subclade‐specific clusters

Next, each branch in the tree was investigated separately for gain and loss of chrOGs to determine the degree of relatedness between strains and subclades, and to obtain insight into possible insertions and deletions of genes and gene clusters during diversification. Per split in the tree, the genes in these chrOGs were used to find clusters of adjacent genes in the corresponding reference genomes. Several large gene clusters were identified of which a selection is described below and summarized in Table 6 (others can be found in the text in Supporting information). Tree splits, annotation of the gene clusters and their best blast hits are presented in detail in Table S4.
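Finding "clusters of adjacent genes" in a reference genome can be approximated by grouping locus tags with consecutive numbers, since locus numbering reflects genomic position in the reference genomes. The sketch below takes that shortcut; the function name, the `max_gap` and `min_size` parameters and the regular expression are illustrative assumptions, and the authors' actual procedure is not described in enough detail here to reproduce it.

```python
import re


def adjacent_clusters(locus_tags, min_size=2, max_gap=1):
    """Group locus tags such as 'LLKF_1605' into runs of neighbouring genes."""
    parsed = []
    for tag in locus_tags:
        m = re.fullmatch(r"([A-Za-z]+)_(\d+)", tag)
        if m:  # tags that do not match the pattern are ignored
            parsed.append((m.group(1), int(m.group(2)), tag))
    parsed.sort()

    clusters, current = [], []
    for prefix, number, tag in parsed:
        # Start a new run when the genome changes or the numbering gap is too large.
        if current and (prefix != current[-1][0] or number - current[-1][1] > max_gap):
            if len(current) >= min_size:
                clusters.append([t for _, _, t in current])
            current = []
        current.append((prefix, number, tag))
    if len(current) >= min_size:
        clusters.append([t for _, _, t in current])
    return clusters


# Locus tags taken from Table 5, deliberately shuffled.
tags = ["LLKF_0568", "LLKF_0567", "LLKF_0570", "LLKF_0569",
        "LLKF_2164", "LLKF_2165", "LLKF_1863"]
clusters = adjacent_clusters(tags)
print(clusters)
```

Raising `max_gap` tolerates a missing call inside an otherwise contiguous cluster, which may be useful when a single gene hybridizes poorly.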
Table 6

Diversity of chromosomally encoded gene clusters and functions.

StrainSubspeciesDairyArabinose metabolismSucrose metabolismGalacturonate metabolismα‐Galactoside metabolismXylan breakdownStarch/maltose breakdownTrp metabolismLeu/Ile/Val metabolismCitrate metabolismHigh‐affinity K+ transportNisin production/immunityEPS biosynthesis (epsX–epsL)Teichoic acid biosynthesis
LMG6897TCD+S+
HPCD+S+
FG2CD+S+
SK11CD+++S+
AM2CD++++/−S+
NCDO763CD+++M+
MG1363CD*+++M+
N41C++++/−+/−M+
V4CD+++++M+
KW10C+++/−+?
B2244BL+++++/−++++
LMG8526L++++++++++/−K+/−
Li‐1L+/−+++/−++++I+/−
K231L+/−++/−+/−++++I+/−
KF7L+/−+++++++/−K+/−
LMG9449L+/−++++/−+++I+/−
KF24L+/−++/−+++/−+
KF146L+++++++++
KF134L+++++++/−
KF196L++++++++/−
KF67L++++++++/−
KF201L++++++I+/−
E34L+/−++++
K337L+/−++/−+/−++++I+/−
M20L+/−++/−++++K+/−
LMG8520H++++/−
UC317LD++++I+
NCDO895LD++++++I+
ML8LD+++I+
LMG14418LD+++/−++++/−I+
N42L+++++I+
IL1403LD*++++I+
DRA4LD++++I+
LMG9446L++/−+++++/−
KF147L+++++++++++K+/−
ATCC19435TLD+/−+/−+/−+++K+/−
KF282L+++/−+/−++++K+/−

Predicted presence of chromosomally encoded gene clusters and their functions in the L. lactis strains. L: ssp. lactis; C: ssp. cremoris; D: dairy; * denotes plasmid‐cured strain; + denotes presence of all of the required genes; +/− denotes presence of some of the required genes. Teichoic acid biosynthesis: I = IL1403 type, M = MG1363 type, S = SK11 type, K = KF147 type. Strains P7266 and P7304 were omitted from this analysis.


Simple sugar metabolism

Arabinose is a monosaccharide commonly found in plants as a component of biopolymers such as hemicellulose and pectin. Plant L. lactis strains KF147 and KF282 have previously been shown to grow on l‐arabinose, in contrast to IL1403 and SK11 (Siezen ). The arabinose operon (Fig. 3A) was indeed found to be specific for plant strains. Only strains N41, KF147, KF282, LMG8526 and B2244B were predicted to contain the complete arabinose gene cluster araADBTFPR. Eight other plant lactis strains contain an arabinose operon lacking the genes araFP, encoding an alpha‐N‐arabinofuranosidase and a disaccharide permease, suggesting that they cannot utilize arabinose polymers/oligomers, but can still use arabinose itself.
Figure 3

Variable gene clusters involved in sugar breakdown. As no gene order is known for the query strains, the representative clusters present in the reference genome L. lactis KF147 are shown. (A) Arabinose metabolism; (B) sucrose metabolism; (C) galacturonate metabolism; (D) α‐galactoside metabolism; (E) xylan breakdown; (F) starch breakdown. Coloured bars indicate operon predictions of two or more genes; stalks indicate predicted terminators. Images made using MINOMICS (Brouwer ). Gene annotations are in Table S4.

Sucrose is the major stable product of photosynthesis in plants and it is also the form in which most carbon is transported. It has been described that genes for the biosynthesis of nisin and the fermentation of sucrose are located on a 70 kb conjugative transposon in L. lactis ssp. lactis (Kelly ). In plant strains, the conjugative element is smaller and lacks the nisin genes. Here, the sucrose gene cluster (Fig. 3B) was found in all plant strains, except N42, M20 and E34. In an earlier study, plant strains KF147 and KF282 were already found to grow on sucrose, in contrast to dairy strains IL1403 and SK11 (Siezen ). However, three dairy strains do contain the operon: NCD0895, LMG14418 and V4. This suggests that the ability to ferment sucrose is not plant‐specific.

Previously, the plant strains KF147 and KF282 were shown to grow on glucuronate, which is a building block of the complex sugar xylan, found in plant cells (Siezen ). All four L. lactis strains (KF147, KF282, SK11 and IL1403) described in that study were found to contain a gene cluster for uptake and degradation of d‐glucuronate: kdgR–uxuB–uxuA–uxuT–hypAE–uxaC–kdgK–kdgA. Only strain KF147 was found to have an additional gene cluster for uptake and degradation of d‐galacturonate, a compound that is formed by the epimerization of glucuronate, which is a building block of pectin (Fig. 3C). 
In the present study, the d‐glucuronate cluster was found to be present in all strains, except the hordniae strain LMG8520. The additional d‐galacturonate cluster described for KF147 was found to be only present in some other plant strains, i.e. KF146, KF196, KF67, LMG8526 and LMG9446. This suggests that these six plant strains are able to metabolize both pectin and xylan, while the rest of the plant strains can only metabolize xylan.

α‐Galactosides, such as raffinose, melibiose and stachyose, are oligosaccharides typical for plants. In a previous study comparing four L. lactis strains, only plant strain KF147 was found to grow on α‐galactosides (Siezen ). In agreement with this observation, only strain KF147 was then found to possess a gene cluster for α‐galactoside uptake, breakdown and subsequent d‐galactose conversion: fbp–galR–aga–galK–galT–purH–agaRCBA–sucP (Fig. 3D). The present analysis predicts that three other plant strains also contain this gene cluster, i.e. strains KF146, LMG9449 and B2244B. This α‐galactoside gene cluster resides on a 51 kb transposon in strain KF147, which could be conjugally transferred to strain MG1363 (Machielsen ) and is spontaneously lost upon prolonged growth in milk (Bachmann, 2009). The entire transposon appears to be present in strain B2244B, and parts of the transposon are present in strains LMG9449, KF146, KF67, M20, UC317 and N42.

Complex sugar metabolism

Xylan is the main component of hemicelluloses, which are heteropolymers frequently encountered in plant material. Xylan is composed of d‐xylose units, which can be substituted with side groups, such as l‐arabinose, d‐galactose or acetyl. It is a complex structure, requiring multiple enzymes acting together for breakdown. Xylose is subsequently converted into xylulose‐5‐phosphate, which can enter the pentose phosphate pathway. Earlier studies revealed the presence of a gene cluster predicted to be involved in xylan breakdown in plant strains KF147 and KF282 (Siezen ) (Fig. 3E). In the current study this gene cluster was only found to be present in some ssp. lactis strains, mostly plant‐derived strains but also in two dairy lactis strains (Table 6).

A large gene cluster, malR–mapA–agl–amyY–maa–dexA–dexC–malEFG, involved in breakdown of starch and its building block maltose is present in all four sequenced reference L. lactis strains: IL1403, MG1363, SK11 and KF147 (Fig. 3F). The CGH data predict that the entire cluster is absent only in the cremoris strains HP, FG2 and LMG6897T, while the maltose transporter genes malEFG are absent in 10 lactis strains. The genes for starch breakdown and subsequent uptake and conversion of oligo/monosaccharides are probably lost in these three cremoris strains as a consequence of living in a lactose‐rich dairy environment.

Amino acid metabolism

Glutamate decarboxylase activity is one of the phenotypic traits used to distinguish ssp. cremoris from ssp. lactis strains (Nomura ). CGH analysis indicates that all strains of ssp. cremoris and ssp. lactis appear to have a large gene cluster for glutamate metabolism, including the genes gadRCB and gltBD. The glutamate decarboxylase gene gadB of cremoris strain SK11 is inactive due to a frameshift mutation (Wegmann ), while the gadB gene of cremoris strain MG1363 is complete and was shown to be active (Sanders ). Our CGH analysis can only predict whether genes are present, and not whether they are active or inactive. Therefore we conclude that presence/absence of gadB genes or their activity is not suitable to distinguish ssp. cremoris from ssp. lactis.

Arginine deiminase activity is another phenotypic trait used to distinguish ssp. cremoris from ssp. lactis strains. Gene clusters argFBDJC, argGH and argRS–arcABD1C1C2TD2 for arginine metabolism are predicted by CGH analysis to be present in all analysed L. lactis strains. The arginine deiminase gene arcA of cremoris strain SK11 is inactive due to a frameshift mutation (Wegmann ), while the arcA gene of cremoris strain MG1363 is complete and has been shown to be functional (Budin‐Verneuil ). Therefore, as described for the gadB genes, the presence/absence of the arcA gene does not appear to be a good predictor to distinguish between ssp. cremoris and lactis.

Degradation products from branched‐chain amino acids play a major role in cheese flavour formation (Smit ). A large cluster leuABCD–ilvDBHCA involved in branched‐chain amino acid metabolism was found to be absent in dairy L. lactis strain ML8, and incomplete in strains LMG8520 and N41. Therefore, all three strains are probably incapable of synthesizing branched‐chain amino acids. However, auxotrophy in dairy L. lactis strains may also be due to simple mutations in these genes, as has been demonstrated for strain IL1403 (Godon ).

Citrate metabolism

Citrate utilization, with final production of acetoin and diacetyl, is an interesting phenotypic trait for the dairy industry. Diacetyl production is the criterion for naming of the biovar diacetylactis strains. The genes required are citP for citrate permease (usually plasmid‐located; see below) and operon citMCDEFXG encoding the enzymes for metabolism of citrate (Garcia‐Quintans ). Indeed, the chromosomal gene cluster was detected only in strains belonging to the biovar diacetylactis included in our analysis: IL1403 (plasmid‐free derivative of a diacetylactis strain), DRA4 and M20. Only strain DRA4 has the plasmid‐encoded citrate permease gene citP (see below).

Survival/stress response

Manganese functions in protection against oxidative stress, as has been described for Bacillus subtilis (Inaoka ) and Lactobacillus plantarum (Groot ). Studies with tellurite‐resistant L. lactis mutants showed that manganese stimulates iron transport and reduces oxidative stress (Turner ). A manganese ABC‐transporter operon mtsACB was identified in most strains, except lactis strain LMG9446 and dairy cremoris strains V4, LMG6897T, HP and FG2. The gene cluster shows high sequence similarity to genes in enterococci and streptococci (60–98% amino acid identity). As iron excess is believed to generate oxidative stress, it is possible that these strains are less resistant to oxidative stress because they are unable to transport iron efficiently and consequently have higher intracellular iron levels. Lactococcus lactis strains from the ssp. cremoris have been described to be more sensitive to osmotic stress than ssp. lactis strains. The mechanism of osmo‐dependent repression by the glycine/betaine transporter encoded in the bus operon in L. lactis has been described in a recent study (Romeo ). Reduced growth of cremoris strains at high osmolality has been shown to relate to absence or reduced activity of the bus operon (Obis ). In our CGH analysis, both the busRAB operon and a gene cluster encoding a choline transporter (choQS) and glutathione reductase (gshR) were found to be absent only in cremoris strains HP, FG2 and LMG6897T. A high‐affinity K+ transport system kdpDEABC (two‐component regulator and ATPase) is absent in all cremoris strains and in the hordniae strain, but present in all lactis strains except diacetylactis strains IL1403 and DRA4 (Table 6). These findings suggest that in particular many of the cremoris strains cannot cope well with a high‐osmolarity environment, such as high salt concentrations. 
Several soil bacteria, such as Bacillus and Streptomyces species, are known to contain gene clusters involved in non‐ribosomal peptide or polyketide biosynthesis (Finking and Marahiel, 2004; Siezen and Khayatt, 2008). Non‐ribosomal peptide synthetases (NRPS) and polyketide synthases (PKS) are of great interest, because they produce numerous therapeutic agents and have a great potential for engineering novel compounds. These multi‐module proteins are the largest enzymes known. In recent years, NRPS and NRPS/PKS gene clusters have also been identified in the lactic acid bacteria L. plantarum WCFS1 (Kleerebezem ) and L. lactis KF147 (Siezen ). It was hypothesized that the NRPS/PKS product in L. lactis functions in microbe–plant interactions (defence or adhesion) or that it facilitates iron uptake from the environment. Here, the complete NRPS/PKS gene cluster of 13 genes from strain KF147 has been found to be present in five of the L. lactis strains, i.e. the plant strains KF147, KF146, KF134, KF196 and Li‐1, suggesting that all these plant strains are capable of synthesizing this as yet unknown NRPS/PKS product.

Cell envelope

Bacteria living in plant environments are often found in biofilms, using exopolysaccharides (EPS) to adhere to plants (Danhorn and Fuqua, 2007). As a consequence, genes involved in the physical interaction with the plant cells are expected to be present in the plant‐derived L. lactis strains. EPS‐producing strains are interesting for the dairy industry, as they are used to improve the texture and viscosity of fermented products. Our CGH results show some remarkable variability in chrOG distribution of EPS genes. A large EPS biosynthesis cluster I of about 25 genes includes rmlACBD and rgpABCDEF that are responsible for the formation of rhamnose‐glucose polysaccharides (Fig. 4A) (Table S4). This EPS gene cluster consists of three separate parts: (i) the first part of seven to eight genes (rmlA–rgpB) appears to be present in all ssp. lactis and cremoris strains, (ii) the second part of seven to eight genes (rgpC–ycbC) is present in all cremoris strains, but only in lactis strains KF7, KF147 and IL1403, while (iii) the third part of nine genes is completely different in the cremoris and lactis reference strains (see genes and their functions in Table S4). This third set of cremoris‐like genes appears to be present in all cremoris strains and lactis strain KF282, while the third set of lactis genes, presumably involved in glycerophosphate‐containing lipoteichoic acid biosynthesis, is again only present in lactis strains KF7, KF147 and IL1403 (Table S4). This variability in the composition of genes in this large EPS cluster suggests that a variety of different EPS structures can be made by L. lactis strains.
Figure 4

Variable gene clusters for cell‐envelope biosynthesis. As no gene order is known for the query strains, the representative clusters present in the reference genome L. lactis KF147 are shown. (A) Exopolysaccharide (EPS) biosynthesis cluster I; (B) exopolysaccharide (EPS) biosynthesis cluster II; (C) teichoic acid biosynthesis cluster. Coloured bars indicate operon predictions of two or more genes; stalks indicate predicted terminators. Images made using MINOMICS (Brouwer ). Gene annotations are in Table S4.

A second large cluster II for EPS biosynthesis in the plant‐derived strain KF147 consists of 13 genes, epsXABCDEFGHIJKL (Fig. 4B) (Siezen ). In the present study, this complete cluster was found to be present only in plant strains KF147 and KF146, while parts of the cluster (usually including the genes epsXABC, which possibly encode a basic EPS backbone structure) are present in the plant strains N41, KF134, KF196, KF67, KF7, LMG8526 and B2244B (Table 6). Therefore, this EPS gene cluster and its variants appear to be more specific for plant‐derived strains, and could encode biosynthesis of EPS which are beneficial for survival in the plant environment. This remarkable variability of EPS cluster genes in L. lactis confirms other observations on diversity already reported in Streptococcus thermophilus (Rasmussen ), again suggesting a rich variety in structures of the produced EPS in these LAB species.

A teichoic acid (TA) biosynthesis gene cluster tagL–tagB is quite variable in the four reference strains (Table 6). The reference cremoris strain MG1363 and lactis strain KF147 have the most similar TA cluster, sharing 14 syntenous genes (out of 17 genes in KF147 and 19 in MG1363) (Fig. 4C), while strain IL1403 shares only 7 (out of 15) genes with MG1363 and KF147. In reference strain SK11, all the genes between tagB and tagL have been replaced by pseudogenes encoding transposases and a putative lipopolysaccharide‐1,2‐glucosyltransferase. 
All these types of TA clusters are predicted to be present in the larger set of L. lactis strains analysed in this study, with the IL1403‐type TA cluster being the most common (Table 6). The variability in the composition of this TA biosynthesis gene cluster suggests that different types of teichoic acids and their derivatives may be made by L. lactis strains.

Diversity of plasmid‐encoded genes

Dairy strains often contain several plasmids to provide the functions needed to survive and thrive in a milk environment (McKay, 1983; Davidson ; Siezen ). All known plasmid‐located genes of L. lactis were represented on the CGH array (Table S5), which allowed us to assess their occurrence and distribution in the L. lactis strains analysed in our study. The presence or absence of corresponding genes, rather than OGs, in the 39 L. lactis strains was evaluated from the CGH data, and is available in Table S6. In this case, initial clustering into ‘plasmid OGs’ did not provide any advantage due to the large variability in types of known plasmids and their encoded proteins. Moreover, direct analysis of the much smaller set of plasmid genes was computationally easier, and allowed a direct analysis of their presence/absence in context of functional gene clusters. Overall, dairy strains appear to contain many known plasmid‐encoded functions, while plant strains contain few or none (Table 7). These functions include lactose metabolism (lacRABCDFEGX genes), external proteolysis (prtP, prtM), copper resistance (lcoCRS), cadmium resistance (cadAC) and manganese transport (mntH). Dairy strains harbouring multiple genes for replication and partitioning presumably contain multiple plasmids encoding these functions (Table 7). Interestingly, strains N41 and N42, of soil and grass origin, appear to have very similar plasmid‐encoded functions compared with the dairy strains. Moreover, they both cluster with dairy strains based on chromosome content (Fig. 1), and may therefore originally be from dairy sources.
Table 7

Diversity of putative plasmid‐encoded genes and functions.

Strain | Subspecies | Dairy | Replication/partitioning | Mobilization/conjugation | Proteolysis (prtP, prtM) | Copper resistance | Cadmium resistance | Manganese transport | Lactose metabolism | Citrate uptake (citP) | Glu dehydrogenase | EPS synthesis
LMG6897TCD++/−+++
HPCD++++
FG2CD+++++
SK11CD++/−++++
AM2CD++/−+++++
NCD0763CD++++
MG1363CD*
N41C++++++++/−
V4CD+
KW10C++/−+/−+/−++/−
B2244BL+/−
LMG8526L+/−++/−
Li‐1L++/−+++
K231L++/−+
KF7L+++/−
LMG9449L+++/−+
KF24L+
KF146L+/−
KF134L+/−
KF196L++/−
KF67L+/−
KF201L
E34L
K337L
M20L
LMG8520H+
UC317LD++++++++/−
NCD0895LD+++++
ML8LD++++++
LMG14418LD
N42L++++/−+++
IL1403LD*
DRA4LD++/−+++++/−
LMG9446L++/−+/−++
KF147L++/−
ATCC19435TLD++
KF282L+
P7304L++/−++/−++/−
P7266L++/−+/−+++/−

Predicted presence of plasmid‐encoded genes and their functions in the L. lactis strains. L: ssp. lactis; C: ssp. cremoris; D: dairy; * denotes plasmid‐cured strain; + denotes the presence of all or most of the required genes; +/− denotes the presence of some of the required genes. Genes that are known to be both chromosomally and plasmid‐encoded are not included in this analysis, e.g. transposases, integrases/recombinases, restriction/modification system (hsdM, hsdR, hsdS), proteolytic system (pcp, pepO, pepF, oppACBFD), cold shock proteins and all plasmid‐encoded genes that hybridized with the plasmid‐free strains IL1403 or MG1363.

Several plant‐derived L. lactis strains also appear to contain plasmids, but the encoded genes could not be predicted because our pan‐genome microarray specified probes to many known dairy plasmids, whereas few plasmids from plant isolates have been described and thus were not included on the array. Therefore our present analysis clearly underestimates the plasmid‐encoded genes of plant L. lactis strains. The presence of genes for EPS biosynthesis in many plant strains does not always correlate with the presence of replication/partitioning genes, so those EPS genes may be chromosomally located (Table 6). Gel electrophoresis confirmed that most dairy strains contained multiple plasmids, while these plant strains contained very few or no plasmids (Fig. 5).
Figure 5

Polyacrylamide gel electrophoresis of plasmid DNA in L. lactis strains. Far left and right lanes contain molecular weight markers. The lower three panels are Southern blots of the same gel as at top, using probes for the citP, lacG and prtP genes. The arrow indicates an artefact band, present in all lanes, and presumably due to contaminating chromosomal DNA.


Discussion

The present study supports the view of L. lactis as a genomically very flexible species. Different genetic events – some reversible, some irreversible – influence phenotypes, which are the interactions between the bacterium and the environment it encounters. Genetic transfer has been demonstrated to be possible between strains of the two L. lactis subspecies (Rademaker ) and also with other bacteria (Bolotin ). Also, literature data on amino acid auxotrophy (e.g. Delorme ) and on carbohydrate metabolism, e.g. maltose degradation shown in the present study, confirm that loss of function is due either to mutations/frameshifts or to deletions. This further demonstrates the flexibility of L. lactis genomes, and their diversification related to niche adaptation. This is also important from a taxonomic perspective (Pace, 2009), as previous work and our study demonstrate that nomenclature based only on phenotype is unreliable. In fact, some phenotypic tests differentiating type strains of lactis and cremoris are due to severe gene deletions in the cremoris type strain and in a few other strains, but due to simple point mutations in other strains (e.g. SK11), which could be reversible. From the current study we conclude that the diversity of L. lactis can best be described through a combination of 16S rRNA sequence, genotypic markers and selected phenotypic tests. Therefore, we suggest that nomenclature of this species should be based on genotypic tests, e.g. fingerprinting techniques or specific gene sequence analysis, complemented with classical phenotypic tests, to guarantee the continuity with classical taxonomy. Our data support the theory that the ancestor of the species originally inhabited the plant niche, but was able to successfully colonize other habitats due to its genomic flexibility (Quiberoni ). 
The first event in evolution appears to be subspeciation into the lactis and cremoris subspecies, with no evident differences between gene gain and gene loss, which generated the two subspecies. Adaptation to milk was a more recent event, and therefore appears to have happened independently in the two subspecies. Considering that very few ssp. cremoris strains are known outside the dairy environment, speciation and adaptation to milk for this subspecies could have happened at the same time, while adaptation in ssp. lactis could be a more recent event. Interestingly, the two sequenced cremoris strains, SK11 and MG1363, display genomic inversions (Wegmann ). Therefore, structural events could have influenced speciation and/or adaptation to milk in this subspecies. Also, mobile elements could have played a crucial role, as witnessed by the plasmid location of genes responsible for lactose degradation and oligopeptide transport in strain SK11.

Our CGH analysis of presence or absence of gene clusters can be used to match phenotypic traits to specific genes or gene clusters, i.e. find correlations between gene content and functional properties. However, gene‐trait matching is not straightforward as, for instance, many genes encode proteins of yet unknown function, genes can be inactivated or differentially expressed, and phenotypic test results can often be ambiguous. On the other hand, our extensive data set is an obvious starting point for further research to investigate gene‐trait matching in L. lactis strains and to move further in the genome annotation procedure. In this sense, the genes need to be seen in their genomic and biological context and, in particular, in the context of cellular metabolic pathways (Teusink ). Therefore, innovative bioinformatics tools, such as Random Forest methods, are currently being used to investigate gene‐trait matching and to evaluate these data in a functional perspective (J. Bayjanov, R.J. Siezen and S.A.F.T. van Hijum, in preparation).

Experimental procedures

Strain selection and DNA preparation

Lactococcus lactis strains were selected from a large set of phenotypically and genotypically characterized strains (Rademaker ) to represent the diversity of the species in terms of taxonomy and ecology. They belong phenotypically to both subspecies lactis (29 strains) and cremoris (10 strains) and were isolated from different sources (Table 1). The source, growth conditions and typing of the selected L. lactis strains, using 16S rRNA typing and other standard methods and using outgroups such as L. plantarum and Enterococcus casseliflavus, have been described in detail previously (Rademaker ). These authors concluded that the two very divergent strains P7304 and P7266 belong to the L. lactis species, but that these strains follow a different lineage. DNA was prepared from L. lactis strains (Table 1) using the QiaAmp DNA Mini Kit (Qiagen GmbH, Hilden, Germany) according to the manufacturer's protocol for the isolation of genomic DNA from Gram‐positive bacteria.

Microarray design, data acquisition and normalization.

All L. lactis genomic, plasmid and single gene or operon DNA sequences (1988 sequences present in July 2005, constituting 10.7 Mb) were collected from the NCBI CoreNucleotide database. This included the complete genome sequences of L. lactis strain IL1403 (2.35 Mb, Accession No. AE005176) and the incomplete genome of strain SK11 (2.43 Mb, GenBank record GI:62464763). Additionally, draft genome sequences consisting at that time of 547 contigs (2.3 Mb) of L. lactis ssp. lactis strain KF147 (NIZOB2230) and 961 contigs (2.6 Mb) of L. lactis ssp. lactis KF282 (B2244W) were added. Redundant stretches of DNA were removed, where a DNA fragment was defined as redundant if it differed from another fragment by at most 2 nucleotides over a window of 100 nucleotides. For the remaining non‐redundant 7 Mb of DNA, on each of the sequences, 32 bp probes were defined with a sliding window of 19 nucleotides, resulting in a total of 386 298 probes. We also designed 3181 random probes with their sequence absent in the non‐redundant 7 Mb of L. lactis DNA, and these were randomly located on the array. Details of array production, DNA hybridization (NimbleGen Systems, Madison, WI, USA), data normalization and data submission to GEO are described in Bayjanov and colleagues (2009). Briefly, array normalization was performed using the fields package (Fields Development Team; http://www.image.ucar.edu/Software/Fields/) using the statistical programming language R (R Development Core Team, 2006). Description of the array platform with probe information and hybridization data of 39 L. lactis strains have been deposited in the Gene Expression Omnibus database (http://www.ncbi.nlm.nih.gov/geo) with the Accession No. GPL7231. The annotations (gene definitions and putative protein function descriptions) were extracted from the GenBank files for publicly available sequences; for the draft sequences Glimmer (Salzberg ) and InterProScan (Zdobnov and Apweiler, 2001) were used. 
For selected genes the annotation was improved using the ERGO Bioinformatics Suite (Overbeek ).
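The probe tiling described above can be illustrated with a minimal sketch. It assumes that the 'sliding window of 19 nucleotides' means that consecutive 32 bp probes start 19 nucleotides apart, so that neighbouring probes overlap by 13 nucleotides; this reading, like the function name `tile_probes`, is an assumption and not a description of the actual array design software.

```python
def tile_probes(sequence, probe_len=32, step=19):
    """Return (start, probe) pairs tiled along one sequence.

    probe_len and step follow the values quoted in the text; treating
    'step' as the distance between probe starts is an assumption.
    """
    probes = []
    # Stop at the last position where a full-length probe still fits.
    for start in range(0, len(sequence) - probe_len + 1, step):
        probes.append((start, sequence[start:start + probe_len]))
    return probes


# Hypothetical 100-nt fragment used only for illustration.
fragment = "ACGT" * 25
for start, probe in tile_probes(fragment):
    print(start, probe)
```

Sequences shorter than one probe length yield no probes in this sketch; how such short fragments were handled on the real array is not stated in the text.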

Defining orthologous groups of genes (OGs)

During the course of our work, the complete genome sequences of L. lactis ssp. cremoris strains SK11 and MG1363 and of L. lactis ssp. lactis strain KF147 were published (Makarova ; Wegmann ; Siezen ), and we re‐mapped the microarray probes to the annotated genes in these genomes. In order to predict orthology among genes, the chromosome sequences of the four fully sequenced public L. lactis strains (ssp. lactis IL1403, ssp. lactis KF147, ssp. cremoris SK11, ssp. cremoris MG1363) were used. The orthology prediction program InParanoid (Remm ) was run to find orthologous genes among these genomes. InParanoid's default minimum bit score value of 50 and a minimum identity value of 80 were used for grouping genes into OGs. All possible pairwise comparisons between the genes of the four chromosomes were performed and iteratively combined into groups of chromosomal orthologous genes (chrOGs). In cases where inconsistencies were found between the InParanoid predictions (i.e. homologous genes from the four reference genomes were not all bidirectional best hits to each other), genes were regarded as not being orthologous and each was treated as a single gene in an orthologous group of size 1. The genes from plasmids were not categorized into OGs, but were studied as single genes (828 genes). We compared our chrOGs with the complete annotated list of LaCOGs available at ftp://ftp.ncbi.nih.gov/pub/wolf/lacto (file LaCOGS_table.xls) (Makarova ).
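The combination of pairwise orthologue predictions into chrOGs can be sketched as follows. This is one possible reading of the procedure, not the actual implementation: pairwise predictions are merged into connected groups, and a group is kept only if it holds at most one gene per genome and every pair of its members was predicted as orthologous; otherwise each member becomes a group of size 1. The function name `build_ogs` and the gene identifiers are hypothetical.

```python
from itertools import combinations


def build_ogs(genes, ortholog_pairs):
    """genes: {gene_id: genome}; ortholog_pairs: iterable of (gene_a, gene_b)."""
    pair_list = [tuple(p) for p in ortholog_pairs]
    pairs = {frozenset(p) for p in pair_list}
    neighbours = {g: set() for g in genes}
    for a, b in pair_list:
        neighbours[a].add(b)
        neighbours[b].add(a)

    groups, seen = [], set()
    for gene in sorted(genes):
        if gene in seen:
            continue
        # Collect all genes linked directly or indirectly to this gene.
        component, stack = set(), [gene]
        while stack:
            g = stack.pop()
            if g not in component:
                component.add(g)
                stack.extend(neighbours[g] - component)
        seen |= component
        genomes = [genes[g] for g in component]
        # Assumed consistency rule: one gene per genome, all pairs predicted.
        consistent = len(set(genomes)) == len(genomes) and all(
            frozenset(p) in pairs for p in combinations(component, 2)
        )
        if consistent:
            groups.append(sorted(component))
        else:
            groups.extend([g] for g in sorted(component))
    return groups


# Hypothetical genes: one consistent group and one inconsistent group.
genes = {"IL_1": "IL1403", "KF_1": "KF147", "SK_1": "SK11", "MG_1": "MG1363",
         "IL_2": "IL1403", "KF_2": "KF147", "SK_2": "SK11"}
pairs = list(combinations(["IL_1", "KF_1", "SK_1", "MG_1"], 2))
pairs += [("IL_2", "KF_2"), ("KF_2", "SK_2")]  # IL_2/SK_2 prediction missing
print(build_ogs(genes, pairs))
```

In this example the first four genes form one chrOG, whereas the second set is split into three groups of size 1 because one of the pairwise predictions is missing.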

Determination of gene conservation in the strains

A novel genotype‐calling algorithm PanCGH was developed to determine the presence/absence of orthologous groups of genes in strains with unknown genome sequence (Bayjanov ). Briefly, a threshold score of 5.5 was defined based on presence/absence of orthologous groups in the four sequenced strains. This score was then used in the genotype‐calling algorithm applied to normalized hybridization signals of DNA from query strains. Thus, presence/absence of genes was determined on the basis of signal intensities and orthologue distribution. Applying the PanCGH algorithm to the CGH data results in a binary matrix, in which the rows represent the chrOGs and the columns the different strains. For each strain, a ‘1’ denotes the presence of an orthologue in the strain and ‘0’ denotes the absence of an orthologue. ‘NA’ signifies that presence or absence of an orthologue in a strain could not be estimated from the data due to too few valid probe signals of the chrOG members. The PanCGH algorithm assumes a minimum of 10 aligned probes, and hence CGH signal data for 622 chrOGs were not considered, as these genes were represented by fewer than 10 probes on the array. The hybridization results for these chrOGs were excluded from further data analysis.

Presence or absence of plasmid‐encoded genes was analysed separately. Probes for all published plasmids of L. lactis (Table S5) were also present on the array. PanCGH was used to predict presence/absence in query strains of the known plasmid‐encoded genes from their hybridization signals. Genes that are known to be plasmid‐ and chromosome‐encoded were not included in this analysis of putative plasmid genes, e.g. genes encoding transposases, integrases/recombinases, restriction/modification (R/M) system (hsdM, hsdR, hsdS), proteolytic system (pcp, pepO, pepF, oppACBFD), cold shock proteins and all plasmid‐encoded genes that hybridized with the plasmid‐free strains IL1403 or MG1363.
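The shape of the resulting presence/absence matrix can be illustrated with a minimal sketch. It does not reproduce PanCGH itself: how the per‐chrOG score is derived from probe signals is described in Bayjanov and colleagues and is not shown here. The sketch assumes that a score at or above the threshold of 5.5 means 'present' and that chrOGs with fewer than 10 probes are reported as 'NA' (here `None`); the function names and the example scores are hypothetical.

```python
THRESHOLD = 5.5   # score cut-off quoted in the text
MIN_PROBES = 10   # minimum number of aligned probes quoted in the text


def call_presence(score, n_probes, threshold=THRESHOLD, min_probes=MIN_PROBES):
    """Return 1 (present), 0 (absent) or None (the 'NA' of the text)."""
    if score is None or n_probes < min_probes:
        return None
    # Assumption: a score at or above the threshold means 'present'.
    return 1 if score >= threshold else 0


def presence_matrix(scores, probe_counts):
    """scores: {chrOG: {strain: score}}; probe_counts: {chrOG: n_probes}."""
    return {
        og: {strain: call_presence(s, probe_counts[og]) for strain, s in row.items()}
        for og, row in scores.items()
    }


# Hypothetical scores for two chrOGs in two strains.
scores = {"chrOG_1": {"KF147": 7.2, "SK11": 3.1},
          "chrOG_2": {"KF147": 6.0, "SK11": 6.4}}
probe_counts = {"chrOG_1": 25, "chrOG_2": 6}
print(presence_matrix(scores, probe_counts))
```

Rows of the returned dictionary correspond to chrOGs and the inner keys to strains, mirroring the matrix layout described above.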

Hierarchical clustering of strains

To study the evolutionary relatedness and differences in genes and gene clusters that could have contributed to L. lactis strain diversification, a hierarchical clustering was performed by comparing the presence/absence profiles of chrOGs of the different strains to each other. Of the original 3877 chrOGs, the 622 chrOGs containing ‘NA’ values were omitted from this clustering (see above). A tree was constructed using the statistical programming language R, with the average linkage clustering method based on the binary distance metric.
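As an illustration of this clustering step, the following Python sketch (the analysis itself was done in R) implements the binary distance as defined for R's `dist(method = "binary")`, i.e. the proportion of positions in which only one profile is 1 among positions in which at least one is 1, followed by a naive average linkage agglomeration. The function names and the nested‐tuple tree representation are assumptions of this sketch, and returning 0.0 for two all‐zero profiles is a choice made here.

```python
def binary_distance(a, b):
    """Fraction of positions where exactly one profile is 1,
    among positions where at least one profile is 1."""
    either = sum(1 for x, y in zip(a, b) if x or y)
    differ = sum(1 for x, y in zip(a, b) if bool(x) != bool(y))
    return differ / either if either else 0.0

def average_linkage(profiles):
    """Naive average linkage clustering of presence/absence profiles.

    profiles: dict strain -> list of 0/1 values (chrOGs with 'NA' removed)
    Returns the tree as nested 2-tuples with strain names as leaves.
    """
    # Each cluster is keyed by its subtree and stores its member strains.
    clusters = {name: [name] for name in profiles}
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                a, b = keys[i], keys[j]
                # Average linkage: mean distance over all member pairs.
                total = sum(
                    binary_distance(profiles[x], profiles[y])
                    for x in clusters[a] for y in clusters[b]
                )
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # Merge the two closest clusters into a new internal node.
        clusters[(a, b)] = clusters.pop(a) + clusters.pop(b)
    return next(iter(clusters))
```

The nested tuples returned here carry only the topology; branch heights, which a dendrogram plot would need, are not kept in this sketch.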

Determining gene clusters contributing to strain diversification

By combining the tree and the presence/absence profiles (‘NA’ values were again omitted), genes were identified that might be important for the diversification of the strains. Since plasmid genes are frequently exchanged between bacteria, these genes were not considered in this analysis. A Perl script was developed that identifies features (chrOGs) that cause a clear separation between branches in a tree, encoded in the Newick format. The script parses the tree according to the depth‐first search principle, in which the tree is traversed from the root to each leaf. At each split in the tree the presence/absence patterns of the strains in the two branches are evaluated. For each chrOG the fraction of presence in the two sub‐branches is calculated and only those chrOGs with a difference in presence of more than 70% are selected. This allows identification of chrOGs that are (almost) fully absent in one branch and (almost) fully present in the other. From this analysis a list of chrOGs that are important for each split in the tree was obtained. This list was used to identify gene clusters in the strains, which were projected on the chromosomes of the four reference genomes: MG1363, IL1403, SK11 and KF147. Gene clusters can be (parts of) operons or functional groups of genes, involved in a certain trait. Per split in the tree, the genes of the reference genomes constituting a chrOG were retrieved. For these genes the locations in the respective genome were retrieved and groups of adjacent genes were identified. Furthermore, an operon prediction was performed for the chromosomes of the four reference strains using the Operon web‐tool of the Molecular Genetics group of the University of Groningen (http://bioinformatics.biol.rug.nl/websoftware/operon/). The default settings were used for the predictions (maximum spacing between ORFs of 100 bp and maximum energy/deltaG of 0).
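The split evaluation can be sketched in Python as follows (the script described above was written in Perl). This sketch assumes the Newick tree has already been parsed into nested 2‐tuples with strain names as leaves, and that the matrix contains only 0/1 values; the function names are hypothetical. The 70% criterion is taken from the text.

```python
def leaves(node):
    """Return the strain names below a node of the nested-tuple tree."""
    if isinstance(node, str):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def presence_fraction(profile, strains):
    """Fraction of the given strains in which the chrOG is present."""
    return sum(profile[s] for s in strains) / len(strains)

def discriminating_chrogs(tree, matrix, min_diff=0.7):
    """Depth-first traversal listing, per split, the chrOGs whose presence
    fractions in the two sub-branches differ by more than min_diff.

    tree: nested 2-tuples with strain names (str) as leaves
    matrix: dict chrOG -> dict strain -> 0 or 1
    Returns a list of (left_strains, right_strains, selected_chrOGs).
    """
    results = []
    stack = [tree]
    while stack:
        node = stack.pop()
        if isinstance(node, str):
            continue  # a leaf has no split to evaluate
        left, right = node
        left_strains, right_strains = leaves(left), leaves(right)
        selected = sorted(
            og for og, profile in matrix.items()
            if abs(presence_fraction(profile, left_strains)
                   - presence_fraction(profile, right_strains)) > min_diff
        )
        results.append((left_strains, right_strains, selected))
        # Push right first so the left subtree is visited first.
        stack.extend([right, left])
    return results
```

Splits between two single strains are also reported by this sketch; in practice such terminal splits may be of little interest and could be filtered out.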

Identifying subspecies‐specific or niche‐specific OGs

Strains were divided into two categories according to their subspecies or niche assignment. We used a hypergeometric test in order to find OGs that are mostly present in one category of strains (e.g. in ssp. lactis strains) but almost absent in all strains of the other category (e.g. ssp. cremoris strains). The resulting P‐values were corrected for false discovery rate and only OGs that have a P‐value below 0.05 were considered to be specific.
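A minimal Python sketch of such a test is given below. The text does not state which tail of the hypergeometric distribution or which false discovery rate procedure was used, so the upper‐tail probability and the Benjamini-Hochberg adjustment shown here are assumptions, as are the function names.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for a hypergeometric distribution.

    N: total number of strains
    K: number of strains in the category of interest (e.g. ssp. lactis)
    n: number of strains in which the OG is present
    k: number of strains in the category in which the OG is present
    """
    total = comb(N, n)
    return sum(
        comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)
    ) / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P-value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, with 4 strains in each of two categories, an OG present in exactly the 4 strains of one category gives an upper‐tail probability of 1/70, since only one of the 70 ways to choose 4 strains out of 8 matches that category.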

Plasmid gel electrophoresis

Isolation of plasmid DNA was performed as previously described (de Vos ). Standard SDS‐polyacrylamide gel electrophoresis was performed as described by Sambrook and colleagues (1989). Southern hybridization was performed using probes designed to detect the typical plasmid‐located genes citP (encoding citrate permease for citrate uptake), lacG (encoding 6‐P‐β‐galactosidase carried on the lactose plasmid) and prtP (encoding cell‐wall proteinase).
  59 in total

1.  InterProScan--an integration platform for the signature-recognition methods in InterPro.

Authors:  E M Zdobnov; R Apweiler
Journal:  Bioinformatics       Date:  2001-09       Impact factor: 6.937

Review 2.  Genome plasticity in Lactococcus lactis.

Authors:  Nathalie Campo; Miguel J Dias; Marie-Line Daveran-Mingot; Paul Ritzenthaler; Pascal Le Bourgeois
Journal:  Antonie Van Leeuwenhoek       Date:  2002-08       Impact factor: 2.271

3.  Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.

Authors:  M Remm; C E Storm; E L Sonnhammer
Journal:  J Mol Biol       Date:  2001-12-14       Impact factor: 5.469

Review 4.  Distinctive features of homologous recombination in an 'old' microorganism, Lactococcus lactis.

Authors:  A Quiberoni; L Rezaïki; M El Karoui; I Biswas; P Tailliez; A Gruss
Journal:  Res Microbiol       Date:  2001-03       Impact factor: 3.992

5.  Inactivation of the glutamate decarboxylase gene in Lactococcus lactis subsp. cremoris.

Authors:  M Nomura; M Kobayashi; S Ohmomo; T Okamoto
Journal:  Appl Environ Microbiol       Date:  2000-05       Impact factor: 4.792

6.  Tolerance to high osmolality of Lactococcus lactis subsp. lactis and cremoris is related to the activity of a betaine transport system.

Authors:  D Obis; A Guillot; M Y Mistou
Journal:  FEMS Microbiol Lett       Date:  2001-08-07       Impact factor: 2.742

7.  The complete genome sequence of the lactic acid bacterium Lactococcus lactis ssp. lactis IL1403.

Authors:  A Bolotin; P Wincker; S Mauger; O Jaillon; K Malarme; J Weissenbach; S D Ehrlich; A Sorokin
Journal:  Genome Res       Date:  2001-05       Impact factor: 9.043

8.  Novel sucrose transposons from plant strains of Lactococcus lactis.

Authors:  W J Kelly; G P Davey; L J Ward
Journal:  FEMS Microbiol Lett       Date:  2000-09-15       Impact factor: 2.742

9.  Rapid PCR-based method which can determine both phenotype and genotype of Lactococcus lactis subspecies.

Authors:  Masaru Nomura; Miho Kobayashi; Takashi Okamoto
Journal:  Appl Environ Microbiol       Date:  2002-05       Impact factor: 4.792

10.  Streptococcus thermophilus core genome: comparative genome hybridization study of 47 strains.

Authors:  Thomas Bovbjerg Rasmussen; Morten Danielsen; Ondrej Valina; Christel Garrigues; Eric Johansen; Martin Bastian Pedersen
Journal:  Appl Environ Microbiol       Date:  2008-06-06       Impact factor: 4.792
