Seungwoo Son1, Raham Lee2, Seung-Moon Park3, Sung Ho Lee4, Hak-Kyo Lee1,2, Yangseon Kim5, Donghyun Shin1. 1. Department of Agricultural Convergence Technology, Jeonbuk National University, Jeonju 54896, Korea. 2. Department of Animal Biotechnology, College of Agricultural and Life Sciences, Jeonbuk National University, Jeonju 54896, Korea. 3. Department of Bioenvironmental Chemistry, College of Agriculture and Life Sciences, Jeonbuk National University, Jeonju 54896, Korea. 4. Woogene B&G, Hwaseong 18630, Korea. 5. Center for Industrialization of Agriculture and Livestock Microorganism, Jeongeup 56212, Korea.
Lactobacillus acidophilus is a gram-positive, homofermentative, and
microaerophilic bacterium. It ferments sugars into lactic acid, grows readily at
acidic pH (below 5.0), and is common in the gastrointestinal (GI) tract of humans and
other animals [1]. L.
acidophilus is used in the production of fermented food and dairy
products and is a symbiont in humans and animals. L. acidophilus
strains are commercially used in many dairy products for the production of yogurt,
health foods, and several medicines [2]. Some
strains of L. acidophilus have probiotic characteristics. Several
studies have shown that L. acidophilus exhibits various probiotic
effects in humans and animals and helps in lowering cholesterol levels, preventing
and treating diarrhea, modulating the immune system, and suppressing cancer [3,4].

Recently, there has been an increasing interest in developing methods to modulate the
animal intestinal microbiota to improve health. The GI tract of healthy dogs
contains Lactobacillus species, such as L.
acidophilus [5]. Even after
considering inter-individual variations, L. acidophilus is
established in the gut of dogs soon after birth, similar to humans and other
mammals. As it grows, it reaches compositional stability, with its principal
activity being inhibition of undesirable microorganism proliferation [6]. Additionally, commensal gut bacteria
positively interact with the host immune system, producing a wide range of
metabolites crucial for host physiology. The gut bacteria depend on their host for
nutrients and to maintain a stable ecosystem [7]. Therefore, the phylogenetic differentiation of L.
acidophilus strains reflects their co-evolution with their vertebrate
hosts. Previous studies have characterized the phylogenetic and genetic features of
L. acidophilus strains isolated from diverse hosts using their
genomic data [8, 9]. To date, dozens of strains isolated from several hosts have
been studied and published in the National Center for Biotechnology Information
(NCBI) database. However, the phylogenetic and genetic features of L.
acidophilus strains isolated from canine intestines remain unknown.

Recently, with the development of long-read and high-throughput DNA sequencing
technologies, whole-genome studies have become increasingly feasible and
affordable, making the genomic data of diverse organisms publicly available.
Based on these data, comparative genomic analysis of strains within the same species
has provided insights into modified, acquired, or lost genetic features
closely related to evolution and adaptation to specific environments
[10]. In this study, we sequenced a
canine-derived L. acidophilus strain (C5) and performed comparative
genomic analysis (pan-genome analysis and analysis of the ratio of non-synonymous to synonymous
substitutions [dN/dS]) to profile its genetic characteristics.
MATERIALS AND METHODS
L. acidophilus C5 isolation
L. acidophilus C5 was isolated from dog feces (Shih Tzu, male)
in Korea by Woogene B&G and was kindly provided for research purposes.
L. acidophilus C5 was cultivated on modified de Man, Rogosa
and Sharpe (MRS) medium (mMRS, with 0.05% cysteine-HCl) in an anaerobic atmosphere (5%
hydrogen, 5% carbon dioxide, and 90% nitrogen) for 48 h at 37°C.
Genome sequencing, annotation, and average nucleotide identity (ANI)
calculation
DNA was extracted using DNeasy UltraClean Microbial Kit (Qiagen, Hilden, Germany)
following the manufacturer’s instructions. Whole-genome shotgun
sequencing of the L. acidophilus C5 strain was performed using
the PacBio SMRT and Illumina HiSeq sequencing technologies. For genome assembly,
we applied the recently described Unicycler (v0.4.6) [11], and the final assembly was 2.005 Mb in one contig
(Fig. 1). The PreAssembly step mapped
single-pass reads to seed reads, the longest portion of the read length
distribution. Subsequently, a consensus sequence of the mapped reads was
generated, resulting in long and highly accurate fragments of the target genome.
The next step was to correct and filter the reads. Reads that were fully
contained in other reads did not provide extra information for constructing the
genome, so they were filtered. Reads with an unsuitable extent of overlap were
also filtered. Next, we constructed contigs of the L.
acidophilus C5 strain. After de-novo assembly, we mapped the
Illumina HiSeq reads to the first assembled genome sequence. We observed slight
differences between the mapping result and the assembly result, and we used this
information to generate a consensus sequence of higher quality through a
self-mapping step. Previously published genome sequences of six strains of
L. acidophilus isolated from diverse sources (yogurt, DuPont
nutrition, and humans) were acquired from NCBI and compared with the
canine-derived C5 strain. All genome sequences of L. acidophilus
strains were annotated using Prokka (v1.12b) [12] and EggNOG-mapper (v5.0) [13]. The protein-coding sequences were categorized based on the
Clusters of Orthologous Groups (COG) database from the Prokka results (Fig. 2). For comparative genomic analysis in
this study, we selected strains with complete genome information in the NCBI
database. To evaluate the genetic relationship between L.
acidophilus C5 and the other strains, the ANI was calculated using
the JSpecies web server [14]. All
information on L. acidophilus C5 and the other six strains used
in this study is presented in Table 1 and
Supplementary Table S1.
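The read-filtering step described above (discarding reads fully contained in other reads) can be illustrated with a minimal sketch. It assumes each read has already been placed on a common coordinate system as a half-open interval; the `Interval` type, the function name `drop_contained`, and the example coordinates are hypothetical and are not taken from the assembler used in this study.

```python
from typing import List, Tuple

# Hypothetical representation: (read_id, start, end), with end exclusive.
Interval = Tuple[str, int, int]


def drop_contained(reads: List[Interval]) -> List[Interval]:
    """Remove reads whose interval lies fully inside another read's interval."""
    # Sort by start (ascending) and then by end (descending), so that any
    # read able to contain another one is visited before the read it contains.
    ordered = sorted(reads, key=lambda r: (r[1], -r[2]))
    kept: List[Interval] = []
    max_end = -1  # furthest end coordinate among reads seen so far
    for read in ordered:
        _, _start, end = read
        if end <= max_end:
            # An earlier read starts at or before this one and ends at or
            # after it, so this read adds no new sequence information.
            continue
        kept.append(read)
        max_end = end
    return kept


# Illustrative (made-up) read placements.
example_reads = [
    ("r1", 0, 100),
    ("r2", 10, 50),    # inside r1
    ("r3", 40, 150),
    ("r4", 40, 120),   # inside r3
    ("r5", 160, 200),
]
print([read_id for read_id, _, _ in drop_contained(example_reads)])
```

Real assemblers work on pairwise overlaps rather than known coordinates, so this sketch only conveys the containment idea, not the actual implementation.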
Fig. 1.
Genome map of Lactobacillus acidophilus C5.
Circular map was drawn using the genome annotation result. Marked
characteristics are shown from outside to the center; CDS on forward
strand, CDS on reverse strand, tRNA, rRNA, GC content, and GC skew. CDS,
coding region; G, guanine; C, cytosine.
Fig. 2.
Functional categorization using EggNOG annotation (COG database) of
all predicted CDSs of Lactobacillus acidophilus
C5.
COG, clusters of orthologous groups; CDS, coding region.
Table 1.
Comparison of the chromosomal properties of the seven
Lactobacillus acidophilus strains

| Strain | C5 | NCFM | La14 | FSI4 | ATCC53544 | LA1 | DSM20079 |
|---|---|---|---|---|---|---|---|
| Genome size (bp) | 2,005,383 | 1,993,560 | 1,991,579 | 1,991,969 | 1,991,906 | 1,991,195 | 2,009,973 |
| Genes | 2,082 | 1,963 | 1,978 | 1,977 | 1,977 | 2,002 | 2,020 |
| Coding genes | 2,009 | 1,832 | 1,862 | 1,868 | 1,852 | 1,886 | 1,840 |
| Non-coding genes | 73 | 131 | 116 | 109 | 125 | 116 | 180 |
| ANI (average nucleotide identity) | 100.00% | 99.28% | 99.28% | 99.29% | 99.28% | 99.28% | 99.24% |
Pan-genome analysis
The genome sequences of seven L. acidophilus strains, including
the C5 strain, were first annotated using Prokka (v1.12b) [12] to obtain GFF files, which were used to perform
pan-genome analysis. The core- and pan-genomes were calculated using Roary, a
rapid standalone pan-genomic pipeline [15] and a pair of genes was defined as belonging to the same gene family
when the identity value of their amino acid sequences was > 95% (Fig. 3). The COG annotation (from
EggNOG-mapper result) of core genes and the C5 strain is shown in Fig. 4. A phylogenetic tree was constructed
based on the core genes (Fig. 5).
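The core, accessory, and unique categories used in the pan-genome analysis can be sketched as a simple count over a gene-family presence/absence mapping. This is only an illustration of the definitions (core = present in all seven strains, accessory = present in 2 to 6, unique = present in 1); the family names and the `classify_families` helper are hypothetical, and the sketch does not reproduce Roary's clustering.

```python
from collections import Counter
from typing import Dict, Set


def classify_families(presence: Dict[str, Set[str]], n_strains: int) -> Dict[str, str]:
    """Label each gene family as 'core', 'accessory', or 'unique'."""
    labels: Dict[str, str] = {}
    for family, strains in presence.items():
        k = len(strains)
        if k == 0 or k > n_strains:
            raise ValueError(f"family {family!r} has an invalid strain count: {k}")
        if k == n_strains:
            labels[family] = "core"        # present in every strain
        elif k == 1:
            labels[family] = "unique"      # strain-specific
        else:
            labels[family] = "accessory"   # present in 2 .. n_strains-1 strains
    return labels


STRAINS = ["C5", "NCFM", "La14", "FSI4", "ATCC53544", "LA1", "DSM20079"]

# Hypothetical gene families, for illustration only.
toy_presence = {
    "family_A": set(STRAINS),            # found in all seven strains
    "family_B": {"C5"},                  # found only in C5
    "family_C": {"NCFM", "La14", "LA1"}, # found in three strains
}

labels = classify_families(toy_presence, n_strains=len(STRAINS))
print(Counter(labels.values()))
```

Applied to a full presence/absence table, the three counts should sum to the total number of gene families in the pan-genome.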
Fig. 3.
Pan-genome analysis using genome sequences of seven
Lactobacillus acidophilus strains.
The plot represents the core genes, accessory genes, and unique genes of
the seven analyzed genomes (core genes = present in all seven strains, accessory
genes = present in 2–6 strains, and unique genes = present in 1 strain).
Fig. 4.
Core and C5 genes of Lactobacillus acidophilus using
the EggNOG annotation result (COG database).
Blue indicates core genes (genes in all strains) and light blue indicates genes found only in C5.
The X-axis shows the functional categories of the COG
database. The abbreviations are as follows. D, cell cycle
control, cell division, chromosome partitioning; M, cell
wall/membrane/envelope biogenesis; N, cell motility; O,
post-translational modification, protein turnover, and chaperones; T,
signal transduction mechanisms; U, intracellular trafficking, secretion,
and vesicular transport; V, defense mechanisms; J, translation,
ribosomal structure, and biogenesis; K, transcription; L, replication,
recombination, and repair; C, energy production and conversion; E, amino
acid transport and metabolism; F, nucleotide transport and metabolism;
G, carbohydrate transport and metabolism; H, coenzyme transport and
metabolism; I, lipid transport and metabolism; P, inorganic ion
transport and metabolism; Q, secondary metabolites biosynthesis,
transport, and catabolism; S, function unknown; COG, clusters of
orthologous groups.
Fig. 5.
Phylogenetic analyses (C5 and six other complete genome sequences of
Lactobacillus acidophilus) using the core genes
from pan-genome analysis.
Comparing C5 strain to the other strains using dN/dS analysis
For comparative genome analysis using the dN/dS method, we used OrthoFinder
(v1.1.10) [16] to determine orthologous genes among the
seven genomes and PRANK [17] to perform multiple sequence alignment of each orthologous gene.
These protein alignments were converted into the corresponding codon-based nucleotide
alignments using PAL2NAL [18], and poorly
aligned regions were eliminated using Gblocks [19]. After all the filtering steps, a total of 1,843
orthologous groups remained. Phylogenetic Analysis by Maximum Likelihood (PAML4) [20]
was used to estimate dS and dN. Phylogenetically featured genes were
investigated using the branch-site models.
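To make the distinction between synonymous and non-synonymous changes concrete, the following minimal sketch counts codon differences of each kind between two aligned coding sequences. It is a raw count under simplifying assumptions (standard codon-to-amino-acid mapping, pairwise comparison, no normalization by the number of synonymous and non-synonymous sites) and is not the maximum likelihood estimation performed by PAML; the function name and example sequences are hypothetical.

```python
from itertools import product
from typing import Tuple

# Standard codon-to-amino-acid mapping, listed in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_codon_differences(seq_a: str, seq_b: str) -> Tuple[int, int]:
    """Return (synonymous, non_synonymous) counts of differing codons.

    Both sequences must come from the same codon alignment (equal length,
    a multiple of three). Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 in length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguous base
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


# Made-up example: CTT -> CTC keeps leucine, GAA -> GAT changes Glu to Asp.
print(count_codon_differences("ATGCTTGAAGGT", "ATGCTCGATGGT"))
```

A proper dN/dS estimate additionally corrects for the number of available sites of each type and for multiple substitutions, which is what codon-model software handles.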
RESULTS
General genomic characteristics of the Lactobacillus
acidophilus C5 strain
The total genome size of the L. acidophilus C5 strain in this
study was 2.005 Mb, and the guanine + cytosine (G + C) content was 34.5% (after
fitting). Additionally, genome annotation using Prokka (v1.12b) [12] and EggNOG-mapper [13] showed that the sequenced genome
consisted of 2,009 coding genes and 73 non-coding genes (61 tRNA and 12 rRNA
genes) (Fig. 1 and Table 1). In the functional analysis using the COG database
in EggNOG-mapper, the largest protein-coding categories (except “General
function prediction only” and “Function unknown”) in
L. acidophilus C5 strain were “Carbohydrate
transport and metabolism (G)” (9.01%), “Translation, ribosomal
structure and biogenesis (J)” (7.32%) and “Replication,
recombination, and repair (L)” (7.12%) (Fig. 2).
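The G + C content reported above and the GC skew track drawn in the genome map (Fig. 1) follow simple definitions: GC content is (G + C) / (A + C + G + T), and GC skew is (G − C) / (G + C), usually computed in windows along the genome. The sketch below illustrates both; the function names, the non-overlapping window scheme, and the toy sequence are assumptions made for the example and do not reflect the plotting tool used for the figure.

```python
from typing import List


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    total = g + c + seq.count("A") + seq.count("T")
    return (g + c) / total if total else 0.0


def gc_skew_windows(seq: str, window: int) -> List[float]:
    """GC skew, (G - C) / (G + C), for consecutive non-overlapping windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    seq = seq.upper()
    skews: List[float] = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


# Toy sequence: the first half is G-rich, the second half is C-rich.
toy = "GGGCATAT" + "CCCGATAT"
print(gc_content(toy))
print(gc_skew_windows(toy, window=8))
```

In bacterial chromosomes, a change in the sign of the skew is commonly used as a hint for the replication origin and terminus, which is why the track is included in circular genome maps.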
Pan-genome analysis of Lactobacillus acidophilus
strains
The core- and pan-genomes of the seven L. acidophilus strains,
including the C5 strain, were analyzed using a comparative genomics method. The
pan-genome of the seven L. acidophilus strains contained 2,254
gene families, and the core genome contained 1,726 gene families, indicating
that together, the seven genomes were sufficient to represent the core genome of
L. acidophilus. Moreover, the seven L.
acidophilus genomes contained 200 accessory gene families (six
isolates: 126 gene families, five isolates: 27 gene families, four isolates: 5
gene families, three isolates: 10 gene families, and two isolates: 32 gene
families), and 328 strain-specific genes (Fig.
3). To determine the functions of the 1,726 core genes, we extracted
the sequences of the core genes to map to the COG database. The results showed
that except for the “General function prediction only” and
“Function unknown” categories, the largest proportion of core
genes belonged to “Carbohydrate transport and metabolism (G, 154
genes)” followed by “Translation, ribosomal structure and
biogenesis (J, 135 genes)”, and “Amino acid transport and
metabolism (E, 113 genes)” (Fig.
4). Using a phylogenetic tree based on core genes, we found that the
L. acidophilus C5 strain was clearly distinguished from the
other six strains (Fig. 5). The strains
closest to the L. acidophilus C5 strain were the human-derived
strains (DSM20079 and NCFM). The C5 strain had the highest number of unique
genes (245 genes) among the seven L. acidophilus strains. Among
the unique genes of the C5 strain, the largest proportion belonged to
“Replication, recombination, and repair (L) (28 genes)” and
“Carbohydrate transport and metabolism (G) (24 genes)” apart from
the “Function unknown” category (Fig. 4).
Comparison of Lactobacillus acidophilus C5 strain to the six
strains using evolutionary genomic analysis (dN/dS)
We performed dN/dS analysis (branch-site model) to identify the evolutionarily
selective genes in the L. acidophilus C5 strain. Considering
the phylogenetic relationships among the seven L. acidophilus
strains, we searched for the genes that could explain the specific
characteristics of the L. acidophilus C5 strain. We identified
1,843 orthologous genes from the seven strains and measured the rate of
evolution using the dN/dS analysis. We identified 30 phylogenetically
featured genes and the variations in their amino acid sequences (Supplementary
Table S2). To determine the functions of the 30 evolutionarily selective genes,
we mapped their sequences to the COG database (Fig. 6). We observed that apart from the “Function
unknown” category, the largest proportion of these genes belonged to the
“Carbohydrate transport and metabolism (G)” and
“Transcription (K)” categories. The carbohydrate transport and
metabolism (G) category included five genes: C5_1_01133 (glcU_1: glucose uptake
protein GlcU), C5_1_00898 (fba: fructose-bisphosphate aldolase), C5_1_01372
(hypothetical protein), C5_1_01889 (ptsI: phosphoenolpyruvate-protein
phosphotransferase), and C5_1_00253 (malL_2: oligo-1,6-glucosidase).
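The text does not state how significance was assessed for the branch-site model. A common approach with such models is a likelihood ratio test (LRT) that compares the log-likelihood of a null model with that of the alternative model allowing positive selection on the foreground branch, referring twice the difference to a chi-square distribution. The sketch below shows that calculation for one degree of freedom, purely as an assumption about a typical workflow; the function name and the log-likelihood values are hypothetical and are not results from this study.

```python
import math


def lrt_pvalue_df1(lnl_null: float, lnl_alt: float) -> float:
    """P-value of a likelihood ratio test against chi-square with 1 d.f.

    The statistic is 2 * (lnL_alt - lnL_null). For one degree of freedom the
    chi-square survival function equals erfc(sqrt(x / 2)).
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    if statistic <= 0.0:
        return 1.0  # the alternative model does not fit better
    return math.erfc(math.sqrt(statistic / 2.0))


# Hypothetical log-likelihoods for a single orthologous group.
lnl_null_example = -2450.0
lnl_alt_example = -2445.0
p_value = lrt_pvalue_df1(lnl_null_example, lnl_alt_example)
print(f"2*delta lnL = {2 * (lnl_alt_example - lnl_null_example):.2f}, p = {p_value:.4g}")
```

When many orthologous groups are tested, as with the 1,843 groups here, a multiple-testing correction would normally be applied to the resulting p-values.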
Fig. 6.
Functional categorization using EggNOG annotation (COG database) of
significant evolutionarily accelerated genes of Lactobacillus
acidophilus C5 identified in the dN/dS analysis (the
branch-site model).
COG, clusters of orthologous groups; dN/dS, the ratio of non-synonymous
to synonymous substitution.
DISCUSSION
This study is the first attempt to decipher the genetic features and evolutionary
adaptations of L. acidophilus in the canine intestine. We
performed whole-genome sequencing to construct the genome of the L.
acidophilus C5 strain and compared the genomic information of
L. acidophilus derived from a dog with six other strains
isolated from diverse hosts. We determined the genetic basis for the characteristics
of the L. acidophilus C5 strain that are likely related to the
host. After the sequencing and assembly process, we were able to construct one large
contig corresponding to the genome of the C5 strain, comparable in size (2.005 Mb)
with the other six complete genomes of L. acidophilus (NCFM, La14,
FSI4, ATCC53544, LA1, and DSM20079) from NCBI (Table
1) [21-26]. In this study, as in previous studies [27,28], complete genome sequences
of other strains in the NCBI database were used for comparative genome analysis of
C5; these strains were isolated from different hosts, and none was isolated from
dogs. In the annotation process, we found 2,082 genes (2,009
coding and 73 non-coding genes), which was slightly more than the number of genes
found in the other six strains. Similarly, the genome size of L.
acidophilus C5 was slightly larger than that of the other strains. In
the functional analysis using the COG database, it was found that genes related to
carbohydrate transport and metabolism (181 genes) comprised the largest part of the
L. acidophilus C5 strain genome, apart from “General
function prediction only” and “Function unknown” categories. In
other strains too, the carbohydrate metabolism-related genes comprised a large
portion of the genome (Fig. 4), but C5 was the
only strain in which they were most abundant among the categories with distinct
functions (except for “General function prediction only” and
“Function unknown” categories).

In pan-genome analysis, a total of 1,726 core genes were detected in the seven
L. acidophilus genomes isolated from four hosts, which mainly
encoded essential proteins for metabolism (30.35%) (Fig. 4). Consistent with the findings of studies on other Lactobacillus
strains, our findings suggested that core genes are indispensable, constitute the
basic framework of the L. acidophilus genome, and play important roles in
carbohydrate transport and metabolism (154 genes) and translation, ribosomal
structure, and biogenesis (134 genes). Glycogen biosynthesis, a part of carbohydrate
metabolism, is important for the competitive retention of L. acidophilus in the
intestinal tract; the ability to
synthesize intracellular glycogen contributes to gut fitness and the retention
of probiotic microorganisms [29,30]. After pan-genome analysis, we constructed
a phylogenetic tree based on the 1,726 core genes to evaluate the genetic
relationships between the canine C5 strain and the six other strains of L.
acidophilus. In this phylogenetic tree, we identified that C5 was
distant from other strains but was close to two strains (DSM20079 and NCFM) from
humans. This could be attributed to the fact that dogs are representative companion
animals to humans and have shared their environments for a long time. Therefore, the
genomic differences in L. acidophilus strains and similarities
between the C5 and the two strains from humans might be associated with their
colonized environments. Notably, the strains closest to C5 were human-derived,
although not all human-derived strains were close to C5 (for example, LA1 and ATCC53544).
Similar results were obtained in a comparative genome study of other Lactobacillus
strains [27]. To better understand the
characteristics of L. acidophilus C5, we investigated the unique
genes of all L. acidophilus strains in this study. From the 2,254
gene clusters in pan-genome analysis, we found 328 unique genes, 245 of which were
specific to the C5 strain. In the functional analysis based on the COG database,
these C5-specific genes were mainly associated with “Replication,
recombination, and repair (28 genes)” and “Carbohydrate transport and
metabolism (24 genes).” Interestingly, core gene analysis revealed many
common carbohydrate metabolism-related genes in the L. acidophilus
strains, and it was confirmed that many carbohydrate metabolism-related genes in the
C5 strain were not found in other strains. Moreover, we found that there were more
carbohydrate metabolism-related genes in the C5 strain than in the other strains.
Based on the core gene analysis and unique gene results, we inferred that C5 has
distinct genomic features from other strains of L. acidophilus.

The similarity of several genomes within or between species is the basis of
comparative genomics. If two species or strains have a recent common ancestor, the
differences between the two genomes arose after divergence from the common ancestral genome. The
more closely related the two strains, the higher the similarity between their
genomes [31]. When we performed comparative
genomic analysis of the C5 strain isolated from dogs with other strains of
L. acidophilus, we supposed that the genomic differences
between the L. acidophilus strains were significant in the
evolutionary process and could explain the adaptation process of the C5 strain using
the evolutionary statistical method (dN/dS) [32]. The identification of genetic loci undergoing adaptation is a
central aim of evolutionary biology, and several statistical tests have been
developed to quantify selection pressures acting on protein-coding regions. Among
these methods, the dN/dS ratio is one of the most widely used. It is simple and
robust and can quantify selection pressures by comparing the rate of substitutions
at silent sites (dS, which are presumed neutral) to the rate of substitutions at
non-silent sites (dN, which possibly experience selection). The dN/dS ratio is used
for distantly diverged sequences, so the differences among them represent
substitutions that have been fixed along independent lineages [33,34]. We assumed that
the selection signal by dN/dS indicated adaptation of each strain to its environment
and used this measurement to understand the adaptation of the C5 strain to the
canine environment [28, 35]. After identifying 1,843 orthologous genes in the seven
genomes, we performed dN/dS estimation for each orthologous gene using PAML4 [20]. We determined 30 evolutionarily selective
genes for the C5 strain. In these 30 genes, there were 300 C5-specific amino acid
changes, out of which 141 were statistically significant. In the crcB gene, there
were 30 C5-specific amino acid changes, the highest among the 30 selective genes,
and 14 of these changes were statistically significant. This gene is
important for preventing fluoride toxicity by reducing its concentration in the cell
[36]. In the functional analysis based on
the COG database, these evolutionarily selective genes were mainly associated with
carbohydrate transport and metabolism (five genes) and transcription (four genes).
Interestingly, among the genes containing significant evolutionary selection
signals, most genes were carbohydrate metabolism-related genes. We hypothesized that
the genes related to carbohydrate metabolism in strains isolated from dogs were
closely related to the domestication of dogs. The domestication of dogs was an
important milestone of human civilization [37]. In a previous study, whole-genome re-sequencing of dogs and wolves was
performed to identify genomic regions potentially representing selection targets
during dog domestication [38]. This study
identified candidate mutations in key genes and provided functional support for
increased starch digestion in dogs relative to wolves. This result indicates that
adaptations allowed modern dog ancestors to survive on a diet rich in starch, a
crucial step in the domestication of dogs. We asked whether
L. acidophilus C5 from dogs has undergone a corresponding evolution. Reportedly, the dog
and human gut microbiomes are similar in terms of gene content and response to diet
[39]. As dogs and humans have shared a
similar environment for a long time after domestication, we deduced that, if dogs experienced
similar metagenome changes due to changes in diet, their
microbiome adapted to a carbohydrate-rich diet. We inferred that the greater number of
carbohydrate-related genes in the genome and the genes with selection signals in
L. acidophilus C5 were the result of evolution and
domestication.

In summary, our results indicated that the L. acidophilus C5 strain
from a canine host had many genes related to carbohydrate metabolism, presumably due
to domestication by humans. We reported the characteristics of the C5 strain from a
canine using whole-genome sequencing data compared with other original isolates,
providing a strong indication of the factors affecting its evolutionary history
(evolution due to domestication of dogs by humans). We hope that our study
contributes to the feasibility of using these strains as probiotics for dogs in the
future.
Supplementary Tables