Renato H Orsi, Qi Sun, Martin Wiedmann.
Abstract
BACKGROUND: The genus Listeria includes two closely related species, one pathogenic (L. monocytogenes) and one non-pathogenic (L. innocua). L. monocytogenes is an opportunistic human foodborne and animal pathogen that includes two common lineages. While lineage I is more commonly found among human listeriosis cases, lineage II appears to be overrepresented among isolates from foods and environmental sources. This study used the genome sequences of one L. innocua strain and four L. monocytogenes strains, representing lineages I and II, to characterize the contributions of positive selection and recombination to the evolution of the L. innocua/L. monocytogenes core genome.
Year: 2008 PMID: 18700032 PMCID: PMC2532693 DOI: 10.1186/1471-2148-8-233
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Strains and genomes analyzed.
| Strain | Serotype | Lineage | Species | Genome size (nt) | No. of CDS(1) | Ref. |
|---|---|---|---|---|---|---|
| EGD-e | 1/2a | II | L. monocytogenes | 2,944,528 | 2846 | |
| F6854 | 1/2a | II | L. monocytogenes | 2,953,211 | 2945 | |
| F2365 | 4b | I | L. monocytogenes | 2,905,310 | 2821 | |
| H7858 | 4b | I | L. monocytogenes | 2,893,921 | 3007 | |
| CLIP 11262 | 6a | - | L. innocua | 3,011,209 | 2968 | |
(1) number of coding sequences used for cluster analysis.
Figure 1. Unrooted phylogenetic tree of the five strains used in the genome-wide analyses. This tree represents the consensus tree of 2267 gene cluster alignments used for analyses. Branches tested for positive selection are indicated as TLM1A (Test lineage I Ancestral), TLM2A (Test lineage II Ancestral), and TLI/LM (Test L. innocua/L. monocytogenes).
Figure 2. Schematic representations of the genes analyzed, including (A) relative position of the orthologs in EGD-e, CLIP 11262 and F2365; and (B) circular chromosome of EGD-e. In (B), all protein coding, tRNA, and rRNA genes are shown in blue, brown, and purple, respectively; all genes analyzed are shown in green, and genes with evidence for recombination (at least one test significant) are shown in red. There was no evidence for spatial clustering of genes with evidence for recombination (P = 0.957; U-test).
Associations between COGs and descriptive variables. Values are Bonferroni-corrected P values for one-sided U-tests for associations between genes in a given COG and each variable(1).
| COG | Number of genes analyzed | > length | > nt diversity | > Number of informative sites | > Codon bias(2) | < Codon bias(2) |
|---|---|---|---|---|---|---|
| Energy production and conversion | 100 | < 0.002 | ND | NS | 0.007 | ND |
| Cell cycle control, mitosis and meiosis | 20 | NS | ND | ND | NS | ND |
| Amino acid transport and metabolism | 181 | < 0.002 | NS | < 0.002 | NS | ND |
| Nucleotide transport and metabolism | 63 | NS | ND | NS | NS | ND |
| Carbohydrate transport and metabolism | 186 | < 0.002 | ND | NS | NS | ND |
| Coenzyme transport and metabolism | 87 | NS | < 0.001 | 0.004 | ND | 0.047 |
| Lipid transport and metabolism | 51 | NS | NS | NS | NS | ND |
| Translation | 91 | NS | ND | ND | < 0.001 | ND |
| Transcription | 187 | ND | ND | ND | ND | < 0.002 |
| Replication, recombination and repair | 92 | < 0.002 | < 0.001 | < 0.002 | ND | NS |
| Cell wall/membrane biogenesis | 76 | < 0.002 | < 0.001 | < 0.002 | NS | ND |
| Cell motility | 40 | NS | ND | NS | ND | NS |
| Posttranslational modification, protein turnover, chaperones | 52 | ND | ND | ND | 0.021 | ND |
| Inorganic ion transport and metabolism | 104 | NS | ND | NS | ND | NS |
| Secondary metabolites biosynthesis, transport and catabolism | 31 | ND | NS | NS | ND | NS |
| General function prediction only | 254 | NS | NS | 0.008 | ND | NS |
| Function unknown | 161 | ND | NS | ND | ND | NS |
| Signal transduction mechanisms | 104 | NS | ND | NS | ND | < 0.002 |
| Intracellular trafficking and secretion | 37 | NS | ND | ND | NS | ND |
| Defense mechanisms | 51 | < 0.002 | NS | 0.020 | ND | NS |
| Not in COGs | 511 | ND | ND | ND | ND | NS |
(1) ">" or "<" indicates the direction of the one-sided tests: the column "> Codon bias" shows Bonferroni-corrected p-values for associations between genes in a given COG and higher codon bias (as compared to the genes in other COGs), while the column "< Codon bias" shows Bonferroni-corrected p-values for associations with lower codon bias. "ND" = not determined (tests were not performed for COGs whose values were not consistent with the tested alternative hypothesis; e.g., if the average gene length for genes in a given COG was below average, we did not test for an association of this COG with increased gene length). "NS" = not significant.
(2)Tests for codon bias were performed using Nc values; a lower Nc indicates increased codon bias.
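The one-sided U-test with Bonferroni correction described in the footnotes above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes a normal approximation with tie and continuity corrections, and the function names and the number of tests passed to `bonferroni` are hypothetical.

```python
import math

def mann_whitney_greater(x, y):
    """One-sided U-test (normal approximation): are values in x shifted above y?

    Returns (U statistic for x, one-sided p-value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool the observations, remembering which group each came from (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of the 1-based ranks i+1 .. j
        t = j - i                    # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean = n1 * n2 / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:
        return u, 1.0  # all values identical: no evidence of a shift
    z = (u - mean - 0.5) / math.sqrt(var)  # continuity correction for the upper tail
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability of a standard normal
    return u, p

def bonferroni(p, n_tests):
    """Bonferroni-corrected p-value, capped at 1 (n_tests is an assumption here)."""
    return min(1.0, p * n_tests)
```

For the "> length" column, `x` would hold the gene lengths in one COG and `y` the lengths of genes in all other COGs. For codon bias measured as Nc, where lower Nc means stronger bias, the arguments would be swapped to test for higher codon bias.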
Association between COGs and recombination
| COG | |||||
|---|---|---|---|---|---|
| GENECONV | NSS | PHI | |||
| Carbohydrate transport and metabolism | 0.012 | 0.001 | < 0.001 | 0.014 | 0.032 |
| Amino acid transport and metabolism | NS | NS | 0.001 | 0.026 | 0.002 |
| Defense mechanisms | NS | NS | 0.022 | NS | NS |
(1) Fisher's exact test with Bonferroni correction; NS = not significant.
(2) U-test with Bonferroni correction.
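As an illustration of footnote (1), a Fisher's exact test for over-representation of recombinant genes in a COG could look like the sketch below. It assumes a one-sided (enrichment) alternative and a 2x2 table layout chosen here for illustration; the source does not state which alternative was used, and the function name is hypothetical.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: a = genes in the COG with evidence for recombination,
    b = genes in the COG without, c and d = the same counts for all other genes.
    Returns P(X >= a) under the hypergeometric null (fixed margins).
    """
    row1 = a + b          # genes in the COG
    col1 = a + c          # genes with evidence for recombination
    n = a + b + c + d     # all genes analyzed
    upper = min(row1, col1)
    favourable = sum(
        comb(row1, k) * comb(n - row1, col1 - k)  # tables with k recombinant genes in the COG
        for k in range(a, upper + 1)
    )
    return favourable / comb(n, col1)
```

The resulting p-value would then be multiplied by the number of COGs tested (capped at 1) to apply the Bonferroni correction.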
Genes under positive selection.
| Gene locus | Gene description (gene symbol) | COG(1) | Recombination(2) | Branch under pos. selection | Q-value | ω(3) | p(4) | BEB(5) |
|---|---|---|---|---|---|---|---|---|
| Lmo0098 | PTS system, mannose/fructose/sorbose family, IID component | NCOG | GCV; MAX | LI/LM | 0.170 | 472.58 | 0.004 | - |
| Lmo0099 | conserved hypothetical protein | NCOG | - | LI/LM | 0.170 | ∞ | 0.009 | - |
| Lmo0139 | conserved hypothetical protein | NCOG | - | LM2A | 0.160 | 56.74 | 0.054 | 95 |
| Lmo0297 | PRD/PTS system IIA 2 domain protein | K; G; T | GCV; MAX; NSS; PHI | LM2A | 0.164 | ∞ | 0.012 | 499 |
| Lmo0397 | conserved hypothetical protein | S | GCV; MAX | LM2A | 0.1264 | ∞ | 0.043 | - |
| Lmo0429 | glycosyl hydrolase, family 38 | G | GCV; MAX; NSS; PHI | LM2A | 0.0325 | 977.51 | 0.008 | 667 |
| Lmo0455 | conserved hypothetical protein | T; Q | GCV; MAX; NSS; PHI | LM2A | 0.1214 | ∞ | 0.004 | - |
| Lmo0653 | conserved hypothetical protein | S | MAX | LM2A | 0.033 | ∞ | 0.012 | 306 |
| Lmo0658 | endonuclease III domain protein | L | MAX; NSS | LM2A | 0.108 | ∞ | 0.023 | 209 |
| Lmo0692 | chemotaxis protein CheA | T; N | GCV; MAX; NSS; PHI | LM1A | 0.156 | ∞ | 0.002 | - |
| Lmo0693 | flagellar motor switch domain protein | N; U | - | LM2A | 0.007 | ∞ | 0.022 | - |
| Lmo0695 | conserved hypothetical protein | NCOG | MAX; NSS; PHI | LM1A | 0.175 | ∞ | 0.014 | - |
| Lmo0732 | cell wall surface anchor family protein | NCOG | GCV | LM1A | 0.185 | 6.92 | 0.076 | - |
| Lmo0782 | PTS system, mannose/fructose/sorbose family, IIC component | NCOG | GCV; MAX | Overall | 0.146 | 88.23 | 0.008 | - |
| | | | | LM1A | 0.052 | ∞ | 0.0001 | - |
| | | | | LM2A | 0.098 | ∞ | 0.004 | - |
| Lmo0785 | sigma-54 dependent Kal regulator | K; T | GCV; MAX | LM2A | 0.129 | 1.00 | 0.000 | - |
| Lmo0872 | major facilitator family transporter | G | GCV; MAX; NSS | LM2A | 0.137 | ∞ | 0.008 | - |
| Lmo0910 | putative membrane protein | R | MAX; NSS | LM1A | 0.019 | ∞ | 0.012 | - |
| Lmo1146 | conserved hypothetical protein | NCOG | GCV; MAX | LI/LM | 0.046 | 223.37 | 0.023 | 169 |
| Lmo1164 | PduO protein | S; R | GCV | LI/LM | 0.170 | 195.87 | 0.038 | - |
| Lmo1412 | DNA topology modulation protein FlaR | F | - | LM2A | 0.160 | 8.56 | 0.139 | 12; 37; 68 |
| Lmo1424 | transporter, NRAMP family | P | GCV | LM1A | 0.185 | 1.00 | 0.000 | - |
| Lmo1523 | GTP pyrophosphokinase | K; T | - | LM2A | 0.033 | 1.00 | 0.000 | - |
| Lmo1529 | preprotein translocase, YajC subunit | U | - | LI/LM | 0.170 | ∞ | 0.011 | - |
| Lmo2102 | glutamine amidotransferase, SNO family | H | GCV; MAX; NSS; PHI | LM1A | 0.185 | ∞ | 0.017 | 66 |
| Lmo2121 | glycosyl transferase, family 65 | G | GCV; MAX; NSS | LM1A | 7.6E-06 | 35.31 | 0.031 | 722; 723; 725; 729; 730; 744; 752; |
| Lmo2178 | cell wall surface anchor family protein | M | GCV; MAX; NSS; PHI | LI/LM | 0.193 | 15.92 | 0.001 | - |
| | | | | LM2A | 0.137 | 5.34 | 0.025 | 1769 |
| Lmo2215 | ABC transporter, ATP-binding protein | V | MAX; NSS | Overall | 0.188 | 14.23 | 0.008 | - |
| Lmo2222 | Ser/Thr protein phosphatase family protein | L | GCV; MAX | LM2A | 0.160 | ∞ | 0.021 | 253 |
| Lmo2446 | glycosyl hydrolase, family 31 | G | GCV; MAX; NSS; PHI | LM2A | 0.003 | ∞ | 0.0001 | - |
| Lmo2596 | ribosomal protein S9 | NCOG | - | LM2A | 0.021 | ∞ | 0.010 | - |
| Lmo2611 | adenylate kinase | F | GCV; MAX | LM2A | 0.120 | 24.80 | 0.013 | - |
| Lmo2724 | conserved hypothetical protein | S | NSS | LI/LM | 0.170 | ∞ | 0.008 | - |
| Lmo2802 | glucose-inhibited division protein B | M | GCV; MAX; NSS; PHI | LM1A | 0.046 | ∞ | 0.013 | - |
| Lmo2804 | conserved hypothetical protein | NCOG | GCV; MAX; NSS; PHI | Overall | 6.5E-17 | 16.53 | 0.044 | - |
| Lmo2824 | D-isomer specific 2-hydroxyacid dehydrogenase family protein | E; H | GCV; MAX; NSS | LM2A | 0.160 | 229.62 | 0.003 | - |
(1) NCOG: Not in COGs; K: Transcription; G: Carbohydrate transport and metabolism; T: Signal transduction mechanisms; S: Function unknown; Q: Secondary metabolites biosynthesis, transport and catabolism; L: Replication, recombination and repair; N: Cell motility; U: Intracellular trafficking and secretion; R: General function prediction only; F: Nucleotide transport and metabolism; P: Inorganic ion transport and metabolism; E: Amino acid transport and metabolism; H: Coenzyme transport and metabolism; M: Cell wall/membrane biogenesis; V: Defense mechanisms
(2) GCV: GENECONV; MAX: Maximum χ2; NSS: Neighbour Similarity Score; PHI: Pairwise Homoplasy Index
(3) ω = dN/dS (Number of nonsynonymous changes per nonsynonymous sites/Number of synonymous changes per synonymous sites); infinite values of ω (∞) indicate that the model did not find synonymous changes for the branches tested (dS = 0; ω ~ ∞). However, this (i.e., ω ~ ∞) does not affect the validity of the Likelihood Ratio Test, which was used to identify the genes under positive selection (Z. Yang, pers. communication; see http://gsf.gc.ucdavis.edu/viewtopic.php?f=1&t=2079&sid=c4a82e1ca334ca84a00a8b85e0f33c9d and http://gsf.gc.ucdavis.edu/viewtopic.php?f=1&t=2329&sid=c4a82e1ca334ca84a00a8b85e0f33c9d).
(4) Proportion of sites under positive selection
(5) This column lists sites identified using Bayes Empirical Bayes (BEB) as being under positive selection; numbers identify the amino acid sites (in alignments) that are under positive selection
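Footnote (3) above separates the ω estimate from the Likelihood Ratio Test (LRT) that was used to call positive selection. The sketch below shows why an infinite ω does not interfere with the test: the LRT depends only on the log-likelihoods of the null and alternative models. It assumes a χ2 reference distribution with one degree of freedom; the function names and any log-likelihood values passed in are hypothetical and are not taken from the study.

```python
import math

def omega(dn, ds):
    """dN/dS ratio; infinite when no synonymous changes were inferred (dS = 0)."""
    if ds == 0:
        return math.inf if dn > 0 else math.nan
    return dn / ds

def lrt_pvalue(lnl_null, lnl_alt):
    """Likelihood ratio test comparing nested models by their log-likelihoods.

    Returns (test statistic, p-value), assuming a chi-square distribution
    with 1 degree of freedom (an assumption made for this sketch).
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 d.f.: P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

A gene with `omega(dn, 0)` equal to infinity can still yield a finite, well-defined LRT statistic, because `lrt_pvalue` never uses ω directly.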
Figure 3. Cumulative distribution of p-values. "Overall" represents all p-values regardless of the COG classification. Genes involved in "Cell-wall/membrane biosynthesis", "Coenzyme transport and metabolism", and "Amino acid transport and metabolism" showed a tendency to have lower p-values in comparison with all genes analyzed, while genes involved in "Transcription" and "Inorganic ion transport and metabolism" showed a tendency to have higher p-values than all genes analyzed.
Figure 4. Recombination events identified by ClonalFrame, using the concatenated alignment of 40 randomly selected genes, in the external branches. Each of the 40 genes is represented between gray vertical lines. The order of the genes (left to right) is as follows: clpX (lmo1268), lmo0343, lmo0405, pflC (lmo1407), phoP (lmo2501), lmo1436, lmo1460, lmo1537, hemC (lmo1556), ccpA (lmo1599), lmo1623, lmo1790, lmo2262, pepC (lmo2338), lmo2391, trxB (lmo2478), lmo0190, lmo0860, lmo0877, lmo1087, proA (lmo1259), lmo0992, smbA (lmo1313), lmo1401, lmo1420, opuCC (lmo1426), trpD (lmo1631), lmo1693, purK (lmo1774), lmo1825, panB (lmo1902), lmo0028, lmo2175, lmo2348, lmo2566, lmo0487, lmo0878, lmo1004, lmo1011, cbiH (lmo1199). "x" indicates substitutions inferred to have occurred in the respective branches. Red lines represent the probability for each nucleotide to have been imported by means of recombination. Values at the bottom represent the position in the alignment in kilobases.
Positive selection and recombination analyses of 5 genes in 45 isolates(1)
| Gene | Function | Recombination evidence(2) (p-value) | Positive selection evidence(3) | ω(4) | p(5) | BEB(6) sites (probability) |
|---|---|---|---|---|---|---|
| cheA | Two-component sensor histidine kinase CheA, involved in chemotaxis (Dons | GENECONV (< 0.001), NSS (< 0.001), Max χ2 (< 0.001), PHI (< 0.001) | LIIIA/C-LI (< 0.001) | ∞ | 0.002 | 140 (98%); |
| lmo0693 | Putative flagellar motor switch protein, involved in motility | NSS (0.033) | LII (< 0.001) | ∞ | 0.022 | 17 (73%) |
| flaR | Histone-like DNA topology modulator, involved in regulation of flagellin expression (Sanchez-Campillo | GENECONV (0.010), Max χ2 (0.010), PHI (0.005) | LII (0.002) | 14.1 | 0.046 | 4 (90%) |
| phoP | Putative two-component response phosphate regulator PhoP | GENECONV (0.014), NSS (0.001), Max χ2 (0.020), PHI (0.003) | - | - | - | - |
| lmo2537 | Putative UDP-N-acetylglucosamine-2-epimerase, involved in teichoic acid biogenesis (Dubail | GENECONV (< 0.001), NSS (< 0.001), Max χ2 (< 0.001), PHI (< 0.001) | - | - | - | - |
(1) Analyses were performed using an alignment of these 5 genes for the 40 L. monocytogenes isolates, for which these genes were sequenced here (see Supp. Table 1), as well as the four L. monocytogenes and one L. innocua strain for which full genome sequences were available (Table 1).
(2) NSS: Neighbour Similarity Score; Max χ2: Maximum χ2; PHI: Pairwise Homoplasy Index. GENECONV performed with g-scale = 1 did not show significant inner fragments for any of the five genes; however, four genes showed significant inner fragments in GENECONV performed with g-scale = 2, and these p-values are reported here. The g-scale setting in GENECONV is associated with the number of polymorphisms allowed in a putative recombinant fragment; more polymorphisms are allowed as the g-scale value decreases from 3 to 1.
(3) Branches where positive selection was identified. LIIIA/C-LI: ancestral branch of lineages IIIA/C and I isolates (branch B in Figure 5); LII: Ancestral branch of lineage II isolates (branch F in Figure 5); there was no evidence for positive selection in phoP and lmo2537.
(4) ω = dN/dS (number of nonsynonymous changes per nonsynonymous sites/Number of synonymous changes per synonymous sites); infinite values of ω (∞) indicate that the model did not find synonymous changes for the branches tested (dS = 0; ω ~ ∞).
(5) Proportion of sites under positive selection.
(6) This column lists sites identified using Bayes Empirical Bayes (BEB) as being under positive selection; numbers identify the amino acid sites (in alignments) that are under positive selection; "probabilities" refer to the posterior probabilities that the respective sites evolved by positive selection.
Figure 5. Phylogenetic consensus tree generated by ClonalFrame from a concatenated alignment. The 95% consensus phylogeny was obtained from two independent runs of ClonalFrame. This phylogeny clearly shows that the L. monocytogenes isolates form four distinct clusters, with lineage IIIA and IIIC (IIIA/C) isolates forming a sister group to lineage I isolates, while lineage IIIB isolates form an independent cluster that diverged earlier from the other isolates. Internal branches that showed evidence for recombination are labeled from A to J. Isolates with premature stop codons in flaR are marked with *.