Xiangyu Deng, Adam M Phillippy, Zengxin Li, Steven L Salzberg, Wei Zhang.
Abstract
BACKGROUND: Bacterial pathogens often show significant intraspecific variation in ecological fitness, host preference and pathogenic potential. The species Listeria monocytogenes, a facultative intracellular pathogen and the causative agent of human listeriosis, consists of at least three distinct genetic lineages. Two of these lineages predominantly cause sporadic and epidemic infections in humans, whereas the third lineage has never been implicated in human disease outbreaks despite its overall conservation of many known virulence factors.
Year: 2010 PMID: 20846431 PMCID: PMC2996996 DOI: 10.1186/1471-2164-11-500
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
L. monocytogenes genomes analyzed in this study
| Strain | Lineage | Serotype | Size (bp) | No. contigs¹ | No. genes² | % identity³ | GenBank accession | Sequencing institution | Usage⁴ |
|---|---|---|---|---|---|---|---|---|---|
| EGD-e | II | 1/2a | 2,944,528 | Closed | 2931 | 100 | | European consortium | DSA |
| R2-561 | II | 1/2c | 2,945,851 | 37 | 2993 | 99.78 | | Broad Institute | DS |
| LO28 | II | 1/2c | 2,675,580 | 1150 | 3030 | 99.6 | | Broad Institute/Institut Pasteur | D |
| Finland 1988 | II | 3a | 2,834,040 | 49 | 2740 | 98.49 | | Broad Institute | S |
| 10403S | II | 1/2a | 2,873,541 | 21 | 2905 | 98.48 | | Broad Institute | DS |
| F2-515 | II | 1/2a | 1,815,995 | 1728 | 2710 | 98.47 | | Broad Institute | D |
| N3-165 | II | 1/2a | 2,884,080 | 39 | 2885 | 98.39 | | Broad Institute | DS |
| J2-003 | II | 1/2a | 2,741,640 | 795 | 2972 | 98.32 | | Broad Institute | D |
| F6900 | II | 1/2a | 2,968,620 | 23 | 3007 | 98.28 | | Broad Institute | DS |
| F6854 | II | 1/2a | 2,950,285 | 133 | 2967 | 98.26 | | TIGR | DS |
| J2818 | II | 1/2a | 2,973,040 | 24 | 3020 | 98.24 | | Broad Institute | DS |
| J0161 | II | 1/2a | 3,062,582 | 25 | 3114 | 98.23 | | Broad Institute | DS |
| J1-175 | I | 1/2b | 2,866,484 | 457 | 3178 | 94.39 | | Broad Institute | D |
| J2-064 | I | 1/2b | 2,828,700 | 545 | 2968 | 94.37 | | Broad Institute | D |
| R2-503 | I | 1/2b | 2,991,493 | 55 | 2968 | 94.28 | | Broad Institute | S |
| J1-194 | I | 1/2b | 2,989,818 | 30 | 3040 | 94.27 | | Broad Institute | DS |
| N1-017 | I | 4b | 3,142,060 | 79 | 3253 | 94.2 | | Broad Institute | DS⁵ |
| Clip 80459 | I | 4b | 2,912,690 | Closed | 2972 | 94.17 | | Institut Pasteur | S |
| F2365 | I | 4b | 2,905,187 | Closed | 2907 | 94.14 | | TIGR | DS |
| H7858 | I | 4b | 2,972,254 | 181 | 3195 | 94.08 | | TIGR | DS |
| HPB2262 | I | 4b | 2,991,120 | 79 | 3067 | 93.98 | | Broad Institute/Istituto Superiore di Sanita | DS |
| HCC23 | III | 4a | 2,976,212 | Closed | 3059 | 92.38 | | Mississippi State University | S |
| F2-524 | IIIA | 4a | - | - | - | - | - | - | A |
| F2-501 | IIIA | 4b | - | - | - | - | - | - | A |
| J2-071 | IIIA | 4c | 2,851,800 | 53 | 2778 | 92.6 | | Broad Institute | DA⁵ |
| J1-208 | IIIB | 4a | 1,963,740 | 1660 | 2809 | 91.8 | | Broad Institute | DA |
| M1-002 | IIIB | 4b | - | - | - | - | - | - | A |
| W1-111 | IIIB | 4c | - | - | - | - | - | - | A |
| F2-208 | IIIC | 4a | - | - | - | - | - | Life Technologies Corporation/Cornell University | A |
| F2-569 | IIIC | 4b | - | - | - | - | - | - | A |
| W1-110 | IIIC | 4c | - | - | - | - | - | - | A |
¹ Number of contigs based on GenBank at the time of our study. Strains with > 200 contigs were sequenced only to low coverage and were excluded from analysis.
² Number of annotated protein-coding genes and RNAs based on GenBank.
³ Nucleotide sequence identity in reference to EGD-e.
⁴ Strains used for array design (D), comparative sequence analysis (S) and comparative genomic hybridizations (A).
⁵ Strains N1-017 and J2-071 were found to be mislabeled in GenBank; this has since been fixed.
"-" Information not available.
Probe coverage of newly sequenced genomes
| Genome | Lineage | Probe coverage of 100% of gene length | Probe coverage ≥ 90% | Probe coverage ≥ 80% |
|---|---|---|---|---|
| R2-561 | II | 0.95 | 0.98 | 0.98 |
| Clip 80459 | I | 0.91 | 0.99 | 0.99 |
| Finland 1988 | II | 0.80 | 0.96 | 0.98 |
| HCC23 | III | 0.30 | 0.80 | 0.89 |
Proportion of genes from four newly sequenced strains with probe coverage meeting a minimum percentage of the gene length (100%, 90%, 80%) for probes containing at most one SNP.
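The proportions in this table can be illustrated with a minimal sketch, assuming that each gene already has a precomputed fraction of its length covered by acceptable probes (those containing at most one SNP). The function name, the input layout and the example values are hypothetical and are not taken from the study.

```python
def coverage_proportions(covered_fraction, thresholds=(1.0, 0.9, 0.8)):
    """Share of genes whose probe-covered fraction of gene length
    meets each minimum threshold.

    covered_fraction: dict mapping gene ID -> fraction of the gene length
    covered by acceptable probes (assumed to be computed upstream).
    """
    n_genes = len(covered_fraction)
    if n_genes == 0:
        raise ValueError("no genes supplied")
    return {
        t: sum(1 for f in covered_fraction.values() if f >= t) / n_genes
        for t in thresholds
    }


# Hypothetical per-gene values, for illustration only.
example = {"geneA": 1.0, "geneB": 0.93, "geneC": 0.85, "geneD": 0.40}
print(coverage_proportions(example))
```

Because the thresholds are nested, the proportion can only stay the same or grow as the threshold is relaxed from 100% to 80%, which matches the pattern in every row of the table.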
Figure 1. Circular maps that compare the genomes of nine L. monocytogenes LIII strains. The innermost circle is the reference genome. Core genes in the reference genome are shown in blue and accessory genes in yellow. From inside out, the second to the tenth circles represent the nine LIII genomes: J2-071 (LIIIA), F2-501 (LIIIA), F2-524 (LIIIA), J1-208 (LIIIB), M1-002 (LIIIB), W1-111 (LIIIB), F2-208 (LIIIC), F2-569 (LIIIC), and W1-110 (LIIIC), respectively. Genes in LIII genomes are color-coded based on their PF values (see the reference bar). Green indicates that a gene is absent (PF = 0) in an LIII genome; red indicates that a gene is conserved (PF = 1) in an LIII genome at the corresponding location in the reference genome. The eleventh circle gives color-coded gene annotations in the reference genome based on Clusters of Orthologous Groups of proteins (see the color codes at the bottom). The outermost circle provides relative genomic coordinates. Eight DDG clusters at similar genomic locations in EGD-e and F2365 are marked with letters A through H: A, lmo0037-0041 (or lmof2365_0045-0050); B, lmo0357-0360 (or lmof2365_0377-0381); C, lmo0631-0633 (or lmof2365_0660-0662); D, lmo1030-1036 (or lmof2365_1051-1057); E, lmo2133-2138; F, lmo2732-2736 (or lmof2365_2719-2723); G, lmo2771-2773 (or lmof2365_2761-2763); and H, lmo2846-2851 (or lmof2365_2836-2841). The LII-specific comK prophage integration region is marked in the EGD-e genome (I). The figure was created using GenomeViz.
Figure 2. Receiver operating characteristic curves. ROC curves compare true-positive rates with false-positive rates of different PF cutoffs for prediction of the presence or absence of individual gene variants and homologous groups. Error rates are shown for genes (dotted lines) and homologous groups (solid lines), computed from EGD-e (red) and J2-071 (black) control hybridizations. Circles indicate the chosen PF cutoff of 0.6 for classifying gene variants. Triangles indicate the chosen PF cutoff of 0.6 for classifying homologous groups.
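A ROC curve of this kind can be traced by sweeping the PF cutoff and recording the resulting error rates. The sketch below is illustrative only: it assumes a gene is called present when its PF value is at or above the cutoff (the exact comparison is not stated in this record), and the function name and toy values are hypothetical.

```python
def roc_points(pf_values, is_present, cutoffs):
    """Return (cutoff, FPR, TPR) for each PF cutoff.

    pf_values:  PF value per gene from a control hybridization.
    is_present: ground-truth presence per gene (True/False), same order.
    A gene is called present when PF >= cutoff (an assumption).
    """
    positives = sum(1 for t in is_present if t)
    negatives = len(is_present) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one present and one absent gene")
    points = []
    for cutoff in cutoffs:
        called = [pf >= cutoff for pf in pf_values]
        tp = sum(1 for c, t in zip(called, is_present) if c and t)
        fp = sum(1 for c, t in zip(called, is_present) if c and not t)
        points.append((cutoff, fp / negatives, tp / positives))
    return points


# Toy values, not data from the study.
pf = [0.95, 0.80, 0.55, 0.70, 0.10, 0.30]
truth = [True, True, True, False, False, False]
for point in roc_points(pf, truth, cutoffs=[0.2, 0.6, 0.9]):
    print(point)
```

Raising the cutoff moves along the curve toward fewer false positives at the cost of fewer true positives; a single operating point, such as the 0.6 marked in the figure, is one row of this output.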
Accuracy of the pan-genome array for detecting genes and homologous groups
| Chip Data | Test Data | Present | Absent | ACC | TPR | FPR | FDR |
|---|---|---|---|---|---|---|---|
| EGD-e | EGD-e genes only | 2846 | 0 | 1.000 ± 0.000 | 1.000 ± 0.000 | N/A | N/A |
| EGD-e | All gene variants | 49068 | 2746 | 0.973 ± 0.002 | 0.973 ± 0.003 | 0.020 ± 0.009 | 0.001 ± 0.000 |
| EGD-e | Gene groups | 2642 | 918 | 0.989 ± 0.002 | 0.993 ± 0.001 | 0.024 ± 0.007 | 0.008 ± 0.003 |
| EGD-e(-) | Gene groups | 2627 | 918 | 0.987 ± 0.002 | 0.991 ± 0.001 | 0.024 ± 0.007 | 0.008 ± 0.003 |
| J2-071 | J2-071 genes only | 2694 | 0 | 1.000 | 1.000 | N/A | N/A |
| J2-071 | All gene variants | 47411 | 4403 | 0.964 | 0.970 | 0.090 | 0.009 |
| J2-071 | Gene groups | 2543 | 1017 | 0.978 | 0.995 | 0.063 | 0.025 |
| J2-071(-) | Gene groups | 2468 | 1016 | 0.969 | 0.982 | 0.062 | 0.025 |
Present/Absent are based on a tblastn search. ACC, TPR, FPR and FDR stand for accuracy, true-positive rate, false-positive rate and false discovery rate, respectively. (-) Excludes all probes directly targeting the hybridized strain from the analysis to simulate detection accuracy for an unknown strain. For EGD-e, the mean of 9 data sets is given, along with the standard deviation, to illustrate array reproducibility.
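For reference, the four measures follow the standard confusion-matrix definitions. The sketch below uses hypothetical counts; it does not reproduce the table, whose underlying per-gene calls are not given here.

```python
def accuracy_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix measures.

    tp/fn: genes truly present that the array called present/absent.
    tn/fp: genes truly absent that the array called absent/present.
    A measure with an empty denominator is returned as None.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("no calls supplied")
    return {
        "ACC": (tp + tn) / total,
        "TPR": tp / (tp + fn) if (tp + fn) else None,
        "FPR": fp / (fp + tn) if (fp + tn) else None,
        "FDR": fp / (fp + tp) if (fp + tp) else None,
    }


# Hypothetical counts, for illustration only.
print(accuracy_metrics(tp=90, fp=5, tn=95, fn=10))
```

When no gene is truly absent, as in the "genes only" rows, the FPR denominator is empty, which is why that column is reported as N/A.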
Figure 3. Prediction of core, new and pan genes in L. monocytogenes. (A) Exponential regression analysis predicts the number of core genes in N sequenced genomes. For each N, permutations are randomly sampled and the number of core genes conserved in all N genomes is computed. The estimated number of core genes in 26 L. monocytogenes genomes ranges from 2,330 to 2,456. The sampled distribution is represented by a smoothed color density plot obtained through kernel density estimation. Yellow indicates the lowest density and purple indicates the highest density. For each N, black circles indicate the mean value and whiskers indicate the 5th and the 95th percentiles of the distribution. An exponential decay fit to the means is given by a solid red curve. A modified exponential decay, which better fits the observed data by accounting for false-negative gene calls, is given by a solid black curve. (B) Power law regression analysis predicts the number of new genes that will be discovered by sequencing additional L. monocytogenes genomes. The LIII genomes are the outliers that pull the means higher, indicating that LIII diversity has not yet been fully sequenced. (C) Power law regression analysis of the number of L. monocytogenes pan genes accumulated from genome sequencing; the pan-genome currently comprises 4,052 genes and is still growing.
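The sampling step behind panel (A) can be sketched as follows. This is a simplified illustration, not the authors' pipeline: it draws random subsets of N genomes (the core count does not depend on the order within a sample), it omits the regression and kernel density steps, and the data layout, function name and toy gene sets are hypothetical.

```python
import random


def sample_core_counts(presence, n, samples=200, seed=0):
    """Core-gene counts for random subsets of n genomes.

    presence: dict mapping genome name -> set of gene (or HG) IDs present.
    Returns a list with one core-gene count per random sample.
    """
    if not 1 <= n <= len(presence):
        raise ValueError("n must be between 1 and the number of genomes")
    rng = random.Random(seed)
    genomes = list(presence)
    counts = []
    for _ in range(samples):
        chosen = rng.sample(genomes, n)
        # Core genes are those present in every chosen genome.
        core = set.intersection(*(presence[g] for g in chosen))
        counts.append(len(core))
    return counts


# Toy gene sets, for illustration only.
toy = {
    "g1": {"a", "b", "c", "d"},
    "g2": {"a", "b", "c"},
    "g3": {"a", "b", "d"},
    "g4": {"a", "c", "d"},
}
for n in range(1, 5):
    counts = sample_core_counts(toy, n)
    print(n, sum(counts) / len(counts))
```

The mean of these counts for each N is the quantity to which a decay curve would then be fitted.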
Lineage specific genes in L. monocytogenes
| Gene | Genome | Annotation |
|---|---|---|
| | F2365 | Hypothetical protein |
| | F2365 | Hypothetical protein |
| | F2365 | Hypothetical protein |
| | F2365 | Similar to cell surface anchor family protein |
| | EGD-e | Hypothetical protein |
| | EGD-e | Hypothetical protein |
| | EGD-e | Similar to two-component sensor histidine kinase |
| | EGD-e | Similar to creatinine amidohydrolases |
| | EGD-e | Similar to 2-keto-3-deoxygluconate-6-phosphate aldolase |
| | J2-071 | Hypothetical protein |
| | J2-071 | Hypothetical protein |
| | J2-071 | Similar to ADP-ribose 1''-phosphate domain protein |
| | J2-071 | Hypothetical protein |
| | J2-071 | Hypothetical protein |
| | J2-071 | Hypothetical protein |
Lineage specificity is based on comparative analysis of 26 genomes in this study, including 7 LI strains (F2365, H7858, Clip 80459, N1-017, R2-503, HPB2262 and J1-194), 9 LII strains (EGD-e, R2-561, Finland 1988, 10403S, N3-165, F6900, F6854, J2818 and J0161) and 10 LIII genomes (HCC23, J2-071, F2-501, F2-524, J1-208, M1-002, W1-111, F2-208, F2-569 and W1-110). Gene ID is designated based on a respective reference genome.
Genes that are conserved in LI and LII but absent or disparately distributed in LIII
| Annotation | LIII subgroup² | L. innocua³ | Operon⁴ | |
|---|---|---|---|---|
| Similar to PTS system, enzyme IIA component | IIIA | + | 059 | |
| Similar to PTS system, fructose-specific enzyme IIBC component | IIIA | + | 059 | |
| Similar to D-fructose-1,6-biphosphate aldolase | IIIA | + | - | |
| Similar to PTS system, fructose-specific IIA component | IIIA | - | - | |
| Similar to PTS system, fructose-specific IIC component | IIIA | + | - | |
| Similar to PTS system, fructose-specific IIB component | IIIA | + | - | |
| Similar to ribulose-5-phosphate 3-epimerase | IIIA | + | 119 | |
| Similar to ribose 5-phosphate isomerase | IIIA | + | 119 | |
| Similar to PTS system, beta-glucoside-specific enzyme IIABC | IIIA | + | 119 | |
| Similar to 6-phospho-beta-glucosidase | IIIA | + | 119 | |
| Similar to putative sugar ABC transporter, permease protein | IIIA | + | - | |
| Similar to ABC transporter, permease protein | IIIA | + | - | |
| Hypothetical protein | IIIA | - | 166 | |
| Similar to transketolase | IIIA | - | 166 | |
| Similar to transketolase | IIIA | - | 166 | |
| Similar to PTS beta-glucoside-specific enzyme IIABC | IIIA | + | 166 | |
| Similar to pentitol PTS system enzyme II C component | IIIA | + | - | |
| Similar to pentitol PTS system enzyme II B component | IIIA | + | - | |
| Similar to PTS system enzyme II A component | IIIA | + | - | |
| Similar to fructose-1,6-biphosphate aldolase type | IIIA | + | - | |
| Similar to fructose-1,6-biphosphate aldolase type II | IIIA | + | - | |
| Similar to PTS system, fructose-specific enzyme IIC component | IIIA | + | - | |
| Similar to PTS system, fructose-specific enzyme IIB component | IIIA | + | - | |
| Similar to PTS system, fructose-specific enzyme IIA component | IIIA | + | - | |
| Similar to mannose-6-phosphate isomerase | IIIA | - | - | |
| Similar to PTS system, fructose-specific IIABC component | IIIA | + | 494 | |
| Similar to sugar hydrolase | IIIA | + | 494 | |
| Similar to Sucrose phosphorylase | IIIA | + | 494 | |
| Hypothetical protein | IIIA | + | 494 | |
| Similar to beta-glucosidase | IIIA | + | - | |
| Similar to PTS system, beta-glucoside-specific enzyme IIABC | IIIA | + | - | |
| Similar to rhamnulose-1-phosphate aldolase | IIIA | + | 516 | |
| Similar to L-rhamnose isomerase | IIIA | + | 516 | |
| Similar to rhamnulokinase | IIIA | + | 516 | |
| Similar to sugar transport proteins | IIIA | + | 516 | |
| Similar to | IIIA | - | - | |
| Similar to sugar transferase | IIIA | + | - | |
| Similar to ABC transporters (permease protein) | IIIA | + | - | |
| TagB, teichoic acid biosynthesis protein B precursor | IIIA | + | 177 | |
| TagD, teichoic acid biosynthesis protein D | IIIA | + | 177 | |
| Similar to internalin, putative peptidoglycan bound protein | IIIA | - | - | |
| Putative peptidoglycan bound protein (LPXTG motif) | IIIA | + | - | |
| Similar to internalin, putative peptidoglycan bound protein | IIIA | + | - | |
| Similar to internalin, putative peptidoglycan bound protein | IIIA | + | - | |
| Peptidoglycan linked protein (LPXTG motif) | IIIA | - | - | |
| Putative peptidoglycan binding protein (LPXTG motif) | IIIA | - | - | |
| Putative peptidoglycan binding protein (LPXTG motif) | IIIA | + | - | |
| Similar to glycosyl transferases | IIIA | + | - | |
| Hypothetical protein | IIIA | - | - | |
| Similar to glycerol kinase | IIIA | + | 166 | |
| Weakly similar to a putative haloacetate dehalogenase | IIIA | - | - | |
| Similar to putative phosphotriesterase related proteins | IIIA | + | - | |
| Protein gp18, bacteriophage A118 | IIIA | + | - | |
| Protein gp17, bacteriophage A118 | IIIA | + | - | |
| SepA, required for septum formation | IIIA | - | - | |
| Similar to amino acid transporter | IIIB | - | - | |
| Similar to agmatine deiminase | IIIB | - | 008 | |
| Similar to carbamate kinase | IIIB | - | 008 | |
| Similar to agmatine deiminase | IIIB | - | - | |
| Similar to glutamate decarboxylase | IIIA | + | - | |
| Similar to amino acid antiporter | IIIA | + | - | |
| Similar to penicillin acylase and to conjugated bile acid hydrolase | IIIA | - | - | |
| Weakly similar to a bile acid 7-alpha dehydratase | IIIA | - | - | |
| Similar to transcription regulator, RpiR family | IIIB | - | - | |
| Similar to transcriptional regulator, DeoR family | IIIA | + | - | |
| Hypothetical protein | IIIA | + | - | |
| Similar to transcription regulator, Crp/Fnr family | IIIA | - | - | |
| Similar to transcription regulator, BglG family | IIIA | + | - | |
| Similar to 2-component response regulator | IIIA | + | - | |
| Similar to transcription regulator, BglG family | IIIA | + | - | |
| Similar to transcription regulator, GntR family | IIIA | - | - | |
| Similar to repressor protein | IIIA | + | - | |
| Similar to transcription regulator, RpiR family | IIIA | - | - | |
| Similar to transcription antiterminator | IIIA | + | - | |
| Similar to transcription regulator, AraC family | IIIA | + | - | |
| Similar to ABC transporter (ATP binding protein) | IIIA | + | - | |
| CadA, cadmium resistance protein | IIIA | + | - | |
| Similar to amidases | IIIB | + | - | |
| Hypothetical protein | - | - | - | |
| Hypothetical protein | IIIA | + | - | |
| Hypothetical protein | IIIA | + | - | |
| Hypothetical protein | IIIA | + | - | |
| Similar to B. subtilis YulD protein | IIIA | + | 516 | |
| Hypothetical protein | IIIA | - | 166 | |
| Hypothetical protein | IIIA | + | - | |
| Hypothetical protein | IIIA | + | - | |
¹ Genes conserved in all LI and LII genomes but absent in two or more LIII subgroups (IIIA, IIIB or IIIC). Genes are listed by annotation in functional groups.
² LIII subgroup in which a listed gene is present.
³ Presence "+" or absence "-" of a gene in the L. innocua genome.
⁴ Operon to which a gene belongs, as annotated in [45]; "-", not annotated in operons.
Small regulatory RNAs absent or divergent in LIII genomes
| RNA | In vivo regulation¹ | J2-071 (IIIA) | F2-501 (IIIA) | F2-524 (IIIA) | J1-208 (IIIB) | W1-111 (IIIB) | M1-002 (IIIB) | F2-569 (IIIC) | F2-208 (IIIC) | W1-110 (IIIC) |
|---|---|---|---|---|---|---|---|---|---|---|
| rli62 | n/a | - | - | - | - | - | + | - | - | - |
| rliG | n/a | - | - | - | - | - | - | - | + | - |
| rli38 | ↑ in broth & blood | + | - | - | - | - | - | - | - | + |
| rli48 | ↑ in intestine | - | - | - | - | - | + | - | + | - |
| rli26 | ↑ in blood | + | + | + | - | - | - | - | - | - |
| rli29 | ↑ in intestine & blood | - | - | - | + | - | + | + | - | - |
| rli49 | n/a | - | - | - | - | - | - | - | - | - |
| rliC | ↓ in blood | + | + | + | - | - | - | - | - | + |
¹ Up-regulated "↑" or down-regulated "↓" in vivo [45]; n/a, information not available.
² Gene is either present "+" or absent "-" in an LIII genome.
Figure 4. Phylogenomic reconstruction of 26 L. monocytogenes genomes. (A) Neighbor joining (NJ) tree based on the presence or absence of 3,560 HGs in 7 LI, 9 LII and 10 LIII genomes. EGD-e and J2-071 are analyzed by both BLAST and CGH data. Branches with bootstrap values (1,000 replicates) less than 70% are labeled in red. (B) NJ tree based on the presence or absence of 2,855 EGD-e core genes. (C) Split network based on the distribution of 3,560 HGs in 26 L. monocytogenes genomes.
Figure 5. Phylogenetic analysis of the three LIII subgroups. (A) A rooted tree shows the phylogenetic relatedness of the 9 LIII strains analyzed by CGH and 1 sequenced LIII strain, HCC23. The tree was rooted by EGD-e and reconstructed based on the presence or absence of 3,560 HGs using the maximum-likelihood gene content method. Two branches with bootstrap values lower than 70% (1,000 replicates) are highlighted in red. (B) Neighbor-net split network shows the phylogenetic relatedness of 10 LIII strains. (C) A heat map based on PF values shows the distribution of 206 phylogenetically informative LI and LII core genes in 10 LIII strains.
Summary of pan-genomic studies
| Species | No. strains¹ | Pan-genome² | No. core genes | No. pan genes | Avg. no. genes | % Core genes | Cutoff³ | Ref |
|---|---|---|---|---|---|---|---|---|
| | 20 | Open | 1976 | > 17838 | 4700 | 42% | 80/80 | |
| | 17 | Open | 2200 | > 13000 | 5020 | 44% | 0.8 BSR | |
| | 32 | Open | 1563 | > 9433 | 4537 | 34% | 50/50 | |
| | 13 | Finite | 1461 | 4425-6052 | 1970 | 74% | 70/70 | |
| L. monocytogenes | 26 | Open | 2350-2450 | > 4000 | 2978 | 80% | 0.5 SSR | This study |
| | 7 | Open | 1333 | > 3290 | 1963 | 68% | 50/50 | |
| | 8 | Open | 1806 | > 2750 | 2245 | 80% | 50/50 | |
| | 8 | *Open | 1472 | *> 2800 | 2198 | 67% | 1e-5 E-value | |
| | 17 | Finite | 1380 | 5100 | 2438 | 57% | 70/70 | |
| | 11 | *Closed | 1376 | *2500 | 1878 | 73% | 1e-5 E-value | |
All numbers in this table are estimates.
¹ Only studies including more than five strains are shown.
² Pan-genome growth behavior as described by the authors. * Estimated from figures, but not explicitly stated in the paper.
³ Cutoff values and methods for defining core and pan genes vary widely across the different studies. This column only gives a rough summary of the similarity cutoff. Cutoffs of the form I/L indicate a minimum BLAST hit of I% similarity over L% of the protein length. BSR is the BLAST Score Ratio [32]. SSR is the similarity score ratio used in this study, similar to BSR.
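The two kinds of cutoff in this column can be illustrated with a short sketch. It assumes the usual definition of a score ratio as the score of a hit divided by the score of the query aligned to itself; the exact SSR formula is not given in this record, so `score_ratio` below is only a stand-in. All function names, parameters and numbers are hypothetical.

```python
def passes_identity_length(percent_similarity, aligned_length, protein_length,
                           min_similarity, min_length_percent):
    """I/L-style cutoff: at least I% similarity over at least L% of the
    protein length."""
    if protein_length <= 0:
        raise ValueError("protein_length must be positive")
    length_percent = 100.0 * aligned_length / protein_length
    return (percent_similarity >= min_similarity
            and length_percent >= min_length_percent)


def score_ratio(hit_score, self_score):
    """Score of the hit relative to the query's self-alignment score
    (assumed definition, in the spirit of BSR)."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return hit_score / self_score


# Hypothetical hits, for illustration only.
print(passes_identity_length(85.0, 240, 300, 80, 80))  # 80/80-style cutoff
print(passes_identity_length(85.0, 150, 300, 70, 70))  # 70/70-style cutoff
print(score_ratio(410.0, 500.0) >= 0.8)                # 0.8 ratio cutoff
```

A ratio-based cutoff folds similarity and alignment length into a single number, whereas an I/L cutoff constrains the two separately, which is one reason the figures in this table are only roughly comparable across studies.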