Ruth E Timme, James B Pettengill, Marc W Allard, Errol Strain, Rodolphe Barrangou, Chris Wehnes, Joann S Van Kessel, Jeffrey S Karns, Steven M Musser, Eric W Brown.
Abstract
The enteric pathogen Salmonella enterica is one of the leading causes of foodborne illness in the world. The species is extremely diverse, containing more than 2,500 named serovars that are designated for their unique antigen characters and pathogenicity profiles; some are known to be virulent pathogens, while others are not. Questions regarding the evolution of pathogenicity, the significance of antigen characters, and the diversity of clustered regularly interspaced short palindromic repeat (CRISPR) loci, among others, will remain unresolved until a strong evolutionary framework is established. We present the first large-scale S. enterica subsp. enterica phylogeny inferred from a new reference-free k-mer approach for gathering single nucleotide polymorphisms (SNPs) from whole genomes. The phylogeny of 156 isolates representing 78 serovars (102 were newly sequenced) reveals two major lineages, each with many strongly supported sublineages. One of these lineages is the S. Typhi group, which is well nested within the phylogeny. Lineage-through-time analyses suggest there have been two instances of accelerated rates of diversification within the subspecies. We also found that antigen characters and CRISPR loci reveal evolutionary patterns that differ from that of the phylogeny, suggesting that horizontal gene transfer or possibly a shared environmental acquisition might have influenced the present character distribution. Our study also demonstrates the ability to extract reference-free SNPs from a large set of genomes and then to use these SNPs for phylogenetic reconstruction. This automated, annotation-free approach is an important step forward for bacterial disease tracking and for efficiently elucidating the evolutionary history of highly clonal organisms.
Keywords: CRISPR; H antigens; O antigens; comparative method; lineage-through-time plot; serovar
Year: 2013 PMID: 24158624 PMCID: PMC3845640 DOI: 10.1093/gbe/evt159
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
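The reference-free k-mer idea mentioned in the abstract can be illustrated with a toy sketch. This is not the kSNP implementation; it is a minimal illustration under simplifying assumptions (odd k, exact matching of both flanks, no reverse-complement handling, no filtering of repeats beyond skipping ambiguous loci). The function name `kmer_snps` and the example sequences are hypothetical.

```python
from collections import defaultdict

def kmer_snps(genomes, k=5):
    """Toy reference-free SNP finder.

    A candidate SNP is a k-mer (odd k) whose left and right flanks are
    identical across genomes while the central base differs.
    """
    if k % 2 == 0:
        raise ValueError("k must be odd so that a central base exists")
    mid = k // 2
    # (left flank, right flank) -> {genome name -> set of central bases seen}
    table = defaultdict(lambda: defaultdict(set))
    for name, seq in genomes.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            table[(kmer[:mid], kmer[mid + 1:])][name].add(kmer[mid])
    snps = {}
    for flanks, per_genome in table.items():
        # Skip loci that are ambiguous within a single genome (e.g., repeats).
        if any(len(bases) != 1 for bases in per_genome.values()):
            continue
        alleles = {name: next(iter(bases)) for name, bases in per_genome.items()}
        # Keep only loci where at least two genomes disagree.
        if len(set(alleles.values())) > 1:
            snps[flanks] = alleles
    return snps

# Hypothetical sequences differing at a single position.
example = {"a": "TTACGTGCA", "b": "TTACATGCA"}
print(kmer_snps(example, k=5))  # {('AC', 'TG'): {'a': 'G', 'b': 'A'}}
```

Because the comparison is made between k-mers rather than against a reference genome, no alignment or annotation is needed, which is the property the abstract emphasizes.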
Fig. 1. Phylogenetic tree based on the maximum-likelihood method implemented in RAxML. Bold black branches represent 90–100% bootstrap support. Bold gray branches represent 70–90% bootstrap support. Numbers associated with branches are SNPs unique to that lineage. For the purposes of this figure, the long outgroup branches were shortened; however, the original tree is available for download at TreeBase.org. Three antigen characters are mapped onto this phylogeny: O group, Phase 1 (H) flagellar antigen, and Phase 2 (H) flagellar antigen.
Lineage-through-time plots illustrating fluctuations in diversification rate throughout the evolutionary history of S. enterica.
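As a rough illustration of what a lineage-through-time plot encodes, the sketch below turns the internal node ages of an ultrametric tree into step points. It assumes a fully bifurcating tree and ages measured as time before present; the function name `ltt_points` and the example ages are hypothetical and are not taken from the study.

```python
def ltt_points(node_ages):
    """Return (time_before_present, lineage_count) pairs.

    Assumes a fully bifurcating ultrametric tree: the root split yields
    two lineages, and every later internal node adds one more.
    """
    ages = sorted(node_ages, reverse=True)  # oldest node (root) first
    return [(age, i + 2) for i, age in enumerate(ages)]

# Hypothetical internal node ages for a four-tip tree.
print(ltt_points([1.0, 0.4, 0.7]))  # [(1.0, 2), (0.7, 3), (0.4, 4)]
```

Plotting the lineage count on a log scale against time gives a roughly straight line under a constant diversification rate, so a change in slope is what suggests an accelerated or slowed rate.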
Serovar-Level Characters and Statistics
| Serovar | Number of Strains | GSI | O Antigen Group | H Antigen, Phase 1 | H Antigen, Phase 2 |
|---|---|---|---|---|---|
| 4,[5],12:i:- | 2 | | 4 | i | — |
| Abaetetuba | 1 | NA | 11 | k | 1,5 |
| Abony | 1 | NA | 4 | b | e,n,x |
| Adelaide | 1 | NA | 35 | f,g | — |
| Agona | 4 | | 4 | f,g,s | [1,2] |
| Alachua | 1 | NA | 35 | z4,z23 | — |
| Albany | 1 | NA | 8 | z4,z24 | — |
| Anatum | 2 | | 3,10 | e,h | 1,6 |
| Baildon | 1 | NA | 9,46 | a | e,n,x |
| Bareilly | 6 | | 7 | y | 1,5 |
| Berta | 1 | NA | 9 | [f],g,[t] | — |
| Braenderup | 2 | | 4 | e,h | e,n,z15 |
| Bredeney | 1 | NA | 4 | l,v | 1,7 |
| Cerro | 1 | NA | 18 | z4,z23 | [1,5] |
| Chester | 1 | NA | 4 | e,h | e,n,x |
| Choleraesuis | 3 | | 7 | c | 1,5 |
| Cubana | 1 | NA | 13 | z29 | — |
| Derby | 1 | NA | 4 | f,g | [1,2] |
| Dublin | 2 | | 9 | g,p | — |
| Eastbourne | 1 | NA | 9 | e,h | 1,5 |
| Enteritidis | 3 | | 9 | g,m | — |
| Gallinarum | 2 | | 9 | — | — |
| Gaminara | 2 | | 16 | d | 1,7 |
| Give | 2 | 0.105 | 3,10 | l,v | 1,7 |
| Hadar | 2 | | 8 | z10 | e,n,x |
| Hartford | 1 | NA | 7 | y | e,n,x |
| Havana | 1 | NA | 13 | f,g,[s] | — |
| Heidelberg | 3 | | 4 | r | 1,2 |
| Hvittingfoss | 1 | NA | 16 | b | e,n,x |
| Indiana | 1 | NA | 4 | z | 1,7 |
| Infantis | 1 | NA | 7 | r | 1,5 |
| Inverness | 2 | | 38 | k | 1,6 |
| Javiana | 3 | | 9 | l,z28 | 1,5 |
| Johannesburg | 1 | NA | 40 | b | e,n,x |
| Kentucky | 4 | | 8 | i | z6 |
| Litchfield | 1 | NA | 8 | l,v | 1,2 |
| London | 1 | NA | 3,10 | l,v | 1,6 |
| Manhattan | 1 | NA | 8 | d | 1,5 |
| Mbandaka | 1 | NA | 7 | z10 | e,n,z15 |
| Meleagridis | 1 | NA | 3,10 | e,h | l,w |
| Miami | 1 | NA | 9 | a | 1,5 |
| Minnesota | 2 | | 21 | b | e,n,x |
| Mississippi | 1 | NA | 13 | b | 1,5 |
| Montevideo | 5 | | 7 | g,m,[p],s | [1,2,7] |
| Muenchen | 3 | | 8 | d | 1,2 |
| Muenster | 2 | | 3,10 | e,h | 1,5 |
| Nchanga | 2 | | 3,10 | l,v | 1,2 |
| Newport | 7 | | 8 | e,h | 1,2 |
| Norwich | 1 | NA | 7 | e,h | 1,6 |
| Ohio | 1 | NA | 7 | b | l,w |
| Oranienburg | 2 | | 7 | m,t | [z57] |
| Panama | 1 | NA | 28 | l,v | 1,5 |
| Paratyphi A | 3 | | 2 | a | [1,5] |
| Paratyphi B | 9 | | 4 | b | 1,2 |
| Paratyphi C | 1 | NA | 6,7 | c | 1,5 |
| Pomona | 1 | NA | 28 | y | 1,7 |
| Poona | 1 | NA | 13 | z | 1,6 |
| Pullorum | 2 | | 9 | — | — |
| Rissen | 1 | NA | 7 | f,g | — |
| Rubislaw | 2 | 0.497 | 11 | r | e,n,x |
| Saintpaul | 4 | | 4 | e,h | 1,2 |
| Schwarzengrund | 2 | | 4 | d | 1,7 |
| Senftenberg | 6 | | 1,3,19 | g,[s],t | — |
| Sloterdijk | 1 | NA | 4 | z35 | z6 |
| Soerenga | 1 | NA | 30 | i | l,w |
| Stanley | 1 | NA | 4 | d | 1,2 |
| Stanleyville | 1 | NA | 4 | z4,z23 | [1,2] |
| Tallahassee | 1 | NA | 8 | z4,z32 | — |
| Tennessee | 3 | | 7 | z29 | [1,2,7] |
| Thompson | 1 | NA | 7 | k | 1,5 |
| Typhi | 2 | | 9 | d | — |
| Typhimurium | 4 | | 4 | i | 1,2 |
| Uganda | 1 | NA | 3,10 | l,z13 | 1,5 |
| Urbana | 2 | | 30 | b | e,n,x |
| Virchow | 2 | | 7 | r | 1,2 |
| Wandsworth | 1 | NA | 39 | b | 1,2 |
| Weltevreden | 1 | NA | 3,10 | r | z6 |
| Worthington | 1 | NA | 13 | z | l,w |
Note.—GSI, genealogical sorting index. NA, not applicable.
Bayesian clustering results for values of k = 2–5 based on the matrix containing SNPs present in at least 95% of the samples (outgroups were excluded). Different colors represent different clusters and the bars represent different individuals. The extent to which different colors comprise a bar is indicative of the degree of admixture. Samples are in the same order as they are in the ML phylogeny (fig. 1), which is shown for comparison.
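The "present in at least 95% of the samples" criterion in the caption above can be sketched as a simple filter. The data layout (a dict mapping a locus ID to one allele per sample, with `-` marking an absent call) and the function name `filter_snps_by_presence` are assumptions made for illustration; the study's actual matrix format is not described here.

```python
def filter_snps_by_presence(snp_matrix, min_fraction=0.95, missing="-"):
    """Keep loci whose allele is called in at least `min_fraction` of samples."""
    kept = {}
    for locus, alleles in snp_matrix.items():
        present = sum(1 for allele in alleles if allele != missing)
        if alleles and present / len(alleles) >= min_fraction:
            kept[locus] = alleles
    return kept

# Hypothetical matrix with 20 samples per locus.
matrix = {
    "locus_a": ["A"] * 19 + ["-"],       # 95% present: kept
    "locus_b": ["C"] * 18 + ["-", "-"],  # 90% present: dropped
}
print(list(filter_snps_by_presence(matrix)))  # ['locus_a']
```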
SNP Annotations for Clade A and Clade B as Determined by kSNP
| kSNP Locus ID | Annot. | aa Change | Codon Change | Pos. in CDS | AE006468 Location | Gene | Locus Tag | Strand | Product Name |
|---|---|---|---|---|---|---|---|---|---|
| 115230 | Coding | A_A | GCC_GCG | 345 | 2417643 | | STM2310 | − | Isochorismate synthase |
| 139194 | Coding | P_P | CCC_CCG | 90 | 729566 | gltI, menF | STM0665 | − | Glutamate/aspartate transporter |
| 198788 | Coding | S_S | AGC_AGT | 126 | 2402046 | yfaZ | STM2294 | − | Putative inner membrane protein |
| 22659 | Coding | V_V | GTC_GTT | 125 | 1784052 | sapA | STM1692 | + | ABC superfamily peptide transport protein |
| 245765 | Coding | V_V | GTC_GTT | 206 | 1276770 | plsX | STM1192 | + | Putative fatty acid/phospholipid synthesis protein |
| 2656 | Coding | T_T | ACC_ACT | 57 | 17316 | | STM0017 | − | Putative protein |
| 266398 | Coding | A_A | GCC_GCT | 30 | 1518497 | | STM1441 | − | Putative inner membrane protein |
| 267325 | Coding | L_L | CTG_TTG | 370 | 552354 | fsr | STM0493 | − | Putative MFS family of transport protein |
| 34745 | Intergenic | | | | 80406 | | | + | Upstream of STM0068, transcriptional regulator |
| 365928 | Intergenic | | | | 1332201 | | | + | Upstream of STM1246, reduced macrophage survival protein |
| 370045 | Coding | S_S | TCA_TCG | 460 | 2727192 | lepA | STM2583 | − | GTP-binding elongation factor |
| 389879 | Coding | I_I | ATC_ATT | 97 | 729545 | gltI | STM0665 | − | Glutamate/aspartate transporter |
| | Coding | | | | | | | | |
| 421386 | Coding | P_P | CCC_CCT | 43 | 2022982 | yecG | STM1927 | + | Putative universal stress protein |
| 444415 | Coding | V_V | GTA_GTT | 109 | 4820845 | yhhI | STM4566 | − | Putative cytoplasmic protein |
| 457785 | Intergenic | | | | 3515395 | | | + | Upstream of STM3348, serine endoprotease |
| 461849 | Coding | T_T | ACC_ACT | 26 | 2438962 | nuoB | STM2327 | − | NADH dehydrogenase I chain B |
| 559334 | Coding | G_G | GGG_GGT | 294 | 1524368 | tyrS | STM1449 | + | Tyrosine tRNA synthetase |
| 572551 | Coding | T_T_T | ACA_ACC_ACT | 650 | 2618849 | rihC | STM2503 | − | Putative diguanylate cyclase |
| 609237 | Coding | P_P | CCA_CCG | 13 | 60202 | mig-14 | STM0051 | + | Putative purine nucleoside hydrolase |
| | Coding | | | | | | | | |
| 632030 | Coding | A_A_A | GCA_GCC_GCT | 502 | 132233 | leuA | STM0113 | − | 2-isopropylmalate synthase |
| 647650 | Intergenic | | | | 1528293 | | | − | Upstream of STM1542, putative POT family peptide transport protein |
| 78823 | Coding | L_L | TTA_TTG | 41 | 1832990 | cls | STM1739 | + | Cardiolipin synthase |
| 135358 | Intergenic | | | | 2449145 | | | + | Upstream of STM2338, phosphotransacetylase |
| 267325 | Coding | L_L | CTG_TTG | 370 | 552354 | fsr | STM0493 | − | Putative MFS family of transport protein |
| 271444 | Coding | Y_Y | TAC_TAT | 279 | 59782 | | STM0050 | + | Putative nitrite reductase |
| 370045 | Coding | S_S | TCA_TCG | 460 | 2727192 | lepA | STM2583 | − | GTP-binding elongation factor |
| | Coding | | | | | | | − | |
| 572551 | Coding | T_T_T | ACA_ACC_ACT | 650 | 2618849 | | STM2503 | − | Putative diguanylate cyclase |
Note.—SNPs that produce nonsynonymous amino acid substitutions are highlighted in bold.
ML phylogeny from figure 1, pruned to the strains for which we have CRISPR data (126 in-house collected draft genomes plus published complete genomes). The four most ancestral spacers were extracted from the CRISPR alignment in supplementary data S4, Supplementary Material online and mapped onto the tree. Spacers with the same coloring represent the exact same underlying sequence; different coloring represents a different underlying sequence. Blue bars represent CRISPR length, which was determined from the number of unaligned spacers for each CRISPR locus. Spacer deletions are represented by a black square with an x.
GSI Values for the Antigen Character States
| Antigens | Num Taxa | GSI | P |
|---|---|---|---|
| 8 | 11 | 0.27 | |
| 3,10 | 9 | 0.14 | 0.490 |
| 16 | 2 | 0.05 | 0.893 |
| 7 | 13 | 0.26 | |
| 4 | 20 | 0.16 | 0.461 |
| 9 | 9 | 0.24 | |
| 35 | 2 | 0.13 | 0.165 |
| 13 | 4 | 0.15 | 0.158 |
| 1,3,19 | 2 | 0.11 | 0.226 |
| 30 | 2 | 0.06 | 0.774 |
| 28 | 2 | 0.24 | 0.050 |
| 11 | 2 | 1.00 | |
| d | 7 | 0.17 | 0.130 |
| e,h | 11 | 0.14 | 0.544 |
| l,v | 7 | 0.16 | 0.168 |
| b | 10 | 0.15 | 0.361 |
| y | 4 | 0.10 | 0.546 |
| z10 | 2 | 0.10 | 0.284 |
| a | 3 | 0.10 | 0.395 |
| i | 5 | 0.15 | 0.173 |
| r | 5 | 0.17 | 0.078 |
| – | 2 | 0.49 | |
| z4,z23 | 3 | 0.10 | 0.474 |
| k | 2 | 0.07 | 0.607 |
| c | 2 | 1.00 | |
| f,g | 3 | 0.13 | 0.169 |
| z | 3 | 0.11 | 0.297 |
| z29 | 2 | 0.24 | 0.052 |
| g,s,t | 2 | 0.11 | 0.226 |
| f,g,s | 3 | 0.20 | |
| 1,2 | 20 | 0.24 | |
| 1,5 | 15 | 0.27 | |
| 1,6 | 5 | 0.16 | 0.122 |
| e,n,x | 10 | 0.21 | |
| e,n,z15 | 2 | 0.10 | 0.288 |
| – | 16 | 0.28 | |
| l,w | 4 | 0.16 | 0.106 |
| 1,7 | 7 | 0.26 | |
| 1,2,7 | 2 | 0.06 | 0.766 |
| z6 | 4 | 0.25 | |
Note: P values < 0.05 are in bold.
MultiState Lambda Test for Phylogenetic Signal
| Character | Ultrametric Tree: Trait1.lnl | Ultrametric Tree: Trait1.q | Ultrametric Tree: Trait1.treeParam | Star Phylogeny (Null): Trait1.lnl.1 | Star Phylogeny (Null): Trait1.q.1 | Lambda Test: Likelihood Ratio | Lambda Test: P |
|---|---|---|---|---|---|---|---|
| O antigens | −226.91 | −1.74 | 0.90 | −244.18 | 11.34 | 34.53 | 4.19E-09 |
| H antigen, phase 1 | −269.36 | −3.41 | 0.98 | −287.13 | −16.63 | 35.53 | 2.51E-09 |
| H antigen, phase 2 | −185.02 | −1.44 | 0.90 | −216.91 | −14.40 | 63.78 | 1.39E-15 |
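The likelihood ratio column can be recomputed from the two log-likelihoods in each row as twice their difference. The sketch below does this and converts the statistic to a P value, assuming a chi-square distribution with 1 degree of freedom; that choice is an assumption here, although it gives values close to those in the last column. The function name `lambda_lrt` is hypothetical.

```python
import math

def lambda_lrt(lnl_tree, lnl_star):
    """Likelihood ratio test of the ultrametric tree against the star phylogeny.

    Returns (likelihood_ratio, p_value). The P value assumes a chi-square
    distribution with 1 degree of freedom (an assumption of this sketch),
    whose survival function equals erfc(sqrt(x / 2)).
    """
    lr = 2.0 * (lnl_tree - lnl_star)
    p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))
    return lr, p_value

# Log-likelihoods from the "O antigens" row of the table.
lr, p = lambda_lrt(-226.91, -244.18)
print(round(lr, 2), p)
```

For the O antigens row this gives a likelihood ratio of about 34.54 and a P value of roughly 4.2E-09, in line with the tabulated 34.53 and 4.19E-09 (the small difference is rounding of the reported log-likelihoods).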