James B Pettengill, Ruth E Timme, Rodolphe Barrangou, Magaly Toro, Marc W Allard, Errol Strain, Steven M Musser, Eric W Brown.
Abstract
Evolutionary studies of clustered regularly interspaced short palindromic repeats (CRISPRs) and their associated (cas) genes can provide insights into host-pathogen co-evolutionary dynamics and the frequency at which different genomic events (e.g., horizontal vs. vertical transmission) occur. Within this study, we used whole genome sequence (WGS) data to determine the evolutionary history and genetic diversity of CRISPR loci and cas genes among a diverse set of 427 Salmonella enterica ssp. enterica isolates representing 64 different serovars. We also evaluated the performance of CRISPR loci for typing when compared to whole genome and multilocus sequence typing (MLST) approaches. We found that there was high diversity in array length within both CRISPR1 (median = 22; min = 3; max = 79) and CRISPR2 (median = 27; min = 2; max = 221). There was also much diversity within serovars (e.g., arrays differed by as many as 50 repeat-spacer units among Salmonella ser. Senftenberg isolates). Interestingly, we found that there are two general cas gene profiles that do not track phylogenetic relationships, which suggests that non-vertical transmission events have occurred frequently throughout the evolutionary history of the sampled isolates. There is also considerable variation among the ranges of pairwise distances estimated within each cas gene, which may be indicative of the strength of natural selection acting on those genes. We developed a novel clustering approach based on CRISPR spacer content, but found that typing based on CRISPRs was less accurate than the MLST-based alternative; typing based on WGS data was the most accurate. Notwithstanding cost and accessibility, we anticipate that draft genome sequencing, due to its greater discriminatory power, will eventually become routine for traceback investigations.
Keywords: CRISPR; Evolution; Horizontal gene transfer; Outbreak; Phylogeny; Salmonella; Typing; Whole genome sequencing
Year: 2014 PMID: 24765574 PMCID: PMC3994646 DOI: 10.7717/peerj.340
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
The major clade within S. enterica to which each serovar belongs, number of isolates (N), and the average length of CRISPR1 (L1) and CRISPR2 (L2). Lengths are the number of spacers.
| Subspecies and Serovar | Clade | N | L1 | L2 |
|---|---|---|---|---|
| | A | 2 | 14 | 4 |
| | A | 37 | 18 | 35 |
| | A | 1 | 17 | 16 |
| | A | 2 | 47 | 10 |
| | A | 2 | 13 | 27 |
| | A | 1 | 11 | 6 |
| | A | 2 | 41 | 55 |
| | B | 1 | 18 | 24 |
| | A | 1 | 43 | 48 |
| | B | 1 | 3 | 2 |
| | A | 2 | 16 | 10 |
| | A | 1 | 25 | 28 |
| | A | 1 | 27 | 60 |
| | A | 2 | 11 | 5 |
| | B | 1 | 3 | 2 |
| | A | 103 | 19 | 18 |
| | A | 1 | 21 | 4 |
| | B | 1 | 32 | 38 |
| | B | 1 | 70 | 26 |
| | A | 1 | 32 | 56 |
| | A | 1 | 32 | 51 |
| | A | 1 | 3 | 40 |
| | A | 55 | 33 | 52 |
| | A | 1 | 35 | 44 |
| | A | 1 | 21 | 20 |
| | B | 3 | 22 | 15 |
| | A | 9 | 42 | 30 |
| | A | 1 | 15 | 8 |
| | A | 1 | 47 | 4 |
| | A | 1 | 29 | 35 |
| | A | 1 | 5 | 22 |
| | A | 1 | 33 | 221 |
| | A | 1 | 51 | 48 |
| | B | 1 | 15 | 16 |
| | B | 1 | 39 | 26 |
| | B | 51 | 33 | 38 |
| | A | 3 | 14 | 31 |
| | B | 2 | 48 | 126 |
| | A | 2 | 20 | 10 |
| | A | 61 | 32 | 27 |
| | A | 1 | 5 | 4 |
| | A | 1 | 7 | 5 |
| | B | 3 | 19 | 56 |
| | B | 1 | 19 | 2 |
| | A | 1 | 7 | 14 |
| | A | 8 | 19 | 28 |
| | B | 1 | 23 | 2 |
| | B | 1 | 11 | 72 |
| | A | 4 | 13 | 4 |
| | B | 1 | 69 | 77 |
| | A | 1 | 47 | 60 |
| | B | 1 | 3 | 2 |
| | A | 3 | 44 | 23 |
| | A | 8 | 61 | 51 |
| | A | 1 | 29 | 42 |
| | A | 1 | 79 | 70 |
| | A | 1 | 41 | 22 |
| | A | 1 | 21 | 4 |
| | A | 1 | 17 | 6 |
| | A | 4 | 40 | 66 |
| | A | 18 | 53 | 43 |
| | B | 1 | 23 | 4 |
| | A | 1 | 33 | 72 |
| | A | 1 | 16 | 21 |
| | N/A | 1 | 15 | 40 |
| | N/A | 1 | 29 | 73 |
| | N/A | 1 | 17 | 0 |
| | N/A | 1 | 3 | 40 |
| Summary | N/A | 431 | 27 | 33 |
Notes.
The total number of isolates and average lengths of CRISPR1 and CRISPR2 across all serovars.
Figure 1. CRISPR variation.
Variation in CRISPR locus length among the 431 isolates for both locus 1 and locus 2. Boxes depict the interquartile range (IQR) and whiskers indicate 1.5 × IQR; the horizontal black line represents the mean.
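The box-and-whisker quantities named in this caption can be sketched as follows. This is a minimal illustration only: the array lengths are made-up toy values (not the study data), the quartile method ("inclusive" interpolation) is an assumption, and the whiskers are assumed to follow the common convention of extending to the most extreme observation within 1.5 × IQR of the box.

```python
from statistics import mean, quantiles

def boxplot_summary(lengths):
    """Return the quantities a box plot of array lengths would display."""
    # Quartiles by linear interpolation (assumed method; plotting tools differ).
    q1, median, q3 = quantiles(lengths, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [x for x in lengths if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [x for x in lengths if x < low_fence or x > high_fence],
        "mean": mean(lengths),
    }

# Hypothetical spacer counts for seven isolates (toy values).
toy_lengths = [3, 11, 19, 22, 33, 47, 79]
summary = boxplot_summary(toy_lengths)
```

With these toy values the longest array falls outside the upper fence and is reported as an outlier rather than stretching the whisker.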
Figure 2. Whole genome and cas gene phylogenies.
Phylogenetic relationships among the 431 isolates determined using whole genome sequencing data from which a SNP matrix was created using the k-mer based approach implemented in kSNP [39]. Bootstrap values are based on 100 traditional replicates created using seqboot within the phylip package [60]. The two cas gene profiles are also shown as different colors at the tips (cas type a (I-Ea) = blue; cas type b (I-Eb) = red). Branch width is indicative of bootstrap support value (thickest lines depict >80% bootstrap support). Gray colored branches represent lineages found in Clade B [16,38]; all other lineages except the outgroups belong to Clade A. The insert shows the phylogenetic relationships based on a phylogeny constructed using only the cas genes with tips colored according to cas type as shown in the larger phylogeny.
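As a rough illustration of what a traditional (nonparametric) bootstrap replicate is, the sketch below resamples the columns of a SNP matrix with replacement. It is not the seqboot implementation; the function name `bootstrap_replicate`, the toy matrix, and the fixed random seed are all assumptions made for the example.

```python
import random

def bootstrap_replicate(alignment, rng):
    """Build one pseudo-replicate by sampling sites (columns) with replacement."""
    names = list(alignment)
    n_sites = len(alignment[names[0]])
    # Draw as many column indices as there are sites; repeats are expected.
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}

# Hypothetical three-isolate SNP matrix (toy data, not from the study).
toy_snps = {"isolate_a": "ACGTA", "isolate_b": "ACGTT", "isolate_c": "TCGAA"}

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_replicate(toy_snps, rng) for _ in range(100)]
```

Each replicate would then be analysed with the same tree-building method, and the support for a branch is the fraction of replicate trees that contain it.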
Figure 3. Genealogical sorting index (gsi) results per dataset.
Boxplot illustrating the differences among the four datasets in gsi values, which were used as a metric to quantify how well the datasets constructed relationships congruent with taxonomy. Boxes depict the interquartile range (IQR) and whiskers indicate 1.5 × IQR; the horizontal black line represents the mean. Gray dots represent observed values within each dataset and are dispersed horizontally (jittered) to decrease overlap.
Figure 4. MLST phylogeny.
Phylogenetic relationships among the sampled isolates based on MLST matrix. Branch width is indicative of bootstrap support value (thickest lines depict >80% bootstrap support).
Figure 5. CRISPR1 phenogram.
Phenogram depicting similarity among isolates in spacer composition of CRISPR1. Branch width is indicative of bootstrap support value (thickest lines depict >80% bootstrap support).
Figure 6. CRISPR2 phenogram.
Phenogram depicting similarity among isolates in spacer composition of CRISPR2. Branch width is indicative of bootstrap support value (thickest lines depict >80% bootstrap support).
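Comparing isolates by spacer composition presupposes some encoding of which spacers each array contains. One plausible encoding, sketched here under assumptions, is a binary presence/absence matrix with one column per distinct spacer. The spacer identifiers, isolate names, and the function `presence_absence` are hypothetical and are not taken from the study's pipeline.

```python
def presence_absence(arrays):
    """Encode each isolate's spacer array as a 0/1 row over all observed spacers."""
    # Column order: every distinct spacer seen in any isolate, sorted for stability.
    spacers = sorted({s for arr in arrays.values() for s in arr})
    matrix = {}
    for isolate, arr in arrays.items():
        present = set(arr)
        matrix[isolate] = [1 if s in present else 0 for s in spacers]
    return spacers, matrix

# Hypothetical spacer arrays for three isolates (toy data).
toy_arrays = {
    "iso1": ["sp1", "sp2", "sp3"],
    "iso2": ["sp2", "sp3"],
    "iso3": ["sp4"],
}
columns, pa_matrix = presence_absence(toy_arrays)
```

Note that this encoding discards spacer order and repeat counts, so two arrays with the same spacers in a different order would look identical.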
Figure 7. DISTRUCT diagram depicting clusters based on CRISPR spacer similarity.
Model-based clustering results showing the assignment of individuals to different groups based on similarity in SNP profiles. Only serovars for which >3 isolates were sequenced are shown. Colors indicate the different clusters and the degree to which a vertical bar consists of multiple colors is indicative of the proportion of SNPs that resemble a particular cluster.
Figure 8. Pairwise distances among individuals for each dataset.
Intra- and inter-serovar pairwise distance histograms for (A) the kSNP matrix, (B) MLST matrix, (C) CRISPR1 presence/absence matrix, and (D) CRISPR2 presence/absence matrix. Note that scales on the x-axis differ due to the method used to calculate distances and scale on the y-axis differs as a result of different binwidths and distribution of observations within each bin.
Mean intra- and inter-serovar pairwise distance estimates for the four different marker datasets.
| Marker type | Intra-serovar | Inter-serovar |
|---|---|---|
| CRISPR1 | 1.9243 | 4.9884 |
| CRISPR2 | 1.4003 | 5.4235 |
| SNP | 0.0068 | 0.0725 |
| MLST | 0.0011 | 0.0112 |
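The kind of summary reported in this table can be sketched as below: every pair of isolates is compared, and the distance is pooled as intra-serovar or inter-serovar depending on whether the two isolates share a serovar label. The distance used here (a Hamming count over presence/absence rows) is an assumption chosen for simplicity, since the metric behind the tabulated values is not stated in this record; the profiles and serovar labels are toy values.

```python
from itertools import combinations
from statistics import mean

def hamming(a, b):
    """Number of positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_intra_inter(profiles, serovars):
    """Mean pairwise distance within serovars and between serovars."""
    intra, inter = [], []
    for i, j in combinations(range(len(profiles)), 2):
        d = hamming(profiles[i], profiles[j])
        # Pool the distance by whether the pair shares a serovar label.
        (intra if serovars[i] == serovars[j] else inter).append(d)
    return mean(intra), mean(inter)

# Hypothetical presence/absence rows for four isolates in two serovars.
toy_profiles = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
toy_serovars = ["X", "X", "Y", "Y"]
intra_mean, inter_mean = mean_intra_inter(toy_profiles, toy_serovars)
```

A marker that is useful for typing should give an inter-serovar mean well above the intra-serovar mean, which is the pattern the table shows for all four datasets.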