Lavanya Challagundla1, Xiao Luo1, Isabella A Tickler2, Xavier Didelot3, David C Coleman4, Anna C Shore4, Geoffrey W Coombs5,6, Daniel O Sordelli7, Eric L Brown8, Robert Skov9, Anders Rhod Larsen9, Jinnethe Reyes10, Iraida E Robledo11, Guillermo J Vazquez11, Raul Rivera11, Paul D Fey12, Kurt Stevenson13, Shu-Hua Wang13, Barry N Kreiswirth14, Jose R Mediavilla14, Cesar A Arias10,15, Paul J Planet16, Rathel L Nolan17, Fred C Tenover2, Richard V Goering18, D Ashley Robinson19.
Abstract
The USA300 North American epidemic (USA300-NAE) clone of methicillin-resistant Staphylococcus aureus has caused a wave of severe skin and soft tissue infections in the United States since it emerged in the early 2000s, but its geographic origin is obscure. Here we use the population genomic signatures expected from the serial founder effects of a geographic range expansion to infer the origin of USA300-NAE and identify polymorphisms associated with its spread. Genome sequences from 357 isolates from 22 U.S. states and territories and seven other countries are compared. We observe two significant signatures of range expansion, including decreases in genetic diversity and increases in derived allele frequency with geographic distance from the Pennsylvania region. These signatures account for approximately half of the core nucleotide variation of this clone, occur genome wide, and are robust to heterogeneity in temporal sampling of isolates, human population density, and recombination detection methods. The potential for positive selection of a gyrA fluoroquinolone resistance allele and several intergenic regions, along with a 2.4 times higher recombination rate in a resistant subclade, is noted. These results are the first to show a pattern of genetic variation that is consistent with a range expansion of an epidemic bacterial clone, and they highlight a rarely considered but potentially common mechanism by which genetic drift may profoundly influence bacterial genetic variation.
IMPORTANCE The process of geographic spread of an origin population by a series of smaller populations can result in distinctive patterns of genetic variation. We detect these patterns for the first time with an epidemic bacterial clone and use them to uncover the clone's geographic origin and variants associated with its spread. We study the USA300 clone of methicillin-resistant Staphylococcus aureus, which was first noticed in the early 2000s and subsequently became the leading cause of skin and soft tissue infections in the United States. The eastern United States is the most likely origin of epidemic USA300. Relatively few variants, which include an antibiotic resistance mutation, have persisted during this clone's spread. Our study suggests that an early chapter in the genetic history of this epidemic bacterial clone was greatly influenced by random subsampling of isolates during the clone's geographic spread.
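To show why serial founder effects erode diversity with distance from an origin, we can use a textbook approximation. It is not taken from this study. Assume each newly colonized deme is founded by n haploid cells drawn at random from the previous deme (Wright-Fisher sampling, with no mutation, migration, or selection). Then expected gene diversity is multiplied by (1 - 1/n) at every founder event. The function names and parameter values below are hypothetical, and treating the number of founder events as a proxy for geographic distance is an assumption of the sketch.

```python
"""Expected gene diversity along a chain of serial founder events.

Textbook approximation, not the study's model: haploid Wright-Fisher
sampling with no mutation, migration, or selection (assumptions).
"""


def expected_diversity(h0: float, n_founders: int, n_events: int) -> float:
    """Expected diversity after n_events founder events of n_founders cells.

    Two haploid lineages drawn from a deme founded by n cells share a founder
    with probability 1/n, so each event keeps a fraction (1 - 1/n) of diversity.
    """
    if n_founders < 1:
        raise ValueError("n_founders must be at least 1")
    if n_events < 0:
        raise ValueError("n_events must be non-negative")
    return h0 * (1.0 - 1.0 / n_founders) ** n_events


def diversity_cline(h0: float, n_founders: int, n_demes: int) -> list[float]:
    """Expected diversity in demes 0..n_demes-1, where deme 0 is the origin."""
    return [expected_diversity(h0, n_founders, k) for k in range(n_demes)]


if __name__ == "__main__":
    # Hypothetical values: origin diversity 0.5, 10 founders per event.
    for k, h in enumerate(diversity_cline(h0=0.5, n_founders=10, n_demes=5)):
        print(f"deme {k}: expected diversity {h:.4f}")
```

With these hypothetical values, diversity falls by 10% at each founder event. Demes reached after more founder events therefore carry less diversity, which is the kind of decline the abstract describes.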
Keywords: epidemics; fluoroquinolones; founder effects; genetic drift; population genetics; range expansion
Year: 2018 PMID: 29295910 PMCID: PMC5750399 DOI: 10.1128/mBio.02016-17
Source DB: PubMed Journal: mBio Impact factor: 7.867
FIG 1 Maximum-likelihood (ML) phylogeny of USA300 with branch lengths corrected for recombination by ClonalFrameML (CFML). Branch color indicates bootstrap support for the tipward node. Circle 1 indicates clades. Circle 2 indicates one or more fluoroquinolone (FQ) resistance mutations. The asterisk marks an isolate with a resistance mutation in gyrA but not in grlA. Circle 3 indicates the geographic source of each isolate.
Population genomic summary statistics for 15 populations of USA300-NAE
| Population(s) | No. of isolates | Avg pairwise FST | θπ | θW | Tajima’s D | P (Tajima’s D) | Avg tip-to-tip distance | Sum of pairwise Ψ |
|---|---|---|---|---|---|---|---|---|
| All 15 | 265 | 0.0345 | 0.0112 | 0.1483 | −2.9279 | 0.0034 | 1.8617 | |
| MS | 33 | 0.0350 | 0.0085 | 0.0238 | −2.4578 | 0.0139 | 1.4748 | 2.5384 |
| CA | 29 | 0.0471 | 0.0094 | 0.0245 | −2.4000 | 0.0164 | 1.5681 | 0.8437 |
| IL | 27 | 0.0205 | 0.0104 | 0.0331 | −2.7086 | 0.0068 | 1.6446 | 0.1011 |
| NY | 25 | 0.0288 | 0.0119 | 0.0311 | −2.4655 | 0.0137 | 1.846 | −1.1768 |
| NE | 22 | 0.0431 | 0.0117 | 0.0225 | −1.9546 | 0.0506 | 1.8624 | −0.0964 |
| TX | 21 | 0.0266 | 0.0104 | 0.0255 | −2.4299 | 0.0151 | 1.6183 | 0.0379 |
| OH | 17 | 0.0291 | 0.0151 | 0.032 | −2.2684 | 0.0233 | 2.1435 | −2.3722 |
| PA | 17 | 0.0244 | 0.0124 | 0.0255 | −2.2082 | 0.0272 | 1.8791 | −1.1698 |
| GA | 16 | 0.0200 | 0.0133 | 0.0278 | −2.2684 | 0.0233 | 1.9802 | −1.69 |
| FL | 14 | 0.0197 | 0.0096 | 0.0201 | −2.3475 | 0.0189 | 1.4509 | 0.7842 |
| IA | 11 | 0.0269 | 0.0099 | 0.0156 | −1.7511 | 0.0799 | 1.5167 | 0.5411 |
| OR | 11 | 0.0665 | 0.0095 | 0.0138 | −1.5072 | 0.1318 | 1.4318 | 0.4013 |
| MA | 9 | 0.0238 | 0.0131 | 0.0183 | −1.4688 | 0.1419 | 1.8973 | −1.2202 |
| WA | 7 | 0.0643 | 0.0079 | 0.0088 | −0.5311 | 0.5953 | 1.2155 | 3.2673 |
| SC | 6 | 0.0411 | 0.0116 | 0.0133 | −0.8322 | 0.4053 | 1.6125 | −0.7896 |
The average pairwise FST, θπ, θW, and Tajima’s D values were based on 2,599 biallelic, nonrecombinant SNPs.
The average tip-to-tip distance reflects branch lengths on an ML tree corrected for recombinant sites.
The sum of pairwise Ψ values was based on 2,595 biallelic, nonrecombinant SNPs where ancestral and derived alleles were assigned.
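The diversity statistics in these footnotes can be computed from a matrix of biallelic SNPs. The sketch below implements the standard textbook estimators of θπ, Watterson's θW, and Tajima's D. It is not the study's pipeline. The 0/1 haplotype coding and the function names are assumptions, and so is the choice to report totals over the supplied SNPs rather than per-site values, because the per-site normalization used in the table is not stated here.

```python
"""theta_pi, Watterson's theta_W, and Tajima's D from biallelic SNPs.

Input (assumed): one row per isolate, one column per biallelic SNP, coded 0/1.
Values are totals over the supplied SNPs; divide by a site count for
per-site estimates.
"""
import math


def allele_counts(haplotypes):
    """Return sample size n and the count of allele 1 at each SNP."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    n_sites = len(haplotypes[0])
    return n, [sum(row[j] for row in haplotypes) for j in range(n_sites)]


def harmonic(n, power=1):
    """Sum of 1 / i**power for i = 1 .. n - 1."""
    return sum(1.0 / i ** power for i in range(1, n))


def theta_pi(haplotypes):
    """Mean number of SNP differences between two randomly chosen isolates."""
    n, counts = allele_counts(haplotypes)
    n_pairs = n * (n - 1) / 2
    # A site with k copies of allele 1 differs in k * (n - k) of the pairs.
    return sum(k * (n - k) for k in counts) / n_pairs


def segregating_sites(haplotypes):
    """Number of SNP columns that are polymorphic in this sample."""
    n, counts = allele_counts(haplotypes)
    return sum(1 for k in counts if 0 < k < n)


def theta_w(haplotypes):
    """Watterson's estimator: segregating sites divided by a1."""
    return segregating_sites(haplotypes) / harmonic(len(haplotypes))


def tajimas_d(haplotypes):
    """Tajima's D: theta_pi - theta_W, divided by its estimated std. deviation."""
    n = len(haplotypes)
    if n < 4:
        raise ValueError("Tajima's D needs at least four haplotypes")
    s = segregating_sites(haplotypes)
    if s == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1, a2 = harmonic(n), harmonic(n, 2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    return (theta_pi(haplotypes) - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    # Hypothetical sample dominated by singletons, which pushes D below zero.
    haps = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 0]]
    print(theta_pi(haps), theta_w(haps), tajimas_d(haps))
```

A negative D, as in every row of the table above, reflects an excess of low-frequency variants relative to the neutral constant-size expectation.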
Relationships among genetic differentiation, geographic distance, and connectivity from all pairwise comparisons of 15 populations of USA300-NAE
| Test | All isolates, r | All isolates, P | Recent isolates, r | Recent isolates, P |
|---|---|---|---|---|
| Mantel | ||||
| Geographic distance | 0.397 | 0.0096 | 0.355 | 0.0187 |
| Log total airline passengers | −0.330 | 0.0732 | −0.384 | 0.0408 |
| Log total migrants | −0.210 | 0.1656 | −0.257 | 0.1112 |
| Partial Mantel | ||||
| Geographic distance accounting for log | 0.371 | 0.0245 | 0.325 | 0.0585 |
| Log total airline passengers accounting | −0.297 | 0.0974 | −0.357 | 0.0630 |
The Mantel and partial Mantel tests were performed with 10,000 permutations.
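A Mantel test asks whether two distance matrices over the same populations are correlated. It judges significance by permuting population labels. The sketch below uses the Pearson correlation of the above-diagonal entries and defaults to 10,000 permutations, matching the count above. Several choices are assumptions and may differ from the study's software: the two-sided P value (comparing |r|), the +1 correction, and all function names. The partial Mantel test, which controls for a third matrix, is not implemented here.

```python
"""Mantel test between two distance matrices over the same populations."""
import math
import random


def upper_triangle(matrix, order=None):
    """Entries above the diagonal, optionally after relabeling rows/columns."""
    n = len(matrix)
    idx = list(range(n)) if order is None else order
    return [matrix[idx[i]][idx[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=10_000, seed=0):
    """Return (r, P): Mantel r and a two-sided permutation P value."""
    rng = random.Random(seed)
    x = upper_triangle(dist_a)
    r_obs = pearson(x, upper_triangle(dist_b))
    order = list(range(len(dist_b)))
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(order)  # relabel populations in dist_b only
        r_perm = pearson(x, upper_triangle(dist_b, order))
        # Small tolerance so ties caused by float rounding count as extreme.
        if abs(r_perm) >= abs(r_obs) - 1e-12:
            extreme += 1
    # +1 counts the observed labeling as one of the permutations.
    return r_obs, (extreme + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical example: 1-D positions and a genetic distance that grows with them.
    pos = [0, 1, 2, 3, 4, 5]
    geo = [[abs(a - b) for b in pos] for a in pos]
    gen = [[0.01 * abs(a - b) + (0.002 if a != b else 0) for b in pos] for a in pos]
    r, p = mantel(geo, gen, permutations=999)
    print(f"r = {r:.3f}, P = {p:.4f}")
```

Permuting rows and columns together with one shuffled order keeps each matrix internally consistent. Only the pairing between the two matrices is broken, which is what the null hypothesis requires.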
FIG 2 Signatures of range expansion and the origin of USA300-NAE. Panels A to C show the regressions of θπ, tip-to-tip distance, and Ψ, respectively, with geographic distance from Pennsylvania, when all isolates (from 2001 to 2011) were used. A solid line indicates the linear regression, and dotted lines indicate the 95% confidence intervals. The maps in panels A to C illustrate the correlations when each of the 15 populations was used as the origin, with interpolation of values between populations. Stronger evidence of origin is shown as yellow. Panels D to F show the results obtained when only recent isolates (from 2007 to 2011) were used. The asterisks on the maps in panels C and F indicate the coordinates of the selected origin when the nonlinear regression of Ψ was used.
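The origin search summarized in the maps can be sketched in four steps:

1. Treat each sampled population in turn as the candidate origin.
2. Compute great-circle distances from that candidate to every population.
3. Correlate a diversity statistic such as θπ with those distances.
4. Under a range expansion, expect the true origin to give the most negative correlation.

This is a simplified stand-in, not the study's regression or interpolation. It uses only the Pearson correlation, omits significance testing, and assumes a spherical Earth of radius 6,371 km. The population names, coordinates, and values in the example are hypothetical.

```python
"""Rank candidate origins by how strongly a statistic declines with distance."""
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(h)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def scan_origins(coords, statistic):
    """Return (r, origin) pairs sorted with the most negative r first.

    coords: {population: (latitude, longitude)}
    statistic: {population: value}, e.g. theta_pi per population.
    """
    pops = sorted(coords)
    values = [statistic[p] for p in pops]
    ranked = []
    for origin in pops:
        lat0, lon0 = coords[origin]
        dists = [haversine_km(lat0, lon0, *coords[p]) for p in pops]
        ranked.append((pearson(dists, values), origin))
    ranked.sort()
    return ranked


if __name__ == "__main__":
    # Hypothetical populations along a parallel, with diversity falling westward.
    coords = {"A": (40.0, -75.0), "B": (40.0, -85.0), "C": (40.0, -95.0), "D": (40.0, -105.0)}
    theta = {"A": 0.020, "B": 0.015, "C": 0.010, "D": 0.005}
    for r, origin in scan_origins(coords, theta):
        print(f"origin {origin}: r = {r:+.3f}")
```

For a statistic expected to increase with distance, such as Ψ, the best origin would instead be the one with the most positive correlation.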
FIG 3 Power analysis of signatures of range expansion. USA300-NAE SNPs were randomly downsampled to make 20 new data sets for each bin of downsampled SNPs and tested for a significant origin by using the signature of a decrease in θπ (panel A) or an increase in Ψ (panel B). Yellow represents those data sets giving a significant origin in the Pennsylvania region (i.e., Pennsylvania, Ohio, New York, or Massachusetts), red represents any other significant origin, and white represents a nonsignificant origin.
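The downsampling design of this power analysis can be sketched in three steps:

1. For each target number of SNPs, draw 20 random subsets of SNP columns without replacement.
2. Rerun the origin test on each subset.
3. Tally whether the test returns an origin in the target region, another significant origin, or no significant origin.

The `classify` callback stands in for the origin test and is hypothetical. The outcome labels and the use of exact subset sizes rather than bins are also assumptions.

```python
"""Downsample SNP columns and tally origin-test outcomes per subset size."""
import random
from collections import Counter


def downsample_columns(n_sites, sizes, replicates=20, seed=1):
    """For each size, draw `replicates` random SNP subsets without replacement."""
    rng = random.Random(seed)
    return {
        size: [sorted(rng.sample(range(n_sites), size)) for _ in range(replicates)]
        for size in sizes
    }


def power_table(haplotypes, sizes, classify, replicates=20, seed=1):
    """Tally classify() outcomes for each subset size.

    classify(sub_haplotypes) is a hypothetical callback returning "region"
    (origin in the target region), "other" (another significant origin),
    or "ns" (no significant origin).
    """
    n_sites = len(haplotypes[0])
    table = {}
    for size, subsets in downsample_columns(n_sites, sizes, replicates, seed).items():
        tally = Counter()
        for cols in subsets:
            sub = [[row[j] for j in cols] for row in haplotypes]
            tally[classify(sub)] += 1
        table[size] = tally
    return table


if __name__ == "__main__":
    # Toy data and a dummy classifier, for illustration only.
    rng = random.Random(0)
    haps = [[rng.randint(0, 1) for _ in range(100)] for _ in range(10)]
    dummy = lambda sub: "region" if len(sub[0]) >= 50 else "ns"
    print(power_table(haps, sizes=[10, 50, 90], classify=dummy))
```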
FIG 4 Identification of well-sampled derived alleles of USA300-NAE. Panel A shows the relationship between the frequency of derived alleles and the number of colonized populations when all isolates (from 2001 to 2011; black dots), recent isolates (from 2007 to 2011; red dots), or all isolates with allele frequencies binned into one of five 20% bins before averaging across populations (blue dots) were used. Panels B and C show the significant positive frequency gradients for the gyrA and ssa-1 alleles, respectively. The data points, regression lines, and 95% confidence intervals are shown for all isolates (in black) along with the regression lines for recent isolates (red lines) and for all isolates with binned allele frequencies (blue lines). The different origin populations giving the best correlation with the different analyses are indicated in panel C.
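The Ψ statistic summed in the population summary table and regressed in the Fig 2 panels compares derived allele frequencies between pairs of populations. That is the same derived-allele signal examined in this figure. The sketch below computes a simplified pairwise Ψ: the mean difference in derived allele frequency over SNPs where the derived allele is observed in both populations. It then sums Ψ for each focal population against all others. It does not attempt the sample-size correction a full estimator would need. The sign convention is an assumption: Ψ is positive when the focal population carries derived alleles at higher frequency, as expected farther from the origin. The 0/1 coding (1 = derived) and the function names are also assumptions.

```python
"""Simplified pairwise Psi and its per-population sum (assumed conventions)."""


def derived_frequencies(haplotypes):
    """Frequency of the derived allele (coded 1) at each SNP column."""
    n = len(haplotypes)
    return [sum(row[j] for row in haplotypes) / n for j in range(len(haplotypes[0]))]


def psi(hap_ref, hap_focal):
    """Mean (f_focal - f_ref) over SNPs whose derived allele is seen in both samples."""
    f_ref = derived_frequencies(hap_ref)
    f_focal = derived_frequencies(hap_focal)
    shared = [(r, f) for r, f in zip(f_ref, f_focal) if r > 0 and f > 0]
    if not shared:
        return 0.0  # no informative SNPs for this pair
    return sum(f - r for r, f in shared) / len(shared)


def summed_psi(populations):
    """Sum of pairwise Psi for each focal population against all others.

    populations: {name: haplotype matrix}, all with the same SNP columns.
    """
    return {
        focal: sum(psi(populations[other], populations[focal])
                   for other in populations if other != focal)
        for focal in populations
    }
```

Under this convention, populations with large positive sums behave as if they lie farther from the origin than most other populations.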
Unusual SNPs from various population genomic analyses
| Category and TCH1516 position(s) | TCH1516 locus, gene | Comment |
|---|---|---|
| SNPs that are polymorphic in both USA300-NAE | ||
| 460 | Intergenic before 0001 | |
| 14575 | Intergenic 0008–0009 | |
| 233245 | Coding 0210, hypothetical protein | nSNP |
| 2542210 | Intergenic 2401–2402 | |
| SNPs with derived allele present in ≥10 populations | ||
| 7282 | Coding 0006, | nSNP |
| 2708710 | Coding 2561, | sSNP |
| SNPs within intervals that were recombined | ||
| 168833–169219 | Intergenic 0159–0160 | Recombined on 3 or 4 branches |
| 672683–672727 | Intergenic 0614–0615 | Recombined on 3 or 4 branches |
| 2600705 | Intergenic 2461–2462 | Recombined on 3 branches |
| 2600727–2600792 | Intergenic 2461–2462 | Recombined on 3 or 4 branches |
| 2680141–2680194 | Intergenic 2536–2537 | Recombined on 8 or 9 branches |
| 2813570–2813582 | Coding 2564, | Recombined on 4 branches |
The TCH1516 locus designation has the prefix USA300HOU_.
FIG 5 Recombination in USA300. Panel A shows 4,109 biallelic SNPs (columns) and the results of recombination analysis by CFML and BNG (rows). Gray columns indicate SNPs that both methods classify as nonrecombinant, white columns indicate SNPs that both methods classify as recombinant, and black columns indicate SNPs classified as recombinant by only one method. Panel B shows the distribution of the number of branches on which sites are recombinant, as inferred by CFML (○) and as expected under the CFML model (×), with 95% confidence intervals; some intervals are too small to be visible, and some observed and expected symbols overlap.
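Both panels come down to simple set and counting operations on recombination calls. Panel A sorts SNPs into concordance categories, and panel B counts the branches on which each site is recombinant. The sketch below makes two assumptions about input. For the concordance, each method's output has already been reduced to a set of recombinant SNP positions. For the counts, the calls are given as inclusive (start, end) recombinant intervals per branch. These input formats and all names are hypothetical and do not reflect the actual CFML or BNG output formats.

```python
"""Concordance of two recombination callers and per-site branch counts."""
from collections import Counter


def concordance(sites, cfml_recombinant, bng_recombinant):
    """Split SNP positions into the three panel A categories."""
    groups = {"nonrecombinant_both": [], "recombinant_both": [], "one_method_only": []}
    for site in sites:
        in_cfml = site in cfml_recombinant
        in_bng = site in bng_recombinant
        if in_cfml and in_bng:
            groups["recombinant_both"].append(site)  # white columns
        elif in_cfml or in_bng:
            groups["one_method_only"].append(site)  # black columns
        else:
            groups["nonrecombinant_both"].append(site)  # gray columns
    return groups


def branches_per_site(sites, intervals_by_branch):
    """Histogram {k: number of sites recombinant on exactly k branches}."""
    counts = {site: 0 for site in sites}
    for intervals in intervals_by_branch.values():
        for site in sites:
            if any(start <= site <= end for start, end in intervals):
                counts[site] += 1
    return Counter(counts.values())
```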