Literature DB >> 34374563

Population Genomics of emm4 Group A Streptococcus Reveals Progressive Replacement with a Hypervirulent Clone in North America.

Sruti DebRoy1, Misu Sanson2, Brittany Shah2, Shirjana Regmi2, Luis Alberto Vega2, Chioma Odo1, Pranoti Sahasrabhojane1, Allison McGeer3,4, Gregory J Tyrrell5,6, Nahuel Fittipaldi4,7, Samuel A Shelburne1,8,9, Anthony R Flores2,9.   

Abstract

Clonal replacement is a major driver for changes in bacterial disease epidemiology. Recently, it has been proposed that episodic emergence of novel, hypervirulent clones of group A Streptococcus (GAS) results from acquisition of a 36-kb DNA region leading to increased expression of the cytotoxins Nga (NADase) and SLO (streptolysin O). We previously described a gene fusion event involving the gene encoding the GAS M protein (emm) and an adjacent M-like protein (enn) in the emm4 GAS population, a GAS emm type that lacks the hyaluronic acid capsule. Using whole-genome sequencing of a temporally and geographically diverse set of 1,126 isolates, we discovered that the North American emm4 GAS population has undergone clonal replacement with emergent GAS strains completely replacing historical isolates by 2017. Emergent emm4 GAS strains contained a handful of small genetic variations, including the emm-enn gene fusion, and showed a marked in vitro growth defect compared to historical strains. In contrast to other previously described GAS clonal replacement events, emergent emm4 GAS strains were not defined by acquisition of exogenous DNA and had no significant increase in transcript levels of nga and slo toxin genes via RNA sequencing and quantitative real-time PCR analysis relative to historic strains. Despite the in vitro growth differences, emergent emm4 GAS strains were hypervirulent in mice and ex vivo growth in human blood compared to historical strains. Thus, these data detail the emergence and dissemination of a hypervirulent acapsular GAS clone defined by small, endogenous genetic variation, thereby defining a novel model for GAS strain replacement. IMPORTANCE Severe invasive infections caused by group A Streptococcus (GAS) result in substantial morbidity and mortality in children and adults worldwide. Previously, GAS clonal strain replacement has been attributed to acquisition of exogenous DNA leading to novel virulence gene acquisition or increased virulence gene expression. Our study of type emm4 GAS identified emergence of a hypervirulent GAS clade defined by variation in endogenous DNA content and lacking augmented toxin gene expression relative to replaced strains. These findings expand our understanding of the molecular mechanisms underlying bacterial clonal emergence.

Entities:  

Keywords:  M type; evolution; exotoxin; genomics; group A Streptococcus; population genetics; virulence

Year:  2021        PMID: 34374563      PMCID: PMC8409732          DOI: 10.1128/mSystems.00495-21

Source DB:  PubMed          Journal:  mSystems        ISSN: 2379-5077            Impact factor:   6.496


INTRODUCTION

Bacterial clonal emergence is a hallmark of human epidemics and has contributed to changes in disease caused by Staphylococcus aureus (1, 2), Streptococcus pneumoniae (3), and Neisseria meningitidis (4) to name a few. Streptococcus pyogenes (group A Streptococcus [GAS])—a leading cause of invasive bacterial disease in humans—has long served as a model organism for investigating severe bacterial infections and bacterial epidemic behavior (5–7). The primary typing scheme for GAS is based on sequencing of the hypervariable 5′ end of the emm gene which encodes the cell surface M protein, with GAS strains of the same emm type generally being more closely related to each other than to strains of different emm types (8). GAS epidemiology is characterized by epidemic events resulting from strain emergence and eventual replacement of prior circulating strains. To date, emergence of GAS clones has primarily been attributed to acquisition of exogenous DNA (6, 9). For example, the replacement of historical emm1 strains with the currently circulating pandemic emm1 GAS in the 1980s resulted from recombination of a 36-kb region that contained the secreted NADase (encoded by nga) and streptolysin O (SLO) (slo) toxins, two well-known, cotranscribed virulence factors (10–13). This recombination event resulted in a high-activity nga or slo promoter which has been linked to increased frequency and severity of infections and rapid global spread (6, 14). A similar recombination event has since been described in emm89 GAS strains contributing to the emergence of a new and more virulent clade (7). In addition to increased Nga/SLO expression, contemporary emm1 GAS acquired via horizontal gene transfer (HGT) a novel prophage-associated pyrogenic exotoxin (SpeA), further contributing to virulence potential and spread (6)—a mechanism also attributed to the enhanced invasiveness of emm3 GAS strains following integration of a bacteriophage carrying a novel phospholipase (Sla) (15). The GAS hyaluronic acid capsule has historically been considered critical to virulence by promoting resistance to host immune cell phagocytosis (16–20). However, several emm types that lack capsule have recently been identified (21, 22). There has been a significant rise in the number of infections caused by acapsular strains (9, 22), and they now collectively account for nearly one-third of invasive GAS infections (23, 24). Interestingly, lack of capsule production through gene loss or mutation has been linked to high-activity nga and slo genotypes (25). The combination was first described in emm89 GAS strains (7, 26) but has since been recognized to occur in additional GAS genotypes, including emm28 and emm87 (25). Thus, the current paradigm is that acapsular GAS bacteria compensate for the lack of capsule by elevating Nga/Slo production. We recently reported the occurrence of a chimeric emm gene in circulating acapsular emm4 GAS strains isolated in the United States and England (27). The new emm4 strains harboring the chimeric emm gene completely replaced the circulating strains by 2017 in the Houston, Texas, metropolitan area (27). To better understand the timeline and the genetic changes driving the emergence of chimeric emm4 strains, we analyzed 1,126 temporally and geographically diverse emm4 strains from the United States, Canada, and United Kingdom. Our analysis reveals that a new, genetically distinct clade of emm4 GAS strains with enhanced virulence has emerged and constitutes nearly the entire contemporary emm4 population. In striking contrast to previous GAS clonal replacement events, we report here that the newly circulating emm4 strains are defined by small genetic changes of existing DNA rather than acquisition of exogenous DNA and do not have augmented Nga/SLO production.

RESULTS

Analysis of the emm4 GAS population structure reveals distinct clades with nearly complete temporal replacement.

A total of 861 strains were sequenced for the current study, and we analyzed the genome sequences from an additional 265 (n = 1,126 total sequences) emm4 GAS disease isolates (invasive and noninvasive) originating from multiple geographic locations in the United States, Canada, and United Kingdom, spanning 25 years (Table 1; see also Table S1 in the supplemental material). Importantly, >70% of the isolates were derived from national-level surveillance (Centers for Disease Control and Prevention, Active Bacterial Core Surveillance program) spanning nearly 20 years (Table 1). Mean sequencing depth of coverage exceeded 120-fold for all strains analyzed. Phylogenetic relationships were inferred based on single nucleotide polymorphisms (SNPs) relative to the core genome (excluding mobile genetic elements [MGE]) of the reference emm4 GAS strain Duke (GenBank accession number CP031770) (28). Based on Bayesian population clustering, the emm4 GAS population consisted of three distinct subpopulations (subclades [SC]) (29) (Fig. 1A). Subclade 3 (SC3) was the most common subclade representing 54.5% (614/1,126) of emm4 strains followed by SC1 (38.4% [433/1,126]) and SC2 (7.0% [79/1,126]). SC1 included 16 isolates (dashed circle in Fig. 1A) that were a single locus variant (ST38) of the major emm4 sequence type (ST39).
TABLE 1

Description of strains used in this study

CountryState/regionYearsNo. of strainsDisease typeReference(s)
CanadaMultiplea1993−2018221InvasiveThis study
United KingdomNAb201433Invasive/scarlet fever 76
United StatesMultiplec1998−2017799Invasive23, 24d; this study
United StatesTexas2014−201774Invasive/pharyngeal22; this study

Strains selected from a total of 884 strains with representatives from all 10 Canadian provinces.

NA, not available.

Total of 10 ABCs collection sites (https://www.cdc.gov/abcs/index.html).

All GAS isolates beginning in 2015 were sequenced by the Centers for Disease Control and Prevention Streptococcal Laboratory with data publicly available at NCBI (BioProject accession number PRJNA395240).

FIG 1

Maximum likelihood phylogenetic reconstruction of 1,126 emm4 GAS disease isolates from North America and the United Kingdom. (A) Unrooted neighbor-joining phylogenetic tree based on 12,436 biallelic SNPs relative to the emm4 reference strain Duke (GenBank accession number CP031770) following correction for recombination (ClonalFrameML). Branch tips colored to indicate individual emm4 strain year of isolation and shape for subclade designation (BAPS). The dashed circle indicates single-locus variant (ST38) strains from the main (ST39) emm4 population. (B) Progressive replacement of historical (SC1) strains by emergent (SC3) over the 22-year period. The total number of isolates and geographic origin are indicated below bar graph. The red value for 2014 indicates that 32 isolates from the United Kingdom were included in the analysis.

Maximum likelihood phylogenetic reconstruction of 1,126 emm4 GAS disease isolates from North America and the United Kingdom. (A) Unrooted neighbor-joining phylogenetic tree based on 12,436 biallelic SNPs relative to the emm4 reference strain Duke (GenBank accession number CP031770) following correction for recombination (ClonalFrameML). Branch tips colored to indicate individual emm4 strain year of isolation and shape for subclade designation (BAPS). The dashed circle indicates single-locus variant (ST38) strains from the main (ST39) emm4 population. (B) Progressive replacement of historical (SC1) strains by emergent (SC3) over the 22-year period. The total number of isolates and geographic origin are indicated below bar graph. The red value for 2014 indicates that 32 isolates from the United Kingdom were included in the analysis. Description of strains used in this study Strains selected from a total of 884 strains with representatives from all 10 Canadian provinces. NA, not available. Total of 10 ABCs collection sites (https://www.cdc.gov/abcs/index.html). All GAS isolates beginning in 2015 were sequenced by the Centers for Disease Control and Prevention Streptococcal Laboratory with data publicly available at NCBI (BioProject accession number PRJNA395240). Strains used in this study. Download Table S1, XLSX file, 0.10 MB. Maximum likelihood phylogeny suggested that contemporary strains are overrepresented in SC3 compared to SC1 (note the predominance of red for SC3 strains in Fig. 1A). Thus, we determined the overall temporal distribution of the emm4 GAS population stratified by Bayesian clustering (29). Consistent with clonal replacement, we observed a progressive decrease in SC1 strains and a concomitant increase in SC3 over time such that by 2017, 94.7% (89/94) of emm4 strains sampled were SC3 (Fig. 1B). Interestingly, SC2 strains appear sporadically and do not persist over time. Thus, we chose to focus our analyses on the differences between SC1 and SC3 subclades. To better determine the timing of SC3 emergence, we next performed Bayesian reconstruction of ancestral states using BactDating (30). The overall substitution rate (3.5 substitutions per core genome per year; 1.84 × 10−6 substitutions per site year−1) was similar to that reported for emm1 and emm12 GAS using BEAST (31). The root date estimate (time to most recent common ancestor) for the entire emm4 population was 1,921 (range, 1,886 to 1,936), and the date of emergence for SC2 strains was 1,979 (range, 1,975 to 1,983) compared to 1,992 (range, 1,991 to 1,993) for SC3 strains (see Fig. S1 in the supplemental material). For comparison, estimated date of SC1 emergence (excluding ST38 strains) was 1,955 (range, 1,946 to 1,964). Overall, these data indicate a strong temporal signal within the emm4 population examined. Thus, henceforth, we refer to the SC1 strains as “historical” and to SC3 strains as “emergent.” Rooted phylogram of 1,126 emm4 GAS strains showing subclade temporal relationships. Phylogram corrected for recombination using ClonalFrameML. Tree tips are colored by subclade (SC) as indicated in the legend. BactDating was used to infer node dates. Estimated dates for the most recent common ancestor (Root) and SC emergence are indicated. Download FIG S1, JPG file, 0.4 MB.

Major gene content differences between clades are defined by prophage gene loss but do not involve MGE-related exotoxins or DNases.

Recently, it has been shown that contemporary emm4 lineages are defined by prophage-specific gene loss resulting in cryptic prophages (32). Importantly, prophage degradation in the emm4 lineages did not alter exotoxin or DNase gene content. Thus, we first compared the prophage-associated exotoxin and DNase gene content in our emm4 population to determine differences between subclades. Among the major subclades, we observed no differences in exotoxin gene (smeZ, speC, and ssa) content (Fig. 2 and Table S2). Similarly, the prophage-associated, DNase-encoding genes spd1 and spd3 were nearly universally present in all three subclades (Fig. 2 and Table S2). However, the emm4 subpopulations differed in the presence of the prophage-associated DNase gene sdn occurring in 54% (333/614) of SC3 strains but was not a subclade-defining feature (Fig. 2 and Fig. S2).
FIG 2

Distribution of DNases, exotoxins, and macrolide-lincosamide-streptogramin (MLS) resistance within the emm4 population. A rooted maximum likelihood (ML) phylogram of 1,126 emm4 GAS disease strains is shown. Branch tip color and shape indicate year and subclade designation as in Fig. 1 and the associated legend. Subclades (SC1 to 3) are indicated by shaded boxes. DNase, exotoxin, and antimicrobial resistance gene content is indicated by vertical bars to the right of the ML phylogram and defined in the legend headings. Asterisks indicate nodes representing strains with completed genomes (see Table S3 in the supplemental material).

Distribution of DNases, exotoxins, and macrolide-lincosamide-streptogramin (MLS) resistance within the emm4 population. A rooted maximum likelihood (ML) phylogram of 1,126 emm4 GAS disease strains is shown. Branch tip color and shape indicate year and subclade designation as in Fig. 1 and the associated legend. Subclades (SC1 to 3) are indicated by shaded boxes. DNase, exotoxin, and antimicrobial resistance gene content is indicated by vertical bars to the right of the ML phylogram and defined in the legend headings. Asterisks indicate nodes representing strains with completed genomes (see Table S3 in the supplemental material). Exotoxin and DNase content of emm4 GAS population. Download Table S2, DOCX file, 0.02 MB. Pangenome analysis and clustering of emm4 GAS population using PANINI. Points represent individual strain clusters defined by the PANINI output and visualized using Microreact (microreact.org). Points are colored by subclade as defined in the legend. The dashed lines separate clusters based on gene content [sdn, erm(B), and mef(A)/msr(D)]. The boxed inset represents the unrooted neighbor-joining tree as in Fig. 1A with branch tips colored by subclade. Download FIG S2, JPG file, 0.2 MB. Completed emm4 GAS genomes used for transcriptome analysis. Download Table S3, DOCX file, 0.02 MB. We next performed a pangenome analysis to further define gene content differences within the emm4 population. The core genome (shared by >99% of strains) was defined by 1,516 (31.7%) of the total 4,788 genes identified in the emm4 pangenome. In comparison, there are 1,934 genes in the Duke reference genome. The majority (2,879/4,788 [60.1%]) of genes were found in <15% of strains. Using a recently described, novel machine-learning method (PANINI) (33), we discovered that gene content differences contributing to emm4 GAS population structure correlated almost perfectly with subclades defined by Bayesian population clustering from SNPs (Fig. S2)—an observation potentially consistent with previously described prophage degradation (32). In addition, genes conferring resistance to macrolides [e.g., mef(A) and erm(B)] also contributed to strain subclustering (Fig. S2). Interestingly, a subpopulation of SC3 strains contained erm(T) residing on a previously described ∼5-kb plasmid (34) (Fig. 2). However, strains were more broadly distributed among the SC3 population when assessing gene content (Fig. S2) compared to other resistance elements (data not shown). Further, subpopulation structure within SC3 strains was strongly associated with the presence of the DNase sdn (Fig. S2). Taken together, these data demonstrate that acquisition of an exogenous virulence factor does not distinguish emergent from historical emm4 strains.

Emergent clade emm4 GAS strains are defined by small genetic changes and not large-scale recombination.

Previous studies of GAS epidemic clone emergence established that large recombination events, specifically involving the known GAS virulence factors Nga and SLO, are critical for epidemic clone emergence (6, 7). Examination of phylogenetic relationships before and after correction for recombination (ClonalFrameML and gubbins) (35, 36) revealed only modest changes in the emm4 GAS population structure (Fig. S3A to C). To further facilitate identification of recombination, we sequenced to closure the genomes of eight (four SC1 and four SC3) emm4 GAS strains representing major clades/subclades and MGE content (asterisks in Fig. 2 and Table S3). Given the sporadic temporal appearance and lack of persistence in SC2 strains, we focused our analyses on differences between SC1 and SC3. Comparison of the completed genomes revealed no large-scale (e.g., >1-kb) recombination, and nga and slo, including the associated promoter, were identical between completed genomes and within the studied emm4 population (data not shown). Specifically, all strains possessed an identical “high-activity” nga or slo promoter (25). For comparison, in a study of emm89 GAS clonal emergence, HGT and recombination accounted for the majority of core chromosomal differences between clades and ranged in size from a few hundred bases to >70 kb (26). Thus, although previous studies identified HGT as a hallmark of GAS clade emergence, these data show that HGT is not a major contributor to clade differences in the emm4 GAS population. Exogenous DNA recombination and/or acquisition do not play a major role in the emm4 GAS population. (A and B) Unrooted neighbor-joining phylogenetic tree inferred from 12,436 biallelic SNP loci prior to (A) and after correction for recombination using ClonalFrameML (B). Tree tips are colored to indicate year and shape to indicate subclade (SC) designation as defined in the legend. (C) Map of recombination relative to the reference genome Duke following gubbins and visualized using phandango (https://jameshadfield.github.io/phandango/). Regions of recombination are limited to MGE (e.g. prophage) and small genomic loci (e.g., emm-enn gene fusion). Historical clade strains are highlighted in blue. Download FIG S3, JPG file, 1.5 MB. We next determined differences between emergent and historical strains with respect to small genetic changes (e.g., SNPs, insertions/deletions). In addition to the previously described emm-enn gene fusion (27), emergent strains were defined by a small number of differences in core chromosomal loci compared to historical strains. A total of 36 polymorphisms (core chromosomal) occurred in predicted genes and separated historical and emergent emm4 populations (Table S4). Importantly, several of the identified polymorphisms occurred in genes previously shown to alter virulence (Table 2). For example, all or nearly all emergent (SC3) strains contain inactivating mutations (historical strains with intact reading frames) in ralp3 (98% [602/614]), which encodes a RofA family transcriptional regulator previously identified as important to GAS virulence (37), and silA (100% [614/614]), which encodes a response regulator of the Streptococcus invasion locus (sil) (38) (Table 2). Similarly, >86% of emergent strains contain a mutation predicted to result in a single amino acid change (T104I) in the regulator of proteinase RopB, an activator of the actively secreted virulence factor Streptococcus cysteine protease SpeB (39). We used casein hydrolysis (milk plate assays) to assess the functional impact of this polymorphism and found that the emergent strains produced a slightly larger clearance area relative to the historical strains consistent with increased SpeB production (Fig. S4A and B). Finally, 90% (377/417) of historical strains encode valine (V) at position 27 (a variably conserved region of the predicted DNA-binding domain [40]) in the multigene activator (Mga) protein (41). Conversely, only 22% (138/614) of emergent emm4 strains possess the same valine, and these strains form a distinct subpopulation within SC3 (data not shown). The remaining emergent strains (476/614) have an Mga protein with alanine (A) at position 27 and thus have a V27A change relative to the historical strains.
TABLE 2

Select virulence genes/loci with changes differentiating emergent and historical clades

Virulence gene(s)TypeaLocusb/descriptionNo. of strains (%) with virulence gene
Emergent (SC3) (n = 614)Historical (SC1) (n = 417)
emm/enn chimeraRD1F63_08970; gene fusion between emm and enn614 (100)0 (0.0)
ropB T104ISNPD1F63_09065; single amino acid change (threonine to isoleucine) in regulator of SpeB, RopB532 (86.6)1 (0.2)
silA IndelD1F63_02180; single nucleotide insertion leading to frameshift in response regulator, SilA, of the Sil bacteriocin signaling system614 (100)29 (7.0)
ralp3 SNPD1F63_03645; nonsense mutation in ralp3 of the ERES pathogenicity island602 (98.0)2 (0.5)
mga A27VSNPD1F63_08980; single amino acid change (alanine to valine) in stand-alone regulator Mga137 (22.5)376 (87.0)

Type of chromosomal change: nucleotide insertion/deletion (indel), recombination (R), single nucleotide polymorphism (SNP).

Gene locus in the emm4 GAS reference strain Duke (GenBank accession number CP031770).

Select virulence genes/loci with changes differentiating emergent and historical clades Type of chromosomal change: nucleotide insertion/deletion (indel), recombination (R), single nucleotide polymorphism (SNP). Gene locus in the emm4 GAS reference strain Duke (GenBank accession number CP031770). Clade-defining core chromosomal differences between emergent and historical emm4 populations. Download Table S4, XLSX file, 0.02 MB. In vitro growth characteristics and phenotypes following in vivo mouse and ex vivo human blood models of GAS disease in emergent and historical emm4 GAS. (A and B) Casein hydrolysis (milk plate) assays of historical and emergent clade emm4 GAS. The area of clearance was measured following growth on milk plates as described in Materials and Methods for historical and emergent clade strains as a group (A) and individually (B). Asterisk indicates P < 0.05 following Mann-Whitney U-test. (C and D) Growth of emm4 GAS strains in vitro in nutrient-rich (THY, C) and glucose-limiting (C medium, D) conditions. Details of strain growth are provided in Materials and Methods. The dashed line indicates mid-exponential (ME) phase of growth. (E to G) Cumulative Kaplan-Meier survival curves of mice infected with four emergent (red) or four historical (blue) emm4 GAS over a range of doses (CFUs). Groups of seven mice were used per strain per dose. (H) Bactericidal assay of emergent and historical clade strains. Error bars represent standard deviations following assays performed in quadruplicate using blood from two independent donors. Download FIG S4, JPG file, 1.0 MB. We also identified a small number of locus variation between the clades representing ∼12 kb (<1%) of the core chromosome (data not shown). Most of these variations were in genes encoding surface proteins with known highly repetitive sequences and frequent strain-to-strain allelic variation such as the streptococcal collagen-like proteins (sclA and sclB) (42, 43), serum opacity factor (sof) (44), and fibronectin-binding protein (fbpA) (23). However, in contrast to single nucleotide changes and the emm-enn fusion, the observed gene variations were not clade defining and occurred in both emergent and historical strains.

emm4 clades exhibit significantly different gene variation in key transcriptional regulators.

To gain insight into whether particular GAS loci were under selective pressure, we determined genetic variation at each of the 1,629 core genetic loci (relative to the reference strain Duke and excluding MGE) to define highly polymorphic genes. After exclusion of variation likely to have arisen due to repetitive elements or small recombination events, a total of 24 genes were identified as significantly highly polymorphic (Fig. 3 and Table S5). Interestingly, nearly half (10/24 [42%]) of the highly variant genes encoded known or predicted proteins of two-component systems (TCS) or stand-alone transcriptional regulators. Similar to previous investigations of GAS populations (5–7, 45), we identified the genes encoding the CovRS TCS as significantly highly polymorphic (Fig. 3). Mutations altering the activity of the CovRS TCS are well-known to aid in the transition from colonization to invasion (28, 46–49) and occur at relatively high frequencies in multiple GAS emm types (5, 6, 45, 50). In nearly all cases of regulator gene variation, observed mutations resulted in predicted modified proteins, with >25% of the mutations introducing a premature stop codon (131/505 [25.9%]) and nearly 70% leading to at least one amino acid change (350/505 [69.3%]) (Table S5).
FIG 3

Highly polymorphic genetic loci identified within the emm4 GAS population after exclusion of recombinant loci. Genes of known or predicted regulators (red) are indicated.

Highly polymorphic genetic loci identified within the emm4 GAS population after exclusion of recombinant loci. Genes of known or predicted regulators (red) are indicated. List of highly polymorphic loci in the emm4 GAS population, including clade-specific loci. Download Table S5, XLSX file, 0.01 MB. To gain additional insights into mechanisms underlying the observed clonal emergence, we sought to determine whether the frequencies of gene variation differed between the historical and emergent clades. A total of 32 genes were identified in the emm4 population with significantly higher variation levels compared to the rest of the genome of which 8 were also significantly highly polymorphic in both emergent and historical clade strains when analyzed separately (Table S5). Interestingly, excluding clade-defining mutations in mga, ralp3, and ropB (Table 2), we discovered that the proportion of strains with regulator-specific mutations differed significantly between clades (Table 3 and Table S5). Specifically, gene variation rates for covS (histidine kinase of CovRS TCS), rofA, and ralp3 were significantly greater in historical clade strains compared to the emergent clade strains (Table 3). On the other hand, genetic variation in mga was observed to be significantly greater in emergent compared to historical strains (Table 3). The significantly different gene variation levels between the two emm4 clades suggest different regulatory programs contributing to host-pathogen interaction.
TABLE 3

Frequency of mutation in known or predicted transcriptional regulators in emm4 GAS population and by subclade

GeneLocus tagaDescriptionbTotal emm4 gene variation
Total emm4g (n = 1,127)SC3h (n = 614)SC1h (n = 417)P valuei
MutationscFS/NS/SYNdChi-squareeP valuef
covS D1F63_01590Sensor histidine kinase15748/106/32,6350.0E+00220 (19.5)88 (14.4)119 (28.5)<0.001
ropB j D1F63_09080Rgg family transcriptional regulator10317/86/02,0560.0E+00145 (12.9)82 (13.4)51 (12.2)NS
rofA D1F63_00785HTH, Mga domain-containing protein8229/49/46521.72E−140137 (12.2)55 (9.0)73 (17.5)<0.001
mga k D1F63_08980Multiple gene activator, Mga404/34/21084.3E−22114 (10.1)95 (15.5) 5 (1.2) <0.001
covR D1F63_01585DNA-binding response regulator240/23/11061.1E−2128 (2.5)10 (1.6)16 (3.8)NS
fasA D1F63_01190DNA-binding response regulator207/13/0619.6E−1233 (2.9)22 (3.5) 8 (1.9) NS
ralp3 l D1F63_03645HTH, Mga domain-containing protein3011/12/7534.8E−1085 (7.5) 13 (2.1) 68 (16.3)<0.001
fasB m D1F63_01180Sensor histidine kinase2513/10/2412.9E−7118 (10.5)57 (9.3) 59 (14.1) NS
D1F63_07540HTH, Mga domain-containing protein242/17/5291.1E−438 (3.4) 18 (2.9) 18 (4.3) NS

Identifier in the reference genome Duke (CP031770).

Putative function and/or conserved functional domains.

Number of unique variations (mutations) in the given gene derived from the emm4 population (n = 1,127).

Number of mutations defined as frameshift or stop (FS), nonsynonymous (NS), or synonymous (SYN).

Chi-square statistic based on total number of variations in the core genome (n = 10,594).

P value based on chi-square test for the total emm4 population following correction for multiple comparisons (Bonferroni).

Number of strains in the emm4 population (total emm4) with at least one of the defined gene mutations.

Number of strains in emergent or historical clades (excluding ST38; n = 16) with at least one of the defined gene mutations. Italic values indicate that the indicated gene was not significantly highly polymorphic in the respective population (Table S5).

P value (Fisher’s exact) comparing proportion of strains with at least one mutation in emergent versus historical clades following correction for multiple comparisons (Bonferroni).

Excludes the clade-defining mutation T104I found in emergent/clade 2 strains.

Excludes the clade-defining mutation A27V found in historical/clade 1 strains.

Excludes the clade-defining nonsense mutation found in emergent/clade 2 strains and a synonymous SNP (n = 289 strains) in historical/clade 1 strains.

Excludes a synonymous SNP (n = 492) occurring in both clades.

Frequency of mutation in known or predicted transcriptional regulators in emm4 GAS population and by subclade Identifier in the reference genome Duke (CP031770). Putative function and/or conserved functional domains. Number of unique variations (mutations) in the given gene derived from the emm4 population (n = 1,127). Number of mutations defined as frameshift or stop (FS), nonsynonymous (NS), or synonymous (SYN). Chi-square statistic based on total number of variations in the core genome (n = 10,594). P value based on chi-square test for the total emm4 population following correction for multiple comparisons (Bonferroni). Number of strains in the emm4 population (total emm4) with at least one of the defined gene mutations. Number of strains in emergent or historical clades (excluding ST38; n = 16) with at least one of the defined gene mutations. Italic values indicate that the indicated gene was not significantly highly polymorphic in the respective population (Table S5). P value (Fisher’s exact) comparing proportion of strains with at least one mutation in emergent versus historical clades following correction for multiple comparisons (Bonferroni). Excludes the clade-defining mutation T104I found in emergent/clade 2 strains. Excludes the clade-defining mutation A27V found in historical/clade 1 strains. Excludes the clade-defining nonsense mutation found in emergent/clade 2 strains and a synonymous SNP (n = 289 strains) in historical/clade 1 strains. Excludes a synonymous SNP (n = 492) occurring in both clades.

Emergent emm4 GAS strains exhibit a distinct transcriptome profile with decreased virulence gene expression.

The observed differences in regulator gene polymorphism frequency between the two clades, including regulators known to respond to metabolite availability, led us to hypothesize that emergent clade strains may differ from historical strains in in vitro growth characteristics and global gene regulation. For all subsequent clade comparisons, we identified four representative strains each from the SC1 (historical) and SC3 (emergent) clades (Table S3). All strains were wild type at key, known regulatory loci (e.g., TCS and stand-alone regulators) with the exception of clade-specific changes (Table 2 and Table S4). Thus, observed differences are likely to be representative of each clade. We first compared in vitro strain growth in nutrient-rich medium (THY) and glucose-limiting conditions (C medium). Overall, growth among emergent clade strains showed greater strain-to-strain variability compared to historical strains (Fig. 4A). Surprisingly, under both conditions, emergent clade strains had a marked growth defect beginning at approximately mid-exponential (ME) phase (Fig. 4A and Fig. S4C). The observed differences in growth were more pronounced under glucose-limiting conditions (C medium) (Fig. 4A and Fig. S4D), suggesting that emergent strains are sensitive to nutrient limitation. The lower optical densities (ODs) of the emergent strains correlated with lower CFU, indicating that the lower ODs were not due to an artifact (data not shown).
FIG 4

Emergent emm4 GAS strains have altered in vitro growth and no increased SLO toxin production compared to historical strains. (A) Growth curves comparing historical (blue squares) to emergent (red circles) clade emm4 GAS strains in nutrient-rich (THY; solid) or glucose-deficient (C medium; open) conditions. Error bars represent the standard deviations of four strains grown in biological triplicate. Dashed lines indicate mid-exponential (ME) phase of growth. (B) Principal-component analysis of RNA-seq comparing emergent (red) and historical (blue) clade strains. Individual strains with replicate samples are labeled and indicated using dashed circles. PC 1, principal component 1. (C) Log2 fold change in transcript levels derived from RNA-seq for selected genes derived from RNA-seq comparing four historical and four emergent emm4 GAS strains. Error bars represent standard deviations. (D) Transcript levels (qRT-PCR) of slo from emergent or historical GAS cells grown in nutrient-rich media and harvested at ME as defined in panel A. The asterisk indicates a P value of <0.005 (unpaired t test). (E) Western blot analysis using anti-Nga antibody (α-Nga) of culture supernatants (SN) from historical (blue text) or emergent (red text) emm4 GAS strains. Similar amounts of protein are demonstrated using anti-HPr. ctrl, control. (F) Transcript levels derived from qRT-PCR of slo from representative strains of emm1 (MGAS2221), emm3 (MGAS10870), emm4, emm28 (strain TSPY902), emm87 (strain TSPY1057), and emm89 (strain MSPY1). For emm4 GAS, transcript levels reflect four historical strains (in blue) and four emergent strains (in red). RNA was extracted from ME cultures as described in Methods. TaqMan qRT-PCR data for both panels D and F are means ± standard deviations of two biological replicates, with two technical replicates, done on two separate days. pos., positive; neg., negative.

Emergent emm4 GAS strains have altered in vitro growth and no increased SLO toxin production compared to historical strains. (A) Growth curves comparing historical (blue squares) to emergent (red circles) clade emm4 GAS strains in nutrient-rich (THY; solid) or glucose-deficient (C medium; open) conditions. Error bars represent the standard deviations of four strains grown in biological triplicate. Dashed lines indicate mid-exponential (ME) phase of growth. (B) Principal-component analysis of RNA-seq comparing emergent (red) and historical (blue) clade strains. Individual strains with replicate samples are labeled and indicated using dashed circles. PC 1, principal component 1. (C) Log2 fold change in transcript levels derived from RNA-seq for selected genes derived from RNA-seq comparing four historical and four emergent emm4 GAS strains. Error bars represent standard deviations. (D) Transcript levels (qRT-PCR) of slo from emergent or historical GAS cells grown in nutrient-rich media and harvested at ME as defined in panel A. The asterisk indicates a P value of <0.005 (unpaired t test). (E) Western blot analysis using anti-Nga antibody (α-Nga) of culture supernatants (SN) from historical (blue text) or emergent (red text) emm4 GAS strains. Similar amounts of protein are demonstrated using anti-HPr. ctrl, control. (F) Transcript levels derived from qRT-PCR of slo from representative strains of emm1 (MGAS2221), emm3 (MGAS10870), emm4, emm28 (strain TSPY902), emm87 (strain TSPY1057), and emm89 (strain MSPY1). For emm4 GAS, transcript levels reflect four historical strains (in blue) and four emergent strains (in red). RNA was extracted from ME cultures as described in Methods. TaqMan qRT-PCR data for both panels D and F are means ± standard deviations of two biological replicates, with two technical replicates, done on two separate days. pos., positive; neg., negative. Next, we determined the in vitro transcriptomes of emergent and historical clade strains using RNA sequencing (RNA-seq) from cells grown in nutrient-rich medium and harvested at mid-exponential phase. These conditions were chosen given that growth differences were less pronounced in nutrient-rich media (Fig. 4A), and many GAS virulence factors are known to be maximally expressed during mid-exponential phase in vitro (26, 51). For RNA-seq analysis, the complete genome sequence was resolved for each of the eight strains and annotated using the prokaryotic genome annotation pipeline (PGAP) at NCBI (52). Via pangenome analysis, we identified 1,619 genes that were shared between the eight emm4 genomes, and these were used to determine differential gene expression (DGE) between clades. Principal-component analysis demonstrated well-defined, clade-specific clustering of emm4 strain transcriptomes (Fig. 4B). DGE was defined as an absolute fold difference of ≥1.5 and Bonferroni-corrected P value of <0.05. Excluding MGE, we found 62 genes upregulated and 141 genes downregulated in emergent strains relative to historical strains (Table S6). Highly upregulated genes in emergent clade strains involved the methionine oxidative response (ccdA2), arginine deiminase pathway (arcA), and genes encoding putative peptidoglycan-modifying enzymes (05805) (Fig. 4C). In striking contrast to previous GAS epidemic strain emergence investigations (6, 7, 53), we discovered that genes encoding the virulence factors nga, slo, prtS (interleukin 8 [IL-8] protease), and sda3 (DNase) were among the most highly downregulated genes in emergent strains relative to historical strains (Fig. 4C). The observed differences in nga and slo transcript levels cannot be explained by sequence variation, as emergent and historical strains have identical nga-slo operons and promoter regions. Given that the T4 pilus was recently shown to contribute to colonization and immune evasion of emm4 GAS (54), we also examined pilus gene transcription levels and again observed significantly decreased expression in emergent strains compared to historical clade strains (Table S6). Differential gene expression determined by RNA sequencing in emergent versus historical emm4 GAS strains. Download Table S6, XLSX file, 0.02 MB. It is possible that the differences in growth between emergent and historical strains contributed to the observed nga and slo DGE. Given the pivotal role of Nga and SLO in previous GAS epidemic clonal emergence (6, 7, 53), we sought to confirm the lack of increased transcript levels in emergent clade strains. We isolated RNA from GAS strains grown in nutrient-rich medium at mid-exponential phase and assayed transcript levels of slo using quantitative reverse transcription-PCR (qRT-PCR). We found significantly reduced slo transcript levels in emergent strains compared to historical strains (Fig. 4D). Next, we sought to correlate Nga/SLO protein expression with transcript levels determined by qRT-PCR. Consistent with the decreased nga and slo transcript levels in the emergent strains, we observed less Nga in the supernatant derived from emergent strains (Fig. 4E). Inasmuch as acapsular strains (e.g., emm4) have been associated with high Nga/SLO toxin expression (25), we compared nga and slo transcript levels from historical and emergent emm4 strains to strains representative of other GAS emm types with the high-level nga or slo promoter variant (25). We observed significantly lower slo transcript levels in emm4 GAS compared to emm1, emm3, emm28, and emm89 strains and similar levels compared to the emm87 strain (Fig. 5F). Thus, in total, the above data suggest distinct regulatory programs governing emergent and historical emm4 strains and that emm4 clonal replacement did not result from augmented Nga/SLO toxin production.
FIG 5

In vivo mouse and ex vivo human blood virulence assays comparing emergent and historical clade emm4 GAS strains. (A) Cumulative Kaplan-Meier survival curves of mice (n = 7 per strains) infected with four historical (blue) or four emergent (red) emm4 GAS strains (2 × 108 CFU). The P value of  <0.0001 was determined by the log rank test. Survival curves following infection with other dose ranges are shown in Fig. S4 in the supplemental material. (B) Survival of emergent (red) or historical (blue) strains following exposure to human blood ex vivo. Error bars represent standard deviations of four emergent and four historical emm4 GAS strains assayed in quadruplicate using three independent donors. The two asterisks indicate P < 0.05 (Mann-Whitney U test). Individual strain comparisons are shown in Fig. S4.

In vivo mouse and ex vivo human blood virulence assays comparing emergent and historical clade emm4 GAS strains. (A) Cumulative Kaplan-Meier survival curves of mice (n = 7 per strains) infected with four historical (blue) or four emergent (red) emm4 GAS strains (2 × 108 CFU). The P value of  <0.0001 was determined by the log rank test. Survival curves following infection with other dose ranges are shown in Fig. S4 in the supplemental material. (B) Survival of emergent (red) or historical (blue) strains following exposure to human blood ex vivo. Error bars represent standard deviations of four emergent and four historical emm4 GAS strains assayed in quadruplicate using three independent donors. The two asterisks indicate P < 0.05 (Mann-Whitney U test). Individual strain comparisons are shown in Fig. S4.

Emergent strains are more virulent compared to historical emm4 GAS strains.

Emergence of novel GAS strains resulting in clonal replacement has previously been associated with enhanced virulence compared to historical strains (5–7). Given the nearly complete replacement observed in our temporal analysis and despite the seemingly paradoxical in vitro growth difference, we hypothesized that emergent emm4 strains possess increased virulence compared to historical strains in models of GAS disease. The same historical and emergent strains used in transcriptomic analyses were used for virulence assays (Table S3). In a model of GAS bacteremia, groups of mice were inoculated with one of five doses of GAS (range, 5 × 106 to 6 × 108) in order to determine the dose at which 50% of the mice become moribund (LD50). Mice exhibited a dose response to GAS infection with either emergent or historical clade strains (Fig. S4E to S4G). With the exception of the lowest dose, mice infected with emergent clade GAS strains exhibited significantly decreased survival compared to historical emm4 GAS strains. For example, the median survival time when infected with 2 × 108 CFU (Fig. 5A) was 23.5 h in emergent strains compared to 181.5 h for historical strains. Next, we compared the ability of emergent and historical clade strains to survive in human blood ex vivo using the Lancefield bactericidal assay (55). In agreement with the results from the mouse challenge, emergent clade emm4 strains demonstrated significantly increased growth in human blood compared to the historical strains (Fig. 5B), although there was substantial strain-to-strain variability (Fig. S4H). The increased virulence observed in emergent emm4 GAS strains that have nearly replaced all previously circulating strains herein suggests that mechanisms that do not involve augmentation of Nga/Slo toxin expression can lead to GAS epidemic emergence.

DISCUSSION

Whole-genome sequencing (WGS) of large cohorts of clinical isolates facilitated by the dramatic decrease in sequencing costs over the past 15 years have allowed for clear detection of clonal emergence and spread of numerous major human bacterial pathogens such as USA300 methicillin-resistant Staphylococcus aureus (1, 2) and lineage 11 Neisseria meningitidis (4). Similarly, WGS of distinct GAS emm types has identified clonal emergence and replacement events resulting in tens of thousands of life-threatening human infections (5–7, 45). We previously detected an unrecognized emm-enn fusion event in contemporary emm4 GAS isolates (27). Herein, we used WGS to analyze >1,000 emm4 GAS strains of diverse temporal and geographic origins to ascertain the extent and broader molecular context of the emm-enn gene chimera. Our results show that there has been nearly complete strain replacement among emm4 GAS in North America over the past 20 years. Importantly, this strain replacement is not defined by acquisition of exogenous DNA which has previously been the molecular basis attributed to GAS clonal replacement (6, 9). A critical strength of our study relative to other investigations of bacterial clonal emergence was that the replacement event occurred during a period of active surveillance (CDC active surveillance for GAS began in 1997) which facilitated collection of large numbers of both historical and emergent isolates. In turn, this meant that we could use large-scale phylogenomic analyses to accurately select representative strains from both the historical and emergent populations on which to perform comparative studies rather than having to rely on a small number of convenience samples for the historical isolates. Inasmuch as a single nucleotide polymorphism can markedly change GAS phenotype (9, 56), having robust sample collections for both the historical and emergent clades markedly increases the confidence that our findings are generally applicable to the populations being studied. Additionally, the large number of historical and emergent isolates facilitated intra-emm type comparison of relative rates of gene polymorphisms. Intriguingly, the hypovirulent historical isolates had higher rates of covS mutation, which we have recently shown increases emm4 virulence (57). Conversely, the emergent strains had higher rates of mga variation, which we postulate may reflect the in vitro growth issues we identified in the emergent strains given the critical role of Mga in GAS metabolism (41). Nearly 10 years ago, emm4 and emm22, were the first identified acapsular emm types, a finding which was explained by the absence of the capsule-encoding hasABC operon (21). This discovery was surprising given that the GAS hyaluronic acid capsule has been considered critical to GAS pathogenesis for nearly a century (58). However, in the decade since the initial recognition of acapsular emm4 and emm22 GAS, we and others have described numerous other emm types either lacking or having abrogating mutations in the hasABC operon (22, 25). Indeed, acapsular GAS bacteria account for >30% of invasive GAS isolates as determined by active CDC surveillance (24). To date, the majority of acapsular GAS investigation has focused on emm89, as strains lacking the hasABC operon have recently emerged from and replaced their encapsulated precursors to cause epidemics in both the United States and Europe (7, 53). The “new” emm89 strains have several regions of large-scale genetic recombination relative to replaced strains (26). The GAS emm types that are the source of these recombined areas have not been fully defined, but it is clear that the emergent emm89 clone contains numerous pieces of exogenous DNA that likely have contributed to its global spread. In contrast, we found emergent emm4 GAS are not distinguished from the replaced strains by acquisition of exogenous DNA despite our use of multiple, complementary methodologies. Thus, the emergent emm4 GAS strains are not defined by the presence of a new virulence factor, which also stands in contrast to various successful GAS clones such as contemporary emm3 strains that contain an actively secreted phospholipase (sla) (15). Some 50% of new emm4 strains contain the DNase Sdn which was very rarely identified in historical strains (Fig. 2), but half of the new strains lack Sdn, meaning that Sdn acquisition cannot explain emergence of the new strains. Our findings of small changes in endogenous DNA content rather than acquisition of exogenous DNA defining an emerging GAS clade echo those recently described for an emm1 clone in England causing scarlet fever (59). Thus, together with the recent emm1 data, our emm4 clonal emergence findings suggest that increased focus needs to be paid to understanding mechanisms by which alteration of endogenous DNA content impacts GAS pathophysiology. The GAS recombination event that has been most studied thus far involves the nga or slo operon and results in augmented transcript levels of the gene encoding the streptolysin O toxin through acquisition of a “high-activity” promoter variant (9, 25). Clonal replacement and emergence of both emm1 (6) and emm89 (7, 53) GAS have been at least partially ascribed to this particular event. Indeed, all of the genes differentially transcribed between currently circulating and replaced emm1 strains were located in the nga or slo recombination area (6). Additionally, acquisition of a high-activity nga or slo promoter by a variety of emm types through recombination has been specifically associated with acapsular GAS strains (25). In stark contrast, we found no difference between the historical and emergent emm4 GAS strains in terms of nga/slo region genetic composition. Additionally, we identified that both historical and emergent emm4 GAS strains produce significantly lower slo transcript levels relative to strains with an identical high-activity nga or slo promoter. We also found that similarly low nga and slo transcript levels were present in an acapsular emm87 strain with a high-activity nga or slo promoter. Therefore, our data challenge the notion that the nga or slo region is the sole driver of acapsular GAS strain emergence, and they conclusively show that acapsular strains can cause serious infections without elevation of slo transcript levels. Our analyses have identified several marked phenotypic and genotypic differences between the historical and emergent strains, which complicates the ability to definitively associate a single trait with emm4 GAS clonal emergence. Surprisingly, the emergent strains had a clear growth defect relative to the historical strains. As noticeable as this defect is under in vitro conditions, the epidemiologic data clearly show that it does not affect the overall fitness of the organism in the human host, as demonstrated by the hundreds of invasive GAS infections caused by the emergent emm4 clade. Similarly, our RNA-seq data show markedly lower transcript levels of numerous key virulence factor-encoding genes such as prtS and sda3 in the emergent strains. In contrast to the RNA-seq data, however, the emergent strains were more virulent in an intraperitoneal mouse challenge model and an ex vivo human blood model, findings which are consistent with emm1 (6) and emm89 (7, 53) studies in which the expanding clones were hypervirulent relative to the replaced strains. However, increased virulence in emergent emm4 strains was not associated with alterations in the nga or slo region as has been clearly demonstrated for emm1 and emm89 epidemic strains. We identified multiple small-scale genetic events separating the emergent and historical emm4 strains, such as the emm-enn gene fusion and early stop codons in genes encoding two regulatory proteins previously identified as impacting GAS pathophysiology. Additionally, the emergent strains contained a single amino acid change in RopB and produced slightly more of the broad-spectrum, actively secreted protease SpeB relative to historic stains. None of the genetic changes that we identified in the emerging emm4 strains were shared with the 27 SNPs that defined the new emm1 English strains discussed above (59). Interestingly, our extensive phylogenetic analysis did not reveal a stepwise acquisition of the genetic differences separating the emergent and historical strains suggesting that a previously unrecognized clone containing all of the changes may have been introduced into the population. Alternatively, the “intermediate” strains may not have caused invasive infection such that they were not available for our analysis. Which of these factors or combination of factors accounts for observed phenotypic differences is an active area of further investigation in our laboratories. Another key difference between our investigation and other studies of GAS strain emergence relates to total disease burden. For both emm1 and emm89 GAS, the emerging clones caused markedly more total infections relative to the precursor strains (6, 7, 53), findings which in turn spurred subsequent investigation. In contrast, our study of emm4 GAS was driven by the WGS identification of the emm-enn gene fusion in a single strain (27). Indeed, total emm4 invasive disease rates have not markedly changed in the United States over the past 20 years, possibly suggesting a defined ecologic niche for emm4 organisms. Thus, our findings show that even when total GAS disease due to a particular emm type is relatively stable, there may be sizable variation in clonal distribution. To this end, the public availability of WGS from invasive GAS strains provided by the CDC is likely to assist with identification and characterization of heretofore unrecognized shifts in GAS epidemiology. In conclusion, via WGS of >1,000 strains collected over a 20+ year period of active surveillance, we have discovered nearly complete replacement of previously circulating emm4 GAS strains by a new clone in high-income settings. In contrast to prior GAS clonal replacement investigations, emergent emm4 GAS strains are defined by small changes in endogenous DNA, rather than acquisition of exogenous DNA, and have low slo transcript levels. Yet, emergent emm4 strains still have marked phenotypic variation relative to the replaced strains, indicating that much remains to be learned regarding the factors driving proliferation of GAS clones within the human population.

MATERIALS AND METHODS

GAS strains used in this study.

GAS strains are listed in Table S1 in the supplemental material. A total of 583 strains were obtained from national-level surveillance performed by the CDC Active Bacterial Core Surveillance (ABCs) beginning in 1997 (60, 61). Likewise, from a total of 812 strains, a representative subset of strains (n = 203) were selected from surveillance derived from the Toronto Invasive Bacterial Diseases Network and Alberta Provincial Laboratory for Public Health. All strains were grown using standard laboratory media as previously described (62). Genome sequence information from the remaining strains was obtained from the NCBI Sequence Read Archive (see “Data availability” below).

Genomic DNA extraction, short- and long-read sequencing, and bioinformatic methods.

For all short-read, Illumina-based sequencing, high-quality genomic DNA was extracted from GAS cells grown overnight at 37°C and 5% CO2 on SBA using the Qiagen DNeasy kit modified for Gram-positive bacteria as previously described (63). Genomic DNA used in Oxford Nanopore long-read sequencing was extracted from cells using a cetyl trimethylammonium bromide (CTAB) protocol adapted for GAS to enrich for longer DNA fragments. The quantity and quality of DNA were measured using a nanodrop spectrophotometer (ThermoFisher) or Take3 microvolume plate reader (BioTek). DNA libraries for Illumina sequencing were generated using the Illumina Nextera Flex kit per the manufacturer’s instructions. Library quality was ensured using an Agilent TapeStation and sequenced using either an Illumina MiSeq or HiSeq4000 instrument. Long-read sequencing libraries were generated using the Oxford Nanopore Technologies Rapid Barcoding kit and GridION instrument per the manufacturer’s instructions. Raw Illumina-based platform sequencing reads were processed for de novo assembly and polymorphism discovery as previously described (63). Briefly, raw reads (150-bp paired end [PE] for HiSeq4000 or 300-bp PE for MiSeq) underwent error correction and de novo assembly using SPAdes (v3.12.0) (64). Corrected reads were mapped to the reference M4 GAS genome Duke (GenBank accession number CP031770 [28]) using SMALT (https://www.sanger.ac.uk/science/tools/smalt-0). Polymorphisms were identified using freebayes (v1.2.0) (65) and filtered for quality (minimum positional depth of coverage of 15 and minimum polymorphism frequency of 0.75). For phylogenetic reconstruction, polymorphisms were filtered to exclude insertions/deletions (prephix.py; https://github.com/codinghedgehog), concatenated (phrecon.py; https://github.com/codinghedgehog), and the subsequent alignment of biallelic SNPs fed into RAxML (v8.2) (36, 66). ClonalFrameML (36) and gubbins (35) were used to identify and account for potential recombination/horizontal gene transfer. The final newick files were manipulated using CLC Genomic Workbench (v20) to incorporate associated strain metadata. The complete genome sequence was resolved for a subset (n = 8) of emm4 GAS strains (Table S3). Strains were chosen to represent major subclades and included strains wild type at all known two-component systems and stand-alone regulators except in cases where the clade or subclade was defined by such a mutation. Complete genome assembly was performed using both Illumina short-read and Oxford Nanopore GridION long-read sequences. Average depth of coverage for completed genomes exceeded 300-fold and was at least 50-fold for both short- and long-read sequences. Complete hybrid (short- and long-read) assemblies were obtained using Unicycler (v0.4.6) (67). Final circular chromosomes were annotated using PGAP at NCBI (52). A pangenome analysis was performed on strains used in RNA-sequencing experiments (n = 8, Table S3) using Roary (v3.12.0) (68) to define shared gene content. Identification of MGE was performed using SPAdes-corrected short reads and SRST2 (v0.2.0) (69). MGE genotype was inferred based on the presence/absence of site-specific integrases or secreted virulence factors using a published database (45). Virulence factor content (e.g., exotoxin and secreted DNase) was assessed for consistency by blastn (70) comparison of de novo assemblies to a custom database containing all known exotoxin and secreted DNase alleles (71). In addition, completed genomes (n = 14) were manually inspected for the presence/absence of MGE, associated virulence genes, and sites of integration. A complete pangenome analysis was performed on the emm4 population (n = 1,127) using Roary (68). Individual annotated genomes for Roary pangenome analysis were generated using SPAdes de novo assembly followed by annotation with Prokka (v1.13.3) (72). Output from the pangenome analysis was subsequently fed into PANINI (https://panini.pathogen.watch) for identification of clusters using the Barnes-Hut t-SNE (33). Following clustering by PANINI, results were visualized using Microreact (https://microreact.org/) together with the maximum likelihood, recombination-corrected, phylogenetic tree and associated metadata.

In vitro growth analysis of selected emm4 GAS strains.

Growth curves for the four historical and four emergent clade strains were performed in THY or C medium as previously described (62). Briefly, overnight cultures grown at 37°C and 5% CO2 were used to inoculate (1/50 volume) prewarmed medium and distributed to a 96-well microtiter plate. Growth was monitored in a BioTek Synergy H1 plate reader, and readings were obtained every 30 min following shaking to redistribute cells in individual wells. Individual strains were assayed in technical quadruplicate and in biological triplicate on two independent days. Milk plate assays were performed in technical quadruplicate as previously described (39).

RNA extraction, sequencing, and analysis.

High-quality RNA was extracted from GAS cells grown to mid-exponential phase (optical density at 600 nm [OD600] of 0.5) in THY at 37°C and 5% CO2 using the Qiagen RNeasy minikit as previously described (73). The quantity and quality of RNA were assessed using nanodrop (ThermoFisher) and by Agilent TapeStation. RNA-seq libraries were prepared using the Illumina ScriptSeq Complete Bacteria (v3) kit per the manufacturer’s instructions. Libraries were sequenced (150-bp PE) using an Illumina HiSeq4000 instrument. RNA-seq analysis was performed using CLC Genomics Workbench (v20). Raw reads were processed and mapped to the reference genome ABC25 (CP049696). Analysis was restricted to core genome content (n = 1,619 open reading frames/genes) based on the pangenome analysis for the completed genomes. Significant differential gene expression was defined as an absolute fold difference of >1.5 and Bonferroni-corrected P value of <0.05.

Targeted gene transcript analysis.

For TaqMan real-time qRT-PCR, strains were grown in duplicate on two separate occasions to mid-exponential phase in THY as described above and processed as described previously (74). Primers and probes used targeted slo (forward, GACCTTTAAAGAGTTGCAACGAAAA; reverse, GACCATAAGCTACGTTACTCACAAAGA; probe, TGTCAGCAATGAAGCCCCGCC) with tufA (forward, CAATCGTCACTATGCGCACAT, reverse, GAGCGGCACCAGTGATCAT; probe, CTCCAGGACACGCGGACTACGTTAAAAA) as an internal control.

SLO immunoblot assay.

GAS strains were grown to mid-exponential phase in 45 ml of THY as described above. Cultures were centrifuged to separate cells, supernatants were precipitated with acetone and resuspended in 10 mM Tris and 150 mM NaCl. Protein concentration was estimated by the Bradford method, and equal amounts of total protein were separated by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE) and transferred. HPr and Nga were detected using a custom-made antibody (75) and a commercially available (Abcam) polyclonal antibody, respectively, and the Odyssey imaging system as described previously (51).

Murine model of GAS bacteremia.

All mouse studies were performed under a protocol approved by the Institutional Animal Care and Use Committee at the MD Anderson Cancer Center. Four historic (ABC3, ABC25, ABC76, and ABC221) and four emergent (ABC199, ABC208, TSPY637, and TSPY767) strains were used. For the LD50 studies, seven female BALB/c mice per dose per strain (Harlan-Sprague-Dawley) were injected intraperitoneally with a range (5 × 106 to 6 × 108) of GAS CFU as described previously (57). Mice were monitored for near-mortality over 7 days, and survival was compared using Kaplan-Meier analysis. Differences in survival were calculated using a Mantel-Cox (log rank) analysis with a P value of <0.05 considered statistically significant.

Lancefield assays.

Bactericidal assays were performed as previously described (56) under a protocol approved by the Committee for the Protection of Human Subjects at the University of Texas (UT) Health/McGovern Medical School. Blood samples from a minimum of three healthy, nonimmune, adult donors were used for each strain and performed in quadruplicate. Multiplication factors were calculated by dividing the number of CFU per milliliter after 3 h of incubation by the initial inoculum.

Statistical analyses.

All statistical analyses were performed using GraphPad/Prism (v8). Statistical comparisons of continuous variables used Student’s t test (normally distributed) or Mann-Whitney U test (nonnormally distributed). Survival curves for mouse infection studies were compared using log rank test. P values less than 0.05 were considered significant for all tests.

Data availability.

All raw sequence data generated in this study are available at NCBI under BioProject accession number PRJNA609136. Complete annotated genome assemblies are available at GenBank under accession numbers CP049697 (ABC3), CP049696 (ABC25), CP049695 (ABC76), CP049691 (ABC199), CP049690 (ABC208), CP049689 (ABC221), CP049686 (TSPY637), and CP055246 (TSPY767). Raw sequence data for emm4 GAS strains sequenced by the CDC ABCs are available under BioProject PRJNA395240.
  75 in total

1.  Molecular complexity of successive bacterial epidemics deconvoluted by comparative pathogenomics.

Authors:  Stephen B Beres; Ronan K Carroll; Patrick R Shea; Izabela Sitkiewicz; Juan Carlos Martinez-Gutierrez; Donald E Low; Allison McGeer; Barbara M Willey; Karen Green; Gregory J Tyrrell; Thomas D Goldman; Michael Feldgarden; Bruce W Birren; Yuriy Fofanov; John Boos; William D Wheaton; Christiane Honisch; James M Musser
Journal:  Proc Natl Acad Sci U S A       Date:  2010-02-08       Impact factor: 11.205

2.  Characterization of the effect of the histidine kinase CovS on response regulator phosphorylation in group A Streptococcus.

Authors:  Nicola Horstmann; Pranoti Sahasrabhojane; Miguel Saldaña; Nadim J Ajami; Anthony R Flores; Paul Sumby; Chang-Gong Liu; Hui Yao; Xiaoping Su; Erika Thompson; Samuel A Shelburne
Journal:  Infect Immun       Date:  2015-01-05       Impact factor: 3.441

3.  Contribution of Secreted NADase and Streptolysin O to the Pathogenesis of Epidemic Serotype M1 Streptococcus pyogenes Infections.

Authors:  Luchang Zhu; Randall J Olsen; Jessica D Lee; Adeline R Porter; Frank R DeLeo; James M Musser
Journal:  Am J Pathol       Date:  2016-12-27       Impact factor: 4.307

4.  Cluster of Fatal Group A Streptococcal emm87 Infections in a Single Family: Molecular Basis for Invasion and Transmission.

Authors:  Anthony R Flores; Ruth Ann Luna; Jessica K Runge; Samuel A Shelburne; Carol J Baker
Journal:  J Infect Dis       Date:  2017-06-01       Impact factor: 5.226

5.  Plasmid-Borne erm(T) from invasive, macrolide-resistant Streptococcus pyogenes strains.

Authors:  Robyn L Woodbury; Kathryn A Klammer; Yang Xiong; Timothy Bailiff; Anita Glennen; Joanne M Bartkus; Ruth Lynfield; Chris Van Beneden; Bernard W Beall
Journal:  Antimicrob Agents Chemother       Date:  2008-01-07       Impact factor: 5.191

6.  Genome-scale rates of evolutionary change in bacteria.

Authors:  Sebastian Duchêne; Kathryn E Holt; François-Xavier Weill; Simon Le Hello; Jane Hawkey; David J Edwards; Mathieu Fourment; Edward C Holmes
Journal:  Microb Genom       Date:  2016-11-30

7.  Hypervirulent group A Streptococcus emergence in an acaspular background is associated with marked remodeling of the bacterial cell surface.

Authors:  Jessica Galloway-Peña; Sruti DebRoy; Chelcy Brumlow; Xiqi Li; Truc T Tran; Nicola Horstmann; Hui Yao; Ken Chen; Fang Wang; Bih-Fang Pan; David H Hawke; Erika J Thompson; Cesar A Arias; Vance G Fowler; Micah M Bhatti; Awdhesh Kalia; Anthony R Flores; Samuel A Shelburne
Journal:  PLoS One       Date:  2018-12-05       Impact factor: 3.752

8.  Identification of a chimeric emm gene and novel emm pattern in currently circulating strains of emm4 Group A Streptococcus.

Authors:  Sruti DebRoy; Xiqi Li; Awdhesh Kalia; Jessica Galloway-Pena; Brittany J Shah; Vance G Fowler; Anthony R Flores; Samuel A Shelburne
Journal:  Microb Genom       Date:  2018-11-09

9.  Capsule-Negative emm Types Are an Increasing Cause of Pediatric Group A Streptococcal Infections at a Large Pediatric Hospital in Texas.

Authors:  Anthony R Flores; J Chase McNeil; Brittany Shah; Chris Van Beneden; Samuel A Shelburne
Journal:  J Pediatric Infect Dis Soc       Date:  2019-07-01       Impact factor: 3.164

10.  T4 Pili Promote Colonization and Immune Evasion Phenotypes of Nonencapsulated M4 Streptococcus pyogenes.

Authors:  Yi-Hsuan Chen; Shao-Hui Li; Yao-Cheng Yang; Shu-Hao Hsu; Victor Nizet; Yung-Chi Chang
Journal:  mBio       Date:  2020-07-21       Impact factor: 7.867

View more
  2 in total

1.  Characterization of the Type I Restriction Modification System Broadly Conserved among Group A Streptococci.

Authors:  Sruti DebRoy; William C Shropshire; Chau Nguyen Tran; Haiping Hao; Marc Gohel; Jessica Galloway-Peña; Blake Hanson; Anthony R Flores; Samuel A Shelburne
Journal:  mSphere       Date:  2021-11-17       Impact factor: 4.389

2.  CovS inactivation reduces CovR promoter binding at diverse virulence factor encoding genes in group A Streptococcus.

Authors:  Nicola Horstmann; Kevin S Myers; Chau Nguyen Tran; Anthony R Flores; Samuel A Shelburne Iii
Journal:  PLoS Pathog       Date:  2022-02-18       Impact factor: 6.823

  2 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.