Hannah R Frost1, Mark R Davies2, Valérie Delforge1, Dalila Lakhloufi1, Martina Sanderson-Smith3, Velusamy Srinivasan4, Andrew C Steer5,6, Mark J Walker7, Bernard Beall4, Anne Botteaux1, Pierre R Smeesters8,5,9,6.
Abstract
The core Mga (multiple gene activator) regulon of group A Streptococcus (GAS) contains genes encoding proteins involved in adhesion and immune evasion. While all GAS genomes contain genes for Mga and C5a peptidase, the intervening genes encoding M and M-like proteins vary between strains. The genetic make-up of the Mga regulon of GAS was characterized by utilizing a collection of 1,688 GAS genomes that are representative of the global GAS population. Sequence variations were examined with multiple alignments, and the expression of all core Mga regulon genes was examined by quantitative reverse transcription-PCR in a representative strain collection. In 85.2% of the sampled genomes, the Mga locus contained genes encoding Mga, Mrp, M, Enn, and C5a peptidase proteins. These isolates account for 53% of global infections. Only 9.1% of genomes did not contain either an mrp or an enn gene. The pairwise identity within Enn (68.6%) and Mrp (83.2%) protein sequences was higher than within M proteins (44.7%). Gene expression varied between strains tested, but high expression was recorded for all genes in at least one strain. Previous nomenclature issues were clarified with molecular gene definitions. Our findings support a shift in focus in the GAS research field to further consider the role of Mrp and Enn in virulence and vaccine development.
IMPORTANCE While the GAS M protein has been the leading vaccine target for decades, the bacteria encode many other virulence factors of interest for vaccine development. In this work, we show that emm-like genes are encoded in a remarkable majority of GAS genomes and expressed at a level similar to that for the emm gene. In collaboration with the U.S. Centers for Disease Control, we developed molecular definitions of the different emm and emm-like gene families. This clarification should abrogate mistyping of strains, especially in the area of whole-genome typing. We have also updated the emm-typing collection by removing emm-like gene sequences and provided in-depth analysis of Mrp and Enn protein sequence structure and diversity.
Keywords: M-like proteins; Mga regulon; Streptococcus pyogenes; global GAS diversity
Year: 2020 PMID: 31915226 PMCID: PMC6952200 DOI: 10.1128/mSphere.00806-19
Source DB: PubMed Journal: mSphere ISSN: 2379-5042 Impact factor: 4.389
FIG 1 Details of the genome and gene collections. Details on the process by which the genomes and genes available were selected for further analysis. The final collection of alleles provides the best representation of global diversity of the gene families available, while avoiding the possibility of confounding by sequence ambiguities. *, a total of 19 genomes had substantial sequence ambiguities in the emm gene domain but nevertheless contained emm-typing sequences. These genes were excluded from gene family analyses, but genomes were included in Mga composition analyses.
Nucleotide probes for in silico identification of Mga regulon genes
| Gene | Flexibility | Sequence | Probe in unique alleles (%) |
|---|---|---|---|
| | 3 | GAGATTGAAAAACAGTACGATGTTATCGTGACAGATGTTATGGT | 386/390 (99.0) |
| | 1 | AACCAAGAAAAAGAAAAGTTAGAAGC | 295/295 (100) |
| | 1 | AACAAAGAGCTTGAAGAA | 623/635 (98.1) |
| | 0 | TATTSGCTTAGAAAATTAA | 624/635 (98.3) |
| | 1 | TCTGAGTTAACRCAAGCRAARRYTCAACTYKY | 350/352 (99.4) |
| | 1 | GAAGTAACAGTAACAGTTCACAACAAATCTGATAAACCTCAAGAGTTGTATTA | 550/553 (99.5) |
The flexibility number refers to the number of mismatched nucleotide (nt) bases allowed to provide 99 to 100% specificity and sensitivity. All emm genes contained at least one of the two emm probes; there were 32 genomes with 10 distinct emm alleles that contained the 3′ probe but not the 5′ probe and 18 genomes with 7 distinct emm alleles that contained the 5′ probe but not the 3′ probe.
Whatmore et al. (7).
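The flexibility column can be read as a mismatch budget for each probe. The following is a minimal Python sketch of that idea, not the authors' pipeline: it slides a probe along a sequence, treats IUPAC degenerate codes in the probe (such as S, R, Y, and K in the table above) as matching any base they stand for, and reports positions where the mismatch count does not exceed the flexibility. The function names `count_mismatches` and `find_probe` are hypothetical, and searching only the forward strand is an assumption made for brevity.

```python
# IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(probe, window):
    """Count positions where the window base is not allowed by the probe code."""
    return sum(base not in IUPAC[code] for code, base in zip(probe, window))


def find_probe(probe, sequence, flexibility):
    """Return 0-based start positions where the probe matches the sequence
    with at most `flexibility` mismatches (forward strand only; assumption)."""
    probe = probe.upper()
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - len(probe) + 1):
        window = sequence[start:start + len(probe)]
        if count_mismatches(probe, window) <= flexibility:
            hits.append(start)
    return hits
```

For instance, `find_probe("TATTSGCTTAGAAAATTAA", seq, 0)` would accept either C or G at the fifth probe position but no other deviation, which corresponds to a flexibility of 0 in the table. A real screen would presumably also scan the reverse complement.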
FIG 2 Configurations of Mga regulons. In the large collection of contiguous Mga regulon sequences, we identified 10 possible configurations of the regulon based on presence and positions of genes. All Mga regulons began and ended with the mga and scpA genes and could contain genes for Mrp, Emm, Enn, Pgs, protein H, SIC, and transposases. The most frequent Mga regulon configuration, with genes for the trio of M and M-like proteins, was present in around 74% of genomes, from emm types that are responsible for around half of global infections.
FIG 3 Distribution of M and M-like protein lengths. Distribution of the lengths of all M and M-like proteins from the collection. Bars represent the number of genomes in the collection that contained a protein of the size indicated on the x axis. M proteins show the most diversity in protein lengths and can be either the smallest or the largest of the trio of proteins. Enn proteins are typically smaller and have a less variable length distribution, and Mrp proteins are largely restricted to four possible lengths.
Amino acid sequence identities in different regions of M and M-like proteins
| Protein(s) | Signal peptide (% identity) | First 50 amino acids (% identity) | 51st to repeat region (% identity) | Repeat region to LPXTG (% identity) |
|---|---|---|---|---|
| Mrp | 98.0 | 43.2 | 91.6 | 96.1 |
| Emm | 81.8 | 15.0 | 41.4 | 87.9 |
| Enn | 96.2 | 27.3 | 56.2 | 93.7 |
| Mrp + Emm + Enn | 78.4 | | | 59.2 |
The variability between protein families was too great in the regions from the first amino acids of the mature protein to the repeat regions to perform a meaningful multiple alignment across the three protein families.
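To make the identity percentages concrete, the following is a minimal Python sketch of one common way to compute mean pairwise identity from an existing multiple alignment. It is an illustration under stated assumptions rather than the method used in the study: the input sequences are assumed to be already aligned to equal length with `-` as the gap character, columns where both sequences have a gap are skipped, and a gap opposite a residue counts as a mismatch. The function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Percent identity between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite
    a residue counts as a mismatch (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def mean_pairwise_identity(aligned):
    """Average percent identity over all unordered pairs of aligned sequences."""
    pairs = list(combinations(aligned, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)
```

Restricting the aligned strings to one region (for example, the signal peptide columns only) before calling `mean_pairwise_identity` would give a per-region figure of the kind tabulated above. Different gap conventions yield different percentages, so values from this sketch need not match the published ones.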
FIG 4 Expression analysis of Mga regulon genes. cDNA from 19 isolates grown to mid-log phase in rich medium was analyzed for the expression of Mga regulon genes. The isolates were selected to be representative of all possible Mga regulon configurations and emm cluster diversity where possible. Primers were designed to amplify all members of the gene family where possible (mrp, enn, pgs, sph, and scpA) and to amplify a subset where sequence diversity made this necessary. The dot plot symbols represent the mean value of the four qPCR analyses for each isolate, and the error bars represent the standard errors for all isolates for each gene.