Robert Boissy, Azad Ahmed, Benjamin Janto, Josh Earl, Barry G Hall, Justin S Hogg, Gordon D Pusch, Luisa N Hiller, Evan Powell, Jay Hayes, Susan Yu, Sandeep Kathju, Paul Stoodley, J Christopher Post, Garth D Ehrlich, Fen Z Hu.
Abstract
BACKGROUND: Staphylococcus aureus is associated with a spectrum of symbiotic relationships with its human host, from carriage to sepsis, and is frequently associated with nosocomial and community-acquired infections; thus, the differential gene content among strains is of interest.
Year: 2011 PMID: 21489287 PMCID: PMC3094309 DOI: 10.1186/1471-2164-12-187
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Bacterial Chromosome Sequence Datasets Used for Supragenome Analysis
| Genome* | Reference | Sensitivity | MBp | Contigs | %GC | Plasmids | Source |
|---|---|---|---|---|---|---|---|
| CGSSa00 | this publication | untested | 2.78 | 18 | 32.7 | unknown | |
| CGSSa01 | this publication | untested | 2.86 | 58 | 32.6 | unknown | elbow arthroplasty infection |
| CGSSa03 | this publication | untested | 2.83 | 68 | 32.8 | unknown | abdominoplasty infection |
| COL | Gill et al., 2005 | MRSA | 2.81 | 1 | 32.8 | 1 | |
| JH1 | Mwangi et al., 2007 | VISA | 2.91 | 1 | 33.0 | 1 | patient on vancomycin |
| JH9 | Mwangi et al., 2007 | VISA | 2.91 | 1 | 33.0 | 1 | patient on vancomycin |
| MRSA252 | Holden et al., 2004 | MRSA | 2.90 | 1 | 32.8 | 0 | |
| MSSA476 | Holden et al., 2004 | MSSA | 2.80 | 1 | 32.8 | 1 | |
| Mu3 | Neoh et al., 2008 | hetero-VISA | 2.88 | 1 | 32.9 | 0 | |
| Mu50 | Kuroda et al., 2001 | HA-MRVISA | 2.88 | 1 | 32.9 | 1 | pus, neonatal surgical infection |
| MW2 | Baba et al., 2002 | CA-MRSA | 2.82 | 1 | 32.8 | 0 | |
| N315 | Kuroda et al., 2001 | MRSA | 2.81 | 1 | 32.8 | 1 | pharyngeal smear |
| NCTC8325 | Gillaspy et al., 2006 | MRSA | 2.82 | 1 | 32.9 | 0 | |
| Newman | Baba et al., 2008 | MSSA | 2.88 | 1 | 32.9 | 0 | |
| RF122 (ET3-1) | Herron-Olson et al., 2007 | sensitive | 2.74 | 1 | 32.8 | 0 | mastitis (bovine) |
| USA300 (FPR3757) | Diep et al., 2006 | CA-MRSA | 2.87 | 1 | 32.8 | 3 | abscess, HIV + i.v. drug user |
| USA300TCH15 | Highlander et al., 2007 | CA-MRSA | 2.87 | 1 | 32.8 | 1 | asymptomatic pediatric patient |
*The NCBI's "genus species [subspecies]" name for each strain is either Staphylococcus aureus (for the bovine isolate RF122) or Staphylococcus aureus subsp. aureus. Abbreviations: Antibiotic sensitivity: CA, community-acquired; HA, healthcare-acquired; M, methicillin; R, resistant; S, sensitive; V, vancomycin; VI, V-intermediate; hetero-VI, heterogeneous VI; SA, Staphylococcus aureus.
Chromosomal Coding Sequence (CDS) Counts From Different Annotation Providers
| Genome | PGAAP | RAST | RefSeq CDS | GenBank CDS | CMR-P | CMR-T | IMG |
|---|---|---|---|---|---|---|---|
| CGSSa00 | 2,781 | 2,733 | n.a. | n.a. | n.a. | n.a. | n.a. |
| CGSSa01 | 2,971 | 2,769 | n.a. | n.a. | n.a. | n.a. | n.a. |
| CGSSa03 | 2,951 | 2,795 | n.a. | n.a. | n.a. | n.a. | n.a. |
| COL | 2,864 | 2,687 | 2,615 | 2,673 | 2,712 | n.a. | 2,649 |
| JH1 | 2,992 | 2,828 | 2,747 | 2,747 | n.a. | n.a. | 2,789 |
| JH9 | 2,997 | 2,828 | 2,697 | 2,697 | n.a. | n.a. | 2,731 |
| MRSA252 | 2,901 | 2,823 | 2,656 | 2,744 | 2,744 | 2,689 | 2,733 |
| MSSA476 | 2,829 | 2,679 | 2,579 | 2,619 | 2,619 | 2,524 | 2,614 |
| Mu3 | 2,945 | 2,777 | 2,698 | 2,699 | n.a. | n.a. | 2,698 |
| Mu50 | 2,949 | 2,785 | 2,697 | 2,699 | 2,714 | 2,628 | 2,697 |
| MW2 | 2,860 | 2,695 | 2,632 | 2,632 | 2,632 | 2,849 | 2,632 |
| N315 | 2,837 | 2,688 | 2,588 | 2,593 | 2,592 | 2,762 | 2,588 |
| NCTC8325 | 2,924 | 2,747 | 2,892 | 2,892 | 2,892 | 2,654 | 2,894 |
| Newman | 3,025 | 2,813 | 2,614 | 2,614 | n.a. | n.a. | 2,614 |
| RF122 | 2,795 | 2,715 | 2,509 | 2,589 | 2,589 | 2,595 | 2,579 |
| USA300 | 2,957 | 2,778 | 2,560 | 2,560 | 2,578 | n.a. | 2,646 |
| USA300TCH15 | 2,955 | 2,783 | 2,657 | 2,657 | n.a. | n.a. | 2,710 |
Abbreviations: PGAAP, NCBI's "Prokaryotic Genome Automated Annotation Pipeline"; RAST, Argonne National Laboratory's "Rapid Annotation using Subsystem Technology" system; CMR, J. Craig Venter Institute's Comprehensive Microbial Resource (v 21.0); CMR-P and CMR-T, primary annotations and JCVI's re-annotations; IMG, DOE-Joint Genome Institute's Integrated Microbial Genomes (v. 2.5); n.a., not available. A RefSeq is derived from an underlying GenBank record, but the annotations in each record may differ.
Orthologous Clusters and Coding Sequences (CDS) in the S. aureus Supragenome
| Genome | Clusters: All | Clusters: Distributed | Clusters: Unique | Clusters: Non-core % | CDS: All | CDS: Core | CDS: Distributed | CDS: Unique | CDS: Non-core % |
|---|---|---|---|---|---|---|---|---|---|
| CGSSa00 | 2,534 | 266 | 2 | 11 | 2,701 | 2,410 | 289 | 2 | 11 |
| CGSSa01 | 2,648 | 364 | 18 | 14 | 2,733 | 2,362 | 353 | 18 | 14 |
| CGSSa03 | 2,628 | 358 | 4 | 14 | 2,765 | 2,389 | 372 | 4 | 14 |
| COL | 2,543 | 270 | 7 | 11 | 2,649 | 2,374 | 268 | 7 | 10 |
| JH1 | 2,643 | 377 | 0 | 14 | 2,796 | 2,382 | 414 | 0 | 15 |
| JH9 | 2,643 | 377 | 0 | 14 | 2,796 | 2,382 | 414 | 0 | 15 |
| MRSA252 | 2,645 | 376 | 3 | 14 | 2,788 | 2,393 | 392 | 3 | 14 |
| MSSA476 | 2,553 | 275 | 12 | 11 | 2,643 | 2,370 | 261 | 12 | 10 |
| Mu3 | 2,629 | 363 | 0 | 14 | 2,747 | 2,369 | 378 | 0 | 14 |
| Mu50 | 2,629 | 363 | 0 | 14 | 2,754 | 2,370 | 384 | 0 | 14 |
| MW2 | 2,574 | 302 | 6 | 12 | 2,661 | 2,370 | 285 | 6 | 11 |
| N315 | 2,538 | 271 | 1 | 11 | 2,660 | 2,362 | 297 | 1 | 11 |
| NCTC8325 | 2,589 | 315 | 8 | 12 | 2,712 | 2,380 | 323 | 9 | 12 |
| Newman | 2,579 | 293 | 20 | 12 | 2,775 | 2,391 | 361 | 23 | 14 |
| RF122 | 2,524 | 205 | 53 | 10 | 2,682 | 2,391 | 238 | 53 | 11 |
| USA300 | 2,620 | 354 | 0 | 14 | 2,744 | 2,387 | 357 | 0 | 13 |
| USA300TCH15 | 2,620 | 354 | 0 | 14 | 2,746 | 2,390 | 356 | 0 | 13 |
Core clusters (2,266 here) have one or more representative CDS in each genome examined; unique clusters are represented in only one genome; and distributed clusters in more than one but not all genomes examined.
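The three cluster categories defined above, and the "Non-core %" columns of the table, can be illustrated with a minimal Python sketch. The function names and the `presence` mapping (cluster ID to the set of genomes in which the cluster is represented) are hypothetical and are not taken from the authors' pipeline; the percentage is assumed to be the distributed plus unique count divided by the "All" count, which reproduces the tabulated values for the rows checked.

```python
from typing import Dict, Set


def classify_clusters(presence: Dict[str, Set[str]], n_genomes: int) -> Dict[str, str]:
    """Label each cluster as core, distributed, or unique.

    presence maps a cluster ID to the set of genomes holding at least one
    CDS of that cluster (a hypothetical input format).
    """
    labels = {}
    for cluster_id, genomes in presence.items():
        k = len(genomes)
        if k == n_genomes:
            labels[cluster_id] = "core"         # represented in every genome
        elif k == 1:
            labels[cluster_id] = "unique"       # represented in exactly one genome
        else:
            labels[cluster_id] = "distributed"  # more than one, but not all
    return labels


def non_core_percent(n_all: int, n_distributed: int, n_unique: int) -> int:
    """Assumed definition: share of non-core entries among all entries, rounded."""
    return round(100 * (n_distributed + n_unique) / n_all)


# Toy example with three genomes.
toy = {
    "c1": {"A", "B", "C"},
    "c2": {"A", "B"},
    "c3": {"C"},
}
print(classify_clusters(toy, n_genomes=3))
# Cluster counts for CGSSa00 from the table: All = 2,534, Distributed = 266, Unique = 2.
print(non_core_percent(2534, 266, 2))
```

For CGSSa00 the cluster counts give 268 / 2,534, which rounds to the tabulated 11%.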
Figure 1. Clustering of strains using neighbor-grouping analysis. The figure shows the relationships among the 17 Staphylococcus aureus genomes under study based on the percentage of shared distributed genes. Valid neighbor groups of genomes (see Materials and Methods) are enclosed in ellipses.
Figure 2. Pair-wise gene possession comparisons among all 136 possible strain pairs. The comparison of two strains is summarized in the (4-level) box at the intersection of the row and column corresponding to the respective strains. Pair-wise relationships are summarized based on the number of genes with orthologs in each of the two strains (S = similarity score, level 1 of each box); the number of genes with an ortholog in one strain but not the other (D = difference score, level 2 of each box); a composite comparison score (C = S - D, level 3 of each box); and the number of genes with orthologs found only in both strains (P = pair unique score, level 4 of each box).
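The four pair-wise scores described in this caption can be sketched with set operations. This is an illustrative reading of the definitions, not the authors' code: `gene_sets` (strain name to set of orthologous cluster IDs) and `pair_scores` are hypothetical names, and P is interpreted as the number of genes present in both strains of the pair and in no other strain.

```python
from itertools import combinations
from math import comb


def pair_scores(gene_sets, a, b):
    """Return the S, D, C and P scores for strains a and b."""
    shared = gene_sets[a] & gene_sets[b]
    s = len(shared)                          # genes with orthologs in both strains
    d = len(gene_sets[a] ^ gene_sets[b])     # genes in one strain but not the other
    # Genes carried by any strain outside the pair.
    others = set().union(*(g for name, g in gene_sets.items() if name not in (a, b)))
    p = len(shared - others)                 # genes found only in this pair
    return {"S": s, "D": d, "C": s - d, "P": p}


# Toy data: three strains with a handful of genes each.
toy = {
    "A": {"g1", "g2", "g3", "g4"},
    "B": {"g1", "g2", "g5"},
    "C": {"g1", "g3", "g6"},
}
for a, b in combinations(sorted(toy), 2):
    print(a, b, pair_scores(toy, a, b))

# 17 strains give C(17, 2) unordered pairs.
print(comb(17, 2))
```

With 17 strains the number of unordered pairs is 17 × 16 / 2 = 136, matching the count in the caption.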
Figure 3. Finite supragenome model results using (. In our previous supragenome analyses carried out with Haemophilus influenzae and Streptococcus pneumoniae we used a version of the finite supragenome model that required fixed population gene frequency classes. This model has been updated to make the optimization function (the log-likelihood of the observed sample gene frequency histogram, i.e., the observed gene frequency class distribution among the |S| strains examined) dependent on the values of the population gene frequency vector (μ) as well as the values of the corresponding mixture coefficient vector (π, for the probability that a gene in a supragenome will be represented in one of the K classes of population gene frequencies). For a given species, the bottom graph plots the values of the vector μ against the product of the estimate of supragenome size and the values of the vector π, all obtained at the maximization of the log-likelihood function.
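The quantities named in this caption can be made concrete with a simplified sketch of a binomial mixture. It assumes that a gene in population frequency class k appears independently in each of the |S| strains with probability μ_k, so that the probability of seeing a gene in exactly n strains is Σ_k π_k · Binomial(n; |S|, μ_k). The log-likelihood below conditions on a gene being seen in at least one strain and omits constant terms; the objective actually optimized by the authors may differ, and all function names are hypothetical.

```python
from math import comb, log


def class_probabilities(mu, pi, n_strains):
    """P(gene present in exactly n of n_strains strains) for n = 0..n_strains."""
    probs = []
    for n in range(n_strains + 1):
        p = sum(
            w * comb(n_strains, n) * m**n * (1 - m) ** (n_strains - n)
            for m, w in zip(mu, pi)
        )
        probs.append(p)
    return probs


def expected_histogram(mu, pi, n_strains, supragenome_size):
    """Expected number of genes per sample frequency class (index 0 = unobserved)."""
    return [supragenome_size * p for p in class_probabilities(mu, pi, n_strains)]


def conditional_log_likelihood(observed, mu, pi):
    """Log-likelihood of the observed histogram, given each gene was seen at least once.

    observed[n - 1] is the number of genes found in exactly n strains.
    """
    n_strains = len(observed)
    probs = class_probabilities(mu, pi, n_strains)
    seen = 1.0 - probs[0]  # probability that a gene is observed in the sample at all
    return sum(
        count * log(probs[n] / seen)
        for n, count in enumerate(observed, start=1)
        if count > 0
    )


# Toy model: a rare class (mu = 0.1) and a core class (mu = 1.0), three strains.
mu, pi = [0.1, 1.0], [0.5, 0.5]
print(expected_histogram(mu, pi, n_strains=3, supragenome_size=1000))
print(conditional_log_likelihood([10, 2, 50], mu, pi))
```

Maximizing such a function over μ and π (for example with a general-purpose optimizer) is what would make the fitted frequency classes depend on the data rather than being fixed in advance.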
Figure 4. Histogram of observed sample gene frequencies compared to the predicted number using the finite supragenome model. The number of genes for each frequency class was calculated using the results from our revised finite supragenome model (trained on all 17 strains). The observed and predicted number of core genes (2,266) found in all 17 strains agreed exactly; these values are not shown to avoid distortion of the scale of the graph. Distributed genes appear in two or more strains, but not all (from 2 to 16 here).
Figure 5. Comparison of the observed and predicted supragenome parameters as additional strains are sequenced. The two panels on the left show observed (upper panel) and predicted (lower panel) numbers of new genes that were or would be found in the second to the nth genome for the number of strains examined (17) or a hypothetical study of 30 strains of Staphylococcus aureus. The two panels on the right show observed (upper panel) and predicted (lower panel) numbers of core and total genes that were or are predicted to be found in the second to the nth genome for the number of strains examined (17) or a hypothetical study of 30 strains of Staphylococcus aureus. Observed new, core, and total genes were calculated using genomes examined in ascending order of their counts of non-core genes.
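The observed curves described in this caption can be sketched as a running tally over genomes added in ascending order of their non-core gene counts. The input format (`gene_sets`, mapping genome name to a set of cluster IDs) and the tie-breaking by genome name are assumptions made for illustration only.

```python
def accumulation_curve(gene_sets):
    """Return (genome, new, core, total) tuples as genomes are added one at a time."""
    # Core over the full set, used only to rank genomes by non-core gene count.
    all_core = set.intersection(*gene_sets.values())
    order = sorted(gene_sets, key=lambda name: (len(gene_sets[name] - all_core), name))

    total, core, rows = set(), None, []
    for name in order:
        genes = gene_sets[name]
        new = genes - total                      # genes not seen in earlier genomes
        total |= genes                           # running union (total genes)
        core = set(genes) if core is None else core & genes  # running intersection
        rows.append((name, len(new), len(core), len(total)))
    return rows


toy = {
    "A": {"g1", "g2"},
    "B": {"g1", "g2", "g3"},
    "C": {"g1", "g3", "g4", "g5"},
}
for row in accumulation_curve(toy):
    print(row)
```

The first tuple describes the starting genome; the "second to the nth genome" values in the figure correspond to the subsequent tuples.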
Mapping of S. aureus chromosomal CDS clusters to RAST annotation "Product" feature qualifiers
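| Distinct "Product" qualifiers per cluster* | Core clusters | Distributed clusters | Unique clusters | Total |
|---|---|---|---|---|
| 1 | 2,212 | 692 | 134 | 3,038 |
| 2 | 41 | 56 | 0 | 97 |
| 3 | 11 | 6 | 0 | 17 |
| > 3 | 2 | 1 | 0 | 3 |
| Total | 2,266 | 755 | 134 | 3,155 |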
*Shown are the numbers of orthologous clusters of core, distributed, or unique type whose constituent CDS yield the indicated numbers of distinct RAST annotation CDS "Product" feature qualifiers.
Mapping of S. aureus RAST annotation "Product" feature qualifiers to chromosomal CDS clusters
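| Distinct clusters per "Product" qualifier* | Core | Distributed | Unique | Total |
|---|---|---|---|---|
| 1 | 1,473 | 258 | 62 | 1,793 |
| 2 | 122 | 39 | 7 | 168 |
| 3 | 17 | 10 | 2 | 29 |
| > 3 | 11 | 9 | 1 | 21 |
| Total | 1,623 | 316 | 72 | 2,011 |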
*Shown are the numbers of distinct RAST annotation CDS "product" feature qualifiers that describe CDS belonging to either core, distributed, or unique clusters and where the relevant CDS yield the indicated number of distinct clusters. The CDS product feature qualifier "hypothetical protein" was deliberately excluded as it would be expected to map to different clusters.
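Both mappings (clusters to "Product" qualifiers, and "Product" qualifiers to clusters) amount to counting distinct partners per key and binning the counts as 1, 2, 3, or > 3. A minimal sketch follows, assuming the annotation is available as (cluster ID, product) pairs, one per CDS; the record format and function names are hypothetical.

```python
from collections import Counter, defaultdict


def bin_label(n):
    """Bin a count of distinct partners into the table's row labels."""
    return str(n) if n <= 3 else "> 3"


def distinct_partner_histogram(pairs):
    """Count, for each key, how many distinct values it is paired with, then bin."""
    partners = defaultdict(set)
    for key, value in pairs:
        partners[key].add(value)
    return Counter(bin_label(len(values)) for values in partners.values())


# Toy CDS records as (cluster ID, RAST product) pairs.
records = [
    ("c1", "DNA gyrase"),
    ("c1", "DNA gyrase"),
    ("c2", "hypothetical protein"),
    ("c2", "kinase"),
    ("c3", "kinase"),
    ("c4", "hypothetical protein"),
]

# Clusters -> distinct products.
print(distinct_partner_histogram(records))

# Products -> distinct clusters, excluding "hypothetical protein" as in the footnote.
reverse = [(prod, cl) for cl, prod in records if prod != "hypothetical protein"]
print(distinct_partner_histogram(reverse))
```

In the toy data, cluster c2 carries two distinct products and the product "kinase" maps to two clusters, so each direction yields one entry in the "2" bin.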
Supragenome Coding Sequence (CDS) Gene Assignments to RAST Subsystems
| Subsystem annotation* | CDS count |
|---|---|
| • none | 20,117 |
| • Ribosome LSU bacterial | 558 |
| • Teichoic and lipoteichoic acids biosynthesis | 385 |
| • Heme, hemin uptake and utilization systems in Gram Positives | 358 |
| • Glycerolipid and Glycerophospholipid Metabolism in Bacteria | 357 |
| • DNA-replication | 357 |
| • Conserved gene cluster associated with Met-tRNA formyltransferase | 357 |
| • Ribosome SSU bacterial | 357 |
| • Peptidoglycan Biosynthesis | 340 |
| • tRNA modification | 339 |
| • Adhesins in Staphylococcus | 320 |
| • DNA repair, bacterial | 311 |
| • Methionine Biosynthesis | 307 |
| • tRNA aminoacylation | 289 |
| • Embden-Meyerhof and Gluconeogenesis | 255 |
| • Bacterial Cell Division | 255 |
| • pyrimidine conversions | 244 |
| • Translation factors bacterial | 242 |
| • Other defined categories (206 additional RAST subsystems) | 14,724 |
| ◦ none | 5,161 |
| ◦ Staphylococcal pathogenicity islands SaPI | 68 |
| ◦ ABC transporter oligopeptide (TC 3.A.1.5.1) | 62 |
| ◦ ESAT-6 proteins secretion system in Firmicutes | 60 |
| ◦ Methicillin resistance in Staphylococci | 47 |
| ◦ Adhesins in Staphylococcus | 39 |
| ◦ Restriction-Modification System | 33 |
| ◦ Cobalt-zinc-cadmium resistance | 31 |
| ◦ Potassium homeostasis | 27 |
| ◦ Teichoic and lipoteichoic acids biosynthesis | 22 |
| ◦ Aminoglycoside adenylyltransferases | 17 |
| ◦ Sex pheromones in | 16 |
| ◦ DNA repair, bacterial | 16 |
| ◦ tRNA modification E.coli | 16 |
| ◦ Nudix proteins (nucleoside triphosphate hydrolases) | 15 |
| ◦ Fosfomycin resistance | 14 |
| ◦ Tn552 | 14 |
| ◦ Glycerol and Glycerol-3-phosphate Uptake and Utilization | 12 |
| ◦ Peptidoglycan Biosynthesis | 12 |
| ◦ Other defined categories (15 additional RAST subsystems) | 60 |
| ❖ none | 130 |
| ❖ Restriction-Modification System | 3 |
| ❖ Streptothricin resistance | 1 |
| ❖ Teichoic and lipoteichoic acids biosynthesis | 1 |
| ❖ ABC transporter oligopeptide (TC 3.A.1.5.1) | 1 |
| ❖ Formaldehyde assimilation: Ribulose monophosphate pathway | 1 |
| ❖ Heme and Siroheme Biosynthesis | 1 |
*50% of core, 90% of distributed, and 94% of unique CDS could not be assigned to any RAST subsystem. Row markers: •, core CDS; ◦, distributed CDS; ❖, unique CDS.
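The percentages in this footnote can be recomputed from the table, assuming that the rows marked •, ◦, and ❖ list core, distributed, and unique CDS respectively (an inference from the row counts, not stated in the table as extracted). The helper name below is hypothetical.

```python
def unassigned_percent(none_count, assigned_counts):
    """Percentage of CDS in a group with no RAST subsystem assignment."""
    total = none_count + sum(assigned_counts)
    return round(100 * none_count / total)


# CDS counts copied from the table rows, excluding the "none" row of each group.
core_assigned = [558, 385, 358, 357, 357, 357, 357, 340, 339, 320,
                 311, 307, 289, 255, 255, 244, 242, 14724]
distributed_assigned = [68, 62, 60, 47, 39, 33, 31, 27, 22, 17,
                        16, 16, 16, 15, 14, 14, 12, 12, 60]
unique_assigned = [3, 1, 1, 1, 1, 1]

print(unassigned_percent(20117, core_assigned))        # core
print(unassigned_percent(5161, distributed_assigned))  # distributed
print(unassigned_percent(130, unique_assigned))        # unique
```

Under that reading the three groups give 20,117 / 40,472, 5,161 / 5,742, and 130 / 138, which round to the stated 50%, 90%, and 94%.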
Chromosomal Coding Sequence (CDS) annotations associated with Methicillin Resistance
| Genome | Sensitivity | FemA | FemB | FemC | FemD | FmtA | FmtB | FmtC | HmrA | HmrB | LytH | MecA | MecI | MecR1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CGSSa00 | untested | 1 | 1 | 1 | 1 | 1 | 3 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| CGSSa01 | untested | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| CGSSa03 | untested | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| COL | MRSA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| JH1 | VISA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| JH9 | VISA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| MRSA252 | MRSA | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| MSSA476 | MSSA | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| Mu3 | hetero-VISA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Mu50 | HA-MRVISA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| MW2 | CA-MRSA | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| N315 | MRSA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| NCTC8325 | MRSA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| Newman | MSSA | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| RF122 (ET3-1) | sensitive | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
| USA300 (FPR3757) | CA-MRSA | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| USA300TCH15 | CA-MRSA | 1 | 1 | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
Abbreviations (see also Table 1): FemA, essential for MR (glycine interpeptide bridge formation); FemB, involved in MR (glycine interpeptide bridge formation); FemC, involved in MR (glutamine synthetase repressor); FemD (phosphoglucosamine mutase, EC 5.4.2.10), involved in MR; FmtA, involved in MR (affects cell wall cross-linking and amidation); FmtB (Mrp), involved in MR and cell wall biosynthesis; FmtC (MrpF), involved in MR (L-lysine modification of phosphatidylglycerol); HmrA, involved in MR (amidohydrolase of M40 family); HmrB, acyl carrier protein involved in MR; LytH, involved in MR (N-acetylmuramoyl-L-alanine amidase, EC 3.5.1.28 domain); MecA, penicillin-binding protein PBP2a, MR determinant, transpeptidase; MecI, MR repressor; MecR1, MR regulatory sensor-transducer.