Andries J van Tonder, James E Bray, Keith A Jolley, Melissa Jansen van Rensburg, Sigríður J Quirk, Gunnsteinn Haraldsson, Martin C J Maiden, Stephen D Bentley, Ásgeir Haraldsson, Helga Erlendsdóttir, Karl G Kristinsson, Angela B Brueggemann.
Abstract
Understanding the structure of a bacterial population is essential to understanding bacterial evolution. Estimating the core genome (those genes common to all, or nearly all, strains of a species) is a key component of such analyses. The size and composition of the core genome vary by dataset, but we hypothesized that the variation between different collections of the same bacterial species would be minimal. To investigate this, we analyzed the genome sequences of 3,118 pneumococci recovered from healthy individuals in Reykjavik (Iceland), Southampton (United Kingdom), Boston (United States), and Maela (Thailand). The analyses revealed a "supercore" genome (genes shared by all 3,118 pneumococci) of 558 genes, although an additional 354 core genes were shared by pneumococci from Reykjavik, Southampton, and Boston. Overall, the size and composition of the core and pan-genomes among pneumococci recovered in Reykjavik, Southampton, and Boston were similar. Maela pneumococci were distinctly different in that they had a smaller core genome and a larger pan-genome. The pan-genome of Maela pneumococci contained several >25 kb sequence regions (flanked by pneumococcal genes) that were homologous to genomic regions found in other bacterial species. Overall, our work revealed that some subsets of the global pneumococcal population are highly heterogeneous, and our hypothesis was rejected. This is an important finding in terms of understanding genetic variation among pneumococci and is also an essential point of consideration before generalizing the findings from a single dataset to the wider pneumococcal population.
Keywords: accessory genome; bacterial population structure; core genome; next generation sequencing; pan-genome; pneumococcus
Year: 2019 PMID: 30858837 PMCID: PMC6398412 DOI: 10.3389/fmicb.2019.00317
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Summary of the pneumococcal genome datasets analyzed in this study.
| Location | Genomes (n) | Years of isolation | STs (n) | CCs (n) | Serotypes (n) | PCV status | Source of data |
|---|---|---|---|---|---|---|---|
| Reykjavik | 986 | 2009–2014 | 98 | 42 | 31 | Pre- and post-PCV10 | |
| Southampton | 516 | 2006–2011 | 128 | 54 | 43 | Post-PCV7/13 | |
| Boston | 616 | 2001–2007 | 139 | 56 | 31 | Post-PCV7/13 | |
| Maela | 1,000 | 2004–2010 | 215 | 85 | 63 | PCV naive | |
Summary of the estimated core genome and pan-genome for each pneumococcal genome dataset.
| Location | % of genomes that possess each core gene | Putative paralogs (n) | Genes within estimated core genome (n) | Genes within pan-genome (n) |
|---|---|---|---|---|
| Reykjavik | ≥99.8 | 9 | 1,112 | 7,340 |
| Southampton | ≥99.8 | 3 | 1,135 | 6,821 |
| Boston | ≥99.8 | 6 | 1,108 | 6,885 |
| Maela | ≥99.9 | 5 | 671 | 12,184 |
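The thresholds in the table above (a gene counts as core when it is present in at least 99.8% or 99.9% of genomes) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `core_and_pan`, the input layout (genome ID mapped to a set of gene-cluster IDs), and the toy data are assumptions made purely for illustration.

```python
from typing import Dict, Set, Tuple


def core_and_pan(
    presence: Dict[str, Set[str]], threshold: float = 0.998
) -> Tuple[Set[str], Set[str]]:
    """Return (core, pan) gene sets.

    presence  -- hypothetical input: genome ID -> set of gene-cluster IDs
    threshold -- minimum fraction of genomes that must carry a gene
                 for it to count as core (0.998 mirrors the >=99.8% rows)
    """
    n_genomes = len(presence)
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1

    # Pan-genome: every gene seen in at least one genome.
    pan = set(counts)
    # Core genome: genes whose prevalence meets the threshold.
    core = {g for g, c in counts.items() if c / n_genomes >= threshold}
    return core, pan


# Toy data (invented): four genomes, four gene clusters.
toy = {
    "genome1": {"A", "B", "C"},
    "genome2": {"A", "B"},
    "genome3": {"A", "C", "D"},
    "genome4": {"A", "B", "C"},
}
core, pan = core_and_pan(toy, threshold=0.75)
strict_core, _ = core_and_pan(toy, threshold=1.0)
```

With a strict threshold of 1.0 only gene `A` remains core in the toy data, which shows in miniature how a stricter cutoff or a more heterogeneous collection shrinks the estimated core while the pan-genome is unaffected.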
FIGURE 1. Bar graphs depicting the Clusters of Orthologous Groups (COGs) functional categories for each set of estimated core genes.
FIGURE 2. Illustration of the estimated core genes in each dataset and the shared supercore genome. (A) Venn diagram depicting the numbers of core genes shared by the four datasets and the 558 genes in the shared supercore genome; (B) COGs functional categories for the supercore genes as compared to the additional core genes shared by the Reykjavik, Southampton and Boston datasets only.
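The Venn-diagram logic behind the supercore can be sketched as plain set operations. The sketch below assumes that gene identifiers are directly comparable across datasets, which in practice requires a shared clustering or annotation scheme. The function names and toy gene labels are hypothetical and do not come from the study.

```python
from typing import Dict, Iterable, Set


def supercore(core_sets: Dict[str, Set[str]]) -> Set[str]:
    """Genes that are core in every dataset (intersection of all core sets)."""
    sets = list(core_sets.values())
    return set.intersection(*sets) if sets else set()


def shared_only_by(core_sets: Dict[str, Set[str]], members: Iterable[str]) -> Set[str]:
    """Genes core in all `members` datasets and in none of the others."""
    members = set(members)
    inside = set.intersection(*(core_sets[m] for m in members))
    outside = set().union(
        *(genes for name, genes in core_sets.items() if name not in members)
    )
    return inside - outside


# Toy core sets (invented labels, not real gene names).
toy_cores = {
    "Reykjavik": {"a", "b", "c", "d"},
    "Southampton": {"a", "b", "c", "e"},
    "Boston": {"a", "b", "c"},
    "Maela": {"a", "f"},
}
shared_all = supercore(toy_cores)
shared_three = shared_only_by(toy_cores, ["Reykjavik", "Southampton", "Boston"])
```

In the toy data, `shared_all` plays the role of the 558-gene supercore and `shared_three` the role of the additional core genes found only in the Reykjavik, Southampton and Boston datasets.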
FIGURE 3. Phylogenetic tree representing the relationships among all 3,118 pneumococci, constructed from the concatenated sequence alignment of the 558 supercore genes. Clades are colored according to hierBAPS sequence cluster. Outer rings are annotated and colored as depicted in the legends. Serotypes and CCs represented by >10 and >20 genomes, respectively, are annotated.
FIGURE 4. Comparison of the number of genes in the pan-genome of each dataset. (A) Results of Roary pan-genome analyses using two different thresholds, ≥70 and ≥90% nucleotide sequence similarity, for each of the four datasets; (B) Venn diagram depicting the number of genes present in the pan-genome of each dataset and how many of those were shared between datasets (using the ≥70% sequence similarity threshold).
FIGURE 5. Bar graphs depicting the Clusters of Orthologous Groups (COGs) functional categories of the unique genes in each dataset.
FIGURE 6. Phylogenetic tree constructed from 53 ribosomal gene sequences extracted from the 3,118 pneumococcal study genomes plus 1,000 genomes of 65 different non-pneumococcal Streptococcus spp. Pneumococcal and non-pneumococcal clusters are colored red and blue, respectively.
Large genomic regions that were unique to the Maela dataset.
| Representative genome | Length of region (bp) | No. of Maela genomes with region | GenBank best match (% identity) | Fragment type | |
|---|---|---|---|---|---|
| SMRU1398 | 66,142 | 7 | Tn | ||
| SMRU1170 | 59,943 (1 gap) | 2 | Tn | ||
| SMRU1457 | 51,873 | 11 | No significant match | Tn | |
| SMRU2268 | 41,961 | 2 | Tn | ||
| SMRU1351 | 39,520 | 11 | No significant match | No significant match | Prophage |
| SMRU2725 | 38,392 | 19 | No significant match | No significant match | Unknown transposon fragment |
| SMRU392 | 34,256 | 1 | No significant match | No significant match | Prophage |
| SMRU158 | 32,902 (2 gaps) | 1 | Unknown transposon fragment | ||
| SMRU1017 | 32,098 | 1 | No significant match | No significant match | Partial prophage sequence |
| SMRU128 | 30,967 | 46 | Pentose and glucuronate interconversion region | ||
| SMRU148 | 30,628 | 1 | No significant match | TnGBS2 | |
| SMRU1266 | 28,998 | 1 | No significant match | No significant match | Partial prophage sequence |
| SMRU1770 | 26,625 (3 gaps) | 3 | Unknown transposon fragment | ||
| SMRU602 | 25,230 | 1 | Tn | ||
FIGURE 7. Mobile genetic elements identified among the Maela genomes. (A–C) Tn1549-like ICE with Tn916; (D) Tn1549-like ICE without Tn916; (E) Tn916; and (F) TnGBS2.