Pekka Marttinen, Nicholas J Croucher, Michael U Gutmann, Jukka Corander, William P Hanage.
Abstract
BACKGROUND: Population samples show that bacterial genomes can be divided into a core of ubiquitous genes and accessory genes that are present in only a fraction of isolates. The ecological significance of this variation in gene content remains unclear. However, microbiologists agree that a bacterial species should be 'genomically coherent', even though there is no consensus on how this should be determined.
Keywords: computational modeling; core/accessory genome; evolution; recombination; speciation
Year: 2015 PMID: 28348822 PMCID: PMC5320679 DOI: 10.1099/mgen.0.000038
Source DB: PubMed Journal: Microb Genom ISSN: 2057-5858
Fig. 1. Gene frequency histograms (a, c, e) and strain distance distributions (b, d, f). The frequency histograms (a, c, e) show that very rare and very common genes far outnumber genes at intermediate frequencies; the red column represents the core genome, and the overlapping grey bar represents genes with frequency f in the range 0.98 < f < 1. The distance distributions (b, d, f), obtained by averaging over the whole simulation after discarding the initial samples, are based on pairwise comparisons of strains, with the core genome (Hamming) distance on the x-axis and the gene content (Jaccard) distance on the y-axis (see Methods). A contour line enclosing the mode of the real data is overlaid on the simulated distributions for easier comparison. The columns show results for the real data (a, b), for the model with learned parameter values (c, d) and for the model with between-strain recombination increased tenfold (e, f).
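The quantities plotted in Fig. 1 can be illustrated with a minimal Python sketch. It assumes each strain is represented by an aligned core-genome string and a set of gene identifiers, and that a gene with frequency 1 counts as core. The function names, the toy data and the normalisation of the Hamming distance by alignment length are illustrative assumptions, not the authors' code; the paper's Methods give the exact definitions.

```python
from itertools import combinations


def gene_frequencies(gene_sets):
    """Fraction of strains whose gene content includes each gene."""
    n = len(gene_sets)
    counts = {}
    for genes in gene_sets:
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene: c / n for gene, c in counts.items()}


def hamming_distance(seq_a, seq_b):
    """Proportion of aligned core-genome sites at which two strains differ.

    Assumption: normalised by alignment length rather than a raw count.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("core alignments must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jaccard_distance(genes_a, genes_b):
    """1 - |A intersect B| / |A union B| for two gene-content sets."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1.0 - len(genes_a & genes_b) / len(union)


def pairwise_distances(core_seqs, gene_sets):
    """(Hamming, Jaccard) distance pair for every unordered pair of strains."""
    return [
        (hamming_distance(core_seqs[i], core_seqs[j]),
         jaccard_distance(gene_sets[i], gene_sets[j]))
        for i, j in combinations(range(len(core_seqs)), 2)
    ]


if __name__ == "__main__":
    # Hypothetical toy data: three strains with short core alignments
    # and small gene sets.
    core = ["ACGT", "ACGA", "TCGA"]
    genes = [{"g1", "g2", "g3"}, {"g1", "g2"}, {"g1", "g4"}]
    freqs = gene_frequencies(genes)
    # Genes present in every strain (f == 1), i.e. the core column.
    print(sorted(g for g, f in freqs.items() if f == 1.0))  # ['g1']
    for h, j in pairwise_distances(core, genes):
        print(f"Hamming={h:.2f}  Jaccard={j:.2f}")
```

Each strain pair contributes one (Hamming, Jaccard) point; a 2D histogram of these points over all pairs gives a distribution of the kind shown in panels (b, d, f).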
Estimates for two parameters: r/m (the ratio of substitutions introduced by recombination to those introduced by mutation) and the ratio of gene introduction to gene deletion rates.
The second column reports the estimate from the model and the third column an estimate from a detailed genomic analysis (see Methods).
| Parameter | Model estimate | Genomic analysis |
|---|---|---|
| r/m | 8.0 | 11.3 |
| Gene introduction/deletion | 1.3 | 1.4 |
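As a worked illustration of what the r/m values imply (our arithmetic on the table values, not a figure reported by the authors), write r and m for the numbers of substitutions introduced by recombination and by mutation. The share of all substitutions that came from recombination is then:

```latex
\frac{r}{r+m} = \frac{r/m}{1 + r/m}, \qquad
\frac{8.0}{1 + 8.0} \approx 0.89 \ \text{(model)}, \qquad
\frac{11.3}{1 + 11.3} \approx 0.92 \ \text{(genomic analysis)}
```

Both estimates therefore attribute roughly nine in ten substitutions to recombination. Similarly, a gene introduction/deletion ratio of 1.3 means that genes are introduced at 1.3 times the rate at which they are deleted.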
Fig. 2. Effects of geographical sampling bias and a recent bottleneck on the core genome Hamming distance distribution. (a) Strains from a simulated generation whose distance distribution is representative of the average shape were selected as the initial population. The green rectangle highlights the region of interest, showing the excess of closely related strain pairs in the real data. (b) The distance distribution after taking a geographically structured sample, averaged over 20 independent replicates (red curve). (c) The effect of a population bottleneck, obtained by selecting a specified number of strains (here 100 of 2000 strains in total) as possible ancestors, from which the next generation was sampled with replacement. Bottlenecks of other sizes are shown in Fig. S10. The distribution for the real data is shown in each panel for comparison.
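The bottleneck in panel (c) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' simulator: strains are stand-in integer IDs rather than genomes, the ancestors are assumed to be drawn without replacement, and the next generation is assumed to keep the original population size. The helper `shared_ancestor_fraction` is a hypothetical proxy: pairs that share a parent start out identical, which is one way a bottleneck inflates the number of closely related pairs.

```python
import random
from collections import Counter


def bottleneck(population, n_ancestors, rng=None):
    """Draw the next generation through a bottleneck.

    n_ancestors distinct strains are chosen as the only possible parents
    (assumption: without replacement); the next generation, assumed to keep
    the original size, is then sampled from them with replacement.
    """
    if rng is None:
        rng = random.Random()
    ancestors = rng.sample(population, n_ancestors)
    return rng.choices(ancestors, k=len(population))


def shared_ancestor_fraction(generation):
    """Fraction of unordered strain pairs that descend from the same parent."""
    n = len(generation)
    total_pairs = n * (n - 1) // 2
    same = sum(c * (c - 1) // 2 for c in Counter(generation).values())
    return same / total_pairs


if __name__ == "__main__":
    rng = random.Random(0)
    population = list(range(2000))  # strain IDs stand in for genomes
    after = bottleneck(population, 100, rng)
    print(shared_ancestor_fraction(population))  # 0.0: all parents distinct
    # The expected value after a 100-strain bottleneck is 1/100.
    print(shared_ancestor_fraction(after))
```

Sampling the offspring with replacement is what concentrates descent on a few ancestors; in a full model, mutation and recombination would then gradually spread these initially identical pairs apart.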