Daniel Falush, Mia Torpdahl, Xavier Didelot, Donald F Conrad, Daniel J Wilson, Mark Achtman.
Abstract
In bacteria, DNA sequence mismatches act as a barrier to recombination between distantly related organisms and can potentially promote the cohesion of species. We have performed computer simulations which show that the homology dependence of recombination can cause de novo speciation in a neutrally evolving population once a critical population size has been exceeded. Our model can explain the patterns of divergence and genetic exchange observed in the genus Salmonella without invoking either natural selection or geographical population subdivision. If this model were validated on the basis of extensive sequence data, it would imply that the named subspecies of Salmonella enterica correspond to good biological species, making species boundaries objective. However, multilocus sequence typing data, analysed using several conventional tools, provide a misleading impression of relationships within S. enterica subspecies enterica and do not provide the resolution needed to establish whether new species are presently being formed.
Year: 2006 PMID: 17062419 PMCID: PMC1764929 DOI: 10.1098/rstb.2006.1925
Source DB: PubMed Journal: Philos Trans R Soc Lond B Biol Sci ISSN: 0962-8436 Impact factor: 6.237
Figure 1. Examples of speciation extracted from simulated bacterial populations. (a) A two-way speciation event (movie S2, electronic supplementary material); (b) a three-way speciation event (movie S3, electronic supplementary material); (c) loss and emergence of species in a large population (movie S4, electronic supplementary material). Each column shows the composition of the population at the time point (in generations) indicated above the neighbour-joining tree (calculated using MEGA; Kumar). For each time point in (a) and (b), additional subfigures summarize, from top to bottom: pairwise nucleotide-mismatch distributions; acceptance probability histograms, with an arrowhead indicating the r-connectivity; and sources of ancestry at each genome position for nine representative genomes. Sources of ancestry were estimated using the linkage model of Structure, assuming K=2 distinct ancestry sources for (a) and K=3 for (b). See the Salmonella analysis below for a description of how naive clustering is performed by Structure. The Structure input file contained the genotype of each strain or genome at all polymorphic nucleotide sites. Physical distances between adjacent polymorphic nucleotides were input as map distances, and each run of Structure consisted of a burn-in phase of 2000 iterations followed by 5000 further iterations. The population size, N, is 1000 in (a) and (b) and 2000 in (c). (a) and (c) were simulated according to a log-linear homology rule, while (b) was simulated according to a MEPS rule. For the log-linear rule, the average acceptance probability of imports between a pair of strains is estimated by averaging the import probability of 1000 bp stretches over all genome positions. For the MEPS rule, the probability is estimated by squaring the proportion of the two genomes that is identical in runs of 150 nucleotides or more.
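The two acceptance-probability estimators described in the Figure 1 caption can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' simulation code: it takes two aligned, equal-length genome strings, treats the genome as linear (windows do not wrap around), and uses a hypothetical log-linear form in which a stretch with per-site divergence d is accepted with probability 10^(-slope × d), where `slope` is an illustrative placeholder rather than a value from the paper. The MEPS estimate follows the caption directly: the fraction of positions lying in identical runs of at least 150 nucleotides, squared.

```python
def identical_run_fraction(seq_a: str, seq_b: str, min_run: int = 150) -> float:
    """Fraction of aligned positions lying in runs of at least `min_run`
    consecutive identical nucleotides."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    covered = 0  # positions inside qualifying identical runs
    run = 0      # length of the current identical run
    for x, y in zip(seq_a, seq_b):
        if x == y:
            run += 1
        else:
            if run >= min_run:
                covered += run
            run = 0
    if run >= min_run:  # a run that reaches the end of the sequence
        covered += run
    return covered / len(seq_a)


def meps_acceptance(seq_a: str, seq_b: str, min_run: int = 150) -> float:
    """MEPS-rule estimate as described in the caption: square the identical-run fraction."""
    return identical_run_fraction(seq_a, seq_b, min_run) ** 2


def log_linear_acceptance(seq_a: str, seq_b: str,
                          window: int = 1000, slope: float = 20.0) -> float:
    """Average import probability over every `window`-bp stretch.

    ASSUMPTION: a stretch with per-site divergence d is accepted with
    probability 10 ** (-slope * d); `slope` is a hypothetical placeholder,
    not a parameter value from the paper. Windows do not wrap around.
    """
    n = len(seq_a)
    if n != len(seq_b) or n < window:
        raise ValueError("sequences must be aligned and at least `window` long")
    mismatch = [int(x != y) for x, y in zip(seq_a, seq_b)]
    count = sum(mismatch[:window])  # mismatches in the first window
    n_windows = n - window + 1
    total = 0.0
    for start in range(n_windows):
        if start > 0:
            # slide the window by one site: add the new site, drop the old one
            count += mismatch[start + window - 1] - mismatch[start - 1]
        total += 10 ** (-slope * count / window)
    return total / n_windows
```

The sliding-window count keeps the log-linear estimate linear in genome length; recomputing each 1000 bp window from scratch would give the same result more slowly.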
Figure 2. Effect of population size on divergence. Twenty independent simulations with N=500 (blue) and 20 independent simulations with N=1000 (red) were used in this graph. Every 100 generations, the time to the most recent common ancestor (TMRCA) and the number of genetic differences were recorded for each pair of genomes in each simulation. Because deep branches of the genealogy correspond to many pairs of individuals, only one pairwise genetic distance, calculated for a randomly chosen pair of individuals, is shown for each possible value of the TMRCA in each generation of each simulation.
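The subsampling used for Figure 2, keeping one randomly chosen pair per TMRCA value so that deep branches shared by many pairs do not dominate the plot, might look like the Python sketch below. It assumes the (TMRCA, distance) values for all genome pairs at one sampled generation have already been computed; the function name `one_distance_per_tmrca` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def one_distance_per_tmrca(
    pairs: Iterable[Tuple[int, int]],
    rng: Optional[random.Random] = None,
) -> Dict[int, int]:
    """Keep one randomly chosen pairwise distance for each TMRCA value.

    `pairs` holds (tmrca, distance) for every genome pair at one sampled
    generation. Deep genealogy branches are shared by many pairs, so plotting
    all pairs would over-represent them; one pair per TMRCA avoids that.
    """
    rng = rng or random.Random()
    by_tmrca = defaultdict(list)
    for tmrca, distance in pairs:
        by_tmrca[tmrca].append(distance)
    # one random representative per distinct TMRCA, in increasing TMRCA order
    return {t: rng.choice(ds) for t, ds in sorted(by_tmrca.items())}
```

Passing a seeded `random.Random` makes a figure reproducible; repeating the call for every 100th generation of every simulation would give the full set of plotted points.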
Genetic distances within and between STs for all Salmonella taxa. (Each cell shows the average nucleotide distances (above) and the average recombination acceptance probability according to the simulated log-linear model (below). The number of STs for each taxon is indicated in the first column. Note that enterica here excludes Typhi.)
| Taxon (STs) | 1 | Typhi | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.012 | 0.022 | 0.032 | 0.036 | 0.044 | 0.044 | 0.061 | 0.104 |
| | 0.1 | 0.03 | 0.008 | <0.001 | <0.001 | 0.001 | 0.002 | <0.001 |
| Typhi (4) | | 0.0004 | 0.029 | 0.034 | 0.041 | 0.040 | 0.067 | 0.106 |
| | | 0.8 | 0.004 | 0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| 3 | | | 0.006 | 0.028 | 0.028 | 0.029 | 0.060 | 0.102 |
| | | | 0.3 | 0.006 | <0.001 | 0.002 | <0.001 | <0.001 |
| 4 | | | | – | 0.039 | 0.039 | 0.064 | 0.110 |
| | | | | – | <0.001 | <0.001 | <0.001 | <0.001 |
| 5 | | | | | 0.009 | 0.042 | 0.067 | 0.111 |
| | | | | | 0.6 | <0.001 | <0.001 | <0.001 |
| 6 | | | | | | 0.002 | 0.067 | 0.105 |
| | | | | | | 0.6 | <0.001 | <0.001 |
| 7 | | | | | | | 0.014 | 0.106 |
| | | | | | | | 0.4 | <0.001 |
| 8 | | | | | | | | 0.004 |
| | | | | | | | | 0.5 |
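The within- and between-taxon averages in the table can be illustrated with a short Python sketch. It assumes each ST is represented by one aligned sequence (for example, its concatenated MLST alleles) and that distance is the proportion of differing sites (p-distance); the function names are hypothetical and this is not the authors' analysis pipeline.

```python
from itertools import combinations, product
from typing import Optional, Sequence


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_within(seqs: Sequence[str]) -> Optional[float]:
    """Average p-distance over all distinct pairs within one group."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return None  # a single sequence has no within-group pairs
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def mean_between(group_x: Sequence[str], group_y: Sequence[str]) -> float:
    """Average p-distance over all cross-group pairs."""
    pairs = list(product(group_x, group_y))
    if not pairs:
        raise ValueError("both groups must be non-empty")
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

A group containing a single sequence has no within-group pairs, so `mean_within` returns `None` for it rather than a misleading zero.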
Figure 3. Sequence data within Salmonella. (a) Neighbour-joining tree of genotypes within Salmonella. (b) Neighbour-joining tree of genotypes, mismatch distributions between genotypes and Structure analysis of sources of ancestry within enterica. Both the neighbour-joining tree and the Structure analysis identified three groups: clade A (red), clade B (green) and Typhi (blue). However, clades A and B are only weakly differentiated, as indicated by intermediate bootstrap support (60%), intermediate between-clade distances, extensive allele sharing and a continuum of ancestry. (c) Neighbour-joining trees for each MLST fragment. Sequences from clade A strains are shown as filled circles, from clade B as open circles, from Typhi as filled triangles and from other subspecies or bongori as open squares.
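Neighbour-joining trees appear in Figures 1 and 3, where they were computed with MEGA. The sketch below is a plain Python implementation of the standard Saitou–Nei neighbour-joining algorithm on a symmetric distance matrix, not MEGA's code. It returns the unrooted tree as a list of edges; the internal-node names (`node1`, `node2`, ...) are hypothetical labels, and ties in the Q-criterion are broken by taking the first pair encountered.

```python
from typing import Dict, List, Sequence, Tuple


def neighbour_joining(
    labels: Sequence[str], dist: Sequence[Sequence[float]]
) -> List[Tuple[str, str, float]]:
    """Standard (Saitou-Nei) neighbour joining.

    Returns the unrooted tree as (node, neighbour, branch_length) edges.
    Internal nodes get hypothetical names 'node1', 'node2', ...
    """
    if len(labels) < 2:
        raise ValueError("need at least two taxa")
    nodes = list(labels)
    d: Dict[Tuple[str, str], float] = {
        (a, b): float(dist[i][j])
        for i, a in enumerate(labels)
        for j, b in enumerate(labels)
    }
    edges: List[Tuple[str, str, float]] = []
    counter = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[a]: total distance from a to all current nodes
        r = {a: sum(d[a, k] for k in nodes) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                q = (n - 2) * d[a, b] - r[a] - r[b]  # Q-criterion
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        counter += 1
        u = f"node{counter}"
        # branch lengths from the joined pair to their new parent u
        len_a = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        len_b = d[a, b] - len_a
        edges += [(a, u, len_a), (b, u, len_b)]
        # distances from u to every remaining node
        for k in nodes:
            if k not in (a, b):
                d[u, k] = d[k, u] = (d[a, k] + d[b, k] - d[a, b]) / 2
        d[u, u] = 0.0
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    x, y = nodes
    edges.append((x, y, d[x, y]))
    return edges
```

On an additive distance matrix, neighbour joining recovers the generating tree and its branch lengths. Bootstrap support, such as the 60% reported for clades A and B, would come from rerunning this on distance matrices computed from alignment columns resampled with replacement; that step is not shown.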