
Gene flow in microbial communities could explain unexpected patterns of synonymous variation in the Escherichia coli core genome.

Rohan Maddamsetti

Abstract

Researchers contest the importance of gene flow in bacterial core genomes, as traditionalists view microbes as predominantly clonal, asexually reproducing organisms. Contrary to the traditional perspective, Escherichia coli core genes vary greatly in their levels of synonymous genetic diversity. This observation indicates that the relative importance of evolutionary forces such as mutation, selection, and recombination varies from gene to gene. In this paper, I highlight why the synonymous diversity observation is broadly relevant to researchers interested in the evolutionary dynamics of microbial populations and communities. I explain how a model of evolution called the coalescent relates neutral diversity (i.e. mutations with negligible fitness effects) to mutation rates, evolutionary time, and a parameter called effective population size. I then describe the possible ways in which mutation, selection, and recombination can explain observed patterns of synonymous diversity in E. coli. Finally, I describe a model for E. coli genome evolution in which different loci are subject to varying levels of gene flow among co-occurring microbes and viruses in the environment. Researchers can falsify the gene flow hypothesis by sequencing genes and strains isolated from stable microbiomes or by carrying out evolution experiments that trace gene genealogies in real-time.


Keywords:  Escherichia coli; experimental evolution; gene flow; genetic diversity; genome evolution; microbiome; molecular evolution; population genomics

Year:  2016        PMID: 27066306      PMCID: PMC4802760          DOI: 10.1080/2159256X.2015.1137380

Source DB:  PubMed          Journal:  Mob Genet Elements        ISSN: 2159-2543


Evolutionary dynamics of the Escherichia coli genome

As for many microbes, gene content across Escherichia coli strains is quite variable. The E. coli genome comprises a core set of genes shared by all E. coli isolates, and a set of flexible genes found in some but not all E. coli isolates. A commonplace assumption is that core genes share a common history of vertical descent. Over time, E. coli lineages accumulate mutations that have negligible effects on fitness. The rate at which these neutral mutations accrue is roughly proportional to the mutation rate. Synonymous mutations are a reasonable proxy for truly neutral mutations, because their fitness effects are usually (but not always) negligible compared to nonsynonymous mutations that change amino acid sequence. From this line of reasoning it follows that levels of synonymous genetic diversity in core genes should be roughly proportional to the mutation rate at those core genes. However, levels of synonymous genetic diversity vary by more than an order of magnitude over core E. coli genes. Such variation in levels of synonymous diversity causes the branch lengths of some gene trees to be uniformly longer than the branches of other gene trees without affecting tree topology. Trees for highly expressed, important housekeeping genes tend to have shorter branch lengths (less synonymous diversity) than less important core genes. The implication is that either the mutation rate unexpectedly varies over orders of magnitude over core E. coli genes, or that there is a serious flaw in the preceding argument linking synonymous diversity to mutation rates. The rest of this paper delves into the evolutionary theory behind synonymous diversity, and examines the evolutionary forces that could cause synonymous diversity to vary over E. coli core genes. I argue that it is a mistake to assume that core genes in the same E. coli genome share the same history of vertical descent, when in fact recombination and gene transfer can cause the history of core genes (or pieces of core genes) present in the same genome to differ substantially without affecting the topologies of bacterial phylogenies.

The Wright-Fisher model and the coalescent: Neutral models of molecular evolution

In this section, I explain how neutral models of evolution help in understanding patterns of synonymous diversity. The neutral theory of molecular evolution makes clear predictions for how genetic drift, in the absence of all other evolutionary forces, shapes genetic diversity. Neutral theory has become an essential tool for studying genome evolution because it is the null hypothesis that must be rejected before considering more complicated explanations for patterns of molecular variation. I highly recommend refs. 6-8 to readers who are interested in a broader overview as well as a deeper exposition of the following ideas. The Wright-Fisher model of neutral evolution describes an idealized population of N organisms (Fig. 1A). In the absence of natural selection, all organisms are equally fit. We measure time in discrete generations, and the population size is fixed at N. Every generation, we randomly pick organisms from the current generation to leave offspring in the next generation. As in all neutral models, evolution reduces to random sampling of a finite population.
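
The random-sampling step of the Wright-Fisher model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not code from the paper: the function names, the founder-label bookkeeping, and the seed are all hypothetical choices made for the example.

```python
import random


def wright_fisher_generation(parents, rng):
    """Build the next generation: each of the N offspring picks one
    parent uniformly at random, with replacement, so N stays fixed."""
    n = len(parents)
    return [parents[rng.randrange(n)] for _ in range(n)]


def generations_until_common_ancestor(n, rng):
    """Run the model forward until every individual descends from a
    single founder. Returns (number of generations, founder label)."""
    lineage = list(range(n))  # label each founder by its index
    generations = 0
    while len(set(lineage)) > 1:
        lineage = wright_fisher_generation(lineage, rng)
        generations += 1
    return generations, lineage[0]


# Example run with N = 4, the population size drawn in Fig. 1A.
rng = random.Random(1)  # fixed seed only for reproducibility
gens, founder = generations_until_common_ancestor(4, rng)
print(f"all 4 individuals descend from founder {founder} after {gens} generations")
```

Because no individual has a fitness advantage, the only source of change in this sketch is the sampling itself; repeated runs with different seeds end on different founders after different numbers of generations.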
Figure 1.

The population size in a neutral model of evolution also describes the average time for 2 lineages to coalesce in that model. A) One run of the Wright-Fisher model over 4 generations for a population of 4 individuals. B) The coalescent for the run of the Wright-Fisher model in part A). C) The probability that it takes t generations for 2 lineages to coalesce is identical to the probability of flipping t – 1 tails before flipping heads using a biased coin that has a probability of flipping heads (i.e., coalescence) of 1/N. A geometric distribution with mean N describes both processes.

Due to random sampling, eventually the whole population descends from a single organism. If we trace the ancestry of a population backward in time, eventually we come to this individual: the most recent common ancestor (MRCA) of the population. The basic premise of the coalescent is that we run a model of neutral evolution backward in time to the MRCA (Fig. 1B). The history of 2 given individuals coalesces in the generation in which they share a common ancestor. At any point in time, the probability that a second organism has the same ancestor as a first organism is $1/N$. Therefore, the probability that 2 specific individuals coalesce in one generation is $1/N$, and the probability that they do not coalesce is $1 - 1/N$. Eventually, the histories of all individuals in the population coalesce to that of the MRCA. The probability that 2 specific individuals in the current generation coalesce t generations in the past is the probability that they do not coalesce for t – 1 generations backward in time and then coalesce in generation t: $P(X = t) = (1 - 1/N)^{t-1}(1/N)$. The coalescence of a pair of organisms is thus described by a geometric random variable X with a mean of N generations. The mathematics is identical to flipping a coin until reaching a flip of heads (Fig. 1C). Intuitively, it takes 2 coin flips on average to flip heads once. Flipping a long stretch of tails before flipping heads is unlikely with a fair coin, because the probability of flipping t – 1 tails before flipping heads decreases geometrically with t: $(1/2)^{t}$. Coalescence for 2 specific individuals is like flipping a biased coin where the probability of heads (coalescence) is $1/N$, and the probability of tails (no coalescence) is $1 - 1/N$.
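
The biased-coin picture of pairwise coalescence can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original paper: it traces 2 lineages backward in time, lets each pick a parent at random among N individuals every generation, and records when they first pick the same one. The function names, replicate count, and seed are hypothetical.

```python
import random


def pairwise_coalescence_time(n, rng):
    """Generations back until 2 lineages pick the same parent.
    Each generation they coincide with probability 1/n."""
    t = 1
    while rng.randrange(n) != rng.randrange(n):
        t += 1
    return t


def mean_coalescence_time(n, replicates, seed=0):
    """Average pairwise coalescence time over independent replicates."""
    rng = random.Random(seed)
    total = sum(pairwise_coalescence_time(n, rng) for _ in range(replicates))
    return total / replicates


def geometric_pmf(t, n):
    """P(X = t) = (1 - 1/n)^(t-1) * (1/n): no coalescence for t - 1
    generations, then coalescence in generation t."""
    return (1 - 1 / n) ** (t - 1) * (1 / n)


if __name__ == "__main__":
    n = 50
    print("simulated mean:", mean_coalescence_time(n, 20000))
    print("expected mean under the geometric distribution:", n)
```

With enough replicates the simulated mean should sit close to N, which is the sense in which the population size of the model also sets the average pairwise coalescence time.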

Effective population size, coalescence times, and neutral diversity

It is important to remember that N is not the population size for organisms evolving in the real world, but the population size of organisms in an idealized model of neutral evolution. For this reason, researchers add a subscript to make it clear that $N_e$ is the population size of the idealized model of neutral evolution that best fits molecular data. Much of the power of coalescent theory derives from the fact that more complicated models of evolution involving recombination, natural selection, and population structure make predictions for patterns of molecular variation that are identical to a neutral model with an appropriately scaled effective population size $N_e$ (ref. 9). Effective population sizes are usually orders of magnitude smaller than actual census population sizes in nature. For example, a population that has experienced a recent selective sweep or population bottleneck coalesces to the MRCA after a short period of time, causing a dramatically lower effective population size with regard to levels of neutral genetic diversity. Researchers interested in bacterial speciation have used computer simulations to demonstrate that recombination, mutation, and population structure (i.e., dividing a population into many subpopulations) can cause populations to cluster or diverge genetically in the absence of natural selection. In these models, effective population size is simply the number of organisms in the simulation, and levels of neutral genetic diversity depend on the relative importance of recombination, mutation, and population structure in the model. In neutral models, clusters of diverged genotypes (“species”) do not easily form in recombining populations, implying a strong role for either natural selection or strong population subdivision (or both) in bacterial speciation.
In clonal populations, neutral genetic diversity should accumulate uniformly across the genome because all genes in a genome are completely linked, and thus equally affected by evolutionary forces such as mutation or natural selection. Variation in synonymous genetic diversity among core genes allows us to reject the null hypothesis that core E. coli genes experience the same evolutionary forces. Neutral theory applies equally well to genes as to individuals, so on average, the MRCA for 2 neutrally evolving sequences existed $N_e$ generations in the past. If the mutation rate µ is constant over the genome, then the expected number of neutral genetic differences between 2 sequences in the present day is $\theta = 2\mu N_e$. If we use synonymous variation as a proxy for neutral genetic changes, then synonymous diversity $\theta_s = 2\mu N_e$ is a natural statistical estimator for both the effective population size and the coalescence time for pairs of sequences. In the next section, I discuss possible explanations why synonymous genetic diversity varies so much across the core genome of E. coli.
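
To make the relationship between synonymous diversity, mutation rate, and effective population size concrete, the sketch below solves θs = 2µNe for Ne. It is an illustration only: the function name is hypothetical, and the mutation rate and diversity values are made-up numbers chosen for easy arithmetic, not estimates from the paper or from E. coli data.

```python
def effective_population_size(theta_s, mu):
    """Solve theta_s = 2 * mu * N_e for N_e.

    theta_s: synonymous diversity (expected pairwise differences per site)
    mu:      mutation rate per site per generation
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    return theta_s / (2.0 * mu)


# Hypothetical values for illustration only.
MU = 1e-10             # assumed constant mutation rate across the genome
THETA_LOW = 0.01       # a low-diversity core gene (made-up value)
THETA_HIGH = 0.1       # a high-diversity core gene (made-up value)

ne_low = effective_population_size(THETA_LOW, MU)
ne_high = effective_population_size(THETA_HIGH, MU)
ratio = ne_high / ne_low

print(f"N_e (low-diversity gene):  {ne_low:.3g}")
print(f"N_e (high-diversity gene): {ne_high:.3g}")
print(f"ratio of N_e values:       {ratio:.3g}")
```

If µ really is constant across genes, any fold difference in synonymous diversity between 2 genes translates directly into the same fold difference in effective population size, and therefore in average pairwise coalescence time, at those genes.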

Explanations for variation in synonymous diversity in E. coli core genes

Many evolutionary forces, including mutation, selection, and recombination, have similar as well as correlated effects on both µ and $N_e$. Disentangling the contributions of these forces to patterns of natural variation remains challenging. I discuss the effects of these evolutionary processes on µ and $N_e$ in turn (Fig. 2).
Figure 2.

Mutation, selection, and recombination affect the branch lengths and topology of phylogenetic trees. A) Differing selection pressures or mutation rates can lengthen or shorten branch lengths. B) Recombination with an ingroup will not change the tree, while recombination with an outgroup always changes either the topology of the tree or disproportionately changes the length of some branches.


Mutation

One explanation for why some genes are more variable than others is mutation rate variation. While there is good evidence for local differences in the point mutation rate in bacterial genomes, explanations that solely rely on local mutation rate variation are implausible because no studies to date have found a correlation between mutation rates and patterns of synonymous variation in E. coli. In short, variation in the mutation rate does not appear to be strong enough to explain orders of magnitude differences in synonymous genetic diversity across E. coli core genes.

Natural selection

Selection plays an important role in determining genetic variability across loci. When a highly beneficial mutation sweeps through a population (positive selection), it also reduces genetic variability at linked sites and shortens the time to coalescence to the MRCA. In a completely clonal population, all sites are linked, so a selective sweep reduces standing genetic diversity uniformly across the genome. Selective sweeps therefore cannot account for gene-to-gene differences in synonymous diversity in E. coli unless there is sufficient recombination. Background selection is a more satisfying explanation for patterns of synonymous diversity in E. coli. Housekeeping core genes are more conserved at the amino acid level than other core genes, because mutations in these most essential core genes can have large effects on organismal fitness. This form of selection is known as purifying selection because it promotes sequence conservation. Purifying selection on deleterious mutations also decreases variability at nearby sites in the genome; this loss of neutral variation due to purifying selection on linked sites is called background selection. Background selection is the most parsimonious explanation for variation in synonymous diversity, although Martincorena et al. rejected it as a sufficient explanation. Negative frequency-dependent selection (a form of balancing selection) on a locus preserves genetic diversity. Such beneficial mutations do not complete selective sweeps because the fitness advantage conferred by the mutation decreases as it increases in frequency in the population. Mutations conferring frequency-dependent advantages are common in evolution experiments, and are probably even more common in complex and heterogeneous environments such as the animal gut. However, this explanation again requires recombination; otherwise frequency-dependent selection would maintain synonymous variation at similar levels across the genome.

Recombination

Many studies have estimated the relative contributions of recombination and mutation to E. coli diversity. An important open question outside the scope of this paper is how and why diverse bacterial species and populations vary in their propensity toward freely-recombining and clonal lifestyles. Some natural populations of Synechococcus have enough homologous recombination to generate quasisexual evolutionary dynamics, while some Pseudomonas populations appear to be largely clonal. Recombination can affect synonymous diversity because a recombination event between diverged sequences causes multiple changes to appear simultaneously, while recombination between closely related or even identical sequences may not be detectable at all. If some genes have had a history of more successful recombination events with diverged homologs compared to other genes in the genome, then those genes will be more diverse than genes with a history of fewer successful recombination events. However, recombination with diverged homologs cannot explain observed patterns of synonymous diversity in E. coli. Any recombination event with an outgroup will either change the topology of the gene tree or cause anomalously long branches (Fig. 2B), while observed patterns of synonymous diversity in E. coli core genes are inconsistent with these predictions. Nonetheless, a combination of recombination and positive selection or negative frequency-dependent selection could account for some of the observed variation in synonymous diversity.

Mutagenic effects of recombination

Recent sequencing studies have found that new mutations correlate with the location of recent crossover events in human sperm as well as in plants and honeybees. It is unclear whether the molecular mechanisms responsible for elevated mutation rates in these studies also occur in E. coli. Nonetheless, error-prone repair of double-strand breaks associated with recombination events could contribute to higher levels of synonymous diversity at loci with a history of many successful but undetected recombination events in E. coli.

Population structure

Population structure describes the degree to which populations are not well-mixed. A simple case is a metapopulation, a population subdivided into a large number of subpopulations. Populations can be structured at multiple spatial scales (e.g., subpopulations of subpopulations), and population structure generally maintains genetic diversity by restricting the scope of selective sweeps. Population structure can also reduce effective population sizes and coalescence times through local extinctions, colonization events, and local population bottlenecks.

Gene flow could explain patterns of synonymous genetic variation in E. coli

In this section, I present a model that combines aspects of recombination, selection, and population structure to explain patterns of synonymous genetic variation in Escherichia coli. Although this model is not parsimonious, it is testable and consistent with existing molecular and ecological observations in the literature. While it is well-known that flexible E. coli genes differ in their histories of recombination and selection across diverged microbial species in gut communities, the same may hold true for many E. coli core genes. Imagine a “wind” of diverse alleles blowing into a population of E. coli, this “wind” being the migration of alleles into the population from other E. coli populations, viral populations, or other microbes in the community. Resident genes under purifying selection can resist this “wind” more strongly, and they will have a shorter coalescence time than genes that cannot effectively resist replacement by diverse alleles. In terms of the Wright-Fisher model, gene flow between species within a community increases the effective population size of that gene compared to species-specific genes (Fig. 3). This argument is general in that it holds for subpopulations of a single bacterial species, or for populations of co-evolving phage and bacteria. For instance, imagine 2 subpopulations of E. coli, each adapted to different parts of an animal’s gut. Genes under stronger purifying selection in one subpopulation would better resist gene flow from the other subpopulation. The key point in this model is that gene flow within microbial communities can change effective population sizes and coalescence times at core genes without changing the topology of gene trees constructed with single isolates from diverse ecological sources. In the most extreme cases, between-species divergence and within-species polymorphism may be indistinguishable. 
One likely mechanism for gene flow in microbial communities is phage-bacteria infection networks in which generalized transducing phage infect multiple microbial species and act as viral vectors. The gene flow model makes a strong prediction: genes with high synonymous diversity should tend to cluster according to microbial community, while genes with low synonymous diversity should tend to cluster by species (Fig. 3). Evolution experiments or appropriate sampling of microbiomes could test this prediction and potentially falsify the gene flow model.
Figure 3.

Different rates of gene flow at different loci cause effective population size to vary at these loci, in turn affecting gene tree coalescence times without changing tree topology for genes co-occurring in the same genome. A) Gene flow at this locus occurs between species within communities, increasing the effective population size of this locus. In this case, communities cluster in the gene tree. B) Gene flow does not occur between species at this second locus. The effective population size at this locus is the population size of the species in which it is found. In this case, species cluster in the gene tree.

The gene flow model has some support in the literature. Retchless et al. proposed the fragmented speciation model in which different segments of bacterial chromosomes become genetically isolated at different times. Species-specific alleles become isolated first; alleles can sweep across species boundaries, and gene flow stops earlier at earlier diverging loci. This study came to the conclusion that in some cases, it may not be possible to make a clear distinction between intraspecific and interspecific variability in microbes. Sheppard et al. found evidence of increasing gene flow between previously distinct Campylobacter species. Retchless et al. argued that phylogenetic incongruence in gene trees made with genes found in Escherichia, Salmonella, and Citrobacter provides further evidence for the fragmented speciation model. Luo et al. described the genomes of environmental isolates of E. coli and found little evidence of gene exchange with gut commensal E. coli due to plausible ecological barriers. Although they found within-clade transfer of core genes, this paper rejected the fragmented speciation model because fragmented speciation posits gene flow across E. coli clades except at niche-specific adaptive mutations or genetic incompatibilities restricting gene flow. Karberg et al. found that recently acquired genes in Salmonella and Escherichia genomes have similar codon usage frequencies, while core genes in Salmonella and Escherichia have noticeably diverged in codon usage. Therefore, it appears that Salmonella and Escherichia strains acquire genes from a common pangenome shared among enterobacterial species. Smillie et al. built a database of horizontally transferred sequences among 2,235 full bacterial genomes to explore the effects of phylogeny, geography, and ecology on horizontal gene transfer. This study found that shared ecology is far more important than phylogenetic relatedness in structuring networks of gene flow across bacterial species.

Conclusion

Synonymous genetic diversity depends on both the mutation rate and effective population size. In neutral models of evolution, effective population size has a second interpretation as the average time for 2 lineages to coalesce. Many evolutionary forces, including mutation, selection, and recombination can affect genome-wide variation in synonymous genetic diversity. While researchers recognize the importance of gene flow in structuring the flexible genome of microbes, gene flow may also affect the core genome of microbes. If so, gene flow could explain why highly important E. coli core genes have less synonymous genetic diversity than other core genes. While the importance of gene flow in microbial genome evolution depends strongly on ecological context, many important microbiomes, such as the animal gut, might be effectively described as metapopulations of genes that interact within and across genomes over multiple spatial and temporal scales.
References (10 of 21 listed)

1.  Adam C Retchless; Jeffrey G Lawrence. Phylogenetic incongruence arising from fragmented speciation in enteric bacteria. Proc Natl Acad Sci U S A. 2010-06-07.

2.  Christophe Fraser; Eric J Alm; Martin F Polz; Brian G Spratt; William P Hanage. The bacterial species challenge: making sense of genetic and ecological diversity (review). Science. 2009-02-06.

3.  Michael J Rosen; Michelle Davison; Devaki Bhaya; Daniel S Fisher. Microbial diversity. Fine-scale diversity and extensive recombination in a quasisexual bacterial population occupying a broad niche. Science. 2015-05-29.

4.  Katherine A Karberg; Gary J Olsen; James J Davis. Similarity of genes horizontally acquired by Escherichia coli and Salmonella enterica is evidence of a supraspecies pangenome. Proc Natl Acad Sci U S A. 2011-11-29.

5.  Louis-Marie Bobay; Charles C Traverse; Howard Ochman. Impermanence of bacterial clones. Proc Natl Acad Sci U S A. 2015-07-21.

6.  Iñigo Martincorena; Aswin S N Seshasayee; Nicholas M Luscombe. Evidence of non-random mutation rates suggests an evolutionary risk management strategy. Nature. 2012-05-03.

7.  Samuel K Sheppard; Noel D McCarthy; Daniel Falush; Martin C J Maiden. Convergence of Campylobacter species: implications for bacterial evolution. Science. 2008-04-11.

8.  Sara F Sarkar; David S Guttman. Evolution of the core genome of Pseudomonas syringae, a highly clonal, endemic plant pathogen. Appl Environ Microbiol. 2004-04.

9.  Jessica Hedge; Daniel J Wilson. Bacterial phylogenetic reconstruction from whole genomes is robust to recombination but demographic inference is not. mBio. 2014-11-25.

10.  Rohan Maddamsetti; Philip J Hatcher; Stéphane Cruveiller; Claudine Médigue; Jeffrey E Barrick; Richard E Lenski. Synonymous Genetic Variation in Natural Isolates of Escherichia coli Does Not Predict Where Synonymous Substitutions Occur in a Long-Term Experiment. Mol Biol Evol. 2015-07-20.

