Literature DB >> 34864972

Evolutionary Dynamics of Asexual Hypermutators Adapting to a Novel Environment.

Wei-Chin Ho¹, Megan G Behringer^1,2,3, Samuel F Miller¹, Jadon Gonzales¹, Amber Nguyen¹, Meriem Allahwerdy¹, Gwyneth F Boyer¹, Michael Lynch¹.

Abstract

How microbes adapt to a novel environment is a central question in evolutionary biology. Although adaptive evolution must be fueled by beneficial mutations, whether higher mutation rates facilitate the rate of adaptive evolution remains unclear. To address this question, we cultured Escherichia coli hypermutating populations, in which a defective methyl-directed mismatch repair pathway causes a 140-fold increase in single-nucleotide mutation rates. In parallel with wild-type E. coli, populations were cultured in tubes containing Luria-Bertani broth, a complex medium known to promote the evolution of subpopulation structure. After 900 days of evolution, in three transfer schemes with different population-size bottlenecks, hypermutators always exhibited similar levels of improved fitness as controls. Fluctuation tests revealed that the mutation rates of hypermutator lines converged evolutionarily on those of wild-type populations, which may have contributed to the absence of fitness differences. Further genome-sequence analysis revealed that, although hypermutator populations have higher rates of genomic evolution, this largely reflects strong genetic linkage. Despite these linkage effects, the evolved population exhibits parallelism in fixed mutations, including those potentially related to biofilm formation, transcription regulation, and mutation-rate evolution. Together, these results are generally inconsistent with a hypothesized positive relationship between the mutation rate and the adaptive speed of evolution, and provide insight into how clonal adaptation occurs in novel environments.

Entities: Chemical

Keywords: zzm321990 Escherichia colizzm321990 ; adaptation; bottleneck effects; drift barrier; mutation rate; mutational load

Mesh：

Substances：

Year: 2021 PMID： 34864972 PMCID： PMC8643662 DOI： 10.1093/gbe/evab257

Source DB: PubMed Journal: Genome Biol Evol ISSN： 1759-6653 Impact factor: 4.065

Significance Although mutations are a critical source for the adaptation in a new environment, whether or not elevated mutation rates can lead to elevated adaptation rates remains unclear, especially when the environment is heterogeneous. To address this issue, we evolved E. coli populations with different starting mutation rates in a complex medium for 900 days and then examined their fitness and genome profiles. In the populations with higher starting mutation rates, despite faster rates of genome evolution, fitness improvement is not significantly elevated, and most of the accumulated mutations represent passive consequences of linkage effects.

Introduction

Beneficial mutations are the ultimate source of adaptive evolution. Therefore, it is of interest to study how changes to mutational processes can influence the pace of adaptive processes. In terms of mutation rates, a theoretically complicated relationship with the rate of adaptation in asexual populations was proposed in the early studies on the evolution of sex (Muller 1932; Crow and Kimura 1965). These studies posited that, when mutation rates are low, such that the waiting time for a beneficial mutation to arise in a population remains long, increases in the mutation rate can result in linear increases in the adaptation rate. In contrast, when mutation rates are relatively high, such that multiple beneficial mutations frequently arise in different individuals within an asexual population, beneficial mutations may interfere with each other’s opportunity to spread through the entire population. Thus, the facilitating effect of mutation on the rate of adaptation becomes diminished; a phenomenon later termed clonal interference (Gerrish and Lenski 1998). More recent theoretical studies have suggested that the effective number of beneficial mutations per population is critical for the strength of clonal interference, and effective population sizes and effect-size distributions of mutations have also been proposed to be influential (Gerrish and Lenski 1998; Wilke 2004; Kim and Orr 2005; Bollback and Huelsenbeck 2007; Desai and Fisher 2007; Park and Krug 2007; Campos and Wahl 2010; Park et al. 2010; Good et al. 2012). When mutation rates are high, rare beneficial mutations can commonly emerge in genomes already containing some deleterious mutations, and such linkage can further impede adaptation, especially in asexual populations (Bachtrog and Gordo 2004; Good and Desai 2014; Luksza and Lassig 2014; McFarland et al. 2017; Penisson et al. 2017). The relationship between rates of mutation and adaptation is further complicated because mutation rates can be plastically different in various environments (Williams and Foster 2012; Long et al. 2016; Shewaramani et al. 2017) and evolve over time (Wielgoss et al. 2013; Swings et al. 2017). Given that most mutations are deleterious, high mutation rates create high mutational loads, potentially driving the spread of antimutator alleles and resulting in the evolution of lower mutation rates (Muller 1950; Kimura 1967; Lynch 2008). According to the drift-barrier hypothesis, the reduction of the mutation rate should continue until the addition of new antimutator alleles no longer reduces the mutational load to a significant enough extent to overcome genetic drift (Lynch 2010; Sung et al. 2012; Lynch et al. 2016). Consequently, if two populations with initially different mutation rates adapt to the same constant environment, the resulting difference in fitness-improvement rates can be less than predicted if both populations converge evolutionarily to similar mutation rates. Although ample discussion exists on the relationship between rates of mutation and adaptation, the theory has been mostly focused on constant environments with a simple fitness landscape, neglecting the likely complexity of more natural environments. The limited empirical evidence in asexual populations suggests that adaptation rates are a concave-down function of the mutation rate (Arjan et al. 1999; Desai et al. 2007; Sprouffske et al. 2018). However, the experimental environments in these studies were generally simple and homogeneous (Arjan et al. 1999; Desai et al. 2007; Sprouffske et al. 2018), or the populations of interest were already well-adapted to the experimental environment (McDonald et al. 2012). Thus, it remains to be determined whether the relationship between rates of mutation and adaptation in richer and more complex environments follows the same patterns as observed in simpler settings. To study how the mutation rate affects adaptation in a more complex setting, we investigated the long-term evolutionary changes of Escherichia coli grown in culture tubes containing a complex medium, Luria-Bertani (LB) broth, comprised a nutritionally rich mixture of multiple amino-acid based carbon sources (Sezonov et al. 2007). In contrast to evolution in flasks containing glucose-limited media, such environments can facilitate the rapid emergence of stable subpopulations and clonal divergence based on spatial niche differentiation and amino-acid metabolism divergence (Behringer et al. 2018). To vary the mutation rate, we evolved both WT populations (methyl-directed mismatch repair [MMR]+) and hypermutator populations with an impaired MMR pathway (MMR−, obtained by mutL knockout), for which the single-nucleotide mutation rate is 140-fold higher than that for the WT genetic background (Lee et al. 2012). As different population-genetic environments can alter the fixation probability of mutations and the proportion of effectively beneficial or deleterious mutations (Wahl et al. 2002), and different demographic settings have been found to affect the results of experimental evolution (Vogwill et al. 2016; Wein and Dagan 2019), for both WT and MMR− populations, we performed three different transfer sizes in daily transfers: 1/10 (large, L), 1/104 (medium, M), and 1/107 (small, S) to explore the generality of our experimental results. Here, we examine the differences in phenotypic and molecular evolution among these populations over the course of 900 days.

Results

Higher Initial Mutation Rates Do Not Lead to Faster Rates of Fitness Improvement

When batch cultured, E. coli commonly adapt to their experimental environments and show fitness improvement compared with their ancestors (Van den Bergh et al. 2018; McDonald 2019). To study fitness improvement in populations originating from genetic backgrounds with different initial mutation rates (MMR− and WT), we performed the experimental evolution in three different transfer dilution bottlenecks (L: 1/10, M: 1/104, and S: 1/107) for 900 days (supplementary fig. S1, Supplementary Material online). For each of six genetic background/transfer-size combinations (2 × 3), eight replicates were maintained. After 900 days of experimental evolution, we then performed head-to-head competition assays between evolved populations and their corresponding ancestors using four replicates. Across all 21 populations with data available (three were aborted; see Materials and Methods), mean fitness significantly increased relative to the time-zero ancestor, by a ratio of 1.14 (standard error, or SE = 0.019; P = 5.8 × 10−7, two-tailed t-test), indicating that the evolution of these populations was shaped by adaptive processes. The amount of fitness improvement of MMR− populations was not significantly different from that for WT populations at any of the transfer sizes (fig. 1; L: P = 0.26; M: P = 0.46; S: P = 0.15, nested ANOVA). In addition, no transfer sizes produced a ratio of mean fitness improvement significantly different from 1.0. For example, for the L transfer size, the mean fitness improvement for MMR− and WT backgrounds are respectively 0.27 (SE = 0.043) and 0.21 (SE = 0.047), and therefore the ratio (MMR−: WT) is 1.28 (SE = 0.35). Similarly, the ratios are 0.69 (SE = 0.28) and 2.76 (SE = 1.98) in the M and S transfer sizes, respectively (fig. 1). Thus, starting evolution as a hypermutator does not necessarily translate into a faster fitness-improvement rate.

Fitness improvement during experimental evolution. For evolved populations under different transfer sizes (orange for L, 1/10; blue for M, 1/104; or green for S, 1/107) and different genetic backgrounds (MMR− or WT), mean fitnesses relative to the ancestor at (A) Day 900, (B) Day 90, (C) Day 300, or (D) Day 600 are reported. Each open circle represents an estimated mean for an evolved population with at least three independent competition assays. The error bars represent SEs. Gray dashed lines denote the null expectation under no improvement from ancestral fitness. The means across all evolved lines for a combination of transfer size and genetic background are represented by colored horizontal lines; the numeric values of means and SEs are printed on the tops of each set of data. The P values for nested ANOVA are also shown on the top. Previous studies of E. coli in simpler, more homogeneous environments have shown that the most rapid increases in population fitness typically occur within 2,500 generations, after which the rate of adaptation significantly slows (Barrick et al. 2009). At 900 days, the L populations, which due to their large transfer size (1:10) experienced the least number of cell divisions (log210 ≈ 3.3 generations per day), had experienced ∼3,000 generations, whereas the S populations had experienced ∼21,000 generations (log2107 ≈ 23.3 generations per day). As such, the absence of significant differences in the cumulative amount of adaptation over this period—despite large differences in initial mutation rate—might be a consequence of both genetic backgrounds having exited an initial period of rapid fitness evolution. Thus, to better survey any temporal heterogeneity in the rate of adaptation, we further assessed fitness after 90, 300, and 600 days of evolution in response to L and S transfer sizes. The results of nested ANOVA again demonstrate a lack of evidence for an increase in initial mutation rates leading to an increase in the amount of fitness improvement (fig. 1). The results are not qualitatively changed even when the natural-logarithmic transformed fitness is used in the analysis (supplementary fig. S2, Supplementary Material online). Thus, high mutation rates did not result in accelerated fitness improvement in these asexual populations even in the early stages of adaptation.

Mutation Rates Evolve to Be More Similar Throughout Experimental Evolution

The indifference of adaptation rates to initial mutation rates might be explained if the mutation rates of hypermutator and WT populations became more similar during the evolution experiment, either due to a reduction in the mutation rate in initially hypermutating populations or to an increase of the rate in WT populations. To test this possibility, we performed fluctuation tests, which indirectly measure mutation rates at a resistance locus (Foster 2006), on different clones isolated from evolved populations after 900 days. Although the ratio of rifampicin-resistance mutation rates for the two ancestral lines (MMR−: WT) was 250 (supplementary fig. S3, Supplementary Material online), after 900 days of evolution, the mean difference in mutation rates between MMR− and WT backgrounds greatly decreased across all transfer sizes (fig. 2). For example, in the L transfer size, the mean mutation rate is 5.0 × 10−7 (SE = 2.7 × 10−7) and 1.6 × 10−8 (SE = 1.2 × 10−8) for MMR− and WT evolved populations, respectively. Therefore, the ratio of mean mutation rates (MMR−: WT) was reduced to 32 (SE = 29). However, this reduction seems mostly a consequence of the occasional emergence of higher mutation rates in the WT background, as only one population (115) among four tested WT/L populations shows a significant increase in the mutation rate for both tested clones. For the M and S transfer sizes, the ratios of mean mutation rates (MMR−: WT) were also reduced to 32 (SE = 15) and 12 (SE = 3.9), respectively. However, here the repeated evolution of lower mutation rates in clones isolated from the MMR− background plays a more important role in the reduction, as all of the tested clones from MMR−/M and MMR−/S populations show a significant mutation rate decrease. Thus, the evolutionary convergence of mutation rates likely contributes to the difference in adaptation rates being less than what might be expected based on initial differences in mutation rates. Interestingly, these observations demonstrate how transfer schemes can affect the evolutionary dynamics of mutation rates in asexual populations.

Evolution of mutation rates after 900 days of experimental evolution. Each panel shows mutation rates of evolved populations in a combination of transfer size (orange for L, 1/10; blue for M, 1/104; or green for S, 1/107) and genetic background (MMR− or WT). In each combination, three or four evolved populations were tested. Two clones per evolved population were isolated and measured. An open circle and an error bar represent the mean and the 95% confidence interval for a clone. The gray dashed line represents the mutation-rate estimate of the corresponding ancestor. Each colored horizontal line represents a mean mutation-rate measurement of a combination; the value of mean and the SE are also printed.

Genome Evolution Rates Are Less Different Than Predicted by Initial Mutation Rates

To enhance our understanding of how the tempo and mode of genomic evolution relate to fitness and phenotypic evolution, we performed metapopulation sequencing of each experimental population roughly every 100 days to acquire mutation profiles of derived allele frequencies (DAFs; supplementary fig. S4, Supplementary Material online). For each combination of genetic background and transfer size, we estimated the rate of genomic evolution by regressing the number of mutations per clone (i.e., the sum of DAFs of all observed SNPs) of all populations against the number of generations at each sampled time point. Because a quadratic regression model did not significantly outperform the linear regression (P = 0.028 for MMR−/L; P > 0.10 for the others, nested ANOVA), we will focus on the results of the linear regression below. As with the rate of adaptation, for all three transfer sizes, the ratio of the rate of genomic evolution between the two backgrounds (MMR− versus WT) was much smaller than the initial difference in mutation rates (fig. 3). For example, under the L transfer size, the rate of genomic evolution is 115 (SE = 5.3) and 30 (SE = 4.3) mutations per clone per 1,000 generations for MMR− and WT populations, respectively. Therefore, the ratio of the genomic-evolution rates is only 3.8 (SE = 0.58). Similarly, in the M and S transfer sizes, the ratios are 20 (SE = 1.3) and 23 (SE = 1.6), respectively. This observation remains qualitatively similar even when different kinds of measurements for genomic divergence are used, for example, the number of detected mutations (supplementary fig. S5 online), nucleotide diversity (supplementary fig. S5 online), or when genomic divergence was estimated only by synonymous SNPs (supplementary fig. S6, Supplementary Material online).

Rate of genomic evolution in evolved populations. Each panel shows the results of evolved populations in two genetic backgrounds (MMR− and WT) in a transfer size (L, M, or S). Each dot shows a mean number of SNPs per clone for MMR− (open circles) or WT (closed circles) populations at a sequencing time point. The error bars represent the associated standard errors (some are invisible because smaller than the height of points). The colored dashed and solid lines are linear regressions against the time for MMR− and WT populations, respectively. The estimated slope (b) and associated standard error are also printed for each regression line. The gray dashed and solid lines represent how evolved populations accumulate mutations under neutral expectation with the initial mutation rates of MMR− and WT ancestors. To further survey how genomic evolutionary rates vary across experimental populations and to determine if the mean rates accurately represent the majority of experimental populations, we separately measured the rate of genomic evolution for each experimental population (supplementary fig. S7, Supplementary Material online). This revealed that the distribution of rates in the WT populations under the L transfer size is wider than the distributions of rates in all other combinations of genetic background and transfer sizes. Specifically, in the L transfer size, although five of the eight WT populations exhibit a genomic evolutionary rate of ∼10 mutations per clone per 1,000 generations, one WT population (115) has a rate about ∼2× higher, and two WT populations (101 and 113) have a rate close to 100, ∼10× higher and similar to MMR− populations. Consistent with the fluctuation-test results noted above, these results suggest that some, but not all, WT populations under the L transfer size evolved a higher mutation rate (see Discussion).

Mutations Arising in Hypermutators Are More Likely to Be Fixed

Using longitudinal metagenomics-sequencing data allows one to observe evolutionary dynamics at the level of single mutations and thus better understand the entire adaptive process. Here, we focus on fixed mutations because they are more likely to contribute to adaptation than polymorphic mutations or other mutations that are transient in a population. Because our experimental-environment setting facilitates the development of subpopulation structure, we applied clade-aware hidden Markov chain (caHMM) analysis. Assuming coexistence of two clades (major and minor) in the population, caHMM considers each mutation’s DAFs found in different sequencing time-points and then infers which clade each mutation belongs to, whether each mutation reaches within-clade fixation, and when a fixed mutation reaches fixation (Good et al. 2017). One characteristic parameter in an adaptive process is the fraction of mutations that reached fixation in a population. Using the observed numbers of both detected and fixed mutations, we first tested whether hypermutator populations have a different fraction of mutation that reached fixation. Different mutation types have a different potential to impact populational fitness. For example, compared with synonymous SNPs, nonsynonymous SNPs have a greater potential to change protein functions; intergenic SNPs have a greater potential to change protein expression; and structural variations (SVs; including indels and mobile-element insertions) have a greater potential to disrupt a protein. Therefore, we performed separate tests on the conditional fractions of these four functional categories of detected mutations that reached fixation. Although not always significant, the fixation fraction in the MMR− populations is generally higher than in WT populations across different transfer sizes and different categories of mutations (fig. 4). The fixation fraction is similar across different categories of mutations, regardless of their perceived potential to affect fitness, suggesting that the fixation of most mutations is a consequence of genetic hitchhiking as opposed to intrinsic beneficial effects.

Analysis of strength of natural selection associated with fixed mutations in different categories in different treatments of experimental evolution. (A) Each symbol shows the population mean fraction of detected nonsynonymous mutations (squares), intergenic mutations (diamonds), synonymous mutations (crosses), or SV mutations (triangles) that reached within-clade fixation in each combination of transfer size (L, M, or S) and genetic background (MMR− or WT). The error bars show the 95% confidence intervals. (B) Each symbol shows the mean selection coefficient of nonsynonymous mutations (squares), intergenic mutations (diamonds), synonymous mutations (crosses), or SV (triangles) that are fixed in any clade in any population belonging in a combination of transfer size and genetic background. The error bars show the 95% confidence intervals. (C) Each square shows the population mean neutrality index of nonsynonymous mutations for a combination of transfer size and genetic background. The error bars show the 95% confidence intervals. The horizontal gray lines denote the point of neutrality (1.0). (D) Each square shows the population mean neutrality index of intergenic mutations for a combination of transfer size and genetic background. The error bars show the 95% confidence intervals. The horizontal gray lines denote the point of neutrality (1.0). Another critical factor determining the temporal dynamics of a mutation is the underlying fitness effect. Using the temporal data of allele frequencies for a mutation, we can quantify the net selection coefficient, which reflects an allele’s own fitness effects but can be potentially influenced by the effects of linked mutations. We did not find a significant difference between the selection coefficients of fixed mutations in the two genetic backgrounds (fig. 4). More importantly, we also found that different categories of mutations show similar mean selection coefficient estimates, which again suggests a pivotal contribution of hitchhiking effects to the mutational dynamics of genome evolution in asexual populations. Given this observation, we assumed that the fixation of mutations that occurred in the same clade at the same time belong to one selective sweep. After calculating the mean selection coefficients per selective sweep, we again found no significant differences of selection coefficients between two genetic backgrounds (supplementary fig. S8, Supplementary Material online).

Neutrality Tests Reveal Partial Evidence for Positive Selection

In theory, comparing the number of fixed mutations in functional categories of sites with different potential effects on fitness can summarize general patterns in the mode of genome evolution (McDonald and Kreitman 1991; Rand and Kann 1996). If a population has experienced strong positive selection on protein function or expression, it is expected that there will be more fixed mutations with a greater potential to change protein function or expression than mutations with smaller such potential. If a population has experienced purifying selection on protein function or expression, it is expected that there will be fewer fixed mutations with a potential to change protein function/expression than mutations with smaller such potential. Therefore, the ratio of the number of fixed nonsynonymous synonymous SNPs (FN) to the number of fixed synonymous SNPs (FS) or the ratio of the number of fixed intergenic SNPs (FI) to FS is predicted to be elevated by strong positive selection. Similarly, FN/FS or FI/FS are predicted to be small with a strong purifying selection. However, these ratios are not necessarily directly comparable in MMR− and WT backgrounds because the two genetic backgrounds have varied mutational spectra and different relative rates of nonsynonymous and synonymous mutations (Lee et al. 2012). To address this issue, we normalized the observed FN/FS by the ratio of nonsynonymous and synonymous mutations (UN/US) previously observed in a mutation accumulation experiment that utilized our exact ancestral genotypes (Lee et al. 2012). The ratio of these ratios (FN/FS)/(UN/US), will be referred to as the neutrality index of nonsynonymous SNPs, because ratios significantly >1.0 or <1.0 imply, respectively, a predominance of positive selection in driving mutation fixation or a predominance of purifying selection driving mutation extinction. This index is analogous to Tachida’s index for neutrality (Tachida 2000), but with a different normalizing approach—via results from mutation-accumulation experiment instead of polymorphism, and is a preferred approach as it eliminates any potential problems with selection on silent sites. This definition of the neutrality index can be applied to any kind of mutation. For example, we also define the neutrality index of intergenic SNPs as (FI/FS)/(UI/US), where UI is the number of intergenic mutations observed in mutation accumulation (Lee et al. 2012). The neutrality index of nonsynonymous SNPs in MMR− populations under the L transfer size is significantly larger than one, consistent with strong positive selection (fig. 4). Moreover, the neutrality index of intergenic SNPs in MMR− populations under all transfer sizes is significantly smaller than one, suggesting overall strong purifying selection on intergenic SNPs (fig. 4). On the contrary, the neutrality index in WT populations never significantly differs from 1.0. Although this may indicate a weaker strength of selection in WT populations, it may also be the result of insufficient statistical power due to an overall smaller number of fixed SNPs in WT populations, evident by their large confidence intervals associated with the point estimation.

Parallel Evolution of Fixed Mutations at the Genic and Nucleotide Level

As the previous analysis suggests that the statistical power of the neutrality index may be insufficient, we also considered parallelism of fixed mutations to examine the action of positive selection using two different metrics. First, we utilized a sum of G-scores to measure the excess parallelism across fixed nonsynonymous mutations relative to the expectation based on gene lengths (Tenaillon et al. 2016). For each gene, a higher G-score implies that the gene is more enriched for fixed nonsynonymous mutations than expected by gene length. Summing the G-scores of all genes for each combination of genetic background and transfer size revealed statistical significance in all cases (z-test, fig. 5). Second, we calculated the mean Bray–Curtis similarity of the number of fixed nonsynonymous mutations across all genes (Turner et al. 2018) for all pairwise comparisons of evolved populations in each combination of genetic background and transfer size, again revealing statistically significant similarity in all cases (z-test, supplementary fig. S9, Supplementary Material online). Therefore, both findings suggest that positive selection has shaped the genomic evolution of populations in all genetic-background/transfer-size combinations.

Evolutionary parallelism as evidence for positive selection. (A) The vertical arrow shows the observed sum of G-scores, representing the extent of parallel mutation for a particular combination of transfer size and genetic background. The histogram shows the distribution of 20,000 simulated sums of G-scores, representing the null distribution of evolutionary parallelism. The significance of the observed sums can be evaluated by z-scores (z > 1.65 for one-tailed P < 0.05). (B) List of genes with fixed nonsynonymous mutations. Significance levels (simulated P values with Bonferroni correction) are shown by the different nonblack colors of tiles. Genes with no such hits in a particular combination are shown by black tiles. (C) List of genes with fixed intergenic mutations. Significance levels (simulated P values with Bonferroni correction) are shown by the different nonblack colors of tiles. Intergenic regions with no such hits in a particular combination are shown by black tiles. (D) List of genes significantly overrepresented for structural mutations that are likely under positive selection. Yellow tiles highlight genes observed in at least two populations and yielding simulated P values < 0.05 after Bonferroni correction. The analysis of parallelism also helps reveal which mutations are most likely to be drivers of adaptation. Using the same G-score analysis, in each genetic-background/transfer-size combination, we identified genes that were overrepresented for fixed nonsynonymous mutations (P < 0.05, Bonferroni correction; fig. 5). Gene Ontology (GO) analysis on these gene subsets revealed that significantly enriched GO terms were often related to transcription regulation and biofilm formation (supplementary table S1, Supplementary Material online). Using similar methods, we also identified subsets of genes enriched for fixed mutations in intergenic regions (fig. 5) and for structural mutations, including indels and IS-element insertions (fig. 5). Together, the genes in these lists serve as good candidates for revealing the various mechanisms important to adaptation in complex environments (see Discussion). In addition to genic-level parallelism, we also identified nucleotide-level parallelism. In particular, 197 cases of parallel fixed nonsynonymous mutations were identified in at least two experimental populations within the same combination of transfer size and genetic background (supplementary table S2, Supplementary Material online). Because the probability that fixed mutations would occur by chance at a given nonsynonymous site in at least two experimental populations is very low, the observed parallelism at the nucleotide-level again suggests positive selection and identifies important candidates for studying the molecular mechanisms associated with adaptation in complex media. For example, three instances of parallel mutation within fimH were located in its mannose-binding domain and therefore may be good candidates for future functional studies of cell adhesion (Schembri et al. 2001). Furthermore, three instances of parallel-fixed nonsynonymous mutations were found in genes with GO terms associated DNA repair or DNA replication, including dnaE (DNA pol III subunit α), yajL (protein/nucleic acid deglycase 3), and nrdA (ribonucleoside-diphosphate reductase 1, α subunit dimer). As all three of these cases arose in M and S transfer sizes with MMR− backgrounds, they may be good candidates for studying the molecular mechanisms associated with lowering mutation rates.

Discussion

Here, we describe the effect of the initial mutation rate on the rate of fitness improvement and genomic evolution by experimentally evolving E. coli with two distinct mutation-rate backgrounds. Although the initial difference in mutation rates of these two genetic backgrounds is >100×, after 900 days of evolution, the differences in the net rates of fitness improvement (0.7–2.7 fold), in the mutation rates (12- to 32-fold), and in the level of genome evolution (4- to 23-fold) are much smaller. These results suggest that elevating the mutation rates in asexual populations does not proportionally increase the rate of fitness improvement, possibly due to clonal interference, linkage to deleterious mutations, and/or evolution of lower mutation rates. In addition, we found that mutations arising in MMR− populations exhibit a higher fraction of fixation than mutations in WT populations, whereas mutations in categories with different fitness effects show similar fixation probabilities within each genetic-background treatment, suggesting the strong influence of hitchhiking effects in genome evolution in these populations. Despite the strong linkage, a high degree of parallelism in mutations arising in particular genes and nucleotides still suggests positive selection shaping genome evolution in all transfer-size/genetic-background combinations. The observed mutations with high parallelism serve as excellent candidates for understanding the mechanisms of adaption to the complex conditions provided by the experimental environment.

Effects of High Mutation Rates on Evolution

Our experiments reveal that, even over 900 days of evolution in a complex medium, hypermutating E. coli does not necessarily exhibit a faster rate of adaptation than wild-type E. coli. This observation is consistent with previous experimental-evolution results over a shorter period and in simpler media, such as glucose medium DM25 for 1,000 generations (Arjan et al. 1999) and glucose medium DM1,000 for 3,000 generations (Sprouffske et al. 2018). Our experiments further demonstrate that populations with initial hypermutator backgrounds can rapidly evolve lower mutation rates. Together with other empirical work on prokaryotic (Sprouffske et al. 2018) and eukaryotic hypermutators (McDonald et al. 2012), these results suggest that strong genetic load due to deleterious mutations remains a pivotal factor in the evolution of mutation rates, consistent with the drift-barrier hypothesis (Lynch 2010; Sung et al. 2012). Beneficial mutations may also play a role in the evolution of hypermutators, suggested by our finding that the evolution of higher mutation rates tends to be found in populations with the L transfer size. Even when some new hypermutator alleles can spread in a population by linkage with other beneficial mutations during the adaptive process and thus briefly improve the rate of adaptation (Sniegowski et al. 1997; Tenaillon et al. 1999, 2001), such events are usually transitory (Giraud et al. 2001; Desai and Fisher 2007; Wielgoss et al. 2013). Moreover, the fact that lowering the mutation rate could not have involved a reversion of deleted MMR in our experiments implies that there is excess capacity for improving replication fidelity through other parts of E. coli genome. As mutation-rate evolution occurred over a relatively short period in our study, the results bear on several critical questions for future studies, including the rapidity of the dynamics of the evolution of mutation rates and the consequences for the mutational spectrum. Another interesting question on the relationship between genetic backgrounds and evolutionary outcomes is whether different genetic backgrounds alter the set of beneficial mutations in the fitness landscape. As there is significant parallelism between WT and MMR− populations in each of the three transfer sizes (supplementary fig. S10, Supplementary Material online), the differences in their fitness landscapes are likely limited.

Effect of Genetic Linkage on Evolution

In asexual populations with reduced recombination, the fate of a mutation is largely affected by its association with other mutations due to strong genetic linkage (Gillespie 2000; Neher 2013; Couce et al. 2017). Extensive hitchhiking is a feature of our evolving E. coli populations, as we observe similar fractions of fixed mutations and net associated fitness effects for fixed mutations across different functional categories. To better illustrate this effect of genetic linkage, we showed that the temporal DAF changes of any two SNPs in the same genome are highly correlated (i.e., a rightly skewed distribution of correlation coefficients) and result in more largely positive correlation coefficients than for a random expectation of nonlinked mutations (supplementary fig. S11, Supplementary Material online). Therefore, even though MMR− populations show a range of genome-evolution rates 4–23× higher than WT populations, the excess of fixed mutations does not directly contribute to adaptation rates. Rather, with no recombination in the experimental populations, the adaptation rates are heavily affected by clonal interference and linkage to deleterious mutations (Schiffels et al. 2011). In the future, it will be interesting to measure the fitness of single clones within a population to further reveal the effect of clonal interference. By sequencing single clones and measuring fitness effects of single mutations, it will be possible to separate the driver mutations from hitchhiking mutations and to dissect the evolutionary dynamics under linked selection.

Effects of Transfer Sizes on Evolution

The three different transfer sizes (L, M, and S) implemented in our evolution experiment allow us to compare adaptive processes in different population-genetic environments. In theory, when the transfer size is large, there should be more effectively beneficial mutations. Consistent with this theoretical prediction, the populations cultured with the L-transfer size show the highest rates of fitness gain (fig. 1). The fitness again may be due to the improvement of protein functions, as the observed neutrality index of nonsynonymous mutations in MMR− populations under L transfer size is >1.0 and the highest (fig. 4). On the contrary, the populations cultured with the S-transfer size may have accumulated more deleterious mutations and thus show less fitness improvement. We also found that the populations under the L-transfer size tend to evolve higher mutation rates (fig. 2), as hypermutator alleles may hitchhike with more selective sweeps of beneficial mutations. Previous research has also demonstrated that the spread of hypermutator alleles tends to be found in populations with larger sizes or weaker bottleneck effects (Raynes et al. 2014). As a result, even though the mutation rates of populations in two genetic backgrounds (WT and MMR−) evolved to be closer to each other during the experiments with all three transfer sizes, the driving mechanisms are different between populations under the L transfer size and those under M or S transfer size (fig. 2). We note that an L-size transfer requires fewer cell divisions to reach the stationary phase than an M-size or S-size transfer. Therefore, populations under the L transfer size may experience a longer stationary phase during experimental evolution. Considering different staying times in the stationary phase potentially is critical for interpreting many results across transfer-size treatments. For example, when calculating the rate of genomic evolution (fig. 3), we estimated generation numbers by the required number of cell divisions in the exponential growth phase, assuming no cell death during the stationary phase. Such an assumption can overestimate the rate of genomic evolution, and the overestimation in L populations can be the strongest due to the longest stationary phase. Given that the physiological features of E. coli can be different in exponential and stationary phases (Pletnev et al. 2015), the evolutionary pressure under different transfer sizes may also vary as their relative durations within the two phases differ. A population can adapt by improving its maximum growth rate in the exponential phase, increasing its carrying capacity, and decreasing its death rate during the stationary phase, or a combination of both. The fitness effects combined among different growth phases were focused in our fitness assays. In our case, however, the adaptive mechanisms of the populations in different transfer sizes are probably similar, as we still found significant parallelism in the fixed enriched mutations across different transfer sizes (supplementary fig. S12, Supplementary Material online). This issue of different durations in different growth phases should not directly affect our main conclusion regarding evolved fitness under different genetic backgrounds in the same transfer size. However, in our competition assays, the transferred ratio at inoculation (1/100) was not the same as the daily transferred ratios applied to the experimentally evolved populations, especially for S populations (1/107). Because the relative durations in different growth phases in competition assay can be different from the ones in experimental evolution, the measured fitness may not reflect the genuine fitness improvement during the experimental evolution. To explore whether this problem has limited our abilities to identify the fitness differences in different genetic backgrounds, we performed another set of competition assays for S/MMR− and S/WT populations using 1/107 as the transferred ratio at inoculation. The results show no significant differences between the evolved populations in two genetic backgrounds (supplementary fig. S13, Supplementary Material online), similar to the previous results (fig. 1). Therefore, our conclusion, that there is no strong relationship between hypermutators and the overall rate of fitness improvement, remains unchanged, although how much E. coli evolved specifically in different growth phases is worth studying in the future.

The Possible Role of Biofilm Formation in Adaptation

With respect to the specific genes and biological processes that appear to be targetes for adaptation, our analysis of candidate genes suggests that biofilm formation is an important characteristic in adapting to the complex setting imposed by the experimental environment. In particular, the formation of type I fimbriae is critical for biofilm formation in E. coli (Pratt and Kolter 1998). Consistently, several genes enriched for fixed nonsynonymous mutations are related to formation of type I fimbriae (fig. 5), including fimH and fimG, which account for components the type I fimbriae (Waksman and Hultgren 2009; Le Trong et al. 2010), fimB, and fimE, which regulate the expression of fimAICDFGH operon (Olsen et al. 1998), and proQ (RNA chaperone) and lsrK (autoinducer-2 kinase), which also facilitate biofilm formation (Li et al. 2007; Sheidy and Zielke 2013). We additionally observed an enrichment of mutations in the intergenic region of fimE/fimA (fig. 5) which contains a phase-variable promoter for regulating the expression of the fimAICDFGH operon (Abraham et al. 1985; Spears et al. 1986). Lastly, the list of genes enriched with structural mutations (fig. 5) also includes fimE, which primarily turns off the expression of fimAICDFGH operon, and several genes related to gatYZABCD operon, including gatZ, gatB, and gatA, whose deletions can increase biofilm formation (Domka et al. 2007). Many of these candidate genes related to type I fimbriae contributed to the adaptation in an earlier experimental environment of a similar nature (Behringer et al. 2018). Moreover, structural mutations in gatZ and gatA genes have been found to contribute to the initial adaptation of E. coli in a mouse gut, another example of a complex environment (Barroso-Batista et al. 2014). Studying the genetic variants promoting the evolution of biofilm in complex environments is of particular interest in the field of public health, as the evolution of biofilm has been considered to be related to the evolution of microbial social behaviors (Tarnita 2017), the evolution of pathogenicity (Kaper et al. 2004; Naves et al. 2008; Rossi et al. 2018), and the evolution of antibiotic resistance (Avalos Vizcarra et al. 2016; Sharma et al. 2016). Thus, understanding the evolution of biofilm formation may be key to increasing the efficiency of treatments in patients to combat the fast emergence of antibiotic resistance and pathogenicity.

Candidate Transcriptional Regulators Involved in Adaptation

Because we have provided fresh media every day during experimental evolution, a fast switch from stationary phase to exponential growth phase may bring benefits to the evolving populations (Monod 1949; Navarro Llorens et al. 2010). Interestingly, several genes enriched in fixed nonsynonymous mutations in our study are transcription regulators, including arcA, cadC, cytR, rbsR, rseB, and sspA (fig. 5). The genes enriched in structured variations also include transcription regulators, such as arcB, cadC, rpoS, and nlpD (fig. 5). The mutations in these genes may contribute to the transcriptomic reprogramming for the fast switch from stationary phase to exponential phase. For example, rseB is a negative regulator of the stationary-phase effector, sigma factor E (Missiakas et al. 1997); and both cytR and rbsR are repressors to the carbon limitation effector, cAMP-CRP (Bell et al. 1986; Mauzy and Hermodson 1992; Kristensen et al. 1996). Therefore, gain-of-function mutations in these three genes can theoretically reduce the chance that cells stay in stationary phase. In addition, arcA, arcB, cadC, rpoS, and sspA are known as stress-responding activators (Iuchi et al. 1989; Lange and Hengge-Aronis 1991; Watson et al. 1992; Williams et al. 1994; Rolfe et al. 2011), suggesting that loss-of-function mutations in these genes are beneficial in the experimental environment involving a frequent supply of fresh media imposing low stress. Further investigation is needed to determine whether any of these mutations do indeed bring such benefits to the experimentally evolved populations.

Concluding Remarks

To sum up, our results reveal that high mutation rates in E. coli have only a very limited influence on the rate of adaptation. Our findings may provide useful insight for understanding clinically relevant processes involving asexual populations, such as the evolution of improved growth rates in pathogens and the emergence of antibiotic resistance in natural or host environments. Compared with the previous lab experiments in simpler environmental settings, our experimental evolution results in complex media are likely to be more representative of natural or host environments, which are usually highly heterogeneous. For example, a combination of the occasional emergence of hypermutators under weaker bottlenecks and the consistent evolution of antimutators under stronger bottlenecks may explain why a low to intermediate frequency of hypermutators is usually found in pathogen populations (Couce et al. 2016; Veschetti et al. 2020). Whether the elevated mutation rates affect adaptation rates and the pattern of genome evolution in these populations under natural or host environments merits further research. For example, hypermutators may be critical for a founding population in a new environment, especially with epistasis in the fitness landscape, but such an effect of hypermutation can diminish after in subsequent generations (Mehta et al. 2019).

Materials and Methods

Strains

The ancestral strains used in experimental evolution are descendants of PMF2 and PMF5, provided by the Foster Lab (Lee et al. 2012). PMF2 is a prototrophic derivative of E. coli K-12 str. MG1655, and its genetic background is called WT in the article. PMF5 is derived from PMF2 with mutL deletion, providing the MMR− genetic background. For both kinds of strains, a 3513 bp deletion to the araBAD operon is further introduced by lambda red recombineering as a neutral marker. Plates with TA agar (1% arabinose, 1%tryptone, 0.5% NaCl, 0.1% yeast extract, 0.005% TTC [Sigma T8877]) are used for examining the deletion of araBAD operon. The colonies with an araBAD deletion (ara−) appear to be pink; otherwise, the colonies (ara+) appear to be purple.

Experimental Evolution

When we established the experimental populations, the ancestral strains were first cultivated overnight at 37 °C on LB agar plates, and then their single-isolated progenitor colonies were inoculated in a 16- × 100-mm glass tube with 10 ml of LB-Miller broth (BD Difco). The tubes were cultured at 175 rpm shaking at 37 °C. Every day, cultures are thoroughly vortexed and transferred into a new tube with 10 ml of fresh LB broth. Three different transfer sizes are used: 1 ml (large), 1 µl (medium), and 1 nl (small), which correspond to different dilution factors: 10−1, 10−4, and 10−7. Specifically, for small-size transfers, we first diluted 1 µl vortexed bacterial liquid with 10 ml fresh LB broth and then mixed 10 µl of this 1:104 diluted bacterial liquid with another 10 ml fresh LB broth. Initially, for each combination of genetic background and transfer size, we set up eight replicate tubes. For preventing cross-contamination, four replicates are ara- and four replicates are ara+. During the experimental evolution, experimental populations at Day 90, 200, 300, 400, 500, 600, 700, 800, 900 were frozen in −80 °C freezers for analysis.

Competition Assay

When evaluating the fitness of an evolved population (ara−), we performed head-to-head competition assays between evolved populations and the corresponding ancestor (ara+). We used the WT ancestor and the MMR− ancestor for evolved populations with WT or MMR− genetic backgrounds, respectively, as there may be some slight fitness effects for the mutL deletion (fitness = 0.97 with SE = 0.04 by the competition assay described here, n = 5). At the beginning of the competition assay, we inoculated the frozen samples of the evolved population and the corresponding ancestor in two different tubes each of which contains 10 ml of fresh LB broth at 37 °C shaking at 175 rpm overnight. For the competition, we then put both 50 µl aliquot from the evolved population and 50 µl aliquot from the ancestor into a new tube with 10 ml of LB broth at 37 °C shaking at 175 rpm for 24 h. The shaking speed and the temperature used in the competition are the same as what was used in experimental evolution. Immediately after inoculation (Day 0) and after 24-h competition (Day 1), we used plate-counting to determine the colony-forming units (CFU) of both the ancestor and the evolved population. Specifically, we serially diluted 100 µl aliquot in 900 µl phosphate buffered saline. To distinguish the evolved population from the ancestor, we plated serially diluted aliquots on TA agar because ara− colonies will appear to be pink, whereas the ancestor (ara+) will appear to be purple. The TA agar plates were incubated at 37 °C overnight. We then identified the plate with 30–300 total colonies and counted the number of colonies for the evolved population and the ancestor. Based on the colony numbers and the dilution factor during the serial dilution, we then calculated the CFU of the ancestor before the competition (A0), the CFU of the ancestor after the competition (A1), the CFU of the evolved population before the competition (E0), and the CFU of the evolved population after the competition (E1). The fitness of the evolved population (w) relative to the ancestor was then calculated by the following formula: which is the ratio of two Malthusian parameters (Lenski et al. 1991). At Day 0 of the competition assay, we also serially diluted and plated 100 µl aliquot of evolved population as a control. For an evolved population, if both purple and pink colonies were found in the countable control plates (30–300 colonies per plate), the data will be discarded from the analysis because we are not sure about the source of the pink colonies in the experimental plates.

Fluctuation Test and Mutation Rate Estimation

We quantified mutation rates of evolved populations (at Day 900) and ancestors by fluctuation tests (Foster 2006). Briefly, fluctuation tests measure the rate of resistance to the antimicrobial rifampicin which is conferred by mutations to rpoB. For each combination of genetic background and transfer size, the four ara− populations were assayed. For each population, two biological replicates with different starting clones were assayed. For each of the WT or MMR− ancestor, we also run replicate experiments in different starting clones. For each clone, 40 replicate experiments were performed. A number of mutants as determined by CFU/ml were converted to an estimated mutation rate and a corresponding 95% confidence interval by the function “newton.LD” function in the R package “rSalvador” (Zheng 2017).

DNA Isolation and High-Throughput Sequencing

We conducted high resolution population tracking by collecting 1 ml of culture at Day 90, 200, 300, 400, 500, 600, 700, 800, and 900 of experimental evolution. We used the DNeasy UltraClean Microbial Kit (Qiagen 12224; formerly MO BIO UltraClean Microbial DNA Kit) to extract DNA. For library preparation and sequencing, we submitted DNA to either the Hubbard Center for Genomic Analysis at the University of New Hampshire, the Center for Genomics and Bioinformatics at Indiana University, or the CLAS Genomics Facility at Arizona State University for library preparation and sequencing. Library preparation was done by the Nextera DNA Library Preparation Kit (Illumina, FC-121-1030) following an augmented protocol for optimization of reagent use (Baym et al. 2015) before being pooled and sequenced as paired-end reads on an Illumina HiSeq 2500 (UNH) or an Illumina NextSeq 500 (Indiana; ASU). The target depth is 100×.

Sequencing Analysis

We performed Sequencing analysis on the Mason and Carbonate high-performance computing clusters at Indiana University. The quality control of sequencing reads was performed by Cutadapt version 1.9.1 (Martin 2011), which removes residual adapters and trims low-quality sequences. The qualified sequencing reads were then mapped to the E.coli K-12 substr. MG1655 reference genome (NC_000913.3). All mutations and their DAFs were identified using Breseq version 0.30.2 with the predict-polymorphisms parameter setting (Deatherage and Barrick 2014). Furthermore, several criteria of further quality checks were applied to the samples which we will only include in the following analysis: 1) mean sequencing depths >10; 2) any WT sample identified to contain the 1,830 bp deletion in mutL from the PMF5 progenitor strain was discarded; 3) regions lacking sequencing coverage (i.e., depth = 0) must be smaller than 5% of the genome; and 4) the sequencing result should reflect the correct genetic background in terms of ara markers, including a nonsynonymous SNP at position 66528, an intergenic SNP at position 70289, and a multiple base substitution mutation (SUB) at position 66533. For an ara+ population, we required either of two SNPs showing DAF < 0.2. For an ara− population, we required either of two SNPs showing DAF > 0.8 or the SUB is detected. In the end, 396 genomic profiles passed QC and were included in the following analysis (supplementary table S3, Supplementary Material online). In the case of M transfer size under WT background, only six out of eight replicates of evolved populations are left for the following analysis because the other two were potentially contaminated by MMR− strains. In the other five combinations of transfer size and genetic background, there are still eight replicates of evolved populations for the following analysis. For every evolved population subject to the following analysis, its sequencing profiles are available in at least seven different time points. To make sure that we do not use mutations that originated from the starting clone before experimental evolution in the analysis, we discarded any mutations with a DAF = 1.0 at one time point for at least 11 experimental populations with the same genetic background from the analysis. Furthermore, the highly repetitive sequences in rsx genes are known to cause errors in SNP calling (McCloskey et al. 2018), so they were also discarded from the analysis.

Number of Guaranteed Generations

Given the observation that carrying capacity of experimental populations can be recovered within a transfer period (i.e., 1 day), the guaranteed generation numbers in 900 days for dilution factors = 10−1, 10−4, and 10−7 were, respectively, estimated as 900 times of log2(10), log2(104), and log2(107), which are equal to ∼ 3.0k, 12k, and 21k.

Rate of Genomic Evolution

The level of genomic divergence for each experimental population at each time point is defined by summing all DAFs of detected mutations. We then calculated the mean genomic divergence across all eligible experimental populations in each combination of transfer size and genetic background. We further performed the linear regression by the function “lm” in R with formula “mean genomic divergence ∼ guaranteed generations + 0,” which enforces the y-intercept as 0. The slope of the regression is the estimated rate of genomic evolution. We also performed a nonlinear regression using the formula “mean genomic divergence ∼ guaranteed generations + square root of guaranteed generations + 0,” which was previously proposed to catch the trend of diminishing returns (Tenaillon et al. 2016).

Identification of Fixed Mutations by Hidden Markov Chain

For each population, caHMM was performed using a modified version (Behringer et al. 2020) of previously released code (Good et al. 2017). For the populations in which caHMM did not finish, we instead performed well-mixed hidden Markov chain (wmHMM) using a modified version (Behringer et al. 2020) of previously released code (Good et al. 2017). The single clade in wmHMM is defined as the basal clade. Fixed mutations are then defined as mutations that are inferred to be fixed in basal, major, or minor clade in the results of either analysis.

Estimation of Selection Coefficients

For each fixed mutation, we estimated its selection coefficients by its temporal data of DAFs. If a mutation is in the basal clade, no correction is needed. If caHMM infers a mutation belongs to the major or minor clade, its corrected allele frequency will be the DAF divided by the proportion of the population belonging to the major or minor clade (also inferred by caHMM). For each of two available consecutive time points i and j, if the corrected allele frequencies at both time points (pi and pj) are smaller than 0.95 and larger than 0.05, the selection coefficient is calculated as where ti and tj are the numbers of guaranteed generations at time point i and j, respectively. The negative values were discarded. The largest positive value across all pairs of time point was used as the final measurement.

Calculation of Neutrality Index

As discussed in the main text, we defined (FN/FS)/(UN/US) as the neutrality index of nonsynonymous SNPs, where FN is the number of nonsynonymous SNPs fixed within a clade or fixed in an entire population, FS is the number of synonymous SNPs fixed within a clade or fixed in an entire population, UN is the number of nonsynonymous SNPs in the mutation- accumulation experiment, and US is the number of synonymous SNPs in the mutation accumulation-experiment. We also similarly defined (FI/FS)/(UI/US) as neutrality index of intergenic SNPs, where FI is the number of intergenic SNPs fixed within a clade or fixed in an entire population, and UI is the number of intergenic SNPs in the mutation-accumulation experiment. The values of UN, US, and UI are from a previously published mutation-accumulation experiment of our ancestral lines (Lee et al. 2012). We calculated population-specific indexes and then acquired population-wise mean and SE. The populations with FN = 0 and FI = 0 were discarded in the calculation of neutrality index of nonsynonymous and intergenic SNPs, respectively.

Calculation of G-Scores

For each combination of genetic background and transfer size, we quantified the parallelism of the fixed nonsynonymous mutations using the sum of G-scores across genes (Tenaillon et al. 2016). A larger G-score for a gene suggests that the fixed nonsynonymous mutations are more overrepresented in that gene. Specifically, to calculate a genic G-score (Gi), we first counted the observed number of fixed nonsynonymous mutations in gene i (Oi) per a combination of genetic background and transfer size. We then calculated the expected number for gene i (Ei) by Otot(Li/Ltot), where Otot= ΣiOi, Li is the number of nonsynonymous sites for gene i, and Ltot= ΣiLi. In the end, Gi is defined by 2Oiln(Oi/Ei) or defined as zero when Oi = 0 or when 2Oiln(Oi/Ei) < 0. It was noted that the null expectation of G-scores varies with total number of fixed nonsynonymous mutations (Behringer et al. 2020). Therefore, for each combination of genetic background and transfer size, we performed 20,000 simulations in each of which Otot hits are randomly distributed among all Ltot sites across all genes in the reference genome. Then the significance of the sum of G-scores was evaluated by the z score defined by (the observed sum—mean of simulated sums)/(standard deviation of simulated sums).

Calculation of Mean Bray–Curtis Similarity

For each combination of genetic background and transfer size, we also quantified the parallelism of the fixed nonsynonymous mutations using the mean Bray–Curtis similarity across all pairs of experimental populations (Turner et al. 2018; Behringer et al. 2020). Specifically, for a pair of populations j and k, their Bray–Curtis similarity is defined by where oij and oik are the observed number of fixed nonsynonymous mutations in gene i for population j and k, respectively. For each combination of genetic background and transfer size, we also performed 1,000 simulations to acquire the null distribution. In each simulation, we randomly sample the nonsynonymous sites up to the number of observed fixed nonsynonymous mutations for each population and calculated mean Bray–Curtis similarity as described above. After acquiring the null distribution, we evaluated the significance of the observed mean Bray–Curtis similarity by calculating the z score defined by (the observed value—mean of simulated values)/(standard deviation of simulated values).

Overrepresentation of the Genes Affected by Nonsynonymous Nutations

To evaluate the significance of G-score for gene i, we directly compared the Gi to the distribution of 20,000 simulated Gi, and the P value was defined as the proportion of simulated Gi larger or equal to the observed Gi. For multiple test correction, we multiplied each gene’s P value by the number of genes with at least one hit by the set of fixed nonsynonymous mutations (Bonferroni correction). The genes are called significant only if the genes show Bonferroni corrected P value < 0.05.

Enrichment Test of GO Terms and KEGG Pathways

Using the set of significant genes, we performed the enrichment test of GO terms using the function “enrichGO” in R package “DOSE” (Yu et al. 2015) with q-value cut-off = 0.05 and the organismal database as org. EcK12.eg.db. We also performed the enrichment test of KEGG pathways using the function “enrichKEGG” in the same package but found no terms with q-value < 0.05.

Overrepresentation of the Intergenic Regions Affected by Fixed Mutations

The identification of the intergenic regions affected by mutations was also performed by the way similar to identify the genes affected by nonsynonymous fixed mutations in genic G-score approach (Tenaillon et al. 2016). Instead of focusing on genic regions, genome-wide intergenic regions are focused. For each combination of genetic background and transfer size, we first counted the observed number of intergenic mutations in intergenic region i (Oi), and the expected number for intergenic region i (Ei) was calculated by Otot(Li/Ltot), where Otot= ΣiOi, Li is the length for intergenic region i, and Ltot= ΣiLi. The G-score for intergenic region i (Gi) was then calculated by 2Oiln(Oi/Ei), following the methods described in the above section. We also performed 20,000 simulations and determined the Bonferroni corrected P value for each gene i following the methods described in the above section.

Overrepresentation of the Genes Affected by Structural Fixed Mutations

The identification of the genes affected by structural fixed mutations was performed by the way similar to identify the genes affected by nonsynonymous fixed mutations in genic G-score approach (Tenaillon et al. 2016). Structural mutations include indels and IS-element insertions. For each combination of genetic background and transfer size, we first counted the observed number of populations with any structural mutations in gene i (Oi), and the expected number for gene i (Ei) was calculated by Otot(Li/Ltot), where Otot= ΣiOi, Li is the gene length for gene i, and Ltot= ΣiLi. The G-score for gene i (Gi) was then calculated by 2Oiln(Oi/Ei), following the methods described in the above section. We also performed 20,000 simulations and determined the Bonferroni corrected P value for each gene i following the methods described in the above section. As a result, we found all the genes with Oi ≥ 2 show Bonferroni corrected P value < 0.05.

Correlations Between Pairs of SNPs

For each evolved population, we focused on the nonsynonymous, synonymous, and intergenic SNPs in which at least two nonzero DAFs were found. For each pair of two such SNPs, we calculated the change of DAFs. Then we calculated Pearson’s correlation coefficients across only all the odd-numbered changes of DAFs to avoid nonindependence (Lynch and Ho 2020). That is to say, if a population has sequencing profiles available for analysis at every sampling time points (Days 90, 200, 300, 400, 500, 600, 700, 800, 900), we calculate Pearson’s correlation coefficients using the five changes of DAFs: the one between Day 0 and 90, between Day 200 and 300, between Day 400 and 500, between Day 600 and 700, and between Day 800 and 900. For another example of population, if its sequencing profile at Day 90 is discard from analysis due to its low quality, we calculate the Pearson’s correlation coefficients using only the four changes of DAFs: the one between Day 0 and 200, between Day 300 and 400, between Day 500 and 600, and between Day 700 and 800. Note that at least four changes of DAFs are used for each evolved population because no populations have more than two missing profiles. We then get the distribution of Pearson’s correlation coefficients for each evolved population. To establish the baseline for comparison in each combination of transfer size and genetic background, we also generated the set of Pearson’s correlation coefficients using two random mutations from two different random populations with all sequencing profiles available (i.e., they are unlinked for sure). We followed the same procedure above to calculate Pearson’s correlation coefficients for each pair of unlinked mutations. When simulating a distribution of Pearson’s correlation coefficients, we used 100 pair of unlinked mutations. We then repetitively performed 100 rounds of simulation to get the mean and SE for the distribution.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online. Click here for additional data file.

87 in total

1. The speed of adaptation in large asexual populations.

Authors: Claus O Wilke
Journal: Genetics Date: 2004-08 Impact factor: 4.562

Review 2. Stationary phase in gram-negative bacteria.

Authors: Juana María Navarro Llorens; Antonio Tormo; Esteban Martínez-García
Journal: FEMS Microbiol Rev Date: 2010-02-06 Impact factor: 16.408

3. Methods for determining spontaneous mutation rates.

Authors: Patricia L Foster
Journal: Methods Enzymol Date: 2006 Impact factor: 1.600

4. Dynamics and Fate of Beneficial Mutations Under Lineage Contamination by Linked Deleterious Mutations.

Authors: Sophie Pénisson; Tanya Singh; Paul Sniegowski; Philip Gerrish
Journal: Genetics Date: 2017-01-18 Impact factor: 4.562

Review 5. Genetic drift, selection and the evolution of the mutation rate.

Authors: Michael Lynch; Matthew S Ackerman; Jean-Francois Gout; Hongan Long; Way Sung; W Kelley Thomas; Patricia L Foster
Journal: Nat Rev Genet Date: 2016-10-14 Impact factor: 53.242

6. Antibiotic treatment enhances the genome-wide mutation rate of target cells.

Authors: Hongan Long; Samuel F Miller; Chloe Strauss; Chaoxian Zhao; Lei Cheng; Zhiqiang Ye; Katherine Griffin; Ronald Te; Heewook Lee; Chi-Chun Chen; Michael Lynch
Journal: Proc Natl Acad Sci U S A Date: 2016-04-18 Impact factor: 11.205

7. Intrapopulation variability in mutator prevalence among urinary tract infection isolates of Escherichia coli.

Authors: A Couce; N Alonso-Rodriguez; C Costas; A Oliver; J Blázquez
Journal: Clin Microbiol Infect Date: 2016-03-26 Impact factor: 8.067

8. Mutator genomes decay, despite sustained fitness gains, in a long-term experiment with bacteria.

Authors: Alejandro Couce; Larissa Viraphong Caudwell; Christoph Feinauer; Thomas Hindré; Jean-Paul Feugeas; Martin Weigt; Richard E Lenski; Dominique Schneider; Olivier Tenaillon
Journal: Proc Natl Acad Sci U S A Date: 2017-10-10 Impact factor: 11.205