
Empirical estimates of the mutation rate for an alphabaculovirus.

Dieke Boezen¹, Ghulam Ali², Manli Wang³, Xi Wang³, Wopke van der Werf⁴, Just M Vlak², Mark P Zwart¹.

Abstract

Mutation rates are of key importance for understanding evolutionary processes and predicting their outcomes. Empirical mutation rate estimates are available for a number of RNA viruses, but few are available for DNA viruses, which tend to have larger genomes. Whilst some viruses have very high mutation rates, lower mutation rates are expected for viruses with large genomes to ensure genome integrity. Alphabaculoviruses are insect viruses with large genomes and often have high levels of polymorphism, suggesting high mutation rates despite evidence of proofreading activity by the replication machinery. Here, we report an empirical estimate of the mutation rate per base per strand copying (s/n/r) of Autographa californica multiple nucleopolyhedrovirus (AcMNPV). To avoid biases due to selection, we analyzed mutations that occurred in a stable, non-functional genomic insert after five serial passages in Spodoptera exigua larvae. Our results highlight that viral demography and the stringency of mutation calling affect mutation rate estimates, and that using a population genetic simulation model to make inferences can mitigate the impact of these processes on estimates of mutation rate. We estimated a mutation rate of μ = 1×10−7 s/n/r when applying the most stringent criteria for mutation calling, and estimates of up to μ = 5×10−7 s/n/r when relaxing these criteria. The rates at which different classes of mutations accumulate provide good evidence for neutrality of mutations occurring within the inserted region. We therefore present a robust approach for mutation rate estimation for viruses with stable genomes, and strong evidence of a much lower alphabaculovirus mutation rate than supposed based on the high levels of polymorphism observed.

Year: 2022 PMID: 35666722 PMCID: PMC9203023 DOI: 10.1371/journal.pgen.1009806

Source DB: PubMed Journal: PLoS Genet ISSN: 1553-7390 Impact factor: 6.020

Introduction

Mutation rates are of key importance for understanding and predicting evolutionary patterns, as the mutation rate modulates the mutation supply of a population [1]. Large mutation supplies can fuel rapid and repeatable adaptation [2, 3], but also increase the mutational load on a population [4]. By contrast, low mutation supplies can limit the rate of adaptation [5], but also result in a lower mutational load [4]. The impact of mutational supply depends on the topography of the fitness landscape. Small mutational supplies can have advantages for evolution on rugged fitness landscapes: although adaptation will be slower and in most cases less fit genotypes will be selected, some populations can avoid becoming trapped on local fitness peaks [6]. Mutation rates are not only relevant to understanding basic evolutionary processes, but they also impinge on real world outcomes, such as the efficacy of prophylactic or therapeutic interventions to infectious diseases [7, 8]. Viruses have high mutation rates compared to cellular life forms [7, 9, 10], with estimates of mutations per site per strand copying (s/n/r) ranging from 2×10−8 for Enterobacteria phage T2 [11] to 2×10−4 for Influenza A virus [12]. Whilst these high mutation rates are thought to contribute to the rapid adaptation of viruses, beneficial mutations are typically rare, as the majority of mutations are neutral or deleterious [13, 14]. Many viruses with large genomes belong to Group I of the Baltimore classification (dsDNA viruses) [15], and typically have polymerases with proofreading activity, which should enhance the fidelity of replication [16]. A general expectation is therefore that viruses with relatively large genomes have lower mutation rates [9]. An inverse relationship between genome size and mutation rate indeed has been found [9, 17]. 
Small genomes can tolerate higher mutation rates as a larger proportion of mutation-free genomes are generated in each round of replication, due to their small size. The alphabaculoviruses are a large group of insect baculoviruses that have been studied because of their biocontrol and biotechnological potential [18, 19]. Alphabaculoviruses have relatively large dsDNA genomes compared to other viruses [20], and high levels of within-host genetic diversity have been documented from wild [21-23] and captive [24] insect populations. It has been suggested that baculoviruses might therefore have high mutation rates despite their large genome sizes [24]. To our knowledge, no empirical estimates of the mutation rate have been reported for any baculovirus or insect virus to date, and there are only a few estimates for other large dsDNA viruses [9]. A major challenge for making empirical estimates of mutation rates is the need to account for biases due to selection [9]. Selection will decrease the frequency of deleterious mutations, whilst it will increase the frequency of rare beneficial mutations. These opposing effects of purifying and directional selection make it problematic to derive information on mutation rates directly from mutation accumulation patterns. Many different approaches have been developed to remove the bias introduced by selection [7, 9]. For example, some studies considered the frequency of lethal mutations in a population, since these variants cannot replicate autonomously and therefore represent a snapshot of genetic variation [25]. Others have evolved viruses in hosts expressing a viral gene, and then restricted their analysis to the sequence of this redundant viral gene [26]. Another strategy reported recently has been to incorporate fluorescent markers with inactivating mutations into a viral genome, and then perform fluctuation tests based on recovering fluorescence [12]. 
Finally, others have set up experiments with demographic conditions that remove variation while limiting the role of selection, in combination with high-fidelity high-throughput sequencing [27]. In the current study, we report the first empirical estimate of mutation rate for a large dsDNA insect virus, the alphabaculovirus Autographa californica multiple nucleopolyhedrovirus (AcMNPV). To ensure selection did not bias our estimates, we analyzed virus populations carrying a large, non-functional genomic insert that was stably maintained [28], exploiting the genomic stability of the Group I viruses [29]. For our analysis, we assumed that mutations in this region are neutral due to the absence of known viral genes and regulatory sequences, and verified this assumption. We also developed a population genetics simulation model to estimate mutation rates from empirical data that incorporates the effects of population bottlenecks and different modes of virus replication on the occurrence and maintenance of mutations. Using this approach, we made robust estimates of the mutation rate, for the first time for a baculovirus.
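
The argument that genome size constrains the tolerable mutation rate can be made concrete with a rough calculation. As a sketch only, assume mutations arise independently at each site with the same rate μ per strand copying, and take the AcMNPV genome length as L ≈ 134 kb (the size given in the Results) together with the estimate μ = 1×10−7 s/n/r from the Abstract. The symbol P_0, the fraction of copies that carry no new mutation, is introduced here for illustration and is not used elsewhere in the paper.

```latex
P_0 = (1-\mu)^{L} \approx e^{-\mu L},
\qquad
\mu L \approx \left(1\times10^{-7}\right)\left(1.34\times10^{5}\right) \approx 0.013,
\qquad
P_0 \approx e^{-0.013} \approx 0.987
```

Under these assumptions roughly 99% of genome copies would be mutation-free per strand copying, which illustrates why a low per-site rate is compatible with a large genome.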

Results and discussion

Serial passage and detection of mutations

To estimate the mutation rate of a large dsDNA virus, we experimentally evolved a variant of alphabaculovirus AcMNPV containing a stable, non-functional genomic region. The AcMNPV variant used was a so-called bacmid: an infectious clone that also contains the AcMNPV genome (~134 kb) for the E2 variant [28]. It also contains bacterial sequences that enable propagation as a low copy number plasmid in Escherichia coli (~12.5 kb) and the acceptance of expression cassettes by transposition [28]. The specific variant used here contains an expression cassette from the pFastBac-Dual vector to restore expression of complete and functional polyhedrin (Fig 1). We consider the inserted bacterial sequences to be non-functional and therefore neutral in insects, except for the polyhedrin promoter and open reading frame (ORF) sequences derived from the pFastBac Dual vector. This renders two neutral sequences with a combined length of 11,646 bp flanking the polyhedrin gene, and the former can be studied in a mutation accumulation experiment (Fig 1). By contrast, the remainder of the AcMNPV genome is intact and unaltered, making this bacmid-derived virus a good representative of an alphabaculovirus. We will hereafter refer to this bacmid-derived AcMNPV variant with restored polyhedrin expression as “BAC”, and the two neutral sequences it contains as the “neutral region”. The virus could be reconstituted from the infectious clone by transfection of the BAC genome into fourth instar (L4) Spodoptera exigua (Hübner) (see Materials and Methods). Mutations were allowed to accumulate across the BAC genome by experimentally evolving five replicate BAC lineages (referred to as lineages A, B, C, D and E) for five passages in S. exigua L2. For each replicate lineage, passaging was performed in five larvae exposed to a high viral dose of occlusion bodies (OBs) sufficient to kill all larvae. 
Upon death, larval cadavers were collected and pooled prior to the isolation of OBs, which were used to inoculate larvae orally for the next passage. After five passages, each evolved population was amplified in 100 S. exigua L3 prior to further analyses. Further details on the experimental evolution setup can be found in the Materials and Methods.

Fig 1

Illustration of the neutral region (green diagonal bars) used for estimating mutation rate.

ORF603 and ORF1629 (light blue) are native AcMNPV genes, between which lies the polyhedrin gene which was disrupted (box with magenta bars) by the insertion of the bacmid sequences in its 5’ end. To restore polyhedrin expression the gene has been reinserted under control of its native promoter within the bacmid insert. We consider the sequences of bacterial origin in the insert and the remnants of the pseudogenized polyhedrin gene copy as neutral sequences.

Evolved lineages A to E, as well as the ancestral BAC, were sequenced using Illumina HiSeq to detect mutations both in the non-functional genomic insert and across the whole baculovirus genome. When mapping reads to the BAC reference genome we noticed a correlation between low sequencing coverage and the calling of mutations, and we therefore removed regions with relatively low coverage (S1 and S2 Figs, see Materials and Methods). We show that mutations indeed accumulated across lineages and passages (Table 1). When analyzing the sequence data, the number of mutations detected is dependent on the minimum threshold value (τ) for mutation frequency (τ values of 0.5, 1 and 2% were used for all analyses reported throughout this study). Mutations detected in the ancestral BAC were excluded from the analysis, as we were only interested in de novo mutations which occurred during the evolution experiment. We also noticed that some mutations were detected in multiple evolved lineages. We suspect that the majority of these observed mutations are due to sequencing and read-mapping errors, and consequently they should be removed from the set of mutations used for analyses. Nevertheless, to illustrate how this assumption affects the results, we have performed analyses with different values of the parameter ψ, the threshold for the number of evolved populations in which a mutation can occur. That is, when ψ = 1 only unique mutations are included, and when ψ = 5 mutations detected in all five evolved populations are included. To limit the number of results presented, we report results for ψ values of 1, 3 and 5 for most analyses. Regardless of the chosen mutation frequency threshold, the number of mutations detected is low for both the bacmid insert and genome after five passages in S. exigua L2, although the number of mutations detected increases as τ decreases and ψ increases (Table 1).

Table 1

Mutations called per lineage, where τ is the threshold frequency for detecting mutations and ψ is the maximum number of lineages in which a mutation could occur before being filtered.

		Neutral region only			Virus genome
Lineage	ψ	τ = 0.5%	τ = 1.0%	τ = 2.0%	τ = 0.5%	τ = 1.0%	τ = 2.0%
A	1	2	0	0	18	7	1
	3	4	2	0	54	16	2
	5	9	5	2	71	25	3
B	1	2	0	0	12	3	2
	3	4	0	0	19	3	2
	5	8	1	0	40	5	2
C	1	1	1	0	8	1	1
	3	2	1	0	35	7	1
	5	4	3	0	48	14	2
D	1	1	1	0	6	5	2
	3	3	1	0	8	5	2
	5	7	2	0	18	6	3
E	1	2	1	1	16	1	1
	3	4	1	1	47	8	1
	5	9	4	1	67	18	2
Total	1	8	3	1	60	17	7
	3	17	5	1	163	39	8
	5	37	15	3	244	68	12
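
The filtering that produces the counts in Table 1 can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the function name `call_mutations`, the dictionary layout, and the choice to count a mutation as "occurring" in a lineage only when it reaches frequency τ there are all assumptions made for this example.

```python
from collections import Counter


def call_mutations(lineages, ancestral, tau, psi):
    """Filter variants by frequency threshold tau and lineage-sharing limit psi.

    lineages  : dict mapping lineage name -> {mutation: frequency}
    ancestral : set of mutations already present in the ancestral population
    tau       : minimum frequency for a mutation to be called (e.g. 0.005)
    psi       : maximum number of lineages a mutation may occur in (1..5)

    All names and the data layout are hypothetical, chosen for illustration.
    """
    # Step 1: per lineage, keep de novo variants at or above the threshold.
    above = {
        name: {m for m, freq in variants.items()
               if freq >= tau and m not in ancestral}
        for name, variants in lineages.items()
    }
    # Step 2: count in how many lineages each surviving variant was called.
    shared = Counter(m for called in above.values() for m in called)
    # Step 3: drop variants called in more than psi lineages.
    return {
        name: sorted(m for m in called if shared[m] <= psi)
        for name, called in above.items()
    }
```

With ψ = 1 only mutations unique to one lineage survive, and raising ψ admits mutations shared between lineages, which is why the counts in Table 1 grow from the ψ = 1 rows to the ψ = 5 rows.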

Mutations were not distributed randomly along the genome (S1 Fig): Kolmogorov-Smirnov tests against a uniform distribution showed that mutation position is clustered, for both the bacmid region and the natural viral genome (Table A in S1 File). These analyses were performed for the lowest mutation frequency threshold (τ = 0.5%) to ensure sufficient mutations for a meaningful analysis. For the bacmid region and the high stringency condition for mutation calling (ψ = 1), the significance was only marginal (P-value = 0.062), whereas the results were significant to highly significant for all other tests. We expected mutations to be clustered, as the accuracy of the replication machinery is sequence dependent and consequently mutational hotspots have been described for many viruses [10].
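
A one-sample Kolmogorov-Smirnov test of mutation positions against a uniform distribution can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis: the function name `ks_uniform` is hypothetical, positions are treated as continuous values on a region of the given length, and the p-value uses the asymptotic Kolmogorov series, which is only approximate for the small numbers of mutations reported here.

```python
import math


def ks_uniform(positions, length):
    """One-sample KS test of positions against Uniform(0, length).

    Returns (D, p), where D is the KS statistic and p an asymptotic
    p-value. Hypothetical helper written for illustration only.
    """
    if not positions:
        raise ValueError("need at least one position")
    # Under the null hypothesis the scaled positions are Uniform(0, 1),
    # so the theoretical CDF at x is simply x.
    xs = sorted(p / length for p in positions)
    n = len(xs)
    # Largest gap between the empirical CDF (just before and just after
    # each observed point) and the uniform CDF.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    # Asymptotic Kolmogorov distribution:
    # p = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 n D^2)
    lam2 = n * d * d
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam2)
                 for k in range(1, 1001))
    p = min(1.0, max(0.0, 2.0 * series))
    return d, p
```

Evenly spread positions give a small D and a p-value near 1, whereas positions packed into one end of the region give a D close to 1 and a small p-value, which is the signature of clustering described above.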

Low mutation rate for AcMNPV with bacmid and whole-genome mutation data

To make robust inferences on the mutation rate (μ) from these experimental data, we developed a population genetic model. Briefly, we generated a stochastic model simulating neutral evolution in a virus genome, modelled as mutation rate per base per strand copying (s/n/r). We fitted this model to the experimental data by considering the number of bases with a frequency of mutations above the threshold τ, using a maximum likelihood approach. We estimated the viral mutation rate to be μ ~ 1 × 10−7 s/n/r, when filtering mutations that were detected in multiple evolved populations (ψ = 1). Whilst the threshold for mutation detection τ did not have a strong effect on mutation rate estimates, allowing mutations that occurred in multiple populations (ψ > 1) led to higher mutation rate estimates (Fig 2). Estimates for the neutral bacmid region and the whole genome data gave similar results (Fig 2), although analyses of the rates at which different types of mutations accumulated suggest that mutations in the bacmid insert are indeed neutral. Finally, we explored how changes in viral demography, specifically the size of the population bottleneck and the final size of the population in each host, affect accumulation of neutral mutations, illustrating the importance of demography for mutation accumulation and hereby highlighting the importance of using a model for mutation rate estimation. Below we describe these results in more detail.
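
The idea of simulating neutral mutation accumulation and counting sites above the detection threshold τ can be illustrated with a deliberately simplified toy model. This is not the authors' model, whose details are in the Materials and Methods: the function name, the deterministic within-host growth step, the fixed number of generations per passage and the omission of the replication mode are all assumptions made to keep the sketch short.

```python
import random


def count_detectable_sites(mu, n_sites, passages, bottleneck,
                           generations, tau, rng):
    """Toy serial-passage simulation of neutral mutation accumulation.

    Each site is tracked independently as a mutant frequency. Per passage:
      1. a bottleneck samples `bottleneck` founder genomes at random;
      2. during `generations` rounds of copying, a fraction mu of the
         wild-type copies becomes mutant (deterministic approximation).
    Returns the number of sites with final mutant frequency >= tau.
    Hypothetical helper for illustration only.
    """
    detected = 0
    for _ in range(n_sites):
        freq = 0.0
        for _ in range(passages):
            # Bottleneck: binomial sampling of founders (skipped when no
            # mutants exist, since the outcome is then certain).
            if freq > 0.0:
                mutants = sum(1 for _ in range(bottleneck)
                              if rng.random() < freq)
                freq = mutants / bottleneck
            # Within-host growth with recurrent neutral mutation.
            for _ in range(generations):
                freq += mu * (1.0 - freq)
        if freq >= tau:
            detected += 1
    return detected


if __name__ == "__main__":
    # Illustrative parameter values only; they are not taken from the paper
    # apart from the neutral-region length and the number of passages.
    n = count_detectable_sites(mu=1e-7, n_sites=11646, passages=5,
                               bottleneck=10, generations=10,
                               tau=0.005, rng=random.Random(1))
    print(n)
```

Fitting would then amount to running such a simulation over a grid of μ values and asking which value makes the observed number of sites above τ most likely. Even this toy version shows why the bottleneck matters: a rare neutral mutation becomes detectable mainly when it happens to be among the few founders of the next infection.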

Fig 2

Mutation rate estimates (s/n/r, mutations per site per strand copying) derived with the model are given based on the neutral bacmid region and the whole genome.

We estimated mutation rates for different values of the threshold frequency for detecting mutations (τ), noted as percentages here, and for different values of the maximum number of lineages in which a mutation could occur before being excluded from the analysis (ψ). Error bars represent the 95% confidence interval, as determined by bootstrapping. Note that for the neutral region and τ = 2.0%, the lower fiducial limit extends to zero due to the low number of mutations detected.

The simulation model was run for a range of model parameter values for mutation rate, viral replication mode (ρ, with values of 1, 3 and 10 used), threshold values for mutation detection (τ values of 0.5%, 1% and 2%), the maximum number of evolved populations in which mutations could occur (ψ values of 1, 3 and 5), and lengths corresponding to the neutral bacmid region only and the whole virus genome (see Materials and Methods section for details and a full explanation of all parameters). Strikingly, our mutation rate estimates largely were robust relative to the replication mode ρ value and to the choice of experimental data used (neutral region or whole genome), with estimates of approximately μ = 1×10−7 s/n/r in all scenarios when ψ = 1 (Figs 2 and S3). When we included mutations that occurred in multiple populations (ψ > 1), in the most extreme case (τ = 0.5, ψ = 5 and mutations in the bacmid region only) the estimated mutation rate was μ = 5×10−7 s/n/r. As we think the repeated mutations are most likely sequencing errors, we think the estimate μ = 1×10−7 s/n/r is the most valid. We also estimated the mutation rate with established models for sequencing data from clones [9], adapting these methods to use the frequency of mutations determined by high-throughput sequencing data instead of Sanger sequences from clones (see Materials and Methods). 
These estimates were roughly similar to those obtained with the first approach, although they tended to be about a factor 2.5 smaller (μ ≤ 4 ×10−8 s/n/r when ψ = 1, S4 Fig). This difference was expected, as this alternative approach does not take into consideration the effects of the mutation detection threshold τ, but rather assumes all mutations will be detected irrespective of their frequency. As the limited sensitivity of the deep-sequencing data is not taken into account, this approach likely will underestimate the mutation rate. Mutation rate estimates with established models also were higher if less stringent criteria for mutation calling were used, approaching 1 × 10−7 when ψ = 5. One of the purported strengths of the approach used here is the large region (11.6 kb) in which point mutations will not affect fitness, given there are no known viral genes or regulatory sequences (Fig 1). Although large genomic deletions in this region would presumably be beneficial as they could speed up viral replication [29], none were detected: Sequence coverage was similar to the rest of the genome for the evolved populations (S1 and S2 Figs), ruling out the occurrence of large deletions at high frequencies. To test if point mutations in this region were indeed neutral, we considered the rate of intergenic mutations (dI), normalized by the rate of synonymous substitutions in viral genes (dS) (see Materials and Methods). If mutations in the inserted region are neutral, we expect dI/dS ~ 1. For the different values of τ and ψ used, dI/dS estimates were indeed approximately 1 for the bacmid region, ranging from 0.54 to 1.75 (Fig 3). For all conditions for which we could perform a formal test, dI/dS was not significantly different from 1, lending further support to the idea that mutations in the bacmid region are neutral (Fig 3). By contrast, when the same analysis was performed for the rate of nonsynonymous mutations (dN) in viral genes, all dN/dS values were ≤ 0.34 (Fig 3). 
For all conditions under which we could perform a formal test, dN/dS was significantly lower than 1 (Fig 3). This underrepresentation of nonsynonymous mutations in the viral genome presumably occurs because most nonsynonymous mutations will be deleterious [13, 14], and therefore are removed by purifying selection. Despite evidence for purifying selection acting on viral genes, mutation rate estimates were similar for the neutral region and the whole genome (Fig 2). Finally, we considered the normalized rate of intergenic mutations (dI/dS) for the authentic viral genome, and found a broad range of values ranging from 0 to 9.26. For the lowest value used for the threshold value for mutation detection (τ = 0.5%), intergenic mutations in the virus genome were overrepresented; for the highest value (τ = 2.0%), there were few or no intergenic mutations (Fig 3). We therefore cannot draw any conclusions from these data, but note that almost all of the mutations found are associated with homologous regions (hrs). These repetitive elements will likely lead to sequencing and read-mapping errors, but their sequences are highly variable [30] and hence they could also be mutation hotspots.
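
The normalized rates discussed above have a simple general form: mutations per available site in one class, divided by synonymous mutations per synonymous site. The sketch below shows that ratio only. It is a hedged illustration, since the exact procedure (including how synonymous sites are counted) is described in the Materials and Methods and is not reproduced here; the function name and arguments are hypothetical.

```python
def normalized_rate(n_class, sites_class, n_syn, sites_syn):
    """Per-site rate of one mutation class relative to the synonymous rate.

    n_class, sites_class : mutations observed and sites available in the
                           class of interest (intergenic or nonsynonymous)
    n_syn, sites_syn     : synonymous mutations and synonymous sites

    Returns None when no synonymous mutations were observed, because the
    ratio is then undefined. Hypothetical helper for illustration.
    """
    if sites_class <= 0 or sites_syn <= 0:
        raise ValueError("site counts must be positive")
    if n_syn == 0:
        return None
    return (n_class / sites_class) / (n_syn / sites_syn)
```

A value near 1 means the class accumulates mutations at the same per-site rate as synonymous changes, as expected under neutrality, whereas a value well below 1 points to purifying selection. The `None` case corresponds to the conditions in Fig 3 for which no estimate or test could be reported because a mutation class had zero counts.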

Fig 3

Trends for the rates of intergenic (dI) or non-synonymous (dN) mutations are reported, all normalized by the rate of synonymous mutations in native viral genes (dS), for the different values used for the mutation threshold value (τ) and the maximum number of lineages in which a mutation can occur before being excluded from the analysis (ψ).

Error bars indicate the 95% confidence interval of the estimate as determined by bootstrapping, and results of a one-sample t-test on log10-transformed dI/dS or dN/dS values are indicated by ns (non-significant, p > 0.05), * (p < 0.05), ** (p < 0.01) and *** (p < 0.001). Note that for many conditions the confidence interval could not be determined, or the test could not be performed, as one or more values for a mutation class were zero, in which case no test results are indicated. The bars on the left labelled “dI/dS (Bacmid insert)” indicate the rate of intergenic mutations in the artificial neutral regions. These mutations occur with a normalized rate close to 1, indicating neutral evolution. The central bars labelled “dN/dS (Virus genome)” indicate the rate of non-synonymous mutations. These mutations were under-represented compared to synonymous mutations, indicating purifying selection. Finally, the columns on the right labelled “dI/dS (Virus genome)” indicate the rate of intergenic mutations in the viral genome, with bars extending to a value of 0.01 indicative of a value of zero. These results were inconclusive, as dI/dS was strongly dependent on the threshold for mutation detection chosen.

The accumulation of mutations in a virus population is affected by the viral replication mode (ρ) [9, 10, 31–33], and the distribution of mutation frequencies is linked to the mode of replication [33]. Here we considered this effect by fitting the genome evolution model with values ρ = 1 (the “geometric growth” scenario), ρ = 3 (mixed replication scenario) and ρ = 10 (“stamping machine” scenario). We found that viral mode of replication had a consistent but small effect on the estimated mutation rate (see S1 Text), with higher values of ρ corresponding to higher mutation rate estimates, as expected. The effect on model fit was minimal (S3 Fig), and we therefore cannot make inferences on the mode of replication from these data. This result is not surprising, given that the number of mutations we detected was small and that the model fitting only considers the number of sites with a mutation frequency above τ, and not the frequency of individual mutations. For baculoviruses, the mode of replication has not been described formally, to the best of our knowledge. However, as these viruses probably employ rolling circle amplification [34], replication is likely to follow the “stamping machine” scenario and to be described best by high values of ρ. We considered the importance of two viral demographic parameters, the sizes of the founding (λ) and final population (κ) in an insect, and found that both parameters also had an effect on the accumulation of detectable mutations predicted by our model (S1 Text). This analysis showed a non-monotonic relationship between the size of λ and accumulation of detectable mutations, with the highest numbers of mutations being detected at intermediate values of λ (S5 Fig). Large values of λ allow more mutations to be maintained in the virus population, but also prevent neutral mutations from reaching frequencies at which they can be detected (S6 Fig). This unintuitive result emphasizes the importance of considering demography when estimating viral mutation rates.

AcMNPV mutation rate estimate is congruent with estimates for other viruses

Mutation rates (s/n/r) have been estimated for a number of viruses, allowing for a comparison with our baculovirus estimate. For this comparison, we included a collection of mutation rate data [9, 35], with updated mutation rates for the RNA viruses influenza A virus [12] and poliovirus [33] due to the availability of better estimates. When multiple mutation rates were available for one virus, we used only the most recent estimate because methodological advances make estimates that are more recent more reliable. Our estimate clearly is congruent with mutation rate estimates for other dsDNA viruses, as it is close to the predicted relationship between genome size and mutation rate (Fig 4). As AcMNPV has a relatively large genome, it is also one of the lowest estimates of mutation rate in DNA viruses reported in the literature, similar to that of Escherichia virus λ, and with only Enterobacteria phage T2 (170 kbp) being lower.

Fig 4

An overview of known mutation rate estimates (s/n/r, ordinate) for viruses with different genome sizes (abscissa) is given.

The color and shapes of symbols indicate the Baltimore classification group to which each virus belongs, as indicated by the legend in the top right of the figure. The solid black line marks the regression line, with dotted lines marking the 95% confidence interval. The slope of the fitted relationship is significantly lower than zero (t = -3.243, p = 0.012), and the coefficient of determination (r²) is 0.568. Our mutation rate estimate for AcMNPV is in good agreement with the fitted relationship between genome size and mutation rate. Other virus names in the figure are Enterobacteria phage T2 (T2), Escherichia virus λ (λ), Escherichia virus ΦX174 (phiX174), Influenza A virus (IAH), Measles virus (MV), Poliovirus (PV), Pseudomonas virus Φ6 (phi6), Turnip mosaic virus (TuMV), and Vesicular stomatitis virus (VSV).

Besides being consistent with trends in other viruses, our estimate of mutation rate for AcMNPV is congruent with what is known about its polymerase. AcMNPV codes for a DNA-dependent DNA polymerase (DNApol) that belongs to the family B DNA polymerases and contains the exonuclease domain, thought to be responsible for proofreading and editing of mismatches [16, 36, 37]. The 3’ to 5’ exonuclease activity of this domain, essential for repairing errors, has been confirmed [38], and hence a low mutation rate is expected for AcMNPV. The other viruses with low mutation rates are both dsDNA bacteriophages with relatively large genomes (Fig 4). Proofreading activity also has been demonstrated in the case of T2 [39].

Analysis of the mutational spectrum: A low transition to transversion ratio?

We considered whether our data shed light on AcMNPV’s mutation spectrum, the occurrence of the different kinds of single-nucleotide mutations, focusing on the transition to transversion ratio (Table 2; see also Tables B-D in S1 File). For the neutral region, the number of observed mutations is too small to be informative, even for the lowest mutation detection threshold of τ = 0.5%. We also considered mutation bias for the whole genome with different threshold values for mutation frequency (τ) and the maximum number of lineages in which a mutation could occur (ψ). Whilst ψ did not have a major effect on the transition to transversion ratio, this ratio depended strongly on τ (Table 2). For low values of τ the transition to transversion ratio was less than 1. For high values of τ, the transition to transversion ratio becomes greater than one, but the number of mutations included in the analysis becomes very small and the confidence interval for estimates becomes very large (Table 2). Moreover, for analysis outside of the neutral bacmid region selection may bias the mutations detected. Due to the small number of mutations detected in most conditions considered and the possible effects of selection on the whole-genome data, we therefore cannot draw any firm conclusions on AcMNPV’s mutational spectrum. Analysis of a larger number of neutral mutational events will be necessary to draw conclusions on mutation spectrum. These larger numbers could be achieved by analyzing a larger number of replicate populations or populations evolved over a longer period of time, although improved sequencing methods with much lower error rates [40] may suffice to analyze mutation bias for the evolved populations described here.

Table 2

Overview of the mutation spectrum, for different sets of mutations called.

Mutations were called with different values of the threshold frequency for detecting mutations (τ), noted as percentages here, and for different values of the maximum number of lineages in which a mutation could occur before being excluded from the analysis (ψ). The number of transitions (transit.) and transversions (transver.) observed is noted, and below the transition to transversion ratio and its confidence interval (CI) are given.

		Relative frequency of mutation
		Whole genome			Bacmid only
ψ	Observation or index	τ = 0.5	τ = 1.0	τ = 2.0	τ = 0.5
1	Transit. / Transver.	27/41	9/11	6/2	3/5
	Ratio [95% CI]	0.659 [0.976–3.975]	0.818 [0.460–3.329]	3.000 [0.536–30.25]	0.600 [0.093–3.082]
3	Transit. / Transver.	78/102	13/31	6/3	7/10
	Ratio [95% CI]	0.765 [0.563–1.037]	0.419 [0.202–0.825]	2.000 [0.427–12.33]	0.700 [0.225–2.040]
5	Transit. / Transver.	129/152	29/54	10/5	19/18
	Ratio [95% CI]	0.849 [0.667–1.079]	0.537 [0.330–0.859]	2.000 [0.621–7.475]	1.056 [0.524–2.135]
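
The transition to transversion ratios in Table 2 follow from classifying each single-nucleotide change by whether it stays within the purines (A, G) or pyrimidines (C, T). A minimal sketch of that classification is given below; the function names are hypothetical, and the confidence intervals in Table 2 are not reproduced because the method used to obtain them is not described in this section.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify(ref, alt):
    """Label a single-nucleotide change as 'transition' or 'transversion'."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a valid substitution: {ref}>{alt}")
    # A transition keeps the base within its chemical class.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def ts_tv_ratio(mutations):
    """Transition/transversion ratio for a list of (ref, alt) pairs.

    Returns None when there are no transversions (ratio undefined).
    """
    ts = sum(1 for ref, alt in mutations
             if classify(ref, alt) == "transition")
    tv = len(mutations) - ts
    return ts / tv if tv else None
```

As a check against Table 2, 27 transitions and 41 transversions (whole genome, ψ = 1, τ = 0.5) give 27/41 ≈ 0.659, and 9 transitions with 11 transversions give 9/11 ≈ 0.818.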

Concluding remarks

Our low mutation rate estimate is congruent with the large genome size of AcMNPV and its known proofreading activity [39]. By contrast, the high genetic variation often observed within alphabaculovirus populations [21-24] remains a conundrum. As baculovirus populations are subject to narrow bottlenecks at the start of infection [23, 41, 42], standing genetic variation will be rapidly lost [43] and a mechanism is required that introduces or maintains genetic variation. In one case, frequency-dependent selection has been observed in a baculovirus population and may account for stable polymorphisms [44, 45]. Whether intricate relationships between genetic variants that complement each other are common or evolutionarily stable remains to be seen, but such relationships would help to explain the genetic diversity often seen within alphabaculovirus populations. Recently, a novel strain of Chrysodeixis includens nucleopolyhedrovirus was described, which generates high levels of genetic variation when it is at low frequency in mixed infection [46]. If such strains with high mutation rates occur in other baculoviruses, their existence may help explain the high genetic variation often observed in baculovirus populations, even if they are rare genotypes. Our experimental setup uses a well-defined bacmid-derived virus population, and our mutation estimates will therefore be representative of a standard virus and not a mutator variant. Recombination could also help explain the high levels of genetic variation in natural baculovirus populations, as it could generate new haplotypes and enable mutations generated by mutators to spread widely through the population. Experimental work strongly suggests that baculoviruses have a high recombination rate [47]. 
We estimated the mutation rate for the alphabaculovirus AcMNPV, using an approach that depends on the insertion of a large artificial neutral region in the viral genome, and which starts with a single genotype and exploits the genome stability of Group I dsDNA viruses. Such an approach requires mutations in this ‘artificial’ region to be neutral, and dI/dS results suggest this assumption is met. We developed an approach to analyzing regular Illumina sequencing data that were not gathered specifically with the intent of determining mutation rates. Our approach relies on a detailed analysis of the sequencing data to eliminate obvious sequencing biases and a comparison of sequencing data to simulation-model predictions. The use of models allows us to take into consideration the effects of viral demography on mutation rate estimates, and limit the impact of choosing thresholds for mutation detection, as the same threshold is applied in the model. Others have employed high fidelity approaches such as duplex sequencing to reduce sequencing error associated with high-throughput sequencing [40] for estimating mutation rates of DNA viruses and finding mutation-prone sequence motifs and mutational hotspots [27]. High throughput sequencing has also enabled the characterization of genetic diversity within DNA virus populations and its functional implications [48, 49]. Long-read sequencing has made it possible to study the evolutionary dynamics of adaptation by point mutations and gene-copy-number variation in poxviruses [50]. In the future, combining this sequencing technology with large neutral regions could make it viable to extend work on rates and the distribution along the genome of mutational events beyond point mutations to include structural mutations, such as large indels or copy number variation [50, 51]. We estimated the mutation rate per base per strand copying (s/n/r), as has been done in many other previous studies [9, 10]. 
This metric is convenient, as we could make empirically supported estimates of the initial and final viral population sizes in infected larvae (see Materials and Methods). By contrast, the mode of replication for baculoviruses has not been quantified, and we therefore considered the effect of this parameter, but did not find strong effects on model fit or the mutation rate. An alternative approach would have been to calculate the mutation rate per base per cell infection (s/n/c) [9, 10], in which case we would be missing many details of cellular infection dynamics, particularly in the early stages of infection. Some mutations were detected in multiple evolved populations, whilst not being detected in the ancestral bacmid. Mutations that were detected in the ancestral bacmid simply could be excluded from analyses because they did not occur de novo during the experiment, or alternatively were indicative of sequencing or read-mapping errors. For the repeated mutations that were not detected in the bacmid, the most parsimonious explanation is that they are sequencing or read-mapping errors. Even with strong selection and a large effective population size, it is unlikely that multiple mutations will be presented in all evolved populations [51, 52]. Moreover, our dI/dS and dN/dS results suggest neutral molecular evolution predominates the bacmid region, and purifying selection predominates in the viral genome. Hence, these changes would need to be driven exclusively by mutation bias, an explanation that we find unlikely. However, since we cannot categorically rule out this explanation, for all our data analysis we considered the implications of including repeated mutations. In the most extreme case this lead to a mutation rate estimate of μ = 5×10−7 s/n/r, half an order of magnitude higher than what we consider the best estimate of μ = 1×10−7 s/n/r when repeated mutations are excluded (see Fig 3). 
Although the criteria for including mutations affect the mutation rate estimate, even when we apply less stringent conditions this estimate will still be relatively low. We chose to perform our experiments in S. exigua, primarily because in our experience we could perform experiments in this particular population of hosts, including reconstitution of the virus from the infectious clone, without activation of latent viruses. Although we used the bacmid region to estimate mutation rates, evolutionary dynamics in the natural viral genome might still have an effect on the observed mutations in the bacmid region. For example, if neutral mutations in the bacmid region hitchhike to high frequencies with a beneficial mutation in the viral genome, this could affect the final distribution of mutation frequencies observed in the bacmid. Although S. exigua is sometimes considered a semi-permissive host for AcMNPV, the particular S. exigua colony used for our experiments is highly susceptible to AcMNPV (e.g., the infectivity of OBs to early instar larvae is similar to that in Trichoplusia ni, a permissive host [41]), suggesting the scope for adaptation may be limited. Indeed, we found few mutations in the viral genome for the evolved lineages, and an analysis of the rate of nonsynonymous and synonymous substitutions suggests purifying selection predominated. These observations make it unlikely that positive selection on beneficial mutations in the viral genome distorted the observed distribution of mutations in the bacmid region or affected mutation rate estimates.

Materials and methods

Experimental evolution of AcMNPV

pBac-E2 (BAC) [28] was evolved experimentally in S. exigua larvae in five replicate lineages (A, B, C, D and E). To reconstitute the virus from the bacmid, the haemocoel of S. exigua L4 was injected with a total volume of 20 μl (i.e., 2 × 10 μl), containing a 4:1:1 mixture of Lipofectin transfection reagent (ThermoFisher Scientific), water and BAC DNA (~15 μg DNA per larva). Upon larval death, OBs were harvested from cadavers. From a single infected larva, OBs were isolated and diluted to 2 × 10⁷ OBs/ml. Serial passage of BAC was performed five times for each replicate lineage, with five larvae used for each replicate lineage. For each passage, newly molted L2 were starved for 12 h and then inoculated by droplet feeding with an OB suspension exceeding 10 × LC₉₉ (≥ 2 × 10⁷ OBs/ml), to avoid narrow transmission bottlenecks. Per replicate lineage, five inoculated larvae were transferred to 6-well tissue culture plates with artificial diet plugs. Larvae were incubated at 26°C and with a 14 h:10 h day-night photoperiod. Upon death, larval cadavers were collected, pooled and used to inoculate the next passage. After five passages, lineages A–E were amplified using 100 S. exigua L3 exposed to a high concentration of OBs (3 × 10⁹ OBs/ml) by droplet feeding, and 1.5 × 10⁹ OBs were used to extract viral genomic DNA. Briefly, OBs were dissolved with DAS buffer (0.1 M Na₂CO₃, 0.15 M NaCl, 10 mM EDTA, pH 11), and DNA was then extracted from the liberated occlusion-derived virus particles using a DNA isolation kit (Omega Bio-tek) following the manufacturer’s instructions. For the BAC, DNA was extracted from 50 ml LB from an overnight culture using a plasmid midi kit (Qiagen). Successful genomic DNA extraction was confirmed by PCR with primers gp41 inner F (5’-CAAGAGCAAAGAACCGACG-3’) and inner R (5’-TTATGCAGTGCGCCCTTTCGT-3’), and contamination with SeMNPV was ruled out by PCR with primers Se F (5’-GACGACGAATTATGTTGTGACCGAC-3’) and R (5’-AGATGGATGGAAAGGCAACGCT-3’).
Purified DNA (~1.5 μg) from the evolved AcMNPV lineages A, B, C, D and E, as well as the ancestral BAC, was used for library preparation with the NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs), followed by Illumina HiSeq paired-end 150 (PE150) sequencing (Beijing Novogene Bioinformatics Technology Co., Ltd). The raw sequencing data are available in the Sequence Read Archive under accession PRJNA798700 (https://www.ncbi.nlm.nih.gov/sra/PRJNA798700).

Mutation calling and filtering

Because the number of viral reads was not equal across samples, fastq files were subsampled to ensure an approximately equal mean coverage across the reference genome for each isolate using seqtk sample [53]. NGS data were analyzed using CLC Genomics Workbench 20.0 [54]. Reads were trimmed (quality limit = 0.05) and mapped to a reference genome. The reference genome is based on the sequence of the E2 variant [55], the details of the original bacmid construction and donor vectors [28], and limited Sanger sequencing to bridge small gaps (S2 File). Mutations were called using the “low frequency variant detection tool” (minimum frequency = 0.5%). An overview of parameter settings is included (S1 Data). Additional filtering criteria were: forward-reverse balance > 0.05, read count > 10, and the type of mutation is "SNV" (single nucleotide variant). Moreover, positions with extreme coverage values were excluded. To this end, we ranked the coverage value per position for each lineage and excluded the upper and lower 1%. Analyses were done with three thresholds τ for mutation frequency: 0.5%, 1% and 2% (S2–S4 Data, respectively), and three threshold values ψ for the number of lineages in which the exact same mutation could occur before it was filtered: 1, 3 and 5. Finally, mutations were tallied per isolate, both across the whole genome and for the neutral bacmid insert (S3 File).
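
The filtering steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' pipeline (which used CLC Genomics Workbench and the notebooks in the supporting data): the record keys (`pos`, `alt`, `type`, `freq`, `balance`, `count`) and the function name are hypothetical, the frequency threshold is treated as inclusive, and the exclusion of positions with extreme coverage is left out.

```python
from collections import Counter

def filter_calls(calls_by_lineage, tau=0.005, psi=1,
                 min_balance=0.05, min_reads=10):
    """Filter per-lineage variant calls (hypothetical record layout).

    calls_by_lineage maps a lineage name to a list of dicts with the keys
    pos, alt, type, freq, balance and count.
    """
    kept = {}
    for lineage, calls in calls_by_lineage.items():
        kept[lineage] = [
            c for c in calls
            if c["type"] == "SNV"            # single nucleotide variants only
            and c["freq"] >= tau             # frequency threshold tau
            and c["balance"] > min_balance   # forward-reverse balance
            and c["count"] > min_reads       # supporting read count
        ]
    # Count in how many lineages each exact mutation (position + base) occurs.
    seen = Counter(
        key
        for calls in kept.values()
        for key in {(c["pos"], c["alt"]) for c in calls}
    )
    # Keep only mutations found in at most psi lineages.
    return {
        lineage: [c for c in calls if seen[(c["pos"], c["alt"])] <= psi]
        for lineage, calls in kept.items()
    }
```

With ψ = 1 only mutations unique to one lineage survive, which corresponds to the most stringent condition described in the text; raising ψ to 5 accepts mutations shared by all five lineages.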

Mutation model and mutation rate estimation

We generated a stochastic model that predicts the distribution of mutation frequencies per base in an evolving virus genome, and then fitted this model to our empirical data with a maximum likelihood approach to obtain mutation rate estimates. The model was implemented in R 4.0.3 [56] and all code is available (S1 and S2 Code). We model the genome region under consideration as a vector with g elements for each nucleotide position, with each element representing the total frequency f of mutated bases at position i. For simplicity, we do not consider the identity of the mutated bases and we do not allow for reversions, as we are considering scenarios in which mutations are rare and the probability of a reversion occurring and reaching high frequency is very low. We assume that all mutations are strictly neutral, and that all changes in mutation frequency result from the occurrence of de novo mutations or neutral processes like stochastic changes in allele frequencies due to population bottlenecks (i.e. genetic drift). Parameter values, additional explanation and justification are provided for the fitted (Table 3) and fixed (Table 4) model parameters.

Table 3

Fitted parameters for the models for estimating mutation rates.

Parameter	Value	Explanation
ρ	1, 3, 10	Parameter that describes the viral mode of replication, with 1 being equivalent to “geometric growth” by fission and large values (i.e., 10) representing “stamping machine” replication kinetics.
log₁₀μ	-10, -9.9, -9.8, [. . .], -5	Mutation rate (s/n/r)

Table 4

Fixed parameters for the models for estimating mutation rates.

Parameter	Value	Explanation
g	11,646	The length of the neutral bacmid region in bases.
	145,465	The length of the full genome, including the neutral bacmid region.
λ	46	The bottleneck size for the viral founding population in each insect larva. Following [41], the bottleneck size is related to the host survival (S) for AcMNPV infection of S. exigua: λ = −ln(S). For each insect exposed to a 10 × LD₉₉ dose: λ = 10 × −ln(1−0.99) ≈ 46. As is typical for multicellular host/virus pathosystems, a virus inoculum with a large number of horizontal transmission stages is used, but the ensuing bottleneck is still narrow [23, 42].
ζ	3.71	Single parameter for the zero-truncated Poisson distribution of nucleocapsids per occlusion derived virus (ODV) [57]. Note that the corresponding mean of the distribution is 3.80 nucleocapsids per ODV.
κ	5.05×10⁸	The final size of the virus population within a single L2. AcMNPV generates OBs with multiple ODVs, and ODVs with multiple nucleocapsids, each containing a single copy of the genome. The mean OB yield per larva during the experiment was 1.33×10⁶, based on the OB concentrations measured for each pool of five insects made after each round of passaging, assuming 100 ODVs per OB (we are not aware of any empirical estimates) and a mean of 3.8 nucleocapsids per ODV [57].
σ	1001	Parameter value minus one indicates the maximum number of mutations per genome allowed in the simulations.
φ	10⁴	Threshold value of population growth for switching from stochastic to deterministic mutation.
τ	0.5%, 1%, 2%	Minimum frequency for the detection of mutations, indicated as a percentage.
ψ	1, 2, […], 5	The threshold value for the number of evolved lineages in which a mutation can occur for the empirical data, i.e., the most stringent condition is ψ = 1, as only mutations that occur in one lineage are accepted. When ψ = 5, mutations that occur in all 5 lineages are accepted.
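
The derived values in Table 4 can be checked with a few lines of arithmetic. The short Python sketch below simply recomputes λ, the mean of the zero-truncated Poisson distribution with parameter ζ, and κ from the numbers quoted in the table; the variable names are illustrative, and the snippet is not part of the authors' R code.

```python
import math

# Bottleneck size: lambda = -ln(S), with S = 1 - 0.99, scaled by the 10-fold dose.
lam = 10 * -math.log(1 - 0.99)

# Mean of a zero-truncated Poisson distribution with parameter zeta.
zeta = 3.71
mean_nucleocapsids_per_odv = zeta / (1 - math.exp(-zeta))

# Final population size: OBs per larva x ODVs per OB x nucleocapsids per ODV.
kappa = 1.33e6 * 100 * 3.8

print(f"lambda = {lam:.2f}")
print(f"mean nucleocapsids per ODV = {mean_nucleocapsids_per_odv:.2f}")
print(f"kappa = {kappa:.3g}")
```

The three results agree with the tabulated values of roughly 46, 3.80 and 5.05×10⁸.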

At the start of the infection of an individual host, there is a bottleneck with λ virus genomes initiating infection. We first draw the number of occlusion derived viruses (ODVs) that infect the host, allowing it to follow a zero-truncated Poisson distribution with a mean of λ/ζ, where ζ is the number of nucleocapsids (each containing one genome copy) per ODV. We obtained an estimate of λ = 46 by considering the relationship between host mortality and the number of viral founders [41] (see Table 4). We use a zero-truncated distribution to avoid having uninfected hosts, but this approximation does not affect our results as the dose used in experiments was high (10 × LD₉₉ dose) and virtually no hosts remain uninfected. Next, for each infecting ODV, we draw the number of nucleocapsids contained from a zero-truncated Poisson distribution with parameter ζ, as the multiple nucleopolyhedroviruses have multiple nucleocapsids present in each ODV. Our model therefore incorporates stochasticity in the number of infecting virus particles and their nucleocapsid content. For each position in the genome, we draw the number of genomes containing a mutation at this position following the population bottleneck at the start of infection from a binomial distribution, in which the number of trials is the number of founding genomes and, for the ith position, the probability of success is the frequency f of mutated bases at that position; x denotes the number of mutant genomes added to the population at a particular step in the infection process. The virus population then expands within the host exponentially with a replication factor ρ per cycle of virus replication within the host, such that N = λ(1+ρ)^t, where N is the number of genomes present at a time t, measured in generations of viral replication within the host. It is unknown what the mode of replication [31, 32] is for a baculovirus.
We therefore used values of 1 (“geometric” or “symmetric” replication with a doubling of the number of copies per cycle: one original genome copy and one replicated copy), 3 (“mixed” replication: one original genome copy and three replicated copies) and 10 (“stamping machine” or “asymmetric” replication) for ρ. Replication proceeds until the carrying capacity κ of a host is reached, with an expansion to exactly κ virus genomes allowed in the final round of replication. During each round of replication, the number of new mutants that occur at each position follows a binomial distribution, such that the mutation rate μ is the probability of success and η, the number of genomes generated during that round of replication which are not mutated at this nucleotide position, is the number of events: X ~ Binomial(η, μ). However, to make the model computationally tractable for large population sizes, we switched to deterministic mutation (i.e., X = ημ) once a large number of virus genomes was being generated relative to the mutation rate (Nφ > μ⁻¹), where φ is a constant. Note that all mutations are assumed to be neutral and that the bottleneck at the start of infection is narrow (λ = 46). Mutations occurring late in infection when the viral census population size is large therefore will rarely be sampled during the bottleneck events at the start of the next round of infection (i.e. serial transfer). To model the infection of five host larvae per replicate, we simulated five separate infections and pooled the viral progeny by taking the mean mutation frequency per site over the five larvae for each position in the genome, and using these f values for the next round of infection. After 5 rounds of infection, we also included the final amplification of the virus in 100 larvae to generate the final population that is compared to the sequencing results.
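
As a rough illustration of the infection cycle described above, the following Python sketch follows the mutant frequency at a single genome position through the bottleneck, within-host growth and pooling over five larvae. It is a simplified sketch and not the authors' R implementation (S1 and S2 Code). Several details are assumptions made here: λ/ζ is used directly as the Poisson parameter for the number of ODVs, already-mutated templates are assumed to yield mutant copies in proportion to their frequency, the switch to deterministic mutation is applied to the number of new unmutated copies η, and all function names are hypothetical.

```python
import math
import random

def poisson(mean, rng):
    """Poisson draw by Knuth's multiplication method (fine for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def zt_poisson(mean, rng):
    """Zero-truncated Poisson draw, obtained by rejecting zeros."""
    while True:
        k = poisson(mean, rng)
        if k > 0:
            return k

def binomial(n, p, rng):
    """Binomial draw as a sum of Bernoulli trials (n stays small here)."""
    return sum(rng.random() < p for _ in range(n))

def infect_host(f, mu, rho, lam=46, zeta=3.71, kappa=5.05e8, phi=1e4, rng=random):
    """Mutant frequency at one site after a single within-host infection."""
    # Bottleneck: number of infecting ODVs, then genomes (nucleocapsids) per ODV.
    n_odv = zt_poisson(lam / zeta, rng)
    founders = sum(zt_poisson(zeta, rng) for _ in range(n_odv))
    mutants = float(binomial(founders, f, rng))
    total = float(founders)
    while total < kappa:
        last = rho * total >= kappa - total          # final round fills up to kappa
        new = (kappa - total) if last else rho * total
        from_mutants = new * (mutants / total)       # copies of mutated templates
        eta = new - from_mutants                     # new copies still unmutated
        if eta * phi > 1 / mu:
            x = eta * mu                             # deterministic mutation
        else:
            x = binomial(round(eta), mu, rng)        # stochastic mutation
        mutants += from_mutants + x
        total = kappa if last else total + new
    return mutants / total

def passage(f, mu, rho, n_larvae=5, rng=random):
    """Pool progeny of several larvae by averaging the mutant frequency."""
    return sum(infect_host(f, mu, rho, rng=rng) for _ in range(n_larvae)) / n_larvae
```

Calling `passage` five times in a row, each time feeding in the frequency returned by the previous call, mimics the five serial passages for one site; the full model tracks all g positions and adds the final amplification step.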
For L3, we assumed the same population bottleneck λ as for L2, as a high OB concentration was used for final amplification, and the number of viral genomes generated was double that for L2 (2κ). To fit the model to the data, we first ran 1000 simulations for each combination of parameter values (i.e., μ, ρ and τ) to generate model predictions. We compared the observed number of experimental replicates (q) with j − 1 mutated nucleotide positions with a frequency f higher than the threshold value τ, to the frequency predicted by the model (β), i.e., q₁ is the number of experimental replicates with 0 nucleotide positions for which f > τ, q₂ is the number of replicates with 1 nucleotide position for which f > τ, etc. The multinomial pseudo-likelihood of any realization is then: L = (n! / ∏ⱼ qⱼ!) × ∏ⱼ βⱼ^qⱼ, with j = 1, …, σ and n = Σⱼ qⱼ the total number of experimental replicates, where σ − 1 = 1000 is the maximum number of mutated bases that were tracked in the simulations. If the number of mutated nucleotide positions (with a frequency f higher than the threshold value τ for detection) exceeded σ − 1 in any simulation, results from that set of parameter values were excluded from further analysis. We fitted the model to 1000 bootstrapped datasets to determine the 95% confidence interval of the parameter estimates.
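
The comparison between observed and simulated mutation counts can be sketched as a multinomial log-likelihood. The Python function below is a minimal illustration, not the authors' code: it assumes that β is estimated as the fraction of simulation runs yielding each count, it indexes categories by the number of mutated positions directly (rather than by j = count + 1), and it returns minus infinity when an observed count was never produced by the simulations, which is a choice made here rather than something stated in the text.

```python
import math
from collections import Counter

def multinomial_log_likelihood(observed_counts, simulated_counts, sigma=1001):
    """Log of the multinomial pseudo-likelihood for one parameter combination.

    observed_counts: number of positions with f > tau in each experimental replicate.
    simulated_counts: the same quantity for each simulation run.
    """
    if max(simulated_counts) > sigma - 1:
        return None  # parameter combination excluded from further analysis
    sims = Counter(simulated_counts)
    q = Counter(observed_counts)
    n = len(observed_counts)
    log_l = math.lgamma(n + 1)  # log(n!)
    for count, q_j in q.items():
        beta_j = sims[count] / len(simulated_counts)
        if beta_j == 0:
            return -math.inf  # observed outcome never seen in the simulations
        log_l += q_j * math.log(beta_j) - math.lgamma(q_j + 1)
    return log_l
```

Evaluating this function over the grid of μ, ρ and τ values and taking the maximum would give the maximum likelihood estimate; repeating the fit on bootstrapped replicates would give the confidence interval.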

Mutation rate estimates with established approaches

We estimated the mutation rate with an established approach, to compare to the estimates made with our simulation-model approach. A canonical approach [9] for calculating mutation rate (s/n/r) is μ = 3m/(Tcα), where m is the number of mutations observed in sequenced clones, T is the mutational target size, c is the number of viral generations (i.e., in terms of strand copying events), and α is a correction for the effects of selection. To obtain m from our deep sequencing data, we sum the frequency of all observed mutations (f) above the threshold for mutation selection τ in all lineages. If these sequencing data are accurate and mutations are neutral, the frequency of each mutation is also the probability that this mutation would be detected in a randomly selected clone by sequencing. This approach is a simple approximation, as here we do not consider the effect of the threshold for mutation detection (τ) on mutation rate estimates. (Lowering τ will lead to a larger number of mutations and consequently a higher mutation rate estimate. To keep this method as simple as possible and free of additional assumptions, we choose not to incorporate any corrections.) Note that because we sum the frequency of all possible mutations over each site, we can drop the three in the numerator. To obtain T, we multiply the length of the neutral region by the number of replicates. To obtain c we estimate the number of generations assuming different values for ρ (i.e., 1, 3 and 10 as for the simulation-based model fitting), such that c = θ·ln(κ/λ)/ln(1+ρ), where θ is the number of passages. Finally, we can drop α because we only consider mutations in the neutral bacmid region. One-thousand bootstrapped datasets were used to obtain fiducial limits for the mutation rate estimates. To make predictions of mutation accumulation (m) using this model (i.e., see the legend of S5 Fig), we use the simplified relationship m = μTc.
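
The simplified estimator μ = m/(Tc) described above is straightforward to compute. The Python sketch below is an illustration only; the function names and the input layout (one list of mutation frequencies per lineage) are assumptions, and the bootstrapping used for the fiducial limits is omitted.

```python
import math

def generations(passages, kappa, lam, rho):
    """Number of strand-copying generations: c = theta * ln(kappa/lambda) / ln(1 + rho)."""
    return passages * math.log(kappa / lam) / math.log(1 + rho)

def classic_mutation_rate(freqs_by_lineage, tau, g, passages, kappa, lam, rho):
    """Estimate mu = m / (T * c) from mutation frequencies in the neutral region.

    freqs_by_lineage: one list of mutation frequencies per replicate lineage.
    """
    # m: summed frequency of all mutations above the detection threshold.
    m = sum(f for freqs in freqs_by_lineage for f in freqs if f > tau)
    # T: length of the neutral region times the number of replicates.
    target = g * len(freqs_by_lineage)
    c = generations(passages, kappa, lam, rho)
    return m / (target * c)
```

For example, with the fixed parameters from Table 4 and ρ = 1, `generations(5, 5.05e8, 46, 1)` gives about 117 generations over five passages; larger values of ρ reduce c and therefore raise the estimate for the same m.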

dI/dS and dN/dS analyses

Estimates of dN/dS (i.e., the normalized rate of nonsynonymous mutations, here made for authentic viral genes) were made using standard methods [58, 59]. As we are not aware of any estimates of the mutation spectrum for insect DNA viruses and our own data suggest these biases may not be very strong (Table 2), we assume no mutation bias is present (i.e., a transition to transversion ratio of 1). For the dI/dS [3], the dS term is the same as for the dN/dS analysis, derived from the results for natural viral genes. The dI term is determined for mutations in the bacmid neutral region, or for the intergenic regions of the natural virus genome. Ninety-five percent confidence intervals of the dI/dS and dN/dS were obtained using 1000 bootstrapped datasets, and data were tested for significance with a one-sample t-test on the dN/dS values calculated for individual samples compared to a test value of 1. However, when τ > 0.5%, the number of intergenic, non-synonymous or synonymous mutations was 0 for one or more samples, and hence these analyses could only be performed for τ = 0.5%. Full results and R code have been made available (S3 and S4 Code and S5 Data).

Supporting information

We show coverage along the genome for each evolved line (A, B, C, D and E) as well as ancestral strain BAC.

Positions of mutations observed at a mutation frequency threshold value (τ) of 0.5% and present only in a single evolved population (ψ = 1) are shown as black dots. Coverage patterns are similar between the different isolates. The peak observed at around 10000 bp for the BAC isolate is due to the presence of empty bacmid vectors in sequencing data and is omitted from mutation calling. (TIF)

Coverage distribution per isolate after subsampling to approximately equal mean coverage.

Isolates have a mean coverage of around 5500. The BAC isolate shows an additional peak at a coverage of around 10000, which is explained by the presence of empty bacmid vectors in sequencing data. (TIF)

We show the negative log likelihood (NLL) for models fitted with different mutation frequency threshold values (τ), given as a percentage.

For simplicity, we show the results when only unique mutations are considered (ψ = 1). Mutation frequency threshold values clearly affect model fit, as they have an effect on the number of mutations that will be detected. By contrast, assumptions on the value for the parameter that determines the mode of virus replication (ρ) had little effect on model fit. This result is not surprising, however, given that our model does not consider the frequency of mutations, but simply the number of bases with a mutation frequency greater than τ. (TIF)

Estimate of mutation rate (s/n/r) using established methods applied to deep sequencing data for the bacmid region (solid bars, samples categorized as “Classic”), as an alternative to our approach using a simulation-based model (hatched bars on the right, samples categorized as “Simulation model”).

Mutation rates were estimated for different values of the viral mode of replication (ρ), different values for the threshold of mutation detection (τ), and different values of the maximum number of lineages in which a mutation could occur before being excluded from the analysis (ψ). Error bars represent the 95% fiducial limits, as determined by bootstrapping. When the lower fiducial limit extends beyond the lower limit of the axis, this indicates a lower fiducial limit of zero. Overall, these estimates were lower than those obtained with the approach employing a simulation model. As baculoviruses most likely employ rolling circle amplification, replication is likely to have a high value of ρ. Therefore, the best estimates with this approach assume the highest value of ρ. Moreover, they will assume the lowest mutation detection threshold (τ), provided all mutations are assumed to be bona fide, as the cumulative frequency of mutations above this value is used to estimate the number of mutations (m) and no correction is made for this threshold. Finally, as in our other analyses, we think the most conservative estimate of mutation rate will exclude all repeated mutations (ψ = 1). These conditions (ρ = 10, τ = 0.5%, ψ = 1) render an estimate of μ = 3 × 10⁻⁸ s/n/r, which is lower than but roughly similar to the estimate from our simulation-based approach (μ ≈ 10⁻⁷). (TIF)

These heatmaps indicate the number of mutations that accumulate after 5 passages in 5 insects, based on the predictions from the simulation model.

For all simulations, we assumed a mutation rate similar to our estimated value for baculoviruses (μ = 10⁻⁷), and kept other model parameters the same as for model fitting (Table 4) unless otherwise indicated. We varied the size of the founding viral population in one insect (λ, x-axis is the log₁₀[λ]) and the final size of the viral population in one insect (κ, y-axis is the log₁₀[κ]), while also varying the threshold value for mutation detection (τ) and mode of virus replication (ρ) over the different panels. The purple cross indicates the point in the parameter space that corresponds to the model parameters assumed in model fitting (λ = 46, κ = 5.05 × 10⁸). There are more detectable mutations when τ is low, when ρ is low, and as the final population size κ increases. Increases in the size of the founding viral population λ initially lead to increasing numbers of detectable mutations, but the number of detectable mutations eventually decreases. For an explanation of this non-monotonic behaviour, see S6 Fig. Finally, note that we can also predict mutation accumulation using the established approach (see Materials and Methods Section) for comparison purposes, which does not take τ into account. The range of model predictions for the number of accumulating mutations (lowest to highest predicted value, based on the extreme values of λ and κ) is then: for ρ = 1, 0.48–2.42 mutations; for ρ = 3, 0.24–1.21 mutations; for ρ = 10, 0.13–0.70 mutations. The simulation model, which better accounts for the effects of demography on mutation accumulation, therefore predicts considerably lower or higher mutation accumulation under some conditions. (TIF)

An illustration is provided of why the size of the founding population (λ) has a non-monotonic effect on the accumulation of detectable mutations.

The simulation model was run for 5 passages in single insect larvae, with a genome size of g = 50,000 base pairs, mutation rate μ = 10⁻⁷ and final population size κ = 3 × 10⁸. We then varied λ, as indicated at the top of each column of panels, with all panels in a column simply representing replicate simulations. We plotted the log₁₀-transformed frequency of mutations at each position (y-axis) at the end of each round of passaging (x-axis), randomly selecting a hue and line type for each position to make them easier to distinguish. Finally, for each panel we noted the number of mutations which were above a frequency of 0.01 (a, with the threshold indicated by a blue line) and mutations above a frequency of 0.0001 (b, with the threshold indicated by a purple line). We assume the a mutations will be detected by sequencing, as a ~ τ, the threshold value for mutation detection used. The b mutations are sometimes maintained in the population over passages, but they need not be detected as they can be below τ. The number of b mutations increases as λ is increased, whereas the number of a mutations only increases initially. Wide bottlenecks will lead to the maintenance of more mutations in the population, but they also limit the stochastic increases in mutation frequency and prevent mutations from reaching the detection threshold. Recall that all mutations are assumed to be strictly neutral, and that all changes in mutation frequency are due to de novo mutations or genetic drift. (TIF)

R script for estimating mutation rates based on the bacmid region data.

(TXT)

R script for estimating mutation rates based on the whole genome data.

(TXT)

Main R script for the dN/dS analysis.

(TXT)

R script for bootstrapping and statistical tests for dN/dS analysis.

(TXT)

PDF file with CLC Genomics Workbench settings.

(PDF)

Notebook with scripts and results for mutation calling (τ = 0.5%).

(HTML)

Notebook with scripts and results for mutation calling (τ = 1%).

(HTML)

Notebook with scripts and results for mutation calling (τ = 2%).

(HTML)

Excel file containing the final results for the dN/dS analysis.

(XLSX)

PDF file with supplementary tables, including Table A (Analysis of the distribution of mutations along the genome), Table B (Relative frequencies of mutations, ψ = 1), Table C (Relative frequencies of mutations, ψ = 3), and Table D (Relative frequencies of mutations, ψ = 5).

(PDF)

ZIP file containing the bacmid sequence and annotation files used in the genome analysis here (sequence as *.fa file, annotation as *.csv and *.gbk files).

(ZIP)

ZIP file containing 36 *.csv files, containing the mutations called for each condition (whole genome vs bacmid region only, threshold for repeated mutations, and threshold for mutation detection) as indicated by the file names.

(ZIP)

PDF file containing Supplementary Text 1 (Relevance of viral demography for mutation rate estimates).

(PDF)

19 Oct 2021

Dear Dr Zwart,

Thank you very much for submitting your Research Article entitled 'Empirical estimates of the mutation rate for an alphabaculovirus' to PLOS Genetics. The manuscript was fully evaluated at the editorial level and by three independent expert peer reviewers. The reviewers appreciated the attention to an important problem, but you will see that they have varied opinions on the suitability of your paper for PLOS Genetics and the general strengths of your findings. In particular, concerns were raised about the MOI and passaging conditions that could challenge the estimates you have generated. We feel that you may be able to address these concerns with a detailed rebuttal and a revision which will be strengthened by incorporating some additional experiments based on the constructive suggestions made here.

Based on the reviews, we will not be able to accept this version of the manuscript, but given the likely importance of this paper to baculovirologists and dsDNA virologists, we would be willing to review a much-revised version. We cannot, of course, promise publication at that time. Should you decide to revise the manuscript for further consideration here, your revisions should address the specific points made by each reviewer. We will also require a detailed list of your responses to the review comments and a description of the changes you have made in the manuscript.

If you decide to revise the manuscript for further consideration at PLOS Genetics, please aim to resubmit within the next 60 days, unless it will take extra time to address the concerns of the reviewers, in which case we would appreciate an expected resubmission date by email to plosgenetics@plos.org. If present, accompanying reviewer attachments are included with this email; please notify the journal office if any appear to be missing. They will also be available for download from the link below.
We are sorry that we cannot be more positive about your manuscript at this stage. Please do not hesitate to contact us if you have any concerns or questions.

Yours sincerely,

Harmit S. Malik, Associate Editor, PLOS Genetics

Bret Payseur, Section Editor: Evolution, PLOS Genetics

Reviewer's Responses to Questions

Comments to the Authors: Please note here if the review is uploaded as an attachment.

Reviewer #1: PGENETICS-D-21-01205 This manuscript describes an estimate of the mutation rate of an alphabaculovirus, AcMNPV, by deep sequencing of a modified genome that comprises a DNA region that is supposed non-expressed in insect cells, related to the replication ability in bacteria (Bacmid). The manuscript is well written and easy to read, and the methods are clearly described. The authors analysed the mutations that accumulated in the genome on five parallel lineages after five passages through Spodoptera exigua larvae at high multiplicity of oral infection. Three major hypothesis are taken in the manuscript : i.) The bacterial region is perfectly neutral, ii.) all progeny genomes have the same mutation rate, and iii) the mutation rate is the same all through the genome. i). The authors argue that the low copy number bacterial insert can be considered as neutral, and provide evidence of this neutral character in Figure 3 by comparing the rate of synonymous versus non synonymous mutations in this regions compared to the genuine baculovirus genes. For the former, there is conclusion of neutrality, while for the later there is a global conclusion of purifying selection. As not all virus genes are selected at the same level, it could be of interest to differentiate them into “highly selected and loosely selected”. However, is the number of mutations enough for that approach to be carried out? ii). The presence of a exonuclease activity in the viral DNA polymerase seems to be associated to the low mutation rate observed on baculoviruses.
There is a recent work (Aguirre et al. Viruses 2021) showing that in an alphabaculovirus, one variant is able to generate high levels of variation. This is not the first time that high variation has been observed in baculovirus populations. Although these observations do not undermine the experimental results, they must be taken into account when explaining how variants are generated and maintained in baculovirus populations.

iii) I think this point should be mentioned, as previous papers refer to "hot spots of mutations".

The authors analyze the robustness of their model with a sensitivity analysis, but only for some of the parameters. It is interesting that the replication mechanism (strand copying or rolling circle) does not change the mutation rate estimates. What about the value of kappa, that is, the size of the virus population in a given host? Has a sensitivity analysis been performed using various estimates of the population size? Clearly the number of genomes present in the OBs is a subset of the total number of genomes replicated. It does not take into account the genomes that form BV, nor those that might not be included in nucleocapsids. It could be expected that the subset encapsidated into the OBs is chosen randomly, but this might not be the case, as OB production occurs at later times of infection, when the cellular repair machinery might not be in optimal condition.

In a similar way, the multiplicity of infection, that is, the value of the parameter Lambda, has been fixed to 46, due to the mode of infection of the larvae. The authors have fixed the number of ODVs per OB and the number of nucleocapsids per ODV, but these values are highly variable, even within a single production. The distribution of nucleocapsids per virion does not follow a normal distribution. A high number of ODVs have a single nucleocapsid.
Can the authors speculate on the consequences of varying this parameter, as the final number of genomes entering the larvae is not known?

Minor remarks

Line 132. What does "per strand copying" mean? The number of replication cycles in each insect is not known.
Line 154. What does "viral generation" mean?
Line 193. As you provided evidence that nonsynonymous mutations are removed from the genomes, why not use only the synonymous mutations in viral genes when calculating the mutation rate? This could help with the analysis of the transition to transversion ratio (line 236).
Line 303. The value of Lambda should be -10*ln(1-0.99) to get a positive value.

Reviewer #2: The authors present a concise examination of the mutation rate of a particular BAC-cloned isolate of Autographa californica multiple nucleopolyhedrovirus (AcMNPV), after experimental evolution for five passages in Spodoptera exigua (beet armyworm) larvae. The study is carefully executed from a bioinformatics and modeling perspective. The included code examples are well-annotated and documented. However, there is a notable lack of consideration for how the biological context of the experiment may have influenced the outcome, as well as an over-concentration on comparisons to constraints that are particular to similar studies of RNA viruses.

Major issues

- One biological context that goes unmentioned in this study is the matchup of the particular AcMNPV BAC clone under study and the Spodoptera exigua (beet armyworm) larvae used as the experimental evolution host. Presumably there was a rationale for choosing this isolate and this host; can the authors please add this? Might the choice of host be influencing the level of selective pressure on the virus? The discussion is extremely short, so there is room to explain how these choices may have influenced the outcome.
- Another biological context that goes undiscussed is the choice of an extremely high MOI inoculum (despite the low stated bottleneck of 46) and the experimental design that pools billions of viral particles across multiple larval cadavers to create the next round of viral inoculum for each lineage. This approach seems sure to reduce the impact of any rare variants, and does not seem likely to reflect natural environmental bottlenecks. This leaves the impression that while the mutation rate can be as low as that measured here, the natural situation may be more varied.

- Two other aspects of the natural biological context for these viruses are standing genetic variation in the population, and the ability of coincident viruses to undergo recombination. The authors have intentionally aimed to use a homogeneous source population, but these other contributions to the natural polymorphisms observed in isolates of AcMNPV at least belong in the discussion. At present, the authors leave it thus: "By contrast, the high genetic variation often observed within alphabaculovirus populations [21-24] remains a conundrum." The biological contexts listed here, among others, would help to explain this conundrum.

- Throughout the text, there is an over-emphasis on comparing the authors' work to data, methods and controls that have been necessary in prior studies of small RNA virus genomes. The vast majority of literature cited for comparison is likewise on (very excellent) RNA virus studies. Yet these are not the best comparisons for the present study, and this emphasis is likely to confuse the reader. In the introduction, the authors posit that "most mutations in viral genomes are deleterious", with citations to RNA virus studies. Large DNA viruses have sufficient intergenic space, genetic redundancy, repetitive elements, etc. that this claim is not likely to be correct for these viruses. This claim should be removed or backed up with relevant DNA virus references.
As a methods example, the CirSeq approach used as a comparison on line 270 was developed for RNA viruses to compensate for the need to use (error-prone) reverse transcriptase to create DNA templates from the initial pool of RNA virus genomes. That biological step is not necessary for baculoviruses, so this is not a useful contrast to make with the authors' methods here. It would be more useful and appropriate to compare the present work to recent in-depth genomics analyses of adenoviruses, poxviruses, herpesviruses, megaviruses, etc.

- The comparisons in the manuscript focus on the mutation rate of the AcMNPV genome, as compared to the "neutral intergenic region" of the BAC insertion. While this is interesting, it seems odd not to include any mention or analysis of the natural intergenic regions of the AcMNPV genome. Since the authors calculated dN/dS, their analysis pipeline includes an awareness of gene-encoding regions and the intergenic areas between them. At present it is not stated how these are handled. Did the mutations observed in this study all fall into coding regions? Or were they predominantly intergenic? These data should be included, and the natural AcMNPV intergenic regions should be analyzed as a group for comparison, alongside the artificial neutral-intergenic (dI) region of the BAC, and the dN/dS rate of the rest of the AcMNPV genome.

- It is not clear why the authors chose to exclude the sizable number of mutations that occur in the ancestral BAC, and any that showed up in more than one progeny virus. While the text (lines 313-316) suggests that this was done to exclude any contribution of variants in the initial starting population, this seems like an odd choice, since natural virus populations may have this much or an even larger amount of variation. The choice to exclude variants that appear more than once likewise seems poised to exclude the very real possibility of mutational hotspots in the viral genome.
Minor issues

- Line 265: this approach is not particularly novel. There is a long and classic literature using the insertion of exogenous sequences into viral genomes (and many other species), and using various methods to observe the mutation rate of the inserted sequence. Recognition and citation of this history would be more appropriate, and then the authors can distinguish how their application of this deep sequencing and this particular host-virus matchup provides novelty.
- What isolate or lineage of AcMNPV is contained in pBac-E2? Was this resource generated for this study, or is there a prior citation or description of its construction? These details should be referenced or included.
- How were the library preps generated for HiSeq sequencing? Presumably PE150 indicates paired-end reads?
- Why not include other large DNA viruses in figure 4? There are relevant comparative data from similar studies of adenoviruses, poxviruses, herpesviruses, megaviruses, etc. These should be included as well.
- What is observed around the homologous regions (hrs) of this baculovirus genome? Does the alignment approach (mapping to the reference genome, line 308) obscure the ability to observe any fluctuations at these repetitive regions?
- What reference genome was used here, for mapping of all data (line 308)? Does it have an accession number in a common repository (EBI/GenBank)? It is not stated if it is derived from exactly this isolate, or if not, how closely it matches the BAC.
- Figures S1-S3 could easily be combined into a single figure.
- Sequencing data should be deposited in a common repository such as the Sequence Read Archive.

Reviewer #3: PGENETICS-D-21-01205 Boezen et al

This manuscript describes an in vivo experiment designed to calculate the mutation rate of the baculovirus AcMNPV based on mutation accumulation.
The design of the experiment is rather elegant, as the construction of a bacmid allowed the authors both to start the experiment from a genomic clone and to achieve normal in vivo replication in the caterpillar host Spodoptera exigua. The authors estimate a mutation rate of 10⁻⁷ both on the neutral inserted portion of the genome and on the original portion of the genome. This is a rather conservative estimate, and I wonder if the number of assumptions made truly reflects what could happen in natural populations.

Main comments:

L 109: Isn't the viral amplification step in L3 a 6th passage in the experiment? Please explain the possible impact this amplification passage could have on the parameters used for the models to estimate the mutation rates.
L113: Around here, indicate the average sequence coverage (5500) for your genome. It is much lower than in previous deep sequencing analyses done on AcMNPV.
L258-259: In the conclusion, it would help readers to estimate how many neutral mutations per genome (and per OB) could be transmitted to the next generation given the actual calculated rate. This could be compared with the number of transposable elements found in AcMNPV OBs.
L286: What was the volume of the droplet used for the droplet feeding assay? (How many OBs do the caterpillars ingest?) How does this relate to the genome bottleneck size of 46?
L291: What were the OB concentrations and volumes used to infect the L3 caterpillars in the 6th (amplification) cycle? Was the final size of the virus population per insect cadaver still 5.32×10⁸? In L3 the yield should be higher than in L2.
L312-317: I don't understand why convergent evolution of mutations in different lineages should be excluded as possibly deriving from polymorphism in the BAC population. Is this an indication of insufficient sequence coverage for the experiment?
L 320: Would the sequencing error rate in the Illumina data allow the use of a lower mutation threshold given the sequencing coverage?
L330-332: What would change if reversions were allowed?
L387-389: Throughout the materials and methods, the authors assume mutations are extremely rare and seem to exclude all data and parameters that would increase the calculated rate.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: No: Sequencing data should be deposited in a common repository such as the Sequence Read Archive.
Reviewer #3: No: the sequence data do not seem to be available, as: 'Sequence data have been uploaded to a permanent repository within the Netherlands Institute of Ecology and are available upon request.'

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Miguel Lopez-Ferber
Reviewer #2: No
Reviewer #3: No

25 Mar 2022

Submitted filename: Response to reviewers.pdf

27 Apr 2022

Dear Dr Zwart,

We are pleased to inform you that your manuscript entitled "Empirical estimates of the mutation rate for an alphabaculovirus" has been editorially accepted for publication in PLOS Genetics. Congratulations!

Before your submission can be formally accepted and sent to production you will need to complete our formatting changes, which you will receive in a follow up email.
Please be aware that it may take several days for you to receive this email; during this time no action is required by you. Please note: the accept date on your published article will reflect the date of this provisional acceptance, but your manuscript will not be scheduled for publication until the required changes have been made.

Once your paper is formally accepted, an uncorrected proof of your manuscript will be published online ahead of the final version, unless you’ve already opted out via the online submission form. If, for any reason, you do not want an earlier version of your manuscript published online or are unsure if you have already indicated as such, please let the journal staff know immediately at plosgenetics@plos.org.

In the meantime, please log into Editorial Manager at https://www.editorialmanager.com/pgenetics/, click the "Update My Information" link at the top of the page, and update your user information to ensure an efficient production and billing process. Note that PLOS requires an ORCID iD for all corresponding authors. Therefore, please ensure that you have an ORCID iD and that it is validated in Editorial Manager. To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager.

If you have a press-related query, or would like to know about making your underlying data available (as you will be aware, this is required for publication), please see the end of this email. If your institution or institutions have a press office, please notify them about your upcoming article at this point, to enable them to help maximise its impact. Inform journal staff as soon as possible if you are preparing a press release for your article and need a publication date.
Thank you again for supporting open-access publishing; we are looking forward to publishing your work in PLOS Genetics!

Yours sincerely,

Harmit S. Malik
Associate Editor
PLOS Genetics

Bret Payseur
Section Editor: Evolution
PLOS Genetics

www.plosgenetics.org
Twitter: @PLOSGenetics

----------------------------------------------------

Comments from the reviewers (if applicable):

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: In this revised version, I found that the authors have addressed the reviewers' comments in a satisfactory way.

Line 225. Strictly speaking, there is not a single neutral insertion of 11.6 kbp, but two regions separated by a relatively highly selected gene, polyhedrin, which conditions between-host survival.

Reviewer #2: The authors have addressed the reviewers' concerns well, and the revisions to the manuscript have made it more thorough and clearer.

Reviewer #3: I am satisfied with the response provided to the reviewer comments. I particularly appreciate the substantial effort made to revise the molecular evolution model to address the different points made by the 3 reviewers and the ensuing discussion. I noted one typo, but there might be others: l376: remove one 'we'

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Genetics data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Miguel LOPEZ-FERBER
Reviewer #2: No
Reviewer #3: No

----------------------------------------------------

Data Deposition

If you have submitted a Research Article or Front Matter that has associated data that are not suitable for deposition in a subject-specific public repository (such as GenBank or ArrayExpress), one way to make that data available is to deposit it in the Dryad Digital Repository. As you may recall, we ask all authors to agree to make data available; this is one way to achieve that. A full list of recommended repositories can be found on our website.

The following link will take you to the Dryad record for your article, so you won't have to re-enter its bibliographic information, and can upload your files directly: http://datadryad.org/submit?journalID=pgenetics&manu=PGENETICS-D-21-01205R1

More information about depositing data in Dryad is available at http://www.datadryad.org/depositing. If you experience any difficulties in submitting your data, please contact help@datadryad.org for support.

Additionally, please be aware that our data availability policy requires that all numerical data underlying display items are included with the submission, and you will need to provide this before we can formally accept your manuscript, if not already present.

----------------------------------------------------

Press Queries

If you or your institution will be preparing press materials for this manuscript, or if you need to know your paper's publication date for media purposes, please inform the journal staff as soon as possible so that your submission can be scheduled accordingly. Your manuscript will remain under a strict press embargo until the publication date and time. This means an early version of your manuscript will not be published ahead of your final version.
PLOS Genetics may also choose to issue a press release for your article. If there's anything the journal should know or you'd like more information, please get in touch via plosgenetics@plos.org.

31 May 2022

PGENETICS-D-21-01205R1
Empirical estimates of the mutation rate for an alphabaculovirus

Dear Dr Zwart,

We are pleased to inform you that your manuscript entitled "Empirical estimates of the mutation rate for an alphabaculovirus" has been formally accepted for publication in PLOS Genetics! Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out or your manuscript is a front-matter piece, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Genetics and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zita Barta
PLOS Genetics

On behalf of:
The PLOS Genetics Team
Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
plosgenetics@plos.org | +44 (0) 1223-442823
plosgenetics.org | Twitter: @PLOSGenetics

53 in total

Review 1. DNA polymerase of the T4-related bacteriophages.

Authors: J D Karam; W H Konigsberg
Journal: Prog Nucleic Acid Res Mol Biol Date: 2000

2. An experimental test of the independent action hypothesis in virus-insect pathosystems.

Authors: Mark P Zwart; Lia Hemerik; Jenny S Cory; J Arjan G M de Visser; Felix J J A Bianchi; Monique M Van Oers; Just M Vlak; Rolf F Hoekstra; Wopke Van der Werf
Journal: Proc Biol Sci Date: 2009-03-11 Impact factor: 5.349

3. Extremely high mutation rate of a hammerhead viroid.

Authors: Selma Gago; Santiago F Elena; Ricardo Flores; Rafael Sanjuán
Journal: Science Date: 2009-03-06 Impact factor: 47.728

4. Mixtures of complete and pif1- and pif2-deficient genotypes are required for increased potency of an insect nucleopolyhedrovirus.

Authors: Gabriel Clavijo; Trevor Williams; Oihane Simón; Delia Muñoz; Martine Cerutti; Miguel López-Ferber; Primitivo Caballero
Journal: J Virol Date: 2009-03-04 Impact factor: 5.103

Introduction

Results and discussion

Serial passage and detection of mutations

Illustration of the neutral region (green diagonal bars) used for estimating mutation rate.

Low mutation rate for AcMNPV with bacmid and whole-genome mutation data

Mutation rate estimates (s/n/r, mutations per site per strand copying) derived with the model are given based on the neutral bacmid region and the whole genome.

AcMNPV mutation rate estimate is congruent with estimates for other viruses

An overview of known mutation rate estimates (s/n/r, ordinate) for viruses with different genome sizes (abscissa) is given.

Analysis of the mutational spectrum: A low transition to transversion ratio?

Overview of the mutation spectrum, for different sets of mutations called.
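
For readers unfamiliar with the transition to transversion ratio named in this section title, the following is a minimal sketch of how such a ratio can be computed from a list of called substitutions. The function names and the example substitutions are hypothetical illustrations; they are not taken from the study's scripts or data.

```python
# Hypothetical helper names; the substitutions below are made-up examples,
# not mutations reported in the study.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_substitution(ref, alt):
    """Return 'transition' or 'transversion' for a single-base substitution."""
    ref, alt = ref.upper(), alt.upper()
    bases = PURINES | PYRIMIDINES
    if ref not in bases or alt not in bases or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    # A transition stays within one chemical class (purine or pyrimidine);
    # a transversion switches between the two classes.
    same_class = (ref in PURINES) == (alt in PURINES)
    return "transition" if same_class else "transversion"


def ts_tv_ratio(substitutions):
    """Return (transitions, transversions, ratio); ratio is None without transversions."""
    ts = sum(classify_substitution(r, a) == "transition" for r, a in substitutions)
    tv = len(substitutions) - ts
    return ts, tv, (ts / tv if tv else None)


example = [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example))  # (2, 2, 1.0)
```

As a point of reference, if all twelve possible single-base substitutions were equally likely, the expected ratio would be 0.5, because four of them are transitions and eight are transversions.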

Concluding remarks

Materials and methods

Experimental evolution of AcMNPV

Mutation calling and filtering

Mutation model and mutation rate estimation

Mutation rate estimates with established approaches

dI/dS and dN/dS analyses

Coverage distribution per isolate after subsampling to approximately equal mean coverage.

We show the negative log likelihood (NLL) for models fitted with different mutation frequency threshold values (τ), given as a percentage.

These heatmaps indicate the number of mutations that accumulate after 5 passages in 5 insects, based on the predictions from the simulation model.

An illustration is provided of why the size of the founding population (λ) has a non-monotonic effect on the accumulation of detectable mutations.

R script for estimating mutation rates based on the bacmid region data.

R script for estimating mutation rates based on the whole genome data.

Main R script for the dN/dS analysis.

R script for bootstrapping and statistical tests for dN/dS analysis.

PDF file with CLC Genomics Workbench settings.

Notebook with scripts and results for mutation calling (τ = 0.5%).

Notebook with scripts and results for mutation calling (τ = 1%).

Notebook with scripts and results for mutation calling (τ = 2%).

Excel file containing the final results for the dN/dS analysis.

ZIP file containing the bacmid sequence and annotation files used in the genome analysis here (sequence as *.fa file, annotation as *.csv and *.gbk files).

ZIP file containing 36 *.csv files, containing the mutations called for each condition (whole genome vs bacmid region only, threshold for repeated mutations, and threshold mutation detection) as indicated by the file names.

PDF file containing Supplementary Text 1 (Relevance of viral demography for mutation rate estimates).
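
Two of the demographic quantities raised in the peer review can be checked with simple arithmetic. The sketch below evaluates Reviewer #1's expression for the founding population size, -10*ln(1-0.99), and multiplies the reported mutation rate range (μ = 1×10⁻⁷ to 5×10⁻⁷ s/n/r) by the 11.6 kbp size of the neutral insert mentioned in the second-round review. Reading the factor 10 as a dose multiplier and 0.99 as a response probability is an assumption made here for illustration, as are all function and parameter names; the insert is also treated as one block, although Reviewer #1 notes that it is strictly two regions.

```python
import math


def founding_population(dose_multiplier=10.0, p_response=0.99):
    """Evaluate -dose_multiplier * ln(1 - p_response).

    The parameter names are hypothetical labels for the constants in
    Reviewer #1's expression; their interpretation is an assumption.
    """
    return -dose_multiplier * math.log(1.0 - p_response)


def expected_mutations_per_copy(mu, n_sites):
    """Expected number of new mutations in n_sites during one strand copying."""
    return mu * n_sites


lam = founding_population()
print(f"lambda = {lam:.2f}")  # lambda = 46.05

NEUTRAL_INSERT_SITES = 11_600  # 11.6 kbp, treated as a single block (simplification)
for mu in (1e-7, 5e-7):
    expected = expected_mutations_per_copy(mu, NEUTRAL_INSERT_SITES)
    print(f"mu = {mu:.0e}: {expected:.2e} expected mutations per strand copying")
```

The first result agrees with the value of 46 quoted for Lambda in the reviews. The second shows that, at these rates, a single copying of the insert is expected to introduce far fewer than one mutation.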

