Fiona Tulloch¹, Nicky J Atkinson², David J Evans³, Martin D Ryan¹, Peter Simmonds².
¹ School of Biology, University of St Andrews, St Andrews, United Kingdom. ² Infection and Immunity Division, Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom. ³ School of Life Sciences, University of Warwick, Coventry, United Kingdom.
Abstract
Mutating RNA virus genomes to alter codon pair (CP) frequencies and reduce translation efficiency has been advocated as a method to generate safe, attenuated virus vaccines. However, selection for disfavoured CPs leads to unintended increases in CpG and UpA dinucleotide frequencies that also attenuate replication. We designed and phenotypically characterised mutants of the picornavirus echovirus 7 in which these parameters were varied independently to determine which most influenced virus replication. CpG and UpA dinucleotide frequencies primarily influenced virus replication ability, whereas no fitness differences were observed between mutants with different CP usage when dinucleotide frequencies were kept constant. By contrast, translation efficiency was unaffected by either CP usage or dinucleotide frequencies. This mechanistic insight is critical for the future rational design of live virus vaccines and for their safety evaluation: attenuation is mediated through enhanced innate immune responses to viruses with elevated CpG/UpA dinucleotide frequencies rather than through the viruses themselves being intrinsically defective.
Protein-encoding regions of all organisms, eukaryotic, bacterial and viral, are subject
to a number of functional constraints in addition to coding capacity, many of which
contribute to regulation of translation. These include the widely reported biases in the
relative frequencies of codons encoding the same amino acid (Bennetzen and Hall, 1982; Sharp
et al., 2005; Wu et al., 2010), which in some organisms represent optimisation of the
coding sequence for specific tRNAs, elongation rates and translation accuracy (reviewed in
Gingold and Pilpel, 2011). There are, in addition, consistent under- and
over-representations of codon pairs (CPs) in all organisms (Yarus and Folley, 1985; Gutman
and Hatfield, 1989; Boycheva et al., 2003; Moura et al., 2005; Tats et al., 2008) that
have been proposed to influence gene expression through alterations in translation efficiency.
Because of this potential effect on gene expression, altering CP frequencies towards
those that are disfavoured in their hosts has recently been advocated as a novel
strategy to reduce RNA virus replication (Coleman et al., 2008; Wimmer et al., 2009;
Mueller et al., 2010; Martrus et al., 2013; Yang et al., 2013; Le Nouen et al., 2014;
Ni et al., 2014). This procedure potentially provides the means to produce a new
generation of safer, non-reverting, live attenuated vaccines. Classically, virus genomes
have been empirically attenuated by serial passage in tissue culture, leading to the
accumulation of mutations. This lengthy, stochastic process produced attenuated virus
vaccine strains that have had major effects on human health (e.g., the Sabin poliovirus
vaccines) and animal health (e.g., the eradication of Rinderpest using the Plowright
vaccine). However, reversion to virulence by back-mutation of what is characteristically
a small number of key attenuating mutations is a well-known problem. Novel strategies in
which synonymous coding changes are introduced to modify codon usage (Mueller et al.,
2010; Martrus et al., 2013; Yang et al., 2013; Le Nouen et al., 2014; Ni et al., 2014)
have the advantage that the resulting attenuation depends on a large number of
mutations, each of which only slightly reduces replicative fitness but which together
produce significant attenuation with greatly enhanced genetic stability.
As one of the first examples, Coleman et al.
(2008) generated synthetic poliovirus capsid gene sequences containing codon
pairs that were specifically disfavoured in human coding sequences. These CP
de-optimised sequences were inserted into an infectious cDNA clone of poliovirus. Virus
generated from these mutants showed a remarkably attenuated replication phenotype,
which the authors attributed to impaired translation efficiency. Codon pair de-optimisation
(CPD) has since been developed as a strategy for the production of a wide range of other
live attenuated virus vaccines, including influenza A virus (IAV), porcine reproductive
and respiratory syndrome virus (PRRSV), human immunodeficiency virus type 1 (HIV-1) and
respiratory syncytial virus (Mueller et al., 2010; Martrus et al., 2013; Yang et al.,
2013; Le Nouen et al., 2014; Ni et al., 2014).
While altering translation efficiency through manipulation of codon or codon pair usage
may attenuate virus replication, other compositional features of virus genomes may also
contribute to replication phenotypes. One prominent compositional abnormality among RNA
and small DNA viruses infecting mammals and plants is the marked suppression of CpG and
UpA dinucleotide frequencies (Karlin et al., 1994; Rima and McFerran, 1997;
Simmonds et al., 2013). The functional basis for this suppression was recently
demonstrated by the marked attenuating effect of artificially increasing the numbers of
CpG and UpA dinucleotides in the genome of echovirus 7 (E7; Atkinson et al., 2014). We
and others (Burns et al., 2009) have speculated that the effects of CpG/UpA frequencies
on virus replication may account, at least in part, for the attenuating effect of
selecting disfavoured codon pairs in CPD mutants of poliovirus and other candidate
attenuated vaccines. Supporting this conjecture, regression analysis of the effects of
numerous compositional variables on poliovirus replication, in a range of codon- and
codon pair de-optimised mutants, demonstrated the primary effect of CpG and UpA
frequencies on replication ability rather than of alterations in codon or codon pair
usage (Burns et al., 2009).
In the current study, we have used a variety of bioinformatic analyses to investigate
the relationship between dinucleotide frequencies and codon pair usage. We then
designed mutants of E7 in which codon pair usage and dinucleotide frequencies were
altered independently, and assessed their replication phenotypes and fitness. The
findings demonstrate the primary influence of CpG and UpA frequencies on virus
replication, independent of codon pair usage and translation efficiency.
Results
Virus attenuation, CP and dinucleotide frequencies
Coding regions of poliovirus, IAV, PRRSV and HIV-1 have all been subjected to CP
de-optimisation and effects on virus replication quantified (Coleman et al., 2008; Mueller
et al., 2010; Martrus et al.,
2013; Yang et al., 2013; Ni et al., 2014). Despite their diversity of
replication and translation mechanisms, each showed a similar relationship between
the extent of CPD and the reduction in virus replication ability (Table 1). Typically, 10-fold or greater attenuation in cell
culture required replacement of >12–15% of the WT genome with CPD sequences. It
is notable, however, that for each virus, CPD invariably increased frequencies of CpG
and UpA dinucleotides (Table 1), typically
from 0.4–0.6 to 1.4–1.6 (CpG) and from 0.5–0.8 to 1.1–1.4
(UpA) in the mutated regions.
Table 1.
Relationship between codon pair de-optimisation, CpG and UpA frequencies and
virus fitness reduction
DOI:
http://dx.doi.org/10.7554/eLife.04531.003
Virus               Mutant    Gene           Prop'n  WT CP bias  WT CpG  WT UpA  CPD CP bias  CPD CpG  CPD UpA  Replication reduction  Ref
Poliovirus          PV-X      Capsid         14.8%   −0.03       0.52    0.75    −0.46        1.34     1.25     ×25                    Coleman et al., 2008
Poliovirus          PV-XY     Capsid         25.9%   −0.03       0.54    0.75    −0.46        1.31     1.27     ×400                   Coleman et al., 2008
Influenza A virus*  HAMin     Seg. 4         11.4%   0.02        0.43    0.64    −0.42        1.65     1.11     ×3.5                   Mueller et al., 2010
Influenza A virus*  HA/NPMin  Segs. 4, 5     21.3%   0.02        0.44    0.55    −0.42        1.56     1.14     ×14                    Mueller et al., 2010
Influenza A virus*  PR83F     Segs. 1, 4, 5  29.1%   0.01        0.43    0.53    −0.41        1.55     1.07     ×35                    Mueller et al., 2010
HIV-1               A         gag            4.6%    0.03        0.47    1.04    −0.43        1.43     1.25     ×7                     Martrus et al., 2013
HIV-1               B         gag            4.7%    0.08        0       0.91    −0.37        1.22     1.15     ×3                     Martrus et al., 2013
HIV-1               C         gag            4.8%    0.03        0.31    1.00    −0.38        1.50     1.09     ×8                     Martrus et al., 2013
HIV-1               D         gag            2.1%    −0.02       0       0.49    −0.42        1.47     0.99     ×1.5                   Martrus et al., 2013
PRRSV               SAVE5     gp5            2.6%†   −0.06       0.63    0.73    −0.38        1.37     1.14     ×4‡                    Ni et al., 2014
* Codon pair minimised sequences of IAV were not provided by Mueller et al. (2010), and
for the purposes of comparison these have been reconstructed in SSE. Note that the CP
scores described in Table 1 of that paper (−0.386, −0.420 and −0.421 for PB1, HA and NP,
respectively) are not minimum scores; the minimum scores are in fact −0.533, −0.585 and
−0.602. Therefore, for the purposes of comparison, CP score minimisation in the current
study was targeted to the former values. Although the sequences generated by SSE were not
identical to those obtained previously, they would be expected to show a similar
distortion of dinucleotide frequencies to those used in the previous study (Mueller et
al., 2010).
† Mutated region only (positions 147–542 in gp5).
‡ Data from replication assay in PAM cells.
This linkage can be accounted for, at least in part, by the association between CP
choice and the identity of the dinucleotide spanning the third position of one codon and
the first position of the next (3–1). We analysed these parameters in coding regions of
a curated dataset of 35,170 human mRNA sequences. The representation of each of the 3904
codon pairs found in coding sequences (i.e., 61 × 64) was calculated, taking into
account both the nucleotide composition of the sequences and the amino acid usage as
previously described (Gutman and Hatfield, 1989). Relative under- and over-representation
of each was shown in a heat map, with values ranging from −0.222 to +0.271 (mean 0.072).
Codon pairs were arranged on the x- and y-axes according to the base identities at each
of the six positions in the codon pair (Figure 1A).
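The calculation described above can be sketched as follows. This is a minimal illustration, assuming the codon pair score (CPS) formulation of Coleman et al. (2008), in which the observed count of a pair is compared with the count expected from the usage of its two codons and of the corresponding amino acid pair; the exact normalisation behind Figure 1 may differ. It also assumes that the CP bias of a sequence is the mean CPS over its codon pairs, and that input ORFs are in frame and contain only A, C, G and T/U. The function names (`codons`, `codon_pair_scores`, `cp_bias`) are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code; codons enumerated in TCAG order ("*" marks stop codons).
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}


def codons(seq):
    """Split an in-frame ORF into codons (U read as T), dropping any partial codon."""
    seq = seq.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def codon_pair_scores(orfs):
    """Score every observed codon pair AB (encoding amino acids X, Y) as
    ln( N(AB) / ( N(A) * N(B) / (N(X) * N(Y)) * N(XY) ) ),
    i.e. observed count over the count expected from codon and amino acid usage."""
    n_codon, n_aa, n_cp, n_aap = Counter(), Counter(), Counter(), Counter()
    for orf in orfs:
        cs = codons(orf)
        n_codon.update(cs)
        n_aa.update(CODE[c] for c in cs)
        for a, b in zip(cs, cs[1:]):
            n_cp[a, b] += 1
            n_aap[CODE[a], CODE[b]] += 1
    scores = {}
    for (a, b), observed in n_cp.items():
        x, y = CODE[a], CODE[b]
        expected = n_codon[a] * n_codon[b] / (n_aa[x] * n_aa[y]) * n_aap[x, y]
        scores[a, b] = math.log(observed / expected)
    return scores


def cp_bias(seq, scores):
    """Mean score over the codon pairs of seq that were seen in the reference set."""
    cs = codons(seq)
    vals = [scores[p] for p in zip(cs, cs[1:]) if p in scores]
    return sum(vals) / len(vals) if vals else float("nan")
```

In use, `codon_pair_scores` would be run once over a reference ORFeome (such as the human mRNA set above) and `cp_bias` applied to each insert; pairs absent from the reference are skipped rather than scored.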
Most of the 256 CPs with CpG at the 3–1 position (sixth main column) were
markedly under-represented. Codon pair position 6 had a further influence (the CP score
was more suppressed if it was A or U), but nucleotide identities at other codon positions
had generally minimal and inconsistent influences (Bennetzen and Hall, 1982; Buchan et
al., 2006; Atkinson et al., 2014). In the overall distribution of human codon pair
representations, codon pairs containing CpG across the codon boundary were distributed
towards the negative tail of the CP score distribution (Figure 1B) and accounted for
almost all of those with scores below −1.25.
Figure 1.
(A) CP usage in human coding sequences arranged in a 64 × 64 grid.
CP frequencies relative to those expected from nucleotide and amino acid
frequencies (CP bias) are colour coded in a heat map. The primary
division on the x-axis is by identity of the 3–1 dinucleotide as
annotated. Within these, further divisions within each of the 16 columns
show the identity of the nucleotide at position 2 (A, C, G or U). The
y-axis records nucleotides at positions 5 (4 main divisions on the
y-axis), 1 (4 subdivisions of position 5) and 6 (4 subdivisions of
position 1). Positions of unused codon pairs containing a 5′ stop
codon (translated as *|x) are shaded in grey. CP usage heat maps for
A. thaliana, C. elegans and E. coli
coding sequences are shown in Figure
1—figure supplement 2A–2C. (B)
Distribution of codon pair bias scores in human coding sequences;
separate labelling of the 64 codon pairs with CpG (red) or UpA (blue)
across the codon junction (3–1) demonstrates their consistent
under-representation relative to expectations based on their component nucleotide and
amino acid frequencies. The distributions of codon pair scores for A.
thaliana, C. elegans and E. coli are shown
in Figure 1—figure supplement
3A–3C. Correlations between the codon pair scores of
human coding sequences and those of A. thaliana, C.
elegans and E. coli are shown in Figure 1—figure supplement
4.
DOI:
http://dx.doi.org/10.7554/eLife.04531.004
Figure 1—figure supplement 2.
CP scores of codon pairs of (A) A. thaliana, (B) C. elegans and
(C) E. coli ORFeomes.
The primary division on the x-axis is by identity of the 3–1
dinucleotide (labelled on y-axis), divisions within each column show the
identity of codon position 2. The y-axis records codon positions 5 (1
cycle), 1 (4 cycles) and 6 (16 cycles). Positions of codon pairs
translated as *|x are shaded grey.
DOI:
http://dx.doi.org/10.7554/eLife.04531.006
Figure 1—figure supplement 3.
Distribution of codon pair scores for other organisms: (A) A. thaliana,
(B) C. elegans and (C) E. coli, with separate representation of codon
pairs with CpG and UpA across the codon junction.
DOI:
http://dx.doi.org/10.7554/eLife.04531.007
Figure 1—figure supplement 4.
Correlation between representations of human codon pairs (x-axis)
and those of other organisms: (A) A. thaliana, (B) C. elegans and
(C) E. coli (y-axis).
DOI:
http://dx.doi.org/10.7554/eLife.04531.008
CPs with UpA at the 3–1 position were also specifically under-represented in
human mRNA sequences (Figure 1A,B), consistent with the global under-representation of
this dinucleotide in eukaryotic coding sequences (Beutler et al., 1989; Duan and
Antezana, 2003). The dataset additionally showed over-representation of CpA and UpG
dinucleotides (fifth and 14th main columns in Figure 1A) and of CpC and CpU at the 3–1
position; CpA and UpG are typically created by methylation-associated C→T transitions of
C upstream of G (Simmen, 2008). However, with a few exceptions, such as the prominent
over-representation of GCG|GCG and CCG|CCG, other codon pairs showed infrequent or minor
differences in representation. The avoidance of CpG and UpA in human mRNA sequences at
the 3–1 position was also evident at the 1–2 and 2–3 codon positions (Bennetzen and
Hall, 1982; Atkinson et al., 2014); among the 61 degenerate codons, those containing CpG
or UpA at these positions showed lower relative synonymous codon usage than those
containing other dinucleotides (Figure 1—figure supplement 1).
Figure 1—figure supplement 1.
Distribution of relative synonymous codon usage values for degenerate
codons in the human genome (stop codons were excluded).
Codons with CpG and UpA at the 1–2 or 2–3 codon position
are shaded as indicated in the key (data derived from http://bioinformatics.weizmann.ac.il/databases/codon).
DOI:
http://dx.doi.org/10.7554/eLife.04531.005
Avoidance of codon pairs with CpG at the 3–1 position was also observed in the
genome of the plant A. thaliana, which also possesses a
methylation-dependent suppression of CpG dinucleotides (Figure 1—figure supplements 2A and 3). Codon
pair usage of human and plant coding sequences was indeed significantly correlated
(R² = 0.146; Figure 1—figure supplement 4). In contrast to plant coding sequences, no
equivalent avoidance of CpG-containing codon pairs was observed in organisms with
non-methylated genomes (Caenorhabditis elegans and
Escherichia coli; Figure 1—figure supplements 2B,C,3,4).
Separate assessment of effects of CP and dinucleotide frequencies on virus replication
The close association between CP usage and the identity of dinucleotides at codon
boundaries immediately complicates any observational assessment of the potentially
separate contributions of CP bias and CpG/UpA dinucleotide frequencies to virus
replication. On the one hand, it could be hypothesised that the suppression of CpG
and UpA at position 3–1 in mammalian codon pairs was a simple consequence of
avoiding disfavoured codon pairs. Conversely, it could be argued that codon
pair choice is driven in part through avoidance of specific dinucleotides. To resolve
this functionally, we compared the replication dynamics and relative fitness of native E7
with a series of novel mutants of E7 in which dinucleotide frequencies and CP usage
were independently manipulated (Figure 2; Table 2). To achieve this, a mutational
program was developed (Sequence Mutate in the SSE package; Simmonds, 2012) that allowed
synonymous changes to be introduced into a coding sequence to achieve a pre-specified CP
score target under constraints such as retaining CpG and UpA dinucleotide frequencies and
mononucleotide composition.
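A constrained synonymous search of the kind described above can be sketched as a random greedy walk. This is not the Sequence Mutate algorithm in SSE, whose internals are not described here; it is a hypothetical illustration that accepts single synonymous codon changes only when they lower the mean codon pair score and leave the CpG and UpA counts unchanged. It does not enforce the mononucleotide-composition constraint. `scores` is assumed to map codon-pair tuples to scores (for example, derived from a reference ORFeome), unscored pairs count as 0, and the input is assumed to be an in-frame, upper-case DNA string of at least two codons. All names are hypothetical.

```python
import random
from itertools import product

# Standard genetic code (TCAG order) and a lookup of synonymous codons.
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}
SYNONYMS = {}
for _codon, _aa in CODE.items():
    SYNONYMS.setdefault(_aa, []).append(_codon)


def dinuc_counts(seq):
    """(CpG, UpA) counts; neither dinucleotide can overlap itself, so str.count is exact."""
    return seq.count("CG"), seq.count("TA")


def cp_bias(cods, scores):
    """Mean codon pair score; pairs missing from `scores` are treated as 0."""
    pairs = list(zip(cods, cods[1:]))
    return sum(scores.get(p, 0.0) for p in pairs) / len(pairs)


def minimise_cp_bias(seq, scores, n_iter=5000, seed=0):
    """Random greedy search over single synonymous codon changes. A change is kept only
    if it lowers CP bias and leaves the whole-sequence CpG and UpA counts unchanged."""
    rng = random.Random(seed)
    cods = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    target = dinuc_counts(seq)
    best = cp_bias(cods, scores)
    for _ in range(n_iter):
        i = rng.randrange(len(cods))
        alts = [c for c in SYNONYMS[CODE[cods[i]]] if c != cods[i]]
        if not alts:
            continue  # Met and Trp have no synonymous alternatives
        trial = cods.copy()
        trial[i] = rng.choice(alts)
        if dinuc_counts("".join(trial)) != target:
            continue  # would change CpG or UpA numbers
        score = cp_bias(trial, scores)
        if score < best:
            cods, best = trial, score
    return "".join(cods), best
```

A greedy walk like this can stall in local minima, so a practical tool might add restarts, simulated annealing or pairwise codon exchanges; the single-change design here simply favours brevity.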
Figure 2.
Codon pair scores and numbers of CpG and UpA dinucleotides in native
(WT) and mutated regions of E7.
Mean CP scores for Regions 1 and 2 combined are shown on the x-axis; the
total numbers of CpG and UpA dinucleotides in each sequence are shown on the
y-axis. The histogram shows CP scores for the 35,170 human mRNA sequences
>200 bases in length (mean 0.072; standard deviation ±0.031).
DOI:
http://dx.doi.org/10.7554/eLife.04531.009
Table 2.
Composition and codon usage of E7 WT and mutant insert sequences
DOI:
http://dx.doi.org/10.7554/eLife.04531.010
Region  Sequence (symbol)   G+C     CpG total*  CpG O/E†,‡  UpA total*  UpA O/E†,‡  CAI¶    ENc    CP bias
1       Native (WT)         47.6%   51 (−)      0.730       62 (−)      0.742       0.685   56.5   −0.043
        Permuted (P)        47.6%   51 (0)      0.730       62 (0)      0.742       0.694   55.8   −0.025
        CpG/UpA-low (cu)    47.5%   0 (−51)     0           19 (−43)    0.227       0.686   43.5   0.087
        Max-U               50.1%   47 (−4)     0.610       43 (−19)    0.573       0.708   49.6   0.328
        Min-E               47.5%   51 (0)      0.736       62 (0)      0.735       0.748   54.3   −0.131
        Min-U               47.5%   69 (+18)    0.992       76 (+14)    0.939       0.709   58.3   −0.134
        Min-H               49.8%   106 (+55)   1.400       79 (+17)    0.981       0.696   49.2   −0.130
2       Native (WT)         47.1%   18 (−)      0.320       48 (−)      0.695       0.743   53.2   0.015
        Permuted (P)        47.6%   18 (0)      0.320       48 (0)      0.695       0.739   49.0   0.013
        CpG/UpA-low (cu)    48.5%   0 (−18)     0           48 (0)      0.214       0.739   47.2   0.118
        Max-U               46.3%   24 (+6)     0.440       43 (−3)     0.601       0.750   46.1   0.311
        Min-E               45.7%   18 (0)      0.343       48 (0)      0.657       0.785   53.3   −0.091
        Min-U               47.4%   37 (+19)    0.649       50 (+2)     0.738       0.767   57.6   −0.083
        Min-H               47.8%   68 (+50)    1.172       65 (+15)    0.970       0.715   49.7   −0.085
* Total number of CpG and UpA dinucleotides in sequence. Changes in numbers
between mutated and original WT sequence are indicated in parentheses.
† Ratio of observed dinucleotide frequency (O) to that expected based on
mononucleotide composition (E), that is, f(CpG)/(f(C) × f(G)).
‡ Values deliberately changed are shown in red (maximised) and blue (minimised).
¶ Calculated from http://genomes.urv.es/CAIcal/ (Puigbo et al., 2008).
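The observed/expected (O/E) ratio defined in the Table 2 footnotes can be computed as below. This is a minimal sketch, assuming f(XY) is the number of XY occurrences divided by the number of adjacent base positions (length − 1) and f(X) is the count of X divided by the sequence length; other normalisations (for example, dividing both by the length) give slightly different values for short sequences. Function names are hypothetical.

```python
def _clean(seq):
    """Upper-case and convert RNA (U) to the DNA alphabet (T)."""
    return seq.upper().replace("U", "T")


def count_dinucleotide(seq, dn):
    """Count dn at every adjacent pair of positions (overlapping matches included)."""
    seq, dn = _clean(seq), _clean(dn)
    return sum(1 for a, b in zip(seq, seq[1:]) if a + b == dn)


def dinucleotide_oe(seq, dn):
    """O/E = f(XY) / (f(X) * f(Y)); returns NaN when X or Y is absent from seq."""
    seq, dn = _clean(seq), _clean(dn)
    n = len(seq)
    if n < 2 or len(dn) != 2:
        raise ValueError("need a sequence of >= 2 bases and a two-letter dinucleotide")
    f_xy = count_dinucleotide(seq, dn) / (n - 1)
    expected = (seq.count(dn[0]) / n) * (seq.count(dn[1]) / n)
    return f_xy / expected if expected else float("nan")
```

For example, `dinucleotide_oe(insert, "CG")` and `dinucleotide_oe(insert, "UA")` would give the CpG and UpA columns for a hypothetical `insert` string, and `count_dinucleotide` the corresponding totals.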
The mutant Min-E was constructed by mutating two genome regions, together comprising 31%
of the E7 genome, so that the coding sequence possessed the minimum achievable CP score
(−0.111) while retaining the same CpG and UpA frequencies as WT virus (CP
score: −0.014; CpG: 0.525; UpA: 0.718; Figure 2, Table 2). Inserts with the same
CP scores as Min-E but without dinucleotide frequency constraints (Min-U; CpG:
0.82; UpA: 0.95) or with maximised CpG and UpA frequencies (Min-H; CpG: 1.3;
UpA: 0.98) were generated similarly. These three mutants provided the opportunity to
investigate the effects of dinucleotide frequency differences on viral fitness without
the confounding effect of CP bias. It was similarly possible to compare the fitness of
the mutant Max-U, which has a maximised CP score (0.320) but CpG and UpA frequencies
similar to WT, with that of the previously described mutant cu|cu, which has minimised
CpG and UpA frequencies (0 and 0.22, respectively) but a CP score marginally greater
than WT (0.11; Figure 2). P|P was the permuted mutant control, with randomised codon
order but identical coding and dinucleotide frequencies to the WT sequence.
If CP usage alone determined virus replication ability, the seven viruses would be
expected to display the following fitness ranking:
Max-U > cu|cu > (WT = P|P) > (Min-H = Min-U = Min-E).
Conversely, if virus fitness was determined by CpG and UpA dinucleotide frequencies,
the following ranking would be expected:
cu|cu > Max-U > (WT = P|P = Min-E) > Min-U > Min-H.
These predictions were tested by generating virus stocks corresponding to the seven
viruses, measuring their infectivity, and comparing their relative fitness in
competition and replication assays (Figures 3, 4).
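A permuted control of the P|P type can be approximated by exchanging synonymous codons between positions, which randomises codon order while leaving the encoded protein, codon usage and mononucleotide composition untouched. The sketch below additionally rejects any exchange that changes the CpG or UpA count; it does not hold the other 14 dinucleotide frequencies fixed, so it only approximates the control described above and is not the procedure used to build P|P. It assumes an in-frame, upper-case DNA sequence, and the function names are hypothetical.

```python
import random
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AA)}


def cpg_upa(seq):
    """(CpG, UpA) counts; neither dinucleotide can overlap itself, so str.count is exact."""
    return seq.count("CG"), seq.count("TA")


def permute_synonymous(seq, n_swaps=5000, seed=0):
    """Randomise codon order among positions that encode the same amino acid.

    Each step exchanges two synonymous codons, so the protein, codon usage and
    mononucleotide composition never change; an exchange is undone if it alters
    the CpG or UpA count of the whole sequence."""
    rng = random.Random(seed)
    cods = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    target = cpg_upa(seq)
    positions = {}
    for i, c in enumerate(cods):
        positions.setdefault(CODE[c], []).append(i)
    groups = [idx for idx in positions.values() if len(idx) > 1]
    if not groups:
        return seq  # no amino acid occurs twice, so nothing can be exchanged
    for _ in range(n_swaps):
        i, j = rng.sample(rng.choice(groups), 2)
        if cods[i] == cods[j]:
            continue
        cods[i], cods[j] = cods[j], cods[i]
        if cpg_upa("".join(cods)) != target:
            cods[i], cods[j] = cods[j], cods[i]  # reject: restore previous order
    return "".join(cods)
```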
Figure 4—figure supplement 1.
Competition assays between E7 mutants showing competing variants
(lanes 1 and 2) and the outcome at the indicated passage number (lane 3) for
each.
DOI:
http://dx.doi.org/10.7554/eLife.04531.013
Replication phenotypes
Full-length RNA transcribed from each E7 mutant cDNA construct generated
infectious virus after transfection into RD cells. Stocks of virus were generated
from WT and each mutant, and infectivity was quantified by quantal limiting dilution. To
investigate replication kinetics, RD cells were infected with WT and each mutant at
an MOI of 0.03 in triplicate, and the infectivity of supernatants was measured at 8, 18
and 30 hr (Figure 3). During the exponential period of replication (8 and 18 hr), the
Min-U and Min-H mutants showed 1 and >2 log reductions in virus replication,
respectively, compared to WT E7. By contrast, the CpG/UpA-minimised mutant cu|cu
replicated to approximately 1 log higher levels than WT. Significantly for the analysis
of effects of CP and dinucleotide frequencies on replication, virus titres obtained from
mutants with identical (Min-E, CDLR) or similar (Max-U) CpG/UpA frequencies to WT were
highly similar to WT at both time points. At the last time point (30 hr), RD cells
infected with WT, CDLR, Min-E and cu|cu were entirely destroyed and those infected with
Max-U almost entirely destroyed; all showed similar residual infectivities, while cells
infected with Min-H showed an incomplete cytopathic effect.
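The text does not state which estimator converted the limiting-dilution readouts into TCID50 values. As one common option, the Spearman-Kärber method is sketched below, assuming a dilution series with a constant step that runs from all wells positive (first dilution) to no wells positive (last dilution); Reed-Muench is a frequently used alternative. Function names and the inoculum volume in the example are hypothetical. On this log10 scale, a difference of 1 corresponds to the "1 log" differences in replication quoted above.

```python
import math


def spearman_karber_log10(x0, d, positive, total):
    """log10 reciprocal of the 50% endpoint dilution (Spearman-Karber):
        x0 - d/2 + d * sum(p_i)
    x0: log10 reciprocal of the most concentrated dilution tested (1 for 10^-1)
    d: log10 dilution step (1.0 for a ten-fold series)
    positive, total: wells with CPE / wells inoculated per dilution, most concentrated first.
    Requires 100% positive wells at the first dilution and none at the last."""
    if len(positive) != len(total) or not total:
        raise ValueError("positive and total must be equal-length, non-empty lists")
    props = [p / n for p, n in zip(positive, total)]
    if props[0] != 1.0 or props[-1] != 0.0:
        raise ValueError("dilution series must run from 100% to 0% positive wells")
    return x0 - d / 2 + d * sum(props)


def log10_tcid50_per_ml(log10_endpoint, inoculum_ml):
    """Convert the per-inoculum endpoint to log10 TCID50 per ml."""
    return log10_endpoint - math.log10(inoculum_ml)
```

For a ten-fold series from 10^-1 in which 8/8 wells are positive down to 10^-4, 4/8 at 10^-5 and 0/8 thereafter, the endpoint is 10^5 TCID50 per inoculum, or 10^6 per ml for a 0.1 ml inoculum.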
Figure 3.
Replication of WT and mutants of E7 with altered CP and dinucleotide
frequencies.
Bars are shaded diagrammatically based on their relative CpG/UpA
composition. RD cells were infected with E7 WT or each mutant at an MOI of 0.03 and
infectious titres quantified at 8, 18 and 30 hr time points post inoculation
(p.i.) by TCID50 determination. Results are the mean of three
biological replicates; error bars show standard errors of the mean.
DOI:
http://dx.doi.org/10.7554/eLife.04531.011
Competition assays were used as a more stringent measure of fitness differences between
mutants with different codon pair biases. Equal MOIs of WT and mutant viruses were
co-inoculated onto RD cells and serially passaged up to ten times. Population
compositions were determined by amplifying sequences across the modified regions
and cleaving them with restriction enzymes that differentiated WT and mutant sequences
from each other (Figure 4A,B; Table 3). In the examples of competition assays
(Figure 4A), Max-U showed similar fitness to WT but a greater population representation
at passage 10. cu|cu completely out-competed Max-U by passage 10, while in the final
example, WT and Min-E showed equal fitness at passage 5 and at passage 10 (Figure 4B). A
total of 12 pairwise comparisons were made, and outcomes in terms of population
representation were recorded at passage 10 (Figure 4B; see Key). The results are
internally consistent, agree with the replication kinetics (Figure 3) and indicate
the following fitness ranking:
cu|cu > Max-U > (WT = P|P = Min-E) > Min-U > Min-H.
Figure 4.
RD cells were co-infected with pairs of WT (W|W) and E7 mutants at
equal MOI and the supernatant serially passaged through cells after
development of CPE. RNA was isolated and the composition of each virus
determined through selective restriction digests using enzymes listed in
Table 3.
(A) Examples of three competition assays showing cleavage
patterns of individual viruses (lanes 1, 2), the starting inoculum (lane
3) and two biological replicates after 10 (panels 1, 2) or 5 (panel 3)
passages in lanes 4 and 5. Results from the other competition assays are
shown in Figure 4—figure
supplement 1. (B) Summary of pairwise fitness
comparisons of viruses with outcomes for the viruses listed in columns at
passages 5 and 10 indicated by colour shading. For example, Min-E and WT
showed equal fitness (yellow shading) and cu|cu outcompeted WT by passage
5 (red) and Max-U by passage 10.
DOI:
http://dx.doi.org/10.7554/eLife.04531.012
Table 3.
Enzymes used in selective digests for competition assays
DOI:
http://dx.doi.org/10.7554/eLife.04531.014
Virus 1   Virus 2   Region   Enzyme   Target
W|W       P|P       1        SpeI     Permuted
W|W       Max-U     1        SacI     Max-U
W|W       Min-E     1        NcoI     WT
W|W       Min-U     1        NcoI     WT
W|W       Min-H     1        EcoRV    WT
W|W       cu|cu     1        EcoRV    WT
P|P       cu|cu     1        SpeI     Permuted
Max-U     P|P       1        SpeI     Permuted
Max-U     cu|cu     1        SacI     Max-U
Min-E     Min-U     1        ClaI     Min-U
Min-E     Min-H     1        EcoRV    Min-E
Min-U     Min-H     1        ClaI     Min-U
Using the Spearman rank correlation test, fitness ranking was significantly
associated with CpG and UpA frequencies in the insert region (p < 0.001) but
showed no association with CP frequencies or with other measures of codon usage that
potentially influence translation rates, namely the codon adaptation index (CAI) and the effective
number of codons (ENc) (Table 4).
Taken together, these results demonstrate that, when varied independently of CP bias,
only dinucleotide frequencies were associated with replication fitness.
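Because several viruses share a fitness rank (WT = P|P = Min-E), a Spearman correlation has to handle ties. As an illustration only, the Python sketch below computes Spearman's R as the Pearson correlation of tie-averaged ranks. The function names and example values are hypothetical; this is not the software used for Table 4, and it does not compute p values, which were taken from tabulated critical values.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i..j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's R as the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined when all values are tied")
    return cov / (sx * sy)


# Hypothetical example: fitness rank (1 = fittest) and CpG + UpA counts per virus.
fitness_rank = [1, 2, 3, 3, 3, 4, 5]
cpg_upa = [10, 25, 40, 41, 39, 60, 80]
print(round(spearman_r(fitness_rank, cpg_upa), 3))
```

Both lists must hold one value per virus in the same order; tied fitness ranks simply share the average of the positions they occupy.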
Table 4.
Correlation between fitness ranking and sequence composition
DOI:
http://dx.doi.org/10.7554/eLife.04531.015
Variable                 Spearman R   p value†
CpG/UpA*                 1.0          <0.001
CP bias                  −0.70        0.1 (n.s.‡)
CAI                      −0.334       >0.5 (n.s.)
ENc                      0.593        >0.5 (n.s.)
G + C content            0.075        >0.5 (n.s.)
Translation efficiency   −0.074       >0.5 (n.s.)
* Number of CpG and UpA dinucleotides in insert region.
† From values tabulated in Ramsey (1989).
‡ n.s.: not significant.
Comparison of translation efficiencies
The proposition that any effects of CP frequencies on replication are mediated through their
influence on translation efficiency was investigated for the mutants constructed in
the study. Translation was assayed in vitro to avoid effects mediated
through stress response-related RNA recognition mechanisms that restrict E7
translation and subsequent replication immediately after entry (Atkinson et al., 2014). Viral RNA transcripts from E7 WT and
mutant cDNA clones were used to program rabbit reticulocyte lysates in the presence
of [35S]-methionine. Electrophoresis of reactions after 3 hr showed
several bands representing cleaved and partially cleaved E7 proteins
(Figure 5; Figure 5—figure supplement 1). Translation efficiencies
of each of the mutant E7 transcripts were comparable to that of WT RNA, and the limited variability
between mutants (Figure 5—figure
supplement 1) did not correlate with replication fitness
(R = −0.075; p > 0.5; Table 4). This indicates that, at least in a whole genome
context, alteration of either CP or dinucleotide frequencies had no significant
effect on viral polyprotein translation, which therefore cannot account for the
marked differences in replication phenotypes observed.
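The normalisation described in the figure supplement to Figure 5 (each viral protein expressed relative to the WT template, summarised as mean ± standard error of the mean) can be sketched as follows. This is a minimal illustration assuming background-subtracted densitometry values for matching bands in the mutant and WT lanes; the band labels and the function name are hypothetical, and this is not necessarily how the published values were computed.

```python
from math import sqrt
from statistics import mean, stdev


def relative_translation(mutant, wt):
    """Ratio of mutant to WT intensity for each band, with mean and SEM.

    mutant, wt: dicts mapping a (hypothetical) protein label to a
    background-subtracted densitometry value for the same band.
    """
    shared = sorted(set(mutant) & set(wt))
    if len(shared) < 2:
        raise ValueError("need at least two proteins measured in both lanes")
    if any(wt[p] <= 0 for p in shared):
        raise ValueError("WT band intensities must be positive")
    ratios = {p: mutant[p] / wt[p] for p in shared}
    values = list(ratios.values())
    sem = stdev(values) / sqrt(len(values))  # sample SD / sqrt(n)
    return ratios, mean(values), sem


# Hypothetical band intensities (arbitrary units) for three proteins.
wt_lane = {"P1": 2.0, "P2": 2.0, "P3": 2.0}
mutant_lane = {"P1": 2.0, "P2": 1.0, "P3": 3.0}
ratios, m, sem = relative_translation(mutant_lane, wt_lane)
```

With seven viral proteins quantified per lane, as in the figure supplement, each mutant would be summarised by the mean and SEM of seven such ratios.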
Figure 5.
Translation of RNA templates generated from WT and mutant E7 cDNAs in
a rabbit reticulocyte cell-free assay.
Assignments of bands to E7 proteins were based on molecular weights on
SDS-PAGE. A comparison of densitometry values for viral proteins is shown
in Figure 5—figure supplement
1.
DOI:
http://dx.doi.org/10.7554/eLife.04531.016
Figure 5—figure supplement 1.
Translation efficiencies estimated by densitometry of band
intensities of viral proteins translated in a rabbit reticulocyte cell-free
assay.
Translation efficiencies of mutant E7 cDNAs were quantified relative to
expression from the WT template. Bars show mean values for seven viral
proteins; error bars show standard errors of the mean.
DOI:
http://dx.doi.org/10.7554/eLife.04531.017
Discussion
This study sought to disentangle the effects of codon pair usage and dinucleotide frequencies
in a re-examination of their influence on the replication of an RNA virus, E7. In the
literature, studies have documented effects of CP de-optimisation on virus attenuation
without reference to the effects of this procedure on dinucleotide frequencies (Coleman et al., 2008; Mueller et al., 2010; Martrus et
al., 2013; Yang et al., 2013; Le Nouen et al., 2014; Ni et al., 2014). While frequencies of both CpG and UpA dinucleotides
are suppressed in most classes of mammalian RNA viruses (Rima and McFerran, 1997), all sequences modified to select
disfavoured CPs (Coleman et al., 2008; Mueller et al., 2010; Martrus et al., 2013; Yang et
al., 2013; Ni et al., 2014)
consistently elevated frequencies of CpG and UpA dinucleotides to levels
2.5- to 3-fold higher than in the original native sequences (Table 1). As documented in other studies (Burns et al., 2009; Atkinson et al., 2014), these elevated dinucleotide frequencies may
themselves contribute to the observed attenuation of virus replication.
Through construction of mutants of E7 in which CP frequencies were altered while keeping
dinucleotide frequencies constant (WT/Min-E) and, conversely, of viruses with the
same or similar CP biases but different dinucleotide frequencies (e.g., Min-E/Min-U/Min-H
and cu|cu/WT/Max-U), we were able to separate the potential influences of these
compositional variables on replication phenotype and fitness. The fitness ranking
derived from competition assays (cu|cu > Max-U > (WT = P|P = Min-E)
> Min-U > Min-H) demonstrated that dinucleotide frequencies
significantly influenced fitness, while differences in CP usage showed no detectable
phenotypic effect. Moreover, if CPD were to influence virus replication, its effect
would be manifested through changes in translation rate; however, no measurable
differences in translation efficiency were detected between WT and CP-optimised (Max) or
de-optimised (Min) template RNA sequences. These findings are broadly consistent with
results of previous translation assays of poliovirus mutants differing in CP choice, in
which the relatively small differences in translation of PV-Min, WT and PV-Max mutants were
clearly incompatible with the marked differences in their replicative ability (Coleman et al., 2008).
In a broader context, the finding that alterations in codon pair frequencies have no
independent effect on virus replication is consistent with current understanding of the
nature and driving forces behind codon pair usage in other organisms. Most importantly,
there is no evidence that disfavoured CPs in eukaryotes, archaea or prokaryotes are
those that specifically retard translation rates. Indeed, where specifically
investigated, the opposite was observed: mRNA templates containing disfavoured codon
pairs in E. coli were translated faster than those containing
over-represented CPs (Irwin et al., 1995). The
16 codon pairs identified as most retarding translation of the E. coli
his operon leader peptide gene have codon pair scores ranging from −0.94 to
+0.54, distributed around the centre of the overall distribution of codon pair scores
(Figure 1—figure supplement 3) (Chevance et al., 2014). The current consensus view
is that CP usage in prokaryotes is governed functionally as a means to regulate gene
expression rather than to maximise translation (Folley and
Yarus, 1989; Irwin et al., 1995;
Boycheva et al., 2003; Buchan et al., 2006).
In eukaryotic genomes, other factors underlie codon pair representation, since coding
regions in mRNA sequences and in non-transcribed genomic DNA show similar biases in
codon pair frequencies (Moura et al., 2007). CP
frequencies must therefore be substantially determined by mutational events operating on
DNA, such as methylation and specific context-dependent errors during genome replication,
rather than by any kind of optimisation or regulation of translation. The consistent
under-representation of codon pairs with CpG in the 3–1 position (Figure 1; Buchan et
al., 2006) in mammalian genomes likely originates from DNA
methylation-induced mutations in the nucleus. Our data showing similar rates of
translation of Min-H and Max-U, which differ markedly in their frequencies of
CpG-containing CPs (Figure 5), are consistent with
this interpretation.
Finally, there is no theoretical basis for the assumption that CPs are disfavoured
because of their negative effects on translation efficiency, and this concept runs
counter to our growing understanding of the intricate mechanisms that govern gene
expression. In all organisms, coding sequences differ in codon usage, match to tRNA
abundances, mRNA stability and initiation sites to regulate the rates and fidelity of
protein expression (reviewed in Gingold and Pilpel,
2011). Some of the variability in CP usage observed in the three domains of
life (Moura et al., 2005; Sharp et al., 2005; Tats et al.,
2008; Wang and Li, 2009) likely
represents aspects of that control rather than a means simply to maximise
translation.
Mechanism of attenuation
Understanding what limits the replication of viruses with altered CP and dinucleotide
frequencies is critical in the evaluation of their broader safety as attenuated virus
vaccines. The proposed mechanism, in which altered CP bias reduces translation
efficiency and thereby inhibits virus replication, implies a conceptual
model in which the virus itself is intrinsically defective. Given the large number
of mutations required for reversion, such viruses should be stably attenuated in
whatever context they are used. However, as we have now shown, the replication defect
of CPD viruses is actually mediated through alterations in dinucleotide frequencies
in the genome that influence their recognition by the cell. In this alternative
paradigm, viruses with elevated frequencies of CpG and UpA are not intrinsically
defective; rather, they are more readily recognised by the cell and prevented from
initiating replication. Their attenuation is therefore dependent on the efficacy of
the host innate immune response.
The cellular mechanisms responsible for differential recognition of, and response to, RNA
sequences with different dinucleotide compositions are currently unknown. In our
previous study, we obtained evidence that replication inhibition of high CpG/UpA
mutants of E7 occurred shortly after cell entry and was not mediated through
conventional pattern recognition receptors (Atkinson
et al., 2014). In that study, we also demonstrated that it was
not the result of differences between high and low CpG/UpA viruses in
their sensitivity to the cellular interferon response. We did observe, however, that
the attenuated phenotype of high CpG/UpA mutants could be entirely reversed by the kinase
inhibitor C16, a finding that suggests that recognition may occur through an as yet
uncharacterised PKR-related component of the stress response pathway in the cell.
Both the adaptive and innate arms of the human immune system are highly polymorphic
with remarkable variability in function and expression of many key components of
recognition or effector proteins mediating antiviral responses (Thomas et al., 2009; Everitt
et al., 2012; Hambleton et al.,
2013; Pothlichet and Quintana-Murci,
2013). Although uncharacterised mechanistically, there is clearly a
potential danger that pathways that restrict the replication of high CpG/UpA RNA
viruses may be similarly variable in their efficacy in humans and in veterinary species
with different genetic backgrounds. The attenuation of live vaccines and the safety
margins established for their large-scale use may be similarly variable;
investigation of population differences in innate cellular responses to viruses of
different dinucleotide compositions is essential in the evaluation of the safety of
this new generation of high CpG/UpA live attenuated vaccines.
Materials and methods
Cell culture and cell lines
RNA transcripts of the pT7:E7 infectious cDNA clone of the isolate Wallace (accession
number AF465516) were used to generate E7 viral stocks. E7 was propagated in
rhabdomyosarcoma (RD) cells using Dulbecco's modified Eagle medium (DMEM) with 10%
foetal calf serum (FCS), penicillin (100 U/ml) and streptomycin (100 µg/ml). All
cells were maintained at 37°C with 5% CO₂.
In silico design of CpG and UpA modified viruses
The two regions in the E7 genome used previously to investigate effects of
dinucleotide frequencies on virus replication (Atkinson et al., 2014) were used in the current study (Region 1:
1878–3119; Region 2: 5403–6462). Previously characterised mutants comprised
the CpG/UpA-low mutant cu|cu, in which all CpG dinucleotides and as many UpA dinucleotides
as possible were eliminated, and the permuted mutant P|P, in which codon order was permuted
while retaining the encoded protein sequence and native dinucleotide frequencies. Further mutants
(Max-U, Min-E, Min-U, Min-H) are described in the main text; their sequences are listed in
Supplementary file
1.
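Table 4 uses the number of CpG and UpA dinucleotides in the insert region, and the Discussion refers to CpG at the codon 3–1 position. A minimal Python sketch of both kinds of count is given below. It assumes an in-frame coding sequence supplied as RNA or cDNA, and the function names are hypothetical rather than part of SSE. The observed/expected ratio is included only as one commonly used measure of dinucleotide suppression, not as the measure used in this analysis.

```python
from collections import Counter


def dinucleotide_counts(seq, frame_start=0):
    """Count CpG and UpA in an in-frame coding sequence, split by codon position.

    '1-2' and '2-3' lie within a codon; '3-1' spans the junction between
    adjacent codons. RNA (U) or cDNA (T) input is accepted. Bases before
    frame_start are ignored.
    """
    s = seq.upper().replace("U", "T")
    counts = {"CpG": Counter(), "UpA": Counter()}
    labels = {0: "1-2", 1: "2-3", 2: "3-1"}
    for i in range(frame_start, len(s) - 1):
        pair = s[i:i + 2]
        pos = labels[(i - frame_start) % 3]
        if pair == "CG":
            counts["CpG"][pos] += 1
        elif pair == "TA":  # UpA written in the cDNA alphabet
            counts["UpA"][pos] += 1
    return counts


def observed_expected(seq, first, second):
    """O/E ratio f(XY) / (f(X) * f(Y)), a common measure of dinucleotide bias."""
    s = seq.upper().replace("U", "T")
    n = len(s)
    if n < 2:
        raise ValueError("sequence too short")
    f_xy = sum(1 for i in range(n - 1) if s[i] == first and s[i + 1] == second) / (n - 1)
    f_x = s.count(first) / n
    f_y = s.count(second) / n
    return f_xy / (f_x * f_y) if f_x and f_y else float("nan")
```

Summing the three position counts gives the total used for an insert region; the "3-1" entry isolates dinucleotides created at codon boundaries, which is where codon pair choice acts.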
Bioinformatics analysis
Manipulation of dinucleotide frequencies and codon pair scores in coding sequences
was performed using the program Sequence Mutate in version 1.2 of the SSE package
(Simmonds, 2012). Reference datasets of
human, A. thaliana, C. elegans and E.
coli messenger RNA sequences were obtained from the RefSeq database
(http://www.ncbi.nlm.nih.gov/nuccore). Codon pair usage tables were
generated from the coding regions of each mRNA sequence dataset from the four organisms
using the program Composition Scan in the SSE package (Simmonds, 2012). Codon pair tables generated by SSE were used
to calculate CP frequencies for all mRNA sequences with coding regions >200
bases in length from each organism. These comprised 35,770 human mRNA sequences,
32,768 from A. thaliana, 24,093 from C. elegans and
4316 from E. coli.
The codon pair table generated by SSE from our dataset of human mRNA sequences was
used in preference to that previously described (Table S2 in Coleman et al., 2008) because of the larger
number of human mRNA sequences now available. The previously published dataset
also omitted, without explanation, a large number of codon pairs (3586 were
listed instead of the expected 3904, i.e. 61 × 64). There was, however, a good
correlation between CP scores in the two datasets (Figure 6).
Figure 6.
Comparison of codon pair scores generated by SSE using a dataset of
35,770 human mRNA sequences (y-axis) with those used in a previous analysis
(Coleman et al., 2008).
DOI:
http://dx.doi.org/10.7554/eLife.04531.018
The codon adaptation index for human codon usage was calculated using the CAIcal website
(http://genomes.urv.es/CAIcal/; Puigbo et al., 2008). The effective number of codons (ENc; Wright, 1990) and CP usage (Buchan et al., 2006; Coleman et al., 2008) were calculated using the program
Composition Scan in SSE.
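Because the codon pair metrics are easy to misread, the sketch below shows the codon pair score (CPS) and codon pair bias (CPB) as defined by Coleman et al. (2008). CPS is the natural log of the observed count of a codon pair divided by the count expected from codon and amino acid usage, and CPB is the mean CPS over all adjacent codon pairs in a sequence. This is an illustrative Python implementation of that definition, not the SSE Composition Scan code, and its handling of edge cases (for example, codons containing ambiguous bases) is an assumption. With 61 sense codons in the first position and any of 64 codons in the second, 61 × 64 = 3904 codon pairs are possible.

```python
from collections import Counter
from itertools import product
from math import log

# Standard genetic code, codons ordered TTT, TTC, TTA, ... GGG ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AMINO[i] for i, c in enumerate(product(BASES, repeat=3))}


def split_codons(orf):
    """Split an in-frame ORF (RNA or cDNA) into codons, dropping any partial codon."""
    s = orf.upper().replace("U", "T")
    return [s[i:i + 3] for i in range(0, len(s) - 2, 3)]


def codon_pair_scores(reference_orfs):
    """CPS(AB) = ln(N_AB / (N_A * N_B / (N_X * N_Y) * N_XY)), after Coleman et al. (2008).

    A, B are adjacent codons encoding amino acids X, Y; counts are pooled over
    all reference ORFs. Codons with non-ACGT characters are ignored.
    """
    n_codon, n_aa, n_pair, n_aa_pair = Counter(), Counter(), Counter(), Counter()
    for orf in reference_orfs:
        cs = split_codons(orf)
        valid = [c for c in cs if c in CODE]
        n_codon.update(valid)
        n_aa.update(CODE[c] for c in valid)
        pairs = [(a, b) for a, b in zip(cs, cs[1:]) if a in CODE and b in CODE]
        n_pair.update(pairs)
        n_aa_pair.update((CODE[a], CODE[b]) for a, b in pairs)
    scores = {}
    for (a, b), observed in n_pair.items():
        x, y = CODE[a], CODE[b]
        expected = n_codon[a] * n_codon[b] / (n_aa[x] * n_aa[y]) * n_aa_pair[(x, y)]
        scores[(a, b)] = log(observed / expected)
    return scores


def codon_pair_bias(orf, scores):
    """CPB: mean CPS over adjacent codon pairs; pairs absent from the table are skipped."""
    cs = split_codons(orf)
    vals = [scores[p] for p in zip(cs, cs[1:]) if p in scores]
    return sum(vals) / len(vals) if vals else float("nan")
```

In practice the score table would be built once from a large reference set (such as the 35,770 human mRNAs described above) and then applied to each virus sequence; negative CPB values indicate a sequence enriched in under-represented codon pairs.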
Clone construction and recovery of mutant viruses
Mutant E7 constructs with altered CP frequencies were generated from custom
synthesised DNA sequences (Eurofins Genomics, Ebersberg, Germany). Mutant clones were
constructed as previously described (Atkinson et
al., 2014). All clones were sequenced over the insert regions prior to
further applications. Infectious virus from each cDNA clone was recovered by
transfection of RNA transcripts produced from plasmids linearised with
NotI, using a Riboprobe System-T7 in vitro transcription kit
(Promega Ltd., Southampton, UK). RNA (100 ng) was transfected into RD cells using
Lipofectamine 2000 (Invitrogen, Life Technologies Ltd., Paisley, UK) according to the
manufacturer's instructions. The resulting cell lysates were used to generate passage
1 stocks by re-infecting RD cells. Viral titres were determined by TCID50
titration in RD cells.
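The text does not state which estimator was used to calculate TCID50 titres. For illustration, the Spearman–Kärber method, one widely used option (the Reed–Muench method is another), can be sketched as below. The function name, the assumption of a dilution series running from 100% to 0% positive wells, and the example plate layout and inoculum volume are all hypothetical.

```python
def tcid50_per_ml(first_log10_dilution, log10_step, positive, total, inoculum_ml):
    """Spearman-Karber estimate of titre in TCID50 per ml.

    first_log10_dilution: log10 of the least dilute sample (e.g. -1 for 10^-1).
    log10_step: log10 of the dilution factor between steps (1 for 10-fold).
    positive, total: wells with CPE and wells inoculated at each dilution,
    ordered from least to most dilute.
    inoculum_ml: volume added to each well, in ml.
    """
    if len(positive) != len(total) or not positive:
        raise ValueError("positive and total must be non-empty and equal length")
    props = [p / t for p, t in zip(positive, total)]
    if props[0] != 1.0 or props[-1] != 0.0:
        raise ValueError("series should run from 100% to 0% positive wells")
    # log10 of the dilution at which 50% of wells would be infected
    endpoint = first_log10_dilution - log10_step * (sum(props) - 0.5)
    # Titre per inoculum is the reciprocal of the endpoint dilution; scale to 1 ml.
    return 10 ** (-endpoint) / inoculum_ml


# Hypothetical plate: eight wells per 10-fold dilution from 10^-1 to 10^-8.
titre = tcid50_per_ml(-1, 1, [8, 8, 8, 8, 4, 0, 0, 0], [8] * 8, 0.1)
```

With 0.1 ml per well, this example corresponds to an endpoint dilution of 10^-5 and roughly 10^6 TCID50/ml.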
Replication phenotype
Multi-step growth curves for each virus were generated by infecting RD cells in
triplicate in 24-well plates at an MOI of 0.03 as previously described (Atkinson et al., 2014). Supernatants collected at
8, 18 and 30 hr post-infection were assayed for infectivity by quantal
dilution. Competition assays were performed as previously described. Briefly, equal
titres of virus pairs (combined MOI = 0.01) were applied simultaneously to RD
cells in 25 cm² bottles. Following the development of CPE, supernatant was
collected and 300 µl applied to fresh RD cells. This was continued for up to 10
passages. The results of the competition assays were determined by restriction enzyme
digestion of Region 1 amplicons generated by combined reverse
transcription-PCR (Atkinson et al.,
2014). Restriction enzymes used to differentiate each mutant pair are
listed in Table 3.
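Competition outcomes here were scored from restriction digests of Region 1 amplicons. Digests of this kind can be made quantitative with calibrated controls, and a simple model can then be fitted to the changing ratio of the two viruses. A minimal Python sketch of both steps follows, assuming band intensity is proportional to DNA mass, complete digestion and no heteroduplex artefacts. The function names are hypothetical, and the per-passage selection coefficient (slope of the log ratio of the two viruses) is a standard population-genetics summary rather than an analysis performed in this study.

```python
from math import log


def fraction_cut(cut_fragment_intensities, uncut_intensity):
    """Proportion of amplicons carrying the enzyme site, from stained band densities.

    Assumes intensity is proportional to DNA mass, every cut fragment is
    quantified, digestion is complete and heteroduplexes can be ignored.
    """
    cut = sum(cut_fragment_intensities)
    total = cut + uncut_intensity
    if total <= 0:
        raise ValueError("no signal in lane")
    return cut / total


def selection_per_passage(passages, fractions):
    """Least-squares slope of ln(p / (1 - p)) against passage number.

    p is the proportion of one virus; a positive slope means it is gaining
    on its competitor. Points where one virus is lost (p = 0 or 1) are skipped.
    """
    pts = [(t, log(p / (1 - p))) for t, p in zip(passages, fractions) if 0 < p < 1]
    if len(pts) < 2:
        raise ValueError("need at least two passages with both viruses detectable")
    mean_t = sum(t for t, _ in pts) / len(pts)
    mean_y = sum(y for _, y in pts) / len(pts)
    den = sum((t - mean_t) ** 2 for t, _ in pts)
    if den == 0:
        raise ValueError("passage numbers must differ")
    return sum((t - mean_t) * (y - mean_y) for t, y in pts) / den


# Hypothetical proportions of one virus at passages 0, 1 and 2.
s = selection_per_passage([0, 1, 2], [0.5, 2 / 3, 0.8])
```

Passages at which one virus has been eliminated are excluded from the fit, because the log ratio is undefined there; such outcomes are better reported directly, as in the competition summary.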
In vitro transcription and translation
RNAs were produced by in vitro T7 transcription (Riboprobe System T7, Promega) of the
various cDNA plasmids, each linearised with NotI (Promega).
Transcript RNAs were used to program nuclease-treated rabbit reticulocyte lysates
(Promega) supplemented with HeLa cell S10 cytoplasmic extracts (Dundee Cell Products,
Dundee, UK). Reactions were set up as follows: 7 μl rabbit reticulocyte lysate,
transcript RNA (0.25–2 μg), 0.5 μl 1 mM amino acid mix (minus
methionine), 0.5 μl [35S]-methionine (1200 Ci/mmol), 10 U RNasin
Ribonuclease Inhibitor and 2.25 μl HeLa cell extract in a total volume of 12.5
μl. Reactions were incubated at 30°C for 3 hr and analysed by SDS-PAGE
(4–20% Tris-Glycine, Expedeon Ltd. Cambridge, UK). Gels were exposed to film
(Thermo Scientific, Basingstoke, UK) for 1–4 days at −70°C. To
determine the relative density of the protein bands, densitometry was carried out on
the scanned gel image using ImageJ 1.48 software (http://imagej.nih.gov/ij).
Decision letter
eLife posts the editorial decision letter and author response on a selection of the
published articles (subject to the approval of the authors). An edited version of the
letter sent to the authors after peer review is shown, indicating the substantive
concerns or comments; minor concerns are not usually shown. Reviewers have the
opportunity to discuss the decision before the letter is sent (see review
process). Similarly, the author response typically shows only responses
to the major concerns raised by the reviewers.
Thank you for sending your work entitled “RNA virus attenuation by codon pair
deoptimisation is an artefact of increases in CpG/UpA dinucleotide frequencies”
for consideration at eLife. Your article has been favorably evaluated
by Richard Losick (Senior editor), a Reviewing editor, and 3 reviewers, one of whom,
Raul Andino, has agreed to reveal his identity.
The Reviewing editor and the reviewers discussed their comments before we reached this
decision, and the Reviewing editor has assembled the following comments to help you
prepare a revised submission. The comments in full from the three reviewers are attached
below.
Our reviewers were in close agreement that the paper makes important points and is
topical and appropriate for eLife. There was also a consensus that
there were some failings in reviewing the history of this topic, and I agree that an
improved review is needed (apparently key citations are missing). This should be easily
fixed. There were some good suggestions to improve clarity. There were also requests for
more experimentation from one (the third) reviewer, which I think sound reasonable, but
I cannot evaluate how much work is involved. I would encourage the authors to consider
doing what is easy. It seems to this reviewer that the quantitation might be improved
without huge effort. We are willing to leave it to the authors to decide how much more
wet work is desired.
All told, we are supportive of publication.
Reviewer #1
This manuscript presents a careful and interesting work, and even more thoroughly
documents on the attenuating effect of dinucleotides CpG and UpA in the genomes of RNA
viruses than previous papers have. They argue, convincingly, that attenuating effects
previously ascribed to 'codon pair bias' in attenuating viruses can be
completely explained by the resulting increases in CpG and UpA dinucleotides. They do
this via a systematic analysis of the attenuating effects of altering 'codon-pair
bias' in previous papers and by carefully designed experiments of their own, which
include documentation that the attenuation does not occur by reducing translational
efficiency, but by reducing the infectivity of the resulting virions.
With respect to the possible attenuation mechanism of CpG and UpA dinucleotides, the
authors discuss the possibility that it is a mechanism of the innate immune response
that is not PKR- or pattern receptor-dependent. Data relevant to this are published
separately (Atkinson et al., 2014).
The curious thing about this field, and this manuscript, is that the 'straw
man' of codon-pair bias was already destroyed by Burns et al., in 2009. These
authors were pursuing their own hypothesis, that the substitution of rare codons would
attenuate viruses. They found that this was the case, but careful mutagenesis revealed
that it was, in fact, the increase in CpG and UpA nucleotides in the RNA that accounted
for the attenuation. They showed that this effect was not due to translational
efficiency and was manifested as the increased particle-to-PFU ratio of the resulting
viruses. They reflected that bias against CpG and UpA codons might be the reason, in
fact, that codons that contain them, and codon pairs that create them, are rare to begin
with.
They noted, quite politely, that this also accounted for the effect of the highly
publicized “codon-pair bias” published by Coleman et al., 2008. They
stated: “A prominent feature of the most disfavored codon pairs is the presence
of CpG or UpA across codons. Thus, the observed CPB in poliovirus and in humans and
higher eukaryotes may be driven primarily by CpG and UpA dinucleotide suppression. In
this context, it is notable that in cassette C of the construct with the lowest fitness,
ABc12, within-codon CpG and UpA frequencies were maximized but the CPB
score was similar to those of higher-fitness constructs, including ABC.” Thus,
like the current manuscript, they re-analyzed the data of Coleman et al., and found it
to be completely explicable in terms of CpG and UpA dinucleotides.
This conclusion, and the published data, were clearly insufficient to prevent the design
and interpretation of several more manuscripts using 'codon pair bias' to
design attenuated viruses for vaccine and basic science purposes (Mueller et al., 2010;
Martrus, 2013; Yang, 2013; Ni, 2014).
In short, this is a well-executed, important study that extends the very clear
conclusions of the also excellent, but completely ignored, 2009 paper of Burns et al.
Therefore its publication would bring an important discussion into the limelight. Here
are my suggestions, mostly having to do with writing and scholarship:
1) That CpG and UpA dinucleotide frequency correlates with attenuation of viruses,
rather than 'codon pair bias' or 'codon de-optimization', is a
feature of these approaches rather than an artifact, in this reviewer's opinion.
The combative nature of the title is not necessary to bring this fact to light.
2) It does not detract from the quality of the present work to acknowledge more
explicitly the contributions of Burns et al. It may, however, detract from its
novelty.
3) The reader's understanding of the data is compromised by the poor description of the
experiments and analysis in the figure legends. These should be rewritten in such a way
that the figures can be independently understood. For example, nowhere in the manuscript
can the metric for “CPB score” be discovered.
4) Similarly, the quantitative findings in Figure
4 are difficult to dissect. With so many changes in the genomes being
analyzed, qRT-PCR of viral competitions would be helpful. As it is, the actual data from
the competition experiments should be presented, rather than a cartoon depicting which
viruses 'won' and 'lost'.
5) The last paragraph of the Discussion uses 'data not shown' to make its
point. These data should be shown or the sentences deleted.
Reviewer #2
There have been interesting discussions on why the nucleotide composition of many human
viruses present distinct nucleotide, di-nucleotide and codon preferences. In the last
years synthetic viruses have been generated with alternative codon distributions that
show attenuated replication. However these constructs change other variables (nucleotide
or dinucleotide composition) in addition to the codon distribution and it is unclear
what of those changes affects replication. The authors address this issue by generating
mutants of the echovirus 7 in which CpG and UpA dinucleotides were varied independent of
the codon distribution. The results clearly showed that the main factor dominating
replication was dinucleotide content, and that translation efficiency was unaffected by
the two variables. The authors then argue that attenuation is mediated through innate
immune response to viruses with high CpG/UpA content.
The paper is very interesting and results are compelling. The main critiques are:
1) The topic of dinucleotide biases in RNA viruses has been explored extensively and
most references are absent.
2) The discussion on the mechanisms of attenuation and the association to the innate
immune response is not well elaborated. They mention the lack of association to
interferon response and the effect of C16. However, the results are not presented in
detail. I suggest the authors to strengthen their conclusions.
Reviewer #3
This study examines the role of nucleotide composition in RNA virus genomes. The authors
introduced a number of synonymous mutations into the EV71 genome to modify the frequency
of CpG and UpA dinucleotide (DN) or codon pair (CP). Their results support the idea that
DN frequencies determine virus replication, while neither DN nor CP affected translation
efficiency. They concluded that DN affect virus replication because the nucleotide
composition of the virus genome influences the host-cell innate response to the
virus.
The most interesting contributions of this study are: 1) they are able to identify
mutations that will affect CP without affecting CpG/UpA dinucleotide composition and
vice versa; 2) based on
this bioinformatics information they constructed mutants that change CP or DN
composition and experimentally evaluated the effect of synonymous mutation on virus
replication. Their conclusions are supported by the available results, and I think this
study represents an important contribution to the field because it seems to address a
molecular mechanism for virus genome nucleotide composition bias. However, I believe
that a more quantitative analysis of the competition experiment may be required to
determine the degree of correlation between DN composition and virus replication
fitness.
The authors choose competition assays to precisely analyze fitness between different
virus and with respect to WT. This is the correct experiment in my opinion, however I
believe the analysis of the experiment is somewhat casual and not very quantitative, and
therefore limits the value of this data. Figure 4
presents the results using a differential restriction enzyme pattern to distinguish
between the two competing viruses, but they can only determine when one virus is lost
(no longer detected by the assay). I think that it will be a lot more powerful to use
digital PCR to precisely quantify the ratio between virus genomes in the given
competition assay. This will provide parameters that can then be fit in a simple
mathematical model to determine with more accuracy the correlation between DN or CP and
fitness, which at this point seems a bit circumstantial.
Similarly, it would be desirable to improve the quality of the in vitro translation
assay and quantify protein production to determine that, in this case, there is little
correlation between translation efficiency and fitness, as this is one of the central
claims of the study.
Author response
Reviewer #1
1) That CpG and UpA dinucleotide frequency correlates with attenuation of
viruses, rather than 'codon pair bias' or 'codon
de-optimization', is a feature of these approaches rather than an artifact, in
this reviewer's opinion. The combative nature of the title is not necessary to
bring this fact to light.
As co-authors, we did spend some time discussing the title of the paper. The final
choice was in fact motivated for reasons alluded to by the reviewer: that those involved
in the codon pair programme have simply ignored the published evidence, by Burns et al.
and more recently from our lab (Atkinson et al.), that the attenuating effect was
mediated through inadvertently increasing CpG and UpA dinucleotide frequencies rather than any
effect on translation. A clear, declarative title of the paper seems required to counter
mistaken views on this. If we changed the title as the reviewer suggests, it would imply
that codon pair de-optimisation has been used deliberately as a way to increase CpG and
UpA dinucleotide frequencies. This is absolutely not the case.
That said, we would have liked to qualify that statement by stating why we make this
assertion, but the low character limit imposed on eLife paper titles
prevents us from doing this.
2) It does not detract from the quality of the present work to acknowledge more
explicitly the contributions of Burns et al. It may, however, detract from its
novelty.
We have cited that study in the original manuscript. In the Introduction of the revised
manuscript, we have described the observation made by Burns and colleagues in more
detail as requested by the reviewer.
3) The reader's understanding of the data is compromised by the poor description
of the experiments and analysis in the figure legends. These should be rewritten in
such a way that the figures can be independently understood. For example, nowhere in
the manuscript can the metric for “CPB score” be
discovered.
We apologise for these omissions and have endeavoured to make the figure legends clearer
(more explanatory) and describe the various abbreviations more fully. We have dropped
the abbreviation CPB as it can be expressed more clearly by other wording
(e.g. biased codon pair usage).
4) Similarly, the quantitative findings in Figure 4
are difficult to dissect. With so many changes in the genomes being analyzed,
qRT-PCR of viral competitions would be helpful. As it is, the actual data from the
competition experiments should be presented, rather than a cartoon depicting which
viruses 'won' and 'lost'.
We fully agree that the results presentation in Figure
4B was unduly diagrammatic and lacked primary data (with the exception of the
two example gel images in Figure 4A). This point
was also made by Reviewer #3. To address this, we have made use of the possibility
to include figure supplements by now including gel images of the other competition
assays so that relative fitness can be directly evaluated. We have, however, retained
the original Figure 4B as a summary of the
experimental data.
The use of restriction enzymes to differentiate E7 mutants is a widely used method and
its results can be made fully quantitative using appropriately calibrated controls. Had
the phenotypes been more subtle we agree that quantitative qRT-PCR would likely have
been necessary to discriminate between the fitness of viruses with modified dinucleotide
ratios. However, in the current study, outcomes were either elimination of one or other
of the competing viruses or a draw. These results are readily visualised as presented,
and we believe the investigation does not need more precise quantitation as suggested by
the reviewer.
5) The last paragraph of the discussion uses 'data not shown' to make
its point. These data should be shown or the sentences deleted.We agree, and that part of the discussion has entirely been removed. We have also
removed reference to unpublished investigations of Theiler’s virus and influenza
A virus from the Introduction.

Reviewer #2

1) The topic of dinucleotide biases in RNA viruses has been explored extensively
and most references are absent.

We have now cited, in the Introduction, the Rima and Karlin studies that originally
noted the suppression of CpG and UpA dinucleotide frequencies in RNA viruses.

2) The discussion on the mechanisms of attenuation and the association with the
innate immune response is not well elaborated. They mention the lack of association
with the interferon response and the effect of C16. However, the results are not
presented in detail. I suggest that the authors strengthen their conclusions.

That part of the Discussion was based on the results presented in the Atkinson et al.
paper from earlier in the year. We have modified that paragraph to clarify that we were
referring to this previous study.

Reviewer #3

This study examines the role of nucleotide composition in RNA virus genomes. The
authors introduced a number of synonymous mutations into the EV71 genome to modify
the frequency of CpG and UpA dinucleotide (DN) or codon pair (CP). Their results
support the idea that DN frequencies determine virus replication, while neither DN
nor CP affected translation efficiency. They concluded that DN frequencies affect virus
replication because the nucleotide composition of the virus genome influences the
host-cell innate response to the virus.

The most interesting contributions of this study are: 1) they are able to
identify mutations that will affect CP without affecting CpG/UpA dinucleotide
composition and vice versa; 2) based on this
bioinformatics information they constructed mutants that change CP or DN composition
and experimentally evaluated the effect of synonymous mutation on virus replication.
Their conclusions are supported by the available results, and I think this study
represents an important contribution to the field because it seems to address a
molecular mechanism for virus genome nucleotide composition bias. However, I believe
that a more quantitative analysis of the competition experiment may be required to
determine the degree of correlation between DN composition and virus replication
fitness.

The authors chose competition assays to analyze precisely the fitness of the different
viruses relative to each other and to WT. This is the correct experiment in my opinion;
however, I believe the analysis of the experiment is somewhat casual and not very
quantitative, which limits the value of these data.
presents the results using a differential restriction enzyme pattern to
distinguish between the two competing viruses, but they can only determine when one
virus is lost (no longer detected by the assay). I think that it will be a lot more
powerful to use digital PCR to precisely quantify the ratio between virus genomes in
the given competition assay. This will provide parameters that can then be fit in a
simple mathematical model to determine with more accuracy the correlation between DN
or CP and fitness, which at this point seems a bit circumstantial.

This comment is related to that of Reviewer #1 and has been addressed above (the
limited number of observed outcomes of competition assays can be effectively
demonstrated through restriction enzyme analysis). To address the issue of relative
fitness further, we have repeated the experiment that determined the replication
kinetics of WT and mutants of E7 depicted in Figure 3. Re-formatting the data as
histograms with error bars now allows the replication rates of all mutants at different
time points to be shown, and these fully support the competition assay results.

Similarly, it would be desirable to improve the quality of the in vitro
translation assay and quantify protein production to determine that, in this case,
there is little correlation between translation efficiency and fitness, as this is
one of the central claims of the study.

We did indeed quantify translation of the individual proteins on the blot by
densitometry and presented the results in the form of a histogram in Figure 5–figure supplement 1. We additionally
used a translation efficiency metric based on this quantitation to analyse potential
associations with virus replication rate (Table
4) along with other variables (dinucleotide composition, CB bias, CAI, ENc and
G + C content).
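Two of the compositional variables listed above, dinucleotide composition and G + C content, are conventionally expressed as an observed/expected (O/E) ratio and a simple base fraction. The sketch below is a minimal illustration under stated assumptions, not the pipeline used in this study: it assumes the expected dinucleotide frequency is the product of the two mononucleotide frequencies, counts overlapping dinucleotides across the whole sequence, and treats T as U. The function names `dinucleotide_oe` and `gc_content` are hypothetical, and the toy sequence in the usage line is illustrative only.

```python
from collections import Counter


def dinucleotide_oe(seq: str, first: str, second: str) -> float:
    """Observed/expected ratio for the dinucleotide first+second (e.g. C,G or U,A).

    Assumption: expected frequency = f(first) * f(second), i.e. bases are
    treated as independent. Ratios below 1 indicate under-representation.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA input as RNA
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    base_counts = Counter(seq)
    # Count overlapping dinucleotides at positions 0..n-2
    observed = sum(
        1 for i in range(n - 1) if seq[i] == first and seq[i + 1] == second
    )
    f_xy = observed / (n - 1)             # observed dinucleotide frequency
    f_x = base_counts[first] / n          # mononucleotide frequencies
    f_y = base_counts[second] / n
    if f_x == 0 or f_y == 0:
        return float("nan")               # ratio undefined if a base is absent
    return f_xy / (f_x * f_y)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    toy = "CGCGAUAU"  # illustrative toy sequence, not from the study
    print("CpG O/E:", dinucleotide_oe(toy, "C", "G"))
    print("UpA O/E:", dinucleotide_oe(toy, "U", "A"))
    print("G+C:", gc_content(toy))
```

Using frequencies rather than raw counts keeps the ratio comparable between genomes of different lengths; some analyses instead use the count-based approximation N × count(XY) / (count(X) × count(Y)), which differs only by the factor N/(N − 1).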