Julia R Gog, Emmanuel Dos Santos Afonso, Rosa M Dalton, India Leclercq, Laurence Tiley, Debra Elton, Johann C von Kirchbach, Nadia Naffakh, Nicolas Escriou, Paul Digard.
Abstract
Genome segmentation facilitates reassortment and rapid evolution of influenza A virus. However, segmentation complicates particle assembly as virions must contain all eight vRNA species to be infectious. Specific packaging signals exist that extend into the coding regions of most if not all segments, but these RNA motifs are poorly defined. We measured codon variability in a large dataset of sequences to identify areas of low nucleotide sequence variation independent of amino acid conservation in each segment. Most clusters of codons showing very little synonymous variation were located at segment termini, consistent with previous experimental data mapping packaging signals. Certain internal regions of conservation, most notably in the PA gene, may however signify previously unidentified functions in the virus genome. To experimentally test the bioinformatics analysis, we introduced synonymous mutations into conserved codons within known packaging signals and measured incorporation of the mutant segment into virus particles. Surprisingly, in most cases, single nucleotide changes dramatically reduced segment packaging. Thus our analysis identifies cis-acting sequences in the influenza virus genome at the nucleotide level. Furthermore, we propose that strain-specific differences exist in certain packaging signals, most notably the haemagglutinin gene; this finding has major implications for the evolution of pandemic viruses.
Year: 2007 PMID: 17332012 PMCID: PMC1874621 DOI: 10.1093/nar/gkm087
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Influenza A virus sequence datasets
| Seg | Gene | No. | Cons. | ORF | Year | Median | HA types | % ‘Human’ |
|---|---|---|---|---|---|---|---|---|
| 1 | PB2 | 369 | 92 | 759 | 1930 | 2000 | 1–7,9,13 | 54 |
| 2 | PB1 | 351 | 93 | 757 | 1933 | 2000 | 1–7,9,13,16 | 58 |
| 3 | PA | 396 | 89 | 716 | 1930 | 2000 | 1–7,9,13 | 53 |
| 4 | HA (1) | 99 | 67 | 566 | 1918 | 1994 | n/a | 99 |
| 4 | HA (3) | 203 | 82 | 566 | 1968 | 2002 | n/a | 52 |
| 5 | NP | 617 | 85 | 498 | 1918 | 1997 | 1–7,9–11,13,14,16 | 52 |
| 6 | NA (1) | 130 | 75 | 468/9 | 1918 | 1999 | 1–7,9,11 | 51 |
| 7 | M1 | 887 | 89 | 252 | 1902 | 1999 | 1–7,9–13 | 38 |
| 7 | M2 | 746 | 73 | (88) | 1902 | 1999 | 1–7,9–13 | 40 |
| 8 | NS1 | 829 | 51 | 230 | 1902 | 1998 | 1–7,9–13 | 42 |
| 8 | NS2 | 962 | 74 | (111) | 1902 | 1999 | 1–7,9–13 | 35 |
| 8 | NS1(A) | 645 | 62 | 230 | 1902 | 1998 | 1–7,9–11,13 | 58 |
| 8 | NS1(B) | 184 | 80 | 230 | 1949 | 1998 | 1,3–7,9,10,12 | 0 |
Seg: segment number.
Gene: translation product analysed (parentheses specify subtype where applicable).
No.: number of sequences analysed.
Cons.: % of amino acid residues within each dataset that are identical in ≥95% of isolates.
ORF: length of ORF analysed. (Only second exon sequences are considered for M2 and NS2.)
Year: the earliest year of virus isolation. The latest isolation date was 2004 except for NS1(B), NA (1) (2003) and HA (1) (2002).
Median: median date of virus isolation for each dataset.
HA types: HA subtypes of virus isolates from which sequences were derived; n/a, not applicable.
% ‘Human’: % of isolates from potentially human adapted H1N1, H1N2, H2N2 or H3N2 viruses.
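The "Cons." column can be illustrated with a minimal sketch, assuming it counts alignment positions whose most common amino acid occurs in at least 95% of sequences. The function name `percent_conserved` and the toy alignment are hypothetical and are not taken from the study.

```python
from collections import Counter

def percent_conserved(aligned_proteins, threshold=0.95):
    """Percentage of alignment columns whose most common residue
    is present in at least `threshold` of the sequences."""
    if not aligned_proteins:
        raise ValueError("need at least one sequence")
    length = len(aligned_proteins[0])
    if any(len(seq) != length for seq in aligned_proteins):
        raise ValueError("sequences must be aligned to equal length")
    n = len(aligned_proteins)
    conserved = 0
    for column in zip(*aligned_proteins):
        # Count of the most frequent residue in this column.
        top_count = Counter(column).most_common(1)[0][1]
        if top_count / n >= threshold:
            conserved += 1
    return 100.0 * conserved / length

# Hypothetical toy alignment (20 sequences, 4 residues); not data from the study.
toy = ["MKRL"] * 18 + ["MKQL"] * 2
print(percent_conserved(toy))  # third column is only 90% identical -> 75.0
```

How the authors handled gaps or ambiguous residues is not stated here, so the sketch treats every character as an ordinary residue.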
Figure 1. Normalized codon MPD plots for influenza A virus segments. Dots indicate individual codon scores while lines delineate a moving average taken over a window of 10 residues. For (G) M and (H) NS segments, data for the spliced gene products (blue dots and green lines) are plotted on the same scale with the appropriate degree of overlap. Bars above plots represent areas of dual coding capacity. Bars below plots represent packaging signals known from reverse genetics experiments (black lines) or inferred from sequencing studies of DI RNAs (grey lines). Arrowheads indicate splice donor and acceptor sites.
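The smoothing described in the caption can be sketched as a simple moving average over 10 consecutive codon scores. Whether the authors centred the window or handled the segment ends differently is not stated, so this hypothetical `moving_average` helper returns one value per full window only.

```python
def moving_average(scores, window=10):
    """Mean of every run of `window` consecutive scores (full windows only)."""
    if window < 1:
        raise ValueError("window must be positive")
    if len(scores) < window:
        return []
    running = sum(scores[:window])
    averages = [running / window]
    for i in range(window, len(scores)):
        # Slide the window by one: add the new score, drop the oldest.
        running += scores[i] - scores[i - window]
        averages.append(running / window)
    return averages

# Hypothetical input: 12 scores give 3 full windows of 10.
print(moving_average(list(range(12))))  # [4.5, 5.5, 6.5]
```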
Examples of low and high MPD codon scores associated with invariant arginine (R) amino acid (AA) residues are tabulated
| Codon | AA | Obs | Exp |
|---|---|---|---|
| (a) PB1 ORF, position 755 | | | |
| CGU | R | 0 | 6.0 |
| CGC | R | 0 | 18.7 |
| CGA | R | 0 | 22.7 |
| CGG | R | 351 | 36.9 |
| AGA | R | 0 | 170.3 |
| AGG | R | 0 | 96.4 |
| MPD | | 0 | 0.93 |
| Norm | | 0 | |
| (b) PB2 ORF, position 427 | | | |
| CGU | R | 1 | 7.8 |
| CGC | R | 2 | 23.1 |
| CGA | R | 192 | 33.5 |
| CGG | R | 81 | 26.7 |
| AGA | R | 87 | 185.4 |
| AGG | R | 6 | 92.4 |
| MPD | | 0.75 | 0.91 |
| Norm | | 0.82 | |
Obs: observed frequency of codons at the specified position.
Exp: hypothetical predicted occurrence of codons based on overall codon usage data for the segment.
MPD: calculated mean pairwise difference of the codon distribution.
Norm = observed MPD/expected MPD.
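The scores for PB2 position 427 can be reproduced with a short sketch. It assumes that "mean pairwise difference" means the number of nucleotide differences between two codons, averaged over all distinct pairs of sequences. That definition is an assumption, as is applying the same formula to the fractional expected counts, and the authors' exact procedure may differ. The function names are hypothetical.

```python
from itertools import combinations

def codon_distance(a, b):
    """Number of nucleotide positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))

def mean_pairwise_difference(counts):
    """Average nucleotide difference over all distinct pairs of sequences.

    `counts` maps each codon to how many sequences carry it at one position.
    Pairs of identical codons contribute zero, so only mixed pairs are summed.
    """
    n = sum(counts.values())
    if n < 2:
        return 0.0
    total = sum(counts[a] * counts[b] * codon_distance(a, b)
                for a, b in combinations(counts, 2))
    return total / (n * (n - 1) / 2)

# Values copied from the table above, PB2 ORF position 427.
observed = {"CGU": 1, "CGC": 2, "CGA": 192, "CGG": 81, "AGA": 87, "AGG": 6}
expected = {"CGU": 7.8, "CGC": 23.1, "CGA": 33.5, "CGG": 26.7,
            "AGA": 185.4, "AGG": 92.4}

obs_mpd = mean_pairwise_difference(observed)
exp_mpd = mean_pairwise_difference(expected)  # assumption: same formula on fractional counts
norm = obs_mpd / exp_mpd
print(round(obs_mpd, 2), round(exp_mpd, 2), round(norm, 2))  # 0.75 0.91 0.82
```

A position with a single codon in every sequence, such as PB1 position 755 (351 copies of CGG), has no mixed pairs and therefore scores 0.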
Effect of synonymous mutations in the PB2-coding region on segment 1 packaging
| vRNA | Mutation | MPD score | Packaging levels (% of GFP-expressing cells) | vRNA expression levels (% of PB2(159) GFP(166)) |
|---|---|---|---|---|
| PB2(159) GFP(34) | n/a | n/a | 0.4 ± 0.1 | 130 ± 10 |
| PB2(159) GFP(166) | n/a | n/a | 27.5 ± 7.3 | 100 |
| mut 731 | GUG -> GUC | 0.360 | 18.5 ± 1.6 | 20 ± 0 |
| mut 737 | CGG -> CGC | 0.539 | 17.9 ± 4.4 | 35 ± 5 |
| mut 744a | CUU -> CUA | 0 | 0.8 ± 0.2 | 80 ± 20 |
| mut 744b | CUU -> UUA | 0 | 0.5 ± 0.1 | 65 ± 5 |
| mut 745 | ACU -> ACA | 0 | 1.2 ± 0.3 | 55 ± 5 |
| mut 748 | CAG -> CAA | 0 | 1.1 ± 0.3 | 75 ± 25 |
| mut 751 | ACC -> ACG | 0.008 | 0.5 ± 0.3 | 110 ± 10 |
| mut 757 | GCC -> GCG | 0 | 1.1 ± 0.4 | 70 ± 10 |
vRNA: PB2(159)GFP(166) denotes vRNAs containing PB2 codons 1–44 and 716–759 (synonymous mutations introduced as indicated), while PB2(159)GFP(34) refers to a vRNA lacking all 3′-PB2-coding regions.
Packaging levels: mean ± standard deviation of four measurements from two independent clones.
vRNA expression levels: mean ± range from two independent clones as determined by primer-extension analysis.
n/a, not applicable.
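As a check on the mutations listed in the PB2 table, the sketch below translates each codon pair with the standard genetic code and counts the nucleotide changes. The helper names are hypothetical. The codon strings are copied from the table, and the amino acid string is the standard code in UCAG order.

```python
BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

def translate(codon):
    """Single-letter amino acid for an RNA (or DNA) codon."""
    i, j, k = (BASES.index(b) for b in codon.upper().replace("T", "U"))
    return AMINO_ACIDS[16 * i + 4 * j + k]

def describe(mutation):
    """Return (wild-type AA, mutant AA, nucleotide changes) for 'XXX -> YYY'."""
    wild, mutant = (c.strip() for c in mutation.split("->"))
    changes = sum(a != b for a, b in zip(wild, mutant))
    return translate(wild), translate(mutant), changes

# Mutations copied from the PB2 table above.
pb2_mutations = {
    "mut 731": "GUG -> GUC", "mut 737": "CGG -> CGC",
    "mut 744a": "CUU -> CUA", "mut 744b": "CUU -> UUA",
    "mut 745": "ACU -> ACA", "mut 748": "CAG -> CAA",
    "mut 751": "ACC -> ACG", "mut 757": "GCC -> GCG",
}

for name, mutation in pb2_mutations.items():
    aa_wild, aa_mutant, changes = describe(mutation)
    label = "synonymous" if aa_wild == aa_mutant else "non-synonymous"
    print(name, mutation, aa_wild, aa_mutant, changes, label)
```

Under the standard code, every PB2 mutation in the table leaves the amino acid unchanged. All are single nucleotide changes except mut 744b, which alters two positions of the leucine codon.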
Effect of synonymous mutations in the NA-coding region on segment 6 packaging
| vRNA | Mutation | MPD score | Relative packaging efficiency (on a 0.1–100 scale) | vRNA expression levels (% of CAT35) |
|---|---|---|---|---|
| CAT35 | n/a | n/a | 0.1 | 100 |
| CAT38 | n/a | n/a | 100 | 111 ± 5 |
| mut 461 | GCU -> GCA | 0.364 | 15.0 ± 4.5 | 94 ± 19 |
| mut 464 | CCG -> CCC | 0.393 | 9.5 ± 1.5 | 44 ± 9 |
| mut 463a | UUG -> UUA | 0.098 | 0.5 ± 0.1 | 103 ± 17 |
| mut 463b | UUG -> CUG | 0.098 | 11.9 ± 4.3 | 44 ± 7 |
| mut 466 | ACC -> ACG | 0.065 | 35.5 ± 1.0 | 55 ± 7 |
| mut 467 | AAU -> AAA | 0.047 | 64.2 ± 6.7 | 91 ± 12 |
| mut 468 | GAC -> GAU | 0 | 2.4 ± 0.4 | 333 ± 91 |
| mut 469 | AAG -> AAA | 0.061 | 6.6 ± 1.2 | 235 ± 43 |
vRNA: CAT38 denotes vRNAs containing NA codons 456–469 required for efficient packaging, while CAT35 refers to a vRNA lacking all 3′-NA-coding region. Synonymous mutations were introduced into the CAT38 vRNA as indicated.
Values are mean ± standard deviation of four measurements from two independent clones as determined by CAT ELISA. n/a, not applicable.
Figure 2. Varying patterns of RNA conservation in the NS1 and HA ORFs. (A) Consensus nucleotide and amino acid (single letter code) sequences of the first 11 codons of the overall NS1 dataset and the A and B lineages are shown along with the codon MPD scores. Residues not conserved at the 95% level are shown by asterisks. Highly conserved triplets (MPD value <0.05) are shown in red. Conserved residues that differ between lineages are highlighted. Underlining denotes the splice donor sequence. (B) Consensus sequences for the C-terminal 27 codons of HA subtypes 1, 3 and 5, encompassing the packaging signal for H1. MPD scores are not shown for clarity but conserved triplets are color-coded red (MPD < 0.05) or blue (MPD < 0.1) and highlighted. Homology between the H5 sequence and conserved nucleotides in H1 or H3 sequences is indicated by lines.