Haogao Gu¹, Rebecca L Y Fan¹, Di Wang², Leo L M Poon¹.
Abstract
Significant biases in dinucleotide composition have been reported in recent years in many RNA viruses, including influenza A virus. Previous studies have shown that a codon-usage-altered influenza mutant with elevated CpG usage is attenuated in mammalian in vitro and in vivo models. However, the relationship between dinucleotide preference and codon usage bias is not entirely clear, and segment-level changes in the dinucleotide usage of influenza virus during evolution have yet to be investigated. In this study, a Monte Carlo-type method was applied to influenza viral sequences to identify under-represented or over-represented dinucleotide motifs across different segments and different groups. After excluding potential biases caused by codon usage and amino acid sequences, CpG and UpA were found to be under-represented in all viral segments from all groups, whereas UpG and CpA were found to be over-represented. We further explored temporal changes in the usage of these dinucleotides. Our analyses revealed a significant decrease in CpG frequency in Segments 1, 3, 4, and 5 of seasonal H1 virus after its re-emergence in humans in 1977. These temporal variations were mainly driven by dinucleotide changes at codon positions 3-1 and 2-3, where silent mutations played a major role. The depletion of CpG and UpA through silent mutations consequently led to the over-representation of UpG and CpA. We also found that dinucleotide preference directly results in significant synonymous codon usage bias. Our study provides details that help in understanding the evolutionary history of influenza virus and the selection pressures that shape the virus genome.
Keywords: codon usage; dinucleotide usage; evolution; influenza
Year: 2019 PMID: 31737288 PMCID: PMC6845147 DOI: 10.1093/ve/vez038
Source DB: PubMed Journal: Virus Evol ISSN: 2057-1577
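The abstract does not spell out the Monte Carlo null model. The Python sketch below shows one common way to build such a model, assuming that expected dinucleotide counts come from randomly permuting synonymous codons within each amino acid, which keeps both the amino acid sequence and the codon usage fixed; the odds ratio is then the observed count divided by the mean count over the permuted sequences. The helper names (`synonymous_shuffle`, `monte_carlo_odds_ratio`) and the 1,000-iteration default are illustrative assumptions, not the authors' code.

```python
import random
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_dinucleotide(seq, dinuc):
    """Count overlapping occurrences of a dinucleotide (e.g. "CG") in seq."""
    return sum(1 for i in range(len(seq) - 1) if seq[i:i + 2] == dinuc)


def synonymous_shuffle(codons, rng):
    """Randomly permute codons among the positions that encode the same amino acid.

    Both the amino acid sequence and the overall codon usage are preserved;
    only the order of synonymous codons changes.
    """
    positions = {}
    for i, codon in enumerate(codons):
        positions.setdefault(CODON_TABLE[codon], []).append(i)
    shuffled = list(codons)
    for idx in positions.values():
        pool = [codons[i] for i in idx]
        rng.shuffle(pool)
        for i, codon in zip(idx, pool):
            shuffled[i] = codon
    return shuffled


def monte_carlo_odds_ratio(cds, dinuc, n_iter=1000, seed=0):
    """Observed dinucleotide count divided by its mean over synonymous shuffles.

    Assumes an in-frame coding sequence of unambiguous bases; a trailing
    partial codon is dropped, and unknown codons raise KeyError.
    """
    seq = cds.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    rng = random.Random(seed)
    observed = count_dinucleotide("".join(codons), dinuc)
    expected = sum(
        count_dinucleotide("".join(synonymous_shuffle(codons, rng)), dinuc)
        for _ in range(n_iter)
    ) / n_iter
    return observed / expected if expected else float("nan")
```

Calling `monte_carlo_odds_ratio(cds, "CG")` on a coding sequence returns a value below 1 when CpG is rarer than its codon usage and protein sequence alone would predict, and above 1 when it is more common.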
Figure 1. Dinucleotide Monte Carlo odds ratios in influenza A viruses (boxplot). Odds ratios were calculated for all sixteen dinucleotide motifs, in the order shown in the legend. Odds ratios above 1.23 or below 0.78 (dotted lines) were considered significantly over-represented or under-represented, respectively. The lower and upper hinges of each box correspond to the first and third quartiles of the data; each whisker extends from its hinge to the most extreme value no further than 1.5 × IQR (interquartile range) from the hinge. Data beyond the ends of the whiskers are plotted individually.
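Both rules in this caption can be expressed directly. The sketch below, an illustration rather than the plotting code used for the figure, labels an odds ratio with the 1.23/0.78 cut-offs and computes Tukey-style box statistics. It uses `statistics.quantiles(..., method="inclusive")`, which interpolates quartiles the same way as R's default (type 7) quantile definition; whether the original figure used exactly this quartile convention is an assumption.

```python
import statistics

# Cut-offs quoted in the Figure 1 caption.
OVER_THRESHOLD = 1.23
UNDER_THRESHOLD = 0.78


def classify(odds_ratio):
    """Label a Monte Carlo odds ratio using the caption's cut-offs."""
    if odds_ratio > OVER_THRESHOLD:
        return "over-represented"
    if odds_ratio < UNDER_THRESHOLD:
        return "under-represented"
    return "not significant"


def tukey_box(values):
    """Box statistics: quartile hinges, whiskers within 1.5 * IQR, and outliers."""
    # Linear interpolation between order statistics; needs at least two values.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "q3": q3,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }
```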
Figure 2. Evolution of the relative frequency of the CpG dinucleotide in influenza A virus, by segment (boxplot).
Figure 3. Evolution of CpG frequency at different codon positions (top: 1-2; middle: 2-3; bottom: 3-1).
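Figure 3 splits CpG by where the dinucleotide falls in the reading frame: positions 1-2 and 2-3 lie inside a codon, while 3-1 spans the boundary between one codon's third base and the next codon's first base. A minimal Python sketch of that split, assuming an in-frame coding sequence and taking "frequency" to mean the share of all dinucleotides in a given position class (the paper's exact normalization may differ); `positional_dinucleotides` and `frequency` are hypothetical helper names.

```python
from collections import Counter


def positional_dinucleotides(cds):
    """Tally dinucleotides at codon positions 1-2, 2-3 and 3-1 (across codons)."""
    seq = cds.upper().replace("T", "U")
    n = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = {"1-2": Counter(), "2-3": Counter(), "3-1": Counter()}
    for i in range(0, n, 3):
        counts["1-2"][seq[i:i + 2]] += 1
        counts["2-3"][seq[i + 1:i + 3]] += 1
        if i + 3 < n:  # 3-1 needs the first base of the next codon
            counts["3-1"][seq[i + 2:i + 4]] += 1
    return counts


def frequency(counts, dinuc):
    """Share of one dinucleotide among all dinucleotides in a position class."""
    total = sum(counts.values())
    return counts[dinuc] / total if total else 0.0
```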
Substitutes of CpG in the evolution of human H1N1 virus: mutant dinucleotide counts by SNP position.
| SNP position | UpG (39.3%) | CpA (37.7%) | ApG (17.0%) |
|---|---|---|---|
| 1 | 9 | 41 | 28 |
| 2 | 7 | 15 | 3 |
| 3 | 155 | 108 | 43 |
Substitutes of UpA in the evolution of human H1N1 virus: mutant dinucleotide counts by SNP position.
| SNP position | UpG (41.6%) | CpA (30.1%) | UpC (9.0%) |
|---|---|---|---|
| 1 | 82 | 13 | 15 |
| 2 | 3 | 20 | 3 |
| 3 | 179 | 158 | 39 |
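The two tables above tally the dinucleotides that replaced CpG or UpA, broken down by the position of the single-nucleotide change. A hedged Python sketch of such a tally, assuming a pair of aligned, in-frame ancestral and descendant sequences and reading "SNP position" as the codon position (1 to 3) of the changed base; how the authors obtained ancestral sequences is not described here, and `substitutes` is a hypothetical name.

```python
from collections import Counter


def substitutes(ancestor, descendant, lost="CG"):
    """Count (new dinucleotide, codon position of the SNP) for each lost site.

    Assumes both sequences are aligned, equal in length and start at codon
    position 1; only sites changed by a single nucleotide are counted.
    """
    a = ancestor.upper().replace("T", "U")
    d = descendant.upper().replace("T", "U")
    if len(a) != len(d):
        raise ValueError("sequences must be aligned to the same length")
    tally = Counter()
    for i in range(len(a) - 1):
        if a[i:i + 2] != lost or d[i:i + 2] == lost:
            continue  # not an ancestral site, or the site was retained
        new = d[i:i + 2]
        changed = [j for j in (i, i + 1) if a[j] != d[j]]
        if len(changed) != 1 or "-" in new:
            continue  # skip double changes and alignment gaps
        tally[(new, changed[0] % 3 + 1)] += 1
    return tally
```

For instance, a C-to-U change at the first base of a CpG gives a `("UG", position)` entry, matching the UpG columns above.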
Figure 4. The 3-1 dinucleotide composition of the U/C-ended synonymous codons (boxplot). UpA and CpG are much less represented than their corresponding counterparts, CpA and UpG, respectively.
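In the standard genetic code, every codon ending in U has a synonymous partner ending in C (for example, GAU and GAC both encode Asp), so the U/C choice at the third position leaves the protein unchanged while setting the first base of the 3-1 dinucleotide. The sketch below tallies those 3-1 dinucleotides (CpA, UpA, CpC, UpC, CpG, UpG, CpU, UpU) for one in-frame coding sequence. It is an illustrative assumption: the per-sequence averaging or normalization behind the values in the next table is not reproduced, and `uc_ended_3_1` is a hypothetical name.

```python
from collections import Counter


def uc_ended_3_1(cds):
    """Count 3-1 dinucleotides whose first base is a codon-final U or C."""
    seq = cds.upper().replace("T", "U")
    n = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    tally = Counter()
    # i walks over third codon positions that still have a following codon.
    for i in range(2, n - 1, 3):
        if seq[i] in "UC":
            tally[seq[i:i + 2]] += 1
    return tally
```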
The mean 3-1 dinucleotide occurrence of the U/C-ended synonymous codons.
| Segment | Host | CpA | UpA | CpC | UpC | CpG | UpG | CpU | UpU |
|---|---|---|---|---|---|---|---|---|---|
| PB2 | Avian | | | 2.84 | 2.22 | | | 1.63 | 1.8 |
| PB2 | Human H2N2 | | | 2.26 | 2.89 | | | 1.26 | 2.07 |
| PB2 | Pandemic H1 | | | 2.98 | 2.18 | | | 1.51 | 1.83 |
| PB2 | Seasonal H1 | | | 2.39 | 2.57 | | | 1.83 | 1.7 |
| PB2 | Seasonal H3 | | | 2.5 | 2.55 | | | | |
| PB1 | Avian | | | 2.51 | 2.81 | | | 2.66 | 1.71 |
| PB1 | Human H2N2 | | | 2.2 | 2.99 | | | 2.98 | 1.36 |
| PB1 | Pandemic H1 | | | 2.26 | 2.92 | | | 2.4 | 2.31 |
| PB1 | Seasonal H1 | | | 2.58 | 2.3 | | | 2.47 | 2.68 |
| PB1 | Seasonal H3 | | | 2.63 | 2.4 | | | 2.52 | 2.21 |
| PA | Avian | | | 2.35 | 1.98 | | | 3.27 | 2.44 |
| PA | Human H2N2 | | | 1.57 | 2.35 | | | 2.83 | 3.16 |
| PA | Pandemic H1 | | | | | | | 3 | 2.5 |
| PA | Seasonal H1 | | | | | | | 2.73 | 3.57 |
| PA | Seasonal H3 | | | 1.99 | 1.69 | | | | |
| HA | Avian H1 | | | 2.58 | 2.37 | | | 2.38 | 2.46 |
| HA | Avian H3 | | | 2.83 | 2.42 | | | 2.1 | 1.43 |
| HA | Human H2N2 | | | 1.88 | 2.37 | | | 1.53 | 1.8 |
| HA | Pandemic H1 | | | 2.53 | 2.41 | | | 2.02 | 2.32 |
| HA | Seasonal H1 | | | 2.28 | 2.57 | | | 2.34 | 1.8 |
| HA | Seasonal H3 | | | | | | | 1.3 | 1.84 |
| NP | Avian | | | 2.01 | 1.65 | | | 1.66 | 1 |
| NP | Human H2N2 | | | | | | | 1.31 | 1.19 |
| NP | Pandemic H1 | | | 2.08 | 1.25 | | | 1.67 | 1.06 |
| NP | Seasonal H1 | | | 1.33 | 1.83 | | | 1.37 | 1.19 |
| NP | Seasonal H3 | | | 1.61 | 1.66 | | | 0.92 | 1.79 |
| NA | Avian N1 | | | | | | | 1.65 | 2.16 |
| NA | Avian N2 | | | 0.74 | 1.53 | | | | |
| NA | Human H2N2 | | | | | | | | |
| NA | Pandemic H1 | | | | | | | 2.46 | 1.51 |
| NA | Seasonal H1 | | | | | | | | |
| NA | Seasonal H3 | | | | | | | 1.43 | 2.03 |
| M | Avian | | | 0.61 | 0.76 | | | 0.16 | 0.36 |
| M | Human H2N2 | 1.43 | 0.74 | 0.17 | 0.97 | | | 0.18 | 0.32 |
| M | Pandemic H1 | | | 0.67 | 0.8 | | | 0 | 0.5 |
| M | Seasonal H1 | 1.12 | 0.92 | 0.18 | 1.14 | | | 0.16 | 0.34 |
| M | Seasonal H3 | 1.51 | 0.66 | 0.28 | 0.89 | | | 0.1 | 0.4 |
| NS | Avian | 1.41 | 0.71 | 0.77 | 0.69 | | | 0.57 | 0.34 |
| NS | Human H2N2 | 1.42 | 0.72 | 1 | 0.6 | | | 0.34 | 0.56 |
| NS | Pandemic H1 | 1.7 | 0.73 | 0.71 | 0.56 | | | 0.74 | 0.17 |
| NS | Seasonal H1 | 1.47 | 0.62 | 0.95 | 0.9 | | | 0.37 | 0.48 |
| NS | Seasonal H3 | | | 0.74 | 0.82 | | | 0.34 | 0.5 |
Dinucleotide pairs with a significant difference are shown in bold (P < 0.05 in a Mann–Whitney U test and a difference in mean values > 1).
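The bolding rule combines a significance test with an effect-size cut-off. A self-contained Python sketch of that rule, assuming that per-sequence occurrence values for the two dinucleotides in a pair are compared with a two-sided Mann–Whitney U test. It uses the normal approximation with tie and continuity corrections, which can differ from an exact test on small samples; the helper names are hypothetical, and SciPy's `scipy.stats.mannwhitneyu` is a tested alternative.

```python
import math
from statistics import mean


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected)."""
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        # Find the block of tied values starting at i and give it the mean rank.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u1, 1.0  # every value tied: no evidence of a difference
    z = max(abs(u1 - mu) - 0.5, 0) / sigma  # continuity correction
    return u1, min(math.erfc(z / math.sqrt(2)), 1.0)


def is_bold(x, y, alpha=0.05, min_diff=1.0):
    """Caption's rule: P < alpha and a difference in means greater than min_diff."""
    _, p = mann_whitney_u(x, y)
    return p < alpha and abs(mean(x) - mean(y)) > min_diff
```

Requiring both conditions means a pair can be statistically different yet stay unbolded when the means are close, which is the point of the second criterion.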