Analysis of codon usage and nucleotide composition bias in polioviruses.

Jie Zhang, Meng Wang, Wen-qian Liu, Jian-hua Zhou, Hao-tai Chen, Li-na Ma, Yao-zhong Ding, Yuan-xing Gu, Yong-sheng Liu.

Abstract

BACKGROUND: Poliovirus, the causative agent of poliomyelitis, is a human enterovirus of the family Picornaviridae and is among the most rapidly evolving viruses known. Analysis of codon usage can reveal much about the molecular evolution of viruses. However, little information about the synonymous codon usage pattern of poliovirus genomes has been reported to date.
METHODS: The relative synonymous codon usage (RSCU) values, effective number of codons (ENC) values, nucleotide contents and dinucleotide abundances were investigated, and a comparative analysis of codon usage patterns was performed for the open reading frames (ORFs) of 48 poliovirus isolates, including 31 of genotype 1, 13 of genotype 2 and 4 of genotype 3.
RESULTS: The overall extent of codon usage bias in the poliovirus samples is low (mean ENC = 53.754 > 40). The general correlation between base composition and codon usage bias suggests that mutational pressure, rather than natural selection, is the main factor determining codon usage bias in these polioviruses. Based on the RSCU data, codon usage bias varies significantly among the three genotypes. Geographic factors also have some effect on the codon usage pattern (observed within genotype 1). Neither gene length nor virus type (vaccine-derived polioviruses (VDPVs), wild viruses and live attenuated virus) had a significant effect on the variation of synonymous codon usage in the virus genes. The relative abundance of the CpG dinucleotide in the poliovirus ORFs is far below the expected value, especially in VDPVs and the attenuated virus of genotype 1.
CONCLUSION: The information from this study may not only have theoretical value for understanding poliovirus evolution, especially for VDPVs of genotype 1, but may also have potential value for the development of poliovirus vaccines.

Year:  2011        PMID: 21450075      PMCID: PMC3079669          DOI: 10.1186/1743-422X-8-146

Source DB:  PubMed          Journal:  Virol J        ISSN: 1743-422X            Impact factor:   4.099


Background

When molecular sequence data started to accumulate nearly 20 years ago, it was noted that synonymous codons are not used equally in different genomes, or even in different genes of the same genome [1-3]. Synonymous codon usage bias is an important evolutionary phenomenon that is known to exist in a wide range of biological systems, from prokaryotes to eukaryotes [4,5]. Codon usage analysis has been applied to prokaryotes and eukaryotes such as Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Caenorhabditis elegans and humans [6-8]. The observed patterns of synonymous codon usage vary among genes within a genome and among genomes. Codon usage is attributable to the equilibrium between natural selection and mutation pressure [9,10]. Recent studies of viral codon usage have shown that mutation bias may be a more important factor than natural selection in determining the codon usage bias of some viruses, such as Picornaviridae, Pestivirus, plant viruses and vertebrate DNA viruses [9,11-13]. A recent report also showed that G+C compositional constraint is the main factor determining codon usage bias in iridovirus genomes [11,14]. Analysis of codon usage can reveal much about the molecular evolution of viruses and of their individual genes.

Polioviruses belong to the family Picornaviridae and are classified within the species human enterovirus C (HEV-C) in the genus Enterovirus according to the current taxonomy [15,16]. Polioviruses can be divided into three genotypes: 1, 2 and 3. The genome of each genotype is a single positive-stranded RNA of approximately 6 kb, consisting of a single large open reading frame (ORF) flanked by 5' and 3' untranslated regions [17]. The Sabin oral poliovaccine (OPV) is among the best-known viral vaccines [18]. It has saved the lives and health of innumerable people, children in particular. However, poliovirus is highly genetically variable. OPV viruses may be transformed into circulating, highly diverged vaccine-derived polioviruses (VDPVs) whose properties are hardly distinguishable from those of wild polioviruses [19]. Little information about the synonymous codon usage pattern of poliovirus genomes has been reported to date. To our knowledge, this is the first report of a codon usage analysis of polioviruses (including wild strains, attenuated live vaccine strains and VDPV strains). In this study, we analyzed the codon usage data and base composition of 48 representative complete poliovirus ORFs to obtain clues to the genetic evolution of the virus.

Methods

Sequence data

A total of 48 poliovirus genomes were used in this study (Table 1). The serial number (SN), genotype, length, region of isolation, GenBank accession number and other details of these strains are listed in Table 1. All sequences were downloaded from NCBI GenBank (http://www.ncbi.nlm.nih.gov/Genbank/). Other available sequences with >98% sequence identity were excluded.
Table 1

The information of 48 polioviruses genomes used in this study

| SN | Strain | Genotype | Length (a) | Isolation | Note | Accession No. |
|---|---|---|---|---|---|---|
| 1 | CHN-Henan/91-3 | 1 | 6630 | China | W Virus (b) | AF111983 |
| 2 | CHN-Jiangxi/89-1 | 1 | 6630 | China | W Virus (b) | AF111984 |
| 3 | P1W/Bar65 (19276) | 1 | 6630 | Belarus | VDPV | AY278553 |
| 4 | HAI01008C2 | 1 | 6630 | Haiti | VDPV | AF405662 |
| 5 | HAI01007 | 1 | 6630 | Haiti | VDPV | AF405666 |
| 6 | HAI01002 | 1 | 6630 | Haiti | VDPV | AF405667 |
| 7 | HAI01001 | 1 | 6630 | Haiti | VDPV | AF405668 |
| 8 | HAI00003 | 1 | 6630 | Haiti | VDPV | AF405669 |
| 9 | DOR01012 | 1 | 6630 | Dominica | VDPV | AF405670 |
| 10 | DOR00041C3 | 1 | 6630 | Dominica | VDPV | AF405679 |
| 11 | DOR00028 | 1 | 6630 | Dominica | VDPV | AF405684 |
| 12 | 99/056-252-14 | 1 | 6630 | Russia | VDPV | AF462418 |
| 13 | RUS-1161-96-001 | 1 | 6630 | Russia | VDPV | AF462419 |
| 14 | HAI01-13 | 1 | 6630 | Haiti | VDPV | AF416342 |
| 15 | TCDCE01-135 | 1 | 6630 | C Taiwan (c) | VDPV | AF538840 |
| 16 | TCDC01-113 | 1 | 6630 | C Taiwan (c) | VDPV | AF538841 |
| 17 | TCDC01-330 | 1 | 6630 | C Taiwan (c) | VDPV | AF538842 |
| 18 | TCDC01-861 | 1 | 6630 | C Taiwan (c) | VDPV | AF538843 |
| 19 | Sabin 1 | 1 | 6630 | USA | Vaccine (d) | AY184219 |
| 20 | Brunhilde | 1 | 6630 | China | W Virus (b) | AY560657 |
| 21 | USA10784 | 1 | 6630 | USA | VDPV | EF682356 |
| 22 | USA10785 | 1 | 6630 | USA | VDPV | EF682357 |
| 23 | USA10783 | 1 | 6630 | USA | VDPV | EF682358 |
| 24 | USA10786 | 1 | 6630 | USA | VDPV | EF682359 |
| 25 | CHN8229-3/GZ/CHN/2004 | 1 | 6630 | China | VDPV | FJ769381 |
| 26 | 10050 | 1 | 6630 | China | VDPV | FJ859058 |
| 27 | 10091c | 1 | 6630 | China | VDPV | FJ859060 |
| 28 | 10092c | 1 | 6630 | China | VDPV | FJ859061 |
| 29 | 10094c | 1 | 6630 | China | VDPV | FJ859062 |
| 30 | 10095c | 1 | 6630 | China | VDPV | FJ859063 |
| 31 | 10097c | 1 | 6630 | China | VDPV | FJ859064 |
| 32 | EGY88-074 | 2 | 6624 | Egypt | VDPV | AF448782 |
| 33 | EGY93-034 | 2 | 6624 | Egypt | VDPV | AF448783 |
| 34 | P2S/Mog65-3 (20120) | 2 | 6624 | Belarus | VDPV | AY278549 |
| 35 | P2S/Mog66-4 (21043) | 2 | 6624 | Belarus | VDPV | AY278551 |
| 36 | P2S/Mog65-2 (20077) | 2 | 7439 | Belarus | VDPV | AY278552 |
| 37 | NIE0210766 | 2 | 6624 | Nigeria | VDPV | DQ890385 |
| 38 | NIE0110767 | 2 | 6624 | Nigeria | VDPV | DQ890386 |
| 39 | USA9810768 | 2 | 6624 | USA | VDPV | DQ890387 |
| 40 | PER8310769 | 2 | 6624 | Peru | VDPV | DQ890388 |
| 41 | 32191 | 2 | 6624 | Belarus | VDPV | FJ460223 |
| 42 | 32189+AP1 | 2 | 6624 | Belarus | VDPV | FJ460224 |
| 43 | 31996 | 2 | 6624 | Belarus | VDPV | FJ460225 |
| 44 | PV2/Rus | 2 | 6624 | Russia | VDPV | FJ517649 |
| 45 | Sabin 3 | 3 | 6621 | USA | Vaccine (d) | AY184221 |
| 46 | 33239 | 3 | 6621 | Belarus | VDPV | FJ460226 |
| 47 | 31974 | 3 | 6621 | Belarus | VDPV | FJ460227 |
| 48 | FIN84-60212 | 3 | 6621 | Finland | VDPV | FJ842158 |

Note: (a) the length values exclude non-coding sequence; (b) wild strain; (c) stands for China Taiwan; (d) attenuated live vaccine strain.

The actual and predicted values of the effective number of codon (ENC)

The ENC measures the degree of departure from equal use of synonymous codons in the coding regions of polioviruses. ENC values range from 20 to 61. In an extremely biased gene where only one codon is used for each amino acid, the value would be 20; if all codons are used equally, it would be 61. An ENC value greater than 40 was regarded as indicating low codon usage bias. ENC values were obtained with the EMBOSS CHIPS program [20]. Genes whose codon choice is constrained only by mutation bias will lie on or just below the curve of the predicted values. The predicted values of ENC were calculated as

$$\mathrm{ENC} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}}$$

where s represents the given (G+C)3% value, expressed as a fraction [21].
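As an illustration of the predicted ENC curve, the following minimal Python sketch evaluates Wright's expression, ENC = 2 + s + 29/(s^2 + (1 - s)^2), for a given third-position G+C fraction s. The function name `expected_enc` is hypothetical and is not part of EMBOSS CHIPS, which the authors used only for the observed ENC values.

```python
def expected_enc(gc3: float) -> float:
    """Expected ENC when only GC3 composition constrains codon usage.

    gc3 is the G+C fraction at third codon positions (0 to 1), not a percentage.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


if __name__ == "__main__":
    # Illustrative GC3 fractions; 0.462 is the value listed for strain 1.
    for s in (0.30, 0.462, 0.50):
        print(f"GC3 = {s:.3f} -> expected ENC = {expected_enc(s):.2f}")
```

A strain whose observed ENC falls below `expected_enc` at its own GC3 lies under the curve in an ENC-plot.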

The calculation of the relative synonymous codon usage (RSCU)

To investigate the pattern of relative synonymous codon usage (RSCU) without the influence of amino acid composition among all poliovirus samples, the RSCU values of the codons in each poliovirus ORF were calculated according to the formula of previous reports [22,23]:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{i=1}^{n_j} g_{ij}} \, n_j$$

where $g_{ij}$ is the observed number of the ith codon for the jth amino acid, which has $n_j$ types of synonymous codons. A codon with an RSCU value greater than 1.0 has positive codon usage bias, while a value <1.0 indicates negative codon usage bias. An RSCU value equal to 1.0 means that the codon is chosen equally and randomly.
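The RSCU definition can be sketched in Python as follows. This is a minimal illustration and not the authors' pipeline: the helper name `rscu` and the compact encoding of the standard genetic code are assumptions made here, and the function returns values for the 59 sense codons that have synonymous alternatives (Met, Trp and stop codons are left out).

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in UCAG order (first base slowest).
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(orf: str) -> dict[str, float]:
    """Return RSCU for each of the 59 synonymous sense codons of one ORF."""
    seq = orf.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons by amino acid, skipping stop, Met and Trp.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        for c in family:
            # g_ij / sum(g_ij) * n_j; undefined when the amino acid is absent.
            values[c] = counts[c] * len(family) / total if total else float("nan")
    return values


if __name__ == "__main__":
    toy = rscu("TTTTTTTTC")  # three Phe codons: UUU, UUU, UUC
    print(toy["UUU"], toy["UUC"])
```

Serine, leucine and arginine are treated here as single six-codon families, which matches the way their RSCU values sum to about 6 in Table 2.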

Relative dinucleotide abundance in polioviruses

Because dinucleotide biases can affect codon bias, the relative abundance of dinucleotides in the coding regions of poliovirus genomes was assessed using the method described by Karlin and Burge [24]. The actual and expected frequencies of the 16 dinucleotides in the coding regions of the 48 poliovirus genomes were compared. For each dinucleotide, the odds ratio $\rho_{xy} = f_{xy} / (f_x f_y)$ was calculated, where $f_x$ denotes the frequency of nucleotide X, $f_y$ the frequency of nucleotide Y, $f_x f_y$ the expected frequency of the dinucleotide XY, and $f_{xy}$ the observed frequency of the dinucleotide XY. As a conservative criterion, for $\rho_{xy}$ > 1.23 (or < 0.78) the XY pair is considered to be over-represented (or under-represented) in relative abundance compared with a random association of mononucleotides.
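The odds ratio and its thresholds can be illustrated with a short Python sketch. It is an assumption-laden example, not the authors' code: the names `dinucleotide_odds` and `classify` are hypothetical, frequencies are taken from the given strand only (no strand symmetrization), and ambiguous bases are simply counted in the sequence length.

```python
from collections import Counter
from itertools import product


def dinucleotide_odds(seq: str) -> dict[str, float]:
    """Odds ratio rho_xy = f_xy / (f_x * f_y) for all 16 dinucleotides."""
    seq = seq.upper().replace("U", "T")
    n = len(seq)
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(n - 1))  # overlapping pairs
    odds = {}
    for x, y in product("ACGT", repeat=2):
        fx, fy = mono[x] / n, mono[y] / n
        fxy = di[x + y] / (n - 1)
        odds[x + y] = fxy / (fx * fy) if fx and fy else float("nan")
    return odds


def classify(rho: float, low: float = 0.78, high: float = 1.23) -> str:
    """Apply the conservative thresholds quoted in the text."""
    if rho < low:
        return "under-represented"
    if rho > high:
        return "over-represented"
    return "normal"


if __name__ == "__main__":
    toy = dinucleotide_odds("AACCGGTT")
    for pair in ("CG", "GC", "TA"):
        print(pair, round(toy[pair], 3), classify(toy[pair]))
```

Applied to the mean values reported for the 48 ORFs, `classify(0.490)` (CpG) gives "under-represented" and `classify(1.423)` (TpG) gives "over-represented".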

Statistical analysis

Principal component analysis (PCA) was carried out to analyze the major trends in codon usage pattern among the poliovirus genomes (excluding non-coding regions). PCA is a statistical method that performs a linear mapping to extract optimal features from an input distribution in the mean squared error sense, and it can be used by self-organizing neural networks to form unsupervised neural preprocessing modules for classification problems [6]. To minimize the effect of amino acid composition on codon usage, each ORF was represented as a 59-dimensional vector, in which each dimension corresponds to the RSCU value of one sense codon, excluding those for Met and Trp and the three stop codons. Linear regression analysis was used to test for a correlation between codon usage bias and gene length. Correlation analysis, based on Spearman's rank correlation, was used to identify the relationship between codon usage bias and synonymous codon usage pattern. All statistical analyses were carried out using SPSS Version 17.0.
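For readers without SPSS, Spearman's rank correlation can be sketched in plain Python as the Pearson correlation of average ranks. This is a minimal illustration under stated assumptions: the function names are hypothetical, ties receive the mean of their rank positions, no p-value is computed, and the short input lists are the U and U3 fractions of the first five strains only, used as sample data rather than to reproduce any reported coefficient.

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if an input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    u = [0.236, 0.236, 0.238, 0.236, 0.233]    # U content, strains 1-5
    u3 = [0.263, 0.264, 0.264, 0.263, 0.258]   # U3 content, strains 1-5
    print(f"Spearman rho (U vs U3, five strains) = {spearman(u, u3):.3f}")
```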

Results

The characteristics of synonymous codon usage in polioviruses

To investigate the extent of codon usage bias in polioviruses, the RSCU values of all codons in the 48 poliovirus strains were calculated. Only two preferred codons, UUG (Leu) and GUG (Val), have G at the third position, and A-ended codons predominate among the preferred codons (Table 2). Moreover, the poliovirus genome is A-rich, with A content ranging from 29.739% to 30.826% (mean = 30.367, S.D. = 0.234); in contrast, G content is low, ranging from 21.723% to 22.401% (mean = 22.118, S.D. = 0.147), suggesting that nucleotide content influences the pattern of synonymous codon usage (Table 3). The ENC values among these poliovirus ORFs are similar, varying from 52.609 to 55.105 with a mean of 53.754 and S.D. of 0.545. These data show that the extent of codon preference in poliovirus genes is essentially stable.
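Table 2

Synonymous codon usage in the coding region of polioviruses

| AA (a) | Codon | RSCU (b) | AA | Codon | RSCU (b) |
|---|---|---|---|---|---|
| Phe | **UUU** | 1.020 | Gln | **CAA** | 1.075 |
| | UUC | 0.980 | | CAG | 0.925 |
| Leu | UUA | 0.914 | His | CAU | 0.787 |
| | **UUG** | 1.349 | | **CAC** | 1.213 |
| | CUU | 0.566 | Asn | AAU | 0.900 |
| | CUC | 0.909 | | **AAC** | 1.100 |
| | CUA | 1.023 | Lys | **AAA** | 1.050 |
| | CUG | 1.072 | | AAG | 0.95 |
| Val | GUU | 0.441 | Asp | GAU | 0.961 |
| | GUC | 0.762 | | **GAC** | 1.039 |
| | GUA | 0.735 | Glu | **GAA** | 1.057 |
| | **GUG** | 1.657 | | GAG | 0.943 |
| Ser (c) | UCU | 0.785 | Arg | **AGA** | 2.868 |
| | UCC | 1.345 | | AGG | 1.471 |
| | **UCA** | 1.749 | | CGU | 0.434 |
| | UCG | 0.424 | | CGC | 0.577 |
| | AGU | 0.920 | | CGA | 0.268 |
| | AGC | 0.777 | | CGG | 0.381 |
| Pro | CCU | 0.800 | Cys | **UGU** | 1.105 |
| | CCC | 0.799 | | UGC | 0.895 |
| | **CCA** | 1.884 | Tyr | UAU | 0.847 |
| | CCG | 0.517 | | **UAC** | 1.153 |
| Thr | ACU | 1.170 | Ala | GCU | 1.161 |
| | **ACC** | 1.330 | | GCC | 0.969 |
| | ACA | 1.124 | | **GCA** | 1.438 |
| | ACG | 0.376 | | GCG | 0.432 |
| Gly | GGU | 1.160 | Ile | **AUU** | 1.247 |
| | GGC | 0.757 | | AUC | 1.049 |
| | **GGA** | 1.175 | | AUA | 0.705 |
| | GGG | 0.909 | | | |

Note: Boldface marks the preferred codon compared with the other synonymous codons.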

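Table 3

Nucleotide contents in ORFs of 48 poliovirus genomes

| No. | A | G | U | C | A3 | C3 | G3 | U3 | A+U | G+C | (G+C)3 | ENC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.305 | 0.220 | 0.236 | 0.239 | 0.275 | 0.270 | 0.192 | 0.263 | 0.541 | 0.459 | 0.462 | 53.082 |
| 2 | 0.305 | 0.218 | 0.236 | 0.242 | 0.269 | 0.275 | 0.191 | 0.264 | 0.540 | 0.460 | 0.467 | 53.749 |
| 3 | 0.300 | 0.219 | 0.238 | 0.242 | 0.263 | 0.283 | 0.190 | 0.264 | 0.538 | 0.462 | 0.473 | 53.709 |
| 4 | 0.301 | 0.222 | 0.236 | 0.241 | 0.269 | 0.279 | 0.189 | 0.263 | 0.537 | 0.463 | 0.468 | 54.085 |
| 5 | 0.308 | 0.221 | 0.233 | 0.238 | 0.285 | 0.268 | 0.189 | 0.258 | 0.541 | 0.459 | 0.457 | 53.936 |
| 6 | 0.300 | 0.223 | 0.233 | 0.244 | 0.265 | 0.289 | 0.192 | 0.253 | 0.533 | 0.467 | 0.481 | 53.506 |
| 7 | 0.300 | 0.223 | 0.236 | 0.241 | 0.267 | 0.279 | 0.191 | 0.263 | 0.536 | 0.464 | 0.470 | 53.929 |
| 8 | 0.305 | 0.220 | 0.233 | 0.242 | 0.281 | 0.283 | 0.185 | 0.250 | 0.538 | 0.462 | 0.469 | 53.506 |
| 9 | 0.301 | 0.222 | 0.232 | 0.244 | 0.270 | 0.287 | 0.189 | 0.253 | 0.533 | 0.467 | 0.477 | 53.592 |
| 10 | 0.303 | 0.220 | 0.232 | 0.244 | 0.275 | 0.288 | 0.186 | 0.252 | 0.535 | 0.465 | 0.474 | 53.308 |
| 11 | 0.303 | 0.221 | 0.234 | 0.242 | 0.273 | 0.283 | 0.187 | 0.257 | 0.537 | 0.463 | 0.470 | 53.389 |
| 12 | 0.302 | 0.219 | 0.240 | 0.238 | 0.270 | 0.264 | 0.188 | 0.278 | 0.542 | 0.458 | 0.452 | 54.486 |
| 13 | 0.304 | 0.220 | 0.237 | 0.240 | 0.274 | 0.268 | 0.188 | 0.270 | 0.541 | 0.459 | 0.456 | 54.637 |
| 14 | 0.301 | 0.222 | 0.234 | 0.243 | 0.268 | 0.284 | 0.191 | 0.257 | 0.535 | 0.465 | 0.474 | 53.640 |
| 15 | 0.305 | 0.222 | 0.235 | 0.238 | 0.279 | 0.267 | 0.193 | 0.261 | 0.540 | 0.460 | 0.460 | 52.948 |
| 16 | 0.307 | 0.221 | 0.236 | 0.237 | 0.281 | 0.264 | 0.191 | 0.263 | 0.543 | 0.457 | 0.456 | 53.822 |
| 17 | 0.305 | 0.222 | 0.236 | 0.237 | 0.279 | 0.264 | 0.195 | 0.263 | 0.541 | 0.459 | 0.458 | 53.194 |
| 18 | 0.305 | 0.222 | 0.240 | 0.233 | 0.279 | 0.254 | 0.194 | 0.273 | 0.545 | 0.455 | 0.448 | 53.054 |
| 19 | 0.308 | 0.219 | 0.231 | 0.241 | 0.282 | 0.274 | 0.189 | 0.254 | 0.540 | 0.460 | 0.464 | 53.359 |
| 20 | 0.305 | 0.219 | 0.236 | 0.240 | 0.274 | 0.277 | 0.190 | 0.259 | 0.540 | 0.460 | 0.467 | 54.470 |
| 21 | 0.305 | 0.222 | 0.234 | 0.240 | 0.276 | 0.273 | 0.194 | 0.257 | 0.538 | 0.462 | 0.467 | 53.840 |
| 22 | 0.305 | 0.221 | 0.234 | 0.239 | 0.276 | 0.272 | 0.193 | 0.258 | 0.539 | 0.461 | 0.466 | 53.705 |
| 23 | 0.305 | 0.222 | 0.234 | 0.239 | 0.276 | 0.273 | 0.194 | 0.257 | 0.539 | 0.461 | 0.466 | 53.745 |
| 24 | 0.306 | 0.221 | 0.233 | 0.240 | 0.280 | 0.273 | 0.191 | 0.256 | 0.539 | 0.461 | 0.464 | 53.546 |
| 25 | 0.305 | 0.222 | 0.232 | 0.241 | 0.276 | 0.273 | 0.193 | 0.257 | 0.537 | 0.463 | 0.466 | 53.349 |
| 26 | 0.303 | 0.223 | 0.233 | 0.241 | 0.273 | 0.275 | 0.196 | 0.256 | 0.535 | 0.465 | 0.471 | 53.914 |
| 27 | 0.303 | 0.223 | 0.232 | 0.241 | 0.273 | 0.275 | 0.196 | 0.256 | 0.535 | 0.465 | 0.471 | 53.800 |
| 28 | 0.302 | 0.224 | 0.231 | 0.243 | 0.273 | 0.279 | 0.196 | 0.252 | 0.533 | 0.467 | 0.475 | 54.002 |
| 29 | 0.304 | 0.222 | 0.233 | 0.241 | 0.275 | 0.274 | 0.194 | 0.257 | 0.536 | 0.464 | 0.468 | 53.752 |
| 30 | 0.303 | 0.224 | 0.231 | 0.242 | 0.273 | 0.278 | 0.196 | 0.253 | 0.534 | 0.466 | 0.474 | 53.895 |
| 31 | 0.303 | 0.223 | 0.233 | 0.241 | 0.274 | 0.274 | 0.195 | 0.258 | 0.536 | 0.464 | 0.469 | 53.803 |
| 32 | 0.303 | 0.220 | 0.234 | 0.244 | 0.273 | 0.281 | 0.180 | 0.265 | 0.537 | 0.463 | 0.462 | 53.837 |
| 33 | 0.304 | 0.220 | 0.237 | 0.239 | 0.280 | 0.273 | 0.180 | 0.267 | 0.541 | 0.459 | 0.453 | 53.339 |
| 34 | 0.303 | 0.221 | 0.237 | 0.238 | 0.276 | 0.269 | 0.183 | 0.272 | 0.541 | 0.459 | 0.452 | 54.287 |
| 35 | 0.298 | 0.222 | 0.235 | 0.245 | 0.260 | 0.284 | 0.184 | 0.271 | 0.534 | 0.466 | 0.469 | 53.712 |
| 36 | 0.297 | 0.222 | 0.238 | 0.242 | 0.274 | 0.272 | 0.181 | 0.273 | 0.535 | 0.465 | 0.453 | 55.105 |
| 37 | 0.302 | 0.221 | 0.236 | 0.240 | 0.271 | 0.276 | 0.184 | 0.269 | 0.539 | 0.461 | 0.460 | 54.092 |
| 38 | 0.302 | 0.222 | 0.235 | 0.241 | 0.270 | 0.276 | 0.187 | 0.266 | 0.537 | 0.463 | 0.464 | 54.774 |
| 39 | 0.305 | 0.220 | 0.236 | 0.240 | 0.280 | 0.272 | 0.178 | 0.269 | 0.541 | 0.459 | 0.450 | 54.418 |
| 40 | 0.304 | 0.222 | 0.237 | 0.237 | 0.274 | 0.266 | 0.191 | 0.270 | 0.541 | 0.459 | 0.457 | 53.926 |
| 41 | 0.304 | 0.221 | 0.237 | 0.239 | 0.277 | 0.267 | 0.185 | 0.272 | 0.540 | 0.460 | 0.452 | 54.478 |
| 42 | 0.303 | 0.221 | 0.237 | 0.239 | 0.276 | 0.268 | 0.184 | 0.273 | 0.541 | 0.459 | 0.452 | 54.450 |
| 43 | 0.304 | 0.220 | 0.237 | 0.239 | 0.276 | 0.268 | 0.184 | 0.272 | 0.541 | 0.459 | 0.453 | 54.463 |
| 44 | 0.306 | 0.219 | 0.236 | 0.239 | 0.280 | 0.271 | 0.181 | 0.268 | 0.542 | 0.458 | 0.452 | 52.838 |
| 45 | 0.300 | 0.224 | 0.235 | 0.241 | 0.270 | 0.274 | 0.194 | 0.263 | 0.535 | 0.465 | 0.467 | 52.609 |
| 46 | 0.303 | 0.221 | 0.238 | 0.238 | 0.272 | 0.267 | 0.190 | 0.271 | 0.541 | 0.459 | 0.457 | 52.735 |
| 47 | 0.301 | 0.222 | 0.237 | 0.240 | 0.270 | 0.271 | 0.191 | 0.268 | 0.538 | 0.462 | 0.462 | 54.245 |
| 48 | 0.303 | 0.223 | 0.232 | 0.242 | 0.278 | 0.281 | 0.192 | 0.248 | 0.535 | 0.465 | 0.474 | 53.968 |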

Compositional properties of ORFs of 48 polioviruses genomes

The values of A, U, C, G and G+C were compared with the values of A3, U3, C3, G3 and (G+C)3, respectively, and a complex pattern of correlations was observed. In detail, (G+C)3 correlates significantly with A, U, C, G and G+C, indicating that G+C content may reflect the interaction between mutation pressure and natural selection. However, A shows no correlation with A3, G3 or C3, and U shows no correlation with A3 (Table 4). Together, these results suggest that nucleotide constraint may influence synonymous codon usage in polioviruses. In addition, the correlation between Axis 1 (calculated by PCA) and the values of A, C, G, U, A3, C3, G3, U3, (G+C) and (G+C)3 of each strain was analyzed. Significant correlations were found between nucleotide composition and synonymous codon usage, except between Axis 1 and A content (Table 4). This analysis indicates that most of the codon usage bias among the ORFs of poliovirus strains is directly related to base composition. Finally, an ENC-plot [ENC plotted against (G+C)3%] was used as part of a general strategy to investigate patterns of synonymous codon usage; all of the points lie below the expected curve (Figure 1). These results imply that the codon bias can be explained mainly by uneven base composition, in other words by mutation pressure rather than natural selection.
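As a worked illustration of what "below the expected curve" means, take strain 1, for which the tabulated third-position G+C fraction is s = 0.462 and the observed ENC is 53.082. Assuming the expected curve is Wright's expression for ENC under GC3 constraint alone [21], the expected value at this composition is

$$\mathrm{ENC}_{\mathrm{exp}} = 2 + s + \frac{29}{s^{2} + (1 - s)^{2}} = 2 + 0.462 + \frac{29}{0.462^{2} + 0.538^{2}} = 2.462 + \frac{29}{0.502888} \approx 60.13$$

The observed value of 53.082 is about 7 units lower than this, so the point for strain 1 falls below the curve.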
Table 4

Correlation analysis between the A, U, C, G contents and A3, U3, C3, G3 contents in ORF of 48 polioviruses genomes

| | A3 | U3 | G3 | C3 | (G+C)3 | Axis 1 |
|---|---|---|---|---|---|---|
| A | r = -0.093 N | r = -0.303* | r = -0.169 N | r = -0.185 N | r = -0.287* | r = -0.126 |
| U | r = -0.078 N | r = 0.905** | r = -0.422** | r = -0.573** | r = -0.706** | r = -0.782** |
| G | r = -0.285* | r = -0.341* | r = 0.641** | r = 0.195 N | r = 0.777** | r = 0.556** |
| C | r = -0.529** | r = -0.509** | r = -0.014 N | r = 0.913** | r = 0.461** | r = 0.466** |
| G+C | r = -0.544** | r = -0.599** | r = 0.307* | r = 0.807** | r = -0.851** | r = 0.708** |
| Axis 1 | r = 0.541** | r = -0.700** | r = 0.360* | r = 0.401** | r = 0.502** | |

Note: ** means p < 0.01; * means 0.01 < p < 0.05; N means no correlation.

Figure 1

Graphs showing the relationship between the effective number of codons (ENC) and the GC content of the third codon position (GC3). The curve indicates the expected codon usage if GC compositional constraints alone account for codon usage bias.


Effect of other potential factors on codon usage

Principal component analysis was carried out to identify trends in codon usage bias among the ORFs. One major trend was detected in Axis 1, which accounted for 20.815% of the total variation, and another in Axis 2, which accounted for 16.273% of the total variation. A plot of Axis 1 against Axis 2 for each gene is shown in Additional file 1, Figure S1. Polioviruses belonging to the same genotype tend to cluster together (except strain 48, isolated from Finland). Compared with the more scattered genotype 1 strains, genotype 2 and 3 strains aggregate more tightly. Although the plot is somewhat complex, there appears to be a clear geographical demarcation within genotype 1, for example among the VDPV strains isolated from the USA, Dominica, mainland China and Taiwan. This may indicate that geography is another factor influencing codon usage bias.

The frequencies of occurrence of the dinucleotides were not randomly distributed, and no dinucleotide was present at its expected frequency. The frequencies of CpG and TpA were markedly low in the coding regions of the 48 poliovirus genomes (mean ± S.D. = 0.490 ± 0.012 and 0.748 ± 0.034, respectively; both < 0.78). The relative abundances of CpA and TpG also deviated slightly from the "normal range" (mean ± S.D. = 1.253 ± 0.032 and 1.423 ± 0.023, respectively) (Table 5). In addition, the RSCU values of the eight codons containing CpG (CCG, GCG, UCG, ACG, CGC, CGG, CGU and CGA) were analyzed to reveal the possible effect of CpG under-representation on codon usage bias. None of these eight codons was a preferred codon, and all were markedly suppressed. The six codons containing TpA (UUA, CUA, GUA, UAU, UAC and AUA) were generally suppressed as well. Conversely, the RSCU values of the eight codons containing CpA (UCA, CCA, ACA, GCA, CAA, CAG, CAU and CAC) and the five codons containing UpG (UUG, GUG, UGU, UGC and CUG) are high, and most of them (8 out of 13) are preferred codons (Table 2 and Table 5). In addition, compared with the VDPVs and the live attenuated strain of genotype 1, the wild viruses have higher frequencies of dinucleotides, including CpG (Figure 2 and Table 5).
Table 5

Relative abundance of the 16 dinucleotides in ORF of 48 polioviruses

| Dinucleotide | Range (a) | Mean ± S.D. (b) | VDPV 1 | Wild viruses | Vaccine |
|---|---|---|---|---|---|
| ApA | 0.909-0.965 | 0.935 ± 0.016 | 0.943 | 0.955 | 0.937 |
| ApG | 0.993-1.083 | 1.048 ± 0.019 | 1.058 | 1.073 | 1.060 |
| ApT | 0.955-1.041 | 0.989 ± 0.019 | 0.999 | 1.018 | 0.999 |
| ApC | 0.960-1.040 | 1.007 ± 0.016 | 1.013 | 1.030 | 1.010 |
| GpA | 0.987-1.094 | 1.035 ± 0.020 | 1.043 | 1.071 | 1.038 |
| GpG | 1.140-1.266 | 1.215 ± 0.024 | 1.228 | 1.249 | 1.221 |
| GpT | 0.986-1.079 | 1.023 ± 0.021 | 1.033 | 1.057 | 1.028 |
| GpC | 0.846-0.950 | 0.895 ± 0.028 | 1.228 | 1.249 | 1.221 |
| **CpA** | 1.204-1.308 | 1.253 ± 0.032 | 1.270 | 1.293 | 1.268 |
| **CpG** | 0.418-0.538 | 0.490 ± 0.012 | 0.499 | 0.522 | 0.496 |
| CpT | 0.920-1.039 | 0.979 ± 0.033 | 0.995 | 1.022 | 0.997 |
| CpC | 0.952-1.035 | 0.991 ± 0.012 | 0.999 | 1.021 | 0.995 |
| **TpA** | 0.669-0.801 | 0.748 ± 0.034 | 0.766 | 0.787 | 0.766 |
| **TpG** | 1.386-1.474 | 1.423 ± 0.023 | 1.434 | 1.457 | 1.427 |
| TpT | 1.059-1.164 | 1.106 ± 0.029 | 1.122 | 1.144 | 1.118 |
| TpC | 0.824-0.973 | 0.914 ± 0.026 | 0.925 | 0.953 | 0.916 |

Note: Boldface means that the dinucleotide was over-represented or under-represented. (a) Range of the relative dinucleotide ratios in the coding regions of the 48 polioviruses. (b) Mean ± S.D. of the relative dinucleotide ratios in the coding regions of the 48 polioviruses.

Figure 2

Comparison of the relative dinucleotide abundance in polioviruses: VDPVs genotype 1, live attenuated virus genotype 1, wild viruses genotype 1, VDPVs genotype 2, VDPVs genotype 3 and live attenuated virus genotype 3.

Furthermore, we performed a linear regression analysis of ENC value against gene length for the ORFs of the 48 poliovirus genomes. There was no significant correlation between codon usage and gene length in these virus genes (Spearman P > 0.05).

Discussion

Studies of synonymous codon usage in viruses can reveal much about viral genomes [25]. The overall codon usage among the 48 poliovirus ORFs was analyzed in this study. First, the ENC values of all poliovirus samples were analyzed, and the results showed that the majority of polioviruses do not have a strong codon bias (mean ENC = 53.754 > 40). Published data show similarly weak codon usage bias in other RNA viruses, such as BVDV, H5N1 influenza virus and SARS-CoV, with mean ENC values of 51.43, 50.91 and 48.99, respectively; one possible explanation is that weak codon bias helps an RNA virus replicate efficiently in vertebrate host cells, which may have distinct codon preferences [26-28]. Natural selection and mutation pressure are thought to be the main factors that account for codon usage variation in different organisms [29-31]. In this study, the general association between codon usage bias and base composition suggests that mutational pressure, rather than natural selection, is the main factor shaping the codon usage pattern of polioviruses. Codon usage can also be strongly influenced by underlying biases in dinucleotide frequency, which differ greatly among organisms. Specifically, after accounting for dinucleotide biases, the proportion of codon usage bias explained by mutation pressure often increases, as seen in human RNA viruses [25]. Our study revealed that CpG and the eight CpG-containing codons are notably deficient in the ORFs of the 48 poliovirus genomes. One explanation for the CpG deficiency is immune escape. A high CpG content may be detrimental to small DNA (or RNA) viruses, as unmethylated CpGs are recognized by the host's innate immune system (Toll-like receptor 9) as a pathogen signature [32]. As with vertebrate genomes, methylated viral genomes would face a high chance of mutation at CpGs, which would reduce the frequency of this dinucleotide [9,33].

We found that VDPVs and the live attenuated virus of genotype 1 have lower CpG frequencies than wild viruses of genotype 1. A possible explanation is that, as OPV viruses turn into genotype 1 VDPVs, a lower CpG frequency may help the VDPVs evade host immunity. Although this is speculative, some researchers have found that reducing the rate of poliovirus protein synthesis through large-scale use of non-optimal codons attenuates viral virulence by lowering specific infectivity [34]. Therefore, the information from this study may not only have theoretical value for understanding poliovirus evolution (especially for VDPVs of genotype 1), but may also have practical value for the development of poliovirus vaccines. However, a more comprehensive analysis is needed to reveal more about codon usage bias variation within poliovirus and the other factors responsible.

Conclusions

The information from this study may not only help to understand the evolution of poliovirus, especially of VDPVs of genotype 1, but may also have potential value for the development of poliovirus vaccines.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

JZ, MW and YL designed the study and drafted the manuscript. WL, JZ, HC and LM collected the data and participated in the sequence alignment. YD and YG performed the statistical analysis. All authors read and approved the final manuscript.

Additional file 1

Figure S1. A plot of the values of Axis 1 (20.82%) and Axis 2 (16.27%) of each ORF in the principal component analysis.
  33 in total

Review 1.  DNA methylation and the Epstein-Barr virus.

Authors:  R F Ambinder; K D Robertson; Q Tao
Journal:  Semin Cancer Biol       Date:  1999-10       Impact factor: 15.707

2.  The 'effective number of codons' used in a gene.

Authors:  F Wright
Journal:  Gene       Date:  1990-03-01       Impact factor: 3.688

3.  What drives codon choices in human genes?

Authors:  S Karlin; J Mrázek
Journal:  J Mol Biol       Date:  1996-10-04       Impact factor: 5.469

4.  Individual variation in inbreeding depression: the roles of inbreeding history and mutation.

Authors:  S T Schultz; J H Willis
Journal:  Genetics       Date:  1995-11       Impact factor: 4.562

Review 5.  Genetics of poliovirus.

Authors:  E Wimmer; C U Hellen; X Cao
Journal:  Annu Rev Genet       Date:  1993       Impact factor: 16.830

Review 6.  Codon usage: mutational bias, translational selection, or both?

Authors:  P M Sharp; M Stenico; J F Peden; A T Lloyd
Journal:  Biochem Soc Trans       Date:  1993-11       Impact factor: 5.407

7.  Codon usage in regulatory genes in Escherichia coli does not reflect selection for 'rare' codons.

Authors:  P M Sharp; W H Li
Journal:  Nucleic Acids Res       Date:  1986-10-10       Impact factor: 16.971

Review 8.  Codon usage patterns in Escherichia coli, Bacillus subtilis, Saccharomyces cerevisiae, Schizosaccharomyces pombe, Drosophila melanogaster and Homo sapiens; a review of the considerable within-species diversity.

Authors:  P M Sharp; E Cowe; D G Higgins; D C Shields; K H Wolfe; F Wright
Journal:  Nucleic Acids Res       Date:  1988-09-12       Impact factor: 16.971

Review 9.  Dinucleotide relative abundance extremes: a genomic signature.

Authors:  S Karlin; C Burge
Journal:  Trends Genet       Date:  1995-07       Impact factor: 11.639

10.  Analysis of synonymous codon usage bias in Chlamydia.

Authors:  Hui Lü; Wei-Ming Zhao; Yan Zheng; Hong Wang; Mei Qi; Xiu-Ping Yu
Journal:  Acta Biochim Biophys Sin (Shanghai)       Date:  2005-01       Impact factor: 3.848
