
Analysis of human P[4]G2 rotavirus strains isolated in Brazil reveals codon usage bias and strong compositional constraints.

Mariela Martínez Gómez, Luis Fernando Lopez Tort, Eduardo de Mello Volotao, Ricardo Recarey, Gonzalo Moratorio, Héctor Musto, José Paulo G Leite, Juan Cristina.

Abstract

The Rotavirus genus belongs to the family Reoviridae and its genome consists of 11 segments of double-stranded RNA. Group A rotaviruses (RV-A) are the main etiological agent of acute viral gastroenteritis in infants and young children worldwide. Understanding the extent and causes of biases in codon usage is essential to the understanding of viral evolution. However, the factors shaping synonymous codon usage bias and nucleotide composition in human RV-A are currently unknown. To gain insight into these matters, we analyzed the codon usage and base composition constraints on the two genes that encode the two outer capsid proteins (VP4 [VP8*] and VP7) of 58 P[4]G2 RV-A strains isolated in Brazil and investigated the possible key evolutionary determinants of codon usage bias. The results revealed that the frequencies of codon usage in both RV-A proteins studied differ significantly from those used by human cells. To determine whether similar trends in codon usage are found when complete RV-A genomes are considered, we compared these results with those obtained from a dataset of 10 reference strains for which the complete sequences of the 11 segments are known. Similar results were obtained using capsid proteins or complete genomes. The general correlations found between the position of each sequence on the first axis generated by correspondence analysis and the relative dinucleotide abundances indicate that codon usage in RV-A can also be strongly influenced by underlying biases in dinucleotide frequencies. CpG- and GpC-containing codons are markedly suppressed. Thus, the results of this study suggest that RV-A genomic biases are the result of the evolution of genome composition in relation to host adaptation and the ability to escape antiviral cell responses.
Copyright © 2011 Elsevier B.V. All rights reserved.

Year:  2011        PMID: 21255687      PMCID: PMC7172681          DOI: 10.1016/j.meegid.2011.01.006

Source DB:  PubMed          Journal:  Infect Genet Evol        ISSN: 1567-1348            Impact factor:   3.342


Introduction

Group A rotaviruses (RV-A) are the main etiological agent of acute viral gastroenteritis in infants and young children worldwide (Aoki et al., 2009, CDC, 2008). The Rotavirus genus belongs to the family Reoviridae and its genome consists of 11 double-stranded RNA (dsRNA) gene segments encoding six structural (VP) and six non-structural proteins (NSP) (Estes and Kapikian, 2007). Based on the two genes that encode the outer neutralizing capsid proteins, VP4 and VP7, a widely used binary classification system was established for RV-A that defines G (from VP7, glycoprotein) and P (from VP4, protease-cleaved protein) genotypes (Estes and Kapikian, 2007). To date, at least 25 G and 32 P genotypes have been identified (Matthijnssens et al., 2009, Matthijnssens et al., 2008, Collins et al., 2010, Abe et al., 2009, Ursu et al., 2009, Esona et al., 2010). Five RV-A G genotypes (G1–G4 and G9) and two P genotypes (P[8] and P[4]) are prevalent worldwide (Santos and Hoshino, 2005, Leite et al., 2008, Iturriza-Gómara et al., 2009). Different surveillance studies with RV-A-positive samples have shown that genotype P[4]G2 reemerged in Brazil in 2005 and has since become predominant in this country (Carvalho-Costa et al., 2006, Gurgel et al., 2007, de Oliveira et al., 2008, Leite et al., 2008, Nakagomi et al., 2008, Mascarenhas et al., 2010). Due to the degeneracy of the genetic code, most amino acids are coded by more than one codon. Synonymous codons are not used randomly, and in several organisms natural selection seems to bias codon usage toward a certain subset of optimal codons, mainly in highly expressed genes (Stoletzki and Eyre-Walker, 2007). Two major models have been proposed to explain codon usage: the translation-related model and the mutational model (Wong et al., 2010).
Translational efficiency or translational accuracy bias may be due to the relationship between local tRNA abundance and major codon preference, wherein a particular codon of an amino acid family pairs most optimally with the most abundant tRNA (Ikemura, 1982). The discrepancies of codon usage could also be due to genome compositional constraints and mutational biases (Sharp et al., 1986). Understanding the extent and causes of biases in codon usage is essential to comprehend the interplay between viruses and the immune response (Shackelton et al., 2006). However, the factors shaping synonymous codon usage bias, like mutational pressure, nucleotide composition or translational selection are currently unknown for human RV-A. In order to gain insight into these matters, we analyzed the codon usage and base composition constraints of VP4 [VP8*] and VP7 gene sequences of 72 P[4]G2 RV-A strains isolated in Brazil and investigated the possible key evolutionary determinants of codon usage bias. In order to observe if similar trends of codon usage are found when RV-A complete genomes are considered, we compared these results with the ones found using a dataset of reference strains from which the complete sequences of the 11 segments are known. The results of these studies revealed a significant codon usage bias and compositional constraints in the human RV-A strains studied.

Materials and methods

Fecal samples, viral RNA extraction and PCR amplification

A total of 72 diarrheic stool specimens were collected from 1996 to 2009 from children up to 5 years old hospitalized with acute diarrhea. These samples were obtained from children from the States of Acre (AC), Alagoas (AL), Bahia (BA), Espirito Santo (ES), Maranhão (MA), Mato Grosso do Sul (MS), Minas Gerais (MG), Pernambuco (PE), Rio de Janeiro (RJ), Rio Grande do Sul (RS) and Sergipe (SE), and were genotyped as P[4]G2 as previously described (Fischer et al., 2000, Das et al., 1994). The viral dsRNA was extracted by the glass powder method (Boom et al., 1990). The dsRNA was reverse transcribed (RT) and amplified by polymerase chain reaction (PCR) using a pair of consensus primers corresponding to a conserved nucleotide sequence of the VP7 (Gouvea et al., 1990, Das et al., 1994) or VP4 (VP8*) (Gentsch et al., 1992, Gómez et al., 2010) genes. Temperature and time conditions for PCR amplifications were performed as originally described (Gouvea et al., 1990, Gentsch et al., 1992). Distilled Milli-Q water was used as a negative control in all steps, and recommended manipulations for PCR procedures were carried out as a precaution to avoid false-positive results.

Sequencing

DNA sequencing was performed with an ABI Prism Big Dye Terminator Cycle Sequencing Ready Reaction Kit® and an ABI Prism 3730 Genetic Analyzer (both from Applied Biosystems, Foster City, CA, USA). Sequences of the VP4 [VP8*] and VP7 genes were obtained using the same set of primers used in the RT-PCR. For strain names and accession numbers, see Supplementary Material, Table 1. From the initial 72 stool samples, a total of 58 VP4 [VP8*] and 60 VP7 sequences, 818 and 978 nucleotides in length, respectively, were obtained.

Codon usage analyses

The relative synonymous codon usage (RSCU) value of each codon in each gene (VP8* or VP7) was determined in order to measure the synonymous codon usage bias (Sharp and Li, 1986). This was done using the CodonW program (available at: http://mobyle.pasteur.fr). The RSCU values of the P[4]G2 RV-A VP8* and VP7 genes were compared with the corresponding values for human cells (International Human Genome Sequencing Consortium, 2001). The effective number of codons (ENC) and the frequency of G+C at synonymous variable third positions of codons (GC3S) (excluding Met, Trp, and termination codons) were also calculated with CodonW. ENC was used to quantify the codon usage bias of an ORF (Wright, 1990, Comeron and Aguade, 1998). Similarly, the fraction of the G+C nucleotides not involved in the GC3S fraction (GC12) was calculated. Relative dinucleotide frequencies were also calculated using this program as implemented in the Mobyle server (http://mobyle.pasteur.fr).
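The RSCU definition used above (the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally) can be illustrated with a minimal Python sketch. This is not the CodonW implementation; the function name `rscu` and the handling of partial trailing codons are assumptions made for illustration only.

```python
from collections import Counter

# Standard genetic code with RNA bases in the order U, C, A, G.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(seq):
    """Return {codon: RSCU} for an in-frame coding sequence (DNA or RNA).

    Illustrative helper (hypothetical name); stop codons are excluded and
    amino acids absent from the sequence are skipped.
    """
    seq = seq.upper().replace("T", "U")
    # Split into codons, ignoring an incomplete codon at the end.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group sense codons into synonymous families.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not present: RSCU is undefined
        for c in family:
            # observed count / (family total / family size)
            values[c] = counts[c] * len(family) / total
    return values
```

For example, a sequence with three UUU codons and one UUC codon gives RSCU values of 1.5 and 0.5 for the two Phe codons, while single-codon amino acids (Met, Trp) always give 1.0, as in Table 1.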

Correspondence analysis (COA)

The relationship between variables and samples can be obtained using multivariate statistical analysis. COA is a type of multivariate analysis that allows a geometrical representation of the sets of rows and columns in a dataset (Wong et al., 2010, Greenacre, 1984). Each ORF is represented as a 59-dimensional vector, and each dimension corresponds to the RSCU value of one codon (excluding AUG, UGG and stop codons). Major trends within a dataset can be determined using measures of relative inertia, and genes can be ordered according to their position along the axis of major inertia (Tao et al., 2009). COA was performed on the RSCU values of the ORFs studied using the CodonW program.
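To make the idea of a "first axis" and its relative inertia concrete, the following is a minimal pure-Python sketch of correspondence analysis restricted to the first axis. It is an illustration under stated assumptions, not the CodonW procedure: the function name `first_coa_axis` is hypothetical, the leading axis is found by simple power iteration on the matrix of standardized residuals, and columns that are zero in every row are dropped.

```python
import math


def first_coa_axis(table, iterations=1000):
    """Return (row_coordinates, relative_inertia) for the first COA axis.

    `table` is a list of rows (one per ORF) of non-negative numbers
    (one column per codon). Every row must have a positive sum.
    """
    # Drop columns that are zero in every row (e.g. a codon never used).
    keep = [j for j in range(len(table[0])) if any(row[j] > 0 for row in table)]
    data = [[row[j] for j in keep] for row in table]
    n_rows, n_cols = len(data), len(keep)

    total = sum(sum(row) for row in data)
    row_mass = [sum(row) / total for row in data]
    col_mass = [sum(row[j] for row in data) / total for j in range(n_cols)]

    # Standardized residuals: (observed - expected) / sqrt(expected).
    resid = [
        [(data[i][j] / total - row_mass[i] * col_mass[j])
         / math.sqrt(row_mass[i] * col_mass[j]) for j in range(n_cols)]
        for i in range(n_rows)
    ]
    total_inertia = sum(x * x for row in resid for x in row)
    if total_inertia == 0:
        return [0.0] * n_rows, 0.0

    # Power iteration for the leading right singular vector of `resid`.
    v = [1.0 / (j + 1) for j in range(n_cols)]  # arbitrary starting vector
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]
    for _ in range(iterations):
        u = [sum(resid[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(resid[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break
        v = [x / norm for x in w]

    # resid * v equals (singular value) * (left singular vector).
    u = [sum(resid[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    axis_inertia = sum(x * x for x in u)
    coords = [u[i] / math.sqrt(row_mass[i]) for i in range(n_rows)]
    return coords, axis_inertia / total_inertia
```

The returned coordinates are the positions of the rows (ORFs) on the first axis, and the second value is the fraction of total inertia that this axis explains. The sign of an axis is arbitrary, so only relative positions are meaningful.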

Statistical analysis

Correlation analysis was carried out using Spearman's rank correlation analysis method (Wessa, 2010; available at: www.wessa.net).
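As an illustration of the statistic itself (the study used the Wessa web tool, not this code), Spearman's rank correlation can be sketched as the Pearson correlation of the ranks, with tied values receiving their average rank. The function names below are hypothetical, and no P value is computed.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks of x and y.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

In the context of this study, `x` would be the positions of the sequences on the first COA axis and `y` the corresponding GC3S values or relative dinucleotide abundances.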

Sequence alignment

Sequences were aligned using the MUSCLE program (Edgar, 2004).

Comparative analysis

To determine whether the codon usage bias found in the outer capsid proteins of P[4]G2 RV-A strains isolated in Brazil can also be found in other genome regions, or in the complete genome sequences of human RV-A strains of different genotypes isolated elsewhere, a new dataset was constructed from 10 human RV-A reference strains for which the complete sequences of the 11 genome segments are known. For strain names, genotypes, accession numbers and genomic constellations, see Supplementary Material Table 3.

Results

To study the extent of codon usage bias in P[4]G2 RV-A isolated in Brazil, the RSCU values of the codons in the VP4 [VP8*] and VP7 ORFs were calculated; the values obtained for these genes, from datasets of 58 and 60 sequences, respectively, are shown in Table 1.
Table 1

Codon usage in P[4]G2 RV-A strains, displayed as RSCU values.

AA   Cod  HC    VP4   VP7    AA   Cod  HC    VP4   VP7    AA   Cod  HC    VP4   VP7    AA   Cod  HC    VP4   VP7
Phe  UUU  0.92  1.80  1.86   Ser  UCU  1.14  1.07  1.00   Tyr  UAU  0.88  1.96  1.50   Cys  UGU  0.92  1.97  1.24
Phe  UUC  1.08  0.20  0.14   Ser  UCC  1.32  0.40  0.27   Tyr  UAC  1.12  0.04  0.50   Cys  UGC  1.08  0.03  0.76
Leu  UUA  0.48  2.72  2.33   Ser  UCA  0.90  1.96  3.56   TER  UAA  **    **    **     TER  UGA  **    **    **
Leu  UUG  0.78  0.46  0.94   Ser  UCG  0.30  0.55  0.59   TER  UAG  **    **    **     Trp  UGG  1.00  1.00  1.00

Leu  CUU  0.78  1.14  0.39   Pro  CCU  1.16  0.31  0.36   His  CAU  0.84  1.65  2.00   Arg  CGU  0.48  0.04  0.64
Leu  CUC  1.20  0.44  0.37   Pro  CCC  1.28  0.29  0.00   His  CAC  1.16  0.35  0.00   Arg  CGC  1.08  0.00  0.00
Leu  CUA  0.42  0.89  1.27   Pro  CCA  1.12  3.38  2.91   Gln  CAA  0.54  1.42  1.81   Arg  CGA  0.66  0.02  1.78
Leu  CUG  2.40  0.35  0.70   Pro  CCG  0.44  0.01  0.72   Gln  CAG  1.46  0.58  0.19   Arg  CGG  1.20  0.41  0.64

Ile  AUU  1.08  1.79  1.45   Thr  ACU  1.00  1.98  1.51   Asn  AAU  0.94  1.85  1.77   Ser  AGU  0.90  1.99  0.33
Ile  AUC  1.41  0.31  0.25   Thr  ACC  1.44  0.18  0.25   Asn  AAC  1.06  0.15  0.23   Ser  AGC  1.44  0.03  0.26
Ile  AUA  0.51  0.90  1.30   Thr  ACA  1.12  1.47  1.42   Lys  AAA  0.86  1.65  1.90   Arg  AGA  1.26  4.70  2.59
Met  AUG  1.00  1.00  1.00   Thr  ACG  0.44  0.37  0.82   Lys  AAG  1.14  0.35  0.10   Arg  AGG  1.26  0.84  0.35

Val  GUU  0.72  1.31  1.97   Ala  GCU  1.08  1.74  2.49   Asp  GAU  0.92  1.57  1.22   Gly  GGU  0.64  2.03  0.98
Val  GUC  0.96  0.75  0.01   Ala  GCC  1.60  0.02  0.13   Asp  GAC  1.08  0.43  0.78   Gly  GGC  1.36  0.26  0.24
Val  GUA  0.48  1.65  1.57   Ala  GCA  0.92  1.74  1.11   Glu  GAA  0.84  1.41  1.57   Gly  GGA  1.00  1.66  2.51
Val  GUG  1.84  0.28  0.45   Ala  GCG  0.44  0.50  0.28   Glu  GAG  1.16  0.59  0.43   Gly  GGG  1.00  0.05  0.26

RSCU, relative synonymous codon usage; AA, amino acid; Cod, codons; HC, human cells; TER, termination codon. More frequent codons in both VP4 [VP8*] and VP7 with respect to human cells are shown in bold. Codon CGC (Arg) not used in VP4 [VP8*] and VP7 P[4]G2 RV-A isolated in Brazil are shown in italics.

Interestingly, the frequencies of codon usage in both the VP4 [VP8*] and VP7 P[4]G2 RV-A ORFs differ significantly from those of human cells. In particular, highly biased frequencies were found for UUU (Phe), UUA (Leu), GUU and GUA (Val), UCA (Ser), CCA (Pro), GCU (Ala), UAU (Tyr), CAU (His), CAA (Gln), AAU (Asn), AAA (Lys), GAA (Glu), UGU (Cys), AGA (Arg) and GGA (Gly) in both ORFs (see Table 1). As can be seen, the highly preferred codons are all U/A-ending, which strongly suggests that mutational bias is the main force shaping codon usage in these two genes. It is interesting to note that CGC (Arg) is not used in either ORF. To investigate whether these P[4]G2 RV-A sequences display similar compositional features, the ENC values were calculated for the VP8* and VP7 ORFs. These values range from 35.21 to 40.49 for VP8* and from 38.97 to 41.88 for VP7 (mean ENC values are 37.36 and 40.56 for VP4 [VP8*] and VP7, respectively). For the results obtained for the Brazilian strains included in these studies, see Supplementary Material Table 2. Because these ENC values are low (almost all are around 40 or below), the results obtained for the two ORFs studied reveal that codon usage in P[4]G2 RV-A is biased. An ENC–GC3S plot (ENC plotted against GC3S) can be used to quantify how far the codon usage of a gene departs from equal usage of synonymous codons (Wright, 1990). As shown in Fig. 1, the curve in the plot represents the expected ENC values if codon usage is determined only by GC content at the third codon position.
In other words, if GC3S is the only factor shaping the codon usage pattern, the ENC values would fall on a continuous curve, which represents random codon usage (Jiang et al., 2007). If G+C compositional constraint influences the codon usage, then the GC3S and ENC correlated spots would lie on or below the expected curve (Tsai et al., 2007). Otherwise, the codon usage bias of genes may be affected by other factors such as translational selection.
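The expected curve in an ENC–GC3S plot is usually drawn from Wright's (1990) approximation, ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3S value. The Python sketch below computes that expected value and a simplified GC3S for one sequence. The function names are hypothetical and the GC3S helper is a simplified stand-in (it counts G or C at the third position of every codon except AUG, UGG and stop codons); it is not CodonW's implementation.

```python
# Codons with no synonymous alternative, plus stop codons, are excluded.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}


def gc3s(seq):
    """Fraction of G or C at the third position of synonymous codons."""
    seq = seq.upper().replace("T", "U")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    usable = [c for c in codons if c not in EXCLUDED and set(c) <= set("ACGU")]
    if not usable:
        raise ValueError("no synonymous codons in sequence")
    return sum(c[2] in "GC" for c in usable) / len(usable)


def expected_enc(s):
    """Wright's expected ENC when GC3S (s) is the only source of bias."""
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def lies_below_curve(enc, s):
    """True if an observed (GC3S, ENC) point falls below the expected curve."""
    return enc < expected_enc(s)
```

A point whose observed ENC is lower than `expected_enc` at its GC3S value lies below the curve, which is the situation described for the RV-A ORFs in Fig. 1.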
Fig. 1

Effective number of codons used in each ORF plotted against the GC3S. The curve plots the relationship between GC3S and ENC in absence of selection. Black square dots show the results obtained for RV-A strains. All of them lie below the expected curve. The results found for VP4 and VP7 are shown in (A) and (B), respectively.

When the GC3S values were calculated for the VP4 [VP8*] and VP7 ORFs and the ENC–GC3S plots constructed (for the ENC and GC3S values obtained for the Brazilian strains included in these studies, see Supplementary Material Table 2), all spots lay below, and roughly “parallel” to, the expected curve for both ORFs studied, indicating that the codon usage bias may be influenced by G+C compositional constraints (see Fig. 1). Since codon usage is by its very nature multivariate, it is necessary to analyze the data using multivariate statistical techniques (i.e. COA) in order to confirm these findings. COA is an ordination technique that identifies the major trends in the variation of the data and distributes genes along continuous axes in accordance with these trends. Moreover, it has the advantage that it does not assume that the data fall into discrete clusters and can therefore represent continuous variation accurately (Greenacre, 1984). COA creates a series of orthogonal axes to identify trends that explain the data variation, with each subsequent axis explaining a decreasing amount of the variation (Greenacre, 1984). The correlation between the position of each gene on the first axis generated by COA and the respective GC3S value of each strain was analyzed for both the VP4 [VP8*] and VP7 ORFs. We found that the positions of the sequences on the first axis from COA are significantly correlated with the GC3S values in both ORFs (r = 0.625, P < 0.0001 and r = −0.469, P < 0.001 for VP4 [VP8*] and VP7, respectively).
Taken together, these results reveal that most of the codon usage bias is directly related to nucleotide composition. Nevertheless, other factors may also be acting in shaping codon usage bias. To analyze whether the codon usage biases reported above can also be found in other genome regions or in complete genome sequences, a new dataset was constructed from 10 human RV-A reference strains for which the complete sequences of the 11 segments are known. For strain names, genotypes, accession numbers and genomic constellations, see Supplementary Material Table 3. By concatenating the sequences of different genome ORFs, the RSCU values of the different codons were calculated for different virus regions (outer capsid shell proteins, OC, VP4+VP7; intermediate protein shell, IM, VP6; inner capsid shell proteins, IC, VP1+VP2+VP3; non-structural proteins, NSP, NSP1+NSP2+NSP3+NSP4+NSP5; and full genome, VP4+VP7+VP6+VP1+VP2+VP3+NSP1+NSP2+NSP3+NSP4+NSP5, which accounts for a total of 54,318 codons). The results of these studies are shown in Table 2.
Table 2

Codon usage in RV-A strains of different genotypes, expressed by RSCU values.

AA   Cod  HC    OC    IM    IC    NSP   Full   AA   Cod  HC    OC    IM    IC    NSP   Full
Phe  UUU  0.92  1.53  1.58  1.52  1.52  1.51   Ser  UCU  1.14  1.30  1.02  1.13  1.66  1.30
Phe  UUC  1.08  0.47  0.42  0.48  0.48  0.49   Ser  UCC  1.32  0.26  0.43  0.29  0.28  0.28
Leu  UUA  0.48  2.74  1.41  2.59  1.89  2.33   Ser  UCA  0.90  2.84  3.07  3.26  2.29  2.93
Leu  UUG  0.78  0.99  1.25  1.14  1.32  1.17   Ser  UCG  0.30  0.54  0.55  0.62  0.54  0.57

Leu  CUU  0.78  0.47  1.37  0.60  0.92  0.70   Pro  CCU  1.16  0.61  0.26  0.43  0.90  0.54
Leu  CUC  1.20  0.29  0.32  0.18  0.39  0.25   Pro  CCC  1.28  0.27  0.02  0.12  0.20  0.15
Leu  CUA  0.42  1.09  1.18  1.04  1.03  1.07   Pro  CCA  1.12  2.68  3.34  2.85  2.55  2.81
Leu  CUG  2.40  0.42  0.48  0.46  0.45  0.48   Pro  CCG  0.44  0.44  0.38  0.60  0.35  0.51

Ile  AUU  1.08  1.13  1.80  1.02  1.80  1.24   Thr  ACU  1.00  1.14  1.35  1.35  1.64  1.35
Ile  AUC  1.41  0.22  0.30  0.23  0.27  0.24   Thr  ACC  1.44  0.24  0.12  0.24  0.24  0.23
Ile  AUA  0.51  1.65  0.90  1.75  0.93  1.51   Thr  ACA  1.12  1.73  2.11  1.63  1.52  1.66
Met  AUG  1.00  1.00  1.00  1.00  1.00  1.00   Thr  ACG  0.44  0.89  0.43  0.78  0.61  0.76

Val  GUU  0.72  0.95  1.09  1.36  1.67  1.33   Ala  GCU  1.08  1.25  1.57  1.44  1.34  1.40
Val  GUC  0.96  0.33  0.71  0.38  0.32  0.37   Ala  GCC  1.60  0.26  0.32  0.33  0.17  0.28
Val  GUA  0.48  1.77  1.46  1.46  1.11  1.43   Ala  GCA  0.92  1.76  1.36  1.62  2.03  1.69
Val  GUG  1.84  0.94  0.74  0.80  0.90  0.86   Ala  GCG  0.44  0.73  0.75  0.61  0.45  0.63

Tyr  UAU  0.88  1.46  1.13  1.47  1.33  1.43   Cys  UGU  0.92  1.35  1.80  1.39  1.17  1.31
Tyr  UAC  1.12  0.54  0.87  0.53  0.67  0.57   Cys  UGC  1.08  0.65  0.20  0.61  0.83  0.69
TER  UAA  **    **    **    **    **    **     TER  UGA  **    **    **    **    **    **
TER  UAG  **    **    **    **    **    **     Trp  UGG  1.00  1.00  1.00  1.00  1.00  1.00

His  CAU  0.84  1.69  1.80  1.66  1.48  1.62   Arg  CGU  0.48  0.12  0.46  0.60  0.74  0.53
His  CAC  1.16  0.31  0.20  0.34  0.52  0.38   Arg  CGC  1.08  0.22  0.12  0.23  0.06  0.18
Gln  CAA  0.54  1.20  1.28  1.28  1.38  1.29   Arg  CGA  0.66  0.87  0.31  0.50  0.59  0.56
Gln  CAG  1.46  0.80  0.72  0.72  0.62  0.71   Arg  CGG  1.20  0.29  0.00  0.15  0.18  0.18

Asn  AAU  0.94  1.53  1.38  1.52  1.56  1.51   Ser  AGU  0.90  0.85  0.55  0.58  0.92  0.71
Asn  AAC  1.06  0.47  0.62  0.48  0.44  0.49   Ser  AGC  1.44  0.22  0.39  0.11  0.31  0.21
Lys  AAA  0.86  1.52  1.66  1.49  1.48  1.49   Arg  AGA  1.26  3.75  4.98  3.86  3.50  3.85
Lys  AAG  1.14  0.48  0.34  0.51  0.52  0.57   Arg  AGG  1.26  0.75  0.12  0.67  0.92  0.70

Asp  GAU  0.92  1.38  1.54  1.43  1.63  1.47   Gly  GGU  0.64  1.44  1.03  1.31  1.45  1.34
Asp  GAC  1.08  0.62  0.46  0.57  0.37  0.53   Gly  GGC  1.36  0.34  0.34  0.26  0.20  0.28
Glu  GAA  0.84  1.30  1.46  1.46  1.47  1.43   Gly  GGA  1.00  1.90  2.40  2.02  2.04  2.01
Glu  GAG  1.16  0.70  0.54  0.54  0.53  0.57   Gly  GGG  1.00  0.32  0.23  0.41  0.31  0.37

RSCU, relative synonymous codon usage; AA, amino acid; Cod, codons; HC, human cells; OC, outer capsid shell proteins; IM, intermediate protein shell; IC, inner capsid shell proteins; NSP, non-structural proteins; Full, full genome; TER, termination codon. More frequent codons with respect to human cells found in all genome regions studied are shown in bold. Frequencies sharply reduced with respect to frequencies found in human cells are shown in italics.

Again, the frequencies of codon usage found in the different genomic regions, or in the complete genomes of RV-A, differ significantly from those of human cells (see Table 1, Table 2). Highly biased frequencies were also found for the same amino acids in all genomic regions and in the full genomes (Table 2), in agreement with the previous results obtained using the outer capsid proteins of P[4]G2 RV-A strains isolated in Brazil. The correlation between the position on the first axis generated by COA and the respective GC3S value of each strain was analyzed for the complete genome dataset. A high and significant correlation between the position of the sequences on the first axis of COA and the GC3S values (r = −0.9879, P < 0.01) was also found using complete genomes. It has been suggested that dinucleotide biases can affect codon bias (Tao et al., 2009). To study this possibility, the relative abundances of the 16 dinucleotides in the VP8* and VP7 ORFs were established. The results of these studies are shown in Table 3. As can be seen, the occurrences of dinucleotides are not random and no dinucleotide is present at the expected frequency.
Table 3

Relative abundance of dinucleotides in VP4 [VP8*] and VP7 proteins from P[4]G2 RV-A Brazilian strains and summary of COA.

VP4 [VP8*]
Dinucleotide      UU              UC              UA              UG              CU              CC              CA              CG
Mean ± S.D. (a)   1.490 ± 0.035   0.823 ± 0.024   1.665 ± 0.021   0.846 ± 0.020   0.610 ± 0.022   0.381 ± 0.012   1.157 ± 0.023   0.230 ± 0.035
Axis 1 (b), r     −0.261          0.384           −0.427          0.038           −0.010          0.560           0.127           0.453
Axis 1 (b), P     0.040           <0.01           <0.001          0.764           0.928           <0.0001         0.317           <0.001

(a) Mean values of 58 P[4]G2 RV-A strains’ relative dinucleotide ratios ± standard deviation.

(b) Correlation analysis between the first axis in COA and the sixteen dinucleotide frequencies in VP4 [VP8*] and VP7 proteins is shown.
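The relative abundances in Table 3 are odds ratios with an expected value of 1.0. A common way to compute such a ratio is to divide the observed frequency of a dinucleotide by the product of the frequencies of its two constituent nucleotides. The Python sketch below assumes that definition; the exact formula used by CodonW is not stated in the text, and the function name is hypothetical.

```python
from collections import Counter


def dinucleotide_odds(seq):
    """Return {dinucleotide: observed / expected} for one sequence.

    Assumes expected frequency = f(x) * f(y); dinucleotides whose
    constituent bases are absent from the sequence are skipped.
    """
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    mono = Counter(seq)
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    di = Counter(seq[i:i + 2] for i in range(n - 1))

    ratios = {}
    for x in "ACGU":
        for y in "ACGU":
            fx, fy = mono[x] / n, mono[y] / n
            if fx == 0 or fy == 0:
                continue  # expected frequency is zero: ratio undefined
            observed = di[x + y] / (n - 1)
            ratios[x + y] = observed / (fx * fy)
    return ratios
```

Under this definition, a value well below 1.0 (such as the 0.230 reported for CpG in VP4 [VP8*]) indicates under-representation, and a value well above 1.0 indicates over-representation.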

In the case of the VP4 [VP8*] protein, the relative abundances of CpG and GpC showed a strong deviation from the expected frequency (i.e. 1.0) (mean ± S.D. = 0.230 ± 0.035 and 0.282 ± 0.009, respectively) and both were markedly underrepresented. On the other hand, ApU and ApA were markedly over-used (mean ± S.D. = 1.951 ± 0.033 and 1.979 ± 0.04, respectively) (Table 3). Among the 16 dinucleotides, 10 are correlated with the first axis value in COA (P values <0.01, Table 3). These observations indicate that the composition of dinucleotides also determines the variation in synonymous codon usage among P[4]G2 RV-A VP4 [VP8*] ORFs. To study the possible effects of CpG and GpC under-representation on the codon usage bias of the VP4 [VP8*] protein, the RSCU values of the 14 codons that contain CpG and/or GpC (CCG, GCG, UCG, ACG, CGC, CGG, CGU, CGA, GCU, GCC, GCA, UGC, AGC, GGC) were analyzed. Of these triplets, 12 [CCG (mean 0.01), GCG (mean 0.50), UCG (mean 0.35), ACG (mean 0.37), CGC (mean 0.00), CGG (mean 0.41), CGU (mean 0.04), GCC (mean 0.02), CGA (mean 0.02), UGC (mean 0.03), AGC (mean 0.03) and GGC (mean 0.26)] were markedly suppressed. In the case of the VP7 protein, the relative abundances of CpG and GpC again showed a strong deviation from the expected frequencies (mean ± S.D. = 0.397 ± 0.014 and 0.330 ± 0.018, respectively) and both were underrepresented. The frequencies of ApU and ApA also showed a sharp deviation from the expected frequencies, and again we found a marked over-use of these dinucleotides (mean ± S.D. = 2.056 ± 0.029 and 1.948 ± 0.038, respectively) (Table 3).
Among the 16 dinucleotides, seven are correlated with the position of the sequences along the first axis in COA (P values <0.01, Table 3). These results indicate that the composition of dinucleotides also determines the variation in synonymous codon usage among P[4]G2 RV-A VP7 ORFs. The RSCU values for the VP7 protein of the 14 codons that contain CpG and/or GpC (see above) revealed that six [GCG (mean 0.28), CGC (mean 0.00), GCC (mean 0.13), GCG (mean 0.28), AGC (mean 0.26) and GGC (mean 0.24)] were markedly suppressed and five [CCG (mean 0.73), UCG (mean 0.59), CGG (mean 0.64), CGU (mean 0.64) and UGC (mean 0.75)] were slightly suppressed. In addition, the position of each codon on each of the four major axes of COA was determined for both proteins studied. For the VP4 [VP8*] ORFs, the first major axis accounted for 28.67% of the observed variation, while the second, third and fourth axes accounted for 21.57%, 18.56% and 12.39%, respectively. For the VP7 ORFs, the first major axis accounted for 66.00% of the observed variation; the second, third and fourth major axes accounted for 14.82%, 8.09% and 2.40% of the observed variation, respectively. Table 4 shows the codons for which the maximum and minimum values were obtained on each of the axes studied (i.e. the most divergent codon values), indicating a strong bias in their use by both the VP4 and VP7 proteins. As can be seen, the most divergent triplets tend to be GC-rich (considering the two ORFs, G+C explains 19/24 positions of these codons). Again, this can be explained in terms of a strong mutational bias.
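The count of 14 codons containing CpG and/or GpC used in the analysis above can be checked by direct enumeration. The short Python sketch below lists every triplet that contains "CG" or "GC"; variable names are illustrative.

```python
from itertools import product

# All 64 RNA triplets.
all_codons = ["".join(p) for p in product("UCAG", repeat=3)]

# Keep triplets containing a CpG or GpC dinucleotide at positions 1-2 or 2-3.
cpg_gpc_codons = [c for c in all_codons if "CG" in c or "GC" in c]

# Eight triplets contain CG and eight contain GC; CGC and GCG contain both,
# so the union holds 14 distinct codons.
print(len(cpg_gpc_codons), sorted(cpg_gpc_codons))
```

The resulting set matches the list given in the text (CCG, GCG, UCG, ACG, CGC, CGG, CGU, CGA, GCU, GCC, GCA, UGC, AGC, GGC).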
Table 4

Position of codons in each of the four major axes of COA for RV-A VP4 [VP8*] and VP7 proteins.

ORF   Axis 1                       Axis 2                       Axis 3                       Axis 4
      Codon  Value   Amino acid    Codon  Value   Amino acid    Codon  Value   Amino acid    Codon  Value   Amino acid
VP4   GCC    −4.183  Ala           UGC    −7.683  Cys           CCG    −6.889  Pro           GCC    −2.634  Ala
VP4   UGC    0.660   Cys           CCG    0.898   Pro           GCC    2.071   Ala           GGG    1.967   Gly

VP7   GUC    0.610   Val           GGG    0.212   Gly           GUC    −0.960  Val           GUC    −0.100  Val
VP7   AGG    1.488   Arg           GUC    4.447   Val           UUC    0.337   Phe           UUC    0.279   Phe
To determine whether the results found using the outer capsid proteins of P[4]G2 RV-A strains can also be found using complete genomes, the same analyses were repeated using the dataset of complete genomes (for strains, accession numbers and genomic constellations, see Supplementary Material Table 3). The results of these studies are shown in Supplementary Material Table 4. Again, the relative abundances of CpG and GpC showed a strong deviation from the expected frequency (i.e. 1.0) (mean ± S.D. = 0.360 ± 0.021 and 0.468 ± 0.038, respectively) and both were markedly underrepresented. The frequencies of ApU and ApA also showed a sharp deviation from the expected frequencies and were markedly over-used (mean ± S.D. = 1.907 ± 0.069 and 2.089 ± 0.048, respectively). Among the 16 dinucleotides, seven are correlated with the position of the sequences along the first axis in COA (P values <0.01, Supplementary Material Table 4). Taking all these results together, it is possible to observe that the composition of dinucleotides also determines the variation in synonymous codon usage in the complete sequences of human RV-A.

Discussion

The results of these studies revealed that codon usage for VP4 [VP8*] and VP7 in P[4]G2 RV-A is quite different from that of human genes (see Table 1). Moreover, this is also observed when the different genome regions or the complete genome sequences are considered (see Table 2). This is in agreement with results found for other viruses such as human immunodeficiency virus 1 (HIV-1) (Grantham and Perrin, 1986, Kypr and Mrazek, 1987) and hepatitis A virus (Aragones et al., 2008). In other RNA viruses, like poliovirus or foot-and-mouth disease virus (FMDV), the codon usage is very similar to that of their hosts, implying competition for tRNAs between virus and host (Sanchez et al., 2003). In these cases, competition is avoided by the induction of cellular shutoff of protein synthesis through carboxy cleavage of translation initiation factor 4G (eIF4G) by the 2A and L proteases, respectively (Racaniello, 2001). Early during the infection process, RV-A also takes over the translation machinery of the host cell, causing a shutoff of cell protein synthesis, although by a mechanism different from that of picornaviruses. After RV-A infection, the translation initiation factor 2α (eIF2α) becomes phosphorylated and remains in this state throughout the virus replication cycle, leading to a further inhibition of cell protein synthesis (Montero et al., 2008). However, recent studies have shown that under these restrictive conditions the viral proteins and some cellular proteins are efficiently translated (Montero et al., 2008). Whether the extremely different codon usage of RV-A and human cells is related to this fact is currently unknown, but it might allow RV-A to compete successfully for translation of viral RNAs. We analyzed synonymous codon usage and nucleotide compositional constraints in the VP4 [VP8*] and VP7 genes of P[4]G2 RV-A and compared the results with those from a dataset of RV-A reference strains for which the complete sequences of the 11 segments were previously known.
Interestingly, in contrast to previous results found for other viruses such as H5N1 influenza A virus (mean ENC = 50.91) (Ahn et al., 2006, Zhou et al., 2005); SARS (mean ENC = 48.99) (Zhao et al., 2008); FMDV (mean ENC = 51.42) (Zhong et al., 2007); classical swine fever virus (mean ENC = 51.7) (Tao et al., 2009) and duck enteritis virus (mean ENC = 52.17) (Jia et al., 2009), the ENC values found for human P[4]G2 RV-A are comparatively low (mean ENC values of 37.36 and 40.56 for VP8* and VP7, respectively). Moreover, when the complete genomes are studied (accounting for 54,318 codons), the mean ENC value obtained is 41.60. This indicates that the overall extent of codon usage bias in RV-A genomes is significant. A general correlation between codon usage bias and base composition was observed, since all spots in the ENC–GC3S plot lie below the curve of the predicted values (Fig. 1). Highly significant correlations between the first axis of COA and the GC3S values were obtained for both outer capsid proteins. Moreover, the concatenated complete sequences of the 11 segments of 10 reference human RV-A strains also show this significant correlation. All these results strongly suggest that mutational pressure is an important factor in determining codon usage bias in human RV-A. Nevertheless, we cannot completely discard other factors that may also account for codon usage bias. The frequencies of dinucleotides were not random and no dinucleotide was present at the expected frequency in either of the ORFs studied (VP8* and VP7, see Table 3). The same results are found using the complete genome dataset (Supplementary Material Table 4). CpG- and GpC-containing codons are markedly suppressed (see Table 1, Table 2). Marked CpG deficiency has also been observed in coronaviruses (Woo et al., 2007), vertebrate-infecting members of the family Flaviviridae (Lobo et al., 2009), poliovirus (Rothberg and Wimmer, 1981) and other RNA viruses (Karlin et al., 1994).
The CpG deficiency was proposed to be related to the immunostimulatory properties of unmethylated CpG, which is recognized by the host's innate immune system as a pathogen signature (Shackelton et al., 2006, Woo et al., 2007). This is now known to be triggered by the intracellular pattern recognition receptor (PRR) Toll-like receptor 9 (TLR9), which recognizes CpG-unmethylated DNA and triggers several immune response pathways (Dorn and Kippenberger, 2008). Since the vertebrate immune system relies on unmethylated CpG recognition in DNA molecules as a sign of infection, and CpG under-representation in RNA viruses is exclusively observed in vertebrate viruses (Lobo et al., 2009), it is reasonable to suggest that a TLR9-like mechanism exists in the vertebrate immune system which recognizes CpG in an RNA context (such as in the genomes of RNA viruses) and triggers immune responses (Lobo et al., 2009). Moreover, recent studies have shown that influenza A viruses, which originated from an avian reservoir and have been infecting human hosts since 1918, have been under strong selective pressure to reduce the frequency of CpG in their genomes (Greenbaum et al., 2008). The results of this work provide basic knowledge of the mechanisms that give rise to codon usage bias in human RV-A and are also useful for understanding the processes involved in RV-A evolution. Further studies will be needed to reveal more about the RV-A genome.
  50 in total

Review 1.  Rotavirus disease and vaccination: impact on genotype diversity.

Authors:  Jelle Matthijnssens; Joke Bilcke; Max Ciarlet; Vito Martella; Krisztián Bányai; Mustafizur Rahman; Mark Zeller; Philippe Beutels; Pierre Van Damme; Marc Van Ranst
Journal:  Future Microbiol       Date:  2009-12       Impact factor: 3.165

2.  Characterization of rotavirus strains from newborns in New Delhi, India.

Authors:  B K Das; J R Gentsch; H G Cicirello; P A Woods; A Gupta; M Ramachandran; R Kumar; M K Bhan; R I Glass
Journal:  J Clin Microbiol       Date:  1994-07       Impact factor: 5.948

3.  Analysis of synonymous codon usage in H5N1 virus and other influenza A viruses.

Authors:  Tong Zhou; Wanjun Gu; Jianmin Ma; Xiao Sun; Zuhong Lu
Journal:  Biosystems       Date:  2005-04-07       Impact factor: 1.973

Review 4.  Global distribution of rotavirus serotypes/genotypes and its implication for the development and implementation of an effective rotavirus vaccine.

Authors:  Norma Santos; Yasutaka Hoshino
Journal:  Rev Med Virol       Date:  2005 Jan-Feb       Impact factor: 6.989

5.  Rotavirus surveillance in europe, 2005-2008: web-enabled reporting and real-time analysis of genotyping and epidemiological data.

Authors:  M Iturriza-Gómara; T Dallman; K Bányai; B Böttiger; J Buesa; S Diedrich; L Fiore; K Johansen; N Korsun; A Kroneman; M Lappalainen; B László; L Maunula; J Matthinjnssens; S Midgley; Z Mladenova; M Poljsak-Prijatelj; P Pothier; F M Ruggeri; A Sanchez-Fauquier; E Schreier; A Steyer; I Sidaraviciute; A N Tran; V Usonis; M Van Ranst; A de Rougemont; J Gray
Journal:  J Infect Dis       Date:  2009-11-01       Impact factor: 5.226

Review 6.  Clinical application of CpG-, non-CpG-, and antisense oligodeoxynucleotides as immunomodulators.

Authors:  Annette Dorn; Stefan Kippenberger
Journal:  Curr Opin Mol Ther       Date:  2008-02

7.  Rotavirus surveillance--worldwide, 2001-2008.

Authors: 
Journal:  MMWR Morb Mortal Wkly Rep       Date:  2008-11-21       Impact factor: 17.586

8.  Identification of two sublineages of genotype G2 rotavirus among diarrheic children in Parauapebas, Southern Pará State, Brazil.

Authors:  Joana D'Arc Pereira Mascarenhas; Clarissa Silva Lima; Darleise Silva de Oliveira; Sylvia de Fátima dos Santos Guerra; Régis Piloni Maestri; Yvone Benchimol Gabbay; Ian Carlos Gomes de Lima; Euzeni Maria Costa de Menezes; Alexandre da Costa Linhares; Gilberta Bensabath
Journal:  J Med Virol       Date:  2010-04       Impact factor: 2.327

9.  Predominance of rotavirus P[4]G2 in a vaccinated population, Brazil.

Authors:  Ricardo Q Gurgel; Luis E Cuevas; Sarah C F Vieira; Vanessa C F Barros; Paula B Fontes; Eduardo F Salustino; Osamu Nakagomi; Toyoko Nakagomi; Winifred Dove; Nigel Cunliffe; Charles A Hart
Journal:  Emerg Infect Dis       Date:  2007-10       Impact factor: 6.883

10.  Analysis of synonymous codon usage in classical swine fever virus.

Authors:  Pan Tao; Li Dai; Mengcheng Luo; Fangqiang Tang; Po Tien; Zishu Pan
Journal:  Virus Genes       Date:  2008-10-29       Impact factor: 2.332

  4 in total

1.  Codon usage of host-specific P genotypes (VP4) in group A rotavirus.

Authors:  Han Wu; Bingzhe Li; Ziping Miao; Linjie Hu; Lu Zhou; Yihan Lu
Journal:  BMC Genomics       Date:  2022-07-16       Impact factor: 4.547

2.  Rotavirus A Genome Segments Show Distinct Segregation and Codon Usage Patterns.

Authors:  Irene Hoxie; John J Dennehy
Journal:  Viruses       Date:  2021-07-27       Impact factor: 5.048

3.  Pandemic influenza A virus codon usage revisited: biases, adaptation and implications for vaccine strain development.

Authors:  Natalia Goñi; Andrés Iriarte; Victoria Comas; Martín Soñora; Pilar Moreno; Gonzalo Moratorio; Héctor Musto; Juan Cristina
Journal:  Virol J       Date:  2012-11-08       Impact factor: 4.099

4.  Revelation of Influencing Factors in Overall Codon Usage Bias of Equine Influenza Viruses.

Authors:  Naveen Kumar; Bidhan Chandra Bera; Benjamin D Greenbaum; Sandeep Bhatia; Richa Sood; Pavulraj Selvaraj; Taruna Anand; Bhupendra Nath Tripathi; Nitin Virmani
Journal:  PLoS One       Date:  2016-04-27       Impact factor: 3.240
