Genome-wide analysis of codon usage and influencing factors in chikungunya viruses.

Azeem Mehmood Butt1, Izza Nasrullah2, Yigang Tong3.   

Abstract

Chikungunya virus (CHIKV) is an arthropod-borne virus of the family Togaviridae that is transmitted to humans by Aedes spp. mosquitoes. Its genome comprises a 12 kb single-stranded positive-sense RNA. In the present study, we report the patterns of synonymous codon usage in 141 CHIKV genomes by calculating several codon usage indices and applying multivariate statistical methods. Relative synonymous codon usage (RSCU) analysis showed that the preferred synonymous codons were G/C- and A-ended. A comparative analysis of RSCU between CHIKV and its hosts showed that codon usage patterns of CHIKV are a mixture of coincidence and antagonism. Similarity index analysis showed that the overall codon usage patterns of CHIKV have been strongly influenced by Pan troglodytes and Aedes albopictus during evolution. The overall codon usage bias was low in CHIKV genomes, as inferred from the analysis of the effective number of codons (ENC) and the codon adaptation index (CAI). Our data suggested that although mutation pressure dominates codon usage in CHIKV, patterns of codon usage in CHIKV are also under the influence of natural selection from its hosts and geography. To the best of our knowledge, this is the first report describing codon usage analysis in CHIKV genomes. The findings from this study are expected to increase our understanding of factors involved in viral evolution, and fitness towards hosts and the environment.

Year:  2014        PMID: 24595095      PMCID: PMC3942501          DOI: 10.1371/journal.pone.0090905

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Chikungunya virus (CHIKV), a member of the genus Alphavirus of the family Togaviridae, is a small (60–70 nm), enveloped, single-stranded positive-sense RNA virus. The genome is approximately 12 kb in size and comprises two open reading frames (ORFs) encoding non-structural and structural proteins, respectively [1]. The CHIKV genome is arranged in the order 5′-cap-nsP1-nsP2-nsP3-nsP4-(junction region)-C-E3-E2-6K-E1-poly(A)-3′ [1]. Since the first isolation of CHIKV from a febrile individual in Tanzania in 1953 [2], CHIKV has caused several outbreaks in Asia, Africa, and Indian Ocean islands, emerging as a serious public health concern [3]–[6]. CHIKV infection is characterized by abrupt onset of high fever, headache, rashes, arthralgia and myalgia. The typical clinical sign of the disease is polyarthralgia, a very painful condition that affects the joints and may persist for several months to years in some cases [7]. As an arthropod-borne virus, CHIKV is transmitted by mosquitoes of Aedes spp. It is generally accepted that CHIKV originated in Africa, where it is primarily maintained in a yellow fever-like zoonotic sylvatic cycle and depends upon non-human primates and arboreal, peridomestic mosquitoes as reservoir hosts. However, the spread of CHIKV in Asia and urban endemics are associated with a dengue-like “human-mosquito-human” direct transmission cycle, where A. aegypti and A. albopictus serve as primary transmission vectors and humans serve as hosts [7]–[9].
The genetic code comprises 64 codons that can be divided into 20 groups, where each group consists of one to six codons and corresponds to one of the standard amino acids. Alternative codons within the same group coding for the same amino acid are often termed ‘synonymous’ codons, although their corresponding tRNAs might differ in their relative abundance in cells and in the speed by which they are recognized by the ribosome. This redundancy of the genetic code, in which most of the amino acids can be translated by more than one codon, represents a key step in modulating the efficiency and accuracy of protein production, while maintaining the same amino acid sequence of the protein. However, synonymous codons are not chosen randomly, either within or between genomes; this is referred to as codon usage bias [10],[11]. Synonymous codon usage bias has been studied in a wide range of organisms, from prokaryotes to eukaryotes and viruses [12]–[17]. Studies on codon usage have determined several factors that could influence codon usage patterns, including mutational pressure, natural or translational selection, secondary protein structure, replication and selective transcription, hydrophobicity and hydrophilicity of the protein and the external environment. Among these, the major factors responsible for codon usage variation among different organisms are considered to be compositional constraints under mutational pressure and natural selection [12], [18]–[20]. Previous studies on codon usage in different viruses have highlighted mutational pressure, rather than natural selection, as the major factor in shaping codon usage patterns [12], [21]–[23]; however, as our understanding of codon usage increases, it appears that although mutational pressure is still a major driving force, it is certainly not the only one when considering different types of RNA and DNA viruses [24]–[27]. Compared with prokaryotic and eukaryotic genomes, viruses have comparatively small genomes and depend on the host’s machinery for key processes, including replication, protein synthesis and transmission; the interplay of codon usage between viruses and their hosts is therefore expected to affect overall viral survival, fitness, evasion of the host’s immune system and evolution [15], [28].
Therefore, knowledge of codon usage in viruses can not only reveal information about molecular evolution, but also improve our understanding of the regulation of viral gene expression and aid vaccine design, where the efficient expression of viral proteins may be required to generate immunity. In the present study, we report detailed codon usage data and an analysis of the various factors shaping the codon usage patterns in CHIKV genomes.

Results and Discussion

Nucleotide Composition Analysis of CHIKV Genomes

Codon usage bias, or preference for one type of codon over another, can be influenced greatly by the overall nucleotide composition of genomes [21]. Therefore, we first analyzed the nucleotide composition of coding sequences from CHIKV genomes. As shown in Table 1, the mean A% (28.91) was the highest, followed by similar values for G% (25.75) and C% (25.19), with U% being the lowest (20.16). The mean GC and AU compositions were 50.91% and 49.06%, respectively. This suggests that A, U, G, and C nucleotides might be equally or almost equally distributed among the codons of CHIKV, with potentially more preference towards A-ended codons followed by G/C-ended codons. However, a clearer picture of the overall nucleotide composition that could influence codon usage preference in CHIKV genomes emerged from the analysis of the nucleotide composition at the third position of codons (A3, U3, G3, C3) and of GC1, GC1,2, GC3 and AU3 (Table 1). The mean C3 and G3 were the highest, followed by A3 and U3. The GC3 values ranged from 54.9% to 57.2%, with a mean of 55.86% and a standard deviation (SD) of 0.40, whereas the AU3 values ranged from 42.8% to 45.1%, with a mean of 44.14% and an SD of 0.41. The GC1 values ranged from 50.6% to 53.8%, with a mean of 53.56% and an SD of 0.27. The GC1,2 values ranged from 48.2% to 48.7%, with a mean of 48.45% and an SD of 0.07. Therefore, from the initial nucleotide composition analysis, it is expected that G/C-ended codons might be preferred over A/U-ended codons in CHIKV genomes.
Table 1

Nucleotide composition analysis of CHIKV genomes (%).

No A U G C A3 U3 G3 C3 AU GC GC1 GC2 AU3 GC3 GC12 ENC
1 29.0 19.9 25.7 25.4 27.1 16.7 26.8 29.3 48.9 51.1 53.7 43.5 43.8 56.2 48.6 55.13
2 28.9 20.0 25.7 25.4 27.0 16.7 26.9 29.3 48.9 51.1 53.7 43.3 43.7 56.3 48.5 55.11
3 28.9 20.3 25.9 24.9 26.7 17.7 27.4 28.2 49.2 50.8 53.5 43.3 44.4 55.6 48.4 55.66
4 28.9 19.9 25.8 25.4 26.7 16.9 27.2 29.2 48.8 51.2 53.8 43.4 43.6 56.4 48.6 55.09
5 28.9 20.1 25.7 25.3 27.0 17.2 27.0 28.9 49.0 51.0 53.7 43.4 44.2 55.9 48.6 55.54
6 29.0 20.1 25.7 25.3 27.2 16.9 26.7 29.2 49.1 51.0 53.7 43.4 44.1 55.9 48.6 55.33
7 28.8 20.4 25.9 24.9 26.6 17.9 27.4 28.1 49.2 50.8 53.5 43.3 44.5 55.5 48.4 55.95
8 28.8 20.4 25.9 24.9 26.6 17.9 27.4 28.1 49.2 50.8 53.5 43.3 44.5 55.5 48.4 55.94
9 28.8 20.4 25.9 24.9 26.6 17.9 27.3 28.1 49.2 50.8 53.5 43.4 44.5 55.5 48.5 55.91
10 28.7 20.1 25.9 25.2 26.0 17.2 27.8 28.9 48.8 51.2 53.4 43.4 43.2 56.7 48.4 54.93
11 28.7 20.1 25.9 25.3 26.2 17.0 27.7 29.1 48.8 51.2 53.4 43.4 43.2 56.8 48.4 54.81
12 28.7 20.2 25.9 25.2 26.2 17.3 27.7 28.8 48.9 51.1 53.4 43.4 43.5 56.5 48.4 54.97
13 28.9 20.4 25.8 24.9 26.8 18.0 27.3 28.0 49.3 50.7 53.5 43.3 44.8 55.2 48.4 55.88
14 28.9 20.4 25.8 24.9 26.8 18.0 27.2 28.0 49.3 50.7 53.5 43.3 44.8 55.2 48.4 55.84
15 28.8 20.5 25.9 24.8 26.6 18.2 27.5 27.7 49.3 50.6 53.5 43.2 44.8 55.3 48.4 56.09
16 29.0 20.0 25.7 25.4 27.1 16.9 26.8 29.1 49.0 51.0 53.7 43.4 44.0 55.9 48.6 55.24
17 28.8 20.5 25.9 24.8 26.6 18.1 27.6 27.7 49.3 50.7 53.5 43.3 44.7 55.3 48.4 56.04
18 29.0 20.2 25.7 25.1 27.0 17.5 26.9 28.6 49.2 50.8 53.6 43.4 44.5 55.5 48.5 55.52
19 28.7 20.1 25.9 25.3 26.2 17.1 27.7 29.1 48.8 51.2 53.4 43.4 43.3 56.7 48.4 54.92
20 28.7 20.0 26.0 25.3 26.1 17.0 27.8 29.2 48.7 51.3 53.4 43.5 43.1 56.9 48.5 54.66
21 28.9 20.2 25.7 25.2 26.2 17.1 27.6 29.1 49.1 51.2 53.4 43.5 43.3 56.7 48.5 54.85
22 29.0 20.1 25.7 25.3 27.0 17.3 27.0 28.8 49.1 50.9 53.6 43.5 44.3 55.8 48.6 55.53
23 28.8 20.5 25.9 24.8 26.7 18.1 27.5 27.8 49.3 50.7 53.4 43.3 44.8 55.3 48.4 56.08
24 28.7 20.1 25.9 25.4 26.0 17.1 27.7 29.2 48.8 51.3 53.5 43.4 43.1 56.9 48.5 54.80
25 29.1 20.0 25.6 25.3 27.2 17.2 26.7 28.9 49.1 50.9 53.6 43.5 44.4 55.6 48.6 55.48
26 28.8 20.5 25.9 24.8 26.6 18.2 27.4 27.7 49.3 50.6 53.5 43.3 44.8 55.1 48.4 56.11
27 28.8 20.5 25.9 24.8 26.6 18.1 27.5 27.8 49.3 50.7 53.5 43.2 44.7 55.3 48.4 56.02
28 28.9 20.5 25.9 24.8 26.7 18.1 27.5 27.8 49.4 50.7 53.5 43.2 44.8 55.3 48.4 56.02
29 29.1 20.0 25.6 25.3 27.4 16.9 26.7 29.1 49.1 50.9 53.7 43.4 44.3 55.8 48.6 55.08
30 28.9 20.0 25.7 25.3 27.0 16.9 26.9 29.2 48.9 51.0 53.5 43.4 43.9 56.2 48.5 55.32
31 28.8 20.5 25.9 24.8 26.5 18.2 27.6 27.7 49.3 50.7 53.4 43.4 44.7 55.3 48.4 56.15
32 28.7 20.0 26.0 25.4 25.9 16.9 27.9 29.3 48.7 51.3 53.4 43.4 42.8 57.2 48.4 54.57
33 28.6 20.0 26.0 25.3 26.0 17.0 27.8 29.2 48.6 51.3 53.6 43.4 43.0 57.0 48.5 54.56
34 28.8 20.6 25.9 24.7 26.6 18.4 27.5 27.5 49.4 50.6 53.5 43.3 45.0 55.0 48.4 56.26
35 28.8 20.6 25.9 24.7 26.6 18.3 27.6 27.5 49.4 50.6 53.4 43.3 44.9 55.1 48.4 56.22
36 28.8 20.6 25.9 24.7 26.6 18.3 27.5 27.6 49.4 50.6 53.4 43.3 44.9 55.1 48.4 56.28
37 29.0 20.1 25.7 25.3 26.9 17.3 27.0 28.8 49.1 50.9 53.6 43.4 44.2 55.7 48.5 55.55
38 28.7 20.1 25.9 25.2 26.1 17.2 27.7 29.0 48.8 51.1 53.4 43.3 43.3 56.7 48.4 54.55
39 29.0 20.0 25.7 25.3 26.9 17.1 27.0 29.1 49.0 51.0 53.6 43.3 44.0 56.0 48.5 55.45
40 29.0 20.0 25.7 25.3 26.9 17.1 27.0 29.1 49.0 51.0 53.6 43.3 44.0 56.0 48.5 55.45
41 28.9 20.1 25.7 25.3 26.4 16.8 27.1 29.7 49.0 51.4 53.8 43.7 43.2 56.8 48.8 55.44
42 28.8 19.7 25.9 25.6 26.7 18.4 27.4 27.5 48.5 50.5 53.4 43.3 45.1 54.9 48.4 56.25
43 28.8 20.7 25.9 24.6 26.9 17.2 26.9 29.0 49.5 50.9 53.6 43.3 44.1 55.9 48.5 55.51
44 29.0 20.1 25.7 25.3 26.7 18.4 27.3 27.5 49.1 50.5 53.4 43.3 45.1 54.9 48.4 56.28
45 28.8 20.7 25.9 24.6 26.7 18.4 27.4 27.5 49.5 50.6 53.5 43.3 45.1 54.9 48.4 56.23
46 28.8 20.7 25.9 24.6 26.8 17.2 27.0 29.0 49.5 51.0 53.6 43.3 44.0 56.0 48.5 55.49
47 29.0 20.1 25.7 25.3 26.9 17.2 26.9 28.9 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.53
48 29.0 20.1 25.7 25.3 27.0 17.1 26.9 29.0 49.1 51.0 53.7 43.3 44.1 55.9 48.5 55.42
49 29.0 20.0 25.7 25.3 26.8 17.2 27.0 29.0 49.0 50.9 53.6 43.3 44.0 55.9 48.5 55.46
50 29.0 20.0 25.7 25.3 26.9 17.1 27.0 29.1 49.0 51.0 53.6 43.3 44.0 56.0 48.5 55.46
51 28.9 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.0 51.0 53.6 43.3 44.0 56.0 48.5 55.46
52 29.0 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.1 50.9 53.6 43.3 44.0 55.9 48.5 55.51
53 29.0 20.1 25.7 25.3 26.9 17.1 27.0 29.0 49.1 51.0 53.6 43.4 44.0 56.0 48.5 55.44
54 29.0 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.1 51.0 53.6 43.3 44.0 56.0 48.5 55.47
55 28.9 20.1 25.7 25.3 26.9 17.2 26.9 29.0 49.0 50.9 53.5 43.3 44.1 55.9 48.4 55.46
56 29.0 20.1 25.7 25.3 26.8 17.2 27.0 28.9 49.1 51.0 53.6 43.4 44.0 55.9 48.5 55.53
57 28.9 20.1 25.7 25.3 26.9 17.2 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.52
58 29.0 20.1 25.7 25.3 26.9 17.2 26.9 29.0 49.1 51.0 53.6 43.3 44.1 55.9 48.5 55.43
59 29.0 20.1 25.7 25.3 26.9 17.3 27.0 28.9 49.1 50.9 53.6 43.3 44.2 55.9 48.5 55.53
60 29.0 20.1 25.7 25.3 26.9 17.3 27.0 28.9 49.1 50.9 53.7 43.2 44.2 55.8 48.5 55.54
61 29.0 20.1 25.7 25.3 26.9 17.3 26.9 28.9 49.1 50.9 53.6 43.2 44.2 55.8 48.4 55.49
62 29.0 20.1 25.7 25.3 26.8 17.1 27.0 29.0 49.1 51.0 53.6 43.3 43.9 56.0 48.5 55.44
63 29.0 20.0 25.7 25.3 26.9 17.0 27.0 29.1 49.0 51.0 53.6 43.3 43.9 56.1 48.5 55.45
64 29.0 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.1 50.9 53.6 43.3 44.0 56.0 48.5 55.50
65 29.0 20.1 25.7 25.3 26.9 17.2 26.9 29.0 49.1 51.0 53.6 43.3 44.1 55.9 48.5 55.50
66 29.0 20.1 25.7 25.3 26.9 17.2 27.0 29.0 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.49
67 28.7 20.1 25.9 25.3 26.9 17.1 26.9 29.1 48.8 51.0 53.6 43.4 44.0 56.0 48.5 55.49
68 29.0 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.1 51.0 53.6 43.3 44.0 56.0 48.5 55.50
69 29.0 20.1 25.7 25.3 26.9 17.2 26.9 29.0 49.1 50.9 53.5 43.3 44.1 55.9 48.4 55.52
70 29.1 20.0 25.9 25.0 27.1 16.9 27.1 28.9 49.1 50.9 53.7 43.2 44.0 56.0 48.5 55.11
71 28.7 20.6 26.0 24.7 26.9 17.2 27.0 28.9 49.3 50.9 53.6 43.3 44.1 55.9 48.5 55.42
72 29.0 20.1 25.6 25.2 26.6 18.3 27.5 27.7 49.1 50.6 53.4 43.4 44.9 55.1 48.4 56.28
73 29.0 20.1 25.7 25.3 26.9 17.2 27.0 28.9 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.57
74 29.0 20.1 25.7 25.2 26.9 17.2 27.0 28.9 49.1 51.0 53.6 43.3 44.1 55.9 48.5 55.55
75 29.0 20.1 25.7 25.3 26.9 17.3 27.0 28.9 49.1 50.9 53.6 43.3 44.2 55.8 48.5 55.55
76 29.0 20.1 25.7 25.3 26.9 17.2 26.9 28.9 49.1 50.9 53.6 43.2 44.1 55.9 48.4 55.49
77 29.0 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.1 51.0 53.7 43.4 44.1 55.9 48.6 55.61
78 28.9 20.1 25.7 25.3 26.9 17.2 27.0 29.0 49.0 51.0 53.7 43.4 44.1 56.0 48.6 55.58
79 29.0 20.1 25.7 25.2 26.9 17.3 27.0 28.9 49.1 50.9 53.6 43.3 44.2 55.9 48.5 55.58
80 28.9 20.1 25.7 25.2 26.9 17.3 27.0 28.9 49.0 50.9 53.5 43.4 44.2 55.9 48.5 55.55
81 28.9 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.0 51.0 53.6 43.3 44.0 56.0 48.5 55.55
82 28.9 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.0 50.9 53.6 43.3 44.0 56.0 48.5 55.51
83 29.0 20.1 25.6 25.2 27.0 17.3 26.8 28.9 49.1 50.8 53.5 43.3 44.3 55.7 48.4 55.63
84 29.0 20.1 25.7 25.2 26.8 17.3 27.0 28.8 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.60
85 28.9 20.1 25.7 25.3 26.9 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.2 55.8 48.5 55.48
86 28.9 20.1 25.7 25.3 26.8 17.3 27.0 28.8 49.0 50.9 53.6 43.3 44.1 55.8 48.5 55.61
87 29.0 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.1 50.9 53.6 43.3 44.0 56.0 48.5 55.51
88 28.9 20.1 25.7 25.2 27.0 17.2 26.8 29.0 49.0 50.9 53.6 43.3 44.2 55.8 48.5 55.53
89 29.0 20.1 25.6 25.3 26.8 17.3 27.0 28.9 49.1 50.9 53.6 43.3 44.1 56.0 48.5 55.58
90 29.0 20.1 25.6 25.3 26.9 17.2 27.0 29.0 49.1 50.9 53.6 43.2 44.1 55.9 48.4 55.38
91 29.0 20.1 25.6 25.3 27.0 17.3 26.8 28.9 49.1 50.9 53.6 43.3 44.3 55.8 48.5 55.54
92 29.0 20.1 25.7 25.3 27.0 17.2 26.9 29.0 49.1 50.9 53.6 43.3 44.2 55.8 48.5 55.53
93 29.0 20.1 25.7 25.3 26.9 17.2 27.0 29.0 49.1 50.9 53.6 43.2 44.1 56.0 48.4 55.44
94 29.0 20.1 25.7 25.3 26.9 17.2 27.0 28.9 49.1 50.9 53.6 43.2 44.1 55.9 48.4 55.42
95 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.59
96 29.0 20.1 25.7 25.2 26.9 17.2 27.0 29.0 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.43
97 29.0 20.1 25.7 25.3 26.8 17.3 27.0 28.9 49.1 50.9 53.5 43.3 44.1 55.9 48.4 55.57
98 28.9 20.2 25.7 25.2 26.9 17.2 27.0 29.0 49.1 50.9 53.6 43.2 44.1 56.0 48.4 55.42
99 29.0 20.1 25.7 25.3 26.8 17.3 27.0 28.9 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.59
100 28.9 20.2 25.7 25.2 27.0 17.2 26.9 29.0 49.1 50.9 53.6 43.3 44.2 55.9 48.5 55.41
101 29.0 20.1 25.7 25.3 26.7 17.3 27.1 28.9 49.1 50.9 53.6 43.3 44.0 56.0 48.5 55.56
102 29.0 20.1 25.7 25.3 26.8 17.3 27.0 28.9 49.1 50.9 53.5 43.2 44.1 56.0 48.4 55.59
103 29.0 20.1 25.7 25.3 26.8 17.3 27.0 28.9 49.1 50.9 53.6 43.3 44.1 55.9 48.5 55.63
104 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 51.0 53.6 43.3 44.1 55.9 48.5 55.58
105 28.9 20.1 25.7 25.3 26.8 17.2 27.0 29.0 49.0 51.0 53.7 43.4 44.0 56.0 48.6 55.56
106 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.8 49.0 50.9 53.6 43.3 44.1 55.8 48.5 55.60
107 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.63
108 28.9 20.1 25.7 25.3 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.65
109 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.65
110 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.63
111 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.66
112 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.8 49.0 50.9 53.6 43.3 44.1 55.8 48.5 55.67
113 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.8 48.5 55.63
114 28.9 20.1 25.7 25.2 26.8 17.2 27.0 28.9 49.0 50.9 53.6 43.3 44.0 56.0 48.5 55.58
115 29.0 20.2 25.6 25.2 26.9 17.4 26.9 28.8 49.2 50.8 53.5 43.3 44.3 55.7 48.4 55.62
116 29.0 20.1 25.6 25.2 27.0 17.3 26.8 28.9 49.1 50.9 53.6 43.3 44.3 55.7 48.5 55.59
117 28.9 20.1 25.7 25.2 26.3 17.0 27.2 29.5 49.0 51.4 53.8 43.7 43.3 56.7 48.8 55.49
118 28.8 19.8 25.9 25.5 26.8 17.4 27.0 28.8 48.6 50.9 53.6 43.3 44.2 55.8 48.5 55.63
119 28.9 20.1 25.7 25.3 26.8 17.3 27.0 28.9 49.0 50.9 53.6 43.3 44.1 55.9 48.5 55.59
120 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.8 49.0 50.9 53.6 43.4 44.1 55.8 48.5 55.63
121 28.9 20.2 25.7 25.2 26.8 17.4 27.0 28.8 49.1 50.9 53.5 43.3 44.2 55.8 48.4 55.66
122 28.9 20.2 25.7 25.2 26.8 17.5 27.0 28.7 49.1 50.9 53.6 43.3 44.3 55.7 48.5 55.79
123 28.9 20.1 25.7 25.2 26.8 17.4 27.1 28.8 49.0 50.9 53.6 43.4 44.2 55.9 48.5 55.55
124 28.9 20.2 25.7 25.2 26.7 17.5 27.1 28.8 49.1 50.9 53.6 43.3 44.2 55.9 48.5 55.84
125 28.9 20.1 25.7 25.2 26.8 17.3 27.1 28.8 49.0 51.0 53.6 43.3 44.1 56.0 48.5 55.55
126 28.9 20.1 25.7 25.2 26.8 17.4 27.0 28.8 49.0 50.9 53.6 43.3 44.2 55.8 48.5 55.77
127 28.9 20.1 25.7 25.2 26.8 17.3 27.0 28.9 49.0 50.9 53.7 43.3 44.1 55.9 48.5 55.68
128 28.9 20.2 25.7 25.2 26.8 17.4 27.0 28.8 49.1 50.9 53.6 43.4 44.2 55.8 48.5 55.54
129 29.0 20.1 25.7 25.3 27.1 16.9 26.9 29.1 49.1 50.9 53.4 43.3 44.0 56.0 48.4 55.28
130 28.7 20.6 26.0 24.7 26.5 18.3 27.6 27.6 49.3 50.7 53.5 43.3 44.8 55.2 48.4 56.19
131 28.9 20.1 25.7 25.2 26.9 17.4 27.0 28.8 49.0 50.9 53.7 43.3 44.3 55.7 48.5 55.60
132 28.9 20.1 25.7 25.2 26.9 17.3 27.0 28.8 49.0 50.9 53.7 43.3 44.2 55.8 48.5 55.60
133 28.9 20.1 25.7 25.2 26.9 17.3 27.0 28.8 49.0 50.9 53.7 43.3 44.2 55.8 48.5 55.60
134 28.9 20.1 25.7 25.3 26.9 17.3 27.0 28.8 49.0 50.9 53.7 43.3 44.2 55.8 48.5 55.59
135 28.9 20.1 25.7 25.3 26.9 17.3 27.0 28.8 49.0 50.9 53.7 43.3 44.2 55.8 48.5 55.59
136 28.9 20.1 25.7 25.2 26.9 17.3 27.0 28.8 49.0 50.9 53.7 43.3 44.2 55.8 48.5 55.61
137 28.9 20.1 25.7 25.2 26.9 17.3 27.0 28.8 49.0 50.9 53.7 43.3 44.2 55.8 48.5 55.61
138 28.8 20.6 26.0 24.7 26.6 18.2 27.5 27.7 49.4 50.6 53.3 43.2 44.8 55.3 48.3 56.41
139 28.8 20.6 26.0 24.7 26.5 18.2 27.6 27.7 49.4 50.6 50.6 50.6 44.7 55.3 48.3 56.41
140 29.0 20.1 25.9 25.3 27.0 17.1 26.8 29.1 49.1 51.0 53.6 43.4 44.1 55.9 48.5 55.39
141 28.9 20.0 25.7 25.3 27.3 17.0 26.7 29.0 48.9 50.9 53.7 43.3 44.3 55.7 48.5 55.25
Mean 28.91 20.16 25.75 25.19 26.78 17.36 27.10 28.76 49.07 50.91 53.56 43.38 44.14 55.86 48.45 55.56
SD 0.10 0.18 0.10 0.20 0.25 0.39 0.27 0.47 0.16 0.16 0.27 0.62 0.41 0.40 0.07 0.34

SD: Standard deviation.
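
The composition columns of Table 1 can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence and counts every codon, including start and stop codons; the function name `composition` and the toy sequence are hypothetical, and the software and exclusions actually used by the authors are not stated in this section. GC12 is taken as the mean of GC1 and GC2, which agrees with the rows of Table 1.

```python
def composition(cds: str) -> dict:
    """Percent nucleotide composition of an in-frame coding sequence.

    Assumption: every codon is counted (no codons are excluded).
    """
    seq = cds.upper().replace("T", "U")
    if not seq or len(seq) % 3:
        raise ValueError("sequence length must be a non-zero multiple of 3")

    def pct(bases: str, text: str) -> float:
        # Percentage of positions in `text` occupied by any base in `bases`.
        return 100.0 * sum(text.count(b) for b in bases) / len(text)

    pos1, pos2, pos3 = seq[0::3], seq[1::3], seq[2::3]
    out = {b: pct(b, seq) for b in "AUGC"}                # A, U, G, C
    out.update({b + "3": pct(b, pos3) for b in "AUGC"})   # A3, U3, G3, C3
    out["AU"] = pct("AU", seq)
    out["GC"] = pct("GC", seq)
    out["GC1"] = pct("GC", pos1)
    out["GC2"] = pct("GC", pos2)
    out["AU3"] = pct("AU", pos3)
    out["GC3"] = pct("GC", pos3)
    out["GC12"] = (out["GC1"] + out["GC2"]) / 2           # mean of GC1 and GC2
    return out


# Toy example (not a CHIKV sequence): codons AUG GCC GAA UUC.
toy = composition("ATGGCCGAATTC")
print(toy["GC3"], toy["AU3"], toy["GC12"])
```
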


Relative Synonymous Codon Usage (RSCU) Analysis of CHIKV

To determine the patterns of synonymous codon usage and the extent to which G/C-ended codons might be preferred, we calculated RSCU values. Among the 18 most abundantly used codons in CHIKV genomes, eleven (UUC, CUG, AUC, GUG, CCG, UAC, UGC, CAC, CAG, AAC and GAC) were G/C-ended (C-ended: 7; G-ended: 4) and the remaining seven (ACA, GCA, UCA, AGA, AAA, GAA, GGA) were A-ended; none of the preferred codons were U-ended (Figure 1A and Table 2). The RSCU analysis thus shows that CHIKV exhibits a comparatively stronger codon usage bias towards G/C-ended codons and a weaker one towards A-ended codons. It is also interesting to note that although the mean GC% and AU% values are very similar (Table 1), the G/C-ended codons were used in a comparatively biased manner, indicating that the G/C content at the third codon position influenced the overall synonymous codon usage pattern. The general trend across the 59 synonymous codons was also relatively consistent among the different genotypes of CHIKV, indicating that the evolutionary processes of the three genotypes of CHIKV are restricted by the synonymous codon usage pattern to some extent (Figure 1B and Table 2). Furthermore, analysis of over- and under-represented codons showed that codons with an RSCU>1.6 are infrequent in CHIKV genomes; the RSCU values of the majority of preferred and non-preferred codons fell between 0.6 and 1.6. We further divided the RSCU data into three groups: (A) codons with RSCU<0.6 (under-represented), (B) codons with RSCU values between 0.6 and 1.6 (unbiased/randomly represented), and (C) codons with RSCU>1.6 (over-represented). Among the 59 codons, only CUG (Leu) and AGA (Arg) had an RSCU>1.6. The under-represented codons (RSCU<0.6) were CUU and CUC for Leu, GUU for Val, and CGU and CGG for Arg. The remaining 52 codons had RSCU values between 0.6 and 1.6 (Figure 1 and Table 2).
These findings suggested that, despite being an RNA virus with a high mutation rate in its lifecycle, CHIKV has evolved to form a relatively stable genetic composition at some specific levels of synonymous codon usage. This was further confirmed by the ENC and CAI analyses discussed in the following sections. Combining the nucleotide composition and RSCU analyses, we deduced that the selection of preferred codons has been mostly influenced by compositional constraints, which also accounts for the presence of mutational pressure. However, we suspect that compositional constraints may not be the sole factor associated with codon usage patterns in CHIKV, because although the overall RSCU values reveal the codon usage pattern for the genomes, they may hide the codon usage variation among different genes in a genome [29].
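
As an illustration of the RSCU calculation and the three-way grouping described above, the following Python sketch uses the standard definition of RSCU (the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid) and the 0.6 and 1.6 thresholds from the text. The function names `rscu` and `classify` and the toy sequence are hypothetical; the sketch assumes the standard genetic code and excludes Met, Trp and stop codons, which leaves the 59 codons mentioned above.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """RSCU per codon; amino acids absent from the sequence are skipped."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":                       # keep the 59 synonymous codons
            families.setdefault(aa, []).append(codon)
    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        for c in codons:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(codons) / total
    return values


def classify(value: float) -> str:
    """Grouping used in the text: <0.6, 0.6 to 1.6, >1.6."""
    if value < 0.6:
        return "under-represented"
    if value > 1.6:
        return "over-represented"
    return "unbiased"


# Toy sequence (not CHIKV): CUG x3, CUA x1, UUU x1, UUC x3.
toy = rscu("CTGCTGCTGCTATTTTTCTTCTTC")
for codon in ("CUG", "CUA", "UUU", "UUC"):
    print(codon, toy[codon], classify(toy[codon]))
```

Applied to the overall CHIKV values in Table 2, `classify(2.11)` places AGA in the over-represented group and `classify(0.38)` places CGU in the under-represented group.
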
Figure 1

Comparative analysis of relative synonymous codon usage (RSCU) patterns.

(A) between chikungunya virus (CHIKV), Homo sapiens (HS), Pan troglodytes (PT) and Aedes aegypti (AG) and Aedes albopictus (AB). (B) between east central south African (ECSA), Asian and West African (WA) genotypes of CHIKV.

Table 2

The synonymous codon usage patterns of CHIKV, its hosts and transmission vectors.

RSCU values. Columns 3–6: CHIKV (Overall, ECSA, Asian, WA); columns 7–10: hosts and vectors (HS, PT, AG, AB).
AA Codon Overall ECSA Asian WA HS PT AG AB
Phe(b,c) UUU 0.76 0.74 0.81 0.98 0.92 0.78 0.56 0.48
Phe UUC 1.24 1.26 1.19 1.02 1.08 1.22 1.44 1.52
Leu(b,d) UUA 0.60 0.60 0.69 0.43 0.48 0.36 0.36 0.24
Leu UUG 0.88 0.84 0.93 1.24 0.78 0.66 1.32 1.14
Leu CUU 0.59 0.57 0.62 0.68 0.78 0.72 0.66 0.48
Leu CUC 0.57 0.58 0.49 0.55 1.20 1.38 0.84 0.84
Leu CUA 1.39 1.43 1.34 1.04 0.42 0.42 0.54 0.54
Leu CUG 1.98 1.98 1.92 2.06 2.40 2.58 2.28 2.76
Ile(b,c) AUU 0.70 0.70 0.68 0.74 1.08 0.96 0.99 0.75
Ile AUC 1.31 1.31 1.30 1.26 1.41 1.56 1.59 1.86
Ile AUA 0.99 0.99 1.02 0.99 0.51 0.48 0.39 0.39
Val(b,c) GUU 0.51 0.51 0.51 0.41 0.72 0.60 1.04 0.88
Val GUC 1.12 1.14 1.05 1.10 0.96 1.00 1.08 1.32
Val GUA 0.97 0.95 1.02 0.99 0.48 0.36 0.60 0.52
Val GUG 1.41 1.40 1.42 1.51 1.84 2.04 1.28 1.32
Pro(b,d) CCU 0.84 0.84 0.88 0.69 1.16 1.08 0.68 0.36
Pro CCC 0.71 0.71 0.64 0.80 1.28 1.40 0.84 1.12
Pro CCA 1.20 1.19 1.15 1.37 1.12 0.96 1.20 1.08
Pro CCG 1.26 1.26 1.33 1.14 0.44 0.52 1.32 1.44
Thr(a,c) ACU 0.79 0.80 0.76 0.77 1.00 0.84 0.80 0.64
Thr ACC 0.98 0.99 0.97 0.93 1.44 1.68 1.48 1.80
Thr ACA 1.33 1.34 1.28 1.36 1.12 1.00 0.72 0.60
Thr ACG 0.90 0.88 0.99 0.93 0.44 0.44 1.00 1.00
Ala(a,c) GCU 0.66 0.66 0.67 0.65 1.08 1.08 1.08 1.00
Ala GCC 1.11 1.10 1.13 1.18 1.60 1.56 1.48 1.80
Ala GCA 1.43 1.44 1.37 1.49 0.92 0.80 0.76 0.60
Ala GCG 0.80 0.80 0.84 0.67 0.44 0.56 0.68 0.60
Tyr(b,c) UAU 0.73 0.75 0.72 0.55 0.88 0.78 0.64 0.56
Tyr UAC 1.27 1.25 1.28 1.45 1.12 1.22 1.36 1.44
Ser(a,c) UCU 0.95 0.95 0.89 1.07 1.14 1.20 0.66 0.54
Ser UCC 1.00 0.99 1.07 1.00 1.32 1.44 1.20 1.38
Ser UCA 1.32 1.35 1.27 1.17 0.90 0.78 0.66 0.48
Ser UCG 0.86 0.86 0.88 0.87 0.30 0.30 1.44 1.68
Ser AGU 0.62 0.64 0.59 0.51 0.90 0.78 0.96 0.78
Ser AGC 1.25 1.22 1.31 1.39 1.44 1.44 1.08 1.08
Arg(a,c) AGA 2.11 2.10 2.20 2.04 1.26 1.26 0.66 0.60
Arg CGU 0.38 0.38 0.39 0.48 0.48 0.42 1.38 1.50
Arg CGC 1.03 1.04 0.99 0.91 1.08 1.20 1.26 1.32
Arg CGA 0.63 0.63 0.66 0.59 0.66 0.60 1.20 0.96
Arg CGG 0.46 0.45 0.49 0.46 1.20 1.14 1.02 1.20
Arg AGG 1.38 1.39 1.28 1.53 1.26 1.32 0.54 0.42
Cys(b,c) UGU 0.67 0.69 0.71 0.44 0.92 0.84 0.84 0.70
Cys UGC 1.33 1.31 1.29 1.56 1.08 1.16 1.16 1.30
His(b,c) CAU 0.65 0.61 0.75 0.89 0.84 2.40 0.84 0.76
His CAC 1.35 1.39 1.25 1.11 1.16 1.20 1.16 1.24
Gln(b,c) CAA 0.89 0.90 0.89 0.80 0.54 0.46 0.82 0.60
Gln CAG 1.11 1.10 1.11 1.20 1.46 1.54 1.18 1.40
Asn(b,c) AAU 0.74 0.73 0.84 0.66 0.94 0.84 0.80 0.64
Asn AAC 1.26 1.27 1.16 1.34 1.06 1.16 1.20 1.36
Lys(a,d) AAA 1.00 1.02 0.96 0.97 0.86 0.80 0.80 0.58
Lys AAG 1.00 0.98 1.04 1.03 1.14 1.20 1.20 1.42
Asp(b,c) GAU 0.61 0.59 0.71 0.66 0.92 0.80 1.12 0.96
Asp GAC 1.39 1.41 1.29 1.34 1.08 1.20 0.88 1.04
Glu(a,d) GAA 1.09 1.09 1.11 1.06 0.84 0.68 1.16 1.10
Glu GAG 0.91 0.91 0.89 0.94 1.16 1.32 0.84 0.90
Gly(a,c) GGU 0.72 0.69 0.83 0.70 0.64 0.56 1.12 1.24
Gly GGC 0.95 0.96 0.87 0.99 1.36 1.40 1.04 1.08
Gly GGA 1.27 1.26 1.27 1.34 1.00 0.92 1.48 1.20
Gly GGG 1.07 1.08 1.04 0.97 1.00 1.16 0.36 0.48

AA: amino acid, HS: H. sapiens, AG: A. aegypti, AB: A. albopictus, PT: P. troglodytes. Preferred codons of CHIKV, H. sapiens, A. aegypti, A. albopictus and P. troglodytes are shown in bold.

Amino acids with A/U-ended preferred codons in CHIKV.

Amino acids with G/C-ended preferred codons in CHIKV.

Amino acids with A/U-ended preferred codons in CHIKV.

Amino acids with G/C-ended preferred codons in CHIKV.

Codon Usage Bias among CHIKV

To quantify the extent of variation in codon usage among CHIKV genomes from different geographical regions and genotypes, the ENC value of each genome was calculated. The ENC values among CHIKV genomes ranged from 54.55 to 56.41, with a mean of 55.56 and an SD of 0.34 (Table 1). A mean value of 55.56 (ENC>40) represents stable ENC values and indicates a relatively conserved genomic composition among different CHIKV genomes. In general, there is an inverse relationship between ENC and gene expression; i.e., a lower ENC value indicates a higher codon usage preference and higher gene expression, and vice versa [30]. Our results show that the overall codon usage bias and gene expression among different CHIKV genomes are low, with codon usage only slightly biased and mainly affected by base composition. Previous studies on codon usage in other RNA viruses, such as bovine viral diarrhea virus (ENC: 50.91) [22], classical swine fever virus (ENC: 51.7) [17] and hepatitis C virus (HCV) (ENC: 52.62) [31], have also reported low codon usage bias. The same is true of arthropod-borne RNA viruses, including West Nile virus (ENC: 53.81) [15] and dengue virus (DENV) (ENC: 49.70 for DENV-1; 48.78 for DENV-2; 49.52 for DENV-3; and 50.81 for DENV-4) [14]. A possible explanation for the weak codon bias of RNA viruses is that it might be advantageous for efficient replication in host cells with potentially distinct codon preferences [21]. The codon adaptation index (CAI) is often used as a measure of the level of gene expression and to assess the adaptation of viral genes to their hosts. Highly expressed genes exhibit a strong bias for particular codons in many bacteria and small eukaryotes. In contrast to the ENC, which measures deviation from uniform codon usage (the null hypothesis), the CAI measures the deviation of a given protein-coding gene sequence with respect to a reference set of genes [32].
Here, we calculated the CAI values of coding sequences from CHIKV genomes. The CAI values ranged from 0.21 to 0.22, with a mean of 0.22 and an SD of 0.001 (data not shown). The mean CAI value was low, indicating low codon usage bias and expression levels, in agreement with the ENC analysis.
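
The ENC can be illustrated with a short Python sketch. It assumes Wright's estimator, in which a codon homozygosity F = (n·Σp² − 1)/(n − 1) is computed for each amino acid, averaged within the two-, three-, four- and six-fold degeneracy classes, and combined as ENC = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The exact implementation and options used by the authors are not given in this section, the function name `enc` is hypothetical, and the sketch does not handle the case in which a whole degeneracy class is missing.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases U, C, A, G at each position.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def enc(cds: str) -> float:
    """Effective number of codons (assumed Wright-style estimator), capped at 61."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    f_by_class = defaultdict(list)            # degeneracy -> homozygosity values
    for codons in families.values():
        k = len(codons)
        n = sum(counts[c] for c in codons)
        if k == 1 or n < 2:                   # Met/Trp, or too few observations
            continue
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f = (n * sum_p2 - 1) / (n - 1)
        if f > 0:
            f_by_class[k].append(f)

    total = 2.0                               # Met and Trp contribute one codon each
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        if k not in f_by_class:
            raise ValueError(f"no usable amino acid with {k} synonymous codons")
        mean_f = sum(f_by_class[k]) / len(f_by_class[k])
        total += n_amino_acids / mean_f
    return min(total, 61.0)
```

Under this estimator, a sequence that uses a single codon for every amino acid gives the minimum value of 20, and perfectly even usage approaches the maximum of 61; the CHIKV mean of 55.56 lies near the upper end of that range.
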

Relationship between Codon Usage Patterns of CHIKV and its Hosts

Because viruses are parasitic, their codon usage patterns can be expected to be affected by their hosts to some extent [33]. For instance, the codon usage pattern of poliovirus is reported to be mostly coincident with that of its host [34], while the codon usage pattern of hepatitis A virus was reported to be antagonistic to that of its host [35]. We therefore computed and compared the codon usage of CHIKV with that of its two hosts (Homo sapiens and Pan troglodytes) and transmission vectors (A. aegypti and A. albopictus). The results showed that the codon usage patterns of CHIKV were a mixture of coincidence and antagonism to those of its hosts and vectors (Table 2). In detail, the preferred codons for 12 out of 18 amino acids were common between CHIKV and H. sapiens. These were UUC (Phe), CUG (Leu), AUC (Ile), GUG (Val), UAC (Tyr), AGA (Arg), UGC (Cys), CAC (His), CAG (Gln), AAC (Asn), AAG (Lys) and GAC (Asp). Furthermore, all common preferred codons between CHIKV and H. sapiens were G/C-ended (C-ended: 7; G-ended: 4), with the exception of an A-ended preferred codon for the amino acid Arg. Similarly, preferred codons for 10 out of 18 amino acids were common between CHIKV and P. troglodytes. In the case of the two transmission vectors, 10 out of 18 preferred codons were common among both mosquito species and CHIKV. It is also interesting to note that, except for the amino acid Arg, the remaining 10 highly preferred codons were the same among CHIKV, H. sapiens, A. aegypti and A. albopictus. Moreover, the preferred codon usage profiles of A. aegypti and A. albopictus were also very similar: 16 out of 18 preferred codons were common between them, with exceptions for the preferred codons for Asp and Gly (Table 2). These results indicated that selection pressures from hosts and vectors have influenced the codon usage pattern of CHIKV and the possible fitness of the virus to adjust among its dynamic range of hosts and vectors.
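
The comparison of preferred codons can be sketched in a few lines of Python. The sketch assumes that the preferred codon of an amino acid is simply the synonymous codon with the highest RSCU; the RSCU values are copied from Table 2 (CHIKV overall and H. sapiens) for two amino acids only, and the helper names `preferred` and `compare` are hypothetical.

```python
# RSCU values from Table 2: CHIKV (overall) and H. sapiens, for Phe and Pro only.
chikv = {
    "Phe": {"UUU": 0.76, "UUC": 1.24},
    "Pro": {"CCU": 0.84, "CCC": 0.71, "CCA": 1.20, "CCG": 1.26},
}
human = {
    "Phe": {"UUU": 0.92, "UUC": 1.08},
    "Pro": {"CCU": 1.16, "CCC": 1.28, "CCA": 1.12, "CCG": 0.44},
}


def preferred(rscu_by_aa: dict) -> dict:
    """Preferred codon per amino acid: the codon with the highest RSCU."""
    return {aa: max(codons, key=codons.get) for aa, codons in rscu_by_aa.items()}


def compare(virus: dict, host: dict) -> dict:
    """Label each amino acid by whether virus and host share a preferred codon."""
    v, h = preferred(virus), preferred(host)
    return {aa: "same" if v[aa] == h[aa] else "different" for aa in v}


print(preferred(chikv))       # Phe -> UUC, Pro -> CCG
print(preferred(human))       # Phe -> UUC, Pro -> CCC
print(compare(chikv, human))  # Phe -> same, Pro -> different
```

For these two amino acids the result matches the text: Phe (UUC) is among the 12 preferred codons shared by CHIKV and H. sapiens, whereas Pro is not.
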
A mixture of coincidence and antagonism has also been reported previously for HCV [31] and enterovirus 71 [13]. It was suggested that the coincident portions of codon usage between viruses and their hosts could enable the corresponding amino acids to be translated efficiently, while the antagonistic portions may enable viral proteins to be folded properly, although the translation efficiency of the corresponding amino acids might decrease [31]. Although the comparison of individual RSCU values given above is frequently used to estimate the effect of the hosts' synonymous codon usage on that of specific viruses, it is limited in revealing the effect of the overall codon usage of the hosts on the formation of viral codon usage patterns. Therefore, we used a recently proposed method that comprehensively estimates the degree of similarity between the overall codon usage patterns of viruses and their hosts by treating the 59 synonymous codons as 59 different spatial vectors. The advantage of this formula, as reported by its authors for dengue viruses, is that it compares overall codon usage instead of estimating each synonymous codon directly; the method thus prevents variation among the 59 individual synonymous codons from confounding the estimate of the host's effect on viral codon usage [36]. The similarity index D(A,B) was therefore calculated for each genotype of CHIKV in relation to its hosts and vectors. The similarity index was highest for A. albopictus vs. CHIKV, followed by P. troglodytes vs. CHIKV and A. aegypti vs. CHIKV, and lowest for H. sapiens vs. CHIKV (Figure 2), indicating that the effect of A. albopictus and P. troglodytes on the formation of the overall codon usage patterns of CHIKV is relatively higher than that of A. aegypti and H. sapiens.
Secondly, we computed the effect of transmission vectors on the formation of the overall codon usage patterns of the three genotypes of CHIKV. A. aegypti had the strongest effect on the east central south African (ECSA) genotype, followed by the West African (WA) and Asian genotypes. In the case of A. albopictus, the strongest effect was noted on the ECSA genotype, followed by the Asian and WA genotypes. As for the effects of the two primates on the formation of the overall codon usage of CHIKV, the strongest effect of H. sapiens was on the Asian genotype, closely followed by the ECSA and WA genotypes. By contrast, P. troglodytes had its strongest and equal effect on the ECSA and Asian genotypes, followed by the WA genotype (Figure 2). Therefore, from the similarity index analysis, we observed that selection pressure from hosts and vectors has contributed to shaping the molecular evolution of CHIKV at the level of codon usage. The effect of the hosts was unevenly distributed among different genotypes, potentially indicating different evolutionary rates of CHIKV isolates. The calculation of the effects of primates and transmission vectors on the overall codon usage patterns of CHIKV showed that P. troglodytes and A. albopictus dominate the effects of H. sapiens and A. aegypti, respectively, on the formation of the overall codon usage patterns of CHIKV (Figure 2). The stronger effect of P. troglodytes than of H. sapiens could also be attributed to the maintenance of CHIKV in a yellow fever-like zoonotic sylvatic cycle and its dependence upon non-human primates as reservoir hosts [7], [9]. Moreover, the similarity index of codon usage was also the highest between CHIKV and A. albopictus, as compared with A. aegypti, P. troglodytes and H. sapiens. The successful human-to-human transmission of CHIKV depends on Aedes mosquitoes [7], [9]; therefore, the stronger effect of A. albopictus on all three genotypes of CHIKV suggests that this vector might be a more efficient reservoir for viral replication and transmission compared with A. aegypti. These results are in agreement with recent studies showing more efficient dissemination and transmission of CHIKV by A. albopictus, which contribute to its ongoing re-emergence in a series of large-scale epidemics [37], [38].
Figure 2

The similarity index analysis of the codon usage between CHIKV, its hosts and transmission vectors.

Trends of Codon Usage Variation in CHIKV

Correspondence Analysis (COA)

Codon usage is multivariate by its very nature; therefore, it is necessary to analyze the data using multivariate statistical techniques, such as COA [39]. To determine the trends in codon usage variation among different CHIKV genomes, we performed COA on the RSCU values, which were examined as a single dataset based on the RSCU value of each coding region (Figure 3). The first principal axis (f′1) accounted for 53.57% of the total variation, and the next three axes (f′2−f′4) accounted for 25.16%, 7.62%, and 2.06% of the total variation in synonymous codon usage, respectively. For further analysis, plots were reconstructed based on different geographical locations (Figure 4) and genotypes of CHIKV isolates (Figure 5). As expected, the CHIKV isolates belonging to the ECSA genotype were distributed across all planes of the axes. When these plots were assessed on a regional basis, it was found that different genotypes are circulating in a single country. This analysis showed that the three different genotypes of CHIKV might have a common ancestor. This further implies that geographical diversity and associated factors, such as the presence of favorable transmission vectors, climate features, host range and susceptibility, have also contributed to shaping the molecular evolution and codon usage in CHIKV, even though they appear to be less influential than mutational pressure (based on the current analysis).
Figure 3

Correspondence analysis of codon usage patterns in CHIKV genomes.

Figure 4

Correspondence analysis of codon usage patterns in CHIKV genomes based on region of isolation.

Figure 5

Correspondence analysis of codon usage patterns in CHIKV genomes based on virus genotypes.

Effect of mutational pressure in shaping the codon usage patterns in CHIKV

Mutational pressure and natural selection are considered the two major factors that shape codon usage patterns [40]. A general mutational pressure, which affects the whole genome, would certainly account for the majority of the codon usage among certain RNA viruses [21]. To determine the extent of the influence of these two factors on CHIKV codon usage, we performed correlation analysis between different nucleotide constraints. A complex correlation was observed among different nucleotide constraints (Table 3). U3% had significant positive correlations with U% (r = 0.621, P<0.01) and G% (r = 0.185, P<0.05), whereas it had significant negative correlations with C% (r = −0.606, P<0.01), A% (r = −0.278, P<0.01) and GC% (r = −0.806, P<0.01). C3% had significant positive correlations with C% (r = 0.621, P<0.01), A% (r = 0.261, P<0.01) and GC% (r = 0.798, P<0.01), and negative correlations with U% (r = −0.587, P<0.01) and G% (r = −0.217, P<0.01). A3% had positive correlations with A% (r = 0.625, P<0.01) and C% (r = 0.327, P<0.01), and negative correlations with U% (r = −0.373, P<0.01) and G% (r = −0.576, P<0.01), whereas no correlation was observed between A3% and GC%. G3% was positively correlated with G% (r = 0.658, P<0.01) and U% (r = 0.354, P<0.01), and negatively correlated with C% (r = −0.377, P<0.01) and A% (r = −0.610, P<0.01); the correlation with GC% was non-significant. In the case of GC3%, positive correlations were noted with C% (r = 0.498, P<0.01) and GC% (r = 0.852, P<0.01), and a negative correlation with U% (r = −0.480, P<0.01); the correlation with G% was non-significant. Finally, GC and GC12 were also compared with GC3, and highly significant positive correlations were observed (GC12 versus GC3: r = 0.28, P<0.01; GC versus GC3: r = 0.85, P<0.01), as shown in Figure 6A and 6B, respectively. Furthermore, a significant negative correlation between GC3 and ENC values was also observed (r = −0.756, P<0.01). 
This analysis collectively indicates that mutational pressure is most likely responsible for the patterns of nucleotide composition and, therefore, codon usage patterns, because the effects were present at all codon positions.
Table 3

Summary of correlation analysis between nucleotide constraints in CHIKV genomes.

Composition | A3% | U3% | C3% | G3% | GC3%
A% | 0.625** | −0.278** | 0.261** | −0.610** | 0.090NS
U% | −0.373** | 0.621** | −0.587** | 0.354** | −0.480**
C% | 0.327** | −0.606** | 0.621** | −0.377** | 0.498**
G% | −0.576** | 0.185* | −0.217** | 0.658** | −0.080NS
GC% | 0.103NS | −0.806** | 0.798** | −0.153NS | 0.852**

The numbers in each column represent the correlation coefficient “r” values calculated in each correlation analysis.

NS: non-significant (P>0.05).

*represents 0.01 < P < 0.05.

**represents P<0.01.

Figure 6

Correlation analysis. (A) GC1,2 with that at GC3, (B) GC with that at GC3.

In addition to correlation analysis, linear regression analysis was also performed to determine correlations between the first two principal axes (f′1 and f′2) and the nucleotide constraints of CHIKV genomes. Again, several significant correlations were observed between the two principal axes and nucleotide contents (Table 4). f′1 showed significantly positive correlations with U3% (r = 0.31, P<0.01), G3% (r = 0.58, P<0.01), U% (r = 0.25, P<0.01) and C% (r = 0.51, P<0.01); however, it showed significantly negative correlations with A% (r = −0.54, P<0.01), G% (r = −0.29, P<0.01), A3% (r = −0.50, P<0.01), C3% (r = −0.35, P<0.01), GC3% (r = −0.24, P<0.01; Figure 7A) and GC% (r = −0.21, P<0.01). In the case of f′2, A3%, G3% and C% had non-significant correlations. The f′2 axis showed significantly positive correlations with C3% (r = 0.69, P<0.01), GC3% (r = 0.74, P<0.01; Figure 7B), GC% (r = 0.64, P<0.01), A% (r = 0.17, P<0.05) and G% (r = 0.39, P<0.01), and significantly negative correlations with U3% (r = −0.66, P<0.01) and U% (r = −0.34, P<0.01) (Table 4). Our analysis shows that mutational pressure has played a major role in shaping the dynamics of codon usage patterns within CHIKV genomes.
Table 4

Summary of correlation between the first two principal axes and nucleotide constraints in CHIKV genomes.

Composition (%) | f1 (53.57%) | f2 (25.16%)
A3 | −0.50** | −0.97NS
U3 | 0.310** | −0.659**
C3 | −0.35** | 0.69**
G3 | 0.58** | −0.134NS
GC3 | −0.24** | 0.740**
GC | −0.21* | 0.640**
A | −0.54** | 0.174*
U | 0.25** | −0.340**
G | −0.29** | 0.390**
C | 0.517** | −0.126NS

The numbers in each column represent the correlation coefficient “r” values calculated in each correlation analysis.

NS: non-significant (P>0.05).

*represents 0.01 < P < 0.05.

**represents P<0.01.

Figure 7

Correlation between the first axis (A) and second axis (B) values of COA and GC3 values.


Correlation analysis between ENC and GC3 values

A plot of ENC versus GC3 (Nc plot) is widely used to study codon usage variation among genes in different organisms. It has been postulated that, in an ENC plot, genes whose codon choice is constrained only by a G3 + C3 mutational bias will lie on or just below the continuous curve of the predicted ENC values [30]. Although the nucleotide composition correlation analysis showed that codon usage in CHIKV genomes is mainly shaped by compositional constraints or mutational pressure, we were interested to determine the possible influence of other factors, such as natural selection. Therefore, we constructed a plot of the ENC values against the GC3 values. As shown in Figure 8, all points aggregated closely towards the right side under the expected ENC curve, indicating that, apart from mutation pressure, the codon usage patterns have also been influenced by other factors to some extent.
Figure 8

The relationship between the effective number of codons (ENC) values and the GC content at the third synonymous codon position (GC3).

The curve indicates the expected codon usage if GC compositional constraints alone account for codon usage bias.


Relationship between dinucleotide and codon usage patterns in CHIKV

It has been suggested that dinucleotide bias can affect overall codon usage bias in several organisms, including DNA and RNA viruses [41]–[43]. To study the possible effect of dinucleotides on codon usage in CHIKV genomes, we calculated the relative abundances of the 16 dinucleotides from the coding sequences of CHIKV. The occurrences of dinucleotides were not randomly distributed, and no dinucleotides were present at the expected frequencies (Table 5). Under-representation of CpG dinucleotides in different RNA and DNA viruses has been reported [41]. In the case of CHIKV, the relative abundance of CpG showed deviation from the “normal range” (mean ± SD = 0.808±0.016) and was under-represented. Interestingly, GpC dinucleotides also deviated from the normal range and were instead slightly over-represented (mean ± SD = 1.001±0.007) (Table 5). The RSCU values of the eight codons containing CpG (CCG, GCG, UCG, ACG, CGC, CGG, CGU, and CGA) and the six codons containing GpC (GCU, GCC, GCA, UGC, AGC, GGC) were also analyzed to determine the possible effects of CpG and GpC representations on codon usage bias. In the case of CpG-containing codons, all codons were under-represented (RSCU<1.6) and were not preferred codons for their respective amino acid, except for CCG (RSCU = 1.26), a preferred codon for proline (Table 2). On the other hand, despite slight over-representation of the GpC dinucleotide, all GpC-containing codons were also under-represented (RSCU<1.6) and were not preferred codons for their respective amino acids, with two exceptions; GCA (Ala, RSCU = 1.43) and UGC (Cys, RSCU = 1.33) (Table 2). It has been proposed that CpG deficiency in pathogens is associated with the immunostimulatory properties of unmethylated CpGs, which are recognized by the host’s innate immune system as a pathogen signature [28]. 
Recognition of unmethylated CpGs by Toll-like receptor 9 (TLR9), a type of intracellular pattern recognition receptor (PRR), leads to activation of several immune response pathways [44]. The vertebrate immune system relies on unmethylated CpG recognition in DNA molecules as a signature of infection, and CpG under-representation in RNA viruses is exclusively observed in vertebrate viruses; therefore, it is reasonable to suggest that a TLR9-like mechanism exists in the vertebrate immune system that recognizes CpGs in an RNA context (such as in the genomes of RNA viruses) and triggers immune responses [45].
Table 5

Summary of correlation analysis between the first two principal axes and relative abundance of dinucleotides in CHIKV genomes.

Dinucleotide | UU | UC | UA | UG | CU | CC | CA | CG
Mean ± SD | 0.954±0.030 | 0.935±0.020 | 0.859±0.022 | 1.275±0.022 | 1.082±0.026 | 0.979±0.017 | 1.125±0.017 | 0.808±0.016
Range | 0.886–1.082 | 0.862–0.964 | 0.784–0.934 | 1.214–1.329 | 1.022–1.107 | 0.946–1.025 | 1.058–1.172 | 0.781–0.856
Axis 1 r | 0.755** | −0.664** | 0.213* | −0.071NS | −0.724** | 0.612** | −0.262** | 0.236**
Axis 1 P | 0.000 | 0.000 | 0.011 | 0.403 | 0.000 | 0.000 | 0.002 | 0.005
Axis 2 r | −0.357** | 0.233** | −0.665** | 0.429** | 0.418** | −0.305** | 0.611** | −0.548**
Axis 2 P | 0.000 | 0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000

Dinucleotide | AU | AC | AA | AG | GU | GC | GA | GG
Mean ± SD | 0.929±0.015 | 1.055±0.014 | 0.987±0.0078 | 1.024±0.006 | 1.054±0.014 | 1.001±0.007 | 1.012±0.009 | 0.931±0.010
Range | 0.884–0.987 | 0.998–1.097 | 0.963–1.008 | 1.009–1.037 | 1.007–1.117 | 0.965–1.020 | 0.996–1.037 | 0.900–0.954
Axis 1 r | 0.145NS | 0.39NS | −0.387** | 0.236** | 0.80NS | −0.009NS | 0.698** | −0.366**
Axis 1 P | 0.086 | 0.645 | 0.000 | 0.005 | 0.345 | 0.919 | 0.000 | 0.000
Axis 2 r | −0.601** | −0.381** | 0.221** | −0.404** | −0.508** | 0.279** | −0.288** | 0.168*
Axis 2 P | 0.000 | 0.000 | 0.009 | 0.000 | 0.000 | 0.001 | 0.001 | 0.047

NS: non-significant (P>0.05).

*represents 0.01 < P < 0.05.

**represents P<0.01.

In addition to the differential (over- and under-) representation of CpGs in different organisms, UpA under-representation also exists in several organisms, including vertebrates, invertebrates, plants and prokaryotes [41]. The presence of UpA (TpA in DNA) in two out of three canonical stop codons and in transcriptional regulatory motifs (e.g., the TATA box sequence) is believed to be responsible for its under-representation. Therefore, UpA under-representation is expected to reduce the risk of nonsense mutations and minimize improper transcription [43], [46]. In the case of CHIKV, the relative abundance of UpA also deviated from the “normal range” (mean ± SD = 0.859±0.022) and was under-represented, similarly to CpG. The six codons containing UpA (UUA, CUA, GUA, UAU, UAC and AUA) were also under-represented (RSCU<1.6) and were not preferred codons for their respective amino acids. The CpA (mean ± SD = 1.125±0.017) and UpG (mean ± SD = 1.275±0.022) dinucleotides were over-represented compared with the remaining 14 dinucleotides (Table 5). Similarly, the eight codons containing CpA (UCA, CCA, ACA, GCA, CAA, CAG, CAU and CAC) and the five codons containing UpG (UUG, CUG, GUG, UGU and UGC) were also over-represented compared with the rest of the codons for their respective amino acids, and a majority of them were also preferential codons for their respective amino acids, based on RSCU analysis (Table 2). Over-representation of CpA and UpG in different organisms has been observed and is regarded as a consequence of the under-representation of CpG dinucleotides. One possible explanation is that methylated cytosines are prone to mutate into thymines through spontaneous deamination, resulting in the dinucleotide TpG and the subsequent presence of a CpA on the opposite strand after DNA replication [47]. However, this theory cannot explain under-representation of CpGs in RNA viruses. 
Moreover, under-representation of CpGs has also been observed in several vertebrate viruses, where it is independent of their genomic composition and replication cycles. Recently, two studies performed large-scale dinucleotide analyses in different viruses and suggested that the CpG usage of +ssRNA viruses is affected greatly by their hosts. As a result, most +ssRNA viruses mimic their hosts’ CpG usage, and the existence of an RNA dinucleotide recognition system, probably linked to the innate immune system of the host, has also been proposed [41], [48]. Finally, the relative abundance of dinucleotides was also correlated with the first two principal axes. Among the 16 dinucleotides, 11 were significantly (positively or negatively) correlated with the first axis and all 16 were significantly (positively or negatively) correlated with the second axis (Table 5). These observations indicated that the composition of dinucleotides determines the variation in synonymous codon usage. Therefore, from the present dinucleotide composition analysis, it is evident that selection pressure associated with (i) maintenance of efficient replication and transmission cycles among multiple hosts, and (ii) evolution of escape mechanisms to evade host antiviral responses, has contributed to shaping the overall synonymous codon usage in CHIKV.

Effect of natural selection in shaping the codon usage patterns in CHIKV

It has been suggested that if synonymous codon usage bias is affected by mutational pressure alone, then the frequency of nucleotides A and U/T should be equal to that of C and G at the synonymous codon third position [26]. However, in case of CHIKV genomes, variations in nucleotide base compositions were noted (Table 1), indicating that other factors, such as natural selection, could also influence overall synonymous codon usage bias. As the role of natural selection is also evident from previous codon usage analysis studies in several viruses [25], [26], [49], we were interested to determine to what extent natural selection might be involved in the codon usage patterns of CHIKV. For this purpose, we computed the GRAVY and aromaticity (ARO) values for each CHIKV isolate (Table S1) and a linear regression analysis was performed between GRAVY, ARO and the f′1, f′2, ENC, GC and GC3 values. The analysis results showed that the GRAVY values were not significant for f′1 and were highly significant for f′2, ENC, GC3 and GC. In the case of ARO, an opposite trend was observed: ARO values were significantly negatively correlated with f′1 and correlations with f′2, ENC, GC3 and GC were not significant (Table 6). These results indicated that, although natural selection has influenced codon usage of CHIKV genomes to some extent, it is much weaker compared with mutational pressure.
Table 6

Correlation analysis among GRAVY, ARO, ENC, GC3, GC and the first two principal axes.

Index | f1 (53.57%) | f2 (25.16%) | ENC | GC3 | GC
GRAVY r | 0.118NS | −0.558** | 0.420** | −0.529** | −0.568**
GRAVY P | 0.164 | 0.000 | 0.003 | 0.000 | 0.000
ARO r | 0.169* | −0.149NS | 0.081NS | 0.026NS | −0.021NS
ARO P | 0.045 | 0.077 | 0.340 | 0.758 | 0.803

ARO: Aromaticity.

NS: non-significant (P>0.05).

*represents 0.01 < P < 0.05.

**represents P<0.01.


Conclusions

Taken together, our analysis showed that codon usage in CHIKV is only slightly biased, and that the major factor shaping codon usage patterns is mutational pressure. In addition, contributions of other factors, including hosts, geography, dinucleotide composition and natural selection, are also evident from our analysis. Our data suggested that codon usage in CHIKV is undergoing an evolutionary process, probably reflecting a dynamic process of mutation and natural selection to re-adapt its codon usage to different environments and hosts. To the best of our knowledge, this is the first report of codon usage analysis in CHIKV and is expected to deepen our understanding of the mechanisms contributing towards codon usage and evolution of CHIKV.

Materials and Methods

Sequences

The complete genome sequences of 141 CHIKV isolates (in FASTA format) were obtained from the National Center for Biotechnology Information (NCBI) GenBank database (http://www.ncbi.nlm.nih.gov). The accession numbers and other detailed information on the selected CHIKV genomes, such as isolation date, isolation place, host and genome size, were also retrieved (Table 7).
Table 7

Demographics of CHIKV genomes analyzed in present study.

No | Strain Name | GenBank Accession | Length (bp) | Year | Host | Country | Genotype
1 | Ross low-psg | HM045811 | 11775 | 1953 | Human | Tanzania | ECSA
2 | Vereeniging | HM045792 | 11836 | 1956 | Human | South Africa | ECSA
3 | TH35 | HM045810 | 11986 | 1958 | Human | Thailand | Asian
4 | LSFS | HM045809 | 11753 | 1960 | Human | DRC | ECSA
5 | Angola M2022 | HM045823 | 11754 | 1962 | − | Angola | ECSA
6 | A301 | HM045821 | 11823 | 1963 | Bat | Senegal | ECSA
7 | Gibbs 63–263 | HM045813 | 11976 | 1963 | Human | India | Asian
8 | I-634029 | HM045803 | 11897 | 1963 | Human | India | Asian
9 | IND-63-WB1 | EF027140 | 11784 | 1963 | − | India | Asian
10 | IbH35 | HM045786 | 11844 | 1964 | Human | Nigeria | WA
11 | PM2951 | HM045785 | 11844 | 1966 | Mosquito | Senegal | WA
12 | SH 3013 | HM045816 | 11823 | 1966 | Human | Senegal | WA
13 | PO731460 | HM045788 | 11988 | 1973 | Human | India | Asian
14 | IND-73-MH5 | EF027141 | 11805 | 1973 | − | India | Asian
15 | 1455–75 | HM045814 | 11939 | 1975 | Human | Thailand | Asian
16 | AR 18211 | HM045805 | 11686 | 1976 | Mosquito | South Africa | ECSA
17 | 3412–78 | HM045808 | 11968 | 1978 | Human | Thailand | Asian
18 | HB78 | HM045822 | 11753 | 1978 | Human | CAR | ECSA
19 | ArD 30237 | HM045815 | 11823 | 1979 | Mosquito | Senegal | WA
20 | ArA 2657 | HM045818 | 11823 | 1981 | Mosquito | Cote d’Ivoire | WA
21 | IPD/A SH 2807 | HM045804 | 11847 | − | Human | Senegal | WA
22 | UgAg4155 | HM045812 | 11774 | 1982 | Human | Uganda | ECSA
23 | JKT23574 | HM045791 | 11992 | 1983 | Human | Indonesia | Asian
24 | 37997 | AY726732 | 11881 | 1983 | Mosquito | Senegal | WA
25 | DakAr B 16878 | HM045784 | 11772 | 1984 | Mosquito | CAR | ECSA
26 | RSU1 | HM045797 | 11979 | 1985 | Human | Indonesia | Asian
27 | Hu/85/NR/001 | HM045800 | 11897 | 1985 | Human | Philippines | Asian
28 | PhH15483 | HM045790 | 11907 | 1985 | Human | Philippines | Asian
29 | ALSA-1 | HM045806 | 11768 | 1986 | − | India | ECSA
30 | CAR256 | HM045793 | 11767 | − | − | CAR | ECSA
31 | 6441–88 | HM045789 | 11855 | 1988 | Human | Thailand | Asian
32 | ArD 93229 | HM045819 | 11860 | 1993 | Mosquito | Senegal | WA
33 | ArA 30548 | HM045820 | 11817 | 1993 | Mosquito | Cote d’Ivoire | WA
34 | CO392-95 | HM045796 | 11979 | 1995 | Human | Thailand | Asian
35 | SV0444-95 | HM045787 | 11968 | 1995 | Human | Thailand | Asian
36 | K0146-95 | HM045802 | 11975 | 1995 | − | Thailand | Asian
37 | IND-00-MH4 | EF027139 | 11814 | 2000 | Human | India | ECSA
38 | HD 180760 | HM045817 | 11832 | 2005 | Human | Senegal | WA
39 | IMTSSA6424C | FR717337 | 11559 | 2005 | Human | France | ECSA
40 | IMTSSA6424S | FR717336 | 11559 | 2005 | Human | France | ECSA
41 | BNI-CHIKV_899 | FJ959103 | 11832 | 2006 | Human | Mauritius | ECSA
42 | MY019IMR/06/BP | EU703761 | 12028 | 2006 | Human | Malaysia | Asian
43 | DHS4263-Calif AB | HM045794 | 11774 | 2006 | Human | USA | ECSA
44 | MY003IMR/06/BP | EU703760 | 12028 | 2006 | Human | Malaysia | Asian
45 | MY002IMR/06/BP | EU703759 | 12028 | 2006 | Human | Malaysia | Asian
46 | DRDE-06 | EF210157 | 11774 | 2006 | Human | India | ECSA
47 | 0611aTw | FJ807896 | 11811 | 2006 | Human | Singapore | ECSA
48 | TM25 | EU564334 | 11772 | 2006 | Human | Mauritius | ECSA
49 | IND-KA51 | FJ000068 | 11812 | 2006 | Human | India | ECSA
50 | IND-MH51 | FJ000067 | 11812 | 2006 | Human | India | ECSA
51 | IND-GJ52 | FJ000062 | 11812 | 2006 | Human | India | ECSA
52 | IND-GJ53 | FJ000065 | 11813 | 2006 | Human | India | ECSA
53 | IND-KR51 | FJ000066 | 11812 | 2006 | Human | India | ECSA
54 | IND-GJ51 | FJ000064 | 11807 | 2006 | Human | India | ECSA
55 | IND-06-Guj | JF274082 | 11829 | 2006 | Human | India | ECSA
56 | IND-KA52 | FJ000063 | 11812 | 2006 | Human | India | ECSA
57 | RGCB05/KL06 | GQ428211 | 11764 | 2006 | Human | India | ECSA
58 | RGCB03/KL06 | GQ428210 | 11764 | 2006 | Human | India | ECSA
59 | CHIK31 | EU564335 | 11810 | 2006 | Human | India | ECSA
60 | SL10571 | AB455494 | 11829 | 2006 | Human | − | ECSA
61 | SL11131 | AB455493 | 11829 | 2006 | Human | − | ECSA
62 | IND-06-KA15 | EF027135 | 11729 | 2006 | Human | India | ECSA
63 | D570/06 | EF012359 | 11806 | 2006 | − | Mauritius | ECSA
64 | IND-06-RJ1 | EF027137 | 11767 | 2006 | − | India | ECSA
65 | IND-06-AP3 | EF027134 | 11779 | 2006 | Human | India | ECSA
66 | IND-06-TN1 | EF027138 | 11750 | 2006 | Human | India | ECSA
67 | LR2006_OPY1 | DQ443544 | 11840 | 2006 | Human | Reunion | ECSA
68 | IND-06-MH2 | EF027136 | 11800 | 2006 | Human | India | ECSA
69 | SL-CR 3 | HM045799 | 11758 | 2007 | Human | Sri Lanka | ECSA
70 | ITA07-RA1 | EU244823 | 11788 | 2007 | − | Italy | ECSA
71 | SL-CK1 | HM045801 | 11766 | 2007 | Human | Sri Lanka | ECSA
72 | 0706aTw | FJ807897 | 12013 | 2007 | Human | Indonesia | Asian
73 | LKRGCH1507 | FJ445428 | 11717 | 2007 | Human | Sri Lanka | ECSA
74 | IND-KR52 | FJ000069 | 11812 | 2007 | Human | India | ECSA
75 | DRDE-07 | EU372006 | 11774 | 2007 | Human | India | ECSA
76 | LKMTCH2707 | FJ445427 | 11717 | 2007 | Human | Sri Lanka | ECSA
77 | RGCB80/KL07 | GQ428212 | 11764 | 2007 | Human | India | ECSA
78 | RGCB120/KL07 | GQ428213 | 11764 | 2007 | Human | India | ECSA
79 | 0810aTw | FJ807898 | 11811 | 2008 | Human | Bangladesh | ECSA
80 | SD08Pan | GU199351 | 11793 | 2008 | Human | China | ECSA
81 | 0810bTw | FJ807899 | 11811 | 2008 | Human | Malaysia | ECSA
82 | SGEHICHS277108 | FJ445510 | 11800 | 2008 | Human | Singapore | ECSA
83 | SVUKDP-08 | JN558835 | 11733 | 2008 | Human | India | ECSA
84 | FD080178 | GU199352 | 11677 | 2008 | Human | China | ECSA
85 | FD080008 | GU199350 | 11687 | 2008 | Human | China | ECSA
86 | FD080231 | GU199353 | 11687 | 2008 | Human | China | ECSA
87 | SGEHICHD13508 | FJ445511 | 11719 | 2008 | Human | Singapore | ECSA
88 | LK(PB)CH5808 | FJ513637 | 11710 | 2008 | Human | Sri Lanka | ECSA
89 | LK(PB)CH3008 | FJ513632 | 11693 | 2008 | Human | Sri Lanka | ECSA
90 | LK(PB)CH1608 | FJ513629 | 11716 | 2008 | Human | Sri Lanka | ECSA
91 | LK(PB)CH5308 | FJ513635 | 11726 | 2008 | Human | Sri Lanka | ECSA
92 | LK(PB)chik6008 | GU013529 | 11718 | 2008 | Human | Sri Lanka | ECSA
93 | LK(PB)CH1008 | FJ513628 | 11722 | 2008 | Human | Sri Lanka | ECSA
94 | LK(PB)chik3408 | GU013528 | 11715 | 2008 | Human | Sri Lanka | ECSA
95 | LK(EH)CH6708 | FJ513654 | 11717 | 2008 | Human | Sri Lanka | ECSA
96 | LK(EH)CH7708 | FJ513657 | 11696 | 2008 | Human | Sri Lanka | ECSA
97 | LK(EH)CH4408 | FJ513645 | 11714 | 2008 | Human | Sri Lanka | ECSA
98 | LK(EH)CH20108 | FJ513679 | 11717 | 2008 | Human | Sri Lanka | ECSA
99 | LK(EH)CH18608 | FJ513675 | 11716 | 2008 | Human | Sri Lanka | ECSA
100 | LK(EH)chik19708 | GU013530 | 11714 | 2008 | Human | Sri Lanka | ECSA
101 | LKEHCH13908 | FJ445426 | 11717 | 2008 | Human | Sri Lanka | ECSA
102 | LK(EH)CH17708 | FJ513673 | 11710 | 2008 | Human | Sri Lanka | ECSA
103 | SGEHICHT077808 | FJ445484 | 11790 | 2008 | Human | Singapore | ECSA
104 | RGCB356/KL08 | GQ428215 | 11764 | 2008 | Human | India | ECSA
105 | RGCB355/KL08 | GQ428214 | 11764 | 2008 | Human | India | ECSA
106 | SGEHICHS422308 | FJ445432 | 11722 | 2008 | Human | Singapore | ECSA
107 | SGEHICHS421708 | FJ445431 | 11722 | 2008 | Human | Singapore | ECSA
108 | SGEHICHD93508 | FJ445430 | 11722 | 2008 | Human | Singapore | ECSA
109 | SGEHICHD96808 | FJ445463 | 11729 | 2008 | Human | Singapore | ECSA
110 | SGEHICHS424108 | FJ445443 | 11714 | 2008 | Human | Singapore | ECSA
111 | SGEHICHS422808 | FJ445433 | 11729 | 2008 | Human | Singapore | ECSA
112 | SGEHICHS425208 | FJ445445 | 11719 | 2008 | Human | Singapore | ECSA
113 | SGEHICHD122508 | FJ445502 | 11717 | 2008 | Human | Singapore | ECSA
114 | CU-Chik10 | GU301780 | 11811 | 2008 | Human | Thailand | ECSA
115 | SVUCTR-09 | JN558834 | 11733 | 2009 | Human | India | ECSA
116 | SVUKDP-09 | JN558836 | 11733 | 2009 | Human | India | ECSA
117 | CU-Chik661 | GQ905863 | 11752 | 2009 | Human | Thailand | ECSA
118 | CU-Chik683 | GU301781 | 11811 | 2009 | Human | Thailand | ECSA
119 | CU-Chik_OBF | GU908223 | 11670 | 2009 | Mosquito | Thailand | ECSA
120 | CU-Chik009 | GU301779 | 11811 | 2009 | Human | Thailand | ECSA
121 | NL10/152 | KC862329 | 11836 | 2010 | Human | Indonesia | ECSA
122 | GD05/2010 | JX088705 | 11811 | 2010 | Human | China | ECSA
123 | GZ0991 | JQ065890 | 11684 | 2010 | Human | China | ECSA
124 | GD113 | HQ846357 | 11720 | 2010 | Human | China | ECSA
125 | GD139 | HQ846358 | 11730 | 2010 | Human | China | ECSA
126 | GD115 | HQ846356 | 11746 | 2010 | Human | China | ECSA
127 | GD134 | HQ846359 | 11725 | 2010 | Human | China | ECSA
128 | GZ1029 | JQ065891 | 11687 | 2010 | Human | China | ECSA
129 | CHI2010 | JQ067624 | 11724 | 2010 | Human | China | ECSA
130 | NC/2011-568 | HE806461 | 11621 | 2011 | Human | New Caledonia | ECSA
131 | V0603310_KH11_BTB | JQ861260 | 11743 | 2011 | Human | Cambodia | ECSA
132 | V1024311_KH11_PVH | JQ861256 | 11754 | 2011 | Human | Cambodia | ECSA
133 | V1024308_KH11_PVH | JQ861254 | 11750 | 2011 | Human | Cambodia | ECSA
134 | V1024314_KH11_PVH | JQ861258 | 11733 | 2011 | Human | Cambodia | ECSA
135 | V1024306_KH11_PVH | JQ861253 | 11745 | 2011 | Human | Cambodia | ECSA
136 | V1024310_KH11_PVH | JQ861255 | 11736 | 2011 | Human | Cambodia | ECSA
137 | V1024313_KH11_PVH | JQ861257 | 11755 | 2011 | Human | Cambodia | ECSA
138 | CHIKV-JC2012 | KC488650 | 11889 | 2012 | Human | China | Asian
139 | Chik-sy | KF318729 | 12017 | 2012 | Human | China | Asian
140 | Wuerzburg | EU037962 | 11805 | − | Human | Mauritius | ECSA
141 | S27-African prototype | AF369024 | 11826 | − | Human | − | ECSA

Dashes (−) indicate data not available. East Central South African, ECSA; Democratic Republic of Congo, DRC; Central African Republic, CAR; West African, WA.

Compositional Analysis

The following compositional properties were calculated for the CHIKV genomes: (i) the overall frequency of occurrence of the nucleotides (A%, C%, U/T% and G%); (ii) the frequency of each nucleotide at the third site of the synonymous codons (A3%, C3%, U3% and G3%); (iii) the frequencies of occurrence of nucleotides G+C at the first (GC1), second (GC2), and third synonymous codon positions (GC3); (iv) the mean frequencies of nucleotide G+C at the first and the second position (GC1,2); and (v) the overall GC and AU content. The codons AUG and UGG are the only codons for Met and Trp, respectively, and the termination codons UAA, UAG and UGA do not encode any amino acids. These five codons are therefore not expected to exhibit any usage bias and were excluded from the analysis.
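As an illustration only, the indices listed above could be computed along the following lines. This is a minimal sketch, not the software used in the study: the function name `composition`, the dictionary keys, and the choice to compute GC1, GC2 and all third-position values over the retained (non-excluded) codons are assumptions, and the input is assumed to be an in-frame coding sequence containing only A, C, G and U/T.

```python
from collections import Counter

# Codons left out of the codon-position counts: Met, Trp and the three stops.
EXCLUDED = {"AUG", "UGG", "UAA", "UAG", "UGA"}

def composition(cds: str) -> dict:
    """Return percentage composition indices for one in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]

    # (i) overall nucleotide frequencies and overall GC content
    overall = Counter(seq)
    total = sum(overall[b] for b in "ACGU")
    out = {f"{b}%": 100 * overall[b] / total for b in "ACGU"}
    out["GC%"] = out["G%"] + out["C%"]

    # (ii)-(iii) third-position frequencies over the retained codons
    syn = [c for c in codons if c not in EXCLUDED]
    third = Counter(c[2] for c in syn)
    for b in "ACGU":
        out[f"{b}3%"] = 100 * third[b] / len(syn)
    out["GC3%"] = out["G3%"] + out["C3%"]

    # (iv) mean G+C content of the first and second codon positions
    gc1 = sum(c[0] in "GC" for c in syn) / len(syn)
    gc2 = sum(c[1] in "GC" for c in syn) / len(syn)
    out["GC12%"] = 100 * (gc1 + gc2) / 2
    return out
```

For example, `composition("ATGGCTGCCTAA")` keeps only the two Ala codons (GCU, GCC) for the position-specific indices, while the overall percentages use the whole sequence.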

RSCU Analysis

The RSCU values for all the coding sequences of CHIKV genomes were calculated to determine the characteristics of synonymous codon usage without the confounding influence of amino acid composition and the size of the coding sequence of different gene samples, following a previously described method [18]. The RSCU index was calculated as follows:

$$\mathrm{RSCU}_{ij} = \frac{g_{ij}}{\sum_{i=1}^{n_j} g_{ij}} \, n_j$$

where $g_{ij}$ is the observed number of the ith codon for the jth amino acid, which has $n_j$ kinds of synonymous codons. RSCU values represent the ratio between the observed usage frequency of one codon in a gene sample and the expected usage frequency in the synonymous codon family, given that all codons for the particular amino acid are used equally. The synonymous codons with RSCU values >1.0 have positive codon usage bias and were defined as abundant codons, while those with RSCU values <1.0 have negative codon usage bias and were defined as less-abundant codons. An RSCU value of 1.0 means that there is no codon usage bias for that amino acid and the codons are chosen equally or randomly [50]. Moreover, the synonymous codons with RSCU values >1.6 and <0.6 were treated as over-represented and under-represented codons, respectively [23].
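The RSCU definition above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the standard genetic code is assumed, the names `CODON_TABLE` and `rscu` are hypothetical, and assigning 0.0 to the codons of an amino acid that is absent from the sequence is an arbitrary choice made here.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def rscu(cds: str) -> dict:
    """RSCU for the 59 synonymous codons of one in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    # Group codons into synonymous families; Met, Trp and stops are excluded.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        family_total = sum(counts[c] for c in codons)
        for c in codons:
            # observed count divided by the count expected under equal usage
            values[c] = counts[c] * len(codons) / family_total if family_total else 0.0
    return values
```

With the four Ala codons used 2, 1, 1 and 0 times (`rscu("GCUGCUGCCGCA")`), GCU gets an RSCU of 2.0, GCC and GCA get 1.0, and GCG gets 0.0.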

Influence of Overall Codon Usage of the Hosts on that of CHIKV

For the comparative analysis of codon usage between CHIKV and its vectors and hosts, codon usage data for two transmission vectors (A. aegypti, A. albopictus) and two hosts (H. sapiens, P. troglodytes) were obtained from the codon usage database (http://www.kazusa.or.jp/codon/) [51]. Zhou et al. recently proposed a method to determine the potential impact of the overall codon usage patterns of the hosts on the formation of the overall codon usage of viruses [36]. Here, we applied the same approach to CHIKV, and the similarity index D(A,B) was calculated as follows:

$$R(A,B) = \frac{\sum_{i=1}^{59} a_i b_i}{\sqrt{\sum_{i=1}^{59} a_i^2 \cdot \sum_{i=1}^{59} b_i^2}}, \qquad D(A,B) = \frac{1 - R(A,B)}{2}$$

where R(A,B) is defined as the cosine of the included angle between the A and B spatial vectors, representing the degree of similarity between CHIKV and a specific host in terms of the overall codon usage pattern; $a_i$ is the RSCU value for a specific codon among the 59 synonymous codons of the CHIKV coding sequence, and $b_i$ is the RSCU value for the same codon of the host. D(A,B) represents the potential effect of the overall codon usage of the host on that of CHIKV, and its value ranges from zero to 1.0 [36].
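A short sketch of this calculation follows, assuming the form R(A,B) = Σ aᵢbᵢ / sqrt(Σ aᵢ² · Σ bᵢ²) and D(A,B) = (1 − R(A,B)) / 2 from Zhou et al. [36]. The function name `similarity_index` and the representation of each RSCU profile as a codon-to-value dictionary are assumptions made for illustration.

```python
import math

def similarity_index(rscu_virus: dict, rscu_host: dict) -> float:
    """D(A,B) = (1 - R(A,B)) / 2, where R is the cosine between RSCU vectors."""
    # Use only codons present in both profiles (normally the 59 synonymous codons).
    codons = sorted(set(rscu_virus) & set(rscu_host))
    dot = sum(rscu_virus[c] * rscu_host[c] for c in codons)
    norm = math.sqrt(
        sum(rscu_virus[c] ** 2 for c in codons)
        * sum(rscu_host[c] ** 2 for c in codons)
    )
    r = dot / norm        # cosine of the angle between the two vectors
    return (1 - r) / 2
```

Because RSCU values are non-negative, R(A,B) computed this way lies between 0 and 1, so identical profiles give D = 0 and profiles with no shared codon usage give D = 0.5.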

Measures of Relative Dinucleotides Abundance

The relative abundance of dinucleotides in the coding regions of CHIKV genomes was calculated using a previously described method [43]. A comparison of actual and expected dinucleotide frequencies of the 16 dinucleotides in the coding regions of CHIKV was also undertaken. The odds ratio was calculated using the following formula:

$$P_{xy} = \frac{f_{xy}}{f_x f_y}$$

where $f_x$ denotes the frequency of the nucleotide X, $f_y$ denotes the frequency of the nucleotide Y, $f_x f_y$ is the expected frequency of the dinucleotide XY and $f_{xy}$ is the observed frequency of the dinucleotide XY; the ratio was calculated for each dinucleotide. As a conservative criterion, for $P_{xy}$>1.23 (or <0.78), the XY pair is considered to be over-represented (or under-represented) in terms of relative abundance compared with a random association of mononucleotides.
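The odds ratio and the 1.23 / 0.78 thresholds described above can be sketched as follows. The function names `dinucleotide_odds` and `classify` are hypothetical, and counting dinucleotides with an overlapping sliding window over the whole sequence is an assumption; the source does not state how the windows were taken.

```python
from collections import Counter

def dinucleotide_odds(seq: str) -> dict:
    """Odds ratio P_xy = f_xy / (f_x * f_y) for all 16 dinucleotides."""
    s = seq.upper().replace("T", "U")
    mono = Counter(s)
    di = Counter(s[i:i + 2] for i in range(len(s) - 1))   # overlapping windows
    n_mono, n_di = len(s), len(s) - 1

    odds = {}
    for x in "ACGU":
        for y in "ACGU":
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            observed = di[x + y] / n_di
            # undefined when either nucleotide is absent from the sequence
            odds[x + y] = observed / expected if expected else float("nan")
    return odds

def classify(p_xy: float) -> str:
    """Apply the conservative cut-offs quoted in the text."""
    if p_xy > 1.23:
        return "over"
    if p_xy < 0.78:
        return "under"
    return "normal"
```

Applied to a CHIKV coding sequence, `classify(dinucleotide_odds(cds)["CG"])` would report whether CpG falls outside the quoted range for that sequence.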

CAI Analysis

The CAI is used as a quantitative method of predicting the expression level of a gene based on its codon sequence. The CAI value ranges from 0 to 1. The most frequent codons simply have the highest relative adaptiveness values, and sequences with higher CAIs are preferred over those with lower CAIs [32].

ENC Analysis

The ENC is used to quantify the absolute codon usage bias of the gene(s) of interest, irrespective of gene length and the number of amino acids [30]. In this study, this measure was calculated to evaluate the degree of codon usage bias exhibited by the coding sequences of CHIKV. ENC values range from 20, for a gene showing extreme codon usage bias and using only one of the possible synonymous codons for each amino acid, to 61, for a gene showing no bias and using all possible synonymous codons equally for each amino acid. The larger the extent of codon preference in a gene, the smaller the ENC value. It is also generally accepted that genes have a significant codon bias when the ENC value is less than or equal to 35 [30], [52]. The ENC was calculated using the following formula:

$$\mathrm{ENC} = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}$$

where $\bar{F}_k$ (k = 2, 3, 4, 6) is the mean of the $F_k$ values for the k-fold degenerate amino acids, each of which is estimated as follows:

$$F_k = \frac{n \sum_{i=1}^{k} p_i^2 - 1}{n - 1}$$

where n is the total number of occurrences of the codons for that amino acid and

$$p_i = \frac{n_i}{n}$$

where $n_i$ is the total number of occurrences of the ith codon for that amino acid. Genes whose codon choice is constrained only by a mutation bias will lie on or just below the curve of the expected ENC values. Therefore, to elucidate the relationship between GC3 and ENC values, the expected ENC values for different GC3 values were calculated as follows:

$$\mathrm{ENC}_{\mathrm{expected}} = 2 + s + \frac{29}{s^2 + (1 - s)^2}$$

where s represents the given GC3 value, expressed as a proportion [30].
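The ENC and the expected ENC curve can be sketched as below, assuming Wright's formulation [30]: ENC = 2 + 9/F̄₂ + 1/F̄₃ + 5/F̄₄ + 3/F̄₆ with F = (n Σ pᵢ² − 1)/(n − 1), and expected ENC = 2 + s + 29/(s² + (1 − s)²) for GC3 given as a proportion s. The names `homozygosity`, `enc` and `expected_enc` are hypothetical. This simplified version assumes that every degeneracy class (2, 3, 4 and 6) is represented by at least one amino acid occurring more than once and that no F value is zero; it does not handle missing classes.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def homozygosity(counts):
    """F = (n * sum(p_i^2) - 1) / (n - 1) for one synonymous family."""
    n = sum(counts)
    return (n * sum((c / n) ** 2 for c in counts) - 1) / (n - 1)

def enc(cds: str) -> float:
    """Effective number of codons for one in-frame coding sequence."""
    seq = cds.upper().replace("T", "U")
    usage = Counter(seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3))

    # Codon counts per amino acid; Met, Trp and stops are excluded.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "MW*":
            families[aa].append(usage[codon])

    # Average F within each degeneracy class (2-, 3-, 4- and 6-fold).
    by_class = defaultdict(list)
    for counts in families.values():
        if sum(counts) > 1:                 # F is undefined for n <= 1
            by_class[len(counts)].append(homozygosity(counts))
    mean_f = {k: sum(v) / len(v) for k, v in by_class.items()}
    return 2 + 9 / mean_f[2] + 1 / mean_f[3] + 5 / mean_f[4] + 3 / mean_f[6]

def expected_enc(gc3: float) -> float:
    """Expected ENC when only GC3 composition constrains codon choice."""
    return 2 + gc3 + 29 / (gc3 ** 2 + (1 - gc3) ** 2)
```

Plotting `enc(cds)` against GC3 for each genome, together with `expected_enc(s)` for s between 0 and 1, gives the kind of Nc plot described in the Results.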

COA of Codon Usage

COA is a multivariate statistical method that is used to explore the relationships between variables and samples. In the present study, COA was used to analyze the major trends in codon usage patterns among CHIKV coding sequences. COA involves a mathematical procedure that transforms a set of correlated variables (RSCU values) into a smaller number of uncorrelated variables called principal components. To minimize the effect of amino acid composition on codon usage, each coding sequence was represented as a 59-dimensional vector, in which each dimension corresponded to the RSCU value of one sense codon; only codons with synonymous alternatives were included, excluding AUG, UGG and the three stop codons.
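To make the procedure concrete, the sketch below extracts only the first COA axis from a samples-by-codons table using the textbook correspondence analysis steps: standardized residuals followed by power iteration on the leading singular vector. It is an illustration, not the software used in the study. The function name `coa_first_axis` is hypothetical, every row and column total is assumed to be positive, and the table is assumed not to be perfectly independent (otherwise there is no axis to extract).

```python
import math

def coa_first_axis(table, iterations=500):
    """Return (share of total inertia on axis 1, row coordinates on axis 1)."""
    total = sum(map(sum, table))
    p = [[x / total for x in row] for row in table]     # correspondence matrix
    r = [sum(row) for row in p]                         # row masses
    c = [sum(col) for col in zip(*p)]                   # column masses
    n_rows, n_cols = len(r), len(c)

    # Standardized residuals: (p_ij - r_i * c_j) / sqrt(r_i * c_j)
    s = [[(p[i][j] - r[i] * c[j]) / math.sqrt(r[i] * c[j]) for j in range(n_cols)]
         for i in range(n_rows)]
    inertia = sum(x * x for row in s for x in row)      # total inertia

    # Power iteration for the leading right singular vector of s.
    # A non-constant start avoids the trivial direction sqrt(c).
    v = [float(j + 1) for j in range(n_cols)]
    for _ in range(iterations):
        u = [sum(sij * vj for sij, vj in zip(row, v)) for row in s]             # s v
        w = [sum(s[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # s^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    u = [sum(sij * vj for sij, vj in zip(row, v)) for row in s]
    eigenvalue = sum(x * x for x in u)                  # squared singular value
    coords = [u[i] / math.sqrt(r[i]) for i in range(n_rows)]
    return eigenvalue / inertia, coords
```

The first return value plays the role of the percentage of variation attributed to the first axis, and the second gives one coordinate per genome on that axis.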

Correlation Analysis

Correlation analysis was carried out to identify the relationship between nucleotide composition and synonymous codon usage patterns of CHIKV. The analysis used Spearman's rank correlation. All statistical analyses were carried out using SPSS 16.0 for Windows.

Hydrophobicity (GRAVY) and aromaticity (ARO) indices in CHIKV genomes. (DOCX)
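The study ran its correlation analysis in SPSS. For readers who want to see what Spearman's rank correlation computes, a minimal pure-Python sketch follows. The function names are hypothetical, ties receive average ranks (the usual convention, assumed here), and no significance test is included.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the rank-transformed data."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting, `x` could hold a nucleotide composition measure (for example GC3 per genome) and `y` a codon usage index for the same genomes. Because only ranks are used, any monotonic relationship yields a coefficient of 1 or -1 regardless of its shape.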
  49 in total

1.  Codon usage tabulated from international DNA sequence databases: status for the year 2000.

Authors:  Y Nakamura; T Gojobori; T Ikemura
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  Correlation between codon usage and thermostability.

Authors:  Marx Gomes Van der Linden; Sávio Torres de Farias
Journal:  Extremophiles       Date:  2006-07-08       Impact factor: 2.395

3.  Variation in G + C-content and codon choice: differences among synonymous codon groups in vertebrate genes.

Authors:  A Marín; J Bertranpetit; J L Oliver; J R Medina
Journal:  Nucleic Acids Res       Date:  1989-08-11       Impact factor: 16.971

4.  The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

Authors:  P M Sharp; W H Li
Journal:  Nucleic Acids Res       Date:  1987-02-11       Impact factor: 16.971

Review 5.  The alphaviruses: gene expression, replication, and evolution.

Authors:  J H Strauss; E G Strauss
Journal:  Microbiol Rev       Date:  1994-09

6.  Synonymous codon usage analysis of thirty two mycobacteriophage genomes.

Authors:  Sameer Hassan; Vasantha Mahalingam; Vanaja Kumar
Journal:  Adv Bioinformatics       Date:  2010-02-01

7.  DNA methylation and the frequency of CpG in animal DNA.

Authors:  A P Bird
Journal:  Nucleic Acids Res       Date:  1980-04-11       Impact factor: 16.971

8.  Codon usage bias and the evolution of influenza A viruses. Codon Usage Biases of Influenza Virus.

Authors:  Emily H M Wong; David K Smith; Raul Rabadan; Malik Peiris; Leo L M Poon
Journal:  BMC Evol Biol       Date:  2010-08-19       Impact factor: 3.260

9.  Sequential adaptive mutations enhance efficient vector switching by Chikungunya virus and its epidemic emergence.

Authors:  Konstantin A Tsetsarkin; Scott C Weaver
Journal:  PLoS Pathog       Date:  2011-12-08       Impact factor: 6.823

10.  Patterns of evolution and host gene mimicry in influenza and other RNA viruses.

Authors:  Benjamin D Greenbaum; Arnold J Levine; Gyan Bhanot; Raul Rabadan
Journal:  PLoS Pathog       Date:  2008-06-06       Impact factor: 6.823

