Comparative analysis of codon usage patterns in Rift Valley fever virus.

Hayeon Kim1, Myeongji Cho2, Hyeon S Son2,3.   

Abstract

Rift Valley fever virus (RVFV) is a vector-borne pathogen and the most widely known virus in the genus Phlebovirus. Since it was first reported, RVFV has spread from its traditional endemic region to western Africa, Egypt and Madagascar, and infections continue to occur in new areas. In this study, we analyzed genomic patterns according to the infection properties of RVFV. Among the four coding regions of RVFV analyzed, between-group differences in nucleotide composition, overall GC content and GC content at the third codon position (%GC3) were largest in the S (NP) segment, showing that more diverse codons were used there than in the other segments. Furthermore, codon adaptation index (CAI) analysis of the S (NP) segment showed that viruses isolated from regions where no previous infections had been reported had the highest values, suggesting greater adaptation to human hosts than in other viruses. This result suggests that mutations in the S (NP) segment co-evolve with the infected hosts and may lead to expansion of the geographic range. The distinctive codon usage patterns observed in specific genomic regions of a group with similar infection properties may be related to the increasing likelihood of RVFV infections in new areas.

Year:  2020        PMID: 32422647      PMCID: PMC7323899          DOI: 10.1590/1678-4685-GMB-2019-0240

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Introduction

Recently, infection with Rift Valley fever virus (RVFV) was reported for the first time in China (Liu ). The virus was identified in a patient returning to China from Angola, so the infection was not acquired in China; however, no RVFV infections had previously been reported in Angola either (Liu ). Since infection and transmission among lambs were first reported in the Rift Valley of Kenya in 1930, RVFV has continued to cause infections (Daubney ). Previously, RVFV infections were found mostly in parts of Africa such as Kenya, but infections are increasing outside the traditional endemic region, such as in the Middle East and Europe (Madani ; Chevalier ; Grobbelaar ). This trend indicates that RVFV is highly likely to cause infections in new areas.
RVFV is a vector-borne viral pathogen in the genus Phlebovirus that causes zoonotic infections and moves between hosts via mosquitoes (Aedes spp., Culex spp., Anopheles spp., etc.) (Elliott, 1997; Bouloy and Weber, 2010). It is an enveloped, negative-sense, single-stranded RNA virus with a particle size of 80 to 120 nm (Ellis ; Pepin ). RVFV has a circular three-segment genome, and the segments form a panhandle secondary structure due to complementary sequences at the ends of each segment (Hewlett ; Boshra ). Each segment encodes different proteins. The L segment encodes the RNA polymerase used in replication and mRNA transcription (Gerrard and Nichol, 2007). The M segment encodes two glycoproteins (Gn and Gc) that are required for viral entry and assembly, and a nonstructural protein that inhibits cell apoptosis (Gerrard and Nichol, 2007). The S segment is ambisense: it encodes the nucleoprotein, which induces a host immune response, in the antisense orientation, and the nonstructural (NS) protein, which damages the host genome and functions as an interferon antagonist, in the complementary orientation (Gerrard and Nichol, 2007).
Infections with RVFV can lead to serious illness, including retinitis, hepatitis, renal failure, meningoencephalitis, and severe hemorrhagic diseases, and can cause death in humans (Bird ). As there are currently no effective vaccines or treatments for RVFV, the emergence of RVFV in new areas may lead to serious public health problems (Faburay ). RVFV infection is mainly spread by mosquitoes, and therefore the area infected with RVFV is limited by the habitat distribution of its mosquito vectors (Tantely ). However, recent climate change and increasing international trade have resulted in migration and expanded habitat for the vectors, allowing RVFV infection to occur in unexpected areas (Chevalier ; Tantely ). In this study, we analyzed the infection properties of RVFV based on previously reported sequence information.

Material and Methods

Data collection

Sequence data were downloaded from the National Center for Biotechnology Information (NCBI) GenBank database (https://www.ncbi.nlm.nih.gov/genbank/) in order to compare the genetic characteristics of RVFVs that infect humans. Only RVFV sequences isolated from infected humans were included in this analysis. Sequence datasets (in FASTA format) for the four coding-sequence (CDS) regions of the virus (large [L], medium [M], small [S] nonstructural protein [NS] and small [S] nucleoprotein [NP]) were grouped by the country of collection (Table 1).
Table 1

Summary of RVFV sequence characteristics.

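Country | L segment | M segment | S (NP) segment | S (NS) segment
Kenya | 10 | 2 | 14 | 14
Sudan | 4 | 5 | 13 | 13
Central African Republic | 2 | 7 | 8 | 7
Madagascar | 6 | 6 | 6 | 6
Egypt | 3 | 4 | 6 | 6
Angola | 1 | 1 | 1 | 1
Saudi Arabia | 1 | 2 | 2 | 1
China | 3 | 3 | 3 | 3
Total | 30 | 30 | 53 | 51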

Phylogenetic analysis

Phylogenetic analysis was performed on the L, M, and S (NP and NS) segments using the program MEGA7 (http://www.megasoftware.net) to examine the evolutionary relationships among RVFVs by region and time (year) (Kumar ). Sequences were aligned with MUSCLE in MEGA7, and phylogenetic trees were constructed by the maximum likelihood (ML) method based on the Tamura-Nei model (Tamura and Nei, 1993; Kumar ). Tree robustness was assessed with 1,000 bootstrap replicates.
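
As an illustration of what a bootstrap replicate is, the following minimal Python sketch resamples alignment columns with replacement. It is not the MEGA7 implementation and it does not build trees; the function names and the toy two-sequence alignment are hypothetical, and the seed is fixed only to make the example repeatable.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment maps sequence names to aligned sequences of equal length.
    Columns are drawn with replacement, and the same columns are used
    for every sequence so that homologous positions stay together.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n=1000, seed=0):
    """Generate n replicates (1,000 matches the setting reported above)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n)]


# Hypothetical toy alignment, for illustration only.
toy = {"isolate_a": "ACGT", "isolate_b": "TGCA"}
replicates = bootstrap_replicates(toy, n=1000, seed=1)
print(len(replicates), replicates[0])
```

In a full analysis, a tree would be inferred from each replicate, and the support for a branch would be the fraction of replicate trees that contain it.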

Codon usage analysis

Analysis of codon usage bias in viruses provides information on molecular evolution; it can also improve understanding of the regulation of viral gene expression and help to identify the efficient expression process of viral proteins required to evade immune responses (Shackelton and Holmes, 2004; Butt ). In this study, genomic patterns were compared by analyzing the nucleotide composition features of each segment, and codon usage bias was evaluated using the effective number of codons (ENC). The ENC value is 20 if only one synonymous codon is used for each amino acid and ranges up to 61 if all synonymous codons are used equally (Wright, 1990). There is an inverse relationship between ENC and gene expression. A lower ENC value indicates strong codon usage bias and elevated gene expression, while a higher ENC value indicates a diversity of codons encoding amino acids and lower gene expression (Wright, 1990). Generally, an ENC value > 35 suggests that there is a relatively conserved genomic composition (Comeron and Aguadé, 1998).
Furthermore, differences in the preference of codons for a single amino acid were examined using relative synonymous codon usage (RSCU) values (Sharp and Li, 1986). Each amino acid is encoded by one to six synonymous codons, and codons encoding the same amino acid tend to show preferential usage (Plotkin ). Generally, codons with RSCU values > 1.0 are more preferred (abundant codons), while those with RSCU values < 1.0 are less preferred (less-abundant codons). An RSCU value of 1.0 indicates no bias: the codon is used as often as expected if all synonymous codons for that amino acid were used equally (Sharp and Li, 1986). In this study, codon usage patterns were analyzed using the tools of the Gene Infinity website (http://www.geneinfinity.org/sms/sms_codonusage.html), and codon adaptation index (CAI) values were calculated for comparison of general codon usage patterns among the virus and its hosts, human and mosquito, using the CAIcal program (ver. 1.4, http://genomes.urv.cat/CAIcal).
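
The RSCU definition above can be sketched in a few lines of Python. This is a minimal illustration, not the Gene Infinity tool used in the study: the function name `rscu` and the toy sequence are hypothetical, the standard genetic code is assumed, and amino acids absent from the sequence are simply omitted from the result.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def rscu(cds):
    """Relative synonymous codon usage for one coding sequence.

    RSCU = observed codon count divided by the count expected if all
    synonymous codons of that amino acid were used equally.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(
        c for c in codons if CODON_TABLE.get(c, "*") != "*"
    )

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid not present in this sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Hypothetical toy CDS: three CUG and one UUA codon, all leucine.
toy = rscu("CTGCTGCTGTTA")
print(toy["CTG"], toy["TTA"], toy["CTT"])
```

Within each amino acid present in the sequence, the RSCU values sum to the number of synonymous codons (six for leucine in the toy example).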

Results

Phylogenetic relationships and classification of RVFV

Phylogenetic trees were constructed for each segment (S [NS, NP], M, and L) of the RVFV genome. In the constructed trees, RVFVs grouped according to the region (country) and time (year) of infection (Figure 1). This result indicates that RVFVs causing infections do not share a single genetic composition; rather, the genomic features of the virus vary with region and time due to mutations, which can also lead to changes in viral infection patterns. RVFV infections do not maintain the same level of virulence every year, and the reported death rate varies with time and region. In particular, the analyzed RVFV sequences from Kenya and Sudan in 2007–2008 formed a single group, and RVFV infection caused a considerable number of deaths in Kenya (155 deaths; case fatality rate 23%) and Sudan (230 deaths; case fatality rate 30.8%) (World Health Organization, 2007; Hassan ). This result suggests that mutations in RVFV may affect its virulence. Although the genetic lineages (A–G) of RVFV have been classified in previous studies (Bird ; Ikegami, 2012), the RVFVs in this study were re-classified based on phylogenetic analysis of the collected sequences, because the previous studies did not consider RVFV sequences from the 2000s. Based on these results, the codon usage patterns of five groups of RVFV (Group 1: Kenya [2006–2007], Madagascar [2008], and Sudan [2010]; Group 2: Madagascar [1991], Kenya [1998], and Saudi Arabia [2000]; Group 3: Central African Republic [1969 and 1985]; Group 4: Egypt [1997] and Madagascar [1979]; and Group 5: Angola [2016] and China [2016]) were analyzed and compared.
Figure 1

Phylogenetic analysis of RVFV: (a) L segment, (b) M segment, (c) S (NP) segment, (d) S (NS) segment. RVFVs were classified into five groups (Group 1: [pink triangle]; Group 2: [sky blue circle]; Group 3: [dark blue circle]; Group 4: [red circle]; Group 5: [light green triangle]).

Nucleotide composition of the CDS region in RVFV

The four CDS regions were analyzed to compare the nucleotide compositions of the five groups identified in the phylogenetic analysis (Table 2). In the L, M and S (NS) segments, no appreciable difference in base composition was detected between groups. In contrast, in the S (NP) segment the groups differed in the composition of the third codon position. Across groups, the frequencies of A (A3), C (C3), T (T3), and G (G3) at the third position of the S (NP) segment ranged over 17.48–21.09%, 21.22–23.48%, 25.61–27.66%, and 29.73–33.44%, respectively. These results show that among the four CDS regions of RVFV, the S (NP) segment may be a useful indicator for identifying the genetic properties of RVFVs.
Table 2

Nucleotide composition of RVFV segments.

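Segment | Group | A1 | A2 | A3 | C1 | C2 | C3 | T1 | T2 | T3 | G1 | G2 | G3
L | Group 1 | 31.67 | 32.13 | 25.08 | 16.72 | 20.78 | 21.56 | 21.56 | 30.27 | 28.88 | 30.04 | 16.82 | 24.48
L | Group 2 | 31.61 | 32.08 | 25.34 | 16.67 | 20.83 | 21.66 | 21.60 | 30.24 | 28.68 | 30.12 | 16.85 | 24.32
L | Group 3 | 31.39 | 31.88 | 24.95 | 16.80 | 20.88 | 21.85 | 21.48 | 30.24 | 28.48 | 30.34 | 17.00 | 24.72
L | Group 4 | 31.56 | 32.01 | 25.49 | 16.75 | 20.83 | 21.62 | 21.50 | 30.24 | 28.69 | 30.20 | 16.91 | 24.20
L | Group 5 | 31.63 | 32.01 | 24.63 | 16.65 | 20.89 | 21.38 | 21.53 | 30.18 | 29.12 | 30.20 | 16.92 | 24.87
M | Group 1 | 28.36 | 28.14 | 24.34 | 17.19 | 24.04 | 21.09 | 22.46 | 27.46 | 30.30 | 31.99 | 20.36 | 24.27
M | Group 2 | 28.51 | 28.05 | 23.89 | 17.15 | 23.96 | 20.49 | 22.48 | 27.61 | 30.86 | 31.87 | 20.39 | 24.75
M | Group 3 | 28.39 | 28.11 | 24.30 | 17.27 | 23.93 | 21.00 | 22.36 | 27.58 | 30.18 | 31.98 | 20.38 | 24.51
M | Group 4 | 28.35 | 28.11 | 24.22 | 16.96 | 23.96 | 20.93 | 22.77 | 27.58 | 30.32 | 31.92 | 20.35 | 24.53
M | Group 5 | 28.40 | 28.19 | 24.46 | 17.26 | 24.00 | 20.70 | 22.39 | 27.52 | 31.07 | 31.95 | 20.28 | 23.77
S (NP) | Group 1 | 28.05 | 31.30 | 19.07 | 21.17 | 26.02 | 21.22 | 15.01 | 27.24 | 27.66 | 35.77 | 15.45 | 32.06
S (NP) | Group 2 | 28.05 | 31.30 | 19.11 | 20.73 | 26.02 | 21.62 | 15.45 | 27.24 | 27.16 | 35.77 | 15.45 | 32.11
S (NP) | Group 3 | 28.05 | 31.20 | 21.09 | 20.83 | 26.02 | 21.80 | 15.35 | 27.24 | 27.39 | 35.77 | 15.55 | 29.73
S (NP) | Group 4 | 28.52 | 30.89 | 20.50 | 20.15 | 26.02 | 22.24 | 15.62 | 27.24 | 26.95 | 35.71 | 15.85 | 30.31
S (NP) | Group 5 | 28.05 | 31.30 | 17.48 | 21.04 | 26.02 | 23.48 | 15.14 | 27.24 | 25.61 | 35.77 | 15.45 | 33.44
S (NS) | Group 1 | 25.18 | 26.15 | 20.72 | 23.53 | 20.65 | 21.74 | 17.40 | 34.60 | 35.78 | 33.90 | 18.60 | 21.76
S (NS) | Group 2 | 25.19 | 26.32 | 20.68 | 23.59 | 20.68 | 21.90 | 17.39 | 34.59 | 35.53 | 33.83 | 18.42 | 21.90
S (NS) | Group 3 | 25.08 | 26.32 | 20.41 | 23.52 | 20.68 | 21.70 | 17.45 | 34.59 | 35.82 | 33.94 | 18.42 | 22.07
S (NS) | Group 4 | 25.19 | 26.27 | 20.30 | 23.68 | 20.68 | 21.38 | 17.67 | 34.59 | 35.76 | 33.46 | 18.47 | 22.56
S (NS) | Group 5 | 25.10 | 26.32 | 19.27 | 23.68 | 20.59 | 21.43 | 17.39 | 34.68 | 36.75 | 33.83 | 18.42 | 22.56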
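
Position-specific base frequencies of the kind listed in Table 2 (A1 to G3) can be computed as in the following minimal Python sketch. It assumes an in-frame CDS for a single sequence; the function names and the toy sequence are hypothetical, and how the study aggregated sequences within a group is not reproduced here.

```python
def positional_composition(cds):
    """Percentage of each base at codon positions 1, 2 and 3.

    Returns a dict with keys such as "A1" or "G3". The CDS is assumed
    to be in frame; trailing bases beyond a whole codon are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    n_codons = usable // 3
    if n_codons == 0:
        raise ValueError("sequence contains no complete codon")
    composition = {}
    for pos in range(3):
        column = cds[pos:usable:3]  # every base at this codon position
        for base in "ACTG":
            key = f"{base}{pos + 1}"
            composition[key] = 100.0 * column.count(base) / n_codons
    return composition


def gc3_percent(cds):
    """%GC3: share of codons ending in G or C."""
    composition = positional_composition(cds)
    return composition["G3"] + composition["C3"]


# Hypothetical toy CDS with codons ATG, GCC, AAG.
print(positional_composition("ATGGCCAAG")["A1"])
print(gc3_percent("ATGGCCAAG"))
```

Because %GC3 is the sum of C3 and G3, the two can be cross-checked: for S (NP) Group 5, the Table 2 values give 23.48 + 33.44 = 56.92.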

Compositional properties of the CDS region of RVFV

The %GC, %GC3, ENC, and CAI values were calculated for each group in order to analyze codon usage patterns in RVFVs. The %GC and %GC3 values showed the largest differences between groups within the S (NP) segment (Table 3). The %GC3 value indicates the frequency of guanine (G) or cytosine (C) at the wobble site, which is the third position of a codon. The %GC3 values were greater than 50% for all groups in the S (NP) segment, but less than 50% in the other three segments. Thus, only in the S (NP) segment are codons ending in G or C more frequent than those ending in adenine (A) or thymine (T). In particular, the %GC3 values of the five groups in the S (NP) segment were 51.50–56.90%, a greater spread between groups than in the other segments. The CDS region of the S (NP) segment encodes a nucleoprotein, and the nucleoprotein of RVFV is known to induce host immune responses. This finding suggests that differences in host immune responses to the virus and the varied outcome of viral infection for each group may be caused by the properties of the S (NP) segment.
ENC analysis showed that RVFV has high ENC values overall. Although the difference between groups was not great, the ENC value of the S (NP) segment was notably high (> 60), indicating that the CDS region of the S (NP) segment uses a greater variety of codons than the other CDS regions.
The CAI value measures how closely the codon usage pattern of a given gene matches that of a reference, here the host species. This study used the mosquito (Aedes aegypti), a representative vector of RVFV, and the infected host (Homo sapiens) as references for comparison of general codon usage patterns. As the CAI value approaches one, the codon usage pattern becomes more similar to that of the reference organism. Overall, the CAI value with humans (Homo sapiens) as a reference was higher than that with mosquitoes (Aedes aegypti).
Notably, the CAI value of the S (NP) segment is highest in Group 5. In this study, all viral data for Group 5 were obtained from RVFVs collected in 2016. These viruses were isolated from new regions (Angola and China) where no previous cases of infection had been reported, and they are the most recent data among the five groups. These results suggest that mutations in the S (NP) segment co-evolve with the hosts (mosquitoes and humans) and may allow the virus to expand its geographic range.
Table 3

%GC, %GC3, ENC and CAI values of RVFV segments.

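Segment | Group | %GC | %GC3 | ENC | CAIa | CAIb
L segment | Group 1 | 43.46 | 46.03 | 51.29 | 0.764 | 0.693
L segment | Group 2 | 43.50 | 46.00 | 51.43 | 0.763 | 0.691
L segment | Group 3 | 43.88 | 46.55 | 51.45 | 0.764 | 0.697
L segment | Group 4 | 43.50 | 45.85 | 51.30 | 0.763 | 0.690
L segment | Group 5 | 43.63 | 46.25 | 51.30 | 0.765 | 0.688
M segment | Group 1 | 46.32 | 45.34 | 49.49 | 0.762 | 0.666
M segment | Group 2 | 46.20 | 45.25 | 49.58 | 0.764 | 0.666
M segment | Group 3 | 46.34 | 45.49 | 49.74 | 0.761 | 0.665
M segment | Group 4 | 46.22 | 45.46 | 50.00 | 0.765 | 0.668
M segment | Group 5 | 45.98 | 44.48 | 49.98 | 0.761 | 0.671
S (NP) segment | Group 1 | 50.55 | 53.27 | 60.32 | 0.748 | 0.696
S (NP) segment | Group 2 | 50.54 | 53.78 | 60.82 | 0.753 | 0.696
S (NP) segment | Group 3 | 49.90 | 51.50 | 60.51 | 0.748 | 0.688
S (NP) segment | Group 4 | 50.09 | 52.53 | 60.99 | 0.747 | 0.700
S (NP) segment | Group 5 | 51.75 | 56.90 | 61.00 | 0.762 | 0.716
S (NS) segment | Group 1 | 46.73 | 43.53 | 55.29 | 0.745 | 0.693
S (NS) segment | Group 2 | 46.75 | 43.80 | 55.25 | 0.745 | 0.695
S (NS) segment | Group 3 | 46.79 | 43.77 | 54.54 | 0.754 | 0.703
S (NS) segment | Group 4 | 46.70 | 43.94 | 55.64 | 0.743 | 0.698
S (NS) segment | Group 5 | 46.85 | 44.00 | 53.23 | 0.754 | 0.704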
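
The CAI values in Table 3 were produced with CAIcal. As a rough illustration of the underlying idea only, the Python sketch below computes a CAI as the geometric mean of relative adaptiveness weights derived from a reference codon count table. The reference counts are invented toy numbers, the function names are hypothetical, and two simplifications are assumed: methionine and tryptophan codons are excluded, and codons absent from the reference are skipped. CAIcal may handle these cases differently.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def relative_adaptiveness(reference_counts):
    """Weight of each codon: its reference count divided by the count
    of the most used synonymous codon for the same amino acid."""
    best = defaultdict(int)
    for codon, n in reference_counts.items():
        aa = CODON_TABLE[codon]
        best[aa] = max(best[aa], n)
    return {
        codon: n / best[CODON_TABLE[codon]]
        for codon, n in reference_counts.items()
        if n > 0
    }


def cai(cds, weights):
    """Geometric mean of the weights of the codons in a CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    logs = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        # Skip unknown codons, stop codons, and single-codon amino acids.
        if CODON_TABLE.get(codon) in (None, "*", "M", "W"):
            continue
        if codon in weights:  # simplification: ignore unseen codons
            logs.append(math.log(weights[codon]))
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Invented reference counts, for illustration only.
toy_reference = {"CTG": 40, "TTA": 10, "GCC": 30, "GCT": 30}
weights = relative_adaptiveness(toy_reference)
print(cai("CTGCTGGCC", weights))
print(cai("TTACTG", weights))
```

A sequence that uses only the most frequent reference codons scores 1, and each use of a rarer synonymous codon pulls the geometric mean down.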
In addition, ENC plots were generated for each CDS region in order to determine the degree of compositional constraints on codon usage bias in the RVFVs (Figure 2). The ENC plot shows ENC values against %GC3 as a scatter plot, together with the curve expected when codon usage is determined by GC3 composition alone, and is known to be an effective method for examining codon usage variations among genes. In the present study, ENC values plotted against %GC3 for the CDS regions of the L, M and S (NS) segments were distributed below the curve, showing that codon usage is biased. In contrast, for the S (NP) segment, the ENC values were distributed above the curve, indicating that codon usage is more variable.
Figure 2

ENC versus GC3 plots for RVFV segments: (a) L segment, (b) M segment, (c) S (NP) segment, (d) S (NS) segment. ENC values plotted against GC3 for the CDS regions of the L, M and S (NS) segments are distributed below the curve, which means that codon usage is biased. In the S (NP) segment, the ENC values are distributed above the curve, showing that codon usage is more variable.
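
The curve in an ENC plot is usually the expected ENC under GC3 composition alone, which Wright (1990) gives as ENC = 2 + s + 29 / (s^2 + (1 - s)^2), with s the GC3 fraction. The Python sketch below evaluates that formula and compares it with two ENC and %GC3 pairs reported in Table 3. The function names are hypothetical, and it is an assumption that the curve drawn in Figure 2 follows this standard formula.

```python
def expected_enc(gc3):
    """Expected ENC when codon usage depends only on GC3 (Wright, 1990).

    gc3 is a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= gc3 <= 1.0:
        raise ValueError("gc3 must be a fraction between 0 and 1")
    return 2.0 + gc3 + 29.0 / (gc3 ** 2 + (1.0 - gc3) ** 2)


def position_relative_to_curve(enc, gc3_percent):
    """Return "above" if the observed ENC exceeds the expected value,
    otherwise "below" (points exactly on the curve count as below)."""
    expected = expected_enc(gc3_percent / 100.0)
    return "above" if enc > expected else "below"


# ENC and %GC3 pairs taken from Table 3.
print(position_relative_to_curve(51.29, 46.03))  # L segment, Group 1
print(position_relative_to_curve(61.00, 56.90))  # S (NP) segment, Group 5
```

With these inputs the L segment point falls below the expected curve and the S (NP) point falls above it, which matches the distribution described in the caption.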

Prevalence of preferred codons

RSCU analysis was performed to determine whether group-specific properties could be discriminated from differing codon preferences in each CDS region (Figure 3). Codons with RSCU values ≥ 1.6 were regarded as over-represented and those with values ≤ 0.6 as under-represented.
In the L segment, the codons AGC (R) and AGG (R) showed relatively large differences in RSCU values compared to other codons and were found to be over-represented. Most other codons showed similar preferences, with the same over-represented and under-represented codons and no differences among groups.
In the M segment, the codons AGC (R), AGG (R) and UCA (S) showed relatively large differences in RSCU values compared to other codons and were identified as over-represented codons. The RSCU values of the codons CGA (R), CGG (R) and GGG (G) in Groups 1 to 4 were 0.36–0.39, 0.56–0.60 and 1.80–1.91, while those in Group 5 were 0.64, 0.37, and 1.54, respectively, indicating large differences compared to other groups.
In the S (NP) segment, UUA (L) was an under-represented codon except in Group 3 (0.72) and had the lowest representation in Group 5 (0.2), while CUG (L) was identified as an over-represented codon in Group 1 (1.88) and Group 5 (2.02). In Group 5, the most highly preferred codon was UCU (S), with an RSCU value of 1.62, while the RSCU value of the codon UCG (S) was 0.0, showing a different codon usage pattern from other groups. In Group 1, the RSCU values of the codons CAU (H) and CAC (H) were 1.51 and 0.49, respectively, indicating differences in codon preference from other groups. In Group 3, the RSCU values of the codons GGU (G) and GGC (G) were 0.35 and 1.81, respectively, showing a different codon usage pattern from other groups.
In the S (NS) segment, CUU (L) was an over-represented codon in Group 4 (1.62) and Group 5 (1.62), and GUG (L) showed a marked difference in codon preference in Group 3 (1.68), Group 4 (1.62), and Group 5 (1.79). The RSCU values of the codon GCU (A) were 1.71 (Group 1) and 1.76 (Group 2), while those of codon GCC (A) were 0.53 (Group 1) and 0.57 (Group 2). In Group 4, the RSCU values of GCA (A) and GCC (A) were 1.43 and 0.57, respectively, indicating a difference in codon preferences compared to other groups.
Overall, RSCU analysis showed that codon preference differed more between groups in the S segment than in the L and M segments.
Figure 3

RSCU analysis of RVFV segments: (a) L segment, (b) M segment, (c) S (NP) segment, (d) S (NS) segment. Codon preferences differ more between groups in the S segment than in the L and M segments.
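
The over- and under-representation thresholds used in the RSCU analysis (≥ 1.6 and ≤ 0.6) amount to a simple three-way classification, sketched below in Python. The function name and the label "intermediate" for values between the thresholds are hypothetical; the example RSCU values are those quoted in the text for the S (NP) segment.

```python
def classify_rscu(value, over=1.6, under=0.6):
    """Label a codon by its RSCU value using the stated thresholds."""
    if value >= over:
        return "over-represented"
    if value <= under:
        return "under-represented"
    return "intermediate"  # hypothetical label for values in between


# RSCU values quoted in the text for the S (NP) segment.
s_np_examples = {
    ("UUA", "Group 3"): 0.72,
    ("UUA", "Group 5"): 0.2,
    ("CUG", "Group 1"): 1.88,
    ("CUG", "Group 5"): 2.02,
    ("UCU", "Group 5"): 1.62,
    ("UCG", "Group 5"): 0.0,
}

for (codon, group), value in s_np_examples.items():
    print(codon, group, value, classify_rscu(value))
```

Applied to these values, UUA is under-represented in Group 5 but not in Group 3, which is the exception noted in the text.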

Discussion

Various factors allow viruses to expand their range and rapidly evolve pathogenicity when adapting to new environments and hosts, including natural environmental factors and anthropogenic factors, such as climate change and the development of international trade and transportation. Surveillance of the emergence of viruses is important, as an unexpected influx of new infectious agents into a new area can cause serious illnesses in unimmunized populations. RVFV infections continue to be reported in new areas, and constant monitoring for the emergence of the virus is required. This study investigated whether the effects of RVFV on hosts differed among epidemic periods and whether evolutionary changes in viruses are involved in the expansion of the affected area. RVFVs were grouped by collection time and region through phylogenetic analysis, and nucleotide composition and codon usage were analyzed for the resulting clusters. The between-group differences in nucleotide composition, overall GC content, and GC content at the third codon position (%GC3) were greatest in the S (NP) segment, confirming that more diverse codons were used there than in other segments. Notably, in CAI analysis of the S (NP) segment, Group 5 had the highest value, indicating that Group 5 viruses have the greatest similarity to the reference data in terms of codon usage patterns and expression levels, and suggesting that they are better adapted to human hosts than other groups. Group 5 consisted of the most recent viral samples among the five groups, and all Group 5 viruses were isolated from new regions (Angola and China) where no previous cases of infection had been reported. These results suggest that mutations in the S (NP) segment co-evolve with infected hosts, i.e., mosquitoes and humans, and may lead to expansion of the areas where viral infection occurs.
Due to the limitations of the published data, we could not analyze recently isolated RVFV sequences other than those for Group 5, collected in Angola and China. Sufficient genetic data can reduce bias in the analysis, so future analyses using additional public data may provide important information to confirm the relationship between evolutionary variation in RVFV and the incidence of infection. RVFVs have a relatively large number of conserved genomic regions, and infection has occurred mainly in limited areas due to the geographically limited habitat of the vectors. However, this study showed distinct codon usage patterns in specific genomic regions and identified a group of RVFVs that, based on genetic mutations, might have an increased possibility of causing infections in new areas. Therefore, continuous monitoring of RVFV infections is necessary to prevent epidemics, particularly with regard to mechanisms of viral evolution and adaptation to new environmental conditions and to human hosts.
References (26 in total)

Review 1.  The evolution of large DNA viruses: combining genomic information of viruses and their hosts.

Authors:  Laura A Shackelton; Edward C Holmes
Journal:  Trends Microbiol       Date:  2004-10       Impact factor: 17.079

2.  The 'effective number of codons' used in a gene.

Authors:  F Wright
Journal:  Gene       Date:  1990-03-01       Impact factor: 3.688

3.  Complete genome analysis of 33 ecologically and biologically diverse Rift Valley fever virus strains reveals widespread virus movement and low genetic diversity due to recent common ancestry.

Authors:  Brian H Bird; Marina L Khristova; Pierre E Rollin; Thomas G Ksiazek; Stuart T Nichol
Journal:  J Virol       Date:  2006-12-27       Impact factor: 5.103

Review 4.  Rift Valley fever virus.

Authors:  Brian H Bird; Thomas G Ksiazek; Stuart T Nichol; N James Maclachlan
Journal:  J Am Vet Med Assoc       Date:  2009-04-01       Impact factor: 1.936

Review 5.  Emerging viruses: the Bunyaviridae.

Authors:  R M Elliott
Journal:  Mol Med       Date:  1997-09       Impact factor: 6.354

6.  Biology of mosquitoes that are potential vectors of Rift Valley Fever virus in different biotopes of the central highlands of Madagascar.

Authors:  Michaël Luciano Tantely; Jean-Claude Rakotoniaina; Etienne Tata; Lala Andrianaivolambo; Fidimanana Razafindrasata; Didier Fontenille; Nohal Elissa
Journal:  J Med Entomol       Date:  2013-05       Impact factor: 2.278

7.  Rift Valley fever virus: some ultrastructural observations on material from the outbreak in Egypt 1977.

Authors:  D S Ellis; D I Simpson; S Stamford; K S Abdel Wahab
Journal:  J Gen Virol       Date:  1979-02       Impact factor: 3.891

8.  Rift Valley fever epidemic in Saudi Arabia: epidemiological, clinical, and laboratory characteristics.

Authors:  Tariq A Madani; Yagob Y Al-Mazrou; Mohammad H Al-Jeffri; Amin A Mishkhas; Abdullah M Al-Rabeah; Adel M Turkistani; Mohammad O Al-Sayed; Abdullah A Abodahish; Ali S Khan; Thomas G Ksiazek; Osama Shobokshi
Journal:  Clin Infect Dis       Date:  2003-09-23       Impact factor: 9.079

9.  Molecular epidemiology of Rift Valley fever virus.

Authors:  Antoinette A Grobbelaar; Jacqueline Weyer; Patricia A Leman; Alan Kemp; Janusz T Paweska; Robert Swanepoel
Journal:  Emerg Infect Dis       Date:  2011-12       Impact factor: 6.883

10.  The first imported case of Rift Valley fever in China reveals a genetic reassortment of different viral lineages.

Authors:  Jingyuan Liu; Yulan Sun; Weifeng Shi; Shuguang Tan; Yang Pan; Shujuan Cui; Qingchao Zhang; Xiangfeng Dou; Yanning Lv; Xinyu Li; Xitai Li; Lijuan Chen; Chuansong Quan; Qianli Wang; Yingze Zhao; Qiang Lv; Wenhao Hua; Hui Zeng; Zhihai Chen; Haofeng Xiong; Chengyu Jiang; Xinghuo Pang; Fujie Zhang; Mifang Liang; Guizhen Wu; George F Gao; William J Liu; Ang Li; Quanyi Wang
Journal:  Emerg Microbes Infect       Date:  2017-01-18       Impact factor: 7.163
