Composition bias and genome polarity of RNA viruses.

Prasert Auewarakul

Abstract

I have observed a relationship between GC content in coding sequences of RNA viruses and their genome polarity. Positive-stranded RNA viruses have significantly higher GC contents than negative-stranded RNA viruses. Coding sequences of all negative-stranded RNA viruses are biased toward high A in coding strands (high T in genomes), while two distinct patterns were observed among positive-stranded RNA genomes. This finding suggests that RNA viruses with different genome polarity are under different mutational pressure, which may be a consequence of the difference in the strategies of viral genome expression and replication. The GC content directly affects the viral codon adaptation index using highly expressed human genes as the reference set, which may theoretically predict the efficiency of viral gene expression in human cells.

Year:  2004        PMID: 15826910      PMCID: PMC7114242          DOI: 10.1016/j.virusres.2004.10.004

Source DB:  PubMed          Journal:  Virus Res        ISSN: 0168-1702            Impact factor:   3.303


Main text

Genome composition of living organisms can vary widely. This is considered to be the result of directional mutational bias toward GC or AT (Lobry and Sueoka, 2002; Sueoka, 1988; Sueoka, 1993). In viruses, this directional mutational bias could theoretically be due to a bias in the copying errors of the viral RNA polymerase, selection pressure, or editing by host RNA-editing enzymes. Certain types of hypermutation have been described in a number of viruses (Cattaneo et al., 1988; Vartanian et al., 1994; Vartanian et al., 2002) and may also contribute to viral genome composition. Genome composition bias in viruses has not been systematically analyzed. A global view of the pattern of viral genome composition bias may give insight into the complex evolutionary history of viruses and virus-host interactions. The GC content of a genome has been shown to be a major contributing factor to codon usage bias, which could affect expression efficiency (Aota and Ikemura, 1986; Chen et al., 2004; Francino and Ochman, 1999; Ikemura and Wada, 1991; Kanaya et al., 2001). It is therefore of interest to see how GC content interacts with genome polarity and codon usage bias in RNA viruses. Genome composition and codon usage bias are particularly interesting in RNA viruses because the same RNA may be used as mRNA, genome, or anti-genome. Replication of an RNA genome is also very different from DNA replication of the host: it uses different polymerase enzymes and takes place in a different environment, which may contribute to the mutational bias that drives genome composition. RNA viruses with positive- and negative-stranded genomes differ greatly in their strategies of genome expression and replication, which may contribute to mutational bias and selection pressure. For the analyses, I retrieved the compositions of coding sequences of 50 viruses from the codon usage database available at Kazusa Research Institute, Japan (http://www.kazusa.or.jp/codon/cgi-bin/).
The viruses were chosen to cover most viral families that cause disease in humans. Where a distinct separation between human and animal strains could be made, only human strains were included in the analyses, for example human influenza A virus (H3N2). The names of the viruses and the composition of their coding sequences are shown in Table 1. There is a significant difference between the GC contents of positive-stranded and negative-stranded viruses (p < 0.01, t-test) (Fig. 1). The positive-stranded viruses have a mean GC content of 49.8% in their coding sequences, while that of the negative-stranded viruses is 40.4%. I excluded retroviruses from the positive-stranded viruses in these analyses because their replication strategies are very different. If the strategies of genome replication and expression affect mutational pressure or exert a selection pressure on codon usage, these effects are likely to differ between retroviruses and other positive-stranded RNA viruses. For retroviruses, two distinct patterns were observed: HIV has low GC content, whereas HTLV has high GC content (Fig. 1). This is in agreement with a previous observation, but the reason for this difference is unclear (Berkhout et al., 2002).
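The group comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a handful of GC values copied from Table 1 rather than the full data set, it assumes Welch's unequal-variance form of the t-test (the text does not say which variant was used), and the helper names `gc_content` and `welch_t` are hypothetical. It reports the t statistic and degrees of freedom only; obtaining a p-value would need a t-distribution routine from a statistics library.

```python
from math import sqrt
from statistics import mean, variance


def gc_content(seq):
    """Percentage of G and C in a nucleotide sequence (DNA or RNA letters)."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# GC (%) of coding sequences: an illustrative subset of Table 1, not all viruses.
positive_gc = [46.18, 57.88, 69.59, 41.02, 51.2]   # astrovirus, HCV, rubella, SARS-CoV, West Nile
negative_gc = [44.36, 41.13, 47.19, 46.05, 35.32]  # Ebola, influenza B, measles, rabies, RSV

t_stat, dof = welch_t(positive_gc, negative_gc)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```

A positive t statistic here only reflects that the chosen positive-stranded subset has the higher mean GC content; the significance level quoted in the text refers to the full set of viruses.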
Table 1

List of the analyzed viruses

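| Name | Polarity | CAI | GC (%) | G (%) | C (%) | A (%) | T (%) |
|---|---|---|---|---|---|---|---|
| Astrovirus | + | 0.362 | 46.18 | 23.18 | 22.99 | 29.67 | 24.14 |
| Coronavirus | + | 0.302 | 38.42 | 21.50 | 16.91 | 26.80 | 34.77 |
| Coxsackie virus A9 | + | 0.398 | 46.97 | 24.47 | 22.49 | 29.18 | 23.84 |
| Dengue virus type 1 | + | 0.353 | 46.38 | 25.88 | 20.49 | 31.96 | 21.65 |
| Dengue virus type 2 | + | 0.361 | 45.8 | 25.21 | 20.58 | 33.21 | 20.98 |
| Dengue virus type 3 | + | 0.359 | 46.47 | 25.91 | 20.55 | 32.12 | 21.40 |
| Dengue virus type 4 | + | 0.366 | 46.91 | 26.31 | 20.59 | 31.03 | 22.06 |
| Enterovirus 71 | + | 0.399 | 47.99 | 24.37 | 23.61 | 27.55 | 24.46 |
| Eastern equine encephalitis | + | 0.412 | 50.33 | 24.34 | 25.98 | 27.69 | 21.97 |
| Hepatitis A virus | + | 0.298 | 37.15 | 21.74 | 15.40 | 30.08 | 32.76 |
| Hepatitis C virus | + | 0.477 | 57.88 | 28.13 | 29.74 | 20.54 | 21.57 |
| Hepatitis E virus | + | 0.445 | 57.29 | 26.19 | 31.09 | 17.32 | 25.38 |
| Hepatitis G virus | + | 0.46 | 58.83 | 32.18 | 26.64 | 17.81 | 23.36 |
| Japanese encephalitis virus | + | 0.418 | 51.48 | 28.42 | 23.05 | 27.46 | 21.05 |
| Norwalk virus | + | 0.37 | 48.54 | 23.62 | 24.91 | 26.50 | 24.95 |
| O’Nyong Nyong virus | + | 0.383 | 48.56 | 24.42 | 24.13 | 30.89 | 20.54 |
| Poliovirus type 3 | + | 0.392 | 45.96 | 23.28 | 22.68 | 30.18 | 23.85 |
| Rhinovirus type 89 | + | 0.291 | 38.29 | 19.44 | 18.84 | 32.14 | 29.56 |
| Ross river virus | + | 0.439 | 52.18 | 26.60 | 25.57 | 27.37 | 20.44 |
| Rubella virus | + | 0.612 | 69.59 | 30.87 | 38.71 | 14.90 | 15.50 |
| SARS coronavirus | + | 0.325 | 41.02 | 21.08 | 19.93 | 28.25 | 30.72 |
| Sindbis virus | + | 0.418 | 51.05 | 25.18 | 25.87 | 27.98 | 20.96 |
| Venezuelan encephalitis virus | + | 0.423 | 50.12 | 25.49 | 24.62 | 28.08 | 21.79 |
| West Nile virus | + | 0.426 | 51.2 | 28.79 | 22.40 | 27.23 | 21.56 |
| Yellow fever virus | + | 0.416 | 49.73 | 28.58 | 21.13 | 27.06 | 23.21 |
| HIV-1 | Retro | 0.328 | 43.28 | 24.93 | 18.34 | 34.66 | 22.05 |
| HIV-2 | Retro | 0.355 | 45.89 | 25.19 | 20.69 | 33.34 | 20.76 |
| HTLV-1 | Retro | 0.41 | 52.68 | 18.16 | 34.51 | 23.06 | 24.25 |
| HTLV-2 | Retro | 0.428 | 53.62 | 17.75 | 35.86 | 24.40 | 21.96 |
| Borna virus | - | 0.401 | 50.65 | 25.06 | 25.58 | 25.06 | 24.21 |
| Bunyamwera virus | - | 0.353 | 35.97 | 19.25 | 16.71 | 35.31 | 28.71 |
| Crimean-Congo virus | - | 0.369 | 43.59 | 22.44 | 21.14 | 31.55 | 24.85 |
| Ebola virus | - | 0.344 | 44.36 | 21.54 | 22.81 | 30.62 | 25.02 |
| Hantaan virus | - | 0.318 | 40.44 | 22.59 | 17.84 | 31.82 | 27.74 |
| Hendra virus | - | 0.343 | 42.38 | 22.97 | 19.40 | 32.53 | 25.08 |
| Influenza A virus (H3N2) | - | 0.356 | 43.57 | 24.31 | 19.25 | 32.43 | 23.98 |
| Influenza B virus | - | 0.325 | 41.13 | 22.47 | 18.65 | 35.23 | 23.63 |
| Influenza C virus | - | 0.301 | 38.6 | 20.58 | 18.02 | 35.78 | 25.60 |
| La Crosse virus | - | 0.316 | 37.64 | 20.39 | 17.24 | 34.60 | 27.74 |
| Marburg virus | - | 0.311 | 40.71 | 19.66 | 21.04 | 31.94 | 27.34 |
| Measles virus | - | 0.383 | 47.19 | 24.34 | 22.84 | 28.45 | 24.35 |
| Metapneumovirus | - | 0.299 | 39.09 | 20.97 | 18.11 | 36.94 | 23.96 |
| Mokola virus | - | 0.383 | 45.28 | 25.34 | 19.94 | 30.15 | 24.56 |
| Mumps virus | - | 0.32 | 41.98 | 19.24 | 22.73 | 29.78 | 28.23 |
| Nipah virus | - | 0.329 | 40.36 | 21.50 | 18.85 | 33.21 | 26.43 |
| Parainfluenzavirus 1 | - | 0.296 | 38.53 | 20.10 | 18.42 | 36.39 | 25.06 |
| Parainfluenzavirus 2 | - | 0.306 | 39.9 | 18.43 | 21.46 | 31.71 | 28.39 |
| Parainfluenzavirus 3 | - | 0.288 | 36.52 | 18.94 | 17.56 | 37.99 | 25.48 |
| Rabies virus | - | 0.388 | 46.05 | 24.35 | 21.69 | 28.67 | 25.27 |
| Respiratory syncytial virus | - | 0.296 | 35.32 | 15.17 | 20.14 | 38.98 | 25.70 |
| Rift Valley fever virus | - | 0.385 | 45.61 | 25.27 | 20.33 | 28.12 | 26.26 |
| Sendai virus | - | 0.375 | 46.5 | 24.90 | 21.59 | 29.40 | 24.09 |
| Sin Nombre virus | - | 0.299 | 39.13 | 22.29 | 16.84 | 31.31 | 29.54 |
| Vesicular stomatitis virus | - | 0.34 | 41.76 | 22.27 | 19.48 | 31.82 | 26.42 |
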
Fig. 1

A dot plot shows genome GC contents of positive-stranded RNA viruses on the left, those of retroviruses in the middle, and those of negative-stranded RNA viruses on the right.

Codon usage bias of many human viruses does not match the pattern for efficient expression in human cells and has been shown to be driven mainly by the GC content of their genomes (Jenkins and Holmes, 2003). Expression of viral genes can be restricted by codon usage bias (Haas et al., 1996), and codon optimization can enhance expression of viral genes and has been used in the development of DNA vaccines (Andre et al., 1998). To study codon bias in relation to predicted translational efficiency in human cells, I calculated the codon adaptation index (CAI) using highly expressed human genes as the reference set (Haas et al., 1996). This highly expressed codon set has been used successfully for codon optimization of viral genes for efficient expression in human cells (Andre et al., 1998; Haas et al., 1996). The CAI was designed for predicting the level of expression of a gene and for assessing the adaptation of viral genes to their hosts. A higher CAI value indicates better codon adaptation. Genes with well-adapted codons for efficient translation generally have CAIs > 0.6 (Sharp and Li, 1987). The CAI was calculated on the server of the Evolving Code group at the University of Maryland (http://www.evolvingcode.net/codon/CAI_Calculator.php). In this set of RNA viruses, GC contents correlated with CAIs with a Pearson correlation coefficient of 0.959 (p < 0.01) (Fig. 2a). CAIs varied widely among viruses, ranging from 0.288 for parainfluenza virus 3 to 0.612 for rubella virus, with an average of 0.369. This result confirmed that codon bias of RNA viruses is driven mainly by GC content, and consequently the positive-stranded viruses have higher CAIs than the negative-stranded viruses (0.403 versus 0.325, p < 0.001, t-test).
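For readers unfamiliar with the index, the following Python sketch shows how a CAI in the sense of Sharp and Li (1987) can be computed: each codon receives a relative adaptiveness w (its count in the reference set divided by the count of the most used synonymous codon), and the CAI of a gene is the geometric mean of w over its codons. This is not the code of the server used here. The reference counts are toy numbers invented for illustration, not the human highly expressed set, and the function names and the 0.5 floor applied to codons absent from the reference are assumptions of this sketch.

```python
from collections import defaultdict
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts, floor=0.5):
    """w for each codon: its reference count over the maximum among synonyms.

    Stop codons and the single-codon amino acids Met and Trp are left out.
    Counts missing from ref_counts are set to `floor` (an assumption here).
    """
    synonyms = defaultdict(list)
    for codon, aa in GENETIC_CODE.items():
        if aa not in "*MW":
            synonyms[aa].append(codon)
    w = {}
    for codons in synonyms.values():
        counts = {c: max(ref_counts.get(c, 0), floor) for c in codons}
        top = max(counts.values())
        for codon, n in counts.items():
            w[codon] = n / top
    return w


def cai(cds, w):
    """Geometric mean of w over the codons of a coding sequence."""
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    logs = [log(w[seq[i:i + 3]]) for i in range(0, usable, 3) if seq[i:i + 3] in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))


# Toy reference counts for alanine and lysine codons only (hypothetical values).
toy_reference = {"GCC": 40, "GCT": 20, "GCA": 10, "GCG": 10, "AAG": 30, "AAA": 10}
w = relative_adaptiveness(toy_reference)
print(round(cai("GCCAAG", w), 3), round(cai("GCAAAA", w), 3))
```

In the toy reference, a sequence built from the preferred codons GCC and AAG scores higher than one built from the rarer GCA and AAA, which is the sense in which a GC-rich reference set rewards GC-rich coding sequences.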
Because codons differ in the number of G and C nucleotides they contain, amino acid content can be biased by GC content. To determine the influence of GC content on amino acid choice, I counted the amino acids glycine, alanine, arginine, and proline (GARP), whose codons are GC-rich. The GARP contents in this set of viruses show a Pearson correlation coefficient of 0.959 (p < 0.01) with GC content (Fig. 2b). This indicates that amino acid contents in the viral proteins are determined mainly by the GC contents of their genomes.
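The two quantities used in this step, the GARP percentage of a protein and a Pearson correlation coefficient, can be sketched as follows. The function names are hypothetical, and because per-virus GARP values are not listed in Table 1, the correlation is demonstrated on GC content versus CAI for a few rows of that table instead. The coefficient from such a subset should not be expected to equal the value reported for the full data set.

```python
from math import sqrt


def garp_percent(protein):
    """Percentage of Gly, Ala, Arg and Pro residues (one-letter codes)."""
    protein = protein.upper()
    return 100.0 * sum(protein.count(aa) for aa in "GARP") / len(protein)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# (GC %, CAI) pairs copied from Table 1 for six viruses; illustrative subset only.
rows = [
    (69.59, 0.612),  # rubella virus
    (57.88, 0.477),  # hepatitis C virus
    (46.18, 0.362),  # astrovirus
    (41.02, 0.325),  # SARS coronavirus
    (36.52, 0.288),  # parainfluenzavirus 3
    (35.32, 0.296),  # respiratory syncytial virus
]
gc_values = [gc for gc, _ in rows]
cai_values = [c for _, c in rows]
print(f"r = {pearson(gc_values, cai_values):.3f}")
print(f"GARP = {garp_percent('MGARPKL'):.1f}%")
```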
Fig. 2

Dot plots show the relationships between genome GC contents and CAIs (a), and between GC contents and %GARP (b) in all analyzed viruses. Solid circles represent positive-stranded RNA viruses, open circles represent negative-stranded viruses, and open triangles are retroviruses.

I further analyzed the nucleotide counts of the coding sequences of these viruses. Most positive-stranded viruses, the HIVs, and all negative-stranded viruses have high A and low C (in the positive strands), although the positive-stranded viruses show a relatively lower level of bias. Some of the positive-stranded viruses and the HTLVs, on the other hand, have low A and high C (Fig. 3). The reason for these two opposite patterns of bias is not clear. These patterns of nucleotide bias were similar when the first, second, and third codon positions were analyzed separately (data not shown). This suggests that selection pressure on codon preference is not likely to be the cause of the nucleotide bias. Because a similar pattern (high A and low C) was observed in both positive- and negative-stranded viruses on the same plus strand, i.e. the genome of positive-stranded viruses and the anti-genome of negative-stranded viruses, the mechanism underlying the bias may be similar and act in a strand-specific manner. Because copying of both strands uses the same viral RNA polymerase and takes place in a similar intracellular environment, intrinsic copying error of the enzyme is unlikely to cause the strand-specific nucleotide bias.
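The position-by-position check mentioned above (first, second, and third codon positions analyzed separately) amounts to tabulating nucleotide frequencies per codon position on the coding strand. A minimal Python sketch follows; the function name and the short example sequence are hypothetical and are not taken from the analyzed data.

```python
from collections import Counter


def position_frequencies(cds):
    """Percent A, C, G, T at codon positions 1, 2 and 3 of a coding sequence.

    RNA input is accepted (U is read as T). Trailing bases that do not form
    a complete codon are ignored, and only A/C/G/T contribute to the totals.
    """
    seq = cds.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence shorter than one codon")
    tables = []
    for pos in range(3):
        counts = Counter(seq[pos:usable:3])
        total = sum(counts[b] for b in "ACGT")
        tables.append({b: 100.0 * counts[b] / total for b in "ACGT"})
    return tables


# Made-up nine-base sequence, used only to show the output layout.
for i, freqs in enumerate(position_frequencies("AUGAAACCC"), start=1):
    print(i, {base: round(value, 1) for base, value in freqs.items()})
```

If a bias such as high A and low C appears at all three positions, including the first and second where most changes alter the amino acid, it is hard to attribute to preference among synonymous codons alone, which is the reasoning used in the text.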
Fig. 3

Nucleotide frequencies of coding sequences in positive-stranded viruses (upper), retroviruses (middle), and negative-stranded viruses (lower).

Recently, a mechanism responsible for G-to-A hypermutation in HIV-1 by a host innate defense has been discovered (Mangeat et al., 2003; Shindo et al., 2003). Other types of RNA editing, some of which target double-stranded RNA, have also been reported in some RNA viruses (Galinski et al., 1992; Polson et al., 1996). If host responses are also responsible for mutational bias in other RNA viruses, it is possible that they are less effective against positive-stranded RNA genomes, which might be recognized as self mRNA. It is also possible that the minus-strand RNA is the main target of host RNA-editing mechanisms. This would explain the strand-specific pattern of nucleotide bias. It might also explain the difference in nucleotide bias between viruses with different genome polarity, because positive-stranded viruses produce only a limited amount of minus-strand anti-genome, which may be well protected in their replication complex. Negative-stranded viruses, on the other hand, must produce large amounts of minus-strand RNA. While this explanation awaits further exploration, my analysis gives an initial clue to an interaction between the strategy of genome replication (genome polarity) and mutational bias (GC content) in RNA viruses.
References (first 10 of 20 listed)

1.  Codon and amino acid usage in retroviral genomes is consistent with virus-specific nucleotide pressure.

Authors:  Ben Berkhout; Andrei Grigoriev; Margreet Bakker; Vladimir V Lukashov
Journal:  AIDS Res Hum Retroviruses       Date:  2002-01-20       Impact factor: 2.205

2.  Codon usage between genomes is constrained by genome-wide mutational processes.

Authors:  Swaine L Chen; William Lee; Alison K Hottes; Lucy Shapiro; Harley H McAdams
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-27       Impact factor: 11.205

3.  Evident diversity of codon usage patterns of human genes with respect to chromosome banding patterns and chromosome numbers; relation between nucleotide sequence data and cytogenetic data.

Authors:  T Ikemura; K Wada
Journal:  Nucleic Acids Res       Date:  1991-08-25       Impact factor: 16.971

4.  G-->A hypermutation of the human immunodeficiency virus type 1 genome: evidence for dCTP pool imbalance during reverse transcription.

Authors:  J P Vartanian; A Meyerhans; M Sala; S Wain-Hobson
Journal:  Proc Natl Acad Sci U S A       Date:  1994-04-12       Impact factor: 11.205

5.  Diversity in G + C content at the third position of codons in vertebrate genes and its cause.

Authors:  S Aota; T Ikemura
Journal:  Nucleic Acids Res       Date:  1986-08-26       Impact factor: 16.971

6.  The codon Adaptation Index--a measure of directional synonymous codon usage bias, and its potential applications.

Authors:  P M Sharp; W H Li
Journal:  Nucleic Acids Res       Date:  1987-02-11       Impact factor: 16.971

7.  Directional mutation pressure, mutator mutations, and dynamics of molecular evolution.

Authors:  N Sueoka
Journal:  J Mol Evol       Date:  1993-08       Impact factor: 2.395

8.  Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis.

Authors:  S Kanaya; Y Yamada; M Kinouchi; Y Kudo; T Ikemura
Journal:  J Mol Evol       Date:  2001 Oct-Nov       Impact factor: 2.395

9.  The enzymatic activity of CEM15/Apobec-3G is essential for the regulation of the infectivity of HIV-1 virion but not a sole determinant of its antiviral activity.

Authors:  Keisuke Shindo; Akifumi Takaori-Kondo; Masayuki Kobayashi; Aierken Abudu; Keiko Fukunaga; Takashi Uchiyama
Journal:  J Biol Chem       Date:  2003-09-11       Impact factor: 5.157

10.  Biased hypermutation and other genetic changes in defective measles viruses in human brain infections.

Authors:  R Cattaneo; A Schmid; D Eschle; K Baczko; V ter Meulen; M A Billeter
Journal:  Cell       Date:  1988-10-21       Impact factor: 41.582
