vhcub: Virus-host codon usage co-adaptation analysis.

Ali Mostafa Anwar¹, Mohamed Soudy², Radwa Mohamed¹.

Abstract

Viruses show noticeable evolution to adapt and reproduce within their hosts. Theoretically, the patterns and factors that affect the codon usage of viruses should reflect evolutionary changes that allow them to optimize their codon usage to their hosts. Some software tools can analyze the codon usage of organisms; however, there is room for improvement, as these tools do not focus on examining codon usage co-adaptation between viruses and their hosts. This paper describes the vhcub R package, a tool for analyzing the co-adaptation of codon usage between a virus and its host, which implements several indices and plots. The tool is available from: https://cran.r-project.org/web/packages/vhcub/.

Keywords: Adaptation; Codon Usage Bias; Evolution; Natural selection; R; RStudio; Viruses

Substances: Codon

Year: 2019 PMID: 32274012 PMCID: PMC7104870 DOI: 10.12688/f1000research.21763.1

Source DB: PubMed Journal: F1000Res ISSN: 2046-1402

Introduction

During the translation of mRNAs into proteins, information is read in nucleotide triplets, called codons, which encode amino acids. Multiple codons that encode the same amino acid are known as synonymous codons. Studies of different organisms report that synonymous codons are not used uniformly within and between the genes of a genome, a phenomenon known as codon usage bias (CUB) [1, 2]. Because viruses rely on the tRNA pool of their hosts for translation, previous studies suggest that translational selection and/or directional mutational pressure act on the codon usage of the viral genome to optimize or deoptimize it towards the codon usage of the host [3, 4]. Tools and packages are available to analyze codon usage, e.g. coRdon [5], but no package focuses on examining codon usage co-adaptation between viruses and their hosts. vhcub is an R package that aims to make the analysis of codon usage co-adaptation between a virus and its host straightforward. vhcub computes several codon usage bias indices, such as the effective number of codons (ENc) [6], codon adaptation index (CAI) [7], relative codon deoptimization index (RCDI) [8], similarity index (SiD) [9], synonymous codon usage orderliness (SCUO) [10], and relative synonymous codon usage (RSCU) [10]. It also provides a statistical measure of dinucleotide over- and under-representation under three different randomization models.
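
RSCU compares each codon's observed count with the count expected if all synonymous codons of its amino acid were used equally: RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij), where X_ij is the count of codon j for amino acid i and n_i is the number of synonymous codons for that amino acid. The Python sketch below is an independent illustration of this formula, not vhcub's R code. It assumes the standard genetic code (NCBI table 1), reads codons in frame from the first base, and skips stop codons, Met and Trp.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codon_counts(seq):
    """Count in-frame codons made only of A, C, G and T."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(seq):
    """RSCU_ij = X_ij / ((1 / n_i) * sum_j X_ij) for each sense codon j of amino acid i."""
    counts = codon_counts(seq)
    synonyms = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # skip stop codons and single-codon amino acids
            synonyms[aa].append(codon)
    values = {}
    for codons in synonyms.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Amino acids absent from the sequence get no defined RSCU (NaN).
            values[c] = counts[c] * len(codons) / total if total else float("nan")
    return values


if __name__ == "__main__":
    values = rscu("GCTGCTGCCGCA")  # Ala codons only: GCT x2, GCC x1, GCA x1
    print(values["GCT"], values["GCC"], values["GCG"])  # 2.0 1.0 0.0
```

An RSCU of 1 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1 indicate preference and values below 1 indicate avoidance.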

Methods

Implementation

vhcub imports Biostrings [11], seqinr [12] and stringr [13] to handle fasta files and manipulate DNA sequences. It also imports coRdon [5], which is used to estimate several CUB measures. vhcub first converts the fasta input into data.frame objects, so that the indices implemented in the package can be stored and calculated efficiently. Table 1 describes all the functions available in vhcub and the result returned by each, and gives references to the equations used to estimate each measure. Furthermore, vhcub uses ggplot2 [14] to draw two important plots, the ENc-GC3 plot (Figure 2) and the PR2-plot (Figure 3), which help to explain the factors that shape the evolution of a virus's CUB.
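
The first step, turning fasta records into a table with one row per sequence, can be sketched in a few lines of Python. This is an illustrative stand-in for fasta.read, not its implementation; the row layout (a name taken from the first word of the header line plus an upper-cased sequence) is an assumption made for this sketch.

```python
import io


def read_fasta(handle):
    """Parse fasta text into rows of {'name': ..., 'sequence': ...}.

    Each row plays the role of one row of a virus or host table; the name
    is the first whitespace-delimited token of the header line.
    """
    rows, name, chunks = [], None, []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:  # flush the previous record
                rows.append({"name": name, "sequence": "".join(chunks).upper()})
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:  # sequence lines may be wrapped over several lines
            chunks.append(line)
    if name is not None:  # flush the last record
        rows.append({"name": name, "sequence": "".join(chunks).upper()})
    return rows


if __name__ == "__main__":
    text = ">gene1 hypothetical protein\nATGAAA\nTAA\n>gene2\natgccc\n"
    for row in read_fasta(io.StringIO(text)):
        print(row["name"], row["sequence"])
```

The NCBI CDS files listed under Data availability are gzip-compressed; a text handle from gzip.open(path, "rt") can be passed to read_fasta directly.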

Table 1.

Functions available in vhcub and the result returned by each.

Function name	Description	Value
fasta.read	Read fasta format and convert it to a data frame.	A list with two data.frames: the first for the virus DNA sequences and the second for the host.
CAI.values	Measure the Codon Adaptation Index (CAI) of DNA sequences using the equation of Sharp and Li (1987) [7].	A data.frame containing the computed CAI values for each DNA sequence within df.fasta.
dinuc.base	A measure of statistical dinucleotide over- and under-representation, generating random sequences by shuffling (with/without replacement) all bases in the sequence [13].	A data.frame containing the computed statistic for each dinucleotide in all DNA sequences within df.virus.
dinuc.codon	A measure of statistical dinucleotide over- and under-representation, generating random sequences by shuffling (with/without replacement) codons [13].	A data.frame containing the computed statistic for each dinucleotide in all DNA sequences within df.virus.
dinuc.syncodon	A measure of statistical dinucleotide over- and under-representation, generating random sequences by shuffling (with/without replacement) synonymous codons [13].	A data.frame containing the computed statistic for each dinucleotide in all DNA sequences within df.virus.
ENc.values	Measure the Effective Number of Codons (ENc) of DNA sequences, using its modified version (Novembre, 2002) [6].	A data.frame containing the computed ENc values for each DNA sequence within df.fasta.
GC.content	Calculate overall GC content as well as GC content at the first, second, and third codon positions.	A data.frame with overall GC content as well as GC content at the first, second, and third codon positions of all DNA sequences from df.virus.
RCDI.values	Measure the Relative Codon Deoptimization Index (RCDI) [8] of DNA sequences.	A data.frame containing the computed RCDI values for each DNA sequence within df.fasta.
RSCU.values	Measure the Relative Synonymous Codon Usage (RSCU) [7] of DNA sequences.	A data.frame containing the computed RSCU values for each codon in each DNA sequence within df.fasta.
SCUO.values	Measure the Synonymous Codon Usage Orderliness (SCUO) of DNA sequences using the equation of Wan et al., 2004 [10].	A data.frame containing the computed SCUO values for each DNA sequence within df.fasta.
SiD.value	Measure the Similarity Index (SiD) between the codon usage of a virus and that of its host [15].	A numeric value representing SiD.
PR2.plot	Make a Parity Rule 2 (PR2) plot [16], where the AT-bias [A3/(A3 + T3)] at the third codon position of the four-codon amino acids of entire genes is the ordinate and the GC-bias [G3/(G3 + C3)] is the abscissa. The centre of the plot, where both coordinates are 0.5, is where A = T and G = C (PR2), indicating no bias attributable to either mutation or selection.	A ggplot object.
ENc.GC3plot	Make an ENc-GC3 scatterplot [17], where the y-axis represents the ENc values and the x-axis represents the GC3 content. The red fitted line shows the expected ENc values when codon usage bias is affected solely by GC3.	A ggplot object.
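
The idea behind dinuc.base can be illustrated by comparing observed dinucleotide counts with the counts in randomly shuffled copies of the same sequence. The Python sketch below reports a z-score, (observed count - mean shuffled count) / standard deviation of shuffled counts; that statistic, the number of permutations and the fixed seed are assumptions for illustration, and vhcub's exact statistic may differ. Only the base-shuffling model is shown; the codon and synonymous-codon models would shuffle larger units instead.

```python
import random
from itertools import product
from statistics import mean, pstdev

# All 16 dinucleotides over the DNA alphabet.
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def dinuc_counts(seq):
    """Count overlapping dinucleotides, e.g. 'ACG' contributes AC and CG."""
    counts = dict.fromkeys(DINUCLEOTIDES, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing N or other symbols
            counts[pair] += 1
    return counts


def dinuc_zscores(seq, n_perm=500, replace=False, seed=1):
    """Z-score of each observed dinucleotide count against base-shuffled sequences.

    replace=False permutes the bases, keeping the base composition exactly;
    replace=True draws bases with replacement from the same sequence.
    Returns {dinucleotide: (observed_count, z)}; z is NaN if the null has no variance.
    """
    seq = seq.upper()
    rng = random.Random(seed)  # fixed seed for reproducible results
    bases = list(seq)
    null = {d: [] for d in DINUCLEOTIDES}
    for _ in range(n_perm):
        if replace:
            shuffled = rng.choices(bases, k=len(bases))
        else:
            shuffled = bases[:]
            rng.shuffle(shuffled)
        for d, c in dinuc_counts("".join(shuffled)).items():
            null[d].append(c)
    observed = dinuc_counts(seq)
    result = {}
    for d in DINUCLEOTIDES:
        sd = pstdev(null[d])
        z = (observed[d] - mean(null[d])) / sd if sd > 0 else float("nan")
        result[d] = (observed[d], z)
    return result


if __name__ == "__main__":
    scores = dinuc_zscores("ACGT" * 25, n_perm=200)
    print(scores["CG"][0])  # 25 overlapping CG pairs in the observed sequence
```

A strongly positive z suggests the dinucleotide is over-represented relative to what base composition alone predicts, and a strongly negative z suggests under-representation.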

Figure 2.

ENc-GC3 plot showing ENc values versus GC3 content for the coding sequences (CDS) of Escherichia virus T4. The solid red line represents the expected ENc values if codon bias is affected by GC3s only.
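
The expected curve in ENc-GC3 plots is commonly drawn with Wright's (1990) formula, ENc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is the GC3s value. We assume here that the red line follows this standard expectation; the Python sketch below only evaluates the formula and is not taken from the package.

```python
def expected_enc(gc3s):
    """Expected ENc when codon bias is driven only by GC3s (Wright, 1990)."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("GC3s must be a fraction between 0 and 1")
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


if __name__ == "__main__":
    # Points for drawing the expected curve across the full GC3s range.
    curve = [(s / 10, round(expected_enc(s / 10), 2)) for s in range(11)]
    print(curve[0], curve[5], curve[10])  # (0.0, 31.0) (0.5, 60.5) (1.0, 32.0)
```

Genes lying well below the curve are commonly interpreted as having stronger codon bias than GC3s composition alone would predict, pointing to additional influences such as selection.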

Figure 3.

PR2-plot of the CDS of Escherichia virus T4, plotted by their GC bias [G3/(G3 + C3)] and AT bias [A3/(A3 + T3)] at the third codon position. The two solid red lines mark where each coordinate (ordinate and abscissa) equals 0.5, i.e. where A = T and G = C.
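
The PR2 coordinates described in Table 1 can be computed from third-position base counts. The Python sketch below counts A3, T3, G3 and C3 only in the eight fourfold-degenerate codon boxes (Ala GCN, Arg CGN, Gly GGN, Leu CTN, Pro CCN, Ser TCN, Thr ACN, Val GTN); treating those boxes as the "four-codon amino acids" is our assumption, and the function is not vhcub's PR2.plot.

```python
from collections import Counter

# First two bases of the eight fourfold-degenerate codon boxes:
# Ala GCN, Arg CGN, Gly GGN, Leu CTN, Pro CCN, Ser TCN, Thr ACN, Val GTN.
FOURFOLD_BOXES = {"GC", "CG", "GG", "CT", "CC", "TC", "AC", "GT"}


def pr2_bias(seq):
    """Return (GC-bias, AT-bias) = (G3/(G3+C3), A3/(A3+T3)) at fourfold sites.

    The first value is the PR2-plot abscissa and the second its ordinate;
    NaN is returned for a ratio whose denominator is zero.
    """
    seq = seq.upper()
    third = Counter(
        seq[i + 2]
        for i in range(0, len(seq) - 2, 3)
        if seq[i:i + 2] in FOURFOLD_BOXES
    )
    g_c = third["G"] + third["C"]
    a_t = third["A"] + third["T"]
    gc_bias = third["G"] / g_c if g_c else float("nan")
    at_bias = third["A"] / a_t if a_t else float("nan")
    return gc_bias, at_bias


if __name__ == "__main__":
    # Third positions at fourfold sites: A, G, T, C, so both biases are 0.5.
    print(pr2_bias("GCAGCGGCTGGC"))  # (0.5, 0.5)
```

Plotting one (GC-bias, AT-bias) point per gene gives the scatter in the PR2-plot; points far from (0.5, 0.5) indicate a departure from parity at the third position.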

Operation

vhcub was developed in R and is available on CRAN. It is compatible with Windows and major Linux operating systems. The package can be installed from CRAN.

Figure 1 describes the vhcub workflow. It starts by reading the fasta files for a virus and its host. After that, the three main analyses (red boxes in Figure 1), namely nucleotide content analysis and codon usage bias analysis at the gene and codon levels, can each be applied independently, and each comprises several measures (blue boxes in Figure 1). Within the same analysis, however, some measures rely on others. For example, the reference set of genes used to estimate the codon adaptation index of a virus is defined based on the effective number of codons of its host's genes. Finally, the orange boxes in Figure 1 represent the two plots (ENc-GC3 plot and PR2-plot).
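
The dependency described above, where host ENc values define the reference set for CAI, can be sketched as follows. In this Python illustration the host genes with the lowest ENc (the most biased) form the reference set, the relative adaptiveness w of a codon is its reference count divided by that of the most-used synonymous codon (codons absent from the reference set get 0.5, a common convention), and CAI is the geometric mean of w over the gene. The helper names and the 10% reference fraction are hypothetical, and the ENc values are taken as given, for example from a separate ENc calculation.

```python
import math
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons_of(seq):
    """Yield in-frame codons that belong to the genetic code table."""
    seq = seq.upper()
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in CODON_TABLE:
            yield seq[i:i + 3]


def select_reference(host_genes, enc_values, fraction=0.1):
    """Hypothetical helper: the lowest-ENc (most biased) host genes form the reference set."""
    ranked = sorted(host_genes, key=lambda name: enc_values[name])
    k = max(1, int(len(ranked) * fraction))
    return [host_genes[name] for name in ranked[:k]]


def relative_adaptiveness(reference_seqs):
    """w = count / count of the most used synonymous codon in the reference set."""
    counts = Counter(c for s in reference_seqs for c in codons_of(s))
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":  # stops and single-codon amino acids carry no information
            families[aa].append(codon)
    w = {}
    for codons in families.values():
        best = max(counts[c] for c in codons)
        if best == 0:
            continue  # amino acid absent from the reference set: leave it out
        for c in codons:
            w[c] = (counts[c] or 0.5) / best  # 0.5 stands in for unseen codons
    return w


def cai(seq, w):
    """Geometric mean of w over the codons of seq that have a w value."""
    logs = [math.log(w[c]) for c in codons_of(seq) if c in w]
    return math.exp(sum(logs) / len(logs)) if logs else float("nan")


if __name__ == "__main__":
    host = {"g1": "GCTGCTGCC", "g2": "GCAGCGGCA", "g3": "GCCGCCGCC"}
    enc = {"g1": 30.0, "g2": 55.0, "g3": 40.0}  # made-up ENc values for the demo
    reference = select_reference(host, enc, fraction=0.34)  # keeps only g1
    w = relative_adaptiveness(reference)
    print(round(cai("GCTGCC", w), 4))  # sqrt(1.0 * 0.5) = 0.7071
```

Computing w from the host reference set and applying it to each viral gene gives a CAI that measures how well the virus matches the host's most biased genes.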

Figure 1.

vhcub workflow, to analyze virus-host codon usage co-adaptation.

The white boxes represent the input fasta files. The red boxes represent the three main analyses, each with different measures (the blue boxes), and the orange boxes represent the ENc-GC3 plot and PR2-plot.

Use cases

Using vhcub to study the CUB of a virus, of its host, and of the co-adaptation between them is straightforward. As an example, we used the coding sequences of Escherichia virus T4 and its host Escherichia coli in fasta format. As mentioned before, each category of analysis can be applied independently; hence, this example shows only the codon usage bias analysis at the codon level. SiD measures the effect of the codon usage bias of E. coli on that of Escherichia virus T4. In general, SiD ranges from 0 to 1, with higher values indicating that the host has a dominant effect on the usage of codons. In this example, SiD is approximately 0.491, which means that E. coli does not dominate the CUB of Escherichia virus T4. The analysis also generates RSCU values for each codon in each gene of both organisms, which can be used for further analysis.
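
SiD is usually computed from the RSCU vectors of host and virus: R is the cosine of the angle between the two vectors and SiD = (1 - R) / 2. The Python sketch below implements that definition on RSCU dictionaries (for example, per-codon averages over all genes of each organism, which is an assumption); it is not vhcub's SiD.value, and the package may aggregate RSCU differently.

```python
import math


def sid(host_rscu, virus_rscu):
    """Similarity index SiD = (1 - R) / 2, with R the cosine similarity of RSCU vectors.

    Codons missing from either dict, or with NaN RSCU, are left out.
    """
    codons = sorted(
        c for c in set(host_rscu) & set(virus_rscu)
        if not (math.isnan(host_rscu[c]) or math.isnan(virus_rscu[c]))
    )
    a = [host_rscu[c] for c in codons]
    b = [virus_rscu[c] for c in codons]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if norm == 0:
        raise ValueError("RSCU vectors must contain at least one non-zero value")
    return (1 - dot / norm) / 2


if __name__ == "__main__":
    host = {"GCT": 2.0, "GCC": 1.0, "GCA": 0.5, "GCG": 0.5}
    virus = {"GCT": 0.5, "GCC": 0.5, "GCA": 2.0, "GCG": 1.0}
    print(round(sid(host, virus), 3))  # 0.227
```

Identical RSCU profiles give 0. Because RSCU values are non-negative, R cannot be negative, so this particular formula stays between 0 and 0.5.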

Conclusions

vhcub depends only on DNA sequences as input and can compute different measures of CUB for viruses, such as ENc, CAI, SCUO, and RCDI (Table 1). It can also be used to study the relationship between viruses and their hosts through RSCU and SiD. There are many possible directions for future work; future versions will implement more indices, plots, and statistical analyses to facilitate the workflow for examining the adaptation of viral CUB in the R environment.
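
RCDI compares the relative frequency of each codon within its synonymous family in the virus (CiFa) with that in the host (CiFh), weighted by how often the virus uses the codon: RCDI = sum_i (CiFa / CiFh) * (Ni / N). A value of 1 means the virus follows the host's codon usage, and larger values indicate greater deoptimization. The Python sketch below assumes the standard genetic code, excludes stop, Met and Trp codons, and takes N as the number of counted codons; it illustrates the formula cited as [8] and is not vhcub's RCDI.values.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI table 1) in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_families():
    """Group the 59 informative codons by amino acid (no stops, Met or Trp)."""
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    return families


def codon_counts(seqs):
    """Count in-frame codons over one or more coding sequences."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        counts.update(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return counts


def synonymous_fractions(counts):
    """Frequency of each codon within its synonymous family (CiF)."""
    frac = {}
    for codons in synonymous_families().values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            frac[c] = counts[c] / total if total else 0.0
    return frac


def rcdi(virus_seq, host_seqs):
    """RCDI = sum_i (CiFa / CiFh) * Ni / N over informative codons used by the virus."""
    virus_counts = codon_counts([virus_seq])
    fa = synonymous_fractions(virus_counts)
    fh = synonymous_fractions(codon_counts(host_seqs))
    used = [c for c in fa if virus_counts[c] > 0]
    n = sum(virus_counts[c] for c in used)
    if n == 0:
        return float("nan")
    total = 0.0
    for c in used:
        if fh[c] == 0:
            raise ValueError(f"codon {c} is never used by the host")
        total += (fa[c] / fh[c]) * virus_counts[c]
    return total / n


if __name__ == "__main__":
    print(rcdi("GCTGCT", ["GCTGCC"]))  # 2.0: the virus overuses GCT relative to the host
```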

Data availability

Escherichia virus T4 fasta file: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/836/945/GCF_000836945.1_ViralProj14044
Escherichia coli fasta file: ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_cds_from_genomic.fna.gz

Software availability

Software available from: https://CRAN.R-project.org/package=vhcub
Source code available from: https://github.com/AliYoussef96/vhcub
Archived source code as at time of publication: http://doi.org/10.5281/zenodo.3572391 [18]
License: GPL-3

Reviewer comments

From the technical point of view, the vhcub R package looks quite reliable and well supported. However, there is only one example illustrating it. Moreover, this example shows a SiD value in the middle of its range (i.e., 0.491, or approximately 0.5), which is fine but makes us wonder whether this package would give the expected values in other cases. If that is possible, could the authors include a couple of examples where the SiD value is below 0.5 and above 0.5? The tool is very interesting and captivating. The advantages that R offers are many, so I consider that it would be invaluable to exploit the output that R offers. The article makes clear that the algorithm only accepts DNA sequences in fasta format. Although there are other very simple tools for converting RNA to DNA, or negative-sense RNA to positive-sense RNA and DNA, it would be very useful to carry out this step within the same R package, especially considering that biological studies usually analyze many sequences rather than one. Likewise, I consider that the figures presented in the article should be discussed more from a biological point of view; they could be more informative.

We confirm that we have read this submission and believe that we have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

Viruses, in the course of their evolution, would optimize their codon usage to their hosts, as they rely on the tRNA pool of their hosts in the translation process. Though tools for analyzing the codon usage of organisms are available, none of them focus on examining the codon usage co-adaptation between viruses and their hosts. This software, vhcub, is a tool used to analyze the co-adaptation of codon usage between a virus and its host. It may also help to predict the possible mutations that would accumulate in the virus vis-à-vis its host(s), thereby aiding preparedness for the control and prevention of the disease.

General comments
1. Corrections in the text
The spelling of "formate" may be corrected to "format" in the second column, first row of Table 1.
Please define df.fasta.
The third column, sixth row and the third column, eighth row are one and the same; please explain or correct.

Specific comments
Can it be used for eukaryotes?

I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

General comments (Corrections in the text)
Comment: The spelling of formate may be corrected to format in the second column X first row of Table 1.
Response: I will make this correction during the article revisions.
Comment: Please define df.fasta.
Response: (df.host) as well as (df.virus) are just variable names for data frames holding the host genes and virus genes, respectively. The definition will be added during the article revisions.
Comment: In Third column X sixth row and Third column X eighth row are one and the same - please explain or correct.
Response: In the third column X eighth row, it will be corrected from ENc to RCDI.

Specific comments
Comment: Can it be used for eukaryotes?
Response: In vhcub, the translation codon table (genetic code table) number can be changed to any table number as defined by NCBI (https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi). For example, in the function CAI.values(), one can pass the argument genetic.code="11" for the bacterial codon table or genetic.code="1" for eukaryotes. Hence, yes, the host can be prokaryotic or eukaryotic (vhcub can be used for eukaryotes).

The authors have agreed to make the necessary changes in the revised version, and they have answered my query.

References

1. Accounting for background nucleotide composition when measuring codon usage bias.

Authors: John A Novembre
Journal: Mol Biol Evol Date: 2002-08 Impact factor: 16.240

2. Modulation of poliovirus replicative fitness in HeLa cells by deoptimization of synonymous codon usage in the capsid region.

Authors: Cara Carthel Burns; Jing Shaw; Ray Campagnoli; Jaume Jorba; Annelet Vincent; Jacqueline Quay; Olen Kew
Journal: J Virol Date: 2006-04 Impact factor: 5.103

3. RCDI/eRCDI: a web-server to estimate codon usage deoptimization.

Authors: Pere Puigbò; Lluís Aragonès; Santiago Garcia-Vallvé
Journal: BMC Res Notes Date: 2010-03-31

4. Codon influence on protein expression in E. coli correlates with mRNA levels.

Authors: Reka Letso; Helen Neely; W Nicholson Price; Grégory Boël; Kam-Ho Wong; Min Su; Jon Luff; Mayank Valecha; John K Everett; Thomas B Acton; Rong Xiao; Gaetano T Montelione; Daniel P Aalberts; John F Hunt
Journal: Nature Date: 2016-01-13 Impact factor: 49.962

5. Evolution of codon usage in Zika virus genomes is host and vector specific.

Authors: Azeem Mehmood Butt; Izza Nasrullah; Raheel Qamar; Yigang Tong
Journal: Emerg Microbes Infect Date: 2016-10-12 Impact factor: 7.163

6. Analysis of Synonymous Codon Usage Bias in Potato Virus M and Its Adaption to Hosts.

Authors: Zhen He; Haifeng Gan; Xinyan Liang
Journal: Viruses Date: 2019-08-14 Impact factor: 5.048

7. Comparative analysis of codon usage bias and codon context patterns between dipteran and hymenopteran sequenced genomes.

Authors: Susanta K Behura; David W Severson
Journal: PLoS One Date: 2012-08-17 Impact factor: 3.240

8. Quantitative relationship between synonymous codon usage bias and GC composition across unicellular genomes.

Authors: Xiu-Feng Wan; Dong Xu; Andris Kleinhofs; Jizhong Zhou
Journal: BMC Evol Biol Date: 2004-06-28 Impact factor: 3.260

9. CRPV genomes with synonymous codon optimizations in the CRPV E7 gene show phenotypic differences in growth and altered immunity upon E7 vaccination.

Authors: Nancy M Cladel; Jiafen Hu; Karla K Balogh; Neil D Christensen
Journal: PLoS One Date: 2008-08-13 Impact factor: 3.240

10. Comparative Analysis of Codon Usage Bias Patterns in Microsporidian Genomes.

Authors: Heng Xiang; Ruizhi Zhang; Robert R Butler; Tie Liu; Li Zhang; Jean-François Pombert; Zeyang Zhou
Journal: PLoS One Date: 2015-06-09 Impact factor: 3.240
