
Evolution and host adaptability of plant RNA viruses: Research insights on compositional biases.

Zhen He¹,², Lang Qin¹, Xiaowei Xu¹, Shiwen Ding¹.

Abstract

During recent decades, many emerging or re-emerging RNA viruses have been found in plants through the development of deep-sequencing technology and big data analysis. These findings have largely changed our understanding of the origin, evolution and host range of plant RNA viruses. There is evidence that the genetic composition of virus and host populations plays a key role in the evolution and host adaptability of plant RNA viruses. In this mini-review, we describe the state of our understanding of the evolution of plant RNA viruses in view of compositional biases and explore how they adapt to the host. Adenine-rich (A-rich) coding sequences, low CpG and UpA dinucleotide frequencies and low codon usage bias have been found in the vast majority of plant RNA viruses. The codon usage pattern of plant RNA viruses is influenced by both natural selection and mutation pressure, with natural selection, mostly from hosts, as the dominant factor. Codon adaptation analyses suggest that plant RNA viruses have probably evolved a dynamic balance between codon adaptation and deoptimization to maintain efficient replication cycles in multiple hosts with various codon usage patterns. In the future, additional combinations of computational and experimental analyses of the nucleotide composition and codon usage of plant RNA viruses are needed.

Entities: Chemical

Keywords: Codon usage; Compositional biases; Host adaptation; Plant RNA viruses

Year: 2022 PMID: 35685354 PMCID: PMC9160401 DOI: 10.1016/j.csbj.2022.05.021

Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155

Introduction

In the last two decades, viromics (or viral metagenomics) has led to the discovery of many new RNA viruses in animals and plants through the development of deep-sequencing technology and big data analysis [1], [2]. These findings have largely changed our understanding of the origin, evolution and host range of plant RNA viruses. In general, several common forces drive the evolution of plant RNA viruses, including high mutation rates, strong purifying selection, genetic drift, and evolutionary arms races with infected hosts [1], [3], [4], [5], [6], [7], [8], [9], [10], [11], [12]. As for animal RNA viruses, three hypotheses have been proposed for the evolutionary history of plant RNA viruses: horizontal gene transfer from the host genome, coevolution or codivergence with hosts, and parallel evolution with related genetic elements [1], [13]. Plant hosts have therefore had a significant influence on the evolutionary history and trends of RNA viruses. In fact, the recent frequent emergence or re-emergence of new viral diseases is driven by adaptive evolution in response to new ecological conditions, especially hosts [9], [12], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24]. In agriculture, several well-studied emerging plant RNA virus diseases have attracted much attention because of the economic damage they cause to crop hosts, such as those caused by rice yellow mottle virus [23], [25], [26] and barley yellow dwarf virus [27]. During the process of emergence, the well-established original host species can be considered reservoir hosts. Elena et al. (2011, 2014) [11], [12] described three temporal phases of emergence: a host jump to a new species, or to the same species in a new ecological condition; adaptation to the new host or environment; and epidemic spread in the new host population, usually through adaptation to a new transmission mode or new vector species (Fig. 1).
In summary, four groups of driving forces shape the emergence of viruses: the genetic composition of the virus population, the genetic composition of the host population, the genetic composition of vectors for vectored viruses, and the ecology of viruses and/or host plants (Fig. 1) [12], [22]. Thus, the genetic composition of virus, host and vector populations plays a key role in the evolution and host adaptability of plant RNA viruses.

Fig. 1

Schematic overview on the emergence and host adaptation of plant RNA viruses.

In general, the four nucleotides (A, adenine; C, cytosine; G, guanine; and U, uracil) are not used randomly in the genomes of viruses and the hosts they infect [28], [29], [30], [31], [32], [33] (Fig. 2). This is often facilitated by synonymous codons (codons encoding the same amino acid): 61 triplet codons encode 20 amino acids; for example, Asn, Asp, Cys, Glu, Gln, His, Phe, Tyr, and Lys are encoded by two codons; Ile is encoded by three codons; Ala, Gly, Thr, Pro, and Val are encoded by four codons; and Arg, Leu, and Ser are encoded by six codons. This phenomenon is termed codon degeneracy. Interestingly, synonymous codons are also not used randomly [34], [35], [36], [37], [38], [39], [40] (Fig. 2). In nature, the unequal preference for specific codons over other synonymous codons in various organisms creates a bias in codon usage [41], [42], [43], [44]. Similar to codon usage, codon order is also not random because a ribosome decodes two codons simultaneously during translation [45] (Fig. 2). In 1985, codon pair bias was first described in Escherichia coli [46] and then in bacteria, archaea, and eukaryotes [47]. Dinucleotide biases have been proposed as an explanation for nucleotide and codon preferences [48], [49], [50], [51].
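The degeneracy classes listed above can be checked directly against the standard genetic code. The following is a minimal Python sketch, not part of the original review; the names CODON_TABLE and degeneracy_classes are illustrative assumptions, and the standard (nuclear) genetic code is assumed throughout.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
# ("*" marks a stop codon). Assumption: standard code, not an organelle code.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def degeneracy_classes(table=CODON_TABLE):
    """Map each amino acid to the number of synonymous codons that encode it."""
    return dict(Counter(aa for aa in table.values() if aa != "*"))


classes = degeneracy_classes()
sense_codons = sum(classes.values())  # codons that encode an amino acid

# Group amino acids (one-letter codes) by how many codons encode them.
by_fold = {}
for aa, n in sorted(classes.items()):
    by_fold.setdefault(n, []).append(aa)

for n in sorted(by_fold):
    print(f"{n} codon(s): {' '.join(by_fold[n])}")
print("sense codons:", sense_codons)
```

Besides the two-, three-, four- and six-codon classes named in the text, the sketch also reports Met and Trp, which are each encoded by a single codon and therefore carry no synonymous codon information.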

Fig. 2

Schematic overview on the compositional biases of plant RNA viruses in ideal and natural conditions.

In the past three years, COVID-19, caused by SARS-CoV-2, has rapidly developed into a devastating global pandemic, causing nearly 5 million fatalities and more than 238 million cases, and the daily number of infections is still increasing rapidly [52], [53]. Therefore, the evolution and host adaptation of animal RNA viruses have attracted great attention. Kustin and Stern (2020) described adenine-rich (A-rich) coding sequences in the vast majority of animal RNA viruses and proposed possible reasons such as codon usage bias, weakened RNA secondary structures, and selection for a particular amino acid composition, concluding that similar biases in coding sequence composition across animal RNA viruses are possibly due to host immune pressures [29]. Gaunt and Digard (2021) reviewed compositional biases, mainly in RNA viruses, in terms of their causes, consequences and applications [28]. For plant RNA viruses, several recent studies have reported nucleotide composition, codon usage bias, dinucleotide bias and host or vector adaptation [32], [39], [40], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65]. In this review, we summarize evolution and host adaptation in plant RNA viral genomes by considering compositional biases, and we list widely used software packages for compositional bias analyses. We also discuss future trends under the rapid development of big data and metagenomic analysis.

Nucleotide composition of plant RNA viruses

Nucleotide bias of plant RNA viruses

Ideally, the four bases A, T (U in RNA), C, and G would each occur at a frequency of 25% in an organism’s genome. However, in nature, nucleotide bias is seen across almost all genomes. Adenine-rich (A-rich) coding sequences have been found in the vast majority of animal RNA viruses, accompanied by a strong depletion of C [29]. The highest A content (49%) was found in VPg sequences of Rhinovirus [29]. Consistently, in the family Potyviridae, A-rich composition has been found in all genera, with the highest value (35%) in Arepavirus [66] (Fig. 3A). For single plant virus species, A-rich composition has also been found in all or some coding region sequences of potato virus Y (PVY) [40], citrus tristeza virus (CTV) [64], sugarcane mosaic virus (SCMV) [62], rice black streak dwarf virus (RBSDV) [61], narcissus degeneration virus (NDV), narcissus late season yellows virus (NLSYV) and narcissus yellow stripe virus (NYSV) [128] (Fig. 3). Uracil-rich (U-rich) coding sequences were found in the two open reading frames (ORFs) of broad bean wilt virus 2 (BBWV-2) [60], the cysteine-rich nucleic acid binding protein (NABP) gene of potato virus M (PVM) [63], P8 protein coding sequences of RBSDV [61], coat protein (CP) of CTV [54], cowpea mild mottle virus (CpMMV) [127] and banana bract mosaic virus (BBrMV) [129] (Fig. 3C). Similarly, U3-rich composition (uracil at the third codon position) has been found in most coding sequences of these plant RNA viruses (Fig. 3D), as has U3s-rich composition (uracil at the third position of synonymous codons) (Fig. 3E). Overall, AU-rich coding sequences were found in these plant RNA viruses, and the highest AU content (65.50%) was found in the P8 coding sequences of RBSDV [61] (Fig. 3F). For viruses, the AU- or GC-rich composition tends to correlate with their RSCU patterns [60], [61], [62], [63], [67], [68], [132].
For example, the AU-rich SCMV genome preferentially uses codons ending in A or U [62]. Codon usage bias, weakened RNA secondary structures, and selection for a particular amino acid composition possibly explain the A-rich coding sequences in these plant RNA viruses [29]. However, high G, G3, G3s and GC contents were observed in the CP gene of PVM, reflecting the influence of mutation pressure [63] (Fig. 3). Codon W, MEGA, BioEditor, DnaSP and the CAIcal server can calculate the base composition of plant RNA viruses (Table 1).
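As a rough illustration of the quantities discussed above, the sketch below computes overall base percentages, third-codon-position percentages and AU/GC content for an in-frame coding sequence. It is a simplified sketch, not the procedure of any cited tool: the function name composition and the toy sequence are hypothetical, and the third-position values here cover all codons, whereas the "3s" statistics reported by dedicated packages are restricted to synonymous codons.

```python
def composition(cds):
    """Return base percentages (overall and at third codon positions) for an in-frame CDS."""
    seq = cds.upper().replace("T", "U")  # accept DNA or RNA input
    if not seq or len(seq) % 3:
        raise ValueError("coding sequence length must be a non-zero multiple of 3")
    third = seq[2::3]  # every third base, i.e. codon position 3
    stats = {}
    for base in "AUCG":
        stats[base] = 100.0 * seq.count(base) / len(seq)
        stats[base + "3"] = 100.0 * third.count(base) / len(third)
    stats["AU"] = stats["A"] + stats["U"]
    stats["GC"] = stats["G"] + stats["C"]
    return stats


# Hypothetical toy sequence (7 codons), used only to exercise the function.
toy = "ATGAAAGCATTAGAAACTTAA"
stats = composition(toy)
for key in ("A", "U", "C", "G", "AU", "GC", "A3", "U3"):
    print(f"{key}: {stats[key]:.2f}%")
```

For real analyses the sequences would be read from alignments of complete coding regions, and the packages listed in Table 1 remain the reference implementations.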

Fig. 3

Nucleotide composition of recently reported plant RNA viruses. Data from Biswas et al. (2019), Chakraborty et al. (2015), Cheng et al. (2012), Gómez et al. (2020), He et al. (2019, 2020, 2021, 2022), Prádena et al. (2020), Patil et al. (2017), Yang et al. (2022), Huang et al. (2015).

Table 1

Software for nucleotide composition and codon adaptation analyses.

Name	Description and advantages	Uses	Availability	URL	Reference
Software for nucleotide composition and codon analyses
BioEditor	BioEditor is an application that enables scientists and educators to prepare and present structure annotations containing formatted text, graphics, sequence data, and interactive molecular views.	BioEditor can be used to analyse codon and base composition.	Local installation	https://bioeditor.sdsc.edu	[110]

chips	Nc provides an intuitive and meaningful measure of the degree of codon bias in genes. Low values indicate strong codon bias and high values indicate low bias (probably noncoding regions).	Chips computes Frank Wright's Nc statistic for nucleotide sequences.	Local installation	https://emboss.sourceforge.net/apps/release/6.6/emboss/apps/chips.html	[111]

CodonW	Codon W is a software package for codon usage analysis. It is designed to simplify multivariate analysis (MVA) of codon usage. The MVA method employed in codon W is COA, the most popular MVA method for codon usage analysis. Codon W can generate COAs for codon usage, relative synonymous codon usage, or amino acid usage. Other analyses of codon usage include studies of optimal codons, codon and dinucleotide bias and/or base composition.	Codon W applies correspondence analysis (COA), the most popular MVA method for codon usage analysis. Codon W can generate COA for codon usage, relative synonymous codon usage or amino acid usage analyses.	Local installation	https://sourceforge.net/projects/codonw/	[112]

cusp	Cusp computes a codon usage table for one or more nucleotide coding sequences and writes the table to a file. For each codon, the table gives: i. the codon sequence; ii. the encoded amino acid; iii. the proportion of usage of the codon within its redundant set, i.e., the set of codons encoding the same amino acid; iv. the expected number of codons per 1000 bases, given the input sequence; v. the number of codons observed in the sequence.	Creates a codon usage table from a nucleotide sequence.	Local installation	https://emboss.sourceforge.net/apps/release/6.6/emboss/apps/cusp.html	[111]

DnaSP	DnaSP is a software package for the analysis of DNA polymorphism data.	The present version allows for analysis of the evolutionary pattern of preferred and unpreferred codons.	Local installation	https://www.ub.es/dnasp	[113]

EncPrime	A program to calculate the summary statistic Nc' of codon usage bias.	Calculates the ENC metric.	Local installation	https://github.com/jnovembre/ENCprime

SMS (Sequence Manipulation Suite)	The program compares the frequency of codons encoding the same amino acid (synonymous codons).	SMS can be used to assess whether sequences show a preference for particular synonymous codons.	Web	https://www.bioinformatics.org/sms2/codon_usage.html	[114]

MEGA 11	Molecular Evolutionary Genetics Analysis (MEGA) software has matured to contain a large collection of methods and tools of computational molecular evolution.	MEGA now contains methods for analyses of codons, RSCU and base composition.	Local installation	https://www.megasoftware.net/citations	[115]

Software for codon pair analysis
ANACONDA	The Anaconda software package provides a set of statistical, bioinformatics and data visualization tools for gene primary structure analysis.	It can be used for analysis of genomic codon preference and codon pair preference	Local installation	https://bioinformatics.ua.pt/software/anaconda/	[116]

CoCoPUTs	CoCoPUTs provides codon and codon pair usage tables derived from all available GenBank and RefSeq data. When searching for a species, RefSeq takes precedence, so that if a RefSeq assembly is available, data are automatically extracted from that source. When searching for a species without RefSeq assemblies, use the taxonomic ID of the organism for best results.	The codon usage table is a measure of codon usage bias, i.e., the relative frequency with which different codons are used in genes of a given species. Likewise, the codon pair usage table shows counts for each codon pair in the CDS of a given species and is a measure of codon pair usage bias.	Web	https://hive.biochemistry.gwu.edu/review/codon2	[117]

CPS (codon pair score)	Measures codon pair bias, defined analogously to the RSCU.	It can be used to determine the level of similarity in codon pair preferences between viruses and their host.	R package	https://rdrr.io/github/alex-sbu/CPBias/man/CPScalc.html	[48]

CPO (codon pair optimization)	A software tool to provide codon pair optimization for synthetic gene design.	CPO provides a simple and efficient means for customizing codon optimization based on the codon pair bias of Pichia pastoris.	R package	https://microbialcellfactories.biomedcentral.com/articles/10.1186/s12934-021-01696-y#Sec15	[118]

Software for codon adaptation analysis
CAIcal	It includes a complete set of CAI related utilities. The server provides useful important functions such as computational and graphical representation of CAI, representation along single sequences or protein multiple sequence alignments translated into DNA. The CAIcal tool also includes automatic calculation of the CAI and its expected value.	The CAIcal server provides a complete set of tools to assess codon usage adaptation and aid in genome annotation.	Web	https://genomes.urv.es/CAIcal	[107]

CBI (codon bias index)	Optimal codon usage is measured using the ratio between the number of optimal codons in the gene and the total number of codons in the gene. It uses the expected usage as a scaling factor.	It can calculate the presence of components with high CUB in a particular gene.	Local installation	https://codonw.sourceforge.net/index.html	[119]
COOL	COOL was designed as an adaptable web-based interface that provides a wide range of functions. Users can completely customize the synthetic gene design process through a step-by-step job submission process, which allows for them to specify their optimal parameter settings.	COOL supports a simple and flexible interface for customizing various codon optimization parameters such as the codon adaptation index, single codon usage, and codon pairing.	Web	https://bioinfo.bti.a-star.edu.sg/COOL/	[120]

coRdon	Codon usage (CU) bias can be used to predict the relative expression levels of genes by comparing the CU bias of a gene to the CU bias of a set of genes known to be highly expressed. This method can be effectively used to predict highly expressed genes in a single genome, and it is particularly useful at the level of the whole metagenome. By analysing the CU bias of the metagenome, genes with high predicted expression in the whole microbial community can be identified, and the functions enriched in the community, that is, its “functional fingerprint”, can be determined.	It supports calculation of different CU bias statistics, CU-based gene expression prediction, gene set enrichment analysis of annotated sequences, and several methods for displaying CU and enrichment analysis results.	R package	https://www.bioconductor.org/packages/devel/bioc/vignettes/coRdon/inst/doc/coRdon.html

COUSIN	Calculates codon usage for user-supplied sequences.	COUSIN allows for easy and complete analysis of codon usage preferences (CUPrefs), including seven other indices, and provides functions such as statistical analysis, clustering and CUPrefs optimization for gene expression.	Web or install	https://cousin.ird.fr/index.php	[121]

HEG-DB	Database of the CAI index of HEGs for 200 genomes	Calculates the CAI.	Web	https://genomes.urv.cat/HEG-DB/	[122]

Jcat (Java Codon Adaptation Tool)	Further choices for Jcat codon adaptation include the avoidance of unwanted cleavage sites for restriction enzymes and Rho-independent transcription terminators. Compared with existing tools, Jcat does not need to manually define high-expression genes, so it is a very fast and simple method.	A novel method for the adaptation of target gene codon usage to most sequenced prokaryotes and selected eukaryotic gene expression hosts to improve heterologous protein production.	Web	http://www.jcat.de/Start.jsp	[123]

OPTIMIZER	OPTIMIZER allows for three optimization methods and uses several valuable new reference sets. It can be used to optimize the expression levels of genes, assess the fitness of foreign genes inserted into the genome, or design new genes from protein sequences.	Optimizes the codon usage of a DNA sequence to increase its expression level.	Web	https://genomes.urv.es/OPTIMIZER/	[124]

stAI (species-specific tRNA adaptation index)	The tRNA adaptation index (tAI) is a widely used measure of the efficiency with which the intracellular tRNA pool recognizes coding sequences. The index includes weights representing the wobble interactions between codons and tRNA molecules. The software presents a new method to adjust tAI weights to any target model organism without the need for gene expression measurements. The method is based on optimizing the correlation between tAI and codon usage bias measures.	The calculator includes optimized tAI weights for 100 species from three life domains, as well as a stand-alone software package to optimize weights for new organisms.	Web	https://www.cs.tau.ac.il/~tamirtul/stAIcalc/stAIcalc.html	[125]

Synthetic Gene Designer	Synthetic Gene Designer covers three main stages of gene design, given a gene of interest and the target genome in which it is to be expressed.	Synthetic Gene Designer offers enhanced functionality compared to existing software options; for example, it enables users to use nonstandard genetic codes, user-defined codon usage patterns, and an expanded set of codon optimization methods.	Web	https://www.evolvingcode.net/codon/sgd/index.php	[126]

Codon bias of plant RNA viruses

The codon usage of viruses is not random [28], [29], [34], [69], [70], [71], [72], including that of plant RNA viruses [39], [40], [60], [63], [64], [65]. Most reported animal RNA viruses show a low codon usage bias [37], [64], [67], [68], [73], which allows for efficient replication in the host cell by lowering the level of competition with the host genes. For plant RNA viruses, Adams and Antoniw (2004) found low codon usage bias in CP gene sequences of several genera, such as Potyvirus, Cucumovirus, Sobemovirus, and Polerovirus [56]. More recently, low codon usage bias was also found in complete or partial gene coding sequences of several plant RNA viruses, such as BBWV-2, CTV, PRSV, PVM, PVX, RSV and SCMV [39], [60], [61], [62], [63], [64], [74], indicating a low degree of codon preference in plant RNA viruses. Similar to those of eukaryotes, the codon usage patterns of viruses are shaped by mutation, natural selection, drift, compositional constraints, gene length and function, secondary protein structure, selective transcription, replication and hydrophobicity [41], [43], [44], [45], [75], [76], [77], [78], [79]. Several codon usage analyses, including the effective number of codons (ENC) plot, the neutrality plot, the parity rule 2 (PR2) plot, and regression analyses among ENC, GC, GC3s, aromaticity (ARO) and GRAVY values, indicate that the codon usage of plant RNA viruses is influenced by both natural selection and mutation pressure, with natural selection as the dominant factor. Chen et al. (2020) found that virus codon usage bias (CUB) tended to be more similar to that of symptomatic hosts than that of asymptomatic natural hosts, indicating a general dissimilation of CUB in virus–host coevolution due to translational selection (Fig. 4) [80]. Codon W, DnaSP, MEGA, Chips, cusp, EncPrime, CodonO, SMS and the CAIcal server can calculate the codon usage of plant RNA viruses (Table 1).
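The ENC-plot mentioned above compares each gene's observed ENC with the value expected if codon usage were determined by GC3s alone. A minimal sketch of that null curve follows, assuming Wright's standard formula ENC = 2 + s + 29 / (s² + (1 − s)²), where s is the GC3s fraction; the review itself does not spell this formula out, and the function names and example values are hypothetical.

```python
def expected_enc(gc3s):
    """Expected ENC under GC3s composition alone (null curve of the ENC-plot).

    gc3s is the G+C fraction at synonymous third codon positions, in [0, 1].
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    return 2.0 + gc3s + 29.0 / (gc3s ** 2 + (1.0 - gc3s) ** 2)


def enc_deviation(observed_enc, gc3s):
    """Relative distance of an observed ENC below (+) or above (-) the null curve."""
    expected = expected_enc(gc3s)
    return (expected - observed_enc) / expected


# Hypothetical gene: observed ENC of 52.0 at a GC3s fraction of 0.40.
print(round(expected_enc(0.40), 2))
print(round(enc_deviation(52.0, 0.40), 3))
```

Genes lying on or near the curve are usually read as shaped mainly by mutation pressure, while genes well below it suggest an additional contribution from selection; this reading is a convention of the method rather than a result of the sketch.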

Fig. 4

Schematic overview on the regulatory role of plant RNA viruses’ CUB and its evolutionary implication.

Codon pair bias of plant RNA viruses

Consistent with codon usage bias, some codon pairs are used more frequently than others in prokaryotic and eukaryotic genomes, a phenomenon described as codon pair bias (CPB) [45], [81], [82]. CPB has been summarized for bacteria, archaea, and eukaryotes [47], [81]. For viruses, CPB was first described in poliovirus [45], followed by classical swine fever virus [83], human immunodeficiency virus type 1 [84], [85], porcine reproductive and respiratory syndrome virus [86], dengue virus type 2 [87], influenza A/Puerto Rico/8/34 (H1N1) virus [88], Marek's disease herpesvirus [89], [90], Zika virus [91], influenza A virus (IAV) [92], influenza B virus [92], and influenza C virus [92]. However, there have been no reports on CPB in plant viruses to date. In general, CpG/UpA dinucleotide bias and translational selection shape codon pair usage in protein coding sequences [47], [48], [82], [87]. In prokaryotic and eukaryotic genomes, the most frequently preferred codon pairs are nnGCnn, nnCAnn and nnUnCn [47]. The most frequently avoided codon pairs are nnGGnn, nnUAnn, nnCGCn, nnGnnC, GUCCnn, CUCCnn, UUCGnn and nnCnnA [47]. ANACONDA, CoCoPUTs, and CPS software can calculate the codon pair bias of plant RNA viruses (Table 1).
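A codon pair score (CPS) is commonly computed as the natural log of the observed count of a codon pair over the count expected from the usage of its two codons and of the corresponding amino acid pair, CPS = ln(N_AB / ((N_A · N_B) / (N_X · N_Y) · N_XY)). The sketch below follows that definition as an assumption, since the review does not give the formula; codon_pair_scores and the toy input are hypothetical, and reading frames are truncated at the first stop or ambiguous codon.

```python
import math
from collections import Counter
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_pair_scores(cds_list):
    """Return {(codon_a, codon_b): CPS} for every codon pair observed in cds_list."""
    codon_n, pair_n, aa_n, aa_pair_n = Counter(), Counter(), Counter(), Counter()
    for cds in cds_list:
        seq = cds.upper().replace("T", "U")
        codons = []
        for i in range(0, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if CODON_TABLE.get(codon, "*") == "*":
                break  # stop at the first stop codon or ambiguous codon
            codons.append(codon)
        codon_n.update(codons)
        aa_n.update(CODON_TABLE[c] for c in codons)
        for a, b in zip(codons, codons[1:]):  # adjacent codons in frame
            pair_n[(a, b)] += 1
            aa_pair_n[(CODON_TABLE[a], CODON_TABLE[b])] += 1
    scores = {}
    for (a, b), n_ab in pair_n.items():
        x, y = CODON_TABLE[a], CODON_TABLE[b]
        expected = codon_n[a] * codon_n[b] / (aa_n[x] * aa_n[y]) * aa_pair_n[(x, y)]
        scores[(a, b)] = math.log(n_ab / expected)
    return scores


# Hypothetical toy input: four lysine codons, AAA AAG AAA AAG.
scores = codon_pair_scores(["AAAAAGAAAAAG"])
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

Positive scores mark pairs used more often than expected and negative scores mark avoided pairs; meaningful values require a large reference set of coding sequences (for example, a host genome), not a toy input.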

Dinucleotide bias of plant RNA viruses

The frequencies of dinucleotides (two consecutive nucleotides) in different or even the same organisms usually do not match those expected from the nucleotide composition [93], [94], [95], [96], [97], [98]. In other words, dinucleotides are also not randomly distributed in genomes. Recent studies have revealed that low CpG and UpA dinucleotide frequencies in animal RNA viruses could help them avoid specific host defences [99], [100], [101]. Similarly, UpA and CpG were largely underrepresented in the genomes of rice stripe virus [65], potato virus X [102], SCMV [59] and other potyvirids [66] (Fig. 3B). Using plum pox virus (PPV) as a model, Prádena et al. (2020) showed that an increase in UpA frequency strongly diminishes virus accumulation and fitness [130]. They also demonstrated that host RNA polymerase II plays a key role in the anticorrelation between UpA frequency and RNA accumulation in the genome of PPV. Codon W can calculate the dinucleotide bias of plant RNA viruses (Table 1).
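Dinucleotide bias is usually expressed as an odds ratio, the observed dinucleotide frequency divided by the product of the frequencies of its two bases, so that values below 1 indicate under-representation relative to base composition. The sketch below assumes this common definition, which the review does not state explicitly; the function name and toy sequence are hypothetical.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq):
    """Return {"XpY": observed/expected frequency} for all 16 dinucleotides."""
    s = seq.upper().replace("T", "U")  # accept DNA or RNA input
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    base_freq = {b: s.count(b) / n for b in "ACGU"}
    pair_counts = Counter(s[i:i + 2] for i in range(n - 1))  # overlapping pairs
    ratios = {}
    for x in "ACGU":
        for y in "ACGU":
            expected = base_freq[x] * base_freq[y]
            observed = pair_counts[x + y] / (n - 1)
            # nan when a base is absent and the expectation is undefined
            ratios[f"{x}p{y}"] = observed / expected if expected else float("nan")
    return ratios


# Hypothetical toy sequence, far too short for any biological conclusion.
ratios = dinucleotide_odds_ratios("ACGUACGUACGU")
print("CpG:", round(ratios["CpG"], 3), "UpA:", round(ratios["UpA"], 3))
```

Applied to a full viral genome or coding region, the CpG and UpA entries of the returned dictionary correspond to the odds ratios discussed in this section.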

Codon adaptation to the host

Relative synonymous codon usage (RSCU) analysis

The adaptation, evolution, fitness and survival of viruses are affected by codon usage bias [37], [54], [64], [67], [68]. The RSCU value of a codon, for viruses and their hosts alike, is the ratio between its observed usage frequency and the frequency expected if all synonymous codons for that amino acid were used equally [103]. Generally, hosts have a significant effect on the selection of optimal codons in viruses. Both coincident and antagonistic codon usage between viruses and their hosts have been reported [54], [61], [64], [68]. It is accepted that coincident codon usage allows the corresponding amino acids to be translated efficiently, whereas antagonistic codon usage may favour proper folding of viral proteins, even though the translation efficiency of the corresponding amino acids might be reduced [68], [104]. He et al. (2020) compared the RSCU patterns of RBSDV with those of its hosts and vector and showed that RBSDV had evolved completely antagonistic codon usage patterns relative to its host and a mixture of coincident and antagonistic codon usage patterns relative to its vector [61]. Similar results were also found for CTV and its host citrus [54], [64]. These results indicate that the selection pressure exerted by hosts has greatly influenced the codon usage patterns of plant RNA viruses. Codon W, MEGA, and the CAIcal server can calculate the RSCU of plant RNA viruses (Table 1).
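The RSCU definition above translates into a few lines of code: for each amino acid, the count of a codon is multiplied by the number of synonymous codons and divided by the total count for that amino acid. This is a minimal sketch under the standard genetic code; rscu, FAMILIES and the toy sequence are illustrative names, and Met, Trp and stop codons are skipped because they have no synonymous alternatives.

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families keyed by amino acid (stop codons excluded).
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def rscu(cds):
    """Return {codon: RSCU} for amino acids that occur in the in-frame CDS."""
    seq = cds.upper().replace("T", "U")
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    values = {}
    for family in FAMILIES.values():
        if len(family) < 2:
            continue  # Met and Trp carry no synonymous information
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


# Hypothetical toy CDS: Lys x4 (AAA AAG AAA AAA) followed by Leu x3 (CUG CUG CUU).
values = rscu("AAAAAGAAAAAACTGCTGCTT")
print({c: round(v, 2) for c, v in values.items() if v > 0})
```

An RSCU of 1 means a codon is used as often as expected under equal usage; values above 1 mark preferred codons and values below 1 mark avoided ones, which is the basis for the coincident versus antagonistic comparisons between virus and host.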

Codon adaptation index (CAI) and relative codon deoptimization index (RCDI) analyses

The codon adaptation index (CAI) is used to measure the synonymous codon usage bias of a DNA or RNA sequence, including viral DNA or RNA sequences. Several reports show that the CAI is frequently used to assess the adaptation of viral genes to their hosts [38], [60], [63], [64], [67], [105]. Generally, the higher the CAI value, the stronger the codon usage bias [106], [107]. For example, CAI analysis showed that CTV might have evolved millions of years ago in Citrus reticulata and then been vertically or horizontally transmitted to later citrus species [64], that SCMV genes were more strongly adapted to maize than to sugarcane and canna [62], and that RBSDV was most strongly adapted to rice, followed by maize, wheat and its vector Laodelphax striatellus [61]. Similar to the CAI, the relative codon deoptimization index (RCDI) measures codon deoptimization by comparing the codon usage of a gene with that of a reference genome sequence [108], and a low RCDI value indicates strong adaptation to a host [108]. Based on the CAI and RCDI analyses, it is proposed that plant RNA viruses have probably evolved a dynamic balance between codon adaptation and codon deoptimization to maintain efficient replication cycles in multiple hosts with various codon usage patterns. Codon W, DnaSP, COOL, COUSIN, HEG-DB, Jcat, CAIcal and the RCDI server can calculate the CAI and RCDI of plant RNA viruses (Table 1).
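To make the CAI concrete, the sketch below computes it as the geometric mean of the relative adaptiveness of each codon in a gene, where relative adaptiveness is a codon's count in a host reference set divided by the count of the most used synonymous codon. This follows the usual textbook definition and is only an assumption about what the cited servers implement; the function names, the toy reference set and the 0.5 pseudocount for codons absent from the reference are illustrative choices.

```python
import math
from collections import Counter, defaultdict
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)


def split_codons(cds):
    """Split an in-frame CDS into codons (DNA or RNA input)."""
    seq = cds.upper().replace("T", "U")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_cds_list):
    """Weight of each synonymous codon relative to the most used one in the reference set."""
    counts = Counter()
    for cds in reference_cds_list:
        counts.update(split_codons(cds))
    weights = {}
    for family in FAMILIES.values():
        if len(family) < 2:
            continue  # Met and Trp are excluded from the CAI
        # Assumption: codons absent from the reference get a pseudocount of 0.5.
        adjusted = {c: counts[c] or 0.5 for c in family}
        top = max(adjusted.values())
        for c in family:
            weights[c] = adjusted[c] / top
    return weights


def cai(cds, weights):
    """Geometric mean of the weights of all scored codons in the CDS."""
    logs = [math.log(weights[c]) for c in split_codons(cds) if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


# Hypothetical host reference in which AAA is used three times as often as AAG.
weights = relative_adaptiveness(["AAAAAAAAAAAG"])
print(round(cai("AAAAAG", weights), 4), round(cai("AAAAAA", weights), 4))
```

In practice the reference set would be the codon usage table of a host such as rice or maize, and comparing the CAI of the same viral gene against several host tables gives the kind of host ranking described above.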

Similarity index (SiD) analysis

SiD analysis reflects the influence of the codon usage bias of hosts on viral genes. The SiD value, ranging from 0 to 1.0, indicates the potential effect of the overall codon usage of the host on the different clades of the viral genes. Normally, a higher SiD value indicates that the host has a stronger influence on viral codon usage. For example, during SCMV evolution, maize had a greater impact on the virus than canna or sugarcane because the highest SiD values were observed in maize based on the complete polyprotein and eleven protein coding sequences of SCMV [62]. Several recent studies also report similar SiD analyses on plant RNA viruses and their hosts [39], [61], [62], [63], [64], [65].
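The SiD is typically derived from the RSCU vectors of virus and host: R(A,B) = Σ aᵢbᵢ / sqrt(Σ aᵢ² · Σ bᵢ²) and D(A,B) = (1 − R(A,B)) / 2, where aᵢ and bᵢ are the RSCU values of the same synonymous codon in the virus and in the host. The review does not state this formula, so the sketch below should be read as an assumption based on how the index is commonly defined; similarity_index and the toy RSCU dictionaries are hypothetical.

```python
import math


def similarity_index(virus_rscu, host_rscu):
    """Return D(A,B) = (1 - R(A,B)) / 2 for two {codon: RSCU} dictionaries.

    R(A,B) is the cosine of the angle between the two RSCU vectors,
    computed over the codons present in both dictionaries.
    """
    codons = sorted(set(virus_rscu) & set(host_rscu))
    if not codons:
        raise ValueError("the two RSCU tables share no codons")
    dot = sum(virus_rscu[c] * host_rscu[c] for c in codons)
    norm = math.sqrt(
        sum(virus_rscu[c] ** 2 for c in codons)
        * sum(host_rscu[c] ** 2 for c in codons)
    )
    if norm == 0:
        raise ValueError("an RSCU vector is all zeros")
    return (1.0 - dot / norm) / 2.0


# Hypothetical two-codon RSCU tables, for illustration only.
virus = {"AAA": 1.5, "AAG": 0.5}
host_a = {"AAA": 1.4, "AAG": 0.6}
host_b = {"AAA": 0.4, "AAG": 1.6}
print(round(similarity_index(virus, host_a), 4))
print(round(similarity_index(virus, host_b), 4))
```

A real comparison would use the RSCU values of all 59 synonymous codons (excluding Met, Trp and stop codons) for the viral coding sequences and for each candidate host.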

CpG and UpA dinucleotide bias

Recently, CpG and UpA dinucleotide motifs have been found to be markedly underrepresented in RNA viruses, including plant RNA viruses [39], [40], [59], [65], [66]. The avoidance of CpG dinucleotides was observed in several potyviruses and other plant RNA viruses [39], [40], [59], [65], [66], possibly as an outcome of selection on nucleotide composition. Moreover, using PVY as a model, Ibrahim et al. (2019) showed that increased CpG dinucleotide frequencies in the PVY genome reduced systemic spread and pathogenicity and attenuated replication kinetics in tobacco plants [109]. Similarly, UpA is also underrepresented in plant RNA viruses. In the Potyviridae family, one of the most important plant RNA virus groups, the mean UpA odds ratio was 0.632 (±0.066) [66]. An increase in UpA frequency in the genome of plum pox virus (PPV) strongly diminishes virus accumulation and viral fitness. Furthermore, Prádena et al. (2020) showed that the anticorrelation between UpA frequency and RNA accumulation also applies to mRNA-like fragments produced by the host RNA polymerase II, and indicated that the host controls diverse RNAs, including those of plant RNA viruses, through a dinucleotide-based system in plant cells [66].

Summary and discussion

Numerous studies have shown the diverse nucleotide composition, codon usage and adaptation in animal- and human-infecting viruses. We are now in a position to better explore the evolution of plant RNA viral genomes by considering compositional biases and how the viruses adapt to the host. Adenine-rich (A-rich) coding sequences were found in the vast majority of plant RNA viruses. Low codon usage bias was also found in the gene coding sequences of plant RNA viruses, indicating a low degree of codon preference. The codon usage pattern of plant RNA viruses was influenced by both natural selection and mutation pressure, and natural selection was the dominant factor. Low CpG and UpA dinucleotide frequencies were also found in plant RNA viruses, possibly to avoid specific host defences. The codon adaptation analyses support that plant RNA viruses have probably evolved a dynamic balance between codon adaptation and deoptimization to maintain efficient replication cycles in multiple hosts with various codon usage patterns. Generally, the nucleotide composition of plant RNA viruses is determined by mutation and drift, resulting in a diversity of codons, dinucleotides, and codon pairs [31]. Meanwhile, the tRNA selection preference of the host affects the diversity of codons, dinucleotides and codon pairs of plant RNA viruses [80]. Tian et al. (2018) showed that viruses that infect only a narrow spectrum of hosts (NSTVs) match their hosts' tRNA pools more closely than viruses that infect a broad spectrum of hosts (BSTVs) [131]. Moreover, Chen et al. (2020) found that virus CUB tended to be more similar to that of symptomatic hosts than that of asymptomatic natural hosts.
Thus, we hypothesize that for viruses with a narrow host range or high pathogenicity, host tRNA selection bias has a strong influence on the virus, making the viral codon bias highly similar to that of the host, whereas for viruses with a wide host range or weak pathogenicity, host tRNA selection bias has a balancing effect on viral CUB, making it similar to that of the host but different to some extent (Fig. 4). However, Cardinale et al. (2013) showed that the codon bias of plant RNA viruses is affected not only by mutation and drift of the viral genome and by host tRNA selection, but also by the genomic architecture and secondary structure of the virus. In the future, more factors should be considered when evaluating the CUB of plant RNA viruses.

Outstanding questions

With the increase in studies on the evolution and host adaptation of plant RNA viruses, our understanding of the evolutionary changes of plant RNA viruses has greatly improved. However, many outstanding questions remain. (i) Synonymous mutations do not change the amino acid encoded by the sequence, so they are generally considered neutral. However, several studies have found that synonymous mutations can promote the adaptive evolution of animal viruses (for example influenza A virus, vesicular stomatitis virus, and Qβ bacteriophage) [131], [133], [134] and of one plant virus, tobacco etch virus [136]. It remains unclear how synonymous mutations affect the adaptive evolution of viruses, especially plant RNA viruses. (ii) Genetic drift is a key factor in the evolution of viruses, but its effect on the CUB of plant RNA viruses is unclear. (iii) For single-stranded plant RNA viruses, the gene or protein-coding region appears to have a stronger effect than host tRNA selection on nucleotide and dinucleotide composition [40], [59], [60], [63], [64]. Does the same hold for segmented plant RNA viruses and satellites? (iv) Only one study has shown that increased UpA frequency greatly reduces plant RNA virus replication in the host [66]; more systematic experimental analyses of the effect of UpA frequency on plant RNA virus replication are urgently needed. (v) How do the frequencies of other dinucleotides affect host adaptation and replication of plant RNA viruses? (vi) Codon pair bias has a significant effect on the evolution of animal and human viruses [45], [83], [84], [85], [86], [87], [88], [89], [90], [91], [92]; however, there have been no reports on codon pair bias in plant RNA viruses. (vii) CUB and dinucleotide bias are also related to amino acid conservation, gene length, protein structure and hydrophobicity level [32], [135], so new methods of analyzing CUB and dinucleotide bias are required.
Therefore, future work should combine computational and experimental analyses of the evolution and host adaptability of plant RNA viruses.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

128 in total

1. Analysis of synonymous codon usage bias and phylogeny of coat protein gene in banana bract mosaic virus isolates.

Authors: Atul B Patil; Vijayendra S Dalvi; Akhilesh A Mishra; Bal Krishna; Abdul Azeez
Journal: Virusdisease Date: 2017-05-18

2. Zika Virus Attenuation by Codon Pair Deoptimization Induces Sterilizing Immunity in Mouse Models.

Authors: Penghui Li; Xianliang Ke; Ting Wang; Zhongyuan Tan; Dan Luo; Yuanjiu Miao; Jianhong Sun; Yuan Zhang; Yan Liu; Qinxue Hu; Fuqiang Xu; Hanzhong Wang; Zhenhua Zheng
Journal: J Virol Date: 2018-08-16 Impact factor: 5.103

3. Synonymous Codon Usage Analysis of Three Narcissus Potyviruses.

Authors: Zhen He; Shiwen Ding; Jiyuan Guo; Lang Qin; Xiaowei Xu
Journal: Viruses Date: 2022-04-19 Impact factor: 5.818

4. Interspecies Transmission, Genetic Diversity, and Evolutionary Dynamics of Pseudorabies Virus.

Authors: Wanting He; Lisa Zoé Auclert; Xiaofeng Zhai; Gary Wong; Cheng Zhang; Henan Zhu; Gang Xing; Shilei Wang; Wei He; Kemang Li; Liang Wang; Guan-Zhu Han; Michael Veit; Jiyong Zhou; Shuo Su
Journal: J Infect Dis Date: 2019-05-05 Impact factor: 5.226