Eleanor R Gaunt, Paul Digard.
Abstract
If each of the four nucleotides were represented equally in the genomes of viruses and the hosts they infect, each base would occur at a frequency of 25%. However, this is not observed in nature. Similarly, the order of nucleotides is not random (e.g., in the human genome, guanine follows cytosine at a frequency of ~0.0125, or a quarter the number of times predicted by random representation). Codon usage and codon order are also nonrandom. Furthermore, nucleotide and codon biases vary between species. Such biases have various drivers, including cellular proteins that recognize specific patterns in nucleic acids and that, once triggered, induce mutations or invoke intrinsic or innate immune responses. In this review we examine the types of compositional biases identified in viral genomes and the current understanding of the evolutionary mechanisms underpinning these trends. Finally, we consider the potential for large‐scale synonymous recoding strategies to engineer vaccines for RNA viruses, including those with pandemic potential, such as influenza A virus and severe acute respiratory syndrome coronavirus 2. This article is categorized under: RNA in Disease and Development > RNA in Disease; RNA Evolution and Genomics > Computational Analyses of RNA; RNA Interactions with Proteins and Other Molecules > Protein‐RNA Recognition.
Keywords: dinucleotides; mutation bias; selection bias; viral genome composition
Year: 2021 PMID: 34155814 PMCID: PMC8420353 DOI: 10.1002/wrna.1679
Source DB: PubMed Journal: Wiley Interdiscip Rev RNA ISSN: 1757-7004 Impact factor: 9.349
FIGURE 1 (a) There are 16 possible dinucleotide compositions in RNA. (b) Schematic of a CpG motif, with “p” referring to the phosphate bridge (green) joining the cytosine (C) (blue) and guanine (G) (red) bases
FIGURE 2 GC content vs CpG ratio for various invertebrate (blue circle) and vertebrate (pink circle) species. In blue from left to right: Spodoptera exempta (African armyworm), Drosophila melanogaster (fruit fly), Bombus bombus (bumble bee), Anopheles gambiae (mosquito). In pink from left to right: Danio rerio (zebrafish), Halichoerus spp (seals), Phocoena spp (porpoise), Didelphis virginiana (opossum), Homo sapiens (human), Rattus norvegicus (brown rat), Takifugu rubripes (pufferfish), Ornithorhynchus anatinus (platypus)
FIGURE 3 Under‐representation of CpG dinucleotides (a) and UpA dinucleotides (b) in the genomes of representative viruses. Abbreviations are Adeno, human adenovirus 2; HCMV, human cytomegalovirus; HSV‐1, herpes simplex virus 1; parvo, parvovirus; BTV, bluetongue virus; HCV, hepatitis C virus; FMDV, foot and mouth disease virus; SARS2, severe acute respiratory syndrome coronavirus 2; EBOV, ebola virus; IAV, influenza A virus; RSV, respiratory syncytial virus; HIV‐1, human immunodeficiency virus 1. The Baltimore classifications are I dsDNA; II ssDNA; III dsRNA; IV +ssRNA; V –ssRNA; VI rtRNA
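Under‐ or over‐representation of a dinucleotide, as shown for CpG and UpA in Figure 3, is commonly expressed as an observed/expected ratio. The Python sketch below assumes the widely used single‐strand definition, in which the observed dinucleotide frequency is divided by the product of the two mononucleotide frequencies. The source does not state which normalization was used for the figures, and the function name `dinucleotide_oe_ratio` is hypothetical.

```python
from collections import Counter


def dinucleotide_oe_ratio(seq: str, dinuc: str = "CG") -> float:
    """Observed/expected ratio of one dinucleotide on a single strand.

    Assumed definition (not taken from the source):
        ratio = f(XY) / (f(X) * f(Y))
    A ratio below 1 indicates under-representation.
    """
    # Treat DNA and RNA input alike by converting T to U.
    seq = seq.upper().replace("T", "U")
    dinuc = dinuc.upper().replace("T", "U")
    if len(dinuc) != 2:
        raise ValueError("dinuc must be exactly two bases")
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")

    base_counts = Counter(seq)
    # Count overlapping occurrences across all n - 1 dinucleotide positions.
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)

    f_xy = observed / (n - 1)
    f_x = base_counts[dinuc[0]] / n
    f_y = base_counts[dinuc[1]] / n
    if f_x == 0 or f_y == 0:
        # The ratio is undefined when either base is absent.
        return float("nan")
    return f_xy / (f_x * f_y)
```

Calling `dinucleotide_oe_ratio(genome, "CG")` and `dinucleotide_oe_ratio(genome, "UA")` on a genome string gives the two quantities the figure compares. Real analyses typically also decide how to handle ambiguous bases and whether to restrict the calculation to coding regions, which this sketch does not attempt.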
Most strongly avoided codon pairs across bacteria, archaea and eukaryotes
| Codon pair | % of organisms which avoid it | O:E ratio |
|---|---|---|
| UUC GCA | 86 | 0.570 |
| GGG GGU | 83 | 0.460 |
| UUC GAA | 82 | 0.590 |
| CUU AUG | 79 | 0.529 |
| GCU AUG | 76 | 0.590 |
| ACU AUG | 73 | 0.611 |
| GUU AGC | 73 | 0.529 |
| CUU AGU | 73 | 0.521 |
| UUC GCG | 72 | 0.559 |
| GUU AUG | 72 | 0.611 |
Source: Adapted from Tats et al. (2008).
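The O:E ratio in the table above compares how often a codon pair is observed with how often it would be expected. The Python sketch below assumes the simplest expectation, that adjacent codons are chosen independently according to their overall frequencies. This is an illustrative simplification and not necessarily the normalization used by Tats et al. (2008); the function name `codon_pair_oe` is hypothetical.

```python
from collections import Counter


def codon_pair_oe(orfs, pair):
    """Observed/expected ratio for one codon pair over a set of in-frame ORFs.

    Assumption (illustrative only): the expected count is
        f(codon1) * f(codon2) * total number of adjacent codon pairs,
    i.e. neighbouring codons are treated as independent.
    """
    first, second = pair
    codon_counts = Counter()
    pair_counts = Counter()
    for orf in orfs:
        rna = orf.upper().replace("T", "U")
        usable = len(rna) - len(rna) % 3  # ignore a trailing partial codon
        codons = [rna[i:i + 3] for i in range(0, usable, 3)]
        codon_counts.update(codons)
        # Adjacent codon pairs within the same ORF only.
        pair_counts.update(zip(codons, codons[1:]))

    total_codons = sum(codon_counts.values())
    total_pairs = sum(pair_counts.values())
    if total_pairs == 0:
        raise ValueError("no codon pairs found")

    expected = (codon_counts[first] / total_codons) \
        * (codon_counts[second] / total_codons) * total_pairs
    if expected == 0:
        return float("nan")  # undefined if either codon never occurs
    return pair_counts[(first, second)] / expected
```

For example, `codon_pair_oe(orfs, ("UUC", "GCA"))` returns the ratio for the first pair in the table, given a list of in‐frame coding sequences in `orfs`. Published codon pair scores often also correct for amino acid pair frequencies, which this sketch omits.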
Codon pairs which are inefficiently translated and associated with wobble decoding
| Codon pair | First codon wobble | Second codon wobble |
|---|---|---|
| AGG CGA | — | I |
| AGG CGG | — | — |
| AUA CGA | — | I |
| AUA CGG | — | — |
| CGA AUA | I | — |
| CGA CCG | I | U |
| CGA CGA | I | I |
| CGA CGG | I | — |
| CGA CUG | I | U |
| CGA GCG | I | U |
| CUC CCG | — | U |
| CUG AUA | U | — |
| CUG CCG | U | U |
| CUG CGA | U | I |
| GUA CCG | — | U |
| GUA CGA | — | I |
| GUG CGA | — | I |
Note: I∙A, inosine base pairing with adenine; U∙G, uracil base pairing with guanine.
Source: Adapted from Gamble et al. (2016).
FIGURE 4 Four types of bias are described in the genomes of organisms and the viruses that infect them
FIGURE 5 Compositional biases in viral genomes may be driven by three types of evolutionary pressure: translational, selection and mutational. Translationally derived biases arise due to the different translational efficiencies of transcripts with varying composition in different cell conditions (e.g., resting vs. stress). Biases driven by selection arise through viral genomes avoiding encoding specific motifs that may be recognized by components of the innate immune response. Biases driven by mutation arise through editing of viral genomes or transcripts by host cell proteins
FIGURE 6 Possible mechanisms by which ZAP activity leads to viral transcript degradation. CpG motifs in viral RNA (red) are bound by the cytoplasmic PRR ZAP, which can lead to recruitment of 5′ decapping enzymes (Dcp1/2 complex), the 3′ deadenylation enzyme PARN and potentially the KHNYN RNA endonuclease, followed by 5′–3′ degradation mediated by Xrn1 and/or 3′–5′ degradation mediated by the RNA exosome. Interactions between ZAP and RIG‐I and/or TRIM25 may also lead to innate immune signaling
FIGURE 7 Comparison of CpG and UpA suppression in the genomes of various viruses. RNA viruses: BTV, bluetongue virus; EBOV, ebola virus; FMDV, foot and mouth disease virus; HCV, hepatitis C virus; RSV, respiratory syncytial virus; SARS2, severe acute respiratory syndrome coronavirus 2. DNA viruses: adeno, adenovirus; HCMV, human cytomegalovirus; HSV‐1, herpes simplex virus 1; Parvo, canine parvovirus 2
Synonymous recoding strategies which have been applied to RNA viruses are summarized
| Virus | Recoding strategy | Region recoded | Findings | References |
|---|---|---|---|---|
| Adeno‐associated virus | Codon pair bias deoptimization | Rep | The negative regulatory signal imparted on adenovirus by AAV was diminished, and so adenovirus replication was enhanced | Sitaraman et al. |
| Dengue virus | Codon pair bias deoptimization to match insect bias | E/NS3/NS5 | Mutants grow well in insect cells but not well (if at all) in mammalian cells. LD50 was 10^2 to 10^3.5‐fold higher in mice | Shen et al. |
| Dengue virus | Bioinformatic analyses showed that the above recoding strategy also increased CpG frequency | E/NS3/NS5 | This re‐analysis suggested that attenuation of viral replication in mammalian cells might result from increased CpG content rather than increased codon‐pair bias | Simmonds et al. |
| Echovirus 7 | CpG or UpA dinucleotide bias optimization and deoptimization | VP3/1 and/or 3B/C/D | CpG enrichment in two regions caused a 7000‐fold reduction in replication; UpA enrichment caused a 30‐fold reduction in cells. Removal of CpGs and UpAs increased replication, with removal of both increasing virus titres 10‐fold in cells | Atkinson et al. |
| Foot and mouth disease virus | Codon pair bias deoptimization | P1 capsid | 10^3‐fold increase in the vaccine safety margin compared with WT virus | Diaz‐San Segundo et al. |
| Human cytomegalovirus | CpG dinucleotide deoptimization | IE1 | Reporter constructs with elevated CpG content triggered ZAP induction | Lin et al. |
| Human immunodeficiency virus 1 | Codon pair optimization and deoptimization | Gag and pol | No observed effects of optimization; deoptimization reduced replication titre in cells. Deoptimized but not optimized virus reverted following passage | Martrus et al. |
| Human immunodeficiency virus 1 | Increased CpG frequency | Gag | Up to 10^2‐fold defect in replication in cells | Antzin‐Anduetza et al. |
| Influenza A virus | Codon pair bias deoptimization | PB1, NP and HA | 10^1‐fold reduction in titre in cells | Mueller et al. |
| Influenza A virus | Codon pair bias deoptimization | HA and NA | 10^5‐fold attenuation in mice and clinical attenuation in ferrets | Yang et al. |
| Influenza A virus | CpG and UpA dinucleotide deoptimization | NP | 10^1 to 10^2‐fold reduction in titre in cell culture and disease attenuation in mice | Gaunt et al. |
| Influenza A virus | CpG and codon pair bias deoptimization | NA | Codon pair bias dramatically decreased replication whereas increased CpG dinucleotides did not | Groenke et al. |
| Poliovirus | Codon usage bias deoptimization | Capsid | 65‐fold reduction in virus titre in cells | Burns et al. |
| Poliovirus | CpG and UpA dinucleotide deoptimization | Capsid | Up to a 10^3‐fold reduction in virus titre in cells | Burns et al. |
| Poliovirus | Codon usage optimization and deoptimization | Capsid | Little effect with codon optimization; deoptimization reduced virus titre in cells and mice | Lauring et al. |
| Poliovirus | Codon pair bias deoptimization | Capsid | Replication defect correlated with extent of mutagenesis in cells | Coleman et al. |
| Porcine reproductive and respiratory syndrome virus | Codon pair bias deoptimization | GP5 | A 10‐fold replication defect in cells, 10^3‐fold decrease in virus titre in pigs | Ni et al. |
| Porcine reproductive and respiratory syndrome virus | Codon pair bias deoptimization | NSP9 | 10^4‐fold replication defect in cells, no evidence of infection in pigs | Gao et al. |
| Potato virus Y | CpG and UpA dinucleotide deoptimization | Nonstructural genes | Up to 10^3‐fold defect (CpG) or 10^6‐fold defect (UpA) in systemic spread | Ibrahim et al. |
| Respiratory syncytial virus | Codon pair deoptimization | Various combinations, with the most extensive recoding extending to all ORFs except M1 and M2 | Multiple log10‐fold reduction in titre of various mutants in cells, mice and African Green Monkeys | Le Nouën et al. |
| Respiratory syncytial virus | Codon deoptimization by altering codon usage to be consistent with human | NS1 and NS2 | Modest replication attenuation in cells and mice | Meng et al. |
| Simian immunodeficiency virus | Nucleotide optimization towards nucleotide frequencies in macaque | Gag and pol | 10^2‐fold decrease in replication in cells; recoding in polymerase only had no effect | Vabret et al. |
| Vesicular stomatitis virus | Codon pair bias optimization and deoptimization | Polymerase | Optimization resulted in a modest replication defect in cells and 10^2 to 10^3‐fold deficit in mice. Deoptimized virus could not be recovered | Wang et al. |
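Several of the strategies summarized above raise CpG frequency while leaving the encoded protein unchanged. The Python sketch below illustrates that idea with a simple greedy pass that, codon by codon, picks the synonymous codon giving the most CpGs (including across the junction with the previous codon). It assumes the standard genetic code and is not the procedure used in any of the cited studies; the names `enrich_cpg`, `translate` and `count_cpg` are hypothetical, and a greedy choice is not guaranteed to be optimal.

```python
from itertools import product

# Standard genetic code, codons enumerated in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(rna: str) -> str:
    """Translate an in-frame RNA sequence ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[rna[i:i + 3]] for i in range(0, len(rna), 3))


def count_cpg(rna: str) -> int:
    """Count CpG dinucleotides on the given strand."""
    return sum(1 for i in range(len(rna) - 1) if rna[i:i + 2] == "CG")


def enrich_cpg(rna: str) -> str:
    """Greedy synonymous recoding that increases CpG content.

    Illustrative only: each codon is replaced by the synonymous codon
    that yields the most CpGs given the previous (already chosen) codon.
    Ties are resolved in favour of the original codon where possible.
    """
    rna = rna.upper().replace("T", "U")
    if len(rna) % 3:
        raise ValueError("sequence length must be a multiple of three")
    recoded = ""
    for i in range(0, len(rna), 3):
        codon = rna[i:i + 3]
        amino_acid = CODON_TABLE[codon]
        synonyms = [c for c, aa in CODON_TABLE.items() if aa == amino_acid]
        best = max(
            synonyms,
            # Prepend the last chosen base so junction CpGs are counted.
            key=lambda c: (count_cpg(recoded[-1:] + c), c == codon),
        )
        recoded += best
    return recoded
```

A recoded sequence can be checked with `translate(enrich_cpg(seq)) == translate(seq)`. A practical design would also need to consider constraints this sketch ignores, such as RNA structure, regulatory elements, and the codon usage and codon pair biases discussed in this review.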