Single-stranded genomic architecture constrains optimal codon usage.

Abstract

Viral codon usage is shaped by the conflicting forces of mutational pressure and selection to match host patterns for optimal expression. We examined whether genomic architecture (single- or double-stranded DNA) influences the degree to which bacteriophage codon usage differs from that of their primary bacterial hosts and from each other. While the genomic nucleotide content of both groups correlated equally well with that of their hosts, the coat genes of ssDNA phages were less well adapted than those of dsDNA phages to their hosts' codon usage profiles because of a preference for codons ending in thymine. No specific biases were detected in dsDNA phage genomes. In nine of the ten cases of codon redundancy in which a specific codon was overrepresented, ssDNA phages favored the NNT codon. A cytosine-to-thymine biased mutational pressure working in conjunction with strong selection against non-synonymous mutations appears to be shaping codon usage bias in ssDNA viral genomes.

Year: 2011 PMID: 22334868 PMCID: PMC3278643 DOI: 10.4161/bact.1.4.18496

Source DB: PubMed Journal: Bacteriophage ISSN: 2159-7073

Introduction

Viruses usually exhibit genomic signatures that closely mimic those of their primary hosts, in part to better evade innate and acquired immune responses. However, most of the close adherence to host nucleotide usage is attributed to selection for improved translational speed and efficiency, which are correlates of viral fitness. Synonymous codons are used at different frequencies in virtually all organisms, and the most frequently used codons correlate with the most abundant tRNAs within a cell. These favored synonymous codons are therefore recognized and translated more rapidly. The most frequently expressed cellular genes within a given organism exhibit similar patterns of this codon usage bias (CUB) and are more biased than less frequently expressed genes. For viruses, strict adherence to host CUB should therefore increase the rate of replication, and many viruses have been under selective pressure to match the CUB of their preferred hosts.

Despite increased attention to the genomic match between viruses and their hosts, few studies have examined how different viral genomic architectures facilitate or hinder adaptation to their hosts’ genomes. Phages are the optimal system in which to explore how genomic architecture affects viral molecular evolution. The codon bias expressed in prokaryotic hosts is constant for each host cell, unlike in multicellular organisms, in which codon usage profiles are affected by tissue-specific gene expression. Perhaps because of this, phages are more strongly adapted to their primary hosts’ CUB than eukaryotic viruses are, offering the greatest potential to identify factors that diminish the match between virus and host genomes. Bacterial hosts also offer a wider range of genomic nucleotide content than plant or mammalian hosts, and their CUB has been well documented. Additionally, while phage host ranges are far from perfectly annotated, they are usually quite narrow, and many have been better delineated than those of eukaryotic viruses, such as phytopathogens.

Two distinct phage genomic architectures (single-stranded DNA, ssDNA, and double-stranded DNA, dsDNA) have been amply sequenced; unfortunately, the small number of sequenced RNA phages precludes their close examination at this time. The two DNA-based architectures are subject to specific constraints: dsDNA phages can house the largest genomes, up to ~300 kb, whereas even the largest ssDNA phages are smaller than 10 kb. Many dsDNA phages encode their own tRNAs (e.g., T4 encodes eight), decreasing selection for adherence to host CUB, whereas no tRNA genes have been found in ssDNA phages. dsDNA phages have the lowest mutation rates among viruses, while ssDNA phage mutation rates are higher, approaching those of a dsRNA phage. Eukaryotic viruses with the same ssDNA genomic architecture exhibit evolutionary rates orders of magnitude above those seen in eukaryotic dsDNA viruses. Consequently, faster-evolving ssDNA phages might be better able to adapt to host-imposed genomic conditions. Conversely, the high mutation frequency in ssDNA phages may diminish their ability to conform to their hosts’ codon preferences.

Genomic GC content is a rough predictor of CUB, and many viruses match the GC content of their hosts. Bacteriophage GC content, in particular, correlates strongly with that of their primary bacterial hosts. We measured the similarity in GC content between each ssDNA and dsDNA GenBank phage reference genome and that of its primary host. We used the most numerous group of phages with a common host, Escherichia coli, to compare codon adaptation indices (CAI) and relative synonymous codon usage (RSCU) for a subset of highly expressed genes from dsDNA and ssDNA coliphages. Our results show that genomic architecture correlates with statistically significant differences in nucleotide content and codon usage between ssDNA and dsDNA phages, and point to an enrichment of thymine as a cause.

Results and Discussion

GC content in ssDNA and dsDNA phages was highly correlated with host GC content (r² = 0.82 for ssDNA phages, 0.84 for dsDNA phages; the two correlations did not differ significantly, p = 0.72) across a very wide range of host GC content (~0.25 to ~0.72) (Fig. 1). A previous study found significant differences between ssDNA and dsDNA phage nucleotide correlation with their hosts, but the additional 333 dsDNA and 13 ssDNA reference sequences added to GenBank since that analysis suggest there is no difference. ssDNA phages exhibited a pronounced genomic thymine bias (average 0.30 T), but nonetheless infected hosts with a range of GC contents (0.25 to 0.70) as wide as that of dsDNA phages (0.26 to 0.72).
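As a minimal sketch of the quantities behind this comparison, the following Python computes a sequence's GC fraction and the squared Pearson correlation (r²) between paired phage and host values. The function names are illustrative and are not taken from the original analysis, and the Methods do not name the software used for this step. Ignoring ambiguous bases is an assumption of the sketch.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of a sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / (gc + at)


def r_squared(xs, ys) -> float:
    """Squared Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy ** 2 / (sxx * syy)
```

In use, one list would hold the GC fraction of each phage genome and the other the GC fraction of its primary host, with one pair per phage and a separate r² for each architecture.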

Figure 1. Correlation between host and phage genomic GC content. Gray squares indicate dsDNA phages, open squares ssDNA. Best-fit linear regression lines are solid for dsDNA (r² = 0.84) and dashed for ssDNA (r² = 0.82). There was no significant difference between the correlations (p = 0.72).

Correlated GC content was a poor predictor of strong CAI match between E. coli and the coat genes of its phages. The mean CAI of ssDNA coliphages was 0.706, while the dsDNA phages were significantly better matched to E. coli (0.744, p < 0.001, Fig. 2). This number includes eight dsDNA coliphage genomes for which tail protein-encoding genes were used, rather than coat protein-encoding genes, due to the absence of properly annotated coat genes. The inclusion of tail genes did not change the results of this analysis (p < 0.001 with and without the eight tail genes). The evidence of selection for translational efficiency is therefore stronger for dsDNA phages.
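For readers unfamiliar with the index, the sketch below computes CAI in the formulation of Sharp and Li (1987): each codon's relative adaptiveness w is its count in a reference gene set divided by the count of the most used synonymous codon, and CAI is the geometric mean of w over the codons of the query gene. This is an illustration only and not the CAIcal implementation used in this study. The function names, the 0.5 pseudo-count for codons absent from the reference, and the exclusion of Met, Trp and stop codons are assumptions of the sketch.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds: str):
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def relative_adaptiveness(reference_counts):
    """Map each degenerate codon to w = count / (count of the most used synonym)."""
    w = {}
    for aa in set(CODON_TABLE.values()) - set("*MW"):
        family = [c for c, a in CODON_TABLE.items() if a == aa]
        best = max(reference_counts.get(c, 0) for c in family)
        if best == 0:
            continue  # amino acid absent from the reference set
        for codon in family:
            # Assumption: codons unseen in the reference get a pseudo-count of 0.5
            # so that a single such codon does not force CAI to zero.
            w[codon] = max(reference_counts.get(codon, 0), 0.5) / best
    return w


def cai(cds: str, w) -> float:
    """Geometric mean of w over the scored codons of a gene."""
    logs = [log(w[c]) for c in split_codons(cds) if c in w]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return exp(sum(logs) / len(logs))
```

Here `reference_counts` would be a `Counter` of codons from the host reference set, for example `Counter(split_codons(reference_cds))`, and `cai` would be applied to each phage coat or tail gene.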

Figure 2. Mean coat gene CAI with 95% confidence intervals of ssDNA (n = 11) and dsDNA (n = 34) coliphages.

Comparison of the GC content of the first two positions of each codon (GC1,2) and the third position (GC3) of these genes revealed an interesting pattern: for both ssDNA and dsDNA coliphages, GC1,2 was restricted to a tight range between about 0.45 and 0.55. dsDNA GC3 varied along a wide range, from 0.26 to 0.69, but ssDNA GC3 occupied a narrower range, from 0.30 to 0.54 (Fig. 3). Furthermore, when plotted with a line representing a perfect correlation between GC1,2 and GC3, all but one of the ssDNA phages fell to the left of that line (Fig. 3), indicating a paucity of GC in the third codon position of their coat genes. Conversely, the dsDNA coat genes were GC3-rich or GC3-poor in approximately equal numbers. Past studies have indicated that strong mutational biases often occur with low levels of CUB, possibly because a strong, non-specific mutational pressure would prevent any persistent, directional changes in the genome. The consistently lower GC3 content of the ssDNA genes suggests that a specific mutational pressure might be reducing GC3 content in a directional manner, disrupting the effects of selection for translational efficiency.
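The two position-specific measures discussed above can be sketched as follows. This is an illustrative Python function (the name `gc_by_codon_position` is hypothetical), not the CAIcal routine used in the study. It assumes an in-frame coding sequence of unambiguous bases and counts every codon supplied, including any stop codon.

```python
def gc_by_codon_position(cds: str):
    """Return (GC1,2, GC3) for an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of three")
    # Bases at codon positions 1 and 2 versus bases at codon position 3.
    first_two = [base for i, base in enumerate(cds) if i % 3 != 2]
    third = cds[2::3]
    gc12 = sum(base in "GC" for base in first_two) / len(first_two)
    gc3 = sum(base in "GC" for base in third) / len(third)
    return gc12, gc3
```

A gene whose GC3 is lower than its GC1,2 is deficient in GC at the wobble position, which is the pattern reported for nearly all of the ssDNA coat genes.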

Figure 3. GC1,2/GC3 correlation for ssDNA (open squares) and dsDNA (gray squares) coliphage coat genes. Solid line indicates perfect correlation. Points above the line indicate genes deficient in GC3, points below denote genes enriched in GC3.

We further investigated the GC3-poor nature of ssDNA coliphage coat genes with RSCU analysis, which revealed statistically significant variation in use for 15 of 59 codons between ssDNA and dsDNA phages (p < 0.03 for TTG, p < 0.002 for CTT and TCC, p < 0.001 for all other codons, Fig. 4). Notably, for four of the five codons used more frequently by ssDNA than by dsDNA coliphages, thymine was in the third position. No codons enriched in dsDNA phages relative to ssDNA phages contained thymine in the third position.

Figure 4. Mean RSCU values and 95% confidence intervals for individual codons with statistically significant differences in usage between ssDNA (open squares) and dsDNA (gray squares) coliphage coat or tail genes.

Calculation of RSCUs of coat genes in 28 ssDNA phages with a diverse host range confirmed this pattern: codons with thymine in the third position were strongly overrepresented (p < 0.001) for six amino acids (A, D, G, I, T, V) and were significantly favored (p < 0.012) in three more (H, P, S) (Fig. 5). Only one of the remaining nine degenerate amino acids had a statistically preferred codon in ssDNA phages (GAA for E, p < 0.01).

Figure 5. RSCU values and 95% confidence intervals for ssDNA phage coat gene codons that exhibited an NNT codon preference. Preferred NNT codons indicated by bold triangles, NNV codons indicated by squares.

We subdivided our data set to separately examine the two morphologically distinct families of ssDNA phages, the Inoviridae and the Microviridae. Because inoviruses are frequently vertically transmitted and can productively infect their hosts without causing lysis, they might be under increased selective pressure to match the genomes of their more permanently associated hosts. RSCU comparisons revealed no consistent patterns associated with phage lifestyle. No difference in RSCU was evident for 11 of the 16 NNT codons in these groups.

Cytosines are comparatively unstable and readily undergo spontaneous deamination to uracil, resulting in C to T transitions if the lesion is replicated before it is repaired. This spontaneous deamination occurs 100 times more frequently in ssDNA than in dsDNA, resulting in a higher mutation rate at cytosines than at other bases in ssDNA phage. ssDNA phage genomes appear to spend more time truly single-stranded, as they do not experience consistent intra-strand base pairing or regular secondary structure formation while encapsidated. As a result, ssDNA phage genomes have unpaired bases more often than ssRNA genomes, which are constrained by extensive stem-loop formation both in the cytosol and when encapsidated.

Any thymine-increasing bias does not appear to have a discernible effect on genomic nucleotide content relative to the phages’ primary hosts. Rather, it is likely that cytosine transitions in the first or second positions are subject to strong purifying selection relative to the wobble position, and the signature of this mutational bias is observed only in the overabundance of thymine in the third position of synonymous ssDNA phage codons. The significant overrepresentation of NNT codons is strongly indicative of a biased mutational pressure acting in concert with strong selection against non-synonymous substitutions.

Genomic architecture (nucleic acid, segmentation, strandedness), while acknowledged as an important characteristic of virus taxonomy, is not typically included in broad-scale analyses of viral evolution. Instead, most comparisons focus within a single kind of virus, and while many of these studies have provided insight into the codon usage biases of individual viruses, this is the first observation of a specific bias with a possible mechanistic explanation. Comparing across two architectures, we found that strandedness plays a critical role in the composition of phage genomes and in determining the limits of ssDNA viral adaptation to their hosts.

Materials and Methods

All available ssDNA and dsDNA bacteriophage genome reference sequences were collected from GenBank on March 16, 2011. Reference sequences were used to avoid biasing our data sets toward any particular phage species or highly studied phage, such as the model organisms PhiX174 or T7. These genomes were separated according to genomic architecture for further analysis. Initially, 41 ssDNA phages and 447 dsDNA phages were collected. For each phage having a known host with a sequenced genome (GenBank reference sequence), the relationship between the GC content of the phage and the host bacterium was examined. Because not every sequenced phage has an identified and sequenced host, not all phages were included in this analysis: four ssDNA phages and 44 dsDNA phages were excluded.

The codon usage biases of representative ssDNA and dsDNA phages were examined to gain a more complete picture of the CUB patterns in both architectures. Codon usage profiles were determined using major coat/capsid genes or, in the eight cases for which coat genes were not available, tail gene sequences retrieved from GenBank reference genomes. These structural proteins are highly expressed and exhibit the highest degrees of codon usage bias found in phage. We compared codon usage between the two genomic architectures for phages infecting a single host, Escherichia coli. Coat or tail genes from 11 ssDNA and 34 dsDNA phages were used.

The online CAIcal tool was used to calculate each phage’s codon adaptation index (CAI), a measure of the degree to which one gene or set of genes adheres to the CUB of another gene or set of genes, as implemented by Xia. CAI ranges from zero to one; values closer to one indicate a closer match to the reference codon usage. The average CAI was calculated for both architectures. The frequency of GC in the first and second codon positions (GC1,2) and in the third position (GC3) was calculated for these genes using CAIcal, and the relationship between the two was analyzed. A plot of GC1,2 against GC3 is a common measure of the factors affecting CUB in a gene or set of genes; a strong correlation between the two implies that genome-wide mutational pressures are the driving force behind CUB, while a weaker correlation indicates that some force is unequally affecting the first two positions and the third position. Usually, this is interpreted as implying a selective force acting on CUB, as is expected to be the case for viruses under relatively strong selection for translational speed.

To examine the variation in codon usage that contributes to the differing CAI values and site-specific base compositions, relative synonymous codon usage (RSCU) values were calculated for the same sets of genes using CAIcal. RSCU measures the usage of each synonymous codon of a degenerate amino acid relative to the level expected if all synonymous codons were used with equal frequency. An RSCU of about one indicates that a codon is used as frequently as expected, while values above or below one indicate overuse or underuse of that synonymous codon, respectively. Mean dsDNA coliphage RSCUs were compared with ssDNA coliphage RSCUs to determine the proximate cause of the observed variation in CAI. RSCU was also calculated for 17 additional sufficiently well-annotated genomes of ssDNA phages infecting a wide host range (primarily infecting Acholeplasma, Bdellovibrio, Chlamydia, Escherichia, Propionibacteria, Pseudomonas, Ralstonia, Spiroplasma, Vibrio and Xanthomonas), and the complete set of 28 ssDNA phage RSCUs was assessed for consistent CUB. For amino acids with 6-fold redundancy (L, R, S), RSCUs were calculated separately for the codon sets with 4-fold and 2-fold redundancy. Significantly biased codon use was measured for each codon with one-tailed t-tests (Microsoft Excel) and Bonferroni correction for multiple comparisons (α = 0.017 for 4-fold, α = 0.025 for 3-fold).
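The RSCU definition above, including the separate treatment of the 4-fold and 2-fold codon sets of L, R and S, can be sketched in Python as follows. This is a hedged illustration and not the CAIcal code used in the study. The names `FAMILIES` and `rscu` are hypothetical, and grouping codons by amino acid plus their first two bases is one convenient way to obtain the split, chosen here as an assumption.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Group synonymous codons by (amino acid, first two bases). In the standard code
# this keeps every family intact except L, R and S, whose six codons fall into
# one 4-fold and one 2-fold set each. Stop codons, Met and Trp are dropped.
_groups = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        _groups[(_aa, _codon[:2])].append(_codon)
FAMILIES = {key: fam for key, fam in _groups.items() if len(fam) > 1}


def rscu(cds: str):
    """RSCU per codon: observed count divided by the count expected under equal usage."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # codon set not used in this gene, RSCU undefined
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values
```

With this grouping the dictionary covers 59 codons, and a value of 3.0 for GCT in a gene with three GCT and one GCC codon means GCT is used three times as often as expected under equal usage of the four alanine codons.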
