Eli Goz1,2, Oriah Mioduser1, Alon Diament1, Tamir Tuller1,2,3.
Abstract
Deciphering the way gene expression regulatory aspects are encoded in viral genomes is a challenging mission with ramifications related to all biomedical disciplines. Here, we aimed to understand how evolution shapes the bacteriophage lambda genes by performing a high-resolution analysis of ribosomal profiling data and gene expression related synonymous/silent information encoded in bacteriophage coding regions. We demonstrated evidence of selection for distinct compositions of synonymous codons in early and late viral genes related to the adaptation of translation efficiency to different bacteriophage developmental stages. Specifically, we showed that evolution of viral coding regions is driven, among other factors, by selection for codons with higher decoding rates; during the initial/progressive stages of infection the decoding rates in early/late genes were found to be superior to those in late/early genes, respectively. Moreover, we argued that selection for translation efficiency could be partially explained by adaptation to the Escherichia coli tRNA pool and the fact that it can change during the bacteriophage life cycle. An analysis of additional aspects related to the expression of viral genes, such as mRNA folding and more complex/longer regulatory signals in the coding regions, is also reported. The reported conclusions are likely to be relevant to additional viruses as well.
Keywords: bacteriophage fitness; bacteriophage genome; coding regions evolution; decoding rates; ribosomes and tRNAs
Year: 2017 PMID: 28338832 PMCID: PMC5737525 DOI: 10.1093/dnares/dsx005
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1 (A) A flow diagram and illustration of the study (see details in the main text). (B) Relative expression level of each of the gene groups (early/late) in read counts per nucleotide.
Figure 2 (A) Selection for different codons in early and late genes. A plot of two principal components of the RSCF vectors of all bacteriophage and E. coli genes. Bacteriophage early (blue) and late (red) genes and E. coli (green) genes tend to cluster into distinct groups according to their synonymous codon compositions. The separation between the groups was measured by the Davies–Bouldin index and found to be significantly higher than expected at random (P < 0.01; see Methods). The separation between the groups of early and late viral genes was visualized by a maximum margin separation line, that is, a line for which the distance between it and the nearest point from either of the groups is maximized. (B) Selection for translation elongation efficiency in bacteriophage coding regions. At each time point, the average MTDR values of wild-type early/late genes (vertical bars) were compared with the MTDR values of 100 corresponding randomized variants (histograms). The average wild-type MTDR values of each group are significantly higher (P < 0.05) than expected at random. The late genes were sampled to control for the length factor (see Methods; see also Supplementary Fig. S1). (C) Adaptation of translation elongation efficiency in early and late genes to different bacteriophage developmental stages. The relative translation elongation efficiency coefficient (RTEC), computed from the MTDR of early and late genes, as a function of time from the beginning of the lytic stage (0–20 min). The RTEC of early genes is higher at the beginning and becomes lower with time (as expected); the first point (t = 0), when there are no measurements of expression, is ignored (see also Fig. 1B). (D) Correlation between codon typical decoding rates (TDR) and relative synonymous codon frequencies (RSCF) at different time points for all, early and late viral genes and all E. coli genes. Time points with significant correlations (Spearman P-value < 0.05) are marked by an asterisk. For early genes, the correlation is higher than for late and E. coli genes and is significant (P-value < 0.05) for the first time points. No significant correlation is seen for late genes except at the first time point. For E. coli, the correlation is significant up to 10 min (except at 2 min). (E) Selection for adaptation to the E. coli tRNA pool in both early and late genes. Average tAI values of wild-type early (blue)/late (red) genes (vertical bars) were compared with the tAI values of 1,000 corresponding randomized variants (histograms). The average wild-type tAI values of each group are significantly higher (P < 0.001) than expected at random. The late genes were sampled to control for the length factor (see Methods). (F) Correlation between TDR and tAI values for each codon at different time points is significant (P-value < 0.05).
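The Figure 2 caption relies on two building blocks that can be sketched briefly: a per-gene RSCF vector, and a comparison of a wild-type statistic against randomized variants. The Python sketch below is illustrative only and is not the authors' code. It assumes RSCF means the frequency of each codon among the codons encoding the same amino acid (the paper's Methods may define it differently), and it assumes a one-sided empirical P-value with a +1 correction. The function names `rscf` and `empirical_p_value` are hypothetical.

```python
from collections import Counter

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def rscf(cds):
    """Return a dict codon -> frequency among its synonymous codons.

    Assumption: RSCF is normalized per amino acid; stop codons ('*')
    are treated as one synonymous family. Codons containing characters
    outside TCAG are ignored.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    family_totals = Counter()
    for codon, n in counts.items():
        if codon in CODON_TABLE:
            family_totals[CODON_TABLE[codon]] += n
    return {
        codon: (counts[codon] / family_totals[aa]) if family_totals[aa] else 0.0
        for codon, aa in CODON_TABLE.items()
    }


def empirical_p_value(wild_type, randomized):
    """One-sided empirical P-value: share of randomized values >= wild type.

    Assumption: the +1 correction is a common convention, not
    necessarily the one used in the paper.
    """
    at_least = sum(1 for value in randomized if value >= wild_type)
    return (at_least + 1) / (len(randomized) + 1)
```

For example, `rscf("CTGCTGCTA")` assigns leucine codon `CTG` a frequency of 2/3 and `CTA` a frequency of 1/3. The 64 resulting values form the kind of vector that could then be passed to a principal component analysis.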
Figure 3 (A and B) Profiles of folding energy (average MFE in all windows of 39-nt length) across the bacteriophage genome (blue) vs. an averaged profile corresponding to 100 randomized variants (black) based on dinucleotide-preserving randomization; the window index denotes the distance (in nucleotides) from the beginning of the ORF to the beginning of the window. Regions where the folding energy of the wild-type genome is significantly higher (red) or lower (green) than in randomized variants are marked at the bottom of the figure. (A) The profiles include the 5′-UTR near the beginning of the ORF (negative window indexes). (B) The profiles include the 3′-UTR near the end of the ORF (positive window indexes). (C) Histograms of mean local folding energies (folding energies averaged over all the windows of each gene) compared with randomized mean local folding energies obtained from two models: (i) protein + dinucleotide preserving and (ii) protein + codon usage bias preserving (see also Supplementary Fig. S3). (D) Histograms of log[ARS index]. Eight analyses were performed, combining two types of reference genome (bacterial and viral), two types of randomization (dinucleotide and codon) and two groups of genes (early and late). In each histogram, the wild-type distribution is compared with the mean random distribution (1,000 random genomes). The P-values were calculated according to the Wilcoxon signed-rank test.
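The sliding-window profile described in the Figure 3 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline. The energy function is passed in as a parameter, because a real analysis would compute the minimum free energy (MFE) with an RNA folding tool. The `toy_energy` function is a hypothetical stand-in that only counts G and C bases and is not a folding energy. The names `window_profile` and `mean_local_energy` are also hypothetical.

```python
from statistics import mean


def window_profile(seq, energy_fn, window=39):
    """Apply energy_fn to every window of the given length, step 1 nt.

    The list index plays the role of the window index in the caption:
    the distance from the start of the sequence to the window start.
    """
    if len(seq) < window:
        raise ValueError("sequence is shorter than the window length")
    return [energy_fn(seq[i:i + window]) for i in range(len(seq) - window + 1)]


def mean_local_energy(seq, energy_fn, window=39):
    """Average the per-window values over all windows of one sequence."""
    return mean(window_profile(seq, energy_fn, window))


def toy_energy(subseq):
    """Placeholder only: minus the G+C count. NOT a real MFE estimate."""
    return -float(subseq.count("G") + subseq.count("C"))
```

A 60-nt sequence yields 60 - 39 + 1 = 22 windows. Replacing `toy_energy` with a call to a folding program, and repeating the computation on randomized variants, would give the two profiles that the caption compares.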