Damien Paulet, Alexandre David, Eric Rivals.
Abstract
Codon usage is biased between lowly and highly expressed genes in a genome-specific manner. This universal bias has been well assessed in some unicellular species, but remains problematic to assess in more complex species. We propose a new method to compute codon usage bias based on genome-wide translational data. A new technique based on sequencing of ribosome-protected mRNA fragments (Ribo-seq) allowed us to rank genes and compute codon usage bias with high precision for a great variety of species, including mammals. Gene ranking using Ribo-seq data confirms the influence of the tRNA pool on codon usage bias and shows a decreasing bias in multicellular species. Ribo-seq analysis also makes it possible to detect preferred codons without information on gene function.
Keywords: codon usage; evolution; high throughput sequencing; synonymous codon; translation
Year: 2017 PMID: 28168289 PMCID: PMC5499818 DOI: 10.1093/dnares/dsw062
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
Figure 1. Comparison of the strength of codon usage bias as measured by the RSCU, as originally defined by Sharp et al., and by the RSCURS, which is introduced in this work. Both measures are plotted for a subset of highly expressed genes. The RSCU is computed for each codon of an amino acid: it multiplies the relative observed frequency of this codon among all possible codons for the corresponding amino acid by the number of possible codons. In all graphs, the codons on the x-axis are ordered by increasing RSCU values computed by Sharp et al. (A) Comparison of RSCU values of Sharp and RSCURS values in C. elegans (in two replicates). Both replicate curves for RSCURS are extremely close to each other, and they closely follow that of Sharp. (B) Comparison of RSCU values of Sharp and RSCURS values in C. albicans. (C) and (D) Comparison of eight RSCURS curves obtained with codon counts computed in different ranges of codons, starting with the 20th codon and ending between the 50th and the 400th codons, in C. elegans (C) and in C. albicans (D). Apart from the smallest range of codons (i.e. [20-50]), all curves are very close to each other, thereby showing the robustness of RSCURS with respect to the range of codons taken into account. One observes that RSCURS reaches higher values in the yeast species than in the worm species. Refer to the online version for colors.
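The RSCU definition given in the Figure 1 caption can be sketched in a few lines of Python. This is only an illustration of the formula (relative frequency within a synonymous family, times the family size); it does not reproduce how the authors rank genes from Ribo-seq data to obtain RSCURS. The names `SYNONYMS`, `count_codons` and `rscu`, the two-amino-acid table, and the toy sequence are assumptions made for the example. The `start`/`end` arguments mimic the idea of restricting counts to a range of codon positions, as in panels C and D.

```python
from collections import Counter

# Hypothetical, truncated table of synonymous codon families (two amino acids only).
SYNONYMS = {
    "Phe": ["TTT", "TTC"],
    "Ile": ["ATT", "ATC", "ATA"],
}

def count_codons(cds, start=0, end=None):
    """Count codons of an in-frame coding sequence within codon positions [start, end)."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    return Counter(codons[start:end])

def rscu(codon_counts, synonyms=SYNONYMS):
    """Return {codon: RSCU}, where RSCU = (count / family total) * family size."""
    values = {}
    for codons in synonyms.values():
        total = sum(codon_counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid not observed: RSCU is undefined, so skip it
        for c in codons:
            values[c] = codon_counts.get(c, 0) / total * len(codons)
    return values

# Toy sequence: TTC TTC TTC TTT ATT ATT ATC ATA
counts = count_codons("TTCTTCTTCTTTATTATTATCATA")
values = rscu(counts)
```

With this definition, a codon used exactly as often as expected under uniform usage has an RSCU of 1, and the RSCU values of a family sum to the number of codons in that family.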
Figure 2. Differences in codon usage bias between lowly and highly translated genes. (A) The Euclidean distance between the RSCURS of lowly and highly translated genes was computed for ten species. This distance clearly partitions the species into two groups: the mean Euclidean distance is 1.51 for group 1 (left, in orange) and 6.64 for group 2 (right, in gray). Group 2 comprises C. albicans, S. cerevisiae, S. pombe and C. elegans, while group 1 comprises all vertebrates, D. melanogaster, P. falciparum and H. capsulatum. (B) For each species, comparison of the RSCURS of all codons between lowly and highly translated genes (ltg vs htg). Codons from group 1 species (containing the vertebrates) are shown as orange circles, while codons from group 2 species (containing the budding yeast) are shown as gray circles. A point near the diagonal indicates a similar behavior in ltg and htg. The farther a point lies from the diagonal, the higher the bias of that codon. Refer to the online version for colors.
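As a rough illustration of the distance used in panel A, the following sketch computes the Euclidean distance between two per-codon RSCU profiles, one for lowly and one for highly translated genes. The function name and the toy two-codon profiles are assumptions for the example; the caption does not specify details such as which codons enter the sum.

```python
import math

def euclidean_distance(rscu_ltg, rscu_htg):
    """Euclidean distance between two {codon: value} profiles over the same codons."""
    if set(rscu_ltg) != set(rscu_htg):
        raise ValueError("both profiles must cover the same set of codons")
    return math.sqrt(sum((rscu_ltg[c] - rscu_htg[c]) ** 2 for c in rscu_ltg))

# Toy profiles for one two-codon family (hypothetical values).
ltg = {"TTT": 1.0, "TTC": 1.0}   # no bias in lowly translated genes
htg = {"TTT": 0.4, "TTC": 1.6}   # bias toward TTC in highly translated genes
distance = euclidean_distance(ltg, htg)
```

A distance near zero means both gene sets use synonymous codons in the same proportions; larger values correspond to points lying farther from the diagonal in panel B.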
Figure 3. (A) Preferred codons in highly translated genes. To ease the comparison of the bias across codons, we plotted the codon frequency such that each codon has a value in [0,1]. The frequency is the RSCU divided by the number of codons for the corresponding amino acid. The range of colors goes from blue for a frequency of 0 to red for a frequency of 1. Same species abbreviations as in Fig. 1. (B) Clustering of the ten species based on tRNA copy numbers. (C) Clustering of the ten species based on RSCURS in highly translated genes. Both clusterings were performed using the UPGMA algorithm in R (see Methods section). (D) Variation of the number of codons lacking tRNA with the size of the species' tRNA pool. Linear regression lines are solid for codons with no tRNA (R² = 0.69) and dotted for preferred codons with no tRNA (R² = 0.53). H. capsulatum was excluded from both regressions. The regression line for all codons decreases sharply (slope = −0.026), while the regression line for preferred codons remains flat (slope = −0.002). Refer to the online version for colors.
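Because the RSCU is the relative frequency of a codon within its synonymous family multiplied by the family size (see the Figure 1 caption), mapping an RSCU back to a value in [0,1] amounts to dividing by that size. A minimal sketch, with a hypothetical function name and a truncated, illustrative codon table:

```python
def rscu_to_frequency(rscu_values, synonyms):
    """Map each RSCU back to a relative frequency in [0, 1] within its family."""
    freqs = {}
    for codons in synonyms.values():
        for c in codons:
            if c in rscu_values:
                freqs[c] = rscu_values[c] / len(codons)
    return freqs

# Hypothetical families and RSCU values, for illustration only.
synonyms = {"Phe": ["TTT", "TTC"], "Ile": ["ATT", "ATC", "ATA"]}
rscu_values = {"TTT": 0.5, "TTC": 1.5, "ATT": 1.5, "ATC": 0.75, "ATA": 0.75}
freqs = rscu_to_frequency(rscu_values, synonyms)
```

The rescaling makes codons from families of different sizes comparable on one color scale: a value of 1 means the codon is the only one used for its amino acid.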
Preferred codons per organism in highly (A) and lowly (B) translated genes
A.

| | Hs | Mm | Rn | Dm | Hc | Ce | Sp | Sc | Ca | Pf |
|---|---|---|---|---|---|---|---|---|---|---|
| Phe | TTC | TTC | TTC | TTC | TTC | TTC | TTC | TTC | TTC | |
| His | CAC | CAC | CAC | CAC | CAC | CAC | CAC | CAC | CAC | CAT |
| Tyr | TAC | TAC | TAC | TAC | TAC | TAC | TAC | TAC | TAC | |
| Asn | AAC | AAC | AAC | AAC | AAC | AAC | AAC | AAC | AAC | |
| Lys | AAG | AAG | AAG | AAG | AAG | AAG | AAG | AAG | AAA | AAA |
| Cys | TGC | TGC | TGC | TGC | TGC | TGC | TGC | TGT | ||
| Glu | GAG | GAG | GAG | GAG | GAG | GAG | GAG | GAA | GAA | GAA |
| Ala | GCT | GCT | GCT | GCT | ||||||
| Ser | TCT | TCT | TCT | TCA | ||||||
| Thr | ACT | ACT | ACT | ACA | ||||||
| Ile | ATT | ATT | ATT | |||||||
| Gly | GGC | GGC | GGC | GGC | GGA | |||||
| Pro | CCC | CCA | CCT | CCA | CCA | CCA | ||||
| Gln | CAA | CAA | CAA | CAA | CAA | |||||
| Leu | CTG | CTG | TTG | TTG | TTA | |||||
| Val | GTG | GTG | GTG | GTT | GTT | GTT | GTT | |||
| Arg | CGT | CGT | AGA | AGA | AGA | |||||
| Asp | GAC | GAC | GAC | GAC | GAC | |||||
Color code: codons shared with M. musculus are on a blue background; codons shared with C. albicans are on a red background. A: bolded codons with a star are preferred codons for which a codon with more acceptors exists. B: bolded and underlined codons are identical in lowly and highly translated genes. The following abbreviations of species names are used throughout this article: H. sapiens (Hs), M. musculus (Mm), R. norvegicus (Rn), D. melanogaster (Dm), C. elegans (Ce), H. capsulatum (Hc), S. pombe (Sp), S. cerevisiae (Sc), C. albicans (Ca), and P. falciparum (Pf). Refer to the online version for colors.
Comparison of favourite codons and tRNA copy numbers
| Species | Number of tRNAs | Number of favourite codons having the highest tRNA copy number (out of 18 cases) | (%) | Number of favourite codons lacking tRNA | Number of codons lacking tRNA |
|---|---|---|---|---|---|
| | 589 | 9 | 50.0 | 2 | 6 |
| | 563 | 12 | 66.7 | 6 | 15 |
| | 441 | 12 | 66.7 | 2 | 8 |
| | 385 | 10 | 55.6 | 5 | 13 |
| | 268 | 12 | 66.7 | 6 | 18 |
| | 259 | 15 | 83.3 | 3 | 20 |
| | 158 | 15 | 83.3 | 2 | 16 |
| | 119 | 15 | 83.3 | 3 | 20 |
| | 71 | 8 | 44.4 | 5 | 16 |
| | 33 | 14 | 77.8 | 7 | 26 |
(The codons for Stop, Met and Trp are excluded.) tRNA: transfer RNA. Copy numbers of tRNA genes come from GtRNAdb 2.0. The percentage of favourite codons (one per amino acid) that have the highest tRNA copy number is at least 50% for all species but H. capsulatum.
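The percentage column of the table can be rechecked from the counts: each value is the number of favourite codons with the highest tRNA copy number divided by the 18 amino acids considered. The short sketch below uses the counts in the row order of the table; the variable names are assumptions for the example.

```python
N_AMINO_ACIDS = 18  # Stop, Met and Trp are excluded

# Favourite codons having the highest tRNA copy number, in table row order.
matches = [9, 12, 12, 10, 12, 15, 15, 15, 8, 14]

# Percentage per row, rounded to one decimal as in the table.
percentages = [round(100 * m / N_AMINO_ACIDS, 1) for m in matches]

# Rows in which fewer than half of the favourite codons have the top tRNA copy number.
below_half = [m for m in matches if m / N_AMINO_ACIDS < 0.5]
```

Only one row (8 out of 18, i.e. 44.4%) falls below one half.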
Figure 4. Genome percentage of GC bases (GC %) versus GC % of the last base of preferred codons. Same species abbreviations as in Fig. 1.
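The two quantities compared in Figure 4 can be sketched as follows: the GC % of a sequence, and the GC % of the last base of a set of preferred codons. The function names, the toy sequence and the codon list are hypothetical and chosen only to illustrate the computation.

```python
def gc_percent(seq):
    """Percentage of G or C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in ("G", "C") for base in seq) / len(seq)

def gc_last_base_percent(codons):
    """GC % computed over the last base of each codon in the list."""
    return gc_percent("".join(codon[-1] for codon in codons))

# Toy inputs (not taken from any real genome).
genome_gc = gc_percent("ATGCATGC")
preferred_gc3 = gc_last_base_percent(["TTC", "CAT", "AAA", "GAG"])
```

Plotting one value against the other for each species gives one point per species, as in the figure.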