Kord M Kober, Grant H Pogson.
Abstract
Codon usage bias has been documented in a wide diversity of species, but the relative contributions of mutational bias and various forms of natural selection remain unclear. Here, we describe for the first time genome-wide patterns of codon bias at 4623 genes in the purple sea urchin, Strongylocentrotus purpuratus. Preferred codons were identified at 18 amino acids that exclusively used G or C at third positions, which contrasted with the strong AT bias of the genome (overall GC content is 36.9%). The GC content of third positions and coding regions exhibited significant correlations with the magnitude of codon bias. In contrast, the GC content of introns and flanking regions was indistinguishable from the genome-wide background, which suggested a limited contribution of mutational bias to synonymous codon usage. Five distinct clusters of genes were identified that had significantly different synonymous codon usage patterns. A significant correlation was observed between codon bias and mRNA expression supporting translational selection, but this relationship was driven by only one highly biased cluster that represented only 8.6% of all genes. In all five clusters preferred codons were evolutionarily conserved to a similar degree despite differences in their synonymous codon usage distributions and magnitude of codon bias. The third positions of preferred codons in two codon usage groups also paired significantly more often in stems than in loops of mRNA secondary structure predictions, which suggested that codon bias might also affect mRNA stability. Our results suggest that mutational bias has played a minor role in determining codon bias in S. purpuratus and that preferred codon usage may be heterogeneous across different genes and subject to different forms of natural selection.
Keywords: antagonistic pleiotropy; codon bias; mRNA stability; mutational bias; sea urchin; translational selection
Year: 2013 PMID: 23637123 PMCID: PMC3704236 DOI: 10.1534/g3.113.005769
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Distribution of codon bias scores of (A) Nc and (B) Nc′ in S. purpuratus.
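For readers unfamiliar with the statistic plotted in panel A, the following is a minimal sketch of Wright's effective number of codons, Nc (Wright 1990), which runs from 20 (one codon used per amino acid) to 61 (all synonymous codons used equally). The function names and the input layout (one list of codon counts per amino acid) are assumptions made for illustration and are not the authors' pipeline. The sketch does not implement the background-composition correction of Nc′ (Novembre 2002), and it does not handle genes in which a whole degeneracy class is missing.

```python
from collections import defaultdict


def homozygosity(counts):
    """Wright's codon homozygosity F for one amino acid.

    counts: observed counts of each synonymous codon (hypothetical layout).
    Returns None when fewer than two codons were observed.
    """
    n = sum(counts)
    if n < 2:
        return None
    sum_sq = sum((c / n) ** 2 for c in counts)
    return (n * sum_sq - 1) / (n - 1)


def effective_number_of_codons(codon_counts):
    """Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, capped at 61.

    codon_counts: dict mapping amino acid -> list of synonymous codon counts.
    The degeneracy class is taken from the length of each list
    (Met and Trp, which have a single codon, are covered by the constant 2).
    """
    f_by_class = defaultdict(list)
    for counts in codon_counts.values():
        f = homozygosity(counts)
        # Assumption: amino acids with non-positive F are skipped.
        if f is not None and f > 0:
            f_by_class[len(counts)].append(f)

    nc = 2.0
    # (degeneracy, number of amino acids in that class in the standard code)
    for degeneracy, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = f_by_class.get(degeneracy)
        if not fs:
            raise ValueError(f"no usable amino acid with {degeneracy} codons")
        nc += n_amino_acids / (sum(fs) / len(fs))
    return min(nc, 61.0)
```

Lower values indicate stronger bias, which is why preferred codons show negative correlations with Nc-type scores.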
Genome-wide preferred synonymous codon usage in S. purpuratus
| Second Base | ||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | U | | | C | | | A | | | G | | |
| First Base | Codon | ρ | P | Codon | ρ | P | Codon | ρ | P | Codon | ρ | P |
| U | UUU (F) | 0.381 | 7.54E-160 | UCU (S) | 0.191 | 3.94E-39 | UAU (Y) | 0.387 | 9.07E-165 | |||
| UGC (C) | 0.208 | 2.18E-46 | ||||||||||
| UUA (L) | 0.453 | 2.16E-232 | UCA (S) | 0.231 | 5.12E-57 | UAA (X) | UGA (X) | |||||
| UUG (L) | 0.274 | 1.67E-80 | UCG (S) | −0.022 | 1.27E-01 | UAG (X) | UGG (W) | |||||
| C | CUU (L) | 0.208 | 2.28E-46 | CCU (P) | 0.157 | 7.61E-27 | CAU (H) | 0.242 | 2.24E-62 | CGU (R) | −0.021 | 1.46E-01 |
| CCA (P) | 0.185 | 5.08E-37 | ||||||||||
| CUA (L) | 0.253 | 1.65E-68 | CAA (Q) | 0.382 | 1.73E-160 | CGA (R) | 0.152 | 3.05E-25 | ||||
| CUG (L) | −0.390 | 2.64E-167 | CCG (P) | −0.011 | 4.64E-01 | CGG (R) | 0.062 | 2.56E-05 | ||||
| A | AUU (I) | 0.342 | 3.55E-127 | ACU (T) | 0.277 | 3.53E-82 | AAU (N) | 0.454 | 2.42E-233 | AGU (S) | 0.204 | 1.68E-44 |
| AAC (N) | AGC (S) | −0.230 | 1.61E-56 | |||||||||
| AUA (I) | 0.385 | 6.16E-163 | ACA (T) | 0.284 | 2.66E-86 | AAA (K) | 0.466 | 2.95E-248 | AGA (R) | 0.302 | 4.55E-98 | |
| AUG (M) | ACG (T) | −0.039 | 7.81E-03 | AGG (R) | −0.179 | 1.90E-34 | ||||||
| G | GUU (V) | 0.346 | 4.56E-130 | GCU (A) | 0.132 | 1.55E-19 | GAU (D) | 0.338 | 3.48E-124 | GGU (G) | 0.052 | 3.99E-04 |
| GUA (V) | 0.309 | 1.48E-102 | GCA (A) | 0.311 | 1.99E-104 | GAA (E) | 0.416 | 6.46E-193 | GGA (G) | 0.150 | 1.44E-24 | |
| GUG (V) | −0.095 | 1.17E-10 | GCG (A) | 0.045 | 2.14E-03 | 2.82E-193 | GGG (G) | 0.072 | 1.03E-06 | |||
Bold identifies the synonymous codon in a codon family with the strongest significant negative correlation.
ρ, Spearman’s correlation coefficient between the usage frequency of a synonymous codon and the overall codon bias of the gene.
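The ρ values in this table are rank correlations computed across genes. As a minimal sketch of that calculation, assuming one codon-frequency value and one codon-bias score per gene, Spearman's coefficient can be obtained as the Pearson correlation of average ranks. The function names and the toy numbers are hypothetical and are not taken from the study.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy data (hypothetical): frequency of one codon in four genes,
# and an Nc-type bias score for the same genes (lower = more biased).
codon_freq = [0.10, 0.20, 0.30, 0.40]
bias_score = [58.0, 55.0, 50.0, 45.0]
rho = spearman_rho(codon_freq, bias_score)
```

In this toy case the codon becomes more frequent as the bias score falls, so ρ is negative, the pattern the table uses to flag a preferred codon.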
Correlations between frequency of isoaccepting tRNA gene copy numbers and usage of synonymous codons and amino acids
| | Synonymous Codon | | Amino Acid | |
|---|---|---|---|---|
| Group | ρ | P | ρ | P |
| 0 | 0.6520 | 6.913e-06 | 0.6924 | 0.00101 |
| 1 | 0.6719 | 2.824e-06 | 0.6310 | 0.00377 |
| 2 | 0.6122 | 3.448e-05 | 0.6994 | 0.00086 |
| 3 | 0.6874 | 1.338e-06 | 0.5985 | 0.00678 |
| 4 | 0.5785 | 1.145e-04 | 0.7100 | 0.00066 |
| All | 0.5934 | 6.856e-05 | 0.6836 | 0.00125 |
tRNA, transfer RNA.
Figure 2. Heatmap of codon usage frequencies in the five codon usage groups (CUG) in (A) Strongylocentrotus purpuratus and (B) Drosophila melanogaster. Codon usage frequencies were centered on the equal usage expectation (e.g., for fourfold degenerate codons, frequencies were centered on 1/4). The centered frequencies were clustered by hierarchical cluster analysis, using pairwise complete linkage on Euclidean distances for both columns (CUG) and rows (synonymous codons), in Cluster 3.0 (Eisen ). Heatmap plots were generated with Java TreeView (Saldanha 2004). Synonymous codon labels are colored by the base at the third position (N3): ‘A’ and ‘T’ in pink, ‘G’ and ‘C’ in green. The asterisk (*) denotes genome-wide preferred codons in S. purpuratus (see Materials and Methods).
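The centering step described in the caption can be illustrated with a short sketch. It assumes the input is a set of codon counts for a single synonymous family; the function name and the example counts are hypothetical.

```python
def centered_frequencies(family_counts):
    """Center codon frequencies on the equal-usage expectation.

    family_counts: dict mapping each codon of ONE synonymous family to its
    count. With k codons in the family, the expectation is 1/k, so a
    fourfold degenerate family is centered on 1/4.
    """
    total = sum(family_counts.values())
    expected = 1 / len(family_counts)
    if total == 0:
        # Assumption: an unobserved family is reported as all zeros.
        return {codon: 0.0 for codon in family_counts}
    return {codon: n / total - expected for codon, n in family_counts.items()}


# Toy counts (hypothetical) for the fourfold degenerate glycine family.
glycine = {"GGU": 10, "GGC": 30, "GGA": 40, "GGG": 20}
centered = centered_frequencies(glycine)
```

Positive values mark codons used more often than equal usage predicts, negative values mark codons used less often, and the values within a family sum to zero.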
Summary of codon bias, GC content, gene size, and mRNA expression levels in the five codon usage groups
| Statistic | Group 0 | Group 1 | Group 2 | Group 3 | Group 4 | All |
|---|---|---|---|---|---|---|
| No. of loci | 396 | 861 | 1154 | 912 | 1300 | 4623 |
| Mean Nc | 49.72 | 54.54 | 53.57 | 57.63 | 54.65 | 54.46 |
| Mean Nc′ | 42.50 | 47.21 | 54.74 | 53.93 | 50.87 | 51.02 |
| Mean GCcds | 0.521 | 0.517 | 0.450 | 0.481 | 0.484 | 0.484 |
| Mean GC3 | 0.605 | 0.596 | 0.413 | 0.491 | 0.504 | 0.507 |
| Mean GCi | 0.353 | 0.356 | 0.345 | 0.356 | 0.350 | 0.351 |
| Mean GCf | 0.362 | 0.368 | 0.361 | 0.368 | 0.360 | 0.363 |
| Mean number of codons | 458.5 | 520.1 | 478.0 | 448.3 | 572.7 | 504.9 |
| Mean number of exons | 7.87 | 7.60 | 7.96 | 6.44 | 10.2 | 8.22 |
| Mean transcript length, bp | 9004 | 14,122 | 10,714 | 11,782 | 14,048 | 12,351 |
| Mean mRNA expression (AU) | 12,447.0 | 3051.0 | 3090.8 | 3142.2 | 3201.0 | 3926.0 |
mRNA, messenger RNA; AU, Arbitrary Units.
Figure 3. Plot of offsets β1 and β2 for a SCUMBLE model with four trends for S. purpuratus genes, colored by groups of genes clustered by codon usage distributions. Offset β1 is correlated most strongly with GC3 in all groups.
The contributions of individual amino acids to codon bias in S. purpuratus
| Spearman Correlation Coefficient Between sENC-X | ||||||
|---|---|---|---|---|---|---|
| Amino Acid | Group 0 | Group 1 | Group 2 | Group 3 | Group 4 | All |
| Phe | −0.2817* | −0.1190* | −0.0349 | −0.1389* | −0.0998* | −0.2320*** |
| Leu | −0.4710*** | −0.4230*** | −0.2347*** | −0.3133*** | −0.3443*** | −0.5338*** |
| Ile | −0.3479*** | −0.1645* | −0.1145* | −0.1926* | −0.2276*** | −0.4752*** |
| Val | −0.4540*** | −0.2901*** | −0.0887 | −0.2167*** | −0.2213*** | −0.3698*** |
| Ser | −0.2738* | −0.2942*** | −0.2220*** | −0.2015*** | −0.1740* | −0.0881* |
| Pro | −0.1641 | −0.1617* | −0.1618* | −0.0553 | −0.0972* | 0.0469 |
| Thr | −0.4155*** | −0.2503*** | −0.2589*** | −0.1756* | −0.2127*** | −0.1869* |
| Ala | −0.3273*** | −0.2561*** | −0.1886*** | −0.0902 | −0.1108* | −0.0987*** |
| Tyr | −0.3532*** | −0.1159* | −0.0639 | −0.1601* | −0.1461* | −0.2838*** |
| His | −0.1075 | −0.0884 | 0.0689 | 0.0327 | 0.0187 | 0.0528 |
| Gln | −0.2788* | −0.1790* | −0.1207* | −0.0667 | −0.1399* | −0.2867*** |
| Asn | −0.3426*** | −0.2335*** | −0.0085 | −0.1234* | −0.1298* | −0.2969*** |
| Lys | −0.4319*** | −0.2315*** | −0.0601 | 0.0016 | −0.1771*** | −0.3864*** |
| Asp | −0.0306 | −0.1397* | 0.0298 | 0.0013 | 0.0158 | 0.0567* |
| Glu | −0.2241* | −0.1997*** | −0.0158 | −0.0448 | −0.1125* | −0.2452*** |
| Cys | −0.0050 | −0.0663 | −0.0105 | 0.0565 | −0.0315 | 0.0003 |
| Arg | −0.3034*** | −0.1641*** | −0.3598*** | −0.1693* | −0.2573*** | −0.0451 |
| Gly | −0.1303 | −0.1055 | −0.0958 | −0.0003 | −0.1383* | −0.0213 |
sENC-X, scaled ENC-X; Phe, phenylalanine; Leu, leucine; Ile, isoleucine; Val, valine; Ser, serine; Pro, proline; Thr, threonine; Ala, alanine; Tyr, tyrosine; His, histidine; Gln, glutamine; Asn, asparagine; Lys, lysine; Asp, aspartic acid; Glu, glutamic acid; Cys, cysteine; Arg, arginine; Gly, glycine. *Significance at P < 0.001. ***Significance at P < 10^−10.
Moriyama and Powell 1997.
Novembre 2002.
Figure 4. Codon bias and GC composition in protein coding gene regions for different codon usage groups. Codon bias is represented as Wright’s Nc (Wright 1990). Regional GC composition is calculated from (A) GC3 content of exons, (B) GC content of exons, (C) GC content of introns, and (D) GC content of flanking regions. “Rp” denotes annotated ribosomal proteins. The genome-wide mean GC content is 36.9%.
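The regional GC measures in this figure are simple base-composition fractions. A minimal sketch follows, assuming an in-frame coding sequence given as a plain string; the function names and the toy sequence are hypothetical, and whether to include the stop codon is left to the caller.

```python
def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3_fraction(cds):
    """GC fraction at third codon positions of an in-frame coding sequence."""
    # Every third base starting at index 2 is a third codon position.
    return gc_fraction(cds[2::3])


# Toy coding sequence (hypothetical): ATG GCC AAA TAA
cds = "ATGGCCAAATAA"
gc_cds = gc_fraction(cds)   # G or C at 4 of 12 bases
gc3 = gc3_fraction(cds)     # third positions are G, C, A, A
```

The same `gc_fraction` helper applies unchanged to intron or flanking sequence, which is how GCi and GCf style measures would be obtained under these assumptions.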
Correlations between codon bias (Nc′), GC composition, and rates of protein evolution in S. purpuratus
| Spearman’s Correlation Coefficient for Each Group | |||||||
|---|---|---|---|---|---|---|---|
| Variable 1 | Variable 2 | Group 0 | Group 1 | Group 2 | Group 3 | Group 4 | All |
| Codon bias | |||||||
| Nc′ | GC3 | −0.7049*** | −0.6316*** | −0.2946*** | −0.432*** | −0.4489*** | −0.7325*** |
| Nc′ | GCcds | −0.4557*** | −0.2381*** | −0.0668*** | −0.187*** | −0.1392*** | −0.5584*** |
| Nc′ | GCi | 0.0279 | 0.0919 | 0.0933 | 0.2226*** | 0.1315 | −0.0195 |
| Nc′ | GCf | 0.103 | 0.0648 | 0.1394 | 0.0547 | 0.0917 | 0.0469 |
| Regional GC composition | |||||||
| GC3 | GCcds | 0.4811*** | 0.3884*** | 0.3299*** | 0.3872*** | 0.2443*** | 0.7409*** |
| GC3 | GCi | 0.0837 | 0.1491 | 0.1259 | 0.054 | 0.1021 | 0.2218*** |
| GC3 | GCf | 0.0548 | 0.0774 | 0.0424 | 0.0889 | 0.0605 | 0.0872 |
| GCcds | GCi | 0.1576 | 0.1009 | 0.0761 | 0.0619 | 0.1076 | 0.2073*** |
| GCcds | GCf | 0.0913 | 0.0377 | 0.0063 | 0.0951 | 0.024 | 0.0737 |
| GCi | GCf | 0.1221 | 0.084 | 0.1113 | 0.1109 | 0.1383 | 0.1333*** |
| Rate comparisons | |||||||
| Nc′ | −0.124 | −0.1538 | −0.0552 | −0.0243 | −0.0389 | −0.1541*** | |
| Nc′ | 0.1682 | 0.18249 | 0.0577 | 0.126 | 0.1475 | 0.1571*** | |
| Nc′ | 0.2231 | 0.2292 | 0.0667 | 0.133 | 0.1515 | 0.1993*** | |
*Significance at P < 0.001. ***Significance at P < 10^−10.
Number of genes in each group: all (4623), cluster 0 (396), cluster 1 (861), cluster 2 (1154), cluster 3 (912), cluster 4 (1300).
Number of genes with introns for each group: all (4389), cluster 0 (368), cluster 1 (814), cluster 2 (1113), cluster 3 (826), cluster 4 (1268).
Number of genes with comparative data for each group: all (2954), cluster 0 (225), cluster 1 (593), cluster 2 (744), cluster 3 (563), cluster 4 (829).
Figure 5. Synonymous codon usage bias and mRNA expression levels for genes in different codon usage groups in (A) S. purpuratus and (B) Drosophila melanogaster. Codon usage bias is Novembre’s Nc′. Red circles surround annotated ribosomal proteins (‘Rp’). The black arrow labeled “U” points to “ubiquitin-like/S30 ribosomal fusion protein” (SPU_005280), an example of a highly expressed gene in S. purpuratus. The black arrow labeled “E” points to Sp-Ets1/2 (SPU_002874), an example of a highly expressed, highly biased gene that does not belong to codon usage group 0.
Correlations between gene size and mRNA expression, codon bias (Nc′), and GC content in S. purpuratus
| Spearman’s Correlation Coefficient for Each Group | |||||||
|---|---|---|---|---|---|---|---|
| Variable 1 | Variable 2 | Group 0 | Group 1 | Group 2 | Group 3 | Group 4 | All |
| Coding and transcript lengths | |||||||
| Ncodons | mRNA level | −0.2305* | −0.1052 | −0.1366* | −0.0823 | −0.0605 | −0.1164*** |
| Ncodons | Nc′ | 0.1991* | 0.2118* | 0.0559 | 0.2119* | 0.1226* | 0.0649* |
| Ncodons | GCcds | −0.1058 | −0.1738* | 0.1404* | 0.0106 | −0.0558 | 0.0345 |
| Nexons | mRNA level | −0.1316 | −0.0437 | −0.0346 | −0.0578 | −0.0315 | −0.0507* |
| Tlength | mRNA level | −0.1379 | −0.0521 | −0.0128 | −0.0354 | −0.0233 | −0.0574* |
| Tlength | Nc′ | 0.1055 | −0.0697 | −0.0915 | −0.112* | −0.0412 | −0.049* |
mRNA, messenger RNA. *Significance at P < 0.001. ***Significance at P < 10^−10.
Number of genes in each group: all (4623), cluster 0 (396), cluster 1 (861), cluster 2 (1154), cluster 3 (912), cluster 4 (1300).
log10(maximum observed expression in AU).
Akashi’s test for the conservation of preferred codons between species
| Group | No. of Genes | M-H χ² | OR | P | P(Better Codon Set) |
|---|---|---|---|---|---|
| 0 | 225 | 31.46 | 1.287 | 1.02e-08 | < 0.001 |
| 1 | 592 | 38.46 | 1.163 | 2.70e-10 | 0.005 |
| 2 | 744 | 36.25 | 1.098 | 8.69e-10 | 0.007 |
| 3 | 562 | 24.08 | 1.137 | 4.61e-10 | < 0.001 |
| 4 | 828 | 123.84 | 1.213 | 4.59e-29 | 0.024 |
| All | 2349 | 247.77 | 1.193 | 3.98e-56 | 0.036 |
OR, odds ratio.
Woolf test on homogeneity of ORs (Woolf 1955) shows significant three-way association for all groups.
P(Better Codon Set) is the fraction of 1000 randomly generated alternate preferred synonymous codon sets having a stronger association with conserved codons than the observed preferred set (see Materials and Methods).
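The M-H χ² and OR columns are Mantel-Haenszel statistics pooled over stratified 2×2 tables. The sketch below shows the standard formulas, assuming each stratum (for example, one gene) supplies counts (a, b, c, d) for preferred versus non-preferred codons cross-classified by conserved versus non-conserved sites. That stratification, the function name, and the toy counts are assumptions for illustration; the authors' exact procedure is described in their Materials and Methods.

```python
def mantel_haenszel(tables, correction=True):
    """Mantel-Haenszel chi-square and pooled odds ratio.

    tables: iterable of (a, b, c, d) counts, one 2x2 table per stratum:
                 col 1   col 2
        row 1      a       b
        row 2      c       d
    Returns (chi2, pooled_odds_ratio). Assumes at least one informative
    stratum, so that the pooled variance and sum(b*c/n) are non-zero.
    """
    sum_a = sum_expected = sum_variance = 0.0
    or_numerator = or_denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        sum_a += a
        sum_expected += row1 * col1 / n
        sum_variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
        or_numerator += a * d / n
        or_denominator += b * c / n

    deviation = abs(sum_a - sum_expected)
    if correction:
        # Continuity correction: shrink the deviation by 0.5, not below 0.
        deviation = max(deviation - 0.5, 0.0)
    chi2 = deviation ** 2 / sum_variance
    return chi2, or_numerator / or_denominator


# Toy stratum (hypothetical counts).
chi2, odds_ratio = mantel_haenszel([(10, 5, 5, 10)])
```

An odds ratio above 1 in this layout means the row-1 category is enriched in column 1, which matches the direction of the OR values reported in the table under the assumed coding.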
Tests for stem-pairing of preferred codon N3 in mRNA secondary structure predictions
| M-H χ² | |||||
|---|---|---|---|---|---|
| Group | A3 | C3 | G3 | U3 | N3 |
| 0 | 18.83 | 261.80*** | 19.00* | 295.36 | 112.31 |
| 1 | 987.55 | 612.26*** | 164.31*** | n/a | 670.40 |
| 2 | 1810.24 | 2.02 | 254.02*** | 944.23*** | 4224.29*** |
| 3 | 806.18 | 242.88*** | 358.13*** | 1.69 | 3011.43*** |
| 4 | 2355.78 | 311.50*** | 536.99*** | 6.14 | 1.16 |
| All | n/a | 1945.18*** | 1594.94*** | n/a | 3675.23 |
mRNA, messenger RNA. *P < 0.01. ***P < 1e-10.
Mantel-Haenszel χ² test with continuity correction; the alternative hypothesis is that the third positions of preferred codons are more likely to lie in a stem than in a loop of the predicted mRNA secondary structure.