Carrie A Whittle, Cassandra G Extavour.
Abstract
In protein-coding genes, synonymous codon usage and amino acid composition correlate with expression level in some eukaryotes, a pattern that may result from translational selection. Here, we studied large-scale RNA-seq data from three divergent arthropod models, namely the cricket (Gryllus bimaculatus), the milkweed bug (Oncopeltus fasciatus), and the amphipod crustacean Parhyale hawaiensis, and tested for optimization of codon and amino acid usage relative to expression level. We report strong signals of AT3 optimal codons (those favored in highly expressed genes) in G. bimaculatus and O. fasciatus, whereas weaker signs of GC3 optimal codons were found in P. hawaiensis, suggesting selection on codon usage in all three organisms. Further, in G. bimaculatus and O. fasciatus, high expression was associated with lowered frequency of amino acids with large size/complexity (S/C) scores in favor of those with intermediate S/C values; thus, selection may favor smaller amino acids while retaining those of moderate size for protein stability or conformation. In P. hawaiensis, highly transcribed genes had elevated frequency of amino acids with large and small S/C scores, suggesting a complex dynamic in this crustacean. In all species, the highly transcribed genes appeared to favor short proteins, high optimal codon usage, and specific amino acids, and were preferentially involved in cell-cycling and protein synthesis. Together, based on examination of 1,680,067, 1,667,783, and 1,326,896 codon sites in G. bimaculatus, O. fasciatus, and P. hawaiensis, respectively, we conclude that translational selection shapes codon and amino acid usage in these three pancrustacean arthropods.
Keywords: Gryllus bimaculatus; Oncopeltus fasciatus; Parhyale hawaiensis; optimal codons; translational selection
Year: 2015 PMID: 26384771 PMCID: PMC4632051 DOI: 10.1534/g3.115.021402
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. The GC3 content for the 5% most highly and lowly expressed genes for each of the three species under study. Different letters indicate a statistically significant difference between highly and lowly expressed genes within each species (P < 0.05 using t-tests).
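As a rough illustration of the GC3 measure, the sketch below computes the fraction of codons whose third position is G or C. The function name `gc3_content` is hypothetical, and counting every complete codon (including Met, Trp, and stop codons) is an assumption; the record does not state which codons were included.

```python
def gc3_content(cds: str) -> float:
    """Fraction of complete codons whose third base is G or C.

    Assumption: every complete codon is counted; the study may have
    excluded some codons (e.g., Met, Trp, stops), which is not stated here.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    thirds = [cds[i + 2] for i in range(0, usable, 3)]
    if not thirds:
        raise ValueError("sequence contains no complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


if __name__ == "__main__":
    # Toy sequence: codons GCT, GCC, AAG -> third bases T, C, G
    print(gc3_content("GCTGCCAAG"))
```

AT3 is simply the complement under the same counting rule, so a gene with a GC3 of 0.4 has an AT3 of 0.6.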
Table 1. The difference (Δ) in mean RSCU for the 5% most highly vs. lowly expressed genes in Gryllus bimaculatus, Oncopeltus fasciatus, and Parhyale hawaiensis
| | Gryllus | Oncopeltus | Parhyale | ||||
| GC3/AT3 optimal codons | AT3 | AT3 | GC3 | ||||
| No. optimal codons | 17 | 16 | 13 | ||||
| Amino acid | Codon | ∆RSCU | ∆RSCU | ∆RSCU | |||
| Ala | GCT | *** | *** | +0.048 | |||
| Ala | GCC | −0.060 | −0.081 | * | ** | ||
| Ala | GCA | −0.055 | −0.120 | ** | +0.012 | ||
| Ala | GCG | −0.123 | *** | −0.101 | ** | −0.079 | |
| Arg | CGT | ** | ** | +0.060 | |||
| Arg | CGC | +0.013 | −0.101 | ** | −0.001 | ||
| Arg | CGA | +0.081 | −0.019 | −0.051 | |||
| Arg | CGG | +0.029 | −0.047 | −0.163 | |||
| Arg | AGA | −0.259 | ** | +0.050 | +0.122 | ||
| Arg | AGG | −0.019 | +0.117 | +0.064 | |||
| Asn | AAT | ** | *** | −0.054 | ** | ||
| Asn | AAC | −0.060 | * | −0.110 | *** | *** | |
| Asp | GAT | *** | *** | +0.078 | |||
| Asp | GAC | −0.091 | ** | −0.113 | *** | −0.031 | |
| Cys | TGT | ** | *** | −0.167 | ** | ||
| Cys | TGC | −0.033 | +0.003 | −0.013 | |||
| Gln | CAA | −0.070 | * | +0.063 | −0.064 | ** | |
| Gln | CAG | ** | −0.013 | ** | |||
| Glu | GAA | * | *** | −0.005 | |||
| Glu | GAG | −0.021 | −0.091 | ** | +0.030 | ||
| Gly | GGT | *** | *** | +0.021 | |||
| Gly | GGC | −0.018 | −0.183 | *** | +0.017 | ||
| Gly | GGA | +0.011 | −0.026 | * | |||
| Gly | GGG | −0.133 | ** | −0.074 | * | −0.127 | *** |
| His | CAT | ** | ** | +0.013 | |||
| His | CAC | −0.034 | −0.029 | −0.035 | |||
| Ile | ATT | *** | *** | −0.027 | |||
| Ile | ATC | −0.135 | ** | −0.033 | *** | ||
| Ile | ATA | −0.125 | ** | −0.100 | ** | −0.112 | ** |
| Leu | TTA | −0.034 | +0.073 | * | −0.226 | *** | |
| Leu | TTG | *** | +0.017 | +0.053 | |||
| Leu | CTT | +0.068 | *** | +0.004 | |||
| Leu | CTC | −0.217 | *** | −0.147 | ** | *** | |
| Leu | CTA | −0.075 | −0.112 | ** | −0.064 | ** | |
| Leu | CTG | −0.021 | −0.177 | *** | +0.087 | ||
| Lys | AAA | −0.032 | +0.019 | −0.100 | *** | ||
| Lys | AAG | +0.059 | −0.002 | *** | |||
| Phe | TTT | *** | −0.075 | ** | |||
| Phe | TTC | −0.015 | −0.078 | ** | *** | ||
| Pro | CCT | +0.158 | * | ** | +0.164 | ||
| Pro | CCC | −0.190 | *** | −0.052 | ** | ||
| Pro | CCA | ** | −0.009 | −0.108 | * | ||
| Pro | CCG | −0.038 | −0.124 | *** | −0.129 | ** | |
| Ser | TCT | *** | *** | +0.015 | |||
| Ser | TCC | −0.200 | *** | −0.142 | ** | −0.023 | |
| Ser | TCA | −0.024 | +0.123 | * | +0.038 | ** | |
| Ser | TCG | −0.007 | −0.127 | *** | ** | ||
| Ser | AGT | +0.024 | −0.081 | −0.079 | ** | ||
| Ser | AGC | −0.090 | −0.131 | ** | −0.040 | ||
| Thr | ACT | ** | *** | +0.033 | |||
| Thr | ACC | −0.104 | ** | −0.122 | ** | ** | |
| Thr | ACA | +0.040 | −0.033 | −0.091 | |||
| Thr | ACG | −0.114 | ** | −0.080 | ** | −0.052 | |
| Tyr | TAT | *** | −0.106 | ||||
| Tyr | TAC | +0.061 | −0.140 | *** | |||
| Val | GTT | *** | *** | −0.031 | |||
| Val | GTC | −0.141 | *** | −0.100 | ** | * | |
| Val | GTA | −0.126 | ** | −0.108 | * | −0.121 | ** |
| Val | GTG | +0.037 | −0.041 | +0.069 | ** | ||
The codon identified as the primary optimal codon for each amino acid is in boldface. RSCU, relative synonymous codon usage.
Asterisks indicate P values from t-tests, where **P < 0.05 and ***P < 0.001. Codons with *0.05 ≤ P < 0.1 are also indicated and considered putative optimal codons. The means and standard errors for highly and for lowly expressed CDS are provided in Table S2. Species are abbreviated using their genus name.
The codons TTT and TAT for G. bimaculatus and TAC for P. hawaiensis are identified as candidate optimal codons with P values at or slightly above 0.1.
For the amino acid Pro in G. bimaculatus, CCA was selected as the optimal codon because it had a lower P value than CCT, although both exhibit signals of being optimal codons.
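The ΔRSCU values in the table can be illustrated with a minimal sketch. It uses the standard definition of RSCU (observed codon count divided by the count expected under equal use of all synonymous codons for that amino acid) and takes ΔRSCU as the mean RSCU in highly expressed CDS minus the mean in lowly expressed CDS. The function names `rscu` and `delta_rscu` are hypothetical, and skipping a CDS for an amino acid it does not encode is an assumption, not a detail confirmed by the record.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families: amino acids encoded by more than one codon.
FAMILIES = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES[aa].append(codon)
FAMILIES = {aa: cs for aa, cs in FAMILIES.items() if len(cs) > 1}


def rscu(cds: str) -> dict:
    """RSCU per codon for one CDS (amino acids absent from the CDS are skipped)."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        if total:
            for c in codons:
                values[c] = counts[c] * len(codons) / total
    return values


def delta_rscu(high: list, low: list) -> dict:
    """Mean RSCU across highly expressed CDS minus mean across lowly expressed CDS."""
    def mean_rscu(seqs):
        sums, n = defaultdict(float), defaultdict(int)
        for seq in seqs:
            for codon, value in rscu(seq).items():
                sums[codon] += value
                n[codon] += 1
        return {c: sums[c] / n[c] for c in sums}

    mean_high, mean_low = mean_rscu(high), mean_rscu(low)
    return {c: mean_high[c] - mean_low[c] for c in mean_high if c in mean_low}


if __name__ == "__main__":
    # Toy sequences only; not data from the study.
    print(delta_rscu(["GCTGCTGCTGCT"], ["GCCGCCGCCGCC"]))
```

Under this definition a positive ΔRSCU marks a codon used more often in highly expressed genes, which is the property the table uses to flag candidate optimal codons once the difference is also statistically supported.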
Figure 2. The average frequency of optimal codons (Fop) relative to expression level for the three species of invertebrates. Expression of coding sequences was categorized as low (below the 5th percentile), moderate (between the 5th and 95th percentiles), and high (above the 95th percentile). Error bars represent standard errors and are very small.
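A minimal sketch of the Fop calculation may help here: Fop is the number of optimal codons in a CDS divided by the number of codons that could have been optimal. The function name `fop` and the toy codon sets are hypothetical, and restricting the denominator to codons of amino acids that have an identified optimal codon is an assumption about the counting rule.

```python
def fop(cds: str, optimal: set, synonymous: set) -> float:
    """Frequency of optimal codons in one CDS.

    optimal    -- codons identified as optimal
    synonymous -- all codons of the amino acids that have an optimal codon
                  (assumed denominator; other codons are ignored)
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    scored = [c for c in codons if c in synonymous]
    if not scored:
        raise ValueError("no codons from the scored amino acids")
    return sum(c in optimal for c in scored) / len(scored)


if __name__ == "__main__":
    # Toy sets for illustration only (Ala and Asp codons).
    optimal = {"GCT", "GAT"}
    synonymous = {"GCT", "GCC", "GCA", "GCG", "GAT", "GAC"}
    print(fop("GCTGCCGATATG", optimal, synonymous))
```

In the toy call, ATG is ignored because Met has no synonymous alternative, so two of the three scored codons are optimal.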
Table 2. The Spearman rank correlations between the frequency of each amino acid per CDS and the Fop
| Amino acid (Gryllus) | R | P | Amino acid (Oncopeltus) | R | P | Amino acid (Parhyale) | R | P |
|---|---|---|---|---|---|---|---|---|
| Arg | −0.198 | <0.001 | Arg | −0.252 | <0.001 | Ser | −0.377 | <0.001 |
| Thr | −0.123 | <0.001 | Gly | −0.188 | <0.001 | Cys | −0.132 | <0.001 |
| Pro | −0.122 | <0.001 | Ala | −0.183 | <0.001 | Thr | −0.119 | <0.001 |
| Ser | −0.110 | <0.001 | Pro | −0.161 | <0.001 | Leu | −0.102 | <0.001 |
| Ala | −0.093 | <0.001 | Leu | −0.183 | <0.001 | Arg | −0.021 | 0.085 |
| Leu | −0.083 | <0.001 | Thr | −0.099 | <0.001 | Pro | −0.021 | 0.079 |
| Gly | −0.061 | <0.001 | Met | −0.061 | <0.001 | Val | −0.011 | 0.347 |
| Met | −0.049 | <0.001 | Trp | −0.059 | <0.001 | Asn | −0.008 | 0.512 |
| Gln | −0.019 | 0.155 | Val | −0.054 | <0.001 | His | −0.003 | 0.778 |
| Trp | −0.008 | 0.571 | Ser | −0.038 | 0.003 | Ile | 0.034 | 0.005 |
| His | −0.003 | 0.850 | Gln | −0.032 | 0.012 | Ala | 0.060 | <0.001 |
| Cys | 0.029 | 0.033 | His | −0.020 | 0.110 | Gln | 0.065 | <0.001 |
| Val | 0.040 | 0.003 | Cys | 0.035 | 0.006 | Trp | 0.089 | <0.001 |
| Phe | 0.050 | <0.001 | Tyr | 0.059 | <0.001 | Phe | 0.090 | <0.001 |
| Tyr | 0.079 | <0.001 | Phe | 0.085 | <0.001 | Met | 0.103 | <0.001 |
| Ile | 0.115 | <0.001 | Ile | 0.198 | <0.001 | Glu | 0.114 | <0.001 |
| Lys | 0.151 | <0.001 | Glu | 0.211 | <0.001 | Lys | 0.121 | <0.001 |
| Asn | 0.169 | <0.001 | Asp | 0.226 | <0.001 | Gly | 0.141 | <0.001 |
| Asp | 0.269 | <0.001 | Lys | 0.272 | <0.001 | Tyr | 0.158 | <0.001 |
| Glu | 0.270 | <0.001 | Asn | 0.287 | <0.001 | Asp | 0.195 | <0.001 |
Species are abbreviated using their genus name. CDS, coding sequence; Fop, frequency of optimal codons.
Amino acids are listed from the most negative to positive R values for each species.
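For readers unfamiliar with the statistic, the sketch below computes a Spearman rank correlation as the Pearson correlation of average ranks, which handles tied values. The function names are hypothetical, and the inputs stand in for per-CDS amino acid frequencies and Fop values; the record does not state which software the authors used.

```python
from math import sqrt


def average_ranks(values: list) -> list:
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1              # mean rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_r(x: list, y: list) -> float:
    """Spearman R: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy values: amino acid frequency per CDS vs. Fop per CDS.
    print(spearman_r([0.02, 0.05, 0.03, 0.08], [0.40, 0.55, 0.45, 0.70]))
```

Because only ranks enter the calculation, the statistic is insensitive to the scale of the frequencies and captures any monotonic relationship with Fop.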
Figure 3. The two amino acids with the largest positive (left) and negative (right) correlation to Fop/expression level in (A) G. bimaculatus; (B) O. fasciatus; and (C) P. hawaiensis. Fop was binned into four categories as shown. Spearman R correlations in Table 2 were calculated with the use of all (unbinned) data points.
Figure 4. Bar and whisker plots of the effective number of codons (ENC) and GC3 content of Drosophila melanogaster orthologs of coding sequences with low, moderate, and high expression in (A) G. bimaculatus; (B) O. fasciatus; and (C) P. hawaiensis. P-values of ranked analysis of variance were <0.0003 for each figure. Different letters in each figure indicate paired differences using Dunn’s contrast (P < 0.05).
Table 3. Functional clustering of the most highly expressed CDS (above the 95th percentile) for each of the three arthropod species under study, using their orthologs in Drosophila melanogaster and the gene ontology system DAVID (Huang da ,b)
| G. bimaculatus | P | O. fasciatus | P | P. hawaiensis | P |
|---|---|---|---|---|---|
| | | Enrichment Score: 13.46 | | Enrichment Score: 10.7 | |
| Proteasome regulatory particle | 6.20E–5 | Mitotic spindle organization | 5.70E–18 | Ribonucleoprotein | 3.20E–16 |
| Proteasome accessory complex | 8.00E–5 | Mitotic cell cycle | 2.70E–17 | Ribosomal protein | 6.80E–16 |
| Proteasome complex | 3.70E–4 | Spindle organization | 3.80E–17 | Structural constituent of ribosome | 3.00E–13 |
| | | Microtubule cytoskeleton organization | 2.40E–16 | Ribosomal subunit | 7.90E–11 |
| Spindle organization | 6.30E–6 | Microtubule-based process | 1.20E–14 | Ribosome | 1.10E–10 |
| Cell-cycle process | 3.80E–5 | Cell cycle | 2.10E–12 | Cytosolic ribosome | 3.80E–10 |
| Mitotic spindle organization | 7.10E–5 | M phase | 8.60E–12 | Ribosome | 6.10E–10 |
| Mitotic cell cycle | 8.10E–5 | Cytoskeleton organization | 1.90E–11 | Structural molecule activity | 1.20E–9 |
| Cell cycle | 1.20E–4 | Cell-cycle process | 2.00E–11 | Large ribosomal subunit | 1.70E–9 |
| Microtubule cytoskeleton organization | 2.20E–4 | Cell-cycle phase | 2.50E–11 | Cytosolic part | 3.70E–8 |
| Microtubule-based process | 4.40E–4 | Enrichment Score: 6.03 | | Enrichment Score: 7.13 | |
| M phase | 5.70E–4 | Cytosolic ribosome | 7.40E–9 | Mitotic spindle elongation | 5.40E–12 |
| Cell-cycle phase | 9.30E–4 | Ribonucleoprotein | 9.20E–9 | Spindle elongation | 6.50E–12 |
| | | Ribosomal protein | 1.30E–8 | Mitotic spindle organization | 4.10E–10 |
| Proteasome | 1.00E–5 | Ribosome | 4.10E–8 | Microtubule cytoskeleton organization | 2.00E–9 |
| Proteasome complex | 3.70E–4 | Cytosolic large ribosomal subunit | 9.50E–8 | Spindle organization | 4.50E–9 |
| Proteasome | 3.20E–3 | Structural constituent of ribosome | 1.80E–6 | Microtubule-based process | 9.90E–8 |
| | | Ribosomal subunit | 1.60E–5 | Mitotic cell cycle | 2.50E–7 |
| atp-binding | 1.20E–6 | Large ribosomal subunit | 5.10E–5 | Cytoskeleton organization | 2.60E–7 |
| Nucleotide-binding | 5.70E–6 | Ribosome | 6.90E–5 | M phase | 2.20E–5 |
| Adenyl nucleotide binding | 8.80E–4 | Structural molecule activity | 1.40E–3 | Cell-cycle phase | 3.70E–5 |
| Purine nucleoside binding | 9.70E–4 | Enrichment Score: 4.44 | | Cell-cycle process | 1.40E–4 |
| ATP binding | 9.80E–4 | Nucleotide-binding | 5.40E–8 | Cell cycle | 2.50E–4 |
| Adenyl ribonucleotide binding | 1.00E–3 | Atp-binding | 1.70E–7 | Enrichment Score: 3.63 | |
| Nucleoside binding | 1.10E–3 | Nucleotide binding | 7.40E–6 | Transit peptide | 4.30E–5 |
| Purine nucleotide binding | 1.70E–3 | Purine nucleotide binding | 5.90E–5 | Mitochondrion | 4.50E–5 |
| Ribonucleotide binding | 1.90E–3 | Purine ribonucleotide binding | 7.50E–5 | Transit peptide:mitochondrion | 6.60E–3 |
| Purine ribonucleotide binding | 1.90E–3 | Ribonucleotide binding | 7.50E–5 | Enrichment Score: 3.26 | |
| | | ATP binding | 2.40E–4 | Proteasome | 1.30E–4 |
| PINT | 1.20E–4 | Adenyl ribonucleotide binding | 2.50E–4 | Proteasome | 9.60E–4 |
| Proteasome component region PCI | 1.90E–4 | Adenyl nucleotide binding | 4.30E–4 | Proteasome complex | 1.30E–3 |
| Domain:PCI | 5.40E–3 | Purine nucleoside binding | 4.80E–4 | Enrichment Score: 2.59 | |
| | | Nucleoside binding | 5.50E–4 | PINT | 4.90E–4 |
| Chaperone | 5.4E–4 | Enrichment Score: 4.13 | | Proteasome component region PCI | 8.80E–4 |
| Chaperonin TCP-1 | 7.4E–4 | Chaperone | 8.0E–4 | Domain:PCI | 4.00E–2 |
| Chaperonin-containing T-complex | 1.40E–3 | Chaperonin TCP-1 | 1.4E–4 | | |
| Chaperonin Cpn60/TCP-1 | 2.50E–3 | Chaperonin-containing T-complex | 2.70E–5 | | |
| | | Chaperonin Cpn60/TCP-1 | 7.60E–5 | | |
| Ribosomal protein | 9.80E–5 | PIRSF002584:molecular chaperone t-complex-type | 4.00E–4 | | |
| Ribonucleoprotein | 1.00E–4 | Chaperone | 1.70E–3 | | |
| Cytosolic large ribosomal subunit | 2.20E–4 | Enrichment Score: 3.91 | | | |
| Cytosolic ribosome | 4.30E–4 | Proteasome | 3.60E–6 | | |
| Structural constituent of ribosome | 2.00E–3 | Proteasome complex | 7.00E–4 | | |
| Ribosome | 2.50E–3 | Proteasome | 7.60E–4 | | |
| Large ribosomal subunit | 3.70E–3 | Enrichment Score: 2.54 | | | |
| Ribosomal subunit | 1.10E–2 | Proteasome regulatory particle | 2.40E–3 | | |
| Ribosome | 2.60E–2 | Proteasome regulatory particle | 3.00E–3 | | |
| Structural molecule activity | 8.20E–2 | Proteasome accessory complex | 3.50E–3 | | |
CDS, coding sequence.
P-values represent a modified Fisher’s test, wherein lower values indicate greater enrichment.
Functional categories with enrichment values >2.5 are shown.
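The modified Fisher's test reported by DAVID is commonly described as the EASE score: a one-sided Fisher's exact test computed after removing one gene from the count of list genes that fall in the category, which makes the P-value more conservative. The sketch below follows that description. The function names and the 2×2 layout (gene list vs. background, in category vs. not) are assumptions made for illustration and are not taken from this record.

```python
from math import comb


def fisher_upper_tail(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact P for the 2x2 table [[a, b], [c, d]].

    a: list genes in category      b: list genes not in category
    c: background genes in category d: background genes not in category
    Returns P(X >= a) under the hypergeometric distribution with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    tail = sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return tail / comb(n, row1)


def ease_score(list_hits: int, list_misses: int, bg_hits: int, bg_misses: int) -> float:
    """EASE-style P: the same test with one list hit removed (assumed adjustment)."""
    return fisher_upper_tail(max(list_hits - 1, 0), list_misses, bg_hits, bg_misses)


if __name__ == "__main__":
    # Toy counts only; not values from the study.
    print(fisher_upper_tail(10, 90, 40, 1860))
    print(ease_score(10, 90, 40, 1860))
```

Removing one hit always gives a P-value at least as large as the unmodified test, so categories supported by only a few genes are penalized most.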