Haruo Suzuki, Brian R Morton.
Abstract
Codon adaptation is codon usage bias that results from selective pressure to increase the translation efficiency of a gene. Codon adaptation has been studied across a wide range of genomes, and some early analyses of plastids have shown evidence for codon adaptation in a limited set of highly expressed plastid genes. Here we study codon usage bias across all fully sequenced plastid genomes, which include representatives of the Rhodophyta, Alveolata, Cryptophyta, Euglenozoa, Glaucocystophyceae, Rhizaria, Stramenopiles and numerous lineages within the Viridiplantae, including Chlorophyta and Embryophyta. We show evidence that codon adaptation occurs in all genomes except two, Theileria parva and Helicosporidium sp., both of which have highly reduced gene contents and no photosynthesis genes. We also show evidence that selection for codon adaptation increases the representation of the same set of codons, which we refer to as the adaptive codons, across this wide range of taxa, probably because of common features inherited from the initial endosymbiont. We use various measures to estimate the relative strength of selection in the different lineages and show that it appears to be fairly strong in certain Stramenopiles and Chlorophyta lineages but relatively weak in many members of the Rhodophyta, Euglenozoa and Embryophyta. Given these results, we propose that codon adaptation in plastids is widespread and displays the same general features as adaptation in eubacterial genomes.
Year: 2016 PMID: 27196606 PMCID: PMC4873144 DOI: 10.1371/journal.pone.0154306
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Codon usage in three plastid genomes.
| Codon | tRNA | Mpo psbA | Mpo Total | Cre psbA | Cre Total | Ppu psbA | Ppu Total |
|---|---|---|---|---|---|---|---|
| AGT | 0 | 3 | 405 | 0 | 306 | 5 | 772 |
| **AGC** | 38 |  |  |  |  |  |  |
| AAT | 0 | 7 | 1219 | 1 | 792 | 2 | 1925 |
| **AAC** | 38 |  |  |  |  |  |  |
| TAT | 0 | 2 | 802 | 0 | 495 | 5 | 1093 |
| **TAC** | 38 |  |  |  |  |  |  |
| TTT | 0 | 8 | 1518 | 2 | 638 | 8 | 1445 |
| **TTC** | 38 |  |  |  |  |  |  |
| CAT | 0 | 5 | 379 | 1 | 197 | 3 | 615 |
| **CAC** | 38 |  |  |  |  |  |  |
| ATT | 0 | 17 | 1480 | 5 | 1129 | 13 | 2407 |
| **ATC** | 36 |  |  |  |  |  |  |
| ATA | 0 | 0 | 695 | 0 | 111 | 0 | 1101 |
| TGT | 0 | 0 | 207 | 3 | 167 | 0 | 299 |
| **TGC** | 36 |  |  |  |  |  |  |
| GAT | 0 | 4 | 709 | 1 | 514 | 5 | 1684 |
| **GAC** | 38 |  |  |  |  |  |  |
| ACA | 38 | 1 | 477 | 4 | 656 | 8 | 1027 |
| ACT | 0 | 14 | 597 | 12 | 534 | 8 | 1081 |
| ACC | 23 | 2 | 58 | 0 | 59 | 1 | 183 |
| ACG | 0 | 0 | 41 | 0 | 72 | 0 | 153 |
| CCA | 38 | 3 | 355 | 8 | 467 | 11 | 701 |
| CCT | 0 | 12 | 459 | 4 | 323 | 5 | 757 |
| CCC | 4 | 0 | 38 | 0 | 34 | 0 | 89 |
| CCG | 0 | 0 | 47 | 2 | 51 | 0 | 130 |
| GCA | 38 | 6 | 438 | 7 | 460 | 10 | 1242 |
| GCT | 0 | 32 | 752 | 25 | 812 | 26 | 1432 |
| GCC | 3 | 0 | 62 | 0 | 78 | 1 | 249 |
| GCG | 0 | 0 | 47 | 0 | 65 | 0 | 182 |
| GGA | 33 | 3 | 658 | 0 | 160 | 6 | 1108 |
| GGT | 0 | 29 | 612 | 30 | 1076 | 23 | 1233 |
| GGC | 35 | 1 | 82 | 1 | 97 | 3 | 447 |
| GGG | 0 | 0 | 88 | 0 | 68 | 0 | 225 |
| GTA | 38 | 13 | 442 | 16 | 599 | 14 | 1020 |
| GTT | 0 | 11 | 627 | 5 | 615 | 13 | 1391 |
| GTC | 21 | 0 | 47 | 0 | 12 | 0 | 250 |
| GTG | 0 | 0 | 46 | 0 | 74 | 0 | 225 |
| TCA | 38 | 4 | 350 | 16 | 454 | 1 | 679 |
| TCT | 0 | 12 | 614 | 12 | 392 | 15 | 1176 |
| TCC | 23 | 0 | 71 | 0 | 37 | 3 | 196 |
| TCG | 7 | 0 | 48 | 0 | 74 | 0 | 98 |
| CTA | 38 | 2 | 141 | 6 | 141 | 18 | 818 |
| CTT | 0 | 8 | 507 | 8 | 319 | 3 | 700 |
| CTC | 11 | 0 | 24 | 0 | 9 | 0 | 126 |
| CTG | 0 | 0 | 25 | 0 | 40 | 0 | 212 |
| TTA | 37 | 15 | 1823 | 18 | 1617 | 12 | 2255 |
| TTG | 30 | 5 | 199 | 0 | 75 | 2 | 475 |
| CGA | 0 | 0 | 248 | 0 | 67 | 0 | 285 |
| CGT | 38 | 10 | 344 | 15 | 734 | 11 | 330 |
| CGC | 0 | 2 | 46 | 0 | 57 | 2 | 123 |
| CGG | 23 | 0 | 22 | 0 | 4 | 0 | 56 |
| AGA | 38 | 2 | 367 | 0 | 99 | 1 | 1152 |
| AGG | 4 | 0 | 24 | 0 | 15 | 0 | 181 |
| GAA | 37 | 17 | 1080 | 15 | 914 | 16 | 2133 |
| GAG | 0 | 2 | 84 | 4 | 76 | 2 | 490 |
| AAA | 37 | 0 | 1724 | 0 | 1497 | 1 | 2417 |
| AAG | 0 | 1 | 81 | 0 | 86 | 0 | 582 |
| CAA | 38 | 6 | 855 | 7 | 782 | 7 | 1475 |
| CAG | 0 | 0 | 51 | 0 | 63 | 2 | 401 |
1 – Codon usage is given for the psbA gene and for all genes cumulatively (Total) in Marchantia polymorpha (Mpo), Chlamydomonas reinhardtii (Cre) and Porphyra purpurea (Ppu). The NNC codons of the two-fold degenerate groups are in bold; the psbA genes have an increased frequency of these codons, as discussed in the text. The AGT and AGC codons of Serine are grouped with the NNY two-fold degenerate codon groups, separately from the TCN Serine codons.
2 – Number of the 38 plastid genomes in the tRNA database at http://trna.ie.niigata-u.ac.jp/ that have a tRNA complementary to the codon.
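The footnote contrasts codon usage in psbA with usage across all genes. One standard way to make that contrast explicit is relative synonymous codon usage (RSCU): the observed count of a codon divided by the count expected if all synonyms in its family were used equally. This section does not use RSCU. The sketch below applies it to the glycine (GGN) counts for C. reinhardtii from the table, and the choice of family and the function name `rscu` are illustrative only.

```python
# Glycine (GGN) counts for Chlamydomonas reinhardtii (Cre), copied from the table.
CRE_PSBA = {"GGA": 0, "GGT": 30, "GGC": 1, "GGG": 0}
CRE_TOTAL = {"GGA": 160, "GGT": 1076, "GGC": 97, "GGG": 68}


def rscu(family_counts: dict[str, int]) -> dict[str, float]:
    """Relative synonymous codon usage for one synonymous codon family.

    Each codon's count is divided by the count expected under uniform use
    of the family's codons (family total / number of codons).
    """
    total = sum(family_counts.values())
    if total == 0:
        raise ValueError("family has no observed codons")
    expected = total / len(family_counts)
    return {codon: count / expected for codon, count in family_counts.items()}


if __name__ == "__main__":
    for label, counts in (("psbA", CRE_PSBA), ("all genes", CRE_TOTAL)):
        values = rscu(counts)
        print(label, {codon: round(v, 2) for codon, v in values.items()})
```

With these counts, GGT has an RSCU of about 3.87 in psbA, compared with about 3.07 across all genes. An RSCU above 1 means the codon is used more often than uniform usage would predict.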
Fig 1. Codon usage patterns in plastid genes.
Clustering of three putative high-translation genes (psbA, rbcL, psbC; in red) and three putative low-translation genes (rps3, rps4, rpoB; in blue) from 43 plastid genomes selected to represent the major lineages (see text). Genes are clustered by similarity in codon usage as described in the Materials and Methods.
Fig 2. Third-position composition patterns.
A plot of %C (C/[C+T]) base composition at two-fold degenerate and four-fold degenerate sites for the genes shown in Fig 1. Values are the cumulative base composition for each gene. For the low-translation genes, the pooled composition of the rps3, rps4 and rpoB genes is shown.
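The %C measure in the Fig 2 caption is C/(C+T) at the third codon position, pooled over the codons of a gene. The following is a minimal sketch that assumes four-fold degenerate sites come only from the Thr (ACN), Pro (CCN), Ala (GCN), Gly (GGN) and Val (GTN) families, because the caption does not say which families were used. It pools the Marchantia polymorpha counts from the codon usage table. The two-fold NNC counts are not available in this copy of that table, so only four-fold sites are computed here. The name `c_fraction` is hypothetical.

```python
# Third-position C- and T-ending codons in the strictly four-fold families
# (Thr, Pro, Ala, Gly, Val) for Marchantia polymorpha (Mpo), from the table.
# Restricting to these five families is an assumption of this sketch.
MPO_PSBA = {"ACC": 2, "ACT": 14, "CCC": 0, "CCT": 12, "GCC": 0, "GCT": 32,
            "GGC": 1, "GGT": 29, "GTC": 0, "GTT": 11}
MPO_TOTAL = {"ACC": 58, "ACT": 597, "CCC": 38, "CCT": 459, "GCC": 62, "GCT": 752,
             "GGC": 82, "GGT": 612, "GTC": 47, "GTT": 627}


def c_fraction(counts: dict[str, int]) -> float:
    """Return C/(C+T) at the third codon position, pooled over all codons given."""
    c = sum(n for codon, n in counts.items() if codon[2] == "C")
    t = sum(n for codon, n in counts.items() if codon[2] == "T")
    if c + t == 0:
        raise ValueError("no C- or T-ending codons")
    return c / (c + t)


if __name__ == "__main__":
    print(f"psbA: {c_fraction(MPO_PSBA):.3f}, all genes: {c_fraction(MPO_TOTAL):.3f}")
```

For these counts, psbA gives about 0.030 (3 of 101) and all genes give about 0.086 (287 of 3334). Multiply by 100 to express the result as a percentage.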
Genes with the highest rejection rates across genomes in the resampling analysis.
| Gene | Number of Genomes | Number of Genomes Rejected |
|---|---|---|
|  | 98 | 97 (99.0%) |
|  | 98 | 86 (87.8%) |
|  | 12 | 10 (83.3%) |
|  | 10 | 8 (80.0%) |
|  | 48 | 34 (70.83%) |
|  | 92 | 62 (67.4%) |
|  | 96 | 61 (63.5%) |
|  | 97 | 58 (59.8%) |
1 – Number of genomes that code the gene.
2 – Number of genomes in which the gene was rejected in the resampling analysis.
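This section does not describe the resampling test itself, so the sketch below only reproduces the arithmetic behind the percentage column. The rejection rate of a gene is the number of genomes in which it was rejected divided by the number of genomes that code it. The gene names are missing from this copy of the table, so rows are kept in table order. `rejection_rate` is a hypothetical helper name.

```python
# (genomes coding the gene, genomes in which it was rejected), in table order.
ROWS = [(98, 97), (98, 86), (12, 10), (10, 8), (48, 34), (92, 62), (96, 61), (97, 58)]


def rejection_rate(n_genomes: int, n_rejected: int) -> float:
    """Percentage of the genomes coding a gene in which that gene was rejected."""
    if n_genomes <= 0 or not 0 <= n_rejected <= n_genomes:
        raise ValueError("need n_genomes > 0 and 0 <= n_rejected <= n_genomes")
    return 100.0 * n_rejected / n_genomes


rates = [rejection_rate(n, r) for n, r in ROWS]

if __name__ == "__main__":
    for (n, r), rate in zip(ROWS, rates):
        print(f"{r}/{n} = {rate:.1f}%")
```

Rounded to one decimal, the computed values match the table, which reports 70.83% for the 34/48 row.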
Genomes with the highest levels of rejection in the resampling analysis.
| Genome | Classification | Number of Genes Rejected | Genes |
|---|---|---|---|
|  | Chlorophyta, Chlorophyceae | 50 (74.6%) |  |
|  | Chlorophyta, Chlorophyceae | 45 (61.2%) |  |
|  | Chlorophyta, Chlorophyceae | 37 (53.0%) |  |
|  | Chlorophyta, Chlorophyceae | 37 (47.8%) |  |
|  | Chlorophyta, Oltmannsiellopsis | 41 (47.5%) |  |
|  | Chlorophyta, Prasinophytes | 64 (43.8%) |  |
|  | Streptophyta, Klebsormidiophyceae | 44 (43.3%) |  |
|  | Chlorophyta, Chlorophyceae | 29 (37.3%) |  |
|  | Chlorophyta, Chlorophyceae | 29 (37.3%) |  |
|  | Chlorophyta, Pedinophyceae | 28 (34.2%) |  |
|  | Cryptophyta, Cryptomonadales | 2 (2.6%) |  |
|  | Rhodophyta, Florideophyceae | 5 (2.6%) |  |
|  | Rhodophyta, Florideophyceae | 4 (2.1%) |  |
|  | Rhodophyta, Florideophyceae | 4 (1.9%) |  |
|  | Streptophyta, Embryophyta | 3 (1.9%) |  |
|  | Rhodophyta, Florideophyceae | 4 (1.6%) |  |
|  | Rhodophyta, Bangiophyceae | 3 (1.5%) |  |
|  | Rhodophyta, Bangiophyceae | 2 (1.1%) |  |
|  | Alveolata, Apicomplexa | 0 | N/A |
|  | Chlorophyta, Trebouxiophyceae | 0 | N/A |
1 – For genomes with the lowest rejection rates, the rejected genes are listed.
Genomes with the Lowest and Highest S Coefficients.
| Genome | Classification | S |
|---|---|---|
|  | Streptophyta, Coleochaetophyceae | 3.405 |
|  | Stramenopiles, Bacillariophyta | 3.146 |
|  | Stramenopiles, PX_clade | 2.998 |
|  | Streptophyta, Mesostigmatophyceae | 2.989 |
|  | Streptophyta, Embryophyta | 2.973 |
|  | Chlorophyta, Ulvophyceae | 2.953 |
|  | Stramenopiles, Bacillariophyta | 2.752 |
|  | Stramenopiles, Bacillariophyta | 2.745 |
|  | Chlorophyta, Chlorophyceae | 2.743 |
|  | Streptophyta, Zygnemophyceae | 2.640 |
|  | Stramenopiles, PX_clade | 2.604 |
|  | Rhodophyta, Bangiophyceae | 2.568 |
|  | Chlorophyta, Trebouxiophyceae | 2.564 |
|  | Cryptophyta, Pyrenomonadales | 2.545 |
|  | Rhodophyta, Bangiophyceae | 2.452 |
|  | Rhodophyta, Florideophyceae | 1.621 |
|  | Rhodophyta, Bangiophyceae | 1.554 |
|  | Euglenozoa, Euglenida | 1.501 |
|  | Streptophyta, Embryophyta | 1.414 |
|  | Streptophyta, Embryophyta | 1.226 |
|  | Rhodophyta, Florideophyceae | 1.204 |
|  | Euglenozoa, Euglenida | 1.062 |
|  | Streptophyta, Embryophyta | 1.026 |
|  | Alveolata, Chromerida | 1.010 |
|  | Rhodophyta, Florideophyceae | 0.843 |
|  | Streptophyta, Charophyceae | 0.816 |
|  | Rhodophyta, Florideophyceae | 0.814 |
|  | Rhodophyta, Bangiophyceae | 0.566 |
|  | Rhodophyta, Bangiophyceae | 0.562 |
|  | Euglenozoa, Euglenida | 0.453 |
Genomes with the Lowest and Highest S3 Coefficients.
| Genome | Classification | S3 |
|---|---|---|
|  | Stramenopiles, Bacillariophyta | 3.093 |
|  | Alveolata, Dinophyceae | 2.434 |
|  | Chlorophyta, Chlorophyceae | 2.418 |
|  | Haptophyceae, Isochrysidales | 2.385 |
|  | Chlorophyta, Prasinophytes | 2.384 |
|  | Stramenopiles, Bacillariophyta | 2.332 |
|  | Chlorophyta, Trebouxiophyceae | 2.330 |
|  | Stramenopiles, Bacillariophyta | 2.200 |
|  | Stramenopiles, Pelagophyceae | 2.181 |
|  | Chlorophyta, Prasinophytes | 2.164 |
|  | Alveolata, Dinophyceae | 2.127 |
|  | Stramenopiles, Bacillariophyta | 2.097 |
|  | Stramenopiles, Bacillariophyta | 2.082 |
|  | Haptophyceae, Phaeocystales | 1.965 |
|  | Chlorophyta, Prasinophytes | 1.939 |
|  | Rhodophyta, Florideophyceae | 1.046 |
|  | Rhodophyta, Florideophyceae | 1.040 |
|  | Rhodophyta, Florideophyceae | 1.015 |
|  | Streptophyta, Zygnemophyceae | 0.914 |
|  | Euglenozoa, Euglenida | 0.838 |
|  | Streptophyta, Embryophyta | 0.730 |
|  | Euglenozoa, Euglenida | 0.676 |
|  | Streptophyta, Embryophyta | 0.667 |
|  | Rhodophyta, Florideophyceae | 0.626 |
|  | Rhodophyta, Bangiophyceae | 0.624 |
|  | Rhodophyta, Florideophyceae | 0.576 |
|  | Streptophyta, Embryophyta | 0.474 |
|  | Euglenozoa, Euglenida | 0.395 |
|  | Rhodophyta, Bangiophyceae | 0.269 |
|  | Streptophyta, Charophyceae | 0.178 |
1 – The S3 coefficient as defined in the text.
Genomes ranked by the maximal CAI value.
| Genome | Classification | Max. CAI |
|---|---|---|
|  | Chlorophyta, Oltmannsiellopsis | 0.915 |
|  | Stramenopiles, Bacillariophyta | 0.877 |
|  | Alveolata, Dinophyceae | 0.859 |
|  | Alveolata, Dinophyceae | 0.856 |
|  | Stramenopiles, Bacillariophyta | 0.844 |
|  | Stramenopiles, Bacillariophyta | 0.831 |
|  | Chlorophyta, Pedinophyceae | 0.830 |
|  | Stramenopiles, Bacillariophyta | 0.828 |
|  | Chlorophyta, Chlorophyceae | 0.827 |
|  | Stramenopiles, Bacillariophyta | 0.825 |
|  | Stramenopiles, Raphidophyceae | 0.816 |
|  | Stramenopiles, Bacillariophyta | 0.816 |
|  | Stramenopiles, Pelagophyceae | 0.812 |
|  | Stramenopiles, Bacillariophyta | 0.812 |
|  | Stramenopiles, Bacillariophyta | 0.807 |
|  | Euglenozoa, Euglenida | 0.500 |
|  | Chlorophyta, Prasinophytes | 0.476 |
|  | Rhodophyta, Bangiophyceae | 0.476 |
|  | Streptophyta, Zygnemophyceae | 0.474 |
|  | Rhodophyta, Bangiophyceae | 0.461 |
|  | Alveolata, Chromerida | 0.448 |
|  | Streptophyta, Embryophyta | 0.430 |
|  | Streptophyta, Embryophyta | 0.416 |
|  | Alveolata, Chromerida | 0.415 |
|  | Streptophyta, Klebsormidiophyceae | 0.400 |
|  | Alveolata, Apicomplexa | 0.391 |
|  | Chlorophyta, Trebouxiophyceae | 0.388 |
|  | Chlorophyta, Trebouxiophyceae | 0.377 |
|  | Streptophyta, Embryophyta | 0.366 |
|  | Cryptophyta, Cryptomonadales | 0.364 |
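The ranking above uses the maximal codon adaptation index (CAI) found in each genome. In the original formulation of CAI (Sharp and Li, 1987), each codon gets a relative adaptiveness w, which is its count in a reference set of highly expressed genes divided by the count of the most-used synonym. The CAI of a gene is the geometric mean of w over its codons. This section does not give the reference set the study used. The sketch below therefore assumes a hypothetical reference made of the C. reinhardtii psbA counts for glycine and alanine from the codon usage table. It also replaces zero counts with 0.5, which is a commonly used convention. The function names are illustrative.

```python
import math

# Hypothetical reference set: Gly (GGN) and Ala (GCN) counts from the
# C. reinhardtii psbA column of the codon usage table. A real analysis would
# cover every amino acid and use the study's own reference genes.
REFERENCE = {
    "Gly": {"GGA": 0, "GGT": 30, "GGC": 1, "GGG": 0},
    "Ala": {"GCA": 7, "GCT": 25, "GCC": 0, "GCG": 0},
}


def relative_adaptiveness(reference, pseudocount=0.5):
    """w = count / max count within each synonymous family.

    Zero counts are replaced by `pseudocount` so that no codon gets w = 0,
    which would make the geometric mean collapse to zero.
    """
    weights = {}
    for family in reference.values():
        adjusted = {codon: (n if n > 0 else pseudocount) for codon, n in family.items()}
        best = max(adjusted.values())
        for codon, n in adjusted.items():
            weights[codon] = n / best
    return weights


def cai(sequence, weights):
    """Geometric mean of w over in-frame codons; codons without a weight are skipped."""
    codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


weights = relative_adaptiveness(REFERENCE)

if __name__ == "__main__":
    print(round(cai("GGTGCTGGTGCA", weights), 3))
```

The toy sequence uses GGT, GCT and GGT (each with w = 1) and GCA (w = 7/25). Its CAI is therefore 0.28 to the power 1/4. Codons outside the reference, such as ATG, are ignored here, much as CAI implementations usually exclude Met and Trp.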
Genomes with the strongest and weakest overall codon adaptation as measured by Spca.
| Genome | Classification | Spca |
|---|---|---|
|  | Chlorophyta, Oltmannsiellopsis | 3.38 |
|  | Chlorophyta, Chlorophyceae | 3.24 |
|  | Chlorophyta, Chlorophyceae | 3.19 |
|  | Chlorophyta, Chlorophyceae | 2.85 |
|  | Chlorophyta, Pedinophyceae | 2.36 |
|  | Alveolata, Dinophyceae | 2.20 |
|  | Stramenopiles, Bacillariophyta | 2.18 |
|  | Chlorophyta, Chlorophyceae | 2.09 |
|  | Chlorophyta, Prasinophytes | 1.99 |
|  | Chlorophyta, Trebouxiophyceae | 1.98 |
|  | Chlorophyta, Chlorophyceae | 1.89 |
|  | Chlorophyta, Chlorophyceae | 1.60 |
|  | Stramenopiles, Bacillariophyta | 1.57 |
|  | Stramenopiles, Pelagophyceae | 1.52 |
|  | Stramenopiles, Bacillariophyta | 1.46 |
|  | Rhodophyta, Florideophyceae | -1.63 |
|  | Rhodophyta, Florideophyceae | -1.81 |
|  | Euglenozoa, Euglenida | -1.83 |
|  | Rhodophyta, Florideophyceae | -2.05 |
|  | Streptophyta, Zygnemophyceae | -2.06 |
|  | Streptophyta, Charophyceae | -2.11 |
|  | Euglenozoa, Euglenida | -2.21 |
|  | Streptophyta, Embryophyta | -2.23 |
|  | Alveolata, Chromerida | -2.33 |
|  | Rhodophyta, Bangiophyceae | -2.39 |
|  | Alveolata, Chromerida | -2.64 |
|  | Streptophyta, Embryophyta | -2.76 |
|  | Rhodophyta, Bangiophyceae | -2.81 |
|  | Streptophyta, Embryophyta | -2.86 |
|  | Cryptophyta, Cryptomonadales | -4.19 |
Fig 3. Strength of codon adaptation across lineages.
A phylogeny of plastids with the strength of codon adaptation indicated for different lineages. Strength of selection is based on the Spca measure described in the text and given in S1 Table. A lineage whose plastids have an average Spca greater than 1 is considered to show strong adaptation, and one with an average less than -1 is considered to show weak adaptation. A dashed line indicates variation among the genomes of that lineage. The primary endosymbiosis is indicated, as are the two proposed secondary endosymbioses: one from green plant ancestors to the Euglenoids and another from red algal ancestors to the lineage leading to extant Cryptophytes, Alveolates, Stramenopiles and Haptophytes (see text). Branches preceding the endosymbiosis are shaded black and indicate the absence of a plastid. The overall phylogeny is based on general relationships reported in different sources [13,14].
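The Fig 3 caption gives a simple rule for labelling lineages. The sketch below encodes only that rule: the average Spca over a lineage's plastid genomes is compared with the thresholds 1 and -1. The caption does not name the band between them, so the label "intermediate" is an assumption. The input values in the example are hypothetical and are not taken from S1 Table.

```python
from statistics import mean
from typing import Sequence


def classify_lineage(spca_values: Sequence[float]) -> str:
    """Label a lineage from the average Spca of its plastid genomes.

    Thresholds follow the Fig 3 caption: average > 1 is strong adaptation and
    average < -1 is weak adaptation. "intermediate" for the band in between is
    an assumption of this sketch.
    """
    if not spca_values:
        raise ValueError("need at least one Spca value")
    avg = mean(spca_values)
    if avg > 1:
        return "strong"
    if avg < -1:
        return "weak"
    return "intermediate"


if __name__ == "__main__":
    # Hypothetical per-genome Spca values for one lineage.
    print(classify_lineage([2.1, 1.4, 0.9]))
```

Because the rule uses strict inequalities, an average of exactly 1 or -1 falls in the middle band.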
Fig 4. Within-group correspondence analysis (WCA) of codon usage for genomes with low S values.
First WCA component plotted against %C at two-fold degenerate sites for four plastid genomes inferred to be under weak selection: Helicosporidium sp., Euglena gracilis, Chara vulgaris and Galdieria sulphuraria. Genes rejected in the resampling test are highlighted in red. Gene names follow the NCBI annotation, and full taxon names from the NCBI annotation are given.
Fig 5. Within-group correspondence analysis (WCA) of codon usage for genomes with high S values.
First WCA component plotted against %C at two-fold degenerate sites for four plastid genomes inferred to be under strong selection: Chaetosphaeridium globosum, Dunaliella salina, Chlamydomonas reinhardtii and Pseudendoclonium akinetum. Genes rejected in the resampling test are highlighted in red. Gene names follow the NCBI annotation, and full taxon names from the NCBI annotation are given.
Fig 6. GC skew in plastid genomes.
Box-and-whisker plots summarizing the GC skew index (GCSI) for plastid genomes of different lineages.
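The GC skew index (GCSI) summarizes skew over a whole genome, and its exact definition is not given in this section, so it is not reproduced here. The sketch below shows the simpler quantity it builds on: the windowed GC skew (G - C)/(G + C) and its running sum. In bacterial chromosomes, the extremes of the cumulative curve are commonly used to estimate the replication origin and terminus. The window size and the function names are illustrative assumptions.

```python
from itertools import accumulate
from typing import List, Optional


def gc_skew_windows(seq: str, window: int, step: Optional[int] = None) -> List[float]:
    """(G - C) / (G + C) for each full window along `seq`.

    Windows with no G or C get a skew of 0.0; a trailing partial window is dropped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window  # non-overlapping windows by default
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if g + c else 0.0)
    return skews


def cumulative_skew(skews: List[float]) -> List[float]:
    """Running sum of the window skews."""
    return list(accumulate(skews))


if __name__ == "__main__":
    skews = gc_skew_windows("GGGCATggccaa", 3)
    print(skews, cumulative_skew(skews))
```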
Fig 7. Gene distribution by strand in plastid genomes.
Box-and-whisker plots summarizing the distribution of the ratio of leading-strand genes to the total number of genes for chloroplasts from nine phyla. A deviation from a ratio of 0.5 (red line) indicates that genes tend to be unevenly distributed between the leading and lagging strands of DNA replication.
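The ratio in Fig 7 requires each gene to be assigned to the leading or lagging strand. The sketch below assumes bidirectional replication from a known origin to a known terminus on a circular genome. In practice these positions would have to be estimated, for example from GC skew, because this section does not say how they were located. Under that assumption, a gene counts as leading-strand when its orientation matches the direction in which the fork moves through it. The `Gene` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 0-based coordinate of the gene on the circular genome
    strand: str  # "+" or "-"


def on_leading_strand(gene: Gene, ori: int, ter: int, genome_length: int) -> bool:
    """True if the gene is co-oriented with replication fork movement.

    The replichore running clockwise (increasing coordinates) from ori to ter
    has "+" as its leading strand; the other replichore has "-".
    """
    clockwise_span = (ter - ori) % genome_length
    offset = (gene.start - ori) % genome_length
    in_clockwise_replichore = offset < clockwise_span
    return (gene.strand == "+") == in_clockwise_replichore


def leading_fraction(genes: List[Gene], ori: int, ter: int, genome_length: int) -> float:
    """Ratio of leading-strand genes to all genes (0.5 means no strand bias)."""
    if not genes:
        raise ValueError("no genes")
    leading = sum(on_leading_strand(g, ori, ter, genome_length) for g in genes)
    return leading / len(genes)


if __name__ == "__main__":
    genes = [Gene("a", 10, "+"), Gene("b", 60, "-"), Gene("c", 20, "-")]
    print(leading_fraction(genes, ori=0, ter=50, genome_length=100))
```

Using the start coordinate to place a gene is a simplification. A gene that spans the origin or terminus would need its midpoint or an explicit rule.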