Tomáš Pánek1, David Žihala1, Martin Sokol1, Romain Derelle2, Vladimír Klimeš1, Miluše Hradilová3, Eliška Zadrobílková4, Edward Susko5,6, Andrew J Roger6,7,8, Ivan Čepička4, Marek Eliáš9.
Abstract
BACKGROUND: Departures from the standard genetic code in eukaryotic nuclear genomes are known for only a handful of lineages and only a few genetic code variants seem to exist outside the ciliates, the most creative group in this regard. Most frequent code modifications entail reassignment of the UAG and UAA codons, with evidence for at least 13 independent cases of a coordinated change in the meaning of both codons. However, no change affecting each of the two codons separately has been documented, suggesting the existence of underlying evolutionary or mechanistic constraints.
Keywords: Codon reassignment; Evolution; Evolutionary constraint; Fornicata; Genetic code; Iotanema spirale; Lygus hesperus; Protists; Rhizaria; Transcriptome
Year: 2017 PMID: 28193262 PMCID: PMC5304391 DOI: 10.1186/s12915-017-0353-y
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Fig. 1 Phylogenetic position of the organisms studied. a Phylogeny of eukaryotes including the rhizarian exLh based on 18S rDNA sequences. The maximum likelihood (ML) tree was inferred with RAxML using the GTRGAMMAI substitution model. The values at branches represent RAxML BS values followed by PhyloBayes posterior probabilities (GTRCAT model). b Phylogeny of Fornicata including I. spirale based on a concatenated data set of 18S rDNA and EF-1α, EF2, HSP70, and HSP90 protein sequences. The ML tree was inferred with RAxML using the substitution models GTRGAMMA (for 18S rDNA) and PROTGAMMALG4X (for the protein sequences). The values at branches represent RAxML BS values followed by PhyloBayes posterior probabilities (CAT Poisson model). Maximal support (100/1) is indicated with black dots. Asterisks indicate support values lower than 50% or 0.5, respectively; dashes mark branches in the ML tree that are absent from the PhyloBayes tree
Fig. 2 In-frame UAG codons in protein-coding genes of the rhizarian exLh and I. spirale. a An example of a rhizarian exLh gene with several in-frame UAG codons: multiple sequence alignment of orthologs of the Bat1 protein (spliceosome RNA helicase). b Relative frequency of hyperconserved positions (at least 90% amino acid identity across orthologs from 250 representatives of main eukaryotic groups in the alignment) corresponding to UAG-containing sites in the rhizarian exLh transcripts. c An example of an I. spirale gene with several in-frame UAG codons: multiple sequence alignment of orthologs of the Polr2a protein (also known as RNA polymerase II subunit RPB1). d Relative frequency of hyperconserved positions (at least 90% amino acid identity across orthologs from 54 representatives of main eukaryotic groups in the alignment) corresponding to UAG-containing sites in the I. spirale transcripts. e, f Dominant amino acid identity at conserved alignment positions (defined using 90% and 50% thresholds) in a broad-scale comparison of I. spirale sequences with eukaryotic homologs. e Positions corresponding to in-frame UAG codons in I. spirale sequences. f Positions corresponding to canonical glutamine codons (CAG, CAA) in I. spirale sequences. In Fig. 2a and c, only selected segments of the full alignments (separated by double slashes) are shown for simplicity. Asterisks indicate positions with in-frame UAG codons in the underlying coding sequences. In Fig. 2b and d, the hyperconserved positions are sorted according to the respective hyperconserved amino acid residue (only the four most frequent position classes are shown). Source tables for Fig. 2b and d, including data from read mapping, are available in Additional file 1: Tables S1D and S2C
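The hyperconservation criterion in the caption (at least 90% amino acid identity across orthologs) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the toy alignment, and the choice to ignore gap characters when computing identity are all assumptions made for illustration.

```python
from collections import Counter


def hyperconserved_residue(column, threshold=0.9, gap_chars="-X?"):
    """Return the dominant residue of an alignment column if it reaches
    `threshold` identity, otherwise None. Gaps are ignored (an assumption)."""
    residues = [aa for aa in column if aa not in gap_chars]
    if not residues:
        return None
    residue, count = Counter(residues).most_common(1)[0]
    return residue if count / len(residues) >= threshold else None


def tally_uag_sites(reference_rows, uag_columns, threshold=0.9):
    """Count hyperconserved residues at the alignment columns that
    correspond to in-frame UAG codons in the query sequence."""
    tally = Counter()
    for col in uag_columns:
        column = [row[col] for row in reference_rows]
        residue = hyperconserved_residue(column, threshold)
        if residue is not None:
            tally[residue] += 1
    return tally


# Hypothetical toy alignment of ten reference orthologs (not real data).
toy_rows = ["MLQK"] * 9 + ["MIQR"]
# Suppose the query has in-frame UAG codons at columns 1 and 2.
print(tally_uag_sites(toy_rows, uag_columns=[1, 2]))
```

Sorting the resulting tally by count gives the kind of per-residue breakdown that panels b and d summarise.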
Phylogeny-informed maximum likelihood-based estimation of the UAG codon meaning in the rhizarian exLh and Iotanema spirale. See the main text for details on the procedure employed. Note that the conditional probabilities of the most likely assignments (UAG = L in the rhizarian exLh and UAG = Q in I. spirale) differ so little from 1.00 that they are expressed as 1 minus the sum of the conditional probabilities of the 19 other assignments
| Rhizarian exLh: alternative UAG meaning | Rhizarian exLh: log-likelihood score | Rhizarian exLh: conditional probability | I. spirale: alternative UAG meaning | I. spirale: log-likelihood score | I. spirale: conditional probability |
|---|---|---|---|---|---|
| L | -362314.715 | 1 - (2.65 × 10^-82) | Q | -55505.075 | 1 - (2.36 × 10^-34) |
| I | -362502.554 | 2.65 × 10^-82 | E | -55582.654 | 2.03 × 10^-34 |
| V | -362543.823 | 3.16 × 10^-100 | K | -55584.47 | 3.31 × 10^-35 |
| M | -362553.384 | 2.23 × 10^-104 | R | -55632.34 | 5.36 × 10^-56 |
| F | -362585.699 | 2.06 × 10^-118 | S | -55657.799 | 4.71 × 10^-67 |
| A | -362632.167 | 1.4 × 10^-138 | A | -55659.005 | 1.41 × 10^-67 |
| T | -362657.807 | 9.9 × 10^-150 | N | -55672.232 | 2.54 × 10^-73 |
| Q | -362673.9 | 1.02 × 10^-156 | T | -55677.733 | 1.04 × 10^-75 |
| R | -362687.179 | 1.74 × 10^-162 | D | -55688.053 | 3.42 × 10^-80 |
| S | -362701.797 | 7.81 × 10^-169 | H | -55708.978 | 2.79 × 10^-89 |
| K | -362710.573 | 1.2 × 10^-172 | L | -55710.215 | 8.11 × 10^-90 |
| Y | -362713.231 | 8.45 × 10^-174 | P | -55750.281 | 3.22 × 10^-107 |
| P | -362724.093 | 1.62 × 10^-178 | G | -55753.721 | 1.03 × 10^-108 |
| H | -362741.154 | 6.31 × 10^-186 | V | -55757.545 | 2.26 × 10^-110 |
| C | -362745.196 | 1.11 × 10^-187 | M | -55785.355 | 1.89 × 10^-122 |
| E | -362745.729 | 6.50 × 10^-188 | I | -55808.382 | 1.89 × 10^-132 |
| W | -362766.438 | 6.59 × 10^-197 | Y | -55825.982 | 4.28 × 10^-140 |
| N | -362774.796 | 1.55 × 10^-200 | F | -55883.499 | 4.49 × 10^-165 |
| G | -362797.379 | 2.41 × 10^-210 | C | -55936.549 | 4.10 × 10^-188 |
| D | -362848.162 | 2.12 × 10^-232 | W | -55966.189 | 5.50 × 10^-201 |
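The tabulated conditional probabilities are numerically consistent with normalising the likelihoods exp(lnL) across the candidate assignments with equal weights. That reading is an assumption inferred from the numbers, not a statement of the authors' exact procedure, which is described in the main text. The Python sketch below uses a hypothetical helper, `conditional_probabilities`, and only the four top-scoring rhizarian exLh rows; the remaining 16 assignments contribute negligibly to the sum.

```python
import math


def conditional_probabilities(log_likelihoods):
    """Normalise likelihoods exp(lnL) across candidates, assuming equal
    weights for all candidate assignments (an illustrative assumption)."""
    best = max(log_likelihoods.values())
    # Subtract the best score before exponentiating to avoid underflow.
    weights = {aa: math.exp(lnl - best) for aa, lnl in log_likelihoods.items()}
    total = sum(weights.values())
    return {aa: w / total for aa, w in weights.items()}


# Four top-scoring log-likelihoods for the rhizarian exLh, copied from the table.
exlh = {"L": -362314.715, "I": -362502.554, "V": -362543.823, "M": -362553.384}
probs = conditional_probabilities(exlh)
best_aa = max(probs, key=probs.get)

# In double precision, 1 - probs[best_aa] evaluates to 0, so the complement
# is summed directly from the other assignments instead.
complement = sum(p for aa, p in probs.items() if aa != best_aa)
print(f"UAG = {best_aa}: 1 - ({complement:.2e})")
print(f"UAG = I: {probs['I']:.2e}")
```

The value for UAG = I comes out close to the tabulated 2.65 × 10^-82. The direct summation of the complement also shows why the best assignment is reported as 1 minus a small number: the difference from 1 is far below what ordinary floating-point arithmetic can represent.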
Fig. 3 Relative codon frequencies in the rhizarian exLh and I. spirale. a Relative codon frequencies in two different groups of genes (for ribosomal proteins and for subunits of the 26S proteasome; listed in Additional file 1: Tables S1A and S1B) in the rhizarian exLh. b Relative codon frequencies in a reference set of genes of I. spirale (listed in Additional file 3: Tables S2A and S2B). The relative codon frequencies are calculated as the percentage of the codon among all occurrences of codons with the same meaning (i.e., coding for the same amino acid or terminating translation)
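The definition of relative codon frequency given in the caption can be sketched in Python as follows. The function name and the toy sequence are hypothetical, and the code table is built from the standard genetic code with UAG reassigned to glutamine, the assignment inferred for I. spirale; this illustrates the calculation and is not the authors' script.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering ('*' = stop).
BASES = "TCAG"
STANDARD_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AA)
}


def relative_codon_frequencies(coding_sequences, code):
    """Percentage of each codon among all codons with the same meaning
    (same amino acid, or termination) under the supplied code table."""
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        # Read complete in-frame codons only; skip ambiguous ones.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            if codon in code:
                counts[codon] += 1
    totals = Counter()
    for codon, n in counts.items():
        totals[code[codon]] += n
    return {
        codon: 100.0 * counts[codon] / totals[meaning]
        for codon, meaning in code.items()
        if totals[meaning] > 0
    }


# Variant code with UAG (TAG in DNA notation) read as glutamine.
uag_q_code = dict(STANDARD_CODE, TAG="Q")
toy_cds = ["CAGCAAUAGUAA"]  # hypothetical sequence, not real data
freqs = relative_codon_frequencies(toy_cds, uag_q_code)
print({c: round(f, 1) for c, f in freqs.items() if c in ("CAG", "CAA", "TAG", "TAA")})
```

Passing the code table as an argument keeps the same function usable for the rhizarian exLh, where UAG would instead be mapped to leucine.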
Fig. 4 Phylogenetic distribution of known non-canonical genetic codes in nuclear genes of eukaryotes. The schematic phylogenetic tree was drawn on the basis of phylogenetic and phylogenomic analyses for eukaryotes as a whole [60, 71, 72] (our own Fig. 1 and Additional file 2: Figure S1) and for the relevant subgroups with non-canonical codes [12, 13, 73–77]. Multifurcations indicate uncertain or controversial branching order, dashed branches indicate different positions of Metamonada within eukaryotes suggested by different studies, and branches drawn as double lines indicate paraphyletic groupings. The types and occurrences of the different non-canonical codes are based on this study (the rhizarian exLh and Iotanema) and the following previous reports: fungi [14, 15]; Amoeboaphelidium [13]; oxymonads [11]; Blastocrithidia [18]; ulvophytes [12]; ciliates [7, 9, 16, 17]. Note that, for simplicity, code variants with a context-dependent dual meaning of UAR or UGA codons as sense or termination ones (UAR in Blastocrithidia and Condylostoma, UGA in Parduczia and Condylostoma) are not distinguished from those with a “complete” reassignment. We also omitted some ciliate species whose putative non-canonical codes are supported by little data and that are specifically related to, and possibly share the same code with, better studied species. Changes in the genetic code are mapped onto the tree primarily (black circles) using Dollo parsimony (no reversions are allowed). An alternative maximum parsimony scenario with reversions weighted the same as other changes is indicated by the respective code numbers in white circles. An alternative branching order to the one indicated in the figure was supported by some studies for some of the ciliate lineages, but the alternative topology does not decrease the minimal number of codon reassignments required to explain the distribution of non-standard genetic codes