
Non-Standard Genetic Codes Define New Concepts for Protein Engineering.

Ana R Bezerra¹, Ana R Guimarães², Manuel A S Santos³.

Abstract

The essential feature of the genetic code is the strict one-to-one correspondence between codons and amino acids. The canonical code consists of three stop codons and 61 sense codons that encode 20% of the amino acid repertoire observed in nature. It was originally considered immutable and universal due to its conservation in most organisms, but sequencing of genes from the human mitochondrial genome revealed deviations in codon assignments. Since then, alternative codes have been reported in both nuclear and mitochondrial genomes, and genetic code engineering has become an important research field. Here, we review the most recent concepts arising from the study of natural non-standard genetic codes, with special emphasis on codon reassignment strategies that are relevant to engineering the genetic code in the laboratory. Recent tools for synthetic biology and current attempts to engineer new codes for the incorporation of non-standard amino acids are also reviewed in this article.


Keywords: amino acids; biotechnology; codon reassignment; evolution; genetic code

Year: 2015 PMID: 26569314 PMCID: PMC4695839 DOI: 10.3390/life5041610

Source DB: PubMed Journal: Life (Basel) ISSN: 2075-1729

1. Introduction

The genetic code maps 64 codons onto a set of 20 amino acids plus the translational stop signal [1]. These codon-to-amino acid assignments are established by 20 aminoacyl-tRNA synthetases (AARSs) that recognize, activate and charge the 20 proteinogenic amino acids onto tRNAs. Aminoacyl-tRNAs are then delivered to the ribosome, where their three-letter anticodons read the three-letter codons of messenger RNAs (mRNAs) [2]. Although the genetic code is almost universal, 34 alterations in nuclear and organellar genomes (Table 1), from bacterial to eukaryotic species, have been discovered [3]. The majority of these codon reassignments involve sense-to-nonsense codon changes (or vice versa) and occur in mitochondria. Only one nuclear sense-to-sense alteration is known so far, namely the reassignment of the CUG codon from leucine to serine in several fungal species of the CTG clade [4,5]. Code variants involving stop codons include the glutamine and cysteine codons of certain ciliates [6] and the tryptophan codon of Mycoplasma [7]. Some of these reassignments involve codons whose identities changed multiple times in closely related phylogenetic lineages, suggesting that certain taxonomic groups (e.g., the ciliates) are more prone to codon reassignment than others [8]. Additionally, two non-canonical amino acids are naturally incorporated into the genetic code, namely selenocysteine, which is inserted at specific UGA sites in a wide range of prokaryotes and eukaryotes [9,10], and pyrrolysine, which is inserted at selected UAG sites in the archaeon Methanosarcina barkeri [11,12].
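As a minimal illustration of how a reassignment changes the meaning of a message, the Python sketch below builds the standard code as a codon-to-amino-acid table and applies a few of the variants mentioned above as overrides. The dictionary labels and helper names (`VARIANTS`, `make_code`, `translate`) are hypothetical, and the sketch ignores decoding ambiguity and simply stops at the first termination codon.

```python
from itertools import product

# Standard code (NCBI translation table 1): one letter per codon, with codons
# enumerated in UCAG order (UUU, UUC, UUA, UUG, UCU, ...); '*' marks stops.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# A few natural variants discussed in the text, expressed as overrides.
# The keys are illustrative labels, not official table names.
VARIANTS = {
    "mycoplasma": {"UGA": "W"},                    # UGA stop -> Trp
    "ctg_clade": {"CUG": "S"},                     # CUG Leu -> Ser
    "ciliate_uar_gln": {"UAA": "Q", "UAG": "Q"},   # UAR stops -> Gln
}


def make_code(overrides):
    """Return a copy of the standard code with the given reassignments applied."""
    code = dict(STANDARD_CODE)
    code.update(overrides)
    return code


def translate(mrna, code):
    """Translate an mRNA from its first codon until the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = "AUGCUGUGAGGC"
print(translate(mrna, STANDARD_CODE))                      # ML
print(translate(mrna, make_code(VARIANTS["mycoplasma"])))  # MLWG
print(translate(mrna, make_code(VARIANTS["ctg_clade"])))   # MS
```

With the same message, UGA terminates translation under the standard code but is read as Trp under the Mycoplasma override, while the CTG-clade override changes the second residue from Leu to Ser.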

Table 1

Genetic code alterations in mitochondrial and nuclear genomes. These changes are phylogenetically independent and some of them occur more than once (adapted from [3]).

	Unassigned ➔ X	Sense ➔ Unassigned	Stop ➔ Sense	Sense ➔ Stop	Sense ➔ Sense
Mitochondrial					AGG (Ser) ➔ Lys
	AGA ➔ Gly		UGA ➔ Trp		AUA (Ile) ➔ Met
	AGA ➔ Ser	CGN (Arg) ➔ UN	UAA ➔ Tyr	UCA (Ser) ➔ Stop	CUN (Leu) ➔ Thr
		AGR (Ser) ➔ UN	UAG ➔ Leu		AGA (Arg) ➔ Ser
	AGR ➔ Stop		UAG ➔ Ala		AGG (Arg) ➔ Ser
					AAA (Lys) ➔ Asn
					AGA (Arg) ➔ Gly
					AGG (Arg) ➔ Gly
Nuclear		AGA (Arg) ➔ UN	UGA ➔ Trp
		AUA (Ile) ➔ UN	UGA ➔ Cys		CUG (Leu) ➔ Ser
		CGG (Arg) ➔ UN	UAR ➔ Gln

UN, unassigned; the amino acid in parentheses is the original assignment of the codon.

These alterations provide insight into the evolution of the genetic code and highlight new concepts that can be used to manipulate protein function for basic and applied research purposes. In recent years, non-canonical amino acids have been incorporated into proteins in vivo using orthogonal aminoacyl-tRNA synthetase/tRNA pairs and nonsense codons. More than 100 unnatural amino acids have been incorporated into proteins of numerous organisms, such as Escherichia coli [13,14,15], Saccharomyces cerevisiae [13], mammalian cells [13,14], Shigella [15], Salmonella [15], Mycobacterium tuberculosis [16], Drosophila melanogaster [13], Caenorhabditis elegans [13,17], Bombyx mori [18] and Arabidopsis thaliana [14]. High-level misincorporation of canonical amino acids has also been reported. UAG stop codons have been reassigned to glutamine (Gln) and tyrosine (Tyr) in a modified E. coli strain lacking both the UAG codons of essential genes and release factor 1 (RF1), which recognizes UAG [19]. Sense codons have been reassigned to semi-conserved amino acids in E. coli through selective pressure incorporation (SPI) methodologies that activate amino acid misincorporation in quiescent cells to minimize the toxic effects of codon ambiguity [20,21]. Moreover, Euplotes crassus tolerates the incorporation of two amino acids (selenocysteine and cysteine) at the UGA codon, and the dual use of this codon can occur within the same gene [22]. These examples highlight the high flexibility of the genetic code, but how natural variation in codon-amino acid assignments emerges and is selected, as well as the consequences of engineering the genetic code, remain unclear.

2. Structural and Molecular Features of Non-Standard Genetic Codes

2.1. Nuclear Genetic Code Variation

Most codon reassignments have been linked to alterations in components of the translational machinery, namely tRNAs, aminoacyl-tRNA synthetases and the release factors that recognize stop codons [23]. In bacteria, reassignments appear to be restricted to the UGA stop codon and are associated with the loss of RF2, which recognizes the UGA and UAA termination codons, and with mutant tRNAs that misread UGA. UGA has been reassigned to Trp in Mycoplasma spp. [7] and Spiroplasma citri [24]. Recent metagenomic studies and single-cell sequencing approaches revealed that the uncultivated bacteria Candidatus Hodgkinia cicadicola [25] and BD1-5 [26] also decode UGA as Trp, while SR1 bacteria [27] and Gracilibacteria [28] decode it as Gly. Mollicutes with altered codes have two Trp-tRNA species, one with the canonical CCA anticodon to decode the UGG-Trp codon and the other with a UCA anticodon for decoding the UGA stop codon [29]. Since only UAA and UAG codons are used as termination codons, these species maintained RF1 (responsible for the recognition of UAA and UAG) and eliminated RF2 [30]. Their small, AT-rich genomes (e.g., the AT content of Mycoplasma capricolum is ~75%) are likely to introduce strong codon usage biases that may drive the replacement of UGA by UAA codons. This renders RF2 dispensable, as RF1 alone is able to recognize the remaining UAA and UAG termination codons [31]. Conversely, the reassignment of UGA to Trp also occurs in the GC-rich (~60%) genome of Candidatus Hodgkinia cicadicola, where RF2 is absent. In this case, there is only one Trp-tRNA, with a UCA anticodon, suggesting that its gene arose by mutation and not by tRNA gene duplication. The authors proposed that this identity change arose from codon ambiguity initiated by the emergence of a mutant Trp-tRNA that could decode the UGA codon.
This tRNA competed with RF2 for the UGA stop codon, which eventually rendered RF2 dispensable and led to further tRNA mutations that refined its new decoding properties. The Hodgkinia genome adapted to the new codon usage by replacing the old UGA stop codons with UAA and UAG codons and by replacing some of the UGG codons with UGA [25]. The reassignment of UGA to Gly in SR1 bacteria present in the human microbiome is also accompanied by the loss of RF2. Apart from the canonical Gly-tRNAUCC, which decodes GGN-Gly codons, its genome also encodes an additional Gly-tRNAUCA. Although the D and anticodon arms of this unusual tRNA diverge from those of the canonical Gly-tRNA, it maintains the major identity elements for glycylation by GlyRS [27]. Little is known about codon alterations in phages, but several reports suggest reassignment of the UGA stop codon to Trp [32] and to Gly, and of the UAG stop codon to Ser and Gln [33]. Since bacteria appear to reassign only UGA codons, the use of a divergent code in bacteriophages has important implications. It has been suggested that differences between viral and host genetic codes constitute a barrier to infection, because phages are deeply dependent on the translation machinery of their hosts [32,34], but these phages encode Gln-tRNACUA or Ser-tRNACUA and RF2 to translate UAG codons. Since bacteria that use UGA codons as sense codons have lost RF2, such phages are able to infect their hosts [33]. In eukaryotes, termination codons are also reassigned by the cytoplasmic translational machinery. These alterations are again associated with misreading tRNAs, aminoacyl-tRNA synthetases and release factors [23].
Eukaryotes have only one release factor (eRF1), which decodes the UAA, UAG and UGA stop codons. This factor has three well-defined domains [35]: domain 1 is responsible for stop codon recognition [36], domain 2 is associated with peptidyl-tRNA hydrolysis [37] and domain 3 interacts with eRF3, a GTPase that stimulates termination activity [38]. Consequently, changes in the stop codon recognition domain (domain 1) are associated with stop codon reassignment. UAR stop codons have been reassigned to Gln in the diplomonad Hexamita inflata [39], the oxymonad Streblomastix strix [40] and several dasycladalean, cladophoralean and trentepohlialean green algae [41]. Several species of ciliates use different deviant codes that arose independently. UAR stop codons have been reassigned to Gln in Paramecium [42], Tetrahymena [43], Oxytricha, Loxodes [44] and Stylonychia [45], and to Glu in Vorticella and Opisthonecta [46]. The UGA stop codon has been reassigned to Cys in Euplotes spp. [47] and to Trp in Blepharisma americanum and Colpoda [48]. Decoding of UAR codons as Gln in Tetrahymena thermophila requires two additional Gln-tRNAs with the anticodons UUA and CUA, while the canonical CAR-Gln codons are translated by the usual Gln-tRNAUUG [6]. On the other hand, Euplotes has only one gene for Cys-tRNA, with a GCA anticodon, so decoding of UGA codons requires an unusual G-A base pair at the wobble position (Figure 1A) [49]. Apart from the emergence of suppressor tRNAs able to decode UGA or UAR codons, ciliate eRF1s must have altered stop codon recognition specificities.
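The bacterial argument made earlier in this section, that RF2 becomes dispensable once UGA no longer serves as a stop codon because RF1 still covers UAA and UAG, can be phrased as a simple set check. The sketch below is a simplification under stated assumptions: it considers only the RF1/RF2 specificities given in the text, ignores readthrough and suppressor tRNAs, and uses hypothetical ORF sequences and helper names.

```python
from collections import Counter

# Stop codons recognized by the bacterial class I release factors (from the text).
RF_SPECIFICITY = {
    "RF1": {"UAA", "UAG"},
    "RF2": {"UAA", "UGA"},
}


def stop_codon_usage(orfs):
    """Count the terminal (stop) codon of each open reading frame."""
    return Counter(orf[-3:] for orf in orfs)


def dispensable_release_factors(stops_in_use):
    """Release factors whose individual loss still leaves every stop codon in use
    recognized by the remaining factor(s)."""
    dispensable = []
    for factor in RF_SPECIFICITY:
        others = set()
        for name, codons in RF_SPECIFICITY.items():
            if name != factor:
                others |= codons
        if set(stops_in_use) <= others:
            dispensable.append(factor)
    return dispensable


# Hypothetical ORFs from a genome in which UGA is read as Trp, not as a stop.
orfs = ["AUGAAAUAA", "AUGGCCUAG", "AUGUGGUGAUAA"]
usage = stop_codon_usage(orfs)
print(dict(usage))                                         # {'UAA': 2, 'UAG': 1}
print(dispensable_release_factors(usage))                  # ['RF2']
print(dispensable_release_factors({"UAA", "UAG", "UGA"}))  # []
```

Only when UGA has disappeared from the set of stop codons in use does RF2 drop out as dispensable, which mirrors the Mycoplasma scenario described above.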

Figure 1

(A) An expanded wobble rule; (B) Possible pairings between the wobble nucleoside of tRNA and the third codon nucleoside of mRNA found in animal mitochondria. U*: cmnm5(s2)U, mnm5U, τm5U, τm5s2U (adapted from [64]).

Several studies implicated a series of substitutions in domain 1 of divergent eRF1s, particularly in the highly conserved TASNIKS and YCF motifs, which are involved in stop codon recognition (Table 2). However, substitutions across ciliate species are not alike and reveal different modes of stop codon specificity [50,51]. Ciliates that use UAR as sense codons terminate translation only at UGA codons, and their eRF1 is UGA-specific. Introduction of the divergent YCF motif of Stylonychia (QFMYFCGGKF) into human eRF1 is sufficient to alter its specificity to UGA-only [52]. However, in both Paramecium and Loxodes, the divergent YCF motif alone is not sufficient and must act together with the altered TASNIKS motif to ensure UGA-only specificity [50,52]. Data for Tetrahymena eRF1 are inconsistent between in vitro and in vivo studies. Chimeras in which domains 2 and 3 of yeast eRF1 are fused with the entire domain 1 of Tetrahymena eRF1 show UGA-only specificity in vitro [53], but retain the ability to recognize all three stop codons in vivo [54]. Introduction of the Tetrahymena TASNIKS and YCF motifs into human eRF1 does not alter recognition of UAA and UGA codons, but dramatically increases readthrough at UAG codons [50]. It has been suggested that Tetrahymena represents an ambiguous intermediate stage of the codon reassignment process, as its eRF1 retains the ability to recognize all three stop codons and reassignment is accomplished by competition from its suppressor Gln-tRNAs [54], which efficiently decode UAR codons as Gln [6]. Conversely, Blepharisma and Euplotes reassigned UGA stop codons to Trp and Cys, respectively, and their eRF1s recognize only UAR codons as termination codons [55]. Both have a single substitution from Leu-126 to Ile in the YCF motif (YICDNKF). Introduction of this mutation into S. cerevisiae eRF1 dramatically increased readthrough at UGA sites [50].
Another consistent substitution found in both genera is Ser-70 to Ala, which has been shown to increase UGA readthrough in vivo while maintaining efficient termination at UAR codons. For efficient discrimination of guanine at the second codon position, Ser-70 must be able to form a hydrogen bond with Ser-33 (GTS loop), an interaction that is lost upon substitution with alanine [56].

Table 2

Mutations in the highly conserved TASNIKS and YCF motifs of domain 1 of ciliate eRF1s, which alter stop codon recognition specificity and constitute an important step in codon reassignment (adapted from [51]).

	TASNIKS motif	YCF motif	S70
Canonical codes	TASNIKS	YLCDNKF	Ser
Paramecium tetraurelia	EAASIKD	YFCDPQF	Ser
Loxodes striatus	RAQNIKS	FLCENTF	Ala
Oxytricha trifallax	AAQNIKS	YFCGGKF	Ser
Tetrahymena thermophila	KATNIKD	YFCDSKF	Ser
Stylonychia lemnae	AAQNIKS	YFCGGKF	Ser
Stylonychia mytilus	AAQNIKS	YFCGGKF	Ser
Euplotes octocarinatus/a	TAESIKS	YICDNKF	Ala
Euplotes octocarinatus/b	TAVNIKS	YICDNKF	Ala
Euplotes aediculatus/a	TAESIKS	YICDNKF	Ala
Euplotes aediculatus/b	TAVNIKS	YICDNKF	Ala
Blepharisma americanum	KSSNIKS	YICDNKF	Ala
Blepharisma japonica	KSSNIKS	YICDNKF	Ala
Blepharisma musculus	KSSNIKS	YICDNKF	Ala
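Because the motifs in Table 2 are already aligned and of equal length, the substitutions relative to the canonical eRF1 motifs can be listed position by position. The sketch below uses a subset of the sequences copied from Table 2; the dictionary and function names are hypothetical, and positions are numbered within each motif rather than along the full eRF1 sequence.

```python
# Canonical eRF1 motifs and a subset of ciliate sequences, copied from Table 2.
CANONICAL = {"TASNIKS": "TASNIKS", "YCF": "YLCDNKF"}
CILIATE_MOTIFS = {
    "Tetrahymena thermophila": {"TASNIKS": "KATNIKD", "YCF": "YFCDSKF"},
    "Stylonychia lemnae": {"TASNIKS": "AAQNIKS", "YCF": "YFCGGKF"},
    "Euplotes aediculatus/a": {"TASNIKS": "TAESIKS", "YCF": "YICDNKF"},
    "Blepharisma americanum": {"TASNIKS": "KSSNIKS", "YCF": "YICDNKF"},
}


def substitutions(reference, variant):
    """Return (1-based motif position, reference residue, variant residue)
    for every position at which two aligned, equal-length motifs differ."""
    if len(reference) != len(variant):
        raise ValueError("motifs must be aligned and of equal length")
    return [
        (position, ref, var)
        for position, (ref, var) in enumerate(zip(reference, variant), start=1)
        if ref != var
    ]


for species, motifs in CILIATE_MOTIFS.items():
    for motif, sequence in motifs.items():
        print(species, motif, substitutions(CANONICAL[motif], sequence))
```

For both Euplotes and Blepharisma the YCF comparison returns a single Leu-to-Ile change at motif position 2, the Leu-126 to Ile substitution discussed in the text.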

The only known sense-to-sense reassignment in nuclear genomes is found in several Candida species [5], where the CUG codon is reassigned from Leu to Ser, although its decoding in vivo still involves some degree of ambiguity [57,58,59]. This code alteration is mediated by a Ser-tRNACAG (Figure 2A,B) that is recognized by both SerRS and LeuRS [60,61]. This tRNA carries the leucylation identity elements A35 and m1G37, and a U-to-G mutation at position 33 (G33) that distorts the anticodon U-turn and lowers its leucylation and decoding efficiencies. Its discriminator base is G73, which, together with three GC pairs in the variable arm, is a major identity element for serylation [60,61].
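To make the consequence of ambiguous CUG decoding concrete, the sketch below assumes, purely for illustration, that each CUG site in a protein is decoded independently as Ser with probability `p_ser` and as Leu otherwise. Under that assumption, a protein with n CUG sites has 2^n possible Ser/Leu variants, and the fraction of molecules carrying Ser at every site falls as p_ser^n. The value 0.97 below is hypothetical and is not a measured decoding ratio; the function names are likewise illustrative.

```python
from math import comb


def all_serine_fraction(p_ser, n_cug):
    """Probability that every one of n_cug CUG sites is decoded as Ser,
    assuming independent decoding with per-site probability p_ser."""
    return p_ser ** n_cug


def leu_site_distribution(p_ser, n_cug):
    """Binomial probabilities of finding k Leu residues (k = 0..n_cug)."""
    p_leu = 1 - p_ser
    return [
        comb(n_cug, k) * p_leu ** k * p_ser ** (n_cug - k)
        for k in range(n_cug + 1)
    ]


# Illustrative values only: p_ser = 0.97 is a hypothetical figure.
p_ser = 0.97
for n in (1, 5, 20):
    print(n, round(all_serine_fraction(p_ser, n), 3), 2 ** n)
```

The independence assumption is the main simplification here; it ignores any effect of codon context on decoding.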

Figure 2

tRNA secondary structures. (A,B) A purine at position 33 (G33) in the C. albicans tRNASerCAG anticodon loop replaces a conserved pyrimidine found in all other tRNAs and is a key structural element in the reassignment of the CUG codon from leucine to serine. Two other nucleotides in the anticodon loop, A35 and G37, are important for leucylation, and the discriminator base, G73, functions as a negative identity determinant for leucyl-tRNA synthetase (A73 is required for leucylation); (C) tRNAsSec from all domains of life are unusual in both length (>90 nt) and structure. Most tRNAs have 7 bp in the acceptor stem and 5 bp in the TΨC arm, while eukaryal and archaeal tRNAsSec have 9 bp in the acceptor stem and 4 bp in the TΨC arm. Eukaryotic and archaeal tRNASec species have 6 or 7 bp D-stems, respectively. Molecular modeling suggested that a 7 bp D-stem in archaeal tRNASec would compensate for the short 4 bp T-stem, thus allowing for the normal interaction between the D- and T-loops; (D) tRNAsPyl have a smaller D-loop (4–5 bp). Only one base is found between the acceptor and D-stems, rather than two, and the almost universally conserved G-purine sequence in the D-loop and TΨC sequence in the T-loop are lacking. The anticodon stem forms with six, rather than five, base pairs, leaving only a very short (three-base) variable loop (adapted from [3]).

2.2. Mitochondrial Variations

Mitochondria show a significant diversity of codon reassignments, comprising nonsense-to-sense, sense-to-sense, sense-to-nonsense and sense-to-unassigned codon changes [62]. These alterations appear to be facilitated by the reduced size and complexity of mitochondrial genomes, which encode only a small set of essential genes. These genomes also tend to have a strongly biased, AT-rich composition [62]. They encode only a small set of tRNAs (for example, human mtDNA encodes 22 tRNA species [63]), and thus each tRNA can read two to four codons of a four-codon box by expanded wobbling (Figure 1). For example, the presence of an unmodified U at anticodon position 34 (the wobble position) enables pairing with N-ending codons, allowing decoding of all four codons of a codon box. Several modified nucleosides in the first and second positions of the anticodon also play critical roles in mitochondrial decoding [64]. Termination codons have been reassigned to different amino acids in mitochondria. The UAA codon is decoded as Tyr in the mitochondria of the nematode R. similis [65]. UAG codons are decoded as tyrosine by an unusual Tyr-tRNACUA in calcareous sponges [66], whereas in green algae their meaning has changed to Ala or Leu [67]. The most frequent reassignment involves decoding of the UGA stop codon as Trp [68]. This change is mediated by a Trp-tRNA with the anticodon UCA whose wobble position carries a modified uridine. Modifications can be 5-carboxymethylaminomethyluridine (cmnm5U), 5-carboxymethylaminomethyl-2-thiouridine (cmnm5s2U) or 5-taurinomethyluridine (τm5U), and they expand the decoding capacity to R-ending codons, enabling the decoding of both UGG and UGA codons as Trp [69]. Sense codons also change identity in mitochondria, and some are unassigned because they are not present in the mtDNA.
Insertion of Met at the Ile AUA codon is common in most metazoans. In mammals, this identity change is mediated by a Met-tRNACAU in which the wobble C is modified to 5-formylcytidine (f5C) [70], enabling decoding of both AUG and AUA codons [71]. Ascidian Met-tRNA has a τm5U modification at the same position [69]. The AAA-Lys codon is translated as asparagine in echinoderms and platyhelminths [72]. In starfish mitochondria, a single Asn-tRNAGΨU with a pseudouridine (Ψ) at the second position of the anticodon decodes both the canonical AAY-Asn codons and the AAA-Lys codon. In addition, its Lys-tRNA has a CUU anticodon, instead of UUU, which restricts its decoding to AAG only [73]. Mitochondria of the yeast genera Saccharomyces, Nakaseomyces and Vanderwaltozyma decode the four Leu-CUN codons as threonine [74]. This alteration is associated with the loss of the Leu-tRNAUAG that decodes the CUN codons and the appearance of a mutant Thr-tRNAUAG with an unmodified U at the wobble position, which enables recognition of all four nucleotides at the third codon position [64]. Interestingly, this Thr-tRNA evolved from a His-tRNAGUG through loss of the typical guanosine at position -1 and substitution of the discriminator base C73 by A73 (critical identity elements for HisRS) [75], and through the addition of an adenosine at position 35. Consequently, its anticodon loop has 8 nt and is a substrate for the yeast ThrRS [76]. On the other hand, the yeast Ashbya gossypii decodes the CUU and CUA codons as Ala using an Ala-tRNAUAG [77]. It was proposed that this tRNA evolved from the latter Thr-tRNAUAG through reduction of the anticodon loop (a major identity element for S. cerevisiae ThrRS [78]) and introduction of a G3:U70 base pair, which is a major identity element for AlaRS [75]. The arginine AGA and AGG codons change identity very often and have acquired different meanings, namely Ser [79], Gly [80] or stop [63].
Mitochondria that reassigned AGR codons lack the Arg-tRNAUCU gene, and this loss has been proposed as the initial step for these reassignments [68]. In the absence of the competing Arg-tRNAUCU, the AGA codon is captured by a Ser-tRNAGCU [81]. In Drosophila, AGG codons are absent and only AGA codons are decoded by the Ser-tRNAGCU, which has an unmodified G at the wobble position [82]. In squid and starfish mitochondria, the wobble position of Ser-tRNAGCU is methylated to m7G34, which expands its capacity to read AGR-Arg codons, inserting serine at these sites [83]. On the other hand, the wobble position of the Ser-tRNA of Ascaris mitochondria is occupied by an unmodified U [84], which allows decoding of AGN codons as Ser [81]. In ascidian mitochondria, AGR codons are decoded as Gly by a Gly-tRNAUCU carrying a τm5U modification at the wobble position [69]. Although most changes affect both codons of the pair simultaneously, some arthropods and the nematode R. compacta decode the AGG codon as Lys and AGA as Ser. These species have an unmodified Ser-tRNAGCU for AGA codons and a Lys-tRNA with a CUU anticodon instead of the typical UUU anticodon, which is thought to recognize AGG codons with low efficiency [85]. Interestingly, the appearance of this atypical Lys-tRNACUU restricts recognition of AAA-Lys codons, which has been correlated with their reassignment to Asn by Asn-tRNAGUU, both in this case and in other species that do not use the AGG codon as Lys (e.g., echinoderms) [73]. Besides AGR, another codon reassigned to stop is the UCA-Ser codon of the green alga Scenedesmus obliquus [86]. Both cases share the absence of the cognate tRNA that would recognize the AGR or UCA codons, namely Arg-tRNAUCU [68] and Ser-tRNAUGA, respectively. Since Ser-tRNAUGA would normally decode the UCN-Ser codon box, S. obliquus uses a Ser-tRNAGGA to decode the remaining UCU and UCC codons, and UCG is an unassigned codon [86]. Termination codons have also been reassigned in mitochondria.
The reassignment of the UGA codon to Trp occurs in all animal mitochondria [64]. These reassignments require changes in the release factors, but the termination mechanism in mitochondria remains an unsolved question. Four different homologues of bacterial release factors have been found in the human mitochondrial system: mtRF1, mtRF1a, ICT1 and C12orf65 [87]. To date, none of these factors has shown UGA-specific release activity. Although molecular dynamics simulations have suggested that mtRF1 may behave like RF1 [88] or that it may rescue stalled ribosomes with empty A-sites [89], its function remains elusive, since no in vitro release activity has been found for any termination codon, including AGR codons [90]. mtRF1a has in vitro and in vivo release activity in response to UAG and UAA stop codons, similarly to bacterial RF1 [91]. ICT1 is an integral member of the mitoribosome with codon-independent peptidyl-tRNA hydrolase activity [87], and is thought to function as a multipurpose rescue factor for stalled ribosomes [90]. Regarding the use of AGR codons as termination codons in vertebrate mitochondria, one must consider the absence of the Arg-tRNAUCU that would decode AGR codons [68]. Since the ribosome is expected to stall at these sites, ICT1 is thought to recognize the stalled ribosome and terminate translation at AGR sites [90].
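The expanded wobble rules described in this section (Figure 1) can be encoded as a small lookup table from the wobble nucleoside to the third-position bases it reads. The sketch below includes only the cases stated in the text plus the standard G34 and C34 pairings; the key `"U*"` is this sketch's shorthand for the modified uridines cmnm5U, cmnm5s2U and τm5U, and the function name is hypothetical. It reproduces the decoding ranges of several tRNAs discussed above.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Third-codon-position bases read by the wobble nucleoside (anticodon position 34).
# "U*" stands for the modified uridines cmnm5U, cmnm5s2U and taum5U.
WOBBLE_READS = {
    "U": "ACGU",   # unmodified U34: all four (N-ending) codons
    "U*": "AG",    # modified U34: R-ending codons
    "f5C": "AG",   # 5-formylcytidine: AUG and AUA in mammalian Met-tRNA
    "G": "CU",     # G34: Y-ending codons (standard wobble)
    "C": "G",      # unmodified C34: G-ending codons only
}


def codons_read(wobble, positions_35_36):
    """Codons (5'->3') read by a tRNA whose anticodon (5'->3') is the wobble
    nucleoside followed by the bases at positions 35 and 36."""
    # Codon position 1 pairs with anticodon 36, position 2 with anticodon 35.
    first = COMPLEMENT[positions_35_36[1]]
    second = COMPLEMENT[positions_35_36[0]]
    return sorted(first + second + third for third in WOBBLE_READS[wobble])


print(codons_read("U*", "CA"))   # Trp-tRNA(UCA), modified U34: ['UGA', 'UGG']
print(codons_read("f5C", "AU"))  # mammalian Met-tRNA(CAU): ['AUA', 'AUG']
print(codons_read("U", "AG"))    # yeast Thr-tRNA(UAG): ['CUA', 'CUC', 'CUG', 'CUU']
print(codons_read("C", "UU"))    # Lys-tRNA(CUU): ['AAG']
```

Other modifications mentioned in the text, such as m7G34 or the Ψ-containing Asn-tRNA anticodon, are left out because their full pairing ranges are not specified here.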

2.3. Natural Expansion of the Genetic Code to 22 Amino Acids

Termination codons are also the target for the incorporation of the non-canonical amino acids selenocysteine (Sec), in a wide range of prokaryotes and eukaryotes [92], and pyrrolysine (Pyl), in archaeal Methanosarcina species [93], producing novel classes of proteins. Incorporation of Sec in response to an in-frame UGA codon is achieved by a complex recoding machinery that instructs the ribosome not to stop at this position. The mechanism differs between prokaryotes and eukaryotes, but there are some similarities. Both use a special Sec tRNA, a minor isoacceptor derived from a serine tRNA (Figure 2C). The other key players are SelB and the SECIS (selenocysteine insertion sequence) element. Since Sec has its own tRNASec, biosynthesis begins with SerRS acylating tRNASec with serine, producing Ser-tRNASec. Then, different enzymes convert Ser-tRNASec into Sec-tRNASec: selenocysteine synthase (SelA) and selenophosphate synthetase (SelD) in bacteria, and O-phosphoseryl-tRNA kinase (PSTK) and Sep-tRNA:Sec-tRNA synthase (SepSecS) in archaea and eukarya [10,94]. Once Sec-tRNASec is available, recoding of UGA as Sec requires the translation elongation factor SelB. This factor binds Sec-tRNASec and forms the SelB·GTP·Sec-tRNASec complex that is delivered to the ribosome. Studies performed by Böck and co-workers revealed that SelB must be complexed with the SECIS element for the correct interaction with the ribosome to occur [92]. Binding of the ternary complex to the SECIS structure induces a conformational change in SelB that enables codon–anticodon interaction between the Sec-tRNASec and the UGA codon at the ribosomal A-site. The SECIS element therefore has a critical dual function. It converts SelB into a “competent state” that gives SelB a strong competitive advantage over the release factor for decoding UGA. Simultaneously, it prevents normal UGA termination codons from being decoded as Sec by the SelB·GTP·Sec-tRNASec ternary complex.
The dual properties of SelB and SECIS ensure that only UGA codons in selenoprotein mRNAs are recoded [9]. While Sec is generated by pretranslational modification of Ser-tRNASec, pyrrolysine (Pyl) is directly attached to tRNAPylCUA (Figure 2D) by PylRS and inserted in response to an in-frame UAG codon in the Methanosarcina barkeri monomethylamine methyltransferase gene [12]. These are methane-producing organisms, and Pyl is necessary for methane biosynthesis from methylamines.
Indeed, the three different methyltransferases that initiate methanogenesis from different methylamines are encoded by genes with an in-frame UAG codon that is translated as pyrrolysine [11,93]. The mechanism of Pyl insertion requires a tRNAPyl (tRNAPylCUA) and a pyrrolysyl-tRNA synthetase (PylRS). PylRS is considered the 21st AARS, since it specifically charges Pyl onto tRNAPylCUA (neither lysine nor its cognate tRNALys is a substrate of this enzyme) [95]. PylRS is thus the first example of a synthetase specific for a modified amino acid; PylRS and tRNAPyl form a naturally occurring AARS-tRNA pair that is effectively orthogonal to the canonical genetic code [11]. The mechanisms of Sec and Pyl insertion differ among organisms, but context dependency is a universal feature of these events, which can be regarded as preprogrammed modifications of canonical decoding rules.
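The context dependency described above, where a UGA is read as Sec only when the recoding machinery is engaged and every other stop codon still terminates translation, can be sketched as a translation function that takes the recoded positions as input. This is a deliberately simplified model: the SECIS element and the Pyl context are not detected from sequence but supplied as hypothetical 0-based codon indices (`sec_sites`, `pyl_sites`), and the function name is illustrative.

```python
from itertools import product

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_with_recoding(mrna, sec_sites=frozenset(), pyl_sites=frozenset()):
    """Translate with context-dependent recoding.

    sec_sites / pyl_sites are 0-based codon indices whose UGA / UAG is assumed
    to be recoded (e.g. because a SECIS element accompanies that UGA). Any
    other stop codon terminates translation as usual.
    """
    protein = []
    for index in range(len(mrna) // 3):
        codon = mrna[3 * index:3 * index + 3]
        if codon == "UGA" and index in sec_sites:
            protein.append("U")   # selenocysteine (one-letter code U)
        elif codon == "UAG" and index in pyl_sites:
            protein.append("O")   # pyrrolysine (one-letter code O)
        elif STANDARD_CODE[codon] == "*":
            break                 # ordinary termination
        else:
            protein.append(STANDARD_CODE[codon])
    return "".join(protein)


print(translate_with_recoding("AUGUGAGCCUGA", sec_sites={1}))  # MUA
print(translate_with_recoding("AUGUGAGCCUGA"))                 # M
print(translate_with_recoding("AUGUAGAAAUAA", pyl_sites={1}))  # MOK
```

In the first example, the internal UGA is recoded while the final UGA still terminates, which is the discrimination that the SECIS element provides in vivo.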

3. Genetic Code Expansion for Co-Translational Protein Engineering

The study of the structural and molecular features of non-standard genetic codes, in addition to supporting models of codon reassignment (reviewed in [96,97]), also provides useful information for the synthetic rewriting of genetic codes. Incorporation of non-canonical amino acids (ncAAs), in particular isostructural ncAAs that are recognized by the endogenous host cell machinery, has been achieved by replacing canonical amino acids (cAAs) using a supplementation-based incorporation method (SPI). This approach uses strains auxotrophic for one of the 20 cAAs to replace that specific cAA with a ncAA. The method exploits the natural tolerance of the host AARSs towards isostructural ncAAs, which allows the concurrent exchange of many residues in a target protein by sense codon reassignment [98]. Although global replacement of a cAA by a ncAA cannot be tolerated during exponential growth, non-dividing cells remain viable and are able to overexpress proteins that contain the ncAA. The diversity of amino acid analogs that can be incorporated using this approach has been increased through AARS overexpression, active-site engineering and editing domain mutations [99]. Numerous applications of this technique are available, including the replacement of methionine with selenomethionine to introduce a heavy atom into proteins for crystallographic phasing experiments [100] and the replacement of methionine or phenylalanine by alkyne-containing ncAA analogs to track newly synthesized proteins [101]. Orthogonal ncAAs (which do not participate in conventional translation), on the other hand, have been incorporated site-specifically in response to stop or quadruplet codons (stop codon suppression, SCS) using orthogonal aminoacyl-tRNA synthetase:tRNA pairs (Figure 3) [102].
Orthogonal tRNAs and AARSs must satisfy a series of conditions that ensure the absence of cross-reactivity between the pair and the endogenous host synthetases, amino acids and tRNAs. First, the tRNA must not be recognized by the endogenous AARSs of the host, but must function efficiently in translation. Another crucial requirement is that the tRNA must deliver the ncAA in response to a unique codon that does not encode any of the 20 cAAs (for example, a stop codon). Second, the orthogonal AARS must aminoacylate only the orthogonal tRNA and none of the endogenous tRNAs. This synthetase must also charge the tRNA only with the desired unnatural amino acid and not with any endogenous amino acid. Similarly, the ncAA must not be a substrate for the endogenous synthetases. Finally, the ncAA must be efficiently transported into the cytoplasm when added to the growth medium, or be biosynthesized by the host [103]. A number of heterologous AARS/tRNA pairs have been developed to expand the genetic code of E. coli, yeast and mammalian cells. For example, the E. coli GluRS/human initiator tRNA, E. coli TyrRS/E. coli tRNATyr, E. coli LeuRS/E. coli tRNALeu and M. mazei PylRS/M. mazei tRNAPyl pairs are all orthogonal in S. cerevisiae [102], demonstrating the potential of this methodology for synthetic biology.
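The cross-reactivity conditions in the preceding paragraph can be phrased as checks over a set of aminoacylation events, each recorded as a (synthetase, tRNA, amino acid) triple. The sketch below is a hypothetical bookkeeping aid rather than an established tool: the names `oRS`, `o-tRNA(CUA)` and `ncAA` and the host data are invented for illustration, and the uptake or biosynthesis requirement for the ncAA is not modelled.

```python
def orthogonality_violations(charging_events, o_rs, o_trna, ncaa):
    """Check the orthogonality conditions against a set of
    (synthetase, tRNA, amino acid) aminoacylation events.
    An empty list means no violation was found."""
    violations = []
    for rs, trna, aa in sorted(charging_events):
        if rs != o_rs and trna == o_trna:
            violations.append(f"endogenous {rs} charges the orthogonal tRNA")
        if rs == o_rs and trna != o_trna:
            violations.append(f"orthogonal synthetase charges endogenous {trna}")
        if rs == o_rs and aa != ncaa:
            violations.append(f"orthogonal synthetase activates canonical {aa}")
        if rs != o_rs and aa == ncaa:
            violations.append(f"endogenous {rs} activates the ncAA")
    if (o_rs, o_trna, ncaa) not in charging_events:
        violations.append("orthogonal pair does not produce ncAA-tRNA")
    return violations


# Hypothetical aminoacylation data for an engineered host.
clean_host = {
    ("TyrRS", "tRNA-Tyr", "Tyr"),
    ("LysRS", "tRNA-Lys", "Lys"),
    ("oRS", "o-tRNA(CUA)", "ncAA"),
}
leaky_host = clean_host | {("LysRS", "o-tRNA(CUA)", "Lys")}

print(orthogonality_violations(clean_host, "oRS", "o-tRNA(CUA)", "ncAA"))  # []
print(orthogonality_violations(leaky_host, "oRS", "o-tRNA(CUA)", "ncAA"))
```

In the leaky host, an endogenous synthetase charging the orthogonal tRNA is flagged, which corresponds to the first condition listed above.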

Figure 3

(A) Aminoacylation with canonical amino acids. tRNA aminoacylation is catalyzed by the corresponding aminoacyl-tRNA synthetase responsible for charging the tRNA with the cognate amino acid; (B) Stop codon suppression methods use heterologous orthogonal AARS:tRNA pairs to incorporate an orthogonal amino acid in response to a stop or quadruplet codon. This orthogonal amino acid is not a substrate for the endogenous tRNA and AARS (adapted from [104]).

3.1. Reassignment of Stop Codons

Stop codon suppression is the most frequently used method to incorporate ncAAs into proteins in vivo. This approach combines an orthogonal aminoacyl-tRNA synthetase/tRNA pair, specifically developed to introduce ncAAs at the stop codon, with deletion of the corresponding release factor to increase suppression efficiency. One of the first successful reassignments was performed by Mukai and colleagues, who reassigned the UAG (amber) codon to the ncAA iodotyrosine (3-iodo-L-Tyr) [19]. They started by mutating the UAG stop codon to UAA in seven essential genes of E. coli, which allowed deletion of the RF1-encoding prfA gene (release factor 1 terminates translation at UAA and UAG). Next, the cells were supplied with an archaeal amber suppressor TyrRS/tRNACUA pair that inserted 3-iodo-L-Tyr at UAG codons, as demonstrated by full-length expression of a target protein containing six UAG codons [19,105,106]. More recently, several groups applied a genome-wide editing approach in which the amber stop codon is replaced not only in essential genes but at all instances [34,107,108]. For example, Lajoie et al. used both multiplex automated genome engineering (MAGE) [109] and conjugative assembly genome engineering (CAGE) [107] to replace all known UAG stop codons in E. coli MG1655 with synonymous UAA codons. This allowed the deletion of RF1 and, therefore, the elimination of termination at UAG codons. The resulting organism allowed them to reintroduce amber codons, along with an orthogonal translation machinery (episomal pEVOL), to permit efficient, site-specific incorporation of p-azidophenylalanine (pAzF) and 2-naphthylalanine (NapA) into green fluorescent protein (GFP). This recoded organism exhibited increased resistance to T7 bacteriophage, suggesting that new genetic codes could confer increased viral resistance [34].
Although this approach is widely used, it is mostly applied in prokaryotes, because the release factor cannot be deleted in yeast or mammalian cells [110]: eukaryotes rely on a single release factor, eRF1, that recognizes all three stop codons. Another limitation is nonsense-mediated mRNA decay (NMD), which degrades mRNAs containing premature stop codons and thereby significantly decreases protein yield [111].

3.2. Reassignment of Sense Codons

Although recent methods for protein engineering rely on manipulating the host translation apparatus, the simplest method exploits the close structural similarity between an ncAA and a natural amino acid. Because of this similarity, the corresponding aminoacyl-tRNA synthetase cannot distinguish between the canonical amino acid (cAA) and the ncAA, and charges the ncAA onto its tRNA. The resulting ncAA-tRNA enters translation, and the ncAA is incorporated in response to the sense codons of the corresponding cAA. The efficiency of this method improves when competition from the canonical amino acid for the reassigned sense codon is limited, so auxotrophic bacterial hosts starved for the natural amino acid and supplemented with the ncAA are often used. The success of this strategy was first demonstrated by Cohen and Cowie, who took advantage of the relaxed substrate-binding pocket of MetRS to completely replace methionine with its analog selenomethionine in an E. coli methionine auxotroph [112]. Since then, many other sense codons have been reassigned to incorporate ncAAs into proteins via global substitution [99]. Complementary techniques have also been used, particularly overexpression of the aminoacyl-tRNA synthetase of interest and attenuation of its hydrolytic editing activity [113]. For example, overexpression of valyl-tRNA synthetase (ValRS) in a valine auxotroph led to incorporation of one of the stereoisomers of 4,4,4-trifluorovaline (2S,3R-Tfv) in response to valine codons, as shown by mass spectrometry [114]. Also, Tang and Tirrell showed that mutating a conserved threonine to tyrosine (T252Y) in the editing domain of E. coli LeuRS disrupted its editing activity, which allowed incorporation of several unsaturated non-canonical amino acids in response to leucine codons [115].
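Why starving an auxotroph of the canonical amino acid helps can be seen from a simple competition model. As an assumption (not a model used in the cited work), the sketch treats the cAA and the ncAA as two substrates of one AARS, each charged at a rate proportional to its specificity constant (kcat/Km) times its concentration; the function name and all numerical values are hypothetical.

```python
def ncaa_fraction(spec_ncaa: float, conc_ncaa: float,
                  spec_caa: float, conc_caa: float) -> float:
    """Fraction of aminoacylation events that use the ncAA.

    Assumes two substrates competing for one AARS, each charged at a rate
    proportional to (kcat/Km) * [substrate]. Enzyme concentration cancels
    out of the ratio; editing, tRNA pool sizes and uptake are ignored.
    """
    v_ncaa = spec_ncaa * conc_ncaa
    v_caa = spec_caa * conc_caa
    return v_ncaa / (v_ncaa + v_caa)


# Hypothetical numbers: the AARS prefers its cognate amino acid 100-fold.
rich = ncaa_fraction(0.01, 1.0, 1.0, 0.1)       # cAA still available
starved = ncaa_fraction(0.01, 1.0, 1.0, 0.001)  # auxotroph starved of the cAA
print(f"{rich:.2f} {starved:.2f}")              # 0.09 0.91
```

Because enzyme concentration cancels out, overexpressing the AARS in this model raises the absolute rate of ncAA-tRNA formation rather than the ncAA fraction, which matters when a poor substrate would otherwise be charged too slowly to sustain translation. Disabling editing, as in the LeuRS T252Y mutant, is a separate lever: it stops misacylated ncAA-tRNAs from being hydrolyzed, which this sketch does not model.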
Another methodology takes advantage of codons that are decoded by wobble pairing. At the third codon position, U and C can both be read by a G in the anticodon of the corresponding tRNA, while A and G can both be read by a U or pseudouridine. Kwon et al. introduced an orthogonal PheRS/tRNAAAA pair from yeast into an E. coli Phe auxotroph and placed a target gene under a strong inducible promoter. This gene contained the UUC codon at all desired Phe sites, and the UUU wobble codon was inserted at specific sites intended for 2-naphthylalanine. The yeast PheRS activated 2-naphthylalanine and charged it onto the yeast tRNAPheAAA, allowing production of a recombinant protein containing 2-naphthylalanine [116]. Rare codons provide another route to introduce ncAAs into proteins. For example, the rare AGG arginine codon in E. coli has been reassigned to ncAAs using the PylRS/tRNAPylCCU pair. Because codon usage and tRNA gene content have coevolved to match each other, the endogenous Arg-tRNACCU level is low, which allows the ncAA-charged orthogonal tRNACCU to outcompete it for the AGG codon. Zeng et al. showed that when N-alloc-lysine was used as a PylRS substrate, almost quantitative occupancy of an AGG codon site by N-alloc-lysine was achieved in minimal medium [117]. Recently, Mukai and colleagues demonstrated in vivo reassignment of the AGG sense codon from arginine to l-homoarginine. A variant of the archaeal pyrrolysyl-tRNA synthetase (PylRS) was engineered to recognize l-homoarginine, and expression of this variant together with the AGG-reading tRNAPylCCU permitted efficient incorporation of the arginine analog into proteins. Subsequently, all AGG codons in essential genes were eliminated, and the bacterial ability to translate AGG as arginine was restricted in a temperature-dependent manner [118].
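The wobble rules stated above can be encoded in a few lines to show which codons a given anticodon reads. This Python sketch is a simplification under stated assumptions: pseudouridine is treated as U, and modified bases such as inosine, which broaden decoding in some natural tRNAs, are ignored; the function and table names are hypothetical.

```python
# Watson-Crick partner of each anticodon base (RNA alphabet).
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}
# Simplified wobble rules at anticodon position 34 (pseudouridine treated as U).
WOBBLE_34 = {"G": {"C", "U"}, "U": {"A", "G"}}


def codons_read(anticodon: str) -> set[str]:
    """Codons (5'->3') read by an anticodon written 5'->3' (positions 34-35-36).

    Position 36 pairs with codon position 1, position 35 with codon
    position 2, and the wobble position 34 with codon position 3.
    """
    a34, a35, a36 = anticodon
    first, second = WATSON_CRICK[a36], WATSON_CRICK[a35]
    thirds = WOBBLE_34.get(a34, {WATSON_CRICK[a34]})
    return {first + second + third for third in thirds}


print(sorted(codons_read("GAA")))  # ['UUC', 'UUU']
print(sorted(codons_read("AAA")))  # ['UUU']
print(sorted(codons_read("CCU")))  # ['AGG']
```

Under these rules a GAA anticodon reads both Phe codons, whereas an AAA anticodon pairs only with UUU, which is what makes UUU usable as a separately decoded site; the CCU anticodon of tRNAPylCCU pairs with the AGG codon.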

3.3. Quadruplet Codons

Another opportunity to expand the codon repertoire for ncAAs emerged from the discovery of naturally occurring frameshift suppressor tRNAs, namely UAGN suppressors (N being A, G, C, or U) derived from the glutamine-inserting suppressor Su7, ACCN suppressors derived from the threonine tRNA encoded by sufJ, and CAAA suppressors derived from tRNALys and tRNAGln [119]. In these cases, four bases specify an amino acid through a mutant tRNA with an extra nucleotide in its anticodon loop (eight nucleotides instead of the standard seven); the extra base is read together with the codon, so translation stays in frame downstream of the quadruplet and a full-length protein is synthesized. Following this rationale, an orthogonal four-base suppressor tRNA/synthetase pair was generated from Pyrococcus horikoshii tRNALys sequences; this mutant suppressor pair permitted incorporation of l-homoglutamine into proteins in E. coli in response to the quadruplet codon AGGA [119]. Quadruplet codons are frequently based on a rare triplet, because competition from the native tRNA for the first three bases decreases the yield of the ncAA-containing target protein. Because the endogenous tRNA is readily accepted by the native ribosome, several groups developed “orthogonal” ribosomes [120,121] that recognize only altered ribosome-binding sites (RBSs). These mutant RBSs ensure that only mRNAs carrying them are translated by the orthogonal ribosomes. Evolution of such a ribosome for reduced premature termination yielded ribo-X, which increased amber suppression on the desired mRNA, while native ribosomes maintained the standard level of amber suppression. Ribo-X was then further evolved to translate quadruplet codons more efficiently (ribo-Q). Using this approach, a protein containing both an azide and an alkyne was recently produced efficiently, which allowed an internal cross-link to be formed [122]. Ribo-Q may therefore enable more ambitious alterations to proteins in the near future.
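The effect of a quadruplet codon on the reading frame can be illustrated with a toy decoder. In this Python sketch (hypothetical names; X stands for the ncAA), the four-base suppressor is assumed to always outcompete the native tRNA for the first three bases, which is exactly the competition that real systems try to reduce by targeting rare codons and by using ribo-Q.

```python
from typing import Optional

# Standard genetic code, codons enumerated in UCAG order ('*' marks stop codons).
BASES = "UCAG"
AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AAS))


def translate_quadruplet(mrna: str, quad: Optional[str] = None,
                         quad_aa: str = "X") -> str:
    """Read in-frame triplets, but consume four bases whenever the in-frame
    sequence starts with `quad` (assumption: the frameshift suppressor tRNA
    always wins over the native tRNA for the first three bases)."""
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        if quad is not None and mrna.startswith(quad, i):
            protein.append(quad_aa)  # quadruplet decoded as the ncAA
            i += 4
            continue
        aa = CODE[mrna[i:i + 3]]
        if aa == "*":
            break  # stop codon reached in the current frame
        protein.append(aa)
        i += 3
    return "".join(protein)


mrna = "AUG" + "AGGA" + "UUU" + "GGC" + "UAA"
print(translate_quadruplet(mrna, quad="AGGA"))  # MXFG
print(translate_quadruplet(mrna))               # MRIWL
```

With the suppressor, AGGA is read as one unit and translation stays in frame up to the UAA stop codon; without it, AGG is read as arginine and every downstream codon is read in the wrong frame.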

4. Conclusions and Perspectives

Genetic code alterations may be much more frequent than previously expected, as indicated by the diverse range of alterations found to date (Table 1) [3,123]. Low codon usage, codon unassignment, genome GC pressure, genome minimization, small proteome size and tRNA loss are key factors in the evolution of the genetic code [96,124,125,126]. The Codon Capture theory posits that, under biased genome AT or GC pressure, certain codons vanish from the protein-coding sequences (ORFeome). Once a codon is no longer used, the corresponding tRNAs lose their function and can be eliminated by natural selection [125]. Because GC content fluctuates over time, the erased codons may later re-emerge through genetic drift, but they may then lack cognate tRNAs. Cells that are able to capture these unassigned codons and give them a new sense meaning gain a growth advantage, completing the codon reassignment. The codon capture theory is supported by the disappearance of the CGG codon in Mycoplasma capricolum (25% genome G + C) and of the AGA and AUA codons in Micrococcus luteus (75% genome G + C) [127]. On the other hand, there are several examples of codon reassignments in organisms without strong GC biases, and even of reassignments that appear to run against such bias; for example, reassignment of the leucine CUU and CUA codons to threonine in the AT-rich genome of yeast mitochondria [128]. These codon reassignments are better explained by the Ambiguous Intermediate theory [62,124]. This theory postulates that ambiguous codon decoding provides the initial step for gradual codon identity change, and that wild-type or mutant misreading tRNAs are the critical elements of codon reassignment. The appearance of mutant tRNAs with altered or expanded decoding properties allows non-cognate codons to be recognized and translated, so that the new amino acid is incorporated into proteins in competition with the cognate one.
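A simple null model shows how genome GC pressure alone can make particular codons rare, as the Codon Capture theory requires. The Python sketch assumes that every codon position is drawn independently from the genome-wide base composition, ignoring amino acid composition, selection, and position-specific effects (GC pressure often acts most strongly on third positions); the function name is hypothetical, and the GC values are those quoted above for M. capricolum and M. luteus.

```python
def expected_codon_freq(codon: str, gc: float) -> float:
    """Expected frequency of a codon under independent base draws with
    P(G) = P(C) = gc / 2 and P(A) = P(U) = (1 - gc) / 2."""
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "U": (1 - gc) / 2}
    freq = 1.0
    for base in codon:
        freq *= p[base]
    return freq


# Balanced genome, M. capricolum-like (25% GC) and M. luteus-like (75% GC).
for codon, gc in [("CGG", 0.50), ("CGG", 0.25), ("AGA", 0.75), ("AUA", 0.75)]:
    print(codon, gc, expected_codon_freq(codon, gc))
```

Under this assumption, CGG in a 25% GC genome is expected at one-eighth of its frequency in a 50% GC genome, and AUA is depleted to the same extent at 75% GC.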
Ambiguous decoding therefore produces statistical proteins and, if this ambiguous codon translation is advantageous for the organism, the alternative codon interpretation is fixed by natural selection, leading to a new arrangement of the code [124]. This theory is strongly supported by the CUG reassignment from leucine to serine in fungi [4,129]. The high incidence of genetic code alterations in mitochondria suggests that proteome size imposes a strong negative pressure on codon reassignment: the fewer proteins a genome encodes, the fewer codon sites a reassignment can disrupt. This is in line with the Genome Minimization hypothesis, which posits that selection for replication speed imposes a strong pressure on the mitochondrial genome, favoring small genomes [126]. It is supported by a study of human mitochondria showing that only 13 of the 900 proteins of the mitochondrial proteome are encoded by the mitochondrial genome [130]. Because nuclear-encoded proteins are synthesized in the cytoplasm using the standard genetic code and are imported into mitochondria by a signal peptide-dependent translocation system, their synthesis escapes the disruption caused by mitochondrial codon reassignments. The three theories are not mutually exclusive: the ambiguous intermediate stage can be preceded by a decrease in the content of GC-rich codons, so codon reassignment may be driven by a combination of evolutionary mechanisms [131]. Additionally, the unexpected discovery of AARSs specific for the non-canonical amino acids pyrrolysine and O-phosphoserine [11] raised the possibility that other amino acids with particular functions exist in still-uncharacterized genomes. Detailed characterization of natural reassignments was a key step toward developing efficient strategies to expand the code for production of proteins with novel biochemical properties. Because engineered proteins are central to both basic research and biopharmaceutical drug development, several methods are now established for incorporating non-natural amino acids.
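The notion of statistical proteins can be quantified with an elementary probability calculation. This Python sketch assumes, hypothetically, that each ambiguous codon is misread independently at a fixed rate p; a protein with n such codons is then homogeneous with probability (1 - p)^n and can exist as up to 2^n molecular species. The rate p = 0.03 and the function names are illustrative, not measured values or published methods.

```python
def homogeneous_fraction(misreading_rate: float, n_sites: int) -> float:
    """Fraction of molecules carrying the canonical residue at all n ambiguous
    sites, assuming independent misreading at each site."""
    return (1.0 - misreading_rate) ** n_sites


def variant_count(n_sites: int) -> int:
    """Maximum number of distinct molecular species when each ambiguous site
    can carry one of two amino acids."""
    return 2 ** n_sites


p = 0.03  # hypothetical per-codon misreading rate
for n in (1, 10, 50):
    print(n, round(homogeneous_fraction(p, n), 3), variant_count(n))
```

Because the homogeneous fraction falls exponentially with n, and the total number of affected sites grows with proteome size, the same misreading rate is far less disruptive in a genome encoding 13 proteins than in one encoding thousands.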
Proteins containing non-natural amino acids can offer functional advantages beyond those achievable with the canonical amino acids alone. One area that benefits from expanded genetic codes is synthetic biology. Synthetic biologists have successfully engineered a wide range of functions into artificial gene circuits, generating switches, oscillators, filters, sensors, and cell-cell communicators with potential applications in medicine, biotechnology, bioremediation, and bioenergy [132]. For example, selective pressure incorporation (SPI) methodologies are currently being used to incorporate non-natural amino acids with reactive functional groups that are critical for site-specific derivatization of proteins for therapeutic purposes. Cho and colleagues reported the recombinant expression of human growth hormone (hGH) containing a site-specifically incorporated para-acetylphenylalanine (pAcF), which served as a chemical handle for conjugation to poly(ethylene glycol) (PEG) [133]. The resulting homogeneously mono-PEGylated hGH showed favorable pharmacodynamics and is being developed clinically [133]. SPI methodologies have also allowed the purification and identification of 195 newly synthesized proteins in human embryonic kidney (HEK293) cells, through orthogonal labeling of non-natural amino acids incorporated proteome-wide after removal of the corresponding natural amino acid [134]. More recently, Romesberg and colleagues moved beyond the four natural nucleotides A, T, G, and C [135] by using an unnatural base pair (UBP). Adding one base pair expands the number of possible triplet codons from 64 to 216, providing 152 additional codons that could be used to encode non-canonical amino acids. The future will likely bring a host of new applications based on these technologies.
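The codon arithmetic behind unnatural base pairs is straightforward: with an alphabet of k letters, triplet codons number k^3. The short Python sketch below (function name hypothetical) computes this for the natural four-letter alphabet and for six letters, i.e., one additional base pair, and for comparison counts four-base codons over the natural alphabet.

```python
def codon_count(alphabet_size: int, codon_length: int = 3) -> int:
    """Number of distinct codons of a given length over an alphabet."""
    return alphabet_size ** codon_length


natural = codon_count(4)         # A, T/U, G, C
expanded = codon_count(6)        # plus the two letters of one unnatural base pair
quadruplets = codon_count(4, 4)  # four-base codons over the natural alphabet
print(natural, expanded, expanded - natural, quadruplets)  # 64 216 152 256
```

The 152 extra triplets are codons available for assignment; each would still require a matching orthogonal tRNA and AARS before it could encode a non-canonical amino acid.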

References (134 in total; 10 shown)

M J Telford; E A Herniou; R B Russell; D T Littlewood. Changes in mitochondrial genetic codes as phylogenetic characters: two examples from the flatworms. Proc Natl Acad Sci U S A. 2000-10-10.

Jamie M Bacher; Valérie de Crécy-Lagard; Paul R Schimmel. Inhibited cell growth and protein functional changes from an editing-defective tRNA synthetase. Proc Natl Acad Sci U S A. 2005-01-12.

R Giegé; M Sissler; C Florentz. Universal rules and idiosyncratic features in tRNA identity [Review]. Nucleic Acids Res. 1998-11-15.

S Anderson; A T Bankier; B G Barrell; M H de Bruijn; A R Coulson; J Drouin; I C Eperon; D P Nierlich; B A Roe; F Sanger; P H Schreier; A J Smith; R Staden; I G Young. Sequence and organization of the human mitochondrial genome. Nature. 1981-04-09.

Gayathri Srinivasan; Carey M James; Joseph A Krzycki. Pyrrolysine encoded by UAG in Archaea: charging of a UAG-decoding specialized tRNA. Science. 2002-05-24.

Patrick J Keeling; Brian S Leander. Characterisation of a non-canonical genetic code in the oxymonad Streblomastix strix. J Mol Biol. 2003-03-07.

A B Tourancheau; N Tsao; L A Klobutcher; R E Pearlman; A Adoutte. Genetic code deviations in the ciliates: evidence for multiple and independent events. EMBO J. 1995-07-03.

Kimitsuna Watanabe; Shin-Ichi Yokobori. tRNA Modification and Genetic Code Variations in Animal Mitochondria. J Nucleic Acids. 2011-10-09.

David B F Johnson; Jianfeng Xu; Zhouxin Shen; Jeffrey K Takimoto; Matthew D Schultz; Robert J Schmitz; Zheng Xiang; Joseph R Ecker; Steven P Briggs; Lei Wang. RF1 knockout allows ribosomal incorporation of unnatural amino acids at multiple sites. Nat Chem Biol. 2011-09-18.

Kaihang Wang; Heinz Neumann; Sew Y Peak-Chew; Jason W Chin. Evolved orthogonal ribosomes enhance the efficiency of synthetic genetic code expansion. Nat Biotechnol. 2007-06-24.
