Literature DB >> 35330182

Why Is the UAG (Amber) Stop Codon Almost Absent in Highly Expressed Bacterial Genes?

Dominique Belin1, Pere Puigbò2,3,4.   

Abstract

The genome hypothesis postulates that genes in a genome tend to conform to their species' usage of the codon catalog and the GC content of the DNA. Thus, codon frequencies differ across organisms, including the three termination codons in the standard genetic code. Here, we analyze the frequencies of stop codons in a group of highly expressed genes from 196 prokaryotes under strong translational selection. The occurrence of the three translation termination codons is highly biased, with UAA (ochre) being the most prevalent in almost all bacteria. In contrast, UAG (amber) is the least frequent termination codon, e.g., only 321 occurrences (7.4%) in E. coli K-12 substr. W3110. Of the 253 highly expressed genes, only two end with an UAG codon. The strength of the selective bias against UAG in highly expressed genes varies among bacterial genomes, but it is not affected by the GC content of these genomes. In contrast, increased GC content results in a decrease in UAA abundance with a concomitant increase in UGA abundance. We propose that readthrough efficiency and context effects could explain the prevalence of UAA over UAG, particularly in highly expressed genes. Findings from this communication can be utilized for the optimization of gene expression.

Entities:  

Keywords:  bacterial genomes; codon prevalence; codon usage; gene; genome; genome hypothesis; highly expressed genes; non-sense; stop codons; translation termination; translational selection

Year:  2022        PMID: 35330182      PMCID: PMC8954436          DOI: 10.3390/life12030431

Source DB:  PubMed          Journal:  Life (Basel)        ISSN: 2075-1729


1. Introduction

The degeneracy of the standard genetic code was postulated by Crick et al. in 1961 [1]. In the standard genetic code, most amino acids are indeed encoded by two or more codons, with only methionine and tryptophan having a single codon. Synonymous codons are not used at the same frequency, and this frequency differs in different organisms [2]. Cases of extreme codon usage biases can be found in prokaryotes from low to high guanine + cytosine (GC) content, e.g., in Escherichia coli K-12 substr. W3110 (GC = 51.93%), the arginine AGG codon is 20 times less frequent than the CGC one; in Pseudomonas aeruginosa PAO1 (GC = 67.14%), the arginine AGG codon is 100 times less frequent than the CGC one; in Staphylococcus aureus subsp. aureus Mu50 (GC = 33.54%), the proline CCA codon is 15 times less frequent than the CCC one [3]. The differences in codon usage are more evident in ribosomal protein genes and other highly expressed genes (HEG) [4], especially in genomes under strong translational selection [5,6]. The three termination codons, UAG (amber), UAA (ochre), and UGA (opal), were first identified as nonsense codons that interrupt translation. The identification of UAG nonsense mutations was greatly facilitated by the frequent occurrence of amber suppressors in commonly used E. coli K-12 strains [7,8,9,10]. Furthermore, the amber suppressors are usually highly efficient, so that up to 70% of the amount of the wild-type protein can be produced. UAA (ochre) non-sense mutants and their suppressors were then rapidly identified [11]. In contrast to amber suppressors, ochre suppressors are much less efficient, and ochre mutants cannot be isolated in genes expressed in high amounts, such as phage structural genes. UGA (opal) non-sense mutants and their suppressors were identified later [12,13]. A few years ago, non-sense suppressors were identified in Haloferax volcanii [14], thus this is a common mechanism in all domains of life (archaea, bacteria, and eukaryotes). Readthrough of stop codons, whether as an adaptive mechanism or not [15], has been largely described in eukaryotes and bacteria [16]. Differential use of the codon repertoire in bacteria [2] is utilized in heterologous gene expression experiments [5]. However, the frequency of stop codons and the following downstream nucleotides has been largely neglected in such experiments. Here, we report the first analysis of the frequency of stop codons and their context in a set of HEG from 196 prokaryotes. We show the three termination codons are also not used at the same frequencies in HEG [17,18]. The UAG stop codon is universally suboptimal in bacteria [17], even more so in HEG. In contrast, UAA is dominant in low and intermediate GC content genomes, while UGA becomes prevalent in high GC content genomes. Moreover, noticeable tendencies in the downstream context arise from the dataset.

2. Materials and Methods

2.1. Data

A list of predicted highly expressed genes in 196 prokaryotes was gathered from the HEG-DB [6]. This database contains predictions of highly expressed genes in genomes under strong translational selection [5,6]. The E. coli genes information was obtained from Ecocyc: https://ecocyc.org/ [19], last accessed on 1 March 2022.

2.2. Frequency of Stop Codons

The frequency of stop codons in the list of prokaryotes under translation selection was calculated from the dataset (Table S1). In the genetic code 11 (the bacterial, archaeal, and plant plastid code), the stop codons are UAA, UAG, and UGA. Frequency of stop codons in E. coli strain K-12 substr. W3110 was obtained from the codon usage database [3].

2.3. Statistics

A Pearson correlation coefficient between frequencies of stop codons and GC content has been calculated using the statistical software package R (https://www.r-project.org/), last accessed on 1 March 2022. Differences in the frequency of +4 nucleotides have been analyzed with a Student’s t-test (p < 0.01).

3. Results and Discussion

In the group of highly expressed genes, an increase in GC content is strongly associated with a parallel increase in UGA and a decrease in UAA content, whereas UAG abundance remains low and largely unchanged (Figure 1), in agreement with previous studies [17,18]. The GC content is a major driver of UGA and UAA stop codons, but its contribution differs between bacteria [20] and archaea [18], and this trend is even stronger in HEG from genomes under strong translational selection. The HEG of 196 prokaryotic species analyzed here includes 192 bacteria (9 actinobacteria, 6 bacteroidetes, 1 deinococcus, 69 firmicutes, and 107 proteobacteria) and 4 archaea (4 euryarchaeota). Among this set, only 18 have more than 15% HEG that terminates with UAG, while this value is less than 5% for more than 100 species. The underrepresentation of the UAG stop codon in HEG is widespread across diverse taxonomic groups (with a few exceptions such as 8/107 proteobacteria, 3/69 firmicutes, and 6/9 actinobacteria) and is independent of the GC content of the organisms. It will be interesting to determine how the force(s) that contribute to counterselection of UAG termination codons in E. coli are less effective in these organisms.
Figure 1

Correlation of the frequency of the three termination codons and the guanine + cytosine (GC) content in the highly expressed genes of 196 prokaryotes under strong translational selection.

It was proposed that the high suppression efficiency of UAG amber mutations is somehow related to the rare occurrence of this codon at the end of E. coli genes. However, this correlation is unlikely to reflect a causal link for two reasons. First, the standard genetic code cannot have evolved in the presence of classical nonsense suppressors that only appear through selective pressure. Such a selective pressure was provided by an rpoS amber mutation in the original K-12 strain [10]. In the related B strain, also extensively used in laboratory conditions, most isolates are devoid of suppressor mutations. Second, 29% of the E. coli genes and 12% of the HEGs terminate with a UGA codon, even though suppression of UGA opal mutations occurs with a >50% efficiency. In E. coli, the three termination codons, UAG, UAA, and UGA, are not used at the same frequencies, with only 321 genes ending with UAG (7.4%), while 2765 (64%) end with UAA and 1249 (29%) end with UGA. With the identification of HEG, this difference becomes even more striking. Out of 253 HEG in E. coli, 223 (87%) end with UAA, 30 (12%) with UGA and only 2 (0.8%) with UAG. The two HEG that end with UAG are atpE and sucB. The atpE gene, which is essential in both rich and minimal media, encodes a membrane-embedded subunit of the Fo complex. Its UAG termination codon is immediately followed by a UAA codon. Thus, suppression or read through of the atpE UAG would only add one residue at the C-terminus, which faces the cytoplasm. The sucB gene, which is only essential in some rich media, encodes a cytoplasmic enzyme involved in lysine degradation that is a component of the 2-oxoglutarate dehydrogenase complex. The UAG termination codon is immediately followed by a second UAG codon, and the next termination codon (UGA) is 21 codons away. In this case, readthrough of both UAG could affect the activity or stability of SucB. Readthrough of nonsense codons refers to continued translation elongation in the absence of a suppressor tRNA. Gln is inserted at both UAG and UAA codons, although termination is much more efficient with UAA (about 0.2%) [21]. In the case of UGA, readthrough occurs with an efficiency of about 2% and results from the incorporation of Trp [13]. The termination efficiency of nonsense mutations varies at different sites; this phenomenon is called the context effect. This was first observed with phoA UAG mutants in a strain carrying an amber suppressor [8]. Termination could be as high as 97% at one site and as low as <1% at another [8]. Starting with an UAA mutant that was poorly suppressed by an ochre suppressor and thus efficiently terminated, mutations in the immediate vicinity of the UAA codon decreased termination by at least 10-fold [22]. Similar results have been observed with the base (also known as +4 nucleotide) following UGA codons [23]. Termination efficiency was also shown to vary extensively at many sites within the lacI gene [24,25]. In general, between one (+4) and six (+9) nucleotides after the stop codon are critical for termination efficiency [16,26,27]. A role for context effects is illustrated by the higher prevalence of the U residues at the 3′ side of UAA and UGA in HEGs when compared to low expressed genes (LEG) [28]. Our results show the prevalence of U nucleotides after the stop codon is dominant independently of the GC content of the genome, especially in the HEG (Figure 2). Although, in some genomic contexts, U residues share dominance with another nucleotide, e.g., in AT-rich genomes, U and A residues are equally frequent after the stop codon. The dominance of U residues after the stop codon is almost universal in HEG, with the exception of UAA-ending genes in GC-rich genomes, where U residues share dominance with G residues. Overall, results indicate a set of rules that largely determine the context effect that drives the efficiency of gene expression (Figure 2).
Figure 2

Frequency of nucleotides after the stop codons. (a) All genes from genomes with a %GC lower than 40% (b) HEG from genomes with a %GC lower than 40% (c) All genes from genomes with a %GC between 40% and 60% (d) HEG from genomes with a %GC between 40% and 60% (e) All genes from genomes with a %GC higher than 60% (f) HEG from genomes with a %GC higher than 60. Groups within stop codons are determined based on a Student’s t-test (p < 0.01).

Some authors have debated whether a second stop codon may be present after the termination one to increase termination efficiency [29,30,31,32], as in the two E. coli K-12 HEG that end with UAG (atpE and sucB). However, neither HEG nor LEG have a higher frequency of additional stop codons after the termination one, although there is a dominance of UNN (N = any nucleotide) triplets and a detectable prevalence of UUU (Figure 3). Moreover, it is interesting to notice that 17/18 genomes with more than 15% UAG in their HEG have an extended preference for certain residues for at least 10 residues on the 3′ side of the UAG codon.
Figure 3

Frequency of nucleotides after the stop codons (+456) in the HEG. (a) Most frequent codons downstream of the stop codon. The red line indicates chance expectations (1/64 = 2%). Only codons occurring with or above the chance expectation are indicated. (b) Frequency of tandem stop codons.

Thus, we hypothesize that low readthrough efficiency and context effects may provide a better explanation for the prevalence of UAA termination codons over UAG, particularly in highly expressed genes. Moreover, the wide suppression across taxonomic groups of the UAG stop codon in bacteria is an indication that this is an ancient mechanism. However, a few taxonomically diverse species have higher %UAG than expected, thus we speculate that they may have developed alternative strategies to efficiently overcome readthroughs.

4. Conclusions

Our data indicate that UAG codon is universally suboptimal in the HEG from prokaryotes under translational selection. UAA is likely to be the preferred stop codon for low or intermediate GC content, whereas UGA is the preferred stop codon for high GC content. Readthrough efficiency and additional context effects may explain the dominance of UAA and UGA stop codons in HEG. Although the UAA stop codon is usually the choice in genome engineering and to optimize gene expression, the differential use of the stop codons in the HEG and the downstream context described in this article should be taken into consideration.
  31 in total

1.  Tandem termination signals: myth or reality?

Authors:  Louise L Major; Tina D Edgar; Po Yee Yip; Leif A Isaksson; Warren P Tate
Journal:  FEBS Lett       Date:  2002-03-06       Impact factor: 4.124

2.  Why are suppressors of amber mutations so frequent among Escherichia coli K12 strains?. A plausible explanation for a long-lasting puzzle.

Authors:  Dominique Belin
Journal:  Genetics       Date:  2003-10       Impact factor: 4.562

3.  Suppression of mutations in the alkaline phosphatase structural cistron of E. coli.

Authors:  A GAREN; O SIDDIQI
Journal:  Proc Natl Acad Sci U S A       Date:  1962-07-15       Impact factor: 11.205

4.  A strong suppressor specific for UGA.

Authors:  J F Sambrook; D P Fan; S Brenner
Journal:  Nature       Date:  1967-04-29       Impact factor: 49.962

5.  Nucleotide sequence from the polypeptide chain termination region of the coat protein cistron in bacteriophage R17 RNA.

Authors:  J L Nichols
Journal:  Nature       Date:  1970-01-10       Impact factor: 49.962

6.  Impact of the six nucleotides downstream of the stop codon on translation termination.

Authors:  O Namy; I Hatin; J P Rousset
Journal:  EMBO Rep       Date:  2001-08-23       Impact factor: 8.807

7.  Nonsense suppression in archaea.

Authors:  Arpita Bhattacharya; Caroline Köhrer; Debabrata Mandal; Uttam L RajBhandary
Journal:  Proc Natl Acad Sci U S A       Date:  2015-04-27       Impact factor: 11.205

8.  HEG-DB: a database of predicted highly expressed genes in prokaryotic complete genomes under translational selection.

Authors:  Pere Puigbò; Antoni Romeu; Santiago Garcia-Vallvé
Journal:  Nucleic Acids Res       Date:  2007-10-11       Impact factor: 16.971

9.  OPTIMIZER: a web server for optimizing the codon usage of DNA sequences.

Authors:  Pere Puigbò; Eduard Guzmán; Antoni Romeu; Santiago Garcia-Vallvé
Journal:  Nucleic Acids Res       Date:  2007-04-16       Impact factor: 16.971

10.  The identity of the base following the stop codon determines the efficiency of in vivo translational termination in Escherichia coli.

Authors:  E S Poole; C M Brown; W P Tate
Journal:  EMBO J       Date:  1995-01-03       Impact factor: 11.598

View more
  1 in total

Review 1.  Genetic Code Engineering by Natural and Unnatural Base Pair Systems for the Site-Specific Incorporation of Non-Standard Amino Acids Into Proteins.

Authors:  Michiko Kimoto; Ichiro Hirao
Journal:  Front Mol Biosci       Date:  2022-05-24
  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.