Phenotypic mutations contribute to protein diversity and shape protein evolution.

Maria Luisa Romero Romero^1,2, Cedric Landerer^1,2, Jonas Poehls^1,2, Agnes Toth-Petroczy^1,2,3.

Abstract

Errors in DNA replication generate genetic mutations, while errors in transcription and translation lead to phenotypic mutations. Phenotypic mutations are orders of magnitude more frequent than genetic ones, yet they are less understood. Here, we review the types of phenotypic mutations, their quantifications, and their role in protein evolution and disease. The diversity generated by phenotypic mutation can facilitate adaptive evolution. Indeed, phenotypic mutations, such as ribosomal frameshift and stop codon readthrough, sometimes serve to regulate protein expression and function. Phenotypic mutations have often been linked to fitness decrease and diseases. Thus, understanding the protein heterogeneity and phenotypic diversity caused by phenotypic mutations will advance our understanding of protein evolution and have implications on human health and diseases.

Keywords: frameshifts; protein evolution; transcriptional errors; translational errors

Substances:
Codon, Terminator

Year: 2022 PMID: 36040266 PMCID: PMC9375231 DOI: 10.1002/pro.4397

Source DB: PubMed Journal: Protein Sci ISSN: 0961-8368 Impact factor: 6.993

INTRODUCTION

Evolution is limited by the accuracy of information transfer in the cell. The evolution of a highly accurate machinery for replication, transcription, and translation was crucial to the emergence of cellular life as we know it today. However, the information transfer is not perfect, and every time DNA, RNA, or a protein is synthesized, errors may happen, resulting in mutations. Errors during transcription and translation lead to so‐called “phenotypic mutations.” Sources of phenotypic mutations include misincorporation of nucleotides or amino acids as well as RNA polymerase slippage, ribosomal slippage, premature termination, and stop codon readthrough (Figure 1a).

FIGURE 1

Phenotypic mutations contribute to protein diversity. (a) Transcriptional errors, such as nucleotide misincorporation and RNA polymerase slippage, and translational errors, such as amino acid misincorporation, ribosomal frameshift, stop codon readthrough, and premature termination, lead to protein sequence heterogeneity. (b) Phenotypic mutations can generate several transcripts and proteins from a single gene sequence.

Phenotypic mutations are orders of magnitude more frequent than genetic mutations (10^−2 to 10^−5 vs. 10^−6 to 10^−8, respectively). In fact, protein synthesis from genetic information is so error-prone that ~15% of average-length protein molecules contain at least one misincorporated amino acid. Thus, each gene corresponds to a population of proteins, also called “statistical proteins”, that differ from the canonical, error-free protein (Figure 1b). “Biology is messy,” as Dan Tawfik outlined in an intriguing concept article, and wherever the cost of inaccuracy proves bearable, biological systems produce heterogeneity. This heterogeneity can facilitate physiological and evolutionary adaptation to new environments and challenges. Phenotypic mutations explore a vast mutational space from a single gene while maintaining the wild-type sequence. Thus, Tawfik suggested that phenotypic mutations allow tinkering for novel functions, as illustrated by a seminal example of a frameshift that controls cytosolic and peroxisomal localization.

Here, we first review the types and magnitude of phenotypic mutations that originate from transcriptional and translational errors. Then we discuss their evolutionary impact in shaping protein robustness and their potential in adaptation, such as the evolution of new functions via frameshifts. Finally, we highlight how phenotypic mutations can alter phenotypes and review associations with human disease phenotypes and therapeutic applications.
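The share of protein molecules that carry at least one error can be related to a per-codon error rate with a simple independence model. The sketch below is only an illustration: it assumes that errors occur independently and at a uniform rate along the chain, and the rates and lengths in the loop are hypothetical values, not numbers taken from this review.

```python
def fraction_with_error(error_rate: float, length: int) -> float:
    """Probability that a chain of `length` residues has at least one error.

    Assumes independent errors with the same per-codon rate at every position.
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    return 1.0 - (1.0 - error_rate) ** length


if __name__ == "__main__":
    # Hypothetical per-codon rates and protein lengths, for illustration only.
    for rate in (1e-4, 5e-4, 1e-3):
        for length in (100, 300, 1000):
            share = fraction_with_error(rate, length)
            print(f"rate={rate:.0e} length={length:4d} -> {share:.1%} erroneous")
```

Under these assumptions, the erroneous share grows almost linearly with length while rate × length is small, and saturates toward 100% for long proteins or high error rates.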

DETECTING PHENOTYPIC MUTATIONS

The nature and magnitude of transcriptional errors

Transcriptional errors include nucleotide misincorporations in the mRNA and slippage of the RNA polymerase (RNAP), giving rise to single amino acid changes, long regions of altered amino acid sequences, and deletions (Figure 1a and Table 1). Single nucleotide misincorporations occur >100-fold more frequently than DNA replication errors and, since individual mRNAs can be translated up to 40 times, they significantly impact the downstream protein population.

TABLE 1

Representative transcriptional error rates of several species

Nucleotide misincorporation in transcription
Organism	Error rate (per nucleotide)	Method
Escherichia coli	4×10^−4	In vitro transcription of purified RNAP ¹¹
E. coli	1.4×10^−4	In vivo reporter‐construct assay to detect mutations in the lacZ transcript ¹³
E. coli	1–2×10^−5	In vivo reporter‐construct assay to detect mutations in the lacZ transcript ¹⁵
E. coli	~10^−6	In vivo reporter‐construct assay to detect G to A errors ¹⁶
E. coli	8.2×10^−5	In vivo transcriptome‐wide detection by CirSeq ⁹
rpb9Δ yeast	1.7×10^−3	In vivo reporter‐construct assay to detect mutations in the lacZ transcript ¹⁷
Yeast	2.8×10^−4	In vivo reporter‐construct assay to detect mutations in the lacZ transcript ¹⁸
C. elegans	4×10^−6	In vivo transcriptome‐wide detection by NGS ²⁰
Yeast	3.9×10^−6	In vivo transcriptome‐wide detection of RNAP II error by CirSeq ²⁵
Yeast	4.3×10^−6	In vivo transcriptome‐wide detection of RNAP I errors by CirSeq ²⁵
Yeast	9.3×10^−6	In vivo transcriptome‐wide detection of mitochondrial mRNA errors by CirSeq ²⁵
Yeast	1.7×10^−5	In vivo transcriptome‐wide detection of RNAP III errors by CirSeq ²⁵
Yeast	4.2×10^−5	In vivo transcriptome‐wide detection by CirSeq ¹³²
M. florum	1.8×10^−5	In vivo transcriptome‐wide detection by CirSeq ²⁴
E. coli	5.8×10^−6	In vivo transcriptome‐wide detection by CirSeq ²⁴
B. subtilis	5.8×10^−6	In vivo transcriptome‐wide detection by CirSeq ²⁴
A. tumefaciens	7.3×10^−6	In vivo transcriptome‐wide detection by NGS ²⁴
Buchnera aphidicola	4.7×10^−5	In vivo transcriptome‐wide detection by CirSeq of a species that has lost transcription fidelity factors ⁹
Carsonella ruddii	5.1×10^−5	In vivo transcriptome‐wide detection by CirSeq of a species that has lost transcription fidelity factors ⁹

One of the earliest studies of the error rate of RNAP was presented by Springgate and Loeb in 1975. They purified Escherichia coli RNA polymerase and showed that, during the in vitro transcription of synthetic polydeoxynucleotides, it inserted noncomplementary nucleotides. This work described erroneous base pairing during transcription as an infrequent and nonuniform event. Substitution of U to C was the most common error, occurring at a frequency of 4×10^−4 errors per base pair (Table 1), two orders of magnitude less accurate than DNA replication in vitro. The outcome of such a transcriptional error rate would be that ~5% of the E. coli proteome differs from the canonical sequence. About a decade later, an assay measuring the in vivo transcriptional error rate in E. coli revealed a rate of 1.4×10^−4 errors per base (Table 1). The assay relies on introducing nonsense mutations in lacZ and then assaying for residual β-galactosidase activity. Functional lacZ mRNA is only synthesized when a transcriptional error reverts the nonsense mutation.

Reporter-construct assays have long provided valuable information about RNA fidelity and transcriptional errors, yet these studies have mainly focused on individual loci. Only recently has it become possible to study transcriptional errors globally across the transcriptome. Detecting incorrect transcripts by conventional next-generation sequencing is not possible because reverse transcription is at least as error-prone as transcription itself. Barcoding individual RNA fragments before multiple rounds of reverse transcription overcame this obstacle and allowed errors in natural transcripts to be detected at the transcriptome-wide level. Application of this method to Caenorhabditis elegans showed a base misincorporation rate in mRNAs of 4×10^−6 errors per base (Table 1), about 10 times lower than previously reported for prokaryotes and unicellular eukaryotes (10^−5 to 10^−3 errors per codon). Another powerful method, circular sequencing (CirSeq), is based on circularized genomic RNA fragments that generate tandem repeats for next-generation sequencing, which makes it possible to distinguish transcriptional errors from the noise produced by technical errors. Different variants of this method have been applied to assess error rates, revealing that eukaryotes in general have a more accurate transcriptional machinery than prokaryotes. Yet, there are great differences in the transcriptional error rate even among prokaryotes (Table 1 and Figure 2).
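The logic of the tandem-repeat approach can be sketched in a few lines: a genuine transcription error is present in every copy of the circularized template, whereas a technical error appears in only one copy and is outvoted. This is a minimal illustration, not the published CirSeq pipeline; it assumes the repeat length is already known and equals the reference length, and the sequences and function names are hypothetical.

```python
from collections import Counter


def split_repeats(read, unit_len):
    """Cut a tandem-repeat read into its consecutive copies of the template."""
    return [read[i:i + unit_len]
            for i in range(0, len(read) - unit_len + 1, unit_len)]


def consensus(copies):
    """Majority base per position; 'N' where no base has a strict majority."""
    called = []
    for column in zip(*copies):
        base, count = Counter(column).most_common(1)[0]
        called.append(base if 2 * count > len(column) else "N")
    return "".join(called)


def call_errors(read, reference):
    """Return (position, reference_base, observed_base) for consensus mismatches."""
    cons = consensus(split_repeats(read, len(reference)))
    return [(i, ref, obs)
            for i, (ref, obs) in enumerate(zip(reference, cons))
            if obs not in ("N", ref)]


reference = "ACGTACGT"                 # hypothetical template sequence
copy_ok = "ACATACGT"                   # transcript with a G->A error at position 2
copy_noisy = "ACATAGGT"                # same copy plus a technical error at position 5
read = copy_ok + copy_noisy + copy_ok  # three tandem copies in one read
print(call_errors(read, reference))    # [(2, 'G', 'A')]
```

Only the mismatch shared by all three copies is reported; the single-copy mismatch at position 5 is treated as technical noise.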

FIGURE 2

Transcription error rates vary across the tree of life. (a) Transcriptional error rates in the form of nucleotide misincorporations are orders of magnitude higher than genomic mutation rates across the tree of life (Table 1). (b) Increasing effective population size correlates with higher transcriptional error rates. Higher effective population sizes potentially increase the efficacy of selection for local error rate reduction. Effective population size estimates were taken from Sung et al.

Transcriptional error rates vary among species, within the same species, and even within the same organism. In yeast, for example, error rates differ between RNA polymerases (Table 1): mRNA synthesized by RNAP II had 3.9×10^−6 errors per base, RNA synthesized by RNAP I had 4.3×10^−6 errors per base, mRNA synthesized by the mitochondrial RNAP had 9.3×10^−6 errors per base, and RNA synthesized by RNAP III had 1.7×10^−5 errors per base. Furthermore, nucleotide misincorporations differ in frequency depending on the sequence context; for example, in vitro transcripts synthesized by E. coli RNAP show a strong bias toward misincorporating AMP in place of GMP. Transcriptional error rates also differ between nascent and total RNA because of the proofreading machinery. Despite these differences in magnitude, all these studies agree that RNA polymerases do not operate as accurately as DNA polymerases (Figure 2a and Table 1). Lynch proposed that, since individual loci produce numerous transcripts and mRNAs have a short lifespan, transcriptional errors likely have milder consequences than genomic errors. Thus, weaker selection pressure on the fidelity of the transcriptional machinery might be one of the reasons for the higher transcriptional error rates.

The nature and magnitude of translational errors

Estimating translational error rates, especially on a large scale, has proven challenging. A significant obstacle to accurately measuring translational errors is the lack of high-throughput methods to detect these rare and transient events proteome-wide. Thus, most of our knowledge regarding translational error rates comes from reporter-construct assays of a single locus. Another challenge, once a protein synthesis error is identified, is to distinguish transcriptional from translational errors. Overcoming these challenges, many creative studies, some of them listed below (Table 2), have shown that: (a) translational errors are orders of magnitude more frequent than genetic mutations (Figure 2a); and (b) organisms have hotspot sequences, prone to translational recoding, in their adaptive toolbox. For example, programmed ribosomal frameshifting (PRF) and programmed stop codon readthrough are used to encode novel functions, as we review in the following sections (Table 3).

TABLE 2

Representative translational error rates of several species

Errors in translation
Organism	Error rate	Method
Amino acid misincorporation
E. coli	3×10^−4 per base	In vitro radioactive assay combined with chromatography techniques to detect misincorporations in ovalbumin ³⁴
E. coli	10^−4 to 10^−3 per base	In vivo reporter‐construct assay for lysine misincorporation in the active site of luciferase ³⁵
E. coli	10^−4 to 10^−3 per base	Proteome‐wide methodology based on mass spectrometry to identify and quantify amino acid mismatches in vivo ³⁶
Stop codon readthrough
E. coli	50%	In vivo reporter‐construct assay for stop codon readthrough based on RF2‐lacZ gene fusion ¹⁰¹
Frameshifting
E. coli	1.1–2.2×10^−4 per base	In vitro laboratory drift to assess frameshifting of homonucleotide sequences ⁷² , ⁹¹
E. coli	1.5×10^−5 per base	Restoration of β‐galactosidase activity to stop‐containing variant ¹⁴⁴
E. coli	3×10^−5 per base	Restoration of β‐galactosidase activity to stop‐containing variant ¹⁴⁴
Premature termination
E. coli	2.7×10^−4 per codon	76% success probability for completing the full‐length β‐galactosidase polypeptide ¹⁴⁵
Yeast	0–2×10^−3 per codon	Comparison of ribosome density on 5′ versus 3′ end of mRNA ¹⁴⁶

TABLE 3

Representative functional frameshifts of various genes

Functional frameshifts
Organism	Gene	Frameshift type	Frameshift frequency	Description
E. coli	copA	FS −1	0.5	PRF generates a copper transporter and a copper chaperone from the same gene ¹¹⁰
E. coli	dnaX	FS −1	0.7	Translational frameshift generates the gamma subunit of DNA polymerase III ⁴⁶
B. subtilis	cdd	FS −1	0.1–0.2	A frameshift near the stop codon results in a CDA subunit extended by 13 amino acids ¹⁴⁷
Bacteria	prfB	FS +1	Regulated by RF2	PRF regulates the expression of peptide chain release factor 2 ¹⁰¹ , ¹⁰²
S. cerevisiae	ABP140	FS +1	Unknown	PRF leads to the production of full‐length ABP140 ¹⁰⁸
S. cerevisiae	EST3	FS +1	0.08–0.4	PRF leads to the production of full‐length EST3 required for yeast telomere replication ¹⁰⁷
S. cerevisiae	Ty1	FS +1	0.2–0.6	The primary translation product of TYB is a TYA/TYB fusion protein expressed by translational frameshifting ¹⁴⁸
E. gossypii	IDP2	FS +1	0.28	PRF controls the localization of NADP‐dependent isocitrate dehydrogenases (IDP) in the yeast ⁶
P. anserina	PaYIP3	FS −1	0.01	PRF controls the expression of Rab‐GDI complex dissociation factor ¹⁴⁹
H. sapiens	CCR5	FS −1	0.1	The NMD pathway and miRNAs regulate the frameshifts in the CCR5 mRNA ¹⁵⁰
H. sapiens	PEG10	FS −1	0.3	Mammalian gene PEG10 expresses two reading frames by high PRF in embryonic‐associated tissues ⁵²
Mammals	EDR1	FS −1	0.3	A developmentally regulated mammalian gene utilizes PRF ⁴⁸
Eukaryotes	OAZ1	FS +1	Regulated by polyamines	A negative feedback loop by a frameshifting controls the expression of the mammalian ornithine decarboxylase antizyme ⁵⁵

Probably the first work that addressed translational errors, then called “nongenetic errors” in protein biosynthesis, dates from 1963. Combining radioactive measurements with chromatography techniques, amino acid misincorporations were found in ovalbumin at a frequency of 3×10^−4 errors per codon (Table 2). Later, by engineering a reporter for lysine misincorporation in the active site of luciferase in E. coli, Kramer and Farabaugh estimated that error rates vary from 10^−4 to 10^−3 errors per codon (Table 2) and proposed that the frequency of translational misreading errors in E. coli is largely determined by tRNA competition. The first methodology to detect and quantify translational errors proteome-wide was presented in 2019. Shotgun mass spectrometry was applied to assess amino acid misincorporation in the proteomes of E. coli and Saccharomyces cerevisiae. Mass spectrometry is, so far, the most powerful method to study protein forms, yet it lacks the sensitivity to detect rare, low-abundance proteins. The above methods alone cannot discern whether errors in the protein sequence have a transcriptional or translational origin; however, they could be combined with RNA-Seq techniques to pinpoint the source of the error. Taken together, the findings on translational error rates show that the synthesis of a functional protein from mRNA is strikingly error-prone and that error rates differ among organisms (Table 2).
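The premature-termination entry for E. coli in Table 2 reports both a per-codon rate (2.7×10^−4) and a completion probability (76%) for β-galactosidase. The two can be connected by assuming that the ribosome drops off independently at each codon with the same probability. The sketch below is a consistency check under that assumption; the chain length of 1,023 codons is an assumed value for E. coli β-galactosidase and is not stated in the text.

```python
def completion_probability(termination_rate, n_codons):
    """Chance that a ribosome reaches the last codon without terminating early."""
    return (1.0 - termination_rate) ** n_codons


def termination_rate_from_completion(completed_fraction, n_codons):
    """Per-codon termination rate implied by an observed completion fraction."""
    return 1.0 - completed_fraction ** (1.0 / n_codons)


BETA_GAL_CODONS = 1023  # assumption: approximate length of E. coli beta-galactosidase

if __name__ == "__main__":
    p_complete = completion_probability(2.7e-4, BETA_GAL_CODONS)
    implied_rate = termination_rate_from_completion(0.76, BETA_GAL_CODONS)
    print(f"completion at 2.7e-4 per codon: {p_complete:.1%}")
    print(f"rate implied by 76% completion: {implied_rate:.2e} per codon")
```

With the assumed length, the per-codon rate and the completion probability in Table 2 agree to within rounding.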

The nature and magnitude of frameshifts

Investigations of translational errors often focus on single amino acid misincorporations. However, another type of error that is hard to investigate proteome-wide is ribosomal frameshifting. In ribosomal frameshifting, the ribosome slips on the mRNA, usually by −1 or +1 nucleotide, and continues translation in the new reading frame. As a consequence, a completely changed protein sequence is synthesized downstream of the slip site. Like other phenotypic mutations, frameshifting is a stochastic process: at a given site, some proportion of ribosomes will shift, and some will continue translation in the normal frame. Several studies have shown that these errors happen at low rates in vivo (Table 2). Specific sites show much higher frameshifting rates (Table 3), for example, ~30% in SARS-CoV-2, and in some cases exceeding 80%. This phenomenon is called programmed ribosomal frameshifting (PRF) and, in many cases, serves a specific function in protein production.

The most widespread use of PRF is found in viruses, possibly due to constraints on their genome size and the limited availability of other mechanisms to regulate protein expression. In prokaryotes and eukaryotes, the extent of PRF is less clear. Computational studies showed that a known −1 FS site occurs in dozens of E. coli genes, and the S. cerevisiae genome contains thousands of sequences resembling −1 sites. However, only a handful of genes have been experimentally confirmed to frameshift with high frequency (Table 3). Even if a frameshift occurs with high frequency, it is difficult to infer whether it is simply an error or harbors a function. Indications for a functional frameshift are, for example: (a) conservation of the frameshift site and the sequence of the frameshifted ORF; (b) binding or enzymatic activity in the shifted ORF; and (c) a phenotypic effect due to abolition of frameshifting, for example, by synonymous mutations of the frameshift site.

For instance, in the human gene PEG10, a −1 frameshift was confirmed in cell culture. The frameshift site and the shifted ORF are conserved, and the frameshift produces a working protease. However, it is still unclear whether this frameshift has an actual function in the living organism. In other cases, frameshift sites were found to work in vitro, but the product could not be identified in vivo, or a frameshift was only inferred in silico by similarity. The only human cases of ribosomal frameshifting with a known function are the antizyme genes, OAZ1, 2, and 3, where +1 frameshifting is widely conserved. Recently, the programmed frameshifting in human CCR5 was found to be an artifact. The ciliates of the Euplotes genus are an exceptional case: they use +1 or +2 frameshifting in more than 10% of their genes. Invariably, frameshifting happens at in-frame stop codons, and it is suspected that Euplotes evolved to use additional signals for genuine termination, while the default behavior at a stop codon is frameshifting and continued translation. Unlike single amino acid misincorporations, frameshifting can generate an entirely new protein sequence and can occur at frequencies that are orders of magnitude higher. Below, we review how frameshifting can lead to novel functions (Sections 3.4, 3.6).
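To make the consequence of a ribosomal slip concrete, the sketch below translates a short toy mRNA in the normal frame and after a −1 or +1 shift. It uses the standard genetic code; the sequence, the slip position, and the function name are hypothetical and serve only as an illustration.

```python
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def translate(mrna, slip_after_codon=None, slip=0):
    """Translate `mrna` (DNA alphabet); optionally shift the reading frame.

    After `slip_after_codon` codons have been decoded, the ribosome position
    moves by `slip` nucleotides (-1 or +1) and decoding continues from there.
    """
    protein, pos, decoded = [], 0, 0
    while pos + 3 <= len(mrna):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":          # stop codon: release the chain
            break
        protein.append(residue)
        pos += 3
        decoded += 1
        if decoded == slip_after_codon:
            pos += slip             # the ribosome slips on the message
    return "".join(protein)


toy_mrna = "ATGGCAAAAAAGGCTCCTTAA"  # hypothetical message: M A K K A P stop
print(translate(toy_mrna))                                   # MAKKAP
print(translate(toy_mrna, slip_after_codon=3, slip=-1))      # MAKKGSL
print(translate(toy_mrna, slip_after_codon=3, slip=+1))      # MAKRLL
```

All residues downstream of the slip differ from the canonical product, and the original stop codon is no longer read in frame, which is why a frameshift changes both the sequence and the length of the protein.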

EVOLUTIONARY IMPACT OF PHENOTYPIC MUTATIONS

The evolution of an accurate transcriptional and translational machinery was crucial to the emergence of cellular life. It was proposed that, at the advent of life, evolution of the codon table was driven by the necessity to minimize the effects of translational errors and maximize resistance against single base mutations. However, while the standard genetic code is highly robust in this regard, genetic codes with even higher robustness are possible. As Maeshiro and Kimura pointed out, the genetic code not only has to be robust but must also allow for change so as not to limit evolvability, two conflicting constraints that are hard to satisfy at the same time. The trade-offs between robustness and evolvability have been extensively studied in the context of genetic mutations. Below, we highlight work regarding robustness to phenotypic mutations.

The cost and benefit of phenotypic mutations

Similar to genetic mutations, phenotypic mutations impose a cost and may also provide benefits to the organism. The metabolic production costs of a correct and an erroneous protein are similar. However, the erroneous protein incurs these production costs often without benefiting the cell. This is because the erroneous protein will most likely have reduced or no function, as most amino acid exchanges are destabilizing and deleterious, and it may even impose cytotoxic stress on the cell due to misfolding (Figure 3a).

FIGURE 3

Errors during protein production can be costly, and coping mechanisms are diverse. (a) Most metabolic costs incurred during protein production do not differ between functional and nonfunctional protein variants. Other costs, however, such as opportunity costs, may differ at much larger scales. Transcriptional and translational errors are likely deleterious and diminish the benefits of the functional protein. In the case of strongly deleterious errors, additional cytotoxic costs may occur due to protein misfolding or aggregation. A cell might tolerate additional metabolic or opportunity costs until the error rate causes the cost to exceed the benefit of the remaining functional protein. (b) To minimize the fitness effects of errors in protein production, a cell can minimize the error rate, modulate the protein production level, or increase protein tolerance to errors. Minimizing global error rates is costly, as it requires ATP-dependent proofreading. Local reduction of error rates could be achieved by adapting the codon usage. Protein production can either be increased to compensate for the erroneous proteins or decreased to minimize cytotoxic costs.

Cells and organisms that are robust against mutations have an evolutionary advantage, as they reduce the potential costs of mutations. Likewise, cells and organisms with a reduced cost of phenotypic mutations might have an evolutionary advantage. Since only individual proteins within a population of proteins are affected by phenotypic mutations, it is difficult to compare evolutionary effects between genetic and phenotypic mutations. Two mechanisms were proposed for coping with protein synthesis errors: (a) increasing the robustness of proteins to mutations; or (b) increasing the fidelity of transcription and translation (Figure 3b). Why and when these different options are preferred is still not fully understood. There is a trade-off between the cost of phenotypic mutations and the cost of the mechanisms that cope with them.

The observed rate of phenotypic mutations results from a balance between the fitness cost imposed by their deleterious effects and the cost of decreasing their incidence. Any potential benefits that phenotypic mutations may provide could also influence the rate. Species in nature provide insights into the evolution of error rates. So far, no experiments have been designed to test whether higher translational fidelity could evolve under laboratory conditions. However, theoretical models predict that higher translational fidelity could be achieved by either optimizing codon usage or evolving the translational machinery toward higher fidelity. The question arises of what the limits are for protein production fidelity. Bürger et al. derived a lower and an upper bound for the phenotypic error rate. The upper bound can be directly linked to the cost of nonfunctional proteins: at some point, the cost of producing a fraction of nonfunctional proteins exceeds the benefit of the fraction of canonical, error-free proteins. The lower bound is closely linked to the proofreading cost of the RNA polymerase and the ribosome. It has been shown that increasing the fidelity of ribosome proofreading reduces bacterial growth rates.

What is the reason for the vastly different error rates across species (Tables 1 and 2)? It was proposed that species with high effective population sizes have higher overall error rates because reducing error rates by kinetic proofreading is a costly global solution. Increased efficacy of selection due to the large effective population size allows the evolution of “local” solutions on a gene-by-gene basis to reduce the effects of errors. Thus, there is no need to decrease the error rate globally by increasing the fidelity of kinetic proofreading. Local solutions to mis-transcription are: (a) local robustness to the consequences of mis-transcription when it occurs, and (b) locally reduced mis-transcription rates at the sites that are the most sensitive.
Transcriptional error rates of non-C-to-U errors are lower in highly expressed genes in E. coli but not in S. cerevisiae, suggesting that E. coli reduces “local” error rates in proteins where selection is strongest. Still, local mis-transcription rates, even those of highly expressed E. coli genes, are higher than the global mis-transcription rate in S. cerevisiae. While this trend holds for S. cerevisiae and E. coli, it does not generalize to other species when comparing effective population sizes and transcriptional error rate estimates from RNA-Seq experiments (Figure 2b). We note that M. florum is an outlier and that more experiments and error rate estimates are needed to validate theoretical predictions and understand how error rates evolve.
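The idea of an upper bound on the tolerable error rate can be illustrated with a deliberately simple toy model. It is not the model of Bürger et al.: it assumes that every molecule costs the same to produce, that only error-free molecules provide a benefit, that any single error abolishes function, and that errors are independent with a uniform per-codon rate. The benefit, cost, and length values are hypothetical.

```python
def functional_fraction(error_rate, length):
    """Share of molecules without any error (toy assumption: one error kills function)."""
    return (1.0 - error_rate) ** length


def net_benefit(error_rate, length, benefit, cost):
    """Benefit from error-free molecules minus the production cost paid for all."""
    return benefit * functional_fraction(error_rate, length) - cost


def max_tolerable_error_rate(length, benefit, cost):
    """Error rate at which net benefit reaches zero in this toy model."""
    if cost >= benefit:
        return 0.0  # production never pays off, even without errors
    return 1.0 - (cost / benefit) ** (1.0 / length)


if __name__ == "__main__":
    # Hypothetical values: benefit and cost per molecule in arbitrary units.
    for length in (100, 300, 1000):
        bound = max_tolerable_error_rate(length, benefit=1.0, cost=0.5)
        print(f"length={length:4d} codons -> break-even error rate {bound:.2e}")
```

In this sketch the break-even rate falls as proteins get longer or as the cost approaches the benefit, which mirrors the qualitative argument that costly or long proteins tolerate fewer errors.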

Phenotypic mutations shape protein properties

Increased robustness to phenotypic mutations was observed in a pioneering study by Goldsmith and Tawfik, who explored the evolutionary response to increased transcriptional errors. They used a bacterial system in which the antibiotic resistance gene TEM-1 beta-lactamase was transcribed either by a normal, high-fidelity RNA polymerase or by an error-prone mutant polymerase. They studied the response of the bacteria to ampicillin selection pressure and identified two mechanisms that allow E. coli to cope with the erroneous TEM-1 beta-lactamase proteins produced at the higher transcriptional error rate: (a) increasing the expression level and (b) fixation of stabilizing mutations (Figure 4). Increased expression levels could compensate for the large amount of mis-transcribed mRNA and, consequently, nonfunctional TEM-1 variants. Interestingly, the more stable TEM-1 exhibited increased tolerance to both phenotypic and genetic mutations.

FIGURE 4

Phenotypic mutations play a role in shaping protein traits and tolerance to genetic mutations. TEM-1 beta-lactamase was transcribed with a high-fidelity RNA polymerase and its error-prone mutant. Higher transcriptional error rates promoted enhanced TEM-1 expression levels and stabilized enzyme variants.

A similar follow-up study on TEM-1 used an error-prone ribosome to study the evolutionary effects of translational errors. Under weak selection, the authors observed mutations toward alternative start codons, which decrease translation initiation and thereby the cytotoxic costs of misfolding and aggregation. Under strong selection, however, when reducing gene expression would be detrimental, they observed the purging of deleterious mutations and the fixation of stabilizing mutations, which increases robustness to (phenotypic) mutations. The evolutionary response observed in these experiments mitigated the effects of phenotypic mutations by adjusting the expression level and by increasing the robustness of proteins (Figure 3b). Goldsmith and Tawfik surmised that phenotypic mutations might play a role in shaping protein properties such as expression levels, stability, and tolerance to genetic mutations.

Kalapis et al. studied the response to protein synthesis errors by evolving a mutant yeast strain that was engineered to translate a codon ambiguously, leading to amino acid misincorporations. Unlike the previous studies, which focused on the response at the level of a single gene (TEM-1), Kalapis et al. explored genetic changes across the whole genome of the organism. These laboratory evolution experiments revealed that fitness loss due to mistranslation was mitigated by large chromosomal duplication and deletion events that alter the dosages of numerous, functionally related proteins simultaneously.
They also observed faster degradation rates by the ubiquitin‐proteasome system that led to the elimination of erroneous protein products, thereby reducing the extent of toxic protein aggregation in mistranslating cells. However, they observed a trade‐off between adaptation to mistranslation and survival upon starvation. As a response to an enhanced energy demand of accelerated protein turnover, the evolved lines exhibited increased glucose uptake by selective duplication of hexose transporter genes. This means that, while adaptive mechanisms to sudden and catastrophic level of mistranslation evolved rapidly, they affected cellular homeostasis.

Phenotypic mutations promote evolvability

Changing environments may favor higher error rates, which result in diverse protein populations. The protein diversity that is generated by protein synthesis errors can occasionally be beneficial and promote evolvability. One determinant of evolvability is the set of possible mutations available to a genotype, or “mutational neighborhood.” Nelson and Masel found that selection for a better phenotypic mutational neighborhood has as its byproduct the creation of a better genetic mutational neighborhood. A benign mutational neighborhood arises as a byproduct of transiently elevated error rates via a mechanism termed emergent evolutionary capacitance (similar to evolutionary capacitance but without a capacitor). Capacitance results in a higher error rate that promotes evolvability. Theoretical studies on stop codon readthrough suggested that purging the mutational neighborhood of catastrophically bad options enriches the remainder for potential adaptations through a process of elimination.

So far, no experimental work has tackled how increased global error rates could facilitate adaptation. Meyerovich et al. showed how higher local error rates could be advantageous in a reporter gene. They compared bacterial survival under antibiotic selection using a specific antibiotic resistance gene (cat, encoding the chloramphenicol resistance protein) as a reporter and showed how higher levels of errors could be selectively advantageous. They found that low temperature induced stop codon readthrough and frameshifts. Accordingly, bacteria harboring the mutated cat gene (i.e., inactivated via a frameshift or nonsense mutation) showed a higher survival rate under antibiotic selection when grown at low temperature than when grown at the optimal growth temperature. In this case, errors in gene expression enabled survival of strains carrying a mutated antibiotic resistance gene.
The authors proposed that high error levels, in general, could facilitate the expression of pseudogenes containing frameshift and nonsense mutations. Pseudogenes are common in bacterial genomes, and they may be expressed in low amounts and become beneficial under changing environments.

Phenotypic mutations as bridging evolutionary intermediates

Masel showed that cryptic (or hidden) genetic variation could be enriched for adaptation, and this enrichment is stronger when multiple changes are needed simultaneously to generate a potentially adaptive phenotype. While these examples were inspired by alternative splicing and stop codon readthrough due to the [PSI+] prion state in yeast, the idea can be generalized: phenotypic mutations might act as bridging evolutionary intermediates and provide a powerful mechanism for adaptation when multiple mutations are needed. Based on theoretical modeling and simulations, Whitehead et al. showed that if two mutations are needed for a novel trait, after the first one has been acquired as a genetic mutation, the second mutation can be introduced into the phenotype via a transcriptional or translational error. If the novel trait is advantageous enough, the allele with only one mutation will spread through the population, even though the gene sequence does not yet code for both mutations. Thus, phenotypic mutations allow “look-ahead” along a two-mutation path to a novel trait.

Rockah-Shmuel et al. showed how phenotypic mutations can act as compensatory mutations and hypothesized that they can serve as evolutionary intermediates. They observed that short, 1–2 nt long frameshifting InDels (insertions and deletions that result from DNA polymerase slippage and alter the reading frame) persisted during a laboratory genetic drift experiment on a gene coding for the DNA methyltransferase M.HaeIII. Since frameshifting InDels usually lead to loss of function, these InDels should have been purged from the population. Surprisingly, the authors found that many frameshifting InDels within homonucleotide repeats of 3–8 nt were bypassed by compensatory frameshifts by the RNA polymerase or the ribosome. Intriguingly, the genetic occurrence of InDels and the transcriptional–translational bypass that corrects the induced frameshifts seem mechanistically related, and their frequencies are correlated.
The longer the repeat, the higher was the frequency of InDels in the gene, and the more frequent was their bypass. Thus, the same sequence context might be slippery for the DNA polymerase and the RNA polymerase as well. It should be noted that, even when functional rescue is 100%, the bypass is not, thus truncated protein forms are also present and impose a cost on the cell. In general, InDels could speed up sequence divergence since they can drastically alter the protein's length and sequence, unlike nonsynonymous point mutations that result in exchange of single side chains. For example, an InDel that induces a frameshift and is compensated by another InDel downstream in the sequence could result in a new sequence stretch. More than 600 such compensatory frameshifting InDel candidates were recently detected, including in human genes RAB36, ARHGAP6, and NCR3LG1. Compensatory frameshifting InDels represent a previously overlooked source of protein variations. The evolutionary trajectory of their appearance and the role of intermediates via transcriptional or translational bypass is an exciting future research direction.
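The effect of a frameshifting InDel followed by a compensatory InDel can be illustrated with a small sketch. The sequence below is invented for illustration and is not taken from M.HaeIII or from any of the genes named above; the function names are likewise hypothetical. A single A is inserted into a homonucleotide A‐repeat, and a single nucleotide is deleted downstream, so that the reading frame is restored and only the stretch between the two InDels is recoded.

```python
# Minimal sketch (hypothetical sequence): a +1 insertion in a homonucleotide
# repeat, compensated by a -1 deletion further downstream.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq):
    """Translate complete codons from position 0 until the first stop codon."""
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def insert_nt(seq, pos, nt):
    """Return seq with nucleotide nt inserted before index pos."""
    return seq[:pos] + nt + seq[pos:]


def delete_nt(seq, pos):
    """Return seq with the nucleotide at index pos removed."""
    return seq[:pos] + seq[pos + 1:]


# Invented open reading frame containing an AAAAAA repeat (indices 6-11).
WILD_TYPE = "ATGGCTAAAAAAGGTCTGGAACGTTCCTAA"
# Insertion of one A into the repeat shifts the frame of everything downstream.
FRAMESHIFTED = insert_nt(WILD_TYPE, 6, "A")
# Deleting one nucleotide downstream restores the original frame.
COMPENSATED = delete_nt(FRAMESHIFTED, 22)

if __name__ == "__main__":
    print(translate(WILD_TYPE))     # MAKKGLERS
    print(translate(FRAMESHIFTED))  # MAKKRSGTFL (frame lost, stop codon missed)
    print(translate(COMPENSATED))   # MAKKRSGSS (new stretch RSGS, frame restored)
```

In this toy case the compensated gene has the same length as the wild type, but four consecutive residues (GLER) are replaced by a new stretch (RSGS), which is the kind of sequence change that a single point mutation cannot produce.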

Protein expression regulation by programmed ribosomal frameshifting

The evolutionary relevance of frameshifts depends on their frequency of occurrence. Most frameshifts are rare, as reviewed in Section 2.3. Programmed ribosomal frameshifts with very high rates, however, may be employed to regulate protein expression. Many viruses, such as HIV‐1 and SARS‐CoV‐2, use −1 frameshifts to control the relative production of large polycistronic sequences (Gag‐Pol and Rep, respectively) (Figure 5a).

FIGURE 5

Functional innovations by ribosomal frameshifting and stop codon readthrough events. (a) (1) Frameshifting is used to regulate the relative production of proteins in many RNA viruses; for example, the polyproteins Gag and Pol are encoded in one RNA and are separated by a frameshift. (2) Frameshifting is used to regulate protein expression by turning genes on or off; for example, bacterial release factor 2 (RF2) is only produced when an internal stop codon is bypassed via frameshifting. With higher RF2 levels, accurate termination becomes more likely, and production of RF2 is decreased. (3) Phenotypic mutations can regulate modular function. In the copA gene (E. coli), the 5′ part of the mRNA encodes a copper‐binding domain. This domain is used as part of the transporter produced by normal translation and as a soluble chaperone produced by early termination after a frameshift. In the human gene MDH1, stop codon readthrough leads to synthesis of a peroxisomal signal peptide. (b) Phenotypic mutation preceded a genetic solution for dual targeting. In several budding yeasts, there is only one IDP gene; however, it contains a peroxisomal signal peptide (PTS1) in the +1‐frame of the 3′ UTR. In E. gossypii, frameshifting produces the signal peptide and targets only a fraction of the protein to the peroxisome. Species that underwent whole‐genome duplication have two IDP paralogues: IDP3 bears PTS1 in the 0‐frame and is fully targeted to the peroxisome, while IDP2 has no PTS1 and localizes to the cytosol.

Interrupted genes or shifted open reading frames are “fixed” by a readthrough mechanism that generates functional proteins. Under fluctuating environments or within small populations, frameshifting InDels in sequence repeats provide a rapid means of switching genes on and off. A unique case is the regulation of the expression of bacterial release factor 2 (RF2), which is responsible for translation termination. Regulation occurs via a negative feedback loop (Figure 5a). The RF2 gene (prfB in E. coli) contains a premature stop codon. High RF2 levels lead to early termination and a truncated, nonfunctional product. With decreasing RF2 concentration, a +1 frameshift at the stop codon followed by continued translation becomes more likely, leading to full‐length RF2. This frameshift and regulatory mechanism is conserved in around 70% of bacteria.

A well‐studied example of eukaryotic frameshifting is the ornithine decarboxylase antizyme (OAZ). The antizyme, which lowers polyamine levels by targeting ornithine decarboxylase for degradation, requires a +1 frameshift to be produced. Frameshifting in the encoding genes increases with polyamine concentration, creating a negative feedback loop. This mechanism is found in yeast, rats, Drosophila, and humans. Other S. cerevisiae genes in which +1 frameshifting in effect reduces translation include EST3 and ABP140, both of which require the frameshift to be produced at full length, with the truncated sequence having no known function.
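The RF2 negative feedback loop described above can be sketched as a minimal numerical model. Everything quantitative here is an assumption made for illustration: the frameshift probability is taken to fall with RF2 concentration as K/(K + RF2), and the synthesis rate, decay rate, and half‐saturation constant K are arbitrary values, not measurements.

```python
# Minimal sketch of RF2 autoregulation (all parameter values are hypothetical).
# Assumption: the chance that a ribosome frameshifts at the internal stop codon,
# rather than terminating, decreases with RF2 concentration as K / (K + RF2).


def frameshift_probability(rf2, k_half=1.0):
    """Probability of the +1 frameshift at a given RF2 level (assumed form)."""
    return k_half / (k_half + rf2)


def simulate_rf2(rf2_start, synthesis=2.0, decay=1.0, dt=0.01, steps=5000):
    """Integrate d[RF2]/dt = synthesis * p_frameshift - decay * [RF2] (Euler)."""
    rf2 = rf2_start
    for _ in range(steps):
        # Full-length RF2 is only made when the frameshift occurs.
        production = synthesis * frameshift_probability(rf2)
        rf2 += dt * (production - decay * rf2)
    return rf2


if __name__ == "__main__":
    # Starting from no RF2 or from an excess, the level settles at the same value.
    print(round(simulate_rf2(0.0), 6))  # 1.0
    print(round(simulate_rf2(5.0), 6))  # 1.0
```

With these illustrative parameters the steady state is RF2 = 1, where production (2 × 0.5) equals decay (1 × 1). The point of the sketch is only that the level converges to the same set point from either direction, which is the defining property of negative feedback.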

Functional modularity via phenotypic mutations

Frameshifts are used to diversify protein function by providing functional modularity, where the full‐length protein and a truncated form have overlapping functions. In E. coli, programmed ribosomal frameshifting is used in the dnaX gene, which, when translated fully, encodes the Tau subunit of DNA polymerase III. A −1 frameshift after the first 430 amino acids leads to termination shortly afterwards and production of the shorter Gamma subunit. Tau is essential, and Gamma, as part of the Pol III holoenzyme, is assumed to be functional as well. The copA gene, in contrast, encodes a copper transporter when fully translated, while a −1 frameshift shortly after the first domain produces a copper chaperone (Figure 5a). Possibly, this PRF acts to produce the copper‐binding domain in two forms: membrane‐bound (as part of the transporter) and as a soluble chaperone. The copper chaperone resulting from the frameshift contributes to copper tolerance by scavenging copper, thus helping cells survive toxic copper concentrations.

Another form of functional modularity is the production of an extended protein harboring a signal peptide by frameshifts or stop codon readthrough (Figure 5a). Dual localization, that is, cytosolic and peroxisomal, was observed via cryptic peroxisomal signals revealed by alternative splicing and stop codon readthrough. Interestingly, different organisms use different molecular mechanisms to generate the peroxisomal proteins. In the plant pathogen Ustilago maydis, peroxisomal targeting of Pgk1 and Gapdh is due to stop codon readthrough and alternative splicing, respectively. In Aspergillus nidulans, peroxisomal targeting of these enzymes is achieved by the opposite mechanisms: Pgk1 uses alternative splicing while Gapdh uses stop codon readthrough. Reliable regulation of targeting can be achieved by altering the stop codon context to obtain a suitable readthrough rate. Such programmed stop codon readthrough, or leaky termination, was shown to generate dually targeted protein isoforms, for example, cytosolic and peroxisomal forms of NAD‐dependent lactate dehydrogenase B (LDHB) and NAD‐dependent malate dehydrogenase 1 (MDH1).

By reconstructing the evolutionary history of a protein family that uses frameshifting, Yanagida et al. showed how a phenotypic mutation was later replaced by a genetic solution. In this intriguing example, frameshifting controls the localization of NADP‐dependent isocitrate dehydrogenases (IDP) in the yeast Eremothecium gossypii. The gene contains a cryptic peroxisomal signal peptide that is only translated after a frameshift shortly before the stop codon. Yeast species that underwent whole‐genome duplication, such as S. cerevisiae, possess two genes: one codes for the cytosolic form, IDP2, and the other for the peroxisomal form, IDP3, which is marked for localization via a C‐terminal signal peptide (Figure 5b). However, species without genome duplication possess only one IDP gene, likely corresponding to the cytosolic IDP2. Since the 3′ UTR of IDP and the cryptic peroxisomal signal are conserved across those species, the frameshift‐mediated mechanism is likely conserved as well. Thus, a single gene allows for an intermediate phenotype between a single cytosolic gene and two gene copies, preceding gene duplication and the genetic solution to a phenotypic mutation.
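How stop codon readthrough can unmask a cryptic C‐terminal targeting signal is sketched below. The mRNA is invented for illustration and does not reproduce the LDHB, MDH1, or IDP sequences. The tripeptide SKL is used as the prototypical PTS1 signal, and the amino acid inserted at the read‐through stop codon is written as the placeholder X because it depends on which near‐cognate tRNA decodes the stop.

```python
# Minimal sketch (hypothetical mRNA): readthrough of the first stop codon
# extends the protein and exposes a C-terminal peroxisomal signal (PTS1).

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(seq, readthrough_first_stop=False):
    """Translate from position 0; optionally read through the first stop codon."""
    protein = []
    stop_skipped = False
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            if readthrough_first_stop and not stop_skipped:
                protein.append("X")  # placeholder for the misincorporated residue
                stop_skipped = True
                continue
            break
        protein.append(aa)
    return "".join(protein)


def has_pts1(protein, signal="SKL"):
    """True if the protein ends with the given C-terminal tripeptide."""
    return protein.endswith(signal)


# Invented transcript: coding region, leaky TGA stop, then an in-frame extension.
MRNA = "ATGGCTGAAAAATGACTGGGTTCCAAACTGTAA"

if __name__ == "__main__":
    normal = translate(MRNA)
    extended = translate(MRNA, readthrough_first_stop=True)
    print(normal, has_pts1(normal))      # MAEK False
    print(extended, has_pts1(extended))  # MAEKXLGSKL True
```

Under this simple picture, the fraction of molecules carrying the signal equals the readthrough rate at the first stop codon, which is why tuning the stop codon context tunes the split between the two compartments.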

PHENOTYPIC MUTATIONS IN DISEASE

Phenotypic mutations as therapeutics

Little is known about the endogenous phenotypic consequences of phenotypic mutations, since there is no error‐free cell or organism to compare to. We have, however, experimental evidence that an increased error rate reduces growth rate in bacteria. In fact, inducing extensive translational errors is an effective antibiotic treatment. Aminoglycoside antibiotics work by binding to the bacterial ribosomal decoding center, which leads to misincorporation of near‐cognate aminoacyl‐tRNAs and induces clusters of translation errors. At high concentrations they inhibit protein synthesis and induce cell death. Aminoglycosides have only very low affinity for eukaryotic ribosomes. This low affinity could still be exploited to induce stop codon readthrough in genetic diseases caused by a premature stop codon (nonsense mutation). Increasing stop codon readthrough has been suggested as a therapy for approximately 40 genetic diseases caused by premature termination due to nonsense mutations.

Recently, a gene, named Shiftless after its function, was discovered to inhibit −1 PRF and was suggested as a broad‐spectrum inhibitor of viruses that use PRF, such as HIV‐1. Later it was shown to reduce viral replication of flaviviruses such as Zika and of coronaviruses, among others.

Phenotypic mutations and diseases

Increased transcriptional infidelity contributes to several human diseases, such as cancer. Errors in translation underlie the pathogenesis of many neurodegenerative disorders such as Alzheimer's and Huntington's diseases. Three mechanisms have been proposed to account for the production of disease‐specific protein variants: (a) ribosomal frameshifting, (b) alternative initiation sites, such as repeat‐associated non‐AUG (RAN) translation, and (c) near‐cognate start codon initiated translation. Here we list some examples for each mechanism.

Several diseases are linked to repeat expansions, for example, CAG expansions, which cause Huntington's disease, spinal bulbar muscular atrophy, and spinocerebellar ataxia types 1, 2, 3, 6, 7, and 17. In spinocerebellar ataxia type 3 (SCA3), ataxin 3 contains CAG repeats whose in‐frame translation produces an extended polyglutamine stretch; however, polyalanine‐containing proteins are also generated in patient neurons. In Huntington's disease (HD), the most common CAG repeat disease, the repeats produce polyglutamine when in frame, and polyserine and polyalanine in the +1 and −1 frames; these have been detected in autopsy samples of HD patients and in a transgenic mouse model of HD. The translation of expanded CAG repeats leads to a depletion of the charged glutaminyl‐transfer RNA that pairs exclusively with the CAG codon. This results in translational frameshifting in cell culture.

Polyalanine has also been detected in tissues of patients with another CAG repeat disease, spinocerebellar ataxia type 8 (SCA8), associated with ataxin 8 (ATXN8). In this case, polyalanine seems to be produced by RAN translation rather than by frameshifting. In myotonic dystrophy type 1 (DM1), CAG expansion transcripts result in the accumulation of polyglutamine expansion proteins. In general, CAG expansion constructs express homopolymeric polyglutamine, polyalanine, and polyserine proteins in the absence of an ATG start codon. Abnormal disease‐specific repeat proteins that are synthesized from both sense and antisense transcripts through RAN translation have been detected in tissues from patients with the GGGGCC repeat expansion in the first intron of C9ORF72, which causes both amyotrophic lateral sclerosis (ALS) and frontotemporal dementia (FTD), as well as in patients with the CCTG repeat expansion in the first intron of ZNF9 in myotonic dystrophy type 2 (DM2).

Lastly, an example of near‐cognate initiation is the FMR1 protein, which is associated with Fragile X‐associated tremor/ataxia syndrome (FXTAS). This neurodegenerative disease is caused by a limited expansion of CGG repeats in the 5′ UTR of FMR1. The CGG repeats are translated, through initiation at an ACG codon, into a polyglycine protein, which is toxic in mice.

The above examples show how repeat‐associated frameshifting and translation initiation can produce alternative protein forms that are toxic and contribute to diseases. Thus, alternative protein forms produced by translational errors, rather than the canonical proteins, may drive the pathogenesis of many diseases.
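The relationship between a CAG repeat and its three homopolymeric products follows directly from the standard genetic code and can be checked with a short sketch. The repeat length of ten is arbitrary and the function name is hypothetical. Within a perfect repeat, skipping one nucleotide (+1) reads AGC codons, while slipping back one nucleotide (−1) reads the same frame as skipping two, that is, GCA codons.

```python
# Minimal sketch: translating a CAG repeat in the 0, +1, and -1 reading frames.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_frame(seq, offset):
    """Translate all complete codons of seq starting at the given offset."""
    shifted = seq[offset:]
    return "".join(
        CODON_TABLE[shifted[i:i + 3]] for i in range(0, len(shifted) - 2, 3)
    )


REPEAT = "CAG" * 10  # arbitrary repeat length, for illustration only

if __name__ == "__main__":
    print(translate_frame(REPEAT, 0))  # QQQQQQQQQQ (polyglutamine, in frame)
    print(translate_frame(REPEAT, 1))  # SSSSSSSSS  (polyserine, +1 frame)
    print(translate_frame(REPEAT, 2))  # AAAAAAAAA  (polyalanine, -1 frame)
```

None of the three frames contains a stop codon, so once a ribosome enters a shifted frame inside the repeat it can continue producing the alternative homopolymer until it leaves the repeat.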

CONCLUSIONS AND FUTURE DIRECTIONS

Phenotypic mutations are orders of magnitude more frequent than genetic ones (Tables 1 and 2), yet their evolutionary impact is less understood. Due to their transient and stochastic nature, proteome‐ and transcriptome‐wide experimental characterization of phenotypic mutations has only recently become feasible, thanks to advances in mass spectrometry and RNA‐Seq techniques. In the future, an innovative technique could obtain single amino acid sequence information for millions of molecules in parallel. Proteins are cleaved, and the resulting peptides are fluorescently labeled and immobilized on a glass surface. Then, the peptides are imaged by microscopy to detect changes in fluorescence after Edman degradation. This method has the potential to process complex proteomic samples with the sensitivity required to detect translational errors at the single‐molecule level.

There is large variability in protein synthesis fidelity across organisms (Tables 1 and 2). The observed error rates are the result of balancing the deleterious effects of errors against the cost of higher‐fidelity protein synthesis (Figure 3a). The deleterious effect of phenotypic mutations can be mitigated by increasing expression levels or by accumulating stabilizing genetic mutations (Figure 3b). Thus, phenotypic mutations play a role in shaping protein properties such as expression level, stability, and tolerance to genetic mutations. Experiments on TEM‐1 as a model protein indicated an immediate effect of phenotypic mutations on protein dose and stability (Figure 4).

Phenotypic mutations may act as bridging intermediates, or look‐ahead mutations, that allow otherwise deleterious intermediates to survive in the population. Rockah‐Shmuel et al. demonstrated experimentally how deleterious frameshifts could be bypassed via phenotypic mutations. Yanagida et al. discovered a gene family where the phenotypic mutation preceded the genetic solution and paved the way to evolutionary adaptation via gene duplication (Figure 5b).

There are many examples in all kingdoms of life where programmed frameshifts and stop codon readthrough are used as a regulatory mechanism or lead to altered protein function, for example, a change in protein localization (Figure 5a). It is difficult to distinguish adaptive (that is, programmed) from nonadaptive molecular errors. For example, the reported CCR5 frameshifting was recently found to be an artifact, while ribosome profiling identified numerous stop codon readthrough events in S. cerevisiae, D. melanogaster, and mammals. Expression level and readthrough rate are negatively correlated, and readthrough motifs are avoided in highly expressed genes, suggesting that most stop codon readthrough events are nonadaptive molecular errors. Programmed errors likely occur at high rates, are likely conserved among multiple species, and are less tolerant to loss‐of‐function mutations (since they are under functional constraints). Candidates for programmed frameshifts and stop codon readthrough ultimately have to be verified experimentally, for example, by measuring the functional and/or fitness effects of altering the readthrough rate.

Phenotypic mutations have often been linked to fitness decrease and diseases. It has been shown how transcriptional infidelity causes cancer heterogeneity. Furthermore, translational inaccuracy, often due to repeats in protein sequences, contributes to several neurodegenerative disorders. Inducing phenotypic mutations is also used therapeutically. For example, aminoglycosides inhibit ribosome translocation and induce translational errors. Since they have low affinity for eukaryotic ribosomes, aminoglycosides are often used as antibiotics. Their propensity to induce translation errors can also be utilized to alleviate the symptoms of human genetic diseases, for example, by inducing stop codon readthrough in diseases caused by nonsense mutations.

It is unknown how long the effect of phenotypic mutations lasts. Proteins harboring phenotypic mutations can be passed to the next generation, as protein half‐life in many cases exceeds cell cycle time. Phenotypic mutations can also affect new generations by triggering transcription network loops. For example, an increased error rate during transcription modulates the switching of the lac operon from the uninduced to the induced state after cell division.

Phenotypic mutations might be incorporated directly into the genome via reverse transcription. Reverse transcription is mainly used by retroviruses and occasionally by cellular life. This mechanism was shown in cancer, where intron‐less versions of human pseudogenes, most likely the result of reverse transcription in somatic cells, were identified in cancerous genomes. Reverse transcription could serve as a potential evolvability mechanism by which transcriptional errors become genetic. However, this hypothesis has not yet been tested. Thus, future studies are needed to answer the open question of whether phenotypic mutations can be replaced with genetic ones.

Although phenotypic mutations are not individually inherited, as genetic mutations are, the examples highlighted in this review suggest that they nevertheless play a critical role in evolution. Understanding the protein diversity and phenotypic diversity caused by phenotypic mutations will advance our understanding of protein evolution and will have implications for human health and disease.

AUTHOR CONTRIBUTIONS

Maria Luisa Romero Romero: Conceptualization (supporting); visualization (equal); writing – original draft (equal); writing – review and editing (equal). Cedric Landerer: Conceptualization (supporting); visualization (equal); writing – original draft (supporting); writing – review and editing (equal). Jonas Poehls: Visualization (equal); writing – original draft (supporting); writing – review and editing (equal). Agnes Toth‐Petroczy: Conceptualization (lead); supervision (lead); writing – original draft (equal); writing – review and editing (lead).

FUNDING

This work was supported by the Max Planck Society MPRGL funding.

REFERENCES (149 in total)

1. Sequences that direct significant levels of frameshifting are frequent in coding regions of Escherichia coli.

Authors: Olga L Gurvich; Pavel V Baranov; Jiadong Zhou; Andrew W Hammer; Raymond F Gesteland; John F Atkins
Journal: EMBO J Date: 2003-11-03 Impact factor: 11.598

Review 2. Translational accuracy and the fitness of bacteria.

Authors: C G Kurland
Journal: Annu Rev Genet Date: 1992 Impact factor: 16.830

3. Cryptic genetic variation is enriched for potential adaptations.

Authors: Joanna Masel
Journal: Genetics Date: 2005-12-30 Impact factor: 4.562

4. Inhibited cell growth and protein functional changes from an editing-defective tRNA synthetase.

Authors: Jamie M Bacher; Valérie de Crécy-Lagard; Paul R Schimmel
Journal: Proc Natl Acad Sci U S A Date: 2005-01-12 Impact factor: 11.205

5. Potential role of phenotypic mutations in the evolution of protein expression and stability.

Authors: Moshe Goldsmith; Dan S Tawfik
Journal: Proc Natl Acad Sci U S A Date: 2009-04-01 Impact factor: 11.205

6. On the evolution of the genetic code.

Authors: C R Woese
Journal: Proc Natl Acad Sci U S A Date: 1965-12 Impact factor: 11.205

7. How infidelity creates a sticky situation.

Authors: D Allan Drummond
Journal: Mol Cell Date: 2012-12-14 Impact factor: 17.970

8. Conserved rates and patterns of transcription errors across bacterial growth states and lifestyles.

Authors: Charles C Traverse; Howard Ochman
Journal: Proc Natl Acad Sci U S A Date: 2016-02-16 Impact factor: 11.205

9. The landscape of transcription errors in eukaryotic cells.

Authors: Jean-Francois Gout; Weiyi Li; Clark Fritsch; Annie Li; Suraiya Haroon; Larry Singh; Ding Hua; Hossein Fazelinia; Zach Smith; Steven Seeholzer; Kelley Thomas; Michael Lynch; Marc Vermulst
Journal: Sci Adv Date: 2017-10-20 Impact factor: 14.136

10. Dynamic pathways of -1 translational frameshifting.

Authors: Jin Chen; Alexey Petrov; Magnus Johansson; Albert Tsai; Seán E O'Leary; Joseph D Puglisi
Journal: Nature Date: 2014-06-11 Impact factor: 49.962