Pushing the limits of the scanning mechanism for initiation of translation.

Marilyn Kozak

Abstract

Selection of the translational initiation site in most eukaryotic mRNAs appears to occur via a scanning mechanism which predicts that proximity to the 5' end plays a dominant role in identifying the start codon. This "position effect" is seen in cases where a mutation creates an AUG codon upstream from the normal start site and translation shifts to the upstream site. The position effect is evident also in cases where a silent internal AUG codon is activated upon being relocated closer to the 5' end. Two mechanisms for escaping the first-AUG rule--reinitiation and context-dependent leaky scanning--enable downstream AUG codons to be accessed in some mRNAs. Although these mechanisms are not new, many new examples of their use have emerged. Via these escape pathways, the scanning mechanism operates even in extreme cases, such as a plant virus mRNA in which translation initiates from three start sites over a distance of 900 nt. This depends on careful structural arrangements, however, which are rarely present in cellular mRNAs. Understanding the rules for initiation of translation enables understanding of human diseases in which the expression of a critical gene is reduced by mutations that add upstream AUG codons or change the context around the AUG(START) codon. The opposite problem occurs in the case of hereditary thrombocythemia: translational efficiency is increased by mutations that remove or restructure a small upstream open reading frame in thrombopoietin mRNA, and the resulting overproduction of the cytokine causes the disease. This and other examples support the idea that 5' leader sequences are sometimes structured deliberately in a way that constrains scanning in order to prevent harmful overproduction of potent regulatory proteins. The accumulated evidence reveals how the scanning mechanism dictates the pattern of transcription--forcing production of monocistronic mRNAs--and the pattern of translation of eukaryotic cellular and viral genes.

Year:  2002        PMID: 12459250      PMCID: PMC7126118          DOI: 10.1016/s0378-1119(02)01056-9

Source DB:  PubMed          Journal:  Gene        ISSN: 0378-1119            Impact factor:   3.688


Introduction

The scanning mechanism for initiation of translation postulates that the small (40S) ribosomal subunit enters at the 5′ end of the mRNA and migrates linearly, stopping when the first AUG codon is reached. Consistent with the postulated 5′ end-dependent entry of ribosomes, translation in vivo is strongly augmented by the m7G cap (Furuichi and Shatkin, 2000, Horikami et al., 1984, Lo et al., 1998, Neeleman et al., 2001) and ribosome binding in vitro is prevented by circularization of the mRNA (Kozak, 1979a, Konarska et al., 1981).

Perhaps because the scanning mechanism has been around for a while, the evidence for some basic points has been forgotten. One recent commentary even questions whether the 40S ribosomal subunit has anything to do with it (Mathews, 2002). The easiest answer is that the stop-scanning step is clearly mediated by pairing of the initiation codon with the anticodon in Met-tRNAi (Cigan et al., 1988a), and the 40S ribosomal subunit is the carrier of Met-tRNAi·eIF2. But the 40S subunit was already implicated by experiments done earlier. The experiments that gave rise to the scanning model concerned unusual polysome-like complexes formed in the presence of edeine, an antibiotic which blocks recognition of the AUG codon (Kozak and Shatkin, 1978). Analysis of the rapidly sedimenting complexes revealed 40S ribosomal subunits distributed throughout the body of the mRNA. Because control experiments showed that, even in the presence of edeine, ribosomes can enter only from the 5′ end, the simplest explanation was that 40S subunits enter at the 5′ end and then migrate into the interior of the mRNA; in the absence of edeine, the migration would stop when an AUG codon is reached. Independent experiments confirmed that edeine is targeted to the ribosome (Herrera et al., 1986), and use of a fractionated translation system confirmed that the edeine-induced complexes are formed by 40S but not 60S ribosomal subunits (Kozak and Shatkin, 1978, Kozak, 1979b). Subsequent experiments, with edeine omitted, showed that scanning can be interrupted by inserting a base-paired structure between the cap and the AUG codon; the resulting abortive complexes sediment around 40S (Kozak, 1989, Kozak, 1998, Paraskeva et al., 1999).

We are not yet sure which initiation factors are associated with the 40S ribosomal subunit during the scanning phase. The only factors whose role in scanning has been defined clearly are the GTP-binding protein eIF2, which escorts Met-tRNAi onto the 40S subunit, and eIF5, which activates GTP hydrolysis by eIF2 (Asano et al., 2001, Das et al., 2001). By controlling the rate of GTP hydrolysis, eIF5 controls the fidelity of initiation, i.e. the fidelity of the stop-scanning step (Huang et al., 1997). Other protein factors have not yet been fitted in. (The voluminous literature on factors focuses on modifications – phosphorylation, cleavages – rather than on defining the initiation pathway. Basic questions, such as when each factor enters and leaves, have not yet been answered.) One untested possibility is that the large initiation factor eIF3, bound to the 40S ribosomal subunit, might form a clamp around the mRNA that is opened and closed by cycles of ATP hydrolysis. Scanning appears to be dependent on ATP hydrolysis (Kozak, 1980), thereby implicating eIF4A, an RNA-dependent ATPase which might control the hypothetical clamp. Some ideas about the function of other initiation factors are reviewed elsewhere (Dever, 2002, McCarthy, 1998, Pestova et al., 2001).

The strongest evidence that the scanning 40S ribosome/factor complex advances linearly is the position effect on selection of the start codon: initiation at the first potential start codon has been demonstrated in rigorous experimental tests (Cigan et al., 1988a, Kozak, 1983, Kozak, 1995) and confirmed in many ‘natural tests’ wherein addition or removal of an AUG codon produces the expected shift in the site of initiation (see below). The aforementioned blockade caused by inserting a base-paired structure between the 5′ cap and the AUGSTART codon is further evidence that 40S ribosomal subunits traverse the leader sequence linearly, rather than hopping (discontinuous scanning) or entering directly at the AUG codon.

Although the scanning mechanism predicts that translation should initiate at the AUG codon nearest the 5′ end of the mRNA, two ancillary mechanisms – reinitiation and context-dependent leaky scanning – enable additional initiation events at downstream AUG codons in some mRNAs. These well-defined mechanisms for escaping the first-AUG rule are discussed below. An additional escape mechanism might involve direct entry of ribosomes at an internal site in the mRNA. While there is evidence suggestive of direct internal initiation with picornavirus mRNAs, the evidence for internal ribosome entry sites (IRES) in cellular mRNAs is problematic (Kozak, 2001a). The absence of shared structural features among candidate cellular IRES elements makes it impossible to predict which mRNAs, if any, might use such a mechanism. Rather than attempting to summarize the extensive literature on internal initiation, I refer the reader to other detailed reviews on that subject (Dever, 2002, Hellen and Sarnow, 2001, Pestova et al., 2001). The next section provides a terse summary of points that are easily explained by the scanning model. The bulk of the review then focuses on complicated examples and issues.
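The core prediction of the scanning model, that initiation occurs at the AUG nearest the 5′ end and shifts upstream when a mutation creates a new AUG in the leader, can be illustrated with a minimal sketch. The sequences and the function name `first_aug` are invented for illustration and do not come from any real mRNA; the sketch ignores context effects, reinitiation, and secondary structure.

```python
from typing import Optional


def first_aug(mrna: str) -> Optional[int]:
    """Return the 0-based position of the AUG nearest the 5' end, or None.

    Models only strict linear scanning: the 40S subunit enters at the
    5' end and stops at the first AUG it encounters.
    """
    seq = mrna.upper().replace("T", "U")
    position = seq.find("AUG")
    return None if position == -1 else position


# Hypothetical mRNA: an 11-nt leader without AUG, then a short ORF.
wild_type = "GGCACGCGACC" + "AUGGCUUAA"

# Hypothetical point mutation (C -> U at index 4) that turns ACG into AUG
# inside the leader, creating an upstream start codon.
mutant = wild_type[:4] + "U" + wild_type[5:]

print(first_aug(wild_type))  # start site in the unmutated mRNA
print(first_aug(mutant))     # start site after the upstream AUG is created
```

In this toy case the predicted start site moves from the original AUG to the newly created upstream AUG, which is the position effect described above.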

Constraints imposed by the scanning mechanism explain many common aspects of gene expression in higher eukaryotes

Silent downstream cistrons

Many plant and animal viruses produce dicistronic or polycistronic mRNAs from which only the 5′ cistron can be translated (Table 1). All these viruses solve the problem of ‘silent 3′ cistrons’ by producing – via splicing or discontinuous transcription or an internal promoter – additional forms of mRNA in which the downstream cistron is repositioned closer to the 5′ end. The reason for the complicated pattern of splicing seen with human immunodeficiency virus type 1 (HIV-1), for example, is simply to produce mRNAs that allow downstream open reading frames (ORFs) to be translated. The broad range of viruses represented in Table 1 merits attention.
Table 1

Partial list of structurally polycistronic viral mRNAs which are functionally monocistronic, i.e. only the first cistron is translated [a]

Virus | Expressed 5′ cistron [b] | Silent 3′ cistron(s) | Source of short transcript [c] | References
Polyoma virus | Capsid protein VP2 | Capsid protein VP1 | Splicing | Siddell and Smith, 1978
Bovine papillomavirus | Numerous examples | Numerous examples | Promoter switch and splicing | Lambert et al., 1988
Cytomegalovirus | UL98 [d] | UL99 (pp28) | Promoter switch | Wing and Huang, 1995
Adenovirus | Numerous examples | Numerous examples | Splicing | Wold et al., 1995, Ziff, 1985
Parvovirus: adeno-associated | Capsid protein A | Capsid proteins B/C | Splicing | Muralidhar et al., 1994
Hepatitis B virus | Core protein | S proteins (envelope) | Promoter switch | Schaller and Fischer, 1991
Retrovirus: avian, murine | Gag (capsid) protein | Env protein | Splicing [e] | Pawson et al., 1977, Van Zaane et al., 1977
Retrovirus: human foamy | Gag (capsid) protein | Pol precursor | Splicing | Jordan et al., 1996
Lentivirus: HIV-1 | Tat | Rev and Nef [f] | Splicing | Schwartz et al., 1992
Alphavirus: Semliki Forest | Nonstructural proteins | Capsid protein | Internal promoter | Glanville et al., 1976, Strauss and Strauss, 1994
Calicivirus: feline [g] | Nonstructural proteins | Capsid protein | Independent replication | Carter, 1990, Neill et al., 1991
Coronavirus: mouse hepatitis | Membrane protein | Nucleocapsid protein | Discontinuous transcription | Lai and Cavanagh, 1997
Equine arteritis virus | Replicase polyprotein | Gs glycoprotein | Discontinuous transcription | Pasternak et al., 2000
Brome mosaic virus | RNA polymerase | Coat protein | Internal promoter | Miller et al., 1985, Shih and Kaesberg, 1976
Tobacco mosaic virus | Replicase | Coat and movement proteins | Internal promoters | Grdzelishvili et al., 2000, Hunter et al., 1976
Potato virus X | 25 kDa movement protein | 12 and 8 kDa movement proteins [f] | ?? | Verchot et al., 1998
Carmovirus: turnip crinkle [g] | Replicase (p28/p88) | p8 and p9 movement proteins | Internal promoters | Li et al., 1998, Wang and Simon, 1997
Tombusvirus: tobacco necrosis [g] | RNA polymerase | Coat protein (ORF5) [h] | Internal promoters? | Meulewaeter et al., 1992
Southern bean mosaic virus [g] | Movement protein and polymerase [f] | Coat protein | Internal promoter | Hacker and Sivakumaran, 1997
Luteovirus: barley yellow dwarf [g] | Protease/polymerase | Coat protein and p17 [f] | Internal promoters | Koev and Miller, 2000, Mayo and Ziegler-Graff, 1996
Turnip yellow mosaic tymovirus | p69 and p206 replicase [f] | Coat protein | Internal promoter | Schirawski et al., 2000, Szybiak et al., 1978
Closterovirus: citrus tristeza; beet yellows | Polymerase precursor | Eight to ten downstream ORFs | Internal transcription elements | Gowda et al., 2001, Peremyslov and Dolja, 2002
Geminivirus: tomato leaf curl | C1 replication protein [d] | C2 transcription factor | Internal promoter | Mullineaux et al., 1993
Pararetrovirus: rice tungro bacilliform | ORFs 1, 2, 3 [f] | ORF4 | Splicing | Fütterer et al., 1994

[a] The silent downstream cistron is expressed only upon being moved closer to the 5′ end via production of a second, shorter mRNA. Translation of most genes derived from these viruses follows straightforward predictions of the scanning mechanism, although occasional deviations have been reported. In rare instances where a 3′ cistron appears to be translated from a dicistronic mRNA (Grundhoff and Ganem, 2001, Kirshner et al., 1999, Nador et al., 2001, Stacey et al., 2000), the virus in question employs a complicated pattern of splicing and therefore the existence of an undetected monocistronic mRNA is not beyond the realm of reason. In some other cases only a small amount of the protein encoded by the 3′ cistron was produced, and the published RNA analyses were not sufficiently sensitive to rule out the presence of an additional subgenomic mRNA (Herbert et al., 1996).

[b] In some cases the listed example is arbitrary, i.e. with retroviruses, coronaviruses, closteroviruses, etc., there are additional polycistronic mRNAs wherein translation is restricted to the 5′ cistron.

[c] Whereas DNA viruses and retroviruses use conventional promoter-switching or splicing mechanisms to generate alternative forms of mRNA that allow translation of the downstream cistron, more complicated mechanisms underlie the production of subgenomic mRNAs by some RNA viruses (Miller and Koev, 2000).

[d] The presence of internal promoters that produce a shorter transcript for each downstream ORF is suggestive, but testing of translation is still needed for the mRNAs produced by cytomegalovirus and geminivirus.

[e] Whereas all retroviruses employ splicing to produce the subgenomic mRNA from which envelope protein (Env) is translated, some retroviruses also employ an internal promoter which is postulated to mediate expression of novel ORFs, such as the superantigen of mouse mammary tumor virus (Reuss and Coffin, 1998) and orf-x of the virus that causes lung cancer in sheep (Palmarini et al., 2002).

[f] See leaky scanning in Table 3 and Fig. 1.

[g] In place of the usual m7G cap, the 5′ end of these viral RNAs carries a covalently linked protein (VPg) or is unblocked. The need for a subgenomic mRNA even in these cases emphasizes that translation is 5′ end-dependent even when it is not cap-dependent.

[h] The full-length genomic mRNA supports translation of the 3′ cistron in vitro but the 3′ cistron is silent in vivo. The latter result is considered more reliable (Meulewaeter et al., 1992).

Table 3

Partial list of cellular and viral mRNAs that produce two separately-initiated proteins by context-dependent leaky scanning [a]

Source of mRNA | Identifying information | Sequence flanking first AUG codon [b] | Protein products [c] | References
Glucocorticoid receptor gene | Human | CUGaugG; tested | Long and short transactivators | Yudt and Cidlowski, 2001
NFAT transcription factor gene | Human | CGGaugC | 90 and 86 kDa isoforms | Lyakh et al., 1997
C/EBPα gene [d] | Mouse | CCCaugG; tested | 42 and 30 kDa isoforms | Lin et al., 1993, Ossipow et al., 1993
Rx/rax homeobox gene | Mouse | UCCaugC; tested | Long and short isoforms | Tucker et al., 2001
GATA-1 gene | Human, mouse | CCCaugG (mouse); UCCaugG (human) | 50 and 40 kDa isoforms | Calligaris et al., 1995, Wechsler et al., 2002
Peripherin gene | Rat | UGAaugC; tested | Long and short isoforms | Ho et al., 1995
MxB protein gene | Human | CACaugU; tested | Nuclear and cytoplasmic isoforms | Melén et al., 1996
Ubiquitin-activating enzyme E1 gene | Human | UUGaugU | Nuclear and cytoplasmic isoforms | Handley-Gearhart et al., 1994, Shang et al., 2001
Microtubule-associated protein gene | Human | CCAaugC | Long and short isoforms | Su and Qi, 2001
Von Hippel-Lindau gene [e] | Human | GGAaugC | 24 and 18 kDa isoforms | Iliopoulos et al., 1998, Schoenfeld et al., 1998
S6 kinase gene | Human, rat | CCCaugA | Long and short isoforms | Grove et al., 1991, Reinhard et al., 1992
Rlk/Txk tyrosine kinase gene | Mouse | GCCaugA | Long and short isoforms | Debnath et al., 1999
Vitamin D receptor gene | Chicken | UCCaugU; tested | Long and short isoforms | Lu et al., 1997
Val-tRNA synthetase gene | Arabidopsis | UCUaugU; tested | Mitochondrial and cytoplasmic isoforms | Souciet et al., 1999
Simian virus 40 | Late 19S mRNA | UCCaugG; tested | Capsid proteins VP2 and VP3 | Sedman and Mertz, 1988
Cytomegalovirus | UL4 mRNA | GUGaugC; tested | Inhibitory peptide and gp48 | Cao and Geballe, 1995
Adenovirus type 5 | Region E3 | UAUaugA | 6.7 kDa protein and gp19K | Wilson-Rawls et al., 1990, Wold et al., 1986
Adenovirus type 5 | Region E1B | UCCaugG | 21 kDa and 55 kDa proteins | Bos et al., 1981
Hepatitis B virus | 2.1 kb mRNA | GCCaugC; tested | Middle (pre-S2) and small (p24) surface proteins | Sheu and Lo, 1992
Feline leukemia retrovirus [f] | Genomic mRNA | CUGaugU | gp80gag and pr65gag | Laprevotte et al., 1984
HIV-1 | Spliced mRNA | GUAaugC; tested | Vpu and Env | Schwartz et al., 1992
HIV-1 | Spliced mRNA | CCUaugG; tested | Rev and Nef | Schwartz et al., 1992
Reovirus (mammalian) | RNA segment S1 | CGGaugG; tested | σ1 and 14 kDa proteins | Fajardo and Shatkin, 1990, Kozak, 1982
Reovirus (baboon) [g] | RNA segment S4 | UACaugG | p15 and p16 proteins | Dawe and Duncan, 2002
Bunyavirus | RNA segment S | UCAaugA | Nucleocapsid (N) and NSs | Bridgen et al., 2001, Elliott and McGregor, 1989
Influenza A virus [g] | RNA segment 2 | UGAaugG | Polymerase subunit PB1 and mitochondrial protein | Chen et al., 2001
Barley yellow dwarf luteovirus | Subgenomic mRNA | UGAaugA; tested | Coat protein and p17 | Dinesh-Kumar and Miller, 1993
Turnip yellow mosaic virus [e] | Genomic mRNA | CAAaugA | p69 and p206 replicase | Weiland and Dreher, 1989
Cucumber necrosis virus | 0.9 kb subgenomic mRNA | UUCaugG; tested | p21 and p20 | Johnston and Rochon, 1996
Peanut clump furovirus | RNA segment 2 | CUUaugU; tested | p23 (coat) and p39 | Herzog et al., 1995
Potato virus X potexvirus | Subgenomic mRNA | CAUaugU; tested | 12 and 8 kDa movement proteins | Verchot et al., 1998
Southern bean mosaic virus [g] | Genomic mRNA | UUUaugA; tested | p21 movement protein and p105 polymerase | Sivakumaran and Hacker, 1998
Baculovirus [d] | IE0 mRNA | GACaugA | Long and short forms of transactivator (IE0, IE1) | Theilmann et al., 2001

[a] Some additional examples of leaky scanning are described in Fig. 1, Fig. 2 and in the text.

[b] In all mRNAs here listed, the sequence flanking the first start codon deviates from the consensus sequence in position −3 and/or position +4, highlighted by underlining. When the postulated link between context and leaky scanning was tested (so marked in this column), mutations that improved the context at the first start site diminished access to the downstream start site. This test failed only with cucumber necrosis virus, where the short distance between the m7G cap and AUG#1 allowed some leaky scanning even when the context was optimized.

[c] In some cases the first and second AUG codons are in the same reading frame, generating long and short versions of the encoded protein which may function differently. In cases where the first and second start codons are in different reading frames, indicated by italicizing the second product, the extent of overlap between the two ORFs ranges from a few codons (peanut clump virus, southern bean mosaic virus) to 626 codons (turnip yellow mosaic virus).

[d] Access to the downstream initiation site via leaky scanning is augmented by a reinitiation shunt, as explained in the text (Section 4.3) and diagrammed in Fig. 1 for C/EBPβ mRNA.

[e] Mutations that eliminate AUG#1 usually increase production of the second, downstream protein. In rare cases where the expected increase was not seen (e.g. von Hippel-Lindau, turnip yellow mosaic virus), it might be because translation of the second protein was restricted at the level of elongation. For a similar reason, improving the context around AUG#1 occasionally fails to elevate production of the protein there initiated (Fajardo and Shatkin, 1990). These entries nevertheless satisfy the main prediction of the leaky scanning mechanism, which is that improving the context around AUG#1 prevents initiation from the second, downstream site (Fajardo and Shatkin, 1990, Iliopoulos et al., 1998).

[f] Whereas feline leukemia virus produces an N-terminally-extended, glycosylated form of Gag (gp80gag) from the indicated weak AUG codon, the corresponding upstream start site in murine leukemia virus is ACCCUGG (Portis et al., 1994). When that site was experimentally ablated, however, revertants expressed the extended protein from a weak upstream AUG codon (UUUaugG) created by a point mutation. Those revertants were selected because the extra glycosylated form of Gag contributes to viral spread (Portis et al., 1996).

[g] In the mRNAs from baboon reovirus, influenza A virus, and southern bean mosaic virus, the indicated proteins derive from the first (weak) and fourth AUG codons. AUG#2 and AUG#3 initiate small ORFs that terminate before AUG#4. Thus, a combination of leaky scanning and reinitiation probably mediates access to the downstream start site.
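The context notation used in Table 3 (three nucleotides upstream, the aug codon, one nucleotide downstream) lends itself to a small check of positions −3 and +4. The sketch below assumes the standard consensus from the wider literature, a purine (A or G) at −3 and G at +4; this section does not spell the consensus out, so treat that rule as an assumption. The function name `context_deviations` is hypothetical.

```python
PURINES = {"A", "G"}


def context_deviations(flank: str) -> list:
    """List the positions (-3 and/or +4) at which a start-site context
    deviates from the assumed consensus (purine at -3, G at +4).

    `flank` follows the Table 3 notation: three upstream nucleotides,
    the start codon, and one downstream nucleotide, e.g. 'CUGaugG'.
    """
    if len(flank) != 7 or flank[3:6].lower() != "aug":
        raise ValueError("expected a 7-nt string of the form NNNaugN")
    deviations = []
    if flank[0].upper() not in PURINES:   # position -3
        deviations.append(-3)
    if flank[6].upper() != "G":           # position +4
        deviations.append(4)
    return deviations


# Entries taken from Table 3 and from the C/EBP-beta example in Fig. 1.
for site in ["CUGaugG", "GGAaugC", "UUCaugC", "ACCaugG"]:
    print(site, context_deviations(site))
```

An empty list marks a site that matches the assumed consensus at both positions, as for the strengthened ACCaugG site in Fig. 1B; a non-empty list marks a candidate for leaky scanning. The check says nothing about how leaky a given site actually is.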

Fig. 1

Examples of ‘maximally leaky’ scanning wherein one mRNA produces three independently initiated proteins. Major (thick arrow) and minor (thin arrow) translation products are identified below their respective start codons. Sequences that cause the initiation site to be weak, and thus promote leaky scanning, are highlighted in red. Offset rectangles represent ORFs in different reading frames. (A) With c-myc mRNA, a leaky scanning mechanism was inferred from experiments in which optimizing the context around the first AUG codon suppressed production of the 50 kDa isoform, while changing the upstream CUG codon to AUG suppressed production of both the 65 and 50 kDa isoforms (Spotts et al., 1997). Access to the downstream start site might be more complicated than here depicted, as there is a small out-of-frame ORF between the 65 and 50 kDa start sites. (B) With C/EBPβ mRNA, a mutation that strengthens the first start codon (UUCaugC→ACCaugG) blocked production of all shorter isoforms, implicating a leaky scanning mechanism (Calkhoven et al., 2000). A small upORF (blue) superimposes another level of control, causing more ribosomes to bypass the start site for isoform B1 than would be expected from leaky scanning alone. Presumably because the AUGSTART codon for isoform B1 is positioned close to the termination site of the upORF, reinitiation at site B1 is inefficient and some ribosomes thus reach the far downstream start site for the 20 kDa isoform (LIP). As evidence for this reinitiation shunt, Calkhoven et al. (2000) showed that eliminating the AUG codon of the upORF abolished production of LIP and that strengthening or weakening the context around the upORF start codon caused corresponding changes in the yield of LIP. Although the smallest form of C/EBPβ can be generated in some situations by proteolysis (Dearth et al., 2001), the effects of the aforementioned mutations clearly implicate a translational mechanism. 
The LAP/LIP ratio shows tissue and stage specific variation (Dearth et al., 2001, Descombes and Schibler, 1991). (C) Whereas leaky scanning allows initiation at multiple sites within a single ORF in C/EBPβ and c-myc mRNAs, leaky scanning allows translation of three separate ORFs in the pregenomic mRNA of rice tungro bacilliform virus. These ORFs (not drawn to scale) have overlapping start and stop codons of the form AUGA. Translation via leaky scanning was inferred from the strong reduction (>13-fold) in translation of ORF2 and ORF3 when the start codon of ORF1 was changed from AUU to AUG (Fütterer et al., 1997) and from the inhibitory effect on expression of ORF3 when an adventitious AUG codon was inserted into ORF2. The 5′ leader sequence that precedes ORF1 has ten small upORFs which are not depicted here because that peculiar leader sequence, postulated to be translated by ribosome hopping (Fütterer et al., 1996), is not required for the leaky scanning mechanism that underlies translation of ORFs 1, 2 and 3. (D) The avian reovirus S1 mRNA supports translation of one structural and two nonstructural proteins (Bodelón et al., 2001). The depicted mechanism postulates that ORF1 has a dual function, encoding its own polypeptide (p10) and facilitating translation of ORF3 by shunting some ribosomes past the strong AUGSTART codon for ORF2. The absence of extraneous AUG codons in the 310 nt region between the end of ORF1 and the start of ORF3 is consistent with the idea that ORF3 might be translated by reinitiation. Some ribosomes would be expected to translate p17 (ORF2) by leaky scanning, engendered by the poor context at the start of ORF1. Improving the context at the start of ORF1 indeed increased production of p10 (Shmulevitz et al., 2002); unfortunately, the yield of p17, which would be expected to decrease, was not monitored. 
The observation that strengthening the context at the start of ORF1 had no effect on the yield of σC is not surprising because the reinitiation mechanism postulated to underlie translation of ORF3 would probably be limited by other features, such as the relatively large size of ORF1.

The same problem and same solution – post-transcriptional processing of polycistronic mRNAs – underlie the expression of many genes in Caenorhabditis elegans (Blumenthal et al., 2002, Hough et al., 1999). In mammalian cells, mRNAs that contain two full-length nonoverlapping cistrons are extremely rare and, as with the aforementioned viruses, actual translation of the 3′ cistron probably occurs from a second, monocistronic mRNA (Pardigol et al., 1998, Westerman et al., 2001) or from a second mRNA in which the two cistrons are fused into a single translation unit (Gray and Nicholls, 2000, Hänzelmann et al., 2002). A dicistronic transcript derived from the mouse Snurf-Snrpn locus (Gray et al., 1999) barely supports translation of the second cistron, as discussed below in the section on reinitiation (Section 4). Recently discovered dicistronic transcripts produced from the mouse Hyal locus support translation of only the 5′ cistron (Shuttleworth et al., 2002). A few other reported dicistronic mRNAs await testing. Notwithstanding the documented inability to translate the 3′ cistron in natural dicistronic mRNAs, synthetic dicistronic mRNAs – constructed by inserting a putative IRES element between two reporter genes – appear sometimes to allow translation of the downstream cistron. The interpretation that this occurs via direct internal initiation of translation has been questioned (Kozak, 2001a) and defended (Hellen and Sarnow, 2001) in other reviews.

First-AUG rule

The position effect, indicative of scanning, is seen when a mutation creates an AUG codon upstream from the normal start codon and translation shifts to the upstream site (Bergenhem et al., 1992, Cai et al., 1992, Gross et al., 1998, Harington et al., 1994, Liu et al., 1999, Lock et al., 1991, Mével-Ninio et al., 1996, Muralidhar et al., 1994, Wada et al., 1995). In the most stringent test of the rule, the first AUG codon was shown to be the exclusive site of initiation even when the second AUG was positioned just a few bases downstream from, and in the same optimal context as, the first (Kozak, 1995). The position effect is seen also when removal of the first start codon activates initiation from the next AUG downstream. Some genes require production of two versions of the encoded protein, wherein the shorter version, initiated from an internal AUG codon, lacks the N-terminal domain of the longer isoform. The problem of how ribosomes can gain access to an internal start codon is solved by producing, via splicing or a downstream promoter, a second form of mRNA from which the first AUGSTART codon has been removed. Table 2 lists some examples. The N-terminally truncated isoform thus produced may reside in a different cellular compartment, or may function as an antagonist to the full-length protein (as seen with various transcription factors listed in Table 2), or may function in a surprising way. One such surprise was the discovery that a truncated form of tryptophanyl-tRNA-synthetase (‘miniTrpRS’) has angiostatic activity (Wakasugi et al., 2002).
Table 2

Partial list of vertebrate genes that produce a second, shorter version of the encoded protein via a second form of mRNA in which an internal AUG codon becomes a functional start site upon elimination of the upstream AUGSTART codon [a]

Gene | Source | References
Tryptophanyl-tRNA synthetase | Human | Tolstrup et al., 1995, Wakasugi et al., 2002
Stromelysin 3 | Human | Luo et al., 2002
Procaspase-8 | Human | Breckenridge et al., 2002
ATBF1 transcription factor [b] | Human | Berry et al., 2001
Hepatocyte-nuclear factor 3β [b] | Mouse | Philippe, 1995
ZAC transcription factor | Human | Bilanges et al., 2001
HOF transcription factor | Mouse | Mitchelmore et al., 2002
Lymphoid enhancer factor-1 (LEF1) [b] | Human | Hovanes et al., 2001
RIZ transcription factor | Human | Jiang et al., 1999, Liu et al., 1997
Estrogen receptor-α [b] | Human | Denger et al., 2001, Flouriot et al., 2000
Thyroid hormone receptor-β | Rat | Williams, 2000
Progesterone receptor [b] | Human | Kastner et al., 1990a
CCAAT enhancer binding protein (C/EBP) ε | Human | Yamanaka et al., 1997
Smoothelin | Human | Rensen et al., 2002
Protein kinase Ntk | Mouse | Chow et al., 1994
Protein kinase Chk [b] | Rat | Shann and Hsu, 2001
MDM2 oncogene [b] | Mouse | Perry et al., 2000, Saucedo et al., 1999
MXI1 tumor suppressor gene | Human, mouse | Schreiber-Agus et al., 1995, Wechsler et al., 1996
Dopamine-regulated phosphoprotein | Human | El-Rifai et al., 2002
Adenosine deaminase | Human | Kawakubo and Samuel, 2000
Caveolin-1 | Murine | Kogo and Fujimoto, 2000
Nitric-oxide synthase | Human | Wang et al., 1997
Gelsolin [c] | Human | Kwiatkowski et al., 1988
Serine:pyruvate aminotransferase [c] | Rat | Oda et al., 1990
Alanine:glyoxylate aminotransferase [c] | Rat, frog | Holbrook and Danpure, 2002
Phospholipid-hydroperoxide GTH peroxidase [c] | Rodent | Knopp et al., 1999, Pushpa-Rekha et al., 1995
Folylpoly-γ-glutamate synthetase [c] | Human | Freemantle et al., 1995, Turner et al., 2000
Porphobilinogen deaminase [d] | Human, mouse | Chretien et al., 1988, Porcher et al., 1991
Erythroid membrane protein 4.1 [d] | Human, mouse | Conboy et al., 1991, Huang et al., 1993
p120ctn catenin [c, d] | Human, mouse | Aho et al., 2002, Keirsebilck et al., 1998, Mo and Reynolds, 1996
Carbonic anhydrase VI | Mouse | Sok et al., 1999
Water channel aquaporin 4 | Human, mouse | Lu et al., 1996, Zelenin et al., 2000
Sterol carrier protein 2 | Rat liver | Seedorf and Assmann, 1991
β1,4-galactosyltransferase [e] | Mouse | Harduin-Lepers et al., 1993
Calmodulin-kinase IV/calspermin | Rodent | Sun et al., 1995

[a] Production of long and short protein isoforms via this mechanism is seen also with genes from insects (Mével-Ninio et al., 1996), plants (Cunillera et al., 1997, Wimmer et al., 1997), yeast (Beltzer et al., 1988, Carlson et al., 1983, Chatton et al., 1988, Ellis et al., 1989, Gammie et al., 1999, Natsoulis et al., 1986, Wolfe et al., 1996) and viruses (Barbosa and Wettstein, 1988, Lambert et al., 1987, Liu and Roizman, 1991, Liu and Biegalke, 2002, Weimer et al., 1987, Welch et al., 1991, Wu et al., 1993b, Zheng et al., 1994).

[b] In these cases, the long and short protein isoforms have different functional effects. Other genes that resemble this pattern, producing long and short isoforms with contrasting functions, are not listed in the table because the AUGSTART codon for the shorter protein is carried on an alternative exon present only in the shorter mRNA (e.g. Koski et al., 1999, Molina et al., 1993). That arrangement does not illustrate the main point of the table, which is that a silent internal AUG codon in the longer mRNA can be activated simply by truncating the transcript.

[c] The long and short isoforms are targeted to different cellular compartments.

[d] The long and short isoforms are expressed in different tissues.

[e] The long and short forms of β1,4-galactosyltransferase appear to function identically. The main significance of the promoter switch, which eliminates the first AUGSTART codon, is that the shorter 5′ UTR supports translation more efficiently (Charron et al., 1998).

The entries in Table 2 and some other examples (Aichem and Mutzel, 2001, Beuret et al., 1999, Falvey et al., 1995, Nagpal et al., 1992) are what I call natural tests of the position rule. Additional evidence comes from experimental manipulations wherein removal of the first AUG was shown to activate initiation from a downstream site (Cahana et al., 2001, Chenik et al., 1995, Tailor et al., 2001, Thoma et al., 2001).
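The position rule can be illustrated with a minimal Python sketch. It assumes a purely linear scan in which the 5′-proximal AUG is always used, and it ignores context, reinitiation and secondary structure. The function name `first_aug` and the short transcript are hypothetical, invented for illustration only; they do not correspond to any gene in Table 2.

```python
from typing import Optional


def first_aug(mrna: str) -> Optional[int]:
    """Return the 0-based index of the 5'-proximal AUG, or None if there is none."""
    seq = mrna.upper().replace("T", "U")  # accept DNA-style input as well
    idx = seq.find("AUG")
    return idx if idx >= 0 else None


# Hypothetical long transcript with two in-frame AUG codons (indices 6 and 18).
long_mrna = "GGCACCAUGGCUCCAGGAAUGGCUUAA"

# Hypothetical shorter transcript, as if made from a downstream promoter:
# the first 9 nt, including the upstream AUG, are missing.
short_mrna = long_mrna[9:]

upstream_site = first_aug(long_mrna)    # upstream AUG is used in the long form
activated_site = first_aug(short_mrna)  # internal AUG is now the first AUG
```

In the long transcript the internal AUG is silent under this simplified rule; truncation alone makes it the first AUG, which is the situation the table is meant to illustrate.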

Inhibition by upstream AUG codons

The scanning mechanism predicts that the 5′ untranslated region (5′ UTR, an unfortunate misnomer) is actually traversed by ribosomes. This explains why translation of the major coding domain is reduced when adventitious out-of-frame AUG codons occur upstream. The upstream AUG codons often create small ORFs (upORFs) which are indeed translated, as shown by detecting the encoded peptide (Hackett et al., 1986, Raney et al., 2000, Wang and Wessler, 2001) or by fusing a reporter gene to the upORF (Abastado et al., 1991, Donzé et al., 1995, Liu et al., 1999, Steel et al., 1996, Tanaka et al., 2001, Xu et al., 2001). The fusion test is the more reliable, as small peptides are usually degraded rapidly. Even if upstream AUG codons are arranged in a way that allows reinitiation, there is a penalty because reinitiation is usually inefficient. This topic will be discussed at length in Section 4.
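As a rough aid to the distinctions drawn here, the following Python sketch lists each upstream AUG in a transcript and reports whether its ORF terminates before the main start codon (so that reinitiation is at least conceivable), overlaps the main ORF, or is in frame and simply extends it. The function name `upstream_augs`, the category labels and the example sequences are hypothetical; the sketch says nothing about how efficiently any of these sites would be used.

```python
STOP_CODONS = {"UAA", "UAG", "UGA"}


def upstream_augs(mrna, main_start):
    """Classify every AUG located 5' of main_start (0-based index of the main AUG).

    Returns a list of (index, kind) tuples, where kind is one of
    'terminated_upstream', 'in_frame_extension' or 'overlapping'.
    """
    seq = mrna.upper().replace("T", "U")
    found = []
    for i in range(main_start):
        if seq[i:i + 3] != "AUG":
            continue
        # Walk codon by codon in the frame of this upstream AUG until a stop codon.
        stop_end = None
        for j in range(i + 3, len(seq) - 2, 3):
            if seq[j:j + 3] in STOP_CODONS:
                stop_end = j + 3
                break
        if stop_end is not None and stop_end <= main_start:
            kind = "terminated_upstream"   # small upORF ends before the main AUG
        elif (main_start - i) % 3 == 0:
            kind = "in_frame_extension"    # reads into the main ORF in frame
        else:
            kind = "overlapping"           # out of frame and runs past the main AUG
        found.append((i, kind))
    return found


# Hypothetical leaders, each followed by a hypothetical main ORF.
leader_a = "GG" + "AUG" + "CCU" + "UAA" + "GCGCCACC"   # upORF: AUG CCU UAA
terminated = upstream_augs(leader_a + "AUGGCUUGCUAA", len(leader_a))

leader_b = "GG" + "AUG" + "GCGCCACC"                    # no stop before the main AUG
overlapping = upstream_augs(leader_b + "AUGGUGACUUAA", len(leader_b))

leader_c = "AUG" + "GCC"                                # in frame with the main AUG
extension = upstream_augs(leader_c + "AUGGCUUAA", len(leader_c))
```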

Inhibition by secondary structure

The hypothesis that the 5′ UTR is traversed by ribosomes explains why a highly structured 5′ leader sequence is so detrimental to translation. Vertebrate mRNAs characteristically have long, GC-rich – hence highly structured – leader sequences (Kozak, 1991a, Macleod et al., 1998), and the resulting difficulty in translation has been discovered over and over in the course of cloning. Even a short GC-rich 5′ UTR can inhibit profoundly, as illustrated in cases where a gene produces a mixture of mRNAs with different leader sequences, and the worst-translated mRNA was found to be the form with the shortest 5′ UTR (Jiang and Lucy, 2001, Yang et al., 1998). A stem-and-loop structure, stabilized in some cases by a repressor protein, is most inhibitory when its proximity to the 5′ end blocks ribosome binding (Goossen and Hentze, 1992, Kozak, 1989, Wang and Wessler, 2001). If the structure is far enough from the 5′ end to allow ribosome entry, the advancing 40S ribosome/factor complex apparently has some ability to disrupt base pairing, but this ability is notably less than that of 80S elongating ribosomes (Kozak, 1986a, Kozak, 2001b, Lingelbach and Dobberstein, 1988, Paraskeva et al., 1999) and is curtailed in yeast (Koloteva et al., 1997). While there are mechanisms for reducing the inhibitory effects of upstream AUG codons, as discussed below, no mechanism has yet been defined for modulating the inhibitory effects of secondary structure. Some studies suggest that secondary structure might be less inhibitory to translation in vivo than in vitro (Charron et al., 1998, Curnow et al., 1995, Hensold et al., 1997, Hoover et al., 1997, Morrish and Rumsby, 2001, Van der Velden et al., 2002). This could be due to production of an alternative transcript that simply eliminates the secondary structure – a reasonable possibility given that GC-rich domains often harbor promoter elements – or to modification of the translation machinery. 
Interpretation of in vivo tests of translation could also be complicated by effects of secondary structure on mRNA stability (Stefanovic et al., 1999). Whether and how translation of GC-rich leader sequences might improve in exponentially growing cells (Nielsen et al., 1995) remains an important open question.

Non-AUG start codons

The scanning mechanism rationalizes the occurrence of initiation at upstream ACG or CUG codons in some mRNAs. These alternative codons are usually too weak to actually substitute for the AUGSTART codon (reviewed by Kozak, 1999; for some exceptions see Falvey et al., 1995, Kiefer et al., 1994, Riechmann et al., 1999, Sadler et al., 1999). It is not uncommon, however, for initiation to occur at an upstream non-AUG codon in addition to the first AUG (see leaky scanning in Section 3). This is observed frequently with cellular genes that have highly structured, GC-rich leader sequences (Kozak, 1991b), perhaps because secondary structure slows scanning and thus allows more time for the mismatched codon to pair with Met-tRNAi. With some viruses, the extra protein isoform initiated from an upstream non-AUG codon serves an essential function (Muralidhar et al., 1994, Portis et al., 1994, Portis et al., 1996). While the N-terminally-extended isoforms derived from some cellular genes also display distinct functions (Arnaud et al., 1999, Calkhoven et al., 2000, Spotts et al., 1997) or patterns of localization (Acland et al., 1990, Lock et al., 1991, Packham et al., 1997), it would not be surprising if some other upstream-initiated proteins turn out to be inadvertent byproducts generated in the course of slowly traversing a GC-rich leader sequence.

Introduction to harder cases

Because the ability of the scanning mechanism to explain the big picture is generally accepted, the remainder of this review directs attention, not to examples that can be seen readily to support the model, but to mRNAs that seem to be poorly designed for a scanning mode of initiation. The main point is that the scanning mechanism applies even in these difficult cases. Understandably, such mRNAs are translated inefficiently and this brings out a second important point: some critical regulatory genes require protein synthesis to be inefficient. An earlier review raised awareness that genes that encode potent regulatory proteins – cytokines, growth factors, kinases, transcription factors – often produce mRNAs in which the 5′ leader sequence is GC-rich or burdened by upstream AUG codons (Kozak, 1991a). Some examples described herein validate the prediction that these encumbered 5′ sequences are nature's way of limiting the synthesis of potent proteins that would be harmful if overproduced. I also suggested in earlier reviews that, when a cDNA sequence has so many upstream AUG codons as to challenge the applicability of the scanning mechanism, it is wise to ask whether the cDNA correctly reflects the structure of the mRNA. That advice is not changed by what is written here. Very often, cDNA sequences that appear incompatible with scanning have been found to derive from incompletely spliced transcripts or to have been misinterpreted in other ways (Kozak, 1996, Kozak, 2000). In other cases, although an encumbered cDNA sequence is correct, it derives from a transcript that does not support translation (Hake and Hecht, 1993, Han et al., 2002, Larsen et al., 2002, Lee et al., 2000). Only after one is certain of the mRNA structure should the mechanisms below be considered.

Context-dependent leaky scanning

Definition, conventional examples, exceptions

In mammals, the optimal context for recognition of the AUGSTART codon is GCCRCCaugG. Within this motif, the purine (R) in position −3 is the most highly conserved (see Section 6.1) and functionally the most important position. The importance of A or G (A is somewhat better than G) in position −3 was proved by mutagenesis experiments on a wide variety of genes (Kozak, 1986b; and see entries marked ‘tested’ in Table 3). The G in position +4 is also highly conserved and, especially in the absence of A in position −3, contributes strongly (Kozak, 1997). Adherence to the rest of the GCCRCCaugG motif varies, without major consequences as long as positions −3 and +4 conform; the upstream GCC motif can be seen to contribute, however, in the absence of other elements (Kozak, 1987b).

Table 3. Partial list of cellular and viral mRNAs that produce two separately-initiated proteins by context-dependent leaky scanning (a)

(a) Some additional examples of leaky scanning are described in Fig. 1, Fig. 2 and in the text.
Fig. 2

Examples of minimally leaky scanning in which a strong, but not quite perfect, context at AUG#1 causes most ribosomes to initiate there while allowing a low level of initiation downstream. With the depicted viral mRNAs (A,B), the predominant product of translation is the capsid protein initiated from AUG#1. Low-level leaky scanning generates a small but adequate amount of the indicated second protein. With bovine coronavirus, a mutation in position +4 (U→G, indicated in red) flanking AUG#1 strongly reduced translation from the downstream site (Senanayake and Brian, 1997), supporting the interpretation that the natural mRNA is slightly leaky because the context flanking AUG#1 is not a perfect match to the consensus sequence. With hepatitis B virus, ribosomes en route to the P start site (AUG#5) apparently bypass the weak AUG#2 by leaky scanning, while translation of the small ORF initiated at AUG#3 enables some ribosomes to miss the inhibitory AUG#4 (inhibitory because it resides in a strong context and overlaps the P ORF) and thus to reach AUG#5. Whereas the core protein start codon (AUG#1) here depicted resides in a context which allows a low level of leaky scanning, a slightly longer mRNA which encodes the pre-core protein has a stronger start codon (A in position −3) and polymerase cannot be translated from that form of mRNA (Fouillot and Rossignol, 1996). The publications on which the scheme shown here is based (Fouillot et al., 1993, Hwang and Su, 1998) also discuss some alternative possibilities vis-à-vis translation of polymerase. (C) The first AUG codon in rat histone H4 mRNA initiates translation of the full-length protein. The second AUG, 85 codons downstream and in the same reading frame, initiates production of a peptide which has growth-regulatory properties (Bab et al., 1999). (D) With rat A2AR adenosine receptor mRNA, an overlapping upORF that initiates at an AUG codon in a strong context is used to minimize production of A2AR protein. 
The overlapping arrangement precludes reinitiation but the not-quite-perfect context at the upstream start site allows low-level leaky scanning. This interpretation is supported by the observed ten-fold increase in translation of A2AR in vivo when the start codon of the upORF was eliminated (Lee et al., 1999). Via a second promoter, the rat A2A-R gene produces some transcripts with additional upORFs, but no transcript has yet been found that lacks the inhibitory upORF discussed here. Here and in Fig. 3, the major coding domain is shaded gray. Small regulatory ORFs (blue rectangles) are not drawn to scale.

In all mRNAs here listed, the sequence flanking the first start codon deviates from the consensus sequence in position −3 and/or position +4, highlighted by underlining. When the postulated link between context and leaky scanning was tested (so marked in this column), mutations that improved the context at the first start site diminished access to the downstream start site. This test failed only with cucumber necrosis virus, where the short distance between the m7G cap and AUG#1 allowed some leaky scanning even when the context was optimized. In some cases the first and second AUG codons are in the same reading frame, generating long and short versions of the encoded protein which may function differently. In cases where the first and second start codons are in different reading frames, indicated by italicizing the second product, the extent of overlap between the two ORFs ranges from a few codons (peanut clump virus, southern bean mosaic virus) to 626 codons (turnip yellow mosaic virus). Access to the downstream initiation site via leaky scanning is augmented by a reinitiation shunt, as explained in the text (Section 4.3) and diagrammed in Fig. 1 for C/EBPβ mRNA. Mutations that eliminate AUG#1 usually increase production of the second, downstream protein. In rare cases where the expected increase was not seen (e.g. von Hippel-Lindau, turnip yellow mosaic virus), it might be because translation of the second protein was restricted at the level of elongation. For a similar reason, improving the context around AUG#1 occasionally fails to elevate production of the protein there initiated (Fajardo and Shatkin, 1990). These entries nevertheless satisfy the main prediction of the leaky scanning mechanism, which is that improving the context around AUG#1 prevents initiation from the second, downstream site (Fajardo and Shatkin, 1990, Iliopoulos et al., 1998). 
Whereas feline leukemia virus produces an N-terminally-extended, glycosylated form of Gag (gp80gag) from the indicated weak AUG codon, the corresponding upstream start site in murine leukemia virus is ACCCUGG (Portis et al., 1994). When that site was experimentally ablated, however, revertants expressed the extended protein from a weak upstream AUG codon (UUUaugG) created by a point mutation. Those revertants were selected because the extra glycosylated form of Gag contributes to viral spread (Portis et al., 1996).

In the mRNAs from baboon reovirus, influenza A virus, and southern bean mosaic virus, the indicated proteins derive from the first (weak) and fourth AUG codons. AUG#2 and AUG#3 initiate small ORFs that terminate before AUG#4. Thus, a combination of leaky scanning and reinitiation probably mediates access to the downstream start site.

The aforementioned mutagenesis experiments define two extremes: (i) when the first AUG codon occurs in a strong context – ANNaugN or GNNaugG – all or almost all ribosomes stop and initiate at that point; (ii) when the first AUG resides in a very weak context, lacking both R in position −3 and G in position +4, some ribosomes initiate at that point but most continue scanning and initiate farther downstream. This leaky scanning enables the production of two separately initiated proteins from one mRNA, as documented below. It is harder to predict what happens at start sites that fall between the extremes, i.e. mRNAs in which the first potential start codon has the sequence YNNaugG, GNNaugY or GNNaugA. Leaky scanning is seen in some but not all such cases. A possible explanation suggested by studies with test transcripts (Kozak, 1990a) is that initiation might be restricted to the first AUG codon, despite a suboptimal context, when downstream secondary structure slows scanning and thus provides more time for codon/anticodon pairing.
Suppression of leaky scanning via this mechanism requires a critical distance (13–15 nt, which corresponds to half the diameter of the ribosome) between the AUG codon and the downstream structured element. Table 3 lists some examples in which two proteins are produced from one mRNA via leaky scanning. The postulated link between context and leaky scanning has been tested in many of these cases by showing that, upon improving the context at the upstream site, initiation from the second site is reduced or abolished. (Whether the second AUG codon resides in a strong or weak context is not relevant; the ribosome reads the mRNA linearly and thus the decision to stop or to bypass the first AUG is not influenced by whether there is a better initiation site downstream.) The large number of genes that employ leaky scanning precludes discussion of the biological significance of the proteins thereby produced, but it merits noting that, for many of the viruses in Table 3, replication requires production of both listed proteins. For some other viruses, the second protein is a virulence factor that weakens host defenses (Bridgen et al., 2001, Chen et al., 2001, Weber et al., 2002). The biological importance of these downstream-initiated proteins shows that leaky scanning is a deliberately employed tool; it does not simply reflect sloppiness on the part of the translational machinery. The long list of examples in Table 3 conforms to expectations in that the first start codon resides in a suboptimal context. There are, however, rare instances of leaky scanning despite a good context (R−3 and G+4) around the first AUG. 
This can happen when the first AUG codon is too close to the 5′ end to be recognized efficiently (Kaneda et al., 2000, Kozak, 1991c, Ruan et al., 1994, Sedman et al., 1990, Slusher et al., 1991, Spiropoulou and Nichol, 1993, Werten et al., 1999) or when the facilitating effect of G in position +4 is canceled by U in position +5 (Kozak, 1997, Sloan et al., 1999, Stallmeyer et al., 1999). Other occasional claims of leaky scanning despite a strong context at the first AUG codon were simply mistaken (Scherer et al., 1995); the shorter protein turned out to be translated from a second form of mRNA (Kogo and Fujimoto, 2000). At the opposite extreme, there are rare mammalian mRNAs in which, despite a very unfavorable context flanking the first AUG codon, translation appears to initiate exclusively at that site (Arai et al., 1991, Hickey and Roth, 1993, Leslie et al., 1992, McNeil et al., 1992, Plowman et al., 1990, Wu et al., 1993a). Leaky scanning might be suppressed in these few cases because of downstream secondary structure, or because the wider context (C in positions −1, −2, −4, −5; G in position −6) compensates to some extent for the absence of R−3 and G+4, or for other unknown reasons.

The same principle that allows initiation from the first and second AUG codons when the first AUG is in a suboptimal context (Table 3) applies in cases where translation initiates at an upstream non-AUG codon in addition to the first AUG (Acland et al., 1990, Arnaud et al., 1999, Carroll and Derse, 1993, Fajardo et al., 1993, Florkiewicz and Sommer, 1989, Fütterer et al., 1996, Fuxe et al., 2000, Lock et al., 1991, Muralidhar et al., 1994, Packham et al., 1997, Saris et al., 1991, Spotts et al., 1997). Recognition of an upstream ACG, CUG or GUG start codon requires a strong context (Portis et al., 1994), despite which scanning is usually leaky because the initiator codon itself is weak.

Production of long and short protein isoforms via leaky scanning is harder to regulate – e.g. to achieve tissue specific expression of one or the other form – than when a unique mRNA encodes each isoform, as in Table 2. There are hints, however, that dual initiation via leaky scanning might be regulable (Probst-Kepper et al., 2001, Spotts et al., 1997). This could conceivably be accomplished via proteins that stabilize downstream secondary structure or, perhaps, via a combination of leaky scanning and regulated reinitiation, if the mRNA also has small upORFs.

Among the examples in Table 3 are many plant viruses, indicating that the basic context rules extend to plant systems. Mutagenesis experiments confirm the functional importance of R in position −3 and G in position +4 in plant mRNAs (Jones et al., 1988, Lukaszewicz et al., 2000) and surveys of plant cDNA sequences confirm the conservation of those key positions (Pesole et al., 2000, Rogozin et al., 2001). Unlike mammalian mRNAs, however, plant mRNAs do not show a predominance of C in positions −1, −2, −4 and −5.

The foregoing discussion pertains to mRNAs from plants and vertebrate animals. There is some evidence for context-dependent leaky scanning in fungi (Arst and Sheerins, 1996), but context effects on initiation have not yet been studied carefully in protozoa, insects, and various other systems. The observation that trans-splicing of mRNAs in C. elegans sometimes brings a purine into position −3 is interesting (Hough et al., 1999) but the significance awaits testing. With a number of yeast genes, there is a hint of leaky scanning when the usual A in position −3 is replaced by a pyrimidine (Gaba et al., 2001, Slusher et al., 1991, Vilela et al., 1998, Welch and Jacobson, 1999, Wolfe et al., 1994). Context effects were not evident, however, in other studies of translation in yeast (Cigan et al., 1988b). For whatever reason, leaky scanning is rare in yeast, apart from a few cases attributable to the first AUG codon residing too close to the 5′ end.
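The two extremes and the in-between cases defined in this section can be written down as a small classifier. The Python sketch below looks only at positions −3 and +4 (with the A of the AUG counted as +1) and returns 'strong' for ANNaugN or GNNaugG, 'weak' when both R−3 and G+4 are missing, and 'intermediate' for YNNaugG, GNNaugY and GNNaugA. The function name and the labels are hypothetical. The sketch deliberately ignores everything else mentioned in the text: proximity to the 5′ end, U in position +5, downstream secondary structure and the wider GCC context.

```python
def classify_context(mrna: str, aug_index: int) -> str:
    """Classify the context of the AUG at aug_index from positions -3 and +4 only."""
    seq = mrna.upper().replace("T", "U")
    if seq[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG codon at the given index")
    if aug_index < 3 or aug_index + 3 >= len(seq):
        raise ValueError("sequence must include positions -3 and +4")

    minus3 = seq[aug_index - 3]  # position -3 (there is no position 0)
    plus4 = seq[aug_index + 3]   # position +4, immediately after the AUG

    if minus3 == "A" or (minus3 == "G" and plus4 == "G"):
        return "strong"          # ANNaugN or GNNaugG
    if minus3 in "CU" and plus4 != "G":
        return "weak"            # lacks both R at -3 and G at +4
    return "intermediate"        # YNNaugG, GNNaugY or GNNaugA


optimal = classify_context("GCCACCAUGG", 6)    # the GCCRCCaugG consensus with R = A
cebp_first = classify_context("UUCAUGC", 3)    # UUCaugC, cited for C/EBPβ
in_between = classify_context("GCCAUGA", 3)    # a GNNaugA example
```

A label of 'intermediate' should be read as "outcome not predictable from these two positions alone", in keeping with the observation that leaky scanning is seen in some but not all such cases.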

Pushing the limits: maximally leaky mRNAs

mRNAs that initiate translation from three sites provide a striking illustration of how far leaky scanning, alone or in combination with reinitiation, can be pushed. Fig. 1 shows some examples.

Fig. 1

Examples of ‘maximally leaky’ scanning wherein one mRNA produces three independently initiated proteins. Major (thick arrow) and minor (thin arrow) translation products are identified below their respective start codons. Sequences that cause the initiation site to be weak, and thus promote leaky scanning, are highlighted in red. Offset rectangles represent ORFs in different reading frames. (A) With c-myc mRNA, a leaky scanning mechanism was inferred from experiments in which optimizing the context around the first AUG codon suppressed production of the 50 kDa isoform, while changing the upstream CUG codon to AUG suppressed production of both the 65 and 50 kDa isoforms (Spotts et al., 1997). Access to the downstream start site might be more complicated than here depicted, as there is a small out-of-frame ORF between the 65 and 50 kDa start sites. (B) With C/EBPβ mRNA, a mutation that strengthens the first start codon (UUCaugC→ACCaugG) blocked production of all shorter isoforms, implicating a leaky scanning mechanism (Calkhoven et al., 2000). A small upORF (blue) superimposes another level of control, causing more ribosomes to bypass the start site for isoform B1 than would be expected from leaky scanning alone. Presumably because the AUGSTART codon for isoform B1 is positioned close to the termination site of the upORF, reinitiation at site B1 is inefficient and some ribosomes thus reach the far downstream start site for the 20 kDa isoform (LIP). As evidence for this reinitiation shunt, Calkhoven et al. (2000) showed that eliminating the AUG codon of the upORF abolished production of LIP and that strengthening or weakening the context around the upORF start codon caused corresponding changes in the yield of LIP.
Although the smallest form of C/EBPβ can be generated in some situations by proteolysis (Dearth et al., 2001), the effects of the aforementioned mutations clearly implicate a translational mechanism. The LAP/LIP ratio shows tissue and stage specific variation (Dearth et al., 2001, Descombes and Schibler, 1991). (C) Whereas leaky scanning allows initiation at multiple sites within a single ORF in C/EBPβ and c-myc mRNAs, leaky scanning allows translation of three separate ORFs in the pregenomic mRNA of rice tungro bacilliform virus. These ORFs (not drawn to scale) have overlapping start and stop codons of the form AUGA. Translation via leaky scanning was inferred from the strong reduction (>13-fold) in translation of ORF2 and ORF3 when the start codon of ORF1 was changed from AUU to AUG (Fütterer et al., 1997) and from the inhibitory effect on expression of ORF3 when an adventitious AUG codon was inserted into ORF2. The 5′ leader sequence that precedes ORF1 has ten small upORFs which are not depicted here because that peculiar leader sequence, postulated to be translated by ribosome hopping (Fütterer et al., 1996), is not required for the leaky scanning mechanism that underlies translation of ORFs 1, 2 and 3. (D) The avian reovirus S1 mRNA supports translation of one structural and two nonstructural proteins (Bodelón et al., 2001). The depicted mechanism postulates that ORF1 has a dual function, encoding its own polypeptide (p10) and facilitating translation of ORF3 by shunting some ribosomes past the strong AUGSTART codon for ORF2. The absence of extraneous AUG codons in the 310 nt region between the end of ORF1 and the start of ORF3 is consistent with the idea that ORF3 might be translated by reinitiation. Some ribosomes would be expected to translate p17 (ORF2) by leaky scanning, engendered by the poor context at the start of ORF1. 
Improving the context at the start of ORF1 indeed increased production of p10 (Shmulevitz et al., 2002); unfortunately, the yield of p17, which would be expected to decrease, was not monitored. The observation that strengthening the context at the start of ORF1 had no effect on the yield of σC is not surprising because the reinitiation mechanism postulated to underlie translation of ORF3 would probably be limited by other features, such as the relatively large size of ORF1.

The predominant translation product obtained from c-myc mRNA is a 65 kDa ‘long form 2’ which initiates at the first AUG codon (Fig. 1A). A small amount of a longer isoform (68 kDa) derives from an upstream CUG codon which is a weak start site (i.e. very leaky) because the codon is not AUG. Although the first AUG codon has the required A in position −3, a small percentage of ribosomes bypass that site and initiate at the next AUG, producing a third (50 kDa) form of c-myc. This happens apparently because the context flanking the first AUG codon is good but not perfect. Thus, production of the 50 kDa isoform was eliminated when the upstream site was changed from ACGaugC to ACCaugG (Spotts et al., 1997). Fig. 1B depicts another example in which ribosomes initiate from three in-frame AUG codons. With the mRNA that encodes C/EBPβ, access to the far downstream site via leaky scanning is augmented by a reinitiation shunt, as explained in the legend to Fig. 1 and discussed further in Section 4. Translation of C/EBPα mRNA occurs by a mechanism similar to that depicted for C/EBPβ except that the first start site in C/EBPα is a CUG codon (Calkhoven et al., 2000), which generates a smaller amount of the longest protein (isoform A) than does the AUG codon in C/EBPβ. With c-myc, C/EBPβ and C/EBPα mRNAs, leaky scanning is biologically important because the long and short versions of the protein have opposing effects as regulators of transcription.
It is striking that leaky scanning can operate even when the second initiation site resides far downstream from the first. With synthetic transcripts designed to test the processivity of scanning, there was no reduction in initiation from the downstream site when the inter-AUG distance was expanded stepwise from 11 to 251 nt (Kozak, 1998). In some remarkable viral mRNAs, the second functional initiation site is more than 500 nt downstream from the first AUG (Herzog et al., 1995, Sivakumaran and Hacker, 1998). The pregenomic mRNA of rice tungro bacilliform virus (Fig. 1C) provides the most dramatic illustration of these points. Use of a weak (non-AUG) codon to initiate ORF1 and an unfavorable context at the start of ORF2 (UACaugA) enables the majority of ribosomes to reach and initiate at the start of ORF3. The ORF3 polyprotein is thought to be a precursor from which coat protein, protease and reverse transcriptase are derived by proteolysis. The remarkable absence of AUG codons from the long (563 nt) coding domain of ORF1 and the presence of but one weak AUG codon within ORF2 underscore how carefully this mRNA is constructed to support translation via scanning. The careful construction includes minimizing the overlap between adjacent cistrons. Without that precaution, elongational occlusion might work against utilization of a far downstream start codon, as documented in other cases (Kozak, 1995). The avian reovirus RNA diagrammed in Fig. 1D offers another example of initiation from three sites in one mRNA. Additional experiments are needed to validate the postulated mechanism.
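The arithmetic behind initiation at three successive sites can be sketched with a toy model. It assumes strictly linear scanning in which each start site captures a fixed fraction of the ribosomes that reach it, independent of what lies downstream, and it leaves out reinitiation, elongational occlusion and any effect of distance. The function name `initiation_fractions` and the capture values are hypothetical illustrations, not measurements for any mRNA discussed here.

```python
def initiation_fractions(capture_probs):
    """Share of scanning ribosomes that initiates at each successive start site.

    capture_probs[i] is the assumed probability that a ribosome reaching site i
    initiates there rather than scanning past it. Returns (fractions, escaped),
    where escaped is the share that bypasses every listed site.
    """
    remaining = 1.0
    fractions = []
    for p in capture_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("capture probabilities must lie between 0 and 1")
        fractions.append(remaining * p)   # ribosomes stopping at this site
        remaining *= 1.0 - p              # ribosomes that continue scanning
    return fractions, remaining


# Illustrative values only: two weak upstream sites followed by a strong one.
fractions, escaped = initiation_fractions([0.1, 0.2, 0.9])
# fractions is approximately [0.10, 0.18, 0.648]; escaped is approximately 0.072
```

With two weak sites upstream, most ribosomes in this toy calculation still reach the third site, which mirrors qualitatively the arrangement described for a weak non-AUG codon at ORF1 and an unfavorable context at ORF2.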

Pushing the limits: minimally leaky mRNAs

In contrast with the ‘maximally leaky’ mRNAs in Fig. 1, the mRNAs in Fig. 2 are minimally leaky: only a small fraction of ribosomes bypass the first AUG codon and initiate downstream. Here the leaky scanning mechanism has been pushed to the limits in the sense that there is (low-level) initiation from a second site despite the presence of a strong context around the first AUG codon. The explanation is that the context flanking the first AUGSTART codon is good but not perfect. The resulting low-level leaky scanning enables the viruses depicted in Fig. 2A,B to produce two proteins – one abundant, the second in small amounts – from a single mRNA. Experimental manipulations that support this interpretation are summarized in the legend to Fig. 2. A few other viral genes that might fit this category have been described (Chenik et al., 1995, Jayakar and Whitt, 2002). The hepatitis B virus example is noteworthy because, via the Rube Goldberg mechanism diagrammed in Fig. 2B, reverse transcriptase encoded by the P gene is initiated independently from a far downstream site, unlike most other reverse transcriptase genes which lack an independent start codon and therefore require frameshifting during translation of the preceding core gene.
Fig. 3

Small upstream ORFs in eukaryotic mRNAs function in various ways to modulate translation. Only the 5′ end of each mRNA is depicted. (A) The presence of upORFs forces translation of the major ORF to occur by a reinitiation mechanism, which is usually inefficient. The extent of inhibition depends on the number and arrangement of upORFs and whether the context flanking the upstream start codon(s) allows some escape via leaky scanning. (B) Because reinitiation can occur only in the forward direction, an overlapping upORF strongly impairs translation of the major ORF. (C) Whereas type B mRNAs have a single in-frame start codon which is bypassed due to the overlapping upORF, type C mRNAs initiate from two in-frame start codons; the upORF serves to divert some ribosomes to the downstream start site. The depicted sequence is a simplified representation of GlyRS mRNA (Mudge et al., 1998). Translation of Bag-1 mRNA can also be fitted to this pattern: the first start site is an in-frame CUG codon which produces the 50 kDa form of Bag-1; the next start site (AUG#1, out-of-frame) initiates a small upORF within which the first in-frame AUG codon (AUG#2) resides, and that AUG is thereby skipped; the 36 kDa form of Bag-1 is produced from AUG#3 which is accessed by reinitiation following translation of the small upORF (Packham et al., 1997). Some other mRNAs that use an upORF to dodge one AUG codon in favor of another are described elsewhere (Mittag et al., 1997, Sarrazin et al., 2000). Note that the reinitiation shunt as here defined adheres to the linear scanning mechanism, unlike a shunt postulated to operate with cauliflower mosaic virus mRNA (Ryabova et al., 2000). (D) The common feature of mRNAs that use mechanism D is inhibition of translation in cis by a peptide encoded within the upORF. The amino acid sequence of the inhibitory peptide is different in each case (Morris and Geballe, 2000). 
In the column at the far right, asterisks indicate examples in which the translational control mechanism is regulated, e.g. via a change in concentration of eIF2 (GCN4) or arginine (CPA1) or polyamines (AdoMetDC) or, more commonly, via an alternative promoter that generates a simpler form of mRNA devoid of upORFs (c-mos, MDM2, IL-12; see text for other examples).

Production of a second protein isoform via low-level leaky scanning is seen also with some cellular mRNAs. An interesting example is the production in rats of an osteogenic growth peptide (OGP) initiated from codon 85 in the histone H4 gene (Fig. 2C). The leaky scanning explanation was tested by showing that production of OGP increased upon deleting the upstream H4 start codon, and that production of OGP was suppressed upon changing the H4 start codon from a good (AGGAAGaugU) to a perfect (GCCACCaugG) context. A similar mechanism might operate with a few other cellular genes that produce a trace amount of a second protein isoform (Liu et al., 2000, Short and Pfarr, 2002; it is not clear whether leaky scanning or a change in splicing underlies the translational switch described by Land and Rouault, 1998). The fourth example in Fig. 2 differs from the others in that a good-but-not-perfect context at the first start site serves, not to enable production of two proteins, but simply to modulate the yield of A2A-R from the second AUG. Examination of A2A-R genes from various organisms shows conservation of the overlapping ORF, with the upstream AUG codon always in a context that allows only low-level leaky scanning (Lee et al., 1999). Conservation of the structure supports the interpretation that this is a device contrived to limit the production of A2A-R protein. Low-level leaky scanning caused by a not-quite-perfect context around the first AUG codon might occur with other mRNAs where it normally goes unnoticed because the downstream start site(s) are out-of-frame. Antigenic peptides recognized by cytotoxic T-lymphocytes (CTLs) might be produced in this way, as discussed in Section 5.4. A small degree of leaky scanning that normally goes unnoticed could become significant if a mutation that shifts the normal start codon out of frame moves a downstream AUG codon into the main reading frame. 
In some cases where low-level internal initiation was observed with such a mutated gene (e.g. Maser et al., 2001), the possibility that the downstream site is reached via a combination of leaky scanning and reinitiation – a mechanism such as that proposed for hepatitis B virus (Fig. 2B) – merits consideration.
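The comparison between the good (AGGAAGaugU) and perfect (GCCACCaugG) contexts can be sketched in a few lines of Python. The sketch below reads the nucleotides at positions −3 and +4, the two positions singled out in the examples above, and counts how many match the consensus (a purine at −3, G at +4). The function names and the two-position count are assumptions made for illustration; other positions of the consensus also contribute, so the count should not be read as a measure of initiation efficiency.

```python
from __future__ import annotations

PURINES = {"A", "G"}

def context_positions(site: str) -> tuple[str, str]:
    """Return the nucleotides at positions -3 and +4 of a start site.

    The site is written as in the text, with the start codon in lower case
    (e.g. 'GCCACCaugG'). The A of aug is +1; the nucleotide before it is -1.
    """
    i = site.find("aug")
    if i < 3 or i + 3 >= len(site):
        raise ValueError("need at least 3 nt upstream and 1 nt downstream of aug")
    return site[i - 3].upper(), site[i + 3].upper()

def consensus_matches(site: str) -> int:
    """Count matches to the consensus at -3 (purine) and +4 (G): 0, 1 or 2."""
    minus3, plus4 = context_positions(site)
    return int(minus3 in PURINES) + int(plus4 == "G")

# The two histone H4 contexts quoted in the text.
good = consensus_matches("AGGAAGaugU")     # A at -3, U at +4
perfect = consensus_matches("GCCACCaugG")  # A at -3, G at +4
```

With these inputs the good context matches at one of the two positions and the perfect context at both, which is consistent with the observation that improving the context suppressed production of OGP.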

Reinitiation

Definition and general rules

Reinitiation occurs with mRNAs, such as those depicted in Fig. 3, that have small ORFs near the 5′ end. Our rudimentary understanding of what happens following translation of the first upORF may be summarized as follows. When the 80S ribosome reaches the termination site of the upORF, the 60S ribosomal subunit is thought to be released (this has not actually been shown) while the 40S subunit remains bound to the mRNA, resumes scanning, and may initiate another round of translation at a downstream AUG codon. For the downstream reinitiation event to occur, the 40S subunit must reacquire Met-tRNAi and this appears to be an important point of control. Reacquisition of Met-tRNAi is promoted by lengthening the intercistronic domain (Abastado et al., 1991, Kozak, 1987c), which provides more time for Met-tRNAi to bind, or by increasing the concentration of eIF2 (Abastado et al., 1991, Hinnebusch, 1997). Genetic experiments also implicate eIF3 in the Met-tRNAi rebinding step (Garcia-Barrio et al., 1995). Another potential point of control is at the termination site of the upORF, where certain features – perhaps nearby secondary structure (Grant and Hinnebusch, 1994, Vilela et al., 1998) – might prevent the resumption of scanning or, in some other way, prevent reinitiation. This brief summary is based on studies carried out in yeast and mammals. Some studies of reinitiation in plants suggest that the intercistronic sequence may have effects beyond simply providing time for ribosomes to reacquire Met-tRNA (Wang and Wessler, 1998). 
Some results obtained in early experiments with mammalian vectors were interpreted as evidence that ribosomes can scan backwards and thus reinitiate at an AUG codon positioned upstream from the termination site (Peabody et al., 1986), but recent experiments contradict that view (Kozak, 2001b). Indeed many studies have shown that the strongest inhibition is caused by an upORF that overlaps the start of the downstream cistron (Babik et al., 1999, Bates et al., 1991, Byrne et al., 1995, Cao and Geballe, 1995, Ghilardi et al., 1998, Hansen et al., 2002, Kos et al., 2002, Lee et al., 1999, Liu et al., 1999), which would not be the case if ribosomes could move backwards to reinitiate. The size of the first ORF is a major limitation on reinitiation in eukaryotes: reinitiation can occur following the translation of a ‘minicistron’ (a small first ORF) but not following the translation of a full-length 5′ cistron. The long list of mRNAs that contain silent 3′ cistrons (Table 1) underscores the point. The only apparent exception occurs with cauliflower mosaic virus, where a protein encoded by the virus appears to promote reinitiation following the translation of a full-length first cistron (Park et al., 2001). The reason why reinitiation is usually restricted by the size of the first ORF is not known, but a possible explanation is that certain initiation factors dissociate from the ribosome only gradually during the course of elongation. If the elongation phase is brief – i.e. if the first ORF is a minicistron – the factors required for reinitiation would still be present when the 40S subunit resumes scanning. Although the postulated factors have not been identified, there is evidence for the idea that the duration of the elongation phase matters: when a short upORF which normally permits reinitiation was reconfigured to contain a pseudoknot that is known to slow elongation, reinitiation failed (Kozak, 2001b). That result makes it difficult to specify a cutoff size, i.e. 
one cannot say ‘an upORF this long will allow reinitiation’ while a longer ORF will not. The permissible size is likely to vary depending on features, such as secondary structure or codon usage, that affect the rate of elongation. As a rough guide, however, one may note that reinitiation often has been observed following translation of a ten to 12 codon upORF, and that reinitiation was substantially reduced, but not abolished, when a 13 codon upORF was lengthened to 33 codons (Kozak, 2001b). In a different study, reinitiation occurred following a 24 codon upORF but not when the ORF was lengthened to 40 codons (Luukkonen et al., 1995). Some naturally occurring upORFs that strongly inhibit translation, perhaps because their size precludes reinitiation, include a 36 codon upORF in mitochondrial uncoupling protein 2 mRNA (Pecqueur et al., 2001), a 71 codon upORF in polyoma virus JC mRNA (Shishido-Hara et al., 2000), and a 53 codon upORF in plant S-adenosylmethionine decarboxylase (AdoMetDC) mRNA (Hanfrey et al., in press). That the size of the upORF might be what limits translation of AdoMetDC is suggested from the five-fold increase in translation observed when the upORF was shortened from 53 to 25 codons, but that result could also be explained in other ways. (The suggested interpretation is not contradicted by the fact that an alternative upORF in AdoMetDC mRNA caused little inhibition even when lengthened to 66 codons; the alternative upORF initiates from an AUG codon in a weak context which would allow it to be bypassed to some extent by leaky scanning. The 53 codon upORF, in contrast, has a strong start codon.) With the mouse Snurf-Snrpn transcript, where the first cistron is 71 codons long (Gray et al., 1999), a very low level of reinitiation might account for translation of the downstream SNRPN cistron. 
A naturally occurring ATG-to-AGG mutation in the start codon of the upstream SNURF cistron was found to elevate translation of SNRPN >15-fold (Tsai et al., 2002), which implicates a scanning/reinitiation mechanism and rules out direct internal initiation.

Identifying candidate genes that might be regulated by reinitiation

From cDNA sequencing data, it is clear that many vertebrate mRNAs have small ORFs upstream from the start of the major coding domain, but an accurate count of genes in this class is difficult. The tallies that have been attempted (e.g. Pesole et al., 2001, Suzuki et al., 2000) are invariably flawed by inclusion of misinterpreted cDNA sequences, such as cDNAs in which a putative 5′ UTR with ‘upstream’ AUG codons turned out to be part of the coding domain or part of an intron that gets removed from the functional mRNA (Di Fruscio et al., 1998, Kozak, 1996, Kozak, 2000, Kubu et al., 2000, Nishitani et al., 2001, Santamarina-Fojo et al., 2000, Wagner et al., 1998). Some transcripts with long, AUG-burdened leader sequences are not associated with polysomes (Hake and Hecht, 1993, Sanchez-Góngora et al., 2000) or not able to support protein synthesis (Foo et al., 1994, Larsen et al., 2002, Lee et al., 2000), emphasizing that not all cDNAs correspond to functional mRNAs. A more fundamental complication vis-à-vis which genes to count is the propensity for a single gene to produce transcripts with different 5′ leader sequences, only some of which have upstream AUG codons (Anant et al., 2002, Aplan et al., 1991, Eerola et al., 2001, Huo and Scarpulla, 1999, Kawakubo and Samuel, 2000, Laurin et al., 2000, Perälä et al., 1994, Perrais et al., 2001, Sanchez-Góngora et al., 2000, Suva et al., 1989, Tanaka et al., 2001, Tsuda et al., 2000, Zimmermann et al., 2000). The significance of a particular form of RNA cannot always be deduced from its abundance, inasmuch as a minor transcript is sometimes the major functional mRNA (Andrea and Walsh, 1995, Babik et al., 1999, Ghilardi et al., 1998, Mitsuhashi and Nikodem, 1989, Nielsen et al., 1990) and incompletely processed transcripts are sometimes more abundant than the fully-spliced, translatable mRNA (Boularand et al., 1995, Frost et al., 2000, Xie et al., 1991, Zachar et al., 1987). 
Translational regulation mediated by small upORFs is important, as discussed below, but equally important are non-translational mechanisms – use of alternative promoters or splice sites – that simplify the 5′ UTR by eliminating upORFs in certain tissues or at certain times when elevated synthesis of the protein is required (Aizencang et al., 2000, Anusaksathien et al., 2001, Arrick et al., 1994, Babik et al., 1999, Brown et al., 1999, Horiuchi et al., 1990, Landers et al., 1997, Lee et al., 2000, Nonaka et al., 1989, Phelps et al., 1998, Ren and Stiles, 1994, Steel et al., 1996, Teruya et al., 1990). Because vertebrate mRNA leader sequences are often GC-rich (Section 2.4), secondary structure near the 5′ end might impair translation even more than the presence of upstream AUG codons. Thus, it is not surprising that eliminating upstream AUG codons does not improve translation in every case (Rao et al., 1988, Wood et al., 1996). In many cases, however, mutations targeted to the upstream AUG codons confirmed their role in restricting translation from downstream (Anant et al., 2002, Babik et al., 1999, Bates et al., 1991, Brown et al., 1999, Child et al., 1999a, Gereben et al., 2002, Ghilardi et al., 1998, Griffin et al., 2001, Harigai et al., 1996, Kos et al., 2002, Lee et al., 1999, Marth et al., 1988, Meijer et al., 2000, Pecqueur et al., 2001, Ren and Stiles, 1994, Steel et al., 1996, Tanaka et al., 2001, Tsai et al., 2002, Wang and Wessler, 1998, Wang and Rothnagel, 2001, Wera et al., 1995, Wu et al., 2002). This occurs by a variety of mechanisms, as summarized in Fig. 3 and discussed next.
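As a starting point for identifying candidates, a leader sequence can be scanned for upstream AUG codons and the ORFs they open. The following is a minimal sketch, not an established tool: the function name, the returned fields and the toy sequence are all hypothetical. It inspects sequence only, so it is subject to every caveat raised above (a putative 5′ UTR may turn out to be coding sequence or an unspliced intron, and the transcript may not be a functional mRNA).

```python
STOPS = {"UAA", "UAG", "UGA"}

def upstream_orfs(mrna: str, main_start: int) -> list:
    """List ORFs opened by AUG codons that begin upstream of the main AUG.

    main_start is the 0-based index of the A of the main start codon.
    Each entry records the upstream AUG position, the position of its
    in-frame terminator (None if none is found), the ORF length in sense
    codons, and whether the upstream AUG is in frame with the main AUG.
    """
    mrna = mrna.upper().replace("T", "U")  # accept cDNA (T) or RNA (U) input
    found = []
    for start in range(main_start):
        if mrna[start:start + 3] != "AUG":
            continue
        stop = None
        # Walk codon by codon from the upstream AUG to its first terminator.
        for pos in range(start + 3, len(mrna) - 2, 3):
            if mrna[pos:pos + 3] in STOPS:
                stop = pos
                break
        found.append({
            "start": start,
            "stop": stop,
            "codons": None if stop is None else (stop - start) // 3,
            "in_frame_with_main": (main_start - start) % 3 == 0,
        })
    return found

# Toy leader: a two-codon upORF (AUG GCU UAA) followed by the main AUG at index 18.
toy = "GG" + "AUGGCUUAA" + "GGCCACC" + "AUGGCUUGA"
orfs = upstream_orfs(toy, main_start=18)
```

An upORF whose terminator lies before the main AUG is a candidate for the reinitiation mechanism; one whose terminator lies beyond it overlaps the major coding domain.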

Yeast GCN4 as an example of regulation via small upstream ORFs

While the efficiency of reinitiation varies, there is almost always a penalty – demonstrable by showing an increase in translation when the upORF is deleted – and the penalty can be severe. Thus, the simplest function of small upORFs is to limit production of the protein encoded in the full-length ORF by making downstream translation dependent on an inefficient reinitiation mechanism (Fig. 3A). The best studied example is yeast GCN4, which initiates from the fifth AUG codon in the mRNA; the long leader sequence contains four small upORFs. In a series of classic experiments, Hinnebusch (1997) was able to reconstruct GCN4 regulation using only the first and fourth upORFs, and I will explain what happens in that simplified case. UpORF1 is always translated efficiently (it is the first AUG in the mRNA), after which ribosomes resume scanning and reinitiate, usually, at upORF4. UpORF4 is unusual in that its translation precludes further reinitiation events: thus, when upORF4 is translated, GCN4 is not. That is the situation in yeast cultures that have adequate nutrients. Starvation for amino acids, however, causes some ribosomes to bypass the inhibitory upORF4 and reinitiate instead farther downstream. This happens because starvation creates a pool of uncharged tRNAs which activate a protein kinase that phosphorylates, and thus partially inactivates, eIF2. When eIF2 levels fall, it takes longer for ribosomes to reacquire Met-tRNAi and thus become competent to reinitiate. The slower acquisition of competence means that some ribosomes, scanning in the reinitiation mode, will bypass the nearby upORF4 and can thus reach the downstream GCN4 start site. Three general lessons from the GCN4 story appear to carry over to mammals. (i) Fig. 3A lists some examples of mammalian mRNAs that are translated inefficiently due to small upORFs; many other examples were cited in Section 4.2. (ii) Experimental manipulations with C/EBPβ mRNA (Fig. 
1B) support the interpretation that an AUG codon which follows the upORF too closely is skipped (presumably because ribosomes have not yet reacquired Met-tRNAi), allowing reinitiation to occur farther downstream. The same mechanism might be invoked to explain how an internal start codon is accessed in miniTrpRS mRNA (Wakasugi et al., 2002) and baculovirus IE0 mRNA (Theilmann et al., 2001), and how c-myb gets translated from a rearranged transcript generated by retrovirus insertion (Jiang et al., 1997). In each of these mRNAs, the first AUG codon that follows a small upORF must be bypassed to reach the functional start codon downstream. (iii) The third lesson from GCN4 pertains to regulation of reinitiation by manipulation of eIF2 levels. Although hints of this have been described with mammalian genes that encode C/EBP transcription factors (Calkhoven et al., 2000), macrophage receptor protein CD36 (Griffin et al., 2001) and activating transcription factor 4 (Harding et al., 2000), the point requires much more careful study.
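The kinetic argument behind GCN4 regulation can be made concrete with a toy calculation. It assumes, purely for illustration, that a scanning 40S subunit reacquires Met-tRNAi at a constant rate per nucleotide scanned, so that the competent fraction after a given distance is 1 − exp(−rate × distance), and that the rate falls when eIF2 is partially inactivated. The function names, rates and distances are hypothetical and are not measurements from GCN4 mRNA.

```python
import math

def competent_fraction(rate_per_nt: float, distance_nt: float) -> float:
    """Fraction of scanning 40S subunits that have reacquired Met-tRNAi
    after scanning distance_nt, assuming a constant rate per nucleotide."""
    return 1.0 - math.exp(-rate_per_nt * distance_nt)

def gcn4_fraction(rate_per_nt: float, d_uporf1_to_uporf4: float,
                  d_uporf4_to_gcn4: float) -> float:
    """Fraction of ribosomes that reinitiate at the GCN4 start codon.

    Ribosomes that are competent on reaching upORF4 translate it and are
    lost (translation of upORF4 precludes further reinitiation). Only those
    still incompetent at upORF4 can go on to initiate at GCN4.
    """
    bypass_uporf4 = 1.0 - competent_fraction(rate_per_nt, d_uporf1_to_uporf4)
    return bypass_uporf4 * competent_fraction(rate_per_nt, d_uporf4_to_gcn4)

# Illustrative values only: a high rate for fed cells, a low rate for starved cells.
fed = gcn4_fraction(rate_per_nt=0.03, d_uporf1_to_uporf4=200, d_uporf4_to_gcn4=150)
starved = gcn4_fraction(rate_per_nt=0.005, d_uporf1_to_uporf4=200, d_uporf4_to_gcn4=150)
```

Under these assumptions the starved value exceeds the fed value, because slower reacquisition lets more ribosomes pass upORF4. The same function reproduces the effect of lengthening the intercistronic domain: a longer distance gives a larger competent fraction.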

Human thrombopoietin – an example of severely restricted translation

The mRNAs discussed in connection with Fig. 3A have upORFs that terminate before the start of the major coding domain, thus allowing (inefficient) translation of the main ORF by reinitiation. In Fig. 3B, however, the upORF overlaps the start of the major coding domain. This precludes reinitiation and profoundly reduces the translational yield. Limited access to the main ORF in some of these mRNAs might be achieved by leaky scanning, as was discussed for A2A-R (Fig. 2D). mRNAs derived from the human thrombopoietin (TPO) gene have structures similar to that depicted in Fig. 3B and much can be learned from the TPO story, as outlined in Fig. 4. The normal gene produces a mixture of transcripts with different leader sequences, all of which translate TPO poorly because of an overlapping upORF (upORF7 in Fig. 4). Targeted mutagenesis (Ghilardi et al., 1998) confirmed that upstream AUG#7 is primarily responsible for blocking translation of TPO. This is because its near-optimal context (GCCGCCUCCaugG) prevents leaky scanning and the overlapping arrangement precludes reinitiation.
Fig. 4

A low-level reinitiation mechanism normally prevents overproduction of TPO. Translational yields from various forms of TPO mRNA in transfected COS cells (far right column) are expressed relative to a control transcript that has a short, unencumbered 5′ UTR. P1 and P2 are alternative promoters; a cluster of arrows indicates that P2 produces staggered start sites. The TPO coding domain (horizontal black bar) begins at an AUG codon which is labeled #8 because, in the longest form of mRNA (line 1), it is preceded by seven AUG codons that initiate small upORFs. Superscript letters indicate whether each upstream AUG resides in a strong (S) or weak (W) context and horizontal blue lines depict the approximate length and arrangement of the upORFs. Vertical lines demarcate the boundaries of exons; carets depict the introns in alternatively spliced transcripts. Only the beginning of the TPO coding domain (exons 3–7) is shown. The key point is that the normal set of transcripts supports translation poorly because upORF7 overlaps the TPO start site. Various mutations (shown in red) that relieve this constraint elevate the translation of TPO, and this overproduction causes hereditary thrombocythemia. Among the normal set of mRNAs, the ‘rare’ transcript from promoter P1 (line 2) supports translation slightly better than the others, perhaps because the short distance between upORF2 and AUG#7 enables some reinitiating ribosomes to bypass AUG#7 and thus reach AUG#8. Because of the strong context at AUGs #1 and #2, upORFs 1 and 2 would be more effective than upORFs 5 and 6 in setting up this reinitiation shunt. The depicted scheme is based on experiments described by Ghilardi et al. (1998) and Wiestner et al. (1998). Additional mutations diagrammed near the bottom of the figure were described by Ghilardi and Skoda (1999), Ghilardi et al. (1999), and Kondo et al. (1998).

Various mutations that restructure the 5′ UTR in ways that increase production of TPO cause hereditary thrombocythemia. Translation of TPO normally initiates at AUG#8 in exon 3, but a splice-site mutation that causes deletion of exon 3 causes initiation to shift to a previously silent in-frame codon (AUG#9) in exon 4; this is diagrammed in the center of Fig. 4. The resulting truncated form of TPO lacks only the first four amino acids and appears to function normally (Wiestner et al., 1998). The problem – the cause of the pathology – is that the mutation greatly increases translation of TPO by removing the inhibitory upORF7. In two other families affected with hereditary thrombocythemia, production of TPO is elevated by mutations that restructure upORF7. In one case, deletion of a G residue shifts upORF7 into the same reading frame as TPO, thereby causing overproduction of an elongated form of TPO initiated from AUG#7 (Ghilardi and Skoda, 1999, Kondo et al., 1998). In the other case, a G→T mutation creates a terminator codon within upORF7 and this shortening of the ORF, which now terminates 31 nt before AUG#8, enables efficient reinitiation at AUG#8 (Ghilardi et al., 1999). These insightful studies of TPO expression make two important points: (i) the bulk of the transcripts produced by the wild type gene are virtually untranslatable; and (ii) it is necessary for this potent cytokine to be translated poorly; overproduction results in disease. With TPO as precedent, one suspects that in other cases where – despite the production of alternative leader sequences – it is hard to find even one form of mRNA devoid of upstream AUG codons (e.g. Larsen et al., 2002, Lee et al., 1999, Pecci et al., 2001, Peterson and Morris, 2000, Wang et al., 1999), the goal is to ensure that translation is very, very inefficient. 
The wig-1 growth-regulatory gene might be another example: an overlapping upORF initiates from a strong AUG codon, while the wig-1 start codon itself is weak, and these distinctive features are conserved between the human and mouse genes (Hellborg et al., 2001).
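The three arrangements of upORF7 described for TPO (overlapping and out of frame in the wild type, shifted into frame by the G deletion, and terminated upstream by the G→T mutation) differ only in the position of the upORF relative to the main start codon. A small classifier makes the distinction explicit. The function name, the labels and the coordinates are hypothetical; only the 31 nt gap in the last case is taken from the text.

```python
from typing import Optional

def classify_uporf(up_start: int, up_stop: Optional[int], main_start: int) -> str:
    """Classify an upORF relative to the main start codon.

    All coordinates are 0-based indices of the first nucleotide of a codon:
    up_start (upstream AUG), up_stop (its terminator, or None if the reading
    frame runs on to the main ORF's own terminator), main_start (main AUG).
    """
    if up_stop is not None and up_stop + 3 <= main_start:
        return "terminates_upstream"   # Fig. 3A type: reinitiation is possible
    if (main_start - up_start) % 3 == 0:
        return "in_frame_extension"    # upstream AUG yields an N-terminally extended protein
    return "overlapping"               # Fig. 3B type: reinitiation is precluded

# Hypothetical coordinates chosen to mimic the three TPO situations.
wild_type = classify_uporf(up_start=248, up_stop=338, main_start=300)
# Deleting one nucleotide between the two AUGs moves the main AUG from 300 to 299.
g_deletion = classify_uporf(up_start=248, up_stop=None, main_start=299)
# A new terminator at 266 ends the upORF at 269, leaving 31 nt before the main AUG.
new_stop = classify_uporf(up_start=248, up_stop=266, main_start=300)
```

In this sketch the wild-type arrangement is classified as overlapping, the deletion as an in-frame extension, and the new terminator as an upORF that terminates upstream, matching the three outcomes described for the TPO mutations.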

Other uses of upstream ORFs

Whereas an overlapping upORF functions simply to down-modulate translation in the examples depicted in Fig. 3B, with the mRNAs in Fig. 3C the overlapping upORF qualitatively affects the protein output. Ribosomes that translate the upORF thereby miss the first in-frame AUG codon but proceed to reinitiate at another start codon downstream. If the upORF itself has a suboptimal initiation site (U in position −3 in the depicted example), leaky scanning will allow some production of the long protein isoform from the first in-frame AUG codon while the reinitiation shunt promotes production of the shorter protein isoform. The operation of a reinitiation shunt is most obvious when the upORF overlaps a potential start codon, as shown in Fig. 3C, but the same principle applies in cases (discussed in Section 4.3) where, although the upORF terminates prior to a potential downstream start codon, the intervening distance is too short to allow reinitiation. The fourth regulatory mechanism diagrammed in Fig. 3 is used only rarely. Mammalian AdoMetDC mRNA is the best studied example in which a small upORF encodes a peptide which functions in cis to inhibit downstream translation. The nascent peptide (MAGDIS) produced during translation of the upORF is thought to interact with ribosomes in a way that prevents completion of the termination process and thus prevents reinitiation (Law et al., 2001). The stalled ribosome, held at the termination site of the upORF, would also block other ribosomes from reaching the downstream start site via leaky scanning. Biologically, this mechanism is important because AdoMetDC is a key enzyme in polyamine biosynthesis and, at least in vitro, elevated polyamine levels stabilize the ribosome complex stalled at the end of the upORF (Law et al., 2001). In other words, elevated polyamine levels down-regulate translation of AdoMetDC. 
It is interesting to note parenthetically that antizyme, a protein that down-regulates polyamine levels, is also translated via a polyamine-sensitive mechanism. Elevated polyamine levels up-regulate production of antizyme by promoting a ribosomal frameshift needed to translate the full-length protein (Ivanov et al., 2000). The foregoing examples illustrate how reinitiation operates as part of the normal translation mechanism in cases where upORFs are constitutively present in mRNAs. There are other cases in which a reinitiation mechanism kicks in only when a nonsense mutation is introduced in a way that truncates the coding domain. In effect, the normal AUG initiator codon becomes the start of an upORF, following which reinitiation occurs at a normally silent internal AUG codon (Chang and Gould, 1998, Ledley et al., 1990, Zoppi et al., 1993). The N-terminally truncated protein thus produced sometimes retains enough function to mitigate the pathological effects of the nonsense mutation (Chang and Gould, 1998). This potential rescue device often fails, however, because many mRNAs are rapidly degraded when a nonsense codon is introduced (Frischmeyer and Dietz, 1999, He and Jacobson, 2001). The mRNA decay pathway that targets these abnormal mRNAs is activated in part by cis elements located in the coding domain (Gudikote and Wilkinson, 2002), which might explain why normal upORF-containing mRNAs (e.g. those discussed in Fig. 3, Fig. 4) are not rapidly degraded.

Changes in mRNA structure and translation linked to human diseases

Initiation factor eIF2 plays a key role in translational control (Clemens, 2001, Dever, 2002) and mutations that perturb regulation of eIF2 have profound pathological consequences (Delépine et al., 2000, Han et al., 2001, Harding et al., 2001, Van der Knaap et al., 2002). Human genetic disorders have been traced also to disruption of regulatory mechanisms mediated by mRNA binding proteins (Brown et al., 2001, Cazzola and Skoda, 2000, Cazzola et al., 2002, Kaytor and Orr, 2001, Mikulits et al., 1999). Here, however, I focus on pathologies resulting from increases or decreases in translation caused directly by changes in mRNA structure. The preceding paragraph mentioned some cases in which an N-terminally truncated protein is produced, apparently by reinitiation, when a mutation introduces a premature nonsense codon. The effects of some other types of mutations can also be understood in light of the scanning mechanism, as outlined next and discussed elsewhere in more detail (Kozak, 2002). Recent investigations have identified diseases that result from failure to produce one of the two protein isoforms derived from genes that encode certain transcription factors (Table 4). Because the second isoform often functions as a modulator, the transcriptional imbalance caused by these changes in translation can have serious consequences.
Table 4

Pathologies resulting from a change in mRNA structure which selectively abolishes production of the long or short form of a transcription factor

| Gene | Translational mechanism that normally generates two protein isoforms | Disease-associated change in mRNA structure and translation | References |
| --- | --- | --- | --- |
| C/EBPα (human) | Two proteins from one mRNA via leaky scanning + reinitiation shunt | In acute myeloid leukemia, mutations near amino terminus eliminate production of longer isoform. | Pabst et al., 2001 |
| GATA1 (human) | Two proteins from one mRNA via leaky scanning | In Down syndrome-related leukemia, premature stop codon eliminates production of longer isoform. | Wechsler et al., 2002 |
| LEF1 (human) | Two proteins from two mRNAs (via two promoters) | In colon cancer, failure to activate downstream promoter prevents production of shorter (inhibitory) isoform. | Hovanes et al., 2001 |
| Rx/rax (mouse) | Two proteins from one mRNA via leaky scanning (a) | In eyeless mice, mutation of second AUG, leaving only the weak upstream start codon, results in inadequate yield. | Tucker et al., 2001 |

(a) Here the long and short isoforms appear to function identically; the significance of the second AUG(START) codon pertains to boosting the overall protein yield. The eyeless mouse serves as a spontaneous model for human anophthalmia.


AUG start codon and context mutations

Hereditary diseases have been traced occasionally to point mutations that alter the context flanking the AUG(START) codon. The list includes α-thalassaemia caused by an A→C change in position −3 of the α-globin gene (Morlé et al., 1985), androgen insensitivity syndrome caused by a G→A mutation in position +4 of the androgen receptor gene (Choong et al., 1996), and vitamin E deficiency associated with a C→T mutation in position −1 of the α-tocopherol transfer protein gene (Usuki and Maruyama, 2000). There is an interesting report of a somatic point mutation (G→C in position −3) in the BRCA1 gene in a highly aggressive case of sporadic breast cancer (Signori et al., 2001). In mice, a screen for mutations that cause defects in eye development uncovered an A→T change in position −3 of the Pax6 gene (Favor et al., 2001). Each of these mutations was shown to cause a decrease (generally two- to four-fold) in translation. Not every mutation or polymorphism within the consensus motif can be explained simply, however. Other considerations, such as codon usage, might prevent an increase in translation even when the context is improved (i.e. translation might be limited at the level of elongation rather than initiation), and some mutations near the AUG codon might affect mRNA processing or stability rather than translation per se. A clinically relevant polymorphism in position −1 of annexin V appears to have an effect on translation which is inconsistent with the context rules (González-Conejero et al., 2002), but the effect was small and documented only by assaying translation in vitro, which is not always reliable (Section 6.2). A natural polymorphism in position −5 of the glycoprotein Ibα gene displayed a small effect on translation in vitro that was consistent with the rules (C worked better than T; Afshar-Kharghan et al., 1999) but, in the same study, mutations that changed position +4 from C to G did not augment translation.
Testing mutations in position +4 is tricky, however, because the change in identity of the penultimate amino acid might affect protein stability in ways that obscure the effects on translation. The solution is to use an assay that directly monitors the initiation step of translation (Kozak, 1997). The scanning mechanism predicts that a mutation which weakens or destroys the normal start codon should activate initiation from the next AUG downstream. In some hereditary diseases in which the AUG(START) codon is ablated, a truncated protein is indeed produced in this way but it does not function well enough to prevent the disease (Cahana et al., 2001, Huang et al., 1999, O'Neill et al., 2001). In the case of a mutated vasopressin gene in which the G of the AUG(START) codon is deleted, the shorter signal peptide initiated from the second AUG codon is not recognized by signal peptidase (Beuret et al., 1999). The resulting uncleaved vasopressin-precursor protein folds incorrectly, causing subsequent processing steps to fail, and therefore vasopressin never gets released from the endoplasmic reticulum. The second AUG is only four codons downstream from the first, but the processing defect caused by this slight shift in the site of initiation causes diabetes insipidus.
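
As an illustration of how the two key context positions are read off a sequence, the following Python sketch reports the nucleotides in positions −3 and +4 relative to a given AUG (the A of the AUG is +1; there is no position 0). The function name, the toy sequences and the three-way strong/adequate/weak label are assumptions made for this example only; the label is a shorthand for "both, one, or neither favorable position present" and does not predict the size of any effect on translation.

```python
PURINES = {"A", "G"}

def start_codon_context(mrna: str, aug_index: int) -> dict:
    """Report positions -3 and +4 for the AUG whose A sits at aug_index (0-based)."""
    mrna = mrna.upper().replace("T", "U")  # accept cDNA-style input as well
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    # Position -3 lies three nucleotides upstream of the A (+1).
    minus3 = mrna[aug_index - 3] if aug_index >= 3 else None
    # Position +4 is the nucleotide immediately following the AUG.
    plus4 = mrna[aug_index + 3] if aug_index + 3 < len(mrna) else None
    purine_minus3 = minus3 in PURINES
    g_plus4 = plus4 == "G"
    # Illustrative shorthand (an assumption of this sketch, not a measured efficiency).
    if purine_minus3 and g_plus4:
        label = "strong"
    elif purine_minus3 or g_plus4:
        label = "adequate"
    else:
        label = "weak"
    return {"minus3": minus3, "plus4": plus4, "label": label}

# Toy sequences (hypothetical, not taken from any gene discussed in the text).
consensus = "GCCACCAUGG"       # A in -3, G in +4
minus3_mutant = "GCCCCCAUGG"   # A->C change in position -3
print(start_codon_context(consensus, 6))
print(start_codon_context(minus3_mutant, 6))
```

A classification of this kind only flags candidate context mutations; as noted in the text, effects on elongation, mRNA processing or protein stability have to be excluded experimentally.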

Restructuring cellular mRNAs by addition or removal of upstream AUG codons

The scanning mechanism predicts that, when an out-of-frame AUG codon is introduced into the 5′ UTR, the adventitious upstream start codon should supplant the normal start site. A number of pathologies result from this kind of translational block. Sometimes the upstream AUG codon is created by a rare mutation (Cai et al., 1992, Liu et al., 1999). Other times it derives from a common polymorphism (Bergenhem et al., 1992, Endler et al., 2001, Kanaji et al., 1998, Kraft et al., 1998, Zysow et al., 1995). The reduction in translation is more or less severe depending on the context of the upstream AUG codon and whether reinitiation is possible. I have already explained how hereditary thrombocythemia is caused by mutations that elevate translation of TPO by restructuring or eliminating an inhibitory upORF (Fig. 4). Translation of proto-oncogenes is also elevated in some cases by eliminating small upORFs from the mRNA. The MDM2 oncogene is one example: whereas the normal mRNA has a long 5′ UTR that includes two upstream AUG codons, in tumor cells the use of a different promoter eliminates the upstream AUGs and thus increases translational efficiency 20-fold (Brown et al., 1999, Landers et al., 1997). In the case of oncogene GLI1, the upstream AUG codons that restrict translation in normal cells reside in an intron which is eliminated by more efficient splicing in basal cell carcinomas (Wang and Rothnagel, 2001). Translation of many other human or rodent oncogenes is restricted in normal cells by an encumbered (AUG-burdened or GC-rich) leader sequence (Arrick et al., 1991, Bates et al., 1991, Child et al., 1999a, Harigai et al., 1996, Hoover et al., 1997, Horvath et al., 1995, Manzella and Blackshear, 1990, Sarrazin et al., 2000); in some of these cases, a shorter 5′ UTR that better supports translation is produced in transformed cells (Arrick et al., 1994, Marth et al., 1988). 
For other oncogenes, although there are alternative leader sequences that might regulate expression in normal tissues (Link et al., 1992, Sasahara et al., 1998), there is no evidence that switching leader sequences contributes to tumorigenesis. Whereas removal of small upORFs elevates the translation of the aforementioned MDM2 and GLI1 oncogenes in tumor cells, addition of small upORFs shuts off the translation of some tumor suppressor genes. In the case of HYAL1, retention of an intron which contains eight upstream AUG codons renders the mRNA untranslatable in squamous cell carcinomas (Frost et al., 2000). A striking example of translational inactivation of a tumor suppressor gene is seen in some individuals with a predisposition to melanoma. In certain families, a point mutation (G→T) creates an upstream, out-of-frame AUG codon in the CDKN2 gene (Liu et al., 1999). The small upORF initiated from this new AUG codon overlaps the CDKN2 start codon, and the resulting inhibition of translation is profound.
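
The prediction that an adventitious upstream AUG supplants the normal start site can be sketched in a few lines of Python. The model below assumes strict first-AUG selection (it ignores leaky scanning and reinitiation, which modulate the severity of the block), and the sequence is an invented toy leader rather than the CDKN2 sequence; all function names are hypothetical.

```python
STOPS = {"UAA", "UAG", "UGA"}

def orf_end(mrna: str, start: int) -> int:
    """Index just past the first in-frame stop codon, or len(mrna) if none is found."""
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOPS:
            return i + 3
    return len(mrna)

def upstream_augs(mrna: str, main_start: int) -> list:
    """Describe every AUG located 5' of the annotated start codon."""
    found = []
    i = mrna.find("AUG")
    while i != -1 and i < main_start:
        found.append({
            "pos": i,
            "in_frame": (main_start - i) % 3 == 0,
            # The upORF overlaps the main start if it terminates downstream of it.
            "overlaps_main_start": orf_end(mrna, i) > main_start,
        })
        i = mrna.find("AUG", i + 1)
    return found

# Toy mRNA: 18-nt leader without AUG, followed by a short main ORF.
wild_type = "GCGCAGGCCCGAGCCACC" + "AUGGUGACCUAA"
main_start = 18
assert wild_type[5] == "G"
# A single G->U change (G->T in the DNA) converts AGG into an upstream AUG.
mutant = wild_type[:5] + "U" + wild_type[6:]

print(wild_type.find("AUG"))              # first AUG reached by a scanning ribosome
print(mutant.find("AUG"))                 # first AUG after the point mutation
print(upstream_augs(mutant, main_start))  # out-of-frame upORF overlapping the start
```

In this toy case the new AUG is out of frame and its reading frame terminates downstream of the authentic start codon, which is the arrangement described above as most inhibitory because it precludes reinitiation at the normal site.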

Modulation of viral infectivity by restructuring mRNAs

Structural changes that attenuate the translation of viral mRNAs can contribute to the development of persistent infections. The 5′ leader sequence on bovine coronavirus mRNAs, for example, was found to evolve – by acquiring a small upORF – during the course of establishing a persistent infection (Hofmann et al., 1993). Shishido-Hara et al. (2000) speculate that human polyomavirus JC might cause persistent rather than acute infection because all capsid-protein encoding transcripts produced by the JC virus have a small upORF. With the related simian virus 40, in contrast, the upORF is sometimes eliminated by splicing, generating transcripts that better support translation of the major capsid protein. Attenuating effects caused by introducing an upstream AUG codon have been described also with other viruses (Petty et al., 1990, Slobodskaya et al., 1996). A more drastic restructuring of mRNAs sometimes occurs during the establishment of persistent infections by the measles virus. Instead of the normal monocistronic mRNA for the fusion protein, the predominant transcript in some persistently infected cells was a dicistronic mRNA from which the F cistron, located at the 3′ end, could not be translated (Hummel et al., 1994). A similar problem encountered in studies with recombinant rhabdoviruses provides insight into the transcriptional defects that can generate untranslatable dicistronic mRNAs (Quiñones-Kochs et al., 2001). In the case of a human parvovirus, productive infection is restricted to a subset of erythroid cells in which splicing generates a monocistronic mRNA for each of the major capsid proteins. In nonpermissive cells, a slight shift in the position of a splice site imposes an upstream ORF which is postulated to restrict translation of the capsid proteins (Brunstein et al., 2000).

Production of CTL antigens

Translational twists sometimes generate antigens which, by stimulating the CTL response, are important in the host defense against tumor cells and viruses (Shastri et al., 2002). Leaky scanning is a likely explanation in several cases where the major ORF starts with an AUG codon in a suboptimal context and the CTL antigen derives from initiation at the next (out-of-frame) AUG (Aarnoudse et al., 1999, Bullock et al., 1997, Probst-Kepper et al., 2001, Rimoldi et al., 2000). In one notable case, translation shifts upstream to an in-frame AUG codon created during insertion of a provirus, and the resulting novel N-terminal amino acid extension functions as a tumor rejection antigen (Wada et al., 1995). The scanning mechanism cannot explain the translation of CTL antigens for which the start codon resides far in the interior of the mRNA (Ronsin et al., 1999, Wang et al., 1996). In these cases the antigenic peptide might be produced from an undetected alternative form of mRNA. Sensitive new assay techniques employed with some genes indeed reveal an array of alternative transcripts from which novel tumor antigens can be translated (Behrends et al., 2002). In another study, a potent tumor rejection peptide, which maps to an internal AUG codon in the full-length cDNA, was expressed experimentally from a truncated cDNA wherein the start codon for the antigenic peptide was made the first AUG (Rosenberg et al., 2002). Additional analyses are needed to determine whether, in the melanoma cells wherein this antigen is expressed naturally, a transcript similar to the experimentally truncated cDNA is produced via a downstream promoter or splice site.

Surveys and assays and problems therein

cDNA surveys

Surveys of mRNA/cDNA sequences differ in other details, but every survey confirms the presence of a purine in position −3 in most (≥90%) vertebrate mRNAs (Kozak, 1987a, Pesole et al., 2000, Rogozin et al., 2001, Sakai et al., 2001). The occasional survey that purports to challenge the context rules involves distortions, such as emphasizing the low percentage of cDNAs that have the full consensus sequence while ignoring the high percentage of cDNAs that have the critical purine in position −3 (Peri and Pandey, 2001). A major uncertainty pertaining to all cDNA surveys concerns the validity of the database. When I re-examined the entries in one study (Suzuki et al., 2000), I found numerous instances in which the AUG(START) codon had been misidentified; the corrected start sites adhered more closely to the consensus motif (Kozak, 2000). Some authors pre-emptively defend their conclusions on the grounds that the (unidentified) cDNA sequences used for their analysis derive from RefSeq, which is a curated database (Pruitt et al., 2000). But the entries in RefSeq are not without errors, some of which – e.g. misidentified start codons, mistaken claims of upstream AUG codons – can be traced by comparing curated GenBank entries NM_005493, NM_005502, NM_000282 and NM_003605 with results published elsewhere (Campeau et al., 2001, Nishitani et al., 2001, Nolte and Müller, 2002, Santamarina-Fojo et al., 2000). Some cDNA surveys use misleading terminology, e.g. referring to upstream AUG codons as ‘unused’ (Peri and Pandey, 2001). Given that upstream AUG codons are used, as proved by detecting the encoded peptide or by fusing the upORF to a reporter gene, it is not anomalous to find a good context around some upstream AUG codons. All surveys tend to overestimate the incidence of upstream AUGs by scoring only the longest cDNA isoform, ignoring the existence of alternative transcripts that have shorter, unencumbered 5′ UTRs (Section 4.2).
The significance of upstream AUG codons also tends to be misstated: the presence of small upORFs in vertebrate mRNAs which are thereby translated inefficiently (see the foregoing discussion of TPO, oncogenes, etc.) constitutes evidence for, rather than against, the scanning model. The first-AUG rule, which I cite as evidence for the scanning mechanism, derives not from statistical analysis of cDNA sequences but from the experimentally observed fact (Section 2.2) that translation shifts predictably upstream or downstream when an AUG codon is added or removed. In short, it makes more sense to use the scanning/context rules to evaluate cDNA sequences (Hatzigeorgiou, 2002) than to attempt the reverse.
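
The difference between scoring the full consensus and scoring only the critical purine in position −3 can be made concrete with a small tally. The Python sketch below is illustrative only: the function name and the four toy entries are hypothetical, and a real survey would first have to verify each annotated start site, as discussed above.

```python
import re

def survey(entries):
    """Tally context features over (mrna, aug_index) pairs with annotated start sites."""
    scored = purine = full = 0
    for mrna, idx in entries:
        mrna = mrna.upper().replace("T", "U")
        # Skip entries with too little 5' sequence or a start annotation that is not AUG.
        if idx < 6 or mrna[idx:idx + 3] != "AUG":
            continue
        scored += 1
        upstream = mrna[idx - 6:idx]          # positions -6 .. -1
        if upstream[3] in "AG":               # index 3 of the window is position -3
            purine += 1
        if re.fullmatch("GCC[AG]CC", upstream):  # full GCCRCC motif abutting the AUG
            full += 1
    return {
        "scored": scored,
        "purine_minus3": purine / scored if scored else 0.0,
        "full_motif": full / scored if scored else 0.0,
    }

# Hypothetical toy entries: (sequence, index of the A of the annotated AUG).
toy_entries = [
    ("GCCACCAUGG", 6),  # full motif
    ("UUUGUCAUGA", 6),  # G in -3 only
    ("CAAAAUAUGG", 6),  # A in -3 only
    ("GGGCUUAUGC", 6),  # pyrimidine in -3
]
print(survey(toy_entries))
```

With these toy entries the full motif is present in a minority of sequences while the purine in position −3 is present in most, which is the pattern that is misrepresented when only the full consensus is reported.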

Assays used to study translation

While conclusions about translation derived from experimental studies are arguably more meaningful than those derived from statistics, the interpretation of experimental results can be complicated. In vivo assays avoid the problem of reaction-conditions-dictating-the-outcome (see next paragraph), but there are other potential traps. The usually-valid assumption that polysomal association identifies actively translated mRNAs is called into question by the recent discovery of mRNAs trapped on large polysomes from which there is no polypeptide production (Rüegsegger et al., 2001). The major problem when translation is studied in vivo is uncertainty about the structure of the mRNA. For example, a claim that IRES-mediated translation is developmentally regulated (Créancier et al., 2000, Créancier et al., 2001) is premature, inasmuch as those studies monitored the amount but not the form of mRNA. Other studies wherein translation of an encumbered leader sequence appears to improve under certain conditions or in certain cell types require better analyses to rule out a possible change in structure of the 5′ UTR (Bernstein et al., 1995, Child et al., 1999b, Li et al., 2001, Zimmer et al., 1994). Some useful hints may be found in reports that describe the belated discovery of alternative forms of mRNA that were missed the first time around (Chen et al., 1999, Cortner and Farnham, 1990, Déjardin et al., 2000, Deng et al., 2002, Frost et al., 2000, Grundhoff and Ganem, 2001, Jordan et al., 1996, Kastner et al., 1990b, Kiss-László et al., 1995, Laurin et al., 2000, Peremyslov and Dolja, 2002, Zhang and Liu, 2000, Zheng et al., 1994). In vitro translation assays pose a different set of problems. The commercial availability of in vitro translation kits is both a blessing – the systems are easy to use – and a curse. The latter because insufficient attention is paid to reaction conditions that can affect the selection of AUG start codons. 
When the magnesium concentration is too low, the first AUG codon may be bypassed despite an adequate context; when the magnesium concentration is too high, initiation may occur at upstream non-AUG codons that are not naturally used. One solution is to include control transcripts for which start-codon selection was determined in vivo, and to adjust the in vitro reaction conditions to give the same result (Kozak, 1990b). Some suppliers of translation kits make it possible to adjust the magnesium concentration, but there is little awareness of the need to do so and the use of coupled transcription/translation systems makes it difficult. For whatever reason, in vitro translation results sometimes deviate significantly from what is seen in vivo vis-à-vis access to internal AUG codons (Grove et al., 1991, Land and Rouault, 1998, Meijer et al., 2000, Meulewaeter et al., 1992, Mitchelmore et al., 2002, Saucedo et al., 1999) and the degree of inhibition caused by small upORFs (Ghilardi et al., 1998, Harigai et al., 1996, Pecqueur et al., 1999, Pecqueur et al., 2001, Tanaka et al., 2001, Wang and Wessler, 1998). The fidelity of initiation in vitro is clearly impaired, possibly due to degradation of the mRNA, in cases where extraneous, low molecular weight polypeptides are produced (Herbert et al., 1996, Liu and Biegalke, 2002, Lekven et al., 2001, Maser et al., 2001, Packham et al., 1997). The possibility that the input mRNA might undergo cleavage during incubation in vitro complicates attempts to study the expression of dicistronic mRNAs, as discussed in the next section. This type of artifact is not ruled out by finding that only certain downstream ORFs are translated (O'Connor and Brian, 2000).
Extrapolating from what is seen when mRNAs are deliberately cleaved in vivo (Thoma et al., 2001), activation of internal start codons in vitro would depend on where the accidental cleavage occurs and whether the endolytic cleavage product persists long enough for a ribosome to engage the newly created 5′ end before exonucleases take over. This line of reasoning could explain the claim that an ‘artificial IRES’, consisting of a multiple cloning region and a portion of the Escherichia coli lacI gene, supports internal initiation of translation in starved yeast cells (Paz et al., 1999): starvation is likely to promote mRNA degradation, and the ‘IRES’ might fortuitously stabilize certain intermediates in degradation. The discovery that IRES elements are actually targeted by some ribonucleases (Elgadi and Smiley, 1999, Nadal et al., 2002) should be remembered. Recent studies that use a primer-extension inhibition (toeprinting) assay to monitor the binding of ribosomes to mRNAs have the advantage of focusing directly on the initiation step, but care is needed to distinguish authentic initiation complexes from artifactual pauses in primer extension caused by base-paired structures or extraneous proteins bound to the mRNA. The complicating effects of mRNA secondary structure, which are prominent when avian reverse transcriptase is used for toeprinting, can be minimized by using a form of the enzyme derived from murine leukemia virus (Kozak, 1998).

Studies of eIF4G translation illustrate some common problems

Attempts to explain the origin of multiple isoforms of eIF4G (Bradley et al., 2002) illustrate how challenging it can be to interpret translation assays. In vitro experiments presented in support of the idea that translation can initiate from five AUG codons, in a single form of eIF4G mRNA, might have been compromised by mRNA breakage; this would explain the production of an array of extraneous smaller polypeptides (Byrd et al., 2002, Fig. 3C, lanes 1, 3 and 5). Translation of some eIF4G isoforms from broken mRNAs could also explain the variability in yields noted throughout that study. When the endogenous eIF4G gene is expressed in vivo, access to certain downstream AUG codons might occur via alternative splicing or internal promoters; both mechanisms have been documented in studies of eIF4G by other investigators (Han and Zhang, in press). Thus, even though one could rationalize the production of at least three isoforms of eIF4G from one mRNA via established translational mechanisms – an overlapping upORF could shunt some 40S ribosomal subunits past the first in-frame AUG codon (position 275), and the unfavorable context at AUG 395 might allow some ribosomes to reach AUG 536 by leaky scanning – it would be premature to propose that solution. The in vitro experiments need to be repeated with careful attention to magnesium levels and with efforts to minimize mRNA breakage. The latter might be accomplished by lowering the temperature to 25 °C and limiting the window for initiation to 5 or 10 min. (Addition of edeine after the first 5 or 10 min, followed by another period of incubation, would allow polypeptides to be elongated without further initiation events.) The possible production of some eIF4G isoforms by proteolysis also needs to be ruled out, as this protein is notoriously susceptible to cleavage.
The study by Byrd et al. (2002) included experiments carried out with dicistronic transcripts, predicated on the belief that eIF4G mRNA contains IRES elements which allow direct internal initiation of translation. An eIF4G/EGFP (enhanced green fluorescent protein) fusion gene positioned at the 3′ end of a dicistronic transcript was translatable in vitro, but the aforementioned possibility of mRNA breakage complicates the interpretation. Indeed, the unexpected translation of EGFP from the 3′ position even without fusion to eIF4G (Byrd et al., 2002, Fig. 7A, lane 3) is most easily explained by mRNA cleavage. (The authors invoke reinitiation as the explanation, but reinitiation cannot occur following translation of a large 5′ cistron.) Fragmentation of the mRNA could explain why translation of the 3′ eIF4G/EGFP cistron persisted when translation of the 5′ cistron was blocked by a hairpin structure (Byrd et al., 2002, Fig. 7B). The hairpin test, widely used to test for internal initiation, is meaningless without evidence that the dicistronic input mRNA remains intact. The in vivo tests of eIF4G translation (Byrd et al., 2002, Fig. 8) also require careful RNA analyses to document that the vector produces only the intended dicistronic mRNA; the quality of the Northern blot in that figure falls far short of what is required. To rule out the possibility that the 3′ cistron might be translated from an unintended monocistronic mRNA, a promoter-deletion control is needed – a control which shows that, upon deleting the promoter that precedes the 5′ cistron, expression of the 3′ cistron is also abolished. This test failed in studies with some other sequences, revealing that the candidate IRES actually harbors a cryptic promoter (Han and Zhang, 2002, Han et al., 2002, Larsen et al., 2002). The foregoing discussion of eIF4G translation alludes to only some of the problems associated with dicistronic vectors; a more detailed critique may be found elsewhere (Kozak, 2001a, Kozak, 2001b).
Use of a certain popular vector which harbors an intron near the 5′ end (Jopling and Willis, 2001) increases the likelihood of producing an unintended monocistronic mRNA via splicing; the candidate IRES need contribute only a cryptic 3′ splice site. This indeed happens in some cases (Grundhoff and Ganem, 2001, Pinkstaff et al., 2001). Claims of IRES activity are problematic when supported by in vitro assays in which translation of the 3′ ORF is very, very weak (e.g. Deffaud and Darlix, 2000, Lekven et al., 2001, Maser et al., 2001). The simple idea that an IRES can be identified based on the ability to support translation of a 3′ cistron runs into trouble when, for example, the β-globin mRNA leader sequence, intended to serve as a negative control, turns out somehow to allow translation of a downstream cistron (Van der Velden et al., 2002). In another study, merely lengthening the intercistronic domain enabled substantial translation of the 3′ cistron (Gallie et al., 2000), perhaps by providing room for RNases to cleave and thus release a translatable 3′ fragment. Even with the paradigmatic IRES elements derived from picornaviruses, the ability to support internal initiation was found to depend inexplicably on the choice and arrangement of 5′ and 3′ reporter genes (Hennecke et al., 2001). These oddities – along with the notable inability to translate the 3′ cistron in natural dicistronic mRNAs (Table 1) – are reason to worry about the validity of experiments that employ synthetic dicistronic constructs. The proffered rationale for a cap-independent internal initiation mechanism is that it would enable certain mRNAs to be translated when eIF4E levels decline, but recent experiments presented in support of that idea (Li et al., 2002) used the 5′ UTR from poliovirus rather than 5′ UTRs from cellular mRNAs, such as eIF4G, that are supposedly regulated via ‘a dynamic interplay between cap-dependent and cap-independent processes’. 
Even if the proffered rationale is valid, convincing evidence for direct internal initiation with particular mRNAs is needed. The widely used dicistronic assay has flaws, as outlined above. An alternative assay which involves circularization of the mRNA has been attempted with only one viral IRES element (Chen and Sarnow, 1995); the results await independent verification and extension to other sequences.

Conclusions and closing notes

The scanning model provides a framework for understanding basic patterns of eukaryotic gene expression, such as the reliance on monocistronic mRNAs, and for understanding how translation is perturbed by mutations that restructure the 5′ UTR. A growing number of human diseases have been traced to such mutations. The scanning mechanism has been shown to operate not only with simple mRNAs that have a short 5′ UTR and initiate at the first AUG codon, but also with mRNAs that have complicated leader sequences and multiple start codons. One often hears the suggestion that an alternative, IRES-mediated mechanism of initiation is required when a long leader sequence is encumbered by secondary structure or upstream AUG codons (Dever, 2002, Pestova et al., 2001). That view is not well taken. Scanning can occur over long distances, as evidenced by some bifunctional viral mRNAs in which the second start site is more than 500 nt downstream from the first (e.g. peanut clump virus, southern bean mosaic virus, rice tungro bacilliform virus; Table 3 and Fig. 1C). The structure-prone, GC-rich leader sequences on mammalian mRNAs strongly reduce translational efficiency but do not preclude operation of the scanning mechanism (Van der Velden et al., 2002). Upstream AUG codons also reduce translational efficiency and that is why they are there. To postulate the need for an alternative mechanism is to miss the point that an encumbered leader sequence ensures that translation via scanning will be inefficient, and thus ensures against harmful overproduction of cytokines (Fig. 4) and other potent proteins. The high frequency of intron-containing cDNA sequences (Kozak, 1991a, Kozak, 1996) might reflect another type of regulation. 
Inefficient or regulated removal of the first intron has been documented in some cases (Boularand et al., 1995, Frost et al., 2000, Van der Leij et al., 2002, Wang and Rothnagel, 2001, Xie et al., 1991, Zachar et al., 1987) and I suspect that additional examples might be found – miscategorized – among the aforementioned cDNAs that are postulated to require an alternative mechanism of initiation. Removal of the intron, or use of a cryptic promoter therein, would eliminate the upstream AUG codons that are barriers to scanning. Examples in which translation is prevented deliberately by splicing-out the exon that contains the AUG(START) codon (Lin et al., 1998) or by other regulated splicing events (Rueter et al., 1999) underscore the point that not every transcript – hence not every cDNA – corresponds to a functional mRNA. Before postulating the need for a new mechanism to explain how a funny looking cDNA gets translated, one must be certain that it is translated. The mechanisms discussed herein for escaping the first-AUG rule, within the constraints imposed by the scanning model, obviously cannot explain every report of initiation from an internal position. More information is needed to understand how N-terminally truncated versions of some proteins are produced apparently without truncation of the mRNA (Goss et al., 2002, Maser et al., 2001, Santagata et al., 2000, Scharnhorst et al., 1999, Vanhoutte et al., 2001). An IRES element was postulated in some of those cases, based on the dicistronic test, but in one study there were no accompanying analyses of RNA structure in vivo (Goss et al., 2002), and in another study the use of an in vitro translation system produced too little of the truncated protein to be convincing (Maser et al., 2001). Speculation about how some other interesting genes are translated (Klemke et al., 2001) also must be postponed pending a search for possible additional forms of mRNA.
Although I listed the von Hippel-Lindau tumor suppressor gene as a possible example of leaky scanning (Table 3), definitive tests are needed to distinguish between that and other mechanisms for producing the short isoform (Iliopoulos et al., 1998). Those of us with an interest in translation have a tendency to interpret every change in mRNA structure as a means to control translation, but transcriptional requirements – the need to turn on a gene in various tissues via whatever promoter works in each tissue – underlie most switches in 5′ leader sequences. In some cases the actual sequence of the 5′ UTR is dictated by the presence therein of transcriptional control elements (Akiri et al., 1998, Minami et al., 2001, Solecki et al., 1997, Yin and Blanchard, 2000, Yu et al., 2001, Zimmermann et al., 2000). Regulation of transcription is the major reason for the GC-rich domains near the 5′ end of many mammalian genes; the accompanying down-modulation of translation is an inevitable consequence – arguably a useful consequence because, given the long half-life of most mammalian mRNAs, inefficient translation might be a necessity. It merits repeating that, although the m7G cap strongly promotes ribosome binding, the scanning mechanism is not dependent on the presence of the cap. The essence of the scanning model is 5′ entry of ribosomes and position-dependent selection of the AUG(START) codon. Those key points hold with naturally uncapped mRNAs produced by some viruses (footnote g in Table 1) and with synthetic uncapped mRNAs used to study translation in vitro (Kozak, 1979a, Kozak, 1998). The inclination to invoke internal initiation based on indirect criteria – absence of a cap, or the ability to be translated in extracts from poliovirus-infected cells – should be resisted. It is a mistake to think that, because archaeal mRNAs lack a 5′ cap, translation in that system cannot occur via scanning.
The discovery in archaea of proteins similar to certain eukaryotic initiation factors (Kyrpides and Woese, 1998) is intriguing for other reasons but has no direct bearing on whether the start codon in archaeal mRNAs might be recognized via a prokaryotic- or eukaryotic-type mechanism. That interesting question, which bears on the evolutionary origin of scanning, awaits answering. Fundamental questions about the molecular workings of the scanning mechanism also await answering. What drives migration of the 40S subunit during the scanning phase? How does the 40S subunit hold on at a terminator codon, in order to reinitiate? What prevents reinitiation when the size of the first ORF exceeds a certain length? We know nothing about how recognition of the start codon is aided by a purine in position −3 and G in position +4. There is no evidence for base pairing between the GCCRCC motif and rRNA (or for binding of rRNA to any other sequence in eukaryotic mRNAs). There is as yet no convincing evidence for recognition of GCCRCC by a trans-acting protein factor. It would be easy, and meaningless, simply to find proteins that bind an RNA fragment which contains the motif. Credible experiments would require controls based on what we know about the consensus sequence: that the purine (A>G) in position −3 plays a dominant role, and the full effect requires that the GCCRCC motif abut the AUG codon (Kozak, 1999, Fig. 1). With so much effort being directed to searching for possible exceptions to the scanning mechanism, one can only wish that some enterprising soul would tackle these important questions.
