FSscan: a mechanism-based program to identify +1 ribosomal frameshift hotspots.

Pei-Yu Liao, Yong Seok Choi, Kelvin H Lee.

Abstract

In +1 programmed ribosomal frameshifting (PRF), ribosomes skip one nucleotide toward the 3'-end during translation. Most of the genes known to demonstrate +1 PRF have been discovered by chance or by searching for homologous genes. Here, a bioinformatic framework called FSscan is developed to perform a systematic search for potential +1 frameshift sites in the Escherichia coli genome. Based on a current state-of-the-art understanding of the mechanism of +1 PRF, FSscan calculates scores for a 16-nt window along a gene sequence according to different effects of the stimulatory signals, and ribosome E-, P- and A-site interactions. FSscan successfully identified the +1 PRF site in prfB and predicted yehP, pepP, nuoE and cheA as +1 frameshift candidates in the E. coli genome. Empirical results demonstrated that the potential +1 frameshift sequences so identified promoted significant levels of +1 frameshifting in vivo. Mass spectrometry analysis confirmed the presence of the frameshifted proteins expressed from a yehP-egfp fusion construct. FSscan allows a genome-wide and systematic search for +1 frameshift sites in E. coli. The results have implications for bioinformatic identification of novel frameshift proteins, ribosomal frameshifting, coding sequence detection and the application of mass spectrometry to studying frameshift proteins.

Year:  2009        PMID: 19783813      PMCID: PMC2790909          DOI: 10.1093/nar/gkp796

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Translation is a highly accurate process. The frequency of decoding errors is estimated to be on the order of 10^-5 per codon (1). Programmed ribosomal frameshifting (PRF) is a coded shift in the reading frame during translation. Consequently, mRNAs with PRF features may yield two different protein products, an in-frame product and a frameshifted product. In +1 PRF, the ribosome skips over one nucleotide toward the 3′ direction. To date, the RECODE database lists 88 cases of +1 PRF in different organisms (2). +1 PRF has been observed to occur during the translation of prfB to produce release factor 2 (RF2) in Escherichia coli (3). In Saccharomyces cerevisiae, four retrotransposable elements, Ty1, Ty2, Ty3 and Ty4 (4–6), and three genes, ABP140 (7), EST3 (8) and OAZ1 (9), use +1 PRF. The expression of mammalian antizyme has also been shown to involve +1 PRF (10).

A genome-wide prediction of +1 frameshift sites is currently a difficult task because the sequence elements for +1 frameshifting are diverse among organisms. To date, most of the known genes involving +1 PRF have been discovered by chance and, in some cases, by searching for homologous genes. Several computer programs have been developed to identify +1 frameshift sites (11,12). Shah et al. (11) hypothesized that selective pressure would have rendered potential frameshift sites under-abundant in protein coding sequences. In that study, a computer program was developed to identify oligonucleotides that are over- or underrepresented for reasons other than codon bias. Their results suggested that the heptanucleotides CUU AGG C and CUU AGU U, the +1 PRF sites for the production of ABP140 and EST3, respectively, rank among the least represented heptanucleotides in the coding sequences of S. cerevisiae. While the approach is able to identify novel sequences, the method did not account for stimulatory signals. The program ‘FSFinder’ by Moon et al. (12) used known components of a frameshift cassette for predicting both −1 and +1 PRF sites. This method achieves a high sensitivity and a high specificity (0.88 and 0.97, respectively) for predicting +1 PRF. However, FSFinder does not predict novel +1 frameshift sites in E. coli. A novel antizyme gene, whose expression requires +1 frameshifting, was found in the zebrafish Danio rerio by a protein BLAST search against the translated nucleotide database of the known antizyme family sequence (13). While the method successfully identified novel genes requiring +1 frameshifting, the approach is limited to the antizyme family in eukaryotic cells.

Recently, a mathematical model revealed that destabilization of the deacylated tRNA in the ribosomal E-site, rearrangement of the peptidyl-tRNA in the ribosomal P-site, and availability of the cognate aminoacylated tRNA (aa-tRNA) corresponding to the ribosomal A-site act synergistically to promote efficient +1 PRF in E. coli (14). Motivated by this result, one might identify potential +1 frameshift sites in the E. coli genome by searching for sequences with a combination of stimulatory, E-, P- and A-site features. In this study, FSscan is developed to perform a systematic and genome-wide search for potential +1 frameshift sites in E. coli. Based on a current state-of-the-art understanding of the mechanism of +1 PRF, FSscan looks for a 16-nt sequence with possible synergistic effects in the E. coli genome. Potential +1 frameshift sequences so identified are shown to promote significant levels of +1 frameshifting in vivo. The mass spectrometry data obtained from a multiple reaction monitoring (MRM) assay, a specific and sensitive mass scan method (15), experimentally confirm the expression of the predicted frameshift protein. Importantly, current methods of coding sequence detection generally do not take into account shifts of the reading frame, and only a few algorithms assign a frameshift as a possible regulatory process (16). FSscan, presented in this study, provides an algorithm to predict potential +1 frameshift products in E. coli.

FSscan algorithm

FSscan is developed in Python (v2.4.3, Python Software Foundation, Hampton, NH) to search for potential +1 frameshift sites in the E. coli genome. The program assigns scores for a 16-nt window along a gene sequence according to different effects of the stimulatory signals (S score) and interactions of the E-, P- and A-site in the ribosome (E, P and A scores, respectively) (Figure 1). A stimulatory signal in E. coli for +1 PRF can be a Shine–Dalgarno (SD)-like sequence upstream of the frameshift site (17). FSscan assigns zero to the S score if fewer than four base pairings can be formed between the 6 nt upstream of the E-site position and the anti-SD sequence (3′UCCUCC5′); otherwise, FSscan assigns the number of base pairings divided by three to the S score [Equation (1)].
Figure 1.

The scoring system for FSscan program. FSscan calculates scores for a 16-nt window along the gene sequence. Each step is 3 nt. FS index (FSI) = S + E + P + A.
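The S score rule described above can be sketched in Python as follows. This is a minimal illustration, not the FSscan source: the function names, the position-by-position alignment of the 6-nt upstream region against the anti-SD sequence, and the choice to count G:U wobble pairs alongside Watson–Crick pairs are all assumptions made here, since the text does not state how base pairings are counted.

```python
# Anti-SD sequence written 3'->5' so that index i faces mRNA position i (5'->3').
ANTI_SD_3_TO_5 = "UCCUCC"

# Allowed (mRNA base, anti-SD base) pairs. Including G:U wobble is an assumption.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_pairings(upstream6):
    """Count position-wise base pairings between a 6-nt mRNA stretch and the anti-SD."""
    rna = upstream6.upper().replace("T", "U")
    if len(rna) != len(ANTI_SD_3_TO_5):
        raise ValueError("expected exactly 6 nucleotides upstream of the E-site")
    return sum((m, a) in PAIRS for m, a in zip(rna, ANTI_SD_3_TO_5))


def s_score(upstream6, threshold=4):
    """S = 0 below the pairing threshold, otherwise pairings / 3 (range 0 to 2)."""
    n = count_pairings(upstream6)
    return 0.0 if n < threshold else n / 3.0
```

Under these assumptions a perfect SD sequence such as `AGGAGG` pairs at all six positions and receives the maximum S score of 2, while a stretch with only three pairings receives 0.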

Sanders et al. (18) suggested that zero frame codon:anticodon interactions in the E-site can affect frameshifting. The E score is calculated as exp (−ΔGc), where ΔGc is the codon:anticodon interaction (19) in the ribosome E-site. For the P-site, both zero frame and +1 frame interactions can influence +1 frameshifting (20). The P score in the program represents the stability difference between the zero frame and the +1 frame interactions for the P-site tRNA, normalized with the maximum stability difference obtained among 256 possible P-site sequences (Supplementary Data). The A score is the combination of the A0 score and the A1 score. The A0 score is the ratio of the arrival frequency, on the basis of transport by diffusion, of the near-cognate aa-tRNA versus the cognate aa-tRNA corresponding to the zero frame A-site codon (21), normalized with the maximum ratio of the arrival frequency obtained among 64 possible zero frame A-site codons. The A1 score is the ratio of the concentration of the cognate aa-tRNA for the +1 frame A-site codon to that of the cognate aa-tRNA for the zero frame A-site codon (21), normalized with the maximum concentration ratio obtained among 256 possible A-site sequences. For a stop codon in the zero frame A-site, the A0 and A1 scores were each set to 0.9 for TAG and TGA, and 0.6 for TAA.

If the summation of the E, P and A scores is <3, the S score is reset to zero [Equation (2)]. Equation (2) has a higher priority than Equation (1), which means that, as long as the summation of the E, P and A scores is <3, the program assigns zero to the S score no matter how many base pairings can be formed between the mRNA sequence and the anti-SD sequence. The frameshift index (FSI) for a 16-nt window is calculated as in Equation (3). A higher FSI suggests that the sequence contains more features for +1 frameshifting. It is important to note that the FSI is not intended to predict the level of +1 frameshifting quantitatively, but rather to indicate how likely a sequence is to be a frameshift site.
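Read from the prose description alone, Equations (1)–(3) can be written as below. The symbol n, for the number of base pairings between the 6 nt upstream of the E-site and the anti-SD sequence, is notation introduced here for convenience; E = exp(−ΔGc) and A = A0 + A1 as defined in the text.

```latex
\begin{align}
S &= \begin{cases} 0, & n < 4 \\ n/3, & n \geq 4 \end{cases} \tag{1} \\
S &= 0 \quad \text{if } E + P + A < 3 \tag{2} \\
\mathrm{FSI} &= S + E + P + A \tag{3}
\end{align}
```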

MATERIALS AND METHODS

Plasmids and bacterial strains

Escherichia coli XL1 blue MRF’ (Stratagene, La Jolla, CA, USA) was used in all experimental studies. All constructs were verified by DNA sequencing. The construction of the dual fluorescence reporter was performed as described previously (14). The control strain has both DsRed and enhanced green fluorescent protein (EGFP) coding sequences in frame. For the test strains, the linker sequences inserted between the two reporters contained predicted frameshift sequences followed by an in-frame stop codon and the downstream egfp in the +1 frame. The control strain expressed the DsRed-EGFP fusion protein from the reporter. The test strains expressed DsRed as the non-frameshift protein (due to the stop codon in the linker sequence) and the DsRed-EGFP fusion as the frameshift protein (because the stop codon is bypassed by +1 frameshifting). Table 1 lists the nucleotide sequences incorporated into the dual fluorescence reporter for testing +1 frameshift efficiency in vivo in this study. A negative control strain, ran1, was transformed with a plasmid containing a randomly designed linker (rand) inserted between the two fluorescence reporters with egfp in the +1 frame.
Table 1.

Nucleotide sequences incorporated into the dual fluorescence reporter system for testing +1 frameshift efficiency in vivo in this study

Original gene | 16-nt window with max FSI in the gene | Strain (transformed with corresponding reporter plasmids)
yehP | GTG GAG TAT GGT CGG C | yehP6
nuoE | GAG CGG TAT AAA TGA A | nuoE6
pepP | AGT GAG ATA TCC CGG C | pepP6
cheA | AGT CGC TAT CCC CGG C | cheA6
ygcH | CCA CTC TAT TTT CGG C | ygcH6
yeaI | AAT ATT TAT AAT CGG C | yeaI6
pspD | CAG CGT TAT AAA AGG T | pspD6
glnD | GGT GGG ATA AAA GCC C | glnD6
yjgN | GAG AGA TAT TTT CTT A | yjgN6
cysD | CAG GGG TAT TTT TAA G | cysD6
rand | TCT GGC TCT GGC TGA G | ran1
yehP | GTG GAG TTA GGT CGG C (mutated codon: TTA) | yehP7

yehP, nuoE, pepP, cheA, ygcH and yeaI are the top ranking candidates identified by FSscan.

glnD, yjgN and cysD are selected genes with one or two frameshifting features. rand is a randomly designed sequence to serve as a negative control.

The first 915 nt in yehP were PCR-amplified with the forward primer, yehPf, 5′-AAA-3′ (PstI site underlined) and two reverse primers, yehPr0 5′-ATT-3′ and yehPr1 5′-ATT-3′ (KpnI site underlined) using E. coli genomic DNA as a template. The PstI/KpnI restricted PCR products were ligated with a PstI/KpnI-restricted pEGFP (Clontech, Mountain View, CA, USA) vector to yield pYehP0 (using yehPr0 as the reverse primer for PCR) and pYehP1 (using yehPr1 as the reverse primer for PCR). The predicted frameshift sequence in pYehP1 was mutated by using QuikChange II site-directed mutagenesis kit (Stratagene) to create pYehPC. BsrGI/EcoRI restricted pYehP0, pYehP1 and pYehPC were ligated with a nucleotide sequence, 5′-GTACAAGCATCATCATCATCATCATTAAG-3′, to create pYehP20, pYehP21 and pYehP2C to add a 6X-histidine tag downstream of egfp. KpnI/NcoI restricted pYehP20, pYehP21 and pYehP2C were ligated with a nucleotide sequence, 5′-CGTCTAGCTCTGGCTCTGGCTCTGGCAC-3′, to create pYehP40, pYehP41 and pYehP4C to incorporate an in-frame stop codon and a flexible linker between yehP and egfp. Escherichia coli strains transformed with pYehP40, pYehP41 and pYehP4C are named yehP40, yehP41 and yehP4C, respectively.

Fluorescence assay

Cells with the appropriate plasmids were cultured in 1 ml Luria-Bertani (LB) medium containing 100 µg/ml ampicillin in a 24-well plate for 24 h at 37°C. The fluorescence was then measured by a plate reader (SpectraMax M5, Molecular Devices, Sunnyvale, CA, USA). The fluorescence measurement was performed as described previously (14). Frameshift efficiency (FS%) was obtained as the ratio of the green fluorescence to the red fluorescence for the test strains, normalized against the fluorescence ratio of the control strain. Statistical analysis was applied to all data sets according to Jacobs and Dinman (22). Eleven to twelve replicates for test strains and control strains were performed to satisfy the minimum sample requirement for statistical significance.
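The FS% calculation described above can be sketched as follows. This is an illustrative reading, not the authors' analysis script: the function name, the choice to normalize each test replicate against the mean control ratio, and the expression of the result as a percentage are assumptions, and the statistical treatment of Jacobs and Dinman (22) is not reproduced.

```python
from statistics import mean


def frameshift_efficiency(test_wells, control_wells):
    """Return FS% for each test replicate.

    Each well is a (green, red) fluorescence pair. The green/red ratio of a
    test well is divided by the mean green/red ratio of the control wells
    (assumed normalization) and scaled to a percentage.
    """
    if not control_wells:
        raise ValueError("at least one control well is required")
    control_ratio = mean(green / red for green, red in control_wells)
    return [100.0 * (green / red) / control_ratio for green, red in test_wells]
```

For example, a test well whose green/red ratio is one-tenth of the control ratio yields an FS% of 10.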

Western analysis

Cells with the appropriate plasmids were cultured in 3 ml LB medium containing 100 µg/ml ampicillin in 17 ml round-bottom tubes at 37°C. Aliquots of cells were harvested after 24-h cultivation and pelleted by centrifugation for 20 min at 4°C and 4000 g. The cell pellet was resuspended in 50 µl phosphate-buffered saline per OD600 and resolved by SDS–PAGE (10% w/v Tris-HCl). Immunoblotting was performed as described by Gupta and Lee (23), except that rabbit anti-GFP (1:5000, Clontech) and alkaline phosphatase conjugated mouse anti-rabbit IgG antibody (1:10 000; Sigma, St. Louis, MO, USA) were used as the primary and secondary antibodies, respectively.

Protein digestion

yehP41 cell lysate was purified by Ni–NTA under denaturing conditions according to the manufacturer's protocol (Qiagen, Valencia, CA, USA). The purified protein sample was exchanged into 0.2 M ammonium bicarbonate using an Amicon Ultra 10-kDa molecular weight cutoff filter (Millipore, Billerica, MA, USA). The buffer-exchanged sample was denatured and reduced with 6 M urea and 200 mM dithiothreitol (DTT) at room temperature for an hour. The sample was then alkylated with 200 mM iodoacetamide at room temperature for an hour in the dark. The remaining iodoacetamide in the sample was quenched with 200 mM DTT at room temperature for an hour, and the sample was digested with trypsin (Promega, Madison, WI, USA) at 37°C for 14 h. The digestion was stopped by lowering the pH of the solution with 88% formic acid (FA); the digest was then vacuum dried and reconstituted in 25 µl of 0.1% FA.

Liquid chromatography tandem mass spectrometry

Of the digested sample, 1.2 µl was separated on a Dionex 3000 nLC system (Sunnyvale, CA, USA) with an Acclaim PepMap 100 C18 trap column (300 µm × 5 mm, 5 µm, for online desalting at a flow rate of 30 µl/min for 3 min) and an Acclaim PepMap 100 C18 analytical column (75 µm × 15 cm, 3 µm) at a flow rate of 250 nl/min. Peptides were eluted with gradients of 2–90% acetonitrile with 0.1% FA, and the eluent was directly introduced into a 4000 QTRAP MS through a Nanospray II source (Applied Biosystems, Foster City, CA, USA) for the MRM study. To determine the appropriate MRM transitions that would be specific to the peptide of interest, the frameshift protein sequence was imported into the MIDAS Workflow software system (Applied Biosystems). The software generates a list of possible MRM transitions (Table S2), including mass to charge ratios of precursor ions, fragment ions and collision energy values for fragmentation. MS and MS/MS data obtained through MRM were searched against a custom sequence database that included the frameshift protein sequence. Spectral assignment of the MS/MS data was performed using ProteinPilot (v1.2, Applied Biosystems).

RESULTS

FSscan identifies a +1 frameshift hot spot in prfB gene

FSscan successfully identifies the +1 frameshift site in prfB. Figure 2 shows the FSI along the prfB gene sequence. The FSI is at maximum when the ribosome P-site is positioned at the 25th codon in the coding sequence, the frameshift site for prfB in the literature (3).
Figure 2.

FSscan identifies the +1 frameshift site in prfB. A peak FSI is observed as the ribosome P-site is positioned at the 25th codon.


Analysis of 4132 protein coding sequences in the E. coli genome reveals additional potential +1 frameshift candidates

To identify potential +1 frameshifting sites, FSscan analyzed 4132 protein coding sequences in the E. coli K12 MG1655 genome (Genbank: U00096). As the FSI calculation requires an additional nucleotide downstream of the A-site codon, the 4132 coding sequences were adjusted to include one more nucleotide downstream of the stop codon. The maximum FSI obtained in each protein coding sequence is plotted in Figure 3. prfB, whose expression has been shown to involve +1 PRF (3), has the highest FSI among all tested coding sequences (maximum FSI in prfB = 5.05). The next four highest ranking genes are yehP, nuoE, pepP and cheA, with a maximum FSI of 4.47, 4.39, 4.39 and 3.54 in their coding sequences, respectively. The potential +1 frameshift sequences in these genes are listed in Table 1. None of these candidates has been reported by previous approaches to identify +1 PRF genes (11,12). The other 4127 protein-coding sequences all have a maximum FSI <3.50.
Figure 3.

Maximum FSI in each of the 4132 E. coli protein-coding sequences. Five genes with a maximum FSI above 3.5 are indicated in red. prfB has the maximum FSI 5.05. yehP has the maximum FSI 4.47. nuoE has the maximum FSI 4.39. pepP has the maximum FSI 4.39. cheA has the maximum FSI 3.55.
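The per-gene scan can be sketched as a sliding window that moves in 3-nt steps and keeps the highest FSI. This is a hedged outline rather than the FSscan implementation: `score_window` is a hypothetical callback standing in for the S, E, P and A calculations, and the window layout assumed here (6 nt upstream, then the E-, P- and A-site codons, then one extra nucleotide) is inferred from the description of the 16-nt window.

```python
def window_fsi(s, e, p, a):
    """Combine the four scores; S is reset to zero when E + P + A < 3."""
    if e + p + a < 3:
        s = 0.0
    return s + e + p + a


def scan_max_fsi(seq, score_window, size=16, step=3):
    """Return (max FSI, 1-based P-site codon number, window) for one sequence.

    seq is a coding sequence extended by one nucleotide past the stop codon.
    score_window is a hypothetical callable mapping a 16-nt window to the
    tuple (S, E, P, A). Returns None if the sequence is shorter than a window.
    """
    best = None
    for start in range(0, len(seq) - size + 1, step):
        window = seq[start:start + size]
        fsi = window_fsi(*score_window(window))
        # Assumed layout: the P-site codon begins 9 nt into the window.
        p_codon = (start + 9) // 3 + 1
        if best is None or fsi > best[0]:
            best = (fsi, p_codon, window)
    return best
```

Applying `scan_max_fsi` to every annotated coding sequence and collecting the first element of each result would give the kind of per-gene maximum plotted in Figure 3.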


In vivo examination of +1 frameshift sequences agrees with the program predictions

Several +1 frameshift candidates were examined in vivo by using a dual fluorescence reporter system. A randomly designed sequence with FSI = 1.70 (rand, Table 1) was constructed to serve as a negative control strain (see ‘Materials and Methods’ section). Potential frameshift sequences from yehP, nuoE, pepP and cheA resulted in FS% significantly higher than rand (Figure 4). A lower FS% was observed for sequences with FSI <3.5, suggesting that FSI 3.5 may serve as a threshold for identifying potential frameshift cassettes.
Figure 4.

Frameshift efficiency (FS%) for potential frameshift sequences identified by FSscan. The histogram indicates the experimentally observed FS% for different test strains listed in Table 1. Error bars show the standard deviation. Diamonds demonstrate the program calculated FSI for the potential frameshift cassettes (sequences are shown in Table 1).


FSscan identifies yehP as a +1 frameshift candidate

yehP contains a potential +1 frameshift sequence with the second highest FSI, after only prfB. The predicted frameshifting sequence is GTG GAG TAT GGT CGG C (where each zero frame codon is separated by a space). In this sequence, an ATG in the +1 frame (overlapping the TAT and GGT codons) together with an upstream GGAG may result in internal translation initiation, causing non-frameshifting based EGFP expression in the dual reporter system. To further confirm yehP as a candidate +1 PRF gene, the sequence was mutated to GTG GAG TTA GGT CGG C (TAT changed to TTA) to remove the ATG in the +1 frame while keeping a weaker E-site interaction (yehP7 in Table 1). A small decrease in FS% was observed (Figure 5), but the mutation still resulted in a significantly higher FS% than the negative control strain, ran1 (Figure 4). This observation suggests that the higher FS% for yehP6 is not likely due to internal translation of EGFP starting from the linker sequence.
Figure 5.

Frameshift efficiency (FS%) for yehP6 and yehP7. In yehP6, the linker inserted between the two fluorescence reporters contains the predicted yehP frameshift sequence: GTG GAG TAT GGT CGG C. In yehP7, the frameshift sequence is mutated to GTG GAG TTA GGT CGG C (where zero frame codons are separated by spaces).

To study the frameshift site in yehP, the fusion constructs yehP40, yehP41 and yehP4C were made with egfp 3′ to yehP (Figure 6a). Proteins from cell lysate were subjected to western analysis. Protein bands with molecular weight 63 kDa, the expected mass for the fusion protein, were observed for yehP40 and yehP41. Interestingly, no or very few proteins with this mass were observed when the potential frameshift sequence was mutated to GTG GAG T to remove frameshifting features (yehP4C, mutated nucleotides shown in bold) (Figure 6a and b). The result suggests that the +1 frameshift event is specific to the predicted sequence.
Figure 6.

(a) The nucleotide sequence design for yehP40, yehP41 and yehP4C. (b) Western blot for the cell lysate to detect the frameshift protein. Lane 1: total lysate from yehP40; lane 2: total lysate from yehP41; lane 3: total lysate from yehP4C. The amount of the protein loaded for yehP40 is one-third of the amount of the protein for yehP41 and yehP4C.

Proteins from yehP41 cell lysate were purified, buffer-exchanged and digested with trypsin. The digest was analyzed by liquid chromatography tandem mass spectrometry (LC-MS/MS) using MRM. MRM is a highly sensitive scanning technique for peptide identification. Its greater specificity is achieved by fragmenting the analyte and monitoring both the parent ion and one or more product ions simultaneously [see review by Kitteringham et al. (24)]. Figure 7 presents the amino acid sequence derived from the frameshift site and the tryptic peptides observed by MRM. The presence of the peptide VQLGGGTNIASAVEYGGNLLNNQR (Figure S3 in the Supplementary Data), whose coding sequence spans the potential frameshift site, is a result of +1 frameshifting at the 291st codon in yehP. This result further confirms the frameshift site in yehP, as suggested by FSscan.
Figure 7.

Nucleotide and amino acid sequence for the YehP-EGFP frameshift protein in yehP41. (a) The nucleotide and amino acid sequence for the predicted frameshift region in YehP-EGFP. The predicted frameshift sequence is shown in bold, with the P-site codon underlined. The zero frame and the +1 frame amino acid sequences are shown under the nucleotide sequence. The peptide spanning the frameshift site, with the zero frame translation before the site and the +1 frame translation after the site, is shown in red. (b) Amino acid sequence for the frameshift protein in yehP41 strain. The YehP-EGFP was expressed as a result of +1 frameshifting. Tryptic peptides observed by MRM are marked in red (>95% confidence level). The sequence coverage is 21.7%.

For +1 frameshifting at the 291st codon in yehP, the ribosome encounters a stop codon 15 codons downstream of the frameshift site. As a result, the frameshift product is 303 amino acids in length, which is 75 amino acids shorter than the non-frameshift yehP product. Importantly, yehP is highly conserved in different E. coli strains and is also observed in several other eubacteria (Table 2). The consensus of the yehP frameshift cassette for the 31 sequences in Table 2 is shown by a sequence logo (Figure 8) (25,26). Only minor diversity is observed at positions 1, 6, 12 and 14 in the 16-nt frameshifting window.
Table 2.

BLAST result for yehP. blastn was used as the algorithm to search the nucleotide collection database in National Center for Biotechnology Information's website

Accession | Description | Max score | Total score | Query coverage (%) | E-value | Max ident (%)
CP000948.1 | Escherichia coli str. K12 substr. DH10B, complete genome | 2254 | 2290 | 100 | 0.0 | 100
AP009048.1 | Escherichia coli str. K12 substr. W3110 DNA, complete genome | 2254 | 2290 | 100 | 0.0 | 100
U00096.2 | Escherichia coli str. K-12 substr. MG1655, complete genome | 2254 | 2290 | 100 | 0.0 | 100
U00007.1 | 47 to 48 centisome region of E. coli K12 BHB2600 | 2254 | 2254 | 100 | 0.0 | 100
CU928160.2 | Escherichia coli str. IAI1 chromosome, complete genome | 2119 | 2155 | 100 | 0.0 | 100
AP009240.1 | Escherichia coli SE11 DNA, complete genome | 2095 | 2132 | 100 | 0.0 | 100
CP000800.1 | Escherichia coli E24377A, complete genome | 2095 | 2132 | 100 | 0.0 | 100
CP000036.1 | Shigella boydii Sb227, complete genome | 2095 | 2168 | 100 | 0.0 | 100
AB426057.1 | Escherichia coli O111:H- DNA, genomic island GEI2.21 | 2087 | 2087 | 100 | 0.0 | 98
CP000034.1 | Shigella dysenteriae Sd197, complete genome | 2087 | 2160 | 100 | 0.0 | 100
CP000946.1 | Escherichia coli ATCC 8739, complete genome | 2056 | 2092 | 100 | 0.0 | 100
CP000802.1 | Escherichia coli HS, complete genome | 2032 | 2068 | 100 | 0.0 | 100
AE005674.1 | Shigella flexneri 2a str. 301, complete genome | 1992 | 2065 | 100 | 0.0 | 100
AE014073.1 | Shigella flexneri 2a str. 2457T, complete genome | 1992 | 2065 | 100 | 0.0 | 100
AE014075.1 | Escherichia coli CFT073, complete genome | 1976 | 2085 | 100 | 0.0 | 100
CU928164.2 | Escherichia coli str. IAI39 chromosome, complete genome | 1961 | 2033 | 100 | 0.0 | 100
BA000007.2 | Escherichia coli O157:H7 str. Sakai DNA, complete genome | 1961 | 2033 | 100 | 0.0 | 100
AE005174.2 | Escherichia coli O157:H7 EDL933, complete genome | 1961 | 2033 | 100 | 0.0 | 100
CP001164.1 | Escherichia coli O157:H7 str. EC4115, complete genome | 1953 | 2025 | 100 | 0.0 | 100
CP000970.1 | Escherichia coli SMS-3-5, complete genome | 1937 | 2009 | 100 | 0.0 | 100
CU928162.2 | Escherichia coli str. ED1a chromosome, complete genome | 1913 | 2021 | 100 | 0.0 | 100
FM180568.1 | Escherichia coli 0127:H6 E2348/69 complete genome, strain E2348/69 | 1905 | 1977 | 100 | 0.0 | 100
CU928161.2 | Escherichia coli str. S88 chromosome, complete genome | 1897 | 2006 | 100 | 0.0 | 100
CP000468.1 | Escherichia coli APEC O1, complete genome | 1897 | 2006 | 100 | 0.0 | 100
CP000243.1 | Escherichia coli UTI89, complete genome | 1897 | 2006 | 100 | 0.0 | 100
CU928158.2 | Escherichia fergusonii str. ATCC 35469T chromosome, complete genome | 1850 | 1924 | 100 | 0.0 | 95
CP000247.1 | Escherichia coli 536, complete genome | 1850 | 1958 | 100 | 0.0 | 100
CU928163.2 | Escherichia coli str. UMN026 chromosome, complete genome | 1842 | 1914 | 100 | 0.0 | 100
CU651637.1 | Escherichia coli LF82 chromosome, complete sequence | 1818 | 1926 | 100 | 0.0 | 100
AP000400.1 | Enterobacteria phage VT1-Sakai genomic DNA, prophage inserted region in Escherichia coli O157:H7 | 1542 | 1542 | 81 | 0.0 | 96
CP000038.1 | Shigella sonnei Ss046, complete genome | 603 | 675 | 29 | 8e-169 | 100

The search was optimized for highly similar sequences

Max ident, Maximum identities.

Figure 8.

Sequence conservation of the predicted frameshift cassette in yehP. The sequence logo was generated by aligning 31 sequences in Table 2.


DISCUSSION

The scoring system

In FSscan, the S score represents the stimulatory effect on +1 frameshifting. FSscan assigns zero to the S score for <4 base pairings between the six nucleotides upstream of the E-site and the anti-SD sequence [Equation (1)]. Equation (1) implies that at least four base pairing between mRNA and the anti-SD sequence are required to reveal the stimulatory effect. FSscan identifies yehP as the second best candidate for +1 frameshifting by using four as a threshold value in Equation (1), while the program identifies cheA as the second best candidate by using five as a threshold value. The in vivo observation that yehP6 results in higher frameshift efficiency than cheA6 (Figure 4) suggests that four base pairings could be sufficient to induce a stimulatory effect. In addition, FSscan assigns zero to the S score if the summation of the E, P and A scores is <3 [Equation (2)]. Equation (2) implies that for a less prominent synergic effect of the E-, P- and A-site for +1 frameshifting, the stimulatory effect by SD:anti-SD interaction is negligible. The E score in the program represents the effect of E-site interaction on +1 frameshifting. FSscan calculates the E score as exp (−ΔGc), where ΔGc is the codon:anticodon interaction (19) in the ribosome E-site. The interaction in ribosome E-site has been shown to affect the reading frame maintenance (14,18,27–30). Weaker codon:anticodon interactions in the ribosome E-site have also been observed to result in a higher +1 frameshift efficiency (14,18). Notably, FSscan does not account for different tRNA:ribosome interactions in the E-site. While the tRNA:ribosome interactions are important for the E-site interaction, there has not been a well-established method to estimate these interactions. Previously, it has been suggested that a major fraction of the E-site tRNA binding is contributed by the binding of the 3′-terminal adenine to the ribosome (31). As the 3′-terminal adenine is conserved in all E. 
coli tRNAs, FSscan assumes a similar level of tRNA:ribosome interactions for different tRNAs and considers only codon:anticodon interactions in the E-site. The P score represents the stability difference between the +1 frame and the zero frame interaction for the P-site tRNA. FSscan assumes the stability difference between the +1 frame and the zero frame interaction (Δstability*) as M1S1 − M2S0, where S1 is the stability of the +1 frame interaction, S0 is the stability of the zero frame interaction, and M1 and M2 are weighting factors. A separate data fitting program suggests M1 and M2 as 0.63 and 0.26, respectively, for the best linear correlation between the Δstability* and the logarithm of +1 frameshift efficiency observed by Curran (20) (Supplementary Data). The weighting factor for the +1 frame stability is 2.4-fold larger than that for the zero frame stability. Interestingly, zero frame duplexes are in general cognate but the realigned complexes contain a much wilder array of pairing and stabilities. Taken together, a favorable +1 frame interaction in the P-site may contribute more than an unstable zero frame interaction to a higher +1 frameshift efficiency. FSscan accounts for two A-site features that enhance +1 frameshifting: (i) the competition between the cognate and the near cognate aa-tRNA for the zero frame A site codon (A0 score); (ii) the competition between the cognate aa-tRNA for the zero frame A-site codon and the cognate aa-tRNA for the +1 frame A-site codon (A1 score). A ribosome pause because of a stop codon or a rare codon in the A-site is a key factor for +1 frameshifting (32,33). It has been shown that the competition between the near-cognate aa-tRNA and the cognate aa-tRNA to the ribosome A-site plays an important role on the translation rate (21). The imbalance of the zero frame A-site tRNA and the +1 frame A-site tRNA was also shown to enhance +1 frameshifting (34). 
Three +1 frameshift candidates, yehP, pepP and cheA, all have CGG C in the A-site (the space marks the end of the zero frame codon). While the average A score is 0.44, the A score for CGG C is 1.58. CGG has one cognate tRNA, with 639 molecules per cell, and four near-cognate tRNAs, with 4752, 881, 4470 and 900 molecules per cell, respectively (21). Because the near-cognate tRNAs (11,003 molecules per cell in total) far outnumber the cognate tRNA for CGG, these tRNAs compete for the ribosome A-site. In addition, the concentration of the cognate tRNA for the +1 frame A-site codon (GGC) is about 7-fold higher than that for the zero frame A-site codon (CGG). These two features may result in a longer pause during translation, making CGG C a likely A-site sequence for +1 frameshifting. The other +1 frameshift candidate, nuoE, has TGA A in the A-site. The A score for TGA A is 1.8, which is also much higher than the average A score.

The FSI for a 16-nt window is the sum of the S, E, P and A scores. The S score ranges from 0 to 2. The E score ranges from 0 to 1. The P score ranges from −1 to 1. The A score ranges from 0 to 2 because it combines A0 and A1, each ranging from 0 to 1. As a result, FSscan weighs the stimulatory, P-site and A-site effects more heavily than the E-site effect. This weighting is supported by the kinetic model of +1 PRF, which suggested that +1 frameshift efficiency is more sensitive to changes in the stimulatory signal, P-site and A-site effects (14).
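How the four scores combine can be sketched as follows. The sketch applies only the two zeroing rules stated in the text for the S score (fewer than four SD:anti-SD base pairings, or E + P + A below 3) and then sums the scores. The function names, the `raw_s` argument (the S value before the rules are applied) and the example numbers are assumptions; the actual formulas of Equations (1) and (2) are not reproduced here.

```python
def gated_s_score(raw_s: float, n_pairings: int, e: float, p: float, a: float,
                  min_pairings: int = 4, min_epa: float = 3.0) -> float:
    """Apply the two zeroing rules described in the text to a raw S score.

    raw_s is a hypothetical pre-computed stimulatory score in [0, 2].
    """
    if n_pairings < min_pairings:   # too few SD:anti-SD base pairings
        return 0.0
    if e + p + a < min_epa:         # weak combined E-, P- and A-site effect
        return 0.0
    return raw_s


def fsi(raw_s: float, n_pairings: int, e: float, p: float, a: float) -> float:
    """Frameshift index as the plain sum S + E + P + A for one window."""
    return gated_s_score(raw_s, n_pairings, e, p, a) + e + p + a


# Illustrative values only (not taken from the paper's data).
best_case = fsi(2.0, 4, 1.0, 1.0, 2.0)      # all scores at their upper bounds
few_pairings = fsi(2.0, 3, 1.0, 1.0, 2.0)   # S zeroed by the pairing rule
weak_sites = fsi(1.5, 5, 0.5, 0.5, 1.0)     # S zeroed because E + P + A < 3
```

With the ranges given above, the sum cannot exceed 6, and a window that fails either rule can reach at most 4.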

Analysis of six reading frames and pseudogenes

Analysis of the six reading frames of the E. coli genome by FSscan reveals that 192 sequences have an FSI higher than 3.5. Eighty-three of these sequences are located in annotated coding regions, but only five are in-frame with the start codon. The five cassettes are in prfB, yehP, nuoE, pepP and cheA. This result is consistent with the analysis of the 4132 protein-coding sequences (Figure 3). The function of the intergenic sequences with an FSI higher than 3.5 is not clear and requires further investigation. In addition, none of the 163 pseudogenes in the E. coli genome had a maximum FSI higher than 3.5 (data not shown).
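The filtering steps described above (FSI above 3.5, inside an annotated coding region, in-frame with the start codon) can be sketched as a simple filter. The `Hit` record, its field names and the example hits are hypothetical and are not FSscan's data model; the in-frame test assumes the offset is measured from the first base of the start codon to the first base of the P-site codon on the same strand.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    fsi: float                    # frameshift index of the 16-nt window
    gene: Optional[str]           # annotated gene, or None for an intergenic hit
    p_site_offset: Optional[int]  # nt from start codon to P-site codon, same strand


def in_frame_candidates(hits: List[Hit], threshold: float = 3.5) -> List[Hit]:
    """Keep hits above the threshold that lie in a gene and in its reading frame."""
    return [
        h for h in hits
        if h.fsi > threshold
        and h.gene is not None
        and h.p_site_offset is not None
        and h.p_site_offset >= 0
        and h.p_site_offset % 3 == 0
    ]


# Made-up hits for illustration only.
example_hits = [
    Hit(4.2, "geneA", 72),    # in a gene, in-frame: kept
    Hit(3.9, "geneB", 100),   # in a gene, out of frame: dropped
    Hit(4.0, None, None),     # intergenic: dropped
    Hit(3.1, "geneC", 30),    # below threshold: dropped
]
kept = [h.gene for h in in_frame_candidates(example_hits)]
```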

yehP

yehP contains a potential +1 frameshift site with the second highest FSI, after only prfB. The predicted frameshift site in yehP is highly conserved in different E. coli strains (Table 2 and Figure 8). The potential cassette, GTG GAG TAT (the zero frame is separated by a space and the P-site position is underlined), forms four base pairings with the anti-SD sequence and allows a weaker interaction in the E-site. In the P-site, the tRNA may form two canonical base pairings with the +1 frame, although a central position mismatch can also occur. Notably, it has been proposed that <2 base pairings in the shifted codon:anticodon complex may be sufficient for efficient frameshifting (35). In a more extreme case, mRNA sites with little or no potential for canonical base pairing with the peptidyl-tRNA in the ribosome can also be used as landing positions for ribosomal bypassing (36). In the A-site, CGG is one of the four codons with the highest near-cognate tRNA competition (21). All of these features make yehP a potential +1 frameshifting candidate.

To date, the function of the yehP product is not well described in the literature. A known +1 PRF case in E. coli is the expression of RF2 from the prfB gene (3). RF2 frameshifting is auto-regulated, meaning that higher frameshift efficiency is driven by a lower level of the frameshifted product (3). It has been suggested that this auto-regulation may have evolved to evade a newly discovered fidelity control system, in which the ribosome triggers premature termination of protein synthesis when a mismatched P-site interaction is present (37). RF2 frameshifting occurs more frequently when the RF2 level is low, making it more difficult for ribosomes to trigger early termination in the presence of a P-site mismatch. Whether yehP is involved in any regulatory feedback loop or uses another mechanism to escape this fidelity control is uncertain. A yehP knockout E. coli strain was previously shown to have a different swarming phenotype (38). yehP was suggested to have been introduced into the E. coli genome by horizontal gene transfer (39). The predicted frameshifted product is 75 amino acids shorter than the standard decoding product. The function of the yehP frameshift protein remains unclear and needs to be investigated further.

Other frameshift-prone sequences

FSscan did not identify several shift-prone sequences observed experimentally in previous studies (40,41). argI was found to have a high level of +1 frameshifting at the very beginning of the coding sequence, UUU UAU (40). However, the maximum FSI in the gene is relatively low (2.0 for the P-site at the 110th codon). For the P-site positioned at the fourth codon, UUU, the FSI equals 0.38. Because argI frameshifting does not involve ribosomal pausing at a stop codon or a hungry codon in the A-site, the recoding may be achieved through mechanisms not considered by FSscan. In addition, the CCC TGA containing genes pheL, yjeF, ykgD and yrhB were also shown to result in a higher level of +1 frameshifting (41). Notably, these sequences form no more than three base pairings with the anti-SD sequence and their E-site interactions are relatively strong, which results in a lower FSI. It is possible that a slippery sequence in the P-site (i.e. one where the P-site tRNA can form complementary interactions with the +1 frame) along with a stop codon in the A-site can efficiently induce +1 frameshifting, a combination that FSscan does not consider. On the other hand, not all of the CCC TGA containing genes promote efficient +1 frameshifting, suggesting that different mechanisms may be involved in pheL, yjeF, ykgD and yrhB frameshifting. As growing numbers of +1 frameshifting features are discovered, these features can be incorporated into FSscan to better predict frameshift sites.

FSscan as a bioinformatic program to search for novel +1 frameshift sequences

FSscan locates 16-nt sequences in the E. coli genome that carry features for stimulatory signals and for E-, P- and A-site effects. FSscan differs from previous +1 frameshift site searching programs (11,12) in several major ways. (i) FSscan is not limited to a specific P- or A-site codon. Instead, FSscan looks for any P-site codon with a higher opportunity for tRNA rearrangement and any A-site codon with a higher likelihood of ribosome pausing during translation. (ii) The algorithm does not search for overlapping genes. Thus, it is not necessary that predicted frameshifting cassettes yield C-terminally extended fusion products. (iii) FSscan is intended for searching the E. coli genome, because the tRNA data for the score calculation and the experimental system are specific to E. coli. FSscan may be directly applied to screen the genomes of E. coli bacteriophages, whose proteins are translated by the E. coli ribosomes and tRNA pool. The strategy can be extended to other organisms with minor adjustments to the scoring system. (iv) FSscan predicts how likely a sequence is to be a frameshift site, but not the +1 frameshift efficiency. (v) FSscan needs no prior knowledge of the mRNA secondary structure involved in recoding. The method can be modified by varying the size of the recoding window to include mRNA structures serving as stimulatory signals.
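The 16-nt window scan can be sketched as below. The layout used here (six upstream nucleotides, then the E-, P- and A-site codons, then one extra nucleotide that completes the +1 frame A-site codon) is an assumption inferred from the description in the text, since 6 + 3 + 3 + 3 + 1 = 16. The function name `iter_windows` and the toy sequence are hypothetical, and the scoring itself is left out.

```python
from typing import Dict, Iterator, Tuple

WINDOW = 16  # window length in nucleotides


def iter_windows(cds: str) -> Iterator[Tuple[int, Dict[str, str]]]:
    """Yield (start, parts) for every in-frame 16-nt window of a coding sequence.

    Assumed layout: 6 nt upstream | E codon | P codon | A codon | 1 nt.
    """
    for start in range(0, len(cds) - WINDOW + 1, 3):  # step one codon at a time
        w = cds[start:start + WINDOW]
        yield start, {
            "upstream": w[0:6],     # region checked against the anti-SD sequence
            "E": w[6:9],            # zero frame E-site codon
            "P": w[9:12],           # zero frame P-site codon
            "A": w[12:15],          # zero frame A-site codon
            "A_plus1": w[13:16],    # A-site codon after a +1 shift
        }


# Toy sequence for illustration only.
toy_cds = "ATGAGGGGGTATCTTTGACTA"
windows = list(iter_windows(toy_cds))
```

Each yielded window would then be passed to the S, E, P and A scoring functions; changing `WINDOW` and the slice boundaries is one way to realize the variable recoding window mentioned in point (v).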

CONCLUSION

FSscan performs a mechanism-based genetic algorithm search for potential +1 frameshift sites in E. coli. The program successfully identifies prfB as a +1 frameshift candidate and predicts the frameshift site in this gene. Other predicted frameshift cassettes are shown to result in higher frameshift efficiency in vivo than a randomly designed sequence. These results suggest that the synergistic effects of the ribosome E-, P- and A-sites are functionally important for +1 frameshifting. Importantly, FSscan provides the ability to perform a genome-wide systematic search for +1 frameshift sites. Further investigation of the predicted +1 frameshift sequences is in progress. Knowledge of different frameshift sites will enable researchers to better understand translational control.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

FUNDING

The University of Delaware. Funding for open access charge: University of Delaware internal funds. Conflict of interest statement. None declared.