Predicting genes expressed via -1 and +1 frameshifts.

Sanghoon Moon, Yanga Byun, Hong-Jin Kim, Sunjoo Jeong, Kyungsook Han.

Abstract

Computational identification of ribosomal frameshift sites in genomic sequences is difficult due to their diverse nature, yet it provides useful information for understanding the underlying mechanisms and discovering new genes. We have developed an algorithm that searches entire genomic or mRNA sequences for frameshifting sites, and implemented it as a web-based program called FSFinder (Frameshift Signal Finder). The current version of FSFinder is capable of finding -1 frameshift sites on heptamer sequences X XXY YYZ, and +1 frameshift sites for two genes: protein chain release factor B (prfB) and ornithine decarboxylase antizyme (oaz). We tested FSFinder on approximately 190 genomic and partial DNA sequences from a number of organisms and found that it predicted frameshift sites efficiently and with greater sensitivity and specificity than existing approaches. Its sensitivity is improved because it considers many known components of a frameshifting cassette and searches for these components on both + and - strands, and its specificity is increased because it focuses on overlapping regions of open reading frames and prioritizes candidate frameshift sites. FSFinder is useful for discovering unknown genes that utilize alternative decoding, as well as for analyzing frameshift sites. It is freely accessible at http://wilab.inha.ac.kr/FSFinder/.

Year:  2004        PMID: 15371551      PMCID: PMC519117          DOI: 10.1093/nar/gkh829

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Programmed ribosomal frameshifting is involved in the expression of certain genes in a wide range of organisms such as viruses, bacteria and eukaryotes including humans (1–5). In this process, the ribosome switches to an alternative frame at a specific site in response to special signals in the messenger RNA (4). Programmed frameshifting plays a significant role in morphogenesis, autogenous control and in producing alternative enzymatic activities (6). The most common frameshift is a −1 frameshift, in which the ribosome slips a single nucleotide in the upstream direction. The major elements of −1 frameshifting consist of a slippery site, where the ribosome changes reading frames, and a stimulatory RNA structure such as a pseudoknot or a stem–loop located a few nucleotides downstream (4,6–9). It is generally accepted that ribosomes pause at −1 frameshifts, but Kontos et al. (7) report that pausing is not sufficient to mediate frameshifting. Most slippery sites consist of a heptameric sequence of the form X XXY YYZ in the incoming 0-frame (10), but there are other slippery sequences that do not conform to this motif (5). The slippery heptamer is separated from the stimulatory structure by a sequence of 5–9 nt, the so-called spacer (3,8). The length of the spacer is known to influence the efficiency of frameshifting. Frameshifts typically produce fusion proteins in which the N- and C-terminal domains are encoded by overlapping open reading frames (ORFs) (9), as shown in Figure 1.
Figure 1

The three components of −1 frameshift signals in the overlap between two ORFs: slippery sequence, spacer and pseudoknot (or stem–loop). When a frameshift takes place, protein synthesis terminates at C rather than at B.

+1 frameshifts are much less common than −1 frameshifts but have been observed in diverse organisms (6). Escherichia coli prfB encoding release factor 2 (RF2) is a well-known gene that utilizes +1 frameshifting (11,12). In RF2 frameshifting, a Shine–Dalgarno (SD) sequence is often observed upstream of a slippery sequence, normally CUU UGA C and in a single known case CUU UAA C (12). Several +1 frameshift sites have also been recognized in eukaryotic mRNA. For example, the expression of mammalian antizyme 1 (AZ1) requires a +1 frameshift, and the frameshift signal consists of a slippery sequence and two stimulatory elements: a sequence of unknown function upstream of the slippery sequence, and a pseudoknot (13).

Computational identification of frameshift sites from genomic sequences is difficult since the sequence requirements for frameshifting cassettes are diverse and highly dependent on the organism. Several computational approaches have been attempted, but only a few are publicly available. The model for eukaryotic −1 frameshifting developed by Bekaert et al. (8) only considers H-type pseudoknots as stimulatory structures and misses many frameshift sites with other stimulatory structures. Hammell et al. (9) developed a program to identify −1 frameshift sites in prokaryotic and eukaryotic DNA sequences, but the sensitivity of their approach is low; it misses many frameshift sites because it only considers downstream pseudoknots, and its definition of a pseudoknot is too restrictive. For example, their approach does not locate the frameshift sites in Rous sarcoma virus (RSV), because loops 1 and 2 of the pseudoknot are larger than permitted by their approach. FreqAnalysis, developed by Shah et al. (14), can identify simple novel slippery sequences, but it does not take into account the existence of stimulators. A semi-automated approach by Ivanov et al. (13) finds a gene where antizyme frameshifting is expected to occur and then identifies the frameshift. While this approach has been shown to be successful for identifying ornithine decarboxylase antizyme (oaz) frameshifting, it lacks generality. There are also computational approaches that identify frameshifting errors in sequencing when the reference protein sequences are available (15–17).

In this paper, we present an algorithm for locating −1 and +1 frameshift sites of certain types in genomic or mRNA sequences. The algorithm is intended to find −1 frameshift sites of X XXY YYZ type in viruses, bacteria and eukaryotes, and considers pseudoknots as well as simple stem–loops as downstream stimulatory structures. It also allows the user to change the stem and loop sizes from their default values. +1 frameshift signals vary too much among organisms to be captured by a single model. Therefore, the algorithm currently finds only those frameshift sites that are conserved among many species, namely frameshift sites used in genes encoding protein chain release factor B (prfB) and ornithine decarboxylase antizyme (oaz). The algorithm has been implemented as a web-based application program called FSFinder (Frameshift Signal Finder), and is accessible at http://wilab.inha.ac.kr/FSFinder/.

COMPUTATIONAL MODEL

Components of frameshift signals

We have modified the computational model for −1 frameshift signals of Hammell et al. (9) to improve its sensitivity and selectivity. Sequences of three codons (9 nt) in a genomic sequence are first examined for possible slippery sequences of the form X XXY YYZ. In this sequence X and Z can be any nucleotide, and Y can be A or U (in Hammell's model, Z is either A, U or C). If a slippery sequence is identified, FSFinder searches for a downstream structure by sliding 4–11 nt along the spacer. Figure 2 shows a programmed −1 frameshift site with a pseudoknot as stimulatory structure. The pseudoknot is of the H-type, in which stem 1 has ≤13 bp, stem 2 has ≤6 bp, and both loops of the pseudoknot have ≤6 nt. The first 4 bp of stem 1 include at least 2 G–C pairs. Some programmed −1 frameshift signals have a simple stem–loop as stimulatory structure. As explained in Figure 3, we examine the sequence in both directions from every pivot nucleotide for possible base pairing. The pivot nucleotide can be either included in, or excluded from, the base pairing.
Figure 2

A programmed −1 ribosomal frameshift signal with an H-type pseudoknot.
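
The slippery-site rule described above (X and Z any nucleotide, Y either A or U) can be expressed as a short pattern scan. The following is only an illustrative sketch, not FSFinder's code: the function name `find_slippery_sites` is hypothetical, and the scan reports matches in every reading frame rather than only in the incoming 0-frame.

```python
import re

# X XXY YYZ: X and Z are any nucleotide, Y is A or U (constraint taken from the text).
# The lookahead lets overlapping candidates be reported as well.
HEPTAMER = re.compile(r"(?=([ACGU])\1\1([AU])\2\2[ACGU])")

def find_slippery_sites(seq):
    """Return (0-based position, heptamer) for every X XXY YYZ match.

    Hypothetical helper: DNA input is converted to RNA letters, and the
    reading frame is ignored (a real search would restrict it).
    """
    rna = seq.upper().replace("T", "U")
    return [(m.start(), rna[m.start():m.start() + 7])
            for m in HEPTAMER.finditer(rna)]

if __name__ == "__main__":
    print(find_slippery_sites("CCGGGAAACTT"))
```

A motif such as G GAU UUA, which does not repeat the first nucleotide three times, is not reported by this pattern.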

Figure 3

Finding a simple stem–loop structure downstream of a slippery sequence. Nucleotides in both directions from each pivot nucleotide are examined for possible base pairing.
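
One way to read the pivot-based search is as an outward extension of base pairs around a candidate loop. The sketch below makes several assumptions that are not in the text: only Watson–Crick pairs are allowed, the loop is centred on (or just beside) the pivot, and the names `stem_at_pivot`, `find_stem_loops`, `min_stem`, `min_loop` and `max_loop` and their defaults are hypothetical.

```python
# Watson-Crick pairs only (an assumption; wobble pairs are not considered here).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}

def stem_at_pivot(rna, pivot, loop_len, min_stem=4):
    """Extend base pairs outward around a loop of loop_len nt placed near pivot.

    Returns (stem_start, stem_end_inclusive, n_pairs), or None when fewer
    than min_stem consecutive pairs can be stacked.
    """
    left = pivot - loop_len // 2 - 1   # last nucleotide 5' of the loop
    right = left + loop_len + 1        # first nucleotide 3' of the loop
    pairs = 0
    while left >= 0 and right < len(rna) and (rna[left], rna[right]) in PAIRS:
        pairs += 1
        left -= 1
        right += 1
    if pairs < min_stem:
        return None
    return (left + 1, right - 1, pairs)

def find_stem_loops(rna, min_loop=3, max_loop=8, min_stem=4):
    """Try every pivot and every allowed loop length; collect all stems found."""
    hits = []
    for pivot in range(len(rna)):
        for loop_len in range(min_loop, max_loop + 1):
            hit = stem_at_pivot(rna, pivot, loop_len, min_stem)
            if hit is not None:
                hits.append(hit)
    return hits

if __name__ == "__main__":
    print(find_stem_loops("GGGCAAAAGCCC"))
```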

Frameshifting can produce longer or shorter proteins than those resulting from standard decoding (4), as shown in Figure 4. FSFinder currently finds frameshift sites that result in longer products (Figure 4A), and ignores those resulting in shorter products (Figure 4B), since it focuses on frameshift sites in the overlapping region of ORFs. An exception to this is the E.coli dnaX gene. Although dnaX −1 frameshifting results in a shorter product, FSFinder finds its frameshift site using information about the upstream SD-like sequence (18). The SD-like sequence is simplified to GGRG or RGGR in the sequence located 9 nt upstream of the slippery sequence.
Figure 4

Frameshifting may result in a long (A) or short product (B).

Since +1 frameshift signals are too diverse to model, we focus on +1 frameshift signals in two of the most common genes known to utilize frameshifting: protein chain release factor B (prfB) encoding release factor 2 (RF2), in prokaryotes (12), and ornithine decarboxylase antizyme (ODC antizyme, oaz), in eukaryotes (13). To detect prfB signals, FSFinder first searches for CUU UGA C or CUU UAA C slippery motifs. It then searches for an SD sequence 3 nt upstream of the motif; the SD sequence is simplified to a 5 nt stretch containing RGG. To detect oaz signals, FSFinder searches for UUU, UCC or CCC codons together with a UGA termination codon, a 3′ RNA pseudoknot, or both. Figure 5 shows a model of +1 frameshift signals. The AUU codon that occurs upstream of UGA in the Dugesia japonica antizyme frameshift site was not taken into account, since this is the only known case in which such a frameshift site is utilized (19).
Figure 5

Programmed +1 ribosomal frameshift signals for eukaryotic oaz and prokaryotic prfB genes.
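
The prfB search described above can be sketched as a two-step check: find the slippery motif, then look for RGG in a short upstream window. This is an illustration under assumptions, not FSFinder's implementation: the text's "3 nt upstream" is read here as a fixed 3 nt gap followed by a 5 nt window, and the names `find_prfb_signals`, `spacer` and `sd_window` are hypothetical.

```python
import re

SLIPPERY_PRFB = re.compile(r"CUUU[AG]AC")  # CUU UGA C or CUU UAA C
SD_CORE = re.compile(r"[AG]GG")            # simplified SD core: RGG, R = A or G

def find_prfb_signals(seq, spacer=3, sd_window=5):
    """Return (motif position, SD window, motif) for each candidate signal.

    Assumed layout: [5 nt SD window][3 nt gap][slippery motif].
    """
    rna = seq.upper().replace("T", "U")
    hits = []
    for m in SLIPPERY_PRFB.finditer(rna):
        window_end = m.start() - spacer
        window_start = window_end - sd_window
        if window_start < 0:
            continue  # not enough upstream sequence to test for an SD
        window = rna[window_start:window_end]
        if SD_CORE.search(window):
            hits.append((m.start(), window, m.group()))
    return hits

if __name__ == "__main__":
    print(find_prfb_signals("AGGGGGTATCTTTGAC"))
```

Changing `spacer` or `sd_window` shows how sensitive such a rule is to the assumed geometry, which is one reason a fixed SD definition can miss real sites.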

Algorithms for predicting frameshift sites

Algorithms 1 and 2 search for stem–loops and canonical base pairs, respectively. When bases of a single-stranded loop pair with complementary bases outside the loop, they are considered to form a pseudoknot (20). Algorithm 3 finds an overlap of ORFs. This is found as follows: suppose that a pair of ORFs is identified in frame 0 and frame −1, respectively (see Figure 6); the start positions of the ORFs are extended from their original start codons to upstream stop codons (positions A and C in Figure 6). The extended regions A–B and C–D of the two ORFs partially overlap at their termini if position A of frame −1 is to the left of position D of frame 0 and there exists a start codon in frame 0. FSFinder focuses on frameshift sites in the region of overlap (region E in Figure 6).
Figure 6

The reading frame A–B (region that starts at A and ends at B) and the reading frame C–D partially overlap at their termini. FSFinder focuses on finding frameshift sites in the overlap region E.
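
The overlap search can be approximated by computing stop-to-stop regions in two frames and intersecting them. The sketch below is a simplified reading of the description, not Algorithm 3 itself: it assumes RNA input, takes the first in-frame AUG as the frame-0 start, keeps only cases where the shifted frame stays open past the frame-0 stop, and uses hypothetical function names.

```python
STOPS = {"UAA", "UAG", "UGA"}

def open_regions(rna, frame):
    """Stop-to-stop open regions in one frame as (start, end), end exclusive."""
    regions, start = [], frame
    for i in range(frame, len(rna) - 2, 3):
        if rna[i:i + 3] in STOPS:
            if i > start:
                regions.append((start, i))
            start = i + 3
    last = frame + ((len(rna) - frame) // 3) * 3  # end of the last full codon
    if last > start:
        regions.append((start, last))
    return regions

def first_start(rna, region):
    """Position of the first in-frame AUG inside region, or None."""
    for i in range(region[0], region[1], 3):
        if rna[i:i + 3] == "AUG":
            return i
    return None

def overlap_windows(rna, shift=-1):
    """Windows where a frame-0 ORF overlaps an open region of the shifted frame."""
    shifted = open_regions(rna, shift % 3)  # -1 -> offset 2, +1 -> offset 1
    windows = []
    for r0 in open_regions(rna, 0):
        aug = first_start(rna, r0)
        if aug is None:
            continue  # the text requires a start codon in frame 0
        for b0, b1 in shifted:
            if b0 < r0[1] < b1:  # shifted frame continues past the frame-0 stop
                lo = max(aug, b0)
                if lo < r0[1]:
                    windows.append((lo, r0[1]))
    return windows

if __name__ == "__main__":
    print(overlap_windows("AUG" + "GCA" * 4 + "UAA" + "GCAGCAGCA"))
```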

Implementation

FSFinder has been implemented as a web-based application program using Microsoft C#. It can be executed on a Windows NT/2000/XP system with Microsoft .NET framework installed. Given a DNA or mRNA sequence in GenBank or FASTA format, it shows three frames (−1, 0 and +1 frames) in the upper left window (Figure 7). It considers one start codon, AUG, and three stop codons, UAA, UAG and UGA, for the three frames. Users are asked to choose from a list of available types of frameshifting (e.g. dnaX type, oaz type, etc.), the sequence size, and whether the search should be performed in the + or − strand during the file open operation. This information is used to determine the method of finding genes in the given sequence. For a bacterial genome with the prfB gene or sequence with the oaz gene, FSFinder first finds a gene in a manner similar to Glimmer (21). For a full genomic sequence specified as − strand by a user, frameshift sites are found in the reverse complementary sequence. Candidate −1 and +1 frameshift sites are shown below in the three frame views. +1 frameshift signals are set to prfB signals by default, but can be switched to oaz signals using the run menu. If a user specifies a region for detailed examination by the drag and drop operation, the specified region is enlarged in the lower left window.
Figure 7

Graphical user interface of FSFinder. (A) Stop codons (long, blue lines). (B) Start codons (short, red lines). (C) Frameshift signal with the highest probability (light yellow). (D) Frameshift signal with a stem–loop (green bar). (E) Frameshift signal with a pseudoknot (pink bar).
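
Searching the − strand amounts to running the same search on the reverse complement of the input. A minimal sketch for DNA input follows; the helper name `reverse_complement` is illustrative and ambiguity codes are not handled.

```python
# Complement table for unambiguous DNA letters only.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(dna):
    """Return the reverse complement of a DNA string (upper-cased)."""
    return dna.upper().translate(_COMPLEMENT)[::-1]

if __name__ == "__main__":
    print(reverse_complement("ATGC"))
```

Positions found on the reverse complement have to be mapped back to the original coordinates, for example as `len(dna) - 1 - position` for a single nucleotide.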

The right window of FSFinder consists of three panels (Figure 7) for selection details, −1 signals, and +1 signals. The panel for selection details shows the start and stop codons, slippery sequences, pseudoknots and stem–loops (Figure 7). The panels for −1 and +1 signals show the total number of signals detected in overlapping and non-overlapping regions of the frames, as well as the positions of the signals. Users can also choose the range of a view using the draw option in the draw menu, and change the stem and loop sizes of a stem–loop or pseudoknot using the find option in the run menu. They can also alternate frames to find frameshift sites in different overlapping frames using the analysis menu. Overlapping frames with the largest ORF (light grey) have the highest probability of containing frameshift sites, and overlapping frames with the second largest ORF (dark grey) have the second highest probability of containing frameshift sites (see Figure 8).
Figure 8

Alternating ORFs.

RESULTS AND DISCUSSION

We tested FSFinder on 71 organisms with known programmed −1 frameshift sites obtained from the databases PseudoBase (22) and RECODE (23). At the time this work was performed, PseudoBase contained 20 eukaryotic viruses, while RECODE had 65 prokaryotes, eukaryotic viruses, bacteriophages, eukaryotic transposable elements and bacterial insertion sequences. The two databases share 14 frameshifts. Each of these organisms and elements has one or two authentic programmed −1 frameshift sites for 27 genes in total. FSFinder identifies more potential frameshift sites than the approach of Hammell et al. (9) because both pseudoknots and simple stem–loops are considered as downstream secondary structures and because the conditions for slippery motifs and pseudoknots are relaxed. On the other hand, it finds fewer candidates for non-programmed frameshift sites than the approach of Bekaert et al. (8) because it only searches for frameshift sites in the overlapping regions of ORFs, and prioritizes candidate frameshift signals. The presence of a frameshift site in the overlap of two ORFs increases the likelihood that the site is actually used for gene expression. In total, 26 frameshift sites in RECODE have simple stem–loops as downstream secondary structures, but 5 of these were excluded because PseudoBase assigns them different stimulatory structures or sequences. Eighteen of the remaining 21 frameshift sites were detected by FSFinder, while 3 could not be found because their slippery sequences do not conform to the motif X XXY YYZ (Table 1). It turns out that most bacterial frameshift sites have the slippery motif X XXY YYG. FSFinder identified 13 such sequences, and these can be classified into two types: A AAA AAG and G GGA AAG.
Table 1.

Frameshift sites in RECODE with downstream stem–loops and X XXY YYG slippery sequences

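| RECODE ID | Organism | Slippery sequence | Downstream structure |
|---|---|---|---|
| 71 | Escherichia coli | X XXY YYG | stem |
| 82 | HIV type 1 | X XXY YYZ (Z ≠ G) | stem |
| 83 | HIV type 2 | X XXY YYZ (Z ≠ G) | stem |
| 84 | Human T-cell lymphotropic virus type 1 | X XXY YYZ (Z ≠ G) | stem |
| 85 | Human T-cell lymphotropic virus type 2 | X XXY YYZ (Z ≠ G) | stem |
| 92 | Red clover necrotic mosaic virus (a) | X XXY YYZ (Z ≠ G) | stem |
| 97 | Simian T-cell lymphotropic virus type 1 | X XXY YYZ (Z ≠ G) | stem |
| 104 | Bacteriophage lambda | X XXY YYG | other |
| 106 | Drosophila buzzatii Osvaldo retrotransposon | X XXY YYZ (Z ≠ G) | stem |
| 237 | IS2 | X XXY YYG | other |
| 238 | IS911 | X XXY YYG | stem |
| 251 | IS150 | X XXY YYG | stem |
| 252 | IS1221A | X XXY YYG | stem |
| 257 | Carrot mottle mimic virus (a) | X XXY YYZ (Z ≠ G) | stem |
| 258 | Groundnut rosette virus | X XXY YYZ (Z ≠ G) | stem |
| 260 | Pea enation mosaic virus RNA 2 (a) | X XXY YYZ (Z ≠ G) | stem |
| 360 | Salmonella typhi | X XXY YYG | stem |
| 361 | Salmonella typhimurium | X XXY YYG | stem |
| 362 | Vibrio cholerae | X XXY YYG | stem |
| 363 | Neisseria meningitidis | X XXY YYG | stem |
| 364 | Neisseria gonorrhoeae | X XXY YYG | stem |
| 365 | Neisseria meningitidis | X XXY YYG | stem |
| 392 | Yersinia pestis | X XXY YYG | stem |

(a) Indicates a frameshift site that was not identified by FSFinder because the slippery sequence did not conform to the motif X XXY YYZ.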

Searching for frameshift signals in the overlapping region of ORFs is effective in predicting strong candidates for programmed frameshift sites. For example, a total of 582 potential −1 frameshift sites were found in the sequences of the test cases in PseudoBase. Only 40 of these were in overlapping ORFs, and only 21 of the 40 proved to be genuine frameshift sites. FSFinder also identifies frameshift sites in alternative frames. For example, simian type D virus 1 has two slippery sequences G GGA AAC and A AAU UUU in different frames at positions 2058 and 2585, respectively. FSFinder detected two different sites in each of six viruses in RECODE: human T-cell lymphotropic virus type 2, mouse mammary tumor virus, simian type D virus 1, simian retrovirus type 2, simian T-cell lymphotropic virus type 1 and visna virus. Only one alternative site (in mouse mammary tumor virus) could not be identified as it had a different motif (G GAU UUA). FSFinder could not detect the nine frameshift sites marked with ‘a’ in Table 2. As mentioned earlier, it only considers frameshift sites resulting in a long product, and those missed are associated with a short product.
Table 2.

Predictions for −1 frameshift sites in PseudoBase and RECODE

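| ID | Organism | TP | FN | FP | TN |
|---|---|---|---|---|---|
| PKB1 | BLV | 1 | 0 | 4 | 40 |
| PKB2 | BWYV | 1 | 0 | 3 | 16 |
| PKB3 | EIAV | 1 | 0 | 2 | 41 |
| PKB4 | FIV | 1 | 0 | 1 | 41 |
| PKB42 | PLRV-W | 1 | 0 | 1 | 13 |
| PKB43 | PLRV-S | 1 | 0 | 0 | 13 |
| PKB44 | CABYV | 1 | 0 | 0 | 10 |
| PKB45 | PEMV | 1 | 0 | 2 | 12 |
| PKB46 | BYDV-NY_RPV | 1 | 0 | 1 | 12 |
| PKB80 | MMTV | 2 | 0 | 0 | 34 |
| PKB106 | IBV | 1 | 0 | 0 | 65 |
| PKB107 | SRV1_gag/pro | 2 | 0 | 0 | 33 |
| PKB127 | EAV (a) | 0 | 1 | 1 | 41 |
| PKB128 | BEV | 1 | 0 | 1 | 53 |
| PKB171 | HCV_229E | 1 | 0 | 0 | 55 |
| PKB174 | RSV | 1 | 0 | 0 | 17 |
| PKB217 | LDV-C | 1 | 0 | 0 | 36 |
| PKB218 | PRRSV-16244B | 1 | 0 | 1 | 43 |
| PKB233 | PRRSV-LV | 1 | 0 | 0 | 32 |
| PKB240 | BChV | 1 | 0 | 2 | 17 |
| RECODE71 | E.coli | 1 | 0 | 0 | 4 |
| RECODE72 | Drosophila TE | 1 | 0 | 0 | 33 |
| RECODE73 | Human astrovirus | 1 | 0 | 1 | 7 |
| RECODE79 | Giardiavirus | 1 | 0 | 0 | 7 |
| RECODE80 | D.melanogaster gypsy TE | 1 | 0 | 0 | 21 |
| RECODE82 | HIV type 1 | 1 | 0 | 0 | 40 |
| RECODE83 | HIV type 2 | 1 | 0 | 0 | 13 |
| RECODE84 | Human T-cell lymphotropic 1 | 1 | 0 | 5 | 22 |
| RECODE85 | Human T-cell lymphotropic 2 | 2 | 0 | 0 | 16 |
| RECODE86 | IAP | 1 | 0 | 1 | 16 |
| RECODE88 | S.cerevisiae L-A | 1 | 0 | 0 | 15 |
| RECODE89 | Murine hepatitis V. | 1 | 0 | 0 | 49 |
| RECODE91 | Mason-Pfizer monkey V. | 2 | 0 | 0 | 33 |
| RECODE92 | Red clover necrotic mosaic V. (a) | 0 | 1 | 0 | 13 |
| RECODE94 | SIV | 1 | 0 | 2 | 18 |
| RECODE95 | Simian type D V. 1 | 2 | 0 | 0 | 30 |
| RECODE96 | Simian retrovirus 2 | 1 | 0 | 1 | 33 |
| RECODE97 | Simian T cell lymphotropic virus 1 | 2 | 0 | 3 | 25 |
| RECODE98 | Visna virus | 2 | 0 | 0 | 31 |
| RECODE99 | Bacteriophage T7 (a) | 0 | 1 | 0 | 0 |
| RECODE104 | Bacteriophage lambda | 1 | 0 | 0 | 0 |
| RECODE105 | Cocksfoot mottle virus | 1 | 0 | 0 | 5 |
| RECODE106 | D.buzzatii Osvaldo retrotransposon | 1 | 0 | 1 | 4 |
| RECODE107 | D.ananassae Tom retrotransposon | 1 | 0 | 0 | 33 |
| RECODE108 | Gill-associated virus | 1 | 0 | 0 | 16 |
| RECODE110 | T.vaginalis virus 2 (a) | 0 | 1 | 0 | 6 |
| RECODE114 | B.subtilis (a) | 0 | 1 | 0 | 3 |
| RECODE115 | D.melanogaster telomeric retrotransposon Het-A (a) | 0 | 1 | 0 | 22 |
| RECODE118 | Enzootic nasal tumor V. | 1 | 0 | 1 | 15 |
| RECODE233 | Potato leafroll V. | 1 | 0 | 1 | 9 |
| RECODE235 | IS1 | 1 | 0 | 1 | 2 |
| RECODE236 | IS3 (a) | 0 | 1 | 0 | 3 |
| RECODE237 | IS2 | 1 | 0 | 0 | 1 |
| RECODE238 | IS911 | 1 | 0 | 1 | 6 |
| RECODE249 | Cereal yellow dwarf V. RPV-NY | 1 | 0 | 1 | 9 |
| RECODE250 | Cereal yellow dwarf V. RPV-Mex | 1 | 0 | 0 | 3 |
| RECODE251 | IS150 | 1 | 0 | 0 | 3 |
| RECODE252 | IS1221A | 1 | 0 | 0 | 30 |
| RECODE257 | Carrot mottle mimic V. (a) | 0 | 1 | 0 | 6 |
| RECODE258 | Groundnut rosette V. | 1 | 0 | 0 | 14 |
| RECODE260 | PEMV2 (a) | 0 | 1 | 0 | 13 |
| RECODE360 | S.typhi | 1 | 0 | 0 | 6 |
| RECODE361 | S.typhimurium | 1 | 0 | 0 | 6 |
| RECODE362 | V.cholerae | 1 | 0 | 0 | 5 |
| RECODE363 | N.meningitidis | 1 | 0 | 0 | 7 |
| RECODE364 | N.gonorrhoeae | 1 | 0 | 0 | 8 |
| RECODE365 | N.meningitidis | 1 | 0 | 0 | 9 |
| RECODE375 | M.musculus | 1 | 0 | 0 | 19 |
| RECODE376 | H.sapiens | 1 | 0 | 0 | 28 |
| RECODE392 | Y.pestis | 1 | 0 | 0 | 7 |
| RECODE393 | SARS coronavirus | 1 | 0 | 1 | 62 |

(a) Indicates a frameshift site missed by FSFinder because a slippery sequence did not conform to the motif X XXY YYZ. TE: transposable element. TP: true positives, TN: true negatives, FP: false positives, FN: false negatives.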

We also tested FSFinder on 75 organisms in RECODE with known +1 frameshift cassettes in the prfB and oaz genes, and successfully detected 62 out of 75. The reasons FSFinder missed 13 of the sites were as follows. Nine (RECODE19, RECODE34, RECODE35, RECODE37, RECODE44, RECODE52, RECODE64, RECODE367, RECODE369) of the 13 sequences were partial DNA sequences that have a truncated ORF (entire genomic sequences were not available in GenBank), and FSFinder could not find an overlap of ORFs. In three (RECODE9, RECODE14, RECODE21 in Table 3) of the 13 sequences, there was no pair of overlapping ORFs since one of the ORFs has no start codon. One (RECODE43 in Table 3) of the 13 sequences has an SD sequence (GGUG) that differs from FSFinder's definition of an SD signal, and could not be detected.
Table 3.

Predictions for +1 frameshift sites in RECODE

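| ID | Organism | TP | FN | FP | TN |
|---|---|---|---|---|---|
| RECODE1 | B.mori | 1 | 0 | 0 | 1 |
| RECODE2 | B.fuckeliana | 1 | 0 | 0 | 0 |
| RECODE3 | C.elegans | 1 | 0 | 0 | 2 |
| RECODE4 | D.rerio (long form) | 1 | 0 | 0 | 1 |
| RECODE5 | D.rerio (short form) | 1 | 0 | 0 | 1 |
| RECODE6 | D.melanogaster | 1 | 0 | 1 | 3 |
| RECODE7 | A.nidulellus | 1 | 0 | 0 | 0 |
| RECODE8 | G.gallus | 1 | 0 | 0 | 1 |
| RECODE9 | G.pallida | 0 | 0 | 0 | 1 |
| RECODE10 | H.contortus | 1 | 0 | 0 | 0 |
| RECODE11 | H.sapiens | 1 | 0 | 0 | 1 |
| RECODE12 | H.sapiens | 1 | 0 | 0 | 4 |
| RECODE13 | H.sapiens | 1 | 0 | 0 | 0 |
| RECODE14 | H.sapiens | 0 | 0 | 0 | 2 |
| RECODE15 | M.auratus | 1 | 0 | 0 | 2 |
| RECODE16 | M.musculus | 1 | 0 | 0 | 2 |
| RECODE17 | M.musculus | 1 | 0 | 0 | 2 |
| RECODE18 | M.musculus | 1 | 0 | 0 | 0 |
| RECODE19 | N.americanus | 0 | 0 | 0 | 2 |
| RECODE20 | O.volvulus | 1 | 0 | 0 | 1 |
| RECODE21 | P.carinii | 0 | 0 | 0 | 1 |
| RECODE22 | P.pacificus | 1 | 0 | 0 | 0 |
| RECODE23 | R.norvegicus | 1 | 0 | 0 | 2 |
| RECODE24 | S.pombe | 1 | 0 | 0 | 2 |
| RECODE25 | S.japonicus | 1 | 0 | 0 | 0 |
| RECODE26 | S.octosporus | 1 | 0 | 0 | 2 |
| RECODE27 | T.marmorata | 1 | 0 | 0 | 2 |
| RECODE28 | X.laevis | 1 | 0 | 0 | 2 |
| RECODE29 | A.ferrooxidans | 1 | 0 | 0 | 0 |
| RECODE30 | A.actinomycetemcomitans | 1 | 0 | 0 | 0 |
| RECODE32 | B.firmus | 1 | 0 | 0 | 0 |
| RECODE33 | B.subtilis | 1 | 0 | 0 | 0 |
| RECODE34 | B.bronchiseptica | 0 | 1 | 0 | 2 |
| RECODE35 | B.pertussis | 0 | 1 | 0 | 0 |
| RECODE36 | B.burgdorferi | 1 | 0 | 0 | 0 |
| RECODE37 | C.crescentus | 0 | 1 | 0 | 1 |
| RECODE38 | C.trachomatis | 1 | 0 | 1 | 0 |
| RECODE39 | C.muridarum | 1 | 0 | 0 | 1 |
| RECODE40 | C.pneumoniae | 1 | 0 | 0 | 0 |
| RECODE41 | C.acetobutylicum | 1 | 0 | 0 | 1 |
| RECODE42 | C.difficile | 1 | 0 | 0 | 0 |
| RECODE43 | D.ethenogenes | 0 | 0 | 0 | 1 |
| RECODE44 | D.radiodurans | 0 | 0 | 0 | 1 |
| RECODE45 | D.vulgaris | 1 | 0 | 1 | 0 |
| RECODE46 | E.faecalis | 1 | 0 | 0 | 0 |
| RECODE47 | E.coli | 1 | 0 | 0 | 0 |
| RECODE48 | H.ducreyi | 1 | 0 | 0 | 0 |
| RECODE49 | H.influenzae | 1 | 0 | 0 | 0 |
| RECODE50 | P.multocida | 1 | 0 | 0 | 0 |
| RECODE51 | P.gingivalis | 1 | 0 | 0 | 0 |
| RECODE52 | P.aeruginosa | 0 | 0 | 0 | 1 |
| RECODE53 | P.putida | 1 | 0 | 0 | 0 |
| RECODE54 | R.prowazekii | 1 | 0 | 0 | 0 |
| RECODE55 | S.typhimurium | 1 | 0 | 0 | 0 |
| RECODE56 | S.typhi | 1 | 0 | 0 | 0 |
| RECODE57 | S.putrefaciens | 1 | 0 | 0 | 0 |
| RECODE58 | S.mutans | 1 | 0 | 0 | 0 |
| RECODE59 | S.aureus | 1 | 0 | 0 | 0 |
| RECODE61 | S.pneumoniae | 1 | 0 | 0 | 0 |
| RECODE62 | S.pyogenes | 1 | 0 | 0 | 0 |
| RECODE63 | S.PCC6803 | 1 | 0 | 0 | 1 |
| RECODE64 | T.pallidum | 0 | 1 | 0 | 1 |
| RECODE65 | V.cholerae | 1 | 0 | 0 | 0 |
| RECODE66 | X.campestris pv. campestris | 1 | 0 | 0 | 0 |
| RECODE67 | X.fastidiosa | 1 | 0 | 0 | 0 |
| RECODE68 | N.meningitidis | 1 | 0 | 0 | 0 |
| RECODE69 | L.monocytogenes | 1 | 0 | 0 | 0 |
| RECODE366 | B.halodurans | 1 | 0 | 0 | 0 |
| RECODE367 | B.parapertussis | 0 | 1 | 0 | 1 |
| RECODE368 | B.sp.APS | 1 | 0 | 0 | 0 |
| RECODE369 | C.psittaci | 0 | 1 | 0 | 1 |
| RECODE370 | C.psittaci | 1 | 0 | 0 | 0 |
| RECODE371 | C.tepidum | 1 | 0 | 0 | 0 |
| RECODE372 | D.hafniense | 1 | 0 | 0 | 0 |
| RECODE373 | M.loti | 1 | 0 | 0 | 0 |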

Tables 2 and 3 summarize the predictions for −1 and +1 frameshift sites, respectively. A total of 68 −1 frameshift sites for 21 genes were predicted correctly, and 10 −1 frameshift sites for six genes were missed. The average sensitivity and specificity of prediction for −1 frameshift sites were 0.88 and 0.97, respectively, using Equations 1 and 2. For +1 frameshifts, FSFinder was intended for two genes. A total of 62 +1 frameshift sites were predicted correctly, and six were missed. The average sensitivity and specificity of prediction for +1 frameshift sites were 0.91 and 0.94, respectively, using Equations 3 and 4. FSFinder has higher specificity than sensitivity for both types of frameshifting. In each case sensitivity and specificity take the form

$$\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad \text{Specificity} = \frac{TN}{TN + FP}$$

where TP, TN, FP and FN are true positives, true negatives, false positives and false negatives, respectively. TPs are those cases where FSFinder found frameshifts that are annotated in the databases. FPs are those cases where FSFinder reported frameshifts that do not exist. TNs are those frameshifts that conform to the frameshift signal model but were rejected by FSFinder as candidate frameshifts because they lie outside the overlapping regions of ORFs; they are not annotated in the databases either. FNs are actual frameshifts that were missed by FSFinder.
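
As a quick check of how the reported figures follow from the counts, the snippet below applies the usual definitions of sensitivity and specificity to the column totals of Table 3. The totals TP = 62 and FN = 6 are stated in the text; FP = 3 and TN = 49 are obtained here by summing the table columns by hand and should be treated as an assumption of this sketch.

```python
def sensitivity(tp, fn):
    """Fraction of annotated frameshift sites that were found."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-sites that were correctly rejected."""
    return tn / (tn + fp)

# Column totals for Table 3 (+1 frameshift sites in RECODE).
# FP and TN are hand-summed from the table (assumption of this sketch).
tp, fn, fp, tn = 62, 6, 3, 49

if __name__ == "__main__":
    print(round(sensitivity(tp, fn), 2), round(specificity(tn, fp), 2))
    # prints: 0.91 0.94
```

With these totals the pooled values agree with the 0.91 and 0.94 quoted for +1 frameshift sites.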

Frameshift signals in microbial genomes

Escherichia coli release factor 2 (RF2) is a well-known example of a gene that utilizes +1 frameshifting (11,12), and the role of this frameshifting is widely acknowledged. We extracted 38 bacterial genomes with RF2 genes from GenBank that are not present in the RECODE database and tested FSFinder on them (Table 4). FSFinder missed 11 frameshift sites in the 38 organisms since their slippery sequences were not of the form CUU URA C. The average sensitivity and specificity of prediction were 0.72 and 0.92, respectively (Equations 5 and 6). The sensitivity was lower than that for the RECODE data on +1 frameshifts.
Table 4.

Predictions for +1 frameshift sites in the RF2 gene in bacterial genomes

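| ID | TP | FN | FP | TN |
|---|---|---|---|---|
| NC_002663 | 1 | 0 | 1 | 8 |
| NC_002737 | 1 | 0 | 0 | 18 |
| NC_002952 | 1 | 0 | 0 | 10 |
| NC_002971 | 1 | 0 | 0 | 5 |
| NC_003062 | 1 | 0 | 1 | 8 |
| NC_003197 | 1 | 0 | 2 | 20 |
| NC_003295 | 1 | 0 | 5 | 5 |
| NC_003304 | 1 | 0 | 1 | 8 |
| NC_003317 | 1 | 0 | 1 | 4 |
| NC_003454 | 1 | 0 | 0 | 11 |
| NC_003869 | 0 | 1 | 0 | 16 |
| NC_003909 | 1 | 0 | 0 | 20 |
| NC_004193 | 1 | 0 | 2 | 15 |
| NC_004307 | 0 | 1 | 0 | 2 |
| NC_004310 | 1 | 0 | 1 | 4 |
| NC_004342 | 0 | 1 | 0 | 18 |
| NC_004344 | 0 | 1 | 0 | 2 |
| NC_004350 | 1 | 0 | 1 | 23 |
| NC_004463 | 1 | 0 | 5 | 20 |
| NC_004551 | 0 | 1 | 0 | 5 |
| NC_004572 | 0 | 1 | 1 | 9 |
| NC_004663 | 1 | 0 | 2 | 17 |
| NC_004722 | 1 | 0 | 0 | 25 |
| NC_004757 | 1 | 0 | 8 | 3 |
| NC_005027 | 1 | 0 | 6 | 21 |
| NC_005042 | 1 | 0 | 0 | 15 |
| NC_005061 | 0 | 1 | 0 | 1 |
| NC_005071 | 1 | 0 | 1 | 14 |
| NC_005072 | 1 | 0 | 0 | 7 |
| NC_005085 | 1 | 0 | 1 | 5 |
| NC_005090 | 0 | 1 | 0 | 31 |
| NC_005126 | 1 | 0 | 1 | 20 |
| NC_005296 | 1 | 0 | 0 | 7 |
| NC_005303 | 0 | 1 | 0 | 3 |
| NC_005363 | 1 | 0 | 1 | 36 |
| NC_005823 | 0 | 1 | 0 | 18 |
| NC_005835 | 1 | 0 | 2 | 50 |
| NC_005861 | 1 | 1 | 1 | 17 |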

TP: true positives, TN: true negatives, FP: false positives, FN: false negatives.

In Borrelia burgdorferi B31 (gi:15594346, 910 724 bp), FSFinder predicted a CUUUGAC heptameric sequence in the overlapping region of the ORFs (at position 70 196 in the +1 strand of B.burgdorferi), which corresponds to a known +1 frameshift site in prfB (23). It also predicted a new −1 frameshift site in the overlap region (at position 428 613). Biochemical experiments to confirm this are in progress. We compared these predictions with those using randomly generated sequences in which the number of As and Ts were equal to those of Gs and Cs. FSFinder was tested on 10 random sequences of the same length as B.burgdorferi B31. On average, no −1 frameshift site and 0.9 +1 frameshift sites were detected in the overlapping regions of ORFs. These results indicate that −1 frameshift signals are very unlikely to exist by chance in the overlapping regions of random sequences. For the purpose of comparison, we tested FreqAnalysis (14) on the ORF regions of the five organisms. FreqAnalysis finds various types of motifs in frameshift sites but does not provide information on motif positions and related RNA structures. It finds all potential frameshift sites in both overlapping and non-overlapping regions. In contrast, FSFinder only finds frameshift sites in overlapping regions and provides detailed information on the frameshift sites.

CONCLUSION

Identifying programmed frameshifts is difficult because of their diverse nature, yet it is important for fully understanding the underlying mechanisms and for discovering new genes. Existing computational models predict too many false positives, or need reference protein sequences together with DNA sequence data from similar organisms. We have developed an algorithm and a program called FSFinder for predicting plausible −1 and +1 frameshift sites in long DNA or mRNA sequences. FSFinder was tested on DNA sequences obtained from different organisms in RECODE, PseudoBase and GenBank, and it predicted both −1 and +1 frameshift signals with higher sensitivity and specificity than other approaches. FSFinder obtains increased sensitivity by considering most of the known, potentially relevant components and by searching both + and − strands, and has increased specificity because it focuses on the overlapping regions of ORFs and prioritizes candidate signals. We believe FSFinder will be useful for predicting frameshift sites. The development of FSFinder is not yet complete. The current version is capable of finding the X XXY YYZ type of −1 frameshifting and the prfB and oaz types of +1 frameshifting. Frameshift signals are very diverse and organism-dependent, so they cannot be modeled in a single, universal way. FSFinder will be extended in the future to find any frameshift site modeled by the user.

REFERENCES (23 in total)

1.  PseudoBase: a database with RNA pseudoknots.

Authors:  F H van Batenburg; A P Gultyaev; C W Pleij; J Ng; J Oliehoek
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  FramePlus: aligning DNA to protein sequences.

Authors:  E Halperin; S Faigler; R Gill-More
Journal:  Bioinformatics       Date:  1999-11       Impact factor: 6.937

3.  Programmed translational -1 frameshifting on hexanucleotide motifs and the wobble properties of tRNAs.

Authors:  Patricia Licznar; Nina Mejlhede; Marie-Françoise Prère; Norma Wills; Raymond F Gesteland; John F Atkins; Olivier Fayet
Journal:  EMBO J       Date:  2003-09-15       Impact factor: 11.598

Review 4.  Programmed translational frameshifting.

Authors:  P J Farabaugh
Journal:  Annu Rev Genet       Date:  1996       Impact factor: 16.830

5.  Slippery runs, shifty stops, backward steps, and forward hops: -2, -1, +1, +2, +5, and +6 ribosomal frameshifting.

Authors:  R B Weiss; D M Dunn; J F Atkins; R F Gesteland
Journal:  Cold Spring Harb Symp Quant Biol       Date:  1987

6.  A frameshift error detection algorithm for DNA sequencing projects.

Authors:  G A Fichant; Y Quentin
Journal:  Nucleic Acids Res       Date:  1995-08-11       Impact factor: 16.971

7.  rRNA-mRNA base pairing stimulates a programmed -1 ribosomal frameshift.

Authors:  B Larsen; N M Wills; R F Gesteland; J F Atkins
Journal:  J Bacteriol       Date:  1994-11       Impact factor: 3.490

8.  Computational identification of putative programmed translational frameshift sites.

Authors:  Atul A Shah; Michael C Giddings; Jasmin B Parvaz; Raymond F Gesteland; John F Atkins; Ivaylo P Ivanov
Journal:  Bioinformatics       Date:  2002-08       Impact factor: 6.937

9.  Ribosome structure: revisiting the connection between translational accuracy and unconventional decoding.

Authors:  Guillaume Stahl; Gregory P McCarty; Philip J Farabaugh
Journal:  Trends Biochem Sci       Date:  2002-04       Impact factor: 13.807

10.  Towards a computational model for -1 eukaryotic frameshifting sites.

Authors:  Michaël Bekaert; Laure Bidou; Alain Denise; Guillemette Duchateau-Nguyen; Jean-Paul Forest; Christine Froidevaux; Isabelle Hatin; Jean-Pierre Rousset; Michel Termier
Journal:  Bioinformatics       Date:  2003-02-12       Impact factor: 6.937
