Andrew E Firth, John F Atkins.
Abstract
The genus Torovirus (order Nidovirales) includes a number of species that infect livestock. These viruses have a linear positive-sense ssRNA genome of approximately 25-30 kb, encoding a large polyprotein that is expressed from the genomic RNA, and several additional proteins expressed from a nested set of 3'-coterminal subgenomic RNAs. In this brief report, we describe the bioinformatic discovery of a new, apparently coding, ORF that overlaps the 5' end of the polyprotein coding sequence, ORF1a, in the +2 reading frame. The new ORF has a strong coding signature and, in fact, is more conserved at the amino acid level than the overlapping region of ORF1a. We propose that the new ORF utilizes a non-AUG initiation codon (namely, a conserved CUG codon in a strong Kozak context) upstream of the ORF1a AUG initiation codon, resulting in a novel 258-amino-acid protein, dubbed '30K'.
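As an illustration of what "strong Kozak context" means in practice, the following minimal sketch classifies the context around a candidate initiation codon. It assumes the commonly used rule that the key positions are -3 (a purine) and +4 (a G), counting the first nucleotide of the initiation codon as +1. The function name `kozak_strength`, the three-level labels, and the example sequences are hypothetical and are not taken from the report.

```python
def kozak_strength(seq, start):
    """Classify the Kozak context of the codon beginning at 0-based index `start`.

    Assumed rule (not from the report): "strong" needs a purine at -3 and
    a G at +4; "adequate" needs one of the two; otherwise "weak".
    """
    seq = seq.upper().replace("U", "T")
    if start < 3 or start + 3 >= len(seq):
        raise ValueError("need 3 nt upstream and 1 nt downstream of the codon")
    purine_at_minus3 = seq[start - 3] in "AG"   # position -3
    g_at_plus4 = seq[start + 3] == "G"          # position +4
    if purine_at_minus3 and g_at_plus4:
        return "strong"
    if purine_at_minus3 or g_at_plus4:
        return "adequate"
    return "weak"


# Hypothetical toy sequences with a CUG (CTG) codon at index 6.
strong_example = kozak_strength("GCCACCCTGGCG", 6)
weak_example = kozak_strength("TTTTTTCTGTTT", 6)
```

The check is deliberately limited to the two positions named above; a fuller assessment would consider more of the flanking sequence.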
Year: 2009 PMID: 19737402 PMCID: PMC2749830 DOI: 10.1186/1743-422X-6-136
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Figure 1. Coding potential statistics for torovirus ORF1a and the overlapping ORFX. (A) Torovirus genome map (Breda virus or Bovine torovirus [GenBank:NC_007447]; from [5]) showing the location of the proposed new coding sequence, ORFX. (B1) Map of the ORF1a region showing the proposed new coding sequence, ORFX, overlapping ORF1a in the +2 reading frame. (B2-B4) The positions of stop codons in each of the three forward reading frames. The +0 frame corresponds to ORF1a and is therefore devoid of stop codons. Note the conserved absence of stop codons in the +2 frame within the ORFX region. (B5-B6) Conservation at synonymous sites within ORF1a (see [11] for details). (B5) depicts the probability that the degree of conservation within a given window could be obtained under a null model of neutral evolution at synonymous sites, while (B6) depicts the absolute amount of conservation as represented by the ratio of the observed number of substitutions within a given window to the number expected under the null model. Note that the relatively large sliding window size (75 codons), used here for improved statistical power, is responsible for the broad smoothing of the conservation scores at the 3' end of ORFX. (B7-B9) MLOGD sliding-window plots (window size 75 codons; step size 25 codons; see [8] for details). The null model, in each window, is that the sequence is non-coding, while the alternative model is that the sequence is coding in the given reading frame. Positive scores favour the alternative model and, as expected, in the +0 frame (B7) there is a strong coding signature throughout ORF1a except where ORF1a is overlapped by ORFX (see text). In the +1 and +2 frames (B8-B9), scores are generally negative, albeit with significant scatter into positive scores (a reflection of the limited amount of available input sequence data). Nonetheless, the ORFX region is characterized by consecutive positively scoring windows in the +2 frame (B9).
Note that, regardless of the sign (either positive or negative), the magnitude of MLOGD scores tends to be lower within the overlap region itself (B7-B9) due to there being fewer substitutions with which to discriminate the null model from the alternative model in this region of above-average nucleotide conservation.
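The stop-codon panels (B2-B4) and the sliding-window layout (window size 75 codons; step size 25 codons) can be illustrated with a short sketch. This is not the authors' code and does not compute MLOGD or synonymous-site conservation scores; it only lists stop-codon positions per forward frame and enumerates window start positions. The function names and the toy sequence are hypothetical, and it assumes the input begins at the first nucleotide of ORF1a, so that offsets 0, 1 and 2 map to the +0, +1 and +2 frames.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_codon_positions(seq, frame):
    """Return 0-based nucleotide offsets of stop codons in a forward frame.

    `frame` is the offset (0, 1 or 2) from the first nucleotide of `seq`.
    """
    seq = seq.upper().replace("U", "T")
    return [i for i in range(frame, len(seq) - 2, 3)
            if seq[i:i + 3] in STOP_CODONS]


def window_starts(n_codons, window=75, step=25):
    """Return the start codon index of every full sliding window."""
    return list(range(0, n_codons - window + 1, step))


# Hypothetical toy sequence: frame 0 reads ATG AAA TAG CCC.
toy = "ATGAAATAGCCC"
stops_by_frame = {f: stop_codon_positions(toy, f) for f in (0, 1, 2)}
# A 200-codon region tiled with the window and step sizes quoted in the caption.
starts = window_starts(200)
```

A frame with no entries over a long stretch, as for the +2 frame in the toy example, is the kind of stop-free region that the panels are meant to make visible across aligned sequences.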
Figure 2. Alignment extract showing ORFX and flanking regions.
Figure 3. Amino acid alignment for '30K', the translated ORFX. Note that the proposed CUG initiation codon is assumed here to be translated by initiator Met-tRNA, resulting in an N-terminal methionine rather than leucine.
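The translation convention described in this caption can be sketched as follows: the first codon is rendered as methionine whatever its sequence, and the remaining codons are read with the standard genetic code up to the first stop. This is a minimal illustration only; the function name `translate_from_initiator` and the toy sequence are hypothetical, and the sequence is not taken from any torovirus genome.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate_from_initiator(seq, start=0):
    """Translate from `start`, forcing the initiation codon to Met.

    Models initiation by initiator Met-tRNA at a non-AUG codon such as CUG.
    Translation stops at the first in-frame stop codon; codons that are
    not in the table (e.g. containing N) are rendered as 'X'.
    """
    seq = seq.upper().replace("U", "T")
    protein = []
    for i in range(start, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[i:i + 3], "X")
        if residue == "*":
            break
        # Only the initiation codon is overridden; internal CUG stays Leu.
        protein.append("M" if i == start else residue)
    return "".join(protein)


# Hypothetical toy sequence: CUG GCU CUG AAA UAA
toy_protein = translate_from_initiator("CUGGCUCUGAAAUAA")
```

In the toy example the initiating CUG is read as M while the internal CUG is still read as L, which is the distinction the caption draws.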