Abstract
BACKGROUND: The genus Orbivirus includes several species that infect livestock, including Bluetongue virus (BTV) and African horse sickness virus (AHSV). These viruses have linear dsRNA genomes divided into ten segments, all of which have previously been assumed to be monocistronic.
Year: 2008 PMID: 18489030 PMCID: PMC2373779 DOI: 10.1186/1743-422X-5-48
Source DB: PubMed Journal: Virol J ISSN: 1743-422X Impact factor: 4.099
Figure 1. Genome map for BTV. The putative new coding sequence, ORFX, is located on segment 9 (RNA9) in the +1 reading frame relative to the overlapping VP6 cistron. Molecular masses are calculated from the unmodified amino acid sequences.
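To make the frame terminology concrete, here is a minimal Python sketch. It assumes a plain RNA string and a known 0-based start index for the reference CDS. The helper names (`codons_in_frame`, `stop_positions`) and the toy sequence are hypothetical illustrations, not BTV data or part of MLOGD. A frame offset of 1 corresponds to the "+1 frame relative to VP6" in which ORFX lies. Scanning each offset for UAA, UAG and UGA follows the same idea as the stop-codon tracks described for Figure 2.

```python
# Minimal sketch: reading frames relative to a reference CDS start.
# Helper names and the toy sequence are hypothetical illustrations.
STOP_CODONS = {"UAA", "UAG", "UGA"}


def codons_in_frame(seq: str, cds_start: int, frame: int) -> list[tuple[int, str]]:
    """Return (0-based position, codon) pairs read from cds_start + frame.

    frame = 0 is the reference CDS frame (VP6 here), frame = 1 is the +1 frame
    (where ORFX lies) and frame = 2 is the +2 frame. Only complete codons are kept.
    """
    offset = cds_start + frame
    return [(i, seq[i:i + 3]) for i in range(offset, len(seq) - 2, 3)]


def stop_positions(seq: str, cds_start: int, frame: int) -> list[int]:
    """0-based positions of stop codons in the requested frame."""
    return [i for i, codon in codons_in_frame(seq, cds_start, frame)
            if codon in STOP_CODONS]


if __name__ == "__main__":
    toy = "AUGACUAAUGGCUGAUAA"  # made-up sequence, not an Orbivirus segment
    for frame in (0, 1, 2):
        print(f"+{frame} frame stops:", stop_positions(toy, cds_start=0, frame=frame))
```

For the toy sequence this reports stops at positions 12 and 15 in the +0 frame, 1 in the +1 frame and 5 in the +2 frame. A conserved long stretch with no stops in the +1 frame, as for ORFX, is what the real alignment shows.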
Figure 2. MLOGD statistics for the alignment of 48 BTV sequences. The input alignment was a CLUSTALW [39] alignment of the VP6 amino acid sequences only, back-translated to nucleotide sequences. (1) The positions of alignment gaps in each of the 48 sequences. Most of the alignment is ungapped, though a few sequences are incomplete. (2)–(4) The positions of stop codons in each of the 48 sequences in each of the three forward reading frames. Note the conserved absence of stop codons in the +0 frame (i.e. the VP6 CDS) and, within the ORFX region, in the +1 frame. (5)–(8) MLOGD sliding-window plots (window size = 20 codons; step size = 10 codons). Each window is represented by a small circle showing its likelihood ratio score, with grey bars marking the window's extent. See [16] for further details of the MLOGD software. In (5)–(6) the null model in each window is that the sequence is non-coding, and the alternative model is that it is coding in the window frame; positive scores favour the alternative model. There is a strong coding signature in the +0 frame (5) throughout the VP6 CDS, except where the VP6 CDS overlaps ORFX. In this region there is instead a strong coding signature in the +1 frame (6), indicating that ORFX is subject to stronger functional constraints than the overlapping section of VP6. In (7)–(8) the null model in each window is that only the VP6 frame is coding, and the alternative model is that both the VP6 frame and the window frame are coding. Only the +1 (7) and +2 (8) frames are shown, because the +0 frame is the VP6 frame, which is already included in the null model. Scores are generally negative, with occasional random scatter into low positive values, except in the ORFX region, which has consecutive strongly positive windows (7). There are four non-overlapping, and hence completely independent, positively scoring windows in the ORFX region (7). Formally, and within the MLOGD model, p < 10⁻⁴⁰.
(9) Genome map for the reference sequence [GenBank: NC_006008]. (10) Phylogenetically summed sequence divergence (mean number of base variations per nucleotide) for the sequences that contribute to the statistics at each position in the alignment. In any particular column, some sequences may be omitted from the statistical calculations due to alignment gaps. Statistics in regions with lower summed divergence (i.e. partially gapped regions) have a lower signal-to-noise ratio.
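The window layout in panels (5)–(8) can be sketched as follows. The sketch assumes windows are laid out from the first codon with no partial window at the end; MLOGD's exact edge handling is not described here, and the function names are hypothetical. With a 20-codon window and a 10-codon step, each window overlaps its neighbours by half, so only every other window is independent. The greedy selection picks a largest set of pairwise non-overlapping windows, which is the kind of set meant by "four non-overlapping, and hence completely independent, positively scoring windows".

```python
# Minimal sketch of a 20-codon / 10-codon-step sliding-window layout.
# Function names are hypothetical; this is not MLOGD code.


def window_bounds(n_codons: int, size: int = 20, step: int = 10) -> list[tuple[int, int]]:
    """(start, end) codon indices of every full window; end is exclusive."""
    return [(s, s + size) for s in range(0, n_codons - size + 1, step)]


def non_overlapping(windows: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Greedy interval scheduling: a maximum set of pairwise disjoint windows."""
    chosen: list[tuple[int, int]] = []
    last_end = None
    for start, end in sorted(windows, key=lambda w: w[1]):
        if last_end is None or start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


if __name__ == "__main__":
    wins = window_bounds(60)
    print(wins)                   # 5 half-overlapping windows
    print(non_overlapping(wins))  # [(0, 20), (20, 40), (40, 60)]
```

To reproduce the caption's count, you would pass in only the windows whose score is positive.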
Figure 3. MLOGD statistics for the BTV, AHSV, PALV and PHSV/YUOV alignments. Output plots from MLOGD run in 'Test Query CDS' mode on the ORFX region of the BTV, AHSV, PALV and PHSV/YUOV sequence alignments. See [16] for full details of the MLOGD software. The null model comprises the VP6 CDS, and the query CDS is ORFX. In each plot, the top panel displays the raw log(LR) statistic at each alignment position, with a separate track for each reference/non-reference sequence pair (labelled at right with the pairwise divergences; these labels are not legible for the BTV alignment because it contains 48 sequences). Stop codons in the VP6 and ORFX reading frames (there are none apart from the 3'-terminal ones) and alignment gaps are marked on each sequence's track. The second panel displays the Σtree log(LR) statistic at each alignment position, where 'tree' denotes a phylogenetic tree (see [16]). The third and fourth panels display sliding-window means of the statistics in the first and second panels, respectively. The fifth panel shows the locations of the null-model and alternative-model CDSs (VP6 and ORFX, respectively). The sixth panel shows the summed mean sequence divergence (base variations per alignment nucleotide column) for the sequence pairs that contribute to the Σtree log(LR) statistic at each alignment position. This measures the information available at each position; for example, partially gapped regions have lower summed mean divergence. The predominantly positive values in the fourth panel indicate that ORFX is subject to functional constraints at the amino acid level over most of its length.
Kozak contexts of VP6 AUG codons in BTV. Kozak contexts of the AUG codons upstream of ORFX in BTV, for the 34 segment 9 sequences that appear to contain the complete 5'UTR. Positions are numbered relative to the A of the AUG (+1). A context is classed as 'strong' if there is a 'G' at +4 and an 'A' or 'G' at -3, 'medium' if only one of these is present, and 'weak' if neither is present.
One upstream AUG codon:
| AUG -3 | AUG +4 | Strength | Number |
| G | C | medium | 5 |
| A | A | medium | 1 |
| C | U | weak | 1 |
Two upstream AUG codons:
| First AUG -3 | First AUG +4 | Second AUG -3 | Second AUG +4 | Strength (first-second) | Number |
| C | U | U | G | weak-medium | 15 |
| C | U | G | C | weak-medium | 9 |
| C | U | A | A | weak-medium | 1 |
| C | U | A | G | weak-strong | 1 |
| C | U | C | C | weak-weak | 1 |
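The classification rule in the caption is simple enough to state as code. The Python sketch below assumes standard Kozak numbering (A of AUG = +1, no position 0) and uses hypothetical helper names. It re-derives each 'Strength' label from the tabulated -3/+4 bases and checks that the counts add up to the 34 sequences mentioned in the caption. For sequences with two upstream AUGs, the label is written as first-second.

```python
# Minimal sketch of the Kozak-strength rule stated in the table caption.
# Helper names are hypothetical. Positions use standard Kozak numbering:
# the A of AUG is +1, so -3 is three bases upstream and +4 follows the G.


def kozak_context(seq: str, a_index: int) -> tuple[str, str]:
    """Return the (-3, +4) bases for the AUG whose A is at 0-based a_index."""
    if seq[a_index:a_index + 3] != "AUG":
        raise ValueError("no AUG at a_index")
    if a_index < 3 or a_index + 3 >= len(seq):
        raise ValueError("AUG too close to a sequence end to read -3 and +4")
    return seq[a_index - 3], seq[a_index + 3]


def kozak_strength(minus3: str, plus4: str) -> str:
    """'strong' if purine at -3 and G at +4; 'medium' if one; 'weak' if neither."""
    hits = int(minus3 in {"A", "G"}) + int(plus4 == "G")
    return ("weak", "medium", "strong")[hits]


# Rows transcribed from the table: bases, reported strength, number of sequences.
ONE_AUG = [("G", "C", "medium", 5), ("A", "A", "medium", 1), ("C", "U", "weak", 1)]
TWO_AUG = [
    ("C", "U", "U", "G", "weak-medium", 15),
    ("C", "U", "G", "C", "weak-medium", 9),
    ("C", "U", "A", "A", "weak-medium", 1),
    ("C", "U", "A", "G", "weak-strong", 1),
    ("C", "U", "C", "C", "weak-weak", 1),
]

if __name__ == "__main__":
    for m3, p4, label, _ in ONE_AUG:
        assert kozak_strength(m3, p4) == label
    for m3a, p4a, m3b, p4b, label, _ in TWO_AUG:
        assert f"{kozak_strength(m3a, p4a)}-{kozak_strength(m3b, p4b)}" == label
    total = sum(r[-1] for r in ONE_AUG) + sum(r[-1] for r in TWO_AUG)
    print("sequences tabulated:", total)  # 7 + 27 = 34
```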
Figure 4. Nucleotide frequencies for segment 9. Nucleotide frequencies in 60 nt running windows along each Orbivirus segment 9 RefSeq: 'A', red; 'C', green; 'G', blue; 'U', purple. Horizontal black bars mark the locations of the VP6 CDS and ORFX (the grey bar marks ORFXb in SCRV). Except for SCRV, the sequences are A- or AG-rich, and they also have an A-rich peak just upstream of ORFX.
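A 60 nt running-window composition profile like the one plotted can be computed as below. This is a sketch under stated assumptions: a step of 1 nt, each window reported at its centre position, and DNA 'T' treated as 'U'. The function name is hypothetical, and the plotting code actually used for the figure is not described.

```python
from collections import Counter


def running_frequencies(seq: str, window: int = 60) -> list[tuple[int, dict[str, float]]]:
    """Fraction of A, C, G and U in every full window, stepping 1 nt.

    Returns (centre position, {base: fraction}) pairs; positions are 0-based.
    """
    seq = seq.upper().replace("T", "U")
    profile = []
    for start in range(len(seq) - window + 1):
        counts = Counter(seq[start:start + window])
        profile.append((start + window // 2,
                        {base: counts[base] / window for base in "ACGU"}))
    return profile


if __name__ == "__main__":
    toy = "A" * 60 + "C" * 60  # made-up sequence with an abrupt composition change
    profile = running_frequencies(toy)
    print(len(profile))  # 61 windows
    print(profile[30])   # centre 60: A and C each 0.5
```

Plotting each base's fraction against the centre position gives four curves like those in the figure.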
Figure 5. Segment 9 genome maps for six Orbivirus species. Genome maps for segment 9 of the six Orbivirus RefSeqs in GenBank, showing the locations of putative ORFX homologues. In SCRV, no long ORF was found in the expected location and frame; the two ORFs shown are separated by a stop codon. At left is a phylogenetic tree of the six Orbivirus VP6 amino acid sequences (neighbour-joining tree produced with CLUSTALX [39]; alignment columns with gaps excluded; numbers indicate bootstrap support out of 1000; the scale bar represents substitutions per site).
ORFX MLOGD statistics. MLOGD statistics for ORFX in different Orbivirus alignments. These statistics were derived using MLOGD in 'Test Query CDS' mode (Figure 3), which tests the coding potential of ORFX as a whole, rather than the 'Sliding Window' mode used for Figure 2.
| Species | Reference¹ | Nseqs | Length | ln(LR)² | var/nt³ | ln(LR)/nt⁴ | Nvar⁵ | Max. divergence⁶ |
| BTV | NC_006008 | 48 | 234 nt | 101.8 | 0.77 | 0.44 | 180 | 0.21 |
| AHSV | NC_006019 | 3 | 429 nt | 15.8 | 0.06 | 0.04 | 26 | 0.05 |
| PALV | NC_005992 | 11 | 180 nt⁷ | 29.7 | 0.23 | 0.16 | 41 | 0.12 |
| PHSV/YUOV | NC_007753 | 2 | 336 nt | 33.0 | 0.56 | 0.10 | 189 | 0.56 |
1. GenBank reference sequence used for MLOGD.
2. Total MLOGD log-likelihood ratio score; positive values indicate that ORFX is likely to be coding. Formally, exp(ln(LR)) gives the likelihood ratio P(data | ORFX coding)/P(data | ORFX non-coding), which may be equated to the posterior odds P(ORFX coding | data)/P(ORFX non-coding | data) if equal Bayesian priors are assumed. These probabilities are, however, subject to the assumptions of the MLOGD sequence evolution model [15]. Nonetheless, extensive tests with known single-coding and double-coding sequences indicate that 'Nvar ≥ 20' and 'ln(LR)/nt ≥ × var/nt' together signal robust detection of an overlapping same-strand CDS ([16] and unpublished data).
3. Alignment divergence per nucleotide – i.e. mean number of independent base variations per alignment column in the ORFX region.
4. Log likelihood score per alignment column.
5. Approximate total number of independent base variations in ORFX region.
6. Maximum pairwise divergence from the chosen reference sequence.
7. Alignment of partial PALV sequences; it does not cover the entire ORFX region.
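The derived columns and footnote 2 can be checked with a few lines of arithmetic. This Python sketch uses the table's own values and takes the PALV alignment length as 180 nt, the value consistent with its ln(LR)/nt and Nvar entries. It recomputes ln(LR)/nt = ln(LR)/Length and Nvar ≈ var/nt × Length, and applies the 'Nvar ≥ 20' criterion. It also converts ln(LR) into a posterior probability that ORFX is non-coding under equal priors, 1/(1 + LR). The published entries are rounded, so recomputed values can differ slightly; for example, PHSV/YUOV gives Nvar ≈ 188 against the tabulated 189. The second robustness criterion is not applied because its coefficient is not given in the footnote. The posterior also inherits all the MLOGD model assumptions. Function names are hypothetical.

```python
import math

# (species, ORFX alignment length in nt, ln(LR), var/nt), copied from the table.
ROWS = [
    ("BTV", 234, 101.8, 0.77),
    ("AHSV", 429, 15.8, 0.06),
    ("PALV", 180, 29.7, 0.23),  # partial alignment (footnote 7)
    ("PHSV/YUOV", 336, 33.0, 0.56),
]


def prob_noncoding(ln_lr: float) -> float:
    """P(ORFX non-coding | data) = 1 / (1 + LR), assuming equal prior odds.

    Written to avoid overflow and loss of precision for large |ln(LR)|.
    """
    if ln_lr >= 0:
        e = math.exp(-ln_lr)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(ln_lr))


def summarise(length: int, ln_lr: float, var_per_nt: float) -> dict:
    n_var = var_per_nt * length  # approximate number of independent base variations
    return {
        "ln(LR)/nt": ln_lr / length,
        "Nvar": n_var,
        "Nvar>=20": n_var >= 20,
        "P(non-coding)": prob_noncoding(ln_lr),
    }


if __name__ == "__main__":
    for species, length, ln_lr, var_nt in ROWS:
        s = summarise(length, ln_lr, var_nt)
        print(f"{species:10s} ln(LR)/nt={s['ln(LR)/nt']:.2f} "
              f"Nvar={s['Nvar']:.0f} Nvar>=20={s['Nvar>=20']} "
              f"P(non-coding)={s['P(non-coding)']:.1e}")
```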
Nucleotide frequencies for segment 9. Mean nucleotide frequencies for the six Orbivirus segment 9 RefSeqs in GenBank.
| RefSeq | Species | A% | C% | G% | U% |
| NC_006008 | BTV | 32 | 16 | 33 | 19 |
| NC_006019 | AHSV | 32 | 16 | 32 | 20 |
| NC_005992 | PALV | 36 | 16 | 26 | 23 |
| NC_007753 | PHSV | 41 | 13 | 24 | 22 |
| NC_007664 | YUOV | 36 | 18 | 25 | 20 |
| NC_006005 | SCRV | 25 | 27 | 24 | 25 |
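The 'A- or AG-rich' observation in the Figure 4 caption can be read directly off this table. The short Python check below uses only the tabulated percentages and computes the purine (A + G) share of each RefSeq. Rows sum to between 99 and 101% because of rounding. The function name is hypothetical.

```python
# Mean segment 9 composition (%) copied from the table: (A, C, G, U).
COMPOSITION = {
    "BTV": (32, 16, 33, 19),
    "AHSV": (32, 16, 32, 20),
    "PALV": (36, 16, 26, 23),
    "PHSV": (41, 13, 24, 22),
    "YUOV": (36, 18, 25, 20),
    "SCRV": (25, 27, 24, 25),
}


def purine_percent(comp: tuple[int, int, int, int]) -> int:
    """A + G share in percent (inputs are already rounded percentages)."""
    a, _c, g, _u = comp
    return a + g


if __name__ == "__main__":
    ranked = sorted(COMPOSITION.items(), key=lambda kv: purine_percent(kv[1]))
    for species, comp in ranked:
        print(f"{species:5s} A+G = {purine_percent(comp)}%  A = {comp[0]}%")
```

SCRV is the outlier, at 49% A + G. The other five RefSeqs range from 61% to 65%.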