Joseph L DeRisi, Greg Huber, Amy Kistler, Hanna Retallack, Michael Wilkinson, David Yllanes.
Abstract
Narnaviruses have been described as positive-sense RNA viruses with a remarkably simple genome of ~3 kb, encoding only a highly conserved RNA-dependent RNA polymerase (RdRp). Many narnaviruses, however, are 'ambigrammatic' and harbour an additional uninterrupted open reading frame (ORF) covering almost the entire length of the reverse complement strand. No function has been described for this ORF, yet the absence of stops is conserved across diverse narnaviruses, and in every case the codons in the reverse ORF and the RdRp are aligned. The >3 kb ORF overlap on opposite strands, unprecedented among RNA viruses, motivates an exploration of the constraints imposed or alleviated by the codon alignment. Here, we show that only when the codon frames are aligned can all stop codons be eliminated from the reverse strand by synonymous single-nucleotide substitutions in the RdRp gene, suggesting a mechanism for de novo gene creation within a strongly conserved amino-acid sequence. It will be fascinating to explore what implications this coding strategy has for other aspects of narnavirus biology. Beyond narnaviruses, our rapidly expanding catalogue of viral diversity may yet reveal additional examples of this broadly-extensible principle for ambigrammatic-sequence development.
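The aligned-frame claim in the abstract can be checked by hand for the standard genetic code. The sketch below is an illustration only, not the authors' analysis: it assumes the standard codon table, uses DNA letters (T in place of U), and the helper names `revcomp` and `synonymous_fixes` are hypothetical. It lists the forward codons whose reverse complement is a stop and, for each, the synonymous single-nucleotide variants that no longer produce a stop on the reverse strand. It covers only the aligned case; the misaligned frames, where a reverse-strand stop spans two forward codons, are not examined here.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering (assumption:
# the standard table applies; DNA alphabet is used instead of RNA).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def synonymous_fixes(codon):
    """Single-nucleotide variants of `codon` that encode the same amino acid
    and whose reverse complement is not a stop codon."""
    fixes = []
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue
            variant = codon[:pos] + base + codon[pos + 1:]
            same_aa = CODON_TABLE[variant] == CODON_TABLE[codon]
            if same_aa and CODON_TABLE[revcomp(variant)] != "*":
                fixes.append(variant)
    return fixes


# Sense codons that read as a stop on the reverse strand when frames are aligned.
problem_codons = [c for c in CODON_TABLE
                  if CODON_TABLE[c] != "*" and CODON_TABLE[revcomp(c)] == "*"]

for codon in problem_codons:
    print(codon, CODON_TABLE[codon], "->", synonymous_fixes(codon))
```

Under these assumptions only three sense codons (two for leucine, one for serine) create a reverse-strand stop in the aligned frame, and each has at least one synonymous single-nucleotide replacement.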
Year: 2019 PMID: 31784609 PMCID: PMC6884476 DOI: 10.1038/s41598-019-54181-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Ambigrammatic sequences in narnaviruses. Coding region for the RNA-dependent RNA polymerase (RdRp) of Phytophthora infestans RNA virus 4 (a), Culex narnavirus 1 (b), and Wenling narna-like virus 7 (c) in the reference +0 frame and all five other reading frames (see Fig. 2 for our frame-labelling conventions). Stop codons in each frame are depicted as vertical lines. Large uninterrupted open reading frames (ORFs) are highlighted in colour.
Figure 2. Labelling conventions used in this paper for reading frames.
Figure 3. Maximum likelihood tree of amino-acid sequences for RNA-dependent RNA polymerase (RdRp) of 42 representative narnaviruses, identified by homology to the narnaviruses observed in culture, Culex narnavirus 1 and Phytophthora infestans virus 4 (NCBI Blastx [36]). Unrooted tree shown with midpoint rooting for display. Branch colouring indicates the fraction of RdRp coding sequence overlapped by the longest open reading frame (defined as a region uninterrupted by stops) in the reverse complement aligned frame (−0 frame) for sequences at tips (see colour bar, bottom left). The sequence names in bold correspond to those shown in Fig. 1. Numbers at nodes indicate bootstrap values (shown when >80). The branch length is given by the amino-acid substitutions per site, as illustrated by the scale bar.
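The branch-colouring quantity in this caption, the fraction of the RdRp coding sequence covered by the longest stop-free region in the −0 frame, could be computed roughly as follows. This is a minimal sketch rather than the authors' pipeline: the function name `longest_orf_fraction_minus0` is hypothetical, and it assumes the input is a complete coding sequence whose length is a multiple of three, so that reading the reverse complement from its first base keeps codon boundaries aligned with the forward frame.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_fraction_minus0(cds):
    """Fraction of codons in `cds` covered by the longest stop-free run in
    the reverse-complement frame aligned with the forward codons."""
    seq = cds.upper().replace("U", "T")  # accept RNA or DNA letters
    if not seq or len(seq) % 3:
        raise ValueError("expected a non-empty sequence of whole codons")
    rc = seq.translate(COMPLEMENT)[::-1]
    codons = [rc[i:i + 3] for i in range(0, len(rc), 3)]
    best = run = 0
    for codon in codons:
        run = 0 if codon in STOPS else run + 1  # a stop resets the run
        best = max(best, run)
    return best / len(codons)


# Toy sequences (not real viral data): one reverse-strand stop, then removed
# by the synonymous change TTA -> TTG.
print(longest_orf_fraction_minus0("ATGGCTTTAGCC"))
print(longest_orf_fraction_minus0("ATGGCTTTGGCC"))
```

A value of 1.0 corresponds to a reverse-strand frame with no stops across the whole coding region; fragmentary database entries would need separate handling.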
Figure 4. Probability distribution for ORF lengths in narnavirus-like sequences. Shading shows distribution of ORF lengths coloured by reading frame after codon permutation test on RdRp coding sequences of 42 representative narnaviruses as in Fig. 3. In brief, codons are randomly re-ordered and then ORF lengths in the 5 alternate frames are calculated (permutation methods as in [33]). Points give lengths of actual ORFs in reference sequences, coloured according to reading frame, with the reference RdRp as +0 frame (red, below). Note that some annotated RdRp coding regions in the database may be fragments of the complete coding sequence.
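The codon permutation test summarised in this caption can be sketched as below. The details of the method in [33] are not given here, so this is one plausible reading under stated assumptions: codons of the coding sequence are shuffled uniformly at random, and the longest stop-free run (in codons) is recorded in each of the five alternate frames. The frame labels (`fwd+1`, `rev+0`, and so on) are placeholders and may not match the paper's Fig. 2 convention; all function names are hypothetical.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def longest_orf_codons(seq, offset):
    """Longest run of non-stop codons reading `seq` from `offset`."""
    best = run = 0
    for i in range(offset, len(seq) - 2, 3):
        run = 0 if seq[i:i + 3] in STOPS else run + 1
        best = max(best, run)
    return best


def alternate_frame_orfs(cds):
    """Longest ORF (in codons) in the five frames other than the coding frame.
    With len(cds) a multiple of 3, 'rev+0' is the codon-aligned reverse frame."""
    rc = cds.translate(COMPLEMENT)[::-1]
    frames = {"fwd+1": (cds, 1), "fwd+2": (cds, 2),
              "rev+0": (rc, 0), "rev+1": (rc, 1), "rev+2": (rc, 2)}
    return {name: longest_orf_codons(s, off) for name, (s, off) in frames.items()}


def permutation_null(cds, n_perm=1000, seed=0):
    """Null distribution of alternate-frame ORF lengths under codon shuffling."""
    rng = random.Random(seed)  # seeded for reproducibility
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    samples = []
    for _ in range(n_perm):
        shuffled = codons[:]
        rng.shuffle(shuffled)  # keeps codon composition, not codon order
        samples.append(alternate_frame_orfs("".join(shuffled)))
    return samples


# Toy input (not real viral data): observed lengths versus a small null sample.
toy_cds = "ATGGCTTTAGCC"
observed = alternate_frame_orfs(toy_cds)
null = permutation_null(toy_cds, n_perm=5, seed=1)
print(observed)
print(null)
```

Shuffling whole codons preserves the codon usage and amino-acid composition of the forward frame while scrambling the protein sequence, so an observed ORF far longer than the shuffled values in a given frame would suggest that the absence of stops is not a by-product of composition alone.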