Jason Nomburg, Wei Zou, Thomas C Frost, Chandreyee Datta, Shobha Vasudevan, Gabriel J Starrett, Michael J Imperiale, Matthew Meyerson, James A DeCaprio.
Abstract
Polyomaviruses (PyV) are ubiquitous pathogens that can cause devastating human diseases. Due to the small size of their genomes, PyV utilize complex patterns of RNA splicing to maximize their coding capacity. Despite the importance of PyV to human disease, their transcriptome architecture is poorly characterized. Here, we compare short- and long-read RNA sequencing data from eight human and non-human PyV. We provide a detailed transcriptome atlas for BK polyomavirus (BKPyV), an important human pathogen, and the prototype PyV, simian virus 40 (SV40). We identify pervasive wraparound transcription in PyV, wherein transcription runs through the polyA site and circles the genome multiple times. Comparative analyses identify novel, conserved transcripts that increase PyV coding capacity. One of these conserved transcripts encodes superT, a T antigen containing two RB-binding LxCxE motifs. We find that superT-encoding transcripts are abundant in PyV-associated human cancers. Together, we show that comparative transcriptomic approaches can greatly expand known transcript and coding capacity in one of the simplest and most well-studied viral families.
Year: 2022 PMID: 35363834 PMCID: PMC9007360 DOI: 10.1371/journal.ppat.1010401
Source DB: PubMed Journal: PLoS Pathog ISSN: 1553-7366 Impact factor: 7.464
Fig 4. Pervasive wraparound transcription across PyV.
A-C. Watch plots indicating the four most abundant late wraparound transcript classes in dRNAseq data from SV40 (A), BKPyV Dunlop (B), and MPyV (C). The outer ring of each watch plot indicates the position of the viral ORFs. The inner arms are histograms detailing the distribution of transcript starts (in blue) and ends (in red) for transcripts within each transcript class. The red segments indicate exons. Transcripts start in the innermost ring; a second or third ring indicates that the pre-mRNA that generated the transcript must have circled the viral genome multiple times. The 3’ end of the transcript and the direction in which these plots are oriented are indicated by the red arrow at the end of the last exon segment. The red exon segments start at the most common transcript start site within the transcript class and end at the most common transcript end site within the class. The watch plot key shows an example of the path of the pre-mRNA for SV40 transcript class L6_I. D. Bar plots indicating the percentage of late transcripts that span a given number of genome lengths in SV40, BKPyV Dunlop, and MPyV dRNAseq data. E. The leader-leader junction, which connects the pre-mRNA from one genome to the subsequent wraparound, was identified in Illumina short-RNAseq (total) data. The intron in question is plotted as a black line, with the x axis indicating the genomic position of the intron. The top late wraparound transcript for each virus was plotted. The gene map indicates the approximate gene position and is accurate for SV40; the exact position of the viral genes varies between viruses. Percentages indicate the percentage of late junction-spanning transcripts that support the plotted wraparound leader-leader junction. F. Schematic illustrating how leader-leader wraparound transcription can be detected from short-read short-RNAseq (total) data. Leader-leader splicing can be seen as a repetitive exon in watch plots from long-read RNAseq data.
Ultimately, there was an original processed mRNA in the cell that contained two tandem leader sequences. When this transcript of origin is sequenced via short read sequencing, reads will be generated across its length. A minority of these reads will span the leader-leader junction, and mapping against the viral reference genome can be used to uncover leader-leader splicing.
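The idea of a transcript spanning more than one genome length (panel D) and of a repeated leader exon (panel F) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy genome length of 5000 nt, and the exon coordinates are hypothetical, and the sketch assumes forward-strand transcripts whose exon blocks are listed 5'→3' in 0-based, half-open circular coordinates. It also assumes each intron takes the shortest path around the circle, and it uses one plausible definition of "genome lengths spanned" (the unrolled span divided by genome length, rounded up).

```python
from collections import Counter


def _ceil_div(a, b):
    """Integer ceiling division that also behaves for negative numerators."""
    return -(-a // b)


def unroll_blocks(blocks, genome_len):
    """Map exon blocks from circular to monotonically increasing coordinates.

    Whenever an exon starts "behind" the end of the previous exon, the
    pre-mRNA is assumed to have gone around the circular genome, so whole
    genome lengths are added until the exon lies downstream again.
    """
    unrolled = []
    for start, end in blocks:
        length = (end - start) % genome_len  # tolerates exons crossing coordinate 0
        if unrolled:
            prev_end = unrolled[-1][1]
            laps = max(0, _ceil_div(prev_end - start, genome_len))
            start += laps * genome_len
        unrolled.append((start, start + length))
    return unrolled


def genome_lengths_spanned(blocks, genome_len):
    """Number of genome lengths covered from transcript start to transcript end."""
    unrolled = unroll_blocks(blocks, genome_len)
    span = unrolled[-1][1] - unrolled[0][0]
    return max(1, _ceil_div(span, genome_len))


def percent_by_genome_lengths(transcripts, genome_len):
    """Percentage of transcripts spanning 1, 2, 3, ... genome lengths."""
    counts = Counter(genome_lengths_spanned(t, genome_len) for t in transcripts)
    total = sum(counts.values())
    return {n: 100.0 * c / total for n, c in sorted(counts.items())}


# Toy data (hypothetical): a leader exon, a second copy of the leader reached
# by wraparound, then a body exon; and an ordinary single-pass transcript.
TOY_GENOME_LEN = 5000
wraparound = [(300, 500), (400, 520), (1500, 2700)]
single_pass = [(300, 500), (1500, 2700)]

print(unroll_blocks(wraparound, TOY_GENOME_LEN))
print(percent_by_genome_lengths([wraparound, single_pass], TOY_GENOME_LEN))
```

In this toy case the second leader copy starts upstream of where the first leader ends, which is the signature of a leader-leader junction. A junction that merely crosses coordinate 0 of the reference would look the same under this simple test, so real data would need strand and annotation checks.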
Fig 5. Detection of novel, conserved splicing events that expand PyV coding capacity.
A-D. Schematics illustrating identified ORFs. Each row is a reading frame (except for ST and the LT 1st exon, which are in the same frame), and unannotated amino acids are represented by grey boxes. The measured intron is indicated by the red arrow. Colored ORFs are annotated, while grey ORFs are unannotated. Percentages on the right side of the Fig are the percentage of spliced viral transcripts on the same strand as determined from short-read short-RNAseq (total) data. Numbers after each virus name indicate the transcript class within each short-RNAseq (total) dataset. A) ST2: This ORF is generated by a splicing event that uses the LT first exon donor and an acceptor within the ST ORF. In HPyV7 and BKPyV Dunlop, the splice lands in frame and results in an internal deletion within ST. In MPyV and MCPyV, the splice lands out of frame, resulting in an ORF that contains the N-terminal region of ST and novel amino acids at the C terminus. B) MT: MPyV encodes an MT following splicing that connects the end of the ST ORF with an ORF in an alternate frame of the LT second exon. In BKPyV, a similar splice connects ST with an MT-like ORF in an alternative frame of the LT second exon. C) VP1X: JCPyV encodes two VP1X ORFs generated by splicing within VP1 and landing in an alternative frame of VP1, or earlier in the late region due to wraparound transcription. While predominant in JCPyV, VP1X is likewise present in many other PyV. D) superT: The superT-specific splice utilizes the splice donor canonically associated with truncated T antigens such as 17kT in SV40 and truncT in BKPyV. Due to wraparound transcription, an LT second exon acceptor is available 3’ of this donor and acts as the acceptor. For the superT ORF to form, an initial LT splice is required. Ultimately, superT contains a duplication of part of the LT second exon that includes the RB-binding LxCxE motif.
E. Schematics detailing BKPyV Dik isolates used for querying the existence of superT. BKPyV WT is wild type virus. In M1, the LT intron has been replaced with an intron from the plasmid pCI. Both WT and M1 are expected to generate LT and superT of expected sizes. In M2, the LT intron has been completely removed and the pCI intron is located directly 5’ of the LT ORF. M2 is expected to encode LT of expected size, but a larger superT variant due to incorporation of a second copy of the LT first exon. F. Western blot of cells infected with BKPyV Dik WT, M1, or M2 and probed with an antibody reactive against LT. The lower molecular weight band is LT, and the higher molecular weight bands are consistent with superT.
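Because superT is distinguished by a duplicated stretch of the LT second exon carrying the LxCxE motif, a translated ORF can be screened for the number of motif copies. The following is a minimal Python sketch under stated assumptions: `find_lxcxe` is a hypothetical helper, the peptide strings are synthetic placeholders rather than real T antigen sequences, and the motif is matched purely as the pattern L-x-C-x-E with no consideration of sequence context.

```python
import re

# L, any residue, C, any residue, E. The lookahead lets overlapping
# occurrences be reported as separate hits.
LXCXE = re.compile(r"(?=(L.C.E))")


def find_lxcxe(protein):
    """Return (0-based position, matched residues) for each LxCxE occurrence."""
    return [(m.start(), m.group(1)) for m in LXCXE.finditer(protein.upper())]


# Synthetic placeholder peptides (not real viral sequences).
one_copy = "AAAALFCSEAAAA"
two_copies = one_copy + "GGGLFCSEGG"  # mimics a duplicated motif-bearing segment

print(find_lxcxe(one_copy))
print(len(find_lxcxe(two_copies)))
```

A count of two hits is consistent with, but does not by itself establish, a superT-like ORF; the splice structure described in panel D would still need to be confirmed from the transcript.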
Information on all samples and viruses (excluding tumors) can be found in Table 1.
| Virus | Sequencing Type | Origin | MOI / Timepoint | Host |
|---|---|---|---|---|
| SV40 | dRNAseq (two replicates) | Generated here | MOI 1 / 48hpi | |
| SV40 | SMRTseq | Generated here | MOI 1 / 48hpi | |
| SV40 (polysome input/whole-cell) | dRNAseq | Generated here | MOI 1 / 44hpi | |
| SV40 (polysome) | dRNAseq | Generated here | MOI 1 / 44hpi | |
| SV40 | Short-RNAseq (total) | Generated here | MOI 1 / 48hpi | |
| SV40 | Short-RNAseq (polyA) | Generated here | MOI 1 / 48hpi | |
| BKPyV (Dunlop) | dRNAseq | Generated here | MOI 0.5 / 3dpi | Human |
| BKPyV (Dunlop) | SMRTseq | Generated here | MOI 0.5 / 3dpi | Human |
| BKPyV (Dunlop) | Short-RNAseq (total) | Generated here | MOI 0.5 / 3dpi | Human |
| BKPyV (Dunlop) | Short-RNAseq (polyA) | Generated here | MOI 0.5 / 3dpi | Human |
| BKPyV (Dik) WT | Short-RNAseq (total) | Generated here | MOI 1 / 5dpi | Human |
| BKPyV (Dik) WT | Short-RNAseq (polyA) | Generated here | MOI 1 / 5dpi | Human |
| BKPyV (Dik) M1 | Short-RNAseq (polyA) | Generated here | MOI 1 / 5dpi | Human |
| BKPyV (Dik) M2 | Short-RNAseq (polyA) | Generated here | MOI 1 / 5dpi | Human |
| MPyV | dRNAseq | Generated here | Unknown / 28hpi | Mouse |
| MPyV | Short-RNAseq (total) | Garren et al. [ | MOI 50 / 36hpi | Mouse |
| JCPyV | Short-RNAseq (total) | Assetta et al. [ | Unknown / 9dpi | Human |
| MCPyV (Synthetic genome) | Short-RNAseq (polyA) | Theiss et al. [ | 200ng viral DNA / Unknown | Human |
| HPyV7 | Short-RNAseq (total) | Rosenstein et al. [ | From infected human skin | Human |
| BSPyV1 | Short-RNAseq (total) | Identified by Schmidlin et al. [ | From whole scorpion | |