Jason Nomburg, Matthew Meyerson, James A DeCaprio.
Keywords: COVID-19; Direct RNA sequencing; SARS-CoV-2; Transcription
Year: 2020 PMID: 33256807 PMCID: PMC7704119 DOI: 10.1186/s13073-020-00802-w
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Information on transcriptomes analyzed in this study
| Source | Virus isolate (MOI) | Infection host | Timepoint | Accession or DOI | Technology |
|---|---|---|---|---|---|
| Taiaroa et al. | Australia/VIC01/2020 (MOI: Unknown) (Passage unknown) | Vero cells | Unknown | SRR11350376 | dRNAseq |
| Kim et al. | Korea/KCDC02/2020 (MOI: 0.05) (4th passage. No plaque isolation) | Vero cells | 24 hpi | 10.17605/OSF.IO/8F6N9 | dRNAseq |
| Davidson et al. | England/VE6-T/2020 (MOI: Unknown) (2nd passage. No plaque isolation) | Vero cells | Unknown | 10.5281/zenodo.3722580 | dRNAseq |
| Finkel et al. | GISAID Acc. No. EPI_ISL_406862 (MOI: 0.2) (4th passage. No plaque isolation) | Vero cells | 5 hpi | SRR11713354 | Illumina PolyA |
| Finkel et al. | GISAID Acc. No. EPI_ISL_406862 (MOI: 0.2) (4th passage. No plaque isolation) | Vero cells | 24 hpi | SRR11713362 | Illumina PolyA |
| Blanco-Melo et al. | USA/WA1/2020 (MOI: 0.2) (Passage unknown) | A549-ACE2 cells | 24 hpi | SRR11517741 | Illumina PolyA |
| Blanco-Melo et al. | USA/WA1/2020 (5 × 10⁴ PFU/ferret) (Passage unknown) | Ferret (Nasal washings) | 3 dpi | SRR11517855-SRR11517858 | Illumina PolyA |
| Suzuki et al. | SARS-CoV-2/Hu/DP/Kng/19-020 (5 × 10⁴ PFU/100 organoids) (Passage unknown but | Bronchial organoids | 5 dpi | SRR11811022 | Illumina PolyA |
| Emanuel et al. | Frankfurt strain AY310120.1 (Passage unknown) | Calu3 | 4 hpi | SRR11550047 | Illumina Total RNA |
| Emanuel et al. | Frankfurt strain AY310120.1 (Passage unknown) | Calu3 | 24 hpi | SRR11550045 | Illumina Total RNA |
| Viehweger et al. | CoV-229E (MOI: 3) (Passage unknown) | Huh7 cells | 24 hpi | ERR3460961 | dRNAseq |
hpi: hours post infection; dpi: days post infection
Fig. 1 SARS-CoV-2 generates a defined population of canonical subgenomic RNAs. a–c For each location on the viral genome, a histogram of 5′ and 3′ junctions at that position was calculated and plotted as an inverse peak. The histogram bin size is 100 bases, meaning each inverse peak represents the cumulative count of 5′ or 3′ junctions occurring within that span. Curved lines represent the 5′ and 3′ locations of junctions that occur at least twice. Red curves represent canonical junctions; black curves represent non-canonical junctions. “Taiaroa” [7], “Kim” [8], and “Davidson” [9] represent the three independent SARS-CoV-2 dRNAseq datasets investigated. d–f A histogram of 3′ junctions past position 21000 that have a 5′ end before position 100 is plotted. Dashed lines indicate the start coordinates of annotated viral genes. Bin size is 20 bases. g Based on junction analysis, the predicted major species of virus-produced RNAs are represented. The most 5′ gene or genes on each subgenomic RNA are listed.
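The binning described in panels a–c can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names and the example coordinates are hypothetical, and it assumes junctions are already available as (5′ position, 3′ position) pairs.

```python
from collections import Counter

def bin_junctions(positions, bin_size=100):
    """Count junction coordinates per bin; keys are bin start coordinates."""
    counts = Counter((pos // bin_size) * bin_size for pos in positions)
    return dict(sorted(counts.items()))

def recurrent_junctions(junctions, min_count=2):
    """Keep (5' position, 3' position) pairs seen at least min_count times."""
    counts = Counter(junctions)
    return {pair: n for pair, n in counts.items() if n >= min_count}

# Made-up junctions for illustration only (not data from the study).
junctions = [(65, 21552), (65, 21552), (70, 28255), (150, 26480)]

# Histogram of 5' junction positions with the 100-base bin size from the caption.
binned_5p = bin_junctions([five for five, _ in junctions], bin_size=100)

# Junctions that occur at least twice, i.e. those that would be drawn as curves.
curves = recurrent_junctions(junctions)
```

The same `bin_junctions` helper could be called with `bin_size=20` for the zoomed histograms in panels d–f.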
Fig. 2 Identification of few prominent dataset-specific non-canonical junctions. a Percentage of junctions that are non-canonical in five independent datasets. Junctions were assigned as canonical if their 5′ location was within 20 bases of the TRS-L and their 3′ location was within 15 bases of a TRS-B, and otherwise assigned as non-canonical. Taiaroa, Kim, and Davidson are dRNAseq, while Finkel and Blanco-Melo are Illumina PolyA RNAseq. b Illustration of the computational approach used to determine the consistency of 5′ and 3′ junction points across independent datasets. The percentage of non-canonical junctions at each genome position in each dataset was determined (X). The mean (μ) and standard deviation (σ) of the percentages at each position across the five datasets were calculated. For each position in each dataset, the Z-score was calculated as (X − μ)/σ (i.e., the number of standard deviations away from the mean), and the percentages and Z-scores for each position in each dataset were plotted. c, d Each point represents the percentage and Z-score of one position in one dataset. For each position in the SARS-CoV-2 genome, the percentage and Z-score of non-canonical junctions with a 5′ end (c) or a 3′ end (d) were determined as described above for five independent datasets: Taiaroa, Kim, Davidson, Finkel, and Blanco-Melo. Points with a percentage above 4% of non-canonical junctions and a Z-score above 1 were highlighted.
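The canonical/non-canonical rule and the per-position Z-score can be sketched as below. This is an illustrative sketch under stated assumptions: the TRS coordinates are placeholder values rather than coordinates taken from the paper, the windows are treated as inclusive, and the population standard deviation is used because the caption does not say which estimator was applied.

```python
from statistics import mean, pstdev

def is_canonical(five_prime, three_prime, trs_l, trs_b_sites,
                 l_window=20, b_window=15):
    """Canonical: 5' end near the TRS-L and 3' end near any TRS-B."""
    near_leader = abs(five_prime - trs_l) <= l_window
    near_body = any(abs(three_prime - b) <= b_window for b in trs_b_sites)
    return near_leader and near_body

def z_scores(percentages):
    """Z-score (X - mu) / sigma for one genome position across datasets."""
    mu = mean(percentages)
    sigma = pstdev(percentages)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0] * len(percentages)
    return [(x - mu) / sigma for x in percentages]

# Placeholder coordinates, for illustration only.
trs_l = 70
trs_b_sites = [21556, 28260]
calls = [
    is_canonical(68, 21560, trs_l, trs_b_sites),  # both ends in range
    is_canonical(68, 23000, trs_l, trs_b_sites),  # 3' end far from any TRS-B
]

# Made-up percentages of non-canonical junctions at one position in five datasets.
percentages = [1.0, 1.0, 1.0, 1.0, 6.0]
scores = z_scores(percentages)

# Highlight rule from the caption: percentage above 4% and Z-score above 1.
highlighted = [i for i, (x, z) in enumerate(zip(percentages, scores))
               if x > 4 and z > 1]
```

In this toy input only the fifth dataset passes both thresholds, which is the kind of dataset-specific position the figure highlights.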
Fig. 3 Non-canonical junctions accumulate over time in cell culture. a Illustration of the computational approach used to determine the change in junction percentages over time. For each position in the SARS-CoV-2 genome, and separately for 5′ and 3′ junctions, the number of junctions at that position was calculated. Using these numbers, the percentage of 5′ or 3′ junctions falling at each position was calculated. The change in junction percentage at each position is defined as the difference between the position’s junction percentage at the late and early timepoints, and this junction change was plotted for each position. b–e The change in junction percentage for 5′ (orange) and 3′ (blue) junctions over time in the Finkel (b, d) and Emanuel (c, e) datasets was determined as described above. Positions with a change greater than 2.5% are annotated with text on the plot. Panels d and e are zoomed-in versions of b and c. f, g Junctions were assigned into groups based on their 5′ and 3′ junction positions. If a junction had a 5′ end within 20 bases of the TRS-L and a 3′ end within 15 bases of a TRS-B, it was considered a canonical junction belonging to the ORF with a start closest to the TRS-B. Otherwise, it was considered non-canonical. The percentage of junctions falling into each category was calculated for early and late timepoints, and the difference between each category’s percentage in the late vs early timepoint was plotted.
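The late-minus-early calculation in panel a can be sketched as follows. This is a minimal illustration with made-up positions; the function names are hypothetical, and treating the 2.5% annotation threshold as a magnitude (applied to both increases and decreases) is an assumption, since the caption does not specify the sign convention.

```python
from collections import Counter

def junction_percentages(positions):
    """Percentage of junctions falling at each genome position."""
    counts = Counter(positions)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {pos: 100.0 * n / total for pos, n in counts.items()}

def percentage_change(early_positions, late_positions):
    """Late percentage minus early percentage, per position."""
    early = junction_percentages(early_positions)
    late = junction_percentages(late_positions)
    return {pos: late.get(pos, 0.0) - early.get(pos, 0.0)
            for pos in sorted(set(early) | set(late))}

# Made-up 5' junction positions at an early and a late timepoint.
early_5p = [100] * 9 + [200]
late_5p = [100] * 8 + [200] + [300]

change = percentage_change(early_5p, late_5p)

# Assumption: annotate positions whose change exceeds 2.5% in magnitude.
annotated = [pos for pos, delta in change.items() if abs(delta) > 2.5]
```

The function would be run separately on 5′ and 3′ junction positions, matching the separate orange and blue series in panels b–e.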
Fig. 4 Non-canonical junctions are not associated with TRS-like homology. a Illustration of the computational approach used to assess homologous sequences between the 5′ and 3′ junction points. For each of the three dRNAseq datasets, the 30 bases flanking the 5′ and 3′ junction points were assessed, and the longest homologous sequence between these two 30-base regions was determined. b–d Homologous sequences present in the 15 bases on either side of the 5′ and 3′ ends of each junction. Junctions are classified by the location of their 3′ end: if this is within 15 bases of the canonical TRS site or if it falls within a gene, it is assigned accordingly. The only exception is ORF1a: junctions with a 5′ end originating in ORF1a are assigned to ORF1a. Labels represent the most common homologous sequence between the ends of each junction. The core TRS sequence ACGAAC is underlined. e–g The lengths of the longest homologous sequence for canonical (C) and non-canonical (NC) junctions are plotted. Here, junctions were considered canonical if they have a 5′ end within 20 nucleotides of the TRS-L and a 3′ end within 15 nucleotides of a TRS-B. (R) represents random homology lengths. The points for (R) were calculated by first extracting all possible 30-base sequences (30mers) from the SARS-CoV-2 genome, and then assessing the length of the longest homologous sequence between 100,000 random pairs of 30mers that are separated by at least 1000 bases on the genome. Within each column, the relative width of each band represents the relative abundance of junctions with each homology length. The value at the top of each column is the mean homology length.
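One way to sketch the homology assessment is with a longest-common-substring search between the two flanking regions, plus a random 30mer baseline. This is not the authors' implementation: it assumes "homologous sequence" means an exact shared substring, that "separated by at least 1000 bases" refers to the distance between 30mer start positions, and it uses a randomly generated toy genome in place of the SARS-CoV-2 reference. All function names are hypothetical.

```python
import random

def longest_common_substring(a, b):
    """Longest exact substring shared by a and b (dynamic programming)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]

def flank(genome, pos, half_width=15):
    """Up to 15 bases on either side of a 0-based junction coordinate."""
    return genome[max(0, pos - half_width):pos + half_width]

def random_homology_lengths(genome, n_pairs, k=30, min_sep=1000, seed=0):
    """Homology lengths for random pairs of k-mers at least min_sep apart."""
    n_starts = len(genome) - k + 1
    if n_starts <= min_sep:
        raise ValueError("genome too short for the requested separation")
    rng = random.Random(seed)
    lengths = []
    while len(lengths) < n_pairs:
        i, j = rng.randrange(n_starts), rng.randrange(n_starts)
        if abs(i - j) >= min_sep:
            shared = longest_common_substring(genome[i:i + k], genome[j:j + k])
            lengths.append(len(shared))
    return lengths

# Two toy flanking sequences that share the core TRS sequence ACGAAC.
shared = longest_common_substring("TTACGAACTT", "GGACGAACGG")

# Toy random genome (not the viral reference) and a small random baseline.
toy_rng = random.Random(1)
toy_genome = "".join(toy_rng.choice("ACGT") for _ in range(3000))
baseline = random_homology_lengths(toy_genome, n_pairs=50)
```

With a real reference sequence, `n_pairs` would be set to 100,000 to match the caption, and the mean of the returned lengths would correspond to the value shown at the top of the (R) column.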
Fig. 5 Relative abundance of subgenomic RNAs that contain only the 5′ end of ORF1a. a–h Read coverage (black) and cumulative 5′ junctions (red) are plotted for eight independent datasets. The sequencing strategy and sample types are annotated. i A schematic of a representative subgenomic RNA that consists of only the 5′ region of ORF1a. Pileups showing reads can be found in Additional file 1: Fig. S3.
Fig. 6 Junctions have the potential to generate variant open reading frames. a–c ORFs were predicted directly from transcript-derived reads for the three dRNAseq datasets. Each ORF was aligned against the protein sequences of canonical SARS-CoV-2 genes using the DIAMOND aligner. Variant ORFs were defined as ORFs that were assigned to a canonical SARS-CoV-2 protein but had an unexpected start or stop position, while perfectly aligning ORFs were considered canonical. The percentage of canonical and variant ORFs for each protein is plotted. d–l Schematics of M, N, and ORF1a are displayed with the approximate location of predicted transmembrane domains labeled in red. Histograms of the start and end sites of variant M, N, and ORF1a ORFs are displayed. Start sites of each variant ORF are on top, and end sites are on the bottom of each panel. Percentages represent the percentage of each ORF that is variant. Histogram bin sizes: M, 1; N, 10; ORF1a, 20. NTD: N-terminal domain. RBD: receptor binding domain. CD: connector domain. NSP: nonstructural protein. m–o The identity of ORF1a fusion partners is plotted on the Y-axis, with the count of such fusions on the X-axis. The top 10 fusion partners for each sample are represented. Color indicates whether the fusion partner is on the N or C terminus of ORF1a, whether the terminus is ambiguous, or whether the fusion is a “self” fusion between an upstream and downstream region of ORF1a.