Kevin E Wu, Furqan M Fazal, Kevin R Parker, James Zou, Howard Y Chang.
Abstract
SARS-CoV-2 genomic and subgenomic RNA (sgRNA) transcripts hijack the host cell's machinery. Subcellular localization of the viral RNA could thus play important roles in viral replication and the host antiviral immune response. We perform computational modeling of SARS-CoV-2 viral RNA subcellular residency across eight subcellular neighborhoods. We compare hundreds of SARS-CoV-2 genomes with the human transcriptome and other coronaviruses. We predict the SARS-CoV-2 RNA genome and sgRNAs to be enriched toward the host mitochondrial matrix and nucleolus, and that the 5' and 3' viral untranslated regions contain the strongest, most distinct localization signals. We interpret the mitochondrial residency signal as an indicator of intracellular RNA trafficking with respect to double-membrane vesicles, a critical stage in the coronavirus life cycle. Our computational analysis serves as a hypothesis generation tool to suggest models for SARS-CoV-2 biology and inform experimental efforts to combat the virus. A record of this paper's Transparent Peer Review process is included in the Supplemental Information.
Keywords: APEX-seq; COX4; SARS-CoV-2; double-membrane vesicle; hypothesis generation; machine learning model; proximity labelling; viral RNA localization
Year: 2020 PMID: 32673562 PMCID: PMC7305881 DOI: 10.1016/j.cels.2020.06.008
Source DB: PubMed Journal: Cell Syst ISSN: 2405-4712 Impact factor: 10.304
Figure 1. Depictions of the SARS-CoV-2 Genome, the Eight Compartments that RNA-GPS Predicts Viral Transcript Residency to, and the Predicted Residencies for SARS-CoV-2 sgRNAs and its 5′/CDS/3′ Sequence Segments
(A and B) The SARS-CoV-2 genome produces a series of sub-genomic RNAs (sgRNAs), each encoding one or more genes or proteins (A). These sgRNAs share a common leader 5′ sequence and a common trailing 3′ UTR sequence (arrow blocks). For each sgRNA, RNA-GPS predicts residency to each compartment in (B). Italicized text indicates the APEX2 fusion protein used to measure transcripts corresponding to each localization (see Table S1).
(C and D) Heatmap of rank scores (C), indicating how strongly each sgRNA (rows) is predicted to exhibit subcellular residency at each compartment (columns), compared with endogenous human transcripts measured to localize to that compartment. Colors indicate rank scores; the color scale is shared across all heatmaps. Most sgRNAs share similar residency patterns, exhibiting statistically significant enrichment toward the mitochondrial matrix and nucleolus (see Table S3). We also computed these rank scores against a baseline of other coronavirus residency signals (D). SARS-CoV-2 exhibits a stronger mitochondrial matrix residency signal than most other coronaviruses, along with greater overall nuclear residency, particularly at the nucleolus. For context, coronaviruses are generally predicted to have residency at the nucleolus, mitochondrial matrix, and ER membrane (see Figure S2). These predictions are also consistent across different models (see Figure S3) and negative-strand SARS-CoV-2 sgRNA precursors (see Figure S4).
(E) Predicted residency rank scores for the shared 5′ and 3′ segments and an averaged residency rank score for the variable coding segments. Even on their own, the short ∼90–250 base pair 5′ and 3′ segments carry mitochondrial and nucleolar residency signals.
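One way to read the rank scores in panels C to E is as a percentile: for a given compartment, the fraction of baseline transcripts whose predicted residency probability falls below that of the viral transcript. The sketch below illustrates that reading only. The function name `rank_score`, the half-weight handling of ties, and all probability values are assumptions made for illustration, not the published RNA-GPS implementation or its outputs.

```python
from bisect import bisect_left, bisect_right


def rank_score(query_prob, baseline_probs):
    """Percentile rank (0 to 1) of query_prob within baseline_probs.

    Ties with baseline values count as half (an assumed convention).
    """
    ordered = sorted(baseline_probs)
    if not ordered:
        raise ValueError("baseline_probs must be non-empty")
    below = bisect_left(ordered, query_prob)            # baseline values strictly below
    ties = bisect_right(ordered, query_prob) - below    # baseline values equal to query
    return (below + 0.5 * ties) / len(ordered)


# Hypothetical predicted probabilities, for illustration only (not real data).
baseline = {
    "mito_matrix": [0.10, 0.20, 0.30, 0.40],
    "nucleolus": [0.50, 0.60, 0.70, 0.80],
}
query = {"mito_matrix": 0.35, "nucleolus": 0.55}

# One rank score per compartment, each computed against its own baseline.
scores = {c: rank_score(query[c], baseline[c]) for c in baseline}
print(scores)
```

Scoring each compartment against its own baseline makes the values comparable across columns of a heatmap even when the raw predicted probabilities are on different scales. Swapping the baseline from human transcripts to other coronaviruses changes only the `baseline_probs` argument.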
Figure 2. Validation of SARS-CoV-2 Residency Predictions
(A) RNA-GPS predictions for the human cytomegalovirus β2.7 transcript, which has been shown to localize to the inner mitochondrial membrane. RNA-GPS correctly predicts its residency to the closest compartment it has been trained on—the mitochondrial matrix. This provides support that RNA-GPS can make reasonable predictions on viral RNA.
(B) To evaluate the effect of the potentially noisy mitochondrial examples in our APEX-seq training set on predicted SARS-CoV-2 residencies, we trained a “denoised” variant of RNA-GPS on a subsetted dataset that excludes these examples. This denoised model predicts the same residency pattern for the three components of the SARS-CoV-2 sgRNAs (compare with Figure 1E). For additional analysis of the mitochondrial dataset and predictions, see Figure S1.
| REAGENT or RESOURCE | SOURCE | IDENTIFIER |
|---|---|---|
| Coronavirus (incl. SARS-CoV-2) genome sequences | NCBI GenBank | Various (see query strings in covid19/baseline.py and covid19/covid19.py source code files in GitHub repository) |
| Human cytomegalovirus genome sequence | NCBI GenBank | |
| APEX-seq RNA localization data | GEO: GSE116008 | |
| RNA binding protein motif database | MEME Motif Databases | |
| seqFISH data | Derived from | |
| RNA-GPS model and SARS-CoV-2 analysis code | This manuscript and | |