
VIRTUS: a pipeline for comprehensive virus analysis from conventional RNA-seq data.

Yoshiaki Yasumizu1, Atsushi Hara2, Shimon Sakaguchi1, Naganari Ohkura1.   

Abstract

SUMMARY: The possibility that RNA transcripts from clinical samples contain plenty of virus RNAs has not been pursued actively so far. We here developed a new tool for analyzing virus-transcribed mRNAs, not virus copy numbers, in the data of bulk and single-cell RNA-sequencing of human cells. Our pipeline, named VIRTUS (VIRal Transcript Usage Sensor), was able to detect 762 viruses including herpesviruses, retroviruses and even SARS-CoV-2 (COVID-19), and quantify their transcripts in the sequence data. This tool thus enabled simultaneously detecting infected cells, the composition of multiple viruses within the cell, and the endogenous host-gene expression profile of the cell. This bioinformatics method would be instrumental in addressing the possible effects of covertly infecting viruses on certain diseases and developing new treatments to target such viruses.
AVAILABILITY AND IMPLEMENTATION: VIRTUS is implemented using Common Workflow Language and Docker under a CC-NC license. VIRTUS is freely available at https://github.com/yyoshiaki/VIRTUS.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press.

Year:  2021        PMID: 33017003      PMCID: PMC7745649          DOI: 10.1093/bioinformatics/btaa859

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

A variety of virus species, including retroviruses, flaviviruses and herpesviruses, might contribute to the development of human diseases such as autoimmune diseases and cancers. For example, Epstein–Barr virus (EBV) has been reported to play a causative role in head and neck cancer and lymphoma (Zapatka et al., 2020), and possibly in multiple sclerosis and systemic lupus erythematosus (Harley et al., 2018). It remains to be determined, however, which viruses are present in normal tissues and whether their state of activation contributes to disease development. Viruses can be detected by several methods such as antibody-based assays and PCR. Virus copy numbers in the genome can also be assessed from NGS-derived data with tools such as VirTect (Khan et al., 2019) and Kraken2 (Wood et al., 2019). On the other hand, it has been technically difficult to examine the state of a virus in host tissues, especially in relation to the endogenous expression of host genes. In addition, because viral infection is heterogeneous across cell populations, it is unclear which cells are infected, how many virus species are present in those cells, and what states the viruses and the host cells assume. To address these issues, RNA information derived from polyA-based reverse transcription should be useful for analyzing intracellular viruses, since viruses exploit the host transcription machinery, which yields polyA-tailed viral RNA transcripts along with endogenous RNAs from the host cells. We therefore set out to establish a tool for measuring multiple viral transcriptomes, even in a single cell.

2 VIRTUS

We developed a pipeline for detecting and quantifying transcripts of multiple viruses from conventional human RNA-seq data, and named it VIRTUS (VIRal Transcript Usage Sensor) (Supplementary Fig. S1). In the VIRTUS framework, RNA-seq reads were quality-trimmed and filtered by fastp (Chen et al., 2018) and mapped to the human genome by STAR (Dobin et al., 2013). The unmapped reads were then aligned to 762 virus genome references. After removal of polyA/T-containing reads, the infecting viruses were identified comprehensively. The abundance of viral transcripts was then quantified using salmon (Patro et al., 2017). Finally, the profiles of viral gene expression were integrated with the profiles of host-gene expression in each cell or sample.
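The order of the steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the VIRTUS implementation (which is written in Common Workflow Language): the function names, all file and index names, the reuse of STAR for the virus alignment, the reads passed to salmon, and the 20-base threshold for a polyA/T run are hypothetical choices that the text does not specify. The commands are only assembled as argument lists and are not executed.

```python
import re
from typing import Iterable, Iterator, List, Tuple


def plan_commands(fastq: str, human_index: str, virus_index: str,
                  salmon_index: str, threads: int = 4) -> List[List[str]]:
    """Assemble the external commands in the order given in the text.

    All file and directory names are hypothetical placeholders.
    """
    unmapped = "human_Unmapped.out.mate1"  # STAR's name for unmapped reads
    return [
        # 1. Quality trimming and filtering with fastp.
        ["fastp", "-i", fastq, "-o", "trimmed.fq"],
        # 2. Mapping to the human genome with STAR, keeping unmapped reads.
        ["STAR", "--runThreadN", str(threads), "--genomeDir", human_index,
         "--readFilesIn", "trimmed.fq", "--outReadsUnmapped", "Fastx",
         "--outFileNamePrefix", "human_"],
        # 3. Alignment of unmapped reads to the virus references
        #    (assumption: STAR is reused for this step).
        ["STAR", "--runThreadN", str(threads), "--genomeDir", virus_index,
         "--readFilesIn", unmapped, "--outFileNamePrefix", "virus_"],
        # 4. Transcript quantification with salmon
        #    (assumption: the unmapped reads are the input).
        ["salmon", "quant", "-i", salmon_index, "-l", "A",
         "-r", unmapped, "-p", str(threads), "-o", "salmon_out"],
    ]


def has_poly_at(seq: str, min_run: int = 20) -> bool:
    """Return True if the read contains a run of at least min_run A or T bases."""
    pattern = r"A{%d,}|T{%d,}" % (min_run, min_run)
    return re.search(pattern, seq.upper()) is not None


def drop_poly_at(reads: Iterable[Tuple[str, str]],
                 min_run: int = 20) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs that do not contain a polyA/T run."""
    for name, seq in reads:
        if not has_poly_at(seq, min_run):
            yield name, seq
```

In this sketch, the polyA/T filter would be applied to the virus-aligned reads produced by step 3, before the infecting viruses are called.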

3 Results

3.1 Applications to bulk RNA-seq analyses

We first analyzed bulk RNA-seq data of B cells infected with EBV (Mrozek-Gorska et al., 2019) (Fig. 1b and c). VIRTUS successfully detected EBV in all infected replicates (Supplementary Fig. S5a), and the frequency of incorrect assignment of virus infection was much lower than with other tools such as VirTect and Kraken2 (Supplementary Fig. S5b–d). It was also able to quantify the EBV transcripts (Fig. 1b, Supplementary Fig. S5e) and detect their splicing patterns (Fig. 1c, Supplementary Fig. S5h). We also evaluated virus content in clinical samples. VIRTUS successfully detected Hepatitis C virus infections in chronic hepatitis patients (Supplementary Fig. S4; Boldanova et al., 2017) and several latent infections, such as human herpesvirus 4, 5 and 6B and human adenovirus C, in peripheral blood leukocytes from 12 systemic lupus erythematosus patients and 4 healthy donors (Fig. 1a; Rai et al., 2016). In addition, in bronchoalveolar lavage fluids from two SARS-CoV-2 infected patients, VIRTUS successfully detected SARS-CoV-2 in both patients (Supplementary Fig. S6; Chen et al., 2020).
Fig. 1.

VIRTUS, a pipeline for analyzing multiple viruses, and its outputs from conventional RNA-seq data. (a) Viruses detected in peripheral blood leukocytes from Systemic Lupus Erythematosus (SLE) patients and healthy donors. The colors indicate the log2-transformed number of reads mapped to each virus. Only viruses with more than five mapped reads in at least one sample are shown. In this dataset, the statistical power was insufficient to detect differences (two-sided Mann-Whitney U test). (b) Top 20 differentially expressed genes within EBV infected cells. The color shows the normalized expression. (c) Virus-mapped reads with splicing on the EBV genome, visualized by the Integrative Genomics Viewer. (d) Mean expression of HSV-1 transcripts and the expression of a correlated host gene, RASD1, on UMAP plots. (e) Differentially expressed genes between HSV-1 infected and non-infected cells. The x-axis shows the log2 fold change in gene expression and the y-axis shows -log10 (P-value) calculated by DESeq2.
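The display rule used for panel (a) can be sketched as follows. This is a minimal illustration rather than the authors' plotting code: the function name and the input layout (virus name mapped to a list of per-sample read counts) are hypothetical, and a pseudocount of 1 is assumed before the log2 transformation so that samples with zero reads remain defined.

```python
import math
from typing import Dict, List


def filter_and_log2(counts: Dict[str, List[int]],
                    min_reads: int = 5) -> Dict[str, List[float]]:
    """Keep viruses with more than min_reads reads in at least one sample.

    Returns log2(count + 1) per sample for each retained virus
    (the +1 pseudocount is an assumption, not stated in the caption).
    """
    kept: Dict[str, List[float]] = {}
    for virus, per_sample in counts.items():
        if max(per_sample) > min_reads:
            kept[virus] = [math.log2(c + 1) for c in per_sample]
    return kept
```

For example, a virus with counts of 0, 7 and 3 reads across three samples would be retained, whereas a virus whose highest count is exactly 5 would be dropped.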

3.2 Applications to single-cell RNA-seq analyses

We next applied VIRTUS to droplet-based single-cell RNA-seq data of human primary fibroblasts infected with Herpes simplex virus 1 (HSV-1) (Wyler et al., 2019) (Fig. 1d and e). First, we conducted a pooled screening of viruses, in which the reads from all cells were assigned at once, and detected HSV-1 in the samples. Then, we measured HSV-1 transcripts by Alevin (Srivastava et al., 2019), which was suitable for downstream analysis of VIRTUS. Using VIRTUS and a standard single-cell pipeline, we identified infected single cells and found genes differentially expressed between infected and non-infected cells, such as RASD1 and MT-RNR1. As shown in Fig. 1d and e, RASD1, one of the differentially expressed genes, was tightly linked to the HSV-1 infected cells.
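The comparison between infected and non-infected cells can be illustrated with a small sketch. It assumes per-cell viral UMI counts and per-cell expression values of one host gene are already available as dictionaries keyed by cell barcode; the function names, the threshold of one viral UMI for calling a cell infected, and the pseudocount are hypothetical. It computes only a log2 fold change and does not reproduce the statistical testing by DESeq2 reported in Fig. 1e.

```python
import math
from statistics import mean
from typing import Dict, List, Tuple


def split_cells(viral_umis: Dict[str, int],
                min_umis: int = 1) -> Tuple[List[str], List[str]]:
    """Split cell barcodes into infected and non-infected groups.

    A cell is called infected when it has at least min_umis viral UMIs
    (an assumed threshold, chosen only for illustration).
    """
    infected = [cell for cell, n in viral_umis.items() if n >= min_umis]
    uninfected = [cell for cell, n in viral_umis.items() if n < min_umis]
    return infected, uninfected


def log2_fold_change(expr: Dict[str, float], infected: List[str],
                     uninfected: List[str], pseudocount: float = 1.0) -> float:
    """log2 ratio of mean expression in infected over non-infected cells."""
    mean_infected = mean([expr[cell] for cell in infected])
    mean_uninfected = mean([expr[cell] for cell in uninfected])
    return math.log2((mean_infected + pseudocount) /
                     (mean_uninfected + pseudocount))
```

A positive value indicates higher mean expression of the host gene in the infected group.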

4 Conclusion

We developed a novel pipeline for detecting and quantifying viral transcriptomes, VIRTUS, which can be applied to both bulk and single-cell RNA-seq analyses. With this tool, we can detect cells harboring activated viruses, the composition of multiple viruses in a cell and the expression differences between infected and uninfected cells. It should help us understand how viruses contribute to certain diseases as triggers or modifiers of disease development, and help devise new treatments that target viruses.

Funding

This work was supported by Grants-in-Aid from the Japan Society for the Promotion of Science (JSPS) for Scientific Research B 15H04744 to N.O. and for Specially Promoted Research 16H06295 to S.S., by the Core Research for Evolutional Science and Technology (CREST, no. 17 gm0410016h0006) program from the Japan Science and Technology Agency to S.S. and by Leading Advanced Projects for medical innovation (LEAP, no. 18 gm0010005h0001) from Japan’s Agency for Medical Research and Development (AMED) to S.S. and Y.Y.

Conflict of Interest: none declared.

References

1.  STAR: ultrafast universal RNA-seq aligner.

Authors:  Alexander Dobin; Carrie A Davis; Felix Schlesinger; Jorg Drenkow; Chris Zaleski; Sonali Jha; Philippe Batut; Mark Chaisson; Thomas R Gingeras
Journal:  Bioinformatics       Date:  2012-10-25       Impact factor: 6.937

2.  RNA-seq Analysis Reveals Unique Transcriptome Signatures in Systemic Lupus Erythematosus Patients with Distinct Autoantibody Specificities.

Authors:  Richa Rai; Sudhir Kumar Chauhan; Vikas Vikram Singh; Madhukar Rai; Geeta Rai
Journal:  PLoS One       Date:  2016-11-11       Impact factor: 3.240

3.  Salmon provides fast and bias-aware quantification of transcript expression.

Authors:  Rob Patro; Geet Duggal; Michael I Love; Rafael A Irizarry; Carl Kingsford
Journal:  Nat Methods       Date:  2017-03-06       Impact factor: 28.547

4.  Transcription factors operate across disease loci, with EBNA2 implicated in autoimmunity.

Authors:  John B Harley; Xiaoting Chen; Mario Pujato; Daniel Miller; Avery Maddox; Carmy Forney; Albert F Magnusen; Arthur Lynch; Kashish Chetal; Masashi Yukawa; Artem Barski; Nathan Salomonis; Kenneth M Kaufman; Leah C Kottyan; Matthew T Weirauch
Journal:  Nat Genet       Date:  2018-04-16       Impact factor: 38.330

5.  fastp: an ultra-fast all-in-one FASTQ preprocessor.

Authors:  Shifu Chen; Yanqing Zhou; Yaru Chen; Jia Gu
Journal:  Bioinformatics       Date:  2018-09-01       Impact factor: 6.937

6.  Alevin efficiently estimates accurate gene abundances from dscRNA-seq data.

Authors:  Avi Srivastava; Laraib Malik; Tom Smith; Ian Sudbery; Rob Patro
Journal:  Genome Biol       Date:  2019-03-27       Impact factor: 13.583

7.  Epstein-Barr virus reprograms human B lymphocytes immediately in the prelatent phase of infection.

Authors:  Paulina Mrozek-Gorska; Alexander Buschle; Dagmar Pich; Thomas Schwarzmayr; Ron Fechtner; Antonio Scialdone; Wolfgang Hammerschmidt
Journal:  Proc Natl Acad Sci U S A       Date:  2019-07-24       Impact factor: 11.205

8.  RNA based mNGS approach identifies a novel human coronavirus from two individual pneumonia cases in 2019 Wuhan outbreak.

Authors:  Liangjun Chen; Weiyong Liu; Qi Zhang; Ke Xu; Guangming Ye; Weichen Wu; Ziyong Sun; Fang Liu; Kailang Wu; Bo Zhong; Yi Mei; Wenxia Zhang; Yu Chen; Yirong Li; Mang Shi; Ke Lan; Yingle Liu
Journal:  Emerg Microbes Infect       Date:  2020-02-05       Impact factor: 7.163

9.  The landscape of viral associations in human cancers.

Authors:  Marc Zapatka; Ivan Borozan; Daniel S Brewer; Murat Iskar; Adam Grundhoff; Malik Alawi; Nikita Desai; Holger Sültmann; Holger Moch; Colin S Cooper; Roland Eils; Vincent Ferretti; Peter Lichter
Journal:  Nat Genet       Date:  2020-02-05       Impact factor: 38.330

10.  Detection of human papillomavirus in cases of head and neck squamous cell carcinoma by RNA-seq and VirTect.

Authors:  Atlas Khan; Qian Liu; Xuelian Chen; Andres Stucky; Parish P Sedghizadeh; Daniel Adelpour; Xi Zhang; Kai Wang; Jiang F Zhong
Journal:  Mol Oncol       Date:  2019-02-23       Impact factor: 6.603
