
Cerebro: interactive visualization of scRNA-seq data.

Roman Hillje¹, Pier Giuseppe Pelicci¹,², Lucilla Luzi¹.

Abstract

Despite the growing availability of sophisticated bioinformatic methods for the analysis of single-cell RNA-seq data, few tools exist that allow biologists without extensive bioinformatic expertise to directly visualize and interact with their own data and results. Here, we present Cerebro (cell report browser), a Shiny- and Electron-based standalone desktop application for macOS and Windows that allows investigation and inspection of pre-processed single-cell transcriptomics data without requiring bioinformatic experience from the user. Through an interactive and intuitive graphical interface, users can (i) explore similarities and heterogeneity between samples and cell clusters in two-dimensional or three-dimensional projections such as t-SNE or UMAP, (ii) display the expression level of single genes or gene sets of interest, (iii) browse tables of most expressed genes and marker genes for each sample and cluster and (iv) display trajectories calculated with Monocle 2. We provide three examples prepared from publicly available datasets to show how Cerebro can be used and what its capabilities are. Through a focus on flexibility and direct access to data and results, we think Cerebro offers a collaborative framework for bioinformaticians and experimental biologists that facilitates effective interaction and narrows the gap between analysis and interpretation of the data.
AVAILABILITY AND IMPLEMENTATION: The Cerebro application, additional documentation, and example datasets are available at https://github.com/romanhaa/Cerebro. Similarly, the cerebroApp R package is available at https://github.com/romanhaa/cerebroApp. All components are released under the MIT License.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Year: 2020 PMID: 31764967 PMCID: PMC7141853 DOI: 10.1093/bioinformatics/btz877

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

1 Introduction

Transcriptomics data of single cells (scRNA-seq) are generated with unprecedented frequency due to the recent availability of fully commercialized workflows and improvements in throughput and costs (Svensson et al., 2018). Though sophisticated bioinformatic tools are being developed (Deng; Pliner; Zhang et al., 2019), appropriate analysis and interpretation of data from scRNA-seq experiments largely rely on a deep understanding of the biological context and experimental conditions behind the preparation of input cells. However, direct interaction with their own datasets is often out of reach for biologists without bioinformatic expertise. Existing visualization software for scRNA-seq data, such as Loupe Cell Browser by 10x Genomics or iSEE (Rue-Albrecht), often either provides a limited amount of results or requires the user to be proficient enough to execute (at least a few) commands in the terminal. Cerebro aims to overcome these technical hurdles and allow direct and interactive exploration of pre-processed scRNA-seq results.

2 Materials and methods

The key features of Cerebro include: (i) visualization of two-dimensional (2D) and three-dimensional (3D) projections such as t-SNE or UMAP; (ii) overview panels for samples and clusters; (iii) tables of most expressed genes and marker genes for each sample and cluster; (iv) tables of enriched pathways in marker genes of samples or clusters and (v) visualization of expression of user-specified genes and gene sets from MSigDB (Liberzon et al., 2011; Subramanian et al., 2005). All these elements are designed to be interactive. Plots can be exported to PNG and/or PDF, while tables can be saved in CSV and Excel format. The core of Cerebro is the cerebroApp R package [built with Shiny (https://CRAN.R-project.org/package=shiny)], which can be installed as a standalone application [built with Electron (https://electronjs.org/)]. Alternatively, the Cerebro user interface is available through the cerebroApp R package or as a Docker container. Input data must be prepared using the cerebroApp R package. Currently, cerebroApp offers functionality to export a Seurat object (both Seurat v2 and v3 are supported) to the Cerebro format in a single step (Butler et al., 2018). However, through existing conversion functions available in the Seurat framework, results generated with other analysis frameworks, such as scanpy (AnnData format), can be exported for visualization in Cerebro as well. cerebroApp also provides functions to perform a set of (optional) analyses, e.g. pathway enrichment analysis based on marker gene lists of samples or clusters through Enrichr (Chen et al., 2013; Kuleshov), gene set enrichment analysis for samples and clusters using the gene set variation analysis method (Hänzelmann et al., 2013), and extraction of single-cell trajectories calculated with Monocle 2 (Qiu et al., 2017). Parallel processing in these functions keeps execution time-efficient.
For human and mouse datasets, marker genes are intersected with the gene ontology term ‘cell surface’ (GO:0009986) to highlight potential markers for experimental enrichment of the respective cell community. The exported .crb file is then loaded into Cerebro, which displays the information from the Seurat object (Fig. 1). Full-size versions of the examples of the graphical interface of Cerebro shown in Fig. 1 can be found in Supplementary Figs S1–S6 as well as in the Cerebro GitHub repository.
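The cell-surface intersection described above can be pictured as a simple set-membership check. The following is a minimal sketch, not the cerebroApp implementation: the function name `flag_cell_surface_markers` is hypothetical, and the gene lists are toy stand-ins rather than actual GO:0009986 annotations.

```python
# Hypothetical illustration only; cerebroApp's own implementation may differ.

def flag_cell_surface_markers(marker_genes, cell_surface_genes):
    """Return (gene, on_cell_surface) pairs, preserving the marker order."""
    # Compare symbols case-insensitively (an assumption made for this sketch).
    surface = {gene.upper() for gene in cell_surface_genes}
    return [(gene, gene.upper() in surface) for gene in marker_genes]

# Toy inputs: symbols chosen for illustration, not taken from the paper.
cluster_markers = ["CD3E", "IL7R", "LTB", "CD8A"]
cell_surface_set = ["CD8A", "IL7R", "CD19"]  # stand-in for a GO:0009986 gene list

flagged = flag_cell_surface_markers(cluster_markers, cell_surface_set)
surface_markers = [gene for gene, on_surface in flagged if on_surface]
print(surface_markers)
```

Keeping the full marker table and adding a flag, rather than discarding non-surface genes, mirrors the stated goal of highlighting candidates for experimental enrichment without hiding the other markers.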

Fig. 1.

Schematic workflow of Cerebro. In the first step, the raw data are processed and analyzed (barcode extraction, alignment, etc.) using existing tools such as Cell Ranger, stored in a Seurat object, and exported to a .crb file using functions of the cerebroApp package. The .crb file can then be loaded into Cerebro for visualization. Currently, Cerebro can be launched as a standalone application, from the cerebroApp R package, or from the dedicated Docker container.

3 Usage scenario

To illustrate the proposed workflow, we analyzed three publicly available scRNA-seq datasets and provide the resulting .crb files in the Cerebro GitHub repository (Russell et al., 2018; Yu). For example, we analyzed the ‘pbmc_10k_v3’ dataset, which contains ∼10k human PBMCs from a healthy donor (link to dataset in Supplementary Material), following the basic Seurat (v2 and v3) and basic scanpy (Wolf et al., 2018) workflows. First, we loaded the feature matrix and created a Seurat/AnnData object, filtered cells based on numbers of transcripts and expressed genes, log-transformed transcript counts and normalized each cell to 10 000 transcripts. We then identified variable genes, scaled the expression matrix and regressed out numbers of transcripts, performed cell cycle and principal component analysis, identified clusters and described their relationship in a cluster tree. We also generated 2D and 3D projections using the t-SNE and UMAP algorithms. Then, we used cerebroApp to calculate the percentage of mitochondrial and ribosomal gene expression, obtain the most expressed genes and differentially expressed genes (marker genes) for each sample and cluster, perform pathway enrichment analysis using the identified marker genes, perform gene set enrichment analysis on all 5501 curated C2 gene sets available in MSigDB, and finally export a .crb file that can be loaded into Cerebro. Based on the combined information from pathway enrichment (in particular the Enrichr results from the Human Gene Atlas), marker genes and expression of additional genes and gene sets, we were able to retrieve the cell types commonly expected in PBMC samples (dendritic cells, NK cells, B cells, megakaryocytes, monocytes, CD4+ and CD8+ T cells) and assign a cell type to each cluster. If desired, these cell groups could be further discriminated by checking the expression of additional marker genes and gene sets.
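The early pre-processing steps named above (cell filtering, normalization to 10 000 transcripts with log-transformation, and percentage of mitochondrial and ribosomal expression) can be sketched on a toy count matrix. This is a plain-Python illustration under stated assumptions, not the Seurat, scanpy or cerebroApp code: the function names are hypothetical, the thresholds are arbitrary, normalization is assumed to follow the common "scale, then log(1 + x)" convention, and mitochondrial and ribosomal genes are assumed to be recognizable by the human symbol prefixes `MT-`, `RPS` and `RPL`.

```python
import math

def filter_cells(counts, min_transcripts, min_genes):
    """Keep cells (rows) with enough total transcripts and expressed genes."""
    return [
        cell for cell in counts
        if sum(cell) >= min_transcripts
        and sum(1 for c in cell if c > 0) >= min_genes
    ]

def log_normalize(counts, scale=10_000):
    """Scale each cell to `scale` transcripts, then apply log(1 + x)."""
    # Assumes every cell has a non-zero total (guaranteed after filtering
    # with min_transcripts >= 1).
    return [[math.log1p(c * scale / sum(cell)) for c in cell] for cell in counts]

def percent_expression(counts, genes, prefixes):
    """Percentage of each cell's transcripts from genes matching any prefix."""
    idx = [i for i, g in enumerate(genes) if g.startswith(tuple(prefixes))]
    return [100.0 * sum(cell[i] for i in idx) / sum(cell) for cell in counts]

# Toy data: rows are cells, columns are genes (values are made up).
genes = ["MT-CO1", "RPS3", "CD3E", "LYZ"]
counts = [
    [10, 20, 60, 10],
    [0, 1, 0, 0],     # too few transcripts and genes: removed by the filter
    [5, 0, 15, 30],
]

kept = filter_cells(counts, min_transcripts=10, min_genes=2)
normalized = log_normalize(kept)
pct_mito = percent_expression(kept, genes, ["MT-"])
pct_ribo = percent_expression(kept, genes, ["RPS", "RPL"])
print(pct_mito, pct_ribo)
```

Percentages are computed on the raw counts of the retained cells, so they are unaffected by the later normalization step.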

4 Conclusion

By providing access to comprehensive information on expression profiles of samples and clusters, we hope that Cerebro will accelerate data interpretation and ultimately knowledge acquisition. Notably, the proposed workflow also provides analytical flexibility by enabling the addition of custom analyses and results to the Seurat object. Since the code is completely open-source, it is possible (and encouraged) to modify and adapt Cerebro to display other results and data types. While cerebroApp currently only supports preparing Seurat objects for visualization in Cerebro, export methods for object types of other popular scRNA-seq analysis frameworks, such as SingleCellExperiment or AnnData [used by scanpy (Wolf et al., 2018)], can be added in the future. Furthermore, Seurat already provides functionality to import data from other frameworks, including the two mentioned above, and therefore serves as a gateway for the majority of datasets. An example of how to export data analyzed in scanpy for visualization in Cerebro is provided in the Cerebro GitHub repository. Because Cerebro is built as a Shiny app, it can be easily adapted to be hosted on web servers.

5 Software availability

The current standalone version of Cerebro is available for Windows and macOS and can be downloaded from the GitHub repository: https://github.com/romanhaa/Cerebro/releases. Alternatively, users of Windows, macOS and Linux can install (and find the source code of) the cerebroApp R package, which provides the same functionality as the standalone version, from: https://github.com/romanhaa/cerebroApp. Analysis of the example datasets was carried out in a Docker container to ensure reproducibility. The container was built using a recipe file stored in the Cerebro GitHub repository and is available through Docker Hub under the name ‘romanhaa/cerebro’.

References

1. Molecular signatures database (MSigDB) 3.0.

Authors: Arthur Liberzon; Aravind Subramanian; Reid Pinchback; Helga Thorvaldsdóttir; Pablo Tamayo; Jill P Mesirov
Journal: Bioinformatics Date: 2011-05-05 Impact factor: 6.937

2. Exponential scaling of single-cell RNA-seq in the past decade.

Authors: Valentine Svensson; Roser Vento-Tormo; Sarah A Teichmann
Journal: Nat Protoc Date: 2018-03-01 Impact factor: 13.491

3. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

4. Integrating single-cell transcriptomic data across different conditions, technologies, and species.

Authors: Andrew Butler; Paul Hoffman; Peter Smibert; Efthymia Papalexi; Rahul Satija
Journal: Nat Biotechnol Date: 2018-04-02 Impact factor: 54.908

5. Probabilistic cell-type assignment of single-cell RNA-seq for tumor microenvironment profiling.

Authors: Allen W Zhang; Ciara O'Flanagan; Elizabeth A Chavez; Jamie L P Lim; Nicholas Ceglia; Andrew McPherson; Matt Wiens; Pascale Walters; Tim Chan; Brittany Hewitson; Daniel Lai; Anja Mottok; Clementine Sarkozy; Lauren Chong; Tomohiro Aoki; Xuehai Wang; Andrew P Weng; Jessica N McAlpine; Samuel Aparicio; Christian Steidl; Kieran R Campbell; Sohrab P Shah
Journal: Nat Methods Date: 2019-09-09 Impact factor: 28.547

6. SCANPY: large-scale single-cell gene expression data analysis.

Authors: F Alexander Wolf; Philipp Angerer; Fabian J Theis
Journal: Genome Biol Date: 2018-02-06 Impact factor: 13.583

7. GSVA: gene set variation analysis for microarray and RNA-seq data.

Authors: Sonja Hänzelmann; Robert Castelo; Justin Guinney
Journal: BMC Bioinformatics Date: 2013-01-16 Impact factor: 3.169

8. Enrichr: interactive and collaborative HTML5 gene list enrichment analysis tool.

Authors: Edward Y Chen; Christopher M Tan; Yan Kou; Qiaonan Duan; Zichen Wang; Gabriela Vaz Meirelles; Neil R Clark; Avi Ma'ayan
Journal: BMC Bioinformatics Date: 2013-04-15 Impact factor: 3.169

9. Reversed graph embedding resolves complex single-cell trajectories.

Authors: Xiaojie Qiu; Qi Mao; Ying Tang; Li Wang; Raghav Chawla; Hannah A Pliner; Cole Trapnell
Journal: Nat Methods Date: 2017-08-21 Impact factor: 47.990

10. Extreme heterogeneity of influenza virus infection in single cells.

Authors: Alistair B Russell; Cole Trapnell; Jesse D Bloom
Journal: Elife Date: 2018-02-16 Impact factor: 8.140
