David G Garmire, Xun Zhu, Aravind Mantravadi, Qianhui Huang, Breck Yunits, Yu Liu, Thomas Wolfgruber, Olivier Poirion, Tianying Zhao, Cédric Arisdakessian, Stefan Stanojevic, Lana X Garmire.
Abstract
We present GranatumX, a next-generation software environment for single-cell RNA sequencing (scRNA-seq) data analysis. GranatumX is inspired by the interactive webtool Granatum. GranatumX enables biologists to access the latest scRNA-seq bioinformatics methods in a web-based graphical environment. It also offers software developers the opportunity to rapidly share their own tools with others in customizable pipelines. The architecture of GranatumX allows for easy inclusion of plugin modules, named Gboxes, which wrap around bioinformatics tools written in various programming languages and on various platforms. GranatumX can be run on the cloud or on private servers and generates reproducible results. It is a community-engaging, flexible, and evolving software ecosystem for scRNA-seq analysis, connecting developers with bench scientists. GranatumX is freely accessible at http://garmiregroup.org/granatumx/app.
Keywords: Analysis; Module; Pipeline; Single-cell RNA sequencing; Webtool
Year: 2021 PMID: 34973417 PMCID: PMC8864242 DOI: 10.1016/j.gpb.2021.07.005
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1 Overview of the GranatumX platform
GranatumX aims to bridge the gap between computational method developers (the bioinformaticians) and experiment designers (the biologists). It achieves this by building an end-to-end infrastructure that covers the packaging and containerization of code (Gbox packaging), organization and indexing of the Gboxes (Apps), customization of the analysis steps (pipeline building), visualization and result downloading (interactive analysis), and finally the aggregation and summarization of the study (report generation).
Figure 2 GranatumX deployment, data management, and analysis flow
A. GranatumX can be deployed on various computational environments, from PCs and servers to HPC systems and cloud services. GranatumX’s web UI adapts to devices with various screen sizes, which allows desktop and mobile access. B. GranatumX’s data management. Each Gbox (color-coded to represent a certain functionality) runs in an order-dependent position on the pipeline. It may take some project data and some user-specified parameters as input, and may generate results (interactive visualization, plots, tables, or even plain text) and new project data. All project data and results, as well as the specified parameters, are recorded and saved into the CDS and can be used for reproducibility control. C. A scRNA-seq computational study typically consists of three phases: the upload and parsing of the expression matrices and metadata (data entry), the quality improvement and signal extraction of the data (data processing), and finally the assorted analyses on the processed data which offer biological insights (data analysis). PC, personal computer; HPC, high-performance computing; UI, user interface; CDS, central data storage; GSEA, gene set enrichment analysis.
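The data flow in panel B can be illustrated with a minimal Python sketch. It is not the GranatumX implementation: `Gbox`, `CentralDataStorage`, `StepRecord`, `run_pipeline`, and the three toy steps are hypothetical names chosen for illustration, and the expression matrix is made-up data. The sketch only shows the idea that each step reads project data and parameters, returns results and new project data, and has everything recorded for reproducibility.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# All names here are hypothetical illustrations, not the GranatumX API.
StepFn = Callable[[Dict[str, Any], Dict[str, Any]], Tuple[Dict[str, Any], Dict[str, Any]]]


@dataclass
class StepRecord:
    """What one pipeline step was given and what it produced."""
    gbox: str
    params: Dict[str, Any]
    results: Dict[str, Any]


@dataclass
class CentralDataStorage:
    """Toy stand-in for the CDS: current project data plus a step history."""
    project_data: Dict[str, Any] = field(default_factory=dict)
    history: List[StepRecord] = field(default_factory=list)


@dataclass
class Gbox:
    """A named step: run(project_data, params) -> (results, new_project_data)."""
    name: str
    run: StepFn


def run_pipeline(gboxes: List[Gbox], params_by_step: Dict[str, Dict[str, Any]],
                 cds: CentralDataStorage) -> CentralDataStorage:
    """Run the steps in order, recording parameters and results after each one."""
    for gbox in gboxes:
        params = params_by_step.get(gbox.name, {})
        results, new_data = gbox.run(dict(cds.project_data), params)
        cds.project_data.update(new_data)
        cds.history.append(StepRecord(gbox.name, dict(params), results))
    return cds


def upload(data, params):
    # Data entry: store the uploaded expression matrix as project data.
    matrix = params["matrix"]
    return {"n_genes": len(matrix)}, {"matrix": matrix}


def filter_genes(data, params):
    # Data processing: keep genes whose total count reaches the threshold.
    kept = {g: c for g, c in data["matrix"].items() if sum(c) >= params["min_total"]}
    return {"n_kept": len(kept)}, {"matrix": kept}


def summarize(data, params):
    # Data analysis: total counts per cell (column sums of the matrix).
    totals = [sum(col) for col in zip(*data["matrix"].values())]
    return {"cell_totals": totals}, {}


# Made-up gene-by-cell count matrix (three genes, three cells).
toy_matrix = {"GeneA": [5, 0, 3], "GeneB": [0, 0, 1], "GeneC": [2, 4, 0]}
pipeline = [Gbox("upload", upload), Gbox("filter_genes", filter_genes),
            Gbox("summarize", summarize)]
params = {"upload": {"matrix": toy_matrix}, "filter_genes": {"min_total": 2}}

cds = run_pipeline(pipeline, params, CentralDataStorage())
for record in cds.history:
    print(record.gbox, record.params.get("min_total"), record.results)
```

Because the history keeps the parameters next to the results of every step, rerunning the same step list with the same parameters reproduces the same project data, which is the property the caption refers to as reproducibility control.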
Figure 3 Case studies using an exemplary workflow of GranatumX
A. An exemplary workflow of a customized scRNA-seq pipeline set by the user. B. UMAP plot showing clusters on metastatic Merkel cell carcinoma data from the 10x Genomics platform [13]. C. UMAP plot showing clusters of Tabula Muris Consortium data [14]. PCA, principal component analysis; t-SNE, t-distributed stochastic neighbor embedding; UMAP, uniform manifold approximation and projection.
GSEA results on clusters from UMAP plot in Figure 3B
| Comparing pair | KEGG gene set name | Gene set size | NES | P value | FDR |
|---|---|---|---|---|---|
| Cluster 2 | Glycolysis gluconeogenesis | 13 | 4.23 | 0 | 0 |
| | Pathogenic | 14 | 3.63 | 0 | 0 |
| | Alzheimer’s disease | 20 | 3.97 | 0 | 0 |
| | Tight junction | 16 | 3.03 | 0.004 | 0.0063 |
| Cluster 3 | Oocyte meiosis | 15 | 3.75 | 0 | 0 |
| | Pathogenic | 14 | 3.48 | 0.001 | 0.0315 |
| | Cell cycle | 20 | 3.13 | 0.002 | 0.042 |
| | Ubiquitin-mediated proteolysis | 12 | 3.29 | 0.005 | 0.0787 |
| Cluster 4 | Spliceosome | 13 | 3.84 | 0 | 0 |
| | Viral myocarditis | 25 | 3.23 | 0.002 | 0.042 |
| Cluster 5 | Alzheimer’s disease | 20 | 3.08 | 0 | 0 |
| | Antigen processing and presentation | 30 | 2.51 | 0.003 | 0.0472 |
| | MAPK signaling pathway | 45 | 2.91 | 0 | 0 |
| | Glycolysis gluconeogenesis | 13 | 2.49 | 0.043 | 0.198 |
| | Spliceosome | 13 | 2.99 | 0.004 | 0.0504 |
Note: GSEA, gene set enrichment analysis; UMAP, uniform manifold approximation and projection; KEGG, Kyoto Encyclopedia of Genes and Genomes; NES, normalized enrichment score; FDR, false discovery rate.
Comparison of multiple user-friendly webtools
| Feature | SC1 | ASAP | Single Cell Explorer | GranatumX |
|---|---|---|---|---|
| Simple report and interactivity for biologists | Yes | Yes | Yes | Yes |
| Configurable* pipeline | No | No | Yes | Yes |
| Supporting computational developers to plug in their own containers | No | No | No | Yes |
| Programming languages allowed in plug-ins | NA | NA | NA | Multiple languages ( |
| Default pipeline supporting imputation | No | No | No | Yes |
| Default pipeline supporting pseudo-time analysis | No | No | No | Yes |
| Supporting protein–protein interaction network | No | No | No | Yes |
Note: * configurable refers to the ability to customize the analytical steps and orders. NA, not available.