Ayman Yousif, Nizar Drou, Jillian Rowe, Mohammed Khalfan, Kristin C. Gunsalus
Abstract
BACKGROUND: As high-throughput sequencing applications continue to evolve, the rapid growth in the quantity and variety of sequence-based data calls for the development of new software libraries and tools for data analysis and visualization. Often, effective use of these tools requires computational skills beyond those of many researchers. To ease this computational barrier, we have created a dynamic web-based platform, NASQAR (Nucleic Acid SeQuence Analysis Resource).
Keywords: Exploratory data analysis; Graphical user interface; Interactive visualization; Transcriptomics
Year: 2020 PMID: 32600310 PMCID: PMC7322916 DOI: 10.1186/s12859-020-03577-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 NASQAR platform architecture. A cluster of virtual machines at NYU Abu Dhabi serves NASQAR applications to multiple concurrent users. Applications are containerized and managed on the cluster using Docker and Swarm, while Traefik load-balances requests among available server nodes. Functionality includes merging gene counts, conversion of gene IDs to gene names, analysis of differential mRNA expression, metagenomics analysis, and functional enrichment analysis. Applications for bulk expression analysis include DESeq2, limma, and edgeR. Single-cell RNA-seq analysis with the Seurat Wizards is built on top of the Seurat R package and includes options for filtering, normalization, dimensionality reduction (PCA), clustering, and UMAP/t-SNE. Enrichment analysis includes applications for Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) built using the clusterProfiler R package.
Comprehensive overview of NASQAR applications
| Category | Function | R packages | Input | Output files | Visualizations |
| --- | --- | --- | --- | --- | --- |
| Preprocessing | Merge count files into the matrix format required by many downstream analyses; also offers gene ID/name conversion | base R packages | Gene count files (e.g., output count files from htseq-count) | CSV file containing a matrix of samples and corresponding gene counts | N/A |
| Preprocessing | Merge sample FPKM files into a matrix format; also offers gene ID/name conversion | base R packages | Sample gene FPKM files (e.g., sample FPKM files from Cufflinks) | CSV file containing a matrix of samples and corresponding gene FPKM values | N/A |
| RNA-seq (bulk) | Differential Gene Expression (DGE) analysis and, optionally, Surrogate Variable Analysis (SVA) for detection of hidden batch effects | DESeq2, SVA | Matrix of samples and gene counts (CSV) and, optionally, a metadata table (CSV) | VST matrix, rlog matrix, DGE results table, gene expression table, normalized counts matrix | Distance heatmap, PCA plots, MA plots, gene expression boxplots, normalized counts heatmap, SVA plots |
| RNA-seq (bulk) | Differential Gene Expression (DGE) analysis | edgeR, limma-voom | Matrix of samples and gene counts (CSV) | Gene expression table | PCA plots, heatmaps, scatter plots, volcano plots, gene expression boxplots |
| RNA-seq (bulk) | Differential Gene Expression (DGE) analysis | DESeq2, edgeR, limma-voom | Matrix of samples and gene counts (CSV) and a metadata table (CSV) | DGE results table, method comparison results | PCA plots, dispersion plot, volcano plot, Venn diagram comparing methods |
| RNA-seq (single-cell) | Guided single-cell RNA-seq data analysis and clustering | Seurat, sctransform, dplyr | Either 10X data files (mtx, tsv) or a matrix of cell/gene counts | PCA results, ICA results, DGE cluster markers table, Seurat R object, R script | Violin plots, PCA plots, PCA heatmaps, elbow plot, UMAP, t-SNE |
| Metagenomics | Differential analysis of quantitative metagenomic data | DESeq2, circlize, ape, phytools, philentropy | A BIOM matrix file or a counts file (CSV) with a taxonomy file (TSV); optionally a phylogenetic tree Newick file (nhx) and/or FASTQ files | Output files (biom, tsv, nhx) of the Shaman workflow that can be used in downstream metagenomic analysis and visualization within Shaman | Bar plots, PCoA/PCA plots, clustering dendrograms, rarefaction curves, scatter plots, heatmaps, box plots, diversity plots, Venn diagram |
| Gene enrichment: Gene Set Enrichment Analysis | Gene Set Enrichment Analysis (GSEA) of GO terms and KEGG pathways | clusterProfiler, DOSE, GOplot, enrichplot, pathview | A table of differential gene expression (DGE) data (CSV/TSV) | GO terms table, KEGG results table | Dot plots, category netplot, GO induced graph, Pathview plot, ridge plot, PubMed trends |
| Gene enrichment: Over-Representation Analysis | Over-Representation Analysis (ORA) of GO terms and KEGG pathways | clusterProfiler, DOSE, GOplot, enrichplot, pathview, wordcloud2 | A table of differential gene expression (DGE) data (CSV/TSV) | GO terms table, KEGG results table | Bar plots, dot plots, category netplot, GO induced graph, Pathview plot, word cloud |
Fig. 2 GeneCountMerger screenshot. A preprocessing utility to generate the gene count matrices required as input to many analysis tools. It can merge individual raw gene count files from htseq-count and other similar applications. Convenient features include conversion of Ensembl gene IDs to gene names for reference genomes and seamless launching of downstream analysis applications.
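The merging step described for GeneCountMerger can be sketched in plain Python. This is an illustrative sketch only, not NASQAR's R implementation: the function names `merge_counts` and `write_matrix`, the two-column tab-separated input layout, and the choice to fill genes missing from a sample with 0 are assumptions made for the example.

```python
import csv
import io


def merge_counts(sample_files):
    """Merge per-sample gene count tables into one gene-by-sample matrix.

    sample_files maps a sample name to a file-like object holding two
    tab-separated columns: gene ID and raw count.
    """
    samples = list(sample_files)
    merged = {}
    for col, name in enumerate(samples):
        for row in csv.reader(sample_files[name], delimiter="\t"):
            # Skip blank lines and htseq-count summary rows (e.g. __no_feature).
            if not row or row[0].startswith("__"):
                continue
            gene, count = row[0], int(row[1])
            # Assumption: a gene absent from a sample is recorded as 0.
            merged.setdefault(gene, [0] * len(samples))[col] = count
    return samples, merged


def write_matrix(samples, merged, out):
    """Write the merged matrix as CSV: one row per gene, one column per sample."""
    writer = csv.writer(out)
    writer.writerow(["gene"] + samples)
    for gene in sorted(merged):
        writer.writerow([gene] + merged[gene])


# Hypothetical in-memory inputs standing in for two per-sample count files.
demo_inputs = {
    "sample1": io.StringIO("geneA\t5\ngeneB\t0\n__no_feature\t12\n"),
    "sample2": io.StringIO("geneA\t7\ngeneC\t3\n"),
}
demo_samples, demo_matrix = merge_counts(demo_inputs)
demo_csv = io.StringIO()
write_matrix(demo_samples, demo_matrix, demo_csv)
```

Keeping the merged result as one row per gene and one column per sample matches the CSV matrix that the downstream differential expression applications in the table take as input.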
Fig. 3 Seurat wizard screenshot. Wizard-style web-based interactive applications based on Seurat, a popular R package designed for QC, analysis, and exploration of single-cell RNA-seq data. The wizards guide users through single-cell RNA-seq data analysis and visualization and provide an intuitive way to fine-tune parameters using feedback from results at each stage of the analysis. Functionality includes filtering, normalization, dimensionality reduction (PCA), clustering, and visualization with UMAP or t-SNE plots.
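To make the first two stages of that workflow concrete, the following plain-Python sketch filters cells by the number of detected genes and then applies a log-normalization (scale each cell to a fixed total, then take `log1p`). It is not Seurat code and does not reproduce the wizards: the function names, the dictionary layout, and the thresholds are hypothetical, and the wizards may use other normalization options such as sctransform.

```python
import math


def filter_cells(cells, min_features):
    """Keep cells in which at least min_features genes have a nonzero count.

    cells maps a cell barcode to a dict of gene -> raw count.
    """
    return {
        cell: genes
        for cell, genes in cells.items()
        if sum(1 for c in genes.values() if c > 0) >= min_features
    }


def log_normalize(cells, scale_factor=10_000):
    """Scale each cell to scale_factor total counts, then apply log1p."""
    normalized = {}
    for cell, genes in cells.items():
        total = sum(genes.values())
        if total == 0:
            continue  # an empty cell cannot be scaled; drop it
        normalized[cell] = {
            gene: math.log1p(count / total * scale_factor)
            for gene, count in genes.items()
        }
    return normalized


# Hypothetical toy data: two cells, two genes.
raw = {
    "cell1": {"g1": 1, "g2": 1},
    "cell2": {"g1": 0, "g2": 3},
}
kept = filter_cells(raw, min_features=2)  # the threshold is illustrative
norm = log_normalize(kept)
```

In an interactive setting, a threshold such as `min_features` is the kind of parameter a user would adjust after inspecting QC plots at each stage.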
Fig. 4 DESeq2Shiny screenshot. A web-based Shiny wrapper around DESeq2, a popular R package for performing differential mRNA expression analysis of RNA-seq data.
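The normalized counts matrix that this kind of analysis produces depends on per-sample size factors. The sketch below illustrates the median-of-ratios idea on which DESeq2's normalization is based, written in plain Python for exposition. It is a simplified illustration, not DESeq2's code: the function names and data layout are assumptions, and genes with a zero count in any sample are simply skipped.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate one size factor per sample by the median-of-ratios approach.

    counts maps a gene ID to a list of raw counts, one entry per sample.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        # A zero count makes the geometric mean zero, so skip the gene.
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # The median over genes is robust to a minority of changing genes.
    return [math.exp(median(r)) for r in log_ratios]


def normalize(counts, factors):
    """Divide each raw count by the size factor of its sample."""
    return {
        gene: [c / f for c, f in zip(row, factors)]
        for gene, row in counts.items()
    }


# Hypothetical toy data: sample 2 was sequenced twice as deeply as sample 1.
toy = {"g1": [10, 20], "g2": [100, 200], "g3": [5, 10], "g4": [0, 4]}
factors = size_factors(toy)
normalized = normalize(toy, factors)
```

In this toy example the two samples differ only in depth, so after normalization each gene with nonzero counts has the same value in both samples.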
Fig. 5 ClusterProfShinyGSEA screenshot. Web-based apps wrap the popular R package clusterProfiler for the analysis and visualization of functional themes and enrichment among gene clusters, using data from either DESeq2 or DESeq2Shiny. Both Gene Set Enrichment Analysis (GSEA) and Over-Representation Analysis (ORA) are implemented.
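Over-representation analysis is conventionally based on a hypergeometric test: given a gene universe, an annotated gene set, and a list of selected (for example, differentially expressed) genes, it asks how likely an overlap at least as large as the observed one would be by chance. The sketch below computes that upper-tail probability in plain Python. The function name and arguments are hypothetical, and the sketch omits multiple-testing correction and the other processing that clusterProfiler performs.

```python
from math import comb


def ora_p_value(n_universe, n_set, n_selected, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes in the background universe
    n_set:      number of universe genes annotated to the term
    n_selected: number of genes in the user's gene list
    n_overlap:  number of selected genes annotated to the term
    """
    max_overlap = min(n_set, n_selected)
    total = comb(n_universe, n_selected)
    favourable = sum(
        comb(n_set, k) * comb(n_universe - n_set, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return favourable / total


# Hypothetical toy case: 10 genes, 3 annotated, 3 selected, all 3 overlap.
p_all_overlap = ora_p_value(10, 3, 3, 3)
p_no_overlap = ora_p_value(10, 3, 3, 0)
```

GSEA takes a different approach: it works on the full ranked gene list rather than a thresholded subset, so this test applies only to the ORA application.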