MacIntosh Cornwell, Mahesh Vangala, Len Taing, Zachary Herbert, Johannes Köster, Bo Li, Hanfei Sun, Taiwen Li, Jian Zhang, Xintao Qiu, Matthew Pun, Rinath Jeselsohn, Myles Brown, X Shirley Liu, Henry W Long.
Abstract
BACKGROUND: RNA sequencing has become a ubiquitous technology used throughout life sciences as an effective method of measuring RNA abundance quantitatively in tissues and cells. The increase in use of RNA-seq technology has led to the continuous development of new tools for every step of analysis from alignment to downstream pathway analysis. However, effectively using these analysis tools in a scalable and reproducible way can be challenging, especially for non-experts.
Keywords: Analysis; Gene fusion; Immunological infiltrate; Pipeline; RNA-seq; Snakemake
Year: 2018 PMID: 29649993 PMCID: PMC5897949 DOI: 10.1186/s12859-018-2139-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of the full workflow performed by VIPER (Visualization Pipeline for RNAseq analysis). The different segments of the pipeline are broken down by color. The core of the pipeline is the read alignment performed by STAR, which outputs alignment (bam) files. Gene expression is quantitated with Cufflinks for unsupervised analysis (clustering and PCA). STAR also generates a count matrix used for supervised analysis (differential expression with DESeq2). When a publicly available analysis tool is used for a particular step, the name of the tool is given above the arrow leading to the resulting output (boxed). When no tool is indicated next to an arrow, the analysis step was performed with custom R code. Conditional/optional analyses are denoted with a hashed arrow and outlining box and represent the most distinguishing functionality of VIPER
Fig. 2 (a) Read Alignment Report denoting the number of mapped and uniquely mapped reads per sample. (b) Read Distribution Report illustrating the percentage of reads that fall into specific genomic regions. (c) rRNA Read Alignment Report showing the percentage of reads in each sample that were considered rRNA reads. Gene Body Coverage of the samples illustrated as (d) curves and as (e) bars in a heatmap
Fig. 3 (a) Sample-Sample Clustering Map depicting samples on both axes, with the color representing the correlation between samples. Metadata columns (provided by the user) are annotated along the top. (b) Sample-Feature (Gene) Hierarchical Clustering Map with samples along the x-axis and genes along the y-axis. Metadata columns (provided by the user) are annotated along the top. (c) Sample-Feature heatmaps can also be plotted using k-means clustering, with the number of clusters set in the input file. (d) Principal Component Analysis (PCA) plots; one plot is output per metasheet column, with points colored according to the metadata in that column. (e) Scree plot depicting the amount of variance captured by each principal component
Fig. 4 (a) Differential Gene Expression Summary plot summarizing the number of up- and down-regulated genes per comparison, broken down by various Padj (adjusted p-value) and Log2 Fold Change cutoffs. (b) Volcano Plot visually representing each of the differential expression comparisons in the VIPER run; labeled points have a Padj < 0.01 and an absolute Log2 Fold Change > 1
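The labeling rule stated in the volcano plot caption can be sketched as a simple filter. This is a minimal illustration, not VIPER's own code (the caption for Fig. 1 indicates such steps are done in R); the names `GeneResult` and `genes_to_label` are hypothetical, and the default cutoffs are the ones given in the caption.

```python
from typing import Iterable, List, NamedTuple


class GeneResult(NamedTuple):
    """One row of a differential expression table (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    padj: float


def genes_to_label(
    results: Iterable[GeneResult],
    padj_cutoff: float = 0.01,   # caption: Padj < 0.01
    lfc_cutoff: float = 1.0,     # caption: |Log2 Fold Change| > 1
) -> List[str]:
    """Return the genes that pass both cutoffs, in input order.

    Both comparisons are strict. A missing adjusted p-value stored as
    float('nan') fails the `<` comparison and is therefore never labeled.
    """
    return [
        r.gene
        for r in results
        if r.padj < padj_cutoff and abs(r.log2_fold_change) > lfc_cutoff
    ]
```

Under these assumptions, a gene with a large fold change but Padj of 0.05 is drawn on the plot but left unlabeled, as is a highly significant gene whose absolute Log2 Fold Change is exactly 1.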
Fig. 5 Summary plot depicting the results of analyzing the differentially increased genes for enrichment in (a) GO terms, (b) KEGG pathways and (c) MSigDB gene sets. There are corresponding plots (not shown) showing top differentially decreased pathways. (d) A plot showing the running enrichment score of the indicated gene sets within the ranked list of differentially expressed genes
Fig. 6 (a) Summary boxplot depicting the population levels of various immune cell classes seen across normal, luminal and basal breast cancers in TCGA. (b) A Q-Q plot that depicts the gene expression of immune cells after batch correction within the TIMER module, and a bar graph per sample that depicts the proportion of immune cell signature in a particular sample. (c) Plots depicting TCR clonal diversity reported as clonotypes per thousand reads (CPK) in normal, luminal and basal breast cancers
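The CPK measure named in the caption is a simple normalization, sketched below. This is an illustrative calculation only: the function name `clonotypes_per_thousand_reads` is hypothetical, and the caption does not say which reads form the denominator, so the read count is left as a caller-supplied argument.

```python
def clonotypes_per_thousand_reads(n_clonotypes: int, n_reads: int) -> float:
    """Clonotypes per thousand reads (CPK).

    n_clonotypes: number of distinct TCR clonotypes called in a sample.
    n_reads: number of reads used for normalization. Which reads are
             counted is an assumption left to the caller.
    """
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    if n_clonotypes < 0:
        raise ValueError("n_clonotypes must be non-negative")
    return 1000.0 * n_clonotypes / n_reads
```

Normalizing by read count makes samples sequenced to different depths comparable, which a raw clonotype count would not be.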
Fig. 7 (a) Fusion-Gene Analysis Summary Plot with samples along the x-axis and the fusion genes discovered depicted along the y-axis. (b) Histogram Plot illustrating the insert size per paired-end sample. (c) HLA SNP correlation heatmap showing the correlation between the HLA regions of each sample. (d) Example of an IGV snapshot with the full vcf annotation of all SNPs seen genome-wide. (e) Table output for the virus-seq module that depicts the top represented viruses within the sample
Comparison of features in VIPER with other RNA-seq pipelines
| Features | VIPER | HppRNA | TRAPLINE | QuickRNASeq |
|---|---|---|---|---|
| Quality Control | X | X | X | X |
| SNP Detection | X | X | X | X |
| Fusion Gene Detection | X | X | | |
| Differential Expression | X | X | X | |
| Pathway Analysis | X | X | X | |
| Consolidated Report | X | X | | |
| Galaxy Based | | | X | |
| Dependencies Packaged | X | X | X | |
| Support New Species | X | X | | |
| Package Easy Update | X | X | | |
| Batch Correction | X | | | |
| Virus Detection | X | | | |
| Immunology Analysis | X | | | |