Xiaokang Zhang, Inge Jonassen.
Abstract
BACKGROUND: As the cost of DNA sequencing decreases, increasing amounts of RNA-Seq data are being generated, giving novel insight into gene expression and regulation. Before gene expression can be analyzed, the RNA-Seq data must be processed through a number of steps that result in a quantification of the expression of each gene or transcript in each analyzed sample. A number of workflows are available to help researchers perform these steps on their own data, or on public data to take advantage of novel software or reference data when re-analyzing it. However, many of the existing workflows are limited to specific types of studies. We therefore aimed to develop a maximally general workflow that is applicable to a wide range of data and analysis approaches and that supports research on both model and non-model organisms. Furthermore, we aimed to make the workflow usable by users with limited programming skills.
Keywords: RNA-Seq; Snakemake; Workflow
Year: 2020 PMID: 32183729 PMCID: PMC7079470 DOI: 10.1186/s12859-020-3433-x
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Overview of the steps performed by RNA-Seq Analysis Snakemake Workflow (RASflow)
Fig. 2 Quality control of raw reads and alignment. a The mean quality value across each base position in the read. b The average GC content of reads. A normal random library typically has a roughly normal distribution of GC content. c Distribution of estimated insert sizes of mapped reads. d A brief mapping summary
Fig. 3 Visualization of DEA results. a Volcano plot with labels on the genes that pass the thresholds for both fold change and P-value. b Hierarchical clustering heatmap with samples along the x-axis and differentially expressed genes along the y-axis
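The selection behind the labeled genes in the volcano plot of Fig. 3a can be sketched as below. This is a minimal illustration, not RASflow's own code: the thresholds (`min_abs_log2_fc`, `max_p_value`), the `GeneResult` type, and the example genes are all hypothetical, and the sketch assumes the common volcano-plot convention of plotting log2 fold change against -log10 of the P-value.

```python
import math
from typing import List, NamedTuple, Tuple


class GeneResult(NamedTuple):
    """One row of a differential expression result (hypothetical layout)."""
    gene: str
    log2_fold_change: float
    p_value: float


def passes_thresholds(result: GeneResult,
                      min_abs_log2_fc: float = 1.0,
                      max_p_value: float = 0.05) -> bool:
    """True if the gene passes BOTH the fold-change and the P-value threshold.

    The default thresholds are illustrative assumptions only.
    """
    return (abs(result.log2_fold_change) >= min_abs_log2_fc
            and result.p_value <= max_p_value)


def volcano_point(result: GeneResult) -> Tuple[float, float]:
    """Coordinates of a gene in a volcano plot: (log2 FC, -log10 P).

    Assumes p_value > 0; a P-value of exactly 0 would need special handling.
    """
    return (result.log2_fold_change, -math.log10(result.p_value))


def genes_to_label(results: List[GeneResult],
                   min_abs_log2_fc: float = 1.0,
                   max_p_value: float = 0.05) -> List[str]:
    """Names of the genes that would be labeled in the plot."""
    return [r.gene for r in results
            if passes_thresholds(r, min_abs_log2_fc, max_p_value)]


# Made-up example data, for illustration only.
EXAMPLE = [
    GeneResult("geneA", 2.3, 0.001),    # large change, significant
    GeneResult("geneB", 0.4, 0.0005),   # significant, but small change
    GeneResult("geneC", -1.8, 0.2),     # large change, not significant
    GeneResult("geneD", -3.1, 0.01),    # large change, significant
]

if __name__ == "__main__":
    print(genes_to_label(EXAMPLE))
```

Requiring both conditions means a gene with a tiny but statistically significant change (like the hypothetical `geneB`) is not labeled, and neither is one with a large but unreliable change (`geneC`).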
Alignment runtime of three datasets
| Dataset | Number of samples | Size of raw data (GB) | Runtime of alignment, transcriptome as reference (HH:MM) | Runtime of alignment, genome as reference (HH:MM) |
|---|---|---|---|---|
| Cod | 47 | 244 | 05:32 | 69:18 |
| Human | 28 | 137 | 03:14 | 20:03 |
| Benchmark | 32 | 36 | 02:37 | 11:22 |
| Mouse | 8 | 9.3 | 00:28 | 03:46 |
| Mouse_pc ∗ | 8 | 9.3 | 01:11 | 19:31 |
*This was run on a personal computer
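To compare the two reference choices directly, the runtimes in the table can be converted to minutes and divided. The sketch below does only that arithmetic on the values copied from the table; the function and variable names are illustrative and are not part of RASflow.

```python
from typing import Dict, Tuple


def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' runtime string to whole minutes."""
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


# Alignment runtimes copied from the table above:
# dataset -> (transcriptome as reference, genome as reference)
RUNTIMES: Dict[str, Tuple[str, str]] = {
    "Cod": ("05:32", "69:18"),
    "Human": ("03:14", "20:03"),
    "Benchmark": ("02:37", "11:22"),
    "Mouse": ("00:28", "03:46"),
    "Mouse_pc": ("01:11", "19:31"),
}


def slowdown(transcriptome: str, genome: str) -> float:
    """How many times longer the genome-based alignment took."""
    return to_minutes(genome) / to_minutes(transcriptome)


RATIOS: Dict[str, float] = {
    name: slowdown(transcriptome, genome)
    for name, (transcriptome, genome) in RUNTIMES.items()
}

if __name__ == "__main__":
    for name, ratio in RATIOS.items():
        print(f"{name}: genome-based alignment took {ratio:.1f}x as long")
```

For the values in this table, the genome-based alignment took between about 4.3 times (Benchmark) and about 16.5 times (Mouse_pc) as long as the transcriptome-based one.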
Comparison of RASflow with the other workflows published between 2017 and 2019
| workflow | quality control | organism | mapping reference | workflow for DEA ∗ | hardware requirement | installation | programming requirement | year | ref |
|---|---|---|---|---|---|---|---|---|---|
| RASflow | yes | all | genome, transcriptome | GB & TB | low | easy | low | 2020 | NA |
| UTAP | yes | 5 | genome | GB | high | easy | low | 2019 | [ |
| ARMOR | yes | all | genome, transcriptome | TB | high | easy | low | 2019 | [ |
| VIPER | yes | 2 | genome | GB | high | easy | low | 2018 | [ |
| BioJupies | no | 2 | genome | GB | low | web application | low | 2018 | [ |
| hppRNA | yes | 2 | genome, transcriptome | GB & TB | low | medium | medium | 2018 | [ |
| aRNApipe | yes | all | genome | GB | high | hard | high | 2017 | [ |
| RNACocktail | no | all | genome, transcriptome | GB & TB | low | hard | high | 2017 | [ |
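*GB: genome based, i.e. gene/transcript quantification and differential expression analysis (DEA) based on reads mapped to a genome; TB: transcriptome based, i.e. the same steps based on reads mapped to a transcriptome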