John A Hadish, Tyler D Biggs, Benjamin T Shealy, M Reed Bender, Coleman B McKnight, Connor Wytko, Melissa C Smith, F Alex Feltus, Loren Honaas, Stephen P Ficklin.
Abstract
BACKGROUND: Quantification of gene expression from RNA-seq data is a prerequisite for transcriptome analyses such as differential gene expression analysis and gene co-expression network construction. Individual RNA-seq experiments are growing larger, and combining multiple experiments from sequence repositories can result in datasets with thousands of samples. Processing hundreds to thousands of RNA-seq samples raises challenges related to data management, access to sufficient computational resources, navigation of high-performance computing (HPC) systems, installation of required software dependencies, and reproducibility. Processing of larger and deeper RNA-seq experiments will become more common as sequencing technology matures.
Keywords: Differential gene expression; Gene co-expression network; Gene expression matrix; Nextflow; RNA-seq; Workflows
Year: 2022 PMID: 35501696 PMCID: PMC9063052 DOI: 10.1186/s12859-022-04629-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307
Containerized software tools used in release v2.0 of GEMmaker
| Tool | Version | Notes |
|---|---|---|
| nf-core/base | 1.13.3 | The base operating system for all nf-core compatible workflows |
| Python3 | 3.9.2 | Used by a variety of custom data wrangling tools |
| Aspera | 3.8.1 | Downloads SRA files from NCBI SRA using provided run IDs |
| SRAToolkit | 2.10.0 | Downloads SRA files from NCBI using provided SRA Run IDs |
| FastQC | 0.11.9 | Generates read quality statistics for FASTQ files |
| Trimmomatic | 0.39 | Removes low-quality bases and adapter sequences |
| STAR | 2.7.9a | Aligns cleaned reads to the reference |
| HISAT2 | 2.2.0 | Aligns cleaned reads to the reference |
| Salmon | 1.5.2 | Performs quasi-alignment of reads and quantifies expression |
| kallisto | 0.46.2 | Performs pseudo-alignment of reads and quantifies expression |
| SAMTools | 1.14 | Used for indexing and sorting of BAM files created by HISAT2 |
| StringTie | 2.1.7 | Performs gene expression quantification |
| MultiQC | 1.11 | Generates a full summary report for the entire workflow |
Fig. 1 GEMmaker workflow diagram. GEMmaker supports the inclusion of both local and remote RNA-seq data files and offers four different alignment tools for gene expression quantification: HISAT2, STAR, kallisto, and Salmon
Fig. 2 Storage usage comparison. Storage sizes for processing the 475-sample time-series rice dataset are shown. Dashed lines indicate tests in which GEMmaker was configured not to clean up intermediate files between batches, while solid lines indicate that a cleanup was performed
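
The contrast described in the Fig. 2 caption can be illustrated with a toy model. The sketch below is not GEMmaker's implementation: the function name `storage_timeline`, the batch size, and all file sizes are hypothetical values chosen only to show why removing intermediate files between batches keeps peak disk usage nearly flat, while skipping cleanup lets usage grow with every batch.

```python
from typing import List


def storage_timeline(intermediate_sizes: List[float],
                     result_size: float,
                     batch_size: int,
                     cleanup: bool) -> List[float]:
    """Return the peak disk usage reached during each batch (toy model).

    intermediate_sizes: hypothetical size of the intermediate files
        produced for each sample (any consistent unit, e.g. GB).
    result_size: hypothetical size of the small per-sample result
        that is always kept.
    cleanup: if True, intermediate files are deleted once the batch
        has finished; if False, they stay on disk.
    """
    usage = 0.0
    peaks = []
    for start in range(0, len(intermediate_sizes), batch_size):
        batch = intermediate_sizes[start:start + batch_size]
        # Intermediate files and kept results for this batch are written.
        usage += sum(batch) + result_size * len(batch)
        peaks.append(usage)
        if cleanup:
            # Only the intermediate files are removed; results remain.
            usage -= sum(batch)
    return peaks


if __name__ == "__main__":
    sizes = [10.0] * 6  # six samples, 10 units of intermediates each
    print(storage_timeline(sizes, 1.0, 2, cleanup=False))
    print(storage_timeline(sizes, 1.0, 2, cleanup=True))
```

Under these assumed numbers, usage without cleanup climbs by the full batch size each time, whereas with cleanup it grows only by the size of the retained results.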