Tyler W H Backman, Thomas Girke.
Abstract
BACKGROUND: Next-generation sequencing (NGS) has revolutionized how research is carried out in many areas of biology and medicine. However, the analysis of NGS data remains a major obstacle to the efficient utilization of the technology, as it requires complex multi-step processing of big data and demands considerable computational expertise from users. While substantial effort has been invested in the development of software dedicated to the individual analysis steps of NGS experiments, insufficient resources are currently available for integrating the individual software components within the widely used R/Bioconductor environment into automated workflows capable of running the analysis of most types of NGS applications from start to finish in a time-efficient and reproducible manner.
Keywords: Analysis workflow; ChIP-Seq; Next Generation Sequencing (NGS); RNA-Seq; Ribo-Seq; VAR-Seq
Year: 2016 PMID: 27650223 PMCID: PMC5029110 DOI: 10.1186/s12859-016-1241-0
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Workflow steps with input/output file operations are controlled by SYSargs objects. Each SYSargs instance is constructed from a targets and a param file. The only input required from the user is the initial targets file. Subsequent instances are created automatically. Any number of predefined or custom workflow steps is supported.
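The chaining described in the Fig. 1 caption can be illustrated with a minimal Python sketch. It is not systemPipeR's API or implementation: the `StepArgs` class, the helper functions, the `FileName`/`SampleName` column names, the param keys, and the `trimmer`/`aligner` tool names are all hypothetical placeholders chosen for illustration. The sketch only shows the idea that a step is built from a targets table plus a parameter set, and that the outputs of one step can serve as the targets of the next.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class StepArgs:
    """Hypothetical stand-in for a SYSargs-like object (not systemPipeR's API)."""
    software: str
    inputs: dict    # sample name -> input file path
    outputs: dict   # sample name -> output file path
    commands: dict  # sample name -> assembled command line


def read_targets(text):
    """Parse a tab-delimited targets table (assumed columns: FileName, SampleName)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def build_step(targets, param):
    """Combine per-sample rows (targets) with tool settings (param) into one step."""
    inputs, outputs, commands = {}, {}, {}
    for row in targets:
        sid = row["SampleName"]
        inputs[sid] = row["FileName"]
        outputs[sid] = f"{param['outdir']}/{sid}{param['out_ext']}"
        commands[sid] = (
            f"{param['software']} {param['flags']} "
            f"-i {inputs[sid]} -o {outputs[sid]}"
        )
    return StepArgs(param["software"], inputs, outputs, commands)


def next_targets(step):
    """Turn the outputs of one step into the targets of the following step."""
    return [{"FileName": path, "SampleName": sid} for sid, path in step.outputs.items()]


# The user supplies only the initial targets table; later ones are derived.
TARGETS = "FileName\tSampleName\nreads/a.fastq\tA1\nreads/b.fastq\tB1\n"

trim = build_step(
    read_targets(TARGETS),
    {"software": "trimmer", "flags": "--min-len 20",
     "outdir": "results", "out_ext": ".trim.fastq"},
)
align = build_step(
    next_targets(trim),
    {"software": "aligner", "flags": "--threads 4",
     "outdir": "results", "out_ext": ".bam"},
)

for sample, command in align.commands.items():
    print(sample, command)
```

Keeping inputs, outputs, and commands in one object per step means that each downstream step can be derived without the user writing another sample table, which is the property the caption emphasizes.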
Fig. 2 Workflow Steps and Graphical Features. Relevant workflow steps of several NGS applications (a) are illustrated in the form of a simplified flowchart (b). Examples of systemPipeR's functionalities are given under (c), including: (1) eight different plots for summarizing the quality and diversity of short reads provided as FASTQ files; (2) strand-specific read count summaries for all feature types provided by a genome annotation; (3) summary plots of read depth coverage for any number of transcripts with nucleotide resolution upstream/downstream of their start and stop codons, as well as binned coverage for their coding regions; (4) enumeration of up- and down-regulated DEGs for user-defined sample comparisons; (5) similarity clustering of sample profiles; (6) 2-5-way Venn diagrams for DEGs, peak and variant sets; (7) gene-wise clustering with a wide range of algorithms; and (8) support for plotting read pileups and variants in the context of genome annotations along with genome browser support.
Selected functions. The table lists a subset of over 50 methods and functions defined by systemPipeR. Usage instructions are provided in the corresponding help pages and vignettes of the package
| Function name | Description |
|---|---|
| genWorkenvir | Generates workflow templates provided by |
| systemArgs | Constructs SYSargs workflow control module (S4 object) from a targets and a param file |
| runCommandline | Executes command-line software on samples and parameters specified in SYSargs |
| clusterRun | Runs command-line software in parallel mode on a computer cluster |
| preprocessReads | Filtering and/or trimming of short reads using predefined or custom parameters |
| seeFASTQ/seeFASTQplot | Generates quality reports for any number of FASTQ files |
| alignStats | Generates alignment statistics, such as total number of reads and alignment frequency |
| run_edgeR/run_DESeq2 | Runs |
| filterDEGs | Filters and plots DEG results based on user-defined parameters |
| overLapper/vennPlot | Computation of Venn intersects for 2-20 or more samples and 2-5 way Venn diagrams |
| GOCluster_Report | GO term enrichment analysis for large numbers of gene sets |
| variantReport | Generates a variant report containing genomic annotations and confidence statistics |
| predORF | Prediction of short open reading frames in DNA sequences |
| featuretypeCounts | Computes and plots read distribution for many feature types at once |
| featureCoverage | Computes and plots read depth coverage from many transcripts |
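To make the "Venn intersects" entry in the table more concrete, the following Python sketch enumerates the regions of a Venn diagram for a few labeled sets. It is a generic illustration under stated assumptions, not the implementation of overLapper: the function name `venn_intersects`, the region labels, and the gene identifiers are hypothetical. Each region holds the items shared by one combination of sets and absent from all other sets.

```python
from itertools import combinations


def venn_intersects(sets):
    """Map each combination of set labels to the items found only in that combination."""
    labels = sorted(sets)
    regions = {}
    for size in range(1, len(labels) + 1):
        for combo in combinations(labels, size):
            # Items present in every set of the combination ...
            inside = set.intersection(*(sets[label] for label in combo))
            # ... minus items that also occur in any set outside the combination.
            others = [sets[label] for label in labels if label not in combo]
            outside = set().union(*others)
            regions["-".join(combo)] = sorted(inside - outside)
    return regions


# Hypothetical DEG sets from three sample comparisons.
deg_sets = {
    "A": {"g1", "g2", "g3"},
    "B": {"g2", "g3", "g4"},
    "C": {"g3", "g5"},
}

regions = venn_intersects(deg_sets)
for label, genes in regions.items():
    print(label, genes)
```

Because every item falls into exactly one region, the region sizes sum to the size of the union of all sets. The number of regions grows as 2^n - 1 for n sets, so this exhaustive enumeration is only a reasonable sketch for small n.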