Katherine Icay, Ping Chen, Alejandra Cervera, Ville Rantanen, Rainer Lehtonen, Sampsa Hautaniemi.
Abstract
BACKGROUND: Large-scale sequencing experiments are complex and require a wide spectrum of computational tools to extract and interpret relevant biological information. This is especially true in projects where individual processing and integrated analysis of both small RNA and complementary RNA data is needed. Such studies would benefit from a computational workflow that is easy to implement and standardizes the processing and analysis of both sequenced data types.
Keywords: Breast cancer; Integration; RNA; Sequencing; miRNA; totalRNA
Year: 2016 PMID: 27213017 PMCID: PMC4875694 DOI: 10.1186/s13040-016-0099-z
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
Command-line executable software readily implemented in the SePIA workflow and used with the case studies
| Module | Component | Software |
|---|---|---|
| Preprocessing | Adaptor and quality trimming | FastX-Toolkit |
| | | Trimmomatic |
| | | Trim Galore |
| | Quality statistics | FastQC |
| Read mapping | Align sequences to a reference | BWA |
| | | Tophat |
| | | Bowtie |
| | | Bowtie2 |
| | | STAR |
| | Alignment sorting and conversion | SAMtools |
| | | Picard tools |
| | Alignment statistics | RNA-SeQC |
| | | RSeQC |
| Expression | Mapped reads quantification | HTSeq |
| | | Cufflinks |
| Analysis | Variant calling and annotation | Bambino |
| | | ANNOVAR |
| | Differential expression | Cuffdiff |
| | R Bioconductor packages for differential expression | DESeq |
| | | DESeq2 |
| | | DEXseq |
| | | EdgeR |
| | Novel miRNA discovery | miRanalyzer |
| | | miRDeep2 |
| miRNA-mRNA integration | R package for SQLite query | sqldf |
| | Pathway impact analysis | SPIA |
Software marked with are mandatory requirements for the minimum execution of a module. A list of software pre-installed within SePIA’s Docker image and the full range of currently available components can be accessed through the website
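Because each module depends on external command-line executables, a pre-flight check can catch a missing tool before a run starts. The following is a minimal sketch and not part of SePIA: the `REQUIRED_EXECUTABLES` mapping and the `missing_executables` function are hypothetical names, and the tools listed per module are an illustrative assumption rather than the mandatory set referred to above.

```python
import shutil

# Hypothetical example mapping of module -> executables to look for on PATH.
# The choice of tools per module is an assumption made for illustration only.
REQUIRED_EXECUTABLES = {
    "Preprocessing": ["fastqc"],
    "Read mapping": ["bwa", "samtools"],
    "Expression": ["htseq-count"],
}


def missing_executables(required, which=shutil.which):
    """Return {module: [executables not found]} for modules with gaps.

    `which` is injectable so the lookup can be replaced in tests;
    by default it searches the current PATH.
    """
    missing = {}
    for module, executables in required.items():
        absent = [exe for exe in executables if which(exe) is None]
        if absent:
            missing[module] = absent
    return missing


if __name__ == "__main__":
    for module, absent in missing_executables(REQUIRED_EXECUTABLES).items():
        print(f"{module}: missing {', '.join(absent)}")
```

Running the script prints one line per module that has at least one executable missing from the current PATH, and nothing when all listed tools are found.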
Fig. 1. SePIA workflow summarized in five generalized modules. Each module contains a brief description of the major steps performed in each pipeline. For example, the ‘double-pass’ alignment means reads are mapped first to the whole genome and then to a reference transcriptome. Colors used represent common processes (black), processes specific to small RNA (purple) and RNA (green) data, and the main outputs of the modules (grey). Incorporation of a miRNA-target mRNA database to the workflow is represented in blue. Interesting molecules of the analysis module are defined as differentially expressed, predicted, or mutated.
Fig. 2. A snapshot of the reports created by SePIA for the case studies. (a) Small RNA preprocessing report for Case II, including FastQC results organized by patient sample. (b, c) Alignment and expression statistics for Case I with some standard visualization. (d) The searchable miRNA-target mRNA report for Case II.
Fig. 3. Target genes of the mir-17/92 cluster and paralog clusters in the TGF-beta signaling KEGG pathway. Target transcripts were selected to have minimum log2-fold change of 0.5 between tumor and normal breast tissue. Node colors represent expression fold change between normal and tumor breast tissue samples. Relationships between connected miRNA-target transcript pairs are shaded based on correlation coefficient values. Connections between transcript and gene represent average correlation values of contributing transcripts and their regulating miRNAs.
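The selection and averaging described in the Fig. 3 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SePIA implementation: all function and argument names are hypothetical, the fold-change threshold is applied to the absolute value, a pseudocount of 1 is used, and Pearson correlation stands in for the correlation coefficient, none of which the caption specifies.

```python
from collections import defaultdict
from math import log2, sqrt


def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression.

    The pseudocount is an assumption that avoids division by zero.
    """
    mean_tumor = sum(tumor) / len(tumor)
    mean_normal = sum(normal) / len(normal)
    return log2((mean_tumor + pseudocount) / (mean_normal + pseudocount))


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def gene_level_correlations(pairs, mirna_expr, transcript_expr,
                            transcript_gene, transcript_lfc,
                            min_abs_lfc=0.5):
    """Average miRNA-transcript correlations per gene.

    pairs: iterable of (miRNA, transcript) identifiers.
    Transcripts whose |log2 fold change| is below min_abs_lfc are skipped
    (using the absolute value is an assumption).
    """
    per_gene = defaultdict(list)
    for mirna, transcript in pairs:
        if abs(transcript_lfc[transcript]) < min_abs_lfc:
            continue
        r = pearson(mirna_expr[mirna], transcript_expr[transcript])
        per_gene[transcript_gene[transcript]].append(r)
    return {gene: sum(rs) / len(rs) for gene, rs in per_gene.items()}
```

Each gene in the result carries the mean correlation of its retained transcripts with their paired miRNAs, which corresponds to the transcript-to-gene connections described in the caption.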