| Literature DB >> 26738481 |
Markus Wolfien1, Christian Rimmbach2, Ulf Schmitz3,4, Julia Jeannine Jung5, Stefan Krebs6, Gustav Steinhoff7, Robert David8, Olaf Wolkenhauer9,10.
Abstract
BACKGROUND: Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best performing data analysis, data evaluation and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa).Entities:
Mesh:
Substances:
Year: 2016 PMID: 26738481 PMCID: PMC4702420 DOI: 10.1186/s12859-015-0873-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1Scheme to illustrate TRAPLINE’s RNA sequencing analyses modules. * The target files are provided as Galaxy histories - miRNA Targets (usegalaxy.org/u/mwolfien/h/trapline-mirna-targets-input) and Protein-Protein Interactions (usegalaxy.org/u/mwolfien/h/trapline-protein-protein-interaction-input)
Fig. 2Evaluating the different analyses modules of TRAPLINE. a A fraction of mapped reads with and without applying pre-processing modules (QT: quality trimming; C: clipping). TopHat2 was used for genome mapping. Error bars indicate the standard deviation. Asterisks indicate a significant difference: # Welch’s t-test with α = 0.05; § ANOVA with α = 0.05; (n = 6). b Comparison of different genome mapping tools. The bars indicate the transcript accuracy of the reads aligned to the genome in %, including the standard deviation. Marks indicate significant difference: # Welch’s t-test with α = 0.05, Bonferroni test with α = 0.05; (n = 6). c and d Comparison of read correction procedure by Picard Toolkit, before (c) and after (d), to visualize and correct for multiple RNA sequences in the experimental datasets. RSeQC shows the two specific read duplication correction possibilities: "Sequence-base" reads have the same nucleotide sequence (blue), "Mapping-base" reads have the same mapped sequence, but are aligned to different locations on the genome (red). e Comparison of three different DE analysis tools (edgeR, SAMseq and Cuffdiff2), after read mapping with Bowtie (edgeR, SAMseq) TopHat2 (Cuffdiff2). The total number of significantly differentially expressed genes is based on FDR < 0.05 and divided into upregulated and downregulated genes. f Vulcano plot illustrating significantly differentially expressed genes (red dots: FC≥2; p≤0.05)
Fig. 3ClueGo visualization of a gene ontology interaction network obtained after functional classification analysis with DAVID. The network shows interconnections among different biological processes of 350 enriched significantly upregulated genes. Subnetworks are grouped based on GO superclasses (bold label). The colors range from red (less significant; pV = 0.05) to brown (highly significant; pV ≤ 0.05*10−3)