Stephen W Hartley, James C Mullikin.
Abstract
BACKGROUND: High-throughput next-generation RNA sequencing has matured into a viable and powerful method for detecting variations in transcript expression and regulation. Proactive quality control is of critical importance, as unanticipated biases, artifacts, or errors can potentially drive false associations and lead to flawed results.
Year: 2015 PMID: 26187896 PMCID: PMC4506620 DOI: 10.1186/s12859-015-0670-5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 An example analysis pipeline with QoRTs. This flowchart illustrates the recommended analysis pipeline for conventional RNA-Seq analysis using QoRTs. Input and intermediary files are shown in blue; output files and results are shown in purple.
Fig. 2 A small selection of the QC plots offered by QoRTs. This series includes 12 samples, each consisting of 6 technical replicates (for a total of 72 bam files), with 4 different biological conditions (3 samples per condition). In all nine plots, replicates are colored and differentiated by biological group. In the line plots (c, d, e, and f) the samples are simply colored by biological group. In other plots (a and g), replicates are differentiated by character, color, and horizontal offset. This differentiation allows easy identification of both outliers and systematic biases or errors associated with the biological condition. Such systematic errors are of particular importance as they could potentially drive false associations. A full description of each plot and its interpretation can be found in the supplementary materials.
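The replicate structure described in this caption can be written out as a small design table, which is the kind of metadata needed to color replicates by biological group. The following is a minimal sketch only: the identifiers (`COND1`, `SAMP01`, `RG1`) and the column names are hypothetical and are not taken from the paper or from any QoRTs file format.

```python
# Counts taken from the Fig. 2 caption.
N_CONDITIONS = 4            # biological conditions
SAMPLES_PER_CONDITION = 3   # samples per condition
N_TECH_REPLICATES = 6       # technical replicates per sample


def build_design():
    """Return one row per technical replicate (one row per bam file)."""
    rows = []
    sample_index = 0
    for cond in range(1, N_CONDITIONS + 1):
        for _ in range(SAMPLES_PER_CONDITION):
            sample_index += 1
            for rep in range(1, N_TECH_REPLICATES + 1):
                rows.append({
                    # Hypothetical naming scheme, for illustration only.
                    "replicate_id": f"SAMP{sample_index:02d}_RG{rep}",
                    "sample_id": f"SAMP{sample_index:02d}",
                    "group_id": f"COND{cond}",
                })
    return rows


design = build_design()
n_files = len(design)
n_samples = len({row["sample_id"] for row in design})
n_groups = len({row["group_id"] for row in design})
print(n_files, n_samples, n_groups)  # 72 12 4
```

Keeping the sample and group labels on every replicate row is what makes it possible to contrast plots by biological group, or by any other recorded factor, without re-deriving the design.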
Fig. 3 Example issue detected via QoRTs. A subset of the output plots from a dataset in which a rare hardware-level fault produced an actionable QC issue that can be easily identified via QoRTs. In (a) and (b) the replicates are colored by biological sample; in (c) and (d) replicates are colored by sequencer lane. See the QoRTs vignette for more information (Additional file 1).
Features and capabilities of QoRTs compared with those offered by other tools
| Feature | QoRTs | RSeQC | RNA-SeQC |
|---|---|---|---|
| **Sequence Metrics** | | | |
| Quality score (by cycle) | Yes | Yes (1,*) | Yes |
| G/C content | Yes | Yes | Yes |
| Nucleotide vs cycle (NVC) | Yes | Yes (1) | No |
| N-rate by cycle | Yes | No | No |
| Unclipped NVC | Yes | No | No |
| Clipped Sequences NVC | Yes | No | No |
| **Alignment Metrics** | | | |
| Strandedness | Yes | Yes (2) | Yes |
| Clipping Profile | Yes | Yes (1,*) | No |
| Insert Size | Yes | Yes (2,*) | Partial (3) |
| Cigar Op Profile | Yes | Partial (1,2,4,*) | No |
| Cigar Op Length Distribution | Yes | No | No |
| **Gene / Exon Coverage** | | | |
| Gene-Body Coverage | Yes | Yes (5,*) | Yes |
| Gene-Body Coverage, Low-/Medium-/High-expression genes | Yes | No | Yes |
| Mapping Location rates (intron, exon, UTR, etc.) | Yes | Yes | Partial |
| Gene Diversity | Yes | No | No |
| RPKM/FPKM | Yes | Yes (*) | Yes |
| “Wiggle” browser tracks | Yes | Yes (5) | No |
| Gene-level read counts for DESeq, edgeR | Yes | Partial | No |
| Exon-level read counts for DEXSeq | Yes | No | No |
| **Splice Junction Metrics** | | | |
| # Distinct Junction Loci, Known/Novel, High/Low coverage | Yes | Partial (5) | No |
| # Splice Junction Events, Known/Novel, High/Low coverage loci | Yes | Partial (5) | No |
| Splice junction coverage “.bed” browser tracks | Yes | No | No |
| Coverage read-pair counts for all Junction Loci | Yes | No | No |
| **Visualization and Cross-Comparison** | | | |
| Cross-Comparison between replicates | Yes | Partial (6) | Partial (6) |
| Contrast by lane/run, biological group, etc. | Yes | No | No |
| Generate Multiplots (png, svg, etc.) | Yes | No | No |
| Generate QC reports (pdf) | Yes | No | No |
RSeQC functions with documented flaws are marked with an asterisk (*); see Additional file 2 for more information. Notes: (1) Does not separately track read-pairs for paired-end data. (2) Performs analysis on a subsample of input reads. (3) Only calculates mean and standard deviation. (4) Only profiles some cigar operations. (5) No paired-end mode; may double-count overlapping paired reads. (6) Generates comparison plots only for some metrics.
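The table note about double-counting overlapping paired reads can be illustrated with a toy coverage calculation. This is a minimal sketch under stated assumptions, not the implementation used by any of the tools compared above: the function names are hypothetical, and reads are represented as half-open `(start, end)` intervals on a short made-up reference.

```python
def coverage_per_read(pairs, length):
    """Count every mate independently (no paired-end awareness)."""
    cov = [0] * length
    for mates in pairs:
        for start, end in mates:
            for pos in range(start, end):
                cov[pos] += 1
    return cov


def coverage_per_pair(pairs, length):
    """Count each read pair at most once per position."""
    cov = [0] * length
    for mates in pairs:
        covered = set()
        for start, end in mates:
            covered.update(range(start, end))
        for pos in covered:
            cov[pos] += 1
    return cov


# One read pair whose mates overlap on positions 4 and 5.
pairs = [((0, 6), (4, 10))]
naive = coverage_per_read(pairs, 10)
paired = coverage_per_pair(pairs, 10)
print(naive)   # [1, 1, 1, 1, 2, 2, 1, 1, 1, 1]
print(paired)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

In the per-read version, a single sequenced fragment contributes twice wherever its mates overlap, so coverage-based metrics are inflated when inserts are short relative to the read length.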