Ashley J Waardenberg, Matthew A Field.
Abstract
Extensive evaluations of RNA-seq methods have demonstrated that no single algorithm consistently outperforms all others. Removal of unwanted variation (RUV) has also been proposed as a method for stabilizing differential expression (DE) results. Despite this, it remains a challenge to run multiple RNA-seq algorithms and identify the significant differences common to them, whilst also integrating RUV into all algorithms and assessing its impact. consensusDE was developed to automate the process of identifying significant DE by combining the results from multiple algorithms with minimal user input and with the option to automatically integrate RUV. consensusDE only requires a table describing the sample groups, a directory containing BAM files or preprocessed count tables and an optional transcript database for annotation. It supports merging of technical replicates and paired analyses, and outputs a compendium of plots to guide the user in subsequent analyses. Herein, we assess the ability of RUV to improve DE stability, both when combined with multiple algorithms and between algorithms, through application to real and simulated data. We find that RUV increased fold change stability between algorithms and improved FDR for the intersect in a setting of low replication; however, the effect was algorithm specific and diminished with increased replication, reinforcing the value of increased replication for recovery of true DE genes. We finish by offering some rules and considerations for the application of RUV in a consensus-based setting. consensusDE is implemented in R and freely available as a Bioconductor package under the GPL-3 license, along with a comprehensive vignette describing functionality: http://bioconductor.org/packages/consensusDE/. © 2019 Waardenberg and Field.
Keywords: Benchmark; Bioconductor; Consensus; DESeq2; edgeR; R; RNA-seq; RUV; Software; voom
Year: 2019 PMID: 31844586 PMCID: PMC6913255 DOI: 10.7717/peerj.8206
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. consensusDE typical workflow.
(A) consensusDE requires a table, here named “my_data” for example purposes, that describes the experimental design and the location of files. (B) Running consensusDE requires two steps: first, building a summarized object that stores all information, using the buildSummarized function; second, running analyses with all algorithms using the multi_de_pairs function. Example code for a typical analysis with consensusDE is provided for illustration.
consensusDE merged table features and description.
| Column | Feature | Description |
| --- | --- | --- |
| ID | Identifier | Identifier of the feature against which read counts are mapped |
| AveExpr | Average Expression | Average of the Average Expression reported by edgeR, DESeq2 and voom |
| LogFC | Log Fold-Change, also known as a log-ratio | Average of the logFC reported by edgeR, DESeq2 and voom |
| LogFC_sd | Log Fold-Change standard deviation | Standard deviation of the LogFC reported by edgeR, DESeq2 and voom |
| edger_adj_p | edgeR adjusted p-value | Adjusted for multiple hypotheses using Benjamini and Hochberg (default) |
| deseq_adj_p | DESeq2 adjusted p-value | Adjusted for multiple hypotheses using Benjamini and Hochberg (default) |
| voom_adj_p | voom adjusted p-value | Adjusted for multiple hypotheses using Benjamini and Hochberg (default) |
| edger_rank | Rank of the edgeR p-value | Smallest rank is most significant (or smallest p-value) |
| deseq_rank | Rank of the DESeq2 p-value | Smallest rank is most significant (or smallest p-value) |
| voom_rank | Rank of the voom p-value | Smallest rank is most significant (or smallest p-value) |
| rank_sum | Sum of ranks | Combination of ranks from edger_rank, voom_rank and deseq_rank. Results are ordered by this sum, which represents the order of the most stable reported p-values |
| p_intersect | Largest p-value | This represents the intersect when a threshold is set on the p_intersect column |
| p_union | Smallest p-value | This represents the union when a threshold is set on the p_union column |
| genename | Extended gene name | e.g., alpha-L-fucosidase 2 corresponds to FUCA2 |
| symbol | Gene symbol | e.g., FUCA2 |
| kegg | KEGG pathway identifier | For further analyses of pathways where annotated |
| coords | Chromosomal coordinates | e.g., chr6:143494811-143511690 |
| strand | Transcript strand | Forward strand is +, reverse strand is - |
| width | Transcript width | Reported in base pairs (bp), from transcript start to end (e.g., 16,880 bp) |
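The way the merged columns relate to the per-algorithm results can be illustrated with a minimal Python sketch. This is not the consensusDE implementation (the package is written in R). The function name `merge_results`, the input layout and the toy values are assumptions made purely for illustration, and ties in rank are broken by feature ID for simplicity.

```python
from statistics import mean, stdev

METHODS = ("edger", "deseq", "voom")

def merge_results(results):
    """Merge per-method results into consensus-style rows.

    results: {method: {feature_id: (logFC, adjusted_p)}} (hypothetical layout).
    """
    ids = sorted(results[METHODS[0]])
    # Rank features within each method by adjusted p-value (1 = most significant).
    ranks = {}
    for m in METHODS:
        ordered = sorted(ids, key=lambda f: results[m][f][1])
        ranks[m] = {f: i + 1 for i, f in enumerate(ordered)}
    rows = []
    for f in ids:
        lfcs = [results[m][f][0] for m in METHODS]
        ps = [results[m][f][1] for m in METHODS]
        rows.append({
            "ID": f,
            "LogFC": mean(lfcs),          # average logFC across methods
            "LogFC_sd": stdev(lfcs),      # spread of logFC across methods
            "rank_sum": sum(ranks[m][f] for m in METHODS),
            "p_intersect": max(ps),       # largest p-value across methods
            "p_union": min(ps),           # smallest p-value across methods
        })
    # Order by rank sum, as described for the merged table.
    return sorted(rows, key=lambda r: r["rank_sum"])

# Toy values, invented for illustration only.
toy = {
    "edger": {"g1": (2.0, 0.001), "g2": (0.5, 0.20), "g3": (-1.0, 0.04)},
    "deseq": {"g1": (2.2, 0.002), "g2": (0.4, 0.30), "g3": (-1.2, 0.06)},
    "voom":  {"g1": (1.8, 0.004), "g2": (0.6, 0.25), "g3": (-0.8, 0.03)},
}
merged = merge_results(toy)
intersect = {r["ID"] for r in merged if r["p_intersect"] <= 0.05}
union = {r["ID"] for r in merged if r["p_union"] <= 0.05}
```

Thresholding `p_intersect` keeps only features that every method calls significant, because the largest of the three p-values must pass. Thresholding `p_union` keeps features that at least one method calls significant.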
Figure 2. Application to airway data.
(A) Jaccard Coefficient (JC) of each set to the intersect (common) without RUV correction and (B) with RUV correction. Absolute upper (C) and lower (D) limits of log fold change differences between pairs of algorithms (95% confidence intervals), with and without RUV correction.
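The Jaccard Coefficient of each algorithm's DE set against the intersect can be sketched in a few lines of Python. The function names and the toy gene sets are assumptions for illustration and are not taken from the paper or the package. Returning 1.0 for two empty sets is a convention chosen here.

```python
def jaccard(a, b):
    """Jaccard Coefficient: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    # Convention assumed here: two empty sets are treated as identical.
    return len(a & b) / len(union) if union else 1.0

def jc_to_intersect(de_sets):
    """JC of each method's DE set against the genes common to all methods."""
    common = set.intersection(*(set(s) for s in de_sets.values()))
    return {name: jaccard(genes, common) for name, genes in de_sets.items()}

# Toy DE sets, invented for illustration only.
toy_sets = {
    "edgeR": {"g1", "g2", "g3", "g4"},
    "DESeq2": {"g1", "g2", "g3"},
    "voom": {"g1", "g2"},
}
jc = jc_to_intersect(toy_sets)
```

A value of 1 means the algorithm reports exactly the common set. Lower values mean the algorithm reports additional genes that the others do not share.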
Figure 3. Simulated data for 3 and 5 replicates, with and without RUV.
(A) Jaccard Coefficient (JC) of each method. (B) Limits of log fold change differences between pairs of algorithms (95% confidence intervals), with and without RUV correction. (C) False Discovery Rates (FDR); a lower number is better. (D) Sensitivity (or recall); a higher number is better. Each panel shows 3 replicates and 5 replicates, with and without RUV correction; see the central legend for the colour coding scheme. All values represent the absolute average of 10 simulations.
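For simulated data, where the truly DE genes are known, FDR and sensitivity follow from standard definitions. The Python sketch below uses those textbook definitions. The function name, the handling of empty sets and the toy gene sets are assumptions for illustration, not the authors' evaluation code.

```python
def fdr_and_sensitivity(called, truth):
    """Return (FDR, sensitivity) for a set of called DE genes against known truth."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # true positives: called and truly DE
    fp = len(called - truth)   # false positives: called but not truly DE
    fn = len(truth - called)   # false negatives: truly DE but not called
    # Assumed convention: no calls gives FDR 0, no true DE genes gives sensitivity 0.
    fdr = fp / (tp + fp) if called else 0.0
    sensitivity = tp / (tp + fn) if truth else 0.0
    return fdr, sensitivity

# Toy sets, invented for illustration only.
truth = {"g1", "g2", "g3", "g4"}
called = {"g1", "g2", "g5"}
fdr, sens = fdr_and_sensitivity(called, truth)
```

In this toy case one of the three called genes is a false positive and two of the four true DE genes are recovered. Averaging these quantities over repeated simulations would give summary values of the kind plotted in panels C and D.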