Abstract
BACKGROUND: With the explosion in the number of methods designed to analyze bulk and single-cell RNA-seq data, there is a growing need for approaches that assess and compare these methods. The usual technique is to compare methods on data simulated according to some theoretical model. However, as real data often exhibit violations of theoretical models, this can result in unsubstantiated claims of a method's performance.
Keywords: Confounders; Differential expression; Factor analysis; RNA-seq; Scaling factors; Simulation
Year: 2020 PMID: 32448189 PMCID: PMC7245910 DOI: 10.1186/s12859-020-3450-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Scree plots. Scree plots for the GTEx dataset (black), the powsimR dataset (orange), and the seqgendiff dataset (blue). The singular values for the GTEx and seqgendiff datasets are almost identical.
Fig. 2 Principal component plots. First and second principal components for the GTEx dataset (left), the powsimR dataset (center), and the seqgendiff dataset (right). The first and second principal components of the powsimR dataset are very different from those of the GTEx and seqgendiff datasets.
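Both the scree plot and the principal component plot come from a singular value decomposition of the expression matrix. The sketch below shows one way to compute them, assuming a genes x samples count matrix, a log2(count + 0.5) transform, and per-gene centering. These preprocessing choices and the function names `log_transform` and `scree_and_pcs` are illustrative assumptions, not necessarily what the authors used.

```python
import numpy as np


def log_transform(counts):
    """Log2 transform with a pseudocount of 0.5 (an assumed preprocessing choice)."""
    return np.log2(np.asarray(counts, dtype=float) + 0.5)


def scree_and_pcs(counts, n_pcs=2):
    """Return all singular values and the first n_pcs principal component scores.

    counts: genes x samples matrix of RNA-seq read counts (hypothetical input).
    Each gene (row) is centered before the SVD, so the right singular vectors
    describe variation across samples.
    """
    y = log_transform(counts)
    y_centered = y - y.mean(axis=1, keepdims=True)
    # Singular values (returned in descending order) are what a scree plot shows.
    _, s, vt = np.linalg.svd(y_centered, full_matrices=False)
    # Sample scores on the leading PCs: rows of V^T, transposed and scaled by s.
    scores = vt[:n_pcs].T * s[:n_pcs]
    return s, scores


# Usage on simulated Poisson counts (500 genes, 10 samples).
rng = np.random.default_rng(1)
counts = rng.poisson(lam=20, size=(500, 10))
s, scores = scree_and_pcs(counts)
print(s.shape, scores.shape)  # (10,) (10, 2)
```

Plotting `s` against its index gives a scree plot, and plotting `scores[:, 0]` against `scores[:, 1]` gives a PC1-vs-PC2 plot like those compared in Figs. 1 and 2.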
Fig. 3 Voom plots. Voom plots [26] visualizing the mean-variance trend in RNA-seq datasets. The voom plots are visually similar for the GTEx and seqgendiff datasets. The powsimR dataset has an uncharacteristic hook at low counts in its voom plot.
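A voom plot shows, for each gene, the square root of the residual standard deviation of log-CPM values against the average log2 count. The sketch below computes these points under the simplifying assumption of an intercept-only design, so the residual standard deviation reduces to the sample standard deviation. It follows the quantities that limma's `voom` plots as I understand them, but it omits the lowess trend and the precision weights; `voom_trend_points` is a hypothetical helper, not part of limma.

```python
import numpy as np


def voom_trend_points(counts):
    """Compute voom-style mean-variance points for an intercept-only design.

    counts: genes x samples matrix of read counts.
    Returns (x, y): x is the average log2 count per gene and y is the square
    root of the residual standard deviation of the log-CPM values.
    """
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)  # library size of each sample
    # log-counts-per-million with an offset of 0.5 on counts and 1 on library size.
    log_cpm = np.log2((counts + 0.5) / (lib_size + 1.0) * 1e6)
    # Intercept-only design: residual SD equals the per-gene sample SD.
    sigma = log_cpm.std(axis=1, ddof=1)
    amean = log_cpm.mean(axis=1)
    # Shift the average log-CPM back to the scale of log2 counts.
    x = amean + np.mean(np.log2(lib_size + 1.0)) - np.log2(1e6)
    y = np.sqrt(sigma)
    return x, y


# Usage: two genes with identical counts across three samples.
x, y = voom_trend_points([[10, 10, 10], [20, 20, 20]])
```

In this toy input every sample has the same library size and each gene is constant across samples, so both genes have zero spread (`y == 0`) and `x[0]` equals log2(10.5). On real data, plotting `y` against `x` reveals the decreasing mean-variance trend, and a shape such as the low-count hook in the powsimR panel would show up directly in these points.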
Fig. 4 False discovery proportion of various methods. Boxplots of the false discovery proportion (FDP) (y-axis) for various differential expression analysis methods (x-axis) applied to different simulated datasets (color). The Benjamini-Hochberg procedure was used to control the false discovery rate at the 0.05 level (horizontal dashed line). Only voom-limma controls the false discovery rate at the nominal level. The FDP is more variable among the seqgendiff datasets than among the powsimR datasets.
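The FDP in Fig. 4 is the fraction of rejected hypotheses that are truly null, which is computable in simulations because the null genes are known. The sketch below implements the standard Benjamini-Hochberg step-up rule and the FDP, taking the FDP to be 0 when nothing is rejected. The function names and the toy p-values are illustrative assumptions, not the paper's code or data.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return booleans marking hypotheses rejected by the BH procedure at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest rank k such that the k-th smallest p-value is <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            k_max = rank
    # Step-up rule: reject every hypothesis up to and including rank k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


def false_discovery_proportion(rejected, is_null):
    """FDP = (rejected true nulls) / (total rejections), defined as 0 if none."""
    n_rejected = sum(rejected)
    n_false = sum(1 for r, null in zip(rejected, is_null) if r and null)
    return n_false / max(n_rejected, 1)


# Toy example: 10 hypotheses, only the first is a true (non-null) signal.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
is_null = [False] + [True] * 9
rejected = benjamini_hochberg(pvals, alpha=0.05)
fdp = false_discovery_proportion(rejected, is_null)
print(rejected[:3], fdp)  # [True, True, False] 0.5
```

Repeating this over many simulated datasets and collecting the FDP values gives boxplots like those in Fig. 4; a method controls the false discovery rate if the average FDP stays at or below `alpha`.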