| Literature DB >> 23497356 |
Charlotte Soneson1, Mauro Delorenzi.
Abstract
BACKGROUND: Finding genes that are differentially expressed between conditions is an integral part of understanding the molecular basis of phenotypic variation. In the past decades, DNA microarrays have been used extensively to quantify the abundance of mRNA corresponding to different genes, and more recently high-throughput sequencing of cDNA (RNA-seq) has emerged as a powerful competitor. As the cost of sequencing decreases, it is conceivable that the use of RNA-seq for differential expression analysis will increase rapidly. To exploit the possibilities and address the challenges posed by this relatively new type of data, a number of software packages have been developed especially for differential expression analysis of RNA-seq data.Entities:
Mesh:
Substances:
Year: 2013 PMID: 23497356 PMCID: PMC3608160 DOI: 10.1186/1471-2105-14-91
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1Area under the ROC curve (AUC). Area under the ROC curve (AUC) for the eleven evaluated methods, in simulation studies (panel A), (panel B), (panel C), (panel D), (panel E) and (panel F). The boxplots summarize the AUCs obtained across 10 independently simulated instances of each simulation study. Each panel shows the AUCs across three sample sizes (|S1| = |S2| = 2, 5 and 10, respectively, signified by the last number in the tick labels). The methods are ordered according to their median AUC for the largest sample size. When all DE genes were regulated in the same direction, increasing the number of DE genes from 1,250 (panel A) to 4,000 (panel C) impaired the performance of all methods. In contrast, when the DE genes were regulated in different directions (panels B and D), the number of DE genes had much less impact. The variability of the performance of baySeq was much higher when all genes were regulated in the same direction (panels A and C) compared to when the DE genes were regulated in different directions (panels B and D). Including outliers (panels E and F) decreased the AUC for most methods (compare to panel B), but less so for the transformation-based methods (voom+limma and vst+limma) and SAMseq.
Figure 2False discovery curves. Representative false discovery curves, depicting the number of false positives encountered among the T top-ranked genes by the eleven evaluated methods, for T between 0 and 1,500. In all cases, there were 5 samples per condition. A: Simulation study . B: Simulation study . C: Simulation study D: Simulation study . E: Simulation study F: Simulation study . Some of the curves do not pass through the origin, since many genes obtained the same ranking score and had to be called simultaneously.
Figure 3Type I error rates. Type I error rates, for the six methods providing nominal p-values, in simulation studies (panel A), (panel B), (panel C) and (panel D). Letting some counts follow a Poisson distribution (panel B) reduced the type I error rates for TSPM slightly but had overall a small effect. Including outliers with abnormally high counts (panels C and D) had a detrimental effect on the ability to control the type I error for edgeR and NBPSeq, while DESeq became slightly more conservative.
Figure 4True false discovery rates. True false discovery rates (FDR) observed for an imposed FDR threshold of 0.05, for the nine methods returning adjusted p-values or FDR estimates, in simulation studies (panel A), (panel B), (panel C) , (panel D), (panel E) and (panel F). With only two samples per condition, three of the methods (vst+limma, voom+limma and SAMseq) did not call any DE genes, and the FDR was considered to be undefined.
Figure 5Analysis of the Bottomly data set. A: The number of genes found to be significantly DE between the two mouse strains in the Bottomly data set. B-C: Overlap among the set of DE genes found by different methods. D: The average number of genes found to be significantly DE genes when contrasting two subsets of mice from the same strain, in which case we expect no truly DE genes.
The number of shared differentially expressed genes found by the different methods for the Bottomly data set
| ShrinkSeq | 583 | 1125 | 985 | 1075 | 971 | 1049 | 192 | 803 | 1821 | |
| DESeq | 583 | 598 | 567 | 588 | 589 | 587 | 191 | 523 | 592 | |
| edgeR | 1125 | 598 | 877 | 886 | 942 | 1013 | 194 | 753 | 1099 | |
| NBPSeq | 985 | 567 | 877 | 695 | 753 | 797 | 194 | 612 | 924 | |
| TSPM | 1075 | 588 | 886 | 695 | 891 | 907 | 191 | 794 | 1014 | |
| voom | 971 | 589 | 942 | 753 | 891 | 971 | 194 | 752 | 991 | |
| vst | 1049 | 587 | 1013 | 797 | 907 | 971 | 194 | 752 | 1061 | |
| baySeq | 192 | 191 | 194 | 194 | 191 | 194 | 194 | 175 | 194 | |
| EBSeq | 803 | 523 | 753 | 612 | 794 | 752 | 752 | 175 | 801 | |
| SAMseq | 1821 | 592 | 1099 | 924 | 1014 | 991 | 1061 | 194 | 801 |
The table contains the number of differentially expressed genes that are shared between each pair of methods, for the Bottomly data set (compare to Figure 5). The numbers on the diagonal, indicating the number of differentially expressed genes found by the respective methods, are highlighted in bold.
Summary of the main observations
| DESeq | - Conservative with default settings. Becomes more conservative when outliers are introduced. |
| - Generally low TPR. | |
| - Poor FDR control with 2 samples/condition, good FDR control for larger sample sizes, also with outliers. | |
| - Medium computational time requirement, increases slightly with sample size. | |
| edgeR | - Slightly liberal for small sample sizes with default settings. Becomes more liberal when outliers are introduced. |
| - Generally high TPR. | |
| - Poor FDR control in many cases, worse with outliers. | |
| - Medium computational time requirement, largely independent of sample size. | |
| NBPSeq | - Liberal for all sample sizes. Becomes more liberal when outliers are introduced. |
| - Medium TPR. | |
| - Poor FDR control, worse with outliers. Often truly non-DE genes are among those with smallest p-values. | |
| - Medium computational time requirement, increases slightly with sample size. | |
| TSPM | - Overall highly sample-size dependent performance. |
| - Liberal for small sample sizes, largely unaffected by outliers. | |
| - Very poor FDR control for small sample sizes, improves rapidly with increasing sample size. Largely unaffected by outliers. | |
| - When all genes are overdispersed, many truly non-DE genes are among the ones with smallest p-values. Remedied when the counts for some genes are Poisson distributed. | |
| - Medium computational time requirement, largely independent of sample size. | |
| voom / vst | - Good type I error control, becomes more conservative when outliers are introduced. |
| - Low power for small sample sizes. Medium TPR for larger sample sizes. | |
| - Good FDR control except for simulation study | |
| - Computationally fast. | |
| baySeq | - Highly variable results when all DE genes are regulated in the same direction. Less variability when the DE genes are regulated in different directions. |
| - Low TPR. Largely unaffected by outliers. | |
| - Poor FDR control with 2 samples/condition, good for larger sample sizes in the absence of outliers. Poor FDR control in the presence of outliers. | |
| - Computationally slow, but allows parallelization. | |
| EBSeq | - TPR relatively independent of sample size and presence of outliers. |
| - Poor FDR control in most situations, relatively unaffected by outliers. | |
| - Medium computational time requirement, increases slightly with sample size. | |
| NOISeq | - Not clear how to set the threshold for |
| - Performs well, in terms of false discovery curves, when the dispersion is different between the conditions (see supplementary material). | |
| - Computational time requirement highly dependent on sample size. | |
| SAMseq | - Low power for small sample sizes. High TPR for large enough sample sizes. |
| - Performs well also for simulation study | |
| - Largely unaffected by introduction of outliers. | |
| - Computational time requirement highly dependent on sample size. | |
| ShrinkSeq | - Often poor FDR control, but allows the user to use also a fold change threshold in the inference procedure. |
| - High TPR. | |
| - Computationally slow, but allows parallelization. |
The table summarizes the present study by means of the main observations and characteristic features for each of the evaluted methods. We have grouped voom+limma and vst+limma together since they performed overall very similarly.
Summary of the parameters used to generate the synthetic data sets
| 0 | 0 | 0 | 0 | 0 | |
| 1,250 | 0 | 0 | 0 | 0 | |
| 625 | 625 | 0 | 0 | 0 | |
| 4,000 | 0 | 0 | 0 | 0 | |
| 2,000 | 2,000 | 0 | 0 | 0 | |
| 0 | 0 | 6,250 | 0 | 0 | |
| 625 | 625 | 6,250 | 0 | 0 | |
| 0 | 0 | 0 | 10% | 0 | |
| 625 | 625 | 0 | 10% | 0 | |
| 0 | 0 | 0 | 0 | 5% | |
| 625 | 625 | 0 | 0 | 5% |
In all synthetic data sets, the observations were distributed between two conditions (denoted S1 and S2), with the same number of observations (2, 5 or 10) in each condition. We let and denote, respectively, the number of genes that were up- and downregulated in condition S2 compared to S1. The number of genes whose counts were drawn from a Poisson distribution (i.e., with the dispersion parameter equal to zero) is given by |{g; ϕ = 0}|. The ‘single’ outlier fraction denotes the fraction of the genes for which we selected a single sample and multiplied the corresponding count with a factor between 5 and 10. The ‘random’ outlier fraction denotes the fraction of counts that were selected randomly (among all counts) and multiplied with a factor between 5 and 10. The notation for the simulation studies (leftmost column) summarizes the type of simulation (B - ‘baseline’, P - ‘Poisson’, S - ‘single outlier’, R - ‘random outlier‘), the number of DE genes that are upregulated in S2 (i.e., , in the superscript) and the number of DE genes that are downregulated in S2 (i.e., , in the subscript).