Yasir Rahmatallah, Frank Emmert-Streib, Galina Glazko.
Abstract
BACKGROUND: Over the last few years, transcriptome sequencing (RNA-Seq) has almost completely replaced microarrays for high-throughput studies of gene expression. Currently, the most popular use of RNA-Seq is to identify genes that are differentially expressed between two or more conditions. Despite the importance of Gene Set Analysis (GSA) in the interpretation of the results from RNA-Seq experiments, the limitations of GSA methods developed for microarrays in the context of RNA-Seq data are not well understood.
Year: 2014 PMID: 25475910 PMCID: PMC4265362 DOI: 10.1186/s12859-014-0397-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Type I error rates for multivariate methods, α = 0.05
| Method | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| N-stat | 0.060 | 0.062 | 0.038 | 0.038 | 0.054 | 0.062 | 0.058 | 0.035 | 0.052 | 0.050 | 0.049 | 0.047 |
| WW | 0.096 | 0.103 | 0.091 | 0.091 | 0.102 | 0.102 | 0.096 | 0.070 | 0.097 | 0.099 | 0.099 | 0.089 |
| KS | 0.104 | 0.090 | 0.083 | 0.082 | 0.102 | 0.088 | 0.077 | 0.076 | 0.080 | 0.077 | 0.072 | 0.092 |
| ROAST | 0.050 | 0.048 | 0.036 | | | | | | | | | |
| | | | | | | | | | | | | |
| N-stat | 0.053 | 0.048 | 0.049 | 0.048 | 0.058 | 0.052 | 0.035 | 0.048 | 0.054 | 0.047 | 0.039 | 0.051 |
| WW | 0.066 | 0.075 | 0.063 | 0.073 | 0.060 | 0.058 | 0.056 | 0.076 | 0.056 | 0.067 | 0.067 | 0.079 |
| KS | 0.069 | 0.071 | 0.072 | 0.073 | 0.068 | 0.079 | 0.059 | 0.059 | 0.055 | 0.066 | 0.081 | 0.065 |
| ROAST | 0.052 | 0.050 | 0.039 | | | | | | | | | |
| | | | | | | | | | | | | |
| N-stat | 0.052 | 0.054 | 0.060 | 0.067 | 0.051 | 0.040 | 0.053 | 0.055 | 0.046 | 0.054 | 0.059 | 0.044 |
| WW | 0.089 | 0.066 | 0.065 | 0.079 | 0.057 | 0.069 | 0.060 | 0.073 | 0.065 | 0.065 | 0.076 | 0.064 |
| KS | 0.061 | 0.073 | 0.055 | 0.060 | 0.052 | 0.059 | 0.061 | 0.070 | 0.053 | 0.051 | 0.068 | 0.047 |
| ROAST | 0.054 | 0.043 | 0.055 | | | | | | | | | |
Figure 1. The functional relationship between the transformed and the original values for different transformation functions (used by FM, SM and GM with STT = 0.05).
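Figure 1 concerns how each combining method transforms a single P-value before the transformed values are summed. As a minimal sketch, assuming FM and SM stand for Fisher's and Stouffer's methods (the abbreviations are not expanded in this excerpt), the per-value transformations and the resulting combined P-values can be written with the Python standard library alone. The function names are illustrative. GM is left out because its shape parameter depends on the STT in a way this excerpt does not define.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def fisher_transform(p):
    """Per-value transform assumed for FM (Fisher): -2 ln(p)."""
    return -2.0 * math.log(p)


def stouffer_transform(p):
    """Per-value transform assumed for SM (Stouffer): normal quantile of 1 - p."""
    return _STD_NORMAL.inv_cdf(1.0 - p)


def fisher_combine(pvalues):
    """Combine k independent P-values; the sum of -2 ln(p) is chi-square(2k) under the null."""
    k = len(pvalues)
    half = sum(fisher_transform(p) for p in pvalues) / 2.0
    # Closed-form chi-square survival function for an even number of
    # degrees of freedom: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def stouffer_combine(pvalues):
    """Combine k independent P-values; the scaled sum of z-scores is N(0, 1) under the null."""
    k = len(pvalues)
    z = sum(stouffer_transform(p) for p in pvalues) / math.sqrt(k)
    return 1.0 - _STD_NORMAL.cdf(z)


# Illustrative gene-level P-values for one gene set (not data from the paper).
example = [0.01, 0.20, 0.65, 0.03]
print(fisher_combine(example), stouffer_combine(example))
```

Both transforms grow as the P-value shrinks, but Fisher's transform is unbounded near zero on a logarithmic scale, so a single very small P-value can dominate the combined statistic more strongly than under Stouffer's method.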
Type I error rates for gene-level GSA methods, α = 0.05
| Method | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|
| FM | 0.087 | 0.067 | 0.045 | 0.107 | 0.072 | 0.046 | 0.096 | 0.065 | 0.037 |
| SM | 0.052 | 0.048 | 0.045 | 0.063 | 0.062 | 0.048 | 0.046 | 0.047 | 0.040 |
| GM | 0.123 | 0.092 | 0.049 | 0.187 | 0.141 | 0.039 | 0.245 | 0.180 | 0.041 |
| | | | | | | | | | |
| FM | 0.067 | 0.058 | 0.049 | 0.082 | 0.059 | 0.049 | 0.090 | 0.073 | 0.054 |
| SM | 0.065 | 0.062 | 0.054 | 0.059 | 0.060 | 0.053 | 0.063 | 0.061 | 0.057 |
| GM | 0.092 | 0.063 | 0.051 | 0.132 | 0.091 | 0.058 | 0.164 | 0.104 | 0.051 |
| | | | | | | | | | |
| FM | 0.066 | 0.061 | 0.048 | 0.056 | 0.050 | 0.049 | 0.072 | 0.061 | 0.044 |
| SM | 0.052 | 0.047 | 0.046 | 0.048 | 0.050 | 0.049 | 0.048 | 0.046 | 0.058 |
| GM | 0.088 | 0.072 | 0.049 | 0.090 | 0.065 | 0.050 | 0.108 | 0.091 | 0.046 |
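Each entry in a type I error table is an empirical rate: the fraction of null simulations in which a method rejects at α = 0.05. A minimal sketch of that calculation follows, assuming independent uniform P-values under the null and Fisher's method as the combining rule. Both are illustrative simplifications rather than the paper's simulation design, and the function names and defaults are hypothetical.

```python
import math
import random


def fisher_combine(pvalues):
    """Fisher's combined P-value via the closed-form chi-square(2k) survival function."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # equals (sum of -2 ln p) / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return math.exp(-half) * total


def empirical_type1_error(n_sets=2000, genes_per_set=16, alpha=0.05, seed=1):
    """Fraction of null gene sets rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sets):
        # 1 - random() lies in (0, 1], which keeps log() finite.
        pvals = [1.0 - rng.random() for _ in range(genes_per_set)]
        if fisher_combine(pvals) <= alpha:
            rejections += 1
    return rejections / n_sets


rate = empirical_type1_error()
print(f"empirical type I error: {rate:.3f}")
```

With well-calibrated, independent null P-values the estimate should land close to the nominal 0.05. Rates well above that level, like some of the GM entries in the table, indicate that a method rejects true null hypotheses too often.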
Figure 2. The power curves of multivariate tests with different normalizations when the shift alternative hypothesis holds true and the number of genes in pathways = 16 ( = 20).
Figure 3. The power curves of gene-level GSA methods when the shift alternative hypothesis holds true and the number of genes in pathways = 16 ( = 20).
Average type I error rates attained from Nigerian male samples, α = 0.05
| Method | | | | |
|---|---|---|---|---|
| N-stat | 0.049 | 0.045 | 0.044 | 0.055 |
| WW | 0.069 | 0.062 | 0.058 | 0.072 |
| KS | 0.052 | 0.052 | 0.048 | 0.059 |
| ROAST | 0.033 | | | |

| Method | FM | SM | GM |
|---|---|---|---|
| edgeR | 0.075 | 0.062 | 0.119 |
| DESeq | 0.068 | 0.059 | 0.103 |
| eBayes | 0.059 | 0.057 | 0.063 |
Figure 4. Venn diagrams showing the number of common pathways detected in the processed Nigerian dataset by multivariate tests with normalizations and univariate tests with combined P-values for gene-level GSA methods (α = 0.05). (a) N-statistic with different normalizations and ROAST; (b) WW with different normalizations and ROAST; (c) KS with different normalizations and ROAST; (d) edgeR with different P-value combining methods; (e) DESeq with different P-value combining methods; (f) eBayes with different P-value combining methods; (g) univariate tests with FM; (h) univariate tests with SM; (i) univariate tests with GM.
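The counts in a three-set Venn diagram of this kind are sizes of set differences and intersections over the pathways each method detects. A minimal sketch, using hypothetical pathway identifiers rather than results from the paper:

```python
def venn3_counts(a, b, c):
    """Return the size of each of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "only_a": len(a - b - c),
        "only_b": len(b - a - c),
        "only_c": len(c - a - b),
        "a_and_b_only": len((a & b) - c),
        "a_and_c_only": len((a & c) - b),
        "b_and_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }


# Hypothetical detected-pathway sets for three methods (illustrative only).
edger = {"P1", "P2", "P3", "P4"}
deseq = {"P2", "P3", "P5"}
ebayes = {"P3", "P4", "P6"}

counts = venn3_counts(edger, deseq, ebayes)
print(counts)
```

The seven region sizes always sum to the size of the union, which gives a quick consistency check on any such diagram.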
Figure 5. The percentage of DE genes, number of genes, and average gene length in pathways detected in the processed Nigerian dataset by different methods.