Jessica L Larson (1,2), Art B Owen (3).
Abstract
BACKGROUND: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high-throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, and hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, both the number of permutations required and the cost of each permutation grow with the number of gene sets. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests.
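The 1/(M+1) floor and its consequence for M can be made concrete with a short Python sketch. It assumes the common (b + 1)/(M + 1) permutation p-value, where b counts permuted statistics at least as extreme as the observed one, and it assumes a Bonferroni correction across G gene sets; the abstract does not name a multiplicity correction, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def permutation_p_value(observed, permuted, two_sided=False):
    """Permutation p-value (b + 1) / (M + 1).

    b is the number of the M permuted statistics that are at least as
    extreme as the observed one, so the smallest attainable value is
    1 / (M + 1).
    """
    if two_sided:
        b = sum(1 for t in permuted if abs(t) >= abs(observed))
    else:
        b = sum(1 for t in permuted if t >= observed)
    return (b + 1) / (len(permuted) + 1)


def min_permutations(n_gene_sets, alpha=0.05):
    """Smallest M >= 1 with 1 / (M + 1) <= alpha / n_gene_sets.

    Assumption: a Bonferroni correction over n_gene_sets tests.
    Exact rational arithmetic avoids float round-off when
    n_gene_sets / alpha is an integer.
    """
    threshold = Fraction(n_gene_sets) / Fraction(str(alpha))  # G / alpha
    return max(math.ceil(threshold) - 1, 1)


if __name__ == "__main__":
    # Even the most extreme observed value cannot go below 1 / (M + 1).
    print(permutation_p_value(10.0, [0.0] * 999))  # 0.001
    # 6,303 gene sets at alpha = 0.05.
    print(min_permutations(6303, 0.05))  # 126059
```

Under these assumptions, testing 6,303 gene sets at α = 0.05 would require at least 126,059 permutations before any single set could reach significance.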
Year: 2015 PMID: 25928861 PMCID: PMC4419444 DOI: 10.1186/s12859-015-0571-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Three data sets used for non-permutation GSEA
| Data set | Tissue | | |
|---|---|---|---|
| Moran | Substantia nigra | 29 | 14 |
| Zhang | Substantia nigra | 18 | 11 |
| Scherzer | Blood | 47 | 21 |
Figure 1. Distributions of permuted statistics resemble known probability densities. Top panel shows a permutation histogram for a linear test statistic for the steroid hormone signaling pathway gene set as described in the text. The bottom panel shows a quadratic test statistic. Solid red dots indicate the observed values, and curves indicate parametric fits based on normal and χ² distributions.
Figure 2. Permutation and moment-based p-values are tightly correlated. Permutation p-values (x-axis) versus moment-based p-values (y-axis) for 6,303 gene sets. The left two columns show results for a linear test statistic versus the beta and Gaussian approximations; the right-most column shows results for the sum-of-squares statistic versus the χ² approximation. Data come from three genome-wide expression studies. We applied the transformation −log10(p) to stretch the lower range of these distributions for a more informative display. Red dotted lines represent the line y = x.
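The beta approximation for the linear statistic can be sketched in the same moment-matching spirit. The Python below assumes the statistic's permutation distribution lies in a known interval [lo, hi] with known mean and variance (the caption gives none of these details); it rescales to [0, 1], fits Beta(α, β) by the method of moments, and takes the upper tail from the regularized incomplete beta function. The −log10 helper mirrors the transformation used for the plotted axes. The bounded-support assumption and all names are hypothetical.

```python
import math


def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (tiny if abs(d) < tiny else d)
    h = d
    for m in range(1, 10_000):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = tiny if abs(d) < tiny else d
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = tiny if abs(d) < tiny else d
        c = 1.0 + aa / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return h


def regularized_beta(a, b, x):
    """I_x(a, b), the Beta(a, b) CDF evaluated at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def beta_upper_p(observed, mean, var, lo, hi):
    """P(T >= observed) for T = lo + (hi - lo) * B, B ~ Beta fitted by moments.

    Assumption: T is known to lie in [lo, hi].
    """
    width = hi - lo
    m = (mean - lo) / width
    v = var / (width * width)
    common = m * (1.0 - m) / v - 1.0  # must be positive for a valid beta fit
    if common <= 0.0:
        raise ValueError("variance too large for a beta fit on [lo, hi]")
    alpha, beta = m * common, (1.0 - m) * common
    return 1.0 - regularized_beta(alpha, beta, (observed - lo) / width)


def neg_log10(p):
    """The -log10(p) transform used to spread out small p-values in plots."""
    return -math.log10(p)
```

Unlike the normal fit, a beta fit respects finite bounds on the statistic and can be skewed, at the cost of needing those bounds.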
Spearman correlations between gold-standard p-values (999,999 and 499,999 permutations for the linear and quadratic statistics, respectively) and approximation p-values
| Data set | | | | | |
|---|---|---|---|---|---|
| Moran | 0.99991 | 0.99997 | 0.99973 | 0.99991 | 0.978 |
| Zhang | 0.99996 | 0.99997 | 0.99983 | 0.99991 | 0.990 |
| Scherzer | 0.99998 | 0.99999 | 0.99991 | 0.99997 | 0.994 |
p_L and p represent results for one- and two-tailed linear test statistics, respectively. Chisq p represents results for the sum-of-squares analysis.
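Agreement in the table above is measured with Spearman's rank correlation. A minimal standard-library Python sketch of that statistic is below; it uses average ranks for ties, which matter here because permutation p-values are multiples of 1/(M+1). It is a generic implementation with hypothetical names, not the authors' code.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0  # sorted positions i..j are tied
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks.

    Undefined (division by zero) if either input is constant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Because it depends only on ranks, the correlation is unchanged by the −log10(p) transform up to sign: spearman(p, [-math.log10(v) for v in p]) is −1 for any p without ties.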
Time in seconds for p-value calculations for gene sets in three genome-wide expression studies
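| Statistic, method | Moran | Zhang | Scherzer |
|---|---|---|---|
| Linear, M = 100 | 31.03 | 29.84 | 34.71 |
| Linear, M = 500 | 31.95 | 32.49 | 35.54 |
| Linear, M = 1,000,000 | 5010.17 | 4434.77 | 3933.15 |
| Linear, Normal | 29.74 | 27.00 | 34.66 |
| Linear, Beta | 30.79 | 31.88 | 37.89 |
| Quadratic, M = 30,000 | 9146.27 | 7217.59 | 11808.02 |
| Quadratic, M = 40,000 | 12256.54 | 9636.06 | 16545.60 |
| Quadratic, M = 50,000 | 16833.08 | 12564.06 | 21480.80 |
| Quadratic, M = 500,000 | 149588.37 | 129667.73 | 187067.91 |
| Quadratic, χ² | 11020.62 | 10600.82 | 12677.15 |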
The top block gives timings for the linear statistic with M = 100, M = 500, and M = 1,000,000 permutations and for the normal and beta approximations. The bottom block gives timings for the quadratic statistic with M = 30,000, M = 40,000, M = 50,000, and M = 500,000 permutations and for the χ² approximation.
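The timing table can be summarized with two simple calculations. The Python below uses only the numbers printed in the table, listed in the table's study-column order. It computes how many times faster each approximation is than the largest permutation run. It also estimates, by linear interpolation between the tabulated M values, how many quadratic-statistic permutations cost as much as the χ² approximation. The interpolation is an assumption about how run time scales between grid points, and the names are hypothetical.

```python
# Timings in seconds copied from the table, one value per study column.
LINEAR_1M_PERMUTATIONS = [5010.17, 4434.77, 3933.15]
NORMAL_APPROX = [29.74, 27.00, 34.66]
QUADRATIC_500K_PERMUTATIONS = [149588.37, 129667.73, 187067.91]
CHISQ_APPROX = [11020.62, 10600.82, 12677.15]
QUADRATIC_GRID = {
    30_000: [9146.27, 7217.59, 11808.02],
    40_000: [12256.54, 9636.06, 16545.60],
    50_000: [16833.08, 12564.06, 21480.80],
    500_000: [149588.37, 129667.73, 187067.91],
}


def speedups(slow, fast):
    """Ratio of permutation time to approximation time, per study."""
    return [round(s / f, 1) for s, f in zip(slow, fast)]


def equivalent_permutations(grid, target, col):
    """M whose permutation time equals target, interpolated linearly.

    Assumption: run time is linear in M between tabulated grid points.
    """
    points = sorted((m, times[col]) for m, times in grid.items())
    for (m0, t0), (m1, t1) in zip(points, points[1:]):
        if t0 <= target <= t1:
            return m0 + (m1 - m0) * (target - t0) / (t1 - t0)
    raise ValueError("target time outside the tabulated range")


print(speedups(LINEAR_1M_PERMUTATIONS, NORMAL_APPROX))  # [168.5, 164.3, 113.5]
print(speedups(QUADRATIC_500K_PERMUTATIONS, CHISQ_APPROX))  # [13.6, 12.2, 14.8]
print([int(round(equivalent_permutations(QUADRATIC_GRID, t, c), -2))
       for c, t in enumerate(CHISQ_APPROX)])  # [36000, 43300, 31800]
```

On these numbers, the normal approximation is roughly 110 to 170 times faster than 1,000,000 permutations. The χ² approximation costs about as much as 32,000 to 43,000 permutations, which is still 12 to 15 times faster than 500,000 permutations.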