Tom Heskes, Rob Eisinga, Rainer Breitling.
Abstract
BACKGROUND: The rank product method is a powerful statistical technique for identifying differentially expressed molecules in replicated experiments. A critical issue in molecule selection is the accurate calculation of the p-value of the rank product statistic, which is needed to adequately address multiple testing. Exact calculation as well as permutation and gamma approximations have been proposed to determine molecule-level significance. These approaches have serious drawbacks: they are either computationally burdensome or provide inaccurate estimates in the tail of the p-value distribution.
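To make the statistic and the gamma approximation mentioned above concrete, here is a minimal sketch. It assumes the common formulation in which each rank r is scaled to r/(n + 1) and treated as uniform on (0, 1), so that the negative log of the scaled rank product follows a Gamma(k, 1) distribution. The function names and the n + 1 scaling are illustrative assumptions, not taken from the paper.

```python
import math


def rank_product(ranks):
    """Product of one molecule's ranks across k experiments (rank 1 = strongest)."""
    return math.prod(ranks)


def gamma_pvalue(rho, n, k):
    """Approximate P(rank product <= rho) for n molecules and k experiments.

    Assumption: each rank divided by (n + 1) is Uniform(0, 1), so
    x = -log(rho / (n + 1)**k) is Gamma(k, 1) under the null hypothesis.
    """
    x = k * math.log(n + 1) - math.log(rho)
    # Survival function of Gamma(k, 1) for integer k (Erlang distribution):
    # exp(-x) * sum_{i=0}^{k-1} x**i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= x / i
        total += term
    return math.exp(-x) * total


# Hypothetical usage: a molecule ranked 1st, 2nd and 3rd among 100 molecules.
rho = rank_product([1, 2, 3])
p_approx = gamma_pvalue(rho, n=100, k=3)
```

Because the approximation replaces discrete ranks with a continuous distribution, it tends to be least reliable for very small rank products, which is the tail the abstract refers to.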
Year: 2014 PMID: 25413493 PMCID: PMC4245829 DOI: 10.1186/s12859-014-0367-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Visualization of recursion. Visualization of recursion with k, the number of experiments, on the y-axis, and j, the index of the interval that contains the rank product ρ, on the x-axis. Nodes correspond to combinations of k and j. The squares are given: with n the number of molecules. The arrows show the dependencies between the nodes. For example, to compute we first need to compute and . The red path visualizes the calculations required to obtain
Figure 2. Computation time (in milliseconds) for calculating 10000 upper bound p-values for n = 10000 and k = 2, …, 50.
Figure 3. Bounds and approximations of p-value distribution. (A) Strict bounds and approximations (geometric mean of upper and lower bound, and gamma) for n = 10000 molecules and k = 4 experiments, on the left-hand side over the whole range of rank products, on the right-hand side for small rank products only (the gamma approximation falls outside the figure). On the log scale, the bounds are very tight. Zooming in on small rank products, the bounds are on average about a factor of 2.5 off (i.e., higher or lower than the exact p-value), yet the geometric mean approximation is still very close to the exact p-value. (B) Same as (A), but for n = 10000 and k = 20. The curve on the left looks more or less the same but, as is best seen on the right, the bounds are much further off (almost a factor of 1000). (C) Same as (A), but for n = 10 and k = 4. The curve on the left may look worse, but that is mainly because of the scaling of the y-axis. Relatively speaking, the bounds are still on average about a factor of 2.5 off. (D) Same as (A), but for n = 10 and k = 20. With very small n and relatively large k, we get the worst of both worlds.
Top-25 age-associated genes with increased expression level (Van den Akker [19])
| Gene | Entrez ID | Rank product | Exact p-value | Gamma approximation | Upper bound | Geometric mean | Lower bound |
|---|---|---|---|---|---|---|---|
| GPR56 | 9289 | 9282 | 2.645 × 10⁻¹⁰ | 5.255 × 10⁻⁹ | 3.888 × 10⁻¹⁰ | 2.709 × 10⁻¹⁰ | 1.887 × 10⁻¹⁰ |
| HF1 | 3075 | 48576 | 2.074 × 10⁻⁹ | 2.296 × 10⁻⁸ | 2.873 × 10⁻⁹ | 2.117 × 10⁻⁹ | 1.559 × 10⁻⁹ |
| SYT11 | 23208 | 57600 | 2.550 × 10⁻⁹ | 2.671 × 10⁻⁸ | 3.510 × 10⁻⁹ | 2.601 × 10⁻⁹ | 1.927 × 10⁻⁹ |
| ARP10 | 164668 | 179400 | 9.817 × 10⁻⁹ | 7.297 × 10⁻⁸ | 1.303 × 10⁻⁸ | 1.000 × 10⁻⁸ | 7.680 × 10⁻⁹ |
| B3GAT1 (CD57) | 27087 | 278460 | 1.635 × 10⁻⁸ | 1.075 × 10⁻⁷ | 2.142 × 10⁻⁸ | 1.666 × 10⁻⁸ | 1.295 × 10⁻⁸ |
| SLC1A7 | 6512 | 483780 | 3.078 × 10⁻⁸ | 1.746 × 10⁻⁷ | 3.970 × 10⁻⁸ | 3.135 × 10⁻⁸ | 2.476 × 10⁻⁸ |
| IFNG | 3458 | 1594440 | 1.171 × 10⁻⁷ | 4.953 × 10⁻⁷ | 1.465 × 10⁻⁷ | 1.192 × 10⁻⁷ | 9.697 × 10⁻⁸ |
| DSCR1L1 | 10231 | 2004864 | 1.507 × 10⁻⁷ | 6.046 × 10⁻⁷ | 1.874 × 10⁻⁷ | 1.533 × 10⁻⁷ | 1.254 × 10⁻⁷ |
| ARK5 | 9891 | 2726880 | 2.110 × 10⁻⁷ | 7.898 × 10⁻⁷ | 2.605 × 10⁻⁷ | 2.146 × 10⁻⁷ | 1.768 × 10⁻⁷ |
| PIG13 | 81563 | 3549314 | 2.809 × 10⁻⁷ | 9.927 × 10⁻⁷ | 3.448 × 10⁻⁷ | 2.857 × 10⁻⁷ | 2.367 × 10⁻⁷ |
| SPUVE | 11098 | 3880576 | 3.093 × 10⁻⁷ | 1.072 × 10⁻⁶ | 3.789 × 10⁻⁷ | 3.146 × 10⁻⁷ | 2.612 × 10⁻⁷ |
| PDGFRB | 5159 | 4294368 | 3.451 × 10⁻⁷ | 1.171 × 10⁻⁶ | 4.217 × 10⁻⁷ | 3.509 × 10⁻⁷ | 2.920 × 10⁻⁷ |
| EDG8 | 53637 | 5083584 | 4.137 × 10⁻⁷ | 1.355 × 10⁻⁶ | 5.037 × 10⁻⁷ | 4.207 × 10⁻⁷ | 3.513 × 10⁻⁷ |
| MARLIN1 | 152789 | 5505984 | 4.507 × 10⁻⁷ | 1.451 × 10⁻⁶ | 5.477 × 10⁻⁷ | 4.582 × 10⁻⁷ | 3.833 × 10⁻⁷ |
| TGFBR3 | 7049 | 8081700 | 6.784 × 10⁻⁷ | 2.021 × 10⁻⁶ | 8.176 × 10⁻⁷ | 6.896 × 10⁻⁷ | 5.815 × 10⁻⁷ |
| GZMB | 3002 | 9886240 | 8.396 × 10⁻⁷ | 2.404 × 10⁻⁶ | 1.008 × 10⁻⁶ | 8.533 × 10⁻⁷ | 7.227 × 10⁻⁷ |
| DEFA3 | 1168 | 9980528 | 8.481 × 10⁻⁷ | 2.423 × 10⁻⁶ | 1.018 × 10⁻⁶ | 8.619 × 10⁻⁷ | 7.301 × 10⁻⁷ |
| KRT1 | 3848 | 11787930 | 1.010 × 10⁻⁶ | 2.796 × 10⁻⁶ | 1.208 × 10⁻⁶ | 1.027 × 10⁻⁶ | 8.728 × 10⁻⁷ |
| CX3CR1 | 1524 | 12060288 | 1.035 × 10⁻⁶ | 2.851 × 10⁻⁶ | 1.237 × 10⁻⁶ | 1.052 × 10⁻⁶ | 8.944 × 10⁻⁷ |
| STYK1 | 55359 | 14337372 | 1.241 × 10⁻⁶ | 3.308 × 10⁻⁶ | 1.477 × 10⁻⁶ | 1.260 × 10⁻⁶ | 1.076 × 10⁻⁶ |
| ADRB2 | 154 | 16272900 | 1.416 × 10⁻⁶ | 3.687 × 10⁻⁶ | 1.681 × 10⁻⁶ | 1.438 × 10⁻⁶ | 1.231 × 10⁻⁶ |
| GAF1 | 26056 | 35217600 | 3.138 × 10⁻⁶ | 7.128 × 10⁻⁶ | 3.667 × 10⁻⁶ | 3.186 × 10⁻⁶ | 2.769 × 10⁻⁶ |
| CTSL | 1514 | 38246400 | 3.414 × 10⁻⁶ | 7.647 × 10⁻⁶ | 3.982 × 10⁻⁶ | 3.465 × 10⁻⁶ | 3.016 × 10⁻⁶ |
| GFI1 | 2672 | 56960480 | 5.107 × 10⁻⁶ | 1.072 × 10⁻⁵ | 5.907 × 10⁻⁶ | 5.183 × 10⁻⁶ | 4.547 × 10⁻⁶ |
| TTC38 | 55020 | 59340600 | 5.322 × 10⁻⁶ | 1.110 × 10⁻⁵ | 6.150 × 10⁻⁶ | 5.400 × 10⁻⁶ | 4.742 × 10⁻⁶ |
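The geometric mean approximation in the table above can be reproduced from the two bounds. The sketch below assumes the last three columns are the upper bound, the geometric mean, and the lower bound, and uses the GPR56 row as input. The function name is hypothetical, and averaging in log space is a design choice made here to stay numerically stable for very small p-values.

```python
import math


def geometric_mean_pvalue(upper, lower):
    """Geometric mean of an upper and a lower p-value bound, computed in log space."""
    return math.exp(0.5 * (math.log(upper) + math.log(lower)))


# GPR56 row: upper bound 3.888e-10, lower bound 1.887e-10 (as printed in the table).
gpr56 = geometric_mean_pvalue(3.888e-10, 1.887e-10)
```

The result agrees with the tabulated 2.709 × 10−10 to within the rounding of the printed bounds. Other rows can differ in the last digit, because the table lists the bounds to only four significant figures.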
Number of genes called significant according to Bonferroni correction and FDR -values
| Method |  |  |  |  |
|---|---|---|---|---|
| Exact | 25 |  |  |  |
| Gamma | 21 | 14 | 40 | 112 |
| Upper bound | 23 | 21 | 57 | 122 |
| Geometric mean | 25 | 21 | 58 | 129 |
| Lower bound | 26 | 21 | 59 | 131 |
|  |  |  |  |  |
| Exact | 42 |  |  |  |
| Gamma | 30 | 23 | 69 | 140 |
| Upper bound | 42 | 39 | 74 | 143 |
| Geometric mean | 42 | 42 | 74 | 154 |
| Lower bound | 43 | 43 | 75 | 157 |
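Counts like those in the table above come from applying a multiple-testing rule to a list of per-gene p-values. The following is a minimal sketch of two such rules. It assumes a plain Bonferroni correction and the Benjamini-Hochberg step-up procedure for FDR control. The record does not say which FDR procedure or which thresholds were used, so the procedure, the thresholds, and the example p-values are all illustrative assumptions.

```python
def count_bonferroni(pvalues, alpha=0.05):
    """Number of tests with m * p <= alpha, where m is the number of tests."""
    m = len(pvalues)
    return sum(1 for p in pvalues if p * m <= alpha)


def count_benjamini_hochberg(pvalues, q=0.05):
    """Number of rejections under the Benjamini-Hochberg step-up procedure.

    Finds the largest i such that the i-th smallest p-value is <= q * i / m
    and rejects the i smallest p-values.
    """
    m = len(pvalues)
    rejected = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= q * i / m:
            rejected = i
    return rejected


# Made-up p-values, for illustration only.
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_bonferroni = count_bonferroni(example, alpha=0.05)
n_fdr = count_benjamini_hochberg(example, q=0.05)
```

Because both rules compare p-values against fixed cut-offs, an approximation that overestimates small p-values yields fewer significant genes, and one that underestimates them yields more.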