Abstract
Whole-genome microarray investigations (e.g., differential expression, differential methylation, ChIP-chip) provide opportunities to test millions of features in a genome. Traditional multiple comparison procedures, such as familywise error rate (FWER) controlling procedures, are too conservative. False discovery rate (FDR) procedures have been suggested as having greater power. However, their control is not exact and depends on the proportion of true null hypotheses. Because this proportion is unknown, it has to be estimated accurately (with small bias and small variance), preferably by a simple calculation that can be made accessible to the general scientific community. We propose an easy-to-implement method for estimating the proportion of true null hypotheses and make the R code available. Comparisons with four existing procedures on simulated and real data show that this estimate has relatively small bias and small variance. Although presented here in the context of microarrays, this estimate is applicable to many multiple comparison situations.
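One of the existing procedures used in the comparisons, Storey's π̂0(λ), shows why a simple calculation is possible. True-null p-values are uniformly distributed on (0, 1). The share of p-values above a cutoff λ, rescaled by 1 − λ, therefore estimates π0. The Python sketch below assumes a fixed λ = 0.5, which is our choice. It is neither the bootstrap-tuned Storeyboot variant nor the proposed average estimate, whose details are not given here.

```python
def storey_pi0(pvals, lam=0.5):
    """Storey's estimate pi0_hat(lam) = #{i : p_i > lam} / (m * (1 - lam)).

    True-null p-values are Uniform(0, 1), so about m0 * (1 - lam) of them
    exceed lam, while p-values from true alternatives concentrate near 0.
    Rescaling the count above lam therefore estimates m0 / m. The result is
    capped at 1 because it estimates a proportion.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must lie in [0, 1)")
    if not pvals:
        raise ValueError("need at least one p-value")
    above = sum(1 for p in pvals if p > lam)
    return min(1.0, above / (len(pvals) * (1.0 - lam)))


pvals = [0.001, 0.002, 0.01, 0.03, 0.2, 0.45, 0.6, 0.7, 0.85, 0.95]
print(storey_pi0(pvals))  # 4 of 10 exceed 0.5, so 4 / (10 * 0.5) = 0.8
```

The estimate depends on λ. A small λ admits more alternative p-values into the count (upward bias), while a large λ leaves few p-values to count (higher variance). This is why the compared procedures tune λ by bootstrapping or smoothing rather than fixing it.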
Keywords: epigenomics; false discovery rate; microarray; multiple comparisons; type I error rate
Year: 2008 PMID: 19259400 PMCID: PMC2623313
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
The estimate of the proportion of true null hypotheses is compared for the following procedures:

- Benjamini and Hochberg's lowest slope approach (LSL).
- Storey's π̂0(λ) estimate with λ selected via bootstrapping (Storeyboot).
- Storey and Tibshirani's smoother method (ST smoother).
- Langaas et al.'s nonparametric maximum likelihood approach (convest).
- The proposed average estimate approach, with fixed values B = 5, 10, 20, 50, 100 and with B chosen via the bootstrapping procedure (Bboot).

For each value of π0, there are 1,000 simulated data sets, each with a total of m = 1,000 hypothesis tests.
| π0 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|
| Estimates of π0 | | | | | |
| LSL | 0.7151 | 0.7889 | 0.8561 | 0.9184 | 0.9683 |
| Storeyboot | 0.4814 | 0.5789 | 0.6765 | 0.7728 | 0.8660 |
| ST smoother | 0.4951 | 0.5939 | 0.6980 | 0.7993 | 0.8973 |
| convest | 0.4963 | 0.5938 | 0.6947 | 0.7921 | 0.8882 |
| B = 5 | 0.5132 | 0.6113 | 0.7136 | 0.8086 | 0.9058 |
| B = 10 | 0.5082 | 0.6084 | 0.7083 | 0.8045 | 0.9052 |
| B = 20 | 0.5141 | 0.6128 | 0.7115 | 0.8076 | 0.9064 |
| B = 50 | 0.5196 | 0.6175 | 0.7156 | 0.8106 | 0.9078 |
| B = 100 | 0.5243 | 0.6210 | 0.7180 | 0.8122 | 0.9085 |
| Bboot | 0.5195 | 0.6175 | 0.7148 | 0.8113 | 0.9082 |
| Standard deviations of estimates of π0 | | | | | |
| LSL | 0.0323 | 0.0269 | 0.0225 | 0.0155 | 0.0092 |
| Storeyboot | 0.0467 | 0.0491 | 0.0513 | 0.0522 | 0.0549 |
| ST smoother | 0.0513 | 0.0570 | 0.0608 | 0.0654 | 0.0656 |
| convest | 0.0331 | 0.0364 | 0.0337 | 0.0321 | 0.0328 |
| B = 5 | 0.0335 | 0.0356 | 0.0420 | 0.0428 | 0.0382 |
| B = 10 | 0.0391 | 0.0390 | 0.0402 | 0.0412 | 0.0366 |
| B = 20 | 0.0331 | 0.0343 | 0.0358 | 0.0371 | 0.0331 |
| B = 50 | 0.0293 | 0.0309 | 0.0321 | 0.0334 | 0.0315 |
| B = 100 | 0.0272 | 0.0291 | 0.0307 | 0.0321 | 0.0312 |
| Bboot | 0.0301 | 0.0301 | 0.0313 | 0.0313 | 0.0311 |
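A small Monte Carlo sketch can mimic how the table above summarizes an estimator: the mean and standard deviation of π̂0 across replicated data sets of m = 1,000 tests. Several assumptions here are ours, not the paper's:

- True alternatives are one-sided z-tests with a hypothetical mean shift `mu = 2`.
- The estimator is Storey's π̂0(λ) with a fixed λ = 0.5, not Storeyboot and not the proposed average estimate.
- Fewer than 1,000 replicates are used to keep the run short.

The printed numbers are therefore not expected to reproduce the table.

```python
import math
import random
import statistics


def storey_pi0(pvals, lam=0.5):
    """Storey's pi0_hat(lam) = #{p > lam} / (m * (1 - lam)), capped at 1."""
    above = sum(1 for p in pvals if p > lam)
    return min(1.0, above / (len(pvals) * (1.0 - lam)))


def simulate_pvalues(m, pi0, mu, rng):
    """One data set of m one-sided z-test p-values (hypothetical model).

    round(pi0 * m) statistics are N(0, 1) (true nulls); the rest are
    N(mu, 1) (true alternatives). p = P(Z > z) = erfc(z / sqrt(2)) / 2.
    """
    m0 = round(pi0 * m)
    pvals = []
    for i in range(m):
        z = rng.gauss(0.0 if i < m0 else mu, 1.0)
        pvals.append(0.5 * math.erfc(z / math.sqrt(2.0)))
    return pvals


def summarize(pi0, m=1000, reps=100, mu=2.0, lam=0.5, seed=1):
    """Mean and standard deviation of pi0_hat over `reps` simulated data sets."""
    rng = random.Random(seed)
    estimates = [storey_pi0(simulate_pvalues(m, pi0, mu, rng), lam)
                 for _ in range(reps)]
    return statistics.mean(estimates), statistics.stdev(estimates)


for pi0 in (0.5, 0.6, 0.7, 0.8, 0.9):
    mean, sd = summarize(pi0)
    print(f"pi0={pi0:.1f}  mean={mean:.4f}  sd={sd:.4f}")
```

With `mu = 2`, about 2.3% of alternative p-values exceed 0.5 (P(N(2, 1) < 0) ≈ 0.023). This fixed-λ estimator is therefore biased slightly upward, by roughly 0.046(1 − π0). That is the kind of bias the "Estimates of π0" rows measure.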
Figure 1. Simulation results for the false discovery rate (FDR) at significance level α = 0.05 for seven procedures, respectively:

- Benjamini and Hochberg's FDR controlling procedure incorporating the true π0 (BHπ0).
- Benjamini and Hochberg's FDR controlling procedure (BH).
- Benjamini and Hochberg's adaptive approach incorporating the estimate of π0 from the proposed average estimate procedure, with B chosen via bootstrapping (Bboot).
- Benjamini and Hochberg's lowest slope approach (LSL).
- Storey's bootstrapping approach (Storeyboot).
- Storey and Tibshirani's smoother method (STsmoother).
- Langaas et al.'s nonparametric maximum likelihood estimate (convest).

The solid black line marks FDR = 0.05. The total number of hypothesis tests is m = 1,000, and the simulation study uses 1,000 data sets for each value of π0.
Figure 2. Simulation results for statistical power at significance level α = 0.05 for the same seven procedures as in Figure 1 (BHπ0, BH, Bboot, LSL, Storeyboot, STsmoother and convest). The total number of hypothesis tests is m = 1,000, and the simulation study uses 1,000 data sets for each value of π0.
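The two quantities in these figures can be computed for each simulated data set and then averaged:

- the false discovery proportion, V/max(R, 1), where V is the number of false rejections and R the total number of rejections;
- the power, the fraction of true alternatives that are rejected.

The sketch below does this only for BH (π0 = 1) and the oracle BHπ0 (true π0 plugged in). The data-generating model (one-sided z-tests with a hypothetical shift `mu = 3`) and the replicate count are assumptions. The estimated-π0 variants shown in the figures are not implemented here.

```python
import math
import random


def bh_reject(pvals, alpha=0.05, pi0=1.0):
    """Indices rejected by the Benjamini-Hochberg step-up procedure.

    Finds the largest rank k with p_(k) <= k * alpha / (m * pi0) and rejects
    the k smallest p-values. pi0 = 1 is the original BH procedure; plugging in
    pi0 < 1 raises every threshold, so it can only add rejections.
    """
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank * alpha / (m * pi0):
            k = rank
    return set(order[:k])


def simulate(m, pi0, mu, rng):
    """One-sided z-test p-values; the first round(pi0 * m) hypotheses are true nulls."""
    m0 = round(pi0 * m)
    is_null = [i < m0 for i in range(m)]
    pvals = [0.5 * math.erfc(rng.gauss(0.0 if null else mu, 1.0) / math.sqrt(2.0))
             for null in is_null]
    return pvals, is_null


def fdr_and_power(pi0, oracle, m=1000, reps=50, mu=3.0, alpha=0.05, seed=2):
    """Average false discovery proportion and average power over `reps` data sets.

    oracle=False runs BH; oracle=True runs BH with the true pi0 (BHpi0).
    """
    rng = random.Random(seed)
    fdp_total = power_total = 0.0
    for _ in range(reps):
        pvals, is_null = simulate(m, pi0, mu, rng)
        rejected = bh_reject(pvals, alpha, pi0 if oracle else 1.0)
        false_rej = sum(1 for i in rejected if is_null[i])
        m1 = is_null.count(False)
        fdp_total += false_rej / max(len(rejected), 1)
        power_total += (len(rejected) - false_rej) / m1 if m1 else 0.0
    return fdp_total / reps, power_total / reps


for pi0 in (0.5, 0.7, 0.9):
    bh_fdr, bh_pow = fdr_and_power(pi0, oracle=False)
    or_fdr, or_pow = fdr_and_power(pi0, oracle=True)
    print(f"pi0={pi0}: BH FDR={bh_fdr:.3f} power={bh_pow:.3f} | "
          f"BHpi0 FDR={or_fdr:.3f} power={or_pow:.3f}")
```

The same seed gives both procedures identical data. Plugging in π0 < 1 only raises the step-up thresholds, so BHπ0 rejects a superset of what BH rejects and its power cannot be lower. Under independence, BH controls the FDR at π0·α. This is why plain BH is conservative when π0 is well below 1.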
Estimates of the proportion of true null hypotheses, and the resulting number of statistically significant genes, for the leukemia data (Golub et al. 1999) at significance level α = 0.05. Significance was determined with Benjamini and Hochberg's adaptive FDR controlling procedure, with π0 estimated by five methods:

- Benjamini and Hochberg's lowest slope approach (LSL).
- Storey's π̂0(λ) estimate with λ selected via bootstrapping (Storeyboot).
- Storey and Tibshirani's smoother method (ST smoother).
- Langaas et al.'s convest approach (convest).
- The proposed average approach with B chosen via the bootstrapping procedure (Bboot).

The p-values were computed with a two-sample t-test.
| Method | Estimate of π0 | Number of significant genes |
|---|---|---|
| LSL | 0.899 | 584 |
| Storeyboot | 0.595 | 787 |
| ST smoother | 0.583 | 791 |
| convest | 0.595 | 787 |
| Bboot | 0.604 | 776 |
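The table shows that the largest π̂0 (LSL) yields the fewest significant genes. This ordering follows from how π̂0 enters the adaptive step-up rule: a larger π̂0 shrinks every threshold kα/(mπ̂0). The sketch below counts adaptive BH rejections for the five π̂0 values reported above. The Golub p-values are not reproduced here, so it uses synthetic, hypothetical p-values (m = 5,000, 30% alternatives with shift 3), and its counts will not match the table.

```python
import math
import random


def n_significant(pvals, alpha, pi0_hat):
    """Number of hypotheses rejected by the adaptive BH step-up procedure.

    Rejects the k smallest p-values, where k is the largest rank with
    p_(k) <= k * alpha / (m * pi0_hat); k = 0 if no rank qualifies.
    """
    m = len(pvals)
    k = 0
    for rank, p in enumerate(sorted(pvals), start=1):
        if p <= rank * alpha / (m * pi0_hat):
            k = rank
    return k


# Hypothetical synthetic data: 5,000 one-sided z-test p-values,
# 70% true nulls N(0, 1) and 30% alternatives N(3, 1).
rng = random.Random(0)
pvals = [0.5 * math.erfc(rng.gauss(0.0 if i < 3500 else 3.0, 1.0) / math.sqrt(2.0))
         for i in range(5000)]

# pi0 estimates reported in the table for the leukemia data.
estimates = {"LSL": 0.899, "Storeyboot": 0.595, "ST smoother": 0.583,
             "convest": 0.595, "Bboot": 0.604}
for method, pi0_hat in sorted(estimates.items(), key=lambda kv: kv[1]):
    count = n_significant(pvals, 0.05, pi0_hat)
    print(f"{method:12s} pi0_hat={pi0_hat:.3f}  significant={count}")
```

Equal estimates (Storeyboot and convest, both 0.595) necessarily give identical counts on the same p-values, which matches the 787 reported for both in the table.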