Nikolaos Ignatiadis, Bernd Klaus, Judith B Zaugg, Wolfgang Huber.
Abstract
Hypothesis weighting improves the power of large-scale multiple testing. We describe independent hypothesis weighting (IHW), a method that assigns weights using covariates independent of the P-values under the null hypothesis but informative of each test's power or prior probability of the null hypothesis (http://www.bioconductor.org/packages/IHW). IHW increases power while controlling the false discovery rate and is a practical approach to discovering associations in genomics, high-throughput biology and other large data sets.
Year: 2016 PMID: 27240256 PMCID: PMC4930141 DOI: 10.1038/nmeth.3885
Source DB: PubMed Journal: Nat Methods ISSN: 1548-7091 Impact factor: 28.547
Table 1. Examples of covariates.
| Application | Covariate |
|---|---|
| Differential expression analysis | Sum of read counts per gene across all samples [ |
| Genome-wide association study (GWAS) | Minor allele frequency |
| Expression-QTL analysis | Distance between the genetic variant and genomic location of the phenotype |
| ChIP-QTL analysis | Comembership in a topologically associated domain [ |
| t-test | Overall variance [ |
| Two-sided tests | Sign of the effect |
| Various applications | Signal quality, sample size |
Figure 1. Histograms stratified by the covariate as a diagnostic plot.
a) The histogram of all p-values shows a mixture of a uniform distribution (corresponding to the true null hypotheses) and an enrichment of small p-values to the left (corresponding to the alternatives). Such a well-calibrated histogram is the starting point for most multiple testing methods. b-d) Histograms after splitting the hypotheses into three groups based on the values of the covariate. Shown is an example of a good covariate: each histogram still shows a uniform component, but the mixture proportion and/or the shape of the alternative distribution differ between the groups. If all histograms look the same, the covariate is uninformative, and its use would not lead to an increase in power. If the tails are no longer uniform, independence under the null is violated, and application of IHW is not valid.
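The diagnostic described in this caption can be approximated without any plotting library. The following is a minimal sketch, not the IHW package's own routine: the function name `stratified_histograms` and its parameters are hypothetical, and it assumes equal-sized groups formed by ranking the covariate.

```python
def stratified_histograms(pvalues, covariate, n_groups=3, n_bins=10):
    """Split hypotheses into covariate-ordered groups and bin each group's p-values.

    Returns a list of n_groups lists, each holding n_bins counts over [0, 1].
    (Hypothetical helper for illustration only.)
    """
    if len(pvalues) != len(covariate):
        raise ValueError("pvalues and covariate must have the same length")
    m = len(pvalues)
    # Rank hypotheses by covariate so that groups have (nearly) equal size.
    order = sorted(range(m), key=lambda i: covariate[i])
    histograms = []
    for g in range(n_groups):
        members = order[g * m // n_groups:(g + 1) * m // n_groups]
        counts = [0] * n_bins
        for i in members:
            # A p-value of exactly 1.0 is assigned to the last bin.
            b = min(int(pvalues[i] * n_bins), n_bins - 1)
            counts[b] += 1
        histograms.append(counts)
    return histograms


# Toy data: small p-values are concentrated at large covariate values.
pvals = [0.01, 0.02, 0.55, 0.95, 0.03, 0.45, 0.75, 0.25, 0.65]
cov = [9, 8, 1, 2, 7, 3, 4, 5, 6]
for group, counts in enumerate(stratified_histograms(pvals, cov, n_bins=2)):
    print(group, counts)
```

With a good covariate, the counts in the bins near 1 stay roughly flat within every group, while the height of the leftmost bins differs between groups. Identical histograms across groups suggest an uninformative covariate.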
Figure 2. Performance evaluation.
Panels a-c show the number of discoveries with IHW and BH on real data as a function of the target FDR. a) RNA-Seq dataset [13] with mean of normalized counts for each gene as the covariate. b) SILAC dataset [15], with number of peptides quantified per protein as the covariate. c) hQTL dataset [16] for Chromosome 21, with genomic distance between SNPs and ChIP-seq signals as the covariate. Independent Filtering with different distance cutoffs was also applied. d) Weight function learned by IHW at α = 0.1 for the hQTL dataset. Shown are the curves for the five folds in the data splitting scheme. Panels e-h benchmark different methods based on simulations. Brief descriptions of each method are in Table 2. e–f) Type I error control if all null hypotheses are true. Shown is the true FDR against the nominal significance level α. e) All methods shown make too many false discoveries. f) BH, FDRreg, and IHW control the FDR. LSL-GBH and Clfdr are slightly anticonservative. g-h) Implications of different effect sizes. The two-sample t-test was applied to Normal samples (n = 2 × 5, σ = 1) with either the same mean (nulls) or means differing by the effect size indicated on the x-axis (alternatives). The fraction of alternatives was 0.05. The pooled sample variance was used as the covariate. The nominal level was α = 0.1 (dotted line). g) The y-axis shows the actual FDR. h) Power analysis. All methods show improvement over BH.
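As a point of reference for panels a-c, the BH baseline and its number of discoveries at several target FDR levels can be sketched as follows. This is a plain-Python illustration of the standard Benjamini-Hochberg step-up rule; the function names and the toy p-values are assumptions made for the example and are not taken from the paper's data.

```python
def benjamini_hochberg(pvalues, alpha):
    """Return a list of booleans marking the hypotheses rejected by BH at level alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Step-up rule: find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    # Reject every hypothesis ranked at or below k_max, even if its own
    # p-value exceeded its own threshold.
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


def discoveries_by_alpha(pvalues, alphas):
    """Number of BH discoveries for each target FDR level in alphas."""
    return [(a, sum(benjamini_hochberg(pvalues, a))) for a in alphas]


# Toy p-values (illustrative only).
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
print(discoveries_by_alpha(pvals, [0.05, 0.1]))
```

At the higher target level, the step-up rule also rejects the third and fourth smallest p-values, which miss their own thresholds but rank below the fifth, which meets its threshold.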
Table 2. Short description of the different methods benchmarked and summary of the results of Fig. 2e–h and Supplementary Fig. 2.
| Method | Short description | Type I error control (all nulls) | Type I error control (t-test) | Gain in power vs BH (t-test) | Gain in power vs BH (size investing) | Comment |
|---|---|---|---|---|---|---|
| BH | Method of Benjamini and Hochberg [ | Yes | Yes | – | – | |
| IHW | Independent hypothesis weighting, as proposed here. | Yes | Yes | Yes | Yes | |
| Naive | Naive independent hypothesis weighting, as proposed here. | No | No | Yes | Yes | |
| Greedy Independent Filtering | The Independent Filtering procedure [ | No | No | Yes | No | The covariate-weights function is a binary step, monotonic. |
| SBH | Stratified Benjamini-Hochberg [ | No | No | Yes | No | |
| TST-GBH | The Group BH procedure [ | No | No | Yes | No | |
| LSL-GBH | The Group BH procedure [ | Yes | Yes | Yes | No | |
| Clfdr | In the Clfdr procedure [ | Yes | No | Yes | Yes | |
| FDRreg | The FDR regression method [ | Yes | No | Yes | No | Requires |
Figure 3. True discovery rate and informative covariates.
a) Schematic representation of the density $f_i$, which is composed of the alternative density $f_{1,i}$ weighted by its prior probability $\pi_{1,i}$ and the uniform null density weighted by $\pi_{0,i}$. b-d) The true discovery rate (tdr) of individual tests can vary. In b), the test has high power, and $\pi_{0,i}$ is well below 1. In c), the test has the same power as in b), but $\pi_{0,i}$ is higher, leading to a reduced tdr. In d), $\pi_{0,i}$ is as in b), but the test has little power, again leading to a reduced tdr. e) If an informative covariate is associated with each test, the distribution of the p-values from multiple tests is different for different values of the covariate. The contours represent the joint density of p-values and covariate. The BH procedure accounts only for the p-values and not the covariates (dashed red line). In contrast, the decision boundary of IHW is a step function; each step corresponds to one group, i.e., to one weight. f) By Equation (1), the density of the tdr also depends on the covariate. The decision boundary of the BH procedure (dashed red line) leads to a suboptimal set of discoveries, in this example with a higher than optimal tdr for intermediate covariate values and a tdr that is too low otherwise. In contrast, IHW approximates a line of constant tdr, implying efficient use of the FDR budget. An important feature of IHW is that it works directly on p-values and covariates rather than explicitly estimating the tdr.
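The step-function decision boundary in panel e can be illustrated with a weighted BH procedure in which every hypothesis in a covariate group shares one weight. The sketch below is an assumption-laden illustration and not the IHW implementation: the weights are supplied by hand, whereas IHW learns them from the data, and the names `weighted_bh`, `groups` and `group_weights` are hypothetical.

```python
def weighted_bh(pvalues, groups, group_weights, alpha):
    """Weighted BH: apply the BH step-up rule to p_i / w_i.

    groups[i] is the covariate group of hypothesis i, and group_weights maps
    each group to a non-negative weight. Weights are rescaled so that they
    average 1 over all hypotheses. (Hypothetical helper for illustration.)
    """
    m = len(pvalues)
    raw = [group_weights[g] for g in groups]
    total = sum(raw)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    weights = [w * m / total for w in raw]
    # A zero weight means the hypothesis can never be rejected.
    weighted_p = [p / w if w > 0 else float("inf")
                  for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: weighted_p[i])
    # Largest rank k with weighted p_(k) <= k * alpha / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if weighted_p[i] <= rank * alpha / m:
            k_max = rank
    rejected = [False] * m
    for i in order[:k_max]:
        rejected[i] = True
    return rejected


# Toy example: group "near" is assumed to carry more signal than group "far".
pvals = [0.004, 0.03, 0.045, 0.5, 0.3, 0.6, 0.7, 0.8]
groups = ["near"] * 4 + ["far"] * 4
print(sum(weighted_bh(pvals, groups, {"near": 1.0, "far": 1.0}, alpha=0.1)))
print(sum(weighted_bh(pvals, groups, {"near": 1.5, "far": 0.5}, alpha=0.1)))
```

Equal weights reduce the procedure to ordinary BH. Shifting weight toward the group with more small p-values raises that group's rejection threshold and lowers the other's, which produces one step of the boundary per group.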