
Control procedures and estimators of the false discovery rate and their application in low-dimensional settings: an empirical investigation.

Regina Brinster¹,², Anna Köttgen³, Bamidele O Tayo⁴, Martin Schumacher⁵, Peggy Sekula³.

Abstract

BACKGROUND: When many (up to millions) of statistical tests are conducted in discovery set analyses such as genome-wide association studies (GWAS), approaches controlling family-wise error rate (FWER) or false discovery rate (FDR) are required to reduce the number of false positive decisions. Some methods were specifically developed in the context of high-dimensional settings and partially rely on the estimation of the proportion of true null hypotheses. However, these approaches are also applied in low-dimensional settings such as replication set analyses that might be restricted to a small number of specific hypotheses. The aim of this study was to compare different approaches in low-dimensional settings using (a) real data from the CKDGen Consortium and (b) a simulation study.
RESULTS: In both the application and the simulation, FWER approaches were less powerful than FDR control methods, whether a larger number of hypotheses was tested or not. The q-value method was the most powerful. However, its specificity, i.e. its ability to retain true null hypotheses, was reduced, especially when the number of tested hypotheses was small. In this low-dimensional situation, the estimation of the proportion of true null hypotheses was biased.
CONCLUSIONS: The results highlight the importance of a sizeable data set for a reliable estimation of the proportion of true null hypotheses. Consequently, methods relying on this estimation should only be applied in high-dimensional settings. Furthermore, if the focus lies on testing of a small number of hypotheses such as in replication settings, FWER methods rather than FDR methods should be preferred to maintain high specificity.


Keywords: False discovery rate; Low-dimensional setting; Q-value method; Simulation study


Year: 2018 PMID: 29499647 PMCID: PMC5833079 DOI: 10.1186/s12859-018-2081-x

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169

Background

Advances in molecular biology and laboratory techniques allow a multitude of different features in humans to be evaluated on a large scale to elucidate (patho-)physiology and risk factors for a specific disease or its progression. In recent studies, up to millions of features are often assessed simultaneously in discovery set analyses such as genome-wide association studies (GWAS), in which single nucleotide polymorphisms (SNPs) are evaluated with respect to a single trait or clinical outcome [1]. For practical reasons, such high-dimensional data are usually analyzed by testing each single feature separately against the outcome of interest [2]. Statistical testing assesses a hypothesis, which is either rejected or not rejected based on the observed test statistic [3]. Depending on the decision, two mistakes can occur: the null hypothesis might be erroneously rejected although it is true (false positive decision, type I error), or it might not be rejected although it is false (false negative decision, type II error). The type I error can be controlled by defining a significance threshold. For a single hypothesis, a commonly used threshold is α = 0.05. However, when testing multiple hypotheses such as in GWAS, applying a threshold like 0.05 to every test results in an unacceptably large number of false positive results. Consequently, other ways to control the type I error are required. In general, there are different approaches: control of the family-wise error rate (FWER) and control or estimation of the false discovery rate (FDR) [4]. FWER methods such as the well-known Bonferroni correction [5] were proposed at a time when the number of tested hypotheses was far smaller than, for example, in today’s GWAS. Although often applied, these methods are considered too conservative in a high-dimensional setting.
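A quick calculation makes the size of the problem concrete. The minimal Python sketch below assumes, purely for illustration, that all m null hypotheses are true and that the tests are independent; under these assumptions the probability of at least one false positive at a per-test level α is 1 − (1 − α)^m and the expected number of false positives is mα. The function names are illustrative and not taken from any package.

```python
def fwer_independent(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive) for m independent tests of true nulls,
    each performed at level alpha (illustrative independence assumption)."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(m: int, alpha: float = 0.05) -> float:
    """Expected number of false positives when all m null hypotheses are true."""
    return m * alpha


if __name__ == "__main__":
    for m in (1, 10, 57, 1_000_000):
        print(f"m={m:>9}: FWER={fwer_independent(m):.3f}, "
              f"E[V]={expected_false_positives(m):g}")
```

Under these assumptions, α = 0.05 already gives an FWER of about 0.40 for m = 10 and about 0.95 for m = 57, and roughly 50,000 expected false positives for m = 1,000,000.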
Alternatively, FDR methods, which are less conservative and were partly developed in the context of high-dimensional data, can be used. In addition, there are approaches that estimate a significance measure for each individual hypothesis, such as the local false discovery rate (LFDR) [6] and the q-value [7]. FDR methods are now used frequently, not only in high-dimensional settings but also in situations where the number of assessed features is small, such as a replication set analysis restricted to the significant hypotheses of the discovery set analysis. For a small number of features, however, data on the performance of FDR methods are limited. The aim of this study was thus to assess FDR methods in low-dimensional data and to compare them with classic FWER methods. For this purpose, we used real data obtained from the CKDGen Consortium [8] to illustrate the different control methods. Moreover, we conducted a simulation study to evaluate the control methods in different settings.

Methods

Control methods

To describe the different error control and estimation methods, we adopted the notation of Benjamini and Hochberg [9] for test decisions (Table 1). Assume that m hypotheses H1, …, Hm are tested, yielding the respective p-values p1, …, pm. If the truth were known, type I errors would be counted by V and type II errors by T. In practice, however, only m and the total number of rejections, R, are observable. The overall significance threshold is called α.

Table 1

Statistical hypothesis test with possible test decisions related to the unknown truth (notation)

                          Test decision
Underlying truth          declared non-significant   declared significant   Total
true null                 U                          V (type I error, α)    m₀
non-null/alternative      T (type II error, β)       S                      m₁
Total                     m − R                      R                      m


Methods controlling the family-wise error rate (FWER)

FWER is defined as the probability of making at least one false positive decision: FWER = Pr(V > 0). The error rate can be controlled at a fixed threshold α. In the following, four well-known methods are considered (Table 2a):

Table 2

Algorithms of methods controlling family-wise error rate (FWER) and false discovery rate (FDR). Let m be the number of hypotheses H1, …, Hm to test and p1, …, pm their respective p-values. The p-values ranked in increasing order are denoted p(1) ≤ … ≤ p(m). The overall significance threshold is called α. Furthermore, let $\hat{\pi}_0$ be the estimated proportion of true null hypotheses.

The simplest and likely most often applied control method of the FWER is the Bonferroni correction [10]. It compares each individual p-value p1, …, pm with the fixed threshold α/m. P-values smaller than this threshold lead to rejection of the respective null hypothesis. The Bonferroni correction guarantees control of the FWER at level α in the strong sense, which means that control is ensured for every proportion of true null hypotheses. The Bonferroni correction does not require independent p-values and hence can be applied under any dependency structure. Nevertheless, Bonferroni can be conservative, so true alternatives might be missed. To reduce the number of missed true alternatives, adjustments of the Bonferroni correction have been proposed that use the number of independent tests (also called the effective number) instead of the actual number of conducted tests (e.g. Li et al. [11]). These approaches therefore gain power over the traditional Bonferroni correction. In the specific context of GWAS, for example, a frequently applied adjusted Bonferroni correction proposed by Pe’er et al. [12] accounts for correlation between SNPs due to linkage disequilibrium (LD) by estimating the number of independent genome-wide loci (n = 1,000,000 in individuals of European ancestry). Instead of the much larger number of all SNPs tested for association (often several millions), the overall significance threshold, such as α = 0.05, is divided by the number of independent SNPs to define an adjusted significance threshold. For GWAS on Europeans, for example, the significance threshold becomes 0.05/1,000,000 = 5×10⁻⁸.
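The Bonferroni rule and its effective-number variant fit in a few lines. This is a minimal Python sketch, not the multtest implementation used in the study; the function name and the `n_tests` parameter are hypothetical, and the effective number of tests (here the 1,000,000 independent loci assumed for European-ancestry GWAS) has to be supplied from elsewhere.

```python
from typing import List, Optional, Sequence


def bonferroni(pvalues: Sequence[float], alpha: float = 0.05,
               n_tests: Optional[int] = None) -> List[bool]:
    """Bonferroni correction: reject H_i when p_i <= alpha / n_tests.

    n_tests defaults to the number of p-values supplied. Passing a smaller
    'effective number of independent tests' gives an adjusted Bonferroni
    threshold (that effective number must be estimated elsewhere).
    """
    n = len(pvalues) if n_tests is None else n_tests
    threshold = alpha / n
    return [p <= threshold for p in pvalues]


if __name__ == "__main__":
    # Four made-up replication p-values: threshold 0.05 / 4 = 0.0125
    print(bonferroni([0.001, 0.012, 0.02, 0.4]))   # [True, True, False, False]
    # GWAS-style threshold with 1,000,000 assumed independent loci (about 5e-08)
    print(0.05 / 1_000_000)
```

Keeping the effective number as a separate argument mirrors the text: the rule itself does not change, only the divisor does.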
Similarly, the number of independent tests in metabolomics can be estimated with the help of principal component analysis to reduce the number of tests used in the Bonferroni correction (e.g. Grams et al. [13]). The other three FWER control methods considered below are sequential methods, for which the p-values need to be ranked in increasing order: p(1) ≤ … ≤ p(m). Holm’s step-down procedure [10] rejects at least as many hypotheses as the Bonferroni correction does. Holm’s gain in power, that is, the additional number of features declared significant, grows with the number of alternative hypotheses. Like the Bonferroni correction, Holm’s procedure has no restrictions regarding the dependency structure of the p-values. Hochberg’s step-up procedure [14] and Hommel’s procedure [15] rely on the assumption that the p-values under the true null hypotheses are positively regression dependent. A positive dependency structure assumes that the probability of a p-value belonging to a true null hypothesis increases with increasing p-value. Under a positive dependency structure, Hochberg’s procedure is more powerful than Holm’s [4]. Hommel’s procedure, in turn, is the most powerful of the FWER control procedures mentioned above when this assumption holds, since it rejects at least as many hypotheses as Hochberg’s does. One criticism of the method is its higher computational load.
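The three sequential FWER procedures can be sketched as follows. This is an illustrative Python version written from the textbook definitions (Holm: step-down with thresholds α/(m − k + 1); Hochberg: step-up with the same thresholds; Hommel: the Simes-based closed-testing rule in its usual shortcut form), not the code of the R packages used in the study. Function names are hypothetical, and ties are broken by input order.

```python
from typing import List, Optional, Sequence


def _ascending_order(pvalues: Sequence[float]) -> List[int]:
    """Indices of the p-values sorted in increasing order."""
    return sorted(range(len(pvalues)), key=lambda i: pvalues[i])


def holm(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Holm step-down: reject H_(k) while p_(k) <= alpha / (m - k + 1)."""
    m = len(pvalues)
    reject = [False] * m
    for rank, idx in enumerate(_ascending_order(pvalues)):   # rank = k - 1
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break                                             # stop at first failure
    return reject


def hochberg(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Hochberg step-up: find the largest k with p_(k) <= alpha / (m - k + 1)
    and reject H_(1), ..., H_(k)."""
    m = len(pvalues)
    order = _ascending_order(pvalues)
    reject = [False] * m
    for rank in range(m - 1, -1, -1):                         # largest p first
        if pvalues[order[rank]] <= alpha / (m - rank):
            for idx in order[: rank + 1]:
                reject[idx] = True
            break
    return reject


def hommel(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Hommel: let j be the largest i in 1..m with p_(m-i+k) > k * alpha / i
    for all k = 1..i. Reject H_(i) with p_(i) <= alpha / j; if no such j
    exists, reject all hypotheses."""
    m = len(pvalues)
    order = _ascending_order(pvalues)
    p = [pvalues[i] for i in order]                           # p[0] <= ... <= p[m-1]
    j: Optional[int] = None
    for i in range(1, m + 1):
        if all(p[m - i + k - 1] > k * alpha / i for k in range(1, i + 1)):
            j = i                                             # keep the largest such i
    reject = [False] * m
    for rank, idx in enumerate(order):
        if j is None or p[rank] <= alpha / j:
            reject[idx] = True
    return reject


if __name__ == "__main__":
    example = [0.019, 0.03, 0.06]                             # made-up p-values
    print("Holm:    ", holm(example))      # [False, False, False]
    print("Hochberg:", hochberg(example))  # [False, False, False]
    print("Hommel:  ", hommel(example))    # [True, False, False]
```

For these made-up p-values, Holm and Hochberg reject nothing while Hommel rejects the smallest one, which illustrates that Hommel can reject more hypotheses than Hochberg. The quadratic loop in `hommel` is a deliberate simplification; it is fast enough for the small numbers of hypotheses discussed here.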

Methods controlling the false discovery rate (FDR)

In contrast to the FWER, the false discovery rate (FDR) represents the expected proportion of false positives among all rejected hypotheses. It is defined as $\mathrm{FDR} = E\left[\frac{V}{\max(R, 1)}\right]$, i.e. the expected value of V/R, with V/R set to 0 when R = 0. The FDR can be controlled at a fixed significance threshold as well. Furthermore, Benjamini and Hochberg [9] proved that every FWER control method also controls the FDR. The three most common FDR control methods, which also require ordered p-values, are considered below (Table 2b): Benjamini-Hochberg’s linear step-up procedure [9] controls the FDR at level α assuming positively dependent p-values under the true null hypotheses (see description above), as do Hommel’s and Hochberg’s FWER procedures. It shows greater power than any of the FWER methods mentioned above. The two-stage linear step-up procedure [16] is an adaptation of Benjamini-Hochberg’s procedure that takes the estimated proportion of true null hypotheses, π0, into account. The gain in power of the two-stage procedure compared with the classical Benjamini-Hochberg linear step-up procedure depends on π0 [4]. For π0 close to 1, the adapted version has low power. The adaptive approach has been proven to control the FDR for independent p-values only. Finally, Benjamini-Yekutieli’s procedure [17] has no restrictions on the dependency structure of the p-values at all. It is more conservative than Benjamini-Hochberg’s linear step-up procedure [4] and the two-stage linear step-up procedure [16].
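The three FDR control procedures can be sketched in the same style. This minimal Python version is written from the published algorithms, not from the multtest or mutoss code. The two-stage procedure follows the form of Benjamini, Krieger and Yekutieli (a first linear step-up at α/(1 + α), then a second one at α/(1 + α) · m/m̂0 with m̂0 = m − r1), which we assume corresponds to reference [16]. Function names are hypothetical.

```python
from typing import List, Sequence


def bh(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Hochberg linear step-up: find the largest k with
    p_(k) <= k * alpha / m and reject H_(1), ..., H_(k)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for k in range(1, m + 1):
        if pvalues[order[k - 1]] <= k * alpha / m:
            k_max = k                       # step-up: keep the largest such k
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def by(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Benjamini-Yekutieli: BH run at alpha / c(m) with c(m) = sum_{i=1}^m 1/i,
    valid under arbitrary dependence."""
    m = len(pvalues)
    c_m = sum(1.0 / i for i in range(1, m + 1))
    return bh(pvalues, alpha / c_m)


def two_stage_bh(pvalues: Sequence[float], alpha: float = 0.05) -> List[bool]:
    """Two-stage linear step-up (assumed Benjamini-Krieger-Yekutieli form)."""
    m = len(pvalues)
    q1 = alpha / (1.0 + alpha)
    stage1 = bh(pvalues, q1)                # stage 1: BH at the reduced level
    r1 = sum(stage1)
    if r1 == 0 or r1 == m:
        return stage1                       # reject nothing / reject everything
    m0_hat = m - r1                         # estimated number of true nulls
    return bh(pvalues, q1 * m / m0_hat)     # stage 2: BH at the adapted level


if __name__ == "__main__":
    example = [0.019, 0.03, 0.06]           # made-up p-values
    print("BY:  ", by(example))             # [False, False, False]
    print("BH:  ", bh(example))             # [True, True, False]
    print("TSBH:", two_stage_bh(example))   # [True, True, True]
```

For these made-up p-values, Benjamini-Yekutieli rejects nothing, Benjamini-Hochberg rejects the two smallest and the two-stage procedure rejects all three, because its first stage estimates that only one of the three hypotheses is a true null.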

Methods estimating the false discovery rate (FDR)

Recent approaches do not control the FDR in the traditional sense but rather estimate the proportion of false discoveries. To estimate the FDR, they first estimate the proportion of true null hypotheses, π0, which can lead to a gain in power compared with the classic FWER and FDR control methods. Two common FDR estimation methods are described in the following. Storey’s q-value method [7] uses a Bayesian approach to estimate the so-called positive false discovery rate (pFDR), a modified definition of the false discovery rate that conditions on at least one rejection: $\mathrm{pFDR} = E\left[\frac{V}{R} \,\middle|\, R > 0\right]$. The approach is based on estimating the pFDR for a particular rejection region, γ, to achieve control of the pFDR. To determine a rejection region, the q-value was introduced as the pFDR analogue of the p-value. The q-value provides an error measure for each observed p-value: it denotes the smallest pFDR that can occur when calling that particular p-value significant, $q(p_i) = \inf_{\gamma \ge p_i} \mathrm{pFDR}(\gamma)$. The approach assumes independent or “weakly dependent” p-values, whose dependency effect becomes negligible for a large number of p-values [18]. Owing to its estimation of π0, the method improves power compared with the classic Benjamini-Hochberg linear step-up procedure [7]. Likewise, Strimmer [19] proposed an alternative method to estimate q-values based on the pFDR. In addition, this method provides estimates of the so-called local false discovery rate, $\mathrm{LFDR}(p) = \pi_0 f_0(p) / f(p)$, where f denotes the density of all p-values and f0 their density under the null hypothesis. Like the q-values, the LFDR values are individual significance measures for each p-value: the LFDR describes the probability that a hypothesis is truly null given its observed p-value, i.e. that declaring it significant would be a false positive decision. The estimates are based on a Bayesian approach using a modified Grenander density estimator [19].
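Storey’s estimator can be illustrated with a simplified sketch. The code below uses a single fixed tuning parameter λ = 0.5 to estimate π0 as #{p_i > λ}/(m(1 − λ)), whereas the qvalue package by default smooths estimates over a grid of λ values, so results will differ from the package. The q-values follow q_(i) = min over j ≥ i of π̂0 · m · p_(j)/j. Strimmer’s Grenander-based LFDR is not reproduced here, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def storey_pi0(pvalues: Sequence[float], lam: float = 0.5) -> float:
    """Fixed-lambda estimate pi0_hat = #{p_i > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))


def storey_qvalues(pvalues: Sequence[float],
                   lam: float = 0.5) -> Tuple[float, List[float]]:
    """Return (pi0_hat, q-values in input order).

    q_(i) = min_{j >= i} pi0_hat * m * p_(j) / j, computed from the largest
    p-value downwards; the running minimum starts at 1, which caps q at 1.
    """
    m = len(pvalues)
    pi0 = storey_pi0(pvalues, lam)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                 # 1-based rank m, m-1, ..., 1
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvalues[idx] / rank)
        q[idx] = running_min
    return pi0, q


if __name__ == "__main__":
    p = [0.01, 0.02, 0.03, 0.04, 0.6, 0.7, 0.8, 0.9]   # made-up p-values
    pi0, q = storey_qvalues(p)
    print(pi0, [round(x, 3) for x in q])
```

With so few p-values, a single p-value crossing λ changes π̂0 by 1/(m(1 − λ)), which is one way to see why the estimate becomes unstable in low-dimensional settings.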

Software implementation

R packages are available for all described control methods via CRAN [20] or Bioconductor [21]. Specifically, we used the packages multtest [22], qvalue [23] (Bioconductor), mutoss [24] and fdrtool [25] (CRAN), applying the methods with the packages’ default options. However, the implementation of Storey’s q-value method returned an error whenever the estimated proportion of true null hypotheses (π0) was close to zero, which occurred when all p-values happened to be (very) small. In these cases, we adjusted the range of the tuning parameter λ (“lambda”), which determines which p-values are used to estimate π0, in a stepwise manner until π0 could be estimated. Further details on our R code and the stepwise algorithm can be obtained directly from the authors. For all methods, whether FWER control, FDR control or FDR estimation such as the q-value methods or LFDR, statistical significance was defined by a cutoff of 0.05.
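The failure mode is easy to reproduce with the simple fixed-λ estimator: if no p-value exceeds λ, the estimate #{p_i > λ}/(m(1 − λ)) is exactly zero. The sketch below is hypothetical and is not the authors’ stepwise algorithm, which is only available on request; it merely shows one way a stepwise reduction of λ can yield a positive estimate. The λ sequence and the function names are arbitrary assumptions.

```python
from typing import Sequence, Tuple


def storey_pi0(pvalues: Sequence[float], lam: float) -> float:
    """Fixed-lambda estimate #{p_i > lam} / (m * (1 - lam)), capped at 1."""
    m = len(pvalues)
    return min(1.0, sum(p > lam for p in pvalues) / (m * (1.0 - lam)))


def pi0_with_lambda_fallback(
    pvalues: Sequence[float],
    lambdas: Sequence[float] = (0.5, 0.4, 0.3, 0.2, 0.1, 0.05),  # arbitrary grid
) -> Tuple[float, float]:
    """HYPOTHETICAL fallback: lower lambda step by step until pi0_hat > 0.
    Not the authors' algorithm; shown only to illustrate the idea."""
    for lam in lambdas:
        pi0 = storey_pi0(pvalues, lam)
        if pi0 > 0.0:
            return lam, pi0
    raise ValueError("no p-value exceeds the smallest lambda; pi0_hat stays 0")


if __name__ == "__main__":
    small = [0.001, 0.002, 0.01, 0.2]          # made-up, mostly tiny p-values
    print(storey_pi0(small, 0.5))              # 0.0: no p-value above 0.5
    print(pi0_with_lambda_fallback(small))     # first lambda giving pi0_hat > 0
```

Note that a positive estimate obtained this way rests on very few p-values, so it inherits the instability discussed in the results.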

Data example

For illustration of the different control methods, we obtained data from the CKDGen Consortium [8]. The aim of this project was to identify genetic variants associated with estimated glomerular filtration rate (eGFR), a measure of kidney function, and with chronic kidney disease (CKD). Altogether, 48 study groups provided genome-wide summary statistics (GWAS results) from 50 study populations for SNP associations with eGFR based on serum creatinine (eGFRcrea); 2 study groups provided GWAS results for 2 subpopulations separately. The discovery meta-analysis of all GWAS was carried out using an inverse variance-weighted fixed effect model and incorporated data from 133,413 individuals of European ancestry. Genomic control had been applied both before and after the meta-analysis to correct for inflation of the test statistics and thus limit the possibility of false positive results. The meta-analysis detected 29 previously identified loci and 34 independent novel loci (p-value < 10⁻⁶). The novel loci were then verified in an independent replication set (14 studies; N = 42,166). For 16 of the 34 novel loci, the replication analysis showed direction-consistent results with a p-value combining discovery and replication < 5×10⁻⁸ (see Table 1 in Pattaro et al. [8]). For all but 1 SNP (rs6795744), the reported q-values in the replication study were < 0.05. The results of the discovery meta-analyses for different traits including eGFRcrea (approximately 2.2 million SNPs) are publicly available [26]. In addition, for our project we obtained the study-specific GWAS summary statistics for eGFRcrea of all studies contributing to the discovery analysis (48 studies, 50 result files). To illustrate the different control methods in both a discovery (high-dimensional) setting and a replication (low-dimensional) setting, we split the 50 study contributions into two sets, taking into account general study characteristics (population-based study versus diseased cohort) and imputation reference (HapMap versus 1000 Genomes [27]).
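For readers less familiar with the meta-analysis model: the inverse variance-weighted fixed-effect estimate weights each study’s effect estimate by w_i = 1/SE_i², so that the pooled estimate is Σ w_i β_i / Σ w_i with standard error (Σ w_i)^(−1/2). A minimal Python sketch for a single SNP follows. It is not the software used by the CKDGen Consortium, and the inputs in the usage example are made-up numbers.

```python
import math
from typing import Sequence, Tuple


def ivw_fixed_effect(betas: Sequence[float],
                     ses: Sequence[float]) -> Tuple[float, float, float]:
    """Pool per-study effect estimates for one SNP with inverse-variance weights.

    Returns (pooled beta, its standard error, two-sided p-value)."""
    weights = [1.0 / se ** 2 for se in ses]                  # w_i = 1 / SE_i^2
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)                               # SE = (sum w_i)^(-1/2)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))                    # = 2 * (1 - Phi(|z|))
    return beta, se, p


if __name__ == "__main__":
    # Three hypothetical studies: per-study effect estimates and standard errors
    print(ivw_fixed_effect([0.12, 0.08, 0.10], [0.05, 0.04, 0.06]))
```

Computing the p-value with `math.erfc` rather than `1 - cdf` keeps very small p-values (such as genome-wide significant ones) from being rounded to zero.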
Study contributions were randomly assigned to the discovery or replication set, subject to two conditions: each set had to contain at least one study from each of the 4 categories, and the sample size ratio had to be approximately 2:1. The final discovery set contained 35 studies with 90,565 individuals (67.9%) and the replication set 15 studies with 42,848 individuals (32.1%). Based on the same set of SNPs as in the publicly available data set, our discovery set was processed similarly to the original analysis [8], using an inverse variance-weighted fixed effect model with genomic control before and after that step. For simplicity, we considered two-sided p-values in the discovery and replication set analyses. To select independently associated SNPs, SNPs were clustered by LD pruning using the --clump command of PLINK v1.90b2 (r²: 0.2, window: 1000 kb, significance threshold for index SNP: 10⁻⁶) [28], with data of the 1000 Genomes project (phase 3) as the LD reference. SNPs with the lowest p-value within a specific region were considered index SNPs. A few SNPs that were either absent from the reference or tri-allelic were excluded at this point. The various FDR and FWER methods were then applied to the prepared discovery data in an exploratory manner. Similar to the published analysis by the CKDGen Consortium (Pattaro et al. [8]), independent index SNPs with p-value < 10⁻⁶ were selected from the discovery set to be followed up in the replication set. The various control methods were subsequently applied to the results of the meta-analysis in the replication set (same model as before but without genomic control) to identify significant findings.
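Genomic control rescales test statistics by the inflation factor λ_GC, the median of the observed 1-df chi-square statistics divided by its expected median under the null (about 0.455). A minimal sketch, assuming z-statistics as input; leaving the statistics unchanged when λ_GC < 1 is a common convention that we assume here, not a detail stated in the text.

```python
import math
from statistics import NormalDist, median
from typing import List, Sequence, Tuple

# Median of a chi-square distribution with 1 df, (Phi^-1(0.75))^2, about 0.455
CHI2_1_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_control(zs: Sequence[float]) -> Tuple[float, List[float]]:
    """Estimate lambda_GC from z-statistics and return (lambda_GC, corrected z)."""
    lam = median(z * z for z in zs) / CHI2_1_MEDIAN
    lam = max(lam, 1.0)          # assumed convention: never inflate statistics
    corrected = [z / math.sqrt(lam) for z in zs]   # chi2 / lambda  <=>  z / sqrt(lambda)
    return lam, corrected


def two_sided_p(z: float) -> float:
    """Two-sided normal p-value, 2 * (1 - Phi(|z|))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    zs = [-2.0, -1.0, 0.0, 1.0, 2.0]                 # made-up z-statistics
    lam, corrected = genomic_control(zs)
    print(round(lam, 3), [round(two_sided_p(z), 4) for z in corrected])
```

After correction, the median squared statistic equals the null median by construction, which is exactly what genomic control aims for.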

Simulation study

To assess the power and specificity of the described FWER and FDR methods in detail, we conducted a simulation study with varying settings, with special emphasis on situations with a small number of tested features. For this purpose, test statistics were simulated for varying numbers of features (N = 4, 8, 16, 32, 64, 1000) to generate data sets. Test statistics for single features were drawn from a normal distribution N(β, 1) with either β = 0 (null hypothesis) or β ∈ {1.0, 2.5} (alternative or non-null hypothesis). For each number of features, the proportion of true null hypotheses was defined a priori as π0 ∈ {25%, 50%, 75%, 100%}. Each scenario, defined by a combination of these parameters, was repeated 100 times. Before the control methods were applied, the simulated test statistics were transformed into two-sided p-values. The power of each approach was defined as the proportion of correctly rejected hypotheses among all true alternative hypotheses, and the specificity as the proportion of correctly retained hypotheses among all true null hypotheses. Furthermore, we evaluated the estimates of the proportion of true null hypotheses obtained with Storey’s and Strimmer’s q-value methods. The R code of the simulation study can be requested from the authors.
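The simulation design can be sketched compactly. This Python version is a hypothetical reconstruction from the description above, not the authors’ R code (which is available on request). It assumes that test statistics are drawn from a normal distribution with mean β and variance 1, converts them to two-sided p-values, and averages power and specificity over 100 repetitions. Only Bonferroni and Benjamini-Hochberg are included to keep the block short; the seed, helper names and averaging details are assumptions.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Method = Callable[[Sequence[float]], List[bool]]


def simulate(n: int, pi0: float, beta: float,
             rng: random.Random) -> Tuple[List[float], List[bool]]:
    """Assumed design: null statistics ~ N(0, 1), alternatives ~ N(beta, 1).
    Returns two-sided p-values and truth flags (True = alternative)."""
    n_null = round(n * pi0)
    truth = [False] * n_null + [True] * (n - n_null)
    z = [rng.gauss(beta if alt else 0.0, 1.0) for alt in truth]
    p = [math.erfc(abs(x) / math.sqrt(2.0)) for x in z]
    return p, truth


def bonferroni(p: Sequence[float], alpha: float = 0.05) -> List[bool]:
    return [x <= alpha / len(p) for x in p]


def bh(p: Sequence[float], alpha: float = 0.05) -> List[bool]:
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    k_max = max((k for k in range(1, m + 1) if p[order[k - 1]] <= k * alpha / m),
                default=0)
    reject = [False] * m
    for idx in order[:k_max]:
        reject[idx] = True
    return reject


def power_specificity(reject: Sequence[bool],
                      truth: Sequence[bool]) -> Tuple[float, float]:
    """Power = S / m1 (correct rejections), specificity = U / m0 (correct non-rejections)."""
    m1 = sum(truth)
    m0 = len(truth) - m1
    s = sum(r and a for r, a in zip(reject, truth))
    u = sum((not r) and (not a) for r, a in zip(reject, truth))
    power = s / m1 if m1 else float("nan")
    specificity = u / m0 if m0 else float("nan")
    return power, specificity


def run_scenario(n: int, pi0: float, beta: float, method: Method,
                 reps: int = 100, seed: int = 1) -> Tuple[float, float]:
    """Average power and specificity over `reps` simulated data sets."""
    rng = random.Random(seed)
    results = [power_specificity(method(p), truth)
               for p, truth in (simulate(n, pi0, beta, rng) for _ in range(reps))]
    mean_power = sum(r[0] for r in results) / reps
    mean_spec = sum(r[1] for r in results) / reps
    return mean_power, mean_spec


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64, 1000):
        for name, method in (("Bonferroni", bonferroni), ("BH", bh)):
            power, spec = run_scenario(n, pi0=0.5, beta=2.5, method=method)
            print(f"N={n:>4} {name:<10} power={power:.2f} specificity={spec:.2f}")
```

Reusing the same seed for every method means all methods see identical simulated p-values, so differences in power and specificity reflect the methods rather than sampling noise. With π0 = 100%, power is undefined and is returned as NaN.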

Results

For the purpose of illustration, the 50 GWAS summary statistics provided by the contributing study groups of the original CKDGen discovery meta-analysis of eGFRcrea were split into 2 sets resembling a high-dimensional discovery set (35 studies, 90,565 individuals) and a low-dimensional replication set (15 studies, 42,848 individuals). Details on the two sets are provided in Additional file 1 and Additional file 2. Similar to the published analysis by the CKDGen Consortium (Pattaro et al. [8]), the discovery set was processed to select independent variants to be moved forward to a low-dimensional replication analysis. Based on a p-value threshold of 10⁻⁶ followed by LD pruning, 57 index SNPs from different genomic regions were selected from the discovery set. The replication analysis of the 57 selected index SNPs showed direction-consistent effect estimates for 56 SNPs. Subsequently, the various control methods were applied to the meta-analysis results of the replication set to identify significant findings. Figure 1 presents the number of significant results of the different control procedures. Since the FWER methods of Holm, Hochberg and Hommel declared the same p-values significant, we display only Hommel’s approach.

Fig. 1

CKDGen data example – Number of significant p-values (regions) in replication set. Applied procedures controlling the type I error: Bonferroni correction (BO), Hommel’s procedure (HO), Benjamini-Yekutieli’s procedure (BY), Strimmer’s LFDR method (LFDR), Benjamini-Hochberg’s procedure (BH), Two-stage procedure (TSBH), Strimmer’s q-value method (qv Str), Storey’s q-value method (qv Sto). Results are ordered by number of significant p-values leading to a separation of FDR methods from FWER methods (indicated by dashed line). Additional significant p-values from one approach to another are indicated by decreasing gray shades within the bars.

In contrast to FDR methods, FWER methods rejected the smallest number of hypotheses, with Bonferroni being the least powerful. Among the FDR methods, the FDR estimation methods by Strimmer and Storey provided more power. Storey’s q-value method rejected all hypotheses and was the only approach that declared the direction-inconsistent SNP significant. As expected, the applied FWER and FDR methods showed a monotone subset behavior with respect to rejected hypotheses, i.e. the p-values declared significant by a more conservative approach were always included in the set of p-values declared significant by a less conservative method. This is a consequence of the methods’ property that, if a specific p-value is declared significant, all smaller p-values are also declared significant.

Power and specificity of control methods

In a setting where the proportion of true null hypotheses, π0, is 100%, Storey’s and Strimmer’s q-value methods most often falsely rejected true null hypotheses when the number of tested hypotheses N was small (≤ 32), whereas for larger numbers of tested hypotheses and/or other methods, the number of repetitions with at least one false positive decision mostly did not exceed 5 of 100 (Fig. 2a). Benjamini-Yekutieli’s procedure and Strimmer’s LFDR approach performed best, with only 0 to 3 repetitions containing a false rejection for all N. Of note, Strimmer’s LFDR approach could not provide any results for N = 4. The average specificity, i.e. the proportion of correctly retained hypotheses, was similarly high for all methods; only Storey’s q-value method showed decreased specificity when the number of tested hypotheses was small.

Fig. 2

Simulation – Number of repetitions with at least 1 false positive decision and average specificity for π0 = 100% (a). Average power and specificity for β1 = 2.5 and π0 = 75% (b), 50% (c), 25% (d). Applied procedures controlling the type I error: Bonferroni correction, Hommel’s procedure, Benjamini-Hochberg’s procedure, Two-stage procedure, Benjamini-Yekutieli’s procedure, Storey’s q-value method, Strimmer’s q-value method, Strimmer’s LFDR method. Power is defined as the proportion of correctly rejected hypotheses and specificity as the proportion of correctly maintained hypotheses. Both proportions potentially range from 0 to 1. Simulations for each scenario were repeated 100 times.

When the proportion of true null hypotheses was < 100%, the power to correctly reject hypotheses depended on π0, the effect size (β) and N. On average, it increased with decreasing π0, increasing β and, overall, decreasing N. Figure 2b, c and d show, as examples, the average power for varying π0 and β1 = 2.5 under the alternative hypothesis as a function of N. Corresponding figures for an effect size of β1 = 1 are provided in Additional file 3. As expected, FDR methods, especially the two q-value methods, were more powerful than FWER methods. In terms of specificity, Storey’s q-value method, followed by Strimmer’s q-value method, showed lower specificity than the other methods for small N (≤ 16). Specificities were similar among the other methods. Again, Strimmer’s LFDR approach did not provide results when the number of hypotheses was < 8 (Fig. 2b) or < 16 (Fig. 2c and d).

Estimation of proportion of true null hypotheses

LFDR and q-value methods rely on the estimation of π0. Figure 3 displays the estimates of π0 obtained with Storey’s and Strimmer’s q-value approaches for varying π0 and β1 = 2.5 under the alternative hypotheses (if present); the remaining figures are provided in Additional file 4.

Fig. 3

Simulation – Observed estimations of π0 for Storey’s (qv) and Strimmer’s q-value methods (fdr) for π0 = 100% (a) and for β1 = 2.5 and π0 = 75% (b), 50% (c), 25% (d)

For small N, both estimates showed large variability across repetitions. Throughout all scenarios, Storey’s method showed a wider range of π0 estimates than Strimmer’s q-value approach. Moreover, the estimation of π0 was often biased. Bias essentially disappeared only when β1 = 2.5 and N was larger than 32. When β1 = 1, however, π0 was overestimated on average, even for larger N.

Discussion

FDR estimation methods such as Strimmer’s LFDR or Storey’s q-value method were mainly developed for high-dimensional settings such as discovery GWAS. They provide a less conservative approach than standard FWER and FDR control methods. The LFDR and the q-value methods are Bayesian approaches that use the information from all tested hypotheses when estimating the proportion of true null hypotheses, π0. Consequently, a high-dimensional setting is a great advantage for FDR estimation, as it allows a reasonable estimation of π0. Although controversial, the q-value methods as well as other FDR methods have also been used in low-dimensional settings, such as in the analysis of replication data sets consisting of only a limited number of SNPs. We therefore aimed to compare various FWER and FDR methods, including the q-value method, with respect to their power and specificity in low-dimensional settings, using simulated data and an application to real data. The analysis of our example data from the CKDGen Consortium [8] showed that the FDR estimation methods by Strimmer and Storey declared the largest number of SNPs significant in the low-dimensional replication analysis of 57 SNPs, followed by the FDR control methods of Benjamini-Hochberg and Benjamini-Yekutieli. As expected, the FWER control methods showed the lowest power, declaring the fewest p-values significant. Of note, Storey’s q-value method was the only approach that declared significant the single SNP (rs10201691) with direction-inconsistent results between the discovery and replication analyses. To deepen our understanding, we conducted a simulation study to systematically assess different scenarios. The simulation confirmed the differences between the methods seen in the application. For example, Storey’s q-value method showed the highest power, especially for a small number of hypotheses.
At the same time, however, the specificity of Storey’s method was lowest when the number of tested hypotheses was small. In the presence of alternative hypotheses (π0 < 100%), we also observed that the FDR methods of Benjamini-Hochberg and the two-stage approach, although less powerful than both q-value methods, were more powerful than the FWER control methods of Bonferroni and Hommel, with similar specificity. Since both q-value methods as well as LFDR rely on the estimation of π0, we also investigated the estimation accuracy of the different approaches. For both methods, the estimate of π0 was often biased, especially when the number of tested hypotheses was small. In addition, Storey’s q-value method showed much higher variance than Strimmer’s approach. In summary, the q-value methods generally rejected the largest number of hypotheses, which is especially advantageous if researchers wish to obtain a larger pool of significant features to be followed up in subsequent studies, at the expense of specificity. However, their application should be restricted to high-dimensional settings. The gain in power of both q-value methods was not observed for LFDR in the simulation study. Strimmer also reported a gain in power of the q-value method compared with the LFDR and explained it by the tendency of q-values to be smaller than or equal to the LFDR for a given set of p-values [19]. In the context of gene expression, Lai [29] noted a tendency of the q-value to underestimate the true FDR, leading to a larger number of low q-values, especially when the proportion of differentially expressed genes is small or the overall differential expression signal is weak. We also observed such an underestimation in our simulation study, especially for a smaller number of p-values. To overcome this issue, Lai [29] suggested a conservative adjustment of the estimate of the proportion of true null hypotheses, of the p-values or of the number of identified genes.
Moreover, when applying q-value methods or LFDR, the estimates need to be interpreted correctly, and this interpretation differs between q-values and LFDR. Strimmer [19] highlighted the easier interpretation of the LFDR compared with the q-value, since the LFDR provides point estimates of the proportion of false discoveries for individual hypotheses, whereas the q-value of a p-value is the expected proportion of false positives when calling that feature significant [18]. In any case, FDR estimation methods critically require a sizeable data set [18, 19]. Storey and Tibshirani [18] described their q-value method as a more exploratory tool than FWER methods and therefore as a well-performing procedure for high-dimensional data. A more recent FDR estimation approach by Stephens [30] provides an alternative to the LFDR, the so-called local false sign rate. This empirical Bayes approach describes the probability of making an error in the sign of an effect if one is forced to declare that effect either positive or negative. Simulation studies showed smaller and more accurate estimates of π0 with Stephens’ approach than with Storey’s q-value method, leading to more significant discoveries [30]. However, small sample sizes represent a challenge for this FDR estimation approach as well. Another observation of our simulation study worth mentioning was that the Benjamini-Yekutieli FDR method for arbitrary dependencies, which is expected to be more conservative than the Benjamini-Hochberg method, was not only outperformed by Benjamini-Hochberg in terms of power in our application data and simulation, but was also less powerful than the FWER control methods in some scenarios of our simulation. The latter has been observed before, especially when the expected number of alternative hypotheses is very small [4].
Since Benjamini-Hochberg’s approach controls the FDR at level π0α, adaptive FDR control methods such as the two-stage approach were developed to control the FDR directly at level α by taking the estimated π0 into account, thereby gaining power. Especially if π0 is substantially smaller than 1, the adaptive approaches might outperform Benjamini-Hochberg’s procedure [4]. Before concluding the discussion of the results, some limitations of this study warrant mention. Although it was important to us to illustrate the effect of the different control methods on the results in real data, the observed differences may not transfer to every other study setting. To overcome this limitation, we conducted a simulation study. Still, the simulation study has limitations of its own: we used a simplified approach, simulating test statistics rather than individual-level data sets that would have been analyzed before applying the control methods. Furthermore, we explored a limited set of scenarios and did not consider dependency structures; instead, we evaluated p-values derived from independently simulated test statistics. Hence, additional work could add to the current understanding. Given the variety of control methods, the decision on which method to apply in a given setting should be made before the analysis is conducted and on reasonable grounds. Aspects to consider include, among others: (a) the number of tests to be conducted, (b) the general aim of testing, (c) what is known or can be assumed about the dependency structure of p-values under the true null hypotheses and (d) the assumed proportion of true null hypotheses. If the general aim of the analysis is the specific testing of individual hypotheses, FWER control methods should be preferred to FDR control or estimation methods because they provide higher specificity by correctly retaining true null hypotheses.
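The level π0α can be made explicit. For independent, continuous null p-values, a standard result (stated here without proof) is that running Benjamini-Hochberg at level α yields an FDR of exactly (m0/m)α. Running it instead at the inflated level α/π̂0, which is the idea behind adaptive procedures (the two-stage procedure uses a closely related level), brings the FDR back to approximately α when π̂0 is close to π0:

```latex
\mathrm{FDR}_{\mathrm{BH}}(\alpha) = \frac{m_0}{m}\,\alpha = \pi_0\,\alpha \le \alpha,
\qquad
\mathrm{FDR}_{\mathrm{BH}}\!\left(\frac{\alpha}{\hat{\pi}_0}\right) \approx \frac{\pi_0}{\hat{\pi}_0}\,\alpha \approx \alpha
\quad \text{if } \hat{\pi}_0 \approx \pi_0 .
```

For example, with π0 = 0.5 and α = 0.05, Benjamini-Hochberg controls the FDR at 0.025, leaving half of the nominal error budget unused. The same relation also shows the risk in low-dimensional settings: an underestimated π̂0 inflates the effective level above α.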
Among FWER control methods, power may differ slightly and depends in particular on the dependence structure of the p-values. If positive dependence can be assumed, Hochberg’s or Hommel’s procedures are preferable to gain power. The computational burden of Hommel’s procedure should no longer be a real issue. Goeman and Solari [4] expected Hochberg’s and Hommel’s methods to gain power over Bonferroni’s and Holm’s methods especially when the proportion of alternative hypotheses is rather large. In our simulation study, however, we observed only a small gain in power, which might be due to the simulation of independent test statistics. If researchers instead wish to identify a promising set of hypotheses for follow-up rather than test single hypotheses with high specificity, we agree with Goeman and Solari [4], who recommended the use of FDR control methods. To achieve the highest power, one may even apply the q-value FDR estimation method, provided the number of tests is reasonably large.
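The FWER procedures compared above can likewise be sketched as adjusted p-values, mirroring the formulas that R’s `p.adjust` uses for its "bonferroni", "holm" and "hochberg" methods. Hommel’s procedure is omitted here because its adjustment is considerably longer, and Hochberg’s step-up is only guaranteed to control the FWER under independence or suitable positive dependence. The function names and example p-values are hypothetical.

```python
import numpy as np


def bonferroni_adjust(pvals):
    """Multiply every p-value by the number of tests m (capped at 1)."""
    p = np.asarray(pvals, dtype=float)
    return np.minimum(p * p.size, 1.0)


def holm_adjust(pvals):
    """Holm step-down: (m - i + 1) * p_(i), then a running maximum upward."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj = (m - np.arange(m)) * p[order]
    adj = np.minimum(np.maximum.accumulate(adj), 1.0)
    out = np.empty(m)
    out[order] = adj
    return out


def hochberg_adjust(pvals):
    """Hochberg step-up: same multipliers as Holm, but a running minimum
    taken from the largest p-value downward."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    adj = (m - np.arange(m)) * p[order]
    adj = np.minimum(np.minimum.accumulate(adj[::-1])[::-1], 1.0)
    out = np.empty(m)
    out[order] = adj
    return out


if __name__ == "__main__":
    pvals = [0.02, 0.03, 0.04]
    methods = (("Bonferroni", bonferroni_adjust), ("Holm", holm_adjust), ("Hochberg", hochberg_adjust))
    for name, adjust in methods:
        adj = adjust(pvals)
        print(name, np.round(adj, 3), "rejected at 0.05:", int(np.sum(adj <= 0.05)))
```

With the three p-values 0.02, 0.03 and 0.04, all of Hochberg’s adjusted p-values equal 0.04, so all three hypotheses are rejected at α = 0.05. Holm’s adjusted p-values are all 0.06 and Bonferroni’s are 0.06, 0.09 and 0.12, so neither method rejects anything. This toy case, in which all p-values are small and similar as one might expect when most hypotheses are false, illustrates the situation in which Goeman and Solari [4] expected the step-up procedures to gain power.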

Conclusions

In summary, our findings highlight the importance of a larger data set for the application of FDR estimation methods in order to ensure reliable estimation of the proportion of true null hypotheses. The choice of control method mainly depends on the specific setting and the aims of an analysis. For example, when high specificity is desired in testing a limited number of hypotheses, as in a replication study, we recommend using FWER methods rather than FDR methods.
CKDGen study contributions assigned to the discovery set for illustration of procedures controlling the type I error.
CKDGen study contributions assigned to the replication set for illustration of procedures controlling the type I error.
Simulation: average power and specificity for β1 = 1 and π0 = 75% (a), 50% (b), 25% (c). Applied procedures controlling the type I error: Bonferroni correction, Hommel’s procedure, Benjamini-Hochberg’s procedure, two-stage procedure, Benjamini-Yekutieli’s procedure, Storey’s q-value method, Strimmer’s q-value method, Strimmer’s LFDR method. Power is defined as the proportion of correctly rejected hypotheses and specificity as the proportion of correctly maintained hypotheses. Both proportions potentially range from 0 to 1. Simulations for each scenario were repeated 100 times.
Simulation: observed estimates of π0 for Storey’s (qv) and Strimmer’s (fdr) q-value methods for β1 = 1 and π0 = 75% (a), 50% (b), 25% (c).

References

Storey JD, Tibshirani R. Statistical significance for genomewide studies. Proc Natl Acad Sci U S A. 2003.

Li J, Ji L. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix. Heredity (Edinb). 2005.

Goeman JJ, Solari A. Multiple hypothesis testing in genomics. Stat Med. 2014.

Stephens M. False discovery rates: a new deal. Biostatistics. 2017.

Grams ME, Tin A, Rebholz CM, Shafi T, Köttgen A, Perrone RD, Sarnak MJ, Inker LA, Levey AS, Coresh J. Metabolomic alterations associated with cause of CKD. Clin J Am Soc Nephrol. 2017.

Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR. A global reference for human genetic variation. Nature. 2015.

Lai Y. A statistical method for the conservative adjustment of false discovery rate (q-value). BMC Bioinformatics. 2017.

Strimmer K. A unified approach to false discovery rate estimation. BMC Bioinformatics. 2008.

Zeng P, Zhao Y, Qian C, Zhang L, Zhang R, Gou J, Liu J, Liu L, Chen F. Statistical analysis for genome-wide association study (review). J Biomed Res. 2014.

Pattaro C, Teumer A, Gorski M, et al. Genetic associations at 53 loci highlight cell types and biological pathways relevant for kidney function. Nat Commun. 2016.
