
An adaptive alpha spending algorithm improves the power of statistical inference in microarray data analysis.

Jacob P L Brand, Lang Chen, Xiangqin Cui, Alfred A Bartolucci, Grier P Page, Kyoungmi Kim, Stephen Barnes, Vinodh Srinivasasainagendra, Mark T Beasley, David B Allison.

Abstract

The adaptive alpha-spending algorithm incorporates additional contextual evidence about differential expression, including correlations among genes, to adjust the initial p-values and yield alpha-spending adjusted p-values. The algorithm is so named because of its similarity to alpha-spending algorithms in the interim analysis of clinical trials, in which stage-specific significance levels are assigned to each stage of the trial. We show that the Bonferroni correction applied to the alpha-spending adjusted p-values approximately controls the family-wise error rate (FWER) under the complete null hypothesis. Using simulations, we also show that the alpha-spending algorithm yields increased power over the unadjusted p-values while controlling the false discovery rate (FDR). The benefits of the alpha-spending algorithm were greater with increasing sample size and increasing correlation among genes. Use of the alpha-spending algorithm will result in microarray experiments that make more efficient use of their data and may help conserve resources.

Year: 2007 PMID: 17597927 PMCID: PMC1896052 DOI: 10.6026/97320630001384

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Microarray technology has become a widely used and effective research tool in modern molecular biology. It can produce a snapshot of the expression levels of thousands of genes simultaneously at a very low cost per data point. However, researchers are often more interested in how biological pathways respond to changes in experimental conditions than in changes in the expression levels of individual genes. The total flux through a pathway can change dramatically through subtle changes in the expression levels of the genes involved in that pathway. [1] Thus, the prevalence of microarray technology in research on complex metabolic disorders makes the problem of identifying genes with subtle differential expression increasingly important. Unfortunately, identifying such genes is challenging because of the huge number of genes involved, the noisiness of the data, and the very small sample sizes (often no more than 5 observed expression levels per gene and/or per treatment group). Most approaches for identifying differentially expressed genes may be of limited power because they neither take into account nor capitalize on dependencies among genes. As an alternative, we propose an adaptive alpha-spending algorithm that explicitly takes into account the dependencies of expression levels among genes by assigning a gene-specific significance level to each gene. The algorithm is so named because of its similarity to alpha-spending algorithms in the interim analysis of clinical trials. [2] Interim analyses are often carried out at multiple times during a clinical trial, for example to check adherence to the protocol or for economic and ethical reasons. Because the same null hypothesis is tested multiple times in interim analysis, not correcting for multiple testing will inflate the type I error rate.
Multiplicity is controlled in the alpha-spending algorithm by assigning a stage-specific significance level to each stage of the clinical trial such that the stage-specific significance levels sum to the overall significance level (see the PDF file linked below).
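The bookkeeping behind stage-specific significance levels can be sketched in a few lines of Python. This is only an illustration of the general idea and not the authors' algorithm: the function name `stage_levels` and the spending fractions are hypothetical, chosen to show that the stage-specific levels add up to the overall level. By the union bound, the probability of a false rejection at any stage is then at most the overall level, whereas testing every stage at the full level can only be bounded by the number of stages times that level.

```python
import math


def stage_levels(alpha, fractions):
    """Split an overall significance level across the stages of a trial.

    `fractions` are hypothetical spending fractions (an assumption of this
    sketch); they must be non-negative and sum to 1 so that the
    stage-specific levels sum to `alpha`.
    """
    if any(f < 0 for f in fractions):
        raise ValueError("fractions must be non-negative")
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("fractions must sum to 1")
    return [alpha * f for f in fractions]


# Overall level 0.05 spread over three looks, spending little alpha early
# and most of it at the final analysis (illustrative values only).
levels = stage_levels(0.05, [0.1, 0.3, 0.6])

# Union bound on the type I error if each of the three looks were instead
# tested at the full 0.05 level.
naive_bound = min(1.0, 3 * 0.05)

print(levels, sum(levels), naive_bound)
```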

Methodology

The gene-specific significance levels are based on a prediction equation similar to the linear regression prediction equation, as given in the PDF file linked below.

Discussion

We have proposed an adaptive alpha-spending algorithm for finding differentially expressed genes in microarray data sets, in which observed dependencies among genes are incorporated by assigning a gene-specific significance level to each gene. We think this procedure may increase the power to find differentially expressed genes. Our simulation study confirms that the alpha-spending algorithm controls the PCER and FDR in many practical situations. Under the complete null, the PCER was controlled for all genes overall as well as for the group of uncorrelated genes. For the group of correlated genes, the PCER tended to be inflated (Table 1). Under the partial null, the PCER was controlled in all simulation parameter settings and the FDR was controlled in most of them (Figure 1). The observed PCER decreased with increasing group size and correlation, but this relationship was not seen in the observed FDR.

On average, the alpha-spending algorithm improved the power, and this improvement grew with increasing group size or increasing correlation. The power improvement can be up to 47% for ρ = 0.7 and n = 6 (Figure 2). However, the power improvement varied substantially across individual simulated data sets. For lower values of ρ and n, power decreased for some simulated data sets, and this decrease was up to 15% for ρ = 0.3 and n = 4. For n ≥ 6 the alpha-spending algorithm seemed to have added value. We also increased the number of genes in the simulation to 2000 for some cases; the results were very similar to those obtained for the simulations with 700 genes.
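The quantities reported in the simulations (observed PCER, observed FDR, and power) can be computed for a single simulated data set roughly as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name `error_rates` is hypothetical, the PCER is taken as the number of false positives divided by the total number of genes tested, and the observed FDR is set to 0 when nothing is rejected. The source does not spell out these conventions.

```python
def error_rates(p_values, is_de, alpha):
    """Observed PCER, FDR, and power for one simulated data set.

    p_values : per-gene p-values
    is_de    : booleans, True if the gene is truly differentially
               expressed (known only because the data are simulated)
    alpha    : nominal per-gene significance level
    """
    rejected = [p <= alpha for p in p_values]
    false_pos = sum(r and not d for r, d in zip(rejected, is_de))
    true_pos = sum(r and d for r, d in zip(rejected, is_de))
    n_rejected = false_pos + true_pos
    n_de = sum(is_de)

    # Assumed convention: false positives over all genes tested.
    pcer = false_pos / len(p_values)
    # Assumed convention: FDR is 0 when no gene is rejected.
    fdr = false_pos / n_rejected if n_rejected else 0.0
    # Power is undefined when no gene is differentially expressed.
    power = true_pos / n_de if n_de else float("nan")
    return pcer, fdr, power


# Toy example with made-up p-values: three truly differential genes
# followed by two null genes.
pcer, fdr, power = error_rates(
    [0.001, 0.03, 0.2, 0.04, 0.6],
    [True, True, True, False, False],
    alpha=0.05,
)
print(pcer, fdr, power)
```

Averaging these per-data-set values over the simulated data sets of one parameter setting (ρ, n) gives estimates of the kind reported in Table 1 and Figure 1.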

Table 1

Observed PCER for the alpha-spending post-processed p-values, estimated for correlated genes, uncorrelated genes, and all genes under the complete null hypothesis that all genes are non-differentially expressed. The number of genes in each simulation was 700, and nominal alpha levels of 0.01, 0.05, and 0.1 were used for identifying differentially expressed genes. In each simulation parameter setting (ρ, n), the observed PCER was estimated from 100 simulated data sets.

		Correlated genes			Uncorrelated genes			All genes
ρ	n	0.01	0.05	0.1	0.01	0.05	0.1	0.01	0.05	0.1
0.3	4	0.0092	0.0506	0.1044	0.0099	0.0483	0.0966	0.0098	0.0487	0.0982
0.3	6	0.0136	0.0689	0.1362	0.0095	0.0466	0.0938	0.0103	0.0510	0.1023
0.3	10	0.0117	0.0660	0.1316	0.0098	0.0463	0.0928	0.0102	0.0502	0.1006
0.5	4	0.0111	0.0663	0.1333	0.0091	0.0466	0.0932	0.0095	0.0505	0.1012
0.5	6	0.0175	0.0864	0.1664	0.0085	0.0421	0.0849	0.0103	0.0510	0.1012
0.5	10	0.0238	0.1006	0.1849	0.0081	0.0437	0.0875	0.0112	0.0551	0.1070
0.7	4	0.0326	0.1078	0.1908	0.0088	0.0450	0.0897	0.0136	0.0575	0.1099
0.7	6	0.0126	0.0794	0.1723	0.0088	0.0433	0.0864	0.0096	0.0505	0.1036
0.7	10	0.0353	0.1265	0.2249	0.0079	0.0389	0.0813	0.0134	0.0564	0.1101
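The inflation for correlated genes can be read off Table 1 by dividing the observed PCER by the nominal level. The short sketch below does this for the nominal level 0.05, using the values copied from the "Correlated genes" column of the table; the variable names are illustrative only.

```python
# Observed PCER for correlated genes at nominal alpha = 0.05,
# copied from Table 1 and keyed by (rho, n).
correlated_pcer_005 = {
    (0.3, 4): 0.0506, (0.3, 6): 0.0689, (0.3, 10): 0.0660,
    (0.5, 4): 0.0663, (0.5, 6): 0.0864, (0.5, 10): 0.1006,
    (0.7, 4): 0.1078, (0.7, 6): 0.0794, (0.7, 10): 0.1265,
}
nominal = 0.05

# Ratio of observed to nominal PCER; values above 1 indicate inflation.
inflation = {key: obs / nominal for key, obs in correlated_pcer_005.items()}
worst_setting = max(inflation, key=inflation.get)

for (rho, n), ratio in sorted(inflation.items()):
    print(f"rho={rho}, n={n}: observed/nominal = {ratio:.2f}")
print("largest inflation at (rho, n) =", worst_setting)
```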

Figure 1

Observed PCER and observed FDR of the alpha-spending algorithm as a function of the power of the ordinary t-test for correlations ρ = 0.3, 0.5, 0.7 and group sizes n = 4, 6, 10, with k = 700 genes in each simulation. A nominal alpha level of 0.05 was used for identifying differentially expressed genes. A thin dashed black line, a solid blue line, and a thick red line refer to correlations ρ of 0.3, 0.5, and 0.7, respectively. Group sizes of 4, 6, and 10 are represented by circles, squares, and triangles, respectively. In each simulation parameter setting (ρ, n), the observed PCER was estimated from 100 simulated data sets.

Figure 2

Power improvement of the alpha-spending p-values with respect to the ordinary t-test. The results are from the partial null hypothesis simulations, with 20% of the genes differentially expressed and correlated with the same correlation coefficient ρ, and 80% of the genes non-differentially expressed and uncorrelated. For k = 700, the 700 = 7 × 100 simulated data sets per plot were obtained by independently generating 100 data sets for each of seven values of the population mean differential expression Δ. These seven values Δ = Δ(1 − β) were chosen such that the corresponding power of the ordinary t-test in detecting the differentially expressed genes was 1 − β = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8. For k = 2000, the 30 simulated data sets correspond to 1 − β = 0.5 only. The situation k = 2000 was simulated for n = 4, 6 but not for n = 10.
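One way to generate a single data set with the partial-null structure described in this caption is sketched below. Several details are assumptions of the sketch and are not given in the source: unit-variance normal expression values, a one-factor construction to obtain equal pairwise correlation ρ among the differentially expressed genes, a mean shift Δ applied to the treated group only, and the hypothetical function name `simulate_dataset`. The step that maps a target t-test power 1 − β to a value of Δ is not shown; `delta` is passed in directly.

```python
import random


def simulate_dataset(k=700, n=4, rho=0.5, delta=1.0, frac_de=0.2, seed=0):
    """Generate one two-group data set under the partial null.

    Returns (control, treated, is_de). `control` and `treated` are lists
    of k genes, each a list of n expression values; `is_de` flags the
    truly differentially expressed genes (the first frac_de * k genes).
    """
    rng = random.Random(seed)
    n_de = round(frac_de * k)
    is_de = [g < n_de for g in range(k)]

    def one_group(shift):
        # One shared factor per sample induces pairwise correlation rho
        # among the differentially expressed genes (assumed construction).
        shared = [rng.gauss(0.0, 1.0) for _ in range(n)]
        group = []
        for g in range(k):
            if is_de[g]:
                row = [shift + rho ** 0.5 * shared[j]
                       + (1.0 - rho) ** 0.5 * rng.gauss(0.0, 1.0)
                       for j in range(n)]
            else:
                # Null genes: independent, no mean shift.
                row = [rng.gauss(0.0, 1.0) for _ in range(n)]
            group.append(row)
        return group

    control = one_group(0.0)
    treated = one_group(delta)
    return control, treated, is_de


control, treated, is_de = simulate_dataset()
print(len(control), len(control[0]), sum(is_de))
```

Repeating this with different seeds, and with `delta` set for each target power, would give the collection of data sets per plot described in the caption.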

The supplementary material is attached

Conclusion

5 in total

1. Shrinkage-based similarity metric for cluster analysis of microarray data.

Authors: Vera Cherepinsky; Jiawu Feng; Marc Rejali; Bud Mishra
Journal: Proc Natl Acad Sci U S A Date: 2003-08-05 Impact factor: 11.205

2. Empirical Bayes estimation of gene-specific effects in micro-array research.

Authors: Jode W Edwards; Grier P Page; Gary Gadbury; Moonseong Heo; Tsuyoshi Kayo; Richard Weindruch; David B Allison
Journal: Funct Integr Genomics Date: 2004-09-29 Impact factor: 3.410

3. Improved statistical tests for differential gene expression by shrinking variance components estimates.

Authors: Xiangqin Cui; J T Gene Hwang; Jing Qiu; Natalie J Blades; Gary A Churchill
Journal: Biostatistics Date: 2005-01 Impact factor: 5.899

4. A review of methods for futility stopping based on conditional power. (Review)

Authors: John M Lachin
Journal: Stat Med Date: 2005-09-30 Impact factor: 2.373

5. PGC-1alpha-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes.

Authors: Vamsi K Mootha; Cecilia M Lindgren; Karl-Fredrik Eriksson; Aravind Subramanian; Smita Sihag; Joseph Lehar; Pere Puigserver; Emma Carlsson; Martin Ridderstråle; Esa Laurila; Nicholas Houstis; Mark J Daly; Nick Patterson; Jill P Mesirov; Todd R Golub; Pablo Tamayo; Bruce Spiegelman; Eric S Lander; Joel N Hirschhorn; David Altshuler; Leif C Groop
Journal: Nat Genet Date: 2003-07 Impact factor: 38.330


2 in total

1. Common scientific and statistical errors in obesity research. (Review)

Authors: Brandon J George; T Mark Beasley; Andrew W Brown; John Dawson; Rositsa Dimova; Jasmin Divers; TaShauna U Goldsby; Moonseong Heo; Kathryn A Kaiser; Scott W Keith; Mimi Y Kim; Peng Li; Tapan Mehta; J Michael Oakes; Asheley Skinner; Elizabeth Stuart; David B Allison
Journal: Obesity (Silver Spring) Date: 2016-04 Impact factor: 5.002

2. Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach.

Authors: J F Mudge; C J Martyniuk; J E Houlahan
Journal: BMC Bioinformatics Date: 2017-06-21 Impact factor: 3.169
