
Correlated z-values and the accuracy of large-scale statistical estimates.

Bradley Efron.

Abstract

We consider large-scale studies in which there are hundreds or thousands of correlated cases to investigate, each represented by its own normal variate, typically a z-value. A familiar example is provided by a microarray experiment comparing healthy with sick subjects' expression levels for thousands of genes. This paper concerns the accuracy of summary statistics for the collection of normal variates, such as their empirical cdf or a false discovery rate statistic. It seems like we must estimate an N by N correlation matrix, N the number of cases, but our main result shows that this is not necessary: good accuracy approximations can be based on the root mean square correlation over all N · (N - 1)/2 pairs, a quantity often easily estimated. A second result shows that z-values closely follow normal distributions even under non-null conditions, supporting application of the main theorem. Practical application of the theory is illustrated for a large leukemia microarray study.
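The root mean square correlation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the data arrive as a matrix with one row per subject and one column per case (for example, per gene); the function name `rms_correlation` and the simulated data are hypothetical and not taken from the paper. The sketch computes the plain sample value over all N · (N - 1)/2 pairs; with few subjects this naive value tends to be inflated by sampling noise, and the sketch applies no correction for that.

```python
import numpy as np


def rms_correlation(x):
    """Root mean square of the sample correlations over all distinct column pairs.

    x: array-like of shape (n_subjects, n_cases). Illustrative helper only.
    """
    x = np.asarray(x, dtype=float)
    n_cases = x.shape[1]
    # Full n_cases x n_cases sample correlation matrix (columns are the variables).
    corr = np.corrcoef(x, rowvar=False)
    # Indices of the strict upper triangle: the N * (N - 1) / 2 distinct pairs.
    upper = np.triu_indices(n_cases, k=1)
    return float(np.sqrt(np.mean(corr[upper] ** 2)))


# Simulated example (an assumption for illustration): 200 subjects, 50 cases
# that share one common factor, so every pair of cases is positively correlated.
rng = np.random.default_rng(0)
shared = rng.standard_normal((200, 1))
data = 0.5 * shared + rng.standard_normal((200, 50))
alpha_hat = rms_correlation(data)
print(f"RMS correlation over all pairs: {alpha_hat:.3f}")
```

The sketch forms the full correlation matrix only for simplicity; the single summary number it returns is the kind of quantity the abstract says can stand in for the whole N by N matrix.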


Year:  2010        PMID: 21052523      PMCID: PMC2967047          DOI: 10.1198/jasa.2010.tm09129

Source DB:  PubMed          Journal:  J Am Stat Assoc        ISSN: 0162-1459            Impact factor:   5.033


  31 in total

1.  Projected principal component analysis in factor models.

Authors:  Jianqing Fan; Yuan Liao; Weichen Wang
Journal:  Ann Stat       Date:  2016-02       Impact factor: 4.028

2.  An efficient hierarchical generalized linear mixed model for pathway analysis of genome-wide association studies.

Authors:  Lily Wang; Peilin Jia; Russell D Wolfinger; Xi Chen; Britney L Grayson; Thomas M Aune; Zhongming Zhao
Journal:  Bioinformatics       Date:  2011-01-25       Impact factor: 6.937

3.  Functional and Optogenetic Approaches to Discovering Stable Subtype-Specific Circuit Mechanisms in Depression.

Authors:  Logan Grosenick; Tracey C Shi; Faith M Gunning; Marc J Dubin; Jonathan Downar; Conor Liston
Journal:  Biol Psychiatry Cogn Neurosci Neuroimaging       Date:  2019-05-10

4.  Integrating prior knowledge in multiple testing under dependence with applications to detecting differential DNA methylation.

Authors:  Pei Fen Kuan; Derek Y Chiang
Journal:  Biometrics       Date:  2012-01-19       Impact factor: 2.571

5.  Noisy matrix completion: understanding statistical guarantees for convex relaxation via nonconvex optimization.

Authors:  Yuxin Chen; Yuejie Chi; Jianqing Fan; Cong Ma; Yuling Yan
Journal:  SIAM J Optim       Date:  2020-10-28       Impact factor: 2.850

6.  Robust high dimensional factor models with applications to statistical machine learning.

Authors:  Jianqing Fan; Kaizheng Wang; Yiqiao Zhong; Ziwei Zhu
Journal:  Stat Sci       Date:  2021-04-19       Impact factor: 2.901

7.  Estimation of high dimensional mean regression in the absence of symmetry and light tail assumptions.

Authors:  Jianqing Fan; Quefeng Li; Yuyan Wang
Journal:  J R Stat Soc Series B Stat Methodol       Date:  2016-04-14       Impact factor: 4.488

8.  Tweedie's Formula and Selection Bias.

Authors:  Bradley Efron
Journal:  J Am Stat Assoc       Date:  2012-01-24       Impact factor: 5.033

9.  Optimal False Discovery Rate Control for Dependent Data.

Authors:  Jichun Xie; T Tony Cai; John Maris; Hongzhe Li
Journal:  Stat Interface       Date:  2011       Impact factor: 0.582

10.  Statistical analysis of big data on pharmacogenomics.

Authors:  Jianqing Fan; Han Liu
Journal:  Adv Drug Deliv Rev       Date:  2013-04-17       Impact factor: 15.470
