
ARE DISCOVERIES SPURIOUS? DISTRIBUTIONS OF MAXIMUM SPURIOUS CORRELATIONS AND THEIR APPLICATIONS.

Jianqing Fan, Qi-Man Shao, Wen-Xin Zhou.

Abstract

Over the last two decades, many exciting variable selection methods have been developed for finding, from a large pool, a small group of covariates that are associated with the response. Can the discoveries made by such data mining approaches be spurious due to high dimensionality and limited sample size? Can our fundamental assumptions on the exogeneity of covariates, which such variable selection requires, be validated with the data? To answer these questions, we need to derive the distributions of the maximum spurious correlations given a certain number of predictors, namely, the distribution of the correlation of a response variable Y with the best linear combination of s of the p covariates X, even when X and Y are independent. When the covariance matrix of X possesses the restricted eigenvalue property, we derive such distributions for both finite s and diverging s, using Gaussian approximation and empirical process techniques. However, such a distribution depends on the unknown covariance matrix of X. Hence, we use the multiplier bootstrap procedure to approximate the unknown distributions and establish the consistency of this simple bootstrap approach. The results are further extended to the situation where residuals are from regularized fits. Our approach is then applied to construct the upper confidence limit for the maximum spurious correlation and to test the exogeneity of covariates. The former provides a baseline for guarding against false discoveries due to data mining, and the latter tests whether our fundamental assumptions for high-dimensional model selection are statistically valid. Our techniques and results are illustrated with both numerical examples and a real data analysis.
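The abstract does not give formulas, so what follows is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes (i) the maximum spurious correlation at sparsity s is the largest multiple correlation between Y and any s of the p covariates, found here by brute force over all s-subsets (feasible only for small p and s), and (ii) a Gaussian multiplier bootstrap in which i.i.d. N(0, 1) weights stand in for the standardized response, giving the statistic max over |S| = s of sqrt(v_S' Σ̂_SS^{-1} v_S) with v = n^{-1/2} Σ ξ_i (X_i − X̄). The function names `max_spurious_corr` and `bootstrap_upper_limit` are hypothetical, and only the Python standard library is used.

```python
import itertools
import math
import random


def _solve(a, b):
    """Solve the small linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _centered_cov(x):
    """Centered columns of the n x p data x and their (1/n) sample covariance matrix."""
    n, p = len(x), len(x[0])
    cols = []
    for j in range(p):
        mu = sum(row[j] for row in x) / n
        cols.append([row[j] - mu for row in x])
    cov = [[sum(a * b for a, b in zip(ci, cj)) / n for cj in cols] for ci in cols]
    return cols, cov


def _sup_ratio(cov, v, subset):
    """sup over a of |a'v_S| / sqrt(a' cov_SS a), equal to sqrt(v_S' cov_SS^{-1} v_S) by Cauchy-Schwarz."""
    a = [[cov[i][j] for j in subset] for i in subset]
    b = [v[i] for i in subset]
    w = _solve(a, b)
    return math.sqrt(max(sum(bi * wi for bi, wi in zip(b, w)), 0.0))


def max_spurious_corr(x, y, s):
    """Largest correlation of y with a linear combination of s columns of x (multiple correlation)."""
    n = len(y)
    cols, cov = _centered_cov(x)
    ybar = sum(y) / n
    yc = [v - ybar for v in y]
    sxy = [sum(a * b for a, b in zip(col, yc)) / n for col in cols]
    syy = sum(v * v for v in yc) / n
    subsets = itertools.combinations(range(len(cols)), s)
    return max(_sup_ratio(cov, sxy, S) for S in subsets) / math.sqrt(syy)


def bootstrap_upper_limit(x, s, level=0.95, n_boot=500, seed=0):
    """Multiplier-bootstrap upper limit for the maximum spurious correlation (y assumed independent of x)."""
    rng = random.Random(seed)
    n = len(x)
    cols, cov = _centered_cov(x)
    subsets = list(itertools.combinations(range(len(cols)), s))
    stats = []
    for _ in range(n_boot):
        # Standard normal multipliers replace the standardized response.
        xi = [rng.gauss(0.0, 1.0) for _ in range(n)]
        v = [sum(w * c for w, c in zip(xi, col)) / math.sqrt(n) for col in cols]
        stats.append(max(_sup_ratio(cov, v, S) for S in subsets))
    stats.sort()
    k = min(math.ceil(level * n_boot), n_boot) - 1
    # The bootstrap quantile approximates that of sqrt(n) * R_hat, so rescale by 1/sqrt(n).
    return stats[k] / math.sqrt(n)


if __name__ == "__main__":
    rng = random.Random(1)
    n, p, s = 100, 10, 2
    x = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # generated independently of x
    print("max spurious correlation:", max_spurious_corr(x, y, s))
    print("95% bootstrap upper limit:", bootstrap_upper_limit(x, s))
```

With s = 1 the sample statistic reduces to the largest absolute Pearson correlation between Y and a single covariate. An observed correlation that does not exceed the bootstrap limit is, under these assumptions, no larger than what chance alone would typically produce at that sparsity level.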


Keywords:  High dimension; bootstrap; false discovery; spurious correlation

Year:  2018        PMID: 29942099      PMCID: PMC6014708          DOI: 10.1214/17-AOS1575

Source DB:  PubMed          Journal:  Ann Stat        ISSN: 0090-5364            Impact factor:   4.028

