
On Consistency and Sparsity for Principal Components Analysis in High Dimensions.

Iain M Johnstone, Arthur Yu Lu.

Abstract

Principal components analysis (PCA) is a classic method for the reduction of dimensionality of data in the form of n observations (or cases) of a vector with p variables. Contemporary datasets often have p comparable with or even much larger than n. Our main assertions, in such settings, are (a) that some initial reduction in dimensionality is desirable before applying any PCA-type search for principal modes, and (b) that the initial reduction in dimensionality is best achieved by working in a basis in which the signals have a sparse representation. We describe a simple asymptotic model in which the estimate of the leading principal component vector via standard PCA is consistent if and only if p(n)/n → 0. We provide a simple algorithm for selecting a subset of coordinates with largest sample variances, and show that if PCA is done on the selected subset, then consistency is recovered, even if p(n) ≫ n.
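The subset-selection idea in the abstract (sometimes called diagonal thresholding) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the fixed subset size k, and the spiked-model parameters in the usage example are all assumptions made for the demo.

```python
import numpy as np

def subset_pca(X, k):
    """Keep the k coordinates with the largest sample variances,
    run standard PCA on that subset, and embed the leading
    eigenvector back into p dimensions."""
    n, p = X.shape
    variances = X.var(axis=0)                 # sample variance of each coordinate
    subset = np.argsort(variances)[-k:]       # indices of the k largest variances
    S = np.cov(X[:, subset], rowvar=False)    # k x k sample covariance of the subset
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    v = np.zeros(p)
    v[subset] = eigvecs[:, -1]                # leading eigenvector, zero elsewhere
    return v

# Usage: a single-spike model with p >> n and a sparse true component.
rng = np.random.default_rng(0)
n, p = 100, 500
rho = np.zeros(p)
rho[:10] = 1.0 / np.sqrt(10)                  # true component supported on 10 coords
u = rng.standard_normal(n)
X = 3.0 * np.outer(u, rho) + rng.standard_normal((n, p))
v_hat = subset_pca(X, k=20)
print(abs(v_hat @ rho))                       # inner product near 1 suggests consistency
```

Because only the k retained coordinates enter the eigendecomposition, the estimate is unaffected by the pure-noise variance accumulated across the remaining p - k coordinates, which is what breaks standard PCA when p(n)/n does not tend to zero.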

Year:  2009        PMID: 20617121      PMCID: PMC2898454          DOI: 10.1198/jasa.2009.0121

Source DB:  PubMed          Journal:  J Am Stat Assoc        ISSN: 0162-1459            Impact factor:   5.033


References:  1 in total

1.  Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure.

Authors:  D C Hoyle; M Rattray
Journal:  Phys Rev E Stat Nonlin Soft Matter Phys       Date:  2004-02-27
Cited by:  72 in total

1.  Limitations of GCTA as a solution to the missing heritability problem.

Authors:  Siddharth Krishna Kumar; Marcus W Feldman; David H Rehkopf; Shripad Tuljapurkar
Journal:  Proc Natl Acad Sci U S A       Date:  2015-12-22       Impact factor: 11.205

2.  Biclustering with heterogeneous variance.

Authors:  Guanhua Chen; Patrick F Sullivan; Michael R Kosorok
Journal:  Proc Natl Acad Sci U S A       Date:  2013-07-08       Impact factor: 11.205

3.  Testing High-Dimensional Covariance Matrices, with Application to Detecting Schizophrenia Risk Genes.

Authors:  Lingxue Zhu; Jing Lei; Bernie Devlin; Kathryn Roeder
Journal:  Ann Appl Stat       Date:  2017-10-05       Impact factor: 2.083

4.  Two-Step Hypothesis Testing When the Number of Variables Exceeds the Sample Size.

Authors:  Yueh-Yun Chi; Keith E Muller
Journal:  Commun Stat Simul Comput       Date:  2013       Impact factor: 1.118

5.  Large Covariance Estimation Through Elliptical Factor Models.

Authors:  Jianqing Fan; Han Liu; Weichen Wang
Journal:  Ann Stat       Date:  2018-06-27       Impact factor: 4.028

6.  Statistical challenges of high-dimensional data.

Authors:  Iain M Johnstone; D Michael Titterington
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-11-13       Impact factor: 4.226

7.  FLCRM: Functional linear cox regression model.

Authors:  Dehan Kong; Joseph G Ibrahim; Eunjee Lee; Hongtu Zhu
Journal:  Biometrics       Date:  2017-09-01       Impact factor: 2.571

8.  Scale-Invariant Sparse PCA on High Dimensional Meta-elliptical Data.

Authors:  Fang Han; Han Liu
Journal:  J Am Stat Assoc       Date:  2014-01-01       Impact factor: 5.033

9.  PCA in High Dimensions: An orientation.

Authors:  Iain M Johnstone; Debashis Paul
Journal:  Proc IEEE Inst Electr Electron Eng       Date:  2018-07-18       Impact factor: 10.961

10.  Sparse principal component analysis by choice of norm.

Authors:  Xin Qi; Ruiyan Luo; Hongyu Zhao
Journal:  J Multivar Anal       Date:  2012-07-16       Impact factor: 1.473

