Projection pursuit in high dimensions.

Peter J Bickel, Gil Kur, Boaz Nadler.

Abstract

Projection pursuit is a classical exploratory data analysis method to detect interesting low-dimensional structures in multivariate data. Originally, projection pursuit was applied mostly to data of moderately low dimension. Motivated by contemporary applications, we here study its properties in high-dimensional settings. Specifically, we analyze the asymptotic properties of projection pursuit on structureless multivariate Gaussian data with an identity covariance, as both dimension p and sample size n tend to infinity, with p/n → γ ∈ [0, ∞]. Our main results are that (i) if γ = ∞, then there exist projections whose corresponding empirical cumulative distribution function can approximate any arbitrary distribution; and (ii) if γ ∈ (0, ∞), not all limiting distributions are possible. However, depending on the value of γ, various non-Gaussian distributions may still be approximated. In contrast, if we restrict to sparse projections, involving only a few of the p variables, then asymptotically all empirical cumulative distribution functions are Gaussian. And (iii) if γ = 0, then asymptotically all projections are Gaussian. Some of these results extend to mean-centered sub-Gaussian data and to projections into k dimensions. Hence, in the "small n, large p" setting, unless sparsity is enforced, and regardless of the chosen projection index, projection pursuit may detect an apparent structure that has no statistical significance. Furthermore, our work reveals fundamental limitations on the ability to detect non-Gaussian signals in high-dimensional data, in particular through independent component analysis and related non-Gaussian component analysis.
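
The "small n, large p" warning in the abstract can be illustrated with a short simulation. The sketch below is not the authors' construction; it is a minimal illustration under stated assumptions: the function names `solve` and `planted_projection`, the sizes n = 20 and p = 2000, and the two-point target are all hypothetical choices made here. When p > n, the linear system X w = t has a solution for almost every Gaussian data matrix X, so a direction can be found along which pure-noise data reproduces any chosen target vector t, up to a rescaling.

```python
import math
import random


def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def planted_projection(n=20, p=2000, seed=0):
    """Find a unit direction along which structureless Gaussian data looks bimodal.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    rng = random.Random(seed)
    # Structureless data: n samples from N(0, I_p), with p much larger than n.
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    # Hypothetical target: a two-point (bimodal) pattern, clearly non-Gaussian.
    target = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]
    # Minimum-norm solution of X w = target: w = X^T (X X^T)^{-1} target.
    gram = [[sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
            for i in range(n)]
    alpha = solve(gram, target)
    w = [sum(alpha[i] * X[i][j] for i in range(n)) for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    u = [v / norm for v in w]  # unit-norm projection direction
    # Projected data: each entry equals target[i] / norm.
    z = [sum(x * c for x, c in zip(row, u)) for row in X]
    return z, target, norm


z, target, norm = planted_projection()
```

By construction the projected values `z` take only two values, +1/‖w‖ and -1/‖w‖, so a histogram of `z` is perfectly bimodal even though the data contain no structure. A projection index that rewards non-Gaussianity would score this direction highly, which is the kind of spurious finding the abstract cautions against when sparsity is not enforced.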

Keywords:  dimensionality reduction; independent component analysis; projection pursuit; random matrix theory; sparsity

Year:  2018        PMID: 30150379      PMCID: PMC6140545          DOI: 10.1073/pnas.1801177115

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


  3 in total

1.  Molecular function recognition by supervised projection pursuit machine learning.

Authors:  Tyler Grear; Chris Avery; John Patterson; Donald J Jacobs
Journal:  Sci Rep       Date:  2021-02-19       Impact factor: 4.379

2.  Group linear non-Gaussian component analysis with applications to neuroimaging.

Authors:  Yuxuan Zhao; David S Matteson; Stewart H Mostofsky; Mary Beth Nebel; Benjamin B Risk
Journal:  Comput Stat Data Anal       Date:  2022-02-22       Impact factor: 2.035

3.  Projection in genomic analysis: A theoretical basis to rationalize tensor decomposition and principal component analysis as feature selection tools.

Authors:  Y-H Taguchi; Turki Turki
Journal:  PLoS One       Date:  2022-09-29       Impact factor: 3.752
