Asymptotic performance of PCA for high-dimensional heteroscedastic data.

David Hong, Laura Balzano, Jeffrey A Fessler.

Abstract

Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional and heteroscedastic. This paper analyzes the statistical performance of PCA in this setting, i.e., for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simplified expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that, for a fixed average noise variance, the asymptotic recovery of PCA for heteroscedastic data is always worse than that for homoscedastic data (i.e., for noise variances that are equal across samples). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
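
The abstract's main comparison can be illustrated with a small simulation. The following is a minimal sketch, not the authors' code: it assumes a rank-one subspace, Gaussian coefficients and noise, no mean-centering, and hypothetical parameter choices (d = 200 dimensions, n = 2000 samples, amplitude theta = 1, noise variances of 0.1 and 1.9 versus a constant 1.0). The function names pca_recovery and average_recovery are made up for this illustration. It measures the squared inner product between the true direction and the leading principal component, averaged over a few trials.

```python
import numpy as np

def pca_recovery(noise_vars, d=200, theta=1.0, rng=None):
    """Squared inner product between the true direction and the first PC.

    noise_vars holds one noise variance per sample (assumed setup).
    """
    rng = np.random.default_rng() if rng is None else rng
    noise_vars = np.asarray(noise_vars, dtype=float)
    n = noise_vars.size
    # Random unit vector spanning the (rank-one) underlying subspace.
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)
    # Subspace coefficients, one per sample.
    z = rng.standard_normal(n)
    # Column i of the noise matrix has variance noise_vars[i].
    noise = rng.standard_normal((d, n)) * np.sqrt(noise_vars)
    Y = theta * np.outer(u, z) + noise
    # Leading eigenvector of the (uncentered) sample covariance.
    # eigh returns eigenvalues in ascending order, so take the last column.
    _, vecs = np.linalg.eigh(Y @ Y.T / n)
    u_hat = vecs[:, -1]
    return float(np.dot(u_hat, u) ** 2)

def average_recovery(noise_vars, trials=5, seed=0):
    """Average the recovery over several independent trials."""
    rng = np.random.default_rng(seed)
    return float(np.mean([pca_recovery(noise_vars, rng=rng) for _ in range(trials)]))

n = 2000
# Both settings have the same average noise variance of 1.0.
homo_vars = np.full(n, 1.0)
hetero_vars = np.concatenate([np.full(n // 2, 0.1), np.full(n // 2, 1.9)])

homo = average_recovery(homo_vars)
hetero = average_recovery(hetero_vars)
print(f"homoscedastic recovery:   {homo:.3f}")
print(f"heteroscedastic recovery: {hetero:.3f}")
```

Under these assumptions the heteroscedastic setting should show lower recovery than the homoscedastic one, in line with the abstract's claim. This is a finite-sample illustration only and does not reproduce the paper's asymptotic expressions.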

Keywords:  Asymptotic random matrix theory; Heteroscedasticity; High-dimensional data; Principal component analysis; Subspace estimation

Year:  2018        PMID: 30778267      PMCID: PMC6377200          DOI: 10.1016/j.jmva.2018.06.002

Source DB:  PubMed          Journal:  J Multivar Anal        ISSN: 0047-259X            Impact factor:   1.473


  4 in total

1.  Joint dimension reduction and clustering analysis of single-cell RNA-seq and spatial transcriptomics data.

Authors:  Wei Liu; Xu Liao; Yi Yang; Huazhen Lin; Joe Yeong; Xiang Zhou; Xingjie Shi; Jin Liu
Journal:  Nucleic Acids Res       Date:  2022-07-08       Impact factor: 19.160

2.  Portable Electronic Nose Based on Digital and Analog Chemical Sensors for 2,4,6-Trichloroanisole Discrimination.

Authors:  Félix Meléndez; Patricia Arroyo; Jaime Gómez-Suárez; Sergio Palomeque-Mangut; José Ignacio Suárez; Jesús Lozano
Journal:  Sensors (Basel)       Date:  2022-04-30       Impact factor: 3.847

3.  Temporal scatterplots.

Authors:  Or Patashnik; Min Lu; Amit H Bermano; Daniel Cohen-Or
Journal:  Comput Vis Media (Beijing)       Date:  2020-11-07

4.  A GRU-based traffic situation prediction method in multi-domain software defined network.

Authors:  Wenwen Sun; Shaopeng Guan
Journal:  PeerJ Comput Sci       Date:  2022-06-23
