
Classification of sparse high-dimensional vectors.

Yuri I Ingster, Christophe Pouet, Alexandre B Tsybakov.

Abstract

We study the problem of classification of d-dimensional vectors into two classes (one of which is 'pure noise') based on a training sample of size m. The main specific feature is that the dimension d can be very large. We suppose that the difference between the distribution of the population and that of the noise is only in a shift, which is a sparse vector. For Gaussian noise, fixed sample size m, and dimension d that tends to infinity, we obtain the sharp classification boundary, i.e. the necessary and sufficient conditions for the possibility of successful classification. We propose classifiers attaining this boundary. We also give extensions of the result to the case where the sample size m depends on d and satisfies the condition (log m)/(log d) → γ, 0 ≤ γ < 1, and to the case of non-Gaussian noise satisfying the Cramér condition.
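
To make the setting concrete, the following is a minimal simulation sketch of the problem as the abstract states it: pure-noise vectors are standard Gaussian in d dimensions, and the other class differs only by a sparse shift. It is not the classifier proposed in the paper. The selection threshold sqrt(2 log(d) / m), the plug-in linear rule, the assumption that the m training vectors come from the shifted population, the positive common shift size `a`, and all names (`draw_vector`, `fit`, `classify`, `run_demo`) are illustrative assumptions.

```python
import math
import random


def draw_vector(shift, rng):
    """One observation: the given shift plus standard Gaussian noise per coordinate."""
    return [s + rng.gauss(0.0, 1.0) for s in shift]


def fit(train, d):
    """Keep coordinates whose training mean exceeds an illustrative threshold.

    Assumption: the shift is non-negative, so one-sided thresholding suffices.
    Returns a dict mapping selected coordinate -> estimated shift.
    """
    m = len(train)
    means = [sum(x[j] for x in train) / m for j in range(d)]
    t = math.sqrt(2.0 * math.log(d) / m)  # assumed threshold, not from the paper
    return {j: means[j] for j in range(d) if means[j] > t}


def classify(x, selected):
    """Return True for 'shifted population', False for 'pure noise'.

    Plug-in linear rule restricted to the selected coordinates.
    """
    stat = sum(w * x[j] for j, w in selected.items())
    cutoff = 0.5 * sum(w * w for w in selected.values())
    return stat > cutoff


def run_demo(d=2000, m=5, k=20, a=3.0, n_test=100, seed=0):
    """Simulate training and testing; return the fraction classified correctly."""
    rng = random.Random(seed)
    shift = [0.0] * d
    for j in rng.sample(range(d), k):  # k nonzero coordinates out of d
        shift[j] = a
    zero = [0.0] * d

    train = [draw_vector(shift, rng) for _ in range(m)]
    selected = fit(train, d)

    correct = 0
    for _ in range(n_test):
        correct += classify(draw_vector(shift, rng), selected)
        correct += not classify(draw_vector(zero, rng), selected)
    return correct / (2 * n_test)


if __name__ == "__main__":
    print(f"accuracy: {run_demo():.3f}")
```

Lowering `a` or `k`, or raising `d` with `m` fixed, makes the two classes harder to separate in this toy setup. The sharp boundary between the feasible and infeasible regimes is the subject of the paper and is not reproduced by this sketch.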

Year:  2009        PMID: 19805452     DOI: 10.1098/rsta.2009.0156

Source DB:  PubMed          Journal:  Philos Trans A Math Phys Eng Sci        ISSN: 1364-503X            Impact factor:   4.226


  2 in total

1.  Impossibility of successful classification when useful features are rare and weak.

Authors:  Jiashun Jin
Journal:  Proc Natl Acad Sci U S A       Date:  2009-05-15       Impact factor: 11.205

2.  Statistical challenges of high-dimensional data.

Authors:  Iain M Johnstone; D Michael Titterington
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-11-13       Impact factor: 4.226

