
Spectral methods in machine learning and new strategies for very large datasets.

Mohamed-Ali Belabbas, Patrick J Wolfe.

Abstract

Spectral methods are of fundamental importance in statistics and machine learning, because they underlie algorithms from classical principal components analysis to more recent approaches that exploit manifold structure. In most cases, the core technical problem can be reduced to computing a low-rank approximation to a positive-definite kernel. For the growing number of applications dealing with very large or high-dimensional datasets, however, the optimal approximation afforded by an exact spectral decomposition is too costly, because its complexity scales as the cube of either the number of training examples or their dimensionality. Motivated by such applications, we present here two new algorithms for the approximation of positive-semidefinite kernels, together with error bounds that improve on results in the literature. We approach this problem by seeking to determine, in an efficient manner, the most informative subset of our data relative to the kernel approximation task at hand. This leads to two new strategies based on the Nyström method that are directly applicable to massive datasets. The first, based on sampling, leads to a randomized algorithm in which the kernel induces a probability distribution on its set of partitions; the second, based on sorting, selects a partition in a deterministic way. We detail their numerical implementation and provide simulation results for a variety of representative problems in statistical data analysis, each of which demonstrates the improved performance of our approach relative to existing methods.
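The Nyström approximation at the heart of the abstract can be illustrated in a few lines of NumPy. This is a minimal sketch, not the paper's algorithms: the landmark-selection step below samples columns in proportion to the kernel's diagonal entries, a simple stand-in for the determinant-weighted distribution on partitions that the paper describes, and the function name `nystrom_approx` and the RBF-kernel setup are illustrative assumptions.

```python
import numpy as np

def nystrom_approx(K, idx):
    """Rank-len(idx) Nystrom approximation of a PSD kernel matrix K.

    Given landmark indices idx, approximate K by
    K_hat = C @ pinv(W) @ C.T, where C = K[:, idx] and W = K[idx][:, idx].
    """
    C = K[:, idx]                      # n x k columns at the landmarks
    W = K[np.ix_(idx, idx)]            # k x k intersection block
    return C @ np.linalg.pinv(W) @ C.T

# Toy data: an RBF kernel on random points (illustrative, not from the paper).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K = np.exp(-sq / 2.0)

# Sampling-based landmark choice: sample columns with probability
# proportional to the diagonal entries K_ii (a simplification of the
# kernel-induced distribution on partitions used in the paper).
p = np.diag(K) / np.trace(K)
idx = rng.choice(len(K), size=20, replace=False, p=p)

K_hat = nystrom_approx(K, idx)
err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)   # relative Frobenius error
```

The payoff is the complexity the abstract alludes to: an exact eigendecomposition of the n x n kernel costs O(n^3), whereas the Nyström step above only inverts a k x k block and multiplies it by an n x k matrix.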

Year:  2009        PMID: 19129490      PMCID: PMC2626709          DOI: 10.1073/pnas.0810600105

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


References:  5 in total

1.  A global geometric framework for nonlinear dimensionality reduction.

Authors:  J B Tenenbaum; V de Silva; J C Langford
Journal:  Science       Date:  2000-12-22       Impact factor: 47.728

2.  Spectral grouping using the Nyström method.

Authors:  Charless Fowlkes; Serge Belongie; Fan Chung; Jitendra Malik
Journal:  IEEE Trans Pattern Anal Mach Intell       Date:  2004-02       Impact factor: 6.226

3.  Hessian eigenmaps: locally linear embedding techniques for high-dimensional data.

Authors:  David L Donoho; Carrie Grimes
Journal:  Proc Natl Acad Sci U S A       Date:  2003-04-30       Impact factor: 11.205

4.  Randomized algorithms for the low-rank approximation of matrices.

Authors:  Edo Liberty; Franco Woolfe; Per-Gunnar Martinsson; Vladimir Rokhlin; Mark Tygert
Journal:  Proc Natl Acad Sci U S A       Date:  2007-12-04       Impact factor: 11.205

5.  Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps.

Authors:  R R Coifman; S Lafon; A B Lee; M Maggioni; B Nadler; F Warner; S W Zucker
Journal:  Proc Natl Acad Sci U S A       Date:  2005-05-17       Impact factor: 12.779

Cited by:  4 in total

1.  Making sense of big data.

Authors:  Patrick J Wolfe
Journal:  Proc Natl Acad Sci U S A       Date:  2013-10-21       Impact factor: 11.205

2.  On landmark selection and sampling in high-dimensional data analysis.

Authors:  Mohamed-Ali Belabbas; Patrick J Wolfe
Journal:  Philos Trans A Math Phys Eng Sci       Date:  2009-11-13       Impact factor: 4.226

3.  Sampling from Determinantal Point Processes for Scalable Manifold Learning.

Authors:  Christian Wachinger; Polina Golland
Journal:  Inf Process Med Imaging       Date:  2015

4.  Spectral clustering using Nyström approximation for the accurate identification of cancer molecular subtypes.

Authors:  Mingguang Shi; Guofu Xu
Journal:  Sci Rep       Date:  2017-07-07       Impact factor: 4.379

