
Metric learning for text documents.

Guy Lebanon.

Abstract

Many algorithms in machine learning rely on being given a good distance metric over the input space. Rather than using a default metric such as the Euclidean metric, it is desirable to obtain a metric based on the provided data. We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given data set of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pull-back metrics of the Fisher information under a Lie group of transformations. When applied to text document classification, the resulting geodesic distances resemble, but outperform, the tfidf cosine similarity measure.
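
To make the geometry in the abstract concrete, here is a minimal Python sketch of a Fisher geodesic distance on the multinomial simplex and of a pulled-back variant. The closed form d(p, q) = 2 arccos(sum_i sqrt(p_i q_i)) is the standard Fisher geodesic distance on the simplex. The transformation `rescale` (componentwise reweighting followed by renormalization) and all function names are assumptions made for illustration, since the abstract does not spell out the Lie group of transformations; the weights `lam` are supplied by hand here, not learned by the volume-based procedure the abstract describes.

```python
import math


def fisher_geodesic(p, q):
    """Fisher geodesic distance between two points of the simplex.

    Uses the closed form 2 * arccos(sum_i sqrt(p_i * q_i)).
    """
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point drift before acos.
    return 2.0 * math.acos(min(1.0, max(0.0, bc)))


def rescale(p, lam):
    """Hypothetical transformation: reweight each component, then renormalize.

    Assumes every entry of lam is positive so the result stays on the simplex.
    """
    weighted = [a * w for a, w in zip(p, lam)]
    total = sum(weighted)
    return [x / total for x in weighted]


def pullback_distance(p, q, lam):
    """Geodesic distance under the Fisher metric pulled back through rescale."""
    return fisher_geodesic(rescale(p, lam), rescale(q, lam))


if __name__ == "__main__":
    # Two toy term-frequency vectors over a three-word vocabulary.
    doc_a = [0.5, 0.25, 0.25]
    doc_b = [0.25, 0.25, 0.5]
    print(fisher_geodesic(doc_a, doc_b))
    # Down-weighting the first word changes the distance, loosely in the
    # spirit of how idf weights change cosine similarity.
    print(pullback_distance(doc_a, doc_b, [0.1, 1.0, 1.0]))
```

With all weights equal, `pullback_distance` reduces to the plain Fisher geodesic distance, so the weights act as the parameters that select one metric from the family.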

Year:  2006        PMID: 16566500     DOI: 10.1109/TPAMI.2006.77

Source DB:  PubMed          Journal:  IEEE Trans Pattern Anal Mach Intell        ISSN: 0098-5589            Impact factor:   6.226


  2 in total

1.  An Efficient Framework for Constructing Generalized Locally-Induced Text Metrics.

Authors:  Saeed Amizadeh; Shuguang Wang; Milos Hauskrecht
Journal:  IJCAI (U S)       Date:  2011

2.  Spherical Minimum Description Length.

Authors:  Trevor Herntier; Koffi Eddy Ihou; Anthony Smith; Anand Rangarajan; Adrian Peter
Journal:  Entropy (Basel)       Date:  2018-08-03       Impact factor: 2.524
