A New Distance Metric for Unsupervised Learning of Categorical Data.

Hong Jia, Yiu-Ming Cheung, Jiming Liu.   

Abstract

A distance metric is the basis of many learning algorithms, and its effectiveness usually has a significant influence on the learning results. In general, measuring distance for numerical data is a tractable task, but it can be a nontrivial problem for categorical data sets. This paper therefore presents a new distance metric for categorical data based on the characteristics of categorical values. In particular, the distance between two values of one attribute is determined by both the frequency probabilities of these two values and the values of other attributes that are highly interdependent with that attribute. A dynamic attribute weight is further designed to adjust the contribution of each attribute distance to the distance between whole data objects. Promising experimental results on different real data sets show the effectiveness of the proposed distance metric.
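
The abstract names two ingredients without giving formulas: the frequency probability of each categorical value, and the interdependence between attributes. The Python sketch below computes both on a toy data set. It is an illustration only, not the paper's metric: the helper names `value_probabilities` and `mutual_information` are hypothetical, and using mutual information as the interdependence measure is an assumption made here, since the abstract does not say which measure the authors use.

```python
from collections import Counter
from math import log


def value_probabilities(column):
    """Relative frequency of each categorical value in one attribute."""
    counts = Counter(column)
    n = len(column)
    return {value: count / n for value, count in counts.items()}


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two categorical attributes.

    Used here as an assumed stand-in for 'interdependence'; the abstract
    does not specify the measure.
    """
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_joint = c / n
        # p_joint / (p_a * p_b) written with raw counts
        mi += p_joint * log(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy data: 'color' and 'shape' always co-occur, 'label' is unrelated.
color = ["red", "red", "green", "green"]
shape = ["round", "round", "long", "long"]
label = ["a", "b", "a", "b"]

probs = value_probabilities(color)
mi_dependent = mutual_information(color, shape)
mi_independent = mutual_information(color, label)
```

In this toy example, `color` and `shape` are fully dependent, so their mutual information is much larger than that of `color` and `label`. A metric of the kind the abstract describes would presumably let `shape`, but not `label`, inform the distance between two `color` values.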

Year: 2015; PMID: 26068881; DOI: 10.1109/TNNLS.2015.2436432

Source DB: PubMed; Journal: IEEE Trans Neural Netw Learn Syst; ISSN: 2162-237X; Impact factor: 10.451


  2 in total

1.  CDE++: Learning Categorical Data Embedding by Enhancing Heterogeneous Feature Value Coupling Relationships.

Authors:  Bin Dong; Songlei Jian; Ke Zuo
Journal:  Entropy (Basel)       Date:  2020-03-29       Impact factor: 2.524

2.  A visual analytic approach for the identification of ICU patient subpopulations using ICD diagnostic codes.

Authors:  Daniel Alcaide; Jan Aerts
Journal:  PeerJ Comput Sci       Date:  2021-04-06