Clustering based on conditional distributions in an auxiliary space.

Janne Sinkkonen, Samuel Kaski.

Abstract

We study the problem of learning groups or categories that are local in the continuous primary space but homogeneous by the distributions of an associated auxiliary random variable over a discrete auxiliary space. Assuming that variation in the auxiliary space is meaningful, categories will emphasize similarly meaningful aspects of the primary space. From a data set consisting of pairs of primary and auxiliary items, the categories are learned by minimizing a Kullback-Leibler divergence-based distortion between (implicitly estimated) distributions of the auxiliary data, conditioned on the primary data. Still, the categories are defined in terms of the primary space. An online algorithm resembling the traditional Hebb-type competitive learning is introduced for learning the categories. Minimizing the distortion criterion turns out to be equivalent to maximizing the mutual information between the categories and the auxiliary data. In addition, connections to density estimation and to the distributional clustering paradigm are outlined. The method is demonstrated by clustering yeast gene expression data from DNA chips, with biological knowledge about the functional classes of the genes as the auxiliary data.
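The stated equivalence between minimizing the distortion and maximizing mutual information can be checked numerically on a toy case. The sketch below is not the authors' algorithm. It assumes a discrete primary space with hard cluster assignments, whereas the paper treats a continuous primary space, and it takes each cluster prototype to be the auxiliary distribution conditioned on that cluster. All function names and the toy joint distribution are hypothetical. Under these assumptions, the distortion plus the mutual information between clusters and auxiliary data equals the mutual information between primary items and auxiliary data, which does not depend on the clustering.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats for discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mutual_information(joint):
    """I(A; B) in nats from a joint probability table joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        pab * math.log(pab / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, pab in enumerate(row)
        if pab > 0
    )


def cluster_joint(joint_xc, assign, n_clusters):
    """Aggregate p(x, c) into p(cluster, c) under a hard assignment x -> cluster."""
    n_aux = len(joint_xc[0])
    out = [[0.0] * n_aux for _ in range(n_clusters)]
    for x, row in enumerate(joint_xc):
        for c, p in enumerate(row):
            out[assign[x]][c] += p
    return out


def distortion(joint_xc, assign, n_clusters):
    """Average KL between p(c | x) and the prototype p(c | cluster of x).

    Assumes every cluster receives at least one item with nonzero probability.
    """
    jc = cluster_joint(joint_xc, assign, n_clusters)
    protos = [[p / sum(row) for p in row] for row in jc]
    total = 0.0
    for x, row in enumerate(joint_xc):
        px = sum(row)
        cond = [p / px for p in row]
        total += px * kl(cond, protos[assign[x]])
    return total


# Hypothetical toy joint p(x, c): 4 primary items, 2 auxiliary classes.
joint_xc = [
    [0.20, 0.05],
    [0.18, 0.07],
    [0.05, 0.20],
    [0.06, 0.19],
]
good = [0, 0, 1, 1]  # groups items with similar p(c | x)
bad = [0, 1, 0, 1]   # mixes items with dissimilar p(c | x)

for name, assign in (("good", good), ("bad", bad)):
    d = distortion(joint_xc, assign, 2)
    mi = mutual_information(cluster_joint(joint_xc, assign, 2))
    # d + mi is the same for both assignments: it equals I(X; C).
    print(name, round(d, 4), round(mi, 4), round(d + mi, 4))
```

In this toy setting the assignment that groups items with similar conditional distributions has the lower distortion and the higher mutual information, and the sum of the two printed quantities is the same for both assignments.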

Year:  2002        PMID: 11747539     DOI: 10.1162/089976602753284509

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  4 in total

1.  Judging the quality of gene expression-based clustering methods using gene annotation.

Authors:  Francis D Gibbons; Frederick P Roth
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

2.  A Possible World-Based Fusion Estimation Model for Uncertain Data Clustering in WBNs.

Authors:  Chao Li; Zhenjiang Zhang; Wei Wei; Han-Chieh Chao; Xuejun Liu
Journal:  Sensors (Basel)       Date:  2021-01-28       Impact factor: 3.576

3.  Validating module network learning algorithms using simulated data.

Authors:  Tom Michoel; Steven Maere; Eric Bonnet; Anagha Joshi; Yvan Saeys; Tim Van den Bulcke; Koenraad Van Leemput; Piet van Remortel; Martin Kuiper; Kathleen Marchal; Yves Van de Peer
Journal:  BMC Bioinformatics       Date:  2007-05-03       Impact factor: 3.169

4.  Trustworthiness and metrics in visualizing similarity of gene expression.

Authors:  Samuel Kaski; Janne Nikkilä; Merja Oja; Jarkko Venna; Petri Törönen; Eero Castrén
Journal:  BMC Bioinformatics       Date:  2003-10-13       Impact factor: 3.169
