Minimum entropy clustering and applications to gene expression analysis.

Haifeng Li, Keshu Zhang, Tao Jiang.

Abstract

Clustering is a common methodology for analyzing gene expression data. In this paper, we present a new clustering algorithm from an information-theoretic point of view. First, we propose the minimum entropy criterion (measured on a posteriori probabilities), which is the conditional entropy of the clusters given the observations. Fano's inequality indicates that it could be a good criterion for clustering. We generalize the criterion by replacing Shannon's entropy with Havrda-Charvat's structural alpha-entropy. Interestingly, the minimum entropy criterion based on structural alpha-entropy is equal to the error probability of the nearest neighbor method when alpha = 2. This is further evidence that the proposed criterion is good for clustering. With a non-parametric approach for estimating a posteriori probabilities, an efficient iterative algorithm is then established to minimize the entropy. The experimental results show that the clustering algorithm performs significantly better than k-means/medians, hierarchical clustering, SOM, and EM in terms of the adjusted Rand index. In particular, our algorithm performs very well even when the correct number of clusters is unknown. In addition, most clustering algorithms produce poor partitions in the presence of outliers, while our method can correctly reveal the structure of the data and effectively identify outliers at the same time.
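
To make the criterion in the abstract concrete, the sketch below scores a set of a posteriori cluster probabilities by their average entropy, once with Shannon's entropy and once with a structural alpha-entropy. It is an illustration only, not the authors' implementation: the function names are hypothetical, the alpha-entropy uses the commonly cited Havrda-Charvat form with normalization 1 / (2^(1 - alpha) - 1), which is assumed here rather than taken from the paper, and the paper's non-parametric posterior estimator and iterative minimization are not reproduced.

```python
import math


def shannon_entropy(p):
    """Shannon entropy in bits of one probability vector (zero terms skipped)."""
    return -sum(q * math.log2(q) for q in p if q > 0.0)


def havrda_charvat_entropy(p, alpha):
    """Structural alpha-entropy of one probability vector.

    Assumed form: (sum_i p_i^alpha - 1) / (2^(1 - alpha) - 1),
    defined for alpha > 0 and alpha != 1.
    """
    if alpha <= 0 or alpha == 1:
        raise ValueError("alpha must be positive and different from 1")
    norm = 1.0 / (2.0 ** (1.0 - alpha) - 1.0)
    return norm * (sum(q ** alpha for q in p) - 1.0)


def mean_posterior_entropy(posteriors, entropy=shannon_entropy):
    """Average per-observation entropy of the cluster posteriors.

    `posteriors` holds one probability vector per observation; the average
    is an empirical stand-in for the conditional entropy of the clusters
    given the observations.
    """
    return sum(entropy(p) for p in posteriors) / len(posteriors)


# Hypothetical posteriors for two observations and two clusters.
confident = [[0.9, 0.1], [0.05, 0.95]]
ambiguous = [[0.5, 0.5], [0.6, 0.4]]


def alpha2(p):
    """Structural alpha-entropy at alpha = 2, i.e. 2 * (1 - sum_i p_i^2)."""
    return havrda_charvat_entropy(p, 2.0)


print(mean_posterior_entropy(confident), mean_posterior_entropy(ambiguous))
print(mean_posterior_entropy(confident, alpha2), mean_posterior_entropy(ambiguous, alpha2))
```

Under either entropy, the confident assignment scores lower than the ambiguous one, which is the sense in which a lower value of the criterion indicates a sharper partition. With alpha = 2 the assumed form reduces to 2 * (1 - sum of squared posteriors), the quantity the abstract relates to the nearest neighbor error.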

Year:  2004        PMID: 16448008     DOI: 10.1109/csb.2004.1332427

Source DB:  PubMed          Journal:  Proc IEEE Comput Syst Bioinform Conf        ISSN: 1551-7497


  8 in total

1.  A stationary wavelet entropy-based clustering approach accurately predicts gene expression.

Authors:  Nha Nguyen; An Vo; Inchan Choi; Kyoung-Jae Won
Journal:  J Comput Biol       Date:  2014-11-10       Impact factor: 1.479

2.  Large-scale analysis of Arabidopsis transcription reveals a basal co-regulation network.

Authors:  Osnat Atias; Benny Chor; Daniel A Chamovitz
Journal:  BMC Syst Biol       Date:  2009-09-03

3.  iBBiG: iterative binary bi-clustering of gene sets.

Authors:  Daniel Gusenleitner; Eleanor A Howe; Stefan Bentink; John Quackenbush; Aedín C Culhane
Journal:  Bioinformatics       Date:  2012-07-12       Impact factor: 6.937

4.  Inference of disease-related molecular logic from systems-based microarray analysis.

Authors:  Vinay Varadan; Dimitris Anastassiou
Journal:  PLoS Comput Biol       Date:  2006-06-16       Impact factor: 4.475

5.  A Novel Subset of Human Tumors That Simultaneously Overexpress Multiple E2F-responsive Genes Found in Breast, Ovarian, and Prostate Cancers.

Authors:  Stanley E Shackney; Salim Akhter Chowdhury; Russell Schwartz
Journal:  Cancer Inform       Date:  2014-11-03

6.  NIFTI: an evolutionary approach for finding number of clusters in microarray data.

Authors:  Sudhakar Jonnalagadda; Rajagopalan Srinivasan
Journal:  BMC Bioinformatics       Date:  2009-01-30       Impact factor: 3.169

7.  Sample entropy analysis of cervical neoplasia gene-expression signatures.

Authors:  Shaleen K Botting; Jerome P Trzeciakowski; Michelle F Benoit; Salama A Salama; Concepcion R Diaz-Arrastia
Journal:  BMC Bioinformatics       Date:  2009-02-20       Impact factor: 3.169

8.  Ranked Adjusted Rand: integrating distance and partition information in a measure of clustering agreement.

Authors:  Francisco R Pinto; João A Carriço; Mário Ramirez; Jonas S Almeida
Journal:  BMC Bioinformatics       Date:  2007-02-07       Impact factor: 3.169
