Clustering gene expression data using a graph-theoretic approach: an application of minimum spanning trees.

Ying Xu, Victor Olman, Dong Xu.

Abstract

MOTIVATION: Gene expression data clustering provides a powerful tool for studying functional relationships of genes in a biological process. Identifying correlated expression patterns of genes represents the basic challenge in this clustering problem.
RESULTS: This paper describes a new framework for representing a set of multi-dimensional gene expression data as a Minimum Spanning Tree (MST), a concept from graph theory. A key property of this representation is that each cluster of the expression data corresponds to one subtree of the MST, which rigorously converts a multi-dimensional clustering problem to a tree partitioning problem. We have demonstrated that, though the inter-data relationship is greatly simplified in the MST representation, no essential information is lost for the purpose of clustering. Two key advantages in representing a set of multi-dimensional data as an MST are: (1) the simple structure of a tree facilitates efficient implementations of rigorous clustering algorithms, which otherwise are highly computationally challenging; and (2) as MST-based clustering does not depend on the detailed geometric shape of a cluster, it can overcome many of the problems faced by classical clustering algorithms. Based on the MST representation, we have developed a number of rigorous and efficient clustering algorithms, including two with guaranteed global optimality. We have implemented these algorithms as a software package, EXpression data Clustering Analysis and VisualizATiOn Resource (EXCAVATOR). To demonstrate its effectiveness, we have tested it on three data sets: expression data from the yeast Saccharomyces cerevisiae, expression data from the response of human fibroblasts to serum, and Arabidopsis expression data in response to chitin elicitation. The test results are highly encouraging.
AVAILABILITY: EXCAVATOR is available on request from the authors.
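The idea of turning clustering into tree partitioning can be illustrated with a minimal Python sketch. It is not the EXCAVATOR implementation: it assumes Euclidean distance between expression profiles, builds the MST with Prim's algorithm, and uses the simplest partitioning rule (cut the K-1 longest MST edges so that K subtrees remain). The function names `minimum_spanning_tree` and `mst_clusters` and the toy profiles are hypothetical, chosen only for illustration.

```python
import math
from typing import List, Sequence, Tuple

Edge = Tuple[float, int, int]  # (weight, u, v)


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two expression profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def minimum_spanning_tree(points: Sequence[Sequence[float]]) -> List[Edge]:
    """Prim's algorithm on the complete distance graph, O(n^2) time."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection to the growing tree
    parent = [-1] * n
    edges: List[Edge] = []
    if n:
        best[0] = 0.0
    for _ in range(n):
        # Pick the unvisited vertex closest to the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        # Relax connections from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = euclidean(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def mst_clusters(points: Sequence[Sequence[float]], k: int) -> List[int]:
    """Cut the k-1 longest MST edges; each remaining subtree is one cluster."""
    n = len(points)
    kept = sorted(minimum_spanning_tree(points))[: max(n - k, 0)]
    root = list(range(n))  # union-find forest over the kept edges

    def find(i: int) -> int:
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    for _, u, v in kept:
        root[find(u)] = find(v)
    labels: dict = {}
    # Number the subtrees in order of first appearance.
    return [labels.setdefault(find(i), len(labels)) for i in range(n)]


# Toy "expression profiles": two well-separated groups of three genes.
profiles = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(profiles, 2))  # [0, 0, 0, 1, 1, 1]
```

Because the partition depends only on tree edges and not on cluster centroids, this rule can separate elongated or irregular groups that a centroid-based method might split. Other objective functions would need a different partitioning step on the same tree.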

Year:  2002        PMID: 12016051     DOI: 10.1093/bioinformatics/18.4.536

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  26 in total

1.  ESPD: a pattern detection model underlying gene expression profiles.

Authors:  Chun Tang; Aidong Zhang; Murali Ramanathan
Journal:  Bioinformatics       Date:  2004-01-29       Impact factor: 6.937

2.  Local Context Finder (LCF) reveals multidimensional relationships among mRNA expression profiles of Arabidopsis responding to pathogen infection.

Authors:  Fumiaki Katagiri; Jane Glazebrook
Journal:  Proc Natl Acad Sci U S A       Date:  2003-09-05       Impact factor: 11.205

3.  EXCAVATOR: a computer program for efficiently mining gene expression data.

Authors:  Dong Xu; Victor Olman; Li Wang; Ying Xu
Journal:  Nucleic Acids Res       Date:  2003-10-01       Impact factor: 16.971

4.  Density-equalizing Euclidean minimum spanning trees for the detection of all disease cluster shapes.

Authors:  Shannon C Wieland; John S Brownstein; Bonnie Berger; Kenneth D Mandl
Journal:  Proc Natl Acad Sci U S A       Date:  2007-05-22       Impact factor: 11.205

5.  Analysis of time-series gene expression data: methods, challenges, and opportunities.

Authors:  I P Androulakis; E Yang; R R Almon
Journal:  Annu Rev Biomed Eng       Date:  2007       Impact factor: 9.590

6.  TreeVis: a MATLAB-based tool for tree visualization.

Authors:  Peng Qiu; Sylvia K Plevritis
Journal:  Comput Methods Programs Biomed       Date:  2012-10-01       Impact factor: 5.428

7.  A temporal precedence based clustering method for gene expression microarray data.

Authors:  Ritesh Krishna; Chang-Tsun Li; Vicky Buchanan-Wollaston
Journal:  BMC Bioinformatics       Date:  2010-01-30       Impact factor: 3.169

8.  A differentiation-based phylogeny of cancer subtypes.

Authors:  Markus Riester; Camille Stephan-Otto Attolini; Robert J Downey; Samuel Singer; Franziska Michor
Journal:  PLoS Comput Biol       Date:  2010-05-06       Impact factor: 4.475

9.  Iterative class discovery and feature selection using Minimal Spanning Trees.

Authors:  Sudhir Varma; Richard Simon
Journal:  BMC Bioinformatics       Date:  2004-09-08       Impact factor: 3.169

10.  QUBIC: a qualitative biclustering algorithm for analyses of gene expression data.

Authors:  Guojun Li; Qin Ma; Haibao Tang; Andrew H Paterson; Ying Xu
Journal:  Nucleic Acids Res       Date:  2009-06-09       Impact factor: 16.971
