Evaluation of normalization and pre-clustering issues in a novel clustering approach: global optimum search with enhanced positioning.

Meng P Tan, James R Broach, Christodoulos A Floudas.

Abstract

We study how different normalization and pre-clustering techniques affect clustering quality for a novel mixed-integer nonlinear optimization-based clustering algorithm, the Global Optimum Search with Enhanced Positioning (EP_GOS_Clust). Both issues matter in practice. DNA microarray experiments are informative tools for elucidating gene regulatory networks, but for gene expression levels to be comparable across microarrays, normalization procedures have to be properly undertaken. The aim of pre-clustering is to use an adequate number of discriminatory characteristics to form rough information profiles, so that data with similar features can be pre-grouped and outliers deemed insignificant to the clustering process can be removed. Using experimental DNA microarray data from the yeast Saccharomyces cerevisiae, we study the merits of pre-clustering genes based on distance/correlation comparisons and on symbolic representations such as {+, o, -}. As performance metrics, we use the intra- and inter-cluster error sums, two generic but intuitive measures of clustering quality. We also use publicly available Gene Ontology resources to assess the clusters' level of biological coherence. Our analysis indicates that normalization and pre-clustering methods have a significant effect on the clustering results. Hence, the outcome of this study is useful for fine-tuning the EP_GOS_Clust clustering approach.
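The abstract names the intra- and inter-cluster error sums but does not define them. The following is a minimal sketch under stated assumptions, not the authors' formulation: the intra-cluster sum is taken as the total squared Euclidean distance from each expression profile to its cluster centroid, and the inter-cluster sum as the total squared distance between every pair of cluster centroids. The function names (`centroid`, `sq_dist`, `error_sums`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def error_sums(profiles, labels):
    """Return (intra, inter) error sums for a given cluster assignment.

    Assumed definitions (not taken from the paper):
      intra = sum of squared distances from each profile to its centroid
      inter = sum of squared distances over all pairs of centroids
    """
    # Group profiles by cluster label.
    clusters = defaultdict(list)
    for profile, label in zip(profiles, labels):
        clusters[label].append(profile)

    centers = {label: centroid(members) for label, members in clusters.items()}

    intra = sum(sq_dist(p, centers[l]) for p, l in zip(profiles, labels))

    keys = sorted(centers)
    inter = sum(
        sq_dist(centers[a], centers[b])
        for i, a in enumerate(keys)
        for b in keys[i + 1:]
    )
    return intra, inter


if __name__ == "__main__":
    # Four toy profiles in two well-separated clusters.
    toy_profiles = [[0, 0], [0, 2], [10, 0], [10, 2]]
    toy_labels = [0, 0, 1, 1]
    print(error_sums(toy_profiles, toy_labels))
```

Under these assumed definitions, a tighter and better-separated clustering has a lower intra-cluster sum and a higher inter-cluster sum, which is why the pair can serve as a generic quality measure when comparing normalization or pre-clustering choices.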

Substances: RNA

Year: 2007 PMID: 17787062 DOI: 10.1142/s0219720007002941

Source DB: PubMed Journal: J Bioinform Comput Biol ISSN: 0219-7200 Impact factor: 1.122

Cited by

7 in total

1. Using hierarchical clustering and dendrograms to quantify the clustering of membrane proteins.

Authors: Flor A Espinoza; Janet M Oliver; Bridget S Wilson; Stanly L Steinberg
Journal: Bull Math Biol Date: 2011-07-13 Impact factor: 1.758

2. Selecting high quality protein structures from diverse conformational ensembles.

Authors: Ashwin Subramani; Peter A DiMaggio; Christodoulos A Floudas
Journal: Biophys J Date: 2009-09-16 Impact factor: 4.033

3. Clustering of High Throughput Gene Expression Data.

Authors: Harun Pirim; Burak Ekşioğlu; Andy Perkins; Cetin Yüceer
Journal: Comput Oper Res Date: 2012-12 Impact factor: 4.008

4. A novel framework for predicting in vivo toxicities from in vitro data using optimal methods for dense and sparse matrix reordering and logistic regression.

Authors: Peter A DiMaggio; Ashwin Subramani; Richard S Judson; Christodoulos A Floudas
Journal: Toxicol Sci Date: 2010-08-11 Impact factor: 4.849

5. Effects of tobacco smoke on gene expression and cellular pathways in a cellular model of oral leukoplakia.

6. Biclustering via optimal re-ordering of data matrices in systems biology: rigorous methods and comparative studies.

7. Microarray data mining: a novel optimization-based approach to uncover biologically coherent structures.