
Effect of data normalization on fuzzy clustering of DNA microarray data.

Seo Young Kim, Jae Won Lee, Jong Sung Bae.

Abstract

BACKGROUND: Microarray technology makes it possible to measure the expression levels of large numbers of genes simultaneously and in a short time. Gene expression data are information-rich; however, extensive data mining is required to identify the patterns that characterize the underlying mechanisms of action. Clustering is an important tool for finding groups of genes with similar expression patterns in microarray data analysis. However, hard clustering methods, which assign each gene to exactly one cluster, are poorly suited to microarray datasets because the gene clusters in such datasets frequently overlap.
RESULTS: In this study we applied the fuzzy partitional clustering method known as Fuzzy C-Means (FCM) to overcome the limitations of hard clustering. To assess the effect of data normalization, we applied three normalization methods (the two common scale and location transformations, and Lowess normalization) to three microarray datasets and three simulated datasets. We first determined the optimal parameters for FCM clustering and found that the optimal fuzzification parameter for a microarray dataset depended on the normalization method applied during preprocessing. We then evaluated how normalizing noisy datasets affected the results of hard clustering and FCM clustering, using both the simulated and the microarray datasets. A comparative analysis showed that the clustering results depended on the normalization method used and on the noisiness of the data. In particular, for datasets with large variations across samples, the choice of the fuzzification parameter value for FCM was sensitive to the normalization method used.
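As a rough illustration of what a location and a scale adjustment can look like, the sketch below centers each gene (row) on its mean and then divides by its standard deviation. The abstract does not spell out which two transformations were used, so these definitions, the function names, and the genes-by-samples layout are assumptions made for illustration only; Lowess normalization is not shown.

```python
import numpy as np

def center_rows(x):
    """Location adjustment (assumed form): subtract each gene's mean."""
    x = np.asarray(x, dtype=float)
    return x - x.mean(axis=1, keepdims=True)

def standardize_rows(x, eps=1e-12):
    """Location and scale adjustment (assumed form): zero mean, unit SD per gene."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, ddof=1, keepdims=True)
    # Guard against division by zero for genes with constant expression.
    return centered / np.maximum(sd, eps)

# Hypothetical toy matrix: 2 genes (rows) measured in 3 samples (columns).
expr = np.array([[1.0, 2.0, 3.0],
                 [10.0, 20.0, 30.0]])
z = standardize_rows(expr)
```

After standardization the two toy genes have identical profiles, which shows how the choice of transformation changes the distances a clustering method sees.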
CONCLUSION: Lowess normalization is more robust for clustering genes from general microarray data than the two common scale and location adjustment methods when samples have varying expression patterns or are noisy. In particular, the FCM method slightly outperformed the hard clustering methods when the expression patterns of genes overlapped, and it was advantageous for finding co-regulated genes. Thus, the FCM approach offers a convenient way to find subsets of genes that are strongly associated with a given cluster.
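The role of the fuzzification parameter can be made concrete with a minimal sketch of the standard FCM update rules, written here with NumPy. This is not the authors' implementation: the function names, the random initialization, the stopping rule, and the 0.7 membership cutoff are all assumptions chosen for illustration.

```python
import numpy as np

def fuzzy_c_means(x, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook FCM sketch; m > 1 is the fuzzification parameter."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each row (gene) sums to 1.
    u = rng.random((x.shape[0], n_clusters))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        w = u ** m
        # Cluster centers are membership-weighted means of the genes.
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Squared Euclidean distance from every gene to every center.
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        inv = d2 ** (-1.0 / (m - 1.0))
        new_u = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(new_u - u).max() < tol
        u = new_u
        if converged:
            break
    return centers, u

def strongly_associated(u, threshold=0.7):
    """Return (labels, mask); mask marks genes whose top membership >= threshold."""
    return u.argmax(axis=1), u.max(axis=1) >= threshold

# Hypothetical toy data: two well-separated groups of five "genes".
group_a = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5]], dtype=float)
data = np.vstack([group_a, group_a + 10.0])
centers, u = fuzzy_c_means(data, n_clusters=2, m=2.0)
labels, strong = strongly_associated(u)
```

Values of m close to 1 push the memberships toward a hard partition, while larger values flatten them toward 1/c for every cluster, which is why the choice of m interacts with how the data were normalized.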

Year:  2006        PMID: 16533412      PMCID: PMC1431564          DOI: 10.1186/1471-2105-7-134

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


  13 in total

1.  Computational analysis of microarray data. (Review)

Authors:  J Quackenbush
Journal:  Nat Rev Genet       Date:  2001-06       Impact factor: 53.242

2.  Comparisons and validation of statistical clustering techniques for microarray gene expression data.

Authors:  Susmita Datta; Somnath Datta
Journal:  Bioinformatics       Date:  2003-03-01       Impact factor: 6.937

3.  Fuzzy C-means method for clustering microarray data.

Authors:  Doulaye Dembélé; Philippe Kastner
Journal:  Bioinformatics       Date:  2003-05-22       Impact factor: 6.937

4.  The transcriptional program in the response of human fibroblasts to serum.

Authors:  V R Iyer; M B Eisen; D T Ross; G Schuler; T Moore; J C Lee; J M Trent; L M Staudt; J Hudson; M S Boguski; D Lashkari; D Shalon; D Botstein; P O Brown
Journal:  Science       Date:  1999-01-01       Impact factor: 47.728

5.  A genome-wide transcriptional analysis of the mitotic cell cycle.

Authors:  R J Cho; M J Campbell; E A Winzeler; L Steinmetz; A Conway; L Wodicka; T G Wolfsberg; A E Gabrielian; D Landsman; D J Lockhart; R W Davis
Journal:  Mol Cell       Date:  1998-07       Impact factor: 17.970

6.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

7.  Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.

Authors:  P T Spellman; G Sherlock; M Q Zhang; V R Iyer; K Anders; M B Eisen; P O Brown; D Botstein; B Futcher
Journal:  Mol Biol Cell       Date:  1998-12       Impact factor: 4.138

8.  Fuzzy J-Means and VNS methods for clustering genes from microarray data.

Authors:  Nabil Belacel; Miroslava Cuperlović-Culf; Mark Laflamme; Rodney Ouellette
Journal:  Bioinformatics       Date:  2004-02-26       Impact factor: 6.937

9.  Optimized LOWESS normalization parameter selection for DNA microarray data.

Authors:  John A Berger; Sampsa Hautaniemi; Anna-Kaarina Järvinen; Henrik Edgren; Sanjit K Mitra; Jaakko Astola
Journal:  BMC Bioinformatics       Date:  2004-12-09       Impact factor: 3.169

10.  An adaptive method for cDNA microarray normalization.

Authors:  Yingdong Zhao; Ming-Chung Li; Richard Simon
Journal:  BMC Bioinformatics       Date:  2005-02-11       Impact factor: 3.169

  8 in total

1.  Adjusting background noise in cluster analyses of longitudinal data.

Authors:  Shengtong Han; Hongmei Zhang; Wilfried Karmaus; Graham Roberts; Hasan Arshad
Journal:  Comput Stat Data Anal       Date:  2016-11-27       Impact factor: 1.681

2.  Fuzzy clustering of physicochemical and biochemical properties of amino acids.

Authors:  Indrajit Saha; Ujjwal Maulik; Sanghamitra Bandyopadhyay; Dariusz Plewczynski
Journal:  Amino Acids       Date:  2011-10-13       Impact factor: 3.520

3.  Challenges in microarray class discovery: a comprehensive examination of normalization, gene selection and clustering.

Authors:  Eva Freyhult; Mattias Landfors; Jenny Önskog; Torgeir R Hvidsten; Patrik Rydén
Journal:  BMC Bioinformatics       Date:  2010-10-11       Impact factor: 3.169

4.  Classification of microarrays; synergistic effects between normalization, gene selection and machine learning.

Authors:  Jenny Önskog; Eva Freyhult; Mattias Landfors; Patrik Rydén; Torgeir R Hvidsten
Journal:  BMC Bioinformatics       Date:  2011-10-07       Impact factor: 3.169

5.  Combining Pareto-optimal clusters using supervised learning for identifying co-expressed genes.

Authors:  Ujjwal Maulik; Anirban Mukhopadhyay; Sanghamitra Bandyopadhyay
Journal:  BMC Bioinformatics       Date:  2009-01-20       Impact factor: 3.169

6.  Altered expression patterns of lipid metabolism genes in an animal model of HCV core-related, nonobese, modest hepatic steatosis.

Authors:  Ming-Ling Chang; Chau-Ting Yeh; Jeng-Chang Chen; Chau-Chun Huang; Shi-Ming Lin; I-Shyan Sheen; Dar-In Tai; Chia-Ming Chu; Wei-Pin Lin; Ming-Yu Chang; Chun-Kai Liang; Cheng-Tang Chiu; Deng-Yn Lin
Journal:  BMC Genomics       Date:  2008-02-29       Impact factor: 3.969

7.  A comprehensive comparison of different clustering methods for reliability analysis of microarray data.

Authors:  Rahele Kafieh; Alireza Mehridehnavi
Journal:  J Med Signals Sens       Date:  2013-01

8.  Fuzzy technique for microcalcifications clustering in digital mammograms.

Authors:  Letizia Vivona; Donato Cascio; Francesco Fauci; Giuseppe Raso
Journal:  BMC Med Imaging       Date:  2014-06-24       Impact factor: 1.930

