
Principal component analysis for clustering gene expression data.

K Y Yeung, W L Ruzzo.

Abstract

MOTIVATION: There is a great need to develop analytical methodology to analyze and to exploit the information contained in gene expression data. Because of the large number of genes and the complexity of biological networks, clustering is a useful exploratory technique for analysis of gene expression data. Other classical techniques, such as principal component analysis (PCA), have also been applied to analyze gene expression data. Using different data analysis techniques and different clustering algorithms to analyze the same data set can lead to very different conclusions. Our goal is to study the effectiveness of principal components (PCs) in capturing cluster structure. Specifically, using both real and synthetic gene expression data sets, we compared the quality of clusters obtained from the original data to the quality of clusters obtained after projecting onto subsets of the principal component axes.
RESULTS: Our empirical study showed that clustering with the PCs instead of the original variables does not necessarily improve, and often degrades, cluster quality. In particular, the first few PCs (which contain most of the variation in the data) do not necessarily capture most of the cluster structure. We also showed that clustering with PCs has a different impact on different algorithms and different similarity metrics. Overall, we would not recommend PCA before clustering except in special circumstances.
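
The point that the highest-variance PCs need not carry the cluster structure can be illustrated with a small sketch. It is not the authors' experiment: the two-dimensional toy data, the helper names (`make_toy_data`, `principal_axes_2d`, `two_means_1d`, `agreement`, `compare_pcs`), the exact one-dimensional 2-means search, and the simple label-agreement score are all assumptions made for illustration. Two groups share a wide spread along one direction and differ only along a narrow, orthogonal direction, so the first PC holds most of the variance but none of the grouping.

```python
import math

def make_toy_data(angle_deg=30.0):
    """Hypothetical data: two groups that differ only along a low-variance direction."""
    a = math.radians(angle_deg)
    points, truth = [], []
    for label, offset in ((0, 1.0), (1, -1.0)):
        for u in range(-9, 10, 2):  # wide spread shared by both groups
            points.append((u * math.cos(a) - offset * math.sin(a),
                           u * math.sin(a) + offset * math.cos(a)))
            truth.append(label)
    return points, truth

def principal_axes_2d(points):
    """Mean and unit principal axes (pc1 has the larger variance) of 2-D data."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    # Closed-form orientation of the major axis of a symmetric 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    pc1 = (math.cos(theta), math.sin(theta))
    pc2 = (-math.sin(theta), math.cos(theta))
    return (mx, my), pc1, pc2

def project(points, mean, axis):
    """Scores of the centered points along one axis."""
    return [(p[0] - mean[0]) * axis[0] + (p[1] - mean[1]) * axis[1] for p in points]

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def two_means_1d(values):
    """Exact 2-means in one dimension: try every split of the sorted values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    best_cost, best_k = None, None
    for k in range(1, len(order)):
        cost = 0.0
        for part in (order[:k], order[k:]):
            m = sum(values[i] for i in part) / len(part)
            cost += sum((values[i] - m) ** 2 for i in part)
        if best_cost is None or cost < best_cost:
            best_cost, best_k = cost, k
    labels = [0] * len(values)
    for i in order[best_k:]:
        labels[i] = 1
    return labels

def agreement(labels, truth):
    """Fraction of matching labels, taking the better of the two label namings."""
    same = sum(1 for a, b in zip(labels, truth) if a == b)
    return max(same, len(truth) - same) / len(truth)

def compare_pcs():
    """Cluster on each PC separately; report its variance share and agreement."""
    points, truth = make_toy_data()
    mean, pc1, pc2 = principal_axes_2d(points)
    result = {}
    for name, axis in (("pc1", pc1), ("pc2", pc2)):
        scores = project(points, mean, axis)
        result[name] = {"variance": variance(scores),
                        "agreement": agreement(two_means_1d(scores), truth)}
    total = sum(r["variance"] for r in result.values())
    for r in result.values():
        r["variance_share"] = r["variance"] / total
    return result

if __name__ == "__main__":
    for name, stats in compare_pcs().items():
        print(name, round(stats["variance_share"], 3), stats["agreement"])
```

In this constructed case the first PC accounts for most of the variance, yet clustering on it recovers the groups no better than chance, while the low-variance second PC separates them cleanly. Real expression data are far higher-dimensional and noisier, so the sketch only illustrates the mechanism and does not reproduce the study's comparison.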

Year:  2001        PMID: 11590094     DOI: 10.1093/bioinformatics/17.9.763

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937

