
A permutation test for determining significance of clusters with applications to spatial and gene expression data.

P J Park, J Manjourides, M Bonetti, M Pagano.

Abstract

Hierarchical clustering is a common procedure for identifying structure in a data set and is frequently used to organize genomic data. Although more advanced clustering algorithms are available, the simplicity and visual appeal of hierarchical clustering have made it ubiquitous in gene expression data analysis. Hence, even minor improvements in this framework would have significant impact. There is currently no simple and systematic way of assessing and displaying the significance of various clusters in a resulting dendrogram without making certain distributional assumptions or ignoring gene-specific variances. In this work, we introduce a permutation test based on comparing the within-cluster structure of the observed data with that of sample data sets obtained by permuting the cluster membership. We carry out this test at each node of the dendrogram using a statistic derived from the singular value decomposition of variance matrices. The p-values thus obtained provide insight into the significance of each cluster division. Given these values, one can also modify the dendrogram by combining non-significant branches. By adjusting the cut-off level of significance for branches, one can produce dendrograms with a desired level of detail for ease of interpretation. We demonstrate the usefulness of this approach by applying it to illustrative data sets.
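
The test at a single dendrogram node can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the exact statistic, so the sketch substitutes the sum, over the two child clusters, of the leading singular value of each within-cluster covariance matrix. The function names `within_cluster_stat` and `split_permutation_pvalue`, the parameter defaults, and the simulated data are all hypothetical.

```python
import numpy as np

def within_cluster_stat(X, labels):
    """Stand-in statistic (an assumption, not the paper's definition):
    sum over clusters of the leading singular value of the
    within-cluster covariance matrix. Smaller means tighter clusters."""
    total = 0.0
    for k in np.unique(labels):
        members = X[labels == k]
        if members.shape[0] < 2:
            continue  # covariance is undefined for a single observation
        cov = np.atleast_2d(np.cov(members, rowvar=False))
        # Singular values are returned in descending order.
        total += np.linalg.svd(cov, compute_uv=False)[0]
    return total

def split_permutation_pvalue(X, labels, n_perm=199, seed=0):
    """Permutation p-value for one split: the fraction of label
    permutations whose statistic is at least as small as the observed one."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    observed = within_cluster_stat(X, labels)
    hits = 0
    for _ in range(n_perm):
        permuted = rng.permutation(labels)  # keeps cluster sizes fixed
        if within_cluster_stat(X, permuted) <= observed:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Simulated example: two well-separated groups of 20 points in 3 dimensions.
rng = np.random.default_rng(1)
X_sep = np.vstack([rng.normal(0.0, 1.0, (20, 3)),
                   rng.normal(10.0, 1.0, (20, 3))])
labels = np.repeat([0, 1], 20)
p_sep = split_permutation_pvalue(X_sep, labels)

# A single homogeneous group split by the same arbitrary labels.
X_null = rng.normal(0.0, 1.0, (40, 3))
p_null = split_permutation_pvalue(X_null, labels)

print("separated groups p-value:", p_sep)
print("homogeneous group p-value:", p_null)
```

Applied to a full dendrogram, the same function would be called once per internal node, with `X` restricted to the observations under that node and `labels` marking its two child branches. Branches whose p-value exceeds a chosen cut-off could then be merged, as the abstract describes.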

Year:  2009        PMID: 21258660      PMCID: PMC3023458          DOI: 10.1016/j.csda.2009.05.031

Source DB:  PubMed          Journal:  Comput Stat Data Anal        ISSN: 0167-9473            Impact factor:   1.681


  19 in total

1.  Assessing reliability of gene clusters from gene expression data.

Authors:  K Zhang; H Zhao
Journal:  Funct Integr Genomics       Date:  2000-11       Impact factor: 3.410

2.  Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments.

Authors:  M K Kerr; G A Churchill
Journal:  Proc Natl Acad Sci U S A       Date:  2001-07-24       Impact factor: 11.205

3.  The interpoint distance distribution as a descriptor of point patterns, with an application to spatial disease clustering.

Authors:  Marco Bonetti; Marcello Pagano
Journal:  Stat Med       Date:  2005-03-15       Impact factor: 2.373

4.  Robust multi-scale clustering of large DNA microarray datasets with the consensus algorithm.

Authors:  Thomas Grotkjaer; Ole Winther; Birgitte Regenberg; Jens Nielsen; Lars Kai Hansen
Journal:  Bioinformatics       Date:  2005-10-27       Impact factor: 6.937

5.  Odontologic kinship analysis in skeletal remains: concepts, methods, and results.

Authors:  K W Alt; W Vach
Journal:  Forensic Sci Int       Date:  1995-06-30       Impact factor: 2.395

6.  The Sverdlovsk anthrax outbreak of 1979.

Authors:  M Meselson; J Guillemin; M Hugh-Jones; A Langmuir; I Popova; A Shelokov; O Yampolskaya
Journal:  Science       Date:  1994-11-18       Impact factor: 47.728

7.  Molecular classification of cutaneous malignant melanoma by gene expression profiling.

Authors:  M Bittner; P Meltzer; Y Chen; Y Jiang; E Seftor; M Hendrix; M Radmacher; R Simon; Z Yakhini; A Ben-Dor; N Sampas; E Dougherty; E Wang; F Marincola; C Gooden; J Lueders; A Glatfelter; P Pollock; J Carpten; E Gillanders; D Leja; K Dietrich; C Beaudry; M Berens; D Alberts; V Sondak
Journal:  Nature       Date:  2000-08-03       Impact factor: 49.962

8.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

9.  GenClust: a genetic algorithm for clustering gene expression data.

Authors:  Vito Di Gesú; Raffaele Giancarlo; Giosué Lo Bosco; Alessandra Raimondi; Davide Scaturro
Journal:  BMC Bioinformatics       Date:  2005-12-07       Impact factor: 3.169

10.  Clustering gene-expression data with repeated measurements.

Authors:  Ka Yee Yeung; Mario Medvedovic; Roger E Bumgarner
Journal:  Genome Biol       Date:  2003-04-25       Impact factor: 13.583

  5 in total

1.  Representation of memories in the cortical-hippocampal system: Results from the application of population similarity analyses. (Review)

Authors:  Sam McKenzie; Christopher S Keene; Anja Farovik; John Bladon; Ryan Place; Robert Komorowski; Howard Eichenbaum
Journal:  Neurobiol Learn Mem       Date:  2015-12-31       Impact factor: 2.877

2.  Detection of Significant Groups in Hierarchical Clustering by Resampling.

Authors:  Paola Sebastiani; Thomas T Perls
Journal:  Front Genet       Date:  2016-08-08       Impact factor: 4.599

3.  Preoperative liking and wanting for sweet beverages as predictors of body weight loss after Roux-en-Y gastric bypass and sleeve gastrectomy.

Authors:  Claudio E Perez-Leighton; Jeon D Hamm; Ari Shechter; Shoran Tamura; Blandine Laferrère; Jeanine Albu; Danielle Greenberg; Harry R Kissileff
Journal:  Int J Obes (Lond)       Date:  2019-10-22       Impact factor: 5.095

4.  Associations between neighborhood socioeconomic cluster and hypertension, diabetes, myocardial infarction, and coronary artery disease within a cohort of cardiac catheterization patients.

Authors:  Anne M Weaver; Laura A McGuinn; Lucas Neas; Robert B Devlin; Radhika Dhingra; Cavin K Ward-Caviness; Wayne E Cascio; William E Kraus; Elizabeth R Hauser; David Diaz-Sanchez
Journal:  Am Heart J       Date:  2021-10-02       Impact factor: 5.099

5.  NetDI: Methodology Elucidating the Role of Power and Dynamical Brain Network Features That Underpin Word Production.

Authors:  Sudha Yellapantula; Kiefer Forseth; Nitin Tandon; Behnaam Aazhang
Journal:  eNeuro       Date:  2021-02-09
