M Dugas (1), S Merk, S Breit, P Dirschedl. (1) Department of Medical Informatics, Marchioninistr. 15, D-81377 Munich, Germany. dug@ibe.med.uni-muenchen.de
Abstract
MOTIVATION: Unsupervised clustering of microarray data may detect potentially important but non-obvious characteristics of samples, for instance subgroups of diagnoses with distinct gene profiles or systematic errors in experimentation. RESULTS: Multidimensional clustering (mdclust) is a method that identifies sets of sample clusters and their associated genes. It iteratively applies two-means clustering and score-based gene selection. For any phenotype variable, the best-matching sets of clusters can be selected. This provides a way to identify gene-phenotype associations that is suited even to settings with a large number of phenotype variables. An optional model-based discriminant step may further reduce the number of selected genes.
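The loop described in the abstract (two-means clustering of samples alternating with score-based gene selection, followed by matching cluster sets to a phenotype) might be sketched roughly as follows. This is an illustrative sketch, not the published mdclust implementation. The abstract does not specify the gene score, the stopping rule, or how successive cluster sets are obtained, so the signal-to-noise-style score, the fixed `top_k`, the removal of selected genes before the next pass, and the agreement measure for a binary phenotype are all assumptions. All function names are hypothetical.

```python
from statistics import mean, pstdev

def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def two_means(rows, n_iter=50):
    """Plain 2-means on samples (rows), with a deterministic farthest-point start."""
    center = [mean(col) for col in zip(*rows)]
    c0 = max(rows, key=lambda r: sq_dist(r, center))
    c1 = max(rows, key=lambda r: sq_dist(r, c0))
    centers = [list(c0), list(c1)]
    labels = None
    for _ in range(n_iter):
        new = [0 if sq_dist(r, centers[0]) <= sq_dist(r, centers[1]) else 1
               for r in rows]
        if new == labels:  # assignments stable: converged
            break
        labels = new
        for k in (0, 1):
            members = [r for r, l in zip(rows, labels) if l == k]
            if members:
                centers[k] = [mean(col) for col in zip(*members)]
    return labels

def gene_score(values, labels):
    """Assumed score: |difference of cluster means| / sum of cluster std devs."""
    a = [v for v, l in zip(values, labels) if l == 0]
    b = [v for v, l in zip(values, labels) if l == 1]
    if not a or not b:
        return 0.0
    return abs(mean(a) - mean(b)) / (pstdev(a) + pstdev(b) + 1e-9)

def submatrix(X, genes):
    """Restrict the samples-by-genes matrix X to the given gene columns."""
    return [[row[g] for g in genes] for row in X]

def mdclust_sketch(X, n_sets=2, top_k=5, n_rounds=5):
    """Return up to n_sets pairs (sample labels, selected gene indices)."""
    remaining = list(range(len(X[0])))
    results = []
    for _ in range(n_sets):
        if len(remaining) < top_k:
            break
        genes = list(remaining)
        for _ in range(n_rounds):
            # Cluster samples on the current genes, then re-select genes by score.
            labels = two_means(submatrix(X, genes))
            ranked = sorted(
                remaining,
                key=lambda g: gene_score([row[g] for row in X], labels),
                reverse=True,
            )
            new_genes = sorted(ranked[:top_k])
            if new_genes == genes:  # gene set stable: stop iterating
                break
            genes = new_genes
        labels = two_means(submatrix(X, genes))
        results.append((labels, genes))
        # Assumption: drop selected genes so the next pass finds a new partition.
        remaining = [g for g in remaining if g not in genes]
    return results

def agreement(labels, phenotype):
    """Fraction of samples on which a binary phenotype matches the clusters,
    ignoring which cluster is called 0 or 1."""
    same = sum(l == p for l, p in zip(labels, phenotype)) / len(labels)
    return max(same, 1 - same)

def best_match(results, phenotype):
    """Index of the cluster set that agrees best with the phenotype."""
    return max(range(len(results)),
               key=lambda i: agreement(results[i][0], phenotype))
```

Here `X` is assumed to be a list of samples, each a list of expression values, and a phenotype is assumed to be a 0/1 list with one entry per sample. Calling `best_match(mdclust_sketch(X), phenotype)` once per phenotype variable would give, under these assumptions, the cluster set (and hence the gene set) most associated with each phenotype.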