Claudio Lottaz [1], Joern Toedling, Rainer Spang
[1] Max Planck Institute for Molecular Genetics and Berlin Center for Genome Based Bioinformatics, Ihnestr. 73, D-14195 Berlin, Germany. claudio.lottaz@molgen.mpg.de
Abstract
MOTIVATION: Clustering algorithms are widely used in the analysis of microarray data. In clinical studies, they are often applied to find groups of co-regulated genes. Clustering, however, can also stratify patients by similarity of their gene expression profiles, thereby defining novel disease entities based on molecular characteristics. Several distance-based clustering algorithms have been suggested, but little attention has been given to the distance measure between patients. Even with the Euclidean metric, including or excluding genes from the analysis leads to different distances between the same objects, and consequently to different clustering results.
RESULTS: We describe a new clustering algorithm in which gene selection is used to derive biologically meaningful clusterings of samples by combining expression profiles and functional annotation data. According to gene annotations, candidate gene sets with specific functional characterizations are generated. Each set defines a different distance measure between patients, leading to different clusterings. These clusterings are filtered using a resampling-based significance measure. Significant clusterings are reported together with the underlying gene sets and their functional definition.
CONCLUSIONS: Our method reports clusterings defined by biologically focused sets of genes. In annotation-driven clusterings, we have recovered clinically relevant patient subgroups through biologically plausible sets of genes, as well as new subgroupings. We conjecture that our method has the potential to reveal previously unknown, clinically relevant classes of patients in an unsupervised manner.
AVAILABILITY: We provide the R package adSplit as part of Bioconductor release 1.9 and on http://compdiag.molgen.mpg.de/software.
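
The central observation of the abstract, that the choice of genes changes the Euclidean distance between the same patients and therefore the clustering, can be illustrated with a small sketch. This is not the adSplit implementation or its API. The expression values, the annotation terms `term_A` and `term_B`, and the helper names `euclidean` and `two_way_split` are all hypothetical. The split rule (seed with the farthest pair, assign each patient to the nearer seed) is a simple stand-in for a real clustering step, and the resampling-based significance filter is omitted because the abstract does not specify it.

```python
import math

# Toy expression matrix (made-up values): one row per patient, one column per gene.
GENES = ["g1", "g2", "g3", "g4"]
EXPRESSION = {
    "p1": [1.0, 1.0, 5.0, 0.0],
    "p2": [1.0, 1.2, 0.0, 5.0],
    "p3": [4.0, 4.1, 5.0, 0.2],
    "p4": [4.2, 4.0, 0.1, 5.0],
}

# Hypothetical functional annotation: each term selects a gene set.
GENE_SETS = {
    "term_A": ["g1", "g2"],
    "term_B": ["g3", "g4"],
}


def euclidean(a, b, genes):
    """Euclidean distance between patients a and b, restricted to the given genes."""
    idx = [GENES.index(g) for g in genes]
    return math.sqrt(sum((EXPRESSION[a][i] - EXPRESSION[b][i]) ** 2 for i in idx))


def two_way_split(genes):
    """Split patients into two groups using only the given genes.

    The two most distant patients serve as seeds; every patient joins
    the group of the nearer seed. This is a stand-in for a real
    clustering algorithm, chosen only to keep the sketch short.
    """
    patients = sorted(EXPRESSION)
    pairs = [(a, b) for i, a in enumerate(patients) for b in patients[i + 1:]]
    seed_a, seed_b = max(pairs, key=lambda p: euclidean(p[0], p[1], genes))
    group_a = frozenset(
        p for p in patients
        if euclidean(p, seed_a, genes) <= euclidean(p, seed_b, genes)
    )
    return group_a, frozenset(patients) - group_a


if __name__ == "__main__":
    # Each annotation term induces its own distance measure and its own split.
    for term, genes in GENE_SETS.items():
        groups = two_way_split(genes)
        print(term, [sorted(g) for g in groups])
```

In this toy data, patients `p1` and `p2` are close under `term_A` but far apart under `term_B`, so the two gene sets yield different patient groupings. In the method described above, each such candidate split would additionally be scored for significance before being reported.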