Devin C Koestler¹, Carmen J Marsit², Brock C Christensen², Karl T Kelsey³, E Andres Houseman⁴
1. Department of Biostatistics, University of Kansas Medical Center, Kansas City, KS 66160, USA.
2. Department of Community and Family Medicine, Section for Biostatistics and Epidemiology, Dartmouth Medical School, Hanover, NH 03756, USA; Department of Pharmacology and Toxicology, Dartmouth College, Hanover, NH 03756, USA.
3. Department of Epidemiology, Brown University, Providence, RI 02912, USA.
4. Department of Public Health, Oregon State University, Corvallis, OR 97331, USA.
Abstract
BACKGROUND: Longitudinally collected gene expression data provide an opportunity to investigate the dynamic behavior of gene expression and are crucial for establishing causal links between molecular-level changes and disease development and progression. For such data, clustering subjects on the basis of their time-course expression may improve our understanding of the temporal expression patterns that result in disease phenotypes. Although numerous methods exist for clustering subjects using gene expression data, most are not suitable when expression is measured repeatedly over a time course. METHODS: We present a modified version of the recursively partitioned mixture model (RPMM) for clustering subjects based on longitudinally collected gene expression data. In the proposed time-course RPMM (TC-RPMM), subjects are clustered on the basis of their temporal gene expression profiles using a mixture-of-mixed-effects-models framework. This framework captures changes in gene expression over time and models the autocorrelation between repeated gene expression measurements on the same subject. We assessed the performance of TC-RPMM using extensive simulation studies and a dataset from a multi-center research study of inflammation and response to injury (www.gluegrant.org), which consisted of time-course gene expression data for 140 subjects. RESULTS: Our simulation studies covered several scenarios and assessed the ability of TC-RPMM to recover true class memberships when the classes were characterized by different expression trajectories. Overall, TC-RPMM performed favorably compared with competing approaches; however, clustering performance depended strongly on the proportion of class-discriminating genes used in the clustering analysis. When applied to real epidemiologic data consisting of repeated, longitudinal gene expression measurements, TC-RPMM identified clusters with strong biological and clinical significance. CONCLUSIONS: Methods for clustering subjects based on temporal gene expression profiles are a high priority for molecular biology and bioinformatics research. In this context, the proposed TC-RPMM represents a promising new approach for analyzing time-course gene expression data.
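The mixture-of-mixed-effects-models idea can be illustrated with a minimal sketch of the E-step. This step computes a subject's posterior class-membership probabilities from its expression trajectories. The sketch is not the authors' TC-RPMM implementation. Several modeling choices are hypothetical assumptions made only for illustration:

* a linear mean trajectory per gene and class;
* a subject-level random intercept, which induces a compound-symmetry correlation between repeated measurements;
* independence of genes given class;
* the function names `cs_log_density`, `class_log_lik`, and `responsibilities`.

Parameter estimation (the M-step) and the recursive splitting into nested two-class models are not shown.

```python
import math

# Hypothetical sketch (not the authors' TC-RPMM code). Within one class, each
# gene is assumed to follow a linear mixed-effects model:
#   y_ij = b0 + b1 * t_j + u_i + e_ij,  u_i ~ N(0, tau2),  e_ij ~ N(0, sigma2)
# so a subject's trajectory is marginally N(mu, sigma2 * I + tau2 * J)
# (compound symmetry): repeated measurements on one subject share u_i and
# are correlated with rho = tau2 / (tau2 + sigma2).


def cs_log_density(y, mu, sigma2, tau2):
    """Log N(y | mu, sigma2*I + tau2*J) via the closed-form inverse and determinant."""
    n = len(y)
    r = [yj - mj for yj, mj in zip(y, mu)]
    s = sum(r)                      # 1' r
    ss = sum(rj * rj for rj in r)   # r' r
    denom = sigma2 + n * tau2
    # V^{-1} = (I - (tau2 / denom) * J) / sigma2, so r' V^{-1} r is:
    quad = (ss - tau2 * s * s / denom) / sigma2
    # Eigenvalues of V: sigma2 (n - 1 times) and sigma2 + n * tau2 (once).
    logdet = (n - 1) * math.log(sigma2) + math.log(denom)
    return -0.5 * (n * math.log(2.0 * math.pi) + logdet + quad)


def class_log_lik(subject, times, class_params):
    """Subject log-likelihood under one class; genes treated as independent given class (assumption)."""
    total = 0.0
    for y, (b0, b1, sigma2, tau2) in zip(subject, class_params):
        mu = [b0 + b1 * t for t in times]
        total += cs_log_density(y, mu, sigma2, tau2)
    return total


def responsibilities(subject, times, classes, weights):
    """Posterior class-membership probabilities for one subject (the E-step)."""
    logs = [math.log(w) + class_log_lik(subject, times, p)
            for w, p in zip(weights, classes)]
    m = max(logs)  # log-sum-exp trick for numerical stability
    exps = [math.exp(v - m) for v in logs]
    z = sum(exps)
    return [e / z for e in exps]


# Usage with made-up parameters: (intercept, slope, sigma2, tau2) per gene.
times = [0.0, 1.0, 2.0, 3.0]
class_up = [(0.0, 1.0, 0.25, 0.5), (2.0, 0.0, 0.25, 0.5)]     # gene 1 rises
class_down = [(0.0, -1.0, 0.25, 0.5), (2.0, 0.0, 0.25, 0.5)]  # gene 1 falls
subject = [[0.1, 1.2, 1.9, 3.1], [2.2, 1.9, 2.1, 2.0]]         # genes x time points
probs = responsibilities(subject, times, [class_up, class_down], [0.5, 0.5])
print(probs)  # nearly all posterior mass falls on class_up
```

Gene 2 has the same parameters in both hypothetical classes, so only gene 1's rising trajectory discriminates between them. This mirrors the abstract's observation that performance depends on the proportion of class-discriminating genes. The compound-symmetry form is used here because its inverse and determinant have closed forms. A real analysis might prefer a richer autocorrelation structure.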