Jiehuan Sun, Jose D Herazo-Maya, Naftali Kaminski, Hongyu Zhao, Joshua L Warren.
Abstract
Subgroup identification (clustering) is an important problem in biomedical research. Gene expression profiles are commonly utilized to define subgroups. Longitudinal gene expression profiles might provide additional information on disease progression beyond what is captured by baseline profiles alone. Therefore, subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods are unable to fully utilize these data for patient clustering. In this article, we introduce a novel clustering method in the Bayesian setting based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model gene expression trajectories over time, while clustering is jointly conducted based on the regression coefficients obtained from all genes. To account for the correlations among genes and alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior distribution is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG performs better than other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model is able to identify interesting subgroups.
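As an illustration of how a Dirichlet process prior can induce clustering without fixing the number of clusters in advance, the following is a minimal sketch of its Chinese restaurant process representation. It is not the BClustLonG implementation, and it omits the mixed-effects and factor analysis components entirely. The function name `crp_assignments`, the concentration parameter name `alpha`, and the patient count are assumptions made for illustration only.

```python
import random
from collections import Counter


def crp_assignments(n_patients, alpha, seed=0):
    """Draw cluster labels from a Chinese restaurant process (illustrative only).

    alpha (hypothetical name) is the concentration parameter and must be
    positive; larger values make it more likely that a new cluster is opened.
    """
    rng = random.Random(seed)
    labels = []  # cluster label of each patient, in order of arrival
    sizes = []   # sizes[k] = number of patients currently in cluster k
    for _ in range(n_patients):
        # Existing cluster k has weight sizes[k]; a new cluster has weight alpha.
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[k] += 1    # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_assignments(n_patients=50, alpha=1.0)
    print(Counter(labels))  # cluster sizes; the number of clusters is random
```

In a full model of the kind the abstract describes, patients sharing a label would share the same mean for their regression coefficients, and the labels would be inferred from the data rather than drawn from the prior alone.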
Keywords: Bayesian factor analysis; Bayesian nonparametrics; clustering; longitudinal gene expression study
Year: 2017 PMID: 28620908 PMCID: PMC5583037 DOI: 10.1002/sim.7374
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373