Hokeun Sun, Wei Lin, Rui Feng, Hongzhe Li.
Abstract
We consider estimation and variable selection in high-dimensional Cox regression when prior knowledge of the relationships among the covariates, described by a network or graph, is available. A limitation of existing methodology for survival analysis with high-dimensional genomic data is that it often ignores the wealth of structural information available about many biological processes, such as regulatory networks and pathways. To incorporate such prior network information into the analysis of genomic data, we propose a network-based regularization method for high-dimensional Cox regression; it uses an ℓ1-penalty to induce sparsity of the regression coefficients and a quadratic Laplacian penalty to encourage smoothness between the coefficients of neighboring variables on a given network. The proposed method is implemented by an efficient coordinate descent algorithm. In the setting where the dimensionality p can grow exponentially fast with the sample size n, we establish model selection consistency and estimation bounds for the proposed estimators. The theoretical results provide insights into the gain from taking the network structural information into account. Extensive simulation studies indicate that our method outperforms the Lasso and the elastic net in terms of variable selection accuracy and stability. We apply our method to a breast cancer gene expression study and identify several biologically plausible subnetworks and pathways that are associated with distant metastasis of breast cancer.
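To make the two penalty terms described in the abstract concrete, the following is a minimal sketch of how an ℓ1 term and a quadratic Laplacian term could be combined for a coefficient vector on a small graph. It is an illustration only, not the authors' implementation: the function names, the tuning parameters `lam1` and `lam2`, and the use of the unnormalized Laplacian L = D - A are all assumptions, and the paper may use a different (for example, degree-normalized) Laplacian. The Cox partial likelihood and the coordinate descent solver are not shown.

```python
import numpy as np

def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A (an assumed choice for this sketch)."""
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    return np.diag(degrees) - A

def network_penalty(beta, adjacency, lam1, lam2):
    """Hypothetical helper: lam1 * ||beta||_1 + lam2 * beta^T L beta."""
    beta = np.asarray(beta, dtype=float)
    L = graph_laplacian(adjacency)
    l1_term = lam1 * np.abs(beta).sum()    # encourages sparse coefficients
    smooth_term = lam2 * (beta @ L @ beta)  # sum over edges of (beta_i - beta_j)^2
    return l1_term + smooth_term

# Path graph on three nodes: 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

# Equal coefficients on neighboring nodes incur no smoothness penalty.
smooth = network_penalty([1.0, 1.0, 1.0], A, lam1=0.0, lam2=1.0)

# Coefficients that alternate in sign across edges are penalized.
rough = network_penalty([1.0, -1.0, 1.0], A, lam1=0.0, lam2=1.0)

# With lam2 = 0 only the l1 term remains.
sparse_only = network_penalty([1.0, -2.0, 0.0], A, lam1=1.0, lam2=0.0)
```

With the unnormalized Laplacian, the quadratic term equals the sum of squared differences between coefficients joined by an edge, which is why coefficients that agree across neighbors contribute nothing to it.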
Keywords: Laplacian penalty; network analysis; regularization; sparsity; survival data; variable selection; weak oracle property
Year: 2014 PMID: 26316678 PMCID: PMC4549005 DOI: 10.5705/ss.2012.317
Source DB: PubMed Journal: Stat Sin ISSN: 1017-0405 Impact factor: 1.261