Network-constrained regularization and variable selection for analysis of genomic data.

Caiyan Li, Hongzhe Li.

Abstract

MOTIVATION: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network.
RESULTS: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which were supported by the published literature.
CONCLUSIONS: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes.
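
Illustrative sketch (not part of the original record): the abstract describes a penalty that combines an L1-norm term with a smoothness term defined through the graph Laplacian, but it does not state the formula. The Python below assumes one common form, lam1 * ||beta||_1 + lam2 * beta' L beta, with L taken to be the degree-normalized Laplacian. The function names, the parameters lam1 and lam2, and the choice of normalization are assumptions made for illustration and may differ from the authors' exact definition.

```python
import numpy as np


def normalized_laplacian(adj):
    """Degree-normalized Laplacian of an undirected graph (assumed form).

    Entries: 1 on the diagonal for nodes with at least one edge,
    -w(u, v) / sqrt(d_u * d_v) for connected pairs, 0 elsewhere.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    scaled = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def network_penalty(beta, lap, lam1, lam2):
    """Hypothetical penalty: lam1 * ||beta||_1 + lam2 * beta' L beta."""
    beta = np.asarray(beta, dtype=float)
    l1_term = np.abs(beta).sum()        # encourages sparse coefficients
    smooth_term = beta @ lap @ beta     # small when linked genes have similar scaled coefficients
    return lam1 * l1_term + lam2 * smooth_term


# Toy three-gene path network: gene 0 - gene 1 - gene 2.
adjacency = [[0, 1, 0],
             [1, 0, 1],
             [0, 1, 0]]
laplacian = normalized_laplacian(adjacency)
```

With this normalization the quadratic term equals the sum over edges of (beta_u / sqrt(d_u) - beta_v / sqrt(d_v))^2, so coefficients that vary smoothly along the network, after scaling by the square root of node degree, incur little additional penalty, while the L1 term still drives unrelated coefficients toward zero.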

Year:  2008        PMID: 18310618     DOI: 10.1093/bioinformatics/btn081

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  166 in total

1.  Bayesian Joint Modeling of Multiple Gene Networks and Diverse Genomic Data to Identify Target Genes of a Transcription Factor.

Authors:  Peng Wei; Wei Pan
Journal:  Ann Appl Stat       Date:  2012-01-01       Impact factor: 2.083

2.  Empirical Bayes conditional independence graphs for regulatory network recovery.

Authors:  Rami Mahdi; Abishek S Madduri; Guoqing Wang; Yael Strulovici-Barel; Jacqueline Salit; Neil R Hackett; Ronald G Crystal; Jason G Mezey
Journal:  Bioinformatics       Date:  2012-06-08       Impact factor: 6.937

3.  glmgraph: an R package for variable selection and predictive modeling of structured genomic data.

Authors:  Li Chen; Han Liu; Jean-Pierre A Kocher; Hongzhe Li; Jun Chen
Journal:  Bioinformatics       Date:  2015-08-26       Impact factor: 6.937

4.  Principal Component Analysis With Sparse Fused Loadings.

Authors:  Jian Guo; Gareth James; Elizaveta Levina; George Michailidis; Ji Zhu
Journal:  J Comput Graph Stat       Date:  2010       Impact factor: 2.302

5.  Scalable Bayesian variable selection for structured high-dimensional data.

Authors:  Changgee Chang; Suprateek Kundu; Qi Long
Journal:  Biometrics       Date:  2018-05-08       Impact factor: 2.571

6.  Improving biomarker list stability by integration of biological knowledge in the learning process.

Authors:  Tiziana Sanavia; Fabio Aiolli; Giovanni Da San Martino; Andrea Bisognin; Barbara Di Camillo
Journal:  BMC Bioinformatics       Date:  2012-03-28       Impact factor: 3.169

7.  Graph- and rule-based learning algorithms: a comprehensive review of their applications for cancer type classification and prognosis using genomic data.

Authors:  Saurav Mallik; Zhongming Zhao
Journal:  Brief Bioinform       Date:  2020-03-23       Impact factor: 11.622

8.  The Cluster Elastic Net for High-Dimensional Regression With Unknown Variable Grouping.

Authors:  Daniela M Witten; Ali Shojaie; Fan Zhang
Journal:  Technometrics       Date:  2014-02-20

9.  Hierarchical Feature Selection Incorporating Known and Novel Biological Information: Identifying Genomic Features Related to Prostate Cancer Recurrence.

Authors:  Yize Zhao; Matthias Chung; Brent A Johnson; Carlos S Moreno; Qi Long
Journal:  J Am Stat Assoc       Date:  2017-01-04       Impact factor: 5.033

10.  A Sparse Structured Shrinkage Estimator for Nonparametric Varying-Coefficient Model with an Application in Genomics.

Authors:  Z John Daye; Jichun Xie; Hongzhe Li
Journal:  J Comput Graph Stat       Date:  2012-04-10       Impact factor: 2.302
