Zaixiang Tang1,2,3,4, Yueping Shen1,2, Yan Li4, Xinyan Zhang4, Jia Wen5, Chen'ao Qian6, Wenzhuo Zhuang7, Xinghua Shi5, Nengjun Yi4. 1. Department of Biostatistics, School of Public Health. 2. Jiangsu Key Laboratory of Preventive and Translational Medicine for Geriatric Diseases. 3. Center for Genetic Epidemiology and Genomics, Medical College of Soochow University, Suzhou 215123, China. 4. Department of Biostatistics, University of Alabama at Birmingham, Birmingham, AL 35294, USA. 5. Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC 28223, USA. 6. Department of Bioinformatics. 7. Department of Cell Biology, School of Biology & Basic Medical Science, Soochow University, Suzhou 215123, China.
Abstract
Motivation: Large-scale molecular data have been increasingly used as an important resource for prognostic prediction of diseases and detection of associated genes. However, standard approaches for omics data analysis ignore the group structure among genes encoded in functional relationships or pathway information. Results: We propose new Bayesian hierarchical generalized linear models, called group spike-and-slab lasso GLMs, for predicting disease outcomes and detecting associated genes by incorporating large-scale molecular data and group structures. The proposed model employs a mixture double-exponential prior for coefficients that induces a self-adaptive amount of shrinkage on each coefficient. The group information is incorporated into the model by setting group-specific parameters. We have developed a fast and stable deterministic algorithm to fit the proposed hierarchical GLMs, which can perform variable selection within groups. We assess the performance of the proposed method in several simulated scenarios, varying the overlap among groups, the group size, the number of non-null groups, and the within-group correlation. Compared with existing methods, the proposed method provides not only more accurate estimates of the parameters but also better prediction. We further demonstrate the application of the proposed procedure on three cancer datasets by utilizing pathway structures of genes. Our results show that the proposed method generates powerful models for predicting disease outcomes and detecting associated genes. Availability and implementation: The methods have been implemented in the freely available R package BhGLM (http://www.ssg.uab.edu/bhglm/). Contact: nyi@uab.edu. Supplementary information: Supplementary data are available at Bioinformatics online.
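To illustrate how a mixture double-exponential prior can produce self-adaptive shrinkage with group-specific parameters, the following is a minimal Python sketch. It is not the BhGLM implementation. It assumes the usual spike-and-slab lasso form, in which each coefficient is drawn from either a narrow "spike" double-exponential (scale s0) or a wide "slab" double-exponential (scale s1), with a group-specific inclusion probability theta. The function names, the scale values, and the restriction to non-overlapping groups are all assumptions made for illustration.

```python
import math


def de_density(beta, scale):
    """Double-exponential (Laplace) density centered at 0 with the given scale."""
    return math.exp(-abs(beta) / scale) / (2.0 * scale)


def slab_probability(beta, theta, s0, s1):
    """Conditional probability that beta belongs to the slab (scale s1).

    theta is the prior inclusion probability; s0 < s1 are the assumed
    spike and slab scales.
    """
    slab = theta * de_density(beta, s1)
    spike = (1.0 - theta) * de_density(beta, s0)
    return slab / (slab + spike)


def adaptive_penalty(beta, theta, s0, s1):
    """Expected inverse scale, used as the L1 penalty weight for beta.

    Coefficients that look like noise get a weight near 1/s0 (strong
    shrinkage); coefficients that look like signal get a weight near 1/s1.
    """
    p = slab_probability(beta, theta, s0, s1)
    return (1.0 - p) / s0 + p / s1


def update_group_thetas(betas, groups, thetas, s0, s1):
    """Update each group's inclusion probability (hypothetical helper).

    Each group's theta is set to the mean slab probability of its members.
    Assumes non-overlapping groups given as {group_name: [coefficient indices]}.
    """
    updated = {}
    for name, members in groups.items():
        probs = [slab_probability(betas[j], thetas[name], s0, s1) for j in members]
        updated[name] = sum(probs) / len(probs)
    return updated


if __name__ == "__main__":
    s0, s1 = 0.05, 1.0  # illustrative scales, not values from the paper
    betas = [0.8, 1.2, 0.01, -0.02]
    groups = {"pathway_a": [0, 1], "pathway_b": [2, 3]}
    thetas = {"pathway_a": 0.5, "pathway_b": 0.5}

    for j, b in enumerate(betas):
        print(j, b, round(adaptive_penalty(b, 0.5, s0, s1), 3))
    print(update_group_thetas(betas, groups, thetas, s0, s1))
```

In this sketch, a group whose members mostly have large coefficients ends up with a higher inclusion probability, which in turn lowers the penalty on every coefficient in that group at the next iteration. That is one way group information can inform shrinkage while still allowing individual coefficients within a group to be shrunk toward zero.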