Ai-Jun Yang¹, Xin-Yuan Song
¹Department of Statistics, The Chinese University of Hong Kong, Hong Kong, PR China. ajyang81@gmail.com
Abstract
MOTIVATION: An important application of gene expression microarray data is the classification of samples into categories. Accurate classification depends on the method used to identify the most relevant genes. Because the number of genes is large and the sample size is relatively small, the selection process can be unstable, so existing methods need to be modified to analyze microarray data better. RESULTS: We propose a Bayesian stochastic variable selection approach for gene selection based on a probit regression model with a generalized singular g-prior distribution for the regression coefficients. We implement an efficient and dependable algorithm that uses simulation-based Markov chain Monte Carlo (MCMC) methods to draw parameters from the posterior distribution. We also show that the algorithm is robust to the choice of initial values and that it produces posterior probabilities of related genes for biological interpretation. We compare the performance of the proposed approach with that of other popular gene selection and classification methods on the well-known colon cancer and leukemia datasets from the microarray literature. AVAILABILITY: Free Matlab code for performing gene selection is available at http://www.sta.cuhk.edu.hk/xysong/geneselection/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
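To make the ingredients named in RESULTS concrete (a probit link, a g-prior on the coefficients of the selected genes, and MCMC over gene-inclusion indicators), here is a minimal Python sketch. It is not the authors' Matlab code, and everything beyond those ingredients is an assumption. It uses the Albert-Chib latent-variable augmentation for the probit model. In place of the generalized singular g-prior it uses a g-prior with covariance c·(X_γᵀX_γ)⁺ built from the Moore-Penrose pseudoinverse, our stand-in that stays defined when a selected set has more genes than samples. It also places independent Bernoulli(pi_incl) priors on the indicators, omits an intercept, and defaults to c = n. The function `probit_gprior_selection` and its parameters are hypothetical names chosen for illustration.

```python
import math
from statistics import NormalDist

import numpy as np

_STD_NORMAL = NormalDist()  # standard normal, used for its inverse CDF


def _phi(x: float) -> float:
    """Standard normal CDF written with erfc so the far tails do not round to 0."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def _sigmoid(t: float) -> float:
    """Overflow-safe logistic function, used to turn log-odds into a probability."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def _col_basis(Xg: np.ndarray) -> np.ndarray:
    """Orthonormal basis U of the column space of Xg, so that P = U @ U.T."""
    if Xg.shape[1] == 0:
        return np.zeros((Xg.shape[0], 0))
    U, s, _ = np.linalg.svd(Xg, full_matrices=False)
    tol = s.max() * max(Xg.shape) * np.finfo(float).eps
    return U[:, s > tol]


def _log_marginal(z, X, gamma, c):
    """log p(z | gamma) up to a constant, with beta_gamma ~ N(0, c * pinv(Xg'Xg))
    integrated out, so z ~ N(0, I + c P_gamma) even when rank(Xg) < #columns."""
    U = _col_basis(X[:, gamma])
    proj = U.T @ z  # coordinates of P_gamma z in the basis U
    return -0.5 * U.shape[1] * math.log1p(c) + 0.5 * c / (1.0 + c) * float(proj @ proj)


def probit_gprior_selection(X, y, c=None, pi_incl=0.1, n_iter=400, burn_in=100, seed=0):
    """Return estimated posterior inclusion probabilities for the columns (genes) of X."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    c = float(n) if c is None else float(c)   # c = n: a common default, assumed here
    k = c / (1.0 + c)                          # shrinkage factor of the g-prior
    prior_log_odds = math.log(pi_incl / (1.0 - pi_incl))
    gamma = np.zeros(p, dtype=bool)            # start from the empty model
    z = np.where(y == 1, 1.0, -1.0)            # latent values with the signs implied by y
    counts = np.zeros(p)
    for it in range(n_iter):
        # 1) Gibbs sweep over inclusion indicators with beta integrated out.
        for j in range(p):
            gamma[j] = True
            lm_in = _log_marginal(z, X, gamma, c)
            gamma[j] = False
            lm_out = _log_marginal(z, X, gamma, c)
            gamma[j] = rng.random() < _sigmoid(lm_in - lm_out + prior_log_odds)
        # 2) Draw eta = X_gamma beta_gamma given z and gamma: eta ~ N(k P z, k P).
        U = _col_basis(X[:, gamma])
        eta = U @ (k * (U.T @ z) + math.sqrt(k) * rng.standard_normal(U.shape[1]))
        # 3) Albert-Chib step: z_i ~ N(eta_i, 1) truncated to z_i > 0 if y_i = 1 and
        #    z_i < 0 otherwise, drawn by inverting the truncated standard normal CDF.
        for i in range(n):
            s = 1.0 if y[i] == 1 else -1.0
            u = max(rng.random() * _phi(s * eta[i]), 1e-300)  # floor guards against underflow
            z[i] = eta[i] - s * _STD_NORMAL.inv_cdf(u)
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)
```

A call such as `probit_gprior_selection(X, y)` on an n × p expression matrix with 0/1 labels returns one inclusion frequency per gene, which plays the role of the posterior gene probabilities mentioned above. Integrating β out when each indicator is updated means every candidate model costs one SVD. Drawing η = X_γβ_γ instead of β avoids handling the singular prior covariance directly.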