
Gene selection in cancer classification using sparse logistic regression with Bayesian regularization.

Gavin C Cawley, Nicola L C Talbot.

Abstract

MOTIVATION: Gene selection algorithms for cancer classification, based on the expression of a small number of biomarker genes, have been the subject of considerable research in recent years. Shevade and Keerthi propose a gene selection algorithm based on sparse logistic regression (SLogReg) incorporating a Laplace prior to promote sparsity in the model parameters, and provide a simple but efficient training procedure. The degree of sparsity obtained is determined by the value of a regularization parameter, which must be carefully tuned in order to optimize performance. This normally involves a model selection stage, based on a computationally intensive search for the minimizer of the cross-validation error. In this paper, we demonstrate that a simple Bayesian approach can be taken to eliminate this regularization parameter entirely, by integrating it out analytically using an uninformative Jeffreys prior. The improved algorithm (BLogReg) is then typically two or three orders of magnitude faster than the original algorithm, as there is no longer a need for a model selection step. The BLogReg algorithm is also free from selection bias in performance estimation, a common pitfall in the application of machine learning algorithms in cancer classification.
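The integration step described above can be sketched as follows. The notation is an assumption introduced here for illustration and does not appear in the abstract: $E_{\mathcal{D}}$ is the negative log-likelihood of the logistic regression model, $E_{\mathcal{W}}$ is the L1 norm of the weights, $N$ is the number of active (non-zero) weights, and $\alpha$ is the regularization parameter. This is a sketch of the standard argument for marginalizing a Laplace prior's scale under a Jeffreys prior; the full paper should be consulted for the authors' exact treatment.

```latex
\begin{align}
  % Penalized criterion with an explicit regularization parameter
  M &= E_{\mathcal{D}} + \alpha E_{\mathcal{W}},
  \qquad E_{\mathcal{W}} = \sum_{i=1}^{N} |w_i| \\
  % Laplace prior on the weights, Jeffreys prior on the scale
  p(\mathbf{w} \mid \alpha) &= \left(\frac{\alpha}{2}\right)^{N}
      \exp\left(-\alpha E_{\mathcal{W}}\right),
  \qquad p(\alpha) \propto \frac{1}{\alpha} \\
  % Integrate the regularization parameter out analytically
  p(\mathbf{w}) &= \int_{0}^{\infty} p(\mathbf{w} \mid \alpha)\, p(\alpha)\, d\alpha
      \;\propto\; \frac{1}{2^{N}} \int_{0}^{\infty} \alpha^{N-1}
      e^{-\alpha E_{\mathcal{W}}}\, d\alpha
      = \frac{\Gamma(N)}{\left(2 E_{\mathcal{W}}\right)^{N}} \\
  % Resulting criterion has no free regularization parameter
  -\log p(\mathbf{w}) &= N \log E_{\mathcal{W}} + \text{const}
  \quad\Longrightarrow\quad
  L = E_{\mathcal{D}} + N \log E_{\mathcal{W}}
\end{align}
```

Under these assumptions the criterion $L$ contains no $\alpha$, so nothing remains to be tuned by cross-validation.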
RESULTS: The SLogReg, BLogReg and Relevance Vector Machine (RVM) gene selection algorithms are evaluated over the well-studied colon cancer and leukaemia benchmark datasets. The leave-one-out estimates of the probability of test error and cross-entropy of the BLogReg and SLogReg algorithms are very similar; however, the BLogReg algorithm is found to be considerably faster than the original SLogReg algorithm. Using nested cross-validation to avoid selection bias, performance estimation for SLogReg on the leukaemia dataset takes almost 48 h, whereas the corresponding result for BLogReg is obtained in only 1 min 24 s, making BLogReg by far the more practical algorithm. BLogReg also demonstrates better estimates of conditional probability than the RVM, which are of great importance in medical applications, with similar computational expense.
AVAILABILITY: A MATLAB implementation of the sparse logistic regression algorithm with Bayesian regularization (BLogReg) is available from http://theoval.cmp.uea.ac.uk/~gcc/cbl/blogreg/
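The nested cross-validation mentioned in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' MATLAB implementation: the function names `loo_error`, `nested_loo_error` and `fit_knn` are hypothetical, and a toy one-dimensional k-nearest-neighbour classifier stands in for SLogReg, with `k` playing the role of the regularization parameter. The point it shows is that the tuning search must be repeated inside every outer fold, which is where the extra cost comes from.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[float], int]
Fitter = Callable[[List[float], List[int], int], Predictor]


def fit_knn(train_x: List[float], train_y: List[int], k: int) -> Predictor:
    """Toy stand-in classifier (assumption): majority vote of the k nearest points."""
    def predict(x: float) -> int:
        nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))[:k]
        votes = sum(label for _, label in nearest)
        return 1 if 2 * votes > len(nearest) else 0
    return predict


def loo_error(xs: List[float], ys: List[int], fit: Fitter, param: int) -> float:
    """Plain leave-one-out error rate for one fixed hyperparameter value."""
    errors = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        errors += int(fit(train_x, train_y, param)(xs[i]) != ys[i])
    return errors / len(xs)


def nested_loo_error(xs: List[float], ys: List[int], fit: Fitter,
                     grid: Sequence[int]) -> float:
    """Outer loop estimates performance; inner loop tunes the hyperparameter."""
    errors = 0
    for i in range(len(xs)):
        train_x, train_y = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        # Model selection sees only the outer training fold, so the held-out
        # point never influences the choice of hyperparameter.
        best = min(grid, key=lambda p: loo_error(train_x, train_y, fit, p))
        errors += int(fit(train_x, train_y, best)(xs[i]) != ys[i])
    return errors / len(xs)


# Toy data (made up for illustration): two well-separated classes.
toy_x = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]
toy_y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
toy_error = nested_loo_error(toy_x, toy_y, fit_knn, grid=[1, 3, 5])
```

With n samples and a grid of g candidate values, this sketch fits on the order of n·(n−1)·g models. A method with no hyperparameter to tune needs only the n outer fits, which is consistent with the large speed difference reported above.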

Year:  2006        PMID: 16844704     DOI: 10.1093/bioinformatics/btl386

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  20 in total

1.  Sparse feature selection identifies H2A.Z as a novel, pattern-specific biomarker for asymmetrically self-renewing distributed stem cells.

Authors:  Yang Hoon Huh; Minsoo Noh; Frank R Burden; Jennifer C Chen; David A Winkler; James L Sherley
Journal:  Stem Cell Res       Date:  2015-01-06       Impact factor: 2.020

2.  Spectral organization of the human lateral superior temporal gyrus revealed by intracranial recordings.

Authors:  Kirill V Nourski; Mitchell Steinschneider; Hiroyuki Oya; Hiroto Kawasaki; Robert D Jones; Matthew A Howard
Journal:  Cereb Cortex       Date:  2012-10-09       Impact factor: 5.357

3.  Big Data Analytical Approaches to the NACC Dataset: Aiding Preclinical Trial Enrichment.

Authors:  Ming Lin; Pinghua Gong; Tao Yang; Jieping Ye; Roger L Albin; Hiroko H Dodge
Journal:  Alzheimer Dis Assoc Disord       Date:  2018 Jan-Mar       Impact factor: 2.703

4.  Sparse Bayesian classification and feature selection for biological expression data with high correlations.

Authors:  Xian Yang; Wei Pan; Yike Guo
Journal:  PLoS One       Date:  2017-12-27       Impact factor: 3.240

5.  L1 penalized continuation ratio models for ordinal response prediction using high-dimensional datasets.

Authors:  K J Archer; A A A Williams
Journal:  Stat Med       Date:  2012-02-23       Impact factor: 2.373

6.  Error margin analysis for feature gene extraction.

Authors:  Chi Kin Chow; Hai Long Zhu; Jessica Lacy; Winston P Kuo
Journal:  BMC Bioinformatics       Date:  2010-05-11       Impact factor: 3.169

7.  ccSVM: correcting Support Vector Machines for confounding factors in biological data classification.

Authors:  Limin Li; Barbara Rakitsch; Karsten Borgwardt
Journal:  Bioinformatics       Date:  2011-07-01       Impact factor: 6.937

8.  Polytomy identification in microbial phylogenetic reconstruction.

Authors:  Guan Ning Lin; Chao Zhang; Dong Xu
Journal:  BMC Syst Biol       Date:  2011-12-23

9.  Sparse logistic regression with a L1/2 penalty for gene selection in cancer classification.

Authors:  Yong Liang; Cheng Liu; Xin-Ze Luan; Kwong-Sak Leung; Tak-Ming Chan; Zong-Ben Xu; Hai Zhang
Journal:  BMC Bioinformatics       Date:  2013-06-19       Impact factor: 3.169

10.  On sparse Fisher discriminant method for microarray data analysis.

Authors:  Eric S Fung; Michael K Ng
Journal:  Bioinformation       Date:  2007-12-30