Logistic regression for disease classification using microarray data: model selection in a large p and small n case.

J G Liao, Khew-Voon Chin.

Abstract

MOTIVATION: Logistic regression is a standard method for building prediction models for a binary outcome and has been extended for disease classification with microarray data by many authors. A feature (gene) selection step, however, must be added to penalized logistic modeling because of the large number of genes and the small number of subjects. Model selection for this two-step approach requires new statistical tools because prediction error estimation that ignores the feature selection step can be severely downward biased. Generic methods such as cross-validation and non-parametric bootstrap can be very ineffective because of the high variability in the prediction error estimate.
RESULTS: We propose a parametric bootstrap model for more accurate estimation of the prediction error that is tailored to the microarray data by borrowing from the extensive research in identifying differentially expressed genes, especially the local false discovery rate. The proposed method provides guidance on the two critical issues in model selection: the number of genes to include in the model and the optimal shrinkage for the penalized logistic regression. We show that selecting more than 20 genes usually helps little in further reducing the prediction error. Application to Golub's leukemia data and our own cervical cancer data leads to highly accurate prediction models.
AVAILABILITY: R library GeneLogit at http://geocities.com/jg_liao
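
The downward bias described under MOTIVATION can be illustrated with a small simulation. The sketch below is not the authors' parametric bootstrap and does not use GeneLogit. It only compares two cross-validation estimates on pure-noise data, where the true error rate is 0.5: one that selects genes once on all samples (ignoring the selection step), and one that repeats selection inside each fold. All function names, the t-like ranking score, the ridge penalty, and the sizes (40 subjects, 1000 genes, 10 selected genes) are assumptions chosen for illustration.

```python
import math
import random

def top_genes(X, y, k):
    """Return indices of the k genes with the largest absolute t-like score."""
    def score(j):
        a = [row[j] for row, lab in zip(X, y) if lab == 1]
        b = [row[j] for row, lab in zip(X, y) if lab == 0]
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        va = sum((v - ma) ** 2 for v in a) / (len(a) - 1)
        vb = sum((v - mb) ** 2 for v in b) / (len(b) - 1)
        return abs(ma - mb) / math.sqrt(va / len(a) + vb / len(b) + 1e-12)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def fit_ridge_logistic(X, y, lam=0.1, steps=200, lr=0.1):
    """Gradient descent on mean logistic loss plus (lam/2)*||w||^2 (assumed penalty)."""
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * k, 0.0
        for row, lab in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            z = max(-30.0, min(30.0, z))          # guard against overflow in exp
            r = 1.0 / (1.0 + math.exp(-z)) - lab  # residual: p_hat - y
            gb += r
            for j in range(k):
                gw[j] += r * row[j]
        b -= lr * gb / n                          # intercept is not penalized
        w = [wj - lr * (g / n + lam * wj) for wj, g in zip(w, gw)]
    return w, b

def predict(model, row):
    w, b = model
    return 1 if b + sum(wj * xj for wj, xj in zip(w, row)) > 0 else 0

def cv_error(X, y, k_genes, folds, select_inside):
    """K-fold CV error; gene selection is redone per fold only if select_inside."""
    n = len(X)
    # Selecting on all samples lets the held-out labels leak into the model.
    fixed = None if select_inside else top_genes(X, y, k_genes)
    errors = 0
    for f in range(folds):
        test = [i for i in range(n) if i % folds == f]
        train = [i for i in range(n) if i % folds != f]
        Xtr, ytr = [X[i] for i in train], [y[i] for i in train]
        genes = top_genes(Xtr, ytr, k_genes) if select_inside else fixed
        model = fit_ridge_logistic([[r[j] for j in genes] for r in Xtr], ytr)
        errors += sum(predict(model, [X[i][j] for j in genes]) != y[i] for i in test)
    return errors / n

# Pure-noise "expression" data: no gene is truly related to the class label.
random.seed(0)
n, p = 40, 1000
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [1] * 20 + [0] * 20
random.shuffle(y)

naive = cv_error(X, y, k_genes=10, folds=5, select_inside=False)
honest = cv_error(X, y, k_genes=10, folds=5, select_inside=True)
print("selection outside CV:", naive)
print("selection inside CV: ", honest)
```

Because the labels are random, any estimate well below 0.5 is optimistic. The estimate with selection outside the loop is expected to look far better than the one with selection inside the loop, which is the bias the abstract refers to. The abstract's further point, that even the properly nested estimate is highly variable at this sample size, is what motivates the parametric bootstrap and is not reproduced here.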

Year:  2007        PMID: 17540680     DOI: 10.1093/bioinformatics/btm287

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  31 in total

1.  Penalized model-based clustering with cluster-specific diagonal covariance matrices and grouped variables.

Authors:  Benhuai Xie; Wei Pan; Xiaotong Shen
Journal:  Electron J Stat       Date:  2008       Impact factor: 1.125

2.  Extracting causal relations on HIV drug resistance from literature.

Authors:  Quoc-Chinh Bui; Breanndán O Nualláin; Charles A Boucher; Peter M A Sloot
Journal:  BMC Bioinformatics       Date:  2010-02-23       Impact factor: 3.169

3.  An EEG-based machine learning method to screen alcohol use disorder.

Authors:  Wajid Mumtaz; Pham Lam Vuong; Likun Xia; Aamir Saeed Malik; Rusdi Bin Abd Rashid
Journal:  Cogn Neurodyn       Date:  2016-10-24       Impact factor: 5.082

4.  Folded concave penalized learning in identifying multimodal MRI marker for Parkinson's disease.

Authors:  Hongcheng Liu; Guangwei Du; Lijun Zhang; Mechelle M Lewis; Xue Wang; Tao Yao; Runze Li; Xuemei Huang
Journal:  J Neurosci Methods       Date:  2016-04-19       Impact factor: 2.390

5.  Statistical analysis and modeling of mass spectrometry-based metabolomics data.

Authors:  Bowei Xi; Haiwei Gu; Hamid Baniasadi; Daniel Raftery
Journal:  Methods Mol Biol       Date:  2014

6.  Penalized model-based clustering with unconstrained covariance matrices.

Authors:  Hui Zhou; Wei Pan; Xiaotong Shen
Journal:  Electron J Stat       Date:  2009-01-01       Impact factor: 1.125

7.  Utilizing ECG-Based Heartbeat Classification for Hypertrophic Cardiomyopathy Identification.

Authors:  Quazi Abidur Rahman; Larisa G Tereshchenko; Matthew Kongkatong; Theodore Abraham; M Roselle Abraham; Hagit Shatkay
Journal:  IEEE Trans Nanobioscience       Date:  2015-04-24       Impact factor: 2.935

8.  Multi-TGDR: a regularization method for multi-class classification in microarray experiments.

Authors:  Suyan Tian; Mayte Suárez-Fariñas
Journal:  PLoS One       Date:  2013-11-19       Impact factor: 3.240

9.  Review of multi-omics data resources and integrative analysis for human brain disorders. (Review)

Authors:  Xianjun Dong; Chunyu Liu; Mikhail Dozmorov
Journal:  Brief Funct Genomics       Date:  2021-07-17       Impact factor: 4.241

10.  Lung cancer gene expression database analysis incorporating prior knowledge with support vector machine-based classification method.

Authors:  Peng Guan; Desheng Huang; Miao He; Baosen Zhou
Journal:  J Exp Clin Cancer Res       Date:  2009-07-18