Efficient ℓ0-norm feature selection based on augmented and penalized minimization.

Xiang Li¹, Shanghong Xie², Donglin Zeng³, Yuanjia Wang².

Abstract

Advances in high-throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an ℓ0-penalty on the regression coefficients. Because this optimization is a nondeterministic polynomial-time hard (NP-hard) problem that does not scale with the number of biomarkers, alternative methods mostly place continuous penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximations of the ℓ0-norm (e.g., ℓ1) do not outperform their ℓ0 counterpart. Progress on ℓ0-norm feature selection has been relatively slow, and the main methods are greedy algorithms such as stepwise regression or orthogonal matching pursuit. Penalized regression that directly regularizes the ℓ0-norm remains much less explored in the literature. In this work, inspired by recently popular augmentation and data-splitting algorithms, including the alternating direction method of multipliers (ADMM), we propose a 2-stage procedure for ℓ0-penalty variable selection, referred to as augmented penalized minimization-L0 (APM-L0). APM-L0 targets the ℓ0-norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard-thresholding estimation. The procedure can be viewed as arising from regularized optimization with a truncated ℓ1-norm. Accordingly, we treat the regularization parameter and the thresholding parameter as tuning parameters and select them by cross-validation. A 1-step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency.
Through extensive simulation studies and a real data application, we demonstrate the superior performance of the proposed method in terms of selection accuracy and computational speed compared with existing methods. The proposed APM-L0 procedure is implemented in the R package APML0.
Copyright © 2017 John Wiley & Sons, Ltd.
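For reference, the ℓ0-penalized problem mentioned in the abstract, a truncated ℓ1 surrogate, and the hard-thresholding operator can be written as follows. This is a generic sketch rather than the paper's exact formulation: the squared-error loss, the symbols n, X, y, λ, τ, k, and the particular form min(|β_j|/τ, 1) of the truncated ℓ1 penalty are assumptions made here for illustration.

```latex
\[
\hat{\beta}_{\ell_0} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_0,
\qquad
\lVert \beta \rVert_0 = \#\{\, j : \beta_j \neq 0 \,\}
\]
\[
J_\tau(\beta) = \sum_{j=1}^{p} \min\!\left( \frac{\lvert \beta_j \rvert}{\tau},\, 1 \right),
\qquad
\lim_{\tau \to 0^{+}} J_\tau(\beta) = \lVert \beta \rVert_0
\]
\[
\bigl[ H_k(b) \bigr]_j =
\begin{cases}
  b_j, & \lvert b_j \rvert \text{ is among the } k \text{ largest of } \lvert b_1 \rvert, \dots, \lvert b_p \rvert,\\
  0, & \text{otherwise.}
\end{cases}
\]
```

Each term min(|β_j|/τ, 1) is 0 when β_j = 0 and tends to 1 for any fixed nonzero β_j as τ shrinks, so the surrogate approaches the ℓ0 count while staying continuous.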
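The abstract describes APM-L0 only at a high level, so the following is a simplified, single-pass stand-in and not the authors' algorithm or the APML0 package. It assumes squared-error loss with no intercept, uses an ordinary lasso fitted by cyclic coordinate descent as the convex first stage (the paper's augmentation, its iteration between stages, and its 1-step coordinate descent are not reproduced), hard-thresholds to the k largest coefficients, refits those without a penalty, and chooses (λ, k) by K-fold cross-validation. Treating the thresholding parameter as a count k is an assumption, and all function names are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical sketch: NOT the APM-L0 algorithm or the APML0 package.
# Assumes squared-error loss, centred y and X, and no intercept.
Matrix = List[List[float]]


def soft_threshold(z: float, gamma: float) -> float:
    """Soft-thresholding operator sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X: Matrix, y: Sequence[float], lam: float,
             active: Optional[Sequence[int]] = None,
             n_iter: int = 500, tol: float = 1e-10) -> List[float]:
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Coefficients outside `active` stay at zero; lam = 0 gives least
    squares restricted to the `active` columns."""
    n, p = len(X), len(X[0])
    cols = range(p) if active is None else active
    b = [0.0] * p
    resid = list(y)  # current residual y - X b (b starts at zero)
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in cols:
            if col_sq[j] == 0.0:
                continue
            # (1/n) x_j^T (y - X b + x_j b_j): correlation with the partial residual
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def hard_threshold_support(b: Sequence[float], k: int) -> List[int]:
    """Indices of the k coefficients with the largest |b_j|."""
    order = sorted(range(len(b)), key=lambda j: abs(b[j]), reverse=True)
    return sorted(order[:k])


def two_stage_fit(X: Matrix, y: Sequence[float], lam: float, k: int) -> List[float]:
    """Stage 1: convex lasso fit. Stage 2: keep the k largest coefficients
    and refit them without a penalty."""
    support = hard_threshold_support(lasso_cd(X, y, lam), k)
    return lasso_cd(X, y, 0.0, active=support)


def cv_select(X: Matrix, y: Sequence[float], lams: Sequence[float],
              ks: Sequence[int], n_folds: int = 5) -> Tuple[float, int]:
    """Return the (lam, k) pair with the smallest K-fold CV squared error."""
    n = len(X)
    folds = [list(range(f, n, n_folds)) for f in range(n_folds)]
    best, best_err = (lams[0], ks[0]), float("inf")
    for lam in lams:
        for k in ks:
            err = 0.0
            for test in folds:
                held_out = set(test)
                train = [i for i in range(n) if i not in held_out]
                b = two_stage_fit([X[i] for i in train], [y[i] for i in train], lam, k)
                err += sum((y[i] - sum(x * c for x, c in zip(X[i], b))) ** 2 for i in test)
            if err < best_err:
                best, best_err = (lam, k), err
    return best


# Usage on synthetic, noise-free data: y depends only on columns 0 and 3.
rng = random.Random(0)
X_demo = [[rng.gauss(0.0, 1.0) for _ in range(6)] for _ in range(40)]
y_demo = [3.0 * row[0] - 2.0 * row[3] for row in X_demo]
lam_hat, k_hat = cv_select(X_demo, y_demo, lams=[0.05, 0.1], ks=[1, 2])
beta_hat = two_stage_fit(X_demo, y_demo, lam_hat, k_hat)
```

Searching over k directly controls model size, mirroring the constraint ‖β‖₀ ≤ k, while the lasso penalty λ only influences which coefficients survive the thresholding step; the unpenalized refit removes the lasso's shrinkage from the retained coefficients.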

Keywords:  ADMM; biomarker signature; censored data; variable selection; ℓ0-penalty

Year: 2017; PMID: 29082539; PMCID: PMC5768461; DOI: 10.1002/sim.7526

Source DB: PubMed; Journal: Stat Med; ISSN: 0277-6715; Impact factor: 2.373


  4 in total

1.  Time-varying Hazards Model for Incorporating Irregularly Measured, High-Dimensional Biomarkers.

Authors:  Xiang Li; Quefeng Li; Donglin Zeng; Karen Marder; Jane Paulsen; Yuanjia Wang
Journal:  Stat Sin       Date:  2020-07       Impact factor: 1.261

2.  Continual reassessment method with regularization in phase I clinical trials.

Authors:  Xiang Li; Anastasia Ivanova; Hong Tian; Pilar Lim; Kevin Liu
Journal:  J Biopharm Stat       Date:  2020-09-14       Impact factor: 1.051

3.  Detecting prognostic biomarkers of breast cancer by regularized Cox proportional hazards models.

Authors:  Lingyu Li; Zhi-Ping Liu
Journal:  J Transl Med       Date:  2021-12-20       Impact factor: 5.531

4.  Identifying phenotype-associated subpopulations by integrating bulk and single-cell sequencing data.

Authors:  Duanchen Sun; Xiangnan Guan; Amy E Moran; Ling-Yun Wu; David Z Qian; Pepper Schedin; Mu-Shui Dai; Alexey V Danilov; Joshi J Alumkal; Andrew C Adey; Paul T Spellman; Zheng Xia
Journal:  Nat Biotechnol       Date:  2021-11-11       Impact factor: 68.164
