| Literature DB >> 29082539 |
Xiang Li1, Shanghong Xie2, Donglin Zeng3, Yuanjia Wang2.
Abstract
Advances in high-throughput technologies in genomics and imaging yield unprecedentedly large numbers of prognostic biomarkers. To accommodate the scale of biomarkers and study their association with disease outcomes, penalized regression is often used to identify important biomarkers. The ideal variable selection procedure would search for the best subset of predictors, which is equivalent to imposing an ℓ0 -penalty on the regression coefficients. Since this optimization is a nondeterministic polynomial-time hard (NP-hard) problem that does not scale with number of biomarkers, alternative methods mostly place smooth penalties on the regression parameters, which lead to computationally feasible optimization problems. However, empirical studies and theoretical analyses show that convex approximation of ℓ0 -norm (eg, ℓ1 ) does not outperform their ℓ0 counterpart. The progress for ℓ0 -norm feature selection is relatively slower, where the main methods are greedy algorithms such as stepwise regression or orthogonal matching pursuit. Penalized regression based on regularizing ℓ0 -norm remains much less explored in the literature. In this work, inspired by the recently popular augmenting and data splitting algorithms including alternating direction method of multipliers, we propose a 2-stage procedure for ℓ0 -penalty variable selection, referred to as augmented penalized minimization-L0 (APM-L0 ). The APM-L0 targets ℓ0 -norm as closely as possible while keeping computation tractable, efficient, and simple, which is achieved by iterating between a convex regularized regression and a simple hard-thresholding estimation. The procedure can be viewed as arising from regularized optimization with truncated ℓ1 norm. Thus, we propose to treat regularization parameter and thresholding parameter as tuning parameters and select based on cross-validation. A 1-step coordinate descent algorithm is used in the first stage to significantly improve computational efficiency. Through extensive simulation studies and real data application, we demonstrate superior performance of the proposed method in terms of selection accuracy and computational speed as compared to existing methods. The proposed APM-L0 procedure is implemented in the R-package APML0.Entities:
Keywords: ADMM; biomarker signature; censored data; variable selection; ℓ0-penalty
Mesh:
Substances:
Year: 2017 PMID: 29082539 PMCID: PMC5768461 DOI: 10.1002/sim.7526
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.373