
Improved two-stage model averaging for high-dimensional linear regression, with application to Riboflavin data analysis.

Juming Pan

Abstract

BACKGROUND: Model averaging has attracted increasing attention in recent years for the analysis of high-dimensional data. By suitably weighting several competing statistical models, model averaging aims to achieve stable and improved prediction. In this paper, we develop a two-stage model averaging procedure to enhance the accuracy and stability of prediction in high-dimensional linear regression. First, we employ a high-dimensional variable selection method such as LASSO to screen out redundant predictors and construct a class of candidate models; then we apply jackknife cross-validation to optimize the model weights for averaging.
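The two stages described above can be illustrated with a minimal NumPy sketch. The abstract does not say exactly how the candidate models are built from the LASSO output, so the construction here is an assumption: nested models that add the LASSO-selected predictors in decreasing order of absolute coefficient, capped at a hypothetical max_size so every candidate stays low-dimensional. Each candidate is then fitted by ordinary least squares with an intercept (also an assumption), and its jackknife (leave-one-out) residuals are obtained from the standard identity e_i / (1 - h_ii) rather than by refitting n times. The names lasso_cd, candidate_models and loo_residuals are illustrative only.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate-descent LASSO: minimizes (1/(2n))||y - Xb||^2 + lam*||b||_1.
    Assumes X and y are centered (no intercept is fitted here)."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y - X @ b                      # current full residual
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]        # partial residual without predictor j
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]
    return b

def candidate_models(X, y, lam, max_size=10):
    """Stage 1 (hypothetical construction): nested candidate models built from the
    LASSO-selected predictors, added in decreasing order of |coefficient|."""
    Xc = X - X.mean(axis=0)
    b = lasso_cd(Xc, y - y.mean(), lam)
    order = [j for j in np.argsort(-np.abs(b)) if b[j] != 0][:max_size]
    return [order[:k] for k in range(1, len(order) + 1)]

def loo_residuals(X, y, models):
    """Stage 2 input: jackknife residuals of an OLS fit (with intercept) for each
    candidate model, via e_i / (1 - h_ii). Returns an n x M matrix.
    Requires each candidate to have fewer than n - 1 predictors so that h_ii < 1."""
    cols = []
    for idx in models:
        Xm = np.column_stack([np.ones(len(y)), X[:, idx]])
        H = Xm @ np.linalg.pinv(Xm)    # hat (projection) matrix of candidate m
        e = y - H @ y                  # ordinary in-sample residuals
        cols.append(e / (1.0 - np.diag(H)))
    return np.column_stack(cols)
```

Because every candidate has at most max_size predictors, each hat matrix is computed in a low-dimensional setting even when the full design has thousands of columns.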
RESULTS: In simulation studies, the proposed technique outperforms commonly used alternative methods in high-dimensional regression settings, achieving a smaller mean squared prediction error. We apply the proposed method to a riboflavin dataset; the results show that the method is quite efficient in forecasting the riboflavin production rate when there are thousands of genes and only tens of subjects.
CONCLUSIONS: Compared with a recent high-dimensional model averaging procedure (Ando and Li in J Am Stat Assoc 109:254-65, 2014), the proposed approach enjoys three appealing features and thus has better predictive performance: (1) more suitable methods are applied for model construction and weighting; (2) computational flexibility is retained, since each candidate model and its corresponding weight are determined in a low-dimensional setting and quadratic programming is used in the cross-validation; (3) model selection and averaging are combined in one procedure, which makes full use of the strengths of both techniques. As a consequence, the proposed method can achieve stable and accurate predictions in high-dimensional linear models and can greatly help practical researchers analyze genetic data in medical research.
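The weighting step in point (2) is a quadratic program. If column m of an n x M matrix E holds the jackknife residuals of candidate model m, a jackknife criterion of the usual form is CV(w) = (1/n) ||E w||^2, minimized over weights with w_m >= 0 and sum_m w_m = 1. The sketch below is a minimal illustration under that assumption. It swaps in projected gradient descent with a sort-based Euclidean projection onto the simplex instead of a dedicated QP solver, so it is not necessarily the solver used in the paper. The names jackknife_weights and project_simplex are hypothetical.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]   # last index meeting the condition
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def jackknife_weights(E, n_iter=5000):
    """Minimize CV(w) = (1/n) w' E'E w over the probability simplex.
    Column m of E is assumed to hold the leave-one-out residuals of candidate m."""
    n, M = E.shape
    Q = E.T @ E / n                                      # quadratic-form matrix
    step = 1.0 / (2 * np.linalg.eigvalsh(Q).max() + 1e-12)  # 1 / Lipschitz const of gradient 2Qw
    w = np.full(M, 1.0 / M)                              # start from equal weights
    for _ in range(n_iter):
        w = project_simplex(w - step * 2 * Q @ w)        # gradient step, then project
    return w
```

The averaged prediction is then the weighted sum of the candidate models' predictions. Since the problem has only M variables (one per candidate), it stays small regardless of the original number of predictors.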

Keywords:  Cross-validation; High-dimensional regression; Jackknife; Model averaging; Variable selection

Year:  2021        PMID: 33765925      PMCID: PMC7992957          DOI: 10.1186/s12859-021-04053-3

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


