A. Bartonicek, S. R. Wickham, N. Pat, T. S. Conner.
Abstract
BACKGROUND: Variable selection is an important issue in many fields, such as public health and psychology. Researchers often gather data on many variables of interest and are then faced with two challenging goals: building an accurate model with few predictors, and making probabilistic statements (inference) about this model. Unfortunately, it is currently difficult to attain these goals with the two most popular variable selection methods: stepwise selection and the LASSO. The aim of the present study was to demonstrate the use of predictive projection feature selection - a novel Bayesian variable selection method that delivers both predictive power and inference. We apply predictive projection to a sample of New Zealand young adults, use it to build a compact model for predicting well-being, and compare it to other variable selection methods.
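The core idea described in the abstract - fit one full "reference" model, then search for a small submodel whose predictions stay close to the reference model's predictions - can be sketched in a few lines. The following is an illustrative numpy-only toy, not the study's actual analysis (which used a Bayesian reference model with 28 predictors): a ridge fit stands in for the reference posterior mean, the data are simulated, and for a Gaussian model the projection of the reference predictions onto a candidate submodel reduces to an ordinary least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (hypothetical, not the study's): 3 true predictors out of 10
n, p = 200, 10
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([1.0, -0.8, 0.5]) + rng.normal(scale=0.5, size=n)

# "Reference model": a ridge fit stands in for the full Bayesian posterior mean
lam = 1.0
beta_ref = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
mu_ref = X @ beta_ref  # reference model's in-sample predictions

# Forward search: at each step, add the predictor whose projection
# (least-squares fit of mu_ref on the submodel's columns) best
# reproduces the reference predictions.
selected, remaining, path = [], list(range(p)), []
for _ in range(p):
    best_j, best_rmse = None, np.inf
    for j in remaining:
        Xs = X[:, selected + [j]]
        b = np.linalg.lstsq(Xs, mu_ref, rcond=None)[0]
        rmse = np.sqrt(np.mean((Xs @ b - mu_ref) ** 2))
        if rmse < best_rmse:
            best_j, best_rmse = j, rmse
    selected.append(best_j)
    remaining.remove(best_j)
    path.append((best_j, best_rmse))

print(path)  # projection error shrinks as predictors enter the submodel
```

The resulting trajectory of projection error versus submodel size is the kind of curve shown in Fig. 2a; in practice one stops once the submodel's predictive performance is close enough to the reference model's (e.g. within one standard error).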
Keywords: Diet; Exercise; Health behaviors; Health habits; Inference; Prediction; Psychological well-being; Sleep; Variable selection; Young adults
Year: 2021 PMID: 33836714 PMCID: PMC8033696 DOI: 10.1186/s12889-021-10690-3
Source DB: PubMed Journal: BMC Public Health ISSN: 1471-2458 Impact factor: 3.295
Fig. 1 Flow diagram of the data selection procedure
Fig. 2 Predictive projection feature selection trajectory and scatterplot of reference model’s vs. submodel’s predictions. a) Change in ELPD / decrease in RMSE as more predictors entered the submodel. b) Average daily flourishing predicted by the submodel (3 predictors) vs. the average daily flourishing predicted by the reference model (28 predictors; both predicting training data)
Fig. 3 Credible intervals for predictors in the submodel and scatterplot of submodel’s predictions vs. observed values. a) Marginal posterior distributions of predictors selected for the submodel (in order: felt refreshed after waking up today, had trouble concentrating today, servings of fruit today). b) Average daily flourishing predicted by the submodel vs. observed daily flourishing (unseen test data), with overlaid least squares fit
Comparison of variable selection methods
| Model | R² | RMSE | # of selected predictors | Selected predictors |
|---|---|---|---|---|
| Reference model | 0.331 | 0.858 | (28) | – |
| Freq. multiple regression | 0.332 | 0.858 | (28) | – |
| Projected submodel (1 SE) | 0.253 | 0.883 | 3 | Felt refreshed after waking up today, had trouble concentrating today, servings of fruit today |
| Projected submodel (matched) | 0.284 | 0.864 | 6 | Felt refreshed after waking up today, had trouble concentrating today, servings of fruit today, servings of soft drink last night, servings of vegetables today, gender: female |
| Stepwise selection (AIC) | 0.315 | 0.872 | 10 | Felt refreshed after waking up today, ethnicity: Asian, had trouble concentrating today, gender: female, servings of soft drink last night, servings of sweets today, servings of sweets last night, felt tired today, servings of fruit today, BMI |
| Stepwise selection (p-values) | 0.275 | 0.871 | 8 | Felt refreshed after waking up today, had trouble concentrating today, gender: female, servings of sweets today, felt tired today, servings of sweets last night, servings of fruit today, servings of soft drink last night |
| LASSO (1 SE) | 0.139 | 0.897 | 4 | Felt refreshed after waking up today, had trouble concentrating today, servings of fruit today, servings of soft drink last night |
| LASSO (min.) | 0.283 | 0.857 | 23 | – |
Summary statistics of the model selection strategies, showing test-data RMSE and Bayesian R², the number of selected predictors, and the names of the significant predictors (listed where 10 or fewer predictors were selected, ranked by absolute slope size)
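The table contrasts two common tuning rules for the LASSO: "min.", which picks the penalty minimizing mean cross-validation error, and "1 SE", which picks the sparsest (most regularized) model whose mean CV error is within one standard error of that minimum. A minimal numpy sketch of the rule, assuming a hypothetical matrix of per-fold CV errors with rows ordered from weakest to strongest regularization:

```python
import numpy as np

# Hypothetical CV errors: rows = candidate penalties (weakest to
# strongest regularization), columns = CV folds. Not the study's data.
cv_errors = np.array([
    [0.82, 0.80, 0.81, 0.79, 0.83],  # small penalty
    [0.89, 0.71, 0.83, 0.77, 0.80],  # minimum mean CV error
    [0.83, 0.81, 0.84, 0.82, 0.82],  # within 1 SE of the minimum
    [0.99, 0.93, 0.97, 0.95, 0.96],  # too much shrinkage
])

mean_err = cv_errors.mean(axis=1)
se = cv_errors.std(axis=1, ddof=1) / np.sqrt(cv_errors.shape[1])

i_min = int(np.argmin(mean_err))            # "LASSO (min.)" choice
threshold = mean_err[i_min] + se[i_min]
# "LASSO (1 SE)": most regularized model whose mean error is under the threshold
i_1se = int(np.max(np.where(mean_err <= threshold)[0]))

print(i_min, i_1se)  # → 1 2
```

As the table shows, the trade-off is real: the 1 SE rule yields far fewer predictors (4 vs. 23) at some cost in R² and RMSE, which is why the authors report both variants alongside the projected submodels.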