Sander Greenland, Rhian Daniel, Neil Pearce.
Abstract
Controlling for too many potential confounders can lead to or aggravate problems of data sparsity or multicollinearity, particularly when the number of covariates is large in relation to the study size. As a result, methods to reduce the number of modelled covariates are often deployed. We review several traditional modelling strategies, including stepwise regression and the 'change-in-estimate' (CIE) approach to deciding which potential confounders to include in an outcome-regression model for estimating effects of a targeted exposure. We discuss their shortcomings, and then provide some basic alternatives and refinements that do not require special macros or programming. Throughout, we assume the main goal is to derive the most accurate effect estimates obtainable from the data and commercial software. Allowing that most users must stay within standard software packages, this goal can be roughly approximated using basic methods to assess, and thereby minimize, mean squared error (MSE).
Year: 2016 PMID: 27097747 PMCID: PMC4864881 DOI: 10.1093/ije/dyw040
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Associations of sodium and potassium intake at age 4 months with blood pressure (BP) at age 7 years
| Model | Exposure variables | Coefficient estimate | SE for coefficient | Coefficient bias estimate | Indicates bias | Indicates large collinearity | Root MSE estimate |
|---|---|---|---|---|---|---|---|
| 1 | Sodium | 0.518 | 0.290 | Referent | | | 0.290 |
| | Potassium | 0.099 | 0.095 | Referent | | | 0.095 |
| 2a | Sodium | 0.708 | 0.225 | 0.190 | Yes | Yes | 0.294 |
| 2b | Potassium | 0.206 | 0.074 | 0.107 | Yes | Yes | 0.130 |
*All analyses are adjusted for energy intake at 4 or 8 months, age at BP measurement, sex, socioeconomic position (maternal and paternal education), family social class, maternal age at childbirth, parity, birthweight, gestational age, breastfeeding, smoking during pregnancy, sodium intake at 7 years.
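The root MSE column is consistent with combining each reduced model's bias estimate (its coefficient minus the corresponding model 1 coefficient) with its standard error as the square root of bias squared plus SE squared. A minimal Python sketch of that arithmetic follows; the function name `root_mse` and the dictionary layout are illustrative choices, not something taken from the article.

```python
import math

def root_mse(bias: float, se: float) -> float:
    """Root mean squared error from a bias estimate and a standard error."""
    return math.sqrt(bias ** 2 + se ** 2)

# (bias estimate, SE) pairs transcribed from the table above.
models = {
    "2a sodium": (0.190, 0.225),
    "2b potassium": (0.107, 0.074),
}

for name, (bias, se) in models.items():
    print(f"{name}: root MSE = {root_mse(bias, se):.3f}")
```

For the referent model the bias estimate is zero by construction, so its root MSE reduces to its standard error, which matches the model 1 rows.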
Hypothetical results from rate regressions in which a covariate is or is not a confounder or a source of multicollinearity
| Model | Model variables | Exposure coefficient estimate | Rate ratio estimate | SE for coeff. | 95% CL | Coefficient bias estimate | Indicates bias? | Indicates strongly collinear? | Root MSE estimate | Collapsibility χ² and |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | X, W_1…W_J, U_1…U_H | 0.693 | 2.00 | 0.24 | 1.25, 3.20 | Referent | | | 0.24 | |
| 2a | X, W_1…W_J | 0.693 | 2.00 | 0.24 | 1.25, 3.20 | 0 | No | No | 0.24 | 0, |
| 2b | X, W_1…W_J | 1.099 | 3.00 | 0.20 | 2.03, 4.44 | 0.405 | Yes | No | 0.45 | 9.34, |
| 2c | X, W_1…W_J | 0.693 | 2.00 | 0.14 | 1.52, 2.63 | 0 | No | Yes | 0.14 | 0, |
| 2d | X, W_1…W_J | 1.099 | 3.00 | 0.14 | 2.28, 3.95 | 0.405 | Yes | Yes | 0.43 | 4.03, |
*Taking model 1 as the referent (‘gold standard’).
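The rate ratio and 95% CL columns are consistent with exponentiating the exposure coefficient and its Wald limits (coefficient ± 1.96 × SE). The sketch below reproduces that calculation for the tabulated rows; the helper name `ratio_with_wald_limits` and the use of 1.96 as the normal multiplier are assumptions of this example.

```python
import math

Z_975 = 1.96  # approximate 97.5th percentile of the standard normal

def ratio_with_wald_limits(coef: float, se: float) -> tuple:
    """Return (ratio, lower limit, upper limit) for a log-ratio coefficient."""
    return (
        math.exp(coef),
        math.exp(coef - Z_975 * se),
        math.exp(coef + Z_975 * se),
    )

# (coefficient, SE) pairs transcribed from the table above.
rows = {
    "1": (0.693, 0.24),
    "2b": (1.099, 0.20),
    "2c": (0.693, 0.14),
    "2d": (1.099, 0.14),
}

for model, (coef, se) in rows.items():
    ratio, lower, upper = ratio_with_wald_limits(coef, se)
    print(f"model {model}: RR = {ratio:.2f}, 95% CL = {lower:.2f}, {upper:.2f}")
```

Comparing models 2a and 2c shows the point of the table: the same coefficient with a smaller SE gives narrower limits and a smaller root MSE, with no change in the estimate itself.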
Model-adjusted associations of current unpasteurized milk consumption with current atopy status
| Model | Model variables | Exposure coefficient estimate | SE for coefficient | OR | 95% CL for OR | Estimated bias and RMSE | Bootstrap SE | Bootstrap 95% CL |
|---|---|---|---|---|---|---|---|---|
| 1 (basic) | Milk | 0.899 | 0.225 | 2.46 | 1.58, 3.82 | 0.516, 0.567 | 0.236 | 1.59, 3.97 |
| 2a (ML full) | Milk, All other variables | 0.406 | 0.257 | 1.50 | 0.91, 2.48 | 0.023, 0.262 | 0.261 | 0.89, 2.46 |
| 2b (Firth) | Milk, All other variables | 0.383 | 0.252 | 1.47 | 0.91, 2.40 | 0.000, 0.251 | 0.251 | 0.89, 2.37 |
| 3a (forwards stepwise, | Milk, Town, Firstborn, Current smoker, Town as a child, Parents farmers, Parents kept poultry, Parents kept horses | 0.390 | 0.244 | 1.48 | 0.91, 2.38 | 0.007, 0.261 | 0.261 | 0.87, 2.43 |
| 3b (forwards stepwise, | Milk, Town, Current smoker, Town as a child, Parents kept poultry | 0.383 | 0.243 | 1.47 | 0.91, 2.36 | <0.001, 0.261 | 0.261 | 0.88, 2.44 |
| 3c (backward stepwise, | Milk, Town, Firstborn, Current smoker, Parents farmers, Parents kept poultry, Parents kept horses | 0.398 | 0.244 | 1.49 | 0.92, 2.40 | 0.015, 0.261 | 0.261 | 0.88, 2.47 |
| 3d (backward stepwise, | Milk, Town, Current smoker, Parents farmers, Parents kept poultry, Parents kept horses | 0.414 | 0.244 | 1.51 | 0.94, 2.44 | 0.031, 0.265 | 0.263 | 0.93, 2.61 |
| 4a (forwards AIC) | Milk, Town, Horses, Firstborn, Current smoker, Parents kept poultry | 0.381 | 0.243 | 1.46 | 0.91, 2.36 | −0.002, 0.260 | 0.260 | 0.86, 2.39 |
| 4b (backward AIC) | Milk, Town, Horses, Firstborn, Current smoker, Parents kept poultry, Parents kept horses, Parents farmers | 0.398 | 0.244 | 1.49 | 0.92, 2.40 | 0.015, 0.262 | 0.262 | 0.88, 2.48 |
| 5a (forwards BIC) | Milk, Town, Current smoker, Parents kept poultry | 0.393 | 0.243 | 1.48 | 0.92, 2.39 | 0.010, 0.264 | 0.264 | 0.88, 2.45 |
| 5b (backward BIC) | Milk, Town, Current smoker, Parents kept poultry | 0.393 | 0.243 | 1.48 | 0.92, 2.39 | 0.010, 0.264 | 0.264 | 0.87, 2.45 |
| 6a (forwards CIE) | Milk, Town | 0.400 | 0.242 | 1.49 | 0.93, 2.39 | 0.017, 0.255 | 0.254 | 0.93, 2.56 |
| 6b (backward CIE) | Milk, Town | 0.400 | 0.242 | 1.49 | 0.93, 2.39 | 0.017, 0.255 | 0.254 | 0.92, 2.52 |
| 7a (forwards RMSE, larger model as referent) | Milk, Town, Poultry, Collecting eggs, Number of siblings, Parents kept cows, Parents kept poultry | 0.363 | 0.245 | 1.44 | 0.89, 2.32 | −0.020, 0.258 | 0.257 | 0.86, 2.35 |
| 7b (backward RMSE, larger model as referent) | Milk, Town, Poultry, Collecting eggs, Firstborn | 0.350 | 0.243 | 1.42 | 0.88, 2.29 | 0.017, 0.257 | 0.256 | 0.84, 2.28 |
| 8a (forwards RMSE, full model as referent) | Milk, Town | 0.400 | 0.242 | 1.49 | 0.93, 2.39 | −0.033, 0.263 | 0.261 | 0.88, 2.45 |
| 8b (backward RMSE, full model as referent) | Milk, Town, Parents kept cows, Parents kept poultry | 0.407 | 0.242 | 1.50 | 0.94, 2.42 | 0.024, 0.264 | 0.263 | 0.89, 2.51 |
| 9a penalization by log-F(1,1) priors | Milk, All other variables | 0.396 | 0.253 | 1.49 | 0.90, 2.44 | 0.013, 0.253 | 0.253 | 0.90, 2.42 |
| 9b penalization by log-F(2,2) priors | Milk, All other variables | 0.389 | 0.250 | 1.47 | 0.90, 2.41 | 0.006, 0.246 | 0.246 | 0.90, 2.36 |
*All analyses are adjusted for age group and sex.
† Based on 4000 bootstrap samples.
‡ Bias-corrected and accelerated (BCa) with 4000 resamples.
# Town, farm, cows, pigs, poultry, sheep/goats, horses, milking cows, cleaning barns, collecting eggs, firstborn, number of siblings, current smoker, lived in town or village as a child, parents were farmers, family kept cows, family kept pigs, family kept poultry, family kept sheep or goats, family kept horses.
§ Equivalent to F(1,1) prior for odds ratio; 95% prior limits are 1/648, 648.
¶ Equivalent to F(2,2) prior for odds ratio; 95% prior limits are 1/39, 39.
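The 95% prior limits quoted in the last two footnotes can be checked against the 2.5th and 97.5th percentiles of the corresponding F distributions. The sketch below uses closed-form quantiles that hold only for the two special cases F(1,1) and F(2,2); relying on those closed forms instead of a statistics library is a choice made for this example, and the function names are illustrative.

```python
import math

def f11_quantile(p: float) -> float:
    """Quantile of F(1,1), the square of a standard Cauchy variate,
    whose CDF is (2 / pi) * atan(sqrt(x))."""
    return math.tan(math.pi * p / 2) ** 2

def f22_quantile(p: float) -> float:
    """Quantile of F(2,2), whose CDF is x / (1 + x)."""
    return p / (1 - p)

for label, quantile in (("F(1,1)", f11_quantile), ("F(2,2)", f22_quantile)):
    lower, upper = quantile(0.025), quantile(0.975)
    # For F(m, m) the lower limit is the reciprocal of the upper limit.
    print(f"{label}: 95% limits 1/{1 / lower:.1f} and {upper:.1f}")
```

The upper limits round to 648 and 39, in line with the footnotes, and the much narrower F(2,2) interval indicates that the log-F(2,2) prior shrinks coefficients more strongly than log-F(1,1).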