Muhammad Abu Shadeque Mullah, James A Hanley, Andrea Benedetti.
Abstract
BACKGROUND: Generalized linear mixed models (GLMMs), typically used for analyzing correlated data, can also be used for smoothing by treating the knot coefficients of a regression spline as random effects. The resulting models are called semiparametric mixed models (SPMMs). Letting the random knot coefficients follow a normal distribution with mean zero and constant variance is equivalent to fitting a penalized spline with a ridge regression type penalty. We introduce a least absolute shrinkage and selection operator (LASSO) type penalty in the SPMM setting by letting the knot coefficients follow a Laplace (double exponential) distribution with mean zero.
Keywords: Generalized linear mixed models; Least absolute shrinkage and selection operator (LASSO); Markov chain Monte Carlo; Penalized splines; Ridge regression
Year: 2021 PMID: 33894761 PMCID: PMC8070328 DOI: 10.1186/s12874-021-01234-9
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
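The equivalence between the random-effect distribution and the penalty type stated in the abstract can be sketched as follows. This is an illustration under assumptions, not the authors' notation: the truncated linear basis, the knots \kappa_k, the variance \sigma_u^2 and the Laplace scale b are all introduced here for the example. In both cases the negative log density of the knot coefficients acts as the penalty subtracted from the log-likelihood.

```latex
% Assumed spline form: truncated linear basis with K knots \kappa_1, ..., \kappa_K
m(x) = \beta_0 + \beta_1 x + \sum_{k=1}^{K} u_k \, (x - \kappa_k)_+

% Normal knot coefficients give a ridge-type (squared) penalty
u_k \sim N(0, \sigma_u^2)
\;\Longrightarrow\;
-\log p(u) = \frac{1}{2\sigma_u^2} \sum_{k=1}^{K} u_k^2 + \text{const}

% Laplace knot coefficients give a LASSO-type (absolute value) penalty
p(u_k) = \frac{1}{2b} \exp\!\left(-\frac{|u_k|}{b}\right)
\;\Longrightarrow\;
-\log p(u) = \frac{1}{b} \sum_{k=1}^{K} |u_k| + \text{const}
```

Under these assumptions the smoothing parameter is 1/(2\sigma_u^2) for the ridge case and 1/b for the LASSO case, so a smaller variance or scale means heavier shrinkage of the knot coefficients.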
Test functions used for data generation
| Name | Shape | Function |
|---|---|---|
| Linear | [plot not reproduced] | log(3)∗ |
| Concave | [plot not reproduced] | |
| Double Hump | [plot not reproduced] | |
Simulation results from logistic spline fit by RIDGE and LASSO penalties
| Penalty | Full curve: MASE | Full curve: MACP | Full curve: MACL | Boundary, lower 10: MASE | Boundary, lower 10: MACP | Boundary, lower 10: MACL | Boundary, upper 10: MASE | Boundary, upper 10: MACP | Boundary, upper 10: MACL |
|---|---|---|---|---|---|---|---|---|---|
| Function : Linear | |||||||||
| LASSO | 0.105 | 0.96 | 1.332 | 0.270 | 0.96 | 2.755 | 0.281 | 0.96 | 2.765 |
| RIDGE | 0.149 | 0.96 | 1.369 | 0.445 | 0.95 | 2.830 | 0.537 | 0.95 | 3.024 |
| LASSO | 0.104 | 0.96 | 1.358 | 0.255 | 0.96 | 2.821 | 0.285 | 0.97 | 2.826 |
| RIDGE | 0.148 | 0.96 | 1.403 | 0.415 | 0.95 | 2.907 | 0.549 | 0.96 | 3.071 |
| LASSO | 0.096 | 0.96 | 1.356 | 0.223 | 0.96 | 2.813 | 0.287 | 0.97 | 2.815 |
| RIDGE | 0.142 | 0.96 | 1.401 | 0.365 | 0.96 | 2.898 | 0.553 | 0.96 | 3.045 |
| Function : Concave | |||||||||
| LASSO | 0.338 | 0.95 | 1.791 | 1.282 | 0.92 | 3.461 | 1.123 | 0.93 | 3.467 |
| RIDGE | 0.341 | 0.95 | 1.786 | 1.313 | 0.92 | 3.465 | 1.150 | 0.93 | 3.471 |
| LASSO | 0.359 | 0.96 | 1.953 | 1.421 | 0.94 | 3.886 | 1.151 | 0.95 | 3.787 |
| RIDGE | 0.364 | 0.96 | 1.951 | 1.476 | 0.94 | 3.979 | 1.206 | 0.95 | 3.855 |
| LASSO | 0.350 | 0.96 | 1.958 | 1.351 | 0.95 | 3.873 | 1.113 | 0.96 | 3.829 |
| RIDGE | 0.355 | 0.96 | 1.953 | 1.424 | 0.94 | 3.965 | 1.173 | 0.95 | 3.846 |
| Function : Double Hump | |||||||||
| LASSO | 0.301 | 0.90 | 1.502 | 0.291 | 0.92 | 1.655 | 0.472 | 0.92 | 2.441 |
| RIDGE | 0.345 | 0.89 | 1.583 | 0.328 | 0.92 | 1.769 | 0.678 | 0.91 | 2.506 |
| LASSO | 0.316 | 0.95 | 1.901 | 0.315 | 0.94 | 1.854 | 0.514 | 0.95 | 2.697 |
| RIDGE | 0.383 | 0.94 | 1.932 | 0.382 | 0.93 | 1.980 | 0.753 | 0.93 | 2.786 |
| LASSO | 0.323 | 0.95 | 1.942 | 0.382 | 0.95 | 2.021 | 0.531 | 0.95 | 2.818 |
| RIDGE | 0.396 | 0.94 | 1.987 | 0.445 | 0.94 | 2.152 | 0.780 | 0.93 | 2.894 |
We report the mean average squared error (MASE), mean average 95% coverage probability (MACP), and mean average coverage length (MACL) for the full curve and at the boundaries, for each K, penalty and curve.
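The three summary measures in the table can be illustrated with a short sketch. The definitions used here are assumptions based on the measure names, since the source does not spell them out: per replicate, ASE is the mean squared difference between the fitted and true curve on a grid, ACP is the share of grid points whose 95% interval contains the true value, and ACL is the mean interval width; MASE, MACP and MACL are then averages over replicates. The function names `curve_metrics` and `summarize` are hypothetical.

```python
import numpy as np

def curve_metrics(m_true, m_hat, lower, upper):
    """Return (ASE, ACP, ACL) for one fitted curve evaluated on a common grid."""
    m_true = np.asarray(m_true, dtype=float)
    m_hat = np.asarray(m_hat, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    ase = np.mean((m_hat - m_true) ** 2)                   # average squared error
    acp = np.mean((lower <= m_true) & (m_true <= upper))   # share of points covered
    acl = np.mean(upper - lower)                           # average interval length
    return ase, acp, acl

def summarize(replicates):
    """Average the per-replicate metrics to get (MASE, MACP, MACL)."""
    per_replicate = np.array([curve_metrics(*rep) for rep in replicates])
    return per_replicate.mean(axis=0)

# Toy inputs (made up for illustration, not taken from the study).
truth = np.array([0.0, 1.0, 2.0])
rep1 = (truth, [0.0, 1.0, 3.0], [-1.0, 0.0, 2.5], [1.0, 2.0, 3.5])
rep2 = (truth, truth, truth - 1.0, truth + 1.0)
mase, macp, macl = summarize([rep1, rep2])
```

To get the boundary columns under this reading, the same functions would be applied to the subset of grid points in the lower or upper part of the covariate range.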
Fig. 1 Three performance indicators as a function of the number of knots, K, comparing two penalties (LASSO and ridge) for the double hump shape of association. The performance indicators are Average Squared Error (ASE), Average Coverage Probability (ACP) and Average Coverage Length (ACL) for m(x). In all panels, we present the median and interquartile range based on 1,000 replications
Fig. 2 Estimated functions (pointwise mean of fits) vs actual functions in the upper row (a linear, b concave function, c double hump) and smoothed pointwise coverage probabilities of the 95% confidence intervals in the lower row (d linear, e concave function, f double hump) from 1,000 replicated datasets
Characteristics of the participants/study population, by COPD status (total n = 6,564)
| Characteristic | COPD (n = 1367; 20.8%) | | No COPD (n = 5197; 79.2%) | |
|---|---|---|---|---|
| Age | 65.2 | (11.2) | 57.0 | (10.8) |
| Male | 755 | (55.2%) | 2286 | (44.0%) |
| BMI | 27.3 | (5.3) | 27.9 | (5.7) |
| Ever smoker (cigarette) | 943 | (69.0%) | 2623 | (50.5%) |
| Ever smoker (pipe or cigarette) | ||||
| Never smoker | 406 | (29.7%) | 2543 | (48.9%) |
| Ex smoker | 645 | (47.2%) | 2030 | (39.1%) |
| Current smoker | 316 | (23.1%) | 624 | (12.0%) |
| Pack Years | 22.9 | (24.6) | 10.5 | (17.0) |
| Average cigarette per day | 13.2 | (12.6) | 8.3 | (11.4) |
| Duration of smoking (year) | 23.0 | (19.7) | 11.8 | (14.9) |
| Smoking cessation | 943 | (69.0%) | 2623 | (50.5%) |
| Occupation | ||||
| Hard rock mining | 34 | (2.5%) | 80 | (1.5%) |
| Coal mining | 11 | (0.8%) | 12 | (0.2%) |
| Working with asbestos | 59 | (4.3%) | 157 | (3.0%) |
| Chemical/plastics manufacturing | 80 | (5.9%) | 231 | (4.4%) |
| Foundry/steel milling | 39 | (2.9%) | 106 | (2.0%) |
| Welding | 68 | (5.0%) | 172 | (3.3%) |
| Saw-milling | 39 | (2.9%) | 103 | (2.0%) |
Mean (SD) is reported for quantitative variables, while count (%) is reported for categorical variables
Fig. 3 LASSO type penalized spline estimates of m1(pack years), m2(age) and m3(BMI) for the logit of the prevalence of COPD. The shaded regions are the pointwise 95% credible sets obtained from the fully Bayesian fit
Results from SPMMs fit using LASSO penalty for assessing the effect of occupational exposures on the occurrence of COPD
| | Unadjusted OR (95% CI) | Adjusted OR (95% CI) |
|---|---|---|
| Age | - | * |
| Pack Years | - | * |
| BMI | - | * |
| Sex (Men) | - | ** |
| Occupation | ||
| Hard rock mining | 1.6 (1.1, 2.4) | 1.1 (0.7, 1.8) |
| Coal mining | 3.5 (1.5, 8.0) | 1.7 (0.7, 4.3) |
| Working with asbestos | 1.4 (1.1, 2.0) | 0.8 (0.5, 1.1) |
| Chemical/plastics manufacturing | 1.3 (1.1, 1.7) | 1.1 (0.8, 1.5) |
| Foundry/steel milling | 1.4 (1.0, 2.0) | 1.0 (0.7, 1.5) |
| Welding | 1.5 (1.1, 2.0) | 1.2 (0.9, 1.6) |
| Saw-milling | 1.5 (1.0, 2.1) | 1.2 (0.8, 1.8) |
Adjusted ORs were obtained from the SPMM shown in (17), adjusting for pack years, age, sex and BMI
* Covariates were modeled using penalized splines. The method does not provide a summary odds ratio and CI for these covariates; see Fig. 3 for the smooth curves when the occupation was hard rock mining
** OR = 1.4 (1.3, 1.6) when occupation was hard rock mining
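As a rough cross-check, the unadjusted odds ratios can be compared with a plain 2x2 calculation using the occupation counts and group sizes from the participant characteristics table. This is a sketch under an assumption: the source does not state how the unadjusted estimates were computed, so the Wald interval on the log odds ratio used here is only one plausible choice. The function name `odds_ratio_wald` is hypothetical.

```python
import math

def odds_ratio_wald(exposed_cases, total_cases, exposed_controls, total_controls, z=1.96):
    """Odds ratio and Wald 95% CI from 2x2 counts (assumed method, see lead-in)."""
    a = exposed_cases                      # COPD, exposed
    b = total_cases - exposed_cases        # COPD, not exposed
    c = exposed_controls                   # no COPD, exposed
    d = total_controls - exposed_controls  # no COPD, not exposed
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    return odds_ratio, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)

# Counts from the characteristics table: n = 1367 with COPD, n = 5197 without.
hard_rock = odds_ratio_wald(34, 1367, 80, 5197)  # compare with the reported 1.6 (1.1, 2.4)
coal = odds_ratio_wald(11, 1367, 12, 5197)       # compare with the reported 3.5 (1.5, 8.0)
print("Hard rock mining: %.1f (%.1f, %.1f)" % hard_rock)
print("Coal mining: %.1f (%.1f, %.1f)" % coal)
```

The adjusted column cannot be checked this way, because those estimates come from the fitted SPMM with spline terms for pack years, age and BMI.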