Micah L Hartwell1,2,3, Jam Khojasteh2, Marianna S Wetherill4, Julie M Croff3,5, Denna Wheeler3,5.
Abstract
BACKGROUND: Structural equation modeling (SEM) is a multivariate analysis method for exploring relations between latent constructs and measured variables. As a theory-guided approach, SEM estimates directional pathways in complex models based on longitudinal or cross-sectional data where randomized controlled trials would be either unethical or cost prohibitive. However, this method is infrequently used in nutrition research, despite recommendations by epidemiologists for its increased use.
Keywords: NHANES; Structural equation modeling; complex survey design; multiple imputation; quasi-maximum likelihood
Year: 2019 PMID: 31008441 PMCID: PMC6465451 DOI: 10.1093/cdn/nzz010
Source DB: PubMed Journal: Curr Dev Nutr ISSN: 2475-2991
Applied definitions for select statistical terms used in this paper
| Term | Applied example |
|---|---|
| Measured variable | A construct that can be directly measured, such as age, blood pressure, height, weight; or biomarkers, such as serum cotinine (a measure of cigarette smoking) |
| Latent construct | A construct that cannot be directly measured, but that can be reliably assessed through a combination of validated measured variables, such as the construct of depression based on the 9-item PHQ |
| Factor analysis (measurement model) | A statistical technique used to identify latent constructs, such as depression, through multiple measurable items, such as the PHQ. In SEM, items from the survey are assessed through factor analysis to confirm their reliability and validity as a measure of the intended construct |
| Path analysis (structural model) | A series of independent linear regressions between multiple variables to test a causal pathway, such as the relation from age to food security to cotinine |
| Structural equation modeling | A theory-guided approach for regressing pathways among latent constructs and measured variables, such as socioeconomic status, to predict health |
| Complex survey design | A method of sampling from a population applying stratification and clustering to achieve statistical and practical efficiency |
| Simple imputation | Simple imputation strategies for data missingness, such as mean imputation, last observation carried forward, and hot-deck imputation strategies, can provide the researcher with a full dataset, but may artificially reduce variability or yield results that are much more precise than they should be, which can lead to inflated type I error rates |
| Multiple imputation | An alternative approach to simple imputation, where each missing value is computed independently under a Bayesian model that includes an estimation of uncertainty about the missing data. The datasets are analyzed individually and then combined to give an appropriate pooled estimate. The combined procedure produces more accurate standard errors than simple imputation strategies |
| Maximum likelihood | An iterative approach that uses probability density functions to find the parameter estimates that result in a best fit to the observed data |
| Quasimaximum likelihood | An iterative approach that approximates the likelihood function based on the use of approximated nonnormal density functions, making better estimations for nonnormally distributed variables and sampling weights |
| Satorra-Bentler correction | A sandwich estimator that is used in conjunction with QML, which relaxes the assumption for multivariate normal data, creating robustness to nonnormal distributions and providing better chi-square goodness-of-fit statistics and better estimates of other fit indices |
PHQ, Patient Health Questionnaire; QML, quasimaximum likelihood; SEM, structural equation modeling.
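The multiple imputation entry above describes analyzing each imputed dataset separately and then combining the results into a pooled estimate. A minimal sketch of that combining step follows, assuming the standard Rubin's rules; the source does not state which pooling formula was used, and the function name `pool_rubin` and the input values are hypothetical.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one coefficient across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need matching estimates and SEs from at least 2 imputations")
    pooled = mean(estimates)                       # pooled point estimate
    within = mean([se ** 2 for se in std_errors])  # average sampling variance
    between = variance(estimates)                  # variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between         # total variance of the pooled estimate
    return pooled, math.sqrt(total)


# Hypothetical per-imputation results, not values from the paper.
estimate, se = pool_rubin([10.0, 11.0, 9.0], [1.0, 1.0, 1.0])
print(round(estimate, 3), round(se, 3))
```

The between-imputation term is what simple imputation leaves out, which is one way to see why single-imputation standard errors tend to be too small.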
FIGURE 1. Steps for conducting SEM analysis, with highlighted steps that researchers should consider when using complex survey data and handling missing data, as explored in this manuscript. SEM, structural equation modeling.
FIGURE 2. Final adjusted structural equation model of the effects of age on cotinine, with food security and depression as mediating factors, estimated by QML with Satorra-Bentler correction from the NHANES data (n = 17,132 people aged >20 y; total population = 214,755,655). QML, quasimaximum likelihood.
Demographics from NHANES unweighted raw data (n = 17,132), with 6-y weighting (total sample = 214,755,655), and complete cases (n = 10,574), for participants >20 y of age
| Characteristic | NHANES unweighted, No. | NHANES unweighted, % | NHANES weighted, No. | NHANES weighted, % | Complete cases sample, No. | Complete cases sample, % | % completing survey |
|---|---|---|---|---|---|---|---|
| Ethnicity | |||||||
| Mexican American | 3176 | 18.54 | 18,000,000 | 8.38 | 1526 | 14.43 | 48.05 |
| Other Hispanic | 1452 | 8.48 | 9,500,000 | 4.42 | 761 | 7.20 | 52.41 |
| Non-Hispanic white | 8232 | 48.05 | 150,000,000 | 69.85 | 5929 | 56.07 | 72.02 |
| Non-Hispanic black | 3472 | 20.27 | 24,000,000 | 11.18 | 1929 | 18.24 | 55.56 |
| Other, including multiracial | 800 | 4.67 | 13,000,000 | 6.05 | 429 | 4.06 | 53.63 |
| Age group, y | |||||||
| 20–29 | 3006 | 17.55 | 41,000,000 | 19.09 | 1581 | 14.95 | 52.59 |
| 30–39 | 2910 | 16.99 | 40,000,000 | 18.63 | 1660 | 15.70 | 57.04 |
| 40–49 | 2899 | 16.92 | 44,000,000 | 20.49 | 1766 | 16.70 | 60.92 |
| 50–59 | 2520 | 14.71 | 39,000,000 | 18.16 | 1619 | 15.31 | 64.25 |
| 60–69 | 2648 | 15.46 | 25,000,000 | 11.64 | 1817 | 17.18 | 68.62 |
| 70–79 | 1891 | 11.04 | 16,000,000 | 7.45 | 1337 | 12.64 | 70.70 |
| ≥80 | 1258 | 7.34 | 9,500,000 | 4.42 | 794 | 7.51 | 63.12 |
| Food security (raw: | |||||||
| High food security | 12,558 | 98.13 | 170,000,000 | 96.74 | 10,392 | 98.28 | 82.75 |
| Marginal food security | 10 | 0.08 | 120,000 | 0.07 | 8 | 0.08 | 80.00 |
| Low food security | 108 | 0.84 | 900,000 | 0.51 | 79 | 0.75 | 73.15 |
| Very low food security | 121 | 0.95 | 1,100,000 | 0.63 | 95 | 0.90 | 78.51 |
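The "% completing survey" column appears to be the complete-case count divided by the unweighted count in the same row. The sketch below checks that reading against the ethnicity rows; the helper name `completion_rate` is hypothetical, and half-up rounding to two decimals is an assumption about how the table was rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def completion_rate(complete_cases, unweighted_n):
    """Percentage of unweighted participants retained as complete cases."""
    pct = Decimal(100 * complete_cases) / Decimal(unweighted_n)
    # Round half up to two decimals (assumed rounding rule).
    return pct.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (complete cases, unweighted No.) copied from the ethnicity rows of the table.
rows = {
    "Mexican American": (1526, 3176),
    "Other Hispanic": (761, 1452),
    "Non-Hispanic white": (5929, 8232),
    "Non-Hispanic black": (1929, 3472),
    "Other, including multiracial": (429, 800),
}

for group, (complete, total) in rows.items():
    print(f"{group}: {completion_rate(complete, total)}%")
```

Under these assumptions the computed values match the tabulated column for all five ethnicity rows.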
Descriptive statistics of variables from the NHANES sample
| Variable | Mean | SD | Min | Max | Skew | Kurtosis |
|---|---|---|---|---|---|---|
| Age | 49.64 | 18.31 | 20 | 85 | 0.14 | −1.12 |
| Cotinine | 59.78 | 129.2 | 0.01 | 1438 | 2.60 | 8.11 |
| Food security | 0.24 | 1.45 | 0 | 10 | 5.94 | 33.85 |
| DPQ010: …little interest or pleasure in doing things? | 0.34 | 0.71 | 0 | 3 | 2.27 | 4.71 |
| DPQ020: …feeling down, depressed, or hopeless? | 0.36 | 0.72 | 0 | 3 | 2.24 | 4.58 |
| DPQ030: …trouble falling or staying asleep, or sleeping too much? | 0.63 | 0.95 | 0 | 3 | 1.41 | 0.84 |
| DPQ040: …feeling tired or having little energy? | 0.74 | 0.93 | 0 | 3 | 1.18 | 0.46 |
| DPQ050: …poor appetite or overeating? | 0.37 | 0.76 | 0 | 3 | 2.24 | 4.30 |
| DPQ060: …feeling bad about yourself—or that you are a failure or have let yourself or your family down? | 0.25 | 0.63 | 0 | 3 | 2.88 | 8.20 |
| DPQ070: …trouble concentrating on things, such as reading the newspaper or watching TV? | 0.25 | 0.64 | 0 | 3 | 2.88 | 8.13 |
| DPQ080: …moving or speaking so slowly that other people could have noticed? Or the opposite—being so fidgety or restless that you have been moving around a lot more than usual? | 0.16 | 0.52 | 0 | 3 | 3.73 | 14.61 |
| DPQ090: …thoughts that you would be better off dead or of hurting yourself in some way? | 0.06 | 0.31 | 0 | 3 | 6.81 | 52.02 |
DPQ, NHANES depression screener questionnaire.
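The Skew and Kurtosis columns are what motivate a nonnormal estimator for these variables. As a rough illustration, the sketch below computes moment-based skewness and excess kurtosis (a normal distribution scores 0 on both). The exact estimator behind the table is not stated in the source, so this formula is an assumption, and the sample scores are hypothetical.

```python
def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (population formulas)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mu) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("variance is zero; skew and kurtosis are undefined")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Hypothetical item scores on a 0-3 scale with most responses at 0,
# mimicking the shape of the DPQ items (not NHANES data).
scores = [0] * 8 + [1, 3]
skew, kurt = skew_and_excess_kurtosis(scores)
print(f"skew={skew:.2f}, excess kurtosis={kurt:.2f}")
```

A negative excess kurtosis, as reported for age, indicates a flatter-than-normal distribution, while the large positive values for the DPQ items reflect responses piled up at 0.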
FIGURE 3. Histograms of age from NHANES data cycles 2005–2010 showing the shift in distribution with unweighted observations (left) and when sampling weights are applied (right).
FIGURE 4. Proportion of missing data from the full NHANES dataset (left) and modified complete cases data (right) with percentage of component missingness (top) and patterns (bottom) of component missingness among NHANES participants.
FIGURE 5. Patterns of missing data by race from the full NHANES data. Mexican-Americans and other Hispanics were far less likely to complete the Food Security Survey Module.
Comparison of unstandardized coefficients and standard errors for the structural model of each SEM iteration
Iteration 1, complete cases without weightings; iteration 2, complete cases with weightings; iteration 3, imputed complete cases with weightings; iteration 4, full dataset with weightings and MI.

| Path | Iteration 1 coefficient | Iteration 1 SE | Iteration 2 coefficient | Iteration 2 SE | Iteration 3 coefficient | Iteration 3 SE | Iteration 4 coefficient | Iteration 4 SE |
|---|---|---|---|---|---|---|---|---|
| Food security on age | −0.006 | 0.001 | −0.004 | 0.001 | −0.004 | 0.001 | −0.006 | 0.001 |
| Cotinine on age | −0.475 | 0.056 | −0.540 | 0.086 | −0.569 | 0.085 | −0.544 | 0.075 |
| Cotinine on food security | 8.561 | 1.219 | 10.807 | 2.035 | 10.63 | 1.828 | 9.918 | 1.407 |
| Depression on age | −0.003 | 0.001 | −0.003 | 0.001 | −0.002 | 0.001 | −0.002 | 0.000 |
| Cotinine on depression | 10.264 | 1.392 | 9.719 | 1.526 | 9.668 | 1.696 | 11.17 | 1.128 |
| Depression on food security | 0.155 | 0.014 | 0.180 | 0.018 | 0.176 | 0.018 | 0.166 | 0.011 |
All coefficients were statistically significant P < 0.001. MI, multiple imputation; SEM, structural equation modeling.
Comparison fit statistics between ML and QML with Satorra-Bentler correction among model iterations
Iteration 1, complete cases without weightings; iteration 2, complete cases with weightings; iteration 3, imputed complete cases with weightings; iteration 4, full dataset with weightings and MI; iteration 5, final respecified model.

| Fit statistic | Iteration 1 ML | Iteration 1 QML | Iteration 2 ML | Iteration 2 QML | Iteration 3 ML | Iteration 3 QML | Iteration 4 ML | Iteration 4 QML | Iteration 5 QML |
|---|---|---|---|---|---|---|---|---|---|
| Chi-square test, P value | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 | <0.001 |
| Root mean square error of approximation (RMSEA) (90% CI) | 0.062 (0.059, 0.064) | 0.061 | 0.066 (0.063, 0.068) | 0.064 | 0.064 (0.062, 0.067) | 0.063 | 0.062 (0.061, 0.064) | 0.062 | 0.024 |
| Standardized root mean square residual (SRMR) | 0.035 | 0.032 | 0.037 | 0.035 | 0.034 | 0.034 | 0.032 | 0.032 | 0.012 |
| Comparative fit index (CFI) | 0.925 | 0.927 | 0.914 | 0.916 | 0.916 | 0.919 | 0.928 | 0.930 | 0.971 |
| Tucker-Lewis Index (TLI) | 0.903 | 0.906 | 0.888 | 0.892 | 0.891 | 0.895 | 0.907 | 0.909 | 0.986 |
Robust statistics are reported for QML with Satorra-Bentler correction, as produced by the lavaan package in R. MI, multiple imputation; ML, maximum likelihood; QML, quasimaximum likelihood.
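For readers less familiar with the fit indices in this table, the sketch below shows one common textbook form of RMSEA, CFI, and TLI computed from the model and baseline chi-square statistics. These are the unscaled formulas; the robust Satorra-Bentler versions reported for QML use scaled chi-square values and may differ. The function names and input values are hypothetical, and SRMR is omitted because it requires the residual correlation matrix.

```python
import math


def rmsea(chi2, df, n):
    """Root mean square error of approximation (using N - 1 in the denominator)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2, df, chi2_base, df_base):
    """Comparative fit index relative to the baseline (independence) model."""
    num = max(chi2 - df, 0.0)
    den = max(chi2 - df, chi2_base - df_base, 0.0)
    return 1.0 if den == 0 else 1.0 - num / den


def tli(chi2, df, chi2_base, df_base):
    """Tucker-Lewis index; unlike CFI, it is not bounded by 1."""
    ratio_base = chi2_base / df_base
    return (ratio_base - chi2 / df) / (ratio_base - 1.0)


# Hypothetical chi-square values, not taken from the paper.
print(round(rmsea(100.0, 50, 501), 3))
print(round(cfi(100.0, 50, 1260.0, 60), 3))
print(round(tli(100.0, 50, 1260.0, 60), 3))
```

Smaller RMSEA and larger CFI and TLI indicate better fit, which is the direction of change between iterations 1 to 4 and the respecified model in iteration 5.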
Respecified SEM model with unstandardized coefficients and standard errors based on the use of QML with the Satorra-Bentler correction for the NHANES cycles 2005–2010 (n = 17,132, total sample = 214,755,655)
| Parameter | Estimate | SE | Test statistic | P |
|---|---|---|---|---|
| Measurement model | ||||
| Depression on | ||||
| DPQ010 | 0.425 | 0.012 | 55.242 | <0.001 |
| DPQ020 | 0.483 | 0.012 | 49.53 | <0.001 |
| DPQ030 | 0.478 | 0.014 | 40.028 | <0.001 |
| DPQ040 | 0.546 | 0.012 | 48.576 | <0.001 |
| DPQ050 | 0.415 | 0.011 | 56.587 | <0.001 |
| DPQ060 | 0.396 | 0.011 | 54.356 | <0.001 |
| DPQ070 | 0.368 | 0.012 | 50.414 | <0.001 |
| DPQ080 | 0.238 | 0.011 | 34.862 | <0.001 |
| DPQ090 | 0.123 | 0.01 | 21.413 | <0.001 |
| Structural model | ||||
| Food security on age | −0.005 | 0.001 | −7.333 | <0.001 |
| Cotinine on age | −0.565 | 0.078 | −7.232 | <0.001 |
| Cotinine on food security | 9.171 | 1.464 | 7.049 | <0.001 |
| Depression on age | −0.002 | 0.000 | −4.011 | <0.001 |
| Cotinine on depression | 11.818 | 1.260 | 9.902 | <0.001 |
| Depression on food security | 0.169 | 0.011 | 15.143 | <0.001 |
| Variances | ||||
| DPQ010 | 0.234 | 0.009 | 25.420 | <0.001 |
| DPQ020 | 0.179 | 0.007 | 26.810 | <0.001 |
| DPQ030 | 0.590 | 0.016 | 37.445 | <0.001 |
| DPQ040 | 0.505 | 0.015 | 34.272 | <0.001 |
| DPQ050 | 0.348 | 0.011 | 31.230 | <0.001 |
| DPQ060 | 0.186 | 0.007 | 26.755 | <0.001 |
| DPQ070 | 0.230 | 0.008 | 27.312 | <0.001 |
| DPQ080 | 0.159 | 0.006 | 25.369 | <0.001 |
| DPQ090 | 0.059 | 0.005 | 12.948 | <0.001 |
| Food security | 1.785 | 0.127 | 14.060 | <0.001 |
| Cotinine | 16,680.716 | 739.710 | 22.550 | <0.001 |
DPQ, NHANES depression screener questionnaire; QML, quasimaximum likelihood; SEM, structural equation modeling.