Abstract
BACKGROUND: Structural equation modelling (SEM) has been increasingly used in medical statistics for solving a system of related regression equations. However, a great obstacle for its wider use has been its difficulty in handling categorical variables within the framework of generalised linear models.
Year: 2006 PMID: 16539711 PMCID: PMC1431551 DOI: 10.1186/1471-2288-6-13
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Figure 1. Statistical problems needing SEM approach.
Simulated data: Observed odds ratios (OR), associated 95% confidence intervals (CI) and SEM regression coefficients with corresponding standard errors (SE) obtained via ML estimation (N = 5000)
| Parameter* | Observed association: OR (95% CI) for the variable pairs | Observed association: correlation (Q) estimate | SEM-predicted effect: regression estimate (SE) in Q-metric | SEM-predicted effect: regression estimate (95% CI) in OR-metric** |
| a1 (BIN1→YBIN) | 2.138 (1.887, 2.423) | 0.3627 | 0.0281 (0.0039) | 1.058 (1.042, 1.074) |
| a2 (BIN2→YBIN) | 3.711 (3.255, 4.232) | 0.5755 | 0.1036 (0.0044) | 1.231 (1.210, 1.253) |
| a3 (BIN3→YBIN) | 0.364 (0.321, 0.414) | -0.4660 | -0.4979 (0.0033) | 0.335 (0.329, 0.341) |
| a4 (MBIN→YBIN) | 10.883 (9.411, 12.586) | 0.8137 | 0.7760 (0.0050) | 7.929 (7.554, 8.337) |
| b1 (BIN1→MBIN) | 2.632 (2.304, 3.006) | 0.4493 | 0.4479 (0.0093) | 2.622 (2.507, 2.746) |
| b2 (BIN2→MBIN) | 4.095 (3.561, 4.709) | 0.6075 | 0.6070 (0.0093) | 4.089 (3.863, 4.337) |
| b3 (BIN3→MBIN) | 1.083 (0.955, 1.229) | 0.0398 | 0.0276 (0.0093) | 1.0568 (1.019, 1.096) |
* Arrows point to the dependent variables in the model (see Figure 2)
** Back-transformed from Q to OR by (1+Q)/(1-Q)
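The footnote's back-transformation, together with its inverse (Yule's Q computed from an odds ratio), can be sketched in a few lines. The function names are illustrative and do not come from the paper. The interval helper assumes that the OR-metric limits are obtained by back-transforming Q ± 1.96·SE. This is an inference from the tabulated values, which it reproduces for rows a1 and a4, and not a procedure stated in this excerpt.

```python
def or_to_q(odds_ratio: float) -> float:
    """Yule's Q from an odds ratio: Q = (OR - 1) / (OR + 1)."""
    if odds_ratio < 0:
        raise ValueError("an odds ratio cannot be negative")
    return (odds_ratio - 1.0) / (odds_ratio + 1.0)


def q_to_or(q: float) -> float:
    """Back-transform from the Q-metric to the OR-metric: OR = (1 + Q) / (1 - Q)."""
    if not -1.0 < q < 1.0:
        raise ValueError("Q must lie strictly between -1 and 1")
    return (1.0 + q) / (1.0 - q)


def q_interval_to_or(q: float, se: float, z: float = 1.96) -> tuple[float, float]:
    """Assumed construction: back-transform the limits Q - z*SE and Q + z*SE."""
    return q_to_or(q - z * se), q_to_or(q + z * se)


# Row a1 (BIN1 -> YBIN) of the table above
q_a1 = or_to_q(2.138)                             # about 0.3627
or_a1 = q_to_or(0.0281)                           # about 1.058
lo_a1, hi_a1 = q_interval_to_or(0.0281, 0.0039)   # about (1.042, 1.074)
```

Because the transformation is monotone in Q, back-transforming the interval limits preserves their order, which is why the OR-metric intervals are asymmetric around the point estimate.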
Figure 2. Simulated model.
Multivariate logistic regression for generated data: parameter estimates (standard errors) for large (N = 5000) and small (N = 100) samples
| Parameter | YBIN outcome, N = 5000 | YBIN outcome, N = 100 | MBIN outcome, N = 5000 | MBIN outcome, N = 100 |
| Intercept | -0.5735 (0.0787) | -0.3470 (0.6320) | -0.0835 (0.0627) | 0.9316 (0.4745) |
| BIN1 | 0.5602 (0.0781) | 0.3234 (0.5362) | 1.0596 (0.0713) | 0.9921 (0.9921) |
| BIN2 | 0.9941 (0.0791) | 1.3645 (0.5409) | 1.4787 (0.0734) | 1.4472 (0.5842) |
| BIN3 | -1.5431 (0.0844) | -1.6759 (0.5551) | 0.0708 (0.0691) | -0.8168 (0.5530) |
| MBIN | 2.3781 (0.0873) | 1.7528 (0.6260) | - | - |
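To read these logistic coefficients on the same odds-ratio scale as the SEM table, each estimate can be exponentiated. The sketch below does this for the MBIN coefficient in the large-sample YBIN model. The Wald-type 95% interval (estimate ± 1.96·SE on the log-odds scale) is an assumption made for illustration, since this excerpt does not say how intervals were formed for the logistic models, and the function name is hypothetical.

```python
import math


def logit_to_or(beta: float, se: float, z: float = 1.96) -> tuple[float, float, float]:
    """Return (OR, lower, upper) from a log-odds coefficient and its standard error.

    The interval is a Wald interval computed on the log-odds scale and then
    exponentiated; this choice is an assumption, not taken from the paper.
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# MBIN -> YBIN, N = 5000: estimate 2.3781, standard error 0.0873
or_mbin, lower_mbin, upper_mbin = logit_to_or(2.3781, 0.0873)
print(f"OR = {or_mbin:.2f} (95% CI {lower_mbin:.2f}, {upper_mbin:.2f})")
```

These are odds ratios adjusted for the other predictors in the model, so they need not equal the pairwise observed odds ratios reported alongside the SEM estimates.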
Figure 3. Normal probability plots for raw data residuals. Plots are for the simulated data model with two related outcomes: YBIN (top) and MBIN (bottom). An asterisk may represent up to 30 residuals.
Figure 4. Comparison of SEM and logistic model estimates for the obstetric data example.
Percentage of correctly classified events for logistic regression (LR) models in table 2 versus SEM in tables 1 and 3
| Sample size | N = 5000 | N = 5000 | N = 5000 | N = 5000 | N = 100 | N = 100 | N = 100 | N = 100 |
| Outcome | YBIN | YBIN | MBIN | MBIN | YBIN | YBIN | MBIN | MBIN |
| Method | LR | SEM | LR | SEM | LR | SEM | LR | SEM |
| All outcomes | 80.36 | 80.28 | 74.32 | 74.36 | 81.00 | 80.00 | 80.00 | 72.00 |
| Events | 91.30 | 93.32 | 83.46 | 91.66 | 91.18 | 85.29 | 80.00 | 76.25 |
| Non-events | 54.42 | 49.36 | 48.47 | 25.42 | 59.38 | 68.75 | 0.00 | 55.00 |
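The three rows of this table correspond to overall accuracy, the share of observed events predicted as events, and the share of observed non-events predicted as non-events. A minimal sketch of that calculation follows. The 0.5 probability cutoff, the function name and the toy data are all assumptions for illustration, because this excerpt does not state how predictions from either method were dichotomised.

```python
def classification_rates(observed, predicted_prob, cutoff=0.5):
    """Percentage correctly classified: all outcomes, events, and non-events.

    `observed` holds 0/1 outcomes and `predicted_prob` the fitted probabilities.
    The cutoff of 0.5 is an assumed default, not a value taken from the paper.
    """
    predicted = [1 if p >= cutoff else 0 for p in predicted_prob]
    pairs = list(zip(observed, predicted))

    def percent_correct(subset):
        if not subset:
            return float("nan")  # undefined when the subgroup is empty
        return 100.0 * sum(o == p for o, p in subset) / len(subset)

    return {
        "all": percent_correct(pairs),
        "events": percent_correct([pair for pair in pairs if pair[0] == 1]),
        "non_events": percent_correct([pair for pair in pairs if pair[0] == 0]),
    }


# Hypothetical toy data: three events and two non-events
rates = classification_rates([1, 1, 1, 0, 0], [0.9, 0.8, 0.4, 0.3, 0.6])
# rates["all"] is 60.0, rates["events"] is about 66.67, rates["non_events"] is 50.0
```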
Classification performance for the obstetric data example (N = 10574): logistic regression (LR) and SEM with Q-metric input (see Figure 4)
| Dependent variable | Caesarian section | Caesarian section | Low birthweight | Low birthweight | Special Care Baby Unit | Special Care Baby Unit |
| Correctly classified (%) | LR* | SEM | LR* | SEM | LR* | SEM |
| All outcomes | 85.7 | 83.8 | 95.4 | 83.9 | 95.3 | 82.7 |
| Events | 0.0 | 15.5 | 26.7 | 72.7 | 36.5 | 69.0 |
| Non-events | 100.0 | 95.2 | 99.1 | 84.6 | 99.2 | 83.6 |
* Note: Logistic regression used the same outcome and predictor variables as the SEM model in Figure 4 but needed separate logistic models for each dependent variable.
Figure 5. SEM with latent risk variable for the obstetric data example.
Small sample (N = 100) parameter estimates and their standard errors (SE) for SEM using Q-statistic input (correlations estimated via Yule's transformation)
| Parameter* | Observed association: OR (95% CI) for the variable pairs | Observed association: correlation (Q) estimate | SEM-predicted effect: regression estimate (SE) in Q-metric | SEM-predicted effect: regression estimate (95% CI) in OR-metric** |
| a1 (BIN1→YBIN) | 1.600 (0.669, 3.824) | 0.2308 | 0.0068 (0.0410) | 1.014 (0.863, 1.191) |
| a2 (BIN2→YBIN) | 3.881 (1.561, 9.650) | 0.5902 | 0.4354 (0.0485) | 2.542 (2.032, 3.259) |
| a3 (BIN3→YBIN) | 0.233 (0.092, 0.594) | -0.6220 | -0.5604 (0.0403) | 0.282 (0.220, 0.350) |
| a4 (MBIN→YBIN) | 8.037 (2.700, 23.925) | 0.7787 | 0.3387 (0.0568) | 2.173 (1.589, 2.636) |
| b1 (BIN1→MBIN) | 2.581 (0.856, 7.782) | 0.4415 | 0.3697 (0.0625) | 2.173 (1.657, 2.939) |
| b2 (BIN2→MBIN) | 3.857 (1.278, 11.638) | 0.5882 | 0.5854 (0.0625) | 3.8239 (2.724, 5.847) |
| b3 (BIN3→MBIN) | 0.512 (0.185, 1.418) | -0.3230 | -0.3441 (0.0624) | 0.4880 (0.364, 0.637) |
* Arrows point to the dependent variables in the model (see Figure 2)
** Back-transformed from Q to OR by (1+Q)/(1-Q)