Lihan Yan, Yongmin Sun, Michael R Boivin, Paul O Kwon, Yuanzhang Li.
Abstract
This paper reviews several challenges that epidemiologists commonly encounter in statistical analyses of epidemiological data. We focus on the application of linear regression, multivariate logistic regression, and log-linear modeling to epidemiological data. Specific topics include: (a) deletion of outliers, (b) heteroscedasticity in linear regression, (c) limitations of principal component analysis in dimension reduction, (d) hazard ratio vs. odds ratio in a rate comparison analysis, (e) log-linear models with multiple response data, and (f) ordinal logistic vs. multinomial logistic models. As a general rule, a thorough examination of a model's assumptions against both current data and prior research should precede its use in estimating effects.
Keywords: epidemiology; hazard ratio; log-linear; logistic; odds ratio; principal component analysis; regression; relative risk
Year: 2016 PMID: 27774446 PMCID: PMC5053988 DOI: 10.3389/fpubh.2016.00207
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
The death rate and cigarette data in Freedman et al. (1)
| Obs | Country | Cigarette | Deaths per million |
|---|---|---|---|
| 1 | Australia | 480 | 180 |
| 2 | Canada | 500 | 150 |
| 3 | Denmark | 380 | 170 |
| 4 | Finland | 1100 | 350 |
| 5 | Great Britain | 1100 | 460 |
| 6 | Iceland | 230 | 60 |
| 7 | Netherlands | 490 | 240 |
| 8 | Norway | 250 | 90 |
| 9 | Sweden | 300 | 110 |
| 10 | Switzerland | 510 | 250 |
| 11 | USA | 1300 | 200 |
Figure 1. Deaths by cigarette consumption: regression with and without the USA.
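The sensitivity of the fitted line to a single country can be checked directly from the table above. The following is a minimal sketch, assuming an ordinary least-squares fit of deaths per million on cigarette consumption; the helper name `ols_fit` is hypothetical and not taken from the paper.

```python
# Cigarette consumption and deaths per million, copied from the table above.
data = {
    "Australia": (480, 180), "Canada": (500, 150), "Denmark": (380, 170),
    "Finland": (1100, 350), "Great Britain": (1100, 460), "Iceland": (230, 60),
    "Netherlands": (490, 240), "Norway": (250, 90), "Sweden": (300, 110),
    "Switzerland": (510, 250), "USA": (1300, 200),
}

def ols_fit(points):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Fit once with all 11 countries and once with the USA removed.
a_all, b_all = ols_fit(list(data.values()))
a_no, b_no = ols_fit([xy for country, xy in data.items() if country != "USA"])

print(f"with USA:    intercept={a_all:.3f}, slope={b_all:.3f}")
print(f"without USA: intercept={a_no:.3f}, slope={b_no:.3f}")
```

Under these assumptions the all-country fit agrees with the Model 1 estimates in the parameter table (α ≈ 67.56, β ≈ 0.228), and dropping the USA steepens the slope to about 0.37, which shows how much one observation can move the line.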
Figure 2. Change in SD with average cigarette consumption.
Parameter estimation and model fitting.
| Model | Parameter | Estimate | SE | t | p | R² |
|---|---|---|---|---|---|---|
| 1 | α | 67.561 | 49.06 | 1.38 | 0.2 | 0.54 |
| | β | 0.228 | 0.07 | 3.27 | 0.01 | |
| 2 | α | 9.074 | 1.644 | 5.52 | <0.0001 | 0.56 |
| | β | 0.008 | 0.002 | 3.36 | 0.01 | |
| 3 | α | 4.483 | 0.248 | 18.07 | <0.0001 | 0.54 |
| | β | 0.001 | 0 | 3.25 | 0.01 | |
| 4 | α | 0.422 | 0.058 | 7.24 | <0.0001 | 0.26 |
| | β | 0 | 0 | −1.2 | 0.26 | |
| 5 | α | 0.352 | 0.071 | 4.98 | <0.0001 | 0.37 |
| | β | 4.274 | 27.749 | 0.15 | 0.88 | |
| 6 | α | −44.763 | 58.595 | −0.76 | 0.467 | 0.73 |
| | c | 39.883 | 33.856 | 1.18 | 0.273 | |
| | β | 0.366 | 0.123 | 2.98 | 0.018 | |
Hald’s data.
| y | x1 | x2 | x3 | x4 |
|---|---|---|---|---|
| 0.955 | 7 | 26 | 6 | 60 |
| 0.746 | 1 | 29 | 15 | 52 |
| −2.323 | 11 | 56 | 8 | 20 |
| −0.82 | 11 | 31 | 8 | 47 |
| 0.471 | 7 | 52 | 6 | 33 |
| −0.299 | 11 | 55 | 9 | 22 |
| 0.21 | 3 | 71 | 17 | 6 |
| 0.558 | 1 | 31 | 22 | 44 |
| −0.119 | 2 | 54 | 18 | 22 |
| 0.496 | 21 | 47 | 4 | 26 |
| 0.781 | 1 | 40 | 23 | 34 |
| 0.918 | 11 | 66 | 9 | 12 |
| 0.918 | 10 | 68 | 8 | 12 |
Effects of collinearity.
| Model | Variable | DF | Parameter | SE | Pr > \|t\| | R² |
|---|---|---|---|---|---|---|
| 1 | Intercept | 1 | 4.49 | 4.33 | 0.3227 | 0.05 |
| | | 1 | −0.34 | 0.46 | 0.4724 | |
| 2 | Intercept | 1 | −0.76 | 5.60 | 0.8948 | 0.026 |
| | | 1 | 0.09 | 0.16 | 0.5988 | |
| 3 | Intercept | 1 | 6.81 | 17.62 | 0.708 | 0.05 |
| | | 1 | −0.37 | 0.92 | 0.6962 | |
| | | 1 | −0.03 | 0.20 | 0.8775 | |
| | | 1 | −0.05 | 0.83 | 0.9522 | |
| 4 | Intercept | 1 | −804.74 | 132.13 | 0.0003 | 0.83 |
| | | 1 | 7.90 | 1.40 | 0.0005 | |
| | | 1 | 8.35 | 1.36 | 0.0003 | |
| | | 1 | 8.41 | 1.42 | 0.0004 | |
| | | 1 | 8.23 | 1.34 | 0.0003 | |
Eigenvalues and eigenvectors for Hald’s data.
| | Prin1 | Prin2 | Prin3 | Prin4 |
|---|---|---|---|---|
| Eigenvalue | 2.236 | 1.576 | 0.187 | 0.002 |
| x1 | 0.476 | −0.509 | 0.676 | 0.241 |
| x2 | 0.564 | 0.414 | −0.314 | 0.642 |
| x3 | −0.394 | 0.605 | 0.638 | 0.268 |
| x4 | −0.548 | −0.451 | −0.195 | 0.677 |
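To see why the last component looks dispensable by the variance criterion, the eigenvalues above can be converted to shares of total variance. This is a minimal sketch, assuming the PCA was run on the correlation matrix (the eigenvalues sum to about 4, the number of variables); the predictor rows are the four right-hand columns of the Hald's data table, and all variable names are illustrative.

```python
# Eigenvalues from the table above (Prin1..Prin4).
eigenvalues = [2.236, 1.576, 0.187, 0.002]

total = sum(eigenvalues)  # close to 4 for four standardized variables
shares = [ev / total for ev in eigenvalues]

# Running total of the variance shares.
cumulative = []
running = 0.0
for share in shares:
    running += share
    cumulative.append(running)

for k, (share, cum) in enumerate(zip(shares, cumulative), start=1):
    print(f"Prin{k}: share={share:.4f}, cumulative={cum:.4f}")

# The four predictor columns of Hald's data, one tuple per observation.
predictors = [
    (7, 26, 6, 60), (1, 29, 15, 52), (11, 56, 8, 20), (11, 31, 8, 47),
    (7, 52, 6, 33), (11, 55, 9, 22), (3, 71, 17, 6), (1, 31, 22, 44),
    (2, 54, 18, 22), (21, 47, 4, 26), (1, 40, 23, 34), (11, 66, 9, 12),
    (10, 68, 8, 12),
]
row_sums = [sum(row) for row in predictors]
print("row sums:", row_sums)
```

By variance share alone, Prin4 (about 0.05% of the total) would be discarded, yet in the regression on PCA scores it is the only component with a small p-value (0.0003). The row sums all fall between 95 and 99, which points to the near-linear dependence among the predictors that this last component plausibly reflects.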
Regression analysis by PCA score.
| Model | Variable | Parameter | SE | Pr > \|t\| | R² |
|---|---|---|---|---|---|
| Prin1–Prin3 | Intercept | 0.19 | 0.29 | 0.52 | 0.0596 |
| | Prin1 | −0.13 | 0.2 | 0.53 | |
| | Prin2 | 0.06 | 0.24 | 0.82 | |
| | Prin3 | −0.2 | 0.69 | 0.77 | |
| Prin1–Prin4 | Intercept | 0.19 | 0.13 | 0.17 | 0.8345 |
| | Prin1 | −0.13 | 0.09 | 0.18 | |
| | Prin2 | 0.06 | 0.11 | 0.61 | |
| | Prin3 | −0.2 | 0.31 | 0.53 | |
| | Prin4 | 20.22 | 3.3 | 0.0003 | |
Comparison of regression coefficients in multiple regressions.
| Model | Variable | Parameter | SE | p |
|---|---|---|---|---|
| 3 | Intercept | 278.916 | 48.804 | 0.000 |
| | | −1.826 | 0.685 | 0.024 |
| | | 0.132 | 0.254 | 0.614 |
| | | −0.712 | 0.312 | 0.046 |
| 4 | Intercept | 0.000 | 1.544 | 1.000 |
| | | −0.712 | 0.285 | 0.028 |
Figure 3. The unexplained information of .
Employee survey data.
| Obs | Age | Graduate | Load | Grade A | Grade B | Grade C | Grade D |
|---|---|---|---|---|---|---|---|
| 1 | <25 | Yes | <4 | 10 | 7 | 9 | 8 |
| 2 | <25 | No | <4 | 18 | 28 | 15 | 12 |
| 3 | >25 | Yes | <4 | 12 | 13 | 11 | 9 |
| 4 | >25 | No | <4 | 38 | 28 | 22 | 14 |
| 5 | <25 | Yes | 4 | 17 | 23 | 32 | 19 |
| 6 | <25 | No | 4 | 25 | 27 | 21 | 11 |
| 7 | ≥25 | Yes | 4 | 12 | 24 | 37 | 27 |
| 8 | ≥25 | No | 4 | 17 | 29 | 31 | 9 |
| 9 | ≤25 | Yes | 5 | 7 | 6 | 12 | 15 |
| 10 | ≤25 | No | 5 | 11 | 25 | 34 | 21 |
| 11 | ≥25 | Yes | 5 | 6 | 12 | 9 | 44 |
| 12 | ≥25 | No | 5 | 12 | 39 | 20 | 22 |
| 13 | ≤25 | Yes | >5 | 9 | 9 | 13 | 18 |
| 14 | ≤25 | No | >5 | 12 | 24 | 29 | 23 |
| 15 | ≥25 | Yes | >5 | 8 | 9 | 6 | 40 |
| 16 | ≥25 | No | >5 | 15 | 29 | 23 | 24 |
Comparison of the estimates from log-linear and logistic models.
| Log-linear model: Source | DF | Chi-Square | Pr > ChiSq | Logistic model: Effect | DF | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|---|
| Grade | 3 | 21.72 | <0.0001 | ||||
| Age | 1 | 5.77 | 0.02 | ||||
| Age*grade | 3 | 6.96 | 0.07 | Age | 3 | 7.05 | 0.07 |
| Graduate | 1 | 53.63 | <0.0001 | ||||
| Graduate*grade | 3 | 48.76 | <0.0001 | Graduate | 3 | 47.98 | <0.0001 |
| Age*graduate | 1 | 0.59 | 0.44 | ||||
| Age*graduate*grade | 3 | 8.64 | 0.03 | Age*graduate | 3 | 6.90 | 0.08 |
| Load | 3 | 24.46 | <0.0001 | ||||
| Load*grade | 9 | 79.55 | <0.0001 | Load | 9 | 74.74 | <0.0001 |
| Load*graduate | 3 | 40.73 | <0.0001 | ||||
Results from multinomial logistic regression.
| Parameter | Score | Estimate (β) | SE | p | OR | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|---|
| Intercept | vg | 0.000 | 0.200 | 1.000 | | | |
| Intercept | g | 0.095 | 0.195 | 0.626 | | | |
| Intercept | m | 1.030 | 0.165 | <0.0001 | | | |
| Intercept | b | 0.365 | 0.184 | 0.048 | | | |
| Game 1 | vg | 0.420 | 0.276 | 0.128 | 1.522 | 0.886 | 2.612 |
| Game 1 | g | 0.339 | 0.272 | 0.213 | 1.403 | 0.823 | 2.391 |
| Game 1 | m | 0.159 | 0.236 | 0.500 | 1.172 | 0.739 | 1.86 |
| Game 1 | b | −0.099 | 0.269 | 0.713 | 0.906 | 0.535 | 1.534 |
| Game 2 | vg | −1.253 | 0.323 | 0.000 | 0.286 | 0.152 | 0.538 |
| Game 2 | g | −0.760 | 0.283 | 0.007 | 0.468 | 0.268 | 0.815 |
| Game 2 | m | −0.411 | 0.222 | 0.064 | 0.663 | 0.43 | 1.024 |
| Game 2 | b | −0.231 | 0.246 | 0.348 | 0.794 | 0.490 | 1.286 |
Model fitting.
| Criterion | Intercept only | Intercept and covariates |
|---|---|---|
| AIC | 3356 | 3323 |
| SC | 3376 | 3383 |
| −2 Log L | 3348 | 3299 |
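The odds ratios and confidence limits in the table above follow from the coefficients and standard errors. The sketch below assumes Wald intervals, exp(β ± 1.96 × SE), and the usual definition AIC = −2 Log L + 2k; the function names and the parameter counts (4 intercepts, and 4 intercepts plus 8 Game coefficients) are assumptions inferred from the table, not stated in the source.

```python
import math

def odds_ratio_with_ci(beta, se, z=1.96):
    """Odds ratio and Wald 95% limits from a logit coefficient and its SE."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def aic(neg2_log_lik, n_params):
    """AIC = -2 log L + 2k, with k the number of estimated parameters."""
    return neg2_log_lik + 2 * n_params

# Game 1, score vg: beta = 0.420, SE = 0.276 (from the table above).
or_g1, lo_g1, hi_g1 = odds_ratio_with_ci(0.420, 0.276)
# Game 2, score vg: beta = -1.253, SE = 0.323.
or_g2, lo_g2, hi_g2 = odds_ratio_with_ci(-1.253, 0.323)

print(f"Game 1 vg: OR={or_g1:.3f} ({lo_g1:.3f}, {hi_g1:.3f})")
print(f"Game 2 vg: OR={or_g2:.3f} ({lo_g2:.3f}, {hi_g2:.3f})")

# Assumed parameter counts: 4 (intercepts only) and 12 (intercepts plus Game terms).
aic_small = aic(3348, 4)
aic_full = aic(3299, 12)
print("AIC:", aic_small, aic_full)
```

With these inputs the sketch reproduces the tabulated odds ratios and limits to rounding error, and the two AIC values match the model-fitting entries (3356 and 3323), which supports reading them as the intercept-only and full-model criteria.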
Analysis of maximum likelihood estimates.
| Parameter | | DF | Estimate (slope) | SE | Wald Chi-Square | Pr > ChiSq |
|---|---|---|---|---|---|---|
| Intercept | vg | 1 | −1.9087 | 0.1189 | 257.592 | <0.0001 |
| Intercept | g | 1 | −0.9356 | 0.1022 | 83.7943 | <0.0001 |
| Intercept | m | 1 | 0.7305 | 0.1004 | 52.9434 | <0.0001 |
| Intercept | b | 1 | 1.8493 | 0.1162 | 253.437 | <0.0001 |
| Game | Game 1 | 1 | 0.3098 | 0.1307 | 5.6179 | 0.0178 |
| Game | Game 2 | 1 | −0.5748 | 0.1365 | 17.7412 | <0.0001 |
| Game | Game 3 | 0 | 0 | – | – | |
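Each Wald chi-square in the table above is the squared ratio of an estimate to its standard error, referred to a chi-square distribution with 1 degree of freedom. The following is a minimal sketch of that calculation; `wald_test` is a hypothetical helper, and small differences from the table are expected because the printed estimates and standard errors are rounded.

```python
import math

def wald_test(estimate, se):
    """Return (chi-square, p-value) for a 1-df Wald test of estimate = 0."""
    chi2 = (estimate / se) ** 2
    # For 1 df, P(chi-square > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Game 1 and Game 2 rows of the table above.
chi2_g1, p_g1 = wald_test(0.3098, 0.1307)
chi2_g2, p_g2 = wald_test(-0.5748, 0.1365)

print(f"Game 1: chi2={chi2_g1:.4f}, p={p_g1:.4f}")
print(f"Game 2: chi2={chi2_g2:.4f}, p={p_g2:.6f}")
```

Because the ordinal model estimates a single slope per Game level rather than one per response category, each effect is tested with 1 degree of freedom here.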
Comparisons between OR and HR for approximation of the RR.
| RR | Risk p1 | Risk p0 | OR | HR |
|---|---|---|---|---|
| 1.1 | 0.011 | 0.01 | 1.1011 | 1.1006 |
| | 0.11 | 0.1 | 1.112 | 1.106 |
| | 0.22 | 0.2 | 1.128 | 1.113 |
| 1.5 | 0.015 | 0.01 | 1.508 | 1.504 |
| | 0.15 | 0.1 | 1.588 | 1.543 |
| | 0.3 | 0.2 | 1.714 | 1.598 |
| 2.0 | 0.02 | 0.01 | 2.020 | 2.010 |
| | 0.2 | 0.1 | 2.250 | 2.118 |
| | 0.4 | 0.2 | 2.667 | 2.289 |
| 3.0 | 0.03 | 0.01 | 3.062 | 3.031 |
| | 0.3 | 0.1 | 3.857 | 3.385 |
| | 0.6 | 0.2 | 6.000 | 4.106 |
| 4.0 | 0.04 | 0.01 | 4.125 | 4.062 |
| | 0.4 | 0.1 | 6.000 | 4.848 |
| | 0.8 | 0.2 | 16.000 | 7.213 |
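The pattern in the table above, where both OR and HR drift away from RR as the baseline risk grows, can be reproduced from the two risks alone. This is a sketch under stated assumptions: the risks are cumulative incidences over a common follow-up period, hazards are constant so that risk = 1 − exp(−hazard × time), and the names `p1`, `p0`, `odds_ratio`, and `hazard_ratio` are illustrative rather than the paper's notation.

```python
import math

def relative_risk(p1, p0):
    """Ratio of the two cumulative risks."""
    return p1 / p0

def odds_ratio(p1, p0):
    """Ratio of the odds p / (1 - p) in the two groups."""
    return (p1 / (1 - p1)) / (p0 / (1 - p0))

def hazard_ratio(p1, p0):
    """Hazard ratio implied by constant hazards: ln(1 - p1) / ln(1 - p0)."""
    return math.log(1 - p1) / math.log(1 - p0)

# Baseline risks of 1%, 10%, and 20% with a true relative risk of 3.
for p0 in (0.01, 0.1, 0.2):
    p1 = 3 * p0
    print(
        f"p0={p0:.2f}  RR={relative_risk(p1, p0):.3f}  "
        f"OR={odds_ratio(p1, p0):.3f}  HR={hazard_ratio(p1, p0):.3f}"
    )
```

Under these assumptions the HR stays closer to the RR than the OR does, and both approximate the RR well only when the outcome is rare.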
| Response function | Assigned score |
|---|---|
| Very good: ln( | 0 |
| Good: ln( | 1 |
| Medium: ln( | 2 |
| Bad: ln(( | 3 |