Abstract
BACKGROUND: In applied epidemiological and clinical research, continuous variables are commonly converted into categorical variables by grouping values into categories. Such categorized variables are then often used as exposure variables in a regression model. There are numerous statistical arguments for avoiding this practice, and in this paper we present yet another such argument.
Keywords: Categorization; Dichotomization; Interaction; Measurement error; Regression
Year: 2019 PMID: 30732587 PMCID: PMC6367751 DOI: 10.1186/s12874-019-0667-2
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 The figure gives the absolute value of the ratio between and for ρ = 0.2 (blue line), 0.5 (red line) and 0.7 (green line) as a function of the cut point c when β1 = β2 = 1
Fig. 2 The figure gives for ρ = 0.2 (blue line), 0.5 (red line) and 0.7 (green line) as a function of the cut point c. True β2 = β1 = 1
Results of the simulation study
| Cut point | | Normal | Uniform | Chi-square |
|---|---|---|---|---|
| 60th percentile | | 1.74 (0.11) | 0.54 (0.04) | 2.99 (0.25) |
| | | 1.73 (0.11) | 0.54 (0.04) | 3.00 (0.26) |
| | | −0.06 (0.20) | −0.03 (0.06) | 0.63 (0.40) |
| 60th percentile | | 1.89 (0.14) | 0.59 (0.04) | 2.81 (0.28) |
| | | 1.88 (0.14) | 0.59 (0.04) | 2.83 (0.29) |
| | | −0.16 (0.21) | −0.08 (0.06) | 1.38 (0.43) |
| 60th percentile | | 1.95 (0.15) | 0.62 (0.04) | 2.60 (0.31) |
| | | 1.94 (0.15) | 0.61 (0.04) | 2.63 (0.32) |
| | | −0.24 (0.22) | −0.12 (0.07) | 1.82 (0.48) |
| 80th percentile | | 1.96 (0.14) | 0.57 (0.04) | 4.22 (0.27) |
| | | 1.96 (0.14) | 0.57 (0.04) | 4.23 (0.28) |
| | | −0.23 (0.26) | −0.11 (0.07) | 0.25 (0.55) |
| 80th percentile | | 2.22 (0.15) | 0.67 (0.04) | 4.35 (0.31) |
| | | 2.22 (0.16) | 0.67 (0.05) | 4.35 (0.32) |
| | | −0.58 (0.25) | −0.27 (0.07) | 0.50 (0.54) |
| 80th percentile | | 2.35 (0.16) | 0.74 (0.05) | 4.35 (0.35) |
| | | 2.36 (0.18) | 0.74 (0.05) | 4.34 (0.35) |
| | | −0.81 (0.26) | −0.40 (0.08) | 0.58 (0.56) |
The true regression coefficients are βi = 1, i = 1, 2. The table gives estimated regression coefficients with corresponding empirical standard errors in parentheses
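The design behind this table is not recoverable from this record, so the following is only a rough sketch of how such a simulation might be set up. It assumes two standard normal exposures with correlation ρ, an outcome generated with β1 = β2 = 1 and no interaction term, both exposures dichotomized at the same percentile, and a linear model fitted with a product term. The function name `simulate_dichotomized_fit`, the sample size, the unit noise variance and the seed are all hypothetical choices, not taken from the paper.

```python
import numpy as np

def simulate_dichotomized_fit(rho, percentile, n=200_000, seed=0):
    """Fit y ~ d1 + d2 + d1*d2 after dichotomizing two correlated exposures.

    Assumed data-generating model (not taken from the paper):
    y = x1 + x2 + noise, with no interaction between x1 and x2.
    Returns the fitted coefficients for d1, d2 and d1*d2.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x = rng.multivariate_normal([0.0, 0.0], cov, size=n)

    # True coefficients are 1 and 1; the true interaction is zero.
    y = x[:, 0] + x[:, 1] + rng.standard_normal(n)

    # Dichotomize each exposure at its own empirical percentile.
    cuts = np.percentile(x, percentile, axis=0)
    d = (x > cuts).astype(float)

    # Ordinary least squares with intercept, two indicators and their product.
    design = np.column_stack([np.ones(n), d[:, 0], d[:, 1], d[:, 0] * d[:, 1]])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1:]

if __name__ == "__main__":
    for rho in (0.2, 0.5, 0.7):
        b1, b2, b3 = simulate_dichotomized_fit(rho, percentile=80)
        print(f"rho={rho}: d1={b1:.2f}, d2={b2:.2f}, d1*d2={b3:.2f}")
```

Under these assumptions, with a positive ρ and a high cut point, the fitted product term generally comes out nonzero even though the data-generating model contains no interaction.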
Results of the simulation study
| Cut point | | Normal | Uniform | Chi-square |
|---|---|---|---|---|
| 80th percentile | | 1.65 (0.16) | 0.50 (0.05) | 3.35 (0.35) |
| | | 0.70 (0.18) | 0.24 (0.05) | 0.99 (0.35) |
| | | −0.40 (0.26) | −0.20 (0.08) | 0.29 (0.55) |
| 80th percentile | | 4.72 (0.17) | 1.48 (0.05) | 8.69 (0.36) |
| | | 4.72 (0.18) | 1.48 (0.05) | 8.68 (0.37) |
| | | −1.63 (0.27) | −0.79 (0.08) | 1.16 (0.61) |
The table gives estimated regression coefficients with corresponding empirical standard errors in parentheses
Illustration of the effect of collapsing exposure categories
| Exposure category | Diseased | Not diseased | Total | RR | Exposure category | Diseased | Not diseased | Total | RR |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 20 | 380 | 400 | 1.0 | 1 | 5 | 95 | 100 | 1.0 |
| 2 | 30 | 270 | 300 | 2.0 | 2 | 20 | 180 | 200 | 2.0 |
| 3 | 30 | 170 | 200 | 3.0 | 3 | 45 | 255 | 300 | 3.0 |
| 4 | 20 | 80 | 100 | 4.0 | 4 | 80 | 320 | 400 | 4.0 |
The table gives the true situation
Illustration of the effect of collapsing exposure categories
| Exposure category | Diseased | Not diseased | Total | RR | Exposure category | Diseased | Not diseased | Total | RR |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 50 | 650 | 700 | 1.00 | 1 | 25 | 275 | 300 | 1.00 |
| 2 | 50 | 250 | 300 | 2.33 | 2 | 125 | 575 | 700 | 2.14 |
The table gives the observed situation after collapsing
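The arithmetic behind the two collapsing tables can be checked with a short sketch. The counts are copied from the tables; the names `left_panel` and `right_panel` are hypothetical labels for the two halves of each table, whose original headings are not preserved in this record. The sketch assumes that collapsing merges categories 1 and 2 into one group and categories 3 and 4 into another, which is consistent with the totals shown.

```python
# (diseased, not diseased) counts for exposure categories 1 to 4,
# copied from the "true situation" table.
left_panel = [(20, 380), (30, 270), (30, 170), (20, 80)]
right_panel = [(5, 95), (20, 180), (45, 255), (80, 320)]

def risk_ratios(rows):
    """Risk in each category divided by the risk in the first category."""
    risks = [d / (d + nd) for d, nd in rows]
    return [r / risks[0] for r in risks]

def collapse(rows):
    """Merge categories 1+2 and 3+4 into two broader categories."""
    groups = [rows[0:2], rows[2:4]]
    return [(sum(d for d, _ in g), sum(nd for _, nd in g)) for g in groups]

if __name__ == "__main__":
    for name, rows in [("left", left_panel), ("right", right_panel)]:
        true_rr = [round(rr, 2) for rr in risk_ratios(rows)]
        collapsed_rr = [round(rr, 2) for rr in risk_ratios(collapse(rows))]
        print(name, "true RR:", true_rr, "collapsed RR:", collapsed_rr)
```

Both panels share the same category-specific risk ratios (1, 2, 3, 4), yet after collapsing the risk ratios differ (2.33 versus 2.14) because the two panels distribute their subjects differently across the original categories.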