Chia-Ling Kuo, Yinghui Duan, James Grady.
Abstract
Matching on demographic variables is commonly used in case-control studies to adjust for confounding at the design stage. There is a presumption that matched data need to be analyzed by matched methods. Conditional logistic regression has become a standard for matched case-control data to tackle the sparse data problem. The sparse data problem, however, may not be a concern for loose-matching data, where the matching between cases and controls is not unique and one case can be matched to other controls without substantially changing the association. Data matched on a few demographic variables are clearly loose-matching data, and we hypothesize that unconditional logistic regression is a proper method for analyzing them. To test the hypothesis, we compare unconditional and conditional logistic regression models by precision in estimates and hypothesis testing using simulated matched case-control data. Our results support our hypothesis; however, the unconditional model is not as robust as the conditional model to matching distortion, in which the matching process makes cases and controls similar not only for the matching variables but also for exposure status. When the study design involves other complex features or the computational burden is high, matching in loose-matching data can be ignored with negligible loss in testing and estimation, provided the distributions of the matching variables are not extremely different between cases and controls.
Keywords: frequency matching; individual matching; loose matching; precision in estimates and tests; sparse data problem; width of 95% confidence interval
Year: 2018 PMID: 29552553 PMCID: PMC5840200 DOI: 10.3389/fpubh.2018.00057
Source DB: PubMed Journal: Front Public Health ISSN: 2296-2565
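As a rough illustration of the comparison described in the abstract, the sketch below simulates a population, forms 1:1 age-matched case-control pairs with tolerance d, and computes an unconditional and a conditional estimate of the exposure odds ratio. It is not the authors' simulation code. The exposure prevalence (0.3), the baseline log odds (−3), the age odds ratio (2.0 per 10 years), the population size, the seed, and all function names are assumptions made for illustration. The age distributions follow the Figure 1 scenario, taken here to mean normal distributions with means 70 and 65 and standard deviation 10. The unconditional analysis here uses exposure as the only covariate, which reduces to the pooled 2 by 2 odds ratio; the conditional analysis uses the closed-form discordant-pair ratio, which applies only to 1:1 pairs with a single binary covariate.

```python
import math
import random


def simulate_population(n, or_exposure, or_age10, rng):
    """Draw (age, exposed, diseased) records from a logistic disease model."""
    people = []
    for _ in range(n):
        exposed = rng.random() < 0.3  # assumed exposure prevalence
        # Assumed reading of the Figure 1 scenario: N(70, 10^2) vs N(65, 10^2).
        age = round(rng.gauss(70 if exposed else 65, 10))
        logit = (-3.0  # assumed baseline log odds of disease
                 + math.log(or_exposure) * exposed
                 + math.log(or_age10) * (age - 65) / 10)
        diseased = rng.random() < 1 / (1 + math.exp(-logit))
        people.append((age, exposed, diseased))
    return people


def match_pairs(people, d, rng):
    """Match each case to one unused control whose age is within +/- d years."""
    pools = {}
    for age, exposed, diseased in people:
        if not diseased:
            pools.setdefault(age, []).append(exposed)
    for pool in pools.values():
        rng.shuffle(pool)
    pairs = []
    for age, exposed, diseased in people:
        if not diseased:
            continue
        # Try the closest ages first; cases with no available control are dropped.
        for a in sorted(range(age - d, age + d + 1), key=lambda x: abs(x - age)):
            if pools.get(a):
                pairs.append((exposed, pools[a].pop()))
                break
    return pairs


def compare_estimates(pairs):
    """Return (OR, width of 95% Wald CI for log OR) for both analyses."""
    n10 = sum(1 for case, ctrl in pairs if case and not ctrl)
    n01 = sum(1 for case, ctrl in pairs if ctrl and not case)
    # Conditional MLE for 1:1 pairs with one binary covariate: discordant-pair ratio.
    cond = (n10 / n01, 2 * 1.96 * math.sqrt(1 / n10 + 1 / n01))
    a = sum(case for case, _ in pairs)       # exposed cases
    b = sum(ctrl for _, ctrl in pairs)       # exposed controls
    c, dd = len(pairs) - a, len(pairs) - b   # unexposed cases, unexposed controls
    # Unconditional model with exposure as the only covariate: pooled 2x2 odds ratio.
    uncond = (a * dd / (b * c), 2 * 1.96 * math.sqrt(1 / a + 1 / b + 1 / c + 1 / dd))
    return {"unconditional": uncond, "conditional": cond}


rng = random.Random(2018)
population = simulate_population(20000, or_exposure=1.5, or_age10=2.0, rng=rng)
pairs = match_pairs(population, d=2, rng=rng)
for name, (odds_ratio, width) in compare_estimates(pairs).items():
    print(f"{name}: OR = {odds_ratio:.2f}, CI width (log scale) = {width:.2f}")
```

A fuller comparison would fit both models with a regression library and could include age as a covariate in the unconditional model; the closed forms are used here only to keep the sketch dependency-free.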
Figure 1. Age distributions of cases (white) and controls (grey) in the population, where the age distributions of exposed and unexposed subjects are N(70, 10²) and N(65, 10²), respectively, and OR (age×10) denotes the odds ratio associated with a 10-year increase in age.
Figure 2. Age distributions of cases (white) and controls (grey) in the population, where the age distributions of exposed and unexposed subjects are N(70, 10²) and N(50, 10²), respectively, and OR (age×10) denotes the odds ratio associated with a 10-year increase in age.
Type I errors of unconditional and conditional logistic regression models.
| d | Age distribution of unexposed and exposed subjects (in years) | | | | | |
|---|---|---|---|---|---|---|
| | Unconditional | Conditional | Unconditional | Conditional | Unconditional | Conditional |
| 0 | 0.048 | 0.048 | 0.046 | 0.046 | 0.053 | |
| 1 | 0.052 | 0.051 | 0.047 | 0.048 | 0.052 | |
| 2 | 0.049 | 0.049 | 0.049 | 0.050 | 0.049 | |
| 3 | 0.051 | 0.051 | 0.050 | 0.050 | 0.048 | |
| 0 | 0.050 | 0.049 | 0.050 | 0.050 | 0.051 | |
| 1 | 0.048 | 0.046 | 0.049 | |||
| 2 | 0.050 | 0.049 | 0.051 | 0.051 | 0.052 | |
| 3 | 0.052 | 0.050 | 0.052 | 0.051 | 0.049 | |
| 0 | 0.051 | 0.050 | 0.051 | 0.051 | 0.047 | |
| 1 | 0.048 | 0.049 | 0.051 | 0.050 | 0.050 | |
| 2 | 0.053 | 0.053 | 0.046 | 0.046 | 0.047 | |
| 3 | 0.052 | 0.049 | 0.047 | 0.048 | 0.048 | |
| 0 | 0.049 | 0.048 | 0.050 | |||
| 1 | 0.048 | 0.047 | 0.051 | 0.052 | 0.050 | |
| 2 | 0.047 | 0.050 | 0.050 | 0.053 | ||
| 3 | 0.052 | 0.052 | 0.050 | 0.050 | 0.046 | 0.053 |
Cases and controls were matched by age ± d. The odds ratio associated with the exposure was 1 under the null hypothesis, H₀.
Power of unconditional and conditional logistic regression models.
| d | Age distribution of unexposed and exposed subjects (in years) | | | | | |
|---|---|---|---|---|---|---|
| | Unconditional | Conditional | Unconditional | Conditional | Unconditional | Conditional |
| 0 | 0.73 | 0.73 | 0.78 | 0.77 | 0.78 | 0.80 |
| 1 | 0.77 | 0.76 | 0.76 | 0.77 | 0.78 | 0.81 |
| 2 | 0.73 | 0.72 | 0.76 | 0.75 | 0.78 | 0.81 |
| 3 | 0.73 | 0.73 | 0.77 | 0.78 | 0.78 | 0.80 |
| 0 | 0.76 | 0.76 | 0.80 | 0.80 | 0.82 | 0.84 |
| 1 | 0.75 | 0.74 | 0.81 | 0.82 | 0.79 | 0.83 |
| 2 | 0.80 | 0.80 | 0.81 | 0.80 | 0.81 | 0.83 |
| 3 | 0.78 | 0.78 | 0.82 | 0.82 | 0.81 | 0.84 |
| 0 | 0.80 | 0.79 | 0.80 | 0.80 | 0.76 | 0.78 |
| 1 | 0.79 | 0.79 | 0.83 | 0.82 | 0.81 | 0.83 |
| 2 | 0.79 | 0.79 | 0.82 | 0.80 | 0.78 | 0.82 |
| 3 | 0.78 | 0.77 | 0.80 | 0.80 | 0.75 | 0.76 |
| 0 | 0.76 | 0.76 | 0.83 | 0.83 | 0.77 | 0.80 |
| 1 | 0.80 | 0.80 | 0.85 | 0.85 | 0.76 | 0.78 |
| 2 | 0.79 | 0.78 | 0.82 | 0.82 | 0.75 | 0.79 |
| 3 | 0.81 | 0.78 | 0.85 | 0.83 | 0.71 | 0.74 |
Cases and controls were matched by age ± d. The odds ratio associated with the exposure was 1.5 under the alternative hypothesis.
Biases of unconditional and conditional logistic regression models under the null hypothesis.
| d | Age distribution of unexposed and exposed subjects (in years) | | | | | |
|---|---|---|---|---|---|---|
| | Unconditional | Conditional | Unconditional | Conditional | Unconditional | Conditional |
| 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 |
| 3 | 0.00 | 0.00 | 0.01 | 0.01 | 0.01 | 0.01 |
| 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.00 |
| 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 3 | 0.00 | 0.00 | 0.00 | 0.00 | −0.01 | 0.00 |
| 0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 1 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| 2 | 0.00 | 0.00 | 0.00 | 0.00 | −0.01 | −0.01 |
| 3 | 0.00 | 0.00 | 0.00 | 0.00 | −0.02 | −0.01 |
Cases and controls were matched by age ± d. The odds ratio associated with the exposure was 1 under the null hypothesis, H₀.
Widths of 95% confidence interval of unconditional and conditional logistic regression models under the null hypothesis.
| d | Age distribution of unexposed and exposed subjects (in years) | | | | | |
|---|---|---|---|---|---|---|
| | Unconditional | Conditional | Unconditional | Conditional | Unconditional | Conditional |
| 0 | 0.60 | 0.61 | 0.57 | 0.58 | 0.52 | 0.60 |
| 1 | 0.58 | 0.59 | 0.60 | 0.61 | 0.51 | 0.59 |
| 2 | 0.62 | 0.63 | 0.58 | 0.60 | 0.50 | 0.58 |
| 3 | 0.61 | 0.62 | 0.59 | 0.60 | 0.50 | 0.58 |
| 0 | 0.61 | 0.62 | 0.57 | 0.58 | 0.49 | 0.56 |
| 1 | 0.56 | 0.57 | 0.56 | 0.57 | 0.48 | 0.54 |
| 2 | 0.58 | 0.58 | 0.56 | 0.57 | 0.48 | 0.54 |
| 3 | 0.60 | 0.62 | 0.55 | 0.56 | 0.50 | 0.56 |
| 0 | 0.57 | 0.58 | 0.56 | 0.57 | 0.50 | 0.56 |
| 1 | 0.59 | 0.61 | 0.53 | 0.54 | 0.50 | 0.57 |
| 2 | 0.57 | 0.59 | 0.56 | 0.57 | 0.51 | 0.58 |
| 3 | 0.60 | 0.61 | 0.56 | 0.58 | 0.52 | 0.59 |
| 0 | 0.58 | 0.59 | 0.56 | 0.57 | 0.53 | 0.59 |
| 1 | 0.57 | 0.58 | 0.53 | 0.54 | 0.50 | 0.57 |
| 2 | 0.57 | 0.58 | 0.55 | 0.57 | 0.52 | 0.60 |
| 3 | 0.61 | 0.63 | 0.54 | 0.56 | 0.53 | 0.61 |
Cases and controls were matched by age ± d. The odds ratio associated with the exposure was 1 under the null hypothesis, H₀.
Percents of bias (%) of unconditional and conditional logistic regression models under the alternative hypothesis.
| d | Age distribution of unexposed and exposed subjects (in years) | | | | | |
|---|---|---|---|---|---|---|
| | Unconditional | Conditional | Unconditional | Conditional | Unconditional | Conditional |
| 0 | −1.36 | −1.22 | −0.40 | 0.95 | 1.49 | |
| 1 | 0.27 | 0.68 | −1.53 | −0.32 | 1.61 | |
| 2 | −1.75 | −1.33 | −1.64 | −0.11 | 0.95 | |
| 3 | −1.04 | −0.79 | −0.23 | 1.38 | 3.26 | |
| 0 | 0.14 | 0.17 | 0.31 | 0.91 | 2.44 | |
| 1 | −3.93 | −3.49 | 0.26 | 1.22 | 0.15 | |
| 2 | 2.48 | 3.17 | 0.28 | 1.43 | 1.62 | |
| 3 | 1.69 | 1.97 | 0.51 | 1.38 | 0.89 | |
| 0 | −0.10 | −0.05 | −1.85 | −1.16 | −2.52 | |
| 1 | 1.93 | 2.49 | −1.82 | −1.32 | 0.78 | |
| 2 | 0.06 | 0.66 | −0.73 | −0.20 | 0.45 | |
| 3 | 0.91 | 1.17 | −0.26 | 0.22 | −3.13 | |
| 0 | −2.16 | −2.13 | 0.10 | 0.72 | 0.30 | |
| 1 | 0.84 | 1.38 | 0.34 | 1.22 | −0.08 | |
| 2 | 0.08 | 0.39 | −0.56 | 0.04 | 0.99 | |
| 3 | 1.77 | 1.98 | 1.15 | 1.67 | −2.68 | |
Cases and controls were matched by age ± d. The odds ratio associated with the exposure was 1.5 under the alternative hypothesis.
Widths of 95% confidence interval of unconditional and conditional logistic regression models under the alternative hypothesis.
| d | Age distribution of unexposed and exposed subjects (in years) | | | | | |
|---|---|---|---|---|---|---|
| | Unconditional | Conditional | Unconditional | Conditional | Unconditional | Conditional |
| 0 | 0.60 | 0.61 | 0.57 | 0.58 | 0.52 | 0.60 |
| 1 | 0.58 | 0.59 | 0.60 | 0.61 | 0.51 | 0.59 |
| 2 | 0.62 | 0.63 | 0.58 | 0.60 | 0.50 | 0.58 |
| 3 | 0.61 | 0.62 | 0.59 | 0.60 | 0.50 | 0.58 |
| 0 | 0.61 | 0.62 | 0.57 | 0.58 | 0.49 | 0.56 |
| 1 | 0.56 | 0.57 | 0.56 | 0.57 | 0.48 | 0.54 |
| 2 | 0.58 | 0.58 | 0.56 | 0.57 | 0.48 | 0.54 |
| 3 | 0.60 | 0.62 | 0.55 | 0.56 | 0.50 | 0.56 |
| 0 | 0.57 | 0.58 | 0.56 | 0.57 | 0.50 | 0.56 |
| 1 | 0.59 | 0.61 | 0.53 | 0.54 | 0.50 | 0.57 |
| 2 | 0.57 | 0.59 | 0.56 | 0.57 | 0.51 | 0.58 |
| 3 | 0.60 | 0.61 | 0.56 | 0.58 | 0.52 | 0.59 |
| 0 | 0.58 | 0.59 | 0.56 | 0.57 | 0.53 | 0.59 |
| 1 | 0.57 | 0.58 | 0.53 | 0.54 | 0.50 | 0.57 |
| 2 | 0.57 | 0.58 | 0.55 | 0.57 | 0.52 | 0.60 |
| 3 | 0.61 | 0.63 | 0.54 | 0.56 | 0.53 | 0.61 |
Cases and controls were matched by age ± d. The odds ratio associated with the exposure was 1.5 under the alternative hypothesis.
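The tables above report type I error, power, bias, percent bias, and 95% confidence interval width. The following is a minimal sketch of how such summaries are commonly computed from simulation replicates; it is not taken from the study. It assumes a two-sided Wald test at the 5% level, estimates and intervals on the log odds ratio scale, and percent bias expressed relative to the true log odds ratio. The function name and the replicate values are hypothetical.

```python
import math


def summarize_replicates(estimates, std_errors, true_log_or, z=1.96):
    """Summarize simulated log odds ratio estimates and their standard errors."""
    n = len(estimates)
    # A replicate rejects H0: log OR = 0 when its Wald statistic exceeds z in absolute value.
    rejections = sum(1 for b, se in zip(estimates, std_errors) if abs(b / se) > z)
    bias = sum(estimates) / n - true_log_or
    summary = {
        "rejection_rate": rejections / n,
        "bias": bias,
        "mean_ci_width": sum(2 * z * se for se in std_errors) / n,
    }
    if true_log_or != 0:
        # Percent bias is undefined when the true log odds ratio is zero (the null).
        summary["percent_bias"] = 100 * bias / true_log_or
    return summary


# Hypothetical replicates, for illustration only (not data from the study).
estimates = [0.50, 0.30, 0.45, 0.10]
std_errors = [0.15, 0.15, 0.15, 0.15]
print(summarize_replicates(estimates, std_errors, true_log_or=math.log(1.5)))
```

When the replicates are generated with a true odds ratio of 1, the rejection rate estimates the type I error; when they are generated with a true odds ratio of 1.5, it estimates power.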
2 by 2 table of exposure status vs. disease status.
| | Case | Control |
|---|---|---|
| Exposed | | |
| Unexposed | | |