Anna Oudin, Jonas Björk, Ulf Strömberg.
Abstract
BACKGROUND: We plan to conduct a case-control study to investigate whether exposure to nitrogen dioxide (NO2) increases the risk of stroke. In case-control studies, selective participation can lead to bias and loss of efficiency. A two-phase design can reduce bias and improve efficiency by combining information on the non-participating subjects with information from the participating subjects. In our planned study, we will have access to individual disease status and to NO2 exposure data at the group (area) level for a large population sample from Scania, southern Sweden. A smaller sub-sample will be selected for the second phase, in which exposure and covariates are assessed at the individual level. In this paper, we simulate a case-control study based on our planned study. We develop a two-phase method for this study and compare its performance with that of other two-phase methods.
Year: 2007 PMID: 17988388 PMCID: PMC2174445 DOI: 10.1186/1476-069X-6-34
Source DB: PubMed Journal: Environ Health ISSN: 1476-069X Impact factor: 5.984
Hypothetical population distribution in Scania
Group-level exposure probabilities (%) for 12 areas*, by exposure category (Low, Medium, High), and smoking prevalence (%) within each exposure category†.
| Exposure category | Area* 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | Smoking prevalence within exposure category† (%) |
| Low** | 92 | 89 | 88 | 85 | 77 | 60 | 40 | 35 | 32 | 20 | 20 | 16.5 | 5 |
| Medium** | 7 | 10 | 7 | 14 | 20 | 37 | 30 | 60 | 42 | 75 | 18 | 18.5 | 25 |
| High** | 1 | 1 | 5 | 1 | 3 | 3 | 30 | 5 | 26 | 5 | 62 | 65 | 40 |
| First-phase control distribution (%) | 2 | 9.5 | 3 | 7 | 0.5 | 4 | 5 | 12 | 15 | 6 | 1 | 35 | |
* The areas represent 12 municipalities in Scania, Southern Sweden.
** Overall exposure prevalence to the categories Low, Medium and High was 40%, 30% and 30%, respectively.
† Overall smoking prevalence in the population is 22%.
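As a rough illustration of how the hypothetical population fits together, the following Python sketch weights the area-level exposure probabilities by the first-phase control distribution to recover the overall prevalences quoted in the footnotes (about 40%, 30% and 30% for exposure, and about 22% for smoking), and then draws simulated controls. The names `AREA_WEIGHTS`, `EXPOSURE_BY_AREA`, `SMOKING_BY_EXPOSURE` and `simulate_controls` are illustrative and do not come from the paper. The Medium value for area 3 is taken as 7 so that the probabilities for that area sum to 100%, and the sampling scheme is only an assumption about how such a population could be generated.

```python
import random

# First-phase control distribution over the 12 areas (%), from the table.
AREA_WEIGHTS = [2, 9.5, 3, 7, 0.5, 4, 5, 12, 15, 6, 1, 35]

# Group-level exposure probabilities (%) per area, from the table.
# Assumption: Medium for area 3 is taken as 7 so the area sums to 100.
EXPOSURE_BY_AREA = {
    "Low":    [92, 89, 88, 85, 77, 60, 40, 35, 32, 20, 20, 16.5],
    "Medium": [7, 10, 7, 14, 20, 37, 30, 60, 42, 75, 18, 18.5],
    "High":   [1, 1, 5, 1, 3, 3, 30, 5, 26, 5, 62, 65],
}

# Smoking prevalence (%) within each exposure category, from the table.
SMOKING_BY_EXPOSURE = {"Low": 5, "Medium": 25, "High": 40}


def overall_exposure_prevalence():
    """Area-weighted exposure prevalence (%) for each category."""
    return {
        cat: sum(w * p for w, p in zip(AREA_WEIGHTS, probs)) / 100
        for cat, probs in EXPOSURE_BY_AREA.items()
    }


def overall_smoking_prevalence():
    """Smoking prevalence (%) averaged over the exposure categories."""
    exposure = overall_exposure_prevalence()
    return sum(exposure[c] * SMOKING_BY_EXPOSURE[c] for c in exposure) / 100


def simulate_controls(n, seed=0):
    """Draw n hypothetical controls as (area, exposure, smoker) tuples."""
    rng = random.Random(seed)
    categories = list(EXPOSURE_BY_AREA)
    controls = []
    for _ in range(n):
        # Pick an area, then an exposure category given the area,
        # then smoking status given the exposure category.
        area = rng.choices(range(12), weights=AREA_WEIGHTS)[0]
        probs = [EXPOSURE_BY_AREA[c][area] for c in categories]
        exposure = rng.choices(categories, weights=probs)[0]
        smoker = rng.random() < SMOKING_BY_EXPOSURE[exposure] / 100
        controls.append((area + 1, exposure, smoker))
    return controls


if __name__ == "__main__":
    print(overall_exposure_prevalence())
    print(overall_smoking_prevalence())
```

Drawing the area first and the exposure category second mirrors the structure of the table, where individual exposure is only known through the area-level probabilities.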
Simulation results based on 1,000 replications
| Method | True OR†† | Scenario 1: OR* | Scenario 1: SD* | Scenario 1: Ef** | Scenario 2: OR* | Scenario 2: SD* | Scenario 2: Ef** | Scenario 3: OR* | Scenario 3: SD* | Scenario 3: Ef** |
| Individual-level information on all subjects† | 1.50 | 1.49 | 0.18 | 100 | 1.50 | 0.12 | 100 | 1.50 | 0.16 | 100 |
| | 3.00 | 3.00 | 0.17 | 100 | 3.00 | 0.11 | 100 | 3.00 | 0.14 | 100 |
| Method 1‡ | 1.50 | 1.48 | 0.25 | 53 | 1.50 | 0.24 | 25 | 1.50 | 0.25 | 43 |
| | 3.00 | 3.01 | 0.24 | 51 | 2.98 | 0.23 | 23 | 2.99 | 0.23 | 40 |
| Method 2‡ | 1.50 | 1.60 | 0.32 | 29 | 1.56 | 0.24 | 25 | 1.59 | 0.29 | 28 |
| | 3.00 | 3.05 | 0.27 | 39 | 3.01 | 0.18 | 39 | 3.04 | 0.24 | 37 |
| Method 3‡ | 1.50 | 1.49 | 0.24 | 58 | 1.50 | 0.22 | 29 | 1.50 | 0.24 | 46 |
| | 3.00 | 3.00 | 0.22 | 59 | 2.98 | 0.21 | 29 | 3.00 | 0.21 | 47 |
| Method 4‡ | 1.50 | 1.49 | 0.19 | 81 | 1.50 | 0.20 | 37 | 1.51 | 0.19 | 68 |
| | 3.00 | 2.99 | 0.16 | 78 | 3.01 | 0.17 | 44 | 3.02 | 0.18 | 66 |
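* Geometric mean of the OR estimates and the empirical standard deviation of the ln(OR) estimates.
** Efficiency (%) of the ln(OR) estimates, relative to the ideal scenario: $\mathrm{eff}_1 = 100 \times \dfrac{\operatorname{var}(\ln \mathrm{OR}_{\mathrm{ref}}) + (\ln \mathrm{OR}_{\mathrm{true}} - \ln \mathrm{OR}_{\mathrm{ref}})^2}{\operatorname{var}(\ln \mathrm{OR}_1) + (\ln \mathrm{OR}_{\mathrm{true}} - \ln \mathrm{OR}_1)^2}$, where $\mathrm{OR}_1$ is the estimate from the method in question and $\mathrm{OR}_{\mathrm{ref}}$ is the estimate in the ideal scenario. Efficiencies were calculated while varying the number of first-phase subjects; the numbers of second-phase cases and controls were held fixed at 300 cases and 300 controls.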
† Ideal scenario.
‡ Methods 1–4 are further described in the Methods section and in Table 1.
†† A confounder with OR = 2 is introduced, with a positive bias effect of 20% for OR = 1.50 and 33% for OR = 3.00.
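The efficiency column can be approximated from the tabulated summary values. The sketch below is a minimal illustration, assuming that efficiency is the mean squared error of ln(OR) in the ideal scenario divided by that of the method in question, with the squared bias taken from the geometric mean OR; this is the orientation that yields 100 for the ideal scenario and lower values for the other methods. The function names `mse_ln_or` and `efficiency` are hypothetical and not from the paper.

```python
import math


def mse_ln_or(true_or, mean_or, sd_ln_or):
    """Mean squared error of ln(OR): variance plus squared bias."""
    bias = math.log(true_or) - math.log(mean_or)
    return sd_ln_or ** 2 + bias ** 2


def efficiency(true_or, method, reference):
    """Efficiency (%) of a method relative to the ideal scenario.

    `method` and `reference` are (geometric mean OR, SD of ln(OR)) pairs.
    """
    return 100 * mse_ln_or(true_or, *reference) / mse_ln_or(true_or, *method)


if __name__ == "__main__":
    # Scenario 1, true OR = 1.50: ideal scenario versus Method 1.
    ideal = (1.49, 0.18)
    method_1 = (1.48, 0.25)
    print(round(efficiency(1.50, method_1, ideal)))
```

With these rounded inputs the result is roughly 52%, close to the tabulated 53%; the small gap is plausibly due to rounding of the published OR and SD values.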
Simulation results based on 1,000 replications
| Method | True OR†† | Scenario 2.1***: OR* | Scenario 2.1***: SD* | Scenario 2.1***: Ef** | Scenario 2.2: OR* | Scenario 2.2: SD* | Scenario 2.2: Ef** | Scenario 2.3: OR* | Scenario 2.3: SD* | Scenario 2.3: Ef** |
| Individual-level information on all subjects† | 1.50 | 1.50 | 0.12 | 100 | 1.49 | 0.12 | 100 | 1.50 | 0.12 | 100 |
| | 3.00 | 3.00 | 0.11 | 100 | 3.02 | 0.12 | 100 | 3.02 | 0.12 | 100 |
| Method 3‡ | 1.50 | 1.50 | 0.22 | 29 | 1.49 | 0.23 | 27 | 1.52 | 0.22 | 30 |
| | 3.00 | 2.98 | 0.21 | 29 | 2.97 | 0.21 | 30 | 3.05 | 0.20 | 35 |
| Method 4‡ | 1.50 | 1.50 | 0.20 | 37 | 1.51 | 0.18 | 46 | 1.51 | 0.21 | 34 |
| | 3.00 | 3.01 | 0.17 | 44 | 3.02 | 0.16 | 51 | 3.04 | 0.17 | 46 |
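* Geometric mean of the OR estimates and the empirical standard deviation of the ln(OR) estimates.
** Efficiency (%) of the ln(OR) estimates, relative to the ideal scenario: $\mathrm{eff}_1 = 100 \times \dfrac{\operatorname{var}(\ln \mathrm{OR}_{\mathrm{ref}}) + (\ln \mathrm{OR}_{\mathrm{true}} - \ln \mathrm{OR}_{\mathrm{ref}})^2}{\operatorname{var}(\ln \mathrm{OR}_1) + (\ln \mathrm{OR}_{\mathrm{true}} - \ln \mathrm{OR}_1)^2}$, where $\mathrm{OR}_1$ is the estimate from the method in question and $\mathrm{OR}_{\mathrm{ref}}$ is the estimate in the ideal scenario. Efficiencies were calculated while varying the number of second-phase cases and controls; the numbers of first-phase cases and controls were held fixed at 1,200 cases and 1,200 controls.
*** These results are also presented in Table 3 (scenario 2).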
† Ideal scenario.
‡ Methods 3–4 are further described in the Methods section and in Table 1.
†† A confounder with OR = 2 is introduced, with a positive bias effect of 20% for OR = 1.50 and 33% for OR = 3.00.
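The size of the quoted bias effects can be roughly checked with the standard confounding bias factor for a binary confounder. The sketch below rests on several assumptions that the tables do not state explicitly: that the confounder is smoking, with the within-category prevalences of the hypothetical population (5%, 25% and 40% for Low, Medium and High exposure); that OR = 1.50 refers to Medium versus Low and OR = 3.00 to High versus Low; and that the disease is rare with no interaction between exposure and confounder. The function name `confounding_bias_factor` is hypothetical.

```python
def confounding_bias_factor(p_exposed, p_reference, confounder_or):
    """Ratio of crude to adjusted OR for a binary confounder.

    Uses the rare-disease approximation: the disease odds in a group scale
    with 1 + p * (confounder_or - 1), where p is the confounder prevalence.
    """
    exposed = 1 + p_exposed * (confounder_or - 1)
    reference = 1 + p_reference * (confounder_or - 1)
    return exposed / reference


# Assumed smoking prevalences within exposure categories (proportions).
SMOKING = {"Low": 0.05, "Medium": 0.25, "High": 0.40}

if __name__ == "__main__":
    for category in ("Medium", "High"):
        factor = confounding_bias_factor(SMOKING[category], SMOKING["Low"], 2.0)
        print(category, f"{(factor - 1) * 100:.0f}% upward bias")
```

Under these assumptions the factors are 1.25/1.05 and 1.40/1.05, that is, an upward bias of about 19% and 33%, in line with the 20% and 33% given in the footnote.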
Overview of the four combinations of first- and second-phase data evaluated in this paper
| Type of design | First-phase registry data: disease status | First-phase registry data: residential area | First-phase registry data: exposure | Second-phase interview and measurement data: exposure | Second-phase interview and measurement data: covariates |
| 1. No first-phase exposure data (method 1) | Individual | Individual | - | Individual | Individual |
| 2. First- and second-phase exposure data (method 2 and method 4) | Individual | Individual | Area | Individual | Individual |
| 3. No first-phase exposure data (method 3) | Individual | Individual | - | Individual | Individual |