Pascal Wild, Nadine Andrieu, Alisa M Goldstein, Walter Schill.
Abstract
The two-phase design consists of an initial (Phase One) study with known disease status and inexpensive covariate information. Within this initial study one selects a subsample on which to collect detailed covariate data. Two-phase studies have been shown to be efficient compared to standard case-control designs. However, potential problems arise if one cannot assure minimum sample sizes in the rarest categories or if recontact of subjects is difficult. In the case of a rare exposure with an inexpensive proxy, the authors propose the flexible two-phase design for which there is a single time of contact, at which a decision about full covariate ascertainment is made based on the proxy. Subjects are screened until the desired numbers of cases and controls have been selected for full data collection. Strategies for optimizing the cost/efficiency of this design and corresponding software are presented. The design is applied to two examples from occupational and genetic epidemiology. By ensuring minimum numbers for the rarest disease-covariate combination(s), we obtain considerable efficiency gains over standard two-phase studies with an improved practical feasibility. The flexible two-phase design may be the design of choice in the case of well-targeted studies of the effect of rare exposures with an inexpensive proxy.
Year: 2008 PMID: 18828892 PMCID: PMC2602593 DOI: 10.1186/1742-5573-5-4
Source DB: PubMed Journal: Epidemiol Perspect Innov ISSN: 1742-5573
Scenario for Example 1
| Stratification/Proxy Z (with J strata) | Past work in metal industry |
| | No: Z = 1 |
| | Yes: Z = 2 |
| Phase One prevalence among controls (τ0j) | Z = 1: τ01 = 80% |
| | Z = 2: τ02 = 20%* |
| Risk factor X (with K outcomes) | Exposure to MWF |
| | No: X = 1 |
| | Yes: X = 2 |
| Disease Model (Odds Ratios) (ψk) | ψ1 = 1: baseline risk |
| | ψ2 = 2# |
| Phase Two prevalence of X among controls by stratum (π0jk) | Z = 1: π011 = 97.5%, π012 = 2.5%& |
| | Z = 2: π021 = 75%, π022 = 25%@ |
*20% prevalence of having worked in the metal industry
#Exposure to MWF doubles the risk of bladder cancer
&Among non-metal-industry workers, 2.5% exposed to MWF
@Among metal-industry workers, 25% exposed to MWF
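A short worked derivation may help connect these scenario values to the case-side percentages. It relies on a standard property of odds-ratio models, taken here as an assumption about how the values were obtained: the joint distribution of (Z, X) among cases is proportional to the joint distribution among controls multiplied by ψk. The symbols τ1j and π1jk for the case-side quantities are notation introduced for this sketch only.

```latex
\begin{align*}
\Pr(Z=j,\,X=k \mid \text{case})
  &= \frac{\tau_{0j}\,\pi_{0jk}\,\psi_k}{\sum_{j',k'} \tau_{0j'}\,\pi_{0j'k'}\,\psi_{k'}} \\[4pt]
\sum_{j,k} \tau_{0j}\,\pi_{0jk}\,\psi_k
  &= 0.80\,(0.975 + 0.025 \cdot 2) + 0.20\,(0.75 + 0.25 \cdot 2)
   = 0.82 + 0.25 = 1.07 \\[4pt]
\tau_{11} &= \frac{0.82}{1.07} \approx 76.6\%, \qquad
\tau_{12} = \frac{0.25}{1.07} \approx 23.4\% \\[4pt]
\pi_{112} &= \frac{0.80 \cdot 0.025 \cdot 2}{0.82} \approx 4.9\%, \qquad
\pi_{122} = \frac{0.20 \cdot 0.25 \cdot 2}{0.25} = 40\%
\end{align*}
```

Under this reading, metal-industry workers make up about 23.4% of cases, and 40% of those cases are exposed to MWF.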
Design of the flexible two-phase study for Example 1
| Disease status | Worked in metal industry (Z) | Phase Two sample size | Expected Phase One count | Exposed to MWF (X) | Expected Phase Two count |
| N0 = Max(160/20%, 40/80%) = 800 | | | | | |
| Control | No (80%*) | 40 | 800*80% = 640 | No (97.5%*) | 40*97.5% = 39 |
| | | | | Yes (2.5%*) | 40*2.5% = 1 |
| | Yes (20%*) | 160 | 800*20% = 160 | No (75%*) | 160*75% = 120 |
| | | | | Yes (25%*) | 160*25% = 40 |
| N1 = Max(85/23.4%, 20/76.6%) = 364 | | | | | |
| Case | No (76.6%#) | 20 | 364*76.6% = 278.8 | No (95.1%#) | 20*95.1% = 19.02 |
| | | | | Yes (4.9%#) | 20*4.9% = 0.98 |
| | Yes (23.4%#) | 85 | 364*23.4% = 85 | No (60%#) | 85*60% = 51 |
| | | | | Yes (40%#) | 85*40% = 34 |
* Values of parameters fixed in Table 1
# Values of parameters computed from parameter values fixed in Table 1 (see Appendix 2)
§ Among Phase One controls, the overall expected prevalence of MWF exposure is 7%: 2% (2.5% of the 80% who are non-metal-workers) plus 5% (25% of the 20% who are metal-workers). The analogous computation gives 13% MWF exposure among cases.
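The arithmetic behind the design table can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' software: the function names are hypothetical, the case distribution is assumed proportional to the control distribution times the odds ratio, and rounding the screening size to the nearest integer is an assumption that happens to match the tabulated N0 and N1.

```python
def case_distribution(tau0, pi0, psi):
    """Joint (Z, X) distribution among cases.

    Assumption: control joint probability times odds ratio, renormalized.
    """
    weights = [[t * p * s for p, s in zip(row, psi)] for t, row in zip(tau0, pi0)]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def phase_one_size(targets, prevalences):
    """Expected number to screen so every stratum reaches its Phase Two target."""
    return round(max(n / p for n, p in zip(targets, prevalences)))


tau0 = [0.80, 0.20]                     # Phase One prevalence of Z among controls
pi0 = [[0.975, 0.025], [0.75, 0.25]]    # prevalence of X within each Z stratum
psi = [1.0, 2.0]                        # odds ratios for X = 1, X = 2

cases = case_distribution(tau0, pi0, psi)
tau1 = [sum(row) for row in cases]      # prevalence of Z among cases

N0 = phase_one_size([40, 160], tau0)    # Phase Two targets for controls
N1 = phase_one_size([20, 85], tau1)     # Phase Two targets for cases

exposed_controls = sum(t * row[1] for t, row in zip(tau0, pi0))
exposed_cases = sum(row[1] for row in cases)
print(N0, N1, round(exposed_controls, 3), round(exposed_cases, 3))
```

The stratum that dominates the maximum is the rare one (metal-industry workers), which is why the screening effort is driven by the proxy prevalence rather than by the total Phase Two sample size.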
Figure 1. STATA output for Example 1.
Scenarios for Example 2
| Stratification/Proxy Z (with J strata) | Environmental exposure E and gene proxy SG |
| | J = 4 |
| | Z = 1: E- SG-, Z = 2: E- SG+, Z = 3: E+ SG-, Z = 4: E+ SG+ |
| Phase One prevalence among controls (τ0j), with PE = 20% and PG = 1% | τ01 = Pr(E-)Pr(SG-) = (1 - PE)[(1 - Se)PG + Sp(1 - PG)] |
| | τ02 = Pr(E-)Pr(SG+) = (1 - PE)[SePG + (1 - Sp)(1 - PG)] |
| | τ03 = Pr(E+)Pr(SG-) = PE[(1 - Se)PG + Sp(1 - PG)] |
| | τ04 = Pr(E+)Pr(SG+) = PE[SePG + (1 - Sp)(1 - PG)] |
| Risk factor X (with K outcomes) | Exposure to E and exposure to G: K = 4 |
| | X = 1: E- G-, X = 2: E- G+, X = 3: E+ G-, X = 4: E+ G+ |
| Disease Model (Odds Ratios ψk) | ψ1 = 1, ψ2 = 3, ψ3 = 2, ψ4 = ψ2 × ψ3 × ORI = 30 |
| Phase Two prevalence of X among controls by stratum (π0jk) | Z = 1: π011 = (1 - PE)Sp(1 - PG)/τ01, π012 = 1 - π011, π013 = π014 = 0 |
| | Z = 2: π021 = (1 - PE)(1 - Sp)(1 - PG)/τ02, π022 = 1 - π021, π023 = π024 = 0 |
| | Z = 3: π031 = π032 = 0, π033 = PE Sp(1 - PG)/τ03, π034 = 1 - π033 |
| | Z = 4: π041 = π042 = 0, π043 = PE(1 - Sp)(1 - PG)/τ04, π044 = 1 - π043 |
*Se = sensitivity; Sp = specificity
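To make the Phase One prevalence formulas concrete, the sketch below evaluates τ01 to τ04 for one surrogate quality. The function name is hypothetical, and the sensitivity of 80% and specificity of 70% are simply one of the combinations considered for this example; E and the gene surrogate are treated as independent among controls, as the product form of the formulas implies.

```python
def phase_one_prevalences(pe, pg, se, sp):
    """Return [tau01, tau02, tau03, tau04] for strata E-SG-, E-SG+, E+SG-, E+SG+."""
    p_sg_pos = se * pg + (1 - sp) * (1 - pg)    # Pr(SG+): true plus false positives
    p_sg_neg = (1 - se) * pg + sp * (1 - pg)    # Pr(SG-): false plus true negatives
    return [
        (1 - pe) * p_sg_neg,
        (1 - pe) * p_sg_pos,
        pe * p_sg_neg,
        pe * p_sg_pos,
    ]


tau0 = phase_one_prevalences(pe=0.20, pg=0.01, se=0.80, sp=0.70)
for j, tau in enumerate(tau0, start=1):
    print(f"tau0{j} = {tau:.3f}")
```

With a rare gene (PG = 1%), the size of the SG+ stratum is dominated by false positives, so specificity matters far more than sensitivity for how many subjects must be screened.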
Designs with maximal power of detecting the interaction, according to sensitivity and specificity
| Gene-surrogate | | Flexible two-phase design options | | | | Expected Phase One counts | | Power# | Cost* |
| Spec | Sens | n0 | n1 | ρ0† | ρ1‡ | N0 | N1 | | |
| 70% | 80% | 800 | 400 | 90% | 90% | 5902 | 1373 | 83% | 1564 |
| 70% | 90% | 800 | 400 | 90% | 90% | 5882 | 1325 | 87% | 1560 |
| 80% | 60% | 800 | 400 | 90% | 90% | 8824 | 1988 | 87% | 1741 |
| 80% | 70% | 800 | 400 | 90% | 90% | 8780 | 1889 | 91% | 1733 |
| 80% | 80% | 800 | 400 | 90% | 90% | 8738 | 1800 | 94% | 1727 |
| 80% | 90% | 800 | 400 | 90% | 90% | 8696 | 1718 | 96% | 1720 |
| 90% | 60% | 900 | 300 | 90% | 90% | 19286 | 2000 | 98% | 2264 |
| 90% | 70% | 900 | 300 | 90% | 90% | 19104 | 2000 | 99% | 2255 |
| 90% | 80% | 900 | 300 | 90% | 90% | 18925 | 1960 | 99.6% | 2244 |
| 90% | 90% | 900 | 300 | 90% | 90% | 18750 | 1835 | 99.8% | 2229 |
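The expected Phase One counts can be approximated with the sketch below. Several points are assumptions rather than statements from the source: n0 and n1 are read as the Phase Two numbers of controls and cases, a share ρ of them is taken from the SG+ strata, each SG group is split equally between E- and E+, and the result is rounded to the nearest integer. Under these assumptions the sketch gives 5902 and 1373 for the first row of the table; any further constraint the authors applied, such as a limit on available cases, is not modeled. All function names are hypothetical.

```python
from itertools import product

PE, PG = 0.20, 0.01    # prevalence of E and G among controls
PSI = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 30.0}   # odds ratios by (E, G)


def stratum_probs(se, sp, cases=False):
    """Probability of each (E, SG) stratum among controls, or among cases."""
    probs = {(e, s): 0.0 for e, s in product((0, 1), repeat=2)}
    for e, g, s in product((0, 1), repeat=3):
        p_e = PE if e else 1 - PE
        p_g = PG if g else 1 - PG
        if g:
            p_s = se if s else 1 - se        # surrogate given true carrier
        else:
            p_s = 1 - sp if s else sp        # surrogate given true non-carrier
        # Assumption: case probabilities are control probabilities times the OR.
        probs[(e, s)] += p_e * p_g * p_s * (PSI[(e, g)] if cases else 1.0)
    total = sum(probs.values())
    return {k: v / total for k, v in probs.items()}


def expected_screened(n, rho, se, sp, cases=False):
    """Expected Phase One count needed to fill every (E, SG) cell of Phase Two."""
    probs = stratum_probs(se, sp, cases)
    # Assumption: SG+ receives a share rho of n, split equally between E- and E+.
    targets = {(e, s): n * (rho if s else 1 - rho) / 2 for (e, s) in probs}
    return round(max(targets[k] / probs[k] for k in probs))


print(expected_screened(800, 0.9, se=0.8, sp=0.7))               # controls
print(expected_screened(400, 0.9, se=0.8, sp=0.7, cases=True))   # cases
```

The binding cell is E+ SG+, the rarest stratum, which explains why N0 roughly triples when specificity rises from 70% to 90% for the same ρ0.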
Designs with minimum cost among designs with 80% power of detecting the interaction
| Gene-surrogate | | Flexible two-phase design options | | | | Expected Phase One counts | | Power# | Cost* |
| Spec | Sens | n0 | n1 | ρ0† | ρ1‡ | N0 | N1 | | |
| 70% | 80% | 700 | 500 | 90% | 80% | 5163 | 1525 | 81% | 1534 |
| 70% | 90% | 600 | 500 | 90% | 80% | 4412 | 1472 | 80% | 1394 |
| 80% | 60% | 700 | 300 | 90% | 90% | 7721 | 1491 | 80% | 1461 |
| 80% | 70% | 600 | 300 | 90% | 80% | 6585 | 1259 | 80% | 1292 |
| 80% | 80% | 500 | 300 | 90% | 80% | 5461 | 1200 | 80% | 1133 |
| 80% | 90% | 400 | 400 | 90% | 80% | 4348 | 1528 | 81% | 1094 |
| 90% | 60% | 400 | 400 | 70% | 50% | 6667 | 1683 | 81% | 1217 |
| 90% | 70% | 500 | 300 | 50% | 50% | 5896 | 1169 | 80% | 1153 |
| 90% | 80% | 500 | 300 | 40% | 60% | 4673 | 1307 | 80% | 1099 |
| 90% | 90% | 500 | 300 | 40% | 50% | 4630 | 1019 | 82% | 1082 |
# Analysis approach: Maximum likelihood
* The study cost is computed as the total number of screened subjects (N0 + N1) divided by 20, plus the number of subjects included in Phase Two (n0 + n1).
† ρ0 is the proportion of SG+ controls included in Phase Two
‡ ρ1 is the proportion of SG+ cases included in Phase Two
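The cost footnote translates directly into a one-line function. The sketch below is a minimal illustration with a hypothetical function name; it reads n0 and n1 as the Phase Two numbers of controls and cases and expresses cost in units of one full Phase Two data collection, so that screening one subject costs 1/20 of a unit.

```python
def study_cost(N0, N1, n0, n1, screening_ratio=20):
    """Cost = screened subjects / screening_ratio + subjects included in Phase Two."""
    return (N0 + N1) / screening_ratio + n0 + n1


# Example rows: (N0, N1, n0, n1) taken from the two design tables above.
max_power_design = study_cost(5902, 1373, 800, 400)
min_cost_design = study_cost(4630, 1019, 500, 300)
print(round(max_power_design), round(min_cost_design))
```

Because screening is cheap relative to full data collection, a design can afford to screen several thousand subjects while the cost stays dominated by the Phase Two sample.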