P J Newcombe, S Connolly, S Seaman, S Richardson, S J Sharp.
Abstract
Background: Accurate detection and estimation of true exposure-outcome associations is important in aetiological analysis; when there are multiple potential exposure variables of interest, methods for detecting the subset of variables most likely to have true associations with the outcome of interest are required. Case-cohort studies often collect data on a large number of variables which have not been measured in the entire cohort (e.g. panels of biomarkers). There is a lack of guidance on methods for variable selection in case-cohort studies.
Year: 2018 PMID: 29136145 PMCID: PMC5913627 DOI: 10.1093/ije/dyx224
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Assumptions used to generate artificial datasets used in the comparison of methods
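| Scenario | Number of variables | Pairwise correlations | Number of signals | HRs of signals | Baseline hazard function |
|---|---|---|---|---|---|
| 1 | 20 | 0.2 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 2 | 20 | 0.5 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 3 | 20 | 0.8 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 4 | 100 | 0.2 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 5 | 100 | 0.5 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 6 | 100 | 0.8 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 7 | 1000 | 0.2 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 8 | 1000 | 0.5 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |
| 9 | 1000 | 0.8 | 5 | 1.1, 0.78, 1.48, 0.58, 2 | Weibull(30,4) |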
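As a rough illustration of how a dataset under these assumptions might be generated, the sketch below draws equicorrelated standard-normal variables and Weibull proportional-hazards event times by inverse-transform sampling. Several choices are assumptions rather than details given in the table: `Weibull(30,4)` is read as scale 30 and shape 4, the variables are taken to be normal, the signals are placed in the first five columns, and the function name `simulate_scenario`, the cohort size `n` and the `seed` are hypothetical. Censoring and case-cohort subsampling are not modelled.

```python
import numpy as np

def simulate_scenario(n, n_vars, rho, hazard_ratios, shape=4.0, scale=30.0, seed=0):
    """Hypothetical generator: equicorrelated covariates, Weibull PH event times."""
    rng = np.random.default_rng(seed)
    # A shared factor plus independent noise gives unit variances and
    # pairwise correlation rho between every pair of variables.
    shared = rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, n_vars))
    x = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * noise
    # Assumption: the first len(hazard_ratios) columns are the signals.
    beta = np.zeros(n_vars)
    beta[: len(hazard_ratios)] = np.log(hazard_ratios)
    # Inverse-transform sampling from
    # S(t | x) = exp(-(t / scale) ** shape * exp(x @ beta)).
    u = 1.0 - rng.uniform(size=n)  # values in (0, 1], avoids log(0)
    times = scale * (-np.log(u) / np.exp(x @ beta)) ** (1.0 / shape)
    return x, times, beta

# Scenario 5: 100 variables, pairwise correlation 0.5, five signals.
x, times, beta = simulate_scenario(
    n=5000, n_vars=100, rho=0.5, hazard_ratios=[1.1, 0.78, 1.48, 0.58, 2.0]
)
```

The hazard ratios in the table are applied per unit of each variable here; since the simulated variables have unit variance, that coincides with a per-standard-deviation scale.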
Sensitivity of variable selection methods, for each scenario
| Number of variables | Method | Pairwise correlation 0.2 | Pairwise correlation 0.5 | Pairwise correlation 0.8 |
|---|---|---|---|---|
| 20 | One-at-a-time | 0.76 (0.01) | 0.48 (0.01) | 0.44 (0.01) |
| 20 | Stepwise | 0.76 (0.01) | 0.72 (0.01) | 0.59 (0.01) |
| 20 | Two-step BVS | 0.89 (0.01) | 0.84 (0.01) | 0.73 (0.01) |
| 100 | One-at-a-time | 0.69 (0.01) | 0.44 (0.01) | 0.40 (0.01) |
| 100 | Stepwise | 0.76 (0.01) | 0.71 (0.01) | 0.59 (0.01) |
| 100 | Two-step BVS | 0.84 (0.01) | 0.78 (0.01) | 0.66 (0.01) |
| 1000 | One-at-a-time | 0.64 (0.01) | 0.41 (0.00) | 0.34 (0.01) |
| 1000 | Stepwise | 0.77 (0.01) | 0.70 (0.01) | 0.56 (0.01) |
| 1000 | Two-step BVS | 0.79 (0.01) | 0.73 (0.01) | 0.58 (0.01) |
Mean sensitivity, the proportion of true signals selected, is displayed for 200 simulations with the corresponding Monte Carlo errors in brackets.
*Selection thresholds chosen to match the false discovery rate of the stepwise method in each simulation, for which a nominal P-value inclusion threshold of 0.05 was used.
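To make the summary statistics concrete, here is a minimal sketch of how a per-simulation sensitivity and its mean with Monte Carlo error could be computed. It assumes the Monte Carlo error is the standard error of the mean across simulations (sample standard deviation divided by the square root of the number of simulations); the source does not spell this out, and the function names are hypothetical.

```python
import numpy as np

def sensitivity(selected, signals):
    """Proportion of the true signals that appear among the selected variables."""
    selected, signals = set(selected), set(signals)
    return len(selected & signals) / len(signals)

def mean_and_mc_error(values):
    """Mean across simulations and its Monte Carlo standard error (assumed SD / sqrt(n))."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))

# One simulation: variables 0-4 are the signals, and a method selected 0, 1 and 7.
one_run = sensitivity(selected=[0, 1, 7], signals=range(5))  # 2 of 5 signals found

# Summarise several per-simulation sensitivities (toy values, not study results).
mean_sens, mc_error = mean_and_mc_error([0.4, 0.6, 0.8])
```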
False discovery rates of variable selection methods, for each scenario
| Number of variables | Method | Pairwise correlation 0.2 | Pairwise correlation 0.5 | Pairwise correlation 0.8 |
|---|---|---|---|---|
| 20 | One-at-a-time | 0.09 (0.01) | 0.58 (0.02) | 0.40 (0.02) |
| 20 | Stepwise | <0.01 (<0.01) | <0.01 (<0.01) | 0.01 (<0.01) |
| 20 | Two-step BVS | <0.01 (<0.01) | <0.01 (<0.01) | <0.01 (<0.01) |
| 100 | One-at-a-time | 0.25 (0.02) | 0.77 (0.02) | 0.67 (0.02) |
| 100 | Stepwise | 0.01 (<0.01) | 0.03 (0.01) | 0.08 (0.01) |
| 100 | Two-step BVS | <0.01 (<0.01) | 0.01 (<0.01) | 0.01 (<0.01) |
| 1000 | One-at-a-time | 0.53 (0.03) | 0.93 (0.01) | 0.78 (0.02) |
| 1000 | Stepwise | 0.07 (0.01) | 0.14 (0.01) | 0.27 (0.01) |
| 1000 | Two-step BVS | 0.01 (<0.01) | 0.01 (<0.01) | 0.06 (0.01) |
Mean false discovery rate, the proportion of noise variables selected, is displayed for 200 simulations with the corresponding Monte Carlo errors in brackets.
*Selection thresholds chosen to match the sensitivity of the stepwise method in each simulation, for which a nominal P-value inclusion threshold of 0.05 was used.
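One possible reading of "thresholds chosen to match the sensitivity of the stepwise method" is sketched below: variables are ranked by some evidence score and added from the top until as many true signals have been selected as the stepwise method found in that simulation, at which point the false discovery rate is computed. This is an illustration only. The function name, the score-based ranking, and the definition of the false discovery rate as the share of selected variables that are noise are assumptions, not details confirmed by the source.

```python
def fdr_at_matched_sensitivity(scores, signals, target_hits):
    """Select top-scoring variables until `target_hits` signals are in, then return the FDR."""
    signals = set(signals)
    # Rank variable indices from strongest to weakest evidence.
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected, hits = [], 0
    for j in order:
        if hits >= target_hits:
            break
        selected.append(j)
        if j in signals:
            hits += 1
    if not selected:
        return 0.0  # nothing selected, so no false discoveries
    false_discoveries = sum(1 for j in selected if j not in signals)
    return false_discoveries / len(selected)

# Toy example: variables 0 and 2 are signals and stepwise found both of them.
# Matching that sensitivity requires selecting variables 0, 1 and 2,
# one of which (variable 1) is noise.
fdr = fdr_at_matched_sensitivity(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5], signals={0, 2}, target_hits=2
)
```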
Figure 1. Results from application of three variable selection methods to data from the EPIC-InterAct case-cohort study. Panel A) shows the log10(P value) for each fatty acid from one-at-a-time Prentice-weighted Cox regression models; the dashed line indicates the Bonferroni significance threshold (0.05/20 = 0.0025). Panel B) shows the log10(P values) for the combination of fatty acids selected using the stepwise method according to inclusion thresholds of P = 0.05 and P = 0.1. Panel C) shows posterior probabilities of selection using the BVS method; the dashed line indicates a Bayes Factor of 5.
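The Bonferroni threshold quoted in the caption can be checked with a couple of lines. The sketch also computes where the dashed line would sit if the axis shows -log10(P), which is the common plotting convention; the caption as extracted reads log10, so the sign is an assumption. The function name is hypothetical.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests

# 20 fatty acids tested one at a time at an overall level of 0.05.
threshold = bonferroni_threshold(alpha=0.05, n_tests=20)

# Height of the dashed line on a -log10(P) axis (assumed convention).
line_height = -math.log10(threshold)
```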
Hazard ratios, 95% CIs, posterior probabilities and Bayes Factors for the fatty acids selected using the BVS method
| Fatty acid | HR (95% CI) | Posterior probability | Bayes Factor |
|---|---|---|---|
| Saturated: | | | |
| c160 | 1.25 (1.04, 1.50) | 1.00 | 7672.3 |
| c170 | 0.77 (0.66, 0.89) | 1.00 | ∞ |
| c180 | 1.17 (1.00, 1.38) | 0.75 | 60.6 |
| c220 | 1.25 (0.91, 1.71) | 1.00 | 6646.7 |
| c240 | 0.67 (0.50, 0.90) | 1.00 | ∞ |
| n-3 polyunsaturated: | | | |
| c225n3 | 0.88 (0.75, 1.03) | 0.74 | 55.6 |
| n-6 polyunsaturated: | | | |
| c182n6 | 0.85 (0.70, 1.02) | 0.60 | 30.2 |
| c202n6 | 0.79 (0.68, 0.91) | 1.00 | ∞ |
| c203n6 | 1.44 (1.22, 1.70) | 1.00 | ∞ |
Hazard ratios are per 1 standard deviation in the fatty acid.
Confidence intervals for 3 of the 9 fatty acids in the modal model included 1. The fatty acids were selected for inclusion using the BVS algorithm based on logistic regression, and hazard ratios were estimated from Prentice-weighted Cox regression. As shown in the simulations, the BVS algorithm has higher sensitivity than methods using weighted Cox regression, so it is possible that the CIs around the hazard ratios for some variables selected using BVS will include 1.
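The relationship between the posterior probabilities and Bayes Factors in the table can be sketched as follows, using the standard definition of a Bayes Factor as posterior odds divided by prior odds. The prior odds of inclusion are not stated in this excerpt, so `prior_odds` is left as a parameter and the value 0.05 used below is purely illustrative. For the three rows with posterior probability below 1, dividing the rounded posterior odds by the tabulated Bayes Factor gives values near 0.05 (for example, 3 / 60.6 ≈ 0.0495), but that is a back-calculation from rounded figures rather than a reported prior. The function names are hypothetical.

```python
import math

def bayes_factor(posterior_prob, prior_odds):
    """Bayes Factor for inclusion: posterior odds divided by prior odds."""
    if posterior_prob >= 1.0:
        return math.inf  # posterior probability of 1 gives infinite odds
    posterior_odds = posterior_prob / (1.0 - posterior_prob)
    return posterior_odds / prior_odds

def posterior_from_bayes_factor(bf, prior_odds):
    """Posterior probability implied by a Bayes Factor and the prior odds."""
    posterior_odds = bf * prior_odds
    return posterior_odds / (1.0 + posterior_odds)

# Illustrative prior odds only; not taken from the source.
prior_odds = 0.05

bf_075 = bayes_factor(0.75, prior_odds)   # posterior odds of 3 over prior odds of 0.05
bf_100 = bayes_factor(1.00, prior_odds)   # infinite, as for the rows marked ∞
line_pp = posterior_from_bayes_factor(5, prior_odds)  # posterior probability at a Bayes Factor of 5
```

A posterior probability displayed as 1.00 can still correspond to a finite Bayes Factor, as in the c160 and c220 rows, because the table rounds the probability to two decimals.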