Julia G Poirier1, Laura L Faye1,2, Apostolos Dimitromanolakis1, Andrew D Paterson2,3, Lei Sun2,4, Shelley B Bull1,2.
Abstract
The "winner's curse" is a subtle and difficult problem in interpretation of genetic association, in which association estimates from large-scale gene detection studies are larger in magnitude than those from subsequent replication studies. This is practically important because use of a biased estimate from the original study will yield an underestimate of sample size requirements for replication, leaving the investigators with an underpowered study. Motivated by investigation of the genetics of type 1 diabetes complications in a longitudinal cohort of participants in the Diabetes Control and Complications Trial/Epidemiology of Diabetes Interventions and Complications (DCCT/EDIC) Genetics Study, we apply a bootstrap resampling method in analysis of time to nephropathy under a Cox proportional hazards model, examining 1,213 single-nucleotide polymorphisms (SNPs) in 201 candidate genes custom genotyped in 1,361 white probands. Among 15 top-ranked SNPs, bias reduction in log hazard ratio estimates ranges from 43.1% to 80.5%. In simulation studies based on the observed DCCT/EDIC genotype data, genome-wide bootstrap estimates for false-positive SNPs and for true-positive SNPs with low-to-moderate power are closer to the true values than uncorrected naïve estimates, but tend to overcorrect SNPs with high power. This bias-reduction technique is generally applicable for complex trait studies including quantitative, binary, and time-to-event traits.
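The abstract does not spell out the bootstrap algorithm, so the following is only a minimal sketch of one bootstrap shrinkage idea, under several simplifying assumptions: a single SNP, a quantitative trait fitted by ordinary least squares rather than a Cox model, and a threshold-only selection rule rather than genome-wide ranking. In each bootstrap replicate the SNP is tested in the resampled (in-bag) data; if it passes the threshold, its in-bag estimate is compared with the estimate from the subjects left out of that replicate, and the average gap is subtracted from the naïve estimate. The function names `slope_and_z` and `bootstrap_shrunk_estimate` are hypothetical, and this should not be read as the authors' implementation.

```python
import random
from statistics import NormalDist, fmean


def slope_and_z(x, y):
    """Least-squares slope of y on x and its z statistic (slope / SE)."""
    n = len(x)
    if n < 3:
        return 0.0, 0.0  # too few observations to estimate a slope
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, 0.0  # genotype is constant in this sample
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    rss = sum((b - my - beta * (a - mx)) ** 2 for a, b in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return beta, (beta / se if se > 0 else 0.0)


def bootstrap_shrunk_estimate(x, y, alpha=0.01, n_boot=200, seed=0):
    """Return (naive, shrunk, replicates_used) for one SNP.

    Assumption: selection bias is estimated as the mean of
    (in-bag estimate - out-of-bag estimate) over replicates in which
    the in-bag test is significant at level alpha (two-sided).
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    naive, _ = slope_and_z(x, y)
    n = len(x)
    gaps = []
    for _ in range(n_boot):
        drawn = [rng.randrange(n) for _ in range(n)]
        in_bag = set(drawn)
        left_out = [i for i in range(n) if i not in in_bag]
        if len(left_out) < 3:
            continue
        b_det, z_det = slope_and_z([x[i] for i in drawn], [y[i] for i in drawn])
        if abs(z_det) < z_crit:
            continue  # SNP not "detected" in this replicate
        b_est, _ = slope_and_z([x[i] for i in left_out], [y[i] for i in left_out])
        gaps.append(b_det - b_est)
    shrunk = naive - fmean(gaps) if gaps else naive
    return naive, shrunk, len(gaps)


# Toy data: additive genotype (0/1/2) at MAF 0.3 with a modest true effect.
rng = random.Random(1)
x = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(300)]
y = [0.15 * g + rng.gauss(0, 1) for g in x]
naive, shrunk, used = bootstrap_shrunk_estimate(x, y)
print(f"naive={naive:.3f} shrunk={shrunk:.3f} replicates used={used}")
```

Because the out-of-bag subjects play no part in the in-bag significance test, their estimate is not inflated by selection, which is what makes the gap a usable bias estimate in this simplified setting.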
Keywords: DCCT/EDIC Genetics Study; bootstrap; cohort studies; genotype; phenotype; selection bias; survival analysis
Year: 2015 PMID: 26411674 PMCID: PMC4609263 DOI: 10.1002/gepi.21920
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
Figure 1. Log HR estimates under a Cox proportional hazards model analysis of time to severe nephropathy for the top 15 SNPs in the DCCT/EDIC Genetics Study dataset, including 1,361 individuals. The horizontal axis corresponds to the minor allele frequency (MAF), with each SNP annotated with gene name and rs number. The number above each vertical arrow indicates the SNP ranking according to the P‐value of the original test of association (reported in Table 1). The vertical arrows quantify the reduction in the logHR by the genome‐wide bootstrap method: the percentage reduction varies with MAF from 43.1% (at MAF = 48.3%) to 80.5% (at MAF = 15.9%).
Table 1. DCCT/EDIC study naïve and genome‐wide bootstrap bias‐reduced logHR estimates (taken in absolute value), univariable models
| Rank | Gene | SNP | MAF (%) | P‐value | Naïve logHR estimate | Genome‐wide bootstrap estimate | Percentage reduction of naïve by genome‐wide bootstrap | Power by naïve (%) | Power by genome‐wide bootstrap (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | | rs1968685 | 48.3 | 2.22 × 10⁻⁴ | 0.51 | 0.29 | 43.1 | 90.1 | 35.3 |
| 2 | | rs7178362 | 13.7 | 2.36 × 10⁻⁴ | 0.56 | 0.22 | 60.1 | 63.4 | 7.7 |
| 3 | | rs7999615 | 6.2 | 6.13 × 10⁻⁴ | 0.75 | 0.27 | 64.0 | 57.3 | 5.7 |
| 4 | | rs17880135 | 5.6 | 8.18 × 10⁻⁴ | 0.69 | 0.17 | 75.4 | 43.2 | 2.4 |
| 5 | | rs2254210 | 36.3 | 2.61 × 10⁻³ | 0.39 | 0.16 | 59.0 | 60.4 | 7.9 |
| 6 | | rs411103 | 39.2 | 3.82 × 10⁻³ | 0.41 | 0.16 | 61.0 | 67.7 | 8.2 |
| 7 | | rs307806 | 14.9 | 3.85 × 10⁻³ | 0.46 | 0.12 | 73.9 | 46.4 | 3.8 |
| 8 | | rs17859935 | 16.9 | 4.36 × 10⁻³ | 0.46 | 0.14 | 69.6 | 53.0 | 3.8 |
| 9 | | rs2472448 | 10.6 | 5.00 × 10⁻³ | 0.86 | 0.46 | 46.5 | 91.5 | 32.2 |
| 10 | | rs3025035 | 7.2 | 5.34 × 10⁻³ | 0.58 | 0.13 | 77.6 | 35.4 | 1.9 |
| 11 | | rs854555 | 34.4 | 6.22 × 10⁻³ | 0.41 | 0.17 | 58.5 | 64.7 | 8.8 |
| 12 | | rs2043125 | 26.5 | 6.52 × 10⁻³ | 0.46 | 0.17 | 63.0 | 69.3 | 7.5 |
| 13 | | rs3219065 | 15.9 | 7.75 × 10⁻³ | 0.41 | 0.09 | 78.0 | 38.0 | 1.9 |
| 14 | | rs2027440 | 15.9 | 8.18 × 10⁻³ | 0.41 | 0.08 | 80.5 | 38.0 | 1.6 |
| 15 | | rs17859970 | 14.8 | 9.99 × 10⁻³ | 0.43 | 0.10 | 76.7 | 40.8 | 2.1 |
Minor allele is the risk allele; otherwise, the major allele is associated with risk.
Analysis of time to severe nephropathy (1,361 individuals with 115 events) based on a significance threshold selection criterion of P < 0.01. SNPs with MAF ≤ 5% are excluded. Bias reduction ranges from 43.1% to 80.5%. The final two columns display post hoc power calculations for a similar sample assuming logHRs are set to the value of the naïve or the genome‐wide bias‐reduced estimates.
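The percentage reduction and post hoc power columns can be illustrated with a short calculation. The source does not state which power formula was used. The sketch below assumes a Schoenfeld-type approximation for the Cox model with additive genotype coding, genotype variance 2p(1 − p) under Hardy-Weinberg equilibrium, 115 events, and a two-sided α of 0.01. Under those assumptions it gives values close to the tabulated 90.1% and 35.3% for the rank 1 SNP (rs1968685). The function names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

_N = NormalDist()  # standard normal distribution


def pct_reduction(naive, corrected):
    """Percentage by which the corrected estimate shrinks the naive one."""
    return 100 * (naive - corrected) / naive


def cox_power(log_hr, maf, n_events, alpha=0.01):
    """Approximate two-sided power (%) for a per-allele log hazard ratio.

    Assumption: Schoenfeld-type approximation; the negligible
    contribution of the opposite tail is ignored.
    """
    var_g = 2 * maf * (1 - maf)  # additive genotype variance under HWE
    z = abs(log_hr) * (n_events * var_g) ** 0.5
    return 100 * _N.cdf(z - _N.inv_cdf(1 - alpha / 2))


def events_needed(log_hr, maf, power=0.80, alpha=0.01):
    """Events required in a replication study under the same approximation."""
    var_g = 2 * maf * (1 - maf)
    z_sum = _N.inv_cdf(1 - alpha / 2) + _N.inv_cdf(power)
    return ceil(z_sum ** 2 / (log_hr ** 2 * var_g))


# Rank 1 SNP: MAF 48.3%, naive logHR 0.51, bootstrap logHR 0.29, 115 events.
print(round(pct_reduction(0.51, 0.29), 1))
print(round(cox_power(0.51, 0.483, 115), 1), round(cox_power(0.29, 0.483, 115), 1))
print(events_needed(0.51, 0.483), events_needed(0.29, 0.483))
```

Since the required number of events scales with 1/logHR², planning a replication around 0.51 instead of 0.29 understates the requirement by a factor of about (0.51/0.29)² ≈ 3.1.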
Table 2. Simulation Study 1 summary statistics for the genome‐wide bootstrap, conditional likelihood, and single‐SNP bootstrap estimates (significance threshold P < 5 × 10⁻⁵)
| SNP MAF (%) | Data‐generated logHR | Empirical power (%) | Mean fitted logHR | Mean bias: uncorrected naïve | Mean bias: genome‐wide bootstrap | Mean bias: conditional likelihood | Mean bias: single‐SNP bootstrap | No. of datasets selected |
|---|---|---|---|---|---|---|---|---|
| 10.3 | 0.50 | 99.5 | 0.50 | 0.00 | −0.07 | 0.00 | 0.00 | 4,974 |
| 34.9 | 0.19 | 77.3 | 0.22 | 0.01 | −0.04 | 0.01 | 0.00 | 3,863 |
| 26.7 | 0.22 | 65.5 | 0.20 | 0.03 | −0.03 | 0.02 | 0.01 | 3,275 |
| 17.1 | 0.17 | 55.7 | 0.26 | 0.04 | −0.03 | 0.03 | 0.01 | 2,785 |
| 15.0 | 0.13 | 52.2 | 0.27 | 0.04 | −0.03 | 0.03 | 0.02 | 2,611 |
| 13.7 | 0.23 | 50.0 | 0.22 | 0.04 | −0.03 | 0.03 | 0.02 | 2,499 |
| 48.8 | 0.27 | 17.2 | 0.14 | 0.05 | 0.00 | 0.04 | 0.03 | 861 |
| 39.0 | 0.21 | 10.0 | 0.12 | 0.07 | 0.01 | 0.06 | 0.04 | 500 |
| 5.8 | 0.19 | 9.3 | 0.21 | 0.13 | 0.02 | 0.12 | 0.09 | 463 |
| 15.8 | 0.09 | 8.9 | 0.15 | 0.09 | 0.01 | 0.08 | 0.06 | 447 |
| 15.8 | 0.13 | 8.4 | 0.15 | 0.09 | 0.01 | 0.08 | 0.06 | 418 |
| 14.6 | 0.14 | 0.7 | 0.12 | 0.16 | 0.08 | 0.15 | 0.13 | 36 |
| 36.3 | 0.14 | 0.6 | 0.07 | 0.12 | 0.07 | 0.11 | 0.10 | 29 |
| 6.8 | 0.27 | 0.2 | 0.16 | 0.26 | 0.15 | 0.23 | 0.19 | 11 |
| 6.9 | 0.12 | 0 | 0.09 | NA | NA | NA | NA | 0 |
Minor allele is the risk allele; otherwise, the major allele is associated with risk.
Comparison with the naïve Cox PH estimates for 15 SNPs generated to have association with time to severe nephropathy in a sample of 5,444 individuals. The rows are ordered by empirical power, which is the proportion of simulated datasets in which the SNP was detected as significant out of 5,000 replications. Mean bias is calculated as the difference between the mean fitted logHR in all datasets and the mean logHR in selected datasets.
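The two summary quantities defined in this note can be written out as a small sketch. Empirical power is the share of replications in which the SNP was selected. Mean bias is taken here as the mean estimate in the selected datasets minus the mean fitted logHR in all datasets; that sign convention is an assumption, chosen so that overestimation is positive, in line with the positive naïve biases in the table above. The function names are hypothetical.

```python
from statistics import fmean


def empirical_power(n_selected, n_replications=5000):
    """Percentage of replications in which the SNP was detected."""
    return 100 * n_selected / n_replications


def mean_bias(estimates_in_selected, fitted_in_all):
    """Mean estimate over selected datasets minus mean fitted logHR overall.

    Returns None when the SNP was never selected (the NA row of the table).
    """
    if not estimates_in_selected:
        return None
    return fmean(estimates_in_selected) - fmean(fitted_in_all)


# Counts from the table above: 3,863 and 36 selections out of 5,000.
print(round(empirical_power(3863), 1), round(empirical_power(36), 1))

# Toy replications (made-up numbers): only the two largest estimates pass.
all_fits = [0.10, 0.20, 0.30, 0.40]
print(mean_bias([0.30, 0.40], all_fits))
```

Passing corrected estimates (for example, bootstrap-adjusted values) as `estimates_in_selected` gives the mean bias of that estimator against the same reference.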
Figure 2. Simulation Study 1 genome‐wide bootstrap estimates for true‐positive SNPs (significance threshold P < 5 × 10⁻⁵). Comparison of distributions of genome‐wide bootstrap (transparent red GW BR2) and uncorrected naïve (transparent blue) logHR estimates of true‐positive SNPs with MAF ≥ 5% out of the 5,000 replications of a sample of 5,444 subjects. The vertical, solid red line denotes the fitted logHR averaged across unselected datasets. The SNPs are ordered by number of simulation datasets (N) in which the SNP was detected as statistically significant (see Table 2).
Figure 3. Simulation Study 1 genome‐wide bootstrap estimates for false‐positive SNPs (significance threshold P < 5 × 10⁻⁵). Comparison of distributions of genome‐wide bootstrap (transparent red GW BR2) and uncorrected naïve (transparent blue) logHR estimates of false‐positive SNPs in a sample of 5,444 subjects, stratified by MAF categories. False‐positive SNPs are those found to be statistically significant among 5,000 replications and not in the same gene as any of the SNPs in the model used for data generation. The vertical solid red line denotes the null reference value.
Figure 4. Simulation Study 1 relative mean bias comparison of the genome‐wide (GW BR2) and single‐SNP (SS BR2) bootstrap estimates with the naïve estimates for 14 SNPs with MAF ≥ 5% generated to have association with time to severe nephropathy in a sample of 5,444 subjects. Relative mean bias is calculated as mean bias / mean fitted effect size, where means are taken over 5,000 replications (values derived from supplementary Tables S1 and S3).
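As a rough illustration of the relative mean bias ratio, the sketch below applies it to rounded values from the Simulation Study 1 summary table rather than to the supplementary tables the figure is based on, so the results are indicative only. The function name `relative_mean_bias` is hypothetical.

```python
def relative_mean_bias(mean_bias, mean_fitted_effect):
    """Mean bias as a percentage of the mean fitted effect size."""
    return 100 * mean_bias / mean_fitted_effect


# High-power SNP (MAF 10.3%, mean fitted logHR 0.50):
# naive bias 0.00, genome-wide bootstrap bias -0.07.
print(relative_mean_bias(0.00, 0.50), relative_mean_bias(-0.07, 0.50))

# Low-power SNP (MAF 5.8%, mean fitted logHR 0.21):
# naive bias 0.13, genome-wide bootstrap bias 0.02.
print(relative_mean_bias(0.13, 0.21), relative_mean_bias(0.02, 0.21))
```

On these rounded inputs the bootstrap overcorrects the high-power SNP by about 14% of its effect size, while cutting the low-power SNP's relative bias from roughly 62% to under 10%, consistent with the pattern described in the abstract.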
Table 3. Expected risk scores based on the naïve and genome‐wide (GW) bootstrap mean logHR estimates from 5,000 simulation datasets compared to the risk score based on the generating values
| Simulation study | Bootstrap significance threshold | Expected risk score: generating | Expected risk score: naïve | Expected risk score: GW | Naïve − GW difference |
|---|---|---|---|---|---|
| 1 | 5 × 10⁻⁵ | 3.49 | 4.44 | 3.40 | 1.04 |
| 2 | 0.01 | 3.49 | 4.18 | 2.83 | 1.35 |
| 3 | 5 × 10⁻⁵ | 2.89 | 4.38 | 3.45 | 0.93 |
Risk scores are calculated from 15 SNPs generated to have association with risk of severe nephropathy in a sample of 5,444 individuals. Expected scores are taken over a hypothetical population with the same marginal allele frequencies as the sample of 5,444.