Mark J Adams, W David Hill, David M Howard, Hassan S Dashti, Katrina A S Davis, Archie Campbell, Toni-Kim Clarke, Ian J Deary, Caroline Hayward, David Porteous, Matthew Hotopf, Andrew M McIntosh.
Abstract
BACKGROUND: People who opt to participate in scientific studies tend to be healthier, wealthier and more educated than the broader population. Although selection bias does not always pose a problem for analysing the relationships between exposures and diseases or other outcomes, it can lead to biased effect size estimates. Biased estimates may weaken the utility of genetic findings because the goal is often to make inferences in a new sample (such as in polygenic risk score analysis).
Keywords: Generation Scotland; Partners Biobank; Selection bias; UK Biobank; cohort studies; follow-up studies; genome-wide association study; mental health
Year: 2020 PMID: 31263887 PMCID: PMC7266553 DOI: 10.1093/ije/dyz134
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196
Logistic regression on e-mail contact and MHQ data in UK Biobank (N = 373 478). Regression coefficients are expressed as odds ratios for increased probability of having e-mail contact and increased probability of having MHQ data
| Variable | Level | N | E-mail contact: OR (SE) | E-mail contact: 95% CI | MHQ data: OR (SE) | MHQ data: 95% CI |
|---|---|---|---|---|---|---|
| Age (SD) | | 373 478 | 0.85 (0.004) | 0.846–0.861 | 1.01 (0.004) | 0.998–1.014 |
| Sex | Female | 211 768 | 1 | − | 1 | − |
| | Male | 161 710 | 1.11 (0.010) | 1.093–1.131 | 0.90 (0.008) | 0.883–0.914 |
| Region | East Midlands | 25 307 | 1 | − | 1 | − |
| | Greater London | 50 795 | 1.85 (0.032) | 1.785–1.909 | 1.13 (0.022) | 1.088–1.173 |
| | North East | 27 594 | 0.49 (0.008) | 0.470–0.501 | 0.87 (0.018) | 0.835–0.904 |
| | North West | 54 053 | 0.81 (0.013) | 0.781–0.833 | 0.84 (0.012) | 0.817–0.866 |
| | Scotland | 27 557 | 0.42 (0.009) | 0.405–0.439 | 0.83 (0.017) | 0.800–0.866 |
| | South East | 34 114 | 0.84 (0.016) | 0.805–0.867 | 1.13 (0.020) | 1.088–1.165 |
| | South West | 33 410 | 1.13 (0.021) | 1.087–1.171 | 1.08 (0.020) | 1.042–1.121 |
| | Wales | 15 741 | 0.58 (0.013) | 0.558–0.611 | 0.83 (0.020) | 0.796–0.873 |
| | West Midlands | 33 042 | 0.63 (0.011) | 0.606–0.649 | 0.83 (0.016) | 0.799–0.862 |
| | Yorkshire | 71 865 | 1.00 (0.016) | 0.967–1.028 | 0.93 (0.014) | 0.900–0.957 |
| Qualifications | None | 53 654 | 1 | − | 1 | − |
| | GCSE | 124 377 | 2.35 (0.028) | 2.297–2.408 | 2.29 (0.029) | 2.230–2.342 |
| | A Levels | 44 132 | 3.43 (0.048) | 3.338–3.525 | 3.53 (0.057) | 3.421–3.642 |
| | Other | 19 583 | 2.53 (0.042) | 2.451–2.616 | 2.72 (0.052) | 2.620–2.823 |
| | College/university | 131 732 | 4.27 (0.054) | 4.163–4.375 | 4.43 (0.056) | 4.322–4.541 |
| Smoking | Never | 210 858 | 1 | − | 1 | − |
| | Previous | 126 802 | 1.13 (0.009) | 1.116–1.152 | 1.06 (0.008) | 1.042–1.074 |
| | Current | 35 818 | 0.71 (0.009) | 0.689–0.723 | 0.73 (0.010) | 0.706–0.744 |
| Alcohol | Units/week (SD) | 373 478 | 1.05 (0.004) | 1.038–1.053 | 1.03 (0.005) | 1.021–1.039 |
| Anthropometry | Body mass index (SD) | 373 478 | 0.95 (0.004) | 0.940–0.953 | 0.88 (0.004) | 0.877–0.893 |
| Diagnoses, yes (vs no) | Mental disorder | 24 668 | 0.75 (0.011) | 0.729–0.774 | 0.68 (0.012) | 0.654–0.701 |
| | Injury | 59 706 | 0.90 (0.007) | 0.881–0.909 | 0.83 (0.009) | 0.815–0.851 |
| | Other disease | 278 019 | 0.95 (0.009) | 0.929–0.963 | 0.91 (0.009) | 0.889–0.923 |
| Family history, yes (vs no) | Alzheimer's/dementia | 52 238 | 1.18 (0.013) | 1.157–1.208 | 1.22 (0.013) | 1.198–1.250 |
| | Severe depression | 54 651 | 1.04 (0.011) | 1.022–1.066 | 1.11 (0.012) | 1.084–1.131 |
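The OR, SE and 95% CI columns in the table above can be related with a short calculation. The sketch below rests on two assumptions that the table itself does not state: that each interval is a Wald interval computed on the log-odds scale, and that the SE in parentheses is on the odds-ratio scale, approximated by the delta method. The function names are illustrative only.

```python
import math

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def log_se_from_ci(lower, upper):
    """Recover the log-odds standard error from a Wald 95% CI for an OR (assumed form)."""
    return (math.log(upper) - math.log(lower)) / (2 * Z95)


def or_from_ci(lower, upper):
    """Geometric midpoint of the interval, i.e. exp of the log-odds estimate."""
    return math.sqrt(lower * upper)


def delta_method_se(odds_ratio, log_se):
    """Approximate SE on the odds-ratio scale: SE(OR) ~ OR * SE(log OR)."""
    return odds_ratio * log_se


# Male vs female, e-mail contact: CI limits copied from the table.
lower, upper = 1.093, 1.131
odds_ratio = or_from_ci(lower, upper)
log_se = log_se_from_ci(lower, upper)
print(round(odds_ratio, 2), round(delta_method_se(odds_ratio, log_se), 3))
```

Under these assumptions the recovered values agree with the tabulated 1.11 (0.010) to the printed precision, which suggests, but does not prove, that the table was built this way.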
Figure 1. Manhattan plot of e-mail contact in UK Biobank.
Top lead SNPs associated with e-mail contact in UK Biobank. Directions of effect are listed for the UK Biobank discovery sample and the Generation Scotland and Partners Biobank replication samples as either positive (+) or negative (−)
| Chr | SNP | Location (bp) | A1/A2 | Freq. | OR (SE) | P | Direction |
|---|---|---|---|---|---|---|---|
| 1 | rs632180 | 234,758,181 | T/C | 0.70 | 0.973 (0.005) | 2.0 × 10⁻⁸ | −−+ |
| 2 | rs7597665 | 34,420,702 | C/T | 0.29 | 1.031 (0.005) | 1.1 × 10⁻⁹ | +++ |
| 2 | rs1455343 | 199,519,691 | T/G | 0.38 | 0.974 (0.005) | 2.2 × 10⁻⁸ | −−+ |
| 3 | rs73078357 | 48,695,834 | C/T | 0.12 | 1.038 (0.007) | 4.5 × 10⁻⁸ | +++ |
| 3 | rs111488606 | 49,864,924 | CA/C | 0.44 | 0.973 (0.005) | 2.3 × 10⁻⁸ | −−− |
| 5 | rs6452788 | 87,712,913 | A/G | 0.24 | 1.032 (0.005) | 2.9 × 10⁻⁹ | ++− |
| 5 | rs4976602 | 167,843,998 | A/G | 0.11 | 0.96 (0.007) | 2.7 × 10⁻⁸ | −−− |
| 6 | rs1487441 | 98,553,894 | A/G | 0.49 | 1.031 (0.005) | 9.5 × 10⁻¹² | +++ |
| 18 | rs1788784 | 21,159,630 | G/A | 0.66 | 1.031 (0.005) | 1.3 × 10⁻¹⁰ | +++ |
A1, effect allele; A2, non-effect allele; Chr, chromosome; Freq., frequency of effect allele; OR, odds ratio; SE, standard error.
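One simple way to summarize the Direction column is to count how often a replication sample agrees in sign with the discovery sample and compare that count with chance. The sketch below is illustrative rather than a reproduction of the authors' analysis. It assumes the three signs are ordered UK Biobank, Generation Scotland, Partners Biobank (the order in which the caption names them), and it uses a one-sided binomial sign test with a null match probability of 0.5.

```python
from math import comb

# Direction strings copied from the e-mail contact table.
# Assumed order of signs: UK Biobank, Generation Scotland, Partners Biobank.
DIRECTIONS = ["−−+", "+++", "−−+", "+++", "−−−", "++−", "−−−", "+++", "+++"]


def concordant(directions, replication_index):
    """Count SNPs whose replication sign matches the discovery sign (index 0)."""
    return sum(d[replication_index] == d[0] for d in directions)


def sign_test_p(k, n):
    """One-sided binomial P(X >= k) when each sign matches with probability 0.5."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


n = len(DIRECTIONS)
for name, idx in [("Generation Scotland", 1), ("Partners Biobank", 2)]:
    k = concordant(DIRECTIONS, idx)
    print(f"{name}: {k}/{n} concordant, sign-test p = {sign_test_p(k, n):.4f}")
```

With the signs as tabulated, this gives 9 of 9 concordant for the second sign and 6 of 9 for the third. A sign test ignores effect sizes and replication sample sizes, so it is only a coarse check.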
Figure 2. Manhattan plot of data available in MHQ follow-up.
Top lead SNPs associated with MHQ data. Directions of effect are listed for the UK Biobank discovery sample and the Generation Scotland and Partners Biobank replication samples as either positive (+) or negative (−)
| Chr | SNP | Location (bp) | A1/A2 | Freq. | OR (SE) | P | Direction |
|---|---|---|---|---|---|---|---|
| 1 | rs7542974 | 72,544,704 | A/G | 0.25 | 1.032 (0.006) | 3.8 × 10⁻⁸ | +++ |
| 1 | rs485929 | 74,678,285 | G/A | 0.39 | 1.028 (0.005) | 3.7 × 10⁻⁸ | +−+ |
| 1 | rs532246 | 84,411,238 | G/A | 0.74 | 0.968 (0.005) | 7.0 × 10⁻⁹ | −+− |
| 1 | rs2789111 | 243,346,404 | C/T | 0.38 | 0.968 (0.005) | 1.5 × 10⁻¹⁰ | −−+ |
| 2 | rs35028061 | 49,479,987 | GT/G | 0.38 | 1.029 (0.005) | 1.9 × 10⁻⁸ | +−− |
| 3 | rs9917656 | 48,581,513 | C/T | 0.30 | 1.03 (0.006) | 3.2 × 10⁻⁸ | ++− |
| 3 | rs13082026 | 52,962,681 | T/C | 0.44 | 0.972 (0.005) | 2.4 × 10⁻⁸ | −−+ |
| 4 | rs57692580 | 106,214,476 | A/T | 0.39 | 0.973 (0.005) | 2.8 × 10⁻⁸ | −++ |
| 5 | rs34635 | 60,513,501 | G/A | 0.42 | 0.972 (0.005) | 1.2 × 10⁻⁸ | −−− |
| 5 | rs146681214 | 133,867,867 | AC/A | 0.18 | 1.039 (0.007) | 3.6 × 10⁻⁹ | +++ |
| 5 | rs2336897 | 167,050,276 | T/C | 0.69 | 1.031 (0.005) | 5.2 × 10⁻⁹ | ++− |
| 6 | rs3993747 | 31,580,507 | G/A | 0.35 | 0.969 (0.005) | 9.5 × 10⁻¹⁰ | −−− |
| 6 | rs59732267 | 98,432,302 | CA/C | 0.52 | 0.972 (0.005) | 2.5 × 10⁻⁸ | −−− |
| 8 | rs28716319 | 83,269,854 | G/A | 0.28 | 1.031 (0.005) | 2.7 × 10⁻⁸ | +−+ |
| 8 | rs13262595 | 143,316,970 | G/A | 0.56 | 1.03 (0.005) | 1.0 × 10⁻⁹ | +++ |
| 9 | rs6474966 | 15,757,537 | A/G | 0.46 | 1.028 (0.005) | 2.8 × 10⁻⁸ | +++ |
| 9 | rs11793831 | 23,362,311 | T/G | 0.42 | 1.027 (0.005) | 4.3 × 10⁻⁸ | +−+ |
| 11 | rs1984389 | 31,740,989 | C/A | 0.54 | 0.973 (0.005) | 2.4 × 10⁻⁸ | −−− |
| 11 | rs10791143 | 131,278,676 | G/A | 0.62 | 1.034 (0.005) | 1.5 × 10⁻¹¹ | +++ |
| 16 | rs4616299 | 7,657,432 | G/A | 0.40 | 0.972 (0.005) | 1.2 × 10⁻⁸ | −−− |
| 17 | rs56058331 | 56,427,128 | A/G | 0.42 | 1.029 (0.005) | 1.0 × 10⁻⁸ | +++ |
| 18 | rs1261078 | 52,866,791 | G/A | 0.05 | 0.927 (0.010) | 5.6 × 10⁻¹² | −+− |
| 19 | rs34232444 | 4,965,404 | C/T | 0.35 | 1.029 (0.005) | 2.5 × 10⁻⁸ | ++− |
| 19 | rs3746187 | 18,279,816 | G/A | 0.40 | 0.968 (0.005) | 9.8 × 10⁻¹¹ | −−− |
| 19 | rs429358 | 45,411,941 | C/T | 0.15 | 0.942 (0.006) | 4.6 × 10⁻¹⁹ | −−− |
A1, effect allele; A2, non-effect allele; Chr, chromosome; Freq., frequency of effect allele; OR, odds ratio; SE, standard error.
Figure 3. LD score genetic correlations (rg) with e-mail contact (triangle) and MHQ data (circle), with 95% confidence intervals.
Figure 4. Possible effects of selection bias on polygenic risk score analyses in follow-up studies. (A) Causal model to be tested where PRS causes phenotype Y via phenotype X. (B) Worst-case scenario where PRS influences X but not Y and both phenotypes cause follow-up participation. Analysing only follow-up participants is the same as conditioning on F, which induces a correlation between PRS and Y. (C) More likely scenario, where both X and Y cause follow-up participation. Conditioning on F attenuates estimates of the relationship between PRS and Y. (D) Ideal scenario where X causes follow-up participation, but Y does not. Conditioning on F has no impact on the dependence of Y on PRS. PRS, polygenic risk score; X and Y, phenotypes of interest; F, selection into follow-up; directional solid line, true causal association; dashed line, induced or attenuated statistical dependence.
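Scenario (B) can be illustrated with a small simulation. The sketch below is a toy model, not the authors' analysis: the effect size of 0.5, the unit-variance normal noise terms and the threshold rule for participation are all arbitrary assumptions chosen for illustration. PRS influences X, Y is generated independently of both, and participation F becomes more likely as X and Y increase.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_scenario_b(n=50_000, seed=2020):
    """Return corr(PRS, Y) in the full sample and among follow-up participants."""
    rng = random.Random(seed)
    prs_all, y_all, prs_sel, y_sel = [], [], [], []
    for _ in range(n):
        prs = rng.gauss(0, 1)
        x = 0.5 * prs + rng.gauss(0, 1)  # PRS influences X (arbitrary effect size)
        y = rng.gauss(0, 1)              # Y is independent of PRS and X
        # Both X and Y raise the chance of follow-up participation F.
        takes_part = x + y + rng.gauss(0, 1) > 0
        prs_all.append(prs)
        y_all.append(y)
        if takes_part:
            prs_sel.append(prs)
            y_sel.append(y)
    return pearson(prs_all, y_all), pearson(prs_sel, y_sel)


r_full, r_selected = simulate_scenario_b()
print(f"full sample r = {r_full:+.3f}, follow-up only r = {r_selected:+.3f}")
```

In the full sample the PRS–Y correlation is close to zero, as constructed. Among participants only, it becomes negative (roughly −0.1 under these settings), even though PRS has no effect on Y. This is the induced dependence shown by the dashed line in panel (B).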