Marta Guindo-Martínez, Ramon Amela, Silvia Bonàs-Guarch, Montserrat Puiggròs, Cecilia Salvoro, Irene Miguel-Escalada, Caitlin E Carey, Joanne B Cole, Sina Rüeger, Elizabeth Atkinson, Aaron Leong, Friman Sanchez, Cristian Ramon-Cortes, Jorge Ejarque, Duncan S Palmer, Mitja Kurki, Krishna Aragam, Jose C Florez, Rosa M Badia, Josep M Mercader, David Torrents.
Abstract
Genome-wide association studies (GWAS) are not fully comprehensive, as current strategies typically test only the additive model, exclude the X chromosome, and use only one reference panel for genotype imputation. We implement an extensive GWAS strategy, GUIDANCE, which improves genotype imputation by using multiple reference panels and includes the analysis of the X chromosome and non-additive models to test for association. We apply this methodology to 62,281 subjects across 22 age-related diseases and identify 94 genome-wide associated loci, including 26 previously unreported. Moreover, we observe that 27.7% of the 94 loci are missed if we use standard imputation strategies with a single reference panel, such as HRC, and only test the additive model. Among the new findings, we identify three novel low-frequency recessive variants with odds ratios larger than 4, which need at least a three-fold larger sample size to be detected under the additive model. This study highlights the benefits of applying innovative strategies to better uncover the genetic architecture of complex diseases.
Year: 2021 PMID: 33893285 PMCID: PMC8065056 DOI: 10.1038/s41467-021-21952-4
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Graphical representation illustrating the benefits of combining the results from different reference panels.
a Comparison of the number of variants obtained after imputation with each of four reference panels (info score ≥ 0.7) and after combining them, colored according to MAF and variant type (SNPs vs other forms of variation, such as indels). As shown in the bar plot, combining the results from the four reference panels increased the final set of variants for association testing when compared with the results for each of the panels alone (GoNL, UK10K, 1000G phase 3, or HRC), especially in the low-frequency and rare spectrum. For example, we covered up to 5.5 M rare variants (0.01 > MAF > 0.001) by combining panels, while only 2.3 M, 2.9 M, 3.2 M, and 3.8 M rare variants were imputed independently with GoNL, UK10K, 1000G phase 3, and HRC, respectively. b Comparison of the contribution of each reference panel to the combined results. Each bar represents the number of variants for which a given reference panel had the best imputation accuracy. As shown in the figure, although the HRC panel showed overall higher imputation scores, providing around 10 M of the final 16 M variants, the contribution of the other reference panels, primarily with non-SNP variants, was substantial. Indels seen in the bar plot for HRC correspond to genotyped indels. All variants with info score < 0.7, MAF < 0.001, or HWE p < 1.0 × 10⁻⁶ in controls were filtered out. c Percentage of high-quality imputed variants (IMPUTE2 info score ≥ 0.7) with an allelic dosage R² ≥ 0.5 between sequenced genotypes in UK10K samples and variants imputed in the same UK10K samples using the 1000G phase 3, GoNL, and HRC reference panels, for the autosomes. The percentage of high-quality imputed variants with allelic dosage R² ≥ 0.5 (y-axis) is shown across several MAF ranges (x-axis) for each of the reference panels and for the combined-panel imputation results. The combination of the three reference panels outperforms the single reference panels, with 97.74% of variants reaching R² ≥ 0.5. d Percentage of variants in the X chromosome with an IMPUTE2 info score ≥ 0.7 and an allelic dosage R² ≥ 0.5 for UK10K imputed genotypes across MAF ranges for the 1000G phase 3, GoNL, and HRC reference panels and the combined results. The combination of the results from the three panels outperforms single reference panels, with 93.89% of variants reaching an allelic dosage R² ≥ 0.5. e Venn diagram illustrating the loci identified by each reference panel. New loci are depicted in bold. As shown in this figure, only 67 of the 94 GWAS significant loci were identified by all four reference panels, while 27 of them (28.7%) were identified by only one, two, or three of the four panels.
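The panel-combination idea described in panels a and b can be sketched as follows. This is a minimal illustration, not the GUIDANCE implementation: the variant identifiers, info scores, and MAF values are invented, and `combine_panels` is a hypothetical helper. It keeps, for each variant, the panel with the highest info score and then applies the info score and MAF thresholds quoted in the caption (the HWE filter is omitted for brevity).

```python
# Hypothetical per-panel imputation output: variant id -> (info score, MAF).
# All numbers are invented for illustration only.
PANELS = {
    "HRC": {"rs1": (0.95, 0.20), "rs2": (0.65, 0.004)},
    "1000G": {"rs1": (0.90, 0.20), "rs2": (0.82, 0.004), "indel1": (0.75, 0.03)},
    "GoNL": {"rs2": (0.78, 0.004), "rs3": (0.99, 0.0005)},
}

MIN_INFO = 0.7   # info score threshold quoted in the caption
MIN_MAF = 0.001  # MAF threshold quoted in the caption


def combine_panels(panels, min_info=MIN_INFO, min_maf=MIN_MAF):
    """Keep the best-imputed copy of each variant, then apply the filters."""
    best = {}
    for panel, variants in panels.items():
        for vid, (info, maf) in variants.items():
            # Replace the stored record only if this panel imputed it better.
            if vid not in best or info > best[vid][1]:
                best[vid] = (panel, info, maf)
    return {
        vid: rec
        for vid, rec in best.items()
        if rec[1] >= min_info and rec[2] >= min_maf
    }


combined = combine_panels(PANELS)
for vid, (panel, info, maf) in sorted(combined.items()):
    print(vid, panel, info, maf)
```

In this toy input, `rs2` fails the info threshold in HRC but is rescued by another panel, and `indel1` is only available from one panel, which mirrors the kind of gain the caption attributes to combining panels.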
New associations from the GERA cohort analysis.
| Phenotype (cases/controls) | CHR | Nearest gene | Position | rsID | Alleles | MAF | Lowest p-value model | Additive model OR (CI 95%) | Additive model p-value | Lowest p-value model OR (CI 95%) | Lowest p-value model p-value | Dominance deviation p-value | Best panel empirical R² ᵃ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Allergic rhinitis (13,936/42,701) | 3 | | 112,911,615 | rs2399472 | C/T | 0.073 | Additive | 1.17 (1.10–1.23) | 1.55 × 10⁻⁸ | 1.17 (1.10–1.23) | 1.55 × 10⁻⁸ | 6.66 × 10⁻¹ | 1.000 |
| | 8 | | 13,164,746 | rs10112506 | A/G | 0.390 | Dominant | 0.94 (0.91–0.97) | 8.61 × 10⁻⁶ | 0.89 (0.86–0.93) | 1.54 × 10⁻⁸ | 2.86 × 10⁻⁴ | 0.998 |
| Asthma (9209/47,428) | 5 | | 137,858,067 | rs154073 | C/T | 0.429 | Recessive | 1.09 (1.06–1.13) | 6.06 × 10⁻⁸ | 1.18 (1.12–1.25) | 4.23 × 10⁻⁹ | 9.28 × 10⁻³ | 0.991 |
| | 9 | | 98,344,866 | rs67053006 | C/G | 0.139 | Additive | 0.87 (0.83–0.91) | 4.14 × 10⁻⁸ | 0.87 (0.83–0.91) | 4.14 × 10⁻⁸ | 8.10 × 10⁻¹ | – ᵇ |
| Cancer (17,131/39,506) | 13 | | 112,115,591 | rs138646839 | C/T | 0.005 | Genotypic | 1.68 (1.39–2.03) | 1.45 × 10⁻⁷ | 1.60 (1.32–1.96)/>10 (1.01–>10) ᶜ | 3.54 × 10⁻⁸ | – | 0.802 |
| | 18 | | 28,442,343 | rs2014497 | A/G | 0.008 | Additive | 1.50 (1.30–1.72) | 2.44 × 10⁻⁸ | 1.50 (1.30–1.72) | 2.44 × 10⁻⁸ | 6.00 × 10⁻¹ | 0.988 |
| Cardiovascular (15,009/41,628) | 1 | | 114,448,752 | rs10858023 | C/T | 0.350 | Dominant | 1.09 (1.06–1.12) | 3.26 × 10⁻⁸ | 1.14 (1.09–1.19) | 2.11 × 10⁻⁹ | 1.94 × 10⁻² | 0.996 |
| | 2 | CACNB4 | 152,912,244 | rs201654520 | CT/C | 0.017 | Recessive | 1.10 (0.98–1.22) | 1.10 × 10⁻¹ | 19.02 (5.50–65.84) | 4.32 × 10⁻⁸ | 4.36 × 10⁻⁶ | 0.973 |
| Major depression disorder (7264/49,373) | 12 | | 128,551,715 | rs1455286248 | GT/G | 0.281 | Heterodominant | 0.94 (0.90–0.98) | 3.00 × 10⁻³ | 1.18 (1.12–1.25) | 3.15 × 10⁻⁹ | 1.10 × 10⁻⁶ | – ᵇ |
| Type 2 diabetes (6967/49,670) | 5 | PELO | 52,080,909 | rs77704739 | T/C | 0.036 | Recessive | 1.15 (1.05–1.26) | 2.80 × 10⁻³ | 4.32 (2.70–6.92) | 1.75 × 10⁻⁸ | 1.92 × 10⁻⁷ | 0.998 |
| Hemorrhoids (9129/47,508) | 13 | | 76,281,808 | rs186102686 | C/T | 0.004 | Heterodominant | 1.98 (1.58–2.48) | 2.18 × 10⁻⁸ | 1.99 (1.59–2.49) | 2.03 × 10⁻⁸ | – | 0.933 |
| Hernia abdominopelvic (6291/50,346) | 1 | | 219,762,581 | rs2494196 | C/A | 0.274 | Additive | 1.13 (1.08–1.18) | 2.03 × 10⁻⁸ | 1.13 (1.08–1.18) | 2.03 × 10⁻⁸ | 6.87 × 10⁻¹ | 0.997 |
| | 4 | | 27,019,359 | rs113180595 | T/C | 0.004 | Heterodominant | 2.17 (1.69–2.78) | 1.59 × 10⁻⁸ | 2.18 (1.70–2.8) | 1.27 × 10⁻⁸ | – | 0.647 |
| Hypertension disease (28,391/28,246) | 2 | | 176,532,019 | rs1446802 | A/G | 0.500 | Recessive | 1.07 (1.04–1.09) | 1.66 × 10⁻⁶ | 1.13 (1.08–1.17) | 4.42 × 10⁻⁸ | 6.85 × 10⁻³ | 1.000 |
| | 15 | | 90,081,905 | rs28792763 | G/A | 0.462 | Dominant | 0.94 (0.91–0.96) | 4.14 × 10⁻⁶ | 0.88 (0.84–0.92) | 4.42 × 10⁻⁸ | 4.80 × 10⁻³ | 0.907 |
| | 17 | | 1,959,826 | rs112963849 | C/A | 0.082 | Additive | 1.15 (1.10–1.21) | 1.71 × 10⁻⁸ | 1.15 (1.10–1.21) | 1.71 × 10⁻⁸ | 8.01 × 10⁻¹ | 0.826 |
| Iron deficiency anemia (2439/54,198) | 7 | | 67,292,424 | rs79798837 | C/T | 0.118 | Dominant | 0.77 (0.70–0.85) | 1.69 × 10⁻⁷ | 0.74 (0.66–0.83) | 3.80 × 10⁻⁸ | 8.92 × 10⁻² | 0.948 |
| Macular degeneration (3685/52,952) | 2 | THUMPD2 | 40,010,523 | rs557998486 | T/TG | 0.009 | Recessive | 1.07 (0.81–1.41) | 6.28 × 10⁻¹ | 10.5 ᵈ | 2.75 × 10⁻⁸ | – | 0.865 |
| Osteoporosis (5399/51,238) | 22 | | 27,772,054 | rs139959245 | C/T | 0.007 | Additive | 1.91 (1.53–2.37) | 4.79 × 10⁻⁸ | 1.91 (1.53–2.37) | 4.79 × 10⁻⁸ | – | 0.851 |
| Psychiatric (8624/48,013) | 2 | | 46,278,720 | rs12712961 | T/A | 0.452 | Additive | 1.10 (1.06–1.14) | 1.66 × 10⁻⁸ | 1.10 (1.06–1.14) | 1.66 × 10⁻⁸ | 2.57 × 10⁻¹ | 0.994 |
| Peripheral Vascular disease (4301/52,336) | 11 | | 33,391,655 | rs80274406 | A/G | 0.091 | Genotypic | 1.06 (0.98–1.15) | 1.76 × 10⁻¹ | 1.17 (1.07–1.27)/0.26 (0.13–0.53) ᶜ | 4.26 × 10⁻⁸ | 6.32 × 10⁻⁶ | 0.923 |
| | 19 | | 48,403,215 | rs2932761 | A/G | 0.289 | Genotypic | 0.97 (0.93–1.02) | 3.04 × 10⁻¹ | 1.11 (1.03–1.18)/0.76 (0.66–0.87) ᶜ | 3.55 × 10⁻⁸ | 1.35 × 10⁻⁸ | 0.998 |
| Acute reaction to stress (4314/52,323) | 2 | | 184,407,101 | rs577242570 | T/G | 0.004 | Additive | 2.33 (1.77–3.08) | 4.56 × 10⁻⁸ | 2.33 (1.77–3.08) | 4.56 × 10⁻⁸ | – | 0.875 |
| Varicose veins (2483/54,154) | 3 | | 32,652,184 | rs62250779 | G/A | 0.073 | Genotypic | 1.17 (1.05–1.3) | 5.60 × 10⁻³ | 1.29 (1.16–1.45)/0.13 (0.03–0.60) ᶜ | 2.13 × 10⁻⁸ | 9.58 × 10⁻⁴ | 0.939 |
| | 8 | | 74,284,818 | rs2383896 | A/G | 0.479 | Additive | 1.17 (1.11–1.24) | 5.00 × 10⁻⁸ | 1.17 (1.11–1.24) | 5.00 × 10⁻⁸ | 9.88 × 10⁻¹ | 0.995 |
| | 13 | | 88,346,617 | rs117798068 | T/C | 0.011 | Heterodominant | 2.03 (1.63–2.53) | 1.59 × 10⁻⁸ | 2.07 (1.66–2.59) | 8.41 × 10⁻⁹ | – | 0.752 |
CHR chromosome, Position position hg19, Alleles non-effect allele/effect allele, MAF minor allele frequency, OR odds ratio, CI confidence interval.
ᵃ Empirical r-squared correlation (R²) between imputed and sequenced allele dosage for the best panel from our in silico analysis using an array of UK10K genotypes as a backbone and imputing with 1000G, HRC, and GoNL.
ᵇ This variant is not present in UK10K.
ᶜ Odds ratio and confidence interval for heterozygotes / odds ratio and confidence interval for effect-allele homozygotes, calculated using the het+hom method from SNPTEST.
ᵈ Odds ratio calculated using the recessive allele frequency-based test (RAFT) [61].
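The inheritance models named in the table (additive, dominant, recessive, heterodominant, genotypic) differ in how the genotype enters the association test. The sketch below shows the conventional covariate coding for each model, assuming hard-called genotypes expressed as effect-allele counts; `encode` is a hypothetical helper written for illustration, not a SNPTEST function, and real analyses of imputed data typically work with genotype probabilities or dosages rather than hard calls.

```python
MODELS = ("additive", "dominant", "recessive", "heterodominant", "genotypic")


def encode(genotype, model):
    """Map an effect-allele count (0, 1 or 2) to the covariate(s) of a model."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2")
    if model == "additive":
        # Effect scales with the number of effect alleles.
        return (genotype,)
    if model == "dominant":
        # One copy is enough to carry the full effect.
        return (int(genotype >= 1),)
    if model == "recessive":
        # Only effect-allele homozygotes carry the effect.
        return (int(genotype == 2),)
    if model == "heterodominant":
        # Heterozygotes differ from both homozygote groups.
        return (int(genotype == 1),)
    if model == "genotypic":
        # Separate indicators for heterozygotes and homozygotes,
        # which yields two odds ratios (het and hom).
        return (int(genotype == 1), int(genotype == 2))
    raise ValueError(f"unknown model: {model}")


for model in MODELS:
    print(model, [encode(g, model) for g in (0, 1, 2)])
```

Under this coding, a recessive variant contributes signal only through homozygotes, which is why its additive odds ratio can sit close to 1 while the recessive odds ratio is large, as in several rows of the table.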
Fig. 2 Functional characterization of the rs77704739 recessive association near the PELO gene.
a Colocalization plots from LocusCompare for the rs77704739 variant in adipose subcutaneous tissue. As seen in the plots, the signals from both eQTL data and the recessive T2D association results colocalize. b Violin plot from GTEx showing that the recessive rs77704739 variant significantly modifies the expression of the PELO gene in subcutaneous (n = 581 independent samples) and visceral adipose tissue (n = 469 independent samples), skeletal muscle (n = 706 independent samples), and pancreas (n = 305 independent samples). The box plots have lines extending from the boxes (whiskers) indicating variability outside the upper and lower quartiles. GTEx V7 was used for colocalization analyses, whereas GTEx V8 was used to generate the violin plots. c Signal plot for the chromosome 5 region surrounding rs77704739. Each point represents a variant, with its p-value from the discovery stage on a −log₁₀ scale on the y-axis. The x-axis represents the genomic position (hg19). Three credible set variants are located in open chromatin sites in human pancreatic islets, one of them classified as an active promoter and one highly bound by pancreatic islet-specific transcription factors, such as PDX1, NKX2.2, NKX6.1, and FOXA2.
Replication of new associations with UK Biobank.
| CHR | rsID (alleles) (MAF) | Best model | Stage 1 (GERA discovery): phenotype (cases/controls) | Stage 1: additive OR (CI 95%) | Stage 1: additive p-value | Stage 1: best model OR (CI 95%) | Stage 1: best model p-value | Stage 2 (UK Biobank replication): field (cases/controls or sample size) | Stage 2: additive OR (CI 95%) | Stage 2: additive p-value | Stage 2: lowest p-value model OR (CI 95%) | Stage 2: lowest p-value model p-value | Stage 1 + Stage 2 (meta-analysis): additive OR (CI 95%) | Meta-analysis: additive p-value | Meta-analysis: lowest p-value model OR (CI 95%) | Meta-analysis: lowest p-value model p-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 18 | rs2014497 (A/G) (0.008) | Additive | Cancer (17,131/39,506) | 1.50 (1.30–1.72) | 2.44 × 10⁻⁸ | 1.50 (1.30–1.72) | 2.44 × 10⁻⁸ | Self-reported: chronic lymphocytic (237/360,904) | 2.13 (1.14–3.97) | 3.50 × 10⁻² | 2.13 (1.14–3.97) | 3.50 × 10⁻² | 1.52 (1.33–1.74) | 1.60 × 10⁻⁹ | 1.52 (1.33–1.74) | 1.60 × 10⁻⁹ |
| | | | | | | | | Self-reported: kidney/renal cell cancer (473/360,668) | 1.75 (1.07–2.86) | 4.25 × 10⁻² | 1.75 (1.07–2.86) | 4.25 × 10⁻² | 1.51 (1.32–1.73) | 1.49 × 10⁻⁹ | 1.51 (1.32–1.73) | 1.49 × 10⁻⁹ |
| | | | | | | | | C69 Malignant neoplasm of eye and adnexa (146/361,048) | 2.51 (1.19–5.3) | 3.56 × 10⁻² | 2.51 (1.19–5.3) | 3.56 × 10⁻² | 1.52 (1.33–1.75) | 1.95 × 10⁻⁹ | 1.52 (1.33–1.75) | 1.95 × 10⁻⁹ |
| 1 | rs2494196 (C/A) (0.274) | Additive | Hernia abdominopelvic (6291/50,346) | 1.13 (1.08–1.18) | 2.03 × 10⁻⁸ | 1.13 (1.08–1.18) | 2.03 × 10⁻⁸ | Self-reported: umbilical hernia (328/360,813) | 1.42 (1.21–1.67) | 2.31 × 10⁻⁵ | 1.42 (1.21–1.67) | 2.31 × 10⁻⁵ | 1.15 (1.10–1.19) | 5.35 × 10⁻¹¹ | 1.15 (1.10–1.19) | 5.35 × 10⁻¹¹ |
| | | | | | | | | K40 Inguinal hernia (13,365/347,829) | 1.09 (1.06–1.12) | 3.95 × 10⁻¹⁰ | 1.09 (1.06–1.12) | 3.95 × 10⁻¹⁰ | 1.10 (1.08–1.12) | 7.78 × 10⁻¹⁷ | 1.10 (1.08–1.12) | 7.78 × 10⁻¹⁷ |
| | | | | | | | | K41 Femoral hernia (475/360,719) | 1.44 (1.26–1.64) | 1.24 × 10⁻⁷ | 1.44 (1.26–1.64) | 1.24 × 10⁻⁷ | 1.16 (1.11–1.21) | 2.26 × 10⁻¹² | 1.16 (1.11–1.21) | 2.26 × 10⁻¹² |
| | | | | | | | | K42 Umbilical hernia (2623/358,571) | 1.29 (1.22–1.37) | 1.14 × 10⁻¹⁷ | 1.29 (1.22–1.37) | 1.14 × 10⁻¹⁷ | 1.19 (1.15–1.22) | 2.94 × 10⁻²² | 1.19 (1.15–1.22) | 2.94 × 10⁻²² |
| | | | | | | | | K43 Ventral hernia (2470/358,724) | 1.18 (1.11–1.25) | 1.77 × 10⁻⁷ | 1.18 (1.11–1.25) | 1.77 × 10⁻⁷ | 1.15 (1.11–1.19) | 1.99 × 10⁻¹⁴ | 1.15 (1.11–1.19) | 1.99 × 10⁻¹⁴ |
| 2 | rs557998486 (T/TG) (0.009) | Recessive | Macular degeneration (3685/52,952) | 1.07 (0.81–1.41) | 6.28 × 10⁻¹ | 10.5 ᵃ | 2.75 × 10⁻⁸ | Eye problems/disorders: Macular degeneration (2726/115,164) | 0.98 (0.72–1.32) | 8.81 × 10⁻¹ | 7.58 (1.54–37.32) | 4.1 × 10⁻² | 1.01 (0.82–1.24) ᵇ | 7.91 × 10⁻¹ ᶜ | 26.51 (7.57–92.85) ᵇ | 3.29 × 10⁻⁸ ᶜ |
| 5 | rs77704739 (T/C) (0.036) | Recessive | Type 2 diabetes (6967/49,670) | 1.15 (1.05–1.26) | 2.80 × 10⁻³ | 4.32 (2.70–6.92) | 1.75 × 10⁻⁸ | Self-reported: diabetes (14,114/347,027) | 1.03 (0.97–1.09) | 3.87 × 10⁻¹ | 1.88 (1.35–2.6) | 4.95 × 10⁻⁴ | 1.06 (1.01–1.12) | 1.78 × 10⁻² | 2.46 (1.88–3.21) | 4.68 × 10⁻¹¹ |
CHR chromosome, Position position hg19, Alleles non-effect allele/effect allele, MAF minor allele frequency, OR odds ratio.
ᵃ Odds ratio calculated using the recessive allele frequency-based test (RAFT).
ᵇ Obtained through a mega-analysis with UK Biobank using the expected method from SNPTEST.
ᶜ Obtained using the METAL SAMPLESIZE method to combine the p-values, taking into account the sample size and direction of effect.
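The sample-size-based combination of p-values mentioned in footnote c is conventionally a weighted Z-score (Stouffer-type) approach: each two-sided p-value is converted to a Z-score signed by the direction of effect, weighted by the square root of the study sample size, and summed. The sketch below illustrates that general scheme under these assumptions; `signed_z` and `sample_size_meta` are hypothetical names, the inputs are invented, and the code is not expected to reproduce the table values, which may rely on effective sample sizes or other settings not described here.

```python
from math import erfc, sqrt
from statistics import NormalDist


def signed_z(p_value, direction):
    """Convert a two-sided p-value and an effect direction (+1/-1) to a Z-score."""
    # inv_cdf(p/2) is the lower-tail quantile, so negate it to get |Z|.
    z = -NormalDist().inv_cdf(p_value / 2.0)
    return z if direction > 0 else -z


def sample_size_meta(studies):
    """Combine studies given as (p_value, direction, sample_size) tuples."""
    weighted_sum = 0.0
    total_n = 0.0
    for p_value, direction, n in studies:
        weighted_sum += sqrt(n) * signed_z(p_value, direction)
        total_n += n
    z = weighted_sum / sqrt(total_n)
    p = erfc(abs(z) / sqrt(2.0))  # two-sided p-value of the combined Z
    return z, p


# Invented example: two equally sized studies, same direction, p = 0.05 each.
z_same, p_same = sample_size_meta([(0.05, +1, 1000), (0.05, +1, 1000)])
# Same studies with opposite directions: the evidence cancels out.
z_opp, p_opp = sample_size_meta([(0.05, +1, 1000), (0.05, -1, 1000)])
print(round(z_same, 3), p_same)
print(round(z_opp, 3), p_opp)
```

Because the direction of effect enters through the sign of Z, two nominally significant studies reinforce each other only when their effects agree, which is the behaviour the footnote describes.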
Fig. 3 Results from the analysis of additive and non-additive inheritance models.
a The Venn diagram shows the number of loci that were identified when analyzing multiple inheritance models. As seen in the Venn diagram, the strongest association for 37 of the 94 associated loci was non-additive. Moreover, the analysis of non-additive models was crucial for the identification of 13 novel (in bold) associated loci. b Power calculation for the rs201654520 indel in CACNB4 associated with cardiovascular disease. The results show that the additive-based test would require a population sample size of 370,646 individuals to find this recessive association, while the population sample size needed for the recessive model is 21,021. c Power calculation for the rs77704739 variant near the PELO gene associated with type 2 diabetes. The results show that the additive-based test would require a population sample size of 188,637 individuals to find this recessive association, while the population sample size needed for the recessive model is 67,611. d Power calculation for the rs557998486 indel near the THUMPD2 gene associated with age-related macular degeneration. The results show that the additive-based test would require a population sample size of 6,493,419 individuals to find this recessive association, while the population sample size needed for the recessive model is 475,952.
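The sample sizes quoted in panels b to d can be turned into fold differences with simple arithmetic. The snippet below only divides the numbers stated in the caption; the dictionary layout and the `fold_increase` helper are illustrative choices, and no power calculation is performed.

```python
# (additive-test sample size, recessive-test sample size) from the caption.
REQUIRED_N = {
    "rs201654520": (370_646, 21_021),     # CACNB4, cardiovascular disease
    "rs77704739": (188_637, 67_611),      # near PELO, type 2 diabetes
    "rs557998486": (6_493_419, 475_952),  # near THUMPD2, macular degeneration
}


def fold_increase(additive_n, recessive_n):
    """How many times larger the additive-test sample must be."""
    return additive_n / recessive_n


FOLDS = {rsid: fold_increase(*sizes) for rsid, sizes in REQUIRED_N.items()}
for rsid, fold in FOLDS.items():
    print(f"{rsid}: additive test needs about {fold:.1f}x the sample size")
```

The ratios come out at roughly 17.6, 2.8, and 13.6, respectively.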