Ian B Stanaway, Taryn O Hall, Elisabeth A Rosenthal, Melody Palmer, Vivek Naranbhai, Rachel Knevel, Bahram Namjou-Khales, Robert J Carroll, Krzysztof Kiryluk, Adam S Gordon, Jodell Linder, Kayla Marie Howell, Brandy M Mapes, Frederick T J Lin, Yoonjung Yoonie Joo, M Geoffrey Hayes, Ali G Gharavi, Sarah A Pendergrass, Marylyn D Ritchie, Mariza de Andrade, Damien C Croteau-Chonka, Soumya Raychaudhuri, Scott T Weiss, Matt Lebo, Sami S Amr, David Carrell, Eric B Larson, Christopher G Chute, Laura Jarmila Rasmussen-Torvik, Megan J Roy-Puckelwartz, Patrick Sleiman, Hakon Hakonarson, Rongling Li, Elizabeth W Karlson, Josh F Peterson, Iftikhar J Kullo, Rex Chisholm, Joshua Charles Denny, Gail P Jarvik, David R Crosslin.
Abstract
The Electronic Medical Records and Genomics (eMERGE) network is a consortium of medical centers whose electronic medical records are linked to existing biorepository samples for genomic discovery and genomic medicine research. The network sought to unify the genetic results from 78 Illumina and Affymetrix genotype array batches from 12 contributing medical centers for joint association analysis of 83,717 human participants. In this report, we describe the imputation of the eMERGE genotype data and the methods used to create a unified, imputed, merged set of genome-wide variant genotypes. We imputed the data using the Michigan Imputation Server, which imputes missing single-nucleotide variant genotypes with the minimac3 algorithm and the Haplotype Reference Consortium reference panel. We describe the quality control and filtering steps used to generate this data set and suggest generalizable quality thresholds for imputation and phenotype association studies. To test the merged imputed genotype set, we replicated a previously reported chromosome 6 HLA-B association with herpes zoster (shingles) and discovered a novel zoster-associated locus in an epigenetic binding site near the terminus of chromosome 3 (3q29).
Keywords: GWAS; electronic medical records; genotypes; herpes zoster; variants
Year: 2018 PMID: 30298529 PMCID: PMC6375696 DOI: 10.1002/gepi.22167
Source DB: PubMed Journal: Genet Epidemiol ISSN: 0741-0395 Impact factor: 2.135
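The Rsq values in the tables below are minimac3's per-variant imputation quality estimates. As a minimal sketch, assuming the common description of Rsq as the ratio of the observed variance of imputed dosages to the variance expected under Hardy–Weinberg equilibrium, the metric can be approximated per variant as follows. This is not minimac3's code, `approx_rsq` is a hypothetical name, and minimac3's exact computation may differ in detail (for example, by working at the haplotype rather than the genotype level).

```python
from statistics import fmean, pvariance


def approx_rsq(dosages):
    """Approximate imputation R-square for one variant (hypothetical helper).

    dosages: expected alternate-allele counts (0.0 to 2.0), one per sample.
    Returns the observed dosage variance divided by the variance expected
    under Hardy-Weinberg equilibrium, 2p(1 - p), where p is the
    alternate-allele frequency estimated from the dosages themselves.
    """
    p = fmean(dosages) / 2.0
    expected_var = 2.0 * p * (1.0 - p)
    if expected_var == 0.0:
        # Monomorphic in this sample: R-square is undefined; report 0.
        return 0.0
    return pvariance(dosages) / expected_var
```

For example, dosages that match hard genotypes in Hardy–Weinberg proportions, such as `[0.0, 1.0, 2.0, 1.0]`, give 1.0, while the same dosages shrunk halfway toward their mean, `[0.5, 1.0, 1.5, 1.0]`, give 0.25: uncertain imputation pulls dosages toward the mean and lowers the ratio.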
Number of unique participant eMERGE IDs and reported demographics
| Medical center | Participants | Array batches | Male | Female | African/Black | American Indian | Asian | White | Pacific Islander | Hispanic/Latino | Unknown |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Boston Children’s | 1,019 | 1 | 596 | 423 | 66 | 2 | 21 | 676 | 0 | 125 | 129 |
| CCHMC | 5,717 | 12 | 3,262 | 2,455 | 601 | 5 | 67 | 4,673 | 5 | 143 | 223 |
| CHOP | 10,465 | 21 | 5,630 | 4,835 | 4,666 | 7 | 161 | 4,890 | 3 | 321 | 417 |
| Columbia | 2,065 | 2 | 1,058 | 1,007 | 179 | 6 | 77 | 619 | 2 | 448 | 734 |
| Geisinger | 3,111 | 1 | 1,638 | 1,473 | 9 | 2 | 0 | 3,085 | 0 | 13 | 2 |
| Harvard | 10,095 | 3 | 4,626 | 5,469 | 509 | 0 | 172 | 8,579 | 0 | 474 | 361 |
| Kaiser/GHC/UW | 3,316 | 3 | 1,428 | 1,888 | 109 | 12 | 89 | 2,922 | 6 | 69 | 109 |
| Marshfield Clinic | 4,756 | 5 | 1,878 | 2,878 | 2 | 3 | 12 | 4,690 | 0 | 14 | 35 |
| Mayo Clinic | 10,256 | 16 | 5,193 | 5,063 | 23 | 18 | 21 | 8,810 | 0 | 1,043 | 341 |
| Mt. Sinai | 6,255 | 4 | 2,555 | 3,700 | 4,046 | 33 | 3 | 679 | 0 | 1,297 | 197 |
| Northwestern | 4,848 | 2 | 817 | 4,031 | 598 | 0 | 0 | 4,207 | 0 | 36 | 7 |
| Vanderbilt | 21,814 | 10 | 9,868 | 11,946 | 3,854 | 16 | 102 | 17,313 | 0 | 211 | 318 |
| Total | 83,717 | | 38,549 | 45,168 | 14,662 | 104 | 725 | 61,143 | 16 | 4,194 | 2,873 |
Note. eMERGE: Electronic Medical Records and Genomics.
Figure 1. PCA and scree plots using MAF of 5%, LD pruning at R‐square 0.7, and missingness of 10%, shown for joint ancestry and stratified by African, Asian, and European ancestry, with ancestry groups defined by the intersection of k‐means clusters and observed/self‐reported race. LD: linkage disequilibrium; MAF: minor allele frequency; PCA: principal component analysis
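Figure 1's PCA uses variants filtered at MAF 5%, missingness 10%, and LD pruning at R‐square 0.7. A minimal sketch of that kind of variant selection follows, assuming genotypes coded as 0/1/2 alternate-allele counts with `None` for missing calls, that a variant is kept when MAF ≥ 5% and missingness ≤ 10%, and a simplified greedy pruning rule: a variant is kept only if its r² with each of the last `window` kept variants is below the threshold. This is not PLINK's `--indep-pairwise` procedure, and the `window` default is a hypothetical value.

```python
def minor_allele_freq(genos):
    """MAF from 0/1/2 genotype calls, ignoring missing (None) calls."""
    called = [g for g in genos if g is not None]
    p = sum(called) / (2 * len(called))
    return min(p, 1.0 - p)


def missing_rate(genos):
    return sum(g is None for g in genos) / len(genos)


def r_squared(a, b):
    """Squared Pearson correlation of genotype counts over samples called in both."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        return 0.0
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant variant has no defined correlation
    return sxy * sxy / (sxx * syy)


def select_pca_variants(variants, min_maf=0.05, max_missing=0.10,
                        max_r2=0.7, window=50):
    """Return indices of variants kept for PCA.

    variants: genotype lists ordered by genomic position.
    window: hypothetical number of most recently kept variants to compare against.
    """
    kept = []
    for i, genos in enumerate(variants):
        # QC first; the missingness check also keeps MAF from seeing all-missing variants.
        if missing_rate(genos) > max_missing or minor_allele_freq(genos) < min_maf:
            continue
        # Greedy LD pruning against recently kept variants.
        if all(r_squared(genos, variants[j]) < max_r2 for j in kept[-window:]):
            kept.append(i)
    return kept
```

Keeping the first variant of each correlated run is the simplest greedy choice; tools that prune by sliding windows or prefer higher-MAF variants can keep a different, equally valid subset.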
Figure 2. Z0 versus Z1 identity‐by‐descent plot of the eMERGE 3 imputed data (n = 3,504,226,187 pairwise comparisons). eMERGE: Electronic Medical Records and Genomics
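In PLINK-style identity-by-descent estimation, Z0, Z1, and Z2 are the estimated probabilities that a pair of samples shares zero, one, or two alleles IBD, and PI_HAT = Z2 + Z1/2. The sketch below computes PI_HAT and labels a pair by the nearest textbook expectation in the (Z0, Z1) plane of Figure 2. The nearest-point rule and the names `EXPECTED_Z` and `nearest_relationship` are hypothetical illustrations, not the relatedness criterion used for eMERGE.

```python
import math

# Textbook expected (Z0, Z1) for common relationships; Z2 = 1 - Z0 - Z1.
EXPECTED_Z = {
    "duplicate or MZ twin": (0.0, 0.0),
    "parent-offspring": (0.0, 1.0),
    "full siblings": (0.25, 0.5),
    "second degree": (0.5, 0.5),
    "third degree": (0.75, 0.25),
    "unrelated": (1.0, 0.0),
}


def pi_hat(z0, z1):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    z2 = 1.0 - z0 - z1
    return z2 + 0.5 * z1


def nearest_relationship(z0, z1):
    """Hypothetical rule: label a pair by the closest expected (Z0, Z1) point."""
    return min(EXPECTED_Z, key=lambda rel: math.dist((z0, z1), EXPECTED_Z[rel]))
```

Parent-offspring pairs and full siblings both have an expected PI_HAT of 0.5, which is why a Z0 versus Z1 plot separates relationship types that a PI_HAT threshold alone cannot.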
Figure 3. Genotype array batch mean R‐square imputation quality plotted against the regression variables (a) sample size and (b) variant count. (c) Histogram of each variant's mean R‐square imputation quality across imputation batches. (d) Boxplots of variant mean R‐square imputation quality by frequency bin.
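Figure 3 summarizes imputation quality as each variant's mean R‐square across imputation batches, grouped by frequency bin. A sketch of that aggregation follows, assuming per-batch Rsq values keyed by variant ID. The bin edges in `MAF_BINS` are hypothetical because the figure's actual bins are not stated here, and the function names are illustrative.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical MAF bin edges (upper bounds exclusive except for MAF = 0.5).
MAF_BINS = [(0.0, 0.01), (0.01, 0.05), (0.05, 0.5)]


def mean_rsq_by_variant(batch_rsq):
    """batch_rsq: {batch_id: {variant_id: Rsq}} -> {variant_id: mean Rsq}.

    Each variant is averaged over the batches that report an Rsq for it.
    """
    per_variant = defaultdict(list)
    for rsq_by_variant in batch_rsq.values():
        for variant, rsq in rsq_by_variant.items():
            per_variant[variant].append(rsq)
    return {variant: fmean(values) for variant, values in per_variant.items()}


def maf_bin(maf):
    for low, high in MAF_BINS:
        if maf < high:
            return (low, high)
    return MAF_BINS[-1]  # MAF == 0.5 falls in the top bin


def group_by_maf_bin(mean_rsq, maf_by_variant):
    """Collect mean Rsq values per MAF bin, e.g., as input to boxplots."""
    groups = defaultdict(list)
    for variant, rsq in mean_rsq.items():
        groups[maf_bin(maf_by_variant[variant])].append(rsq)
    return dict(groups)
```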
Figure 4. Inbreeding coefficient F over chromosomes 1–22, a measure of homozygosity, plotted against batch mean R‐square imputation quality (top panel), with k‐means principal component analysis ancestries shown in the top and bottom panels.
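The inbreeding coefficient F in Figure 4 measures excess homozygosity. A minimal method-of-moments sketch, of the kind PLINK's `--het` reports, is F = (O - E)/(N - E), where O is the observed number of homozygous genotypes, E the number expected under Hardy–Weinberg equilibrium, and N the number of non-missing genotypes. Here allele frequencies are passed in explicitly, and `inbreeding_f` is a hypothetical name.

```python
def inbreeding_f(genotypes, alt_freqs):
    """Method-of-moments inbreeding coefficient F = (O - E) / (N - E).

    genotypes: 0/1/2 alternate-allele counts for one sample (None = missing).
    alt_freqs: alternate-allele frequency of each variant, same order.
    O: observed homozygous genotypes; E: homozygotes expected under HWE;
    N: non-missing genotypes.
    """
    observed_hom = 0
    expected_hom = 0.0
    n_called = 0
    for g, p in zip(genotypes, alt_freqs):
        if g is None:
            continue
        n_called += 1
        observed_hom += int(g != 1)
        expected_hom += 1.0 - 2.0 * p * (1.0 - p)
    if n_called == expected_hom:
        raise ValueError("F is undefined when every called variant is monomorphic")
    return (observed_hom - expected_hom) / (n_called - expected_hom)
```

F near 0 is expected for outbred samples. Strongly negative values can flag sample contamination, and strongly positive values can reflect inbreeding or an allele-frequency mismatch, for example when frequencies estimated in one ancestry group are applied to samples from another.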
Counts of herpes zoster cases and controls included in the final zoster regression model
| Site | Cases | Controls | Male | Female | African | Asian | European |
|---|---|---|---|---|---|---|---|
| KPUW | 641 | 1,625 | 984 | 1,282 | 94 | 78 | 2,094 |
| MRSH | 758 | 1,244 | 851 | 1,151 | 2 | 13 | 1,987 |
| COLU | 51 | 1,511 | 817 | 745 | 368 | 178 | 1,016 |
| GEIS | 285 | 1,983 | 1,232 | 1,036 | 6 | 1 | 2,261 |
| NWUN | 178 | 3,415 | 588 | 3,005 | 399 | 8 | 3,186 |
| HARV | 461 | 7,063 | 3,539 | 3,985 | 405 | 205 | 6,914 |
| MTSI | 286 | 3,648 | 1,507 | 2,427 | 2,846 | 73 | 1,015 |
| MAYO | 348 | 6,128 | 3,426 | 3,050 | 19 | 36 | 6,421 |
| VAND | 755 | 15,970 | 7,679 | 9,046 | 2,692 | 164 | 13,869 |
| Total (N = 46,350) | 3,763 | 42,587 | 20,623 | 25,727 | 6,831 | 756 | 38,763 |
Note. Ancestry is based on three k‐means clusters of PC 1 and PC 2.
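The ancestry columns above come from three k‐means clusters of PC 1 and PC 2. A minimal sketch of Lloyd's k‐means algorithm on (PC1, PC2) pairs follows. Seeding with the first k points is a simplification chosen here for determinism (practical code usually uses k‐means++ or several random restarts), and this sketch does not reproduce the authors' exact clustering.

```python
import math
from statistics import fmean


def kmeans(points, k=3, max_iter=100):
    """Lloyd's k-means on 2-D points such as (PC1, PC2) per sample.

    Seeds with the first k points for determinism (a simplification).
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (fmean(x for x, _ in members),
                                fmean(y for _, y in members))
    return labels, centroids
```

Cluster indices are arbitrary, so mapping clusters to ancestry names requires outside information; the Figure 1 caption describes using the intersection of k‐means clusters with observed/self‐reported race for that purpose.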
28 common variants that reach genome‐wide significance
| rsID | Rsq | G/I | M/MA | MAF | chr | position | p | OR | gene |
|---|---|---|---|---|---|---|---|---|---|
| rs9810195 | 0.92 | 8/70 | G/A | 0.13 | 3 | 192746326 | 2.805e‐09 | 1.26 | TFBS |
| rs9848218 | 0.92 | 0/78 | T/C | 0.13 | 3 | 192746451 | 3.438e‐09 | 1.26 | TFBS |
| rs6784731 | 0.92 | 0/78 | C/G | 0.13 | 3 | 192746746 | 3.538e‐09 | 1.26 | TFBS |
| rs6784850 | 0.92 | 0/78 | A/G | 0.13 | 3 | 192746850 | 3.604e‐09 | 1.26 | TFBS |
| rs1039219 | 0.91 | 0/78 | G/A | 0.12 | 3 | 192747249 | 2.966e‐08 | 1.24 | TFBS |
| rs1039220 | 0.94 | 0/78 | T/C | 0.14 | 3 | 192747381 | 1.096e‐08 | 1.23 | TFBS |
| rs11916599 | 0.92 | 0/78 | G/A | 0.13 | 3 | 192747851 | 6.238e‐09 | 1.25 | TFBS |
| rs11924420 | 0.94 | 4/74 | T/C | 0.15 | 3 | 192748011 | 2.09e‐08 | 1.23 | TFBS |
| rs4371461 | 0.94 | 0/78 | T/C | 0.15 | 3 | 192748673 | 2.911e‐08 | 1.23 | TFBS |
| rs7428308 | 0.94 | 0/78 | G/A | 0.14 | 3 | 192749047 | 3.481e‐08 | 1.22 | TFBS |
| rs73071839 | 0.94 | 0/78 | G/A | 0.15 | 3 | 192749140 | 2.387e‐08 | 1.23 | TFBS |
| rs112062423 | 0.94 | 0/78 | A/C | 0.15 | 3 | 192749169 | 2.367e‐08 | 1.23 | TFBS |
| rs2844584 | 0.86 | 0/78 | G/A | 0.09 | 6 | 31321524 | 2.043e‐08 | 0.772 | HLA‐B intron |
| rs2769 | 0.88 | 3/75 | A/G | 0.12 | 6 | 31321882 | 3.027e‐10 | 0.772 | HLA‐B 3 prime UTR |
| rs1093 | 0.88 | 3/75 | G/A | 0.17 | 6 | 31321906 | 1.541e‐09 | 0.884 | HLA‐B 3 prime UTR |
| rs17199328 | 0.89 | 0/78 | G/A | 0.12 | 6 | 31322395 | 2.284e‐10 | 0.771 | HLA‐B intron, missense |
| rs2854001 | 0.88 | 0/78 | A/G | 0.17 | 6 | 31323012 | 8.484e‐09 | 0.819 | HLA‐B intron |
| rs1050723 | 0.88 | 0/78 | A/G | 0.12 | 6 | 31323321 | 2.528e‐10 | 0.77 | HLA‐B missense |
| rs9266266 | 0.89 | 12/66 | T/C | 0.15 | 6 | 31326011 | 6.869e‐09 | 0.807 | HLA‐B upstream |
| rs9266269 | 0.89 | 0/78 | A/G | 0.15 | 6 | 31326055 | 4.808e‐09 | 0.805 | HLA‐B upstream |
| rs9266270 | 0.89 | 0/78 | A/G | 0.15 | 6 | 31326072 | 6.724e‐09 | 0.806 | HLA‐B upstream |
| rs116583816 | 0.89 | 0/78 | C/G | 0.13 | 6 | 31326123 | 2.36e‐09 | 0.785 | HLA‐B upstream, TFBS |
| rs2523591 | 0.90 | 34/44 | A/G | 0.42 | 6 | 31326960 | 9.609e‐09 | 0.863 | HLA‐B upstream |
| rs2523586 | 0.90 | 27/51 | T/G | 0.23 | 6 | 31327435 | 1.461e‐08 | 0.839 | HLA‐B upstream |
| rs2596477 | 0.89 | 20/58 | A/G | 0.13 | 6 | 31327723 | 3.531e‐10 | 0.774 | HLA‐B upstream |
| rs2523577 | 0.89 | 0/78 | G/A | 0.13 | 6 | 31328739 | 4.399e‐10 | 0.776 | HLA‐B upstream |
| rs9266853 | 0.86 | 0/78 | C/G | 0.08 | 6 | 31387725 | 3.718e‐08 | 0.754 | HCP5, HLA‐B upstream |
| rs3893526 | 0.88 | 0/78 | A/G | 0.08 | 6 | 31413742 | 2.102e‐08 | 0.751 | HCP5, LINC01149 |
Note. G/I: genotyped/imputed batch count; M/MA: major/minor allele; MAF: minor allele frequency; OR: odds‐ratio; Rsq: Imputation quality R‐square mean; TFBS: transcription factor binding site.
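The 28 genome‐wide significant variants above fall into two regions: chromosome 3 near 192.7 Mb and chromosome 6 around HLA‐B. One simple way to reduce such a list to a lead variant per region is greedy distance‐based clumping: take the most significant remaining variant, drop everything on the same chromosome within a window of it, and repeat. The 500 kb window below is an assumption, and unlike PLINK's `--clump` this sketch ignores LD; `HITS` holds five rows copied from the table.

```python
# (rsID, chromosome, position, p) for five rows of the table above.
HITS = [
    ("rs9810195", 3, 192746326, 2.805e-09),
    ("rs9848218", 3, 192746451, 3.438e-09),
    ("rs2769", 6, 31321882, 3.027e-10),
    ("rs17199328", 6, 31322395, 2.284e-10),
    ("rs3893526", 6, 31413742, 2.102e-08),
]


def lead_variants(hits, window_bp=500_000):
    """Greedy distance-based clumping (window_bp is a hypothetical choice).

    Repeatedly takes the most significant remaining variant as a lead and
    drops other variants on the same chromosome within window_bp of it.
    """
    remaining = sorted(hits, key=lambda hit: hit[3])  # ascending p-value
    leads = []
    while remaining:
        lead = remaining.pop(0)
        leads.append(lead)
        remaining = [hit for hit in remaining
                     if hit[1] != lead[1] or abs(hit[2] - lead[2]) > window_bp]
    return leads
```

On these five rows the sketch returns rs17199328 for the chromosome 6 region and rs9810195 for the chromosome 3 region.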
Figure 5. Joint‐ancestry GWAS Manhattan plot (left panel) and quantile–quantile plot (right panel) of herpes zoster (shingles) with 3,763 cases and 42,587 controls. Variants were included at R‐square ≥ 0.3 and minor allele frequency ≥ 0.05. Models were adjusted for PCs 1, 2, and 3, gender, and the nine contributing medical centers. Genomic control is close to one, with λ ≈ 1.02.
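Figure 5 applies the inclusion rule R‐square ≥ 0.3 and MAF ≥ 0.05 and reports a genomic control λ of about 1.02. λ is conventionally the median association chi‐square statistic divided by the median of a 1‐df chi‐square distribution (about 0.455). A sketch using only the Python standard library follows, assuming two‐sided p‐values from 1‐df tests; the function names are hypothetical.

```python
from statistics import NormalDist, median


def passes_inclusion(rsq, maf, min_rsq=0.3, min_maf=0.05):
    """Inclusion rule from the Figure 5 caption: R-square >= 0.3 and MAF >= 0.05."""
    return rsq >= min_rsq and maf >= min_maf


def genomic_control_lambda(p_values):
    """lambda_GC = median observed 1-df chi-square / median under the null.

    Assumes two-sided p-values in (0, 1] from 1-df tests.
    """
    std_normal = NormalDist()
    # Two-sided p -> |z| -> chi-square with 1 df (z squared).
    chi2 = [std_normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    null_median = std_normal.inv_cdf(0.75) ** 2  # about 0.4549
    return median(chi2) / null_median
```

Under the null, evenly spaced p‐values give λ = 1; values well above 1 suggest inflation from population structure or other confounding, which is why a λ near 1.02 is read as well controlled.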
Figure 6. LocusZoom plot of the chromosome 3q29 zoster association in the joint‐ancestry regressions. SNP: single‐nucleotide polymorphism; TFBS: transcription factor binding site
Figure 7. LocusZoom plot of the chromosome 6 HLA‐B zoster association in the joint‐ancestry regressions. SNP: single‐nucleotide polymorphism