UK Biobank is among the world's largest repositories for phenotypic and genotypic information in individuals of European ancestry. We performed a genome-wide association study in UK Biobank testing ∼9 million DNA sequence variants for association with coronary artery disease (4,831 cases and 115,455 controls) and carried out meta-analysis with previously published results. We identified 15 new loci, bringing the total number of loci associated with coronary artery disease to 95 at the time of analysis. Phenome-wide association scanning showed that CCDC92 likely affects coronary artery disease through insulin resistance pathways, whereas experimental analysis suggests that ARHGEF26 influences the transendothelial migration of leukocytes.
Coronary artery disease (CAD) is a leading cause of disability and mortality worldwide[2]. Genome-wide association studies (GWAS) have provided new clues to the pathophysiology of this common, complex disease. Largely using a case-control design with cases ascertained based on CAD status, published studies have highlighted at least 80 loci reaching genome-wide significance[3-9].
Population-based biobanks such as UK Biobank offer new potential for genetic analysis of common complex diseases. New opportunities include scale, a diverse range of traits, and the ability to explore a fuller spectrum of phenotypic consequences for identified DNA variants. Leveraging the UK Biobank resource, we sought to: 1) perform a genetic discovery analysis; 2) explore the phenotypic consequences and tissue-specific effects associated with CAD risk alleles; and 3) characterize the functional consequences of a risk mutation in a promising pathway.
We designed a three-stage GWAS (Fig. 1). In Stage 1, we tested the association of DNA sequence variants with CAD in UK Biobank. In Stage 2, we took forward 2,190 variants that reached nominal significance in Stage 1 (P < 0.05) for meta-analysis with results from an exome-focused-array analysis in 42,355 cases and 78,240 controls[6]. In Stage 3, we took forward 387,174 variants that reached nominal significance in Stage 1 and were not tested in Stage 2 for meta-analysis with results from a genome-wide imputation study in 60,801 cases and 123,504 controls[5]. For each variant, we combined statistical evidence across Stages 1 and 2 (or Stages 1 and 3) and set a statistical threshold of P < 5×10−8 for genome-wide significance.
Figure 1
Study Design
Stage 1 consisted of a genome-wide association study for the coronary artery disease phenotype performed in UK Biobank. Variants with P < 0.05 in Stage 1 moved forward to meta-analysis with CARDIoGRAM Exome (Stage 2) or CARDIoGRAMplusC4D (Stage 3) summary statistics.
Characteristics of UK Biobank participants stratified by presence of CAD are presented in Supplementary Table 1. CAD cases were more likely to be older, male, on lipid-lowering therapy, have a history of smoking, and be affected with type 2 diabetes. After quality control, 9,061,845 DNA sequence variants were tested for association in 4,831 CAD patients and 115,455 controls in UK Biobank (Stage 1). A total of 269 variants at five distinct loci met the genome-wide significance threshold (P < 5×10−8) (Supplementary Fig. 1 and 2). All five have been previously reported[5,10-13]. In UK Biobank, the 9p21/CDKN2B-AS1 variant rs4977575 (NC_000009.12:g.22124745C>G) was the top association result (49% frequency for G allele; OR = 1.24; 95% CI: 1.19–1.29; P = 5.40×10−23); the other four loci were 1p13/SORT1, PHACTR1, LPA, and KCNE2 (Supplementary Table 2). For a set of previously reported CAD loci[5], we compared the effect estimates from the published literature with those from the current analysis in UK Biobank and found strong positive correlation in effect sizes (β = 0.92, 95% CI: 0.77–1.06; P = 1.8×10−17, Supplementary Fig. 3); these results validate our CAD phenotype definition in UK Biobank. A total of 513,403 variants exceeded nominal significance (P < 0.05) and were taken forward to Stages 2 or 3.
After meta-analysis, 15 new loci exceeded genome-wide significance (Tables 1–2), bringing the total number of established CAD loci to 95. Of note, while this manuscript was under review, one of the 15 loci (HNF1A) was reported elsewhere[9]. Effect allele frequencies of the 15 newly identified loci ranged from 13% to 86%, with combined odds ratios ranging from 1.05 to 1.08. Descriptions of relevant loci appear in Supplementary Table 3, and regional association plots for novel CAD loci are shown in Supplementary Figures 4–6.
Table 1
New loci from analysis of UK Biobank and CARDIoGRAM exome study
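| Lead Variant | Chr | Gene | Description | EA | EAF | UK Biobank OR | UK Biobank P | Stage 2 Exome Study OR | Stage 2 Exome Study P | Combined OR | Combined 95% CI | Combined P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs2972146 | 2 | (LOC646736) | intergenic | T | 0.65 | 1.07 | 0.0011 | 1.05 | 2.01×10−7 | 1.06 | 1.04–1.07 | 1.46×10−9 |
| rs12493885 (p.Val29Leu) | 3 | ARHGEF26 | missense | C | 0.85 | 1.07 | 0.039 | 1.09 | 8.28×10−9 | 1.08 | 1.06–1.11 | 1.02×10−9 |
| rs1800449 | 5 | LOX | missense | T | 0.17 | 1.09 | 0.0039 | 1.07 | 1.72×10−7 | 1.07 | 1.05–1.09 | 2.99×10−9 |
| rs11057401 (p.Ser70Cys) | 12 | CCDC92 | missense | T | 0.69 | 1.08 | 0.001 | 1.05 | 4.32×10−7 | 1.06 | 1.04–1.08 | 3.88×10−9 |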
Genes for variants that are outside the transcript boundary of the protein-coding gene are shown in parentheses [eg, (LOC646736)].
Chr = Chromosome, CI = Confidence Interval, EA = Effect Allele, EAF = Effect Allele Frequency, OR = Odds Ratio
Table 2
New Loci from analysis of UK Biobank and CARDIoGRAMplusC4D 1000G imputation study
| Lead Variant | Chr | Gene | Description | EA | EAF | UK Biobank OR | UK Biobank P | Stage 3 1000G Imputed Study OR | Stage 3 1000G Imputed Study P | Combined OR | Combined 95% CI | Combined P |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| rs17517928 | 2 | FN1 | intronic | C | 0.75 | 1.08 | 0.0026 | 1.06 | 5.14×10−7 | 1.06 | 1.04–1.08 | 1.06×10−8 |
| rs17843797 | 3 | UMPS-ITGB5 | intronic | G | 0.13 | 1.11 | 0.00019 | 1.07 | 2.43×10−6 | 1.07 | 1.05–1.10 | 1.52×10−8 |
| rs748431 | 3 | FGD5 | intronic | G | 0.36 | 1.04 | 0.042 | 1.05 | 2.14×10−7 | 1.05 | 1.03–1.07 | 2.63×10−8 |
| rs7623687 | 3 | RHOA | intronic | A | 0.86 | 1.09 | 0.0073 | 1.07 | 5.22×10−7 | 1.08 | 1.05–1.10 | 2.00×10−8 |
| rs10857147 | 4 | (FGF5) | regulatory region | T | 0.29 | 1.06 | 0.014 | 1.06 | 5.83×10−7 | 1.06 | 1.04–1.08 | 3.39×10−8 |
| rs7678555 | 4 | (MAD2L1) | intergenic | C | 0.29 | 1.06 | 0.027 | 1.06 | 3.26×10−7 | 1.06 | 1.04–1.08 | 2.91×10−8 |
| rs10841443 | 12 | RP11-664H17.1 | intronic | G | 0.67 | 1.06 | 0.0073 | 1.05 | 5.81×10−7 | 1.05 | 1.03–1.07 | 2.23×10−8 |
| rs2244608 | 12 | HNF1A | intronic | G | 0.32 | 1.07 | 0.003 | 1.05 | 1.02×10−6 | 1.05 | 1.03–1.07 | 2.41×10−8 |
| rs3851738 | 16 | CFDP1 | intronic | C | 0.6 | 1.07 | 0.00089 | 1.05 | 1.88×10−6 | 1.05 | 1.03–1.07 | 2.43×10−8 |
| rs7500448 | 16 | CDH13 | intronic | A | 0.75 | 1.1 | 0.00016 | 1.06 | 2.11×10−6 | 1.06 | 1.04–1.09 | 1.20×10−8 |
| rs8108632 | 19 | TGFB1 | intronic | T | 0.41 | 1.06 | 0.011 | 1.05 | 4.76×10−7 | 1.05 | 1.03–1.07 | 2.35×10−8 |
Genes for variants that are outside the transcript boundary of the protein-coding gene are shown in parentheses [eg, (FGF5)].
1000G = 1000 Genomes, Chr = Chromosome, CI = Confidence Interval, EA = Effect Allele, EAF = Effect Allele Frequency, OR = Odds Ratio
To move from these 15 DNA sequence variants to biologic insights, we took two approaches: phenome-wide association scanning and functional analysis. Understanding the full spectrum of phenotypic consequences of a given DNA sequence variant may shed light on the mechanism by which a variant/gene leads to disease. Termed a “phenome-wide association study” or “PheWAS”, this approach tests the association of a mapped disease variant with a broad range of human phenotypes[14]. In collaboration with Genomics plc, we conducted a PheWAS combining UK Biobank data, mRNA transcript phenotypes in the Genotype-Tissue Expression Project (GTEx) dataset[15], and an integrated set of GWAS results from a variety of publicly available sources[16-24].
We found that several of the newly identified DNA sequence variants correlated with a range of human traits (Fig. 2, Supplementary Tables 4–5). For example, the intronic variant rs10841443 within RP11-664H17.1 is in close proximity to PDE3A, a phosphodiesterase previously implicated in an autosomal dominant form of hypertension[25]. PheWAS showed an association for this variant with diastolic blood pressure[26], suggesting that this locus may be acting through hypertension. The variant rs2244608 within HNF1A has been previously associated with LDL cholesterol, a causal path to atherosclerosis[16]. The variant rs7500448 within CDH13 (encoding Cadherin 13 or T-Cadherin), a vascular adiponectin receptor implicated in hypertensive and insulin resistance biology[27], associates with plasma adiponectin levels. Variant rs2972146 is downstream of IRS1 (encoding insulin receptor substrate-1[24]) and is a cis-eQTL for IRS1 expression in adipose tissue. rs2972146 associates with a range of phenotypes seen in the setting of insulin resistance including HDL cholesterol, triglycerides, adiponectin, fasting insulin, and type 2 diabetes.
Figure 2
Phenome-wide association results for 15 novel loci
For the 15 novel CAD risk variants identified in our study, Z-scores (aligned to the CAD risk allele) were obtained from the Genomics plc Platform and UK Biobank. A positive Z-score (red) indicates a positive association between the CAD risk allele and the disease/trait, while a negative Z-score (blue) indicates an inverse association. Boxes are outlined in green if the variant is significantly (P < 0.00013) associated with the given trait.
Abbreviations: Adj, Adjusted; BMI, Body Mass Index; BP, Blood Pressure; crea, Creatinine; cys, cystatin-c; COPD, chronic obstructive pulmonary disease; eGFR, estimated Glomerular Filtration Rate; HDL, High Density Lipoprotein; LDL, Low Density Lipoprotein;
Compelling additional insights from the PheWAS emerged at the CCDC92 locus. Across 25 distinct traits and disorders, we observed significant associations (P < 0.00013) for CCDC92 p.Ser70Cys (rs11057401) with body fat percentage, waist-to-hip circumference ratio, as well as plasma high-density lipoprotein, triglyceride, and adiponectin levels. The directions of these associations are hallmarks of insulin resistance and lipodystrophy[17,28], and the association with plasma adiponectin levels localizes these genetic effects to adipose tissue. Recent work has highlighted two candidate genes at this locus, CCDC92 and DNAH10[29], and further experimental work is necessary to define the causal gene at this locus.
However, a few of the CAD loci (FN1, LOX, ITGB5, and ARHGEF26) did not associate with any of the studied risk factor traits and thus appear to function through pathways beyond known CAD risk factors (Fig. 2, Supplementary Tables 4–5). A common variant within an intron of FN1[30] (encoding Fibronectin 1) and a missense variant in LOX[31] (encoding Lysyl Oxidase) suggest potential links to extracellular matrix biology. Of note, rare coding mutations in LOX were recently described to cause Mendelian forms of thoracic aortic aneurysm and dissection[32,33], highlighting a potential common link between atherosclerosis and aortic disease, possibly through altered extracellular matrix biology. A variant downstream of ITGB5[34] (encoding Integrin Subunit Beta 5) suggests pathways underlying cell adhesion and migration.
In aggregate, our analysis brings the total number of known CAD loci to 95[3-9], and in Figure 3, we organize these loci into plausible pathways. Of note, the causal variant, gene, cell type, and mechanism have been definitively identified at only a few of these loci and, as such, additional experimental research will be required, particularly at the >50% of loci without an apparent link to known risk factors.
Figure 3
Biological pathways underlying genetic loci associated with coronary artery disease
CAD GWAS loci identified to date are depicted along with the plausible relationship to the underlying biological pathway. The 15 new loci described in this paper are shown in bold. Loci names are based on the nearest genes; however, the causal gene(s) remains unclear for most associated loci and as such, the resultant annotation may prove incorrect in some cases. Adapted from Ref. [41].
At one of the new loci that did not relate to known risk factors, ARHGEF26 (encoding Rho Guanine Nucleotide Exchange Factor 26), we performed functional studies. Prior experimental work had connected this gene with murine atherosclerosis[35]. Earlier studies established a role for ARHGEF26 in facilitating the transendothelial migration of leukocytes, a key step in the initiation of atherosclerosis[36,37]. ARHGEF26 has been shown to activate RhoG GTPase by promoting the exchange of GDP for GTP and contributing to the formation of ICAM-1-induced endothelial docking structures that facilitate leukocyte transendothelial migration[36,37]. In addition, Arhgef26 −/− mice, when crossed with atherosclerosis-prone Apoe null mice, displayed less aortic atherosclerosis[35].
At ARHGEF26 p.Val29Leu (rs12493885), the 29Leu allele, present at a frequency of 85%, is associated with increased risk for CAD. We first examined the hypothesis that a haplotype block containing this variant may alter expression of ARHGEF26 in coronary artery. While this region demonstrates eQTL effects in a variety of tissues, there is no evidence of alteration of ARHGEF26 expression in coronary artery in either eQTL or allele-specific expression analyses (Supplementary Fig. 7). To further evaluate the possibility that a haplotype containing the 29Leu allele may affect gene expression, we performed a luciferase reporter assay. We cloned a 2.5 kb region immediately upstream of the ARHGEF26 start codon consisting of the promoter, 5′ untranslated region (5′ UTR), and regions with ENCODE annotations suggestive of potential cis-acting elements. We obtained the reference (in LD with Val29 G allele) and alternative (in LD with 29Leu C allele) haplotypes of this region from human rs12493885 heterozygotes. We coupled each haplotype with a luciferase reporter and measured luciferase activity (Supplementary Fig. 8). In HEK293, human aortic endothelial cells (HAEC), and human umbilical vein endothelial cells (HUVEC), there was no significant difference in luciferase activity between reference and alternative haplotypes. These data suggest that the ARHGEF26 29Leu allele may confer CAD risk via mechanisms other than affecting ARHGEF26 transcription or promoter activity in disease-relevant tissue.
Next, we examined the hypothesis that ARHGEF26 p.Val29Leu may influence disease risk through its protein-altering consequence. We knocked down endogenous ARHGEF26 through siRNA and observed decreased leukocyte transendothelial migration, leukocyte adhesion on endothelial cells, and vascular smooth muscle cell proliferation[38] (Fig. 4, Supplementary Fig. 9). Overexpression of exogenous, wild-type ARHGEF26 rescued these phenotypes. However, ARHGEF26 29Leu mutant overexpression led to rescued phenotypes that consistently exceeded wild-type. These data support the hypothesis that the ARHGEF26 29Leu allele associated with increased CAD risk may lead to a gain-of-function ARHGEF26 protein.
Figure 4
Functional assessment of ARHGEF26 p.Val29Leu in vitro
a) ARHGEF26-29Leu increases leukocyte transendothelial migration. HAEC were transfected with non-targeting siRNA and empty vector (control), siRNA against ARHGEF26 3′-UTR and empty vector, siRNA and ARHGEF26-WT, or siRNA and ARHGEF26-29Leu. Transfected HAEC were plated on transwell inserts and treated with 10 ng/mL TNF-α. Differentiated HL60 cells were loaded on the upper chambers of transwells and allowed to transmigrate across HAEC towards vehicle (blue) or 50 ng/mL SDF-1 (red). The migrated cells were quantified as percentage of input cells per well (n=5 or 6; mean±s.d.; F=11.89, DF=3 by two-way ANOVA within vehicle and SDF-1 subgroups with Fisher’s LSD test; variance among vehicle subgroups non-significant; NS, not significant; representative of 3 independent experiments).
b) ARHGEF26-29Leu increases leukocyte adhesion on endothelial cells. HAEC were transfected as in a), cultured on 96-well plates until confluent, and treated with 10 ng/mL TNF-α. Calcein-AM-labeled THP-1 cells were incubated with HAEC and washed to remove non-adherent cells. The adherent cells were lysed, quantified by Calcein-AM fluorescence and compared to siRNA+WT (n=25, 17, 20, and 17; mean±s.d.; F=14.53, DF=3 by one-way ANOVA; NS, not significant; * P<0.0001 compared to siRNA+WT; representative of 3 independent experiments).
c) ARHGEF26-29Leu increases vascular smooth muscle cell proliferation. HCASMC were transfected as in a) and made quiescent by serum starvation for 48 h, followed by 72-h proliferation in normal serum medium. Cell proliferation was quantified by a luminescent assay and compared to siRNA+WT (n=20; mean±s.d.; F=197.5, DF=3 by one-way ANOVA; NS, not significant; * P<0.0001 compared to siRNA+WT; representative of 3 independent experiments).
How could the ARHGEF26 29Leu mutation lead to a gain-of-function phenotype? We evaluated its functional impact in two ways, addressing ARHGEF26 quality and quantity, respectively. First, could the 29Leu mutation alter ARHGEF26 nucleotide exchange activity on RhoG? To answer this question, we developed a GTP-GDP nucleotide exchange assay using recombinant human full-length ARHGEF26 (wild-type or 29Leu) and RhoG proteins[39]. In a cell-free system, equal amounts of wild-type or 29Leu ARHGEF26 protein were incubated with RhoG pre-loaded with GDP. After 60 minutes, we observed no significant difference in nucleotide exchange activity between wild-type and 29Leu mutant ARHGEF26 (Supplementary Fig. 10).
Second, could the 29Leu allele affect cellular abundance of ARHGEF26 protein? We examined this possibility by treating cells expressing wild-type or 29Leu mutant ARHGEF26 with cycloheximide, a protein synthesis inhibitor, and compared ARHGEF26 degradation over time by Western blotting. Compared to wild-type ARHGEF26, the 29Leu mutant protein displayed a longer half-life (Supplementary Fig. 11). While further work is needed to understand the mechanism in vivo, in vitro results suggest that the gain-of-function phenotype observed may be secondary to the 29Leu mutant protein’s resistance to degradation.
Our study should be interpreted within the context of its limitations. First, we focused on participants of European ancestry within UK Biobank and therefore results may not be generalizable to other populations. Second, our CAD phenotype definitions are based largely on interview and electronic health records and this may result in misclassification of case status. However, such misclassification should reduce statistical power for discovery and bias results toward the null. Finally, although we observed no evidence of robust changes in ARHGEF26 expression associated with the 29Leu haplotype in disease-relevant tissue, it is possible that other regulatory mechanisms may potentiate the gain-of-function phenotypes we observed.
In summary, we performed a gene discovery study for CAD using a large population-based biobank, identified 15 new loci, and explored the phenotypic consequences of CAD risk variants through PheWAS and in vitro functional analysis. These findings permit several conclusions. First, CAD cases phenotyped via electronic health records and verbal interviews exhibit similar genetic architecture to those derived in epidemiologic cohorts and can prove useful in gene discovery efforts. Second, phenome-wide association studies with risk variants can provide initial clues on how DNA sequence variants may lead to disease. Lastly, considerable experimental evidence in cells and rodents has suggested that transendothelial migration of leukocytes is a key step in the formation of atherosclerosis[40]; here, we provide human genetic support for a role of this pathway in CAD.
Online Methods
Study Design and Samples
We performed a three-stage sequential analysis to identify novel genetic loci associated with CAD. In Stage 1, we first tested the association of DNA sequence variants with CAD in UK Biobank. Beginning in 2006, individuals aged 45 to 69 years old were recruited from across the United Kingdom for participation in the UK Biobank Study[1]. At enrollment, a trained healthcare provider ascertained participants’ medical histories through verbal interview. In addition, participants’ electronic health records (EHR), including inpatient International Classification of Diseases (ICD-10) diagnosis codes and Office of Population Censuses and Surveys (OPCS-4) procedure codes, were integrated into UK Biobank. Individuals were defined as having CAD based on at least one of the following criteria:
1) Myocardial infarction (MI), coronary artery bypass grafting, or coronary artery angioplasty documented in medical history at time of enrollment by a trained nurse
2) Hospitalization for ICD-10 code for acute myocardial infarction (I21.0, I21.1, I21.2, I21.4, I21.9)
3) Hospitalization for OPCS-4 coded procedure: coronary artery bypass grafting (K40.1–40.4, K41.1–41.4, K45.1–45.5)
4) Hospitalization for OPCS-4 coded procedure: coronary angioplasty with or without stenting (K49.1–49.2, K49.8–49.9, K50.2, K75.1–75.4, K75.8–75.9)
All other individuals were defined as controls. In total, genotypes were available for 120,286 participants of European ancestry.
In Stage 2, we took forward 2,190 variants that reached nominal significance in Stage 1 for meta-analysis with the Coronary ARtery DIsease Genome wide Replication and Meta-analysis (CARDIoGRAM) Exome Consortia exome array analysis, which incorporated 42,355 cases and 78,240 controls[6] (Supplementary Table 6). In Stage 3, we took forward 387,174 variants that reached nominal significance in Stage 1 (and were not available in Stage 2) for meta-analysis with the CARDIoGRAMplusC4D 1000 Genomes imputation study containing 60,801 cases and 123,504 controls[5].
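The four CAD case criteria listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, its arguments, and the idea that each participant arrives as a self-report flag plus two lists of code strings are hypothetical and do not describe the actual UK Biobank data layout; only the code lists themselves are taken from the text.

```python
# ICD-10 and OPCS-4 code lists copied from the case definition in the text.
ICD10_MI = {"I21.0", "I21.1", "I21.2", "I21.4", "I21.9"}


def _expand(prefix, start, end):
    """Expand e.g. ('K40', 1, 4) into {'K40.1', 'K40.2', 'K40.3', 'K40.4'}."""
    return {f"{prefix}.{i}" for i in range(start, end + 1)}


OPCS4_CABG = _expand("K40", 1, 4) | _expand("K41", 1, 4) | _expand("K45", 1, 5)
OPCS4_ANGIOPLASTY = (
    _expand("K49", 1, 2) | _expand("K49", 8, 9) | {"K50.2"}
    | _expand("K75", 1, 4) | _expand("K75", 8, 9)
)


def is_cad_case(self_reported_event, icd10_codes, opcs4_codes):
    """Return True if at least one of the four criteria is met.

    self_reported_event: True if MI, bypass grafting, or angioplasty was
        recorded at the enrollment interview (hypothetical input format).
    icd10_codes / opcs4_codes: iterables of hospitalization code strings
        (hypothetical input format).
    """
    if self_reported_event:
        return True
    if ICD10_MI & set(icd10_codes):
        return True
    return bool((OPCS4_CABG | OPCS4_ANGIOPLASTY) & set(opcs4_codes))
```

Anyone for whom `is_cad_case` returns False would be treated as a control, mirroring the statement that all other individuals were defined as controls.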
Informed consent was obtained for all participants, and UK Biobank received ethical approval from the Research Ethics Committee (reference number 11/NW/0382). Our study was approved by a local Institutional Review Board at Partners Healthcare (protocol 2013P001840).
Genotyping and Quality Control
UK Biobank samples were genotyped by Affymetrix (High Wycombe, UK) in 33 separate batches using either the UK BiLEVE[42] or the UK Biobank Axiom array. A total of 806,466 directly genotyped DNA sequence variants were available after variant quality control (QC). The UK Biobank team then performed imputation from a combined 1000 Genomes/UK10K reference panel; phasing was performed using SHAPEIT-3 and imputation carried out via IMPUTE3. Variant-level QC exclusion criteria applied to imputed data for GWAS were: call rate < 95%, Hardy-Weinberg Equilibrium P-value < 1×10−6, posterior call probability < 0.9, imputation quality < 0.4, and minor allele frequency (MAF) < 0.005. Sex chromosome and mitochondrial genetic data were excluded from this analysis. In total, 9,061,845 imputed DNA sequence variants were included in our analysis. For sample QC, the UK Biobank analysis team removed individuals related at the 3rd degree or closer, and an additional 480 samples with an excess of missing genotype calls or more heterozygosity than expected were excluded. In total, genotypes were available for 120,286 participants of European ancestry.
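The variant-level exclusion criteria above amount to a simple per-variant filter. The sketch below restates them in Python; the `VariantStats` record and its field names are hypothetical stand-ins for whatever per-variant summary a QC pipeline produces, and only the five thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Per-variant QC summary (hypothetical field names)."""
    call_rate: float        # fraction of samples with a genotype call
    hwe_p: float            # Hardy-Weinberg equilibrium P value
    posterior_prob: float   # posterior call probability
    info: float             # imputation quality score
    maf: float              # minor allele frequency


def passes_variant_qc(v: VariantStats) -> bool:
    """Keep a variant only if it triggers none of the exclusion criteria."""
    return (
        v.call_rate >= 0.95          # exclude call rate < 95%
        and v.hwe_p >= 1e-6          # exclude HWE P < 1e-6
        and v.posterior_prob >= 0.9  # exclude posterior probability < 0.9
        and v.info >= 0.4            # exclude imputation quality < 0.4
        and v.maf >= 0.005           # exclude MAF < 0.005
    )
```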
Statistical Analysis
Stage 1 Association Analysis
The BOLT-LMM software[43] was used to fit linear mixed models (LMMs) for association testing. CAD case status was analyzed while adjusting for age, sex, and genotyping array. This analysis was used to derive statistical significance. Because effect estimates from BOLT-LMM are unreliable owing to the treatment of binary phenotype data as quantitative data, we performed logistic regression to derive effect estimates for each variant that exceeded genome-wide significance. Effect estimates of top variants were derived from logistic regression using allelic dosages, adjusting for age, sex, genotyping array, and ten principal components under the assumption of additive effects, using R v3.2.0 and SNPTEST.
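For readers less familiar with how a logistic regression coefficient maps onto the odds ratios reported in the tables, the following is a minimal sketch of the standard conversion from a per-allele log-odds estimate and its standard error to an odds ratio with a 95% confidence interval. The function name is hypothetical, and the standard error used in the example is an illustrative value back-calculated from a reported interval rather than a figure given in the paper.

```python
import math


def odds_ratio_ci(beta, se, z_crit=1.959964):
    """Convert a log-odds coefficient and its SE to (OR, lower, upper).

    z_crit defaults to the two-sided 95% normal quantile.
    """
    return (
        math.exp(beta),
        math.exp(beta - z_crit * se),
        math.exp(beta + z_crit * se),
    )


# Illustration: beta = ln(1.24) with an assumed SE of 0.0206.
or_, lower, upper = odds_ratio_ci(math.log(1.24), 0.0206)
print(round(or_, 2), round(lower, 2), round(upper, 2))
```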
Stage 2 and 3 Meta-Analysis
In Stage 2, top variants (P < 0.05) from UK Biobank were then meta-analyzed with exome chip data from the CARDIoGRAM Exome Consortium[6]. Tested variants in the CARDIoGRAM exome array study were analyzed through logistic regression with an additive model adjusting for study specific covariates and principal components of ancestry as appropriate. Top variants from UK Biobank that were not available for analysis in the CARDIoGRAM exome array study were then meta-analyzed with data from the 1000 Genomes imputed CARDIoGRAMplusC4D GWAS[5] in Stage 3.
Given differences in effect size units between the UK Biobank Stage 1 data and the CARDIoGRAM Exome/1000 Genomes CARDIoGRAMplusC4D data, both Stage 2 and 3 meta-analyses were performed via a weighted z-score method, adjusting for an unbalanced ratio of cases to controls. To derive effect size estimates for variants exceeding genome-wide significance, we meta-analyzed logistic regression results using inverse-variance weighting with fixed effects (METAL software)[44]. We set a combined statistical threshold of P < 5×10−8 for genome wide significance. P values reported in analysis Stages 1, 2, and 3 are all two-sided.
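The two meta-analysis schemes named above can be illustrated with a short sketch. It assumes the commonly used effective sample size 4/(1/N_cases + 1/N_controls) as the adjustment for case-control imbalance; the text does not state the exact formula, so that choice, along with all function names, is an assumption rather than a description of the authors' pipeline.

```python
import math


def effective_n(n_cases, n_controls):
    """Effective sample size for an unbalanced case-control study (assumed form)."""
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)


def weighted_z(z_scores, n_effs):
    """Combine per-study z-scores with weights sqrt(effective N)."""
    weights = [math.sqrt(n) for n in n_effs]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    return numerator / math.sqrt(sum(w * w for w in weights))


def two_sided_p(z):
    """Two-sided P value for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def inverse_variance(betas, ses):
    """Fixed-effects inverse-variance pooled estimate: (beta, se)."""
    weights = [1.0 / se ** 2 for se in ses]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    return beta, math.sqrt(1.0 / sum(weights))
```

Under this assumed formula, the Stage 1 sample of 4,831 cases and 115,455 controls carries an effective size of roughly 18,500, far below its nominal size, which is why the imbalance adjustment matters when weighting Stage 1 against the more balanced consortium studies.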
Phenome-Wide Association Study
For all 15 novel DNA sequence variants associated with CAD in our study, we collaborated with Genomics plc to conduct a phenome-wide association study. This PheWAS used the Genomics plc Platform, UK Biobank, and GTEx Consortium eQTL data. The Genomics plc Platform includes PheWAS data across 545 distinct molecular and disease phenotypes, at an integrated set of over 14 million common variants, from 677 GWAS studies. UK Biobank analyses within the Genomics plc Platform were conducted under a separate research agreement. We selected 25 phenotypes across a range of relevant diseases, metabolic and anthropometric traits from either previously published GWAS datasets or UK Biobank. Complete details of phenotype definitions, sample sizes, and GWAS data sources are shown in Supplementary Tables 7 and 8. In the PheWAS, quantitative traits were standardized to have unit variance, imputation was performed to generate results for all variants within the 1000 Genomes reference panel, and P values were recalculated based on a Wald test statistic for uniformity.
Phenotypes were declared to be significantly associated with the risk variant if they met a Bonferroni corrected P value of < 0.00013 [0.05/(25 traits × 15 DNA sequence variants)]. Phenome scan results were then depicted in a heatmap based on the Z-scores for all variant-disease/trait associations aligned to the CAD risk allele as implemented by the gplots package in R v3.2.0. To identify loci that might influence gene expression, we used previously published cis-expression quantitative trait locus (eQTL) mapping data from the Genotype-Tissue Expression (GTEx) Consortium Project across 44 tissues[15]. We queried the 15 novel variants identified in our study for overlap with genome-wide significant variant-gene pairs from the GTEx portal.
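The Bonferroni threshold and the alignment of Z-scores to the CAD risk allele can be written out explicitly. The sketch below is illustrative only: the function names and the allele-string interface are hypothetical, while the numbers (alpha of 0.05, 25 traits, 15 variants) come from the text.

```python
def bonferroni_threshold(alpha, n_traits, n_variants):
    """Per-test significance threshold for n_traits x n_variants tests."""
    return alpha / (n_traits * n_variants)


def align_to_risk_allele(z, effect_allele, risk_allele):
    """Flip the sign of a trait Z-score when it was reported for the
    non-risk allele, so that positive always means 'higher with the
    CAD risk allele' (hypothetical helper)."""
    return z if effect_allele == risk_allele else -z


threshold = bonferroni_threshold(0.05, n_traits=25, n_variants=15)
print(round(threshold, 5))  # 0.05 / 375, reported in the text as 0.00013
```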
Allele Specific Expression Analysis
Allele-specific expression (ASE) data from the GTEx project were obtained from dbGaP (accession phs000424.v6.p1). The generation of these data is summarized in Aguet et al., and relied on methods described earlier[45]. In brief, only uniquely mapping reads with base quality ≥ 10 at the SNP were counted, and only SNPs with coverage of at least 8 reads were reported. For ARHGEF26 p.Val29Leu, ASE counts were available for 20 heterozygous individuals. A two-sided binomial test was used to identify SNPs with significant allelic imbalance in each individual, and Benjamini-Hochberg adjusted p-values were calculated across all sites measured in an individual.
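As an illustration of the per-individual test described above, the following sketch applies an exact two-sided binomial test to reference and alternative read counts at each heterozygous site and then adjusts the P values with the Benjamini-Hochberg procedure. It assumes a null reference-allele fraction of 0.5 and a simple list of (reference, alternative) count pairs as input; both are assumptions made for the example, and the GTEx pipeline itself may differ in detail.

```python
from math import comb

MIN_COVERAGE = 8  # sites with fewer reads are not reported, per the text


def binom_two_sided(ref_reads, alt_reads, p=0.5):
    """Exact two-sided binomial P value: sum the probabilities of all
    outcomes that are no more likely than the observed one."""
    n = ref_reads + alt_reads
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def ase_adjusted_pvalues(sites):
    """sites: (ref_reads, alt_reads) pairs for one individual (assumed format).
    Low-coverage sites are dropped before testing."""
    kept = [(r, a) for r, a in sites if r + a >= MIN_COVERAGE]
    return benjamini_hochberg([binom_two_sided(r, a) for r, a in kept])
```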
Luciferase Reporter Assay
HUVEC heterozygous for rs12493885 were identified from Caucasian donors by SNP genotyping. A 2.9 kb genomic fragment spanning from 5′ upstream of ARHGEF26 to exon 2 (rs12493885) was cloned into a pMiniT 2.0 vector (NEB) using the heterozygous HUVEC genomic DNA as a template, and sequenced for reference and alternative alleles. The −2516 to +2 reference and alternative haplotypes upstream of ARHGEF26 (NC_000003.12:154119477-154121994) were amplified from the 2.9 kb region by PCR with primers designed to create 5′ NheI and 3′ HindIII restriction sites in the PCR products. The amplified fragments were subcloned between the NheI and HindIII sites of a promoterless firefly luciferase (luc2) expression vector pGL4.10 (Promega), to create two plasmids: pGL4.10-Ref and pGL4.10-Alt. Promoterless pGL4.10-control, and pGL4.73[hRluc/SV40] vector containing the renilla luciferase hRluc reporter gene and an SV40 early enhancer/promoter, were used as negative control and co-reporter, respectively. Cells were cotransfected with equal amounts of luc2 expression plasmid (pGL4.10-control, pGL4.10-Ref and pGL4.10-Alt) and pGL4.73 vector by Lipofectamine 2000. Cells were harvested at 48 h after transfection and followed by a Dual-Glo Luciferase Assay (Promega) to measure firefly and renilla luciferase activities. The firefly luciferase activity was normalized to renilla luciferase in the same sample, and expressed as fold change relative to pGL4.10-control group.
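The normalization described in the last sentence above can be sketched as follows. Each well is represented as a (firefly, renilla) pair of raw luminescence readings, and fold change is taken relative to the mean normalized signal of the promoterless control wells; the data layout, the function names, and the use of the control-group mean as the reference are assumptions made for illustration.

```python
from statistics import mean


def normalized_activity(firefly, renilla):
    """Firefly signal normalized to the renilla co-reporter in the same well."""
    return firefly / renilla


def fold_change(sample_wells, control_wells):
    """Normalized activity of each sample well relative to the mean
    normalized activity of the control wells.

    Both arguments are lists of (firefly, renilla) readings (assumed format).
    """
    control_mean = mean(normalized_activity(f, r) for f, r in control_wells)
    return [normalized_activity(f, r) / control_mean for f, r in sample_wells]
```

Applied separately to the pGL4.10-Ref and pGL4.10-Alt wells, this yields the two sets of fold-change values that would then be compared between haplotypes.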
Nucleotide Exchange Assay
Human full-length ARHGEF26 (wild-type or 29Leu) and RhoG (residues 1–188) proteins, both with N-terminal His-SUMO tags, were expressed in E. coli BL21(DE3) cells in TB media. Nucleotide exchange assay samples were prepared in buffer containing 10 mM HEPES pH 7.4, 150 mM NaCl, 1 mM MgCl2, 0.5 μM MANT-GTP, and 2 mM TCEP with 1 μM ARHGEF26. Just prior to reading, RhoG protein, pre-loaded with GDP, was added to a final concentration of 0.4 μM. MANT-GTP fluorescence was monitored for 60 minutes on a SpectraMax M2 at 37°C using an excitation wavelength of 280 nm and an emission wavelength of 440 nm with a 435 nm cutoff. Fluorescence data were imported into GraphPad Prism for analysis.
Functional Characterization of ARHGEF26 p.Val29Leu in Arterial Tissue
To investigate the functional effects of ARHGEF26 p.Val29Leu (rs12493885), we knocked down the expression of endogenous ARHGEF26 in cultured human aortic endothelial cells (HAEC) and human coronary artery smooth muscle cells (HCASMC) by RNA interference. We then overexpressed siRNA-resistant wild-type or mutant (29Leu) ARHGEF26 and measured leukocyte transendothelial migration, leukocyte adhesion on endothelial cells, and HCASMC proliferation in vitro. We also evaluated the degradation of wild-type or 29Leu mutant ARHGEF26 with a cycloheximide chase assay and Western blotting. Additional details on experimental techniques are described in the Supplementary Note.
Data Availability
Stage 2 and Stage 3 data contributed by CARDIoGRAM Exome and CARDIoGRAMplusC4D investigators is available online (see URLs). The genetic and phenotypic UK Biobank data are available upon application to the UK Biobank. Genotype-Tissue Expression Project data is available online.
Authors: Norihiro Kato; Marie Loh; Fumihiko Takeuchi; Niek Verweij; Xu Wang; Weihua Zhang; Tanika N Kelly; Danish Saleheen; Benjamin Lehne; Irene Mateo Leach; Molly Scannell Bryan; Yik-Ying Teo; Jiang He; Paul Elliott; E Shyong Tai; Pim van der Harst; Jaspal S Kooner; John C Chambers; Alexander W Drong; James Abbott; Simone Wahl; Sian-Tsung Tan; William R Scott; Gianluca Campanella; Marc Chadeau-Hyam; Uzma Afzal; Tarunveer S Ahluwalia; Marc Jan Bonder; Peng Chen; Abbas Dehghan; Todd L Edwards; Tõnu Esko; Min Jin Go; Sarah E Harris; Jaana Hartiala; Silva Kasela; Anuradhani Kasturiratne; Chiea-Chuen Khor; Marcus E Kleber; Huaixing Li; Zuan Yu Mok; Masahiro Nakatochi; Nur Sabrina Sapari; Richa Saxena; Alexandre F R Stewart; Lisette Stolk; Yasuharu Tabara; Ai Ling Teh; Ying Wu; Jer-Yuarn Wu; Yi Zhang; Imke Aits; Alexessander Da Silva Couto Alves; Shikta Das; Rajkumar Dorajoo; Jemma C Hopewell; Yun Kyoung Kim; Robert W Koivula; Jian'an Luan; Leo-Pekka Lyytikäinen; Quang N Nguyen; Mark A Pereira; Iris Postmus; Olli T Raitakari; Robert A Scott; Rossella Sorice; Vinicius Tragante; Michela Traglia; Jon White; Ken Yamamoto; Yonghong Zhang; Linda S Adair; Alauddin Ahmed; Koichi Akiyama; Rasheed Asif; Tin Aung; Inês Barroso; Andrew Bjonnes; Timothy R Braun; Hui Cai; Li-Ching Chang; Chien-Hsiun Chen; Ching-Yu Cheng; Yap-Seng Chong; Rory Collins; Regina Courtney; Gail Davies; Graciela Delgado; Loi D Do; Pieter A Doevendans; Ron T Gansevoort; Yu-Tang Gao; Tanja B Grammer; Niels Grarup; Jagvir Grewal; Dongfeng Gu; Gurpreet S Wander; Anna-Liisa Hartikainen; Stanley L Hazen; Jing He; Chew-Kiat Heng; James E Hixson; Albert Hofman; Chris Hsu; Wei Huang; Lise L N Husemoen; Joo-Yeon Hwang; Sahoko Ichihara; Michiya Igase; Masato Isono; Johanne M Justesen; Tomohiro Katsuya; Muhammad G Kibriya; Young Jin Kim; Miyako Kishimoto; Woon-Puay Koh; Katsuhiko Kohara; Meena Kumari; Kenneth Kwek; Nanette R Lee; Jeannette Lee; Jiemin Liao; Wolfgang Lieb; David C M Liewald; Tatsuaki Matsubara; Yumi 
Matsushita; Thomas Meitinger; Evelin Mihailov; Lili Milani; Rebecca Mills; Nina Mononen; Martina Müller-Nurasyid; Toru Nabika; Eitaro Nakashima; Hong Kiat Ng; Kjell Nikus; Teresa Nutile; Takayoshi Ohkubo; Keizo Ohnaka; Sarah Parish; Lavinia Paternoster; Hao Peng; Annette Peters; Son T Pham; Mohitha J Pinidiyapathirage; Mahfuzar Rahman; Hiromi Rakugi; Olov Rolandsson; Michelle Ann Rozario; Daniela Ruggiero; Cinzia F Sala; Ralhan Sarju; Kazuro Shimokawa; Harold Snieder; Thomas Sparsø; Wilko Spiering; John M Starr; David J Stott; Daniel O Stram; Takao Sugiyama; Silke Szymczak; W H Wilson Tang; Lin Tong; Stella Trompet; Väinö Turjanmaa; Hirotsugu Ueshima; André G Uitterlinden; Satoshi Umemura; Marja Vaarasmaki; Rob M van Dam; Wiek H van Gilst; Dirk J van Veldhuisen; Jorma S Viikari; Melanie Waldenberger; Yiqin Wang; Aili Wang; Rory Wilson; Tien-Yin Wong; Yong-Bing Xiang; Shuhei Yamaguchi; Xingwang Ye; Robin D Young; Terri L Young; Jian-Min Yuan; Xueya Zhou; Folkert W Asselbergs; Marina Ciullo; Robert Clarke; Panos Deloukas; Andre Franke; Paul W Franks; Steve Franks; Yechiel Friedlander; Myron D Gross; Zhirong Guo; Torben Hansen; Marjo-Riitta Jarvelin; Torben Jørgensen; J Wouter Jukema; Mika Kähönen; Hiroshi Kajio; Mika Kivimaki; Jong-Young Lee; Terho Lehtimäki; Allan Linneberg; Tetsuro Miki; Oluf Pedersen; Nilesh J Samani; Thorkild I A Sørensen; Ryoichi Takayanagi; Daniela Toniolo; Habibul Ahsan; Hooman Allayee; Yuan-Tsong Chen; John Danesh; Ian J Deary; Oscar H Franco; Lude Franke; Bastiaan T Heijman; Joanna D Holbrook; Aaron Isaacs; Bong-Jo Kim; Xu Lin; Jianjun Liu; Winfried März; Andres Metspalu; Karen L Mohlke; Dharambir K Sanghera; Xiao-Ou Shu; Joyce B J van Meurs; Eranga Vithana; Ananda R Wickremasinghe; Cisca Wijmenga; Bruce H W Wolffenbuttel; Mitsuhiro Yokota; Wei Zheng; Dingliang Zhu; Paolo Vineis; Soterios A Kyrtopoulos; Jos C S Kleinjans; Mark I McCarthy; Richie Soong; Christian Gieger; James Scott Journal: Nat Genet Date: 2015-09-21 Impact factor: 38.330
Authors: Jos van Rijssel; Jeffrey Kroon; Mark Hoogenboezem; Floris P J van Alphen; Renske J de Jong; Elena Kostadinova; Dirk Geerts; Peter L Hordijk; Jaap D van Buul Journal: Mol Biol Cell Date: 2012-06-13 Impact factor: 4.138
Authors: Paul Nioi; Asgeir Sigurdsson; Gudmar Thorleifsson; Hannes Helgason; Arna B Agustsdottir; Gudmundur L Norddahl; Anna Helgadottir; Audur Magnusdottir; Aslaug Jonasdottir; Solveig Gretarsdottir; Ingileif Jonsdottir; Valgerdur Steinthorsdottir; Thorunn Rafnar; Dorine W Swinkels; Tessel E Galesloot; Niels Grarup; Torben Jørgensen; Henrik Vestergaard; Torben Hansen; Torsten Lauritzen; Allan Linneberg; Nele Friedrich; Nikolaj T Krarup; Mogens Fenger; Ulrik Abildgaard; Peter R Hansen; Anders M Galløe; Peter S Braund; Christopher P Nelson; Alistair S Hall; Michael J A Williams; Andre M van Rij; Gregory T Jones; Riyaz S Patel; Allan I Levey; Salim Hayek; Svati H Shah; Muredach Reilly; Gudmundur I Eyjolfsson; Olof Sigurdardottir; Isleifur Olafsson; Lambertus A Kiemeney; Arshed A Quyyumi; Daniel J Rader; William E Kraus; Nilesh J Samani; Oluf Pedersen; Gudmundur Thorgeirsson; Gisli Masson; Hilma Holm; Daniel Gudbjartsson; Patrick Sulem; Unnur Thorsteinsdottir; Kari Stefansson Journal: N Engl J Med Date: 2016-05-18 Impact factor: 91.245
Authors: Jaap D van Buul; Michael J Allingham; Thomas Samson; Julia Meller; Etienne Boulter; Rafael García-Mata; Keith Burridge Journal: J Cell Biol Date: 2007-09-17 Impact factor: 10.539
Authors: Golareh Agha; Michael M Mendelson; Cavin K Ward-Caviness; Roby Joehanes; TianXiao Huan; Rahul Gondalia; Elias Salfati; Jennifer A Brody; Giovanni Fiorito; Jan Bressler; Brian H Chen; Symen Ligthart; Simonetta Guarrera; Elena Colicino; Allan C Just; Simone Wahl; Christian Gieger; Amy R Vandiver; Toshiko Tanaka; Dena G Hernandez; Luke C Pilling; Andrew B Singleton; Carlotta Sacerdote; Vittorio Krogh; Salvatore Panico; Rosario Tumino; Yun Li; Guosheng Zhang; James D Stewart; James S Floyd; Kerri L Wiggins; Jerome I Rotter; Michael Multhaup; Kelly Bakulski; Steven Horvath; Philip S Tsao; Devin M Absher; Pantel Vokonas; Joel Hirschhorn; M Daniele Fallin; Chunyu Liu; Stefania Bandinelli; Eric Boerwinkle; Abbas Dehghan; Joel D Schwartz; Bruce M Psaty; Andrew P Feinberg; Lifang Hou; Luigi Ferrucci; Nona Sotoodehnia; Giuseppe Matullo; Annette Peters; Myriam Fornage; Themistocles L Assimes; Eric A Whitsel; Daniel Levy; Andrea A Baccarelli Journal: Circulation Date: 2019-08-19 Impact factor: 29.690
Authors: Kathryn J Moore; Simon Koplev; Edward A Fisher; Ira Tabas; Johan L M Björkegren; Amanda C Doran; Jason C Kovacic Journal: J Am Coll Cardiol Date: 2018-10-30 Impact factor: 24.094
Authors: Boxiang Liu; Milos Pjanic; Ting Wang; Trieu Nguyen; Michael Gloudemans; Abhiram Rao; Victor G Castano; Sylvia Nurnberg; Daniel J Rader; Susannah Elwyn; Erik Ingelsson; Stephen B Montgomery; Clint L Miller; Thomas Quertermous Journal: Am J Hum Genet Date: 2018-08-23 Impact factor: 11.025