Through genome-wide association meta-analyses of up to 133,010 individuals of European ancestry without diabetes, including individuals newly genotyped using the Metabochip, we have increased the number of confirmed loci influencing glycemic traits to 53, of which 33 also increase type 2 diabetes risk (q < 0.05). Loci influencing fasting insulin concentration showed association with lipid levels and fat distribution, suggesting impact on insulin resistance. Gene-based analyses identified further biologically plausible loci, suggesting that additional loci beyond those reaching genome-wide significance are likely to represent real associations. This conclusion is supported by an excess of directionally consistent and nominally significant signals between discovery and follow-up studies. Functional analysis of these newly discovered loci will further improve our understanding of glycemic control.
The Meta-Analyses of Glucose and Insulin-related traits Consortium (MAGIC) previously undertook meta-analyses of genome-wide association studies (GWAS) of glycemic traits in non-diabetic individuals, leading to the discovery of multiple associated loci: 16 for fasting glucose concentrations, two for fasting insulin concentrations and five for post-challenge glucose concentrations (2hGlu)[1-3]. These and subsequent studies highlighted important biological pathways implicated in glucose and insulin regulation[4,5]. They also demonstrated that some, but not all, loci associated with glycemic traits in non-diabetic individuals also affect the risk of type 2 diabetes (T2D)[1,6]. Despite the success of these efforts, the identification of new loci was limited by de novo genotyping capacity and cost, such that only a limited number of promising loci from discovery analyses were taken forward to follow-up analyses (often those reaching a threshold of approximately P < 10−5 in discovery). Therefore, it is likely that many additional associations with common, low-penetrance variants remain to be found among SNPs not previously selected for replication[7,8].
The Illumina CardioMetabochip (Metabochip) is a custom Illumina iSELECT array of 196,725 SNPs developed to support cost-effective large-scale follow-up studies of putative association signals for a range of cardiovascular and metabolic traits (~66,000 SNPs) and to fine-map established loci (~120,000 SNPs) (Supplementary Fig. 1)[9]. The ~66,000 follow-up SNPs were selected to enable genotyping of the most significant association signals for each of 23 metabolic traits contributed by a range of consortia. MAGIC contributed ~5,000 top-ranking SNPs for fasting glucose, and ~1,000 each for fasting insulin and 2hGlu, that had shown nominal association in discovery analyses (P < 0.02)[1,2].
In the present study, we combined newly available samples with genotype data for these 66,000 follow-up SNPs with previous discovery meta-analyses to discover new association signals with glycemic traits. This approach identified 41 glycemic associations not previously described[1,2]: 20 for fasting glucose, 17 for fasting insulin and four for 2hGlu. This raises the number of associated loci to 36 for fasting glucose, 19 for fasting insulin and 9 for 2hGlu, explaining 4.8%, 1.2% and 1.7% of the variance in these traits, respectively. Of these 53 non-overlapping loci, 33 were also associated with T2D (P < 0.05), which, while supporting the previous assertion of an imperfect correlation between these traits, also implicates new loci in the etiology of T2D and increases the overlap between glycemic and T2D loci.
RESULTS
Approaches to identify loci associated with glycemic traits
To follow up loci showing evidence of association (P < 0.02) in discovery GWAS, we investigated the 66,000 Metabochip follow-up SNPs for association with fasting glucose, fasting insulin and 2hGlu. We combined in meta-analysis data from up to 133,010 (fasting glucose), 108,557 (fasting insulin) and 42,854 (2hGlu) non-diabetic individuals of European ancestry, including individuals from the previous meta-analyses[1,2], individuals from new GWAS and individuals newly genotyped on the Metabochip array (Supplementary Fig. 2). All study characteristics are shown in Supplementary Table 1. Genome-wide association data for Filipino women were also available (Supplementary Table 1); for these, we report the effect directions and allele frequencies in Supplementary Tables 2a,b. Genome-wide significant (P < 5 × 10−8) association signals located more than 500 kb from, and not in LD (HapMap CEU: r2 < 0.05) with, any variant already known to be associated with the trait were considered novel. Associated loci are referred to by the name of the nearest gene, unless a more biologically plausible gene was nearby, or a nearby gene was previously associated with another trait. In such cases, we list the nearest genes in Supplementary Tables 2a-d. As body mass index (BMI) is a major risk factor for T2D and is correlated with glycemic traits, we also performed analyses adjusted for BMI.
Though not the main focus of this effort, given the increased variant density available on the Metabochip for established glycemic loci, we investigated whether these data would enable fine-mapping of underlying functional variants[1-3]. In these analyses, we included data from up to 53,622 individuals for fasting glucose, 42,384 for fasting insulin and 27,602 for 2hGlu from studies with Metabochip genotypes only. However, given the lack of samples from different ancestries and the absence of full conditional analyses, for the most part these analyses did not improve the resolution of association signals.
Beyond individual SNP investigations for each glycemic trait, we also tested the hypothesis that gene-based analyses using VEGAS[10] would identify genes that harbor multiple association signals that individually did not reach genome-wide significance. Among the ~66,000 SNPs, we used VEGAS to pool the results for all SNPs within each gene (± 50 kb) to identify, by simulation, genes with more evidence of association than expected by chance (given gene size and linkage disequilibrium structure) and that remained significant after Bonferroni correction for multiple testing (P < 5 × 10−6).
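The novelty criteria stated above (P < 5 × 10−8, more than 500 kb from any known variant, and r2 < 0.05 with it) can be sketched as a simple filter. This is a minimal illustration, not the pipeline used in the study: the `Signal` record, the `r2_lookup` dictionary and the choice to treat missing LD pairs as r2 = 0 are all assumptions made for the example.

```python
from dataclasses import dataclass

GENOME_WIDE_P = 5e-8        # genome-wide significance threshold quoted in the text
MIN_DISTANCE_BP = 500_000   # signal must lie more than 500 kb from known variants
MAX_R2 = 0.05               # and have r2 below 0.05 with each of them


@dataclass(frozen=True)
class Signal:
    """Hypothetical record for one association signal."""
    snp: str
    chrom: str
    pos: int
    p_value: float


def is_novel(signal, known, r2_lookup):
    """Return True if `signal` satisfies the novelty criteria from the text.

    `known` is an iterable of Signal objects for established variants.
    `r2_lookup` maps frozenset({snp_a, snp_b}) to r2; pairs that are
    missing are treated as r2 = 0 (an assumption of this sketch).
    """
    if signal.p_value >= GENOME_WIDE_P:
        return False
    for k in known:
        if k.chrom != signal.chrom:
            continue  # distance is only meaningful on the same chromosome
        too_close = abs(k.pos - signal.pos) <= MIN_DISTANCE_BP
        in_ld = r2_lookup.get(frozenset((signal.snp, k.snp)), 0.0) >= MAX_R2
        if too_close or in_ld:
            return False
    return True


# Made-up identifiers and positions, for illustration only.
known = [Signal("rsK", "1", 1_000_000, 1e-30)]
print(is_novel(Signal("rsA", "1", 2_000_000, 1e-9), known, {}))
```

A candidate is rejected as soon as either condition fails for any known variant, which mirrors the "more than 500 kb from, and not in LD with" wording.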
Fasting glucose
In analyses of up to 133,010 individuals, we identified 20 loci with genome-wide significant associations with fasting glucose (P < 5 × 10−8) (Table 1 and Supplementary Figs. 3 and 4) and confirmed previously established loci[1] (Supplementary Table 2e). Of these 20 loci, nine (in or near IKBKAP, LOC728489, WARS, KL, TOP1, P2RX2, AMT, RREB1 and GLS2) had not previously been associated with other metabolic traits (Box 1). Among these, KL (Klotho) is of particular interest. In addition to being associated with fasting glucose (but not fasting insulin), the glucose-raising allele is also associated with an increased risk of T2D (OR = 1.08 (1.04-1.11), P = 1.1 × 10−5) (Fig. 1). KL was first identified as a gene related to suppression of aging: its reduced expression was associated with reduced lifespan, as well as hypoglycemia[11]. Although further animal studies support a role for KL in glucose metabolism[12] and insulin sensitivity[13], human studies have generally been small and inconclusive[14,15].
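As a rough consistency check on an odds ratio reported with its confidence interval, such as the KL estimate above, the test statistic can be approximately recovered on the log-odds scale. The sketch below assumes that the interval is a 95% interval and that the P value comes from a two-sided Wald test; neither is stated in the text. Because the published values are rounded to two decimals, the recovered P value is expected to agree with the reported one only roughly.

```python
import math


def z_and_p_from_or(odds_ratio, ci_low, ci_high, z_crit=1.96):
    """Approximate Wald z statistic and two-sided P from an OR and its CI.

    Assumes a symmetric interval on the log-odds scale with half-width
    z_crit standard errors (z_crit = 1.96 for a 95% interval).
    """
    log_or = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return z, p


z, p = z_and_p_from_or(1.08, 1.04, 1.11)
print(f"z = {z:.2f}, two-sided P = {p:.1e}")
```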
Table 1
SNPs associated with fasting glucose, fasting insulin and 2-h glucose at genome-wide significance level in Europeans
| Primary trait | SNP | Chr | Position | Gene | Alleles (effect/other) | Freq effect allele | Effect (SE) | Global analysis P value | Global analysis n | I2 estimate (P value) | FI (BMI-adjusted) effect (SE) | FI (BMI-adjusted) global analysis P value | FI (BMI-adjusted) global analysis n | 2hGlu effect (SE) | 2hGlu global analysis P value | 2hGlu global analysis n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FG | rs10811661 | 9 | 22124094 | CDKN2B | T/C | 0.82 | 0.0238 (0.003) | 5.6 × 10−18 | 128,488 | 0.00 (1.00) | −0.0065 (0.003) | 0.019 | 98,880 | 0.0567 (0.014) | 8.8 × 10−5 | 42,801 |
| FG | rs4869272 | 5 | 95565204 | PCSK1* | T/C | 0.69 | 0.0177 (0.002) | 1.0 × 10−15 | 131,872 | 0.00 (1.00) | 0.0016 (0.002) | 0.469 | 103,493 | −0.0322 (0.012) | 0.006 | 42,848 |
| FG | rs11619319 | 13 | 27385599 | PDX1 | G/A | 0.23 | 0.0195 (0.002) | 1.3 × 10−15 | 132,996 | 0.00 (1.00) | 0.0001 (0.002) | 0.977 | 103,492 | 0.0185 (0.013) | 0.156 | 42,848 |
| FG | rs983309 | 8 | 9215142 | PPP1R3B* | T/G | 0.12 | 0.0256 (0.003) | 6.3 × 10−15 | 127,470 | 0.14 (0.32) | 0.0223 (0.003) | 1.2 × 10−12 | 99,024 | −0.0548 (0.016) | 0.001 | 42,846 |
| FG | rs6943153 | 7 | 50759073 | GRB10 | T/C | 0.34 | 0.0154 (0.002) | 1.6 × 10−12 | 131,795 | 0.00 (1.00) | 0.0091 (0.002) | 2.3 × 10−5 | 103,447 | 0.0110 (0.011) | 0.333 | 42,794 |
| FG | rs11603334 | 11 | 72110633 | ARAP1 | G/A | 0.83 | 0.0192 (0.003) | 1.1 × 10−11 | 128,139 | 0.00 (1.00) | −0.0046 (0.003) | 0.086 | 99,026 | 0.0294 (0.014) | 0.037 | 42,839 |
| FG | rs6113722 | 20 | 22505099 | FOXA2 | G/A | 0.96 | 0.0353 (0.005) | 2.5 × 10−11 | 123,665 | 0.04 (0.78) | −0.0095 (0.005) | 0.064 | 103,471 | 0.0493 (0.030) | 0.101 | 41,416 |
| FG | rs16913693 | 9 | 110720180 | IKBKAP | T/G | 0.97 | 0.0434 (0.007) | 3.5 × 10−11 | 125,115 | 0.00 (1.00) | −0.0018 (0.007) | 0.785 | 96,357 | 0.0639 (0.034) | 0.062 | 40,522 |
| FG | rs3829109 | 9 | 138376587 | LOC728489 | G/A | 0.71 | 0.0172 (0.003) | 1.1 × 10−10 | 115,310 | 0.25 (0.07) | −0.0002 (0.003) | 0.948 | 94,964 | 0.0343 (0.014) | 0.013 | 36,803 |
| FG | rs3783347 | 14 | 99909014 | WARS | G/T | 0.79 | 0.0168 (0.003) | 1.3 × 10−10 | 132,544 | 0.02 (0.89) | 0.0017 (0.003) | 0.515 | 103,339 | 0.0274 (0.014) | 0.044 | 42,850 |
| FG | rs2302593 | 19 | 50888474 | GIPR | C/G | 0.50 | 0.0144 (0.002) | 9.3 × 10−10 | 116,141 | 0.27 (0.05) | 0.0025 (0.002) | 0.265 | 96,976 | −0.0322 (0.012) | 0.006 | 40,781 |
| FG | rs9368222 | 6 | 20794975 | CDKAL1 | A/C | 0.28 | 0.0143 (0.002) | 1.0 × 10−9 | 128,453 | 0.09 (0.50) | −0.0047 (0.002) | 0.037 | 98,894 | 0.0279 (0.012) | 0.023 | 42,825 |
| FG | rs10747083 | 12 | 131551691 | P2RX2 | A/G | 0.66 | 0.0133 (0.002) | 7.6 × 10−9 | 127,111 | 0.00 (1.00) | −0.0006 (0.002) | 0.785 | 99,895 | 0.0269 (0.012) | 0.026 | 42,790 |
| FG | rs6072275 | 20 | 39177319 | TOP1 | A/G | 0.16 | 0.0159 (0.003) | 1.7 × 10−8 | 128,616 | 0.00 (1.00) | 0.0038 (0.003) | 0.169 | 99,018 | −0.0110 (0.014) | 0.435 | 42,853 |
| FG | rs7651090 | 3 | 186996086 | IGF2BP2 | G/A | 0.31 | 0.0128 (0.002) | 1.75 × 10−8 | 128,548 | 0.02 (0.86) | 0.0003 (0.002) | 0.900 | 98,924 | 0.0583 (0.012) | 1.05 × 10−6 | 42,814 |
| FG | rs576674 | 13 | 32452302 | KL | G/A | 0.15 | 0.0167 (0.003) | 2.3 × 10−8 | 131,856 | 0.00 (1.00) | −0.0001 (0.003) | 0.983 | 103,472 | 0.0308 (0.016) | 0.060 | 42,849 |
| FG | rs11715915 | 3 | 49430334 | AMT | C/T | 0.68 | 0.0120 (0.002) | 4.9 × 10−8 | 131,523 | 0.30 (0.02) | 0.0059 (0.002) | 0.006 | 103,398 | 0.0273 (0.012) | 0.018 | 42,851 |
| FG (BMI-adjusted) | rs17762454 | 6 | 7158199 | RREB1 | T/C | 0.26 | 0.0140 (0.002) | 9.6 × 10−9 | 123,247 | 0.00 (1.00) | −0.0002 (0.002) | 0.919 | 103,470 | 0.0007 (0.013) | 0.953 | 42,848 |
| FG (BMI-adjusted) | rs7708285 | 5 | 76461623 | ZBED3 | G/A | 0.27 | 0.0150 (0.003) | 1.2 × 10−8 | 117,931 | 0.00 (1.00) | 0.0027 (0.002) | 0.265 | 98,341 | 0.0349 (0.013) | 0.008 | 42,803 |
| FG (BMI-adjusted) | rs2657879 | 12 | 55151605 | GLS2 | G/A | 0.18 | 0.0157 (0.003) | 3.9 × 10−8 | 121,596 | 0.39 (0.03) | −0.0024 (0.003) | 0.366 | 102,175 | 0.0200 (0.014) | 0.164 | 42,670 |

| Primary trait | SNP | Chr | Position | Gene | Alleles (effect/other) | Freq effect allele | Effect (SE) | Global analysis P value | Global analysis n | I2 estimate (P value) | FG effect (SE) | FG global analysis P value | FG global analysis n | 2hGlu effect (SE) | 2hGlu global analysis P value | 2hGlu global analysis n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| FI | rs1421085 | 16 | 52358455 | FTO | C/T | 0.42 | 0.0200 (0.003) | 1.9 × 10−15 | 104,062 | 0.00 (1.00) | 0.0074 (0.002) | 0.001 | 128,597 | 0.0122 (0.011) | 0.278 | 42,849 |
| FI | rs983309 | 8 | 9215142 | PPP1R3B* | T/G | 0.12 | 0.0287 (0.004) | 3.8 × 10−14 | 103,030 | 0.04 (0.77) | 0.0256 (0.003) | 6.3 × 10−15 | 127,470 | −0.0548 (0.016) | 0.001 | 42,846 |
| FI | rs9884482 | 4 | 106301085 | TET2 | C/T | 0.39 | 0.0165 (0.002) | 1.4 × 10−11 | 108,420 | 0.00 (1.00) | 0.0001 (0.002) | 0.946 | 132,869 | 0.0004 (0.011) | 0.973 | 42,745 |
| FI | rs7903146 | 10 | 114748339 | TCF7L2 | C/T | 0.72 | 0.0181 (0.003) | 6.1 × 10−11 | 103,037 | 0.31 (0.02) | −0.0220 (0.002) | 2.7 × 10−20 | 127,477 | −0.0885 (0.013) | 5.6 × 10−12 | 42,851 |
| FI | rs10195252 | 2 | 165221337 | GRB14* | T/C | 0.59 | 0.0159 (0.003) | 4.9 × 10−10 | 99,126 | 0.00 (1.00) | 0.0053 (0.002) | 0.014 | 127,005 | 0.0361 (0.011) | 0.001 | 42,846 |
| FI | rs1167800 | 7 | 75014132 | HIP1 | A/G | 0.54 | 0.0156 (0.003) | 2.6 × 10−9 | 91,416 | 0.00 (1.00) | 0.0016 (0.002) | 0.470 | 118,536 | −0.0133 (0.012) | 0.272 | 38,884 |
| FI | rs2820436 | 1 | 217707303 | LYPLAL1 | C/A | 0.67 | 0.0153 (0.003) | 4.4 × 10−9 | 104,044 | 0.01 (0.97) | 0.0077 (0.002) | 0.001 | 128,580 | −0.0041 (0.012) | 0.723 | 42,843 |
| FI | rs2745353 | 6 | 127494628 | RSPO3 | T/C | 0.51 | 0.0143 (0.002) | 5.5 × 10−9 | 104,075 | 0.06 (0.67) | −0.0009 (0.002) | 0.677 | 128,615 | −0.0005 (0.011) | 0.962 | 42,853 |
| FI | rs731839 | 19 | 38590905 | PEPD | G/A | 0.34 | 0.0145 (0.003) | 1.7 × 10−8 | 104,636 | 0.13 (0.38) | 0.0046 (0.002) | 0.038 | 132,528 | 0.0142 (0.012) | 0.220 | 42,846 |
| FI | rs4865796 | 5 | 53308421 | ARL15 | A/G | 0.67 | 0.0146 (0.003) | 2.1 × 10−8 | 100,001 | 0.03 (0.81) | 0.0043 (0.002) | 0.052 | 127,784 | 0.0337 (0.012) | 0.004 | 42,852 |
| FI | rs2972143 | 2 | 226824609 | IRS1 | G/A | 0.62 | 0.0142 (0.003) | 3.2 × 10−8 | 99,566 | 0.00 (1.00) | 0.0035 (0.002) | 0.107 | 127,473 | 0.0195 (0.011) | 0.082 | 42,853 |
| FI | rs1530559 | 2 | 135472099 | YSK4 | A/G | 0.52 | 0.0145 (0.003) | 3.4 × 10−8 | 107,281 | 0.19 (0.18) | 0.0037 (0.002) | 0.100 | 129,880 | 0.0200 (0.011) | 0.077 | 42,849 |
| FI (BMI-adjusted) | rs2943645 | 2 | 226807424 | IRS1 | T/C | 0.63 | 0.0193 (0.002) | 2.3 × 10−19 | 99,023 | 0.00 (1.00) | 0.0034 (0.002) | 0.112 | 127,475 | 0.0210 (0.011) | 0.061 | 42,846 |
| FI (BMI-adjusted) | rs10195252 | 2 | 165221337 | GRB14* | T/C | 0.60 | 0.0174 (0.002) | 1.3 × 10−16 | 98,997 | 0.00 (1.00) | 0.0053 (0.002) | 0.014 | 127,005 | 0.0361 (0.011) | 0.001 | 42,846 |
| FI (BMI-adjusted) | rs2126259 | 8 | 9222556 | PPP1R3B | T/C | 0.11 | 0.0238 (0.003) | 3.3 × 10−13 | 99,021 | 0.14 (0.51) | 0.0213 (0.003) | 5.4 × 10−10 | 127,480 | −0.0877 (0.017) | 1.8 × 10−7 | 42,849 |
| FI (BMI-adjusted) | rs4865796 | 5 | 53308421 | ARL15 | A/G | 0.67 | 0.0154 (0.002) | 2.2 × 10−12 | 98,314 | 0.48 (0.01) | 0.0043 (0.002) | 0.052 | 127,784 | 0.0337 (0.012) | 0.004 | 42,852 |
| FI (BMI-adjusted) | rs17036328 | 3 | 12365484 | PPARG | T/C | 0.86 | 0.0212 (0.003) | 3.6 × 10−12 | 98,984 | 0.21 (0.31) | 0.0051 (0.003) | 0.103 | 128,567 | 0.0335 (0.016) | 0.031 | 42,843 |
| FI (BMI-adjusted) | rs731839 | 19 | 38590905 | PEPD | G/A | 0.34 | 0.0148 (0.002) | 5.1 × 10−12 | 103,252 | 0.13 (0.55) | 0.0046 (0.002) | 0.038 | 132,528 | 0.0142 (0.012) | 0.220 | 42,847 |
| FI (BMI-adjusted) | rs974801 | 4 | 106290513 | TET2 | G/A | 0.38 | 0.0139 (0.002) | 3.3 × 10−11 | 103,489 | 0.09 (0.67) | 0.0012 (0.002) | 0.582 | 131,866 | 0.0052 (0.011) | 0.643 | 42,849 |
| FI (BMI-adjusted) | rs459193 | 5 | 55842508 | ANKRD55/MAP3K1 | G/A | 0.73 | 0.0147 (0.002) | 1.12 × 10−10 | 103,378 | 0.27 (0.17) | 0.0111 (0.002) | 1.6 × 10−6 | 132,989 | 0.0276 (0.012) | 0.023 | 42,849 |
| FI (BMI-adjusted) | rs6822892 | 4 | 157954125 | PDGFC | A/G | 0.68 | 0.0138 (0.002) | 2.6 × 10−10 | 103,432 | 0.00 (1.00) | 0.0010 (0.002) | 0.636 | 132,951 | 0.0256 (0.012) | 0.031 | 42,836 |
| FI (BMI-adjusted) | rs4846565 | 1 | 217788727 | LYPLAL1 | G/A | 0.67 | 0.0132 (0.002) | 1.8 × 10−9 | 99,014 | 0.00 (1.00) | 0.0066 (0.002) | 0.003 | 127,468 | 0.0132 (0.012) | 0.254 | 42,853 |
| FI (BMI-adjusted) | rs3822072 | 4 | 89960292 | FAM13A1 | A/G | 0.48 | 0.0116 (0.002) | 1.8 × 10−8 | 99,977 | 0.00 (1.00) | 0.0025 (0.002) | 0.236 | 129,432 | 0.0161 (0.011) | 0.143 | 42,850 |
| FI (BMI-adjusted) | rs6912327 | 6 | 34872900 | UHRF1BP1 | T/C | 0.80 | 0.0165 (0.003) | 2.3 × 10−8 | 80,010 | 0.04 (0.91) | 0.0074 (0.003) | 0.011 | 103,826 | 0.0139 | 0.391 | 34,761 |

| Primary trait | SNP | Chr | Position | Gene | Alleles (effect/other) | Freq effect allele | Effect (SE) | Global analysis P value | Global analysis n | I2 estimate (P value) | FG effect (SE) | FG global analysis P value | FG global analysis n | FI (BMI-adjusted) effect (SE) | FI (BMI-adjusted) global analysis P value | FI (BMI-adjusted) global analysis n |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2hGlu | rs6975024 | 7 | 44198411 | GCK | C/T | 0.15 | 0.1026 (0.016) | 5.2 × 10−11 | 42,842 | 0.00 (1.00) | 0.0605 (0.003) | 2.9 × 10−99 | 103,517 | 0.0063 (0.003) | 0.030 | 98,458 |
| 2hGlu | rs11782386 | 8 | 9239197 | PPP1R3B* | C/T | 0.87 | 0.0985 (0.017) | 2.2 × 10−9 | 42,852 | 0.00 (1.00) | −0.0167 (0.003) | 5.5 × 10−7 | 100,595 | −0.0164 (0.003) | 6.9 × 10−7 | 95,565 |
| 2hGlu | rs1019503 | 5 | 96280573 | ERAP2 | A/G | 0.48 | 0.0628 (0.011) | 8.9 × 10−9 | 42,851 | 19.6 (0.42) | −0.0061 (0.002) | 0.003 | 108,113 | 0.0004 (0.002) | 0.851 | 103,448 |
| 2hGlu (BMI-adjusted) | rs7651090 | 3 | 186996086 | IGF2BP2 | G/A | 0.30 | 0.064 (0.012) | 4.5 × 10−8 | 42,792 | 63.4 (0.01) | 0.0128 (0.002) | 1.8 × 10−8 | 104,019 | 0.0003 (0.002) | 0.900 | 98,924 |
Genome-wide loci for fasting glucose (FG), fasting insulin (FI), FI (adjusted for BMI) and 2hGlu are shown along with results for the other traits aligned to the trait-raising allele for the primary trait. “Non-MAGIC” SNPs (identified in other consortia and selected for the Metabochip to follow-up on other non-MAGIC traits) are indicated in bold. Freq denotes the allele frequency of the primary trait-raising allele. Per allele effect (SE) for FI represents changes of natural-log transformed levels of FI. N represents sample size. Heterogeneity was assessed using the I2 index[86]. The gene shown is the nearest gene to the lead SNP, other than those loci marked with an asterisk. For these loci, the nearest gene is also listed in Supplementary Tables 2a-d.
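For readers unfamiliar with the I2 index reported in Table 1, the sketch below shows how it is conventionally derived from Cochran's Q in an inverse-variance weighted (fixed-effect) meta-analysis. The fixed-effect model and the per-study numbers are assumptions made for illustration; the study's actual meta-analysis procedure is described in its Online Methods and is not reproduced here.

```python
def fixed_effect_meta(betas, ses):
    """Inverse-variance weighted meta-analysis with Cochran's Q and I2.

    Returns (pooled_beta, pooled_se, q, i2), where i2 is expressed as a
    fraction between 0 and 1.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total
    pooled_se = (1.0 / total) ** 0.5
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (b - pooled_beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I2 = (Q - df) / Q, truncated at zero when Q does not exceed df.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled_beta, pooled_se, q, i2


# Hypothetical per-study effects and standard errors (not from the paper).
beta, se, q, i2 = fixed_effect_meta([0.020, 0.015, 0.030], [0.005, 0.006, 0.010])
print(f"pooled beta = {beta:.4f}, SE = {se:.4f}, Q = {q:.2f}, I2 = {i2:.2f}")
```

An I2 of zero means that the observed between-study variation is no larger than expected from sampling error alone.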
Figure 1
Associations between glycemic loci and T2D, HDL-cholesterol and triglycerides, BMI and WHR. Loci associated with the above traits (P < 0.05) are highlighted. Those with positively correlated effect directions are colored yellow, and those with negative correlations are colored blue. Those which did not reach a q-value < 0.05 in FDR analyses are also marked.
We also identified new associations with fasting glucose in regions previously associated with other metabolic traits or disease outcomes, including T2D[6,16] (ARAP1, CDKN2B, GRB10, CDKAL1, IGF2BP2 and ZBED3, the last of which was identified in BMI-adjusted models) and 2hGlu[2] (GIPR), and confirmed the recently identified signals for fasting glucose[17-19] at FOXA2, PPP1R3B, PCSK1 and PDX1. FOXA2 is a forkhead transcription factor that regulates PDX1 expression, while PDX1 encodes a transcription factor critical for pancreatic development[20]. PDX1 mutations have been linked to MODY4 (ref. 21), pancreatic agenesis[22] and permanent neonatal diabetes[23], although we observed no significant association with T2D in DIAGRAM Metabochip analyses[24] (Fig. 1).
Given the overlap between genetic loci for fasting glucose and other metabolic traits, we performed a systematic look-up of all glycemic loci and their associations with other metabolic traits using data available through other consortia[25-27]. In DIAGRAM Metabochip analyses[24], 22 (>60%) of the now 36 genome-wide significant fasting glucose loci showed association (P < 0.05; FDR q < 0.05) with T2D (Fig. 1). In all cases, the glucose-raising allele was associated with increased risk of T2D, yet the fasting glucose effect size and T2D odds ratio were only weakly correlated (Fig. 2a).
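The look-ups above report FDR q-values alongside nominal P values. The text does not state which FDR procedure was used, so the sketch below uses the Benjamini-Hochberg step-up adjustment purely as a stand-in to show how q-values relate to a set of P values; the input values are invented.

```python
def bh_q_values(p_values):
    """Benjamini-Hochberg adjusted P values (q-values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


# Hypothetical look-up P values for four loci (not taken from the paper).
for p, q in zip([0.01, 0.04, 0.03, 0.5], bh_q_values([0.01, 0.04, 0.03, 0.5])):
    print(f"P = {p:.2f} -> q = {q:.3f}")
```

A locus can therefore be nominally significant (P < 0.05) yet fail q < 0.05, which is the distinction flagged in Figure 1.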
Figure 2
Per-allele beta-coefficients for glucose and insulin concentrations vs. odds ratios for T2D. (a) Fasting glucose vs. T2D. (b) Fasting insulin vs. T2D. (c) Fasting insulin adjusted for BMI vs. T2D. (d) 2-h glucose vs. T2D.
Gene-based analyses confirmed many of the loci identified in individual SNP analyses (Supplementary Table 3a) and identified another nine genomic regions (containing 14 genes) with significant association signals (P < 5 × 10−6), including some with biological candidacy, such as the HKDC1 gene, encoding a putative hexokinase[28].
Fasting insulin
In 108,557 individuals, we identified 17 additional loci with genome-wide significant associations with fasting insulin and confirmed known associations[1]. These newly identified loci include variants in or near HIP1, TET2, YSK4, PEPD and FAM13A1 (Table 1, Box 1 and Supplementary Figs. 3 and 4), as well as SNPs near loci previously associated with other metabolic traits, including T2D[6] (TCF7L2, PPARG), BMI[29] (FTO), waist-hip ratio[26] (WHR) (LYPLAL1, RSPO3, GRB14), triglycerides[27] (ANKRD55-MAP3K1) and adiponectin[30] (ARL15). We also confirmed the recent associations with fasting insulin at GRB14, PPP1R3B, LYPLAL1, IRS1, UHRF1BP1 and PDGFC[19]. The ANKRD55-MAP3K1 association is of interest as MAP3K1 regulates expression of IRS1 (ref. 31) as well as activation of NF-κB[32,33] and the JNK pathway[34], both centrally implicated in insulin resistance[35,36]. Furthermore, data from DIAGRAM Metabochip analyses show that the insulin-raising allele at this SNP is strongly associated with increased risk of T2D[24].
In contrast to fasting glucose (Supplementary Fig. 5a), in fasting insulin analyses adjusted for BMI, we observed a systematic decrease in the standard errors of the SNP effect estimates (Supplementary Fig. 5b), perhaps because BMI explains more of the variance in fasting insulin (R2 = 32.6%) than in fasting glucose (R2 = 8.6%) or 2hGlu (R2 = 11.0%) (data from the Fenland study). Therefore, BMI adjustment removes more variance in fasting insulin, thereby rendering genetic associations more readily detectable. This is supported by the identification of five additional loci in BMI-adjusted models by this approach (Table 1 and Supplementary Figs. 3 and 4). As expected, BMI adjustment abolished the fasting insulin association at FTO (P = 0.71; Supplementary Table 2b), suggesting that the association with fasting insulin is mediated entirely through association with BMI.
In total, 13 of the 19 loci associated with fasting insulin also showed associations with T2D (P < 0.05; FDR q < 0.05) (Fig. 1), with the insulin-raising allele associated with higher risk of T2D, except for TCF7L2 (Fig. 2b,c), where the allele associated with lower fasting insulin is associated with higher fasting glucose (Table 1). Notably, the loci associated with fasting insulin showed a pattern of association with lipid traits consistent with insulin resistance, and not observed for either fasting glucose or 2hGlu loci (Fig. 1). Thirteen (~68%) of the 19 loci were associated with HDL-cholesterol (q < 0.05): all insulin-raising alleles were associated with lower HDL levels, and nine of these were also associated with higher triglycerides (q < 0.05) (Fig. 1). Further, the insulin-raising alleles of four SNPs were associated with higher WHR (adjusted for BMI) (q < 0.05) (Fig. 1), another trait linked to insulin resistance, while five SNPs were also associated with BMI, although with inconsistent direction (q < 0.05) (Fig. 1).
In gene-based analyses, we focused on BMI-adjusted results to account for the variance in fasting insulin explained by BMI. Beyond those loci containing genome-wide significant SNPs, we identified 7 distinct regions (containing 22 genes) after Bonferroni correction (P < 5 × 10−6). Among these genes, we identified many for which prior biological evidence suggests a role in pathways involved in insulin secretion or action (Supplementary Table 3b). While the lead SNP in PPARD was not genome-wide significant (P = 3.9 × 10−6), the PPARD gene, a regulator of adipose, hepatic and skeletal muscle metabolism[37], reached the gene-based significance threshold (P < 1 × 10−6). PPARD agonists have also been shown to induce insulin-sensitizing effects in a mouse model[38]. In addition, we identified an association at PTEN (Supplementary Table 3b), a gene previously suggested to affect glucose metabolism through regulation of insulin signaling[39], and in which a muscle-specific deletion protected mice from insulin resistance and diabetes resulting from high-fat feeding[40].
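The argument made earlier in this section, that BMI adjustment shrinks standard errors most for fasting insulin, can be illustrated with a back-of-the-envelope calculation. The sketch assumes that the standard error of a SNP effect scales with the residual standard deviation of the trait and that the SNP is uncorrelated with BMI; under those assumptions, adjusting for a covariate that explains a fraction R2 of trait variance multiplies the standard error by roughly sqrt(1 − R2). The R2 values are those quoted from the Fenland study; the function name is hypothetical.

```python
import math

# Fraction of trait variance explained by BMI, as quoted in the text.
R2_BY_TRAIT = {
    "fasting insulin": 0.326,
    "fasting glucose": 0.086,
    "2hGlu": 0.110,
}


def se_ratio_after_adjustment(r2):
    """Approximate SE(adjusted) / SE(unadjusted) for a SNP effect.

    Assumes the SE is proportional to the residual standard deviation
    and that the SNP is independent of the covariate.
    """
    return math.sqrt(1.0 - r2)


for trait, r2 in R2_BY_TRAIT.items():
    ratio = se_ratio_after_adjustment(r2)
    print(f"{trait}: SE ratio ~ {ratio:.3f} ({(1 - ratio) * 100:.1f}% smaller)")
```

Under these simplifying assumptions, the expected reduction is about 18% for fasting insulin but only about 4% for fasting glucose and 6% for 2hGlu, in line with the systematic decrease being evident mainly for fasting insulin.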
2-h glucose
In 42,854 individuals, we identified four additional loci associated with 2hGlu (Table 1 and Supplementary Figs. 3 and 4), including a signal near ERAP2 and three signals near loci previously associated with fasting glucose[1] (GCK), HDL-cholesterol[27] (PPP1R3B) and T2D[6] (IGF2BP2), and confirmed the five previous associations[2]. To determine whether these associations reflected differences in the response to a glucose challenge, or were partly driven by effects on fasting glucose, we also performed analyses adjusted for fasting glucose. No additional loci were identified as genome-wide significant after adjustment for fasting glucose, although the GCK association with 2hGlu was substantially attenuated (β = 0.04 (SE = 0.016) mmol/L/allele, P = 0.005 vs. β = 0.1 (0.016) mmol/L/allele, P = 5.3 × 10−11 in the model unadjusted for fasting glucose), suggesting that the association with 2hGlu is driven, at least in part, by a primary association with fasting glucose (Supplementary Table 2d). The association of SNPs near GCK with both fasting glucose and 2hGlu suggests a generalized raising of the glucose set point, consistent with inactivating mutations of GCK that cause MODY[41]. As for fasting glucose, when 2hGlu models were adjusted for BMI, no systematic differences were observed, although again the IGF2BP2 SNP rs7651090 reached genome-wide significance (Table 1).
Eight of the 9 SNPs associated with 2hGlu at genome-wide levels of significance were also associated with T2D (q < 0.05) (Fig. 1), although the 2hGlu-raising alleles at PPP1R3B, GCKR and VPS13C-C2CD4A/B were associated with lower risk of T2D (Fig. 2d), consistent with their association with lower fasting glucose levels (Table 1 and Supplementary Table 2e).
In addition to SNPs that were genome-wide significant in individual SNP analyses, we identified three regions (containing six genes) showing association with 2hGlu in gene-based analyses. These included the HKDC1 gene, as well as an association signal at CRHR1 (P = 2 × 10−6) (Supplementary Table 3c), mostly driven by the lead SNP in this gene (rs17762954), which approached genome-wide significance (P = 7.4 × 10−7). CRHR1, together with GIPR, belongs to the family of class B GPCRs and is highly expressed in pancreatic β-cells, where stimulation of the receptor potentiates insulin secretion in response to glucose[42].
Fine-mapping of established loci
Analyses at higher SNP density around previously established loci did not generally yield stronger associations or more plausible functional variants (Supplementary Table 4). For fasting glucose, SNPs that were markedly more significant, or had a larger effect size, than the previous lead SNP were observed at four of the 16 loci: PROX1, GCK, ADRA2A and VPS13C-C2CD4A/B (Supplementary Table 4). Regional plots for these loci are shown in Supplementary Figure 6. While the new lead SNP near ADRA2A was not markedly more significant than the previous lead SNP, its effect size is almost double that of the previous lead SNP (Supplementary Table 4). However, neither this nor the other new lead SNPs had more plausible functionality. The new lead SNP at VPS13C-C2CD4A/B, previously associated with proinsulin[43], is far more significant and of larger effect size than the previous lead SNP (β = 0.0273 (SE = 0.0035) mmol/L/allele, P = 4.8 × 10−15 vs. β = 0.0057 (0.0036) mmol/L/allele, P = 0.111; r2 = 0.27). For fasting insulin, another SNP downstream of IGF1 was found to be more significant and to have a larger effect size, although with no known functionality (Supplementary Table 4 and Supplementary Fig. 6). For 2hGlu, again, another SNP at VPS13C-C2CD4A/B was more significant than the previous lead SNP (Supplementary Table 4 and Supplementary Fig. 6) and was previously associated with diabetes in Chinese individuals[44].
Pathway analysis
Next, we explored whether glycemic loci were enriched for connectivity between genes representing particular pathways or processes. To do this, we used GRAIL software[45] and investigated excess connectivity first among the established (genome-wide significant) loci and then between established loci and those loci that did not reach genome-wide significance but that showed a lower level of association (P < 0.0005) (Online Methods). We aimed to establish whether there were any biologically relevant genes among this longer list of suggestively associated loci. This more liberal threshold yielded 218, 155 and 100 regions for fasting glucose, fasting insulin and 2hGlu, respectively. To further assess whether the established loci represented common biological pathways, we used MAGENTA to undertake gene-set enrichment analyses (Online Methods).
We found that genes near the 36 loci associated with fasting glucose had a high degree of connectivity (see Online Methods for how genes were selected), with eight genes demonstrating highly significant similarity to genes in other loci at Pgrail < 0.01, connected by keywords such as “glucose”, “insulin”, “pancreatic” and “diabetes” (Supplementary Table 5a and Supplementary Fig. 7), and more than expected by chance (P = 0.003). We observed less connectivity among the genome-wide significant fasting insulin and 2hGlu loci, with no genes reaching Pgrail < 0.01 for fasting insulin (Supplementary Table 5b) and only one out of nine genes for 2hGlu (P = 0.07) (Supplementary Table 5c).
Among the list of 218 suggestively associated fasting glucose loci (P < 0.0005), we observed 13 genes to be connected to those in the genome-wide significant loci at Pgrail < 0.01, more than expected by chance (P = 0.003) (Supplementary Table 6a). These included genes such as GLP1R (P = 3.3 × 10−7; encoding the glucagon-like peptide 1 receptor, which mediates the GLP-1 incretin effect and stimulates insulin release), IRS2 (P = 6.9 × 10−5; central to development and maintenance of β-cell mass and function[46,47]) and INS (P = 2.5 × 10−6; the insulin gene encoding proinsulin). The presence of these and other genes supports our conjecture that many of the SNPs approaching genome-wide significance are likely to represent true associations. Of the 155 suggestively associated loci for fasting insulin (adjusted for BMI), we observed seven to be connected to the genome-wide significant loci at Pgrail < 0.01, more than expected by chance (P = 0.002); these included INSR (P = 1.5 × 10−4; encoding insulin receptor precursor), CD36 (P = 0.001; previously implicated in insulin resistance[48]), GCG (P = 0.008; glucagon gene) and HNF1A (P = 0.005; mutations in which are associated with MODY3)[49] (Supplementary Table 6b). Of the 100 suggestively associated loci for 2hGlu (P < 0.0005), we found three to reach P < 0.01 (P = 0.014), and the gene highlighted as most biologically connected to the genome-wide significant loci was again HNF1A (P = 3.4 × 10−4) (Supplementary Table 6c).
Using MAGENTA, we identified four pathways enriched for fasting glucose associations: the GOTERM pathway “lens development in camera-type eye” (P = 0.004), the PANTHER processes “gut mesoderm development” (P = 0.009) and “other steroid metabolism” (P = 0.02), and the KEGG MODY pathway (P = 0.03). These were no longer significant after removing lead genes (P > 0.05), all of which were known fasting glucose loci: PROX1 for eye and gut, and G6PC2 and GCK for steroid and MODY pathways, respectively.
Directional consistency of associations between discovery and follow-up studies
Given the wealth of biologically plausible genes in loci near genome-wide significance (Supplementary Tables 6a-c) and the deviation of the observed distribution from the expected in QQ plots even after removing all established loci (Supplementary Fig. 8a-d), we hypothesized that additional loci not reaching genome-wide significance were likely to represent true associations with small effects. To establish the presence of further true associations that did not reach genome-wide significance, we compared SNP associations in discovery studies (those included in the original meta-analyses for 42,078 (fasting glucose), 34,230 (fasting insulin) and 15,252 (2hGlu) individuals[1,2]) with those in the “follow-up” studies (consisting of 85,710 (fasting glucose), 69,240 (fasting insulin) and 27,602 (2hGlu) individuals). We identified all SNPs that had a nominally significant association (P < 0.05) in the follow-up studies alone and, for these SNPs, performed a binomial test of whether more SNPs than expected by chance (50%) had a direction of effect consistent with that observed in the discovery analyses. We were also able to compare among SNPs nominated for follow-up by different consortia (Supplementary Fig. 9a-d).
For each trait, evaluation of the 66,000 Metabochip follow-up SNPs demonstrated a significant excess of SNPs showing directionally consistent associations (P < 0.05) compared to that expected by chance (fasting glucose: P = 5.01 × 10−12; fasting insulin: P = 7.58 × 10−13; fasting insulin (adjusted for BMI): P = 9.76 × 10−9; 2hGlu: P = 2.37 × 10−6; Supplementary Table 7 and Supplementary Fig. 9a-d). FDR analyses suggested that a proportion of these nominal associations in the follow-up studies are true positives, for fasting glucose and fasting insulin in particular (fasting glucose: 23%; fasting insulin: 24%; Supplementary Table 7). Interestingly, when we evaluated consistency of association with fasting insulin (between discovery and follow-up) among SNPs submitted to the Metabochip by other consortia, SNPs submitted by GIANT (anthropometric traits) (P = 1.52 × 10−8) and GLGC (lipid traits) (P = 1.15 × 10−6), in particular those for BMI and triglycerides, also demonstrated a marked excess of directional consistency (Supplementary Table 7 and Supplementary Fig. 9b). When we performed the same test for fasting insulin adjusted for BMI, the observed enrichment among SNPs submitted by GIANT and GLGC was attenuated (Supplementary Table 7 and Supplementary Fig. 9c), although SNPs nominated to follow up on triglyceride associations remained the most significant (P = 3.18 × 10−7, Supplementary Table 7 and Supplementary Fig. 9c). Of the 3,353 SNPs submitted for follow-up of triglyceride associations, 158 SNPs showed nominal significance (P < 0.05) in follow-up studies and consistent direction of association with fasting insulin (adjusted for BMI) in both discovery and follow-up (Supplementary Table 7). In 139 (88%) of these SNPs, the insulin-raising alleles were associated with higher levels of triglycerides, consistent with the positive correlations between fasting insulin and triglyceride associations observed among the genome-wide significant fasting insulin loci (Fig. 1).
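The directional-consistency test described in this section is a binomial sign test against a null proportion of 50%. A minimal sketch follows, assuming a one-sided exact test (the text asks whether more SNPs than expected by chance are consistent, but does not give the exact implementation); the function name and the example counts are hypothetical.

```python
from math import comb


def sign_test_p(n_consistent, n_total):
    """One-sided exact binomial P(X >= n_consistent) with p = 0.5.

    X counts SNPs whose follow-up effect direction matches discovery,
    out of n_total nominally significant follow-up SNPs.
    """
    if not 0 <= n_consistent <= n_total:
        raise ValueError("n_consistent must lie between 0 and n_total")
    # Exact integer sum of the upper tail, divided by 2**n_total.
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


# Hypothetical counts: 600 of 1,000 nominally significant SNPs consistent.
print(f"P = {sign_test_p(600, 1000):.2e}")
```

Using exact integer arithmetic for the tail sum avoids floating-point underflow in the individual binomial terms when the number of SNPs is large.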
DISCUSSION
In the current meta-analysis of ~66,000 Metabochip follow-up SNPs in up to 133,010 individuals, we identified a large number of loci to be associated with glycemic traits, explaining 4.8%, 1.2% and 1.7% of the variance in fasting glucose, fasting insulin and 2hGlu, respectively. Of the 53 glycemic loci, 33 are also associated with increased T2D risk (q < 0.05), extending the overlap between glycemic and T2D loci. Given the current DIAGRAM effective sample size of 106,953 individuals, we can exclude an effect on T2D of 1.04 with 80% power for alleles more frequent than 5%, effectively confirming that the overlap is incomplete and that many loci associated with glycemic traits have no discernible effect on T2D (Figs. 1 and 2).
Previously, we had detected only two loci associated with fasting insulin and had hypothesized that this might be due to a different genetic architecture of this trait compared to fasting glucose, with potentially smaller effect sizes, lower frequency alleles or greater environmental influence on fasting insulin[1]. In the current meta-analysis including up to 108,557 individuals (compared to 62,264 individuals previously), we expanded the number of loci associated with this trait to 19. Of note was the effect of BMI-adjustment on our ability to detect additional loci (five non-overlapping with unadjusted results)[19]. We also noted that some of the loci influencing fasting insulin uncovered after BMI-adjustment are likely to have been negatively confounded in previous efforts: at some loci, the insulin-raising allele was nominally associated with lower BMI (potentially via insulin resistance attenuating the anabolic effects of insulin). Given the positive correlation between BMI and fasting insulin, it is likely that this association previously masked their effect on fasting insulin. 
Fasting insulin loci showed directionally consistent association with lipid levels (HDL and triglycerides); that is, the insulin-raising allele was associated with lower HDL and higher triglyceride levels, a hallmark combination in insulin resistant individuals. We also observed some overlap between fasting insulin loci and those associated with abdominal obesity (Fig. 1). Jointly, these data suggest links of these fasting insulin loci to insulin resistance-related phenotypes. Indeed, some of the fasting insulin loci identified, such as IRS1 and PPARG, are classically known to exert effects on insulin action or sensitivity[50,51].
There are now 36 established fasting glucose loci, many of which contain compelling biological candidate genes with plausible causality, including those encoding transcription factors with known roles in pancreas development (e.g. PDX1, FOXA2, PROX1, GLIS3) and genes involved in β-cell function and insulin secretion pathways (SLC2A2, GCK, PCSK1). For 2hGlu, only nine loci have been established to date, which likely reflects the smaller sample size available and consequently reduced power.
Comparing the consistency of the direction of associations for glycemic traits between “discovery” and “follow-up” studies suggests that we are observing more directionally consistent associations than expected by chance among Metabochip follow-up SNPs (Supplementary Fig. 9a-d). This, combined with the excess of biologically plausible genes among the borderline loci (Supplementary Table 6a-c), suggests that beyond the genome-wide significant loci there is a more extensive list of loci still likely to contain true associations. Indeed, some of these loci are implicated by gene-based analyses, which identify genes with compelling biological credentials. For fasting insulin, these analyses revealed additional loci with previously suggested links to insulin resistance (PPARD and PTEN). 
These results lend further support to the proposal that a long tail of common variants of small effect size are likely to account for a substantial proportion of the variance of complex traits[7,8].
Of note is the number of glycemic loci associated with other metabolic traits (q < 0.05; 34 of 53) and also at genome-wide levels of significance (P < 5 × 10−8) (14 of 53) (Fig. 1), potentially implicating pleiotropic effects. Further support for this notion comes from the analysis of loci nominated for the Metabochip by other consortia and their associations with glycemic traits (Supplementary Fig. 9a-d). Indeed, some of the loci associated with glycemic traits at genome-wide significance levels were not originally nominated to the Metabochip for follow-up by MAGIC (Table 1). Metabochip data available across all contributing consortia will facilitate systematic exploration of these correlated phenotypes with more sophisticated statistical methods for joint analysis[52-54], yielding greater insight into the underlying pathways and genetic networks they represent. As data from human genetic networks accrue, we will be better placed to test whether there is support for the notion of “hub” genes; that is, genes highly connected with others in the network and proposed by experiments in C. elegans to act as buffers for genetic variation and that could act as modifier genes for many different disorders[55].
In summary, we present a large number of genome-wide significant loci influencing glycemic traits, many with a compelling biological basis for their association, as well as a number of loci not previously implicated in glycemic regulation, and for which fine-mapping and functional follow-up will expand and improve our understanding. Use of the Metabochip for deep follow-up has identified additional loci to be involved in glycemic regulation that, due to insufficient sample size and power, did not reach genome-wide significance. 
Consideration of such loci in future studies will better exploit data from GWAS and complementary approaches and further improve our biological understanding of glycemic control and the etiology of diabetes.
ONLINE METHODS
Study design
The Illumina CardioMetabochip (Metabochip) is a custom Illumina iSELECT array of 196,725 SNPs. It has been designed to support efficient large-scale follow-up of putative associations for glycemic (including fasting glucose, fasting insulin and post-challenge glucose concentrations (2hGlu)) and other metabolic and cardiovascular traits (Supplementary Fig. 1)[9], and to enable the fine-mapping of established loci. Overall, there were 65,435 SNPs genotyped on the Metabochip for follow-up of previous associations, covering a total of 23 cardio-metabolic traits. Traits contributing SNPs to the Metabochip were prioritized into “primary” (including fasting glucose) and “secondary” (including fasting insulin and 2hGlu), contributing ~5K and ~1K SNPs, respectively, from the most significantly associated variants for each phenotype in the discovery meta-analyses from each contributing consortium. This included 5,055 SNPs for follow-up of fasting glucose, 1,046 for fasting insulin and 1,038 for follow-up of 2hGlu associations. In the present study, we focused our analysis on this set of “follow-up” SNPs available on the Metabochip to establish which of these SNPs are associated with glycemic traits. While we also included newly available studies genotyped on genome-wide platforms, we limited our primary analyses to only these ~66,000 SNPs.
Studies
In the present effort, collaborating studies within the Meta-Analysis of Glucose and Insulin related traits Consortium (MAGIC) provided results for the 66,000 “follow-up” SNPs genotyped on Metabochip on a maximum total of 133,010 (fasting glucose), 108,557 (fasting insulin) and 42,854 (2hGlu) individuals. As well as those newly genotyped on the Metabochip platform, in our overall meta-analysis we were able to include further studies which had genotyped or imputed the same SNPs on other platforms. The largest proportion of our entire sample was directly genotyped on the Metabochip and comprised 53,622 (fasting glucose), 42,384 (fasting insulin) and 27,602 (2hGlu) individuals from 26/21/12 studies, respectively. We were also able to recruit 11,690 (fasting glucose) and 8,813 (fasting insulin) individuals from up to four additional GWA studies (Prevend, Ascot (FG-only), Prosper and TRAILS) (Supplementary Table 1) not included in the original meta-analysis[1]. From another MAGIC study of sex-specific associations with glycemic traits (I. Prokopenko on behalf of the MAGIC authors, personal communication), we were able to recruit another 15/13 independent studies comprising up to 25,618 and 23,130 individuals for fasting glucose and fasting insulin, respectively. The above studies were combined in a single fixed-effects meta-analysis with those studies included in the original GWAS[1,2]: 20 (fasting glucose), 19 (fasting insulin) and 9 (2hGlu) studies and 42,080 (fasting glucose), 34,230 (fasting insulin) and 15,252 (2hGlu) individuals, as described previously[1,2]. The study and individual counts from the original GWAS excluded the family-based SardiNIA study where, initially, a large number of the individuals had imputed genotype data only. The entire sample was directly genotyped on Metabochip, so those data were included in place of the original GWAS. 
Some studies had genotyping data available from both Metabochip and genome-wide arrays but from entirely independent samples within the studies (Supplementary Table 1). Full study characteristics of all Metabochip studies are shown in Supplementary Table 1, while data from discovery genome-wide studies and those from the sex-specific analyses are reported elsewhere[1,2] (I. Prokopenko on behalf of the MAGIC authors, personal communication). All participants of the main analysis were of European descent and mostly adults, although data from a total of 7,872 and 7,164 adolescents were also included in the fasting glucose and fasting insulin meta-analyses, respectively (NFBC86, Leipzig-childhood_IFB, TRAILS and ALSPAC studies). All studies were approved by local research ethics committees and all participants gave informed consent. Results from the CLHNS study of Filipino women (n = 1,682 and 1,635 for fasting glucose and fasting insulin, respectively) genotyped on Metabochip were also available and were included in supplementary analyses to compare effect directions with European-descent studies alone.
Phenotypes
Analyses were undertaken for fasting glucose and fasting insulin measured in mmol/l and pmol/l, respectively. 2hGlu was measured in mmol/l. Similar to the previous MAGIC discovery analysis[1,2], individuals were excluded from the analysis if they had a physician diagnosis of diabetes, were on diabetes treatment (oral or insulin), or had a fasting plasma glucose equal to or greater than 7 mmol/l. Individual studies applied further sample exclusions, including pregnancy, non-fasting individuals and type 1 diabetes, as detailed in Supplementary Table 1. Individuals from case-control studies (Supplementary Table 1) were excluded if they had been hospitalized or had received a blood transfusion in the 2-3 months before phenotyping took place. 2hGlu was measured 120 min after a glucose challenge during an oral glucose tolerance test (OGTT). Measures of fasting glucose and 2hGlu made in whole blood were corrected to plasma level using the correction factor of 1.13 (ref. 87). Fasting insulin was measured in serum. Detailed descriptions of study-specific glycemic measurements are given in Supplementary Table 1.
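The consortium-wide exclusion rule and the whole-blood correction described above can be written as a minimal sketch. The function and variable names below are hypothetical and chosen for illustration only; study-specific exclusions (Supplementary Table 1) are not modeled.

```python
# Illustrative sketch only; names are hypothetical, not taken from the study code.
WHOLE_BLOOD_TO_PLASMA = 1.13   # correction factor stated in the text
FPG_EXCLUSION_MMOL_L = 7.0     # fasting plasma glucose exclusion threshold (mmol/l)


def to_plasma_glucose(value_mmol_l, measured_in_whole_blood):
    """Return a plasma-equivalent glucose value in mmol/l."""
    if measured_in_whole_blood:
        return value_mmol_l * WHOLE_BLOOD_TO_PLASMA
    return value_mmol_l


def is_excluded(diagnosed_diabetes, on_diabetes_treatment, fasting_plasma_glucose):
    """Apply the three consortium-wide exclusion criteria."""
    return (diagnosed_diabetes
            or on_diabetes_treatment
            or fasting_plasma_glucose >= FPG_EXCLUSION_MMOL_L)
```

Under this sketch, a whole-blood fasting glucose of 6.3 mmol/l corresponds to a plasma value above 7 mmol/l and would therefore be excluded.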
Trait transformations and adjustment
Analyses were performed for untransformed levels of fasting glucose, natural logarithm transformed fasting insulin and untransformed 2hGlu using a linear regression model. All analyses were adjusted for age (if applicable), study site (if applicable) and geographical covariates (if applicable) to evaluate the association at each SNP using an additive genetic model.
BMI-adjusted analysis
In the Fenland study (Supplementary Table 1), we investigated the correlation between BMI and natural logarithm transformed fasting insulin, fasting glucose and 2hGlu to establish the variation in each trait explained by BMI. Meta-analyses for each trait were also adjusted for body mass index (BMI). Metabochip and new GWA studies performed study-level analyses adjusted for BMI. Most studies in the original GWAS (except deCode, GEMs, KORAF4, TwinsUK studies) as well as from the studies analyzed in a sex-specific manner were included in BMI-adjusted meta-analysis. The original discovery 2hGlu meta-analysis adjusted for BMI[2] was also included in these analyses. We also performed an analysis for 2hGlu adjusted for fasting glucose to investigate if additional variants would be identified with an effect on 2hGlu independent of fasting glucose and also to establish whether identified 2hGlu associations were driven by fasting glucose.
Genotyping and quality control
The Metabochip or other commercial genome-wide arrays were used by individual studies for genotyping. Details are presented in Supplementary Table 1 or are reported elsewhere[1,2]. The quality control criteria for both Metabochip and genome-wide arrays for filtering of poorly genotyped individuals or low quality SNPs prior to imputation included: (i) call rate < 0.95; (ii) sex-discrepancies; (iii) ethnic outliers; (iv) heterozygosity (Supplementary Table 1); (v) SNP minor allele frequency < 0.01; (vi) SNP Hardy-Weinberg equilibrium P < 10−4; (vii) SNP effect estimate standard error (SE) ≥10; (viii) SNPs minor allele count (MAC) < 10 (calculated as total number of observed alleles at each SNP multiplied by MAF).
Studies with genome-wide arrays undertook imputation against the HapMap CEU reference panel using MACH and IMPUTE software (Supplementary Table 1). Parameters used in imputation and filters applied to imputed genotypes are described in Supplementary Table 1 or reported previously[1,2]. From a total of ~2.5M genome-wide directly genotyped or imputed autosomal SNPs, study-specific results for the ~66,000 Metabochip follow-up SNPs were considered for the present meta-analyses. SNPs with a meta-analysis result for more than 10,000 individuals in total were included in the analysis.
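As an illustration of the SNP-level filters (v)-(viii) listed above, the following sketch computes MAC as the number of observed alleles multiplied by MAF and applies the stated thresholds. It assumes diploid autosomal genotypes (two alleles per genotyped individual); the function names are hypothetical, and the sample-level criteria (i)-(iv) are not modeled.

```python
# Illustrative sketch only; names are hypothetical and autosomal diploid
# genotypes are assumed (two observed alleles per genotyped individual).

def minor_allele_count(n_genotyped, maf):
    """MAC = total number of observed alleles at the SNP multiplied by MAF."""
    return 2 * n_genotyped * maf


def passes_snp_qc(maf, hwe_p, effect_se, n_genotyped):
    """Return True if a SNP survives filters (v)-(viii) from the text."""
    if maf < 0.01:                                   # (v) minor allele frequency
        return False
    if hwe_p < 1e-4:                                 # (vi) Hardy-Weinberg equilibrium
        return False
    if effect_se >= 10:                              # (vii) effect estimate SE
        return False
    if minor_allele_count(n_genotyped, maf) < 10:    # (viii) minor allele count
        return False
    return True
```

The MAC filter matters mainly in small studies: a SNP with MAF 0.012 passes the frequency filter but has fewer than 10 minor alleles in a study of 400 individuals.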
Statistical analysis
Analyses of previous discovery studies are reported elsewhere[1,2], while those studies genotyped on the Metabochip are described in Supplementary Table 1. SNP effect estimates and their standard errors (for additive genetic model) were combined by inverse-variance weighted fixed effects meta-analysis using METAL[88] and GWAMA[89]. Two parallel meta-analyses for each trait by different analysts were compared for consistency. Individual cohort results were corrected for residual inflation of the test statistics using lambda of genomic control (GC) estimates. For the GWA studies, the GC values were estimated for each study using test statistics from all SNPs, while for those studies genotyped on the Metabochip, GC lambda estimates were derived from test statistics for 5,041 SNPs selected for follow-up of QT-interval associations, as we perceived these to have the lowest likelihood of common architecture of associations with glycemic traits. Individual study-level lambda GC estimates are shown in Supplementary Table 1. Overall QQ plots for the QT follow-up SNPs are shown in Supplementary Figure 10.
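For readers unfamiliar with these steps, the sketch below shows the generic form of genomic control and inverse-variance weighted fixed-effects meta-analysis for a single SNP. It is not the METAL or GWAMA implementation; the function names are hypothetical, and applying GC by inflating standard errors by the square root of lambda (only when lambda exceeds 1) is the conventional approach and is assumed here rather than stated in the text.

```python
import math
from statistics import median

# Median of a 1-df chi-square distribution, used as the null reference for lambda.
CHI2_1DF_MEDIAN = 0.4549364


def genomic_control_lambda(z_scores):
    """Lambda GC: median observed chi-square divided by its null expectation."""
    chi2 = [z * z for z in z_scores]
    return median(chi2) / CHI2_1DF_MEDIAN


def gc_correct_se(se, lam):
    """Inflate a standard error by sqrt(lambda); no change if lambda <= 1 (assumed)."""
    return se * math.sqrt(max(lam, 1.0))


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate for one SNP across studies."""
    weights = [1.0 / (s * s) for s in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal P-value
    return beta, se, z, p
```

In practice, `gc_correct_se` would be applied to each study's standard errors with that study's lambda before the per-SNP call to `fixed_effects_meta`.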
Trait-associated signal selection strategy
Meta-analysis results for each trait were considered as genome-wide significant if they achieved P ≤ 5 × 10−8 threshold and were not in LD (r2 < 0.05) or within 500 kb of an established signal. The most significantly associated SNP (lowest P-value) in each region (500 kb) was selected as the lead SNP. Associated loci are referred to by the name of the nearest gene, unless a more biologically plausible gene was nearby, or a nearby gene was previously associated with another trait. In such cases, we maintain consistency with the previous naming, but list the nearest genes in Supplementary Tables 2a-d. To establish the variance in each trait explained by these SNPs, in the Framingham Heart Study, we included all SNPs in a model adjusted for age, sex, BMI and cohort.
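One simple way to realize the position-based part of this selection is a greedy pass over SNPs sorted by P-value, sketched below. The text does not state the exact algorithm used, so this is an assumption; the dictionary keys are hypothetical, and the LD criterion and the exclusion of established signals are not modeled.

```python
# Illustrative greedy sketch; the dictionary keys are hypothetical and the
# LD (r2) criterion and established-signal exclusion are not modeled.

def select_lead_snps(snps, window=500_000, p_threshold=5e-8):
    """Pick the lowest-P SNP per region, skipping SNPs within `window` bp of a lead."""
    hits = sorted((s for s in snps if s["p"] <= p_threshold), key=lambda s: s["p"])
    leads = []
    for snp in hits:
        far_from_all_leads = all(
            snp["chrom"] != lead["chrom"] or abs(snp["pos"] - lead["pos"]) > window
            for lead in leads
        )
        if far_from_all_leads:
            leads.append(snp)
    return leads
```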
Fine-mapping of known glycemic trait loci
To undertake preliminary fine-mapping analyses, we investigated the patterns of association at 17 known fasting glucose and fasting insulin loci[1] and 5 known 2hGlu loci[2] using meta-analysis results from 13,644, 1,309 and 1,249 SNPs genotyped on the Metabochip in 53,622, 42,384 and 27,602 individuals for fasting glucose, fasting insulin and 2hGlu, respectively. Only studies genotyped directly on the Metabochip were used for fine-mapping purposes in order to have equal sample size and availability of all SNPs. Regional plots for each locus were created using the previous lead SNP[1] or a suitable proxy (r2 > 0.8) as the index SNP if that marker was not present on Metabochip. The plots were generated on the LocusZoom web-based plotting software[90] using LD information from the 1000 Genomes Project (hg19/Nov2010EUR data). Prior to generating the plots, all SNP names and positions from the Metabochip-only meta-analysis files were aligned to build37 using the Lift Genome annotation tool on the UCSC website (http://genome.ucsc.edu/cgi-bin/hgLiftOver) in order to be compatible with the 1000 Genomes SNP naming format (chr:position) and allow more thorough assessment of the pairwise LD patterns around the established SNPs.
Associations of glycemic trait variants with related traits
For those SNPs which we identified to be genome-wide significant, we also investigated their association with other metabolic and disease traits. We exchanged reciprocal data for such SNPs with the latest DIAGRAM Metabochip analyses[24], and checked associations of these SNPs in publicly available data from previous studies of lipid traits from the GLGC[27] (triglycerides, HDL- and LDL-cholesterol; http://www.sph.umich.edu/csg/abecasis/public/lipids2010/) as well as BMI and waist-hip ratio (WHR) from the GIANT consortium[25, 26] (http://www.broadinstitute.org/collaboration/giant/index.php/Main_Page). From these data, we were able to establish the presence of any association and the direction of effect for these other traits aligned to our trait-raising alleles. We highlighted associations with other traits at P < 0.05, and also performed FDR analyses. We performed FDR analyses for each trait separately (removing duplicate loci that were associated with more than one glycemic trait) and identified those where q < 0.05.
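The text does not specify which FDR procedure was used to obtain q-values. As a stand-in, the sketch below computes Benjamini-Hochberg adjusted P-values, which play the same role of controlling the expected proportion of false discoveries among associations called significant; the function name is hypothetical.

```python
# Illustrative sketch: Benjamini-Hochberg adjusted P-values, used here as a
# stand-in because the exact q-value procedure is not stated in the text.

def bh_adjusted(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

Associations would then be highlighted where the adjusted value is below 0.05, computed separately for each related trait.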
eQTL analyses
Liver gene expression data from the Advanced Study of Aortic Pathology (ASAP) study have been described previously[91]. In brief, liver biopsies were collected from patients at the Karolinska University Hospital, Stockholm, Sweden undergoing aortic valve surgery alone or combined with surgery for aortic aneurysm, starting from February 13, 2007. All subjects gave their informed consent and the study was approved by the ethics committee of Karolinska Institute, Stockholm, Sweden. After hybridization of extracted RNA to Affymetrix ST 1.0 Exon arrays, data were RMA normalized and log-transformed. DNA was extracted from whole blood and genotyping was carried out using the Illumina 610w-Quad bead array platform. Imputation was carried out on SNPs with a call rate exceeding 95%, using the MACH algorithm. SNPs with an imputation quality score (RSQ) < 0.3 were excluded from analysis. An additive genetic model was used to test for association between SNPs and gene expression.
VEGAS
To identify genes with multiple associated SNPs, we performed gene-based analysis using VEGAS, described in detail previously[10]. Briefly, on all available samples and among the ~66,000 follow-up SNPs, VEGAS pooled the information for all SNPs within each gene (± 50 kb) to identify genes with higher evidence of association than expected by chance, while adjusting for gene size and the linkage disequilibrium structure of the SNPs, by simulation (maximum number of simulations used was 10^6). We identified genomic regions (separated by >1 Mb) showing evidence of association and described the genes contained within those regions. While we often identified multiple genes within an associated region, it is probable that some of these are significant via LD. Bonferroni correction was used to adjust for multiple testing, based on the number of independent tests (number of genes tested) (~9,300) and P-values < 5.0 × 10−6 were considered significant. While the number of genes represented was constrained by those SNPs submitted to the Metabochip, our analyses asked the question: of the genes represented on the Metabochip, all with a slightly raised prior likelihood of association, which genes show the most evidence for association with glycemic traits?
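Two pieces of arithmetic in this description can be made concrete: the assignment of SNPs to a gene using a ± 50 kb flank, and the Bonferroni threshold for ~9,300 genes. The sketch below is illustrative only; the function names are hypothetical and it does not reproduce the VEGAS simulation itself.

```python
# Illustrative sketch only; names are hypothetical and the VEGAS simulation
# procedure itself is not reproduced here.

def snps_in_gene(snp_positions, gene_start, gene_end, flank=50_000):
    """Return SNP positions lying within the gene boundaries plus a flank on each side."""
    return [p for p in snp_positions if gene_start - flank <= p <= gene_end + flank]


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests
```

With alpha = 0.05 and 9,300 genes, `bonferroni_threshold` gives approximately 5.4 × 10−6, close to the 5.0 × 10−6 cutoff quoted above.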
GRAIL
We used GRAIL[45] to evaluate whether genome-wide loci associated with glycemic traits were enriched for connectivity between genes representing particular pathways or molecular processes. As described in detail previously[45], to define the genes near each SNP, GRAIL finds the furthest neighboring SNPs in the 3′ and 5′ direction in LD (Hapmap CEU: r2 > 0.5) and proceeds outwards in each direction to the nearest recombination hotspot[92]. All genes that overlap that interval are considered implicated by the SNP. If there are no genes in that region, the interval is extended by 250 kb in either direction. The method performs a text-based analysis looking at abstracts in PubMed prior to December 2006 (to avoid confounding from GWAS results arising after that date). We performed two analyses for each trait: first, we took all genome-wide signals for each trait as seed and query loci to investigate biological connectivity among those loci (fasting glucose = 35, fasting insulin = 16, 2hGlu = 9). For fasting insulin, we did not include FTO as the association with fasting insulin was entirely mediated by BMI. Secondly, we also investigated connectivity between these established signals (as seed regions) and those which did not reach genome-wide significance but were suggestively associated with each trait (P < 0.0005) (as query regions) as described previously[93]. For fasting insulin, we used BMI-adjusted results to define the query regions. Query regions were defined by taking all SNPs more significant than P < 0.0005, removing those associated at genome-wide levels of significance and pruning SNPs of r2 > 0.05 in each region using PLINK[94]. As GRAIL tests connectivity of regions, we also removed any duplicates where a region was represented by more than one SNP. For those SNPs not found by the software, we submitted the region as a 500 kb window centered at the location of the SNP. 
This approach identified 218, 155 and 100 query regions (representing 715, 639 and 298 genes) for fasting glucose, fasting insulin (adjusted for BMI) and 2hGlu, respectively. The number of loci reaching Pgrail < 0.01 was identified from these analyses, and to establish the level of enrichment, we randomly sampled 1,000 sets of matched numbers of SNPs and calculated the proportion with as many or more reaching Pgrail < 0.01 to derive a permutation-based P-value (P).
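The permutation-based P-value described above is simply the proportion of random SNP sets that yield at least as many loci with Pgrail < 0.01 as the observed set. A minimal sketch follows; the function name and the example counts are hypothetical and are not results from this study.

```python
# Illustrative sketch only; the function name and inputs are hypothetical.

def permutation_p(observed_count, null_counts):
    """Proportion of random sets with as many or more hits than observed."""
    as_extreme = sum(1 for c in null_counts if c >= observed_count)
    return as_extreme / len(null_counts)
```

For example, if 12 of 1,000 random sets matched or exceeded the observed number of loci with Pgrail < 0.01, the permutation-based P-value would be 0.012.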
Pathway analyses
Pathway analysis was carried out for fasting glucose, fasting insulin and 2hGlu (uniform/FG-BMI adjusted) using data from previous discovery GWAS only[1] to avoid bias towards pathways represented on the Metabochip (build36, n > 10,000 and MAF ≥ 1% cutoff used). The software used for this analysis was MAGENTA 2.4 (July 2011, http://www.broadinstitute.org/mpg/magenta/). SNPs from the meta-analysis file were assigned to a gene if they mapped within 110 kb upstream and 40 kb downstream of transcript boundaries. The smallest P-value for the set of SNPs assigned to the gene was adjusted for confounders, such as gene length, marker density and LD in a linear regression, creating a gene association score. If a top SNP was assigned to multiple genes, only the gene with the lowest score was kept to avoid positional clustering. The HLA region was removed due to high LD and gene density. Pathway terms from multiple databases (GO, PANTHER, Ingenuity, KEGG) were attached to each gene. The genes were ranked on their association score, and a GSEA test was performed testing all pathway terms using a 5% and 75% cutoff. Initially, 10,000 gene set permutations were performed for GSEA P-value estimation. For pathway terms with GSEA P < 1 × 10−4, this number was then increased, with up to 1,000,000 permutations performed. Results were sorted on FDR, and FDR < 0.05 was considered to be significant.
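The SNP-to-gene assignment rule can be illustrated with a short sketch. It assumes that "upstream" and "downstream" are interpreted relative to the direction of transcription, so the window is mirrored for genes on the minus strand; this strand handling and all names below are assumptions for illustration and are not taken from the MAGENTA code.

```python
# Illustrative sketch only; strand handling is an assumption and names are hypothetical.

def snp_in_gene_window(snp_pos, tx_start, tx_end, strand,
                       upstream=110_000, downstream=40_000):
    """Return True if the SNP lies within the extended transcript boundaries.

    tx_start and tx_end are genomic coordinates with tx_start <= tx_end;
    strand is "+" or "-".
    """
    if strand == "+":
        lo, hi = tx_start - upstream, tx_end + downstream
    else:
        lo, hi = tx_start - downstream, tx_end + upstream
    return lo <= snp_pos <= hi
```

A SNP 100 kb before the genomic start of a transcript would therefore be assigned to a plus-strand gene but not to a minus-strand gene at the same coordinates.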
Analyses of directional consistency of associations between discovery and follow-up studies
We investigated whether the Metabochip follow-up SNPs were likely to contain further true associations in addition to those SNPs which reached genome-wide significance. To do so, we meta-analyzed those studies involved in the original discovery analyses[1,2] comprising 42,078 individuals for fasting glucose, 34,230 for fasting insulin and 15,252 for 2hGlu, and also then separately meta-analyzed all studies newly available to follow up, comprising 85,710 individuals for fasting glucose, 69,240 for fasting insulin and 27,602 for 2hGlu. For each trait (fasting glucose, fasting insulin, FI-BMIadj and 2hGlu), we then identified all SNPs which had a nominally significant association (P < 0.05) in the follow-up studies alone and, for these SNPs, performed a two-sided binomial test of whether more SNPs than expected by chance (50%) had a consistent direction of effect with that observed in the discovery analyses. Before performing these analyses, SNPs were filtered by LD (r2 < 0.01) to identify independent variants, and all SNPs (and those in LD, r2 ≥ 0.01) associated with glycemic traits (fasting glucose, fasting insulin, 2hGlu, HbA1c and proinsulin) at genome-wide levels of significance (including those SNPs identified in the present study) were excluded. These analyses were initially performed for all 66,000 SNPs, but we were then also able to compare across SNPs submitted to the Metabochip by different consortia and for SNPs submitted to follow up on particular traits amongst these consortia. The results of each of these tests were plotted overall, within SNPs from each consortium, and within SNPs submitted for follow-up of each trait in Supplementary Figure 9. The numbers of SNPs meeting these criteria are shown in Supplementary Table 7. We supplemented these results with FDR analyses and noted the q-value at P = 0.05 in the follow-up studies to identify the likelihood of true positives among these nominally significant SNPs (Supplementary Table 7).
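The directional consistency test can be sketched as follows: among independent SNPs that are nominally significant in follow-up, count those whose effect has the same sign in discovery and follow-up, then test that count against a binomial distribution with success probability 0.5. The function names and input layout are hypothetical; LD pruning and removal of established loci are assumed to have been done beforehand.

```python
from math import comb

# Illustrative sketch only; names and input layout are hypothetical.
# Input SNPs are assumed to be LD-pruned with established loci already removed.


def count_consistent(snps, alpha=0.05):
    """snps: iterable of (beta_discovery, beta_followup, p_followup) tuples.

    Returns (number with matching effect direction, number nominally significant).
    """
    selected = [(bd, bf) for bd, bf, pf in snps if pf < alpha]
    consistent = sum(1 for bd, bf in selected if bd * bf > 0)
    return consistent, len(selected)


def sign_test_p(n_consistent, n_total):
    """Exact two-sided binomial test against a 50% chance of consistency."""
    denom = 2 ** n_total
    upper = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1)) / denom
    lower = sum(comb(n_total, k) for k in range(0, n_consistent + 1)) / denom
    return min(1.0, 2.0 * min(upper, lower))
```

Because the null probability is 0.5, the binomial distribution is symmetric and doubling the smaller tail gives the exact two-sided P-value.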