Literature DB >> 26551672

Genetic fine mapping and genomic annotation defines causal mechanisms at type 2 diabetes susceptibility loci.

Kyle J Gaulton^1,2, Teresa Ferreira¹, Yeji Lee³, Anne Raimondo⁴, Reedik Mägi⁵, Michael E Reschen⁶, Anubha Mahajan¹, Adam Locke³, N William Rayner^1,4,7, Neil Robertson^1,4, Robert A Scott⁸, Inga Prokopenko⁹, Laura J Scott³, Todd Green¹⁰, Thomas Sparso¹¹, Dorothee Thuillier¹², Loic Yengo¹², Harald Grallert^13,14,15, Simone Wahl^13,14,15, Mattias Frånberg^16,17,18, Rona J Strawbridge¹⁶, Hans Kestler^19,20, Himanshu Chheda²¹, Lewin Eisele²², Stefan Gustafsson²³, Valgerdur Steinthorsdottir²⁴, Gudmar Thorleifsson²⁴, Lu Qi^25,26,27,28, Lennart C Karssen²⁹, Elisabeth M van Leeuwen²⁹, Sara M Willems^8,29, Man Li³⁰, Han Chen^31,32, Christian Fuchsberger³, Phoenix Kwan³, Clement Ma³, Michael Linderman³³, Yingchang Lu³⁴, Soren K Thomsen⁴, Jana K Rundle⁴, Nicola L Beer^1,4, Martijn van de Bunt^1,4, Anil Chalisey⁶, Hyun Min Kang³, Benjamin F Voight³⁵, Gonçalo R Abecasis³, Peter Almgren³⁶, Damiano Baldassarre^37,38, Beverley Balkau^39,40, Rafn Benediktsson^41,42, Matthias Blüher^43,44, Heiner Boeing⁴⁵, Lori L Bonnycastle⁴⁶, Erwin P Bottinger⁴⁷, Noël P Burtt¹⁰, Jason Carey¹⁰, Guillaume Charpentier⁴⁸, Peter S Chines⁴⁶, Marilyn C Cornelis⁴⁹, David J Couper⁵⁰, Andrew T Crenshaw¹⁰, Rob M van Dam^26,51, Alex S F Doney^52,53, Mozhgan Dorkhan⁵⁴, Sarah Edkins⁷, Johan G Eriksson^55,56,57,58, Tonu Esko^5,59,60, Elodie Eury⁶¹, João Fadista³⁶, Jason Flannick¹⁰, Pierre Fontanillas¹⁰, Caroline Fox^62,63, Paul W Franks^26,36,64,65, Karl Gertow¹⁶, Christian Gieger^13,14, Bruna Gigante⁶⁶, Omri Gottesman⁴⁷, George B Grant¹⁰, Niels Grarup¹¹, Christopher J Groves⁴, Maija Hassinen⁶⁷, Christian T Have¹¹, Christian Herder^68,69, Oddgeir L Holmen⁷⁰, Astradur B Hreidarsson⁴², Steve E Humphries⁷¹, David J Hunter^25,26,27,72, Anne U Jackson³, Anna Jonsson³⁶, Marit E Jørgensen⁷³, Torben Jørgensen^74,75,76, Wen-Hong L Kao³⁰, Nicola D Kerrison⁸, Leena Kinnunen⁵⁵, Norman Klopp^13,77, Augustine Kong²⁴, Peter Kovacs^43,44, Peter Kraft^25,32,72, Jasmina Kravic³⁶, Cordelia Langford⁷, Karin Leander⁶⁶, Liming Liang^25,32, Peter Lichtner⁷⁸, Cecilia M Lindgren^1,10, Eero Lindholm³⁶, Allan Linneberg^74,79,80, Ching-Ti Liu³¹, Stéphane Lobbens⁶¹, Jian'an Luan⁸, Valeriya Lyssenko^36,73, Satu Männistö⁵⁵, Olga McLeod¹⁶, Julia Meyer⁸¹, Evelin Mihailov⁵, Ghazala Mirza⁸², Thomas W Mühleisen^83,84,85, Martina Müller-Nurasyid^81,86,87,88, Carmen Navarro^89,90,91, Markus M Nöthen^83,84, Nikolay N Oskolkov³⁶, Katharine R Owen^4,92, Domenico Palli⁹³, Sonali Pechlivanis²², Leena Peltonen^7,10,21,55, John R B Perry⁸, Carl G P Platou^70,94, Michael Roden^68,69,95, Douglas Ruderfer⁹⁶, Denis Rybin⁹⁷, Yvonne T van der Schouw⁹⁸, Bengt Sennblad^16,17, Gunnar Sigurðsson^42,99, Alena Stančáková¹⁰⁰, Gerald Steinbach¹⁰¹, Petter Storm³⁶, Konstantin Strauch^81,87, Heather M Stringham³, Qi Sun^26,27, Barbara Thorand^14,15, Emmi Tikkanen^21,102, Anke Tonjes^43,44, Joseph Trakalo¹, Elena Tremoli^37,38, Tiinamaija Tuomi^{21,58,103,104}, Roman Wennauer¹⁰⁵, Steven Wiltshire¹, Andrew R Wood¹⁰⁶, Eleftheria Zeggini⁷, Ian Dunham¹⁰⁷, Ewan Birney¹⁰⁷, Lorenzo Pasquali^108,109,110, Jorge Ferrer^111,112, Ruth J F Loos^8,34,47,113, Josée Dupuis^31,62, Jose C Florez^{60,114,115,116}, Eric Boerwinkle^117,118, James S Pankow¹¹⁹, Cornelia van Duijn^29,120, Eric Sijbrands¹⁰⁵, James B Meigs^114,121, Frank B Hu^25,26,27, Unnur Thorsteinsdottir^24,41, Kari Stefansson^24,41, Timo A Lakka^67,122,123, Rainer Rauramaa^67,123, Michael Stumvoll^43,44, Nancy L Pedersen¹²⁴, Lars Lind¹²⁵, Sirkka M Keinanen-Kiukaanniemi^126,127, Eeva Korpi-Hyövälti¹²⁸, Timo E Saaristo^129,130, Juha Saltevo¹³¹, Johanna Kuusisto¹⁰⁰, Markku Laakso¹⁰⁰, Andres Metspalu^5,132, Raimund Erbel¹³³, Karl-Heinz Jöcke¹²², Susanne Moebus²², Samuli Ripatti^7,21,102,134, Veikko Salomaa⁵⁵, Erik Ingelsson^1,23, Bernhard O Boehm^135,136, Richard N Bergman¹³⁷, Francis S Collins⁴⁶, Karen L Mohlke¹³⁸, Heikki Koistinen^55,139,140, Jaakko Tuomilehto^{55,141,142,143}, Kristian Hveem⁷⁰, Inger Njølstad¹⁴⁴, Panagiotis Deloukas^7,145, Peter J Donnelly^1,146, Timothy M Frayling¹⁰⁶, Andrew T Hattersley¹⁴⁷, Ulf de Faire⁶⁶, Anders Hamsten¹⁶, Thomas Illig^13,77, Annette Peters^14,15,88, Stephane Cauchi¹², Rob Sladek^148,149, Philippe Froguel^9,12,61, Torben Hansen^11,150, Oluf Pedersen¹¹, Andrew D Morris¹⁵¹, Collin N A Palmer^52,53, Sekar Kathiresan^10,115,152, Olle Melander³⁶, Peter M Nilsson³⁶, Leif C Groop^21,36, Inês Barroso^7,153,154, Claudia Langenberg⁸, Nicholas J Wareham⁸, Christopher A O'Callaghan⁶, Anna L Gloyn^1,4,92, David Altshuler^{10,114,115,116,155,156}, Michael Boehnke³, Tanya M Teslovich³, Mark I McCarthy^1,4,92, Andrew P Morris^1,5,157,158.

Abstract

We performed fine mapping of 39 established type 2 diabetes (T2D) loci in 27,206 cases and 57,574 controls of European ancestry. We identified 49 distinct association signals at these loci, including five mapping in or near KCNQ1. 'Credible sets' of the variants most likely to drive each distinct signal mapped predominantly to noncoding sequence, implying that association with T2D is mediated through gene regulation. Credible set variants were enriched for overlap with FOXA2 chromatin immunoprecipitation binding sites in human islet and liver cells, including at MTNR1B, where fine mapping implicated rs10830963 as driving T2D association. We confirmed that the T2D risk allele for this SNP increases FOXA2-bound enhancer activity in islet- and liver-derived cells. We observed allele-specific differences in NEUROD1 binding in islet-derived cells, consistent with evidence that the T2D risk allele increases islet MTNR1B expression. Our study demonstrates how integration of genetic and genomic information can define molecular mechanisms through which variants underlying association signals exert their effects on disease.

Entities: Chemical

Mesh：

Substances：

Year: 2015 PMID： 26551672 PMCID： PMC4666734 DOI： 10.1038/ng.3437

Source DB: PubMed Journal: Nat Genet ISSN： 1061-4036 Impact factor: 38.330

INTRODUCTION

Genome-wide association studies (GWAS) of common variants, defined by minor allele frequency (MAF) ≥5%, have been successful in identifying loci contributing to type 2 diabetes (T2D) susceptibility[1-5]. GWAS loci are typically represented by a “lead” SNP with the strongest signal of association in the region. However, lead SNPs may not directly impact disease susceptibility, but instead be proxies for causal variants because of linkage disequilibrium (LD). Interpretation may be further complicated by the presence of more than one causal variant at a locus, possibly acting through the joint effects of alleles on the same haplotype. This complex genetic architecture would result in multiple “distinct” association signals at the same locus, which can only be delineated, statistically, through conditional analyses. With the exception loci where the lead SNPs are protein altering variants, including PPARG[6], KCNJ11-ABCC8[7], SLC30A8[8], and GCKR[9], the mechanisms by which associated alleles influence T2D susceptibility are largely unknown. At other loci, direct biological interpretation of the effect of genetic variation on T2D is more challenging because the association signals mostly map to non-coding sequence. While recent reports have demonstrated a relationship between T2D-associated variants and transcriptional enhancer activity, particularly in human pancreatic islets, liver cells, adipose tissue, and muscle[10-14], the DNA-binding proteins through which these effects are mediated remain obscure. Localisation of non-coding causal variants may highlight the specific regulatory elements they perturb, and potentially the genes through which they operate, providing valuable insights into the pathophysiological basis of T2D susceptibility at GWAS loci. To improve the localisation of potential causal variants for T2D, and characterise the mechanisms through which they alter disease risk, we performed comprehensive fine-mapping of 39 established loci through high-density imputation into 27,206 cases and 57,574 controls from 23 studies of European ancestry, genotyped with the Metabochip[15] (Supplementary Tables 1 and 2). Within each locus, we aimed to: (i) evaluate the evidence for multiple distinct association signals through conditional analyses; (ii) undertake fine-mapping by defining credible sets of variants that account for ≥99% of the probability of driving each distinct association signal; and (iii) interrogate credible sets for functional and regulatory annotation to provide insight into the mechanisms through which variants driving association signals influence disease risk.

RESULTS

Imputation into Metabochip fine-mapping regions

The Metabochip includes high-density coverage of 257 “fine-mapping regions” that have been previously associated with 23 metabolic, cardiovascular, and anthropometric traits[15]. SNPs in these regions were selected using reference data from the HapMap[16] and the 1000 Genomes (1000G) Project[17]. At design, 27 T2D susceptibility loci were selected for fine-mapping. However, subsequent T2D GWAS efforts have identified additional loci that overlap 12 further fine-mapping regions that were initially selected for other traits (Supplementary Table 3). To enhance coverage of variation in the fine-mapping regions, we undertook imputation into the Metabochip scaffold up to the 1000G phase 1 integrated reference panel (March 2012 release)[18], including multi-ethnic haplotypes to reduce error rates[19] (Online Methods). The quality of imputation was variable across studies, particularly for MAF<5% variants, and dependent on the scaffold sample size (Supplementary Table 4). We defined variants to be “well-imputed” at widely-used thresholds[20] of IMPUTEv2[21] info≥0.4 or minimac[22] r2≥0.3 in at least 80% of the total effective sample size (Neff≥59,122) across studies. With this definition, 99.4% and 89.0%, respectively, of common and low-frequency (0.5%≤MAF<5%) variants in 1000G European ancestry haplotypes were well imputed, and therefore retained for downstream association analyses. Within studies, imputation quality was consistent across loci, despite the differential priority of fine-mapping regions and their coverage of variation at design (Supplementary Table 5). 1000G imputation into the Metabochip scaffold thus provides near complete coverage of common and low-frequency variation across the 39 T2D susceptibility loci, and supports direct interrogation of the majority of variants with MAF≥0.5% in European ancestry populations.

Distinct association signals at T2D susceptibility loci

The first step in fine-mapping GWAS loci is to delineate distinct association signals arising from multiple causal variants in the same region, which can efficiently be achieved through approximate conditioning with GCTA[23]. Within each T2D fine-mapping region, we identified distinct signals attaining “locus-wide” significance (represented by an index variant with pJ<10−5 in the joint association model) by applying GCTA in two stages (Online Methods). First, we selected index variants on the basis of fixed-effects meta-analysis across Metabochip studies. Second, we performed in silico replication of the index variants in a validation meta-analysis of an additional 19,662 T2D cases and 115,140 controls from 10 GWAS of European ancestry (Supplementary Tables 1, 2, and 6). Finally, because GCTA is only an approximation, we confirmed the association of each index variant through exact conditional analysis across Metabochip studies (Online Methods, Supplementary Table 7). The most dramatic delineation of distinct association signals was observed for the region flanking KCNQ1, where five non-coding index variants attained locus-wide significance (Table 1, Supplementary Figure 1). Distinct association signals represented by three of the index variants have been reported in previous GWAS of European[4] and East Asian[24] ancestry: rs74046911 (pJ=3.6×10−26, r2=0.98 with East Asian lead SNP, rs2237897) and rs2237895 (pJ=2.1×10−9, r2=0.75 with one European lead SNP, rs163184), both of which map to a <50kb intronic recombination interval of KCNQ1; and chr11:2692322:D (pJ=7.2×10−16, r2=0.59 with second European lead SNP, rs231361), which resides in the KCNQ1OT1 transcript that controls regional imprinting[25]. The remaining two distinct association signals at this locus are novel. The first, indexed by rs458069 (pJ=3.2×10−6), maps to the same <50kb recombination interval as rs74046911 and rs2237895, but is in only weak LD with both (r2=0.02 and r2=0.25, respectively). The second, indexed by rs2283220 (pJ=2.2×10−7), resides in a neighbouring intron of KCNQ1, outside of the <50kb recombination interval (Supplementary Figure 1).

Table 1

Established T2D susceptibility loci with multiple distinct signals of association at locus-wide significance in the GCTA joint regression model (pj<10−5)

Locus	Index variant	Chr	Position(b37)	Riskallele	Otherallele	Metabochip GCTA joint model27,206 cases and 57,574 controls			Validation GCTA joint model19,662 cases and 115,140 controls			Combined GCTA joint model46,868 cases and 172,714 controls
Locus	Index variant	Chr	Position(b37)	Riskallele	Otherallele	RAF	OR (95% CI)	p _J	RAF	OR (95% CI)	p _J	OR (95% CI)	p _J
DGKB	rs10276674	7	14,922,007	C	T	0.183	1.08 (1.04-1.11)	4.5×10⁻⁶	0.216	1.09 (1.05-1.12)	1.3×10⁻⁶	1.08 (1.06-1.11)	2.8×10⁻¹¹
DGKB	rs1974620	7	15,065,467	T	C	0.519	1.06 (1.04-1.09)	1.6×10⁻⁶	0.515	1.05 (1.03-1.08)	0.00014	1.06 (1.04-1.08)	1.0×10⁻⁹
CDKN2B	rs10811660	9	22,134,068	G	A	0.830	1.32 (1.27-1.38)	2.4×10⁻⁴⁴	0.817	1.21 (1.17-1.26)	2.6×10⁻²¹	1.27 (1.23-1.30)	1.1×10⁻⁶¹
CDKN2B	rs10757283	9	22,134,172	T	C	0.437	1.14 (1.10-1.17)	7.4×10⁻¹⁸	0.436	1.11 (1.07-1.14)	1.3×10⁻¹⁰	1.12 (1.10-1.14)	3.6×10⁻²⁶
KCNQ1	chr11:2692322:D	11	2,692,322	D	R	0.374	1.08 (1.05-1.10)	3.5×10⁻⁸	0.413	1.09 (1.06-1.12)	1.2×10⁻⁸	1.08 (1.06-1.10)	2.3×10⁻¹⁵
	rs2283220	11	2,755,548	A	G	0.661	1.06 (1.03-1.09)	0.000016	0.710	1.05 (1.02-1.08)	0.0031	1.06 (1.03-1.08)	2.4×10⁻⁷
	rs2237895	11	2,857,194	C	A	0.428	1.08 (1.05-1.11)	6.6×10⁻⁷	0.433	1.07 (1.03-1.10)	2.8×10⁻⁴	1.07 (1.05-1.10)	5.3×10⁻¹⁰
	rs74046911	11	2,858,636	C	T	0.951	1.32 (1.24-1.40)	1.7×10⁻¹⁷	0.943	1.25 (1.17-1.34)	4.8×10⁻¹⁰	1.29 (1.23-1.35)	9.6×10⁻²⁶
	rs458069	11	2,858,800	G	C	0.707	1.06 (1.03-1.10)	0.00026	0.707	1.07 (1.03-1.11)	0.00085	1.06 (1.04-1.09)	1.0×10⁻⁶
HNF1A	rs1169288	12	121,416,650	C	A	0.334	1.10 (1.07-1.13)	5.4×10⁻¹⁰	0.316	1.08 (1.05-1.12)	2.8×10⁻⁶	1.09 (1.07-1.12)	8.1×10⁻¹⁵
	rs1800574	12	121,416,864	T	C	0.027	1.21 (1.11-1.31)	5.2×10⁻⁶	0.020	1.23 (1.12-1.35)	0.000026	1.22 (1.14-1.29)	5.1×10⁻¹⁰
	chr12:121440833:D	12	121,440,833	R	D	0.416	1.06 (1.03-1.09)	0.000028	0.382	1.08 (1.04-1.11)	2.5×10⁻⁶	1.07(1.05-1.09)	2.9×10⁻¹⁰
MC4R	chr18:57739289:D	18	57,739,289	D	R	0.234	1.05 (1.02-1.09)	0.00079	0.254	1.07 (1.03-1.10)	0.000059	1.06 (1.04-1.08)	1.9×10⁻⁷
MC4R	rs17066842	18	58,040,624	G	A	0.961	1.13 (1.06-1.21)	0.00033	0.948	1.11 (1.04-1.19)	0.0012	1.12 (1.07-1.17)	1.4×10⁻⁶
GIPR	rs4399645	19	46,166,073	T	C	0.395	1.07 (1.04-1.10)	4.4×10⁻⁷	0.441	1.05 (1.01-1.08)	0.0046	1.06 (1.04-1.08)	1.4×10⁻⁸
GIPR	rs2238689	19	46,178,661	C	T	0.425	1.09 (1.07-1.12)	9.7×10⁻¹²	0.424	1.07 (1.04-1.10)	9.0×10⁻⁶	1.08 (1.06-1.11)	8.3×10⁻¹⁶
HNF4A [a]	rs1800961	20	43,042,364	T	C	0.034	1.16 (1.09-1.24)	0.000011	0.041	1.16 (1.08-1.25)	0.000051	1.16 (1.10-1.22)	2.3×10⁻⁹

Each distinct association signal was represented by an index variant in the GCTA joint regression model on the basis of: (i) summary statistics from a combined meta-analysis of 46,868 cases and 172,714 controls of European ancestry; and (ii) reference genotype data from GoDARTS (3,298 cases and 3,708 controls of European ancestry from the UK) to approximate LD across fine-mapping regions.

Chr: chromosome. RAF: risk allele frequency. OR: odds-ratio for risk allele. CI: confidence interval.

The previously reported T2D GWAS SNP at the HNF4A locus (rs4812829) is not included in the fine-mapping region. However, the reported index variant, rs1800961, is independent of the GWAS SNP, and thus represents a novel distinct association signal at this locus.

At the HNF1A locus, we observed three distinct association signals (Table 1, Supplementary Figure 2), represented by index variants that are in only weak LD with the previously reported lead GWAS SNP, rs12427353. They include two non-synonymous variants, rs1169288 (pJ=4.4×10−14, r2=0.09, HNF1A p.I27L) and rs1800574 (pJ=4.2×10−10, r2=0.01, HNF1A p.A98V), and one inter-genic SNP, chr12:121440833:D (pJ=2.9×10−10, r2=0.19). We also observed four loci with two distinct association signals (CDKN2A-B, DGKB, MC4R and GIPR), each represented by non-coding index variants (Table 1, Supplementary Figure 3). The index variants at the CDKN2A-B locus represent the known T2D haplotype association signal mapping to a 12kb inter-genic recombination interval[26-28]. Previous European ancestry GWAS meta-analyses[4] have highlighted a potential distinct association signal, located upstream of the recombination interval in the non-coding CDKN2B-AS1 (ANRIL) transcript. However, our conditional analyses indicate that the association in this region can be fully explained by the two index SNPs in the recombination interval, which when considered together, fully extinguish the CDKN2B-AS1 signal (Supplementary Figure 4). The index variants at DGKB and MC4R also confirm previously reported distinct association signals at these loci in European ancestry GWAS meta-analyses[4]. At the GIPR locus, the two index variants (rs2238689, pJ=8.3×10−16; rs4399645, pJ=1.4×10−8) are not in strong LD with the previously reported[4] lead SNP (rs8108269; r2=0.43 with rs2238689, r2=0.00 with rs4399645), but together can better explain the T2D association signal in this region. Finally, we observed a novel distinct association signal at the HNF4A locus, represented by the coding index variant rs1800961 (pJ=1.4×10−9, HNF4A p.T139I, referred to as p.T130I in some previous studies[29]). Unfortunately, this fine-mapping region was included on Metabochip for high-density lipoprotein cholesterol[15,30] (Supplementary Table 3), and does not include the previously reported[4] lead T2D SNP at this locus, rs4812829, precluding conditional analyses in these data. However, rs4812829 is not in LD with our index variant (r2=0.02), suggesting that there are at least two distinct T2D association signals at the HNF4A locus. Of the 49 distinct association signals achieving locus-wide significance across T2D loci represented on Metabochip (five at KCNQ1, three at HNF1A, two each at CDKN2A-B, DGKB, MC4R and GIPR, and one each at the remainder), only three index variants are not common (Supplementary Table 6, Supplementary Figure 5): rs1800574 (MAF=2.2%, OR=1.21) for one signal at the HNF1A locus; rs1800961 (MAF=3.9%, OR=1.16) at the HNF4A locus; and rs17066842 (MAF=4.8%, OR=1.12) for one signal at the MC4R locus.

Localising variants driving T2D association signals

We used statistical evidence of association from the meta-analysis of Metabochip studies to construct 99% “credible sets” of variants[28] that are most likely to drive the 49 distinct signals (Online Methods, Supplementary Table 8, Supplementary Figure 6). For ten distinct association signals, mapping to nine loci, the 99% credible set included no more than ten variants (Table 2, Supplementary Table 9). The greatest refinement was observed at the MTNR1B locus, where the credible set included only the index variant, rs10830963, accounting for more than 99.8% of the posterior probability of driving the association signal (πC). Small credible sets were also observed for the association at TCF7L2 (three variants, indexed by rs7903146, mapping to 4.3kb), and one signal at KCNQ1 (three variants, indexed by rs74046911, mapping to just 200bp). The 99% credible sets for both distinct association signals at CDKN2A-B together included just 11 variants in total, and map to less than 2kb.

Table 2

Distinct association signals at established T2D susceptibility loci for which the 99% credible set contains no more than ten variants

Locus	Index variant	Chr	Position(b37)	Riskallele	Otherallele	RAF	p-value	OR (95% CI)	99% credible set
Locus	Index variant	Chr	Position(b37)	Riskallele	Otherallele	RAF	p-value	OR (95% CI)	SNPs	Interval(bp)	Intervalstart (bp)	Intervalstop (bp)
MTNR1B	rs10830963	11	92,708,710	G	C	0.283	2.9×10⁻¹²	1.10 (1.07-1.13)	1	1	92,708,710	92,708,710
TCF7L2	rs7903146	10	114,758,349	T	C	0.260	5.8×10⁻¹²⁰	1.39 (1.35-1.43)	3	4,279	114,754,071	114,758,349
KCNQ1	rs74046911	11	2,858,636	C	T	0.951	5.9×10⁻¹⁸	1.33 (1.25-1.42)	3	197	2,858,440	2,858,636
ZBED3	rs7732130	5	76,435,004	G	A	0.278	6.4×10⁻¹⁰	1.09 (1.06-1.12)	5	10,056	76,424,949	76,435,004
CDKN2A-B	rs10757283	9	22,134,172	T	C	0.437	2.8×10-19	1.14 (1.11-1.18)	5	1,007	22,133,645	22,134,651
SLC30A8	rs13266634	8	118,184,783	C	T	0.676	1.3×10⁻¹⁸	1.13 (1.10-1.16)	6	33,133	118,184,783	118,217,915
CDKN2A-B	rs10811660	9	22,134,068	G	A	0.830	7.0×10⁻⁴³	1.32 (1.27-1.37)	6	1,397	22,132,698	22,134,094
HNF1B	rs4430796	17	36,098,040	G	A	0.455	6.3×10^-12	1.09 (1.07-1.12)	7	5,791	36,097,775	36,103,565
CDKAL1	rs35261542	6	20,675,792	A	C	0.280	9.6×10⁻²³	1.15 (1.12-1.18)	8	30,073	20,673,880	20,703,952
GLIS3	chr9:4294707:I	9	4,294,707	I	R	0.360	6.5×10⁻⁸	1.07 (1.05-1.10)	10	15,453	4,283,137	4,298,589

Association summary statistics and credible set construction are based on the meta-analysis of Metabochip studies in 27,206 cases and 57,574 controls of European ancestry. In loci with multiple distinct signals of association, results are presented from exact conditional analysis after adjusting for all other index variants in the fine-mapping region. In loci with a single signal of association, results are presented from unconditional analysis. Chr: chromosome. RAF: risk allele frequency. OR: odds-ratio for risk allele. CI: confidence interval.

We performed functional annotation of credible variants to search for evidence that association signals are driven by coding alleles. Across the 49 signals, only nine coding variants attained πC>1% (Supplementary Table 10), including six previously reported non-synonymous T2D-risk alleles at PPARG[6], KCNJ11-ABCC8[7,31,32], SLC30A8[8,33], and GCKR[9,34]. The remaining three coding alleles were the index variants for association signals mapping to HNF4A (p.T139I, rs1800961, πC=97.4%) and HNF1A (p.I27L, rs1169288, πC=75.5%; p.A98V, rs1800574, πC=34.0%). Our findings are supported by earlier studies, which reported nominal evidence for association of these three coding variants with T2D and defects in insulin secretion in vivo, and demonstrated reduced transcriptional activity of HNF1A target genes using in vitro assays[29,35]. These data provide strong evidence that HNF4A and HNF1A are T2D effector transcripts at these loci, a view further supported by the known impact of rare, loss of function mutations in these genes on maturity onset diabetes of the young[36,37]. Given the near complete coverage of common and low-frequency variants in fine-mapping regions after 1000G imputation, it is unlikely that additional distinct signals in established T2D susceptibility loci represented on the Metabochip are driven by coding variation with MAF≥0.5%, confirming reports that these associations are most likely to be mediated by effects on gene regulation[10,13,14,38].

Regulatory mechanisms underlying T2D association signals

We sought to understand the regulatory mechanisms through which variants at the 39 established T2D susceptibility loci influence disease by intersecting the 99% credible sets for each distinct association signal with chromatin immunoprecipitation sequence (ChIP-seq) data for 165 transcription factors, chromatin state maps from 12 cell types, and long non-coding RNA transcripts from 25 cell types (Online Methods, Supplementary Table 11). We applied an enrichment procedure that compared the mean posterior probability of driving the association signal for credible set variants directly overlapping sites for each regulatory annotation with a null distribution obtained from randomly shifted site locations within 100kb in either direction. We first applied this procedure to chromatin state and non-coding RNA elements using the 19,266 credible set variants for all 49 distinct association signals (Supplementary Figure 7). Using a Bonferroni correction for the 37 tested cell type annotations (p<0.0014), variants in pancreatic islet enhancer elements[14] had significantly higher posterior probability of driving association signals than that expected from the null distribution (1.97-fold, p=0.00022). We also observed nominal evidence for enrichment of the posterior probability of driving association signals among variants in human islet and hepatocellular carcinoma (HepG2) promoters[10,14] (p=0.0052 and p=0.0064, respectively). However, there was no corresponding enrichment of variants in regulatory elements for other cell types or in non-coding transcripts. These results are consistent with previous studies supporting a contribution of regulatory enhancer and promoter variants to T2D susceptibility in specific cell types[11-14]. We next sought to gain insight into the transcription factors these regulatory variants perturb, and applied the same procedure to ChIP-seq binding data for 165 proteins (Figure 1, Supplementary Figure 8). Using a Bonferroni correction for the 165 tested proteins (p<0.00030), the 89 credible set variants overlapping 57 FOXA2 ChIP-seq binding sites, assayed in human HepG2[10] and islet[14] cells, had significantly higher posterior probability of driving association signals than expected from the null distribution (8.24-fold, p=0.00028). The enrichment of FOXA2 ChIP-seq sites was exclusive to those shared with at least one other factor (9.18-fold, p=0.00028) compared to those that were not (1.12-fold, p=0.11). FOXA2 enrichment was also more pronounced among sites identified in pancreatic islets (15.43-fold, p=0.00045) than in HepG2 cells (4.55-fold, p=0.011). To exclude the possibility that this enrichment in HepG2 cells was driven by artefacts caused by a cultured cell line, we compared FOXA2 HepG2 sites to those previously assayed in primary liver[39]. We observed significant intersection of the HepG2 and liver FOXA2 sites that overlapped credible set variants (p=1.5×10−9). Consequently, we detected similar FOXA2 enrichment among sites detected in liver (3.63-fold, p=0.061) to that observed in HepG2 cells. We also compared FOXA2 ChIP-seq sites, genome-wide, from liver, HepG2 and islet cells (Supplementary Figure 9). The number of sites varied across cell types (8,023 for liver, 40,866 for HepG2, and 27,291 for islets), which is likely due, in part, to technical differences including sequencing platform, depth and read length. However, the intersection of FOXA2 sites between each pair of cell types was highly significant (p<2.2×10−16).

Figure 1

FOX2A bound sites are a genomic marker of T2D risk variants

(A) Variants in ChIP-seq binding sites for 165 proteins were tested for enrichment of posterior probabilities compared to variants in shifted sites. Variants in FOXA2 ChIP-seq sites were significantly enriched (p<0.00030). (B) FOXA2 ChIP-seq sites were partitioned based on overlap with other genomic features. There was stronger enrichment in: (i) FOXA2 sites overlapping a ChIP-seq site for another protein compared to unique sites; (ii) sites identified in primary islets compared to HepG2 or primary liver cells; and (iii) sites overlapping islet enhancers compared to those that did not (**p<0.00030; *p<0.05). (C) Variants at each signal were tested for FOXA2 enrichment. Nineteen signals had greater enrichment than expected compared to shifted sites; at 15 signals this enrichment was nominally-significant (p<0.05). (D) FOXA2-bound variants disrupting recognition motifs have an increased probability of being causal.

Given the preponderance of T2D-associated variants for islet enhancers, we next tested to what extent FOXA2 enrichment is driven by co-localisation with these genomic features[14]. Variants in FOXA2-bound sites were not enriched for posterior probability of driving association signals after removing enhancer sites (0.36-fold, p=0.69). Conversely, variants in islet enhancers remained nominally enriched when removing FOXA2 sites (1.65-fold, p=0.014). These results suggest that FOXA2 binding assayed by ChIP-seq, at a subset of enhancer element locations that are often shared by other proteins, is a genomic marker of variants with an increased posterior probability of driving T2D association signals. Having demonstrated global over-representation for FOXA2 ChIP-seq binding by considering all loci simultaneously, we applied the same procedure to the 99% credible sets of each distinct association signal, separately, to identify those with the strongest evidence for local enrichment (Figure 1). We observed over-representation of credible set variants in islet or HepG2 FOXA2 sites for 19 association signals, 15 of which attained nominal significance (p<0.05). A total of 41 credible set variants at these 19 distinct association signals overlap a FOXA2 ChIP-seq site in at least one of the two cell types (Supplementary Table 12). Of these, 12 variants were predicted to disrupt de novo recognition motifs (for FOXA2 and other factors) that were enriched in FOXA2-bound sequence (Table 3, Supplementary Table 13). The mean posterior probability of driving the association (πC) for these 12 variants was 22.0% on the basis of genetic fine-mapping (Figure 1), more than four times greater than for those in FOXA2 ChIP-seq sites that were not motif-disrupting at the same signals (mean πC of 5.2%, p=0.024). Furthermore, 11 of these 12 variants also overlapped an enhancer element in islets (9 variants) or HepG2 cells (6 variants), indicating that they are in transcriptionally active regions (Table 3). They include two variants with experimentally validated differences in regulatory activity: rs7903146 (πC=77.6%) at TCF7L2[40] and rs11257655 (πC=21.1%) at CDC123[41]. They also include rs10830963, the index variant at the MTNR1B locus, which accounts for 99.8% of the posterior probability of driving the association signal on the basis of genetic fine-mapping. These results suggest that FOXA2 binding patterns can be used to highlight specific variants that are potentially causal for T2D susceptibility through altered regulatory binding.

Table 3

Motif-altering credible set variants in FOXA2 sites

Locus	Index variant	Motif-alteringvariant	Chr	Position (b37)	Posteriorprobability (π_C)	Motif allele	Chromatin state
MTNR1B	rs10830963	rs10830963	11	92708710	0.998	G	Islet-enhancer, HepG2-enhancer
TCF7L2	rs7903146	rs7903146	10	114,758,349	0.78	T	Islet-enhancer
SLC30A8	rs13266634	rs13266634	8	118,184,783	0.29	T	Islet-enhancer
CDKN2A-B	rs10811660	rs10811660	9	22,134,068	0.24	A	Islet-enhancer
CDC123	rs11257658	rs11257655	10	12,307,894	0.21	T	Islet-enhancer, HepG2-enhancer
JAZF1	rs1513272	rs849133	7	28,192,280	0.042	T	Islet-enhancer
KCNQ1	rs2283220	rs231907	11	2,752,130	0.031	T	HepG2-enhancer
FTO	rs9927317	rs9940128	16	53,800,754	0.027	G	Islet-enhancer, HepG2-enhancer
FTO	rs9927317	rs9939973	16	53,800,568	0.025	G	Islet-enhancer, HepG2-enhancer
KCNQ1	rs458069	rs78688069	11	2,752,183	0.0006	A	HepG2-enhancer
KCNQ1	rs458069	rs190728714	11	2,813,084	0.00042	G	Islet-enhancer
DGKB	rs10276674	rs7798360	7	15,055,972	0.00005	G	-

Chr: chromosome.

Altered regulatory activity of the MTNR1B credible variant

To demonstrate how local enrichment of FOXA2 binding can be used to highlight regulatory mechanisms through which credible variants might impact T2D susceptibility, we focussed on the MTNR1B locus. Variants mapping to this region have amongst the strongest known effects on both T2D risk[6] and fasting plasma glucose concentration[42], and physiological data indicate an impact of MTNR1B on both insulin secretion and insulin action[43]. The lone credible variant at MTNR1B, rs10830963, overlaps a FOXA2 ChIP-seq binding site, and the risk allele, G, is predicted to create a recognition motif that matches the consensus sequence of NEUROD1 and several other factors (Figure 2, Supplementary Table 13). We tested in silico predictions of protein binding at rs10830963 via electrophoretic mobility shift assay (EMSA) with 25bp probe fragments surrounding each allele in human pancreatic islet beta-cell (EndoC-βH1)[44] or human liver HepG2 cell extracts. We observed allele-specific binding with extracts from both cell lines (Figure 2, Supplementary Figure 10).

Figure 2

The lone variant in the 99% credible set at the MTNR1B locus affects FOXA2-bound enhancer activity

(A) The intronic variant, rs10830963, has 99.8% probability of driving the association signal at the MTRN1B locus. This variant overlaps a FOXA2 binding site, and the risk allele G is predicted to create a de novo recognition motif, which closely matches the NEUROD1 consensus. (B) Electrophoretic mobility shift assay of a 25bp fragment surrounding both alleles in EndoC-βH1 cell extracts. Proteins were bound to both alleles. In the presence of a NEUROD1 antibody, only the risk allele band was super-shifted, and in the presence of an unlabelled NEUROD1 consensus probe, the signal was competed away. NE: nuclear extract. (C, D) The 224bp sequence surrounding each allele was cloned into a luciferase reporter construct containing a minimal promoter and tested for luciferase activity in (C) EndoC-βH1 and (D) HepG2 cells (n=3 for each cell type). Results are presented as mean ± standard error. The risk allele had significantly increased enhancer activity over the protective allele in both forward and reverse orientations in both cell types.

To determine the specific protein(s) bound at each allele, we then performed supershift experiments using antibodies directed against NEUROD1, FOXA2, and three other factors (TAL1, PTF1A, and YY1), whose consensus binding sequences resemble the recognition motif (Online Methods). We observed a shift in the presence of the NEUROD1 antibody on the risk allele in EndoC-βH1 extracts, which could be competed away by an excess of unlabelled NEUROD1 consensus sequence probe (Figure 2). None of the tested antibodies (including NEUROD1) shifted the risk allele band in HepG2 cell extracts (Supplementary Figure 10). These results demonstrate that, in vitro, the risk allele of rs10830963 preferentially binds NEUROD1 in islet-derived cells, and binds a protein not identified from known recognition motifs in liver-derived cells. To relate allelic differences in protein binding to genomic activity at this site, we cloned a 224bp region surrounding rs10830963 into a luciferase reporter vector containing a minimal promoter, and tested its enhancer activity in EndoC-βH1 and HepG2 cell lines. Consistent with in silico predictions, we observed a significant (p<0.05) increase in luciferase expression on the risk allele compared to the protective allele in both cell lines (Figure 2). Furthermore, RNA-seq data reported from human islets have linked the T2D risk allele of rs10830963 to increased expression of MTNR1B[45,46]. Taken together, these results suggest that the G allele of rs10830963 increases T2D risk through increased FOXA2-bound enhancer activity, potentially mediated through NEUROD1 binding in islets, and consequently higher expression of MTNR1B.

Candidate effector genes at FOXA2 enriched T2D signals

We hypothesised that the locus-specific effects of murine transcription factor knockout models would mimic patterns of binding enrichment at human disease loci. We thus attempted to relate FOXA2 binding at the 19 FOXA2-enriched association signals (Figure 1) to target effector genes using previously reported pancreatic islet expression profiles from wild-type and Foxa1/2-null mice[47] (Online Methods). Syntenic genes mapping within 500kb of the credible set at the 19 FOXA2-enriched signals were significantly down-regulated (45.2% decrease) in Foxa1/2 knockout mice (Supplementary Figure 11) compared to those genome-wide (0.021% increase, p=0.012), whilst those mapping within 500kb of the other 30 T2D association signals were not (2.25% decrease, p=0.20). We observed a consistent down-regulation (39.6% decrease) when considering only those genes mapping closest to each FOXA2-enriched signal, compared to those genome-wide (0.021% increase, p=0.0021). Thus, data related to altered gene expression in Foxa1/2 knockout mice support patterns of FOXA2 binding site enrichment in humans. We next identified specific genes at the 19 FOXA2-enriched association signals that were down-regulated in Foxa1/2 knockout mice, which might represent effector transcripts for these loci (Supplementary Table 14). Several of these genes have been previously implicated as likely effector transcripts in humans, including TCF7L2[48,49] (57% decrease), KCNJ11[7,50] (38% decrease), and SLC30A8[51] (135% decrease). These data also implicate novel candidate effector genes at FOXA2-enriched association signals (Supplementary Table 14). For example, in Foxa1/2 knockout mice, there is a marked down-regulation of Reg4 (1,415% decrease), which maps to a syntenic region at the FOXA2-enriched NOTCH2 GWAS locus, highlighting REG4 as a likely effector transcript in humans. Additional examples of candidate effector genes include IGF2 at the KCNQ1 locus (135% decrease), and CAMK1D at the CDC123 locus (81% decrease). Together, these results provide additional support for the importance of FOXA2 binding at a subset of T2D susceptibility loci, and further highlight specific genes through which regulatory variants in these regions may operate.

DISCUSSION

We have undertaken comprehensive fine-mapping of 39 established T2D susceptibility loci in 27,206 cases and 57,574 controls of European ancestry, and have demonstrated that multiple distinct association signals in these regions is a common phenomenon. Index variants for just three of the 49 distinct association signals are not common, despite near complete coverage of variation with MAF≥0.5% in fine-mapping regions after 1000G imputation. Although we cannot evaluate the impact of rare variation (MAF<0.5%) in established T2D susceptibility loci without large-scale re-sequencing, our data strongly argue against a role for low-frequency variants of large effect via synthetic association[52]. We have demonstrated that seven distinct association signals, mapping to six T2D susceptibility loci represented on the Metabochip, are likely to be driven by coding alleles, including novel index variants mapping to HNF1A and HNF4A. Outside of these regions, our fine-mapping confirms previous reports that T2D association signals are primarily driven by non-coding alleles, with effects that are mediated through gene regulation[10,13,14,38]. We have demonstrated, by genomic annotation and functional assays, that FOXA2 binding assayed by ChIP-seq can be used to pinpoint candidate causal regulatory elements, providing routes to understanding the biology of specific T2D susceptibility loci. These elements highlight variants and effector transcripts through which association signals are mediated, via altered binding of either FOXA2, directly, or another transcription factor. For example, at the MTNR1B locus, the risk allele of the lone credible variant, rs10830963, which drives the T2D association signal, preferentially binds NEUROD1 in islet-derived cells in vitro, and increases FOXA2-bound enhancer activity in human islet and liver-derived cells. These data are consistent with previous reports correlating the risk allele with higher MTNR1B expression[45,46], and not loss of function[53], and suggest altered NEUROD1 binding in islets contributes to T2D susceptibility at this locus. Further experiments will be required to establish that our in vitro findings regarding NEUROD1 binding can be confirmed in vivo. However, our attempts to perform ChIP-Seq in primary islet samples of the defined MTNR1B genotype were repeatedly unsuccessful, owing to a lack of a suitable NEUROD1 antibody. These studies are further complicated by the limited availability of primary human islets, and the slow division rate of human islet derived cell-lines is an impediment to the implementation of genome-editing technologies. FOXA2 is a pioneer factor that binds native chromatin and bookmarks genomic regions for transcriptional activity[54], and is involved in pancreatic and hepatic development[55,56]. FOXA2 is also expressed in other T2D-relevant cell types, such as adipocytes. Future studies will be required to elucidate the extent to which FOXA2 binding events across cell types influence disease risk. Foxa2 null mice have impaired insulin secretion[47], and common variants at the FOXA2 locus are associated with fasting plasma glucose concentrations[42,57]. Our findings are thus consistent with the involvement of FOXA2 in maintaining normal glucose homeostasis. Common T2D-associated variants at FOXA2 have also been reported in South Asians[58], although they do not attain genome-wide significance in the largest GWAS for the disease from multiple ancestry groups[1-5], and therefore require further replication. Enrichment of FOXA2 binding has also been reported within genomic intervals containing GWAS signals for endocrine, neuropsychiatric, cardiovascular and cancer traits[59]. Our study has the advantage that we consider only those FOXA2 sites that directly overlap variants that drive association signals by first fine-mapping GWAS loci, thereby providing more targeted credible sets for functional enrichment. Nevertheless, the results of these studies, taken together, suggest a possible role for FOXA2 across a broad spectrum of complex human phenotypes. In conclusion, we have highlighted that FOXA2 binding patterns can be used to inform future hypothesis-driven investigation of the variants, genes, and molecular mechanisms underlying T2D association signals mapping to non-coding sequence. Continued identification of the effector transcripts at these non-coding association signals will require the use of expression QTL and knockout models, in combination with high-throughput experimental data derived from chromatin conformation capture techniques, such as Capture-C. Our findings support the use of transcription factor binding events as a means to partition susceptibility loci, potentially residing in distinct pathways, within disease-relevant cell types. Finally, our study demonstrates the utility of fine-mapping through integration of genetic and genomic information from relevant tissues and cellular models to elucidate the pathophysiology of complex human diseases, thus offering a promising avenue for translation of GWAS findings for clinical utility.

ONLINE METHODS

Ethics statement

All human research was approved by the relevant institutional review boards, and conducted according to the Declaration of Helsinki. All participants provided written informed consent.

Metabochip imputation and association analysis

We considered a total of 27,206 T2D cases and 57,574 controls from 23 studies from populations of European ancestry (Supplementary Table 1), all genotyped with the Metabochip. Sample and variant quality control was performed within each study (Supplementary Table 2). To improve the quality of the genotype scaffold in each study, variants were subsequently removed if: (i) allele frequencies differed from those for European ancestry haplotypes from the 1000 Genomes Project Consortium phase 1 integrated reference panel (March 2012 release)[18] by more than 20%; AT/GC variants had MAF>40% because of potential undetected errors in strand alignment; or (iii) MAF<1% because of difficulties in calling rare variants. Each scaffold was then imputed up to up to the phase 1 integrated reference panel (all ancestries, March 2012 release) from the 1000 Genomes Project Consortium[18], using IMPUTEv2[21] or minimac[22]. Within each study, well-imputed variants (IMPUTEv2[21] info>0.4 or minimac[22] r2>0.3) were tested for T2D association under an additive model after adjustment for study-specific covariates (Supplementary Table 2), including principal components to adjust for population structure. Association summary statistics for each variant for each study were corrected for residual population structure using the genomic control inflation factor[60] obtained from 3,598 independent (r2<0.05) QT-interval variants, which were not expected to be associated with T2D[4] (Supplementary Table 2). We then combined association summary statistics for each variant across studies via fixed-effects inverse-variance weighted meta-analysis. The results of the meta-analysis were subsequently corrected by a second round of QT-interval genomic control (λQT=1.18) to account for structure between studies. Variants were excluded from downstream analyses if they were reported in less than 80% of the total effective sample size, defined as Neff = 4×Ncases×Ncontrols/(Ncases+Ncontrols), thus removing those that were not well imputed in the majority of studies.

Identification of distinct association signals in established GWAS loci

We used GCTA[23] to select index variants in each of the 39 established loci represented on Metabochip with nominal evidence of association (pJ<0.001) with T2D in an approximate joint regression model. The GCTA model made use of: (i) summary statistics from the fixed-effects meta-analysis Metabochip studies; and (ii) genotype data for 3,298 T2D cases and 3,708 controls of UK ancestry from GoDARTS as a reference for LD across each fine-mapping region. For comparison, we also obtained association summary statistics for the selected index variants from the GCTA joint regression model on the basis of genotype data from an alternative reference consisting of 4,435 T2D cases and 5,757 controls of Finnish ancestry from FUSION (Supplementary Table 15, Supplementary Figure 12). Selected index variants were then carried forward for in silico follow-up in validation meta-analysis. The validation meta-analysis consisted of 19,662 T2D cases and 115,140 controls from 10 GWAS from populations of European ancestry, genotyped with a range of genome-wide arrays (Supplementary Table 1). Sample and variant quality control was performed within each study (Supplementary Table 2). Each scaffold was then imputed up to the phase 1 integrated reference panel (all ancestries, March 2012 release) from the 1000 Genomes Project Consortium[18], using IMPUTEv2[21] or minimac[22]. Within each study, well-imputed variants (IMPUTEv2[21] info≥0.4 or minimac[22] r2≥0.3) were tested for T2D association under an additive model after adjustment for study-specific covariates (Supplementary Table 2), including principal components to adjust for population structure. Association summary statistics for each variant for each study were corrected for residual population structure using the genomic control inflation factor[60] (Supplementary Table 2). We then combined association summary statistics for each variant across studies via fixed-effects inverse-variance weighted meta-analysis. Association summary statistics for the selected index variants from the Metabochip and validation meta-analyses were next combined via fixed-effects inverse-variance weighted meta-analysis. In each of the 39 established loci represented on Metabochip, GCTA[23] was used to select index variants with locus-wide evidence of association (pJ<10−5) in the approximate joint regression model on the basis of: (i) summary statistics from the combined meta-analysis; and (ii) genotype data for 3,298 T2D cases and 3,708 controls from GoDARTS as a reference for LD across each fine-mapping region. For established loci with multiple index variants selected at locus-wide significance from the GCTA approximate joint regression model in combined meta-analysis, we performed exact conditioning within each Metabochip study (Supplementary Table 7). To obtain the association signal attributed to a specific index variant, high-quality variants (IMPUTEv2[21] info>0.4 or minimac[22] r2>0.3) were tested for T2D association under an additive model after adjustment for study-specific covariates (Supplementary Table 2) and genotypes at other selected index variants in the fine-mapping region. Association summary statistics for each study were corrected for residual population structure using the QT interval genomic control inflation factor obtained in the Metabochip meta-analysis. For each association signal, summary statistics for each variant were then combined across discovery studies via fixed-effects inverse-variance meta-analysis, and subsequently corrected by a second round of QT-interval genomic control (λQT=1.18).

Credible set construction

In an ideal fine-mapping experiment, we would calculate the posterior probability of driving each distinct association signal for all variants mapping to a locus. However, the posterior probability is determined by the association signal effect size of the variant and the corresponding standard error, which is also impacted by the quality of imputation across studies, amongst other factors. To minimise the impact of imputation quality on fine-mapping, we therefore retained only those variants that were directly typed and/or well imputed in at least 80% of the total effective sample size. Assuming that the variant driving an association signal meets these quality criteria, the probability that it would be contained within the 99% credible set would be ~0.99. For each distinct signal, we first calculated the posterior probability, πC, that the jth variant is driving the association, given by where the summation is over all retained variants in the fine-mapping region. In this expression, Λ is the approximate Bayes’ factor[61] for the jth variant, given by where β and V denote the estimated allelic effect (log-OR) and corresponding variance from the meta-analysis across Metabochip studies. In loci with multiple distinct signals of association, results are presented from exact conditional meta-analysis after adjusting for all other index variants in the fine-mapping region. In loci with a single association signal, results are presented from unconditional meta-analysis. The parameter ω denotes the prior variance in allelic effects, taken here to be 0.04[61]. The 99% credible set[29] for each signal was then constructed by: (i) ranking all variants according to their Bayes’ factor, Λ; and (ii) including ranked variants until their cumulative posterior probability of driving the association attained or exceeded 0.99.

Genomic annotation data and enrichment analyses

We obtained genomic annotation data for transcription factor binding sites (TFBS) assayed through ChIP experiments from multiple sources. We used sites from the ENCODE Project Consortium[10] for 161 proteins available from the UCSC human genome browser. We also obtained raw ChIP and input sequence data for additional factors assayed in primary pancreatic islets[14]. We then processed these additional factors using protocols employed by the ENCODE Project Consortium[10]. First, sequence reads were aligned to the human genome (hg19) using BWA[62] with sex-specific references, and were then converted to BAM files using SAMtools[63] after removing duplicate and non-uniquely mapped reads. Binding sites were called from reads of each replicate, as well as reads pooled across all replicates, using SPP[64]. Raw sites from each replicate of a protein were compared using an irreproducible discovery rate[65] (IDR) threshold of 0.02. The resulting number of sites passing this IDR threshold was then used to filter the pooled sites of a protein. The set of sites were further filtered for artefacts using a blacklist of genomic regions from the ENCODE Project Consortium. Sites from all sources for each protein, including ENCODE, were then combined. The complete set of 165 proteins employed in these analyses is presented in Supplementary Table 11. In addition, we obtained FOXA2 ChIP-seq sites that were previously identified in human liver[39] and lifted their positions to hg19. We obtained annotation data for five histone modifications (H3K4me1, H3K4me3, H3K27ac, H3K36me3, and H3K27me3) and CTCF binding assayed from ChIP experiments. We used data from 9 cell types from ENCODE[10] (Gm12878, K562, Hepg2, Hsmm, Huvec, Nhek, Nhlf, h1Hesc, and Hmec); we also obtained raw ChIP data assayed in primary pancreatic islets[14] and pre-mature and mature human adipose stromal cells[66]. We mapped reads to hg19 using BWA[62], and used the resulting mapped reads from these 12 cell types as input to ChromHMM[67]. We assigned states based on the following chromatin signatures: active promoter (H3K4me3 and H3K27ac); strong enhancer 1 (H3K4me3, H3K27ac, and H3K4me1); strong enhancer 2 (H3K27ac and H3K4me1); weak enhancer (H3K4me1); poised promoter (H3K27me3, H3K4me3, and H3K4me1); repressed (H3K27me3); insulator (CTCF); and transcription (H3K36me3). For each cell type, we pooled the three enhancer states into one enhancer category, and the two promoter states into one promoter category. We also identified long non-coding RNA data from the Human Body Map (UCSC genome browser) and from pancreatic islets[68]. For each genomic annotation, we tested for overall enrichment of the posterior probability that overlapping variants in the 99% credible sets are driving distinct association signals (πC). We first calculated the mean posterior probability (mean πC) over the set of variants overlapping a given annotation. We then generated a null distribution of the mean posterior probability (mean πC) by: (i) shifting the genomic locations of binding sites a random distance within 100kb in either direction; (ii) recalculating the mean posterior probability for 99% credible set variants overlapping shifted sites; and (iii) repeating this procedure 100,000 times. We estimated the fold-enrichment of each overlap by calculating the expected null posterior probability, and dividing the observed probability by the expected probability. We calculated a p-value for the enrichment by the proportion of permutations for which the expected posterior probability of driving the association signal was greater than or equal to that observed. We considered cell type annotations to be significantly enriched if the p-value was less than 0.05/37 = 0.0014 (Bonferroni correction for 37 annotations). We considered TFBS annotations to be significantly enriched if the p-value was less than 0.05/165 = 0.00030 (Bonferroni correction for 165 factors). We next partitioned binding sites into those that are “shared” with another factor (i.e. genomic interval intersects a site for at least one other factor), and those that are “unique”. We also partitioned binding sites based on overlap with islet enhancer elements. For each factor with significant enrichment across all credible sets (FOXA2), we applied the same enrichment analysis, but restricted to credible set variants for each distinct association signal, separately. We assessed the evidence for intersection in FOXA2 ChIP-seq sites from islets[14], HepG2[10], and liver[39], genome-wide and overlapping credible set variants, using BEDtools[69].

Motif analysis

We conducted recognition motif enhancement analyses for the set of FOXA2 ChIP-seq binding sites. First, we obtained repeat-masked genomic sequence underlying each site using the UCSC human genome browser. We scanned sequences for enrichment in these motifs using MEME-ChIP[70], which uses up to 100bp surrounding the mid-point of each site. This resulted in 198 enriched motifs with E-value (expected number of hits) less than 0.05 (Supplementary Table 16). We compared each motif to those known from JASPAR[71], ENCODE[10], and Homer[72] using Tomtom[73]. Second, we identified variants in FOXA2 ChIP-seq sites predicted to disrupt an enriched recognition motif by: (i) scanning a 25bp of sequence flanking each variant allele using FIMO[74] (p<0.0001); and (ii) retaining variants in highly conserved positions (entropy less than 0.5). For the 12 variants at FOXA2-enriched signals disrupting at least one recognition motif (Table 3, Supplementary Table 14), we compared their posterior probabilities of driving the association (πC) with those for non-disrupting variants in FOXA2 ChIP-seq sites at the same signals using a two-sided Wilcoxon rank-sum test.

Electrophoretic mobility shift assays

EMSA was performed using nuclear extracts from human HepG2 and EndoC-βH1 cells. HepG2 cells were the generous gift of the Ratcliffe laboratory[75] and authenticated by genotyping in the MHC region. Endo-βH1 cells were obtained from Endocells and have been previously authenticated[44]. Both cell lines were tested and found negative for mycoplasma contamination. Nuclear extracts were incubated with 32P gamma-ATP end-labeled double-stranded DNA probes (PerkinElmer, MA). The forward strand probe sequences used are presented in Supplementary Table 17. For each lane of the EMSA, 5μg of nuclear extract was incubated with 100 fmol labeled probes in a 10ul binding reaction containing 10mM Tris-HCl pH7.5, 4% glycerol, 1mM MgCl2, 0.5mM EDTA, 0.5mM DTT, 50mM NaCl and 1μg poly(dI-dC). For competition assays unlabeled probe at 100-fold excess was added to the binding reaction before addition of labeled probes. For super-shift assays the nuclear extract was pre-incubated with 1μg antibody for 30 minutes on ice before the probe was added. The following antibodies were used: anti-NEUROD1 (sc-1084X, Santa Cruz Biotechnology, Texas), anti-PTF1A (sc-98612X, Santa Cruz Biotechnology, Texas), anti-HNF3B (FOXA2) (sc-6554X, Santa Cruz Biotechnology, Texas), anti-YY1 (sc281X, Santa Cruz Biotechnology, Texas), anti-TAL1 (sc12984X, Santa Cruz Biotechnology, Texas), normal rabbit Ig (sc-2027, Santa Cruz Biotechnology, Texas), normal goat Ig (sc-2028, Santa Cruz Biotechnology, Texas).

Luciferase activity

We synthesised 224bp nucleotide sequences containing either the risk or protective allele of the MTNR1B enhancer sequence rs10830963 in either the forward or reverse orientation by GeneArt (Life Technologies). Complementary single-stranded oligos were then annealed and sub-cloned into the minimal promoter-driven luciferase vector pGL4.23 (Promega) using Nhel and Xhol. Isolated clones were verified by sequencing. For luciferase assays, human liver HepG2 and human beta-cell EndoC-βH1[45] cells were counted and seeded into 24 well trays (Corning) at 1.5×105 (HepG2) or 1.4×105 (EndoC-βH1) cells/well. Transfections were performed in triplicate with either Lipofectamine 2000 (HepG2) or Fugene 6 (EndoC-βH1) as per manufacturer’s instructions. Cells were transfected with 700ng pGL4.23 DNA containing the protective or risk MTNR1B enhancer sequence in either the forward or reverse orientation, or an equivalent amount of empty vector DNA, plus 10ng pRL-SV40 DNA (Promega) as a transfection control, per well. Cells were lysed 48 hours post-transfection and analysed for Firefly and Renilla luciferase activities using the Dual Luciferase Assay System (Promega) as per manufacturer’s instructions, in half-volume 96 well tray format on an Enspire Multimode Plate Reader (Perkin Elmer). Firefly luciferase activity was normalised to Renilla luciferase activity for each well, and the results expressed as a mean normalised activity relative to empty vector-transfected cells. All experiments were performed three times in triplicate. A two-sided unpaired t-test was used to compare luciferase activity between alleles.

Mouse gene expression analysis

We obtained fold-changes in pancreatic islet gene expression in wild type compared to Foxa1/Foxa2-null mice[47]. We used ENSEMBL to map mouse genes to human orthologs. We filtered for human genes annotated as protein coding in GENCODE. This filtering resulted in 4,629 human protein coding genes for analysis. First, we calculated the genomic interval spanned by the variants in each credible set. We expanded this interval for 500kb on either side, and identified the set of genes overlapping this region using BEDtools[69]. To account for syntenic differences in gene order between species, we retained only those genes that were: (a) on the same chromosome; and (b) in exactly the same relative order in both mouse and human genomes. At the GIPR locus, one of the genes was ordered differently and thus removed from the analysis. At two loci, KCNJ11 and HNF1A, at least one of the genes was located on a different part of the same chromosome, and at another locus, GCK, genes were located on different chromosomes. For these three loci, we retained only those genes that were at the same chromosomal location to the interval covered by the credible set for the association signal for that locus (by lifting over from hg19 to mouse build mm10). Second, for each distinct association signal, we identified the closest gene to the index variant using BEDtools[69]. We then partitioned distinct association signals into those with evidence for enriched FOXA2 binding (fold-enrichment >1) and those without, counting each gene only once in a given group. For each analysis, we converted the fold-changes to percentages, and compared the percent change in expression using a one-sided Wilcoxon rank-sum test between genes in each partition and all 4,629 protein coding genes.

74 in total

Review 1. Genotype imputation for genome-wide association studies.

Authors: Jonathan Marchini; Bryan Howie
Journal: Nat Rev Genet Date: 2010-07 Impact factor: 53.242

2. A genetically engineered human pancreatic β cell line exhibiting glucose-inducible insulin secretion.

Authors: Philippe Ravassard; Yasmine Hazhouz; Séverine Pechberty; Emilie Bricout-Neveu; Mathieu Armanet; Paul Czernichow; Raphael Scharfmann
Journal: J Clin Invest Date: 2011-08-25 Impact factor: 14.808

3. Species-specific strategies underlying conserved functions of metabolic transcription factors.

Authors: Raymond E Soccio; Geetu Tuteja; Logan J Everett; Zhaoyu Li; Mitchell A Lazar; Klaus H Kaestner
Journal: Mol Endocrinol Date: 2011-02-03

4. Comparing strategies to fine-map the association of common SNPs at chromosome 9p21 with type 2 diabetes and myocardial infarction.

Authors: Jessica Shea; Vineeta Agarwala; Anthony A Philippakis; Jared Maguire; Eric Banks; Mark Depristo; Brian Thomson; Candace Guiducci; Robert C Onofrio; Sekar Kathiresan; Stacey Gabriel; Noël P Burtt; Mark J Daly; Leif Groop; David Altshuler
Journal: Nat Genet Date: 2011-07-24 Impact factor: 38.330

5. A map of human genome variation from population-scale sequencing.

Authors: Gonçalo R Abecasis; David Altshuler; Adam Auton; Lisa D Brooks; Richard M Durbin; Richard A Gibbs; Matt E Hurles; Gil A McVean
Journal: Nature Date: 2010-10-28 Impact factor: 49.962

6. FIMO: scanning for occurrences of a given motif.

Authors: Charles E Grant; Timothy L Bailey; William Stafford Noble
Journal: Bioinformatics Date: 2011-02-16 Impact factor: 6.937

Review 7. Pioneer transcription factors: establishing competence for gene expression.

Authors: Kenneth S Zaret; Jason S Carroll
Journal: Genes Dev Date: 2011-11-01 Impact factor: 11.361

8. Genome-wide association study in individuals of South Asian ancestry identifies six new type 2 diabetes susceptibility loci.

Authors: Jaspal S Kooner; Danish Saleheen; Xueling Sim; Joban Sehmi; Weihua Zhang; Philippe Frossard; Latonya F Been; Kee-Seng Chia; Antigone S Dimas; Neelam Hassanali; Tazeen Jafar; Jeremy B M Jowett; Xinzhong Li; Venkatesan Radha; Simon D Rees; Fumihiko Takeuchi; Robin Young; Tin Aung; Abdul Basit; Manickam Chidambaram; Debashish Das; Elin Grundberg; Asa K Hedman; Zafar I Hydrie; Muhammed Islam; Chiea-Chuen Khor; Sudhir Kowlessur; Malene M Kristensen; Samuel Liju; Wei-Yen Lim; David R Matthews; Jianjun Liu; Andrew P Morris; Alexandra C Nica; Janani M Pinidiyapathirage; Inga Prokopenko; Asif Rasheed; Maria Samuel; Nabi Shah; A Samad Shera; Kerrin S Small; Chen Suo; Ananda R Wickremasinghe; Tien Yin Wong; Mingyu Yang; Fan Zhang; Goncalo R Abecasis; Anthony H Barnett; Mark Caulfield; Panos Deloukas; Timothy M Frayling; Philippe Froguel; Norihiro Kato; Prasad Katulanda; M Ann Kelly; Junbin Liang; Viswanathan Mohan; Dharambir K Sanghera; James Scott; Mark Seielstad; Paul Z Zimmet; Paul Elliott; Yik Ying Teo; Mark I McCarthy; John Danesh; E Shyong Tai; John C Chambers
Journal: Nat Genet Date: 2011-08-28 Impact factor: 38.330

9. A role for coding functional variants in HNF4A in type 2 diabetes susceptibility.

Authors: B Jafar-Mohammadi; C J Groves; A P Gjesing; B M Herrera; W Winckler; H M Stringham; A P Morris; T Lauritzen; A S F Doney; A D Morris; M N Weedon; A J Swift; J Kuusisto; M Laakso; D Altshuler; A T Hattersley; F S Collins; M Boehnke; T Hansen; O Pedersen; C N A Palmer; T M Frayling; A L Gloyn; M I McCarthy
Journal: Diabetologia Date: 2010-09-29 Impact factor: 10.122

10. MEME-ChIP: motif analysis of large DNA datasets.

Authors: Philip Machanick; Timothy L Bailey
Journal: Bioinformatics Date: 2011-04-12 Impact factor: 6.937

194 in total

1. A circadian rhythm-related MTNR1B genetic variant modulates the effect of weight-loss diets on changes in adiposity and body composition: the POUNDS Lost trial.

Authors: Leticia Goni; Dianjianyi Sun; Yoriko Heianza; Tiange Wang; Tao Huang; J Alfredo Martínez; Xiaoyun Shang; George A Bray; Steven R Smith; Frank M Sacks; Lu Qi
Journal: Eur J Nutr Date: 2018-03-07 Impact factor: 5.614

Review 2. The Continuing Evolution of Precision Health in Type 2 Diabetes: Achievements and Challenges.

Authors: Yuan Lin; Jennifer Wessel
Journal: Curr Diab Rep Date: 2019-02-26 Impact factor: 4.810

3. A Common Type 2 Diabetes Risk Variant Potentiates Activity of an Evolutionarily Conserved Islet Stretch Enhancer and Increases C2CD4A and C2CD4B Expression.

Authors: Ina Kycia; Brooke N Wolford; Jeroen R Huyghe; Christian Fuchsberger; Swarooparani Vadlamudi; Romy Kursawe; Ryan P Welch; Ricardo d'Oliveira Albanus; Asli Uyar; Shubham Khetan; Nathan Lawlor; Mohan Bolisetty; Anubhuti Mathur; Johanna Kuusisto; Markku Laakso; Duygu Ucar; Karen L Mohlke; Michael Boehnke; Francis S Collins; Stephen C J Parker; Michael L Stitzel
Journal: Am J Hum Genet Date: 2018-04-05 Impact factor: 11.025

4. Genetic regulatory signatures underlying islet gene expression and type 2 diabetes.

Authors: Arushi Varshney; Laura J Scott; Ryan P Welch; Michael R Erdos; Peter S Chines; Narisu Narisu; Ricardo D'O Albanus; Peter Orchard; Brooke N Wolford; Romy Kursawe; Swarooparani Vadlamudi; Maren E Cannon; John P Didion; John Hensley; Anthony Kirilusha; Lori L Bonnycastle; D Leland Taylor; Richard Watanabe; Karen L Mohlke; Michael Boehnke; Francis S Collins; Stephen C J Parker; Michael L Stitzel
Journal: Proc Natl Acad Sci U S A Date: 2017-02-13 Impact factor: 11.205

5. Comprehensive Multiple eQTL Detection and Its Application to GWAS Interpretation.

Authors: Biao Zeng; Luke R Lloyd-Jones; Grant W Montgomery; Andres Metspalu; Tonu Esko; Lude Franke; Urmo Vosa; Annique Claringbould; Kenneth L Brigham; Arshed A Quyyumi; Youssef Idaghdour; Jian Yang; Peter M Visscher; Joseph E Powell; Greg Gibson
Journal: Genetics Date: 2019-05-22 Impact factor: 4.562

6. Induction of α cell-restricted Gc in dedifferentiating β cells contributes to stress-induced β-cell dysfunction.

Authors: Taiyi Kuo; Manashree Damle; Bryan J González; Dieter Egli; Mitchell A Lazar; Domenico Accili
Journal: JCI Insight Date: 2019-05-23

7. Tpcn2 knockout mice have improved insulin sensitivity and are protected against high-fat diet-induced weight gain.

Authors: Hong He; Katie Holl; Sarah DeBehnke; Chay Teng Yeo; Polly Hansen; Abraham K Gebre; Sandra Leone-Kabler; Margarida Ruas; John S Parks; John Parrington; Leah C Solberg Woods
Journal: Physiol Genomics Date: 2018-05-11 Impact factor: 3.107

Review 8. The Genetic Architecture of Diabetes in Pregnancy: Implications for Clinical Practice.

Authors: Jeffrey W Kleinberger; Kristin A Maloney; Toni I Pollin
Journal: Am J Perinatol Date: 2016-08-29 Impact factor: 1.862

9. Genome-wide profiling of histone H3K27 acetylation featured fatty acid signalling in pancreatic beta cells in diet-induced obesity in mice.

Authors: Takao Nammo; Haruhide Udagawa; Nobuaki Funahashi; Miho Kawaguchi; Takashi Uebanso; Masaki Hiramoto; Wataru Nishimura; Kazuki Yasuda
Journal: Diabetologia Date: 2018-10-03 Impact factor: 10.122

Review 10. Insights into pancreatic islet cell dysfunction from type 2 diabetes mellitus genetics.

Authors: Nicole A J Krentz; Anna L Gloyn
Journal: Nat Rev Endocrinol Date: 2020-02-25 Impact factor: 43.330