
Gene-wide analysis detects two new susceptibility genes for Alzheimer's disease.

Valentina Escott-Price¹, Céline Bellenguez², Li-San Wang³, Seung-Hoan Choi⁴, Denise Harold¹, Lesley Jones¹, Peter Holmans¹, Amy Gerrish¹, Alexey Vedernikov¹, Alexander Richards¹, Anita L DeStefano⁴, Jean-Charles Lambert², Carla A Ibrahim-Verbaas⁵, Adam C Naj⁶, Rebecca Sims¹, Gyungah Jun⁷, Joshua C Bis⁸, Gary W Beecham⁹, Benjamin Grenier-Boley², Giancarlo Russo¹⁰, Tricia A Thornton-Wells¹¹, Nicola Denning¹, Albert V Smith¹², Vincent Chouraki¹³, Charlene Thomas¹, M Arfan Ikram¹⁴, Diana Zelenika¹⁵, Badri N Vardarajan¹⁶, Yoichiro Kamatani¹⁷, Chiao-Feng Lin³, Helena Schmidt¹⁸, Brian Kunkle¹⁹, Melanie L Dunstan¹, Maria Vronskaya¹, Andrew D Johnson²⁰, Agustin Ruiz²¹, Marie-Thérèse Bihoreau¹⁵, Christiane Reitz²², Florence Pasquier²³, Paul Hollingworth¹, Olivier Hanon²⁴, Annette L Fitzpatrick²⁵, Joseph D Buxbaum²⁶, Dominique Campion²⁷, Paul K Crane²⁸, Clinton Baldwin²⁹, Tim Becker³⁰, Vilmundur Gudnason¹², Carlos Cruchaga³¹, David Craig³², Najaf Amin³³, Claudine Berr³⁴, Oscar L Lopez³⁵, Philip L De Jager³⁶, Vincent Deramecourt²³, Janet A Johnston³², Denis Evans³⁷, Simon Lovestone³⁸, Luc Letenneur³⁹, Isabel Hernández²¹, David C Rubinsztein⁴⁰, Gudny Eiriksdottir⁴¹, Kristel Sleegers⁴², Alison M Goate³¹, Nathalie Fiévet⁴³, Matthew J Huentelman⁴⁴, Michael Gill⁴⁵, Kristelle Brown⁴⁶, M Ilyas Kamboh⁴⁷, Lina Keller⁴⁸, Pascale Barberger-Gateau³⁸, Bernadette McGuinness³², Eric B Larson⁴⁹, Amanda J Myers⁵⁰, Carole Dufouil³⁹, Stephen Todd³², David Wallon²⁷, Seth Love⁵¹, Ekaterina Rogaeva⁵², John Gallacher⁵³, Peter St George-Hyslop⁵⁴, Jordi Clarimon⁵⁵, Alberto Lleo⁵⁵, Anthony Bayer⁵³, Debby W Tsuang⁵⁶, Lei Yu⁵⁷, Magda Tsolaki⁵⁸, Paola Bossù⁵⁹, Gianfranco Spalletta⁵⁹, Petra Proitsi³⁸, John Collinge⁶⁰, Sandro Sorbi⁶¹, Florentino Sanchez Garcia⁶², Nick C Fox⁶³, John Hardy⁶⁴, Maria Candida Deniz Naranjo⁶², Paolo Bosco⁶⁵, Robert Clarke⁶⁶, Carol Brayne⁶⁷, Daniela Galimberti⁶⁸, Elio Scarpini⁶⁸, Ubaldo Bonuccelli⁶⁹, Michelangelo Mancuso⁶⁹, Gabriele Siciliano⁶⁹, Susanne Moebus⁷⁰, Patrizia 
Mecocci⁷¹, Maria Del Zompo⁷², Wolfgang Maier⁷³, Harald Hampel⁷⁴, Alberto Pilotto⁷⁵, Ana Frank-García⁷⁶, Francesco Panza⁷⁷, Vincenzo Solfrizzi⁷⁷, Paolo Caffarra⁷⁸, Benedetta Nacmias⁶¹, William Perry⁹, Manuel Mayhaus⁷⁹, Lars Lannfelt⁸⁰, Hakon Hakonarson⁸¹, Sabrina Pichler⁷⁹, Minerva M Carrasquillo⁸², Martin Ingelsson⁸⁰, Duane Beekly⁸³, Victoria Alvarez⁸⁴, Fanggeng Zou⁸², Otto Valladares³, Steven G Younkin⁸², Eliecer Coto⁸⁴, Kara L Hamilton-Nelson¹⁹, Wei Gu⁸⁵, Cristina Razquin⁸⁶, Pau Pastor⁸⁷, Ignacio Mateo⁸⁸, Michael J Owen¹, Kelley M Faber⁸⁹, Palmi V Jonsson⁹⁰, Onofre Combarros⁸⁸, Michael C O'Donovan¹, Laura B Cantwell³, Hilkka Soininen⁹¹, Deborah Blacker⁹², Simon Mead⁶⁰, Thomas H Mosley⁹³, David A Bennett⁹⁴, Tamara B Harris⁹⁵, Laura Fratiglioni⁹⁶, Clive Holmes⁹⁷, Renee F A G de Bruijn⁹⁸, Peter Passmore³², Thomas J Montine⁹⁹, Karolien Bettens⁴², Jerome I Rotter¹⁰⁰, Alexis Brice¹⁰¹, Kevin Morgan⁴⁶, Tatiana M Foroud⁸⁹, Walter A Kukull¹⁰², Didier Hannequin²⁷, John F Powell³⁸, Michael A Nalls¹⁰³, Karen Ritchie¹⁰⁴, Kathryn L Lunetta⁴, John S K Kauwe¹⁰⁵, Eric Boerwinkle¹⁰⁶, Matthias Riemenschneider⁸⁵, Mercè Boada¹⁰⁷, Mikko Hiltunen⁹¹, Eden R Martin⁹, Reinhold Schmidt¹⁰⁸, Dan Rujescu¹⁰⁹, Jean-François Dartigues¹¹⁰, Richard Mayeux²², Christophe Tzourio¹¹¹, Albert Hofman¹⁴, Markus M Nöthen¹¹², Caroline Graff¹¹³, Bruce M Psaty¹¹⁴, Jonathan L Haines¹¹⁵, Mark Lathrop¹¹⁶, Margaret A Pericak-Vance⁹, Lenore J Launer⁹⁵, Christine Van Broeckhoven⁴², Lindsay A Farrer¹¹⁷, Cornelia M van Duijn¹¹⁸, Alfredo Ramirez¹¹⁹, Sudha Seshadri¹²⁰, Gerard D Schellenberg³, Philippe Amouyel¹²¹, Julie Williams¹.

Abstract

BACKGROUND: Alzheimer's disease is a common debilitating dementia with known heritability, for which 20 late onset susceptibility loci have been identified, but more remain to be discovered. This study sought to identify new susceptibility genes, using an alternative gene-wide analytical approach which tests for patterns of association within genes, in the powerful genome-wide association dataset of the International Genomics of Alzheimer's Project Consortium, comprising over 7 million genotypes from 25,580 Alzheimer's cases and 48,466 controls.

PRINCIPAL FINDINGS: In addition to earlier reported genes, we detected genome-wide significant loci on chromosomes 8 (TP53INP1, p = 1.4×10−6) and 14 (IGHV1-67, p = 7.9×10−8) which indexed novel susceptibility loci.

SIGNIFICANCE: The additional genes identified in this study have an array of functions previously implicated in Alzheimer's disease, including aspects of energy metabolism, protein degradation and the immune system, and add further weight to these pathways as potential therapeutic targets in Alzheimer's disease.

Year: 2014 PMID: 24922517 PMCID: PMC4055488 DOI: 10.1371/journal.pone.0094661

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

The prevalence of Alzheimer's disease (AD) is increasing as more people live into old age. Hope for finding preventative and clinical therapies lies in the ability to gain a better understanding of the underlying biology of the disease, and genetics will provide a valuable starting point for advancement. Rare monogenic forms of AD, the majority of which are attributable to mutations in one of three genes, APP, PSEN1 and PSEN2, exist, but common, late-onset AD is genetically complex with heritability estimated to be between 56–79%[1], [2]. Along with the APOE polymorphism[3], 20 common susceptibility loci have been identified associated with AD[4]–[9]. (This figure does not include CD33 as it did not show genome-wide significance in the original report[9].) Recently, a moderately rare variant in TREM2 has also shown evidence for association[10]. However, new variants remain to be found. This study sought to identify new susceptibility genes, using an alternative gene-wide analytical approach, which focuses on the pattern of association within gene regions. Genome-wide association (GWA) studies to date have focused on single nucleotide polymorphisms (SNPs) as the unit of analysis. Single locus tests are the simplest to generate and to interpret, but have limitations. For example, if susceptibility is conferred by multiple variants within a locus[11], [12], this gives rise to complex patterns of association that might not be reflected by association to the same SNPs in different samples, despite apparently reasonably powered tests[13], [14]. In addition, rare risk-increasing variants may not be tagged by single SNPs, as is e.g. the case for CLU in which significant enrichment of rare variants in patients was observed independent of the single locus GWA signal[15]. It is therefore likely that the power to detect association might be enhanced by exploiting information from multiple signals within genes encompassed by gene-wide statistical approaches[12]. 
Disease risk may reflect the co-action of several loci but the number of loci involved at the individual or the population levels are unknown, as is the spectrum of allele frequencies and effect sizes[16]. The observations of multiple genome-wide significant or suggestive linkage signals for disorders, that do not readily replicate between studies but which are not randomly distributed across the genome[17], [18] is compatible with the existence of multiple risk alleles of moderate effect that would implicate a locus in disease risk, when analysed together. Thus the first aim of this study is to test for gene-wide association with AD, using a powerful mega-meta analysis of genome-wide datasets as part of the International Genomics of Alzheimer's Project (IGAP) Consortium comprising four AD genetic consortia (see the full list of consortia members in Materials S1): Genetic and Environmental Risk in Alzheimer's Disease (GERAD), European Alzheimer's Disease Initiative (EADI), Cohorts for Heart and Aging in Genomic Epidemiology (CHARGE) and Alzheimer's Disease Genetics Consortium (ADGC) (see full IGAP datasets description in Materials S2). A two stage study was undertaken. In Stage 1 the combined sample included 17,008 AD cases and 37,154 controls. In Stage 2 loci with p-values (combined over all SNPs at the locus) less than 10−4 were selected for replication for 8,572 AD cases and 11,312 controls of European ancestry. We observed evidence for gene-wide association at loci which implicate genes which already show genome-wide significant association from single SNP analysis (CR1, BIN1, HLA-DRB5/HLA-DRB1, CD2AP, EPHA1, PTK2B, CLU, MS4A6A, PICALM, SORL1, SLC24A4, ABCA7, APOE), three new genes in the vicinity of lately reported single SNP hits[9] (ZNF3, NDUFS3, MTCH2) and two novel loci (TP53INP1, combined p = 1.4×10−6 and IGHV1-67 combined p = 7.9×10−8).

Results

Initially, we tested for excess genetic signal revealed by the Stage 1 IGAP SNP GWAS study. We observed more SNPs at all significance intervals, and more genes at multiple significance thresholds, than expected by chance (Table S1). This is unlikely to be due to uncorrected stratification, since each of the individual GWAS samples in the IGAP Stage 1 analysis was corrected for ethnic variation. Thus it is likely that the sample contains novel genetic signals, in addition to those detected by the primary analysis[9], [19]. Next, we looked at overrepresentation of significant genes in the Stage 1 data. Table 1 gives the observed and expected numbers of significant genes at significance levels 10−4, 10−5 and 10−6. Both when all genes are counted in the analyses and when the known genes (Table S1) and genes within 500 kb of them are excluded, the observed numbers of genes are much larger than expected at all significance levels (all p≤0.001). Thus there are more loci associated with AD to find.
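The comparison of observed and expected gene counts can be illustrated with a short calculation. This is a minimal sketch, assuming the binomial model described in Materials and Methods, the total of 24,849 genes remaining after exclusions, and the observed locus counts of 9, 4 and 2 at the three thresholds; the function name `binomial_tail` is illustrative and not taken from the study.

```python
import math

def binomial_tail(n, alpha, k_obs):
    """P(X >= k_obs) for X ~ Binomial(n, alpha), computed as 1 - P(X < k_obs)."""
    below = sum(
        math.comb(n, k) * alpha**k * (1.0 - alpha) ** (n - k)
        for k in range(k_obs)
    )
    return 1.0 - below

N_GENES = 24_849                          # genes after exclusions (from the text)
OBSERVED = {1e-4: 9, 1e-5: 4, 1e-6: 2}    # observed loci per threshold (from the text)

for alpha, k_obs in OBSERVED.items():
    expected = N_GENES * alpha            # mean of the binomial under the null
    p_excess = binomial_tail(N_GENES, alpha, k_obs)
    print(f"alpha={alpha:g}  expected={expected:.3f}  observed={k_obs}  p={p_excess:.2g}")
```

Under these assumptions the expected counts are about 2.5, 0.25 and 0.025, and the tail probabilities come out close to the reported over-representation p-values.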

Table 1

Overrepresentation of replication of significant genes/loci available at Stage 2, excluding all loci of 0.5 Mb around genes previously reported[4]–[8] and Stage 1 IGAP genes[9], [19] containing genome-wide significant SNPs.

	GENES		LOCI
Stage 1 significance level	Significant at Stage 1	Replicated (p≤0.05) at Stage 2	Significant at Stage 1	Replicated (p≤0.05) at Stage 2	Over-representation p-value
p≤10⁻⁴	27	9 (33%)	9	3 (33%)	0.109
p≤10⁻³	74	17 (23%)	36	8 (22%)	0.125
p≤0.01	229	49 (21%)	102	26 (25%)	0.0001
p≤0.05	390	77 (20%)	171	33 (19%)	0.007
Total (p≤1)	887	124 (14%)	444	60 (13.5%)	4.6×10⁻¹²

Over-representation p-values were calculated with chi-square/Fisher's exact tests counting the genes within 0.5 Mb as one locus.

Furthermore, the number of independent nominally significant loci at Stage 2 (N = 60, (13.5%)) was significantly greater than expected by chance (p = 4.6×10−12). The percentage of replicated loci increased with the decrease of the gene-wise significance threshold at Stage 1 (see Table 2 for details).

Table 2

Overrepresentation of significant loci, excluding regions of 0.5 Mb around genes previously reported[4]–[8] and Stage 1 IGAP genes[9], [19] containing genome-wide significant SNPs.

	Numbers of loci (genes)
	p≤10⁻⁴	p≤10⁻⁵	p≤10⁻⁶
Observed	9(27)	4(8)	2(2)
Expected	2.5	0.25	0.025
p-value	0.001	0.00013	0.0003

The observed number of genes is calculated by combining significant loci within 0.5 Mb into one signal. The APOE region is excluded (chr19: 44,411,940–46,411,945 bp). The total number of genes after exclusions is 24,849.

Combining the gene-wide p-values from Stages 1 and 2 using Fisher's method revealed two new gene-based genome-wide significant (p<2.5×10−6) loci, TP53INP1 and IGHV1-67. The TP53INP1 gene is located at chr8: 95,938,200–95,961,615 and its combined gene-based p-value is 1.4×10−6 (Table 3). Table S3 provides details for each SNP contributing to the gene-based result. Out of 45 SNPs in the gene, three (rs4735333, rs1713669, rs896855) have p-values ≤10−4. Figure 1 shows the LD plot of this gene and suggests that there are at least two partially independent signals in the TP53INP1 gene (r2 between the pairs of most significant SNPs, rs4735333-rs1713669 and rs1713669-rs896855, is 0.65 and 0.6 respectively).

Table 3

New genome-wide significant genes associated with AD.

Gene Name	Chr	Position	Stage 1 gene-wide p-value	Stage 2 gene-wide p-value	N of SNPs per gene	Combined gene-wide p-value	Combined best SNP p-value	Biological function
TP53INP1	8	95,938,200–95,961,615	1.7×10⁻²	4.5×10⁻³	45	1.4×10⁻⁶	1.5×10⁻⁷	Regulation of autophagy, cell cycle arrest
IGHV1-67	14	107,136,620–107,137,059	2.3×10⁻⁴	3.2×10⁻⁵	2	7.9×10⁻⁸	3.9×10⁻⁵	Immunoglobulin heavy chain region: adaptive immunity
New genes in the vicinity of recently reported single SNP genome-wide significant hits[9], [19]:
ZNF3	7	99,661,653–99,679,371	2.7×10⁻²	1.8×10⁻⁶	27	8.6×10⁻⁷	3.1×10⁻⁷	Transcription factor, leucocyte activation
NDUFS3	11	47,600,632–47,606,114	1.2×10⁻⁶	2.2×10⁻²	5	4.8×10⁻⁷	2.9×10⁻⁶	Mitochondrial electron transport, NADH to ubiquinone
MTCH2	11	47,638,858–47,664,206	1.7×10⁻⁵	8.7×10⁻³	34	2.5×10⁻⁶	7.2×10⁻⁸	Mitochondrial inner membrane

Gene-wide p-values in the combined Stage 1 and Stage 2 sample obtained by combining the p-values from the Stage 1 with those from the Stage 2 using Fisher's method.

Figure 1

Linkage disequilibrium structure of TP53INP1 gene.

The SNPs which are significant at 10−4 level are circled in red.

Gene-wide p-values are shown for those genes with p<2.5×10−6 for which the best single-SNP p-value in that gene is greater than 5×10−8 in the combined Stage 1 and Stage 2 sample. Previously reported genes[4]–[8] ± 0.5 Mb around them are excluded.

The IGHV1-67 gene at chr14: 107,136,620–107,137,059 has a combined p-value of 7.9×10−8 (Table 3). This gene is covered by two SNPs (rs2011167, rs1961901), both of which are significant at the 10−4 level. The LD plot in Figure 2 and Table S4 indicate that the two most significant SNPs in the IGHV1-67 gene represent almost the same signal (r2 = 0.92, calculated with SNAP software[20], 1000 Genomes Pilot 1 dataset, CEU population panel (http://www.broadinstitute.org/mpg/snap)).

Figure 2

Linkage disequilibrium structure of IGHV1-67 gene ±5 kb.

The SNPs which are significant at 10−4 level are circled in red.

To look at the gene expression patterns in these novel genes, we used the Webster-Myers expression dataset[21], available at http://labs.med.miami.edu/myers/LFuN/data%20ajhg.html. Comparing 137 AD cases vs 176 controls with temporal or frontal cortex expression values, a t-test showed significantly higher TP53INP1 expression in cases compared to controls (p = 0.0128). Further examination in the BRAINEAC database[22] (www.braineac.org) from the UK Brain Expression Consortium showed TP53INP1 to have a best cis-eQTL p-value of 6.8×10−6 (for SNP rs4582532, which is about 7.6 kb upstream of the gene). The three SNPs with association p≤10−4 mentioned above (rs4735333, rs1713669, rs896855) had significant cis-eQTL p-values of 8.2×10−6, 7.8×10−5 and 1.1×10−5 respectively in BRAINEAC brain expression data. The r2 values between the cis-eQTL SNP and the three associated SNPs were 0.80, 0.65 and 0.81, respectively. Further analysis of additional independent brain expression and methylation datasets (see Methods S1) indicated significant cis eQTLs and meQTLs for TP53INP1 (Tables S10 and S11). The probe for the meQTL is in a CpG island region that corresponds well with ENCODE DNAse/ChIP-seq/Histone marks and is located upstream (∼1.5 kb) of the TP53INP1 transcription start site. In combination these results suggest a possible epigenetic mechanism whereby the associated variants in the region influence TP53INP1 expression in several brain regions. These expression data provide further evidence supporting the functional relevance of TP53INP1 to AD susceptibility. The IGHV1-67 gene was not found in those databases.

In addition we detected two genome-wide significant loci: 1) ZNF3 (chr7: 99,661,653–99,679,371; p = 8.6×10−7) and 2) two closely located genes on chromosome 11, MTCH2 (47,638,858–47,664,206, combined p = 2.5×10−6) and NDUFS3 (47,600,632–47,606,114, combined p = 4.8×10−7) (Table 4). None of these genes harbours genome-wide significant SNPs in the SNP GWAS analysis on its own (see Tables S5-S7). Figures S1-S3 show LD plots of these additional genes.

Table 4

New genome-wide significant genes associated with AD in the vicinity of recently reported single SNP genome-wide significant hits[9], [19].

Gene Name	Chr	Position	Stage 1 gene-wide p-value	Stage 2 gene-wide p-value	N of SNPs per gene	Combined gene-wide p-value	Combined best SNP p-value	Biological function
ZNF3	7	99,661,653–99,679,371	2.7×10⁻²	1.8×10⁻⁶	27	8.6×10⁻⁷	3.1×10⁻⁷	Transcription factor, leucocyte activation
NDUFS3	11	47,600,632–47,606,114	1.2×10⁻⁶	2.2×10⁻²	5	4.8×10⁻⁷	2.9×10⁻⁶	Mitochondrial electron transport, NADH to ubiquinone
MTCH2	11	47,638,858–47,664,206	1.7×10⁻⁵	8.7×10⁻³	34	2.5×10⁻⁶	7.2×10⁻⁸	Mitochondrial inner membrane

Gene-wide p-values in the combined Stage 1 and Stage 2 sample obtained by combining the p-values from the Stage 1 with those from the Stage 2 using Fisher's method. The LD between rs1476679 (chr7∶100,004,446) reported by IGAP [9] and the best SNP in ZNF3 is r2 = 0.16. The LD between rs10838725 (chr11: 47,557,871) reported by IGAP [9] and the best SNPs in the region on chr 11 in the table are r2 = 0.3 and 0.88 for NDUFS3 and MTCH2 respectively.

Gene-wide p-values are shown for those genes with p<2.5×10−6 for which the best single-SNP p-value in that gene is greater than 5×10−8 in the combined Stage 1 and Stage 2 sample. Previously reported genes[4]–[8] ± 0.5 Mb around them are excluded.

The ZNF3 gene on chromosome 7 and the NDUFS3 and MTCH2 genes on chromosome 11 lie close to the SNPs rs1476679 (chr7: 100,004,446; ZCWPW1) and rs10838725 (chr11: 47,557,871; CELF1), respectively, which were shown to be genome-wide significant in the IGAP study when combining Stage 1 and Stage 2 data. Figures S1-S3 show the LD structure of these genes in relation to the IGAP single-SNP genome-wide significant hits. (Note that the NDUFS3 gene on chromosome 11 was gene-based genome-wide significant already at Stage 1.) Although none of these SNPs actually lie within the genes mentioned above, it is possible that they may account for the gene-based signals through linkage disequilibrium. In order to test whether the gene-based signals are independent of these strongly associated SNPs, we performed single-SNP association for each SNP annotated to these genes by regression, adjusting for the significant SNPs mentioned above, along with the other study covariates. The resulting p-values were combined into gene-based tests, as described previously. Under this conditional analysis the ZNF3 gene does not show significant association; however, NDUFS3 still shows a trend towards significance (p = 0.081) (see Table S8 for details). Furthermore, five genes in chr11: 47,593,749–47,615,961 (KBTBD4, NDUFS3, LOC100287127, FAM180B, C1QTNF4) all have p<0.05 with gene-based analysis ±10 kb, when conditioning on the genome-wide significant hit rs10838725 in this region. This may partially be explained by the SNP rs10838731 (p = 1.2×10−3 after conditioning on rs10838725), which is shared by all five of these genes.

Gene-based analysis with ±10 kb around genes did not reveal additional genome-wide significant loci in the Stage 1 data set. Moreover, the significance of the genes identified above did not improve in general, indicating that adding 10 kb flanking regions to genes introduces more noise to the gene-based signal. The combined Stage 1 and Stage 2 gene-based analysis provided further evidence for significant signals in the loci on chr 11 with 8 genes (SPI1, SLC39A13, LOC100287086, PTPMT1, KBTBD4, NDUFS3, LOC100287127, FAM180B) and on chr 7 with 6 genes (LOC100128334, MCM7, PILRB, PILRA, LOC100289298, C7orf51), all reaching genome-wide significance. This is likely to be due to the fact that including genes' flanking regions captures a greater number of the same SNPs or SNPs in high LD showing significant association.

The Manhattan plot of the gene-based p-values (Figure 3) gives a general overview of the gene-based results and shows the new loci in relation to previously reported genes (see also QQ-plots in Figure S4). The results of gene-wide analysis for the genes which were previously reported as associated with AD[4]-[8] and those which are GWAS significant in the Stage 1 analysis are presented in Table S9. Out of 16 reported susceptibility genes, 15 are nominally significant with gene-wide analysis (almost all p-values are smaller than 10−4); however, not all of them reach the gene-based genome-wide significance level (2.5×10−6) when the number of SNPs per gene and the LD structure of the gene are taken into account.

Figure 3

Manhattan plot of gene-wide p-values in the Stage 1 dataset and combined gene-wide p-values where Stage 2 data are available.

Each dot represents a gene, genes in blue lie within the previously reported[4]–[8] associated regions.

We did not observe genome-wide significance for the CD33 gene. This gene was genome-wide significant in Stage 1 (p = 1.9×10−6), but the association was attenuated when combining Stage 1 and Stage 2 data (p = 1.79×10−5), similar to the single SNP association result in the SNP GWAS study[9], [19].

Discussion

In this study we show that there are more signals in the GWAS imputed data at SNP- and gene-based levels than revealed by single SNP analysis. A gene-based analysis is a logical next step after the single SNP analyses in any attempt to combine several possible signals within genes and thus enhance the power of the association analyses.

The first new gene, TP53INP1 (chromosome 8), encodes a protein that is involved in mediating autophagy-dependent cell death via apoptosis through altering the phosphorylation state of p53[23] and in modulating cell-extracellular matrix adhesion and cell migration[24]. TP53INP1 encodes a pro-apoptotic tumor suppressor and its antisense oligonucleotide has been used as a potential treatment for castration-resistant prostate cancer[25]. This association is notable, given the potential inverse association between cancer and AD that has previously been reported[26], [27]. The second new gene, IGHV1-67 (chromosome 14), is a pseudogene in the immunoglobulin (IgG) variable heavy chain region of chromosome 14: its function is unknown but all genes in this region are most likely to be involved in IgG heavy chain VDJ recombinations that lead to the full repertoire of antigen-detecting immune cell clones[28].

The gene-based analysis in this study has shown its utility to enhance the information provided by single SNP analysis (e.g. the NDUFS3 gene was genome-wide significant from Stage 1 using gene-based analysis, whereas this gene was only genome-wide significant after combining the two stages of single SNP analysis). ZNF3 is a zinc-finger protein at the same locus on chromosome 7 as ZCWPW1, thus rendering it a candidate as the gene that contains the functional signal in this region. Although we cannot identify which gene actually confers the risk of AD, it is interesting that ZNF3, whose function is unknown, interacts with BAG3, which is involved in ubiquitin/proteasomal functions in protein degradation[29], and that ZNF3 is regulated by upstream binding of BACH1, whose target genes have roles in the oxidative stress response and control of the cell cycle[30]. In the cluster of genes on chromosome 11, MTCH2 encodes one of the large family of inner mitochondrial membrane transporters[31], which is associated with mitochondrially-mediated cell death[32], adipocyte differentiation[33] and insulin sensitivity[34], and has a genetic association with increased BMI[35]. NDUFS3 also has functions in the mitochondria as it encodes an iron-sulphur component of complex 1 (mitochondrial NADH:ubiquinone oxidoreductase) of the electron transport chain. A deficiency causes a form of Leigh syndrome[36], an early-onset progressive neurodegenerative disorder with a characteristic neuropathology consisting of focal lesions including areas of demyelination and gliosis[37].

In summary, we report two novel genes, TP53INP1 (chr8: 95,938,200–95,961,615; combined p = 1.4×10−6) and IGHV1-67 (chr14: 107,136,620–107,137,059; combined p = 7.9×10−8), which were not reported as genome-wide significant before. We also report the ZNF3 gene on chromosome 7 and a cluster of genes on chromosome 11 (SPI1-MTCH2), showing gene-based genome-wide significant association with Alzheimer's disease. These genes are in proximity with, but not the same as, those detected by genome-wide significant SNPs, demonstrating support for the signals identified by IGAP[9], [19]. They have an array of functions previously implicated in AD, including aspects of energy metabolism, protein degradation and the immune system, and add further weight to these pathways as potential therapeutic targets in AD.

Materials and Methods

Stage 1 data

The main dataset was reported by the IGAP consortium[9], [19] and consists in total of 17,008 cases and 37,154 controls. This sample of AD cases and controls comprises 4 data sets taken from genome-wide association studies performed by GERAD, EADI, CHARGE and ADGC (see primary IGAP manuscript[9], [19] for more details). The full details of the samples and methods for conduct of the GWA studies are provided in the respective manuscripts[4]-[8]. Each of these datasets was imputed with Impute2[38] or MACH[39] software using the 1000 genomes data (release Dec2010) as a reference panel. In total 11,863,202 SNPs were included in the SNPs allelic association result file. To make our analysis as conservative as possible, we only included autosomal SNPs which passed stringent quality control criteria, i.e. we included only SNPs with minor allele frequencies (MAF) ≥0.01 and imputation quality score greater than or equal to 0.3 in each individual study, resulting in 7,055,881 SNPs which are present in at least 40% of the AD cases and 40% of the controls in the analysis. The summary statistics across datasets were combined using fixed-effects inverse variance-weighted meta-analysis. We corrected all individual SNPs p-values for genomic control (GC) λ = 1.087. These SNPs are well imputed on a large proportion of the sample, which increases confidence in the accuracy of the association analysis upon which gene-wide analysis is based.
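The combination step described above can be sketched in a few lines. This is an illustrative example only: the effect sizes and standard errors are made-up numbers, the function names are hypothetical, and the genomic control step assumes the usual convention of dividing the 1-df chi-square statistic by λ, which the text does not spell out.

```python
import math

def fixed_effects_meta(betas, ses):
    """Inverse variance-weighted fixed-effects estimate and its standard error."""
    weights = [1.0 / se**2 for se in ses]          # weight = 1 / variance
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    return beta, math.sqrt(1.0 / total)

def gc_corrected_p(beta, se, lam=1.0):
    """Two-sided p-value after dividing the 1-df chi-square by lambda (assumed convention)."""
    chi2 = (beta / se) ** 2 / lam
    return math.erfc(math.sqrt(chi2 / 2.0))        # chi-square(1) upper tail

# Hypothetical per-study log odds ratios and standard errors for one SNP.
beta, se = fixed_effects_meta([0.10, 0.20], [0.05, 0.05])
print(beta, se, gc_corrected_p(beta, se, lam=1.087))
```

With equal standard errors the pooled estimate is the simple mean of the study estimates, and applying λ = 1.087 makes the p-value slightly larger than the uncorrected one.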

Stage 2 data

11,632 SNPs with p-values <10−3 in the IGAP meta-analysis were successfully genotyped in a Stage 2 sample comprising 8,572 cases and 11,312 controls (see primary IGAP manuscript[9], [19] for more details). An additional 771 SNPs were successfully genotyped to test all genes with gene-wide p-values <10−4 in the IGAP Stage 1 analysis, excluding genes reported prior to IGAP[4]–[8], the four loci reaching genome-wide significance in the Stage 1 IGAP meta-analysis[9], [19] and the 0.5 Mb regions around them (Table S2). These SNPs cover 887 genes and correspond to 444 independent loci where all genes within 0.5 Mb are counted as one locus.
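Counting genes within 0.5 Mb as one locus can be sketched as a simple sweep over sorted gene positions. The exact rule used in the study (for example, how distance is measured and whether chains of neighbouring genes are merged) is not stated, so the gap definition below, from the end of the current locus to the start of the next gene, is an assumption, and `group_into_loci` is a hypothetical helper. Gene coordinates are those given in the tables.

```python
def group_into_loci(genes, max_gap=500_000):
    """Group (name, chrom, start, end) records into loci of nearby genes."""
    loci = []
    last_chrom, last_end = None, None
    for name, chrom, start, end in sorted(genes, key=lambda g: (g[1], g[2])):
        if chrom == last_chrom and start - last_end <= max_gap:
            loci[-1].append(name)              # close enough: same locus
            last_end = max(last_end, end)
        else:
            loci.append([name])                # start a new locus
            last_chrom, last_end = chrom, end
    return loci

genes = [
    ("TP53INP1", 8, 95_938_200, 95_961_615),
    ("IGHV1-67", 14, 107_136_620, 107_137_059),
    ("ZNF3", 7, 99_661_653, 99_679_371),
    ("NDUFS3", 11, 47_600_632, 47_606_114),
    ("MTCH2", 11, 47_638_858, 47_664_206),
]
loci = group_into_loci(genes)
print(len(loci), loci)
```

Under this rule NDUFS3 and MTCH2 collapse into a single chromosome 11 locus, while the other three genes each form their own locus.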

Assignment of SNPs to genes

SNPs were assigned to genes if they were located within the genomic sequence lying between the start of the first and the end of the last exon of any transcript corresponding to that gene. The chromosome and location for all currently known human SNPs were taken from the dbSNP132 database, as was their assignment to genes (using build 37.1). In total, we retained 2,804,431 (39.7% of the total) SNPs which annotated 28,636 unique genes with 1–16,514 SNPs per gene. For the gene-wide analysis we have excluded genes which contain only one SNP in the IGAP Stage 1 analysis, leaving a total of 25,310 genes. If a SNP belongs to more than one gene, it was assigned to each of these genes. In order to account for possible signals which are correlated with those in a gene, gene-wide analysis was also performed using a 10 kb window around genes to assign SNPs to genes.
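The assignment rule can be illustrated as an interval lookup. This is a minimal sketch under stated assumptions: gene extents are taken from the positions listed in the tables rather than from dbSNP transcript annotations, the SNP labelled `hypothetical_snp` is invented for illustration, and `assign_snps_to_genes` is not a function from the study.

```python
def assign_snps_to_genes(snps, genes, flank=0):
    """Map each SNP to every gene whose (optionally flanked) interval contains it."""
    assignments = {}
    for snp_id, (snp_chrom, pos) in snps.items():
        assignments[snp_id] = [
            name
            for name, (chrom, start, end) in genes.items()
            if chrom == snp_chrom and start - flank <= pos <= end + flank
        ]
    return assignments

genes = {
    "ZNF3": (7, 99_661_653, 99_679_371),
    "NDUFS3": (11, 47_600_632, 47_606_114),
    "MTCH2": (11, 47_638_858, 47_664_206),
}
snps = {
    "rs1476679": (7, 100_004_446),
    "rs10838725": (11, 47_557_871),
    "hypothetical_snp": (11, 47_603_000),   # invented position inside NDUFS3
}

strict = assign_snps_to_genes(snps, genes)                # gene boundaries only
flanked = assign_snps_to_genes(snps, genes, flank=10_000) # 10 kb window
print(strict)
print(flanked)
```

A SNP falling in several overlapping intervals is returned for each of them, matching the rule that a shared SNP is assigned to every gene it belongs to. The two IGAP index SNPs fall outside all three genes even with the 10 kb window.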

Gene-wide analysis

The gene-wide analysis was performed based on the summary p-values while controlling for LD and the different number of markers per gene using an approximate statistical approach[40] adopted for set-based analysis of genetic data[41]. This is a method for calculating the significance of a set of SNPs in the absence of individual genotype data based on a theoretical approximation to Fisher's statistic for combining p-values. Fisher's statistic (−2∑ln(pi), with the summation over i = 1,…,N) combines probabilities and under the null hypothesis has a chi-square distribution with 2N degrees of freedom, where N is the number of markers. Whereas Fisher's statistic combines the results of several tests when the tests are independent, the approximate method combines non-independent tests and requires only the list of p-values for each SNP and knowledge of correlations between SNPs. The value of Fisher's statistic and the number of degrees of freedom are then corrected by a coefficient which depends upon the number of SNPs and the correlations (LD) between them. This approximation was applied to the Stage 1 and Stage 2 samples separately, and the resulting gene-wide p-values were combined using Fisher's method (since these are independent). LD between markers was computed using 1000 Genomes data. The gene-based genome-wide significance level was set to 2.5×10−6 to account for the number of tested genes[42].
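The final step, combining the independent Stage 1 and Stage 2 gene-wide p-values, can be reproduced with Fisher's method using the statistic −2∑ln(p_i) on 2N degrees of freedom. The sketch below covers only this independent-test combination, not the LD-corrected approximation applied within each stage; the helper names are illustrative. Input p-values are the Stage 1 and Stage 2 values listed for ZNF3, NDUFS3 and MTCH2.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper tail of a chi-square with even df: exp(-x/2) * sum_{j<df/2} (x/2)^j / j!"""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total

def fisher_combine(pvalues):
    """Fisher's method for independent p-values."""
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    return chi2_sf_even_df(stat, 2 * len(pvalues))

stage_pvalues = {
    "ZNF3": (2.7e-2, 1.8e-6),
    "NDUFS3": (1.2e-6, 2.2e-2),
    "MTCH2": (1.7e-5, 8.7e-3),
}
combined = {gene: fisher_combine(ps) for gene, ps in stage_pvalues.items()}
for gene, p in combined.items():
    print(f"{gene}: combined p = {p:.2e}")
```

For these three genes the computed values agree with the reported combined gene-wide p-values (8.6×10−7, 4.8×10−7 and 2.5×10−6) to within rounding of the inputs.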

Test for excess of associated SNPs/loci

The effective number N of independent SNPs in the whole genome (excluding genes with SNPs that are genome-wide significant in the Stage 1 IGAP dataset ± 0.5 Mb was estimated by the method described in [43] taking LD into account, as were the observed number of independent SNPs significant at each p-value criterion (adjusting individual SNP p-values for genomic control λ = 1.087 before hand). LD was computed from the 1000 Genomes database (http://www.1000genomes.org/). In the absence of excess association, the expected number of independent SNPs significant at significance level α is a normally distributed random variable whose mean and standard deviation (SD) can be calculated as αN and √Nα(1-α) (mean and SD for a binomial distribution). The number of independent SNPs (and thus statistical tests) in the whole genome were estimated as ∼3.7×106, ∼3.6×106 and ∼3.5×106 at significance levels below 0.1, between 0.05 and 0.1, and 0.2 and above respectively (see [43] for details on the dependence between the significance levels and the estimated number of independent tests). We then calculated mean of the expected number of significant SNPs in intervals α 1 < p ≤ α 2, (α 1, α 2 = 0, 10−6, 10−5, …, 0.5) as difference between the expected numbers of independent SNPs at α 2 and α 1 significance levels and SD as the square root of sum of the corresponding variances. We calculated the significance of the excess number of genes attaining the specified thresholds based upon the assumption that, under the null hypothesis of no association, the number of significant genes at a significance level of α in a scan is distributed as a binomial (N,α), where N is the total number of genes, assuming that genes are independent. Genes within 0.5 Mb of each other are counted as one signal when calculating the observed number of significant genes. 
Counting genes within 0.5 Mb of each other as one signal prevents significance being inflated by LD between genes, where a single association signal gives rise to several significantly associated genes. The total number of genes was not corrected for LD in this way, making the estimate of significance of the excess number of genes conservative.

Supporting Information

Overrepresentation of significant SNPs excluding previously reported [4]-[8] genes ±0.5 Mb and the APOE region as above. (DOCX)
List of genes that are genome-wide significant in the IGAP stage 1 dataset and the flanking regions which included SNPs either in r (DOCX)
Detailed SNP information for TP53INP1 gene. (XLS)
Detailed SNP information for IGHV1-67 gene. (XLS)
Detailed SNP information for ZNF3 gene. (XLS)
Detailed SNP information for NDUFS3 gene. (XLS)
Detailed SNP information for MTCH2 gene. (XLS)
Gene-based analysis results when the single-SNP p-values contributing to the gene-based p-value were adjusted for the best genome-wide significant SNP in the nearby location. (DOCX)
Gene-wide analysis for genes which show GWAS significant association with AD in the stage 1 IGAP dataset. (DOCX)
Brain eQTL Tissues. (XLSX)
Brain Meth QTLs. (XLSX)
SNPs which are significant at 1e-3 level are circled in red, rs1476679 is highlighted in blue. (TIF)
SNPs which are significant at 1e-3 level are circled in red, rs10838725 is highlighted in blue. (TIF)
SNPs which are significant at 1e-3 level are circled in red, rs10838725 is highlighted in blue. (TIF)
QQ-plot of gene-wide p-values for all genes (A) and excluding previously reported [4]-[8] GWAS significantly associated genes ±0.5 Mb (B) in the discovery dataset. Genomic control λ = 1.08 and 1.07 respectively. (TIFF)
Expression quantitative trait loci (eQTL) and methylation quantitative trait loci (meQTL) analyses. (DOCX)
Full IGAP datasets description. (DOCX)
List of IGAP consortium members. (DOC)
Acknowledgements. (DOCX)
