Jeremy Schwartzentruber1,2,3, Sarah Cooper4,5, Jimmy Z Liu6, Inigo Barrio-Hernandez7,4, Erica Bello4,5, Natsuhiko Kumasaka5, Adam M H Young8, Robin J M Franklin8, Toby Johnson9, Karol Estrada10, Daniel J Gaffney4,5,11, Pedro Beltrao7,4, Andrew Bassett12,13. 1. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK. jeremys@ebi.ac.uk. 2. Open Targets, Wellcome Genome Campus, Cambridge, UK. jeremys@ebi.ac.uk. 3. Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK. jeremys@ebi.ac.uk. 4. Open Targets, Wellcome Genome Campus, Cambridge, UK. 5. Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK. 6. Biogen, Cambridge, MA, USA. 7. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Cambridge, UK. 8. Wellcome-Medical Research Council Cambridge Stem Cell Institute, Cambridge Biomedical Campus, University of Cambridge, Cambridge, UK. 9. Target Sciences-R&D, GSK Medicines Research Centre, Stevenage, UK. 10. BioMarin Pharmaceutical, San Rafael, CA, USA. 11. Genomics Plc, Oxford, UK. 12. Open Targets, Wellcome Genome Campus, Cambridge, UK. ab42@sanger.ac.uk. 13. Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge, UK. ab42@sanger.ac.uk.
Abstract
Genome-wide association studies have discovered numerous genomic loci associated with Alzheimer's disease (AD); yet the causal genes and variants are incompletely identified. We performed an updated genome-wide AD meta-analysis, which identified 37 risk loci, including new associations near CCDC6, TSPAN14, NCK2 and SPRED2. Using three SNP-level fine-mapping methods, we identified 21 SNPs with >50% probability each of being causally involved in AD risk and others strongly suggested by functional annotation. We followed this with colocalization analyses across 109 gene expression quantitative trait loci datasets and prioritization of genes by using protein interaction networks and tissue-specific expression. Combining this information into a quantitative score, we found that evidence converged on likely causal genes, including the above four genes, and those at previously discovered AD loci, including BIN1, APH1B, PTK2B, PILRA and CASS4.
Genome-wide association studies (GWAS) for family history of disease, known as GWAS-by-proxy (GWAX), are a powerful method for performing genetic discovery in large, unselected cohort biobanks, particularly for age-related diseases[1]. Recent meta-analyses have combined GWAS of diagnosed late-onset Alzheimer’s disease (AD) with GWAX for family history of AD in the UK Biobank[2,3], and reported 12 novel disease-associated genomic loci. However, the causal genetic variants and genes which influence AD risk at these and previously discovered loci have only been clearly identified in a few cases. Discovering causal variants has led to deeper insight into molecular mechanisms of multiple diseases, including obesity[4], schizophrenia[5], and inflammatory bowel disease[6]. For AD, known causal variants include the 𝜀4 haplotype in APOE, the strongest genetic risk factor for late-onset AD, and a common nonsynonymous variant that strongly alters splicing of CD33 exon 2[7]. Likely causal rare nonsynonymous variants have also been discovered in TREM2[8], PLCG2 and ABI3[9]. These findings have strengthened support for a causal role of microglial activation in AD.
Although non-synonymous variants are highly enriched in trait associations, most human trait-associated variants do not alter protein-coding sequence and are thought to mediate their effects via altered gene expression, which is likely to occur in a cell type-dependent manner. A growing number of studies have mapped genetic variants affecting gene expression, known as expression quantitative trait loci (eQTLs), in diverse tissues or sorted cell types[10,11]. While it has become common to integrate GWAS results with eQTLs, this is often limited to a small number of datasets thought to be relevant.
To identify putative causal genetic variants for AD, we performed a meta-analysis of GWAX in the UK Biobank with the latest GWAS for diagnosed AD[12], followed by fine-mapping using three alternative methods. Notably, this updated GWAS tested more genetic variants than the Lambert et al. study[13] used in meta-analyses by Jansen et al.[3] and Marioni et al.[2] (11.5 vs. 7.1 million). The increased power from our meta-analysis revealed four additional AD risk loci, and the higher density genotype imputation identified new candidate causal variants at both novel and established loci. We also performed statistical colocalization analyses with a broad collection of eQTL datasets, including a recent study on primary microglia[14], to identify candidate genes mediating risk at AD loci. We find that multiple lines of evidence, including colocalization, tissue- or cell type-specific expression, and information propagation in gene networks, converge on a set of likely causal AD genes.
Results
Meta-analysis discovers 37 loci associated with Alzheimer’s disease risk
We performed a GWAX in the UK Biobank for family history of AD, based on 53,042 unique individuals who were either diagnosed with AD or who reported a parent or sibling having dementia, and 355,900 controls. This identified 13 risk loci (P < 5 x 10^-8), 10 of which have been reported previously. Three novel loci were located near NCK2, PRL, and FAM135B. Notably, PRL has been reported as a CSF biomarker of AD[15]. We next did a fixed-effects meta-analysis of these GWAX results with the Kunkle et al. stage 1 GWAS meta-analysis of 21,982 cases with diagnosed AD and 41,944 controls[12], across 10,687,126 overlapping variants (Fig. 1). This revealed 34 AD risk loci (P < 5 x 10^-8), 22 of which were reported in Kunkle et al., while 8 others were reported in either Jansen et al.[3] or Marioni et al.[2]. Four loci were novel, located near NCK2, TSPAN14, SPRED2, and CCDC6. Notably, the PRL and FAM135B regions showed no evidence of association in Kunkle et al. (P > 0.1), and hence were not significant in meta-analysis. We included 37 loci in our follow-up analyses, which included three loci found at suggestive significance (P < 5 x 10^-7) near IKZF1, TSPOAP1, and TMEM163 (Fig. 1 and Supplementary Table 1). LD score regression[16] showed that most of the inflation in summary statistics was due to the polygenicity of AD rather than confounding by population structure (lambda_GC = 1.140, intercept = 1.0285 with SE = 0.0069; Supplementary Table 2). Of our 37 loci, 16 were nominally replicated (P < 0.05) in either the Gr@ace study[17] (4,120 probable AD cases and 3,289 controls) or the FinnGen biobank (v3, 3,697 cases and 131,941 controls) (Supplementary Table 3). Among our four novel loci, only TSPAN14 replicated with P < 0.05 (in FinnGen), although power was limited in these smaller datasets (estimated at 28-76%), and most of the alleles had concordant directions of effect. In a meta-analysis with all four datasets, support for most loci was strengthened (Extended Data Fig. 1), including novel loci TSPAN14, CCDC6 and NCK2, but was weakened for SPRED2 (meta-analysis P = 1.3 x 10^-7). Although not included in downstream analyses, four new loci became genome-wide significant, near GRN, IGHG1, SHARPIN, and SIGLEC11 (Supplementary Table 3).
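As an illustration of the fixed-effects, inverse-variance weighted combination mentioned above, the following is a minimal sketch for a single variant. It assumes that per-study effect sizes (log odds ratios) and standard errors are already on a comparable scale; the function name and the numbers are hypothetical and are not taken from the study.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse-variance weighted meta-analysis for one variant.

    betas: per-study effect estimates (e.g. log odds ratios)
    ses:   per-study standard errors, same order as betas
    Returns the combined beta, its standard error, the z-score and a
    two-sided P value under a normal approximation.
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return beta, se, z, p


# Hypothetical inputs: one variant measured in two studies.
beta, se, z, p = ivw_meta([0.06, 0.05], [0.012, 0.015])
print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Because the weights are the inverse variances, the more precisely estimated study dominates the combined estimate, and the combined standard error is never larger than the smallest input standard error.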
Figure 1
Analysis overview.
a, Summary of AD meta-analysis and data processing steps. b, Manhattan plot of the meta-analysis of GWAS for diagnosed AD and our GWAX in UK Biobank. Novel genome-wide significant loci are labelled in blue, sub-threshold loci in red, and recently discovered loci[2,3,12] replicated in our analysis in black. c, The number of independent signals at each locus which is either recently discovered or which has more than one signal, as well as the meta-analysis P value of the lead SNP at the locus. *The PLCG2 locus was significant (P < 5 x 10^-8) when including Kunkle stage 3 SNPs. Conditional analyses were not done at APOE due to the strength of the signal (see Methods).
Extended Data Fig. 1
Association of AD loci in discovery + replication (“global”) meta-analysis
For most loci, association significance is increased in the global meta-analysis (blue bars) relative to the discovery analysis (grey bars). The dashed vertical line shows P = 5 x 10^-8. P-values were computed by inverse variance weighted meta-analysis, and bars show the -log10(P) for the SNP with minimum P value at the locus in either the discovery or global meta-analysis.
Next, we applied stepwise conditioning using GCTA[18], with linkage disequilibrium (LD) determined from UK Biobank samples, to identify independent signals at the discovered loci. Apart from APOE, 9 loci had two independent signals, while the TREM2 locus had three signals (Fig. 1c). Interestingly, a number of the loci discovered recently[2,3,12] had multiple signals: NCK2, EPHA1, ADAM10, ACE, and APP-ADAMTS1. To extract insight from both new and established AD GWAS discoveries, we performed comprehensive colocalization, annotation, fine-mapping and network analyses to identify causal genes and variants (Fig. 1a).
Colocalization between AD risk loci and gene expression traits
To identify genes whose expression may be altered by risk variants, we performed statistical colocalization[19] between each of 36 risk loci (excluding APOE) and a set of 109 eQTL datasets representing a wide variety of tissues, cell types and conditions (Fig. 2 and Supplementary Table 4). The eQTL datasets include a study of primary microglia from 93 brain surgery donors[14], a meta-analysis of 1,433 brain cortex samples[20], 49 tissues from GTEx[11], and 57 eQTL datasets uniformly reprocessed as part of the eQTL catalogue[10]. The latter include multiple studies in tissues of potential relevance to AD, such as brain, as well as sorted blood immune cell types under different stimulation conditions[21-37]. For each gene, the colocalization analysis reports the probability that the GWAS and eQTL share a causal variant, referred to as hypothesis 4 (H4).
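The hypothesis probabilities reported by colocalization can be illustrated with a short sketch in the style of the approximate Bayes factor approach. This is not the authors' pipeline: the prior probabilities (p1, p2, p12), the prior effect-size standard deviations, and all effect sizes below are assumptions chosen for illustration, and the sketch assumes a single causal variant per trait in the region.

```python
import math


def log_sum(xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_diff(a, b):
    """log(exp(a) - exp(b)); requires a > b."""
    return a + math.log1p(-math.exp(b - a))


def log_abf(beta, se, prior_sd):
    """Log approximate Bayes factor (Wakefield) for one SNP."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities of H0..H4 for two traits over the same SNPs.

    H4 (last element) is the hypothesis that both traits share one causal
    variant. Needs at least two SNPs, otherwise H3 is undefined.
    """
    both = [a + b for a, b in zip(labf1, labf2)]
    s1, s2, s12 = log_sum(labf1), log_sum(labf2), log_sum(both)
    log_h = [
        0.0,                                                    # H0: no association
        math.log(p1) + s1,                                      # H1: trait 1 only
        math.log(p2) + s2,                                      # H2: trait 2 only
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),   # H3: distinct variants
        math.log(p12) + s12,                                    # H4: shared variant
    ]
    denom = log_sum(log_h)
    return [math.exp(x - denom) for x in log_h]


# Hypothetical three-SNP region in which both traits peak at the second SNP.
gwas = [log_abf(b, 0.02, 0.2) for b in (0.01, 0.12, 0.02)]
eqtl = [log_abf(b, 0.05, 0.15) for b in (0.02, 0.30, 0.01)]
print([round(x, 4) for x in coloc_posteriors(gwas, eqtl)])
```

When the strongest evidence for both traits falls on the same SNP, nearly all posterior mass goes to H4; moving the eQTL peak to a different SNP shifts the mass to H3 instead.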
Figure 2
Colocalization with eQTLs.
For genes with the top overall colocalization scores across AD risk loci, the colocalization probability (H4) is shown for selected brain, microglia, and monocyte eQTL datasets. For three loci with multiple signals (BIN1, EPHA1, PTK2B-CLU), scores are shown separately for the conditionally independent signals. The last column shows, for each gene, the number of eQTL datasets with a colocalization probability above 0.8 (Supplementary Tables 5 and 6).
Some studies using colocalization have suggested that there is relatively limited overlap between GWAS associations and eQTLs above that expected by chance[6,38]. A possible reason is that colocalization analyses can have low sensitivity to detect shared causal variants between traits, which could occur for a number of reasons. First, when a locus has multiple causal variants, and not all causal effects are shared between the two studies, colocalization may not be detected[19]. Second, if the relevant tissue, cell type, or cellular context has not been assayed, then a colocalization may not be found. Third, differences in LD patterns between studies can reduce the likelihood of a positive colocalization. Lastly, low power in either study can further reduce the colocalization probability. To mitigate the first effect, we performed colocalizations separately for each conditionally independent AD signal, to model the case where not all causal variants are shared, as well as for the combined AD signal at each locus. Problems relating to power, LD mismatch, or missing the relevant cell type or context are partially mitigated by our use of a large number of highly-powered eQTL datasets, which include those with stimulated conditions.
Across the 36 loci, we found 391 colocalizations with at least 80% probability of a shared causal variant between AD and eQTL, representing 80 distinct genes at 27 loci (Supplementary Tables 5 and 6). The genes implicated by colocalization include many that have previously been investigated for roles in AD, such as PTK2B[39,40], BIN1[41,42], PILRA[43], CD33[44,45], and TREM2[46,47], as well as novel candidates including FCER1G, TSPAN14, APH1B, and ACE. However, the presence of multiple genes with colocalization evidence within individual loci suggests that additional lines of evidence are important for prioritizing relevant genes.
Fine-mapping identifies credibly causal variants
Confirming the causal genes underlying AD risk will ultimately require experiments to identify the molecular mechanisms by which gene function is altered. Such experiments must be motivated by strong hypotheses regarding potentially causal variants and their possible effects. To identify candidate causal variants, we used three distinct fine-mapping methods: single causal variant fine-mapping[48] on each conditionally independent signal; FINEMAP[49], limiting the number of causal variants at each locus to the number of signals determined by GCTA; and PAINTOR[50], a method that leverages enrichments in functional genomic annotations to improve causal variant identification (see Methods).
As a reference panel for our analyses, we used LD computed from UK Biobank participants. Previous work has shown that using reference panels that are either too small or poorly matched can result in spurious fine-mapping signals[51]. For this reason, we conducted a sensitivity analysis (described in the Supplementary Note) by using the same reference panel for conditional analysis and fine-mapping on the non-UK Biobank portion of our meta-analysis (Kunkle et al.). This gave comparable independent signals and SNP probabilities to the full meta-analysis, with the exception of a few loci, namely ABCA7, HLA, EPHA1, and ECHDC3 (Extended Data Fig. 2).
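The simplest of the three approaches, single causal variant fine-mapping, can be sketched as follows. Under the assumption of exactly one causal variant per signal and equal prior probability for every SNP, each SNP's posterior probability is its Bayes factor divided by the sum over the region. The function names, the 95% coverage level, and the input values are illustrative assumptions rather than details taken from the study.

```python
import math


def posterior_probs(log_abfs):
    """Per-SNP posterior probability of causality under a single causal
    variant assumption with a flat prior across SNPs."""
    m = max(log_abfs)
    weights = [math.exp(x - m) for x in log_abfs]  # subtract max for stability
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(probs, coverage=0.95):
    """Indices of the smallest set of SNPs whose summed probability
    reaches the requested coverage, in decreasing order of probability."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in order:
        chosen.append(i)
        cumulative += probs[i]
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical log Bayes factors for five SNPs at one signal.
probs = posterior_probs([1.2, 9.5, 8.1, 0.3, 2.0])
print([round(p, 3) for p in probs], credible_set(probs))
```

A signal with one dominant SNP yields a small credible set, whereas several SNPs in tight LD with similar Bayes factors spread the probability and enlarge the set.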
Extended Data Fig. 2
Comparison of fine-mapping in the meta-analysis vs. Kunkle et al.
Scatterplots showing, for each locus, SNP probabilities from FINEMAP applied to either the Kunkle et al. + UK Biobank meta-analysis (x-axis), or to only Kunkle et al. (y-axis). The number of causal variants at each locus was set to the number detected by GCTA in the meta-analysis. For most of the 36 loci, SNP probabilities are well correlated. For a few loci that are well powered in Kunkle et al., this is not the case, namely ABCA7, EPHA1, ECHDC3, and HLA. For these loci, fine-mapping results should be interpreted with caution. Six other loci are not well correlated (ADAMTS4, APH1B, IKZF1, PLCG2, TMEM163, and VKORC1), but these loci are poorly powered in Kunkle et al. (lead P values 2.1 x 10^-6 to 2.1 x 10^-3).
We used 44 annotations individually as input to PAINTOR (Supplementary Table 7); these included ATAC-seq peaks from primary microglia[52] or iPSC-derived macrophages[53], DNase peaks from the Roadmap Epigenomics project[54], variant consequence annotations[55], and evolutionary conservation[56] (Fig. 3). We also used scores from DeepSEA[57] and SpliceAI[58], deep-learning methods that predict the effects of variants on transcription factor binding or splicing. Missense mutations were the most enriched annotation, with 19.2-fold increased odds of being causal SNPs, but they comprised only 1% of input SNPs. Blood or immune DNase hypersensitivity peaks merged from 24 Roadmap Epigenomics tissues provided the highest model likelihood, as these peaks covered 16% of SNPs, despite a lower 6.4-fold enrichment. Variants with a non-zero score from SpliceAI, which predicts changes to gene splicing, were also highly enriched (9.3-fold).
Figure 3
Fine-mapping summary.
a, Number of variants with mean causal probability > 1% for each independent signal. Variant counts for independent signals are shown in different shades. b, PAINTOR outputs, showing (left) log-likelihood (LLK) of model for each individual annotation; (middle) log-odds enrichments for individual genomic annotations determined by PAINTOR; (right) fraction of SNPs which are in each annotation (among those selected by FINEMAP probability > 0.01%). Annotations selected for the final model are shown with a black border.
We next built a multi-annotation model in PAINTOR following a stepwise selection procedure, which identified a minimal but informative set of three annotations: blood and immune DNase, nonsynonymous coding variants, and variants with SpliceAI score greater than 0.01. We used probabilities from this PAINTOR model, and computed the mean causal probability per variant across the three fine-mapping methods.
There were 21 variants with mean causal probability above 50% across the fine-mapping methods, and 79 further variants with probabilities from 10-50% (Table 1 and Supplementary Table 8). These include SNPs near established AD risk genes, such as rs6733839 ~20 kb upstream of BIN1, which has recently been shown to alter a microglial MEF2C binding site[14] and to regulate BIN1 expression specifically in microglia[42]. High-confidence variants also include a well-known missense SNP in PILRA[43], and a splice-altering missense SNP in CD33[7]. Missense SNP rs4147918 in ABCA7 had 55% causal probability, and ABCA7 harbored 5 further missense SNPs with probabilities greater than 0.01%, at varying allele frequencies. Notably, rs4147918 and 6 other variants within ABCA7, including the lead SNP rs12151021, had positive SpliceAI scores. This is consistent with reports of a burden of deleterious variants at ABCA7 associated with AD[59], as well as potential changes to splicing caused by intronic variable tandem repeats[60].
Table 1
Top candidate variants
Locus | SNP | P value | Odds ratio | Effect allele | Allele freq | SNP prob | SpliceAI | DeepSEA | Note | Refs
ADAMTS4 | rs2070902 | 1.64E-06 | 0.949 | T | 0.2580 | 0.384 | 0.107 | 0.140 | Intronic in candidate gene FCER1G, with predicted splicing change | -
ADAMTS4 | rs4575098 | 4.30E-08 | 1.063 | A | 0.2350 | 0.339 | - | 0.033 | 3’ UTR of ADAMTS4, open chromatin | 2
SPRED2 | rs268120 | 2.08E-08 | 1.063 | A | 0.2502 | 0.556 | - | 0.033 | Strong DNase peak, predicted by DeepSEA to decrease | -
- | - | - | - | - | - | - | - | - | 5’ UTR of CASS4; global top DeepSEA variant predicting decreased TF binding | -
ADAMTS1 | rs2830489 | 3.09E-08 | 0.943 | T | 0.2749 | 0.718 | - | 0.077 | Lead variant near ADAMTS1 | -
A selected list of the most likely causal variants across loci, based on a combination of SNP fine-mapping probabilities and annotations. Column ‘SNP prob’ indicates the mean fine-mapping probability for the SNP; the SpliceAI score is the maximum splicing probability for donor gain/loss or acceptor gain/loss, with nonzero values highly enriched for splicing effects; the DeepSEA functional significance score represents the significance above expectation for chromatin feature changes, as well as evolutionary conservation, with lower values more significant. References for specific SNPs are shown[2,7,9,14,42,43,61–70].
A number of newly identified AD risk genes had high-confidence fine-mapped variants. These include the NCK2 rare intronic SNP rs143080277 (>99% probability, MAF 0.4%), APH1B missense SNP rs117618017 (90% probability), rs2830489 near ADAMTS1 (72% probability), and rs268120 intronic in SPRED2 (56% probability).
Manual review highlighted a number of candidate causal variants, where the annotation-based SNP probability was higher than that of the other two methods (Fig. 4). Within TSPAN14, rs1870137 and rs1870138 reside within a DNase hypersensitivity peak found broadly across tissues, which is also an ATAC peak in microglia. Of these, rs1870138 lies at the centre of a ChIP-seq peak for binding of multiple transcription factors, including FOS/JUN and GATA1. The AD risk allele rs1870138-G alters an invariant position of a binding motif for TAL1, a gene highly expressed in microglia, and which is a binding partner for GATA1. This allele is also associated with increased monocyte count[71] and increased risk for inflammatory bowel disease[72]. Notably, the AD signal in the region colocalizes with both an eQTL and a splicing QTL for TSPAN14 in multiple datasets, and rs1870138-G associates with higher TSPAN14 expression in brain and in microglia, but with lower expression in some GTEx tissues.
Figure 4
Fine-mapped variants.
a, SNP rs1870138 in an intron of TSPAN14 disrupts an invariant position of a TAL1 motif. b, Missense SNP rs117618017 in exon 1 of APH1B. c, SNP rs17462136 in the 5’ UTR of CASS4 introduces a TEAD1 motif. Each panel shows (top) locus plot with GWAS P-values, SNP color representing LD to the lead SNP; (middle) expanded view of a subregion showing the mean SNP probabilities from fine-mapping; (bottom) read density of ATAC-sequencing assay from primary microglia[52].
Missense SNP rs117618017 in exon 1 of APH1B (Thr27Ile) is the likely single causal variant at its locus, with fine-mapping probability of 90% (Fig. 4b). APH1B is a component of the gamma-secretase complex, other members of which (PSEN1, PSEN2) have rare variants associated with early-onset AD[73]. Interestingly, the AD signal colocalizes with an APH1B eQTL in monocytes, neutrophils and T-cells, and rs117618017-T associates with higher AD risk and higher APH1B expression across datasets. This allele introduces a motif for transcriptional regulator YY1, and is predicted by DeepSEA to increase YY1 binding in multiple ENCODE cell lines. Therefore, it is an open question whether AD risk is mediated by altered APH1B protein structure or altered gene expression.
Finally, the AD association on chromosome 20 colocalizes with an eQTL for CASS4 in Blueprint monocytes and in GTEx whole blood. While intronic lead SNP rs6014724 (55% probability) shows no evidence of transcription factor (TF) binding in ENCODE data, rs17462136 (7% probability) lies in a region of dense TF binding in the 5’ UTR of CASS4 (Fig. 4c). The nucleotide position is highly conserved (GERP score 3.46) and overlaps an ATAC peak in microglia, and the rs17462136-C allele introduces a TEAD1 binding motif. In addition, rs17462136 is more strongly associated with CASS4 expression in multiple eQTL datasets than is rs6014724.
Network evidence prioritizes genes within and beyond GWAS loci
As a further line of evidence, we developed a method that leverages gene network connectivity to prioritize genes at individual loci. We first constructed a gene interaction network combining information from the STRING, IntAct and BioGRID databases. Next, we nominated 32 candidate AD genes (Supplementary Table 9), based on our other evidence sources as well as literature reports, and used these as seed genes similar to the approach used in the priority index for drug discovery[74]. For each locus in turn, we used as input all seed genes except those at the locus, and propagated information through the network with the PageRank algorithm. The “networkScore” for a gene thus represents the degree to which the gene is supported by its interaction with top AD candidate genes at other loci, unbiased by any locus-specific features.
Across AD loci, our selected seed genes were highly enriched for having high network-based gene scores (one-tailed Wilcoxon rank sum test, P = 5 x 10^-9; Extended Data Fig. 3). At our four novel AD loci, the nearest gene (NCK2, TSPAN14, SPRED2, CCDC6) in each case was one of the top two highest-scoring genes within 500 kb. Many established or recently discovered AD genes were also the top gene within 500 kb by network score, including ACE, BIN1, CASS4, CD2AP, PICALM, PLCG2, and PTK2B. At the SLC24A4 locus, RIN3 was strongly supported, whereas SLC24A4 was not, in line with evidence from deleterious rare variants that RIN3 may be causal[12].
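The leave-one-locus-out propagation described in this section can be sketched with a small personalized PageRank implemented by power iteration. This is only an illustration of the idea: the damping factor, the iteration count, the unweighted undirected graph, and the toy gene and locus names are all assumptions, and the real network combines several interaction databases.

```python
def personalized_pagerank(adj, seeds, damping=0.85, n_iter=100):
    """Personalized PageRank on an undirected graph.

    adj:   dict mapping each node to the set of its neighbours
    seeds: non-empty set of nodes that receive the restart probability
    """
    nodes = list(adj)
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)
    for _ in range(n_iter):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            neighbours = adj[n]
            if not neighbours:
                # Isolated node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * score[n] / len(seeds)
                continue
            share = damping * score[n] / len(neighbours)
            for m in neighbours:
                nxt[m] += share
        score = nxt
    return score


def network_scores(adj, seed_locus):
    """For each locus, propagate from the seed genes at all OTHER loci.

    seed_locus: dict mapping seed gene -> locus identifier
    Returns a dict: locus -> {gene: score}.
    """
    results = {}
    for locus in set(seed_locus.values()):
        seeds = {g for g, l in seed_locus.items() if l != locus}
        if seeds:  # skip if no seeds remain once this locus is held out
            results[locus] = personalized_pagerank(adj, seeds)
    return results


# Hypothetical chain network A-B-C-D with one seed gene at each of two loci.
toy = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = network_scores(toy, {"A": "locus1", "D": "locus2"})
print({g: round(v, 3) for g, v in scores["locus2"].items()})
```

Holding out the seeds at the locus being scored means a gene cannot be supported by its own seed status, so its score reflects only its connectivity to candidates elsewhere in the genome.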
Extended Data Fig. 3
Network enrichment
a, The Pagerank percentile of all genes (within 500 kb) at each AD GWAS locus containing a seed gene is shown, with seed genes highlighted in blue. b, A violin/boxplot shows that seed genes have a markedly higher network Pagerank percentile than remaining genes (P = 2.4 x 10^-9, one-tailed Wilcoxon rank sum test). c, Log odds ratio enrichment of AD risk among SNPs nearest to genes with network Pagerank percentile in different bins, determined using fgwas (whiskers represent 95% confidence intervals).
Genes highly ranked by network propagation also include many outside of genome-wide significant AD loci (Supplementary Table 10). Consistent with their involvement in AD, such genes tended to have SNPs with lower P values nearby than did remaining genes (Fig. 5a and Extended Data Fig. 3c), suggesting that numerous AD loci remain to be discovered with larger GWAS sample sizes. Top network-ranked genes include LILRB2 (nearby rs3855678 P = 9.8 x 10^-6), which encodes a leukocyte immunoglobulin-like receptor that recognizes multiple HLA alleles, and which may also be involved in amyloid-beta fibril growth[75]; ABCA1 (rs59237458 P = 4 x 10^-6), involved in phospholipid transfer to apolipoproteins and previously associated with AD[76]; SREBF1 (rs35763683 P = 2 x 10^-6), required for lipid homeostasis; and AGRN (rs2710871 P = 4 x 10^-6), involved in synapse formation in mature hippocampal neurons. Overall, genes with high network ranks were strongly enriched in biological processes and pathways that have previously been associated with AD, including clathrin-mediated endocytosis, activation of immune response, phagocytosis, Ephrin signaling, and complement activation (Supplementary Table 11).
Figure 5
Genome-wide network and gene expression enrichments.
a, Enrichment of low GWAS P values within 10 kb of genes having high vs. low network pagerank percentile (low defined as below 50th percentile). Whiskers represent 95% confidence intervals based on Fisher’s exact test for n = 18,055 genes. b, Enrichment of AD risk near genes with high expression in each brain cell type (above 80th or 90th percentile) relative to the other cell types. Cell types are defined based on single-cell clusters defined in Hodge et al.[77]. Neuronal cells are defined either by cortical layer (L4, L5, L6), and/or by projection target (IT, intratelencephalic; CT, corticothalamic; ET, extratelencephalic-pyramidal tract; NP, near-projecting), or by binary marker genes (LAMP5, PAX6, PVALB, VIP, SST). OPC, oligodendrocyte precursor cells. Whiskers represent 95% confidence intervals as determined by fgwas.
AD risk is enriched near genes with high microglial gene expression
To understand the contribution of cell-type specific gene expression to AD risk, we used fgwas[78] to assess the genome-wide enrichment of SNPs near genes highly expressed in specific cell types, based on a single-nucleus sequencing dataset of 49,495 nuclei from six human brain cortical areas[77,79]. Out of 18 broad cell type clusters, only microglia showed clear enrichment of AD risk (odds ratio (OR) 6.0) near genes with expression above the 90th percentile across cell types (Fig. 5b). We performed a similar analysis looking at bulk gene expression across human tissues from GTEx, along with a small number of additional RNA-seq datasets, including sorted primary microglia from brain surgeries[14] (Extended Data Fig. 4 and Supplementary Table 12). This gave consistent results, with microglia showing strong enrichment (OR 4.4), followed by tissues rich in immune cells, including spleen (OR 3.6) and whole blood (OR 3.2). Notably, iPSC-derived microglia showed similar enrichment to primary microglia, while bulk brain tissues (including hippocampus) showed no enrichment.
Extended Data Fig. 4
Gene expression enrichments
Expression enrichments for GTEx + microglia. Shown are the log odds ratio enrichments of AD risk among SNPs with relative gene expression in each tissue above the 80th (or 90th) percentile across tissues. Whiskers represent 95% confidence intervals determined by fgwas.
Integrative gene prioritization from five lines of evidence
Determining the genes responsible for AD risk across GWAS loci is challenging, in part because few genes have been definitively confirmed as having a causal role. We therefore developed a comprehensive gene prioritization score, which incorporates quantitative information based on five lines of evidence: gene distance to lead SNPs, colocalization, network score, bulk and single-cell gene expression, and the sum of fine-mapped probability for any coding SNPs within a gene (Fig. 6, Extended Data Figs. 5 and 6, and Supplementary Table 13).
Figure 6
Gene evidence summary.
The top gene at each locus is shown, as well as the next 13 top genes by model score; for three loci where a non-coding gene was the top scoring, we also show the top scoring protein-coding gene. Score components for each gene are indicated by colored bars, and points show the distribution of scores for all genes within 500 kb at the locus. Bold gene names are those with evidence of causality based on rare variants from other studies. Scores for all genes are listed in Supplementary Table 13.
Extended Data Fig. 5
Colocalization scores
a, Genes with maximum colocalization H4 probability >0.9 have higher Pagerank percentile (left boxplot) and higher total score (sum of the four non-coloc predictors, right boxplot) than do genes without colocalisation (<0.5). Genes with intermediate colocalisation evidence (bins 0.5 - 0.8 and 0.8 - 0.9) show little evidence of having higher scores by the other metrics. Based on this, we chose a maxColoc probability of 0.9 as the lower bound for our colocalization score. b, Boxplot of the total score (excluding coloc) for genes that have a colocalisation probability > 0.9 in at least one QTL dataset within each tissue group. The most significant difference is between totalScore for genes with microglial colocalizations vs. the genes with colocalization in “other” tissues (non-immune GTEx tissues), but the evidence for a difference is weak (P = 0.041, Wilcoxon rank sum test). In all cases, boxplots show the 25th, median, and 75th percentile of the distribution, with whiskers extending to the largest (and smallest) value no further than 1.5 times the interquartile range from the boxplot hinge.
Extended Data Fig. 6
Gene distance score
The distance score assigned to genes near an AD GWAS peak, which decreases approximately linearly (past a distance of 1 kb) with increasing log-scaled distance up to 500 kb.
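A distance score of the general shape described in this caption could look like the sketch below. The caption only states that the score falls approximately linearly with log-scaled distance between 1 kb and 500 kb; the endpoint values of 1 and 0, the base-10 logarithm, and the function name are assumptions made here for illustration.

```python
import math


def distance_score(dist_bp, near=1_000, far=500_000):
    """Illustrative gene distance score.

    Returns 1.0 for genes within `near` bp of the GWAS peak, 0.0 at or
    beyond `far` bp, and decreases linearly in log10(distance) in between.
    """
    if dist_bp <= near:
        return 1.0
    if dist_bp >= far:
        return 0.0
    span = math.log10(far) - math.log10(near)
    return 1.0 - (math.log10(dist_bp) - math.log10(near)) / span


for d in (500, 10_000, 100_000, 500_000):
    print(d, round(distance_score(d), 3))
```

Working on the log scale means that moving from 1 kb to 10 kb costs as much score as moving from 10 kb to 100 kb, which favours the genes closest to the peak without entirely excluding distal ones.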
We first explored how best to use colocalization information. We found that genes with maximum colocalization probability (maxH4) above 0.9 had higher prioritization scores based on the other four predictors, but this was not the case for genes with weaker colocalization evidence (Extended Data Fig. 5a). We also examined colocalizations in different cell type or tissue groups, such as brain, microglia, and other GTEx tissues. There was little evidence that colocalizing genes within any specific groups had higher total scores than other groups (Extended Data Fig. 5b), although this conclusion was limited by the low number of studies in some cell types, such as microglia. We therefore based our colocalization score on the maximum colocalization probability across tissues (> 0.9) and normalized this to the range 0-1.
A priori, we do not know which lines of evidence are most important for prioritizing genes. We therefore sought a systematic way to identify appropriate weights for the predictors. Although we do not know the causal AD genes, we selected two independent, unbiased sets of candidate genes for use in supervised learning: genes nearest to the GWAS peaks, and genes with high network scores (>80th percentile). In order to identify weights for our predictive features, we defined two models to discriminate these two gene sets from others within 500 kb, in each case using cross-validated lasso-regularized logistic regression with the remaining variables as predictors. As expected, when predicting genes nearest GWAS peaks, the highest-weight predictor was fine-mapped coding variants; however, only a few loci have such variants. The most informative predictor, determined based on change in mean-squared error when the predictor is left out, was colocalization, followed by coding variants and then network score (Supplementary Table 14). When predicting high network score genes, the most informative predictor was distance to GWAS peak, followed by microglial gene expression, and neither colocalization nor coding variant predictors improved the model. For both models, including hippocampus expression (GTEx) or single-cell astrocyte expression resulted in worse models (increased mean squared error).
We defined our gene prioritization “model score” as the average of the predictions from our two models. The model score identified as top-ranked many AD candidate genes previously suggested as causal (Fig. 6). Exemplifying the importance of integrating genetic evidence sources, ABCA7, SORL1, and CR1 were top-ranked by overall score at their respective loci, despite having only moderate network-based scores, while SORL1, PICALM, and SPI1 were top-ranked despite having limited eQTL colocalization evidence.
While our prioritization further supports many established AD candidate genes, it also implicates novel genes. Among these are FCER1G, which has been reported as a hub gene in microglial gene modules associated with neurodegeneration[81,82], and has been experimentally shown to influence microglial phagocytosis[83]. Another candidate is ZYX, which receives a top network score, is highly expressed in microglia, and which was recently nominated as an AD risk gene based on chromatin interactions between the ZYX promoter and AD risk variants in a ZYX enhancer[84].
Discussion
Identifying therapeutic targets for human diseases is a key goal of human genetics research, and is particularly important for neurodegenerative diseases such as AD, for which no disease-modifying therapies yet exist. However, identifying the causal genes and genetic variants from GWAS is challenging, since non-coding associations can act via regulation of distal genes. We approached this challenge for AD by performing comprehensive fine-mapping, eQTL colocalization, network analysis, and quantitative gene prioritization.
Our meta-analysis identified four novel associations near NCK2, SPRED2, TSPAN14, and CCDC6. Each of these was the nearest gene to the association peak and was supported by both eQTL colocalization and network ranking. Yet, despite the large number of eQTL datasets that we used, colocalization of likely AD risk genes was sometimes found in only one or a few datasets; this was the case for SPRED2 (TwinsUK LCL coloc probability 0.99), RIN3 (GTEx frontal cortex probability 0.94), and PILRA (Fairfax LPS-2hr monocyte coloc probability 0.99). Many factors could account for dataset-specific colocalizations, such as biological differences in sample state, differences in LD match between the GWAS and eQTL datasets, and technical differences in the transcriptome annotations used for eQTL discovery. As a result, absence of colocalization provides only weak evidence for lack of an effect in a given tissue type, whereas positive colocalization provides strong support for a shared genetic effect. It is therefore useful to look broadly across eQTL studies for colocalization, which will be facilitated by resources that simplify access to these datasets, such as the eQTL catalogue[11].
One of our most confidently prioritized genes was APH1B, encoding a gamma-secretase complex component involved in APP processing. APH1B harbors the likely causal missense variant T27I, yet also has strong colocalization evidence that higher expression correlates with higher AD risk. One possibility is that impaired function of APH1B due to the missense variant leads to upregulation of APH1B transcription. This interpretation would be consistent with evidence from both mice[86] and humans[87] that loss of APH1B and gamma-secretase function leads to AD. It is noteworthy, however, that recent experiments failed to find an effect of the T27I variant on gamma-secretase activity in HEK cells[88].
Among our novel associations, TSPAN14 has a role in defining the localization of ADAM10[90], another recently discovered AD gene that is a key component of the alpha-secretase complex and that could thus mediate AD risk via processing of amyloid precursor protein. However, ADAM10 also cleaves the microglia-associated protein TREM2 to generate its soluble ligand-binding domain[91]. Our fine-mapping showed that the risk SNP rs1870138 is also associated with higher risk for inflammatory bowel disease (IBD), an immune-mediated disease, and with higher monocyte count in UK Biobank participants. Since TSPAN14 is expressed more highly in immune cell types, including microglia, than in brain tissue, it is also plausible that AD risk is mediated by its effect on either immune cell count or activation. Recently proposed AD candidate genes supported by our analyses include RIN3, HS3ST1, and FCER1G. As noted above, FCER1G is a microglial master regulator[81-83]; RIN3 interacts with both BIN1 and CD2AP in the early endocytic pathway[93]; HS3ST1 is involved in cellular uptake of tau[94] and was recently associated with AD in an independent Norwegian sample[62].
In summary, our study reports quantitative gene prioritization for 36 AD-associated regions, as well as AD-specific gene network scores beyond these loci. Our genetic findings highlight the presence of diverse mechanisms in AD pathogenesis and suggest candidate targets for therapeutic development.
Online Methods
GWAS on family history of AD
Sample QC, variant QC and imputation were performed on all UK Biobank (UKB) participants as described in Bycroft et al.[95]. After genotype imputation, 93,095,623 variants across 487,409 individuals were available for analysis. To exclude individuals of non-European ancestry, we extracted “White British” ancestry participants as described in Bycroft et al.[95]. These individuals self-reported their ethnic background as “British” and have similar genetic ancestry based on principal components (PC) analysis. To extract additional individuals of European ancestry, we followed a similar approach to Bycroft et al. and applied Aberrant[96] on PCs 1v2, 3v4 and 5v6 across the individuals who self-reported as “Irish” or “Any other white background”. We identified first-degree relatives by applying KING[97] v2.0 to 147,522 UKB participants who had at least one relative identified in Bycroft et al. (UKB Field 22021). For each first-degree relative pair, we prioritized AD cases and proxy-cases (see below) for inclusion, and otherwise excluded one of the pair at random. We also excluded variants with low imputation quality (INFO < 0.3) and/or those with minor allele frequencies below 0.0005, resulting in 25,647,815 variants available for analysis.
AD cases were extracted from UKB self-report (field 20002), ICD10 diagnoses (fields 41202 and 41204) and ICD10 cause of death (fields 40001 and 40002) data. UKB participants were asked whether they have a biological father, mother or sibling who suffered from Alzheimer’s disease/dementia (UKB fields 20107, 20110, and 20111, respectively). We extracted all participants with at least one affected relative as proxy-cases. Participants who answered “Do not know” or “Prefer not to answer” were excluded from analyses. All remaining individuals were denoted as controls.
There were 898 AD cases, 52,791 AD proxy cases and 355,900 controls in the combined white British and white non-British cohorts. For association analyses, we grouped the true and proxy-cases together (53,042 unique affected individuals) and used the linear-mixed model implemented in BOLT-LMM[98].
AD meta-analysis
To enable meta-analysis combining the UKB cohorts with external case-control studies, we first transformed the AD proxy BOLT-LMM summary statistics from the linear scale to a 1/0 log odds ratio:
$$\log\mathrm{OR} = \frac{\beta_{\mathrm{LMM}}}{f\,(1-f)}$$
with standard error:
$$se_{\log\mathrm{OR}} = \frac{se_{\mathrm{LMM}}}{f\,(1-f)}$$
where $\beta_{\mathrm{LMM}}$ and $se_{\mathrm{LMM}}$ are the SNP effect sizes and standard errors respectively from BOLT-LMM, and $f$ is the fraction of cases in the sample[99]. Since the affected individuals in our analysis include both true and proxy-cases, we then multiplied the transformed logORs and standard errors by 1.897 to approximate the logORs obtained from a true case/control study[1].
We combined the transformed UKB white British cohort, UKB white non-British cohort and the Stage 1 summary statistics from Kunkle et al. using a fixed-effects (inverse variance weighted) meta-analysis across 10,687,126 overlapping variants. For display purposes (Supplementary Table 8), we used CrossMap[100] to convert variant positions from GRCh37 to GRCh38.
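As an illustration of the rescaling described above, the following minimal sketch converts a linear-scale effect to an approximate log odds ratio. It assumes the commonly used approximation in which both the effect and its standard error are divided by f(1 - f); the function name, argument names and example values are hypothetical and are not taken from the analysis code.

```python
def lmm_to_log_or(beta_lmm, se_lmm, case_fraction, proxy_scale=1.0):
    """Rescale a linear-scale LMM effect to an approximate log odds ratio.

    Assumes logOR ~= beta / (f * (1 - f)), with the same factor applied to
    the standard error. proxy_scale is an optional extra multiplier (the
    text uses 1.897 for the mixed true/proxy case sample).
    """
    denom = case_fraction * (1.0 - case_fraction)
    log_or = beta_lmm / denom * proxy_scale
    se_log_or = se_lmm / denom * proxy_scale
    return log_or, se_log_or


# Hypothetical SNP: both quantities are scaled by the same factor,
# so the z-score (effect / standard error) is unchanged.
log_or, se_log_or = lmm_to_log_or(0.002, 0.0005, case_fraction=0.13,
                                  proxy_scale=1.897)
print(log_or, se_log_or, log_or / se_log_or)
```

Because the same factor multiplies the effect and its standard error, the transformation changes the scale of the estimates but not their significance.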
Replication
To assess replication of our discovered signals, we downloaded the publicly available summary statistics for the Gr@ace study of AD[17] from the GWAS catalog, and for the FinnGen GWAS of phenotypes “Alzheimer’s disease, wide definition” and “Alzheimer’s disease (Late onset)” from FinnGen release 3. We extracted summary results for our lead SNPs, or a partner in strong LD when the lead SNP was not found, and present these in Supplementary Table 3. We estimated power to detect our four novel loci at nominal significance (P < 0.05) using the genetic power calculator (zzz.bwh.harvard.edu/gpc/cc2.html) with the genotype relative risks estimated from our meta-analysis, and the allele frequency and case/control count from the GWAS study of interest (Gr@ace or FinnGen), and assuming a disease prevalence of 5%. We performed an inverse variance-weighted meta-analysis of all four studies (Kunkle et al., UKB, Gr@ace, and FinnGen “AD wide”), similar to our discovery meta-analysis.
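For reference, a fixed-effects inverse variance-weighted meta-analysis of a single SNP can be sketched as follows. This is a generic illustration with hypothetical inputs, not the code used in the study; the two-sided P value assumes a normally distributed test statistic.

```python
import math


def ivw_meta(betas, ses):
    """Fixed-effects inverse variance-weighted meta-analysis for one SNP."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided P value under a standard normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, p


# Hypothetical per-study logORs and standard errors for one SNP.
print(ivw_meta([0.10, 0.08, 0.12], [0.02, 0.03, 0.05]))
```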
Conditional analysis and statistical fine-mapping
To run GCTA, we prepared plink input files with genotypes from 10,000 randomly sampled UKB individuals at variants within +/- 5 Mb from each lead SNP. We excluded variants with INFO < 0.85, or which had a P-value from Cochran’s Q test for study heterogeneity < 0.001. We also excluded variants with minor allele frequency (MAF) in UKB below 0.1%, as LD estimates are unreliable at low allele counts. We selected these thresholds after manual examination of fine-mapping results, where we found that more lenient cutoffs led either FINEMAP or PAINTOR to select implausible causal variants at a few loci, such as pairs of very weakly associated rare variants to explain a common variant signal. We ran GCTA (v1.92.1) --cojo-slct with a threshold of P < 10⁻⁵ to identify secondary signals at each locus, and then retained only loci with a lead P-value below 5 × 10⁻⁸. For the HLA locus, we used a GCTA P-value threshold of 5 × 10⁻⁸. We also retained the loci TSPOAP1, IKZF1, and TMEM163 since they had P < 5 × 10⁻⁸ in an earlier version of our analysis. We excluded the APOE locus from conditional analysis and fine-mapping because the strength of association in the region would require a much closer LD panel match to avoid spurious signals.
We then ran FINEMAP (v1.3) at each locus, with --n-causal-snps given as the number of independent SNPs determined by GCTA. For FINEMAP, we excluded variants with MAF < 0.2%. For loci with multiple signals, we also used GCTA --cojo-cond to condition on each independent SNP identified in the previous analysis, and retained SNPs within 500 kb of any conditionally independent SNP at the locus. To fine-map based on GCTA conditional signals, we converted beta and standard error values to approximate Bayes Factors (BF)[101] using a prior of W = 0.1 (in Wakefield notation), and used the WTCCC single-causal variant method[48], probability = SNP BF / sum(all SNP BFs).
To assess sensitivity of the results to our choice of reference panel, we applied the same steps (GCTA + FINEMAP) to summary statistics from the Kunkle et al. sub-study, which are described further in the Supplementary Note.
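The single-causal-variant calculation can be sketched as below. This is a minimal illustration that assumes W is the prior variance of the effect size, as in Wakefield's notation, and that the Bayes factor is expressed in favour of association; function names and the example values are hypothetical.

```python
import math


def log_abf(beta, se, prior_var=0.1):
    """Log approximate Bayes factor in favour of association.

    Assumes prior_var plays the role of W (prior variance of the effect).
    """
    v = se ** 2
    shrink = prior_var / (v + prior_var)
    z2 = (beta / se) ** 2
    return 0.5 * math.log(1.0 - shrink) + 0.5 * z2 * shrink


def single_causal_posteriors(betas, ses, prior_var=0.1):
    """Posterior probability per SNP = SNP BF / sum of all SNP BFs."""
    log_bfs = [log_abf(b, s, prior_var) for b, s in zip(betas, ses)]
    max_log = max(log_bfs)  # subtract the maximum for numerical stability
    bfs = [math.exp(l - max_log) for l in log_bfs]
    total = sum(bfs)
    return [bf / total for bf in bfs]


# Hypothetical conditional effect sizes and standard errors at one signal.
print(single_causal_posteriors([0.10, 0.02, 0.09], [0.02, 0.02, 0.02]))
```

Working on the log scale avoids overflow at strongly associated loci, where individual Bayes factors can be very large.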
Colocalization with eQTLs
For eQTL colocalization, we downloaded summary statistics (see URLs) and determined eQTL genes at FDR 5% for each dataset in a uniform manner, first using Bonferroni correction of lead SNP nominal P values based on the number of variants tested for the gene, and using the Benjamini-Hochberg method to compute FDR. QTL calling for primary microglia was performed with RASQUAL[102] with the --no-posterior-update option. For datasets in GRCh38 coordinates, we first used CrossMap[100] to convert back to GRCh37 coordinates to match variants between eQTL and GWAS. We used the coloc package[19] with default priors to perform colocalization tests between GWAS and eQTLs having lead variants within 500 kb of each other, and passed to coloc all variants within 200 kb of each lead variant. We also ran coloc using P-values for each conditionally independent GWAS signal, obtained with GCTA as described above.
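The two-step eQTL gene calling (per-gene Bonferroni correction followed by Benjamini-Hochberg FDR across genes) can be sketched as follows. The function names, gene labels and numbers are hypothetical and serve only to illustrate the procedure.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    qvals = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        qvals[i] = running_min
    return qvals


def call_eqtl_genes(lead_p, n_variants, fdr=0.05):
    """Bonferroni-correct each gene's lead P value, then apply BH across genes."""
    genes = list(lead_p)
    adjusted = [min(1.0, lead_p[g] * n_variants[g]) for g in genes]
    qvals = bh_qvalues(adjusted)
    return {g for g, q in zip(genes, qvals) if q <= fdr}


# Hypothetical lead-SNP P values and number of variants tested per gene.
lead_p = {"geneA": 1e-8, "geneB": 0.01, "geneC": 1e-5}
n_variants = {"geneA": 1000, "geneB": 1000, "geneC": 2000}
print(sorted(call_eqtl_genes(lead_p, n_variants)))
```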
Functional annotations
We used the Ensembl VEP online Web tool (www.ensembl.org/vep)[55] to predict variant consequences, and to add selected annotations (Supplementary Table 7). We downloaded bed files based on imputed data for Roadmap Epigenomics DNase and 25-state genome segmentations for 127 epigenomes[54]. We grouped these into groups “all”, “brain” (epigenomes 7, 9, 10, 53, 54, 67, 68, 69, 70, 71, 72, 73, 74, 81, 82, 125), and “blood & immune” (epigenomes 33, 34, 37, 38, 39, 40, 41, 42, 43, 44, 45, 47, 48, 62, 29, 30, 31, 32, 35, 36, 46, 50, 51, 116). We considered 9 genome segmentation states to represent enhancers: TxReg, TxEnh5, TxEnh3, TxEnhW, EnhA1, EnhA2, EnhAF, EnhW1, EnhW2. We used bedtools[103] to determine overlaps, and counted the number of overlaps for each variant with peaks in the above groups. We downloaded FANTOM5[104] permissive enhancer annotations from fantom.gsc.riken.jp/5/data/. We downloaded pre-computed SpliceAI scores[58] for variants within genes from github.com/Illumina/SpliceAI. We merged filtered whole-genome and exome scores together, and for each AD variant annotated the maximum score across splice donor gain, donor loss, acceptor gain, acceptor loss. We used DeepSEA[57] (deepsea.princeton.edu) to annotate variants selected for functional fine-mapping with DeepSEA’s “functional significance” score. BigWig files with PhastCons, PhyloP and GERP RS scores were downloaded from UCSC. We downloaded microglial ATAC-seq based on the study by Gosselin et al.[52], aligned reads to GRCh37 with bwa 0.7.15[105], and called multisample peaks across all 15 datasets using MACS2[106]. We prepared bigWig files from alignments by using bedtools genomecov, followed by bedGraphToBigWig. To visualise microglia ATAC-seq tracks we adapted code from wiggleplotr[107].
Annotation-based fine-mapping
For fine-mapping with PAINTOR, we first restricted the number of considered variants for computational feasibility, by selecting 3,207 variants which had (i) FINEMAP probability ≥ 0.01% based on the GCTA-identified number of causal variants at the locus, or (ii) had FINEMAP probability ≥ 1% when run with either 1 or 2 causal variants, or (iii) were among the top 20 variants at the locus by FINEMAP probability. We defined binary annotations for input to PAINTOR based on the features described above, thresholding certain scores at multiple levels (e.g. CADD ≥ 5, 10, 20). For Roadmap annotations, we included a category based on whether a variant was in a peak or enhancer in ≥ 10 epigenomes. We ran PAINTOR v3.1 once for each of the 43 annotations (Fig. 2 and Supplementary Table 7), allowing two causal variants per locus.
We built a multi-annotation model using forward stepwise selection. We selected the best annotation by log-likelihood (LLK), Blood & immune DNase, and then ran PAINTOR again for each combination of this annotation and the 42 remaining annotations. We added a top-ranking annotation at each iteration until the model LLK improvement was less than 1. This occurred at iteration 4, and so we kept the first three annotations in the combined model. We computed the mean causal probability for each SNP as the mean of the three fine-mapping methods at loci with two or more signals, or as the mean of the FINEMAP and PAINTOR probabilities for loci with one signal, since FINEMAP gives approximately the same results as WTCCC fine-mapping for a single causal variant.
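The forward stepwise procedure can be sketched generically as below. Here `fit_llk` is a hypothetical stand-in for a PAINTOR run that returns the model log-likelihood for a given list of annotations, and the annotation names and gains are invented for illustration. Unlike the procedure in the text, this sketch also compares the first annotation against a model with no annotations.

```python
def forward_select(candidates, fit_llk, min_gain=1.0):
    """Greedy forward selection; stop when the LLK gain drops below min_gain."""
    selected = []
    best_llk = fit_llk(selected)
    remaining = list(candidates)
    while remaining:
        # Score every model that adds one more annotation to the current set.
        scored = [(fit_llk(selected + [a]), a) for a in remaining]
        llk, top = max(scored)
        if llk - best_llk < min_gain:
            break
        selected.append(top)
        remaining.remove(top)
        best_llk = llk
    return selected, best_llk


# Toy log-likelihood: additive, invented gains per annotation.
GAINS = {"blood_immune_dnase": 12.0, "annot_b": 5.0,
         "annot_c": 2.0, "annot_d": 0.4}


def toy_llk(annotations):
    return -1000.0 + sum(GAINS[a] for a in annotations)


# The fourth candidate improves the LLK by less than 1, so three are kept.
print(forward_select(GAINS, toy_llk))
```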
Network analysis
For network analysis, we created a gene interaction network based on selecting all edges between protein-coding genes from systematic studies (>1,000 interactions) in the IntAct[108] and BioGRID databases[109], and edges from STRING v10.5[110] with edge score > 0.75. This combined network included 18,055 genes and 540,421 edges. We identified 28 top candidate genes across AD loci (Supplementary Table 9) to use as seed genes, and assigned weight to these as the -log10(P value) of the locus lead SNP. We added four genes from the literature (MAPT, PSEN1, PSEN2, ABI3), with a weight (equivalent -log10(P)) of 15. For three loci, the nearest gene was not present in the network (ECHDC3, TMEM163, SCIMP). For each locus, we used all seed genes as input except those at the same locus, and propagated information through the network with the personalized PageRank algorithm[111], included in the igraph R package[112]. Since a gene’s resulting PageRank was highly correlated with its node degree, we compared the PageRank of each gene to the distribution of PageRanks obtained for the same gene in 1,000 iterations of network propagation, where the same number of seed genes were randomly selected. We computed the percentile of a gene’s true PageRank relative to the 1,000 network propagations with randomized inputs. Although the distribution of PageRank percentile was fairly uniform, we further normalised this to a uniform distribution across genes, so that a Pagerank percentile of 90% indicates that a gene’s PageRank relative to permutations is above that of 90% of genes. To determine gene set enrichment, we used the top 1,000 genes by network rank as input to gProfiler[113] with default settings, with the set of all genes ranked by the network as a background set. To determine enrichment of low P-value AD SNPs near genes in specific bins of PageRank percentile (Fig. 5a), we first determined for each gene the minimum SNP P value within 10 kb of the gene’s footprint. 
We excluded genes within 1 Mb of APOE. Then, for genes in each PageRank percentile bin, we used Fisher’s exact test to determine the odds ratio for a gene in that bin (relative to genes with PageRank percentile <50%) to have a minimum SNP P value in the given bin (relative to genes with minimum SNP P > 0.01).
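The permutation-based normalisation of PageRank can be illustrated with a small pure-Python sketch. It is not the igraph implementation used in the study: the damping factor of 0.85, the toy ring network and all names are assumptions, and the graph is assumed to be undirected with every node having at least one neighbour.

```python
import random


def personalized_pagerank(adj, seed_weights, damping=0.85, n_iter=100):
    """Power-iteration personalized PageRank on an undirected graph."""
    nodes = list(adj)
    total = sum(seed_weights.values())
    restart = {n: seed_weights.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(n_iter):
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(adj[n])
            for neighbour in adj[n]:
                new[neighbour] += share
        rank = new
    return rank


def pagerank_percentile(adj, seed_weights, gene, n_perm=1000, rng=None):
    """Percentile of a gene's PageRank relative to randomly chosen seed sets."""
    rng = rng or random.Random(0)
    observed = personalized_pagerank(adj, seed_weights)[gene]
    nodes = list(adj)
    weights = list(seed_weights.values())
    below = 0
    for _ in range(n_perm):
        # Same number of seeds and same weights, assigned to random genes.
        random_seeds = dict(zip(rng.sample(nodes, len(weights)), weights))
        if personalized_pagerank(adj, random_seeds)[gene] < observed:
            below += 1
    return 100.0 * below / n_perm


# Toy network: a ring of eight genes with one seed gene.
ring = {f"g{i}": [f"g{(i - 1) % 8}", f"g{(i + 1) % 8}"] for i in range(8)}
print(pagerank_percentile(ring, {"g0": 10.0}, "g1", n_perm=100))
```

Comparing each gene with its own distribution under random seeds is what removes the dependence on node degree: a highly connected gene scores well under most random seed sets, so only an excess over that baseline is rewarded.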
Gene expression
Gene expression values for all tissues were determined in units of transcripts per million (TPM). Both GTEx v8 and the eQTL catalogue provide tables of the median TPM expression across samples for each tissue and gene. For primary microglia, we obtained a table of read counts per gene, computed using FeatureCounts 1.5.3 as described[14], from which we computed median TPM. For use in gene prioritization and enrichment analyses, we first selected four GTEx brain tissues (cortex, hippocampus, substantia nigra, cerebellum) to avoid over-representing brain, and then the remaining 41 GTEx tissues, as well as primary microglia and in-house expression data from iPSC-derived microglia, iPSC-derived NGN2 cortical neurons, and iPSC-derived neurons from growth factor differentiation. For each gene, we determined the TPM expression relative to all tissues/cell types.
Single-cell gene expression data were obtained from the Allen Brain Institute as a gene-by-cell counts table, based on Smart-seq profiling of six human brain cortical areas[77]. For each cell type “subclass” as defined in the metadata (but excluding VLMC for having too few cells, and the outlier subclass labelled “exclude”), counts were summed across cells and then normalised to TPM within each subclass. We determined each gene’s TPM expression in each subclass relative to all 18 subclasses.
Genome-wide enrichment
We determined the GRCh37 coordinates of 18,055 genes present in the gene network using the R package annotables 0.1.91. For each AD GWAS SNP, excluding the APOE region (chr19:44-47 Mb), we determined the nearest gene. We defined annotation inputs for fgwas labelling a SNP 1 if it was nearest to a gene with network score in a given percentile bin (50-60, 60-70, 70-80, 80-90, 90-95, >95) and 0 otherwise. We ran fgwas[78] (-cc) with all network annotations as input, so that enrichments are with respect to SNPs nearest to genes with network score < 50th percentile. For every bulk gene expression dataset selected above, we defined an annotation for SNPs nearest genes with relative expression above the 80th (or 90th) percentile, and similarly for cell types from single-cell gene expression. We ran fgwas once for each expression annotation to determine enrichment of SNPs near high-expression genes relative to remaining genes (Supplementary Table 12).
Gene prioritization
Five predictors were used for gene prioritization.
The coding score is the sum of the mean fine-mapping probability for missense or LoF variants in a gene.
The expression score is the sum of component scores for bulk and single-cell microglial expression, and rewards genes with expression percentile above the 50th:
Genes without measured expression in a given dataset (bulk, single-cell) are assigned an exprScore of zero for that dataset.
Recent evidence from both eQTLs[114] and metabolite GWAS[115] suggests that genomic distance from the association peak is a strong predictor of causal target genes. The distance score is defined to give reasonable scores over the main range of interest of 0 - 200 kb:
where x is the minimum distance from the gene’s footprint to the region defined by independent lead SNPs at a GWAS locus, maxDist is 500,000 and distBias is 100 (Extended Data Fig. 6).
The coloc score is defined based on the maximum value across QTL datasets of the “H4” hypothesis probability, and rewards colocalisation probabilities above 0.9:
The network score is determined based on the pagerank percentile for a gene relative to permutations:
Genes not present in the network are assigned a networkScore of zero.
The total score for a gene is the sum of the above five scores.
To give appropriate weight to each component, we trained lasso-regularized logistic regression models with cross-validation using glmnet[116]. As input we used all protein-coding genes within 500 kb of our AD GWAS peaks, excluding the APOE region due to lack of colocalisation information, and excluding genes not present in the network. For the distance model, genes within 10 kb of each GWAS peak (40 genes) were set as positives, genes 10-100 kb were excluded, and genes >100 kb (394 genes) were set as negatives. These were predicted using the four non-distance predictors. For the network model, genes with pagerank percentile >80% (143 genes) were set as positives, those with pagerank percentile 50-80% were excluded, the 230 other genes were set as negatives, and these were predicted using the four non-network predictors. In each case, we selected the model that minimized mean squared error (MSE), shown in Supplementary Table 14, and used those parameters to generate predictions (in the range 0-1) for all genes at the AD loci. We defined the model score for a gene as the average prediction from the two models.
To determine the importance of the predictors to each model (apart from looking at regression coefficients) we ran glmnet models excluding each predictor in turn. If the MSE was lower with a predictor excluded then we removed it from the final model. For each model, we compared the MSE when using our quantitative predictors as defined above, or using categorical predictors by thresholding the predictors into 2-4 bins. For both models, the quantitative predictors gave improved MSE. We also examined models that included as predictors expression scores from astrocytes (based on the single-cell data) and from brain hippocampus (based on the GTEx data), but for both models this resulted in higher MSE and the regularization set the coefficients to zero.
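The way training labels and the final model score are assigned can be summarised in a short sketch. The handling of values that fall exactly on a threshold is an assumption, as are the function names; the fitted glmnet models themselves are not reproduced here.

```python
def distance_label(dist_bp):
    """Label for the distance model: 1 = positive, 0 = negative, None = excluded."""
    if dist_bp <= 10_000:
        return 1
    if dist_bp <= 100_000:
        return None  # genes 10-100 kb from the peak are left out of training
    return 0


def network_label(pagerank_pct):
    """Label for the network model, using the same coding as distance_label."""
    if pagerank_pct > 80:
        return 1
    if pagerank_pct >= 50:
        return None  # genes with percentile 50-80% are left out of training
    return 0


def model_score(p_distance_model, p_network_model):
    """Average of the two models' predicted probabilities (each in 0-1)."""
    return 0.5 * (p_distance_model + p_network_model)


# Hypothetical gene: 5 kb from the peak, PageRank percentile 65,
# with predictions of 0.8 and 0.4 from the two models.
print(distance_label(5_000), network_label(65), model_score(0.8, 0.4))
```

Excluding the intermediate genes from training keeps the positive and negative sets well separated, at the cost of a smaller training set.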
Association of AD loci in discovery + replication (“global”) meta-analysis
Association of AD loci in discovery + replication dataset (“global”) meta-analysis. For most loci, association significance is increased in the global meta-analysis (blue bars) relative to the discovery analysis (grey bars). The dashed vertical line shows P = 5 × 10⁻⁸. P-values were computed by inverse variance weighted meta-analysis, and bars show the -log10(P) for the SNP with minimum P value at the locus in either the discovery or global meta-analysis.
Comparison of fine-mapping in the meta-analysis vs. Kunkle et al.
Comparison of fine-mapping in the meta-analysis vs. Kunkle et al. Scatterplots showing, for each locus, SNP probabilities from FINEMAP applied to either the Kunkle et al. + UK Biobank meta-analysis (x-axis), or to only Kunkle et al. (y-axis). The number of causal variants at each locus was set to the number detected by GCTA in the meta-analysis. For most of the 36 loci, SNP probabilities are well correlated. For a few loci that are well powered in Kunkle et al., this is not the case, namely ABCA7, EPHA1, ECHDC3, and HLA. For these loci, fine-mapping results should be interpreted with caution. Six other loci are not well correlated (ADAMTS4, APH1B, IKZF1, PLCG2, TMEM163, and VKORC1), but these loci are poorly powered in Kunkle et al. (lead P values 2.1 × 10⁻⁶ to 2.1 × 10⁻³).
Network enrichment
a, The Pagerank percentile of all genes (within 500 kb) at each AD GWAS locus containing a seed gene is shown, with seed genes highlighted in blue. b, A violin/boxplot shows that seed genes have a markedly higher network Pagerank percentile than remaining genes (P = 2.4 × 10⁻⁹, one-tailed Wilcoxon rank sum test). c, Log odds ratio enrichment of AD risk among SNPs nearest to genes with network Pagerank percentile in different bins, determined using fgwas (whiskers represent 95% confidence intervals).
Gene expression enrichments
Expression enrichments for GTEx + microglia. Shown are the log odds ratio enrichments of AD risk among SNPs with relative gene expression in each tissue above the 80th (or 90th) percentile across tissues. Whiskers represent 95% confidence intervals determined by fgwas.
Colocalization scores
a, Genes with maximum colocalization H4 probability >0.9 have higher Pagerank percentile (left boxplot) and higher total score (sum of the four non-coloc predictors, right boxplot) than do genes without colocalisation (<0.5). Genes with intermediate colocalisation evidence (bins 0.5 - 0.8 and 0.8 - 0.9) show little evidence of having higher scores by the other metrics. Based on this, we chose a maxColoc probability of 0.9 as the lower bound for our colocalization score. b, Boxplot of the total score (excluding coloc) for genes that have a colocalisation probability > 0.9 in at least one QTL dataset within each tissue group. The most significant difference is between totalScore for genes with microglial colocalizations vs. the genes with colocalization in “other” tissues (non-immune GTEx tissues), but the evidence for a difference is weak (P = 0.041, Wilcoxon rank sum test). In all cases, boxplots show the 25th, median, and 75th percentile of the distribution, with whiskers extending to the largest (and smallest) value no further than 1.5 times the interquartile range from the boxplot hinge.
Gene distance score
The distance score assigned to genes near an AD GWAS peak, which decreases approximately linearly (past a distance of 1 kb) with increasing log-scaled distance up to 500 kb.
Caltagirone; Paola Bossù; Maria Donata Orfei; Barry Reisberg; Robert Clarke; Christiane Reitz; A David Smith; John M Ringman; Donald Warden; Erik D Roberson; Gordon Wilcock; Ekaterina Rogaeva; Amalia Cecilia Bruni; Howard J Rosen; Maura Gallo; Roger N Rosenberg; Yoav Ben-Shlomo; Mark A Sager; Patrizia Mecocci; Andrew J Saykin; Pau Pastor; Michael L Cuccaro; Jeffery M Vance; Julie A Schneider; Lori S Schneider; Susan Slifer; William W Seeley; Amanda G Smith; Joshua A Sonnen; Salvatore Spina; Robert A Stern; Russell H Swerdlow; Mitchell Tang; Rudolph E Tanzi; John Q Trojanowski; Juan C Troncoso; Vivianna M Van Deerlin; Linda J Van Eldik; Harry V Vinters; Jean Paul Vonsattel; Sandra Weintraub; Kathleen A Welsh-Bohmer; Kirk C Wilhelmsen; Jennifer Williamson; Thomas S Wingo; Randall L Woltjer; Clinton B Wright; Chang-En Yu; Lei Yu; Yasaman Saba; Alberto Pilotto; Maria J Bullido; Oliver Peters; Paul K Crane; David Bennett; Paola Bosco; Eliecer Coto; Virginia Boccardi; Phil L De Jager; Alberto Lleo; Nick Warner; Oscar L Lopez; Martin Ingelsson; Panagiotis Deloukas; Carlos Cruchaga; Caroline Graff; Rhian Gwilliam; Myriam Fornage; Alison M Goate; Pascual Sanchez-Juan; Patrick G Kehoe; Najaf Amin; Nilifur Ertekin-Taner; Claudine Berr; Stéphanie Debette; Seth Love; Lenore J Launer; Steven G Younkin; Jean-Francois Dartigues; Chris Corcoran; M Arfan Ikram; Dennis W Dickson; Gael Nicolas; Dominique Campion; JoAnn Tschanz; Helena Schmidt; Hakon Hakonarson; Jordi Clarimon; Ron Munger; Reinhold Schmidt; Lindsay A Farrer; Christine Van Broeckhoven; Michael C O'Donovan; Anita L DeStefano; Lesley Jones; Jonathan L Haines; Jean-Francois Deleuze; Michael J Owen; Vilmundur Gudnason; Richard Mayeux; Valentina Escott-Price; Bruce M Psaty; Alfredo Ramirez; Li-San Wang; Agustin Ruiz; Cornelia M van Duijn; Peter A Holmans; Sudha Seshadri; Julie Williams; Phillippe Amouyel; Gerard D Schellenberg; Jean-Charles Lambert; Margaret A Pericak-Vance Journal: Nat Genet Date: 2019-02-28 Impact factor: 
41.307