Common and rare variant association analyses in amyotrophic lateral sclerosis identify 15 risk loci with distinct genetic architectures and neuron-specific biology.

Wouter van Rheenen¹, Rick A A van der Spek², Mark K Bakker², Joke J F A van Vugt², Paul J Hop², Ramona A J Zwamborn², Niek de Klein³, Harm-Jan Westra³, Olivier B Bakker³, Patrick Deelen^3,4, Gemma Shireby⁵, Eilis Hannon⁵, Matthieu Moisse^6,7,8, Denis Baird^9,10, Restuadi Restuadi¹¹, Egor Dolzhenko¹², Annelot M Dekker², Klara Gawor², Henk-Jan Westeneng², Gijs H P Tazelaar², Kristel R van Eijk², Maarten Kooyman², Ross P Byrne¹³, Mark Doherty¹³, Mark Heverin¹⁴, Ahmad Al Khleifat¹⁵, Alfredo Iacoangeli^15,16,17, Aleksey Shatunov¹⁵, Nicola Ticozzi^18,19, Johnathan Cooper-Knock²⁰, Bradley N Smith¹⁵, Marta Gromicho²¹, Siddharthan Chandran^22,23, Suvankar Pal^22,23, Karen E Morrison²⁴, Pamela J Shaw²⁰, John Hardy²⁵, Richard W Orrell²⁶, Michael Sendtner²⁷, Thomas Meyer²⁸, Nazli Başak²⁹, Anneke J van der Kooi³⁰, Antonia Ratti^18,31, Isabella Fogh¹⁵, Cinzia Gellera³², Giuseppe Lauria^33,34, Stefania Corti^19,35, Cristina Cereda³⁶, Daisy Sproviero³⁶, Sandra D'Alfonso³⁷, Gianni Sorarù³⁸, Gabriele Siciliano³⁹, Massimiliano Filosto⁴⁰, Alessandro Padovani⁴⁰, Adriano Chiò^41,42, Andrea Calvo^41,42, Cristina Moglia^41,42, Maura Brunetti⁴¹, Antonio Canosa^41,42, Maurizio Grassano⁴¹, Ettore Beghi⁴³, Elisabetta Pupillo⁴³, Giancarlo Logroscino⁴⁴, Beatrice Nefussy⁴⁵, Alma Osmanovic^46,47, Angelica Nordin⁴⁸, Yossef Lerner^49,50, Michal Zabari^49,50, Marc Gotkine^49,50, Robert H Baloh^51,52, Shaughn Bell^51,52, Patrick Vourc'h^53,54, Philippe Corcia^54,55, Philippe Couratier^56,57, Stéphanie Millecamps⁵⁸, Vincent Meininger⁵⁹, François Salachas^58,60, Jesus S Mora Pardina⁶¹, Abdelilah Assialioui⁶², Ricardo Rojas-García⁶³, Patrick A Dion^64,65, Jay P Ross^64,66, Albert C Ludolph⁶⁷, Jochen H Weishaupt⁶⁸, David Brenner⁶⁸, Axel Freischmidt^67,69, Gilbert Bensimon^70,71,72,73, Alexis Brice⁷⁴, Alexandra Durr⁷⁴, Christine A M Payan⁷⁰, Safa Saker-Delye⁷⁵, Nicholas W Wood⁷⁶, Simon Topp¹⁵, Rosa Rademakers⁷⁷, Lukas Tittmann⁷⁸, Wolfgang Lieb⁷⁸, Andre Franke⁷⁹, Stephan Ripke^80,81,82, Alice Braun⁸², 
Julia Kraft⁸², David C Whiteman⁸³, Catherine M Olsen⁸³, Andre G Uitterlinden^84,85, Albert Hofman⁸⁵, Marcella Rietschel^86,87, Sven Cichon^88,89,90,91, Markus M Nöthen^88,89, Philippe Amouyel⁹², Bryan J Traynor^93,94, Andrew B Singleton⁹⁵, Miguel Mitne Neto⁹⁶, Ruben J Cauchi⁹⁷, Roel A Ophoff^98,99,100, Martina Wiedau-Pazos¹⁰¹, Catherine Lomen-Hoerth¹⁰², Vivianna M van Deerlin¹⁰³, Julian Grosskreutz^104,105, Annekathrin Roediger¹⁰⁴, Nayana Gaur¹⁰⁴, Alexander Jörk¹⁰⁴, Tabea Barthel¹⁰⁴, Erik Theele¹⁰⁴, Benjamin Ilse¹⁰⁴, Beatrice Stubendorff¹⁰⁴, Otto W Witte¹⁰⁴, Robert Steinbach¹⁰⁴, Christian A Hübner¹⁰⁶, Caroline Graff¹⁰⁷, Lev Brylev^108,109,110, Vera Fominykh^108,110, Vera Demeshonok¹¹¹, Anastasia Ataulina¹⁰⁸, Boris Rogelj^112,113,114, Blaž Koritnik¹¹⁵, Janez Zidar¹¹⁵, Metka Ravnik-Glavač¹¹⁶, Damjan Glavač¹¹⁷, Zorica Stević¹¹⁸, Vivian Drory^45,119, Monica Povedano⁶², Ian P Blair¹²⁰, Matthew C Kiernan¹²¹, Beben Benyamin^11,122, Robert D Henderson^123,124, Sarah Furlong¹²⁰, Susan Mathers¹²⁵, Pamela A McCombe^124,126, Merrilee Needham^127,128,129, Shyuan T Ngo^123,124,126, Garth A Nicholson^120,130,131, Roger Pamphlett¹³², Dominic B Rowe¹²⁰, Frederik J Steyn^124,133, Kelly L Williams¹²⁰, Karen A Mather^134,135, Perminder S Sachdev^134,136, Anjali K Henders¹¹, Leanne Wallace¹¹, Mamede de Carvalho²¹, Susana Pinto²¹, Susanne Petri⁴⁶, Markus Weber¹³⁷, Guy A Rouleau^64,65,66, Vincenzo Silani^18,19, Charles J Curtis^138,139, Gerome Breen^138,139, Jonathan D Glass¹⁴⁰, Robert H Brown¹⁴¹, John E Landers¹⁴¹, Christopher E Shaw¹⁵, Peter M Andersen⁴⁸, Ewout J N Groen², Michael A van Es², R Jeroen Pasterkamp¹⁴², Dongsheng Fan¹⁴³, Fleur C Garton¹¹, Allan F McRae¹¹, George Davey Smith^10,144, Tom R Gaunt^10,144, Michael A Eberle¹², Jonathan Mill⁵, Russell L McLaughlin¹³, Orla Hardiman¹⁴, Kevin P Kenna^2,142, Naomi R Wray^11,126, Ellen Tsai⁹, Heiko Runz⁹, Lude Franke³, Ammar Al-Chalabi^15,145, Philip Van Damme^6,7,8, Leonard H van den Berg², Jan H Veldink¹⁴⁶.

Abstract

Amyotrophic lateral sclerosis (ALS) is a fatal neurodegenerative disease with a lifetime risk of one in 350 people and an unmet need for disease-modifying therapies. We conducted a cross-ancestry genome-wide association study (GWAS) including 29,612 patients with ALS and 122,656 controls, which identified 15 risk loci. When combined with 8,953 individuals with whole-genome sequencing (6,538 patients, 2,415 controls) and a large cortex-derived expression quantitative trait locus (eQTL) dataset (MetaBrain), analyses revealed locus-specific genetic architectures in which we prioritized genes either through rare variants, short tandem repeats or regulatory effects. ALS-associated risk loci were shared with multiple traits within the neurodegenerative spectrum but with distinct enrichment patterns across brain regions and cell types. Of the environmental and lifestyle risk factors obtained from the literature, Mendelian randomization analyses indicated a causal role for high cholesterol levels. The combination of all ALS-associated signals reveals a role for perturbations in vesicle-mediated transport and autophagy and provides evidence for cell-autonomous disease initiation in glutamatergic neurons.

Substances:
Glutamine
Cholesterol

Year: 2021 PMID: 34873335 PMCID: PMC8648564 DOI: 10.1038/s41588-021-00973-1

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Main

ALS is a fatal neurodegenerative disease affecting one in 350 individuals. Due to degeneration of both upper and lower motor neurons, patients suffer from progressive paralysis, ultimately leading to respiratory failure within 3–5 years after disease onset[1]. In ~10% of patients with ALS, there is a clear family history for ALS, suggesting a strong genetic predisposition, and currently a pathogenic mutation can be found in more than half of these cases[2]. On the other hand, apparently sporadic ALS is considered a complex trait for which heritability is estimated at 40–50% (refs. [3,4]). There is no widely accepted definition of familial or sporadic ALS[5], and they are likely to represent the ends of a spectrum with overlapping genetic architectures for which the same genes have been implicated in both familial and sporadic disease[6-11]. To date, partially overlapping GWASs have identified up to six genome-wide significant loci, explaining a small proportion of the genetic susceptibility to ALS[11-16]. Indeed, some of these loci found in GWASs harbor rare variants with large effects also present in familial cases (for example, C9orf72 and TBK1)[6,17,18]. For other loci, the role of rare variants remains unknown. While ALS is referred to as a motor neuron disease, cognitive and behavioral changes are observed in up to 50% of patients, sometimes leading to frontotemporal dementia (FTD). The overlap with FTD is clearly illustrated by the pathogenic hexanucleotide repeat expansion in C9orf72, which causes familial ALS and/or FTD[17,18] and the genome-wide genetic correlation between ALS and FTD[19]. Further expanding the ALS–FTD spectrum, a genetic correlation with progressive supranuclear palsy (PSP) has been described[20]. Shared pathogenic mechanisms between ALS and other neurodegenerative diseases, including common diseases such as Alzheimer’s disease (AD) and Parkinson’s disease (PD), can further reveal ALS pathophysiology and inform new therapeutic strategies. 
Here, we combine new and existing individual-level genotype data in the largest GWAS of ALS to date. We present a comprehensive screen for pathogenic rare variants and short tandem repeat (STR) expansions as well as regulatory effects observed in brain cortex-derived RNA sequencing (RNA-seq) and methylation datasets to prioritize causal genes within ALS-risk loci. Furthermore, we reveal similarities and differences between ALS and other neurodegenerative diseases as well as the biological processes in disease-relevant tissues and cell types that affect ALS risk.

Results

Cross-ancestry meta-analysis reveals 15 risk loci for ALS

To generate the largest GWAS of ALS to date, we merged individual-level genotype data from 117 cohorts into six strata matched by genotyping platform. A total of 27,205 patients with ALS and 110,881 control participants of European ancestries passed quality control (including 6,374 newly genotyped cases and 22,526 control participants; Methods and Supplementary Tables 1 and 2). Patients were not selected for a family history of ALS. Through meta-analysis of these six strata, we obtained association statistics for 10,461,755 variants down to a minor allele frequency (MAF) of 0.1% in the Haplotype Reference Consortium resource[21]. We observed moderate inflation of the test statistics (λGC = 1.12, λ1000 = 1.003), and linkage disequilibrium (LD) score regression yielded an intercept of 1.029 (s.e. = 0.0073), indicating that the majority of inflation was due to the polygenic signal in ALS (LD score regression (LDSC): liability-scale SNP-based heritability h² = 0.028, s.e. = 0.003, K = 1/350, P = 5.5 × 10⁻²¹). The European ancestry analysis identified 12 loci reaching genome-wide significance (P < 5.0 × 10⁻⁸; Extended Data Fig. 1). For nine loci, the top SNP or a strong LD proxy (r² = 0.996) was present in the GWAS of ALS in Asian ancestries (2,407 patients with ALS and 11,775 control participants)[15,16], and all showed a consistent direction of effects (Pbinom = 2.0 × 10⁻³). The three SNPs that were not present in the Asian ancestry GWAS were low-frequency variants (MAF of 0.6–1.6% in European ancestries, Table 1). The genetic overlap between ALS risk in European and Asian ancestries resulted in a trans-ancestry genetic correlation of 0.57 (s.e. = 0.28) for genetic effect and 0.58 (s.e. = 0.30) for genetic impact, which were not statistically significantly different from unity (P = 0.13 and P = 0.16, respectively). We therefore performed a cross-ancestry meta-analysis totaling 29,612 cases and 122,656 controls, which revealed three additional loci, bringing the total to 15 genome-wide significant risk loci for ALS (Fig. 1, Table 1 and Supplementary Tables 4–18). Conditional and joint analysis did not identify secondary signals within these loci.
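Two of the summary numbers in this paragraph can be checked with simple arithmetic. The sketch below assumes the commonly used rescaling of λGC to a study of 1,000 cases and 1,000 controls and a one-sided binomial sign test with a null probability of 0.5; whether these are exactly the procedures used in the study is an assumption, and the function names are illustrative.

```python
import math


def lambda_1000(lambda_gc, n_cases, n_controls):
    """Rescale lambda_GC to an equivalent study of 1,000 cases and 1,000 controls."""
    scale = (1 / n_cases + 1 / n_controls) / (1 / 1000 + 1 / 1000)
    return 1 + (lambda_gc - 1) * scale


def sign_test_p(n_consistent, n_total):
    """One-sided binomial P value for at least n_consistent concordant signs (p = 0.5)."""
    tail = sum(math.comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


# European ancestry sample sizes and lambda_GC as reported in the text.
lam = lambda_1000(1.12, 27205, 110881)
# Nine of nine loci with a consistent direction of effect in Asian ancestries.
p_sign = sign_test_p(9, 9)

print(f"lambda_1000 = {lam:.3f}")
print(f"sign test P = {p_sign:.1e}")
```

Under these assumptions the rescaled inflation factor rounds to 1.003 and the sign test gives 1/512, or about 2.0 × 10⁻³, in line with the values quoted above.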

Extended Data Fig. 1

Manhattan plot in European ancestries GWAS.

Genome-wide association statistics obtained by inverse-variance weighted meta-analysis of the stratified SAIGE logistic mixed model regression in European ancestry cohorts. The y axis corresponds to the two-tailed −log₁₀(P value); the x axis corresponds to the genomic coordinates (GRCh37). Loci containing a genome-wide significant SNP are highlighted in red. SNP IDs are the top associated SNPs in each locus. The dotted horizontal line reflects the threshold for genome-wide significance (P = 5 × 10⁻⁸).

Table 1

Genome-wide significant loci

							European ancestries		Asian ancestries		Cross-ancestry
Chr	Position (bp)	ID	Prioritized gene	A₁	A₂	Freq	Effect (s.e.)	P	Effect (s.e.)	P	Effect (s.e.)	P
9	27,563,868	rs2453555	C9orf72	A	G	0.248	0.174 (0.013)	1.0 × 10⁻⁴³	0.017 (0.066)	0.80	0.168 (0.012)	1.5 × 10⁻⁴¹
19	17,752,689	rs12608932	UNC13A	C	A	0.347	0.125 (0.012)	8.8 × 10⁻²⁵	0.074 (0.038)	0.053	0.120 (0.012)	3.0 × 10⁻²⁵
21	33,039,603	rs80265967	SOD1	C	A	0.006	1.078 (0.124)	3.5 × 10⁻¹⁸	–	–	–	–
14	31,045,596	rs229195	SCFD1	A	G	0.337	0.091 (0.012)	9.2 × 10⁻¹⁵	–	–	–	–
14	31,045,181	rs229194^a	SCFD1	A	G	0.337	0.091 (0.012)	9.2 × 10⁻¹⁵	0.002 (0.036)	0.97	0.083 (0.011)	1.5 × 10⁻¹³
3	39,508,968	rs631312	MOBP, RPSA	G	A	0.291	0.079 (0.012)	5.2 × 10⁻¹¹	0.084 (0.036)	0.020	0.080 (0.011)	3.3 × 10⁻¹²
6	32,672,641	rs9275477	HLA	C	A	0.096	−0.143 (0.021)	5.5 × 10⁻¹²	−0.110 (0.111)	0.32	−0.142 (0.02)	3.5 × 10⁻¹²
12	57,975,700	rs113247976	KIF5A	T	A	0.016	0.332 (0.049)	1.4 × 10⁻¹¹	–	–	–	–
21	45,753,117	rs75087725	CFAP410	A	C	0.012	0.418 (0.063)	2.7 × 10⁻¹¹	–	–	–	–
5	150,410,835	rs10463311	GPX3, TNIP1	C	T	0.253	0.079 (0.013)	3.5 × 10⁻¹⁰	0.042 (0.036)	0.24	0.075 (0.012)	2.7 × 10⁻¹⁰
20	48,438,761	rs17785991	SLC9A8, SPATA2	A	T	0.353	0.074 (0.012)	3.5 × 10⁻¹⁰	0.045 (0.076)	0.55	0.073 (0.012)	3.2 × 10⁻¹⁰
12	64,877,053	rs4075094	TBK1	A	T	0.112	−0.098 (0.018)	1.7 × 10⁻⁸	−0.216 (0.090)	0.017	−0.103 (0.017)	2.1 × 10⁻⁹
5	172,354,731	rs517339	ERGIC1	C	T	0.397	−0.065 (0.011)	8.5 × 10⁻⁹	−0.067 (0.074)	0.37	−0.065 (0.011)	5.6 × 10⁻⁹
4	170,583,157	rs62333164	NEK1	A	G	0.335	0.063 (0.012)	7.0 × 10⁻⁸	0.203 (0.070)	3.8 × 10⁻³	0.067 (0.012)	6.9 × 10⁻⁹
13	46,113,984	rs2985994	COG3	C	T	0.259	0.066 (0.013)	1.9 × 10⁻⁷	0.100 (0.041)	0.014	0.069 (0.012)	1.2 × 10⁻⁸
7	157,481,780	rs10280711	PTPRN2	G	C	0.124	0.076 (0.017)	5.8 × 10⁻⁶	0.132 (0.037)	2.9 × 10⁻⁴	0.086 (0.015)	1.8 × 10⁻⁸

Details of two-sided SAIGE logistic mixed model regression for the top associated SNPs within each genome-wide significant locus (P < 5 × 10⁻⁸). ^a For the strongest associated SNP in the SCFD1 locus, rs229195 (MAF = 0.337), details of the LD proxy rs229194 are described (MAF = 0.337, r² = 0.996 in Asian ancestries), as only the LD proxy was present in the Asian ancestry GWAS. The low-frequency SNPs rs80265967, rs113247976 and rs75087725 were not present in the Asian ancestry GWAS, and no LD proxies (r² > 0.8) were found. Chr, chromosome; Position, basepair position in the reference genome GRCh37; A₁, effect allele; A₂, non-effect allele; Freq, frequency of the effect allele in the European ancestry GWAS; s.e., standard error of the effect estimate.
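As a rough illustration of how the cross-ancestry columns relate to the ancestry-specific ones, the sketch below applies a fixed-effect inverse-variance weighted (IVW) combination to the rounded effect estimates printed in Table 1. This is only an approximation under stated assumptions: the study meta-analyzed stratified mixed-model results rather than these two rounded summary values, so small differences in the last digit are expected, and P values recomputed from rounded inputs would not match. The function name `ivw_meta` is illustrative.

```python
import math


def ivw_meta(estimates):
    """Fixed-effect inverse-variance weighted combination of (beta, se) pairs."""
    weights = [1 / se ** 2 for _, se in estimates]
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se = math.sqrt(1 / sum(weights))
    return beta, se


# (European, Asian) effect estimates and standard errors copied from Table 1.
table_rows = {
    "rs12608932 (UNC13A)": [(0.125, 0.012), (0.074, 0.038)],
    "rs4075094 (TBK1)": [(-0.098, 0.018), (-0.216, 0.090)],
}

results = {snp: ivw_meta(pairs) for snp, pairs in table_rows.items()}
for snp, (beta, se) in results.items():
    print(f"{snp}: combined effect = {beta:.3f} (s.e. = {se:.3f})")
```

The combined effects come out close to the tabulated cross-ancestry values (about 0.120 for UNC13A and about −0.103 for TBK1), with the European estimate dominating because its standard error is much smaller.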

Fig. 1

Manhattan plot of cross-ancestry meta-analysis.

Genome-wide association statistics obtained by IVW meta-analysis of the stratified SAIGE logistic mixed model regression. The y axis corresponds to two-tailed −log₁₀(P values); the x axis corresponds to genomic coordinates (GRCh37). The horizontal dashed line reflects the threshold for calling genome-wide significant SNPs (P = 5 × 10⁻⁸). Color coding and gene labels reflect those prioritized by the gene-prioritization analysis. Labels in bold indicate genes with known highly pathogenic mutations for ALS. SAIGE = Scalable and Accurate Implementation of Generalized mixed model software package.

Of these findings, eight loci have been reported in previous GWASs (C9orf72, UNC13A, SCFD1, MOBP–RPSA, KIF5A, CFAP410, GPX3–TNIP1 and TBK1)[11,14,15]. The rs80265967 variant corresponds to the p.D90A mutation in SOD1 previously identified in a Finnish ALS cohort enriched for familial ALS[13]. Interestingly, we observed a genome-wide significant common variant association signal within the NEK1 locus, which was previously shown to harbor rare variants associated with ALS[8]. The recently reported association at the ACSL5–ZDHHC6 locus[16,22] did not reach the threshold for genome-wide significance (rs58854276, PEUR = 5.4 × 10⁻⁵, PASN = 4.9 × 10⁻⁷, Pcomb = 6.5 × 10⁻⁸; Supplementary Table 19), despite the fact that our analysis includes all data from the original discovery studies.

Rare variant gene-based association analyses in ALS

To assess a general pattern of underlying architectures that link associated SNPs to causal genes, we first tested for annotation-specific enrichment using stratified LDSC. This revealed that 5′ UTR regions as well as coding regions in the genome and those annotated as conserved were most enriched for ALS-associated SNPs (Extended Data Fig. 2). Subsequently, we investigated how rare, coding variants contributed to ALS risk by generating a whole-genome sequencing (WGS) dataset of patients with ALS (n = 6,538) and control participants (n = 2,415), which is a subset of the common variant GWAS cohort. The exome-wide association analysis included transcript-level rare variant burden testing for different models of allele-frequency thresholds and variant annotations (Methods). This identified NEK1 as the strongest associated gene (minimal P = 4.9 × 10⁻⁸ for disruptive and damaging variants at MAF < 0.005), which was the only gene to pass the exome-wide significance thresholds (0.05 ÷ 17,994 = 2.8 × 10⁻⁶ and 0.05 ÷ 58,058 = 8.6 × 10⁻⁷ for number of genes and protein-coding transcripts, respectively; Supplementary Table 20). This association was independent from the previously reported increased rare variant burden in selected patients with ‘familial ALS’ (ref. [8]) who were not included in this study. Polygenic risk score (PRS) analyses did not show a difference in PRSs between patients carrying rare variants in ALS-risk genes (SOD1, C9orf72 repeat expansion, TARDBP, FUS, NEK1, TBK1 and CFAP410) and all patients with ALS (Extended Data Fig. 3). Although power was limited, this is compatible with a scenario in which the genetic risk of ALS in these patients is a sum of rare variants in ALS genes and other (common) genetic variation.
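The two exome-wide significance thresholds quoted above are Bonferroni corrections of a 0.05 significance level for the number of genes and the number of protein-coding transcripts tested. A minimal sketch of that arithmetic, with illustrative variable names:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


gene_threshold = bonferroni_threshold(0.05, 17994)        # number of genes
transcript_threshold = bonferroni_threshold(0.05, 58058)  # protein-coding transcripts
nek1_min_p = 4.9e-8  # minimal NEK1 burden P value reported in the text

print(f"gene-level threshold:       {gene_threshold:.1e}")
print(f"transcript-level threshold: {transcript_threshold:.1e}")
print("NEK1 passes both:", nek1_min_p < min(gene_threshold, transcript_threshold))
```

The stricter transcript-level threshold is the one that matters here, and the reported NEK1 P value falls below it by more than an order of magnitude.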

Extended Data Fig. 2

Annotation specific heritability enrichment.

Enrichment of SNP-based heritability was calculated with LD-score regression. Grey dashed line represents no enrichment (enrichment = 1). Error bars denote standard error of enrichment estimate. Nominally statistically significant enrichment estimates (two-sided P < 0.05) are marked with an asterisk (Conserved_LindbladToh P = 6.5 × 10⁻⁵, SuperEnhancer_Hnisz P = 0.014, TFBS_ENCODE P = 0.017, H3K4me1_peaks_Trynka P = 0.018, Coding_UCSC P = 0.028, H3K9ac_Trynka P = 0.037). The category Conserved_LindbladToh was significant after Bonferroni correction for multiple testing across all categories (N = 28). Due to the regression framework in LDSC, enrichment estimates < 0 are possible (with large standard errors).

Extended Data Fig. 3

PRS stratified by rare variant carrier status.

Distribution of PRS in controls and ALS patients with or without one or more rare variants in ALS risk genes. There was no statistically significant difference in PRS between ALS patients with and without rare variants in ALS risk genes (labeled as gene_mut or gene_wt respectively). In total, 5,112 ALS patients and 2,132 controls from stratum 6 with whole-genome sequencing data available were included. For SOD1, TARDBP, FUS, NEK1, TBK1, and CFAP410, rare variants were included according to the model that yielded the strongest association in the rare variant burden association analyses. For C9orf72, patients with the pathogenic hexanucleotide repeat expansion were compared to those without the expansion. The ‘any ALS gene’ panel groups together all patients with a rare variant in any of the ALS risk genes. P-values for difference in PRS were derived by two-tailed logistic regression. The number of ALS patients carrying a rare variant per gene is denoted in the corresponding panel. Intervals for boxplots: center = median, box = lower and upper quartile, hinges = median ± 2 * IQR, IQR = interquartile range.

Gene prioritization shows locus-specific underlying architectures

To assess whether rare variant associations could drive the common variant signals at the 15 genome-wide significant loci, we combined the common and rare variant analyses to prioritize genes within these loci. The SNP effects on gene expression were assessed by summary-based Mendelian randomization (MR) (SMR) in blood (eQTLGen[23], n = 31,648) and a new brain cortex-derived eQTL dataset (MetaBrain[24], n = 2,970). Finally, we analyzed methylation quantitative trait loci (mQTL) by SMR in blood-derived (n = 2,082) and brain-derived (n = 522) mQTL datasets[25-27]. Through these multi-layered gene-prioritization strategies, we classified each locus into one of four classes of most likely underlying genetic architecture to prioritize the causal gene (Supplementary Figs. 1–15).

First, in three GWAS loci, the strongest associated SNP was a low-frequency coding variant that was nominated as the causal variant. This was the case for rs80265967 (SOD1, p.D90A; Supplementary Fig. 14) and rs113247976 (KIF5A, p.P986L; Supplementary Fig. 8), which are coding variants in known ALS-risk genes. This was also the most likely causal mechanism for rs75087725 (CFAP410, formerly C21orf2, p.V58L; Supplementary Fig. 15), as the GWAS variant is a missense variant; no evidence for other mechanisms including repeat expansions or eQTL or mQTL effects was observed within this locus, and CFAP410 itself is known to directly interact with NEK1, another ALS gene[6,28]. These three loci illustrate the power of large-scale GWASs combined with large imputation panels to directly identify low-frequency causal variants that confer disease risk.

Second, SNPs can tag a highly pathogenic repeat expansion, as was observed for rs2453555 (C9orf72) and the known GGGGCC hexanucleotide repeat in this locus (Supplementary Fig. 7). Conditional analysis revealed no residual signal after conditioning on the repeat expansion, which was in LD with the top SNP (r² = 0.14, |D′| = 0.99, MAF(SNP) = 0.25, MAF(STR) = 0.047). Besides the repeat expansion, both eQTL and mQTL analyses point to C9orf72 (Supplementary Fig. 7). The HEIDI (heterogeneity in dependent instruments) outlier test, however, rejected the null hypothesis that gene expression or methylation mediated the causal effect of the associated SNP (PHEIDI,eQTL = 3.7 × 10⁻²³ and PHEIDI,mQTL = 4.1 × 10⁻⁷). This is in line with the idea that the pathogenic repeat expansion is the causal variant in this locus and that eQTL and mQTL effects do not mediate a causal effect. We found no similar pathogenic repeat expansions that fully explained the SNP association signal in the other genome-wide significant loci.

Third, in two loci (rs62333164 in NEK1 and rs4075094 in TBK1), common and rare variants converged on the same gene; both genes are known ALS-risk genes[6,8]. For both loci, the rare variant burden association was conditionally independent from the top SNP that was included in the GWAS (Supplementary Figs. 2 and 9). Here, eQTL and mQTL analyses indicated that the risk-increasing effects of the common variants were mediated through both eQTL and mQTL effects on NEK1 and TBK1. Furthermore, a polymorphic STR downstream of NEK1 was associated with increased ALS risk (motif, TTTA; threshold = 10 repeat units, expanded allele frequency = 0.51, P = 5.2 × 10⁻⁵, false discovery rate (FDR) = 4.7 × 10⁻⁴; Extended Data Fig. 4). This polymorphic repeat was in LD with the top associated SNP within this locus (r² = 0.24, |D′| = 0.70). There was no statistically significant association for the top SNP in the WGS data to reliably determine its independent contribution to ALS risk.
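The combination of a near-complete |D′| with a low r² at the C9orf72 locus follows from the difference in allele frequency between the common SNP and the much rarer repeat expansion. The sketch below recovers r² from |D′| and the two allele frequencies using the standard definitions of D, D′ and r². It assumes that the two alleles whose frequencies are quoted are positively associated (D > 0), which the text does not state explicitly; the function name is illustrative.

```python
def r2_from_dprime(d_prime, p_a, p_b):
    """r^2 implied by |D'| and two allele frequencies, assuming D > 0."""
    # Largest positive D that is compatible with the two allele frequencies.
    d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    d = d_prime * d_max
    return d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))


# C9orf72: top SNP (frequency 0.25) and repeat expansion (frequency 0.047).
r2_c9 = r2_from_dprime(0.99, 0.25, 0.047)
# NEK1: top SNP (frequency 0.335 in Table 1) and expanded STR allele (frequency 0.51).
r2_nek1 = r2_from_dprime(0.70, 0.335, 0.51)

print(f"C9orf72: r2 = {r2_c9:.3f}")
print(f"NEK1:    r2 = {r2_nek1:.3f}")
```

With these inputs the implied r² is roughly 0.14 to 0.15 for C9orf72 and about 0.24 for NEK1, consistent with the reported values given that the published frequencies are rounded.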

Extended Data Fig. 4

NEK1 repeat distribution.

The frequencies of STR alleles in ALS cases and controls are shown. A repeat length of 11 and longer was used as the optimal threshold for disease-associated genotype. The P-value was calculated by Firth logistic regression and FDR correction over all possible thresholds. The y axis shows the allele frequency of repeat lengths. The repeat position on GRCh37 and the repeat motif are shown.

Lastly, the fourth group contains seven remaining loci for which there was no direct link to a causal gene through coding variants or repeat expansions. Here, we investigated regulatory effects of the associated SNPs on target genes acting as either eQTL or mQTL. Single genes were prioritized by SMR using both mQTL and eQTL for rs2985994 (COG3; Supplementary Fig. 10), rs229243 (SCFD1; Supplementary Fig. 11) and rs517339 (ERGIC1; Supplementary Fig. 4). In other loci, both methods prioritized multiple genes, such as rs631312 (MOBP and RPSA; Supplementary Fig. 1) and rs10463311 (GPX3 and TNIP1; Supplementary Fig. 3). Aside from the prioritized genes, each of these loci harbored multiple genes that were not prioritized by any method and are therefore less likely to contribute to ALS risk. For two loci, no gene was prioritized with these approaches. Within the UNC13A locus (rs12608932; Supplementary Fig. 12), recent studies illustrate that the genome-wide significant SNPs act as splicing quantitative trait loci conditional on dysfunction of TAR DNA-binding protein (TDP)-43, resulting in inclusion of a cryptic exon in UNC13A[29,30]. Furthermore, we could not prioritize a specific gene in the HLA locus (rs9275477; Supplementary Fig. 5).

Genetic modifiers of ALS disease progression

We investigated whether genetic risk factors for ALS also act as disease modifiers that affect disease onset and progression. Genotypes for the 15 genome-wide significant SNPs, PRSs and the rare variant burden for SOD1, C9orf72 (repeat expansion status), TARDBP, FUS, NEK1, TBK1 and CFAP410 were obtained for all individuals with WGS for whom the complete core clinical data (sex, age at onset, site of onset, survival, time to censoring) were available (n = 6,095). Association analyses with survival and age at onset showed that common variants had a limited effect on survival (Fig. 2a) and age at onset (Fig. 2b) but confirmed the association between faster disease progression for the UNC13A risk allele (rs12608932, hazard ratio (HR) = 1.10, 95% confidence interval (CI) = 1.05–1.15, P = 1.2 × 10⁻⁴) and slower disease progression in patients with the SOD1 p.D90A mutation (rs80265967, HR = 0.35, 95% CI = 0.16–0.77, P = 8.4 × 10⁻⁴). This limited effect of common genetic risk factors for ALS susceptibility on disease progression was reflected in the PRS analyses in which we found no effect of the full-genome PRS on survival (HR = 1.02, 95% CI = 0.98–1.06, P = 0.28) or age at onset (b = 0.10, s.e. = 0.21, P = 0.64). Analyses of rare variants confirmed faster disease progression in patients with the C9orf72 repeat expansion (HR = 1.45, 95% CI = 1.28–1.65, P = 1.2 × 10⁻⁸) with an earlier age at onset (b = −2.62, s.e. = 0.77, P = 6.4 × 10⁻⁴).

Fig. 2

Genetic modifier analyses.

a, Cox proportional HRs for genome-wide significant SNPs (brown, n = 15), PRSs (red, n = 2) and rare variant burden in ALS-risk genes (pink, n = 7) on survival (months) tested in 6,095 patients with ALS. Estimated HRs are displayed with error bars corresponding to 95% CIs. Higher HRs correspond to shorter survival times. b, Effect estimates from a linear regression model of age at onset (years) in 6,095 patients with ALS. Lower effect estimates correspond to a younger age at onset. Effect estimates from linear regression are displayed with error bars corresponding to 95% CIs. The risk-increasing allele for ALS corresponds to the effect allele for both survival and age-at-onset analyses.

Locus-specific sharing of risk loci between ALS and neurodegenerative diseases

To investigate the pleiotropic properties of ALS-associated variants and shared genetic risk with other brain diseases, we estimated genetic correlations between neurodegenerative diseases, psychiatric traits, cerebrovascular diseases and multiple sclerosis (Extended Data Fig. 5). This showed strong genetic correlations among neurodegenerative diseases. Bivariate LDSC confirmed a statistically significant genetic correlation between ALS and PSP (rg = 0.44, s.e. = 0.11, P = 1.0 × 10⁻⁴) as previously reported[20] and also revealed a significant genetic correlation between ALS and AD (rg = 0.31, s.e. = 0.12, P = 9.6 × 10⁻³) as well as between ALS and PD (rg = 0.16, s.e. = 0.061, P = 0.011; Fig. 3a). The point estimate for the genetic correlation between ALS and FTD was high (rg = 0.59, s.e. = 0.41, P = 0.15) but not statistically significant due to the limited size of the FTD GWAS (3,526 cases and 9,402 controls). Thus, power to detect a genetic correlation between ALS and FTD using LDSC was limited.
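The relation between the point estimates, standard errors and P values for the genetic correlations can be illustrated with a two-sided Wald test against rg = 0. This is a sketch under the assumption that the P values derive from a normal approximation of rg divided by its standard error; because the inputs below are the rounded values from the text, the recomputed P values only approximate the reported ones.

```python
import math


def wald_test(estimate, se):
    """Return the z statistic and two-sided normal P value for H0: estimate = 0."""
    z = estimate / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Genetic correlation estimates (rg, s.e.) with ALS as reported in the text.
rg_with_als = {
    "PSP": (0.44, 0.11),
    "AD": (0.31, 0.12),
    "PD": (0.16, 0.061),
    "FTD": (0.59, 0.41),
}

wald = {trait: wald_test(rg, se) for trait, (rg, se) in rg_with_als.items()}
for trait, (z, p) in wald.items():
    print(f"ALS-{trait}: z = {z:.2f}, P = {p:.2g}")
```

The FTD estimate is the largest of the four yet has the smallest z statistic, which makes the point of the paragraph concrete: the wide standard error from the small FTD GWAS, not a small correlation, is what prevents statistical significance.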

Extended Data Fig. 5

Genetic correlations between brain diseases.

Correlation matrix for genetic correlation estimates obtained from bivariate LD score regression. Colors correspond to genetic correlation estimates. Strongest clusters appear between neurodegenerative diseases and within the psychiatric traits. ALS = amyotrophic lateral sclerosis, FTD = frontotemporal dementia, PSP = progressive supranuclear palsy, PD = Parkinson’s disease, CBD = corticobasal degeneration, AD = (clinically diagnosed) Alzheimer’s disease, MS = multiple sclerosis, IS = ischemic stroke (any), ICH = intracerebral hemorrhage, IA = intracranial aneurysm (any), AN = anorexia nervosa, OCD = obsessive compulsive disorder, Anxiety = anxiety disorder (score), PTSD = post-traumatic stress disorder, MDD = major depressive disorder, BIP = bipolar disorder, SCZ = schizophrenia, TS = Tourette’s syndrome, ASD = autism spectrum disorder, ADHD = attention-deficit hyperactivity disorder.

Fig. 3

Shared genetic risk between ALS and neurodegenerative diseases.

a, Genetic correlation analysis. Genetic correlation was estimated with LDSC between each pair of neurodegenerative diseases (ALS, AD, CBD, PD, PSP and FTD). Correlations marked with an asterisk reached nominal statistical significance (PALS,AD = 0.01, PALS,PD = 0.01, PALS,PSP = 0.0001, PPSP,PD = 0.002). b, SNP associations of ALS lead SNPs or LD proxies in neurodegenerative diseases. The association with ALS is shown at the top. Effective sample size is shown on the left. Posterior probabilities of the same causal SNP affecting two diseases were estimated through colocalization analysis and are highlighted as connections.

Patterns of sharing disease-associated genetic variants appeared to be locus specific (Fig. 3b and Supplementary Table 21). To assess whether two traits shared a common signal, indicating shared causal variants, we performed colocalization analyses for all loci meeting P < 5 × 10⁻⁵ in any of the GWASs of neurodegenerative diseases (n = 161 loci). This revealed a shared signal in the MOBP–RPSA locus between ALS, PSP and corticobasal degeneration (CBD) as well as a shared signal in the UNC13A locus between ALS and FTD (posterior probability, PPH4 > 95%; Extended Data Fig. 6). For the HLA locus, there was evidence for a shared causal variant between ALS and PD (PPH4 = 88%) but no conclusive evidence for ALS and AD (PPH4 = 51% for a shared causal variant and PPH3 = 49% for independent signals in both traits).
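To make the meaning of PPH3 and PPH4 concrete, the sketch below implements the general idea of Bayesian colocalization with approximate Bayes factors: each SNP in a locus gets a Wakefield approximate Bayes factor per trait, and these are combined into posterior probabilities for five hypotheses (H0, no signal; H1 and H2, signal in one trait only; H3, distinct causal variants; H4, one shared causal variant). It assumes at most one causal variant per trait in the locus. The prior values (`p1`, `p2`, `p12`, `prior_sd`) and all z scores and standard errors are illustrative assumptions, not settings or data from the study.

```python
import math


def log_abf(z, se, prior_sd=0.2):
    """Wakefield log approximate Bayes factor for one SNP (prior_sd is assumed)."""
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1 - r) + r * z ** 2)


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def coloc_posteriors(z1, z2, se1, se2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for two traits at one locus."""
    l1 = [log_abf(z, s) for z, s in zip(z1, se1)]
    l2 = [log_abf(z, s) for z, s in zip(z2, se2)]
    s1, s2 = log_sum_exp(l1), log_sum_exp(l2)
    s12 = log_sum_exp([a + b for a, b in zip(l1, l2)])  # same SNP causal for both
    # Pairs of different SNPs: sum1 * sum2 minus the same-SNP terms, in log space.
    ratio = min(math.exp(s12 - s1 - s2), 1 - 1e-15)
    log_distinct = s1 + s2 + math.log1p(-ratio)
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + log_distinct,
        math.log(p12) + s12,
    ]
    norm = log_sum_exp(log_h)
    return [math.exp(h - norm) for h in log_h]


se = [0.02] * 5  # hypothetical standard errors for five SNPs
# Hypothetical locus where both traits peak at the third SNP.
shared = coloc_posteriors([1, 2, 8, 1.5, 0.5], [0.5, 1.5, 7, 1, 0.2], se, se)
# Hypothetical locus where the traits peak at different SNPs.
distinct = coloc_posteriors([8, 1, 1, 1, 1], [1, 1, 8, 1, 1], se, se)

print("shared peak:   PPH3 = %.3f, PPH4 = %.3f" % (shared[3], shared[4]))
print("distinct peak: PPH3 = %.3f, PPH4 = %.3f" % (distinct[3], distinct[4]))
```

In the toy data, the locus with a common peak puts nearly all posterior mass on H4, whereas the locus with separate peaks puts it on H3. A split such as the 51% versus 49% reported for ALS and AD at the HLA locus corresponds to data that cannot distinguish these two patterns.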

Extended Data Fig. 6

Colocalization signals.

Loci were selected for colocalization analysis when the top associated SNP was associated with any neurodegenerative disease at P < 5 × 10⁻⁵. For ALS, the European ancestries meta-analysis was used. Bayesian posterior probabilities for a shared variant driving risk of both traits (PPH4) are reported below locus names. Colors reflect LD between the variant and top associated SNP.

Furthermore, colocalization analyses identified two additional shared loci that were not genome-wide significant in the ALS GWAS: between ALS and PD at the GAK locus (rs34311866, PPH4 = 99%) and between ALS and AD at the TSPOAP1-AS1 locus (rs2632516, PPH4 = 90%). Of note, the association at TSPOAP1-AS1 was not genome-wide significant in the GWAS of clinically diagnosed AD (P = 3.7 × 10⁻⁷) either but was identified in the larger AD-by-proxy GWAS[31]. For FTD subtypes, C9orf72 showed a colocalization signal for a shared causal variant between ALS and the motor neuron disease subtype of FTD (mndFTD, PPH4 = 93%; Extended Data Figs. 6 and 7).

Extended Data Fig. 7

Colocalization analysis with FTD subtypes.

Top associated SNPs in the ALS GWAS were selected for colocalization analysis between ALS and FTD subtypes using COLOC. In the top panel, point height is the two-sided -log10(P-value) of the top-associated SNP in the ALS GWAS. In the bottom panel, association P-values of these SNPs with FTD subtypes are shown by color. The Bayesian posterior probability for a shared causal variant between traits (PPH4) is depicted by a connection between points.

Source data

Enrichment of glutamatergic neurons indicates cell-autonomous processes in ALS susceptibility

To find tissues and cell types for which gene expression profiles were enriched for genes within ALS-risk loci, we first combined gene-based association statistics calculated using MAGMA[32] with gene expression patterns from the Genotype–Tissue Expression (GTEx) project (version 8) in a gene set enrichment analysis using FUMA[33]. We observed a significant enrichment in genes expressed in brain tissues across multiple brain regions but not in peripheral nervous tissue or muscle. Whereas this pattern roughly resembled the enrichments observed in PD and psychiatric traits, it was strikingly different from that reported[31] and observed in AD in which blood, lung and spleen were mostly enriched, resembling the pattern observed in multiple sclerosis, which is a typical immune-mediated brain disease (Fig. 4a and full results in Supplementary Fig. 16 and Extended Data Fig. 8a). We subsequently queried single-cell RNA-seq datasets of human-derived brain samples to further specify brain-specific enriched cell types using the cell type analysis module in FUMA[34]. This showed significant enrichment for neurons but not for microglia or astrocytes (Fig. 4b). Further subtyping of these neurons illustrated that genes expressed in glutamatergic neurons were mostly enriched for genes within the ALS-associated risk loci. Again, this contrasted with AD, which showed specific enrichment of microglia, similar to multiple sclerosis (Extended Data Fig. 8b). In single-cell RNA-seq data obtained from brain tissues in mice, a similar pattern was observed showing neuron-specific enrichment in ALS and PD but microglia in AD (Extended Data Fig. 9). Together, this indicates that susceptibility to neurodegeneration in ALS is mainly driven by neuron-specific pathology and not by immune-related tissues and microglia.

Fig. 4

Tissue and cell type enrichment analysis.

a, Enrichment of tissues and brain regions included in GTEx version 8 illustrates a brain-specific enrichment pattern in ALS, similar to that in PD but contrasting with that in AD. Tissues and brain regions displayed are those significantly enriched in ALS or PD, tissues previously reported in AD and tissues of specific interest for ALS (spinal cord, tibial nerve and muscle). Color represents the enrichment coefficient, and size indicates two-sided −log10 (P-values) of enrichment obtained by the linear regression model in the MAGMA gene property analysis. b, Cell type enrichment analyses indicate neuron-specific enrichment for glutamatergic neurons. In ALS, no enrichment was found for microglia or other non-neuronal cell types, contrasting with the pattern observed in AD. Color represents the enrichment coefficient, and size indicates two-sided −log10 (P-values) of enrichment obtained by the linear regression model in the MAGMA gene property analysis. Statistically significant enrichments after correction for multiple testing over all tissues (n = 54), cell types (n = 7) and neurons (n = 3) with FDR < 0.05 are marked with an asterisk. Cx, cortex; GABA, γ-aminobutyric acid; OPCs, oligodendrocyte progenitor cells.

Source data

Extended Data Fig. 8

Tissue and cell-type enrichment analyses for all brain diseases.

Tissue (a) and cell-type (b) enrichment for all included brain diseases obtained from two-sided MAGMA linear regression. Only brain diseases with exome-wide significant gene-based MAGMA associations (P < 2.7 × 10−6) were suitable for tissue and cell-type enrichment analyses. The color represents enrichment coefficient and size indicates two-sided −log10(P-value) of enrichment obtained by the linear regression model in the MAGMA gene-property analysis. Due to the large number of significant genes in the gene-based MAGMA analyses for schizophrenia, bipolar disorder and multiple sclerosis, the enrichment P-values were truncated at P < 1.0 × 10−5. ALS = amyotrophic lateral sclerosis, PD = Parkinson’s disease, AD = Alzheimer’s disease, ADHD = attention-deficit hyperactivity disorder, ASD = autism spectrum disorder, TS = Tourette’s syndrome, SCZ = schizophrenia, BIP = bipolar disorder, MDD = major depressive disorder, PTSD = post-traumatic stress disorder, Anxiety = anxiety disorder (score), AN = anorexia nervosa, IA = intracranial aneurysm (any), IS = ischemic stroke, MS = multiple sclerosis, Cx = cortex, OPC = oligodendrocyte progenitor cells.

Source data

Extended Data Fig. 9

Cell-type enrichment analysis in mice.

Cell-type enrichment analysis using the DropViz single-cell RNA sequencing dataset obtained from mice. Similar to the cell-type enrichment analyses in human-derived samples, there is neuron-specific enrichment in ALS and Parkinson’s disease. In Alzheimer’s disease, microglia are the most enriched cell type. The color represents enrichment coefficient and size indicates two-sided −log10(P-value) of enrichment obtained by the linear regression model in the MAGMA gene-property analysis. Statistically significant enrichments after correction for multiple testing with a false discovery rate (FDR) < 0.05 are marked with an asterisk. ALS = amyotrophic lateral sclerosis, PD = Parkinson’s disease, AD = Alzheimer’s disease, Cx = cortex.

Source data

Brain-specific coexpression networks improve detection of ALS-relevant pathways

To determine which processes were mostly enriched in ALS, we performed enrichment analyses that combined gene-based association statistics with gene coexpression patterns obtained from either multi-tissue transcriptome datasets[35] or RNA-seq data from brain cortex samples (MetaBrain[24]). To validate this approach, we first tested for enrichment of human phenotype ontology (HPO) terms that are linked to well-established disease genes in the Online Mendelian Inheritance in Man (OMIM) and Orphanet catalogs. Using the multi-tissue coexpression matrix, we found no enriched HPO terms after Bonferroni correction for multiple testing. Using the brain-specific coexpression matrix, however, we found a strong enrichment of HPO terms that are related to ALS or neurodegenerative diseases in general, including ‘cerebral cortical atrophy’ (P = 1.8 × 10−8), ‘abnormal nervous system electrophysiology’ (P = 4.1 × 10−7) and ‘distal amyotrophy’ (P = 8.6 × 10−7; full list in Supplementary Table 22). In general, HPO terms in the neurological branch (‘abnormality of the nervous system’) showed an increase in enrichment statistics in ALS when using the brain-specific coexpression matrix compared to the multi-tissue dataset (Extended Data Fig. 10), which illustrates the benefit of the brain-specific coexpression matrix. Subsequently, we tested for enriched biological processes using reactome and gene ontology terms. Again, using the multi-tissue expression profiles, we found that no reactome annotations were enriched. Leveraging the brain-specific coexpression networks, we identified vesicle-mediated transport (‘membrane trafficking’, P = 4.2 × 10−6, ‘intra-Golgi and retrograde Golgi-to-endoplasmic reticulum (ER) trafficking’, P = 1.4 × 10−5) and autophagy (‘macroautophagy’, P = 3.2 × 10−5) as enriched processes after Bonferroni correction for multiple testing (Supplementary Table 23). 
The subsequently identified enriched gene ontology terms were all related to vesicle-mediated transport or autophagy (Supplementary Tables 24 and 25).

Extended Data Fig. 10

Human phenotype ontology term enrichment.

Downstreamer enrichment analyses were performed using the multi-tissue and brain-specific co-expression matrices to identify co-regulated ALS genes. The distribution of enrichment statistics (Z-scores) for all human phenotype ontology (HPO) terms is plotted per HPO parent branch. The multi-tissue analysis indicates enrichment for the neurology parent branch ‘abnormality of the nervous system’ (dark red), although no term passes the Bonferroni threshold for multiple testing. The brain-specific analysis illustrates stronger enrichment for the neurology parent branch. In total, 58 HPO terms pass the threshold for multiple testing, of which 42 are defined within the ‘abnormality of the nervous system’ branch.

Source data

MR analyses are in line with a causal relationship between cholesterol levels and ALS

From previous observational case–control studies and our blood-based methylome-wide study[36], numerous non-genetic risk factors have been implicated in ALS. Here, we studied a selection of those putative risk factors through causal inference in an MR framework[37]. We selected 22 risk factors for which robust genetic predictors were available including body mass index, smoking, alcohol consumption, physical activity, cholesterol-related traits, cardiovascular diseases and inflammatory markers (Supplementary Table 26). These analyses provided the strongest evidence that cholesterol levels were causally related to ALS risk (bweighted median = 0.15, s.e. = 0.04, P = 3.2 × 10−4; Fig. 5a and full results in Supplementary Table 27). These results were robust to removal of outliers through radial MR analysis[38], and we observed no evidence for reverse causality (Supplementary Tables 28 and 29). Importantly, ascertainment bias can lead to the selection of more highly educated control participants[39] compared to patients with ALS who are mostly ascertained through the clinic. In line with control participants having higher education, MR analyses indicated a negative effect for years of schooling on ALS risk (inverse-variance-weighted PIVW = 2.0 × 10−4; Fig. 5b). As a result, years of schooling can act as a confounder for the observed risk-increasing effect of higher total cholesterol levels through ascertainment bias. To correct for this potential confounding, we applied multivariate MR analyses including both years of schooling and total cholesterol levels. The results for total cholesterol were robust in the multivariate analyses, suggesting a causal role for total cholesterol levels on ALS susceptibility (Supplementary Table 30).
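
The two MR estimators named above can be sketched from summary statistics as follows. This is a simplified illustration rather than the analysis code of this study: it uses first-order inverse-variance weights, returns only a point estimate for the weighted median (its standard error is usually obtained by bootstrapping, which is omitted here), and the function and argument names are assumptions.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Inverse-variance-weighted causal estimate and its standard error."""
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    numerator = sum(
        bx * by / (se * se)
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    total = sum(weights)
    return numerator / total, math.sqrt(1.0 / total)


def weighted_median_estimate(beta_exposure, beta_outcome, se_outcome):
    """Weighted median of per-SNP Wald ratios (needs at least two SNPs)."""
    ratios = [by / bx for bx, by in zip(beta_exposure, beta_outcome)]
    weights = [bx * bx / (se * se) for bx, se in zip(beta_exposure, se_outcome)]
    order = sorted(range(len(ratios)), key=ratios.__getitem__)
    total = sum(weights)

    # Cumulative weight at the midpoint of each ordered ratio.
    cumulative = []
    running = 0.0
    for i in order:
        running += weights[i]
        cumulative.append((running - 0.5 * weights[i]) / total)

    # Interpolate linearly between the two ratios that straddle 50% weight.
    below = max(k for k, c in enumerate(cumulative) if c < 0.5)
    low, high = ratios[order[below]], ratios[order[below + 1]]
    fraction = (0.5 - cumulative[below]) / (cumulative[below + 1] - cumulative[below])
    return low + (high - low) * fraction
```

The weighted median stays close to the consensus ratio when a minority of instruments are outliers, which is one reason agreement between the two estimators is read as a sign of robustness.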

Fig. 5

Causal inference of total cholesterol levels and years of schooling in ALS.

a, MR results for ALS and total cholesterol levels. Results for the five different MR methods for two different P-value cutoffs for SNP instrument selection are presented. In total, 83 and 178 SNPs were used as instruments at cutoffs of P < 5 × 10−8 and P < 5 × 10−5, respectively. All methods show a consistent positive effect for an increased risk of ALS with higher total cholesterol levels. There is no evidence for reverse causality. Point estimates for MR are presented with error bars reflecting 95% CIs. b, MR results for ALS and years of schooling. In total, 306 and 681 SNPs were used as instruments at cutoffs of P < 5 × 10−8 and P < 5 × 10−5. Point estimates for MR are presented, with error bars reflecting 95% CIs. Statistically significant effects with a two-sided P-value passing Bonferroni correction for multiple testing over all tested traits (n = 22), instrument P-value cutoffs (n = 2) and MR methods (n = 5) are marked with an asterisk (total cholesterol, Pweighted median = 0.0003 and Pweighted median = 0.0007 for cutoffs at P < 5 × 10−8 and P < 5 × 10−5, respectively; years of schooling, PIVW = 0.0002 at the cutoff of P < 5 × 10−5). Here, SNP outliers were not removed for instrument selection. Z, genetic instrument; b, estimated causal effect for an increase of 1 s.d. in genetically predicted exposure.

Source data

Discussion

In summary, in the largest GWAS on ALS to date including 29,612 patients with ALS and 122,656 control participants, we identified 15 risk loci contributing to ALS risk. Through in-depth analysis of these loci incorporating rare variant burden analyses and repeat expansion screens in WGS data and blood- and brain-specific eQTL and mQTL analyses, we prioritized genes in 13 of the loci. Across the spectrum of neurodegenerative diseases, we identified a genetic correlation between ALS and AD as well as PD and PSP with locus-specific patterns of shared genetic risk across all neurodegenerative diseases. Colocalization analysis identified two additional loci, GAK and TSPOAP1-AS1, with a high posterior probability of shared causal variants between ALS and PD and between ALS and AD, respectively. We found glutamatergic neurons as the most enriched cell type in the brain, and brain-specific coexpression network enrichment analyses indicated a role for vesicle-mediated transport and autophagy in ALS. Finally, causal inference of previously described risk factors provides evidence for high total cholesterol levels as a causal risk factor for ALS.

The cross-ancestry comparison illustrated similarities in the genetic risk factors for ALS in European and East Asian ancestries, providing an argument for cross-ancestry studies and to further expand ALS GWASs in non-European populations. It is important to note that three loci including those that harbor low-frequency variants (KIF5A, SOD1 and CFAP410) were not included in the East Asian GWAS due to their low MAFs. Therefore, the shared genetic risk might not extend to rare genetic variation, for which population-specific frequencies have been observed even within Europe.

The multi-layered gene-prioritization analyses highlighted four different classes of genome-wide significant loci in ALS. First, the sample size of this GWAS combined with accurate imputation of low-frequency variants directly identified rare coding variants that increase ALS risk. These include the known p.D90A mutation in SOD1 (MAF = 0.006) as well as rare variants in KIF5A (MAF = 0.016) and CFAP410 (MAF = 0.012) for which, after their identification through GWAS, experimental work confirmed their direct role in ALS pathophysiology[11,28,40]. Second, we confirmed that the pathogenic C9orf72 repeat expansion is tagged by genome-wide significant GWAS SNPs and that no residual signal is left by conditioning the SNP on the repeat expansion. Although more repeat expansions are known to affect ALS risk, we found no similar loci for which the SNPs tag a highly pathogenic repeat expansion. This suggests that highly pathogenic repeat expansions on a stable haplotype are merely the exception rather than the rule in ALS. Third, common and rare variant association signals can converge on the same gene as observed for NEK1 and TBK1, consistent with observations for other traits and diseases[41-43]. We show that these signals are conditionally independent and that the common variants act on the same gene through regulatory effects as eQTL or mQTL. Fourth, we find evidence for regulatory effects of ALS-associated SNPs that act as eQTL or mQTL. These locus-specific architectures illustrate the complexity of ALS-associated GWAS loci for which not one solution fits all, but instead a multi-layered approach to prioritize genes is warranted.

In addition, we find locus-specific patterns of shared effects across neurodegenerative diseases. The MOBP locus has previously been identified in PSP and ALS, and here we show that indeed both diseases as well as CBD are likely to share the same causal variant in this locus. The same is true for UNC13A and C9orf72 with FTD and mndFTD, respectively. The colocalization analysis with PD identified a shared causal variant in the GAK locus, which was not found in the ALS GWAS alone. Furthermore, the TSPOAP1-AS1 locus harbors SNPs associated with ALS and AD risk. Although this locus was not significant in either of the GWASs, a larger GWAS including AD-by-proxy cases confirmed this as a risk locus for AD. This illustrates the power of cross-disorder analyses to leverage the shared genetic risk of neurodegenerative diseases.

We aimed to clarify the role of neuron-specific pathology in ALS susceptibility as opposed to non-cell-autonomous pathology through detailed cell type enrichment analyses. Previous experiments have illustrated multiple lines of evidence for non-cell-autonomous pathology in microglia, astrocytes and oligodendrocytes, which ultimately leads to neurodegeneration in ALS[44-46]. These experiments have shown that non-cell-autonomous processes, such as neuroinflammation, mainly act as modifiers of disease in SOD1 models of ALS[45,46]. Here, we show that genes within loci associated with ALS susceptibility are specifically expressed in (glutamatergic) neurons. This provides evidence for neuron-specific pathology as a driver of ALS susceptibility, which is in stark contrast to the signal of inflammation-associated tissues and cell types in AD and multiple sclerosis. It also shows that disease susceptibility and disease modification can be distinct processes, which is supported by our finding that most genetic susceptibility factors do not have a strong effect on survival. This motivates future large-scale genetic studies on modifiers of ALS progression, as these can be targets for potential new treatments for ALS as well.

The subsequent functional enrichment analyses identified that membrane trafficking, Golgi-to-ER trafficking and autophagy were enriched for genes within ALS-associated loci. These terms and their related gene ontology terms of biological processes are all related to autophagy and degradation of (misfolded) proteins. This corroborates the central hypothesis of impaired protein degradation leading to aberrant protein aggregation in neurons, which is the pathological hallmark of ALS. Our results suggest that this is a central mechanism in ALS even in the absence of rare known mutations in genes directly involved in these biological processes such as TARDBP, FUS, UBQLN2 and OPTN[47].

Based on observational studies and MR analyses, conflicting evidence exists for lipid levels including cholesterol as a risk factor for ALS[48-50]. Potential selection bias, reverse causality and the subtype of cholesterol studied challenge the interpretation of these results. Here, we provided support for a causal relationship between high total cholesterol levels and ALS independent of educational attainment and ruling out reverse orientation of the MR effect. The total cholesterol effects were consistent across the different MR methods tested, indicating that this finding is robust to violation of the ‘no horizontal pleiotropy’ assumption. This is in line with our study showing methylation changes associated with increased cholesterol levels in ALS[36]. We do not find a clear pattern for either low-density lipoprotein (LDL) or high-density lipoprotein (HDL) cholesterol subtypes in relation to ALS risk. While cholesterol levels are closely related to cardiovascular risk, the association between cardiovascular risk and ALS risk remains controversial with conflicting reports[3,48,51].

Interestingly, recent work has shown that lipid metabolism and autophagy are closely related[52], which brings the results of our pathway analyses and MR together. Both in vitro and in vivo experiments have shown that autophagy regulates lipid homeostasis through lipolysis and that impaired autophagy increases triglyceride and cholesterol levels. Conversely, high lipid levels were shown to impair autophagy[52]. Further studies on the effect of high cholesterol levels and protein degradation through autophagy illustrate that high cholesterol levels decrease the fusogenic ability of autophagic vesicles through decreased function of soluble N-ethylmaleimide-sensitive factor-attachment protein receptor (SNARE)[53,54] and lead to increased protein aggregation due to impaired autophagy in mouse models of AD[55]. Therefore, the risk-increasing effect of cholesterol on ALS might be mediated through impaired autophagy.

In conclusion, our GWAS identifies 15 risk loci in ALS and illustrates locus-specific interplay between common and rare genetic variation that helps to prioritize genes for future follow-up studies. We show a causal role for cholesterol, which can be linked to impaired autophagy as common denominators of neuron-specific pathology that drive ALS susceptibility and serve as potential targets for therapeutic strategies.

Methods

Genome-wide association study

Data description

We obtained individual genotype-level data for all individuals in the previously published GWAS of ALS in European ancestries[11,14] and publicly available control datasets including 120,971 controls genotyped on Illumina platforms. Additionally, 6,374 cases and 22,526 controls were genotyped on the Illumina OmniExpress and Illumina GSA arrays. Details for each cohort are provided in Supplementary Table 1. All patients with ALS were ascertained through specialized motor neuron disease (MND) clinics, where they were diagnosed according to the (revised) El Escorial Criteria[56] by neurologists specialized in MNDs. Whole-blood samples for DNA isolation were collected specifically for ongoing case–control studies of ALS. Cases both with and without a family history of ALS and/or dementia were included. Cases were not pre-screened for specific ALS-related mutations. Given the late onset and relatively low lifetime risk of ALS, controls were not screened for (subclinical) signs of ALS. A detailed description of the ascertainment of newly genotyped cases and controls is provided in the Supplementary Note. All participants gave written informed consent, and the relevant local institutional review boards approved this study (Supplementary Note). Cases and controls formed cohorts when they were processed in the same laboratory and were genotyped in the same batch, resulting in 117 independent cohorts. Summary statistics were obtained for the Asian ancestry GWAS of ALS[15,16] (Supplementary Note).

GWAS quality control and imputation

For each cohort, we first performed individual- and variant-level quality control, after which cohorts were merged into six strata based on genotyping platform. Subsequent stratum-wise quality control was performed, and strata were imputed up to the Haplotype Reference Consortium panel (r.1.1 2016) through the Michigan Imputation Server[21]. Full quality-control details are described in the Supplementary Note and Supplementary Fig. 17. Numbers of individuals and variants passing each quality-control step are described in Supplementary Table 2.

Association testing and meta-analysis

After quality control, a null logistic mixed model was fitted using SAIGE[57] 0.29.1 for each stratum with principal component (PC)1–PC20 as covariates. The model was fit on a set of high-quality (INFO > 0.95) SNPs pruned with PLINK 1.9 (‘--indep-pairwise 50 25 0.1’) in a leave-one-chromosome-out scheme. Subsequently, a SNP-wise logistic mixed model including the saddlepoint approximation test was performed using genotype dosages with SAIGE. Association statistics for all strata were combined in an IVW fixed-effects meta-analysis using METAL[58]. Genomic inflation factors were calculated per stratum and for the full meta-analysis. To assess any residual confounding due to population stratification and artificial structure in the data, we calculated the LDSC[59] intercept using SNP LD scores calculated in the HapMap3 CEU population.
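
For a single SNP, the inverse-variance-weighted fixed-effects combination and the genomic inflation factor described above reduce to a few lines. The sketch below is a generic illustration, not the METAL implementation; the function names are assumptions, and the inflation factor is computed as the median observed chi-squared statistic divided by the median of a chi-squared distribution with one degree of freedom (approximately 0.4549).

```python
import math
from statistics import NormalDist, median

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def ivw_fixed_effects(betas, standard_errors):
    """Combine per-stratum effect estimates for one SNP.

    Returns the pooled effect, its standard error and a two-sided P-value.
    """
    weights = [1.0 / (se * se) for se in standard_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = math.sqrt(1.0 / total)
    z = beta / se
    p_value = 2.0 * NormalDist().cdf(-abs(z))
    return beta, se, p_value


def genomic_inflation(z_scores):
    """Genomic inflation factor (lambda GC) from association Z-scores."""
    return median(z * z for z in z_scores) / CHI2_1DF_MEDIAN
```

A value of the inflation factor close to 1 indicates that the bulk of the test statistics follows the null distribution.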

Cross-ancestry analyses

GWAS summary statistics from two Asian ancestry studies were obtained[15,16]. These summary statistics were meta-analyzed with all European ancestry data in strata as described above. To assess genetic correlation for ALS in European and Asian ancestries, we used Popcorn[60] version 0.9.9. We used population-specific LD scores for genetic impact and genetic effect provided with the Popcorn software. The regression model (‘--use_regression’) was used to estimate genetic correlation. We calculated both the correlation of genetic effects (correlation of allelic effect sizes) and genetic impact (correlation of allelic effect size adjusted for difference in allele frequencies).

Conditional SNP analysis

Conditional and joint SNP analysis (COJO, GCTA version 1.91.1b)[61,62] was performed to identify potential secondary GWAS signals within a single locus. SNPs with association P ≤ 5 × 10−8 were considered. Controls of European ancestry from the Health and Retirement Study (HRS, cohort 65, Supplementary Table 1), included in stratum 4 of this study, were used as the LD reference panel.

Gene prioritization

Whole-genome sequencing

Sample selection, sequencing and data preparation

Patients with ALS and control participants from Project MinE[63] were recruited for WGS. The participating cohorts were not pre-screened for ALS-associated mutations and are described in the Supplementary Note. In total, 228 patients were known to have at least one first- or second-degree relative with ALS. A full description of Project MinE and the sequencing and quality-control pipeline were described previously[64]. In summary, the first batch of 2,250 cases and control samples was sequenced on the Illumina HiSeq 2000 platform. All remaining 7,350 case and control samples were sequenced on the Illumina HiSeq X platform. All samples were sequenced to ~35× coverage with 100-bp reads and ~25× coverage with 150-bp reads for HiSeq 2000 and HiSeq X, respectively. Both sequencing sets used PCR-free library preparation. Samples were also genotyped on the Illumina 2.5M array. Sequencing data were then aligned to GRCh37 using the Isaac Aligner, and variants were called using the Isaac variant caller; both the aligner and caller are standard to Illumina’s aligning and calling pipeline. Full details of individual- and variant-level quality control are described in the Supplementary Note.

Genic burden association analyses

To aggregate rare variants in a genic burden test framework, we used a variety of variant filters to allow for different genetic architectures of ALS-associated variants per gene as we and others did previously[64,65]. In summary, variants were annotated according to allele-frequency threshold (MAF < 0.01 or MAF < 0.005) and predicted variant impact (‘missense’, ‘damaging’, ‘disruptive’). ‘Disruptive’ variants were those variants classified as frameshift, splice site, exon loss, stop gained, start loss and transcription ablation by SnpEff[66]. ‘Damaging’ variants were missense variants predicted to be damaging by seven prediction algorithms (SIFT[67], PolyPhen-2 (ref. [68]), LRT[69], MutationTaster2 (ref. [70]), MutationAssessor[71] and PROVEAN[72]). ‘Missense’ variants were those missense variants that did not meet the ‘damaging’ criteria. All combinations of allele-frequency threshold and variant annotations were used to test the genic burden on a transcript level in a Firth logistic regression framework in which burden was defined as the number of variants per individual. Sex and the first 20 PCs were included as covariates. All Ensembl protein-coding transcripts for which at least five individuals had a non-zero burden were included in the analysis.

Conditional genic burden analysis

We selected for each gene the protein-coding transcripts that were the most strongly associated with ALS across all different combinations of MAF and variant-impact thresholds. For these transcripts and variants, we applied Firth logistic regression on individuals included in both the GWAS and WGS datasets (5,158 cases and 2,167 controls). To assess whether the rare variant burden association and the signal from the GWAS were conditionally independent, we subsequently included the genotype of the top associated SNP within that locus as a covariate.

Short tandem repeat screen

For all individuals who had sequencing results in the HiSeq X dataset (5,392 cases, 1,795 controls), we screened all loci harboring SNPs associated with ALS meeting genome-wide significance for expansions of known and new STRs using ExpansionHunter[73] and ExpansionHunter Denovo[74]. First, we used ExpansionHunter (version 4.0) to screen for expansions of known STRs located within 1 Mb of the top ALS-associated SNP. For this, we used the STRs identified from indels in 18 high-quality genomes and the GangSTR STR catalog based on STR annotations in the reference genome[75]. We excluded all homopolymers from these catalogs. Repeat length was subsequently regressed on case–control status using Firth logistic regression including the first 20 PCs as covariates, recoding the STR size to a biallelic variant using a sliding window over all observed repeat lengths. To correct for multiple testing across all possible thresholds, we applied Benjamini–Hochberg correction per STR. To screen for extremely long STR expansions (similar to the C9orf72 repeat expansion) at loci that were not included in the predefined STR catalogs, we applied ExpansionHunter Denovo[74]. This method aims to only find STR expansions that exceed the sequencing read length (>150 bp) by identifying reads (mapped, mismapped and unmapped) that contain STR motifs, using their mate pairs for de novo mapping to the reference genome. For all STRs, we calculated LD statistics (r2 and |D′|) between recoded repeat genotypes at the optimal threshold and the top associated GWAS SNP. Subsequently, we conditioned the SNP association on the repeat genotype in a Firth logistic regression.
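
The threshold scan over repeat lengths and the per-STR multiple-testing correction can be sketched as below. This is an illustrative reading of the procedure, not the study code: alleles at or above a threshold are treated as 'long', the genotype is the number of long alleles, every observed length except the smallest serves as a candidate threshold, and the association test that produces one P-value per threshold is assumed to be run separately. All names are hypothetical.

```python
def candidate_thresholds(allele_pairs):
    """Observed repeat lengths usable as cut-offs (the smallest is skipped,
    because it would label every allele as long)."""
    lengths = sorted({length for pair in allele_pairs for length in pair})
    return lengths[1:]


def recode_by_threshold(allele_pairs, threshold):
    """Recode repeat lengths to a biallelic genotype: count of long alleles."""
    return [sum(1 for length in pair if length >= threshold) for pair in allele_pairs]


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted
```

For one STR, the P-values from all thresholds would be passed to `benjamini_hochberg` together, so that the correction is applied per STR as described.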

Summary-based Mendelian randomization

We used multi-SNP SMR[76,77] to infer the effect of gene expression variation on ALS risk, using eQTL (the association of a SNP with expression of a gene) as genetic instruments. We chose to apply SMR because this method yielded very similar results when compared to S-PrediXcan[78] and TWAS[79] (Supplementary Fig. 18) when applied using GTEx version 7 eQTL, and it can be applied to the large relevant eQTL datasets (MetaBrain and eQTLGen) without access to individual-level genotype and gene expression data. MetaBrain is a harmonized set of 8,727 RNA-seq samples from seven regions of the central nervous system from 15 datasets, and we selected eQTL derived from the cortex region of the brain in samples of European ancestry (MetaBrain Cortex-EUR eQTL, n = 2,970 individuals, n = 6,601 RNA-seq samples) as our instrumental variable[24]. European-only ALS summary statistics were used as the outcome. To supplement this analysis, we also used eQTL in blood from the eQTLGen Consortium, as this is a large available eQTL resource. Samples of European ancestry in the HRS (cohort 65 of this GWAS) were used as the LD reference panel. SNPs with MAF ≥ 1% in the HRS were included. Further SMR settings were left as default, meaning probes with at least one eQTL with P ≤ 5 × 10−8 were included. We subsequently performed SMR using DNA mQTL data and European-only ALS summary statistics. Human prefrontal cortex and whole-blood DNA mQTL were generated as part of ongoing analyses by the Complex Disease Epigenomics Group at the University of Exeter (https://www.epigenomicslab.com/) using the Illumina EPIC HumanMethylation array that quantifies DNAm at >850,000 sites across the genome[25]. The prefrontal cortex mQTL dataset was generated using DNA-methylation and SNP data from 522 individuals from the Brains for Dementia Research cohort[26] and includes 4,623,966 cis mQTL (distance between quantitative trait locus SNP and DNAm site ≤500 kb) between 1,744,102 SNPs and 43,337 DNA-methylation sites. 
The whole-blood mQTL dataset was generated using DNAm and SNP data from 2,082 individuals[80] and included 30,432,023 cis mQTL between 4,030,902 SNPs and 167,854 DNA-methylation sites. mQTL reaching the significance threshold P ≤ 1 × 10−10 were taken forward for SMR analysis as described by Hannon and colleagues[80]. To map CpG sites to their putative target genes, we used the expression quantitative trait methylation results from a paired methylation and gene expression (RNA-seq) study in blood[81]. For CpG sites where no expression quantitative trait methylation was present in this dataset, we used positional mapping based on the basal regulatory domains and extended regulatory domains as defined in the Genomic Regions Enrichment of Annotations Tool (GREAT)[82], which is applied in the ‘cpg_to_gene’ function in the CpGtools toolkit[83].
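
The core of the SMR test can be illustrated with its single-instrument form. The analyses above used the multi-SNP extension, so the sketch below is a simplification under stated assumptions: one SNP serves as the instrument, b_zx and b_zy denote its effects on the molecular trait (expression or methylation) and on ALS, and the function name is hypothetical.

```python
import math


def smr_single_snp(b_zx, se_zx, b_zy, se_zy):
    """Single-SNP SMR estimate and test.

    b_zx, se_zx: SNP effect on the molecular trait (eQTL or mQTL) and its s.e.
    b_zy, se_zy: SNP effect on the disease (GWAS) and its s.e.
    Returns the estimated effect of the molecular trait on the disease, the
    approximate chi-squared (1 d.f.) test statistic and its P-value.
    """
    z_zx = b_zx / se_zx
    z_zy = b_zy / se_zy
    b_xy = b_zy / b_zx
    t_smr = (z_zx ** 2 * z_zy ** 2) / (z_zx ** 2 + z_zy ** 2)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(t_smr / 2.0))
    return b_xy, t_smr, p_value
```

Because the test statistic is bounded by the weaker of the two Z-scores, restricting the analysis to probes with a strongly associated QTL, as in the default settings described above, keeps the instrument from being the limiting factor.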

Polygenic risk score calculation

PRSs were constructed based on the 15 lead SNPs of genome-wide significant loci (15-SNP PRS) or a full-genome-wide model (full-genome PRS). For the 15-SNP PRS, the SNP weights were defined as the meta-analyzed effect estimates. We used the summary-BayesR framework from the Genome-wide Complex Trait Bayesian analysis (GCTB) toolkit[84,85] to obtain SNP weights for the full-genome PRS based on the European ancestry meta-analysis excluding stratum 6. We used the default model parameters and the precalculated sparse LD matrix of imputed HapMap3 SNPs in 50,000 random individuals of European ancestry included in the UK Biobank. Summary-BayesR SNP effects were plotted against marginal SNP effects to rule out potentially biased estimates due to non-convergence of the MCMC algorithm. Finally, the PRSs for all individuals in stratum 6 were calculated using the ‘--score’ function in PLINK and normalized to zero mean and unit variance.
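
The scoring and normalization step can be sketched in Python as a weighted sum of effect-allele dosages followed by standardization. This is a minimal illustration under assumed inputs, not a reimplementation of PLINK; the function name `polygenic_scores` and the toy data are hypothetical.

```python
import math


def polygenic_scores(dosages, weights):
    """Return PRSs normalized to zero mean and unit variance.

    dosages: one list per individual of effect-allele dosages (0 to 2)
    weights: per-SNP effect estimates, in the same SNP order
    Requires at least two individuals whose raw scores are not all equal.
    """
    # Raw score: weighted sum of dosages for each individual
    raw = [sum(w * d for w, d in zip(weights, row)) for row in dosages]
    mean = sum(raw) / len(raw)
    # Sample variance (n - 1 denominator)
    var = sum((x - mean) ** 2 for x in raw) / (len(raw) - 1)
    sd = math.sqrt(var)
    return [(x - mean) / sd for x in raw]
```

Normalizing within the target stratum means that downstream effect sizes are expressed per standard deviation of the score.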

Modifier analyses

For 6,095 of the patients with ALS who had WGS data, core clinical data were obtained, including sex, site of onset (spinal or bulbar), age at onset (years), country of origin and survival, defined as time from disease onset to death, 23 h of continuous non-invasive ventilation per day or tracheostomy. Patients who were still alive were censored at the last date of follow-up. The genetic risk factors included SNP genotypes, PRSs, C9orf72 repeat expansion status and the number of rare coding mutations in ALS-risk genes (SOD1, TARDBP, FUS, NEK1, TBK1 and CFAP410) as obtained from WGS as described above. For survival analyses, the Cox proportional hazards mixed model from the ‘coxme’ package in R was used, modeling country of origin as a random effect. Fixed-effect covariates included sex, age at onset, site of onset, GWAS stratum and PC1–PC5. Violation of the proportional hazards assumption for genotype on survival was assessed by inspecting Schoenfeld residuals. For age-at-onset analyses, we applied linear regression of age at onset on genotype including sex, site of onset, country, GWAS stratum and PC1–PC5 as covariates.
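
One way to write the survival model described above is as a Cox model with a Gaussian random intercept per country. The notation is ours and is only a sketch of the general model class, not the authors' exact specification: $g_{ij}$ is the genetic risk factor for patient $i$ in country $j$, $\mathbf{x}_{ij}$ collects the fixed-effect covariates, and $b_j$ is the country-level random effect.

```latex
h_{ij}(t) = h_0(t)\,\exp\!\left(\beta_g\, g_{ij} + \boldsymbol{\gamma}^{\top}\mathbf{x}_{ij} + b_j\right),
\qquad b_j \sim \mathcal{N}\!\left(0, \sigma_b^{2}\right)
```

Here $\exp(\beta_g)$ is the hazard ratio per unit of the genetic risk factor, and the proportional hazards assumption corresponds to $\beta_g$ being constant over time.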

Cross-trait analyses

Datasets and data preparation

GWAS summary statistics for clinically diagnosed AD[86], PD[87], FTD[88], CBD[89] and PSP[20] in individuals of European ancestry were obtained. For AD, we used the clinical diagnosis as the case definition to avoid spurious genetic correlations that could have been introduced through the by-proxy design[31], in which by-proxy cases are defined as having a parent with AD. Although this is a powerful design for gene discovery and the genetic correlation with clinically diagnosed AD is high[90], mislabeling by-proxy cases when parents suffer from other types of dementia (for example, Lewy body dementia, Parkinson’s dementia, FTD or vascular dementia) can lead to spurious genetic correlations with ALS and other neurodegenerative diseases. For FTD, we primarily used the results of the cross-subtype meta-analysis, which includes behavioral variant FTD, semantic dementia FTD, progressive non-fluent aphasia FTD and mndFTD. For CBD, allele coding was unavailable, and effect alleles were inferred by matching allele frequencies to those observed in the Haplotype Reference Consortium. SNPs with MAF > 0.4 were excluded. Because downstream methods rely on LD scores or population-specific LD patterns, the European ancestry summary statistics from the present study were used for ALS. For sample size parameters, effective sample size was calculated as described previously. Multiple sclerosis summary statistics were obtained from the International Multiple Sclerosis Genetics Consortium[91]. For cerebrovascular diseases, GWAS summary statistics were obtained for ischemic stroke (any ischemic stroke)[92], intracerebral hemorrhage[93] and intracranial aneurysm[94]. 
For psychiatric traits, GWAS summary statistics were obtained from Psychiatric Genomics Consortium studies on anorexia nervosa[95], obsessive–compulsive disorder[96], anxiety disorders (anxiety score)[97], post-traumatic stress disorder (all European ancestries)[98], major depressive disorder[99], bipolar disorder[100], schizophrenia[101], Tourette’s syndrome[102], autism spectrum disorder[103] and attention-deficit hyperactivity disorder (European ancestries)[104].
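
The effective sample size used for the sample size parameters is defined elsewhere in the paper. A common definition for case-control studies, assumed here for illustration only, is N_eff = 4 / (1/N_cases + 1/N_controls). The function name below is hypothetical.

```python
def effective_sample_size(n_cases, n_controls):
    """Effective sample size of a case-control study.

    Assumed formula (a common convention, not confirmed by the text above):
    N_eff = 4 / (1 / N_cases + 1 / N_controls)
    """
    return 4.0 / (1.0 / n_cases + 1.0 / n_controls)
```

Under this definition a balanced study has N_eff equal to its total sample size, while a study with few cases and many controls has N_eff close to four times the number of cases.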

Genetic correlation

Genome-wide genetic correlation between neurodegenerative traits was calculated using LDSC (version 1.0.0)[59]. Precomputed LD scores of European individuals in the 1000 Genomes project for high-quality HapMap3 SNPs were used (‘eur_w_ld_chr’). A free intercept was modeled to allow for potential sample overlap.
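
For orientation, cross-trait LD score regression is usually written as a regression of the product of z-scores on LD scores. The notation below is ours: $z_{1j}$ and $z_{2j}$ are the z-scores of SNP $j$ in the two studies, $\ell_j$ its LD score, $M$ the number of SNPs, $N_1$ and $N_2$ the sample sizes, $N_s$ the number of overlapping samples and $\rho$ the phenotypic correlation among them.

```latex
\mathbb{E}\!\left[z_{1j} z_{2j}\right]
  = \frac{\sqrt{N_1 N_2}\,\rho_g}{M}\,\ell_j + \frac{\rho\, N_s}{\sqrt{N_1 N_2}},
\qquad
r_g = \frac{\rho_g}{\sqrt{h_1^{2}\, h_2^{2}}}
```

The second term does not depend on $\ell_j$, so leaving the intercept free lets it absorb the contribution of sample overlap.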

Colocalization

Before the colocalization analysis of neurodegenerative diseases, we first assessed residual confounding by estimating the LDSC intercept using LDSC (version 1.0.0) (ALS, 1.03 (s.e., 0.0073); AD, 1.03 (s.e., 0.013); PD, 0.98 (s.e., 0.0065); PSP, 1.05 (s.e., 0.0076); CBD, 0.98 (s.e., 0.0073); FTD, 1.00 (s.e., 0.0071)), showing limited inflation of test statistics due to confounding across these studies. For each locus (top SNP ±100 kb) harboring SNPs with an association with any of the neurodegenerative diseases (ALS, AD, PD, PSP, CBD, FTD) at P < 1 × 10⁻⁵, we performed colocalization analysis using the ‘coloc’ package in R[105]. We set the prior probabilities to π₁ = 1 × 10⁻⁴, π₂ = 1 × 10⁻⁴ and π₁₂ = 1 × 10⁻⁵ for a causal variant in trait 1 or trait 2 and a shared causal variant between traits 1 and 2, respectively. Using the same parameters, we performed colocalization analysis for ALS and each of the FTD subtypes (behavioral variant FTD, semantic dementia FTD, progressive non-fluent aphasia FTD and mndFTD).
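
To show how these priors enter the calculation, the Python sketch below combines per-SNP approximate Bayes factors into posterior probabilities for the five colocalization hypotheses (H0: no association; H1 and H2: association with one trait only; H3: distinct causal variants; H4: a shared causal variant). It assumes a single causal variant per trait and a prior effect standard deviation of 0.2, and it is a simplified illustration rather than the ‘coloc’ implementation; all function names are hypothetical.

```python
import math


def log_sum(values):
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_diff(a, b):
    """log(exp(a) - exp(b)); returns -inf when the difference is not positive."""
    if b >= a:
        return float("-inf")
    return a + math.log(-math.expm1(b - a))


def log_abf(beta, se, prior_sd=0.2):
    """Log approximate Bayes factor for one SNP (prior_sd is an assumption)."""
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z ** 2)


def coloc_posteriors(l1, l2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] from per-SNP log ABFs."""
    s1, s2 = log_sum(l1), log_sum(l2)
    s12 = log_sum([a + b for a, b in zip(l1, l2)])
    log_h = [
        0.0,
        math.log(p1) + s1,
        math.log(p2) + s2,
        math.log(p1) + math.log(p2) + log_diff(s1 + s2, s12),
        math.log(p12) + s12,
    ]
    total = log_sum(log_h)
    return [math.exp(x - total) for x in log_h]
```

A locus where both traits peak at the same SNP yields a high posterior for H4, whereas peaks at different SNPs shift the posterior toward H3.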

Enrichment analyses

Linkage disequilibrium score regression annotation-specific enrichment analysis

We used LDSC (version 1.0.0)[59] to calculate SNP-based heritability, the LDSC intercept and SNP-based heritability enrichment for partitions of the genome. In all LDSC analyses, we used summary statistics from samples of European ancestry only, excluding the HLA region. LD scores and partitioned LD scores provided by LDSC were used for genome-wide and genic region-based heritability analyses. The option ‘--overlap-annot’ was used in the partitioned heritability analysis to allow for overlapping SNPs between MAF bins. SNPs with MAF > 5% were included.

Tissue and cell type enrichment analysis

Tissue and cell type enrichment analyses were performed using the GWAS summary statistics of the European ancestry meta-analysis and FUMA[33] software version 1.3.6a. FUMA performs a genic aggregation analysis of GWAS association signals to calculate gene-wise association signals using MAGMA version 1.6 and subsequently tests whether tissues and cell types are enriched for expression of these genes. For tissue enrichment analysis, we used the GTEx version 8 reference set. FDR-corrected P-values <0.05 across all tissues (n = 54) were considered statistically significant. For cell type enrichment analyses[34], we used human-derived single-cell RNA-seq data on major brain cell types (GSE67835 without fetal samples[106]), Allen Brain Atlas cell types[107] for the human-derived major neuronal subtypes and the DropViz[108] dataset for mouse-derived brain cell types across all brain regions. We applied FDR correction for multiple testing within each expression dataset, and FDR-corrected P-values <0.05 were considered statistically significant.
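
The text does not name the FDR procedure. Assuming the widely used Benjamini-Hochberg step-up method, the correction applied within each expression dataset can be sketched in Python as follows; the function name is hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P-values, returned in the input order.

    Assumption: the FDR correction described above is Benjamini-Hochberg.
    """
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A tissue or cell type would then be called significant when its adjusted P-value is below 0.05 among the tests in the same dataset (for example, the 54 GTEx tissues).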

Pathway enrichment analysis

We used Downstreamer software[24] to identify enriched biological pathways and processes. First, gene-based association statistics were obtained with the Pascal method[109], which aggregates SNP association statistics including SNPs up to 10 kb upstream and downstream of a gene, accounting for LD using the non-Finnish European individuals from the 1000 Genomes Project phase 3 (ref. [110]) as a reference. In the Downstreamer method, putative core genes are defined as those that are coexpressed with disease-associated genes and can therefore be implicated in disease. Coexpression networks are based on either a large, multi-tissue transcriptome dataset including 56,435 genes and 31,499 individuals or brain-specific RNA-seq data obtained from the MetaBrain resource. The gene-based association statistics, coexpression matrix and gene Z scores per pathway or HPO term are then combined in a generalized least-squares regression model to obtain enrichment statistics[24]. Enrichment analyses were performed for Reactome, Gene Ontology and HPO terms using multi-tissue or brain-specific transcriptome datasets to calculate the coexpression matrix. The distribution of enrichment Z-score statistics was compared between analyses using multi-tissue or brain-specific coexpression matrices. Using the ‘pyhpo’ module in Python, all HPO terms were assigned to their parent term(s) in the ‘phenotypic abnormality’ (HP:0000118) branch, which includes phenotypic abnormalities grouped per organ system.

Mendelian randomization

Causal inference through MR analysis was performed for 22 exposures for which large-scale GWASs are available and for which there is prior evidence for an association with ALS. These include seven behavioral-related traits: body mass index (anthropometric)[111], years of schooling (educational attainment)[112], alcoholic drinks per week, age of smoking initiation and cigarettes per day from Liu et al.[113], days per week of moderate physical activity and days per week of vigorous activity from the UK Biobank[114]; four cardiovascular traits (coronary artery disease[115], stroke[92], diastolic blood pressure and systolic blood pressure[116]); seven immune system traits from Vuckovic et al.[117] (basophil, eosinophil, lymphocyte, monocyte, neutrophil and white blood cell counts) and C-reactive protein[118]; and four lipid traits from Willer et al.[119] (HDL cholesterol, LDL cholesterol, total cholesterol and triglyceride levels). A full description of the included studies is provided in Supplementary Table 26. From these GWASs, SNPs to serve as instruments for MR analyses were selected at two different P-value cutoffs (P < 5 × 10⁻⁸ and P < 5 × 10⁻⁵) and then LD clumped to obtain independent SNPs. SNP effect estimates on ALS risk were obtained from the European ancestry-only GWAS and, if needed, an LD proxy was selected (r² > 0.8). After harmonizing effect alleles and excluding palindromic SNPs, we performed a series of quality-control steps to avoid biased estimates of causal effects, checking for each exposure (1) instrument coverage (>85% overlapping SNPs; Supplementary Table 31), (2) instrument strength (F-statistic[37,120,121] >10; Supplementary Table 32), (3) distribution and significance of the Wald ratios (visual inspection of volcano plots; Supplementary Table 33) and (4) heterogeneity across the instrument-exposure effects (Q-statistic at P < 0.05 indicated heterogeneity; Supplementary Table 34). We applied five different MR methods: IVW using the random-effects model, MR-Egger, simple mode, weighted median and weighted mode. When only a single SNP was available, the Wald ratio test was conducted. MR analysis was conducted in R using the ‘mr()’ function in the ‘TwoSampleMR’ package[122]. Subsequently, radial MR analysis was conducted to determine whether Wald ratio outliers needed to be removed from the IVW or MR-Egger MR estimates[38]. In addition, we conducted a Q-test to identify outlier SNPs (P < 0.05). These outliers were then removed from the original MR analyses (across all five MR methods). The radial MR analysis was conducted using the RadialMR R package (https://github.com/WSpiller/RadialMR). To determine whether MR effects were orientated in the correct direction (from exposure to ALS), we conducted both reverse MR[123] and Steiger filtering[124] on our top MR findings. Finally, we explored whether the MR effects of our total and LDL cholesterol and systolic blood pressure exposures may be confounded by the effect we observed for years of schooling by conducting multivariable MR analysis[125]. Conditional F- and Q-statistics were calculated using the ‘MVMR’ package[126] in R.
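
The core quantities in this workflow (Wald ratios, the IVW estimate, Cochran's Q and the instrument F-statistic) can be sketched in a few lines of Python. This is a simplified illustration with first-order standard errors and an assumed multiplicative random-effects correction, not the ‘TwoSampleMR’ code; all function names and inputs are hypothetical.

```python
import math


def wald_ratios(bx, by, se_y):
    """Per-SNP Wald ratios and first-order standard errors."""
    ratios = [y / x for x, y in zip(bx, by)]
    ses = [s / abs(x) for x, s in zip(bx, se_y)]
    return ratios, ses


def ivw_random_effects(bx, by, se_y):
    """IVW estimate, its standard error and Cochran's Q.

    bx: SNP-exposure effects; by, se_y: SNP-outcome effects and their s.e.
    Assumption: multiplicative random effects, so the fixed-effect s.e. is
    inflated by sqrt(Q / (J - 1)) whenever that factor exceeds 1.
    """
    ratios, ses = wald_ratios(bx, by, se_y)
    weights = [1.0 / s ** 2 for s in ses]
    est = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
    # Cochran's Q: weighted spread of the Wald ratios around the estimate
    q = sum(w * (r - est) ** 2 for w, r in zip(weights, ratios))
    se_fixed = math.sqrt(1.0 / sum(weights))
    df = len(ratios) - 1
    scale = math.sqrt(max(1.0, q / df)) if df > 0 else 1.0
    return est, se_fixed * scale, q


def f_statistic(bx, se_x):
    """Approximate per-SNP instrument strength, (beta / s.e.)^2."""
    return (bx / se_x) ** 2
```

With a single instrument, the IVW estimate reduces to the Wald ratio, which matches the fallback described above. A large Q relative to its degrees of freedom signals heterogeneity among instruments and is what outlier-removal steps aim to reduce.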

Statistical analyses

All presented P-values correspond to two-sided P-values uncorrected for multiple testing unless explicitly stated otherwise.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Online content

Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-021-00973-1.

Supplementary information

Supplementary Note, Figs. 1–18 and Tables 4–25 and 27–30; Reporting Summary; Peer Review Information; Supplementary Tables 1–3, 26 and 31–34.

