
Integrating human brain proteomes with genome-wide association data implicates new proteins in Alzheimer's disease pathogenesis.

Aliza P Wingo¹,², Yue Liu³, Ekaterina S Gerasimov³, Jake Gockley⁴, Benjamin A Logsdon⁴, Duc M Duong⁵, Eric B Dammer⁵, Chloe Robins³, Thomas G Beach⁶, Eric M Reiman⁷, Michael P Epstein⁸, Philip L De Jager⁹, James J Lah³, David A Bennett¹⁰, Nicholas T Seyfried⁵, Allan I Levey³, Thomas S Wingo¹¹,¹².

Abstract

Genome-wide association studies (GWAS) have identified many risk loci for Alzheimer's disease (AD)[1,2], but how these loci confer AD risk is unclear. Here, we aimed to identify loci that confer AD risk through their effects on brain protein abundance to provide new insights into AD pathogenesis. To that end, we integrated AD GWAS results with human brain proteomes to perform a proteome-wide association study (PWAS) of AD, followed by Mendelian randomization and colocalization analysis. We identified 11 genes that are consistent with being causal in AD, acting via their cis-regulated brain protein abundance. Nine replicated in a confirmation PWAS and eight represent new AD risk genes not identified before by AD GWAS. Furthermore, we demonstrated that our results were independent of APOE e4. Together, our findings provide new insights into AD pathogenesis and promising targets for further mechanistic and therapeutic studies.

Year: 2021 PMID: 33510477 PMCID: PMC8130821 DOI: 10.1038/s41588-020-00773-z

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Introductory paragraph

Genome-wide association studies (GWAS) have identified many risk loci for Alzheimer’s disease (AD)[1,2], but how these loci confer AD risk remains unclear. Here, we aimed to identify loci that confer AD risk through their effects on brain protein abundances to provide new insights into AD pathogenesis. To that end, we integrated AD GWAS results with human brain proteomes to perform a proteome-wide association study (PWAS) of AD, followed by Mendelian randomization and colocalization analysis. We identified 11 genes that are consistent with being causal in AD, acting via their cis-regulated brain protein abundances. Nine replicated in a confirmatory PWAS and eight represent novel AD risk genes not identified before by AD GWAS. Furthermore, we demonstrated that our results were independent of APOE E4. Together, our findings provide new insights into AD pathogenesis and promising targets for further mechanistic and therapeutic studies.

AD affects 35 million people worldwide but there is no effective disease-modifying treatment for it[3]. To support the development of new AD therapeutics, genetic studies of AD, especially GWAS, have identified many risk loci[1,2], but how these risk loci contribute to AD remains unclear. To gain insight into how these loci contribute to AD pathogenesis, we integrated AD GWAS results[1] with human brain proteomes[4] to identify genes that confer AD risk through their effects on brain protein abundance.

In the discovery phase, we performed a PWAS by integrating AD GWAS results (N=455,258)[1] with 376 human brain proteomes profiled from the dorsolateral prefrontal cortex (dPFC; Supplementary Table 1a)[4] using the FUSION pipeline[5]. Before integration, the proteomic profiles underwent quality control, and the effects of clinical characteristics and technical factors were regressed out; we then estimated the effects of genetic variants on protein abundance, referred to as protein weights. After quality control, the proteomic profiles included 8356 proteins, of which 1475 were heritable, so that their protein weights could be estimated for the PWAS. The PWAS identified 13 genes whose cis-regulated brain protein levels were associated with AD at false discovery rate (FDR) p<0.05 (Figure 1, Table 1, Extended Data Figure 1a, Supplementary Table 2).

Figure 1:

Manhattan plot for the discovery AD PWAS integrating the AD GWAS (N=455,258) with the discovery ROS/MAP proteomes (N=376). Each point represents a single test of association between a gene and AD, ordered by genomic position on the x axis, with the association strength on the y axis shown as the -log10 p value of a z-score test. The discovery PWAS identified 13 genes whose cis-regulated brain protein abundances were associated with AD at FDR p<0.05. The red horizontal line reflects the significance threshold of FDR p<0.05 and is set at the highest unadjusted p value that is below that threshold (p = 2.6×10−4).
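As an illustration of how an FDR threshold maps back to an unadjusted p value, the sketch below applies the Benjamini-Hochberg procedure to a toy list of p values. That the FDR adjustment used here is Benjamini-Hochberg is an assumption, although it is consistent with the reported numbers: with 1475 tests and 13 significant genes, 2.6×10−4 × 1475 / 13 ≈ 0.029, which matches the FDR p value listed for the least significant gene in Table 1. The function names and toy values are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def raw_p_cutoff(pvals, alpha=0.05):
    """Largest unadjusted p value whose adjusted value is below alpha."""
    adjusted = bh_adjust(pvals)
    passing = [p for p, q in zip(pvals, adjusted) if q < alpha]
    return max(passing) if passing else None


# Toy p values (hypothetical, not taken from the study).
toy = [0.001, 0.008, 0.039, 0.041, 0.27, 0.60]
cutoff = raw_p_cutoff(toy)  # height at which a threshold line would be drawn
```

In a Manhattan plot, the horizontal line would then be drawn at -log10 of `cutoff`.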

Table 1:

The discovery AD PWAS identified 13 significant genes, of which 10 could be tested in the confirmatory PWAS, and all 10 replicated.

			Discovery PWAS			Confirmatory PWAS		Evidence for Replication

	Gene	CHR	PWAS.Z	PWAS.p	PWAS.FDR.p	PWAS.Z	PWAS.p
1	ACE	17	−5.36	8.5×10⁻⁸	4.2×10⁻⁵	−5.28	1.3×10⁻⁷	yes
2	EPHX2	8	5.46	4.7×10⁻⁸	3.4×10⁻⁵	4.68	2.8×10⁻⁶	yes
3	SNX32	11	−4.69	2.8×10⁻⁶	8.4×10⁻⁴	−4.27	2.0×10⁻⁵	yes
4	DOC2A	16	−4.51	6.4×10⁻⁶	1.6×10⁻³	−4.23	2.3×10⁻⁵	yes
5	LACTB	15	3.76	1.7×10⁻⁴	2.1×10⁻²	4.08	4.5×10⁻⁵	yes
6	ICA1L	2	−3.88	1.1×10⁻⁴	1.6×10⁻²	−3.96	7.5×10⁻⁵	yes
7	CARHSP1	16	3.66	2.6×10⁻⁴	2.9×10⁻²	3.48	5.1×10⁻⁴	yes
8	RTFDC1	20	4.25	2.1×10⁻⁵	3.9×10⁻³	3.10	2.0×10⁻³	yes
9	STX6	1	3.83	1.3×10⁻⁴	1.7×10⁻²	2.96	3.1×10⁻³	yes
10	CTSH	15	4.68	2.9×10⁻⁶	8.4×10⁻⁴	2.36	1.8×10⁻²	yes
11	PLEKHA1*	10	4.40	1.1×10⁻⁵	2.3×10⁻³	−	−	−
12	PVR**	19	−10.94	7.1×10⁻²⁸	1.0×10⁻²⁴	−	−	−
13	STX4**	16	4.00	6.2×10⁻⁵	1.0×10⁻²	−	−	−

This table gives the z scores for the AD PWAS associations with their corresponding p-values and FDR-adjusted p-values for all significant genes in the AD discovery PWAS. Confirmatory AD PWAS z scores and their corresponding unadjusted p-values are provided for the significant genes in the discovery AD PWAS.

Asterisk indicates protein not profiled in the confirmatory proteomic dataset.

Double asterisks denote proteins that were profiled but did not have significant SNP-based heritability estimates in the confirmatory proteomic dataset.
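The relation between the z scores and unadjusted p values in Table 1 can be checked with a short sketch. It assumes a two-sided test against the standard normal distribution, which the text does not state explicitly; the function name is hypothetical. Small differences from the tabulated p values are expected because the z scores are rounded to two decimals.

```python
import math


def two_sided_p(z):
    """Two-sided p value for a standard-normal z score (assumed test)."""
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) and stays accurate
    # for very small tail probabilities.
    return math.erfc(abs(z) / math.sqrt(2.0))


p_ace = two_sided_p(-5.36)   # ACE, discovery PWAS (table: 8.5e-8)
p_pvr = two_sided_p(-10.94)  # PVR, discovery PWAS (table: 7.1e-28)
```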

Extended Data Fig. 1

Quantile-quantile plots for the discovery and replication PWAS of AD

Quantile-quantile plot for A) the discovery PWAS of AD (λ = 1.36; λ1000 = 1.003) and B) confirmatory PWAS of AD (λ = 1.39; λ1000 = 1.003).
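The λ1000 values above can be reproduced with the commonly used rescaling of genomic inflation to 1,000 cases and 1,000 controls. The sketch below is an illustration under two assumptions: that this standard rescaling is the one used, and that the AD GWAS comprised 71,880 cases and 383,378 controls. These counts are not given in this text; they are taken as the case/control split of the cited GWAS and sum to the stated N=455,258.

```python
def lambda_1000(lam, n_cases, n_controls):
    """Rescale genomic inflation to a study of 1,000 cases and 1,000 controls."""
    scale = (1.0 / n_cases + 1.0 / n_controls) / (1.0 / 1000 + 1.0 / 1000)
    return 1.0 + (lam - 1.0) * scale


# Assumed case/control counts for the AD GWAS (not stated in this text).
N_CASES, N_CONTROLS = 71_880, 383_378

discovery = lambda_1000(1.36, N_CASES, N_CONTROLS)
confirmatory = lambda_1000(1.39, N_CASES, N_CONTROLS)
```

Under these assumptions both values round to 1.003, in agreement with the caption.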

A confirmatory PWAS was performed using the same AD GWAS[1] and an independent set of 152 human brain proteomes profiled from the dPFC (Supplementary Table 1b)[6]. After quality control, 8168 proteins remained and 1139 were heritable. Correlation between the protein weights in the discovery and confirmatory datasets was high (median 0.85, interquartile range 0.21; Supplementary Table 3). Three of the 13 discovery PWAS-significant proteins could not be tested in the confirmatory PWAS: one protein was not profiled and two were profiled but did not have significant heritability, likely due to the smaller sample size. The remaining 10 proteins could be tested and all 10 replicated in the confirmatory PWAS (Table 1; Extended Data Figure 1b; Supplementary Table 4).

Associations in the PWAS of AD may result when a variant is associated with protein expression (i.e., the variant is a protein quantitative trait locus [pQTL]) and AD simultaneously, or from a coincidental overlap between pQTLs and sites in linkage disequilibrium with AD GWAS sites. The former is interpreted as evidence supporting either a pleiotropic or causal role for the gene (and will be referred to as consistent with being causal for simplicity) while the latter suggests a non-causal role. We investigated these possibilities using two independent but complementary approaches. First, using a Bayesian colocalization method, COLOC[7], we examined the posterior probability for a shared causal variant between a pQTL and AD for the 13 discovery AD PWAS-significant genes. We found 9 of 13 genes consistent with being causal (Table 2; Supplementary Table 5). Second, we used summary data-based Mendelian randomization (SMR)[8] and its accompanying heterogeneity in dependent instruments (HEIDI) test[8]. SMR results suggest that the cis-regulated protein abundance mediates the association between genetic variants and AD for all 13 genes, but HEIDI results argue against a causal role for 4 genes due to linkage disequilibrium (Table 2; Supplementary Table 6). Thus, 9 of the 13 genes have evidence consistent with a causal role in AD by SMR/HEIDI. In sum, we found 7 genes with consistent results for causality by both COLOC and SMR/HEIDI (CTSH, DOC2A, ICA1L, LACTB, PLEKHA1, SNX32, and STX4; Table 2), and 4 genes with conflicting results for causality by these two approaches (ACE, CARHSP1, RTFDC1, and STX6; Table 2). Results for EPHX2 and PVR argued against causality (Table 2).

Table 2:

COLOC and SMR analysis of the 13 significant genes in the discovery AD PWAS. Eleven genes had evidence consistent with a causal role by either COLOC or SMR.

			COLOC		SMR

	Gene	Chr	H₄	Causal variant	SMR.p	HEIDI.p	Causal variant
1	CTSH	15	0.962	yes	3.1×10⁻⁵	0.464	yes
2	DOC2A	16	0.907	yes	1.0×10⁻³	0.742	yes
3	ICA1L	2	0.672	yes	4.1×10⁻⁴	0.977	yes
4	LACTB	15	0.754	yes	3.8×10⁻⁴	0.070	yes
5	PLEKHA1*	10	0.581	yes	3.0×10⁻³	0.455	yes
6	SNX32	11	0.975	yes	2.7×10⁻⁵	0.588	yes
7	STX4*	16	0.918	yes	5.0×10⁻³	0.808	yes
8	ACE	17	0.976	yes	4.0×10⁻³	0.039	no
9	RTFDC1	20	0.643	yes	4.6×10⁻⁵	0.034	no
10	CARHSP1	16	0.188	no	1.2×10⁻²	0.397	yes
11	STX6	1	0.072	no	1.0×10⁻²	0.748	yes
12	EPHX2	8	0	no	7.1×10⁻⁷	0.008	no
13	PVR*	19	0.022	no	1.4×10⁻⁵	n/a	n/a

For the 13 FDR-significant genes in the discovery AD PWAS, the result of COLOC H4, which is the Bayesian posterior probability that a genetic variant is shared by both traits (i.e., gene and AD), and P values for SMR and SMR HEIDI tests are given.

Asterisk denotes genes not found in the confirmatory PWAS. n/a (not applicable) indicates undetermined result since the number of pQTL SNPs were too small for HEIDI to test. Genes were sorted by whether they are consistent with being a causal variant.
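The grouping of genes by the two causality tests can be sketched from the Table 2 values. The cutoffs are partly assumptions: H4 > 0.5 for COLOC and SMR p < 0.05 are not stated in this text (although the "Causal variant" columns are consistent with them), while the HEIDI p < 0.05 criterion follows the Methods. All names in the code are hypothetical.

```python
# (gene, COLOC H4, SMR p, HEIDI p) copied from Table 2; None marks "n/a".
TABLE2 = [
    ("CTSH", 0.962, 3.1e-5, 0.464), ("DOC2A", 0.907, 1.0e-3, 0.742),
    ("ICA1L", 0.672, 4.1e-4, 0.977), ("LACTB", 0.754, 3.8e-4, 0.070),
    ("PLEKHA1", 0.581, 3.0e-3, 0.455), ("SNX32", 0.975, 2.7e-5, 0.588),
    ("STX4", 0.918, 5.0e-3, 0.808), ("ACE", 0.976, 4.0e-3, 0.039),
    ("RTFDC1", 0.643, 4.6e-5, 0.034), ("CARHSP1", 0.188, 1.2e-2, 0.397),
    ("STX6", 0.072, 1.0e-2, 0.748), ("EPHX2", 0.0, 7.1e-7, 0.008),
    ("PVR", 0.022, 1.4e-5, None),
]

H4_MIN = 0.5        # assumed COLOC cutoff
SMR_ALPHA = 0.05    # assumed SMR significance level
HEIDI_ALPHA = 0.05  # HEIDI p below this suggests linkage (per Methods)


def supports_causality(h4, smr_p, heidi_p):
    """Return (COLOC support, SMR/HEIDI support) as booleans."""
    coloc_ok = h4 > H4_MIN
    # An undetermined HEIDI result is treated as no support.
    smr_ok = heidi_p is not None and smr_p < SMR_ALPHA and heidi_p >= HEIDI_ALPHA
    return coloc_ok, smr_ok


both, mixed, neither = [], [], []
for gene, h4, smr_p, heidi_p in TABLE2:
    coloc_ok, smr_ok = supports_causality(h4, smr_p, heidi_p)
    target = both if (coloc_ok and smr_ok) else mixed if (coloc_ok or smr_ok) else neither
    target.append(gene)
```

With these cutoffs the grouping matches the text: 7 genes supported by both tests, 4 with mixed results, and 2 with no support.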

Combining evidence for replication and results of causality tests, there were 5 genes (CTSH, DOC2A, ICA1L, LACTB, and SNX32) with evidence for both replication and causality (Table 3). There were 4 genes with evidence for replication and mixed results supporting causality (ACE, CARHSP1, RTFDC1, and STX6; Table 3). Thus, among the 13 discovery PWAS-significant genes, 11 were consistent with being causal in AD, and 9 of 11 replicated in the confirmatory PWAS (Table 3).

Table 3:

Summary of the 11 AD PWAS-significant genes with evidence for being consistent with a causal role in AD.

			Discovery	Confirmatory	Evidence for causality		TWAS	Novel
	Gene	Chr	PWAS	PWAS	COLOC	SMR	significant	gene
1	CTSH	15	significant	replicated	yes	yes	suggestive	yes
2	DOC2A	16	significant	replicated	yes	yes	n/a	yes
3	ICA1L	2	significant	replicated	yes	yes	no	yes
4	LACTB	15	significant	replicated	yes	yes	suggestive	no
5	SNX32	11	significant	replicated	yes	yes	yes	yes
6	ACE	17	significant	replicated	yes	no	yes	yes
7	RTFDC1	20	significant	replicated	yes	no	suggestive	no
8	CARHSP1	16	significant	replicated	no	yes	yes	yes
9	STX6	1	significant	replicated	no	yes	yes	yes
10	STX4*	16	significant	−	yes	yes	yes	no
11	PLEKHA1*	10	significant	−	yes	yes	n/a	yes

Asterisk denotes proteins not found in the confirmation PWAS. n/a refers to genes that did not have significant heritability estimates to be included in the TWAS of AD. Full results for the TWAS are in Supplementary Tables 17-18. “suggestive” in the “TWAS significant” column refers to genes with 0.05

Since the APOE E4 allele is strongly associated with AD, we investigated whether APOE E4 influenced our PWAS findings. To that end, we regressed out the effect of APOE E4 from the proteomes and used the regressed proteomic profiles to perform the PWAS of AD. That analysis found the 13 original PWAS-significant genes and 6 additional significant genes at FDR p<0.05 (ACOT8, DDX58, ISLR2, PITPNC1, TBC1D1, and TRIM65; Supplementary Table 7). All 13 genes had the same directions of association as those in the discovery PWAS. Moreover, results from COLOC and SMR/HEIDI tests found the same evidence of causality as the original findings except that ACE was now consistent with causality by both COLOC and SMR/HEIDI compared to mixed findings before (Supplementary Tables 8-9). The 6 additional genes were not consistent with being causal by COLOC (Supplementary Table 8). These observations suggest that our findings are unlikely to be influenced by APOE E4.

To understand the specificity of the AD PWAS results, we performed PWAS for other brain-relevant and biometric traits. We expected the degree of overlap of significant genes to roughly correspond to their genetic correlations. GWAS results from individuals of European descent for clinical AD (N=63,926)[2], amyotrophic lateral sclerosis (ALS; N=80,610)[9], Parkinson’s disease (PD; N=1,474,097)[10], neuroticism (N=390,278)[11], height (N=693,529)[12], body mass index (BMI; N=681,275)[12], and waist-to-hip ratio adjusting for BMI (WHRadjBMI; N=694,649)[13] were combined with the discovery proteomic profiles (N=376) to perform a PWAS of each trait. The PWAS of clinical AD identified 4 genes, ALS 7 genes, PD 17 genes, neuroticism 72 genes, height 662 genes, BMI 395 genes, and WHRadjBMI 244 genes (Supplementary Tables 10-16). Overlap of the significant genes between the discovery AD PWAS and PWAS of other traits was 75% for clinical AD, 0% for ALS, 5.9% for PD, 2.8% for neuroticism, 1.7% for height, 1.5% for BMI, and 0.4% for WHRadjBMI (Extended Data Figure 2). The small overlap with biometric traits is not surprising given their estimates of genetic correlation with AD[1]. These results support the specificity of our AD PWAS findings.
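The reported overlap percentages appear to be expressed relative to the number of significant genes in each comparison trait. The sketch below illustrates that reading. The shared-gene counts are back-calculated from the percentages and are an assumption, not values given in the text; only the per-trait totals come from the paragraph above.

```python
def overlap_percent(n_shared, n_trait_significant):
    """Shared genes as a percentage of the comparison trait's significant genes."""
    return 100.0 * n_shared / n_trait_significant


# trait: (assumed shared count, significant genes reported for the trait)
TRAITS = {
    "clinical AD": (3, 4),
    "ALS": (0, 7),
    "PD": (1, 17),
    "neuroticism": (2, 72),
    "height": (11, 662),
    "BMI": (6, 395),
    "WHRadjBMI": (1, 244),
}

percentages = {
    trait: round(overlap_percent(shared, total), 1)
    for trait, (shared, total) in TRAITS.items()
}
```

For example, one shared gene out of the 17 PD-significant genes gives 5.9%, the value reported for PD.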

Extended Data Fig. 2

Overlap of significant genes between AD and other traits

Overlap between results of the AD PWAS and PWAS for other traits. All the PWAS used the discovery ROS/MAP proteomic dataset (n=376) and GWAS summary results from Caucasian individuals. The following outcomes were tested: clinical AD GWAS (N=63,926), amyotrophic lateral sclerosis (ALS; N=80,610), body mass index (BMI; N=681,275), height (N=693,529), neuroticism (N=390,278), Parkinson’s disease (PD; N=1,474,097), and waist-to-hip ratio adjusting for BMI (WHRadjBMI; N=694,649). Significant genes considered for overlap are those with FDR p<0.05.

Given the central dogma of molecular biology that DNA is transcribed into mRNA, which is translated into protein, we asked whether the identified 11 genes with evidence for being causal in AD at the protein level had similar evidence at the transcript level. We integrated the AD GWAS results[1] with 888 human brain transcriptomes to perform a TWAS of AD using FUSION[5]. The 888 transcriptomes were mainly from the frontal cortex donated by participants of European descent (Supplementary table 1c), and quality control was analogous to that of the proteomes to remove technical and clinical characteristics before estimating the effect of genetic variants on mRNA expression. Among the 13,650 transcripts after quality control, 6870 were heritable. The AD TWAS identified 40 genes whose genetically regulated mRNA expression levels were associated with AD at FDR p<0.05 (Extended Data Figure 3; Supplementary table 17). Among the 11 potentially causal genes identified at the protein-level, five genes, ACE, CARHSP1, SNX32, STX4, and STX6, showed at least nominal significance with similar directions of association with AD as seen at the protein-level (Table 3; Supplementary table 18a).

Extended Data Fig. 3

Quantile-quantile plot for the TWAS of AD

Quantile-quantile plot for the TWAS of AD (λ = 1.22; λ1000 = 1.002).

For the 5 genes with evidence at both the transcript and protein levels, results from the SMR test for two molecular traits[14] suggested that their protein abundance is mediated by mRNA expression (Supplementary Table 18a,b). For the three genes with suggestive evidence for cis-regulated mRNA’s association with AD (CTSH, LACTB, and RTFDC1), only CTSH had evidence to suggest that protein expression is mediated by mRNA expression level (Supplementary Table 18a,b). In sum, about half (6 of 11) of the genes with evidence consistent with being causal in AD at the protein level were also associated with AD at the transcript level.

We previously identified 31 modules of co-expressed proteins in ROS/MAP reference proteomes using Weighted Gene Co-expression Network Analysis[4,15]. We found that 6 of the 11 potential AD causal proteins belonged to one of these modules while 5 did not. Each of these 6 proteins belonged to a different module, which implies that our PWAS findings are not simply the result of correlated protein expression[16].

Using human single-cell RNA-sequencing data profiled from the dPFC[17], we found cell-type specific enrichment for expression of 6 of the 11 causal genes at FDR p-value < 0.05 (adjusted for 17,775 genes). DOC2A, ICA1L, PLEKHA1, and SNX32 were enriched in excitatory neurons, whereas CARHSP1 showed enrichment in oligodendrocytes and CTSH in astrocytes and microglia (Extended Data Figure 4; Supplementary Table 19).

Extended Data Fig. 4

Single cell-type expression

Single-cell type expression for AD PWAS-significant genes with evidence of causality in AD. Using human brain single-cell RNA-sequencing data profiled from the dPFC, we found that 6 genes (of the 11 genes) had evidence of enrichment in a cell type at FDR p < 0.05. Enrichment testing was performed using Wilcoxon rank sum test, as implemented by the Seurat package, and multiple testing was accounted for by FDR adjusted for 17,775 tested genes. CARHSP1 showed enrichment in oligodendrocytes. CTSH showed enrichment in astrocytes and microglia. DOC2A, ICA1L, PLEKHA1, and SNX32 were enriched in excitatory neurons.

Lastly, 8 of the 11 identified causal genes were not within 1Mb of AD genome-wide significant sites[1] while 3 were (LACTB, RTFDC1, and STX4), implying that these 8 genes were from novel sites. The 8 genes were in regions with suggestive AD associations in GWAS (p-values of 5.3×10−5 to 1.9×10−7), which is in line with other TWAS studies[18-20]. In conclusion, we identified 11 brain proteins that have evidence consistent with being causal in AD for future mechanistic studies to find new treatments for the disease.

Methods

Human Brain Proteomic and Genetic Data in the Discovery PWAS

We generated human brain proteomes from the dorsolateral prefrontal cortex (dPFC) of post-mortem brain samples donated by 400 participants of European descent of the Religious Orders Study and Rush Memory and Aging Project (ROS/MAP)[21]. Participants in the ROS/MAP studies gave informed consent for longitudinal assessments, agreed to an Anatomic Gift Act, and consented to repurposing their data and biospecimens for future studies. The Institutional Review Board of Rush University Medical Center approved the ROS/MAP studies. We performed proteomic sequencing using isobaric tandem mass tag (TMT) peptide labeling and analyzed these peptides by liquid chromatography coupled to mass spectrometry. Samples were randomized by age, sex, post-mortem interval, cognitive diagnosis, and pathologies into 50 batches prior to TMT labeling to minimize batch effects. Peptides from each individual sample (N=400) and the global internal standard (GIS; N=100) were labeled using the TMT 10-plex kit (ThermoFisher) and high pH fractionation was used to increase peptide depth as previously described[22]. Two of the exact same GIS were included in each batch. We used Proteome Discoverer suite (version 2.3 ThermoFisher Scientific) and MS2 spectra searched against the canonical UniProtKB Human proteome database (February 2019) with 20,338 total sequences to assign peptide spectral matches. Peptide spectral matches (PSM) were filtered using percolator to a false discovery rate (FDR) of less than 1%, and, after spectral assignment, peptides were collated into proteins such that the combined probabilities of their constituent peptides achieved an FDR of 1%. Peptides shared among multiple proteins were assigned based on parsimony. Integration of ion quantification from MS2 or MS3 scans with a tolerance of 20 ppm at the most confident centroid setting was used to quantify reporter ions. 
After quantification of the proteins, we identified proteins that were not reliably measured using the two GIS that were run in each batch. Proteins whose measurements fell outside the 95% confidence interval for any batch were removed from further analysis. Proteomic analysis identified 12,691 proteins and after we excluded proteins with missing values in more than 50% of the 400 subjects, 8356 proteins remained. To remove the effects of protein loading differences, we scaled each protein abundance with a sample-specific total protein abundance and log2 transformed the abundance. Next, we identified and removed poorly performing samples using iterative principal component analysis (PCA), removing samples more than four standard deviations from the mean of either the first or second principal component. Subsequently, regression was used to estimate and remove the effects of proteomic sequencing batch, MS reporter quantification mode, sex, age at death, postmortem interval, study (ROS vs. MAP), and the final clinical diagnosis of cognitive status from the proteomic profile. Expanded details on the proteomic sequencing and quality control are published elsewhere[4].

Genotyping was obtained from either whole genome sequencing (WGS) or genome-wide genotyping by either Illumina OmniQuad Express or Affymetrix GeneChip 6.0 platforms as described previously[23]. Quality control of genotyping from either source was performed separately using Plink[24]. WGS data were preferred over array-based genotyping in cases where individuals had genotyping data from both sources. Individuals with overall genotyping missingness >5% were excluded. Variants were excluded if they had evidence of deviation from Hardy Weinberg equilibrium (p-value < 1×10−8), missing genotype rate >5%, minor allele frequency <1%, or were not single nucleotide polymorphisms (SNPs). Next, KING[25] was used to remove individuals estimated to be closer than second degree relatives. For array-based data, we imputed genotyping to 1000 Genomes Project Phase 3[26] using the Michigan Imputation Server[27] and SNPs with imputation R² > 0.3 were retained. Principal component analysis was performed to compare genetic ancestry of these individuals to CEU from 1000 Genomes Project (Extended Data Figure 5; Supplementary Table 20). All samples were kept for analyses. All of our analyses used only the 1,190,321 HapMap SNPs present in the 489 individuals of European descent from the 1000 Genomes Project, a set that is provided by FUSION[5] and commonly referred to as the linkage disequilibrium reference panel. After quality control, there were 376 subjects with both proteomic and genetic data for our discovery PWAS.
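Two of the protein-level steps described above, the missingness filter and the loading normalization, can be sketched as follows. This is a minimal illustration rather than the pipeline that was used: the data layout (a dictionary of per-sample lists with None for missing values), the function name, and the choice to compute each sample's total over the retained proteins are all assumptions.

```python
import math


def filter_and_normalize(abundance, max_missing=0.5):
    """Drop proteins missing in more than max_missing of samples, then
    scale by the sample-specific total abundance and log2 transform.

    abundance: dict mapping protein -> list of per-sample values (None = missing).
    """
    n_samples = len(next(iter(abundance.values())))
    kept = {
        protein: values
        for protein, values in abundance.items()
        if sum(v is None for v in values) / n_samples <= max_missing
    }
    # Sample-specific totals over retained proteins (an assumed definition).
    totals = [
        sum(values[j] for values in kept.values() if values[j] is not None)
        for j in range(n_samples)
    ]
    return {
        protein: [
            None if v is None else math.log2(v / totals[j])
            for j, v in enumerate(values)
        ]
        for protein, values in kept.items()
    }


# Toy input (hypothetical): P3 is missing in 75% of samples and is dropped.
toy = {
    "P1": [2.0, 4.0, 2.0, 4.0],
    "P2": [2.0, 4.0, None, 4.0],
    "P3": [None, None, None, 1.0],
}
normalized = filter_and_normalize(toy)
```

The outlier removal by iterative PCA and the covariate regression would follow on the normalized values and are not shown here.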

Extended Data Fig. 5

Genetic principal components of genetic ancestry for each dataset

Genetic principal components of genetic ancestry for each dataset. The first two genetic principal components for individuals in each dataset are plotted (grey boxes) with individuals from the 1000G CEU dataset (purple triangles) for A) the discovery proteomic dataset, B) the replication proteomic dataset, and C) the transcriptomic dataset.

Human Brain Proteomic and Genetic Data in the Confirmation PWAS

The confirmation human brain proteomes were profiled from the dPFC of post-mortem brain samples from 198 participants of European descent recruited by the Banner Sun Health Research Institute (Banner). Participants in this study were recruited from the retirement communities in the greater Phoenix, Arizona, USA. All enrolled participants or their legal representatives signed an informed consent and the study was approved by the Institutional Review Board of Banner Sun Health Research Institute. Participants consented to annual standardized medical, neurological, and neuropsychological testing. Research diagnoses were made using approved research guidelines and a final clinicopathological diagnosis was made after review of all clinical, medical records, and neuropathological findings[6]. Only subjects with a final diagnosis of normal cognition or AD were included in the proteomic analysis. Proteomic profiling was performed using the same approach as described above for the discovery proteomes with two differences: only MS2 scans were obtained and MS2 spectra were searched against the UniProtKB human brain proteome database downloaded in April 2015. Due to different databases, exact Uniprot IDs were used when comparing the discovery and confirmation results. In total, there were 11,518 proteins quantified. We applied the same quality control procedure as was done in the discovery proteomic dataset to the confirmation proteomic data. Likewise, we used regression to remove the effects of proteomic sequencing batch, age, sex, post-mortem interval, and final clinical diagnosis of cognitive status from the confirmatory proteomic profiles before estimating the protein weights. Genotyping was performed using the Affymetrix Precision Medicine Array using DNA extracted from the brain with the Qiagen GenePure kit. 
We applied the same approach to quality control as described for the discovery dataset, including removing individuals based on data completeness or relatedness, and removing sites that showed evidence of deviation from Hardy Weinberg equilibrium, had missingness above 5%, had minor allele frequency below 1%, or were not SNPs. Genotyping was imputed to the 1000 Genomes Project Phase 3[26] using the Michigan Imputation Server[27]. SNPs with imputation R² > 0.3 were retained. Finally, only sites included in the linkage disequilibrium reference panel were used in our confirmation PWAS, as recommended by the FUSION pipeline. After quality control, there were 152 subjects with both proteomic and genetic data to include in our confirmation analyses.

Brain Transcriptomic and Genetic Data in the AD TWAS

The brain transcriptomes were profiled from post-mortem brain samples donated by 783 individuals of European descent recruited by ROS/MAP, Mayo, and Mount Sinai Brain Bank studies[23,28,29]. These transcriptomes were profiled mainly from the dPFC and also from frontal cortex, temporal cortex, inferior frontal gyrus, superior temporal gyrus, and perirhinal gyrus. Details on alignment, quality control, and normalization of the RNA-sequencing data have been described previously[30]. Briefly, Picard was used to convert BAM files to FASTQ format and STAR[31] was used to align reads to the GRCh38 reference genome and compute gene counts for each sample. We removed genes with < 1 count per million in at least 50% of the samples and genes with missing gene length and percent GC content. Next, we removed outlier samples. Then, we regressed out effects of batch, sex, post-mortem interval, age at death, brain region, and final diagnosis of cognitive status from the transcriptomic profiles before estimating mRNA weights. For subjects with transcriptomic data, their genome-wide genotyping was generated as described previously[23,28,29]. Quality control of the genotyping data was performed as described above for the discovery ROS/MAP dataset. After quality control, there were 13,650 mRNAs quantified from 783 individuals using 888 transcriptomes. Genotyping was filtered to include only sites in the linkage disequilibrium reference panel provided by FUSION before estimating mRNA weights as described below.

AD GWAS summary statistics

We used the summary association statistics from the latest GWAS of AD by Jansen et al[1], which had 455,258 Caucasian participants, most of whom were from the UK Biobank with family history of dementia.

Statistical Approach

We used FUSION[5] to estimate protein weights in the discovery and confirmation datasets separately. For simplicity, we describe here the process for the discovery dataset; the same steps were followed for the confirmation dataset. As mentioned above, we subset the ROS/MAP genome-wide genotyping to the linkage disequilibrium reference panel of 1,190,321 SNPs provided by FUSION to minimize the influence of linkage disequilibrium on the estimated test statistics[5]. Next, the SNP-based heritability for each gene was estimated using the discovery proteomic and genetic data. For proteins with significant heritability (i.e. heritability p-value <0.01), we used FUSION to compute the effect of SNPs on protein abundance using multiple predictive models - top1, blup, lasso, enet, bslmm[5]. Protein weights from the most predictive model were selected. Subsequently, we used FUSION to combine the genetic effect of AD (AD GWAS Z-score) with the protein weights by calculating the linear sum of Z for the independent SNPs at the locus to perform the PWAS of AD[5].

Lambda (λ) and lambda 1,000 (λ1000), which is a standardized estimate of genomic inflation scaled to a study of 1,000 cases and 1,000 controls[32-34], were calculated for each PWAS. Lambda 1,000 was calculated using the following formula[32-34]:

λ1000 = 1 + (λ − 1) × (1/n_cases + 1/n_controls) / (1/1000 + 1/1000)

where n_cases and n_controls are the numbers of cases and controls in the GWAS. The resulting values were consistent with other studies using FUSION that calculated lambda[34] (Extended Data Figure 1). The slightly higher λ in the confirmation PWAS may reflect some difference in the heterogeneity of the datasets.

For the transcriptomic data, we calculated the transcript weights using FUSION with two modifications to accommodate individuals with transcriptomic profiles from more than one brain region. First, the flag -scale 1 was added to handle pre-scaled expression values. 
Second, the family ID in the plink FAM file was used to ensure that samples from the same individual were always in the same fold within cross validation, and that no fold differed by more than 5% in size from any other fold. RNA weights were estimated using all five models and the most predictive model was used. Next, we used FUSION to combine the genetic effect of AD (AD GWAS Z-score) with the mRNA expression weights to perform the TWAS of AD. For the colocalization test we used the COLOC software[7] to estimate the posterior probability of the protein and AD sharing a causal variant, as well as the posterior probability of the protein and AD not sharing a causal variant using the marginal association statistics. For summary data-based Mendelian Randomization, SMR software[8] was used to test whether the AD PWAS-significant genes (from the FUSION) were associated with AD via their cis-regulated brain protein expression. We used plink[24] to estimate protein quantitative trait loci (pQTL) in the discovery proteomic dataset by linear regression. Then, we applied SMR to the pQTL results and the AD GWAS summary statistics. We used the conservative unadjusted p-value <0.05 from the heterogeneity in dependent instrument (HEIDI) to declare that presence of linkage likely influences the main SMR findings. For genes with both mRNA and protein abundance associated with AD, we applied SMR for two molecular traits[14] to the eQTL summary statistics from Siebert et al[35] and pQTL summary statistics described above to determine if the mRNA mediates the influence of SNP on proteins. We examined the cell-type specific expression of the 11 genes with evidence for a causal role in AD at the brain protein level using human brain single-cell RNA-sequencing data profiled from the dPFC from Mathys et al[17]. First, we performed data preprocessing and transformation on the raw single-cell RNA-sequencing data using the Seurat package[36]. 
We removed genes with fewer than 3 counts in a cell and cells with unique feature counts over 2,500 or less than 200. The RNA counts were then normalized and scaled using the NormalizeData and ScaleData functions. The RNA-sequencing data had 17,926 genes in 70,634 cells before and 17,775 genes in 53,083 cells after quality control and normalization. We focused on the 5 main cell types - excitatory neuron, inhibitory neuron, astrocyte, microglia, and oligodendrocyte. For the 11 potentially AD causal genes, we performed differential expression analysis to compare their expression levels in one cell type versus all other cell types to determine whether they are highly expressed in a particular cell type. Multiple testing correction for this analysis accounted for all 17,775 genes. To determine the novelty of the genes identified in the discovery PWAS, we asked whether each gene was within a 1 Mb window of the 2358 significant AD GWAS sites (p < 5×10−8) that correspond to the 29 independent risk loci[1].
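The association statistic described in this section, a weighted linear sum of GWAS Z-scores, can be sketched as below. This follows the form given in the FUSION publication, z = w'Z / sqrt(w'Σw), where w are the protein weights, Z the GWAS Z-scores, and Σ the SNP correlation (linkage disequilibrium) matrix. It is an illustrative sketch and not the FUSION code itself; the function name and toy inputs are hypothetical.

```python
import math


def pwas_z(weights, gwas_z, ld):
    """Weighted sum of GWAS Z-scores, standardized by its variance under the null.

    weights: per-SNP protein weights
    gwas_z:  per-SNP GWAS Z-scores, in the same SNP order
    ld:      SNP-by-SNP correlation matrix as a list of lists
    """
    n = len(weights)
    numerator = sum(w * z for w, z in zip(weights, gwas_z))
    variance = sum(
        weights[i] * ld[i][j] * weights[j] for i in range(n) for j in range(n)
    )
    return numerator / math.sqrt(variance)


# Toy locus with two SNPs (hypothetical values).
w = [0.5, -0.2]
z = [3.0, -1.0]
independent = pwas_z(w, z, [[1.0, 0.0], [0.0, 1.0]])
correlated = pwas_z(w, z, [[1.0, 0.5], [0.5, 1.0]])
```

With a single SNP the statistic reduces to the GWAS Z-score itself (with its sign following the weight), which is a useful sanity check on any implementation.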

Quantile-quantile plots for the discovery and replication PWAS of AD

Quantile-quantile plots for A) the discovery PWAS of AD (λ = 1.36; λ1000 = 1.003) and B) the confirmatory PWAS of AD (λ = 1.39; λ1000 = 1.003).

Overlap of significant genes between AD and other traits

Quantile-quantile plot for the TWAS of AD

Quantile-quantile plot for the TWAS of AD (λ = 1.22; λ1000 = 1.002).

Single cell-type expression

Genetic principal components of genetic ancestry for each dataset

35 in total

1. Assessing the impact of population stratification on genetic association studies.

Authors: Matthew L Freedman; David Reich; Kathryn L Penney; Gavin J McDonald; Andre A Mignault; Nick Patterson; Stacey B Gabriel; Eric J Topol; Jordan W Smoller; Carlos N Pato; Michele T Pato; Tracey L Petryshen; Laurence N Kolonel; Eric S Lander; Pamela Sklar; Brian Henderson; Joel N Hirschhorn; David Altshuler
Journal: Nat Genet Date: 2004-03-28 Impact factor: 38.330

2. Meta-analysis of genome-wide association studies for neuroticism in 449,484 individuals identifies novel genetic loci and pathways.

Authors: Mats Nagel; Philip R Jansen; Sven Stringer; Kyoko Watanabe; Christiaan A de Leeuw; Julien Bryois; Jeanne E Savage; Anke R Hammerschlag; Nathan G Skene; Ana B Muñoz-Manchado; Tonya White; Henning Tiemeier; Sten Linnarsson; Jens Hjerling-Leffler; Tinca J C Polderman; Patrick F Sullivan; Sophie van der Sluis; Danielle Posthuma
Journal: Nat Genet Date: 2018-06-25 Impact factor: 38.330

3. Meta-analysis of genome-wide association studies for height and body mass index in ∼700000 individuals of European ancestry.

Authors: Loic Yengo; Julia Sidorenko; Kathryn E Kemper; Zhili Zheng; Andrew R Wood; Michael N Weedon; Timothy M Frayling; Joel Hirschhorn; Jian Yang; Peter M Visscher
Journal: Hum Mol Genet Date: 2018-10-15 Impact factor: 6.150

4. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets.

Authors: Zhihong Zhu; Futao Zhang; Han Hu; Andrew Bakshi; Matthew R Robinson; Joseph E Powell; Grant W Montgomery; Michael E Goddard; Naomi R Wray; Peter M Visscher; Jian Yang
Journal: Nat Genet Date: 2016-03-28 Impact factor: 38.330

5. Integrative approaches for large-scale transcriptome-wide association studies.

Authors: Alexander Gusev; Arthur Ko; Huwenbo Shi; Gaurav Bhatia; Wonil Chung; Brenda W J H Penninx; Rick Jansen; Eco J C de Geus; Dorret I Boomsma; Fred A Wright; Patrick F Sullivan; Elina Nikkola; Marcus Alvarez; Mete Civelek; Aldons J Lusis; Terho Lehtimäki; Emma Raitoharju; Mika Kähönen; Ilkka Seppälä; Olli T Raitakari; Johanna Kuusisto; Markku Laakso; Alkes L Price; Päivi Pajukanta; Bogdan Pasaniuc
Journal: Nat Genet Date: 2016-02-08 Impact factor: 38.330

6. Arizona Study of Aging and Neurodegenerative Disorders and Brain and Body Donation Program.

Authors: Thomas G Beach; Charles H Adler; Lucia I Sue; Geidy Serrano; Holly A Shill; Douglas G Walker; LihFen Lue; Alex E Roher; Brittany N Dugger; Chera Maarouf; Alex C Birdsill; Anthony Intorcia; Megan Saxon-Labelle; Joel Pullen; Alexander Scroggins; Jessica Filon; Sarah Scott; Brittany Hoffman; Angelica Garcia; John N Caviness; Joseph G Hentz; Erika Driver-Dunckley; Sandra A Jacobson; Kathryn J Davis; Christine M Belden; Kathy E Long; Michael Malek-Ahmadi; Jessica J Powell; Lisa D Gale; Lisa R Nicholson; Richard J Caselli; Bryan K Woodruff; Steven Z Rapscak; Geoffrey L Ahern; Jiong Shi; Anna D Burke; Eric M Reiman; Marwan N Sabbagh
Journal: Neuropathology Date: 2015-01-26 Impact factor: 1.906

7. Single-cell transcriptomic analysis of Alzheimer's disease.

Authors: Hansruedi Mathys; Jose Davila-Velderrain; Zhuyu Peng; Fan Gao; Shahin Mohammadi; Jennie Z Young; Madhvi Menon; Liang He; Fatema Abdurrob; Xueqiao Jiang; Anthony J Martorell; Richard M Ransohoff; Brian P Hafler; David A Bennett; Manolis Kellis; Li-Huei Tsai
Journal: Nature Date: 2019-05-01 Impact factor: 49.962

8. Integrative analysis of omics summary data reveals putative mechanisms underlying complex traits.

Authors: Yang Wu; Jian Zeng; Futao Zhang; Zhihong Zhu; Ting Qi; Zhili Zheng; Luke R Lloyd-Jones; Riccardo E Marioni; Nicholas G Martin; Grant W Montgomery; Ian J Deary; Naomi R Wray; Peter M Visscher; Allan F McRae; Jian Yang
Journal: Nat Commun Date: 2018-03-02 Impact factor: 14.919

9. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

10. Integrative transcriptome analyses of the aging brain implicate altered splicing in Alzheimer's disease susceptibility.

Authors: Towfique Raj; Yang I Li; Garrett Wong; Jack Humphrey; Minghui Wang; Satesh Ramdhani; Ying-Chih Wang; Bernard Ng; Ishaan Gupta; Vahram Haroutunian; Eric E Schadt; Tracy Young-Pearse; Sara Mostafavi; Bin Zhang; Pamela Sklar; David A Bennett; Philip L De Jager
Journal: Nat Genet Date: 2018-10-08 Impact factor: 38.330
