
Assessing Genetic Overlap and Causality Between Blood Plasma Proteins and Alzheimer's Disease.

Alex Handy1,2, Jodie Lord2, Rebecca Green2,3, Jin Xu2,4, Dag Aarsland2,5, Latha Velayudhan2, Abdul Hye2, Richard Dobson1,2,3,6,7, Petroula Proitsi2.   

Abstract

BACKGROUND: Blood plasma proteins have been associated with Alzheimer's disease (AD), but understanding which proteins are on the causal pathway remains challenging.
OBJECTIVE: Investigate the genetic overlap between candidate proteins and AD using polygenic risk scores (PRS) and interrogate their causal relationship using bi-directional Mendelian randomization (MR).
METHODS: Following a literature review, 31 proteins were selected for PRS analysis. PRS were constructed for prioritized proteins with and without the apolipoprotein E region (APOE+/-PRS) and tested for association with AD status across three cohorts (n = 6,244). An AD PRS was also tested for association with protein levels in one cohort (n = 410). Proteins showing association with AD were taken forward for MR.
RESULTS: For APOE ɛ3, apolipoprotein B-100, and C-reactive protein (CRP), protein APOE+ PRS were associated with AD below Bonferroni significance (pBonf, p < 0.00017). No protein APOE- PRS or AD PRS (APOE+/-) passed pBonf. However, vitamin D-binding protein (protein PRS APOE-, p = 0.009) and insulin-like growth factor-binding protein 2 (AD APOE- PRS p = 0.025, protein APOE- PRS p = 0.045) displayed suggestive signals and were selected for MR. In bi-directional MR, none of the five proteins demonstrated a causal association (p < 0.05) in either direction.
CONCLUSION: Apolipoproteins and CRP PRS are associated with AD and provide a genetic signal linked to a specific, accessible risk factor. While evidence of causality was limited, this study was conducted in a moderate sample size and provides a framework for larger samples with greater statistical power.


Keywords:  Alzheimer’s disease; C-reactive protein; apolipoprotein B-100; apolipoprotein E; blood proteins; insulin-like growth factor binding protein 2; mendelian randomization analysis; polygenic trait; vitamin D-binding protein


Year:  2021        PMID: 34459398      PMCID: PMC8609677          DOI: 10.3233/JAD-210462

Source DB:  PubMed          Journal:  J Alzheimers Dis        ISSN: 1387-2877            Impact factor:   4.472


INTRODUCTION

Over 50 million people currently live with dementia worldwide, a figure forecast to rise to 152 million by 2050 as global populations live longer [1]. The most common form of dementia is late-onset Alzheimer’s disease (AD) [2], where individuals suffer severe, progressive cognitive decline and a range of neuropsychiatric symptoms [3] from their mid 60s until death. AD is a highly heritable [4], polygenic trait [5, 6] with a wide range of known genetic and environmental risk factors [7, 8]. However, the precise etiology of AD remains unexplained [9] and no disease altering treatments exist [10, 11]. Endophenotypes representing traits closer to a hypothesized biological risk factor can help unpack AD etiology and provide accessible targets for intervention. For example, a wide range of blood plasma proteins have been associated with AD [12] and provide a potential avenue for disease diagnosis and treatment. Promisingly, prediction of AD diagnosis using plasma levels of amyloid-β (Aβ) [13] and tau [14] is improving toward clinical level. However, measuring known AD protein neuropathological end-products (Aβ and tau) provides limited explanation of how other plasma proteins may mediate AD risk. For example, Kiddle et al.’s systematic review identified that four proteins (apolipoprotein E (APOE), alpha-2-macroglobulin, complement C3, and alpha-1-antitrypsin) were associated with AD in at least five cohorts [12]. APOE and the complement pathway have been consistently implicated in functional and genetic studies of AD risk [15, 16] and recent Mendelian randomization (MR) studies suggest lower levels of APOE and complement C3 in plasma may be causal for AD [17, 18]. One way to explore the role of plasma proteins further is to assess their genetic overlap with AD. Recent improvements in protein assay technology have enabled the creation of a genetic atlas for plasma proteins [19]. 
Over 3,000 proteins now have publicly available genetic summary statistics [19], allowing polygenic risk scores (PRS) to be constructed for individual proteins. PRS represent aggregate genetic propensity for a trait, so an association between a PRS and another trait implies that a degree of shared genetics influences both traits. For example, a higher AD PRS has been associated with lower cognitive ability in individuals without dementia [20] and with increased levels of the promising AD biomarker p-tau181 [21]. If a protein PRS is associated with AD, this provides a genetic signal linked to a specific, accessible biological risk factor, something which remains a challenge for genome-wide association studies (GWAS) [22]. PRS can also be calculated for individuals, meaning protein PRS associated with AD could inform AD diagnosis prediction.

However, testing the association of PRS with a trait does not demonstrate causality. For example, a plasma protein PRS associated with AD may simply indicate shared genetic variants which affect traits or pathways unrelated to disease pathogenesis [23]. MR provides a method to test whether an exposure causally affects an outcome by using genetic variants as instrumental variables in a construct similar to a randomized controlled trial. This works because an individual’s genes are effectively randomized at birth, enabling the creation of a quasi-intervention group who have a genetic predisposition for an exposure [24]. In AD, MR has demonstrated the protective effect of higher cognitive ability and educational attainment in two large-scale studies [7, 25] and has indicated several blood metabolites are on the causal disease pathway [26].

For plasma proteins specifically, MR has primarily been deployed as part of large-scale, non-targeted, phenome-wide MR analysis [27, 28]. For example, Zheng et al. identified 111 causal associations between 65 proteins and 52 disease-related phenotypes (p < 3.5×10–7) including sialic acid binding Ig-like lectin 3 (CD33) with AD, supporting previous GWAS results [29, 30]. While phenome-wide MR designs excel at providing a multi-trait matrix of potential causal signals, they lack the flexibility to unpack disease-specific relationships. For example, 61 out of 62 proteins associated with AD at a p-value less than 0.05 in Zheng et al. were tested using the Wald Ratio with only 1 SNP as an instrumental variable [27]. Disease-specific approaches have more flexibility to relax assumptions, which can help increase statistical power and enable more robust statistical sensitivity analyses [23].

The primary objective of this study was to explore the genetic overlap between AD and plasma proteins using PRS and to assess whether individual plasma proteins play a causal role in AD etiology using MR (see Fig. 1 for illustrative overview of study design). This study identified a shortlist of plasma proteins from existing literature that have been robustly associated with AD or AD endophenotypes and have publicly available genetic summary data. PRS models were then created for each shortlisted protein and tested for association with AD across three consortium cohorts: Genetic and Environmental Risk in Alzheimer’s Disease (GERAD1), Alzheimer’s Disease Neuroimaging Initiative (ADNI), and AddNeuroMed (ANM). An AD PRS was also constructed with publicly available genetic summary data from the largest case-ascertained AD GWAS to date [8] and used to test each protein for bi-directional association in ANM, where individual level plasma protein data was available. Lastly, for plasma proteins with PRS that demonstrated significant associations with AD (see Materials and Methods) in one or both PRS analyses, two-sample bi-directional MR was conducted to test for causality (Fig. 1).
Fig. 1

Illustrative overview of study design outlining the four key steps in the study workflow: literature review, plasma protein PRS, AD PRS, and bi-directional MR.


MATERIALS AND METHODS

Plasma protein candidates

A shortlist of plasma proteins was compiled for analysis from previous AD discovery studies to represent known protein candidates for genetic exploration. A panel of 163 proteins previously associated with AD or AD related phenotypes from a systematic review of 21 non-targeted human AD blood protein discovery studies up to 2014 was used as the baseline [12]. An additional literature review was then conducted to cover the period from 2014–2019 and applied the same screening criteria (non-targeted discovery studies, except for panel based studies with > 100 candidates) [12] to generate the candidate list (further details in the Supplementary Material). Proteins were then selected for the shortlist if they passed two criteria. Firstly, the association replicated in > 1 study and, secondly, genetic summary data was available for the protein from Sun et al.’s human plasma protein GWAS [19]. At the time of analysis, Sun et al. was the largest, plasma protein GWAS using SomaLogic with publicly available data (downloaded and cross referenced from http://www.phpc.cam.ac.uk/ceu/proteins/).

Plasma protein data preparation

Plasma protein genetic summary data was acquired from Sun et al.’s GWAS of 3,622 proteins in 3,301 healthy participants from the INTERVAL study (further details in the Supplementary Material), a randomized trial of 45,000 blood donors across 25 centers in England [31]. Summary association results were made available on http://www.phpc.cam.ac.uk/ceu/proteins/ and were downloaded for proteins on the shortlist. Files were then uploaded to Rosalind, King’s College London’s high-performance computing facility, for further quality control (QC) and analysis. Further QC was conducted using R 3.6.0 and was based primarily on aligning to the protocol recommended for PRS analysis [32]. Specifically, remaining duplicates, non-bi-allelic variants, and variants not overlapping the target data were removed, and SNP rsIDs were added and aligned with the target dataset (see https://github.com/AlexHandy1/ad-genetic-overlap-analysis for analysis scripts). Observed SNP heritability (h2SNP) and cross-protein genetic correlation (rg) were estimated for each protein using linkage disequilibrium score regression (LDSR) with the Python LDSC package (Version 1.0.0) sourced from https://github.com/bulik/ldsc.
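The variant-level QC described above can be illustrated with a minimal sketch. The field names (snp, a1, a2), the toy records, and the choice to drop every copy of a duplicated ID are assumptions made for illustration; they are not taken from the study's analysis scripts, which were written in R.

```python
# Minimal sketch of summary-statistic QC. Field names and data are hypothetical.
VALID_ALLELES = {"A", "C", "G", "T"}

def qc_summary_stats(rows, target_snps):
    """Keep single-base bi-allelic, non-duplicated SNPs that are also in the target data."""
    # Count occurrences of each SNP ID so that all copies of a duplicate are dropped.
    counts = {}
    for row in rows:
        counts[row["snp"]] = counts.get(row["snp"], 0) + 1

    kept = []
    for row in rows:
        is_biallelic_snv = (
            row["a1"] in VALID_ALLELES
            and row["a2"] in VALID_ALLELES
            and row["a1"] != row["a2"]
        )
        if counts[row["snp"]] == 1 and is_biallelic_snv and row["snp"] in target_snps:
            kept.append(row)
    return kept

base = [
    {"snp": "rs1", "a1": "A", "a2": "G"},
    {"snp": "rs2", "a1": "A", "a2": "AT"},  # indel, fails the single-base check
    {"snp": "rs3", "a1": "C", "a2": "T"},
    {"snp": "rs3", "a1": "C", "a2": "T"},   # duplicated ID
    {"snp": "rs4", "a1": "G", "a2": "T"},   # absent from the target data
]
target = {"rs1", "rs2", "rs3"}
clean = qc_summary_stats(base, target)
print([row["snp"] for row in clean])
```

Only rs1 survives all three filters in this toy input. A real pipeline would also need to handle allele alignment between base and target data, which is omitted here.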

AD data preparation

Individual level genotype data for AD cases and controls was acquired from three consortium studies: GERAD1, ADNI, and ANM. GERAD1 is a European consortium (https://gtr.ukri.org/projects?ref=G0902227) [33]. The GERAD1 sample comprised up to 3,292 AD cases and 1,223 controls. Genetic data from a subset of 4,515 participants was made available for this analysis prior to QC (further details in the Supplementary Material). ADNI is a longitudinal, multi-center North American study initiated in 2004 [34] and now in its fourth wave (ADNI1, ADNI-GO, ADNI2, and ADNI3) collecting clinical, imaging, genetic, and biomedical biomarkers for AD. Genetic data from 1,674 participants (not including ADNI3) was made available for this analysis prior to QC (further details in the Supplementary Material). The ANM is a European consortium, initiated in 2008 with the aim to establish biomarkers for AD [35]. The Dementia Case Register (DCR) is a follow-up of ANM, with UK subjects recruited from the Maudsley and King’s Healthcare Partners Dementia Case Register [36]. Genetic data from 1,063 participants from ANM (including DCR) was made available for this analysis prior to QC (further details in the Supplementary Material). In addition to full clinical and demographic data, 410 participants had plasma protein data. Plasma protein data was collected using SomaLogic’s multiplexed, aptamer-based assay (SOMAscan) with SOMAmers for 1,016 proteins. Normalized, log2 transformed residuals were used as the phenotype after regression with age, sex, batch, and 10 principal components. For all three consortium datasets, standard genetic QC procedures were applied (removal of non-autosomal chromosomes, alleles with minor allele frequency < 1%, genotypes with call rate < 98%, and Hardy-Weinberg deviations at 1×10–5) based on the protocol by Coleman [37] (further details in Supplementary Figure 1).
The sample size remaining for analysis after these QC procedures were applied is reported in the results (see Tables 2 and 3).

Plasma protein PRS

A PRS was constructed with the post QC Sun et al. GWAS data (“base data”) for each shortlist protein using PRSice-2 software (version 2.3.1e) [38]. PRSice automatically removes strand ambiguous SNPs and SNPs that are not present in both base and target data. SNPs were further “clumped” if LD (measured in r²) was > 0.1 within a window of 250 kilobases, with the SNP with the lowest GWAS p-value within each region retained. After clumping, PRS were calculated with SNPs under 10 pre-defined p-value thresholds (5e-08, 5e-05, 5e-04, 0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.5, 1). PRS at all 10 thresholds for each protein were then independently tested for association with each individual AD dataset (GERAD1, ADNI and ANM) using logistic regression. Age, sex, and 7 principal components to control for population stratification were included as covariates to create a null model (covariates only), which was subtracted from the full model (covariates and PRS), to provide a Lee adjusted r² [39] assuming an AD prevalence of 7% [40]. Protein PRS, for each selected protein, were tested for association with AD with the APOE region (992 SNPs within 750 kilobases of rs429358 on chromosome 19) included (APOE+) and excluded (APOE-) for all participants. APOE SNPs were removed to test whether protein PRS associations held without the known, strong effects of APOE alleles on AD risk [41]. As a secondary analysis to explore sex and age specific associations, PRS were constructed and tested for subgroups stratified by male, female, and 70 years and over (≥70). Lastly, random-effects meta-analysis was conducted on the PRS results at each p-value threshold across the three AD datasets using R and a restricted maximum likelihood (REML) estimator from the metafor R package [42]. Meta-analyzed results were ranked by lowest p-value to identify the most significant PRS model threshold for each protein-AD association.
A Bonferroni corrected p-value of 0.00017 (0.05 / (number of proteins × number of PRS p-value thresholds tested)) was estimated and used as the primary threshold for significance. Given this is a conservative threshold, if no proteins passed it in all participants (APOE+ and APOE–), nominally significant proteins (p < 0.05) were considered for MR analysis based on an assessment of their association strength (e.g., how close to the Bonferroni threshold) and consistency (e.g., association across subgroups).
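The p-value thresholding step of PRS construction can be sketched as follows. This is a simplified illustration that assumes clumping has already been applied and uses a plain weighted sum of allele dosages; the SNP IDs, effect sizes, and dosages are hypothetical, and tools such as PRSice-2 may scale or normalize the score differently.

```python
def polygenic_score(dosages, weights, pvalues, p_threshold):
    """Weighted sum of allele dosages over SNPs with base-data p-value <= p_threshold.

    Returns the score and the number of SNPs that contributed to it.
    """
    score = 0.0
    n_used = 0
    for snp, beta in weights.items():
        if pvalues[snp] <= p_threshold and snp in dosages:
            score += beta * dosages[snp]
            n_used += 1
    return score, n_used

# Hypothetical base data (protein GWAS effect sizes and p-values after clumping).
weights = {"rs1": 0.20, "rs2": -0.10, "rs3": 0.05}
pvalues = {"rs1": 1e-9, "rs2": 3e-4, "rs3": 0.2}
# Hypothetical target data: effect-allele counts (0, 1, or 2) for one individual.
dosages = {"rs1": 2, "rs2": 1, "rs3": 0}

scores = {t: polygenic_score(dosages, weights, pvalues, t) for t in (5e-8, 0.001, 1.0)}
for threshold, (score, n_snps) in scores.items():
    print(threshold, round(score, 3), n_snps)
```

Relaxing the threshold admits more SNPs into the score, which is why each threshold yields a separate PRS model that is then tested for association with the outcome.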

AD PRS

An AD PRS was constructed using the meta-analyzed stage 1 discovery results from Kunkle et al.’s AD GWAS [8] of 21,982 AD cases and 41,944 cognitively normal controls (“base data”) downloaded from https://www.niagads.org/datasets/ng00075. Kunkle et al. was selected to provide statistical power and a clinical phenotype as the largest case ascertained AD GWAS to date. QC was applied to the base data of 11,480,632 variants to remove NAs, variants without an rsID and non bi-allelic variants leaving 10,528,610 variants for PRS analysis. As rsIDs were already provided in the AD base data and aligned to GRCh37, non-target data overlapping variants were removed with PRSice during the PRS analysis. p-value thresholds and clumping configuration settings in PRSice were kept constant with the protein PRS analysis. No covariates were included as age, sex, batch, and population stratification were already controlled for in the production of the protein residuals phenotype which was used as the target data. This decision was taken to align with other proteomic studies integrating genetic and proteomic data [19] and to enable comparability. AD PRS were tested for association with each individual shortlist protein using linear regression and the PRS model with the best fit (measured by lowest p-value) was presented for each protein. As with the protein PRS, AD PRS were tested for association with and without APOE SNPs and secondary analysis was conducted on subsets stratified by gender (male, female) and age (≥70). Bonferroni correction (and a more liberal assessment of association strength) was estimated and applied to control for multiple testing as described in the protein PRS analysis (with number of tests adjusted for the number of proteins with individual level blood data available for analysis).

Bi-directional MR

MR analysis was performed using the MR Base R package [43] on a subset of proteins from the PRS analyses. Proteins were selected for MR if they had a p-value below Bonferroni significance in either PRS analysis (protein PRS to AD APOE+/– or AD PRS to protein APOE+/–) or if they were nominally significant (p-value < 0.05) in both directions (protein PRS to AD APOE+/– and AD PRS to protein APOE+/–). Given Bonferroni is a conservative threshold, if no protein passed Bonferroni significance, the strongest protein association with a p-value below 0.05 was also considered. Univariate MR was performed with each protein as the exposure and AD as the outcome. Genetic instrument SNPs for each protein were selected from Sun et al. at two p-value thresholds for analysis (5×10–8 and 5×10–6). The less stringent 5×10–6 threshold was applied to ensure SNPs were available for all proteins, with the noted limitation of introducing potential weak instrument bias. Selected SNPs were then clumped within a 250 kb window at LD r² < 0.001. F statistics were generated for each SNP ($F = \beta^2 / \mathrm{SE}^2$, where $\beta$ is the SNP-exposure effect size and $\mathrm{SE}$ its standard error) to test for weak instrument bias, and SNPs with F < 10 were excluded [44]. The remaining SNPs were further pruned if they were associated with any of the other proteins or with AD directly (p < 5×10–8 in Kunkle et al. GWAS). This was implemented to exclude SNPs that may affect AD through a pathway other than the exposure protein (horizontal pleiotropy) [45]. SNPs were also removed in the APOE region (chromosome 19, base-pairs 4500000–4580000) as potential confounders that violate MR’s core assumptions, given their known association with AD [46]. Protein exposure SNPs and AD outcome SNPs were harmonized and tested with alleles assumed to be on the forward strand (no palindromic SNPs removed). As a secondary test, the MR analysis was also run with palindromic SNPs flipped and removed if non-inferable.
Causal effects were estimated using inverse variance weighted (IVW) two-sample MR, and sensitivity was tested with MR-Egger, weighted median, and leave-one-out analysis (further details in the Supplementary Material). To test for causality in the opposite direction, this analysis pipeline was repeated with AD as the exposure (using AD SNPs with p-value < 5×10–8 from Kunkle et al. stages 1, 2, and 3) and each protein as the outcome. An interactive web dashboard was built with R Shiny to present the full PRS and MR results (available at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/) with the key results reported herein.
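As an illustration of the instrument-strength filter and the IVW estimator, the sketch below computes per-SNP F statistics and a fixed-effect IVW estimate from summary statistics. It is a simplified stand-in written under stated assumptions, not the MR Base implementation: the summary statistics are hypothetical, and the fixed-effect form with a normal-approximation p-value is chosen for brevity.

```python
import math

def f_statistic(beta_exposure, se_exposure):
    """Per-SNP instrument strength: squared effect size over squared standard error."""
    return (beta_exposure / se_exposure) ** 2

def ivw_estimate(instruments):
    """Fixed-effect IVW estimate from (beta_x, se_x, beta_y, se_y) tuples.

    Equivalent to a weighted regression of SNP-outcome effects on SNP-exposure
    effects through the origin, with weights 1 / se_y^2.
    """
    numerator = sum(bx * by / se_y ** 2 for bx, _, by, se_y in instruments)
    denominator = sum(bx ** 2 / se_y ** 2 for bx, _, _, se_y in instruments)
    beta = numerator / denominator
    se = math.sqrt(1.0 / denominator)
    p_value = math.erfc(abs(beta / se) / math.sqrt(2.0))  # two-sided, normal approximation
    return beta, se, p_value

# Hypothetical harmonized summary statistics: (beta_x, se_x, beta_y, se_y).
candidates = [
    (0.30, 0.05, 0.03, 0.02),
    (0.20, 0.04, 0.02, 0.02),
    (0.05, 0.03, 0.01, 0.02),  # F is about 2.8, so this SNP is a weak instrument
]
strong = [snp for snp in candidates if f_statistic(snp[0], snp[1]) >= 10]
beta_ivw, se_ivw, p_ivw = ivw_estimate(strong)
print(len(strong), round(beta_ivw, 3), round(se_ivw, 3), round(p_ivw, 3))
```

In this toy input the third SNP is dropped by the F < 10 rule and the remaining two yield an estimate that does not reach p < 0.05, mirroring the kind of null result reported in this study.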

RESULTS

Data preparation

The literature review provided 4 new studies [36, 47–49] adding 14 new proteins and bringing the total candidate protein list to 175. From the 175 candidate proteins, 31 passed the shortlist inclusion criteria (> 1 study replication, GWAS data available, see Table 1).
Table 1

Protein shortlist for analysis ordered by number of studies replicated in from literature review

Protein name | UniProt ID | Studies replicated in (N) | SOMAmer ID (Sun et al. study)
Pancreatic prohormone | P01298 | 7 | PPY.4588.1.2
Apolipoprotein E (isoform ɛ3) | P02649 | 6 | APOE.2937.10.2
Complement factor H | P08603 | 5 | CFH.4159.130.1
Plasma protease C1 inhibitor | P05155 | 5 | SERPING1.4479.14.2
Complement C3 | P01024 | 5 | C3.2755.8.2
Fibrinogen (D-dimer) | P02671, P02675, P02679 | 5 | FGA.FGB.FGG.4907.56.1
Serum amyloid P-component | P02743 | 4 | APCS.2474.54.5
Haptoglobin | P00738 | 3 | HP.3054.3.2
Interleukin-3 | P08700 | 3 | IL3.4717.55.2
Complement C4-A/B | P0C0L4, P0C0L5 | 3 | C4A.C4B.4481.34.2
Interleukin-10 | P22301 | 3 | IL10.2773.50.2
Vitronectin | P04004 | 3 | VTN.13125.45.3
Insulin-like growth factor-binding protein 2 | P18065 | 3 | IGFBP2.2570.72.5
Angiopoietin-2 | O15123 | 2 | ANGPT2.13660.76.3
Apolipoprotein B-100 | P04114 | 2 | APOB.2797.56.2
C-C motif chemokine 26 | Q9Y258 | 2 | CCL26.9168.31.3
C-reactive protein | P02741 | 2 | CRP.4337.49.2
Clusterin | P10909 | 2 | CLU.4542.24.2
Granulocyte colony-stimulating factor | P09919 | 2 | CSF3.8952.65.3
Interleukin-13 | P35225 | 2 | IL13.3072.4.2
Interleukin-8 | P10145 | 2 | CXCL8.3447.64.2
Kit ligand | P21583 | 2 | KITLG.9377.25.3
Matrix metalloproteinase-9 | P14780 | 2 | MMP9.2579.17.5
Natriuretic peptides B | P16860 | 2 | NPPB.3723.1.2
Plasminogen | P00747 | 2 | PLG.3710.49.2
Resistin | Q9HD89 | 2 | RETN.3046.31.1
Serotransferrin | P02787 | 2 | TF.4162.54.2
Tenascin | P24821 | 2 | TNC.4155.3.2
Tumor necrosis factor | P01375 | 2 | TNF.5936.53.3
Vascular cell adhesion protein 1 | P19320 | 2 | VCAM1.2967.8.1
Vitamin D-binding protein | P02774 | 2 | GC.6581.50.3

Sun et al.’s GWAS provided summary statistics for 3,301 participants (see characteristics in Supplementary Table 1) covering 10,572,788 variants for each protein, with 5,210,103 variants included for PRS analysis after additional QC for this study (see Supplementary Table 2). Average h2SNP across the proteins was 0.10; however, results were treated as indicative given the average standard error was 0.16 (including 8 proteins with h2SNP less than 0) (see the Supplementary Material for further details). For the AD cohort, 6,244 participants were available for analysis from GERAD1, ADNI, and ANM (see Table 2) with 5,218,413 overlapping variants included for PRS analysis after QC (see the Supplementary Material for further details).
Table 2

Summary characteristics of GERAD1, ADNI, and ANM participants post QC

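Cohort | Total | AD Cases | Controls | Males | Females | 70 and over | Mean age
GERAD1 (N) | 4,492 | 3,277 | 1,215 | 1,640 | 2,852 | 3,189 | 70.5
ADNI (N) | 1,007 | 639 | 368 | 559 | 448 | 886 | 78.2
ANM (N) | 745 | 371 | 374 | 309 | 436 | 641 | 78.2
Total (N) | 6,244 | 4,287 | 1,957 | 2,508 | 3,736 | 4,716 |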
In the AD PRS analysis where only ANM had plasma protein data, 410 participants with plasma protein data remained for analysis after QC (see Table 3).
Table 3

Summary characteristics of ANM participants with plasma protein data post QC

Cohort | Total | AD Cases | MCI Cases | Controls | Males | Females | 70 and over | Mean age
ANM (blood) (N) | 410 | 210 | 104 | 96 | 163 | 247 | 330 | 75.2

PRS results

Plasma protein PRS

In all participants, APOE+ PRS for APOE ɛ3 (p = 6.5×10–21), apolipoprotein B-100 (APOB-100, p = 6.7×10–20), and C-reactive protein (CRP, p = 1.5×10–8) were associated with AD at Bonferroni significance (p < 0.00017) (see Table 4 and full results at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). No other protein APOE+ PRS passed Bonferroni significance in all participants or subgroups. For APOE- PRS (992 SNPs within 750 kilobases of rs429358 on chromosome 19 removed), no proteins, including APOE, APOB, and CRP, passed Bonferroni significance in all participants or subgroups. In all participants, APOE- PRS for vitamin D-binding protein (VDBP) presented the strongest signal (p = 0.009) with 7 other proteins passing nominal significance (p < 0.05). The 3 proteins passing Bonferroni significance in APOE+ PRS (APOE ɛ3, APOB-100, and CRP) and the strongest APOE- PRS signal in all participants (VDBP) were deemed signals warranting further exploration in MR analysis.
Table 4

Summary of proteins with APOE+ PRS associations from meta-analyzed logistic regression with AD in all participants below Bonferroni significance

Protein | PRS p threshold* | SNPs in PRS (N) | Beta** | 95% CI Lower | 95% CI Upper | R2*** | p
APOE ɛ3 | 5.0E-08 | 5 | 0.26 | 0.20 | 0.31 | 0.01 | 6.5E-21
APOB-100 | 5.0E-08 | 2 | 0.58 | 0.46 | 0.70 | 0.01 | 6.7E-20
CRP | 5.0E-08 | 5 | –0.31 | –0.42 | –0.20 | 0.01 | 1.5E-08

*p-value from protein GWAS below which SNPs were included in PRS. PRS p-value threshold for most significant PRS model (based on meta-analyzed p-value) presented. **Estimated based on normalized, per standard deviation of PRS. ***Estimated from meta-analysis outputs using the formula where z is the Z-score for the protein PRS and n is the total sample size.
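As a rough consistency check on Table 4, a Z-score and two-sided p-value can be recovered from each reported beta and 95% confidence interval. The sketch below assumes a symmetric, normal-based interval (half-width of 1.96 standard errors); because the table values are rounded, the recomputed p-values only approximate the reported ones and should not be expected to match them exactly.

```python
import math

def z_and_p_from_ci(beta, ci_lower, ci_upper):
    """Recover a Z-score and two-sided p-value from an estimate and its 95% CI."""
    se = (ci_upper - ci_lower) / (2 * 1.96)  # assumes a normal-based interval
    z = beta / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Beta and 95% CI bounds as reported in Table 4 (rounded to two decimals).
table4 = {
    "APOE e3": (0.26, 0.20, 0.31),
    "APOB-100": (0.58, 0.46, 0.70),
    "CRP": (-0.31, -0.42, -0.20),
}
recomputed = {name: z_and_p_from_ci(*values) for name, values in table4.items()}
for name, (z, p_value) in recomputed.items():
    print(f"{name}: z = {z:.2f}, p = {p_value:.1e}")
```

All three recomputed p-values fall well below the Bonferroni threshold of 0.00017, in line with the table.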


AD PRS

AD PRS models were tested for association with 26 of the 31 shortlist proteins in 1 ANM cohort (due to data availability) and in the same 3 subgroups (males, females and ≥70). In all participants and subgroups, no proteins were associated with AD APOE+ or APOE- PRS (992 SNPs within 750 kilobases of rs429358 on chromosome 19 removed) at Bonferroni significance (p < 0.00019, view full results at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). In all participants, Haptoglobin presented the strongest association with AD APOE+ PRS (p = 0.0107), with CRP (p = 0.0108) the other protein to pass nominal significance. For AD APOE- PRS, Complement factor H was the strongest signal (p = 0.021) with 6 other proteins passing nominal significance. Insulin-like growth factor-binding protein 2 (IGFBP2) was one of these proteins (p = 0.025) and came close to Bonferroni significance in ≥70 (p = 0.00026). Given IGFBP2 was also the only protein to display a nominally significant association for APOE- PRS in both directions (AD APOE- PRS p = 0.025, protein APOE- PRS p = 0.045) it was selected for MR analysis.
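The Bonferroni threshold quoted for the AD PRS analysis can be reproduced with a short calculation. The sketch assumes the number of tests is the number of proteins multiplied by the number of PRS p-value thresholds, following the formula given in Materials and Methods, with 26 proteins and 10 thresholds.

```python
def bonferroni_threshold(alpha, n_proteins, n_thresholds):
    """Significance level divided by the total number of tests performed."""
    return alpha / (n_proteins * n_thresholds)

# 26 proteins with individual-level plasma data, 10 PRS p-value thresholds.
ad_prs_threshold = bonferroni_threshold(0.05, n_proteins=26, n_thresholds=10)
print(f"{ad_prs_threshold:.5f}")  # prints 0.00019
```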

Bi-directional MR results

MR analysis was conducted with five proteins (APOE ɛ3, APOB-100, CRP, IGFBP2, and VDBP) to test for causal associations. Only two proteins (CRP and VDBP) had valid SNP instruments at 5×10–8, but all proteins had available SNP instruments at 5×10–6. No proteins passed nominal significance (p < 0.05) using IVW with SNPs selected at 5×10–8 or 5×10–6 (see Fig. 2 and view full results at https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). Overall, exclusion of non-inferable palindromic SNPs produced similar results across all methods. All instruments selected had an F statistic > 10 and there was limited evidence of horizontal pleiotropy (no Egger intercept p < 0.05) or heterogeneity between SNPs (no Cochran’s Q p < 0.05 for any protein across multiple methods). There was also no evidence of reverse causality when AD was tested as an exposure.
Fig. 2

MR results with protein as exposure and AD as outcome for IVW, MR-Egger and weighted median with alleles assumed on forward strand for harmonization with SNP instruments selected at 5×10–6.


DISCUSSION

Main findings

This study set out to identify whether a shortlist of plasma proteins, previously associated with AD in observational studies, genetically overlap with AD by testing if PRS for these proteins were associated with AD. For proteins with evidence of genetic overlap, MR was conducted to test whether exposure to the protein was causal. The findings provide strong evidence that APOE ɛ3, APOB-100, and CRP genetically overlap with AD and, therefore, identify a genetic signal linked to a set of specific, accessible risk factors. Moreover, this overlap appears to be driven by the APOE genotype, providing evidence that APOE’s strong genetic effect on AD [41] may be partially linked to plasma proteins. However, our MR analysis found no evidence of causality for APOE ɛ3, APOB-100, or CRP. Apolipoproteins in plasma have been linked to AD risk through their role in regulating cholesterol [50], with increased total cholesterol (TC) associated with higher risk [51, 52] while increased high-density lipoprotein cholesterol (HDL-C) appears protective [53]. CRP has also been suggested as a causal factor in AD through its pro-inflammatory role in the immune response [54, 55]. Observational evidence is nuanced, with increased levels of CRP in midlife associated with multiple dementias in later life [56] contrasting with lower levels of CRP found in individuals who actively have AD [57]. The key advantage of MR is its ability to help disentangle causal associations from associations which could be triggered by numerous confounders [24]. Our null MR results for APOE ɛ3, APOB-100, and CRP align with larger MR studies that found no causal associations between AD and CRP [25], TC or HDL-C [58, 59], the hypothesized mechanisms of action for apolipoproteins. However, these results are not yet conclusive. A more recent MR meta-analysis found that both TC and HDL-C showed causal associations with AD [60], with Rasmussen et al.’s larger MR study also suggesting APOE in plasma is causal for AD [18].
Unfortunately, due to data availability, our study was only able to measure APOE isoform ɛ3 rather than the higher-effect isoforms ɛ2 and ɛ4 [61], suggesting our null MR result may be due to the isoform measured rather than a lack of causality. All these findings support the need for a larger scale MR study measuring a consistent and historically comparable set of plasma protein exposures. Independently of APOE, VDBP and IGFBP2 presented suggestive signals of genetic overlap with AD in our study. Research investigating the effect of VDBP and IGFBP2 on AD is less well developed. VDBP has been shown to attenuate Aβ aggregation in vitro and in mice [62], and two recent MR studies have identified a causal relationship between increased levels of VDBP and reduced AD risk [63, 64], with Larsson et al. demonstrating particularly strong evidence from two large samples. In our MR analysis, VDBP did not demonstrate evidence of causality; however, our sample is smaller than that of Larsson et al. and may have lacked statistical power. IGFBP2 has been shown to restrict the neuroprotective effects of insulin-like growth factors [65], and increased levels have been associated with higher AD risk and lower cognitive performance [66]. In our MR analysis, IGFBP2 did not demonstrate evidence of causality. To the best of our knowledge, this is the first study to conduct a targeted MR on IGFBP2 and AD, suggesting further research is required before making a strong inference on causality.

Limitations and future directions

Our study has three key limitations. Firstly, the initial proteins considered for analysis had to demonstrate prior association with AD and, therefore, represent only a subset of proteins in plasma. Whilst a targeted design was deliberate to lower the risk of false positives and support interpretability, it does increase the probability that causal proteins for AD were not included in the analysis. This exclusion risk was amplified by data availability constraints and a reliance on the SomaLogic platform, with only 53% of candidate proteins from the literature review possessing publicly available genetic summary data at the time of analysis. Most notably, alpha-2-macroglobulin and alpha-1-antitrypsin were each replicated in 6 studies in the literature review (the same number of replications as APOE) but were excluded due to a lack of genetic data. Moving forward, a planned outreach effort to aggregate summary data from other research groups and assay platforms (as demonstrated by [27]) supported by integrating new data [67] could improve protein coverage and increase statistical power. Secondly, the statistical power to correctly identify true positives was limited across multiple steps of the analysis. Studies are now routinely conducted with samples of 100,000+ individuals for common human traits [68–70] in order to capture the small effect sizes of individual SNPs [71]. Sample size is particularly important for traits with low to moderate heritability where the effect size of individual SNPs is likely to be lower [72]. In this study, estimates of observed SNP heritability (h2SNP) for proteins were themselves hampered by sample size (see https://github.com/bulik/ldsc/wiki/FAQ), but a median h2SNP of 0.09 (interquartile range 0.006–0.22) suggests analyzed proteins were, at most, moderately heritable.
Moderate heritability weakens both PRS and MR [73], because PRS and MR instrumental variables are composites of genetic liability and therefore depend on the heritability of the trait. This may partly explain the limited genetic overlap and the low variance in AD explained (R²) by individual protein PRS (see https://alexhandy1.shinyapps.io/ad-genetic-overlap-web-results/). Low R² may also reflect that proteins often exert their effects as part of large molecular networks rather than as individual entities [74]. Therefore, larger sample sizes and multivariate statistical tests are required to robustly corroborate the lack of causal protein-AD associations in this study. Lastly, the phenotype data have several inherent limitations. Both protein and AD case target data were collected at a single point in time. For AD, this means a proportion of controls may later have become cases; for proteins, it means that variability observed over time [75, 76] is not captured. For AD there is the added limitation of using a clinical diagnosis as a phenotype. Clinical manifestation is heterogeneous [77, 78] and in-life diagnosis can be inaccurate [79, 80], implying that adding endophenotype measures closer to causal biology (e.g., hippocampal volume) and symptomatology (e.g., cognitive performance) may allow more precise association analysis. Future analyses should, therefore, aim to expand protein coverage, increase sample sizes, and extend PRS and MR analyses to AD endophenotypes.
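The link between sample size, variance explained, and power described above can be made concrete with a rough calculation. The sketch below is an approximation under stated assumptions, not the power analysis used in this study: it treats the outcome as continuous and tests a single predictor in a linear model, so the test statistic is approximately normal with mean sqrt(n × r2 / (1 − r2)). The function name `approx_power` and the variance-explained values are hypothetical; the sample sizes 410 and 6,244 and the Bonferroni threshold 0.00017 are taken from the study's abstract.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(n, r2, alpha):
    """Approximate power of a two-sided 1-df test for a predictor that
    explains a fraction r2 of outcome variance in n samples."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1.0 - alpha / 2.0)
    shift = sqrt(n * r2 / (1.0 - r2))  # mean of the test statistic under the alternative
    return (1.0 - norm.cdf(z_crit - shift)) + norm.cdf(-z_crit - shift)


ALPHA_BONF = 0.00017              # Bonferroni threshold reported in the abstract
SAMPLE_SIZES = (410, 6244, 100_000)
R2_VALUES = (0.0005, 0.001, 0.005)  # hypothetical variance explained by a PRS

for n in SAMPLE_SIZES:
    for r2 in R2_VALUES:
        power = approx_power(n, r2, ALPHA_BONF)
        print(f"n={n:>7}  r2={r2:.4f}  power={power:.3f}")
```

Under these assumptions, a predictor explaining 0.1% of variance is almost never detected at n = 410 but is detected almost always at n = 100,000, which matches the argument that traits with small per-variant effects need very large samples.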

CONCLUSION

This study provides evidence that apolipoproteins and CRP PRS are associated with AD and identifies a genetic signal linked to a specific, accessible risk factor. However, none of the proteins tested in MR demonstrated evidence of causality. This study was conducted in a moderate sample size and may have lacked the statistical power to identify true causal associations. Therefore, this study provides a framework for future research to robustly interrogate protein causality in larger samples.
  73 in total

1.  Accuracy of the clinical diagnosis of Alzheimer disease at National Institute on Aging Alzheimer Disease Centers, 2005-2010.

Authors:  Thomas G Beach; Sarah E Monsell; Leslie E Phillips; Walter Kukull
Journal:  J Neuropathol Exp Neurol       Date:  2012-04       Impact factor: 3.685

2.  Neuropsychiatric symptoms in Alzheimer's disease.

Authors:  Constantine G Lyketsos; Maria C Carrillo; J Michael Ryan; Ara S Khachaturian; Paula Trzepacz; Joan Amatniek; Jesse Cedarbaum; Robert Brashear; David S Miller
Journal:  Alzheimers Dement       Date:  2011-09       Impact factor: 21.566

Review 3.  Apolipoprotein E and Alzheimer disease: risk, mechanisms and therapy.

Authors:  Chia-Chen Liu; Chia-Chan Liu; Takahisa Kanekiyo; Huaxi Xu; Guojun Bu
Journal:  Nat Rev Neurol       Date:  2013-01-08       Impact factor: 42.937

4.  Association of genetic risk factors with cognitive decline: the PATH through life project.

Authors:  Shea J Andrews; Debjani Das; Nicolas Cherbuin; Kaarin J Anstey; Simon Easteal
Journal:  Neurobiol Aging       Date:  2016-02-21       Impact factor: 4.673

Review 5.  Regulation of plasma cholesterol by lipoprotein receptors.

Authors:  M S Brown; P T Kovanen; J L Goldstein
Journal:  Science       Date:  1981-05-08       Impact factor: 47.728

Review 6.  Candidate blood proteome markers of Alzheimer's disease onset and progression: a systematic review and replication study.

Authors:  Steven J Kiddle; Martina Sattlecker; Petroula Proitsi; Andrew Simmons; Eric Westman; Chantal Bazenet; Sally K Nelson; Stephen Williams; Angela Hodges; Caroline Johnston; Hilkka Soininen; Iwona Kłoszewska; Patrizia Mecocci; Magda Tsolaki; Bruno Vellas; Stephen Newhouse; Simon Lovestone; Richard J B Dobson
Journal:  J Alzheimers Dis       Date:  2014       Impact factor: 4.472

7.  An atlas of polygenic risk score associations to highlight putative causal relationships across the human phenome.

Authors:  Tom G Richardson; Sean Harrison; Gibran Hemani; George Davey Smith
Journal:  Elife       Date:  2019-03-05       Impact factor: 8.140

8.  A plasma protein classifier for predicting amyloid burden for preclinical Alzheimer's disease.

Authors:  Nicholas J Ashton; Alejo J Nevado-Holgado; Imelda S Barber; Steven Lynham; Veer Gupta; Pratishtha Chatterjee; Kathryn Goozee; Eugene Hone; Steve Pedrini; Kaj Blennow; Michael Schöll; Henrik Zetterberg; Kathryn A Ellis; Ashley I Bush; Christopher C Rowe; Victor L Villemagne; David Ames; Colin L Masters; Dag Aarsland; John Powell; Simon Lovestone; Ralph Martins; Abdul Hye
Journal:  Sci Adv       Date:  2019-02-06       Impact factor: 14.136

9.  Serum Parathyroid Hormone, 25-Hydroxyvitamin D, and Risk of Alzheimer's Disease: A Mendelian Randomization Study.

Authors:  Susanna C Larsson; Matthew Traylor; Hugh S Markus; Karl Michaëlsson
Journal:  Nutrients       Date:  2018-09-06       Impact factor: 5.717

Review 10.  Questions concerning the role of amyloid-β in the definition, aetiology and diagnosis of Alzheimer's disease.

Authors:  Gary P Morris; Ian A Clark; Bryce Vissel
Journal:  Acta Neuropathol       Date:  2018-10-22       Impact factor: 17.088
