Literature DB >> 32898681

Implicating causal brain imaging endophenotypes in Alzheimer's disease using multivariable IWAS and GWAS summary data.

Katherine A Knutson¹, Yangqing Deng¹, Wei Pan².

Abstract

Recent evidence suggests the existence of many undiscovered heritable brain phenotypes involved in Alzheimer's Disease (AD) pathogenesis. This finding necessitates methods for the discovery of causal brain changes in AD that integrate Magnetic Resonance Imaging measures and genotypic data. However, existing approaches for causal inference in this setting, such as the univariate Imaging Wide Association Study (UV-IWAS), suffer from inconsistent effect estimation and inflated Type I errors in the presence of genetic pleiotropy, the phenomenon in which a variant affects multiple causal intermediate risk phenotypes. In this study, we implement a multivariate extension to the IWAS model, namely MV-IWAS, to consistently estimate and test for the causal effects of multiple brain imaging endophenotypes from the Alzheimer's Disease Neuroimaging Initiative (ADNI) in the presence of pleiotropic and possibly correlated SNPs. We further extend MV-IWAS to incorporate variant-specific direct effects on AD, analogous to the existing Egger regression Mendelian Randomization approach, which allows for testing of remaining pleiotropy after adjusting for multiple intermediate pathways. We propose a convenient approach for implementing MV-IWAS that solely relies on publicly available GWAS summary data and a reference panel. Through simulations with either individual-level or summary data, we demonstrate the well controlled Type I errors and superior power of MV-IWAS over UV-IWAS in the presence of pleiotropic SNPs. We apply the summary statistic based tests to 1578 heritable imaging derived phenotypes (IDPs) from the UK Biobank. MV-IWAS detected numerous IDPs as possible false positives by UV-IWAS while uncovering many additional causal neuroimaging phenotypes in AD which are strongly supported by the existing literature.

Entities: Chemical Disease Gene Mutation Species

Keywords: Causal inference; Genetic pleiotropy; Instrumental variable; MRI; MV-IWAS; Mendelian randomization; TWAS

Mesh：

Year: 2020 PMID： 32898681 PMCID： PMC7778364 DOI： 10.1016/j.neuroimage.2020.117347

Source DB: PubMed Journal: Neuroimage ISSN： 1053-8119 Impact factor: 6.556

Introduction

Alzheimer’s Disease (AD) is a genetically complex disorder characterized by a collection of physiological brain changes, including many heritable phenotypes that are detectable via Magnetic Resonance Imaging (MRI) (Wenk, 2003), (Matsuda, 2017). However, recent research indicates that AD pathogenesis involves many more regions of the brain than previous realized (Frozza et al., 2018). As such, a more thorough understanding of the genetically-regulated structural and functional brain changes that drive AD progression is important for further characterization of the disease and development of targeted therapies for AD associated phenotypes. Advances toward high-speed and affordable genotyping and multimodal imaging technologies has facilitated the collection of these data for large-scale studies, such as the Alzheimer’s Disease Neuroimaging Initiative (ADNI) and UK Biobank (UKBB). Accordingly, powerful methods for integrating genetic and summarized brain measures (such as regional volumes) to test for causal imaging ROIs in AD are increasingly important (Pluta et al., 2018), (Shen and Thompson, 2020). Some existing approaches include Mendelian Randomization (MR) and the univariate Imaging Wide Association Study (UV-IWAS) (Davey Smith and Ebrahim, 2008), (Xu et al., 2017), (Zhao et al., 2019a), (Zhao et al., 2019b), (Gamazon et al., 2015), (Pluta et al., 2017). The latter method, which will be a focus of this work, is an adaptation of the popular Transcriptome Wide Association Study (TWAS) that uses a summarized imaging measure in place of gene expression to uncover possibly causal brain imaging phenotypes in neurodegenerative diseases (Gusev et al., 2016). In our previous study, we tested the AD associations of 1578 heritable UKBB brain imaging derived phenotypes (IDPs) using summary statistic based univariate Mendelian Randomization (MR), UV-IWAS, and the related univariate aSPU test (Knutson and Pan, 2020), (Pan, 2009), (Pan et al., 2015). We identified ten significant IDPs by UV-IWAS tests, proffering a set of endophenotypes with putative causal effects on AD under satisfaction of key modeling assumptions (Wainberg et al., 2019). One of these assumptions, which is required for consistent estimation of causal effects, is that the SNPs used in the IWAS model (as instrumental variables, IVs) only affect AD by means of the endophenotype being tested. This so-called”exclusion restriction” is violated if any SNP demonstrates horizontal genetic pleiotropy, the phenomenon in which a variant affects multiple causal intermediate risk phenotypes. The presence of pleiotropic variants in univariate IV-analysis can ultimately lead to inflated Type I error rates and inadequate statistical power, as is later demonstrated in UV-IWAS through simulations. These consequences have warranted skepticism toward the results of univariate MR methods and has motivated the development of tests, such as MR-PRESSO and the heterogeneity Q-test, to identify and correct for pleiotropic instruments in univariate MR (Pickrell, 2015), (Verbanck et al., 2018), (DelGreco et al., 2015), (Bowden et al., 2019), (Schaid et al., 2016). In particular, it has been shown that there are wide-spread pleiotropic effects of SNPs across the genome. In this study, we consider a multivariate extension of the univariate IWAS model, namely MV-IWAS, which provides consistent causal effect estimates in the presence of pleiotropic variants by accounting for multiple potential intermediate pathways. This approach is closely related to the existing Multivariable Mendelian Randomization IVW regression (MV-MR IVW), as outlined by Burgess and Thompson (Burgess and Thompson, 2015). We propose an extension to the MV-IWAS model which incorporates variant-specific random effects to quantify the remaining pleiotropy not captured by those phenotypes in the multivariate model. We call this approach MV-IWAS-Egger due it’s similarity to MR-Egger regression (Rees et al., 2017). In addition to correcting for pleiotropic pathways, MV-IWAS can boost power by adjusting for risk phenotypes which are not influenced by pleiotropic SNPs in the multivariate model. We apply MV-IWAS to the individual level data of 14 brain ROIs from ADNI and identify 2 endophenotypes with causal effects on AD, namely the left hippocampus and right inferior temporal cortex volumes. We further propose a practical extension to the MV-IWAS model that depends solely on endophenotype and AD GWAS summary statistics and an external reference panel of similar ancestry. This approach accounts for the LD-structure of a set of possibly correlated SNPs, making it advantageous over the existing summary statistic based MV-MR method which is restricted to a set of strictly independent instruments (Burgess and Thompson, 2015), (Eleanor Sanderson and Borden, 2019). As shown in our previous work (Knutson and Pan, 2020), using multiple correlated SNPs (as IVs) often improves statistical power over using only independent SNPs as in MR. We validate the summary statistic based MV-IWAS via simulations. Our simulations also reveal the improved performance of MV-IWAS-Egger over the related MV-MR-Egger regression for detecting remaining pleiotropic effects. We apply this method to the publicly available summary data from UKBB and IGAP and compare these results against those from univariate IWAS tests. Finally, we validate the results of our UKBB application using the GWAS summary data of seven subcortical volumes from the ENIGMA project.

Methods

TWAS In the 2SLS framework

The Transcriptome Wide Association Study (TWAS) is a popular approach for inferring the causal effect of an endophenotype, typically gene expression, on a disease trait. Statistically, TWAS is an application of the 2-Stage Least Squares (2SLS) method, an instrumental variable (IV) technique that accounts for latent confounding in association testing. The assumed true model under the TWAS framework is given in Eqs. 1a and 1b, where and . Here, U represents a set of confounders acting on the endophenotype (X) and disease (Y), and e and e are random errors. Since U is unobserved, the working model from which standard OLS estimation proceeds is a simple regression of Y on X (i.e. estimating β in Eq. 1b while ignoring the confounders embedded in the error term). The resulting effect estimates are inconsistent due to the correlation between X and . Note that, throughout this paper, for simpler notation we drop out the intercept term in a linear regression model by assuming that the data have been suitably standardized (to have a zero-intercept). The 2SLS/TWAS approach circumvents this problem by using a set of SNPs (Z) as valid instrumental variables, which are defined by the 3 key assumptions illustrated in Fig. 1a (Labrecque and Swanson, 2018). Explicitly, Z represents a matrix of SNP genotypes, with the jth column corresponding to the effect-allele counts (0, 1, or 2 for the additive model) of variant j for n subjects. The first assumption (relevance) requires correlation between Z and X, and is generally maintained by selecting a set of SNPs that are associated with the endophenotype. The second assumption prohibits associations between SNPs and the confounders U. The third IV assumption, often referred to as the “Exclusion Restriction”, requires that the only pathway by which the SNPs affect disease passes through the endophenotype being tested. This last assumption may be violated due to horizontal genetic pleiotropy, the phenomenon by which a genetic locus affects multiple intermediate phenotypes causal to disease. As such, a method to control for potential pleiotropic pathways is desirable. One such approach is a multivariate extension to the univariate TWAS method, as to be described in the coming sections.

Fig. 1.

a) A causal directed acyclic graph (DAG) representing the assumed model for univariate IV analysis with satisfaction of all 3 instrumental variable assumptions: 1) Relevance: Z is correlated with X, 2) Exchangability: Z is uncorrelated with confounders, and 3) Exclusion Restriction: Z does not affect Y, except through X. Here, C represents the set of additional causal phenotypes for disease which are not associated with the IVs. b) The assumed model under the MV-IWAS framework, allowing the IVs to be pleiotropic for multiple causal phenotypes (X1 and X2) for Y. Here, Z1 and Z exhibit horizontal pleiotropy, while Z2 does not. Under this setting, UV-IWAS estimates for X1 or X2 will be inconsistent. c) The assumed model for MV-IWAS-Egger, where SNPs Z are pleiotropic for pathways outside of X1 and X2, as represented by the direct pathway from Z to Y.

The univariate IWAS model with genetic pleiotropy

The 2SLS method is so-called because it is performed in 2 stages, as given in Eq. 2a and 2b. In the first stage, the endophenotype is regressed on SNPs (as IVs) to obtain an “imputed” endophenotype, , namely the fitted values of the Stage 1 model. In Stage 2, the disease is regressed on the imputed trait, which represents the “genetically-regulated” component of the endophenotype. Explicitly, in Stage 2 we model (using IV assumption 2), where E[X|Z] is estimated in Stage 1. If X is specified as a neuroimaging phenotype, as considered in this study, the resulting approach is referred to as the univariate Imaging Wide Association Study (IWAS) (Xu et al., 2017). IWAS is mathematically identical to the existing TWAS/2SLS model and differs only in its application. A Wald test for in the Stage 2 IWAS model tests the causal effect of the genetically-regulated component of an imaging endophenotype on Y, which is typically specified as a complex trait (such as AD). The consistency achieved by 2SLS does not hold if a GLM is used in Stage 2, such as with logistic regression for a binary disease trait (Xue and Pan, 2019). While alternative IV approaches with superior properties for binary or nonlinear outcomes have been proposed, in this study we simply model dichotomous traits in a linear Stage 2 to adhere to the standard MR/TWAS/IWAS framework (Uddin et al., 2015), (Burgess et al., 2014), (Newey, 2013), (Amemiya, 1974), (Johnston et al., 2008). As shown previously (Xue and Pan, 2019) and to be shown through simulations to follow, due to small genetic effects, a logistic regression model can be well approximated by a linear model. As previously discussed, the utility of TWAS (and thereby IWAS) for causal inference and consistent estimation relies on satisfaction of the IV assumptions for all Stage 1 SNPs. Violations of the exclusion restriction in univariate IWAS due to horizontal genetic pleiotropy can be shown mathematically by decomposing the Stage 2 error term (Eq. 2b) into two components, , where e represents the random error and C represents the set of r − 1 intermediate phenotypes (excluding X) with genetically regulated effects on Y. Without accounting for these intermediate pathways in Stage 2, is inconsistent due to pleiotropy-induced correlation between and C. This challenge motivates the multivariate IWAS model, an extension of the univariate model that adjusts for multiple genetically regulated disease pathways.

The multivariable IWAS model

The multivariable IWAS model (MV-IWAS), given by the system of equations in 3, is a straightforward multiple linear regression extension of UV-IWAS (Lin et al., 2015), (Barbeira et al., 2019). In Stage 1, r endophenotypes with putative disease associations are imputed, with the kth endophenotype given by for all . Each Stage 1 model uses an endophenotype-specific set of SNPs, Z( of dimension n by p. We define p to be the total number of unique SNPs used across all k endophenotypes. The multivariate Stage 2 model incorporates all r imputed endophenotypes, thereby adjusting for possible pleiotropic pathways from SNPs-to-disease. For an appropriately chosen set of Stage 1 traits, this model will yield consistent estimates of an endophenotype’s genetically regulated causal effect on disease . In the case that endophenotypes which are not influenced by pleiotropic variants but are reasonably predictive of the disease are included in Stage 2, MV-IWAS will gain efficiency over the univariate test by reducing the residual variance. The least squares estimates for β and Var(β) are given in Eqs. 4a and 4b, where σ2 is the residual variance and Z is an n × p matrix of genotypes for all Stage 1 SNPs, i.e., the union of SNPs represented in Z(1), …, Z(. W represents the p × r matrix of SNP effect estimates of the r endophenotypes, where the kth column includes all p estimates of α( and zero entries for all p − p SNPs not includes in the Stage 1 model for endophenotype k. If the Stage 1 SNP-sets for all r endophenotypes are disjoint (i.e. each of the p SNPs only used to impute 1 phenotype), W will be a block diagonal matrix with r blocks. The case in which the samples used to estimate α (stage 1) and β (stage 2) come from different studies is referred to as “2-Sample” MV-IWAS, which is often a more convenient alternative to the one-sample case when disease and endophenotype data are not available from the same study. OLS standard error estimation for IWAS assumes uncorrelated and homoscedastic errors. The former is generally upheld by strictly limiting analyses to a set of unrelated subjects, where relatedness can be determined by self-report or, more rigorously, using existing statistical methods for relationship inference based on genotype data (Manichaikul et al., 2010). The latter assumption, namely , will be violated for binary outcomes using OLS, yielding biased estimates for . Even so, our simulations reveal comparable performance (type I errors, power, and coverage) for continuous and binary outcomes, suggesting that the least squares approach may still be approximately valid for dichotomous traits in the current application. Existing studies have proposed the use of robust standard errors to allow for heteroscedasticity in 2SLS, a possible direction for future work in MV-IWAS (Pacini and Windmeijer, 2016),(Baiocchi et al., 2014).

Accounting for remaining directional pleiotropy with MV-IWAS-Egger

The above formulation achieves consistent causal estimation of endophenotype effects when all pleiotropic pathways from Z to Y are captured through X1, …, X. However, accounting for all intermediate pathways is unrealistic for complex diseases, as some may be unknown or unmeasured. What’s more, as predictive power for each endophenotype is improved by increasing the number of Stage 1 SNPs, the probability that any one variant affects AD through an unrepresented phenotype is high. Here, we extend the Stage 2 model in Eq. 3 to account for residual genetic effects on Y above those mediated by the set of X, as illustrated in Fig. 1c. Specifically, we incorporate variant-specific random effects , where μ represents the mean pleiotropic effect across all p SNPs (Eq. 5a). Under the regression equation in 5b, we can estimate and test for both endophenotype-specific causal effects and the presence of remaining directional pleiotropy . Here, Z represents the p-vector of genotypes for subject i. OLS estimation of is equivalent to the MV-IWAS estimates given in 4a after replacing W with , where () is a p-vector of ones. To estimate their corresponding variances using OLS, we rely on the following approximation: . This simplification assumes that is constant across subjects, which is particularly unrealistic for small p. However, it is reasonable to expect that , since each common variant explains a small proportion of the total disease heritability in complex disease (and made even smaller after removing the proportion explained via the adjusted intermediate phenotypes). Therefore, approximating will likely be valid in practice. We assess the validity of this approach for various relative magnitudes through simulation. Variance estimates are then approximated as which closely resembles the MV-IWAS estimates in 4b. By adjusting for additional pleiotropic genetic effects, MV-IWAS-Egger estimates are expected to be less biased than under the original MV-IWAS model. However, the original approach will have greater efficiency (and thereby power) if there are no additional SNPs-to-disease pathways above those included in the model. Additionally, in anomalous cases where is relatively large, Var() will be underestimated using the residual variance approximation in MV-IWAS-Egger, again yielding some loss of efficiency.

MV-IWAS Using GWAS summary statistics

Given individual level genotype, endophenotype, and disease trait data, the OLS estimates in Eqs. 4a and 4b and their MV-IWAS-Egger counterparts are straightforward to compute. However, concerns of patient privacy often limits access to these data. As such, we desire an extension of the MV-IWAS model that depends only on publicly available GWAS summary statistics, possibly from different studies, and an external reference panel. We outline such an approach in the following section, which parallels the method proposed by Deng and Pan (Deng and Pan, 2017a), (Deng and Pan, 2017b), (Yang et al., 2012). Through simulation, we evaluate this model under both the one and two sample cases (i.e. disease and endophenotype GWAS from the same sample versus different samples of similar ancestry). Without individual level data, the quantities ZZ, ZY and YY are unknown and must be estimated. Henceforth, we refer to the sample of n subjects from the disease-trait GWAS as the “disease-GWAS sample”, and denote the genotype matrix of n subjects from a reference sample of similar ancestry by Z. For a normalized genotype matrix, sufficiently large sample size and a homogeneous population, ZZ can be estimated using a reference panel of similar ancestry as In the following section, we discuss some guidelines for estimating this quantity. Estimation of ZY relies on the marginal SNP effect estimates on the disease trait, as reported in GWAS summary statistics. We denote the jth SNP’s genotypes as Z (n × 1) and it’s effect on Y as , which is estimated by with variance . Estimates for each component of the p-vector ZY, denoted as for SNP j, directly follows from manipulation of , given by where is estimated using an external reference panel as in Eq. 7. Finally, we manipulate the marginal variance estimates for each SNP to get p estimates of YY, given by for SNP j. Our final estimate of is obtained by taking the median value of all . Plugging , and into Eq. 4a and 4b (or the related MV-IWAS-Egger estimates) gives MV-IWAS estimates of the genetically-regulated and pleiotropy-adjusted effects of each endophenotype based solely on GWAS summary statistics and a reference panel. It is noted that, in our previous works on GWAS summary data-based TWAS/IWAS (Knutson and Pan, 2020; Xu et al., 2017), the tests were based on univariate/marginal Z-statistics of SNPs; here, our proposed test is based on estimating the regression coefficient β in a joint/multivariate regression model.

Estimating SNP-Correlations using an external reference sample

When individual level data is not available, the choice of GWAS and corresponding reference sample for estimating SNP correlations (ZZ) can have a substantial impact on the performance of the summary statistic based MV-IWAS estimation (Deng and Pan, 2018). We expect that the SNP correlations estimated from a study sample will asymptotically approximate the true correlation structure of an ancestrally homogeneous subpopulation. Therefore, should be a consistent estimate of for large GWAS and reference samples. In their 2017 exploration of LD estimation in fine-mapping experiments, Brenner et al. conclude that a reference panel of 1000 subjects is sufficient to approximate the LD-structure of GWAS sample of 5000 participants (Benner et al., 2017). Some regularization approaches have been proposed to improve estimation of , though these techniques can not overcome the handicap of an unrepresentative or small sample (Benner et al., 2017). Deng and Pan propose a multiple imputation type approach for handling LD estimation with small sample sizes and emphasize its improved power over existing methods in some situations, such as penalization of the off-diagonal components of for some (Deng and Pan, 2018; Mak et al., 2017). We do not apply these techniques in the applications of this study, but they nonetheless warrant consideration in summary statistic based methods such as MV-IWAS.

Advantage of MV-IWAS over existing multivariate approaches

MV-IWAS and MV-IWAS-Egger are analogous to the existing Multivariable Inverse-Variance Weighted and Egger Regression Mendelian Randomization methods (MV-IVW, MV-Egger) using neuroimaging phenotypes, respectively (Burgess and Thompson, 2015). In fact, provided all SNPs are strictly uncorrelated, UV-IWAS is asymptotically equivalent to the IVW MR. However, the IVW and Egger estimators (and by extension their multivariate version) suffer from substantial power loss when restricted to a smaller set of independent IVs/SNPs (Hemani et al., 2018), (Bowden et al., 2016). IWAS can therefore achieve greater power over MR by leveraging a larger set of SNPs (as IVs) in moderate LD (Knutson and Pan, 2020). Another related method is an extension of UV-IWAS to testing on multiple (neuroimaging) endophenotypes simultaneously (Xu et al., 2017). Their method improves statistical power for association using the so-called doubly adaptive sum of scores test (daSPU), which data-adaptively determines the relative contributions of the SNPs and the endophenotypes in a collapsed test statistic (Kim et al., 2016). This approach has particular utility in gene-based testing (i.e. Stage 1 SNPs chosen from within or near the coding region of a gene), where the collective information across intermediate genetic pathways boosts power for detecting disease-associated genes. However, unlike UV-IWAS and MV-IWAS, the data-adaptive test does not support causal interpretation. More importantly, as a global test (with a null hypothesis of no disease association with any of the endophenotypes), the method does not estimate the joint effects of each endophenotype after adjusting for other endophenotypes. In contrast, our proposed MV-IWAS facilitates formal inference for endophenotype-specific associations with disease. MV-IWAS is therefore a more appropriate and powerful parametric approach for uncovering causal genetically-regulated endophenotype-disease associations using multiple possibly correlated SNPs.

Applied analysis procedure

In this report, we use MV-IWAS to test for genetically regulated AD-endophenotype causal associations using both individual level and summary data from multiple cohorts. The aims of these analyses are two-fold; 1) to identify a set of MRI-derived phenotypes with causal associations in AD, with particular interest in causal IDPs that are replicable across multiple datasets and 2) to compare the IDPs identified by MV-IWAS against analogous UV-IWAS and Mendelian Randomization approaches. Fig. 2 summarizes the three applied data analyses performed in this study. Our first application uses individual level data from the first phase of the Alzheimer’s Disease Neuroimaging Initiative (ADNI1). We then apply the 2-sample summary statistic based approaches to the publicly available phase 1 GWASs of brain imaging phenotypes from UKBiobank and the AD GWAS from the International Genomics of Alzheimer’s Project (IGAP). Finally, we replicate results from the UKBB application using summary data on seven subcortical volumes from the ENIGMA network. For all LD estimation, we use 503 samples of European ancestry from Phase 3 of the 1000 Genomes Project (Consortium, 2010). In the following sections, we describe each of these datasets and detail our Stage 1 model building procedures. A detailed analysis pipeline and relevant code for our summary statistic application can be found at https://github.com/kathalexknuts/MVIWAS.

Fig. 2.

Workflow for applied analysis of 3 different data sources.

Data

Data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org. ADNI1 has collected clinical, biomarker, and imaging data from 200 AD diagnosed patients, 400 mild cognitive impaired (MCI) patients, and 200 cognitively normal (CN) patients at baseline (Saykin et al., 2015). We only consider the 735 individuals who have both phenotype and genotype data (172 AD, 211 Cognitively Normal, 352 Mild Cognitive Impairment) and exclude subjects who self identify as Latino or His-panic. We categorize all MCI and CN patients as “non-AD” cases. We consider the baseline gray matter volumes from 14 brain ROIs with suggestive roles in AD, as used in the study of Xu et al. (2017). Specifically, we consider the left and right volumes of the Inferior Parietal, Inferior Temporal, Medial Orbitofrontal, Parahippocampal, Precuneus, Posterior Cingulate, and Hippocampal regions. As to be discussed, we adjust our analysis for the potential confounders of age, gender, handedness, education level and intracranial volume at baseline. We perform genetic imputation using the 1000 Genomes Project (1000G) Phase 3 reference panel using the SHAPEIT (for pre-phasing) and IMPUTE2 software. We filter out SNPs with MAF < 0.05 and R2 > 0.8, leaving 6,286,276 remaining variants. The UK Biobank (UKBB) is a large-scale prospective cohort study with phenotype and genetic data on ~ 500,000 subjects, and additional brain MRI data for a smaller subset of participants (projected 100,000 subjects by 2022). In 2018, Elliott et al. reported GWAS summary statistics at 11,734,353 SNPs for 1578 heritable brain imaging phenotypes (IDPs), including 306 functional, 656 diffusion, and 616 structural MRI measures (Elliott et al., 2018). These round 1 GWASs were based on the data from 8428 UKBB participants. Preceding association testing, all IDPs were quantile normalized, outliers were removed, and the following set of confounders were regressed out: age, sex, 4 measures of the subject’s position, 2 measures of head motion, volumetric head size, the top 40 genetic PCs, and 10 “drift” covariates (to account for software, hardware, or protocol changes throughout data collection). Various transformation on these covariates were also included (e.g. age2), as described in (Elliott et al., 2018). ENIGMA (”Enhancing Neuro Imaging Genetics Through Meta Analysis”) is a world-wide network focused on combining imaging and genetic data for the study of brain function and disease. As part of this initiative, a 2015 study publicly reported GWAS summary data on 7 subcortical region volumes (left + right hemispheres) using 13,171 subjects (Hibar et al., 2015), (Wu, 2020). These ROIs include the nucleus accumbens, caudate, putamen, pallidum, amygdala, hippocampus and thalamus, with volumes all correct for ICV. For AD information, we use summary statistics from the International Genomics of Alzheimer’s Project (IGAP) (Lambert et al., 2013). IGAP is a large two-stage study based upon genome-wide association studies (GWAS) on individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data on 7,055,881 single nucleotide poly-morphisms (SNPs) to meta-analyse four previously-published GWAS datasets consisting of 17,008 Alzheimer’s disease cases and 37,154 controls (The European Alzheimer’s Disease Initiative (EADI), the Alzheimer’s Disease Genetics Consortium (ADGC), The Cohorts for Heart and Aging Research in Genomic Epidemiology consortium (CHARGE), The Genetic and Environmental Risk in AD consortium (GERAD)). In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8572 Alzheimer’s disease cases and 11,312 controls. Finally, a meta-analysis was performed combining results from stages 1 and 2.

Stage 1 model selection

To select Stage 1 SNPs for the 14 ADNI1 endophenotypes with individual level data, we implement a multistep approach that accounts for possible confounders of the endophenotype-AD relationship. We first perform LD pruning to remove loci based on high pairwise LD (R2 > 0.8), retaining the SNP with the highest MAF in each pair. Chromosomes 2 and 22 correspond to the largest and smallest post-pruning genotype matrices with 95,431 and 16,150 SNPs, respectively. Next, we regress each endophenotype on baseline measures for age, gender, handedness, education level and intracranial volume. Using the residuals from this first stage as an outcome and genotype matrix, we fit an elastic net for each chromosome separately, with α = 0.5 (i.e. equal importance assigned to L1 and ridge penalties) and the regularization parameter (λ) chosen based on minimum 5-fold CV error. We retain all SNPs with non-zero coefficients and estimate their joint effects in a multiple linear regression using SNPs across all chromosomes. A summary of the number of remaining SNPs for each of the 14 endophenotypes is given in Table 1. The average R2 between corresponding true and imputed endophenotypes is 0.5883 (min = 0.4146, max = 0.7233), demonstrating the adequate predictive power of our Stage 1 models for these data.

Table 1

Number of genome-wide SNPs with non-zero elastic net coefficients used as Stage 1 SNPs for 14 ADNI1 endophenotypes. Elastic net was fit with confounder adjusted residuals as a response and genotype matrix as predictors.

Endophenotype	Number of Stage 1 SNPs (Left/Right)
Hippocampus	148/101
Inferior Parietal	80/77
Inferior Temporal	310/77
Medial Orbitofrontal	69/1126
Parahippocampus	84/183
Precuneus	234/175
Posterior Cingulate	221/337

For summary statistic based MV-IWAS using UKBB IDPs, we use LD clumping and thresholding on each endophenotype GWAS to select variants. For each of the 1578 heritable IDPs, we perform LD Clumping using the 1000 Genomes reference panel (EUR) with a clumping radius of 1 Mb and R2 cutoff of 0.1. In our previously published work, we implemented univariate IWAS and aSPU tests for IDP-mediated-AD associations using 3 different significance thresholds and found that the p < 5 * 10−5 threshold yielded the greatest number of significant univariate IWAS tests. We will use this threshold for Stage 1 model selection in the present study. Across all heritable IDPs, the minimum, mean, and maximum number of Stage 1 SNPs were 43, 81, and 132, respectively.

Simulation framework

We conduct extensive simulation studies to assess the performance of MV-IWAS and MV-IWAS-Egger across numerous realistic scenarios, and compare these methods against UV-IWAS and, in summary data applications, MV-MR. We evaluate these approaches for both quantitative and binary disease traits under the one and two sample cases. We perform two sets of simulations, the first comparing UV- and MV-IWAS in the absence of remaining directional pleiotropy and the second evaluating all methods across varied magnitudes of the mean direct pleiotropic effect (μ). In this section, we outline the set-up for these simulations. To mimic realistic genetic data, we randomly simulate genotypes for n = 20000 subjects using LD correlations estimated directly from individual level ADNI1 data. More specifically, we generate allele frequencies from a multivariate binomial distribution (bindata package in R) based on the correlations of 20 genome-wide SNPs that were randomly sampled from the Stage 1 SNPs used in our application to ADNI1 data (Taudes, 2005). The LD structure for these SNPs, as estimated using ADNI1 versus 1000G, is illustrated in Figs. 3a and 3b.

Fig. 3.

(a)LD Correlations for the 20 simulated SNPs, estimated using ADNI1 data. (b)LD Correlations for the 20 simulated SNPs, estimated using 1000G data.

For 2 sample simulations, we randomly split subjects into 2 sets of n1 = n2 = 10000 for estimating endophenotype and disease SNP effects, respectively. Using these data, we simulate 3 endophenotypes from , where . To reflect a situation in which 30% of SNPs demonstrate pleiotropic effects, we set the first 14 true effect sizes for the kth endophenotype to zero (i.e. α = 0 for j = 1, … 14), and randomly sample the remaining 6 invalid SNPs from . Across all settings, the mean SNP-heritabilities for the 3 simulated endophenotypes were 0.156, 0.100, and 0.097 (estimated as the variance of X explained by Z). These values are comparable to the heritability of the UKBB IDPs considered in this study (mean = 0.29), as estimated and publicly reported by Elliott et al. (2018). We simulate quantitative disease traits from for , where . For binary disease traits, we generate Bernoulli probabilities from and obtain Y by randomly sampling from a Bernoulli distribution with probability p. Here, b = (b0, b1, b2, b3, b4) represent log odds or log odds ratios. In order to evaluate estimates from our linear Stage 2 working model, we rescale these log odds ratios as linear model effects using the following procedure (Pattee and Pan, 2020). Under the logistic regression model, . A first order multivariate Taylor Series expansion of this expression about (b1, b2, b3, b4) = (0, 0, 0, 0) yields an approximate linear model, with Later in application, we use this approach to linearize the OR from IGAP AD summary data. An approximation of can be explicitly justified using the law of total variance: . Note that represents the ratio of the proportions of controls and cases, again due to small genetic (and confounding) effect sizes. Then, , yielding the linearized variance approximation given in Eq. 11. In all quantitative trait simulations, we set (β1, β2, β3, β4) = (0, 0.5, −0.2, 1). In binary trait simulations, we set (b1, b2, b3) = (0, 0.5, −0.5) and b0 = −log(4) = −1.38, which corresponds to the ratio of controls to cases in ADNI1 (≈ 4). After transformation, (β1, β2, β3) = (0, 0.08, −0.08). For modeling with individual level data, we estimate the joint effects () for each endophenotype, impute each , and estimate in Stage 2 using both the UV-IWAS and MV-IWAS models. As previously discussed, we assume a linear Stage 2 model for both quantitative and binary disease traits. For the summary statistic based approach, we estimate the marginal SNP effects for each endophenotype and disease, as are reported in GWAS summary statistics, and use the 1000G EUR reference panel of 503 subjects to estimate for Stage 2. For all settings, we perform 1000 iterations and compare the point estimates, 95% coverage, and power across models. Note that power estimates for β1 = 0 correspond to the Type I error rate, which will ideally be maintained near 0.05. For our simulations which incorporate additional direct pleiotropy, we simulate variant specific effects by sampling from . We compare the performance of all methods across various magnitudes of the mean pleiotropic effects, , and the proportion of each SNPs variance explained compared to , with for all simulations. These values were chosen based on published simulations of the analogous MV-MR methods, such as in (Rees et al., 2017). The choice of μ = 0.2 reflects an expected upper bound for mean variant specific effects, determined based on the rounded mean marginal SNP effect across all variants in the APOE gene in IGAP (after linear approximation). We implement MV-IVW and MV-Egger using the MendelianRandomization package in R. For these MR models only, we perform recommended LD pruning with an R2 cutoff of 0.05 and retain the SNP in correlated pairs with the highest MAF based on 1000 Genomes.

Results

MV-IWAS Improves type-I error control compared to UV-IWAS

Our first set of simulations compared UV- and MV-IWAS in the absence of additional direct pleiotropic effects. For both quantitative and binary disease traits, Type I error rates were well-controlled for MV-IWAS using individual level data, as evaluated based on the power estimates for β1 = 0 (Table 2). 95% simulation coverages were near the nominal level for all MV-IWAS endophenotype estimates, with notably high power for β2 and β3 in both the 1 and 2 sample case. The summary statistic based approach performed comparably, with well maintained but slightly more conservative Type I errors and coverages near 95%. Ultimately, these results empirically validate the summary statistic extension of the MV-IWAS model, even with a small reference panel for LD approximations.

Table 2

Mean Simulation Estimates, Power, and 95% Coverages for 1- and 2- sample univariate versus multivariate IWAS for quantitative and binary disease traits with 3 simulated endophenotypes using p = 20 randomly selected genome-wide SNPs. True values for the quantitative and binary traits are β = (0, 0.5, −0.2) and β = (0, 0.08, −0.08), respectively. 30% of SNPs are invalid IVs. Power estimates (or alternatively 1 – 95% Coverage) for β1 = 0 correspond to the Type I Error rate.

2-Sample Test with Quantitative Disease Trait, True β = (0, 0.5, −0.2)
	β^1(SE),β^2(SE),β^3(SE)	95% Simulation Coverages	Power

Univariate IWAS (Individual Level Data)	−0.0005 (0.027), 0.479 (0.056), −0.177 (0.055)	0.541, 0.676, 0.363	0.459, 0.990, 0.739
MV-IWAS (Individual Level Data)	−0.003 (0.029), 0.491 (0.065), −0.208 (0.064)	0.949, 0.946, 0.939	0.051, 0.992, 0.861
MV-IWAS (1000G)	−0.002 (0.041), 0.495 (0.092), −0.208 (0.091)	0.982, 0.970, 0.983	0.018, 0.983, 0.682

2-Sample Test with Binary Disease Trait, True β = (0, 0.08, −0.08)
	β^1(SE),β^2(SE),β^3(SE)	95% Simulation Coverages	Power

Univariate IWAS (Individual Level Data)	0 (0.007), 0.074 (0.014), −0.074 (0.014)	0.654, 0.525, 0.506	0.346, 0.886, 0.901
MV-IWAS (Individual Level Data)	0 (0.007), 0.077 (0.017), −0.077 (0.017)	0.951, 0.936, 0.948	0.049, 0.962, 0.961
MV-IWAS (1000G)	0 (0.010), 0.077 (0.023), −0.077 (0.023)	0.991, 0.99, 0.989	0.009, 0.891, 0.918

1-Sample Test with Quantitative Disease Trait, True β = (0, 0.5, −0.2)
	β^1(SE),β^2(SE),β^3(SE)	95% Simulation Coverages

Univariate IWAS (Individual Level Data)	0.002 (0.019), 0.515 (0.039), −0.186 (0.039)	0.441, 0.581, 0.251	0.559, 0.997, 0.809
MV-IWAS (Individual Level Data)	0.002 (0.02), 0.515 (0.046), −0.186 (0.046)	0.985, 0.971, 0.966	0.015, 1, 0.932
MV-IWAS (1000G)	0.003 (0.029), 0.520 (0.065), −0.187 (0.065)	0.974, 0.946, 0.958	0.026, 1, 0.817

1-Sample Test with Binary Disease Trait, True β = (0, 0.08, −0.08)
	β^1(SE),β^2(SE),β^3(SE)	95% Simulation Coverages	Power

Univariate IWAS (Individual Level Data)	0 (0.005), 0.077 (0.010), −0.074 (0.010)	0.508, 0.389, 0.368	0.492, 0.941, 0.933
MV-IWAS (Individual Level Data)	0 (0.005), 0.079 (0.012), −0.075 (0.011)	0.953, 0.938, 0.923	0.049, 0.991, 0.989
MV-IWAS (1000G)	0 (0.007), 0.079 (0.017), −0.076 (0.016)	0.985, 0.983, 0.981	0.015, 0.976, 0.968

In contrast, the Type I errors for the univariate IWAS model were highly inflated, specifically at 0.459 and 0.346 for the 2 sample models of quantitative and binary traits, respectively. 95% confidence intervals for the first 100 simulations are illustrated in Fig. 4, which highlights the increased variability in for UV-IWAS that is not properly captured by the estimated variance. Power estimates for β2 and β3 were also substantially lower than those of MV-IWAS for both quantitative and binary diseases. These results serve to highlight the notable improvement of MV-IWAS, both using individual level data and summary statistics, over the standard univariate IWAS approach for the detection of true causal endophenotypes while controlling Type I error rates.

Fig. 4.

95% confidence intervals for β1 from the first 100 iterations from the 2 sample simulations for a quantitative disease trait. The true value for β1 = 0.

We further assessed the performance of MV-IWAS when one of the causal pleiotropic endophenotypes was excluded from the Stage 2 model. This situation is reasonable to expect in applications where some of the substantive pleiotropic pathways are unknown or unmeasured. For simulations of a quantitative disease trait, excluding X3 in the Stage 2 model resulted in a somewhat elevated Type I error but similar power for β2 compared to the fully specified model in Table 2 (specifically, Type I Error = 0.16 and Power = 0.989). Nonetheless, the performance of MV-IWAS with 2 of the 3 causal endophenotypes was dramatically improved over that of the univariate IWAS test, highlighting its utility in spite of some omitted causal pathways.

Superior performance of MV-IWAS-Egger over alternative approaches under directional pleiotropy

For our second set of simulations, we incorporated direct variant-specific pleiotropic effects and evaluated the performance of all IWAS and MR methods. Fig. 5 compares the Type I error rates and power for varied magnitudes of with (a) μ = 0, (b) μ = 0.05, and (c) μ = 0.2. Across all settings, both the individual level and summary statistic based applications of MV-IWAS-Egger maintained Type-I Errors below 0.05 and achieved the highest power compared to all other methods. In the case of balanced horizontal pleiotropy (i.e. μ = 0), all multivariate methods had well controlled Type-I Errors for β1 and μ (for Egger methods). However, MV-IWAS-Egger was alone able to control Type I errors in the presence of directional direct pleiotropic effects (μ = 0.05 and 0.2). As may be expected, the Type-I error rates for both UV- and MV-IWAS increased with larger mean direct pleiotropic effects, and were inflated even above those for the related MV-IVW method.

Fig. 5.

Simulation Type I Error and Power across different magnitudes of and μ. For each setting of μ: i) Type I error for β1 ii) Power for β2 iii) Power for β3 iv) Power/Type I Error for μ. LD estimated using 1000 Genomes for all Summary Statistic IWAS methods (Summ MVI-WAS and Summ MVIWAS-Egger).

In accordance with findings from our previous work (Knutson and Pan, 2020), both MR methods had substantially lower power to detect causal effects for β2 and β3 compared to all IWAS methods in all scenarios. This difference was particular notable for the largest direct pleiotropic effect size (β = 0.2); while the power for both β2 and β3 was greater than 0.8 for all IWAS models in this setting, neither of the MV-MR methods achieved power greater than 0.2 for β2 or β3. What’s more, the test of H0 : μ = 0 for MV-MR-Egger was notably underpowered, with power of ≈ 13% for both μ = 0.05 and μ = 0.2. In contrast, the MV-IWAS-Egger test for directional pleiotropy was very powerful (power near 1) and controlled the Type-I error rate below 0.05 when μ = 0. Finally, we point out that MV-IWAS-Egger yielded stable Type-I errors and power across all values of , validating the use of the OLS variance approximation. These simulations suggest that MV-IWAS-Egger should be preferred over alternative approaches for detecting true causal endophenotype-disease associations.

Application of MV-IWAS to 14 ADNI endophenotypes

MV-IWAS and MV-IWAS-Egger identified atrophy to the left hippocampus and right inferior temporal cortex as causal for Alzheimer’s Disease. As reported in Table 3, our application of the univariate IWAS test to the individual level data of 14 volumetric ADNI measures identified 13 significant phenotypes at a Bonferonni adjusted significance level of 0. 05∕14 = 0. 0036. Of these, only the left hippocampal and right inferior temporal cortex volumes remained significant for both multivariate tests. Estimates between MV-IWAS and MV-IWAS-Egger were concordant, and the random direct effect in MV-IWAS-Egger was not significant, indicating no substantive directional pleiotropy in the multivariate model.

Table 3

Univariate and multivariate tests of 14 ADNI1 enodphenotypes. Stars indicate significance at a Bonferonni adjusted significant threshold of 0.05/14 for UV- and MV-IWAS and 0.05/15 for MV-IWAS-Egger.

		UV-IWAS			MV-IWAS MV-IWAS-Egger
Volumetric Phenotype (L/R)	# of SNPs	Estimate	SE	P	Estimate	SE	P
μ					−0.001	0.015	0.899
Medial Orbitofronal Cortex	69/126	−0.045/−0.079	0.015/0.015	3.77e-03/2.77e-07*	0.019/−0.034 0.019/−0.034	0.015/0.016 0.015/0.016	0.205/3.45e-02 0.203/3.44e-02
Posterior Cingulate Cortex	221/337	−0.053/−0.061	0.015/0.015	6.35e-04/6.77e-05	0.009/−0.006 0.009/−0.006	0.015/0.015 0.015/0.015	0.540/0.679 0.538/0.679
Inferior Temporal Cortex	310/77	−0.117/−0.073	0.015/0.015	2.09e-14/2.13e-06	−0.038/−0.007 −0.038/−0.008	0.016/0.015 0.017/0.015	2.42e-02/0.623 2.41e-02/0.612
Parahippocampus	84/183	−0.089/−0.095	0.015/0.015	6.29e-09/6.12e-10	−0.018/−0.027 −0.018/−0.027	0.016/0.016 0.016/0.016	0.267/9.69e-02 0.265/9.64e-02
Inferior Parietal Cortex	80/125	−0.108/−0.080	0.015/0.015	2.17e-12/1.93e-07	−0.060/−0.005 −0.060/−0.005	0.016/0.016 0.016/0.016	2.30e-04/0.752 2.45e-04/0.746
Precuneus	234/175	−0.078/−0.081	0.015/0.015	3.87e-07/1.54e-07	−0.002/−0.014 −0.002/−0.014	0.018/0.018 0.018/0.018	0.902/0.427 0.891/0.431
Hippocampus	148/101	−0.148/−0.137	0.014/0.014	8.12e-23/1.85e-19	−0.082/−0.034 −0.082/−0.033	0.020/0.020 0.020/0.020	5.16e-05/9.74e-02 6.03e-05/9.89e-02

The notable differences between the univariate and multivariate IWAS estimates can possibly be attributed to the high degree of correlation between endophenotypes, which are illustrated by the heatmap in Fig. 6. Of particular note is the high correlation between the right and left volumes for the hippocampus (R2 = 0.677), parahippocampus (R2 = 0.439) and precuneus (R2 = 0.584), which share 25, 4, and 7 Stage 1 SNPs between the two sides, respectively. This high degree of correlation and sharing of Stage 1 SNPs suggests the influence of genetic pleiotropy in the inflated Type I errors of the UV-IWAS tests, as was shown in simulation.

Fig. 6.

Correlations between the 14 Imputed ADNI1 Endophenotypes.

Causal AD-associations for the left hippocampus and right inferior temporal cortex volumes are strongly supported across the literature (Fox et al., 1996), (Galton et al., 2001), (Shen et al., 2014). In particular, many studies have shown that the left hippocampus exhibits greater atrophy (and at faster rates) compared to the right hippocampus, supporting the result that the left (and not the right) imputed hippocampal volume remained significant in the multivariate model ((Sarica et al., 2018), and references therein). Manhattan plots in Fig. 7 illustrate the genetic association patterns in Stage 1 for these two significant IDPs and provide comparisons against the Stage 1 models for analogous UKBB IDPs. Top SNPs for the left hippocampus are located within genes which have previously been shown to be over-expressed in AD hippocampal tissue, such as NR4A1 (rs2242107) and FAR2 (rs7137556) (Zhao et al., 2018), (Zhang et al., 2015). What’s more, our imputation model includes 49 SNPs within chromosome 12, specifically 3 SNPs in 12q14 and 12 SNPs in 12q24, a finding which is corroborated throughout the literature and mirrors the association pattern for the related IDP in our UKBB summary statistic application, namely T1 FAST left hippocampus volume (Bis et al., 2012).

Fig. 7.

Manhattan Plots reflecting Stage 1 Models for the 2 ADNI (left) Endophenotypes significant for MV-IWAS with comparison to comparable UKBB T1 FAST (right) IDPs. Note the difference of y-axis scale for the ADNI and UKBB plots; the notably higher peaks for the UKBB marginal p-values can be explained by the substantial difference in sample size between the two studies. We further note that UKBB IDP 0053 is a measure of only the anterior division of the left inferior temporal gyrus and is thereby not directly comparable to the corresponding ADNI measure.

Univariate testing of 1578 UK biobank IDPs

Preceding application of our multivariate methods, we performed univariate IWAS tests for all 1578 heritable UK Biobank IDPs. A total of 94,025 unique SNPs were used in Stage 1 modeling across all IDPs, 15,036 of which were used to impute more than one IDP (Fig. 10). This result suggests possible pleiotropic effects, in addition to those caused by linkage disequilibrium between Stage 1 SNPs. For summary statistic based testing, we first converted all IGAP odds ratios and variances to linear regression estimates using the previously described approach, namely Eqs. 10 and 11, with exp(− b0) ≈ 37154∕17008. For all IDPs, we used the median value of Eq. 9 across all 15,036 SNPs to obtain a stable estimate of = 1.634237. We also applied univariate MR-IVW and MR-Egger regression. In line with recommended practice, we performed LD clumping using 1000 Genomes with an R2 cutoff of 0.001 and window of 10,000 kb for all MR analyses.

Fig. 10.

Number of SNPs which are included in 1, 2, 3, 4, 5–50, or 50+ UK Biobank IDP’s Stage 1 IWAS model by modality. Recurrence in more than one Stage 1 IDP model gives evidence for possible pleiotropic effects that will cause inconsistency in the univariate IWAS approach.

For the UV-IWAS and UV-IWAS-Egger tests, 216 and 210 IDPs were significant at a Bonferonni corrected significance threshold of 0.05/1578, respectively. 202 IDPs were significant under both models. These results diverge somewhat from those reported in our previous work, in which a similar (but not equivalent) test was used based on univariate summary Z-statistics (Knutson and Pan, 2020). Of the 202 IDPs significant for both univariate tests, 36, 79, and 87 were based on functional, diffusion, and structural modalities, respectively. The mean variant-specific effect (μ) deviated significantly from zero for 83 IDPs, indicating possible directional pleiotropy. In contrast, no IDPs were significant for univariate MR-IVW or Egger regression, including μ for all UV-MR Egger models. This result is consistent with our previous work and expectations: MR suffers from power loss by being restricted to a smaller set of uncorrelated variants.

Selecting UKBB imaging phenotypes for multivariate analyses

We implemented MV-IWAS separately for each of the imaging modalities, presupposing that the SNPs affecting brain structure, activity, and connectivity are largely distinct. For each of these tests, we included IDPs with unadjusted univariate p-values below 0.05 for either UV-IWAS or UV-IWAS-Egger (146 functional, 306 diffusion, and 279 structural). To address concerns of possible multicollinearity, we computed the correlations between these imputed phenotypes (using 1000G) within each of the modality groups. The mean pairwise correlation within the functional, diffusion, and structural MRI IDP modalities are 0.0381, 0.0490, and 0.0383, respectively (Fig. 8a). 3 pairs of imputed structural IDPs have correlations above 0.75, as shown in (Fig. 8b). For each pair, we retained the IDP with the smallest univariate p-value, namely unnormalized versus normalized T1 brain volume, Freesurfer brain segmentation volume without ventricles (from surface), and total versus right hemispheric cerebral white matter volumes. The 7 imputed dMRI IDPs illustrated in Fig. 8c have high pairwise correlations (|ρ| > 0.75). The biological connectivity between many of these pairs of IDPs have been previously identified in dMRI studies (specifically, the raw and not genetically imputed measures), such as the connection between the superior longitudinal fasciculus and cingulum bundle (Wu et al., 2016). Each of these ROIs have also been separately implicated in neurodegenerative diseases through dMRI based studies, indicating they may each play distinct and important roles in the brain and should therefore all be retained in the multivariate dMRI model (Yin et al., 2015), (Bubb et al., 2018), (Haghshomar et al., 2018), (Whitwell et al., 2010), (Walsh et al., 2011). The maximum correlation between imputed functional IDPs is 0.378, so we retain all 146 IDPs in our multivariate analysis.

Fig. 8.

a) Distribution of pairwise pearson correlations for IDPs with unadjusted UV-IWAS p-values below 0.05 by modality. This includes 306 dMRI, 279 sMRI and 146 fMRI imputed IDPs. b) 3 IDP pairs of sMRI IDPs with correlations > 0.75. c) 7 pairwise correlations > 0.75 between imputed dMRI IDPs.

Discovery of causal imaging phenotypes to AD using UKBB summary statistics

Across all 3 modalities, MV-IWAS identified new causal UKBB brain phenotypes which were not detected via univariate IWAS. We determine significance using a Bonferroni adjusted threshold of 0.05 divided by 729 (732 for MV-IWAS-Egger), the number of IDPs included across all modality-specific MV-IWAS models. A comparison of the significant IDPs between the UV and MV-IWAS tests are given in Fig. 11. All estimates are included in Supplementary Table 1.

Fig. 11.

Comparison of the significant phenotypes identified by the univariate and multivariate IWAS tests for Structural, Diffusion, and Functional MRI IDPs. The quantities given within the red, blue, and grey circles indicate the number of IDPs which were significant for MV-IWAS-Egger, MV-IWAS, and UV-IWAS, respectively. Their intersection reflects IDPs which were significant under all 3 tests.

MV-IWAS implicated the volumetric changes of many important structures within or functionally connected to the temporal lobe, such as the left and right amygdala, left superior temporal gyrus, and left and right middle and inferior temporal gyri (de Jong et al., 2008), (Poulin et al., 2011). Notably, the latter volumetric measure was also identified in our application to individual level ADNI1 data, further corroborating the results of our summary statistic based approach. MV-IWAS also identified structural changes to the left and right putamen, precuneus, and insular cortices and the left parahippocampal gyrus (de Jong et al., 2008), (Jiji et al., 2013), (Pini et al., 2016), (Fathy et al., 2019). Numerous diffusion MRI measures related to the superior cerebellar peduncle, hippocampal subdivision of the cingulum, superior longitudinal fasciculus, and corona radiata were also uniquely discovered by the MV-IWAS test (Toniolo et al., 2018), (Zhang et al., 2007), (Mayo et al., 2019). 7 additional measures related to the anterior and posterior limbs of the internal capsule were also identified by MV-IWAS (Mayo et al., 2019). Fig. 9 provides Manhattan plots and identifies top SNPs for IDPs with the greatest causal effect estimate from MV-IWAS-Egger within each modality group, namely total brain volume (structural), Mean OD in body of corpus callosum on FA skeleton (dMRI), and NET100 0780 (resting-state fMRI network edge) (Silbert et al., 2003), (Demey et al., 2015). Notably, 21 SNPs used for imputation of NET100 0780, including a top SNP rs116889766, are located within PRUNE2, a known risk gene for AD (Potkin et al., 2009). 14 Stage 1 SNPs (including multiple top SNPs) for the body of corpus callosum dMRI IDP are located within ALPK1 which has been previously associated with increased risk for multiple inflammatory diseases but not AD-related dementias (Ryzhakov et al., 2018), (Williams et al., 2019), (Ko et al., 2013). The top structural MRI IDP, total brain volume (freesurfer), was imputed using 25 Stage 1 SNPs located within SGIP1, including 12 genome-wide significant variants. The SGIP1 encoded protein has been implicated in cerebral amyloid angiopathy, the accumulation of amyloid proteins in the brain, which is highly characteristic of AD (Hondius et al., 2018). What’s more, the top SNP for this IDP (rs560082768) is located within the FAM19A5 (also called TAFA5) gene, which has been previously associated with both AD and neuropsychiatric conditions (Herold et al., 2016), (Shahapal et al., 2019).

Fig. 9.

Manhattan Plots for Stage 1 SNPs used for each of the IDPs with the greater causal effect estimate for MV-IWAS-Egger.

MV-IWAS also implicated many possible false positive univariate results. Specifically, 13 structural, 2 functional, and 23 diffusion MRI IDPs with significant UV-IWAS tests were not supported under the multivariate model. Interestingly, two structural MRI measures related to the thalamus (namely the left thalamus and L+R thalamus volumes) were supported by the UV-IWAS test but not the MV-IWAS model. Evidence in the existing literature suggests this UV-IWAS result may be a false positive; Aggleton et al. argue that structural MRI measures of total thalamic volume can not distinguish changes to the subset of nuclei associated with AD (Aggleton et al., 2016). Another notable finding is that the T2 left+right versus right-only amygdala sMRI measures were exclusively significant for UV- and MV-IWAS, respectively. Hemispheric lateralization of the amygdala has been observed in numerous AD studies, specifically implicating right amygdala atrophy in loss of emotional memory amongst AD patients (Philippi et al., 2015), (Cimino et al., 1991), (Buchanan et al., 2006), (Markowitsch et al., 2000). Our findings, which specifically implicate the right amygala in lieu of the aggregated left+right measure, are therefore well supported. The IDPs which are exclusively significant for UV-IWAS can also be explained by their close relation to ROIs which were significant for MV-IWAS, possibly because the detected effect in the univariate model was due solely to it’s association to a related causal IDP. For instance, the left posterior cingulate surface area was significant only under the univariate model, but the closely related left posterior cingulate thickness (based on the same brain atlas) was significant for MV-IWAS. These findings, along with our simulations, highlight the utility of MV-IWAS over UV-IWAS for controlling Type-I errors in the presence of genetic pleiotropy. No IDPs were significant for MV-MR Egger regression for any modality group. For MV-IVW, 1 diffusion, 3 structural, and 4 functional IDPs were significant, only two of which were also significant for MV-IWAS. These IDPs include the L Cerebral Peduncle (dMRI), R Intracalcarine Cortex volume, and L & R Supracalcarine Cortex volumes (sMRI).

Replicated MV-IWAS findings for UKBB and ENIGMA summary data

The ENIGMA project and UK Biobank each published GWAS summary data on comparable structural T1-weighted MRI based volumetric brain measures. These include the volume of left + right Nucleus Accumbens, Caudate, Putamen, Pallidum, Amygdala, Hippocampus and Thalamus. We perform MV-IWAS using these summary data separately for the two cohorts, but with the same set of Stage 1 SNPs (selected by clumping + thresholding, as before). As shown in Table 4, MV-IWAS yielded equivalent results between the two studies, specifically identifying the hippocampus and nucleus accumbens volumes as significant causal endophenotypes in AD. The two study’s results differed slightly for MV-IWAS-Egger, as hippocampal volume was no longer significant for the ENIGMA cohort. Nonetheless, we feel that these results serve to validate our approach and implicate two replicable phenotypes in AD. Notably, in both MV-IWAS-Egger applications, the direct variant effects (μ) were significant, indicating the presence of remaining genetic pleiotropy.

Table 4

MV-IWAS tests of 7 endophenotypes based on summary data from the ENIGMA and UKBB studies. Starred p-values indicate significance at a Bonferonni adjusted significant threshold of 0.05∕7 = 0.007.

IDP	P-Value (ukbb, MV-IWAS)	P-Value (enigma, MV-IWAS)	P-Value (ukbb, MV-IWAS-Egger)	P-Value (enigma, MV-IWAS-Egger)
μ	–	–	7.809e-06*	1.931e-05*
Nucleus Accumbens	0.00259*	2.739e-05*	0.00495*	2.351e-05*
Amygdala	0.86961	0.65678	0.76329	0.76232
Caudate	0.43492	0.07467	0.23074	0.12660
Hippocampus	0.00088*	0.00474*	0.00127*	0.01413
Pallidum	0.65615	0.06685	0.52996	0.05024
Putamen	0.14865	0.10710	0.08172	0.08743
Thalamus	0.42606	0.32672	0.59344	0.36795

Discussion

In this study, we implemented a multivariate extension of the univariate IWAS model to test for the causal effects of multiple brain imaging endophenotypes in Alzheimer’s Disease. We applied MV-IWAS to individual level data on 14 volumetric brain measures from the ADNI study, identifying 2 ROIs with considerable evidence of a causal effect on AD. These regions, namely the left hippocampus and right inferior temporal cortex, have been implicated in AD through multiple imaging and autopsy based studies (Lindberg et al., 2012), (Launer et al., 2001), (Convit et al., 1997), (Busatto et al., 2003). We further proposed a method for estimating MV-IWAS effects using only GWAS summary statistics and a reference panel, which we validated through simulation. These results highlighted the well controlled Type-I error rates and improved power of MV-IWAS, both the individual level and summary data versions, over UV-IWAS. Our simulations also highlight the out-performance of our proposed MV-IWAS-Egger test over all alternative approaches, including popular MV-MR methods, in the presence of direct pleiotropic SNPs. We applied this method to the summary data of 1578 imaging derived phenotypes and AD from UK Biobank and IGAP, respectively. MV-IWAS discovered multiple well-supported and biologically plausible IDPs above those identified by the univariate test and implicated many possible false positive UV-IWAS results. MV-IWAS can be further generalized to other molecular phenotypes with suspected causal effects on AD and can be applied to study causal pathways in numerous other complex diseases (Trushina et al., 2013), (Larsson and Markus, 2017). Ultimately, the results of this study and the numerous applications for future research promote MV-IWAS as a promising method for identifying multiple endophenotypes causally associated with complex diseases.

4 in total

Review 1. Imaging Genetics in Epilepsy: Current Knowledge and New Perspectives.

Authors: Ge Wang; Wenyue Wu; Yuchen Xu; Zhuanyi Yang; Bo Xiao; Lili Long
Journal: Front Mol Neurosci Date: 2022-05-30 Impact factor: 6.261

2. Accounting for nonlinear effects of gene expression identifies additional associated genes in transcriptome-wide association studies.

Authors: Zhaotong Lin; Haoran Xue; Mykhaylo M Malakhov; Katherine A Knutson; Wei Pan
Journal: Hum Mol Genet Date: 2022-07-21 Impact factor: 5.121

3. A new Mendelian Randomization method to estimate causal effects of multivariable brain imaging exposures.

Authors: Chen Mo; Zhenyao Ye; Hongjie Ke; Tong Lu; Travis Canida; Song Liu; Qiong Wu; Zhiwei Zhao; Yizhou Ma; L Elliot Hong; Peter Kochunov; Tianzhou Ma; Shuo Chen
Journal: Pac Symp Biocomput Date: 2022

4. A practical problem with Egger regression in Mendelian randomization.

Authors: Zhaotong Lin; Isaac Pan; Wei Pan
Journal: PLoS Genet Date: 2022-05-04 Impact factor: 6.020

4 in total