
Leveraging eQTLs to identify individual-level tissue of interest for a complex trait.

Arunabha Majumdar1,2, Claudia Giambartolomei1, Na Cai3,4, Tanushree Haldar5, Tommer Schwarz6, Michael Gandal7, Jonathan Flint8, Bogdan Pasaniuc1,6.   

Abstract

Genetic predisposition for complex traits often acts through multiple tissues at different time points during development. As a simple example, the genetic predisposition for obesity could be manifested either through inherited variants that control metabolism through regulation of genes expressed in the brain, or through variants that control fat storage through dysregulation of genes expressed in adipose tissue, or both. Here we describe a statistical approach that leverages tissue-specific expression quantitative trait loci (eQTLs) corresponding to tissue-specific genes to prioritize a relevant tissue underlying the genetic predisposition of a given individual for a complex trait. Unlike existing approaches that prioritize relevant tissues for the trait in the population, our approach probabilistically quantifies the tissue-wise genetic contribution to the trait for a given individual. We hypothesize that for a subgroup of individuals the genetic contribution to the trait can be mediated primarily through a specific tissue. Through simulations using the UK Biobank, we show that our approach can predict the relevant tissue accurately and can cluster individuals according to their tissue-specific genetic architecture. We analyze body mass index (BMI) and waist-to-hip ratio adjusted for BMI (WHRadjBMI) in the UK Biobank to identify subgroups of individuals whose genetic predisposition acts primarily through brain versus adipose tissue, and adipose versus muscle tissue, respectively. Notably, we find that these individuals have specific phenotypic features beyond BMI and WHRadjBMI that distinguish them from random individuals in the data, suggesting biological effects of tissue-specific genetic contribution for these traits.


Year:  2021        PMID: 34019542      PMCID: PMC8174686          DOI: 10.1371/journal.pcbi.1008915

Source DB:  PubMed          Journal:  PLoS Comput Biol        ISSN: 1553-734X            Impact factor:   4.779


Introduction

Multiple clinical, pathologic, and molecular lines of evidence suggest that many phenotypes and diseases show heterogeneity and can be viewed as a collection of multiple traits (i.e., subtypes) in the population [1-5]. Traditional subtype identification has relied on detecting biomarkers or subphenotypes that distinguish subsets of individuals in a biologically meaningful way. For example, breast cancer has two well-known subtypes, estrogen receptor positive and negative [6-8]. Patients with psychiatric disorders can have different severities [9]. With the advent of large-scale genome-wide association studies (GWAS) that have robustly identified thousands of risk variants for complex traits, multiple approaches have investigated the use of genetic risk variants to define classes of individuals that show genetic heterogeneity across subtypes [10-15]. For example, autism can be subtyped by grouping together individuals with recurrent mutations in the same autism-associated gene [10, 12]; type 2 diabetes can be subtyped using clusters of genetic variants previously associated with the disease [13]. Other examples include adiposity traits such as body mass index (BMI), waist-to-hip ratio (WHR), and WHR adjusted for BMI (WHRadjBMI), which can be subtyped based on genetic variants with distinct patterns of fat depots and metabolism [14]. Genetic subtyping offers an advantage over phenotypic subtyping in that germline genetic characteristics are more stable than phenotypic characteristics of an individual [12, 13]. A significant component of the genetic susceptibility of complex traits is mediated through genetic control of gene expression in one or multiple tissues [16-18], with several studies highlighting the relevance of tissue- or cell-type-specific biological mechanisms underlying the pathogenesis of complex traits [19-25].
Such studies integrate expression quantitative trait loci (eQTLs) with GWAS data in a tissue- or cell-type-specific manner to prioritize tissues and cell types that are relevant for a given complex trait. For example, a recent study [23] proposed an approach based on stratified LD score regression to identify disease-relevant tissues; it tests whether the genomic regions surrounding the set of tissue-specific expressed genes are enriched for disease heritability. These studies have often identified multiple tissues relevant to a given trait in the population (e.g., liver, pancreas, and thyroid for total cholesterol [22]; connective skeletal muscle and adipose for WHRadjBMI [23, 26]; brain and adipose for BMI [21, 23, 27]). Transcriptome-wide association studies (TWAS) have identified novel associations between the genetic component of gene expression and a phenotype [16, 28–30]; genes significantly associated with the phenotype have often been discovered in multiple tissues, which points to tissue-specific genetic contributions to the trait across different tissues. These findings motivate an important follow-up question: if a phenotype has two or more relevant tissues in the population, how can we quantify the tissue-wise genetic contribution to the trait for a given individual? A method addressing this question would also allow us to explore prioritizing a specific tissue relevant for the phenotype of an individual. A recent study [31] proposed identifying disease subtypes by integrating clinical features related to the disease with gene expression profiles across patients. Using a multi-view clustering algorithm, the authors classified patients into subgroups, each with a distinct combined pattern of disease-related clinical features and genetically predicted gene expression (predicted from genotypes at cis-SNPs).
However, their approach does not investigate tissue-specific genetic predisposition to the phenotype for a given individual. Since tissue-specific genetics plays an important functional role in tissue-specific biological processes, we aim to explicitly quantify the tissue-wise genetic contribution to a phenotype at the individual level, and to identify subgroups of individuals who are homogeneous with respect to the effect of their tissue-specific genomic profile on the phenotype. In this article, we present a statistical approach that integrates tissue-specific eQTLs (i.e., eQTLs for tissue-specific genes) with genetic association data for a complex trait to probabilistically quantify the tissue-wise genetic contribution to the phenotype of each individual in the study. Following previous studies [20, 23], we characterize a tissue by a set of genes specifically over-expressed in it, and build our model on the eQTLs for these genes in the tissue. We focus on traits for which multiple tissues have been implicated by previous studies (e.g., brain and adipose for BMI [21, 23, 27]), and hypothesize that the genetic predisposition to the trait for a subgroup of individuals can act primarily through a specific tissue. Assuming that two tissues are relevant for the trait in the population, there are three possibilities for a given individual: the genetic susceptibility to the trait is mediated primarily through the first tissue, the second tissue, or both tissues. For example, for one group of individuals the genetic predisposition to BMI acts through regulation in the brain, for another group it acts through adipose, and for the remaining individuals it acts through both. In our study, we focus on the first two subgroups of individuals and examine the characteristics that distinguish them from the remaining population.
We propose eGST (eQTL-based Genetic Sub-Typer), an approach that estimates the posterior probability that a given tissue can be prioritized as relevant for an individual's phenotype, based on individual-level genotype data at tissue-specific eQTLs and phenotype data. eGST implements a Bayesian mixture model and employs a computationally efficient maximum a posteriori (MAP) expectation-maximization (EM) algorithm to estimate the tissue-specific posterior probabilities for each individual. We perform extensive simulations using real genotypes from the UK Biobank and show that eGST accurately infers the simulated tissue of interest for each individual. We also show that the Bayesian formulation of the mixture model performs better than the corresponding frequentist formulation. By integrating expression data from the GTEx consortium [17, 18], we apply eGST to two obesity-related measures (BMI and WHRadjBMI) in the UK Biobank [32, 33]. For BMI, we consider brain and adipose tissues and identify two subgroups, one with adipose-specific and the other with brain-specific genetic contribution (25,192 individuals in aggregate, 7.5% of the total sample). Similarly, for WHRadjBMI we consider muscle and adipose tissues and identify two subgroups with muscle- or adipose-specific genetic contribution. Interestingly, the subgroups of individuals classified into each tissue show distinct genetic and phenotypic characteristics. Of 106 phenotypes tested in the UK Biobank, 85 were differentially distributed between the BMI-adipose (or BMI-brain) group of individuals and the remaining population, and 72 of these 85 remained significant after adjusting for BMI. For example, diabetes proportion, various mental health phenotypes, alcohol intake frequency, and smoking status were differentially distributed between one or both of the tissue-specific groups of BMI and the remaining population.
Overall, our results suggest that tissue-specific eQTLs can be successfully utilized to quantify tissue-wise genetic contribution to a complex trait and prioritize the tissue of interest at an individual-level in the study.

Results

Overview of methods

We start by depicting the main intuition underlying our hypothesis and model (Fig 1). For simplicity, consider two tissues of interest and assume that gene A is expressed only in tissue 1, whereas gene B is expressed only in tissue 2. The main hypothesis underlying our model is that the genetic susceptibility of a complex trait for a given individual is mediated through regulation of gene A in tissue 1, gene B in tissue 2, or both. Gene expression measurements for every individual at both genes in both tissues could be used to test this hypothesis directly. Unfortunately, gene expression measurements are typically not available in large samples such as the UK Biobank. To circumvent this, since eQTLs explain a substantial proportion of the heritability of gene expression [16, 34], we use the eQTLs for each gene in the corresponding tissue as a proxy for the genetically regulated component of its expression. In detail, eGST takes as input the phenotype values and the genotype values at a set of variants known to be eQTLs for tissue-specific genes (Fig 1). We consider the set of genes overexpressed in a tissue as its set of tissue-specific genes [20, 23]. Formally, the phenotype of individual i under tissue of interest k is modeled as $Y_i = \alpha_k + X_{ik}^T \beta_k + \epsilon_{ik}$, where $\alpha_k$ is the baseline tissue-specific trait mean, $X_{ik}$ is the vector of normalized genotype values of individual i at the $m_k$ eQTL SNPs specific to tissue k, $\beta_k$ is the vector of their effects on the trait, and $\epsilon_{ik} \sim N(0, \sigma_k^2)$ is a noise term, for i = 1, …, n and k = 1, …, K. For simplicity of exposition, we introduce an indicator variable $C_i$ for each individual, with $C_i = k$ if and only if tissue k is the tissue of interest for individual i. Thus, $P(C_i = k) = w_k$ is the prior proportion of individuals whose phenotype has a tissue-k-specific genetic effect. We assume that the eQTL SNP sets of the K tissues are non-overlapping and that each element of $\beta_k$, the genetic effect of the tissue-k-specific eQTLs (i.e., eQTLs for tissue-k-specific genes) on the trait, is drawn from $N(0, \sigma_{\beta_k}^2)$.
If $\sigma_{Y_k}^2$ is the variance of the trait under $C_i = k$, then $\sigma_{\beta_k}^2 = h_k^2 \sigma_{Y_k}^2 / m_k$, where $h_k^2$ is the heritability of the trait under $C_i = k$ due to the $m_k$ tissue-k-specific eQTLs; we term $h_k^2$ the tissue-k-specific subtype heritability. Under this mixture model, the likelihood of individual i takes the form $L_i(\Theta) = \sum_{k=1}^{K} w_k \, \phi(Y_i \mid \alpha_k + X_{ik}^T \beta_k, \sigma_k^2)$, where $\phi(\cdot \mid \cdot)$ denotes the normal density, $\theta_k = (w_k, \alpha_k, \beta_k, \sigma_k^2)$, and $\Theta = (\theta_1, \dots, \theta_K)$, with $\theta_k$ denoting the tissue-k-specific set of model parameters. We propose a Bayesian inference approach based on a maximum a posteriori (MAP) expectation-maximization (EM) algorithm to estimate the posterior probability $P(C_i = k \mid X_i, Y_i)$ that the phenotype of individual i is mediated through the genetic effects of eQTLs specific to tissue k. Our main inference is based on this posterior probability. For example, $P(C_i = 1 \mid X_i, Y_i) > 65\%$ indicates that tissue 1 is likely the tissue of interest for individual i, whereas a posterior probability near 50% indicates that the individual's genetic susceptibility is not mediated through a specific tissue.
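The posterior probability at the center of the model can be computed directly once parameter values are available. The quantity is $P(C_i = k \mid X_i, Y_i) = w_k \phi_{ik} / \sum_{l} w_l \phi_{il}$, where $\phi_{ik}$ is the normal density of $Y_i$ with mean $\alpha_k + X_{ik}^T \beta_k$ and variance $\sigma_k^2$. The Python sketch below is a minimal illustration and not the authors' eGST code. The function name `tissue_posteriors` and its argument layout are hypothetical, and the computation runs in log space to avoid numerical underflow.

```python
import numpy as np


def tissue_posteriors(y, X_list, w, alpha, beta_list, sigma2):
    """Return an (n, K) matrix whose (i, k) entry is P(C_i = k | X_i, Y_i).

    y         : (n,) phenotype values Y_i
    X_list    : K arrays; X_list[k] is (n, m_k) normalized genotypes at the
                tissue-k-specific eQTLs (non-overlapping SNP sets)
    w         : (K,) prior proportions w_k = P(C_i = k)
    alpha     : (K,) baseline means alpha_k
    beta_list : K arrays; beta_list[k] is the (m_k,) vector of eQTL effects beta_k
    sigma2    : (K,) residual variances sigma_k^2
    """
    y = np.asarray(y, dtype=float)
    K = len(w)
    log_terms = np.empty((len(y), K))
    for k in range(K):
        mean_k = alpha[k] + X_list[k] @ beta_list[k]
        # log(w_k) + log phi(Y_i | alpha_k + X_ik^T beta_k, sigma_k^2)
        log_terms[:, k] = np.log(w[k]) - 0.5 * (
            np.log(2.0 * np.pi * sigma2[k]) + (y - mean_k) ** 2 / sigma2[k])
    # Divide by the mixture likelihood, using log-sum-exp for numerical stability
    log_terms -= log_terms.max(axis=1, keepdims=True)
    post = np.exp(log_terms)
    return post / post.sum(axis=1, keepdims=True)
```

With K = 2, selecting rows where `tissue_posteriors(...)[:, 0] > 0.65` corresponds to the 65% threshold for tissue 1 mentioned above.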
Fig 1

The top diagram (a) explains our main hypothesis and the bottom diagram (b) explains our model.

(a): Consider two tissues of interest for a phenotype, tissue 1 and tissue 2, where gene A has higher expression but gene B has much lower expression in tissue 1, and gene B has higher expression but gene A has much lower expression in tissue 2. The key hypothesis is that the susceptibility of the phenotype for the first two individuals is mediated through the effect of gene A in tissue 1, in which case we can assign tissue 1 as the tissue of interest for these individuals (similarly, tissue 2 for the last two individuals). We refer to the phenotype of the first two individuals as the tissue 1 specific subtype and the phenotype of the last two individuals as the tissue 2 specific subtype. The two individuals in the middle, however, are not assigned to either tissue-specific subtype and remain unclassified. (b): We use genotypes at the tissue-specific eQTLs (i.e., eQTLs for tissue-specific genes) as a proxy for the expression of the corresponding tissue-specific genes. We consider a finite mixture model in which each component is a linear model regressing the trait on the genotypes at one set of tissue-specific eQTLs. Our method takes as input the individual-level measurements of the phenotype and the genotypes at the sets of tissue-specific eQTLs, and provides per-individual tissue-specific posterior probabilities as the main output.


Simulations

We performed simulations to assess how accurately eGST classifies the tissue of interest across individuals under various scenarios. We simulated phenotypes using real genotype data from the UK Biobank, with the effects of two sets of tissue-specific eQTLs generating the phenotype (see Materials and methods). We evaluated the classification accuracy of eGST as a function of the trait variance explained by each of the two sets of tissue-specific SNPs. As expected, the average area under the curve (AUC) increases with the tissue-specific subtype heritability, ranging from 50% to 95% across the subtype heritabilities considered (Fig 2). This is likely because larger tissue-specific subtype heritabilities allow better differentiation between the tissue-specific genetic effects. Next, we compared eGST with a variant of our approach that assumes the model parameters are known. The performance of this gold-standard strategy can be viewed as the maximum achievable under our proposed framework. We find that eGST loses 1.4%−3.9% AUC on average relative to this strategy across all simulation scenarios considered (Fig 2 and S1 and S2 Figs). We also applied thresholds to the tissue-specific posterior probabilities to balance the number of discoveries against accuracy. As expected, the true discovery rate (TDR) of classifying the tissue of interest increases (hence FDR = 1 − TDR decreases) as the posterior probability threshold increases, while the proportion of discoveries decreases (Fig 3).
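The simulation design and the gold-standard comparison can be sketched as follows. This is a hypothetical, simplified stand-in for the authors' simulation. It draws synthetic standardized genotypes instead of using UK Biobank data. It sets $\alpha_1 = \alpha_2 = 0$ and fixes the trait variance under each tissue at 1, so each effect is drawn from $N(0, h_k^2 / m_k)$ and the residual variance is $1 - h_k^2$. It scores individuals with the posterior computed from the true parameters, which is the gold-standard strategy. AUC uses the rank-based (Mann-Whitney) formula. All function names are illustrative.

```python
import numpy as np


def simulate_two_tissue(n, m, h2, w1, seed=0):
    """Simulate Y_i = X_ik beta_k + eps_ik, where k = C_i is individual i's tissue.

    Hypothetical simplifications: synthetic standardized genotypes replace real
    genotypes, both tissues have m eQTLs, alpha_1 = alpha_2 = 0, and
    Var(Y | C_i = k) = 1, so beta_kj ~ N(0, h2[k] / m) and Var(eps_ik) = 1 - h2[k].
    """
    rng = np.random.default_rng(seed)
    X, betas = [], []
    for k in range(2):
        maf = rng.uniform(0.05, 0.5, size=m)
        G = rng.binomial(2, maf, size=(n, m)).astype(float)
        X.append((G - G.mean(axis=0)) / G.std(axis=0))  # normalized genotypes
        betas.append(rng.normal(0.0, np.sqrt(h2[k] / m), size=m))
    tissue = np.where(rng.uniform(size=n) < w1, 1, 2)  # true C_i
    y = np.empty(n)
    for k in range(2):
        idx = tissue == k + 1
        noise = rng.normal(0.0, np.sqrt(1.0 - h2[k]), size=idx.sum())
        y[idx] = X[k][idx] @ betas[k] + noise
    return y, X, betas, tissue


def oracle_posterior_tissue1(y, X, betas, h2, w1):
    """Gold standard: P(C_i = 1 | X_i, Y_i) computed from the true parameters."""
    logp = []
    for k, w in enumerate((w1, 1.0 - w1)):
        var = 1.0 - h2[k]
        resid = y - X[k] @ betas[k]
        logp.append(np.log(w) - 0.5 * (np.log(2 * np.pi * var) + resid**2 / var))
    return 1.0 / (1.0 + np.exp(np.clip(logp[1] - logp[0], -700, 700)))


def auc(scores, positive):
    """Rank-based AUC (Mann-Whitney U / (n_pos * n_neg)); ties get average ranks."""
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    _, inv, counts = np.unique(scores, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    n_pos, n_neg = positive.sum(), (~positive).sum()
    u = ranks[positive].sum() - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)
```

For instance, `auc(oracle_posterior_tissue1(y, X, betas, h2, 0.5), tissue == 1)` on data from `simulate_two_tissue(2000, 200, h2, 0.5)` gives exactly 0.5 when `h2 = (0.0, 0.0)`, because every posterior ties. It rises well above chance for `h2 = (0.5, 0.5)`, matching the qualitative trend described above.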
Fig 2

In the first diagram (a), we present the receiver operating characteristic (ROC) curve evaluating the classification accuracy of eGST for a single dataset simulated with m1 = m2 = 1000 and n = 40,000. The mean area under the curve (AUC) obtained by eGST across 50 datasets simulated under the same scenarios is also provided. Here $h_1^2$ and $h_2^2$ are the heritabilities of the tissue-specific subtypes of the trait due to the m1 and m2 SNPs representing the two sets of tissue-specific eQTL SNPs, w1 and w2 are the proportions of individuals in the sample assigned to the two tissues, and n is the total number of individuals. In the second diagram (b), box plots of the AUCs obtained by eGST and by the gold-standard strategy are presented across the same simulation scenarios. The gold-standard strategy implements our model with the true model parameters assumed known when estimating the tissue-specific posterior probabilities.

Fig 3

True discovery rate (TDR) and the proportion of discovery (POD) by eGST while classifying the tissue of interest across individuals at increasing thresholds of tissue-specific subtype posterior probability: 55%, 60%, 65%, …, 90%, 95%.

Box plots of TDR (top) and POD (bottom) across 50 datasets simulated with m1 = m2 = 1000 and n = 40,000 are presented. Here $h_1^2$ and $h_2^2$ are the heritabilities of the tissue-specific subtypes of the trait due to the m1 and m2 SNPs representing the two sets of tissue-specific eQTL SNPs, w1 and w2 are the proportions of individuals in the sample assigned to the two tissues, and n is the total number of individuals.

We then explored the effect of other parameters on the classification accuracy. First, increasing the sample size n from 40,000 to 100,000 increased the AUC by an average of 1% across different simulation scenarios (S2 Table), indicating that larger samples modestly improve the overall classification accuracy. Second, as the number of causal SNPs explaining a fixed heritability of each subtype increases, the average AUC decreases slightly. For example, for a fixed subtype heritability, the average AUC for 2000 causal SNPs (1000 per tissue) is 1% higher than that for 3000 causal SNPs (1500 per tissue) across different choices of the other simulation parameters (S3 Table). Third, the classification accuracy increases as the difference between the baseline tissue-specific trait means increases. For example, the AUC increased from 60% with no difference in tissue-specific phenotype means (α1 = α2 = 0) to 63% when α1 = 0 and α2 = 1 (S4 Table). We also explored the impact of a difference between the means of the tissue-specific effect size distributions and observed that the classification accuracy improves relative to zero-mean effects in both tissues. For example, the AUC increases from 60% to 63% if we set $E(\beta_{1j}) = -0.02$ and $E(\beta_{2j}) = 0.02$, j = 1, …, 1000, instead of zero means for $\beta_1$ and $\beta_2$ (S5 Table). Next, we compared the performance of the MAP-EM algorithm under the Bayesian framework (Algorithm 1) with that of the EM algorithm under the frequentist framework (Algorithm 2).
Although both approaches yield similar AUC, MAP-EM performed better than EM with respect to the true discovery rate (TDR) at different posterior probability thresholds in nearly all of the simulation scenarios (Fig 4 and S3 and S4 Figs). MAP-EM offered an average of 0.05%−20% higher TDR (hence lower FDR) than EM across various posterior probability thresholds (Fig 4 and S3 and S4 Figs).
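The contrast between the two algorithms can be illustrated with a compact EM for a mixture of tissue-specific linear regressions. The following is an assumption, not the published Algorithms 1 and 2. Both variants share the same E-step. Plain EM fits each component by weighted least squares. The MAP variant places a fixed $N(0, s_k)$ prior on each $\beta_{kj}$, which turns the M-step for $\beta_k$ into weighted ridge regression with penalty $\sigma_k^2 / s_k$. The function name `em_mixture_regression` and the fixed hyperparameter `sigma2_beta` are hypothetical. The authors' MAP-EM may treat hyperparameters and other priors differently.

```python
import numpy as np


def em_mixture_regression(y, X_list, sigma2_beta=None, n_iter=100, seed=0):
    """Fit a K-component mixture of linear regressions, one component per tissue.

    Component k models y_i = alpha_k + X_list[k][i] @ beta_k + eps, eps ~ N(0, sigma2_k).
    sigma2_beta=None gives plain (frequentist) EM with weighted least squares;
    sigma2_beta=[s_1, ..., s_K] gives a MAP-EM sketch with prior beta_kj ~ N(0, s_k),
    i.e. weighted ridge regression with penalty sigma2_k / s_k on beta_k.
    Returns (R, w, coefs, sigma2) where R[i, k] = P(C_i = k | X_i, Y_i).
    """
    rng = np.random.default_rng(seed)
    n, K = len(y), len(X_list)
    designs = [np.column_stack([np.ones(n), Xk]) for Xk in X_list]  # [1, X_ik]
    R = rng.dirichlet(np.ones(K), size=n)  # random initial responsibilities
    w = np.full(K, 1.0 / K)
    sigma2 = np.full(K, y.var())
    coefs = [np.zeros(A.shape[1]) for A in designs]
    for _ in range(n_iter):
        # M-step: update w_k, (alpha_k, beta_k) and sigma2_k given responsibilities R
        for k, A in enumerate(designs):
            r = R[:, k]
            penalty = np.zeros(A.shape[1])
            if sigma2_beta is not None:
                penalty[1:] = sigma2[k] / sigma2_beta[k]  # alpha_k is not penalized
            AtR = A.T * r  # equals A^T diag(r)
            coefs[k] = np.linalg.solve(AtR @ A + np.diag(penalty), AtR @ y)
            resid = y - A @ coefs[k]
            sigma2[k] = r @ resid**2 / r.sum()
            w[k] = r.mean()
        # E-step: posterior tissue probabilities, computed in log space
        logp = np.column_stack([
            np.log(w[k]) - 0.5 * (np.log(2 * np.pi * sigma2[k])
                                  + (y - designs[k] @ coefs[k]) ** 2 / sigma2[k])
            for k in range(K)
        ])
        logp -= logp.max(axis=1, keepdims=True)
        R = np.exp(logp)
        R /= R.sum(axis=1, keepdims=True)
    return R, w, coefs, sigma2
```

Because component k can only use the tissue-k genotypes, the component labels stay tied to tissues, and `R[:, k]` can be read directly as the tissue-k posterior.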
Fig 4

Comparison of the true discovery rate (TDR) of classifying tissue-specific subtypes by the MAP-EM algorithm (under the Bayesian framework of the mixture model, which eGST employs) versus the EM algorithm (under the frequentist framework of the mixture model) at tissue-specific subtype posterior probability thresholds of 65%, 70%, 75%, 80%, 85%, 90%, and 95%.

Box plots of TDR across 50 datasets simulated with m1 = m2 = 1000 and n = 40,000 are presented. Here $h_1^2$ and $h_2^2$ are the heritabilities of the tissue-specific subtypes of the trait due to the m1 and m2 SNPs representing the two sets of tissue-specific eQTL SNPs, w1 and w2 are the proportions of individuals in the sample assigned to the two tissues, and n is the total number of individuals.

In the above simulation scenarios, we assumed that each individual's genetic contribution comes from one of the two tissue-specific sets of SNPs. However, a group of individuals may have genetic effects from both tissue-specific sets of SNPs. For such individuals, we would expect the tissue-specific subtype posterior probability to be distributed around 50%. We considered a sample of 30,000 individuals divided into three groups of 10,000. The first group has genetic effects only from the first tissue-specific set of SNPs, the second group has effects from both sets, and the last group has effects only from the second set. Among individuals with effects from both tissues, the mean posterior probability of the first tissue-specific subtype is centered around 50% (S6 Table). As expected, the same quantity for the first group of individuals rises above 50% as the tissue-specific subtype heritability increases (S6 Table).

So far, we have assumed that both tissues included in the simulation model are relevant for the complex trait. However, it is important to explore the performance of eGST when one of the tissues considered is completely irrelevant for the trait. We performed simulations based on a sample of 20,000 individuals in which all individuals have genetic effects from one tissue-specific set of SNPs. We then ran eGST including another tissue-specific set of SNPs that is irrelevant for the phenotype.
We observe that only a small percentage of individuals are misclassified to the irrelevant tissue, while the percentage of correctly classified individuals is much larger (S7 Table). For example, when the heritability of the trait due to the relevant tissue-specific set of SNPs is 20%, 89% of individuals are correctly classified, whereas 1.4% are misclassified. The misclassification percentage decreases as the threshold on the tissue-specific subtype posterior probability increases.
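The threshold-based metrics used in these simulations and in Fig 3 can be computed as below. The sketch makes several assumptions. An individual is assigned the tissue with the largest posterior probability only when that probability reaches the threshold. POD is the fraction of individuals assigned a tissue, and TDR is the fraction of assignments that match the simulated tissue. The function name `tdr_pod` is hypothetical.

```python
import numpy as np


def tdr_pod(posteriors, true_tissue, thresholds):
    """Return {threshold: (TDR, POD)} for thresholded tissue assignments.

    posteriors  : (n, K) matrix of P(C_i = k | X_i, Y_i)
    true_tissue : (n,) simulated tissue labels coded 0..K-1
    An individual is assigned tissue argmax_k P(C_i = k | .) only if that
    posterior probability is at least the threshold; otherwise unclassified.
    """
    best = posteriors.argmax(axis=1)
    best_p = posteriors.max(axis=1)
    out = {}
    for t in thresholds:
        called = best_p >= t
        pod = called.mean()  # proportion of discovery
        if called.any():
            tdr = (best[called] == true_tissue[called]).mean()
        else:
            tdr = float("nan")  # no assignments at this threshold
        out[t] = (tdr, pod)
    return out
```

Passing `np.round(np.arange(0.55, 0.951, 0.05), 2)` as `thresholds` reproduces the 55% to 95% grid used in Fig 3.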

Inferring individual-level tissue of interest for BMI and WHRadjBMI

Having established in simulations that our approach can correctly classify the individual-level tissue of interest, we next analyzed BMI and WHRadjBMI, two phenotypes known to have multiple tissues of interest mediating their genetic susceptibility [21, 23, 26, 27, 35–37]. For the BMI analysis, we used phenotype and genotype data for 336,106 individuals in the UK Biobank [32, 33] at 1705 adipose-specific eQTLs (i.e., eQTLs for adipose-specific genes) and 1478 brain-specific eQTLs (see Materials and methods). We applied a rank-based inverse normal transformation to the BMI residuals obtained after adjusting BMI for age, sex, and 20 principal components (PCs) of genetic ancestry, the latter to account for population stratification. Each eQTL SNP is the top cis-association of a tissue-specific expressed gene [23] in the corresponding tissue [17, 18] (see Materials and methods). At a 65% threshold on the tissue-specific subtype posterior probability, 7.5% of all individuals were assigned a tissue (adipose or brain), and the remaining individuals were unclassified. eGST attributed the genetic susceptibility to BMI of 11,838 individuals to adipose eQTLs and of 13,354 individuals to brain eQTLs (S1 Table). Individuals classified to each tissue are distributed across different bins of BMI (S8 Table). The individuals classified into adipose have a higher mean BMI (30.4) than the population, whereas the mean BMI of brain-specific individuals (27.6) is very close to the population mean (27.4) (S10 Table). For WHRadjBMI, we included 953 adipose subcutaneous (abbreviated AS hereafter) tissue-specific eQTLs (i.e., eQTLs for AS-specific genes) and 1052 muscle skeletal connective (abbreviated MS) tissue-specific eQTLs. We used the inverse normal transformed WHRadjBMI residuals, after adjusting WHRadjBMI for age, sex, and the top 20 PCs, for 336,018 individuals.
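The phenotype preprocessing described above is covariate adjustment followed by a rank-based inverse normal transformation, and it can be sketched as follows. Two choices here are assumptions, since the exact variant is not specified in this section: the Blom offset c = 3/8, and average ranks for ties. `residualize` and `rank_inverse_normal` are hypothetical helper names.

```python
import numpy as np
from statistics import NormalDist


def residualize(y, covariates):
    """Residuals of y after OLS on an intercept plus covariates (n x p)."""
    A = np.column_stack([np.ones(len(y)), covariates])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return y - A @ coef


def rank_inverse_normal(x, c=3.0 / 8.0):
    """Rank-based inverse normal transform with Blom's offset c (assumed).

    Tied values receive the average of their ranks, so they map to the same score.
    """
    x = np.asarray(x, dtype=float)
    _, inv, counts = np.unique(x, return_inverse=True, return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    n = len(x)
    nd = NormalDist()  # standard normal quantile function
    return np.array([nd.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks])
```

For BMI, one would call `rank_inverse_normal(residualize(bmi, np.column_stack([age, sex, pcs])))`. Here `bmi`, `age`, `sex`, and the n × 20 matrix `pcs` are hypothetical input arrays.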
Similarly to the BMI analysis, a tissue of interest for WHRadjBMI was assigned to only a small percentage (5.7%) of all individuals (S1 Table), and the remaining individuals were unclassified. Individuals assigned to each of the two tissues are spread across different bins of WHRadjBMI (S9 Table). The individuals classified into MS have higher mean WHRadjBMI and WHR (0.03 and 0.91) than the population, whereas the means for AS-specific individuals (−0.04 and 0.85) are lower than the population means (0 and 0.87) (S11 Table). We also permuted the phenotype data across individuals while keeping the assignment of eQTLs to tissues fixed as in the original data (see Materials and methods). For BMI, we used a 65% subtype posterior probability threshold. Across 500 random permutations of the phenotype, the average number of individuals classified as a tissue-specific subtype was 7404 (s.d. 700). This is substantially smaller than the 25,192 individuals classified as adipose- or brain-specific subtypes of BMI in the original data. For WHRadjBMI, the average number of individuals classified as a tissue-specific subtype across 500 random permutations was 3433 (s.d. 517), compared with 19,041 individuals classified as AS- and MS-specific subtypes in the original data. To estimate the tissue-specific subtype heritability, we employed an MCMC algorithm that implements eGST under the Bayesian framework (S1 Algorithm in S1 Text). Through simulations, we found that the MCMC algorithm estimates the tissue-specific subtype heritability with less bias than the MAP-EM algorithm (results not shown for brevity), but it is computationally much slower. In the BMI analysis, the posterior mean of the tissue-specific subtype heritability was 3.6% (posterior s.d. 0.2%) for brain and 3.7% (posterior s.d. 0.2%) for adipose. Similarly, in the WHRadjBMI analysis, the posterior mean of the tissue-specific subtype heritability was 3.1% (posterior s.d. 0.2%) for adipose and 2.5% (posterior s.d. 0.2%) for muscle. Including only the top eQTLs in our analyses (mainly for computational simplicity) partly explains these low estimates of tissue-specific subtype heritability.
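The permutation check can be expressed as a small driver loop. Here `posterior_fn` is a hypothetical callable that refits the model on the fixed eQTL genotypes for a given phenotype vector and returns the n × K posterior matrix. Only the phenotype is shuffled, so the assignment of eQTLs to tissues is unchanged while any genotype-phenotype link is broken. The function name `permutation_null` is likewise illustrative.

```python
import numpy as np


def permutation_null(y, posterior_fn, n_perm=500, threshold=0.65, seed=0):
    """Mean and s.d. of the number of individuals assigned a tissue when the
    phenotype is randomly permuted across individuals.

    posterior_fn(y) -> (n, K) posterior matrix; a hypothetical stand-in for
    refitting eGST with the eQTL genotypes held fixed.
    """
    rng = np.random.default_rng(seed)
    counts = np.empty(n_perm)
    for b in range(n_perm):
        post = posterior_fn(rng.permutation(y))  # shuffle phenotypes only
        counts[b] = np.sum(post.max(axis=1) >= threshold)
    return counts.mean(), counts.std(ddof=1)
```

The observed count, for example 25,192 for BMI, is then compared against this permutation mean and s.d.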

Genetic characteristics

To confirm that eGST identified groups of individuals with different genetic bases, we contrasted the effects of the adipose- and brain-specific eQTLs on BMI in the individuals assigned to the adipose-specific (or brain-specific) subtype. As expected, among individuals classified as having the brain-specific subtype of genetic contribution to BMI, the effect size magnitude of a brain eQTL SNP is larger than that of an adipose eQTL SNP (Wilcoxon rank sum (WRS) right-tail test p-value < 2.2 × 10−16; Table 1). The opposite holds for individuals classified as having the adipose-specific subtype of BMI. We also find that the effect size magnitude of the adipose eQTLs is larger in the adipose-specific individuals than in the brain-specific individuals. We likewise find statistical evidence for the analogous hypothesis for the brain eQTLs, comparing brain-specific with adipose-specific individuals of BMI (Table 1). We observe the same pattern in the analogous analysis of WHRadjBMI (Table 1).
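A one-sided Wilcoxon rank-sum test of the kind reported in Table 1 can be sketched as below. It uses the normal approximation with a tie-corrected variance and no continuity correction. These are choices we assume, since the exact implementation is not stated here, and `wrs_right_tail` is a hypothetical name. For the brain-specific comparison, one would pass the brain-eQTL effect magnitudes as `x` and the adipose-eQTL effect magnitudes as `y`, both estimated in the brain-specific group.

```python
import math

import numpy as np


def wrs_right_tail(x, y):
    """One-sided Wilcoxon rank-sum test of H1: values in x tend to exceed those in y.

    Normal approximation with average ranks for ties and a tie-corrected
    variance (no continuity correction). Returns (W, p_value), where W is
    the rank sum of x in the pooled sample.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n1, n2 = len(x), len(y)
    N = n1 + n2
    _, inv, counts = np.unique(np.concatenate([x, y]), return_inverse=True,
                               return_counts=True)
    ranks = (np.cumsum(counts) - (counts - 1) / 2.0)[inv]
    W = ranks[:n1].sum()
    mean_W = n1 * (N + 1) / 2.0
    tie_term = np.sum(counts**3 - counts) / (N * (N - 1))
    var_W = n1 * n2 / 12.0 * ((N + 1) - tie_term)
    z = (W - mean_W) / math.sqrt(var_W)
    return W, 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail P(Z > z)
```

As a library alternative, `scipy.stats.mannwhitneyu(x, y, alternative="greater")` performs the same one-sided rank test. It applies its own defaults for continuity correction and exact small-sample p-values.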
Table 1

Genetic heterogeneity between groups of individuals assigned to tissue-specific subtypes of BMI (or WHRadjBMI).

In the BMI analysis, the symbols are defined as follows:

β1 is the SNP effect of an adipose eQTL on BMI in individuals assigned to the adipose-specific subtype of BMI.
β2 is the SNP effect of a brain eQTL on the brain-specific subtype of BMI.
γ1 is the SNP effect of a brain eQTL on the adipose subtype.
γ2 is the effect of an adipose eQTL on the brain subtype.

We provide the mean magnitude of the effect sizes of the tissue-specific eQTLs on the BMI of each tissue-specific group of individuals (e.g., the joint SNP effect of the adipose eQTLs on the BMI of individuals with the adipose subtype). We also give the p-values from Wilcoxon rank sum (WRS) right-tail tests of effect heterogeneity. For each test, the alternative hypothesis is given in parentheses, and the null hypothesis is equality of the corresponding pair of parameters. These parameters are defined in the same way for the adipose subcutaneous (AS) and muscle skeletal (MS) tissue-specific subtypes of WHRadjBMI, and the same analyses are performed.

BMI analysis
BMI subtype        Mean magnitude of eQTLs' effects      P-values
                   Adipose eQTLs     Brain eQTLs
Adipose            0.045 (|β1|)      0.028 (|γ1|)        <2E-16 (|β1| > |γ1|)    <2E-16 (|β2| > |γ1|)
Brain              0.034 (|γ2|)      0.054 (|β2|)        <2E-16 (|β1| > |γ2|)    <2E-16 (|β2| > |γ2|)

WHRadjBMI analysis
WHRadjBMI subtype  Mean magnitude of eQTLs' effects      P-values
                   AS eQTLs          MS eQTLs
AS                 0.0007 (|β1|)     0.0004 (|γ1|)       <2E-16 (|β1| > |γ1|)    <2E-16 (|β2| > |γ1|)
MS                 0.0005 (|γ2|)     0.0007 (|β2|)       E-9 (|β1| > |γ2|)       9E-16 (|β2| > |γ2|)

Phenotypic characteristics of individuals with a prioritized tissue

Next, we explored the phenotypic characteristics of the individuals assigned to a prioritized tissue. We considered 106 phenotypes in the UK Biobank and tested each one for being differentially distributed (heterogeneous) between the individuals of each tissue-specific subtype and the remaining population (see Materials and methods). In aggregate for BMI, 45 quantitative traits and 40 qualitative traits (85 of the 106 in total) were significantly heterogeneous between at least one of the BMI-adipose or BMI-brain specific groups and the remaining population (S10 and S12 Tables and Table 2). None of these 106 traits was differentially distributed between a random set of individuals from the population (of the same size as a tissue-specific subtype group) and the remaining population (see Materials and methods). Thirty-three quantitative and 34 categorical traits were heterogeneous both in the adipose group versus the population and in the brain group versus the population. We found 6 quantitative and 3 categorical traits to be heterogeneous for the adipose group but not the brain group, and 6 quantitative and 3 categorical traits to be heterogeneous for the brain group but not the adipose group (S10 and S12 Tables and Table 2).
Table 2

Qualitative/Categorical traits with three or more categories that are differentially distributed between at least one of the adipose and brain-specific subtype groups of individuals for BMI and the remaining population.

For each trait, we provide the p-values of testing heterogeneity between each tissue-specific subtype group of individuals and the remaining population before (primary) and after BMI adjustment (BMIadj). For each trait, the tissue-specific groups that are significantly heterogeneous (signif tissue) before (primary) and after BMI adjustment (BMIadj) are also provided. An asterisk attached to a trait indicates that the trait remains differentially distributed between at least one of the tissue-specific groups and the remaining population after BMI adjustment. The number of categories of each trait (#categ) is also listed.

Trait | P adipose (primary) | P adipose (BMIadj) | P brain (primary) | P brain (BMIadj) | Signif tissue (primary) | Signif tissue (BMIadj) | #categ
*Overall health rating | 4.98E-239 | 1.83E-15 | 6.96E-87 | 1.59E-19 | both | both | 4
*Alcohol intake frequency | 8.05E-133 | 1.51E-06 | 7.47E-73 | 3.09E-11 | both | both | 6
*Frequency of tiredness lethargy in last 2 weeks | 4.04E-91 | 0.63 | 4.03E-38 | 9.44E-06 | both | brain | 4
*Frequency of depressed mood in last 2 weeks | 5.00E-50 | 2.41E-05 | 3.09E-27 | 5.23E-07 | both | both | 4
*Frequency of unenthusiasm disinterest in last 2 weeks | 3.41E-47 | 5.74E-05 | 3.26E-22 | 0.005 | both | adipose | 4
*Falls in the last year | 5.19E-45 | 6.42E-05 | 3.72E-15 | 0.0002 | both | both | 3
Illness injury bereavement stress in last 2 years | 6.49E-44 | 0.5 | 1.11E-10 | 0.93 | both | none | 7
*Alcohol drinker status | 2.50E-35 | 7.56E-18 | 1.66E-29 | 2.17E-24 | both | both | 3
Getting up in morning | 1.05E-28 | 0.14 | 2.20E-23 | 0.003 | both | none | 4
*Weight change compared with 1 year ago | 6.15E-21 | 3.81E-13 | 9.43E-23 | 3.18E-20 | both | both | 3
*Smoking status | 3.14E-06 | 1.31E-08 | 7.62E-24 | 3.06E-11 | both | both | 3
*Sleeplessness insomnia | 7.15E-24 | 5.09E-05 | 2.88E-10 | 0.0001 | both | both | 3
Frequency of tenseness restlessness in last 2 weeks | 5.08E-19 | 0.26 | 5.24E-20 | 0.004 | both | none | 4
Daytime dozing sleeping narcolepsy | 1.51E-17 | 0.84 | 0.0002 | 0.4 | both | none | 4
Blood clot DVT bronchitis emphysema asthma rhinitis eczema allergy diagnosed by doctor | 5.76E-19 | 0.05 | 1.36E-08 | 0.24 | both | none | 6
*Current tobacco smoking | 5.16E-07 | 1.51E-08 | 4.88E-19 | 2.71E-11 | both | both | 3
*Past tobacco smoking | 2.63E-06 | 3.81E-12 | 1.86E-15 | 2.47E-09 | both | both | 4
Qualifications | 1.31E-08 | 0.30 | 1.62E-06 | 0.1 | both | none | 7
Alcohol intake versus 10 years previously | 4.16E-21 | 0.11 | 0.005 | 0.36 | adipose | none | 3
Nap during day | 1.30E-14 | 0.08 | 0.06 | 0.02 | adipose | none | 3
Morning evening person chronotype | 2.88E-07 | 0.43 | 0.0007 | 0.72 | adipose | none | 4

For example, hemoglobin concentration and snoring were heterogeneous for both the adipose and brain groups; lymphocyte count and alcohol intake versus 10 years previously were heterogeneous only for the adipose group; and birth weight and nervous feelings were heterogeneous only for the brain group (S10 and S12 Tables and Table 2). Hemoglobin concentration was lower in both groups than in the population, whereas reticulocyte percentage was higher in the adipose group but lower in the brain group than in the population (Fig 5 and S10 Table). Among binary traits, snoring was more prevalent in the adipose group and less prevalent in the brain group than in the population (Fig 6). For most of the case-control traits, both tissue-specific groups had a higher risk of the disease than the population (Fig 6). Of note, even when the tissue-specific relative change of a trait (see Materials and methods) had the same direction in both groups, its magnitude differed between the groups for a majority of the traits (Figs 5 and 6). For example, the relative changes in neutrophil count in the two groups were 15% and 8% (S15 Table).
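The tissue-specific relative change can be sketched as below. The Fig 5 and Fig 6 captions describe it as a comparison of a tissue-specific mean (or prevalence) with the population; the exact formula used here, a percentage difference relative to the comparison-population mean, is an assumption. For a case-control trait coded 0/1, the mean is the prevalence, so the same function gives the relative change in risk. The function name and all numbers are hypothetical.

```python
from statistics import mean


def relative_change_pct(group_values, reference_values):
    """Percentage change of the group mean relative to the reference-population mean.

    For a binary (0/1) trait the means are prevalences, so this is the relative
    change in disease risk. Assumed definition, for illustration only.
    """
    ref = mean(reference_values)
    if ref == 0:
        raise ValueError("reference mean is zero; relative change is undefined")
    return 100.0 * (mean(group_values) - ref) / ref


# Hypothetical hemoglobin concentrations (g/dL) and snoring status (1 = yes).
hb_change = relative_change_pct([13.0, 13.5, 12.5], [14.0, 14.5, 13.5])
snoring_change = relative_change_pct([1, 0, 0, 1], [1, 0, 0, 0])
print(round(hb_change, 2), snoring_change)
```

A negative value (as for the hypothetical hemoglobin data) means the trait is lower in the tissue-specific group than in the reference population.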
Similar to BMI, we observed phenotypic heterogeneity across individuals with AS (MS) as the prioritized tissue for WHRadjBMI (S5 and S6 Figs and S11, S13 and S14 Tables and S2 Text).
Fig 5

Percentage of tissue-specific relative change of the quantitative traits that were differentially distributed between the individuals assigned to a tissue-specific subtype of BMI and the remaining population.

Traits in the left panels are heterogeneous for both tissue-specific groups in the primary (unadjusted) analysis, and traits in the right panels are heterogeneous for only one tissue-specific group. We measure the tissue-specific relative change of a trait as 100 × (tissue-specific mean − mean in the remaining population) / (mean in the remaining population), where the tissue-specific mean is computed only in the individuals with the corresponding tissue-specific subtype. The same measure is calculated for the trait residual obtained after adjusting for BMI, to quantify the tissue-specific relative change of the trait after BMI adjustment. The faded green (or blue) bar shows the primary adipose (or brain) tissue-specific relative change of a trait compared to the remaining population. The dark green (or blue) bar shows the BMI-adjusted adipose (or brain) specific relative change of a trait. Each trait listed here was found to be differentially distributed between at least one of the adipose- or brain-specific groups and the remaining population after BMI adjustment. For each trait, an asterisk attached to a bar indicates that the corresponding tissue-specific group remains significantly heterogeneous after BMI adjustment.

Fig 6

Percentage of tissue-specific relative change in the risk of case-control traits between the individuals assigned to a tissue-specific subtype of BMI and the population.

The tissue-specific relative change of a disease risk is measured as 100 × (tissue-specific prevalence − prevalence in the population) / (prevalence in the population). The tissue-specific prevalence of the disorder was computed only in the individuals classified as the corresponding tissue-specific subtype of BMI. An asterisk attached to a trait indicates that the trait remains differentially distributed between at least one of the adipose and brain tissue-specific groups and the remaining population after BMI adjustment. For each trait, an asterisk attached to a bar indicates that the corresponding tissue-specific group remains significantly heterogeneous for the trait after BMI adjustment.

Since BMI itself was differentially distributed between the individuals of the adipose subtype, as well as the brain subtype, and the remaining population, we investigated whether the heterogeneity of the 84 non-BMI traits (S10 and S12 Tables and Table 2) was induced by BMI heterogeneity (see Materials and methods). After BMI adjustment, 72 (out of 85) traits remained heterogeneous (41 quantitative traits [Fig 5 and S15 Table] and 31 qualitative traits [Fig 6 and Table 2]), consistent with these individuals having distinct phenotypic characteristics beyond the effect of the main phenotype. All the quantitative traits that remained heterogeneous after BMI adjustment show the same direction of BMI-adjusted tissue-specific relative change (see Materials and methods) in the adipose and brain groups compared to the population (Fig 5 and S15 Table). Since we used linear regression to evaluate the BMI-adjusted tissue-specific relative change of the heterogeneous non-BMI quantitative traits, we also investigated a model-free strategy based on random BMI matching. We assessed the magnitude of the relative change of a trait between individuals with the brain (or adipose) subtype and a group of BMI-matched random individuals drawn from the population (see Materials and methods).
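The model-free BMI-matching comparison can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure. BMI is binned into fixed-width bins, and each group member is matched to one random individual from the same bin without replacement; the bin width, the one-control-per-case rule, and the helper name `bmi_matched_relative_change` are hypothetical. Group members whose bin has run out of candidates are left unmatched, mirroring the sparsely populated tail bins of the BMI distribution.

```python
import random
from collections import defaultdict
from statistics import mean


def bmi_matched_relative_change(group, rest, bin_width=1.0, seed=0):
    """Relative change (%) of a trait in `group` versus BMI-matched random controls.

    group, rest: lists of (bmi, trait_value) pairs. Each group member is matched to one
    randomly chosen individual from `rest` in the same BMI bin, without replacement.
    Returns (relative_change_pct, number_of_matched_pairs).
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for bmi, value in rest:
        bins[int(bmi // bin_width)].append(value)
    for pool in bins.values():
        rng.shuffle(pool)  # random order, so pop() draws a random control
    cases, controls = [], []
    for bmi, value in group:
        pool = bins.get(int(bmi // bin_width))
        if pool:  # sparse tail bins can run out of candidates
            controls.append(pool.pop())
            cases.append(value)
    if not cases:
        raise ValueError("no group member could be matched")
    ref = mean(controls)
    return 100.0 * (mean(cases) - ref) / ref, len(cases)


# Hypothetical data in which the trait depends on BMI only (trait = BMI + 10).
rest = [(22.0, 32.0), (25.2, 35.2), (25.7, 35.7), (30.5, 40.5), (31.0, 41.0)]
group = [(25.4, 35.4), (30.9, 40.9), (45.0, 55.0)]  # the BMI-45 member has no match
change, n_matched = bmi_matched_relative_change(group, rest)
print(round(change, 3), n_matched)
```

Because the toy trait depends only on BMI, the matched relative change is close to zero, whereas an unmatched comparison of the same group with the whole of `rest` would be inflated by the group's higher BMI.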
For example, the magnitude of the primary brain-specific relative change (prior to BMI matching) for hemoglobin concentration (mean reticulocyte volume) decreased from 20% (5%) to 4% (2%) after BMI matching (S17 Table). Of note, it is very difficult to match BMI exactly between a tissue-specific subtype group and the corresponding random group of individuals, because the bins in the tail of the BMI distribution contain very few individuals (S8 Table), the majority of whom were assigned to a tissue-specific subtype. We observed the same pattern in the results of the analogous analyses for WHRadjBMI (S9, S16 and S18 Tables). To better understand the phenotypic characteristics of the individuals classified to a specific tissue, we performed two further experiments. First, we shuffled the tissue-specific eQTL SNPs between tissues to create artificial tissue-specific eQTL sets and applied eGST to identify groups of individuals with subtypes specific to the artificial tissues (see Materials and methods). For each quantitative trait found to be heterogeneous in the primary analysis between the adipose- and/or brain-specific group and the remaining population (S10 Table), the mean of the artificial tissue-specific means across eQTL shuffles differed significantly from the original tissue-specific trait mean (S19 Table for BMI and S20 Table for WHRadjBMI). For example, for waist circumference, the mean of the pseudo tissue-specific means over the random eQTL shuffles is 93.25 cm for adipose and 91.36 cm for brain, which differ significantly from the original adipose-specific mean of 95.5 cm (P < 10⁻¹⁰⁰) and brain-specific mean of 89.2 cm (P < 10⁻¹⁰⁰), respectively (S19 Table). The same pattern was observed for the primary phenotypes BMI (S19 Table) and WHRadjBMI (S20 Table) themselves. Second, we permuted the phenotype data across individuals while keeping the assignment of eQTLs to tissues fixed as in the original data.
As before, for each quantitative trait found to be heterogeneous in the primary analysis between a tissue-specific group and the remaining population (S10 Table), the mean of the tissue-specific means across random phenotype permutations differed significantly from the original tissue-specific trait mean (S21 Table for BMI and S22 Table for WHRadjBMI). Finally, we also tested each quantitative trait for heterogeneity in distribution between the two tissue-specific subtype groups of BMI, instead of comparing each tissue-specific subtype group with the remaining population. In the BMI analysis, we found 27 quantitative traits to be heterogeneously distributed between the adipose-specific and brain-specific groups of individuals (S23 Table).
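The phenotype-permutation experiment can be sketched as follows. Re-running eGST is replaced by a hypothetical `classify` callback that maps a phenotype vector (aligned with fixed genotypes) to the indices of individuals assigned to one subtype. The joint permutation of phenotype rows, the empirical one-sided p-value, and all data below are illustrative assumptions rather than the study's exact procedure.

```python
import random
from statistics import mean


def permutation_null_means(phenotype, trait, classify, n_perm=200, seed=1):
    """Trait means of the classified group across random phenotype permutations.

    Rows of (phenotype, trait) are permuted jointly across individuals, so their
    link to the (fixed) genotypes is broken while their mutual correlation is kept.
    """
    rng = random.Random(seed)
    order = list(range(len(phenotype)))
    null_means = []
    for _ in range(n_perm):
        rng.shuffle(order)
        y_perm = [phenotype[j] for j in order]
        t_perm = [trait[j] for j in order]
        chosen = classify(y_perm)
        if chosen:  # skip permutations in which nobody is classified
            null_means.append(mean(t_perm[i] for i in chosen))
    return null_means


# Hypothetical toy data: a fixed tissue-specific genetic score g drives the phenotype y,
# and a secondary trait tracks y.
rng = random.Random(3)
n = 500
g = [rng.gauss(0, 1) for _ in range(n)]
y = [0.8 * gi + rng.gauss(0, 1) for gi in g]
trait = [yi + rng.gauss(0, 0.5) for yi in y]


def classify(phen):
    """Stand-in for eGST: high phenotype together with a high genetic score."""
    return [i for i in range(n) if phen[i] > 1.0 and g[i] > 0.5]


observed = mean(trait[i] for i in classify(y))
null = permutation_null_means(y, trait, classify)
p_emp = (1 + sum(m >= observed for m in null)) / (1 + len(null))
print(round(observed, 3), round(mean(null), 3), p_emp)
```

The eQTL-shuffling experiment follows the same pattern, with the genotype-to-tissue assignment, rather than the phenotype, randomized inside the loop.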

Computational efficiency

The MAP-EM algorithm underlying eGST is computationally efficient. In the BMI analysis (336K individuals, with 1705 adipose-specific eQTLs and 1478 brain-specific eQTLs), 70 MAP-EM iterations took 1.75 hours, and the log-likelihood improved by only 2 × 10⁻⁸ in the final iteration. Although we ran eGST for a pair of tissues considering only the top eQTL per gene, it is computationally feasible to analyze larger datasets with more eQTLs and multiple tissues simultaneously.

Discussion

We proposed a novel approach to quantify the tissue-wise genetic contribution to a complex trait and to prioritize a relevant tissue for every individual in a study, integrating genotype and phenotype data with an external expression panel. We applied our method to infer the individual-level tissue of interest for BMI and WHRadjBMI in the UK Biobank, integrating expression data from brain, adipose, and muscle tissues from the GTEx consortium, which were previously shown to be enriched in heritability for these phenotypes [21, 23, 26, 27]. Our approach identified subgroups of individuals whose genetic susceptibility to the trait is mediated in a tissue-specific manner. Interestingly, multiple metabolic, neuropsychiatric, and other traits differed significantly between the tissue-specific groups of individuals and the remaining population, suggesting a biologically meaningful interpretation of these subgroups. Even after adjusting the traits for the primary phenotype (BMI or WHRadjBMI), a majority of the traits remained differentially distributed between a tissue-specific group and the remaining population. We note that the performance of eGST is robust to different reasonable choices of the hyper-parameters of the Bayesian model, mainly owing to the large sample sizes of contemporary GWAS. For example, we chose the hyper-parameters such that 5% of the total variance of each tissue-specific subtype of the trait is explained by the corresponding tissue-specific set of SNPs. Since this is a heuristic choice, we experimented with other values of this prior quantity and observed that modest variation has a negligible effect on the final results.
In the phenotypic heterogeneity analysis, we used a 65% threshold on the tissue-specific subtype posterior probability mainly because it classified more individuals to one of the two tissues than the more stringent 70% threshold. We repeated the analysis of the phenotypic characteristics of the tissue-specific subtype groups for BMI using the 70% threshold and observed a similar pattern of phenotypic heterogeneity between a tissue-specific subtype group and the remaining population. Using the 70% threshold, we found 87 phenotypes in the UKB to be heterogeneously distributed, compared to 85 phenotypes with the 65% threshold, and the two sets of heterogeneous phenotypes overlap substantially. Thus, the results on phenotypic heterogeneity were not sensitive to the choice between these two thresholds; for brevity, we do not report detailed results for the 70% threshold. With a much higher posterior probability threshold, the tissue-specific subtype groups would be small. In such a scenario, even though the identified subtype groups would contain fewer misclassified individuals, the statistical power to identify heterogeneously distributed phenotypes would decrease, mainly because of the substantially smaller group sizes. For example, using 70%, 80%, and 90% posterior probability thresholds in the BMI analysis, the numbers of classified individuals were 11871, 2085, and 177, respectively. We analyzed the UKB sample of individuals who are unrelated up to at least third-degree relatives, i.e., any pair of individuals can be related only as fourth- or higher-degree relatives. To adjust the phenotype for population stratification, we included 20 PCs of genetic ancestry in a linear regression.
However, fitting a linear mixed model is a more comprehensive strategy to adjust for both population stratification and the cryptic relatedness (fourth degree and beyond) that remains in the sample. The number of classified individuals in the permuted data (phenotype values randomly permuted while keeping the genotype data of the tissue-specific eQTLs fixed as in the original data) was on the order of thousands. Even though the number of classified individuals was much larger in the original, non-permuted data, this result indicates that the false discovery rate among the classified individuals is relatively large. One reason is that we used a less stringent threshold (65%) on the tissue-specific subtype posterior probability for classification. Furthermore, carefully expanding the set of tissue-specific eQTLs may reduce the rate of misclassification. In the real data analysis, we considered tissues that previous studies had reported as relevant to the phenotype. For a phenotype whose relevant tissues have not yet been prioritized, the first step would be to apply the statistical approaches proposed by Finucane et al. [23] and Ongen et al. [22] to detect the tissues important for the phenotype. If at least two tissues are identified as significantly relevant for the phenotype, eGST can then be applied to the selected tissues. We note that for some complex traits multiple tissues can be biologically relevant. Although in this work we demonstrated the utility of eGST for a pair of tissues for BMI and WHRadjBMI, the MAP-EM algorithm underlying eGST is general and can be applied to any number (≥ 2) of tissues. Our model can alternatively be viewed as an approach to assign individuals' phenotypes to a collection of tissues that are biologically important for the trait, based on tissue-specific polygenic risk scores [38].
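The trade-off between the posterior-probability threshold and the number of classified individuals can be illustrated with a minimal two-tissue sketch. The assignment rule (tissue k is assigned when its posterior probability is at least the threshold) is an assumption, and the posterior probabilities are hypothetical.

```python
from collections import Counter


def assign_tissue(post_tissue1, threshold=0.65):
    """Label each individual 1, 2, or None (unclassified) from P(C_i = 1).

    Two-tissue case: P(C_i = 2) = 1 - P(C_i = 1). A threshold above 0.5 ensures
    that at most one tissue can pass it.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1]")
    labels = []
    for p1 in post_tissue1:
        if p1 >= threshold:
            labels.append(1)
        elif 1.0 - p1 >= threshold:
            labels.append(2)
        else:
            labels.append(None)
    return labels


post = [0.90, 0.66, 0.50, 0.20, 0.32, 0.71]  # hypothetical posteriors P(C_i = 1)
for t in (0.65, 0.70, 0.85):
    counts = Counter(assign_tissue(post, t))
    print(t, counts[1] + counts[2])  # fewer individuals are classified as t rises
```

Individuals whose genetic contribution is shared across tissues sit near 0.5 and stay unclassified at any threshold above 0.5.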
We did not explicitly model individuals whose genetic contribution to the trait is mediated through both tissues. As demonstrated by simulations, such individuals would be assigned posterior probabilities spread evenly across the tissues, and hence would not appear in the tails of the tissue-specific subtype posterior probability distribution. We developed our model under minimal assumptions on tissue-specific genetics. Following previous studies [20, 23], we characterized a tissue by a set of genes specifically over-expressed in it. However, the best strategy for choosing an optimal subset of genes (e.g., a combination of over-expressed and low-expressed genes) to characterize a tissue efficiently needs further investigation. In the real data application, we considered the top eQTL of each gene in a tissue mainly for computational convenience. A principled strategy to include more eQTLs of the tissue-specific genes would be to use COJO in the GCTA software [39], which performs conditional and joint analysis of multiple cis-SNPs adjusting for linkage disequilibrium (LD), to identify independent eQTLs for each tissue-specific expressed gene. The set of tissue-specific eQTLs obtained in this way is expected to be larger than the set of top eQTLs alone. In future work, we plan to explore how this approach improves the performance of eGST. We considered the top 10% of the tissue-specific expressed genes and the corresponding top eQTLs to implement eGST in the UK Biobank. As an alternative, we also considered the top 15% of the over-expressed genes in a tissue as the set of tissue-specific expressed genes. In the BMI analysis, we ran eGST using the top eQTLs for the top 15% of genes and observed a 73% correlation between the estimates of tissue-specific subtype posterior probability across individuals obtained from the top 10% and top 15% tissue-specific expressed genes.
Further increasing the percentage of included tissue-specific genes can reduce the tissue specificity of the selected genes. Thus, choosing an optimal percentage of tissue-specific genes and the corresponding set of eQTLs is a challenging task that requires a separate, extensive investigation. However, the top 10% expressed genes and their top eQTLs should always form a core part of the tissue-specific set of genes and eQTLs. Thus, even if we expand the list of tissue-specific expressed genes and corresponding eQTLs, the estimates of tissue-specific subtype posterior probability should remain substantially correlated. In the real data analysis, eGST classified a small percentage of individuals into tissue-specific subtypes. A few possible reasons are as follows: 1. a substantial proportion of individuals may have their genetic contribution mediated through both tissues; 2. more than two tissues can be biologically relevant for the phenotype; 3. we represented tissue specificity only by over-expressed genes, whereas also including genes with lower expression in a tissue is an interesting possibility. The estimated tissue-specific subtype heritability was small (3%–4%) for both BMI and WHRadjBMI. These estimates are expected to increase when more eQTLs are included in the analyses. Nevertheless, they are comparable to the average estimated heritability of six complex traits (3.4%) due to the effects of imputed gene expression in blood/adipose reported by Gusev et al. [16]. Moreover, only part of the genetic susceptibility to a complex trait is expected to be mediated through gene expression. A recent study [31] proposed integrating clinical features related to a disease with imputed gene expression profiles to identify subtypes of the disease. We note that the objective of our study is distinct, and the two approaches are not directly comparable.
This is because we aim to explicitly quantify the tissue-wise genetic contribution to the trait at the individual level and to prioritize a relevant tissue for each individual. From a methodological perspective, we proposed an explicitly likelihood-based classification framework, in contrast to their multi-view clustering algorithm. We conclude with a few caveats and limitations of our work and opportunities for future improvement. We investigated the utility of eGST using adipose and brain tissues for BMI [21, 23, 27, 35–37], and adipose and muscle tissues for WHRadjBMI [23, 26]. However, the true tissues of interest could be different owing to limitations of the existing studies. Since eGST depends on the choice of relevant tissues for a trait, a possible generalization of the model could include an additional mixture component that is not associated with any of the tissues considered (a null component) and represents individuals for whom none of these tissues is relevant. In the BMI analysis, a preliminary experiment with this model indicated that the individuals assigned to the null component had remained unclassified (to any of the tissues) by the primary two-component model. That said, we emphasize that eGST is a general analytic framework that can be applied to a collection of tissues for any complex trait. Even though eGST identified subgroups of individuals whose genetic contribution to the trait is mediated in a tissue-specific manner, a major proportion of individuals remained unclassified. A few possible reasons are that we considered only two tissues, whereas multiple tissues can be relevant for the trait, and that we considered only the top eQTL for each tissue-specific gene. We note that other types of tissue-specific QTLs (e.g., methylation QTLs, histone QTLs, splicing QTLs, etc. [40]) could also be combined with eQTLs to create a set of SNPs that better represents the tissue-specific genetic architecture.
To explore the performance of eGST in real data analysis, we considered the top eQTLs for the tissue-specific expressed genes. Finucane et al. [23] and Kitsak et al. [41] demonstrated that a promising strategy for understanding the tissue specificity underlying a complex phenotype is to consider the genes that are specifically expressed in the tissue. Another possibility is to first fine-map the causal eQTLs for each eGene in a tissue [42], and then use the resulting set of tissue-specific eQTLs in eGST. In future work, we plan to investigate the merits of this approach. We developed the model for continuous traits; extending the method to case-control data would require a logistic regression likelihood. Another future methodological direction is to extend the model to a penalized regression framework, because model fitting issues can arise when the number of SNPs characterizing the genetic architecture of a tissue becomes large and the ratio of the number of individuals to the number of SNPs decreases. Finally, Fig 1b suggests that, if gene expression data across tissues were available, the expression data themselves could be used to identify expression subtypes of the trait. However, since expression data are not available in most GWAS cohorts, an alternative avenue would be to impute the genetically regulated component of gene expression, e.g., using PrediXcan [34] or EpiXcan [43], and to identify the tissue of interest based on the imputed expression. A possible advantage of such an approach is that all cis-eQTLs could be combined to impute tissue-specific expression (instead of only the top few eQTLs), although substantial noise in the predicted expression, due to the limited sample size of the expression panel, could also reduce the gain in performance. Another limitation of our analysis is that we focused on GTEx data, which do not have large sample sizes across tissues.
In future work, we plan to apply eGST to other complex traits, integrating expression datasets with larger sample sizes. We provide a user-friendly R software package ‘eGST’ for general use of our approach: https://cran.r-project.org/web/packages/eGST/index.html.

Materials and methods

Model

For simplicity, we describe the model assuming two ($K = 2$) tissues of interest. Suppose that, for $n$ unrelated individuals, we have phenotype data $Y = (y_1, \ldots, y_n)$ and expression data for two sets of tissue-specific expressed genes, $E^{(1)}$ and $E^{(2)}$, characterizing the two tissues. We define an indicator variable $C_i$ such that, for individual $i$, $C_i = k$ if and only if the genetic susceptibility of the individual's phenotype is mediated through tissue $k$, $k = 1, 2$ (Fig 1). We model the phenotype of individual $i$ based on the tissue-specific expression of the two sets of tissue-specific genes as

$$ y_i = \begin{cases} \alpha_1 + E_{i1}^{(1)\top}\beta_{1}^{(1)} + E_{i1}^{(2)\top}\beta_{1}^{(2)} + \epsilon_{i1}, & \text{if } C_i = 1,\\ \alpha_2 + E_{i2}^{(1)\top}\beta_{2}^{(1)} + E_{i2}^{(2)\top}\beta_{2}^{(2)} + \epsilon_{i2}, & \text{if } C_i = 2. \end{cases} $$

Here, $\alpha_1$ and $\alpha_2$ represent the baseline tissue-specific trait means. $E_{i1}^{(1)}$ and $E_{i1}^{(2)}$ denote the vectors of expression values of the first and second tissue-specific sets of genes, respectively, for individual $i$ in the first tissue; $E_{i2}^{(1)}$ and $E_{i2}^{(2)}$ denote the corresponding vectors in the second tissue. Under $C_i = 1$, $\beta_{1}^{(1)}$ and $\beta_{1}^{(2)}$ denote the effects on the trait of the expression of the first and second tissue-specific sets of genes in the first tissue, respectively. Similarly, under $C_i = 2$, $\beta_{2}^{(1)}$ and $\beta_{2}^{(2)}$ denote the effects of the expression of the two gene sets in the second tissue. Next, we assume that if $C_i = 1$, the expression in the first tissue of the second tissue-specific genes (which are expressed at much lower levels in the first tissue) has no effect on the individual's phenotype ($\beta_{1}^{(2)} = 0$). Similarly, when $C_i = 2$, we assume that the expression in the second tissue of the first tissue-specific genes (which are expressed at much lower levels in the second tissue) has no effect on the phenotype ($\beta_{2}^{(1)} = 0$). Under these assumptions, we obtain the simplified model

$$ y_i = \begin{cases} \alpha_1 + E_{i1}^{(1)\top}\beta_{1}^{(1)} + \epsilon_{i1}, & \text{if } C_i = 1,\\ \alpha_2 + E_{i2}^{(2)\top}\beta_{2}^{(2)} + \epsilon_{i2}, & \text{if } C_i = 2. \end{cases} $$

Expression datasets generally have limited sample sizes and are not available in large GWAS cohorts. Therefore, we consider the genetically regulated component of each tissue-specific gene's expression.
However, the genetic component of expression predicted in the GWAS cohort by integrating an external expression panel [16, 34] can be substantially noisy, mainly because of the limited sample size of the expression panel (e.g., GTEx). Since eQTLs explain a substantial proportion of the heritability of gene expression, we use the genotypes of tissue-specific eQTLs (i.e., eQTLs for tissue-specific genes) in the GWAS data as a proxy for the genetically regulated component of the expression of the corresponding tissue-specific genes. Suppose that, in a GWAS cohort, we have phenotype data and genotype data for the two sets of tissue-specific eQTL SNPs corresponding to the two sets of tissue-specific expressed genes, one comprising $m_1$ SNPs and the other comprising $m_2$ SNPs. Then, we consider the following model for the phenotype of individual $i$:

$$ y_i = \begin{cases} \alpha_1 + X_{i1}^\top \beta_1 + \epsilon_{i1}, & \text{if } C_i = 1,\\ \alpha_2 + X_{i2}^\top \beta_2 + \epsilon_{i2}, & \text{if } C_i = 2. \end{cases} $$

So, the phenotype of individual $i$ under tissue of interest $k$ is modeled as $y_i = \alpha_k + X_{ik}^\top \beta_k + \epsilon_{ik}$, where $\alpha_k$ is the baseline tissue-specific trait mean, $X_{ik}$ is the vector of normalized genotype values of individual $i$ at the $m_k$ eQTL SNPs specific to tissue $k$, $\beta_k$ is the vector of their effects on the trait under $C_i = k$, and $\epsilon_{ik}$ is a noise term, $i = 1, \ldots, n$ and $k = 1, 2$. The random errors are distributed as $\epsilon_{i1} \sim N(0, \sigma^2_{\epsilon 1})$ and $\epsilon_{i2} \sim N(0, \sigma^2_{\epsilon 2})$. The above mixture model can be viewed as a variant of the finite mixture of regression models in which each component is a linear model with a distinct set of predictors. Of note, the mixture model in our context is identifiable because the mean parameter in each component is a function of the genotype vector of the set of tissue-specific eQTLs, which is distinct across tissues [44].
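To make the generative model concrete, the sketch below simulates data from it. Several choices are assumptions made for illustration: genotypes are Binomial(2, p) allele counts that are then standardized, allele frequencies are drawn uniformly between 0.05 and 0.5, the baseline means are equal, the trait variance is 1 within each subtype, and each per-SNP effect variance is h²/m_k, so that the eQTLs explain a fraction h² of each subtype's variance in line with the prior described under Prior distributions.

```python
import random


def standardized_genotypes(rng, n, m):
    """n x m matrix of standardized allele counts, each SNP with its own allele frequency."""
    freqs = [rng.uniform(0.05, 0.5) for _ in range(m)]
    rows = []
    for _ in range(n):
        row = []
        for p in freqs:
            count = (rng.random() < p) + (rng.random() < p)  # Binomial(2, p) draw
            row.append((count - 2 * p) / (2 * p * (1 - p)) ** 0.5)
        rows.append(row)
    return rows


def simulate_two_tissue_trait(n, m1, m2, w1=0.5, h2=0.05, seed=0):
    """Draw C_i, then y_i = alpha_k + x_ik . beta_k + eps_ik for the drawn subtype k."""
    rng = random.Random(seed)
    X = (standardized_genotypes(rng, n, m1), standardized_genotypes(rng, n, m2))
    m = (m1, m2)
    alpha = (0.0, 0.0)  # equal baseline means (assumption)
    beta = [[rng.gauss(0, (h2 / m[k]) ** 0.5) for _ in range(m[k])] for k in range(2)]
    sd_eps = (1 - h2) ** 0.5
    C, y = [], []
    for i in range(n):
        k = 0 if rng.random() < w1 else 1
        mu = alpha[k] + sum(b * x for b, x in zip(beta[k], X[k][i]))
        C.append(k + 1)  # report subtypes as 1 and 2, as in the text
        y.append(mu + rng.gauss(0, sd_eps))
    return X, C, y


X, C, y = simulate_two_tissue_trait(n=2000, m1=20, m2=15)
print(sum(c == 1 for c in C) / len(C))
```

Only the eQTL set of the drawn subtype enters each individual's mean, which is the feature that makes the two mixture components distinguishable.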

Prior distributions

$P(C_i = k) = w_k$ is the prior proportion of individuals whose phenotype is driven by the genetic effects specific to tissue k. We assume that the eQTL SNP sets across the K tissues are non-overlapping and that each element of $\beta_k$, the genetic effect of the tissue-k-specific eQTLs on the trait, is independently drawn from $N(0, \sigma^2_{\beta_k})$. If $\sigma^2_{y_k}$ is the variance of the trait under $C_i = k$, then $\sigma^2_{\beta_k} = h_k^2 \sigma^2_{y_k} / m_k$, where $h_k^2$ is the heritability of the trait under $C_i = k$ due to the $m_k$ eQTLs specific to tissue k; we term $h_k^2$ the tissue-k-specific subtype heritability. We also assume that $\alpha_1 \sim N(0, \sigma_\alpha^2)$ and $\alpha_2 \sim N(0, \sigma_\alpha^2)$, with fixed $\sigma_\alpha^2$. For K = 2, we assume that $w_1 \sim \text{Beta}(s_1, s_2)$ (with $w_2 = 1 - w_1$), which generalizes to a Dirichlet distribution for more than two tissues. We use the fixed values $s_1 = s_2 = 1$. Next, we assume $\sigma^2_{\beta_1}, \sigma^2_{\beta_2} \sim \text{Inverse-Gamma}(a_\beta, b_\beta)$ and $\sigma_1^2, \sigma_2^2 \sim \text{Inverse-Gamma}(a_\epsilon, b_\epsilon)$. We choose fixed values of $a_\beta$, $b_\beta$, $a_\epsilon$ and $b_\epsilon$ such that, in prior expectation, 5% of the total variance of each tissue-specific subtype of the trait (under $C_i = 1$ or 2) is explained by the corresponding set of tissue-specific eQTL SNPs and 95% remains unexplained.
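The arithmetic behind these prior choices can be sketched as follows. This sketch assumes that the 5%/95% split is imposed on the prior means of $m_k\sigma^2_{\beta_k}$ and $\sigma_k^2$, and uses the fact that an Inverse-Gamma(a, b) distribution has mean b/(a − 1) for a > 1. The common shape a = 3 and the helper names are hypothetical choices for illustration, not values reported for eGST.

```python
def effect_variance(h2, sigma2_y, m):
    """Per-SNP effect variance implied by subtype heritability h2 spread over m normalized SNPs."""
    return h2 * sigma2_y / m


def inv_gamma_scale_for_mean(target_mean, shape):
    """Scale b such that Inverse-Gamma(shape, b) has mean b / (shape - 1) = target_mean."""
    if shape <= 1:
        raise ValueError("the Inverse-Gamma mean is finite only for shape > 1")
    return target_mean * (shape - 1.0)


def prior_hyperparameters(sigma2_y, m, explained=0.05, shape=3.0):
    """Hypothetical helper returning (a_beta, b_beta, a_eps, b_eps) such that, in prior
    expectation, m * sigma2_beta = explained * sigma2_y and
    sigma2_eps = (1 - explained) * sigma2_y."""
    b_beta = inv_gamma_scale_for_mean(explained * sigma2_y / m, shape)
    b_eps = inv_gamma_scale_for_mean((1.0 - explained) * sigma2_y, shape)
    return shape, b_beta, shape, b_eps
```

For $\sigma^2_{y_k} = 1$ and $m_k = 1000$, `prior_hyperparameters(1.0, 1000)` gives scales of about 1e-4 for the effect-size variance and 1.9 for the residual variance.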

Inference procedure

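Under this Bayesian framework, we implemented a maximum a posteriori (MAP) expectation-maximization (EM) algorithm (Algorithm 1) to estimate the posterior probability that the phenotype of individual i is mediated through the genetic effects of the eQTLs specific to tissue k, $P(C_i = k \mid X, Y)$. A frequentist version of the mixture model is also possible: instead of following prior distributions, $(w_1, w_2)$, $(\alpha_1, \alpha_2)$, $(\beta_1, \beta_2)$ and $(\sigma_1^2, \sigma_2^2)$ can be treated as fixed unknown parameters. We implemented an EM algorithm to estimate the tissue-specific posterior probabilities across individuals under this frequentist framework. Next, for a general number of tissues K (≥ 2), we outline the MAP-EM algorithm that implements the Bayesian framework and the EM algorithm that implements the frequentist framework of the mixture model [45, 46]. For individual i and tissue k, i = 1, …, n and k = 1, …, K, let $P(C_i = k) = w_k$, with $\sum_{k=1}^{K} w_k = 1$. Denote the set of parameters specific to tissue k by $\theta_k = (w_k, \alpha_k, \beta_k, \sigma_k^2)$ and the full set of parameters by $\Theta = (\theta_1, \dots, \theta_K)$. Under the mixture model, the likelihood of individual i takes the form

$$L_i(\Theta) = \sum_{k=1}^{K} w_k\, \phi\left(y_i \mid \alpha_k + x_{ik}^\top \beta_k, \sigma_k^2\right),$$

where $\phi(\cdot \mid \mu, \sigma^2)$ denotes the normal density with mean μ and variance σ². Thus, the log-likelihood of the full data conditional on Θ is $\log L(\Theta) = \sum_{i=1}^{n} \log L_i(\Theta)$. The prior log-likelihood of $(C_1, \dots, C_n)$ is $\sum_{i=1}^{n} \log f(C_i)$, where $f(C_i)$ is specified by $P(C_i = k) = w_k$ and $(w_1, \dots, w_K) \sim \text{Dirichlet}(s_1, \dots, s_K)$. The prior of the tissue-k parameters $\theta_k$ has the following hierarchical structure: $\alpha_k \sim N(0, \sigma_\alpha^2)$, $\beta_k \mid \sigma^2_{\beta_k} \sim N(0, \sigma^2_{\beta_k} I_{m_k})$, $\sigma^2_{\beta_k} \sim \text{Inverse-Gamma}(a_\beta, b_\beta)$ and $\sigma_k^2 \sim \text{Inverse-Gamma}(a_\epsilon, b_\epsilon)$. A priori, $\theta_1, \dots, \theta_K$ are independently distributed. Define the posterior probability that the phenotype of individual i is assigned to tissue k as

$$\gamma_{ik} = P(C_i = k \mid y_i, \Theta) = \frac{w_k\, \phi(y_i \mid \alpha_k + x_{ik}^\top \beta_k, \sigma_k^2)}{\sum_{l=1}^{K} w_l\, \phi(y_i \mid \alpha_l + x_{il}^\top \beta_l, \sigma_l^2)}.$$

Thus, the choice of the tissue of interest across individuals is quantified by $\Gamma = \{\gamma_{ik};\ i = 1, \dots, n;\ k = 1, \dots, K\}$. Next, we define the total membership weight of the tissue-k-specific subtype as $n_k = \sum_{i=1}^{n} \gamma_{ik}$, so that $\sum_{k=1}^{K} n_k = n$. In the EM algorithm, the main component that we maximize is

$$Q(\Theta \mid \Theta^{(r)}) = \sum_{i=1}^{n} \sum_{k=1}^{K} \gamma_{ik}^{(r)} \left[\log w_k + \log \phi\left(y_i \mid \alpha_k + x_{ik}^\top \beta_k, \sigma_k^2\right)\right],$$

which is the conditional expectation $Q(\Theta \mid \Theta^{(r)}) = E_{C \mid y, \Theta^{(r)}}\{\log f(y, C \mid \Theta)\}$, with $\gamma_{ik}^{(r)}$ evaluated at $\Theta^{(r)}$.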
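To obtain $\Theta^{(r+1)}$, we maximize $Q(\Theta \mid \Theta^{(r)}) + \log f(\Theta)$ in the MAP-EM algorithm implementing the Bayesian framework, and maximize only $Q(\Theta \mid \Theta^{(r)})$ in the EM algorithm implementing the frequentist framework [45, 46]. The steps of the MAP-EM and EM algorithms are given in Algorithms 1 and 2, respectively. Our main inference is based on the posterior probability matrix $\Gamma = \{\gamma_{ik};\ i = 1, \dots, n;\ k = 1, \dots, K\}$. For example, $\gamma_{i1} > 65\%$ indicates that tissue 1 is likely to be the tissue of interest for individual i. We also designed a Markov chain Monte Carlo (MCMC) algorithm to implement the eGST model, which we describe in S1 Algorithm in S1 Text.

Algorithm 1 Maximum a posteriori (MAP) expectation-maximization (EM) algorithm under the Bayesian framework of the mixture model
1: Initialization: For k = 1, …, K, choose initial values of the mixing weight, the baseline mean and the variance parameters, and simulate the initial SNP effects $\beta_k^{(0)}$ from a zero-mean normal distribution. Compute the initial log-likelihood $\log L(\Theta^{(0)})$. Then, for iterations r = 0, 1, …:
2: E-step: Compute $\gamma_{ik}^{(r)}$ for all i and k, and $n_k^{(r)} = \sum_{i=1}^{n} \gamma_{ik}^{(r)}$.
3: M-step: For k = 1, …, K, update the tissue-k parameters by maximizing $Q(\Theta \mid \Theta^{(r)}) + \log f(\Theta)$ to obtain $\Theta^{(r+1)}$.
4: Convergence check: Compute the new log-likelihood $\log L(\Theta^{(r+1)})$. Return to step 2 if $|\log L(\Theta^{(r+1)}) - \log L(\Theta^{(r)})| > \delta$ for a prefixed threshold δ (e.g., $10^{-5}$).

Algorithm 2 Expectation-maximization (EM) algorithm under the frequentist framework of the mixture model
1: Initialization: For k = 1, …, K, choose initial values of the mixing weight, the baseline mean and the variance parameters, and simulate the initial SNP effects $\beta_k^{(0)}$ from a zero-mean normal distribution. Compute the initial log-likelihood $\log L(\Theta^{(0)})$. Then, for iterations r = 0, 1, …:
2: E-step: Compute $\gamma_{ik}^{(r)}$ for all i and k, and $n_k^{(r)} = \sum_{i=1}^{n} \gamma_{ik}^{(r)}$.
3: M-step: For k = 1, …, K, update the tissue-k parameters by maximizing $Q(\Theta \mid \Theta^{(r)})$ to obtain $\Theta^{(r+1)}$.
4: Convergence check: Compute the new log-likelihood $\log L(\Theta^{(r+1)})$. Return to step 2 if $|\log L(\Theta^{(r+1)}) - \log L(\Theta^{(r)})| > \delta$ for a prefixed threshold δ.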
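The following is a minimal sketch of a frequentist EM in the spirit of Algorithm 2, assuming NumPy and SciPy. The closed-form M-step used here (weighted least squares for $\alpha_k$ and $\beta_k$, a weighted residual variance for $\sigma_k^2$, and $w_k = n_k/n$) is the standard maximizer of Q for a mixture of linear regressions and is our assumption, since the exact update formulas are not reproduced in this section; a MAP-EM variant would add the log-prior terms to each update. The function name, the initialization and the tiny ridge term are hypothetical choices.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def em_mixture_regression(y, X_list, n_iter=500, tol=1e-5, seed=0):
    """Frequentist EM sketch for y_i = alpha_k + x_ik^T beta_k + eps_ik, eps_ik ~ N(0, sigma_k^2),
    where component k uses its own predictor matrix X_list[k] (n x m_k).
    Returns the (n, K) posterior matrix Gamma and the fitted parameters."""
    rng = np.random.default_rng(seed)
    n, K = y.shape[0], len(X_list)
    # Initialization (hypothetical choice): equal weights, common mean and variance,
    # small random effects so that the components can move apart.
    w = np.full(K, 1.0 / K)
    alpha = np.full(K, y.mean())
    sigma2 = np.full(K, y.var())
    beta = [rng.normal(0.0, 0.01, X.shape[1]) for X in X_list]

    def log_terms():
        # (n, K) matrix of log w_k + log N(y_i | alpha_k + x_ik^T beta_k, sigma_k^2)
        return np.column_stack([
            np.log(w[k]) + norm.logpdf(y, alpha[k] + X_list[k] @ beta[k], np.sqrt(sigma2[k]))
            for k in range(K)
        ])

    lt = log_terms()
    loglik = logsumexp(lt, axis=1).sum()
    for _ in range(n_iter):
        # E-step: posterior probabilities gamma_ik and membership weights n_k
        gamma = np.exp(lt - logsumexp(lt, axis=1, keepdims=True))
        n_k = gamma.sum(axis=0)
        # M-step: weighted least squares of y on [1, X_k] with weights gamma_ik
        for k in range(K):
            Z = np.column_stack([np.ones(n), X_list[k]])
            ZtW = Z.T * gamma[:, k]
            # the tiny ridge term is only for numerical stability
            coef = np.linalg.solve(ZtW @ Z + 1e-8 * np.eye(Z.shape[1]), ZtW @ y)
            alpha[k], beta[k] = coef[0], coef[1:]
            resid = y - Z @ coef
            sigma2[k] = max(gamma[:, k] @ resid**2 / n_k[k], 1e-8)
        w = n_k / n
        # Convergence check on the log-likelihood
        lt = log_terms()
        new_loglik = logsumexp(lt, axis=1).sum()
        done = abs(new_loglik - loglik) <= tol
        loglik = new_loglik
        if done:
            break
    gamma = np.exp(lt - logsumexp(lt, axis=1, keepdims=True))
    return gamma, {"w": w, "alpha": alpha, "beta": beta, "sigma2": sigma2, "loglik": loglik}
```

Thresholding a column of the returned matrix (for example, $\gamma_{i1} > 0.65$) then yields the individuals assigned to a tissue-specific subtype.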

Simulation design and choice of parameters

Consider n individuals and two non-overlapping sets of $m_1$ and $m_2$ SNPs representing the eQTL SNP sets specific to two tissues. We chose SNPs on chromosomes 8–17 from the array SNPs in the UK Biobank (UKB). We pruned for LD so that any two consecutive SNPs (on a chromosome) included in a SNP set had $r^2 < 0.25$ (based on UKB in-sample LD). Each SNP had MAF > 1% and satisfied Hardy-Weinberg equilibrium (HWE). We collected genotype data at both sets of SNPs for n individuals randomly selected from 337,205 white-British individuals in the UKB. Let $w = (w_1, w_2)$ denote the proportions of individuals in the sample assigned to the two tissues, where $(100 \times w_1)\%$ of individuals are assigned the first tissue-specific subtype and $(100 \times w_2)\%$ are assigned the second tissue-specific subtype. We assume that the $m_k$ SNPs explain a proportion $h_k^2$ of the total variance of the tissue-k-specific subtype, k = 1, 2; that is, $h_k^2$ is the heritability of the tissue-k-specific subtype of the trait due to the $m_k$ SNPs representing tissue-k-specific eQTLs. Thus, if the first subtype of Y has total variance $\sigma^2_{y_1}$, we draw each element of $\beta_1$ as $\beta_{1j} \sim N(0, h_1^2 \sigma^2_{y_1} / m_1)$, j = 1, …, $m_1$. Similarly, we simulate $\beta_2$, the genetic effect of the second set of $m_2$ SNPs on the second subtype of Y. For simplicity, we assume a common total variance for the two subtypes ($\sigma^2_{y_1} = \sigma^2_{y_2}$), but the performance of eGST remains similar for other choices of this parameter. If the genetic susceptibility of an individual's phenotype was assigned to be mediated through the first tissue, we simulated the phenotype as $y_i = \alpha_1 + x_{i1}^\top \beta_1 + \epsilon_{i1}$, where $x_{i1}$ is the vector of normalized genotype values of the individual at the first set of SNPs. While simulating the phenotype, we normalized the genotypes at each of the $m_1$ first tissue-specific SNPs based only on the individuals assigned to the first tissue-specific subtype. However, when applying eGST to a simulated dataset, we normalized the genotypes at each SNP based on all n individuals in the sample, because the tissue of interest of each individual is unknown.
The random error components are distributed as $\epsilon_{i1} \sim N(0, (1 - h_1^2)\sigma^2_{y_1})$ and $\epsilon_{i2} \sim N(0, (1 - h_2^2)\sigma^2_{y_2})$. We varied the parameters to evaluate eGST in various simulation scenarios. We fixed the total variance of each subtype, initially assumed $\alpha_1 = \alpha_2 = 0$, and simulated $\beta_1$ and $\beta_2$ from zero-mean normal distributions. We considered all possible combinations of $(w_1, w_2)$ from a prespecified set of values, and all possible combinations of $(h_1^2, h_2^2)$ with $h_1^2, h_2^2 \in \{10\%, 20\%, 30\%, 40\%, 50\%\}$. We also considered two unrealistic extreme scenarios, null subtype heritability ($h_1^2 = h_2^2 = 0$) and high subtype heritability, to evaluate whether eGST performs as expected in these cases. We chose $m_1 \in \{1000, 1500\}$ and $m_2 \in \{1000, 1500\}$. We initially chose n = 40,000, and later n = 100,000 to explore the effect of an increased sample size. For each complete set of simulation parameters, we summarized the results of eGST across 50 simulated datasets. We also performed simulations with $\alpha_1 \neq \alpha_2$ and with different non-zero means of the distributions of $\beta_1$ and $\beta_2$.
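A scaled-down version of this simulation design can be sketched in Python as below. It assumes independent synthetic genotypes with allele frequencies between 5% and 50% in place of LD-pruned UK Biobank genotypes, and a common total variance for both subtypes; `simulate_subtypes` and `standardize` are hypothetical names. It reproduces the two normalization conventions described above: genotypes are standardized within each subtype when generating phenotypes, but across all n individuals for fitting.

```python
import numpy as np


def standardize(G):
    """Column-wise standardization of a genotype dosage matrix (0/1/2)."""
    sd = G.std(axis=0)
    sd[sd == 0] = 1.0  # guard against monomorphic columns
    return (G - G.mean(axis=0)) / sd


def simulate_subtypes(n, m1, m2, w1, h2, sigma2_y=1.0, alpha=(0.0, 0.0), seed=0):
    """Simulate a two-subtype phenotype; genotypes are synthetic, not UK Biobank."""
    rng = np.random.default_rng(seed)
    ms = (m1, m2)
    # independent SNPs with allele frequencies drawn uniformly from [0.05, 0.5)
    G = [rng.binomial(2, rng.uniform(0.05, 0.5, m), size=(n, m)).astype(float) for m in ms]
    n1 = int(round(w1 * n))
    labels = np.r_[np.zeros(n1, dtype=int), np.ones(n - n1, dtype=int)]
    y = np.empty(n)
    betas = []
    for k in range(2):
        # each effect ~ N(0, h_k^2 * sigma_y^2 / m_k)
        beta_k = rng.normal(0.0, np.sqrt(h2[k] * sigma2_y / ms[k]), ms[k])
        betas.append(beta_k)
        idx = labels == k
        # normalize genotypes only within the individuals assigned to subtype k
        Xk = standardize(G[k][idx])
        # error ~ N(0, (1 - h_k^2) * sigma_y^2)
        eps = rng.normal(0.0, np.sqrt((1.0 - h2[k]) * sigma2_y), idx.sum())
        y[idx] = alpha[k] + Xk @ beta_k + eps
    # for fitting, genotypes are normalized over all n individuals
    X_all = [standardize(Gk) for Gk in G]
    return y, X_all, labels, betas
```

The returned `y` and `X_all` can be passed to any fitting routine for the mixture model, and `labels` serve as the ground truth for computing classification accuracy.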

BMI and WHRadjBMI analysis in the UK Biobank integrating GTEx data

We implemented eGST to infer the individual-level tissue of interest for two obesity-related measures, BMI and WHRadjBMI, in the UK Biobank [32, 33], integrating expression data from the Genotype-Tissue Expression (GTEx) project [17, 18]. Sets of tissue-specifically expressed genes were obtained from Finucane et al. [23], who analyzed the GTEx data, considered a gene to be specifically expressed in a tissue if its mean expression in that tissue is substantially higher than its mean expression in all other tissues combined, and calculated a t-statistic to rank the genes with respect to higher expression in a specific tissue. Following their work [23], we considered the top 10% of all genes (2485 genes) in a tissue, ranked by descending tissue-specific t-statistic, as the set of genes specifically expressed in that tissue. We focused on the adipose and brain tissues for BMI, and on the adipose and muscle tissues for WHRadjBMI. We took the union of the sets of genes specifically expressed in adipose subcutaneous and adipose visceral tissues as the adipose-specific gene set. Similarly, we took the union of the sets of genes specifically expressed in the brain cerebellum and brain cortex (the two brain regions with the largest sample sizes) to create a brain-specific gene set. We excluded the genes shared by these two sets to obtain non-overlapping sets of adipose- and brain-specific genes. For WHRadjBMI, we considered adipose subcutaneous and muscle skeletal connective tissue, and excluded the genes shared by the two sets of top 10% tissue-specific genes. We considered genes on the autosomes (chromosomes 1–22). In the BMI analysis, the main reason for merging two types of adipose (or brain) tissue to represent adipose (or brain) was to increase the number of tissue-specific eGenes per tissue.
For the WHRadjBMI analysis, we used the adipose subcutaneous and muscle skeletal tissues without merging, to look for patterns in the performance of eGST that might be missed in the BMI analysis because of merged tissues. The subsets of the primary tissue-specific gene sets that were eGenes in GTEx were included in subsequent analyses. A gene is considered an eGene if at least one cis-SNP is significantly associated with its expression at FDR level 0.05 [17]. For the WHRadjBMI analysis, among the 2228 initially selected adipose subcutaneous tissue-specific genes, 1152 were eGenes for which at least one bi-allelic SNP was reported as an eQTL in the GTEx summary-level data (version 7). Similarly, we had 1272 eGenes for muscle skeletal tissue. In the BMI analysis, we had 1887 eGenes for adipose and 1653 eGenes for brain. For each gene in a tissue, we took the top bi-allelic eQTL SNP (smallest SNP-expression association p-value) with MAF > 1%. In the BMI analysis, when creating the adipose-specific set of eQTLs, if a gene was specific to both adipose subcutaneous and adipose visceral tissue, we included its top eQTL in each tissue, one from subcutaneous and one from visceral. We applied the same strategy to the brain tissues. Next, from each set of tissue-specific eQTL SNPs (obtained from GTEx), we retained the SNPs that were genotyped or imputed in UKB (imputation accuracy score > 0.9) and that passed HWE screening (p-value > $10^{-6}$) in UKB. We LD-pruned each set of tissue-specific eQTL SNPs at an $r^2$ threshold of 0.25 using UKB in-sample LD: within a tissue-specific set, if two eQTL SNPs had $r^2 > 0.25$, we excluded the one whose minimum SNP-expression association p-value in GTEx (across the genes for which it was the top eQTL) was larger. After LD pruning, we had 1705 eQTL SNPs specific to adipose and 1478 eQTL SNPs specific to brain for the BMI analysis.
We obtained 953 eQTL SNPs specific to adipose subcutaneous tissue and 1052 eQTL SNPs specific to muscle skeletal tissue for WHRadjBMI analysis. We used individual-level genotype data for the tissue-specific SNP sets in UKB to infer tissue of interest across individuals. Before running eGST, we normalized genotypes at each SNP in the two tissue-specific sets based on the whole sample of individuals.
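The pruning rule described above can be sketched as a greedy pass over SNPs ordered by significance. This is a minimal sketch assuming dosage genotypes held in memory and $r^2$ computed as the squared Pearson correlation of dosages; the visiting order is our assumption, since the text specifies only which SNP of a correlated pair is dropped, and `prune_by_ld` is a hypothetical name.

```python
import numpy as np


def prune_by_ld(genotypes, min_pvalues, r2_threshold=0.25):
    """Greedy LD pruning sketch.

    genotypes   : (n, m) genotype dosages for one tissue-specific eQTL set
    min_pvalues : (m,) for each SNP, the smallest SNP-expression p-value across
                  the genes for which it is the top eQTL
    Returns the sorted indices of retained SNPs. SNPs are visited from the most to
    the least significant, and a SNP is kept only if its r^2 with every
    already-retained (more significant) SNP is at most the threshold.
    """
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    kept = []
    for j in np.argsort(min_pvalues, kind="stable"):
        if all(r2[j, k] <= r2_threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)
```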

Phenotype data

We considered the BMI of 337,205 unrelated white-British individuals in the UK Biobank (full release), excluding individuals for whom BMI or relevant covariates (age, sex, etc.) were missing. We then adjusted BMI for age, sex, and the top 20 principal components (PCs) of genetic ancestry by linear regression and obtained the BMI residuals. We initially developed eGST assuming that each tissue-specific subtype of the trait follows a normal distribution. Since the BMI residuals obtained after covariate adjustment deviated substantially from normality (Kolmogorov-Smirnov (KS) test p-value < 2.2 × $10^{-16}$), we applied the rank-based inverse normal transformation to the BMI residuals and ran eGST on the transformed phenotype. We adjusted WHR for BMI to obtain WHRadjBMI, and then adjusted WHRadjBMI for age, sex, and the top 20 PCs of genetic ancestry. Since the WHRadjBMI residuals also deviated significantly from normality, we applied the inverse normal transformation to them as well. We contrasted the genetic basis of the groups of individuals assigned to the adipose- and brain-specific subtypes of BMI. Let $\beta_1$ denote the vector of joint SNP effects of the adipose-specific eQTLs on the BMI of the individuals classified as the adipose-specific subtype, i.e., those whose adipose-specific posterior probability from eGST was > 50%. We chose this relaxed posterior probability threshold because we used multiple linear regression (MLR) to estimate the joint effects of a set of tissue-specific eQTLs on the BMI of a tissue-specific group of individuals, and MLR requires a sufficiently large number of individuals assigned to the corresponding tissue-specific subtype for efficient estimation of the model parameters. Thus, if $Y_1 = \{y_i : C_i = 1\}$ and $X_1$ is the genotype matrix of the adipose eQTLs for these adipose-specific individuals, we fit the linear model $E(Y_1) = X_1 \beta_1$.
Let $\gamma_1$ denote the joint SNP effects of the brain eQTLs on the BMI of the individuals assigned to the adipose subtype. Since the BMI of these individuals should be affected more by adipose-specific eQTLs than by brain-specific eQTLs, we expect a typical element of $\beta_1$ to be larger in magnitude than a typical element of $\gamma_1$. Using the individuals assigned to the adipose subtype, we estimated $\beta_1$ and $\gamma_1$ by multiple linear regression of the BMI residual on the genotypes of the adipose eQTLs and the brain eQTLs in UKB, respectively. Based on the estimated $\beta_1$ and $\gamma_1$ vectors, we performed the non-parametric Wilcoxon rank sum (WRS) test of H0: |β1| = |γ1| versus H1: |β1| > |γ1|, where β1 and γ1 here denote a general element of the $\beta_1$ and $\gamma_1$ vectors, respectively. Similarly, we tested whether the magnitude of the effect of a brain eQTL SNP on the BMI of individuals assigned to the brain-specific subtype (β2) was larger than the corresponding magnitude for an adipose eQTL SNP (γ2). We also tested whether the adipose eQTLs had larger effects on the adipose subtype of BMI than on the brain subtype, and whether the brain eQTLs had larger effects on the brain subtype than on the adipose subtype. We performed analogous experiments for the groups of individuals assigned to the adipose subcutaneous (AS) and muscle skeletal (MS) tissue-specific subtypes of WHRadjBMI.
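The phenotype preprocessing and the effect-magnitude comparison above can be sketched in Python as follows, assuming NumPy and SciPy. The Blom offset c = 3/8 in the rank-based inverse normal transformation is a common convention but an assumption here, and all function names are hypothetical; `scipy.stats.mannwhitneyu` implements the Wilcoxon rank sum test.

```python
import numpy as np
from scipy.stats import mannwhitneyu, norm, rankdata


def ols_fit(y, X):
    """Least-squares coefficients of y on [1, X] (intercept first) and the design matrix."""
    Z = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return coef, Z


def residualize(y, covariates):
    """Residuals of y after linear adjustment for covariates (e.g. age, sex, 20 PCs)."""
    coef, Z = ols_fit(y, covariates)
    return y - Z @ coef


def rank_inverse_normal(x, c=3.0 / 8.0):
    """Rank-based inverse normal transform; the Blom offset c = 3/8 is an assumption."""
    r = rankdata(x)  # ties receive average ranks
    return norm.ppf((r - c) / (len(x) - 2.0 * c + 1.0))


def compare_effect_magnitudes(y_group, X_own, X_other):
    """One-sided WRS test of H1: |beta| > |gamma|, using MLR estimates within one subtype group."""
    beta_hat = ols_fit(y_group, X_own)[0][1:]
    gamma_hat = ols_fit(y_group, X_other)[0][1:]
    return mannwhitneyu(np.abs(beta_hat), np.abs(gamma_hat), alternative="greater")
```

In this sketch, the two regressions are fitted separately, mirroring the text, so each estimated effect is adjusted only for the other eQTLs of the same tissue-specific set.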

Phenotypic characteristics

We explored whether the group of individuals whose BMI was classified as a tissue-specific genetic subtype is phenotypically distinct from the rest of the population with respect to other phenotypes collected in the UK Biobank. We considered 106 such phenotypes and tested each trait individually for differential distribution between the individuals of each tissue-specific subtype and the remaining population (for BMI, 11,838 individuals assigned to the adipose subtype and 13,354 individuals assigned to the brain subtype, based on a 65% threshold of subtype posterior probability [S1 Table]). We performed the Wilcoxon rank sum (WRS) test for quantitative traits and the χ² test on the contingency table for qualitative/categorical traits. We corrected the p-values for multiple testing across traits using the Bonferroni procedure. The same approach was used to find the traits differentially distributed between the individuals classified as a tissue-specific subtype of WHRadjBMI and the remaining population (for WHRadjBMI, 11,803 individuals of the adipose subcutaneous (AS) subtype and 7,238 individuals of the muscle skeletal (MS) subtype [S1 Table]). For a binary/case-control trait, we term the percentage of individuals (among those assigned to a tissue) who had the disorder the tissue-specific risk of the disease.
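The per-trait tests and the multiple-testing correction can be sketched as follows, assuming NumPy and SciPy, a boolean mask marking the subtype group, and the two-sided alternative (our assumption); the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu


def quantitative_pvalue(values, in_group):
    """Two-sided WRS (Mann-Whitney U) test: subtype group vs. remaining population."""
    return mannwhitneyu(values[in_group], values[~in_group], alternative="two-sided").pvalue


def categorical_pvalue(categories, in_group):
    """Chi-square test on the (group x category) contingency table."""
    levels = np.unique(categories)
    table = np.array([[np.sum((categories == lv) & mask) for lv in levels]
                      for mask in (in_group, ~in_group)])
    stat, pvalue, dof, expected = chi2_contingency(table)
    return pvalue


def bonferroni(pvalues):
    """Bonferroni-adjusted p-values (multiplied by the number of tests, capped at 1)."""
    p = np.asarray(pvalues, dtype=float)
    return np.minimum(p * p.size, 1.0)
```

For the analysis described above, `bonferroni` would be applied to the vector of 106 per-trait p-values for one subtype group.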

A random group of individuals is phenotypically homogeneous with the remaining population

For BMI, we randomly selected two groups of individuals from the population with the same size as the groups of tissue-specific BMI subtype (11,838 and 13,354) and evaluated phenotypic heterogeneity across 106 traits between each of the two random groups and the rest of the population using WRS test for a continuous trait and contingency table χ2 test for a qualitative trait (as before). We repeated the random selection of individuals from the population to replicate the experiment. We did the same experiment for WHRadjBMI.

BMI (or WHRadjBMI) adjusted phenotypic heterogeneity

In the above analysis for BMI, BMI itself was differentially distributed between the individuals with the adipose (as well as the brain) specific subtype and the remaining population. Therefore, we further investigated whether the heterogeneity of non-BMI traits between a subtype group and the remaining population was induced by BMI heterogeneity. For each quantitative trait initially found to be heterogeneous between the individuals assigned to one of the subtype groups and the remaining population (S10 Table), we first adjusted the trait for BMI in the whole population by linear regression and obtained the trait residuals. We then tested for heterogeneity of the trait residual between the adipose (or brain) subtype group and the remaining population using the WRS test. Similarly, for qualitative/categorical traits that were initially heterogeneous (Fig 6, Table 2 and S12 Table), we performed binomial or multinomial logistic regression (depending on the number of categories of the trait), adjusting for BMI in the population. We adopted the same strategy for WHRadjBMI (which itself was differentially distributed between the AS as well as the MS group and the remaining population) to determine which non-WHR traits remain heterogeneous after WHRadjBMI adjustment.
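For a quantitative trait, the BMI adjustment step can be sketched as below, assuming NumPy and SciPy, a boolean subtype mask, and a two-sided WRS test; `bmi_adjusted_pvalue` is a hypothetical name, and the logistic-regression adjustment for categorical traits is not covered by this sketch.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def bmi_adjusted_pvalue(trait, bmi, in_group):
    """Regress the trait on BMI in the whole population, then compare the residuals
    of the subtype group with those of the remaining population (two-sided WRS)."""
    Z = np.column_stack([np.ones(len(trait)), bmi])
    coef, *_ = np.linalg.lstsq(Z, trait, rcond=None)
    resid = trait - Z @ coef
    return mannwhitneyu(resid[in_group], resid[~in_group], alternative="two-sided").pvalue
```

If a trait differs between groups only because the subtype group has a different BMI distribution, the raw test is significant while the residual-based test is not, which is the distinction this analysis targets.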

Tissue-specific relative change

For each quantitative trait that was differentially distributed between the individuals of a tissue-specific subtype and the remaining population, we measured the relative change (or difference) of the trait in the tissue-specific subtype group, expressed as a percentage, as

$$100 \times \frac{\text{tissue-specific mean of the trait} - \text{population mean of the trait}}{\text{population mean of the trait}},$$

where the tissue-specific mean of the trait is calculated only in the individuals classified as the corresponding tissue-specific subtype of BMI (or WHRadjBMI). To quantify the BMI-adjusted tissue-specific relative change of a primarily heterogeneous quantitative trait, we computed the same measure for the BMI-adjusted trait residual instead of the trait itself. To evaluate the tissue-specific relative change in the risk of a binary/case-control trait, we calculated

$$100 \times \frac{\text{tissue-specific prevalence of the disease} - \text{population prevalence of the disease}}{\text{population prevalence of the disease}},$$

where the tissue-specific prevalence of a disease is computed only in the individuals assigned to the corresponding tissue-specific subtype.
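As a worked example of the relative change measure, the sketch below assumes that the population mean is taken over the whole analyzed sample; the same code applies to a 0/1 case-control indicator, where the mean is the prevalence. `relative_change_percent` is a hypothetical name.

```python
import numpy as np


def relative_change_percent(trait, in_group):
    """100 * (mean in the subtype group - population mean) / population mean."""
    pop_mean = trait.mean()
    return 100.0 * (trait[in_group].mean() - pop_mean) / pop_mean
```

For instance, a trait with values 1, 1, 2, 4 in the population and 2, 4 in the subtype group has a group mean of 3 versus a population mean of 2, i.e., a relative change of 50%.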

BMI (or WHRadjBMI) matched tissue-specific relative change

To further investigate the role of tissue-specific genetics (uncoupled from the role of BMI heterogeneity) underlying the phenotypic characteristics of the individuals assigned to a tissue-specific subtype of BMI, we performed the following experiment. We split the range of BMI of the individuals assigned to the adipose subtype (11,838 individuals [S1 Table]) into 30 consecutive non-overlapping bins. In each BMI bin, we counted the number of individuals assigned to the adipose subtype and randomly sampled the same number of individuals from all individuals in the bin. In this way, we randomly selected from the population a pool of individuals, of the same size as the adipose-specific group, whose BMI matched that of the adipose subtype individuals. Next, for each non-BMI quantitative trait that remained heterogeneous between the adipose group and the remaining population after BMI adjustment (S15 Table), we computed

$$\frac{\text{adipose-specific mean of the trait} - \text{BMI-matched random mean of the trait}}{\text{BMI-matched random mean of the trait}},$$

where the adipose-specific mean of the trait is calculated only in the individuals of the adipose subtype and the BMI-matched random mean is the trait mean calculated only in the random pool of individuals BMI-matched to the adipose group. This measure quantifies the relative change (difference) of the trait between the individuals assigned to the adipose subtype and the corresponding BMI-matched random individuals selected from the population. It should therefore provide insight into the phenotypic characteristics of the individuals with the adipose subtype that are solely mediated through adipose-specific genetics, uncoupled from the effect of BMI heterogeneity between the adipose group and the remaining population. We repeated the random selection of BMI-matched individuals 500 times and computed the mean and s.d. of this BMI-matched tissue-specific relative change across random selections.
We replicated the same experiment for the individuals with the brain subtype of BMI. For WHRadjBMI, we performed the same experiment to characterize the phenotypic characteristics of the AS (or MS) subtype group induced by AS- (or MS-) specific genetics only.
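The BMI-matched sampling can be sketched as follows, assuming NumPy, a boolean subtype mask, and equal-width bins over the subtype group's BMI range (the text specifies 30 consecutive non-overlapping bins but not their widths, so equal widths are our assumption); both function names are hypothetical.

```python
import numpy as np


def bmi_matched_pool(bmi, in_group, n_bins=30, rng=None):
    """Sample a BMI-matched random pool: split the subtype group's BMI range into
    n_bins equal-width bins and, in each bin, draw (without replacement, from all
    individuals in the bin) as many individuals as the group has there."""
    rng = np.random.default_rng() if rng is None else rng
    edges = np.linspace(bmi[in_group].min(), bmi[in_group].max(), n_bins + 1)
    # interior edges give bin indices 0..n_bins-1; the maximum falls in the last bin
    bins = np.digitize(bmi, edges[1:-1])
    inside = (bmi >= edges[0]) & (bmi <= edges[-1])
    pool = []
    for b in range(n_bins):
        in_bin = inside & (bins == b)
        n_needed = int(np.sum(in_bin & in_group))
        if n_needed:
            pool.append(rng.choice(np.flatnonzero(in_bin), size=n_needed, replace=False))
    return np.concatenate(pool)


def matched_relative_change(trait, in_group, pool):
    """(subtype mean - BMI-matched random mean) / BMI-matched random mean."""
    return (trait[in_group].mean() - trait[pool].mean()) / trait[pool].mean()
```

Calling `bmi_matched_pool` 500 times with a shared generator and summarizing `matched_relative_change` across calls gives the mean and s.d. described above.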

Tissue-specificity of phenotypic characteristics

To investigate the tissue-specificity of the phenotypic characteristics of the individuals assigned to the adipose- and brain-specific subtypes of BMI, we randomly exchanged 739 eQTLs (half of the smaller of the numbers of adipose- and brain-specific eQTLs, i.e., half of 1478) between the adipose- and brain-specific eQTL sets to create artificial tissue-specific eQTL sets. We considered 500 such random shuffles. Keeping the phenotype data fixed, we ran eGST on the genotype data at each pair of artificial tissue-specific eQTL sets to identify the groups of individuals with the BMI subtypes specific to the artificial adipose and brain tissues (based on a 65% posterior probability threshold). Next, for each quantitative trait that was primarily heterogeneous between the individuals assigned to the original adipose (or brain) subtype of BMI and the remaining population (S10 Table), we computed the artificial adipose- and brain-specific trait means using only the individuals classified into the corresponding artificial tissue-specific subtype of BMI. Then, for each trait, we computed central tendency measures of the artificial tissue-specific trait means across the 500 sets of artificial tissue-specific eQTLs, and tested whether the overall mean of these artificial tissue-specific trait means differs significantly from the corresponding original (adipose or brain) tissue-specific trait mean. We performed the same experiment for WHRadjBMI.
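One random exchange of eQTLs between the two sets can be sketched as follows, assuming NumPy and eQTL identifiers stored as strings; `shuffle_eqtl_sets` is a hypothetical name. The identifiers are held in object arrays so that swapping strings of different lengths cannot truncate them, as can happen with fixed-width NumPy string arrays.

```python
import numpy as np


def shuffle_eqtl_sets(set_a, set_b, n_swap, rng):
    """Exchange n_swap randomly chosen eQTLs between two tissue-specific sets,
    returning two artificial sets with the original sizes."""
    a = np.asarray(set_a, dtype=object).copy()
    b = np.asarray(set_b, dtype=object).copy()
    ia = rng.choice(a.size, size=n_swap, replace=False)
    ib = rng.choice(b.size, size=n_swap, replace=False)
    # the right-hand side is evaluated before assignment, so this is a true swap
    a[ia], b[ib] = b[ib].copy(), a[ia].copy()
    return a, b
```

With 1705 adipose and 1478 brain eQTLs, `min(1705, 1478) // 2` equals 739, the number of exchanged eQTLs used above; repeating the call 500 times with one generator produces the 500 artificial pairs of sets.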

Permuting phenotype data across individuals

Next, we performed a similar experiment with permuted phenotype (BMI or WHRadjBMI) data, keeping the assignment of eQTLs to tissues fixed as in the original data. We considered 500 random permutations of BMI across individuals. Keeping the genotype data fixed, we ran eGST on each permuted phenotype dataset and classified the tissue of interest across individuals based on a 65% threshold of subtype posterior probability. As before, for each of the 500 resulting pairs of subtype groups, we computed subtype-specific means of each quantitative trait that was primarily heterogeneous between the individuals of the original adipose (or brain) subtype of BMI and the remaining population (S10 Table). For each trait, we then computed central tendency measures of the tissue-specific means across the 500 BMI permutations and tested whether the overall mean of these tissue-specific trait means differs significantly from the corresponding original (adipose or brain) tissue-specific trait mean. We conducted the same experiment for WHRadjBMI.

Markov Chain Monte Carlo (MCMC) algorithm for eGST.

(PDF)

Phenotypic characteristics of the tissue-specific groups of individuals for WHRadjBMI.

(PDF)

Box plots of AUCs comparing the classification accuracy of eGST and the gold-standard strategy implementing our model in the first set of simulation scenarios.

(PDF)

Box plots of AUCs comparing the classification accuracy of eGST and the gold-standard strategy implementing our model in the second set of simulation scenarios.

(PDF)

Comparison between the true discovery rate of classifying tissue-specific subtypes by the MAP-EM algorithm versus the EM algorithm in the first set of simulation scenarios.

(PDF)

Comparison between the true discovery rate of classifying tissue-specific subtypes by the MAP-EM algorithm versus the EM algorithm in the second set of simulation scenarios.

(PDF)

Percentage of tissue-specific relative change of the quantitative traits differentially distributed between the individuals assigned to a tissue-specific subtype of WHRadjBMI and the remaining population.

(PDF)

Percentage of tissue-specific relative change in the risk of case-control traits between the individuals assigned to a tissue-specific subtype of WHRadjBMI and the population.

(PDF)

Number of individuals classified by eGST as a tissue-specific subtype of BMI and WHRadjBMI.

(PDF)

Simulation results: Effect of increasing sample size on classification accuracy of eGST.

(PDF)

Simulation results: Effect of increasing number of tissue-specific SNPs with fixed subtype heritability per tissue on classification accuracy of eGST.

(PDF)

Simulation results: Effect of difference between baseline tissue-specific means of the phenotype (α1, α2) on the classification accuracy of eGST.

(PDF)

Simulation results: Effect of difference between the mean of tissue-specific genetic effect size distribution on the classification accuracy of eGST.

(PDF)

Pattern of the first tissue-specific subtype posterior probability when a group of individuals has genetic effects from both tissues.

(PDF)

Simulation results: Percentage of misclassification when one of the two tissues is completely irrelevant to the phenotype.

(PDF)

Number of individuals in consecutive non-overlapping bins of BMI who were assigned to adipose and brain specific subtype of BMI, and the number of individuals that remained unassigned by eGST.

(PDF)

Number of individuals in consecutive non-overlapping bins of WHRadjBMI who were assigned to adipose and muscle specific subtype of WHRadjBMI, and the number of individuals that remained unassigned by eGST.

(PDF)

Quantitative traits among 106 phenotypes in UK Biobank which are differentially distributed between the adipose (and/or brain) specific subtype group of individuals for BMI and the remaining population.

(PDF)

Quantitative traits among 106 phenotypes in UK Biobank which are differentially distributed between the adipose subcutaneous (AS) (and/or muscle skeletal (MS)) specific subtype group of individuals for WHRadjBMI and the remaining population.

(PDF)

Case-control traits among 106 phenotypes considered in the UK Biobank which are differentially distributed between the adipose (and/or brain) specific subtype group of individuals for BMI and the remaining population.

(PDF)

Case-control traits among 106 phenotypes considered in the UK Biobank which are differentially distributed between the adipose subcutaneous (AS) (and/or muscle skeletal (MS)) specific subtype group of individuals for WHRadjBMI and the remaining population.

(PDF)

Qualitative/categorical traits with three or more categories that are differentially distributed between at least one of adipose subcutaneous (AS) and muscle skeletal (MS) specific subtype groups of individuals for WHRadjBMI and the remaining population.

(PDF)

Heterogeneity of non-BMI quantitative traits between the adipose (or brain) specific subtype group of individuals for BMI and the remaining population after BMI adjustment of the traits in the population using linear regression.

(PDF)

Phenotypic heterogeneity of non-WHR quantitative traits between the adipose subcutaneous (AS) (or muscle skeletal (MS)) specific group of individuals for WHRadjBMI and the remaining population after WHRadjBMI adjustment of the traits in the population using linear regression.

(PDF)

Magnitude of relative change of non-BMI quantitative traits between a tissue-specific subtype group of individuals for BMI and the corresponding group of BMI-matched individuals randomly selected from the population.

(PDF)

Magnitude of relative change of non-WHR quantitative traits between a tissue-specific subtype group of individuals for WHRadjBMI and the corresponding group of WHRadjBMI-matched individuals randomly selected from the population.

(PDF)

Summary of results from the analyses performed to characterize the tissue-specificity of phenotypic characteristics of individuals assigned to a tissue-specific subtype of BMI.

(PDF)

Summary of results from the analyses performed to characterize the tissue-specificity of phenotypic characteristics of the individuals assigned to a tissue-specific subtype of WHRadjBMI.

(PDF)

Summary of results on the phenotypic characteristics of the individuals assigned to tissue-specific subtype of BMI identified based on random permutations of the primary phenotype (BMI) across individuals.

(PDF)

Summary of results on the phenotypic characteristics of the individuals assigned to tissue-specific subtype of WHRadjBMI identified based on random permutations of the primary phenotype (WHRadjBMI) across individuals.

(PDF)

Quantitative traits among 106 phenotypes in UK Biobank which are differentially distributed between the adipose-specific subtype group of individuals for BMI and brain-specific subtype group of individuals.

(PDF)

24 Nov 2020

Dear Dr. Majumdar,

Thank you very much for submitting your manuscript "Leveraging eQTLs to identify individual-level tissue of interest for a complex trait" for consideration at PLOS Computational Biology. As with all papers reviewed by the journal, your manuscript was reviewed by members of the editorial board and by several independent reviewers. The reviewers appreciated the attention to an important topic. Based on the reviews, we are likely to accept this manuscript for publication, provided that you modify the manuscript according to the review recommendations.

Please prepare and submit your revised manuscript within 30 days. If you anticipate any delay, please let us know the expected resubmission date by replying to this email.

When you are ready to resubmit, please upload the following:

[1] A letter containing a detailed list of your responses to all review comments, and a description of the changes you have made in the manuscript. Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

[2] Two versions of the revised manuscript: one with either highlights or tracked changes denoting where the text has been changed; the other a clean version (uploaded as the manuscript file).

Important additional instructions are given below your reviewer comments.

Thank you again for your submission to our journal. We hope that our editorial process has been constructive so far, and we welcome your feedback at any time. Please don't hesitate to contact us if you have any questions or comments.
Sincerely,

Ferhat Ay, Ph.D
Associate Editor
PLOS Computational Biology

Jian Ma
Deputy Editor
PLOS Computational Biology

***********************

A link appears below if there are any accompanying review attachments. If you believe any reviews to be missing, please contact ploscompbiol@plos.org immediately: [LINK]

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: In this manuscript, Majumdar et al. propose a novel method to quantify the tissue-wise genetic contribution to the trait of interest at the individual level. Briefly, they use a mixture model, which integrates tissue-specific eQTLs with genetic association data, to identify subgroups of individuals whose genetic predisposition acts primarily through one specific tissue. They showcase the utility of their method by extensive simulations and real data analyses of UK Biobank data. I find the manuscript very interesting, and it may further help us understand the etiology of complex traits. However, I have a few questions/comments that I hope the authors will consider addressing.

1. A 65% threshold of tissue-specific subtype posterior probability was used. Is there any justification for this threshold? Are the results (especially those in real data analyses) sensitive to the choice of threshold?
2. How are the tissue-specific genes and eQTLs selected? What specific cutoff is used (on Line 583)? I am just curious why "we included the top eQTL of the gene in both tissues, one in subcutaneous and one in visceral." Is there any consideration for that? Why not simply exclude those genes to make the eQTL lists more tissue-specific?

Minor:
1. On Line 121, is sigma_{y_k}^{2} the same regardless of whether C_i = 1 or C_i = 2? The trait variance seems the same, as we deal with the same trait.
2. On line 468, some explanation of why 5% is used would be appreciated.
3. On line 432, I feel that starting with Model 2 may be much easier to follow and understand. Model (1) seems misleading, especially if we read the Results section first. It is just a minor suggestion; it is okay to keep the current form, which starts with a more general model.

Reviewer #2: In this paper, the authors describe a statistical framework to infer the causal tissue per individual underlying the predisposition for a specific complex trait. One of the applications of this is to determine subtypes of a complex trait and assign individuals to them. While the aim of the study and the proposed methods are of interest, I do have some comments/questions/concerns:

1. I would be curious to see what happens when using at least one irrelevant tissue (determined from biological knowledge or common sense). All the work described here starts from tissues that were previously inferred as relevant to the studied complex trait. What happens to the classification when one of the tissues being used is known to be independent of the trait? Are any individuals assigned to this dummy tissue?
2. I understand that using genes specifically expressed in each tissue helps to avoid ambiguities. However, GTEx introduced multiple methods and metrics to pinpoint causal eQTL variants and therefore accurately determine whether an eQTL-eGene pair is specific to a given tissue. Can this information be used to improve the assignment (proportion and accuracy) of individuals to subtypes? If yes, to what extent?
3. In UK Biobank, >100,000 samples are related at least at the 3rd degree. How does this high level of relatedness between GWAS samples affect your modeling? I understand that population stratification is accounted for using PCs, but what about relatedness?
4. For BMI and WHRadjBMI, you managed to assign a tissue to 7.5% and 5.7% of individuals, respectively. I understand that some of the causes of these relatively low percentages are technical; however, is there also any biological rationale behind them?
5. When permuting the phenotype data in the real data analysis, you get 7,404 and 3,433 individuals assigned a tissue. On non-permuted data, you get 25,192 and 19,041 individuals with an assigned tissue. Am I correct to say that you have a 30% and 18% false discovery rate (FDR), respectively? If yes, can you comment on this?
6. When looking at the “phenotypic characteristics of individuals with an assigned tissue”, it seems to me that you mostly look at heterogeneity compared to the general population. Could you compare individuals assigned to tissue A with those assigned to tissue B in a more systematic way and see whether any phenotype is significantly different? To me, that would be very informative for determining a signature of the trait subtypes.

Reviewer #3: This study develops a methodology to quantify the tissue-specific genetic contribution to a trait for an individual. The Bayesian methodology uses tissue-specific eQTLs of tissue-specific genes to prioritize tissues. This approach can therefore be used to identify individuals for whom the genetic contribution is mediated through a given tissue (and therefore potentially to identify disease subtypes). The authors then applied the methodology to BMI and WHRadjBMI in the UK Biobank. There are major concerns with the approach that should be addressed, but I would recommend publication of the paper if these are adequately addressed, because the study contains some interesting and novel insights.

One of the difficulties I have with the approach is that it does not explicitly consider the scenario in which the genetic contribution to the trait for an individual is mediated through multiple tissues. This scenario may well be the most generic one. The only model considered here is one in which the genetic contribution is mediated through a single tissue for an individual, and the extent to which this is realistic or plausible is not clear. The authors recognize this challenge and, as they point out, in this case the approach would likely distribute the posterior probabilities equally across the tissues.

Practically, how would one choose the tissues to include in a general application? The authors focused on brain and adipose for BMI, but it is not clear how one would go about selecting tissues to include for a different phenotype.

Also, the authors used only the top eQTL for each tissue-specific gene. What is the distribution of the number of independent eQTLs for the tissue-specific genes? This will help to clarify/quantify the limitation of the approach. It is not clear how uncertainty in the input set of "tissue-specific genes", or in the set of tissue-specific eQTLs for such a gene, affects the quantification (posterior probability) of the mediating tissue.

The authors refer to an R package. Is the source available through GitHub or some other repository? The link should be included.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No: Please include the link to the source code.

**********

PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review?
For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Chong Wu
Reviewer #2: No
Reviewer #3: No

Figure Files: While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, . PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Then, log in and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at .

Data Requirements: Please note that, as a condition of publication, PLOS' data policy requires that you make available all data used to draw the conclusions outlined in your manuscript. Data must be deposited in an appropriate repository, included within the body of the manuscript, or uploaded as supporting information. This includes all numerical values that were used to generate graphs, histograms, etc. For an example in PLOS Biology see here: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1001908#s5.

Reproducibility: To enhance the reproducibility of your results, PLOS recommends that you deposit laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see

9 Mar 2021

Submitted filename: reviewrs-response.pdf

26 Mar 2021

Dear Dr. Majumdar,

We are pleased to inform you that your manuscript 'Leveraging eQTLs to identify individual-level tissue of interest for a complex trait' has been provisionally accepted for publication in PLOS Computational Biology.

Before your manuscript can be formally accepted you will need to complete some formatting changes, which you will receive in a follow-up email. A member of our team will be in touch with a set of requests.
Please note that your manuscript will not be scheduled for publication until you have made the required changes, so a swift response is appreciated.

IMPORTANT: The editorial review process is now complete. PLOS will only permit corrections to spelling, formatting or significant scientific errors from this point onwards. Requests for major changes, or any which affect the scientific understanding of your work, will cause delays to the publication date of your manuscript.

Should you, your institution's press office or the journal office choose to press release your paper, you will automatically be opted out of early publication. We ask that you notify us now if you or your institution is planning to press release the article. All press must be co-ordinated with PLOS.

Thank you again for supporting Open Access publishing; we are looking forward to publishing your work in PLOS Computational Biology.

Best regards,

Ferhat Ay, Ph.D
Associate Editor
PLOS Computational Biology

Jian Ma
Deputy Editor
PLOS Computational Biology

***********************************************************

Reviewer's Responses to Questions

Comments to the Authors:
Please note here if the review is uploaded as an attachment.

Reviewer #1: The authors have addressed all my comments.

Reviewer #3: The authors have fully addressed my concerns. The study, as I noted previously, contains some interesting insights, and so I recommend publication.

**********

Have all data underlying the figures and results presented in the manuscript been provided? Large-scale datasets should be made available via a public repository as described in the PLOS Computational Biology data availability policy, and numerical data that underlies graphs or summary statistics should be provided in spreadsheet form as supporting information.

Reviewer #1: Yes
Reviewer #3: Yes

**********

PLOS authors have the option to publish the peer review history of their article.
If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #3: Yes: Eric R. Gamazon

17 May 2021

PCOMPBIOL-D-20-01662R1

Leveraging eQTLs to identify individual-level tissue of interest for a complex trait

Dear Dr Majumdar,

I am pleased to inform you that your manuscript has been formally accepted for publication in PLOS Computational Biology. Your manuscript is now with our production department and you will be notified of the publication date in due course.

The corresponding author will soon be receiving a typeset proof for review, to ensure errors have not been introduced during production. Please review the PDF proof of your manuscript carefully, as this is the last chance to correct any errors. Please note that major changes, or those which affect the scientific understanding of the work, will likely cause delays to the publication date of your manuscript.

Soon after your final files are uploaded, unless you have opted out, the early version of your manuscript will be published online. The date of the early version will be your article's publication date. The final article will be published to the same URL, and all versions of the paper will be accessible to readers.

Thank you again for supporting PLOS Computational Biology and open-access publishing. We are looking forward to publishing your work!

With kind regards,

Zita Barta
PLOS Computational Biology | Carlyle House, Carlyle Road, Cambridge CB4 3DN | United Kingdom
ploscompbiol@plos.org | Phone +44 (0) 1223-442824 | ploscompbiol.org | @PLOSCompBiol
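Reviewer #1's questions about the 65% posterior-probability threshold and about whether sigma_{y_k}^{2} depends on C_i, together with Reviewer #2's permutation-based FDR estimate, all concern how an individual is assigned to a tissue-specific subtype. The following is a minimal sketch, not the authors' R package or their exact model. It assumes a Gaussian mixture in which an individual's trait, given C_i = k, is normally distributed around a tissue-specific genetic predictor with residual standard deviation sigma_{y_k}, and that all parameters have already been estimated. The function names and toy inputs are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) evaluated at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def subtype_posteriors(y, genetic_means, sds, priors):
    """Posterior P(C_i = k | y_i) for each tissue k in an assumed Gaussian mixture.

    Hypothetical inputs:
      y             -- observed trait value of one individual
      genetic_means -- per-tissue genetic predictor of the trait for this individual
      sds           -- per-tissue residual standard deviation sigma_{y_k}
      priors        -- mixing proportions pi_k (summing to 1)
    """
    weights = [pi * normal_pdf(y, mu, sd) for mu, sd, pi in zip(genetic_means, sds, priors)]
    total = sum(weights)
    return [w / total for w in weights]


def assign_tissue(posteriors, threshold=0.65):
    """Index of the most probable tissue if its posterior reaches the threshold, else None."""
    best = max(range(len(posteriors)), key=lambda k: posteriors[k])
    return best if posteriors[best] >= threshold else None


def permutation_fdr(n_assigned_permuted, n_assigned_observed):
    """Empirical FDR: individuals assigned under permuted phenotypes / assigned under real ones."""
    return n_assigned_permuted / n_assigned_observed


# Toy individual: tissue 0 predicts 1.0, tissue 1 predicts -0.5, observed trait is 1.2.
post = subtype_posteriors(1.2, genetic_means=[1.0, -0.5], sds=[1.0, 1.0], priors=[0.5, 0.5])
print([round(p, 3) for p in post], assign_tissue(post))  # → [0.806, 0.194] 0
print(round(permutation_fdr(7404, 25192), 3), round(permutation_fdr(3433, 19041), 3))  # → 0.294 0.18
```

Allowing a separate sigma_{y_k} per component is the more general choice; when the residual standard deviations are equal, the normalizing constants cancel and the posterior depends only on the squared residuals and the priors. The FDR function reproduces Reviewer #2's arithmetic, under the assumption that the number of assignments with permuted phenotypes estimates the number of false assignments.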