A method to decipher pleiotropy by detecting underlying heterogeneity driven by hidden subgroups applied to autoimmune and neuropsychiatric diseases.

Buhm Han^1,2,3,4, Jennie G Pouget^1,5,6,7, Kamil Slowikowski^1,3,4,8, Eli Stahl^9, Cue Hyunkyu Lee^10, Dorothee Diogo^1,3,4, Xinli Hu^1,3,4,11, Yu Rang Park^10,12, Eunji Kim^10,13, Peter K Gregersen^14, Solbritt Rantapää Dahlqvist^15, Jane Worthington^16,17, Javier Martin^18, Steve Eyre^16,17, Lars Klareskog^19, Tom Huizinga^20, Wei-Min Chen^21, Suna Onengut-Gumuscu^21, Stephen S Rich^21, Naomi R Wray^22, Soumya Raychaudhuri^1,3,4,19,23.

Abstract

There is growing evidence of shared risk alleles for complex traits (pleiotropy), including autoimmune and neuropsychiatric diseases. This might be due to sharing among all individuals (whole-group pleiotropy) or a subset of individuals in a genetically heterogeneous cohort (subgroup heterogeneity). Here we describe the use of a well-powered statistic, BUHMBOX, to distinguish between those two situations using genotype data. We observed a shared genetic basis for 11 autoimmune diseases and type 1 diabetes (T1D; P < 1 × 10⁻⁴) and for 11 autoimmune diseases and rheumatoid arthritis (RA; P < 1 × 10⁻³). This sharing was not explained by subgroup heterogeneity (corrected BUHMBOX P > 0.2; 6,670 T1D cases and 7,279 RA cases). Genetic sharing between seronegative and seropositive RA (P < 1 × 10⁻⁹) had significant evidence of subgroup heterogeneity, suggesting a subgroup of seropositive-like cases within seronegative cases (BUHMBOX P = 0.008; 2,406 seronegative RA cases). We also observed a shared genetic basis for major depressive disorder (MDD) and schizophrenia (P < 1 × 10⁻⁴) that was not explained by subgroup heterogeneity (BUHMBOX P = 0.28; 9,238 MDD cases).

Substances: Genetic Markers

Year: 2016 PMID: 27182969 PMCID: PMC4925284 DOI: 10.1038/ng.3572

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

INTRODUCTION

Recent studies have demonstrated that many diseases share risk alleles[1-4] and exhibit significant coheritability[5-7]. Coheritability studies are defining the relationship between complex traits, and providing new insights into disease mechanisms. Critically, as the number of phenotypes studied with genetics expands in the context of emerging deeply phenotyped population-wide cohorts[8], including the Precision Medicine Initiative[9], coheritability between traits will become even more apparent. In the genomic era, methods for detecting coheritability have moved beyond traditional approaches such as twin or family studies[10, 11]. Now, alternative approaches using genome-wide association study (GWAS) data from unrelated individuals are widely used. Polygenic risk score approaches[3, 12, 13] build genetic risk scores (GRSs) for one phenotype and test their association with a second phenotype. Mixed-model approaches[5, 6, 14] can estimate the genetic covariance between two traits on the observed scale. Genetic covariance can be used to calculate genetic correlation and coheritability[6]. Cross-trait LD Score regression (LDSC) utilizes linkage disequilibrium (LD) and summary statistics obtained from GWAS to estimate genetic correlation attributable to SNPs[7]. In addition, the p-values of independent SNPs associated with multiple phenotypes can be tested for a significant deviation from the null distribution[2]. These approaches have been applied to demonstrate significant shared genetic structure among many phenotypes[5, 7, 15] including autoimmune[2] and neuropsychiatric diseases[3, 6, 13]. The observed coheritability and genetic sharing suggest the possibility of pleiotropy, defined here as the sharing of risk alleles across traits at specific loci or at a genome-wide level. An example of pleiotropy is the PTPN22 variant R620W, which is associated with multiple autoimmune diseases[16].
Shared risk alleles across diseases can be driven by all individuals or by a subset of individuals. In the former, the sharing is clearly driven by pleiotropy (whole-group pleiotropy). In the latter, only a subset of individuals is genetically similar to another disease. We call this subgroup heterogeneity – a situation where a patient cohort consists of genetically distinct subgroups that may or may not result in distinct symptom profiles and treatment outcomes. Subgroup heterogeneity can occur in the context of misclassifications (e.g. cases with atypical clinical presentations for a different disease are erroneously included), molecular subtypes (e.g. two different etiologies cause a disease, resulting in a subset of cases that share pathogenesis with a different disease), asymmetric causal relationships (e.g. one disease causes another disease, resulting in a subset of cases that also have the causal disease; often called mediated pleiotropy), or ascertainment bias (e.g. cases also affected with a different disease are more likely to come to clinical attention and be included in the study). These situations result in a subset of cases that is genetically similar to another disease, creating shared genetic structure[17]. Indeed, there is now evidence that misclassifications[18-21], etiological diversity[22], and ascertainment bias[23] are prevalent across certain human diseases, leading to the conclusion that significant heterogeneity may exist[24-27]. Since the potential contribution of subgroup heterogeneity to any genetic sharing observed between diseases represents a critical disease insight, statistical methods are needed to distinguish subgroup heterogeneity from whole-group pleiotropy. For the purposes of this paper, we will use the term pleiotropy to refer to whole-group pleiotropy and heterogeneity to refer to subgroup heterogeneity.

RESULTS

Overview of BUHMBOX

Genetic sharing between disease A (DA) and disease B (DB) could be due to pleiotropy, but could also be due to heterogeneity (i.e. a subset of DA cases are genetically more similar to DB cases). If we calculated GRSs for DA cases using DB-associated loci and their effect sizes (GRSB), the mean of GRSB would be statistically different between DA cases and controls under either pleiotropy or heterogeneity. Under pleiotropy, some DB risk alleles impose DA risk, and DB risk alleles will be enriched in DA cases compared to controls. Under heterogeneity, a subset of DA cases will have genetic characteristics of DB, and therefore DB risk alleles will also be enriched in those individuals. In both situations, the enriched DB risk alleles in DA cases will result in an increased mean GRSB in individuals that are DA cases. For the same reasons, if we calculated the rg of DA and DB using cross-trait LDSC[7] in this scenario, the rg would be positive under both pleiotropy and heterogeneity. To detect heterogeneity, even in the presence of pleiotropy, we developed BUHMBOX (Breaking Up Heterogeneous Mixture Based On Cross-locus correlations). Our method tests for the presence of heterogeneous subgroups (i.e. DB-like cases) in an otherwise homogenous phenotype (i.e. DA). To do this, BUHMBOX requires (1) a list of known DB-associated SNPs with corresponding risk alleles, risk allele frequencies, and effect sizes, and (2) individual-level genotype data for DB SNPs in DA cases. BUHMBOX leverages the fact that in the setting of heterogeneity, DB risk alleles have higher allele frequencies only in a specific subset of DA cases. In contrast, under true pleiotropy, DB risk alleles are expected to have higher allele frequencies across all DA cases (Figure 1). If DB risk alleles are enriched in one subgroup, the expected correlations of risk allele dosages between loci will be consistently positive (for details see Supplementary Table 1 and Supplementary Note). 
BUHMBOX combines these pairwise correlations into a single statistic and tests its significance; heterogeneity can lead to a significant BUHMBOX test statistic. In contrast, a lack of true heterogeneity, or insufficient power to detect it (type II error), can lead to a non-significant BUHMBOX test statistic. Power is insufficient when the number of DA cases, the heterogeneity proportion, or the number of known DB risk alleles and/or their effect sizes are small.
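
The intuition can be illustrated with a minimal simulation sketch. It is not the BUHMBOX implementation: the function names, the fixed RAF of 0.3 and OR of 1.5, the multiplicative allelic odds-ratio model, and the use of an unweighted mean of pairwise correlations are all illustrative assumptions. A fraction π of simulated DA cases is drawn with DB-case allele frequencies (heterogeneity), and this is compared with a cohort in which every case carries the elevated frequencies (whole-group sharing).

```python
import numpy as np

def simulate_case_dosages(n_cases, raf, odds_ratio, pi, rng):
    """Simulate risk-allele dosages (0/1/2) at independent DB loci in DA cases.

    A fraction `pi` of cases is drawn with DB-case allele frequencies;
    the remaining cases are drawn with control frequencies.
    """
    raf = np.asarray(raf, dtype=float)
    gamma = np.asarray(odds_ratio, dtype=float)
    # Risk allele frequency in DB cases under an assumed multiplicative allelic OR model.
    raf_case = gamma * raf / ((gamma - 1.0) * raf + 1.0)
    is_db_like = rng.random(n_cases) < pi
    # Per-individual, per-locus allele frequency (shape: n_cases x n_loci).
    freq = np.where(is_db_like[:, None], raf_case, raf)
    return rng.binomial(2, freq)

def mean_pairwise_correlation(dosages):
    """Average Pearson correlation over all distinct pairs of loci."""
    corr = np.corrcoef(dosages, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper].mean()

rng = np.random.default_rng(0)
raf = np.full(50, 0.3)          # illustrative values, not taken from the GWAS catalog
odds_ratio = np.full(50, 1.5)

subgroup = simulate_case_dosages(5000, raf, odds_ratio, pi=0.2, rng=rng)
whole_group = simulate_case_dosages(5000, raf, odds_ratio, pi=1.0, rng=rng)

print("heterogeneity:", mean_pairwise_correlation(subgroup))
print("whole-group  :", mean_pairwise_correlation(whole_group))
```

In this sketch the mean correlation should be clearly positive only in the subgroup scenario, because enrichment that is uniform across all cases shifts allele frequencies without making loci co-vary.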

Figure 1

Overview of BUHMBOX

(a) Under the scenario of subgroup heterogeneity, risk alleles of disease B (DB)-associated loci will be enriched in a subgroup of disease A (DA) cases, producing positive correlations between DB risk allele dosages from independent loci. (b) Under the scenario where there is no heterogeneity and DA and DB share alleles due to pleiotropy (i.e. whole-group pleiotropy), DB risk alleles will be uniformly distributed and have no correlations. Red boxes: risk alleles; white boxes: non-risk alleles.

BUHMBOX discriminates between heterogeneity and pleiotropy

To demonstrate that BUHMBOX detects heterogeneity (even in the presence of pleiotropy), we conducted simulations with the following parameters: sample size of DA case individuals (N), number of risk loci associated to DB (M), and the proportion of DA cases that actually show genetic characteristics of DB (heterogeneity proportion, or π). To simulate realistic distributions of effect sizes and allele frequencies, we sampled odds ratio (OR) and risk allele frequency (RAF) pairs from reported associations in the GWAS catalog[28] (Online Methods). To characterize the false positive rate (FPR) of BUHMBOX we simulated 1,000,000 studies (N=2,000 and M=50) where there was no heterogeneity (π=0, Online Methods) or pleiotropy. BUHMBOX obtained a 5.1% FPR at p<0.05; it also obtained appropriate FPRs at a wide range of statistical significance thresholds (p<0.05 to 0.0005, Supplementary Table 2). To evaluate the FPR of BUHMBOX where there actually was pleiotropy without heterogeneity (π=0), we simulated 1,000 studies (N=2,000 and M=50) assuming DA and DB shared 10% of risk loci (five loci). We quantified the proportion of instances where BUHMBOX and GRS approaches obtained p-values smaller than the threshold p<0.05. GRS appropriately demonstrated 64.8% power to detect shared genetic structure. BUHMBOX demonstrated an appropriate false positive rate of 4.3% to detect heterogeneity (Supplementary Figure 1). Finally, to evaluate BUHMBOX’s power to detect heterogeneity we repeated these simulations assuming there was no pleiotropy, but that there was indeed subtle heterogeneity. We assumed that 10% of DA cases were actually DB (π=0.1). Here, BUHMBOX demonstrated 81.7% power to detect heterogeneity at p<0.05 (Supplementary Figure 1). The GRS approach demonstrated 100% power to detect shared genetic structure. 
Note that the power difference of the GRS approach in the pleiotropy and heterogeneity simulations is because of the stochastic chance that sampled effect sizes of all five loci may be small in the pleiotropy simulation; in simulations where we fixed the OR (1.25) and RAF (0.3) for all loci, the power of GRS was similar: 91.8% in pleiotropy and 92.0% in heterogeneity. Together, these simulations illustrate that BUHMBOX is sensitive to heterogeneity but robust to pleiotropy, while the GRS detects both scenarios and cannot discriminate between the two. Thus, BUHMBOX complements methods for detecting pleiotropy by helping to interpret shared genetic structure (Supplementary Table 1).

Weighting pairwise correlations increases power

BUHMBOX combines multiple pairwise correlations into one statistic. A pair of loci with larger allele frequencies and effect sizes will show a larger expected correlation given the same π, and may be more informative than other pairs of loci (Supplementary Figure 2). We hypothesized that accounting for this unequal information between SNP pairs could increase power. We defined a scheme to weight pairwise correlations between loci as a function of their effect sizes and allele frequencies (Online Methods). In simulations we observed a substantial power gain with this weighting scheme. Assuming 1,000 cases and 50 loci, we compared the power of BUHMBOX implemented with and without weighting of correlations (equation (12) in Supplementary Note). Across a wide range of π we observed that weighting dramatically increased power (Figure 2). For example, at π=0.1 the weighted implementation of BUHMBOX obtained 74% power, compared with only 36% for the unweighted implementation.
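
A short derivation shows why some locus pairs carry more information than others. It is a sketch under assumptions that are not stated in the text above: loci are independent within each subgroup, a multiplicative allelic odds ratio $\gamma_j$ applies at locus $j$, the DB-like subgroup (proportion $\pi$) has risk allele frequency $p_j^{+}$, and the remaining cases have the control frequency $p_j$. The symbols $\gamma_j$, $p_j^{+}$ and $\Delta_j$ are introduced here for illustration, and the expression is not claimed to be the weighting scheme of the Online Methods.

```latex
\begin{align}
p_j^{+} &= \frac{\gamma_j\, p_j}{(\gamma_j - 1)\,p_j + 1},
\qquad
\Delta_j = p_j^{+} - p_j = \frac{(\gamma_j - 1)\,p_j\,(1 - p_j)}{(\gamma_j - 1)\,p_j + 1},\\[4pt]
\operatorname{Cov}(x_j, x_k)
&= \pi(1-\pi)\,\bigl(2p_j^{+} - 2p_j\bigr)\bigl(2p_k^{+} - 2p_k\bigr)
 = 4\,\pi(1-\pi)\,\Delta_j\,\Delta_k
\qquad (j \neq k).
\end{align}
```

Here $x_j$ is the risk allele dosage at locus $j$ in the mixed case cohort. For a fixed $\pi$, the covariance scales with $\Delta_j \Delta_k$, and $\Delta_j$ grows with the effect size $\gamma_j$ and with $p_j(1-p_j)$, so pairs of common, large-effect loci are expected to show the strongest correlations.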

Figure 2

Power gain by weighting SNPs by allele frequency and effect size

We compared the statistical power of BUHMBOX with a weighting scheme that optimally weights correlations between SNPs (weighted) to an alternative approach that weights correlations uniformly (unweighted; equation (12) in Supplementary Note). We simulated 1,000 case individuals and assumed 50 risk loci, whose OR and RAFs were sampled from the GWAS catalog. Colored bands denote 95% confidence intervals of power estimates.

Power is proportional to number of samples and loci

The statistical power of BUHMBOX is a function of many factors, including the sample size N of the cases being tested for heterogeneity, heterogeneity proportion π, number of loci M for the coheritable disease, RAF, and OR. We sampled pairs of RAF and OR from the GWAS catalog. Given a sample size of N=2,000 cases and 2,000 controls, assuming π=0.2 and 50 risk loci, BUHMBOX achieved 92% power at p<0.05 (Figure 3). As many GWAS now consist of more than 2,000 cases, and many diseases are approaching 50 known associated loci[28], BUHMBOX is currently well powered to detect a moderate amount of heterogeneity (π=0.2) for many human traits. Modest heterogeneity is more challenging to detect at this sample size; power decreased to 67% at π=0.1 and to 38% at π=0.05. Power can be augmented with larger sample size (Figure 3) and larger effect sizes (Supplementary Figure 3). Power can also be increased by including large numbers of loci with even nominal evidence of association in addition to established genome-wide significant loci (Supplementary Note and Supplementary Figure 4).
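
How power depends on N, M and π can be explored with a small Monte Carlo sketch. Everything specific in it is an assumption made for illustration: the statistic is an unweighted sum of case-minus-control correlations referred to a standard normal distribution with a one-sided test, the allele frequencies and odds ratios are fixed rather than sampled from the GWAS catalog, and the function names are hypothetical. It will therefore not reproduce the power figures reported above.

```python
import math
import numpy as np

def simulate_dosages(n, raf, odds_ratio, pi, rng):
    """Dosages at independent loci; a fraction `pi` of individuals is DB-like."""
    raf_case = odds_ratio * raf / ((odds_ratio - 1.0) * raf + 1.0)
    db_like = rng.random(n) < pi
    freq = np.where(db_like[:, None], raf_case, raf)
    return rng.binomial(2, freq)

def one_sided_p(cases, controls):
    """Unweighted cross-locus correlation test (illustrative, not the weighted statistic)."""
    n1, n0 = cases.shape[0], controls.shape[0]
    diff = np.corrcoef(cases, rowvar=False) - np.corrcoef(controls, rowvar=False)
    # Scale so that each pairwise difference has roughly unit variance under the null.
    y = math.sqrt(n1 * n0 / (n1 + n0)) * diff[np.triu_indices_from(diff, k=1)]
    z = y.sum() / math.sqrt(y.size)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal probability

def estimate_power(n, raf, odds_ratio, pi, n_rep=200, alpha=0.05, seed=0):
    """Fraction of simulated studies (n cases, n controls) with p < alpha."""
    raf = np.asarray(raf, dtype=float)
    odds_ratio = np.asarray(odds_ratio, dtype=float)
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_rep):
        cases = simulate_dosages(n, raf, odds_ratio, pi, rng)
        controls = simulate_dosages(n, raf, odds_ratio, 0.0, rng)
        hits += one_sided_p(cases, controls) < alpha
    return hits / n_rep
```

Calling `estimate_power` over a grid of `n` and `pi` values gives a rough analogue of the power surfaces described in the text, and setting `pi=0.0` gives an empirical false positive rate for the same design.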

Figure 3

BUHMBOX power analysis

Power of BUHMBOX for detecting heterogeneity as a function of the number of risk loci, number of case samples, and the proportion of samples that actually have different phenotype (heterogeneity proportion, π). We assume that we have the same number of controls as cases. White lines denote 20, 40, 60, and 80% power. (a) Power as a function of number of case individuals and heterogeneity proportion, when the number of risk loci is fixed at 50. (b) Power as a function of number of risk loci and heterogeneity proportion, when the case sample size is fixed at 2,000.

Controlling for linkage disequilibrium

Although BUHMBOX adequately controlled the FPR when loci were truly independent, we were concerned that long-range LD between apparently independent loci may introduce false positives[29]. To ensure BUHMBOX was robust to LD, we implemented the following strategies: (1) stringent LD-pruning of DB loci to exclude SNPs with r>0.1, and (2) accounting for any remaining residual LD by assessing the relative increase of correlations in cases compared to controls (delta-correlations). We evaluated these strategies by measuring FPR using the RA Immunochip Consortium data[30]. In 1,000 different loosely pruned (r<0.5) SNP sets constructed using the Sweden EIRA data (Online Methods), the FPR without using delta-correlations was high (22.4% at p<0.05). Applying delta-correlations reduced this FPR to 9.5%. When we used stringent pruning (r<0.1), FPR was appropriately controlled (FPR 5.9% and FPR 5.3% with and without delta correlations, respectively). Although LD pruning alone was sufficiently effective for FPR control in this simulation, we used both strategies throughout the paper to be conservative.
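
Both strategies can be sketched in a few lines. The code is a simplified illustration, not the pipeline used in the paper: the greedy pruning order (first come, first kept), the scaling factor applied to the correlation difference, and the function names are assumptions.

```python
import numpy as np

def ld_prune(dosages, r_max=0.1):
    """Greedy pruning: keep a locus only if |r| <= r_max with every locus already kept.

    `dosages` is an (individuals x loci) array; returns the kept column indices.
    """
    r = np.abs(np.corrcoef(dosages, rowvar=False))
    kept = []
    for j in range(r.shape[0]):
        if all(r[j, k] <= r_max for k in kept):
            kept.append(j)
    return kept

def delta_correlations(case_dosages, control_dosages):
    """Case-minus-control correlation matrix, scaled for the two sample sizes.

    Correlation that is present in both groups (for example residual LD)
    cancels in the difference, leaving only the excess seen in cases.
    """
    n_case = case_dosages.shape[0]
    n_ctrl = control_dosages.shape[0]
    r_case = np.corrcoef(case_dosages, rowvar=False)
    r_ctrl = np.corrcoef(control_dosages, rowvar=False)
    # Assumed scaling: var(r_case - r_ctrl) is roughly 1/n_case + 1/n_ctrl under the null.
    scale = np.sqrt(n_case * n_ctrl / (n_case + n_ctrl))
    return scale * (r_case - r_ctrl)
```

A typical use would prune the DB loci first, for example in controls, and then compute `delta_correlations` on the kept columns only.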

Accounting for population stratification

Another potential confounding factor is population stratification. If population stratification exists, weak correlations between unlinked loci may occur, leading to spurious significance. If similar population stratification exists in cases and controls, the use of delta-correlations mitigates this effect. To more aggressively control for the effect of stratification at the individual level, we implemented BUHMBOX to regress out principal components (PCs) from risk allele dosages before calculating correlation statistics. To evaluate this strategy, we simulated extreme population stratification using HapMap[31] data (60 CEU and 60 YRI founders as cases, and 90 JPT+CHB founders as controls; λGC=26.5). Unsurprisingly, in 5,000 randomly sampled sets of independent SNPs we observed an inflated BUHMBOX FPR (14.1% at p<0.05). After regressing the effect of ten PCs from risk allele dosages, we observed that the FPR was appropriately controlled (5.7% at p<0.05). As an additional test under a more realistic scenario, we merged genotype data from Northern Europe (Sweden EIRA cohort; 2,762 cases/1,940 controls) and Southern Europe (Spain cohort; 807 cases/399 controls) in the RA Immunochip Consortium case-control dataset[30] (Online Methods) to create a highly stratified dataset. In 1,000 sets of randomly sampled independent SNPs, we observed an inflation of the FPR (8.6% at p<0.05); this was appropriately corrected (5.9% at p<0.05) when we regressed out the effect of ten PCs.
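
Regressing PCs out of the dosages amounts to an ordinary least-squares fit per locus. The sketch below assumes the PCs have already been computed elsewhere and are supplied as an (individuals x PCs) array; the function name is hypothetical and the code is not taken from the BUHMBOX software.

```python
import numpy as np

def residualize_on_pcs(dosages, pcs):
    """Remove the linear effect of principal components from each locus's dosages.

    dosages: (individuals x loci) array of risk allele dosages.
    pcs:     (individuals x n_pcs) array of principal component scores.
    Returns the residual dosages, which are orthogonal to every PC.
    """
    dosages = np.asarray(dosages, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    # Design matrix: an intercept column followed by the PCs.
    design = np.column_stack([np.ones(pcs.shape[0]), pcs])
    # One least-squares fit solves all loci at once (one column of `coef` per locus).
    coef, *_ = np.linalg.lstsq(design, dosages, rcond=None)
    return dosages - design @ coef
```

Cross-locus correlations would then be computed on the residuals rather than on the raw dosages.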

Application to autoimmune diseases

Autoimmune diseases share genetic loci[2, 4, 32–36], clustering in specific immune pathways[2, 27, 36]. We used the GRS approach to evaluate shared genetic structure between autoimmune diseases, and then applied BUHMBOX to assess heterogeneity. We obtained individual-level genotype data from the Type 1 Diabetes Genetics Consortium (T1DGC) UK case-control cohort (6,670 cases and 9,416 controls)[37] and the RA Immunochip Consortium’s six RA case-control cohorts (7,279 seropositive RA cases and 15,870 controls)[30] (Online Methods). We evaluated genetic sharing between a spectrum of autoimmune diseases with T1D and RA. We obtained associated independent loci for all 18 autoimmune diseases (r²<0.1, including MHC SNPs) from ImmunoBase (see URLs and Supplementary Table 3), and tested the association of GRSs for these autoimmune diseases with T1D and RA case status. We observed substantial genetic sharing between autoimmune diseases. T1D demonstrated significant sharing with alopecia areata (AA), autoimmune thyroid disease (ATD), celiac disease (CEL), Crohn’s disease (CRO), juvenile idiopathic arthritis (JIA), primary biliary cirrhosis (PBC), primary sclerosing cholangitis (PSC), RA, Sjögren’s syndrome (SJO), systemic lupus erythematosus (SLE), and vitiligo (VIT) (positive association, p<10⁻⁴). RA exhibited significant sharing with AA, ankylosing spondylitis (AS), ATD, CEL, JIA, PBC, PSC, SLE, systemic sclerosis (SSC), T1D and VIT (p<10⁻³). Overall, GRSs showed significant positive associations for 11 autoimmune diseases each in T1D and RA cohorts, respectively (GRS p<2.9×10⁻³ [=0.05/17 correcting for 17 diseases tested]; Table 1, Supplementary Table 4). We considered only these traits for subsequent analyses.
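
The Bonferroni threshold quoted above is simple arithmetic, sketched here with two hypothetical helper functions. The filter on the sign of the GRS coefficient mirrors the restriction to positive associations.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests

def shows_positive_sharing(grs_beta, grs_p, n_tests=17, alpha=0.05):
    """True when the GRS association is positive and passes the corrected threshold."""
    return grs_beta > 0 and grs_p < bonferroni_threshold(alpha, n_tests)

# 0.05 / 17 is about 2.9e-3, the GRS threshold used for the 17 diseases tested.
threshold = bonferroni_threshold(0.05, 17)
```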

Table 1

Summary of genetic overlap using GRS and BUHMBOX

Only the traits that have significant GRS p-values in positive directions are shown. Significant GRS p-value indicates evidence of shared genetic structure; significant BUHMBOX p-value indicates evidence of heterogeneity. See Supplementary Table 4 for the full results for all traits tested.

Cohort data	Test trait	#SNP	GRS p-value	GRS Beta (95% CI)	BUHMBOX p-value	BUHMBOX power at π=0.20
T1D	AA	10	1.4 × 10⁻¹²⁰	0.76 (0.69 – 0.82)	0.83	0.15
	ATD	7	1.4 × 10⁻³¹	0.48 (0.40 – 0.56)	0.30	0.05
	CEL	38	2.2 × 10⁻³⁵	0.32 (0.27 – 0.38)	0.16	0.50
	CRO	119	2.4 × 10⁻⁰⁵	0.08 (0.04 – 0.11)	0.54	0.99
	JIA	22	3.6 × 10⁻¹⁵¹	0.44 (0.40 – 0.47)	0.37	0.96
	PBC	19	1.1 × 10⁻¹²	0.16 (0.11 – 0.20)	0.18	0.82
	PSC	12	4.1 × 10⁻²⁶	0.38 (0.31 – 0.45)	0.91	0.08
	RA	68	6.6 × 10⁻⁸⁹	0.55 (0.49 – 0.60)	0.45	0.40
	SJO	7	3.9 × 10⁻¹⁴⁶	0.53 (0.49 – 0.57)	0.84	0.66
	SLE	16	1.1 × 10⁻⁸³	0.44 (0.39 – 0.48)	0.79	0.91
	VIT	12	2.5 × 10⁻⁹⁰	0.59 (0.53 – 0.65)	0.14	0.33
RA	AA	10	1.5 × 10⁻²²	0.28 (0.22 – 0.34)	0.71	0.23
	AS	24	6.1 × 10⁻⁰⁴	0.10 (0.04 – 0.15)	0.19	0.20
	ATD	7	3.9 × 10⁻²⁰	0.34 (0.27 – 0.41)	0.57	0.08
	CEL	38	6.4 × 10⁻²⁰	0.21 (0.17 – 0.26)	0.57	0.63
	JIA	22	8.9 × 10⁻¹²⁵	0.36 (0.33 – 0.39)	0.61	0.99
	PBC	19	1.5 × 10⁻¹³	0.15 (0.11 – 0.19)	0.83	0.90
	PSC	12	6.2 × 10⁻¹⁴	0.24 (0.18 – 0.31)	0.46	0.12
	SLE	16	4.3 × 10⁻⁰⁶	0.10 (0.05 – 0.14)	0.34	0.96
	SSC	5	9.6 × 10⁻¹⁰	0.22 (0.15 – 0.29)	0.08	0.09
	T1D	53	9.6 × 10⁻²⁰⁷	0.43 (0.40 – 0.46)	0.29	1.00
	VIT	12	1.8 × 10⁻¹¹	0.18 (0.12 – 0.23)	0.02	0.41
Seroneg.RA	Seropos.RA	14	1.1 × 10⁻¹⁰	0.30 (0.21 – 0.39)	0.008	0.26
MDD	SCZ	90	1.5 × 10⁻⁵	0.17 (0.09 – 0.24)	0.28	0.53

AA, alopecia areata; AS, ankylosing spondylitis; ATD, autoimmune thyroid disease; CEL, celiac disease; CRO, Crohn’s disease; JIA, juvenile idiopathic arthritis; MS, multiple sclerosis; PBC, primary biliary cirrhosis; PSC, primary sclerosing cholangitis; SJO, Sjögren’s syndrome; SLE, systemic lupus erythematosus; SSC, systemic sclerosis; UC, ulcerative colitis; VIT, vitiligo; MDD, major depressive disorder; SCZ, schizophrenia; Seroneg., seronegative; Seropos., seropositive.

To evaluate the degree of heterogeneity necessary to achieve the observed genetic sharing for these autoimmune diseases, we calculated the GRS regression coefficient, which we previously showed approximates the expected heterogeneity proportion π[38] assuming no pleiotropy. Based on the GRS coefficients, we observed π estimates ranging from 0.08–0.76 across the different autoimmune diseases in T1D and from 0.10–0.43 in RA (Figure 4, Table 1).

Figure 4

Genetic sharing between autoimmune diseases and psychiatric disorders

In (a) and (b), we show only the diseases that have significantly positive GRS p-values out of the 17 tested. Y-axis denotes the expected heterogeneity proportion (π) to explain observed genetic sharing. Vertical bars indicate 95% confidence intervals. Heterogeneity proportion estimates are based on GRS analysis, assuming no pleiotropy for (a) T1D, (b) RA, (c) seronegative RA, and (d) MDD.

We estimated the power of BUHMBOX to detect heterogeneity, correcting for 11 tests (p<4.5×10⁻³). BUHMBOX was well powered for some autoimmune traits; at π=0.2, four traits had >90% power for T1D, and four traits had >90% power for RA (Figure 5). Despite this, we observed no evidence of heterogeneity for any trait (corrected p>0.2; Figure 6, Table 1). Our findings suggest that the genetic sharing of autoimmune diseases with T1D and RA reflects similar risk alleles and pathways, rather than subgroups of genetically similar cases resulting from misclassifications or molecular subtypes.

Figure 5

Statistical power of BUHMBOX to detect heterogeneity

We calculated power by performing 1,000 simulations with corresponding sample size, number of risk alleles, risk allele frequencies, and odds ratios. To calculate power for (c) and (d), we used a significance threshold of 0.05. For (a) and (b), the threshold was adjusted using the Bonferroni correction accounting for 11 tests in T1D and RA, respectively.

Figure 6

BUHMBOX results

We show only diseases with significantly positive GRS p-values (for complete results for all traits tested, see Supplementary Table 4). Significant GRS p-values indicate evidence of shared genetic structure; significant BUHMBOX p-value indicates evidence of heterogeneity. Point size represents the number of DB-associated SNPs included in the analysis. Dashed vertical lines denote the Bonferroni-adjusted significance threshold for the BUHMBOX test statistic. Arrow indicates significant BUHMBOX test statistic.

Application to subtype misclassifications in RA

RA consists of two subtypes, seropositive and seronegative, with distinct clinical outcomes and MHC associations[38]. These two subtypes are classified by whether patients are reactive to anti-CCP antibody. While anti-CCP testing is specific, its lack of sensitivity can result in some seropositive RA patients being misclassified as seronegative RA[20]. We previously demonstrated that there is shared genetic structure between seropositive and seronegative RA using the GRS approach[38], which could imply misclassifications of up to 26.3% between the two RA subtypes. We used BUHMBOX to evaluate whether seropositive RA misclassifications are present in a seronegative RA cohort. We used the seronegative RA cohort (2,406 cases/15,870 controls) from the RA Immunochip Consortium[30]. Among 68 RA-associated independent loci, we chose SNPs that are associated to seropositive RA (p<5×10⁻⁸) but not seronegative RA (p>5×10⁻⁸) in our Immunochip data. This criterion resulted in 14 specific loci exclusively associated to seropositive RA (Supplementary Table 3). The seropositive RA GRS was significantly associated with seronegative RA case status (β=0.30, p=1.1×10⁻¹⁰). The regression coefficient (β=0.30) represents an upper bound for π (Figure 4). BUHMBOX suggested that heterogeneity was indeed present (p=0.008, Figure 6, Table 1, Supplementary Table 4), consistent with potential subtype misclassifications. As a more stringent test, we selected SNPs based on between-RA-subtype heterogeneity test results; for this test we obtained p-values by assigning seropositive RA as cases and seronegative RA as controls. We chose SNPs that are associated to seropositive RA (p<5×10⁻⁸) and show nominally significant between-RA-subtype heterogeneity (p<0.05, Supplementary Table 3). Applying BUHMBOX to these 12 loci still showed significant heterogeneity within the seronegative RA cohort (p=0.017).

Application to major depressive disorder and schizophrenia

Current definitions of psychiatric disorders reflect clinical syndromes, with overlapping clinical features. As a result, psychiatric diagnoses for a patient may change as their symptoms evolve[21]. In addition to the potential for misdiagnosis, a subset of true MDD cases may be genetically more similar to schizophrenia. If heterogeneity with respect to schizophrenia risk alleles exists among MDD cases, then genetic studies would suggest evidence of coheritability between the two disorders[17] as has been observed in previous studies[3, 6, 7]. The unintentional inclusion of “schizophrenia-like” MDD cases, due to diagnostic misclassification or genetically distinct subgroups, has been acknowledged and explored as a potential source of bias in coheritability studies by previous investigators[3, 17]. We used BUHMBOX to test for a subgroup of “schizophrenia-like” cases in MDD. If a subset of MDD cases are misdiagnosed and in fact have schizophrenia, or are more genetically similar to schizophrenia, we would expect to see subgroup heterogeneity among MDD cases with respect to schizophrenia risk loci. We first evaluated evidence of shared genetic structure among 90 known schizophrenia associated loci[39] (Supplementary Table 3) in 9,238 MDD cases and 7,521 controls from the Major Depressive Disorder Working Group of the Psychiatric Genomics Consortium[40] (Supplementary Table 5). Consistent with previous findings (Supplementary Table 6)[3, 6, 7], the GRS was associated with MDD case status (p=1.54×10⁻⁵) indicating shared genetic structure (Figure 4). For the GRS analysis we used a refined subset of the total sample (6,382 MDD cases and 5,614 controls), excluding samples that overlapped with the schizophrenia GWAS[39] (Online Methods). Application of cross-trait LDSC[7] to estimate the genetic correlation obtained further evidence of shared genetic structure between MDD and SCZ (rg=0.47, SE=0.07, p=1.61×10⁻¹⁰), of similar magnitude to previous reports[7].
However, the BUHMBOX p-value was not significant (p=0.28), indicating no excess positive correlations among schizophrenia loci within MDD cases (Figure 6, Supplementary Table 4). Our findings suggest no evidence of a subgroup of schizophrenia-like MDD cases. However, we note that we lacked adequate statistical power to detect heterogeneity in the context of a small heterogeneity proportion. Given the MDD sample size and the number of currently known schizophrenia risk loci, there was 53% power at π=0.20 but only 25% power at π=0.10 (Figure 5).

DISCUSSION

BUHMBOX can distinguish whether shared genetic structure between traits is the consequence of heterogeneity or pleiotropy based on SNP genotype data alone. It can help to interpret recent observations of shared genetic structures in complex traits including autoimmune, neuropsychiatric, and metabolic diseases. The intuition behind BUHMBOX is that if heterogeneity exists, independent loci will show non-random positive correlations. Hence, correcting for population structure and long-range LD is critical for this approach to be effective. We emphasize that it is necessary to appropriately interpret the source of heterogeneity, which will depend on the biological and clinical relationship between the two traits. We provide detailed information to guide interpretation in the Supplementary Note. We demonstrated that genetic sharing between autoimmune diseases is due to pleiotropy, noting that for a few traits we had only modest power (Figure 5). One notable exception was seronegative RA, which might contain misclassified seropositive RA cases. The results presented here demonstrate that seronegative RA is a heterogeneous phenotype with respect to genetic overlap with seropositive RA, bringing clarity to an ongoing debate about the nature of this disease. In contrast we were underpowered to draw more definitive conclusions as to whether a subset of MDD cases are genetically similar to schizophrenia cases; as MDD cohorts increase in size we will be able to reassess more accurately whether smaller proportions of heterogeneity might partially explain observed coheritability. Our results are consistent with recent analyses concluding that pleiotropy between psychiatric diseases is unlikely explained by misclassifications alone[17]. We showed that the power of BUHMBOX is a function of sample size, heterogeneity proportion π, and the number, effect sizes and allele frequencies of loci. Power for subtle heterogeneity (π<0.1) in current datasets is limited. 
In future studies, however, increasing sample size and number of known associated loci will augment power. One potential strategy to augment power is to use a polygenic modeling[3, 12, 13] approach, including a larger number of SNPs with less stringent significance thresholds (Supplementary Note and Supplementary Figure 4). BUHMBOX has certain key caveats. First, it is designed to detect a specific type of heterogeneity resulting from the presence of a subgroup comprising a known second trait. Thus, BUHMBOX cannot currently be applied agnostically to detect the presence of heterogeneity within a dataset. Second, BUHMBOX requires prior knowledge of associated loci and their effect sizes. For diseases with few known loci, BUHMBOX may perform suboptimally. Also, if known effect size estimates are inaccurate, power may decrease because appropriate weighting is crucial (Figure 2). Third, BUHMBOX requires individual-level genotype data for a limited number of loci. Fourth, BUHMBOX can be sensitive to confounding factors. We recommend careful control of LD and population structure using LD pruning and PCs. Fifth, interpretation of the BUHMBOX test statistic is not straightforward. Positive findings indicate the presence of heterogeneity but cannot distinguish among its various causes (e.g. misclassifications, molecular subtypes, mediated pleiotropy, ascertainment bias), and negative findings may indicate no heterogeneity or low power. To aid interpretation, BUHMBOX provides a power calculation based on sample size and risk allele information, but it may not always be accurate. For example, if pleiotropy and heterogeneity co-exist, power may be overestimated. Sixth, if the heterogeneity proportion π is small (e.g. 0.05), BUHMBOX’s ability to detect heterogeneity is limited. We expect that π will vary between situations, and further clinical and biological investigations are necessary to uncover true π.
Finally, there is the unlikely possibility that real epistasis can manifest as positive signal for BUHMBOX. Broadly, BUHMBOX can be thought of as capturing a specific form of epistasis where risk alleles correlate positively within the additive model. As such, if this specific form of epistasis occurs naturally between DB-associated SNPs, and if this epistasis structure is shared with DA, it has the potential to create a significant BUHMBOX test result and confound these analyses. However, this specific type of epistasis seems unlikely; were it present, application of BUHMBOX using DB-associated SNPs in DB cases to detect apparent “heterogeneity” might yield a significant result. When comparing BUHMBOX to existing approaches, we focused on the GRS method. However, the results of our comparison also apply to other existing methods such as mixed-model-based approaches[5, 6] and LD-score-based approaches[7], which are similar to the GRS approach in the sense that they detect both pleiotropy and heterogeneity. We expect that BUHMBOX will complement any of these methods to facilitate interpretation of observed genetic sharing between traits. Our statistical approach may be extended to have application beyond heterogeneity, including identification of missing heritability resulting from this type of heterogeneity[41]. These applications will become more feasible as functional annotations of SNPs advance in the coming years.

ONLINE METHODS

Genetic risk score approach

Given M independent risk loci associated with DB, we calculated the GRS of individual i as

$$\mathrm{GRS}_i = \sum_{j=1}^{M} \beta_j x_{ij}$$

where $x_{ij}$ is individual i’s risk allele dosage at marker j, and $\beta_j$ is the effect size (log odds ratio) of the risk allele at marker j for disease DB. The GRS approach calculates GRSs for all individuals and associates the GRSs with the case/control status of DA. In the logistic regression framework for associating GRSs with DA status, we can obtain the regression coefficient for the GRS ($\beta_{\mathrm{GRS}}$). We previously showed that $\beta_{\mathrm{GRS}}$ approximates the proportion of DA cases that are genetically DB (heterogeneity proportion π), if we assume that there is no pleiotropy and that the GRS association is driven solely by a subgroup[38]. Thus, $\beta_{\mathrm{GRS}}$ represents an upper bound of π.
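The GRS calculation and its association with DA status can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `genetic_risk_scores` and `fit_logistic`, the Newton-Raphson fitting routine, and the toy allele frequencies and odds ratios are all assumptions introduced here, and no covariates are included.

```python
import numpy as np

def genetic_risk_scores(dosages, log_odds_ratios):
    """Return GRS_i = sum_j beta_j * x_ij for every individual i."""
    dosages = np.asarray(dosages, dtype=float)        # shape (n_individuals, M)
    betas = np.asarray(log_odds_ratios, dtype=float)  # shape (M,)
    return dosages @ betas

def fit_logistic(predictor, status, n_iter=25):
    """Newton-Raphson fit of logit P(status = 1) = b0 + b1 * predictor."""
    X = np.column_stack([np.ones(len(predictor)), predictor])
    y = np.asarray(status, dtype=float)
    coef = np.zeros(2)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ coef))          # fitted probabilities
        gradient = X.T @ (y - mu)
        hessian = X.T @ (X * (mu * (1.0 - mu))[:, None])
        coef = coef + np.linalg.solve(hessian, gradient)
    return coef                                        # (intercept, slope)

# Toy data (hypothetical values): status is unrelated to genotype,
# so the fitted GRS coefficient should be close to zero.
rng = np.random.default_rng(0)
n, M = 2000, 10
raf = rng.uniform(0.2, 0.8, size=M)                   # risk allele frequencies
betas = np.log(rng.uniform(1.1, 1.5, size=M))         # log odds ratios for DB
dosages = rng.binomial(2, raf, size=(n, M))           # risk allele dosages 0/1/2
status = rng.integers(0, 2, size=n)                   # DA case/control labels
grs = genetic_risk_scores(dosages, betas)
intercept, beta_grs = fit_logistic(grs, status)
```

In practice a standard statistics package would be used for the regression so that covariates and standard errors are handled properly; the hand-written fit above only keeps the sketch dependency-free.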

The BUHMBOX approach

To detect heterogeneity within DA cases driven by a subgroup that is genetically similar to DB patients, we used the following procedure:

1. Prepare genotype data of DA cases and controls, and information about SNPs associated with DB (risk allele, RAF, and OR).
2. Prune SNPs associated with DB based on LD in control samples (excluding SNPs with r² > 0.1 or within ±1 Mb of other SNPs).
3. Obtain risk allele dosages of the pruned SNPs from DA cases and controls.
4. Regress out PCs from the risk allele dosages to obtain residual dosages, one locus at a time.
5. Calculate R, the correlation matrix of residual dosages of risk alleles in N cases with DA, and R′, the corresponding matrix in N′ controls.
6. Calculate Y, a z-score matrix from delta-correlations:

$$Y = \sqrt{\frac{N N'}{N + N'}}\,(R - R')$$

7. Calculate the BUHMBOX statistic:

$$S_{\mathrm{BUHMBOX}} = \frac{\sum_{i<j} w_{ij}\, y_{ij}}{\sqrt{\sum_{i<j} w_{ij}^{2}}}$$

where $y_{ij}$ is the element in Y at row i and column j. Given M pruned SNPs, (i,j) iterates over the M(M−1)/2 non-diagonal elements of Y. The $w_{ij}$ term is a weighting function that is designed to maximize power, such that (equation (13) in Supplementary Note):

$$w_{ij} = \frac{\sqrt{p_i (1 - p_i)\, p_j (1 - p_j)}\,(\gamma_i - 1)(\gamma_j - 1)}{\bigl((\gamma_i - 1) p_i + 1\bigr)\bigl((\gamma_j - 1) p_j + 1\bigr)}$$

where $p_i$ is the RAF of SNP i, and $\gamma_i$ is the OR of SNP i for DB.

The BUHMBOX statistic follows N(0,1) under the null hypothesis. We calculate the significance of this statistic as a positive one-sided test; the p-value is $p = 1 - \Phi(S_{\mathrm{BUHMBOX}})$, where Φ is the cumulative distribution function of the standard normal distribution. In the context of heterogeneity, excessive positive correlations among DB risk alleles in DA cases result in p < α. See Supplementary Table 1 for a comparison of the BUHMBOX and GRS approaches.

The BUHMBOX test statistic was inspired by previous work on deriving the covariance between correlation estimates[42] and on combining dependent estimates[43, 44]. For details of the intuition, derivation, optimization, and interpretation of the BUHMBOX test statistic, see Supplementary Note.
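The procedure above (from residual dosages to the one-sided p-value) can be sketched in Python. This is an illustrative re-implementation under stated assumptions, not the published R script: the function names are hypothetical, PCs are regressed out separately within cases and controls, the z-score matrix is taken as sqrt(NN′/(N+N′))·(R − R′), and the pairwise weight is taken as the product of per-SNP terms sqrt(p(1−p))·(γ−1)/((γ−1)p+1). SNPs are assumed to be already LD-pruned.

```python
import math
import numpy as np

def residualize(dosages, pcs):
    """Regress an intercept and the PCs out of every SNP column."""
    design = np.column_stack([np.ones(len(pcs)), pcs])
    coef, *_ = np.linalg.lstsq(design, dosages, rcond=None)
    return dosages - design @ coef

def buhmbox(case_dos, ctrl_dos, raf, odds_ratio, case_pcs=None, ctrl_pcs=None):
    """Return (statistic, one-sided p-value) for pruned DB risk SNPs."""
    case_dos = np.asarray(case_dos, dtype=float)
    ctrl_dos = np.asarray(ctrl_dos, dtype=float)
    p = np.asarray(raf, dtype=float)
    g = np.asarray(odds_ratio, dtype=float)
    if case_pcs is not None:
        case_dos = residualize(case_dos, case_pcs)
    if ctrl_pcs is not None:
        ctrl_dos = residualize(ctrl_dos, ctrl_pcs)
    n_case, n_ctrl = len(case_dos), len(ctrl_dos)
    # Correlation matrices of (residual) risk allele dosages.
    r_case = np.corrcoef(case_dos, rowvar=False)
    r_ctrl = np.corrcoef(ctrl_dos, rowvar=False)
    # z-score matrix from delta-correlations.
    y = math.sqrt(n_case * n_ctrl / (n_case + n_ctrl)) * (r_case - r_ctrl)
    # Pairwise weights as an outer product of per-SNP terms.
    a = np.sqrt(p * (1 - p)) * (g - 1) / ((g - 1) * p + 1)
    w = np.outer(a, a)
    upper = np.triu_indices(len(p), k=1)      # the M(M-1)/2 pairs with i < j
    stat = float(np.sum(w[upper] * y[upper]) / math.sqrt(np.sum(w[upper] ** 2)))
    p_value = 0.5 * math.erfc(stat / math.sqrt(2))   # 1 - Phi(stat)
    return stat, p_value

# Toy null data with hypothetical RAFs and ORs.
rng = np.random.default_rng(42)
raf = np.array([0.2, 0.3, 0.4, 0.5, 0.6])
odds = np.array([1.2, 1.3, 1.1, 1.4, 1.25])
cases = rng.binomial(2, raf, size=(500, 5)).astype(float)
controls = rng.binomial(2, raf, size=(600, 5)).astype(float)
case_pcs = rng.normal(size=(500, 2))
ctrl_pcs = rng.normal(size=(600, 2))
s, pval = buhmbox(cases, controls, raf, odds, case_pcs, ctrl_pcs)
```

Writing the weight matrix as an outer product keeps the code short; it is only valid under the factorized form of the weight assumed here.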

Code availability

BUHMBOX has been fully implemented as a publicly available R script (see URLs).

Power and false positive rate simulations

Given sample size of DA cases (N), proportion of DA cases that actually show genetic characteristics of DB (heterogeneity proportion π), and number of risk loci associated to DB (M), we simulated studies to estimate power of our method as follows. To simulate a reasonable joint distribution of RAFs and ORs, we downloaded the GWAS catalog (as of 29 April 2014). Among all binary traits in the catalog, we selected traits with ≥50 reported SNPs resulting in 22 traits with 1,480 SNPs. From these SNPs, we sampled M pairs of RAF (p) and their corresponding OR (γ). To simulate genotypes, we set the RAF of a subgroup (Nπ individuals) to γp/((γ−1)p+1) and p for the other subgroup (N(1−π) individuals), because Nπ individuals can be thought of as DB cases. Within each subgroup, we generated genotypes assuming that risk alleles are distributed according to the Hardy-Weinberg equilibrium (HWE) and risk loci are independent. We assumed HWE in cases because we assumed an additive disease model. Then we applied BUHMBOX to calculate the p-value. We repeated this 1,000 times to approximate power as the proportion of simulations with p-values ≤0.05. We evaluated power for different values of N, M, and π. Under the assumption that the loci are independent, the FPR simulation was equivalent to the power simulation described above with the only difference being that π was set to zero, which forced the null hypothesis. We measured the FPR by assuming N=1,000 and M=20, and constructing 1,000,000 such studies.
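The simulation scheme can be sketched as below. Several elements are assumptions made for this illustration: RAFs and ORs are drawn from uniform ranges rather than sampled from the GWAS catalog, controls are simulated with RAF p, no PCs are regressed out, and the statistic uses the same assumed form for the z-score matrix and weights as described in the comments. All function names are hypothetical.

```python
import math
import numpy as np

def buhmbox_stat(case_dos, ctrl_dos, raf, odds_ratio):
    """Compact BUHMBOX-style statistic and one-sided p-value (no PC adjustment)."""
    n_case, n_ctrl = len(case_dos), len(ctrl_dos)
    delta = np.corrcoef(case_dos, rowvar=False) - np.corrcoef(ctrl_dos, rowvar=False)
    y = math.sqrt(n_case * n_ctrl / (n_case + n_ctrl)) * delta
    # Assumed per-SNP weight term: sqrt(p(1-p)) * (OR-1) / ((OR-1)p + 1).
    a = np.sqrt(raf * (1 - raf)) * (odds_ratio - 1) / ((odds_ratio - 1) * raf + 1)
    w = np.outer(a, a)
    upper = np.triu_indices(len(raf), k=1)
    stat = float(np.sum(w[upper] * y[upper]) / math.sqrt(np.sum(w[upper] ** 2)))
    return stat, 0.5 * math.erfc(stat / math.sqrt(2))

def simulate_study(rng, n_cases, n_controls, raf, odds_ratio, pi):
    """Generate HWE genotypes; a fraction pi of cases carries DB-like RAFs."""
    n_sub = int(round(n_cases * pi))
    raf_sub = odds_ratio * raf / ((odds_ratio - 1) * raf + 1)   # RAF in DB cases
    subgroup = rng.binomial(2, raf_sub, size=(n_sub, len(raf)))
    rest = rng.binomial(2, raf, size=(n_cases - n_sub, len(raf)))
    cases = np.vstack([subgroup, rest])
    controls = rng.binomial(2, raf, size=(n_controls, len(raf)))
    return cases, controls

def estimate_power(n_cases, n_controls, n_loci, pi, n_sim=200, alpha=0.05, seed=1):
    """Proportion of simulated studies with p <= alpha (FPR when pi = 0)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sim):
        # Placeholder distributions standing in for GWAS catalog sampling.
        raf = rng.uniform(0.1, 0.9, size=n_loci)
        odds_ratio = rng.uniform(1.1, 1.5, size=n_loci)
        cases, controls = simulate_study(rng, n_cases, n_controls, raf, odds_ratio, pi)
        _, p_value = buhmbox_stat(cases, controls, raf, odds_ratio)
        hits += p_value <= alpha
    return hits / n_sim

fpr = estimate_power(2000, 2000, 20, pi=0.0)     # null: no subgroup
power = estimate_power(2000, 2000, 20, pi=0.5)   # half of cases are DB-like
```

Setting `pi=0.0` reproduces the false positive rate setting described above; far more replicates than the 200 used here would be needed to estimate a rate near 0.05 precisely.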

Linkage disequilibrium simulations

To simulate realistic LD, we used chromosome 22 data from control individuals in the Swedish EIRA cohort of the RA dataset (2,762 cases/1,940 controls)[30]. We assigned half of the control individuals as cases and the rest as controls. To generate 1,000 random sets of SNPs, we began from all SNPs and thinned the SNP set 10-fold with different seed numbers using PLINK[45] (with the command --thin 0.1). We then pruned each of the 1,000 datasets using PLINK[45] with an r² criterion of 0.5 or 0.1.

Population stratification simulations

To assess the effects of population stratification, we conducted two sets of simulations. First, we used HapMap[31] release 23 data (60 CEU founders, 60 YRI founders, and 90 JPT+CHB founders), setting CEU+YRI as cases and JPT+CHB as controls. We calculated PCs after LD pruning (r² < 0.1). For DB SNPs we randomly selected 5,000 sets of 22 independent SNPs; we selected a single SNP from each autosome. Second, we used genotype data from a Northern Europe RA cohort (Swedish EIRA; 2,762 cases/1,940 controls) and a Southern Europe cohort (Spain; 807 cases/399 controls) from the RA dataset[30]. For this simulation we used the SNP sets that we had generated for the LD simulations (described above, thinned from Swedish EIRA chromosome 22 with criterion r² < 0.1), setting the Swedish samples as cases and adding the Spanish samples as controls.

Application to specific phenotypes

Type 1 diabetes dataset

To evaluate pleiotropy and heterogeneity between 18 autoimmune diseases and T1D, we applied GRS and BUHMBOX approaches to the UK case-control dataset provided by the T1DGC[37], which consisted of a total of 16,086 samples (6,670 cases and 9,416 controls) from three collections: (1) cases from the UK-GRID, (2) shared controls from the British 1958 Birth Cohort and (3) shared controls from Blood Services controls (data release 4 February 2012, hg18). The samples were collected from 13 regions. All samples were collected after obtaining informed consent, and were genotyped on the Immunochip array. GRS and BUHMBOX analyses were conducted using the region index as covariates.

Rheumatoid arthritis dataset

To evaluate pleiotropy and heterogeneity between 18 autoimmune diseases and RA, we used the RA Immunochip consortium data from six RA case-control cohorts (UK, US, Dutch, Spanish, Swedish Umea, and Swedish EIRA)[30]. To evaluate pleiotropy to autoimmune diseases, we used 7,279 seropositive RA cases and 15,870 controls. To evaluate misclassifications of RA subtypes, we used 2,406 seronegative RA samples and the same controls. Seropositive and seronegative RA patients were defined in each cohort using standard clinical practices to assess whether patients were reactive to anti-CCP antibody[38]. All samples were obtained with informed consent, and were collected through institutional review board approved protocols. All individuals self-reported as white and of European descent. Samples were genotyped with the Immunochip array. We merged the data of six cohorts into one, and used binary variables representing cohorts as well as 10 PCs as covariates in the analysis.

Defining autoimmune risk loci

We accessed ImmunoBase (7 June 2015 version) to define genome-wide significant risk loci for 18 autoimmune diseases. We did not include inflammatory bowel disease, due to its redundancy with Crohn’s disease and ulcerative colitis. For each of the 18 autoimmune diseases analyzed, we pruned the list of index SNPs obtained from ImmunoBase in PLINK[45] with options --r2 --ld-window-r2 0.1, using the 1000 Genomes Phase 1 European reference panel for LD. For all pairs of SNPs with r² > 0.1, we kept the most strongly associated SNP. To ensure completely independent risk loci, we also removed SNPs annotated as being located in the same chromosomal region in ImmunoBase, again keeping the most strongly associated index SNP (Supplementary Table 3). When a locus was not in the Immunochip datasets, we looked for a proxy (r² > 0.2) based on the 1000 Genomes data.
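One way to read the rule "for all pairs of SNPs with r² > 0.1, keep the most strongly associated SNP" is as a greedy pass over SNPs ordered by association strength. The sketch below illustrates that reading only; the function name `prune_by_ld`, the use of a precomputed pairwise r² matrix, and the toy values are assumptions, and the actual pruning in the study was driven by PLINK output.

```python
import numpy as np

def prune_by_ld(r2, p_values, threshold=0.1):
    """Greedily keep SNPs, strongest association first, dropping any SNP
    whose r^2 with an already-kept SNP exceeds the threshold."""
    r2 = np.asarray(r2, dtype=float)
    order = np.argsort(p_values)           # smallest p-value first
    kept = []
    for idx in order:
        if all(r2[idx, k] <= threshold for k in kept):
            kept.append(int(idx))
    return sorted(kept)

# Toy example: SNP 1 is the strongest signal and is in LD with SNPs 0 and 2.
r2 = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.3, 0.0],
    [0.0, 0.3, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
p_values = np.array([1e-10, 1e-20, 1e-9, 1e-8])
kept = prune_by_ld(r2, p_values)           # indices of retained SNPs
```

Note that a greedy pass can drop a SNP (here SNP 2) because of LD with a stronger neighbor even though it is independent of every other retained SNP, which is the intended behavior when completely independent loci are required.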

Major depressive disorder dataset

We used BUHMBOX to investigate the relationship between MDD and schizophrenia, which have been previously reported to share genetic etiology based on polygenic risk scoring[3] and coheritability analyses[6]. The full MDD sample analyzed comprised nine GWAS datasets collected from eight separate studies (Supplementary Table 5) as previously described[40]. All samples were collected through institutional review board approved protocols and were obtained with informed consent. Independence of the training (SCZ) and target (MDD) datasets is crucial in GRS analyses; GRSs are constructed using effect size estimates obtained using allele frequency differences between cases and controls in the training GWAS, and overlapping cases or controls will therefore bias the association of GRSs to the target dataset in the positive direction. In contrast the BUHMBOX test statistic is based on the correlation of risk allele dosages among cases, which is orthogonal to allele frequency differences in cases and controls, and is therefore not inflated by sample overlap. Thus, for the GRS analysis individual MDD samples (four cases, 886 controls) that overlapped with those in the schizophrenia GWAS[39] were removed from the analysis; three GWAS cohorts with an insufficient number of independent control samples (N<5) were also removed from the analysis. GRS analyses were conducted in each of the remaining six GWAS datasets (Supplementary Table 5), followed by meta-analysis of the GRS. To obtain the overall GRS effect size (β) and test statistic we used the inverse-variance weighted fixed effects method. For BUHMBOX, we used the full dataset; analyses were conducted in each of the nine GWAS datasets (Supplementary Table 5) followed by meta-analysis. Because the BUHMBOX statistic is a z-score, we meta-analyzed BUHMBOX results across the datasets using the standard weighted sum of z-score approach, where z-scores are weighted by the square root of the sample size.
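The two meta-analysis schemes mentioned above can be sketched as follows: a sample-size-weighted sum of z-scores for the BUHMBOX statistics, and an inverse-variance weighted fixed effects estimate for the per-cohort GRS coefficients. The function names and the input numbers are hypothetical, and which sample size (for example, the number of cases) enters the weights is an assumption left to the caller.

```python
import math

def weighted_z_meta(z_scores, sample_sizes):
    """Combine z-scores with weights equal to sqrt(sample size)."""
    numerator = sum(math.sqrt(n) * z for z, n in zip(z_scores, sample_sizes))
    return numerator / math.sqrt(sum(sample_sizes))

def fixed_effects_meta(betas, std_errors):
    """Inverse-variance weighted fixed effects estimate, its SE and z-score."""
    weights = [1.0 / se ** 2 for se in std_errors]
    beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return beta, se, beta / se

def upper_tail_p(z):
    """One-sided p-value 1 - Phi(z) for a standard normal z-score."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical per-cohort results.
z_meta = weighted_z_meta([0.8, 1.1, -0.2], [1200, 900, 400])
p_meta = upper_tail_p(z_meta)
beta_meta, se_meta, z_grs = fixed_effects_meta([0.10, 0.30], [0.10, 0.10])
```

Because the weights sqrt(n) are normalized by the square root of the total sample size, the combined statistic is again standard normal under the null when the cohorts are independent.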

Defining schizophrenia risk loci

Schizophrenia-associated SNPs were selected as those showing genome-wide significant association with schizophrenia (p < 5×10⁻⁸) in the most recent Psychiatric Genomics Consortium[39] GWAS. For schizophrenia-associated SNPs not directly genotyped in the MDD GWAS datasets, we selected proxy SNPs as those with the highest r² from the list of all proxies with r² > 0.2 using the 1000 Genomes Phase 1 European reference panel. Of the 97 schizophrenia-associated SNPs (11 indels were not considered in our analysis), 90 LD-independent SNPs (r² < 0.1 and more than 1 Mb apart) were available for analysis in the MDD GWAS datasets either via direct genotyping or via a proxy SNP (see Supplementary Table 3 for a detailed list of SNPs).

42 in total

1. Analysis of families in the multiple autoimmune disease genetics consortium (MADGC) collection: the PTPN22 620W allele associates with multiple autoimmune phenotypes.

Authors: Lindsey A Criswell; Kirsten A Pfeiffer; Raymond F Lum; Bonnie Gonzales; Jill Novitzke; Marlena Kern; Kathy L Moser; Ann B Begovich; Victoria E H Carlton; Wentian Li; Annette T Lee; Ward Ortmann; Timothy W Behrens; Peter K Gregersen
Journal: Am J Hum Genet Date: 2005-02-17 Impact factor: 11.025

Review 2. Immune-mediated disease genetics: the shared basis of pathogenesis.

Authors: Chris Cotsapas; David A Hafler
Journal: Trends Immunol Date: 2012-09-29 Impact factor: 16.687

Review 3. Disentangling the heterogeneity of autism spectrum disorder through genetic findings.

Authors: Shafali S Jeste; Daniel H Geschwind
Journal: Nat Rev Neurol Date: 2014-01-28 Impact factor: 42.937

4. Major depression and generalized anxiety disorder. Same genes, (partly) different environments?

Authors: K S Kendler; M C Neale; R C Kessler; A C Heath; L J Eaves
Journal: Arch Gen Psychiatry Date: 1992-09

5. High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis.

Authors: Steve Eyre; John Bowes; Dorothée Diogo; Annette Lee; Anne Barton; Paul Martin; Alexandra Zhernakova; Eli Stahl; Sebastien Viatte; Kate McAllister; Christopher I Amos; Leonid Padyukov; Rene E M Toes; Tom W J Huizinga; Cisca Wijmenga; Gosia Trynka; Lude Franke; Harm-Jan Westra; Lars Alfredsson; Xinli Hu; Cynthia Sandor; Paul I W de Bakker; Sonia Davila; Chiea Chuen Khor; Khai Koon Heng; Robert Andrews; Sarah Edkins; Sarah E Hunt; Cordelia Langford; Deborah Symmons; Pat Concannon; Suna Onengut-Gumuscu; Stephen S Rich; Panos Deloukas; Miguel A Gonzalez-Gay; Luis Rodriguez-Rodriguez; Lisbeth Ärlsetig; Javier Martin; Solbritt Rantapää-Dahlqvist; Robert M Plenge; Soumya Raychaudhuri; Lars Klareskog; Peter K Gregersen; Jane Worthington
Journal: Nat Genet Date: 2012-11-11 Impact factor: 38.330

6. Meta-analysis of genome-wide association studies in celiac disease and rheumatoid arthritis identifies fourteen non-HLA shared loci.

Authors: Alexandra Zhernakova; Eli A Stahl; Gosia Trynka; Soumya Raychaudhuri; Eleanora A Festen; Lude Franke; Harm-Jan Westra; Rudolf S N Fehrmann; Fina A S Kurreeman; Brian Thomson; Namrata Gupta; Jihane Romanos; Ross McManus; Anthony W Ryan; Graham Turner; Elisabeth Brouwer; Marcel D Posthumus; Elaine F Remmers; Francesca Tucci; Rene Toes; Elvira Grandone; Maria Cristina Mazzilli; Anna Rybak; Bozena Cukrowska; Marieke J H Coenen; Timothy R D J Radstake; Piet L C M van Riel; Yonghong Li; Paul I W de Bakker; Peter K Gregersen; Jane Worthington; Katherine A Siminovitch; Lars Klareskog; Tom W J Huizinga; Cisca Wijmenga; Robert M Plenge
Journal: PLoS Genet Date: 2011-02-24 Impact factor: 5.917

Review 7. The genetics of major depression.

Authors: Jonathan Flint; Kenneth S Kendler
Journal: Neuron Date: 2014-02-05 Impact factor: 17.173

8. New data and an old puzzle: the negative association between schizophrenia and rheumatoid arthritis.

Authors: S Hong Lee; Enda M Byrne; Christina M Hultman; Anna Kähler; Anna A E Vinkhuyzen; Stephan Ripke; Ole A Andreassen; Thomas Frisell; Alexander Gusev; Xinli Hu; Robert Karlsson; Vasilis X Mantzioris; John J McGrath; Divya Mehta; Eli A Stahl; Qiongyi Zhao; Kenneth S Kendler; Patrick F Sullivan; Alkes L Price; Michael O'Donovan; Yukinori Okada; Bryan J Mowry; Soumya Raychaudhuri; Naomi R Wray; William Byerley; Wiepke Cahn; Rita M Cantor; Sven Cichon; Paul Cormican; David Curtis; Srdjan Djurovic; Valentina Escott-Price; Pablo V Gejman; Lyudmila Georgieva; Ina Giegling; Thomas F Hansen; Andrés Ingason; Yunjung Kim; Bettina Konte; Phil H Lee; Andrew McIntosh; Andrew McQuillin; Derek W Morris; Markus M Nöthen; Colm O'Dushlaine; Ann Olincy; Line Olsen; Carlos N Pato; Michele T Pato; Benjamin S Pickard; Danielle Posthuma; Henrik B Rasmussen; Marcella Rietschel; Dan Rujescu; Thomas G Schulze; Jeremy M Silverman; Srinivasa Thirumalai; Thomas Werge; Ingrid Agartz; Farooq Amin; Maria H Azevedo; Nicholas Bass; Donald W Black; Douglas H R Blackwood; Richard Bruggeman; Nancy G Buccola; Khalid Choudhury; Robert C Cloninger; Aiden Corvin; Nicholas Craddock; Mark J Daly; Susmita Datta; Gary J Donohoe; Jubao Duan; Frank Dudbridge; Ayman Fanous; Robert Freedman; Nelson B Freimer; Marion Friedl; Michael Gill; Hugh Gurling; Lieuwe De Haan; Marian L Hamshere; Annette M Hartmann; Peter A Holmans; René S Kahn; Matthew C Keller; Elaine Kenny; George K Kirov; Lydia Krabbendam; Robert Krasucki; Jacob Lawrence; Todd Lencz; Douglas F Levinson; Jeffrey A Lieberman; Dan-Yu Lin; Don H Linszen; Patrik K E Magnusson; Wolfgang Maier; Anil K Malhotra; Manuel Mattheisen; Morten Mattingsdal; Steven A McCarroll; Helena Medeiros; Ingrid Melle; Vihra Milanova; Inez Myin-Germeys; Benjamin M Neale; Roel A Ophoff; Michael J Owen; Jonathan Pimm; Shaun M Purcell; Vinay Puri; Digby J Quested; Lizzy Rossin; Douglas Ruderfer; Alan R Sanders; Jianxin Shi; Pamela Sklar; David St Clair; T Scott Stroup; Jim Van 
Os; Peter M Visscher; Durk Wiersma; Stanley Zammit; S Louis Bridges; Hyon K Choi; Marieke J H Coenen; Niek de Vries; Philippe Dieud; Jeffrey D Greenberg; Tom W J Huizinga; Leonid Padyukov; Katherine A Siminovitch; Paul P Tak; Jane Worthington; Philip L De Jager; Joshua C Denny; Peter K Gregersen; Lars Klareskog; Xavier Mariette; Robert M Plenge; Mart van Laar; Piet van Riel
Journal: Int J Epidemiol Date: 2015-10 Impact factor: 7.196

9. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

Authors: S Hong Lee; Stephan Ripke; Benjamin M Neale; Stephen V Faraone; Shaun M Purcell; Roy H Perlis; Bryan J Mowry; Anita Thapar; Michael E Goddard; John S Witte; Devin Absher; Ingrid Agartz; Huda Akil; Farooq Amin; Ole A Andreassen; Adebayo Anjorin; Richard Anney; Verneri Anttila; Dan E Arking; Philip Asherson; Maria H Azevedo; Lena Backlund; Judith A Badner; Anthony J Bailey; Tobias Banaschewski; Jack D Barchas; Michael R Barnes; Thomas B Barrett; Nicholas Bass; Agatino Battaglia; Michael Bauer; Mònica Bayés; Frank Bellivier; Sarah E Bergen; Wade Berrettini; Catalina Betancur; Thomas Bettecken; Joseph Biederman; Elisabeth B Binder; Donald W Black; Douglas H R Blackwood; Cinnamon S Bloss; Michael Boehnke; Dorret I Boomsma; Gerome Breen; René Breuer; Richard Bruggeman; Paul Cormican; Nancy G Buccola; Jan K Buitelaar; William E Bunney; Joseph D Buxbaum; William F Byerley; Enda M Byrne; Sian Caesar; Wiepke Cahn; Rita M Cantor; Miguel Casas; Aravinda Chakravarti; Kimberly Chambert; Khalid Choudhury; Sven Cichon; C Robert Cloninger; David A Collier; Edwin H Cook; Hilary Coon; Bru Cormand; Aiden Corvin; William H Coryell; David W Craig; Ian W Craig; Jennifer Crosbie; Michael L Cuccaro; David Curtis; Darina Czamara; Susmita Datta; Geraldine Dawson; Richard Day; Eco J De Geus; Franziska Degenhardt; Srdjan Djurovic; Gary J Donohoe; Alysa E Doyle; Jubao Duan; Frank Dudbridge; Eftichia Duketis; Richard P Ebstein; Howard J Edenberg; Josephine Elia; Sean Ennis; Bruno Etain; Ayman Fanous; Anne E Farmer; I Nicol Ferrier; Matthew Flickinger; Eric Fombonne; Tatiana Foroud; Josef Frank; Barbara Franke; Christine Fraser; Robert Freedman; Nelson B Freimer; Christine M Freitag; Marion Friedl; Louise Frisén; Louise Gallagher; Pablo V Gejman; Lyudmila Georgieva; Elliot S Gershon; Daniel H Geschwind; Ina Giegling; Michael Gill; Scott D Gordon; Katherine Gordon-Smith; Elaine K Green; Tiffany A Greenwood; Dorothy E Grice; Magdalena Gross; Detelina Grozeva; Weihua Guan; Hugh Gurling; 
Lieuwe De Haan; Jonathan L Haines; Hakon Hakonarson; Joachim Hallmayer; Steven P Hamilton; Marian L Hamshere; Thomas F Hansen; Annette M Hartmann; Martin Hautzinger; Andrew C Heath; Anjali K Henders; Stefan Herms; Ian B Hickie; Maria Hipolito; Susanne Hoefels; Peter A Holmans; Florian Holsboer; Witte J Hoogendijk; Jouke-Jan Hottenga; Christina M Hultman; Vanessa Hus; Andrés Ingason; Marcus Ising; Stéphane Jamain; Edward G Jones; Ian Jones; Lisa Jones; Jung-Ying Tzeng; Anna K Kähler; René S Kahn; Radhika Kandaswamy; Matthew C Keller; James L Kennedy; Elaine Kenny; Lindsey Kent; Yunjung Kim; George K Kirov; Sabine M Klauck; Lambertus Klei; James A Knowles; Martin A Kohli; Daniel L Koller; Bettina Konte; Ania Korszun; Lydia Krabbendam; Robert Krasucki; Jonna Kuntsi; Phoenix Kwan; Mikael Landén; Niklas Långström; Mark Lathrop; Jacob Lawrence; William B Lawson; Marion Leboyer; David H Ledbetter; Phil H Lee; Todd Lencz; Klaus-Peter Lesch; Douglas F Levinson; Cathryn M Lewis; Jun Li; Paul Lichtenstein; Jeffrey A Lieberman; Dan-Yu Lin; Don H Linszen; Chunyu Liu; Falk W Lohoff; Sandra K Loo; Catherine Lord; Jennifer K Lowe; Susanne Lucae; Donald J MacIntyre; Pamela A F Madden; Elena Maestrini; Patrik K E Magnusson; Pamela B Mahon; Wolfgang Maier; Anil K Malhotra; Shrikant M Mane; Christa L Martin; Nicholas G Martin; Manuel Mattheisen; Keith Matthews; Morten Mattingsdal; Steven A McCarroll; Kevin A McGhee; James J McGough; Patrick J McGrath; Peter McGuffin; Melvin G McInnis; Andrew McIntosh; Rebecca McKinney; Alan W McLean; Francis J McMahon; William M McMahon; Andrew McQuillin; Helena Medeiros; Sarah E Medland; Sandra Meier; Ingrid Melle; Fan Meng; Jobst Meyer; Christel M Middeldorp; Lefkos Middleton; Vihra Milanova; Ana Miranda; Anthony P Monaco; Grant W Montgomery; Jennifer L Moran; Daniel Moreno-De-Luca; Gunnar Morken; Derek W Morris; Eric M Morrow; Valentina Moskvina; Pierandrea Muglia; Thomas W Mühleisen; Walter J Muir; Bertram Müller-Myhsok; Michael Murtha; Richard M 
Myers; Inez Myin-Germeys; Michael C Neale; Stan F Nelson; Caroline M Nievergelt; Ivan Nikolov; Vishwajit Nimgaonkar; Willem A Nolen; Markus M Nöthen; John I Nurnberger; Evaristus A Nwulia; Dale R Nyholt; Colm O'Dushlaine; Robert D Oades; Ann Olincy; Guiomar Oliveira; Line Olsen; Roel A Ophoff; Urban Osby; Michael J Owen; Aarno Palotie; Jeremy R Parr; Andrew D Paterson; Carlos N Pato; Michele T Pato; Brenda W Penninx; Michele L Pergadia; Margaret A Pericak-Vance; Benjamin S Pickard; Jonathan Pimm; Joseph Piven; Danielle Posthuma; James B Potash; Fritz Poustka; Peter Propping; Vinay Puri; Digby J Quested; Emma M Quinn; Josep Antoni Ramos-Quiroga; Henrik B Rasmussen; Soumya Raychaudhuri; Karola Rehnström; Andreas Reif; Marta Ribasés; John P Rice; Marcella Rietschel; Kathryn Roeder; Herbert Roeyers; Lizzy Rossin; Aribert Rothenberger; Guy Rouleau; Douglas Ruderfer; Dan Rujescu; Alan R Sanders; Stephan J Sanders; Susan L Santangelo; Joseph A Sergeant; Russell Schachar; Martin Schalling; Alan F Schatzberg; William A Scheftner; Gerard D Schellenberg; Stephen W Scherer; Nicholas J Schork; Thomas G Schulze; Johannes Schumacher; Markus Schwarz; Edward Scolnick; Laura J Scott; Jianxin Shi; Paul D Shilling; Stanley I Shyn; Jeremy M Silverman; Susan L Slager; Susan L Smalley; Johannes H Smit; Erin N Smith; Edmund J S Sonuga-Barke; David St Clair; Matthew State; Michael Steffens; Hans-Christoph Steinhausen; John S Strauss; Jana Strohmaier; T Scott Stroup; James S Sutcliffe; Peter Szatmari; Szabocls Szelinger; Srinivasa Thirumalai; Robert C Thompson; Alexandre A Todorov; Federica Tozzi; Jens Treutlein; Manfred Uhr; Edwin J C G van den Oord; Gerard Van Grootheest; Jim Van Os; Astrid M Vicente; Veronica J Vieland; John B Vincent; Peter M Visscher; Christopher A Walsh; Thomas H Wassink; Stanley J Watson; Myrna M Weissman; Thomas Werge; Thomas F Wienker; Ellen M Wijsman; Gonneke Willemsen; Nigel Williams; A Jeremy Willsey; Stephanie H Witt; Wei Xu; Allan H Young; Timothy W Yu; 
Stanley Zammit; Peter P Zandi; Peng Zhang; Frans G Zitman; Sebastian Zöllner; Bernie Devlin; John R Kelsoe; Pamela Sklar; Mark J Daly; Michael C O'Donovan; Nicholas Craddock; Patrick F Sullivan; Jordan W Smoller; Kenneth S Kendler; Naomi R Wray
Journal: Nat Genet Date: 2013-08-11 Impact factor: 38.330

10. Biological insights from 108 schizophrenia-associated genetic loci.

Authors:
Journal: Nature Date: 2014-07-22 Impact factor: 49.962
