Literature DB >> 30289880

Heritability informed power optimization (HIPO) leads to enhanced detection of genetic associations across multiple traits.

Abstract

Genome-wide association studies have shown that pleiotropy is a common phenomenon that can potentially be exploited for enhanced detection of susceptibility loci. We propose heritability informed power optimization (HIPO) for conducting powerful pleiotropic analysis using summary-level association statistics. We find optimal linear combinations of association coefficients across traits that are expected to maximize non-centrality parameter for the underlying test statistics, taking into account estimates of heritability, sample size variations and overlaps across the traits. Simulation studies show that the proposed method has correct type I error, robust to population stratification and leads to desired genome-wide enrichment of association signals. Application of the proposed method to publicly available data for three groups of genetically related traits, lipids (N = 188,577), psychiatric diseases (Ncase = 33,332, Ncontrol = 27,888) and social science traits (N ranging between 161,460 to 298,420 across individual traits) increased the number of genome-wide significant loci by 12%, 200% and 50%, respectively, compared to those found by analysis of individual traits. Evidence of replication is present for many of these loci in subsequent larger studies for individual traits. HIPO can potentially be extended to high-dimensional phenotypes as a way of dimension reduction to maximize power for subsequent genetic association testing.

Entities: CellLine Chemical Disease Gene Mutation Species

Mesh：

Year: 2018 PMID： 30289880 PMCID： PMC6192650 DOI： 10.1371/journal.pgen.1007549

Source DB: PubMed Journal: PLoS Genet ISSN： 1553-7390 Impact factor: 5.917

Introduction

Genome-wide association studies of increasingly large sample sizes are continuing to inform genetic basis of complex diseases. These studies have now led to identification of scores of susceptibility SNPs underlying a vast variety of individual complex traits and diseases [1-3]. Moreover, analyses of heritability and effect-size distributions have shown that each trait is likely to be associated with thousands to tens of thousands of additional susceptibility variants, each of which individually has very small effects, but in combination they can explain substantial fraction of trait variation [4-15]. GWAS of increasing sample sizes as well as re-analysis of current studies with powerful statistical methods are expected to lead to identification of many of these additional variants. An approach to increase the power of existing GWAS is to borrow strength across related traits. Comparisons of GWAS discoveries across traits have clearly shown that pleiotropy is a common phenomenon [3, 14, 16–19]. Aggregated analysis of multiple related traits have led to identification of novel SNPs that could not be detected through analysis of individual traits alone [20-23]. Further, analysis of genetic correlation using genome-wide panel of SNPs have identified groups of traits that are likely to share many underlying genetic variants of small effects [10, 12, 14, 24, 25]. As summary-level association statistics from large GWAS are now increasingly accessible, there is a great opportunity to accelerate discoveries through novel cross-trait analysis of these datasets. A variety of methods have been developed in the past decade to increase power of GWAS analysis by combining information across multiple traits [26-37]. Many of these methods have focused on developing test-statistics that are likely to have optimal power for detecting an individual SNP under certain types of alternatives of its shared effects across multiple traits [26, 30, 31, 35, 38, 39]. These approaches do not borrow information across SNPs and may be inefficient for analysis of traits that are likely to have major overlap in their underlying genetic architecture. For the analysis of psychiatric diseases, for example, it has been shown that borrowing pleiotropic information across SNPs can be used to improve power of detection of individual SNP associations and genetic risk prediction [40, 41]. In this article, we propose a novel method for powerful aggregated association analysis for individual SNPs across groups of multiple, highly related, traits informed by genome-wide estimates of heritability and genetic covariance. We derive optimal test-statistics based on orthogonal linear combinations of association coefficients across traits–the directions are expected to maximize genome-wide averages of the underlying non-centrality parameters in a gradually decreasing order. We exploit recent developments in LD-score regression methodology [14, 42] for estimation of phenotypic and genotypic correlations for implementation of the method using only summary-level results from GWAS. Our method applies to traits measured on different samples with unknown overlap. We evaluate performance of the proposed method through extensive simulation studies using a novel scheme for directly generating summary-level association statistics for large GWAS for multiple traits with possibly overlapping samples. We use the proposed method to analyze summary-statistics available from consortia of GWAS of lipid traits [43], psychiatric diseases [20] and social science traits [44]. These applications empirically illustrate that HIPO components can be highly enriched with association signals and can identify novel and replicable associations that are not identifiable at comparable level of significance based on analysis of the individual traits.

Material and methods

Model and assumptions

Suppose that the summary level results are available for K traits. For a given SNP j, let and denote vectors of length K containing estimates of regression parameters and associated standard errors, respectively, for the K traits. Let M be the total number of SNPs under study. Throughout, we will assume both genotypes and phenotypes are standardized to have mean 0 and variance 1. Let N denote the sample size for GWAS for the k-th trait. For binary traits, N is the effective sample size . We assume N can vary across studies because traits may be measured on distinct, but potentially overlapping, samples (see Section C in S1 Appendix for discussion of the case where sample size varies across SNPs within the same study). We assume that summary-level statistics in GWAS are obtained based on one SNP at a time analysis and that follows a multivariate normal distribution: where = (β, …, β) is referred to as the “marginal” effect sizes, the coefficients that will be obtained by fitting single-SNP regression models across the individual traits in the underlying population. The variance-covariance matric which may include non-zero covariance terms when the studies have overlapping samples, will be estimated based on estimates of standard errors of the individual coefficients () and estimate of “phenotypic correlation” that could be obtained based on LD-score regression.

Power optimization

Power has a one-to-one correspondence with the non-centrality parameter (NCP, denoted by δ) of the underlying χ2-statistic. Therefore, we try to find the linear combination that maximizes the average NCP across SNPs (denoted by E[δ]), which is given by Here is the vector of weights for a HIPO component associated with individual traits; we drop the subscript j from and and use and for simple notations. The denominator is easy to simplify: , which does not depend on true value of . We derive an expression of the numerator based on commonly used random effect models that are used to characterize genetic variance-covariances. Let denote the vector of “joint” effect sizes associated with SNP j that could be obtained by simultaneous analysis of SNPs in multivariate models across the K individual traits. We assume that follows a multivariate normal distribution , where Σ is the genetic covariance matrix. It follows that , the vector of marginal regression coefficients, is also normally distributed with mean 0 and , where is the LD score. Here r is the correlation of genotypes between SNP j and j′. Thus, based on the above model, the numerator of (1) can be written as . Therefore, we have The matrix needs to take into account the sample size differences and overlaps across studies. When all the phenotypes are measured on the same set of people, is proportional to the phenotypic variance-covariance matrix and E[δ] reduces to maximizing the heritability of the combined traits (MaxH) [34]. Here = (y1, …, y) is the vector of phenotypes. But HIPO is more general and can be applied to traits measured on different samples with unknown overlap. The LD-score regression allows estimation of both Σ and based on underlying slope and intercept parameters, respectively, using GWAS summary-level statistics (Section A in S1 Appendix) [14, 42]. The first HIPO component 1 is given by solving the following optimization problem: Subsequent components are defined iteratively by solving a slightly different optimization problem The above procedure can be implemented by suitable eigen decomposition, resulting in a total of K HIPO components (Section B in S1 Appendix). We call the first HIPO component HIPO-D1, the second HIPO component HIPO-D2, and so on. Interestingly, it can be shown that the eigenvalues resulting from this procedure are the average NCP for χ2 association-statistics across SNPs along the HIPO directions (up to the same scale constant, Section B in S1 Appendix). Ideally, it is adequate to consider the top HIPO components if a few eigenvalues clearly dominate the others. For the kth HIPO component, the association for the SNP j is tested using Z-statistics in the form It is easy to see that HIPO z-statistics reduce to the inverse standard error weighted z-scores when all traits have the same heritability, have genetic correlation 1 and, there is no sample overlap across studies. Therefore, HIPO can also be viewed as an extension of standard single-trait meta-analysis. It can be expected from theory that the performance of HIPO, characterized by the increase of average NCP of HIPO components compared to individual traits, does not directly depend on the overlaps of causal SNPs across traits or the overlap of samples across studies. In fact, E[δ] only depends on the covariance matrices var() and . These two matrices can stay the same under different degrees of causal SNP overlap and sample overlap. A closely related method is MTAG, which uses the summary level data of multiple traits to estimate single trait effects, based on genetic and phenotypic correlation across traits [36]. MTAG is also based on linear combinations of summary statistics but the weights are different from those obtained by HIPO. More specifically, MTAG solves the moment equation which gives the solution Here var() = Ω, and is a vector equal to the k-th column of Ω and ω is a scalar equal to the k-th diagonal element of Ω. The matrix is the same for all SNPs if both genotypes and phenotypes are standardized to have mean 0 and variance 1. We will compare HIPO with MTAG in simulations and real data analysis.

Simulations

We use a novel simulation method that directly generates summary level data for GWAS of multiple traits preserving realistic genotypic and phenotypic correlation structures. We proposed the single-trait version of this approach in a recent study [15]. We propose to simulate GWAS estimate for marginal effects across K traits, denoted as , using a model of the form where two types of errors terms, and , are introduced to account for variability due to population stratification effects and estimation uncertainty, respectively. We assume the population stratification effects s follow independent and identically distributed (i.i.d.) multivariate normal across SNPs. We generate the estimation error terms following a multivariate normal distribution that takes into account both phenotypic correlation across traits and linkage disequilibrium across SNPs. Note that there is widespread correlation between error terms, which can exist across different SNPs within the same study or for the same SNP across different studies. Correlation can also exist between different SNPs in different studies in the presence of LD, phenotypic correlation and sample overlap. All the possibilities can be captured by simulating from where the covariance matrix is the Kronecker product of the LD coefficient matrix and where the (k, l) element involves sample sizes, the sample overlap N and the phenotypic covariance between the kth and lth trait (Section D in S1 Appendix). We assume that the sample size is the same for all the SNPs within the same study. We simulate by first randomly selecting ~12K causal SNPs out of a reference panel of ~1.2 million HapMap3 SNPs with MAF >5% in 1000 Genomes European population. This SNP list is downloaded from LD Hub [45]. For selected casual SNPs, we generate i.i.d. joint effect sizes from a multivariate normal distribution , where Σ is the genetic covariance matrix. For simplicity we assume all the traits have the same set of causal SNPs. We calculate the marginal effect sizes as the sum of the joint effect size of SNPs in neighborhood weighted by the LD coefficient, i.e. . The neighborhood is defined to be set of SNPs that are within 1MB distance and have r2 > 0.01 with respect to SNP j. For simulation of , we observe that in a GWAS study where the phenotypes have no association with any of the markers, the summary-level association statistics is expected to follow the same multivariate distribution as . We utilize individual level genotype data available from 489 European samples from the 1000 Genomes Project. For each of the 489 subjects, we simulate a vector of phenotype from a predetermined multivariate normal distribution without any reference to their genotypes. We then conducted standard one SNP at a time GWAS analysis for each trait to compute the association statistics for the 1.2 million SNPs. To mimic the incomplete sample overlap between traits, we can calculate ,… based on different subsamples of 1000 Genomes EUR, of size n1, …, n. Finally, to generate error terms according to sample size specification for our simulation studies, we use the adjustment We show in Section D in S1 Appendix that this has the desired distribution. We conduct simulation studies to validate HIPO-based association tests and investigate expected power gain under varying sample size and heritability. For simplicity, we first consider the scenarios where all traits are measured on the same set of subjects. To make the settings more realistic, we use two sets of genetic and phenotypic covariance matrices estimated from real data: Blood lipid traits: LDL, HDL, TG, TC (see next section for full name) Psychiatric diseases: ASD, BIP, SCZ (see next section for full name) We choose only 3 psychiatric diseases instead of all 5 involved in real data analysis to speed up computation. ASD, BIP and SCZ have high heritability and substantial genetic correlation, which should be helpful to illustrate the property of the method. We vary the value of scale factor to control heritability of the traits while preserving the genetic correlation structure. We also vary the sample size: N = 10K, 50K, 100K, 500K. The covariance matrix of is set to in the first and second settings, respectively. This choice of parameters leads to an average per SNP population stratification that is about 25% of the per SNP heritability when . For each setting we repeat the simulation 100 times. More extensive simulations are conducted to investigate the performance of HIPO under partial sample overlap across studies, and when different traits have different sets of causal SNPs (S1 Table). We also investigate the performance in higher dimensions by simulating 10 traits which are divided into two blocks of 5 traits, with higher correlation within blocks and lower correlation between blocks (S1 Table). In addition, we conduct simulations using UK Biobank data to study the type I error of HIPO under unbalanced case-control design (Section E.1 of S1 Appendix), as well as the relationship between the number of dominant HIPO components and underlying genetic mechanisms (Section E.2 in S1 Appendix).

Summary level data

We analyze publicly available GWAS summary-level results across three groups of traits measured on European ancestry samples using the proposed method. Global Lipids Genetics Consortium (GLGC) provides the GWAS results for levels of low-density lipoprotein (LDL) cholesterol, high-density lipoprotein (HDL) cholesterol, triglycerides (TG) and total cholesterol (TC) [43]. The data consist of 188,577 European-ancestry individuals with ~1.8 million SNPs after implementing the LD Hub quality control procedure (described at the end of this section). The Psychiatric Genomics Consortium (PGC) cross-disorder study analyzed data for 5 psychiatric disorders: autism spectrum disorder (ASD), attention deficit-hyperactivity disorder (ADHD), bipolar disorder (BIP), major depressive disorder (MDD) and schizophrenia (SCZ) [20, 46–49]. Two of the five traits involved trio data: ASD (4788 trio cases, 4788 trio pseudocontrols, 161 cases, 526 controls, equivalent to 4949 cases and 5314 controls) and ADHD (1947 trio cases, 1947 trio pseudocontrols, 840 cases, 688 controls, equivalent to 2787 cases and 2635 controls). The other three studies did not involve trios: BIP (6990 cases, 4820 controls), MDD (9227 cases, 7383 controls) and SCZ (9379 cases, 7736 controls). After applying the same QC procedure, we included ~1.05 million SNPs for HIPO analysis. The Social Science Genetic Association Consortium (SSGAC) provides summary statistics for depressive symptoms (DS, N = 161,460), neuroticism (NEU, N = 170,911) and subjective well-being (SWB, N = 298,420) [44]. The DS data are the meta-analysis results combining a study by the Psychiatric Genomics Consortium [48], the initial release of UK Biobank (UKB) [50] and the Resource for Genetic Epidemiology Research on Aging cohort (dbGap, phs000674.v1.p1). For neuroticism, the study pooled summary level data sets from UKB and Genetics Personality Consortium (GPC). The SWB data is the meta-analysis result from 59 cohorts [44]. All subjects are of European ancestry. We analyzed ~2.1 million SNPs after QC. For all three groups of traits, we use the GWAS parameter estimates and standard errors to compute the z-statistics and p-values without making post-meta-analysis correction of genomic control factors. We perform SNP filtering to all three groups of phenotypes based on LD Hub quality control guideline. Markers that meet the following conditions are removed: (1) with extremely large effect size (χ2 > 80) (to avoid the results to be unduly influenced by outliers) (2) within the major histocompatibility complex (MHC) region (26Mb~34Mb on chromosome 6) (3) MAF less than 5% in 1000 Genomes Project Phase 3 European samples (4) sample size less than 0.67 times the 90th percentile of the total sample size (to account for SNP missingness) (5) alleles do not match the 1000 Genomes alleles. We further remove SNPs that are missing for at least one trait. The summary statistics are supplied to LDSC software [14, 42, 45] to fit LD score regression. We defined a locus to be “novel” if it contains at least one SNP that reach genome-wide significance (p-value < 5×10−8) by the HIPO method and the lead SNP in the region is at least 0.5 Mb away and has r2 < 0.1 from all lead SNPs of genome-wide significance regions identified by individual trait analysis (Section F in S1 Appendix).

Results

Simulation results show that all HIPO components maintain the correct type I error rate with or without population stratification, consistently across different sample sizes and values of heritability (Table 1 and S2–S5 Tables), even in unbalanced case-control studies (S13 Table). One representative example is the case with covariance structure of blood lipids (Table 1). It can be seen that correct type I errors are maintained under three different significance levels: p<0.05, p<0.01 and p<0.001. Desired type I error rates are also maintained in simulation settings where causal SNPs across traits only overlapped partially and there are incomplete sample overlaps across studies (S3 Table). Similar patterns can be observed for the case with covariance structure of psychiatric diseases (S4 and S5 Tables). In the presence of population stratification, the degree of which is modest according to our simulation scheme, tests based on individual traits show somewhat inflated type I error under large sample size (e.g. 500K) (Table 1 and S4 Table).

Table 1

Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of blood lipids.

hmax2	p-value threshold	0.1	0.2	0.35	0.5	0.1	0.2	0.35	0.5
N	p-value threshold	0.1	0.2	0.35	0.5	0.1	0.2	0.35	0.5
HIPO-D1		Without population stratification				With population stratification
10K	p<0.05	0.051	0.051	0.05	0.05	0.051	0.051	0.05	0.05
	p<0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01
	p<0.001	0.0011	0.001	0.001	0.001	0.0011	0.001	0.001	0.001
50K	p<0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05	0.05
	p<0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01
	p<0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001
100K	p<0.05	0.05	0.05	0.051	0.05	0.05	0.05	0.05	0.051
	p<0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01
	p<0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001	0.001
500K	p<0.05	0.05	0.051	0.05	0.052	0.05	0.051	0.051	0.052
	p<0.01	0.01	0.01	0.01	0.011	0.01	0.01	0.01	0.011
	p<0.001	0.001	0.0011	0.001	0.0011	0.001	0.001	0.0011	0.0011
Most heritable trait		Without population stratification				With population stratification
10K	p<0.05	0.051	0.051	0.05	0.051	0.051	0.051	0.05	0.051
	p<0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01
	p<0.001	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011	0.001	0.0011
50K	p<0.05	0.05	0.05	0.051	0.051	0.051	0.051	0.051	0.051
	p<0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01
	p<0.001	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011
100K	p<0.05	0.051	0.051	0.051	0.051	0.052	0.051	0.051	0.051
	p<0.01	0.01	0.01	0.01	0.01	0.011	0.011	0.011	0.011
	p<0.001	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011	0.0011
500K	p<0.05	0.05	0.051	0.051	0.051	0.055	0.055	0.055	0.055
	p<0.01	0.01	0.01	0.01	0.01	0.012	0.012	0.012	0.012
	p<0.001	0.0011	0.0011	0.0011	0.0011	0.0013	0.0013	0.0013	0.0013

is the largest heritability among the individual traits.

Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of blood lipids.

See S1 Table 1a and 1b for detailed settings. Summary-level association statistics are simulated for 4 traits using genetic and phenotypic covariance matrices estimated from Global Lipids Genetics Consortium (GLGC) data, with and without population stratification. The results for HIPO-D1 and the most heritable trait are listed. Reported are the average of genome-wide type I error rates across 100 simulations, under significance thresholds p<0.05, p<0.01 and p<0.001. is the largest heritability among the individual traits. Results also show that association analysis based on HIPO-D1 leads to substantial number of additional true discoveries compared to that based on the most heritable trait (S6 Table). In most settings, the value of average χ2-statistics are larger for HIPO-D1 than those for the individual traits and MTAG estimates (S7 and S8 Tables). After LD-pruning, HIPO components identify a substantial number of novel loci that are not discovered by individual trait analysis (Table 2, S9 and S10 Tables). For the case with covariance structure of blood lipids, the largest power increase can be observed when N = 50K and N = 100K, where HIPO can identify up to 206 new loci (Table 2). Similar patterns are followed in other settings (S9 and S10 Tables). Results also show that QQ plots for HIPO-D1 to be more enriched with signals than those for the most heritable trait (S1–S4 Figs).

Table 2

Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids.

hmax2		0.1	0.2	0.35	0.5	0.1	0.2	0.35	0.5
N		0.1	0.2	0.35	0.5	0.1	0.2	0.35	0.5
Same causal SNPs		Without population stratification				With population stratification
10K	Individual traits	0	1	1	3	0	0	1	3
	HIPO	0	0	1	2	0	0	1	2
	MTAG	0	1	1	4	0	0	1	4
	HIPO new	0	0	1	2	0	0	1	1
	MTAG new	0	0	1	2	0	0	1	2
50K	Individual traits	3	25	134	341	3	25	136	344
	HIPO	2	24	129	317	2	23	128	318
	MTAG	4	32	165	393	3	32	165	394
	HIPO new	2	12	52	98	1	12	51	101
	MTAG new	2	13	51	90	1	12	50	90
100K	Individual traits	25	193	722	1299	26	201	727	1314
	HIPO	23	185	679	1232	24	186	669	1229
	MTAG	32	233	799	1382	33	234	791	1381
	HIPO new	12	68	167	206	12	66	154	197
	MTAG new	12	65	139	157	13	62	130	148
500K	Individual traits	1303	2536	3173	3490	1352	2557	3186	3491
	HIPO	1240	2464	3132	3471	1204	2425	3119	3454
	MTAG	1394	2568	3175	3488	1358	2531	3164	3474
	HIPO new	202	144	81	53	171	120	70	46
	MTAG new	162	88	33	17	121	63	26	12
		Partial causal SNP overlap					Partial sample overlap
10K	Individual traits	0	0	1	3	0	0	1	1
	HIPO	0	0	1	3	0	0	0	1
	MTAG	0	0	1	4	0	0	1	1
	HIPO new	0	0	1	2	0	0	0	1
	MTAG new	0	0	1	2	0	0	0	1
50K	Individual traits	3	26	135	348	1	8	49	136
	HIPO	2	21	122	312	1	8	50	139
	MTAG	4	33	163	403	1	11	67	179
	HIPO new	2	13	60	126	1	5	26	61
	MTAG new	2	12	49	96	1	6	28	64
100K	Individual traits	26	196	730	1336	8	73	334	719
	HIPO	23	178	651	1219	8	73	332	715
	MTAG	33	239	802	1412	12	98	413	845
	HIPO new	14	84	195	256	5	36	122	199
	MTAG new	13	68	136	153	6	38	118	184
500K	Individual traits	1338	2601	3269	3588	725	1954	2799	3193
	HIPO	1225	2523	3243	3583	721	1938	2772	3181
	MTAG	1419	2632	3274	3589	854	2078	2846	3219
	HIPO new	254	198	116	75	194	231	143	94
	MTAG new	155	82	31	15	186	189	91	54

is the largest heritability among the individual traits.

Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids.

See 1a-1d in S1 Table for detailed settings. We report the average number of truly associated loci identified by all the individual traits/HIPO components/MTAG estimates across 100 simulations, under significance threshold p < 5 × 10−8 and LD pruning threshold r2 < 0.1 and different loci required to be >0.5Mb apart. is the largest heritability among the individual traits. We compare HIPO and MTAG in all our simulation settings. Both MTAG and HIPO find many novel loci compared to individual trait analysis and the set of novel loci identified by HIPO and MTAG tend to have some overlaps, but each method identifies some unique ones that is not discovered by the other. Although MTAG tends to find more loci than HIPO in total, HIPO tends to find more novel loci that are not detected by individual trait analysis and MTAG tends to replicate existing findings (Table 2, S9 and S10 Tables). This pattern is more obvious when the sample size is large (N ≥ 100K) and is not sensitive to the LD clumping criterion (S11 Table). HIPO tends to be more powerful in settings of higher dimensions, with the number of novel loci nearly twice of the number identified by MTAG in several cases (S10 Table).

Application to blood lipid data

We applied our method to the Global Lipids Genetics Consortium (GLGC) data [43]. The average NCP decreases from 0.213 for HIPO-D1 to 0.026 for HIPO-D4, with most association signals appears to be associated with the first and second components (S14 Table). This may be due to the fact that LDL and TC are highly correlated with each other and as are HDL and TG, and HIPO-D1 and HIPO-D2 each captures the signal of one pair of traits. HIPO-D1 is positively related to TG, negatively related to HDL and TC and depends weakly on LDL. HIPO-D2 depends mostly on TC. The last component HIPO-D4, which contains very little genetic association signals and hence can be used as a negative control, is positively correlated with TC and negatively with the other three traits. Note that all 4 traits have similar heritability and sample size (S14 Table), hence the difference in weights is likely driven by genetic correlations. The order of λ and average of empirical χ2 statistic also tracks with the average NCP (Fig 1), suggesting that the observed enrichments are likely due to polygenic effects. We identified twenty novel loci by HIPO-D1 and 4 by HIPO-D2 at genome-wide significant level (p < 5 × 10−8) (Table 3). The number of novel loci changed to 16 for HIPO-D1 and 4 for HIPO-D2 under a more stringent LD-pruning criterion: r2 < 0.1 and lead SNPs of different loci are >1Mb from each other. The pattern of p-values for individual traits show that the proposed method detects novel SNPs that contain moderate degree of association signals across multiple traits. There is very little overlap between new loci found by HIPO-D1 and by HIPO-D2, as expected from genetic orthogonality of the two components (S5 Fig). Among the 24 new loci found by HIPO-D1 and HIPO-D2, 9 are not found by any of the MTAG estimates. MTAG identified 10 novel loci that are not detected by HIPO-D1 or HIPO-D2 (S17 Table). The result is fairly consistent with our simulation studies which shows that MTAG and HIPO tend to find some overlapping and some unique novel loci.

Fig 1

QQ plots for individual traits and underlying HIPO components across blood lipids, psychiatric diseases, social science traits.

Table 3

Novel loci discovered at genome-wide significance level (p < 5 × 10−8) by the first and second HIPO components of blood lipid traits.

SNP	CHR	MBP	Nearest Gene (Distance)	p_LDL	p_HDL	p_TG	p_TC	p_HIPO-D1	p_HIPO-D2
HIPO-D1
rs4850047	2	3.6	RPS7(+6.244kb)	2.87e-03 (+)	1.15e-04 (+)	8.58e-04 (-)	2.14e-06 (+)	1.13e-09#	7.82e-04
rs2249105	2	65.3	CEP68(0)	8.72e-02 (+)	6.35e-06 (+)	1.89e-06 (-)	4.66e-01 (+)	1.33e-08	7.92e-01
rs2062432	3	123.1	ADCY5(0)	9.78e-01 (+)	3.88e-06 (-)	2.44e-03 (+)	6.75e-02 (-)	4.30e-08	5.52e-01
rs6855363	4	157.7	PDGFC(-12.22kb)	6.44e-01 (-)	3.20e-07 (+)	3.18e-04 (-)	5.89e-01 (+)	2.27e-08	5.44e-01
rs17199964	4	102.7	BANK1(-3.972kb)	3.11e-01 (-)	9.19e-08 (-)	1.27e-01 (+)	9.85e-04 (-)	3.03e-08	2.43e-02
rs10054063	5	173.4	CPEB4(+5.085kb)	7.00e-01 (-)	6.13e-04 (-)	7.72e-07 (+)	2.67e-01 (-)	3.69e-08	7.80e-01
rs11987974	8	36.8	KCNU1(+30.17kb)	7.66e-01 (+)	3.79e-06 (-)	1.85e-06 (+)	7.16e-01 (-)	3.91e-09#	3.32e-01
rs740746	10	115.8	ADRB1(-11.02kb)	4.32e-01 (-)	1.09e-06 (-)	1.61e-03 (+)	1.07e-01 (-)	4.21e-08	5.74e-01
rs10832027	11	13.4	ARNTL(0)	2.53e-02 (+)	1.52e-07 (+)	5.73e-07 (-)	1.87e-02 (+)	3.88e-12#	3.16e-01
rs7938117	11	68.6	CPT1A(0)	3.69e-01 (-)	2.05e-07 (-)	9.49e-06 (+)	4.01e-02 (-)	3.35e-11#	5.30e-01
rs1565228	11	27.6	BDNF-AS(0)	2.46e-01 (-)	3.50e-05 (-)	3.44e-06 (+)	7.19e-02 (-)	2.56e-09#	6.44e-01
rs661171	11	110.0	ZC3H12C(0)	9.06e-03 (+)	9.84e-07 (+)	1.77e-01 (-)	2.70e-06 (+)	2.58e-08	2.23e-04
rs895953	12	122.2	SETD1B(0)	7.23e-01 (-)	1.45e-06 (+)	3.98e-07 (-)	5.43e-01 (+)	2.84e-10#	3.86e-01
rs2384034	12	113.2	RPH3A(-24.86kb)	9.47e-03 (+)	3.61e-07 (+)	3.18e-02 (-)	5.00e-05 (+)	4.39e-09#	2.51e-03
rs11048456	12	26.5	ITPR2(-25.2kb)	8.65e-02 (-)	2.93e-07 (-)	1.59e-03 (+)	5.74e-02 (-)	1.75e-08	3.28e-01
rs721772	15	41.8	RPAP1(0)	5.17e-01 (+)	2.26e-07 (-)	4.30e-05 (+)	7.10e-01 (-)	4.25e-09#	3.75e-01
rs11079810	17	46.2	SKAP1(0)	1.99e-02 (+)	1.71e-07 (+)	2.23e-04 (-)	9.51e-03 (+)	4.30e-10#	1.32e-01
rs4805755	19	32.9	ZNF507(0)	9.34e-01 (-)	5.58e-08 (+)	4.83e-03 (-)	9.85e-02 (+)	7.71e-09#	6.28e-01
rs10408163	19	47.6	ZC3H4(0)	1.47e-01 (-)	9.99e-07 (+)	3.20e-07 (-)	2.70e-01 (-)	8.87e-09	1.65e-02
rs6059932	20	33.2	PIGU(0)	1.30e-01 (+)	5.73e-07 (+)	6.33e-05 (-)	6.33e-02 (+)	1.27e-09#	4.76e-01
HIPO-D2:
rs4683438	3	142.7	LOC100507389(0)	5.70e-05 (-)	6.85e-01 (+)	3.77e-04 (-)	2.87e-07 (-)	5.41e-01	2.99e-08
rs176813	4	69.6	UGT2B15(+63.04kb)	2.62e-05 (+)	1.75e-01 (+)	4.23e-04 (+)	5.68e-08 (+)	5.62e-01	8.10e-09#
rs2268719	6	52.4	TRAM2(0)	7.52e-07 (-)	3.08e-01 (+)	4.14e-02 (-)	6.66e-08 (-)	9.39e-01	2.13e-08
rs7939352	11	78.0	GAB2(0)	2.79e-06 (-)	9.75e-01 (-)	5.47e-03 (-)	2.48e-07 (-)	9.57e-01	3.47e-08

Independent SNPs were identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart. Loci are considered novel if they are not identified at genome-wide significance level through analysis of individual traits. For each lead SNP, p-values for association are shown for HIPO components and for individual traits. The directions of association (+/-) are also shown for each of the individual traits. TG: triglycerides; TC: total cholesterol. HIPO-D1 and HIPO-D2: 1st and 2nd HIPO components. The weights for the first and second HIPO components are: .

# in the last two columns indicates that this locus passes the Bonferroni corrected significance threshold (11 loci by HIPO-D1 and 1 by HIPO-D2).

QQ plots for individual traits and underlying HIPO components across blood lipids, psychiatric diseases, social science traits.

Blood lipid traits include HDL, LDL, triglycerides (TG) and total cholesterol (TC). Psychiatric diseases include autism spectrum disorder (ASD), ADHD, bipolar disorder (BIP), major depressive disorder (MDD) and schizophrenia (SCZ). Meta-analysis QQ plot is also included for psychiatric diseases (in green). Social science traits include depressive symptoms (DS), neuroticism (NEU) and subjective well-being (SWB). Genomic control factors and average χ2 statistics are shown in the legend. Independent SNPs were identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart. Loci are considered novel if they are not identified at genome-wide significance level through analysis of individual traits. For each lead SNP, p-values for association are shown for HIPO components and for individual traits. The directions of association (+/-) are also shown for each of the individual traits. TG: triglycerides; TC: total cholesterol. HIPO-D1 and HIPO-D2: 1st and 2nd HIPO components. The weights for the first and second HIPO components are: . # in the last two columns indicates that this locus passes the Bonferroni corrected significance threshold (11 loci by HIPO-D1 and 1 by HIPO-D2).

Application to psychiatric diseases

Applications of HIPO to Psychiatric Genomics Consortium (PGC) cross-disorder data [20] show that most association signals are captured by HIPO-D1 (S15 Table), which has an average NCP twice larger than that of HIPO-D2. The first HIPO component puts the highest weights on BIP and SCZ, which have the largest heritability and relatively large sample sizes. It is noteworthy that for a few of the strongest signals, HIPO is outperformed by standard meta-analysis, which was implemented in PGC cross-disorder analysis as a way for detecting SNPs that may be associated with multiple traits. The QQ plot of HIPO-D1, however, dominates those for the individual traits and for the standard meta-analysis when p > 1 × 10−8 (Fig 1). This suggests that HIPO is superior to standard meta-analysis in detecting moderate effects, while sacrificing some efficiency for the top hits. The value of λ and average χ2 -statistics are higher for HIPO-D1 than those for individual traits and standard meta-analysis. HIPO-D1 discovers one new locus, marked by the lead SNP rs13072940 (p = 1.71 × 10−8), that is not identified by either the individual traits or the meta-analysis. The same loci is identified under LD-pruning criterion r2 < 0.1 and lead SNPs of different loci are >1Mb from each other. The marker rs13072940 shows association with bipolar disorder (p = 0.0026) and schizophrenia (p = 2.55 × 10−6) but no association with autism spectrum disorder (p = 0.97), ADHD (p = 0.70) or major depressive disorder (p = 0.11). The meta-analysis signal (p = 7.02 × 10−6) does not reach genome-wide significance and is, in fact, weaker compared to that from schizophrenia alone. This SNP shows stronger association in more recent larger studies of bipolar disorder [47] (p = 0.0003) and schizophrenia [51] (p = 1.32 × 10−7), clearly indicating that this is likely to be a true signal underlying multiple PGC traits. MTAG also identifies the same new locus (rs13072940) as HIPO-D1 under the same significance and LD pruning threshold.

Application to social science traits

Application of HIPO to Social Science Genetic Association Consortium studies reveals that most of the genetic variation is captured by HIPO-D1 that has an average NCP twice larger than that of HIPO-D2 (S16 Table). The component is negatively associated with DS and NEU and is positively associated with SWB. The tail region of QQ plot of HIPO-D1 lies close to that of neuroticism, but the values of λ and average χ2 are substantially larger for HIPO-D1 (Fig 1). HIPO-D1 identifies 12 new loci that are not discovered by individual trait analysis of SSGAC data (Table 4), increasing the total number of genome-wide significant loci from 24 to 36 (S7 Fig). The number of new loci changes to 11 under a more stringent LD pruning criterion: r2 < 0.1 and lead SNPs of different loci are >1Mb from each other. MTAG identifies 14 loci that are not identified by individual trait analysis, including the 12 loci found by HIPO-D1 (S18 Table).

Table 4

Novel loci discovered at genome-wide significance level (p < 5 × 10−8) by HIPO-D1 for social science traits.

SNP	CHR	MBP	Nearest Gene (Distance)	p_DS	p_NEU	p_SWB	p_HIPO−D1
HIPO-D1
rs2874367*	1	21.3	EIF4G3(0)	6.33e-05 (-)	6.33e-05 (-)	6.33e-05 (+)	1.38e-08
rs11100449*	4	141.0	MAML3(0)	1.47e-05 (-)	6.33e-05 (-)	6.33e-05 (+)	8.02e-09#
rs10475748*	5	164.6	NA	1.47e-05 (-)	5.73e-07 (-)	1.96e-02 (+)	1.90e-08
rs6919210	6	70.6	COL19A1(0)	1.15e-03 (-)	5.73e-07 (-)	1.77e-04 (+)	1.88e-09#
rs6569095	6	120.3	NA	8.58e-04 (+)	2.14e-05 (+)	1.47e-05 (-)	5.94e-09#
rs210899*	6	11.7	ADTRP(0)	1.10e-01 (+)	2.03e-06 (+)	6.33e-05 (-)	3.61e-08
rs2396726	7	114.0	FOXP2(0)	1.77e-04 (+)	3.06e-06 (+)	8.58e-04 (-)	1.04e-08#
rs12701427*	7	4.2	SDK1(0)	1.15e-03 (-)	1.47e-05 (-)	6.33e-05 (+)	1.28e-08
rs9584850*	13	99.1	FARP1(0)	6.33e-05 (-)	1.52e-07 (-)	8.58e-04 (+)	6.50e-10#
rs11644362	16	13.0	SHISA9(-1.379kb)	2.70e-03 (-)	2.46e-04 (-)	3.06e-06 (+)	3.73e-08
rs7239568	18	52.0	C18orf54(+56.37kb)	5.96e-03 (-)	2.14e-05 (-)	9.64e-08 (+)	8.17e-10#
rs1261093*	18	52.9	TCF4(0)	8.58e-04 (+)	9.64e-08 (+)	2.70e-03 (-)	3.92e-09#

Independent SNPs are identified through LD-pruning with threshold of 0.1 and pruned SNPs are assumed to represent independent loci if they are >0.5Mb apart. Loci are considered novel if they were not identified at genome-wide significance level through analysis individual traits. For each lead SNP, p-values for association are shown for HIPO components and for individual traits. The directions of association (+/-) are also shown for each of the individual traits. DS: depressive symptoms; NEU: neuroticism; SWB: subjective well-being. HIPO-D1: 1st HIPO component. Weights for HIPO-D1: . NA in the Nearest Gene column means there is no gene within 200kb of the SNP. SNPs marked by * indicates underlying loci show evidence of replication in the larger data set used in the MTAG paper (see Table 5).

# in the last column indicates that this locus passes Bonferroni corrected significance threshold (7 loci in total).

Table 5

Evidence of replication of novel loci identified by HIPO analysis for social science traits in subsequent larger studies of DS and SWB.

Lead SNP in Novel Loci	Proxy SNP Reported in MTAG Study	D’	Individual Trait p-value in SSGAC	Individual Trait p-value in MTAG Study
DS
rs11100449	rs1877075	0.78	2.00e-06	1.10e-06
rs10475748	rs10045971	0.99	4.51e-02	1.17e-09
rs12701427	rs4723416	0.91	1.59e-03	1.17e-06
rs9584850	rs4772087	1.00	2.42e-03	1.04e-06
rs1261093	rs11876620	0.82	1.58e-04	4.45e-08
SWB
rs2874367	rs12125335	1.00	NA	7.09e-08
rs11100449	rs769664	0.79	3.18e-03	4.59e-07
rs210899	rs10947543	0.93	NA	3.10e-08

Reported are P-values for proxy SNPs (D’ > 0.75) for individual trait associations in SSGAC data and the more recent MTAG study. Novel loci are identified through analysis of SSGAC which include studies of DS and SWB with sample sizes Neff = 161,460 and N = 298,420, respectively. The MTAG study includes an expanded set of sample with Neff = 354,862 and N = 388,538 for DS and SWB, respectively. DS: depressive symptoms; NEU: neuroticism; SWB: subjective well-being. NA indicates that the proxy SNP is not present in the SSGAC data.

# in the last column indicates that this locus passes Bonferroni corrected significance threshold (7 loci in total). We examined evidence of replication of the novel loci based on more recent and larger studies of DS and SWB that were incorporated in the MTAG analysis [36]. As this study reported only a list of top SNPs (p < 1 × 10−5) after stringent LD-pruning (r2 < 0.1), we could not look up the exact lead SNPs that we report for the novel regions (Table 4). Instead, we searched for SNPs in the top list reported by MTAG study that could be considered proxy (D’>0.75) for our lead SNPs. We found 7 of the 12 novel have such proxies and these proxy SNPs show stronger level of association in the more recent MTAG study for at least one of DS and SWB (Table 5). Reported are P-values for proxy SNPs (D’ > 0.75) for individual trait associations in SSGAC data and the more recent MTAG study. Novel loci are identified through analysis of SSGAC which include studies of DS and SWB with sample sizes Neff = 161,460 and N = 298,420, respectively. The MTAG study includes an expanded set of sample with Neff = 354,862 and N = 388,538 for DS and SWB, respectively. DS: depressive symptoms; NEU: neuroticism; SWB: subjective well-being. NA indicates that the proxy SNP is not present in the SSGAC data.

Discussion

In this report, we present a novel method for powerful pleiotropic analysis using summary level data across multiple traits, accounting for both heritability and sample size variations. Application of the proposed method to three groups of genetically related trait identifies a variety of novel and replicable loci that were not detectable by analysis of individual traits at comparable level of confidence. We also conduct extensive simulation studies in realistic settings of large GWAS to demonstrate the ability of the method to maintain type-I error, achieve robustness to population stratification and enhance detection of novel loci. The novel method we introduce for directly simulating summary-level GWAS statistics, preserving expected correlation structure across both traits and SNP markers, will allow rapid evaluation of alternative methods for pleiotropic analysis in settings of large complex GWAS more feasible in the future. Application of the proposed method provides new insight into the genetic architecture of groups of related traits. For blood lipids, which have similar sample sizes, the average NCPs for HIPO-D1 and HIPO-D2 dominate the other two, suggesting that there are perhaps two unrelated mechanisms through which most genetic markers are associated with the individual cholesterol traits. For psychiatric diseases and social science traits, the top HIPO component dominates the others, indicating that there is perhaps one major genetic mechanism underlying each group of traits. These conjectures are supported by a simple simulation (Section E.2, S1 Appendix). However, given that top HIPO component down weights traits with smaller sample sizes, it is possible that there exist other independent genetic mechanisms related to these traits that could not be captured by the top HIPO component. Nevertheless, HIPO, by taking into account both heritability and sample sizes, provides a clear guideline how many independent sets of tests should be performed across the different traits to capture most of the genetic signals. Throughout the paper we report results based on significance threshold p < 5 × 10−8. We do not recommend adjusting the significance threshold to account for multiple comparison, since HIPO components are not independent of individual-trait tests. Similar issues will also arise about MTAG or other pleiotropic methods. Investigation of setting up proper threshold is beyond the scope of the current paper. However, even if we apply Bonferroni correction, HIPO is still able to find a substantial number of new loci. For blood lipids, since there are 4 traits and 2 HIPO components under consideration, the Bonferroni adjusted threshold is . HIPO-D1 still discovers 11 new loci under the new threshold and HIPO-D2 still discovers 1 (Table 3). For social science traits, since there are 3 traits and 1 HIPO component under consideration, the Bonferroni-corrected threshold is . HIPO-D1 still discovers 7 new loci under the new threshold (Table 4). Earlier studies have proposed methods for association analysis in GWAS informed by heritability analysis. For analysis of multivariate traits observed on the same set of individuals, the MaxH [34] method was proposed to conduct association analysis along directions that maximizes trait heritability. HIPO allows a generalization of this approach by taking into account sample size differences and overlaps across studies allowing powerful cross-disorder analysis using only summary-level data across distinct studies. Another closely related method is MTAG [36], which also utilizes summary level data and LD score regression to estimate genotypic and phenotypic variance-covariance matrices. MTAG, however, performs association tests for each individual trait by improving estimation of the underlying association coefficients using cross-trait variance-covariance structure. In contrast, we propose finding optimal linear combination of association coefficients across traits that will maximize the power for detecting underlying common signals. The advantage of MTAG is that it does associate the SNPs to individual traits and thus has appealing interpretation. However, strictly speaking, MTAG, similar to HIPO, is only a valid method for testing the global null hypothesis of no association of a SNP across any of the traits and may identify a SNP to be associated with a null trait while in truth it is only related to another trait in the same group. The advantage of HIPO is that it directly focuses on optimization of power in orthogonal directions for cross-disorder analysis and can provide significant dimension reduction for analysis of higher dimensional traits. Simulation studies as well as analysis of real data shows that both HIPO and MTAG identify substantial number of novel loci compared to analysis of individual traits (Table 2, S9, S10, S17 and S18 Tables). The sets of novel loci identified by the two methods tend to be substantially non-overlapping indicating that it may be useful to implement both methods for cross-trait analysis. Simulations also show that when the number of traits become larger, HIPO tends to find substantially more novel SNPs than MTAG (S10 Table). There exists a variety of methods for pleiotropic analysis [30, 31, 35, 38, 39] that aim to optimize power for testing associations with respect to individual SNPs without being informed by heritability. The method ASSET [39], for example, searches through different subsets of traits to find the optimal subset that yields the strongest meta-analysis z statistic for each individual SNP. Methods like HIPO and MTAG, which use estimates of heritability based on genome-wide set of markers, are likely to be more powerful when the underlying traits have strong genetic correlation, such as that observed for psychiatric disorders. In contrast, methods such as ASSET may be more powerful for analysis of groups of traits that have more moderate genetic correlation, such as cancers of different sites [13], for detection of loci with unique but insightful pleiotropic patterns of association. There is potential to develop intermediate methods, which borrows information across SNPs but in a more localized manner, for example, based on functional annotation information [52, 53]. Further research is also merited for implementation of HIPO for very high-dimensional pleiotropic analysis and rare variant association studies, two settings in both of which there could be challenges associated with dealing with noises associated with estimation of genetic variance-covariance matrices. In conclusion, HIPO provides a novel and powerful method for joint association analysis across multiple traits using summary-level statistics. Application of the method to multiple datasets shows that it provides unique insight into genetic architecture of groups of related traits and can identify substantial number of novel loci compared to analysis of individual traits. Further extension of the method is merited for facilitating more interpretable and parsimonious association analysis across groups of high-dimensional correlated traits. Our R package is available at https://github.com/gqi/hipo.

Summary of simulation settings.

(PDF) Click here for additional data file.

Type I error rates for HIPO-D2 to HIPO-D4 observed in datasets simulated under covariance structure estimated from blood lipids traits.

See S1 Table 1a and 1b for detailed settings and see Table 1 for type I errors of HIPO-D1. (PDF) Click here for additional data file.

Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of blood lipids traits allowing for partial overlap of causal SNPs and sample set across traits.

See S1 Table 1c and 1d for detailed settings. (PDF) Click here for additional data file.

Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of psychiatric diseases.

See S1 Table 2a and 2b for detailed settings. (PDF) Click here for additional data file.

Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of psychiatric diseases allowing for partial overlap of causal SNPs and sample set across traits.

See S1 Table 2c and 2d for detailed settings. (PDF) Click here for additional data file.

Percentage increase in average number of true discoveries by HIPO compared to the analysis of most heritable trait in simulation studies.

(PDF) Click here for additional data file.

Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of blood lipids.

(PDF) Click here for additional data file.

Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of psychiatric diseases.

(PDF) Click here for additional data file.

Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from psychiatric diseases.

See S1 Table 2a-2d for detailed settings. (PDF) Click here for additional data file.

Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis from simulation studies of 10 correlated traits.

See S1 Table part 3 for detailed settings. (PDF) Click here for additional data file.

Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids under alternative LD-clumping threshold.

See 1a-1d in S1 Table for detailed settings. (PDF) Click here for additional data file.

Type I error rates for MTAG observed in datasets simulated under covariance structure estimated from studies of blood lipids with population stratification.

(PDF) Click here for additional data file.

Type I error rates for HIPO observed in simulated unbalanced case-control datasets.

(PDF) Click here for additional data file.

Weights associated with individual blood lipid traits and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression. (PDF) Click here for additional data file.

Weights associated with individual psychiatric diseases and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression. (PDF) Click here for additional data file.

Weights associated with individual social science traits and average non-centrality parameters for each HIPO component.

Numbers in the parentheses are the heritability estimated using LD score regression. (PDF) Click here for additional data file.

Novel loci for blood lipids identified by HIPO and MTAG.

Only HIPO-D1 and HIPO-D2 are considered. (PDF) Click here for additional data file.

Novel loci for social science traits identified by HIPO and MTAG.

Only HIPO-D1 is considered. (PDF) Click here for additional data file.

Details of mathematical derivations.

(PDF) Click here for additional data file.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITHOUT population stratification effects.

(PDF) Click here for additional data file.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITH population stratification effects.

(PDF) Click here for additional data file.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITHOUT population stratification effects.

(PDF) Click here for additional data file.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITH population stratification effects.

(PDF) Click here for additional data file.

Venn diagram for the number of loci discovered to be associated with blood lipid traits by individual trait and HIPO analysis.

Independent SNPs are identified through LD-pruning with r2 threshold of 0.1 and pruned SNPs were assumed to represent independent loci if they are >0.5Mb apart. (PDF) Click here for additional data file.

Venn diagram for the number of associated loci identified by individual trait analysis of psychiatric diseases, HIPO-D1 and meta-analysis.

Venn diagram for the number of associated loci identified by individual trait analysis of social science traits and HIPO-D1.

50 in total

1. A subset-based approach improves power and interpretation for the combined analysis of genetic association studies of heterogeneous traits.

Authors: Samsiddhi Bhattacharjee; Preetha Rajaraman; Kevin B Jacobs; William A Wheeler; Beatrice S Melin; Patricia Hartge; Meredith Yeager; Charles C Chung; Stephen J Chanock; Nilanjan Chatterjee
Journal: Am J Hum Genet Date: 2012-05-04 Impact factor: 11.025

2. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

3. Evaluating the heritability explained by known susceptibility variants: a survey of ten complex diseases.

Authors: Hon-Cheong So; Allen H S Gui; Stacey S Cherny; Pak C Sham
Journal: Genet Epidemiol Date: 2011-03-03 Impact factor: 2.135

4. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics.

Authors: Qiongshi Lu; Boyang Li; Derek Ou; Margret Erlendsdottir; Ryan L Powles; Tony Jiang; Yiming Hu; David Chang; Chentian Jin; Wei Dai; Qidu He; Zefeng Liu; Shubhabrata Mukherjee; Paul K Crane; Hongyu Zhao
Journal: Am J Hum Genet Date: 2017-12-07 Impact factor: 11.025

5. Common SNPs explain a large proportion of the heritability for human height.

Authors: Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2010-06-20 Impact factor: 38.330

6. Large-scale genome-wide association analysis of bipolar disorder identifies a new susceptibility locus near ODZ4.

Authors:
Journal: Nat Genet Date: 2011-09-18 Impact factor: 38.330

7. Genome-wide association study identifies five new schizophrenia loci.

Authors:
Journal: Nat Genet Date: 2011-09-18 Impact factor: 38.330

8. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors: Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal: PLoS Med Date: 2015-03-31 Impact factor: 11.069

9. Genetic relationship between five psychiatric disorders estimated from genome-wide SNPs.

Authors: S Hong Lee; Stephan Ripke; Benjamin M Neale; Stephen V Faraone; Shaun M Purcell; Roy H Perlis; Bryan J Mowry; Anita Thapar; Michael E Goddard; John S Witte; Devin Absher; Ingrid Agartz; Huda Akil; Farooq Amin; Ole A Andreassen; Adebayo Anjorin; Richard Anney; Verneri Anttila; Dan E Arking; Philip Asherson; Maria H Azevedo; Lena Backlund; Judith A Badner; Anthony J Bailey; Tobias Banaschewski; Jack D Barchas; Michael R Barnes; Thomas B Barrett; Nicholas Bass; Agatino Battaglia; Michael Bauer; Mònica Bayés; Frank Bellivier; Sarah E Bergen; Wade Berrettini; Catalina Betancur; Thomas Bettecken; Joseph Biederman; Elisabeth B Binder; Donald W Black; Douglas H R Blackwood; Cinnamon S Bloss; Michael Boehnke; Dorret I Boomsma; Gerome Breen; René Breuer; Richard Bruggeman; Paul Cormican; Nancy G Buccola; Jan K Buitelaar; William E Bunney; Joseph D Buxbaum; William F Byerley; Enda M Byrne; Sian Caesar; Wiepke Cahn; Rita M Cantor; Miguel Casas; Aravinda Chakravarti; Kimberly Chambert; Khalid Choudhury; Sven Cichon; C Robert Cloninger; David A Collier; Edwin H Cook; Hilary Coon; Bru Cormand; Aiden Corvin; William H Coryell; David W Craig; Ian W Craig; Jennifer Crosbie; Michael L Cuccaro; David Curtis; Darina Czamara; Susmita Datta; Geraldine Dawson; Richard Day; Eco J De Geus; Franziska Degenhardt; Srdjan Djurovic; Gary J Donohoe; Alysa E Doyle; Jubao Duan; Frank Dudbridge; Eftichia Duketis; Richard P Ebstein; Howard J Edenberg; Josephine Elia; Sean Ennis; Bruno Etain; Ayman Fanous; Anne E Farmer; I Nicol Ferrier; Matthew Flickinger; Eric Fombonne; Tatiana Foroud; Josef Frank; Barbara Franke; Christine Fraser; Robert Freedman; Nelson B Freimer; Christine M Freitag; Marion Friedl; Louise Frisén; Louise Gallagher; Pablo V Gejman; Lyudmila Georgieva; Elliot S Gershon; Daniel H Geschwind; Ina Giegling; Michael Gill; Scott D Gordon; Katherine Gordon-Smith; Elaine K Green; Tiffany A Greenwood; Dorothy E Grice; Magdalena Gross; Detelina Grozeva; Weihua Guan; Hugh Gurling; Lieuwe De Haan; Jonathan L Haines; Hakon Hakonarson; Joachim Hallmayer; Steven P Hamilton; Marian L Hamshere; Thomas F Hansen; Annette M Hartmann; Martin Hautzinger; Andrew C Heath; Anjali K Henders; Stefan Herms; Ian B Hickie; Maria Hipolito; Susanne Hoefels; Peter A Holmans; Florian Holsboer; Witte J Hoogendijk; Jouke-Jan Hottenga; Christina M Hultman; Vanessa Hus; Andrés Ingason; Marcus Ising; Stéphane Jamain; Edward G Jones; Ian Jones; Lisa Jones; Jung-Ying Tzeng; Anna K Kähler; René S Kahn; Radhika Kandaswamy; Matthew C Keller; James L Kennedy; Elaine Kenny; Lindsey Kent; Yunjung Kim; George K Kirov; Sabine M Klauck; Lambertus Klei; James A Knowles; Martin A Kohli; Daniel L Koller; Bettina Konte; Ania Korszun; Lydia Krabbendam; Robert Krasucki; Jonna Kuntsi; Phoenix Kwan; Mikael Landén; Niklas Långström; Mark Lathrop; Jacob Lawrence; William B Lawson; Marion Leboyer; David H Ledbetter; Phil H Lee; Todd Lencz; Klaus-Peter Lesch; Douglas F Levinson; Cathryn M Lewis; Jun Li; Paul Lichtenstein; Jeffrey A Lieberman; Dan-Yu Lin; Don H Linszen; Chunyu Liu; Falk W Lohoff; Sandra K Loo; Catherine Lord; Jennifer K Lowe; Susanne Lucae; Donald J MacIntyre; Pamela A F Madden; Elena Maestrini; Patrik K E Magnusson; Pamela B Mahon; Wolfgang Maier; Anil K Malhotra; Shrikant M Mane; Christa L Martin; Nicholas G Martin; Manuel Mattheisen; Keith Matthews; Morten Mattingsdal; Steven A McCarroll; Kevin A McGhee; James J McGough; Patrick J McGrath; Peter McGuffin; Melvin G McInnis; Andrew McIntosh; Rebecca McKinney; Alan W McLean; Francis J McMahon; William M McMahon; Andrew McQuillin; Helena Medeiros; Sarah E Medland; Sandra Meier; Ingrid Melle; Fan Meng; Jobst Meyer; Christel M Middeldorp; Lefkos Middleton; Vihra Milanova; Ana Miranda; Anthony P Monaco; Grant W Montgomery; Jennifer L Moran; Daniel Moreno-De-Luca; Gunnar Morken; Derek W Morris; Eric M Morrow; Valentina Moskvina; Pierandrea Muglia; Thomas W Mühleisen; Walter J Muir; Bertram Müller-Myhsok; Michael Murtha; Richard M Myers; Inez Myin-Germeys; Michael C Neale; Stan F Nelson; Caroline M Nievergelt; Ivan Nikolov; Vishwajit Nimgaonkar; Willem A Nolen; Markus M Nöthen; John I Nurnberger; Evaristus A Nwulia; Dale R Nyholt; Colm O'Dushlaine; Robert D Oades; Ann Olincy; Guiomar Oliveira; Line Olsen; Roel A Ophoff; Urban Osby; Michael J Owen; Aarno Palotie; Jeremy R Parr; Andrew D Paterson; Carlos N Pato; Michele T Pato; Brenda W Penninx; Michele L Pergadia; Margaret A Pericak-Vance; Benjamin S Pickard; Jonathan Pimm; Joseph Piven; Danielle Posthuma; James B Potash; Fritz Poustka; Peter Propping; Vinay Puri; Digby J Quested; Emma M Quinn; Josep Antoni Ramos-Quiroga; Henrik B Rasmussen; Soumya Raychaudhuri; Karola Rehnström; Andreas Reif; Marta Ribasés; John P Rice; Marcella Rietschel; Kathryn Roeder; Herbert Roeyers; Lizzy Rossin; Aribert Rothenberger; Guy Rouleau; Douglas Ruderfer; Dan Rujescu; Alan R Sanders; Stephan J Sanders; Susan L Santangelo; Joseph A Sergeant; Russell Schachar; Martin Schalling; Alan F Schatzberg; William A Scheftner; Gerard D Schellenberg; Stephen W Scherer; Nicholas J Schork; Thomas G Schulze; Johannes Schumacher; Markus Schwarz; Edward Scolnick; Laura J Scott; Jianxin Shi; Paul D Shilling; Stanley I Shyn; Jeremy M Silverman; Susan L Slager; Susan L Smalley; Johannes H Smit; Erin N Smith; Edmund J S Sonuga-Barke; David St Clair; Matthew State; Michael Steffens; Hans-Christoph Steinhausen; John S Strauss; Jana Strohmaier; T Scott Stroup; James S Sutcliffe; Peter Szatmari; Szabocls Szelinger; Srinivasa Thirumalai; Robert C Thompson; Alexandre A Todorov; Federica Tozzi; Jens Treutlein; Manfred Uhr; Edwin J C G van den Oord; Gerard Van Grootheest; Jim Van Os; Astrid M Vicente; Veronica J Vieland; John B Vincent; Peter M Visscher; Christopher A Walsh; Thomas H Wassink; Stanley J Watson; Myrna M Weissman; Thomas Werge; Thomas F Wienker; Ellen M Wijsman; Gonneke Willemsen; Nigel Williams; A Jeremy Willsey; Stephanie H Witt; Wei Xu; Allan H Young; Timothy W Yu; Stanley Zammit; Peter P Zandi; Peng Zhang; Frans G Zitman; Sebastian Zöllner; Bernie Devlin; John R Kelsoe; Pamela Sklar; Mark J Daly; Michael C O'Donovan; Nicholas Craddock; Patrick F Sullivan; Jordan W Smoller; Kenneth S Kendler; Naomi R Wray
Journal: Nat Genet Date: 2013-08-11 Impact factor: 38.330

10. MultiPhen: joint model of multiple phenotypes can increase discovery in GWAS.

Authors: Paul F O'Reilly; Clive J Hoggart; Yotsawat Pomyen; Federico C F Calboli; Paul Elliott; Marjo-Riitta Jarvelin; Lachlan J M Coin
Journal: PLoS One Date: 2012-05-02 Impact factor: 3.240

13 in total

1. JASS: command line and web interface for the joint analysis of GWAS results.

Authors: Hanna Julienne; Pierre Lechat; Vincent Guillemot; Carla Lasry; Chunzi Yao; Robinson Araud; Vincent Laville; Bjarni Vilhjalmsson; Hervé Ménager; Hugues Aschard
Journal: NAR Genom Bioinform Date: 2020-01-24

2. Effect of non-normality and low count variants on cross-phenotype association tests in GWAS.

Authors: Debashree Ray; Nilanjan Chatterjee
Journal: Eur J Hum Genet Date: 2019-10-03 Impact factor: 4.246

3. Multi-trait Genome-Wide Analyses of the Brain Imaging Phenotypes in UK Biobank.

Authors: Chong Wu
Journal: Genetics Date: 2020-06-15 Impact factor: 4.562

4. A penalized regression framework for building polygenic risk models based on summary statistics from genome-wide association studies and incorporating external information.

Authors: Ting-Huei Chen; Nilanjan Chatterjee; Maria Teresa Landi; Jianxin Shi
Journal: J Am Stat Assoc Date: 2020-10-12 Impact factor: 5.033

9. Alterations observed in the interferon α and β signaling pathway in MDD patients are marginally influenced by cis-acting alleles.

Authors: Chiara Magri; Edoardo Giacopuzzi; Chiara Sacco; Luisella Bocchio-Chiavetto; Alessandra Minelli; Massimo Gennarelli
Journal: Sci Rep Date: 2021-01-12 Impact factor: 4.379

10. Identifying loci with different allele frequencies among cases of eight psychiatric disorders using CC-GWAS.

Authors: Wouter J Peyrot; Alkes L Price
Journal: Nat Genet Date: 2021-03-08 Impact factor: 38.330

Introduction

Material and methods

Model and assumptions

Power optimization

Simulations

Summary level data

Results

Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of blood lipids.

Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids.

Application to blood lipid data

QQ plots for individual traits and underlying HIPO components across blood lipids, psychiatric diseases, social science traits.

Application to psychiatric diseases

Application to social science traits

Discussion

Summary of simulation settings.

Type I error rates for HIPO-D2 to HIPO-D4 observed in datasets simulated under covariance structure estimated from blood lipids traits.

Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of blood lipids traits allowing for partial overlap of causal SNPs and sample set across traits.

Type I error rates for HIPO observed in datasets simulated under covariance structure estimated from studies of psychiatric diseases.

Type I error rates for HIPO observed in datasets simulated under covariance structure modified from estimates of psychiatric diseases allowing for partial overlap of causal SNPs and sample set across traits.

Percentage increase in average number of true discoveries by HIPO compared to the analysis of most heritable trait in simulation studies.

Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of blood lipids.

Average χ2 for HIPO-D1 compared to those for individual traits and MTAG observed in simulation studies based on covariance structure of psychiatric diseases.

Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from psychiatric diseases.

Number of truly associated loci discovered by HIPO, MTAG and individual trait analysis from simulation studies of 10 correlated traits.

Number of truly associated independent loci discovered by HIPO, MTAG and individual trait analysis observed in datasets simulated under the covariance structure estimated from studies of blood lipids under alternative LD-clumping threshold.

Type I error rates for MTAG observed in datasets simulated under covariance structure estimated from studies of blood lipids with population stratification.

Type I error rates for HIPO observed in simulated unbalanced case-control datasets.

Weights associated with individual blood lipid traits and average non-centrality parameters for each HIPO component.

Weights associated with individual psychiatric diseases and average non-centrality parameters for each HIPO component.

Weights associated with individual social science traits and average non-centrality parameters for each HIPO component.

Novel loci for blood lipids identified by HIPO and MTAG.

Novel loci for social science traits identified by HIPO and MTAG.

Details of mathematical derivations.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITHOUT population stratification effects.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of blood lipids WITH population stratification effects.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITHOUT population stratification effects.

QQ plot of HIPO-D1 and the most heritable trait observed in datasets simulated under covariance structure of psychiatric diseases WITH population stratification effects.

Venn diagram for the number of loci discovered to be associated with blood lipid traits by individual trait and HIPO analysis.

Venn diagram for the number of associated loci identified by individual trait analysis of psychiatric diseases, HIPO-D1 and meta-analysis.

Venn diagram for the number of associated loci identified by individual trait analysis of social science traits and HIPO-D1.

Review 7. Genetic correlations of polygenic disease traits: from theory to practice.