
PhenoSpD: an integrated toolkit for phenotypic correlation estimation and multiple testing correction using GWAS summary statistics.

Jie Zheng¹, Tom G Richardson¹, Louise A C Millard¹,², Gibran Hemani¹, Benjamin L Elsworth¹, Christopher A Raistrick¹, Bjarni Vilhjalmsson³, Benjamin M Neale⁴,⁵, Philip C Haycock¹, George Davey Smith¹, Tom R Gaunt¹.

Abstract

Background: Identifying phenotypic correlations between complex traits and diseases can provide useful etiological insights. Restricted access to much individual-level phenotype data makes it difficult to estimate large-scale phenotypic correlation across the human phenome. Two state-of-the-art methods, metaCCA and LD score regression, provide an alternative approach to estimate phenotypic correlation using only genome-wide association study (GWAS) summary results.
Results: Here, we present an integrated R toolkit, PhenoSpD, to use LD score regression to estimate phenotypic correlations using GWAS summary statistics and to utilize the estimated phenotypic correlations to inform correction of multiple testing for complex human traits using the spectral decomposition of matrices (SpD). The simulations suggest that it is possible to identify nonindependence of phenotypes using samples with partial overlap; as overlap decreases, the estimated phenotypic correlations will attenuate toward zero and multiple testing correction will be more stringent than in perfectly overlapping samples. Also, in contrast to LD score regression, metaCCA will provide approximate genetic correlations rather than phenotypic correlation, which limits its application for multiple testing correction. In a case study, PhenoSpD using UK Biobank GWAS results suggested 399.6 independent tests among 487 human traits, which is close to the 352.4 independent tests estimated using true phenotypic correlation. We further applied PhenoSpD to an estimated 5,618 pair-wise phenotypic correlations among 107 metabolites using GWAS summary statistics from Kettunen's publication and PhenoSpD suggested the equivalent of 33.5 independent tests for these metabolites.
Conclusions: PhenoSpD extends the use of summary-level results, providing a simple and conservative way to reduce dimensionality for complex human traits using GWAS summary statistics. This is particularly valuable in the age of large-scale biobank and consortia studies, where GWAS results are much more accessible than individual-level data.


Year: 2018 PMID: 30165448 PMCID: PMC6109640 DOI: 10.1093/gigascience/giy090

Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524

Introduction

Phenotypic correlations between complex human traits and diseases can provide useful etiological insights into the understanding of mechanisms across the human phenome. However, a lack of individual-level phenotype data makes it difficult to estimate the phenotypic correlations between human traits and diseases. Fortunately, we are now in the post genome-wide association study (GWAS) era, in which many GWAS summary results are openly accessible for a large number of human diseases and traits [1]. It can therefore be valuable to use these genetic association summary statistics to reconstruct total phenotypic correlations across the human phenome. The key assumptions here are that phenotypic correlation comprises both genetic and nongenetic (environmental) components and that the genetic association information is able to capture both genetic and nongenetic components of the phenotypic correlation [2]. Here, we consider two methods that can be used (but were not designed) to estimate phenotypic correlations using GWAS summary statistics as by-products of the main purposes of those methods. First, MetaCCA [3] is a multivariate meta-analysis tool that allows multivariate representation of both genotype and phenotype. As a by-product, metaCCA estimates the phenotypic correlation between two traits based on a Pearson correlation between two univariable regression coefficients (betas) across a set of genetic variants. Second, bivariate linkage disequilibrium (LD) score regression [2] is a state-of-the-art approach to estimate genetic correlations between a pair of traits. As a consequence, the bivariate LD score regression approach allows estimation of phenotypic correlation among the overlapping samples of two GWASs. 
Assuming the genetic and nongenetic components of two phenotypes are independent, the genetic covariance matrix (built up by the beta coefficients of the genetic association test) will capture the genetic effects, while the error covariance matrix (built up by the error term of the genetic association test) will capture the environmental (nongenetic) effects. Using a bivariate LD score regression model, we are able to capture both (genetic correlation will be represented by the slope of the regression model and phenotypic correlation will be represented by the intercept of the regression model) [2]. Large-scale genetic association databases such as MR-Base [4] and LD Hub [5] have harmonized GWAS summary-level results for roughly 1,700 human traits. This provides a timely opportunity to estimate the phenotypic correlation structure across a wide range of high-dimensional, complex molecular traits, such as metabolites, that are potentially highly correlated. Bonferroni correction would markedly overcorrect for the inflated false-positive rate in such correlated datasets, resulting in a reduction in power. An appropriate method to correct for multiple testing among human traits and diseases is the spectral decomposition of matrices (SpD) [6, 7]. Here, we combine LD score regression with SpD to estimate the number of independent tests using only summary-level GWAS data.
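The role of the slope and intercept can be illustrated with a minimal sketch. It assumes a plain, unweighted least-squares fit of the per-SNP product of Z scores on the LD score; the actual LD score regression software uses regression weights and a block jackknife, which are omitted here. The function names are hypothetical. The second helper encodes the expectation from the LD score regression model that the intercept equals the phenotypic correlation scaled by N_overlap / sqrt(N1 × N2), which is why estimates shrink toward zero as overlap decreases.

```python
import numpy as np


def cross_trait_ldsc(z1, z2, ld_scores):
    """Unweighted sketch of bivariate LD score regression.

    Regresses the per-SNP product z1 * z2 on the LD score and returns
    (slope, intercept). The slope relates to genetic covariance; the
    intercept relates to phenotypic correlation in overlapping samples.
    """
    z1 = np.asarray(z1, dtype=float)
    z2 = np.asarray(z2, dtype=float)
    ld = np.asarray(ld_scores, dtype=float)
    if not (z1.shape == z2.shape == ld.shape):
        raise ValueError("z1, z2 and ld_scores must have the same length")
    # np.polyfit returns coefficients from highest degree to lowest.
    slope, intercept = np.polyfit(ld, z1 * z2, deg=1)
    return float(slope), float(intercept)


def expected_intercept(rho_p, n1, n2, n_overlap):
    """Model expectation of the intercept: rho_p * N_overlap / sqrt(N1 * N2)."""
    return rho_p * n_overlap / np.sqrt(n1 * n2)
```

With complete overlap and equal sample sizes the scaling factor is one, so the intercept is read directly as the phenotypic correlation; the simulated values reported in this paper need not follow this idealized expectation exactly.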

Methods

Overview of PhenoSpD

Fig. 1 illustrates the key steps of the proposed pipeline, PhenoSpD, as follows: (1) harmonize GWAS summary results from the same sample; (2) apply the harmonized GWAS results to LD score regression to estimate the phenotypic correlation matrix of the traits; and (3) apply the SpD approach to the phenotypic correlation matrix and estimate the number of independent variables among the traits.

Figure 1:

Flowchart of PhenoSpD.

Simulation of phenotypic correlation estimation

First, we simulated the influence of the number of single nucleotide polymorphisms (SNPs), sample sizes of two GWASs, and sample overlap between two GWASs on the accuracy of the phenotypic correlation estimation. As shown in Fig. 2, we first created two samples A and B with different numbers of individuals (from 300 to 10,000 individuals), where the sample overlap between sample A and B ranged from 10% to 90%. We assumed complex human traits were influenced by both genetic and environmental factors, so we simulated the phenotype data of two correlated human traits (phenotype 1 and phenotype 2 with a phenotypic correlation of –0.7) based on varying numbers of genetic factors (ranging from 10 to 10,000 SNPs), different LD structure (r2 between 0 and 0.9), and 100 environmental factors. We then assigned the phenotypic correlation to its genetic and environmental components, each of which explained 10% to 90% of the total phenotypic variance. These genetic and environmental components were further assigned to each of the genetic and environmental factors in the model randomly. We also simulated two extreme cases where either the genetic or environmental components dominate the phenotypic correlation. After simulating the two phenotypic traits and the genotypic data in samples A and B, we conducted four GWASs (GWASs of phenotype 1 in samples A and B; GWASs of phenotype 2 in samples A and B) and recorded the summary statistics of these GWASs. To measure the accuracy of phenotypic correlation using GWAS summary statistics, we (1) calculated the observational phenotypic correlation (the Pearson correlation) between trait 1 and trait 2 in samples A and B separately and (2) estimated the phenotypic correlation between trait 1 and trait 2 in the overlapped samples using both metaCCA and LD score regression. We simulated step (2) 100 times and estimated the mean and standard deviation of the estimated phenotypic correlations. 
Finally, we compared the estimated phenotypic correlation with the observational phenotypic correlation and recorded the deviation between observed and estimated correlations. To demonstrate the simulation systematically, we explored the influence of the following properties: sample size; sample overlap; unbalanced sample size in samples A and B; number of SNPs; and LD. The R script for this simulation is provided as a Supplementary File (simulation.R).
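The summary of repeated simulations and the deviation measure can be sketched as follows. This is not the supplementary simulation.R script; the function names are hypothetical, and the deviation is assumed to be the absolute difference between observed and estimated correlation expressed as a percentage of the observed value.

```python
from statistics import mean, stdev


def summarize_replicates(estimates):
    """Return (mean, standard deviation) of the estimated correlations
    obtained across repeated simulations."""
    if len(estimates) < 2:
        raise ValueError("need at least two replicates")
    return mean(estimates), stdev(estimates)


def deviation_percent(observed, estimated):
    """Relative deviation (in %) of the estimate from the observed
    phenotypic correlation. Assumed definition, see lead-in."""
    if observed == 0:
        raise ValueError("observed correlation must be nonzero")
    return abs(observed - estimated) / abs(observed) * 100.0
```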

Figure 2:

Demonstration of the simulation. For two samples A and B, we simulated the genotype data and phenotype data of two correlated human traits, phenotype 1 and phenotype 2. The sample overlap between sample A and sample B ranged from 10% to 90% in this simulation.

Validation of phenotypic correlation estimation using real GWAS data

We further tested the accuracy of the phenotypic correlation estimation using GWAS summary statistics of 487 traits from the UK Biobank [8] (Supplementary Table S2). We calculated the observational phenotypic correlation using the actual phenotype data (Supplementary Table S4), which was used as a benchmark to evaluate the accuracy of our phenotypic correlation estimates using LD score regression. In addition, we tested whether the number of causal variants (which are tagged by the genetic association signals) may affect the accuracy of the phenotypic correlation using four pairs of metabolites from Shin et al. [9]. The four pairs of metabolites were selected because they have a wide range of observed phenotypic correlation from 0.2 to 0.85. To validate the accuracy, we compared the observed phenotypic correlation with the phenotypic correlation estimated by LD score regression. To consider the number of causal variants in this validation, we set up eight groups of SNPs based on their effects on the traits. The eight groups were all GWAS SNPs; SNPs with chi-square statistics (χ², the square of the Z score) smaller than 40; SNPs with χ² < 30; SNPs with χ² < 20; SNPs with χ² < 10; SNPs with χ² < 3.84; SNPs with χ² < 2.69; and SNPs with χ² < 1. In other words, we progressively removed causal variants from the model and evaluated the impact of this on the accuracy of the phenotypic correlation estimation. Based on the simulation and real case validation, we listed our trait selection criteria in Supplementary Table S1.
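The construction of the eight SNP groups can be sketched as below. The thresholds are those listed above; the function names and the plain list-of-Z-scores input are assumptions made for illustration.

```python
# None means "all GWAS SNPs"; the other entries are upper bounds on chi-square.
CHISQ_THRESHOLDS = [None, 40, 30, 20, 10, 3.84, 2.69, 1]


def filter_by_chisq(z_scores, max_chisq=None):
    """Keep SNPs whose chi-square statistic (Z squared) is below max_chisq."""
    if max_chisq is None:
        return list(z_scores)
    return [z for z in z_scores if z * z < max_chisq]


def build_snp_groups(z_scores):
    """Return a dict mapping a group label to the retained Z scores."""
    groups = {}
    for threshold in CHISQ_THRESHOLDS:
        label = "all" if threshold is None else f"chisq<{threshold}"
        groups[label] = filter_by_chisq(z_scores, threshold)
    return groups
```

Each group would then be passed separately to LD score regression, so that the estimates can be compared as progressively more of the strongest association signals are excluded.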

Estimating the phenotypic correlations

Within our GWAS summary results database containing roughly 1,700 human traits, we selected 107 metabolites from Kettunen et al. [10] as a real case application since these complex molecular traits are potentially highly correlated. We then applied LD score regression to these 107 metabolites to estimate the phenotypic correlation matrix (Supplementary Table S3). These traits meet the suggested minimum requirements of the LD score regression method: large sample size (e.g., N >5000), good SNP coverage (e.g., number of SNPs >200,000), and evidence of heritability (e.g., Z score of the SNP heritability >2).
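These minimum requirements translate into a simple filter. The sketch below uses the example cut-offs quoted above as defaults; the record type, field names, and function name are hypothetical and do not correspond to the PhenoSpD interface.

```python
from dataclasses import dataclass


@dataclass
class TraitSummary:
    name: str
    n_samples: int   # GWAS sample size
    n_snps: int      # number of SNPs with summary statistics
    h2_z: float      # Z score of the SNP heritability


def passes_ldsc_criteria(trait, min_n=5000, min_snps=200_000, min_h2_z=2.0):
    """True if the trait exceeds all three suggested minimum thresholds."""
    return (
        trait.n_samples > min_n
        and trait.n_snps > min_snps
        and trait.h2_z > min_h2_z
    )


def select_traits(traits, **thresholds):
    """Return only the traits that pass the selection criteria."""
    return [t for t in traits if passes_ldsc_criteria(t, **thresholds)]
```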

Multiple testing correction for human traits

We applied the SpD approach to estimate the number of independent tests among the 107 metabolites and 487 UK Biobank traits. The observed phenotypic correlations and correlations estimated by LD score regression were used as input for the SpD approach. We implemented the R code of the well-known method SNPSpD [6, 7] to estimate the number of independent traits using the phenotypic correlation matrix as input (Fig. 1). The output of the SpD function is the estimated number of independent tests.
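The SpD step can be illustrated with a minimal sketch. It assumes the variance-of-eigenvalues estimator associated with SNPSpD, M_eff = 1 + (M − 1)(1 − Var(λ)/M), where λ are the eigenvalues of the M × M correlation matrix; whether PhenoSpD reports exactly this variant is an assumption, and the function name is hypothetical. Absolute values of the eigenvalues are taken as a guard, since a correlation matrix assembled from pairwise estimates is not guaranteed to be positive semidefinite.

```python
import numpy as np


def effective_number_of_tests(corr):
    """Estimate the number of independent tests from a correlation matrix
    using the variance of its eigenvalues (assumed estimator, see lead-in)."""
    corr = np.asarray(corr, dtype=float)
    if corr.ndim != 2 or corr.shape[0] != corr.shape[1]:
        raise ValueError("corr must be a square matrix")
    m = corr.shape[0]
    if m == 1:
        return 1.0
    # eigvalsh is appropriate for symmetric matrices.
    eigvals = np.abs(np.linalg.eigvalsh(corr))
    var_eig = np.var(eigvals, ddof=1)  # sample variance of the eigenvalues
    return float(1.0 + (m - 1.0) * (1.0 - var_eig / m))
```

The two limiting cases behave as expected: fully independent traits (identity matrix) give M tests, and perfectly correlated traits (all entries equal to one) give a single test.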

Results

Evaluation of phenotypic correlation estimation using simulated and real GWAS summary data

Tables 1 and 2 show the influence of changing various parameters on the accuracy of the phenotypic correlation estimation for metaCCA and LD score regression, respectively. Our general observations from the simulation are that since the genetic association information is able to capture both genetic and nongenetic components of the phenotypic correlation, we can estimate such correlation for any human trait, even for nonheritable traits. Also, we should apply LD score regression to estimate phenotypic correlation in a one-sample setting (i.e., where all GWAS are performed in the same sample). It is possible to identify nonindependence of phenotypes using GWAS results from samples with only a partial overlap; however, as overlap decreases, correlations will attenuate toward zero. In addition, metaCCA will provide approximate genetic correlations rather than phenotypic correlation, which limits its application in our approach to evaluating multiple testing.

Table 1:

The influence of genetic and environmental components on phenotypic correlation estimation using metaCCA

Model	N_ind_A	N_ind_B	N_overlap	Overlap_%	N_SNPs	SNP_region	N_EnvF	Genetic%	N_simu	Obs_rp	rG	rE	Est_rp
Genetic_Env_components 1	5000	5000	5000	100	1000	SNPs across the genome	1000	0	100	0.497	−0.007	0.502	0.035
Genetic_Env_components 2	5000	5000	5000	100	1000	SNPs across the genome	1000	10	100	0.498	−0.050	0.550	−0.002
Genetic_Env_components 3	5000	5000	5000	100	1000	SNPs across the genome	1000	20	100	0.496	−0.103	0.601	−0.044
Genetic_Env_components 4	5000	5000	5000	100	1000	SNPs across the genome	1000	30	100	0.505	−0.150	0.649	−0.079
Genetic_Env_components 5	5000	5000	5000	100	1000	SNPs across the genome	1000	40	100	0.493	−0.202	0.700	−0.128
Genetic_Env_components 6	5000	5000	5000	100	1000	SNPs across the genome	1000	50	100	0.502	−0.250	0.752	−0.166
Genetic_Env_components 7	5000	5000	5000	100	1000	SNPs across the genome	1000	60	100	0.497	−0.302	0.800	−0.211
Genetic_Env_components 8	5000	5000	5000	100	1000	SNPs across the genome	1000	70	100	0.508	−0.347	0.850	−0.246
Genetic_Env_components 9	5000	5000	5000	100	1000	SNPs across the genome	1000	80	100	0.504	−0.401	0.900	−0.288
Genetic_Env_components 10	5000	5000	5000	100	1000	SNPs across the genome	1000	90	100	0.498	−0.449	0.950	−0.330
Genetic_Env_components 11	5000	5000	5000	100	1000	SNPs across the genome	1000	100	100	0.509	−0.496	1.000	−0.373

In this simulation, we compared the agreement between the observational (calculated from phenotypes) and estimated phenotypic correlation (estimated using metaCCA) of two human traits in two samples A and B. We explored the influence of the genetic and environmental components on phenotypic correlation. More details of the simulation can be found in the Methods section. Abbreviations: N_ind_A and N_ind_B: number of individuals in samples A and B; N_overlap: number of overlapped samples in samples A and B; Overlap_%: percentage of overlapped samples in A and B; N_SNPs: number of SNPs in GWAS of samples A and B; SNP_region: simulated SNPs are from either one or a few LD blocks or from the whole genome; Genetic%: percentage of genetic influences on the phenotypic correlation; N_EnvF: number of environmental factors included in the model; N_simu: number of simulations; Obs_rp: observed phenotypic correlation between two traits in the mixed samples; Est_rp: mean value of the estimated phenotypic correlations in 100 simulations using metaCCA; rG and rE: the simulated genetic and environmental correlation in each case.

Table 2:

The influence of genetic and environmental components, number of SNPs, sample sizes of two GWASs, and sample overlap between two GWASs on phenotypic correlation estimation using LD score regression

Model	N_ind_A	N_ind_B	N_overlap	Overlap_%	N_SNPs	SNP_region	N_EnvF	Genetic%	N_simu	Obs_rp	Est_rp	Deviation, %
Genetic_Env_components 1	5000	5000	5000	100	200K	SNPs across the genome	1000	0	100	0.49	0.32	35.70
Genetic_Env_components 2	5000	5000	5000	100	200K	SNPs across the genome	1000	10	100	0.50	0.34	31.30
Genetic_Env_components 3	5000	5000	5000	100	200K	SNPs across the genome	1000	20	100	0.50	0.37	24.90
Genetic_Env_components 4	5000	5000	5000	100	200K	SNPs across the genome	1000	30	100	0.50	0.39	21.30
Genetic_Env_components 5	5000	5000	5000	100	200K	SNPs across the genome	1000	40	100	0.51	0.41	18.70
Genetic_Env_components 6	5000	5000	5000	100	200K	SNPs across the genome	1000	50	100	0.50	0.42	15.30
Genetic_Env_components 7	5000	5000	5000	100	200K	SNPs across the genome	1000	60	100	0.49	0.43	13.10
Genetic_Env_components 8	5000	5000	5000	100	200K	SNPs across the genome	1000	70	100	0.49	0.44	11.60
Genetic_Env_components 9	5000	5000	5000	100	200K	SNPs across the genome	1000	80	100	0.50	0.45	9.70
Genetic_Env_components 10	5000	5000	5000	100	200K	SNPs across the genome	1000	90	100	0.50	0.46	7.90
Genetic_Env_components 11	5000	5000	5000	100	200K	SNPs across the genome	1000	100	100	0.50	0.47	5.90
sample size 1	1000	1000	500	50	200K	SNPs across the genome	1000	50	100	0.50	0.22	55.10
sample size 2	3000	3000	1500	50	200K	SNPs across the genome	1000	50	100	0.51	0.30	41.70
sample size 3	5000	5000	2500	50	200K	SNPs across the genome	1000	50	100	0.50	0.33	33.30
sample size 4	10 000	10 000	5000	50	200K	SNPs across the genome	1000	50	100	0.50	0.35	30.60
sample size 5	50 000	50 000	25 000	50	200K	SNPs across the genome	1000	50	100	0.50	0.36	28.20
sample size 6	100 000	100 000	50 000	50	200K	SNPs across the genome	1000	50	100	0.51	0.39	23.90
sample overlap 1	5000	5000	500	10	200K	SNPs across the genome	1000	50	100	0.50	0.08	83.30
sample overlap 2	5000	5000	1000	20	200K	SNPs across the genome	1000	50	100	0.50	0.16	68.10
sample overlap 3	5000	5000	1500	30	200K	SNPs across the genome	1000	50	100	0.51	0.23	55.40
sample overlap 4	5000	5000	2000	40	200K	SNPs across the genome	1000	50	100	0.51	0.29	42.90
sample overlap 5	5000	5000	2500	50	200K	SNPs across the genome	1000	50	100	0.51	0.34	34.30
sample overlap 6	5000	5000	3000	60	200K	SNPs across the genome	1000	50	100	0.51	0.38	24.90
sample overlap 7	5000	5000	3500	70	200K	SNPs across the genome	1000	50	100	0.50	0.41	18.50
sample overlap 8	5000	5000	4000	80	200K	SNPs across the genome	1000	50	100	0.50	0.45	10.70
sample overlap 9	5000	5000	4500	90	200K	SNPs across the genome	1000	50	100	0.51	0.48	6.10
unbalance sample 1	5000	5000	9000	90	200K	SNPs across the genome	1000	50	100	0.50	0.47	5.90
unbalance sample 2	5000	6000	9000	82	200K	SNPs across the genome	1000	50	100	0.50	0.45	10.50
unbalance sample 3	5000	8000	9000	69	200K	SNPs across the genome	1000	50	100	0.50	0.41	18.20
unbalance sample 4	5000	10 000	9000	60	200K	SNPs across the genome	1000	50	100	0.50	0.38	23.40
unbalance sample 5	5000	13 000	9000	50	200K	SNPs across the genome	1000	50	100	0.50	0.34	31.40
number of SNPs 1	5000	5000	2500	50	7.5K	SNPs across the genome	1000	50	100	0.50	0.04	92.30
number of SNPs 2	5000	5000	2500	50	12.5K	SNPs across the genome	1000	50	100	0.50	0.11	78.20
number of SNPs 3	5000	5000	2500	50	25K	SNPs across the genome	1000	50	100	0.50	0.14	72.10
number of SNPs 4	5000	5000	2500	50	50K	SNPs across the genome	1000	50	100	0.51	0.22	56.70
number of SNPs 5	5000	5000	2500	50	100K	SNPs across the genome	1000	50	100	0.50	0.30	40.90
number of SNPs 6	5000	5000	2500	50	200K	SNPs across the genome	1000	50	100	0.51	0.34	33.70
Linkage disequilibrium 1	5000	5000	2500	50	10K	SNPs from one LD block	1000	50	100	0.51	0.09	82.30
Linkage disequilibrium 2	5000	5000	2500	50	20K	SNPs from two LD blocks	1000	50	100	0.50	0.12	75.40
Linkage disequilibrium 3	5000	5000	2500	50	30K	SNPs from three LD blocks	1000	50	100	0.50	0.16	68.80
Linkage disequilibrium 4	5000	5000	2500	50	40K	SNPs from four LD blocks	1000	50	100	0.51	0.20	60.30
Linkage disequilibrium 5	5000	5000	2500	50	50K	SNPs from five LD blocks	1000	50	100	0.50	0.22	55.70
Linkage disequilibrium 6	5000	5000	2500	50	200K	SNPs across the genome	1000	50	100	0.51	0.34	33.90

In this simulation, we compared the agreement between the observational (calculated from phenotypes) and estimated phenotypic correlation (estimated using LD score regression) of two human traits in two samples A and B. We explored the influence of the following properties: genetic and environmental components; sample size; sample overlap; unbalanced sample size in samples A and B; number of SNPs; and linkage disequilibrium. More details of the simulation can be found in the Methods section. Abbreviations: N_ind_A and N_ind_B: number of individuals in samples A and B; N_overlap: number of overlapped samples in samples A and B; Overlap_%: percentage of overlapped samples in A and B; N_SNPs: number of SNPs in GWAS of samples A and B; SNP_region: simulated SNPs are from either one or a few LD blocks or from the whole genome; Genetic%: percentage of genetic influences on the phenotypic correlation; N_EnvF: number of environmental factors included in the model; N_simu: number of simulations; Obs_rp: observed phenotypic correlation between two traits in the mixed samples; Est_rp: mean value of the estimated phenotypic correlations in 100 simulations; Deviation (%): deviation between observational phenotypic correlation and estimated phenotypic correlation in each model of simulation.

One important question here is how the genetic and environmental factors affect the phenotypic correlation estimation. As shown in Table 1, we found that when the environmental components dominate the phenotypic correlation, the metaCCA estimates are biased toward the null. In addition, when the genetic component dominates, metaCCA estimates are biased toward the genetic correlation. This fits the assumption that the genetic covariance matrix (built from the beta coefficients from two GWASs) will only capture the genetic effects. MetaCCA uses the beta coefficients to estimate the correlation, which approximately estimates the genetic correlation within the overlapped samples. This is consistent with the simulation results in Table 1. In contrast to metaCCA, LD score regression estimates both the genetic covariance matrix and the nongenetic covariance matrix (the error variance in the estimates of effects). In other words, given a bivariate setting (two GWASs), the slope of the LD score regression represents the genetic correlation, while the intercept term of the LD score regression represents the phenotypic correlation. This is consistent with the results in Table 2.
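The by-product of metaCCA described here, a Pearson correlation between the univariable regression coefficients of two traits across a common set of variants, can be sketched in a few lines. This is not the metaCCA package itself; the function name is hypothetical and the input is assumed to be two beta vectors aligned on the same SNPs and effect alleles.

```python
import numpy as np


def beta_correlation(beta_trait1, beta_trait2):
    """Pearson correlation between two vectors of GWAS effect estimates
    (betas) measured on the same, identically ordered set of SNPs."""
    b1 = np.asarray(beta_trait1, dtype=float)
    b2 = np.asarray(beta_trait2, dtype=float)
    if b1.shape != b2.shape or b1.ndim != 1:
        raise ValueError("beta vectors must be one-dimensional and aligned")
    if b1.size < 2:
        raise ValueError("need at least two SNPs")
    return float(np.corrcoef(b1, b2)[0, 1])
```

Because only the effect estimates enter the calculation, the result tracks the shared genetic signal, which is in line with the observation that these estimates approximate the genetic rather than the phenotypic correlation.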
We also found that the accuracy of the correlation estimates from LD score regression is mainly influenced by the proportion of overlapping individuals between the two GWASs. For example, the deviation between observed and estimated phenotypic correlation decreased from 83.3% to 6.1% when the sample overlap between the two samples increased from 10% to 90% (Table 2). In addition, we observed that the number of SNPs included in the model also influences the accuracy of the phenotypic correlation estimation. We also found that if all tested SNPs were from one or a few LD blocks (in other words, in high LD with each other), the accuracy of the phenotypic correlation estimate decreases (Table 2). Based on these two observations, we recommend including SNPs from as many genomic regions as possible to maximize the accuracy of the estimation. Finally, we observed that the sample size of the GWAS influences the accuracy of the estimation, so we included GWASs with sample sizes of more than 5000 (Table 2).
We further tested the accuracy of phenotypic correlation estimation by comparing the observed phenotypic correlations (Supplementary Table S4) using real phenotype data from UK Biobank [8] and the estimated phenotypic correlation (Supplementary Table S5) using UK Biobank GWAS results via LD score regression. Fig. 3 shows that the estimated phenotypic correlations using LD score regression are consistent with the observed phenotypic correlations (r2 = 0.71). The exception is that some traits with large observed correlation have estimated correlation close to the null. There are two possible interpretations of this discrepancy: the phenotypic correlations of some UK Biobank traits were poorly estimated and potentially mis-specified because of limited sample size, or missingness of the phenotype measurements limited the sample overlap between some UK Biobank traits.

Figure 3:

The comparison between the observed and estimated phenotypic correlations using LD score regression among 487 traits from UK Biobank. Each point is one trait. The red line is X = Y. Some estimated phenotypic correlations fall out of bounds (correlation greater than one). This can occur because of noise within the error covariance matrix (built up by the error term of the genetic association test) of a pair of traits.

Fig. 4 illustrates the influence of the number of causal variants (which are tagged by the genetic association signals) on the accuracy of the phenotypic correlation using four pairs of metabolites from Shin et al. [13]. There is a clear trend: the estimated phenotypic correlations moved further away from the observed phenotypic correlation as more variants with real effects were removed from the model. Based on this real case study, we recommend including all SNPs from the GWAS when estimating phenotypic correlation using LD score regression.

Figure 4:

Validation of the influence of number of causal variants on phenotypic correlation estimation. Four pairs of metabolites (leucine against N1-methyl-3-pyridone-4-carboxamide, tryptophan, phenylalanine, and valine) from Shin et al. [13] were selected based on their observed phenotypic correlations (0.2, 0.4, 0.6, and 0.85, respectively). Eight sets of SNPs were selected to estimate the phenotypic correlations using LD score regression. The eight sets were all GWAS SNPs; SNPs with chi-square statistics (χ², the square of the Z score) smaller than 40; SNPs with χ² < 30; SNPs with χ² < 20; SNPs with χ² < 10; SNPs with χ² < 3.84; SNPs with χ² < 2.69; and SNPs with χ² < 1. Notes: The four columns on the x-axis are the four selected pairs of metabolites. The y-axis is the value of the phenotypic correlation. Dark blue points are the observed phenotypic correlations (noted as rP-OBS). The lighter blue points are the eight groups of SNPs included in the phenotypic correlation estimation using LD score regression (noted as LDSC_X2).

A practical comparison between metaCCA and LD score regression on estimating phenotypic correlation

Both LD score regression and metaCCA have advantages and limitations when used to estimate phenotypic correlation. In this section, we summarize the practical difference between the two to inform PhenoSpD users on how to choose the appropriate methods. LD score regression is designed to estimate genetic correlation (the slope of the regression model) between a pair of human traits. As a by-product, it also provides the pairwise phenotypic correlation estimation (the intercept of the regression model) with standard errors. It is influenced by sample overlap (when there is no sample overlap between two GWASs, the phenotypic correlation estimation will be zero). However, its application is limited to traits with large sample size (e.g., N >5,000), good SNP coverage (e.g., number of SNPs >200,000), and heritable (Z score of SNP heritability >2) to fit the assumptions of LD score regression [11]. MetaCCA can be applied to almost all GWASs (e.g., in our simulation, the sample size >300 and the number of SNPs >1,000). However, it provides the approximate genetic correlation rather than the phenotypic correlation. We consider it can only be applied to the situation in which phenotypic and genetic correlation line up very well, such as metabolites [12]. It only provides the central estimation of the phenotypic correlation but no standard error and P value of the correlation. In addition, the method does not adjust the influence of sample overlap; to maximize the accuracy of the phenotypic correlation estimation, we could put GWASs with good sample overlap into a group and only apply metaCCA to each group of GWASs (rather than cross groups).

The phenotypic correlations of the human metabolome

In a real case study, we applied LD score regression to the human metabolome. We estimated 5,618 pair-wise phenotypic correlations among the 107 metabolites from Kettunen et al. [10]. More details of the metabolites are listed in Supplementary Table S3. The phenotypic correlations among 107 metabolites and 487 UK Biobank traits estimated by LD score regression are presented in Supplementary Tables S5 and S6.

Multiple testing correction of the human phenome

Table 3 shows the number of independent traits for two high-dimensional, complex human trait datasets. PhenoSpD using GWAS results suggested 399.6 independent tests among 487 traits from UK Biobank, which is close to the 352.4 independent tests estimated using real phenotypic correlation. For metabolites from Kettunen et al., PhenoSpD suggested 33.5 as the number of independent tests for these metabolites, which greatly reduced the dimensionality for these complex molecular traits.
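To make the practical effect concrete, the sketch below converts a number of independent tests into a per-test significance threshold. It assumes a Bonferroni-style division of the family-wise alpha by the effective number of tests and a family-wise alpha of 0.05; both are illustrative choices rather than values taken from this study, and the function name is hypothetical.

```python
def corrected_alpha(alpha, n_effective):
    """Per-test significance threshold for a family-wise error rate alpha,
    dividing by the (possibly non-integer) effective number of tests."""
    if n_effective <= 0:
        raise ValueError("n_effective must be positive")
    return alpha / n_effective


# Metabolites: 107 traits, 33.5 effective tests (Table 3).
naive_metabolites = corrected_alpha(0.05, 107)
spd_metabolites = corrected_alpha(0.05, 33.5)

# UK Biobank: 487 traits, 399.6 effective tests (Table 3).
naive_ukb = corrected_alpha(0.05, 487)
spd_ukb = corrected_alpha(0.05, 399.6)
```

Under these assumptions the threshold for the metabolites relaxes from roughly 0.00047 to roughly 0.0015, whereas the change for the UK Biobank traits is much smaller because fewer of those traits are strongly correlated.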

Table 3:

Summary of number of independent traits for the complex human trait networks

First author	Category	N_traits	N_SNPs	N_indep
Kettunen et al.	Metabolites	107	9 826 292	33.5
UK Biobank	All traits	487	10 879 180	399.6

Abbreviations: N_traits: number of traits in each molecular network; N_SNPs: number of SNPs in each network; and N_indep: number of independent tests in each network.


Discussion

In this study, we present an integrative method, PhenoSpD, that allows phenotypic correlation estimation and multiple testing correction for the human phenome using GWAS summary statistics. We illustrate the application of PhenoSpD by estimating, for the first time, the phenotypic correlation structure and number of independent tests for 107 metabolites from Kettunen's study [10] and for 487 UK Biobank traits. These results showcase the ability of PhenoSpD to estimate phenotypic correlations and an appropriate multiple testing correction for complex and molecular traits when samples overlap between the GWASs.

Advantages and limitations of PhenoSpD

There are a few key advantages of PhenoSpD. First, our proposed approach utilizes the by-products of two established methods: metaCCA and bivariate LD score regression. We extended the simulations and real-world application of the by-products of these two methods and established that metaCCA can only be applied to metabolites and that bivariate LD score regression can only be used to estimate phenotypic correlation under certain conditions (Supplementary Table S1), which adds significant value to the previous findings [2, 3]. Second, we provide a simple and user-friendly tool to correct for multiple testing in large-scale “omics” data analyses and phenome-wide association studies (PheWAS). The multiple testing correction will still be stringent (since limited sample overlap between two GWASs will drive the phenotypic correlation toward the null) but less stringent than Bonferroni correction. This approach is therefore particularly valuable for GWAS of complex human traits such as metabolites and for large-scale biobanks. As exemplars, we cleaned and reformatted more than 594 GWAS traits and precalculated the phenotypic correlation matrix for these traits from a large-scale “omics” study and UK Biobank [10, 8]. In the GitHub repository, we also provide the precalculated phenotypic correlation matrix of 221 × 221 complex human traits in LD Hub. This greatly simplifies the process of multiple testing estimation for these traits.

PhenoSpD also has some limitations, which are general limitations when estimating phenotypic correlation using GWAS summary statistics:

One sample setting: The two GWASs must be based on substantially overlapping samples to effectively estimate phenotypic correlation.

Genetic or environmental components: For metaCCA, when genetic components dominate the phenotypic correlation, using beta coefficients to estimate phenotypic correlation will bias the estimate toward the genetic correlation. When the environmental components (“environment” here can be either shared environmental contributions or stochastic phenotypic variation [13]) dominate the phenotypic correlation, using beta coefficients will bias the estimate toward the null. We consider that metaCCA can only be applied where phenotypic and genetic correlations line up very well, for example, for metabolites [12]. LD score regression is able to capture both the genetic correlation (represented by the slope of the regression model) and the phenotypic correlation (represented by the intercept of the regression model). When environmental factors dominate the phenotypic correlation (which means the slope of LD score regression is close to zero), the intercept (which is built up from the error term of the SNP-trait association model) can still reconstruct a substantial component of the phenotypic correlation.

Sample size of GWASs: We recommend a sample size >5,000 for LD score regression and >300 for metaCCA.

Number of SNPs: The number of SNPs included in the model should be more than 200,000 to obtain a more accurate correlation estimate.

SNP coverage: Ideally, SNPs across the whole genome should be included in the model.
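The dependence on sample overlap can be made concrete with the expected value of the cross-trait LD score regression intercept, which under the bivariate LD score regression model is ρ · N_s / sqrt(N_1 · N_2), where ρ is the phenotypic correlation among the N_s shared individuals and N_1, N_2 are the two GWAS sample sizes. This relation comes from the LD score regression model rather than from this section, and the Python function below is a hypothetical illustration, not PhenoSpD code.

```python
import math


def expected_intercept(rho, n1, n2, n_shared):
    """Expected cross-trait LD score regression intercept.

    rho      : phenotypic correlation in the overlapping individuals
    n1, n2   : sample sizes of the two GWASs
    n_shared : number of individuals present in both GWASs
    """
    if n1 <= 0 or n2 <= 0:
        raise ValueError("sample sizes must be positive")
    if not 0 <= n_shared <= min(n1, n2):
        raise ValueError("n_shared must lie between 0 and min(n1, n2)")
    return rho * n_shared / math.sqrt(n1 * n2)


# A true correlation of 0.6 between two GWASs of 10,000 individuals each:
full = expected_intercept(0.6, 10_000, 10_000, 10_000)   # complete overlap
half = expected_intercept(0.6, 10_000, 10_000, 5_000)    # half the samples shared
none = expected_intercept(0.6, 10_000, 10_000, 0)        # no overlap
```

With complete overlap the intercept recovers the phenotypic correlation; as overlap shrinks the estimate is attenuated toward zero, which in turn makes the resulting multiple testing correction more stringent.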

Potential application of PhenoSpD

The main application of PhenoSpD is to determine the appropriate multiple testing correction for high-dimensional phenotypic data from a single cohort or study (e.g., metabolomics [9], epigenetics [14], transcriptomics [15], and proteomics [16] platforms that assay hundreds to thousands of traits). This approach is less stringent than the very conservative Bonferroni correction, which is inappropriate given that many phenotypes are correlated and not actually independent. In an ideal world, if the individual-level data for such studies were readily available, it would be straightforward to determine the phenotypic correlations using individual-level phenotype data. However, individual-level phenotype data is not as readily available as GWAS summary statistics (which are increasingly openly accessible and downloadable). Large-scale biobanks, such as UK Biobank [8], are increasingly measuring a large number of phenotypes in the same sample. It is therefore likely to become more common for large-scale GWASs of diverse phenotypes to be published from the same set of participants, in contrast to the current situation of many GWASs from different samples with different phenotypic measurements. The proposed method will be particularly applicable for these biobanks. For example, automated GWASs of more than 2,400 human traits have recently been performed in the UK Biobank, enabling PhenoSpD analysis on a very large number of individuals (data can be downloaded from [17]). Moreover, PheWAS is becoming a very popular tool, and the dimensionality of PheWAS will increase greatly in coming years. We are moving away from single, hypothesis-driven analyses to high-dimensional, hypothesis-free PheWAS analyses. Tools such as PhenoSpD are therefore potentially extremely useful for PheWAS approaches such as MR-PheWAS [18] and MR-Base [4].

To maximize the value of overlapping samples in published GWASs, we recommend a specific strategy when applying PhenoSpD. The strategy rests on the observation that correlated traits tend to be measured and studied within the same pool of individuals from a specific consortium. For example, anthropometric traits are mostly meta-analyzed by the GIANT consortium [19-21], and most of the glucose- and insulin-related traits are studied in the MAGIC consortium [22-24]. We could estimate the phenotypic correlations within each consortium. In this way, we will be able to utilize the overlapping samples to reconstruct part of the phenotypic correlation. In general, with the development of resources such as LD Hub and MR-Base and large-scale phenotyping and GWAS in major biobanks (e.g., UK Biobank), the proposed method, PhenoSpD, will become more relevant.

23 in total

1. Adjusting multiple testing in multilocus analyses using the eigenvalues of a correlation matrix.

Authors: J Li; L Ji
Journal: Heredity (Edinb) Date: 2005-09 Impact factor: 3.821

2. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies.

Authors: Brendan K Bulik-Sullivan; Po-Ru Loh; Hilary K Finucane; Stephan Ripke; Jian Yang; Nick Patterson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-02-02 Impact factor: 38.330

3. Epidemiology, epigenetics and the 'Gloomy Prospect': embracing randomness in population health research and practice.

Authors: George Davey Smith
Journal: Int J Epidemiol Date: 2011-06 Impact factor: 7.196

4. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age.

Authors: Cathie Sudlow; John Gallacher; Naomi Allen; Valerie Beral; Paul Burton; John Danesh; Paul Downey; Paul Elliott; Jane Green; Martin Landray; Bette Liu; Paul Matthews; Giok Ong; Jill Pell; Alan Silman; Alan Young; Tim Sprosen; Tim Peakman; Rory Collins
Journal: PLoS Med Date: 2015-03-31 Impact factor: 11.069

5. Connecting genetic risk to disease end points through the human blood plasma proteome.

Authors: Karsten Suhre; Matthias Arnold; Aditya Mukund Bhagwat; Richard J Cotton; Rudolf Engelke; Johannes Raffler; Hina Sarwath; Gaurav Thareja; Annika Wahl; Robert Kirk DeLisle; Larry Gold; Marija Pezer; Gordan Lauc; Mohammed A El-Din Selim; Dennis O Mook-Kanamori; Eman K Al-Dous; Yasmin A Mohamoud; Joel Malek; Konstantin Strauch; Harald Grallert; Annette Peters; Gabi Kastenmüller; Christian Gieger; Johannes Graumann
Journal: Nat Commun Date: 2017-02-27 Impact factor: 14.919

6. Defining the role of common variation in the genomic and biological architecture of adult human height.

Authors: Andrew R Wood; Tonu Esko; Jian Yang; Sailaja Vedantam; Tune H Pers; Stefan Gustafsson; Audrey Y Chu; Karol Estrada; Jian'an Luan; Zoltán Kutalik; Najaf Amin; Martin L Buchkovich; Damien C Croteau-Chonka; Felix R Day; Yanan Duan; Tove Fall; Rudolf Fehrmann; Teresa Ferreira; Anne U Jackson; Juha Karjalainen; Ken Sin Lo; Adam E Locke; Reedik Mägi; Evelin Mihailov; Eleonora Porcu; Joshua C Randall; André Scherag; Anna A E Vinkhuyzen; Harm-Jan Westra; Thomas W Winkler; Tsegaselassie Workalemahu; Jing Hua Zhao; Devin Absher; Eva Albrecht; Denise Anderson; Jeffrey Baron; Marian Beekman; Ayse Demirkan; Georg B Ehret; Bjarke Feenstra; Mary F Feitosa; Krista Fischer; Ross M Fraser; Anuj Goel; Jian Gong; Anne E Justice; Stavroula Kanoni; Marcus E Kleber; Kati Kristiansson; Unhee Lim; Vaneet Lotay; Julian C Lui; Massimo Mangino; Irene Mateo Leach; Carolina Medina-Gomez; Michael A Nalls; Dale R Nyholt; Cameron D Palmer; Dorota Pasko; Sonali Pechlivanis; Inga Prokopenko; Janina S Ried; Stephan Ripke; Dmitry Shungin; Alena Stancáková; Rona J Strawbridge; Yun Ju Sung; Toshiko Tanaka; Alexander Teumer; Stella Trompet; Sander W van der Laan; Jessica van Setten; Jana V Van Vliet-Ostaptchouk; Zhaoming Wang; Loïc Yengo; Weihua Zhang; Uzma Afzal; Johan Arnlöv; Gillian M Arscott; Stefania Bandinelli; Amy Barrett; Claire Bellis; Amanda J Bennett; Christian Berne; Matthias Blüher; Jennifer L Bolton; Yvonne Böttcher; Heather A Boyd; Marcel Bruinenberg; Brendan M Buckley; Steven Buyske; Ida H Caspersen; Peter S Chines; Robert Clarke; Simone Claudi-Boehm; Matthew Cooper; E Warwick Daw; Pim A De Jong; Joris Deelen; Graciela Delgado; Josh C Denny; Rosalie Dhonukshe-Rutten; Maria Dimitriou; Alex S F Doney; Marcus Dörr; Niina Eklund; Elodie Eury; Lasse Folkersen; Melissa E Garcia; Frank Geller; Vilmantas Giedraitis; Alan S Go; Harald Grallert; Tanja B Grammer; Jürgen Gräßler; Henrik Grönberg; Lisette C P G M de Groot; Christopher J Groves; Jeffrey Haessler; Per Hall; Toomas Haller; Goran 
Hallmans; Anke Hannemann; Catharina A Hartman; Maija Hassinen; Caroline Hayward; Nancy L Heard-Costa; Quinta Helmer; Gibran Hemani; Anjali K Henders; Hans L Hillege; Mark A Hlatky; Wolfgang Hoffmann; Per Hoffmann; Oddgeir Holmen; Jeanine J Houwing-Duistermaat; Thomas Illig; Aaron Isaacs; Alan L James; Janina Jeff; Berit Johansen; Åsa Johansson; Jennifer Jolley; Thorhildur Juliusdottir; Juhani Junttila; Abel N Kho; Leena Kinnunen; Norman Klopp; Thomas Kocher; Wolfgang Kratzer; Peter Lichtner; Lars Lind; Jaana Lindström; Stéphane Lobbens; Mattias Lorentzon; Yingchang Lu; Valeriya Lyssenko; Patrik K E Magnusson; Anubha Mahajan; Marc Maillard; Wendy L McArdle; Colin A McKenzie; Stela McLachlan; Paul J McLaren; Cristina Menni; Sigrun Merger; Lili Milani; Alireza Moayyeri; Keri L Monda; Mario A Morken; Gabriele Müller; Martina Müller-Nurasyid; Arthur W Musk; Narisu Narisu; Matthias Nauck; Ilja M Nolte; Markus M Nöthen; Laticia Oozageer; Stefan Pilz; Nigel W Rayner; Frida Renstrom; Neil R Robertson; Lynda M Rose; Ronan Roussel; Serena Sanna; Hubert Scharnagl; Salome Scholtens; Fredrick R Schumacher; Heribert Schunkert; Robert A Scott; Joban Sehmi; Thomas Seufferlein; Jianxin Shi; Karri Silventoinen; Johannes H Smit; Albert Vernon Smith; Joanna Smolonska; Alice V Stanton; Kathleen Stirrups; David J Stott; Heather M Stringham; Johan Sundström; Morris A Swertz; Ann-Christine Syvänen; Bamidele O Tayo; Gudmar Thorleifsson; Jonathan P Tyrer; Suzanne van Dijk; Natasja M van Schoor; Nathalie van der Velde; Diana van Heemst; Floor V A van Oort; Sita H Vermeulen; Niek Verweij; Judith M Vonk; Lindsay L Waite; Melanie Waldenberger; Roman Wennauer; Lynne R Wilkens; Christina Willenborg; Tom Wilsgaard; Mary K Wojczynski; Andrew Wong; Alan F Wright; Qunyuan Zhang; Dominique Arveiler; Stephan J L Bakker; John Beilby; Richard N Bergman; Sven Bergmann; Reiner Biffar; John Blangero; Dorret I Boomsma; Stefan R Bornstein; Pascal Bovet; Paolo Brambilla; Morris J Brown; Harry Campbell; Mark J 
Caulfield; Aravinda Chakravarti; Rory Collins; Francis S Collins; Dana C Crawford; L Adrienne Cupples; John Danesh; Ulf de Faire; Hester M den Ruijter; Raimund Erbel; Jeanette Erdmann; Johan G Eriksson; Martin Farrall; Ele Ferrannini; Jean Ferrières; Ian Ford; Nita G Forouhi; Terrence Forrester; Ron T Gansevoort; Pablo V Gejman; Christian Gieger; Alain Golay; Omri Gottesman; Vilmundur Gudnason; Ulf Gyllensten; David W Haas; Alistair S Hall; Tamara B Harris; Andrew T Hattersley; Andrew C Heath; Christian Hengstenberg; Andrew A Hicks; Lucia A Hindorff; Aroon D Hingorani; Albert Hofman; G Kees Hovingh; Steve E Humphries; Steven C Hunt; Elina Hypponen; Kevin B Jacobs; Marjo-Riitta Jarvelin; Pekka Jousilahti; Antti M Jula; Jaakko Kaprio; John J P Kastelein; Manfred Kayser; Frank Kee; Sirkka M Keinanen-Kiukaanniemi; Lambertus A Kiemeney; Jaspal S Kooner; Charles Kooperberg; Seppo Koskinen; Peter Kovacs; Aldi T Kraja; Meena Kumari; Johanna Kuusisto; Timo A Lakka; Claudia Langenberg; Loic Le Marchand; Terho Lehtimäki; Sara Lupoli; Pamela A F Madden; Satu Männistö; Paolo Manunta; André Marette; Tara C Matise; Barbara McKnight; Thomas Meitinger; Frans L Moll; Grant W Montgomery; Andrew D Morris; Andrew P Morris; Jeffrey C Murray; Mari Nelis; Claes Ohlsson; Albertine J Oldehinkel; Ken K Ong; Willem H Ouwehand; Gerard Pasterkamp; Annette Peters; Peter P Pramstaller; Jackie F Price; Lu Qi; Olli T Raitakari; Tuomo Rankinen; D C Rao; Treva K Rice; Marylyn Ritchie; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Jouko Saramies; Mark A Sarzynski; Peter E H Schwarz; Sylvain Sebert; Peter Sever; Alan R Shuldiner; Juha Sinisalo; Valgerdur Steinthorsdottir; Ronald P Stolk; Jean-Claude Tardif; Anke Tönjes; Angelo Tremblay; Elena Tremoli; Jarmo Virtamo; Marie-Claude Vohl; Philippe Amouyel; Folkert W Asselbergs; Themistocles L Assimes; Murielle Bochud; Bernhard O Boehm; Eric Boerwinkle; Erwin P Bottinger; Claude Bouchard; Stéphane Cauchi; John C Chambers; Stephen J Chanock; Richard S Cooper; 
Paul I W de Bakker; George Dedoussis; Luigi Ferrucci; Paul W Franks; Philippe Froguel; Leif C Groop; Christopher A Haiman; Anders Hamsten; M Geoffrey Hayes; Jennie Hui; David J Hunter; Kristian Hveem; J Wouter Jukema; Robert C Kaplan; Mika Kivimaki; Diana Kuh; Markku Laakso; Yongmei Liu; Nicholas G Martin; Winfried März; Mads Melbye; Susanne Moebus; Patricia B Munroe; Inger Njølstad; Ben A Oostra; Colin N A Palmer; Nancy L Pedersen; Markus Perola; Louis Pérusse; Ulrike Peters; Joseph E Powell; Chris Power; Thomas Quertermous; Rainer Rauramaa; Eva Reinmaa; Paul M Ridker; Fernando Rivadeneira; Jerome I Rotter; Timo E Saaristo; Danish Saleheen; David Schlessinger; P Eline Slagboom; Harold Snieder; Tim D Spector; Konstantin Strauch; Michael Stumvoll; Jaakko Tuomilehto; Matti Uusitupa; Pim van der Harst; Henry Völzke; Mark Walker; Nicholas J Wareham; Hugh Watkins; H-Erich Wichmann; James F Wilson; Pieter Zanen; Panos Deloukas; Iris M Heid; Cecilia M Lindgren; Karen L Mohlke; Elizabeth K Speliotes; Unnur Thorsteinsdottir; Inês Barroso; Caroline S Fox; Kari E North; David P Strachan; Jacques S Beckmann; Sonja I Berndt; Michael Boehnke; Ingrid B Borecki; Mark I McCarthy; Andres Metspalu; Kari Stefansson; André G Uitterlinden; Cornelia M van Duijn; Lude Franke; Cristen J Willer; Alkes L Price; Guillaume Lettre; Ruth J F Loos; Michael N Weedon; Erik Ingelsson; Jeffrey R O'Connell; Goncalo R Abecasis; Daniel I Chasman; Michael E Goddard; Peter M Visscher; Joel N Hirschhorn; Timothy M Frayling
Journal: Nat Genet Date: 2014-10-05 Impact factor: 38.330

7. Metabolic signatures of adiposity in young adults: Mendelian randomization analysis and effects of weight change.

Authors: Peter Würtz; Qin Wang; Antti J Kangas; Rebecca C Richmond; Joni Skarp; Mika Tiainen; Tuulia Tynkkynen; Pasi Soininen; Aki S Havulinna; Marika Kaakinen; Jorma S Viikari; Markku J Savolainen; Mika Kähönen; Terho Lehtimäki; Satu Männistö; Stefan Blankenberg; Tanja Zeller; Jaana Laitinen; Anneli Pouta; Pekka Mäntyselkä; Mauno Vanhala; Paul Elliott; Kirsi H Pietiläinen; Samuli Ripatti; Veikko Salomaa; Olli T Raitakari; Marjo-Riitta Järvelin; George Davey Smith; Mika Ala-Korpela
Journal: PLoS Med Date: 2014-12-09 Impact factor: 11.069

8. Genome-wide study for circulating metabolites identifies 62 loci and reveals novel systemic effects of LPA.

Authors: Johannes Kettunen; Ayşe Demirkan; Peter Würtz; Harmen H M Draisma; Toomas Haller; Rajesh Rawal; Anika Vaarhorst; Antti J Kangas; Leo-Pekka Lyytikäinen; Matti Pirinen; René Pool; Antti-Pekka Sarin; Pasi Soininen; Taru Tukiainen; Qin Wang; Mika Tiainen; Tuulia Tynkkynen; Najaf Amin; Tanja Zeller; Marian Beekman; Joris Deelen; Ko Willems van Dijk; Tõnu Esko; Jouke-Jan Hottenga; Elisabeth M van Leeuwen; Terho Lehtimäki; Evelin Mihailov; Richard J Rose; Anton J M de Craen; Christian Gieger; Mika Kähönen; Markus Perola; Stefan Blankenberg; Markku J Savolainen; Aswin Verhoeven; Jorma Viikari; Gonneke Willemsen; Dorret I Boomsma; Cornelia M van Duijn; Johan Eriksson; Antti Jula; Marjo-Riitta Järvelin; Jaakko Kaprio; Andres Metspalu; Olli Raitakari; Veikko Salomaa; P Eline Slagboom; Melanie Waldenberger; Samuli Ripatti; Mika Ala-Korpela
Journal: Nat Commun Date: 2016-03-23 Impact factor: 14.919

9. MR-PheWAS: hypothesis prioritization among potential causal effects of body mass index on many outcomes, using Mendelian randomization.

Authors: Louise A C Millard; Neil M Davies; Nic J Timpson; Kate Tilling; Peter A Flach; George Davey Smith
Journal: Sci Rep Date: 2015-11-16 Impact factor: 4.379

10. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis.

Authors: Eleanor Wheeler; Aaron Leong; Ching-Ti Liu; Marie-France Hivert; Rona J Strawbridge; Clara Podmore; Man Li; Jie Yao; Xueling Sim; Jaeyoung Hong; Audrey Y Chu; Weihua Zhang; Xu Wang; Peng Chen; Nisa M Maruthur; Bianca C Porneala; Stephen J Sharp; Yucheng Jia; Edmond K Kabagambe; Li-Ching Chang; Wei-Min Chen; Cathy E Elks; Daniel S Evans; Qiao Fan; Franco Giulianini; Min Jin Go; Jouke-Jan Hottenga; Yao Hu; Anne U Jackson; Stavroula Kanoni; Young Jin Kim; Marcus E Kleber; Claes Ladenvall; Cecile Lecoeur; Sing-Hui Lim; Yingchang Lu; Anubha Mahajan; Carola Marzi; Mike A Nalls; Pau Navarro; Ilja M Nolte; Lynda M Rose; Denis V Rybin; Serena Sanna; Yuan Shi; Daniel O Stram; Fumihiko Takeuchi; Shu Pei Tan; Peter J van der Most; Jana V Van Vliet-Ostaptchouk; Andrew Wong; Loic Yengo; Wanting Zhao; Anuj Goel; Maria Teresa Martinez Larrad; Dörte Radke; Perttu Salo; Toshiko Tanaka; Erik P A van Iperen; Goncalo Abecasis; Saima Afaq; Behrooz Z Alizadeh; Alain G Bertoni; Amelie Bonnefond; Yvonne Böttcher; Erwin P Bottinger; Harry Campbell; Olga D Carlson; Chien-Hsiun Chen; Yoon Shin Cho; W Timothy Garvey; Christian Gieger; Mark O Goodarzi; Harald Grallert; Anders Hamsten; Catharina A Hartman; Christian Herder; Chao Agnes Hsiung; Jie Huang; Michiya Igase; Masato Isono; Tomohiro Katsuya; Chiea-Chuen Khor; Wieland Kiess; Katsuhiko Kohara; Peter Kovacs; Juyoung Lee; Wen-Jane Lee; Benjamin Lehne; Huaixing Li; Jianjun Liu; Stephane Lobbens; Jian'an Luan; Valeriya Lyssenko; Thomas Meitinger; Tetsuro Miki; Iva Miljkovic; Sanghoon Moon; Antonella Mulas; Gabriele Müller; Martina Müller-Nurasyid; Ramaiah Nagaraja; Matthias Nauck; James S Pankow; Ozren Polasek; Inga Prokopenko; Paula S Ramos; Laura Rasmussen-Torvik; Wolfgang Rathmann; Stephen S Rich; Neil R Robertson; Michael Roden; Ronan Roussel; Igor Rudan; Robert A Scott; William R Scott; Bengt Sennblad; David S Siscovick; Konstantin Strauch; Liang Sun; Morris Swertz; Salman M Tajuddin; Kent D Taylor; Yik-Ying Teo; Yih Chung Tham; 
Anke Tönjes; Nicholas J Wareham; Gonneke Willemsen; Tom Wilsgaard; Aroon D Hingorani; Josephine Egan; Luigi Ferrucci; G Kees Hovingh; Antti Jula; Mika Kivimaki; Meena Kumari; Inger Njølstad; Colin N A Palmer; Manuel Serrano Ríos; Michael Stumvoll; Hugh Watkins; Tin Aung; Matthias Blüher; Michael Boehnke; Dorret I Boomsma; Stefan R Bornstein; John C Chambers; Daniel I Chasman; Yii-Der Ida Chen; Yduan-Tsong Chen; Ching-Yu Cheng; Francesco Cucca; Eco J C de Geus; Panos Deloukas; Michele K Evans; Myriam Fornage; Yechiel Friedlander; Philippe Froguel; Leif Groop; Myron D Gross; Tamara B Harris; Caroline Hayward; Chew-Kiat Heng; Erik Ingelsson; Norihiro Kato; Bong-Jo Kim; Woon-Puay Koh; Jaspal S Kooner; Antje Körner; Diana Kuh; Johanna Kuusisto; Markku Laakso; Xu Lin; Yongmei Liu; Ruth J F Loos; Patrik K E Magnusson; Winfried März; Mark I McCarthy; Albertine J Oldehinkel; Ken K Ong; Nancy L Pedersen; Mark A Pereira; Annette Peters; Paul M Ridker; Charumathi Sabanayagam; Michele Sale; Danish Saleheen; Juha Saltevo; Peter Eh Schwarz; Wayne H H Sheu; Harold Snieder; Timothy D Spector; Yasuharu Tabara; Jaakko Tuomilehto; Rob M van Dam; James G Wilson; James F Wilson; Bruce H R Wolffenbuttel; Tien Yin Wong; Jer-Yuarn Wu; Jian-Min Yuan; Alan B Zonderman; Nicole Soranzo; Xiuqing Guo; David J Roberts; Jose C Florez; Robert Sladek; Josée Dupuis; Andrew P Morris; E-Shyong Tai; Elizabeth Selvin; Jerome I Rotter; Claudia Langenberg; Inês Barroso; James B Meigs
Journal: PLoS Med Date: 2017-09-12 Impact factor: 11.069
