Joseph K Pickrell1,2, Tomaz Berisa1, Jimmy Z Liu1, Laure Ségurel3, Joyce Y Tung4, David A Hinds4. 1. New York Genome Center, New York, New York, USA. 2. Department of Biological Sciences, Columbia University, New York, New York, USA. 3. UMR 7206 Eco-Anthropologie et Ethnobiologie, CNRS, MNHN, Université Paris Diderot, Sorbonne Paris Cité, Paris, France. 4. 23andMe, Inc., Mountain View, California, USA.
Abstract
We performed a scan for genetic variants associated with multiple phenotypes by comparing large genome-wide association studies (GWAS) of 42 traits or diseases. We identified 341 loci (at a false discovery rate of 10%) associated with multiple traits. Several loci are associated with multiple phenotypes; for example, a nonsynonymous variant in the zinc transporter SLC39A8 influences seven of the traits, including risk of schizophrenia (rs13107325: log-transformed odds ratio (log OR) = 0.15, P = 2 × 10^−12) and Parkinson disease (log OR = −0.15, P = 1.6 × 10^−7), among others. Second, we used these loci to identify traits that have multiple genetic causes in common. For example, variants associated with increased risk of schizophrenia also tended to be associated with increased risk of inflammatory bowel disease. Finally, we developed a method to identify pairs of traits that show evidence of a causal relationship. For example, we show evidence that increased body mass index causally increases triglyceride levels.
The observation that a genetic variant affects multiple phenotypes (a phenomenon often called “pleiotropy” [1-3], though we will not use this term) is informative in a number of applications. One such application is to learn about the molecular function of a gene. For example, men with cystic fibrosis (primarily known as a lung disease) are often infertile due to congenital absence of the vas deferens; this is evidence of a shared role for the CFTR protein in lung function and the development of reproductive organs [4]. Another application is to learn about the causal relationships between traits. For example, individuals with congenital hypercholesterolemia also have elevated risk of heart disease [5]; this is now interpreted as evidence that changes in lipid levels causally influence heart disease risk [6].
In these two applications, the same observation–that a genetic variant influences two traits–is interpreted in fundamentally different ways depending on known aspects of biology. In the first case, a genetic variant influences the two phenotypes through independent physiological mechanisms (graphically: P1 ← G → P2, if G represents the genotype, P1 the first phenotype, P2 the second phenotype, and the arrows represent causal relationships [7]), while in the second case, G → P1 → P2. In some situations, knowing which interpretation of the observation to prefer is simple: for example, it seems difficult to imagine how the reproductive and lung phenotypes of a CFTR mutation could be related in a causal chain. In other situations, interpretation is considerably more challenging. For example, the causal connections between various lipid phenotypes and heart disease have been debated for decades (e.g. [8]).
As the number of reliable associations between genetic variants and various phenotypes has grown over the last decade [9], these issues have received increasing attention. A number of recent studies have identified genetic variants associated with multiple traits [10-20]; in general, these associations are interpreted as most plausibly due to independent effects of a genetic variant on different aspects of physiology. For example, a genetic variant in LGR4 is associated with bone mineral density (BMD), age at menarche, and risk of gallbladder cancer [16], presumably due to effects mediated through different tissues.
There has also been increasing interest in the alternative, causal framework for interpreting genetic variants that influence multiple phenotypes, which has been formalized under the name “Mendelian randomization” [21-23]. Mendelian randomization has been used to provide evidence for (or against) a causal role for various clinical variables in disease etiology [24-30]. For example, genetic variants associated with body mass index (BMI) are also associated with type 2 diabetes [27]; this is consistent with a causal role for weight gain in the etiology of diabetes.
To date, most studies of multiple traits have been performed genome-wide on groups of traits already known or hypothesized to be related [10;31-33], or via testing small sets of variants for effects on a wide range of traits [20;34]. We aimed to systematically perform a genome-wide search for genetic variants that influence pairs of traits, and then to interpret these associations in the light of the causal and non-causal models described above. In this paper, we describe the results of such a search using large genome-wide association studies of 42 traits.
Results
We assembled summary statistics from 43 genome-wide association studies of 42 traits or diseases performed in individuals of European descent (Table 1; two of these GWAS are for age at menarche). These studies span a wide range of phenotypes, from anthropometric traits (e.g. height, BMI, nose size) to neurological disease (e.g. Alzheimer's disease, Parkinson's disease) to susceptibility to infection (e.g. childhood ear infections, tonsillectomy). 17 of these GWAS were performed by the personal genomics company 23andMe, and have not previously been reported (for details of these studies, see Supplementary Data Sets 1-17). For studies that were not done using imputation to all variants in phase 1 of the 1000 Genomes Project [35], we performed imputation at the level of summary statistics using ImpG v1.0 [36]. We estimated the approximate number of independent associated variants (at a false discovery rate of 10%) in each study using fgwas v.0.3.6 [37]. The number of associations ranged from around five (for age at voice drop in men) to over 500 (for height).
Table 1
Phenotypes used in this study
For each study, we show the name of the phenotype, the abbreviation that will be used throughout this paper, the data source, the number of independent autosomal loci identified at a false discovery rate of 10%, and the number of participants in the study. For studies where the data source is 23andMe, a complete description of the GWAS is presented in the Supplementary Material.
| Phenotype | Abbreviation | Data source | Approx. # of loci | Approx. # of participants, in thousands (cases / controls, if applicable) |
|---|---|---|---|---|
| Neurological phenotypes | | | | |
| Alzheimer's disease | AD | [75] | 11 | 17 / 37 |
| Migraine | MIGR | 23andMe | 37 | 53 / 231 |
| Parkinson's disease | PD | 23andMe | 43 | 10 / 325 |
| Photic sneeze reflex | PS | 23andMe | 66 | 32 / 67 |
| Schizophrenia | SCZ | [59] | 222 | 34 / 46 |
| Anthropometric/social traits | | | | |
| Beighton hypermobility | BHM | 23andMe | 18 | 64 |
| Breast size | CUP | 23andMe | 14 | 34 |
| Body mass index | BMI | [72] | 30 | 240 |
| Bone mineral density (femoral neck) | FNBMD | [17] | 19 | 33 |
| Bone mineral density (lumbar spine) | LSBMD | [17] | 21 | 32 |
| Chin dimples | DIMP | 23andMe | 57 | 58 / 13 |
| Educational attainment | EDU | [76] | 93 | 294 |
| Height | HEIGHT | [71] | 584 | 253 |
| Male pattern baldness | MPB | 23andMe | 49 | 9 / 8 |
| Nearsightedness | NST | 23andMe | 183 | 106 / 86 |
| Nose size | NOSE | 23andMe | 13 | 67 |
| Waist-hip ratio | WHR | [77] | 13 | 143 |
| Unibrow | UB | 23andMe | 61 | 69 |
| Immune-related traits | | | | |
| Any allergies | ALL | 23andMe | 43 | 67 / 114 |
| Asthma | ATH | 23andMe | 35 | 28 / 129 |
| Childhood ear infections | CEI | 23andMe | 15 | 47 / 75 |
| Crohn's disease | CD | [78] | 61 | 6 / 15 |
| Hypothyroidism | HTHY | 23andMe | 30 | 18 / 117 |
| Rheumatoid arthritis | RA | [79] | 74 | 14 / 44 |
| Tonsillectomy | TS | 23andMe | 48 | 60 / 113 |
| Ulcerative colitis | UC | [78] | 42 | 7 / 21 |
| Metabolic phenotypes | | | | |
| Age at menarche | AAM | [43] | 70 | 133 |
| Age at menarche (23andMe) | AAM (23) | 23andMe | 55 | 77 |
| Age at voice drop | AVD | 23andMe | 5 | 56 |
| Coronary artery disease | CAD | [45] | 11 | 22 / 65 |
| Type 2 diabetes | T2D | [80] | 11 | 12 / 57 |
| Fasting glucose | FG | [81] | 15 | 58 |
| Low-density lipoproteins | LDL | [82] | 41 | 85 |
| High-density lipoproteins | HDL | [82] | 46 | 89 |
| Triglycerides | TG | [82] | 31 | 86 |
| Total cholesterol | TC | [82] | 53 | 89 |
| Hematopoietic traits | | | | |
| Hemoglobin | HB | [83] | 16 | 51 |
| Mean cell hemoglobin concentration | MCHC | [83] | 15 | 46 |
| Mean red cell volume | MCV | [83] | 42 | 48 |
| Packed red cell volume | PCV | [83] | 13 | 44 |
| Red blood cell count | RBC | [83] | 25 | 45 |
| Platelet count | PLT | [84] | 50 | 44 |
| Mean platelet volume | MPV | [84] | 29 | 17 |
Identification of genetic variants that influence pairs of traits
We first aimed to identify genetic variants that influence pairs of traits. To do this, we developed a statistical model (extending that used by Giambartolomei et al. [38]) to estimate the probability that a given genomic region either 1) contains a genetic variant that influences only the first trait, 2) contains a genetic variant that influences only the second trait, 3) contains a genetic variant that influences both traits, or 4) contains both a genetic variant that influences the first trait and a separate genetic variant that influences the second trait (Figure 1). The input to the model is the set of summary statistics (effect size estimates and standard errors) for each SNP in the genome on each of the two phenotypes, and (if the two GWAS were performed on overlapping sets of individuals) the expected correlation in the summary statistics due to correlation between the phenotypes. We can then fit the following log-likelihood function:
$$ l(\Theta \mid D) = \sum_{i=1}^{M} \ln\left( \Pi_0 + \sum_{j=1}^{4} \Pi_j \, \mathrm{RBF}_i^{(j)} \right) $$
where D is the data, M is the number of approximately independent blocks in the genome, Π0 is the prior probability that a region contains no genetic variants that influence either trait, Π1, Π2, Π3 and Π4 represent the prior probabilities of the four models described above, Θ is the set of all five Π parameters, and $\mathrm{RBF}_i^{(j)}$ is the regional Bayes factor measuring the support for model j in genomic region i (see Supplementary Information for details). In the presence of missing data, we consider only the subset of SNPs with data in both studies; if the causal SNP is not present, this acts to reduce power to detect a shared effect [38]. In fitting this model, we estimate the prior parameters and the posterior probability of each model for each region of the genome (for numerical stability, in practice we penalize the estimates of the prior parameters, and so obtain maximum a posteriori estimates).
We were mainly interested in the estimated prior probability that each genomic region contains a variant that influences both traits (the estimate of Π3) and the corresponding posterior probabilities for each genomic region.
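As an illustration of how the mixture log-likelihood and the per-region posterior probabilities relate to the priors and regional Bayes factors, the following is a minimal Python sketch. It assumes the regional Bayes factors have already been computed, and it evaluates the likelihood at fixed priors rather than fitting them (no optimization or penalization is shown). The function names and all numeric values are hypothetical and are not taken from the study.

```python
import math

def log_likelihood(priors, rbfs):
    """Mixture log-likelihood summed over genomic regions.

    priors: [Pi0, Pi1, Pi2, Pi3, Pi4], assumed to sum to 1.
    rbfs:   one list [RBF1, RBF2, RBF3, RBF4] per region, each Bayes
            factor measured against the null model of no association.
    """
    total = 0.0
    for region in rbfs:
        mix = priors[0] + sum(p * b for p, b in zip(priors[1:], region))
        total += math.log(mix)
    return total

def posteriors(priors, region):
    """Posterior probability of the null model and of models 1-4 for one region."""
    weights = [priors[0]] + [p * b for p, b in zip(priors[1:], region)]
    norm = sum(weights)
    return [w / norm for w in weights]

# Hypothetical values, for illustration only.
priors = [0.90, 0.04, 0.04, 0.01, 0.01]
rbfs = [
    [1.0, 1.0, 1.0, 1.0],       # region with no evidence for any model
    [50.0, 2.0, 5000.0, 10.0],  # region with strong support for model 3
]

print(log_likelihood(priors, rbfs))
print(posteriors(priors, rbfs[1]))
```

In this toy input the second region would pass a posterior threshold of 0.9 for model 3, which is the kind of cutoff used later in the paper to call a shared locus.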
Figure 1
Schematic of the different models considered for a given genomic region and two GWAS
We divide the genome into approximately independent blocks (see Methods), and estimate the proportion of blocks that fit into the shown patterns. The null model with no associations is not shown. Each point represents a single genetic variant.
Several caveats of this method are worth mentioning. First, note that the estimate of Π3 is best thought of as the proportion of genomic regions that detectably influence both traits–if one study is small and underpowered, this estimate will necessarily be zero. This contrasts with methods that aim to provide unbiased estimates of the “genetic correlation” between traits that do not depend on sample size [39-41]. Second, in general it is not possible to distinguish a single causal variant that influences both traits (Model 3 in Figure 1) from two separate causal variants (Model 4 in Figure 1) in the presence of strong linkage disequilibrium between the causal variants. For any individual genomic region discussed below, the possibility of two highly correlated causal variants must be considered as an alternative in the absence of functional follow-up. (Indeed, this latter possibility appears to be common in quantitative trait locus studies performed in model organisms [42].) Finally, we evaluated the method in simulations (Supplementary Figures 1-5), and found that the model gives a small overestimate of the proportion of shared effects (Supplementary Figure 3). This is because the amount of evidence against the null model of no associations is greater when a variant influences both phenotypes than when it influences only a single phenotype (Supplementary Figure 4).
Overlapping association signals identified in 43 GWAS
We applied the method to all pairs of the 43 GWAS listed in Table 1. For each pair of studies, we first estimated the expected correlation in the effect sizes from the summary statistics, and included this correction for overlapping individuals in the model. Note that this is conservative: in pairs of GWAS where we are sure there are no overlapping individuals (for example, age at menarche and age at voice drop) we see that the correlation in the summary statistics is non-zero, indicating that we are correcting out some truly shared genetic effects on the two traits (Supplementary Figure 6).
To gain an exploratory sense of the relationships between the phenotypes, we examined the patterns of overlap in associations among all 43 studies. Specifically, the model can be used to estimate, for each pair of traits [i,j], the proportion of detected variants that influence trait i that also detectably influence trait j. These estimates are shown in Figure 2, with phenotypes clustered according to their patterns of overlap. We see several clusters of related traits. For example, of the variants that detectably influence age at menarche (in the Perry et al. [43] study), the maximum a posteriori estimate is that 36% detectably influence height, 30% detectably influence age at voice drop, 28% influence BMI, 10% influence breast size, and 10% influence male pattern baldness. We interpret this as a set of phenotypes that share hormonal regulation. Additionally, there is a large cluster of phenotypes including coronary artery disease, type 2 diabetes, red blood cell traits, and lipid traits, which we interpret as a set of metabolic traits. Further, immune-related diseases (allergies, asthma, hypothyroidism, Crohn's disease and rheumatoid arthritis) all cluster together, and also cluster with infectious disease traits (childhood ear infections and tonsillectomy).
This biologically relevant clustering validates the principle that GWAS variants can identify shared mechanisms underlying pairs of traits in a systematic way. As a control, we performed the same clustering of phenotypes by the estimated proportion of genomic regions where two causal sites fall nearby (Model 4 in Figure 1). In this case, there was no biologically meaningful clustering (Supplementary Figure 7).
Figure 2
Heatmap showing patterns of overlap between traits
Each square [i,j] shows the maximum a posteriori estimate of the proportion of genetic variants that influence trait i that also influence trait j, where i indexes rows and j indexes columns. Note that this is not symmetric. Darker colors represent larger proportions. Colors are shown for all pairs of traits that have at least one region in the set of 341 identified loci; all other pairs are set to white. Phenotypes were clustered by hierarchical clustering in R [74].
Individual loci that influence many traits
We next examined the individual loci identified by these pairwise GWAS. We identified 341 genomic regions where we infer the presence of a variant that influences a pair of traits, at a threshold of a posterior probability greater than 0.9 of model 3 (Supplementary Table 1). This number excludes “trivial” findings where a genetic variant influences two similar traits (two lipid traits, two red blood cell traits, two platelet traits, both measures of bone mineral density, both inflammatory bowel diseases, or type 2 diabetes and fasting glucose) and the MHC region. A previous “phenome-wide association study” identified 44 genetic variants associated with multiple phenotypes [34], so this represents an order-of-magnitude increase in the number of such loci.
Some genomic regions contain variants that influence a large number of the traits we considered. We ranked each genomic region according to how many phenotypes share genetic associations in the region (that is, if the pairwise scan for both height and CAD, and the pairwise scan for CAD and LDL, both indicated the same region, we counted this as three phenotypes sharing an association in the region). The top region in this ranking identified a non-synonymous polymorphism in SH2B3 (rs3184504) that is associated with a number of autoimmune diseases, lipid traits, heart disease, and red blood cell traits (Supplementary Figure 8; Supplementary Table 2). This variant has been identified in many GWAS, particularly for autoimmune disease [44].
The next region in this ranking contains the gene coding for the ABO histo-blood groups in humans, and has a variant associated with 11 traits in these data (and many other additional traits not in these data, see also [20;45-47]). In Figure 3A, we show the association statistics in this region for coronary artery disease and probability of having a tonsillectomy. At the lead SNP, the non-reference allele is associated with increased risk of CAD (Z = 5.7; P = 1.1 × 10^−8) and increased risk of having a tonsillectomy (Z = 6.0; P = 1.5 × 10^−9). This variant is also strongly associated with other immune, red blood cell, and lipid traits in these data (Figure 3B). A tag for a microsatellite that influences the expression of ABO [48] is correlated to the lead SNP rs635634, as is a tag for the O blood group (Figure 3A). However, the lead SNP is an eQTL for both ABO and the nearby gene SLC2A6 in whole blood [46], so this allele may in fact have downstream effects via effects on the expression of two genes.
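The region ranking described above (counting the distinct phenotypes implicated across all pairwise scans that flag the same region) can be sketched as follows. This is a minimal Python illustration under the assumption that each pairwise hit is available as a (region, trait, trait) tuple; the function name `rank_regions`, the region identifiers, and the example hits are hypothetical.

```python
from collections import defaultdict

def rank_regions(pairwise_hits):
    """Rank regions by the number of distinct traits sharing an association.

    pairwise_hits: iterable of (region_id, trait_a, trait_b) tuples, one per
    pairwise scan that flagged the region.
    Returns (region_id, n_traits) pairs, most traits first; ties are broken
    by region_id so the ordering is deterministic.
    """
    traits_by_region = defaultdict(set)
    for region, trait_a, trait_b in pairwise_hits:
        traits_by_region[region].update((trait_a, trait_b))
    counts = [(region, len(traits)) for region, traits in traits_by_region.items()]
    return sorted(counts, key=lambda rc: (-rc[1], rc[0]))

# Hypothetical hits: two scans point at region_A, one at region_B.
hits = [
    ("region_A", "HEIGHT", "CAD"),
    ("region_A", "CAD", "LDL"),
    ("region_B", "BMI", "TG"),
]
ranking = rank_regions(hits)
print(ranking)
```

Using a set per region means a trait that appears in several pairwise scans (CAD in the example) is counted once, which matches the "three phenotypes" counting rule stated in the text.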
Figure 3
Multiple associations near the ABO gene. A. Association signals for coronary artery disease and tonsillectomy
In the top panel, we show the P-values for association with coronary artery disease for variants in the window around the ABO gene. In the bottom panel are the P-values for association with tonsillectomy. In both panels, SNPs that tag functionally-important alleles at ABO are in color. In the middle are the gene models in the region–exons are denoted by blue boxes, and introns with red lines. Note that the ABO gene is transcribed on the negative strand. B. Association effect sizes for rs635634 on all tested traits. Shown are the effect size estimates for rs635634 for all traits. The lines represent 95% confidence intervals. Traits are grouped according to whether they are quantitative traits (in which case the x-axis is in units of standard deviations) or case/control traits (in which case the x-axis is in units of log-odds).
Among the top-ranked regions are several where the likely causal variant is known:
- A non-synonymous variant in the zinc transporter SLC39A8 (rs13107325; Supplementary Figure 9) that is associated with schizophrenia (log-odds ratio of the non-reference allele = 0.15, P = 2 × 10^−12), Parkinson's disease (log-odds ratio = −0.15, P = 1.6 × 10^−7), and height (P = 3.8 × 10^−7), among others.
- A non-synonymous variant in the glucokinase regulator GCKR (rs1260326; Supplementary Figure 10) that is associated with fasting glucose (P = 5 × 10^−25) and height (P = 2.6 × 10^−11), among others.
- A set of variants near the APOE gene (which we presume to be driven by the APOE4 allele; Supplementary Figure 11) that is associated with nearsightedness (rs6857 log-odds ratio = −0.04, P = 1.8 × 10^−5), waist-hip ratio (P = 8.3 × 10^−5), and several lipid traits, apart from the well-known association with Alzheimer's disease.
- Regulatory variants in an intron of the FTO gene [49;50] that are associated with breast size in women (Supplementary Figure 12; rs1421085, P = 3.5 × 10^−7) and age at voice drop in men (P = 2.7 × 10^−5), among others.
It has previously been observed that association signals for different phenotypes tend to cluster spatially in the genome [51]; these results suggest that in some cases clustered associations are driven by single variants. We note anecdotally that the variants that influence a large number of phenotypes seem to often be non-synonymous, rather than regulatory, changes, which contrasts with the pattern seen in association studies overall (e.g. [37]).
Identifying pairs of phenotypes with correlated effect sizes
In our scan for variants that influence pairs of phenotypes, we did not assume any relationship between the effect sizes of a variant on the two phenotypes. However, if two traits are influenced by shared underlying molecular mechanisms, we might expect the effects of a variant on the two phenotypes to be correlated. To test this, we returned to the set of variants identified by analysis of each phenotype individually (the numbers of these variants for each trait are in Table 1). For each set, we calculated the rank correlation between the effect sizes of the variants on the index trait (the one in which the variants were identified) and all of the other traits.
The results of this analysis are presented in Figure 4. Apart from closely related traits (e.g. the two measurements of bone density), we see a number of traits that are correlated at a genetic level. We focus on two of these. First, variants that delay age of menarche in women tend, on average, to decrease BMI (ρ = −0.53, P = 1.2 × 10^−6), reduce risk of male pattern baldness (ρ = −0.45, P = 5.9 × 10^−5), and increase height (ρ = 0.52, P = 2.2 × 10^−6; Figure 4). These patterns hold both for the GWAS on age at menarche performed by Perry et al. [43] and that performed by 23andMe (Figure 4). Most of these variants also delay age at voice drop in men (Figure 2), so we interpret these variants as ones that influence pubertal timing in general. The negative correlation between a variant's effect on age at menarche and BMI has previously been observed [39;43;52], as has the positive correlation between a variant's effect on age at menarche and height [39;43]. The negative correlation between a variant's effect on age at menarche (or more likely, puberty in general) and male pattern baldness has not been previously noted, but is consistent with the known role for increased androgen signaling in causing hair loss [53-55].
Figure 4
Heatmap showing patterns of correlated effect sizes of variants across pairs of traits
For each pair of traits [i,j], we extracted the set of variants that influence trait i and their effect sizes on both i and j. We then calculated Spearman's rank correlation between the effect sizes on i and the effect sizes on j, and tested whether this correlation was significantly different from zero. Shown in color are all pairs where this test had a P-value less than 0.01. Darker colors correspond to smaller P-values, and the color corresponds to the direction of the correlation (in red are positive correlations and in blue are negative correlations). The phenotypes are in the same order as in Figure 2. For a comparison to genome-wide genetic correlations, see Supplementary Figure 13.
Second, we find that genetic variants that increase risk of schizophrenia tend to increase risk of both Crohn's disease (ρ = 0.27, P = 2.2 × 10^−4) and ulcerative colitis (ρ = 0.33, P = 6.6 × 10^−6). These correlations (identified only at “significant” SNPs) are also present at the level of genome-wide genetic correlations between the diseases ([39], Supplementary Figure 13). This observation is consistent with slightly higher rates of autoimmune diseases (including Crohn's and ulcerative colitis) in schizophrenia patients in Denmark [56-58], and with molecular evidence for a partial autoimmune etiology for schizophrenia (e.g. [59]).
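The rank-correlation analysis used in this section can be illustrated with a short, dependency-free Python sketch. Spearman's ρ is computed as the Pearson correlation of average ranks. The P-value uses a Fisher-transformation normal approximation, which is an assumption made here for illustration and may differ from the test used in the study. The function names and the effect-size values are hypothetical.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2.0 + 1.0  # mean of ranks pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)

def approx_two_sided_p(rho, n):
    """Approximate two-sided P-value via the Fisher transformation.

    Assumption for illustration only; requires n > 3.
    """
    if abs(rho) >= 1.0:
        return 0.0
    z = math.atanh(rho) * math.sqrt((n - 3) / 1.06)
    return math.erfc(abs(z) / math.sqrt(2.0))

# Hypothetical effect sizes of six variants on the index trait and another trait.
effects_index = [0.10, 0.08, 0.05, 0.04, 0.03, 0.02]
effects_other = [-0.06, -0.07, -0.02, -0.03, 0.01, -0.01]

rho = spearman(effects_index, effects_other)
p_value = approx_two_sided_p(rho, len(effects_index))
print(rho, p_value)
```

A rank-based statistic is a natural choice here because effect sizes on different traits are on different scales (standard deviations versus log-odds), and ranks are unaffected by any monotone rescaling.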
Inferring causal relationships between traits
Finally, we were interested in identifying pairs of traits that may be related in a causal manner. Since we are using observational data (rather than, for example, a randomized controlled trial), we view strong statements about causality as impossible. Nonetheless, a realistic goal might be to identify aspects of the data that are more consistent with a causal model versus a non-causal model.
As a motivating example, we considered the correlation between levels of LDL cholesterol and risk of coronary artery disease, now widely accepted as a causal relationship [60]. We noticed that variants ascertained as having an effect on LDL cholesterol levels have correlated effects on risk of coronary artery disease (Figure 4, Figure 5C), while variants ascertained as having an effect on CAD risk do not in general have correlated effects on LDL levels (Figure 5D). This is consistent with the hypothesis that LDL cholesterol is one of many causal factors that influence CAD risk. An alternative interpretation is that LDL cholesterol is highly genetically correlated to an unobserved trait that causally influences risk of CAD.
Figure 5
Putative causal relationships between pairs of traits
For each pair of traits identified as candidates to be related in a causal manner (see Methods), we show the effect sizes of genetic variants on the two traits (at genetic variants successfully genotyped or imputed in both studies). Lines represent one standard error. A. and B. BMI and triglycerides. The effect sizes of genetic variants on BMI and triglyceride levels for variants identified in the GWAS for BMI (A.) or triglycerides (B.). C. and D. LDL and coronary artery disease. The effect sizes of genetic variants on LDL levels and coronary artery disease for variants identified in the GWAS for LDL (C.) or coronary artery disease (D.). E. and F. BMI and type 2 diabetes. The effect sizes of genetic variants on BMI and type 2 diabetes for variants identified in the GWAS for BMI (E.) or type 2 diabetes (F.). G. and H. Hypothyroidism and height. The effect sizes of genetic variants on hypothyroidism and height for variants identified in the GWAS for hypothyroidism (G.) or height (H.).
We developed a method to detect pairs of traits that show this asymmetry in the effect sizes of associated variants, which we interpret as more consistent with a causal relationship between the traits than a non-causal one (Methods). At a threshold of a relative likelihood of 100 in favor of a causal versus a non-causal model, we identified five pairs of putative causally-related traits. (At a less stringent threshold of a relative likelihood of 20 in favor of a causal model, we identified 11 additional pairs of traits; Supplementary Figure 14.) Simulations suggest this threshold corresponds approximately to a P-value around 0.001 (Supplementary Figure 15), and that the power of this test depends on the number of genetic variants used as input and the true underlying correlation in their effect sizes (Supplementary Figure 16). Four of the five pairs are shown in Figure 5. First, genetic variants that influence BMI have correlated effects on triglyceride levels, while the reverse is not true; this suggests increased BMI is a cause of increased triglyceride levels (Figure 5). Randomized controlled trials of weight loss are also consistent with this causal link [61;62], as are Mendelian randomization studies [63;64]. Second, we confirm the evidence in favor of a causal role for increased LDL cholesterol in coronary artery disease (Figure 5), and in favor of a causal role for increased BMI in type 2 diabetes risk (Figure 5, Supplementary Figure 17). Finally, we suggest that increased risk of hypothyroidism causes decreased height (Figure 5). While it is known that severe hypothyroidism in childhood leads to decreased adult height (e.g. [65]), these data indicate that hypothyroidism susceptibility may also influence height in the general population. A fifth potentially causal relationship (between risk of coronary artery disease and rheumatoid arthritis) could not be confirmed in a larger study and so is not displayed (see Supplementary Information, Supplementary Figure 18).
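To make the idea of an asymmetry test concrete, the following Python sketch compares "causal" and "non-causal" models using two rank correlations: one among variants ascertained for trait 1 and one among variants ascertained for trait 2. It is a simplified illustration built on assumptions made here (Fisher-transformed correlations treated as normally distributed with standard error 1/sqrt(n − 3), and four candidate models); it should not be read as the authors' implementation, which is described in their Methods. The function names and the example inputs are hypothetical.

```python
import math

def normal_loglik(observed, expected, se):
    """Log density of a normal distribution with mean `expected` and s.d. `se`."""
    return (-0.5 * math.log(2.0 * math.pi * se * se)
            - (observed - expected) ** 2 / (2.0 * se * se))

def asymmetry_relative_likelihood(rho1, n1, rho2, n2):
    """Relative likelihood of the best causal model versus the best non-causal one.

    rho1: correlation of effect sizes among the n1 variants ascertained for trait 1.
    rho2: correlation of effect sizes among the n2 variants ascertained for trait 2.
    Requires n1 > 3, n2 > 3 and |rho| < 1 (assumptions of the Fisher approximation).
    """
    z1, z2 = math.atanh(rho1), math.atanh(rho2)
    se1, se2 = 1.0 / math.sqrt(n1 - 3), 1.0 / math.sqrt(n2 - 3)
    w1, w2 = 1.0 / se1 ** 2, 1.0 / se2 ** 2
    shared = (w1 * z1 + w2 * z2) / (w1 + w2)  # best common value when z1 == z2

    def joint(expected1, expected2):
        return normal_loglik(z1, expected1, se1) + normal_loglik(z2, expected2, se2)

    # Causal models: correlation present in one ascertainment set only.
    causal = max(joint(z1, 0.0), joint(0.0, z2))
    # Non-causal models: no correlation at all, or the same correlation in both sets.
    non_causal = max(joint(0.0, 0.0), joint(shared, shared))
    return math.exp(causal - non_causal)

# Hypothetical inputs: a clear asymmetry, then a symmetric pattern.
asymmetric = asymmetry_relative_likelihood(0.6, 30, 0.05, 40)
symmetric = asymmetry_relative_likelihood(0.5, 30, 0.5, 30)
print(asymmetric, symmetric)
```

In this sketch a value well above 1 indicates that the data favor an asymmetric (causal-like) pattern, whereas a symmetric pattern of correlated effects yields a value below 1; a cutoff such as 20 or 100 would then play the role of the thresholds described in the text.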
Discussion
We have performed a scan for genetic variants that influence multiple phenotypes, and have identified several hundred loci that influence multiple traits. This style of scan complements methods to quantify the “genetic correlation” between two traits [39;41;66;67], which are not generally concerned with identifying individual variants that influence both traits. We were interested in using the individual variants we identified to uncover biological relationships between traits, including potential relationships in which one trait is causally upstream of the other. Other potential mechanisms that could lead to an association between a genetic variant and two phenotypes include trans-generational effects of a variant on a parental phenotype and a separate phenotype in the offspring (e.g. [68;69]) or assortative mating that involves more than a single trait [70].
A number of limitations of this study are worth mentioning. First, all of the GWAS we have used are based on genotyping arrays and imputation, and so the loci identified are generally common (over 1% minor allele frequency). Inferences from common variants like these may not hold for rarer variants that may emerge from large sequencing studies. Second, we reiterate that all of our inferences are based on sets of “detectable” loci; the GWAS we have used have highly variable sample sizes, and the traits have variable genetic architectures. As sample sizes for all traits reach the millions, inferences from “detectable” loci will converge to inferences from all loci. If traits truly follow an infinitesimal model (where every genetic variant influences every trait), we speculate that patterns of genetic overlap (like those in Figure 2) will become less interpretable, while patterns of genetic correlation (like those in Figure 4) may be more useful.
One clear observation from these data is that genetic variants that influence puberty (age at menarche and age at voice drop) often have correlated effects on BMI, height, and male pattern baldness (Figure 4). In our scan for causal relationships between traits, we found modest evidence of a causal role of age at menarche in influencing adult height, and for a causal role of BMI in the development of male pattern baldness (Supplementary Figure 12). The non-causal alternative (also consistent with the data) is that all of these traits are influenced by some of the same underlying biological pathways, and perhaps the most likely candidate is hormonal signaling. This highlights the importance of considering evidence from multiple traits when interpreting the molecular consequences of a variant and designing experimental studies. While variants that influence height overall are enriched near genes expressed in cartilage [71] and variants that influence BMI are enriched near genes expressed broadly in the central nervous system [72], it seems that a subset of these variants also influences age at menarche and male pattern baldness. For these variants, it may be worth considering functional follow-up in gonadal tissues or specific brain regions known to be important in hormonal signaling.
It is also striking to note how many genetic variants influence multiple traits (Figure 2) but without a consistent correlation in the effect sizes (Figure 4). For example, many of the autoimmune and immune-related traits appear to share many genetic causes, but the effect sizes of the variants on the different traits appear to be largely uncorrelated (see also [10;39]). Likewise, many variants appear to influence lipid traits, red blood cell traits and immune traits, but without consistent directions of effect. A trivial explanation of this observation is that we are underpowered to detect correlations in the effect sizes because we are using only a small set of the SNPs with the strongest associations. However, the genetic correlations between many of these traits (calculated using all SNPs) are not significantly different from zero ([39], Supplementary Figure 13). Another possibility is that a given genetic variant often influences the function of multiple cell types through separate molecular pathways, or that the effects of a variant on two related phenotypes vary according to an individual's environmental exposures.
From the point of view of epidemiology, the ability to scan through many pairs of traits to find those that are potentially causally related seems appealing, and some previous analyses have had similar goals [73]. Our approach makes the key assumption that, if two traits are related in a causal manner, then the “causal” trait is one of many factors that influence the “caused” trait. This induces an asymmetry in the effects of genetic variants on the two traits that can be detected (Figure 5). We also assume that we have identified a modest number of variants that influence both traits. This naturally means we are limited to considering heritable traits that have been studied in cohorts with moderate sample sizes (on the order of tens to hundreds of thousands of individuals). It seems likely that the main limiting factor to scaling this approach (should it be generally useful) will be phenotyping rather than genotyping.
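The asymmetry assumption described above can be illustrated with a toy simulation. The sketch below is not the method used in this study; it is a minimal illustration under assumed parameters. All names and values (`asymmetry_demo`, `gamma`, `noise`, the effect-size distributions, and the use of Pearson correlation) are hypothetical choices made for the example. It simulates a trait A that causally influences a trait B, together with variants that reach B through routes unrelated to A, and then compares the effect-size correlation among variants ascertained on A with the correlation among variants ascertained on B.

```python
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def asymmetry_demo(n_per_class=200, top_k=50, gamma=0.5, noise=0.01, seed=1):
    """Toy model in which trait A causally influences trait B.

    All parameter values are illustrative assumptions, not estimates
    from any real GWAS.
    """
    rng = random.Random(seed)
    variants = []  # each entry: (effect on A, effect on B)

    # Variants acting on A: their effect on B is mediated through A,
    # so it is proportional to the effect on A (plus estimation noise).
    for _ in range(n_per_class):
        beta_a = rng.gauss(0.0, 0.1)
        variants.append((beta_a, gamma * beta_a + rng.gauss(0.0, noise)))

    # Variants acting on B through other routes: no true effect on A,
    # only estimation noise.
    for _ in range(n_per_class):
        variants.append((rng.gauss(0.0, noise), rng.gauss(0.0, 0.1)))

    def corr_among_top(index):
        # Ascertain the top_k variants by absolute effect on one trait,
        # then correlate their effects on A with their effects on B.
        top = sorted(variants, key=lambda v: abs(v[index]), reverse=True)[:top_k]
        return pearson([v[0] for v in top], [v[1] for v in top])

    return corr_among_top(0), corr_among_top(1)


r_a, r_b = asymmetry_demo()
print(f"variants ascertained on A: r = {r_a:.2f}")
print(f"variants ascertained on B: r = {r_b:.2f}")
```

Under these assumed settings, the correlation among variants ascertained on A should be strong, while the correlation among variants ascertained on B should be considerably weaker, because most of the variants detected for B act through factors other than A.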
Methods
Methods are available in the Supplementary Materials.