
Comparison of methods that use whole genome data to estimate the heritability and genetic architecture of complex traits.

Luke M Evans¹, Rasool Tahmasbi², Scott I Vrieze³, Gonçalo R Abecasis⁴, Sayantan Das⁴, Steven Gazal^5,6, Douglas W Bjelland², Teresa R de Candia², Michael E Goddard^7,8, Benjamin M Neale⁶, Jian Yang⁹, Peter M Visscher⁹, Matthew C Keller^10,11.

Abstract

Multiple methods have been developed to estimate narrow-sense heritability, h², using single nucleotide polymorphisms (SNPs) in unrelated individuals. However, a comprehensive evaluation of these methods has not yet been performed, leading to confusion and discrepancy in the literature. We present the most thorough and realistic comparison of these methods to date. We used thousands of real whole-genome sequences to simulate phenotypes under varying genetic architectures and confounding variables, and we used array, imputed, or whole genome sequence SNPs to obtain 'SNP-heritability' estimates. We show that SNP-heritability can be highly sensitive to assumptions about the frequencies, effect sizes, and levels of linkage disequilibrium of underlying causal variants, but that methods that bin SNPs according to minor allele frequency and linkage disequilibrium are less sensitive to these assumptions across a wide range of genetic architectures and possible confounding factors. These findings provide guidance for best practices and proper interpretation of published estimates.

Year: 2018 PMID: 29700474 PMCID: PMC5934350 DOI: 10.1038/s41588-018-0108-x

Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330

Narrow-sense heritability, h², the proportion of a trait’s phenotypic variance attributable to additive genetic variance, is a fundamental concept in quantitative genetics. In addition to being the central descriptor of the genetic bases of traits, h² determines the response to selection and the potential utility of individual genetic prediction[1,2]. h² estimated in traditional designs using pedigrees or twins, ĥ²_PED, relies on strong assumptions about the causes of covariance between close relatives and can be biased to the degree these assumptions are unmet[3,4]. Over the last eight years, alternative “SNP-based” methods[5] have been developed to estimate h² using measured SNPs, denoted ĥ²_SNP. When estimated in samples of nominally unrelated individuals, ĥ²_SNP is unlikely to be confounded by common environmental or non-additive genetic effects that increase similarity of close relatives, and should reflect the proportion of phenotypic variation due to causal variants (CVs) tagged by SNPs. When common SNPs are used in the analysis, ĥ²_SNP is expected to be less than h² and ĥ²_PED because rare CVs are typically poorly tagged by common SNPs, and indeed ĥ²_SNP is substantially lower than ĥ²_PED for most complex traits in such analyses, with schizophrenia[6] a typical example. More recently, imputed SNPs have been used to capture the effects of rarer CVs and to gain insight into the genetic architecture of traits, examine genetic networks and annotation classes, and test evolutionary hypotheses[6-18]. For example, the substantial fraction of the variance in prostate cancer risk due to rare variants suggests that negative selection has reduced the frequency of risk alleles[18], and across a range of traits, young alleles explain more of the heritability than old alleles, suggesting widespread purifying selection[13,14]. Whole genome sequence (WGS) SNPs are likely to be increasingly used for such purposes in the future.
As SNPs in these analyses begin to more accurately reflect the density and frequency distributions of CVs, ĥ²_SNP should approach total h², making it important to understand the factors that can bias ĥ²_SNP. Moreover, the proliferation of methods (Table 1) has led to discrepancies in ĥ²_SNP estimates. For example, ĥ²_SNP for schizophrenia has been reported as 0.56 (LD score regression[19]) and 0.23 (univariate GREML[16]). Recently, Speed et al.[15] argued that typical assumptions about the relationships between SNP effect size, minor allele frequency (MAF), and linkage disequilibrium (LD) are inaccurate, and reported ĥ²_SNP values significantly higher than previous estimates made under different assumptions. How should such discrepancies be interpreted? Under which conditions do biases exist across different methods, and when should researchers prefer one method over another? Answers to these questions are important, yet to date, comparisons have been restricted to the small subset of methods considered in each primary paper, and have relied on simulations that are unrealistic with respect to properties of real genomes. For example, simulating CVs from imputed genotypic data rather than measured WGS data[15] can lead to CVs with highly atypical levels of LD and therefore to conclusions about ĥ²_SNP that apply to genetic architectures unrepresentative of real traits.

Table 1

Summary of commonly applied methods and a description of findings from simulations.

Method & original ref	Description	Major Assumptions	Simulation findings regarding ĥ²_SNP	Computational Issues
GREML-SC[5]	Often called the “GCTA approach.” Originally applied to common array SNPs only. Estimates ĥ²_SNP, the amount of h² caused by CVs tagged by SNPs used to create the GRM.	1) Genetic similarity is uncorrelated with environmental similarity; 2) an infinitesimal model; 3) SNP effects are normally distributed, independent of LD, and inversely proportionate to MAF (α=−1).	Biased to the degree that the average LD among SNPs is different than the average LD between SNPs and CVs. This occurs in stratified samples and when MAF & LD distributions of SNPs do not match those of CVs.	Simple model tractable with large samples (>100K).
GREML-MS[11]	The first multi-component approach, usually applied by binning SNPs according to their MAF, annotation, or physical regions in order to explore genetic architecture.	Requires that the same assumptions of GREML-SC hold within each GRM.	Biased if CVs have generally higher or lower levels of LD than the SNPs used to make the GRM. Relatively large standard errors.	Run times and memory requirements higher than GREML-SC and increase as a function of the number of variance components estimated.
GREML-LDMS-R[7]	A multi-component approach that bins imputed SNPs by their MAF and regional LD.	Same as GREML-MS	Use of regional LD scores can lead to biases if CVs have different LD on average than surrounding SNPs. Relatively large standard errors.	Same as GREML-MS.
GREML-LDMS-I	A multi-component approach introduced here that bins imputed SNPs by their MAF and individual LD.	Same as GREML-MS	Appears to be the least biased approach, even when traits have complex genetic architectures. Relatively large standard errors.	Same as GREML-MS.
LDAK-SC[15,20]	Introduced to account for redundant tagging of CVs by common SNPs. Recently modified to incorporate error due to imputation and to alter the MAF-effect size relationship.	Same as GREML-SC, except that allelic effects are a function of LD. Extended to assume that effects are also a function of imputation quality and weakly inversely proportionate to MAF (α=−0.25).	Can correct for the overestimation observed in GREML-SC from redundant tagging of CVs, but otherwise about as biased as GREML-SC when assumptions are unmet, although the biases are sometimes in different directions.	Same as GREML-SC.
LDAK-MS[15]	A multi-component extension of LDAK-SC that bins SNPs by MAF.	Requires that the same assumptions of LDAK-SC hold within each GRM.	Less biased on average than LDAK-SC, but more biased than GREML-LDMS (-I or -R). Relatively large standard errors.	Same as GREML-MS.
Threshold GRMs[24]	A multi-component approach with two GRMs: the normal (unthresholded) GRM built from all SNPs, and a second GRM with entries set to 0 if below a threshold. Conducted in samples that include close relatives.	Same as GREML-SC for the unthresholded GRM. Assumes no shared environmental influences among close relatives.	Estimates associated with unthresholded GRM similar to those of GREML-SC. When used in samples that include close relatives, the second GRM captures pedigree-associated variation but can be upwardly biased by shared environmental influences.	See GREML-SC.
LD Score Regression[19]	Uses the slope from χ² (from GWAS) regressed on SNPs’ LD scores to estimate the h² due to CVs in LD with common SNPs.	Infinitesimal model with allelic effects normally distributed.	Largely robust to confounding due to stratification and shared environmental influences. Estimates h² due to common CVs only, even when used on imputed or WGS data. Underestimates h² if the trait is not highly polygenic.	The most computationally efficient method of those compared and is tractable for very large datasets.

Here, we utilized thousands of fully sequenced genomes to simulate traits across different genetic architectures and degrees of population stratification, and compared the performance of the most popular SNP heritability estimation methods using three different SNP types (array, imputed, and WGS). By simulating phenotypes from real WGS data rather than from simulated or array/imputed SNPs, we were able to mimic patterns of LD and stratification found in real genomes and to include the effects of CVs down to a MAF of 0.0003. We then estimated heritability and the allelic spectra of six complex traits in the UK Biobank. Our findings provide insight into the most important factors influencing, and best practices for estimating, ĥ²_SNP.

RESULTS

Comparison of ĥ²_SNP across estimation methods under typical assumptions about CV effect sizes

For all methods described here other than LD score regression, evidence for nonzero ĥ²_SNP arises to the degree that the genome-wide average correlation between pairs of individuals i, j at measured SNPs, A_ij, is related to phenotypic similarity. A_ij values between all pairs of individuals are stored in an n×n genomic relationship matrix (GRM), which is used to estimate ĥ²_SNP with restricted maximum likelihood (REML). Such models can be fit using a single GRM (“single-component GREML”)[5,20] or by binning SNPs according to MAF, LD, and/or other annotations into multiple GRMs (“multi-component GREML”)[7,11], akin to multiple regression and leading to one ĥ²_SNP per GRM, which can be summed to derive total ĥ²_SNP. We used WGS data from the Haplotype Reference Consortium[21] to mimic four levels of stratification found within Europe by varying the ancestry compositions of samples (each n=8201; Online Methods). We simulated traits using 1000 randomly chosen WGS CVs within five different MAF ranges under typical assumptions (CV effect sizes independent of LD and inversely proportionate to MAF, per-CV contribution to h² invariant across MAF). Later, we tested alternative assumptions. While all CVs are SNPs in our simulations (i.e., we do not simulate non-SNP CVs, such as repeat polymorphisms), we hereafter restrict our usage of “SNPs” to denote the markers used to create GRMs and “CVs” to denote underlying causal variants. We estimated h² using commonly applied methods (see Supplemental Note for additional methods) and used SNPs on a typical commercial platform (the UK Biobank Axiom array[22]), SNPs imputed from an independent reference panel, or WGS SNPs to create GRMs. When WGS SNPs were used to create GRMs, CVs were necessarily included in the markers that created the GRMs, whereas this occurred sporadically for array and imputed SNPs. We simulated 100 phenotypes for each parameter combination and found the means of ĥ²_SNP and their empirical 95% confidence intervals (CIs) across replicates.
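As a rough illustration of the GRM described above, the following sketch computes A_ij as the average, over SNPs, of the product of standardized genotypes for individuals i and j. It assumes the standardization commonly used in single-component GREML (centering by twice the sample allele frequency and scaling by the binomial standard deviation). The function name `compute_grm` and the toy genotype matrix are hypothetical and are not taken from the analyses reported here.

```python
import numpy as np

def compute_grm(genotypes):
    """Return an n x n GRM from an n x m matrix of 0/1/2 allele counts."""
    genotypes = np.asarray(genotypes, dtype=float)
    freq = genotypes.mean(axis=0) / 2.0           # sample allele frequency per SNP
    polymorphic = (freq > 0.0) & (freq < 1.0)     # drop monomorphic SNPs (zero variance)
    g = genotypes[:, polymorphic]
    p = freq[polymorphic]
    # Standardize each SNP: center by 2p and scale by sqrt(2p(1-p))
    z = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))
    # A_ij is the mean over SNPs of z_i * z_j
    return z @ z.T / z.shape[1]

# Toy data (assumption): 50 individuals, 200 independent SNPs
rng = np.random.default_rng(0)
maf = rng.uniform(0.05, 0.5, size=200)
G = rng.binomial(2, maf, size=(50, 200))
A = compute_grm(G)
```

In a multi-component analysis, the same calculation would simply be repeated on each subset of SNPs, giving one GRM per bin.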
We did not simulate any phenotypic effects as a function of ancestry, and thus biases related to stratification in our results were due to the genotypic (e.g., long-range LD), not environmental, effects of stratification. We note that in some contexts, it is useful to compare ĥ²_SNP to a corresponding population parameter, h²_SNP, defined as the true proportion of variance explained by the set of SNPs used in the analysis[23], and which in most cases is less than the full h² due to imperfectly tagged CVs. However, such a formulation is cumbersome in the current context because h²_SNP changes across each combination of genetic architecture and SNP data type. Instead, in all cases we compare ĥ²_SNP to the full (simulated) h², with the recognition that downward biases in ĥ²_SNP are expected when CVs are imperfectly tagged by the (array and imputed) SNPs used in the analysis, and that such underestimates do not necessarily reflect estimation problems. Because this expected underestimation does not apply to WGS data, and because these methods will be increasingly applied to WGS data in the future, in this section we focus primarily on results from WGS data; results from imputed SNPs (which were similar) and array SNPs (which were often dissimilar) are discussed briefly below but are presented in full in the Supplement.
The most widely used estimation method, single-component GREML[5] (GREML-SC, or the “GCTA” approach[15]), underestimated h² when average CV MAF < average SNP MAF, such as when CVs were rare and array SNPs were analyzed, and overestimated h² when average CV MAF > average SNP MAF, such as when CVs were common and WGS SNPs were analyzed (Figure 1; Supplementary Figs. 1–6, Supplementary Tables 1–3). These biases are predictable based on SNP-SNP versus SNP-CV LD: when the mean LD between CVs and SNPs is less than the mean LD among all SNPs, which occurs when CVs are on average rarer than SNPs, ĥ²_SNP underestimates h², and vice versa when the mean CV-SNP LD exceeds the mean SNP-SNP LD (Supplementary Fig. 7, ref.[7]).
GREML-SC analyses using array SNPs led to modest overestimation of h² when CVs were common (Supplementary Fig. 1), presumably because array SNPs are chosen to maximally tag surrounding genomic regions. Stratification led to long-range tagging between ancestry-specific (rare) CVs and ancestry-informative common SNPs, which altered these biases. In the most stratified sample, average LD for very rare SNPs was higher than average LD for common SNPs (Supplementary Fig. 7), which led to overestimation of h² when CVs were very rare and underestimation of h² from common CVs when using WGS or imputed variants (Supplementary Figs. 3–5). Controlling for ancestry principal components as fixed effects had no influence on these biases. Thus, stratification, CV MAF, and data type strongly influenced patterns of CV and SNP LD, leading to over- or under-estimated h² using GREML-SC.

Figure 1

Mean ĥ²_SNP across 100 replicates from GRMs built from WGS SNPs in the least structured subsamples. Methods on the x-axis as follows: Single-component GREML (GREML-SC) with all SNPs or only MAF > 0.01; MAF-stratified GREML (GREML-MS); LD and MAF-stratified GREML (GREML-LDMS-R [regional LD] & -I [individual SNP LD]); Single-component Linkage Disequilibrium-Adjusted Kinships (LDAK-SC) with all SNPs or only MAF > 0.01; MAF-stratified LDAK (LDAK-MS); Extended Genealogy with Thresholded GRMs with all SNPs or only common (MAF > 0.01), presenting both ĥ²_SNP and the total estimate (the sum of ĥ²_SNP and the additional h² captured by close relatives); LD score regression (LDSC) using no PCs as covariates in GWAS, using PCs as covariates, or partitioned using PCs with MAF-stratification. Estimates are from samples of unrelated individuals (relatedness <0.05) except for those from the Threshold GRM method, which included all individuals. Simulated (true) h² = 0.5. Colors represent the MAF range of the 1,000 randomly drawn CVs. See Online Methods for descriptions of each method and Supplementary Figures for additional estimates and Supplementary Table 2 for numerical results. Error bars represent 95% confidence intervals.

Speed et al. introduced an approach (LDAK) to LD-weight SNPs in order to account for the redundant tagging of CVs by multiple SNPs, which can bias ĥ²_SNP in certain situations[20]. We limit discussion here to LDAK-SC as originally described[20], and explore recent extensions of this model[15] below with different simulations. As with GREML-SC, LDAK-SC estimates were highly sensitive to stratification, CV MAF, and SNP data type. When using common SNPs for the analysis (array, imputed, or WGS), LDAK-SC underestimated h² arising from rare CVs, but corrected the overestimation arising from common CVs observed with GREML-SC (Fig. 1; Supplementary Fig. 1–2). However, when using all SNPs from WGS data, LDAK weighted SNPs inversely proportionate to their LD, resulting in near-zero weights for common SNPs and very high weights for rare SNPs (Supplementary Fig. 8–9). This led to underestimated h² when CVs were common and overestimated h² when CVs were very rare (Fig. 1; Supplementary Fig. 4). This over-weighting of rare SNPs appeared to exacerbate biases arising from stratification relative to the unweighted (GREML-SC) approach (Supplementary Fig. 3–5). On the other hand, when all imputed SNPs were modeled in unstratified samples, LDAK appeared to provide reasonable estimates of h² (Supplementary Fig. 5), although results in the next section suggest that this was due to offsetting biases that happened to cancel out across this particular combination of parameters. Overall, the LDAK-SC results reiterate that single-component GREML models are highly sensitive to assumptions about genetic architecture.
We compared four multi-component approaches: 1) GREML-MS[7] (4 GRMs), which binned SNPs into 4 MAF categories; 2) GREML-LDMS-R[7] (16 GRMs), which binned SNPs by MAF crossed by the average LD of SNPs in the surrounding ~200kb region; 3) GREML-LDMS-I (16 GRMs), which we introduce here and which binned SNPs by MAF crossed by their individual levels of LD; and 4) LDAK-MS[15,20] (4 GRMs), which binned SNPs by MAF and weighted them according to the LDAK model. There were no major differences between the results of the first three approaches: all provided approximately unbiased total ĥ²_SNP (the sum of ĥ²_SNP from each GRM) when used on imputed or WGS data (Fig. 1, Supplementary Fig. 1–5). The similarity of these estimates is unsurprising in this set of simulations because CV effects were unrelated to LD, but below we demonstrate that GREML-LDMS-I provides the most robust estimates when this is not the case. LDAK-MS provided less biased ĥ²_SNP than LDAK-SC but more biased ĥ²_SNP than the other three multi-component GREML methods when CVs were rare. Biased ĥ²_SNP from LDAK-MS could occur because the simulation model does not match the LDAK assumption that CV effect sizes are a function of LD; we explore this issue below. In general, multi-component models outperform single-component models because the mean SNP-CV LD is closer to the mean SNP-SNP LD within narrower MAF/LD ranges, and therefore the ĥ²_SNP associated with each partitioned GRM, and their sum, are likely to be approximately unbiased, consistent with previous work[7]. For similar reasons, these models were less biased in stratified samples than single-component models (Supplementary Fig. 3–5). However, the empirical standard errors of ĥ²_SNP from GREML-LDMS-I were ~20%–50% higher than those from GREML-LDMS-R, which were in turn ~100% higher than those from GREML-SC (Supplementary Fig. 10–12). Thus, multi-component GREML models require large sample sizes (e.g., n > 30k) to be informative.
Zaitlen et al.[24] proposed a two-GRM approach to partition h² in samples containing close relatives. The first GRM contains A_ij for all pairs of individuals, while A_ij values below a threshold, t (= 0.05 here), are set to 0 in the second GRM. The first GRM contains information on sharing of CVs tagged by SNPs and is used to obtain ĥ²_SNP, while the second GRM only contains information from closely related individuals, reflecting sharing of CVs not tagged by SNPs, and is used to obtain the additional h² captured by close relatives. The sum of the two components therefore provides an estimate of total h². In our simulations, this summed estimate was an unbiased estimate of h² across most situations examined (Supplementary Fig. 13–14). However, the two components were often severely over- or under-estimated individually, depending on the CV MAF range and data type, with patterns of ĥ²_SNP similar to those observed for GREML-SC. Thus, attempts to use this method to infer genetic architecture should be treated with caution. Moreover, as acknowledged by Zaitlen et al.[24] and demonstrated in additional simulations, the summed estimate may be biased upward when environmental factors cause similarity within nuclear or extended families (Supplementary Fig. 15).
LD score regression (LDSC) is an alternative, computationally efficient approach that estimates h² from the relationship between LD-tagging of individual SNPs and their expected GWAS test statistics under an infinitesimal model[10,19]. Results from LDSC were similar when utilizing array, imputed, or WGS SNPs (Fig. 1, Supplementary Fig. 1–2, 16–18), as were estimates of the intercept, which reflect the contribution of stratification and cryptic relatedness to the GWAS test statistics (see Supplementary Note for further discussion of LDSC statistics). Across data types, LDSC generally underestimated h² by 5–10% when CVs were common. LDSC increasingly underestimated h² when CVs were rare, regardless of data type, because rare SNPs and CVs generally have very low LD scores.
However, LDSC was largely immune to the genomic effects of stratification (see Supplementary Note), and we found no upward bias when unmodeled shared environmental effects were included in the simulations (Supplementary Fig. 15), suggesting that ĥ²_SNP from LDSC is robust to familial environmental effects and provides a reasonable estimate of the lower bound of h² tagged by common CVs.
We also simulated ascertained, case-control phenotypes applying the standard transformation to the liability scale[25]. While the smaller sample size from ascertainment increased standard errors, patterns of estimates across methods were similar to those found with continuous phenotypes (Supplementary Fig. 19), suggesting that our conclusions here apply to categorical outcomes.
Finally, multi-component methods can also estimate h² across different annotations or different MAF bins (the “allelic spectra” of traits). Multi-component GREML approaches accurately estimated the allelic spectra when using WGS data (Fig. 2, Supplementary Fig. 20). However, these approaches underestimated the contribution of very rare CVs by up to 20% using imputed data (Supplementary Fig. 21), due to the poorer imputation quality of rare SNPs, and greatly underestimated their contribution when using array SNPs (Supplementary Fig. 22) due to the low LD typically observed between array SNPs and rare CVs (Supplementary Tables 4–5).
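The LD score regression idea discussed above can be sketched in a few lines. Under the infinitesimal model used by LDSC[19], the expected GWAS statistic for SNP j is E[χ²_j] = N·h²·ℓ_j/M + intercept, where ℓ_j is the LD score, N the sample size, and M the number of SNPs, so h² can be recovered from the regression slope. The sketch below is a deliberately simplified, unweighted least-squares version run on noise-free synthetic inputs; the published method additionally uses regression weights and a block jackknife for standard errors, which are omitted here. The function name `ldsc_h2` and all numeric inputs are assumptions made for illustration.

```python
import numpy as np

def ldsc_h2(chi2, ld_scores, n_samples, n_snps):
    """Unweighted LD score regression: returns (h2 estimate, intercept)."""
    chi2 = np.asarray(chi2, dtype=float)
    ld_scores = np.asarray(ld_scores, dtype=float)
    # Design matrix with an intercept column and the LD scores
    design = np.column_stack([np.ones_like(ld_scores), ld_scores])
    (intercept, slope), *_ = np.linalg.lstsq(design, chi2, rcond=None)
    # slope = N * h2 / M, so h2 = slope * M / N
    return slope * n_snps / n_samples, intercept

# Synthetic, noise-free example (assumed values)
N, M, true_h2 = 50_000, 1_000, 0.4
ld = np.linspace(1.0, 200.0, M)
chi2 = 1.0 + N * true_h2 * ld / M
h2_hat, intercept_hat = ldsc_h2(chi2, ld, N, M)
```

Because rare SNPs have very low LD scores, they contribute little leverage to this slope, which is consistent with the underestimation for rare CVs noted above.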

Figure 2

Mean ĥ²_SNP for four MAF bins across 100 replicates from multi-component approaches in unrelated individuals using WGS SNPs in the least structured subsample. See Fig. 1 for specific methods. Black lines are the true (simulated) h² values; note that in the top panel, the true h² values differ across MAF. See Online Methods for descriptions of each method and Supplementary Figures for additional estimates and Supplementary Table 4 for numerical results. Error bars represent 95% confidence intervals.

Comparison of models under alternative assumptions

Recent work has shown that, conditioning on MAF, SNPs with individually low levels of LD contribute disproportionately to the heritability of multiple complex traits[13], suggesting that CV effects are not independent of their levels of LD. The simulations above assumed that CV effect sizes, β, were independent of LD and that rare CVs had, on average, larger effect sizes than common CVs, and therefore that the per-CV h² was invariant on average across MAF. This is achieved by applying an α of −1, the parameter that governs the MAF-effect size relationship, and assuming β~N(0,1), the default scaling of GREML-SC, -LDMS-R, and -LDMS-I[5,7] (Online Methods). Recently, Speed et al.[15] argued that less biased estimates are obtained using a single-component model, but by assuming a higher contribution of common CVs (i.e., α=−0.25), by assuming that SNP effect sizes depend on LDAK weights, w, which are inversely proportionate to LD (β~N(0,w); Supplementary Fig. 8–9), and by weighting SNPs by imputation quality (r) (the LDAK model). Across numerous traits, they observed LDAK-SC-based ĥ²_SNP 25–43% higher than ĥ²_SNP from GREML-SC and GREML-LDMS-R, as well as higher log-likelihoods from LDAK-SC models. We compared the performance of these alternative assumptions of MAF, LD, and CV effect size relationships with simulated phenotypes using CVs drawn from different MAF ranges under four different combinations of MAF-effect size (α=−1 or −0.25) and LD-effect size (β~N(0,1) or β~N(0,w)) relationships. We also simulated phenotypes from two distinct, functionally relevant genetic architectures. First, we simulated phenotypes with CVs randomly chosen from all DNase-I Hypersensitivity Sites, which have systematically lower LD[17]. Second, we simulated phenotypes using the empirically estimated, LD-dependent effect size distribution, β~N(0, τ), where τ was estimated across 31 traits using partitioned LD score regression[13] (Online Methods).
This latter simulation is particularly important because the functional, LD-dependent genetic architecture it used was independent of the assumptions made in the GREML and LDAK models used in estimation. Because LDAK-SC was intended to be used on imputed data, our primary results below are based on imputed SNPs, but results from WGS data are also presented in the Supplement. ĥ²_SNP from single-component models, including GREML-SC and LDAK-SC, were highly sensitive to model assumptions about MAF- and LD-effect size relationships, as well as to differences between CV and SNP MAF distributions (Fig. 3, Supplementary Figs. 23–24, Supplementary Tables 6–7). Moreover, in simulations with empirically derived genetic architectures[13] (β~N(0, τ)), both GREML-SC and LDAK-SC (Fig. 4, Supplementary Fig. 25–26) were highly biased. On the other hand, multi-component GREML models were much more robust to model misspecification (Figs. 3–4, Supplementary Fig. 23–28). In particular, when we binned SNPs by their individual LD scores (GREML-LDMS-I), estimates were robust across every genetic architecture we investigated (Fig. 3), including when CV effect sizes were drawn from the empirically estimated genetic architectures (Fig. 4). Across all genetic architectures and all data types investigated, GREML-LDMS-I had the lowest absolute bias of any method (Fig. 5). This suggests that the impact of particular assumptions regarding MAF- and LD-effect size relationships is mitigated by the use of multi-component models.
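To make the role of α concrete, the sketch below simulates a phenotype in which the variance of per-allele CV effects scales as [2p(1−p)]^α, so that α=−1 gives each CV the same expected contribution to h² regardless of MAF, while α=−0.25 shifts more variance toward common CVs. This is a minimal sketch under stated assumptions and not the simulation pipeline used here (which is described in the Online Methods): it covers only the LD-independent case (β~N(0,1) before MAF scaling), and the function name `simulate_phenotype`, the toy genotypes, and the residual scaling are hypothetical choices.

```python
import numpy as np

def simulate_phenotype(cv_genotypes, h2=0.5, alpha=-1.0, rng=None):
    """Simulate a phenotype from an n x m matrix of CV allele counts (0/1/2)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(cv_genotypes, dtype=float)
    p = x.mean(axis=0) / 2.0
    keep = (p > 0.0) & (p < 1.0)                  # drop monomorphic CVs
    x, p = x[:, keep], p[keep]
    # Per-allele effect SD scales as [2p(1-p)]^(alpha/2)
    beta = rng.normal(0.0, 1.0, size=p.size) * (2.0 * p * (1.0 - p)) ** (alpha / 2.0)
    genetic = (x - 2.0 * p) @ beta
    # Residual variance chosen so that var(genetic) / var(phenotype) is about h2
    resid_sd = np.sqrt(genetic.var() * (1.0 - h2) / h2)
    phenotype = genetic + rng.normal(0.0, resid_sd, size=genetic.size)
    return phenotype, genetic

# Toy data (assumption): 2000 individuals, 1000 independent CVs
rng = np.random.default_rng(1)
maf = rng.uniform(0.01, 0.5, size=1000)
X = rng.binomial(2, maf, size=(2000, 1000))
y, g = simulate_phenotype(X, h2=0.5, alpha=-1.0, rng=rng)
realized_h2 = g.var() / y.var()
```

Swapping `alpha=-1.0` for `alpha=-0.25` in the call reproduces the second MAF-effect size relationship considered above.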

Figure 3

Mean ĥ²_SNP across 100 replicates from GRMs built from imputed SNPs in the least structured subsamples across different model assumptions (bars) and different ways of simulating CVs (x-axes). The x-axes of each panel show the simulated CV MAF-scaling parameter, α, and the CV effect size distribution, β. The four panels show different MAF ranges of the 1,000 randomly-drawn CVs. DHS sites were randomly sampled without respect to MAF. Bar colors indicate the fitted model, with a single GRM used except for the “LDMS” models, which used 16 GRMs (α=−1) stratified by MAF and either regional (-R) or individual SNP (-I) LD score. See Online Methods for descriptions of each method and Supplementary Figures for additional estimates and Supplementary Table 6 for numerical results. Error bars represent 95% confidence intervals.

Figure 4

Mean ĥ²_SNP across 100 replicates from GRMs built from imputed SNPs in the least structured subsamples across different model assumptions (bars) and different ways of simulating CVs (x-axes). CV effect sizes were simulated from β~N(0,τ). The x-axes of each panel show the simulated CV MAF-scaling parameter, α. The three panels show different MAF ranges of the 1,000 randomly-drawn CVs. Bar colors indicate the fitted model. See Online Methods for descriptions of each method and Supplementary Figures for additional estimates and Supplementary Table 6 for numerical results. Error bars represent 95% confidence intervals.

Figure 5

Boxplots of the absolute bias of heritability estimates across all simulated phenotypes from Supplementary Figures 24 & 26 using WGS data to estimate GRMs (top), and from Figures 3–4 using imputed variants to estimate the GRMs (bottom). The x-axis indicates the parameters for the estimation model, including the MAF scaling factor, α, and the assumed effect size distribution, β, specified in the GRM and whether imputation scores (r) were used in the GRM estimation. All used a single GRM except for LD- & MAF-stratified GREML (LDMS), which used 16 GRMs (α=−1) stratified by MAF and either regional (-R) or individual SNP (-I) LD score. * Typical GREML-SC parameters. † Typical LDAK-SC parameters. Boxplots show the median and interquartile range, with whiskers extending 1.5 times the interquartile range and more extreme points shown, for N=22 (WGS) and 26 (imputed) mean estimates of heritability.

Of note, log likelihood was not a reliable indicator of degree of bias. Speed et al.[15] argued that higher log-likelihood assuming α=−0.25 than α=−1 suggested that the former was more tenable. Across single-component models, which had the same number of predictors and therefore comparable log likelihoods, models with higher log likelihoods were typically less biased. However, we observed multiple cases where negligible differences in log likelihood translated into large differences in bias, as well as situations where models with higher average log likelihoods produced more biased results than models with lower average log likelihoods (Supplementary Figs. 23–26).

Heritability of Complex Traits in the UK Biobank

We applied seven approaches using imputed SNPs to six complex traits in the UK Biobank[26] (Fig. 6, Supplementary Fig. 29–30, Supplementary Table 8). Differences in ĥ²_SNP across methods were consistent with our simulations. Estimates from single-component models were often higher than those from multi-component models that bin SNPs by MAF and LD. For instance, the majority of height h² is attributable to common CVs[27], and GREML-SC and LDAK-SC ĥ²_SNP of height were unrealistically high, which can occur when CVs are more common than the SNPs used to build the GRM (Figs. 1, 3–4). On the other hand, estimates from multi-component GREML were much more reasonable. These results provide context for understanding previously published estimates (see Supplementary Note), including those from Speed et al.[15] showing higher LDAK ĥ²_SNP, and highlight the dangers of using single-component models that rely on strong assumptions about CV effect sizes and MAF distributions.

Figure 6

ĥ²_SNP estimated using multiple methods with imputed variants for six complex traits in the UK Biobank. MAF>0.01 indicates common SNPs were used to create the GRMs. ∅ = information matrix was not invertible. HM3 indicates that only imputed HapMap3 sites were used in the LDSC analysis. Sample sizes as follows: height N=94,769; BMI N=94,595; impedance N=93,451; trunk fat N=93,414; fluid intelligence N=31,724; neuroticism N=78,565. See Supplementary Table 8 for numerical results. Error bars are 1 S.E.M.

Our results also suggest that the allelic spectra differ across the six traits, as estimated using GREML-LDMS-I, the most accurate approach in our simulations (Supplementary Fig. 31, Supplementary Tables 9–10). For example, while the majority of height heritability was explained by common SNPs, 59% of fluid intelligence h² was due to rare CVs, with a total ĥ²_SNP that approached ĥ²_PED. Nevertheless, our simulations suggest that variance due to increasingly rare CVs was underestimated by ~20% for all traits due to low imputation quality at lower MAF. This underestimate was probably more severe in the UK Biobank analyses because the imputation reference panel used in the UK Biobank data (combined UK10K and 1,000 Genomes) was roughly half the size of, and less diverse than, the reference panel used in our simulations.

DISCUSSION

We have demonstrated that estimates of h² and allelic spectra using SNP data can be biased in a number of ways that are sometimes difficult to foresee, and that they depend strongly on a complex interplay between the method and type of data used in the analysis, trait genetic architecture, degree of sample stratification, shared environmental effects, and whether close relatives are included or excluded. Understanding how these factors influence ĥ²_SNP is crucial for proper interpretation of often-conflicting published estimates and for optimal design of future studies. Additional factors that we did not investigate might also influence the biases of ĥ²_SNP across methods, such as technical artifacts[28], environmental factors that covary with ancestry[29,30], CVs with MAF <0.0003, or non-SNP CVs. LD is central to the performance of all the methods compared here, in particular the LD among SNPs used to create the GRM and that between CVs and SNPs[7,20]. Single-component models, such as GREML-SC and LDAK-SC, are highly sensitive to assumptions, especially when rare imputed or WGS SNPs are used to create the GRM. This is problematic given that it seems unlikely that a single set of assumptions will hold for all traits and across the entire allelic spectrum. Alternatively, multi-component models that partition ĥ²_SNP across multiple LD and MAF bins provide the most robust estimates across the majority of contexts explored here while simultaneously providing insight into the allelic spectra of complex traits. However, they are more computationally intensive and have higher standard errors than single-component models, and require larger datasets to achieve reliable estimates. Nevertheless, such data are now at hand, and if the goal is to obtain the least biased estimates of h² or to estimate allelic spectra, we recommend using multi-component GREML models.
Even when using multi-component approaches, $h^2$ is likely underestimated, but estimates will improve as sample sizes increase and larger imputation panels and/or WGS data are utilized. Based on the results of the present and previous studies, we summarize our suggestions for using SNPs to estimate $h^2$ and allelic spectra of complex traits. First, quality control of genetic data is crucial, particularly for case-control and/or multi-cohort datasets, where technical artifacts can inflate or deflate $\hat{h}^2_{SNP}$[31]. Covariates (ancestry principal components, cohorts, plates, etc.) that might be confounded with genetic similarity should be included as fixed effects in GREML models and in the GWAS models for LD score regression[32]. Related individuals may share common environmental and non-additive genetic effects, upwardly biasing estimates of $h^2$; using unrelated individuals should provide estimates not inflated by such factors[33]. Second, the model and data type used in the analysis strongly influence estimates. When genotype data are unavailable or impractical to use, LDSC provides a lower bound of the $h^2$ captured by common CVs and is unaffected by confounding due to stratification and the common environment. Single-component methods such as GREML-SC and LDAK-SC are highly sensitive to model misspecification, which can lead to severely biased estimates of heritability. Moreover, they are also sensitive to the effects of stratification, which are not mitigated by inclusion of ancestry covariates. We recommend these approaches only when samples are small (e.g., n < 30,000) and homogeneous. Multi-component approaches with WGS or imputed SNPs provide the most accurate estimates of $h^2$ and allelic spectra across a range of genetic architectures and stratification levels. When using imputed data, SNPs should be imputed using the largest and most diverse reference panel possible (e.g., HRC[21]) in order to more reliably capture the effects of rare CVs.
However, using more GRMs in multi-component models leads to larger standard errors, necessitating larger sample sizes (n > 30,000). Of the multi-component approaches, GREML-LDMS-I, which we introduce here and which bins SNPs by MAF and individual LD levels, appears to perform the best.

ONLINE METHODS

Samples and Population Structure

We simulated continuous phenotypes derived from WGS data in the Haplotype Reference Consortium (HRC)[21]. The HRC comprises ~32,500 individuals from multiple WGS studies, with called genotypes at all sites with minor allele count ≥5. We had access to a subset (Supplementary Note) of 21,500 individuals with genotype calls at 38,913,048 biallelic SNPs. This large WGS dataset allowed phenotype simulation with differing genetic architectures under realistic patterns of LD structure, stratification, and relatedness. The HRC is mainly of European ancestry. To reduce the effects of worldwide stratification, we identified European individuals using principal components analysis (PCA). We ran flashpca[34] on 133,603 MAF- and LD-pruned SNPs (plink2[35] options --maf 0.05 --indep-pairwise 1000 400 0.2) and extracted the first ten PCs. We used the 1000 Genomes individuals in the HRC as anchor points for ancestry and, using K-means clustering in R[36], identified 19,478 individuals of European descent, including individuals of Finnish and Sardinian ancestry (Supplementary Fig. 32). To identify subsets of these 19,478 individuals spanning different levels of genetic heterogeneity, we reran PCA with only these individuals, then identified four increasingly homogeneous subgroups within them using K-means clustering (Supplementary Fig. 33 and Supplementary Note). We sampled an equal number of individuals from each subset at a relatedness cutoff of 0.1 (N=8,201), and also identified individuals with relatedness less than 0.05 within each group (N=7,792; 8,115; 8,129; and 8,186 for the four subsamples) to examine how relatedness and stratification influence estimates.

Simulation of Phenotypes and Whole Genome Data Types

To assess how methods performed on a range of genetic architectures, we simulated phenotypes from CVs drawn randomly from five MAF ranges from the WGS data: common (MAF≥0.05), uncommon (0.01≤MAF<0.05), rare (0.0025≤MAF<0.01), very rare (0.0003≤MAF<0.0025), and all SNPs that had a minor allele count (MAC) ≥5 (MAF≥0.0003). We generated phenotypes from 1,000 CVs from the model $y_i = g_i + e_i$, where $g_i = \sum_k X_{ik}\beta_k$ and $X_{ik} = (z_{ik} - 2p_k)[2p_k(1-p_k)]^{\alpha/2}$, where $z_{ik}$ was the genotype (coded as 0, 1, or 2) of individual i at the k-th CV, $p_k$ was the MAF within a population subset, and $\beta_k$ was the k-th allelic effect size, drawn from ~N(0,1). In these simulations, we used α=−1, assuming larger average effect sizes for rarer SNPs. The $g_i$ were standardized and added to residual error drawn from ~N(0,(1−$h^2$)/$h^2$) for $h^2$=0.5. A total of 100 replicated phenotypes were simulated for each CV MAF range and each of the four population stratification subsets. Note that simulations did not include any ancestry (i.e., PC) effects, and thus stratification-driven biases were due to the genotypic (e.g., long-range LD) effects of stratification. To simulate ascertained case-control phenotype data, in samples with some and low stratification (Supplementary Fig. 33B–C), we converted the continuous phenotypes simulated above to dichotomous case-control data using a prevalence of 20% (K=0.2). We then combined the cases with an equal number of randomly sampled controls to simulate ascertained datasets, which reduced sample sizes to ~40% of those for the continuous trait data. Note that this altered sample size reduces the genetic variance for phenotypes derived from rarer CVs. We transformed estimates of $h^2$ to the liability scale using the transformation described in Lee et al.[25].

To simulate array, imputed, and WGS data types, we first extracted from the WGS data the SNP positions corresponding to a widely used, commercially available genotyping array, the UK Biobank Affymetrix Axiom array (the array SNP dataset). We then imputed genome-wide variants using these Axiom SNPs and independent HRC samples as a WGS reference panel (the imputed dataset). Finally, we used the HRC WGS data directly (the WGS dataset). See Supplementary Note for details of each dataset. MAF distributions of the different data types for two of the structure subsamples are shown in Supplementary Fig. 34.
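
The continuous-phenotype model above can be illustrated with a minimal sketch. It is not the authors' simulation code: the function name `simulate_phenotype`, the toy genotypes, and the handling of monomorphic CVs (given zero weight) are assumptions made for illustration, and p is taken to be the sample frequency of the counted allele.

```python
import math
import random


def simulate_phenotype(genotypes, h2=0.5, alpha=-1.0, seed=0):
    """Simulate y = g + e from causal-variant (CV) genotypes.

    genotypes: list of rows (one per individual) of 0/1/2 allele counts,
    one column per CV. Returns (y, g).
    """
    rng = random.Random(seed)
    n, n_cv = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each CV, estimated in this sample.
    p = [sum(row[k] for row in genotypes) / (2.0 * n) for k in range(n_cv)]
    # Allelic effect sizes, beta_k ~ N(0, 1).
    beta = [rng.gauss(0.0, 1.0) for _ in range(n_cv)]
    # Scaling [2p(1-p)]^(alpha/2); assumption: monomorphic CVs get weight 0.
    scale = [(2 * pk * (1 - pk)) ** (alpha / 2) if 0 < pk < 1 else 0.0
             for pk in p]
    g = [sum((row[k] - 2 * p[k]) * scale[k] * beta[k] for k in range(n_cv))
         for row in genotypes]
    # Standardize g to mean 0 and variance 1.
    mean_g = sum(g) / n
    sd_g = math.sqrt(sum((x - mean_g) ** 2 for x in g) / n)
    g = [(x - mean_g) / sd_g for x in g]
    # Residual variance (1 - h2) / h2, so var(g) / var(y) equals h2 in expectation.
    resid_sd = math.sqrt((1 - h2) / h2)
    y = [gi + rng.gauss(0.0, resid_sd) for gi in g]
    return y, g


# Toy data (hypothetical): 200 individuals, 50 CVs under Hardy-Weinberg proportions.
_rng = random.Random(1)
freqs = [_rng.uniform(0.05, 0.5) for _ in range(50)]
toy = [[(_rng.random() < f) + (_rng.random() < f) for f in freqs]
       for _ in range(200)]
y, g = simulate_phenotype(toy, h2=0.5, alpha=-1.0)
```

With α=−1 the scaling reduces to dividing each centered genotype by its expected standard deviation, so rarer CVs receive larger per-allele effects on average.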

Heritability Estimation Methods Tested

We briefly describe our implementation of the most commonly used methods to estimate $h^2$ and partition genetic variation using genome-wide data (see Supplementary Note for descriptions of and results from additional, less commonly used methods). For all methods except LD score regression (described below), we generated GRMs following the standard procedures of each method, and estimated $h^2$ using GCTA[37]. In all models, variance component estimates were unconstrained (e.g., by using the --reml-no-constrain option of GCTA), and we included 20 PCs (10 from worldwide PCA and 10 from the specific subsample PCA) and sequencing cohort as fixed effects.

Single-component GREML (GREML-SC)

Yang et al.[5] introduced the single-component approach using a mixed-effects model, with GRM entries

$$A_{ij} = \frac{1}{m}\sum_{k=1}^{m}\frac{(x_{ik}-2p_k)(x_{jk}-2p_k)}{2p_k(1-p_k)} \qquad (1)$$

where m is the number of SNPs, $x_{jk}$ is the genotype (coded as 0, 1, or 2) of individual j at the k-th locus, and $p_k$ is the MAF of the k-th locus. The variance of the phenotypes is $\mathrm{var}(\mathbf{y}) = \mathbf{A}\sigma^2_g + \mathbf{I}\sigma^2_e$, where the variance explained by the SNPs ($\sigma^2_g$) and the error variance ($\sigma^2_e$) are estimated using restricted maximum likelihood (REML) implemented in the GCTA package[37]. The proportion of the total variance explained by all SNPs, $\hat{h}^2_{SNP} = \hat{\sigma}^2_g/(\hat{\sigma}^2_g + \hat{\sigma}^2_e)$, is then a measure of heritability. Typically, the GRM is built from the set of SNPs with MAF≥0.01 (hereafter “common SNPs”) in unrelated individuals (relatedness ≤ 0.05). We compared this typical approach to one using all SNPs with MAC≥5 (hereafter “all SNPs”) in each particular stratification subsample and for each data type (note that ~9.5% of Axiom array positions have MAF <0.01 in our sample), as well as to an approach using less stringent relatedness thresholds (relatedness < 0.10 and no relatedness threshold). For analyses that used no relatedness threshold, inclusion of close relatives increased our sample sizes to 9,916; 8,701; 8,715; and 8,506 for the samples with most, some, low, and least stratification, respectively (Supplementary Fig. 33).
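
As an illustration of how a GRM of this form is built, the following sketch computes the standard single-component entries (each SNP's centered genotype product divided by 2p(1−p), averaged over SNPs) for a small genotype matrix. It is a plain-Python toy, not GCTA's implementation; the function name `grm` is hypothetical, allele frequencies are estimated from the sample itself, and monomorphic SNPs are skipped by assumption.

```python
def grm(genotypes):
    """Genetic relationship matrix from 0/1/2 genotypes (rows = individuals)."""
    n, m = len(genotypes), len(genotypes[0])
    # Sample frequency of the counted allele at each SNP.
    p = [sum(row[k] for row in genotypes) / (2.0 * n) for k in range(m)]
    # Assumption: monomorphic SNPs are dropped, since 2p(1-p) would be zero.
    poly = [k for k in range(m) if 0.0 < p[k] < 1.0]
    A = [[0.0] * n for _ in range(n)]
    for k in poly:
        var_k = 2.0 * p[k] * (1.0 - p[k])
        centered = [row[k] - 2.0 * p[k] for row in genotypes]
        for i in range(n):
            for j in range(i, n):
                A[i][j] += centered[i] * centered[j] / var_k
    # Average over SNPs and mirror the upper triangle.
    for i in range(n):
        for j in range(i, n):
            A[i][j] /= len(poly)
            A[j][i] = A[i][j]
    return A


# Toy example (hypothetical data): four individuals, two SNPs.
A_toy = grm([[0, 1], [1, 2], [2, 0], [1, 1]])
```

Because genotypes are centered on sample allele frequencies, every row of the resulting matrix sums to zero, which is a convenient sanity check on an implementation.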

MAF-Stratified GREML (GREML-MS)

$\hat{h}^2_{SNP}$ is expected to be a biased estimate of $h^2$ when using the GREML-SC method if the MAF distribution of the CVs does not match the MAF distribution of SNPs used to generate the GRM[11]. Stratifying SNPs into MAF bins in a multiple-GRM GREML approach can mitigate this bias and can partition $\hat{h}^2_{SNP}$ into components explained by different SNP MAF bins, lending insight into the allelic spectra of complex traits[6,7]. For each data type, we applied this approach using 4 MAF bins, matching the CV MAF bins used for phenotype simulation.

LD- and MAF-Stratified GREML (GREML-LDMS-R and GREML-LDMS-I)

Extending the GREML-MS method to account for different levels of LD throughout the genome, Yang et al.[7] introduced an approach (originally termed GREML-LDMS but which we term GREML-LDMS-R here) that stratifies SNPs jointly by their MAF and regional LD scores, defined as the sum of $r^2$ between the focal SNP and all other SNPs in a 200Kb sliding window. We estimated LD scores using the default settings in GCTA (200Kb block size with a 100Kb overlap), and stratified SNPs into LD score quartiles (see Yang et al.[7] for details). This resulted in 16 GRMs (4 MAF bins by 4 LD bins) and therefore 16 values of $\hat{h}^2_{SNP}$, which were summed to derive the total $\hat{h}^2_{SNP}$. SNPs with individually low levels of LD contribute disproportionately to the heritability for multiple complex traits, particularly low-LD SNPs in regions of high LD[13]. Because these results suggest that individual rather than regional LD levels influence heritability, we developed and compared results from an alternative approach (GREML-LDMS-I) that stratified by individual (rather than regional) SNP LD scores, again binning SNPs by LD quartiles and four MAF bins, for a total of 16 GRMs.
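
The 4 × 4 stratification can be sketched as follows. This is only an illustration of the binning step, under two assumptions that the text does not settle: LD-score quartiles are computed across all SNPs rather than within each MAF bin, and per-SNP LD scores are already available. The names `maf_bin` and `ldms_bins` are hypothetical.

```python
import bisect
import statistics

# Boundaries between the four MAF bins described in the text:
# very rare [0.0003, 0.0025), rare [0.0025, 0.01),
# uncommon [0.01, 0.05), common >= 0.05.
MAF_EDGES = [0.0025, 0.01, 0.05]


def maf_bin(maf):
    """Return 0 (very rare), 1 (rare), 2 (uncommon) or 3 (common)."""
    return bisect.bisect_right(MAF_EDGES, maf)


def ldms_bins(mafs, ld_scores):
    """Group SNP indices by (MAF bin, LD-score quartile).

    Assumption: quartile cut points are taken over all SNPs at once.
    """
    cuts = statistics.quantiles(ld_scores, n=4)  # three quartile cut points
    bins = {}
    for idx, (maf, ld) in enumerate(zip(mafs, ld_scores)):
        key = (maf_bin(maf), bisect.bisect_right(cuts, ld))
        bins.setdefault(key, []).append(idx)
    return bins


# Toy example (hypothetical values): eight SNPs.
toy_mafs = [0.001, 0.004, 0.02, 0.3, 0.0025, 0.05, 0.2, 0.008]
toy_ld = [1.0, 5.0, 12.0, 40.0, 2.0, 80.0, 3.0, 20.0]
bins = ldms_bins(toy_mafs, toy_ld)
```

Each non-empty bin would then supply the SNP set for one GRM in the multi-component model.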

Single- and multi-component LD-Adjusted Kinships (LDAK-SC and LDAK-MS)

Speed et al.[20] noted that because LD varies across the genome, CVs in regions of high LD receive disproportionate weight by eqn. (1) above. The original LDAK[20] approach weights SNPs according to individual LD, potentially correcting for the bias introduced when there is variation in how well CVs are tagged by SNPs, and assumes standard MAF-CV effect size scaling (α = −1). We used LDAK5[20] to estimate these LD-weighted GRMs; it first thins SNPs in very high LD to reduce redundant tagging, then estimates SNP weights, w, that are inversely proportional to their average LD with other SNPs. We also applied the MAF-stratified approach described above but using LDAK weights (LDAK-MS). For the single-component model (LDAK-SC), we used all SNPs (MAC≥5) as well as only common SNPs (MAF≥0.01) to build the GRM for each data type. For the MAF-stratified approach, following recommendations in the LDAK documentation, we estimated SNP weights over the union of all SNPs (MAC≥5), then computed GRMs for each MAF class separately. We then applied the multiple-GRM method with these LDAK-weighted GRMs to estimate $h^2$ using GCTA. Results from the first set of simulations (Figs. 1 and 2) come from the traditional LDAK approach described above; results from the second set of simulations (Figs. 3–5) come from the updated LDAK approach described in the section below, “Simulation of Phenotypes and Comparison of $\hat{h}^2_{SNP}$ under Alternative Assumptions about CV Effect Sizes”.

Extended Genealogy with Thresholded GRMs

Zaitlen et al.[24] introduced a method to simultaneously obtain $\hat{h}^2_{SNP}$ and $\hat{h}^2_{IBS>t}$ by using two GRMs in a sample containing close relatives. The first GRM contains all $A_{ij}$, whereas the second GRM sets $A_{ij}$ values below a threshold, t, to 0. The first GRM, therefore, contains information on allele sharing of (mostly common) variants in unrelated and related individuals (estimating $h^2_{SNP}$), while the second only contains information from closely related individuals (estimating $h^2_{IBS>t}$, following Zaitlen et al.[24]). We tested two relatedness thresholds (t = 0.05 and 0.1) for the second GRM. The sum of $\hat{h}^2_{SNP}$ and $\hat{h}^2_{IBS>t}$ provides an estimate of total $h^2$, similar to $\hat{h}^2_{PED}$, with all the same potential biases that exist in $\hat{h}^2_{PED}$ from designs that use close relatives. By necessity, all analyses using this approach included close relatives, which could lead to confounding between genetic and environmental similarity if shared environmental effects are not modeled[38,39]. Indeed, Zaitlen et al.[24] argue that such shared environmental effects were the likely cause of higher estimates among relatives who shared an environment through cohabitation (e.g., half-siblings) compared to equally related relatives who did not share a cohabitation environment (e.g., grandparents and grandchildren). We therefore assessed whether $\hat{h}^2_{SNP}$ and $\hat{h}^2_{IBS>t}$ estimates from this method (as well as from GREML-SC and LDSC) were biased when extended shared environmental effects were present but unmodeled in samples of closely related individuals (see Supplementary Note).
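
The construction of the second, thresholded GRM is simple enough to show directly. The sketch below assumes a GRM stored as a list of lists and a hypothetical helper name, `threshold_grm`; entries below t are set to 0 and all others, including the diagonal, are kept unchanged.

```python
def threshold_grm(A, t):
    """Return a copy of GRM A in which entries below threshold t are set to 0."""
    return [[a if a >= t else 0.0 for a in row] for row in A]


# Toy GRM (hypothetical values): individuals 0 and 2 are close relatives.
A_full = [[1.00, 0.02, 0.50],
          [0.02, 1.00, 0.04],
          [0.50, 0.04, 1.00]]
A_rel = threshold_grm(A_full, t=0.05)
```

Both `A_full` and `A_rel` would then enter the model as separate variance components, so that distant and close relationships are fitted jointly.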

LD Score Regression (LDSC)

LDSC uses a different approach to estimate the heritability tagged by common CVs. Rather than estimating relatedness within a sample for use in mixed-model GREML analysis, LDSC regresses GWAS test statistics ($\chi^2$) on SNPs’ LD scores, which reflect the degree to which each SNP is correlated with surrounding SNPs[10,19]. For a polygenic model, the expected GWAS test statistic of SNP j, $\chi^2_j$, is

$$E[\chi^2_j] = \frac{N h^2 l_j}{M} + N a + 1$$

where N is the sample size, M is the number of SNPs, $l_j$ is the LD score ($= \sum_k r^2_{jk}$) measuring the tagging of surrounding variants by SNP j, and a is a measure of confounding biases arising from stratification and cryptic relatedness. Thus, regressing GWAS test statistics on per-SNP LD scores allows both estimation of $\hat{h}^2_{SNP}$ and assessment of the degree of confounding or polygenicity of a trait[19]. Bulik-Sullivan et al.[19] argue that LDSC provides unbiased estimates of $h^2$ tagged by common SNPs regardless of whether GWAS test statistics are estimated with or without controlling for ancestry or environmental covariates or relatedness. Here, we estimated GWAS test statistics using plink2 both without and with controlling for ancestry covariates (20 PCs and sequencing cohort as above). We used the ldsc package with default parameters (see URLs) to perform LDSC. We calculated LD scores for all SNPs using WGS data, including common and rare SNPs. As recommended by Bulik-Sullivan et al.[19], we used unrelated individuals (relatedness ≤ 0.05) and only common SNPs to perform the regression itself, because the relationship between the GWAS $\chi^2$ and LD score is unclear for rare (MAF<0.01) SNPs. We examined the relationship among $\hat{h}^2_{SNP}$, the intercept, the mean $\chi^2$, and the genomic control inflation factor, λGC (see Supplementary Note). LDSC can also be used to partition heritability among annotations[10]. We applied this approach using the four MAF bins described above. Because our MAF bins included very rare SNPs, for this MAF-stratified LDSC, we used GWAS test statistics from all SNPs (MAF≥0.0003, using the --not-M-5-50 flag in the ldsc package) while controlling for covariates as above.
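
The logic of the regression can be illustrated with an unweighted least-squares sketch based on the standard LD score regression expectation, E[χ²] = N·h²·l/M + N·a + 1: the slope of χ² on LD score is N·h²/M and the intercept is N·a + 1. This is a deliberate simplification and not the ldsc package's estimator, which uses weighted regression and block-jackknife standard errors; the function name `ldsc_ols` and the noise-free toy data are assumptions for illustration.

```python
def ldsc_ols(chi2, ld_scores, n_samples, n_snps):
    """Unweighted regression of chi-square statistics on LD scores.

    Returns (h2_estimate, intercept), using slope = N * h2 / M.
    """
    k = len(chi2)
    mean_l = sum(ld_scores) / k
    mean_c = sum(chi2) / k
    cov = sum((l - mean_l) * (c - mean_c) for l, c in zip(ld_scores, chi2))
    var = sum((l - mean_l) ** 2 for l in ld_scores)
    slope = cov / var
    intercept = mean_c - slope * mean_l
    return slope * n_snps / n_samples, intercept


# Noise-free toy data generated from the model itself (hypothetical values).
N, M, h2_true, a = 8000, 1000, 0.4, 0.0
ld = [1.0 + 0.5 * j for j in range(M)]
chi2 = [N * h2_true * l / M + N * a + 1.0 for l in ld]
h2_hat, intercept = ldsc_ols(chi2, ld, N, M)
```

On these noise-free inputs the sketch recovers the generating heritability and an intercept of 1; an intercept above 1 in real data would point to confounding rather than polygenic signal.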

Simulation of Phenotypes and Comparison of $\hat{h}^2_{SNP}$ under Alternative Assumptions about CV Effect Sizes

We tested the LDAK-SC, GREML-SC, and GREML-LDMS (-R & -I) methods on phenotypes simulated under alternative assumptions about CV effect sizes in order to determine the degree to which the methods were robust to model misspecification. To simulate phenotypes under alternative effect size assumptions, in the low stratification sample only (Supplementary Fig. 33C), we varied the MAF-effect size relationship (α=−1 or −0.25) and the effect size distribution ($\beta_k$~N(0,1) or ~N(0,$w_k$), where $w_k$ is the LDAK weight of the k-th CV estimated from the WGS data, which is inversely proportional to the SNP LD score (Supplementary Fig. 8–9)). When β~N(0,1) and α=−1, this model is the same as above and as previously described[7]. WGS CVs were drawn randomly from common SNPs (MAF > 0.05), very rare SNPs (MAF < 0.0025), all SNPs (MAF≥0.0003), or all DHS sites (systematically lower LD[17]), annotated for all UK10K SNPs with MAC≥2. Note that in Speed et al.[15], effect sizes, β, are also assumed to be proportionate to the imputation quality scores (r). Because we were simulating CVs from WGS data rather than imputed variants, we did not include the r term for simulating CV effect sizes. Additionally, we simulated phenotypes using an independent LD architecture derived from the 75-annotation baseline-LD model described in ref.[13], which contains coding, conserved, DHS and other functional annotations, 10 MAF bins, and 6 LD-related annotations modeling multiple LD-related architectures (including predicted allele age, recombination rate and CpG-content). For these simulations, we annotated 20,678,452 SNPs with allele count greater than or equal to 2 in 3,567 unrelated UK10K individuals, and modeled the variance of the k-th SNP, $\tau_k$, as proportional to $\sum_c a_c(k)\theta_c$, where $a_c(k)$ was the continuous value of annotation c for CV k and $\theta_c$ was the per-SNP contribution of one unit of annotation $a_c$ to the heritability. We used the values of $\theta_c$ estimated with stratified LD score regression on 31 independent traits[13] and constrained $\theta_c$ to be positive. Finally, as θ and stratified LD score regression hold only for common SNPs, we rescaled the variance of all $\tau_k$ so that the heritability explained by each of the four rarest of the 10 MAF bins (delimited by 0, 0.1%, 0.5%, 1% and 5% boundaries) was equal to the expected variance of the bin ($= \sum_k [p_k(1-p_k)]^{1+\alpha}$, where α=−0.28, as estimated by Loh et al.[12]). We then simulated phenotypes as described above with effect sizes $\beta_k$ drawn from ~N(0,$\tau_k$). We compared estimates from models applying different assumptions of α and β. The traditional GREML-SC, -LDMS-R, and -LDMS-I estimate GRMs using α=−1 and β~N(0,1), while the updated LDAK-SC model of Speed et al.[15] uses α=−0.25 and β~N(0,w) as well as weighting SNPs by imputation r. To test these assumptions, we estimated GRMs using either α=−1 or −0.25 and either weighting by LDAK weights or not. For imputed data, we also weighted SNP contributions to the GRM by imputation r. For GREML-LDMS-R and -I, we used α=−1 and no LDAK or imputation r weighting.

We estimated heritability for six continuous phenotypes in the initial release of the UK Biobank[26] (N~150,000) using the most commonly applied methods (Fig. 6). To reduce the effects of stratification, we used individuals of European ancestry (Supplementary Fig. 33). To estimate the GRMs, we separately used directly genotyped Axiom array positions as well as imputed genome-wide SNPs with IMPUTE info score ≥0.3. See Supplementary Table 8 for the list of all methods we applied. See Supplementary Note for additional methods and details.

