Multivariate estimation of factor structures of complex traits using SNP-based genomic relationships.

Ronald De Vlaming, Eric A W Slob, Patrick J F Groenen, Cornelius A Rietveld.

Abstract

BACKGROUND: Heritability and genetic correlation can be estimated from genome-wide single-nucleotide polymorphism (SNP) data using various methods. We recently developed multivariate genomic-relatedness-based restricted maximum likelihood (MGREML) for statistically and computationally efficient estimation of SNP-based heritability ($h^2_{SNP}$) and genetic correlation ($r_g$) across many traits in large datasets. Here, we extend MGREML by allowing it to fit and perform tests on user-specified factor models, while preserving the low computational complexity.
RESULTS: Using simulations, we show that MGREML yields consistent estimates and valid inferences for such factor models at low computational cost (e.g., for data on 50 traits and 20,000 individuals, a saturated model involving 50 $h^2_{SNP}$'s, 1225 $r_g$'s, and 50 fixed effects is estimated and compared to a restricted model in less than one hour on a single notebook with two 2.7 GHz cores and 16 GB of RAM). Using repeated measures of height and body mass index from the US Health and Retirement Study, we illustrate the ability of MGREML to estimate a factor model and test whether it fits the data better than a nested model. The MGREML tool, the simulation code, and an extensive tutorial are freely available at https://github.com/devlaming/mgreml/ .
CONCLUSION: MGREML can now be used to estimate multivariate factor structures and perform inferences on such factor models at low computational cost. This new feature enables simple structural equation modeling using MGREML, allowing researchers to specify, estimate, and compare genetic factor models of their choosing using SNP data.

Keywords: GREML; Genetic correlation; Genetic factor model; Genomic SEM; SNP heritability

Year: 2022 PMID: 35896974 PMCID: PMC9327374 DOI: 10.1186/s12859-022-04835-3

Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.307

Background

Narrow-sense heritability ($h^2$) quantifies the relative importance of additive genetic variance for a trait. Genetic correlation ($r_g$) reflects the shared genetic architecture between two traits. Genomic-relatedness-based restricted maximum likelihood (GREML) estimation [1-4] is widely used to estimate $h^2$ and $r_g$ using genome-wide single-nucleotide polymorphism (SNP) data for unrelated individuals [5]. As the heritability captured by SNPs provides a reasonable lower bound for $h^2$ [6, 7], the former is often referred to as SNP-based heritability and denoted by $h^2_{SNP}$ [8]. We recently developed MGREML, a computationally and statistically efficient approach for multivariate GREML [9]. Importantly, MGREML resolves inconsistencies when combining bivariate $r_g$ estimates into a multivariate $r_g$ matrix. By default, MGREML assumes the $r_g$ matrix is shaped by a so-called saturated model [10], which can fit any conceivable proper correlation matrix.

Here, we derive an extension of the statistical framework of MGREML (1) to estimate user-specified genetic and environmental factor models (e.g., a model with just one genetic factor for all traits) and (2) to test whether the given factor model fits the data better than a nested model (also user-specified). Whereas Genomic SEM [11], another method to estimate genetic factor models, relies on preexisting summary statistics from large-scale genome-wide association studies (GWAS) for all traits of interest, MGREML uses individual-level data, giving users (1) more statistical power for a fixed sample size [9] and (2) more direct control over model specification and estimation (e.g., being able to control for an additional covariate in the MGREML analysis itself, rather than having to obtain a new set of GWAS results). In short, we enable MGREML to estimate genetic and environmental factor structures using individual-level data, and to test whether a given factor structure fits the data better than a nested model. We validate this approach using simulations and an empirical application.

Implementation

Model

Consider a set of N unrelated individuals for whom we observe T traits, k covariates, and M SNPs. Let $X$ denote the $N \times k$ matrix of covariates, $G$ the $N \times M$ matrix of standardized SNPs, and $Y$ the $N \times T$ matrix of traits, for which column t corresponds to Trait t and is denoted by $y_t$. Furthermore, let $B_t$ denote a $k \times k_t$ binary matrix indicating which of the k covariates in $X$ apply to Trait t. Now, the $N \times k_t$ matrix of covariates for Trait t can be defined as $X_t = X B_t$. When applying univariate GREML as implemented in GCTA [1] to Trait t, the following linear mixed model (LMM) is estimated using restricted maximum likelihood (REML) [12]:

$$y_t \sim N\left(X_t \beta_t,\; \sigma^2_{G,t} A + \sigma^2_{E,t} I_N\right).$$

In this model, $\beta_t$ is the $k_t \times 1$ vector of fixed effects of the covariates that apply to Trait t and the $N \times N$ matrix $A$ is the so-called genomic-relatedness matrix (GRM) reflecting the subtle genetic similarities between unrelated individuals. Typically, the GRM is calculated as $A = M^{-1} G G^\top$ using tools such as GCTA or PLINK [1, 13]. Calculation of the GRM requires $O(N^2 M)$ time. However, as this calculation can be massively parallelized, it places little practical limitation on either N or M. By giving standardized SNPs the same weight, the preceding definition of the GRM makes tacit assumptions about the relationship between allele frequency and linkage disequilibrium on the one hand and SNP effect sizes on the other. Other tools, such as LDAK [14], can be used to construct a GRM that assigns different weights to the SNPs, thereby incorporating different assumptions about SNP effect sizes. Importantly, MGREML can use any valid GRM in binary format as input, irrespective of its precise definition and irrespective of whether it is calculated using PLINK, GCTA, or LDAK.

The parameters of interest in the univariate model are $\sigma^2_{G,t}$ and $\sigma^2_{E,t}$, where $\sigma^2_{G,t}$ denotes the additive genetic variance of Trait t captured by the available SNPs and $\sigma^2_{E,t}$ the remaining variance in Trait t. The latter quantity is sometimes referred to as the environmental variance, even though this name can be somewhat misleading, since $\sigma^2_{E,t}$ simply reflects all variance in Trait t that is not tagged by the additive linear effects of the available SNPs and covariates [6]. In spite of the subtleties in its definition, we stick to the convention of calling this term the environmental variance. In this model, $h^2_{SNP}$ of Trait t can be defined as $\sigma^2_{G,t} / (\sigma^2_{G,t} + \sigma^2_{E,t})$. In essence, univariate GREML quantifies the degree to which genetic similarity between individuals, as tagged by the SNPs used to construct the GRM, maps to trait similarity.

Notice here that REML does not estimate $\beta_t$ directly. Instead, REML controls for the fixed-effect covariates by considering so-called error contrasts [15, 16]. More specifically, REML estimation is equivalent to maximum-likelihood estimation applied to $K_t y_t$, where the rows of matrix $K_t$ form a basis of the left null space of $X_t$. However, once REML estimates of $\sigma^2_{G,t}$ and $\sigma^2_{E,t}$ are obtained, one can readily calculate the generalized least squares estimator of the fixed effects [1, 9]. This option is implemented in both GCTA and MGREML.

The univariate LMM can be generalized to a multivariate LMM [17, 18], which can be used to jointly estimate genetic covariance and environmental covariance between Traits $t$ and $s$, denoted by $\sigma_{G,ts}$ and $\sigma_{E,ts}$ respectively. Using the same notation as seen in the original derivations of MGREML [9], this multivariate LMM can be written as follows:

$$\mathrm{vec}(Y) \sim N\left(\begin{bmatrix} X_1 \beta_1 \\ \vdots \\ X_T \beta_T \end{bmatrix},\; V_G \otimes A + V_E \otimes I_N\right),$$

where '$\otimes$' denotes the Kronecker product. In this model, $V_G$ is the $T \times T$ genetic variance matrix and $V_E$ the $T \times T$ environmental variance matrix. Now, the genetic correlation between Traits t and s is defined as $r_{g,ts} = \sigma_{G,ts} / \sqrt{\sigma^2_{G,t}\, \sigma^2_{G,s}}$ [2], where $\sigma_{G,ts}$ is element t, s from $V_G$.
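As an illustration of the quantities defined in this section, the following minimal sketch computes a GRM as the average cross-product of standardized SNPs and evaluates the heritability ratio of genetic variance to total variance. It assumes simulated genotypes and standardization by the sample mean and standard deviation of each SNP; tools such as GCTA, PLINK, and LDAK use their own conventions, and the function names below are hypothetical rather than part of MGREML.

```python
import numpy as np


def standardize_snps(genotypes):
    """Scale each SNP column to mean 0 and variance 1, dropping monomorphic SNPs."""
    g = np.asarray(genotypes, dtype=float)
    std = g.std(axis=0)
    polymorphic = std > 0  # SNPs without variation cannot be standardized
    g = g[:, polymorphic]
    return (g - g.mean(axis=0)) / std[polymorphic]


def compute_grm(std_snps):
    """Average cross-product over the M standardized SNPs (N x N matrix)."""
    n_snps = std_snps.shape[1]
    return std_snps @ std_snps.T / n_snps


def snp_heritability(var_g, var_e):
    """Share of the variance attributed to the SNPs: var_g / (var_g + var_e)."""
    return var_g / (var_g + var_e)


# Simulated allele counts (0, 1, 2) for 50 individuals and 500 SNPs (assumption).
rng = np.random.default_rng(0)
freqs = rng.uniform(0.1, 0.9, size=500)
genotypes = rng.binomial(2, freqs, size=(50, 500))
grm = compute_grm(standardize_snps(genotypes))
```

With this standardization the diagonal of the GRM averages one by construction, which is a convenient sanity check before passing a GRM to any estimation routine.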

Computational complexity

The variance matrix of the multivariate model (i.e., $V_G \otimes A + V_E \otimes I_N$) is dense, rendering naïve REML estimation infeasible for large N and T, as mere evaluation of the log-likelihood function already requires $O(N^3 T^3)$ time. However, the time complexity can be drastically reduced by transforming the data using the eigenvalue decomposition (EVD) of the GRM [4, 9]. Let $A = P D P^\top$ denote the EVD of $A$. Here, $P$ denotes the matrix of eigenvectors and $D$ the diagonal matrix of eigenvalues. MGREML defines matrix $\tilde{P}$ as the columns from $P$ that correspond to the eigenvalues that are not among the L largest, and $\tilde{D}$ as the diagonal matrix with corresponding eigenvalues, $d_1, \ldots, d_n$, where $n = N - L$. Using this matrix $\tilde{P}$, MGREML transforms the data, and then reorders it such that (i) the variance matrix is block diagonal, enabling significant computational improvements, and (ii) the contributions of the L leading principal components from the genetic data to the variance matrix are eliminated, thus correcting for population stratification [19] without introducing any additional fixed-effect covariates [9]. The default value of L causes MGREML to control for even quite subtle population stratification. Users can specify a different value for L using the --adjust-pcs option.

More specifically, the following model holds for vector $\tilde{y} = \mathrm{vec}(\tilde{Y}^\top)$, with $\tilde{Y} = \tilde{P}^\top Y$ (where vec() denotes the vectorization operator):

$$\tilde{y} \sim N\left(\tilde{X} \beta,\; V\right),$$

where $\tilde{X}$ is the correspondingly transformed and reordered matrix of covariates, $\beta$ stacks the fixed effects of all traits, and $V$ is block diagonal with blocks $V_j = d_j V_G + V_E$ for $j = 1, \ldots, n$. Omitting the constant, the corresponding log-likelihood function is given by

$$\ell(\theta) = -\tfrac{1}{2} \log|V| - \tfrac{1}{2} \log|\tilde{X}^\top V^{-1} \tilde{X}| - \tfrac{1}{2}\, \tilde{y}^\top Q\, \tilde{y},$$

where $Q = V^{-1} - V^{-1} \tilde{X} (\tilde{X}^\top V^{-1} \tilde{X})^{-1} \tilde{X}^\top V^{-1}$ [1]. This log-likelihood function depends on log-determinants and quadratic forms of the type $\tilde{y}^\top V^{-1} \tilde{y}$. Importantly, $V$ is a highly sparse, block-diagonal matrix, where diagonal block j equals $V_j = d_j V_G + V_E$, with $V_G$ and $V_E$ being functions of the parameter vector $\theta$. As a result of this block-diagonal structure, these quadratic forms and log-determinants can be written as a sum of n independent contributions, where each contribution comes from a block. MGREML can calculate the contribution of any given block in $O(T^3)$ time. Concordantly, the log-likelihood function can be evaluated in $O(n T^3)$ time. Similarly, the gradient (i.e., the vector of partial derivatives of the log-likelihood function with respect to $\theta$) can also be calculated in $O(n T^3)$ time.

MGREML retains its computational efficiency when there is a limited number of fixed-effect covariates. However, if the number of covariates grows large, MGREML will get slower. Nevertheless, as MGREML controls for population stratification without having to introduce any fixed effects for that purpose, a limited number of fixed-effect covariates suffices in a typical empirical application.

The average information (AI) algorithm, a variation on Newton’s method [1, 20], is ill-suited for MGREML estimation for large T, since that algorithm involves repeated calculation of the AI matrix, which requires $O(N T^4)$ time per iteration for a saturated model [9]. Specifically, a saturated model has $T(T+1)$ free parameters. Thus, the AI matrix has $O(T^4)$ elements, where each element involves a calculation requiring O(N) time, placing overall complexity at $O(N T^4)$. To avoid having to calculate the AI matrix in every iteration, MGREML instead uses a Broyden–Fletcher–Goldfarb–Shanno (BFGS) algorithm [21] in combination with a golden-section line search to estimate $\theta$. Importantly, a BFGS iteration has roughly the same computational complexity as a gradient-descent iteration yet a higher rate of convergence across iterations. Thus, a BFGS algorithm balances computational complexity per iteration and rate of convergence across iterations. The BFGS algorithm is initialized such that the first iteration is equivalent to a gradient-descent iteration with golden-section search. Evaluations of the log-likelihood function and its gradient suffice for application of the golden-section search and BFGS algorithm, putting the overall time complexity of MGREML at $O(n T^3)$ per iteration.

For large T, the BFGS algorithm can exhibit unstable behavior, in which case relevant quantities are reinitialized such that the next BFGS iteration is again equivalent to a gradient-descent iteration with golden-section search. If instability persists, MGREML switches to the AI algorithm for a single iteration. In our experience, such expensive ‘interventions’ are needed only sparingly and are effective in resolving numerical instabilities in MGREML estimation.

Once MGREML has converged, the variance matrix of $\hat{\theta}$ is estimated using the AI matrix [20]. In addition, a delta method is used to obtain the standard error (SE) of $h^2_{SNP}$ and $r_g$ estimates. Although calculation of the AI matrix, as indicated, is expensive, this calculation only needs to be carried out once. Moreover, MGREML users can specify the --no-se option to forgo calculation of the AI matrix and SEs altogether after convergence of the BFGS algorithm.
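The effect of the EVD-based transformation can be illustrated with a small numerical sketch. It is a simplified illustration rather than the MGREML implementation: it assumes no fixed-effect covariates (so it evaluates a plain Gaussian log-likelihood without the REML correction term), omits the constant, and uses hypothetical function names. The dense evaluation works with the full NT-by-NT variance matrix, whereas the block evaluation sums over individuals after rotating the traits by the eigenvectors of the GRM.

```python
import numpy as np


def loglik_dense(Y, A, VG, VE):
    """Log-likelihood (constant omitted) using the dense NT x NT variance matrix."""
    n_obs = Y.shape[0]
    V = np.kron(VG, A) + np.kron(VE, np.eye(n_obs))
    y = Y.reshape(-1, order="F")  # vec(Y): stack the trait columns
    _, logdet = np.linalg.slogdet(V)
    return -0.5 * (logdet + y @ np.linalg.solve(V, y))


def loglik_blocks(Y, A, VG, VE, n_drop=0):
    """Same quantity as a sum of T x T block contributions after the EVD rotation.

    n_drop leading eigenvectors (largest eigenvalues) are discarded, mimicking
    the adjustment for leading principal components.
    """
    eigvals, eigvecs = np.linalg.eigh(A)  # eigenvalues in ascending order
    keep = eigvals.size - n_drop
    Y_rot = eigvecs[:, :keep].T @ Y  # rows are now mutually independent
    total = 0.0
    for d_j, y_j in zip(eigvals[:keep], Y_rot):
        V_j = d_j * VG + VE  # variance block for rotated observation j
        _, logdet = np.linalg.slogdet(V_j)
        total += -0.5 * (logdet + y_j @ np.linalg.solve(V_j, y_j))
    return total


# Small simulated example (all values are arbitrary assumptions).
rng = np.random.default_rng(1)
N, T = 30, 3
snps = rng.standard_normal((N, 200))
A = snps @ snps.T / 200
VG = np.array([[0.5, 0.2, 0.1], [0.2, 0.4, 0.1], [0.1, 0.1, 0.3]])
VE = np.diag([0.5, 0.6, 0.7])
Y = rng.standard_normal((N, T))

dense = loglik_dense(Y, A, VG, VE)
blocks = loglik_blocks(Y, A, VG, VE)
```

Without dropping eigenvectors the two evaluations agree up to rounding, while the block version only ever factorizes T-by-T matrices, which is what makes the cost linear in the number of retained observations.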

Factor structures

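By default, MGREML assumes a saturated model for both $V_G$ and $V_E$. An example of such a saturated model for $T = 3$ traits is shown in Fig. 1. Letting $\lambda^G_{tf}$ (resp. $\lambda^E_{tf}$) denote the effect of genetic (environmental) Factor f on Trait t, and letting $g_f$ (resp. $e_f$) denote genetic (environmental) Factor f, the saturated model for $T = 3$ traits can be written as follows:

$$\begin{aligned} y_1 &= X_1 \beta_1 + \lambda^G_{11} g_1 + \lambda^E_{11} e_1, \\ y_2 &= X_2 \beta_2 + \lambda^G_{21} g_1 + \lambda^G_{22} g_2 + \lambda^E_{21} e_1 + \lambda^E_{22} e_2, \\ y_3 &= X_3 \beta_3 + \lambda^G_{31} g_1 + \lambda^G_{32} g_2 + \lambda^G_{33} g_3 + \lambda^E_{31} e_1 + \lambda^E_{32} e_2 + \lambda^E_{33} e_3. \end{aligned}$$

For T traits in general, a saturated model for $V_G$ (resp. $V_E$) can be described in terms of a lower triangular $T \times T$ matrix of free genetic (environmental) coefficients $\Lambda_G$ ($\Lambda_E$) where $V_G = \Lambda_G \Lambda_G^\top$ ($V_E = \Lambda_E \Lambda_E^\top$).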

Fig. 1

A saturated genetic and environmental factor model for three traits

Here, we generalize this approach by allowing $\Lambda_G$ (resp. $\Lambda_E$) to be a $T \times F_G$ ($T \times F_E$) matrix of which a pre-defined subset of the $T F_G$ ($T F_E$) elements are free, while the other elements are constrained to zero, reflecting an arbitrary factor model with $F_G$ ($F_E$) genetic (environmental) factors. Both factor models need to satisfy standard identification requirements in structural equation modeling [22]. Under this framework, the implied genetic (resp. environmental) variance matrix $V_G = \Lambda_G \Lambda_G^\top$ ($V_E = \Lambda_E \Lambda_E^\top$) is always at least positive (semi)-definite. In other words, provided the user-specified model is identified, MGREML always yields valid correlation matrices.

MGREML users can specify a main model, comprising a genetic factor model and an environmental factor model. In case a user also specifies a nested model, MGREML performs a classical likelihood-ratio test (LRT) [23], to infer whether the fit of the main factor model is significantly better than that of the nested model. In total, users can specify at most four factor models: (1A) the main genetic factor model, (1B) the main environmental factor model, (2A) the nested genetic factor model, and (2B) the nested environmental factor model. For example, a user can specify a main genetic factor model where there is only one genetic factor for all traits and a nested genetic factor model, where the traits have no genetic variance at all (i.e., there is no genetic factor), while the environmental factor model is saturated both in the main model as well as the nested model.

A factor model specification for MGREML is effectively a binary $T \times F$ matrix stored as a plain text file, where F denotes the number of factors. More specifically, in a given model, for $t = 1, \ldots, T$ and $f = 1, \ldots, F$, if Factor f has a free path coefficient to Trait t, element t, f of the binary matrix equals one and otherwise that element equals zero. Let $p_G$ (resp. $p_E$) denote the number of free coefficients in the main genetic (environmental) factor model and let $p^0_G$ (resp. $p^0_E$) be defined analogously for the nested model. Finally, let $\ell_1$ (resp. $\ell_0$) denote the log-likelihood of the main (nested) model. Now, the LRT statistic is calculated by MGREML as $2(\ell_1 - \ell_0)$, which under standard maximum likelihood estimation (MLE) assumptions [24] and nestedness of the models is $\chi^2(p_G + p_E - p^0_G - p^0_E)$ distributed.

An example of a genetic factor model that MGREML users can specify is shown in Table 1. The corresponding structural equation model that MGREML fits under that specification is shown in Fig. 2. The environmental factors shaping $V_E$ are not shown here, for clarity of the figure. We use this genetic factor model in our empirical application. In this example, the first genetic factor captures the genetic signal shared between all height measurements, the second genetic factor captures the genetic signal shared between all measurements of body mass index (BMI), and the third factor captures the genetic overlap between height and BMI (i.e., the genetic correlation).

Table 1

Specification of a genetic factor model for height and body mass index (BMI) observed at five different points in time (denoted by subscripts indicating waves 7, 8, ..., 11)

Trait	$G_{height}$	$G_{BMI}$	$G_{shared}$
height$_{7}$	1	0	1
height$_{8}$	1	0	1
height$_{9}$	1	0	1
height$_{10}$	1	0	1
height$_{11}$	1	0	1
BMI$_{7}$	0	1	1
BMI$_{8}$	0	1	1
BMI$_{9}$	0	1	1
BMI$_{10}$	0	1	1
BMI$_{11}$	0	1	1

Fig. 2

A genetic factor model for height and body mass index (BMI) observed at five different points in time (denoted by subscripts indicating waves 7, 8, ..., 11)
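To make the link between a binary factor specification and the implied variance matrix concrete, the sketch below encodes the specification of Table 1, fills the free positions with arbitrary illustrative coefficients, and forms the implied genetic variance matrix as the product of the loading matrix with its transpose. The coefficient values and function names are assumptions made for illustration only; they are not estimates and not part of the MGREML code base.

```python
import numpy as np

TRAITS = [f"height_{w}" for w in range(7, 12)] + [f"BMI_{w}" for w in range(7, 12)]

# Binary specification from Table 1; columns: G_height, G_BMI, G_shared.
SPEC = np.array([[1, 0, 1]] * 5 + [[0, 1, 1]] * 5)


def implied_variance(spec, free_values):
    """Place free coefficients where spec equals one and return loadings @ loadings.T."""
    mask = np.asarray(spec, dtype=bool)
    if len(free_values) != mask.sum():
        raise ValueError("need exactly one value per free coefficient")
    loadings = np.zeros(mask.shape)
    loadings[mask] = free_values  # filled row by row; all other entries stay zero
    return loadings @ loadings.T


def to_correlation(variance):
    """Rescale a variance matrix with positive diagonal to a correlation matrix."""
    sd = np.sqrt(np.diag(variance))
    return variance / np.outer(sd, sd)


# Three-factor specification with every free coefficient set to 1 (assumption).
n_free_full = int(SPEC.sum())
v_full = implied_variance(SPEC, np.ones(n_free_full))
r_full = to_correlation(v_full)

# Dropping the shared factor leaves only the first two columns.
spec_restricted = SPEC[:, :2]
n_free_restricted = int(spec_restricted.sum())
v_restricted = implied_variance(spec_restricted, np.ones(n_free_restricted))
r_restricted = to_correlation(v_restricted)
```

Because the variance matrix is built as a product of a matrix with its own transpose, it is positive semi-definite for any coefficient values. Removing the shared factor forces every height-BMI covariance to zero and reduces the number of free coefficients from 20 to 10.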

Results

Simulation study

To test the validity of MGREML estimates of genetic correlations and underlying factor structures, we generated 100 independent datasets with $N$ individuals and $T$ traits with a fixed SNP-based heritability ($h^2_{SNP}$). In Simulation 1, we set $r_g$ to the same value across all combinations of traits. In Simulation 2, we simulate two clusters of five traits by setting $r_g$ to random values within clusters and to zero between clusters. In Simulation 3, we consider one additional dataset with $N = 20{,}000$ individuals and $T = 50$ traits with given values of $h^2_{SNP}$ and $r_g$. The simulation design is fully described in the Supplementary Information [see Additional File 1].

As MGREML estimation is a specific form of MLE, we expect MGREML to yield consistent estimates of the population parameters, provided standard MLE assumptions hold [24]. That is, as N increases, each parameter estimate converges to the true value. The results of Simulation 1 support the claim that MGREML yields consistent estimates of $h^2_{SNP}$ and $r_g$ across the full range of feasible values for $r_g$ [see Additional File 1: Tables S1–S4]. The SEs of estimates also align with the standard deviations of estimates across the generated datasets. Estimates have lower SEs when interdependence across traits is higher (i.e., higher $r_g$).

The results of Simulation 2 show that MGREML also yields consistent estimates when the number of degrees of freedom in the model is larger than necessary [see Additional File 1: Table S5]. Estimates closely reflect the implied factor structure, as illustrated in Fig. 3, which shows MGREML estimation results for the first dataset. When comparing the fit of the appropriate factor model and the saturated model using an LRT, we find that the resulting p-values closely follow the correct theoretical distribution [see Additional File 1: Figure S1].

Fig. 3

Typical MGREML estimate of a genetic correlation ($r_g$) matrix in Simulation 2. True genetic correlations ($r_g$'s) are shown above the diagonal. Estimated $r_g$'s (standard error between parentheses) are shown below the diagonal

The results of Simulation 3 show that MGREML can readily estimate and compare factor models for $T = 50$ traits observed in $N = 20{,}000$ individuals, involving 50 fixed effects and including calculation of SEs, on a single notebook with two 2.7 GHz cores and 16 GB of RAM in less than one hour. In addition, on more powerful machines, MGREML estimation can handle even larger numbers of traits and individuals [9].
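The size of the saturated model in Simulation 3 follows from simple counting, sketched below. The counts assume that the genetic and the environmental variance matrix are each parameterized by a lower triangular T-by-T coefficient matrix, and that the AI matrix is square in the number of free parameters; the function name is hypothetical.

```python
def saturated_model_counts(n_traits):
    """Count quantities implied by a saturated model for n_traits traits."""
    per_matrix = n_traits * (n_traits + 1) // 2  # free entries of one lower triangle
    n_parameters = 2 * per_matrix  # genetic plus environmental coefficients
    return {
        "heritabilities": n_traits,
        "genetic_correlations": n_traits * (n_traits - 1) // 2,
        "free_parameters": n_parameters,
        "ai_matrix_elements": n_parameters ** 2,
    }


counts = saturated_model_counts(50)
```

For 50 traits this gives 50 heritabilities and 1225 genetic correlations, in line with the numbers quoted for the saturated model, and an AI matrix with several million elements, which is why recomputing it in every iteration is avoided.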

Empirical application

To illustrate the ability of MGREML to estimate a factor model and test whether it fits the data better than a nested model, we use data on unrelated individuals from the US Health and Retirement Study (HRS) [25], for whom we analyze repeated measures of height and BMI in five consecutive waves of data collection (Waves 7–11). The HRS is a longitudinal panel study that surveys a representative sample of approximately 20,000 individuals aged 51 years and older (and their spouses) in the United States. Further details (e.g., quality control filters and descriptive statistics) are provided in the Supplementary Information [see Additional File 1].

As a baseline model, we start by assuming height and BMI both have no genetic variance (Model I). Given that previous $h^2_{SNP}$ estimates for height and BMI are considerably greater than zero [26, 27] (see, e.g., [28]), we also consider an alternative model with one genetic factor for the height observations and one genetic factor for the BMI observations (Model II), which corresponds to the first two columns of the factor model shown in Table 1 labeled $G_{height}$ and $G_{BMI}$. Although we expect Model II to have a far better fit than Model I, Model II still assumes there is no genetic correlation between height and BMI. Yet, there is ample evidence that height and BMI are genetically correlated traits [9, 29]. Therefore, we also consider a third model in which we introduce a shared genetic factor that affects both the height and BMI observations (Model III), accounting for the genetic overlap between these two traits. Model III corresponds to the factor model shown in Table 1 (where the shared factor is labeled $G_{shared}$) and equivalently in Fig. 2. In all three models, we assume a saturated environmental factor model.

With the HRS surveying a representative sample of individuals aged 51 years and older (and their spouses), it seems unlikely that the unique and the shared genetic architecture of height and BMI will drastically change for individuals in our analysis sample between the biennial waves of data collection. Therefore, we a priori believe Model III to be most suitable for the data. That is, we expect this to be the most parsimonious model that is able to capture both the unique and the shared genetic component of height and BMI across waves. At the same time, taking the aforementioned estimates of $h^2_{SNP}$ and $r_g$ for height and BMI at face value, and using the online GCTA-GREML power calculator [30], we find that the statistical power to detect a genetic correlation of that magnitude in this sample is only 21.8%. Hence, Model II might not be rejected in favor of Model III simply due to lack of statistical power. Details of this power calculation are described in the Supplementary Information [see Additional File 1].

In the application to data on repeated measures of height and BMI, we first compare the fit of Model I and Model II. We find that Model II, as expected, fits the data better than Model I (LRT = 72.03, degrees of freedom = 10). Thus, the null model of no genetic variance is rejected in favor of a model in which (1) height has genetic variance, (2) BMI has genetic variance, yet (3) height and BMI have no genetic correlation. When we compare the fit of Model II and Model III, we do not find an improvement in fit (LRT = 11.11, degrees of freedom = 10, p-value = 0.349). Thus, the model without genetic correlation between height and BMI is not rejected in favor of a model with genetic overlap, in line with our power calculation.
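The likelihood-ratio comparisons reported above can be reproduced from the test statistic and the degrees of freedom alone. The sketch below is a minimal standard-library illustration, not the MGREML routine: it assumes an even number of degrees of freedom, for which the chi-square upper tail probability has a closed form, and the function names and the log-likelihood values in the usage example are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper tail probability of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


def likelihood_ratio_test(loglik_main, loglik_nested, n_free_main, n_free_nested):
    """Return the LRT statistic, its degrees of freedom, and the p-value."""
    stat = 2.0 * (loglik_main - loglik_nested)
    df = n_free_main - n_free_nested
    return stat, df, chi2_sf_even_df(stat, df)


# Comparison with a statistic of 11.11 on 10 degrees of freedom.
p_shared_factor = chi2_sf_even_df(11.11, 10)

# Hypothetical log-likelihoods chosen to give the same statistic.
stat, df, p_value = likelihood_ratio_test(-100.0, -105.555, 20, 10)
```

For a statistic of 11.11 on 10 degrees of freedom the tail probability is approximately 0.349, matching the reported p-value. The 10 degrees of freedom correspond to the ten additional free coefficients of the shared genetic factor in Table 1.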

Conclusion

Accurate estimates of genetic correlations and genetic factor structures across multiple traits help to understand their shared etiology and aid in finding likely causal relationships [29, 31]. As such, estimation and inference based on genetic and environmental factor models may contribute to the design of future genetic and functional studies. Here, we derived a statistical framework (1) to model and estimate such factor models using individual-level SNP data and (2) to test hypotheses regarding these factor models. Using simulations and an empirical application, we confirmed the validity of this statistical framework. This framework is implemented in our freely available command-line tool MGREML, which has simple input options for this purpose. MGREML accepts user-specified genetic and environmental factor models as input, and performs estimation and inference based thereon. Even on a single machine, this tool can readily be applied to data on 20,000 individuals and 50 traits.

Additional file 1: Supplementary Information.

21 in total

1. Cohort Profile: the Health and Retirement Study (HRS).

Authors: Amanda Sonnega; Jessica D Faul; Mary Beth Ofstedal; Kenneth M Langa; John W R Phillips; David R Weir
Journal: Int J Epidemiol Date: 2014-03-25 Impact factor: 7.196

2. Maximum likelihood estimation of variance components for a multivariate mixed model with equal design matrices.

Authors: K Meyer
Journal: Biometrics Date: 1985-03 Impact factor: 2.571

3. Genome partitioning of genetic variation for complex traits using common SNPs.

Authors: Jian Yang; Teri A Manolio; Louis R Pasquale; Eric Boerwinkle; Neil Caporaso; Julie M Cunningham; Mariza de Andrade; Bjarke Feenstra; Eleanor Feingold; M Geoffrey Hayes; William G Hill; Maria Teresa Landi; Alvaro Alonso; Guillaume Lettre; Peng Lin; Hua Ling; William Lowe; Rasika A Mathias; Mads Melbye; Elizabeth Pugh; Marilyn C Cornelis; Bruce S Weir; Michael E Goddard; Peter M Visscher
Journal: Nat Genet Date: 2011-05-08 Impact factor: 38.330

4. The Promises and Pitfalls of Genoeconomics*

Authors: Daniel J Benjamin; David Cesarini; Christopher F Chabris; Edward L Glaeser; David I Laibson; Vilmundur Guðnason; Tamara B Harris; Lenore J Launer; Shaun Purcell; Albert Vernon Smith; Magnus Johannesson; Patrik K E Magnusson; Jonathan P Beauchamp; Nicholas A Christakis; Craig S Atwood; Benjamin Hebert; Jeremy Freese; Robert M Hauser; Taissa S Hauser; Alexander Grankvist; Christina M Hultman; Paul Lichtenstein
Journal: Annu Rev Econom Date: 2012-06-18

5. Conditions for the validity of SNP-based heritability estimation.

Authors: James J Lee; Carson C Chow
Journal: Hum Genet Date: 2014-04-18 Impact factor: 4.132

Review 6. The contribution of genetic variants to disease depends on the ruler.

Authors: John S Witte; Peter M Visscher; Naomi R Wray
Journal: Nat Rev Genet Date: 2014-09-16 Impact factor: 53.242

7. Second-generation PLINK: rising to the challenge of larger and richer datasets.

Authors: Christopher C Chang; Carson C Chow; Laurent Cam Tellier; Shashaank Vattikuti; Shaun M Purcell; James J Lee
Journal: Gigascience Date: 2015-02-25 Impact factor: 6.524

8. MTG2: an efficient algorithm for multivariate linear mixed model analysis based on genomic information.

Authors: S H Lee; J H J van der Werf
Journal: Bioinformatics Date: 2016-01-10 Impact factor: 6.937

9. Multivariate analysis reveals shared genetic architecture of brain morphology and human behavior.

Authors: Ronald de Vlaming; Eric A W Slob; Philip R Jansen; Alain Dagher; Philipp D Koellinger; Patrick J F Groenen; Cornelius A Rietveld
Journal: Commun Biol Date: 2021-10-12

10. An atlas of genetic correlations across human diseases and traits.

Authors: Brendan Bulik-Sullivan; Hilary K Finucane; Verneri Anttila; Alexander Gusev; Felix R Day; Po-Ru Loh; Laramie Duncan; John R B Perry; Nick Patterson; Elise B Robinson; Mark J Daly; Alkes L Price; Benjamin M Neale
Journal: Nat Genet Date: 2015-09-28 Impact factor: 38.330