Literature DB >> 29950019

A scalable estimator of SNP heritability for biobank-scale data.

Yue Wu1, Sriram Sankararaman1,2.   

Abstract

Motivation: Heritability, the proportion of variation in a trait that can be explained by genetic variation, is an important parameter in efforts to understand the genetic architecture of complex phenotypes as well as in the design and interpretation of genome-wide association studies. Attempts to understand the heritability of complex phenotypes attributable to genome-wide single nucleotide polymorphism (SNP) variation data has motivated the analysis of large datasets as well as the development of sophisticated tools to estimate heritability in these datasets. Linear mixed models (LMMs) have emerged as a key tool for heritability estimation where the parameters of the LMMs, i.e. the variance components, are related to the heritability attributable to the SNPs analyzed. Likelihood-based inference in LMMs, however, poses serious computational burdens.
Results: We propose a scalable randomized algorithm for estimating variance components in LMMs. Our method is based on a method-of-moment estimator that has a runtime complexity O(NMB) for N individuals and M SNPs (where B is a parameter that controls the number of random matrix-vector multiplications). Further, by leveraging the structure of the genotype matrix, we can reduce the time complexity to O(NMBmax( log⁡3N, log⁡3M)). We demonstrate the scalability and accuracy of our method on simulated as well as on empirical data. On standard hardware, our method computes heritability on a dataset of 500 000 individuals and 100 000 SNPs in 38 min. Availability and implementation: The RHE-reg software is made freely available to the research community at: https://github.com/sriramlab/RHE-reg.

Entities:  

Mesh:

Year:  2018        PMID: 29950019      PMCID: PMC6022682          DOI: 10.1093/bioinformatics/bty253

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 Introduction

A central question in biology is to understand how much of the variation in a trait (phenotype) can be explained by genetics as opposed to environmental factors. The heritability of a trait is a central notion in quantifying the contribution of genetics to the variation in a trait. The heritability of a trait refers to the proportion of variation in the trait that can be explained by genetic variation (Visscher ). The narrow-sense heritability (h2) refers to the proportion of trait variation that can be explained by a linear function of genetic variation (Almasy and Blangero, 1998). Beyond understanding the genetic basis of a phenotype, heritability determines the power of genetic association studies to detect genetic variants associated with a phenotype, the accuracy of using genetic data to predict phenotypes, as well as the response of a phenotype to natural and artificial selection (Houle, 1992). While family-based studies enabled the estimation of heritability of a wide variety of traits, the availability of genome-wide genetic variation data has enabled a direct estimation of the heritability associated with genotyped single nucleotide polymorphisms (SNPs), termed SNP heritability. Initial attempts to estimate heritability from genomic data focused on the variation in a trait could be explained by SNPs that were discovered to be significantly associated with the trait in a genome-wide association study (GWAS). These estimates were found to severely under-estimate the narrow-sense heritability, a phenomenon known as missing heritability. A major insight into the mystery of missing heritability emerged in Yang ) who showed that using all genotyped SNPs jointly to explain variation in a trait led to a substantially larger estimate of heritability than from SNPs that were found to be associated in GWAS. Subsequent analyses suggest that much of missing heritability could be explained by the presence of a large number of SNPs of weak effects that has, in turn, motivated analyses of larger datasets. Linear mixed models (LMMs) has emerged as a key analytically technique for estimating the heritability of complex traits using genome-wide SNP variation data. Beyond their application in estimating SNP heritability, LMMs are widely used in association tests where they are used to control for population stratification (Kang ; Lippert ; Loh ; Yu ; Zhou and Stephens, 2014), in phenotype and disease risk prediction (Makowsky ; Speed ; Wray ; Yang ; Zhou ), and in understanding the relative contribution of genomic regions to variation in a trait of interest (Makowsky ; Wray ; Yang ). A key step in the application of LMMs is the estimation of their parameters, i.e. often referred to as variance components. Estimation of variance components is a computationally challenging problem on genomic datasets containing large numbers of individuals and SNPs. The most commonly used method for variance components estimation in LMMs relies on maximizing the likelihood of the parameters. Often, a related estimator, known as the restricted maximum likelihood (REML) estimator, is preferred due to a reduced bias relative to maximum likelihood estimators. Both maximum likelihood as well as REML estimation, however, rely on computationally intensive optimization problems. While a number of methods have been proposed to improve the computational efficiency of REML estimators (Kang ; Lippert ; Loh ; Pirinen ; Yang ), all of these methods rely on iterative optimization algorithms that do not scale well to biobank-scale datasets consisting of millions of individuals genotyped at tens of millions of SNPs. Further, REML has been shown to yield biased estimates of heritability in ascertained case-control studies (Chen, 2014; Golan ).

1.1 Our contributions

We propose a scalable randomized algorithm to estimate variance components of a LMM. Our method is based on Haseman–Elston (HE) regression (Bulik-Sullivan, 2015; Chen ; Elston ; Haseman and Elston, 1972), a method-of-moment (MoM) estimator of the heritability of a phenotype. The HE-regression estimator, like other MoM estimators, tends to be statistically less efficient compared to REML. On the other hand, HE-regression is computationally attractive as it leads to a set of linear equations in the variance components that can be solved analytically. While this property of HE-regression is appealing, a key computational bottleneck in the application of HE-regression is the computation of an N × N matrix that summarizes the relationship between all N pairs of individuals in the dataset. As a result, the computation and memory requirements of HE scale quadratically with the number of individuals. Our randomized HE-regression (RHE-reg) estimator relies on the observation that the key bottleneck in HE-regression can be replaced by multiplying the N × M (individuals × SNPs) matrix of genotypes with a small number, B, of random vectors. This leads to a randomized estimator with runtime and memory requirements . Further, we leverage the observation that the genotype matrix has entries in a finite set, i.e. so that the time complexity of matrix-vector multiplication reduces to (Liberty and Zucker, 2009). This additional gain in efficiency can be substantial when the number of SNPs or individuals is large. For example, in the UK Biobank, N is of the order of 105 while M is of the order of 106. Thus, we propose an estimator of variance components with runtime and memory requirement . We apply the RHE-reg estimator to the problem of estimating SNP heritability. We show that our method yields unbiased SNP heritability estimates. While our method is statistically inefficient compared to REML (both because it is moment-based as well as the added randomization), we show in practice that the statistical inefficiency is minimal, particularly for large sample sizes. Further, our method is substantially more computationally efficient so that it can be effectively applied to whole-genome genotype data from hundreds of thousands of individuals. REML has been shown to yield biased estimates of heritability in ascertained case-control studies (Chen ; Golan ) while the RHE-reg estimator can also be applied in this setting. Finally, since variance component analysis is of interest beyond heritability estimation, the RHE-reg estimator can enable rapid estimation of variance components in all of the settings in which LMMs are used.

2 Materials and methods

We observe genotypes from N individuals at M SNPs. The genotype vector for individual i is a length M vector denoted by . The jth entry of denotes the number of minor allele carried by individual i at SNP j. Let be the N × M genotype matrix where . is a N × M matrix of standardized genotypes obtained by centering and scaling each column of so that and for all . Let is an N-vector of phenotypes and be an M-vector of SNP effect sizes.

2.1 Linear mixed model

We assume the vector of phenotypes is related to the genotypes by a LMM: Here is centered so that . is the residual variance while is the variance component corresponding to the M SNPs. The SNP heritability is defined as . In this model, we have while the population covariance of the phenotype vector is: Here is the genetic relatedness matrix (GRM) computed from all SNPs. One approach to estimate the SNP heritability is HE-regression (Haseman and Elston, 1972) which is a MoM estimator obtained by equating the population covariance to the empirical covariance [several variants of HE-regression have been proposed; what we consider here is HE-CP (Sham and Purcell, 2001)]. The empirical covariance of the phenotype vector is estimated by . The MoM estimator is obtained by solving the following ordinary least squares (OLS) problem (see Appendix A1 for details): The MoM estimator satisfies the normal equations: Solving the normal equations requires computing and . The GRM can be computed in time and requires memory. Given the GRM, computing each of the coefficients for the normal equation requires time. Finally, given each of the coefficients, we can solve analytically solve for the and . Indeed, we can write Thus, the key bottleneck in solving the HE-regression lies in computing the GRM.

2.2 RHE-reg: a randomized estimator of heritability

Given that , we can compute the quantities . For standardized genotypes, while can be computed in time. The one remaining quantity that we need to compute efficiently is . Given a N × N matrix and a random vector with mean zero and covariance , we use the following identity to construct a randomized estimator of the trace of matrix (see Appendix A2 for a proof): Equation (9) leads to the following unbiased estimator of the trace of given B random vectors, , drawn independently from a distribution with zero mean and identity covariance matrix : In practice, we draw each entry of independently from a standard normal distribution. We note that the estimator L involves two matrix-vector multiplications of N × M matrix repeated B times for a total runtime of . The RHE-reg estimator () is obtained by solving the Normal equations [Equation (7)] by replacing with L. The RHE-reg estimator of the SNP heritability is then obtained by where is the unbiased estimator of the phenotypic variance.

2.3 Sub-linear computations

The key bottleneck in the RHE-reg is the computation of L which involves repeated multiplication of the normalized genotype matrix by a real-valued vector. Leveraging the fact that each element of the genotype matrix takes values in the set {0, 1, 2}, we can improve the complexity of these multiplication operations from to using the Mailman algorithm (Liberty and Zucker, 2009).

2.3.1 The Mailman algorithm

Consider a M ×  N matrix whose entries take values in {0, 1, 2}. Assume that the number of SNPs . The naive way to compute the product for any real-valued vector takes time. The Mailman algorithm decomposes the matrix as . is a matrix whose column contains all possible vectors over {0, 1, 2} of length . And P is an indicator matrix, where entry if the ith column is the same as jth column in matrix . The decomposition of matrix takes time. The desired product is computed in two steps as followed by , each of which can be computed in only operations (Liberty and Zucker, 2009). For a matrix with , we partition into sub-matrices each of size each of which can be multiplied in time for a total computational cost of .

2.3.2 Application of the Mailman algorithm to RHE-reg

Now consider the standardized genotype , which could be written as , where is a matrix where the ith column contains the sample mean of the ith SNP (), and is an M × M diagonal matrix, with the inverse of variance of each SNP as the diagonal entries. Thus, when we compute in Equation (11), computing using the Mailman algorithm takes operations. Similarly, to compute each term in the sum of the randomized estimator of [Equation (10)], , we can substitute with . The first term can again be computed using using the Mailman algorithm, and the second term is equivalent to scaling the N-vector which can be computed in time .

2.4 Computing the standard error

We show in Appendix A4 that the variance of the RHE-reg estimator of can be approximated by the variance of the exact HE-regression estimator with an additional contribution due to the randomization: Here B is the number of samples used and is a random vector with mean zero and identity covariance matrix. For samples with low-levels of relatedness, we can assume and our estimates of and to estimate the variance. Further, we show in Appendix A4 that we can estimate the variance (and hence, the standard error) of the RHE-reg estimator in sub-linear time without assuming that .

2.5 Some remarks on the RHE-reg estimator

The RHE-reg is biased as we show in Appendix A3 with a bias that decreases with B. In practice, the bias appears to be small (see Fig. 1).
Fig. 1.

RHE-reg accurately estimates heritability: in the first series of (a–c), we simulated genotypes with varying sample size while fixing the number of SNPs to 10 000. The phenotype in each of the (a), (b) and (c) is simulated with true heritability of 0.2, 0.5 and 0.8, respectively. The second series of (d–f) considers genotype data with varying number of SNPs while the number of samples is fixed at 10 000. All three methods that we evaluated (GCTA, HE-reg and RHE-reg) have similar accuracies. GCTA which estimates the REML has smaller standard errors when the heritability is large (). For lower values of true heritability (), the estimates from REML, HE-regression and RHE-reg are comparable. HE and RHE-reg have similar variance suggesting that randomization only makes a minor contribution to the statistical accuracy

Equation (3) assumes an infinitesimal model for the phenotype. However, all our results only depend on the second moment of the SNP effect sizes. Thus, the RHE-reg estimator can yield valid estimates for non-infinitesimal architectures. In a number of settings, it is desirable to include covariates, such as age or sex, in the analysis. This changes the model in Equation (3) to: RHE-reg accurately estimates heritability: in the first series of (a–c), we simulated genotypes with varying sample size while fixing the number of SNPs to 10 000. The phenotype in each of the (a), (b) and (c) is simulated with true heritability of 0.2, 0.5 and 0.8, respectively. The second series of (d–f) considers genotype data with varying number of SNPs while the number of samples is fixed at 10 000. All three methods that we evaluated (GCTA, HE-reg and RHE-reg) have similar accuracies. GCTA which estimates the REML has smaller standard errors when the heritability is large (). For lower values of true heritability (), the estimates from REML, HE-regression and RHE-reg are comparable. HE and RHE-reg have similar variance suggesting that randomization only makes a minor contribution to the statistical accuracy Here is a N × C matrix of covariates while α is a C-vector of coefficients. In this setting, we transform Equation (12) by multiplying by the projection matrix : The RHE-regression estimator applied to Equation (13) then must satisfy the following moment conditions: Here J is a randomized estimator of analogous to Equation (10). The cost of computing the RHE-reg estimator now includes the cost of computing the inverse of as well as multiplying by a real-valued vector for an added computational cost of . Typically, the number of covariates C is small (tens to hundreds) so that the presence of covariance does not greatly increase the computational burden. 4. The variance components model [Equations (3) and (5)] can be extended in a straightforward manner to more than two variance components. A number of recent studies have explored the utility of these models to partition heritability based on functional annotations as well as other categories. 5. The accuracy and the runtime of RHE-reg depends on the choice of the number of random vectors B. In practice, we find that the estimator is highly accurate with a small even for moderate sample sizes as we show empirically (Fig. 2). Further, for larger sample sizes, even smaller values of B should be adequate. It is also possible to choose increasing values of B and to terminate when the estimate of does not change considerably. We have not explored this option in detail in this work.
Fig. 2.

Impact of the number of random vectors on the accuracy of RHE-reg: we ran RHE-reg with a different number of random vectors B, and compared the point estimate and standard error to GCTA. The gray area indicates the standard error computed by GCTA. As RHE-reg use more random vectors, the estimate converges. In fact, even with 10 random vectors, the point estimation is accurate

Impact of the number of random vectors on the accuracy of RHE-reg: we ran RHE-reg with a different number of random vectors B, and compared the point estimate and standard error to GCTA. The gray area indicates the standard error computed by GCTA. As RHE-reg use more random vectors, the estimate converges. In fact, even with 10 random vectors, the point estimation is accurate

3 Results

3.1 Simulations

We performed simulations to measure the performance of RHE-reg to other methods for heritability estimation in terms of accuracy, running time and memory usage. We compared RHE-reg to two methods for computing REML estimates: GCTA (Yang ) (which implements an exact numerical optimization algorithm to compute the REML) as well as implementations of HE-regression.

3.2 Accuracy

In our first set of simulations, we compared the accuracy of RHE-reg to our implementation of exact HE-regression as well as GCTA, an implementation that computes the REML. We simulated genotypes assuming each SNP is drawn independently from a Binomial distribution with allele frequency that is sampled uniformly from the interval (0, 1). Given the genotypes, we simulated phenotypes under an infinitesimal model, i.e. with effect size at each SNP drawn independently from a normal distribution with mean zero and variance equal to the heritability divided by the number of SNPs. We considered different values for the true SNP heritability of the phenotype to be 0.2, 0.5 and 0.8. In our first series of experiments, we fixed the number of SNPs at M = 10 000 and varied the number of individuals . In the second series of experiments, we varied the number of SNPs while fixing the number of individuals to be N = 10 000. We repeated each experiment 100 times in order to assess the variance of each of the estimators. We estimated heritability using RHE-reg with B = 100 random vectors. Figure 1 compares the estimates of each of the three methods (RHE-reg, HE-regression and GCTA) to the true heritability. First, we observe that all three methods obtain estimates of heritability that are quite close to each other as well as to the true heritability across the range of parameters explored. Second, RHE-reg and HE-regression are virtually indistinguishable in the variance of their estimates in each configuration. This suggests that the randomization makes a negligible contribution to the statistical accuracy of the MoM estimators. In some cases, RHE-reg even has a smaller variance than HE-regression. Third, as expected, REML obtains estimators that are closer to the true heritability compared to either of the MoM estimators for a high value of true heritability. For lower values of true heritability (), the estimates from REML, HE-regression and RHE-reg are comparable. This result is also expected given that REML is asymptotically equivalent to MoM when the phenotypic correlation between individuals is small (Sham ; Sham and Purcell, 2001). Finally, the sample size has a bigger effect than the number of SNPs on the accuracy of each of the methods, consistent with theory (Visscher ).

3.3 Computational efficiency

In the second set of simulations, we compared the runtime and memory usage of different methods. We compared RHE-reg to two REML methods, GCTA (Yang ) and BOLT-REML (Loh ) (a computationally efficient approximate method to compute the REML) as well as an exact MoM method MMHE (Ge ). In this experiment, we simulated genotype data consisting of 100 000 SNPs over sample sizes of and 500 k and then simulated phenotypes corresponding to the genotype data. For each dataset, we ran RHE-reg with B = 100 random vectors. We performed all comparisons on an Intel(R) Xeon(R) CPU 2.10 GHz server with 128 GB RAM. All computations were restricted to a single core, capped to a maximum runtime of 12 h and a maximum memory of 128 GB. Figure 3 shows that both GCTA and MMHE do not scale to large sample sizes due to the requirement of computing and operating on a GRM that scales quadratically with N. GCTA could not complete its computation when running on individuals while MMHE did not complete its computation on . BOLT-REML and RHE-reg scale linearly with sample size. However, RHE-reg is an order of magnitude faster than BOLT-REML. For example, on a dataset of a size of 500 K individuals, RHE-reg computed the heritability in about 30 min compared to 400 min for BOLT-REML. Figure 3 shows that RHE-reg is memory efficient as well.
Fig. 3.

RHE-reg is efficient: we measured the run time and memory usage of methods for heritability estimation as a function of the number of samples while fixing the number of SNPs to 100 000. We performed all comparisons on an Intel(R) Xeon(R) CPU 2.10 GHz server with 128 GB RAM. All computations were restricted to a single core, capped to a maximum runtime of 12 h and a maximum memory of 128 GB. In (a), GCTA could not finish computation on 100 K samples. For MMHE, the computation stopped at sample size of 50 k due to memory constraints. Although BOLT-REML scales linearly, RHE-reg is significantly faster. In (b), we observe RHE-reg and BOLT-REML have scalable memory requirements

RHE-reg is efficient: we measured the run time and memory usage of methods for heritability estimation as a function of the number of samples while fixing the number of SNPs to 100 000. We performed all comparisons on an Intel(R) Xeon(R) CPU 2.10 GHz server with 128 GB RAM. All computations were restricted to a single core, capped to a maximum runtime of 12 h and a maximum memory of 128 GB. In (a), GCTA could not finish computation on 100 K samples. For MMHE, the computation stopped at sample size of 50 k due to memory constraints. Although BOLT-REML scales linearly, RHE-reg is significantly faster. In (b), we observe RHE-reg and BOLT-REML have scalable memory requirements

3.4 Application to real data

We compared the statistical accuracy and runtime of BOLT-REML, GCTA and RHE-reg on the Northern Finland Birth Cohort (NFBC) dataset. The NFBC dataset contains 315 529 SNPs and 5326 individuals after applying standard filters (minor allele frequency >0.05 and Hardy–Weinberg equilibrium P-value <0.01) (Sabatti ). We applied these methods to estimate the heritability of three phenotypes that were assayed in this dataset: triglycerides (TGs), high-density lipoprotein (HDL) and body mass index (BMI). We compared the runtime, point estimates of the heritability as well as standard errors for each of the three methods. We computed RHE-reg with B = 100 random vectors. As shown in Table 1, the heritability estimates of RHE-reg are concordant with the other methods while being an order of magnitude faster to compute. We note that the NFBC dataset has a sample size so that we expect RHE-reg to be more accurate on larger datasets. The standard error estimates can also be computed in sub-linear time (see Appendix A4).
Table 1.

The estimates of heritability from RHE-reg are consistent with those from GCTA and BOLT-REML on the NFBC data while RHE-reg is substantially faster

Method
GCTA
BOLT-REML
RHE-reg
Runtimehg2Runtimehg2Runtimehg2
(min)(SE)(min)(SE)(min)(SE)
TG11.280.1458.870.1481.610.145
(0.051)(0.051)(0.052)
HDL10.810.3259.720.3261.300.349
(0.051)(0.051)(0.052)
BMI10.850.2379.290.2351.290.200
(0.051)(0.051)(0.052)

Note: We estimate the heritability of phenotypes such as triglycerides (TGs), high-density lipoprotein (HDL) and body mass index (BMI) in the NFBC data set.

The estimates of heritability from RHE-reg are consistent with those from GCTA and BOLT-REML on the NFBC data while RHE-reg is substantially faster Note: We estimate the heritability of phenotypes such as triglycerides (TGs), high-density lipoprotein (HDL) and body mass index (BMI) in the NFBC data set.

3.5 Understanding the computational efficiency of RHE-reg

Our implementation of RHE-reg relies on two ideas to obtain computational efficiency: (i) the use of a randomized estimator of the trace, and (ii) the Mailman algorithm for fast matrix-vector multiplication. To explore the contribution of each of these ideas, we compared the runtimes of a MoM estimator with no randomization (HE-reg), RHE-reg using standard matrix-vector multiplication and RHE-reg using the Mailman algorithm. Table 2 shows the runtimes of each of these variants on the NFBC data. We see that the biggest runtime gain arises from applying the randomized estimator (faster by a factor of 10–12 relative to HE-reg) while the application of the Mailman algorithm reduces the runtime further by a factor of 2 (Table 1).
Table 2.

The major gain in computational efficiency arises from the application of the randomized trace estimate

RuntimeNo MailmanNo randomized
trace estimate
(min)(min)(min)
TG1.613.7038.5
HDL1.302.6036.2
BMI1.292.6836.7

Note: We compare the run time for HE-reg as well as the run time for RHE-reg that does not rely on the Mailman algorithm.

The major gain in computational efficiency arises from the application of the randomized trace estimate Note: We compare the run time for HE-reg as well as the run time for RHE-reg that does not rely on the Mailman algorithm.

3.6 Accuracy of RHE-reg as a function of the number of random vectors B

To explore the impact of the choice of the number of random vectors B on the accuracy of RHE-reg, we compared the heritability estimates of RHE-reg to those obtained from GCTA for the TG phenotype as a function of B. We find good concordance between the estimates from RHE-reg and GCTA even for values of B as low as 10 suggesting that RHE-reg could be even faster in practice with little loss in accuracy (see Fig. 2).

4 Discussion

We proposed a scalable estimator of heritability which is a randomized version of the Haseman–Elston regression (RHE-reg) estimator. The RHE-reg estimator is based on performing a small number of multiplications of the genotype matrix with random vectors with mean zero and identity covariance. Using the properties of the genotype matrix, we can compute this estimator using the Mailman algorithm in time on a dataset containing N individuals, M SNPs and with a small number of B random vectors. We show that this estimator achieves similar accuracy as REML-based methods on both simulated and real data. RHE-reg can be effectively applied to whole-genome genotype data of hundreds of thousands of individuals for rapid variance components estimation. Furthermore, RHE-reg is an unbiased estimator and thus can also be applied to ascertained case-control studies.
  26 in total

1.  Haseman and Elston revisited.

Authors:  R C Elston; S Buxbaum; K B Jacobs; J M Olson
Journal:  Genet Epidemiol       Date:  2000-07       Impact factor: 2.135

2.  Equivalence between Haseman-Elston and variance-components linkage analyses for sib pairs.

Authors:  P C Sham; S Purcell
Journal:  Am J Hum Genet       Date:  2001-05-14       Impact factor: 11.025

Review 3.  Heritability in the genomics era--concepts and misconceptions.

Authors:  Peter M Visscher; William G Hill; Naomi R Wray
Journal:  Nat Rev Genet       Date:  2008-03-04       Impact factor: 53.242

4.  FaST linear mixed models for genome-wide association studies.

Authors:  Christoph Lippert; Jennifer Listgarten; Ying Liu; Carl M Kadie; Robert I Davidson; David Heckerman
Journal:  Nat Methods       Date:  2011-09-04       Impact factor: 28.547

5.  The investigation of linkage between a quantitative trait and a marker locus.

Authors:  J K Haseman; R C Elston
Journal:  Behav Genet       Date:  1972-03       Impact factor: 2.805

6.  Common SNPs explain a large proportion of the heritability for human height.

Authors:  Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal:  Nat Genet       Date:  2010-06-20       Impact factor: 38.330

Review 7.  Pitfalls of predicting complex traits from SNPs.

Authors:  Naomi R Wray; Jian Yang; Ben J Hayes; Alkes L Price; Michael E Goddard; Peter M Visscher
Journal:  Nat Rev Genet       Date:  2013-07       Impact factor: 53.242

8.  Phenome-wide heritability analysis of the UK Biobank.

Authors:  Tian Ge; Chia-Yen Chen; Benjamin M Neale; Mert R Sabuncu; Jordan W Smoller
Journal:  PLoS Genet       Date:  2017-04-07       Impact factor: 5.917

9.  Polygenic modeling with bayesian sparse linear mixed models.

Authors:  Xiang Zhou; Peter Carbonetto; Matthew Stephens
Journal:  PLoS Genet       Date:  2013-02-07       Impact factor: 5.917

10.  Contrasting genetic architectures of schizophrenia and other complex diseases using fast variance-components analysis.

Authors:  Po-Ru Loh; Gaurav Bhatia; Alexander Gusev; Hilary K Finucane; Brendan K Bulik-Sullivan; Samuela J Pollack; Teresa R de Candia; Sang Hong Lee; Naomi R Wray; Kenneth S Kendler; Michael C O'Donovan; Benjamin M Neale; Nick Patterson; Alkes L Price
Journal:  Nat Genet       Date:  2015-11-02       Impact factor: 38.330

View more
  13 in total

1.  On Negative Heritability and Negative Estimates of Heritability.

Authors:  David Steinsaltz; Andy Dahl; Kenneth W Wachter
Journal:  Genetics       Date:  2020-04-14       Impact factor: 4.562

2.  Trade-offs of Linear Mixed Models in Genome-Wide Association Studies.

Authors:  Haohan Wang; Bryon Aragam; Eric P Xing
Journal:  J Comput Biol       Date:  2022-02-25       Impact factor: 1.479

3.  A unified framework for cross-population trait prediction by leveraging the genetic correlation of polygenic traits.

Authors:  Mingxuan Cai; Jiashun Xiao; Shunkang Zhang; Xiang Wan; Hongyu Zhao; Gang Chen; Can Yang
Journal:  Am J Hum Genet       Date:  2021-03-25       Impact factor: 11.025

4.  Estimating variance components in population scale family trees.

Authors:  Tal Shor; Iris Kalka; Dan Geiger; Yaniv Erlich; Omer Weissbrod
Journal:  PLoS Genet       Date:  2019-05-09       Impact factor: 5.917

5.  Accurate estimation of SNP-heritability from biobank-scale data irrespective of genetic architecture.

Authors:  Kangcheng Hou; Kathryn S Burch; Arunabha Majumdar; Huwenbo Shi; Nicholas Mancuso; Yue Wu; Sriram Sankararaman; Bogdan Pasaniuc
Journal:  Nat Genet       Date:  2019-07-29       Impact factor: 38.330

6.  Inferring Gene-by-Environment Interactions with a Bayesian Whole-Genome Regression Model.

Authors:  Matthew Kerin; Jonathan Marchini
Journal:  Am J Hum Genet       Date:  2020-09-03       Impact factor: 11.025

7.  Subsampling Technique to Estimate Variance Component for UK-Biobank Traits.

Authors:  Ting Xu; Guo-An Qi; Jun Zhu; Hai-Ming Xu; Guo-Bo Chen
Journal:  Front Genet       Date:  2021-03-05       Impact factor: 4.599

8.  Efficient variance components analysis across millions of genomes.

Authors:  Ali Pazokitoroudi; Yue Wu; Kathryn S Burch; Kangcheng Hou; Aaron Zhou; Bogdan Pasaniuc; Sriram Sankararaman
Journal:  Nat Commun       Date:  2020-08-11       Impact factor: 14.919

9.  A non-linear regression method for estimation of gene-environment heritability.

Authors:  Matthew Kerin; Jonathan Marchini
Journal:  Bioinformatics       Date:  2020-12-26       Impact factor: 6.937

10.  Scalable probabilistic PCA for large-scale genetic variation data.

Authors:  Aman Agrawal; Alec M Chiu; Minh Le; Eran Halperin; Sriram Sankararaman
Journal:  PLoS Genet       Date:  2020-05-29       Impact factor: 6.020

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.