Huanhuan Zhu, Xiang Zhou.
Abstract
In genome-wide association studies (GWAS), SNP heritability measures the proportion of phenotypic variance explained by all measured SNPs. Accurate estimation of SNP heritability can help us better understand the degree to which measured genetic variants influence phenotypes. Over the last decade, a variety of statistical methods and software tools have been developed for SNP heritability estimation with different data types, including genotype array data, imputed genotype data, whole-genome sequencing data, RNA sequencing data, and bisulfite sequencing data. However, a thorough technical review of these methods, especially from a statistical and computational viewpoint, is currently missing. To fill this knowledge gap, we present a comprehensive review of a broad category of recently developed and commonly used SNP heritability estimation methods. We focus on their modeling assumptions; their interconnected relationships; their applicability to quantitative, binary and count phenotypes; their use of individual-level data versus summary statistics; and their utility for SNP heritability partitioning. We hope that this review will serve as a useful reference for both methodologists who develop heritability estimation methods and practitioners who perform heritability analysis.
Keywords: Linear mixed model; Method of moments; REML; SNP heritability; Summary statistics
Year: 2020 PMID: 32637052 PMCID: PMC7330487 DOI: 10.1016/j.csbj.2020.06.011
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Partition of phenotypic variance. VG represents the phenotypic variance due to genetic effects; VE represents the phenotypic variance due to environmental effects; and VG×E represents the phenotypic variance due to gene-environment interactions. The genetic variance VG can be partitioned into three parts: Vadd represents the additive genetic effects; Vdom represents the dominance genetic effects; and Vepi represents the epistatic effects. The environmental variance VE can also be partitioned into three parts: Vcom represents common environmental effects, such as those due to residing in the same family; Vmat represents maternal effects, such as nutritional intake during pregnancy; and Venv represents the residual stochastic environmental effects.
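The partition described in the Fig. 1 caption can be written compactly as follows. This is a sketch of the caption's decomposition only; the symbol V_P for total phenotypic variance and the notation h² for narrow-sense heritability are assumptions introduced here, since neither appears in the caption itself.

```latex
\begin{align}
V_P &= V_G + V_E + V_{G \times E}, \\
V_G &= V_{\mathrm{add}} + V_{\mathrm{dom}} + V_{\mathrm{epi}}, \\
V_E &= V_{\mathrm{com}} + V_{\mathrm{mat}} + V_{\mathrm{env}}, \\
h^2 &= \frac{V_{\mathrm{add}}}{V_P}.
\end{align}
```

Under this notation, SNP heritability is the part of the ratio in the last line that is attributable to the measured SNPs, so it is expected to be no larger than h².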
Fig. 2. The distributions of liability scores under random sampling and ascertained case-control sampling for unrelated individuals. Red represents cases, while black represents controls.
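Because ascertained case-control sampling changes the distribution of liability scores, estimates obtained on the observed 0/1 scale are usually rescaled to the liability scale. The sketch below uses a widely cited transformation, generally attributed to Lee et al. (2011); it is not stated in this excerpt and is included only as an illustration. The function name and the parameters `K` (population prevalence) and `P` (proportion of cases in the sample) are assumptions made for this example.

```python
from statistics import NormalDist


def observed_to_liability(h2_obs: float, K: float, P: float) -> float:
    """Rescale an observed-scale heritability estimate to the liability scale.

    Assumed inputs (hypothetical names, not from the source):
      h2_obs: heritability estimated on the observed 0/1 scale
      K:      disease prevalence in the population
      P:      proportion of cases in the (possibly ascertained) sample
    """
    if not (0.0 < K < 1.0 and 0.0 < P < 1.0):
        raise ValueError("K and P must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - K)  # liability threshold for being a case
    z = std_normal.pdf(threshold)            # normal density at the threshold
    return h2_obs * (K * (1.0 - K)) ** 2 / (z ** 2 * P * (1.0 - P))


if __name__ == "__main__":
    # Illustrative numbers only: a 1% prevalence disease with a balanced sample.
    print(observed_to_liability(h2_obs=0.3, K=0.01, P=0.5))
```

When the sample is drawn at random, so that P equals K, the factor reduces to K(1 - K) / z², which is the classical conversion without an ascertainment correction.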
A summary of methods for SNP heritability estimation.
| Main Text Sections | Methods | Modeling Assumptions | Estimation Algorithms | Trait Types | Software | Weblink | Comments | References |
|---|---|---|---|---|---|---|---|---|
| Modeling assumptions | BVSR | | MCMC | Quantitative | GEMMA | | Fast for large-scale data; also useful for phenotype prediction and PRS construction; supports Mac and Linux platforms | |
| | BSLMM | | MCMC | | | | | |
| | LMM/REML | | REML | Quantitative | GEMMA/GCTA | | Also useful for SNP association tests with LMM; supports Windows, Mac and Linux platforms | |
| | LDAK | | REML | Quantitative | LDAK | | Over 30 functions, notably SNP heritability estimation and construction of SNP-based prediction models; supports Mac and Linux platforms | |
| | DPR | | MCMC/VB | Quantitative | DPR | | Mainly for robust genetic prediction and PRS construction of complex traits; supports Mac and Linux platforms | |
| Case-control study | HE/PCGC | | MoM | Binary | PCGC | | Mitigates biases in REML heritability estimation for ascertained case-control studies; supports Linux platform | |
| Count data | PQLseq | | MCMC | Binary/Count | PQLseq | | For heritability estimation of count data in RNA-seq and bisulfite-seq studies; supports Windows, Mac and Linux platforms | |
| Summary statistics | LDSC | | MoM | Quantitative/Binary | LDSC | | A command-line tool for estimating heritability and genetic correlation using GWAS summary statistics | |
| | MQS | | MoM | Quantitative/Binary | GEMMA | | A general statistical framework for SNP heritability estimation using summary statistics; supports Mac and Linux platforms | |
| | SumHer | | REML | Quantitative/Binary | SumHer | | Heritability estimation using summary statistics under the LDAK assumption; supports Linux platform | |
Table lists 10 methods described in the main text, with the first seven methods for analyzing individual-level data and the last three methods for analyzing summary statistics. Columns contain the main text section in which the method is described (1st column), method name (2nd column), modeling assumption on the SNP effect sizes (3rd column), estimation algorithm (4th column), phenotype type (5th column), implemented software (6th column), web link (7th column), additional comments (8th column) and references (9th column). In the 3rd column, δ0 denotes a point mass at zero; N(.,.) denotes a normal distribution with mean and variance parameters; DP denotes a Dirichlet process. In the 4th column, MCMC represents the Markov chain Monte Carlo method, VB represents variational Bayes, REML represents the restricted maximum likelihood method, and MoM represents the method of moments. In the 8th column, PRS is short for polygenic risk scores.
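As an illustration of the method-of-moments (MoM) entries in the table, the sketch below implements a basic Haseman-Elston (HE) regression on simulated data. It is a minimal example under stated assumptions, not the implementation used by PCGC, GEMMA or any other listed software: the function names `simulate` and `he_regression`, the simulation settings, and the choice of a regression through the origin on off-diagonal pairs are all assumptions made here.

```python
import numpy as np


def simulate(n: int, p: int, h2: float, seed: int = 0):
    """Simulate genotypes and a quantitative trait with SNP heritability h2.

    Hypothetical setup: independent SNPs, effect sizes drawn as N(0, h2 / p)
    on the standardized genotype scale, and N(0, 1 - h2) residual noise.
    """
    rng = np.random.default_rng(seed)
    maf = rng.uniform(0.05, 0.5, size=p)        # minor allele frequencies
    X = rng.binomial(2, maf, size=(n, p))       # genotypes coded 0/1/2
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each SNP
    beta = rng.normal(0.0, np.sqrt(h2 / p), size=p)
    y = Z @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return X, y


def he_regression(X: np.ndarray, y: np.ndarray) -> float:
    """Estimate SNP heritability by regressing y_i * y_j on K_ij for i < j."""
    n, p = X.shape
    Z = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize each SNP
    K = Z @ Z.T / p                             # genetic relatedness matrix
    y_std = (y - y.mean()) / y.std()            # standardize the phenotype
    upper = np.triu_indices(n, k=1)             # off-diagonal pairs with i < j
    k_off = K[upper]
    yy_off = np.outer(y_std, y_std)[upper]
    # Least-squares slope of a regression through the origin.
    return float(k_off @ yy_off / (k_off @ k_off))


if __name__ == "__main__":
    X, y = simulate(n=2000, p=1000, h2=0.5, seed=0)
    print("HE estimate of SNP heritability:", he_regression(X, y))
```

The estimate is noisy at this sample size, so it should be read as approximately recovering the simulated value rather than matching it exactly. Only cross-products between distinct individuals are used, which is why the diagonal of the relatedness matrix is excluded.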
Fig. 3. A decision tree on what type of methods to use for SNP heritability estimation.
A summary of SNP heritability estimates for height using different methods.
| References | Dataset | Data Type | Sample Size | Number of SNPs | SNP Type (applicable AF) | Methods | SNP Heritability Estimates |
|---|---|---|---|---|---|---|---|
| | Australian data | Individual | 35,189 | 294,831 | Array (>0.01) | LMM/REML | 0.449 |
| | Australian data | Individual | 35,189 | 294,831 | Array (>0.01) | BSLMM | 0.41 |
| | | | | | | LMM/REML | 0.42 |
| | | | | | | BVSR | 0.15 |
| | Australian data | Individual | 3,925 | 4,352,968 | Imputed (>0.01) | MQS | 0.28 |
| | | | | | | LMM/REML | 0.27 |
| | | | | | | HE | 0.25 |
| | | | | | | LDSC | 0.21 |
| | Australian data | Individual | 35,189 | 294,831 | Array (>0.01) | PCGC/HE | 0.537 |
| | | | | | | LMM/REML | 0.510 |
| | 24 Published GWAS | Summary | Average 121,000 | 4,555,718 | Imputed (>0.01) | SumHer | 0.46 |
| | | | | | | LDSC | 0.20 |
| | UK10K | Individual | 44,126 | ~17 M | Imputed (>0.0003) | GREML-LDMS | 0.56 |
| | | | | | | GREML-MS | 0.523 |
Table lists SNP heritability estimates for height reported in the previous literature. Columns contain the references where the SNP heritability estimates are reported (1st column), dataset name (2nd column), data type in terms of individual-level data versus summary statistics (3rd column), sample size (4th column), number of SNPs (5th column), genotype data type in terms of array data versus imputed data (6th column), methods used (7th column) and the SNP heritability estimates (8th column). Note that the heritability estimate for height in the Australian data using the imputed data [59] is smaller than that using the array data, which seems to be a general phenomenon for many other traits.