Andrey Ziyatdinov1, Jihye Kim1, Dmitry Prokopenko2,3, Florian Privé4, Fabien Laporte5, Po-Ru Loh6,7, Peter Kraft1, Hugues Aschard1,5.
Abstract
The effective sample size (ESS) is a metric used to summarize in a single term the amount of correlation in a sample. It is of particular interest when predicting the statistical power of genome-wide association studies (GWAS) based on linear mixed models. Here, we introduce an analytical form of the ESS for mixed-model GWAS of quantitative traits and relate it to empirical estimators recently proposed. Using our framework, we derived approximations of the ESS for analyses of related and unrelated samples and for both marginal genetic and gene-environment interaction tests. We conducted simulations to validate our approximations and to provide a quantitative perspective on the statistical power of various scenarios, including power loss due to family relatedness and power gains due to conditioning on the polygenic signal. Our analyses also demonstrate that the power of gene-environment interaction GWAS in related individuals strongly depends on the family structure and exposure distribution. Finally, we performed a series of mixed-model GWAS on data from the UK Biobank and confirmed the simulation results. We notably found that the expected power drop due to family relatedness in the UK Biobank is negligible.
Year: 2021 PMID: 33734375 PMCID: PMC8495748 DOI: 10.1093/g3journal/jkab057
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Scenarios and covariance matrices for testing the marginal genetic effect
| Scenario | Model | Study design | Σ (trait) | Σ (genetic variant) |
|---|---|---|---|---|
| Unrelated | LR | Unrelated | | |
| Families | LMM | Related | | |
| Unrelated+Grouping | LMM | Unrelated | | |
| Unrelated+GRM | LMM | Unrelated | | |
The relationship matrices are as follows: K is the kinship matrix; F is the group-membership matrix; G is the genetic relationship matrix (GRM).
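The scenarios above can be illustrated with a small numerical sketch. It is not taken from the paper: it assumes that the ESS multiplier can be written as tr(Σ_trait⁻¹ Σ_variant) / n with the total trait variance scaled to one (which follows from E[gᵀAg] = tr(AΣ) for a zero-mean vector g with covariance Σ), and it assumes the usual variance-component forms for the trait covariance. The family layout, the number of families, and all function names are hypothetical.

```python
import numpy as np

def ess_multiplier(sigma_y, sigma_g):
    """Assumed form of the ESS multiplier: tr(sigma_y^{-1} sigma_g) / n.

    sigma_y: trait covariance (unit total variance assumed)
    sigma_g: covariance of the tested genetic variant across individuals
    """
    n = sigma_y.shape[0]
    # solve() returns sigma_y^{-1} @ sigma_g without forming the inverse
    return np.trace(np.linalg.solve(sigma_y, sigma_g)) / n

def repeat_block(block, n_blocks):
    """Block-diagonal matrix made of n_blocks copies of `block`."""
    return np.kron(np.eye(n_blocks), block)

# Hypothetical nuclear family: 2 unrelated parents (indices 0, 1) and
# 3 offspring. Entries are scaled to a unit diagonal (an assumption),
# so parent-offspring and sibling pairs have 0.5.
family = np.full((5, 5), 0.5)
family[0, 1] = family[1, 0] = 0.0
np.fill_diagonal(family, 1.0)

n_families = 20                                  # hypothetical
K = repeat_block(family, n_families)             # kinship matrix
F = repeat_block(np.ones((5, 5)), n_families)    # group-membership matrix
I = np.eye(K.shape[0])

s = 0.5  # share of trait variance explained by the random effect

m_unrelated = ess_multiplier(I, I)                         # baseline
m_families = ess_multiplier(s * K + (1 - s) * I, K)        # related samples
m_grouping = ess_multiplier(s * F + (1 - s) * I, I)        # unrelated, grouped

print(m_unrelated, m_families, m_grouping)
```

Under these assumptions the baseline multiplier equals one, the Families value falls below one, and the Unrelated+Grouping value rises above one. Swapping `F` for a GRM computed from genotypes would give the Unrelated+GRM scenario in the same way.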
Scenarios and covariance matrices for testing the gene-environment interaction effect
| Scenario | Model | Study design | Σ (trait) | Σ (interaction variable) |
|---|---|---|---|---|
| Unrelated | LR | Unrelated | | |
| Families | LMM | Related | | |
| Unrelated+Grouping | LMM | Unrelated | | |
| Unrelated+GRM | LMM | Unrelated | | |
The relationship matrices specific to testing gene-environment interactions are as follows: K is an interaction kinship matrix (Sul ); G is an interaction genetic relationship matrix (GRM), defined similarly to K.
Figure 1. The relative power of detecting the marginal genetic effect β. (A) The ESS multiplier is less than one for the Families scenario and greater than one for the Unrelated+Grouping scenario, compared to the baseline Unrelated scenario. The amount of variance explained by the random effect varies from 0 to 100%. (B) The power of detecting β increases with the sample size at different rates for the Unrelated, Families, and Unrelated+Grouping scenarios. The random effect and the genetic variant explain 50% and 1% of trait variance, respectively. (C) The covariance matrices Σ of the trait and of the genetic variant (used to compute the ESS multiplier) are depicted when 50% of the trait variance is explained by the random effect (denoted by * on panel A).
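The link between the ESS multiplier and the power curves in panel B can be sketched as follows. This is an illustration under stated assumptions, not the authors' simulation code: it treats the association test as a two-sided 1-df Wald test whose non-centrality parameter is approximately n × multiplier × (share of trait variance explained by the variant), which is a small-effect approximation. The function name and the default significance level of 5e-8 (a conventional genome-wide threshold) are assumptions.

```python
from statistics import NormalDist

def power_1df(n, multiplier, var_explained, alpha=5e-8):
    """Approximate power of a two-sided 1-df Wald test.

    n             : number of individuals
    multiplier    : ESS multiplier (1.0 for the unrelated baseline)
    var_explained : share of trait variance explained by the tested variable
    alpha         : significance level (assumed default, not from the text)
    """
    std_normal = NormalDist()
    # Assumed small-effect approximation of the non-centrality parameter
    ncp = n * multiplier * var_explained
    shift = ncp ** 0.5
    # Critical value of |Z| for a two-sided test at level alpha
    z_crit = -std_normal.inv_cdf(alpha / 2.0)
    # P(|Z + shift| > z_crit) for Z ~ N(0, 1)
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)

# Power as a function of sample size for three hypothetical multipliers
for n in (1000, 2000, 5000):
    row = [round(power_1df(n, m, 0.01), 3) for m in (0.8, 1.0, 1.5)]
    print(n, row)
```

In this approximation a multiplier below one acts like a smaller sample and a multiplier above one like a larger sample, so the three curves rise with n at different rates.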
Figure 2. The accuracy of two empirical multipliers (panels A and B) is evaluated against the analytical multiplier (red bars). Association studies of six anthropometric traits are performed using LR and low-rank LMM in 336,347 unrelated UK Biobank individuals. The empirical multipliers are estimated from the test statistics of the top 1000 associated variants for each trait: all 1000 variants (dark gray bars) and a subset of these 1000 variants (significant in LMM, P < , and nominally significant in LR, P < 0.05) (beige bars). The error bars show the first to third quartiles of the distribution of ratios of squared standard errors or test statistics between the LMM and LR models.
Figure 3. The relative power of detecting the gene-environment interaction effect δ. The frequency of the binary exposure is 0.6; for the Families scenario, the exposure status is fixed such that the two parents are unexposed and the three offspring are exposed. (A) The ESS multiplier is greater than one for both the Families and Unrelated+Grouping scenarios, compared to the baseline Unrelated scenario. The amount of variance explained by the random effects varies from 0 to 100%. (B) The power of detecting δ increases with the sample size at different rates for the Unrelated, Families, and Unrelated+Grouping scenarios. The random effects (jointly) and the interaction variable explain 50% and 1% of trait variance, respectively. (C) The covariance matrices Σ of the trait and of the interaction variable (used to compute the ESS multiplier) are depicted when 50% of trait variance is explained by the random effects (denoted by * on panel A). The colored gradients in the matrix entries denote quantitative differences among positive values, while gray entries correspond to negative values. The ratio between the variance components of the two random effects is fixed to 0.1; the genetic and environmental variables each explain 1% of the trait variance, in addition to the 1% explained by the interaction variable.
Figure 4. The relative power of detecting the gene-environment interaction effect δ in nuclear families under different simulation settings. The ESS multiplier is analytically computed (i) for all possible realizations of a binary exposure within a nuclear family with 2 parents and 3 offspring (dots in each panel) and (ii) for different ratios between the variance components of the two random effects (three panels). The amount of trait variance jointly explained by the random effects is fixed to 50%. The two largest values of the multiplier in the left and middle panels correspond to the exposure realizations exposed offspring/unexposed parents and exposed parents/unexposed offspring.