Sarah S Ji1, Christopher A German1, Kenneth Lange2,3, Janet S Sinsheimer1,2,3, Hua Zhou1, Jin Zhou4, Eric M Sobel5,6.
Abstract
BACKGROUND: Statistical geneticists employ simulation to estimate the power of proposed studies, test new analysis tools, and evaluate properties of causal models. Although there are existing trait simulators, there is ample room for modernization. For example, most phenotype simulators are limited to Gaussian traits or traits transformable to normality, while ignoring qualitative traits and realistic, non-normal trait distributions. Also, modern computer languages, such as Julia, that accommodate parallelization and cloud-based computing are now mainstream but rarely used in older applications. To meet the challenges of contemporary big studies, it is important for geneticists to adopt new computational tools.
Keywords: Power; Realistic genetic models; Statistical genetics; Trait simulation
Year: 2021 PMID: 33941078 PMCID: PMC8091532 DOI: 10.1186/s12859-021-04086-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Simulation models included in TraitSimulation
| | Simulation model | Relatedness data | Typical application |
|---|---|---|---|
| (1) | Generalized linear models | – | Exponential-family traits |
| (2) | Case/control models | – | Disease status traits |
| (3) | Proportional hazards/odds models | – | Ordinal traits |
| (4) | Variance component models | GRM | Correlated normal traits |
| (5) | Generalized linear mixed models | GRM | Correlated non-normal traits |
Syntax for model construction and for the simulate function (see text for variable definitions)
| Simulation model in Table | Model construction syntax |
|---|---|
| (1) | Model = GLMTrait( |
| | Model = GLMTrait( |
| (2, 3) | Model = OrderedMultinomialTrait( |
| (4) | Model = VCMTrait( |
| | Model = VCMTrait(formula, |
| | Model = VCMTrait( |
| (5) | Model = GLMMTrait( |
Fig. 1 OpenMendel pipeline example. TraitSimulation fits within a software pipeline to assess the power of association analysis under the variance components model of Case Study 2
Fig. 2 Case study 1: power under an ordinal multinomial model. This example shows the power to detect a single causal SNP in UK Biobank data with four outcome categories for disease status. Using an ordinal multinomial simulation model and the OpenMendel module for ordinal trait regression [13], we assume a single SNP as a fixed effect and control for sex and standardized age. The figure compares analysis results for three SNPs of varying MAF over 1000 simulation replicates each. For each SNP, the graph depicts the power to detect that SNP at significance level . For each SNP, the effect size varies from 0 to 0.05 in increments of 0.001. On the x-axis, we exponentiate effect sizes to convert them to odds ratios. See the text for a detailed description of the model
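The effect-size grid and odds-ratio conversion described in the Fig. 2 caption can be sketched in a few lines. The Python below is a minimal illustration and not the TraitSimulation API: the function names `effect_size_grid` and `simulate_ordinal`, the cutpoint values, and the sign convention P(Y ≤ j) = logistic(θ_j − η) are all assumptions made for the sketch.

```python
import math
import random

def effect_size_grid(start=0.0, stop=0.05, step=0.001):
    """Effect sizes from start to stop inclusive, built from integer steps
    so that repeated float addition does not drift."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 10) for i in range(n_steps + 1)]

def simulate_ordinal(eta, cutpoints, rng):
    """Draw one category in 1..K from a proportional odds model.

    Assumed convention: P(Y <= j) = logistic(cutpoints[j-1] - eta),
    with cutpoints strictly increasing and K = len(cutpoints) + 1.
    """
    u = rng.random()
    for j, theta in enumerate(cutpoints, start=1):
        cumulative = 1.0 / (1.0 + math.exp(-(theta - eta)))
        if u <= cumulative:
            return j
    return len(cutpoints) + 1

# Grid from the caption: 0 to 0.05 in increments of 0.001 (51 values).
betas = effect_size_grid()
# Exponentiating each log-odds effect size gives the odds ratio plotted on the x-axis.
odds_ratios = [math.exp(b) for b in betas]

# Hypothetical cutpoints giving four outcome categories.
rng = random.Random(2021)
example_draws = [simulate_ordinal(0.05 * 1.0, [-1.0, 0.0, 1.0], rng) for _ in range(10)]
```

Under this convention a larger linear predictor shifts probability toward higher categories, so `math.exp(beta)` is the odds ratio for falling in a higher category per unit increase in the SNP genotype.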
Fig. 3 Case study 2: power under univariate and bivariate variance components models. This example shows the power to detect a single causal SNP using both univariate and bivariate variance components simulation models and the OpenMendel module for variance components analysis [8]. For each analysis, each line in the graph depicts the power to detect a SNP with using 1000 simulations at significance level . The SNP effect size varies from 0 to 0.065 in increments of 0.002 in the center range (0.016–0.032) and increments of 0.005 in the two end ranges. On the x-axis, we convert the SNP MAF and effect sizes into the proportion of variation explained by the SNP. See the text for a detailed description of the model
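The caption does not spell out how MAF and effect size are converted into a proportion of variation explained. One common conversion, assumed here rather than taken from the paper, treats the SNP as additive and in Hardy-Weinberg equilibrium, so the genotype variance is 2p(1 − p) and the SNP contributes 2p(1 − p)β² to the trait variance. The function name and the `total_variance` argument are hypothetical.

```python
def proportion_variance_explained(maf, beta, total_variance):
    """Share of trait variance attributable to one additive SNP.

    Assumes Hardy-Weinberg equilibrium, so Var(genotype) = 2 * maf * (1 - maf).
    total_variance is the full trait variance, supplied by the caller.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("maf must lie in [0, 0.5]")
    if total_variance <= 0.0:
        raise ValueError("total_variance must be positive")
    genotype_variance = 2.0 * maf * (1.0 - maf)
    return genotype_variance * beta ** 2 / total_variance

# Example: a SNP with MAF 0.3 and effect size 0.03 on a unit-variance trait.
pve = proportion_variance_explained(maf=0.3, beta=0.03, total_variance=1.0)
```

Because the proportion grows with β², evenly spaced effect sizes are not evenly spaced on this axis.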
For the ordered multinomial model, power calculation runtimes in seconds
| 707.8 | |
| 14350.2 |
A total of 1000 replications were performed for each combination (k, n) of the number of causal SNPs and the sample size. This was repeated for several SNP effect sizes (see Fig. 2) and the median runtimes are recorded here
For the univariate and bivariate variance components model, power calculation runtimes in seconds
| Univariate | |||
| 72.4 | 202.7 | 815.4 | |
| 1422.6 | 4122.1 | 16018.4 | |
| Bivariate | |||
| 215.5 | 354.7 | 978.7 | |
| 4207.8 | 7007.2 | 19644.8 |
A total of 1000 replications were performed for each combination (k, n) of the number of causal SNPs and the sample size. This was repeated for several SNP effect sizes (see Fig. 3) and the median runtimes are recorded here
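The bookkeeping behind the two runtime tables can be sketched as follows: power at one effect size is the fraction of replicates whose p-value falls below the significance level, and the reported runtime is the median over the effect sizes tried. This is an illustrative Python harness with hypothetical names (`empirical_power`, `median_runtime`, `task`); it is not the authors' benchmarking code.

```python
import statistics
import time

def empirical_power(p_values, alpha):
    """Fraction of simulation replicates that reject at level alpha."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(p < alpha for p in p_values) / len(p_values)

def median_runtime(task, effect_sizes):
    """Time task(beta) once per effect size and return the median in seconds.

    task is any callable that runs a full power calculation (for example,
    1000 replicates) for a single effect size.
    """
    runtimes = []
    for beta in effect_sizes:
        start = time.perf_counter()
        task(beta)
        runtimes.append(time.perf_counter() - start)
    return statistics.median(runtimes)

# Example with made-up p-values: two of four replicates reject at 0.05.
power = empirical_power([0.01, 0.20, 0.03, 0.50], alpha=0.05)
```

Reporting the median rather than the mean keeps a single slow run, such as one that includes compilation or cache warm-up, from dominating the summary.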