Karolina Sikorska, Emmanuel Lesaffre, Patrick F J Groenen, Paul H C Eilers.
Abstract
BACKGROUND: Genome-wide association studies have become very popular in identifying genetic contributions to phenotypes. Millions of SNPs are being tested for their association with diseases and traits using linear or logistic regression models. This conceptually simple strategy encounters the following computational issues: a large number of tests and very large genotype files (many gigabytes) which cannot be directly loaded into the software memory. One of the solutions applied on a grand scale is cluster computing involving large-scale resources. We show how to speed up the computations using matrix operations in pure R code.
Year: 2013 PMID: 23711206 PMCID: PMC3695771 DOI: 10.1186/1471-2105-14-166
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Speed versus number of covariates. The plot shows the relationship between the speed of the computations using the lm function in R and the number of covariates in the linear regression model.
Speed in Msips for the linear model (estimates, standard errors and p-values) by number of covariates, for the functions lm, lsfit and semi-parallel (SP)
| Covariates | lm | lsfit | SP |
| --- | --- | --- | --- |
| 0 | 0.70 | 3.0 | 43.0 |
| 2 | 0.60 | 2.4 | 43.0 |
| 10 | 0.40 | 1.0 | 25.0 |
| 30 | 0.16 | 0.32 | 12.0 |
Figure 2. Imputation of missing SNPs using the sample mean. The plot displays the effect of imputing missing SNPs with the sample mean on the estimates and the p-values. The call rate is set to 95%.
Speed in Msips for the logistic model (estimates, standard errors and p-values) by number of covariates, for the functions glm and semi-parallel (SP)
| Covariates | glm | SP |
| --- | --- | --- |
| 1 | 0.2 | 20.0 |
| 10 | 0.1 | 17.0 |
| 30 | 0.1 | 8.0 |