GH Lubke, C Laurin, R Walters, N Eriksson, P Hysi, TD Spector, GW Montgomery, NG Martin, SE Medland, DI Boomsma.
Abstract
Typically, genome-wide association studies (GWAS) consist of regressing the phenotype on each SNP separately using an additive genetic model. Although statistical models for recessive, dominant, SNP-SNP, or SNP-environment interactions exist, the testing burden makes an evaluation of all possible effects impractical for genome-wide data. We advocate a two-step approach in which the first step is a filter that is sensitive to different types of SNP main and interaction effects. The aim is to reduce the number of SNPs substantially, so that more specific modeling becomes feasible in a second step. We provide an evaluation of a statistical learning method called "gradient boosting machine" (GBM) that can be used as a filter. GBM does not require an a priori specification of a genetic model and permits the inclusion of large numbers of covariates. GBM can therefore be used to explore multiple GxE interactions, which would not be feasible within the parametric framework used in GWAS. We show in a simulation that GBM performs well even under conditions favorable to the standard additive regression model commonly used in GWAS, and that it is sensitive to interaction effects even if one of the interacting variables has a zero main effect. A standard GWAS would not detect the latter. Our evaluation is accompanied by an analysis of empirical data concerning hair morphology. We estimate the phenotypic variance explained by increasing numbers of highest-ranked SNPs, and show that it is sufficient to select 10K-20K SNPs in the first step of a two-step approach.
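The filtering step described in the abstract can be illustrated with a toy example. The sketch below is a minimal pure-Python gradient boosting routine with depth-2 regression trees, written only for illustration; it is not the GBM implementation evaluated in the study. The function names (`best_split`, `fit_tree`, `gbm_importance`), the simulated genotypes, the deliberately exaggerated effect sizes, and all tuning values are assumptions.

```python
import random

def best_split(X, r, idx):
    """Return (gain, feature, threshold) for the best split of rows idx."""
    n = len(idx)
    total = sum(r[i] for i in idx)
    best = (0.0, None, None)
    for j in range(len(X[0])):
        for thr in (0.5, 1.5):  # genotypes are coded 0/1/2
            left = [i for i in idx if X[i][j] < thr]
            nl = len(left)
            if nl == 0 or nl == n:
                continue
            sl = sum(r[i] for i in left)
            sr = total - sl
            # Reduction in squared error achieved by this split.
            gain = sl * sl / nl + sr * sr / (n - nl) - total * total / n
            if gain > best[0]:
                best = (gain, j, thr)
    return best

def fit_tree(X, r, idx, depth, importance):
    """Fit a small tree to residuals r and add split gains to importance."""
    mean = sum(r[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 10:
        return mean
    gain, j, thr = best_split(X, r, idx)
    if j is None:
        return mean
    importance[j] += gain
    left = [i for i in idx if X[i][j] < thr]
    right = [i for i in idx if X[i][j] >= thr]
    return (j, thr,
            fit_tree(X, r, left, depth - 1, importance),
            fit_tree(X, r, right, depth - 1, importance))

def predict_tree(tree, x):
    """Walk down the tree until a leaf value (a float) is reached."""
    while isinstance(tree, tuple):
        j, thr, low, high = tree
        tree = low if x[j] < thr else high
    return tree

def gbm_importance(X, y, n_trees=30, depth=2, shrinkage=0.1):
    """Boost trees on residuals; return the summed split gain per SNP."""
    n = len(y)
    pred = [sum(y) / n] * n
    importance = [0.0] * len(X[0])
    for _ in range(n_trees):
        residuals = [y[i] - pred[i] for i in range(n)]
        tree = fit_tree(X, residuals, list(range(n)), depth, importance)
        for i in range(n):
            pred[i] += shrinkage * predict_tree(tree, X[i])
    return importance

# Assumed toy data: 600 individuals, 12 SNPs with MAF 0.5.
# SNP 0 has a main effect; SNP 1 acts only through its interaction with SNP 0.
rng = random.Random(1)
n, p = 600, 12
X = [[rng.randint(0, 1) + rng.randint(0, 1) for _ in range(p)] for _ in range(n)]
y = [(x[0] - 1) + (x[0] - 1) * (x[1] - 1) + rng.gauss(0, 0.5) for x in X]

importance = gbm_importance(X, y)
# Step 1 of the two-step approach: keep only the highest-ranked SNPs.
top_k = sorted(range(p), key=lambda j: importance[j], reverse=True)[:2]
print(top_k)
```

Because each tree conditions later splits on earlier ones, the SNP without a main effect can still accumulate importance in this toy setting. Real effect sizes are far smaller than those assumed here, which is why the study retains thousands of SNPs rather than a handful.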
Keywords: Boosting; GCTA; GWAS
Year: 2013 PMID: 24404405 PMCID: PMC3882018 DOI: 10.4172/2153-0602.1000143
Source DB: PubMed Journal: J Data Mining Genomics Proteomics
Figure 1. Results of GBM and additive GWA methods applied to hair morphology. At each split the sample is divided into subgroups based on an optimal cut point on the SNP with the best predictive performance.
Results of three simulated SNPs using GBM, Robust SNP, and a standard additive GWA model.
| Method | MAF=0.5, 0.15% | MAF=0.5, 0.20% | MAF=0.5, 0.30% | MAF=0.1, 0.15% | MAF=0.1, 0.20% | MAF=0.1, 0.30% |
|---|---|---|---|---|---|---|
| GBM | 5.6 (8.0) | 3.6 (5.2) | 1.6 (2.4) | 19.6 (28.2) | 9.8 (14.2) | 2.0 (2.9) |
| RobustSNP | 6.7 (9.4) | 4.1 (6.1) | 1.7 (2.4) | 13.2 (18.9) | 4.9 (7.2) | 1.2 (1.8) |
| GWA | 5.2 (7.6) | 5.2 (7.7) | 1.6 (2.3) | 11.0 (14.3) | 3.2 (4.8) | 1.1 (1.5) |
Note: Column percentages are the phenotypic variance explained by the simulated SNP. Results are presented as median percentile ranks across Monte Carlo replications (lower is better). For instance, in 50% of the replications, GBM ranks a SNP explaining 0.15% of the phenotypic variance at MAF=0.5 within the 5.6th percentile. A robust measure of variability (median absolute deviation, MAD) is given in parentheses.
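The summary statistics in the note can be sketched as follows. This is a minimal illustration using only the Python standard library; the helper names `percentile_rank` and `median_and_mad` and the example numbers are hypothetical, and the MAD is left unscaled because the source does not say whether a consistency constant was applied.

```python
from statistics import median

def percentile_rank(scores, target):
    """Percentile position of scores[target], with SNPs sorted strongest first."""
    rank = 1 + sum(s > scores[target] for s in scores)
    return 100.0 * rank / len(scores)

def median_and_mad(values):
    """Median and (unscaled) median absolute deviation of a list of values."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med, mad

# One replication: the causal SNP (index 2) has the second-highest score of four.
print(percentile_rank([0.9, 0.1, 0.5, 0.3], 2))    # 50.0

# Made-up percentile ranks of the causal SNP over five replications.
print(median_and_mad([5.0, 2.0, 9.0, 6.0, 4.0]))   # (5.0, 1.0)
```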
Results of two interacting simulated SNPs using GBM, Robust SNP, and a standard additive GWA model.
| Method | Scenario 1: SNP 1 | Scenario 1: SNP 2 | Scenario 2: SNP 1 | Scenario 2: SNP 2 |
|---|---|---|---|---|
| GBM | 1.3 (1.9) | 0.8 (1.2) | 1.1 (1.7) | 19.0 (19.9) |
| RobustSNP | 3.0 (4.2) | 2.2 (3.3) | 1.1 (1.6) | 56.9 (33.7) |
| AdditiveGWA | 1.7 (2.5) | 1.8 (2.7) | 0.8 (1.1) | 52.1 (35.3) |
Note: Results are presented as median percentile ranks of each SNP. A robust measure of variability (median absolute deviation, MAD) is given in parentheses. SNP 1 always explains 0.3% of the variance; SNP 2 explains either 0.3% (scenario 1) or 0% (scenario 2). Using GBM, a SNP with a zero main effect (SNP 2, scenario 2) is ranked within the 19th percentile in 50% of the Monte Carlo replications. RobustSNP and additive GWAS perform, as expected, only at chance level (i.e., a median percentile rank around 50).
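Why a single-SNP additive test sits at chance level for a SNP with a zero main effect can be seen in a small deterministic example. The sketch below assumes an idealized population of 16 individuals with genotype frequencies 1:2:1 at both SNPs and a noise-free phenotype; the coding, effect sizes, and helper names (`cov`, `slope`) are illustrative assumptions, not the simulation design of the study.

```python
def cov(a, b):
    """Population covariance of two equally long lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (v - mb) for x, v in zip(a, b)) / len(a)

def slope(x, v):
    """Least-squares slope of v regressed on x."""
    return cov(x, v) / cov(x, x)

# Genotype counts 1:2:1 (MAF 0.5) at each SNP, fully crossed: 16 individuals.
geno = [0, 1, 1, 2]
pairs = [(g1, g2) for g1 in geno for g2 in geno]
g1s = [g1 for g1, _ in pairs]
g2s = [g2 for _, g2 in pairs]

# SNP 1 has a main effect; SNP 2 acts only through the interaction term.
y = [(g1 - 1) + (g1 - 1) * (g2 - 1) for g1, g2 in pairs]

marginal_snp1 = cov(g1s, y)  # nonzero: visible to a single-SNP additive test
marginal_snp2 = cov(g2s, y)  # zero: invisible to a single-SNP additive test

# Conditioning on SNP 1, as a tree does after a first split, reveals SNP 2.
high = [(g2, v) for (g1, g2), v in zip(pairs, y) if g1 == 2]
low = [(g2, v) for (g1, g2), v in zip(pairs, y) if g1 == 0]
slope_high = slope([g for g, _ in high], [v for _, v in high])
slope_low = slope([g for g, _ in low], [v for _, v in low])

print(marginal_snp1, marginal_snp2, slope_high, slope_low)
```

In this construction the effects of SNP 2 in the two homozygous SNP 1 groups have opposite signs and cancel in the marginal analysis, which is the situation in which a method that conditions on earlier splits has an advantage.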
Figure 2. Manhattan plots for GBM and GWAS. The green horizontal line marks the genome-wide significance level. Relevant SNPs on chromosomes 1, 2, and 8 are marked in blue.
Figure 3. GCTA results
Comparison of variance in hair curliness explained by the top 1K, 5K, 10K, and 20K SNPs resulting from standard GWAS and GBM. Bars represent point estimates; whiskers represent 95% confidence intervals.