Erin Austin, Wei Pan, Xiaotong Shen.
Abstract
An important task in personalized medicine is to predict disease risk from a person's genome, for example from a large number of single-nucleotide polymorphisms (SNPs). Genome-wide association studies (GWAS) make SNP and phenotype data available to researchers, and a critical question is how best to predict disease risk from such data. Penalized regression with variable selection, such as LASSO and SCAD, is considered promising in this setting. However, the sparsity assumption made by LASSO, SCAD and many other penalized regression techniques may not hold here: it is now hypothesized that many common diseases are associated with many SNPs, each with a small to moderate effect. In this article, we use GWAS data from the Wellcome Trust Case Control Consortium (WTCCC) to investigate the performance of various unpenalized and penalized regression approaches when the true model is sparse or non-sparse. We find that, in general, penalized regression outperformed unpenalized regression; SCAD, TLP and LASSO performed best for sparse models, while elastic net regression was the winner, followed by ridge, TLP and LASSO, for non-sparse models.
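To make the contrast between the penalties named above concrete, the following is a minimal sketch of their standard textbook forms for a single coefficient. It is not the authors' code. The function names, the elastic net parameterization (mixing weight `alpha`), and the default values `a=3.7` and `tau=0.5` are assumptions made for illustration; software packages scale these penalties differently. TLP is taken here to be the truncated L1 penalty.

```python
def lasso(beta, lam):
    """L1 penalty: grows linearly with |beta|."""
    return lam * abs(beta)


def ridge(beta, lam):
    """L2 penalty: grows quadratically, shrinks but never zeroes coefficients."""
    return lam * beta ** 2


def elastic_net(beta, lam, alpha=0.5):
    """Convex mix of L1 and L2 (assumed parameterization; others exist)."""
    return lam * (alpha * abs(beta) + (1 - alpha) * beta ** 2)


def scad(beta, lam, a=3.7):
    """SCAD penalty: L1-like near zero, then flattens to a constant (a > 2)."""
    b = abs(beta)
    if b <= lam:
        return lam * b
    if b <= a * lam:
        return (2 * a * lam * b - b ** 2 - lam ** 2) / (2 * (a - 1))
    return lam ** 2 * (a + 1) / 2


def tlp(beta, lam, tau=0.5):
    """Truncated L1 penalty: L1 up to tau, constant beyond it."""
    return lam * min(abs(beta) / tau, 1.0)


def total_penalty(penalty, betas, **kwargs):
    """Sum a per-coefficient penalty over a coefficient vector."""
    return sum(penalty(b, **kwargs) for b in betas)


if __name__ == "__main__":
    # Hypothetical coefficient vectors: a few large effects vs. many small ones.
    sparse = [2.0, 0.0, 0.0, 0.0]
    dense = [0.5, 0.5, 0.5, 0.5]
    for name, fn in [("lasso", lasso), ("ridge", ridge),
                     ("elastic_net", elastic_net), ("scad", scad), ("tlp", tlp)]:
        print(name,
              total_penalty(fn, sparse, lam=1.0),
              total_penalty(fn, dense, lam=1.0))
```

SCAD and TLP stop penalizing a coefficient further once it is large, which favors a few strong effects, whereas ridge and elastic net spread shrinkage across many small coefficients. This is one plausible reading of why the ranking of methods differs between the sparse and non-sparse settings reported above.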
Keywords: AUC; Elastic Net; GWAS; LASSO; Logistic regression; MLE; Ridge; SCAD; SNP; TLP
Year: 2013; PMID: 24348893; PMCID: PMC3859439; DOI: 10.1002/sam.11183
Source DB: PubMed; Journal: Stat Anal Data Min; ISSN: 1932-1864; Impact factor: 1.051