Wonsuk Yoo, Brian A Ference, Michele L Cote, Ann Schwartz.
Abstract
Genome-wide association studies (GWAS) have identified numerous single nucleotide polymorphisms (SNPs) associated with a variety of common human diseases. Because most disease-associated SNPs have weak marginal effects, attention has recently turned to evaluating the combined effect of multiple disease-associated SNPs on disease risk. Several recent multigenic studies suggest the potential of multigenic approaches in association studies of various diseases, including lung cancer. The question remains, however, as to the best methodology for analyzing SNPs in multiple genes. In this work, we compare four methods (logistic regression, logic regression, classification trees, and random forests) for identifying important genes and gene-gene and gene-environment interactions. To evaluate the performance of the four methods, we report the cross-validation misclassification error and the area under the curve (AUC). We performed a simulation study and applied the methods to data from a large-scale, population-based, case-control study.
Keywords: Area under the Curve; Classification tree; Cross-validation error; Logic regression; Logistic regression; Random Forests; SNP interactions
Year: 2012 PMID: 23795347 PMCID: PMC3686280
Source DB: PubMed Journal: Int J Appl Sci Technol
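
The two evaluation criteria named in the abstract, cross-validation misclassification error and area under the curve, can be illustrated with a minimal sketch. Everything here is an assumption for illustration and not the authors' code: the simulated genotypes, the helper names (`auc`, `cv_evaluate`, `fit_allele_count`), and the toy risk-allele-count classifier, which merely stands in for any of the four methods compared in the paper.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a random case outscores a random
    control (Mann-Whitney form); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cv_evaluate(X, y, fit, k=5, seed=0):
    """Return (misclassification error, AUC) computed from pooled
    out-of-fold predictions of a k-fold cross-validation."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint held-out sets
    scores = [0.0] * len(y)
    preds = [0] * len(y)
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        # `fit` sees only the training rows of this fold.
        score_fn, threshold = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            scores[i] = score_fn(X[i])
            preds[i] = int(scores[i] > threshold)
    error = sum(p != t for p, t in zip(preds, y)) / len(y)
    return error, auc(scores, y)

def fit_allele_count(X_train, y_train):
    """Hypothetical toy classifier: score = total minor-allele count,
    threshold = midpoint of the mean case and mean control scores."""
    score = lambda row: float(sum(row))
    case = [score(r) for r, t in zip(X_train, y_train) if t == 1]
    ctrl = [score(r) for r, t in zip(X_train, y_train) if t == 0]
    threshold = (sum(case) / len(case) + sum(ctrl) / len(ctrl)) / 2
    return score, threshold

# Simulated data (assumption): 200 subjects, 5 SNPs coded 0/1/2,
# with disease status driven by the first two SNPs plus noise.
rng = random.Random(1)
X = [[rng.choice([0, 1, 2]) for _ in range(5)] for _ in range(200)]
y = [int(sum(row[:2]) + rng.gauss(0, 1) > 2) for row in X]

error, area = cv_evaluate(X, y, fit_allele_count)
print(f"CV misclassification error: {error:.3f}, AUC: {area:.3f}")
```

Any model that returns a scoring function and a decision threshold could be passed as `fit`, so the same harness would yield comparable error and AUC figures across methods.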