Weiliang Shi, Kristine E Lee, Grace Wahba.
Abstract
The Genetic Analysis Workshop 15 Problem 3 simulated rheumatoid arthritis (RA) data set provided 100 replicates of simulated single-nucleotide polymorphism (SNP) and covariate data for 1500 families with an affected sib pair and 2000 controls, modeled after real rheumatoid arthritis data. The data-generation model included nine unobserved trait loci, most of which have one or more of the generated SNPs associated with them. These data sets provide an ideal test bed for evaluating new and old algorithms that select SNPs and covariates to separate cases from controls, because both the case-control status of every subject and the identities of the trait loci are known. LASSO-Patternsearch is a new multi-step algorithm, with a LASSO-type penalized likelihood method at its core, specifically designed to detect and model interactions between important predictor variables. In this article the original LASSO-Patternsearch algorithm is modified to handle the large number of SNPs plus covariates. We start with a screen step within the framework of parametric logistic regression. The patterns that survive the screen step are further selected by a penalized logistic regression with the LASSO penalty. Finally, a parametric logistic regression model is built on the patterns that survive the LASSO step. In our analysis of the Genetic Analysis Workshop 15 Problem 3 data we identified most of the associated SNPs and relevant covariates. When the model was used as a classifier, very competitive error rates were obtained.
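For orientation, the LASSO step can be written in the standard form of L1-penalized logistic regression. This is an assumption: the abstract does not give the exact scaling of the likelihood, the screening criterion, or how the tuning parameter is chosen. Here $y_i \in \{0,1\}$ marks case (1) or control (0), $B_\ell(x) \in \{0,1\}$ are the $N_B$ patterns retained by the screen step, and $\lambda \ge 0$ controls sparsity; the intercept is left unpenalized.

$$
\begin{aligned}
f(x) &= \beta_0 + \sum_{\ell=1}^{N_B} \beta_\ell B_\ell(x),\\
\hat{\beta} &= \arg\min_{\beta_0,\ldots,\beta_{N_B}} \; \frac{1}{n}\sum_{i=1}^{n}\Big[\log\big(1+e^{f(x_i)}\big) - y_i f(x_i)\Big] + \lambda \sum_{\ell=1}^{N_B} \lvert\beta_\ell\rvert .
\end{aligned}
$$

The final step then corresponds to refitting the same model with $\lambda = 0$ using only the patterns with $\hat{\beta}_\ell \neq 0$, which yields ordinary coefficients and standard errors like those reported in the tables.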
Year: 2007 PMID: 18466561 PMCID: PMC2367607 DOI: 10.1186/1753-6561-1-s1-s60
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Model on chromosome 6
| Variable | No. variant alleles at locus | Coefficientᵃ | SD | p-value |
|---|---|---|---|---|
| Smoking | - | 0.8653 | 0.1088 | 10⁻¹⁵ |
| Sex | - | 1.0478 | 0.1131 | 10⁻²⁰ |
| SNP6_153 | 1 | -2.0411 | 0.1365 | 10⁻⁵⁰ |
| SNP6_154 | 1 | -1.4509 | 0.1448 | 10⁻²³ |
| SNP6_162 | 1 | 2.2297 | 0.2767 | 10⁻¹⁶ |
| SNP6_153 | 2 | -5.5977 | 0.2707 | 10⁻⁹⁵ |
ᵃ Coefficients are estimated in the final parametric logistic regression model.
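The last column of the table is consistent with a two-sided Wald test of each coefficient, with z = coefficient / SD referred to a standard normal distribution, reported only as its power of ten (for example, p ≈ 1.8 × 10⁻¹⁵ appears as 10⁻¹⁵). That reading is inferred from the numbers rather than stated in the source. The minimal Python check below recomputes the p-values for the chromosome 6 model under that assumption; `CHR6_MODEL` and `wald_p_value` are illustrative names.

```python
import math

# (variable, coefficient, SD, power of ten shown in the table's last column)
CHR6_MODEL = [
    ("Smoking", 0.8653, 0.1088, -15),
    ("Sex", 1.0478, 0.1131, -20),
    ("SNP6_153 (1 allele)", -2.0411, 0.1365, -50),
    ("SNP6_154 (1 allele)", -1.4509, 0.1448, -23),
    ("SNP6_162 (1 allele)", 2.2297, 0.2767, -16),
    ("SNP6_153 (2 alleles)", -5.5977, 0.2707, -95),
]


def wald_p_value(coef: float, sd: float) -> float:
    """Two-sided p-value of the Wald statistic z = coef / sd under N(0, 1).

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)); using erfc directly keeps full
    precision for tiny p-values, where 1 - cdf(z) would round to 0.
    """
    z = coef / sd
    return math.erfc(abs(z) / math.sqrt(2.0))


for name, coef, sd, shown in CHR6_MODEL:
    p = wald_p_value(coef, sd)
    exponent = math.floor(math.log10(p))  # power of ten in scientific notation
    print(f"{name:22s} z = {coef / sd:7.2f}  p = {p:.1e}  "
          f"computed 10^{exponent}, table 10^{shown}")
```

The same function applied to the main-effects and interaction tables reproduces their last columns too (for example, -0.5061 / 0.1389 gives p ≈ 0.0003).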
Main effects model on all chromosomes
| Variable | Levelᵃ | Coefficient | SD | p-value |
|---|---|---|---|---|
| Smoking | - | 1.0434 | 0.1214 | 10⁻¹⁸ |
| Sex | - | 1.0819 | 0.1251 | 10⁻¹⁸ |
| SNP6_154 | 1 | -1.6228 | 0.1395 | 10⁻³¹ |
| SNP6_162 | 1 | 2.2717 | 0.2885 | 10⁻¹⁵ |
| HLA DR type, father | 2 | 2.3848 | 0.1405 | 10⁻⁶⁴ |
| HLA DR type, mother | 2 | 2.3443 | 0.1388 | 10⁻⁶⁴ |
| SNP6_154 | 2 | -3.0081 | 0.5492 | 10⁻⁸ |
| SNP11_389 | 2 | 0.9521 | 0.1264 | 10⁻¹⁴ |
ᵃ For SNPs, level is the number of variant alleles. For DR type, level 1 denotes DR1 and level 2 denotes DR4.
Interactions on all chromosomes
| Variable 1 | No. variant alleles of Variable 1 | Variable 2 | No. variant alleles of Variable 2 | Coefficient | SD | p-value |
|---|---|---|---|---|---|---|
| SNP2_542 | 1 | SNP2_768 | 1 | -0.5061 | 0.1389 | 0.0003 |
| SNP1_673 | 1 | SNP15_77 | 1 | -0.8369 | 0.1693 | 10⁻⁷ |
| SNP8_233 | 1 | SNP16_131 | 2 | -0.8044 | 0.1633 | 10⁻⁷ |
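Each interaction row can be read as a two-SNP pattern: assuming, as the column headings suggest, that the term for SNP2_542 × SNP2_768 is switched on only when both SNPs carry exactly one variant allele, its coefficient multiplies the odds of RA by exp(-0.5061) ≈ 0.60 for those individuals, with every other term held fixed. The helpers `pattern` and `odds_factor` below are hypothetical, written only to illustrate this indicator coding; they are not taken from the article.

```python
import math


def pattern(genotype_a: int, level_a: int, genotype_b: int, level_b: int) -> int:
    """Hypothetical two-SNP pattern: 1 only when SNP a carries exactly level_a
    variant alleles AND SNP b carries exactly level_b (genotypes coded 0, 1, 2)."""
    return int(genotype_a == level_a and genotype_b == level_b)


# First row of the interaction table: SNP2_542 (1 allele) x SNP2_768 (1 allele).
COEF = -0.5061


def odds_factor(g542: int, g768: int) -> float:
    """Multiplicative effect of this single term on the odds of RA,
    with every other term in the model held fixed."""
    return math.exp(COEF * pattern(g542, 1, g768, 1))


# Only the (1, 1) genotype combination switches the term on.
for g542 in (0, 1, 2):
    for g768 in (0, 1, 2):
        print(f"SNP2_542={g542} SNP2_768={g768} "
              f"pattern={pattern(g542, 1, g768, 1)} "
              f"odds factor={odds_factor(g542, g768):.3f}")
```

Under the same assumption, the SNP8_233 × SNP16_131 row would correspond to `pattern(g8_233, 1, g16_131, 2)`, i.e. one variant allele at the first SNP and two at the second.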
Raw counts by sex and SNP6_154 genotype. In each cell, the denominator is the total number of people and the numerator is the number of RA patients.
| SNP6_154 (no. variant alleles) | Male | Female |
|---|---|---|
| 0 | 341/551 = 0.619 | 1015/1241 = 0.818 |
| 1 | 43/520 = 0.083 | 97/628 = 0.154 |
| 2 | 1/270 = 0.004 | 3/290 = 0.010 |
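The proportions in this table, together with crude (unadjusted) odds ratios relative to people with zero variant alleles, can be recomputed directly from the counts. This is a sketch for illustration only: it ignores the family structure of the cases and every other covariate, so it is not the article's fitted model. The names `COUNTS`, `proportion`, and `crude_odds_ratio` are illustrative.

```python
# (RA patients, total people) by number of variant alleles at SNP6_154,
# copied from the raw-count table above.
COUNTS = {
    "Male": {0: (341, 551), 1: (43, 520), 2: (1, 270)},
    "Female": {0: (1015, 1241), 1: (97, 628), 2: (3, 290)},
}


def proportion(cases: int, total: int) -> float:
    """Fraction of people in a cell who are RA patients."""
    return cases / total


def crude_odds_ratio(cases_exp: int, total_exp: int,
                     cases_ref: int, total_ref: int) -> float:
    """Odds of RA in one genotype group divided by the odds in the reference
    group; odds = cases / non-cases within the group."""
    odds_exp = cases_exp / (total_exp - cases_exp)
    odds_ref = cases_ref / (total_ref - cases_ref)
    return odds_exp / odds_ref


for sex, by_level in COUNTS.items():
    ref_cases, ref_total = by_level[0]  # zero variant alleles as reference
    for level, (cases, total) in sorted(by_level.items()):
        print(f"{sex:6s} alleles={level}  "
              f"proportion={proportion(cases, total):.3f}  "
              f"crude OR vs 0 alleles="
              f"{crude_odds_ratio(cases, total, ref_cases, ref_total):.4f}")
```

Within each sex the crude odds of RA fall sharply with each additional variant allele at SNP6_154 (roughly 0.06 and 0.04 for one allele in males and females), which matches the direction of the negative SNP6_154 coefficients in the fitted models.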