Niloofar Arshadi, Billy Chang, Rafal Kustra.
Abstract
In this paper, we apply a gradient boosting machine (GBM) predictive model to rheumatoid arthritis data to predict case-control status. A QQ-plot suggests severe population stratification. In univariate genome-wide association studies, a correction factor for ethnicity confounding can be derived. Here we propose a novel strategy for dealing with population stratification in the context of multivariate predictive modeling. We address the problem by clustering the subjects on the axes of genetic variation and building a separate predictive model in each cluster. This allows us to control for ethnicity without explicitly including it in the model, which could marginalize the genetic signal we are trying to discover. Clustering not only yields groups that are more homogeneous in ethnicity but also, as our results show, increases the accuracy of our model compared with the non-clustered approach. The highest accuracy is achieved with the model adjusted for population stratification, in which the genetic axes of variation are included among the predictors, although this result may be misleading given the confounding effects.
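The cluster-then-model strategy described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' pipeline: the abstract does not name the software, the clustering algorithm, or how the axes of variation were obtained, so PCA, k-means, and scikit-learn's `GradientBoostingClassifier` are stand-ins here, and `make_toy_data` and `clustered_gbm_auc` are hypothetical helper names operating on simulated genotypes.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


def make_toy_data(n_per_pop=60, n_snps=100, seed=0):
    """Simulate genotypes (0/1/2) for three populations with different allele frequencies."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(3, n_snps))
    genotypes = np.vstack([rng.binomial(2, f, size=(n_per_pop, n_snps)) for f in freqs])
    # Alternate cases and controls so every population contains both classes.
    status = np.arange(3 * n_per_pop) % 2
    # Plant a signal at the first SNP: the risk allele is more common in cases.
    genotypes[:, 0] = rng.binomial(2, np.where(status == 1, 0.8, 0.2))
    return genotypes.astype(float), status


def clustered_gbm_auc(genotypes, status, n_clusters=3, n_axes=2, seed=0):
    """Cluster subjects on leading axes of variation, then cross-validate one GBM per cluster."""
    # Assumption: principal components stand in for the axes of genetic variation.
    axes = PCA(n_components=n_axes, random_state=seed).fit_transform(genotypes)
    # Assumption: k-means is used for clustering; the abstract does not name a method.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(axes)
    aucs = {}
    for c in range(n_clusters):
        mask = labels == c
        model = GradientBoostingClassifier(n_estimators=30, random_state=seed)
        # The axes themselves are not predictors; ethnicity is controlled by the split alone.
        scores = cross_val_score(model, genotypes[mask], status[mask], cv=5, scoring="roc_auc")
        aucs[c] = float(scores.mean())
    return labels, aucs


X, y = make_toy_data()
cluster_labels, cluster_aucs = clustered_gbm_auc(X, y)
for cluster, value in sorted(cluster_aucs.items()):
    print(f"cluster {cluster}: mean 5-fold AUC = {value:.3f}")
```

Fitting within clusters keeps the ancestry axes out of the predictor set, which is the property the abstract emphasizes. The adjusted alternative would instead append `axes` to the genotype matrix and fit a single model.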
Year: 2009 PMID: 20018054 PMCID: PMC2795961 DOI: 10.1186/1753-6561-3-s7-s60
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. QQ-plot for the original data (inflation factor = 1.335).
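For orientation, an inflation factor like the one quoted for Figure 1 is conventionally computed, in the genomic-control approach, as the median of the observed 1-degree-of-freedom chi-square association statistics divided by the median expected under the null (about 0.4549). The sketch below assumes that standard definition; the source does not say how the reported value was obtained, and `inflation_factor` and `genomic_control` are hypothetical names.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def inflation_factor(chi2_stats):
    """Genomic-control inflation factor: observed median over the null median."""
    return median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Deflate the test statistics by the inflation factor (never inflate them)."""
    lam = max(inflation_factor(chi2_stats), 1.0)
    return [s / lam for s in chi2_stats]


# Toy statistics chosen so that the median is 1.335 times the null median.
toy_stats = [0.1, 0.3, CHI2_1DF_MEDIAN * 1.335, 1.2, 5.0]
print(round(inflation_factor(toy_stats), 3))
```

A value near 1 indicates little inflation. Dividing each statistic by the factor is the kind of univariate correction the abstract alludes to, and it has no direct analogue in a multivariate predictive model.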
Figure 2. Comparing the AUC of GBM predictive models. The average AUC is based on 5-fold CV on the training set for "Adjusted GBM", "3 Clusters", and "No clustering".
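As a reminder of what the averaged quantity in Figure 2 measures, the AUC can be read as the probability that a randomly chosen case receives a higher predicted score than a randomly chosen control, with ties counted as one half. The sketch below computes it that way and averages over held-out folds. The function names and the toy scores are illustrative assumptions, not values from the study.

```python
def auc(labels, scores):
    """AUC as the fraction of case-control pairs in which the case scores higher."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("AUC needs at least one case and one control")
    # Each correctly ordered pair counts 1, each tie counts 0.5.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def mean_cv_auc(folds):
    """Average AUC over held-out folds; each fold is a (labels, scores) pair."""
    return sum(auc(y, s) for y, s in folds) / len(folds)


# Toy example: one of the four case-control pairs is ordered incorrectly.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Because AUC depends only on the ranking of scores, it is comparable across models whose raw outputs are on different scales.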