Mousheng Xu, Kelan G Tantisira, Ann Wu, Augusto A Litonjua, Jen-hwa Chu, Blanca E Himes, Amy Damask, Scott T Weiss.
Abstract
BACKGROUND: Personalized health care promises solutions tailored to individual patients based on their genetic background and/or environmental exposure history. To date, disease prediction has been based on a few environmental factors and/or single nucleotide polymorphisms (SNPs), whereas complex diseases are usually affected by many genetic and environmental factors, each contributing a small portion to the outcome. We hypothesized that using random forests classifiers to select SNPs would yield an improved predictive model of asthma exacerbations. We tested this hypothesis in a population of childhood asthmatics.
Year: 2011 PMID: 21718536 PMCID: PMC3148549 DOI: 10.1186/1471-2350-12-90
Source DB: PubMed Journal: BMC Med Genet ISSN: 1471-2350 Impact factor: 2.103
Sample Characteristics (All Subjects Are Caucasian)
| | Training Population (N = 417) | | Testing Population (N = 164) | |
|---|---|---|---|---|
| Subjects | 127 (30%) | 290 (70%) | 50 (30%) | 114 (70%) |
| Age (mean ± s.d.) | 8.41 ± 2.07 | 8.89 ± 2.07 | 8.54 ± 2.28 | 9.45 ± 2.19 |
| Male | 69% | 61% | 46% | 53% |
| FEV1% (mean ± s.d.) | 92.9 ± 15.7 | 93.5 ± 13.2 | 95.4 ± 17.1 | 95.4 ± 13.5 |
| Treatment | | | | |
| Budesonide | 20% | 31% | 36% | 32% |
| Nedocromil | 28% | 29% | 30% | 32% |
| Placebo | 52% | 39% | 34% | 36% |
Figure 1. "Manhattan plot" of the RF importance scores of all 550k SNPs. X-axis: SNPs in chromosomal order; Y-axis: RF importance score. The black demarcation line separates the top 4k SNPs from the rest.
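The selection step implied by Figure 1 (keep the SNPs with the highest importance scores) can be sketched as follows. This is a minimal illustration only: the function name `top_k_snps` and the SNP identifiers are hypothetical, and the scores are made-up numbers, not values from the study.

```python
import heapq

def top_k_snps(importances, k):
    """Return the k SNP identifiers with the highest importance scores,
    ordered from most to least important.

    `importances` maps a SNP identifier to its importance score.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    return heapq.nlargest(k, importances, key=importances.get)

# Hypothetical identifiers and scores, for illustration only.
example_scores = {"snp_a": 0.10, "snp_b": 0.50, "snp_c": 0.30, "snp_d": 0.05}
selected = top_k_snps(example_scores, 2)
```

In the study the cut is made at the top 4k of 550k SNPs; the same call with `k=4000` would apply to a full score table.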
Figure 2. Performance of different methods in predicting severe asthma exacerbation. Y-axis: AUC; X-axis: the number of SNPs used in a model. "Random SNPs": SNPs are chosen at random from all SNPs and used as input variables to predict asthma exacerbations; this process was repeated 10 times [see Methods for details]. "Permuted": asthma exacerbation status is permuted across samples while clinical traits and SNPs stay with their samples; this process was repeated 10 times [see Methods for details]. "Training": the AUC of the model built with all Stage 1 samples and evaluated on the same samples. "Internal cross-validation": the AUC of the model built with a randomly selected 90% of the Stage 1 samples and evaluated on the remaining 10%. "Independent replication": the AUC of the model built with all Stage 1 samples and evaluated on all Stage 2 samples.
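To make the "Permuted" baseline concrete, the sketch below computes an AUC and then recomputes it after shuffling the outcome labels 10 times. It is a simplification under stated assumptions: the caption does not spell out the fitting procedure, so this sketch skips model fitting entirely and only shuffles labels against a fixed set of prediction scores. The function names and the toy data are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a randomly chosen case (label 1)
    receives a higher score than a randomly chosen control (label 0).
    Tied scores count as one half."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > d) + 0.5 * (c == d) for c in cases for d in controls)
    return wins / (len(cases) * len(controls))

def permuted_aucs(scores, labels, n_iter=10, seed=0):
    """Shuffle the outcome labels n_iter times and return the AUC of each
    shuffle. Scores stay attached to their samples; only labels move."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        results.append(auc(scores, shuffled))
    return results

# Toy data (not from the study): higher scores for the two cases.
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0]
observed = auc(toy_scores, toy_labels)
baseline = permuted_aucs(toy_scores, toy_labels)
```

With enough samples, the permuted AUCs scatter around 0.5, which is the reference level against which the real models in the figure are judged.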
Figure 3. ROC curves using clinical attributes plus 160 SNPs as predictors. The red curve is for training on the Stage 1 samples, the blue curve is for testing on the Stage 2 samples, and the grey diagonal is the theoretical curve for random guessing. Both the red and the blue curves lie above the grey line, indicating better-than-random prediction, and they are similar to each other, suggesting that the predictive ability of the RF model is genuine. The p-value for the independent testing AUC differing from 0.5 is 0.000266.
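For readers unfamiliar with how an ROC curve and its area are obtained from prediction scores, here is a minimal sketch. It is not the authors' code; the function names and the toy data are hypothetical, and the area is computed with the trapezoidal rule. The sketch does not reproduce the p-value, because the caption does not say which test was used.

```python
def roc_points(scores, labels):
    """Return ROC points (false positive rate, true positive rate),
    starting at (0, 0) and ending at (1, 1). Samples with the same
    score are passed in a single step so ties are handled correctly."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample that shares the current threshold score.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Toy data (not from the study).
curve = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
area = trapezoid_auc(curve)
```

The diagonal from (0, 0) to (1, 1) has area 0.5, which is why a curve above the grey line in the figure corresponds to an AUC greater than 0.5.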
Figure 4. Performance of predicting severe asthma exacerbation with and without clinical traits. Y-axis: AUC; X-axis: the number of SNPs used for prediction. Blue: SNPs plus clinical traits; Red: SNPs alone.