Harald Binder, Martin Schumacher.
Abstract
BACKGROUND: Several techniques exist for fitting risk prediction models to high-dimensional data arising from microarrays. However, biological knowledge about relations between genes is only rarely taken into account. One recent approach incorporates pathway information, available, e.g., from the KEGG database, by augmenting the penalty term in Lasso estimation for continuous response models.
Year: 2009 PMID: 19144132 PMCID: PMC2647532 DOI: 10.1186/1471-2105-10-18
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Results of the simulation study.
| Model | intercept | Lasso | Li&Li | lik.boost | PathBoost |
| --- | --- | --- | --- | --- | --- |
| 1 | 762.5 (14.4) | 83.6 (2.6) | 42.5 (1.1) | 83.4 (2.4) | 61.0 (1.7) |
| 2 | 305.8 (5.1) | 91.0 (2.7) | 80.8 (1.9) | 89.7 (2.7) | 64.8 (1.8) |
| 3 | 215.6 (4.1) | 32.6 (0.9) | 24.9 (0.8) | 32.1 (0.9) | 26.5 (0.7) |
| 4 | 131.1 (2.4) | 32.6 (0.9) | 29.9 (0.7) | 32.5 (0.9) | 26.9 (0.7) |
| 5 | 525.7 (9.9) | 87.9 (2.6) | 61.6 (1.5) | 85.6 (2.2) | 62.2 (1.6) |
| 6 | 171.6 (3.3) | 32.9 (0.9) | 27.6 (0.7) | 32.2 (0.9) | 26.9 (0.8) |
Predictive mean squared error, mean and standard errors (in parentheses), for an intercept-only model, the Lasso, the pathway-based procedure proposed in [9] (Li&Li), componentwise likelihood-based boosting (lik.boost), and boosting with pathway information (PathBoost) for six types of generating models.
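Each table entry summarizes a predictive mean squared error over repeated simulation runs as a mean with its standard error. The following is a minimal sketch of that summary, not the authors' code: the helper names `pmse` and `mean_and_se`, the number of replications, and the toy data are all assumptions made for illustration, and the printed values have no relation to the table above.

```python
import math
import random


def pmse(y_true, y_pred):
    """Predictive mean squared error on one test set."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_and_se(values):
    """Mean and standard error of the mean over replications."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance with the n - 1 denominator.
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var / n)


# Hypothetical toy data: 50 replications, 100 test observations each.
rng = random.Random(0)
errors = []
for _ in range(50):
    y_true = [rng.gauss(0.0, 1.0) for _ in range(100)]
    # Stand-in "predictions": the truth plus noise (illustration only).
    y_pred = [t + rng.gauss(0.0, 0.5) for t in y_true]
    errors.append(pmse(y_true, y_pred))

mean, se = mean_and_se(errors)
print(f"{mean:.3f} ({se:.3f})")  # same "mean (SE)" layout as the table
```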
Figure 1. Coefficient paths for the DLBCL data. Coefficient paths for boosting without pathway information (left panel) and PathBoost (right panel), applied to DLBCL data. The models selected by 10-fold cross validation are indicated by vertical lines. Microarray features common to both models are indicated by solid curves, the others by dotted curves.
Figure 2. Prediction error curves for the DLBCL data. Bootstrap .632+ prediction error curve estimates for boosting without pathway information (dashed curves) and PathBoost (solid curves), applied to DLBCL data, without (thick curves) and with clinical covariates (thin curves). The Kaplan-Meier benchmark (grey curve) and a purely clinical model (dotted curve) are given as a reference.
Figure 3. Prediction error curves for the ovarian cancer data. Bootstrap .632+ prediction error curve estimates for boosting without pathway information (dashed curves) and PathBoost (solid curves), applied to ovarian cancer data, without (thick curves) and with clinical covariates (thin curves). The Kaplan-Meier benchmark (grey curve) is given as a reference.
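The prediction error curves in Figures 2 and 3 are bootstrap .632+ estimates. As a rough illustration of how such an estimate combines an apparent error with an out-of-bag bootstrap error at a single time point, the sketch below uses one commonly cited form of the Efron-Tibshirani .632+ weighting. It is an assumption that this form matches what the authors computed; the function and argument names are hypothetical, and the inputs are made-up numbers.

```python
def err_632_plus(apparent, out_of_bag, no_information):
    """One common form of the .632+ estimate at a single time point.

    apparent       -- error of the model on its own training data
    out_of_bag     -- mean error on observations left out of bootstrap samples
    no_information -- error expected if predictors and outcome were unrelated
    """
    # Cap the out-of-bag error at the no-information error.
    oob = min(out_of_bag, no_information)
    # Relative overfitting rate, set to 0 when it is not well defined.
    if no_information > apparent and oob > apparent:
        rate = (oob - apparent) / (no_information - apparent)
    else:
        rate = 0.0
    # Weight moves from 0.632 (no overfitting) to 1 (full overfitting).
    weight = 0.632 / (1.0 - 0.368 * rate)
    return (1.0 - weight) * apparent + weight * oob


# Made-up inputs for illustration only.
estimate = err_632_plus(apparent=0.10, out_of_bag=0.30, no_information=0.50)
print(round(estimate, 4))
```

For a full curve, a function like this would be applied separately at each evaluation time.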