Yifei Chen, Zhenyu Jia, Dan Mercola, Xiaohui Xie.
Abstract
Survival analysis focuses on modeling and predicting the time to an event of interest. Many statistical models have been proposed for survival analysis. They often impose strong assumptions on hazard functions, which describe how the risk of an event changes over time depending on covariates associated with each individual. In particular, the prevalent proportional hazards model assumes that covariates are multiplicatively related to the hazard. Here we propose a nonparametric model for survival analysis that does not explicitly assume particular forms of hazard functions. Our nonparametric model utilizes an ensemble of regression trees to determine how the hazard function varies according to the associated covariates. The ensemble model is trained using a gradient boosting method to optimize a smoothed approximation of the concordance index, which is one of the most widely used metrics in survival model performance evaluation. We implemented our model in a software package called GBMCI (gradient boosting machine for concordance index) and benchmarked its performance against other popular survival models on a large-scale breast cancer prognosis dataset. In our experiments, GBMCI consistently outperforms the other methods across a number of covariate settings. GBMCI is implemented in R and is freely available online.
Year: 2013 PMID: 24348746 PMCID: PMC3853154 DOI: 10.1155/2013/873595
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
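The concordance index and its smoothed approximation, which the abstract names as the training objective, can be illustrated with a minimal Python sketch. This is not the GBMCI code (which is in R). The function names, the convention that a higher score means a longer predicted survival time, the half credit for tied scores, and the sigmoid steepness parameter `alpha` are all assumptions made for illustration.

```python
import math

def comparable_pairs(times, events):
    """Pairs (i, j) where subject i has an observed event strictly before time j."""
    n = len(times)
    return [(i, j) for i in range(n) for j in range(n)
            if events[i] and times[i] < times[j]]

def concordance_index(times, events, scores):
    """Fraction of comparable pairs whose scores are ordered like their times.

    Assumes at least one comparable pair; tied scores count as 0.5 (an assumption).
    """
    pairs = comparable_pairs(times, events)
    concordant = 0.0
    for i, j in pairs:
        if scores[i] < scores[j]:
            concordant += 1.0
        elif scores[i] == scores[j]:
            concordant += 0.5
    return concordant / len(pairs)

def smoothed_concordance_index(times, events, scores, alpha=1.0):
    """Differentiable surrogate: the 0/1 indicator is replaced by a sigmoid."""
    pairs = comparable_pairs(times, events)
    total = sum(1.0 / (1.0 + math.exp(alpha * (scores[i] - scores[j])))
                for i, j in pairs)
    return total / len(pairs)

# Toy data: the third subject is censored (event indicator 0).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
scores = [1, 3, 2, 4]
print(concordance_index(times, events, scores))  # 0.8
print(smoothed_concordance_index(times, events, scores, alpha=1.0))
```

As `alpha` grows, the sigmoid approaches a step function and the smoothed value approaches the exact index. A smaller `alpha` gives a smoother objective that is easier to optimize with gradients.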
Algorithm 1: (Stochastic) gradient boosting machine for concordance index learning (GBMCI).
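The body of Algorithm 1 is not reproduced here, so the following is only a rough Python sketch of the general idea: gradient boosting that ascends a smoothed concordance index. It makes several simplifying assumptions that are not taken from the source: a single covariate `x`, regression stumps in place of full regression trees, a fixed `learning_rate` instead of any step-size search, and no subsampling. All function and parameter names are hypothetical.

```python
import math

def _pairs(times, events):
    """Comparable pairs (i, j): i has an observed event strictly before time j."""
    n = len(times)
    return [(i, j) for i in range(n) for j in range(n)
            if events[i] and times[i] < times[j]]

def smoothed_ci(times, events, preds, alpha=1.0):
    """Sigmoid-smoothed concordance index (higher prediction = longer survival)."""
    pairs = _pairs(times, events)
    return sum(1.0 / (1.0 + math.exp(alpha * (preds[i] - preds[j])))
               for i, j in pairs) / len(pairs)

def sci_gradient(times, events, preds, alpha=1.0):
    """Gradient of the smoothed index with respect to each prediction."""
    pairs = _pairs(times, events)
    grad = [0.0] * len(times)
    for i, j in pairs:
        s = 1.0 / (1.0 + math.exp(alpha * (preds[i] - preds[j])))
        g = alpha * s * (1.0 - s) / len(pairs)
        grad[i] -= g  # earlier event: a lower prediction raises the index
        grad[j] += g  # later time: a higher prediction raises the index
    return grad

def fit_stump(x, targets):
    """Least-squares regression stump: (threshold, left_value, right_value)."""
    best = None
    for thr in sorted(set(x))[:-1]:
        left = [t for xi, t in zip(x, targets) if xi <= thr]
        right = [t for xi, t in zip(x, targets) if xi > thr]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = sum((t - lm) ** 2 for t in left) + sum((t - rm) ** 2 for t in right)
        if best is None or err < best[0]:
            best = (err, thr, lm, rm)
    if best is None:  # constant feature: fall back to the mean gradient
        mean = sum(targets) / len(targets)
        return x[0], mean, mean
    return best[1], best[2], best[3]

def boost(x, times, events, n_rounds=20, learning_rate=1.0, alpha=1.0):
    """Fit each stump to the current gradient and step in its direction."""
    preds = [0.0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        grad = sci_gradient(times, events, preds, alpha)
        thr, lv, rv = fit_stump(x, grad)
        stumps.append((thr, lv, rv))
        preds = [p + learning_rate * (lv if xi <= thr else rv)
                 for p, xi in zip(preds, x)]
    return stumps, preds

# Toy data: larger x tends to go with longer survival; two subjects are censored.
x = [1, 2, 3, 4, 5, 6]
times = [2, 3, 5, 7, 11, 13]
events = [1, 1, 0, 1, 1, 0]
stumps, preds = boost(x, times, events)
print(smoothed_ci(times, events, preds))
```

Because the objective is maximized, each round adds the fitted stump to the predictions (gradient ascent) rather than subtracting it, as a loss-minimizing booster would.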
Table 1: The five sets of features extracted from the Metabric breast cancer dataset.
| Category | Abbreviation | Explanation |
|---|---|---|
| Clinical feature | | A subset of clinical covariates is selected by fitting the Cox model with AIC in a stepwise algorithm. The frequently selected features include age at diagnosis, lymph node status, treatment type, tumor size, tumor group, and tumor grade. |
| Gene feature | | A subset of gene expression microarray probes from the Illumina HT 12v3 platform is selected whose concordance indices with the survival data rank highest (positively concordant) or lowest (negatively concordant). Examples include “ILMN_1683450,” “ILMN_2392472,” and “ILMN_1700337.” |
| Clinical and gene feature | | A combination of previously selected clinical features and gene expression features is used to fit the Cox model with AIC in a stepwise algorithm, yielding a refined subset of features. |
| Metagene feature | | The high-dimensional gene expression data is fed into an iterative |
| Clinical and metagene feature | | A minimum subset of metagenes which has strong prognostic power for breast cancer [ |
Figure 1: Predictive performance I of GBM methods on the breast cancer dataset. The box plots show the predictive concordance indices of “gbmsci” and “gbmcox” in 50 random experiments without subsampling, using the five feature representations explained in Table 1. In each box plot, the central red line indicates the median C-index; the blue box is the [25%, 75%] area; the black whiskers reach the upper and lower extremes not including outliers; the red “+” symbols represent the outliers.
Table 2: Numerical statistics of predictive concordance indices of GBM models and the Cox model on the breast cancer dataset. The five feature representations are explained in Table 1. “gbmsci”-I and “gbmcox”-I run without subsampling (subsample-to-sample size ratio of 1), while “gbmsci”-II and “gbmcox”-II run with subsampling (ratio of 0.5). Each entry shows the average C-index and the standard deviation (in parentheses) over 50 random runs. The best performance in each column is highlighted in bold.
| Model | Feature representation | | | | |
|---|---|---|---|---|---|
| “gbmsci”-I | | 0.7287 (0.0005) | 0.6599 (0.0004) | 0.7145 (0.0004) | |
| “gbmcox”-I | 0.7039 (0.0008) | 0.7268 (0.0013) | 0.6523 (0.0007) | 0.7110 (0.0014) | 0.7222 (0.0003) |
| “gbmsci”-II | 0.7063 (0.0011) | | | 0.7169 (0.0017) | 0.7405 (0.0015) |
| “gbmcox”-II | 0.6983 (0.0009) | 0.7298 (0.0008) | 0.6549 (0.0014) | | 0.7306 (0.0008) |
| “cox” | 0.7042 | 0.7140 | 0.6590 | 0.6659 | 0.7299 |