Abstract
Survival analysis mainly deals with the time to an event, such as death, onset of disease, or bankruptcy. Survival data typically contain "censored" observations. For these, the time to event cannot be completely observed, and the recorded time is only a lower bound of the time to event. For each subject, only the earlier of the event time and the censoring time is observed. Many traditional statistical methods have been used effectively to analyze survival data with censored observations. However, high-throughput technologies now produce "omics" data, and more advanced statistical methods, such as regularization, are required to build predictive survival models from high-dimensional genomic data. Machine learning approaches have also been adapted to survival analysis. They can fit nonlinear effects and complex interactions between predictors and give more accurate predictions of individual survival probabilities. Most clinicians and medical researchers can now easily access statistical programs for analyzing survival data, so a review of the underlying statistical methods is useful. We review traditional survival methods and regularization methods with various penalty functions for the analysis of high-dimensional genomic data. We also describe machine learning techniques that have been adapted to survival analysis.
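Censored data are usually encoded as two values per subject: an observed time, which is the minimum of the event time and the censoring time, and an event indicator. The sketch below shows that encoding and a Kaplan-Meier (product-limit) estimator that uses it. Censored subjects leave the risk set without causing a drop in the curve. The function names and toy data are illustrative assumptions and are not taken from the review.

```python
from collections import Counter


def observe(event_time, censor_time):
    """Return the observed (time, event) pair for one subject.

    Only the earlier of the two times is seen. event = 1 means the event was
    observed, event = 0 means the subject was censored (a tie counts as an event).
    """
    return min(event_time, censor_time), int(event_time <= censor_time)


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (t, S(t)) at each distinct event time."""
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # subjects whose follow-up ends at each time
    n_at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]  # Counter returns 0 for times with no events
        if d > 0:
            # Product-limit step: multiply by the conditional survival at t
            survival *= 1.0 - d / n_at_risk
            curve.append((t, survival))
        # Both deaths and censored subjects leave the risk set after time t
        n_at_risk -= leaving[t]
    return curve


# Hypothetical follow-up data for six subjects
times = [2, 3, 3, 5, 7, 8]
events = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(times, events):
    print(t, round(s, 4))
```

For these data the estimate drops to 5/6, 2/3, 4/9 and 0 at times 2, 3, 5 and 8. The curve does not drop at time 7, because the only subject leaving the risk set there is censored.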
Keywords: Cox model; Kaplan-Meier curve; censoring; machine learning; regularization; survival time
Year: 2019 PMID: 31896241 PMCID: PMC6944043 DOI: 10.5808/GI.2019.17.4.e41
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
2 × 2 table at time t for calculating the log-rank test statistic
| Group | Dead | Alive | Risk set |
|---|---|---|---|
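| Treatment | $d_{1t}$ | $n_{1t} - d_{1t}$ | $n_{1t}$ |
| Placebo | $d_{2t}$ | $n_{2t} - d_{2t}$ | $n_{2t}$ |
| Total | $d_t$ | $n_t - d_t$ | $n_t$ |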
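The 2 × 2 table above is built at every distinct event time t, and the results are pooled into one statistic. The sketch below assumes two groups coded 1 and 2. It uses the usual hypergeometric mean and variance of the deaths in group 1, where $d_{1t}$ deaths occur among $n_{1t}$ subjects at risk in group 1, and $d_t$ deaths occur among $n_t$ subjects at risk overall:

$$\chi^2 = \frac{\left[\sum_t \left(d_{1t} - n_{1t} d_t / n_t\right)\right]^2}{\sum_t n_{1t} n_{2t} d_t (n_t - d_t) / \left[n_t^2 (n_t - 1)\right]}$$

The function name and data are hypothetical.

```python
import math


def logrank_test(times, events, groups):
    """Two-sample log-rank test pooled over the 2 x 2 tables at each event time.

    groups: 1 for the first group (e.g. treatment), 2 for the second (e.g. placebo).
    Returns (chi-square statistic, p-value on 1 degree of freedom).
    Assumes at least one observed event, so the pooled variance is positive.
    """
    data = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in data if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        # Cells of the 2 x 2 table at time t
        n1 = sum(1 for x, _, g in data if x >= t and g == 1)  # at risk, group 1
        n2 = sum(1 for x, _, g in data if x >= t and g == 2)  # at risk, group 2
        d1 = sum(1 for x, e, g in data if x == t and e == 1 and g == 1)
        d = sum(1 for x, e, _ in data if x == t and e == 1)
        n = n1 + n2
        # Under H0, d1 is hypergeometric with mean n1 * d / n
        observed_minus_expected += d1 - n1 * d / n
        if n > 1:  # the variance term is 0 (and undefined) when only one subject is at risk
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    chi2 = observed_minus_expected ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return chi2, p_value


# Hypothetical data: group 1 fails early, group 2 fails late
chi2, p = logrank_test([1, 2, 3, 4], [1, 1, 1, 1], [1, 1, 2, 2])
print(round(chi2, 3), round(p, 3))
```

In this toy example the statistic is 49/17 ≈ 2.88, with p ≈ 0.09. The p-value uses the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$, so no statistics library is needed.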
Result of univariate Cox model for clinical variables
| Clinical variable | p-value |
|---|---|
| Age at initial pathologic diagnosis | 0.103 |
| Maximum tumor dimension | 0.001 |
| Sex | 0.312 |
| Anatomic neoplasm subdivision | 0.017 |
| Surgery performed type | <0.001 |
| Residual tumor | 0.108 |
| T stage | 0.096 |
| N stage | 0.053 |
| Radiation therapy | 0.004 |
| Postoperative rx tx | <0.001 |
| Person neoplasm cancer status | <0.001 |
p-value from a univariate Cox model.
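Each p-value in the table comes from a Cox model that contains a single clinical variable. This section does not say which software or which test (Wald, score, or likelihood ratio) produced them, so the sketch below is only one plausible version. It fits one numeric covariate by Newton-Raphson on the Cox partial likelihood, uses the Breslow approximation for tied event times, and reports a two-sided Wald p-value. The function names and data are hypothetical. Multi-level categorical variables such as T stage would need dummy coding and a multi-degree-of-freedom test, which this sketch does not cover.

```python
import math


def cox_score_info(times, events, x, beta):
    """Score U(beta) and observed information I(beta) of the Cox partial
    likelihood for one covariate (Breslow handling of tied event times)."""
    score, info = 0.0, 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if e_i != 1:
            continue  # censored subjects contribute only through risk sets
        # Risk set R(t_i): every subject whose observed time is >= t_i
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        w = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(w)
        s1 = sum(x_j * w_j for x_j, w_j in zip(risk, w))
        s2 = sum(x_j * x_j * w_j for x_j, w_j in zip(risk, w))
        mean = s1 / s0  # risk-weighted mean of x over R(t_i)
        score += x_i - mean
        info += s2 / s0 - mean * mean  # risk-weighted variance of x
    return score, info


def univariate_cox(times, events, x, tol=1e-10, max_iter=50):
    """Fit h(t | x) = h0(t) * exp(beta * x); return (beta, two-sided Wald p-value)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(times, events, x, beta)
        step = score / info  # Newton-Raphson step on the concave log partial likelihood
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(times, events, x, beta)
    z = beta * math.sqrt(info)  # beta / SE, where SE = 1 / sqrt(I(beta))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, p_value


# Hypothetical data: one binary clinical variable, one censored subject
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 0, 1, 1, 1]
x = [1, 0, 1, 0, 0, 1]
beta, p = univariate_cox(times, events, x)
print(round(beta, 3), round(p, 3))
```

Screening each clinical variable with its own model, as in the table, means running this fit once per variable. The sketch assumes the data are not perfectly separated; otherwise the maximum likelihood estimate of beta does not exist.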
Fig. 1. Clinical variable selection scheme.
Comparison of C-index using clinical and lasso gene variables
| Method | Clinical | Lasso genes | Clinical + Genes |
|---|---|---|---|
| Cox model | 0.75 ± 0.06 | 0.60 ± 0.14 | 0.84 ± 0.03 |
| SVM | 0.65 ± 0.12 | 0.50 ± 0.03 | 0.74 ± 0.07 |
| RSF | 0.73 ± 0.11 | 0.56 ± 0.08 | 0.78 ± 0.08 |
| Cox boosting | 0.75 ± 0.06 | 0.60 ± 0.13 | 0.84 ± 0.03 |
Values are presented as mean ± standard deviation.
SVM, support vector machine; RSF, random survival forest.
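The C-index in these tables measures how well a model's risk scores order subjects by survival time. A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering. The sketch below implements Harrell's version and assumes that higher risk scores mean shorter expected survival. It does not show how each method (Cox model, SVM, RSF, Cox boosting) produces its risk score. Pairs with tied observed times are simply skipped, which is one of several conventions. The function name and data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is usable when times[i] < times[j] and subject i had the
    event. It is concordant when subject i also has the higher risk score.
    Tied risk scores count as half-concordant; tied times are skipped.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # the earlier time of a usable pair must be an observed event
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / usable


# Hypothetical data: the subject followed to time 4 is censored
times = [2, 4, 6, 8]
events = [1, 0, 1, 1]
risk = [0.9, 0.5, 0.4, 0.6]
print(harrell_c_index(times, events, risk))
```

There are four usable pairs here. Three are ordered correctly, and the pair (time 6, time 8) is not, so the index is 0.75.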
Comparison of C-index using clinical and elastic net gene variables
| Method | Clinical | Elastic net genes | Clinical + Genes |
|---|---|---|---|
| Cox model | 0.75 ± 0.06 | 0.64 ± 0.09 | 0.79 ± 0.07 |
| SVM | 0.65 ± 0.12 | 0.54 ± 0.14 | 0.64 ± 0.16 |
| RSF | 0.73 ± 0.11 | 0.61 ± 0.07 | 0.77 ± 0.08 |
| Cox boosting | 0.75 ± 0.06 | 0.64 ± 0.09 | 0.80 ± 0.05 |
Values are presented as mean ± standard deviation.
SVM, support vector machine; RSF, random survival forest.
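Both tables use gene sets selected by regularization (lasso or elastic net). To illustrate how the two penalties differ, the sketch below uses a common parametrization, the one used by the R package glmnet: $\lambda[\alpha\lVert\beta\rVert_1 + (1-\alpha)\lVert\beta\rVert_2^2/2]$, where $\alpha = 1$ gives the lasso. It also shows the soft-thresholding coordinate update. This update sets small coefficients exactly to zero, which is what makes these penalties perform variable selection. The update assumes a standardized covariate and a quadratic approximation of the loss. It is not a full penalized Cox fit, and the function names and numbers are hypothetical.

```python
def elastic_net_penalty(beta, lam, alpha):
    """Penalty lam * (alpha * sum|b| + (1 - alpha) / 2 * sum b**2).

    alpha = 1 gives the lasso penalty, alpha = 0 gives the ridge penalty.
    """
    l1 = sum(abs(b) for b in beta)
    l2 = sum(b * b for b in beta)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0); small |z| maps to exactly 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def coordinate_update(z, lam, alpha):
    """Minimiser over b of 0.5 * (b - z)**2 + lam * (alpha*|b| + (1-alpha)/2 * b**2).

    z is the unpenalised coordinate-wise solution for a standardised covariate
    under a quadratic approximation of the loss (an assumption of this sketch).
    """
    return soft_threshold(z, lam * alpha) / (1.0 + lam * (1.0 - alpha))


# Hypothetical unpenalised coefficients for three genes
z_values = [0.8, -0.05, 0.3]
print([round(coordinate_update(z, 0.2, 1.0), 4) for z in z_values])  # lasso
print([round(coordinate_update(z, 0.2, 0.5), 4) for z in z_values])  # elastic net
```

With λ = 0.2, the lasso update gives 0.6, 0 and 0.1. The elastic net update with α = 0.5 lowers the threshold to 0.1 and shrinks the surviving coefficients by a factor of 1/1.1. In both cases the middle gene is dropped.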