Raphael Sonabend (1,2,3), Andreas Bender (4), Sebastian Vollmer (1,5,6)
Abstract
MOTIVATION: In this paper we consider how to evaluate survival distribution predictions with measures of discrimination. This is non-trivial because discrimination measures are the most commonly used measures in survival analysis, yet there is no clear method to derive a risk prediction from a distribution prediction. We survey methods proposed in the literature and in software and consider their respective advantages and disadvantages.
Year: 2022; PMID: 35818973; PMCID: PMC9438958; DOI: 10.1093/bioinformatics/btac451
Source DB: PubMed; Journal: Bioinformatics; ISSN: 1367-4803; Impact factor: 6.931
Fig. 1. PRISMA diagram for the literature review. Database: PubMed. Search terms: ‘(comparison OR benchmark) AND (“survival analysis” OR “time-to-event analysis”) AND “machine learning” AND (discrimination OR concordance OR “C statistic” OR “c index”)’. Inclusion criteria: articles that compare machine learning survival predictions with measures of discrimination.
Fig. 2. Extrapolation methods to ‘fix’ improper distribution predictions. Top: Kaplan–Meier estimator fit on the rats dataset (Mantel; Table 1). This results in an improper distribution because the estimated survival probability at the final observed time, $\hat{S}(104)$, is greater than zero. Middle: dropping the survival probability to zero at T = 105, just after the study end. Bottom: dropping the survival probability to zero by linearly extrapolating from the first, $t_1$, and last, $t_n$, observed survival times. Dashed horizontal lines are drawn at … and dotted vertical lines at T = 104, where the observed data end and the extrapolation begins. The median (m) and mean (μ) are given for both extrapolation methods. Both methods yield summary quantities that are heavily skewed toward the final extrapolated time; for the ‘dropping’ method the median is exactly at the final time. Linear extrapolation results in survival probabilities that are unrealistically large (a lab rat lives 2 years on average).
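The two extrapolation rules in the caption can be sketched as follows. This is a minimal Python sketch on a hypothetical toy curve, not the rats fit. `extrapolate_drop` sets S(t) to zero one time unit after the last observed time, mirroring T = 105 after T = 104. `extrapolate_linear` extends the straight line through the first and last points of the curve until it reaches zero. Both report the mean (the area under the extrapolated curve) and the median. The function names, the one-unit step and the use of the curve's values at the first and last times are assumptions of this sketch.

```python
def step_area(times, surv, upper):
    """Area under a survival step function on [0, upper].

    S(t) = 1 before times[0] and S(t) = surv[i] on [times[i], times[i + 1]).
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= upper:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (upper - prev_t)


def step_median(times, surv):
    """First time at which the step function is at or below 0.5, or None."""
    for t, s in zip(times, surv):
        if s <= 0.5:
            return t
    return None


def extrapolate_drop(times, surv, step=1.0):
    """'Dropping' rule (assumed step): S(t) falls to zero `step` units after the last time."""
    t_end = times[-1] + step
    median = step_median(times, surv)
    return {
        "end": t_end,
        "mean": step_area(times, surv, t_end),
        # if S never reaches 0.5 before, it first does so when it drops to zero at t_end
        "median": t_end if median is None else median,
    }


def extrapolate_linear(times, surv):
    """Linear rule (assumed form): extend the line through the first and last points to S = 0."""
    t0, s0, t1, s1 = times[0], surv[0], times[-1], surv[-1]
    slope = (s1 - s0) / (t1 - t0)
    if slope >= 0:
        raise ValueError("curve must decrease between its first and last point")
    t_zero = t1 - s1 / slope  # where the extrapolated line reaches zero
    # observed step part up to t1, plus the triangle under the linear tail
    mean = step_area(times, surv, t1) + 0.5 * s1 * (t_zero - t1)
    median = step_median(times, surv)
    if median is None:  # S(t1) > 0.5, so 0.5 is crossed on the extrapolated segment
        median = t1 + (0.5 - s1) / slope
    return {"end": t_zero, "mean": mean, "median": median}


# Hypothetical improper curve: S(t) is still 0.7 at the last observed time (40).
times = [10, 20, 30, 40]
surv = [0.9, 0.8, 0.75, 0.7]
print(extrapolate_drop(times, surv))
print(extrapolate_linear(times, surv))
```

With this toy curve, the dropping rule puts the median exactly at the final time (41), with a mean of about 35.2. The linear rule extends the curve to about t = 145, with a median near 70 and a mean near 71.25. Both results are far beyond the last observed time of 40.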
Table 1. First five rows of the rats dataset from the R package survival (Therneau, 2022)
| id | Litter | rx | Sex | Time | Status |
|---|---|---|---|---|---|
| 1 | 1 | 1 | f | 101 | 0 |
| 2 | 1 | 0 | f | 49 | 1 |
| 3 | 1 | 0 | f | 104 | 0 |
| 4 | 2 | 1 | m | 91 | 0 |
| 5 | 2 | 0 | m | 104 | 0 |
Note: The full dataset has 300 rows, three predictors (Litter, rx and Sex) and the survival outcome, recorded in the Time and Status columns.
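The improper distribution in Fig. 2 arises because the Kaplan–Meier estimate cannot reach zero when the last observations are censored. The sketch below is a minimal Python version of the product-limit estimator, applied only to the five rows of Table 1; it is an illustration, not the full 300-row fit. The function name `kaplan_meier` is hypothetical. In these rows, the single event at T = 49 is the only drop, so the estimate stays at 0.8 through T = 104.

```python
from collections import Counter


def kaplan_meier(times, status):
    """Product-limit (Kaplan–Meier) estimate at each distinct event time.

    status: 1 = event, 0 = right-censored. Returns a list of (t, S(t)) pairs.
    """
    events = Counter(t for t, d in zip(times, status) if d == 1)
    curve, s = [], 1.0
    for t in sorted(events):
        at_risk = sum(1 for u in times if u >= t)  # subjects still under observation at t
        s *= 1.0 - events[t] / at_risk            # multiply by conditional survival at t
        curve.append((t, s))
    return curve


# (Time, Status) from the five rows of Table 1.
times = [101, 49, 104, 91, 104]
status = [0, 1, 0, 0, 0]
curve = kaplan_meier(times, status)
print(curve)                      # [(49, 0.8)]
print(max(times), curve[-1][1])   # 104 0.8: S(104) > 0, so the distribution is improper
```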
C-index values computed with different measures, transformations and models
| Measure | Type | Transformation | CPH (R) | RSF (D) | GBM (R) |
|---|---|---|---|---|---|
| C_H | TI | — | 0.859 | — | 0.831 |
| C_U | TI | — | … | — | … |
| C_A | TD | — | 0.852 | 0.757 | — |
| C_H | TI | Prob (min) | 0.500 | 0.500 | — |
| C_H | TI | Prob (max) | 0.859 | … | — |
| C_H | TI | Prob (rand) | 0.859 | 0.851 | — |
| C_H | TI | Summary (naive) | 0.141 | 0.104 | — |
| C_H | TI | Summary (extr) | 0.859 | 0.871 | — |
| C_H | TI | ExpMort | 0.859 | 0.878 | — |
Note: Included models are Cox PH (CPH), random survival forest (RSF) and a gradient boosting machine with C-index optimization (GBM). CPH natively predicts a risk (R) and predicts a distribution via a transformation that assumes a PH model form with the Breslow estimator. RSF natively predicts a distribution (D) and predicts a risk via an ensemble mortality transformation. GBM natively predicts a risk (R). Models are evaluated with Harrell’s C (C_H), Uno’s C (C_U) or Antolini’s C (C_A). The second column states whether a measure is time-independent (TI) or time-dependent (TD). The third column states the transformation required to evaluate a survival distribution prediction with a measure of discrimination. The transformations are:
- ‘Prob (min)’: C_H computed on the predicted survival probability at the time point that gives the smallest value for RSF.
- ‘Prob (max)’: the same, at the time point that gives the largest value for RSF.
- ‘Prob (rand)’: the same, at an arbitrary time point.
- ‘Summary (naive)’: C_H computed on the distribution expectation without any extrapolation.
- ‘Summary (extr)’: C_H computed on the distribution expectation after extrapolating by dropping survival probabilities to zero (Fig. 2, middle).
- ‘ExpMort’: C_H computed on the expected mortality.

Dashes (‘–’) in the final two columns indicate that the measure is incompatible with the prediction type unless a transformation is applied. Values in bold are the maximum C-index for that model.
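The transformations in the third column can be sketched with a few small Python functions. This is a minimal sketch on hypothetical toy predictions for four subjects, not the CPH, RSF or GBM fits behind the table, and all function names are illustrative:

- `harrell_c` treats a pair as comparable when the earlier time is an event, and scores tied risks as 0.5.
- `prob_risk` uses 1 - S(t) at one chosen time point.
- `naive_mean` integrates a step survival curve up to the last grid time, without extrapolation.
- `expected_mortality` sums the cumulative hazard -log S(t) over the grid. This is our assumption of how the expected (ensemble) mortality is formed.

```python
import math


def harrell_c(times, status, risk):
    """Harrell's C: share of comparable pairs whose risk order matches their event order.

    A pair (i, j) is comparable when subject i has an event (status 1) and
    times[i] < times[j]; tied risks count as half-concordant.
    """
    num, den = 0.0, 0
    for i in range(len(times)):
        if status[i] != 1:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                den += 1
                if risk[i] > risk[j]:
                    num += 1.0
                elif risk[i] == risk[j]:
                    num += 0.5
    if den == 0:
        raise ValueError("no comparable pairs")
    return num / den


def prob_risk(surv, k):
    """'Prob' transformation: risk = 1 - S(t_k) at one chosen grid index k."""
    return [1.0 - s[k] for s in surv]


def naive_mean(surv, grid):
    """Expectation of a step survival curve truncated at grid[-1] (no extrapolation)."""
    means = []
    for s in surv:
        area, prev_t, prev_s = 0.0, 0.0, 1.0
        for t, p in zip(grid, s):
            area += prev_s * (t - prev_t)
            prev_t, prev_s = t, p
        means.append(area)
    return means


def expected_mortality(surv, eps=1e-12):
    """Sum of H(t) = -log S(t) over the grid; S is clipped at eps to avoid log(0)."""
    return [sum(-math.log(max(p, eps)) for p in s) for s in surv]


# Hypothetical data: observed times, event indicators and predicted S(t) on a time grid.
times = [2, 4, 6, 8]
status = [1, 1, 0, 1]
grid = [1, 3, 5, 7]
surv = [
    [1.0, 0.5, 0.3, 0.1],
    [1.0, 0.6, 0.5, 0.45],
    [1.0, 0.8, 0.5, 0.3],
    [1.0, 0.9, 0.8, 0.6],
]

by_time = [harrell_c(times, status, prob_risk(surv, k)) for k in range(len(grid))]
print(by_time)  # [0.5, 1.0, 0.9, 0.8]: the reported C depends on the chosen time point
mean = naive_mean(surv, grid)
print(harrell_c(times, status, mean))                      # 0.0: expectation used directly as risk
print(harrell_c(times, status, [-m for m in mean]))        # 1.0: negated, so longer survival = lower risk
print(harrell_c(times, status, expected_mortality(surv)))  # 0.8
```

In this toy example, the reported C ranges from 0.5 to 1.0 depending only on the chosen time point. Picking that point after seeing the scores inflates the result; this is the kind of freedom the ‘Prob (min)’ and ‘Prob (max)’ rows expose. The 0.5 at the earliest time comes from every subject having S(t) = 1, so all risks tie. The jump from 0.0 to 1.0 between the expectation and its negation shows that a summary such as the expectation must be negated before it can serve as a risk.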