Hayley Smith, Michael Sweeting, Tim Morris, Michael J Crowther.
Abstract
BACKGROUND: There is substantial interest in the adaptation and application of so-called machine learning approaches to prognostic modelling of censored time-to-event data. These methods must be compared and evaluated against existing methods in a variety of scenarios to determine their predictive performance. A scoping review of how machine learning methods have been compared to traditional survival models is important to identify the comparisons that have been made and issues where they are lacking, biased towards one approach or misleading.
Keywords: Clinical risk prediction; Machine learning; Prognostic modelling; Simulation studies; Survival analysis
Year: 2022 PMID: 35650647 PMCID: PMC9161606 DOI: 10.1186/s41512-022-00124-y
Source DB: PubMed Journal: Diagn Progn Res ISSN: 2397-7523
Fig. 1 Taxonomy of methods for prognostic modelling as defined in this review, adapted from the taxonomy in Wang et al. [11]. Methods were categorised as statistical (a), machine learning (b), or hybrid methods (c) and highlighted in bold if included in articles in this review
Inclusion criteria used for the title, abstract and full-text screening
| Inclusion criteria | |
|---|---|
| 1 | Compare at least one machine learning method and at least one statistical method (according to our definitions). Any number of hybrid methods can be compared but a machine learning method and a statistical method must be included. |
| 2 | Methods included should be prognostic (risk prediction) models for one specific outcome in a medical/healthcare context. |
| 3 | Methods included must be used to predict survival outcomes. |
| 4 | The simulation study must have been used to compare the methods with a time-to-event outcome with censoring. |
| 5 | Methods must be evaluated and compared in terms of prognostic ability. |
| 6 | Methods must not be for modelling treatment effects, feature selection or genetic variant identification. |
Fig. 2 PRISMA flow diagram to illustrate the screening process
Authors and titles of the articles included in this review
| Author/s | Publication date | Title | Journal |
|---|---|---|---|
| Xiang et al. [ | 2000 | Comparison of the performance of neural network methods and Cox regression for censored survival data | |
| Omurlu et al. [ | 2009 | The comparisons of random survival forests and Cox regression analysis with simulation and an application related to breast cancer | |
| Lowsky et al. [ | 2012 | A K-nearest neighbors survival probability prediction method | |
| Geng et al. [ | 2014 | A Model-Free Machine Learning Method for Risk Classification and Survival Probability Prediction | |
| Gong et al. [ | 2018 | Big Data Toolsets to Pharmacometrics: Application of Machine Learning for Time-to-Event Analysis | |
| Hu and Steingrimsson [ | 2018 | Personalized Risk Prediction in Clinical Oncology Research: Applications and Practical Issues Using Survival Trees and Random Forests | |
| Katzman et al. [ | 2018 | DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network | |
| Wang and Li [ | 2019 | Extreme learning machine Cox model for high-dimensional survival analysis | |
| Golmakani and Polley [ | 2020 | Super Learner for Survival Data Prediction | |
| Steingrimsson and Morrison [ | 2020 | Deep learning for survival outcomes | |
The number of repetitions, number of data-generating mechanisms and factors varied in each article
| Repetitions | Factors varied in the data-generating mechanisms | |||||||
|---|---|---|---|---|---|---|---|---|
| Number of DGMs | Sample size | Failure time distribution | Number of covariates | Covariate relationships | Covariate effects | Censoring | ||
| Geng et al. (2014) [ | 100 | 20 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Golmakani et al. (2020) [ | 1 | 6 | ✓ | ✓ | ✓ | ✓ | ||
| Gong et al. (2018) | 500 | 57 | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Hu and Steingrimsson (2018) [ | 1000 | 4 | ✓ | ✓ | ✓ | |||
| Katzman et al. (2018) [ | 1 | 2 | ✓ | |||||
| Lowsky et al. (2012) | 20 | 12 | ✓ | ✓ | ||||
| Omurlu et al. (2009) [ | 1000 | 4 | ✓ | |||||
| Steingrimsson and Morrison (2020) [ | 1000 | 16 | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Wang and Li (2019) [ | Unclear | 24 | ✓ | ✓ | ||||
| Xiang et al. (2000) [ | 50 | 9 | ✓ | ✓ | ✓ | ✓ | ✓ | |
*Gong et al. (2018) [27] also included three data-generating mechanisms in which the simulated data were based on clinical data
**All simulated datasets in Lowsky et al. (2012) [25] were based on a real kidney transplant dataset
***Numbers of repetitions were unclear in Wang and Li (2019) [30]
Number of covariates, distribution type and relationships between covariates in each article’s simulations
| Covariates | |||||||||
|---|---|---|---|---|---|---|---|---|---|
| Number of covariates | Distribution | Relationships | |||||||
| Binomial | Normal | Uniform | Real Data | Independent | Correlation | Interaction, e.g. X3 = X1X2 | Correlation and interaction | ||
| Geng et al. (2014) [ | 2 | ✓ | ✓ | ✓ | |||||
| Golmakani et al. (2020) [ | 50, 1000 | ✓ | ✓ | ✓ | |||||
| Gong et al. (2018) | 2, 3, 250 | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| Hu and Steingrimsson (2018) [ | 50 | ✓ | ✓ | ✓ | |||||
| Katzman et al. (2018) [ | 10 | ✓ | ✓ | ||||||
| Lowsky et al. (2012) | 13 | ✓ | |||||||
| Omurlu et al. (2009) [ | 5 | ✓ | ✓ | ✓ | |||||
| Steingrimsson and Morrison (2020) [ | 30, 100 | ✓ | ✓ | ||||||
| Wang and Li (2019) [ | 500, 1000, 2000, 5000 | ✓ | ✓ | ||||||
| Xiang et al. (2000) [ | 2, 4 | ✓ | ✓ | ✓ | ✓ | ||||
*Gong et al. (2018) [27] used distributions and parameter values to model clinical data in their clinically relevant datasets and included three data-generating mechanisms where the covariate relationships were modelled to be clinically relevant
**Lowsky et al. (2012) [25] used real clinical data for their covariates and so exact relationships are unknown
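The covariate structures tabulated above can be illustrated with a short Python sketch. It draws two correlated standard normal covariates, an interaction term of the form X3 = X1X2, and an independent binary covariate. The function name `simulate_covariates`, the correlation of 0.5 and the Bernoulli probability of 0.5 are assumptions chosen for illustration; they are not taken from any of the reviewed articles.

```python
import math
import random


def simulate_covariates(n, rho=0.5, seed=0):
    """Return n rows (x1, x2, x3, x4) of illustrative covariates.

    x1, x2: standard normal with correlation rho
    x3:     interaction term x1 * x2
    x4:     independent Bernoulli(0.5) covariate
    All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        z = rng.gauss(0.0, 1.0)
        # Mixing x1 with independent noise gives corr(x1, x2) = rho.
        x2 = rho * x1 + math.sqrt(1.0 - rho ** 2) * z
        x3 = x1 * x2                          # interaction, e.g. X3 = X1X2
        x4 = 1 if rng.random() < 0.5 else 0   # binary covariate
        rows.append((x1, x2, x3, x4))
    return rows
```

Setting `rho=0.0` gives the independent-covariate case; dropping `x3` from the linear predictor gives a mechanism without interaction.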
Failure time distributions, assumptions and covariate effects included in the data-generating mechanisms for each article
| Failure Times | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Distribution | Assumptions | Covariate effects | |||||||||
| Exponential | Weibull | Gamma | PH | PO | Non-PH | Null effects | Linear | Quadratic covariates | Non-linear | Time-dependent | |
| Geng et al. (2014) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||||
| Golmakani et al. (2020) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| Gong et al. (2018) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| Hu and Steingrimsson (2018) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| Katzman et al. (2018) [ | ✓ | ✓ | ✓ | ✓ | |||||||
| Lowsky et al. (2012) | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| Omurlu et al. (2009) [ | ✓ | ✓ | ✓ | ||||||||
| Steingrimsson and Morrison (2020) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||
| Wang and Li (2019) [ | ✓ | ✓ | ✓ | ✓ | |||||||
| Xiang et al. (2000) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
*Geng et al. (2014) [26] included a specific crossing hazards data-generating mechanism
**Gong et al. (2018) [27] take the exponential of the first covariate squared and apply a cosine transform to the second covariate; covariate coefficients were also obtained for the clinically relevant data-generating mechanisms by fitting each of the predefined models to clinical data
***Katzman et al. (2018) [29] use a Gaussian distribution for the linear predictor and include quadratic effects for both covariates
****Lowsky et al. (2012) [25] fit an exponential model to the clinical data to obtain estimates for the covariate coefficients to use in simulating the failure times
*****Wang and Li (2019) [30] transform the covariates by a radial basis kernel
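As a minimal sketch of how failure times under a proportional hazards data-generating mechanism might be simulated, the Python example below uses inverse transform sampling from a Weibull model with a linear covariate effect. With `shape=1.0` it reduces to an exponential model. The function name `simulate_weibull_ph` and all parameter values are hypothetical and do not reproduce any specific article's mechanism.

```python
import math
import random


def simulate_weibull_ph(n, betas, scale=0.1, shape=1.5, seed=0):
    """Draw n (covariates, failure time) pairs from a Weibull PH model.

    Baseline hazard: h0(t) = scale * shape * t**(shape - 1), so
    S(t | x) = exp(-scale * t**shape * exp(lp)) with lp = sum(beta * x).
    Solving S(T | x) = U for U ~ Uniform(0, 1) gives
    T = (-log(U) / (scale * exp(lp))) ** (1 / shape).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in betas]     # independent normal covariates
        lp = sum(b * xi for b, xi in zip(betas, x))  # linear predictor
        u = 1.0 - rng.random()                       # in (0, 1], avoids log(0)
        t = (-math.log(u) / (scale * math.exp(lp))) ** (1.0 / shape)
        rows.append((x, t))
    return rows
```

Replacing the linear predictor with a non-linear function of the covariates (for example a quadratic term) would give a non-linear covariate effect while retaining proportional hazards.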
Level of censoring simulated and distribution of censoring times used in each article
| Censoring | ||||
|---|---|---|---|---|
| Level of censoring (%) | Distribution | |||
| Uniform | Exponential | Other | ||
| Geng et al. (2014) [ | 15, 40 | ✓ | ||
| Golmakani et al. (2020) [ | 18 | ✓ | ||
| Gong et al. (2018) [ | 0, 25, 50, 75 | ✓ | ||
| Hu and Steingrimsson (2018) [ | 37 | ✓ | ||
| Katzman et al. (2018) [ | Unclear | ✓ | ||
| Lowsky et al. (2012) [ | Unclear | ✓ | ||
| Omurlu et al. (2009) [ | Unclear | |||
| Steingrimsson and Morrison (2020) [ | 18, 47 | ✓ | ✓ | |
| Wang and Li (2019) [ | 25 | ✓ | ||
| Xiang et al. (2000) [ | 0, 20, 30, 50, 70 | ✓ | ||
*Gong et al. (2018) [27] randomly chose if the time was a censoring time or event time
**Katzman et al. (2018) [29] included administrative censoring only
***The censoring distribution in Lowsky et al. (2012) [25] was unclear
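Two of the censoring mechanisms listed above, random censoring from an exponential distribution and administrative censoring at a fixed end of follow-up, can be sketched as follows. The function name `apply_censoring` and its parameters are assumptions for illustration; in practice the censoring rate would be tuned to reach a target censoring percentage.

```python
import random


def apply_censoring(failure_times, censor_rate=None, admin_time=None, seed=0):
    """Return (observed_time, event) pairs for a list of failure times.

    event is 1 if the failure was observed and 0 if the time was censored.
    censor_rate: rate of an exponential censoring distribution (optional)
    admin_time:  fixed end of follow-up, i.e. administrative censoring (optional)
    """
    rng = random.Random(seed)
    observed = []
    for t in failure_times:
        c = float("inf")                      # no censoring unless requested
        if censor_rate is not None:
            c = rng.expovariate(censor_rate)  # random censoring time
        if admin_time is not None:
            c = min(c, admin_time)            # administrative cut-off
        observed.append((min(t, c), 1 if t <= c else 0))
    return observed
```

For example, `apply_censoring([1.0, 2.0, 3.0], admin_time=2.5)` leaves the first two times as events and censors the third at 2.5.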
Training and testing data size and method used to split training and testing data
| Training and testing datasets | |||
|---|---|---|---|
| Training data size | Testing data size | Method | |
| Geng et al. (2014)* [ | 100 200 | 1000 1000 | Independent samples from DGM |
| Golmakani et al. (2020) [ | 450 720 | 50 80 | 10-fold cross-validation |
| Gong et al. (2018) [ | 200 400 500 600 800 1000 | 200 400 500 600 800 1000 | Independent samples from DGM |
| Hu and Steingrimsson (2018) [ | 200 500 | 1000 1000 | Independent samples from DGM |
| Katzman et al. (2018)* [ | 4000 | 1000 | Independent samples from DGM |
| Lowsky et al. (2012)* [ | 500 1000 3000 7500 | 13525 13525 13525 13525 | Independent samples from DGM |
| Omurlu et al. (2009) [ | 50 100 250 500 | 50 100 250 500 | Unclear |
| Steingrimsson and Morrison (2020) [ | 250 500 1000 1500 3000 | 250 500 1000 1500 3000 | Independent samples from DGM |
| Wang and Li (2019) [ | 150 | 150 | Two-fold cross-validation |
| Xiang et al. (2000) [ | 100 200 | 100 200 | Randomly split whole sample into equal training and testing sets |
*These articles also included validation datasets
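For the articles that used cross-validation rather than independent samples, the split can be sketched as below. The training and testing sizes of 450/50 and 720/80 in the table are consistent with 10-fold cross-validation on samples of 500 and 800 observations, although that reading is an inference from the table. The function name `k_fold_indices` is hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for each of k folds over n rows."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)                          # random assignment to folds
    folds = [idx[i::k] for i in range(k)]     # k near-equal, disjoint folds
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test
```

Each observation appears in exactly one test fold, so performance measures averaged over folds use every observation once for testing.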
Statistical, hybrid and machine learning methods included in each of the articles
| Statistical methods | Hybrid methods | Machine learning methods | |||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cox PH | Penalised L1 Cox (Lasso) | Penalised L2 Cox (Ridge) | Elastic Net Cox | Cox Boost | Super Learners | Mahalanobis K-nearest neighbour Kaplan-Meier | RSF | Neural Network | Boosting | SVM | |
| Geng et al. (2014) [ | ✓ | ✓ | |||||||||
| Golmakani et al. (2020) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| Gong et al. (2018) [ | ✓ | ✓ | ✓ | ||||||||
| Hu and Steingrimsson (2018) [ | ✓ | ✓ | ✓ | ||||||||
| Katzman et al. (2018) [ | ✓ | ✓ | ✓ | ||||||||
| Lowsky et al. (2012) [ | ✓ | ✓ | ✓ | ||||||||
| Omurlu et al. (2009) [ | ✓ | ✓ | |||||||||
| Steingrimsson and Morrison (2020) [ | ✓ | ✓ | ✓ | ✓ | |||||||
| Wang and Li (2019) [ | ✓ | ✓ | ✓ | ✓ | ✓ | ||||||
| Xiang et al. (2000) [ | ✓ | ✓ | |||||||||
+Methods that were developed by the authors of the papers
Estimands and performance measures for each article's simulation study
| Estimands | Performance measures | |||||||
|---|---|---|---|---|---|---|---|---|
| Selection for time | S(t|x) | h(t|x) | Linear predictor: | Restricted Mean Survival Time (RMST) | MSPE | C-Index* | Integrated Brier Score | |
| Geng et al. (2014) [ | 1/5th,…,5/6th quantiles of training survival times | ✓ | ✓ | |||||
| Golmakani et al. (2020) [ | N/A | ✓ | ✓ | |||||
| Gong et al. (2018) [ | Unclear | ✓ | ✓ | |||||
| Hu and Steingrimsson (2018) [ | 25th, 50th and 70th quantile of training marginal survival times | ✓ | ✓ | |||||
| Katzman et al. (2018) [ | N/A for linear predictor; Unclear for restricted mean survival and C-index | ✓ | ✓ | ✓ | ✓ | |||
| Lowsky et al. (2012) [ | ✓ | ✓ | ||||||
| Omurlu et al. (2009) | ✓ | |||||||
| Steingrimsson and Morrison (2020) [ | S( | ✓ | ✓ | ✓ | ||||
| Wang and Li (2019) [ | Unclear | ✓ | ✓ | ✓ | ✓ | |||
| Xiang et al. (2000) [ | Unclear | ✓ | ✓ | |||||
*A specified value of t is not always required for the C-index: if the model assumes proportional hazards, the ranking of predicted risks, and hence the C-index, is the same at every time point
**Lowsky et al. (2012) [25] used the Integrated Brier Score with added inverse probability of censoring weights. This is referred to as the IPEC in the paper
***Omurlu et al. (2009) [24] were unclear in what the estimands were for their simulation study
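To make the C-index footnote concrete, the following is a minimal sketch of Harrell's concordance index for right-censored data. A pair is treated as comparable when the subject with the shorter observed time experienced the event; tied risk scores count as half. The function name `harrell_c_index` is an assumption, and the reviewed articles may have used other C-index variants.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: proportion of comparable pairs ranked correctly.

    times:       observed times (failure or censoring)
    events:      1 if the failure was observed, 0 if censored
    risk_scores: higher score means higher predicted risk (shorter survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue                   # the earlier time must be an observed event
        for j in range(n):
            if times[i] < times[j]:    # subject i failed before time j
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

Because the index depends only on the ordering of the risk scores, any strictly increasing transformation of them leaves it unchanged. Under proportional hazards the ordering of predicted survival probabilities is the same at every t, which is why no time point needs to be specified.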