Georgios Kantidakis, Audinga-Dea Hazewinkel, Marta Fiocco.
Abstract
Survival analysis deals with the expected duration of time until one or more events of interest occur. Time to the event of interest may be unobserved, a phenomenon commonly known as right censoring, which renders the analysis of these data challenging. Over the years, machine learning algorithms have been developed and adapted to right-censored data. Neural networks have been repeatedly employed to build clinical prediction models in healthcare, with a focus on cancer and cardiology. We present the first attempt at a large-scale review of survival neural networks (SNNs) with prognostic factors for clinical prediction in medicine. This work provides a comprehensive understanding of the literature (24 studies from 1990 to August 2021, global search in PubMed). Relevant manuscripts are classified as methodological/technical (novel methodology or new theoretical model; 13 studies) or applications (11 studies). We investigate how researchers have used neural networks to fit survival data for prediction. There are two methodological trends: either time is added as part of the input features and a single output node is specified, or multiple output nodes are defined, one for each time interval. A critical appraisal of model aspects that should be designed and reported more carefully is performed. We identify key characteristics of prediction models (i.e., number of patients/predictors, evaluation measures, calibration) and compare the predictive performance of artificial neural networks (ANNs) with that of the Cox proportional hazards model. The median sample size is 920 patients, and the median number of predictors is 7. Major findings include poor reporting (e.g., regarding missing data, hyperparameters) as well as inaccurate model development/validation. Calibration is neglected in more than half of the studies. Cox models are not developed to their full potential, and claims for the performance of SNNs are exaggerated. Light is shed on the current state of the art of SNNs in medicine with prognostic factors. Recommendations are made for the reporting of clinical prediction models. Limitations are discussed, and future directions are proposed for researchers who seek to develop existing methodology.
Year: 2022 PMID: 36238497 PMCID: PMC9553343 DOI: 10.1155/2022/1176060
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.809
Figure 1. Preferred reporting items for systematic reviews and meta-analyses (PRISMA) flowchart. Reasons for exclusion of the 214 studies in screening step 1 were: ML techniques for classification (n = 98), predictions based on individuals' images (n = 25), models with focus on feature selection (n = 18), bioinformatics/computational biology analysis only (n = 15), other ML techniques for survival analysis (n = 15), unsupervised learning (n = 10), and other reasons (n = 33), including ML techniques for risk group stratification (n = 6), systematic/literature review (n = 6), new prediction tool (n = 5), ML techniques for regression (n = 4), ensemble of different ML techniques (n = 3), no prediction (n = 3), letter to the editor (n = 2), model for nonhumans (n = 2), models with focus on feature reduction (n = 1), and tutorial/case study (n = 1).
Notations used in this review.
| Notation | Description |
|---|---|
| | Survival time |
| | Maximum follow-up time (in years) |
| | Conditional survival probability in (output) unit |
| | Conditional event probability in (output) unit |
| | Output unit |
| | Connection weight matrix |
| | Vector of regression coefficients |
| | Covariate matrix |
| | Vector of |
| | Observed outcome of individual |
| | Activation function for the hidden layer |
| | Activation function for the output layer |
| | Bias unit (node) |
| | Error (loss) function for the ANN |
| | Event indicator of individual |
| | Probability that patient |
| | Cumulative event probability in (output) unit |
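The conditional and cumulative quantities listed above are linked by the standard discrete-time survival identity, sketched below. The symbols are assumptions chosen for illustration, since the notation column is not preserved in this record: $h_j(\mathbf{x})$ stands for the conditional event probability in output unit (time interval) $j$, $1 - h_j(\mathbf{x})$ for the conditional survival probability, $S$ for the survival probability, and $F$ for the cumulative event probability.

```latex
S(t_l \mid \mathbf{x}) = \prod_{j=1}^{l} \bigl(1 - h_j(\mathbf{x})\bigr),
\qquad
F(t_l \mid \mathbf{x}) = 1 - S(t_l \mid \mathbf{x})
```

For example, conditional event probabilities of 0.10, 0.20, and 0.25 in three consecutive intervals give $S(t_3 \mid \mathbf{x}) = 0.90 \times 0.80 \times 0.75 = 0.54$ and $F(t_3 \mid \mathbf{x}) = 0.46$.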
Figure 2. Two basic architectures of survival neural networks. (a) A network where time (interval) is coded as a prognostic variable (input feature). Data transformation into a long format is required for each patient. The output layer makes predictions in a given time interval. (b) A network where time (interval) is not coded as part of the prognostic variables. The wide data format is adequate for each patient. The output layer makes predictions at multiple sequential (nonoverlapping) time intervals.
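The long-format transformation required by architecture (a) can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed studies: the function name `wide_to_long`, the dictionary keys (`id`, `time`, `event`, `covariates`), and the equal-width intervals are all hypothetical choices. It also assumes that a patient censored inside an interval still contributes a row with target 0 for that interval; other conventions exist.

```python
from typing import Dict, List


def wide_to_long(patients: List[Dict], n_intervals: int,
                 interval_length: float = 1.0) -> List[Dict]:
    """Expand one row per patient into one row per patient-interval at risk.

    Each long-format row carries the covariates, the interval index (the
    extra input feature), and a binary target: 1 if the event occurred in
    that interval, 0 otherwise.
    """
    long_rows = []
    for p in patients:
        for k in range(1, n_intervals + 1):
            start = (k - 1) * interval_length
            end = k * interval_length
            if p["time"] <= start:
                break  # follow-up ended before this interval began
            # The event falls in interval k if it was observed by the interval end.
            event_here = p["event"] == 1 and p["time"] <= end
            long_rows.append({
                "id": p["id"],
                "interval": k,
                **p["covariates"],
                "target": int(event_here),
            })
            if event_here:
                break  # no rows after the event
    return long_rows
```

Under these assumptions, a patient with an event at 2.5 years contributes three rows (targets 0, 0, 1), and a patient censored at 1.2 years contributes two rows (targets 0, 0).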
Figure 3. Visualization of the PLANN by Biganzoli et al. [20].
Figure 4. A schematic representation of the SNN by Han et al., adapted from [33], built for 242 patients with synovial sarcoma. Here, ANN means artificial neural network, x is the set of 9 clinical features, p_{t−1} is the survival probability of the previous year t − 1, sequentially updated (10th input feature), h(x) is the predicted survival risk (alive/death probability), and p_t is the predicted survival probability for the following year t.
General characteristics for the 24 studies. If multiple outcomes were predicted, multiple lines were used in the extraction sheet. Maximum number of lines was 34 (10 studies used multiple outcomes). For simulation studies, the number of predictors and percentage of events were not considered, unless they were fixed (e.g., not varied across simulations).
| | Min | 1st quartile | Median | 3rd quartile | Max | Excel lines |
|---|---|---|---|---|---|---|
| Total sample size | 96 | 242 | 920 | 1616 | 361239 | 33 |
| # of predictors | 1 | 5 | 7 | 25.75 | 97 | 32 |
| % of events | 6.60 | 21.32 | 29.25 | 47.58 | 97.90 | 20 |
The performance measures used for model validation across the 24 studies.
| Performance criterion | N (%) |
|---|---|
| C-index | 7 (29.2%) |
| AUC | 5 (20.8%) |
| Log-likelihood | 3 (12.5%) |
| Accuracy | 2 (8.3%) |
| Global Chi-squared statistic of Cox regression | 2 (8.3%) |
| Brier score | 1 (4.2%) |
| Comparison of predicted probabilities with Kaplan-Meier | 1 (4.2%) |
| Integrated Brier score (IBS) | 1 (4.2%) |
| Mean absolute error (MAE) | 1 (4.2%) |
| McNemar's test | 1 (4.2%) |
| Mean squared error (MSE) | 1 (4.2%) |
| Prognostic risk group discrimination | 1 (4.2%) |
| Sensitivity | 1 (4.2%) |
| Separation of cases into good and bad prognosis | 1 (4.2%) |
| Specificity | 1 (4.2%) |
| Survival curves comparison with log-rank test | 1 (4.2%) |
| Time-dependent C-index | 1 (4.2%) |
| Wilcoxon test (separation of cases into good and bad prognosis) | 1 (4.2%) |
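The C-index is the most frequently reported criterion in the table above. As a hedged illustration of what it measures with right-censored data, the following pure-Python sketch computes Harrell's concordance index. The function name `harrell_c_index` and its argument names are hypothetical, and the sketch makes two simplifying assumptions: pairs with tied survival times are skipped, and tied risk scores count as half concordant.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk_scores: Sequence[float]) -> float:
    """Fraction of comparable pairs in which the higher-risk patient fails first.

    A pair (i, j) is comparable when patient i has an observed event and a
    strictly shorter time than patient j (who may be censored later).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier failure had the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect discrimination. The index says nothing about calibration, which has to be assessed separately.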
Summary of the findings from the critical appraisal across the 24 manuscripts.
| Finding |
|---|
| Unclear addressing of missing data (42.9%) or ad-hoc methods (23.8%) |
| Unclear reporting of hyperparameters (62.5%) |
| Unclear reporting of the performance criterion for model development (25.0%) |
| Unclear scaling of prognostic factors (41.7%) |
| Unclear programming language for SNNs (29.2%) |
| Large variability and improper performance measures for survival data |
| External validation for only 4 outcomes (11.8%) |
| No confidence intervals for the predictive measures (54.2%) |
| No calibration plots (54.2%) |
| No interactions in Cox regression or unclear reporting (89.5%) |