Simon Bussy, Raphaël Veil, Vincent Looten, Anita Burgun, Stéphane Gaïffas, Agathe Guilloux, Brigitte Ranque, Anne-Sophie Jannot
Abstract
BACKGROUND: Choosing the best-performing method for outcome prediction or variable selection is a recurring problem in prognosis studies, and it has led to many publications comparing methods. Some aspects, however, have received little attention. First, most comparison studies treat prediction performance and variable selection separately. Second, methods are compared either within a binary outcome setting (where the goal is to predict whether readmission will occur within an arbitrarily chosen delay) or within a survival analysis setting (where the outcomes are the censored times themselves), but not both. In this paper, we propose a comparison methodology to weigh up these settings in terms of both prediction and variable selection, while incorporating advanced machine learning strategies.
Keywords: High-dimensional prediction; Hospital readmission risk; Machine learning methods; Sickle-cell disease; Survival analysis
Year: 2019 PMID: 30841867 PMCID: PMC6404305 DOI: 10.1186/s12874-019-0673-4
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Illustration of different situations when dealing with censored data that cannot be labeled when using a threshold ε. The censoring indicator is equal to 1 if Y is censored and 0 otherwise. In the binary outcome setting, patient 4 would be excluded.
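The labeling rule described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `binary_label`, the treatment of the boundary case (a time exactly equal to ε counts as "within" the delay), and the four example patients are all assumptions made for the sketch.

```python
from typing import List, Optional, Tuple


def binary_label(y: float, censored: int, epsilon: float) -> Optional[int]:
    """Turn a (possibly censored) time into a binary label for threshold epsilon.

    censored follows the caption's convention: 1 if y is censored, 0 otherwise.
    Returns 1 if readmission is observed within epsilon, 0 if the patient is
    known to be readmission-free beyond epsilon, and None if no label can be
    assigned (censored before epsilon).
    """
    if y > epsilon:
        # Follow-up exceeds the threshold, so no readmission occurred within
        # epsilon, whether or not the time is censored.
        return 0
    if censored == 0:
        # Readmission observed at or before the threshold (boundary choice
        # is an assumption of this sketch).
        return 1
    # Censored at or before the threshold: the outcome within epsilon is unknown.
    return None


# Hypothetical patients as (time in days, censoring indicator).
patients: List[Tuple[float, int]] = [(12.0, 0), (45.0, 0), (60.0, 1), (10.0, 1)]
labels = [binary_label(y, c, epsilon=30.0) for y, c in patients]

# Patients without a label are dropped from the binary outcome setting.
kept = [(p, lab) for p, lab in zip(patients, labels) if lab is not None]
```

In this made-up sample the last patient is censored before the 30-day threshold and therefore receives no label, which is the kind of case the caption describes as excluded.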
Comparison of prediction performances in the two considered settings, with best results in bold
| Setting | Metric | Model | Score |
|---|---|---|---|
| Survival analysis | C-index | CURE | 0.718 |
| | | Cox PH | 0.725 |
| | | C-mix | |
| Binary outcome | AUC | SVM | 0.524 |
| | | GB | 0.561 |
| | | LR | 0.616 |
| | | NN | 0.707 |
| | | RF | 0.738 |
| | | | 0.831 |
| | | | 0.855 |
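For readers unfamiliar with the C-index reported in the survival analysis rows, the sketch below shows one common way to compute it, Harrell's concordance estimator. The record does not state which variant the authors used, so this is an assumption; the function name `c_index` and the toy data are hypothetical as well. The censoring indicator uses the same convention as Fig. 1 (1 if censored).

```python
from typing import Sequence


def c_index(times: Sequence[float], censored: Sequence[int], risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when subject i has an observed event and
    times[i] < times[j]. The pair is concordant when the subject with the
    earlier event also has the higher predicted risk; ties in risk count 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if censored[i]:
            # A censored time cannot serve as the earlier event of a pair.
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: four subjects, the second one censored.
toy_times = [5.0, 10.0, 15.0, 20.0]
toy_censored = [0, 1, 0, 0]
toy_risk = [0.9, 0.4, 0.05, 0.1]
toy_score = c_index(toy_times, toy_censored, toy_risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, the same scale as the AUC used in the binary outcome rows.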
Fig. 2 Estimated survival curves per subgroup (blue for low risk and red for high risk) with the corresponding 95% confidence bands
Fig. 3 Comparison of the tests based on the C-mix groups, on the ε=30 days relative groups and on survival times. We arbitrarily show only the tests with corresponding p-values below the level α=5%, with the classical Bonferroni multiple-testing correction [3]
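The Bonferroni filtering mentioned in the Fig. 3 caption amounts to comparing each p-value with α divided by the number of tests. The sketch below illustrates that rule only; the function name `bonferroni_significant`, the covariate names, and the p-values are hypothetical and do not come from the study.

```python
from typing import Dict


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, float]:
    """Keep the tests whose p-value is below alpha / m, with m the number of tests."""
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m  # Bonferroni-corrected per-test level
    return {name: p for name, p in p_values.items() if p < threshold}


# Hypothetical p-values for four covariates (illustrative only).
example_p = {
    "covariate_a": 0.0004,
    "covariate_b": 0.02,
    "covariate_c": 0.009,
    "covariate_d": 0.6,
}
shown = bonferroni_significant(example_p, alpha=0.05)
```

With four tests the corrected level is 0.05 / 4 = 0.0125, so `covariate_b` would pass an uncorrected 5% test but is filtered out after correction.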
Fig. 6 Boxplot comparison of covariates between the most significant C-mix groups
Fig. 4 Comparison of the top-20 covariate importances, ordered by the C-mix estimates. Note that some time-dependent covariates, such as the average kinetics during the last 48 hours of the stay (slope) or Gaussian process kernel parameters, appear to have high importance
Fig. 5 Pearson correlation matrix comparing covariate selection similarities between methods. Red indicates high correlation