Søren Wengel Mogensen, Anne H Petersen, Ann-Sophie Buchardt, Niels Richard Hansen.
Abstract
Survival prognosis is challenging, and accurate prediction of individual survival times is often very difficult. Better statistical methodology and more data can help improve prognostic models, but it is important that methods and data usage are evaluated properly. The Prostate Cancer DREAM Challenge offered a framework for training and blinded validation of prognostic models using a large and rich dataset on patients diagnosed with metastatic castrate-resistant prostate cancer. Using the Prostate Cancer DREAM Challenge data, we investigated and compared an array of methods combining techniques for imputing missing values of prognostic variables with tree-based and lasso-based variable selection and model fitting methods. The benchmark metric was integrated AUC (iAUC), and all methods were benchmarked using cross-validation on the training data as well as via the blinded validation. We found that survival forests without prior variable selection achieved the best overall performance (cv-iAUC = 0.70, validation-iAUC = 0.78), while a generalized additive model was best among the methods that used explicit prior variable selection (cv-iAUC = 0.69, validation-iAUC = 0.76). Our findings largely concurred with previous results in terms of the choice of important prognostic variables, though we did not find the level of prostate-specific antigen to have prognostic value given the other variables included in the data.
Keywords: generalized additive models; imputation; lasso; stability selection; survival forests; survival prognostic models
Year: 2016 PMID: 28105311 PMCID: PMC5200946 DOI: 10.12688/f1000research.8427.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Number of patients and registered deaths from the three clinical trials in the training data.
| Trial | Nr. of patients | Nr. of deaths |
|---|---|---|
| ASCENT-2 | 476 | 138 |
| MAINSAIL | 526 | 92 |
| VENICE | 598 | 433 |
| Total | 1600 | 663 |
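As a reading aid for the table above, the crude proportion of registered deaths in each trial can be computed directly from the counts. This is a minimal sketch that uses only the numbers in the table; the variable names are illustrative, and the proportions ignore follow-up time and censoring, so they are not survival estimates.

```python
# Patients and registered deaths per trial, copied from the table above.
counts = {
    "ASCENT-2": (476, 138),
    "MAINSAIL": (526, 92),
    "VENICE": (598, 433),
}

total_patients = sum(n for n, _ in counts.values())
total_deaths = sum(d for _, d in counts.values())

# Crude death proportion = deaths / patients (ignores censoring and follow-up).
proportions = {trial: d / n for trial, (n, d) in counts.items()}
proportions["Total"] = total_deaths / total_patients

for trial, p in proportions.items():
    print(f"{trial}: {p:.3f}")
```

The large spread between trials in these crude proportions suggests that the amount of censoring differs considerably across the three training trials.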
Supplementary Figure 1. Correlation plot for all binary predictors.
Number of patients stratified according to age group and trial.
| Age group | ASC.-2 | MAINSAIL | VENICE | ENT.-33 | Total |
|---|---|---|---|---|---|
| 18–64 | 111 | 171 | 219 | 111 | 612 |
| 65–74 | 211 | 246 | 254 | 141 | 852 |
| ≥75 | 154 | 109 | 125 | 61 | 449 |
| Total | 476 | 526 | 598 | 313 | 1913 |
Number of patients stratified according to geographic region and trial.
| Region | ASC.-2 | MAINSAIL | VENICE | ENT.-33 | Total |
|---|---|---|---|---|---|
| W. Europe | 0 | 247 | 212 | 104 | 563 |
| N. America | 0 | 139 | 80 | 61 | 280 |
| E. Europe | 0 | 84 | 127 | 50 | 261 |
| S. America | 0 | 0 | 86 | 38 | 124 |
| Asia/Pacific | 0 | 0 | 0 | 47 | 47 |
| Africa | 0 | 0 | 0 | 13 | 13 |
| Other | 0 | 50 | 93 | 0 | 143 |
| Missing | 476 | 6 | 0 | 0 | 482 |
| Total | 476 | 526 | 598 | 313 | 1913 |
Number of patients stratified according to race and trial.
| Race | ASC.-2 | MAINSAIL | VENICE | ENT.-33 | Total |
|---|---|---|---|---|---|
| White | 419 | 433 | 538 | 225 | 1615 |
| Asian | 5 | 0 | 36 | 49 | 90 |
| Black | 32 | 25 | 17 | 12 | 86 |
| Hispanic | 14 | 0 | 0 | 0 | 14 |
| Other | 6 | 13 | 7 | 27 | 53 |
| Missing | 0 | 55 | 0 | 0 | 55 |
| Total | 476 | 526 | 598 | 313 | 1913 |
Descriptions of the 93 predictor variables included in the modeling.
Variables selected by stability selection under any of the imputation methods are marked with a *.
| Variable name | Type | Description | Missing | Missing in |
|---|---|---|---|---|
| AGEGRP2 | Cat. (ord) | Age Group (3 categories) | 0 | 0 |
| RACE_C* | Cat. | Race (5 categories) | 55 | 0 |
| REGION_C | Cat. | Region of the World (5 categories) | 482 | 0 |
| ECOG_C* | Cat. (ord) | Baseline Patient Performance Status | 1 | 0 |
| NON_TARGET | Binary | Baseline Non-Target Lesion(s) | 0 | 0 |
| TARGET | Binary | Baseline Target Lesion(s) | 0 | 0 |
| BONE | Binary | Baseline Bone Lesion(s) | 0 | 0 |
| RECTAL | Binary | Baseline Rectal Lesion(s) | 0 | 0 |
| LYMPH_NODES | Binary | Baseline Lymph Node Lesion(s) | 0 | 0 |
| KIDNEYS | Binary | Baseline Kidney Lesion(s) | 0 | 0 |
| LUNGS | Binary | Baseline Lung Lesion(s) | 0 | 0 |
| LIVER* | Binary | Baseline Liver Lesion(s) | 0 | 0 |
| PLEURA | Binary | Baseline Pleura Lesion(s) | 0 | 0 |
| OTHER | Binary | Baseline Other Lesion(s) | 0 | 0 |
| PROSTATE | Binary | Baseline Prostate Lesion(s) | 0 | 0 |
| ADRENAL* | Binary | Baseline Adrenal Lesion(s) | 0 | 0 |
| BLADDER | Binary | Baseline Bladder Lesion(s) | 0 | 0 |
| PERITONEUM | Binary | Baseline Peritoneum Lesion(s) | 0 | 0 |
| COLON | Binary | Baseline Colon Lesion(s) | 0 | 0 |
| SOFT_TISSUE | Binary | Baseline Soft Tissue Lesion(s) | 0 | 0 |
| ORCHIDECTOMY | Binary | Prior Orchidectomy (includes bilateral) | 0 | 0 |
| PROSTATECTOMY | Binary | Prior Prostatectomy | 0 | 0 |
| TURP | Binary | Prior TURP | 0 | 0 |
| LYMPHADENECTOMY | Binary | Prior Bilateral Lymphadenectomy | 0 | 0 |
| SPINAL_CORD_SURGERY | Binary | Prior Spinal Cord Surgery | 0 | 0 |
| BILATERAL_ORCHIDECTOMY | Binary | Prior Bilateral Orchidectomy | 0 | 0 |
| PRIOR_RADIOTHERAPY | Binary | Prior Radiotherapy | 0 | 0 |
| ANALGESICS* | Binary | Prior analgesics | 0 | 0 |
| ANTI_ANDROGENS | Binary | Prior Anti-Androgens | 0 | 0 |
| GLUCOCORTICOID | Binary | Prior Glucocorticoids | 0 | 0 |
| GONADOTROPIN | Binary | Prior Gonadotropin | 0 | 0 |
| BISPHOSPHONATE | Binary | Prior Bisphosphonate | 0 | 0 |
| CORTICOSTEROID | Binary | Prior Corticosteroid | 0 | 0 |
| IMIDAZOLE | Binary | Prior Imidazole | 0 | 0 |
| ACE_INHIBITORS | Binary | Prior ACE Inhibitors | 0 | 0 |
| BETA_BLOCKING | Binary | Prior Beta Blocking Agents | 0 | 0 |
| HMG_COA_REDUCT | Binary | Prior HMG COA Reductase | 0 | 0 |
| ESTROGENS | Binary | Prior Estrogens | 0 | 0 |
| ANTI_ESTROGENS | Binary | Prior Anti-Estrogens | 0 | 0 |
| CEREBACC | Binary | Cerebrovascular accident | 0 | 0 |
| CHF | Binary | Congestive heart failure | 0 | 0 |
| DVT | Binary | Deep venous thrombosis | 0 | 0 |
| DIAB | Binary | Diabetes | 0 | 0 |
| GASTREFL | Binary | Gastroesophageal reflux disease | 0 | 0 |
| GIBLEED | Binary | Gastrointestinal bleed | 0 | 0 |
| MI | Binary | Myocardial infarction | 0 | 0 |
| PUD | Binary | Peptic ulcer disease | 0 | 0 |
| PULMEMB | Binary | Pulmonary embolism | 0 | 0 |
| PATHFRAC | Binary | Pathological bone fractures | 0 | 0 |
| SPINCOMP | Binary | Spinal cord compression | 0 | 0 |
| COPD | Binary | Chronic obstructive pulmonary disease | 0 | 0 |
| MHBLOOD | Binary | Blood & lymphatic system | 0 | 0 |
| MHCARD | Binary | Cardiac disorders | 0 | 0 |
| MHCONGEN | Binary | Congenital, familial & genetic | 0 | 0 |
| MHEAR | Binary | Ear & labyrinth | 0 | 0 |
| MHENDO | Binary | Endocrine disorders | 0 | 0 |
| MHEYE | Binary | Eye disorders | 0 | 0 |
| MHGASTRO | Binary | Gastrointestinal disorders | 0 | 0 |
| MHGEN | Binary | Gen. disord & admin site | 0 | 0 |
| MHHEPATO | Binary | Hepatobiliary disorders | 0 | 0 |
| MHIMMUNE | Binary | Immune system disorders | 0 | 0 |
| MHINFECT | Binary | Infections & infestations | 0 | 0 |
| MHINJURY | Binary | Injury, poison & procedural | 0 | 0 |
| MHINVEST | Binary | Investigations | 0 | 0 |
| MHMETAB | Binary | Metabolism & nutrition | 0 | 0 |
| MHMUSCLE | Binary | Musc./skeletal & connect tissue | 0 | 0 |
| MHNERV | Binary | Nervous system disorders | 0 | 0 |
| MHPSYCH | Binary | Psychiatric disorders | 0 | 0 |
| MHRENAL | Binary | Renal & urinary disorders | 0 | 0 |
| MHRESP | Binary | Resp., thoracic & mediastinal | 0 | 0 |
| MHSKIN | Binary | Skin & subcutaneous tissue | 0 | 0 |
| MHSOCIAL | Binary | Social circumstances | 0 | 0 |
| MHSURG | Binary | Surgical & medical procedures | 0 | 0 |
| MHVASC | Binary | Vascular disorders | 0 | 0 |
| BMI* | Numerical | Baseline body mass index (kg/m^2) | 10 | 1 |
| ALP* | Numerical | Alkaline phosphatase u/l | 5 | 2 |
| ALT | Numerical | Alanine transaminase u/l | 5 | 3 |
| AST* | Numerical | Aspartate aminotransferase u/l | 13 | 3 |
| CA | Numerical | Calcium mmol/l | 11 | 2 |
| CREAT | Numerical | Creatinine umol/l | 3 | 2 |
| HB* | Numerical | Hemoglobin g/dl | 13 | 4 |
| LDH | Numerical | Lactate dehydrogenase u/l | 610 | 4 |
| NEU | Numerical | Neutrophils 10^9/l | 21 | 4 |
| PLT | Numerical | Platelet count 10^9/l | 17 | 10 |
| PSA | Numerical | Prostate specific antigen ng/ml | 11 | 10 |
| TBILI | Numerical | Total bilirubin umol/l | 23 | 2 |
| TESTO | Numerical | Testosterone nmol/l | 855 | 2 |
| WBC | Numerical | White blood cells 10^9/l | 13 | 4 |
| NA. | Numerical | Sodium mmol/l | 481 | 2 |
| MG | Numerical | Magnesium mmol/l | 510 | 2 |
| PHOS | Numerical | Phosphorus mmol/l | 504 | 2 |
| ALB* | Numerical | Albumin g/l | 493 | 2 |
| TPRO | Numerical | Total protein g/l | 504 | 2 |
Figure 1. Correlation plot (left) for all binary predictors.
See Supplementary Figure 1 for the correlation plot with labels. Correlations (right, below the diagonal) and pairwise associations as given by loess scatter plot smoothers (right, above the diagonal) for the numerical predictors.
The eight combinations of variable selection and survival model fitting methods that were used.
Each of the eight combinations was applied with each of the three imputation methods: MCAR, MAR and MARwR.
| Method | All variables | Stab. selected |
|---|---|---|
| Lasso | ✓ | |
| Debias. | ✓ | |
| Cox | | ✓ |
| Gam | | ✓ |
| Forest | ✓ | ✓ |
| Boosting | ✓ | ✓ |
Figure 2. Integrated AUC for different combinations of methods evaluated by three replications of 5-fold cross-validation.
Results are shown for individual folds (light blue filled circles) and averaged over all folds (red filled circles). The figure also shows iAUC on the validation data (purple filled squares) and iAUC for the reference model on the validation data (purple dashed line). The four methods marked with a * used variables chosen via stability selection, whereas the other four methods relied on implicit variable selection.
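The evaluation scheme in this caption, three replications of 5-fold cross-validation, yields 15 train/test splits per method. The sketch below shows one way such splits could be generated. It is only an illustration: the function name `repeated_kfold`, the seed, and the use of plain unstratified random partitions are assumptions, since the caption does not say how the folds were drawn.

```python
import random

def repeated_kfold(n_samples, n_folds=5, n_replications=3, seed=0):
    """Yield (replication, fold, train_idx, test_idx) for repeated k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_replications):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh random partition for each replication
        # Deal the shuffled indices into n_folds nearly equal folds.
        folds = [indices[k::n_folds] for k in range(n_folds)]
        for k, test_idx in enumerate(folds):
            # Training set = all folds except the k-th.
            train_idx = [i for j, f in enumerate(folds) if j != k for i in f]
            yield rep, k, train_idx, test_idx

# 1600 is the number of training patients reported in the first table.
splits = list(repeated_kfold(1600))
print(len(splits))  # number of (train, test) splits
```

In such a scheme, a model would be fitted on each `train_idx`, iAUC computed on the matching `test_idx`, and the per-fold values averaged to obtain the cross-validated iAUC.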
Figure 3. Selection proportions for the 20 most stably selected variables stratified by imputation method.
The threshold of 50% (red line) was used for the final variable selection.
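The selection proportions and the 50% threshold described here can be illustrated with a generic sketch of stability selection: a variable selection method is run on many random subsamples, and variables selected in at least half of them are retained. Everything named below is hypothetical. In particular, `toy_select` is a simple correlation-based stand-in for a real selector such as the lasso, the data are simulated, and half-size subsamples and a "greater than or equal" threshold rule are assumptions.

```python
import random

def selection_proportions(n_samples, n_vars, select, n_subsamples=100, seed=0):
    """Fraction of random half-subsamples in which each variable is selected.

    `select(rows)` is a hypothetical callback that returns the set of variable
    indices chosen when the selection method is run on the given row indices.
    """
    rng = random.Random(seed)
    counts = [0] * n_vars
    for _ in range(n_subsamples):
        rows = rng.sample(range(n_samples), n_samples // 2)
        for j in select(rows):
            counts[j] += 1
    return [c / n_subsamples for c in counts]

def correlation(xs, ys):
    """Pearson correlation of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Simulated toy data: variable 0 drives the response, variables 1 and 2 are noise.
data_rng = random.Random(1)
n = 200
X = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(n)]
y = [row[0] + 0.1 * data_rng.gauss(0, 1) for row in X]

def toy_select(rows):
    # Stand-in selector: keep variables with |correlation| > 0.3 on the subsample.
    ys = [y[i] for i in rows]
    return {j for j in range(3)
            if abs(correlation([X[i][j] for i in rows], ys)) > 0.3}

props = selection_proportions(n, 3, toy_select)
stable = [j for j, p in enumerate(props) if p >= 0.5]  # 50% threshold
print(props, stable)
```

Stratifying by imputation method, as in the figure, would amount to repeating this computation once per imputed dataset.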