Maud Tournoud, Audrey Larue, Marie-Angelique Cazalis, Fabienne Venet, Alexandre Pachot, Guillaume Monneret, Alain Lepape, Jean-Baptiste Veyrieras.
Abstract
BACKGROUND: Construction and validation of a prognostic model for survival data in the clinical domain is still an active field of research. Nevertheless, there is no consensus on how to develop routine prognostic tests based on a combination of RT-qPCR biomarkers and clinical or demographic variables. In particular, estimating model performance requires properly accounting for the RT-qPCR experimental design.
Year: 2015 PMID: 25880752 PMCID: PMC4384357 DOI: 10.1186/s12859-015-0537-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Summary of strategies to build prognostic survival models
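| Strategy | Variable selection | Functional relationship for continuous covariates | Survival model | Coefficient shrinkage |
|---|---|---|---|---|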
| Uni_Cox-[1-9]* | Each candidate covariate was selected a priori and included in its own univariate model. The suffix number in the model name identifies the selected covariate. | All candidate covariates are continuous, and a linear relationship is assumed. | Cox model | |
| bwAIC_FP_Cox-[1-5]* bwAIC_FP/C2-3Fac_Cox-[1-5]* | Backward elimination using the AIC criterion | Fractional polynomials model the functional relationship. The suffix in the model name corresponds to the degree of flexibility, controlled with the parameter | Cox model | |
| MFP_Cox-[1-15]* | MFP procedure for variable selection, controlled by the parameter select = 0.05, 0.10, 0.15. Larger select values correspond to less stringent variable selection. The suffix in the model name corresponds to a combination of select and alpha values. | Fractional polynomials model the functional relationship. The parameter | Cox model | |
| Lasso-[1-5]* Lasso_C2-3Fac-[1-5] | L1 penalty for variable selection, controlled by the parameter | Linear relationship, except for C2 and C3, which were dichotomized according to expert knowledge in the Lasso_C2-3Fac_Cox- models. | Lasso Cox model | L1 penalty |
| aLasso-[1-5]* aLasso_C2-3Fac-[1-5]* | L1 penalty for variable selection, controlled by the parameter | Linear relationship, except for C2 and C3, which were dichotomized according to expert knowledge in the aLasso_C2-3Fac_Cox- models. | Adaptive lasso Cox model | Adaptive lasso penalty for coefficient shrinkage (larger coefficients are shrunk less towards 0 in the adaptive lasso model than in the lasso model) |
| SCAD-[1-5]* SCAD_C2-3Fac-[1-5]* | L1 penalty for variable selection, controlled by the parameter | Linear relationship, except for C2 and C3, which were dichotomized according to expert knowledge in the SCAD_Lasso_C2-3Fac_Cox- models. | SCAD Cox model | SCAD penalty for coefficient shrinkage (larger coefficients are shrunk less towards 0 in the SCAD model than in the lasso model) |
| Lasso_Cox-[1-5]* Lasso_C2-3Fac_Cox-[1-5]* | L1 penalty for variable selection, controlled by the parameter | Linear relationship, except for C2 and C3, which were dichotomized according to expert knowledge in the Lasso_C2-3Fac_Cox- models. | Cox model | |
The first column gives the names of the tested strategies, which cover a wide range of state-of-the-art methods for both low- and high-dimensional settings. The second column details the variable selection method; the third, the functional relationship for continuous covariates; the fourth, the survival model; and the last, the coefficient shrinkage strategy, if any. The * suffix stands for the index of the prognostic model within the strategy. For example, Uni_Cox-[1-9]* means that 9 univariate Cox models were built, suffixed with the digits 1 to 9.
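The shrinkage column can be illustrated in the simplest possible setting. This is a single coefficient with an orthonormal design under squared-error loss, where each penalty has a closed-form thresholding rule. The sketch is a hypothetical illustration, not the Cox partial-likelihood fits used in the study. It assumes the textbook soft-thresholding rule for the lasso, adaptive-lasso weights w = 1/|β̃|^γ computed from an initial estimate β̃, and the SCAD thresholding rule of Fan and Li with their suggested a = 3.7. The function names and the values of λ (`lam`), γ and a are illustrative choices.

```python
import math

# Hypothetical illustration: one coefficient, orthonormal design,
# squared-error loss. NOT the Cox partial-likelihood fits of the study.

def soft_threshold(z: float, t: float) -> float:
    """Shrink z towards 0 by t, setting it to 0 when |z| <= t."""
    return math.copysign(max(abs(z) - t, 0.0), z)

def lasso(z: float, lam: float) -> float:
    # Every nonzero coefficient is shrunk by the same amount lam.
    return soft_threshold(z, lam)

def adaptive_lasso(z: float, lam: float, z_init: float, gamma: float = 1.0) -> float:
    # ASSUMPTION: weights w = 1/|z_init|**gamma from an initial estimate.
    # A zero initial estimate means an infinite weight (coefficient dropped).
    if z_init == 0.0:
        return 0.0
    w = 1.0 / abs(z_init) ** gamma
    return soft_threshold(z, lam * w)

def scad(z: float, lam: float, a: float = 3.7) -> float:
    # ASSUMPTION: SCAD thresholding rule of Fan and Li with a = 3.7.
    az = abs(z)
    if az <= 2 * lam:
        return soft_threshold(z, lam)  # lasso-like near 0
    if az <= a * lam:
        # Linear transition between lasso-like and no shrinkage.
        return ((a - 1) * z - math.copysign(a * lam, z)) / (a - 2)
    return z  # large coefficients are left unshrunk

if __name__ == "__main__":
    for z in (0.5, 1.5, 3.0, 6.0):
        print(f"z={z:3.1f}  lasso={lasso(z, 1.0):.3f}  "
              f"adaptive={adaptive_lasso(z, 1.0, z_init=z):.3f}  "
              f"scad={scad(z, 1.0):.3f}")
```

Take λ = 1 and an unpenalized value of 6.0. The lasso shrinks it to 5.0. The adaptive lasso (with β̃ equal to the unpenalized value) shrinks it to about 5.83. SCAD leaves it at 6.0. This mirrors the qualitative behavior summarized in the last column of the table.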
Figure 1. Patient-level (A) vs. PCR batch-level (B) resampling strategies. The training dataset includes 5 batches (left of the figure). The figure shows an example of patient resampling for a given fold and a given iteration. In each batch, gene expression is measured for survivor (open circles) and non-survivor (filled circles) patients. In strategy A, samples are randomly drawn within batches to be included in the training-fold data. In strategy B, entire batches are selected and included in the training-fold data. The model is built on the training-fold data, and model performance is estimated on the test-fold data.
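The two resampling strategies differ only in the unit that is assigned to cross-validation folds. Below is a minimal sketch, assuming a hypothetical `patients` list of `(patient_id, batch_id)` pairs and hypothetical helper names. Strategy A is modeled as shuffling patients within each batch and dealing them round-robin into k folds. Strategy B assigns whole PCR batches to folds. The study's exact sampling scheme (for example, any stratification on survival status) may differ.

```python
import random
from collections import defaultdict

# Hypothetical input: list of (patient_id, batch_id) pairs.

def folds_patient_level(patients, k, seed=0):
    """Strategy A (sketch): patients are split into k folds within each batch."""
    rng = random.Random(seed)
    by_batch = defaultdict(list)
    for pid, batch in patients:
        by_batch[batch].append(pid)
    fold_of = {}
    for batch in sorted(by_batch):
        ids = list(by_batch[batch])
        rng.shuffle(ids)
        # Deal the shuffled patients of this batch round-robin over the folds.
        for i, pid in enumerate(ids):
            fold_of[pid] = i % k
    return fold_of

def folds_batch_level(patients, k, seed=0):
    """Strategy B (sketch): whole PCR batches are assigned to folds."""
    rng = random.Random(seed)
    batches = sorted({batch for _, batch in patients})
    rng.shuffle(batches)
    fold_of_batch = {batch: i % k for i, batch in enumerate(batches)}
    return {pid: fold_of_batch[batch] for pid, batch in patients}

def batches_per_fold(patients, fold_of):
    """Map each fold to the set of PCR batches present in its test data."""
    out = defaultdict(set)
    for pid, batch in patients:
        out[fold_of[pid]].add(batch)
    return dict(out)

if __name__ == "__main__":
    patients = [(f"P{b}_{j}", f"batch{b}") for b in range(1, 6) for j in range(6)]
    for name, folds in (("A", folds_patient_level(patients, 5)),
                        ("B", folds_batch_level(patients, 5))):
        counts = {f: len(s) for f, s in sorted(batches_per_fold(patients, folds).items())}
        print(name, counts)
```

The example uses 5 batches of 6 patients and k = 5. Under strategy A, every test fold mixes patients from all 5 batches. Under strategy B, each test fold is exactly one batch that was unseen during training. That is closer to applying a routine test to samples from a new PCR batch.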
Figure 2. For some models, resampling strategy A (patient-level sampling) tends to overestimate model performance compared with resampling strategy B (PCR batch-level sampling). Panel A presents the cross-validated AUC at day 7 for all prognostic models using resampling strategy A. The cross-validated AUC is estimated with the pooling method (y-axis) and the averaging method (x-axis); red dots correspond to AUC estimates based on the predicted survival and black dots to AUC estimates based on the linear predictor (see Methods section). Panel B presents the cross-validated AUC at day 7 for all prognostic models using resampling strategy B. Finally, panel C compares the cross-validated AUC estimated with strategy A (x-axis) vs. strategy B (y-axis), using the pooling method based on the linear predictor.
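The averaging and pooling methods differ in where the summary is taken: either compute one AUC per test fold and average, or pool all test-fold predictions and compute a single AUC. Below is a minimal sketch under stated assumptions. It uses a simple cumulative/dynamic AUC at day t, where cases died by day t, controls are still followed after day t, and patients censored before t are dropped. The study may instead handle censoring with inverse probability of censoring weights. `folds` is a hypothetical list holding, for each test fold, `(time, event, score)` triples, with higher scores meaning higher risk.

```python
from statistics import mean

# ASSUMPTION: simple cumulative/dynamic AUC at time t.
# Cases: died by t. Controls: still followed after t.
# Patients censored before t are dropped (no censoring weights).

def auc_at(t, data):
    """data: iterable of (time, event, score); higher score = higher risk."""
    data = list(data)
    cases = [s for time, event, s in data if time <= t and event]
    controls = [s for time, event, s in data if time > t]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Proportion of case/control pairs ranked correctly; ties count 1/2.
    total = 0.0
    for c in cases:
        for d in controls:
            total += 1.0 if c > d else (0.5 if c == d else 0.0)
    return total / (len(cases) * len(controls))

def auc_averaging(t, folds):
    """Compute the AUC within each test fold, then average across folds."""
    return mean(auc_at(t, fold) for fold in folds)

def auc_pooling(t, folds):
    """Pool the test-fold predictions of all folds, then compute one AUC."""
    return auc_at(t, [row for fold in folds for row in fold])

if __name__ == "__main__":
    fold1 = [(3, True, 2.0), (10, False, 1.0), (12, True, 0.5)]
    fold2 = [(5, True, 5.0), (20, False, 6.0), (15, False, 4.0)]
    print(auc_averaging(7, [fold1, fold2]), auc_pooling(7, [fold1, fold2]))
```

In this toy example, the per-fold AUCs at day 7 are 1.0 and 0.5, so averaging gives 0.75. Pooling gives 0.625 because the two folds' scores are not on a common scale. Under the proportional hazards form S(7 | x) = S0(7)^exp(η), the predicted survival at day 7 is a decreasing function of the linear predictor η within one fitted model. In this sketch, therefore, both scores give the same per-fold AUC. They can differ once folds with different baseline survival S0 are pooled.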
Figure 3. Performance of the top 30 models obtained with strategy B (PCR batch-level sampling). Each column corresponds to a prognostic survival model. The first 2 rows report the cross-validated AUC at day 7 and the cross-validated C-index (based on the linear predictor), respectively. Darker colors correspond to better performance. The next 11 rows correspond to the candidate covariates: G1 to G6, C1, C2, C3, plus C2_Fac and C3_Fac when clinical covariates 2 and 3 were dichotomized according to clinical expert knowledge. The number within each cell gives the percentage of cross-validation iterations in which the model selected that variable. Darker colors correspond to a higher selection frequency.
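The two kinds of numbers in this figure can be sketched as follows. The C-index measures how often, among comparable patient pairs, the patient with the higher linear predictor dies first. The selection percentage counts how often each covariate is retained across cross-validation iterations. This is a minimal sketch of Harrell's estimator. It assumes a higher linear predictor means higher risk, counts ties in the linear predictor as 1/2, and skips pairs with tied times. The exact C-index variant used in the study is not stated here. `selected_sets` is a hypothetical list with one set of selected variable names per iteration.

```python
def harrell_c_index(times, events, lp):
    """Harrell's C: share of comparable pairs where the higher linear predictor fails first.

    A pair (i, j) is comparable when t_i < t_j and patient i had the event.
    Ties in lp count 1/2; pairs with tied times are skipped (ASSUMPTION).
    """
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if not events[i]:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if lp[i] > lp[j]:
                    concordant += 1.0
                elif lp[i] == lp[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

def selection_percentage(selected_sets, candidates):
    """Percentage of cross-validation iterations in which each candidate was selected."""
    n = len(selected_sets)
    return {v: 100.0 * sum(v in s for s in selected_sets) / n for v in candidates}

if __name__ == "__main__":
    print(harrell_c_index([2, 4, 6, 8], [1, 1, 0, 1], [3.0, 1.0, 2.0, 0.0]))
    print(selection_percentage([{"G1", "C1"}, {"G1"}, {"G1", "G2"}, {"C1"}],
                               ["G1", "G2", "C1"]))
```

In the usage example, 5 pairs are comparable and 4 of them are concordant, giving a C-index of 0.8. G1 is selected in 3 of 4 iterations (75%).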
Figure 4. The Lasso_Cox-4 model offers the best compromise between performance and validation surprise. Panel A presents the cross-validated time-dependent AUC for the 4 candidate models using strategy B (PCR batch-level sampling). Panel B presents the cross-validated time-dependent "validation surprise", computed from the AUC, for the 4 candidate models using strategy B.
Figure 5. The omitted covariate C0 is associated with the time-dependent sensitivity and specificity (Panel A) and with the Martingale residuals (Panel B) of the selected model (Lasso_Cox-4). Panel A presents the association between the omitted covariate C0 and the time-dependent sensitivity and specificity at day 7. Each boxplot shows the odds ratio (OR) across all cross-validation iterations for a given cut-off on the linear predictor of the model (i.e., a given combination of sensitivity and specificity). Red points correspond to ORs with p-values < 0.05. The number above each boxplot gives the proportion of p-values < 0.05, and the numbers below it give the sensitivity and specificity for that cut-off. Panel B presents the scatter plot of the Martingale residuals against the C0 covariate.
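Panel B relies on Martingale residuals, r_i = δ_i − Ĥ0(t_i)·exp(η_i). These compare each patient's observed event indicator δ_i with the cumulative hazard the model assigns to that patient. A trend in these residuals against a covariate left out of the model, such as C0, suggests that the covariate carries prognostic information the model misses. Below is a minimal sketch. It assumes the linear predictors η_i are already fixed (for example, taken from a fitted model) and uses the Breslow estimator for the baseline cumulative hazard Ĥ0. The function name is hypothetical.

```python
import math

def martingale_residuals(times, events, lp):
    """Martingale residuals r_i = delta_i - H0(t_i) * exp(lp_i) for a Cox model.

    ASSUMPTION: lp (linear predictors) is fixed, and H0 is the Breslow
    estimator of the baseline cumulative hazard.
    """
    n = len(times)
    risk = [math.exp(x) for x in lp]
    event_times = sorted({times[i] for i in range(n) if events[i]})
    # Breslow jump at each distinct event time s: d(s) / sum of exp(lp) over the risk set.
    jumps = []
    for s in event_times:
        d = sum(1 for i in range(n) if events[i] and times[i] == s)
        at_risk = sum(risk[j] for j in range(n) if times[j] >= s)
        jumps.append((s, d / at_risk))
    residuals = []
    for i in range(n):
        h0 = sum(jump for s, jump in jumps if s <= times[i])
        residuals.append((1.0 if events[i] else 0.0) - h0 * risk[i])
    return residuals

if __name__ == "__main__":
    print(martingale_residuals([1, 2, 3, 4], [1, 0, 1, 1], [0.0, 0.0, 0.0, 0.0]))
```

With the Breslow estimator, these residuals sum to zero for any fixed linear predictor. Plotting them against a candidate covariate is therefore a check for leftover structure, not for overall calibration.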
Figure 6. The validation surprise observed with strategy B (PCR batch-level sampling) is smaller than that observed with strategy A (patient-level sampling). Results are shown for the Lasso_Cox-4 model: strategy A cross-validated time-dependent AUC (blue); strategy B cross-validated time-dependent AUC (red); validated time-dependent AUC on the test dataset (black); and bootstrap confidence intervals (grey polygon, covering 95% of the bootstrap sample distribution).
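The grey band is a bootstrap confidence interval for the AUC on the test dataset. Below is a minimal percentile-bootstrap sketch, with several assumptions. Test-set patients are resampled with replacement. On each resample, a simple cumulative/dynamic AUC at day t is recomputed: cases died by t, controls are still followed after t, and patients censored before t are dropped. The 2.5% and 97.5% quantiles of the resulting distribution form the 95% interval. The number of replicates, the seed and the helper names are hypothetical, and the study's exact bootstrap scheme is not stated here.

```python
import random

# ASSUMPTION: cumulative/dynamic AUC at time t; patients censored before t
# are dropped; the test set is resampled at the patient level.

def auc_at(t, data):
    """data: list of (time, event, score); higher score = higher risk."""
    cases = [s for time, event, s in data if time <= t and event]
    controls = [s for time, event, s in data if time > t]
    if not cases or not controls:
        return None  # AUC undefined for this (re)sample
    total = 0.0
    for c in cases:
        for d in controls:
            total += 1.0 if c > d else (0.5 if c == d else 0.0)
    return total / (len(cases) * len(controls))

def percentile(sorted_vals, q):
    """Quantile q of an ascending list, interpolating linearly between ranks."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def bootstrap_auc_ci(t, data, n_boot=1000, level=0.95, seed=1):
    """Percentile bootstrap interval for the AUC at time t."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # draw patients with replacement
        auc = auc_at(t, sample)
        if auc is not None:  # skip resamples without cases or controls
            stats.append(auc)
    if not stats:
        raise ValueError("AUC undefined on every bootstrap sample")
    stats.sort()
    tail = (1.0 - level) / 2.0
    return percentile(stats, tail), percentile(stats, 1.0 - tail)

if __name__ == "__main__":
    test_set = [(3, True, 0.9), (5, True, 0.7), (6, True, 0.4), (4, False, 0.8),
                (10, False, 0.2), (12, True, 0.5), (15, False, 0.1), (20, False, 0.3)]
    print(auc_at(7, test_set), bootstrap_auc_ci(7, test_set, n_boot=500))
```

Resamples in which the AUC is undefined (no cases or no controls) are skipped rather than counted. This keeps the interval within [0, 1], but the interval then summarizes only the informative resamples.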