Simon Klau, Vindi Jurinovic, Roman Hornung, Tobias Herold, Anne-Laure Boulesteix.
Abstract
BACKGROUND: The inclusion of high-dimensional omics data in prediction models has become a well-studied topic in recent decades. Most of the proposed methods, however, do not account for the possibility that the covariates in a dataset are of different types, even though in many scenarios the variables can be structured in blocks of different types, e.g., clinical, transcriptomic, and methylation data. To date, a few computationally intensive approaches exist that make use of block structures of this kind.
Keywords: Cox regression; Lasso; Multi-omics data; Penalized regression; Prediction model; Priority-lasso
Year: 2018; PMID: 30208855; PMCID: PMC6134797; DOI: 10.1186/s12859-018-2344-6
Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169
Variables selected by priority-Lasso in scenarios pl1A and pl1B
| Block | Variable | Coef. pl1A | Coef. pl1B |
|---|---|---|---|
| 1 | ELN2017_2 | 0.8552 | 0.8552 |
| | ELN2017_3 | 1.4324 | 1.4324 |
| 2 | Age | 0.3540 | 0.3556 |
| | ECOG (> 1) | 0.2794 | 0.2768 |
| | WBC | 0.1029 | 0.1019 |
| | LDH | 0.1744 | 0.1763 |
| | Hb | 0.0529 | 0.0532 |
| | PLT | -0.0788 | -0.0800 |
| 4 | PHGDH | 0.1242 | |
| | FAM171B | 0.0726 | |
| | SH3PXD2B | 0.0192 | |
| | F12 | 0.0097 | |
| | CD109 | 0.0599 | |
| | FAM92A1 | 0.0193 | |
| | LAPTM4B | 0.0079 | |
| | FAM24B | 0.0378 | |
| | DDIT4 | 0.0424 | |
| | DOCK1 | 0.0295 | |
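Column 1: priority of the block the variable is included in. Column 2: variable name. Columns 3 and 4: coefficients of the variable in the Cox Lasso models of scenarios pl1A and pl1B, respectively. An empty cell means the variable was not selected in that scenario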
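As a reading aid for the coefficients, the following minimal Python sketch converts Cox coefficients into hazard ratios, exp(coef), and sums them into a linear predictor for one patient. The pl1A values for blocks 1 and 2 are copied from the table above. The function names and the example patient are hypothetical, and the sketch assumes that covariates are coded on the same scale as during model fitting, which the table does not state.

```python
import math

# Coefficients of scenario pl1A for blocks 1 and 2, copied from the table.
COEF_PL1A = {
    "ELN2017_2": 0.8552,
    "ELN2017_3": 1.4324,
    "Age": 0.3540,
    "ECOG (> 1)": 0.2794,
    "WBC": 0.1029,
    "LDH": 0.1744,
    "Hb": 0.0529,
    "PLT": -0.0788,
}


def hazard_ratio(coef):
    """Hazard ratio for a one-unit increase of a covariate in a Cox model."""
    return math.exp(coef)


def linear_predictor(coefs, covariates):
    """Sum of coefficient * covariate value; missing covariates count as 0."""
    return sum(coefs[name] * covariates.get(name, 0.0) for name in coefs)


# Hypothetical patient (assumption): ELN2017 category 3 and ECOG > 1,
# all other covariates at their reference value of 0.
patient = {"ELN2017_3": 1.0, "ECOG (> 1)": 1.0}

lp = linear_predictor(COEF_PL1A, patient)
# Hazard relative to a patient whose covariates are all 0.
relative_hazard = math.exp(lp)

for name, coef in COEF_PL1A.items():
    print(f"{name}: HR = {hazard_ratio(coef):.3f}")
print(f"linear predictor = {lp:.4f}, relative hazard = {relative_hazard:.3f}")
```

A positive coefficient corresponds to a hazard ratio above 1 (higher risk), a negative one such as PLT to a hazard ratio below 1.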
Variables selected by priority-Lasso in scenarios pl2A and pl2B
| Block | Variable | Coef. pl2A | Coef. pl2B |
|---|---|---|---|
| 1 | t(8;21)(q22;q22) | -1.0289 | -1.0289 |
| | inv(16)(p13.1q22) | -1.5444 | -1.5444 |
| | NPM1 mut/FLT3-ITD neg or low | -1.0181 | -1.0181 |
| | biCEBPA | -1.2240 | -1.2240 |
| | NPM1 wt/FLT3-ITD pos or low | -0.4358 | -0.4358 |
| | t(9;11)(p21;q23) | 0.4635 | 0.4635 |
| | Other aberrations | -0.4376 | -0.4376 |
| | KMT2A rearrangements | -0.5440 | -0.5440 |
| | Complex karyotype | 0.2970 | 0.2970 |
| | Monosomal karyotype | 0.0313 | 0.0313 |
| | NPM1 wt/FLT3-ITD pos | 0.1712 | 0.1712 |
| | RUNX1 mutations | 0.3065 | 0.3065 |
| | ASXL mutations | -0.1224 | -0.1224 |
| | TP53 mutations | 0.4306 | 0.4306 |
| 2 | Age | 0.2957 | 0.2617 |
| | Sex | -0.1011 | |
| | ECOG (> 1) | 0.3147 | 0.3206 |
| | WBC | 0.0990 | 0.0589 |
| | LDH | 0.1681 | 0.2371 |
| | Hb | 0.0700 | 0.0671 |
| | PLT | -0.0960 | -0.0578 |
| 4 | ZBTB37 | 0.0047 | 0.0025 |
| | MFI2 | 0.0090 | |
| | [name missing] | 0.0013 | 0.0418 |
| | PDK3 | -0.0187 | |
| | [name missing] | 0.0248 | |
| | SIK3 | -0.0063 | |
| | OR7A17 | 0.0039 | |
| | TBC1D17 | -0.0172 | |
| | [name missing] | 0.0488 | |
| | [name missing] | 0.0134 | |
| | FGD5 | 0.0359 | |
| | F12 | 0.0238 | |
| | IRX1 | -0.0090 | |
| | FAM92A1 | 0.0239 | |
| | DDIT4 | 0.0769 | |
| | HSPA2 | 0.0169 | |
Column 1: priority of the block the variable is included in. Column 2: variable name. Columns 3 and 4: coefficients of the variable in the Cox Lasso models of scenarios pl2A and pl2B, respectively. Variables from the block of priority 4 also appearing in Table 1 are marked in bold
Validation results for the model scenarios with restrictions to the number of selected variables
| | pl1A | pl1B | Lasso1 | pl2A | pl2B | Lasso2 | ELN2017 |
|---|---|---|---|---|---|---|---|
| TPR | 0.672 | 0.672 | 0.651 | 0.640 | 0.658 | 0.643 | 0.556 |
| TNR | 0.667 | 0.658 | 0.661 | 0.647 | 0.664 | 0.653 | 0.723 |
| AUC | 0.711 | 0.731 | 0.726 | 0.713 | 0.727 | 0.725 | 0.663 |
| C | 0.653 | 0.660 | 0.658 | 0.658 | 0.664 | 0.656 | 0.619 |
| IBS 2 | 0.175 | 0.172 | 0.176 | 0.175 | 0.172 | 0.177 | 0.181 |
| IBS 4.4 | 0.197 | 0.192 | 0.191 | 0.197 | 0.191 | 0.193 | 0.204 |
| Optimism | 0.393 | 0.289 | 0.920 | 0.377 | 0.243 | 0.984 | |
| CI lower (low risk) | 0.339 | 0.304 | 0.247 | 0.387 | 0.327 | 0.177 | 0.418 |
| HR (low risk) | 0.536 | 0.455 | 0.363 | 0.605 | 0.566 | 0.286 | 0.669 |
| CI upper (low risk) | 0.849 | 0.652 | 0.535 | 0.946 | 0.981 | 0.461 | 1.074 |
| CI lower (high risk) | 1.175 | 1.098 | 0.948 | 1.515 | 1.534 | 0.974 | 1.314 |
| HR (high risk) | 1.751 | 1.651 | 1.385 | 2.208 | 2.199 | 1.386 | 1.954 |
| CI upper (high risk) | 2.612 | 2.483 | 2.022 | 3.216 | 3.151 | 1.972 | 2.907 |
| p-value | 1.11e-08 | 1.05e-08 | 2.22e-10 | 1.07e-08 | 1.74e-08 | 4.99e-11 | 1.36e-07 |

The acronyms in the first column are: TPR: true positive rate; TNR: true negative rate; AUC: area under the curve; C: Uno’s C-index; IBS 2: integrated Brier score up to 2 years; IBS 4.4: integrated Brier score up to 4.4 years; Optimism: difference between the calibration slopes of training and validation data; HR: hazard ratio of the low or high risk group; CI lower and CI upper: lower and upper bounds of the 95% confidence interval for the respective hazard ratio; p-value: p-value of the likelihood ratio test
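For readers unfamiliar with the first two rows, the following minimal Python sketch shows how a true positive rate and a true negative rate are computed from binary labels. It assumes that each patient has a fully observed binary status (1 = event, 0 = no event) and a binary predicted class. The function name and the toy data are hypothetical, and the time-dependent versions needed for censored survival data are not reproduced here.

```python
def tpr_tnr(y_true, y_pred):
    """Return (TPR, TNR) for binary labels coded as 1 (event) and 0 (no event)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    # TPR: share of actual events predicted as events.
    tpr = tp / (tp + fn) if (tp + fn) > 0 else float("nan")
    # TNR: share of actual non-events predicted as non-events.
    tnr = tn / (tn + fp) if (tn + fp) > 0 else float("nan")
    return tpr, tnr


# Toy data (hypothetical, not taken from the study).
observed = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 0, 1, 0, 1]
tpr, tnr = tpr_tnr(observed, predicted)
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}")
```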
Fig. 1. Prediction error curves. The curves show the Brier scores calculated in the validation data for the different scenarios and for different time points. The left panel contains the models considering ELN2017 as categories. The right panel contains the models considering all ELN variables. The Reference scenario results from the Kaplan-Meier estimation and is the same in both panels. Furthermore, curves for ELN2017, for priority-Lasso with and without cross-validated offsets, and for standard Lasso are shown
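The quantity plotted in a prediction error curve can be illustrated with a simplified Brier score at a single time point t: the mean squared difference between the observed status I(T > t) and the predicted survival probability at t. The sketch below deliberately ignores censoring, which a real analysis of survival data has to handle (typically by inverse probability of censoring weighting). The function name and the toy numbers are hypothetical.

```python
def brier_score(event_times, predicted_survival, t):
    """Simplified Brier score at time t, assuming no censored observations.

    event_times: observed event time of each patient
    predicted_survival: predicted probability of surviving beyond t, per patient
    """
    if len(event_times) != len(predicted_survival):
        raise ValueError("inputs must have the same length")
    if not event_times:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for event_time, surv_prob in zip(event_times, predicted_survival):
        # Observed status: 1 if the patient is still event-free at time t.
        still_event_free = 1.0 if event_time > t else 0.0
        total += (still_event_free - surv_prob) ** 2
    return total / len(event_times)


# Toy data (hypothetical): event times in years and predicted S(2 years).
times = [1.0, 3.0, 5.0, 0.5]
predictions = [0.2, 0.8, 0.9, 0.4]
score = brier_score(times, predictions, t=2.0)
print(f"Brier score at 2 years: {score:.4f}")
```

Lower values indicate better predictions; an uninformative prediction of 0.5 for every patient gives 0.25. Evaluating the score over a grid of time points yields a curve, and integrating it up to a time horizon gives an integrated Brier score.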
Fig. 2. Kaplan-Meier curves for training and validation data in three risk groups. The three risk groups were built according to the highest logrank statistic in the training data. The left panel contains the results for the standard Lasso models and the raw ELN2017 score. The middle and right panels contain the plots of priority-Lasso with and without cross-validated offsets, respectively. The top and middle panels show the results considering ELN2017 as categories and using all ELN variables, respectively
Fig. 3. Observed and predicted Kaplan-Meier curves for the validation data in three risk groups. The three risk groups were built according to the highest logrank statistic in the training data. The left panel contains the results for the standard Lasso models and the raw ELN2017 score. The middle and right panels contain the plots of priority-Lasso with and without cross-validated offsets, respectively. The top and middle panels show the results considering ELN2017 as categories and using all ELN variables, respectively