Andreas Mayr, Benjamin Hofner, Matthias Schmid.
Abstract
BACKGROUND: When constructing new biomarker or gene signature scores for time-to-event outcomes, the underlying aims are to develop a discrimination model that helps to predict whether patients have a poor or good prognosis and to identify the most influential variables for this task. In practice, this is often done by fitting Cox models. These, however, are not necessarily optimal with respect to the resulting discriminatory power and rely on restrictive assumptions. We present a combined approach to automatically select and fit sparse discrimination models for potentially high-dimensional survival data based on boosting a smooth version of the concordance index (C-index). Because of this objective function, the resulting prediction models are optimal with respect to their ability to discriminate between patients with longer and shorter survival times. The gradient boosting algorithm is combined with the stability selection approach to enhance and control its variable selection properties.
Keywords: Boosting; Concordance index; High-dimensional data; Stability selection; Time-to-event data; Variable selection
Year: 2016 PMID: 27444890 PMCID: PMC4957316 DOI: 10.1186/s12859-016-1149-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
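The idea of a smooth C-index, as mentioned in the abstract, can be illustrated with a minimal sketch. The code below is not the authors' estimator: it assumes a simple Harrell-type pairwise definition without any censoring adjustment, assumes that a higher score means longer predicted survival, and uses a sigmoid with a hypothetical smoothing parameter `sigma` to replace the hard indicator. All function and parameter names are illustrative.

```python
import numpy as np


def _comparable_pairs(time, event):
    # Pair (i, j) is comparable if subject i had an observed event
    # strictly before the observed time of subject j.
    return (time[:, None] < time[None, :]) & (event[:, None] == 1)


def c_index(time, event, score):
    """Share of comparable pairs in which the earlier failure has the lower score."""
    time, event, score = (np.asarray(a, dtype=float) for a in (time, event, score))
    comparable = _comparable_pairs(time, event)
    concordant = score[:, None] < score[None, :]
    return (comparable & concordant).sum() / comparable.sum()


def smooth_c_index(time, event, score, sigma=0.1):
    """Differentiable surrogate: the indicator is replaced by a sigmoid.

    sigma is an assumed smoothing parameter; smaller values bring the
    surrogate closer to the hard C-index.
    """
    time, event, score = (np.asarray(a, dtype=float) for a in (time, event, score))
    comparable = _comparable_pairs(time, event)
    diff = score[None, :] - score[:, None]          # entry [i, j] = score_j - score_i
    soft = 0.5 * (1.0 + np.tanh(0.5 * diff / sigma))  # numerically stable sigmoid
    return (comparable * soft).sum() / comparable.sum()


if __name__ == "__main__":
    t = [1, 2, 3, 4]          # observed times
    e = [1, 1, 0, 1]          # 1 = event, 0 = censored
    s = [0.1, 0.4, 0.35, 0.8]  # predicted scores
    print(c_index(t, e, s))                    # 4 of 5 comparable pairs are concordant
    print(smooth_c_index(t, e, s, sigma=0.01))
```

Because the smooth version is differentiable in the scores, it can serve as an objective for gradient-based fitting, which is the role the abstract assigns to it in gradient boosting.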
Variable selection results from 100 simulation runs: median number of true positives / false positives and calculated upper bound for the per-family error rate (PFER, in brackets) for different values of q and π_thr. The five π_thr columns are ordered from the lowest to the highest threshold.

| p | p_inf | n | PH-viol | q | π_thr (1) | π_thr (2) | π_thr (3) | π_thr (4) | π_thr (5) | without | Cox lasso |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 4 | 200 | false | 100 | 4 / 8 (24.8) | 4 / 3 (11.4) | 4 / 1 (4.27) | 4 / 0 (1.92) | 4 / 0 (0.75) | 4 / 180 | 4 / 36 |
| | | | | 50 | 4 / 1 (5.20) | 4 / 0 (2.61) | 4 / 0 (0.97) | 4 / 0 (0.43) | 4 / 0 (0.17) | | |
| | | | | 20 | 4 / 0 (0.61) | 4 / 0 (0.33) | 3 / 0 (0.14) | 3 / 0 (0.06) | 2 / 0 (0.02) | | |
| | | | | 15 | 4 / 0 (0.32) | 3 / 0 (0.17) | 3 / 0 (0.08) | 3 / 0 (0.04) | 2 / 0 (0.01) | | |
| | | | | 10 | 3 / 0 (0.13) | 3 / 0 (0.07) | 3 / 0 (0.04) | 2 / 0 (0.02) | 2 / 0 (0.01) | | |
| | | | | 5 | 2 / 0 (0.03) | 2 / 0 (0.02) | 2 / 0 (0.01) | 2 / 0 (0.00) | 1 / 0 (0.00) | | |
| 500 | 4 | 200 | false | 100 | 4 / 14 (51.9) | 4 / 5 (27.9) | 4 / 2 (10.4) | 4 / 0 (4.73) | 4 / 0 (1.87) | 4 / 166 | 4 / 31 |
| | | | | 50 | 4 / 3 (12.4) | 4 / 1 (5.71) | 4 / 0 (2.13) | 4 / 0 (0.96) | 4 / 0 (0.38) | | |
| | | | | 20 | 4 / 0 (1.55) | 4 / 0 (0.82) | 4 / 0 (0.30) | 3 / 0 (0.14) | 3 / 0 (0.05) | | |
| | | | | 15 | 4 / 0 (0.79) | 4 / 0 (0.44) | 3 / 0 (0.17) | 3 / 0 (0.07) | 3 / 0 (0.03) | | |
| | | | | 10 | 4 / 0 (0.31) | 3 / 0 (0.16) | 3 / 0 (0.07) | 3 / 0 (0.03) | 2 / 0 (0.01) | | |
| | | | | 5 | 3 / 0 (0.07) | 3 / 0 (0.03) | 2 / 0 (0.02) | 2 / 0 (0.01) | 1 / 0 (0.00) | | |
| 500 | 4 | 200 | true | 100 | 4 / 13 (51.9) | 4 / 5 (27.9) | 4 / 2 (10.4) | 4 / 0 (4.73) | 4 / 0 (1.87) | 4 / 171 | 4 / 36 |
| | | | | 50 | 4 / 2 (12.4) | 4 / 1 (5.71) | 4 / 0 (2.13) | 4 / 0 (0.96) | 4 / 0 (0.38) | | |
| | | | | 20 | 4 / 0 (1.55) | 4 / 0 (0.82) | 4 / 0 (0.30) | 4 / 0 (0.14) | 3 / 0 (0.05) | | |
| | | | | 15 | 4 / 0 (0.79) | 4 / 0 (0.44) | 4 / 0 (0.17) | 3 / 0 (0.07) | 3 / 0 (0.03) | | |
| | | | | 10 | 4 / 0 (0.31) | 4 / 0 (0.16) | 3 / 0 (0.07) | 3 / 0 (0.03) | 2 / 0 (0.01) | | |
| | | | | 5 | 3 / 0 (0.07) | 3 / 0 (0.03) | 2 / 0 (0.02) | 2 / 0 (0.01) | 1 / 0 (0.00) | | |
| 50 | 4 | 200 | false | 20 | 4 / 7 (50.0) | 4 / 4 (50.0) | 4 / 2 (6.33) | 4 / 1 (3.06) | 4 / 0 (1.25) | 4 / 43 | 4 / 14 |
| | | | | 15 | 4 / 3 (50.0) | 4 / 2 (8.12) | 4 / 1 (2.88) | 4 / 0 (1.34) | 4 / 0 (0.54) | | |
| | | | | 10 | 4 / 1 (5.19) | 4 / 0 (2.79) | 4 / 0 (1.04) | 4 / 0 (0.47) | 4 / 0 (0.19) | | |
| | | | | 5 | 4 / 0 (1.24) | 4 / 0 (0.57) | 4 / 0 (0.21) | 4 / 0 (0.10) | 3 / 0 (0.04) | | |
| 500 | 12 | 200 | false | 100 | 12 / 12 (51.9) | 12 / 4 (27.9) | 12 / 1 (10.4) | 11 / 0 (4.73) | 9 / 0 (1.87) | 12 / 150 | 12 / 78 |
| | | | | 50 | 9 / 2 (12.4) | 8 / 0 (5.71) | 7 / 0 (2.13) | 6 / 0 (0.96) | 3 / 0 (0.38) | | |
| | | | | 20 | 5 / 0 (1.55) | 4 / 0 (0.82) | 3 / 0 (0.30) | 2 / 0 (0.14) | 1 / 0 (0.05) | | |
| | | | | 15 | 4 / 0 (0.79) | 3 / 0 (0.44) | 2 / 0 (0.17) | 1 / 0 (0.07) | 0 / 0 (0.03) | | |
| | | | | 10 | 3 / 0 (0.31) | 2 / 0 (0.16) | 1 / 0 (0.07) | 0 / 0 (0.03) | 0 / 0 (0.01) | | |
| 500 | 40 | 200 | false | 200 | 17 / 13 (500) | 12 / 5 (500) | 8 / 2 (63.3) | 4 / 0 (30.6) | 1 / 0 (12.5) | 35 / 139 | 9 / 12 |
| | | | | 100 | 16 / 12 (51.9) | 11 / 4 (27.9) | 7 / 2 (10.4) | 4 / 0 (4.73) | 1 / 0 (1.87) | | |
| | | | | 50 | 6 / 2 (12.4) | 4 / 1 (5.71) | 2 / 0 (2.13) | 1 / 0 (0.96) | 0 / 0 (0.38) | | |
| | | | | 25 | 2 / 0 (2.6) | 1 / 0 (1.3) | 0 / 0 (0.48) | 0 / 0 (0.21) | 0 / 0 (0.08) | | |
In every setting, p_inf predictors were truly informative and p − p_inf were non-informative. PH-viol: settings where the proportional hazards assumption was violated. C-index boosting without stability selection (without π_thr) was fitted on all p predictors with a fixed large m_stop; for the Cox lasso, the shrinkage parameter was optimized via 10-fold cross-validation.
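The roles of q and π_thr in the table can be made concrete with a rough sketch of stability selection: on each random subsample a base procedure selects q variables, and a variable counts as stable if its selection frequency reaches π_thr. The code below rests on several assumptions that are not taken from the source: the base selector is a simple correlation ranking (a stand-in, not C-index boosting), the response is an uncensored continuous outcome, subsamples have size n/2, and the names `top_q_selector`, `stability_selection`, and `n_subsamples` are hypothetical.

```python
import numpy as np


def top_q_selector(X, y, q):
    """Stand-in base selector (NOT boosting): indices of the q columns of X
    with the largest absolute correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc)
    corr = np.abs(Xc.T @ yc) / np.where(denom > 0, denom, 1.0)
    return np.argsort(corr)[-q:]


def stability_selection(X, y, q, pi_thr, n_subsamples=100, seed=0):
    """Return (indices of stable variables, selection frequency per variable)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        # Draw a subsample of half the observations without replacement (assumed size).
        idx = rng.choice(n, size=n // 2, replace=False)
        counts[top_q_selector(X[idx], y[idx], q)] += 1
    freq = counts / n_subsamples
    # A variable is declared stable if it was selected in at least pi_thr of the subsamples.
    return np.flatnonzero(freq >= pi_thr), freq


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 50))
    y = 2 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=200)  # two informative columns
    stable, freq = stability_selection(X, y, q=5, pi_thr=0.9)
    print("stable variables:", stable)
```

Smaller q and larger π_thr make the procedure more conservative, which matches the pattern in the table: fewer false positives, at the cost of missing informative predictors when q is set too low.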
Resulting discriminatory power of C-index boosting in combination with stability selection for different values of q and π_thr compared to the competing Cox lasso approach. The five π_thr columns are ordered from the lowest to the highest threshold.

| p | p_inf | n | PH-viol | q | π_thr (1) | π_thr (2) | π_thr (3) | π_thr (4) | π_thr (5) | without | Cox lasso |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1000 | 4 | 200 | false | 100 | 0.8150 | 0.8286 | 0.8358 | 0.8393 | 0.8396 | 0.7889 | 0.8148 |
| | | | | 50 | 0.8343 | 0.8365 | 0.8381 | 0.8357 | 0.8253 | | |
| | | | | 20 | 0.8324 | 0.8252 | 0.7829 | 0.7662 | 0.7394 | | |
| | | | | 15 | 0.8309 | 0.7813 | 0.7694 | 0.7519 | 0.7340 | | |
| | | | | 10 | 0.7799 | 0.7683 | 0.7519 | 0.7426 | 0.6202 | | |
| | | | | 5 | 0.7497 | 0.7426 | 0.7323 | 0.6176 | 0.5993 | | |
| 500 | 4 | 200 | false | 100 | 0.7998 | 0.8179 | 0.8305 | 0.8361 | 0.8391 | 0.7735 | 0.8161 |
| | | | | 50 | 0.8268 | 0.8332 | 0.8375 | 0.8388 | 0.8340 | | |
| | | | | 20 | 0.8358 | 0.8351 | 0.8309 | 0.7744 | 0.7607 | | |
| | | | | 15 | 0.8346 | 0.8314 | 0.7835 | 0.7672 | 0.7521 | | |
| | | | | 10 | 0.8279 | 0.7801 | 0.7672 | 0.7587 | 0.7400 | | |
| | | | | 5 | 0.7627 | 0.7587 | 0.7444 | 0.7347 | 0.6154 | | |
| 500 | 4 | 200 | true | 100 | 0.8304 | 0.8481 | 0.8612 | 0.8656 | 0.8671 | 0.7886 | 0.8345 |
| | | | | 50 | 0.8555 | 0.8635 | 0.8664 | 0.8668 | 0.8664 | | |
| | | | | 20 | 0.8657 | 0.8654 | 0.8626 | 0.8477 | 0.7662 | | |
| | | | | 15 | 0.8654 | 0.8626 | 0.8554 | 0.7743 | 0.7573 | | |
| | | | | 10 | 0.8598 | 0.8442 | 0.7757 | 0.7614 | 0.7360 | | |
| | | | | 5 | 0.7660 | 0.7573 | 0.7391 | 0.7275 | 0.6219 | | |
| 50 | 4 | 200 | false | 20 | 0.8183 | 0.8248 | 0.8303 | 0.8333 | 0.8358 | 0.7939 | 0.8256 |
| | | | | 15 | 0.8268 | 0.8298 | 0.8329 | 0.8353 | 0.8370 | | |
| | | | | 10 | 0.8314 | 0.8348 | 0.8366 | 0.8370 | 0.8366 | | |
| | | | | 5 | 0.8373 | 0.8353 | 0.8324 | 0.8247 | 0.7662 | | |
| 500 | 12 | 200 | false | 100 | 0.9109 | 0.9218 | 0.8996 | 0.8639 | 0.8081 | 0.8852 | 0.8834 |
| | | | | 50 | 0.7991 | 0.7880 | 0.7451 | 0.7089 | 0.6482 | | |
| | | | | 20 | 0.6954 | 0.6609 | 0.6239 | 0.5698 | – | | |
| | | | | 15 | 0.6664 | 0.6274 | 0.5830 | 0.5549 | – | | |
| | | | | 10 | 0.6275 | 0.5848 | 0.5610 | – | – | | |
| 500 | 40 | 200 | false | 200 | 0.6416 | 0.6269 | 0.6088 | 0.5755 | 0.5344 | 0.6983 | 0.5782 |
| | | | | 100 | 0.6373 | 0.6245 | 0.6028 | 0.5706 | 0.5308 | | |
| | | | | 50 | 0.5907 | 0.5703 | 0.5407 | 0.5129 | – | | |
| | | | | 25 | 0.5411 | 0.5269 | – | – | – | | |
For C-index boosting, the final models were fitted with a fixed m_stop = 1000. Numbers represent the median on test samples from 100 simulation runs. PH-viol: settings where the proportional hazards assumption was violated. Where no variables at all were identified as stable, no discriminatory power can be computed (denoted as –). C-index boosting without stability selection (without π_thr) was fitted on all p predictors with a fixed large m_stop; for the Cox lasso, the shrinkage parameter was optimized via 10-fold cross-validation.
Fig. 1. Variable selection for the breast cancer application. Number of selected variables resulting from boosting a smooth version of the C-index (left boxplots) and Cox lasso (right boxplots) with and without stability selection for different values of π_thr. Boxplots refer to the results from 100 stratified subsamples drawn from the complete data set.
Fig. 2. Discriminatory power for the breast cancer application. Resulting C-index on 100 test samples from the breast cancer application comparing both C-index boosting (left boxplots) and Cox lasso (right boxplots) with and without stability selection for different values of π_thr.