Julia Gilhodes, Florence Dalenc, Jocelyn Gal, Christophe Zemmour, Eve Leconte, Jean-Marie Boher, Thomas Filleron.
Abstract
Over the last decades, molecular signatures have become increasingly important in oncology and are opening up a new area of personalized medicine. Nevertheless, the biological relevance of these signatures and the statistical tools used to develop them have been called into question in the literature. Here, we investigate six typical selection methods for high-dimensional settings and survival endpoints, including LASSO and some of its extensions, component-wise boosting, and random survival forests (RSF). A resampling algorithm based on data splitting was applied to nine high-dimensional simulated datasets to assess selection stability on training sets and the intersection between selection methods. Prognostic performance was evaluated on the corresponding validation sets. Finally, the methods were applied to a real breast cancer dataset. The false discovery rate (FDR) was high for every selection method, and the intersection between lists of predictors was very poor. RSF selects many more variables than the other methods and thus becomes less efficient on validation sets. Because of the complex correlation structure in genomic data, stability of the selection procedure is generally poor for selected predictors, but it can be improved with a larger training sample size. In a very high-dimensional setting, we recommend the LASSO-pcvl method since it outperforms the other methods by reducing the number of selected genes and minimizing FDR in most scenarios. Nevertheless, this method still gives a high rate of false positives. Further work is thus necessary to propose new methods that overcome this issue when numerous predictors are present. Multidisciplinary discussion between clinicians and statisticians is necessary to ensure both the statistical and the biological relevance of the predictors included in molecular signatures.
Year: 2020 PMID: 32670394 PMCID: PMC7350178 DOI: 10.1155/2020/6795392
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Characteristics of simulated datasets.
| N | Events | Censoring rate (%) | p | q |
|---|---|---|---|---|
| 500 | 448 | 10.4 | 1500 | 0 |
| 500 | 403 | 19.4 | 1500 | 12 |
| 500 | 362 | 27.6 | 1500 | 50 |
| 750 | 678 | 9.6 | 1500 | 0 |
| 750 | 617 | 17.7 | 1500 | 12 |
| 750 | 535 | 28.7 | 1500 | 50 |
| 1000 | 892 | 10.8 | 1500 | 0 |
| 1000 | 814 | 18.6 | 1500 | 12 |
| 1000 | 709 | 29.1 | 1500 | 50 |
Number of selected predictors for each simulated dataset and selection method.
| N | q | Training fraction | LASSO-cvl | LASSO-pcvl | Elastic Net | BSS Enet | CoxBoost | RSF |
|---|---|---|---|---|---|---|---|---|
| 500 | 0 | 1/2 | 0 (0-24) | 1 (0-30) | 0 (0-34) | 9 (1-15) | 0 (0-24) | 71 (67-75) |
| 500 | 0 | 2/3 | 0 (0-21) | 0 (0-31) | 0 (0-24) | 10 (4-20) | 0 (0-24) | 56 (52-59) |
| 500 | 12 | 1/2 | 25 (7-55) | 11 (5-29) | 33 (10-62) | 22 (10-29) | 26 (9-50) | 71 (65-75) |
| 500 | 12 | 2/3 | 27 (10-55) | 13 (7-26) | 37 (13-75) | 27 (19-35) | 30 (10-50) | 56 (52-60) |
| 500 | 50 | 1/2 | 62 (40-93) | 44 (13-75) | 83 (50-121) | 44 (29-57) | 57 (24-80) | 75 (70-78) |
| 500 | 50 | 2/3 | 76 (57-105) | 53 (34-72) | 96 (71-127) | 56 (47-66) | 67 (41-103) | 63 (57-68) |
| 750 | 0 | 1/2 | 0 (0-18) | 0 (0-33) | 0 (0-23) | 12 (6-22) | 0 (0-29) | 45 (42-49) |
| 750 | 0 | 2/3 | 0 (0-20) | 0 (0-16) | 0 (0-21) | 16 (8-26) | 0 (0-19) | 22 (19-25) |
| 750 | 12 | 1/2 | 25 (12-52) | 13 (7-25) | 33 (14-57) | 25 (14-36) | 27 (12-53) | 47 (44-51) |
| 750 | 12 | 2/3 | 24 (12-49) | 11 (8-18) | 28 (17-55) | 28 (21-37) | 26 (13-52) | 25 (22-28) |
| 750 | 50 | 1/2 | 72 (51-93) | 44 (27-68) | 88 (65-118) | 54 (46-69) | 62 (36-85) | 58 (53-61) |
| 750 | 50 | 2/3 | 80 (58-111) | 46 (31-59) | 96 (76-134) | 64 (52-78) | 69 (43-89) | 37 (32-41) |
| 1000 | 0 | 1/2 | 0 (0-45) | 0 (0-36) | 0 (0-45) | 19 (10-34) | 1 (0-40) | 22 (20-25) |
| 1000 | 0 | 2/3 | 1 (0-48) | 1 (0-43) | 1 (0-48) | 27 (17-40) | 2 (0-45) | 6 (6-8) |
| 1000 | 12 | 1/2 | 34 (11-79) | 15 (8-28) | 42 (18-89) | 35 (24-52) | 36 (17-62) | 28 (25-32) |
| 1000 | 12 | 2/3 | 42 (21-74) | 15 (9-25) | 50 (23-86) | 48 (31-63) | 44 (14-77) | 9 (8-11) |
| 1000 | 50 | 1/2 | 90 (70-124) | 57 (34-78) | 107 (87-139) | 68 (58-89) | 74 (54-114) | 38 (34-43) |
| 1000 | 50 | 2/3 | 101 (76-145) | 57 (46-73) | 120 (84-168) | 80 (67-99) | 79 (53-112) | 18 (16-20) |
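The counts in this table come from repeatedly splitting each dataset into a training part (fraction 1/2 or 2/3) and a validation part, and running a selection method on the training part. A minimal sketch of such a data-splitting loop follows. The function name `resample_selection`, the `select` callback (a stand-in for any of the six selection methods), the number of repetitions, and the seed are all hypothetical and are not taken from the source.

```python
import random
from typing import Callable, List, Sequence, Set, Tuple


def resample_selection(
    n_samples: int,
    train_fraction: float,
    n_repeats: int,
    select: Callable[[Sequence[int]], Set[str]],
    seed: int = 0,
) -> List[Tuple[Set[str], List[int]]]:
    """Repeat random train/validation splits and run `select` on each training set.

    `select` is a hypothetical placeholder: it receives the training indices
    and returns the set of selected predictor names.
    """
    rng = random.Random(seed)
    n_train = round(train_fraction * n_samples)
    results = []
    for _ in range(n_repeats):
        shuffled = list(range(n_samples))
        rng.shuffle(shuffled)
        train, valid = shuffled[:n_train], shuffled[n_train:]
        # Keep the selected set together with the held-out indices, so that
        # prognostic performance can later be assessed on the validation part.
        results.append((select(train), valid))
    return results
```

The per-split selected sets can then be summarized, for example by the number of selected predictors per split.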
False discovery rates and false-negative rates for each simulated dataset and selection method.
| N | q | Training fraction | LASSO-cvl (FDR/FNR) | LASSO-pcvl (FDR/FNR) | Elastic Net (FDR/FNR) | BSS Enet (FDR/FNR) | CoxBoost (FDR/FNR) | RSF (FDR/FNR) |
|---|---|---|---|---|---|---|---|---|
| 500 | 0 | 1/2 | 0.43/. | 0.51/. | 0.45/. | 1/. | 0.49/. | 1/. |
| 500 | 0 | 2/3 | 0.39/. | 0.48/. | 0.4/. | 1/. | 0.49/. | 1/. |
| 500 | 12 | 1/2 | 0.62/0.29 | 0.32/0.36 | 0.69/0.19 | 0.56/0.24 | 0.65/0.3 | 0.92/0.56 |
| 500 | 12 | 2/3 | 0.61/0.18 | 0.28/0.23 | 0.69/0.1 | 0.6/0.12 | 0.64/0.19 | 0.9/0.51 |
| 500 | 50 | 1/2 | 0.51/0.39 | 0.35/0.45 | 0.56/0.3 | 0.3/0.39 | 0.47/0.41 | 0.83/0.74 |
| 500 | 50 | 2/3 | 0.52/0.27 | 0.34/0.31 | 0.58/0.19 | 0.32/0.25 | 0.47/0.29 | 0.78/0.72 |
| 750 | 0 | 1/2 | 0.44/. | 0.5/. | 0.46/. | 1/. | 0.46/. | 1/. |
| 750 | 0 | 2/3 | 0.23/. | 0.42/. | 0.24/. | 1/. | 0.38/. | 1/. |
| 750 | 12 | 1/2 | 0.62/0.24 | 0.31/0.27 | 0.67/0.16 | 0.59/0.16 | 0.64/0.25 | 0.85/0.4 |
| 750 | 12 | 2/3 | 0.57/0.21 | 0.19/0.23 | 0.62/0.12 | 0.63/0.11 | 0.61/0.2 | 0.65/0.29 |
| 750 | 50 | 1/2 | 0.49/0.29 | 0.27/0.35 | 0.54/0.2 | 0.32/0.27 | 0.44/0.3 | 0.75/0.71 |
| 750 | 50 | 2/3 | 0.48/0.17 | 0.2/0.26 | 0.54/0.1 | 0.33/0.15 | 0.4/0.2 | 0.54/0.66 |
| 1000 | 0 | 1/2 | 0.45/. | 0.48/. | 0.45/. | 1/. | 0.51/. | 1/. |
| 1000 | 0 | 2/3 | 0.55/. | 0.57/. | 0.56/. | 1/. | 0.62/. | 1/. |
| 1000 | 12 | 1/2 | 0.71/0.23 | 0.4/0.27 | 0.74/0.16 | 0.7/0.16 | 0.73/0.23 | 0.71/0.32 |
| 1000 | 12 | 2/3 | 0.75/0.19 | 0.36/0.23 | 0.78/0.12 | 0.77/0.1 | 0.77/0.19 | 0.22/0.4 |
| 1000 | 50 | 1/2 | 0.52/0.15 | 0.27/0.18 | 0.58/0.1 | 0.35/0.11 | 0.44/0.16 | 0.54/0.65 |
| 1000 | 50 | 2/3 | 0.56/0.1 | 0.24/0.13 | 0.61/0.06 | 0.42/0.07 | 0.44/0.12 | 0.15/0.7 |
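For a single selected list, the two rates in this table can be computed as sketched below, assuming the usual definitions: FDR is the share of selected predictors that are not true predictors, and FNR is the share of true predictors that were missed. The FNR is undefined when there are no true predictors, which is consistent with the "." entries in the q = 0 rows. Returning an FDR of 0 for an empty selection is an assumed convention, not something stated in the source, and the function name `fdr_fnr` is hypothetical.

```python
from typing import Optional, Set, Tuple


def fdr_fnr(selected: Set[str], truth: Set[str]) -> Tuple[float, Optional[float]]:
    """Return (FDR, FNR) for one selected set against the set of true predictors."""
    true_pos = len(selected & truth)
    false_pos = len(selected) - true_pos
    # Assumed convention: an empty selection makes no false discoveries.
    fdr = false_pos / len(selected) if selected else 0.0
    # FNR is undefined (None) when there is no true predictor, i.e. q = 0.
    fnr = (len(truth) - true_pos) / len(truth) if truth else None
    return fdr, fnr
```

With q = 0, any non-empty selection has an FDR of 1 under these definitions, since every selected predictor is a false discovery.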
Gene frequency occurrence for each simulated dataset and selection method.
| N | q | Training fraction | LASSO-cvl | LASSO-pcvl | Elastic Net | BSS Enet | CoxBoost | RSF |
|---|---|---|---|---|---|---|---|---|
| 500 | 0 | 1/2 | NA/1 (1-16) | NA/1 (1-19) | NA/1 (1-18) | NA/1 (1-26) | NA/1 (1-23) | NA/5 (1-71) |
| 500 | 0 | 2/3 | NA/1 (1-18) | NA/1 (1-21) | NA/1 (1-22) | NA/2 (1-40) | NA/1 (1-23) | NA/4 (1-94) |
| 500 | 12 | 1/2 | 76 (3-100)/2 (1-30) | 65 (2-100)/1 (1-13) | 86 (19-100)/2 (1-37) | 82 (19-100)/2 (1-28) | 74 (1-100)/2 (1-31) | 48 (9-94)/4 (1-89) |
| 500 | 12 | 2/3 | 92 (8-100)/2 (1-50) | 86 (3-100)/2 (1-29) | 97 (34-100)/2 (1-64) | 97 (38-100)/2 (1-54) | 92 (10-100)/2 (1-59) | 52 (12-87)/4 (1-98) |
| 500 | 50 | 1/2 | 66 (1-99)/2 (1-53) | 58 (1-100)/2 (1-37) | 79 (1-100)/2 (1-67) | 68 (3-100)/2 (1-34) | 67 (1-99)/2 (1-46) | 18 (1-87)/4 (1-62) |
| 500 | 50 | 2/3 | 89 (1-100)/2 (1-74) | 82 (6-100)/2 (1-57) | 94 (4-100)/3 (1-77) | 91 (1-100)/2 (1-64) | 87 (1-100)/2 (1-68) | 22 (1-100)/3 (1-81) |
| 750 | 0 | 1/2 | NA/1 (1-14) | NA/1 (1-16) | NA/1 (1-17) | NA/1 (1-42) | NA/1 (1-10) | NA/3 (1-60) |
| 750 | 0 | 2/3 | NA/1 (1-13) | NA/1 (1-18) | NA/1 (1-14) | NA/2 (1-71) | NA/1 (1-14) | NA/2 (1-86) |
| 750 | 12 | 1/2 | 94 (22-100)/2 (1-34) | 90 (17-100)/1 (1-13) | 99 (39-100)/2 (1-38) | 98 (31-100)/1 (1-30) | 94 (22-100)/2 (1-34) | 59 (9-100)/3 (1-77) |
| 750 | 12 | 2/3 | 99 (17-100)/2 (1-44) | 98 (12-100)/1 (1-15) | 100 (35-100)/2 (1-48) | 100 (28-100)/2 (1-42) | 99 (14-100)/2 (1-47) | 74 (11-100)/2 (1-60) |
| 750 | 50 | 1/2 | 78 (10-100)/2 (1-44) | 72 (6-100)/1 (1-31) | 90 (14-100)/2 (1-54) | 84 (6-100)/2 (1-31) | 76 (7-100)/2 (1-40) | 21 (1-86)/3 (1-56) |
| 750 | 50 | 2/3 | 94 (13-100)/2 (1-68) | 91 (4-100)/2 (1-48) | 98 (28-100)/3 (1-78) | 97 (12-100)/2 (1-57) | 94 (11-100)/2 (1-56) | 35 (1-100)/2 (1-55) |
| 1000 | 0 | 1/2 | NA/1 (1-20) | NA/1 (1-21) | NA/1 (1-22) | NA/2 (1-50) | NA/1 (1-26) | NA/2 (1-78) |
| 1000 | 0 | 2/3 | NA/1 (1-39) | NA/1 (1-43) | NA/1 (1-40) | NA/2 (1-80) | NA/1 (1-40) | NA/2 (1-80) |
| 1000 | 12 | 1/2 | 97 (5-100)/2 (1-53) | 97 (2-100)/1 (1-33) | 100 (15-100)/2 (1-64) | 99 (19-100)/2 (1-56) | 98 (5-100)/2 (1-57) | 70 (26-99)/2 (1-32) |
| 1000 | 12 | 2/3 | 100 (6-100)/2 (1-84) | 100 (2-100)/2 (1-49) | 100 (16-100)/3 (1-87) | 100 (21-100)/3 (1-89) | 100 (5-100)/2 (1-88) | 54 (28-100)/1 (1-14) |
| 1000 | 50 | 1/2 | 98 (2-100)/2 (1-74) | 93 (11-100)/1 (1-48) | 99 (5-100)/3 (1-81) | 99 (20-100)/2 (1-64) | 95 (1-100)/2 (1-60) | 30 (1-95)/2 (1-48) |
| 1000 | 50 | 2/3 | 100 (2-100)/3 (1-93) | 100 (6-100)/2 (1-77) | 100 (5-100)/3 (1-97) | 100 (2-100)/2 (1-92) | 100 (7-100)/2 (1-86) | 36 (1-86)/2 (1-22) |
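One plausible way to obtain a per-gene occurrence frequency is to count, for each gene, the percentage of resampling replicates in which it was selected. The sketch below assumes that reading. The function name `occurrence_frequency` and the percentage scale are illustrative assumptions, not details confirmed by the source.

```python
from collections import Counter
from typing import Dict, Iterable, List


def occurrence_frequency(selections: List[Iterable[str]]) -> Dict[str, float]:
    """Percentage of replicates in which each gene appears at least once."""
    # set(sel) ensures a gene is counted once per replicate.
    counts = Counter(gene for sel in selections for gene in set(sel))
    n_replicates = len(selections)
    return {gene: 100.0 * c / n_replicates for gene, c in counts.items()}
```

A gene selected in every replicate gets a frequency of 100, while a gene picked up only once in many replicates gets a value close to 0, which is the pattern expected for unstable false positives.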
Number of genes that overlap between methods for the same samples and true-positive rates among common genes, for each simulated dataset and selection method.
Each cell gives the number of common genes/true-positive rate among common genes.

| N | q | Training fraction | Method | LASSO-pcvl | Elastic Net | BSS Enet | CoxBoost | RSF |
|---|---|---|---|---|---|---|---|---|
| 500 | 12 | 1/2 | LASSO-cvl | 11 (5-29)/0.7 (0.35-1) | 25 (7-53)/0.35 (0.15-1) | 18 (7-27)/0.47 (0.26-1) | 22 (7-40)/0.38 (0.17-1) | 6 (1-12)/0.71 (0.2-1) |
| 500 | 12 | 1/2 | LASSO-pcvl | | 11 (5-29)/0.7 (0.33-1) | 11 (5-24)/0.71 (0.41-1) | 11 (5-29)/0.7 (0.33-1) | 4 (1-12)/0.86 (0.5-1) |
| 500 | 12 | 1/2 | Elastic Net | | | 20 (8-29)/0.45 (0.29-0.88) | 24 (8-48)/0.35 (0.17-0.78) | 7 (3-14)/0.67 (0.2-1) |
| 500 | 12 | 1/2 | BSS Enet | | | | 18 (7-28)/0.44 (0.28-0.8) | 7 (1-12)/0.75 (0.25-1) |
| 500 | 12 | 1/2 | CoxBoost | | | | | 6 (1-13)/0.71 (0.2-1) |
| 500 | 50 | 1/2 | LASSO-cvl | 42 (13-74)/0.64 (0.46-0.95) | 62 (40-93)/0.49 (0.37-0.65) | 40 (28-56)/0.71 (0.52-0.84) | 53 (24-80)/0.55 (0.41-0.92) | 12 (4-19)/0.79 (0.44-1) |
| 500 | 50 | 1/2 | LASSO-pcvl | | 44 (13-74)/0.63 (0.45-0.95) | 35 (13-48)/0.74 (0.59-0.95) | 42 (13-72)/0.65 (0.45-0.95) | 11 (3-18)/0.86 (0.56-1) |
| 500 | 50 | 1/2 | Elastic Net | | | 43 (29-57)/0.71 (0.52-0.82) | 57 (24-79)/0.54 (0.38-0.92) | 14 (5-21)/0.77 (0.37-1) |
| 500 | 50 | 1/2 | BSS Enet | | | | 40 (22-53)/0.72 (0.53-0.95) | 11 (2-18)/0.89 (0.57-1) |
| 500 | 50 | 1/2 | CoxBoost | | | | | 12 (4-20)/0.81 (0.5-1) |
| 750 | 12 | 1/2 | LASSO-cvl | 13 (7-25)/0.69 (0.4-1) | 25 (12-50)/0.36 (0.18-0.75) | 19 (11-29)/0.47 (0.3-0.79) | 23 (12-43)/0.39 (0.2-0.75) | 7 (4-12)/0.83 (0.5-1) |
| 750 | 12 | 1/2 | LASSO-pcvl | | 13 (7-25)/0.68 (0.4-1) | 12 (6-23)/0.73 (0.43-1) | 13 (7-24)/0.69 (0.38-1) | 6 (3-10)/1 (0.6-1) |
| 750 | 12 | 1/2 | Elastic Net | | | 22 (11-33)/0.45 (0.3-0.75) | 26 (12-50)/0.36 (0.2-0.75) | 8 (4-14)/0.8 (0.54-1) |
| 750 | 12 | 1/2 | BSS Enet | | | | 20 (11-33)/0.45 (0.3-0.82) | 8 (4-12)/0.86 (0.43-1) |
| 750 | 12 | 1/2 | CoxBoost | | | | | 7 (3-12)/0.8 (0.57-1) |
| 750 | 50 | 1/2 | LASSO-cvl | 43 (27-68)/0.73 (0.5-0.91) | 71 (51-90)/0.51 (0.37-0.69) | 49 (39-65)/0.7 (0.55-0.85) | 60 (36-79)/0.58 (0.43-0.81) | 14 (6-22)/0.85 (0.67-1) |
| 750 | 50 | 1/2 | LASSO-pcvl | | 44 (27-68)/0.73 (0.5-0.92) | 40 (27-57)/0.79 (0.61-0.94) | 42 (27-65)/0.74 (0.52-0.92) | 12 (7-20)/0.93 (0.71-1) |
| 750 | 50 | 1/2 | Elastic Net | | | 52 (42-68)/0.68 (0.55-0.83) | 62 (36-83)/0.56 (0.43-0.81) | 16 (7-23)/0.83 (0.6-1) |
| 750 | 50 | 1/2 | BSS Enet | | | | 47 (34-63)/0.71 (0.56-0.86) | 14 (8-22)/0.9 (0.73-1) |
| 750 | 50 | 1/2 | CoxBoost | | | | | 14 (6-22)/0.88 (0.71-1) |
| 1000 | 12 | 1/2 | LASSO-cvl | 15 (8-28)/0.59 (0.32-0.92) | 34 (11-79)/0.28 (0.11-0.73) | 26 (11-51)/0.35 (0.18-0.73) | 31 (11-55)/0.29 (0.16-0.73) | 8 (4-12)/0.89 (0.45-1) |
| 1000 | 12 | 1/2 | LASSO-pcvl | | 15 (8-28)/0.59 (0.32-0.92) | 15 (8-27)/0.61 (0.33-0.92) | 15 (8-28)/0.59 (0.32-0.92) | 7 (4-11)/1 (0.62-1) |
| 1000 | 12 | 1/2 | Elastic Net | | | 29 (14-52)/0.34 (0.17-0.61) | 34 (16-62)/0.28 (0.14-0.53) | 9 (4-12)/0.89 (0.42-1) |
| 1000 | 12 | 1/2 | BSS Enet | | | | 27 (13-44)/0.34 (0.2-0.59) | 9 (6-12)/0.89 (0.5-1) |
| 1000 | 12 | 1/2 | CoxBoost | | | | | 8 (4-11)/0.88 (0.6-1) |
| 1000 | 50 | 1/2 | LASSO-cvl | 57 (34-77)/0.72 (0.52-0.97) | 90 (70-122)/0.48 (0.35-0.61) | 63 (53-88)/0.67 (0.52-0.78) | 73 (54-101)/0.57 (0.45-0.78) | 18 (13-24)/0.9 (0.71-1) |
| 1000 | 50 | 1/2 | LASSO-pcvl | | 57 (34-78)/0.72 (0.51-0.97) | 53 (34-65)/0.78 (0.63-0.97) | 56 (34-76)/0.73 (0.53-0.97) | 17 (12-23)/0.94 (0.74-1) |
| 1000 | 50 | 1/2 | Elastic Net | | | 66 (58-89)/0.66 (0.53-0.76) | 74 (54-106)/0.56 (0.43-0.78) | 20 (14-25)/0.88 (0.71-1) |
| 1000 | 50 | 1/2 | BSS Enet | | | | 60 (50-84)/0.7 (0.56-0.81) | 19 (13-24)/0.94 (0.79-1) |
| 1000 | 50 | 1/2 | CoxBoost | | | | | 18 (12-24)/0.94 (0.71-1) |
| 500 | 12 | 2/3 | LASSO-cvl | 13 (7-25)/0.71 (0.4-1) | 26 (10-55)/0.38 (0.15-0.8) | 20 (9-31)/0.45 (0.3-0.8) | 24 (10-46)/0.4 (0.17-0.8) | 7 (3-11)/0.75 (0.43-1) |
| 500 | 12 | 2/3 | LASSO-pcvl | | 13 (7-26)/0.71 (0.38-1) | 13 (7-21)/0.72 (0.48-1) | 13 (7-25)/0.71 (0.4-1) | 5 (2-10)/1 (0.6-1) |
| 500 | 12 | 2/3 | Elastic Net | | | 23 (11-33)/0.43 (0.31-0.82) | 28 (10-48)/0.34 (0.17-0.71) | 7 (4-13)/0.71 (0.4-1) |
| 500 | 12 | 2/3 | BSS Enet | | | | 21 (9-32)/0.44 (0.31-0.78) | 7 (4-11)/0.8 (0.5-1) |
| 500 | 12 | 2/3 | CoxBoost | | | | | 7 (3-10)/0.71 (0.43-1) |
| 500 | 50 | 2/3 | LASSO-cvl | 52 (33-71)/0.65 (0.51-0.85) | 76 (57-104)/0.49 (0.34-0.61) | 51 (43-62)/0.68 (0.56-0.81) | 65 (41-93)/0.55 (0.35-0.68) | 14 (7-21)/0.83 (0.58-1) |
| 500 | 50 | 2/3 | LASSO-pcvl | | 53 (34-72)/0.65 (0.51-0.85) | 46 (34-57)/0.73 (0.61-0.92) | 51 (33-65)/0.66 (0.56-0.87) | 13 (7-19)/0.9 (0.64-1) |
| 500 | 50 | 2/3 | Elastic Net | | | 54 (45-66)/0.69 (0.56-0.82) | 66 (41-99)/0.54 (0.34-0.69) | 16 (10-23)/0.8 (0.48-1) |
| 500 | 50 | 2/3 | BSS Enet | | | | 50 (37-60)/0.7 (0.6-0.82) | 14 (7-20)/0.92 (0.61-1) |
| 500 | 50 | 2/3 | CoxBoost | | | | | 14 (7-21)/0.87 (0.62-1) |
| 750 | 12 | 2/3 | LASSO-cvl | 11 (8-18)/0.82 (0.5-1) | 23 (12-49)/0.42 (0.2-0.79) | 20 (11-33)/0.5 (0.3-0.92) | 21 (12-47)/0.45 (0.21-0.77) | 8 (5-11)/1 (0.73-1) |
| 750 | 12 | 2/3 | LASSO-pcvl | | 11 (8-18)/0.82 (0.5-1) | 11 (8-17)/0.83 (0.53-1) | 11 (8-18)/0.82 (0.5-1) | 7 (5-10)/1 (0.86-1) |
| 750 | 12 | 2/3 | Elastic Net | | | 22 (13-35)/0.46 (0.29-0.85) | 24 (13-49)/0.41 (0.2-0.71) | 8 (5-13)/1 (0.69-1) |
| 750 | 12 | 2/3 | BSS Enet | | | | 20 (12-32)/0.47 (0.28-0.83) | 8 (5-12)/1 (0.73-1) |
| 750 | 12 | 2/3 | CoxBoost | | | | | 8 (5-12)/1 (0.7-1) |
| 750 | 50 | 2/3 | LASSO-cvl | 46 (31-59)/0.8 (0.62-0.97) | 80 (58-111)/0.52 (0.39-0.72) | 58 (46-70)/0.69 (0.59-0.85) | 68 (43-87)/0.59 (0.45-0.84) | 16 (12-21)/0.94 (0.76-1) |
| 750 | 50 | 2/3 | LASSO-pcvl | | 46 (31-59)/0.8 (0.62-0.97) | 44 (30-54)/0.83 (0.69-0.97) | 46 (31-58)/0.8 (0.65-0.97) | 14 (9-20)/1 (0.84-1) |
| 750 | 50 | 2/3 | Elastic Net | | | 62 (51-78)/0.68 (0.59-0.84) | 69 (43-89)/0.59 (0.44-0.84) | 17 (14-23)/0.94 (0.74-1) |
| 750 | 50 | 2/3 | BSS Enet | | | | 55 (40-67)/0.71 (0.6-0.87) | 16 (12-22)/0.94 (0.81-1) |
| 750 | 50 | 2/3 | CoxBoost | | | | | 15 (12-21)/0.94 (0.8-1) |
| 1000 | 12 | 2/3 | LASSO-cvl | 15 (9-25)/0.62 (0.4-1) | 42 (20-74)/0.24 (0.12-0.5) | 33 (19-55)/0.29 (0.16-0.53) | 37 (14-67)/0.25 (0.15-0.57) | 7 (3-9)/1 (0.83-1) |
| 1000 | 12 | 2/3 | LASSO-pcvl | | 15 (9-25)/0.62 (0.4-1) | 14 (9-25)/0.62 (0.4-1) | 14 (9-25)/0.62 (0.4-1) | 6 (3-9)/1 (0.8-1) |
| 1000 | 12 | 2/3 | Elastic Net | | | 38 (21-59)/0.28 (0.18-0.52) | 41 (14-70)/0.23 (0.14-0.57) | 7 (4-10)/1 (0.83-1) |
| 1000 | 12 | 2/3 | BSS Enet | | | | 35 (14-53)/0.28 (0.17-0.57) | 7 (4-10)/1 (0.75-1) |
| 1000 | 12 | 2/3 | CoxBoost | | | | | 7 (4-9)/1 (0.83-1) |
| 1000 | 50 | 2/3 | LASSO-cvl | 57 (45-73)/0.76 (0.6-0.91) | 100 (75-144)/0.45 (0.31-0.59) | 73 (59-89)/0.61 (0.5-0.74) | 78 (53-110)/0.56 (0.39-0.77) | 15 (11-18)/1 (0.88-1) |
| 1000 | 50 | 2/3 | LASSO-pcvl | | 57 (46-73)/0.75 (0.6-0.91) | 55 (45-68)/0.79 (0.63-0.91) | 55 (45-71)/0.76 (0.61-0.91) | 15 (11-18)/1 (0.92-1) |
| 1000 | 50 | 2/3 | Elastic Net | | | 78 (65-93)/0.59 (0.5-0.72) | 79 (53-112)/0.55 (0.38-0.77) | 16 (12-18)/1 (0.81-1) |
| 1000 | 50 | 2/3 | BSS Enet | | | | 67 (53-82)/0.66 (0.53-0.83) | 15 (11-18)/1 (0.92-1) |
| 1000 | 50 | 2/3 | CoxBoost | | | | | 15 (11-18)/1 (0.92-1) |
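For one training sample and one pair of methods, the two quantities in each cell can be sketched as follows: the number of genes selected by both methods, and the share of those common genes that are true predictors. The function name `overlap_and_tpr` is hypothetical, and returning `None` when the two lists share no gene is an assumed convention.

```python
from typing import Iterable, Optional, Set, Tuple


def overlap_and_tpr(
    selected_a: Iterable[str], selected_b: Iterable[str], truth: Set[str]
) -> Tuple[int, Optional[float]]:
    """Return (number of common genes, true-positive rate among common genes)."""
    common = set(selected_a) & set(selected_b)
    if not common:
        # No shared gene: the rate among common genes is undefined.
        return 0, None
    return len(common), len(common & truth) / len(common)
```

For example, if two methods share four genes and three of them are true predictors, the cell for that sample would read 4 common genes with a rate of 0.75.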
Figure 1. C-index associated with the risk score for each selection method and fraction of the training data (fapp), by sample size: (a) N = 500, (b) N = 750, and (c) N = 1000, for simulated datasets with q = 12.
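The C-index summarizes how well a risk score orders patients by survival time. The source does not state which estimator was used, so the sketch below assumes Harrell's concordance index for right-censored data. A pair is comparable when the patient with the shorter time had an observed event, and ties in risk count as half-concordant. The function name `harrell_c_index` and the handling of tied times (such pairs are skipped) are assumptions.

```python
from typing import Sequence


def harrell_c_index(
    time: Sequence[float], event: Sequence[int], risk: Sequence[float]
) -> float:
    """Harrell's C: share of comparable pairs where higher risk means shorter survival."""
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue  # a censored patient cannot be the "earlier event" of a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to a risk score with no discriminative ability, and 1 to a perfect ordering.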
Figure 2. Brier score associated with the risk score for each selection method and fraction of the training data (fapp), by sample size: (a) N = 500, (b) N = 750, and (c) N = 1000, for simulated datasets with q = 12.
Number of genes that overlap between methods for breast cancer dataset (sample results).
| | LASSO-cvl | LASSO-pcvl | Elastic Net | BSS Enet | CoxBoost | RSF |
|---|---|---|---|---|---|---|
| LASSO-cvl | | 10 | 54 | 32 | 49 | 7 |
| LASSO-pcvl | | | 12 | 9 | 12 | 2 |
| Elastic Net | | | | 38 | 56 | 9 |
| BSS Enet | | | | | 34 | 7 |
| CoxBoost | | | | | | 8 |
| RSF | | | | | | |
Number of selected predictors and gene frequency occurrence for breast cancer dataset and each selection method (resampling results).
| Measure | Training fraction | LASSO-cvl | LASSO-pcvl | Elastic Net | BSS Enet | CoxBoost | RSF |
|---|---|---|---|---|---|---|---|
| Number of selected predictors | 1/2 | 26 (4-71) | 11 (0-59) | 47 (11-100) | 30 (17-49) | 26 (0-56) | 82 (71-93) |
| Number of selected predictors | 2/3 | 40 (6-70) | 14 (4-53) | 58 (14-104) | 41 (31-62) | 40 (5-70) | 85 (72-93) |
| Occurrence frequency | 1/2 | 2 (1-65) | 2 (1-57) | 2 (1-73) | 2 (1-77) | 2 (1-64) | 5 (1-57) |
| Occurrence frequency | 2/3 | 3 (1-92) | 2 (1-73) | 3 (1-95) | 3 (1-98) | 3 (1-91) | 5 (1-61) |
Figure 3. Prognostic performance for each selection method and fraction of the training data (fapp) for the breast cancer dataset: (a) C-index and (b) Brier score associated with the risk score.