Leili Tapak, Massoud Saidijam, Majid Sadeghifar, Jalal Poorolajal, Hossein Mahjub.
Abstract
Analysis of microarray data is associated with the methodological problems of high dimension and small sample size. Various methods have been used for variable selection in high-dimension and small sample size cases with a single survival endpoint. However, little effort has been directed toward addressing competing risks, where there is more than one type of failure. This study compared three typical variable selection techniques, namely Lasso, elastic net, and likelihood-based boosting, for high-dimensional time-to-event data with competing risks. The performance of these methods was evaluated via a simulation study and by analyzing a real dataset related to bladder cancer patients, using time-dependent receiver operating characteristic (ROC) curves and bootstrap .632+ prediction error curves. The elastic net penalization method was shown to outperform Lasso and boosting. Based on the elastic net, 33 genes out of 1381 genes related to bladder cancer were selected. When fitted with the Fine and Gray model, eight genes were highly significant (P<0.001). Among them, expression of RTN4, SON, IGF1R, SNRPE, PTGR1, PLEK, and ETFDH was associated with a decrease in survival time, whereas SMARCAD1 expression was associated with an increase in survival time. This study indicates that the elastic net has a higher capacity than the Lasso and boosting for the prediction of survival time in bladder cancer patients. Moreover, the genes selected by each of the methods improved predictive power relative to the model based only on clinical variables, indicating the value of the information contained in the microarray features.
Keywords: Cause-specific hazard; Competing risks; Elastic net; Lasso; Microarray; Subdistribution hazard
Year: 2015 PMID: 25907251 PMCID: PMC4563215 DOI: 10.1016/j.gpb.2015.04.001
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Genes selected by the three methods for the bladder cancer event in the Dyrskjøt dataset
| Gene | Elastic net | Lasso | Boosting |
| --- | --- | --- | --- |
| SEQ1014 | + | − | − |
| SEQ1038 | − | + | − |
| SEQ1082 | + | − | − |
| SEQ1111 | + | − | − |
| SEQ1164 | + | − | − |
| SEQ1197 | + | − | − |
| SEQ1225 | + | − | − |
| SEQ1226 | + | − | − |
| SEQ1262 | + | − | − |
| SEQ1330 | + | − | − |
| SEQ1381 | + | − | + |
| SEQ1384 | + | − | + |
| SEQ162 | + | − | + |
| SEQ164 | + | + | + |
| SEQ183 | + | − | − |
| SEQ213 | + | − | − |
| SEQ240 | + | − | − |
| SEQ251 | − | + | − |
| SEQ265 | + | − | + |
| SEQ279 | + | − | − |
| SEQ287 | + | − | − |
| SEQ34 | + | − | + |
| SEQ347 | + | + | + |
| SEQ370 | + | + | − |
| SEQ377 | + | − | − |
| SEQ410 | + | − | − |
| SEQ424 | − | + | − |
| SEQ634 | + | − | − |
| SEQ681 | + | − | − |
| SEQ785 | + | − | − |
| SEQ813 | + | − | − |
| SEQ820 | + | − | + |
| SEQ833 | + | − | − |
| SEQ940 | + | − | − |
| SEQ972 | + | − | − |
| SEQ973 | + | − | − |
| Total | 33 | 6 | 8 |
Note: Genes for the bladder cancer event in the Dyrskjøt dataset [1] were selected using the three methods. “+” indicates that the gene was selected by the respective method; “−” indicates that it was not.
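The tallies in the last row of the table can be re-derived from the +/− pattern of each gene. The sketch below is a plain transcription of the rows above into Python, with ASCII `-` standing in for “−”. The names `TABLE`, `rows`, `totals`, and `all_three` are illustrative only, and the three pattern positions simply follow the table's column order, so no method names are assumed.

```python
# Selection pattern per gene, transcribed from the table above.
# Each pattern has one character per method column, in table order:
# "+" = selected by that method, "-" = not selected.
TABLE = """
SEQ1014 +--
SEQ1038 -+-
SEQ1082 +--
SEQ1111 +--
SEQ1164 +--
SEQ1197 +--
SEQ1225 +--
SEQ1226 +--
SEQ1262 +--
SEQ1330 +--
SEQ1381 +-+
SEQ1384 +-+
SEQ162 +-+
SEQ164 +++
SEQ183 +--
SEQ213 +--
SEQ240 +--
SEQ251 -+-
SEQ265 +-+
SEQ279 +--
SEQ287 +--
SEQ34 +-+
SEQ347 +++
SEQ370 ++-
SEQ377 +--
SEQ410 +--
SEQ424 -+-
SEQ634 +--
SEQ681 +--
SEQ785 +--
SEQ813 +--
SEQ820 +-+
SEQ833 +--
SEQ940 +--
SEQ972 +--
SEQ973 +--
"""

# Split each non-empty line into (gene, pattern).
rows = [line.split() for line in TABLE.strip().splitlines()]

# Number of genes marked "+" in each of the three method columns.
totals = [sum(pattern[i] == "+" for _, pattern in rows) for i in range(3)]

# Genes marked "+" in every column.
all_three = [gene for gene, pattern in rows if pattern == "+++"]

print(totals)
print(all_three)
```

Here `totals` comes out as `[33, 6, 8]`, which agrees with the last row of the table, and only SEQ164 and SEQ347 carry a “+” in every column.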
Figure 1. The area under the ROC curve for bladder cancer data
The y-axis shows the AUC over time; the x-axis shows survival time, defined as time to progression or death from bladder cancer (in weeks).
Figure 2. The prediction error curves for bladder cancer data
The clinical model used age, sex, stage, grade, and treatment as predictors. The elastic net, Lasso, and boosting models used microarray features in addition to the clinical parameters as predictors.
Genes affecting bladder cancer patients’ survival as selected by elastic net
| Gene | RefSeq accession | Coefficient | Hazard ratio | P value |
| --- | --- | --- | --- | --- |
| SEQ1082 | NM_207521.1 | 0.745 ± 0.250 | 2.11 | 0.00290 |
| SEQ1197 | NM_003103.5 | 1.335 ± 0.364 | 3.80 | 0.00024 |
| SEQ1262 | NM_000875.2 | 1.364 ± 0.510 | 3.85 | 0.00750 |
| SEQ1330 | NM_003094.1 | 0.789 ± 0.193 | 2.2 | 0.00005 |
| SEQ162 | NM_001146108 | 1.386 ± 0.395 | 3.99 | 0.00045 |
| SEQ377 | NM_002664 | 1.058 ± 0.315 | 2.88 | 0.00078 |
| SEQ634 | NM_004453 | 1.400 ± 0.399 | 4.06 | 0.00045 |
| SEQ940 | NM_020159.1 | −1.000 ± 0.348 | 0.37 | 0.00400 |
Note: Genes affecting bladder cancer patients’ survival were selected by elastic net based on Fine and Gray model. Coefficient is indicated as average ± standard error.
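The numeric columns of this table are linked by simple arithmetic, which the following sketch illustrates. It assumes (the source does not state this) that the fourth column is the exponentiated coefficient, that is, a subdistribution hazard ratio under the Fine and Gray model, and that the P values come from two-sided Wald tests with z = coefficient / standard error. The function names are hypothetical.

```python
import math


def hazard_ratio(coef):
    """Exponentiate a regression coefficient (assumed to give the hazard ratio)."""
    return math.exp(coef)


def wald_p_value(coef, se):
    """Two-sided P value of a Wald test, z = coef / se, under a standard normal.

    For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    """
    z = coef / se
    return math.erfc(abs(z) / math.sqrt(2.0))


# (coefficient, standard error) pairs copied from three rows of the table.
examples = {
    "SEQ1082": (0.745, 0.250),
    "SEQ1330": (0.789, 0.193),
    "SEQ940": (-1.000, 0.348),
}

for gene, (coef, se) in examples.items():
    print(f"{gene}: ratio={hazard_ratio(coef):.2f}, p={wald_p_value(coef, se):.5f}")
```

For SEQ1082 this gives a ratio of about 2.11 and a P value close to 0.0029, in line with the tabulated values. A ratio above 1 corresponds to a higher hazard of the event (shorter survival), while the ratio below 1 for SEQ940 corresponds to a lower hazard.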
Results of simulation study using the three methods
| Method | Event type | No. of selected variables | TP (%) | FN (%) | FP (%) | TN (%) |
| --- | --- | --- | --- | --- | --- | --- |
| Elastic net | 1 | 53.89 ± 0.52 | 31.58 | 68.42 | 1.03 | 98.97 |
| Elastic net | 2 | 69.28 ± 0.76 | 31.80 | 68.20 | 1.30 | 98.67 |
| Lasso | 1 | 15.16 ± 0.93 | 12.50 | 87.50 | 0.30 | 99.70 |
| Lasso | 2 | 15.90 ± 0.85 | 16.41 | 83.59 | 0.32 | 99.68 |
| Boosting | 1 | 23.86 ± 0.12 | 13.58 | 86.42 | 0.58 | 99.42 |
| Boosting | 2 | 23.90 ± 0.12 | 16.67 | 83.33 | 0.58 | 99.42 |
Note: Type 1 is the first simulated event and type 2 is the competing event. Number of selected variables is indicated as average ± standard error. TP, true positive, the proportion of correctly-included variables; FN, false negative, the proportion of incorrectly-excluded variables; FP, false positive, the proportion of incorrectly-included variables; TN, true negative, the proportion of correctly-excluded variables.
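As a rough illustration of how such selection rates could be computed for one simulated dataset, the sketch below compares a set of selected variables with the set of truly active ones. It assumes that TP and FN are expressed as percentages of the truly active variables and FP and TN as percentages of the inactive ones, which is consistent with each pair summing to about 100 in the table. The function name `selection_rates` and the toy numbers are hypothetical and do not come from the study.

```python
def selection_rates(selected, truly_active, n_variables):
    """Return TP, FN, FP, and TN as percentages for one variable-selection run.

    TP and FN use the number of truly active variables as denominator;
    FP and TN use the number of truly inactive variables.
    """
    selected = set(selected)
    active = set(truly_active)
    n_active = len(active)
    n_inactive = n_variables - n_active
    if n_active == 0 or n_inactive <= 0:
        raise ValueError("need at least one active and one inactive variable")

    tp = len(selected & active)   # correctly included
    fp = len(selected - active)   # incorrectly included
    return {
        "TP": 100.0 * tp / n_active,
        "FN": 100.0 * (n_active - tp) / n_active,
        "FP": 100.0 * fp / n_inactive,
        "TN": 100.0 * (n_inactive - fp) / n_inactive,
    }


# Toy example (hypothetical): 1000 variables, the first 10 truly active,
# and a method that selects three active and two inactive variables.
rates = selection_rates(
    selected={0, 1, 2, 50, 51},
    truly_active=range(10),
    n_variables=1000,
)
print(rates)
```

Averaging such per-dataset rates over all simulation replicates would give summary figures of the kind reported in the table.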