Leili Tapak, Michael R Kosorok, Majid Sadeghifar, Omid Hamidi, Saeid Afshar, Hassan Doosti.
Abstract
Variable selection and penalized regression models in high-dimensional settings have become an increasingly important topic in many disciplines. For instance, biomedical research generates omics data that may be associated with patient survival and can offer insights into disease dynamics, helping to identify patients with a worse prognosis and to improve therapy. Analysis of high-dimensional time-to-event data in the presence of competing risks requires special modeling techniques. So far, some attempts have been made at variable selection in low- and high-dimensional competing risk settings using partial likelihood-based procedures. In this paper, a weighted likelihood-based penalized approach is extended for direct variable selection under the subdistribution hazards model for high-dimensional competing risk data. The proposed method, which considers a larger class of semiparametric regression models for the subdistribution, allows time-varying effects to be taken into account. This is of particular importance because the proportional hazards assumption may not be valid in general, especially in the high-dimensional setting. The model also relaxes the constraint that the Fine and Gray approach imposes on simultaneously modeling multiple cumulative incidence functions. The performance of several penalties, including the minimax concave penalty (MCP), adaptive LASSO, and smoothly clipped absolute deviation (SCAD), as well as their L2 counterparts, was investigated through simulation studies in terms of sensitivity and specificity. The results revealed that the sensitivity of all penalties was comparable, but the MCP and MCP-L2 penalties outperformed the other methods in terms of selecting fewer noninformative variables. The practical use of the model was investigated through the analysis of genomic competing risk data obtained from patients with bladder cancer. Six genes (CDC20, NCF2, SMARCAD1, RTN4, ETFDH, and SON) were identified by all the methods and were significantly correlated with the subdistribution.
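To make the penalties named in the abstract concrete, the following is a minimal Python sketch of the standard textbook forms of the (adaptive) LASSO, MCP, and SCAD penalties for a single coefficient. The function names, the default tuning values `gamma=3.0` and `a=3.7`, and the `weight` argument are illustrative assumptions and are not taken from the paper; the paper's own implementation and tuning may differ.

```python
def lasso_penalty(beta, lam, weight=1.0):
    """(Adaptive) LASSO penalty: lam * weight * |beta|.

    weight=1.0 gives the plain LASSO; an adaptive LASSO would pass a
    data-driven weight (assumption: supplied by the caller).
    """
    return lam * weight * abs(beta)


def mcp_penalty(beta, lam, gamma=3.0):
    """Minimax concave penalty (standard form, lam > 0, gamma > 1)."""
    t = abs(beta)
    if t <= gamma * lam:
        # Quadratic region: the penalty rate tapers off as |beta| grows.
        return lam * t - t * t / (2.0 * gamma)
    # Flat region: large coefficients receive no additional shrinkage.
    return 0.5 * gamma * lam * lam


def scad_penalty(beta, lam, a=3.7):
    """Smoothly clipped absolute deviation penalty (standard form, a > 2)."""
    t = abs(beta)
    if t <= lam:
        # LASSO-like region near zero.
        return lam * t
    if t <= a * lam:
        # Quadratic transition region.
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    # Flat region for large coefficients.
    return 0.5 * lam * lam * (a + 1.0)


if __name__ == "__main__":
    # Compare how each penalty grows with the coefficient size (lam = 1).
    for b in (0.5, 1.0, 2.0, 5.0):
        print(b, lasso_penalty(b, 1.0), mcp_penalty(b, 1.0), scad_penalty(b, 1.0))
```

Unlike the LASSO, whose penalty keeps growing linearly, MCP and SCAD level off for large coefficients, so strong signals are shrunk less. This property is the usual motivation for comparing them in variable selection studies.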
Year: 2021 PMID: 34589136 PMCID: PMC8476266 DOI: 10.1155/2021/5169052
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Results of the simulation studies for the Fine and Gray model with 5 informative variables (d = 5000) for the ρ = 0.1 scenario. Values shown are means (standard deviations) of each performance measure over 500 replicates (~40% censoring).
| | | 0.2 | | | 0.5 | | | 0.8 | | |
|---|---|---|---|---|---|---|---|---|---|---|
| n | Method | No. selected variables | TPR | FPR | No. selected variables | TPR | FPR | No. selected variables | TPR | FPR |
| 200 | ALASSO | 36.913 (5.119) | 0.816 (0.161) | 0.007 (0.005) | 33.682 (2.255) | 0.969 (0.075) | 0.006 (0.004) | 26.242 (1.796) | 0.988 (0.047) | 0.004 (0.003) |
| | AENET | 36.532 (4.503) | 0.830 (0.174) | 0.006 (0.004) | 33.571 (1.909) | 0.958 (0.089) | 0.006 (0.004) | 26.636 (1.808) | 0.989 (0.046) | 0.004 (0.004) |
| | SCAD | 37.684 (4.484) | 0.858 (0.162) | 0.007 (0.003) | 35.414 (2.691) | 0.972 (0.073) | 0.006 (0.006) | 25.054 (2.136) | 0.993 (0.042) | 0.004 (0.005) |
| | SCAD-L2 | 37.190 (3.171) | 0.867 (0.156) | 0.006 (0.003) | 35.960 (2.742) | 0.960 (0.092) | 0.006 (0.006) | 25.935 (2.232) | 0.994 (0.034) | 0.004 (0.005) |
| | MCP | 27.401 (2.996) | 0.860 (0.142) | 0.004 (0.002) | 25.125 (2.308) | 0.975 (0.068) | 0.004 (0.005) | 23.634 (1.889) | 0.994 (0.034) | 0.004 (0.004) |
| | MCP-L2 | 27.030 (3.103) | 0.849 (0.149) | 0.004 (0.003) | 25.881 (2.115) | 0.971 (0.079) | 0.004 (0.004) | 23.328 (1.643) | 0.995 (0.031) | 0.004 (0.004) |
| | Boosting (Binder) | 41.750 (4.874) | 0.899 (0.140) | 0.007 (0.005) | 39.940 (4.890) | 0.987 (0.052) | 0.007 (0.005) | 38.272 (4.731) | 1.000 (0.000) | 0.007 (0.006) |
| | Oracle | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 |
| 400 | ALASSO | 34.871 (1.430) | 0.963 (0.820) | 0.006 (0.005) | 31.846 (1.196) | 1.000 (0.000) | 0.005 (0.003) | 24.572 (0.891) | 1.000 (0.000) | 0.004 (0.002) |
| | AENET | 34.181 (1.649) | 0.972 (0.075) | 0.006 (0.003) | 31.701 (1.041) | 0.999 (0.014) | 0.005 (0.002) | 24.631 (0.925) | 1.000 (0.000) | 0.004 (0.002) |
| | SCAD | 35.192 (1.821) | 0.987 (0.053) | 0.006 (0.003) | 29.942 (1.679) | 1.000 (0.000) | 0.004 (0.004) | 25.304 (1.260) | 1.000 (0.000) | 0.004 (0.003) |
| | SCAD-L2 | 35.736 (1.805) | 0.980 (0.067) | 0.006 (0.005) | 29.672 (1.470) | 0.998 (0.020) | 0.004 (0.003) | 25.150 (1.164) | 1.000 (0.000) | 0.004 (0.003) |
| | MCP | 24.140 (1.580) | 0.968 (0.073) | 0.004 (0.003) | 20.761 (1.227) | 0.999 (0.014) | 0.003 (0.002) | 18.572 (1.228) | 1.000 (0.000) | 0.003 (0.002) |
| | MCP-L2 | 24.162 (1.468) | 0.972 (0.072) | 0.004 (0.003) | 20.661 (1.105) | 0.998 (0.020) | 0.003 (0.002) | 18.381 (0.916) | 0.999 (0.014) | 0.003 (0.002) |
| | Boosting (Binder) | 39.911 (5.256) | 0.995 (0.031) | 0.007 (0.005) | 39.280 (4.874) | 1.000 (0.000) | 0.007 (0.005) | 38.695 (4.965) | 1.000 (0.000) | 0.007 (0.004) |
| | Oracle | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 |
TPR: true positive rate; FPR: false positive rate; n: sample size.
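As a rough illustration of how the two rates in the table can be computed for a single simulated data set, here is a minimal Python sketch. The function name `selection_rates` and the example index sets are hypothetical; only the definitions of TPR and FPR and the dimensions (5 informative variables out of d = 5000) come from the text above.

```python
def selection_rates(selected, informative, d):
    """Return (TPR, FPR) for one fitted model.

    selected    : indices of variables with nonzero estimated coefficients
    informative : indices of the truly informative variables
    d           : total number of candidate variables
    """
    selected = set(selected)
    informative = set(informative)
    true_pos = len(selected & informative)   # informative variables recovered
    false_pos = len(selected - informative)  # noise variables wrongly selected
    tpr = true_pos / len(informative)
    fpr = false_pos / (d - len(informative))
    return tpr, fpr


# Hypothetical replicate: 4 of the 5 informative variables plus 30 noise variables.
informative = range(5)
selected = list(range(4)) + list(range(100, 130))
tpr, fpr = selection_rates(selected, informative, d=5000)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.4f}")
```

With d = 5000, selecting about 30 noise variables gives an FPR of 30/4995, roughly 0.006, which is the order of magnitude reported in the table.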
Results of the simulation studies for the Fine and Gray model with 5 informative variables (d = 5000) for the ρ = 0.5 scenario. Values shown are means (standard deviations) of each performance measure over 500 replicates (b = 3: ~40% average censoring).
| | | 0.2 | | | 0.5 | | | 0.8 | | |
|---|---|---|---|---|---|---|---|---|---|---|
| n | Method | No. selected variables | TPR | FPR | No. selected variables | TPR | FPR | No. selected variables | TPR | FPR |
| 200 | ALASSO | 35.702 (1.969) | 0.779 (0.136) | 0.006 (0.004) | 36.054 (2.027) | 0.900 (0.116) | 0.006 (0.004) | 30.261 (2.202) | 0.953 (0.085) | 0.005 (0.005) |
| | AENET | 35.651 (1.766) | 0.775 (0.163) | 0.006 (0.003) | 35.791 (1.745) | 0.891 (0.124) | 0.006 (0.004) | 30.092 (1.584) | 0.937 (0.101) | 0.005 (0.003) |
| | SCAD | 36.763 (3.429) | 0.740 (0.166) | 0.007 (0.006) | 36.723 (2.717) | 0.908 (0.108) | 0.006 (0.006) | 27.101 (2.831) | 0.940 (0.101) | 0.004 (0.003) |
| | SCAD-L2 | 35.742 (2.458) | 0.795 (0.148) | 0.006 (0.005) | 36.783 (3.299) | 0.896 (0.120) | 0.006 (0.005) | 26.511 (2.062) | 0.946 (0.094) | 0.004 (0.003) |
| | MCP | 26.234 (4.008) | 0.690 (0.139) | 0.004 (0.005) | 25.870 (1.983) | 0.847 (0.124) | 0.004 (0.004) | 24.384 (1.805) | 0.922 (0.102) | 0.004 (0.004) |
| | MCP-L2 | 25.691 (2.124) | 0.738 (0.133) | 0.004 (0.005) | 25.921 (2.146) | 0.857 (0.137) | 0.004 (0.004) | 23.531 (2.654) | 0.942 (0.099) | 0.004 (0.004) |
| | Boosting (Binder) | 40.524 (4.171) | 0.966 (0.076) | 0.007 (0.006) | 39.447 (3.187) | 0.994 (0.034) | 0.007 (0.007) | 38.602 (4.211) | 0.990 (0.044) | 0.007 (0.006) |
| | Oracle | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 |
| 400 | ALASSO | 37.795 (1.436) | 0.942 (0.108) | 0.007 (0.003) | 35.522 (0.999) | 0.990 (0.031) | 0.006 (0.002) | 29.581 (0.923) | 1.000 (0.020) | 0.005 (0.002) |
| | AENET | 36.764 (1.534) | 0.946 (0.093) | 0.006 (0.003) | 35.864 (1.137) | 0.990 (0.044) | 0.006 (0.003) | 29.833 (1.234) | 1.000 (0.000) | 0.005 (0.002) |
| | SCAD | 36.510 (1.956) | 0.908 (0.117) | 0.006 (0.004) | 31.754 (1.774) | 0.979 (0.061) | 0.005 (0.003) | 26.813 (1.523) | 0.999 (0.034) | 0.004 (0.003) |
| | SCAD-L2 | 35.722 (1.590) | 0.902 (0.113) | 0.006 (0.003) | 31.122 (1.297) | 0.988 (0.048) | 0.005 (0.003) | 26.181 (1.296) | 0.999 (0.034) | 0.004 (0.003) |
| | MCP | 25.553 (1.517) | 0.874 (0.121) | 0.004 (0.003) | 22.701 (1.364) | 0.963 (0.083) | 0.004 (0.003) | 20.455 (0.869) | 0.992 (0.039) | 0.003 (0.002) |
| | MCP-L2 | 25.382 (1.388) | 0.897 (0.115) | 0.004 (0.003) | 22.571 (0.987) | 0.982 (0.057) | 0.004 (0.003) | 20.482 (1.020) | 0.998 (0.020) | 0.003 (0.002) |
| | Boosting (Binder) | 39.452 (3.770) | 0.994 (0.034) | 0.007 (0.008) | 37.752 (3.066) | 1.000 (0.000) | 0.006 (0.006) | 38.330 (3.714) | 1.000 (0.000) | 0.007 (0.006) |
| | Oracle | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 | 5.000 | 1.000 | 0.000 |
TPR: true positive rate; FPR: false positive rate; n: sample size.
Results of the simulation studies for the proportional odds model (g(x) = log(1 + x)) with 5 informative variables (d = 5000) for the ρ = 0.1 and ρ = 0.5 scenarios. Values shown are means (standard deviations) of each performance measure over 500 replicates (b = 3: ~40% average censoring; I/C = 0.5).
| Method | No. selected variables | TPR | FDR | No. selected variables | TPR | FDR | No. selected variables | TPR | FDR | No. selected variables | TPR | FDR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ALASSO | 37.233 (3.143) | 0.972 (0.073) | 0.006 (0.003) | 32.445 (2.341) | 1.000 (0.000) | 0.005 (0.004) | 38.113 (2.027) | 0.903 (0.087) | 0.006 (0.004) | 33.271 (2.271) | 0.996 (0.033) | 0.006 (0.003) |
| AENET | 37.523 (2.881) | 0.970 (0.077) | 0.006 (0.005) | 32.611 (2.254) | 1.000 (0.000) | 0.005 (0.002) | 35.792 (1.745) | 0.902 (0.124) | 0.006 (0.003) | 34.215 (2.421) | 0.995 (0.041) | 0.006 (0.003) |
| SCAD | 31.231 (2.779) | 0.978 (0.067) | 0.005 (0.004) | 30.472 (2.471) | 1.000 (0.000) | 0.005 (0.002) | 36.723 (2.717) | 0.901 (0.108) | 0.006 (0.005) | 32.344 (2.622) | 0.982 (0.031) | 0.005 (0.003) |
| SCAD-L2 | 32.485 (2.812) | 0.979 (0.063) | 0.005 (0.005) | 30.285 (2.345) | 0.999 (0.002) | 0.005 (0.002) | 36.785 (3.299) | 0.903 (0.120) | 0.006 (0.005) | 33.426 (2.312) | 0.990 (0.027) | 0.006 (0.003) |
| MCP | 27.131 (3.110) | 0.980 (0.061) | 0.004 (0.005) | 24.634 (2.331) | 1.000 (0.000) | 0.004 (0.002) | 25.872 (1.983) | 0.907 (0.124) | 0.004 (0.004) | 25.121 (2.107) | 0.990 (0.033) | 0.004 (0.003) |
| MCP-L2 | 27.743 (3.103) | 0.982 (0.060) | 0.004 (0.005) | 24.411 (2.262) | 1.000 (0.000) | 0.004 (0.002) | 25.921 (2.146) | 0.912 (0.137) | 0.004 (0.004) | 25.423 (1.998) | 0.996 (0.024) | 0.004 (0.003) |
Variable selection results (relative frequency of selection) for different methods for independent variables (~40% censoring) over 500 repetitions.
| | Method | | | | | | | FPR∗ | | | | | | | FPR∗ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.5 | ALASSO | 500 | 500 | 0 | 400 | 500 | 0 | 0.006 | 500 | 500 | 0 | 450 | 500 | 0 | 0.006 |
| | AENET | 500 | 500 | 500 | 500 | 500 | 500 | 0.006 | 500 | 500 | 500 | 500 | 500 | 500 | 0.006 |
| | SCAD | 500 | 500 | 0 | 500 | 450 | 0 | 0.006 | 500 | 500 | 0 | 500 | 500 | 0 | 0.005 |
| | SCAD-L2 | 500 | 500 | 500 | 500 | 500 | 500 | 0.006 | 500 | 500 | 500 | 500 | 500 | 500 | 0.005 |
| | MCP | 500 | 500 | 0 | 450 | 500 | 0 | 0.004 | 500 | 500 | 0 | 500 | 500 | 0 | 0.004 |
| | MCP-L2 | 500 | 500 | 500 | 500 | 500 | 500 | 0.004 | 500 | 500 | 500 | 500 | 500 | 500 | 0.004 |
| | Boosting (Binder) | 500 | 500 | 0 | 500 | 500 | 0 | 0.007 | 500 | 500 | 0 | 500 | 500 | 0 | 0.006 |
∗Average false positive rate (FPR) across all simulations of selection of βj = 0, averaged across all j ∈ {7, ⋯, 5000}.
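The averaged FPR described in the footnote can be sketched as follows, assuming that per-variable selection counts over the replicates are available. The function name `average_fpr`, the 1-based indexing, and the example counts are hypothetical; only the averaging range j ∈ {7, ..., 5000} and the 500 replicates come from the text above.

```python
def average_fpr(selection_counts, n_reps, first_noise=7, d=5000):
    """Average false positive rate over the noise variables.

    selection_counts : dict mapping a 1-based variable index j to the number
                       of replicates in which variable j was selected
                       (missing indices count as never selected)
    n_reps           : number of simulation replicates
    first_noise, d   : noise variables are j = first_noise, ..., d
    """
    n_noise = d - first_noise + 1
    total = 0.0
    for j in range(first_noise, d + 1):
        # Selection frequency of noise variable j across the replicates.
        total += selection_counts.get(j, 0) / n_reps
    return total / n_noise


# Hypothetical counts: 30 noise variables selected in every one of 500 replicates.
counts = {j: 500 for j in range(7, 37)}
fpr_star = average_fpr(counts, n_reps=500)
print(f"average FPR = {fpr_star:.4f}")
```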
Genes selected by the ENET, AENET, and boosting (from Binder et al.'s study) methods for the event of progression or death from bladder cancer in the bladder cancer data.
| Gene ID | GenBank accession no. | Symbol | ALASSO | AENET | SCAD | SCAD-L2 | MCP | MCP-L2 | Boosting | Related to cancer |
|---|---|---|---|---|---|---|---|---|---|---|
| SEQ162 | XM_088569 | PTGR1 | ✓ | ✓ | ✓ | ✓ | Yes | |||
| SEQ164 | XM_088569 | PTGR1 | ✓ | ✓ | Yes | |||||
| SEQ213 | NM_004358 | CDC25B | ✓ | Yes | ||||||
| SEQ227 | NM_007008 | RTN4 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes |
| SEQ240 | NM_016252 | BIRC6 | ✓ | ✓ | Yes | |||||
| SEQ248 | NM_032333 | PRXL2A | ✓ | ✓ | ✓ | Yes | ||||
| SEQ249 | NM_053056 | CCND1 | ✓ | Yes | ||||||
| SEQ264 | NM_001168 | BIRC5 | ✓ | Yes | ||||||
| SEQ265 | NM_001168 | BIRC5 | ✓ | Yes | ||||||
| SEQ279 | XM_027898 | PIF1 | ✓ | ✓ | ✓ | Yes | ||||
| SEQ287 | AK026169 | SLC5A3 | ✓ | ✓ | Yes | |||||
| SEQ34 | NM_000433 | NCF2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes |
| SEQ343 | XM_085721 | IL6STP1 | ✓ | |||||||
| SEQ347 | NM_001129 | AEBP1 | ✓ | ✓ | ✓ | ✓ | ✓ | Yes | ||
| SEQ377 | NM_002664 | PLEK | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes | |
| SEQ392 | NM_001752 | CAT | ✓ | Yes | ||||||
| SEQ497 | NM_004735 | LRRFIP1 | ✓ | ✓ | ✓ | ✓ | Yes | |||
| SEQ522 | M55643 | NFKB1 | ✓ | Yes | ||||||
| SEQ634 | NM_004453 | ETFDH | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes |
| SEQ648 | NM_006225 | PLCD1 | ✓ | ✓ | Yes | |||||
| SEQ650 | NM_021173 | POLD4 | ✓ | Yes | ||||||
| SEQ681 | NM_001607 | ACAA1 | ✓ | ✓ | ✓ | Yes | ||||
| SEQ709 | NM_000089 | COL1A2 | ✓ | Yes | ||||||
| SEQ715 | AA827892 | cDNA clone IMAGE:1367358 3′ | ✓ | |||||||
| SEQ776 | NM_018695.1 | ERBIN | ✓ | ✓ | ✓ | ✓ | ||||
| SEQ820 | NM_005916 | MCM7 | ✓ | ✓ | ✓ | Yes | ||||
| SEQ833 | NM_001255.1 | CDC20 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes |
| SEQ843 | NM_000698.1 | ALOX5 | ✓ | Yes | ||||||
| SEQ847 | NM_018229.2 | MUDENG | ✓ | Yes | ||||||
| SEQ919 | NM_024665.2 | IRA1 | ✓ | ✓ | ✓ | |||||
| SEQ921 | BE382685.1 | cDNA clone IMAGE:3627276 5′ | ✓ | ✓ | ✓ | ✓ | ||||
| SEQ940 | NM_020159.1 | SMARCAD1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes |
| SEQ991 | NM_007373.1 | SHOC2 | ✓ | ✓ | ✓ | |||||
| SEQ1028 | NM_000228.1 | LAMB3 | ✓ | Yes | ||||||
| SEQ1036 | NM_012164.2 | FBXW2 | ✓ | ✓ | ✓ | ✓ | ✓ | Yes | ||
| SEQ1037 | NM_005127.2 | CLEC2B | ✓ | ✓ | ✓ | Yes | ||||
| SEQ1197 | NM_003103.5 | SON | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes |
| SEQ1224 | NM_004060.3 | CCNG1 | ✓ | ✓ | ✓ | ✓ | ✓ | Yes | ||
| SEQ1226 | NM_001921.1 | DCTD | ✓ | Yes | ||||||
| SEQ1262 | NM_000875.2 | IGF1R | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes | |
| SEQ1284 | NM_002757.2 | MAP2K5 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | Yes | |
| SEQ1325 | NM_001085.2 | SERPINA3 | ✓ | ✓ | ✓ | ✓ | Yes | |||
| No. | | | 18 | 19 | 26 | 22 | 20 | 21 | 12 | |
Regression coefficients of the six genes selected by all methods and correlated with the subdistribution hazards of patients with bladder cancer.
| Gene symbol | Sequence | Coefficient (SE) | HR∗ | P value |
|---|---|---|---|---|
| CDC20 | SEQ833 | 0.986 (0.219) | 2.680 | <0.0001 |
| NCF2 | SEQ34 | 0.905 (0.194) | 2.472 | <0.0001 |
| SMARCAD1 | SEQ940 | -0.808 (0.233) | 0.446 | <0.0001 |
| RTN4 | SEQ227 | -0.823 (0.322) | 0.439 | 0.011 |
| ETFDH | SEQ634 | 0.763 (0.321) | 2.145 | 0.018 |
| SON | SEQ1197 | 0.734 (0.246) | 2.083 | 0.003 |
∗HR: hazard ratio.
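The HR column is the exponential of the regression coefficient, which can be checked with a short Python sketch. The coefficients, standard errors, and reported HRs below are copied from the table; the function name `hazard_ratio` and the 95% Wald interval (normal approximation, z = 1.96) are illustrative assumptions, since the source does not report confidence intervals.

```python
import math


def hazard_ratio(coef, se, z=1.96):
    """Return (HR, lower, upper) with HR = exp(coef).

    The interval exp(coef -/+ z * se) is a Wald-type approximation added
    here for illustration; it is not reported in the source table.
    """
    hr = math.exp(coef)
    lower = math.exp(coef - z * se)
    upper = math.exp(coef + z * se)
    return hr, lower, upper


# (gene, coefficient, SE, HR reported in the table)
rows = [
    ("CDC20", 0.986, 0.219, 2.680),
    ("NCF2", 0.905, 0.194, 2.472),
    ("SMARCAD1", -0.808, 0.233, 0.446),
    ("RTN4", -0.823, 0.322, 0.439),
    ("ETFDH", 0.763, 0.321, 2.145),
    ("SON", 0.734, 0.246, 2.083),
]

for gene, coef, se, reported in rows:
    hr, lower, upper = hazard_ratio(coef, se)
    print(f"{gene}: HR = {hr:.3f} (reported {reported}), "
          f"approx. 95% CI {lower:.3f} to {upper:.3f}")
```

A positive coefficient (HR above 1) corresponds to a higher subdistribution hazard as expression increases, and a negative coefficient (HR below 1) to a lower one.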