Shahrbanoo Goli, Hossein Mahjub, Javad Faradmal, Hoda Mashayekhi, Ali-Reza Soltanian.
Abstract
The Support Vector Regression (SVR) model has been broadly used for response prediction. However, few researchers have used SVR for survival analysis. In this study, a new SVR model is proposed, and SVR with different kernels and the traditional Cox model are trained. The models are compared using several performance measures. We also select the best subset of features using three feature selection methods: a combination of SVR and statistical tests, univariate feature selection based on the concordance index, and recursive feature elimination. The evaluations are performed using available medical datasets and a Breast Cancer (BC) dataset consisting of 573 patients who visited the Oncology Clinic of Hamadan province in Iran. Results show that, for the BC dataset, survival time can be predicted more accurately by linear SVR than by nonlinear SVR. Based on the three feature selection methods, metastasis status, progesterone receptor status, and human epidermal growth factor receptor 2 status are the features most strongly associated with survival. According to the obtained results, the performance of linear and nonlinear kernels is comparable. The proposed SVR model performs similarly to or slightly better than the other models, and SVR performs similarly to or better than Cox when all features are included in the model.
Year: 2016 PMID: 27882074 PMCID: PMC5108874 DOI: 10.1155/2016/2157984
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
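The abstract names univariate feature selection based on the concordance index (c-index) as one of its three selection methods. A minimal sketch of that idea follows, assuming right-censored data and using each raw feature value as a stand-in for the prediction of a single-feature model (a monotone single-feature model yields a c-index of either c or 1 − c, so the sketch scores each feature by max(c, 1 − c)). The function names, the tie handling, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations


def c_index(times, events, scores):
    """Harrell-style concordance index for right-censored data.

    times  : observed follow-up times
    events : 1 if the event (death) was observed, 0 if censored
    scores : predicted index, assumed here to be higher for longer survival
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Skip tied times (a simplification) and pairs whose shorter
        # time is censored, because their true order is unknown.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if scores[a] < scores[b]:
            concordant += 1.0
        elif scores[a] == scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def rank_features(times, events, features):
    """Rank features by univariate discriminative strength, best first.

    features : dict mapping a feature name to its list of values
    """
    strength = {}
    for name, values in features.items():
        c = c_index(times, events, values)
        strength[name] = max(c, 1.0 - c)  # direction-agnostic score
    order = sorted(strength, key=strength.get, reverse=True)
    return order, strength


# Toy data, made up for illustration only (not from the BC dataset).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 1]
features = {
    "x1": [1, 1, 0, 0, 0],
    "x2": [50, 40, 60, 45, 55],
    "x3": [0, 1, 0, 1, 0],
}
order, strength = rank_features(times, events, features)
print(order, strength)
```

Nested feature subsets such as those on the horizontal axes of the figures could then be formed by taking the first k names of `order` for increasing k.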
Comparison of characteristics of dead and alive patients in the BC dataset, and the relation between the features and the estimated prognostic index based on standard linear SVR.
| Independent variables | Category | Alive | | Dead | | Estimated prognostic index in SVR model | | p value |
|---|---|---|---|---|---|---|---|---|
| | | Count | Mean (SD) | Count | Mean (SD) | Pearson correlation (with prognostic index) | | |
| Age of diagnosis | | 341 | 45.33 (10.26) | 201 | 47.30 (11.64) | −0.22 | | 0.001 |
| | | Count | Percent | Count | Percent | Mean prognostic index | SD prognostic index | |
| Tumor size (cm) | ≤2 | 230 | 70.12 | 98 | 29.88 | −15.98 | 12.18 | 0.313 |
| | 2–5 | 103 | 56.28 | 80 | 43.72 | −16.47 | 12.59 | |
| | >5 | 8 | 25.80 | 23 | 74.2 | −23.18 | 8.12 | |
| Involved lymph nodes | <4 | 198 | 69.72 | 86 | 30.28 | −13.44 | 11.87 | <0.001 |
| | ≥4 | 103 | 62.05 | 63 | 37.95 | −21.14 | 11.32 | |
| | Unknown | 40 | 43.48 | 52 | 56.52 | | | |
| Marital status | Married | 326 | 62.69 | 194 | 37.31 | −16.20 | 12.21 | 0.354 |
| | Single | 15 | 68.18 | 7 | 31.82 | −20.29 | 12.93 | |
| Family history | Yes | 300 | 62.11 | 183 | 37.89 | −16.46 | 7.12 | 0.418 |
| | No | 41 | 69.49 | 18 | 30.51 | −14.94 | 12.58 | |
| Type of surgery | Lumpectomy | 18 | 64.24 | 10 | 35.76 | −13.34 | 9.19 | 0.031 |
| | Radical mastectomy | 250 | 61.27 | 158 | 38.73 | −15.47 | 12.13 | |
| | Segmental mastectomy | 26 | 63.41 | 15 | 36.59 | −18.64 | 15.79 | |
| | Simple mastectomy | 47 | 72.30 | 18 | 27.70 | −22.62 | 9.75 | |
| Histological type | Ductal | 301 | 62.57 | 180 | 37.43 | −17.18 | 12.26 | 0.022 |
| | Lobular | 21 | 61.76 | 13 | 38.24 | −10.63 | 10.69 | |
| | Medullar | 19 | 70.37 | 8 | 29.63 | −10.76 | 10.81 | |
| Metastases status | Present | 51 | 36.42 | 89 | 63.58 | −34.32 | 10.82 | <0.001 |
| | Absent | 290 | 72.13 | 112 | 27.87 | −12.83 | 8.99 | |
| HER2 | Positive | 110 | 65.47 | 58 | 34.53 | −20.89 | 11.62 | <0.001 |
| | Negative | 84 | 60.00 | 56 | 40.00 | −10.02 | 10.09 | |
| | Unknown | 147 | 62.82 | 87 | 37.18 | | | |
| ER status | Positive | 106 | 63.85 | 60 | 36.15 | −17.54 | 12.35 | 0.148 |
| | Negative | 126 | 68.10 | 59 | 31.90 | −15.24 | 12.06 | |
| | Unknown | 109 | 57.06 | 82 | 42.94 | | | |
| PR status | Positive | 85 | 59.44 | 58 | 40.56 | −20.39 | 11.57 | <0.001 |
| | Negative | 142 | 71.71 | 56 | 28.29 | −13.37 | 11.88 | |
| | Unknown | 114 | 56.71 | 87 | 43.29 | | | |
p value < 0.05, p value < 0.01, p value < 0.001.
Performance measures of Cox and SVR models using different kernels and datasets when all features are included in the model. Statistically significant differences between SVR based model 1 (indicated in italics) and the other models are indicated based on the Wilcoxon rank sum test.
| Dataset | Model | Type of kernel | c-index | Log rank test statistic | Hazard ratio |
|---|---|---|---|---|---|
| BC | Cox model | — | 0.61 ± 0.03 | 1.38 ± 1.10 | 1.22 ± 0.11 |
| | *SVR based model 1* | *Linear* | | | |
| | SVR-MRL based model 1 | Linear | 0.62 ± 0.03 | 1.95 ± 1.44 | 1.33 ± 0.15 |
| | SVR based model 2 | Linear | 0.60 ± 0.03 | 1.76 ± 1.33 | 1.31 ± 0.14 |
| | SVR based model 1 | RBF | 0.56 ± 0.03 | 0.52 ± 0.46 | 1.13 ± 0.07 |
| | SVR based model 2 | RBF | 0.54 ± 0.05 | 0.64 ± 0.64 | 1.14 ± 0.11 |
| | SVR based model 1 | Polynomial | 0.59 ± 0.04 | 1.56 ± 1.56 | 1.23 ± 0.17 |
| | SVR based model 2 | Polynomial | 0.57 ± 0.08 | 0.88 ± 0.88 | 1.19 ± 0.23 |
| | SVR based model 1 | Clinical | 0.60 ± 0.02 | 1.47 ± 1.11 | 1.26 ± 0.14 |
| | SVR based model 2 | Clinical | 0.60 ± 0.04 | 1.14 ± 0.97 | 1.27 ± 0.14 |
| CD | Cox model | — | 0.64 ± 0.01 | 20.81 ± 5.33 | 1.53 ± 0.07 |
| | *SVR based model 1* | *Linear* | | | |
| | SVR based model 2 | Linear | 0.64 ± 0.01 | 22.17 ± 6.41 | 1.51 ± 0.07 |
| | SVR-MRL based model 1 | Linear | | | |
| | SVR based model 1 | RBF | 0.61 ± 0.01 | 12.13 ± 3.49 | 1.46 ± 0.08 |
| | SVR based model 2 | RBF | 0.61 ± 0.01 | 11.98 ± 3.98 | 1.43 ± 0.07 |
| | SVR based model 1 | Polynomial | 0.63 ± 0.01 | 18.94 ± 7.05 | 1.56 ± 0.11 |
| | SVR based model 2 | Polynomial | 0.63 ± 0.02 | 14.72 ± 9.65 | 1.54 ± 0.17 |
| | SVR based model 1 | Clinical | | | |
| | SVR based model 2 | Clinical | 0.65 ± 0.01 | 22.28 ± 5.12 | 1.65 ± 0.07 |
| PT | Cox model | — | 0.70 ± 0.05 | 2.13 ± 1.50 | 1.41 ± 0.24 |
| | *SVR based model 1* | *Linear* | | | |
| | SVR based model 2 | Linear | 0.74 ± 0.06 | 3.14 ± 1.92 | 2.14 ± 0.48 |
| | SVR-MRL based model 1 | Linear | | | |
| | SVR based model 1 | RBF | 0.68 ± 0.06 | 0.81 ± 0.75 | 1.45 ± 0.30 |
| | SVR based model 2 | RBF | 0.67 ± 0.05 | 0.84 ± 0.74 | 1.45 ± 0.28 |
| | SVR based model 1 | Polynomial | 0.74 ± 0.05 | 3.69 ± 2.03 | 2.15 ± 0.52 |
| | SVR based model 2 | Polynomial | | | |
| | SVR based model 1 | Clinical | 0.71 ± 0.07 | 1.85 ± 1.67 | 1.83 ± 0.46 |
| | SVR based model 2 | Clinical | 0.70 ± 0.07 | 1.60 ± 1.42 | 1.83 ± 0.48 |
| PD | Cox model | — | 0.82 ± 0.02 | 23.11 ± 5.24 | 2.67 ± 0.31 |
| | *SVR based model 1* | *Linear* | | | |
| | SVR based model 2 | Linear | 0.83 ± 0.01 | 27.40 ± 5.59 | 3.07 ± 0.54 |
| | SVR-MRL based model 1 | Linear | 0.84 ± 0.01 | 26.09 ± 5.82 | 3.14 ± 0.52 |
| | SVR based model 1 | RBF | 0.84 ± 0.02 | 27.93 ± 4.46 | 3.02 ± 0.55 |
| | SVR based model 2 | RBF | 0.84 ± 0.02 | 28.51 ± 4.61 | 3.01 ± 0.59 |
| | SVR based model 1 | Polynomial | 0.84 ± 0.02 | 26.58 ± 4.52 | 3.02 ± 0.52 |
| | SVR based model 2 | Polynomial | 0.84 ± 0.02 | 26.61 ± 4.85 | 3.12 ± 0.49 |
| | SVR based model 1 | Clinical | 0.83 ± 0.01 | 23.92 ± 4.80 | 3.21 ± 0.56 |
| | SVR based model 2 | Clinical | 0.83 ± 0.01 | 25.11 ± 5.23 | 3.14 ± 0.54 |
p value < 0.05, p value < 0.01, p value < 0.001 (Wilcoxon rank sum test).
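The footnote cites the Wilcoxon rank sum test for comparing a reference model against the others. As a rough illustration of how two sets of repeated c-index values might be compared, the sketch below implements the two-sided test with a normal approximation, midranks for ties, and no tie or continuity correction. The source does not say which variant or software was used, so treat these choices, the function names, and the made-up input values as assumptions.

```python
import math


def midranks(values):
    """Return 1-based ranks, averaging the ranks of tied values."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test via the normal approximation.

    Returns (z, p). Positive z means x tends to be larger than y.
    """
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    w = sum(ranks[:n1])  # rank sum of the first sample
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p


# Hypothetical c-index values from repeated train/test splits
# (made up for illustration, not taken from the table).
model_a = [0.62, 0.60, 0.65, 0.61, 0.63]
model_b = [0.56, 0.55, 0.58, 0.57, 0.54]
z, p = rank_sum_test(model_a, model_b)
print(f"z = {z:.3f}, p = {p:.4f}")
```

With samples this small, an exact test would usually be preferable to the normal approximation; the approximation is used here only to keep the sketch dependency-free.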
Figure 1: Performance measures of standard linear SVR and Cox for different subsets of features. Horizontal axis shows the number of features included in the model. (a), (b), and (c): performance measures of univariate method for BC dataset. (d), (e), and (f): performance measures of RFE for BC dataset. (g), (h), and (i): performance measures of univariate method for CD dataset. (j), (k), and (l): performance measures of RFE for CD dataset. BC dataset's abbreviations: M: metastasis, P: PR status, H: HER2, L: number of involved lymph nodes, A: age, T: tumor size, TB: histological type of BC, MS: marital status, E: ER status, TS: type of surgery, and F: family history. CD dataset's abbreviations: S: sex, P: perforation of colon, A: age, AN: adherence to nearby organs, O: obstruction of colon by tumor, TS: time from surgery to registration, T: treatment, D: differentiation of tumor, E: extent of local spread, and N: number of lymph nodes.
Figure 2: Performance measures of linear SVR and Cox for different subsets of features. (a), (b), and (c): performance measures of univariate method for PT dataset. (d), (e), and (f): performance measures of RFE for PT dataset. (g), (h), and (i): performance measures of univariate method for PD dataset. (j), (k), and (l): performance measures of RFE for PD dataset. B: bilirubin, A: age, U: urine copper, ST: stage, AA: aspartate aminotransferase, H: presence of hepatomegaly, C: cholesterol, CT: standardized blood clotting time, P: platelet count, AL: albumin, BV: blood vessel malformations in the skin, TG: triglycerides, AP: alkaline phosphatase, PA: presence of ascites, E: edema, T: treatment, and S: sex.
Performance measures of SVR and Cox models for simulated experiments with different numbers of features.
| Number of features | Model | c-index | Log rank test statistic | Hazard ratio |
|---|---|---|---|---|
| 10 | SVR | 0.562 ± 0.015 | 12.281 ± 4.955 | 1.240 ± 0.063 |
| | Cox | 0.560 ± 0.013 | 10.662 ± 5.234 | 1.245 ± 0.060 |
| 20 | SVR | 0.552 ± 0.013 | 8.589 ± 5.557 | 1.203 ± 0.048 |
| | Cox | 0.552 ± 0.013 | 9.289 ± 5.990 | 1.194 ± 0.054 |
| 30 | SVR | 0.540 ± 0.010 | 5.070 ± 2.275 | 1.156 ± 0.042 |
| | Cox | 0.541 ± 0.011 | 5.510 ± 3.907 | 1.154 ± 0.041 |
| 40 | SVR | 0.535 ± 0.011 | 4.900 ± 2.907 | 1.117 ± 0.029 |
| | Cox | 0.538 ± 0.011 | 4.367 ± 2.573 | 1.125 ± 0.035 |
| 50 | SVR | 0.540 ± 0.014 | 5.234 ± 3.557 | 1.136 ± 0.054 |
| | Cox | 0.537 ± 0.013 | 3.874 ± 2.681 | 1.120 ± 0.047 |
| 60 | SVR | 0.532 ± 0.012 | 3.474 ± 2.748 | 1.105 ± 0.039 |
| | Cox | 0.528 ± 0.015 | 3.723 ± 2.662 | 1.112 ± 0.052 |
| 70 | SVR | 0.535 ± 0.012 | 4.134 ± 2.304 | 1.123 ± 0.036 |
| | Cox | 0.528 ± 0.016 | 2.774 ± 1.925 | 1.096 ± 0.044 |
| 80 | SVR | 0.530 ± 0.010 | 3.171 ± 2.597 | 1.120 ± 0.036 |
| | Cox | 0.526 ± 0.010 | 1.687 ± 1.323 | 1.079 ± 0.041 |
| 90 | SVR | 0.526 ± 0.013 | 2.846 ± 2.219 | 1.098 ± 0.030 |
| | Cox | 0.521 ± 0.012 | 1.720 ± 1.492 | 1.070 ± 0.041 |
| 100 | SVR | 0.524 ± 0.012 | 1.594 ± 1.386 | 1.085 ± 0.045 |
| | Cox | 0.517 ± 0.014 | 1.226 ± 1.033 | 1.054 ± 0.030 |
| 110 | SVR | 0.535 ± 0.012 | 3.738 ± 3.335 | 1.101 ± 0.048 |
| | Cox | 0.515 ± 0.008 | 0.870 ± 0.800 | 1.057 ± 0.024 |
| 120 | SVR | 0.527 ± 0.012 | 1.863 ± 1.581 | 1.091 ± 0.035 |
| | Cox | 0.515 ± 0.015 | 1.092 ± 0.964 | 1.058 ± 0.033 |
p value < 0.05, p value < 0.01, p value < 0.001 (Wilcoxon rank sum test).
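The tables report a log rank test statistic for each model. One common way to obtain such a statistic from a continuous prediction is to split patients into two risk groups and compare their survival; the sketch below assumes a split at the median predicted prognostic index, with lower values treated as higher risk. The source text here does not state how the groups were formed, so the split rule, the function names, and the toy inputs are all assumptions.

```python
from statistics import median


def split_at_median(scores):
    """Label each patient 1 (high risk, score below median) or 0."""
    cut = median(scores)
    return [1 if s < cut else 0 for s in scores]


def logrank_statistic(times, events, groups):
    """Two-group log rank chi-square statistic (1 degree of freedom).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if censored
    groups : 0/1 group label per patient
    """
    observed = expected = variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        # Patients still under observation just before time t.
        at_risk = [g for s, g in zip(times, groups) if s >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # at risk in group 1
        d = sum(1 for s, e in zip(times, events) if s == t and e)
        d1 = sum(
            1 for s, e, g in zip(times, events, groups)
            if s == t and e and g == 1
        )
        observed += d1
        expected += d * n1 / n
        if n > 1:
            # Hypergeometric variance of the group-1 event count.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("statistic undefined: no variation between groups")
    return (observed - expected) ** 2 / variance


# Toy example, made up for illustration only.
times = [1, 2, 3, 4]
events = [1, 1, 1, 1]
scores = [0.1, 0.2, 0.8, 0.9]  # hypothetical predicted prognostic index
groups = split_at_median(scores)
print(groups, logrank_statistic(times, events, groups))
```

A larger statistic indicates that the two predicted risk groups have more clearly separated survival experience.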