Maryam Farhadian, Paulo J G Lisboa, Abbas Moghimbeigi, Jalal Poorolajal, Hossein Mahjub.
Abstract
In microarray studies, the number of samples is small relative to the number of genes measured per sample. An important aim of such studies is to predict patient survival from gene expression profiles, which calls for a dimension reduction procedure used together with the survival prediction model. This study presents a new method that combines wavelet approximation coefficients with Cox regression. The proposed method was compared with the supervised principal component and supervised partial least squares methods. Cox models were fitted to the data using supervised wavelet approximation coefficients, the top supervised principal components, and partial least squares components. The Cox model based on supervised wavelet feature extraction showed better prediction performance than the models based on supervised principal components and partial least squares components. These results suggest that wavelet-based tools could be developed for the dimensionality reduction of microarray data sets in the context of survival analysis.
Year: 2014 PMID: 25538955 PMCID: PMC4235600 DOI: 10.1155/2014/618412
Source DB: PubMed Journal: ScientificWorldJournal ISSN: 1537-744X
Figure 1: The 1D wavelet decomposition process.
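The 1D decomposition named in Figure 1 can be illustrated with a minimal sketch. It assumes the Haar wavelet, which is an illustrative choice only, since this page does not say which wavelet family or decomposition level the authors used. The names `haar_step`, `approximation_coefficients`, and `profile` are hypothetical. Each level splits the signal into approximation (low-pass) and detail (high-pass) coefficients, and only the approximation branch is decomposed further, halving the number of features each time.

```python
import math


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Assumption: pad odd-length input by repeating the last value.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def approximation_coefficients(signal, levels):
    """Apply haar_step repeatedly, keeping only the approximation branch."""
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, _detail = haar_step(approx)
    return approx


# Hypothetical expression profile of 8 preselected genes for one patient.
profile = [2.0, 4.0, 6.0, 8.0, 1.0, 3.0, 5.0, 7.0]
features = approximation_coefficients(profile, levels=2)
```

In a pipeline like the one the abstract describes, a reduced vector such as `features` would be computed per patient and passed to a Cox regression as covariates; that fitting step is not shown here.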
Performance of different Cox models for simulated dataset.
| # Gene | Method | C index ± se | CPE ± se | | LR ± se | IBS ± se |
|---|---|---|---|---|---|---|
| 40 | Supervised wavelet | 0.924 ± 0.002 | 0.904 ± 0.003 | 0.766 ± 0.006 | 96.906 ± 1.729 | 0.153 ± 0.000 |
| | Supervised PCA | 0.907 ± 0.002 | 0.850 ± 0.003 | 0.709 ± 0.003 | 81.564 ± 0.700 | 0.153 ± 0.000 |
| | Supervised PLS | 0.919 ± 0.005 | 0.865 ± 0.005 | 0.739 ± 0.005 | 89.083 ± 1.311 | 0.155 ± 0.000 |
| 30 | Supervised wavelet | 0.914 ± 0.002 | 0.877 ± 0.004 | 0.720 ± 0.009 | 83.313 ± 2.284 | 0.150 ± 0.000 |
| | Supervised PCA | 0.897 ± 0.003 | 0.842 ± 0.014 | 0.684 ± 0.007 | 76.448 ± 1.410 | 0.151 ± 0.004 |
| | Supervised PLS | 0.910 ± 0.003 | 0.853 ± 0.016 | 0.711 ± 0.008 | 82.436 ± 1.791 | 0.151 ± 0.004 |
| 20 | Supervised wavelet | 0.899 ± 0.006 | 0.837 ± 0.030 | 0.682 ± 0.005 | 72.253 ± 2.233 | 0.153 ± 0.003 |
| | Supervised PCA | 0.886 ± 0.004 | 0.827 ± 0.025 | 0.648 ± 0.009 | 69.357 ± 1.873 | 0.154 ± 0.004 |
| | Supervised PLS | 0.895 ± 0.003 | 0.835 ± 0.027 | 0.669 ± 0.011 | 73.691 ± 2.273 | 0.154 ± 0.003 |
| 10 | Supervised wavelet | 0.870 ± 0.006 | 0.823 ± 0.023 | 0.618 ± 0.013 | 65.800 ± 1.419 | 0.154 ± 0.004 |
| | Supervised PCA | 0.855 ± 0.011 | 0.810 ± 0.002 | 0.582 ± 0.008 | 58.072 ± 1.845 | 0.154 ± 0.003 |
| | Supervised PLS | 0.866 ± 0.009 | 0.818 ± 0.001 | 0.609 ± 0.009 | 62.484 ± 1.767 | 0.156 ± 0.003 |
Performance of different Cox models for DLBCL dataset.
| # Gene | Method | C index ± se | CPE ± se | | LR ± se | IBS ± se |
|---|---|---|---|---|---|---|
| 40 | Supervised wavelet | 0.755 ± 0.005 | 0.744 ± 0.004 | 0.401 ± 0.011 | 78.739 ± 1.815 | 0.237 ± 0.007 |
| | Supervised PCA | 0.711 ± 0.004 | 0.695 ± 0.003 | 0.270 ± 0.000 | 42.636 ± 1.762 | 0.245 ± 0.005 |
| | Supervised PLS | 0.723 ± 0.003 | 0.698 ± 0.003 | 0.294 ± 0.007 | 55.883 ± 1.449 | 0.250 ± 0.005 |
| 30 | Supervised wavelet | 0.723 ± 0.005 | 0.727 ± 0.007 | 0.325 ± 0.013 | 70.303 ± 2.618 | 0.244 ± 0.004 |
| | Supervised PCA | 0.709 ± 0.004 | 0.692 ± 0.003 | 0.262 ± 0.008 | 42.087 ± 1.825 | 0.245 ± 0.003 |
| | Supervised PLS | 0.713 ± 0.002 | 0.697 ± 0.002 | 0.289 ± 0.007 | 54.898 ± 1.418 | 0.251 ± 0.004 |
| 20 | Supervised wavelet | 0.730 ± 0.002 | 0.714 ± 0.002 | 0.323 ± 0.009 | 59.708 ± 2.699 | 0.243 ± 0.004 |
| | Supervised PCA | 0.709 ± 0.003 | 0.688 ± 0.003 | 0.260 ± 0.008 | 41.327 ± 2.079 | 0.245 ± 0.003 |
| | Supervised PLS | 0.719 ± 0.002 | 0.696 ± 0.003 | 0.282 ± 0.006 | 53.130 ± 1.486 | 0.249 ± 0.004 |
| 10 | Supervised wavelet | 0.703 ± 0.004 | 0.686 ± 0.005 | 0.255 ± 0.007 | 49.838 ± 1.832 | 0.248 ± 0.003 |
| | Supervised PCA | 0.699 ± 0.005 | 0.686 ± 0.003 | 0.254 ± 0.013 | 41.056 ± 2.045 | 0.252 ± 0.004 |
| | Supervised PLS | 0.701 ± 0.003 | 0.684 ± 0.003 | 0.255 ± 0.007 | 45.648 ± 2.241 | 0.254 ± 0.006 |
Performance of different Cox models for lung cancer dataset.
| # Gene | Method | C index ± se | CPE ± se | | LR ± se | IBS ± se |
|---|---|---|---|---|---|---|
| 20 | Supervised wavelet | 0.923 ± 0.005 | 0.876 ± 0.007 | 0.582 ± 0.014 | 54.986 ± 2.130 | 0.328 ± 0.015 |
| | Supervised PCA | 0.892 ± 0.003 | 0.796 ± 0.010 | 0.471 ± 0.014 | 38.609 ± 1.637 | 0.353 ± 0.009 |
| | Supervised PLS | 0.909 ± 0.005 | 0.801 ± 0.005 | 0.498 ± 0.008 | 40.77 ± 1.439 | 0.365 ± 0.011 |
| 15 | Supervised wavelet | 0.905 ± 0.004 | 0.846 ± 0.005 | 0.531 ± 0.007 | 45.466 ± 1.838 | 0.343 ± 0.007 |
| | Supervised PCA | 0.894 ± 0.003 | 0.801 ± 0.007 | 0.469 ± 0.010 | 38.263 ± 1.678 | 0.349 ± 0.007 |
| | Supervised PLS | 0.900 ± 0.002 | 0.803 ± 0.005 | 0.483 ± 0.008 | 39.954 ± 1.382 | 0.353 ± 0.009 |
| 10 | Supervised wavelet | 0.889 ± 0.006 | 0.813 ± 0.006 | 0.462 ± 0.018 | 38.357 ± 1.641 | 0.330 ± 0.010 |
| | Supervised PCA | 0.878 ± 0.005 | 0.784 ± 0.009 | 0.441 ± 0.008 | 34.217 ± 1.671 | 0.335 ± 0.008 |
| | Supervised PLS | 0.885 ± 0.003 | 0.788 ± 0.004 | 0.448 ± 0.007 | 36.087 ± 1.356 | 0.350 ± 0.007 |
| 5 | Supervised wavelet | 0.873 ± 0.006 | 0.795 ± 0.005 | 0.429 ± 0.001 | 31.906 ± 1.786 | 0.297 ± 0.007 |
| | Supervised PCA | 0.853 ± 0.005 | 0.775 ± 0.006 | 0.387 ± 0.012 | 29.241 ± 1.784 | 0.315 ± 0.006 |
| | Supervised PLS | 0.858 ± 0.005 | 0.771 ± 0.006 | 0.386 ± 0.010 | 29.650 ± 1.313 | 0.323 ± 0.006 |
Figure 2: Box plot of the difference in model evaluation criteria between the supervised wavelet and the two other methods for the simulated dataset with different numbers of preselected genes.
Figure 3: Box plot of the difference in model evaluation criteria between the supervised wavelet and the two other methods for the DLBCL dataset with different numbers of preselected genes.
Figure 4: Box plot of the difference in model evaluation criteria between the supervised wavelet and the two other methods for the lung cancer dataset with different numbers of preselected genes.
Performance of different Cox models for lung cancer dataset (clinical + genomic data).
| # Gene | Method | C index ± se | CPE ± se | | LR ± se | IBS ± se |
|---|---|---|---|---|---|---|
| 20 | Supervised wavelet | 0.949 ± 0.006 | 0.924 ± 0.010 | 0.669 ± 0.031 | 72.304 ± 2.589 | 0.431 ± 0.007 |
| | Supervised PCA | 0.907 ± 0.008 | 0.844 ± 0.009 | 0.553 ± 0.033 | 52.020 ± 2.208 | 0.432 ± 0.007 |
| | Supervised PLS | 0.914 ± 0.007 | 0.849 ± 0.009 | 0.564 ± 0.035 | 53.814 ± 2.366 | 0.435 ± 0.009 |
| 15 | Supervised wavelet | 0.916 ± 0.005 | 0.855 ± 0.011 | 0.558 ± 0.031 | 56.318 ± 3.017 | 0.433 ± 0.010 |
| | Supervised PCA | 0.903 ± 0.007 | 0.836 ± 0.010 | 0.540 ± 0.034 | 53.478 ± 2.585 | 0.435 ± 0.009 |
| | Supervised PLS | 0.908 ± 0.007 | 0.842 ± 0.012 | 0.552 ± 0.041 | 55.526 ± 2.398 | 0.435 ± 0.006 |
| 10 | Supervised wavelet | 0.906 ± 0.006 | 0.848 ± 0.008 | 0.552 ± 0.027 | 52.746 ± 2.872 | 0.426 ± 0.006 |
| | Supervised PCA | 0.892 ± 0.009 | 0.831 ± 0.008 | 0.521 ± 0.029 | 48.092 ± 2.119 | 0.426 ± 0.007 |
| | Supervised PLS | 0.905 ± 0.009 | 0.842 ± 0.009 | 0.542 ± 0.031 | 51.472 ± 2.562 | 0.430 ± 0.005 |
| 5 | Supervised wavelet | 0.895 ± 0.008 | 0.818 ± 0.011 | 0.499 ± 0.036 | 51.472 ± 2.760 | 0.352 ± 0.008 |
| | Supervised PCA | 0.883 ± 0.009 | 0.803 ± 0.010 | 0.445 ± 0.042 | 46.336 ± 2.113 | 0.359 ± 0.008 |
| | Supervised PLS | 0.879 ± 0.007 | 0.814 ± 0.010 | 0.481 ± 0.029 | 49.976 ± 2.152 | 0.355 ± 0.006 |
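
To help read the C index column in the tables above, here is a minimal sketch of a concordance calculation. It assumes the common Harrell-style definition, because this page does not state which variant the authors computed. The function name `concordance_index` and the toy data are hypothetical. A pair of patients is comparable when the one with the shorter follow-up time had an observed event, and it counts as concordant when that patient also has the higher predicted risk.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C: share of comparable pairs ranked correctly by risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data: survival times, event flags (1 = event, 0 = censored),
# and risk scores such as the linear predictor of a fitted Cox model.
times = [2.0, 4.0, 6.0, 8.0]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the tabulated C index values are read.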