Ge-Jin Chu, Yong Liang, Jia-Xuan Wang.
Abstract
Variable selection is an important problem in regression, and a number of variable selection methods based on nonconvex penalty functions have been proposed. In this paper, we investigate a novel harmonic regularization method, which can approximate nonconvex Lq (1/2 < q < 1) regularizations, to select key risk factors in Cox's proportional hazards model from microarray gene expression data. The harmonic regularization problem can be solved efficiently with our proposed direct path seeking approach, which produces solutions that closely approximate those for the convex loss function with the nonconvex regularization. Results on simulated datasets and on four real microarray gene expression datasets (two diffuse large B-cell lymphoma (DLBCL) datasets, a lung cancer dataset, and an AML dataset) show that the harmonic regularization method can select variables more accurately than existing Lasso-series methods.
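As a rough illustration of the kind of objective the abstract describes, the sketch below evaluates a Cox negative partial log-likelihood plus a plain Lq penalty in Python with NumPy. It is not the authors' harmonic penalty or their direct path seeking solver, neither of which is specified in this record. The function names, the 1/n scaling, and the assumption that all observed times are distinct are illustrative choices only.

```python
import numpy as np

def neg_log_partial_likelihood(beta, X, time, event):
    """Negative Cox partial log-likelihood, assuming all observed times are distinct."""
    eta = X @ beta                            # linear predictor for each subject
    order = np.argsort(-time)                 # sort subjects by decreasing time
    eta_sorted = eta[order]
    event_sorted = event[order].astype(bool)
    # Running log-sum-exp: entry i covers every subject whose time is >= that of subject i.
    log_risk_set = np.logaddexp.accumulate(eta_sorted)
    return -np.sum(eta_sorted[event_sorted] - log_risk_set[event_sorted])

def lq_penalty(beta, lam, q):
    """Plain L_q penalty lam * sum(|beta_j|^q); a stand-in, not the harmonic penalty."""
    return lam * np.sum(np.abs(beta) ** q)

def penalized_objective(beta, X, time, event, lam, q):
    """Scaled loss plus penalty (the 1/n scaling is an assumption of this sketch)."""
    n = X.shape[0]
    return neg_log_partial_likelihood(beta, X, time, event) / n + lq_penalty(beta, lam, q)

# Toy data (made up for illustration): 3 subjects, 2 covariates, no censoring.
X = np.array([[0.5, -1.0], [1.5, 0.3], [-0.7, 0.8]])
time = np.array([2.0, 5.0, 3.0])
event = np.array([1, 1, 1])
beta = np.array([0.25, -1.0])
value = penalized_objective(beta, X, time, event, lam=0.1, q=0.75)
```

For q below 1 the penalty is nonconvex and not differentiable at zero, which is why a dedicated solver is needed instead of plain gradient descent on this objective.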
Year: 2014 PMID: 25506389 PMCID: PMC4259133 DOI: 10.1155/2014/857398
Source DB: PubMed Journal: Comput Math Methods Med ISSN: 1748-670X Impact factor: 2.238
Average number of variables selected and recovery rate for the seven regularization methods on the simulated data over 500 runs.
| Corr. | Size | | SCAD | MCP | A-Lasso | Len | | HRA |
|---|---|---|---|---|---|---|---|---|
| Average number of variables selected | | | | | | | | |
| ρ = 0.1 | 100 | 33.4 | 11.6 | 9.2 | 17.8 | 35.8 | | |
| | 150 | 29.7 | 9.8 | 7.9 | 12.8 | 31.2 | | 6.3 |
| | 200 | 24.4 | 8.4 | 6.1 | 9.4 | 24.8 | | 5.8 |
| ρ = 0.1 | 100 | 43.2 | 14.8 | 11.7 | 22.9 | 46.9 | 9.9 | |
| | 150 | 34.5 | 11.2 | 8.7 | 16.5 | 36.3 | | 7.3 |
| | 200 | 26.7 | 9.7 | 7.7 | 11.4 | 28.2 | | 7 |
| ρ = 0.5 | 100 | 45.1 | 15.1 | 12.2 | 27.2 | 47.8 | | 10.9 |
| | 150 | 39.3 | 11.9 | 10.8 | 20.1 | 44.3 | 8.4 | |
| | 200 | 27.1 | 10.1 | 8.4 | 12.6 | 30.6 | | 7.7 |
| ρ = 0.5 | 100 | 55.6 | 17.7 | 16 | 32.7 | 56.3 | | 15.2 |
| | 150 | 47.3 | 15.9 | 14.8 | 25.9 | 48.6 | | 10 |
| | 200 | 36.8 | 13.7 | 12.5 | 19.5 | 41.6 | | |
| Recovery rate | | | | | | | | |
| ρ = 0.1 | 100 | 0.14 | 0.43 | 0.54 | 0.28 | 0.13 | 0.71 | |
| | 150 | 0.16 | 0.51 | 0.63 | 0.39 | 0.16 | | 0.79 |
| | 200 | 0.2 | 0.59 | 0.81 | 0.53 | 0.2 | | 0.86 |
| ρ = 0.1 | 100 | 0.11 | 0.33 | 0.42 | 0.21 | 0.1 | 0.5 | |
| | 150 | 0.14 | 0.44 | 0.57 | 0.3 | 0.13 | | 0.63 |
| | 200 | 0.18 | 0.51 | 0.64 | 0.43 | 0.17 | 0.7 | |
| ρ = 0.5 | 100 | 0.11 | 0.33 | 0.4 | 0.18 | 0.1 | | 0.45 |
| | 150 | 0.12 | 0.42 | 0.46 | 0.24 | 0.11 | 0.59 | |
| | 200 | 0.18 | 0.49 | 0.59 | 0.39 | 0.16 | 0.62 | |
| ρ = 0.5 | 100 | 0.08 | 0.28 | 0.31 | 0.15 | 0.08 | | 0.32 |
| | 150 | 0.1 | 0.31 | 0.33 | 0.19 | 0.1 | | 0.5 |
| | 200 | 0.13 | 0.36 | 0.4 | 0.25 | 0.12 | | |
Average integrated Brier score (IBS) and concordance index (CI) results for the seven regularization methods on the simulated data over 500 runs.
| Corr. | Size | | SCAD | MCP | A-Lasso | Len | | HRA |
|---|---|---|---|---|---|---|---|---|
| Average IBS | | | | | | | | |
| ρ = 0.1 | 100 | 0.084 | 0.094 | 0.086 | 0.084 | | 0.091 | 0.086 |
| | 150 | 0.081 | 0.087 | 0.084 | 0.08 | 0.08 | 0.084 | |
| | 200 | 0.078 | 0.086 | 0.079 | 0.083 | | 0.078 | |
| ρ = 0.1 | 100 | 0.096 | | 0.097 | 0.096 | 0.094 | 0.098 | 0.093 |
| | 150 | 0.092 | 0.091 | 0.094 | 0.094 | | 0.089 | 0.09 |
| | 200 | 0.088 | 0.088 | 0.086 | 0.087 | | 0.086 | 0.086 |
| ρ = 0.5 | 100 | 0.105 | 0.098 | 0.101 | 0.102 | 0.097 | 0.101 | |
| | 150 | 0.098 | 0.096 | 0.099 | 0.102 | | 0.098 | 0.091 |
| | 200 | 0.091 | 0.092 | 0.096 | 0.096 | | 0.095 | 0.09 |
| ρ = 0.5 | 100 | 0.108 | 0.103 | 0.108 | 0.106 | 0.099 | 0.01 | |
| | 150 | 0.101 | 0.097 | 0.1 | 0.096 | | 0.094 | 0.097 |
| | 200 | 0.084 | 0.094 | 0.086 | 0.084 | | 0.091 | 0.086 |
| Average CI | | | | | | | | |
| ρ = 0.1 | 100 | 0.749 | 0.788 | 0.822 | 0.757 | | 0.838 | 0.845 |
| | 150 | 0.832 | 0.853 | 0.869 | 0.838 | 0.868 | 0.865 | |
| | 200 | 0.85 | 0.847 | 0.857 | 0.859 | | 0.859 | 0.862 |
| ρ = 0.1 | 100 | 0.728 | 0.758 | | 0.716 | 0.727 | 0.761 | 0.763 |
| | 150 | 0.82 | 0.841 | 0.833 | 0.831 | | 0.847 | 0.837 |
| | 200 | 0.847 | 0.857 | 0.862 | 0.846 | | 0.862 | 0.866 |
| ρ = 0.5 | 100 | 0.726 | | 0.752 | 0.745 | 0.752 | | 0.748 |
| | 150 | 0.781 | 0.818 | 0.821 | 0.793 | 0.819 | 0.813 | |
| | 200 | 0.786 | 0.835 | 0.826 | 0.792 | | 0.824 | 0.828 |
| ρ = 0.5 | 100 | 0.699 | 0.712 | 0.701 | 0.685 | | 0.716 | 0.714 |
| | 150 | 0.766 | 0.777 | 0.817 | 0.788 | 0.814 | | 0.814 |
| | 200 | 0.776 | 0.801 | 0.82 | 0.808 | | 0.819 | 0.819 |
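The CI values above measure how well predicted risk scores rank subjects by survival time. A minimal Python sketch of one common pairwise definition (Harrell's C) follows. The exact CI variant used in the paper is not stated in this record, so the comparable-pair rule and the half credit for tied risk scores are assumptions, and `concordance_index` is a name chosen here for illustration.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Harrell-style C-index: fraction of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    time[i] < time[j]. It is concordant when risk[i] > risk[j]; tied risk
    scores count as half (an assumption of this sketch).
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)

    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        if not event[i]:
            continue                      # censored subjects cannot anchor a pair
        for j in range(n):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy example (made-up values): the last subject is censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
perfect = concordance_index(times, events, risk=[3.0, 2.0, 1.0])
reversed_order = concordance_index(times, events, risk=[1.0, 2.0, 3.0])
uninformative = concordance_index(times, events, risk=[1.0, 1.0, 1.0])
```

Under this definition a value of 0.5 corresponds to uninformative risk scores and 1.0 to a perfect ranking, which gives a scale for reading the table entries.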
Gene expression datasets used in the experiments.
| Dataset | Number of genes | Number of samples | Number of censored |
|---|---|---|---|
| DLBCL (2002) | 7399 | 240 | 102 |
| DLBCL (2003) | 8810 | 92 | 28 |
| Lung cancer | 7129 | 86 | 62 |
| AML | 6283 | 116 | 49 |
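The sample and censoring counts in this table can be turned into censoring proportions with a few lines of Python. This is simple arithmetic on the values listed above; the variable names are illustrative.

```python
# (number of samples, number of censored) per dataset, copied from the table above
datasets = {
    "DLBCL (2002)": (240, 102),
    "DLBCL (2003)": (92, 28),
    "Lung cancer": (86, 62),
    "AML": (116, 49),
}

# Fraction of samples whose survival time is censored
censoring_rate = {
    name: censored / samples for name, (samples, censored) in datasets.items()
}

# Number of samples with an observed event (samples minus censored)
observed_events = {
    name: samples - censored for name, (samples, censored) in datasets.items()
}

for name, rate in censoring_rate.items():
    print(f"{name}: {rate:.1%} censored, {observed_events[name]} observed events")
```

The lung cancer dataset is the most heavily censored, at roughly 72% of samples, so it has the fewest observed events (24) from which to estimate gene effects.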
Number of genes selected by the seven methods on the four public datasets.
| Dataset | | SCAD | MCP | A-Lasso | Len | | HRA |
|---|---|---|---|---|---|---|---|
| DLBCL (2002) | 174 | 129 | 129 | 146 | 180 | 76 | |
| DLBCL (2003) | 138 | 106 | 95 | 168 | 142 | | 37 |
| Lung cancer | 188 | 104 | 97 | 233 | 196 | 56 | |
| AML | 161 | 120 | 110 | 176 | 166 | | 70 |
The best performance is shown in bold.
The IBS and CI results obtained by the seven methods on the four public datasets.
| Datasets | | SCAD | MCP | A-Lasso | Len | | HRA |
|---|---|---|---|---|---|---|---|
| Average IBS | | | | | | | |
| DLBCL (2002) | 0.207 | 0.205 | 0.205 | 0.205 | | 0.203 | 0.205 |
| DLBCL (2003) | 0.121 | 0.119 | 0.12 | 0.12 | 0.12 | | 0.119 |
| Lung cancer | 0.169 | 0.161 | 0.167 | 0.164 | | 0.163 | 0.161 |
| AML | 0.174 | 0.174 | 0.173 | 0.172 | | 0.173 | |
| Average CI | | | | | | | |
| DLBCL (2002) | 0.553 | 0.554 | 0.564 | 0.555 | | 0.563 | 0.566 |
| DLBCL (2003) | 0.583 | 0.604 | 0.586 | 0.589 | | 0.603 | |
| Lung cancer | 0.628 | 0.634 | 0.666 | 0.646 | | 0.673 | 0.674 |
| AML | 0.599 | 0.611 | 0.634 | 0.626 | 0.641 | 0.638 | |
The best performance is shown in bold.