Haiwei Shen, Hua Chai, Meiping Li, Zhiming Zhou, Yong Liang, Ziyi Yang, Haihui Huang, Xiaoying Liu, Bowen Zhang.
Abstract
Identifying biomarker genes related to disease from high-dimensional, low-sample-size gene expression data is a common problem, and various regression approaches with different regularization methods have been proposed to solve it. However, the high noise in biological data significantly reduces the performance of these methods. The accelerated failure time (AFT) model was designed for gene selection and survival time estimation in cancer survival analysis. In this article, we propose a novel robust sparse accelerated failure time model (RS-AFT) that combines the least absolute deviation (LAD) loss with Lq regularization. An iterative weighted linear programming algorithm that requires no regularization parameter tuning is proposed to solve the RS-AFT model. Experimental results show that our method performs better in both gene selection and survival time estimation than widely used regularization methods such as lasso, elastic net and SCAD. Hence, the RS-AFT model may be a competitive regularization method in cancer survival analysis.
Keywords: AFT; regularization; survival analysis
Year: 2018 PMID: 29689755 PMCID: PMC6004954 DOI: 10.3233/THC-174141
Source DB: PubMed Journal: Technol Health Care ISSN: 0928-7329 Impact factor: 1.285
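The abstract describes RS-AFT as a least absolute deviation (LAD) loss combined with an Lq penalty. As a rough illustration only, the sketch below evaluates such an objective for a candidate coefficient vector. The function name `rs_aft_objective`, the parameters `lam` and `q`, and the toy data are assumptions made for this example and are not taken from the article; censoring is ignored, and the article's exact formulation (including how it avoids tuning a regularization parameter) may differ.

```python
def rs_aft_objective(X, log_t, beta, lam=1.0, q=0.5):
    """LAD loss plus Lq penalty (illustrative; ignores censoring).

    X      : list of rows, one per sample (gene expression values)
    log_t  : list of log survival times, one per sample
    beta   : list of candidate coefficients, one per gene
    lam, q : hypothetical penalty weight and exponent (0 < q <= 1)
    """
    lad_loss = 0.0
    for row, target in zip(X, log_t):
        prediction = sum(x * b for x, b in zip(row, beta))
        lad_loss += abs(target - prediction)
    penalty = lam * sum(abs(b) ** q for b in beta)
    return lad_loss + penalty


# Toy data (made up): two samples, three genes.
X = [[1.0, 0.0, 2.0],
     [2.0, 1.0, 0.0]]
log_t = [1.0, 2.0]

# Both coefficient vectors reproduce log_t exactly, so only the penalty differs.
sparse_fit = rs_aft_objective(X, log_t, beta=[1.0, 0.0, 0.0])
dense_fit = rs_aft_objective(X, log_t, beta=[0.5, 1.0, 0.25])
print(sparse_fit, dense_fit)
```

Because both candidates fit the toy data with zero residual, the comparison isolates the penalty term, which favors the vector with fewer nonzero coefficients.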
The performance of gene selection obtained by different AFT methods
| Control parameter | Number of total selected genes | | | | Number of correct genes | | | |
|---|---|---|---|---|---|---|---|---|
| | RS-AFT | Lasso | SCAD | EN | RS-AFT | Lasso | SCAD | EN |
| | 28.42 | 73.60 | 39.41 | 115.85 | 8.15 | 8.21 | 8.11 | 8.55 |
| | 18.24 | 49.03 | 26.97 | 81.43 | 8.79 | 8.85 | 8.82 | 9.17 |
| | 20.47 | 55.17 | 28.78 | 87.42 | 8.60 | 8.65 | 8.64 | 8.93 |
| | 13.87 | 36.47 | 20.19 | 55.38 | 9.15 | 9.18 | 9.12 | 9.42 |
The gene selection performance of different methods in simulation experiments
| Control parameter | Sensitivity | | | | Specificity | | | | Efficiency | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | RS-AFT | Lasso | SCAD | EN | RS-AFT | Lasso | SCAD | EN | RS-AFT | Lasso | SCAD | EN |
| | 0.815 | 0.821 | 0.811 | 0.855 | 0.986 | 0.956 | 0.979 | 0.928 | 0.287 | 0.111 | 0.206 | 0.074 |
| | 0.879 | 0.885 | 0.882 | 0.917 | 0.994 | 0.973 | 0.988 | 0.952 | 0.482 | 0.181 | 0.327 | 0.113 |
| | 0.860 | 0.865 | 0.864 | 0.893 | 0.992 | 0.969 | 0.986 | 0.947 | 0.420 | 0.156 | 0.300 | 0.102 |
| | 0.915 | 0.918 | 0.912 | 0.942 | 0.997 | 0.982 | 0.993 | 0.969 | 0.659 | 0.252 | 0.452 | 0.170 |
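The measures in this table appear to be linked to the gene counts in the preceding table. As a hedged check, the sketch below assumes that sensitivity is the number of correct genes divided by the number of truly relevant genes, and that efficiency is the number of correct genes divided by the total number selected. The value of 10 relevant genes is an assumption inferred from the reported numbers, not stated in the text, and the function names are illustrative. Specificity is left out because it would also need the total number of simulated genes, which is not given here.

```python
def sensitivity(correct, relevant):
    """Fraction of truly relevant genes that were selected."""
    return correct / relevant


def efficiency(correct, selected):
    """Fraction of selected genes that are truly relevant."""
    return correct / selected


# Assumption: 10 truly relevant genes in the simulation (inferred, not stated).
N_RELEVANT = 10

# First data row of the gene selection table, RS-AFT and SCAD columns.
rs_aft = {"selected": 28.42, "correct": 8.15}
scad = {"selected": 39.41, "correct": 8.11}

for name, counts in (("RS-AFT", rs_aft), ("SCAD", scad)):
    sens = sensitivity(counts["correct"], N_RELEVANT)
    eff = efficiency(counts["correct"], counts["selected"])
    print(f"{name}: sensitivity={sens:.3f}, efficiency={eff:.3f}")
```

For these two columns the computed values come out close to the tabulated 0.815 / 0.287 and 0.811 / 0.206. Small differences in other columns would be expected if the published figures are averages over repeated runs.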
The absolute error obtained by different methods
| Control parameter | RS-AFT | Lasso | SCAD | EN |
|---|---|---|---|---|
| | 1.95 | 3.85 | 2.56 | 4.37 |
| | 1.16 | 2.43 | 1.75 | 2.94 |
| | 1.28 | 2.61 | 1.83 | 3.12 |
| | 0.73 | 1.74 | 1.17 | 2.05 |
Details of the four real gene expression datasets used in the experiments
| Dataset | No. of genes | No. of samples | No. of censored | No. of training | No. of testing |
|---|---|---|---|---|---|
| DLBCL (2002) | 7399 | 240 | 102 | 168 | 72 |
| DLBCL (2003) | 8810 | 92 | 28 | 64 | 28 |
| Lung cancer | 7129 | 86 | 62 | 60 | 26 |
| AML | 6283 | 116 | 49 | 81 | 35 |
The number of selected genes obtained by different AFT models on the real datasets
| Dataset | RS-AFT | Lasso | SCAD | EN |
|---|---|---|---|---|
| DLBCL (2002) | 58.52 | 131.26 | 73.70 | 168.43 |
| DLBCL (2003) | 29.84 | 83.28 | 30.61 | 109.71 |
| Lung cancer | 28.43 | 86.51 | 39.42 | 102.46 |
| AML | 39.11 | 110.37 | 68.57 | 152.83 |
Absolute error obtained by different AFT models on the real microarray datasets
| Dataset | RS-AFT | Lasso | SCAD | EN |
|---|---|---|---|---|
| DLBCL (2002) | 0.65 | 1.31 | 0.84 | 1.87 |
| DLBCL (2003) | 1.16 | 2.28 | 1.41 | 2.83 |
| Lung cancer | 1.84 | 3.30 | 2.66 | 4.18 |
| AML | 3.11 | 4.93 | 3.74 | 6.08 |
Disease-related genes selected by different AFT methods on the lung cancer dataset
| Rank | RS-AFT | Lasso | EN | SCAD |
|---|---|---|---|---|
| 1 | SMAD4 | WWP1 | TRA2A | WWP1 |
| 2 | ENPP2 | HUWE1 | WWP1 | TRA2A |
| 3 | TRA2A | TRA2A | CCL21 | HUWE1 |
| 4 | LLGL1 | CCL21 | HUWE1 | CCL21 |
| 5 | WWP1 | ADM | ADM | ADM |
| 6 | DYNLT3 | PBXIP1 | RPL36AL | PHKG1 |
| 7 | DOC2A | RPS29 | HLA-C | HLA-C |
| 8 | HUWE1 | TNNC2 | PEX7 | RPS29 |
| 9 | TEK | DOC2A | ZNF148 | DOC2A |
| 10 | PHKG1 | HLA-C | INHA | ATRX |
| 11 | PFN1 | HTR6 | RPS29 | ENPP2 |
| 12 | RPL23 | TFAP2C | DOC2A | ZNF148 |
| 13 | ENPP2 | ZNF148 | SERINC3 | TFAP2C |
| 14 | POLR2A | HUMBINDC | GNS | TNNC2 |
| 15 | CFTR | RPL36AL | ATRX | RAD23B |
| The weighted iterative algorithm for the RS-AFT model | |
|---|---|
| 1: | Initialize |
| 2: | Set |
| 3: | |
| 4: | Update |
| 5: | |
| 6: | |
| 7: | end while |
| 8: | return |
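The listing above gives only the outline of the loop. The sketch below is a generic illustration of iterative reweighting for an Lq-penalised LAD fit, not a reconstruction of the authors' algorithm. It handles a single coefficient, where each weighted L1 subproblem can be solved exactly by a weighted median instead of a linear program. The names `weighted_median` and `reweighted_lad_1d`, the weight formula `1 / (|b| + eps) ** (1 - q)`, the fixed `lam`, and the toy data are all assumptions chosen for this example.

```python
def weighted_median(points, weights):
    """Return a minimiser of sum_k weights[k] * |points[k] - b| over b."""
    half = sum(weights) / 2.0
    running = 0.0
    for value, weight in sorted(zip(points, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("weights must have a positive sum")


def reweighted_lad_1d(x, y, lam=1.0, q=0.5, eps=1e-3, max_iter=50, tol=1e-8):
    """Approximate argmin_b sum_i |y_i - x_i*b| + lam*|b|**q (one coefficient).

    Each pass replaces |b|**q by w*|b| with w fixed from the previous
    iterate, then solves that weighted L1 problem exactly.
    """
    # |y_i - x_i*b| = |x_i| * |y_i/x_i - b|, so each sample is a weighted point.
    points = [yi / xi for xi, yi in zip(x, y) if xi != 0.0]
    weights = [abs(xi) for xi in x if xi != 0.0]
    w, b = 1.0, 0.0  # first pass is a plain L1-penalised LAD fit
    for _ in range(max_iter):
        # The penalty lam*w*|b| acts like one extra point at 0.
        b_new = weighted_median(points + [0.0], weights + [lam * w])
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
        w = 1.0 / (abs(b) + eps) ** (1.0 - q)  # assumed reweighting rule
    return b


# Toy data (made up): each response list contains one gross outlier.
x = [1.0] * 5
weak = reweighted_lad_1d(x, [0.1, -0.2, 0.05, 0.3, 8.0])
strong = reweighted_lad_1d(x, [2.1, 1.9, 2.0, 2.2, 9.0])
print(weak, strong)
```

In this toy run the weak signal is shrunk to exactly zero, while the strong one settles at the sample median and is unaffected by the single outlier. With many genes, the inner step would instead be a weighted linear program over all coefficients.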