Hai-Hui Huang1, Yong Liang1, Xiao-Ying Liu1.
Abstract
Identifying biomarkers and signaling pathways is a critical step in genomic studies, where regularization is a widely used feature extraction approach. However, most regularizers are based on the L1-norm; their solutions are often not sparse or interpretable enough and are asymptotically biased, which is a particular concern in genomic research. Recently, a large amount of molecular interaction information about disease-related biological processes has accumulated and been collected in various databases, each focusing on different aspects of biological systems. In this paper, we use an enhanced L1/2 penalized solver to fit a network-constrained logistic regression model, called the enhanced L1/2 net, in which the predictors are gene-expression data combined with biological network knowledge. Extensive simulation studies showed that the proposed approach outperforms L1 regularization, the old L1/2 penalized solver, and the Elastic net in terms of classification accuracy and stability. Furthermore, we applied our method to lung cancer data and found that it achieves higher predictive accuracy than L1 regularization, the old L1/2 penalized solver, and the Elastic net, while selecting fewer but informative biomarkers and pathways.
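To make the idea of an L1/2-penalized, network-constrained logistic regression more concrete, the following is a minimal sketch of one common way to write such an objective. The abstract does not state the formula, so everything here is an assumption: the logistic negative log-likelihood is averaged over samples, the sparsity term is the sum of |β_j|^(1/2), and the network term is the quadratic form βᵀLβ with a normalized graph Laplacian L built from the gene network. The function names and the tuning parameters `lam1` and `lam2` are hypothetical, and the sketch only evaluates the objective; it is not the enhanced solver itself.

```python
import numpy as np


def normalized_laplacian(adj):
    """Normalized graph Laplacian of an undirected network (no self-loops).

    Assumed form: 1 on the diagonal for nodes with at least one neighbour,
    -a_uv / sqrt(d_u * d_v) for linked nodes u and v, 0 elsewhere.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg)
    connected = deg > 0
    inv_sqrt[connected] = 1.0 / np.sqrt(deg[connected])
    scaled = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def l_half_net_objective(beta, X, y, lap, lam1, lam2):
    """Penalized logistic loss (hypothetical form, see the lead-in).

    beta : coefficient vector, one entry per gene
    X    : samples-by-genes expression matrix
    y    : 0/1 class labels
    lap  : normalized Laplacian of the gene network
    """
    z = X @ beta
    # Logistic negative log-likelihood, written in a numerically stable way.
    nll = np.mean(np.logaddexp(0.0, z) - y * z)
    # L1/2 sparsity penalty: sum of square roots of absolute coefficients.
    sparsity = lam1 * np.sum(np.sqrt(np.abs(beta)))
    # Network penalty: small when linked genes have similar scaled coefficients.
    smoothness = lam2 * (beta @ lap @ beta)
    return nll + sparsity + smoothness
```

In this form, `lam1` controls how many genes are kept and `lam2` controls how strongly coefficients of linked genes are pulled together; both would normally be chosen by cross-validation.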
Year: 2015 PMID: 26185761 PMCID: PMC4488258 DOI: 10.1155/2015/713953
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Simulation results of the enhanced L1/2 net, L1/2 net, L1 net, and Elastic net, respectively.
| Model | Method | Misclassification errors (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| 1 | Enhanced L1/2 net | … (0.36) | … (0.00) | 0.969 (0.00) |
| 1 | L1/2 net | 9.85 (0.31) | 0.971 (0.00) | 0.970 (0.01) |
| 1 | L1 net | 11.81 (0.41) | 0.968 (0.02) | 0.962 (0.01) |
| 1 | Elastic net | 13.12 (0.12) | 0.873 (0.00) | 0.981 (0.00) |
| 2 | Enhanced L1/2 net | … (0.33) | 0.939 (0.00) | … (0.02) |
| 2 | L1/2 net | 10.83 (0.36) | 0.939 (0.00) | 0.981 (0.01) |
| 2 | L1 net | 13.21 (0.24) | 0.943 (0.01) | 0.987 (0.01) |
| 2 | Elastic net | 14.14 (0.23) | 0.835 (0.00) | 0.980 (0.00) |
Simulation results (averaged over 100 runs) comparing the misclassification errors, sensitivity, and specificity of the enhanced L1/2 net, L1/2 net, L1 net, and the Elastic net, respectively. The standard errors are given in parentheses.
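As an illustration of how the three reported quantities are usually obtained, here is a minimal sketch that computes the misclassification error, sensitivity, and specificity from true and predicted 0/1 labels, and then summarizes repeated runs by their mean and standard error. The function names are hypothetical, and the standard error is assumed to be the sample standard deviation divided by the square root of the number of runs; the source does not say how its standard errors were computed.

```python
import numpy as np


def classification_metrics(y_true, y_pred):
    """Return (misclassification error, sensitivity, specificity) as fractions.

    Labels are assumed to be 0/1 with 1 as the positive class, and both
    classes are assumed to be present in y_true.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    error = float((fp + fn) / y_true.size)
    sensitivity = float(tp / (tp + fn))  # true positives among actual positives
    specificity = float(tn / (tn + fp))  # true negatives among actual negatives
    return error, sensitivity, specificity


def mean_and_standard_error(values):
    """Mean and standard error (sample std / sqrt(n)) over repeated runs."""
    values = np.asarray(values, dtype=float)
    mean = float(values.mean())
    se = float(values.std(ddof=1) / np.sqrt(values.size))
    return mean, se
```

Calling `classification_metrics` once per simulated dataset and passing the collected values of each metric to `mean_and_standard_error` yields entries of the "mean (standard error)" form used in the table.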
Figure 1. The solution paths of the enhanced L1/2 net for the lung cancer dataset in one sample run.
Figure 2. The solution paths of the L1/2 net for the lung cancer dataset in one sample run.
Figure 3. The solution paths of the L1 net for the lung cancer dataset in one sample run.
Figure 4. The solution paths of the Elastic net for the lung cancer dataset in one sample run.
The results of the enhanced L1/2 net, L1/2 net, L1 net, and Elastic net on the LC dataset, respectively.
| Method | Selected genes | Connected genes | Connected edges | Cross-validation error | Test error |
|---|---|---|---|---|---|
| Enhanced L1/2 net | 171 | 54 | 41 | 6/70 | 5/37 |
| L1/2 net | 193 | 61 | 47 | 6/70 | 6/37 |
| L1 net | 500 | 150 | 121 | 7/70 | 6/37 |
| Elastic net | 636 | 337 | 510 | 6/70 | 6/37 |
Results of analysis of LC gene expression dataset by four procedures, including the number of genes selected, the number of linked PPI network genes, the number of linked PPI network edges, the CV error, and test errors.
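The "connected genes" and "connected edges" counts can be illustrated with a short sketch. It assumes that a connected edge is a PPI edge whose two endpoints are both selected, and that a connected gene is a selected gene with at least one such edge; the source does not spell out these definitions, so this reading is an assumption. The function name and the gene identifiers in the usage example are hypothetical.

```python
def connected_summary(selected_genes, ppi_edges):
    """Count selected genes, linked selected genes, and edges among them.

    selected_genes : iterable of gene identifiers with nonzero coefficients
    ppi_edges      : iterable of (gene_a, gene_b) pairs from the PPI network
    """
    selected = set(selected_genes)
    # Keep undirected edges with both endpoints selected; drop self-loops
    # and duplicates such as (a, b) and (b, a).
    kept_edges = {
        frozenset((a, b))
        for a, b in ppi_edges
        if a != b and a in selected and b in selected
    }
    # A gene is "connected" if it appears in at least one kept edge.
    linked_genes = set()
    for edge in kept_edges:
        linked_genes.update(edge)
    return len(selected), len(linked_genes), len(kept_edges)


# Usage with made-up identifiers: g4 is selected but has no selected neighbour.
counts = connected_summary(
    ["g1", "g2", "g3", "g4"],
    [("g1", "g2"), ("g2", "g1"), ("g2", "g3"), ("g3", "g5")],
)
```

Under these assumptions, `counts` holds the three numbers that would fill the first three columns of a row in the table for one method.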
Figure 5. Subnetworks identified by the enhanced L1/2 net for the lung cancer dataset (only those genes that are linked on the PPI network are plotted).