Rex Shen, Lan Luo, Hui Jiang.
Abstract
BACKGROUND: This article concerns the identification of gene pairs, or combinations of gene pairs, associated with a biological phenotype or clinical outcome. Such pairs allow building predictive models that are not only robust to normalization but also easily validated and measured by qPCR techniques. However, given a small number of biological samples and a large number of genes, the problem has high computational complexity and makes accurate identification statistically challenging.
Keywords: ADMM; Biomarker; Gene pair; Penalized regression
Year: 2017 PMID: 29100492 PMCID: PMC5670721 DOI: 10.1186/s12859-017-1872-9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Comparison of ADMM and CD+LS algorithms
| | | % Success Rate, ADMM | % Success Rate, CD+LS | Time (s), ADMM | Time (s), CD+LS |
|---|---|---|---|---|---|
| 20 | 50 | 100.0 (0.0) | 100.0 (0.0) | 0.03 (0.01) | 0.0 (0.0) |
| 100 | 100 | 100.0 (0.0) | 100.0 (0.0) | 0.06 (0.00) | 1.3 (0.1) |
| 100 | 500 | 100.0 (0.0) | 80.0 (9.2) | 0.14 (0.04) | 0.6 (0.2) |
| 200 | 1000 | 100.0 (0.0) | 80.0 (9.2) | 0.54 (0.12) | 3.9 (0.4) |
Sample means (standard errors in parentheses) of rates for successful convergence (in percentages), and running times (in seconds), based on 20 simulation replications, for the proposed ADMM and CD+LS algorithms
Comparison of the proposed method and the Lasso in simulations
| Setting | | Method | MSE | R² | RE | FR |
|---|---|---|---|---|---|---|
| 1 | (20, 50, 0.5) | Lasso 1 | 0.29 (0.01) | 0.89 (0.00) | 0.77 (0.01) | 0.33 (0.00) |
| | | Lasso 2 | 0.31 (0.01) | 0.88 (0.00) | 0.14 (0.01) | 0.00 (0.00) |
| | | Proposed | 0.29 (0.01) | 0.89 (0.00) | 0.13 (0.01) | 0.00 (0.00) |
| 1 | (100, 25, 0.2) | Lasso 1 | 0.19 (0.07) | 0.92 (0.03) | 0.80 (0.02) | 0.37 (0.02) |
| | | Lasso 2 | 0.24 (0.08) | 0.89 (0.03) | 0.22 (0.03) | 0.06 (0.03) |
| | | Proposed | 0.19 (0.07) | 0.92 (0.03) | 0.21 (0.03) | 0.04 (0.03) |
| 2 | (20, 50, 0.5) | Lasso 1 | 0.32 (0.01) | 0.89 (0.00) | 0.44 (0.02) | 0.20 (0.04) |
| | | Lasso 2 | 0.36 (0.02) | 0.88 (0.01) | 0.14 (0.01) | 0.00 (0.00) |
| | | Proposed | 0.32 (0.01) | 0.89 (0.00) | 0.14 (0.01) | 0.01 (0.01) |
| 2 | (100, 25, 0.2) | Lasso 1 | 0.42 (0.07) | 0.84 (0.02) | 0.52 (0.03) | 0.37 (0.06) |
| | | Lasso 2 | 0.63 (0.09) | 0.76 (0.03) | 0.32 (0.03) | 0.22 (0.05) |
| | | Proposed | 0.41 (0.07) | 0.85 (0.02) | 0.31 (0.03) | 0.24 (0.05) |
| 3 | (20, 50, 0.5) | Lasso 1 | 0.30 (0.01) | 0.93 (0.00) | 0.18 (0.02) | 0.00 (0.00) |
| | | Lasso 2 | 0.33 (0.01) | 0.92 (0.00) | 0.11 (0.01) | 0.00 (0.00) |
| | | Proposed | 0.30 (0.01) | 0.93 (0.00) | 0.11 (0.01) | 0.00 (0.00) |
| 3 | (100, 25, 0.2) | Lasso 1 | 0.15 (0.02) | 0.96 (0.01) | 0.25 (0.03) | 0.00 (0.00) |
| | | Lasso 2 | 0.26 (0.08) | 0.93 (0.02) | 0.16 (0.02) | 0.01 (0.01) |
| | | Proposed | 0.15 (0.02) | 0.96 (0.01) | 0.15 (0.01) | 0.00 (0.00) |
Sample means (standard errors in parentheses) of mean squared error (MSE), R², relative error (RE) and false identification rate (FR), based on 20 simulation replications, for the proposed method and the Lasso
Fig. 1 Solution paths of the model fitting with p=520 genes
18] for details. To compute a solution path for a decreasing sequence of λ values, we adopt the approach in Friedman et al. [21] and use warm starts for each λ value. The sequence of λ values is either provided by the user, or we begin with λ_max =∥ ∥, for which all the coefficients are equal to 0. We set λ_min = ε λ_max, where ε is a small value, such as 0.01, and generate a decreasing sequence of 100 λ values from λ_max to λ_min on the log-scale.
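The λ sequence and warm-start scheme described above can be sketched in Python as follows. This is only an illustrative sketch, not the authors' implementation: the names `lambda_path`, `solve_path`, `lam_max`, and the solver callback `fit` are hypothetical, and the largest λ value is taken as an input because its defining formula is not recoverable from the text.

```python
import numpy as np


def lambda_path(lam_max, eps=0.01, n_lambda=100):
    """Return n_lambda values decreasing from lam_max to eps * lam_max,
    equally spaced on the log scale."""
    if lam_max <= 0:
        raise ValueError("lam_max must be positive")
    lam_min = eps * lam_max
    return np.geomspace(lam_max, lam_min, num=n_lambda)


def solve_path(fit, lambdas, p):
    """Fit the model at each lambda, warm-starting from the previous solution.

    `fit(lam, beta_init)` is a hypothetical solver callback (an assumption,
    not part of the source) that returns a length-p coefficient vector.
    """
    beta = np.zeros(p)  # all coefficients are zero at the largest lambda
    path = []
    for lam in lambdas:
        beta = np.asarray(fit(lam, beta), dtype=float)
        path.append(beta.copy())
    return np.array(path)  # shape (len(lambdas), p)
```

Because each fit starts from the solution at the previous, slightly larger λ, an iterative solver typically needs fewer iterations per λ value than it would from a cold start at zero.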