Jing Zhai, Chiu-Hsieh Hsu, Z John Daye.
Abstract
BACKGROUND: Many questions in statistical genomics can be formulated as variable selection among candidate biological factors for modeling a trait or quantity of interest. In these applications, additional covariates describing clinical, demographic, or experimental effects often must be included a priori as mandatory covariates, while selection is still allowed among a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate them.
Keywords: Gene expression analysis; Lasso; Linear models; Penalized regression; Ridge; Variable selection
Year: 2017 PMID: 28122498 PMCID: PMC5267467 DOI: 10.1186/s12874-017-0291-y
Source DB: PubMed Journal: BMC Med Res Methodol ISSN: 1471-2288 Impact factor: 4.615
Fig. 1 Penalty contours when d=2 for (a) lasso, (b) ridle, and (c) ridge regressions
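The three penalties compared in Fig. 1 can be illustrated numerically. The sketch below is only an assumption-laden illustration: this section does not state the ridle penalty, so `hybrid_penalty` assumes a ridge-type (squared) term on the mandatory coefficient and a lasso-type (absolute) term on the optional one, with both tuning parameters set to 1. All function names are hypothetical.

```python
def lasso_penalty(beta):
    """l1 penalty: sum of absolute coefficients."""
    return sum(abs(b) for b in beta)


def ridge_penalty(beta):
    """Squared l2 penalty: sum of squared coefficients."""
    return sum(b * b for b in beta)


def hybrid_penalty(beta, mandatory):
    """ASSUMED ridle-style penalty (not taken from the source):
    squared term on mandatory coordinates, absolute term on the others."""
    return sum(b * b if j in mandatory else abs(b) for j, b in enumerate(beta))


# d = 2: coordinate 0 is treated as mandatory, coordinate 1 as optional.
points = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
for beta in points:
    print(beta, lasso_penalty(beta), hybrid_penalty(beta, {0}), ridge_penalty(beta))
```

Under these assumptions, all three unit contours pass through the axis points, while the point (0.5, 0.5) lies on the lasso contour and strictly inside the other two, with the hybrid value falling between the lasso and ridge values.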
Simulation example 1: effect of signal strengths
| Method | rpe | g-measure | Sensitivity | Specificity |
|---|---|---|---|---|
| | | | | |
| Ridge | 1.008 (0.009) | | | |
| Lasso | 1.004 (0.018) | 0.582 (0.009) | 0.350 (0.018) | 0.957 (0.006) |
| Elastic net | 0.923 (0.020) | 0.676 (0.007) | 0.600 (0.041) | 0.848 (0.023) |
| Unpenalized lasso | 0.675 (0.028) | | 1.000 (0.000) | 1.000 (0.000) |
| Unpenalized elastic net | 0.697 (0.026) | | 1.000 (0.000) | 1.000 (0.002) |
| Ridle | | | 1.000 (0.000) | 0.996 (0.002) |
| | | | | |
| Ridge | 6.549 (0.056) | | | |
| Lasso | 3.300 (0.083) | 0.839 (0.005) | 0.750 (0.017) | 0.926 (0.003) |
| Elastic net | 3.230 (0.118) | 0.853 (0.004) | 0.900 (0.008) | 0.850 (0.005) |
| Unpenalized lasso | 0.691 (0.023) | | 1.000 (0.000) | 1.000 (0.000) |
| Unpenalized elastic net | 0.701 (0.028) | | 1.000 (0.000) | 1.000 (0.001) |
| Ridle | | | 1.000 (0.000) | 0.996 (0.002) |
| | | | | |
| Ridge | 24.559 (0.317) | | | |
| Lasso | 8.074 (0.433) | 0.908 (0.005) | 0.900 (0.013) | 0.935 (0.002) |
| Elastic net | 6.735 (0.339) | 0.903 (0.002) | 0.950 (0.013) | 0.852 (0.003) |
| Unpenalized lasso | 0.676 (0.032) | | 1.000 (0.000) | 1.000 (0.000) |
| Unpenalized elastic net | 0.725 (0.030) | | 1.000 (0.000) | 1.000 (0.001) |
| Ridle | | | 1.000 (0.000) | 0.996 (0.002) |
The unpenalized lasso and unpenalized elastic net were fitted without penalizing the mandatory covariates.
n=50, p=250. The smallest rpe and the two largest g-measures are boldfaced.
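Fitting a lasso "without penalization on the mandatory covariates" can be sketched as a weighted lasso in which mandatory coefficients carry a penalty weight of zero. The following is a minimal pure-Python coordinate-descent sketch, not the authors' software. The function names, the objective scaling (1/(2n) times the residual sum of squares plus `lam` times the weighted l1 norm), and the toy data are all assumptions made for illustration, and the data are assumed to be centered so that no intercept is needed.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def weighted_lasso(X, y, lam, penalty_weights, n_sweeps=200):
    """Coordinate descent for
    (1/(2n)) * ||y - X b||^2 + lam * sum_j penalty_weights[j] * |b_j|.
    A weight of 0 leaves that coefficient unpenalized (hypothetical helper)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta, since beta starts at zero
    col_scale = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_scale[j] == 0.0:
                continue  # skip all-zero columns
            # Covariance of column j with the residual that excludes column j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam * penalty_weights[j]) / col_scale[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta


# Toy data: column 0 plays the mandatory covariate (weak effect),
# column 1 an optional variable (strong effect).
X = [[1, 1], [-1, 1], [1, -1], [-1, -1]]
y = [2.1, 1.9, -1.9, -2.1]
penalized = weighted_lasso(X, y, 0.5, [1.0, 1.0])    # ordinary lasso
unpenalized = weighted_lasso(X, y, 0.5, [0.0, 1.0])  # mandatory column unpenalized
print(penalized, unpenalized)
```

In this toy example the ordinary lasso drops the weak mandatory covariate, whereas the zero-weight variant retains it, which mirrors the contrast in mandatory-covariate selection between the penalized and unpenalized variants.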
Simulation example 2: effect of correlation between mandatory and irrelevant predictors
| Method | rpe | g-measure | Sensitivity (mandatory) | Sensitivity (optional) | Specificity (optional) |
|---|---|---|---|---|---|
| | | | | | |
| Ridge | 1.671 (0.012) | | | | |
| Lasso | 1.911 (0.022) | 0.383 (0.034) | 0.100 (0.032) | 0.200 (0.028) | 0.975 (0.008) |
| Elastic net | 1.744 (0.019) | 0.585 (0.015) | 0.400 (0.054) | 0.600 (0.050) | 0.835 (0.036) |
| Unpenalized lasso | 1.741 (0.028) | 0.742 (0.012) | 1.000 (0.000) | 0.200 (0.037) | 0.938 (0.003) |
| Unpenalized elastic net | 1.657 (0.017) | | 1.000 (0.000) | 0.500 (0.064) | 0.833 (0.022) |
| Ridle | | | 1.000 (0.000) | 0.200 (0.048) | 0.931 (0.006) |
| | | | | | |
| Ridge | 1.807 (0.014) | | | | |
| Lasso | 2.045 (0.035) | 0.571 (0.013) | 0.300 (0.046) | 0.400 (0.039) | 0.925 (0.007) |
| Elastic net | 1.773 (0.034) | 0.667 (0.008) | 0.600 (0.014) | 0.800 (0.048) | 0.756 (0.020) |
| Unpenalized lasso | 1.922 (0.044) | 0.794 (0.003) | 1.000 (0.000) | 0.400 (0.047) | 0.929 (0.004) |
| Unpenalized elastic net | 1.729 (0.040) | | 1.000 (0.000) | 0.700 (0.048) | 0.785 (0.022) |
| Ridle | | | 1.000 (0.000) | 0.600 (0.049) | 0.900 (0.004) |
| | | | | | |
| Ridge | 1.564 (0.022) | | | | |
| Lasso | 1.365 (0.029) | 0.684 (0.008) | 0.400 (0.032) | 0.600 (0.012) | 0.900 (0.003) |
| Elastic net | 1.237 (0.030) | 0.745 (0.005) | 0.700 (0.048) | 0.900 (0.011) | 0.775 (0.014) |
| Unpenalized lasso | 1.423 (0.037) | 0.839 (0.005) | 1.000 (0.000) | 0.700 (0.026) | 0.904 (0.006) |
| Unpenalized elastic net | 1.310 (0.041) | | 1.000 (0.000) | 0.800 (0.012) | 0.840 (0.008) |
| Ridle | | | 1.000 (0.000) | 0.700 (0.038) | 0.908 (0.003) |
The unpenalized lasso and unpenalized elastic net were fitted without penalizing the mandatory covariates. The g-measure is estimated from all predictors. Sensitivity (mandatory) is computed over the mandatory variables only, whereas sensitivity (optional) and specificity (optional) are computed over the optional variables only.
n=50, p=250. The smallest rpe and the two largest g-measures are boldfaced.
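The group-wise selection metrics described in the footnote above can be sketched as follows. This is an illustrative sketch only: the function names and the toy index sets are hypothetical, and the g-measure is assumed to be the geometric mean of sensitivity and specificity, a definition this section does not spell out.

```python
from math import sqrt


def sensitivity(selected, truth, subset):
    """Share of truly relevant variables within subset that were selected."""
    relevant = [j for j in subset if j in truth]
    if not relevant:
        return float("nan")  # undefined when the subset has no relevant variable
    return sum(1 for j in relevant if j in selected) / len(relevant)


def specificity(selected, truth, subset):
    """Share of truly irrelevant variables within subset that were left out."""
    irrelevant = [j for j in subset if j not in truth]
    if not irrelevant:
        return float("nan")  # undefined when the subset has no irrelevant variable
    return sum(1 for j in irrelevant if j not in selected) / len(irrelevant)


def g_measure(selected, truth, subset):
    """ASSUMED definition: geometric mean of sensitivity and specificity."""
    return sqrt(sensitivity(selected, truth, subset) * specificity(selected, truth, subset))


# Toy setting: 10 predictors, indices 0-1 mandatory, 2-9 optional.
mandatory = range(0, 2)
optional = range(2, 10)
everything = range(0, 10)
truth = {0, 1, 2, 3}     # truly non-zero coefficients
selected = {0, 1, 2, 5}  # variables picked by some fitted model

sens_mandatory = sensitivity(selected, truth, mandatory)
sens_optional = sensitivity(selected, truth, optional)
spec_optional = specificity(selected, truth, optional)
g_all = g_measure(selected, truth, everything)
print(sens_mandatory, sens_optional, spec_optional, g_all)
```

Computing the g-measure over all predictors while reporting sensitivity and specificity per group keeps a single summary of selection quality alongside a view of how mandatory and optional variables fare separately.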
Simulation example 3: effect of multicollinearity among mandatory covariates
| Method | rpe | g-measure | Sensitivity (mandatory) | Sensitivity (optional) | Specificity (optional) |
|---|---|---|---|---|---|
| | | | | | |
| Ridge | 6.353 (0.022) | | | | |
| Lasso | 4.649 (0.167) | 0.802 (0.011) | 0.800 (0.000) | 0.700 (0.048) | 0.908 (0.004) |
| Elastic net | 4.410 (0.128) | 0.804 (0.005) | 1.000 (0.009) | 0.700 (0.006) | 0.858 (0.006) |
| Unpenalized lasso | 4.776 (0.260) | | 1.000 (0.000) | 0.700 (0.031) | 0.902 (0.007) |
| Unpenalized elastic net | 5.402 (0.190) | 0.823 (0.006) | 1.000 (0.000) | 0.700 (0.013) | 0.871 (0.009) |
| Ridle | | | 1.000 (0.000) | 0.900 (0.048) | 0.904 (0.004) |
| | | | | | |
| Ridge | 6.270 (0.026) | | | | |
| Lasso | 4.914 (0.148) | 0.784 (0.010) | 0.600 (0.089) | 0.700 (0.036) | 0.908 (0.004) |
| Elastic net | 4.336 (0.135) | 0.816 (0.005) | 0.800 (0.092) | 0.700 (0.018) | 0.867 (0.008) |
| Unpenalized lasso | 6.992 (0.337) | | 1.000 (0.000) | 0.700 (0.031) | 0.902 (0.006) |
| Unpenalized elastic net | 7.245 (0.237) | 0.827 (0.005) | 1.000 (0.000) | 0.700 (0.045) | 0.860 (0.011) |
| Ridle | | | 1.000 (0.000) | 0.800 (0.045) | 0.900 (0.004) |
| | | | | | |
| Ridge | 6.231 (0.031) | | | | |
| Lasso | 7.322 (0.200) | 0.745 (0.005) | 0.400 (0.000) | 0.700 (0.000) | 0.913 (0.003) |
| Elastic net | 5.003 (0.155) | 0.804 (0.006) | 0.800 (0.049) | 0.700 (0.019) | 0.883 (0.006) |
| Unpenalized lasso | 36.214 (2.064) | 0.824 (0.006) | 1.000 (0.000) | 0.700 (0.046) | 0.904 (0.005) |
| Unpenalized elastic net | 33.583 (2.197) | | 1.000 (0.010) | 0.700 (0.045) | 0.867 (0.010) |
| Ridle | | | 1.000 (0.000) | 0.800 (0.029) | 0.904 (0.004) |
The unpenalized lasso and unpenalized elastic net were fitted without penalizing the mandatory covariates. The g-measure is estimated from all predictors. Sensitivity (mandatory) is computed over the mandatory variables only, whereas sensitivity (optional) and specificity (optional) are computed over the optional variables only.
n=50, p=250. The smallest rpe and the two largest g-measures are boldfaced.
Simulation example 4: mandatory covariates are irrelevant
| Method | rpe | g-measure | Specificity (mandatory) | Sensitivity (optional) | Specificity (optional) |
|---|---|---|---|---|---|
| | | | | | |
| Ridge | | | | | |
| Lasso | 1.911 (0.022) | | 1.000 (0.000) | 0.200 (0.028) | 0.975 (0.008) |
| Elastic net | 1.744 (0.019) | | 0.600 (0.053) | 0.600 (0.050) | 0.835 (0.036) |
| Unpenalized lasso | 2.357 (0.032) | 0.215 (0.103) | 0.000 (0.000) | 0.050 (0.024) | 0.995 (0.003) |
| Unpenalized elastic net | 2.210 (0.034) | 0.308 (0.054) | 0.000 (0.000) | 0.525 (0.065) | 0.732 (0.057) |
| Ridle | 1.854 (0.012) | 0.309 (0.029) | 0.000 (0.000) | 0.100 (0.024) | 0.982 (0.005) |
| | | | | | |
| Ridge | 1.807 (0.014) | | | | |
| Lasso | 2.045 (0.035) | | 0.800 (0.006) | 0.400 (0.039) | 0.925 (0.007) |
| Elastic net | | | 0.500 (0.048) | 0.800 (0.048) | 0.756 (0.020) |
| Unpenalized lasso | 2.242 (0.023) | 0.299 (0.035) | 0.000 (0.000) | 0.100 (0.021) | 0.982 (0.004) |
| Unpenalized elastic net | 2.080 (0.028) | 0.305 (0.094) | 0.000 (0.000) | 0.550 (0.072) | 0.700 (0.079) |
| Ridle | 1.801 (0.039) | 0.528 (0.032) | 0.000 (0.000) | 0.300 (0.038) | 0.943 (0.005) |
| | | | | | |
| Ridge | 1.564 (0.022) | | | | |
| Lasso | 1.365 (0.029) | | 0.700 (0.041) | 0.600 (0.012) | 0.900 (0.003) |
| Elastic net | | | 0.300 (0.046) | 0.900 (0.011) | 0.775 (0.014) |
| Unpenalized lasso | 1.747 (0.043) | 0.428 (0.003) | 0.000 (0.000) | 0.200 (0.000) | 0.964 (0.003) |
| Unpenalized elastic net | 1.662 (0.043) | 0.514 (0.016) | 0.000 (0.000) | 0.350 (0.023) | 0.900 (0.015) |
| Ridle | 1.253 (0.042) | 0.596 (0.017) | 0.000 (0.000) | 0.400 (0.026) | 0.945 (0.003) |
The unpenalized lasso and unpenalized elastic net were fitted without penalizing the mandatory covariates. The g-measure is estimated from all predictors. Specificity (mandatory) is computed over the mandatory variables only, whereas sensitivity (optional) and specificity (optional) are computed over the optional variables only.
n=50, p=250. The smallest rpe and the two largest g-measures are boldfaced.
Gene expression analysis on histologic grades of breast cancer
| Method | No. selected (mandatory) | No. selected (optional) | MSE |
|---|---|---|---|
| Ridge | 4 | 430 | 0.487 |
| Lasso | 2 | 19 | 0.260 |
| Elastic net | 2 | 14 | 0.286 |
| Unpenalized lasso | 4 | 21 | 0.257 |
| Unpenalized elastic net | 4 | 7 | 0.296 |
| Ridle | 4 | 24 | |
The unpenalized lasso and unpenalized elastic net were fitted without penalizing the mandatory covariates. The elastic net and unpenalized elastic net were fitted with alpha=0.2575 and alpha=0.8462, respectively, selected by cross-validation. The numbers of selected mandatory covariates and optional variables and the mean-squared error (MSE) are shown. The smallest MSE is boldfaced.
Fig. 2 Selection of genes and clinicopathological variables. PgR, ER, p53Status, and AgeDiagnosis are clinicopathological covariates, whereas all others are genes. The unpenalized lasso and unpenalized elastic net were fitted without penalizing the clinicopathological variables, which were treated as mandatory covariates.