Abstract
High-dimensional inference is one of the fundamental problems in modern biomedical studies, yet the existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression (MNR) method, so that it can be applied to statistical inference for high-dimensional generalized linear models with mixed features. The method is attractive because it breaks a high-dimensional inference problem into a series of low-dimensional inference problems. The proposed method is applied to the cancer cell line encyclopedia data to identify the genes and mutations that are sensitive to the response of anti-cancer drugs. The numerical results favor the Markov neighborhood regression method over the existing ones.
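To make the idea of splitting one high-dimensional problem into many low-dimensional ones concrete, here is a minimal sketch for a linear model. It assumes the three-step outline usually given for MNR: select variables for the response, find a Markov neighborhood for each feature, then fit a small regression per feature. The abstract does not spell out these steps, so the outline is an assumption. The correlation-based screening used in both selection steps is a simple stand-in, not the authors' procedure, and the names `top_k_by_abs_corr`, `mnr_linear`, `k_select`, and `k_neigh` are hypothetical.

```python
import numpy as np

def top_k_by_abs_corr(target, X, k, exclude=()):
    """Indices of the k columns of X with the largest absolute correlation with target."""
    Xc = X - X.mean(axis=0)
    tc = target - target.mean()
    corr = np.abs(Xc.T @ tc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(tc) + 1e-12)
    if exclude:
        corr[list(exclude)] = -np.inf  # never pick the excluded columns
    return np.argsort(corr)[::-1][:k]

def mnr_linear(X, y, k_select=5, k_neigh=3, z=1.96):
    """Per-feature estimates and approximate 95% intervals from low-dimensional fits."""
    n, p = X.shape
    # Stand-in for the variable selection step: marginal correlation screening.
    selected = {int(i) for i in top_k_by_abs_corr(y, X, k_select)}
    est, lo, hi = np.empty(p), np.empty(p), np.empty(p)
    for j in range(p):
        # Stand-in for the Markov neighborhood of feature j.
        neigh = {int(i) for i in top_k_by_abs_corr(X[:, j], X, k_neigh, exclude=[j])}
        others = sorted((selected | neigh) - {j})
        # Low-dimensional design: intercept, feature j, then the conditioning set.
        D = np.column_stack([np.ones(n), X[:, [j] + others]])
        beta, *_ = np.linalg.lstsq(D, y, rcond=None)
        resid = y - D @ beta
        sigma2 = resid @ resid / (n - D.shape[1])
        se = np.sqrt(sigma2 * np.linalg.inv(D.T @ D)[1, 1])
        est[j], lo[j], hi[j] = beta[1], beta[1] - z * se, beta[1] + z * se
    return est, lo, hi
```

Each fit involves at most `1 + k_select + k_neigh` predictors plus an intercept, so ordinary least-squares inference applies even when the number of features exceeds the sample size. The normal quantile `z=1.96` is used here in place of a t quantile to keep the sketch dependency-free.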
Year: 2022 PMID: 35688606 PMCID: PMC9427730 DOI: 10.1002/sim.9493
Source DB: PubMed Journal: Stat Med ISSN: 0277-6715 Impact factor: 2.497
Coverage rates and widths of the 95% confidence intervals produced by MNR, desparsified Lasso, and ridge projection for simulated examples, where “signal” and “noise” denote nonzero and zero regression coefficients, respectively
| Model | Measure | Coefficient | Desparsified‐Lasso | Ridge projection | MNR |
|---|---|---|---|---|---|
| Linear | Coverage | Signal | 0.880 (0.015) | 0.975 (0.007) | 0.955 (0.010) |
| | | Noise | 0.953 (0.010) | 0.981 (0.006) | 0.950 (0.010) |
| | Width | Signal | 0.374 (0.006) | 0.682 (0.010) | 0.379 (0.005) |
| | | Noise | 0.377 (0.007) | 0.693 (0.012) | 0.387 (0.006) |
| | CPU(s) | | 390.9 | 2.393 | 224.6 |
| Logistic | Coverage | Signal | 0.135 (0.015) | 0.199 (0.018) | 0.940 (0.011) |
| | | Noise | 0.990 (0.005) | 1.000 (0.0002) | 0.948 (0.010) |
| | Width | Signal | 0.831 (0.011) | 1.693 (0.025) | 1.497 (0.016) |
| | | Noise | 0.784 (0.014) | 1.677 (0.030) | 1.059 (0.017) |
| | CPU(s) | | 2036 | 7.765 | 532.9 |
| Survival | Coverage | Signal | ‐ | ‐ | 0.939 (0.011) |
| | | Noise | ‐ | ‐ | 0.945 (0.010) |
| | Width | Signal | ‐ | ‐ | 0.395 (0.004) |
| | | Noise | ‐ | ‐ | 0.370 (0.006) |
| | CPU(s) | | ‐ | ‐ | 335.4 |
Note: The CPU time (in seconds) was recorded for a single dataset with the method running in serial on a personal computer of i9‐10900k CPU@3.6 GHz with 128 GB memory.
We set the SIS iteration number to 1 in step (a) of Algorithm 1.
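The coverage rates and widths reported in the table above can be computed from interval endpoints along the following lines. This is a minimal sketch for one simulated dataset, assuming the intervals and the true coefficients are available as arrays; the function name `coverage_and_width` is hypothetical, and how the source averages across datasets or computes the values in parentheses is not stated here.

```python
import numpy as np

def coverage_and_width(lower, upper, truth):
    """Coverage rate and mean interval width, split by signal (nonzero) and noise (zero)."""
    lower, upper, truth = (np.asarray(a, dtype=float) for a in (lower, upper, truth))
    covered = (lower <= truth) & (truth <= upper)  # interval contains the true value
    width = upper - lower
    signal = truth != 0

    def summarize(mask):
        if not mask.any():
            return {"coverage": float("nan"), "width": float("nan")}
        return {"coverage": float(covered[mask].mean()),
                "width": float(width[mask].mean())}

    return {"signal": summarize(signal), "noise": summarize(~signal)}
```

A well-calibrated 95% interval should give coverage close to 0.95 in both groups; at a given coverage level, narrower intervals are preferable.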
Variable selection results by MNR, SIS‐SCAD, SIS‐MCP, SIS‐Lasso, and SIS‐Elastic‐Net for linear regression datasets simulated with , , , , and , 0.3, and 0.5
| Measure | MNR | MNR | MNR | MNR | SIS‐SCAD | SIS‐MCP | SIS‐Lasso | SIS‐Elastic‐Net | SIS‐Elastic‐Net |
|---|---|---|---|---|---|---|---|---|---|
| | | | | | | | | | |
| FSR | 0 | 0 | 0 | 0.029 | 0.010 | 0.010 | 0.320 | 0.875 | 0.812 |
| NSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.02 | 0.02 |
| | | | | | | | | | |
| FSR | 0 | 0 | 0.010 | 0.057 | 0.010 | 0.091 | 0.281 | 0.829 | 0.699 |
| NSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 | 0.01 |
| | | | | | | | | | |
| FSR | 0 | 0 | 0.010 | 0.057 | 0.038 | 0.057 | 0.254 | 0.701 | 0.554 |
| NSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | | |
| FSR | 0.010 | 0.010 | 0.030 | 0.117 | 0.546 | 0.560 | 0.429 | 0.674 | 0.579 |
| NSR | 0.05 | 0.04 | 0.02 | 0.02 | 0.02 | 0.01 | 0 | 0.01 | 0.01 |
Note: For the elastic‐net penalty, we tried the setting .
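The FSR and NSR measures in the table above are not defined in this record. The sketch below assumes they are the false selection rate and negative selection rate, pooled over simulated datasets: FSR is the share of selected variables that are not true signals, and NSR is the share of true signals that were missed. The function name `fsr_nsr` and its arguments are hypothetical.

```python
def fsr_nsr(selected_sets, true_set):
    """Pooled false selection rate and negative selection rate over replicates.

    selected_sets: one collection of selected variable indices per dataset.
    true_set: indices of the truly nonzero coefficients (same for every dataset).
    """
    true_set = set(true_set)
    false_selected = sum(len(set(s) - true_set) for s in selected_sets)
    total_selected = sum(len(set(s)) for s in selected_sets)
    missed = sum(len(true_set - set(s)) for s in selected_sets)
    total_true = len(true_set) * len(selected_sets)
    fsr = false_selected / total_selected if total_selected else 0.0
    nsr = missed / total_true if total_true else 0.0
    return fsr, nsr
```

Under this reading, a method with FSR near 0 rarely selects spurious variables, and a method with NSR near 0 rarely misses true ones.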
Variable selection results by MNR, SIS‐SCAD, SIS‐MCP, SIS‐Lasso, and SIS‐Elastic‐Net for linear regression datasets simulated with , , , , and
| Measure | MNR | MNR | MNR | MNR | SIS‐SCAD | SIS‐MCP | SIS‐Lasso | SIS‐Elastic‐Net |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |
| FSR | 0 | 0 | 0.005 | 0.024 | 0.010 | 0.476 | 0.817 | 0.708 |
| NSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | |
| FSR | 0 | 0 | 0.005 | 0.024 | 0.206 | 0.541 | 0.845 | 0.715 |
| NSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| | | | | | | | | |
| FSR | 0 | 0 | 0.015 | 0.107 | 0.631 | 0.650 | 0.779 | 0.668 |
| NSR | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.01 |
| | | | | | | | | |
| FSR | 0 | 0 | 0.010 | 0.099 | 0.752 | 0.716 | 0.771 | 0.655 |
| NSR | 0.025 | 0.01 | 0.005 | 0 | 0 | 0 | 0 | 0.01 |
| | | | | | | | | |
| FSR | 0 | 0 | 0.010 | 0.1 | 0.762 | 0.739 | 0.783 | 0.644 |
| NSR | 0.060 | 0.05 | 0.035 | 0.01 | 0.005 | 0.005 | 0 | 0.01 |
Note: For the elastic‐net penalty, we set .
Comparison of drug sensitive genes/mutations selected by desparsified Lasso, ridge projection, and MNR for 24 anti‐cancer drugs, where “*” indicates that this gene was significantly selected and the number in the parentheses denotes the width of the 95% confidence interval, and “‐MUT” indicates a mutation
| Drug | Desparsified‐Lasso | Ridge | MNR |
|---|---|---|---|
| 17‐AAG | ‐ | NQO1(0.194) | NQO1(0.247) |
| AEW541 | ‐ | NFE2L3(0.327) | GPATCH3(0.245) |
| AZD0530 | ‐ | STK39(0.331) | PYY(0.208) |
| AZD6244 | ‐ | SPRY2(0.303) | NRAS‐MUT*(0.548) |
| Erlotinib | ‐ | EGFR‐MUT(1.498) | EGFR‐MUT*(0.814) |
| | | ‐ | CLK3‐MUT*(1.506) |
| | | ‐ | EGFR*(0.261) |
| Irinotecan | ‐ | SLFN11*(0.337) | SLFN11*(0.2) |
| L‐685458 | ‐ | SELPLG(0.473) | WDR86*(0.203) |
| Lapatinib | ‐ | ERBB2(0.561) | SCO1(0.303) |
| LBW242 | ‐ | SET‐MUT(10.27) | SET‐MUT*(5.075) |
| Nilotinib | ‐ | APOL4*(0.474) | CAMK2A‐MUT*(2.017) |
| | | | NCF4*(0.349) |
| | | | CCL23*(0.352) |
| | | | TRDC*(0.211) |
| | | | RNASE2*(0.437) |
| | | | APOL4*(0.277) |
| Nutlin‐3 | ‐ | SPIC(0.398) | ASB16*(0.231) |
| Paclitaxel | ‐ | ABCB1(0.326) | TM2D2*(0.280) |
| Panobinostat | ‐ | LOC100652995(0.250) | SVIP*(0.201) |
| PD‐0325901 | ‐ | SPRY2(0.324) | THRSP‐MUT(2.696) |
| PD‐0332991 | ‐ | TMTC2(0.346) | NFE2L3*(0.223) |
| PF2341066 | ‐ | SCD5‐MUT(8.433) | SCD5‐MUT*(3.239) |
| | | | ANKRD22*(0.251) |
| | | | WDFY4*(0.314) |
| PHA‐665752 | ‐ | GCFC2(0.387) | PDPK1‐MUT(3.429) |
| PLX4720 | ‐ | BRAFV600E‐MUT*(1.830) | BRAFV600E‐MUT*(0.899) |
| | | ‐ | PLEKHH3*(0.19) |
| | | ‐ | IRAK1‐MUT*(1.66) |
| RAF265 | ‐ | GNPTAB(0.354) | FAM89B*(0.255) |
| Sorafenib | ‐ | PROSER1(0.523) | DNAJC5B*(0.284) |
| | | ‐ | THAP10*(0.261) |
| TAE684 | ‐ | SELPLG(0.457) | PPFIA1*(0.292) |
| TKI258 | ‐ | WDFY4(0.464) | THEMIS*(0.304) |
| Topotecan | ‐ | SLFN11*(0.278) | SLFN11(0.17) |
| ZD‐6474 | ‐ | APOL4(0.417) | PGBD2*(0.206) |
Note: For each dataset, ridge regression cost 2.6 minutes CPU time with a single thread running in serial, and MNR cost 46.5 minutes CPU time with 10 threads running in parallel. All methods were run on the same personal computer with i9‐10900k CPU@3.6GHz and 128 GB memory.
Comparison of MNR with SIS‐SCAD, SIS‐MCP, and SIS‐Lasso for model prediction and variable selection on three selected drugs, 17‐AAG, Irinotecan, and PLX4720, via 5‐fold cross‐validation experiments: “MSFE” denotes the mean squared fitting error, “MSPE” denotes the mean squared prediction error, and “Size” denotes the number of selected gene/mutations, which are reported as the average over 5‐fold results with the standard deviation given in the parentheses; “selected Genes/mutations” shows the genes and mutations selected in the 5‐fold experiments, where the number in the parentheses represents the selection frequency of each selected gene/mutation
| Drug | Methods | MSFE | MSPE | Size | Selected genes/mutations |
|---|---|---|---|---|---|
| 17‐AAG | SIS‐SCAD | 0.62(0.21) | 0.88(0.16) | 20.0(11.5) | NQO1(4),CDH6(3),MMP24(3),ZNF610(3), ZFP30(3),ZNF14(3) |
| | SIS‐MCP | 0.54(0.02) | 0.89(0.14) | 16.2(3.5) | NQO1(5), CDH6(3), MMP24(3), ZFP30(3), CBFB(3) |
| | SIS‐Lasso | 0.77(0.17) | 0.99(0.10) | 7.8(11.0) | MMP24(4), NQO1(2), ZFP30(2), CTDSP1(2) |
| | MNR | 0.93(0.04) | 0.98(0.11) | 1.2(0.5) | NQO1(4) |
| Irinotecan | SIS‐SCAD | 0.44(0.05) | 0.55(0.08) | 6.6(0.9) | ARHGAP19(5),SLFN11(4) |
| | SIS‐MCP | 0.46(0.05) | 0.56(0.09) | 3.8(0.8) | ARHGAP19(5), SLFN11(4) |
| | SIS‐Lasso | 0.43(0.06) | 0.54(0.09) | 9.8(3.0) | ARHGAP19(5), CPSF6(5), SLFN11(4), CD63(3) |
| | MNR | 0.74(0.02) | 0.75(0.07) | 1.0(0.0) | SLFN11(5) |
| PLX4720 | SIS‐SCAD | 0.59(0.05) | 0.91(0.28) | 9.8(5.1) | GAPDHS(3),MAD1L1(3),RXRG(2), LPL(2),ART3(2),ZFP106(2) |
| | SIS‐MCP | 0.61(0.04) | 0.89(0.27) | 5.4(2.8) | GAPDHS(3), ZFP106(2), ZEB2(2) |
| | SIS‐Lasso | 0.60(0.06) | 0.87(0.23) | 10.2(5.6) | SPRYD5(5), GAPDHS(4), RXRG(3) |
| | MNR | 0.52(0.05) | 0.65(0.12) | 3.2(3.8) | BRAF.V600E‐MUT(5), IRAK1‐MUT(2) |
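The MSFE, MSPE, and Size columns above can be produced by a cross-validation loop of roughly the following shape. This is a sketch under stated assumptions, not the authors' code: the selected variables are refit by ordinary least squares on each training fold, the standard deviation is the sample standard deviation across folds, and `select` is a hypothetical callback standing in for any of the selection methods compared.

```python
import numpy as np

def cv_msfe_mspe(X, y, select, n_folds=5, seed=0):
    """Mean and SD over folds of fitting error, prediction error, and model size.

    select(X_train, y_train) must return the column indices chosen on the training fold.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    msfe, mspe, size = [], [], []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[i] for i in range(n_folds) if i != k])
        cols = list(select(X[train], y[train]))
        # Refit by least squares (with intercept) on the selected columns only.
        A_train = np.column_stack([np.ones(len(train)), X[np.ix_(train, cols)]])
        A_test = np.column_stack([np.ones(len(test)), X[np.ix_(test, cols)]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        msfe.append(np.mean((y[train] - A_train @ beta) ** 2))  # error on training fold
        mspe.append(np.mean((y[test] - A_test @ beta) ** 2))    # error on held-out fold
        size.append(len(cols))

    def summarize(values):
        return float(np.mean(values)), float(np.std(values, ddof=1))

    return {"MSFE": summarize(msfe), "MSPE": summarize(mspe), "Size": summarize(size)}
```

A large gap between MSFE and MSPE, as for some of the SIS-based rows, is the usual sign of overfitting from selecting too many variables; a small gap with a small Size suggests a more parsimonious fit.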