Markov neighborhood regression for statistical inference of high-dimensional generalized linear models.

Abstract

High-dimensional inference is one of the fundamental problems in modern biomedical studies. However, existing methods do not perform satisfactorily. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the Markov neighborhood regression method that allows it to be applied to statistical inference for high-dimensional generalized linear models with mixed features. The Markov neighborhood regression method is highly attractive in that it breaks high-dimensional inference problems into a series of low-dimensional inference problems. The proposed method is applied to the cancer cell line encyclopedia data to identify genes and mutations associated with sensitivity to anti-cancer drugs. The numerical results favor the Markov neighborhood regression method over the existing ones.

Keywords: P-value; confidence interval; graphical model; likelihood ratio test; nodewise regression

Year: 2022; PMID: 35688606; PMCID: PMC9427730; DOI: 10.1002/sim.9493

Source DB: PubMed; Journal: Stat Med; ISSN: 0277-6715; Impact factor: 2.497

INTRODUCTION

During the past two decades, dramatic improvements in data collection and acquisition technologies have enabled scientists to collect a great amount of high‐dimensional data, for which the dimension $p$ can be much larger than the sample size $n$ (a.k.a. small‐$n$‐large‐$p$). Current research on high‐dimensional data focuses mainly on variable selection and graphical modeling. The former aims to provide a consistent estimate of the regression model under sparsity constraints; existing methods include Lasso, SCAD, MCP, elastic net, and rLasso, among others. The latter aims to learn conditional independence relationships among a large set of variables; existing methods include graphical Lasso, nodewise regression, and ψ‐learning, among others. More recently, a growing number of researchers have turned their attention to statistical inference, that is, to statistical procedures that quantify the uncertainty of high‐dimensional regression, for example, by constructing confidence intervals and assessing p‐values for a single regression coefficient or a subset of them. A non‐exhaustive list of existing methods includes desparsified Lasso, multi sample‐splitting, ridge projection, and Markov neighborhood regression (MNR). See Section S1 of the supplementary material and Section 2 of this article for a brief review of these methods. Among the existing methods, MNR is a promising one. Based on the Markov property of Gaussian graphical models (GGMs), it breaks the high‐dimensional inference problem into a series of low‐dimensional inference problems, from which the desired confidence intervals and p‐values can be computed as for conventional low‐dimensional regression problems. Compared to the existing methods, MNR tends to produce confidence intervals whose coverage rates are closer to the nominal level.
However, the theory developed in Reference 16 makes MNR applicable only when the explanatory variables follow a multivariate Gaussian distribution, which severely limits the scope of its applications. Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification of the MNR method that extends it to general high‐dimensional inference problems. In particular, it can be applied to statistical inference for high‐dimensional generalized linear models (GLMs) with mixed features, where the features can be continuous, discrete, or both, and the response variable can be Gaussian, Poisson, multinomial, or even a survival time (for Cox regression). This article also provides an algorithm for implementing the MNR method and proves its validity. The numerical results favor the MNR method over the existing ones. The remainder of this article is organized as follows. Section 2 briefly reviews the MNR method with Gaussian explanatory variables. Section 3 extends the MNR method to general high‐dimensional inference problems. Section 4 illustrates the performance of MNR in comparison with the desparsified Lasso and ridge projection methods. Section 5 presents the application of the MNR method to the cancer cell line encyclopedia (CCLE) data. Section 6 concludes the article with a brief discussion.

A BRIEF REVIEW OF THE MNR METHOD

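Suppose that a set of $n$ independent samples has been collected from the linear regression with a random design:
$$Y = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p + \epsilon, \tag{1}$$
where $\epsilon$ follows the Gaussian distribution $N(0, \sigma^2)$, and the explanatory variables (also known as features) $\boldsymbol{X} = (X_1, \ldots, X_p)^T$ follow a multivariate normal distribution $N_p(\boldsymbol{0}, \Sigma)$. Let $S^* = \{j : \beta_j \neq 0,\ 1 \le j \le p\}$ denote the support of the true model, which is sparse. Suppose that $\boldsymbol{X}$ has been represented by a GGM $G = (V, E)$, where $V = \{1, \ldots, p\}$ represents the set of vertices and $E = (e_{ij})$ represents the adjacency matrix, with $e_{ij} = 1$ if the $(i,j)$th entry of the precision matrix $\Theta = \Sigma^{-1}$ is nonzero and $e_{ij} = 0$ otherwise. Let $\boldsymbol{X}_A = \{X_k : k \in A\}$ denote the set of features indexed by $A \subseteq V$, and let $\xi_j = \{k : e_{jk} = 1\}$ denote the neighboring set of $X_j$ in $G$. It follows from the Markov property of the GGM that, for any $j \in V$,
$$X_j \perp \boldsymbol{X}_{V \setminus (\xi_j \cup \{j\})} \mid \boldsymbol{X}_{\xi_j},$$
so $\xi_j$ is called the minimum Markov neighborhood of $X_j$ in $G$. The minimum Markov neighborhood is also termed the Markov blanket in Bayesian networks or general Markov networks. Any subset $s_j \subseteq V \setminus \{j\}$ is a Markov neighborhood of $X_j$ if $\xi_j \subseteq s_j$. Without loss of generality, let $S_1$ denote a Markov neighborhood of $X_1$, order the features so that $X_1$ and $\boldsymbol{X}_{S_1}$ come first, and partition $\Theta$ as
$$\Theta = \begin{pmatrix} \Theta_{11} & \Theta_{12} \\ \Theta_{21} & \Theta_{22} \end{pmatrix}, \tag{2}$$
where $\Theta_{11}$ is the $d \times d$ block corresponding to $(X_1, \boldsymbol{X}_{S_1})$ and $d = 1 + |S_1|$. By a well‐known property of the GGM, for any variables $X_i$ and $X_j$,
$$X_i \perp X_j \mid \boldsymbol{X}_{V \setminus \{i,j\}} \iff \theta_{ij} = 0, \tag{3}$$
where $\theta_{ij}$ denotes the $(i,j)$th entry of $\Theta$. Therefore, the first row of $\Theta_{12}$ and the first column of $\Theta_{21}$ in (2) are exactly zero, because $X_1 \perp \boldsymbol{X}_{V \setminus (\{1\} \cup S_1)} \mid \boldsymbol{X}_{S_1}$ holds. Inverting $\Theta$ blockwise, the top $d \times d$ submatrix of $\Sigma$, that is, the covariance matrix $\Sigma_{11}$ of $(X_1, \boldsymbol{X}_{S_1})$, is $\Sigma_{11} = (\Theta_{11} - \Theta_{12}\Theta_{22}^{-1}\Theta_{21})^{-1}$. Therefore,
$$\Sigma_{11}^{-1} = \Theta_{11} - \Theta_{12}\Theta_{22}^{-1}\Theta_{21}. \tag{4}$$
Since the first row of $\Theta_{12}$ and the first column of $\Theta_{21}$ are exactly zero, the $(1,1)$th element of $\Theta_{12}\Theta_{22}^{-1}\Theta_{21}$ is exactly zero. Therefore, the $(1,1)$th entry of $\Theta_{11}$ (and of $\Theta$), namely $\theta_{11}$, equals the $(1,1)$th entry of $\Sigma_{11}^{-1}$. This suggests that if $S^* \subseteq \{1\} \cup S_1$ holds and $n$ is sufficiently large, then statistical inference for $\beta_1$ can be made based on the subset regression
$$Y = \beta_0 + X_1\beta_1 + \boldsymbol{X}_{S_1}^T\boldsymbol{\beta}_{S_1} + \epsilon. \tag{5}$$
Since $\boldsymbol{X}_{S_1}$ forms a Markov neighborhood of $X_1$ in the GGM formed by all features, the subset regression (5) is called a Markov neighborhood regression; by solving such a subset regression for each feature, MNR breaks the high‐dimensional inference problem into a series of low‐dimensional inference problems. Let $\hat{S}$ denote an estimate of $S^*$, let $\hat{\xi}_j$ denote an estimate of $\xi_j$, and let $D_j = \{j\} \cup \hat{\xi}_j \cup \hat{S}$.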
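The key step of the argument above, that the diagonal entry of the inverse covariance matrix of $(X_j, \boldsymbol{X}_S)$ corresponding to $X_j$ equals $\theta_{jj}$ whenever $S$ contains all neighbors of $X_j$, can be checked numerically. The following Python sketch assumes a Gaussian chain graph with $\theta_{jj} = 1$ and $\theta_{j,j\pm 1} = -0.4$; this graph and the chosen subsets are illustrative, not settings used in this article.

```python
import numpy as np


def chain_precision(p, rho=0.4):
    """Precision matrix of a Gaussian chain graph X_0 - X_1 - ... - X_{p-1} (positive definite for rho < 0.5)."""
    theta = np.eye(p)
    for i in range(p - 1):
        theta[i, i + 1] = theta[i + 1, i] = -rho
    return theta


def subset_precision_entry(sigma, j, subset):
    """Entry of the inverse covariance matrix of (X_j, X_subset) that corresponds to X_j."""
    idx = [j] + [k for k in subset if k != j]
    return float(np.linalg.inv(sigma[np.ix_(idx, idx)])[0, 0])


theta = chain_precision(8)
sigma = np.linalg.inv(theta)
j = 3                                                      # 0-based; its chain neighbors are 2 and 4
full = float(theta[j, j])                                  # theta_jj = 1
with_nbhd = subset_precision_entry(sigma, j, [2, 4, 6])    # contains {2, 4}: a Markov neighborhood
without_nbhd = subset_precision_entry(sigma, j, [2, 5])    # misses the neighbor 4
print(full, round(with_nbhd, 6), round(without_nbhd, 6))   # 1.0 1.0 0.84
```

With a Markov neighborhood the entry is exactly $\theta_{jj} = 1$; dropping the neighbor $X_4$ gives $1 - 0.4^2 = 0.84$, because $X_3$ and $X_4$ remain dependent given the reduced conditioning set.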
Reference 16 proved the validity of the MNR method under the following conditions:
$$P(S^* \subseteq \hat{S}) \to 1, \tag{6}$$
$$P(\xi_j \subseteq \hat{\xi}_j) \to 1, \tag{7}$$
and a sparsity condition (8) requiring the size of $D_j$ to grow sufficiently slowly with $n$. For each $j$, if conditions (6) to (8) are satisfied, then
$$\frac{\sqrt{n}\,(\hat{\beta}_j - \beta_j)}{\sqrt{\sigma^2\theta_{jj}}} \xrightarrow{d} N(0, 1),$$
where $\hat{\beta}_j$ is the estimate of $\beta_j$ obtained from the subset regression of $Y$ on $\boldsymbol{X}_{D_j}$ and $\theta_{jj}$ is the $(j,j)$th entry of the precision matrix $\Theta$. For finite $n$, the distribution of $\hat{\beta}_j$ can be approximated by the usual sampling distribution of the subset‐regression estimator; that is, the estimate, p‐value, and confidence interval of $\beta_j$ can be calculated from a subset regression as in conventional low‐dimensional multiple linear regression. As implied by the proof of Theorem 1 of Reference 16, conditions (6) and (8) together ensure the convergence $\sqrt{n}(\hat{\beta}_j - \beta_j) \xrightarrow{d} N(0, \sigma^2(\Sigma_{D_j}^{-1})_{jj})$, where $\xrightarrow{d}$ denotes convergence in distribution, $\Sigma_{D_j}$ denotes the covariance matrix of the features included in $D_j$, and $(\Sigma_{D_j}^{-1})_{jj}$ denotes the diagonal element of $\Sigma_{D_j}^{-1}$ corresponding to $X_j$; condition (7) then ensures $(\Sigma_{D_j}^{-1})_{jj} = \theta_{jj}$, as explained above (around Equation 4). Finally, we note that the inference problem addressed by MNR is very different from the post‐selection inference problem considered in the literature. Let $V$ denote the collection of all features of a high‐dimensional regression, and let $M \subseteq V$ denote a subset model. Post‐selection inference constructs a confidence interval $C_j$ for the coefficient $\beta_j^{M}$ of any feature $j \in M$ in model $M$, conditioned on the event that $M$ is selected (i.e., $\hat{M} = M$), such that $P(\beta_j^{M} \in C_j \mid \hat{M} = M) \ge 1 - \alpha$ for a prespecified confidence level $1 - \alpha$. Because of the intrinsic correlation between the selected model and the outputs of statistical tests, the theory for post‐selection inference is rather intricate. In contrast, the problem addressed by MNR is relatively simple: to find a confidence interval $C_j$ for any feature $j \in V$ such that $P(\beta_j \in C_j) \ge 1 - \alpha$. Not conditioning on the selected model makes the theory, as developed in Reference 16 and in the current article, much simpler than that of post‐selection inference. Under appropriate sparsity assumptions, our inference procedure is valid as long as the variable and structure selection procedures employed in steps (a) and (b) of Algorithm 1 have the consistency or sure screening property (see Section 3 for details).
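To make the subset‐regression step concrete, the following Python sketch computes a Wald confidence interval for $\beta_j$ from an ordinary least squares fit on the subset $D_j = \{j\} \cup \xi_j \cup S^*$ and estimates its coverage by simulation. It is an oracle sketch under stated assumptions: the Gaussian chain design, the coefficients, and the use of the true $S^*$ and Markov blanket in place of the estimates from steps (a) and (b) are illustrative choices, and a normal quantile is used instead of a $t$ quantile.

```python
import numpy as np
from statistics import NormalDist


def subset_ols_ci(y, X, D, j, level=0.95):
    """Wald confidence interval for beta_j from the OLS fit of y on an intercept and X[:, D]."""
    Z = np.column_stack([np.ones(len(y)), X[:, D]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = resid @ resid / (len(y) - Z.shape[1])        # residual variance estimate
    pos = 1 + D.index(j)                                   # column of X_j in Z (column 0 = intercept)
    se = np.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[pos, pos])
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return coef[pos] - z * se, coef[pos] + z * se


def oracle_mnr_coverage(n=200, p=50, reps=400, seed=1):
    """Empirical coverage for beta_j when D_j = {j} | xi_j | S* is known (oracle steps (a) and (b))."""
    rng = np.random.default_rng(seed)
    theta = np.eye(p)
    for i in range(p - 1):                                 # Gaussian chain graph
        theta[i, i + 1] = theta[i + 1, i] = -0.4
    L = np.linalg.cholesky(np.linalg.inv(theta))
    beta = np.zeros(p)
    support = [0, 10, 20]                                  # hypothetical S*
    beta[support] = [1.0, -1.0, 0.5]
    j = 5                                                  # a zero coefficient; xi_j = {4, 6}
    D = sorted({j, j - 1, j + 1} | set(support))
    hits = 0
    for _ in range(reps):
        X = rng.standard_normal((n, p)) @ L.T              # rows ~ N_p(0, Sigma)
        y = X @ beta + rng.standard_normal(n)
        lo, hi = subset_ols_ci(y, X, D, j)
        hits += int(lo <= beta[j] <= hi)
    return hits / reps


print(oracle_mnr_coverage())
```

The printed coverage should be close to the nominal 0.95; replacing the oracle sets with estimates from steps (a) and (b) is where conditions (6) and (7) come into play.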

EXTENSION OF THE MNR METHOD TO GENERAL HIGH‐DIMENSIONAL INFERENCE PROBLEMS

From the brief review given in Section 2, we know that the validity of the MNR method depends crucially on the normality assumption for the features $\boldsymbol{X}$. Otherwise, (3) does not hold, and the proof no longer goes through. The normality assumption severely limits the application scope of the method. In what follows, we provide a new proof of the validity of the MNR method so that it can be used for statistical inference of high‐dimensional GLMs with mixed features. The density function of the GLM is given by
$$f(y \mid \eta, \phi) = \exp\left\{\frac{y\eta - b(\eta)}{\phi} + c(y, \phi)\right\}, \tag{9}$$
where $\phi$ is the dispersion parameter, $b(\cdot)$ is continuously differentiable, and $\eta$ is the natural parameter, which relates to the features via the linear function
$$\eta = \beta_0 + X_1\beta_1 + \cdots + X_p\beta_p, \tag{10}$$
where the features can be continuous, discrete, or both. This class of GLMs includes normal linear regression, Poisson regression, and logistic regression, among others. Further, we assume that the joint distribution of the features can be represented by a graphical model and that the conditional distribution of each feature can be represented by a GLM. We refer to Reference 20 for discussions on the compatibility of the joint and conditional distributions. For example, when the features are a mixture of Gaussian and Bernoulli random variables, their joint distribution can be found in Reference 21, where it is shown that the conditional distribution of each Gaussian random variable can be represented by a linear regression and that of each Bernoulli random variable by a logistic regression. Toward the goal of making statistical inference for such a high‐dimensional GLM, that is, evaluating the p‐value and constructing a confidence interval for the coefficient of each feature, we extend the MNR method as follows. First, consider $f(Y, X_j \mid \boldsymbol{X}_{V \setminus \{j\}})$, the joint distribution of $Y$ and $X_j$ conditioned on all other features $\boldsymbol{X}_{V \setminus \{j\}}$. Suppose that a Markov network has been constructed for the features and that the Markov blanket $\xi_j$ has been identified for $X_j$.
Here the Markov network can be a Bayesian network or its moral graph, based on which a Markov blanket can be identified for each variable $X_j$. Recall that the Markov blanket of a node $X_j$ is the minimum subset of $\boldsymbol{X}_{V \setminus \{j\}}$ conditional on which $X_j$ is independent of all other variables. By this property of the Markov blanket, and since $Y$ depends on $\boldsymbol{X}$ only through $\boldsymbol{X}_{S^*}$, the joint distribution can be simplified as
$$f(Y, X_j \mid \boldsymbol{X}_{V \setminus \{j\}}) = f(Y \mid X_j, \boldsymbol{X}_{\xi_j \cup S^*})\, f(X_j \mid \boldsymbol{X}_{\xi_j}), \tag{11}$$
where $f(Y \mid X_j, \boldsymbol{X}_{\xi_j \cup S^*})$ can be modeled by a subset GLM with the natural parameter
$$\eta = \beta_0 + X_j\beta_j + \boldsymbol{X}_{(\xi_j \cup S^*) \setminus \{j\}}^T\,\boldsymbol{\beta}_{(\xi_j \cup S^*) \setminus \{j\}}, \tag{12}$$
where $\boldsymbol{\beta}_{(\xi_j \cup S^*) \setminus \{j\}}$ denotes the regression coefficients corresponding to the features $\boldsymbol{X}_{(\xi_j \cup S^*) \setminus \{j\}}$. Suppose that the likelihood ratio test is used to test the hypothesis $H_0: \beta_j = 0$ versus $H_1: \beta_j \neq 0$ with respect to the GLM (9), which characterizes the relationship between $Y$ and $X_j$ conditioned on $\boldsymbol{X}_{V \setminus \{j\}}$. By (11), the factor $f(X_j \mid \boldsymbol{X}_{\xi_j})$ does not involve the regression coefficients and cancels in the likelihood ratio, so this test reduces to the likelihood ratio test of the same hypothesis with respect to the subset GLM with the natural parameter (12). In summary, the MNR method can be described as Algorithm 1, which breaks the high‐dimensional inference problem into a series of low‐dimensional inference problems by solving a subset GLM for each feature. The likelihood ratio test has been well studied for GLMs. For linear regression, the likelihood ratio test and the Wald test give the same results even when $n$ is finite: the square of the Wald t‐statistic for a single coefficient is numerically identical to the likelihood ratio F‐statistic for the same coefficient. For other GLMs, such as logistic regression, this equality does not hold. However, the two tests are asymptotically equivalent, in the sense that they produce the same inference when $n$ is large and $H_0$ holds, following from the asymptotic theory established in Reference 22, where the number of parameters is allowed to grow with $n$. That is, if conditions (6) to (8) are satisfied, the inference for each subset GLM in Algorithm 1 can be done as for conventional low‐dimensional GLMs. Conditions (6) to (8) also imply that the MNR method can be implemented in many different ways.
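The likelihood ratio and Wald tests for a single coefficient of a subset GLM can be sketched as follows. This is a minimal Python illustration, not the implementation used in this article: it fits a logistic GLM (the canonical Bernoulli case with $b(\eta) = \log(1 + e^{\eta})$) by Newton-Raphson, and it also checks numerically that, for linear regression, the squared Wald t‐statistic equals the partial F‐statistic. The simulated subset design `X_D` and its coefficients are hypothetical.

```python
import math
import numpy as np


def fit_logistic(Z, y, iters=100, tol=1e-10):
    """Newton-Raphson (IRLS) for logistic regression, the Bernoulli GLM with b(eta) = log(1 + e^eta).
    Returns the MLE, the maximized log-likelihood, and the inverse Fisher information."""
    beta = np.zeros(Z.shape[1])
    for _ in range(iters):
        mu = 1.0 / (1.0 + np.exp(-(Z @ beta)))            # b'(eta): fitted probabilities
        info = Z.T @ (Z * (mu * (1.0 - mu))[:, None])      # Fisher information with b''(eta) weights
        step = np.linalg.solve(info, Z.T @ (y - mu))       # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    eta = Z @ beta
    mu = 1.0 / (1.0 + np.exp(-eta))
    loglik = float(np.sum(y * eta - np.logaddexp(0.0, eta)))   # sum of y * eta - b(eta)
    info = Z.T @ (Z * (mu * (1.0 - mu))[:, None])
    return beta, loglik, np.linalg.inv(info)


def chi2_1_sf(x):
    """Upper tail probability of the chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(max(x, 0.0) / 2.0))


def lrt_and_wald(Z, y, pos):
    """LRT and Wald statistics (with p-values) for H0: coefficient `pos` = 0 in a subset logistic GLM."""
    beta, ll_full, cov = fit_logistic(Z, y)
    _, ll_null, _ = fit_logistic(np.delete(Z, pos, axis=1), y)
    lr = 2.0 * (ll_full - ll_null)
    wald = beta[pos] ** 2 / cov[pos, pos]
    return lr, chi2_1_sf(lr), wald, chi2_1_sf(wald)


def ols_t2_and_F(Z, y, pos):
    """Squared Wald t-statistic and partial F-statistic for one coefficient in OLS."""
    n, d = Z.shape

    def rss(M):
        c, *_ = np.linalg.lstsq(M, y, rcond=None)
        r = y - M @ c
        return float(r @ r), c

    rss_full, coef = rss(Z)
    rss_null, _ = rss(np.delete(Z, pos, axis=1))
    s2 = rss_full / (n - d)
    t = coef[pos] / math.sqrt(s2 * np.linalg.inv(Z.T @ Z)[pos, pos])
    return t ** 2, (rss_null - rss_full) / s2


rng = np.random.default_rng(0)
n = 500
X_D = rng.standard_normal((n, 4))                          # hypothetical subset design X_{D_j}
Z = np.column_stack([np.ones(n), X_D])
prob = 1.0 / (1.0 + np.exp(-(0.3 + X_D @ np.array([0.0, 0.8, -0.5, 0.4]))))
y_bin = (rng.random(n) < prob).astype(float)
print(lrt_and_wald(Z, y_bin, pos=1))                       # test the first feature, a null one
y_lin = X_D @ np.array([0.0, 1.0, -1.0, 0.5]) + rng.standard_normal(n)
print(ols_t2_and_F(Z, y_lin, pos=1))                       # the two numbers agree
```

For the logistic model the two statistics differ in finite samples, which is why the text appeals to their asymptotic equivalence rather than to an exact identity.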
Condition (6) is the so‐called screening property, which is known to hold for many high‐dimensional variable selection algorithms, such as SCAD, MCP, elastic net, and adaptive Lasso. Lasso also satisfies this condition under appropriate conditions on the design matrix; see Reference 24 for a review. For Markov blanket estimation, at least two methods can be used. The first is ψ‐learning, which provides a consistent estimate of the moral graph for mixed data. The second is nodewise regression, which was first proposed in Reference 8 for learning GGMs with an $\ell_1$‐penalty and then extended in Reference 25 to graphical models for binary data and in Reference 20 to graphical models for mixed data. This article extends the method further: in Appendix A, we show that nodewise regression can be applied to learn mixed graphical models with an amenable penalty, which includes the Lasso, SCAD, and MCP penalties as special cases. Further, condition (8) can be easily satisfied by a slight adjustment of the sparsity constraints imposed on the variable selection and Markov blanket estimation algorithms. Theorem 1 provides a formal justification for the validity of Algorithm 1; its proof is given in Appendix A.
Theorem 1 (Validity of Algorithm 1). Consider a GLM given in (9) with sample size $n$ and dimension $p$, where the features form a mixed graphical model with compatible joint and conditional distributions. Suppose that the GLM is sparse, in the sense that the true model size $|S^*|$ grows sufficiently slowly with $n$; that the mixed graphical model is sparse, in the sense that the maximum neighborhood size $\max_j |\xi_j|$ grows sufficiently slowly with $n$; and that the regularity conditions given in Appendix A are satisfied. If a regularization method with an amenable penalty function is used for variable selection in step (a), and nodewise regression with an amenable penalty function is used for Markov blanket estimation in step (b), then Algorithm 1 is valid for statistical inference of high‐dimensional GLMs.
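Nodewise regression for Markov blanket estimation can be sketched as follows. The sketch assumes Gaussian features and a Lasso penalty with a fixed, hand‐picked $\lambda$ (the article uses SIS‐MCP and allows any amenable penalty, and mixed features would call for a GLM at each node), and it symmetrizes the estimated neighborhoods with the OR rule. The chain graph is illustrative.

```python
import numpy as np


def lasso_cd(X, y, lam, iters=1000, tol=1e-8):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1, with centered X and y."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                             # current residual y - X b
    for _ in range(iters):
        max_change = 0.0
        for k in range(p):
            rho = X[:, k] @ r / n + col_sq[k] * b[k]
            new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[k]   # soft-thresholding
            delta = new - b[k]
            if delta != 0.0:
                r -= X[:, k] * delta
                b[k] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def nodewise_blankets(X, lam, rule="or"):
    """Estimate each node's Markov blanket by Lasso-regressing it on all other nodes."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    adj = np.zeros((p, p), dtype=bool)
    for j in range(p):
        others = [k for k in range(p) if k != j]
        coef = lasso_cd(Xc[:, others], Xc[:, j], lam)
        adj[j, others] = coef != 0.0
    adj = (adj | adj.T) if rule == "or" else (adj & adj.T)   # symmetrize the neighborhoods
    return [set(np.flatnonzero(adj[j]).tolist()) for j in range(p)]


rng = np.random.default_rng(0)
p, n = 10, 500
theta = np.eye(p)
for i in range(p - 1):                                     # chain graph: true blankets are {j-1, j+1}
    theta[i, i + 1] = theta[i + 1, i] = -0.4
X = rng.standard_normal((n, p)) @ np.linalg.cholesky(np.linalg.inv(theta)).T
blankets = nodewise_blankets(X, lam=0.1)
print(blankets[4])
```

Because condition (7) only requires the estimated blanket to contain the true one, a few extra selected nodes are harmless here, whereas a missed neighbor is not.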
Regarding Algorithm 1 and the proof of its validity, we make two remarks.
Remark 1 (Joint inference). Algorithm 1 can be easily extended to joint inference for a finite number of variables. For example, suppose we want to test a linear hypothesis $H_0: \boldsymbol{a}^T\boldsymbol{\beta} = c$ versus $H_1: \boldsymbol{a}^T\boldsymbol{\beta} \neq c$, where $\boldsymbol{a}$ denotes a $p$‐vector with $k$ nonzero elements and $k$ is finite. For this hypothesis, the subset GLM can simply be constructed on $D_A = A \cup \hat{\xi}_A \cup \hat{S}$, where $A = \{j : a_j \neq 0\}$ and $\hat{\xi}_A = \cup_{j \in A}\hat{\xi}_j$.
Remark 2 (Accelerating computation by screening). For the subset GLM, if we do not insist on the equivalent Wald test but perform the likelihood ratio test directly, then condition (8) can be relaxed considerably. For linear regression, by Theorem 1 of Reference 27, condition (8) can be relaxed to a weaker growth condition on the subset‐model size, which is both sufficient and necessary for the chi‐square approximation of the likelihood ratio test. For logistic regression, the condition can be relaxed further: Reference 28 showed that if the sample size and the number of covariates grow large in such a way that the number of covariates divided by the sample size converges to some $\kappa < 1/2$, then the likelihood ratio statistic can be approximated by a rescaled chi‐square distribution. Based on these results, the sparsity conditions in Theorem 1 can be relaxed accordingly. More importantly, in this case, Algorithm 1 can be accelerated substantially by replacing the variable selection procedure, performed in step (a) as well as in step (b) for each node, with a sure independence screening procedure.
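Sure independence screening, mentioned above as a faster replacement for the selection in steps (a) and (b), ranks features by their absolute marginal correlation with the response and keeps the top $d$. In the minimal Python sketch below, the cutoff $d = \lfloor n / \log n \rfloor$ is a common default in the SIS literature, and the simulated data are illustrative only; the R package SIS used in this article also implements iterated versions, which are not reproduced here.

```python
import math
import numpy as np


def sis(X, y, d=None):
    """Sure independence screening: keep the d features with the largest |marginal correlation| with y."""
    n, p = X.shape
    if d is None:
        d = int(n / math.log(n))                   # the floor(n / log n) default
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    score = np.abs(Xs.T @ ys) / n                  # |sample correlation| of each feature with y
    keep = np.argsort(-score)[:d]
    return np.sort(keep)


rng = np.random.default_rng(0)
n, p = 200, 2000
X = rng.standard_normal((n, p))
y = 3 * X[:, 0] - 2 * X[:, 1] + 2 * X[:, 2] + rng.standard_normal(n)
kept = sis(X, y)
print(len(kept), kept[:5])
```

Screening keeps many more features than the true model, but only containment of the true model matters for the likelihood ratio test on the resulting subset GLM.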

SIMULATION STUDIES

Linear regression

We generated 500 datasets from a linear regression with Gaussian random error. The features were generated according to a Bayesian network with the adjacency matrix given in (13), using the following procedure. First, we ordered the variables as $X_1, \ldots, X_p$ and randomly marked half of them as continuous and half as binary. Next, we generated $X_1$, taking a continuous value if it was marked continuous and a binary value otherwise, and then generated the variables $X_2, \ldots, X_p$ sequentially by (14), with a fixed value of the parameter $\rho$. Algorithm 1 was applied to this example, where variable selection was conducted using the SIS‐MCP method implemented in the R package SIS, and the Markov blanket was estimated by nodewise regression, with SIS‐MCP used for the regression at each node. For comparison, the desparsified Lasso and ridge projection methods were also applied to this example; both are implemented in the R package hdi. For all other examples in this article, the algorithms were implemented in the same way. Table 1 summarizes the coverage rates and widths of the 95% confidence intervals produced by these methods for each regression coefficient. For the nonzero regression coefficients (denoted by “signal”), the mean coverage rate and its standard deviation are calculated from the coverage indicators $I(\beta_j \in \mathrm{CI}_j^{(k)})$, $k = 1, \ldots, 500$, where $I(\cdot)$ indicates whether the confidence interval from dataset $k$ covers $\beta_j$ and Var denotes the variance. Because the variance is divided by 500 in its calculation, the standard deviation represents the variability of the mean value (averaged over 500 independent datasets) for a single regression coefficient. For the width of the confidence interval, the mean and standard deviation were calculated similarly. For the zero regression coefficients (denoted by “noise”), the mean coverage rate, the mean width, and their standard deviations were calculated in the same way.
The comparison indicates that MNR clearly outperforms the existing methods: for both the nonzero and zero regression coefficients, the mean coverage rates produced by MNR (0.955 and 0.950) are close to the nominal level, whereas desparsified Lasso undercovers the nonzero coefficients (0.880) and ridge projection is conservative, with intervals nearly twice as wide. The reason why desparsified Lasso has deficient coverage has been explained in Reference 16: desparsified Lasso centers its confidence interval at a bias‐corrected Lasso estimator which, unfortunately, is still biased, although its bias is much smaller than that of the original Lasso estimator.
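The coverage summary described above can be sketched as follows. The exact formula is an assumption here: the mean is taken over coefficients and datasets, and the standard deviation is the square root of the average per‐coefficient variance of the 0/1 coverage indicators divided by the number of datasets, which is one reading of "dividing by 500".

```python
import numpy as np


def coverage_summary(covered):
    """covered[j, k] = 1 if the CI for coefficient j covers its true value in dataset k, else 0.
    Returns (mean coverage over coefficients and datasets,
             sqrt(mean over j of Var_k(covered[j, :]) / n_datasets))."""
    covered = np.asarray(covered, dtype=float)
    n_datasets = covered.shape[1]
    per_coef_var = covered.var(axis=1)             # variance of the indicators across datasets (ddof=0)
    return float(covered.mean()), float(np.sqrt(per_coef_var.mean() / n_datasets))


# Toy check: 2 coefficients x 4 datasets.
print(coverage_summary([[1, 1, 1, 0], [1, 1, 0, 0]]))
# A single coefficient covered in 478 of 500 datasets.
print(coverage_summary(np.array([[1] * 478 + [0] * 22])))
```

For a single coefficient covered in 478 of 500 datasets this gives a standard deviation of about 0.009, the same order as the values reported in Table 1.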

TABLE 1

Model	Measure	Coefficients	Desparsified‐Lasso	Ridge projection	MNR
Linear	Coverage	Signal	0.880 (0.015)	0.975 (0.007)	0.955 (0.010)
	Coverage	Noise	0.953 (0.010)	0.981 (0.006)	0.950 (0.010)
	Width	Signal	0.374 (0.006)	0.682 (0.010)	0.379 (0.005)
	Width	Noise	0.377 (0.007)	0.693 (0.012)	0.387 (0.006)
	CPU(s)		390.9	2.393	224.6
Logistic	Coverage	Signal	0.135 (0.015)	0.199 (0.018)	0.940 (0.011)
	Coverage	Noise	0.990 (0.005)	1.000 (0.0002)	0.948 (0.010)
	Width	Signal	0.831 (0.011)	1.693 (0.025)	1.497 (0.016)
	Width	Noise	0.784 (0.014)	1.677 (0.030)	1.059 (0.017)
	CPU(s)		2036	7.765	532.9
Survival	Coverage	Signal	‐	‐	0.939 (0.011)
	Coverage	Noise	‐	‐	0.945 (0.010)
	Width	Signal	‐	‐	0.395 (0.004)
	Width	Noise	‐	‐	0.370 (0.006)
	CPU(s)		‐	‐	335.4 ^a

Note: The CPU time (in seconds) was recorded for a single dataset with the method running in serial on a personal computer of i9‐10900k CPU@3.6 GHz with 128 GB memory.

^a We set the SIS iteration number to 1 in step (a) of Algorithm 1.

Coverage rates and widths of the 95% confidence intervals produced by MNR, desparsified Lasso, and ridge projection for the simulated examples, where “signal” and “noise” denote nonzero and zero regression coefficients, respectively.

Logistic regression

We simulated 500 datasets from a logistic regression. The features were generated according to (13) and (14) as in Section 4.1. The MNR, desparsified Lasso, and ridge projection methods were applied to this example, with the results summarized in the “Logistic” rows of Table 1. For MNR, SIS‐MCP was used for both variable selection and Markov blanket construction. The comparison again indicates that MNR clearly outperforms desparsified Lasso and ridge projection in confidence interval construction for GLMs. Desparsified Lasso and ridge projection essentially fail for this example: their mean coverage rates for the nonzero coefficients are only 0.135 and 0.199, respectively.

Cox regression

To validate the MNR method in a more general setting, we consider Cox regression. Let $\lambda(t)$ denote the hazard rate at time $t$ and $\lambda_0(t)$ the baseline hazard rate. The Cox regression is given by
$$\lambda(t) = \lambda_0(t)\exp(\boldsymbol{X}^T\boldsymbol{\beta}),$$
from which we simulated 500 datasets. The baseline hazard rate and the censoring hazard rate were both set to constants. We generated the predictors using Equations (13) and (14), generated the event time from the Weibull distribution with a shape parameter of 1 and a scale parameter determined by the baseline hazard rate and $\boldsymbol{X}^T\boldsymbol{\beta}$, generated the censoring time from the Weibull distribution with a shape parameter of 1 and a scale parameter determined by the censoring hazard rate, and set the observed survival time as the minimum of the event time and the censoring time for each subject. The MNR method was applied to the datasets, with SIS‐Lasso used for variable selection and SIS‐MCP for Markov blanket estimation. The results are summarized in Table 1. Desparsified Lasso and ridge projection are not available for this model and thus could not be included in the comparison.
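The data‐generating step for the Cox example can be sketched as follows. With a constant baseline hazard $\lambda_0$, a Weibull event time with shape 1 is exponential with rate $\lambda_0\exp(\boldsymbol{X}^T\boldsymbol{\beta})$, and the censoring time is exponential with the censoring hazard $\lambda_c$. The hazard values, coefficients, and independent Gaussian features below are hypothetical; the article generates the features by (13) and (14). Note that NumPy parametrizes the exponential distribution by its scale, $1/\text{rate}$.

```python
import numpy as np


def simulate_cox(X, beta, base_hazard, censor_hazard, rng):
    """Right-censored data from a Cox model with constant baseline hazard `base_hazard`.

    Event times are exponential (Weibull with shape 1) with rate base_hazard * exp(x^T beta);
    censoring times are exponential with rate censor_hazard."""
    rate = base_hazard * np.exp(X @ beta)
    event_time = rng.exponential(scale=1.0 / rate)                      # scale = 1 / rate
    censor_time = rng.exponential(scale=1.0 / censor_hazard, size=len(rate))
    time = np.minimum(event_time, censor_time)                          # observed survival time
    status = (event_time <= censor_time).astype(int)                    # 1 = event, 0 = censored
    return time, status


rng = np.random.default_rng(0)
n, p = 20000, 5
X = rng.standard_normal((n, p))
beta = np.array([1.0, -0.5, 0.0, 0.0, 0.0])                             # hypothetical coefficients
time, status = simulate_cox(X, beta, base_hazard=1.0, censor_hazard=0.25, rng=rng)
print(time[:3], 1 - status.mean())
```

With $\boldsymbol{\beta} = \boldsymbol{0}$ the censoring proportion should be close to $\lambda_c / (\lambda_0 + \lambda_c)$, which gives a quick check of the simulator.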

Variable selection by MNR

This section explores the potential of MNR in variable selection. As discussed in Reference 16, MNR converts the variable selection problem into a multiple hypothesis testing problem. By computing and sorting adjusted p‐values or q‐values, we can select important variables at a prespecified false discovery rate (FDR). In this study, we generated 20 datasets from a linear regression model under each of the following settings: (a) $p = 1000$ and $n = 200$ or 300; (b) $p = 10\,000$ and $n = 300$ or 500. In both settings, the true regression coefficients depend on a parameter $\gamma$, whose value is varied to tune the strength of the signal. The explanatory variables were generated as in Section 4.1, but with different values of $\rho$, namely 0.1, 0.3, and 0.5, in Equation (14). The proportion of binary predictors was set to 10%; that is, each dataset contains 100 and 1000 binary predictors under settings (a) and (b), respectively. In step 3 of Algorithm 1, a subset regression is fitted for each predictor, so relevant variables can be selected by multiple hypothesis tests: given the p‐values of the subset regressions, we conduct the multiple hypothesis tests using the empirical Bayesian method developed in Reference 35. As shown in Tables 2 and 3, in most settings MNR identifies the true predictors exactly at an FDR level of $q = 0.0001$ or $q = 0.001$, where the q‐value is as defined in Reference 34. Tables 2 and 3 report the results in terms of the false selection rate (FSR) and negative selection rate (NSR), which are defined by
$$\mathrm{FSR} = \frac{\sum_{k=1}^{20}|\hat{S}_k \setminus S^*|}{\sum_{k=1}^{20}|\hat{S}_k|}, \qquad \mathrm{NSR} = \frac{\sum_{k=1}^{20}|S^* \setminus \hat{S}_k|}{\sum_{k=1}^{20}|S^*|},$$
where $S^*$ is the set of true variables and $\hat{S}_k$ is the set of variables selected for dataset $k$.
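The FDR‐based selection step and the FSR/NSR summaries can be sketched as follows. The empirical Bayesian multiple‐testing method of Reference 35 is not reproduced here; the Benjamini-Hochberg step‐up procedure is swapped in as a familiar stand‐in for FDR control, and the p‐values and selected sets are made up for illustration.

```python
def benjamini_hochberg(pvals, q):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * q / m:
            k_max = rank                   # largest rank whose p-value is below its threshold
    return set(order[:k_max])


def fsr_nsr(selected_sets, true_set):
    """FSR = sum_k |S_k - S*| / sum_k |S_k|,  NSR = sum_k |S* - S_k| / sum_k |S*|."""
    false_sel = sum(len(s - true_set) for s in selected_sets)
    total_sel = sum(len(s) for s in selected_sets)
    missed = sum(len(true_set - s) for s in selected_sets)
    fsr = false_sel / total_sel if total_sel else 0.0
    nsr = missed / (len(true_set) * len(selected_sets))
    return fsr, nsr


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals, q=0.05))                        # {0, 1}
print(fsr_nsr([{0, 1, 2}, {0, 1}], true_set={0, 1, 3}))         # (0.2, 0.3333333333333333)
```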

TABLE 2

Variable selection results by MNR, SIS‐SCAD, SIS‐MCP, SIS‐Lasso, and SIS‐Elastic‐Net for linear regression datasets simulated under setting (a) with ρ = 0.1, 0.3, and 0.5

	MNR							SIS‐Elastic‐Net
Measure	q=0.0001	q=0.001	q=0.01	q=0.05	SIS‐SCAD	SIS‐MCP	SIS‐Lasso	α=0.1	α=0.2
	ρ=0.1, n=300, γ=1
FSR	0	0	0	0.029	0.010	0.010	0.320	0.875	0.812
NSR	0	0	0	0	0	0	0	0.02	0.02
	ρ=0.3, n=300, γ=1
FSR	0	0	0.010	0.057	0.010	0.091	0.281	0.829	0.699
NSR	0	0	0	0	0	0	0	0.01	0.01
	ρ=0.5, n=300, γ=1
FSR	0	0	0.010	0.057	0.038	0.057	0.254	0.701	0.554
NSR	0	0	0	0	0	0	0	0	0
	ρ=0.5, n=200, γ=1/3
FSR	0.010	0.010	0.030	0.117	0.546	0.560	0.429	0.674	0.579
NSR	0.05	0.04	0.02	0.02	0.02	0.01	0	0.01	0.01

Note: For the elastic‐net penalty, we tried the settings α = 0.1 and α = 0.2.

TABLE 3

Variable selection results by MNR, SIS‐SCAD, SIS‐MCP, SIS‐Lasso, and SIS‐Elastic‐Net for linear regression datasets simulated under setting (b)

	MNR
Measure	q=0.0001	q=0.001	q=0.01	q=0.05	SIS‐SCAD	SIS‐MCP	SIS‐Lasso	SIS‐Elastic‐Net
	n=500, γ=1
FSR	0	0	0.005	0.024	0.010	0.476	0.817	0.708
NSR	0	0	0	0	0	0	0	0
	n=500, γ=1/3
FSR	0	0	0.005	0.024	0.206	0.541	0.845	0.715
NSR	0	0	0	0	0	0	0	0
	n=300, γ=1/3
FSR	0	0	0.015	0.107	0.631	0.650	0.779	0.668
NSR	0	0	0	0	0	0	0	0.01
	n=300, γ=1/5
FSR	0	0	0.010	0.099	0.752	0.716	0.771	0.655
NSR	0.025	0.01	0.005	0	0	0	0	0.01
	n=300, γ=1/6
FSR	0	0	0.010	0.1	0.762	0.739	0.783	0.644
NSR	0.060	0.05	0.035	0.01	0.005	0.005	0	0.01

Note: For the elastic‐net penalty, a single value of α was used.

For comparison, we applied popular likelihood regularization methods, including SIS‐SCAD, SIS‐MCP, SIS‐Lasso, and SIS‐Elastic‐Net, to these datasets for variable selection under their default settings in the package SIS. It is known that likelihood regularization methods tend to select extra false predictors to compensate for their shrinkage effects on the regression coefficients, and this over‐selection can become worse as the ratio $p/n$ increases, owing to the increasing likelihood of spurious correlation. In statistics, spurious correlation refers to a situation in which two or more variables are associated but not causally related, owing either to coincidence or to the presence of unseen confounding factors. As a method based on multiple hypothesis testing, MNR provides a promising way of addressing the spurious correlation issue in variable selection by controlling the FDR at a low level. As shown in Tables 2 and 3, MNR can clearly outperform the likelihood regularization methods in high‐dimensional variable selection; in particular, MNR tends to have a smaller FSR and is more robust to the strength of the signal than the regularization methods. More discussion of the properties of MNR in variable selection can be found in Sections 5 and 6.
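The growth of spurious correlation with $p/n$ is easy to see in a small simulation. The following Python sketch draws a response and $p$ features that are all mutually independent and reports the largest absolute sample correlation; the sample size and dimensions are illustrative choices.

```python
import numpy as np


def max_spurious_corr(n, p, seed=0):
    """Largest |sample correlation| between a response and p features, all mutually independent."""
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, p))
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    ys = (y - y.mean()) / y.std()
    return float(np.max(np.abs(Xs.T @ ys) / n))


for p in (10, 100, 1000, 10000):
    print(p, round(max_spurious_corr(n=50, p=p), 3))
```

Even though no feature is related to the response, the maximum correlation grows with $p$, producing exactly the kind of variable a likelihood‐based selector is tempted to pick up.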

IDENTIFICATION OF DRUG SENSITIVE GENES AND MUTATIONS

Disease heterogeneity is often observed in complex diseases such as cancer. For example, molecularly targeted cancer drugs are effective only for patients whose tumors express the targets. Disease heterogeneity has directly motivated the development of precision medicine, which aims to improve patient care by tailoring optimal therapies to individual patients according to their molecular profiles and clinical characteristics. Identifying genes and mutations that are sensitive to different drugs is an important step toward precision medicine. In this study, we considered the CCLE dataset, which is publicly available at https://github.com/alexisbellot/GCIT/tree/master/CCLE%20Experiments. The dataset consists of 8‐point dose‐response curves for 24 drugs (or chemical compounds) across over 400 cell lines; the number of cell lines differs slightly across drugs. For each cell line, the data include the expression of 18 988 genes and 1638 mutations, which brings the dimension of the full dataset to $p = 20\,626$. We used the area under the dose‐response curve, termed the activity area in Reference 38, to measure the sensitivity of each cell line to a drug. Compared to other measurements, such as IC50 and EC50, the activity area captures the efficacy and potency of the drug simultaneously. Since, for each drug, the number of tested cell lines is small while the number of genes and mutations is large, accurate identification of drug‐sensitive genes/mutations poses a great challenge to existing statistical methods. It is known that regularization methods, such as Lasso, SCAD, and MCP, tend to select extra false predictors to compensate for their shrinkage effects on the regression coefficients. In addition, they tend to select spuriously correlated variables because of their likelihood optimization nature. Spurious correlation often occurs in small‐$n$‐large‐$p$ regression because of randomness or unknown confounding factors.
When spuriously correlated variables exist, they tend to be selected by likelihood‐based methods. For a dataset with a small number of observations, spuriously correlated variables often reduce not only the fitting error but also the prediction error in cross‐validation. As a conditional independence test‐based method, MNR provides a promising way of limiting the selection of spuriously correlated variables by controlling the FDR at a reasonable level. Algorithm 1 was applied to the dataset for each drug to select the drug‐sensitive genes and mutations. The selection was based on the adjusted p‐values of the conditional independence tests for each single gene/mutation. We set the significance level of the multiple hypothesis test at 0.05. If no genes/mutations were selected at this significance level, we reported the single gene/mutation with the smallest adjusted p‐value. For comparison, the existing methods, desparsified Lasso and ridge projection, were applied to this example. For each drug, desparsified Lasso was inapplicable because of the ultra‐high dimensionality of the dataset: the package hdi aborted after exceeding the memory limit. The ridge projection method, however, still performed reasonably well. For this method, we also selected the genes/mutations with adjusted p‐values less than 0.05 as significant, or reported the single gene/mutation with the smallest adjusted p‐value if none was significant at the 0.05 level. Table 4 summarizes the results produced by the above methods. It shows that MNR and ridge projection produce similar or overlapping results for many drugs, while the confidence intervals produced by MNR tend to be narrower than those produced by ridge projection for the genes/mutations selected by both methods. For example, for the drugs Topotecan and Irinotecan, both methods selected the gene SLFN11 as a drug‐sensitive gene, and the confidence intervals produced by MNR are narrower than those produced by ridge projection.
In the literature, References 38 and 39 reported that SLFN11 is predictive of the treatment response to Topotecan and Irinotecan. For the drug 17‐AAG, both methods selected NQO1 as a drug‐sensitive gene; References 38 and 40 reported NQO1 as the top predictive biomarker for 17‐AAG. Other examples include the drug Nilotinib, for which both methods selected APOL4; the drug PF2341066, for which both methods selected the mutation SCD5; the drug PLX4720, for which both methods selected the mutation BRAFV600E; and the drug Erlotinib, for which both methods selected the mutation EGFR. EGFR is known to be the target gene of Erlotinib, and this target gene has been correctly identified by MNR.
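The drug‐level selection rule described above, selecting all genes/mutations with an adjusted p‐value below 0.05 and otherwise reporting the single one with the smallest adjusted p‐value, can be sketched as follows. The text does not specify how the p‐values were adjusted; Benjamini-Hochberg adjustment is assumed here, and the gene names in the example are hypothetical.

```python
def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values: min over k >= rank of m * p_(k) / k, capped at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):                   # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, m * pvals[i] / rank)
        adjusted[i] = running_min
    return adjusted


def select_with_fallback(names, pvals, alpha=0.05):
    """Features with adjusted p-value < alpha; if none, the single feature with the smallest one."""
    adj = bh_adjust(pvals)
    chosen = [name for name, a in zip(names, adj) if a < alpha]
    if not chosen:
        best = min(range(len(adj)), key=lambda i: adj[i])
        chosen = [names[best]]
    return chosen


print(select_with_fallback(["gene_a", "gene_b", "gene_c"], [0.001, 0.02, 0.5]))  # ['gene_a', 'gene_b']
print(select_with_fallback(["gene_a", "gene_b", "gene_c"], [0.2, 0.5, 0.9]))     # ['gene_a']
```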

TABLE 4

Drug	Desparsified‐Lasso	Ridge	MNR
17‐AAG	‐	NQO1(0.194)	NQO1(0.247)
AEW541	‐	NFE2L3(0.327)	GPATCH3(0.245)
AZD0530	‐	STK39(0.331)	PYY(0.208)
AZD6244	‐	SPRY2(0.303)	NRAS‐MUT*(0.548)
Erlotinib	‐	EGFR‐MUT(1.498)	EGFR‐MUT*(0.814)
	‐		CLK3‐MUT*(1.506)
	‐		EGFR*(0.261)
Irinotecan	‐	SLFN11*(0.337)	SLFN11*(0.2)
L‐685458	‐	SELPLG(0.473)	WDR86*(0.203)
Lapatinib	‐	ERBB2(0.561)	SCO1(0.303)
LBW242	‐	SET‐MUT(10.27)	SET‐MUT*(5.075)
Nilotinib	‐	APOL4*(0.474)	CAMK2A‐MUT*(2.017)
			NCF4*(0.349)
			CCL23*(0.352)
			TRDC*(0.211)
			RNASE2*(0.437)
			APOL4*(0.277)
Nutlin‐3	‐	SPIC(0.398)	ASB16*(0.231)
Paclitaxel	‐	ABCB1(0.326)	TM2D2*(0.280)
Panobinostat	‐	LOC100652995(0.250)	SVIP*(0.201)
PD‐0325901	‐	SPRY2(0.324)	THRSP‐MUT(2.696)
PD‐0332991	‐	TMTC2(0.346)	NFE2L3*(0.223)
PF2341066	‐	SCD5‐MUT(8.433)	SCD5‐MUT*(3.239)
			ANKRD22*(0.251)
			WDFY4*(0.314)
PHA‐665752	‐	GCFC2(0.387)	PDPK1‐MUT(3.429)
PLX4720	‐	BRAFV600E‐MUT*(1.830)	BRAFV600E‐MUT*(0.899)
	‐		PLEKHH3*(0.19)
	‐		IRAK1‐MUT*(1.66)
RAF265	‐	GNPTAB(0.354)	FAM89B*(0.255)
Sorafenib	‐	PROSER1(0.523)	DNAJC5B*(0.284)
	‐		THAP10*(0.261)
TAE684	‐	SELPLG(0.457)	PPFIA1*(0.292)
TKI258	‐	WDFY4(0.464)	THEMIS*(0.304)
Topotecan	‐	SLFN11*(0.278)	SLFN11(0.17)
ZD‐6474	‐	APOL4(0.417)	PGBD2*(0.206)

Note: For each dataset, ridge projection took 2.6 minutes of CPU time with a single thread running serially, and MNR took 46.5 minutes of CPU time with 10 threads running in parallel. All methods were run on the same personal computer with an Intel i9-10900K CPU @ 3.6 GHz and 128 GB of memory.

Comparison of drug-sensitive genes/mutations selected by desparsified Lasso, ridge projection, and MNR for 24 anti-cancer drugs, where "*" indicates that the gene/mutation was significantly selected, the number in parentheses denotes the width of the 95% confidence interval, and "-MUT" indicates a mutation.

To study the performance of MNR more thoroughly, we also compared it with some popular high-dimensional variable selection methods, namely SIS-SCAD, SIS-MCP, and SIS-Lasso, all of which belong to the class of likelihood regularization methods. We compared these methods in terms of variable selection, goodness of fit, and prediction. For this purpose, a 5-fold cross-validation experiment was conducted for each drug. Table 5 reports the results for three selected drugs: 17-AAG, Irinotecan, and PLX4720. More results are presented in Table S3 of the supplementary material. As expected, MNR tends to select far fewer genes/mutations and to have slightly larger prediction errors than the likelihood regularization methods.
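To make the reported quantities concrete, the Python sketch below computes MSFE, MSPE, model size and per-variable selection frequencies over a 5-fold split for an arbitrary selector. It is a sketch under stated assumptions rather than the authors' protocol: the random fold assignment, the least-squares refit on the selected variables, the use of the sample standard deviation, and the stand-in selector `top_k_marginal` are all hypothetical. In the experiment reported here, `select` would be SIS-SCAD, SIS-MCP, SIS-Lasso or MNR.

```python
from collections import Counter

import numpy as np


def _design(X, rows, cols):
    """Intercept column plus the selected predictors for the given rows."""
    return np.column_stack([np.ones(len(rows)), X[np.ix_(rows, cols)]])


def cross_validate(X, y, select, k=5, seed=0):
    """K-fold CV: mean (sd) of MSFE, MSPE and model size, plus selection counts.

    `select(X_train, y_train)` is any variable-selection rule returning column
    indices; the selected columns are refitted by least squares with an intercept.
    """
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    msfe, mspe, size, freq = [], [], [], Counter()
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        cols = [int(c) for c in select(X[train], y[train])]
        beta, *_ = np.linalg.lstsq(_design(X, train, cols), y[train], rcond=None)
        msfe.append(np.mean((y[train] - _design(X, train, cols) @ beta) ** 2))  # fitting error
        mspe.append(np.mean((y[test] - _design(X, test, cols) @ beta) ** 2))    # held-out error
        size.append(len(cols))
        freq.update(cols)  # how many folds selected each column
    summary = {name: (float(np.mean(v)), float(np.std(v, ddof=1)))
               for name, v in (("MSFE", msfe), ("MSPE", mspe), ("Size", size))}
    return summary, freq


def top_k_marginal(X, y, k=3):
    """Hypothetical stand-in selector: the k columns most correlated with y."""
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(score)[::-1][:k]


# Simulated data: only columns 0 and 1 carry signal.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.standard_normal(100)
summary, freq = cross_validate(X, y, top_k_marginal)
print(summary)
print(freq.most_common(3))
```

Passing each method as `select` over the same folds keeps the comparison paired, so differences in MSPE reflect the selectors rather than the particular splits.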

TABLE 5

Drug	Method	MSFE	MSPE	Size	Selected genes/mutations
17-AAG	SIS-SCAD	0.62(0.21)	0.88(0.16)	20.0(11.5)	NQO1(4), CDH6(3), MMP24(3), ZNF610(3), ZFP30(3), ZNF14(3)
	SIS-MCP	0.54(0.02)	0.89(0.14)	16.2(3.5)	NQO1(5), CDH6(3), MMP24(3), ZFP30(3), CBFB(3)
	SIS-Lasso	0.77(0.17)	0.99(0.10)	7.8(11.0)	MMP24(4), NQO1(2), ZFP30(2), CTDSP1(2)
	MNR	0.93(0.04)	0.98(0.11)	1.2(0.5)	NQO1(4)
Irinotecan	SIS-SCAD	0.44(0.05)	0.55(0.08)	6.6(0.9)	ARHGAP19(5), SLFN11(4)
	SIS-MCP	0.46(0.05)	0.56(0.09)	3.8(0.8)	ARHGAP19(5), SLFN11(4)
	SIS-Lasso	0.43(0.06)	0.54(0.09)	9.8(3.0)	ARHGAP19(5), CPSF6(5), SLFN11(4), CD63(3)
	MNR	0.74(0.02)	0.75(0.07)	1.0(0.0)	SLFN11(5)
PLX4720	SIS-SCAD	0.59(0.05)	0.91(0.28)	9.8(5.1)	GAPDHS(3), MAD1L1(3), RXRG(2), LPL(2), ART3(2), ZFP106(2)
	SIS-MCP	0.61(0.04)	0.89(0.27)	5.4(2.8)	GAPDHS(3), ZFP106(2), ZEB2(2)
	SIS-Lasso	0.60(0.06)	0.87(0.23)	10.2(5.6)	SPRYD5(5), GAPDHS(4), RXRG(3)
	MNR	0.52(0.05)	0.65(0.12)	3.2(3.8)	BRAF.V600E-MUT(5), IRAK1-MUT(2)

Comparison of MNR with SIS-SCAD, SIS-MCP, and SIS-Lasso for model prediction and variable selection on three selected drugs, 17-AAG, Irinotecan, and PLX4720, via 5-fold cross-validation experiments. "MSFE" denotes the mean squared fitting error, "MSPE" denotes the mean squared prediction error, and "Size" denotes the number of selected genes/mutations; each is reported as the average over the 5 folds, with the standard deviation given in parentheses. "Selected genes/mutations" lists the genes and mutations selected in the 5-fold experiments, where the number in parentheses is the selection frequency of each gene/mutation.

As mentioned previously, this phenomenon can possibly be explained by spurious correlation, which often drives likelihood-based methods to a high FDR. In contrast, MNR selects variables based on multiple hypothesis tests and, as demonstrated by our previous simulation examples, can effectively limit the effect of spurious correlation by controlling the FSR at a low level. Further, the genes/mutations selected by MNR for the three drugs in Table 5 have been verified in the literature, as described above. Finally, for the drug PLX4720, MNR not only selected fewer genes/mutations but also predicted more accurately. This is because it selected the correct mutation, BRAF.V600E, which the likelihood regularization methods failed to select. In summary, MNR tends to select a more parsimonious yet trustworthy model than the likelihood regularization methods for high-dimensional regression problems.
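The spurious-correlation argument can be illustrated with a small simulation (added here for illustration; the sample size and dimensions are arbitrary choices, not taken from the CCLE data). The response is generated independently of every predictor, yet with n = 50 the largest absolute sample correlation between the response and a pure-noise predictor grows as p increases, so in an ultra-high-dimensional dataset some irrelevant genes can look strongly associated with drug response by chance alone.

```python
import numpy as np


def max_spurious_correlation(n, p, seed=0):
    """Largest |sample correlation| between y and p predictors independent of y."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)  # generated independently of every column of X
    Xc, yc = X - X.mean(axis=0), y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return float(np.max(np.abs(corr)))


n = 50
results = {p: max_spurious_correlation(n, p) for p in (10, 1_000, 10_000)}
for p, r in results.items():
    print(f"n = {n}, p = {p:>6}: max |corr| = {r:.3f}")
```

A selector driven by goodness of fit sees such a predictor as genuinely useful on the training data, which is why the text favors conditional independence tests with multiplicity control in this setting.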

DISCUSSION

Based on the Markov property of graphical models and the likelihood ratio test, this article provides a simple justification for the MNR method such that it can be applied to statistical inference for high-dimensional GLMs with mixed features. The MNR method has been tested on both simulated and real data problems, and the numerical results indicate its superiority over the existing methods. Compared to desparsified Lasso, MNR not only produces more accurate confidence intervals but is also computationally more efficient. Both methods involve nodewise regression, but MNR avoids calculating the precision matrix required by desparsified Lasso, which is costly when the dimension is high. The MNR method is highly attractive in that it reduces a high-dimensional inference problem to a series of low-dimensional inference problems. Consequently, the MNR method has an embarrassingly parallel structure, and its computation can be accelerated considerably beyond the timings reported in this article by running it in parallel on a multi-core computer. Besides parallel implementation, as mentioned in Remark 2, the computation of the MNR method can be further accelerated by replacing the variable selection procedures involved in the method with sure independence screening procedures. This is worth further investigation. As shown in this article, the MNR method can also be used, as a by-product, for variable selection in high-dimensional GLMs. Because it uses the dependence structure among the predictors, the MNR method tends to outperform the existing variable selection methods. Reference 41 reported a similar finding: using the correlation structure among the predictors can often improve the performance of a variable selection method. In addition, owing to its multiple hypothesis testing nature, the MNR method can effectively limit the effect of spurious correlation, which plagues likelihood regularization methods in the small-n-large-p scenario.
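The embarrassingly parallel structure, combined with the screening-based speed-up suggested above, can be sketched in Python. This is a minimal illustration under strong simplifying assumptions, not the authors' implementation. The model is Gaussian linear rather than a general GLM. Following our reading of the MNR construction, each low-dimensional problem regresses the response on X_j together with a neighborhood of X_j and the variables selected for the response, but both sets are chosen here by SIS-style marginal-correlation screening with a hypothetical cutoff `d`. Inference uses an OLS Wald interval with a normal approximation instead of the likelihood ratio test. All function names are hypothetical.

```python
import math
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def screen(X, target, d):
    """SIS-style screening: indices of the d columns of X most correlated with target."""
    Xc, tc = X - X.mean(axis=0), target - target.mean()
    score = np.abs(Xc.T @ tc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(tc) + 1e-12)
    return {int(i) for i in np.argsort(score)[::-1][:d]}


def infer_one(j, X, y, response_set, d):
    """Low-dimensional OLS inference for coefficient j.

    The subset is X_j, a screened neighborhood of X_j, and the variables
    screened for the response (a simplification of the MNR construction).
    """
    others = np.delete(np.arange(X.shape[1]), j)
    neighborhood = {int(others[i]) for i in screen(X[:, others], X[:, j], d)}
    cols = [j] + sorted((neighborhood | response_set) - {j})
    Z = np.column_stack([np.ones(len(y)), X[:, cols]])  # column 1 holds X_j
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    sigma2 = float(resid @ resid) / (len(y) - Z.shape[1])
    se = math.sqrt(sigma2 * np.linalg.inv(Z.T @ Z)[1, 1])
    est = float(beta[1])
    pval = math.erfc(abs(est / se) / math.sqrt(2))  # two-sided, normal approximation
    return j, est, (est - 1.96 * se, est + 1.96 * se), pval


def mnr_sketch(X, y, d=5, workers=4):
    """Run the p independent low-dimensional problems concurrently."""
    response_set = screen(X, y, d)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda j: infer_one(j, X, y, response_set, d),
                             range(X.shape[1])))


# Simulated example with p > n: only X_0 affects y.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 300))
y = 1.0 * X[:, 0] + rng.standard_normal(200)
results = mnr_sketch(X, y)
j, est, ci, pval = results[0]
print(f"beta_0: estimate {est:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f}), p = {pval:.2e}")
```

Because each call to `infer_one` only reads shared, unchanged inputs, the tasks can be distributed to threads, processes or cluster nodes without coordination. A thread pool is used here only for brevity; for CPU-bound GLM fits, a process pool would usually be the better choice.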
Data S1: The supplementary material (i) provides a selective review of high-dimensional inference methods, (ii) demonstrates the robustness of the MNR method with respect to the dependence structure among explanatory variables, and (iii) presents some cross-validation results for the CCLE data analysis.

REFERENCES

1. Sparse inverse covariance estimation with the graphical lasso.

Authors: Jerome Friedman; Trevor Hastie; Robert Tibshirani
Journal: Biostatistics Date: 2007-12-12

2. Discussion of "Sure Independence Screening for Ultra-High Dimensional Feature Space".

Authors: Hao Helen Zhang
Journal: J R Stat Soc Series B Stat Methodol Date: 2008-11

3. Selection and estimation for mixed graphical models.

Authors: Shizhe Chen; Daniela M Witten; Ali Shojaie
Journal: Biometrika Date: 2014-12-24

4. Learning the Structure of Mixed Graphical Models.

Authors: Jason D Lee; Trevor J Hastie
Journal: J Comput Graph Stat Date: 2015-01-01

5. Graphical Models via Univariate Exponential Family Distributions.

Authors: Eunho Yang; Pradeep Ravikumar; Genevera I Allen; Zhandong Liu
Journal: J Mach Learn Res Date: 2015-12

6. Putative DNA/RNA helicase Schlafen-11 (SLFN11) sensitizes cancer cells to DNA-damaging agents.

Authors: Gabriele Zoppoli; Marie Regairaz; Elisabetta Leo; William C Reinhold; Sudhir Varma; Alberto Ballestrero; James H Doroshow; Yves Pommier
Journal: Proc Natl Acad Sci U S A Date: 2012-08-27

7. Developing inhibitors of the epidermal growth factor receptor for cancer treatment.

Authors: Viktor Grünwald; Manuel Hidalgo
Journal: J Natl Cancer Inst Date: 2003-06-18

8. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity.

Authors: Jordi Barretina; Giordano Caponigro; Nicolas Stransky; Kavitha Venkatesan; Adam A Margolin; Sungjoon Kim; Christopher J Wilson; Joseph Lehár; Gregory V Kryukov; Dmitriy Sonkin; Anupama Reddy; Manway Liu; Lauren Murray; Michael F Berger; John E Monahan; Paula Morais; Jodi Meltzer; Adam Korejwa; Judit Jané-Valbuena; Felipa A Mapa; Joseph Thibault; Eva Bric-Furlong; Pichai Raman; Aaron Shipway; Ingo H Engels; Jill Cheng; Guoying K Yu; Jianjun Yu; Peter Aspesi; Melanie de Silva; Kalpana Jagtap; Michael D Jones; Li Wang; Charles Hatton; Emanuele Palescandolo; Supriya Gupta; Scott Mahan; Carrie Sougnez; Robert C Onofrio; Ted Liefeld; Laura MacConaill; Wendy Winckler; Michael Reich; Nanxin Li; Jill P Mesirov; Stacey B Gabriel; Gad Getz; Kristin Ardlie; Vivien Chan; Vic E Myer; Barbara L Weber; Jeff Porter; Markus Warmuth; Peter Finan; Jennifer L Harris; Matthew Meyerson; Todd R Golub; Michael P Morrissey; William R Sellers; Robert Schlegel; Levi A Garraway
Journal: Nature Date: 2012-03-28

9. Markov neighborhood regression for statistical inference of high-dimensional generalized linear models.

Authors: Lizhe Sun; Faming Liang
Journal: Stat Med Date: 2022-06-10

10. Use of NQO1 status as a selective biomarker for oesophageal squamous cell carcinomas with greater sensitivity to 17-AAG.

Authors: Katie E Hadley; Denver T Hendricks
Journal: BMC Cancer Date: 2014-05-15
