Identifying drug-pathway association pairs based on L1L2,1-integrative penalized matrix decomposition.

Dong-Qin Wang¹, Ying-Lian Gao², Jin-Xing Liu¹, Chun-Hou Zheng¹, Xiang-Zhen Kong¹.

Abstract

Traditional drug discovery follows the "one drug-one target" approach, which ignores the cellular and physiological environment in which drugs act. Pathway-based drug discovery methods can overcome this limitation. One such method, Integrative Penalized Matrix Decomposition (iPaD), identifies drug-pathway associations by imposing a lasso-type (L1-norm) penalty as the regularization term. The L2,1-Integrative Penalized Matrix Decomposition (L2,1-iPaD) method instead imposes the L2,1-norm penalty. In this paper, building on the iPaD and L2,1-iPaD methods, we propose a novel method named L1L2,1-iPaD (L1L2,1-Integrative Penalized Matrix Decomposition), which uses the sum of the L1-norm and L2,1-norm penalties as the regularization term. We also perform a permutation test to assess the significance of the identified drug-pathway association pairs and compute their P-values. Compared with the existing methods, our method identifies more drug-pathway association pairs that have been validated in the CancerResource database. For identified associations that are not validated in the CancerResource database, we search the published literature for supporting evidence. The results on two real datasets show that our method achieves better enrichment for identified association pairs than the iPaD and L2,1-iPaD methods.

Keywords: L1-norm; L2,1-norm; drug discovery; integrative penalized matrix decomposition; paired drug-pathway associations

Year: 2017 PMID: 28624800 PMCID: PMC5564627 DOI: 10.18632/oncotarget.18254

Source DB: PubMed Journal: Oncotarget ISSN: 1949-2553

INTRODUCTION

With the rapid growth of data generated from genetic analyses and functional genomics, identifying drug targets has become increasingly feasible [1]. Modern drug discovery has found thousands of drug targets and clinical testing agents. These agents produce curative effects by regulating different targets in various biological pathways and related diseases. Traditional drug discovery methods follow the “one drug-one target” line, but many complex diseases are related to the dysfunction of multiple pathways rather than individual genes [2]. Such methods ignore the relationships among genes and the systemic nature of human diseases. In general, a pathway is defined by the interaction of multiple genes, so pathways can be regarded as the targets of drugs [3, 4]. Abnormal biological pathways can offer insight into the imbalance underlying diseases and point to targets for intervention in complex diseases [5]. Compared with the “one drug-one target” methods, the systems biology approach takes the global physiological environment of drug effects into account [6]. Existing computational methods for identifying drug targets include the Gene Set Enrichment Analysis (GSEA) method [7], the FacPad method [2], the iFad method [5] and the iPaD method [8]. The GSEA method has several disadvantages. Firstly, the calculation must be repeated separately for every drug-pathway pair. Secondly, the genes that belong to the same pathway share a common weight, which is a poor fit when only a sub-block of genes serves as the critical interaction group for a specific drug. In summary, the GSEA method is time-consuming and can be inaccurate. The FacPad method has been proposed to identify drug-pathway association pairs; it develops a sparse Bayesian factor analysis model for treatment response data derived from microarray platforms [2].
The iFad method is also a sparse Bayesian factor model, proposed to infer drug-pathway associations from gene expression and drug sensitivity data. In [5], the authors apply this method to the NCI-60 data set. The gene expression and drug sensitivity data are downloaded from the CellMiner database [9] (http://discover.nci.nih.gov/cellminer). The iFad method is effective in identifying paired drug-pathway associations. However, because it uses Markov Chain Monte Carlo (MCMC) [10] for statistical inference, it is computationally expensive. It also has many tuning parameters that must be specified by users. To improve speed and performance, Li et al. [8] proposed a method named iPaD (integrative Penalized Matrix Decomposition) to identify paired drug-pathway associations. The iPaD method applies integrative penalized matrix decomposition to the gene expression data and the drug sensitivity data, and a scalable bi-convex optimization algorithm is used to solve the objective function. Since the L1-norm penalty can produce sparsity, an L1-norm regularization term is added on the drug-pathway association matrix. By applying the iPaD method to the NCI-60 and Cancer Cell Line Encyclopedia (CCLE) datasets, Li et al. [8] show that iPaD performs better than iFad in both computational efficiency and the identification of drug-pathway association pairs. Since the L2,1-norm regularization penalizes each row of the matrix as a whole and enhances sparsity among the rows [11], a method named L2,1-iPaD (L2,1-integrative Penalized Matrix Decomposition) was proposed to identify paired drug-pathway associations [12]. The L1-norm penalization produces a scattered, unstructured sparse matrix, whereas the L2,1-norm penalization produces a structured, row-sparse matrix [11]. Moreover, the sum of the L1-norm and L2,1-norm penalization produces row structure with intra-row sparsity.
In this paper, to enhance the sparsity of the drug-pathway association matrix and to improve performance, we use the sum of the L1-norm and L2,1-norm regularization in place of the single-norm regularization used by the earlier methods. The resulting method, named “L1L2,1-iPaD”, is proposed to identify paired drug-pathway associations. It has the following advantages. Firstly, we propose the L1L2,1-iPaD method for the first time, replacing the single-norm regularization with the sum of the L1-norm and L2,1-norm regularization. Secondly, the L1L2,1-iPaD method can be used to jointly analyze gene expression and drug sensitivity data. Thirdly, it gives an effective way to identify drug-pathway association pairs. Experimental results on two real datasets show that the L1L2,1-iPaD method is effective. The remainder of this paper is organized as follows. Firstly, we introduce two real datasets, report the experimental results and compare our method with state-of-the-art methods. Secondly, we discuss the work and outline future directions. Thirdly, we introduce the notations and definitions used in this paper. Finally, we describe the related works and the methodology of L1L2,1-iPaD.

INTRODUCTION TO THE DATASETS

CCLE data set

The CCLE data set is downloaded from the CCLE project, a resource for the detailed genetic characterization of a large panel of human cancer cell lines. The CCLE project (http://software.broadinstitute.org/software/cprg/?q=node/11) provides public access to analysis and visualization of DNA copy number, mutation data, mRNA expression, and so on, for about 1046 cancer cell lines. The processed CCLE data set used here consists of 480 cell lines with drug sensitivity data for 22 drugs and transcription data for 1802 genes covering 58 KEGG pathways, whereas the full data set in the CCLE project contains 18988 genes and 24 chemical drugs. The detailed data processing can be found in [8]. Drug sensitivity is measured by the area over the dose-response curve (‘activity area’). $IC_{50}$, $EC_{50}$ and the maximum activity area (‘$A_{max}$’) can also measure drug sensitivity, but activity area has two advantages. Firstly, it has fewer missing values. Secondly, it reflects both drug potency and efficacy, whereas $IC_{50}$ and $EC_{50}$ reflect only potency and $A_{max}$ reflects only efficacy. In this paper, the density of the gene-pathway association prior knowledge matrix $H^{(1)}$ is 3.95%, and the drug-pathway association prior knowledge matrix $H^{(2)}$ is set to zero.

NCI-60 data set

The NCI-60 data set contains 57 cell lines with gene expression data for 1863 genes covering 58 KEGG [13] pathways and drug sensitivity data for 101 drugs. It comes from the NCI-60 project, which provides various types of ‘Omics’ features for 60 cell lines from 9 different cancer types. The gene expression data and the drug sensitivity profiles are downloaded from the CellMiner database [9]. The detailed data processing can be found in [5]. The drug sensitivity data are derived from $GI_{50}$ values, which reflect the potency of drugs. $GI_{50}$ is defined as the concentration needed for 50% of maximum cell growth inhibition. The drug sensitivity values are equal to $-\log_{10}(GI_{50})$, so higher values mean higher drug sensitivity of the cell lines. The density of the gene-pathway association prior knowledge matrix $H^{(1)}$ is 3.95%, and the density of the drug-pathway association prior knowledge matrix $H^{(2)}$ is 0.51%.

RESULTS

In this section, we evaluate the performance of the proposed L1L2,1-iPaD method by applying it to the CCLE and NCI-60 datasets. To show its effectiveness, we also compare it with the iPaD and L2,1-iPaD methods.

The results on the CCLE data set

In this experiment, we first use five-fold cross-validation to obtain the optimal $\lambda_1$ and $\lambda_2$ values, and then obtain the sparse drug-pathway association matrix $B^{(2)}$ corresponding to these values. As with the iPaD and L2,1-iPaD methods, our method can assess the relative importance of the coefficients in the drug-pathway association matrix by solving the L1L2,1-iPaD problem for a decreasing sequence of regularization parameter values. For each value, we record the order in which the coefficients become nonzero. In general, more important coefficients become nonzero earlier than less important ones. However, this procedure cannot assess the statistical significance of the coefficients. Therefore, we perform a permutation test to assess the significance of the coefficients in $B^{(2)}$, running 2000 permutations to obtain the P-values. The P-values of our method and of the L2,1-iPaD and iPaD methods on the CCLE data set are listed in Table 1, in which the superior results are shown in bold type. In this paper, the known drug-pathway associations serve as the validation information. In Table 1, the drug Nutlin-3 is associated with the Chronic myeloid leukemia pathway. A published paper suggests that Nutlin-3 can up-regulate the expression of Notch1 in both lymphoid and myeloid leukemic cells as part of a negative feedback antiapoptotic mechanism [14]. Besides, the authors in [15] confirm that IAP inhibition using a small synthetic inhibitor (LBW-242) increases the sensitivity of CML cells to TKI. The association of this inhibitor with the Chronic myeloid leukemia pathway is not validated in CancerResource, but our method identifies it. In Table 1, only 5 drug-pathway association pairs are validated in CancerResource. For the remaining 10 associations, we search the published literature for supporting evidence, and only one association pair has no support in published papers.
Similarly, the authors in [16] suggest that cotreatment with LBH589 and 17-AAG can induce more apoptosis of IM-resistant primary CML-BC and acute myeloid leukemia cells than treatment with either agent alone, and our method also finds that 17-AAG is associated with the Chronic myeloid leukemia pathway. Moreover, the drug-pathway pairs corresponding to nonzero elements in the matrix $B^{(2)}$ are selected as the identified drug-pathway association pairs. In this experiment, our method identifies 413 drug-pathway pairs with a P-value of no more than 0.05, of which 70 are validated in the CancerResource database. The L2,1-iPaD method identifies 368 pairs at the same cutoff, of which 66 are validated. In contrast, iPaD identifies only 88 pairs, of which 25 are validated. When the P-value cutoff is set to 0.005, the iPaD method identifies 51 drug-pathway association pairs, 16 of which are validated in the CancerResource database, while our method and the L2,1-iPaD method each identify 53 pairs, 16 of which are validated. In addition, the results in [8] show that the iFad and GSEA methods perform worse than the iPaD method, so in this paper we do not compare our method with iFad and GSEA.
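The ranking of coefficients by the order in which they become nonzero can be sketched as follows. This is a minimal illustration and not the authors' implementation: `entry_order`, `toy_solve` and the matrix `Z` are hypothetical names, and the toy solver is plain soft-thresholding standing in for a full L1L2,1-iPaD fit at a given regularization value.

```python
def entry_order(solve, lambdas):
    """Rank coefficient positions by when they first become nonzero.

    solve(lam) must return a coefficient matrix (list of rows) fitted with
    regularization value lam; lambdas are scanned from largest to smallest.
    """
    first_seen = {}
    for step, lam in enumerate(sorted(lambdas, reverse=True)):
        B = solve(lam)
        for i, row in enumerate(B):
            for j, b in enumerate(row):
                if b != 0 and (i, j) not in first_seen:
                    first_seen[(i, j)] = step
    # Positions that enter earlier (at larger lambda) come first.
    return sorted(first_seen, key=first_seen.get)


# Hypothetical stand-in for the model fit: soft-thresholding of a fixed matrix.
Z = [[0.9, 0.1], [0.0, 0.5]]

def toy_solve(lam):
    return [[max(abs(z) - lam, 0.0) * (1 if z > 0 else -1) for z in row] for row in Z]


print(entry_order(toy_solve, [0.8, 0.4, 0.05]))  # [(0, 0), (1, 1), (0, 1)]
```

The ranking gives only a relative ordering of coefficients; as noted in the text, it says nothing about statistical significance, which is why the permutation test is applied afterwards.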

Table 1

The top 15 identified drug-pathway associations on CCLE data set by L1L2,1-iPaD, L2,1-iPaD and iPaD methods

Drug	Pathway	P-value (L1L2,1-iPaD)	P-value (L2,1-iPaD)	P-value (iPaD)	Validated
Nutlin-3	Chronic myeloid leukemia	0	1.74E-43	1.09E-17	CR
PD-0332991	Chronic myeloid leukemia	0	6.93E-41	3.16E-13	CR
LBW242	Chronic myeloid leukemia	1.51E-45	2.80E-44	8.08E-17	[28]
17-AAG	Chronic myeloid leukemia	5.52E-45	9.46E-43	3.41E-16	[16]
L-685458	Chronic myeloid leukemia	6.81E-44	4.33E-43	3.32E-19	[29]
AZD0530	Colorectal cancer	1.46E-43	1.62E-41	8.81E-13	[30]
PHA-665752	Chronic myeloid leukemia	1.03E-41	1.09E-40	3.41E-16	[31]
Paclitaxel	Chronic myeloid leukemia	2.11E-40	2.14E-38	4.58E-12	[32]
AZD0530	Chronic myeloid leukemia	5.86E-40	7.12E-38	4.76E-18	CR
PD-0325901	Thyroid cancer	1.25E-28	3.59E-12	2.57E-05	[33]
ZD-6474	Chronic myeloid leukemia	1.52E-22	1.62E-21	2.36E-10	[34]
RAF265	ECM-receptor interaction	2.19E-17	1.26E-15	2.32E-04	Unfound
AZD0530	ErbB signaling pathway	1.13E-16	4.41E-16	5.10E-06	CR
Erlotinib	Chronic myeloid leukemia	3.71E-15	5.69E-15	2.34E-08	[35]
Nilotinib	ErbB signaling pathway	4.00E-14	1.23E-13	1.80E-05	CR

Then we compute the identification and verification rates of drug-pathway association pairs on the CCLE data set. Specifically, the identification (or verification) rate is the number of identified (or verified) pairs divided by the total number of drug-pathway pairs. The rates for the L1L2,1-iPaD, L2,1-iPaD and iPaD methods are given in Table 2 and Table 3. At the 0.05 cutoff, our method identifies more drug-pathway association pairs than the other two methods; at the 0.005 cutoff, it matches L2,1-iPaD and slightly exceeds iPaD.
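As a worked check of these rate definitions, the short sketch below recomputes one row of Table 3. It assumes that the total number of drug-pathway pairs is the number of drugs times the number of pathways (22 x 58 = 1276 for CCLE); this assumption is not stated explicitly in the text, but it reproduces the tabulated values. The function name is illustrative.

```python
def association_rates(n_identified, n_verified, n_drugs, n_pathways):
    """Return (identification_rate, verification_rate) over all drug-pathway pairs."""
    total_pairs = n_drugs * n_pathways  # assumed denominator: every drug-pathway pair
    return n_identified / total_pairs, n_verified / total_pairs


# L1L2,1-iPaD on CCLE with P-value < 0.05 (Table 3): 413 identified, 70 verified.
ident, verif = association_rates(413, 70, n_drugs=22, n_pathways=58)
print(f"identification rate = {ident:.4f}, verification rate = {verif:.4f}")
# identification rate = 0.3237, verification rate = 0.0549
```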

Table 2

The identification and verification rates on CCLE data set with P-value<0.005

Method	Number of identification	Number of verification	Verification rate	Identification rate
L1L2,1-iPaD	53	16	0.0125	0.0415
L2,1-iPaD	53	16	0.0125	0.0415
iPaD	51	16	0.0125	0.0399

Table 3

The identification and verification rates on CCLE data set with P-value<0.05

Method	Number of identification	Number of verification	Verification rate	Identification rate
L1L2,1-iPaD	413	70	0.0549	0.3237
L2,1-iPaD	368	66	0.0517	0.2884
iPaD	88	25	0.0196	0.0689

The results on the NCI-60 data set

As with the L2,1-iPaD and iPaD methods, we also apply our method to the NCI-60 data set. For this data set, we again run 2000 permutations to evaluate the significance of the coefficients in the drug-pathway association matrix $B^{(2)}$. The P-values of our method and of the L2,1-iPaD and iPaD methods on the NCI-60 data set are listed in Table 4. The authors in [17] report that the mechanism of action of the EA derivatives prepared in their study is more complex than the inhibition of glutathione S-transferase p, which had been regarded as the unique effect of EA, and that it might help to overcome tumor resistance. In Table 4, Melphalan is associated with the T cell receptor signaling pathway. A study published in 2003 suggests that Melphalan can control the expression of the T cell receptor signaling pathway [18]. In Table 4, only 6 drug-pathway association pairs are validated in CancerResource. For the remaining 9 associations, we search the published literature for supporting evidence, and only one association pair has no support in published papers. So, in the NCI-60 data set as well, our method can identify drug-pathway associations that are not validated in CancerResource. For example, an in vitro study published in 2000 investigated the effects of MPA (Mycophenolic acid) on human peripheral blood lymphocyte activation markers and on cell cycle characteristics [19]. Moreover, the drug-pathway pairs corresponding to nonzero elements in the matrix $B^{(2)}$ are selected as the identified drug-pathway association pairs. In this experiment, our method identifies 593 drug-pathway pairs with a P-value of no more than 0.05, of which 163 are validated in the CancerResource database. The L2,1-iPaD method identifies 562 pairs at the same cutoff, of which 163 are validated.
In contrast, iPaD identifies only 247 pairs with a P-value of no more than 0.05, of which 74 are validated in the CancerResource database. When the P-value cutoff is set to 0.005, the results of our method are similar to those of the L2,1-iPaD method, and the numbers of identified and verified pairs are higher than those of the iPaD method. As for the CCLE data set, we also compute the identification and verification rates of drug-pathway association pairs on the NCI-60 data set. The rates for the L1L2,1-iPaD, L2,1-iPaD and iPaD methods are given in Table 5 and Table 6.

Table 4

The top 15 identified drug-pathway associations on NCI-60 data set by L1L2,1-iPaD, L2,1-iPaD and iPaD methods

Drug	Pathway	P-value (L1L2,1-iPaD)	P-value (L2,1-iPaD)	P-value (iPaD)	Validated
Hydroxyurea	Neuroactive ligand-receptor interaction	0	0	NaN	[36]
Rebeccamycin	T cell receptor signaling pathway	1.78E-17	4.12E-16	4.65E-10	Unfound
Tiazofurin	Cell cycle	7.70E-12	8.19E-11	7.54E-07	CR
Selenazofurin	Cell cycle	8.27E-11	1.75E-10	2.78E-07	CR
Mycophenolic acid	Cell cycle	9.02E-11	2.61E-10	2.52E-06	[19]
Lucanthone	Tight junction	2.06E-08	9.97E-09	4.31E-06	CR
Primaquine	Neuroactive ligand-receptor interaction	4.46E-08	1.14E-06	2.69E-04	[37]
Ethacrynic acid	Glutathione metabolism	1.17E-07	2.29E-02	6.36E-03	CR
Aminoglutethimide	Primary immunodeficiency	6.55E-07	1.30E-06	1.16E-04	[38]
Diallyl disulfide	Acute myeloid leukemia	8.76E-07	8.13E-06	8.41E-05	[39]
Bleomycin	Focal adhesion	1.46E-06	1.17E-05	4.56E-04	[40]
Geldanamycin	Gap junction	1.46E-06	7.89E-06	1.87E-04	[41]
Melphalan	T cell receptor signaling pathway	3.76E-06	2.64E-05	6.16E-04	CR
Lomustine	Tight junction	6.74E-06	1.06E-05	2.64E-04	CR
Vitamin K3	Metabolism of xenobiotics by cytochrome P450	9.98E-06	2.22E-05	2.71E-04	[42]

Table 5

The identification and verification rates on NCI-60 data set with P-value<0.05

Method	Number of identification	Number of verification	Verification rate	Identification rate
L1L2,1-iPaD	593	163	0.0278	0.1012
L2,1-iPaD	562	163	0.0278	0.0959
iPaD	247	74	0.0126	0.0422

Table 6

The identification and verification rates on NCI-60 data set with P-value<0.005

Method	Number of identification	Number of verification	Verification rate	Identification rate
L1L2,1-iPaD	89	34	0.0058	0.0152
L2,1-iPaD	89	33	0.0056	0.0152
iPaD	72	26	0.0044	0.0122

DISCUSSION

Identifying drug targets is an important problem in bioinformatics and a crucial step in drug discovery. In this paper, we proposed a novel method named “L1L2,1-iPaD” to identify paired drug-pathway associations, and used it to jointly analyze gene expression and drug sensitivity data. The L1L2,1-iPaD method has two parameters that need to be tuned, so we use five-fold cross-validation to choose them. We also perform a permutation test to assess the significance of the identified drug-pathway association pairs. To evaluate the performance of our method, we apply it to two real datasets, CCLE and NCI-60, and compare it with the iPaD and L2,1-iPaD methods. The experimental results indicate that our method compares favorably with both: at the 0.05 P-value cutoff it identifies more drug-pathway association pairs on both datasets and more pairs validated in the CancerResource database on the CCLE data set. For the remaining associations that are not validated in the CancerResource database, we search the published literature for supporting evidence. With the development of high-throughput technology, more and more genomic data sets are being generated in various fields, and one of our central tasks is to develop feasible and efficient methods to analyze them. Our method is one useful way to identify drug-pathway association pairs. In the future, we will develop more efficient and robust methods to jointly analyze high-dimensional data and to address the computational challenges.

NOTATIONS AND DEFINITIONS

In this section, we summarize the definitions of the norms used in the methods. Given a matrix $A \in \mathbb{R}^{m \times n}$ with entries $a_{ij}$, $a^{i}$ denotes the $i$-th row of $A$. The L1-norm of a matrix was first presented in [20], and its definition can be written as

$$\|A\|_{1} = \sum_{i=1}^{m} \sum_{j=1}^{n} |a_{ij}|. \tag{1}$$

The Frobenius norm of the matrix $A$ can be defined as

$$\|A\|_{F} = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^{2}}. \tag{2}$$

The L2,1-norm of a matrix was first introduced in [21]. Since then, the L2,1-norm has been used in many fields, such as feature extraction [22] and image processing [23, 24]. The L2,1-norm of the matrix $A$ is defined as

$$\|A\|_{2,1} = \sum_{i=1}^{m} \sqrt{\sum_{j=1}^{n} a_{ij}^{2}} = \sum_{i=1}^{m} \|a^{i}\|_{2}. \tag{3}$$

The -norm of the matrix can be written as follows [16]:

RELATED WORKS

iPaD method

Let $Y^{(1)} \in \mathbb{R}^{N \times G}$ and $Y^{(2)} \in \mathbb{R}^{N \times D}$ denote the transcription data and drug sensitivity data matrices, respectively. $N$ is the number of samples (usually cell lines). $G$ and $D$ are the number of genes and the number of drugs, respectively. Besides, $X \in \mathbb{R}^{N \times K}$ denotes a pathway activity matrix, that is, it indicates the activity level of the $K$ pathways in the samples. In the traditional iPaD method [8], the authors decompose the matrix $Y^{(1)}$ into the matrices $X$ and $B^{(1)}$, and the matrix $Y^{(2)}$ into the matrices $X$ and $B^{(2)}$. Therefore, the model can be written as follows:

$$Y^{(1)} = X B^{(1)} + E^{(1)}, \qquad Y^{(2)} = X B^{(2)} + E^{(2)}, \tag{5}$$

where $E^{(1)}$ and $E^{(2)}$ are the error matrices. $B^{(1)} \in \mathbb{R}^{K \times G}$ and $B^{(2)} \in \mathbb{R}^{K \times D}$ denote the gene-pathway association and drug-pathway association matrices, respectively. In general, the model (5) can be formulated as follows:

$$\min_{X, B^{(1)}, B^{(2)}} \|Y^{(1)} - X B^{(1)}\|_{F}^{2} + \|Y^{(2)} - X B^{(2)}\|_{F}^{2}, \tag{6}$$

where $\|\cdot\|_{F}$ denotes the Frobenius norm. Based on Eq. (6), the optimization model of the iPaD method [8] is defined as follows:

$$\min_{X, B^{(1)}, B^{(2)}} \|Y^{(1)} - X B^{(1)}\|_{F}^{2} + \|Y^{(2)} - X B^{(2)}\|_{F}^{2} + \lambda \|B^{(2)}\|_{1} \quad \text{s.t.} \quad \sum_{i} X_{i,j}^{2} \le 1, \ \forall j = 1, \dots, K; \quad B^{(1)}_{i,j} = 0, \ \forall (i,j): H^{(1)}_{i,j} = 0, \tag{7}$$

where $\|B^{(2)}\|_{1}$ is the L1-norm of the matrix $B^{(2)}$. $\lambda$ is a crucial parameter that adjusts the sparsity of the matrix $B^{(2)}$: in general, the larger the value of $\lambda$, the sparser $B^{(2)}$ is. Since a drug is usually related to only a few pathways, the matrix $B^{(2)}$ is expected to be sparse. Because the L1-norm can produce sparsity, the authors in [8] add the L1-norm constraint on $B^{(2)}$. The prior knowledge matrix $H^{(1)}$ is an indicator matrix for the elements of $B^{(1)}$. If $H^{(1)}_{i,j} = 1$, the $j$-th gene is associated with the $i$-th pathway; if $H^{(1)}_{i,j} = 0$, the $j$-th gene is not associated with the $i$-th pathway. The authors in [8] assume that the known gene-pathway associations are complete, so they focus on discovering novel drug-pathway associations. Therefore, the second constraint is used to incorporate the known gene-pathway associations, while the first constraint guarantees that the model is identifiable.

L2,1-iPaD method

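Since the L2,1-norm penalty can produce row sparsity [25], another effective method named L2,1-iPaD was proposed to identify novel drug-pathway association pairs [12]. In that method, the L2,1-norm regularization replaces the L1-norm regularization to enforce a sparsity constraint on rows, which modifies the optimization problem (7) as follows:

$$\min_{X, B^{(1)}, B^{(2)}} \|Y^{(1)} - X B^{(1)}\|_{F}^{2} + \|Y^{(2)} - X B^{(2)}\|_{F}^{2} + \lambda \|B^{(2)}\|_{2,1} \quad \text{s.t.} \quad \sum_{i} X_{i,j}^{2} \le 1, \ \forall j = 1, \dots, K; \quad B^{(1)}_{i,j} = 0, \ \forall (i,j): H^{(1)}_{i,j} = 0, \tag{8}$$

where $\|B^{(2)}\|_{2,1}$ denotes the L2,1-norm of the matrix.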
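To illustrate the difference between the two penalties, the sketch below applies the standard shrinkage (proximal) operators of the L1-norm and the L2,1-norm to a small matrix. These operators are textbook results used here only for illustration; they are not taken from the iPaD or L2,1-iPaD solvers, and the function names are hypothetical.

```python
import math


def soft_threshold(A, lam):
    """Elementwise shrinkage: proximal operator of lam * ||A||_1."""
    return [[math.copysign(max(abs(a) - lam, 0.0), a) for a in row] for row in A]


def row_shrink(A, lam):
    """Row-wise shrinkage: proximal operator of lam * ||A||_{2,1}."""
    out = []
    for row in A:
        norm = math.sqrt(sum(a * a for a in row))
        # A row is scaled towards zero and removed entirely once its norm <= lam.
        scale = max(1.0 - lam / norm, 0.0) if norm > 0 else 0.0
        out.append([scale * a for a in row])
    return out


A = [[3.0, 0.4],   # strong row containing one weak entry
     [0.3, 0.4]]   # weak row

print(soft_threshold(A, 0.6))  # the weak entry inside the strong row is zeroed
print(row_shrink(A, 0.6))      # the strong row is kept whole, the weak row is zeroed
```

The L1 operator removes individual small entries wherever they occur, while the L2,1 operator keeps or removes each row as a unit, which is the row-sparsity behaviour described above.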

METHODOLOGY

Since the gene-pathway association information is available and assumed complete, the central interest lies in inferring the matrix $B^{(2)}$, that is, the paired drug-pathway associations. We further consider enhancing the sparsity of $B^{(2)}$. Thus, we add the L1-norm regularization to impose a penalty on all elements of the matrix and propose our new L1L2,1-iPaD method. The objective function of the L1L2,1-iPaD method is defined as follows:

$$\min_{X, B^{(1)}, B^{(2)}} \|Y^{(1)} - X B^{(1)}\|_{F}^{2} + \|Y^{(2)} - X B^{(2)}\|_{F}^{2} + \lambda_{1} \|B^{(2)}\|_{1} + \lambda_{2} \|B^{(2)}\|_{2,1} \quad \text{s.t.} \quad \sum_{i} X_{i,j}^{2} \le 1, \ \forall j = 1, \dots, K; \quad B^{(1)}_{i,j} = 0, \ \forall (i,j): H^{(1)}_{i,j} = 0, \tag{9}$$

where $\lambda_{1}$ and $\lambda_{2}$ are two adjustable parameters that increase or decrease the sparsity of the matrix $B^{(2)}$. In general, the larger $\lambda_{1}$ and $\lambda_{2}$ are, the sparser $B^{(2)}$ is. Optimization problem (9) is bi-convex. That is, when we fix $B^{(1)}$ and $B^{(2)}$, optimizing $X$ is a convex problem, and when we fix $X$, optimizing $B^{(1)}$ and $B^{(2)}$ are both convex subproblems. Since a direct solution is difficult to obtain, we solve the optimization problem in Eq. (9) by updating the blocks of variables in turn.
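A minimal sketch of evaluating this objective with NumPy is given below. It assumes the form implied by step (3) of Algorithm 1, namely two squared Frobenius fit terms plus $\lambda_1\|B^{(2)}\|_1 + \lambda_2\|B^{(2)}\|_{2,1}$; the function name and argument names are illustrative, and the constraints on $X$ and $B^{(1)}$ are not checked here.

```python
import numpy as np


def l1l21_ipad_objective(Y1, Y2, X, B1, B2, lam1, lam2):
    """Value of the penalized objective for given factors (constraints not enforced)."""
    fit = (np.linalg.norm(Y1 - X @ B1, "fro") ** 2
           + np.linalg.norm(Y2 - X @ B2, "fro") ** 2)
    l1 = np.abs(B2).sum()                   # L1-norm: sum of absolute entries
    l21 = np.linalg.norm(B2, axis=1).sum()  # L2,1-norm: sum of row-wise L2 norms
    return fit + lam1 * l1 + lam2 * l21
```

Setting `lam2 = 0` gives an iPaD-style L1 penalty and setting `lam1 = 0` gives an L2,1-iPaD-style penalty, so the same function can be used to compare the three variants on a common scale.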

Optimizing $X$

With $B^{(1)}$ and $B^{(2)}$ fixed, the subproblem for $X$ is

$$\min_{X} \|Y - XB\|_{F}^{2} \quad \text{s.t.} \quad \sum_{i} X_{i,j}^{2} \le 1, \ \forall j = 1, \dots, K, \tag{10}$$

where $Y = [Y^{(1)}, Y^{(2)}]$ and $B = [B^{(1)}, B^{(2)}]$. The iPaD method [8] used the traditional gradient descent method to optimize $X$, and we apply gradient descent here as well. The objective function of Eq. (10) can be written as follows:

$$f(X) = \|Y - XB\|_{F}^{2} = \mathrm{tr}\left[(Y - XB)(Y - XB)^{T}\right]. \tag{11}$$

By computing the derivative of Eq. (11), we obtain

$$\frac{\partial f}{\partial X} = -2\,(Y - XB)\,B^{T}, \tag{12}$$

so that $X$ can be updated by

$$X \leftarrow X - t\,\frac{\partial f}{\partial X}. \tag{13}$$

Here, $t$ is the iteration step size. At every iteration, we check whether each column $X_{j}$ lies in the feasible region $\|X_{j}\|_{2} \le 1$. If it does, we proceed to the next step; otherwise, we set $X_{j} = X_{j} / \|X_{j}\|_{2}$. Nesterov's method [26] achieves a convergence rate of $O(1/k^{2})$ (the original convergence rate is $O(1/k)$), so we also apply it when updating $X$ to speed up convergence.

Optimizing $B^{(1)}$

With $X$ fixed, the subproblem for $B^{(1)}$ is

$$\min_{B^{(1)}} \|Y^{(1)} - X B^{(1)}\|_{F}^{2} \quad \text{s.t.} \quad B^{(1)}_{i,j} = 0, \ \forall (i,j): H^{(1)}_{i,j} = 0. \tag{14}$$

The least squares method is a common algorithm for unconstrained optimization problems. In order to use the prior knowledge matrix $H^{(1)}$ when optimizing $B^{(1)}$, similar to [8], we decompose the original problem into $G$ OLS (ordinary least squares) problems. That is, we optimize each column of $B^{(1)}$ separately. For the $j$-th column, problem (14) becomes

$$\min_{\tilde{b}_{j}} \|y^{(1)}_{j} - \tilde{X}_{j}\,\tilde{b}_{j}\|_{2}^{2}, \tag{15}$$

where $y^{(1)}_{j}$ is the $j$-th column of the matrix $Y^{(1)}$, $\tilde{X}_{j}$ denotes the sub-matrix of $X$ composed of the columns corresponding to the non-zero entries of the $j$-th column of the prior knowledge matrix $H^{(1)}$, and $\tilde{b}_{j}$ is the vector of entries of the $j$-th column of $B^{(1)}$ corresponding to those non-zero entries.

Optimizing $B^{(2)}$

For optimizing $B^{(2)}$, we consider each column of the matrix $B^{(2)}$ and decompose the original problem into minimization problems penalized by the sum of the L1-norm and the L2,1-norm (problem (16)). In order to use the prior knowledge on the drug-pathway associations (matrix $H^{(2)}$), a further norm term is added on the matrix $B^{(2)}$, which gives the rewritten problem (17). Like the prior information matrix $H^{(1)}$, $H^{(2)}$ is a prior knowledge matrix; it indicates entries of the drug-pathway association matrix $B^{(2)}$. $\lambda_{1}$ and $\lambda_{2}$ are two adjustable parameters that control the sparsity of $B^{(2)}$.
For the part of $B^{(2)}$ indicated by $H^{(2)}$, the subproblem (18) is converted into an equivalent objective (19) that involves an identity matrix and an auxiliary function. We then compute the derivative of this auxiliary function and set it to zero (20), which yields a closed-form update (21). For the other part of $B^{(2)}$ distinguished by $H^{(2)}$, we provide a new method to obtain the drug-pathway associations of interest. The optimization problem (22) involves two diagonal matrices, one associated with each of the two penalty terms. After a change of variables that simplifies problem (22), it becomes the equivalent minimization problem (23), whose objective function can be rewritten as (24). Computing the derivative of this objective and setting it to zero (25) gives the update (26), from which $B^{(2)}$ is finally computed. We summarize the L1L2,1-iPaD method as Algorithm 1. Besides, in this paper we also use the soft-impute algorithm to handle missing values during the optimization. The detailed steps can be found in the iPaD [8] and L2,1-iPaD [12] papers.
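The updates of $X$ and $B^{(1)}$ described in this section can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: it uses a plain projected gradient step without the Nesterov acceleration, the gradient is the one derived from $\|Y - XB\|_F^2$, and the names `update_X`, `update_B1` and `step` are hypothetical rather than the authors' code.

```python
import numpy as np


def update_X(X, Y, B, step):
    """One gradient step on ||Y - X B||_F^2, followed by the column-norm check."""
    grad = -2.0 * (Y - X @ B) @ B.T
    X_new = X - step * grad
    col_norms = np.linalg.norm(X_new, axis=0)
    # Columns with norm > 1 are rescaled to unit norm; feasible columns are unchanged.
    return X_new / np.maximum(col_norms, 1.0)


def update_B1(X, Y1, H1):
    """Column-wise ordinary least squares restricted to the support given by H1."""
    B1 = np.zeros(H1.shape)
    for j in range(H1.shape[1]):
        support = np.flatnonzero(H1[:, j])  # pathways annotated for gene j
        if support.size:
            coef, *_ = np.linalg.lstsq(X[:, support], Y1[:, j], rcond=None)
            B1[support, j] = coef
    return B1
```

Entries of `B1` outside the support of `H1` are never written, so the equality constraint on $B^{(1)}$ holds by construction.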

Algorithm 1

The alternating updating algorithm for the L1L2,1-iPaD method

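Input: $Y^{(1)}$, $Y^{(2)}$, $H^{(1)}$
Parameters: $\lambda_{1}$, $\lambda_{2}$
Output: $B^{(2)}$
Initialization: set $B^{(1)} = H^{(1)}$ and $B^{(2)} = 0$.
Optimization:
(1) Optimize $X$: $X = \arg\min_{X} \|Y - XB\|_{F}^{2}$ s.t. $\sum_{i} X_{i,j}^{2} \le 1, \ \forall j = 1, \dots, K$, where $Y = [Y^{(1)}, Y^{(2)}]$ and $B = [B^{(1)}, B^{(2)}]$.
(2) Optimize $B^{(1)}$: $B^{(1)} = \arg\min_{B^{(1)}} \|Y^{(1)} - X B^{(1)}\|_{F}^{2}$ s.t. $B^{(1)}_{i,j} = 0, \ \forall (i,j): H^{(1)}_{i,j} = 0$.
(3) Optimize $B^{(2)}$: $B^{(2)} = \arg\min_{B^{(2)}} \|Y^{(2)} - X B^{(2)}\|_{F}^{2} + \lambda_{1} \|B^{(2)}\|_{1} + \lambda_{2} \|B^{(2)}\|_{2,1}$.
(4) Repeat steps (1), (2) and (3) until convergence.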
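The alternating scheme of Algorithm 1 can be sketched end to end as below. This is an illustrative sketch and not the authors' solver: step (3) is solved here with a generic proximal gradient iteration (soft-thresholding followed by row-wise shrinkage, the standard proximal operator for a sum of L1 and group L2 penalties) instead of the closed-form updates derived in the paper, missing values are not handled, a fixed number of sweeps replaces a convergence test, and all function and parameter names are hypothetical.

```python
import numpy as np


def prox_l1_l21(V, t1, t2):
    """Soft-threshold entries by t1, then shrink each row's L2 norm by t2."""
    S = np.sign(V) * np.maximum(np.abs(V) - t1, 0.0)
    norms = np.linalg.norm(S, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t2 / np.maximum(norms, 1e-12), 0.0)
    return scale * S


def fit_l1l21_ipad(Y1, Y2, H1, lam1, lam2, n_outer=50, n_inner=20):
    """Alternate over X, B1 and B2 for a fixed number of sweeps."""
    N, K = Y1.shape[0], H1.shape[0]
    Y = np.hstack([Y1, Y2])
    B1 = H1.astype(float)              # initialization from Algorithm 1
    B2 = np.zeros((K, Y2.shape[1]))    # initialization from Algorithm 1
    X = np.zeros((N, K))               # assumed starting point for X
    for _ in range(n_outer):
        # (1) X: projected gradient steps with step size 1/L.
        B = np.hstack([B1, B2])
        L = 2.0 * np.linalg.norm(B, 2) ** 2 + 1e-12
        for _ in range(n_inner):
            X = X + (2.0 / L) * (Y - X @ B) @ B.T
            X /= np.maximum(np.linalg.norm(X, axis=0), 1.0)
        # (2) B1: least squares on the support allowed by H1, column by column.
        for j in range(B1.shape[1]):
            s = np.flatnonzero(H1[:, j])
            if s.size:
                B1[s, j] = np.linalg.lstsq(X[:, s], Y1[:, j], rcond=None)[0]
        # (3) B2: proximal gradient steps on the penalized least squares problem.
        L = 2.0 * np.linalg.norm(X, 2) ** 2 + 1e-12
        for _ in range(n_inner):
            grad = -2.0 * X.T @ (Y2 - X @ B2)
            B2 = prox_l1_l21(B2 - grad / L, lam1 / L, lam2 / L)
    return X, B1, B2
```

A call such as `X, B1, B2 = fit_l1l21_ipad(Y1, Y2, H1, lam1=0.1, lam2=0.1)` returns the pathway activity matrix and both association matrices; the nonzero entries of `B2` would then be the candidate drug-pathway pairs.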

Parameter selection and significance test

Compared with the iPaD and L2,1-iPaD methods, our new L1L2,1-iPaD method has two adjustable parameters ($\lambda_{1}$ and $\lambda_{2}$), which control the sparsity of the drug-pathway association matrix $B^{(2)}$. In this paper, we perform five-fold cross-validation to find suitable parameters. In the five-fold cross-validation, we divide the data matrix into 5 folds. In each round, one of the 5 folds is treated as missing values and the remaining 4 folds serve as training data. It is difficult to search for the optimal $\lambda_{1}$ and $\lambda_{2}$ simultaneously. Therefore, we fix one of the two parameters, solve the L1L2,1-iPaD problem over a sequence of values of the other, and use five-fold cross-validation to select the value with the minimum residual sum of squares (RSS). The remaining parameter is then selected in the same way with the first one held fixed. Since our method is a sparse optimization algorithm, many coefficients in the matrix $B^{(2)}$ are zero, and we treat the nonzero coefficients as the identified core drug-pathway association pairs. After searching for the optimal parameters, we perform a permutation test [27] to assess the significance of the coefficients. We first obtain the estimate of the matrix $B^{(2)}$ on the original data, and then compute the P-value of each coefficient of $B^{(2)}$ from this estimate, the estimates of $B^{(2)}$ obtained on the permuted data, and the total number of permutations.
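A generic permutation test for coefficient significance can be sketched as follows. The sketch uses the common empirical P-value, (1 + number of permutations whose absolute coefficient is at least the observed one) / (1 + number of permutations), and assumes that the null distribution is generated by shuffling the sample order of the drug sensitivity data. Both choices are assumptions: the formula used in the paper is not reproduced here, and since an empirical count over 2000 permutations cannot fall below 1/2001, the much smaller P-values in Tables 1 and 4 indicate that the paper's computation differs. All names, including the toy fit, are hypothetical.

```python
import random


def permutation_pvalues(fit, Y2, n_perm=2000, seed=0):
    """Empirical permutation P-values for every coefficient returned by fit(Y2).

    fit maps a list of sample rows to a coefficient matrix (list of lists).
    """
    rng = random.Random(seed)
    observed = fit(Y2)
    exceed = [[0] * len(row) for row in observed]
    for _ in range(n_perm):
        shuffled = list(Y2)
        rng.shuffle(shuffled)  # break the link between samples and responses
        permuted = fit(shuffled)
        for i, row in enumerate(permuted):
            for j, b in enumerate(row):
                if abs(b) >= abs(observed[i][j]):
                    exceed[i][j] += 1
    # The +1 correction keeps every P-value strictly positive.
    return [[(c + 1) / (n_perm + 1) for c in row] for row in exceed]


# Toy stand-in for the model fit: covariance of a fixed activity vector
# with each drug column (one "pathway", two "drugs").
activity = [float(i) for i in range(10)]

def toy_fit(Y2):
    n = len(Y2)
    mean_a = sum(activity) / n
    coefs = []
    for d in range(len(Y2[0])):
        col = [row[d] for row in Y2]
        mean_c = sum(col) / n
        coefs.append(sum((a - mean_a) * (c - mean_c) for a, c in zip(activity, col)) / n)
    return [coefs]


Y2 = [[2.0 * a, (-1.0) ** i] for i, a in enumerate(activity)]
pvals = permutation_pvalues(toy_fit, Y2, n_perm=200)
print(pvals)
```

In the toy data the first drug column tracks the activity vector exactly, so its P-value is close to the smallest attainable value, while the second column is only weakly related and receives a much larger P-value.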

29 in total

1. iFad: an integrative factor analysis model for drug-pathway association inference.

Authors: Haisu Ma; Hongyu Zhao
Journal: Bioinformatics Date: 2012-05-10 Impact factor: 6.937

2. Erlotinib effectively inhibits JAK2V617F activity and polycythemia vera cell growth.

Authors: Zhe Li; Mingjiang Xu; Shu Xing; Wanting Tina Ho; Takefumi Ishii; Qingshan Li; Xueqi Fu; Zhizhuang Joe Zhao
Journal: J Biol Chem Date: 2006-12-18 Impact factor: 5.157

3. ABT-737 increases tyrosine kinase inhibitor-induced apoptosis in chronic myeloid leukemia cells through XIAP downregulation and sensitizes CD34(+) CD38(-) population to imatinib.

Authors: Kelly Airiau; François-Xavier Mahon; Marina Josselin; Marie Jeanneteau; Beatrice Turcq; Francis Belloc
Journal: Exp Hematol Date: 2012-01-10 Impact factor: 3.084

4. Redox sensing and signaling by malaria parasite in vertebrate host. (Review)

Authors: Satyajit Tripathy; Somenath Roy
Journal: J Basic Microbiol Date: 2015-03-03 Impact factor: 2.281

5. The inhibition of ERK/MAPK not the activation of JNK/SAPK is primarily required to induce apoptosis in chronic myelogenous leukemic K562 cells.

Authors: C D Kang; S D Yoo; B W Hwang; K W Kim; D W Kim; C M Kim; S H Kim; B S Chung
Journal: Leuk Res Date: 2000-06 Impact factor: 3.156

6. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

7. Combination of the histone deacetylase inhibitor LBH589 and the hsp90 inhibitor 17-AAG is highly active against human CML-BC cells and AML cells with activating mutation of FLT-3.

Authors: Prince George; Purva Bali; Srinivas Annavarapu; Anna Scuto; Warren Fiskus; Fei Guo; Celia Sigua; Gautam Sondarva; Lynn Moscinski; Peter Atadja; Kapil Bhalla
Journal: Blood Date: 2004-10-28 Impact factor: 22.113

8. Potentiation of antileukemic therapies by Smac mimetic, LBW242: effects on mutant FLT3-expressing cells.

Authors: Ellen Weisberg; Andrew L Kung; Renee D Wright; Daisy Moreno; Laurie Catley; Arghya Ray; Leigh Zawel; Mary Tran; Jan Cools; Gary Gilliland; Constantine Mitsiades; Douglas W McMillin; Jingrui Jiang; Elizabeth Hall-Meyers; James D Griffin
Journal: Mol Cancer Ther Date: 2007-07 Impact factor: 6.261