
Drug repositioning via matrix completion with multi-view side information.

Abstract

In the process of drug discovery and disease treatment, drug repositioning is widely studied to identify biological targets for existing drugs. Many methods have been proposed for drug–target interaction prediction that draw on different kinds of data sources. However, most existing methods use only one type of side information for drugs or targets to predict new targets for drugs. Some recent works have improved prediction accuracy by jointly considering multiple representations of drugs and targets. In this work, the authors propose a drug–target prediction approach based on matrix completion with multi-view side information (MCM) of drugs and proteins from both the structural and the chemical views. Unlike existing studies of drug–target prediction, they predict drug–target interactions by directly completing the interaction matrix between drugs and targets. The experimental results show that MCM obtains significantly higher accuracy than the comparison methods. They finally report new drug–target interactions for 26 FDA-approved drugs and discuss the biological relevance of these targets with reference to the existing literature.


Year: 2019 PMID: 31538961 PMCID: PMC8687211 DOI: 10.1049/iet-syb.2018.5129

Source DB: PubMed Journal: IET Syst Biol ISSN: 1751-8849 Impact factor: 1.615

Nomenclature

$P$: known drug–target interaction matrix
$W_s$: complete low-rank matrix in the structural view
$W_c$: complete low-rank matrix in the chemical view
$S^d_s$: drug–drug similarity matrix in the structural view
$S^t_s$: target–target similarity matrix in the structural view
$S^d_c$: drug–drug similarity matrix in the chemical view
$S^t_c$: target–target similarity matrix in the chemical view
$X_s$: drug feature matrix in the structural view
$Y_s$: protein target feature matrix in the structural view
$X_c$: drug feature matrix in the chemical view
$Y_c$: protein target feature matrix in the chemical view
$Q$: common complete drug–target interaction matrix
$Z$: any given matrix
$\langle\cdot,\cdot\rangle$: inner product for matrices
$\nabla$: gradient operator
$\alpha$, $\beta$, $\gamma$: trade-off parameters

Introduction

Drugs take effect by acting on their corresponding targets, such as proteins. The identification of drug–target interactions is therefore an important step in discovering new drugs: it helps in understanding how drugs act in treating diseases and provides inspiration for developing new drugs. Although researchers can find meaningful drug–target interactions through biological experiments, the high cost of such experiments has motivated the development of computational methods to identify potential new targets for drugs. Many such methods have been proposed. They use a variety of data, including protein–protein interactions, gene expression data, chemical structures of drugs, metabolic networks, protein sequences, drug response and drug side effects, either individually or jointly. For example, Liu et al. [1] apply neighbourhood regularised logistic matrix factorisation based on protein sequences and drug structures to model the probability that a drug interacts with a target. Yamanishi et al. [2] and Bleakley and Yamanishi [3] propose bipartite graph-based methods on the same dataset as [1]: they first define a bipartite graph between drugs and proteins and then find a latent common space for them, in which drugs and targets that lie close to each other are predicted to interact. Mizutani et al. [4] use protein functions and drug side effects to identify novel targets for known anti-cancer drugs by sparse canonical correlation analysis. Chen and Zhang [5] propose a partial least squares method with sparse network regularisation that integrates drug response data and gene expression to identify joint modular patterns. Li et al. [6] use the human metabolic network to predict drug–target interactions by exploring drug–reaction interactions. Emig et al. [7] propose a network-based approach that combines a molecular interaction network with disease gene expression signatures. 
Ding et al. [8] and Zheng et al. [9] propose similarity-based methods to discover new drug targets. Li et al. [10] propose an efficient and effective multi-task machine learning approach for detecting potential drug targets using both expression data and compound structure information. Because drugs and proteins can be represented in different ways, identifying drug targets by jointly considering their multi-view representations is a promising research direction, given the variety of available data. For example, a drug can be described by its response profile in different cells or by its chemical structure. Proteins, in turn, can be represented by their amino-acid sequences or by their gene expression values in different cells. We regard the structural information of drugs and proteins as the structural view, and their chemical behaviour, described by drug response and gene expression, as the chemical view. In machine learning, many multi-view methods perform supervised or unsupervised learning by combining different representations of samples [11–21]. However, these approaches cannot be directly applied to multi-view drug–target prediction, in which drugs and targets form a bipartite graph. Li [22] proposes a graph-regularised single-view approach, the single-view penalised graph (SPGraph), which identifies drug targets using either the structural or the chemical information, and extends it to a co-regularised multi-view method that fuses the structural and chemical views of drugs and targets. Li and Cai [23] develop a multi-view low-rank embedding (MLRE) method. The results in [22, 23] suggest that multi-view approaches perform significantly better than single-view approaches. 
Both [22] and [23] follow a similar strategy: they first obtain new features for drugs and targets in a shared subspace and then cluster all these representations by k-means. Proteins and drugs that lie close to each other are predicted to interact. However, different clustering methods, or different initialisations at the clustering stage, may yield different prediction results. Moreover, the prediction accuracy may be sensitive to the new representations of drugs and proteins in the common subspace: with inappropriate representations, drugs and proteins are likely to be wrongly clustered. Developing a new multi-view approach to identify drug targets therefore remains challenging. In this work, we identify drug targets by directly completing the drug–protein interaction matrix, using the multi-view similarities among drugs and among targets as their multi-view side information. Matrix completion is widely used in biological prediction problems, such as lncRNA–disease associations [24], adverse drug interactions [25], gene–disease associations [26] and miRNA–disease associations [27]. For example, Chen et al. [27] propose an inductive matrix completion method with single-view side information. Although that work aims to predict miRNA–disease associations, the method can also be applied to drug–target prediction; however, it can use only one type of side information. Zhao et al. [21] propose to cluster n samples based on their multiple types of side information by completing a 0–1 square clustering matrix whose (i, j) entry indicates whether samples i and j are in the same cluster. However, their multi-view matrix completion (MVMC) model [21] can be used only when the rows and columns of the completed matrix represent the same samples. In drug–target prediction, by contrast, the rows and columns of the interaction matrix represent drugs and targets, respectively. 
Thus, MVMC cannot be directly applied to predict drug–target interactions. The contributions of this work are twofold. First, we propose a novel inductive matrix completion method with multi-view side information (MCM) for drug–target prediction. We complete the association matrix directly using drug similarities and target similarities, rather than clustering new representations of drugs and targets, to predict latent drug targets. The common completed matrix and the two single-view completed matrices are optimised alternately by our MCM algorithm. MCM is a general multi-view matrix completion method and can be applied to other scenarios. Second, we compare MCM with competing methods in two experimental settings on real datasets, and the experimental results show that it performs significantly better than the other methods. We also report new drug–target interactions for 26 FDA-approved drugs. Most of these predictions are supported by existing literature, which demonstrates the effectiveness of MCM.

Materials and methods

In this section, we first describe the data used to compute the drug similarities and the protein target similarities in both views in Section 2.1. We then formulate the multi-view drug–target prediction problem in Section 2.2. In Section 2.3, we introduce a single-view approach based on inductive matrix completion. Finally, in Section 2.4, we propose our multi-view drug–target prediction method, MCM.

Materials

The drug structures and protein sequences are downloaded from the KEGG database [28]. Drug structural similarities are computed by SIMCOMP [29], a program that performs global structural alignment of two compounds based on their shared substructures. Similarities between protein sequences are calculated by the Smith-Waterman algorithm [30]. The NCI60 human tumour cell line screen, developed by the National Cancer Institute (NCI), screens substances for cytotoxic activity against 60 cell lines representing various cancer types. Growth inhibition is measured by the sulforhodamine B assay, which quantifies cellular protein, after a cell line has been exposed to a drug for two days; the GI50 value is the compound concentration that causes 50% growth inhibition. The Developmental Therapeutics Program (DTP) human tumour cell line screening data are obtained from the DTP database (https://dtp.cancer.gov/), and the gene expression data (mRNA:Affy‐U133B, GCRMA-normalised) for the NCI-60 cell lines generated in [31] are downloaded from the NCI website [32]. From the drug response data, drug similarities are computed with a Gaussian kernel whose parameter is set to the median of the pairwise distances among all drugs. Protein chemical similarities are constructed from the gene expression data in the same way. We obtain 326 drugs common to the drug response data and the drug structure data, and 608 proteins common to the gene expression data and the protein sequence data. The known drug–target associations are downloaded from the DrugBank database [33], which yields 114 known associations among the selected 326 drugs and 608 protein targets. Both the drugs and the protein targets in our dataset thus have two types of representation: the structural view and the chemical view. The protein sequence similarities and the drug structural similarities are used to construct the structural view representations. 
The chemical view is constructed from the drug response data and the protein gene expression data in the NCI60 cell lines.
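The chemical-view similarity construction can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: Euclidean distances between profiles (drug response or gene expression), no missing values, and the kernel form exp(-d²/(2σ²)) with σ equal to the median pairwise distance. The text states only that a Gaussian kernel with a median-distance parameter is used, so the exact kernel form and the function name `gaussian_similarity` are illustrative assumptions.

```python
import numpy as np

def gaussian_similarity(profiles):
    """Gaussian-kernel similarity between the rows of `profiles`.

    Assumed kernel: exp(-d^2 / (2 * sigma^2)), with sigma the median of the
    pairwise Euclidean distances (median heuristic). Needs at least two rows.
    """
    profiles = np.asarray(profiles, dtype=float)
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq_norms = np.sum(profiles ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * profiles @ profiles.T
    sq_dists = np.maximum(sq_dists, 0.0)   # clip tiny negatives from round-off
    dists = np.sqrt(sq_dists)
    # Median over distinct pairs only (upper triangle, diagonal excluded)
    iu = np.triu_indices(profiles.shape[0], k=1)
    sigma = np.median(dists[iu])
    return np.exp(-sq_dists / (2.0 * sigma ** 2))
```

For example, `gaussian_similarity(response_matrix)` with one row per drug would give a drug–drug similarity matrix with ones on the diagonal; the same call on per-protein expression rows gives the protein chemical similarities.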

Problem formulation

Suppose we have structural and chemical similarities for $m$ drugs and $n$ protein targets. Denote the drug–drug and target–target similarity matrices in the structural view by $S^d_s \in \mathbb{R}^{m\times m}$ and $S^t_s \in \mathbb{R}^{n\times n}$, and those in the chemical view by $S^d_c$ and $S^t_c$, respectively. The known drug–target associations among these drugs and targets are encoded in the interaction matrix $P \in \{0,1\}^{m\times n}$, defined as

$$P_{ij} = \begin{cases} 1, & \text{if drug } i \text{ is known to interact with target } j,\\ 0, & \text{otherwise.} \end{cases}$$

We also denote by $\Omega = \{(i,j) : P_{ij} = 1\}$ the set of drug–target pairs known to interact. Our goal is to predict new drug–target associations by completing $P$ based on all the given information. The Nomenclature section summarises the notation used in this paper.
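The interaction matrix and the set of known pairs can be built from a list of known (drug, target) index pairs. This minimal NumPy sketch assumes zero-based indices; `build_interaction_matrix` is a hypothetical helper. Storing the known-pair set as a boolean mask is a design choice that turns the projection onto known pairs into a single elementwise operation.

```python
import numpy as np

def build_interaction_matrix(pairs, n_drugs, n_targets):
    """Return the binary interaction matrix P and a boolean mask for Omega.

    pairs: iterable of (drug_index, target_index) for known interactions.
    """
    P = np.zeros((n_drugs, n_targets))
    for i, j in pairs:
        P[i, j] = 1.0          # P_ij = 1 for a known interaction
    omega = P == 1.0           # Omega: pairs known to interact
    return P, omega
```

Usage: `P, omega = build_interaction_matrix([(0, 1), (2, 0)], n_drugs=3, n_targets=2)`.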

Drug–target prediction by matrix completion with single view side information

Inductive matrix completion was proposed in [34] to recover a latent matrix from limited observed entries together with side information. The SIMCLDA method [24] applies inductive matrix completion to predict new associations between lncRNAs and diseases. The model assumes that associations between lncRNAs and diseases depend on feature vectors extracted from side information, such as lncRNA–lncRNA similarities and disease–disease similarities. It first extracts features for lncRNAs and diseases from their respective similarity matrices and then applies the inductive matrix completion model with single-view side information (MCS) to recover the unknown lncRNA–disease interactions. Although the method was developed for a different problem, it can be directly used for drug–target prediction. As with lncRNAs and diseases, feature vectors of drugs and protein targets can be obtained by eigenvalue decomposition of their similarity matrices. Specifically, we construct the drug feature matrix $X$ from the eigenvectors of the drug similarity matrix $S^d$ corresponding to its largest eigenvalues, and similarly the protein feature matrix $Y$ from the eigenvectors of the protein similarity matrix $S^t$. For either the chemical or the structural view, the interaction matrix $P$ can then be recovered by MCS as in SIMCLDA [24]:

$$\min_{W}\ \|W\|_* \quad \text{s.t.} \quad \mathcal{P}_\Omega\big(XWY^\top\big) = \mathcal{P}_\Omega(P), \qquad (1)$$

where $\|\cdot\|_*$ is the nuclear norm, $\Omega$ is the set of observed interacting drug–target pairs, and $\mathcal{P}_\Omega$ is defined for any matrix $M$ by

$$[\mathcal{P}_\Omega(M)]_{ij} = \begin{cases} M_{ij}, & (i,j) \in \Omega,\\ 0, & \text{otherwise.} \end{cases}$$

After solving problem (1) for the optimal $W^*$, $XW^*Y^\top$ can be used as the completed matrix of $P$. A larger entry in $XW^*Y^\top$ indicates a higher probability that the corresponding drug and protein target interact. 
However, SIMCLDA does not consider multiple similarities of lncRNAs or diseases from different sources or views, so it cannot be applied when multi-view side information is available. In the next section, we propose a matrix completion method for drug–target prediction with multi-view side information.
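The feature extraction and the projection onto known pairs used in this section can be sketched in NumPy. The sketch assumes that the features are the unscaled eigenvectors of a symmetric similarity matrix and that the set of known pairs is stored as a boolean mask; the number of retained eigenvectors `r` and the function names are illustrative, since the text does not state how many eigenvectors are kept.

```python
import numpy as np

def eigen_features(S, r):
    """Columns: eigenvectors of the r largest eigenvalues of a symmetric
    similarity matrix S (unscaled, hence orthonormal columns)."""
    S = (S + S.T) / 2.0                     # guard against slight asymmetry
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:r]   # indices of the r largest
    return eigvecs[:, order]

def project_omega(M, omega):
    """P_Omega(M): keep entries where the boolean mask is True, zero elsewhere."""
    return np.where(omega, M, 0.0)
```

A drug feature matrix would then be `eigen_features(S_drug, r)` and a target feature matrix `eigen_features(S_target, r)`. Orthonormal feature columns are convenient later because they bound the curvature of the data-fit term, which simplifies choosing a step size.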

MCM for drug–target prediction

For the structural view, we can first compute the feature matrices $X_s$ and $Y_s$ from the similarity matrices $S^d_s$ and $S^t_s$, respectively, and then apply the MCS model to obtain the completed matrix $X_sW_sY_s^\top$. We call this MCS-S (S for structural). Similarly, for the chemical view we obtain the completed matrix $X_cW_cY_c^\top$, which we call MCS-C (C for chemical). In this section, we extend the single-view model MCS to the multi-view case. We want the completed matrices obtained from the structural and chemical views to be as consistent as possible, and thus propose MCM as follows:

$$\min_{Q,\,W_s,\,W_c}\ \sum_{v\in\{s,c\}} \Big( \frac{\alpha}{2}\big\|\mathcal{P}_\Omega\big(X_vW_vY_v^\top - P\big)\big\|_F^2 + \beta\,\|W_v\|_* + \frac{\gamma}{2}\big\|X_vW_vY_v^\top - Q\big\|_F^2 \Big), \qquad (2)$$

where $v \in \{s, c\}$ indexes the views, $P$ is the known interaction matrix, $\mathcal{P}_\Omega$ keeps the entries of the known interacting pairs $\Omega$ and sets all others to zero, and $\alpha$, $\beta$ and $\gamma$ are trade-off parameters. Minimising the first term preserves the known entries in the completed matrices. Minimising the second term encourages $W_s$ and $W_c$ to be low rank. The third term makes the two completed interaction matrices as similar as possible by introducing a common completed matrix $Q$. The details of our method are shown in Fig. 1. The MCM model can easily be extended to the case when more than two views are available.
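To make the three terms concrete, the sketch below evaluates an MCM-style objective. It assumes the objective sums, over the two views, α/2 times the squared Frobenius error on the known pairs, β times the nuclear norm of W_v, and γ/2 times the squared Frobenius distance between the view's completed matrix X_v W_v Y_vᵀ and the common matrix Q. The 1/2 factors and the function name `mcm_objective` are assumptions of this sketch, not taken from the paper.

```python
import numpy as np

def mcm_objective(P, omega, feats, Ws, Q, alpha, beta, gamma):
    """Evaluate the assumed MCM objective.

    feats: list of (X_v, Y_v) feature-matrix pairs, one per view.
    Ws:    list of low-rank matrices W_v, one per view.
    """
    total = 0.0
    for (X, Y), W in zip(feats, Ws):
        R = X @ W @ Y.T                                  # completed matrix of view v
        fit = np.sum(np.where(omega, R - P, 0.0) ** 2)   # ||P_Omega(R - P)||_F^2
        nuc = np.linalg.norm(W, ord="nuc")               # nuclear norm ||W_v||_*
        cons = np.sum((R - Q) ** 2)                      # ||R - Q||_F^2
        total += 0.5 * alpha * fit + beta * nuc + 0.5 * gamma * cons
    return total
```

Because scaling all three parameters by the same factor does not change the minimiser, one of them can be held fixed while the other two are tuned.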

Fig. 1

Flowchart of our MCM method. We construct similarity matrices in the chemical and structural views for drugs and protein targets and extract features from these similarity matrices. Meanwhile, we preprocess the known drug–target association matrix P. Finally, the complete drug–target association matrix Q is obtained by the MCM model with the drug and target feature matrices of both views and P as inputs.

Panels: chemical view construction; structural view construction; association matrix preprocessing.

Algorithm

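To solve optimisation problem (2), we develop an algorithm that updates $Q$, $W_s$ and $W_c$ alternately. Here $X_v$ and $Y_v$ ($v \in \{s,c\}$) are the drug and target feature matrices of the structural and chemical views, $P$ is the known interaction matrix, and $\mathcal{P}_\Omega$ keeps the entries of the known interacting pairs and sets all others to zero. First, we fix $W_s$ and $W_c$ and solve for $Q$ from the sub-problem

$$\min_{Q}\ \frac{\gamma}{2}\sum_{v\in\{s,c\}}\big\|X_vW_vY_v^\top - Q\big\|_F^2. \qquad (3)$$

The optimal $Q$ for this problem is

$$Q = \frac{1}{2}\big(X_sW_sY_s^\top + X_cW_cY_c^\top\big). \qquad (4)$$

Next, we fix $Q$ and $W_c$ and solve for $W_s$ from the sub-problem

$$\min_{W_s}\ f(W_s) + \beta\,\|W_s\|_*, \qquad (5)$$

where

$$f(W_s) = \frac{\alpha}{2}\big\|\mathcal{P}_\Omega\big(X_sW_sY_s^\top - P\big)\big\|_F^2 + \frac{\gamma}{2}\big\|X_sW_sY_s^\top - Q\big\|_F^2.$$

For any given matrix $Z$, $f(W_s)$ can be approximated by the quadratic

$$f(W_s) \approx f(Z) + \big\langle \nabla f(Z),\, W_s - Z \big\rangle + \frac{t}{2}\big\|W_s - Z\big\|_F^2, \qquad (6)$$

where $\langle\cdot,\cdot\rangle$ denotes the inner product for matrices and the proximal parameter $t$ serves as an estimate of the second-order gradient $\nabla^2 f$. Thus, (5) can be rewritten as

$$\min_{W_s}\ \beta\,\|W_s\|_* + \frac{t}{2}\Big\|W_s - \Big(Z - \frac{1}{t}\nabla f(Z)\Big)\Big\|_F^2. \qquad (7)$$

We then apply the accelerated proximal gradient (APG) method [35] to obtain the optimal solution of (7) by the following iterative procedure:

$$\begin{aligned}
&\text{Step 1: } Z_k = W_k + \frac{\tau_{k-1} - 1}{\tau_k}\big(W_k - W_{k-1}\big),\\
&\text{Step 2: } W_{k+1} = \arg\min_{W}\ \beta\,\|W\|_* + \frac{t}{2}\big\|W - G_k\big\|_F^2, \quad G_k = Z_k - \frac{1}{t}\nabla f(Z_k),\\
&\text{Step 3: } \tau_{k+1} = \frac{1 + \sqrt{1 + 4\tau_k^2}}{2},
\end{aligned} \qquad (8)$$

with $\tau_0 = \tau_1 = 1$ and $W_0 = W_1$ given as the initialisation. In Step 2, we solve the optimisation problem

$$\min_{W}\ \beta\,\|W\|_* + \frac{t}{2}\big\|W - G_k\big\|_F^2 \qquad (9)$$

by the singular value thresholding algorithm [36]. Suppose the singular value decomposition (SVD) of $G_k$ is $G_k = U\Sigma V^\top = \sum_i \sigma_i u_i v_i^\top$, where $U$ and $V$ are unitary matrices and $\sigma_1 \ge \sigma_2 \ge \dots \ge 0$ are the singular values. The solution of (9) is then

$$W_{k+1} = \sum_i \max\Big(\sigma_i - \frac{\beta}{t},\, 0\Big)\, u_i v_i^\top, \qquad (10)$$

where $u_i$ and $v_i$ are the left and right singular vectors of $G_k$ corresponding to $\sigma_i$. Finally, we fix $Q$ and $W_s$ and solve for $W_c$ in the same way as for $W_s$. The iterations stop when the change in the value of the objective function in (2) is less than a small threshold, and the recovered matrix $Q$ is then obtained by (4). The procedure for solving problem (2) is summarised in the MCM algorithm box.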
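The alternating scheme can be sketched end to end in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the objective Σ_v [α/2 ‖P_Ω(X_v W_v Y_vᵀ − P)‖²_F + β‖W_v‖_* + γ/2 ‖X_v W_v Y_vᵀ − Q‖²_F], a FISTA-style accelerated proximal gradient inner loop with a fixed number of steps, and zero initialisation. It also sets the proximal parameter t = α + γ, which bounds the curvature of f only when every feature matrix has orthonormal columns (as eigenvectors from `np.linalg.eigh` do). The inner-iteration count, tolerance and all function names are illustrative.

```python
import numpy as np

def svt(G, tau):
    """Singular value thresholding: argmin_W tau*||W||_* + 0.5*||W - G||_F^2."""
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def grad_f(W, X, Y, P, omega, Q, alpha, gamma):
    """Gradient of f(W) = alpha/2*||P_Omega(XWY^T - P)||^2 + gamma/2*||XWY^T - Q||^2."""
    R = X @ W @ Y.T
    G = alpha * np.where(omega, R - P, 0.0) + gamma * (R - Q)
    return X.T @ G @ Y

def update_view(W, X, Y, P, omega, Q, alpha, beta, gamma, t, n_inner=50):
    """Accelerated proximal gradient for min_W f(W) + beta*||W||_*."""
    W_prev, tau_prev, tau = W.copy(), 1.0, 1.0
    for _ in range(n_inner):
        Z = W + ((tau_prev - 1.0) / tau) * (W - W_prev)          # step 1: extrapolate
        G = Z - grad_f(Z, X, Y, P, omega, Q, alpha, gamma) / t     # gradient step
        W_prev, W = W, svt(G, beta / t)                            # step 2: SVT
        tau_prev, tau = tau, (1.0 + np.sqrt(1.0 + 4.0 * tau ** 2)) / 2.0  # step 3
    return W

def objective(P, omega, feats, Ws, Q, alpha, beta, gamma):
    """Value of the assumed MCM objective."""
    total = 0.0
    for (X, Y), W in zip(feats, Ws):
        R = X @ W @ Y.T
        total += (0.5 * alpha * np.sum(np.where(omega, R - P, 0.0) ** 2)
                  + beta * np.linalg.norm(W, ord="nuc")
                  + 0.5 * gamma * np.sum((R - Q) ** 2))
    return total

def mcm(P, omega, feats, alpha=1.0, beta=0.1, gamma=1.0, max_iter=100, tol=1e-6):
    """Alternate the Q, W_s, W_c updates; feats = [(X_s, Y_s), (X_c, Y_c)]."""
    t = alpha + gamma   # valid step when each X_v, Y_v has orthonormal columns
    Ws = [np.zeros((X.shape[1], Y.shape[1])) for X, Y in feats]
    prev = np.inf
    for _ in range(max_iter):
        # Q-step: closed form, the mean of the per-view completed matrices
        Q = np.mean([X @ W @ Y.T for (X, Y), W in zip(feats, Ws)], axis=0)
        # W-steps: one view at a time with Q fixed
        for v, (X, Y) in enumerate(feats):
            Ws[v] = update_view(Ws[v], X, Y, P, omega, Q, alpha, beta, gamma, t)
        obj = objective(P, omega, feats, Ws, Q, alpha, beta, gamma)
        if abs(prev - obj) < tol:
            break
        prev = obj
    Q = np.mean([X @ W @ Y.T for (X, Y), W in zip(feats, Ws)], axis=0)
    return Q, Ws
```

In this sketch the two W-updates never interact directly. Once Q is fixed, the objective separates by view, which is why the views can be updated one after the other.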

Computation complexity analysis

There are two stages in the MCM algorithm. In the first stage, eigenvalue decomposition is used to extract features for drugs and targets in each view; a full eigenvalue decomposition of the $m \times m$ drug similarity matrix or the $n \times n$ target similarity matrix requires $O(m^3)$ or $O(n^3)$ operations, respectively. At each iteration of the second stage, $Q$, $W_s$ and $W_c$ are updated in three steps. In the first step, $Q$ is updated as the mean of the recovered drug–target association matrices of the two views; if the feature matrices have at most $r$ columns, forming each $m \times n$ recovered matrix costs $O(mnr)$. In the second step, $W_s$ is updated by the SVD of the matrix $G$ obtained from the quadratic approximation of $f(W_s)$ at a given $Z$; computing the gradient $\nabla f(Z)$ costs $O(mnr)$, whereas the SVD of the at most $r \times r$ matrix $G$ costs only $O(r^3)$. The third step has the same cost as the second. Overall, MCM therefore requires $O(m^3 + n^3)$ time for feature extraction plus $O(mnr)$ time per iteration (see Fig. 2).

Fig. 2

MCM algorithm for drug–target prediction

Experimental results

Evaluation of our method

We evaluate our methods MCS-S, MCS-C and MCM by comparing their prediction accuracy with existing single-view methods, namely support vector machine (SVM), bipartite graph learning (BGL) [2], SPGraph [22] and single-view low-rank embedding (SLRE) [23], and with multi-view methods, namely multi-view SVM, the multi-view penalised graph (MPGraph) [22] and multi-view low-rank embedding (MLRE) [23]. Among our methods, MCS-S and MCS-C are single-view methods for the structural and chemical views, respectively, while MCM is the multi-view method. We first describe the experimental settings in detail, then introduce the comparison methods, and finally present the results of all experiments.

Experimental setting

We construct a smaller dataset from the whole dataset by removing drugs with no known targets and targets with no known drugs. This leaves 65 drugs and 80 targets, with 114 known pairs among them in total. On the smaller dataset, we design two experimental settings, NT (new target) and NDNT (new drug and new target), to compare our methods with the other methods. In the NT setting, the goal is to find the drugs associated with the test targets; in the NDNT setting, we aim to predict the interactions between test drugs and test targets. For the NT setting, we divide the 80 targets of the small dataset into five folds. Each fold of targets is used as test data in turn, while the remaining four folds are used as training data. We use the associations between the training targets and all drugs to recover the interaction matrix by MCS-S, MCS-C and MCM, respectively. The completed interaction matrix gives interaction scores between the test targets and all drugs. For each test target, the drugs with the k highest association values are predicted to interact with it. By varying the threshold k, we obtain a receiver operating characteristic (ROC) curve and the corresponding area under the curve (AUC). For our multi-view method, we calculate the AUC values of the structural-view completed matrix, the chemical-view completed matrix and the common completed matrix, and report the maximum of these three AUC values as the final AUC in all of our multi-view experiments. AUC values for the other multi-view methods are calculated in the same way. In the NDNT setting, we divide the drugs and the targets into five folds each; in each round, the drugs and targets in one fold are used as test data and the remaining drugs and targets as training data. With the known associations between the training drugs and training targets, we recover the potential interaction matrix and compute AUCs in the same way as in the NT setting. 
We repeat the procedure 50 times in each setting and report the average AUCs and standard errors. For all three of our methods, MCS-S, MCS-C and MCM, the parameters are chosen from the set {0.001, 0.01, 0.1, 1}; one parameter is held fixed, and we report the best results obtained over this grid. For a fair comparison, the same parameter range and the same values of k are used to compute the final results of SPGraph, MPGraph, SLRE and MLRE.
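The evaluation protocol can be sketched as follows: five random folds over targets (NT) or over both drugs and targets (NDNT), hiding the known interactions of test items during training, and a rank-based AUC. This sketch assumes that the AUC is computed from the scores of the held-out block. It uses the Mann-Whitney form of the AUC, which equals the area under the ROC curve traced by sweeping the ranking threshold. The function names and seeding are illustrative, and the choice of the maximum AUC over the three completed matrices is not reproduced here.

```python
import numpy as np

def auc_from_scores(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as 1/2."""
    scores = np.asarray(scores, dtype=float).ravel()
    labels = np.asarray(labels, dtype=bool).ravel()
    pos, neg = scores[labels], scores[~labels]
    if pos.size == 0 or neg.size == 0:
        return float("nan")              # AUC undefined without both classes
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)

def make_folds(n_items, n_folds=5, seed=0):
    """Randomly split item indices (drugs or targets) into n_folds folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_items), n_folds)

def training_mask(omega, test_drugs=(), test_targets=()):
    """Hide known interactions of test items.

    NT: pass only test_targets. NDNT: pass both test_drugs and test_targets.
    """
    train = omega.copy()
    d = np.asarray(list(test_drugs), dtype=int)
    t = np.asarray(list(test_targets), dtype=int)
    train[d, :] = False                  # rows of test drugs
    train[:, t] = False                  # columns of test targets
    return train
```

In NT, each target fold from `make_folds(80)` is passed in turn as `test_targets`, the model is fitted with the returned mask, and `auc_from_scores` is applied to the completed scores and true labels in the test columns. Repeating with 50 different seeds mirrors the 50 repetitions described above.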

Comparison methods

(a) Single-view and multi-view SVMs: on the training data, an SVM learns a classifier that assigns each drug–target pair to one of two classes, ‘having interaction’ or ‘not having interaction’. The Kronecker product of the drug similarity matrix and the protein similarity matrix serves as the kernel between drug–protein pairs. For each view, an SVM with the corresponding Kronecker kernel is applied to the drug–target prediction problem. For the multi-view SVM, we simply apply the SVM with multiple kernels from the two views. (b) BGL [2]: BGL can be used as a single-view approach to predict drug–target associations from either the structural or the chemical view. (c) SPGraph and MPGraph [22]: SPGraph is a single-view method for predicting drug–target associations and can be used with either view. MPGraph is its multi-view extension, which integrates both views for drug–target prediction. (d) SLRE and MLRE [23]: SLRE is a single-view method based on low-rank embedding and can be used with either view. MLRE is a multi-view method that uses both the structural and chemical views to identify drug targets.
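The Kronecker pair kernel used by the SVM baselines can be sketched with `np.kron`. The sketch assumes that drug–target pair (i, j) is indexed as i × n_targets + j, which matches the block layout of `np.kron(S_drug, S_target)`; the helper names are hypothetical. For the full 326 × 608 dataset the pair kernel would be far too large to materialise, so in practice only the rows and columns of the sampled training pairs would be formed.

```python
import numpy as np

def pairwise_kronecker_kernel(S_drug, S_target):
    """Kernel between drug-target pairs: K[(i,j),(k,l)] = S_drug[i,k] * S_target[j,l]."""
    return np.kron(S_drug, S_target)

def pair_index(i, j, n_targets):
    """Row/column of pair (drug i, target j) in the Kronecker kernel."""
    return i * n_targets + j
```

The resulting matrix (or its training-pair submatrix) can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.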

Results

We first checked the convergence of the MCM algorithm on the smaller dataset with fixed values of the trade-off parameters. The results are shown in Fig. 3, where the x-axis represents the number of iterations and the y-axis the value of the objective function. The figure shows that the algorithm converges quickly.

Fig. 3

Convergence of our MCM algorithm

The results of our methods and the comparison methods are shown in Table 1, where ‘—’ denotes that the corresponding single-view method does not have a multi-view version. In the NDNT setting, the single-view methods obtained higher AUC values with the structural view than with the chemical view in most cases, whereas in the NT setting only MCS did better with the structural view. For both views, the single-view method MCS performed best in both the NT and the NDNT settings. In both settings, the graph-based multi-view method (MPGraph) and the multi-view low-rank embedding method (MLRE) performed better than their single-view counterparts (SPGraph and SLRE), and our matrix completion based multi-view method (MCM) performed better than its single-view counterpart (MCS). These results imply that using multi-view information about drugs and targets can improve prediction accuracy. Moreover, MCM performed best among the multi-view methods in both the NT and NDNT settings, which shows that our methods are effective in discovering potential associations between drugs and targets.

Table 1

Average AUCs for all nine methods and t-test p-values for the difference between our methods (bold) and the second-best methods (italic)

Setting	SVM	BGL	SPGraph	SLRE	MCS	P-value
Structural view
NT	0.492	0.443	0.509	0.498	0.598	1.969 × 10⁻¹⁵
NDNT	0.523	0.479	0.527	0.591	0.660	1.738 × 10⁻⁷
Chemical view
NT	0.493	0.497	0.541	0.513	0.543	5.534 × 10⁻¹
NDNT	0.472	0.497	0.477	0.431	0.575	4.035 × 10⁻⁷

To assess whether our methods significantly outperform the other methods, we also calculated t-test p-values comparing the 50 AUCs of our method with those of the second-best method. The p-values in Table 1 show that our methods obtained significantly better results than the second-best methods in all cases except the chemical view in the NT setting (p = 0.553). To show the robustness of our approaches with respect to the parameter k, we took k from the set {5, 10, ..., 40} and report the results of SPGraph-S, SPGraph-C, MPGraph, SLRE-S, SLRE-C, MLRE, MCS-S, MCS-C and MCM for the NT and NDNT settings in Fig. 4. In the NT setting, the graph-based methods and the low-rank embedding based methods sometimes performed even worse than the matrix completion based single-view method MCS. In the NDNT setting, all methods obtained higher AUC values and performed stably. In general, the multi-view methods performed better than the single-view methods for every k in the parameter set, and our multi-view method MCM performed best in each case.

Fig. 4

Average AUC results computed by nine approaches in two settings of NT and NDNT with different values of the parameter k

To show the robustness of our method with respect to the two trade-off parameters that are varied, we report in Fig. 5 the average AUC values in both the NT and NDNT settings for values of these two parameters from the set {0.001, 0.01, 0.1, 1}. Our method obtains better results in the NDNT setting than in the NT setting, and in each setting the results change only slightly as the parameters vary. This shows that MCM performs robustly over the given range of these two parameters.

Fig. 5

Average AUCs in the NT and NDNT settings with different values of the two trade-off parameters

Prediction of new drug–target associations in the whole dataset

We applied MCM to the whole dataset to predict new drug–target interactions by completing the association matrix P, with the trade-off parameters set to fixed values. Once the latent matrix Q is recovered from P, it gives association scores between all drugs and targets. For target i, we selected the top t percent of drugs according to the values in the ith column of the completed matrix and predicted them as the drugs that interact with the target. We evaluated the predictions of MCM as follows. We first randomly removed l known interactions from the association matrix P, where l is chosen from the set {5, 10, 15, 20}, and solved the MCM model to recover the interaction matrix. We then selected the predicted drug–target pairs in the completed matrix by varying the threshold t in the set {10, 20, 30, 40, 50, 60}, and finally computed the percentage of the removed drug–target pairs that were recalled. Fig. 6 shows the percentage of recalled pairs for different rank thresholds t and different numbers of removed known interactions l. At each fixed l, the percentage of recalled pairs increases with t. In most cases, more than 50% of the removed interactions were recovered by MCM, which suggests that the predictions of MCM are credible. Furthermore, for each new drug d of interest, we ran the prediction on 66 drugs (the 65 drugs of the smaller dataset plus d) and 608 proteins with the 114 known associations to find its target proteins. Note that there are no known interactions between d and the 608 proteins. The same parameter settings as in the previous experiments are used. 
Table 2 shows the newly identified targets for 26 Food and Drug Administration (FDA)-approved drugs, taking the top 0.5% of recovered probabilities in each column of the recovered latent matrix with the parameter k set to 40. Some of the predicted targets in Table 2 are supported by existing research results, which we consider in the Discussion section.
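The hold-out recall check can be sketched as follows. The sketch assumes that "top t percent" means the ceil(t/100 × number of drugs) highest-scoring drugs in a target's column, and that a removed pair counts as recalled when its drug falls in that set. The rounding rule and the helper names `remove_known` and `recall_top_percent` are assumptions for illustration.

```python
import numpy as np

def remove_known(P, l, seed=0):
    """Randomly hide l known interactions; returns (P_train, removed_pairs)."""
    rng = np.random.default_rng(seed)
    known = np.argwhere(P == 1)
    picked = known[rng.choice(len(known), size=l, replace=False)]
    P_train = P.copy()
    P_train[picked[:, 0], picked[:, 1]] = 0
    return P_train, [(int(i), int(j)) for i, j in picked]

def recall_top_percent(Q, removed_pairs, t_percent):
    """Fraction of hidden (drug, target) pairs whose drug ranks in the top
    t_percent of that target's column of the completed matrix Q."""
    n_drugs = Q.shape[0]
    k = max(1, int(np.ceil(n_drugs * t_percent / 100.0)))
    hits = 0
    for i, j in removed_pairs:
        top = np.argsort(-Q[:, j])[:k]   # drugs with the k largest scores for target j
        hits += int(i in top)
    return hits / len(removed_pairs)
```

A full run would hide l pairs with `remove_known`, complete `P_train` with MCM, and report `recall_top_percent` for each t in {10, 20, ..., 60}.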

Fig. 6

Percentage of the recalled pairs with different rank thresholds t and different number of removed known interactions l

Table 2

Predicted targets for 26 FDA‐approved drugs by our MCM method

KEGG ID	Drug name	Gene name
D05905	sparsomycin	UROD, JARID1D, KIF1A
D00372	thiabendazole	SLC1A4
D00433	silver sulfadiazine	SDS, SCNN1A, RARRES1, TSTA3, NPPB, SST, SULT2B1, GSTA2, CPB1
D03936	econazole	FCER1A, NDUFS8, SCNN1A, ALOX5, IFNAR2, RARA, CMA1, GSTM5
D00413	zidovudine	ALDH2
D00237	auranofin	COL1A1, TYR, TTPA, PLCL1, KLK1, APOE, MTAP, CP, S100P, EEA1, JARID1D, P4HB, CRYBB1
D01334	cyclacillin	ALDH2, CLPP
D01364	ciclopirox	VCAM1, JARID1D
D04115	1,8‐cineole	JARID1D
D00214	dactinomycin	PYGL, COL1A1, SLC1A4, NDUFS1, HMOX1, TGM2, ACADM, CFD, JARID1D, POR
D06265	uracil mustard	JARID1D
D00188	cholecalciferol	GRIK1, GRIA1, GRIK2, GRIA2, GRIA4, GRIK3
D00297	digitoxin	NOS1, SLC1A4
D06067	temozolomide	SDS, CALM1, ACVR1B
D00254	carmustine	CALM1, DNMT1, MCM6, PAICS
D00478	procarbazine	ALDH2
D00343	ifosfamide	ALDH3B2
D00966	tamoxifen	SCNN1A
D00153	testolactone	ALDH2, GRIK1, GRIA1
D00399	valproic acid	GAMT, OXTR, CAST, CDC2, UCK2, NR1H2, ITPKA, HAGH, SCN4A, CAPN1
D01068	vinblastine	SLC1A4, NDUFS1, HMOX1, CFD
D01211	leucovorin	GRM1, GRM4, GRM8, MGST2, GRM3
D00275	cisplatin	SDS
D00266	chlorambucil	JARID1D
D01363	carboplatin	JARID1D, CLPP
D01747	idarubicin	SLC1A4, NDUFS1, HMOX1, CFD


Discussion

In this section, we discuss the biological meaning of the drug–target interactions predicted by MCM. Carmustine is an antineoplastic agent commonly used in the treatment of brain tumours. Hagelkrüys [37] reported that the absence of DNMT1 in the brain leads to a severe neurological phenotype, a dramatically disorganised brain architecture and death. This supports our predicted interaction between the target DNMT1 and the drug carmustine. The work in [38] shows that DNMT1 overexpression is correlated with reduced MGMT protein expression in high-grade astrocytic tumours, and [39] reports that astrocytic tumours form the most common histologic group among childhood brain tumours. This further suggests that DNMT1 plays an important role in brain tumours and is likely a key target of carmustine. Tamoxifen is used for the treatment and prevention of estrogen receptor-positive breast cancer. Varley et al. [40] reported two fusion transcripts (SCNN1A-TNFRSF1A and CTSD-IFITM10) that were identified in breast cancer cell lines, confirmed in primary breast tumours and not detected in normal tissues. This supports our predicted interaction between tamoxifen and SCNN1A. Testolactone is an antineoplastic agent used to treat advanced breast cancer. Choi et al. [41] found that alcohol and genetic polymorphisms of CYP2E1 and ALDH2 play an important role in breast cancer development, which supports the predicted interaction between testolactone and ALDH2. Leucovorin is used in combination with 5-fluorouracil to prolong survival in the palliative treatment of patients with advanced colorectal cancer. Yi et al. [42] demonstrated that GRM3 expression is significantly upregulated in the majority of the human colonic adenocarcinomas tested and in colon cancer cell lines. 
GRM3 and Leucovorin are all related to colon cancer or colorectal cancer so they probably have some interaction, so our finding of interactions between them is reasonable. Valproic acid is a histone deacetylase inhibitor and is under investigation for the treatment of HIV and various cancers. In our prediction results, we found that OXTR, UCK2 and ITPKA may be the targets for valproic acid. We also found evidence which indicates that all these targets play important roles in various types of cancer. Zhong et al. [43] showed that OXT receptor (OXTR) is the primary target of OXT in androgen‐independent prostate cancer cell lines (DU145 and PC3). UCK2 is of particular scientific interest due to its overexpression in tumour cell lines [44], which makes it a target in anti‐cancer treatments [45]. Wang et al. [46] showed that ITPKA expression is up‐regulated in many types of cancer including lung and breast cancers, and overexpressed ITPKA contributes to tumourigenesis. These results suggest that valproic acid may interact with targets OXTR, UCK2 and ITPKA to function in different types of cancer, which supports our results. Both carboplatin and chlorambucil can possess antineoplastic activity or be used as antineoplastic agent for the treatment of various malignant and non‐malignant diseases. The results in [47] demonstrated that JARID1D levels were highly down‐regulated in metastatic prostate tumours compared with normal prostate tissues and primary prostate tumours. This indicates that JARID1D might be the target for carboplatin and chlorambucil in the treatment of prostate cancer. Idarubicin is a kind of anthracycline antineoplastics. The results in [48] showed that the panel with NDUFS1 and NDUFS8 reflecting tumour metabolism status is a novel prognostic predictor for lung cancer. This indicates that NDUFS1 would be the target for idarubicin in the treatment of lung cancer.

Conclusion

Many studies have shown that multi‐view methods are effective when several kinds of information about an object are available. In this work, we propose an MVMC method for predicting interactions between two types of samples, here drugs and targets. We first apply a single‐view approach, MCS, to identify drug targets by integrating either the structural information from drug structures and protein sequences or the chemical information from drug response and gene expression. We then extend MCS to the corresponding multi‐view approach, MCM, which jointly considers the structural and chemical information of drugs and proteins. Our experimental results show that our approaches significantly outperform the comparison methods in most cases. Although we consider only two types of information for drugs and proteins in this work, MCM can be applied when more than two views are available. Extending MCM to three or more views, which could further strengthen its learning ability, is an interesting direction for future research.
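To make the multi‐view idea concrete, the following is a minimal sketch of matrix completion with side information from several views. It is not the MCM algorithm itself: as a stand‐in technique, it uses graph‐regularised low‐rank factorisation X = UVᵀ, in which each view contributes one drug–drug and one target–target similarity matrix through a Laplacian penalty weighted by a per‐view trade‐off parameter, and the objective is minimised by plain gradient descent. All function names, parameter values and the toy data are hypothetical.

```python
import numpy as np

# Hypothetical sketch, not the published MCM algorithm: graph-regularised
# low-rank factorisation X = U @ V.T with one similarity "view" per entry of
# drug_sims / target_sims, trained by plain gradient descent.


def laplacian(S):
    """Unnormalised graph Laplacian L = D - S of a symmetric similarity matrix."""
    return np.diag(S.sum(axis=1)) - S


def objective(U, V, Y, W, Lds, Lts, lam, alphas):
    """Masked squared error + Frobenius penalty + one graph term per view."""
    fit = 0.5 * np.sum((W * (U @ V.T - Y)) ** 2)
    reg = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    graph = sum(a * (np.trace(U.T @ Ld @ U) + np.trace(V.T @ Lt @ V))
                for a, Ld, Lt in zip(alphas, Lds, Lts))
    return fit + reg + graph


def multiview_complete(Y, mask, drug_sims, target_sims, rank=2, lam=0.1,
                       alphas=(0.05, 0.05), lr=0.01, iters=2000, seed=0):
    """Return a completed drugs x targets score matrix and [initial, final] loss."""
    rng = np.random.default_rng(seed)
    W = mask.astype(float)                      # 1 where Y is treated as observed
    Lds = [laplacian(S) for S in drug_sims]     # one drug Laplacian per view
    Lts = [laplacian(S) for S in target_sims]   # one target Laplacian per view
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))
    losses = [objective(U, V, Y, W, Lds, Lts, lam, alphas)]
    for _ in range(iters):
        R = W * (U @ V.T - Y)                   # residual on observed entries only
        gU = R @ V + lam * U
        gV = R.T @ U + lam * V
        for a, Ld, Lt in zip(alphas, Lds, Lts):
            gU += 2 * a * Ld @ U                # penalises differences between similar drugs
            gV += 2 * a * Lt @ V                # penalises differences between similar targets
        U, V = U - lr * gU, V - lr * gV
    losses.append(objective(U, V, Y, W, Lds, Lts, lam, alphas))
    return U @ V.T, losses


# Toy data: 4 drugs x 3 targets, two views (e.g. structural and chemical).
Y = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 1, 1]], dtype=float)
mask = np.ones_like(Y, dtype=bool)
mask[1, 1] = False                              # treat (drug 1, target 1) as unknown

Sd_struct = np.array([[0, .9, .1, .2], [.9, 0, .1, .1],
                      [.1, .1, 0, .8], [.2, .1, .8, 0]])
Sd_chem = np.full((4, 4), 0.3)
np.fill_diagonal(Sd_chem, 0)
St_struct = np.array([[0, .7, .1], [.7, 0, .2], [.1, .2, 0]])
St_chem = np.full((3, 3), 0.3)
np.fill_diagonal(St_chem, 0)

X, losses = multiview_complete(Y, mask, [Sd_struct, Sd_chem], [St_struct, St_chem])
print(X.round(2))
print("score for held-out pair:", round(float(X[1, 1]), 3))
```

In this sketch, adding a third view only requires appending one more drug similarity matrix, one more target similarity matrix and one more weight to `alphas`, which mirrors the extension to more than two views mentioned above.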

33 in total

1. Pattern discovery and cancer gene identification in integrated cancer genomic data.

Authors: Qianxing Mo; Sijian Wang; Venkatraman E Seshan; Adam B Olshen; Nikolaus Schultz; Chris Sander; R Scott Powers; Marc Ladanyi; Ronglai Shen
Journal: Proc Natl Acad Sci U S A Date: 2013-02-21 Impact factor: 11.205

2. CellMiner: a web-based suite of genomic and pharmacologic tools to explore transcript and drug patterns in the NCI-60 cell line set.

Authors: William C Reinhold; Margot Sunshine; Hongfang Liu; Sudhir Varma; Kurt W Kohn; Joel Morris; James Doroshow; Yves Pommier
Journal: Cancer Res Date: 2012-07-15 Impact factor: 12.701

3. Prediction of lncRNA-disease associations based on inductive matrix completion.

Authors: Chengqian Lu; Mengyun Yang; Feng Luo; Fang-Xiang Wu; Min Li; Yi Pan; Yaohang Li; Jianxin Wang
Journal: Bioinformatics Date: 2018-10-01 Impact factor: 6.937

4. Review: Treatments for astrocytic tumors in children: current and emerging strategies.

Authors: Stanislaw R Burzynski
Journal: Paediatr Drugs Date: 2006 Impact factor: 3.022

5. Predicting enzyme targets for cancer drugs by profiling human metabolic reactions in NCI-60 cell lines.

Authors: Limin Li; Xiaobo Zhou; Wai-Ki Ching; Ping Wang
Journal: BMC Bioinformatics Date: 2010-10-08 Impact factor: 3.169

6. Role of alcohol and genetic polymorphisms of CYP2E1 and ALDH2 in breast cancer development.

Authors: Ji-Yeob Choi; Josef Abel; Thomas Neuhaus; Yon Ko; Volker Harth; Nobuyuki Hamajima; Kazuo Tajima; Keun-Young Yoo; Sue Kyung Park; Dong-Young Noh; Wonshik Han; Kuk-Jin Choe; Sei-Hyun Ahn; Sook-Un Kim; Ari Hirvonen; Daehee Kang
Journal: Pharmacogenetics Date: 2003-02

7. Supervised prediction of drug-target interactions using bipartite local models.

Authors: Kevin Bleakley; Yoshihiro Yamanishi
Journal: Bioinformatics Date: 2009-07-15 Impact factor: 6.937

8. Testicular germ cell tumor susceptibility associated with the UCK2 locus on chromosome 1q23.

Authors: Fredrick R Schumacher; Zhaoming Wang; Rolf I Skotheim; Roelof Koster; Charles C Chung; Michelle A T Hildebrandt; Christian P Kratz; Anne C Bakken; D Timothy Bishop; Michael B Cook; R Loren Erickson; Sophie D Fosså; Mark H Greene; Kevin B Jacobs; Peter A Kanetsky; Laurence N Kolonel; Jennifer T Loud; Larissa A Korde; Loic Le Marchand; Juan Pablo Lewinger; Ragnhild A Lothe; Malcolm C Pike; Nazneen Rahman; Mark V Rubertone; Stephen M Schwartz; Kimberly D Siegmund; Eila C Skinner; Clare Turnbull; David J Van Den Berg; Xifeng Wu; Meredith Yeager; Katherine L Nathanson; Stephen J Chanock; Victoria K Cortessis; Katherine A McGlynn
Journal: Hum Mol Genet Date: 2013-03-05 Impact factor: 6.150