
Drug-Target Interaction Prediction through Label Propagation with Linear Neighborhood Information.

Abstract

Interactions between drugs and target proteins provide important information for drug discovery. Currently, experiments have identified only a small number of drug-target interactions. Therefore, the development of computational methods for drug-target interaction prediction is an urgent task of theoretical interest and practical significance. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for predicting unobserved drug-target interactions. First, we calculate the drug-drug linear neighborhood similarity in the feature space by considering how to reconstruct data points from their neighbors. Then, we take the similarities as the manifold of drugs and assume that the manifold is unchanged in the interaction space. Finally, we predict unobserved interactions between known drugs and targets by using the drug-drug linear neighborhood similarity and the known drug-target interactions. The experiments show that LPLNI can make high-accuracy predictions on four benchmark datasets using only known drug-target interactions. Furthermore, we consider incorporating chemical structures into LPLNI models. Experimental results demonstrate that the model with integrated information (LPLNI-II) performs better than other state-of-the-art methods. The known drug-target interactions are an important information source for computational predictions. The usefulness of the proposed method is demonstrated by cross validation and a case study.

Keywords: drug-target interactions; integrated information; label propagation; linear neighborhood

Year: 2017 PMID: 29186828 PMCID: PMC6149680 DOI: 10.3390/molecules22122056

Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411

1. Introduction

The identification of potential drug-target interactions is a crucial task in drug discovery, which helps to find novel targets for existing drugs or to identify targets for new drugs [1]. Wet experiments are reliable ways of determining interactions between drugs and targets, but they are costly and time-consuming [2]. In contrast, computational methods provide an economical and efficient alternative: they predict possible drug-target interactions with high reliability, which can then be examined in further experiments. Researchers have collected drug-target interaction data and constructed public databases, and the available data facilitate the development of drug-target interaction prediction methods. Traditional computational methods include molecular docking simulation methods and ligand-based methods. Though docking simulation methods are effective, they cannot work without three-dimensional (3D) structures of targets [3]. Ligand-based methods perform well when there are sufficient known ligands for a target protein, but such methods are not suitable for large-scale data [4]. In addition, several methods have been proposed based on properties of drugs and targets. Kuhn et al. [5] used molecular features and target proteins to predict drug-target relations. Garcia-Sosa et al. [6,7] introduced logistic regression and naïve Bayesian classifiers to classify compounds into a disease category or organ by studying target-ligand data. Cao et al. [8] found that genes with spatial interactions may have similar molecular functions and developed a new gene function prediction method based on gene-gene interaction networks. Xu et al. [9] proposed a stochastic gradient boosting algorithm to predict effective drug combinations. Zeng et al. [10] developed a novel feature fusion method and adopted the random forest classifier for protein-protein interaction prediction. Wei et al. utilized the random forest classifier [11] and an ensemble classifier called LibD3C [12] to predict protein-protein interactions.
Recently, a great number of machine learning methods have been introduced for drug-target interaction prediction. These methods are roughly divided into four categories: classification methods, matrix factorization methods, kernel methods, and network inference methods. Classification methods take drug-target interaction pairs as positive instances and non-interaction pairs as negative instances, and build classification models for predictions. For example, Nagamine et al. [13] and Wang et al. [14] constructed support vector machine (SVM) models; Tabei et al. [15] utilized logistic regression and SVM. Matrix factorization methods use the matrix factorization technique to reconstruct drug-target interactions. The kernelized Bayesian matrix factorization with twin kernels (KBMF2K) [16], multiple similarity collaborative matrix factorization (MSCMF) [17], and graph-regularized matrix factorization (GRMF) [18] have been used for predictions. Kernel methods include the pair kernel method (PKM) [19], net Laplacian regularized least squares (NetLapRLS) [20], and regularized least squares with Kronecker product kernel (RLS-Kron) [21]. Network inference methods formulate drug-target interaction prediction as a graph learning problem. Bleakley and Yamanishi [22] built the bipartite local model (BLM). Mei et al. [23] improved the BLM by handling new drug candidates through their neighbors’ interaction profiles. Chen et al. [24] applied a random walk technique to walk on a drug-drug similarity network, a target-target similarity network, and the known drug-target interaction network for predictions. Cheng et al. [25] adopted the resource allocation method to infer interactions in the drug-target bipartite network. Moreover, there are other types of machine learning-based methods [26,27,28,29,30].
Drug-drug similarity and target-target similarity are critical components in many drug-target interaction prediction methods [17,19,20,21,22,23,24]. How to define similar drugs (targets) is critical, and the key is how to calculate the drug-drug similarity. There are different ways of calculating drug-drug similarity based on feature vectors, such as cosine similarity, Gauss similarity, and Jaccard similarity. Cosine similarity measures the cosine of the angle between two vectors in an inner product space. Gauss similarity utilizes the Gauss kernel function to measure the similarity. Jaccard similarity is the ratio of the intersection of the components of two vectors to their union. In this paper, we propose a label propagation method with linear neighborhood information (LPLNI) for drug-target interaction prediction. First, we calculate the drug-drug linear neighborhood similarity in the feature space by considering how to reconstruct data points from their neighbors. Then, we take the similarities as the manifold of drugs and assume that the manifold is unchanged in the interaction space. Finally, we predict unobserved interactions between known drugs and targets by using the drug-drug linear neighborhood similarity and the known drug-target interactions. We present a feature of drugs, named the interaction profile, that is derived from the known drug-target interactions. The LPLNI model based on the interaction profiles performs well in the computational experiments, achieving AUPR values of 0.9051, 0.9461, 0.9658, and 0.9464 on the enzymes (Es) dataset, the GPCRs dataset, the ion channels (ICs) dataset, and the nuclear receptors (NRs) dataset, respectively. Further, we incorporate drug structure information into the LPLNI model by a nonlinear strategy, improving the AUPR values to 0.9069, 0.9469, 0.9684, and 0.9492 on the Es dataset, the GPCRs dataset, the ICs dataset, and the NRs dataset, respectively. The experimental results show that our method outperforms other state-of-the-art methods on these four benchmark datasets.

2. Results and Discussion

2.1. Evaluation Metrics

To evaluate the performances of the prediction models, computational experiments were conducted on four benchmark datasets. We adopted leave-one-out cross validation (LOOCV) to test model performances. That is, each drug-target pair was left out in turn, and the remaining pairs were used as the training set to build models for predictions. We repeated the procedure until every drug-target pair had been tested once. AUC and AUPR are the most popular evaluation metrics in previous works. AUC is the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) versus the false positive rate (FPR). AUPR is the area under the precision-recall curve, which plots the precision (the ratio of true positives among the predicted positives) at each recall rate. Because there are many more negative instances than positive ones, and AUPR punishes false positives more heavily in evaluation [31], we adopted AUPR as the primary metric and also reported AUC.
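The two metrics described above can be illustrated with a minimal sketch in plain Python. It is not the evaluation code used in the experiments: the function names are hypothetical, AUC is computed by counting correctly ordered positive-negative pairs, and AUPR is approximated step-wise (the quantity often called average precision), which is only one of several conventions for the area under the precision-recall curve.

```python
def auc_score(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def aupr_score(labels, scores):
    """Step-wise area under the precision-recall curve: the mean of the
    precision values taken at the rank of each true positive."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Hypothetical predicted scores for four left-out drug-target pairs.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
print(auc_score(labels, scores), aupr_score(labels, scores))
```

In this toy case one negative pair is scored above a positive pair, so three of the four positive-negative pairs are ordered correctly (AUC 0.75), while the precision at the two positives is 1 and 2/3 (AUPR 5/6).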

2.2. The Performances of the LPLNI Models

In this section, we evaluate the performances of the LPLNI models. Since we had both the interaction profiles and the fingerprints of drugs, we used each of these features to calculate the linear neighborhood similarities and then built LPLNI models. Here, we used the Pubchem fingerprint for analysis. There are two parameters, $K$ and $\alpha$, in LPLNI: $K$ is the number of neighbors in the linear neighborhood similarity (LNS), and $\alpha$ is the probability of absorbing target information from neighbors. These parameters may influence the results, and we can build LPLNI models using different parameter values. The number of drug neighbors should be less than the number of all drugs, and the four benchmark datasets, i.e., the nuclear receptors (NRs) dataset, the G-protein coupled receptors (GPCRs) dataset, the ion channels (ICs) dataset, and the enzymes (Es) dataset, contain 54, 223, 210, and 445 drugs, respectively. Therefore, we considered the neighborhood numbers 10, 30, and 50 for the NRs dataset; 60, 120, and 180 for the GPCRs and ICs datasets; and 120, 240, and 360 for the Es dataset. In addition, the absorbing probability should be greater than zero and smaller than one. Hence, for parameter $\alpha$ we chose values from 0.1 to 0.9 (with a step size of 0.1). The drug-drug similarity is critical for LPLNI. To demonstrate the superiority of the linear neighborhood similarity, we also considered cosine similarity, Jaccard similarity, and Gauss similarity and applied label propagation to build similarity-based prediction models. The Gauss function calculates the similarity by […], which has a bandwidth parameter that we set as in [23], where $x_i$ is the feature vector of the $i$-th drug and $n$ is the number of drugs. All prediction models were evaluated using LOOCV. The performances of the different similarity-based models are shown in Figure 1. In general, the linear neighborhood similarity leads to better performances than cosine similarity, Gauss similarity, or Jaccard similarity.
A possible reason for the superior performances of the LPLNI models is that the linear neighborhood similarity describes the linear relationship of data points in the feature space. The linear neighborhood similarity is then smoothly transferred into the interaction space, and LPLNI utilizes label propagation to make predictions based on the same linear relationship of data points in the interaction space.
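For readers unfamiliar with the baseline similarities compared above, the following sketch shows cosine and Jaccard similarity on binary feature vectors such as fingerprints or interaction profiles. The function names and the example vectors are illustrative assumptions; the Gauss similarity is omitted because its exact form and bandwidth setting follow [23] and are not restated here.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two feature vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0 else 0.0


def jaccard_similarity(x, y):
    """Shared nonzero components divided by components nonzero in either vector."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either > 0 else 0.0


# Two hypothetical binary drug feature vectors.
d1 = [1, 1, 0, 0]
d2 = [1, 0, 0, 0]
print(cosine_similarity(d1, d2), jaccard_similarity(d1, d2))
```

Both measures are symmetric and depend only on the two vectors being compared, whereas the linear neighborhood similarity of a drug depends on its whole neighborhood and is generally not symmetric.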

Figure 1

The area under the precision-recall curve (AUPR) values of the similarity-based models with different parameters. LNS-10 denotes the linear neighborhood similarity-based models constructed with 10 neighbors. The other labels have analogous meanings.

Moreover, we observed that the LPLNI models based on the interaction profiles have better performances than the LPLNI models based on the Pubchem fingerprint, which indicates that the interaction profiles are an information source of utmost importance for prediction.

2.3. The Performances of LPLNI Models with Integrated Information

In machine learning, combining diverse information of drugs can improve the performance of prediction models [32,33,34,35,36,37]. In Section 2.2, our study demonstrates that only the use of interaction profiles of drugs can lead to high-accuracy prediction models; however, we still attempted to incorporate structural information of drugs to further improve accuracy. Since we had nine different fingerprints, we firstly built individual LPLNI models based on different fingerprint features and evaluated their usefulness. The leave-one-out cross validation performances of the prediction models are shown in Table 1. Among all fingerprints, Daylight, Extended and Hybridization fingerprints produce better performances than others on the benchmark datasets. Although the performances of fingerprints are lower than the interaction profiles, fingerprints can still provide information for the drug-target interaction predictions. According to their performances, Daylight fingerprints, Extended fingerprints, and Hybridization fingerprints were adopted to incorporate into the interaction profile-based models.

Table 1

Performances of label propagation method with linear neighborhood information (LPLNI) models and LPLNI-II models on the four datasets.

Features	Methods	Metric	NRs	ICs	GPCRs	Es
Daylight	LPLNI	AUPR	0.4519	0.3326	0.4254	0.4094
Daylight	LPLNI	AUC	0.7868	0.7605	0.8771	0.8307
EState	LPLNI	AUPR	0.2958	0.2437	0.3096	0.2770
EState	LPLNI	AUC	0.6903	0.7098	0.8480	0.8055
Extended	LPLNI	AUPR	0.4452	0.3382	0.4317	0.4153
Extended	LPLNI	AUC	0.7820	0.7741	0.8783	0.8261
GraphOnly	LPLNI	AUPR	0.3177	0.3226	0.3525	0.3507
GraphOnly	LPLNI	AUC	0.7478	0.7606	0.8483	0.7939
Hybridization	LPLNI	AUPR	0.4226	0.3462	0.4047	0.4050
Hybridization	LPLNI	AUC	0.8001	0.7962	0.8747	0.8224
Klekota-Roth	LPLNI	AUPR	0.4665	0.3030	0.3819	0.3360
Klekota-Roth	LPLNI	AUC	0.8103	0.7355	0.8580	0.8179
MACCS	LPLNI	AUPR	0.3764	0.3400	0.3881	0.3804
MACCS	LPLNI	AUC	0.7712	0.7543	0.8621	0.8360
Pubchem	LPLNI	AUPR	0.4470	0.3234	0.4038	0.4039
Pubchem	LPLNI	AUC	0.7561	0.7522	0.8822	0.8405
Substructure	LPLNI	AUPR	0.3202	0.3092	0.2942	0.2875
Substructure	LPLNI	AUC	0.7539	0.7662	0.8465	0.8068
Interaction profile	LPLNI	AUPR	0.9464	0.9658	0.9461	0.9051
Interaction profile	LPLNI	AUC	0.9532	0.9890	0.9683	0.9465
Day&Ext&Hyb&Int	LPLNI-II	AUPR	0.9492	0.9684	0.9469	0.9069
Day&Ext&Hyb&Int	LPLNI-II	AUC	0.9919	0.9947	0.9769	0.9700

For each feature, the first row gives the AUPR values and the second row gives the area under the receiver operating characteristic (ROC) curve (AUC) values. The bold type indicates the top 4 in terms of AUC and AUPR values. Day&Ext&Hyb&Int: using Daylight, Extended, Hybridization, and the interaction profile as features.

By using the strategy described in Section 3.4, we incorporated the three fingerprints into the interaction profile-based model and developed the prediction model with integrated information, named “LPLNI-II.” As shown in Table 1, LPLNI-II produces better results than the individual feature-based models on the benchmark datasets. On the NRs dataset, for example, it improves the AUPR value from 0.9464 to 0.9492 and the AUC value from 0.9532 to 0.9919, indicating the usefulness of combining various information about drugs.

2.4. Comparison with State-of-the-Art Methods

A great number of methods have been proposed to predict drug-target interactions. NetLapRLS [20] trained two classifiers, one based on the chemical information and one based on the genomic information, each combined with the interaction profiles, and then linearly combined the two classifiers to develop the prediction model. RLS-Kron [21] considered chemical structures, genomic sequences, and the interaction profiles, calculated the similarity by the Gaussian function, and utilized the regularized least squares (RLS) classifier to build prediction models. The model based on the interaction profiles could produce high-accuracy performances, and the final prediction model was developed by integrating diverse information with the Kronecker product. These methods and our method utilize the interaction profiles as the primary information source to develop prediction models. To demonstrate the superiority of our method, we adopted NetLapRLS and RLS-Kron for comparison. All methods were evaluated by leave-one-out cross validation (LOOCV). Since RLS-Kron and our method can make high-accuracy predictions using only the interaction profiles, we first built prediction models based on the interaction profiles and compared their performances. As shown in Table 2, the AUPR values of LPLNI are 0.9051, 0.9461, 0.9658, and 0.9464 on the enzymes (Es) dataset, the G-protein coupled receptors (GPCRs) dataset, the ion channels (ICs) dataset, and the nuclear receptors (NRs) dataset, respectively, all higher than those of RLS-Kron. In addition, LPLNI produces superior AUC performances on the GPCRs dataset, the ICs dataset, and the NRs dataset. Therefore, the interaction profile-based LPLNI model produces better results than the interaction profile-based RLS-Kron model on these benchmark datasets.

Table 2

Performances of LPLNI and RLS-Kron based on the interaction profiles.

Datasets	Features	Methods	AUC	AUPR
Es	Interaction profile	RLS-Kron	0.9830	0.8850
Es	Interaction profile	LPLNI	0.9465	0.9051
GPCRs	Interaction profile	RLS-Kron	0.9470	0.7130
GPCRs	Interaction profile	LPLNI	0.9683	0.9461
ICs	Interaction profile	RLS-Kron	0.9860	0.9270
ICs	Interaction profile	LPLNI	0.9890	0.9658
NRs	Interaction profile	RLS-Kron	0.9060	0.6100
NRs	Interaction profile	LPLNI	0.9532	0.9464

The bold type indicates the highest AUC/AUPR values. The following tables maintain uniform standards.

Further, we tested the performances of the LPLNI model with integrated information (LPLNI-II) by comparing LPLNI-II with RLS-Kron and NetLapRLS. As shown in Table 3, LPLNI-II can outperform benchmark methods on the GPCRs dataset, ICs dataset, and NRs dataset. Therefore, the LPLNI-II can integrate different information and make high-accuracy predictions.

Table 3

Performances of LPLNI-II and other state-of-the-art methods.

Datasets	Features	Methods	AUC	AUPR
Es	chem&gen&int	RLS-Kron	0.9780	0.9150
Es	chem&gen&int	NetLapRLS	0.9830	N.A.
Es	chem&int	LPLNI-II	0.9700	0.9069
GPCRs	chem&gen&int	RLS-Kron	0.9540	0.7130
GPCRs	chem&gen&int	NetLapRLS	0.9710	N.A.
GPCRs	chem&int	LPLNI-II	0.9769	0.9469
ICs	chem&gen&int	RLS-Kron	0.9840	0.9430
ICs	chem&gen&int	NetLapRLS	0.9860	N.A.
ICs	chem&int	LPLNI-II	0.9947	0.9684
NRs	chem&gen&int	RLS-Kron	0.9220	0.6840
NRs	chem&gen&int	NetLapRLS	0.8880	N.A.
NRs	chem&int	LPLNI-II	0.9919	0.9492

N.A.: not available. chem, gen, and int are abbreviations for chemical structure, genomic sequence, and the interaction profile, respectively.

2.5. Case Study

To test the potential of LPLNI in drug-target interaction prediction, we built models based on the known interactions of the Es dataset and then made predictions for unknown interactions. We checked the top 10 interactions predicted by our method and looked for evidence in SuperTarget [38] to support our discoveries. SuperTarget contains updated interactions from several drug databases, e.g., DrugBank and KEGG. As shown in Table 4, 4 of the 10 predictions are confirmed, and the results indicate that our method is capable of predicting novel interactions.

Table 4

The top 10 new predicted interactions on the Es dataset.

Rank	Drug ID	Drug	Target ID	Target	Confirmed?
1	D00574	Aminoglutethimide (USP/INN)	hsa1589	cytochrome P450, family 21, subfamily A, polypeptide 2	
2	D00437	Nifedipine (JP15/USP/INN)	hsa1559	cytochrome P450, family 2, subfamily C, polypeptide 9	Yes
3	D00542	Halothane (JP15/USP/INN)	hsa1571	cytochrome P450, family 2, subfamily E, polypeptide 1	Yes
4	D00410	Metyrapone (JP15/USP/INN)	hsa1583	cytochrome P450, family 11, subfamily A, polypeptide 1	
5	D00139	Methoxsalen (JP15/USP)	hsa1543	cytochrome P450, family 1, subfamily A, polypeptide 1	Yes
6	D00437	Nifedipine (JP15/USP/INN)	hsa1585	cytochrome P450, family 11, subfamily B, polypeptide 2	
7	D00691	Diprophylline (JAN/INN)	hsa8654	phosphodiesterase 5A, cGMP-specific	
8	D00691	Diprophylline (JAN/INN)	hsa5152	phosphodiesterase 9A	
9	D00691	Diprophylline (JAN/INN)	hsa5150	phosphodiesterase 7A	Yes
10	D00691	Diprophylline (JAN/INN)	hsa50940	Peptidyl-prolyl cis-trans isomerase A	

3. Materials and Methods

3.1. Datasets

There are several databases that provide information about drugs and drug-target interactions and that can be used for predicting unobserved drug-target interactions. The Pubchem database [39,40] can provide chemical structures. The DrugBank database [41,42,43,44] is a comprehensive bioinformatics resource that includes targets, transporters, and enzymes of drugs. The KEGG database [45,46] is a collection of protein pathways that are associated with drug targets. BRENDA [47] is a comprehensive collection of enzyme and metabolic data, and is updated by extracting information from primary literature. SuperTarget [38] contains more than 2500 target proteins, which are annotated with about 7300 relations to 1500 drugs. To study potential drug-target interactions, we used four benchmark datasets of drug-target interactions, which were compiled by Yamanishi et al. [48]. There are mainly four types of target proteins: enzymes (Es), ion channels (ICs), G-protein coupled receptors (GPCRs), and nuclear receptors (NRs). In Yamanishi’s datasets, the drug-target interactions were classified into four subsets, which are associated with different types of targets. Table 5 lists the details of the four datasets.

Table 5

Statistics of four drug-target interaction datasets.

Datasets	$n_d$	$n_t$	$E_{dt}$	$\bar{n}_d$	$\bar{n}_t$	Sparsity
Es	445	664	2926	6.5753	4.4066	0.0099
GPCRs	223	95	635	2.8475	6.6842	0.0299
ICs	210	204	1476	7.0286	7.2353	0.0345
NRs	54	26	90	1.6667	3.4615	0.0641

$n_d$ is the number of drugs, $n_t$ is the number of targets, $E_{dt}$ is the number of known interactions, $\bar{n}_d$ is the average number of targets for each drug, and $\bar{n}_t$ is the average number of drugs for each target. Sparsity is the number of known interactions divided by the number of all possible interaction pairs.
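The derived columns of Table 5 follow directly from the three counts. The short sketch below recomputes them from the counts listed in the table; the dictionary layout and the function name are illustrative choices.

```python
# (number of drugs, number of targets, number of known interactions), from Table 5.
datasets = {
    "Es": (445, 664, 2926),
    "GPCRs": (223, 95, 635),
    "ICs": (210, 204, 1476),
    "NRs": (54, 26, 90),
}


def summarize(n_drugs, n_targets, n_edges):
    """Return (average targets per drug, average drugs per target, sparsity)."""
    targets_per_drug = n_edges / n_drugs
    drugs_per_target = n_edges / n_targets
    sparsity = n_edges / (n_drugs * n_targets)
    return (round(targets_per_drug, 4), round(drugs_per_target, 4), round(sparsity, 4))


for name, counts in datasets.items():
    print(name, summarize(*counts))
```

The recomputed values agree with the table up to rounding in the last digit, and they make the class imbalance explicit: even in the densest dataset (NRs), fewer than 7% of all possible drug-target pairs are known interactions.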

3.2. Features

To build prediction models, we need to represent drugs or targets as feature vectors. First, we present a feature named the “interaction profile” for drugs (targets), which is derived from known interactions. As shown in Figure 2, let $D = \{d_1, d_2, \ldots, d_n\}$ be a set of given drugs and $T = \{t_1, t_2, \ldots, t_m\}$ be a set of given targets; their interactions can be formalized as an interaction network. The interaction profile of a drug (target) is a binary vector describing the presence or absence of interaction with every target (drug) in the network.

Figure 2

A drug-target interaction network and interaction profiles of drugs.
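As a small illustration of the interaction profile, the sketch below builds the binary interaction matrix of a toy network and reads drug profiles from its rows and target profiles from its columns. The drug and target identifiers and the set of known pairs are made up for the example.

```python
# Hypothetical toy network: three drugs, two targets, three known interactions.
drugs = ["d1", "d2", "d3"]
targets = ["t1", "t2"]
known = {("d1", "t1"), ("d2", "t1"), ("d2", "t2")}

# Interaction matrix: entry (i, j) is 1 if drug i is known to interact with target j.
Y = [[1 if (d, t) in known else 0 for t in targets] for d in drugs]

# A drug's interaction profile is its row; a target's profile is its column.
drug_profiles = {d: Y[i] for i, d in enumerate(drugs)}
target_profiles = {t: [row[j] for row in Y] for j, t in enumerate(targets)}

print(drug_profiles)
print(target_profiles)
```

A drug with no known interactions, such as `d3` here, has an all-zero profile, so a profile-based model has no information to work with for entirely new drugs.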

Since we collected drug structures from KEGG DRUG, we also represented drugs as feature vectors based on their substructures. Structural features of drugs are well known as fingerprints, which are bit vectors whose elements indicate the frequencies or the existence of certain substructures. As listed in Table 6, there are different drug fingerprints; we adopted the Chemistry Development Kit (CDK) [49] to calculate these fingerprints and then used them as structural feature vectors.

Table 6

Descriptions of nine fingerprints.

Fingerprints	Descriptions
Daylight	Daylight fingerprints based on hashing molecular subgraphs
EState	Generates 79-bit fingerprints using the E-State fragments
Extended	Extends the CDK fingerprint with additional bits describing ring features
Graph Only	Specialized version of the CDK fingerprinter that does not take bond orders into account
Hybridization	Takes into account SP2 hybridization states
Klekota-Roth	Encodes the presence of 4860 substructures
MACCS	Generates 166-bit MACCS keys
Pubchem	Fingerprints of the structural key type, of length 881
Substructure	Currently supports 307 substructures

3.3. The Label Propagation Method with Linear Neighborhood Information

In this section, we introduce the label propagation method with linear neighborhood information (LPLNI), which has two steps: calculation of the linear neighborhood similarity and label propagation-based prediction. We first introduce several notations. Given $n$ drugs and $m$ targets, their interactions are organized as an $n \times m$ interaction matrix $Y = [y_1, y_2, \ldots, y_m]$, where $y_j$ is the interaction profile of the $j$-th target. $Y_{ij} = 1$ if the $i$-th drug interacts with the $j$-th target; otherwise, $Y_{ij} = 0$. Each drug can be represented by a $d$-dimensional feature vector $x_i$ (for example, the interaction profile), $i = 1, 2, \ldots, n$.

3.3.1. Linear Neighborhood Similarity

Roweis et al. [50] revealed that a data point and its neighbors lie close to a locally linear patch of the manifold, and Wang et al. [51] discovered that each point can be optimally reconstructed from its neighbors. Based on these studies [50,51], we calculated the drug-drug similarity by considering how to reconstruct a data point from its neighbors, as in our previous work [52]. Here, we represent drugs as feature vectors $x_1, x_2, \ldots, x_n$ and take them as data points in the feature space. We reconstruct each data point as a linear combination of its neighbors and formulate the optimization problem as follows:
$$\min_{w_i} \varepsilon_i = \Big\| x_i - \sum_{j:\, x_j \in N(x_i)} w_{ij} x_j \Big\|^2 = \sum_{j,k:\, x_j, x_k \in N(x_i)} w_{ij} G^i_{jk} w_{ik} = w_i^T G^i w_i, \quad \text{s.t. } \sum_{j:\, x_j \in N(x_i)} w_{ij} = 1,\ w_{ij} \ge 0 \qquad (1)$$
where $\|\cdot\|$ is the Euclidean norm, and $N(x_i)$ represents the set of $K$ nearest neighbors (by Euclidean distance) of $x_i$. $G^i_{jk} = (x_i - x_j)^T (x_i - x_k)$ is the $(j,k)$ entry of the Gram matrix $G^i$. $w_{ij}$ represents the weight of $x_j$ for reconstructing $x_i$ and can be considered as the similarity between $x_i$ and $x_j$. Clearly, $w_{ij} = 0$ if $x_j \notin N(x_i)$. We notice that the matrix $G^i$ is likely to be singular if the neighbors are close to each other. In this case, it is hard to obtain a unique solution of the optimization problem. To avoid the singular matrix and enhance the generalization capability, we introduce regularization for the reconstructive weights and present the optimization problem:
$$\min_{w_i} \varepsilon_i = w_i^T G^i w_i + \mu \|w_i\|^2 = w_i^T (G^i + \mu I) w_i, \quad \text{s.t. } \sum_{j:\, x_j \in N(x_i)} w_{ij} = 1,\ w_{ij} \ge 0 \qquad (2)$$
where $\mu$ is the regularization parameter and the column vector $w_i$ collects the weights $w_{ij}$ of the $K$ neighbors of $x_i$. The parameter $\mu$ controls the relative value between the reconstruction error $w_i^T G^i w_i$ and the regularization term $\|w_i\|^2$. Since the spectral norm is compatible and the Gram matrix is symmetric and positive semidefinite, we have
$$w_i^T G^i w_i \le \|G^i\|_2 \|w_i\|^2 = \rho(G^i) \|w_i\|^2$$
where $\rho(G^i)$ is the spectral radius of $G^i$. Here, we can estimate the value range of […] and […]. Therefore, we can roughly set […] in practical use, where […] is a small number satisfying […]. We set […] to […] for simplicity. We can use standard quadratic programming to solve Equation (2), and its solution is named the “linear neighborhood similarity” (LNS). We calculate the weights for all $n$ data points, concatenate them row by row, and form the similarity matrix $W$. The entire procedure of calculating LNS is summarized in Figure 3.

Figure 3

Procedure of calculating linear neighborhood similarity.
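The procedure can be sketched in plain Python under several assumptions that are not taken from the paper: the constrained problem (weights non-negative and summing to one over the nearest neighbors) is solved by projected gradient descent rather than a quadratic programming solver, the regularization parameter `mu` is simply a small fixed constant, and all function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def lns_weights(X, K, mu=1e-3, steps=500):
    """Row i of the returned matrix holds the reconstruction weights of point i."""
    n = len(X)
    K = min(K, n - 1)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        def sq_dist(j):
            return sum((a - b) ** 2 for a, b in zip(X[i], X[j]))

        # K nearest neighbors of x_i by Euclidean distance (excluding x_i itself).
        nbrs = sorted((j for j in range(n) if j != i), key=sq_dist)[:K]
        diffs = [[a - b for a, b in zip(X[i], X[j])] for j in nbrs]
        # Regularized local Gram matrix: A[a][b] = (x_i - x_a)^T (x_i - x_b) + mu * [a == b].
        A = [[sum(p * q for p, q in zip(da, db)) + (mu if a == b else 0.0)
              for b, db in enumerate(diffs)] for a, da in enumerate(diffs)]
        # The trace bounds the largest eigenvalue of A, giving a safe step size.
        step = 1.0 / (2.0 * sum(A[a][a] for a in range(K)))
        w = [1.0 / K] * K
        for _ in range(steps):
            grad = [2.0 * sum(A[a][b] * w[b] for b in range(K)) for a in range(K)]
            w = project_to_simplex([w[a] - step * grad[a] for a in range(K)])
        for a, j in enumerate(nbrs):
            W[i][j] = w[a]
    return W


# Three hypothetical one-dimensional points on a line.
X = [[0.0], [1.0], [2.0]]
W = lns_weights(X, K=2)
for row in W:
    print([round(v, 4) for v in row])
```

In this toy example the middle point is reconstructed equally from its two neighbors, while each end point puts all of its weight on the middle point. The resulting matrix is therefore not symmetric, which is why the similarity graph used for label propagation is directed.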

3.3.2. Label Propagation

Based on the drug-drug similarity, we formulate a directed graph, which uses drugs as nodes and similarities as edge weights. It is worth mentioning that usually $w_{ij} \ne w_{ji}$. In the graph, the known interactions of drugs with given targets are taken as the initial label information of nodes, and the label information is then updated. In each update, a node absorbs label information from its neighbors with probability $\alpha$ and retains its initial label information with probability $1 - \alpha$. The update process for the $j$-th label of nodes at the $t$-th iteration is written as
$$f_j^{t+1} = \alpha W f_j^t + (1 - \alpha) y_j$$
where $y_j$ is the $j$-th column vector of the interaction matrix $Y$ (i.e., the $j$-th initial labels for all nodes). Further, we can formulate the update for all target labels in matrix form:
$$F^{t+1} = \alpha W F^t + (1 - \alpha) Y \qquad (6)$$
where $F^t$ represents the label matrix in the $t$-th iteration, and $F^0 = Y$. We analyze the convergence of the iterative process, Equation (6), in Theorem 1.
Theorem 1. The iterative process, Equation (6), will converge to the solution
$$F^* = (1 - \alpha)(I - \alpha W)^{-1} Y$$
where $I$ is the identity matrix.
Proof. Note that $F^0 = Y$; the iterative process, Equation (6), can be rewritten as follows:
$$F^t = (\alpha W)^t Y + (1 - \alpha) \sum_{i=0}^{t-1} (\alpha W)^i Y$$
Since the spectral radius of $W$ satisfies $\rho(W) \le 1$ and $0 < \alpha < 1$, then
$$\lim_{t \to \infty} (\alpha W)^t = 0, \qquad \lim_{t \to \infty} \sum_{i=0}^{t-1} (\alpha W)^i = (I - \alpha W)^{-1}$$
Therefore, $F^* = (1 - \alpha)(I - \alpha W)^{-1} Y$ is the final label matrix, presenting the predicted scores for drug-target pairs.
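The propagation step can be sketched as a fixed-point iteration. The sketch assumes the common form of label propagation in which each node's label row becomes a mix of its neighbors' rows (weighted by a similarity matrix, here called `W`) and its initial row in the interaction matrix (here `Y`), with mixing weight `alpha` for the absorbing probability. These names and the fixed iteration count are assumptions for illustration.

```python
def propagate_labels(W, Y, alpha, iterations=200):
    """Iterate F <- alpha * W * F + (1 - alpha) * Y, starting from F = Y."""
    n, m = len(Y), len(Y[0])
    F = [row[:] for row in Y]
    for _ in range(iterations):
        F = [[alpha * sum(W[i][k] * F[k][j] for k in range(n))
              + (1.0 - alpha) * Y[i][j]
              for j in range(m)]
             for i in range(n)]
    return F


# Hypothetical two-drug, one-target example: each drug's only neighbor is the other.
W = [[0.0, 1.0], [1.0, 0.0]]
Y = [[1.0], [0.0]]
F = propagate_labels(W, Y, alpha=0.5)
print(F)
```

In this example the second drug has no known interaction with the target, yet it receives a score of about 1/3 because its neighbor interacts with it; the first drug keeps the higher score of about 2/3. The limit of the iteration satisfies F = alpha * W * F + (1 - alpha) * Y, so with a matrix library the same result could be obtained from one linear solve instead of iterating.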

3.4. LPLNI with Integrated Information

In this paper, we consider the interaction profile feature of drugs and targets as well as different fingerprint features of drugs. Therefore, we can calculate different similarities based on different features and then build different prediction models. Generally, combining diverse models can enhance predictive performance [53,54,55,56]. Here, we consider a nonlinear strategy to integrate different prediction models. Given $M$ models, they produce $M$ predicted scores for a drug-target pair, denoted as $s_1, s_2, \ldots, s_M$, and the integrated score is given by the following binomial logistic regression model in the conditional probability form:
$$P(y = 1 \mid s) = \frac{\exp(\theta^T s)}{1 + \exp(\theta^T s)}, \qquad P(y = 0 \mid s) = \frac{1}{1 + \exp(\theta^T s)} \qquad (8)$$
where $s = (1, s_1, \ldots, s_M)^T$, $\theta = (\theta_0, \theta_1, \ldots, \theta_M)^T$, and $y \in \{0, 1\}$. The parameters $\theta$ are estimated by maximum likelihood estimation based on known interactions and their predicted scores. In the prediction stage, the predicted scores from the $M$ models are aggregated by Equation (8) to produce the final predictions. We abbreviate the LPLNI model with integrated information as “LPLNI-II”.
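The integration step can be illustrated with a small logistic regression fitted by gradient ascent on the log-likelihood. This is a sketch under stated assumptions rather than the fitting procedure used in the paper: the training scores and labels are made up, the learning rate and step count are arbitrary, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(S, y, lr=0.3, steps=2000):
    """Maximum likelihood fit of an intercept plus one weight per model score."""
    theta = [0.0] * (len(S[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for s, label in zip(S, y):
            features = [1.0] + list(s)
            p = sigmoid(sum(t * f for t, f in zip(theta, features)))
            for k, f in enumerate(features):
                grad[k] += (label - p) * f  # gradient of the log-likelihood
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta


def integrated_score(theta, s):
    """Probability of interaction given the scores of the individual models."""
    return sigmoid(sum(t * f for t, f in zip(theta, [1.0] + list(s))))


# Hypothetical scores from two individual models for four drug-target pairs,
# with 1 marking a known interaction.
S = [[0.9, 0.8], [0.8, 0.6], [0.2, 0.3], [0.1, 0.2]]
y = [1, 1, 0, 0]
theta = fit_logistic(S, y)
print(integrated_score(theta, [0.9, 0.8]), integrated_score(theta, [0.1, 0.2]))
```

Because the logistic function is nonlinear in the individual scores, the combined score is not a simple weighted average of the model outputs, which is the sense in which the strategy is called nonlinear.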

4. Conclusions

In this paper, we propose a drug-target interaction prediction method with linear neighborhood information, and the method can utilize known interactions to make high-accuracy predictions. Further, we incorporated structural information into the prediction models to improve performances. Computational experiments show that our method outperforms other state-of-the-art methods on the benchmark datasets. The potential of the method is also validated in the case study. In conclusion, the proposed method is a promising tool for drug-target interaction prediction.

43 in total

Review 1. G protein-coupled receptor drug discovery: implications from the crystal structure of rhodopsin.

Authors: J Ballesteros; K Palczewski
Journal: Curr Opin Drug Discov Devel Date: 2001-09

2. Predicting drug-target interactions from chemical and genomic kernels using Bayesian matrix factorization.

Authors: Mehmet Gönen
Journal: Bioinformatics Date: 2012-06-23 Impact factor: 6.937

3. Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier.

Authors: Leyi Wei; Pengwei Xing; Jiancang Zeng; JinXiu Chen; Ran Su; Fei Guo
Journal: Artif Intell Med Date: 2017-03-04 Impact factor: 5.326

4. DrugBank: a comprehensive resource for in silico drug discovery and exploration.

Authors: David S Wishart; Craig Knox; An Chi Guo; Savita Shrivastava; Murtaza Hassanali; Paul Stothard; Zhan Chang; Jennifer Woolsey
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

5. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

6. Predicting potential drug-drug interactions by integrating chemical, biological, phenotypic and network data.

Authors: Wen Zhang; Yanlin Chen; Feng Liu; Fei Luo; Gang Tian; Xiaohong Li
Journal: BMC Bioinformatics Date: 2017-01-05 Impact factor: 3.169

7. PubChem: a public information system for analyzing bioactivities of small molecules.

Authors: Yanli Wang; Jewen Xiao; Tugba O Suzek; Jian Zhang; Jiyao Wang; Stephen H Bryant
Journal: Nucleic Acids Res Date: 2009-06-04 Impact factor: 16.971

8. Identification of chemogenomic features from drug-target interaction networks using interpretable classifiers.

Authors: Yasuo Tabei; Edouard Pauwels; Véronique Stoven; Kazuhiro Takemoto; Yoshihiro Yamanishi
Journal: Bioinformatics Date: 2012-09-15 Impact factor: 6.937

9. Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features.

Authors: Longqiang Luo; Dingfang Li; Wen Zhang; Shikui Tu; Xiaopeng Zhu; Gang Tian
Journal: PLoS One Date: 2016-04-13 Impact factor: 3.240

10. A genetic algorithm-based weighted ensemble method for predicting transposon-derived piRNAs.

Authors: Dingfang Li; Longqiang Luo; Wen Zhang; Feng Liu; Fei Luo
Journal: BMC Bioinformatics Date: 2016-08-31 Impact factor: 3.169
