Literature DB >> 35625469

RoFDT: Identification of Drug-Target Interactions from Protein Sequence and Drug Molecular Structure Using Rotation Forest.

Ying Wang¹, Lei Wang^1,2, Leon Wong², Bowei Zhao³, Xiaorui Su³, Yang Li⁴, Zhuhong You^2,5.

Abstract

As the basis for screening drug candidates, the identification of drug-target interactions (DTIs) plays a crucial role in the innovative drugs research. However, due to the inherent constraints of small-scale and time-consuming wet experiments, DTI recognition is usually difficult to carry out. In the present study, we developed a computational approach called RoFDT to predict DTIs by combining feature-weighted Rotation Forest (FwRF) with a protein sequence. In particular, we first encode protein sequences as numerical matrices by Position-Specific Score Matrix (PSSM), then extract their features utilize Pseudo Position-Specific Score Matrix (PsePSSM) and combine them with drug structure information-molecular fingerprints and finally feed them into the FwRF classifier and validate the performance of RoFDT on Enzyme, GPCR, Ion Channel and Nuclear Receptor datasets. In the above dataset, RoFDT achieved 91.68%, 84.72%, 88.11% and 78.33% accuracy, respectively. RoFDT shows excellent performance in comparison with support vector machine models and previous superior approaches. Furthermore, 7 of the top 10 DTIs with RoFDT estimate scores were proven by the relevant database. These results demonstrate that RoFDT can be employed to a powerful predictive approach for DTIs to provide theoretical support for innovative drug discovery.

Entities: Chemical

Keywords: drug; rotation forest; support vector machine; target protein

Year: 2022 PMID： 35625469 PMCID： PMC9138819 DOI： 10.3390/biology11050741

Source DB: PubMed Journal: Biology (Basel) ISSN： 2079-7737

1. Introduction

A critical step in innovative drug development is determining the interactions among drugs and targets, which is the forerunner of drug design [1,2]. Drugs play an important role in the human body by interacting with their targets, of which proteins are an essential target. By inhibiting or enhancing the function of the target protein, the drug achieves the goal of treating the disease. Although the advent of high-throughput sequencing methods has provided technical support for determining DTIs and extensive efforts has been made by drug developers, few new drugs are still approved by the Food and Drug Administration (FDA) for marketing each year [3,4,5,6]. The main reason is that the identification of DTIs by wet experimental approaches alone consumes a lot of time and money, and the scale of identification is small. With the development of computational biology, this situation can be greatly alleviated. Computer-aided prediction of DTIs can be executed rapidly at scale, providing reliable candidate drug targets for biological experiments and theoretical support for new drug development [2,4,7,8,9]. To date, computer-aided prediction-based models for DTIs have been devised by numerous researchers, and they can be roughly classified into two groups: the approach based on network and the approach based on machine learning [10,11,12]. The approach based on network approach typically characterizes the association among targets and drugs as a heterogeneous network, and predict DTI by evaluating network topology node similarity. For instance, the SDTBNI model designed by Wu et al. [13] predicts DTIs by DTI networks, drug and entity substructure linkages in unknown network space. Chu et al. [14] proposed a new DTI prediction method called the DTI-CDF model. This method can not only extract similarity features between drugs, but also extract similarity features between target proteins from heterogeneity graph, which greatly improves the prediction performance. Zhang et al. [15] designed the prediction method of DTIs according to LPLNI, which makes use of the data of neighborhood re-construction. Chu et al. [16] facilitate multi-label classification by introducing the community detection method DTI-MLCD for DTI prediction. The method performs significantly better than other machine learning methods and other existing methods in the updated gold standard dataset. Zong et al. [17] used DeepWalk combined with target-target and drug-drug similarities to accurately predict DTIs with the support of Linked Tripartite Network (LTN) and biomedical related data. The approach based on machine learning mainly uses computer to extract data features and combine with classifier to implement DTIs prediction [18,19]. For example, Peng et al. [20] used a semi-supervised inference way to predict DTIs by combining a PCA-based convex optimization algorithm with information about drug targets. On the basis of the hypothesis that drugs with chemical similarity have similar bioactivity, the DTIs prediction method of target protein information combined with drug structure information has achieved excellent results. Therefore, this paper designs the machine learning approach for predicting DTIs according to this hypothesis. Specifically, we first extracted the protein sequence features information using the Pseudo Position-Specific Score Matrix (PsePSSM) method, then fused them with drug molecular fingerprint descriptors and finally accurately predicted DTIs with interactions by the feature-weighted rotation forest classifier (FwRF) classifier. We tested the performance of RoFDT in datasets including Enzyme, GPCR, Ion Channel and Nuclear Receptor, and compared them with other feature approaches, classifier approaches and previous methods. The superior results demonstrate that the proposed model has excellent ability to identify DTIs. The frame diagram of rofdt is shown in Figure 1.

Figure 1

Flow framework diagram of RoFDT model.

2. Materials and Methods

2.1. Gold Standard Datasets

In the present study, we validated the performance of RoFDT on four gold standard datasets, including Enzyme, GPCR, Ion Channel and Nuclear Receptor. These data were collected from SuperTarget & Matador [21], KEGG BRITE [22], BRENDA [23] and DrugBank [24] databases by Yamanishi et al. [25]. In these four datasets, the number of DTIs pairs (drug, target) they contain is (445, 664), (210, 204), (233, 95) and (54, 26), respectively, and the number of DTIs with interaction (positive sample) is 2926, 1476, 635 and 90, respectively [26]. We describe the network of DTIs by a bipartite diagram, where targets or drugs are presented by nodes and their associations are represented by edges. To construct the balanced dataset, we use the random strategy to select the same number of negative samples as positive samples.

2.2. Drug Molecular Descriptor

Drug molecular fingerprinting is widely used to characterize drug compounds because it can directly represent the association between molecular properties and structure and does not need their three-dimensional structural information. Drug molecule fingerprinting manages molecular substructures with dictionary strategy. For a particular molecule, the corresponding position of its dictionary is set to 1 when it has a certain substructure and 0 otherwise. Thus, the fingerprint descriptor of a given drug molecule can be constructed. We used molecular fingerprints from PubChem in this study, and the fingerprints property is “PUBCHEM_CACTVS_SUBGRAPHKEYS”. The compound molecule is decomposed into 881 substructures, so its fingerprint feature descriptor is also 881-dimensional.

2.3. Target Protein Descriptor

In this study, PSSM [27] was used to generate descriptors of protein sequences. PSSM S(i,j) can be characterized as , which is an L × 20 matrix, in which the length of sequence is L and the types of amino acids are 20. Therefore, the formula of S(i,j) is described as shown below: where indicates the probability that the residue of the protein is mutated into the amino acid during evolution. We obtained PSSM through Position-Specific Iterated BLAST (PSI-BLAST) according to SwansProt dataset [28,29]. PSI-BLAST will calculate the vector indicating the mutational conservatism of 20 different amino acids. To obtain broad and high homologous protein, the parameter e-value and iterations are set to 0.001 and 3, respectively.

2.4. Protein Feature Extraction

For better compatibility with the PSSM matrix, we extracted the potential features of proteins using the PsePSSM designed by Chou et al. [30], which can be denoted as below: where is the score calculated by PSI-BLAST, which score can be positive or negative. The probability of the appropriate mutation in the protein sequence higher than unexpectedly expected is indicated by a positive number, otherwise a negative number. However, since protein sequences of different lengths yield different rows of substrates, we thus need to convert them to a uniform pattern using the following equation: and: where represents the average score when protein residue P evolves into a j-type amino acid. To prevent protein P from losing its sequence information during evolution, we improved the equation by constructing pseudo-amino acids, which are described as follows: where is the correlation factor of j-type amino acids.

2.5. Classification Prediction

In our study, we classify and predict DTIs feature descriptors by FwRF. This classifier has the advantage of increasing the effective feature weights and removing the noise information, which can effectively improve the prediction accuracy. FwRF uses the statistical method to obtain the weights of different features, and its formula is as follows: where is the number of categories with the value , and its statistics are as follows: is the expectation of and , and it can be denoted as below: where is the total sample size. In feature , the sample size whose value is is recorded as , and in class , the sample size whose value is is recorded as . FwRF first calculates the weights of the features using the statistical method, then sorts them in descending order and removes the low-weight features depending on the parameters, and finally uses the newly obtained feature set for classification prediction. Rotation forest (RF) [31,32] is a widespread classifier. Given a dataset containing training samples, where is the data and is the label, the data consist of features, thus forming a matrix of . The decision tree in RF is presented as , and there are in total. The execution steps of RF are as follows. a. The feature set is grouped into -independent parts of the number by the appropriate parameter. b. The new matrix of the training set is formed using the corresponding feature columns of , and 3/4 of the features are selected from it forms matrix with bootstrap. c. The coefficient matrix is obtained through the feature transformation , and the coefficient matrix is rotated to generate the rotation matrix , which is described as follows: In the classification prediction stage, the classifier calculates the confidence level of the test sample using the following formula and discriminates it as the class with the highest confidence value:

3. Results

3.1. Evaluation Indicator

To better evaluate the RoFDT performance, we used the general evaluation standard of machine learning in this study, which can be denoted as below: where TP, TN, FP and FN, respectively, represent True Positive, True Negative, False Positive and False Negative. Moreover, the receiver operating characteristic (ROC) curve [33,34,35] and area under the ROC curve (AUC) were also calculated to reflect the performance of RoFDT.

3.2. Parameter Evaluation

To maximize the RoFDT performance, the grid search approach is employed to verify the FwRF and PsePSSM parameters. When data features are extracted using the PsePSSM algorithm, the information content can be adjusted by changing the parameters in Equation (5) to obtain different feature values. We investigate the effect of different parameters of PsePSSM on the subsequent classification effect in this experiment in order to select the best combination of parameters. The effect of different parameters of the classifier on its prediction accuracy in the enzyme dataset is shown in Figure 2. It can be seen from the figure that the RF classifier achieves the highest accuracy when the feature subset , the feature selection ratio and the number of decision trees are set to 16, 0.8 and 21, respectively. Therefore, we apply them as the optimal parameters in the model.

Figure 2

The influence of different parameters of the FwRF classifier on classification accuracy.

3.3. Prediction Performance Evaluation

After optimizing the parameters of RoFDT, we evaluate its performance in the Enzyme, GPCR, Ion Channel and Nuclear Receptor datasets using the five-fold cross-validation (5FCV) method. Table 1, Table 2, Table 3 and Table 4 summarizes the outcomes obtained by RoFDT on the gold standard datasets. In these datasets, RoFDT achieved 91.68%, 88.11%, 84.72% and 78.33% prediction accuracy, and its standard variance was 0.84%, 1.01%, 1.94% and 5.34%, respectively. In terms of sensitivity, RoFDT achieved scores of 90.84%, 90.30%, 84.73% and 81.97% in the four datasets with standard variances of 1.68%, 1.61%, 3.45% and 7.85%, respectively. RoFDT achieved 83.39%, 79.02%, 74.06%, 65.56% and 91.72%, 88.27%, 85.57% and 75.31% in the MCC and AUC evaluation metrics, which combine to show predictive performance, respectively. These excellent results show that RoFDT has good ability to predict DTIs with strong robustness. Figure 3, Figure 4, Figure 5 and Figure 6 show the ROC curves acquired by RoFDT in the gold standard dataset, respectively.

Table 1

5FCV prediction results obtained by RoFDT on the Enzyme dataset.

Test Set	Accu.(%)	Sen.(%)	Prec.(%)	MCC(%)	AUC(%)
1	90.51	89.20	91.27	81.03	90.04
2	92.82	93.22	92.59	85.64	92.96
3	91.62	90.19	92.74	83.28	92.09
4	91.97	89.68	94.40	84.05	91.79
5	91.47	91.90	90.96	82.94	91.73
Average	91.68 ± 0.84	90.84 ± 1.68	92.39 ± 1.37	83.39 ± 1.68	91.72 ± 1.06

Table 2

5FCV prediction results obtained by RoFDT on the Ion Channel dataset.

Test Set	Accu.(%)	Sen.(%)	Prec.(%)	MCC(%)	AUC(%)
1	86.61	90.38	83.76	76.76	86.16
2	87.63	91.92	84.78	78.22	87.83
3	88.98	91.61	87.22	80.36	89.68
4	88.31	89.67	87.62	79.33	89.07
5	89.02	87.93	89.47	80.43	88.59
Average	88.11 ± 1.01	90.30 ± 1.61	86.57 ± 2.29	79.02 ± 1.55	88.27 ± 1.36

Table 3

5FCV prediction results obtained by RoFDT on the GPCR dataset.

Test Set	Accu.(%)	Sen.(%)	Prec.(%)	MCC(%)	AUC(%)
1	82.28	86.21	77.52	70.73	82.86
2	87.01	88.62	85.16	77.38	88.82
3	86.22	86.52	88.41	76.00	86.72
4	84.63	80.33	86.73	73.83	84.37
5	83.46	81.95	85.83	72.37	85.11
Average	84.72 ± 1.94	84.73 ± 3.45	84.73 ± 4.21	74.06 ± 2.68	85.57 ± 2.28

Table 4

5FCV prediction results obtained by RoFDT on the Nuclear Receptor dataset.

Test Set	Accu.(%)	Sen.(%)	Prec.(%)	MCC(%)	AUC(%)
1	69.44	83.33	65.22	55.90	72.22
2	77.78	85.00	77.27	64.34	69.69
3	80.56	92.31	66.67	67.47	74.25
4	83.33	77.78	87.50	72.05	75.31
5	80.56	71.43	93.75	68.03	85.08
Average	78.33 ± 5.34	81.97 ± 7.85	78.08 ± 12.56	65.56 ± 6.05	75.31 ± 5.87

Figure 3

ROC curves of the 5FCV experiment acquired by RoFDT on the Enzyme dataset.

Figure 4

ROC curves of the 5FCV experiment acquired by RoFDT on the Ion Channel dataset.

Figure 5

ROC curves of the 5FCV experiment acquired by RoFDT on the GPCR dataset.

Figure 6

ROC curves of the 5FCV experiment acquired by RoFDT on the Nuclear Receptor dataset.

3.4. Comparison of Different Feature Models

To estimate the influence of the PsePSSM algorithm on the RoFDT model, we compare it with the Local Phase Quantization (LPQ) algorithm model on four gold standard datasets in this part of the experiment. The LPQ algorithm originally described in the article for texture description by Ojansivu and Heikkila [36] and is according to the blur invariance property of the Fourier phase spectrum [37,38,39]. Table 5 lists the 5FCV outcomes produced by LPQ combined with FwRF on gold standard datasets. As observed in Table 5, RoFDT has gained the optimal outcomes in all evaluation indicators. Detailed 5FCV outcomes on four gold standard datasets are aggregated in Tables S1–S4 of Supplementary Materials. For a fair comparison, FwRF was set with the same hyperparameters in the experiment. From the experimental outcomes, it can be seen that PsePSSM combined with FwRF can effectively promote the model performance.

Table 5

5FCV outcomes of the LPQ combined with FwRF model on the four gold standard datasets.

Dataset	Model	Accu.(%)	Sen.(%)	Prec.(%)	MCC(%)	AUC(%)
Enzyme	LPQ	89.63 ± 0.39	89.69 ± 1.82	89.64 ± 2.16	79.32 ± 0.79	89.40 ± 0.98
Enzyme	RoFDT	91.68 ± 0.84	90.84 ± 1.68	92.39 ± 1.37	83.39 ± 1.68	91.72 ± 1.06
Ion Channel	LPQ	83.97 ± 2.32	86.93 ± 3.03	81.89 ± 3.66	68.13 ± 4.54	84.66 ± 2.01
Ion Channel	RoFDT	88.11 ± 1.01	90.30 ± 1.61	86.57 ± 2.29	79.02 ± 1.55	88.27 ± 1.36
GPCR	LPQ	82.52 ± 2.17	83.87 ± 3.58	81.79 ± 3.78	65.19 ± 4.15	83.19 ± 1.79
GPCR	RoFDT	84.72 ± 1.94	84.73 ± 3.45	84.73 ± 4.21	74.06 ± 2.68	85.57 ± 2.28
Nuclear Receptor	LPQ	66.67 ± 7.35	67.64 ± 16.23	67.97 ± 9.98	35.46 ± 10.89	69.56 ± 6.85
Nuclear Receptor	RoFDT	78.33 ± 5.34	81.97 ± 7.85	78.08 ± 12.56	65.56 ± 6.05	75.31 ± 5.87

3.5. Classifier Model Comparison

To investigate further the influence of various classifiers on the RoFDT performance, we compare it with the SVM classifier model. The parameters of the SVM were refined, and its hyperparameters and were optimized to 0.6 and 0.5, respectively. The optimization outcomes of SVM parameters are shown in detail in Table S9 of the Supplementary Materials. As can be seen in Table 6, RoFDT achieved higher scores in all four gold standard datasets compared to the SVM model. Specifically, RoFDT achieved optimal results in the four gold standard datasets for accuracy, MCC, sensitivity and AUC, but was only slightly less precision than the SVM model in the Ion Channel and Enzyme datasets. Detailed 5FCV experimental outcomes on gold standard datasets are shown in Tables S5–S8 of Supplementary Materials. The experimental results of comparing different classifier models show that the FwRF classifier used by RoFDT can be better compatible with it, which helps to increase the model prediction accuracy.

Table 6

5FCV outcomes of different classifier models on the four gold standard datasets.

Dataset	Model	Accu.(%)	Sen.(%)	Prec.(%)	MCC(%)	AUC(%)
Enzyme	SVM	84.20 ± 0.60	69.90 ± 1.70	98.00 ± 0.50	71.50 ± 1.00	84.30 ± 1.20
Enzyme	RoFDT	91.68 ± 0.84	90.84 ± 1.68	92.39 ± 1.37	83.39 ± 1.68	91.72 ± 1.06
Ion Channel	SVM	81.90 ± 1.20	69.70 ± 3.70	92.40 ± 2.20	66.00 ± 1.90	81.70 ± 1.20
Ion Channel	RoFDT	88.11 ± 1.01	90.30 ± 1.61	86.57 ± 2.29	79.02 ± 1.55	88.27 ± 1.36
GPCR	SVM	70.00 ± 2.10	50.40 ± 7.80	82.30 ± 3.30	42.80 ± 4.90	70.10 ± 2.70
GPCR	RoFDT	84.72 ± 1.94	84.73 ± 3.45	84.73 ± 4.21	74.06 ± 2.68	85.57 ± 2.28
Nuclear Receptor	SVM	63.30 ± 3.60	57.60 ± 7.90	67.50 ± 14.60	29.60 ± 7.40	61.80 ± 5.80
Nuclear Receptor	RoFDT	78.33 ± 5.34	81.97 ± 7.85	78.08 ± 12.56	65.56 ± 6.05	75.31 ± 5.87

3.6. Comparison with Previous Models

Using the powerful computing power of computer to predict DTIs on a large scale has become increasingly important in the field of new drug research and development. Numerous researchers have constructed different computational models to solve this problem. To further evaluating RoFDT’s capabilities, we compare it with these excellent models. Among these excellent models, we chose the model that is also implemented in the four datasets and evaluated using 5FCV. The AUCs generated by these models are listed in Table 7. As seen in table, RoFDT performed well overall, achieving the best results on Enzyme and the second highest outcomes on Ion Channel and GPCR. However, constrained by the sample size of Nuclear Receptor, RoFDT is not sufficiently trained and performs generally in it.

Table 7

Comparison with previous excellent models on the four gold standard dataset.

Dataset	NetCBP [40]	KBMF2K [41]	RFDT [26]	SIMCOMP [42]	RoFDT
Enzyme	0.8251	0.832	0.915	0.863	0.9172
Ion Channel	0.8034	0.799	0.890	0.776	0.8827
GPCR	0.8235	0.857	0.845	0.867	0.8557
Nuclear Receptor	0.8394	0.824	0.723	0.856	0.7531

3.7. Case Studies

To verify the power of RoFDT to predict unknown DTIs, all known DTI pairs are used to train RoFDT and predict in its unknown space. We validate the top 10 DTIs with the highest prediction scores in SuperTarget [21] and the drug target pairs validated in the SuperTarget database do not contain the data used for training. SuperTarget is a drug target database with a collection of 332,828 DTIs. The outcomes of the case studies are listed in Table 8, where 7 of the top 10 with the best prediction scores were validated by this database. The case study reveals that RoFDT has the capability to competitively predict unknown DTIs. It is interesting to note that although the remaining three pairs of DTIs are not substantiated by the current database, there is a possibility that their relationship will be proved as the study progresses.

Table 8

Top 10 DTI pairs predicted by RoFDT on the SuperTarget database.

Drug Name	Drug ID	Target Protein Name	Target Protein ID	Validation Database
Dihydroxypropyltheophylline	D00691	PDE7A_HUMAN	has5150	SuperTarget
Isotretinoino	D00348	RXRA_HUM	hsa6256	SuperTarget
Xanthotoxine	D00139	CP1A1_hasAN	hsa1543	SuperTarget
Loxapinsuccinate	D02340	DRhasHUMAN	hsa1812	SuperTarget
Prochlorpermazine	D00493	has2A_HUMAN	hsa3356	unconfirmed
Bromochlorotrifluoroethane	D00542	CP2E1_HUMAN	hsa1571	SuperTarget
Mifepristone	D00585	ESR1_HUMAN	hsa2099	SuperTarget
Olanzapine	D00454	DRD2_HUMAN	hsa1813	unconfirmed
Transdermal Nicotine	D03365	ACHA4_HUMAN	hsa1137	SuperTarget
Epoprostenol	D00106	PE2R3_HUMAN	hsa5733	unconfirmed

4. Discussion

In the present study, we propose a reliable DTI prediction approach RoFDT by combining protein sequence and drug molecular structure. We first transformed the protein sequence information numerically by PSSM based on its sequence information, and extracted its hidden features using PsePSSM. The drug structure is then encoded as the digital descriptor based on molecular fingerprinting techniques. Finally, the performance of RoFDT was verified using FwRF on four benchmark datasets, and its prediction results were confirmed by the authoritative databases. All these exceptional outcomes show that RoFDT is a valid approach for predicting DTIs and can provide new insights for potential drug discovery. RoFDT exhibits competitive advantages over previous DTI prediction models. The reason for this is that RoFDT considers that protein sequences provide rich information support for DTI prediction, and its PSSM descriptors are well compatible with PsePSSM feature extraction method to extract its potential features to the maximum extent. In addition, the molecular fingerprint descriptors of drug structures can faithfully represent different drug substructure properties, and thus, have a high characterization capability. Under the above circumstances, RoFDT was able to predict DTI more accurately and provide a more reliable theoretical basis for drug development. However, RoFDT still has some limitations. For example, the utilization of protein sequence information by RoFDT relies mainly on PSSM, and its richer description needs to be further explored. Second, although the feature extraction method used by RoFDT has achieved better results, it still requires more manual experience to support, and the automation process needs to be better improved. Finally, RoFDT requires more data for training and is not very sensitive to newly discovered drug targets. In future research, we intend to explore more intelligent feature characterization methods to overcome the above-mentioned shortcomings and further enhance the RoFDT performance.

5. Conclusions

As a pioneering step in drug development, the reliable prediction and identification of DTIs plays an essential element in innovative drug research. In the present study, we combined protein sequence and drug molecular structure to design a computational model for DTIs prediction. The proposed model achieves excellent results in the gold standard datasets including Enzyme, GPCR, Ion Channel and Nuclear Receptor. The model also exhibits strong powerful in comparison with extraction algorithm models, classifier models, and previous methods. In addition, 7 of the top 10 DTIs predicted by the proposed model have been verified by relevant database. These outcomes suggest that the RoFDT model can be employed as a stable and dependable tool to provide valuable target candidates for innovative drug research.

39 in total

1. BRENDA, the enzyme database: updates and major new developments.

Authors: Ida Schomburg; Antje Chang; Christian Ebeling; Marion Gremse; Christian Heldt; Gregor Huhn; Dietmar Schomburg
Journal: Nucleic Acids Res Date: 2004-01-01 Impact factor: 16.971

2. Prediction of protein-protein interactions based on protein-protein correlation using least squares regression.

Authors: De-Shuang Huang; Lei Zhang; Kyungsook Han; Suping Deng; Kai Yang; Hongbo Zhang
Journal: Curr Protein Pept Sci Date: 2014 Impact factor: 3.272

3. DTI-CDF: a cascade deep forest model towards the prediction of drug-target interactions based on hybrid features.

Authors: Yanyi Chu; Aman Chandra Kaushik; Xiangeng Wang; Wei Wang; Yufang Zhang; Xiaoqi Shan; Dennis Russell Salahub; Yi Xiong; Dong-Qing Wei
Journal: Brief Bioinform Date: 2019-12-23 Impact factor: 11.622

Review 4. Drugs and their molecular targets: an updated overview.

Authors: Yves Landry; Jean-Pierre Gies
Journal: Fundam Clin Pharmacol Date: 2008-02 Impact factor: 2.748

5. Deep mining heterogeneous networks of biomedical linked data to predict novel drug-target associations.

Authors: Nansu Zong; Hyeoneui Kim; Victoria Ngo; Olivier Harismendy
Journal: Bioinformatics Date: 2017-08-01 Impact factor: 6.937

6. IMS-CDA: Prediction of CircRNA-Disease Associations From the Integration of Multisource Similarity Information With Deep Stacked Autoencoder Model.

Authors: Lei Wang; Zhu-Hong You; Jian-Qiang Li; Yu-An Huang
Journal: IEEE Trans Cybern Date: 2021-11-09 Impact factor: 11.448

7. From genomics to chemical genomics: new developments in KEGG.

Authors: Minoru Kanehisa; Susumu Goto; Masahiro Hattori; Kiyoko F Aoki-Kinoshita; Masumi Itoh; Shuichi Kawashima; Toshiaki Katayama; Michihiro Araki; Mika Hirakawa
Journal: Nucleic Acids Res Date: 2006-01-01 Impact factor: 16.971

8. Using Two-dimensional Principal Component Analysis and Rotation Forest for Prediction of Protein-Protein Interactions.

Authors: Lei Wang; Zhu-Hong You; Xin Yan; Shi-Xiong Xia; Feng Liu; Li-Ping Li; Wei Zhang; Yong Zhou
Journal: Sci Rep Date: 2018-08-27 Impact factor: 4.379

9. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces.

Authors: Yoshihiro Yamanishi; Michihiro Araki; Alex Gutteridge; Wataru Honda; Minoru Kanehisa
Journal: Bioinformatics Date: 2008-07-01 Impact factor: 6.937

10. A semi-supervised method for drug-target interaction prediction with consistency in networks.

Authors: Hailin Chen; Zuping Zhang
Journal: PLoS One Date: 2013-05-07 Impact factor: 3.240