Yang Li¹, Yu-An Huang², Zhu-Hong You³, Li-Ping Li¹, Zheng Wang¹
Abstract
The identification of drug-target interactions (DTIs) is a critical step in drug development. Experimental methods for discovering DTIs, which rely on clinical trials, are time-consuming, expensive, and challenging. Developing computational methods that complement these experiments by predicting novel DTIs is therefore of great significance for reducing costs and shortening the development period. In this paper, we present a novel computational model for predicting DTIs that uses protein sequence information and a rotation forest classifier. Specifically, each target protein sequence is first converted into a position-specific scoring matrix (PSSM) to retain evolutionary information, and local phase quantization (LPQ) descriptors are then used to extract features from the PSSM. In addition, substructure fingerprints are used to extract the features of each drug. We finally combine the drug and protein features to represent each drug-target pair and use a rotation forest classifier to calculate interaction scores for global DTI prediction. The experimental results indicate that the proposed model is effective, achieving average accuracies of 89.15%, 86.01%, 82.20%, and 71.67% on the enzyme, ion channel, G protein-coupled receptor (GPCR), and nuclear receptor datasets, respectively. We also compared the prediction performance of the rotation forest classifier with that of another popular classifier, the support vector machine (SVM), on the enzyme dataset. Several previously proposed methods were also compared on the same datasets, and the proposed method achieved the highest AUC on the enzyme, ion channel, and GPCR datasets. We anticipate that the proposed method can serve as an effective tool for large-scale DTI prediction, given protein sequences and drug fingerprints.
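The protein-side feature step described above (LPQ applied to a PSSM) can be sketched as follows. This is a minimal sketch of the basic LPQ operator as it is commonly described: a short-term Fourier transform over a small window at four low frequencies, sign quantization of the real and imaginary parts into an 8-bit code, and a 256-bin histogram. It omits LPQ's optional decorrelation step, and the 3×3 window, the `lpq_histogram` name, and the random stand-in PSSM are assumptions, not details taken from the paper.

```python
import numpy as np
from scipy.signal import convolve2d


def lpq_histogram(image: np.ndarray, win: int = 3) -> np.ndarray:
    """Basic LPQ descriptor (no decorrelation): normalized 256-bin histogram of 8-bit codes."""
    img = np.asarray(image, dtype=float)
    if min(img.shape) < win:
        raise ValueError("image must be at least win x win")
    r = (win - 1) // 2
    x = np.arange(-r, r + 1)
    a = 1.0 / win                       # lowest non-zero frequency
    w0 = np.ones(win, dtype=complex)    # frequency 0
    w1 = np.exp(-2j * np.pi * a * x)    # frequency +a
    w2 = np.conj(w1)                    # frequency -a

    def stft(kernel_axis0, kernel_axis1):
        # Separable filtering: along axis 0 (sequence positions), then axis 1 (amino-acid columns).
        tmp = convolve2d(img, kernel_axis0[:, None], mode="valid")
        return convolve2d(tmp, kernel_axis1[None, :], mode="valid")

    # The four frequency points [a,0], [0,a], [a,a], [a,-a] (horizontal, vertical).
    responses = [stft(w0, w1), stft(w1, w0), stft(w1, w1), stft(w1, w2)]
    channels = [f.real for f in responses] + [f.imag for f in responses]

    # Each of the 8 channels contributes one bit: 1 if the value is non-negative.
    codes = np.zeros(channels[0].shape, dtype=int)
    for bit, ch in enumerate(channels):
        codes += (ch >= 0).astype(int) << bit

    hist = np.bincount(codes.ravel(), minlength=256).astype(float)
    return hist / hist.sum()


# Usage with a random stand-in for a 50-residue PSSM (L x 20).
pssm = np.random.default_rng(0).normal(size=(50, 20))
feature = lpq_histogram(pssm)
print(feature.shape)  # (256,)
```

Treating the PSSM as a grayscale image is what lets a texture descriptor like LPQ produce a fixed-length vector regardless of protein length.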
Keywords: drug substructure fingerprint; drug-target interactions; local phase quantization; rotation forest
Year: 2019 PMID: 31430892 PMCID: PMC6719962 DOI: 10.3390/molecules24162999
Source DB: PubMed Journal: Molecules ISSN: 1420-3049 Impact factor: 4.411
Five-fold cross-validation results of the proposed model on the enzyme dataset.
| Test set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 89.74 | 90.68 | 87.79 | 81.54 | 94.77 |
| 2 | 88.55 | 89.61 | 87.81 | 79.71 | 94.04 |
| 3 | 90.60 | 93.73 | 87.46 | 82.94 | 96.16 |
| 4 | 87.09 | 89.30 | 84.63 | 77.50 | 93.55 |
| 5 | 89.76 | 92.01 | 86.54 | 81.55 | 94.82 |
| Average | 89.15 ± 1.36 | 91.06 ± 1.83 | 86.85 ± 1.34 | 80.65 ± 2.10 | 94.66 ± 0.99 |
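The summary row of each table appears to report the mean ± standard deviation over the five folds. A quick check on the enzyme accuracy column, assuming the sample standard deviation (n − 1 denominator), reproduces the reported 89.15 ± 1.36; this reading of the table is ours, and `summarize` is a hypothetical helper.

```python
import statistics

# Per-fold accuracies (%) on the enzyme dataset, copied from the table above.
enzyme_acc = [89.74, 88.55, 90.60, 87.09, 89.76]


def summarize(values):
    """Return (mean, sample standard deviation), each rounded to two decimals."""
    return round(statistics.mean(values), 2), round(statistics.stdev(values), 2)


print(summarize(enzyme_acc))  # (89.15, 1.36)
```

Using the population standard deviation (`statistics.pstdev`) instead would give about 1.22, which does not match the table.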
Five-fold cross-validation results of the proposed model on the ion channel dataset.
| Test set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 87.97 | 86.03 | 90.94 | 78.78 | 93.27 |
| 2 | 84.41 | 84.59 | 85.15 | 73.65 | 90.16 |
| 3 | 85.25 | 81.46 | 88.81 | 74.80 | 91.14 |
| 4 | 84.92 | 85.81 | 83.78 | 74.38 | 90.44 |
| 5 | 87.50 | 90.43 | 84.44 | 78.10 | 92.62 |
| Average | 86.01 ± 1.61 | 85.66 ± 3.23 | 86.62 ± 3.10 | 75.94 ± 2.33 | 91.52 ± 1.36 |
Five-fold cross-validation results of the proposed model on the GPCR dataset.
| Test set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 82.68 | 79.51 | 83.62 | 71.23 | 87.83 |
| 2 | 83.46 | 82.40 | 83.74 | 72.38 | 88.66 |
| 3 | 80.31 | 82.73 | 81.56 | 68.04 | 84.77 |
| 4 | 81.89 | 83.93 | 77.05 | 70.11 | 85.21 |
| 5 | 82.68 | 85.60 | 80.45 | 71.32 | 86.01 |
| Average | 82.20 ± 1.19 | 82.83 ± 2.24 | 81.28 ± 2.75 | 70.62 ± 1.65 | 86.50 ± 1.68 |
Five-fold cross-validation results of the proposed model on the nuclear receptor dataset.
| Test set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) | AUC (%) |
|---|---|---|---|---|---|
| 1 | 72.22 | 80.00 | 63.16 | 59.25 | 77.86 |
| 2 | 77.78 | 76.92 | 90.91 | 60.78 | 78.57 |
| 3 | 69.44 | 55.56 | 76.92 | 55.90 | 79.43 |
| 4 | 69.44 | 73.68 | 70.00 | 57.23 | 76.56 |
| 5 | 69.44 | 61.90 | 81.25 | 56.69 | 77.34 |
| Average | 71.67 ± 3.62 | 69.61 ± 10.43 | 76.45 ± 10.61 | 57.97 ± 2.00 | 77.95 ± 1.10 |
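The tables report accuracy, precision, sensitivity, and MCC without defining them in this section. Under the standard confusion-matrix definitions (an assumption here, since the source does not state them), they can be computed as below; `binary_metrics` and the counts in the usage line are hypothetical.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, sensitivity and MCC, all in %, from confusion-matrix counts."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    pre = tp / (tp + fp) if tp + fp else 0.0
    sen = tp / (tp + fn) if tp + fn else 0.0   # sensitivity = recall = true-positive rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": 100 * acc, "Pre": 100 * pre, "Sen": 100 * sen, "MCC": 100 * mcc}


# Hypothetical counts for one test fold (not taken from the paper).
print(binary_metrics(tp=40, fp=10, tn=35, fn=15))
```

MCC uses all four cells of the confusion matrix, so it is less sensitive than accuracy to an imbalance between interacting and non-interacting pairs.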
Figure 1. Receiver operating characteristic (ROC) curves generated by our method on the enzyme dataset.
Figure 2. ROC curves generated by our method on the ion channel dataset.
Figure 3. ROC curves generated by our method on the GPCR dataset.
Figure 4. ROC curves generated by our method on the nuclear receptor dataset.
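Each ROC curve is summarized by its area under the curve (AUC), the value reported in the AUC columns and in the comparison with other methods. The AUC equals the probability that a randomly chosen interacting pair is scored above a randomly chosen non-interacting pair, with ties counted as one half; the pairwise sketch below computes it that way. It is illustrative only: `auc_rank` is a hypothetical helper, and the paper's own AUC computation is not described in this section.

```python
def auc_rank(scores, labels):
    """AUC as P(score_pos > score_neg), counting ties as 0.5 (Mann-Whitney form)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


print(auc_rank([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))  # 0.75
```

The double loop is quadratic and only meant to show the definition; for binary labels, `sklearn.metrics.roc_auc_score` returns the same value more efficiently.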
Five-fold cross-validation results on the enzyme dataset using the rotation forest classifier and the support vector machine classifier.
| Test set | Accuracy (%) | Precision (%) | Sensitivity (%) | MCC (%) |
|---|---|---|---|---|
| **PSSM+LPQ+RF (rotation forest)** | | | | |
| 1 | 89.74 | 90.68 | 87.79 | 81.54 |
| 2 | 88.55 | 89.61 | 87.81 | 79.71 |
| 3 | 90.60 | 93.73 | 87.46 | 82.94 |
| 4 | 87.09 | 89.30 | 84.63 | 77.50 |
| 5 | 89.76 | 92.01 | 86.54 | 81.55 |
| Average | 89.15 ± 1.36 | 91.06 ± 1.83 | 86.85 ± 1.34 | 80.65 ± 2.10 |
| **PSSM+LPQ+SVM** | | | | |
| 1 | 85.04 | 86.25 | 82.12 | 74.46 |
| 2 | 85.47 | 86.05 | 85.48 | 75.15 |
| 3 | 85.90 | 86.02 | 86.45 | 75.76 |
| 4 | 83.85 | 83.98 | 84.12 | 72.91 |
| 5 | 85.75 | 86.36 | 84.09 | 75.53 |
| Average | 85.20 ± 0.82 | 85.73 ± 0.99 | 84.45 ± 1.64 | 74.76 ± 1.15 |
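The table contrasts rotation forest with SVM on identical PSSM+LPQ features. The sketch below shows the general rotation forest idea in simplified form: for each tree, the features are split at random into disjoint subsets, PCA is fitted to a bootstrap sample of each subset, the principal axes are assembled into a block-diagonal rotation matrix, and a decision tree is trained on the rotated data. It is not the paper's configuration; the class name, the number of trees and subsets, the 75% sample fraction, and the omission of the original algorithm's class-subsampling step are all assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier


class RotationForestSketch:
    """Simplified rotation forest: each tree is trained on a PCA-rotated copy of X."""

    def __init__(self, n_trees=10, n_subsets=3, sample_frac=0.75, random_state=0):
        self.n_trees = n_trees
        self.n_subsets = n_subsets          # K: number of disjoint feature subsets
        self.sample_frac = sample_frac      # fraction of rows bootstrapped per subset
        self.random_state = random_state

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        n, d = X.shape
        k = min(self.n_subsets, d)          # never create empty subsets
        self.models_ = []
        for _ in range(self.n_trees):
            # Randomly partition the feature indices into k disjoint subsets.
            subsets = np.array_split(rng.permutation(d), k)
            rotation = np.zeros((d, d))
            for cols in subsets:
                # Bootstrap at least len(cols) rows so PCA returns a square block.
                size = max(len(cols), int(self.sample_frac * n))
                rows = rng.choice(n, size=size, replace=True)
                pca = PCA(svd_solver="full").fit(X[np.ix_(rows, cols)])
                # Place this subset's principal axes in the block-diagonal rotation matrix.
                rotation[np.ix_(cols, cols)] = pca.components_.T
            tree = DecisionTreeClassifier(random_state=int(rng.integers(2**31 - 1)))
            tree.fit(X @ rotation, y)
            self.models_.append((rotation, tree))
        self.classes_ = self.models_[0][1].classes_
        return self

    def predict_proba(self, X):
        X = np.asarray(X, dtype=float)
        # Average class probabilities over all rotated trees.
        return np.mean([tree.predict_proba(X @ rot) for rot, tree in self.models_], axis=0)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]
```

For the SVM side of such a comparison, `sklearn.svm.SVC(probability=True)` could be fitted on the same feature matrix; the kernel and hyperparameters used in the paper are not given in this section.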
Figure 5. ROC curves generated by the support vector machine (SVM) classifier on the enzyme dataset.
Comparison of the area under the curve (AUC) values of the proposed method and other existing methods on the gold standard datasets.
| Dataset | Our Method | DBSI | KBMF2K | NetCBP | Yamanishi |
|---|---|---|---|---|---|
| Enzymes | 0.9466 | 0.8075 | 0.832 | 0.8251 | 0.821 |
| Ion Channels | 0.9152 | 0.8029 | 0.799 | 0.8034 | 0.692 |
| GPCRs | 0.8650 | 0.8022 | 0.857 | 0.8235 | 0.811 |
| Nuclear Receptors | 0.7795 | 0.7578 | 0.824 | 0.8394 | 0.814 |
Statistics of the four drug-target interaction datasets.
| Dataset | Drug Compounds | Target Proteins | Interactions |
|---|---|---|---|
| Enzyme | 445 | 664 | 2926 |
| Ion channel | 210 | 204 | 1476 |
| GPCR | 223 | 95 | 635 |
| Nuclear receptor | 54 | 26 | 90 |
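The counts above also show how sparse each interaction matrix is: dividing the known interactions by all possible drug-target combinations gives the share of pairs labeled as interacting. This arithmetic is ours rather than the paper's, and it makes no assumption about how non-interacting pairs were chosen for training; `interaction_density` is a hypothetical helper.

```python
# (drugs, target proteins, known interactions), copied from the dataset table above.
DATASETS = {
    "Enzyme": (445, 664, 2926),
    "Ion channel": (210, 204, 1476),
    "GPCR": (223, 95, 635),
    "Nuclear receptor": (54, 26, 90),
}


def interaction_density(drugs, targets, interactions):
    """Percentage of all drug-target combinations that are known interactions."""
    return 100.0 * interactions / (drugs * targets)


for name, counts in DATASETS.items():
    print(f"{name}: {interaction_density(*counts):.2f}% of pairs are known interactions")
```

This gives about 0.99% (enzyme), 3.45% (ion channel), 3.00% (GPCR), and 6.41% (nuclear receptor), so known interactions are a small minority of all pairs in every dataset.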
Figure 6. Flow chart of the proposed method for a given drug and target protein.
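The step in which drug and protein features are combined into one vector per drug-target pair can be sketched as below. Plain concatenation is an assumption about what "combine" means here, and the 256-dimensional protein vector (the size of a basic LPQ histogram) and the 881-bit fingerprint (the length of the PubChem substructure fingerprint) are placeholder sizes chosen for illustration, not values stated in this section. `pair_matrix` and the IDs are hypothetical.

```python
import numpy as np


def pair_matrix(protein_feats, drug_feats, pairs):
    """One row per (target_id, drug_id) pair: [protein descriptor | drug fingerprint]."""
    return np.vstack([
        np.concatenate([protein_feats[t], drug_feats[d]]) for t, d in pairs
    ])


rng = np.random.default_rng(0)
# Hypothetical inputs: 256-dim protein vectors and 881-bit binary fingerprints.
protein_feats = {"T1": rng.random(256), "T2": rng.random(256)}
drug_feats = {"D1": rng.integers(0, 2, size=881), "D2": rng.integers(0, 2, size=881)}

X = pair_matrix(protein_feats, drug_feats, [("T1", "D1"), ("T2", "D1"), ("T1", "D2")])
print(X.shape)  # (3, 1137)
```

Each row of `X` can then be passed to a classifier such as rotation forest, which scores how likely that drug-target pair is to interact.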