Ping Xuan¹, Chang Sun¹, Tiangang Zhang², Yilin Ye¹, Tonghui Shen¹, Yihua Dong¹.
Abstract
Determining the target genes that interact with drugs, that is, identifying drug–target interactions (DTIs), plays an important role in drug discovery. Identifying DTIs through biological experiments is time-consuming, laborious, and costly, so computational approaches that predict candidate targets are an effective way to reduce the cost of wet-lab experiments. However, the known interactions (positive samples) and the unknown interactions (negative samples) are severely imbalanced, which adversely affects the accuracy of the predictions. To mitigate the impact of class imbalance and fully exploit the negative samples, we propose DTIGBDT, a new method based on gradient boosting decision trees for predicting candidate drug–target interactions. We constructed a drug–target heterogeneous network that contains drug similarities based on the chemical structures of drugs, target similarities based on target sequences, and the known drug–target interactions. The topological information of the network was captured by random walks, which were used to update the similarities between drugs and between targets. The paths between drugs and targets were divided into multiple categories, and features were extracted for each category of paths. We then constructed a prediction model based on gradient boosting decision trees, which builds multiple decision trees from the extracted features and outputs an interaction score for each drug–target pair. As an ensemble learning method, DTIGBDT effectively reduces the impact of class imbalance. The experimental results indicate that DTIGBDT outperforms several state-of-the-art methods for drug–target interaction prediction. In addition, case studies on Quetiapine, Clozapine, Olanzapine, Aripiprazole, and Ziprasidone demonstrate the ability of DTIGBDT to discover potential drug–target interactions.
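The abstract says only that random walks are used to update the drug and target similarities. As an illustration, here is a minimal sketch assuming the widely used random walk with restart (RWR) formulation, p_{t+1} = (1 - r) W p_t + r e_s, on a column-normalized similarity matrix W. The restart probability r = 0.3, the toy matrix, and the function names are hypothetical choices, not values taken from the paper.

```python
def normalize_columns(w):
    """Scale each column of a square matrix to sum to 1 (all-zero columns stay zero)."""
    n = len(w)
    col_sums = [sum(w[i][j] for i in range(n)) for j in range(n)]
    return [[w[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def random_walk_with_restart(sim, restart=0.3, tol=1e-10, max_iter=1000):
    """Hypothetical RWR similarity update.

    Column s of the result holds the steady-state visiting probabilities of a
    walker that starts at node s and jumps back to s with probability `restart`.
    """
    n = len(sim)
    w = normalize_columns(sim)
    result = [[0.0] * n for _ in range(n)]
    for s in range(n):
        e = [1.0 if i == s else 0.0 for i in range(n)]  # restart vector for node s
        p = e[:]
        for _ in range(max_iter):
            # p_{t+1} = (1 - r) * W p_t + r * e_s
            new_p = [(1 - restart) * sum(w[i][j] * p[j] for j in range(n)) + restart * e[i]
                     for i in range(n)]
            converged = sum(abs(a - b) for a, b in zip(new_p, p)) < tol
            p = new_p
            if converged:
                break
        for i in range(n):
            result[i][s] = p[i]
    return result


# Toy drug-drug similarity matrix (illustrative values only).
drug_sim = [[0.0, 0.8, 0.1],
            [0.8, 0.0, 0.5],
            [0.1, 0.5, 0.0]]
updated = random_walk_with_restart(drug_sim)
```

Each column of `updated` sums to 1. Nodes that are strongly connected to the start node, directly or through intermediate nodes, receive higher scores, which is how the walk injects network topology into the similarities. The result is not symmetric; averaging it with its transpose is one option if a symmetric similarity is needed.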
Keywords: class imbalance; drug–target interaction prediction; ensemble learning; gradient boosting decision tree; path category-based features
Year: 2019 PMID: 31214240 PMCID: PMC6555260 DOI: 10.3389/fgene.2019.00459
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Algorithm flow of DTIGBDT. (A) Construct the heterogeneous network. (B) Perform random walks on the drug network and the target network, respectively. (C) Select the k most similar neighbors. (D) Obtain the feature vector for each drug–target pair. (E) Train DTIGBDT with the feature vectors.
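Step (C) keeps only each node's k most similar neighbors. The sketch below shows one plausible reading, assuming that "select" means keeping the top-k similarity edges per node and zeroing the rest. The function names, the tie-breaking rule (lower index first), and the toy matrix are hypothetical.

```python
def top_k_neighbors(sim, k):
    """For each node i, return the indices of the k most similar other nodes."""
    neighbors = []
    for i, row in enumerate(sim):
        candidates = [j for j in range(len(row)) if j != i]
        # Sort by descending similarity; ties broken by lower index for determinism.
        candidates.sort(key=lambda j: (-row[j], j))
        neighbors.append(candidates[:k])
    return neighbors


def sparsify(sim, k):
    """Keep only the edges from each node to its top-k neighbors (others set to 0)."""
    keep = top_k_neighbors(sim, k)
    n = len(sim)
    return [[sim[i][j] if j in keep[i] else 0.0 for j in range(n)] for i in range(n)]


# Toy similarity matrix (illustrative values only).
sim = [[1.0, 0.9, 0.2, 0.5],
       [0.9, 1.0, 0.3, 0.1],
       [0.2, 0.3, 1.0, 0.7],
       [0.5, 0.1, 0.7, 1.0]]
```

Note that the result of `sparsify` can be asymmetric, because j may be among i's top-k neighbors without i being among j's.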
Figure 2. Feature vector calculation for the pair d7–t3. The edges between two drug nodes or between two target nodes are weighted by the similarity between the two nodes. The edges between drug nodes and target nodes represent known DTIs and have weight 1. (A) Paths between d7 and t3. (B) The s-values of all the paths. (C) The three types of path feature vectors. (D) Concatenation of the three feature vectors.
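The caption names three path types and an s-value per path but does not define them. The following is a minimal sketch built on several assumptions:

* The three categories are drug–drug–target, drug–target–target, and drug–drug–target–target paths.
* A path's s-value is the product of its edge weights, using similarity for drug–drug and target–target edges and 1 for known DTI edges, as the caption states.
* Each category's vector keeps its m largest s-values, padded with zeros.

The function name `path_features`, the parameter `m`, and the toy data are hypothetical.

```python
def path_features(d, t, drug_sim, target_sim, interactions, m=3):
    """Hypothetical path-category feature vector for the pair (drug d, target t).

    interactions[i][j] is 1 for a known DTI (edge weight 1), else 0.
    """
    n_d, n_t = len(drug_sim), len(target_sim)
    # Category 1: d - d2 - t (a drug similar to d is known to hit t).
    c1 = [drug_sim[d][d2] * 1.0
          for d2 in range(n_d) if d2 != d and interactions[d2][t]]
    # Category 2: d - t2 - t (d is known to hit a target similar to t).
    c2 = [1.0 * target_sim[t2][t]
          for t2 in range(n_t) if t2 != t and interactions[d][t2]]
    # Category 3: d - d2 - t2 - t (similar drug hits a similar target).
    c3 = [drug_sim[d][d2] * 1.0 * target_sim[t2][t]
          for d2 in range(n_d) if d2 != d
          for t2 in range(n_t) if t2 != t and interactions[d2][t2]]

    def fixed_length(values):
        # Keep the m largest s-values and pad with zeros to a fixed width.
        top = sorted(values, reverse=True)[:m]
        return top + [0.0] * (m - len(top))

    # Concatenate the three category vectors, as in panel (D).
    return fixed_length(c1) + fixed_length(c2) + fixed_length(c3)


# Toy network: 2 drugs, 2 targets (illustrative values only).
drug_sim = [[1.0, 0.8], [0.8, 1.0]]
target_sim = [[1.0, 0.6], [0.6, 1.0]]
interactions = [[0, 1], [1, 1]]
feats = path_features(0, 0, drug_sim, target_sim, interactions, m=2)
```

Fixing the width per category gives every drug–target pair a vector of the same length (3m), which a tree ensemble requires, regardless of how many paths actually exist.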
Figure 3. Algorithm for predicting potential drug–target interactions.
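The prediction model is a gradient boosting decision tree classifier. In practice one would normally use a library implementation, such as scikit-learn's `GradientBoostingClassifier`. To make the mechanics visible, here is a from-scratch sketch with depth-1 trees (stumps) and logistic loss:

* Each round fits a stump to the pseudo-residuals y − p.
* The split is chosen by squared error.
* Each leaf value is the Newton step sum(residual) / sum(p(1 − p)).

This is a generic GBDT, not the paper's configuration. The tree depth, number of trees, learning rate, and the class name `TinyGBDT` are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, residual, hess):
    """Fit a depth-1 regression tree to the pseudo-residuals.

    The split minimizes squared error; each leaf value is a Newton step
    sum(residual) / sum(hess), as in logistic-loss gradient boosting.
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= thr]
            right = [i for i, row in enumerate(X) if row[f] > thr]
            if not left or not right:
                continue
            err = 0.0
            for side in (left, right):
                mean = sum(residual[i] for i in side) / len(side)
                err += sum((residual[i] - mean) ** 2 for i in side)
            if best is None or err < best[0]:
                best = (err, f, thr, left, right)
    if best is None:  # every feature is constant: use a single leaf
        value = sum(residual) / max(sum(hess), 1e-12)
        return (0, math.inf, value, value)
    _, f, thr, left, right = best

    def leaf(side):
        return sum(residual[i] for i in side) / max(sum(hess[i] for i in side), 1e-12)

    return (f, thr, leaf(left), leaf(right))


def stump_output(stump, row):
    f, thr, left_value, right_value = stump
    return left_value if row[f] <= thr else right_value


class TinyGBDT:
    """Minimal gradient boosting classifier with logistic loss (illustrative only)."""

    def __init__(self, n_trees=50, learning_rate=0.1):
        self.n_trees = n_trees
        self.learning_rate = learning_rate

    def fit(self, X, y):
        pos_rate = sum(y) / len(y)  # assumes both classes are present
        self.base = math.log(pos_rate / (1.0 - pos_rate))  # prior log-odds
        scores = [self.base] * len(X)
        self.stumps = []
        for _ in range(self.n_trees):
            probs = [sigmoid(s) for s in scores]
            residual = [yi - pi for yi, pi in zip(y, probs)]  # negative gradient of log-loss
            hess = [pi * (1.0 - pi) for pi in probs]
            stump = fit_stump(X, residual, hess)
            self.stumps.append(stump)
            scores = [s + self.learning_rate * stump_output(stump, row)
                      for s, row in zip(scores, X)]
        return self

    def predict_proba(self, X):
        return [sigmoid(self.base + self.learning_rate *
                        sum(stump_output(st, row) for st in self.stumps))
                for row in X]


# Toy feature vectors and labels (illustrative values only).
X = [[0.1, 0.5], [0.2, 0.4], [0.3, 0.9], [0.8, 0.1], [0.9, 0.3], [0.7, 0.2]]
y = [0, 0, 0, 1, 1, 1]
model = TinyGBDT(n_trees=20).fit(X, y)
probs = model.predict_proba([[0.85, 0.2], [0.15, 0.6]])
```

Each new tree is fitted to what the current ensemble still gets wrong, so the predicted probability for a drug–target pair is refined round by round. Starting from the prior log-odds means that a heavily imbalanced training set shifts the baseline score rather than being ignored.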
Figure 4. ROC curves and precision–recall curves of DTI prediction by different methods.
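The comparison relies on the area under the ROC curve (AUC) and the area under the precision–recall curve (AUPR). The sketch below computes AUC as the probability that a random positive outranks a random negative, counting ties as one half. For AUPR it uses average precision, which is one common estimator. The paper may integrate the precision–recall curve differently, and the function names and toy inputs are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (AUPR estimate)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cutoff
    return total / n_pos


# Toy example (illustrative values only).
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
auc = roc_auc(labels, scores)
ap = average_precision(labels, scores)
```

Under heavy class imbalance, AUPR is usually far more sensitive than AUC to false positives near the top of the ranking, which is why both are reported.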
P-values of the comparisons between DTIGBDT and each of the other methods, based on AUCs and AUPRs.
| AUC | 2.3732e-04 | 5.1773e-08 | 4.9252e-03 | 4.3850e-02 |
| AUPR | 7.5153e-14 | 8.0531e-23 | 9.8030e-15 | 6.1235e-09 |
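The record does not say which statistical test produced these p-values. One common choice for comparing two methods on the same set of drugs is a paired Wilcoxon signed-rank test over per-drug AUCs (or AUPRs), and that choice is assumed here. This sketch makes three further simplifications:

* It uses the two-sided normal approximation.
* It drops zero differences.
* It averages ranks over ties, without the tie correction to the variance.

It is inaccurate for very small samples. The function name is hypothetical, and the function assumes at least one non-zero difference.

```python
import math


def wilcoxon_signed_rank(x, y):
    """Two-sided paired Wilcoxon signed-rank p-value (normal approximation)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    n = len(diffs)
    # Rank the absolute differences, averaging ranks within tied groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
```

Because the test is paired, it asks whether one method is consistently better on the same drugs, rather than whether the two pooled score distributions differ.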
Figure 5. The average recall across all the tested drugs at different top-k values.
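Recall at top k for one drug is the fraction of its true targets that appear among its k highest-ranked candidates. The figure averages this over the tested drugs. The sketch below assumes that drugs with no true targets are skipped, and the function names and toy rankings are hypothetical.

```python
def recall_at_k(ranked_targets, true_targets, k):
    """Fraction of the true targets found in the top-k of the ranking."""
    hits = len(set(ranked_targets[:k]) & set(true_targets))
    return hits / len(true_targets)


def average_recall_at_k(rankings, truths, k):
    """Mean recall@k over drugs that have at least one true target."""
    drugs = [d for d in rankings if truths.get(d)]
    return sum(recall_at_k(rankings[d], truths[d], k) for d in drugs) / len(drugs)


# Toy rankings and ground truth (illustrative values only).
rankings = {"d1": ["t1", "t2", "t3"], "d2": ["t3", "t1", "t2"]}
truths = {"d1": ["t2"], "d2": ["t1", "t2"]}
curve = [average_recall_at_k(rankings, truths, k) for k in (1, 2, 3)]
```

Plotting `curve` against k gives a curve of the kind shown in Figure 5. A method whose curve rises faster places more true targets near the top of each drug's list, which is what matters when only a few candidates can be tested in the lab.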
Top five ranked candidate targets for each of the five drugs.
| Drug | Rank | Evidence |
|---|---|---|
| Quetiapine | 1 | DrugBank, KEGG |
| | 2 | Literature (Sugawara et al.) |
| | 3 | Literature (Hong et al.) |
| | 4 | DrugBank |
| | 5 | Literature (Serge and Charles) |
| Clozapine | 1 | KEGG, ChEMBL |
| | 2 | DrugBank |
| | 3 | DrugBank |
| | 4 | KEGG |
| | 5 | ChEMBL |
| Olanzapine | 1 | KEGG, UniProt |
| | 2 | KEGG |
| | 3 | DrugBank |
| | 4 | UniProt |
| | 5 | Literature (Filatova et al.) |
| Aripiprazole | 1 | KEGG, DrugBank |
| | 2 | KEGG, ChEMBL |
| | 3 | KEGG |
| | 4 | KEGG |
| | 5 | KEGG, DrugBank |
| Ziprasidone | 1 | KEGG, DrugBank |
| | 2 | KEGG |
| | 3 | KEGG, DrugBank |
| | 4 | KEGG |
| | 5 | KEGG, DrugBank |
The novel DTIs are confirmed by existing evidence (public databases or the literature), and the supporting sources are listed as evidence for each candidate.
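A case study of this kind ranks all targets for one drug by predicted interaction score, leaves out the interactions that are already known, and reports the top five for verification against external sources. A minimal sketch, assuming a drug-by-target score matrix and a 0/1 matrix of known DTIs. The function name, the tie-breaking rule, and the toy values are hypothetical.

```python
def top_candidates(scores, known, drug, n=5):
    """Return the n highest-scoring targets for `drug`, skipping known DTIs."""
    row = scores[drug]
    candidates = [t for t in range(len(row)) if not known[drug][t]]
    # Highest score first; ties broken by lower target index.
    candidates.sort(key=lambda t: (-row[t], t))
    return candidates[:n]


# Toy scores for one drug over seven targets (illustrative values only).
scores = [[0.9, 0.2, 0.8, 0.4, 0.7, 0.1, 0.6]]
known = [[1, 0, 0, 0, 0, 0, 0]]
top3 = top_candidates(scores, known, drug=0, n=3)
```

Target 0 has the highest score but is excluded because its interaction is already known, so only genuinely new candidates are passed on for checking in databases such as DrugBank, KEGG, ChEMBL, and UniProt or in the literature.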