Jihee Soh1, Sejin Park1, Hyunju Lee2.
Abstract
Identification of drug-target interactions (DTIs) plays a crucial role in drug development. Traditional laboratory-based DTI discovery is generally costly and time-consuming. Therefore, computational approaches have been developed to predict interactions between drug candidates and disease-causing proteins. We designed a novel method, termed heterogeneous information integration for DTI prediction (HIDTI), based on the concept of predicting vectors for all unknown or unavailable heterogeneous drug- and protein-related information. We applied a residual network in HIDTI to extract features of such heterogeneous information for predicting DTIs, and tested the model using drug-based ten-fold cross-validation to examine the prediction performance for unseen drugs. HIDTI outperformed existing models that use heterogeneous information, demonstrating that our method predicts heterogeneous information on unseen data better than other models. In conclusion, our study suggests that HIDTI has the potential to advance the field of drug development by accurately predicting the targets of new drugs.
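The drug-based ten-fold cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes that folds are formed by assigning whole drugs (not individual drug-target pairs) to folds, so that every test drug is unseen during training; the function name, the round-robin assignment, and the toy data are hypothetical and are not taken from the paper.

```python
import random

def drug_based_folds(pairs, n_folds=10, seed=0):
    """Yield (train, test) splits in which each drug appears in exactly one test fold."""
    drugs = sorted({drug for drug, _, _ in pairs})
    rng = random.Random(seed)
    rng.shuffle(drugs)
    # Assign drugs, not pairs, to folds (round-robin over the shuffled drug list).
    fold_of = {drug: i % n_folds for i, drug in enumerate(drugs)}
    for k in range(n_folds):
        train = [p for p in pairs if fold_of[p[0]] != k]
        test = [p for p in pairs if fold_of[p[0]] == k]
        yield train, test

# Toy data: 20 hypothetical drugs, each paired with 3 hypothetical proteins.
pairs = [(f"drug{d}", f"prot{t}", (d + t) % 2) for d in range(20) for t in range(3)]
folds = list(drug_based_folds(pairs))
```

Because the split is made at the drug level, no drug in a test fold contributes any pair to the corresponding training set, which is what makes the evaluation a test on unseen drugs.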
Year: 2022 PMID: 35260608 PMCID: PMC8904809 DOI: 10.1038/s41598-022-07608-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Architecture of the HIDTI model for predicting drug-target interactions (DTIs). Drug-related features include SMILES strings, drug-drug interactions (DDIs), drug-side effect associations (DSIE), and drug-disease associations (DDIS). Protein-related features include protein sequences, protein-protein similarities (PSIM), protein-protein interactions (PPIs), and protein-disease interactions (PDIS). These are concatenated and fed into the neural network with a residual block. For unseen drugs, deep neural network (DNN) models are used to predict each item of heterogeneous information to obtain the input vectors of drug-target pairs. Ultimately, our model provides a binary output (1 or 0) indicating whether the drug and protein interact.
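The concatenate-then-residual-block step in the caption can be sketched as follows. This is a generic residual block, not the published architecture: the caption does not give layer sizes, activations, or the exact placement of the skip connection, so the two-layer form, the ReLU activations, and all vectors and weights below are assumptions for illustration.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def linear(weights, bias, v):
    """Dense layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]

def residual_block(v, w1, b1, w2, b2):
    """Return relu(v + F(v)), where F is two dense layers with a ReLU in between."""
    hidden = relu(linear(w1, b1, v))
    out = linear(w2, b2, hidden)
    # Skip connection: add the block input back onto the transformed output.
    return relu([x + y for x, y in zip(v, out)])

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
drug_vec = [0.2, 0.5, 0.1]          # hypothetical drug feature vector
protein_vec = [0.7, 0.3]            # hypothetical protein feature vector
pair_vec = drug_vec + protein_vec   # concatenation into one drug-target pair vector
dim = len(pair_vec)
w1, b1 = random_matrix(dim, dim, rng), [0.0] * dim
w2, b2 = random_matrix(dim, dim, rng), [0.0] * dim
features = residual_block(pair_vec, w1, b1, w2, b2)
```

With all weights set to zero the block reduces to the identity on non-negative inputs, which is the property that lets residual layers be stacked without degrading the input signal.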
Dataset statistics.
| # of drugs | # of proteins | # of side effects | # of diseases | Total |
|---|---|---|---|---|
| 707 | 1489 | 4192 | 5603 | 11,991 |
Positive interactions in our datasets.
| Type of interaction | # of positives |
|---|---|
| Drug–protein | 1909 |
| Drug–drug | 10,024 |
| Drug-side effect | 80,160 |
| Drug-disease | 199,022 |
| Protein–protein | 7133 |
| Protein–disease | 1,572,157 |
Performance evaluation of HIDTI and other models when heterogeneous information was available for unseen drugs. Columns are grouped by the ratio of positive to negative interactions (1:1, 1:3, and 1:5).
| Methods | AUC (1:1) | Precision (1:1) | Recall (1:1) | F1 (1:1) | AUC (1:3) | Precision (1:3) | Recall (1:3) | F1 (1:3) | AUC (1:5) | Precision (1:5) | Recall (1:5) | F1 (1:5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 0.789 | 0.693 | 0.816 | 0.741 | 0.879 | 0.732 | 0.708 | 0.716 | 0.853 | 0.670 | 0.607 | 0.630 |
| Single info | ||||||||||||
| +DDI | 0.832 | 0.720 | 0.804 | 0.758 | 0.890 | 0.734 | 0.741 | 0.734 | 0.873 | 0.669 | 0.637 | 0.649 |
| +DSIE | 0.816 | 0.719 | 0.792 | 0.752 | 0.881 | 0.675 | 0.772 | 0.714 | 0.867 | 0.639 | 0.616 | 0.621 |
| +DDIS | 0.834 | 0.723 | 0.817 | 0.766 | 0.876 | 0.668 | 0.760 | 0.707 | 0.866 | 0.602 | 0.635 | 0.615 |
| +PPI | 0.832 | 0.721 | 0.837 | 0.767 | 0.893 | 0.745 | 0.752 | 0.747 | 0.891 | 0.732 | 0.659 | 0.690 |
| +PSIM | 0.872 | 0.750 | 0.848 | 0.795 | 0.911 | 0.802 | 0.758 | 0.778 | 0.901 | 0.730 | 0.694 | 0.709 |
| +PDIS | 0.892 | 0.811 | 0.821 | 0.815 | 0.921 | 0.811 | 0.767 | 0.786 | 0.913 | 0.749 | 0.720 | 0.733 |
| Multiple info | ||||||||||||
| HIDTI (Ours) | ||||||||||||
The baseline model represents the prediction of DTIs using only drug chemical and protein sequence feature vectors.
The best performance values are in [bold].
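The 1:1, 1:3, and 1:5 settings differ only in how many negative drug-protein pairs accompany each positive one. A minimal sketch of building such datasets is shown below, assuming negatives are drawn uniformly at random from pairs that are not labeled positive; the source does not state the sampling procedure, and the function name and toy data are hypothetical. With the 1909 drug-protein positives reported above, a 1:5 ratio would correspond to 9545 negatives.

```python
import random

def sample_negatives(positives, drugs, proteins, ratio, seed=0):
    """Draw ratio * len(positives) distinct unlabeled drug-protein pairs as negatives."""
    positive_set = set(positives)
    n_needed = ratio * len(positive_set)
    n_available = len(drugs) * len(proteins) - len(positive_set)
    if n_needed > n_available:
        raise ValueError("not enough unlabeled pairs for the requested ratio")
    rng = random.Random(seed)
    negatives = set()
    while len(negatives) < n_needed:
        pair = (rng.choice(drugs), rng.choice(proteins))
        if pair not in positive_set:   # never sample a known interaction
            negatives.add(pair)
    return sorted(negatives)

# Toy data: 10 hypothetical drugs and proteins with 5 known interactions.
drugs = [f"drug{i}" for i in range(10)]
proteins = [f"prot{j}" for j in range(10)]
positives = [(drugs[i], proteins[i]) for i in range(5)]
datasets = {r: sample_negatives(positives, drugs, proteins, r) for r in (1, 3, 5)}
```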
Figure 2. Performance evaluation of the HIDTI method for unseen drugs based on the number of targets. (A) The area under the receiver operating characteristic curve (AUC) values based on the number of targets for each drug. The shading intensity indicates the number of drugs with the corresponding AUC value. (B) Absolute values of the difference between the mean probabilities of positive and negative predicted interactions for each drug. This difference is plotted as the distance on the y-axis. Each dot represents the average distance over the drugs with the given number of targets.
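The distance in panel (B) can be written out as a short calculation. The sketch below assumes that "positive" and "negative" refer to the labels of a drug's test pairs; the caption could also be read as grouping by predicted class, so this grouping, the function name, and the example values are assumptions.

```python
from statistics import mean

def separation_distance(probabilities, labels):
    """Absolute difference between mean predicted probability of positives and negatives."""
    pos = [p for p, y in zip(probabilities, labels) if y == 1]
    neg = [p for p, y in zip(probabilities, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    return abs(mean(pos) - mean(neg))

# Hypothetical predictions for the test pairs of a single drug.
probs = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
distance = separation_distance(probs, labels)
```

A distance near 1 means the model separates a drug's interacting and non-interacting proteins cleanly, while a distance near 0 means the two groups receive similar scores.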
Performance evaluation of HIDTI and other models when heterogeneous information was predicted for unseen drugs. Columns are grouped by the ratio of positive to negative interactions (1:1, 1:3, and 1:5).
| Methods | AUC (1:1) | Precision (1:1) | Recall (1:1) | F1 (1:1) | AUC (1:3) | Precision (1:3) | Recall (1:3) | F1 (1:3) | AUC (1:5) | Precision (1:5) | Recall (1:5) | F1 (1:5) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | 0.789 | 0.693 | 0.816 | 0.741 | 0.879 | 0.732 | 0.708 | 0.716 | 0.853 | 0.670 | 0.607 | 0.630 |
| Single info | ||||||||||||
| +DDI | 0.812 | 0.695 | 0.814 | 0.746 | 0.878 | 0.716 | 0.727 | 0.718 | 0.868 | 0.701 | 0.617 | 0.655 |
| +DSIE | 0.824 | 0.740 | 0.780 | 0.757 | 0.866 | 0.728 | 0.703 | 0.712 | 0.854 | 0.650 | 0.627 | 0.635 |
| +DDIS | 0.808 | 0.704 | 0.784 | 0.740 | 0.853 | 0.664 | 0.725 | 0.691 | 0.844 | 0.641 | 0.609 | 0.623 |
| +PPI | 0.849 | 0.766 | 0.801 | 0.780 | 0.896 | 0.750 | 0.751 | 0.747 | 0.890 | 0.733 | 0.654 | 0.689 |
| +PSIM | 0.863 | 0.798 | 0.791 | 0.793 | | 0.798 | 0.761 | 0.775 | 0.901 | 0.736 | 0.675 | 0.701 |
| +PDIS | 0.888 | 0.733 | 0.802 | 0.762 | 0.889 | 0.744 | 0.739 | 0.738 | 0.881 | 0.667 | 0.659 | 0.657 |
| Multiple info | ||||||||||||
| HIDTI (predicted all) | 0.889 | 0.831 | 0.818 | 0.823 | 0.894 | 0.797 | 0.771 | 0.781 | 0.892 | 0.765 | 0.697 | 0.727 |
| HIDTI (available PDIS) | 0.901 | |||||||||||
| NeoDTI | 0.828 | 0.566 | 0.651 | 0.758 | 0.809 | 0.622 | 0.646 | 0.629 | 0.841 | 0.497 | 0.583 | 0.605 |
The case of ‘HIDTI (available PDIS)’ refers to the use of existing protein-disease relationship (PDIS) features from our dataset, and the case of ‘HIDTI (predicted all)’ refers to the use of all predicted features from each deep neural network model described in the “Generating features of heterogeneous information” section.
The best performance values are in [bold].
Prediction performance with drug-related heterogeneous information for NeoDTI and HIDTI.
| Drug related information | Ratio of positive and negative interactions | |||||
|---|---|---|---|---|---|---|
| 1:1 | 1:3 | 1:5 | ||||
| AUC | ||||||
| NeoDTI | HIDTI | NeoDTI | HIDTI | NeoDTI | HIDTI | |
| DDI | 0.678 | 0.678 | 0.679 | |||
| DSIE | 0.494 | 0.495 | 0.538 | |||
| DDIS | 0.525 | 0.508 | 0.499 | |||
The best performance values are in [bold].
Figure 3. Performance evaluation of HIDTI and NeoDTI in terms of the AUCPR scores for balanced (1:1) (A) and imbalanced (1:3 and 1:5) (B, C) datasets. The three points represent the average precision and recall values for THR_25%, THR_50%, and THR_75% on drug-based ten-fold cross-validation.
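Each point on a precision-recall curve comes from applying one decision threshold to the predicted probabilities. The caption does not define THR_25%, THR_50%, and THR_75%; the sketch below assumes they are probability cutoffs of 0.25, 0.50, and 0.75, and uses hypothetical predictions and labels.

```python
def precision_recall_at(probabilities, labels, threshold):
    """Precision and recall when pairs with probability >= threshold are called positive."""
    predicted = [p >= threshold for p in probabilities]
    tp = sum(1 for hit, y in zip(predicted, labels) if hit and y == 1)
    fp = sum(1 for hit, y in zip(predicted, labels) if hit and y == 0)
    fn = sum(1 for hit, y in zip(predicted, labels) if not hit and y == 1)
    # Convention: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical predictions for one test fold.
probs = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
points = {t: precision_recall_at(probs, labels, t) for t in (0.25, 0.50, 0.75)}
```

Raising the threshold typically trades recall for precision; averaging the per-fold points over the ten folds would give three summary points per model, under the assumed reading of the thresholds.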
Prediction probability of DTI pairs related to dopamine receptors.
| Drug name | Gene name | Probability | Label |
|---|---|---|---|
| Ropinirole | DRD1 | 0.996647 | 1 |
| Ziprasidone | DRD2 | 0.996076 | 1 |
| Olanzapine | DRD3 | 0.995723 | 1 |
| Thiothixene | DRD1 | 0.995303 | 1 |
| Ropinirole | DRD3 | 0.995222 | 1 |
| Ziprasidone | DRD3 | 0.993949 | 1 |
| Orphenadrine | DRD1 | 0.993747 | 0 |
| Risperidone | DRD3 | 0.993418 | 1 |
| Perphenazine | DRD2 | 0.99307 | 1 |
| Chlorpromazine | DRD2 | 0.992936 | 1 |
The probability represents the prediction values of DTI pairs in the test datasets, and the label represents the original labels of the DTI pairs in the dataset used in this study.
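One way to read the table is to rank pairs by predicted probability and pick out those whose dataset label is 0. The sketch below does this for the ten rows listed above; the 0.99 cutoff and the function name are hypothetical. A high-probability pair labeled 0 may be a false positive or an interaction missing from the dataset, and the table alone does not say which.

```python
# (drug, gene, predicted probability, dataset label), copied from the table above.
predictions = [
    ("Ropinirole", "DRD1", 0.996647, 1),
    ("Ziprasidone", "DRD2", 0.996076, 1),
    ("Olanzapine", "DRD3", 0.995723, 1),
    ("Thiothixene", "DRD1", 0.995303, 1),
    ("Ropinirole", "DRD3", 0.995222, 1),
    ("Ziprasidone", "DRD3", 0.993949, 1),
    ("Orphenadrine", "DRD1", 0.993747, 0),
    ("Risperidone", "DRD3", 0.993418, 1),
    ("Perphenazine", "DRD2", 0.99307, 1),
    ("Chlorpromazine", "DRD2", 0.992936, 1),
]

def high_confidence_unlabeled(rows, cutoff=0.99):
    """Return pairs labeled 0 whose predicted probability is at least `cutoff`, best first."""
    ranked = sorted(rows, key=lambda r: r[2], reverse=True)
    return [(drug, gene, prob) for drug, gene, prob, label in ranked
            if label == 0 and prob >= cutoff]

candidates = high_confidence_unlabeled(predictions)
```

For these rows the only such pair is Orphenadrine with DRD1.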