| Literature DB >> 34831315 |
Luong Huu Dang1, Nguyen Tan Dung2,3, Ly Xuan Quang1, Le Quang Hung4, Ngoc Hoang Le5, Nhi Thao Ngoc Le5, Nguyen Thi Diem6, Nguyen Thi Thuy Nga7, Shih-Han Hung8,9,10, Nguyen Quoc Khanh Le11,12,13.
Abstract
Detailed information on new drugs, including drug-drug interactions (DDIs) and drug targets, is often unavailable, making the assessment of adverse drug events resource-intensive. To shorten the conventional evaluation process for DDIs, we present a machine learning framework, HAINI, that predicts DDI types for histamine antagonist drugs using simplified molecular-input line-entry system (SMILES) representations combined with CYP450-based interaction features as inputs. The data used in our research consisted of approved histamine antagonist drugs involved in 26,344 DDI pairs from the DrugBank database. Classification algorithms including Naive Bayes, Decision Tree, Random Forest, Logistic Regression, and XGBoost were trained with 5-fold cross-validation for large-scale DDI prediction among histamine antagonist drugs. Our model outperformed previously published work on DDI prediction, achieving a best precision of 0.788, a recall of 0.921, and an F1-score of 0.838 across 19 DDI types. An important finding of the study is that our prediction relies solely on SMILES and CYP450 features and can therefore be applied at an early stage of drug development.
Keywords: PyBioMed package; SMILES; cheminformatics; drug-drug interaction; histamine antagonist; machine learning
Year: 2021 PMID: 34831315 PMCID: PMC8621088 DOI: 10.3390/cells10113092
Source DB: PubMed Journal: Cells ISSN: 2073-4409 Impact factor: 6.600
Figure 1. The architecture of the heterogeneous histamine antagonist interaction-network inference (HAINI) framework for predicting multiclass drug-drug interactions (DDIs).
Description of the feature extraction from CYP450 groups.
| CYP450 Isoform | Drug A Role | Vector A | Drug B Role | Vector B |
|---|---|---|---|---|
| 1A2 | Substrate | 1 | No interaction | 0 |
| 2A6 | Non-substrate | 0 | No interaction | 0 |
| 2B6 | Non-substrate | 0 | Inhibitor | 1 |
| 2C18 | Non-substrate | 0 | No interaction | 0 |
| 2C19 | Non-substrate | 0 | No interaction | 0 |
| 2C8 | Non-substrate | 0 | Inhibitor | 1 |
| 2C9 | Non-substrate | 0 | No interaction | 0 |
| 2D6 | Non-substrate | 0 | Inducer | −1 |
| 2E1 | Non-substrate | 0 | No interaction | 0 |
| 3A4 | Substrate | 1 | Inducer | −1 |
| 3A5 | Non-substrate | 0 | No interaction | 0 |
| 3A7 | Non-substrate | 0 | No interaction | 0 |
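The encoding in the table above can be sketched in plain Python: drug A is encoded by its substrate profile (1/0) and drug B by its interaction profile (inhibitor +1, inducer −1, no interaction 0), with one entry per CYP450 isoform. The two toy profiles below reproduce the table's example; the helper function names are illustrative, not from the paper.

```python
# Sketch of the CYP450 feature encoding described in the table above.
# Isoform order follows the table; the profiles are the table's toy example.
CYP_ISOFORMS = ["1A2", "2A6", "2B6", "2C18", "2C19", "2C8",
                "2C9", "2D6", "2E1", "3A4", "3A5", "3A7"]

def encode_substrate(profile):
    """Drug A: 1 if the drug is a substrate of the isoform, else 0."""
    return [1 if profile.get(iso) == "substrate" else 0 for iso in CYP_ISOFORMS]

def encode_interaction(profile):
    """Drug B: +1 for inhibitor, -1 for inducer, 0 for no interaction."""
    codes = {"inhibitor": 1, "inducer": -1}
    return [codes.get(profile.get(iso), 0) for iso in CYP_ISOFORMS]

drug_a = {"1A2": "substrate", "3A4": "substrate"}
drug_b = {"2B6": "inhibitor", "2C8": "inhibitor",
          "2D6": "inducer", "3A4": "inducer"}

vector_a = encode_substrate(drug_a)      # [1, 0, 0, ..., 1, 0, 0]
vector_b = encode_interaction(drug_b)    # [0, 0, 1, ..., -1, 0, 0]
pair_features = vector_a + vector_b      # concatenated CYP450 pair features
```

Concatenating the two vectors yields one fixed-length feature block per drug pair, which the framework combines with SMILES-derived descriptors.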
Hyperparameter search grid and optimal value for each machine learning algorithm.
| Algorithm | Hyperparameter | Search Grid | Optimal Value |
|---|---|---|---|
| Naïve Bayes | C | 0.001, 0.01, 0.1, 1, 10, 100, 1000 | 100 |
| | gamma | 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0 | 0.5 |
| | kernel | rbf, linear | rbf |
| Logistic Regression | penalty | l1, l2 | l2 |
| | C | 0.001, 0.01, 0.1, 1, 10, 100, 1000 | 100 |
| Decision Tree | criterion | gini, entropy | gini |
| | max_depth | 10, 20, 30, 40, 50, None | 10 |
| | min_samples_leaf | 1, 2, 4 | 2 |
| | min_samples_split | 2, 5, 10 | 5 |
| Random Forest | bootstrap | True, False | False |
| | max_depth | 10, 20, 30, 40, 50, None | 10 |
| | max_features | auto, sqrt | sqrt |
| | min_samples_leaf | 1, 2, 4 | 2 |
| | min_samples_split | 2, 5, 10 | 5 |
| | n_estimators | 20, 40, 60, 80, 100, 200, 500, 1000, 1500 | 1500 |
| XGBoost | max_depth | 10, 20, 30, 40, 50, None | 10 |
| | max_features | auto, sqrt | sqrt |
| | min_samples_leaf | 1, 2, 4 | 2 |
| | min_samples_split | 2, 5, 10 | 5 |
| | n_estimators | 20, 40, 60, 80, 100, 200, 500, 1000, 1500 | 1500 |
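A grid search of this kind can be run with scikit-learn's `GridSearchCV` under 5-fold cross-validation. The sketch below uses a reduced Random Forest grid drawn from the table; the feature matrix `X` and labels `y` are random placeholders, not the study's data, and the scoring metric is an assumption.

```python
# Sketch of hyperparameter tuning with 5-fold cross-validation, as in the
# table above. Data, grid size, and scoring are illustrative placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.random((100, 24))            # e.g. SMILES + CYP450 pair features
y = rng.integers(0, 3, size=100)     # toy DDI class labels

param_grid = {                       # subset of the Random Forest grid
    "bootstrap": [True, False],
    "max_depth": [10, 20, None],
    "min_samples_leaf": [1, 2, 4],
}
search = GridSearchCV(
    RandomForestClassifier(n_estimators=20, random_state=0),
    param_grid, cv=5, scoring="f1_macro", n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)           # best combination found by 5-fold CV
```

The same pattern applies to the other classifiers in the table by swapping the estimator and its grid.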
Figure 2. The number of drug pairs in each of the 19 DDI types (classes) after removing low-count classes.
Figure 3. SHAP summary plots of the top 20 important features ranked by the BestFit method (A) and the Random Forest method (B).
Figure 4. Receiver operating characteristic (ROC) curves of the five machine learning classifiers.
Top 10 DDI types with the highest precision when classified with the XGBoost algorithm (5-fold cross-validation).
| Type of Interactions * | Recall | Precision | F1-Score | Rank |
|---|---|---|---|---|
| Class 13 | 0.906 | 0.904 | 0.999 | 1 |
| Class 15 | 0.839 | 0.838 | 1.000 | 2 |
| Class 6 | 0.837 | 0.818 | 0.984 | 3 |
| Class 3 | 0.799 | 0.745 | 0.966 | 4 |
| Class 17 | 0.769 | 0.777 | 0.981 | 5 |
| Class 4 | 0.749 | 0.685 | 0.939 | 6 |
| Class 7 | 0.742 | 0.729 | 0.959 | 7 |
| Class 2 | 0.703 | 0.65 | 0.941 | 8 |
| Class 1 | 0.681 | 0.63 | 0.946 | 9 |
| Class 8 | 0.68 | 0.681 | 0.995 | 10 |
* The DDI type names and details are given in the Supplementary File.
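Per-class recall, precision, and F1 of the kind reported above can be computed with scikit-learn's `classification_report`. The labels and predictions below are a toy example, not the study's outputs.

```python
# Sketch: per-class recall/precision/F1 for a multiclass DDI classifier,
# computed on toy labels and predictions (not the study's data).
from sklearn.metrics import classification_report

y_true = [1, 1, 2, 2, 3, 3, 3, 1]   # toy DDI class labels
y_pred = [1, 2, 2, 2, 3, 3, 1, 1]   # toy classifier outputs

report = classification_report(y_true, y_pred,
                               output_dict=True, zero_division=0)
print(round(report["3"]["recall"], 3))     # → 0.667 (2 of 3 recovered)
print(round(report["3"]["precision"], 3))  # → 1.0
```

With `output_dict=True` the report is a nested dictionary keyed by class label, which makes it straightforward to rank classes by precision as in the table above.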
XGBoost classifier performance on the validation dataset.
| Type of Interaction * | Recall | Precision | F1 | Number of Drug-Drug Pairs |
|---|---|---|---|---|
| 15 | 0.68 | 0.75 | 0.65 | 7 |
| 6 | 0.71 | 0.73 | 0.60 | 165 |
| 3 | 0.73 | 0.57 | 0.64 | 570 |
| 17 | 0.46 | 0.45 | 0.38 | 10 |
| 4 | 0.65 | 0.67 | 0.59 | 670 |
* The DDI type names and details are given in the Supplementary File.
HAINI performance compared with previous studies using chemical similarity.
| Recall | Precision | F1 | |
|---|---|---|---|
| Average performance * | 0.734 | 0.783 | 0.758 |
| Best performance ** | 0.921 | 0.778 | 0.838 |
| Narjes Rohani et al. | 0.899 | 0.373 | 0.527 |
| Mei Liu et al. | 0.493 | 0.434 | N/A |
| Wen Zhang et al. | 0.765 | 0.617 | 0.683 |
* Average performance of our model; ** best performance of our model among all DDI types.