Changxiang He1, Yuru Liu1, Hao Li2, Hui Zhang3, Yaping Mao4, Xiaofei Qin2, Lele Liu5, Xuedian Zhang2. 1. College of Science, University of Shanghai for Science and Technology, Shanghai, 200093, China. 2. School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, 200093, China. 3. Institute of Interdisciplinary Integrative Medicine Research, Shanghai University of Traditional Chinese Medicine, Shanghai, 201203, China. 4. School of Mathematics and Statistics, Qinghai Normal University, Xining, 810008, China. 5. College of Science, University of Shanghai for Science and Technology, Shanghai, 200093, China. ahhylau@outlook.com.
Abstract
BACKGROUND: Drug-drug interactions (DDIs) are a challenging problem in drug research. Drug combination therapy is an effective way to treat diseases, but it can also cause serious side effects. Therefore, DDI prediction is critical in pharmacology. Recently, researchers have been using deep learning techniques to predict DDIs. However, these methods consider only a single type of drug information and have shortcomings in robustness and scalability. RESULTS: In this paper, we propose a graph neural network model based on multi-type feature fusion (MFFGNN) for DDI prediction, which can effectively fuse the topological information in molecular graphs, the interaction information between drugs, and the local chemical context in SMILES sequences. In MFFGNN, to fully learn the topological information of drugs, we propose a novel feature extraction module that captures the global features of the molecular graph and the local features of each atom in it. In addition, in the multi-type feature fusion module, we use a gating mechanism in each graph convolution layer to mitigate the over-smoothing problem during information delivery. We perform extensive experiments on multiple real datasets. The results show that MFFGNN outperforms several state-of-the-art models for DDI prediction. Moreover, cross-dataset experiments further show that MFFGNN has good generalization performance. CONCLUSIONS: Our proposed model can efficiently integrate the information from SMILES sequences, molecular graphs and drug-drug interaction networks. We find that a multi-type feature fusion model can accurately predict DDIs, which may contribute to discovering novel DDIs.
Drug-drug interactions (DDIs) refer to cases where one drug changes the pharmacological activity of another, which may produce side effects and even injury or death. At the same time, combining multiple drugs to treat diseases is often unavoidable, so predicting potential DDIs is crucial. Traditional methods of DDI prediction depend on in vivo and in vitro experiments. However, because of the limited laboratory environment, the small scale, and the cumbersome and expensive procedures, their ability to predict DDIs is greatly limited. Therefore, an efficient computational method is needed to predict DDIs.

In the past several years, methods based on machine learning [1-4] have been proposed to solve this problem. Qiu et al. [5] summarized several machine learning based methods. Deng et al. [6] used chemical structures to learn representations of DDIs in a representation module, and then predicted rare events with few examples in a comparing module. Deng et al. [7] predicted DDIs using different drug features with deep neural networks (DNNs). Zhang et al. [8] predicted DDIs using manifold regularization.

Recently, graph-based representation learning has been applied to DDI prediction. Drugs are compounds, each of which can be represented by a molecular graph, with atoms as nodes and chemical bonds as edges, or by a Simplified Molecular Input Line Entry System (SMILES) sequence. In drug-drug interaction networks, by treating drugs as nodes and interactions as edges, DDI prediction can be regarded as a link prediction task. Graph neural networks (GNNs) have made progress in DDI prediction [9-13]. Feng et al. [14] predicted DDIs using a Graph Convolutional Network (GCN) and a DNN. In addition, there are many methods for multi-type DDI prediction [15-17]. Nyamabo et al. [18] proposed to predict DDIs from the interactions between drug substructures. Nyamabo et al. [19] then used gating devices to learn the chemical substructures of drugs. Chen et al.
[20] used a bi-level cross strategy to fuse the structural information and knowledge graph information of drugs.

Although the models mentioned above have achieved significant results, some limitations remain: (i) they generally consider only the structure, sequence or interaction information of drugs, without the synergistic effects between them; (ii) for molecular graphs, applying a GNN alone extracts only local features for the atoms, and it is difficult to propagate information over long ranges in the graph to capture global features; (iii) in drug-drug interaction networks, node features obtained by stacking multiple GNN layers become smoothed and blurred, which loses the diversity of node features.

To address these issues, this paper proposes an end-to-end learning framework for DDI prediction, namely MFFGNN. In MFFGNN, we first utilize deep neural networks to capture intra-drug features from SMILES sequences and molecular graphs. For SMILES sequences, MFFGNN applies a bi-directional gated recurrent unit network [21] to extract local chemical context information. For molecular graphs, MFFGNN utilizes both graph interaction networks [22] and the graph warp unit [23] to extract the global features of the molecular graph and the local features of each atom. In addition, MFFGNN takes the intra-drug features as the initial node features in the DDI network and uses a GCN encoder to fuse the intra-drug features and external DDI features to update the drug representations.
Finally, we predict the missing interactions in the DDI graph through a multi-layer perceptron (MLP).

Overall, the main contributions of this paper are summarized as follows:
- We propose a novel model, MFFGNN, for DDI prediction, which fuses the topological information in molecular graphs, the interaction information between drugs, and the local chemical context in SMILES sequences.
- To better learn the topological structure of drugs, we propose a molecular graph feature extraction module (MGFEM) to extract the global features of the molecular graph and the local features of each atom.
- We conduct extensive experiments on three real datasets of different scales to demonstrate the superiority of our model.
Related works
Drug-drug interaction prediction
Drug-drug interaction prediction has always been a worthwhile research direction in pharmacology. Most previous work depended on in vivo and in vitro experiments. However, such experiments do not scale well owing to the limitations of the laboratory environment [24]. Subsequently, machine learning was proposed to solve this problem. Similarity-based methods calculated specific similarity measures [25-29], e.g., over drug structures, targets, side effects, genomic properties and therapeutic properties, and combined them with machine learning models for DDI prediction. Ryu et al. [30] predicted the type of drug-drug interactions using a DNN based on the similarity of the chemical structures of drugs. Graph-based methods predicted drug-drug interactions by learning from the molecular graph [31] or the interaction graph [32]. Shang et al. [33] modeled drugs as nodes and DDIs as links, thus casting the task as a link prediction problem.
Graph neural network
Recently, as a neural network method on the graph domain, graph neural networks (GNNs) have received great attention, and many GNN variants have been proposed one after another [34-36]. Rahimi et al. [37] proposed controlling the transmission of neighbourhood information through a gating operation. With the increasing popularity of GNNs, researchers have applied GNN models to DDIs [38]. For example, Duvenaud et al. [39] used a GNN to perform molecular modeling by extracting circular molecular fingerprints. Lin et al. [40] used a knowledge graph neural network (KGNN) to mine associated relations in a knowledge graph for DDI prediction. Bai et al. [41] proposed a Bi-level Graph Neural Network (BI-GNN) to learn drug feature representations for biological link prediction tasks. MIRACLE [42] is the work most relevant to ours.
Methods
Preliminaries
We define the drug set as $D = \{d_1, d_2, \ldots, d_n\}$ and its corresponding SMILES sequence set as $S = \{s_1, s_2, \ldots, s_n\}$, where n represents the number of drugs. We define the molecular graph as $G = (V, E)$, where $V$ and $E$ represent the sets of atoms and chemical bonds, respectively, and the interaction graph as $G_I = (D, L)$, where $L$ represents the links between drugs. We use $b$ to denote the dimension of the representations of atoms and chemical bonds and $d$ to denote the dimension of the representation of a drug.

Problem description. The DDI prediction problem is regarded as a link prediction task on the graph. The interaction graph can be represented by an adjacency matrix $A \in \{0, 1\}^{n \times n}$ with elements $a_{ij}$, where $a_{ij} = 1$ if drugs $d_i$ and $d_j$ interact and $a_{ij} = 0$ otherwise. Given two drug nodes, the DDI prediction problem is to predict whether there is an interaction between them.
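As a minimal illustration of this problem setup, the interaction graph can be stored as a symmetric adjacency matrix, and link prediction amounts to scoring the zero entries (the sizes and link list below are toy values, not from the paper's datasets):

```python
import numpy as np

n = 4                                  # number of drugs (toy value)
A = np.zeros((n, n), dtype=int)        # adjacency matrix of the interaction graph
known = [(0, 1), (1, 2)]               # known DDI links (toy values)
for i, j in known:
    A[i, j] = A[j, i] = 1              # interactions are undirected

# Link prediction: the candidates are the unknown drug pairs
candidates = [(i, j) for i in range(n) for j in range(i + 1, n) if A[i, j] == 0]
assert (0, 2) in candidates and (0, 1) not in candidates
```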
Overview of MFFGNN
The framework of MFFGNN is shown in Fig. 1 and is divided into the following four modules. In the Molecular Graph Feature Extraction Module (MGFEM), we use a graph interaction network with the graph warp unit to extract the topological structure features of a drug from its molecular graph. In the SMILES Sequence Feature Extraction Module (SSFEM), we employ a bi-directional gated recurrent unit to extract local chemical context from the SMILES sequence. In the Multi-type Feature Fusion Module (MFFM), we apply a GCN encoder to fuse the intra-drug features and external DDI features to update the drug representation. Finally, we predict the missing interactions in the DDI graph through an MLP.
Fig. 1
Overview of MFFGNN, where $\oplus$ denotes summation. MFFGNN uses SMILES sequences and molecular graphs as inputs, and extracts the intra-drug features through the MGFEM and SSFEM modules, respectively. Then, MFFGNN fuses the intra-drug features and external DDI features through the MFFM module to obtain the updated drug features. Finally, the final predicted value is obtained by the DDI predictor
Molecular graph feature extraction module
The Molecular Graph Feature Extraction Module (MGFEM) is shown in Fig. 2. Molecular graphs are an important representation of drugs. We use the RDKit [43] tool to construct the molecular graph from the SMILES sequence. First, we obtain the initial features of each atom from the atom symbol, formal charge, whether the atom is aromatic, its hybridization, chirality, etc. Similarly, we obtain the initial features of each bond from the bond type, whether the bond is in a ring, whether it is conjugated, etc. Then, the initial atom features $x_i$ and chemical bond features $e_{ij}$ are transformed through a single-layer neural network:

$$x_i^{(0)} = \sigma\left(W_a x_i\right), \qquad e_{ij}^{(0)} = \sigma\left(W_b e_{ij}\right),$$

where $\sigma$ is the activation function and $W_a$ and $W_b$ are learnable weight matrices. To fully extract atom and chemical bond features, we apply graph interaction networks [22]. In a graph interaction network, the features of each edge are first updated from the features of its two connected nodes and itself:

$$e_{ij}^{(l+1)} = \sigma\left(W_e \left(x_i^{(l)} \,\|\, x_j^{(l)} \,\|\, e_{ij}^{(l)}\right) + b_e\right),$$

where $\|$ is the concatenation operation, and $W_e$ and $b_e$ are the learnable weight matrix and bias of the edge update, respectively. Then, the features of each node are updated from the features of its connected edges and itself:

$$x_i^{(l+1)} = \sigma\left(W_v \left(x_i^{(l)} \,\|\, \sum_{j \in N(i)} e_{ij}^{(l+1)}\right)\right),$$

where N(i) represents the neighbors of node i.
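The edge-then-node update of a graph interaction network can be sketched in NumPy. This is a toy sketch, not the paper's implementation: the graph, feature dimension, activation and weight shapes are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.tanh  # activation function (assumed; the paper does not fix a choice)

# Toy molecular graph: 3 atoms, 2 bonds (0-1, 1-2), feature dimension d = 4
d = 4
x = rng.standard_normal((3, d))                  # atom features
edges = [(0, 1), (1, 2)]
e = {k: rng.standard_normal(d) for k in edges}   # bond features

W_e = rng.standard_normal((3 * d, d))            # edge-update weights
W_n = rng.standard_normal((2 * d, d))            # node-update weights

# 1) Edge update: each bond aggregates its two endpoint atoms and itself
e_new = {(i, j): sigma(np.concatenate([x[i], x[j], e[(i, j)]]) @ W_e)
         for (i, j) in edges}

# 2) Node update: each atom aggregates its incident (updated) bonds and itself
x_new = np.zeros_like(x)
for i in range(len(x)):
    nbr = sum((e_new[k] for k in e_new if i in k), np.zeros(d))
    x_new[i] = sigma(np.concatenate([x[i], nbr]) @ W_n)

print(x_new.shape)  # (3, 4)
```

Updating the edges before the nodes matches the order described for MGFEM: bond features carry the pairwise context that the subsequent node update aggregates.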
Fig. 2
Overview of MGFEM. The MGFEM module applies the graph interaction network and graph warp unit to extract local and global information of the molecular graph. When extracting local information, the module updates the edge features before the node features. When extracting global information, the module utilizes a supernode to promote the global propagation of information
The above process can only propagate the features of atoms and chemical bonds locally; it cannot spread information globally. Therefore, we extract the global features of the molecular graph by applying the graph warp unit (GWU) [23], since the properties of the whole drug often influence drug-drug interaction prediction. The GWU consists of three parts: a supernode, a transmitter and a warp gate.

Supernode: We add a supernode to the graph, connected to every atom in the molecular graph. The sum of all atom features is taken as the initial feature $s^{(0)}$ of the supernode:

$$s^{(0)} = \sum_{i \in V} x_i^{(0)}.$$

Then, the features of the supernode are updated by a single-layer neural network:

$$\tilde{s}^{(l+1)} = \sigma\left(W_s s^{(l)}\right),$$

where $W_s$ is a learnable weight matrix.

Transmitter: The transmitter gathers information from the atoms and the supernode. Before propagating the atom features to the supernode, we need to transform the form of the information. Different atom features have different degrees of importance relative to the global features, so the transmitter applies a multi-head attention mechanism to aggregate the atom features:

$$m_{v \to s}^{(l)} = \frac{1}{K} \sum_{k=1}^{K} \sum_{i \in V} \alpha_i^{(k,l)} \odot \tanh\left(W_k^{(l)} x_i^{(l)}\right),$$

where $m_{v \to s}^{(l)}$ represents the information propagated from the atoms to the supernode at layer l, $\alpha_i^{(k,l)}$ represents the significance score of node i at head k and layer l, $\odot$ represents the element-wise product, and K represents the number of heads.
The information propagated from the supernode to each atom is calculated analogously:

$$m_{s \to v}^{(l)} = \tanh\left(W_m^{(l)} s^{(l)}\right),$$

where $m_{s \to v}^{(l)}$ represents the information propagated from the supernode to each atom at layer l.

Warp Gate: The warp gate combines the transmitted information and sets gating coefficients to control the fusion. For each atom, gated interpolation is used to fuse the information from the supernode with the updated atom features $\tilde{x}_i^{(l+1)}$:

$$z_i^{(l)} = \sigma\left(W_z \left(\tilde{x}_i^{(l+1)} \,\|\, m_{s \to v}^{(l)}\right)\right), \qquad u_i^{(l+1)} = \left(1 - z_i^{(l)}\right) \odot \tilde{x}_i^{(l+1)} + z_i^{(l)} \odot m_{s \to v}^{(l)},$$

where $z_i^{(l)}$ represents the gating coefficient for the transmission from the supernode to each atom and $u_i^{(l+1)}$ represents the information transmitted to each atom. For the supernode, gated interpolation is used to fuse the information from the atoms with the updated supernode features $\tilde{s}^{(l+1)}$:

$$z_s^{(l)} = \sigma\left(W_u \left(\tilde{s}^{(l+1)} \,\|\, m_{v \to s}^{(l)}\right)\right), \qquad u_s^{(l+1)} = \left(1 - z_s^{(l)}\right) \odot \tilde{s}^{(l+1)} + z_s^{(l)} \odot m_{v \to s}^{(l)},$$

where $z_s^{(l)}$ represents the gating coefficient for the transmission from the atoms to the supernode and $u_s^{(l+1)}$ represents the information transmitted to the supernode. Finally, the updated features of each atom and of the supernode are computed with gated recurrent units (GRU) [44]:

$$x_i^{(l+1)} = \mathrm{GRU}\left(x_i^{(l)}, u_i^{(l+1)}\right), \qquad s^{(l+1)} = \mathrm{GRU}\left(s^{(l)}, u_s^{(l+1)}\right).$$

By applying this module to the whole dataset, we obtain the feature matrix $X_g$ based on the molecular graphs.
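The warp gate's gated interpolation can be sketched as follows. This is a simplified toy sketch (single head, linear transmitter, illustrative shapes), not the GWU implementation itself.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 4
x = rng.standard_normal((3, d))      # updated atom features (3 atoms)
s = x.sum(axis=0)                    # supernode initialised as the sum of atoms

# Message from supernode to atoms (single head, linear transmitter for brevity)
W_t = rng.standard_normal((d, d))
m_atoms = s @ W_t                    # same message broadcast to every atom

# Warp gate: a per-atom gating coefficient controls the fusion
W_g = rng.standard_normal((2 * d, 1))
g = sigmoid(np.concatenate([x, np.tile(m_atoms, (3, 1))], axis=1) @ W_g)
x_fused = g * m_atoms + (1 - g) * x  # gated interpolation

assert x_fused.shape == (3, d)
```

The gate lets each atom decide how much global (supernode) information to absorb versus how much of its own local features to keep.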
SMILES sequence feature extraction module
Drugs are commonly represented by SMILES sequences, which are composed of molecular symbols. SMILES sequences contain rich features complementary to molecular graphs: the molecular graph of a drug describes how its atoms are connected, while the SMILES sequence provides functional information about the atoms and long-term dependency representations. To capture the local chemical context in SMILES sequences, we first utilize an embedding method to construct an atomic embedding matrix, and then input it into a Bi-directional Gated Recurrent Unit (BiGRU) network to obtain the representation of the entire drug. The SMILES Sequence Feature Extraction Module (SSFEM) is shown in Fig. 3.
Fig. 3
Overview of SSFEM. The SSFEM module applies Smi2Vec and BiGRU to extract features from SMILES sequences. Then, the whole drug features are obtained through the readout layer
Nowadays, most methods encode SMILES sequences by label or one-hot encoding. However, one-hot and label encoding ignore the contextual information of atoms. Therefore, to capture the function of each atom in its context, we encode SMILES sequences with an advanced embedding method, Smi2Vec [45]. Specifically, we split each SMILES sequence into a series of atomic symbols, map each atom to an embedding vector according to a pre-trained embedding dictionary, and aggregate the embedding vectors into an embedding matrix $M \in \mathbb{R}^{m \times d}$, in which m is the number of atoms and each row is the embedding of an atom.

We apply a layer of BiGRU [21] to the embedding matrix $M$. BiGRU processes the input with two GRUs running in opposite directions, as shown in Fig. 3. The hidden states of the two directions are

$$\overrightarrow{h}_t = \mathrm{GRU}\left(x_t, \overrightarrow{h}_{t-1}\right), \qquad \overleftarrow{h}_t = \mathrm{GRU}\left(x_t, \overleftarrow{h}_{t+1}\right),$$

where GRU denotes a non-linear transformation of the input vector. The hidden state at time t can then be expressed as the weighted sum of $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$:

$$h_t = w_t \overrightarrow{h}_t + v_t \overleftarrow{h}_t + b_t,$$

where $w_t$ and $v_t$ represent the weights and $b_t$ represents the bias. Then, we use a fully connected layer as the readout layer to obtain the drug representation. By applying this module to the whole dataset, we obtain the sequence-based feature matrix $X_s$.

Note that the BiGRU layer requires a fixed-size input matrix, while the length of SMILES sequences varies. We use the approximate average sequence length in the dataset as the fixed length and apply zero-padding and cutting operations.
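The fixed-length preprocessing and the combination of the two BiGRU directions can be sketched as below. The hidden states are assumed precomputed and all shapes and weights are illustrative, not the paper's configuration.

```python
import numpy as np

def pad_or_cut(emb, fixed_len):
    """Zero-pad or truncate an (m, d) atom-embedding matrix to (fixed_len, d)."""
    m, d = emb.shape
    if m >= fixed_len:
        return emb[:fixed_len]
    return np.vstack([emb, np.zeros((fixed_len - m, d))])

rng = np.random.default_rng(2)
emb = rng.standard_normal((7, 8))        # 7 atoms, embedding dimension 8
fixed = pad_or_cut(emb, 10)              # pad up to the fixed length 10
assert fixed.shape == (10, 8)

# BiGRU combination (hidden states assumed precomputed here):
# h_t = w * h_fwd + v * h_bwd + b, a weighted sum of the two directions
h_fwd, h_bwd = rng.standard_normal((10, 8)), rng.standard_normal((10, 8))
w, v, b = 0.5, 0.5, 0.0                  # illustrative scalar weights
h = w * h_fwd + v * h_bwd + b
assert h.shape == (10, 8)
```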
Multi-type feature fusion module
We combine the feature matrices $X_g$ and $X_s$ obtained above to obtain the intra-drug features $X = X_g + X_s$. To fuse the intra-drug features with the external DDI features, we design a GCN encoder with a gating mechanism. Specifically, we take the intra-drug features as the initial node features in the interaction graph, and then update the node representations by a multi-layer GCN. The Multi-type Feature Fusion Module (MFFM) is shown in Fig. 4.
Fig. 4
Overview of MFFM, where $g$ denotes the gating weight and $1-g$ its complement. The MFFM takes the intra-drug features as the initial node features in the DDI network, and then updates the node representations by a multi-layer graph convolutional network with gating
For drug $d_i$, the output of layer $l+1$ is

$$\tilde{h}_i^{(l+1)} = \sigma\left(\sum_{j} \hat{a}_{ij} W^{(l)} h_j^{(l)}\right),$$

where $W^{(l)}$ is a learnable weight matrix and $\hat{a}_{ij}$ is the component of the normalized adjacency matrix $\hat{A} = \tilde{D}^{-\frac{1}{2}} (A + I) \tilde{D}^{-\frac{1}{2}}$, with $\tilde{D}$ the degree matrix of $A + I$. We can add multiple GCN layers to expand the neighborhood of label propagation, but this may also increase noisy information. Meanwhile, neighborhoods of different orders contain different information. Therefore, we utilize a gating mechanism [37] to control how much neighborhood information is passed to each node:

$$g^{(l)} = \mathrm{sigmoid}\left(W_g^{(l)} h^{(l)} + b_g^{(l)}\right), \qquad h^{(l+1)} = g^{(l)} \odot \tilde{h}^{(l+1)} + \left(1 - g^{(l)}\right) \odot h^{(l)},$$

where $g^{(l)}$ represents the gating weight of layer l, and $W_g^{(l)}$ and $b_g^{(l)}$ are the weight matrix and bias of layer l. After the multi-layer GCN, we finally obtain the feature matrix $H$ for the drugs in the DDI network.

In addition, inspired by MIRACLE, the module uses a graph contrastive learning approach to balance the information inside and outside of each drug. For drug $d_i$, we take the drug itself and its first-order neighbors as positive samples P and the nodes outside its first-order neighborhood as negative samples N. We design a learning objective that makes the external features of a drug consistent with the internal features of its positive samples and distinct from the internal features of its negative samples:

$$\mathcal{L}_{con} = -\sum_{p \in P} \log \mathcal{D}\left(h_i, x_p\right) - \sum_{q \in N} \log\left(1 - \mathcal{D}\left(h_i, x_q\right)\right),$$

where $\mathcal{D}$ is the discriminator function, which scores the agreement between its two input vectors; here we set it to the dot product operation.
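One gated GCN layer of this kind can be sketched in NumPy as follows. The toy DDI graph, dimensions and weights are illustrative assumptions; the point is the symmetric normalization plus the per-node gate that blends the convolved features with the previous layer's features.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy DDI graph: 4 drugs, symmetric adjacency; add self-loops before normalising
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
A_hat = A + np.eye(4)
D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalisation

d = 4
H = rng.standard_normal((4, d))              # intra-drug features as initial nodes
W = rng.standard_normal((d, d))
W_g = rng.standard_normal((d, 1))
b_g = 0.0

H_conv = np.tanh(A_norm @ H @ W)             # plain GCN propagation
g = sigmoid(H @ W_g + b_g)                   # per-node gating weight
H_next = g * H_conv + (1 - g) * H            # gate limits over-smoothing

assert H_next.shape == (4, d)
```

Because the gate can keep most of the previous features, stacking several such layers does not force all node representations toward the same smoothed average.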
DDI prediction
First, we obtain an interaction link representation by the element-wise product of the two drug representations. Then, we input it into the MLP to get the prediction score:

$$\hat{y}_{ij} = \mathrm{MLP}\left(h_i \odot h_j\right),$$

where the MLP consists of two fully connected layers. Our learning objective is to minimize the distance between the predictions and the true labels:

$$\mathcal{L}_{pred} = -\sum_{(i,j)} \left(y_{ij} \log \hat{y}_{ij} + \left(1 - y_{ij}\right) \log\left(1 - \hat{y}_{ij}\right)\right),$$

where $y_{ij}$ is the real label for drug pair $(d_i, d_j)$. Then, we unify the DDI prediction task and the contrastive learning task into one learning framework. Formally, the learning objective of our model is

$$\mathcal{L} = \mathcal{L}_{pred} + \beta \mathcal{L}_{con},$$

where $\beta$ is a hyper-parameter used to control the magnitude of the contrastive task.
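The predictor and the joint objective can be sketched for a single drug pair. The MLP sizes, activation, label and placeholder contrastive loss below are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 8
h_i, h_j = rng.standard_normal(d), rng.standard_normal(d)
link = h_i * h_j                           # element-wise product link representation

# Two-layer MLP predictor (illustrative sizes, ReLU hidden layer)
W1, W2 = rng.standard_normal((d, d)), rng.standard_normal((d, 1))
score = sigmoid(np.maximum(link @ W1, 0) @ W2).item()

# Binary cross-entropy against the true label, plus the weighted contrastive term
y, L_con, beta = 1.0, 0.42, 0.1            # label / L_con / beta are placeholders
L_pred = -(y * np.log(score) + (1 - y) * np.log(1 - score))
L_total = L_pred + beta * L_con
assert 0.0 < score < 1.0
```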
Results
In this section, we design various experiments to demonstrate the superiority of the model MFFGNN.
Experimental setup
Datasets. To verify the validity of our model on datasets of different scales, we evaluate the proposed model on small, medium and large datasets. In the small-scale dataset, the number of drugs is relatively small, but fingerprints are available for all drugs. In the medium-scale dataset, the number of drugs is larger, but the number of labeled DDI links is about the same as in the small-scale dataset. In the large-scale dataset, most drugs lack many of their fingerprints. Detailed information about the datasets is given in Table 1.
Table 1
Detailed information about the datasets
Dataset          Drugs   DDI links   Information
ZhangDDI [46]    548     48,548      Similarity
ChCh-Miner [47]  1514    48,514      –
DeepDDI [30]     1861    192,284     Polypharmacy side-effect
Note that we removed from the datasets the SMILES sequences from which a molecular graph cannot be constructed.

Baselines. To demonstrate the superiority of our model, we compare MFFGNN with the following state-of-the-art models:

SSP-MLP [30]: uses the names and structural information of drug-drug or drug-food pairs as inputs and applies the Structural Similarity Profile (SSP) and an MLP for classification.
Multi-Feature Ensemble [46]: combines multiple types of data in a collective framework; we name this model Ens.
GCN [48]: applies GCN to semi-supervised node classification; we use it to extract structural information of drugs for DDI prediction.
GAT [35]: uses graph attention networks for node classification; we apply it to extract drug features from the interaction graph for DDI prediction.
SEAL-C/AI [49]: performs semi-supervised graph classification from a hierarchical graph perspective; we apply it to obtain drug features for DDI prediction.
NFP-GCN [39]: a GCN designed for learning molecular fingerprints.
MIRACLE [42]: simultaneously learns inter-view molecular structure information and intra-view interaction information of drugs for DDI prediction.
MFs [50]: uses only molecular fingerprints as input to the DDI network to predict DDIs.

We also consider several multi-type DDI prediction methods and apply them to binary classification tasks: DPDDI [14], SSI-DDI [18], DDIMDL [7] and MUFFIN [20].

Implementation details. We split the datasets in the same way as MIRACLE [42]: 80% of each dataset forms the training set and 20% the test set, and 20% of the training set is randomly sampled as the validation set. The datasets contain only positive drug pairs; for negative training samples, we select an equal number of negative drug pairs [51]. We utilize the Adam [52] optimizer to train the model and Xavier [53] initialization to initialize it. We utilize exponential decay for the learning rate, with an initial learning rate of 0.0001 and a multiplication factor of 0.96. The model applies a dropout [54] layer with rate 0.3 to the output of each intermediate layer. We set the dimension of the atom-level and drug-level representations to 256, and the number of heads K in the multi-head attention mechanism is set as a hyper-parameter. To evaluate the effectiveness of MFFGNN, we consider three metrics: Area Under the Receiver Operating Characteristic curve (AUROC), Area Under the Precision-Recall Curve (AUPRC) and F1.
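For reference, AUROC can be computed as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair (ties counted as half). A small self-contained sketch with toy labels and scores:

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC as the probability that a random positive outranks a random negative."""
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    pairs = pos[:, None] - neg[None, :]
    return np.mean(pairs > 0) + 0.5 * np.mean(pairs == 0)

y = np.array([1, 1, 0, 0, 1, 0])
s = np.array([0.9, 0.5, 0.3, 0.4, 0.6, 0.5])
print(round(auroc(y, s), 4))  # 0.9444
```

In practice library implementations (e.g. scikit-learn) are preferable; this pairwise form is only to make the metric's meaning concrete.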
Comparison results
To verify the validity of the proposed MFFGNN, we compare it with state-of-the-art models for DDI prediction on three datasets of different scales. We report the mean and standard deviation over ten repeated runs; the best results are highlighted in bold.

Comparison on the ZhangDDI dataset. We compare MFFGNN with state-of-the-art models on the ZhangDDI dataset; the results are shown in Table 2. The results of these baselines are obtained from Table 2 in Ref. [42]. As can be seen, the methods that consider multiple features, such as Ens, SEAL-C/AI, NFP-GCN and MIRACLE, perform better than the methods that consider only one feature, and MFFGNN performs best of all. MFFGNN considers not only the topological structure information in molecular graphs and the interaction information between drugs, but also the local chemical context in SMILES sequences. This indicates that multi-type feature fusion can improve the performance of the model.
Table 2
Comparison results on ZhangDDI dataset
Method      AUROC         AUPRC         F1
SSP-MLP     92.51 ± 0.15  88.51 ± 0.66  80.69 ± 0.81
Ens         95.20 ± 0.14  92.51 ± 0.15  85.41 ± 0.16
GCN         91.91 ± 0.62  88.73 ± 0.84  81.61 ± 0.39
GAT         91.49 ± 0.29  90.69 ± 0.10  80.93 ± 0.25
SEAL-C/AI   92.93 ± 0.19  92.82 ± 0.17  84.74 ± 0.17
NFP-GCN     93.22 ± 0.09  93.07 ± 0.46  85.29 ± 0.17
MIRACLE     98.95 ± 0.15  98.17 ± 0.06  93.20 ± 0.27
MFFGNN      99.06 ± 0.08  98.83 ± 0.16  97.97 ± 0.25
Comparison on the ChCh-Miner dataset. Because the ChCh-Miner dataset lacks fingerprint and side-effect information, we compare MFFGNN only with the graph-based models; the results are shown in Table 3. The results of these baselines are obtained from Table 3 in Ref. [42]. As shown in Table 3, MFFGNN outperforms all baselines on all metrics, indicating that it remains effective on a dataset with few labeled data. In addition, we vary the amount of labeled training data by adjusting the proportion of the training set on the ChCh-Miner dataset, which allows us to analyze the robustness of MFFGNN. We compare MFFGNN with the other methods, and the results are shown in Fig. 5a. The results show that MFFGNN performs well even with a small amount of labeled data. The reasons could be that (i) our model fuses topological structure, local chemical context and DDI relationships; (ii) it extracts both the global features of the molecular graph and the local features of its atoms; and (iii) it sets a gating mechanism for each graph convolution layer to prevent over-smoothing when stacking multiple GCN layers.
Table 3
Comparison results on ChCh-Miner dataset
Method      AUROC         AUPRC         F1
GCN         82.84 ± 0.61  84.27 ± 0.66  70.54 ± 0.87
GAT         85.84 ± 0.23  88.14 ± 0.25  76.51 ± 0.38
SEAL-C/AI   90.93 ± 0.19  89.38 ± 0.39  84.74 ± 0.48
NFP-GCN     92.12 ± 0.09  93.07 ± 0.69  85.41 ± 0.18
MIRACLE     96.15 ± 0.29  95.57 ± 0.19  92.26 ± 0.09
MFFGNN      97.02 ± 0.25  98.45 ± 0.06  96.94 ± 0.39
Fig. 5
Experimental results on ChCh-Miner dataset
Comparison on the DeepDDI dataset. To verify the scalability of MFFGNN, we perform comparative experiments on the DeepDDI dataset; the results are shown in Table 4. Because similarity and side-effect information may be missing in this large-scale dataset, among the similarity-based methods we include only SSP-MLP, and we omit NFP-GCN because of its poor performance and for space reasons. We use 881-dimensional molecular fingerprints as the initial node features in the DDI graph for DDI prediction. Meanwhile, we reduce the multi-type DDI prediction methods to binary classification and obtain binary prediction results on the DeepDDI dataset.
Table 4
Comparison results on DeepDDI dataset
Method      AUROC         AUPRC         F1
SSP-MLP     92.28 ± 0.18  90.27 ± 0.28  79.71 ± 0.16
GCN         85.53 ± 0.17  83.27 ± 0.31  72.18 ± 0.22
GAT         84.84 ± 0.23  81.14 ± 0.25  73.51 ± 0.38
SEAL-C/AI   92.83 ± 0.19  90.44 ± 0.39  80.70 ± 0.48
MFs         91.54 ± 0.04  89.82 ± 0.24  83.05 ± 0.50
DPDDI       92.79 ± 0.38  91.15 ± 0.52  85.54 ± 0.40
SSI-DDI     96.14 ± 0.06  94.63 ± 0.47  92.27 ± 0.14
DDIMDL      94.85 ± 0.71  93.48 ± 0.07  82.31 ± 0.44
MUFFIN      95.26 ± 0.12  94.47 ± 0.28  91.22 ± 0.48
MIRACLE     95.51 ± 0.27  92.34 ± 0.17  83.60 ± 0.33
MFFGNN      95.39 ± 0.25  96.81 ± 0.16  92.54 ± 0.61
The best results are highlighted in bold
As shown in Table 4, MFFGNN achieves high AUROC, AUPRC and F1. The MFs model, which uses only one drug feature, is relatively poor on all metrics: a single feature cannot comprehensively represent drug information, which ultimately affects the prediction results. MFFGNN, in contrast, integrates the features from drug sequences and molecular graphs as input to the DDI graph, so more comprehensive drug information can be learned. Although the SSI-DDI and MIRACLE models achieve a higher AUROC than MFFGNN, MFFGNN has the highest AUPRC and F1 values. In general, the AUPRC metric is more informative than AUROC here because it penalizes false-positive DDIs more strongly, while F1 reflects the proportion of correctly predicted DDIs. The class imbalance in the DeepDDI dataset may negatively affect the AUROC of our model, but this does not alter the overall performance of MFFGNN.

Cross-dataset evaluations. To further evaluate the generalization performance of MFFGNN, we perform cross-dataset evaluations: one dataset serves as the training set while the other two serve as test sets. Because of the poor performance of the remaining methods, we compare MFFGNN with three methods, GAT, SEAL-C/AI and MIRACLE; the results are shown in Fig. 6. As the figures show, MFFGNN outperforms the other methods on AUROC, AUPRC and F1. These results show that our model predicts drug-drug interactions with stable accuracy, independent of the scale of the datasets, which further verifies its good generalization performance.
Fig. 6
Cross-dataset experimental results
Ablation study
To verify the validity of each type of drug feature, we carry out DDI prediction on the ChCh-Miner dataset using each feature type individually and in combination, where S denotes the SMILES sequence, M the molecular graph and I the interaction information. The experimental results are shown in Table 5.
Table 5
The performance of different types of features on ChCh-Miner dataset
Method    AUROC          AUPRC          F1
S         90.17 ± 0.04   90.27 ± 0.18   89.14 ± 0.08
M         92.87 ± 0.74   92.55 ± 0.40   90.93 ± 0.56
I         93.23 ± 0.01   92.74 ± 0.15   90.28 ± 0.31
S+I       96.01 ± 0.83   96.89 ± 0.76   94.99 ± 0.23
S+M       95.49 ± 0.72   95.33 ± 0.54   95.02 ± 0.16
M+I       96.25 ± 0.05   97.23 ± 0.02   94.87 ± 0.05
S+M+I     97.02 ± 0.25   98.45 ± 0.06   96.94 ± 0.39
The best results are highlighted in bold
S SMILES sequence, M molecular graph, I interaction
As shown in Table 5, removing any one of the three feature types degrades performance; the model performs best when all three are considered simultaneously. Among the single-feature variants, using only the interaction information between drugs or only the topological information of the molecular graph performs well. Among the pairwise combinations, combining the interaction information with the molecular-graph topology performs best, and every pairwise combination significantly improves on the single-feature variants. This suggests that multi-feature integration represents drugs better and improves prediction results.
Our model considers both the global features of the molecular graph and the local features of its atoms. To study their effectiveness, we design a variant, –GWU, that ignores the global information in molecular graphs. As shown in Table 6, deleting the global features damages performance. To study the validity of contrastive learning, we design another variant, –Contrastive, which removes contrastive learning from the framework. As shown in Table 6, –Contrastive is inferior to MFFGNN on all metrics, which indicates that contrastive learning is beneficial for drug feature learning.
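The S/M/I ablation can be pictured as a switch over which per-drug feature vectors are concatenated before the fused representation enters the DDI graph. This is a hypothetical sketch with illustrative names and dimensions, not the paper's fusion module:

```python
# Hypothetical S/M/I ablation switch: each drug has a SMILES-sequence
# embedding (S), a molecular-graph embedding (M) and an interaction
# embedding (I); a variant keeps only the selected feature types.

def fuse(features, use=("S", "M", "I")):
    """Concatenate the selected per-drug feature vectors in a fixed order."""
    out = []
    for key in ("S", "M", "I"):
        if key in use:
            out.extend(features[key])
    return out

drug = {"S": [0.1, 0.2], "M": [0.3, 0.4], "I": [0.5, 0.6]}
print(fuse(drug, use=("M", "I")))  # [0.3, 0.4, 0.5, 0.6] -> the M+I variant
print(len(fuse(drug)))             # 6 -> the full S+M+I variant
```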
Table 6
Ablation experimental results on ChCh-Miner dataset
Method        AUROC          AUPRC          F1
–GWU          95.89 ± 0.15   97.26 ± 0.18   94.97 ± 0.67
–Gating       96.28 ± 0.23   97.78 ± 0.31   95.28 ± 0.20
–Contrastive  96.07 ± 0.28   97.85 ± 0.15   94.38 ± 0.06
MFFGNN        97.02 ± 0.25   98.45 ± 0.06   96.94 ± 0.39
The best results are highlighted in bold
MFFGNN contains a GCN encoder with a gating mechanism to fully utilize neighborhood information of different orders. To study its effectiveness, we conduct a comparative experiment with and without gating; the results are shown in Table 6. The model without gating performs worse than the model with gating, which shows that the gated GCN encoder is beneficial for DDI prediction. Fig. 5b gives an intuitive view of the effectiveness of each component of the proposed MFFGNN.
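The idea behind the gated propagation step can be sketched as follows. This is an assumed, minimal form of a gated graph-convolution update (scalar node features, no learned weights beyond two gate coefficients), not the paper's exact equations: a sigmoid gate decides how much aggregated neighborhood information replaces each node's own feature, which limits over-smoothing compared with a plain mean update.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_gcn_step(h, adj, w_self=1.0, w_neigh=1.0):
    """One propagation step with per-node gating; h maps node -> scalar feature."""
    new_h = {}
    for v, feat in h.items():
        neigh = [h[u] for u in adj[v]]
        agg = sum(neigh) / len(neigh) if neigh else 0.0
        g = sigmoid(w_self * feat + w_neigh * agg)  # gate computed from self + neighborhood
        new_h[v] = g * feat + (1.0 - g) * agg       # gated mix instead of a full overwrite
    return new_h

# Tiny 3-node path graph a - b - c.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
h = {"a": 1.0, "b": 0.0, "c": -1.0}
print(gated_gcn_step(h, adj))
```

With the gate, nodes "a" and "c" keep most of their own signal rather than collapsing toward the neighborhood mean, which is the over-smoothing behavior the gating is meant to suppress.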
Parameter analysis
In this section, we analyze several key parameters of the model by performing experiments on the ZhangDDI dataset: the weight of the contrastive term in the objective function, the dimensionality of the drug representation, the sequence length, the learning rate, the number of GCN layers, and k of the k-head attention in the MGFEM module. We study the influence of each parameter on MFFGNN while fixing the others.
To find the optimal weight of the contrastive term in the objective function, we vary it from 0.1 to 1.0 with the other parameters fixed; the results are shown in Fig. 7a. All three metrics are optimal when the weight is set to 0.9. More generally, the fact that the optimal weight is non-zero confirms the importance of contrastive learning in the model.
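The weighted objective studied in Fig. 7a can be sketched as a supervised DDI loss plus a contrastive term scaled by a weight. The InfoNCE-style contrastive term below is an assumed form (unnormalized dot-product similarity, placeholder embeddings), not the paper's exact loss:

```python
import math

def nt_xent_pair(z1, z2, negatives, tau=0.5):
    """Toy InfoNCE-style term: pull two views of the same drug together,
    push away other drugs' embeddings (dot-product similarity, temperature tau)."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    pos = math.exp(dot(z1, z2) / tau)
    neg = sum(math.exp(dot(z1, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))

def total_loss(supervised, contrastive, lam=0.9):
    # lam weights the contrastive term; the grid search over 0.1..1.0
    # in Fig. 7a finds 0.9 best on ZhangDDI, and lam = 0 disables it.
    return supervised + lam * contrastive
```

Matching views of the same drug yield a smaller contrastive term than mismatched views, so minimizing the weighted sum pushes drug embeddings toward view-consistency alongside the supervised objective.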
Fig. 7
Parameter study on ZhangDDI dataset
When training the BiGRU, we need a fixed-size input matrix, but the length of SMILES sequences varies. We therefore fix the length of the input sequence and apply zero-padding and cutting. To find the optimal sequence length, we vary it from 50 to 250 with the other parameters fixed; the results are shown in Fig. 7b. Because most SMILES sequences in the dataset are longer than 100 and shorter than 150, the model performs best at a length of 150: most sequences then need no cutting, so little information is lost. With much shorter lengths, most sequences are cut and lose information, and performance drops. When the sequence length is greater than 150, the performance degradation is trivial even though zero-padding is applied, because enough sequence information is already contained.
To find the optimal dimensionality of the drug representation, we vary it from 2 to 1024 with the other parameters fixed; the results are shown in Fig. 7c. All three metrics are optimal when the dimensionality is set to 256. As the dimensionality increases, MFFGNN can extract more useful information, but a dimensionality that is too high may add noise and degrade performance. Similarly, we vary the learning rate with the other parameters fixed (Fig. 7d); the model performs best at a learning rate of 0.0001.
Finally, we study the number of GCN layers and k of the k-head attention in the MGFEM module, varying each from 1 to 4 with the other parameters fixed; the results are shown in Fig. 7e, f. As the number of GCN layers increases, MFFGNN's performance improves up to an optimum, after which too many layers may cause overfitting and degrade performance. The k-head attention shows a similar pattern, with the best performance at an intermediate value of k.
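The fixed-length preprocessing described above amounts to cutting token sequences to a maximum length or zero-padding up to it, so the BiGRU always sees a fixed-size input. A minimal sketch, assuming token-index sequences and a pad index of 0 (the max_len of 150 follows the best setting reported above):

```python
# Pad-or-cut preprocessing for variable-length SMILES token sequences.

def pad_or_cut(token_ids, max_len=150, pad_id=0):
    if len(token_ids) >= max_len:
        return token_ids[:max_len]                # cutting loses tail information
    return token_ids + [pad_id] * (max_len - len(token_ids))

short = list(range(1, 6))                          # a 5-token sequence
long = list(range(1, 201))                         # a 200-token sequence
print(len(pad_or_cut(short)), len(pad_or_cut(long)))  # 150 150
```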
Discussions
Drug-drug interaction prediction has long been a worthwhile research direction in pharmacology. Most existing methods consider only a single drug feature, which cannot comprehensively represent drug information and ultimately limits prediction quality. Our proposed model takes into account not only the topological information in molecular graphs and the interaction information between drugs, but also the local chemical context in SMILES sequences; multiple drug features represent drug information more comprehensively. We perform DDI prediction with each feature type and with combinations of features, and the experimental results are shown in Table 5: the model performs best when the three feature types are considered simultaneously.
When extracting information from the molecular graph, we extract both the local features of the atoms and the global feature of the whole molecular graph, which facilitates the long-range propagation of information in the graph. We demonstrate the importance of the global features in the ablation experiments (Table 6). In addition, to verify that MFFGNN generalizes well, we perform cross-dataset evaluations (Fig. 6); our model predicts drug-drug interactions with stable accuracy regardless of the scale of the dataset. However, our model also has limitations: for example, it does not extend to multi-type DDI prediction tasks. In future work, we will generalize the model to predict multi-type DDI events.
Conclusions
In this paper, we propose a novel end-to-end learning framework for DDI prediction, namely MFFGNN, which can efficiently fuse the information from drug molecular graphs, SMILES sequences and DDI graphs. The MFFGNN model utilizes the molecular graph feature extraction module to extract global and local features in molecular graphs. Moreover, in the multi-type feature fusion module, we set up the gating mechanism to control how much neighborhood information is passed to the node. We perform extensive experiments on multiple real datasets. The results show that the MFFGNN model consistently outperforms other state-of-the-art models.