| Literature DB >> 33266753 |
Xia Sun, Ke Dong, Long Ma, Richard Sutcliffe, Feijuan He, Sushing Chen, Jun Feng.
Abstract
Drug-drug interactions (DDIs) can pose serious health risks and dangerous effects when a patient takes two or more drugs at the same time or within a certain period. Therefore, the automatic extraction of unknown DDIs has great potential for the development of pharmaceutical agents and for the safety of drug use. In this article, we propose a novel recurrent hybrid convolutional neural network (RHCNN) for DDI extraction from the biomedical literature. In the embedding layer, texts mentioning two entities are represented as a sequence of semantic embeddings and position embeddings. In particular, the complete semantic embedding is obtained by information fusion between a word embedding and its contextual information, which is learnt by a recurrent structure. After that, the hybrid convolutional neural network is employed to learn sentence-level features, which consist of local context features from consecutive words and dependency features between separated words, for DDI extraction. Finally, and most significantly, to make up for the defects of the traditional cross-entropy loss function when dealing with class-imbalanced data, we apply an improved focal loss function to mitigate this problem on the DDIExtraction 2013 dataset. In our experiments, we achieve automatic DDI extraction with a micro F-score of 75.48% on the DDIExtraction 2013 dataset, outperforming the state-of-the-art approach by 2.49%.
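The class-imbalance handling described above builds on the focal loss. The sketch below is the standard multi-class focal loss (not the authors' exact improved variant); the `alpha` weights and `gamma` value are illustrative, minimal-sketch assumptions:

```python
import numpy as np

def focal_loss(probs, labels, alpha, gamma=2.0):
    """Standard multi-class focal loss (a sketch; the paper uses an improved variant).

    probs  : (N, C) softmax probabilities
    labels : (N,) integer class indices
    alpha  : (C,) per-class weights (larger for rarer classes)
    gamma  : focusing parameter; gamma = 0 recovers weighted cross-entropy
    """
    p_t = probs[np.arange(len(labels)), labels]   # probability of the true class
    a_t = np.asarray(alpha, dtype=float)[labels]  # per-class weight for each sample
    # (1 - p_t)^gamma down-weights easy, well-classified examples,
    # so gradients concentrate on hard and minority-class instances.
    return float(np.mean(-a_t * (1.0 - p_t) ** gamma * np.log(p_t)))

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
ce = focal_loss(probs, labels, alpha=[1.0, 1.0], gamma=0.0)  # plain cross-entropy
fl = focal_loss(probs, labels, alpha=[1.0, 1.0], gamma=2.0)  # focal: easy cases shrink
```

With `gamma = 0` and uniform `alpha` this reduces exactly to mean cross-entropy, which is the baseline the paper compares against.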
Keywords: convolutional neural network; cross-entropy; dilated convolutions; drug-drug interaction; focal loss; relation extraction
Year: 2019 PMID: 33266753 PMCID: PMC7514143 DOI: 10.3390/e21010037
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.524
Figure 1. The architecture of the recurrent hybrid convolutional neural network model. The figure shows an example sentence after drug blinding: “DRUG1 modifies DRUG2 metabolism with increased serum levels of DRUG0.” (DDIExtraction 2013 dataset, file Acetazolamide-ddi.xml, sentence DDI-DrugBank.d368.s0).
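The "hybrid" convolutions pair ordinary filters over consecutive words with dilated filters that reach separated words (see the keywords). A minimal single-channel sketch of the difference, with toy inputs:

```python
import numpy as np

def conv1d(x, w, dilation=1):
    """Valid 1-D convolution over a single feature channel (toy sketch).

    With dilation > 1 the kernel taps skip tokens, so the same kernel
    size covers words that are separated in the sentence.
    """
    k = len(w)
    span = (k - 1) * dilation + 1  # receptive field in tokens
    return np.array([
        sum(w[j] * x[t + j * dilation] for j in range(k))
        for t in range(len(x) - span + 1)
    ])

x = np.arange(10, dtype=float)
local = conv1d(x, [1.0, 1.0, 1.0])               # consecutive words
wide = conv1d(x, [1.0, 1.0, 1.0], dilation=2)    # separated words
print(local)  # [ 3.  6.  9. 12. 15. 18. 21. 24.]
print(wide)   # [ 6.  9. 12. 15. 18. 21.]
```

The dilated filter sees a span of five tokens with the same three weights, which is how dependency features between distant drug mentions can be captured without enlarging the kernel.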
An example of drug blinding preprocessing for the sentence “DIAMOX modifies phenytoin metabolism with increased serum levels of phenytoin.” (DDIExtraction 2013 dataset, file Acetazolamide-ddi.xml, sentence DDI-DrugBank.d368.s0).
| Drug Pair | DDI Candidate after Drug Blinding |
|---|---|
| (DIAMOX, phenytoin) | DRUG1 modifies DRUG2 metabolism with increased serum levels of DRUG0. |
| (DIAMOX, phenytoin) | DRUG1 modifies DRUG0 metabolism with increased serum levels of DRUG2. |
| (phenytoin, phenytoin) | DRUG0 modifies DRUG1 metabolism with increased serum levels of DRUG2. |
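The blinding step above can be sketched as follows: for each unordered pair of drug mentions, the pair under consideration becomes DRUG1/DRUG2 and every other mention becomes DRUG0. The function name and token-list interface are illustrative, not the authors' code:

```python
from itertools import combinations

def blind(tokens, drug_positions):
    """Yield one blinded candidate sentence per pair of drug mentions.

    tokens         : sentence as a list of tokens
    drug_positions : token indices of the drug mentions, in order
    """
    for i, j in combinations(drug_positions, 2):
        out = list(tokens)
        for k in drug_positions:
            out[k] = "DRUG0"              # all other mentions are blinded
        out[i], out[j] = "DRUG1", "DRUG2"  # the candidate pair
        yield " ".join(out)

sent = "DIAMOX modifies phenytoin metabolism with increased serum levels of phenytoin .".split()
for cand in blind(sent, [0, 2, 9]):
    print(cand)
```

Running this reproduces the three candidates in the table, including the (phenytoin, phenytoin) pair formed from the two mentions of the same drug.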
Statistics of the DDIExtraction 2013 dataset.
| Types | Training Set (Original) | Training Set (Filtered) | Test Set (Original) | Test Set (Filtered) |
|---|---|---|---|---|
| DDI pairs | 27,792 | 23,010 | 5716 | 4721 |
| Positive | 4020 | 3998 | 979 | 976 |
| Negative | 23,772 | 19,012 | 4737 | 3745 |
| Advice | 826 | 822 | 221 | 221 |
| Effect | 1687 | 1669 | 360 | 360 |
| Mechanism | 1319 | 1319 | 302 | 299 |
| Int | 188 | 188 | 96 | 96 |
Figure 2. The bidirectional recurrent structure used to obtain the actual semantic embedding of each word. The figure shows part of the sentence “DRUG1 modifies DRUG2 metabolism with increased serum levels of DRUG0.” (DDIExtraction 2013 dataset, file Acetazolamide-ddi.xml, sentence DDI-DrugBank.d368.s0).
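The information fusion of Figure 2 can be sketched as concatenating each word embedding with left and right context vectors from the recurrent pass. This toy version uses a fixed linear recurrence as a stand-in for the (Bi)LSTM cells, whose gates are learned in the real model:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 11, 4                    # tokens and embedding size (toy values)
E = rng.normal(size=(T, d))     # word embeddings e(w_1..w_T)

# Toy recurrences standing in for the forward and backward RNN states.
W = 0.5 * np.eye(d)
left = np.zeros((T, d))         # left context c_l(w_i), built left-to-right
right = np.zeros((T, d))        # right context c_r(w_i), built right-to-left
for i in range(1, T):
    left[i] = np.tanh(left[i - 1] @ W + E[i - 1])
for i in range(T - 2, -1, -1):
    right[i] = np.tanh(right[i + 1] @ W + E[i + 1])

# Information fusion: each word's complete semantic embedding is the
# concatenation [c_l(w_i); e(w_i); c_r(w_i)].
fused = np.concatenate([left, E, right], axis=1)   # shape (T, 3d)
```

The fused sequence (here of shape `(11, 12)`) is what the hybrid convolutional layers consume, alongside the position embeddings.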
The value of each category in the training set after instance filtering.
| DDI Type | Number of Samples | Value |
|---|---|---|
| Advice | 822 | 0.15 |
| Effect | 1669 | 0.08 |
| Mechanism | 1319 | 0.10 |
| Int | 188 | 0.67 |
| Negative | 19,012 | 0.01 |
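The per-class values in the table above are consistent with normalized inverse class frequencies, w_c = (1/n_c) / Σ_k (1/n_k) — a common choice for the alpha weights of a focal loss. This is an inference about how they were computed, not a statement from the paper, but the sketch below reproduces every listed value:

```python
# Filtered training-set counts from the table above.
counts = {"Advice": 822, "Effect": 1669, "Mechanism": 1319,
          "Int": 188, "Negative": 19012}

inv = {c: 1.0 / n for c, n in counts.items()}      # rarer class -> larger weight
total = sum(inv.values())
weights = {c: round(v / total, 2) for c, v in inv.items()}
print(weights)
# → {'Advice': 0.15, 'Effect': 0.08, 'Mechanism': 0.1, 'Int': 0.67, 'Negative': 0.01}
```

Note how the rare Int class (188 samples) gets the largest weight, 0.67, while the dominant Negative class is damped to 0.01.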
Performance comparison with other state-of-the-art methods.
| Category | Model | Advice F | Effect F | Mechanism F | Int F | Precision | Recall | Overall F-Score |
|---|---|---|---|---|---|---|---|---|
| Feature-based methods | UTurku | 63.0 | 60.0 | 58.2 | 50.7 | 73.20 | 49.90 | 59.40 |
| Feature-based methods | FBK irst | 69.20 | 62.80 | 67.90 | 54.70 | 64.60 | 65.60 | 65.10 |
| Feature-based methods | Kim | 72.50 | 66.20 | 69.30 | 48.30 | - | - | 67.00 |
| Feature-based methods | Raihani | 77.40 | 69.60 | 73.60 | 52.40 | 73.70 | 68.70 | 71.10 |
| Neural network-based methods | CNN | 77.72 | 69.32 | 70.23 | 46.37 | 75.70 | 64.66 | 69.75 |
| Neural network-based methods | SCNN | - | - | - | - | 72.50 | 65.10 | 68.60 |
| Neural network-based methods | MCCNN | 78.00 | 68.20 | 72.20 | 51.00 | 75.99 | 65.25 | 70.21 |
| Neural network-based methods | GRU | - | - | - | - | 73.67 | 70.79 | 72.20 |
| Neural network-based methods | CNN-GCNs | 81.62 | 71.03 | 73.83 | 45.83 | 73.31 | 71.81 | 72.55 |
| Neural network-based methods | SVM-LSTM | 71.50 | 72.00 | 73.80 | 54.90 | 75.30 | 63.70 | 69.00 |
| Neural network-based methods | Joint-LSTMs | 79.41 | 67.57 | 76.32 | 43.07 | 73.41 | 69.66 | 71.48 |
| Neural network-based methods | Hierarchical RNNs | 80.30 | 71.80 | 74.00 | 54.30 | 74.10 | 71.80 | 72.90 |
| Neural network-based methods | PM-BLSTM | 81.60 | 71.28 | 74.42 | 48.57 | 75.80 | 70.38 | 72.99 |
| Our method | RHCNN | 80.54 | | | | | | 75.48 |
The effect of different strategies on performance.
| Strategy | F-Score | Δ |
|---|---|---|
| basic RHCNN + improved focal loss function + information fusion + negative instance filtering | 75.48 | - |
| basic RHCNN + improved focal loss function + information fusion | 74.32 | −1.16 |
| basic RHCNN + improved focal loss function + negative instance filtering | 74.39 | −1.09 |
| basic RHCNN + cross-entropy loss function + negative instance filtering + information fusion | 73.29 | −2.19 |
The effect of different features on performance.
| Embedding Feature | Context Encoder | F-Score | Δ |
|---|---|---|---|
| word | - | 69.57 | - |
| word + position | - | 70.20 | +0.63 |
| word + context + position | context trained by LSTM | 73.66 | +3.46 |
| word + context + position | context trained by BiLSTM | 75.48 | +5.28 |
Prediction results of our Recurrent Hybrid Convolutional Neural Network (RHCNN) method. Rows are true labels and columns are predicted labels; entries of the form a + b add the b instances removed by negative instance filtering, which are therefore predicted as Negative.
| True Label | Advice | Effect | Mechanism | Int | Negative | Total |
|---|---|---|---|---|---|---|
| Advice | 178 | 5 | 3 | 2 | 33 | 221 |
| Effect | 2 | 269 | 9 | 0 | 80 | 360 |
| Mechanism | 7 | 4 | 232 | 0 | 56 + 3 | 302 |
| Int | 0 | 36 | 2 | 43 | 15 | 96 |
| Negative | 34 | 58 | 45 | 5 | 3603 + 992 | 4737 |
| Total | 221 | 372 | 291 | 50 | 4782 | 5716 |
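The reported micro F-score of 75.48% follows directly from this confusion matrix: precision, recall, and F are micro-averaged over the four positive DDI types only, with Negative excluded, as is standard for the DDIExtraction 2013 evaluation. A short check, using the original test-set totals (the a + b entries summed):

```python
# Confusion matrix: true label -> predictions over
# [Advice, Effect, Mechanism, Int, Negative].
conf = {
    "Advice":    [178,   5,   3,  2,   33],
    "Effect":    [  2, 269,   9,  0,   80],
    "Mechanism": [  7,   4, 232,  0,   59],   # 56 + 3
    "Int":       [  0,  36,   2, 43,   15],
    "Negative":  [ 34,  58,  45,  5, 4595],   # 3603 + 992
}
cols = ["Advice", "Effect", "Mechanism", "Int", "Negative"]
positive = cols[:4]                            # Negative is excluded

tp = sum(conf[c][cols.index(c)] for c in positive)               # 722
pred_pos = sum(conf[r][cols.index(c)] for c in positive for r in conf)  # 934
true_pos = sum(sum(conf[c]) for c in positive)                   # 979

precision = tp / pred_pos
recall = tp / true_pos
f1 = 2 * precision * recall / (precision + recall)
print(f"micro P={precision:.4f} R={recall:.4f} F={f1:.4f}")
# → micro P=0.7730 R=0.7375 F=0.7548
```

The row and column margins also reconcile: the positive column totals (221 + 372 + 291 + 50 = 934) and row totals (221 + 360 + 302 + 96 = 979) match the test-set counts in the dataset statistics table.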