Sangrak Lim1, Kyubum Lee1, Jaewoo Kang1,2.
Abstract
Detecting drug-drug interactions (DDIs) is important because information on DDIs can help prevent adverse effects from drug combinations. Because many new DDI-related papers are published in the biomedical domain, manually extracting DDI information from the literature is laborious. Text mining can instead be used to find DDIs in the biomedical literature. Among the recently developed neural networks, we use a recursive neural network to improve the performance of DDI extraction. Our recursive neural network model uses a position feature, a subtree containment feature, and an ensemble method. Compared with the state-of-the-art models, the DDI detection and type classifiers of our model performed 4.4% and 2.8% better, respectively, on the DDIExtraction Challenge'13 test data. We also validated our model on the PK DDI corpus, which consists of two types of DDI data: in vivo and in vitro. Compared with the existing model, our detection classifier performed 2.3% and 6.7% better on the in vivo and in vitro data, respectively. The results of our validation demonstrate that our model can automatically extract DDIs better than existing models.
Year: 2018 PMID: 29373599 PMCID: PMC5786304 DOI: 10.1371/journal.pone.0190926
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
DDI relation types and explanations.
| Types | Explanation |
|---|---|
| Advice | This type is assigned when a sentence contains a recommendation or advice regarding the concomitant use of two drugs. |
| Effect | This type is assigned when a sentence contains a pharmacodynamic mechanism, including a clinical finding, signs or symptoms, an increased toxicity, or therapeutic failure. |
| Mechanism | This type is assigned when a sentence contains a pharmacokinetic mechanism, including changes in levels or concentration of the entities. |
| Int | This type is assigned when a sentence states that an interaction occurs but does not provide any information about the interaction. |
Fig 1. Overall system architecture.
We implemented both the one-stage and the two-stage method. (a) Data generation part. (b) One-stage method. Five-class type classifier for the one-stage method. (c) Two-stage method. The DDI detection classifier distinguishes positive DDI instances from negative instances. The DDI type classifier receives the predicted positive instances from the detection classifier as a testing set.
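The two-stage flow in (c) can be sketched as follows. This is a minimal illustration, not the authors' code: `two_stage_predict`, `toy_detect`, and `toy_classify_type` are hypothetical names, and the keyword-based toy classifiers merely stand in for the trained detection and type classifiers.

```python
from typing import Callable, List, Sequence

NEGATIVE = "Negative"  # label for instances the detector rejects


def two_stage_predict(
    instances: Sequence[str],
    detect: Callable[[str], bool],
    classify_type: Callable[[str], str],
) -> List[str]:
    """Run detection first; only predicted positives reach the type classifier."""
    labels = []
    for instance in instances:
        if detect(instance):
            labels.append(classify_type(instance))
        else:
            labels.append(NEGATIVE)
    return labels


# Hypothetical stand-ins for the trained classifiers (keyword rules only).
def toy_detect(sentence: str) -> bool:
    return "no interaction" not in sentence


def toy_classify_type(sentence: str) -> str:
    return "Advice" if "should" in sentence else "Int"


predictions = two_stage_predict(
    [
        "DRUG1 should not be taken with DRUG2.",
        "DRUG1 showed no interaction with DRUG2.",
        "DRUG1 interacts with DRUG2.",
    ],
    toy_detect,
    toy_classify_type,
)
print(predictions)
```

In the one-stage method, by contrast, a single five-class classifier would assign one of the four DDI types or the negative class directly.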
Fig 2. The architecture of our recursive neural network model.
Our model is a variation of the binary tree-LSTM model. (1) The words in a sentence. The names of drug targets are underlined. (2) Vector representation of a word through the word embedding lookup process. (3) Subtree containment feature represents the importance of a node. (4) Position feature vector representing the relative distance of two target drugs from the current word position. (5) An example of the position feature vector. The current word is “accelerated.” (6) The size of the concatenated vector input x0 of our model is 10 (size of the subtree containment feature; (3) in the figure) + 20 (size of the position feature; (4) in the figure) + 200 (size of the word embedding; (2) in the figure).
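The input size in (6) works out to 10 + 20 + 200 = 230. A minimal sketch of that concatenation is shown below; the function name `build_node_input` and the ordering of the three parts are assumptions made for illustration, since the caption states only the sizes.

```python
from typing import List, Sequence

SUBTREE_FEATURE_SIZE = 10    # (3) in the figure
POSITION_FEATURE_SIZE = 20   # (4) in the figure
WORD_EMBEDDING_SIZE = 200    # (2) in the figure


def build_node_input(
    subtree_feature: Sequence[float],
    position_feature: Sequence[float],
    word_embedding: Sequence[float],
) -> List[float]:
    """Concatenate the three feature vectors into one input vector."""
    expected = [
        (subtree_feature, SUBTREE_FEATURE_SIZE),
        (position_feature, POSITION_FEATURE_SIZE),
        (word_embedding, WORD_EMBEDDING_SIZE),
    ]
    for vector, size in expected:
        if len(vector) != size:
            raise ValueError(f"expected length {size}, got {len(vector)}")
    # Assumed order: subtree containment, position, word embedding.
    return list(subtree_feature) + list(position_feature) + list(word_embedding)


x0 = build_node_input([0.0] * 10, [0.0] * 20, [0.0] * 200)
print(len(x0))
```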
Vector representation according to the distance between one of the target drugs and a current word.
| relative distance | -5 | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 | 5 | 6–10 | 11–15 | 16–20 | 21–∞ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Note: When the absolute distance is 5 or less, each distance value is assigned its own vector. When the distance is greater than 5, the same vector is shared in units of 5. The columns for relative distances from -6 to -∞ are omitted due to space limitations.
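Read column by column, each vector in the table appears to consist of a sign bit followed by nine bits that fill with ones from the right as the distance grows. The sketch below reproduces the listed columns under that reading. Three things are assumptions here rather than statements from the table: that the omitted columns for -6 and below mirror the positive buckets, that the relative distance is the word index minus the drug index, and the function names themselves.

```python
from typing import List


def distance_vector(distance: int) -> List[int]:
    """10-dimensional code for one relative distance: sign bit + 9 magnitude bits."""
    magnitude = abs(distance)
    if magnitude <= 5:
        level = magnitude                   # 0..5 each get their own vector
    elif magnitude <= 20:
        level = 5 + (magnitude - 1) // 5    # 6-10 -> 6, 11-15 -> 7, 16-20 -> 8
    else:
        level = 9                           # 21 and beyond
    sign_bit = 1 if distance > 0 else 0
    return [sign_bit] + [0] * (9 - level) + [1] * level


def position_feature(word_index: int, drug1_index: int, drug2_index: int) -> List[int]:
    """20-dimensional feature: one 10-dimensional code per target drug."""
    return (
        distance_vector(word_index - drug1_index)    # assumed sign convention
        + distance_vector(word_index - drug2_index)
    )


print(distance_vector(-5))
print(distance_vector(7))
print(len(position_feature(3, 1, 8)))
```

Concatenating the codes for both target drugs gives the 20 dimensions used for the position feature.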
Search process to find the best hyperparameters used for our model.
| | Parameter | Test Range | Test Unit | Selected |
|---|---|---|---|---|
| Common | Hidden Unit Size | 64–256 | 32 | 128 |
| | Subtree Containment Size | 1–10 | 1 | 10 |
| | Batch Size | 100–200 | 20 | 100 |
| Binary tree-LSTM, DDI’13 Detection | Learning Rate | 0.0005–0.005 | 0.0001 | 0.0008 |
| | Keep Probability | 0.5–1.0 | 0.05 | 0.75 |
| | Detection Epoch | 30–150 | 10 | 100 |
| Binary tree-LSTM, DDI’13 Classification | Learning Rate | 0.0005–0.005 | 0.0005 | 0.0007 |
| | Keep Probability | 0.5–1.0 | 0.05 | 0.9 |
| | Classification Epoch | 30–150 | 10 | 130 |
| Binary tree-LSTM, PK DDI in vivo | Learning Rate | 0.0005–0.005 | 0.0005 | 0.003 |
| | Keep Probability | 0.5–1.0 | 0.05 | 0.75 |
| | Classification Epoch | 30–150 | 10 | 80 |
| Binary tree-LSTM, PK DDI in vitro | Learning Rate | 0.0005–0.005 | 0.0005 | 0.001 |
| | Keep Probability | 0.5–1.0 | 0.05 | 0.6 |
| | Classification Epoch | 30–150 | 10 | 30 |
Note: We found the optimal parameters by moving one parameter within the specified range by a specified unit while the other parameters were fixed. The keep probability is used for the dropout. The epochs are the stopping points for each task.
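The search procedure described in the note, varying one parameter at a time while the others stay fixed, can be sketched as below. This is an illustration under assumptions: `one_at_a_time_search` and `toy_score` are hypothetical, and the toy score replaces what would in practice be a validation metric obtained by training the model.

```python
from typing import Callable, Dict, List

Params = Dict[str, float]


def one_at_a_time_search(
    grid: Dict[str, List[float]],
    start: Params,
    score: Callable[[Params], float],
) -> Params:
    """Sweep each parameter over its grid while holding the others fixed."""
    best = dict(start)
    best_score = score(best)
    for name, candidates in grid.items():
        for value in candidates:
            trial = {**best, name: value}
            trial_score = score(trial)
            if trial_score > best_score:
                best, best_score = trial, trial_score
    return best


# Grids built from two "Test Range" / "Test Unit" rows of the table.
grid = {
    "hidden_units": list(range(64, 257, 32)),
    "keep_probability": [round(0.5 + 0.05 * i, 2) for i in range(11)],
}


def toy_score(params: Params) -> float:
    # Hypothetical objective that peaks at 128 hidden units and 0.75 keep probability.
    return -((params["hidden_units"] - 128) ** 2) - 1000 * (params["keep_probability"] - 0.75) ** 2


best = one_at_a_time_search(grid, {"hidden_units": 64, "keep_probability": 0.5}, toy_score)
print(best)
```

A one-at-a-time sweep needs far fewer training runs than a full grid search, at the cost of possibly missing interactions between parameters.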
The statistics of the DDIExtraction Challenge’13 corpus after preprocessing.
| | Positive | Negative | Total | Ratio |
|---|---|---|---|---|
| Original TrainingSet | 4,020 | 23,772 | 27,792 | 1:5.9 |
| Zhao TrainingSet | 3,840 | 8,989 | 12,829 | 1:2.3 |
| Our TrainingSet | 3,854 | 8,987 | 12,841 | 1:2.3 |
| Original TestSet | 979 | 4,782 | 5,761 | 1:4.9 |
| Zhao TestSet | 971 | 2,084 | 3,055 | 1:2.2 |
| Our TestSet | 971 | 2,049 | 3,020 | 1:2.1 |
Comparison between our proposed model and existing models.
| | Detection P (%) | Detection R (%) | Detection F (%) | Type Classification P (%) | Type Classification R (%) | Type Classification F (%) |
|---|---|---|---|---|---|---|
| MV-RNN Model | - | - | - | 52.0 | 48.0 | 50.0 |
| CNN-rand Model | - | - | - | 69.86 | 56.1 | 62.23 |
| Kim Model | - | - | 77.5 | - | - | 67.0 |
| FBK-irst Model | 79.4 | 80.6 | 80.0 | 64.6 | 65.6 | 65.1 |
| SCNN | 74.7 | 76.8 | 75.7 | 69.1 | 65.1 | 67.0 |
| SCNN | 77.5 | 76.9 | 77.2 | 72.5 | 65.1 | 68.6 |
| CNN-bioWE Model | - | - | - | 75.72 | 64.66 | 69.75 |
| MCCNN Model | - | - | 79.0 | 75.9 | 65.2 | 70.2 |
| Joint AB-LSTM Model | 86.3 | 75.0 | 80.3 | 73.4 | 69.6 | 71.48 |
| Our One-Stage Model (Single) | 82.1 | 78.5 | 80.1 | 74.4 | 69.3 | 71.7 |
| Our One-Stage Model (Ensemble) | 85.5 | 77.8 | 81.5 | 77.8 | 69.6 | |
| Our Two-Stage Model (Single) | 80.6 | 84.2 | 81.8 | 77.7 | 66.1 | 71.4 |
| Our Two-Stage Model (Ensemble) | 83.6 | 84.0 | | 79.3 | 67.2 | 72.7 |
Note: P, R and F denote Precision, Recall and F1 score, respectively. MV-RNN Model [17], CNN-rand Model [13], Kim Model [8], FBK-irst Model [9], SCNN [10], CNN-bioWE Model [12], MCCNN Model [11], Joint AB-LSTM Model [14].
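The F1 score is the harmonic mean of precision and recall, so an F value can be checked against a row's P and R. The helper below is a small sketch (the name `f1_score` is ours); it reproduces the detection F values listed for the Joint AB-LSTM and FBK-irst rows. Rows that report averages over several runs need not satisfy the identity exactly, because the average of per-run F1 scores can differ from the F1 of the averaged P and R.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Detection columns of two rows from the comparison table.
print(round(f1_score(86.3, 75.0), 1))  # Joint AB-LSTM Model
print(round(f1_score(79.4, 80.6), 1))  # FBK-irst Model
```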
Changes in our model’s performance in DDI detection by removing several features of our model.
| | Removed Features | P (%) | R (%) | F (%) |
|---|---|---|---|---|
| Ensemble | + All Features | 83.6 | 84.0 | |
| Single | + All Features | 80.6 | 84.2 | 81.8 |
| Single | + All Features - Static Word Embed | 69.8 | 81.9 | 75.3 |
| Single | + All Features - Subtree feature | 78.6 | 84.2 | 81.2 |
| Single | + All Features - Position feature | 78.0 | 85.5 | 81.4 |
| | - Subtree feature | 46.0 | 82.6 | 58.9 |
| | - Static Word Embed | 45.1 | 76.4 | 56.7 |
Note: P, R and F denote Precision, Recall and F1 score, respectively. We test the performance of our single model five times and average the results. For the ensemble performance, we sum the output probabilities of the ensemble members.
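Summing the output probabilities of the ensemble members, as the note describes, can be sketched as follows. The function name `ensemble_predict` and the example probabilities are hypothetical; only the summation rule comes from the note.

```python
from typing import List, Sequence


def ensemble_predict(member_probabilities: Sequence[Sequence[float]]) -> int:
    """Sum class probabilities over all members and return the top class index."""
    num_classes = len(member_probabilities[0])
    summed: List[float] = [0.0] * num_classes
    for probabilities in member_probabilities:
        for class_index, probability in enumerate(probabilities):
            summed[class_index] += probability
    return max(range(num_classes), key=summed.__getitem__)


# Three hypothetical members scoring one instance as [negative, positive].
agreeing = [[0.60, 0.40], [0.45, 0.55], [0.30, 0.70]]
confident_minority = [[0.90, 0.10], [0.45, 0.55], [0.45, 0.55]]
print(ensemble_predict(agreeing))
print(ensemble_predict(confident_minority))
```

Unlike majority voting, summing probabilities lets one confident member outweigh two uncertain ones, as the second example shows.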
The statistics from the PK DDI corpus after preprocessing.
| | Positive | Negative | Total | Ratio |
|---|---|---|---|---|
| in vivo DDI training data | 781 | 2,809 | 3,590 | 1:3.5 |
| in vivo DDI test data | 213 | 676 | 889 | 1:3.1 |
| in vitro DDI training data | 544 | 3,984 | 4,528 | 1:7.3 |
| in vitro DDI test data | 146 | 837 | 1,015 | 1:5.7 |
Comparison of in vivo PK DDI results of our model and those of existing models.
| | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| PAS_ReSC | 84.8 | 68.5 | 75.8 |
| DEP_ReSC | 80.8 | 83.1 | 81.9 |
| Our Model (Single) | 82.1 | 83.3 | 82.6 |
| Our Model (Ensemble) | 85.0 | 82.6 | |
Comparison of in vitro PK DDI results of our model and those of existing models.
| | Precision (%) | Recall (%) | F1-score (%) |
|---|---|---|---|
| PAS_ReSC | 74.8 | 62.5 | 68.1 |
| DEP_ReSC | 70.7 | 67.9 | 69.3 |
| Our Model (Single) | 80.3 | 65.9 | 72.3 |
| Our Model (Ensemble) | 81.2 | 67.9 | |
Performance changes of our model on the PK DDI in vivo dataset by removing features.
| | Removed Features | P (%) | R (%) | F (%) |
|---|---|---|---|---|
| Ensemble | + All Features | 85.0 | 82.6 | |
| Single | + All Features | 82.1 | 83.3 | 82.6 |
| Single | + All Features - Subtree feature | 78.8 | 84.2 | 81.3 |
| | - Position feature | 55.7 | 74.4 | 62.8 |
| | - Static Word Embed | 51.8 | 70.7 | 59.5 |
Note: P, R and F denote Precision, Recall and F1 score, respectively. The same experiment was repeated five times and the results were averaged.
Examples of three common types of error cases.
| Num | DDI | Sentence |
|---|---|---|
| 1 | True | There is usually complete cross-resistance between |
| 2 | True | The bioavailability of SKELID |
| 3 | False | The drug interaction between |
Note: Underlined drug names are target drugs.