Linna He, Zhihao Yang, Zhehuan Zhao, Hongfei Lin, Yanpeng Li.
Abstract
Drug-drug interaction (DDI) detection is particularly important for patient safety. However, the amount of biomedical literature on drug interactions is increasing rapidly, so an effective approach for automatically extracting DDI information from that literature is needed. In this paper, we present a stacked generalization-based approach for automatic DDI extraction. The approach combines feature-based, graph, and tree kernels and therefore reduces the risk of missing important features. In addition, it introduces domain knowledge-based features (the keyword, semantic type, and DrugBank features) into the feature-based kernel, which contribute to the performance improvement. More specifically, the approach applies stacked generalization to automatically learn weights from the training data and assign them to the three individual kernels, achieving better performance than each individual kernel alone. The experimental results show that our approach achieves an F-score of 69.24%, better than other systems in the DDI Extraction 2011 challenge task.
Year: 2013 PMID: 23785452 PMCID: PMC3681788 DOI: 10.1371/journal.pone.0065814
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Features generated in our feature-based kernel for the instance “Plasma concentrations of drug1 are decreased when administered with drug2 containing drug0 or drug0.”
| Feature | n | Feature value |
| Abon | 1 | left_area = Plasma, left_area = concentrations, left_area = of; Inner_area = are, Inner_area = decreased,…, Inner_area = with; Right_area = containing,…, Right_area = drug0. |
| Abon | 2 | left_area = Plasma concentrations, left_area = concentrations of; Inner_area = are decreased,…, Inner_area = administered with; Right_area = containing drug0,…, Right_area = or drug0. |
| Abon | 3 | left_area = Plasma concentrations of; inner_area = are decreased when,…, inner_area = when administered with; Right_area = containing drug0 or; Right_area = drug0 or drug0. |
| San | 1 | D1_left = Plasma, D1_left = concentrations, D1_left = of; D1_right = are, D1_right = decreased,…; D2_left = decreased,…; D2_right = containing,… |
| San | 2 | D1_left = Plasma concentrations, D1_left = concentrations of; D1_right = are decreased, D1_right = decreased when,…; D2_left = decreased when; D2_left = when administered,…; D2_right = containing drug0… |
| San | 3 | D1_left = Plasma concentrations of; D1_right = are decreased when, D1_right = decreased when administered; D2_left = decreased when administered, D2_left = when administered with; D2_right = containing drug0 or, D2_right = drug0 or drug0 |
| Cpn | 1 | D1_left∧D2_left∧distance = Plasma∧are∧5, D1_left∧D2_left∧distance = Plasma∧decreased∧5, D1_left∧D2_left∧distance = Plasma∧when∧5…; D1_right∧D2_left∧distance = are∧are∧5…; D1_left∧D2_right∧distance = Plasma∧containing∧5…; D1_right∧D2_right∧distance = are∧containing∧5… |
| Cpn | 2 | D1_left∧D2_left∧distance = Plasma concentrations∧are decreased∧5,…; D1_right∧D2_left∧distance = are decreased∧are decreased∧5,…; D1_left∧D2_right∧distance = Plasma concentrations∧containing drug0∧5,…; D1_right∧D2_right∧distance = are decreased∧containing drug0∧5,… |
| Cpn | 3 | D1_left∧D2_left∧distance = Plasma concentrations of∧are decreased when∧5,…; D1_right∧D2_left∧distance = are decreased when∧are decreased when∧5,…; D1_left∧D2_right∧distance = Plasma concentrations of∧containing drug0 or∧5,…; D1_right∧D2_right∧distance = are decreased when∧containing drug0 or∧5… |
| Negative word | | no = −1, not = −1,… |
| Keyword | | decrease = 1, activate = −1,… |
| Semantic type | | Semtype1 = “carb,phsu”, semtype2 = “gngm” |
| NameIsDrug | | entity1 = −1, entity2 = −1 |
| DrugBank | | indication = 0, pharmacology = 0, description = 0 |
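The Abon rows of the table can be reproduced with a short sketch. This is an illustration only, not the authors' implementation: the function names `ngrams` and `area_ngram_features`, whitespace tokenization, and the lowercase area labels are assumptions made here. The three areas are taken to be the words before drug1, between drug1 and drug2, and after drug2, which matches the values listed in the table.

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def area_ngram_features(sentence, n, first="drug1", second="drug2"):
    """Hypothetical helper: n-grams of the left, inner, and right areas
    around the candidate pair (assumes whitespace tokenization)."""
    tokens = sentence.rstrip(".").split()
    i, j = tokens.index(first), tokens.index(second)
    areas = {
        "left_area": tokens[:i],        # words before drug1
        "inner_area": tokens[i + 1:j],  # words between drug1 and drug2
        "right_area": tokens[j + 1:],   # words after drug2
    }
    return [(area, gram) for area, words in areas.items()
            for gram in ngrams(words, n)]


sentence = ("Plasma concentrations of drug1 are decreased when administered "
            "with drug2 containing drug0 or drug0.")
trigram_features = area_ngram_features(sentence, 3)
for area, gram in trigram_features:
    print(f"{area} = {gram}")
```

Calling the same function with `n=1` or `n=2` yields the unigram and bigram rows.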
Figure 1. Graph representation generated from an example sentence.
The candidate interaction pair is marked as “drug1” and “drug2”, the other drugs are marked as “drug0”. The shortest path between the drugs is shown in bold. In the dependency based subgraph all nodes in a shortest path are specialized using a post-tag (IP). In the linear order subgraph possible tags are (B)efore, (M)iddle, and (A)fter.
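The (B)efore, (M)iddle, and (A)fter tags of the linear order subgraph can be illustrated with a minimal sketch. The function name `linear_order_tags` is hypothetical, and leaving the two candidate entities untagged is an assumption made here rather than something the caption states.

```python
def linear_order_tags(tokens, first="drug1", second="drug2"):
    """Tag each token by its position relative to the candidate pair:
    B = before drug1, M = between the two drugs, A = after drug2.
    The candidate entities themselves get None (an assumption)."""
    i, j = tokens.index(first), tokens.index(second)
    tagged = []
    for k, token in enumerate(tokens):
        if k < i:
            tag = "B"
        elif i < k < j:
            tag = "M"
        elif k > j:
            tag = "A"
        else:
            tag = None
        tagged.append((token, tag))
    return tagged


tokens = ("Plasma concentrations of drug1 are decreased when administered "
          "with drug2 containing drug0 or drug0").split()
tagged = linear_order_tags(tokens)
```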
Figure 2. The training process of stacked generalization.
The J-fold cross-validation process runs at level-0; the level-1 dataset obtained at the end of this process is used to produce the level-1 hypothesis H.
Figure 3. The algorithm of the stacking process.
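The training process described for Figure 2 can be sketched in plain Python. This is a generic stacked generalization sketch under stated assumptions, not the authors' algorithm: the level-0 learners here are toy one-feature scorers standing in for the feature-based, graph, and tree kernels, and the level-1 learner is a small perceptron standing in for the SVM, MLR, or Ranking SVM classifiers. All function names are hypothetical.

```python
def make_folds(n, J):
    """Split indices 0..n-1 into J interleaved folds."""
    return [list(range(j, n, J)) for j in range(J)]


def stump_learner(feature):
    """Toy level-0 learner (stand-in for a kernel classifier): scores one
    feature against the midpoint of the two class means."""
    def fit(X, y):
        pos = [x[feature] for x, t in zip(X, y) if t == 1]
        neg = [x[feature] for x, t in zip(X, y) if t == 0]
        mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
        return lambda x: x[feature] - mid
    return fit


def perceptron_fit(X, y, epochs=20):
    """Toy level-1 learner (stand-in for SVM / MLR / Ranking SVM)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            if pred != t:
                step = 1 if t == 1 else -1
                w = [wi + step * xi for wi, xi in zip(w, x)]
                b += step

    def predict(x):
        return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
    return predict


def stack_fit(X, y, level0_fits, level1_fit, J=5):
    """J-fold stacking: level-0 outputs on held-out folds form the level-1
    dataset, on which the level-1 hypothesis H is trained."""
    n = len(X)
    level1_X = [[0.0] * len(level0_fits) for _ in range(n)]
    for held_out in make_folds(n, J):
        held = set(held_out)
        train_X = [X[i] for i in range(n) if i not in held]
        train_y = [y[i] for i in range(n) if i not in held]
        for k, fit in enumerate(level0_fits):
            model = fit(train_X, train_y)
            for i in held_out:
                level1_X[i][k] = model(X[i])
    H = level1_fit(level1_X, y)
    # Retrain the level-0 learners on all data for use at prediction time.
    final_models = [fit(X, y) for fit in level0_fits]
    return lambda x: H([m(x) for m in final_models])


# Toy data, invented for illustration only.
X = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9], [0.15, 0.3],
     [0.85, 0.7], [0.3, 0.25], [0.7, 0.95], [0.05, 0.1], [0.95, 0.9]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
predict = stack_fit(X, y, [stump_learner(0), stump_learner(1)], perceptron_fit)
```

The key design point is that the level-1 learner only ever sees level-0 outputs produced on data the level-0 learner was not trained on, which keeps the learned combination from simply rewarding the most overfit level-0 model.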
Performances of different features in the feature-based kernel.
| Feature | Precision | Recall | F-score | MCC | AUC |
| Lexical features: Abon | 57.40 | 70.86 | 63.43 | 58.92 | 90.7 |
| Lexical features: San | 50.05 | 66.36 | 57.06 | 51.73 | 89.8 |
| Lexical features: Cpn | 54.61 | 47.81 | 50.99 | 45.65 | 86.0 |
| Lexical features: Abon + San | 57.20 | 73.64 | 64.39 | 60.11 | 91.2 |
| Lexical features: Abon + San + Cpn | 58.49 | 72.98 | 64.94 | 60.67 | 91.8 |
| Lexical features + Negative word | 60.80 | 70.46 | 65.28 | 60.96 | 93.0 |
| Lexical features + Negative word + NameIsDrug | 59.40 | 72.58 | 65.44 | 61.06 | 92.1 |
| Lexical features + Negative word + NameIsDrug + Keyword | 59.09 | 73.64 | 65.57 | 61.38 | 92.8 |
| Lexical features + Negative word + NameIsDrug + Keyword + Semantic type | 60.88 | 71.52 | 65.77 | 61.63 | 92.7 |
| Lexical features + Negative word + NameIsDrug + Keyword + Semantic type + Indication | 59.50 | 73.51 | 65.91 | 61.76 | 92.0 |
| Lexical features + Negative word + NameIsDrug + Keyword + Semantic type + Indication + Pharmacology | 59.87 | 74.70 | 66.47 | 62.42 | 92.2 |
| Lexical features + Negative word + NameIsDrug + Keyword + Semantic type + Indication + Pharmacology + Description | 60.30 | 74.44 | 66.63 | 62.58 | 92.2 |
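The F-score column can be checked against the precision and recall columns. The sketch below assumes the balanced F-score (the harmonic mean of precision and recall), which is consistent with the values in the table; the function name `f_score` is chosen here for illustration.

```python
def f_score(precision, recall):
    """Balanced F-score: harmonic mean of precision and recall.
    Inputs and output are percentages, as in the table."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Last row of the table: precision 60.30, recall 74.44.
full_feature_f = f_score(60.30, 74.44)
print(round(full_feature_f, 2))
```

MCC and AUC cannot be recomputed this way, since they need the underlying confusion counts and classifier scores, which the table does not give.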
Performances of each individual kernel and their combinations.
| Method | Precision | Recall | F-score | MCC | AUC |
| Feature-based kernel | 60.30 | 74.44 | 66.63 | 62.58 | 92.2 |
| Tree kernel | 48.06 | 59.21 | 53.06 | 49.94 | 84.6 |
| Graph kernel | 62.84 | 61.59 | 62.21 | 57.71 | 90.3 |
| Feature-based kernel (0.5) + Graph (0.5) | 63.86 | 73.25 | 68.23 | 64.30 | 92.7 |
| Feature-based kernel (0.33) + Graph (0.33) + Tree (0.33) | 65.05 | 71.26 | 68.01 | 64.05 | 92.7 |
| Feature-based kernel (0.45) + Graph (0.4) + Tree (0.15) | 66.79 | 72.19 | | | |
| SVM as the level-1 classifier | 62.21 | 70.86 | 68.46 | 62.06 | 92.8 |
| MLR as the level-1 classifier | 66.17 | 71.25 | 68.62 | 64.74 | 92.7 |
| 5–5 Ranking SVM as the level-1 classifier | 67.18 | 70.99 | 69.03 | 65.21 | 92.7 |
| 6–4 Ranking SVM as the level-1 classifier | 67.43 | 70.46 | 68.91 | 65.10 | 92.8 |
| 7–3 Ranking SVM as the level-1 classifier | 66.22 | 72.45 | 69.20 | 65.38 | 92.9 |
| 8–2 Ranking SVM as the level-1 classifier | 70.46 | 67.55 | 69.06 | 65.33 | 92.9 |
| 9–1 Ranking SVM as the level-1 classifier | 70.39 | 67.68 | 69.01 | 65.37 | 92.9 |
| Ranking SVM as the level-1 classifier | 66.18 | 72.58 | | | |
The weights of each individual kernel in combined kernels are in the parentheses after the kernel name.
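As a rough illustration of what such weights mean, the sketch below combines per-kernel decision scores as a weighted sum. The text here does not spell out the combination rule, so the weighted-sum form, the zero decision threshold, the function name `combine_scores`, and the example scores are all assumptions; only the weights 0.45, 0.4, and 0.15 come from the table.

```python
def combine_scores(scores, weights):
    """Weighted sum of per-kernel decision scores (assumed combination rule)."""
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same kernels")
    return sum(weights[name] * scores[name] for name in weights)


# Weights taken from the table; the scores are invented for illustration.
weights = {"feature": 0.45, "graph": 0.4, "tree": 0.15}
scores = {"feature": 0.8, "graph": -0.2, "tree": 0.4}

combined = combine_scores(scores, weights)
is_interaction = combined > 0  # assumed decision threshold of zero
```

Hand-set weights like these are what the stacked generalization rows replace: there, the level-1 classifier learns the combination from training data.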
Performance comparison with other methods on the DDI Extraction 2011 challenge task dataset.
| Methods | Precision | Recall | F-score | MCC | AUC |
| Thomas et al. | 60.54 | 71.92 | 65.74 | 61.50 | – |
| Chowdhury et al. | 58.59 | 70.46 | 63.98 | 58.25 | – |
| Our combined kernel-1 | 65.05 | 71.26 | 68.01 | 64.05 | 92.7 |
| Our combined kernel-2 | 66.18 | 72.58 | | | |
Examples of DDI instances. The focused entities of each pair are typeset in bold.
| Instances | Our result | Corpus’s annotation |
| Although not studied with alosetron, inhibition of | | |
| Intestinal | | |
| When administered concurrently, the following drugs may interact with | | |
| As with some other nondepolarizing | | |
| Therefore, caution should be used when administering | | |
| Although no drug-drug interaction studies have been conducted in vivo, it is expected that no significant interaction would occur when | | |
| It is recommended that if | | |
Analysis of the false negatives.
| Error cause | Error number | Error proportion (%) | Example |
| Annotation consistency | 71 | 35.5 | |
| “Drugs” annotation error | 15 | 7.5 | |
| Negative word error | 12 | 6 | |
| DDI extraction error: failure to extract the DDI | 37 | 18.5 | |
| DDI extraction error: unobvious DDI | 65 | 32.5 | |
| Totals | 200 | 100 | |
Analysis of the false positives.
| Error cause | Error number | Error proportion (%) | Example |
| General drug name error | 48 | 24 | |
| Non-drug name annotation error | 26 | 13 | |
| “Drugs” annotation error | 12 | 6 | |
| DDI extraction error | 114 | 57 | |
| Totals | 200 | 100 | |