| Literature DB >> 27466626 |
Zhehuan Zhao, Zhihao Yang, Ling Luo, Hongfei Lin, Jian Wang.
Abstract
MOTIVATION: Detecting drug-drug interactions (DDIs) has become a vital part of public health safety. Therefore, using text mining techniques to extract DDIs from the biomedical literature has received great attention. However, this research is still at an early stage and its performance leaves much room for improvement.
Year: 2016 PMID: 27466626 PMCID: PMC5181565 DOI: 10.1093/bioinformatics/btw486
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. The processing flow of our one-stage method SCNN1
Fig. 2. The predicate-argument structure of the example sentence. The nodes and edges on the shortest path connecting the first and last words in the predicate-argument structure are shown in bold; the rest are shown as dotted lines. Rectangular nodes represent words; ellipse vertices represent specific relationships between the predicates and their arguments
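The shortest path Fig. 2 refers to can be found with a plain breadth-first search over the predicate-argument graph, treating its edges as undirected. A minimal sketch on a hypothetical toy sentence (the node names and edges below are illustrative, not actual Enju parser output):

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search for the shortest path between two nodes
    in an undirected predicate-argument graph."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no path between the two words

# Toy predicate-argument structure: words (nodes) linked through
# predicate-argument relations (edges); purely illustrative.
pas = {
    "aspirin":   ["increases"],
    "increases": ["aspirin", "effect"],
    "effect":    ["increases", "of"],
    "of":        ["effect", "warfarin"],
    "warfarin":  ["of"],
}
print(shortest_path(pas, "aspirin", "warfarin"))
# ['aspirin', 'increases', 'effect', 'of', 'warfarin']
```

In the paper's setting the endpoints would be the first and last words of the sentence, and the words and relation vertices along the returned path form the shortest-path feature.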
Fig. 3. Convolutional feature extraction. X_i (i = 0, 1, …, t) represents the ith word in a sentence of length t
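The convolution in Fig. 3 slides a fixed-size window over the sequence of word representations and pools over positions to get one fixed-length sentence feature. A minimal NumPy sketch with illustrative settings (the dimensions, window size, number of filters, tanh nonlinearity and max pooling are assumptions for the sketch, not the paper's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

t, d, k, m = 7, 4, 3, 5   # sentence length, embedding dim, window size, filters
X = rng.normal(size=(t, d))          # row i is the representation of word X_i
W = rng.normal(size=(m, k * d))      # one row per convolution filter
b = rng.normal(size=m)               # one bias per filter

# Slide a k-word window over the sentence, concatenate each window's
# embeddings, and apply every filter at every position.
windows = np.stack([X[i:i + k].ravel() for i in range(t - k + 1)])  # (t-k+1, k*d)
conv = np.tanh(windows @ W.T + b)                                   # (t-k+1, m)

# Max pooling over positions yields a fixed-length feature regardless of t.
feature = conv.max(axis=0)
print(feature.shape)  # (5,)
```

The key point the figure makes is that the pooled feature has one value per filter, so sentences of different lengths map to vectors of the same size.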
The statistics of the DDI corpus
| Corpus | Positives | Negatives | Total | Ratio |
|---|---|---|---|---|
| OriginalTrainingSet | 4020 | 23 772 | 27 792 | 1:5.9 |
| NewTrainingSet | 3840 | 8989 | 12 829 | 1:2.3 |
| OriginalTestSet | 979 | 4782 | 5761 | 1:4.9 |
| NewTestSet | 971 | 2084 | 3055 | 1:2.2 |
Notes. Ratio denotes the ratio of positives to negatives in the corpus. OriginalTrainingSet and OriginalTestSet denote the original training and test sets, respectively. NewTrainingSet and NewTestSet denote the new training and test sets obtained after possible negatives are removed, respectively. It should be noted that 22 interactions in the training set whose corresponding sentences can’t be parsed correctly by the Enju parser are removed; therefore, the number of positives (4020) in OriginalTrainingSet is slightly different from that (4042) of the DDIExtraction 2013 corpus.
Performance comparison on DDIExtraction 2013 test set
| Stage | Method | Classification P | R | F | Δ | Detection P | R | F | Δ |
|---|---|---|---|---|---|---|---|---|---|
| One-stage | SCNN1 | 0.691 | 0.651 | 0.670 | – | 0.747 | 0.768 | 0.757 | – |
| | UTurku | **0.732** | 0.499 | 0.594 | – | **0.858** | 0.585 | 0.696 | – |
| | NIL_UCM | 0.535 | 0.501 | 0.517 | – | 0.608 | 0.569 | 0.588 | – |
| Two-stage | SCNN2 | 0.725 | 0.651 | **0.686** | 1.6% | 0.775 | 0.769 | 0.772 | – |
| | Kim et al. (2015) | – | – | 0.670 | – | – | – | **0.775** | – |
| | FBK-irst | 0.646 | **0.656** | 0.651 | – | 0.794 | **0.806** | 0.800 | – |
| | WBI | 0.642 | 0.579 | 0.609 | – | 0.801 | 0.722 | 0.759 | – |
Notes. SCNN1 denotes our SCNN-based one-stage method and SCNN2 denotes our SCNN-based two-stage method. Δ denotes the performance improvement of SCNN1 over UTurku, and SCNN2 over that of Kim et al. (2015). The boldfaced numerals are the highest values in the corresponding column.
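The P, R and F columns are related by the standard F-score formula F = 2PR/(P + R), which can be used to spot-check the rows of the comparison table:

```python
def f_score(p, r):
    """Harmonic mean of precision and recall (the F column in the tables)."""
    return 2 * p * r / (p + r)

# Spot-check table rows against the reported F values.
print(f"{f_score(0.691, 0.651):.3f}")  # SCNN1 classification -> 0.670
print(f"{f_score(0.747, 0.768):.3f}")  # SCNN1 detection      -> 0.757
print(f"{f_score(0.794, 0.806):.3f}")  # FBK-irst detection   -> 0.800
```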
The effect of the strategies and features on performance
| Strategy or feature removed | P | R | F | Δ |
|---|---|---|---|---|
| None | 0.725 | 0.651 | 0.686 | – |
| Negative instance filtering | 0.685 | 0.610 | 0.645 | −4.1% |
| Syntax | 0.711 | 0.599 | 0.650 | −3.6% |
| POS | 0.707 | 0.623 | 0.662 | −2.4% |
| POS Encoding | 0.690 | 0.652 | 0.670 | −1.6% |
| Shortest Path | 0.671 | 0.586 | 0.626 | −6.0% |
| Shortest Path Encoding | 0.661 | 0.616 | 0.638 | −4.8% |
| Position | 0.680 | 0.636 | 0.657 | −2.9% |
| Word Embedding | 0.639 | 0.572 | 0.604 | −8.2% |
| Context | 0.657 | 0.599 | 0.627 | −5.9% |
| ConvolutionLayer1 | 0.611 | 0.576 | 0.592 | −9.4% |
| ConvolutionLayer2 | 0.577 | 0.648 | 0.611 | −7.5% |
Notes. Δ denotes the F-score decrease, in percentage points, when the corresponding strategy or feature is removed.
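Reading Δ as the absolute percentage-point difference from the full model's F-score (0.686, the "None" row) reproduces the listed values:

```python
def delta(f_removed, f_full=0.686):
    """F-score change in percentage points relative to the full SCNN2 model."""
    return round((f_removed - f_full) * 100, 1)

print(delta(0.645))  # remove negative instance filtering -> -4.1
print(delta(0.626))  # remove shortest path               -> -6.0
print(delta(0.604))  # remove word embedding              -> -8.2
```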