| Literature DB >> 28053977 |
Chanqin Quan, Lei Hua, Xiao Sun, Wenjun Bai.
Abstract
The wealth of biomedical relations embedded in medical records demands researchers' attention. Previous theoretical and practical work focused largely on traditional machine learning techniques. However, these methods are susceptible to the "vocabulary gap" and data sparseness, and their feature extraction cannot be automated. To address these issues, we propose a multichannel convolutional neural network (MCCNN) for automated biomedical relation extraction. The proposed model makes two contributions: (1) it fuses multiple (e.g., five) versions of word embeddings; (2) it obviates manual feature engineering through automated feature learning with a convolutional neural network (CNN). We evaluated our model on two biomedical relation extraction tasks: drug-drug interaction (DDI) extraction and protein-protein interaction (PPI) extraction. On the DDI task, our system achieved an overall f-score of 70.2% on the DDIExtraction 2013 challenge dataset, compared to 67.0% for the standard linear SVM-based system. On the PPI task, evaluated on the Aimed and BioInfer corpora, our system exceeded the state-of-the-art ensemble SVM system by 2.7% and 5.6% in f-score, respectively.
Year: 2016 PMID: 28053977 PMCID: PMC5174749 DOI: 10.1155/2016/1850404
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The architecture of the CBOW model [17].
Statistics for five word embeddings (all with 200 dimensions).
| | Vocabulary size | Training corpus |
|---|---|---|
| 1 | 2515686 | PMC |
| 2 | 2351706 | PubMed |
| 3 | 4087446 | PMC and PubMed |
| 4 | 5443656 | Wikipedia and PubMed |
| 5 | 650187 | MedLine |
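The five embeddings above were pretrained with the CBOW model of Figure 1. The snippet below is a minimal sketch of how one such 200-dimensional embedding could be trained with gensim; it is not the authors' pipeline, and everything except the CBOW mode and the 200-dimensional setting (the toy corpus, window, min_count, workers) is an illustrative assumption.

```python
from gensim.models import Word2Vec

# Placeholder corpus: one token list per sentence. A real run would
# stream millions of sentences from, e.g., PubMed or PMC text.
sentences = [
    ["nabumetone", "interacts", "with", "warfarin"],
    ["warfarin", "is", "an", "anticoagulant"],
]

# sg=0 selects CBOW (Figure 1); vector_size=200 matches the paper's
# 200-dimensional embeddings. The remaining hyperparameters are guesses.
model = Word2Vec(sentences, vector_size=200, sg=0,
                 window=5, min_count=1, workers=4)

print(model.wv["warfarin"].shape)  # (200,)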
Figure 2. The architecture of the proposed MCCNN. In this example, the input sentence length is 10, the word embedding dimension is 5, and there are 5 word embedding channels, so the multichannel input has size 5 × 10 × 5. Two window sizes, 3 and 4, are used. The green part is generated by (1). The orange part, representing the max-pooling result, is generated by taking the maximum value of the blue part through (3). Since there are 2 filters for each window size, 2 features are produced per window size. The extracted features are then concatenated and fed to a softmax layer for classification.
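The figure's toy dimensions translate directly into code. Below is a minimal sketch of such a multichannel CNN, assuming PyTorch; it is not the authors' implementation, and the defaults (including the class count of negative plus the four DDI types) mirror only the figure's example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MCCNN(nn.Module):
    # channels = number of word embedding versions, dim = embedding size;
    # defaults mirror the toy example in Figure 2, not the real settings.
    def __init__(self, channels=5, dim=5, windows=(3, 4),
                 filters_per_window=2, classes=5):
        super().__init__()
        # One 2-D convolution per window size; each filter spans the full
        # embedding dimension, so it slides only along the sentence axis.
        self.convs = nn.ModuleList(
            [nn.Conv2d(channels, filters_per_window, kernel_size=(w, dim))
             for w in windows]
        )
        self.fc = nn.Linear(filters_per_window * len(windows), classes)

    def forward(self, x):
        # x: (batch, channels, sentence_length, dim), e.g. (1, 5, 10, 5)
        pooled = []
        for conv in self.convs:
            h = torch.relu(conv(x)).squeeze(3)  # (batch, filters, L - w + 1)
            pooled.append(F.max_pool1d(h, h.size(2)).squeeze(2))  # max over time
        feats = torch.cat(pooled, dim=1)  # concatenated feature vector
        return F.log_softmax(self.fc(feats), dim=1)

x = torch.randn(1, 5, 10, 5)  # 5 channels, sentence length 10, dim 5
print(MCCNN()(x).shape)       # torch.Size([1, 5])
```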
An example of preprocessing for the sentence "Caution should be exercised when administering nabumetone with warfarin since interactions have been seen with other NSAIDs" in the DDI task. The sentence contains 3 entities, so 3 entity pairs are generated.
| Entity1 | Entity2 | Generated inputs |
|---|---|---|
| nabumetone | warfarin | Caution should be exercised when administering Entity1 with Entity2 since interactions have been seen with other EntityOther |
| nabumetone | NSAIDs | Caution should be exercised when administering Entity1 with EntityOther since interactions have been seen with other Entity2 |
| warfarin | NSAIDs | Caution should be exercised when administering EntityOther with Entity1 since interactions have been seen with other Entity2 |
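A sketch of the pair generation illustrated above, in Python; the token-index representation of entity mentions is a format invented for this example, not the paper's data structure.

```python
from itertools import combinations

def generate_instances(tokens, mentions):
    # For each unordered pair of entity mentions, emit one instance in
    # which the pair becomes Entity1/Entity2 and every remaining mention
    # becomes EntityOther (entity blinding, as in the table above).
    for (i1, m1), (i2, m2) in combinations(sorted(mentions.items()), 2):
        blinded = list(tokens)
        for idx in mentions:
            blinded[idx] = "EntityOther"
        blinded[i1], blinded[i2] = "Entity1", "Entity2"
        yield m1, m2, " ".join(blinded)

sentence = ("Caution should be exercised when administering nabumetone "
            "with warfarin since interactions have been seen with other "
            "NSAIDs").split()
mentions = {6: "nabumetone", 8: "warfarin", 16: "NSAIDs"}  # token index -> mention

for e1, e2, text in generate_instances(sentence, mentions):
    print(e1, "|", e2, "|", text)
```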
Examples of noise instances for the defined filtering rules; the mentioned entities are in italic. [Table body not recoverable; one surviving example begins: "To minimize CNS depression and possible potentiation, …"]
Statistics for the DDIExtraction 2013 challenge corpus. Entity pairs that interact with each other are labeled positive; all others are labeled negative. The Abstract row gives the number of article abstracts in each dataset.
| | Train: DrugBank | Train: MedLine | Train: Overall | Test: DrugBank | Test: MedLine | Test: Overall |
|---|---|---|---|---|---|---|
| Abstract | 572 | 142 | 714 | 158 | 33 | 191 |
| Positive | 3788 | 232 | 4020 | 884 | 95 | 979 |
| Negative | 22118 | 1547 | 23665 | 4367 | 345 | 4712 |
| Advice | 818 | 8 | 826 | 214 | 7 | 221 |
| Effect | 1535 | 152 | 1687 | 298 | 62 | 360 |
| Mechanism | 1257 | 62 | 1319 | 278 | 24 | 302 |
| Int | 178 | 10 | 188 | 94 | 2 | 96 |

After preprocessing and filtering rules:

| | Train: DrugBank | Train: MedLine | Train: Overall | Test: DrugBank | Test: MedLine | Test: Overall |
|---|---|---|---|---|---|---|
| Positive | 3767 | 231 | 3998 | 884 | 92 | 976 |
| Negative | 14445 | 1179 | 15624 | 2819 | 243 | 3062 |
| Advice | 815 | 7 | 822 | 214 | 7 | 221 |
| Effect | 1517 | 152 | 1669 | 298 | 62 | 360 |
| Mechanism | 1257 | 62 | 1319 | 278 | 21 | 299 |
| Int | 178 | 10 | 188 | 94 | 2 | 96 |
DDI corpus vocabulary covered by each of the five pretrained word embeddings.

| | Vocabulary size | Word embedding |
|---|---|---|
| 1 | 9984 | PMC |
| 2 | 10273 | PubMed |
| 3 | 10399 | PMC and PubMed |
| 4 | 10432 | Wikipedia and PubMed |
| 5 | 9639 | MedLine |
Experimental results (P/R/F, %) of the baseline, one-channel, and proposed MCCNN models on the DDI task. Baseline: one channel with randomly initialized word embeddings. One-channel: one channel with the Wikipedia and PubMed word embeddings.

| | Baseline P | Baseline R | Baseline F | One-channel P | One-channel R | One-channel F | MCCNN P | MCCNN R | MCCNN F |
|---|---|---|---|---|---|---|---|---|---|
| Advice | 53.88 | 67.24 | | 80.77 | 67.12 | 73.32 | 82.99 | 73.52 | |
| Effect | 56.32 | 57.42 | 56.87 | 60.46 | 73.67 | 66.41 | 69.47 | | |
| Mechanism | 78.33 | 53.36 | 63.47 | 64.72 | 70.81 | 67.63 | 62.75 | | |
| Int | 30.21 | 45.67 | | 82.05 | 33.33 | 47.41 | 75.51 | 38.54 | 51.0 |
| Overall (micro) | 70.00 | 52.68 | 60.12 | 66.50 | 67.31 | 66.90 | 65.25 | | 70.2 |
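The "Overall (micro)" row pools true positives, false positives, and false negatives across the four interaction types before computing precision and recall, rather than averaging per-class scores. A small self-contained sketch of that computation (the counts are illustrative, not taken from the paper):

```python
def micro_prf(counts):
    # counts: iterable of (tp, fp, fn) tuples, one per interaction type.
    # Micro-averaging sums the raw counts first, then computes P, R, F1.
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Illustrative (tp, fp, fn) for advice, effect, mechanism, and int:
print(micro_prf([(150, 40, 71), (250, 120, 110), (200, 90, 99), (40, 15, 56)]))
```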
Performances of the model with and without preprocessing. The overall f-score with preprocessing is the 70.2% reported in the abstract.

| | Overall f-score |
|---|---|
| MCCNN (with preprocessing) | 70.2 |
| MCCNN (without preprocessing) | 67.80 |
Feature sets for the four compared approaches.

| Method | Feature sets |
|---|---|
| Kim | Word features; dependency graph features; word pair features; parse tree features; noun phrase-constrained coordination features |
| FBK-irst | Linear features; path-enclosed tree kernels; shallow linguistic features |
| WBI | Feature combination of other DDI methods |
| UTurku | Linear features; external resources; word features; graph features |
Comparisons with other systems on f-scores. ADV, EFF, MEC, and INT denote advice, effect, mechanism, and int, respectively, while DEC refers to interaction detection.
| | ADV | EFF | MEC | INT | DEC | Overall |
|---|---|---|---|---|---|---|
| Kim | 72.5 | 66.2 | 69.3 | 48.3 | 77.5 | 67.0 |
| FBK-irst | 69.2 | 62.8 | 67.9 | | | 65.1 |
| WBI | 63.2 | 61.0 | 61.8 | 51.0 | 75.9 | 60.9 |
| UTurku | 63.0 | 60.0 | 58.2 | 50.7 | 69.6 | 59.4 |
| MCCNN | | | | 51.0 | 79.0 | 70.2 |
Evaluation results (overall f-scores) on the separate DrugBank and MedLine corpora. Rows correspond to the training set; columns correspond to the test set.

| Train \ Test | DrugBank | MedLine |
|---|---|---|
| DrugBank | 70.8 | 52.6 |
| MedLine | 10.0 | 28.0 |
Statistics for the Aimed and BioInfer datasets after preprocessing.
| Datasets | Positive | Negative |
|---|---|---|
| BioInfer | 2512 | 7010 |
| Aimed | 995 | 4812 |
PPI corpus vocabulary covered by each of the five pretrained word embeddings ("All" gives the full vocabulary of each corpus).

| | Aimed | BioInfer | Word embedding |
|---|---|---|---|
| All | 6276 | 5461 | — |
| 1 | 5293 | 4666 | PMC |
| 2 | 5363 | 4712 | PubMed |
| 3 | 5404 | 4749 | PMC and PubMed |
| 4 | 5414 | 4762 | Wikipedia and PubMed |
| 5 | 4977 | 4328 | MedLine |
Change in performance (P/R/F, %) from the baseline to MCCNN on the Aimed and BioInfer datasets.

| | Baseline P | Baseline R | Baseline F | One-channel P | One-channel R | One-channel F | MCCNN P | MCCNN R | MCCNN F |
|---|---|---|---|---|---|---|---|---|---|
| Aimed | 71.62 | 61.25 | 64.27 | 72.28 | 60.82 | 65.58 | 76.41 | 69.00 | 72.4 |
| BioInfer | 78.13 | 73.00 | 72.34 | 76.06 | 79.43 | 77.07 | 81.30 | 78.10 | 79.6 |
Comparisons with other systems (f-scores) on Aimed and BioInfer.
| | Aimed | BioInfer |
|---|---|---|
| Choi and Myaeng | 67.0 | 72.6 |
| Yang et al. | 64.4 | 65.9 |
| Li et al. | 69.7 | 74.0 |
| Erkan et al. | 59.6 | — |
| Miwa et al. | 60.8 | 68.1 |
| Miwa et al. | 64.2 | 67.6 |
| MCCNN (proposed) | 72.4 | 79.6 |
Machine configuration.

| Component | Specification |
|---|---|
| GPU | NVIDIA GeForce GTX TITAN X |
| CPU | Intel(R) Xeon CPU E5-2620 v3 @ 2.4 GHz |
| System | Windows 7 |
| Memory | 8 GB |