Huiwei Zhou, Huijie Deng, Long Chen, Yunlong Yang, Chen Jia, Degen Huang.
Abstract
Identifying chemical-disease relations (CDR) from biomedical literature could improve chemical safety and toxicity studies. This article proposes a novel method for CDR extraction that exploits both syntactic and semantic information. The proposed method consists of a feature-based model, a tree kernel-based model and a neural network model. The feature-based model exploits lexical features, the tree kernel-based model captures syntactic structure features, and the neural network model generates semantic representations. The motivation is to draw on the complementary strengths of the three models to explore diverse information for CDR extraction. Experiments on the BioCreative V CDR dataset show that all three models are effective for CDR extraction and that their combination further improves extraction performance.

Database URL: http://www.biocreative.org/resources/corpora/biocreative-v-cdr-corpus/
Year: 2016 PMID: 27081156 PMCID: PMC4831723 DOI: 10.1093/database/baw048
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Hybrid system architecture.
Figure 2. SDPT. (A) The fragment of dependency tree for Sentence 1. (B) SDPT. (C) SDF based on SDPT. (D) Extended SDPT. (E) Extended SDF based on SDPT.
Figure 3. SPF based on SDPT. (A) The fragment of phrase tree for Sentence 1. (B) SPF based on SDPT.
Figure 4. Detailed architecture of the peephole LSTM.
Figure 5. SDP sequences. (A) SDP-dep sequence. (B) SDP-seq sequence.
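The SDP (shortest dependency path) structures named in Figures 2 and 5 all start from the path that links the chemical and disease mentions in a dependency parse. The sketch below shows one common way to obtain such a path, a breadth-first search over the parse treated as an undirected graph. The function name `shortest_dependency_path` and the toy edges are hypothetical; they are not the paper's Sentence 1, and the paper's own extraction procedure may differ.

```python
from collections import deque


def shortest_dependency_path(edges, source, target):
    """Return the tokens on the shortest path from source to target.

    edges is a list of (head, dependent) pairs; direction is ignored
    so that the search can move both up and down the tree.
    """
    graph = {}
    for head, dependent in edges:
        graph.setdefault(head, []).append(dependent)
        graph.setdefault(dependent, []).append(head)

    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk back through the parent links to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbour in graph.get(node, []):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None  # target not reachable from source


# Hypothetical toy parse (illustration only, not taken from the paper).
edges = [
    ("induced", "aspirin"),
    ("induced", "bleeding"),
    ("bleeding", "gastric"),
    ("induced", "patients"),
    ("patients", "in"),
]
path = shortest_dependency_path(edges, "aspirin", "bleeding")
print(path)
```

In this toy parse the path runs from the chemical token through the governing verb to the disease token, dropping modifiers such as "gastric" that lie off the path.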
Performance of the feature-based model with flat features
| Flat features | P (%) | R (%) | F (%) |
|---|---|---|---|
| Basic | | | |
| Context | 59.07 | 44.00 | 50.43 |
| +Entity | 60.73 | 45.40 | 51.96 |
| +Position | 60.95 | 45.68 | 52.23 |
| +Distance | 61.99 | 46.81 | 53.34 |
| +Verb | 62.15 | 47.28 | 53.70 |
| FDF | | | |
| +Context | 62.39 | 47.47 | 53.92 |
| +Position | 62.86 | 47.47 | 54.09 |
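The three numeric columns in these tables behave like precision, recall and F-score: in the rows checked here, the third value equals the harmonic mean of the first two to within rounding. A minimal sketch of that check follows; the helper name `f_score` is an assumption introduced for illustration, and the row values are copied from the table above.

```python
def f_score(precision, recall):
    """Balanced F-score: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F) for two rows of the flat-feature table.
rows = {
    "Basic: Context": (59.07, 44.00, 50.43),
    "FDF: +Position": (62.86, 47.47, 54.09),
}

for name, (p, r, reported) in rows.items():
    computed = f_score(p, r)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

The same function can be applied to any other row to check a reported score against its precision and recall.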
Performance of kernel-based model with structure features
| Structure features | P (%) | R (%) | F (%) |
|---|---|---|---|
| SDF | 57.86 | 44.18 | 50.11 |
| SPF | 59.08 | 42.12 | 49.18 |
| SDF+SPF | 59.70 | 44.18 | 50.78 |
Comparison with other structured syntactic representation
| Structure features | P (%) | R (%) | F (%) |
|---|---|---|---|
| SDF | 57.86 | 44.18 | 50.11 |
| PT | 63.00 | 41.37 | 49.94 |
| Extended SDF | 61.17 | 42.12 | 49.89 |
Performance of LSTM model with the different input methods
| Methods | P (%) | R (%) | F (%) |
|---|---|---|---|
| WORD | 47.08 | 56.00 | 51.16 |
| WORD-POS | 52.96 | 50.28 | 51.59 |
| HEAD | 48.41 | 55.82 | 51.85 |
| SDP-dep | 50.44 | 53.85 | 52.09 |
| SDP-seq | 54.08 | 51.03 | 52.51 |
| SDP-seq+POS | 54.06 | 51.22 | 52.60 |
| SDP-seq+HEAD | 54.33 | 51.22 | 52.73 |
| SDP-seq+POS+HEAD | 54.91 | 51.41 | 53.10 |
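For reference, a peephole LSTM of the kind named in Figure 4 is commonly written as below. This is the widely used textbook formulation, given as a sketch: the symbols (input $x_t$, hidden state $h_t$, cell state $c_t$, weight matrices $W$, $U$, peephole weights $V$, biases $b$) are notation assumed here, and the exact variant used in the paper is the one drawn in its figure.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + V_i c_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + V_f c_{t-1} + b_f\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + V_o c_t + b_o\right) \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The peephole terms $V_i c_{t-1}$, $V_f c_{t-1}$ and $V_o c_t$ are what distinguish this cell from a plain LSTM: they let each gate inspect the cell state directly.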
Performance of CNN model with the different input methods
| Methods | P (%) | R (%) | F (%) |
|---|---|---|---|
| WORD | 49.25 | 46.44 | 47.80 |
| WORD-POS | 46.54 | 50.47 | 48.92 |
| HEAD | 49.57 | 48.97 | 49.27 |
| SDP-dep | 42.00 | 53.66 | 47.12 |
| SDP-seq | 47.64 | 47.28 | 47.46 |
| SDP-seq+POS | 49.56 | 47.28 | 48.39 |
| SDP-seq+HEAD | 46.97 | 48.03 | 47.50 |
| SDP-seq+POS+HEAD | 41.13 | 55.25 | 47.16 |
Figure 6. Performance of different weightings of the three models (feature-based model: top, kernel-based model: left, LSTM model: right). ‘+’ indicates the maximum; ‘O’ indicates the minimum.
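One way to read a three-way weighting plot like Figure 6 is as a search over weight triples that sum to one. The sketch below assumes a simple linear combination of per-model confidence scores; that combination rule, the function names `combine_scores` and `weight_grid`, and the example scores are all assumptions for illustration, since this page does not state how the paper merges the three models.

```python
def combine_scores(feature_score, kernel_score, lstm_score, weights):
    """Linearly combine three model scores with non-negative weights summing to 1."""
    w_f, w_k, w_l = weights
    if min(weights) < 0 or abs(w_f + w_k + w_l - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return w_f * feature_score + w_k * kernel_score + w_l * lstm_score


def weight_grid(step=0.1):
    """Yield every (w_f, w_k, w_l) on a regular grid over the weight simplex."""
    n = round(1 / step)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            k = n - i - j  # remaining share goes to the third model
            yield (i / n, j / n, k / n)


# Hypothetical confidence scores for a single candidate pair.
combined = combine_scores(0.8, 0.4, 0.6, (0.5, 0.25, 0.25))
print(combined)
print(len(list(weight_grid(0.1))), "weight settings on a 0.1 grid")
```

Evaluating each grid point on held-out data and keeping the best-scoring triple would reproduce the kind of maximum and minimum markers the caption describes.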
Statistical analysis of different systems (the feature-based, kernel-based and LSTM models are abbreviated as F, K and L, respectively)
| Combination systems | P (%) | R (%) | F (%) | P-value |
|---|---|---|---|---|
| FKL | 60.30 | 49.19 | 54.18 | |
| FK | 64.64 | 43.94 | 52.31 | 0.025 |
| FL | 57.36 | 50.46 | 53.83 | 0.032 |
| KL | 57.39 | 50.07 | 53.48 | 0.011 |
Effects of post-processing on the test set
| System | P (%) | R (%) | F (%) |
|---|---|---|---|
| Hybrid system | 64.89 | 49.25 | 56.00 |
| + Causal relation rules | 62.99 | 51.41 | 56.61 |
| + Focused chemical rules | 55.56 | 68.39 | 61.31 |
Comparison with related work
| System | P (%) | R (%) | F (%) |
|---|---|---|---|
| Ours (golden) | 55.56 | 68.39 | 61.31 |
| Ours (NER) | 42.59 | 49.91 | 45.96 |
| Xu | 55.67 | 58.44 | 57.03 |
| Pons | 51.34 | 53.85 | 52.56 |
| Lowe | 52.62 | 51.78 | 52.20 |
Figure 7. Origins of FP errors.
Figure 8. Origins of FN errors.