| Literature DB >> 30700301 |
Zhiheng Li, Zhihao Yang, Chen Shen, Jun Xu, Yaoyun Zhang, Hua Xu.
Abstract
BACKGROUND: Extracting relations between important clinical entities is critical but very challenging for natural language processing (NLP) in the medical domain. Researchers have applied deep learning-based approaches to clinical relation extraction, but most of them consider the sentence sequence only, without modeling syntactic structures. The aim of this study was to utilize a deep neural network to capture syntactic features and further improve the performance of relation extraction in clinical notes.
Keywords: Relation extraction; Deep learning; Shortest dependency path
Year: 2019 PMID: 30700301 PMCID: PMC6354333 DOI: 10.1186/s12911-019-0736-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
– Statistics of the relation extraction dataset (a subset from the 2010 i2b2/VA challenge)
| Relation type | Description | Number of instances |
|---|---|---|
| TeCP | Test conducted to investigate medical problem | 504 |
| TeRP | Test reveals medical problem | 3,052 |
| PIP | Medical problem indicates medical problem | 2,203 |
| TrCP | Treatment causes medical problem | 526 |
| TrAP | Treatment is administered for medical problem | 2,617 |
| TrWP | Treatment worsens medical problem | 133 |
| TrNAP | Treatment is not administered because of medical problem | 174 |
| TrIP | Treatment improves medical problem | 203 |
| None | No relation between target entities | 19,870 |
| Total | – | 29,282 |
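The statistics above show a heavily skewed label distribution: the None class alone accounts for 19,870 of the 29,282 instances (roughly two-thirds), which matters for both training and evaluation. A minimal sketch, using only the counts from the table above, that verifies the total and prints each class share:

```python
# Class counts taken from the dataset-statistics table above
# (subset of the 2010 i2b2/VA challenge).
counts = {
    "TeCP": 504, "TeRP": 3052, "PIP": 2203, "TrCP": 526,
    "TrAP": 2617, "TrWP": 133, "TrNAP": 174, "TrIP": 203,
    "None": 19870,
}

total = sum(counts.values())
assert total == 29282  # matches the "Total" row

# Print the share of each relation type, most frequent first.
for rel, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{rel:>5}: {n:6d}  ({100 * n / total:5.2f}%)")
```

The rare classes (TrWP with 133 instances, TrNAP with 174) are exactly the ones with the lowest baseline F-measures in the per-relation table below, which is consistent with the imbalance shown here.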
Fig. 1- Architecture of our model. Our neural network architecture consists of three modules: (1) sentence sequence representation module; (2) SDP representation module; and (3) classification module
Fig. 2- An illustration of SDP generation. This figure shows the dependency syntactic graph and the SDP of sentence “She was maintained on a epidural and pca for pain control”
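The SDP in Fig. 2 is the shortest path between the two candidate entities in the (undirected) dependency graph of the sentence. A minimal sketch of that idea, using hand-coded dependency edges for the Fig. 2 sentence as an illustrative assumption (a real system would take the edges from a dependency parser, not from this hard-coded list):

```python
from collections import deque

# Illustrative, hand-coded undirected dependency edges for the sentence
# "She was maintained on a epidural and pca for pain control".
# These are an assumption for demonstration, not the parser output from the paper.
edges = [
    ("maintained", "She"), ("maintained", "was"), ("maintained", "on"),
    ("on", "epidural"), ("epidural", "a"), ("epidural", "and"),
    ("epidural", "pca"), ("maintained", "for"), ("for", "control"),
    ("control", "pain"),
]

def shortest_dependency_path(edges, source, target):
    """Breadth-first search for the SDP between two entity head words."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
        graph.setdefault(v, []).append(u)
    queue, parent = deque([source]), {source: None}
    while queue:
        node = queue.popleft()
        if node == target:  # reconstruct the path by walking parents back
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # entities not connected in the parse

print(shortest_dependency_path(edges, "epidural", "control"))
# → ['epidural', 'on', 'maintained', 'for', 'control']
```

Since a dependency parse is a tree, the path between any two words is unique, so the SDP drops modifiers such as "She was" and "a" that lie off the path between the treatment and problem entities.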
– Performance of our proposed methods on the 2010 i2b2/VA subset (5-fold cross validation)
| Features | Precision (%) | Recall (%) | F-measure (%) | ∆ (%) |
|---|---|---|---|---|
| Sentence Sequence only | 74.01 | 69.79 | 71.84 | – |
| +SDP (Word Sequence) | 74.20 | 72.84 | 73.51 | 1.67 |
| +SDP (Word Sequence + Relation Type) | 75.69 | 73.03 | 74.34 | 2.50 |
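The F-measures in the table above are consistent with F being the harmonic mean of the reported precision and recall. A quick sketch that checks each row (tolerance accounts for rounding of the published values):

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

# Rows copied from the results table above (values in %).
rows = [
    ("Sentence Sequence only", 74.01, 69.79, 71.84),
    ("+SDP (Word Sequence)", 74.20, 72.84, 73.51),
    ("+SDP (Word Sequence + Relation Type)", 75.69, 73.03, 74.34),
]
for name, p, r, f in rows:
    assert abs(f_measure(p, r) - f) < 0.05, name
    print(f"{name}: F = {f_measure(p, r):.2f}")
```

Note that the gain from the SDP module comes mostly from recall (69.79% to 73.03%), while precision improves more modestly, which fits the per-relation breakdown below where rare classes benefit most.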
– Improvements in F-measure by adding SDP module for each relation type
| Relation Type | Sentence Sequence | Sentence sequence + SDP | ∆ |
|---|---|---|---|
| TeCP | 54.24 | 61.17 | 6.93 |
| TeRP | 83.64 | 84.44 | 0.80 |
| PIP | 63.09 | 63.33 | 0.24 |
| TrCP | 56.45 | 62.13 | 5.68 |
| TrAP | 75.53 | 79.74 | 4.21 |
| TrWP | 18.05 | 44.57 | 26.52 |
| TrNAP | 30.49 | 42.27 | 11.78 |
| TrIP | 51.85 | 61.59 | 9.74 |
– Comparison of performance of different systems reported on the same 2010 i2b2/VA corpus
| Publications | Models | Precision (%) | Recall (%) | F-measure (%) |
|---|---|---|---|---|
| Rink et al. | SVM | 67.44 | 57.85 | 59.31 |
| Sahu et al. | Multi-CNN-Max | 55.73 | 50.08 | 49.42 |
| Sahu and Anand | LSTM-ATT | 65.23 | 56.77 | 60.04 |
| Wang et al. | RCNN | 50.07 | 45.34 | 46.47 |
| Raj et al. | CRNN | 67.91 | 61.98 | 64.38 |
| Luo et al.* | Seg-CNN | – | – | 74.20 |
| Our model | – | 75.69 | 73.03 | 74.34 |
*Luo et al. used the original dataset from the challenge (871 documents in total)
– Instances corrected by adding the SDP-based module, with example sentences for the TrWP, TrNAP, and TrIP relation types. The italics in each sentence sequence are the candidate pair entities.