Zhijing Li, Chen Li, Yu Long, Xuan Wang.
Abstract
BACKGROUND: The popularization of health and medical informatics yields huge amounts of data. Extracting clinical events along a timeline is the foundation for advanced applications and research. A timeline is a structure that presents information in chronological order. Manual extraction would be extremely challenging given the quantity and complexity of the records.
Keywords: Attention mechanism; Clinical text mining; Event extraction; Piecewise representation; Relation extraction; Temporal extraction
Year: 2020 PMID: 32819377 PMCID: PMC7439713 DOI: 10.1186/s12911-020-01208-9
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 An example of medical information extraction. Text marked with underscores, “___”, is a temporal expression; text marked with dashed lines, “_ _ _”, is an event expression
Fig. 2 The data-processing pipeline of the system. We first extract temporal expressions and event expressions separately from the medical records, then extract the relations between them
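The two-stage flow in this caption can be sketched as follows. This is only an illustration of the data flow, not the system itself: the regular expressions are toy stand-ins for the learned extractors, and the names `Span`, `extract_spans`, `candidate_pairs`, and `run_pipeline` are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    text: str
    kind: str  # "TIMEX" or "EVENT"


# Assumption: toy patterns standing in for the trained extraction models.
TIMEX_PATTERN = re.compile(r"\b(?:today|yesterday|\d{4}-\d{2}-\d{2})\b", re.IGNORECASE)
EVENT_PATTERN = re.compile(r"\b(?:surgery|colonoscopy|pain|biopsy)\b", re.IGNORECASE)


def extract_spans(text, pattern, kind):
    """Stage 1: find all spans of one kind in the text."""
    return [Span(m.start(), m.end(), m.group(0), kind) for m in pattern.finditer(text)]


def candidate_pairs(events, timexes):
    """Stage 2 input: pair every event with every temporal expression.

    A relation classifier would then assign a label to each pair.
    """
    return [(event, timex) for event in events for timex in timexes]


def run_pipeline(text):
    timexes = extract_spans(text, TIMEX_PATTERN, "TIMEX")
    events = extract_spans(text, EVENT_PATTERN, "EVENT")
    return timexes, events, candidate_pairs(events, timexes)
```

The point of the sketch is the ordering: both kinds of expression are extracted independently, and only then are candidate pairs formed for relation classification.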
Fig. 3 Architecture of the RNN model for extracting temporal expressions and event expressions. We propose an attention-based RNN model for entity extraction. The left side of the figure shows the details of the attention mechanism; the right side shows the RNN model
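As a rough illustration of what an attention layer over RNN hidden states computes, the sketch below scores each hidden state against a query vector, normalizes the scores with a softmax, and returns the weighted sum. Dot-product scoring and the names `softmax` and `attention_pool` are assumptions made for this example; this section does not state which scoring function the model uses.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Return (weights, context) for a list of equal-length hidden-state vectors.

    Assumption: dot-product scoring between each hidden state and the query.
    """
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: attention-weighted sum of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context
```

With a zero query every position receives the same weight, so the context vector reduces to the mean of the hidden states; a query aligned with one hidden state shifts nearly all the weight to that position.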
Fig. 4 The flow of our proposed model (APRNN). The left side of the figure shows the details of the attention mechanism; the right side shows the RNN model, which is divided into three parts
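One common way to obtain a three-part (piecewise) representation in relation extraction is to cut the sentence at the positions of the two entities. The sketch below assumes that convention; the caption only says the model has three parts, so the split rule, the choice to keep each entity in both adjacent segments, and the name `piecewise_split` are all assumptions.

```python
def piecewise_split(tokens, first_idx, second_idx):
    """Split a token list into (left, middle, right) around two entity positions.

    Assumption: each entity token is kept in both neighbouring segments so
    that no segment is empty.
    """
    lo, hi = sorted((first_idx, second_idx))
    if not (0 <= lo < hi < len(tokens)):
        raise ValueError("entity indices must be distinct and inside the sentence")
    left = tokens[: lo + 1]       # sentence start up to the first entity
    middle = tokens[lo : hi + 1]  # first entity through second entity
    right = tokens[hi:]           # second entity to sentence end
    return left, middle, right
```

For the tokens of "The patient had surgery on Monday morning" with the event at index 3 and the temporal expression at index 5, the segments are "The patient had surgery", "surgery on Monday", and "Monday morning". Each segment could then be encoded separately before the three encodings are combined.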
Fig. 5 The flow of the comparison system, a standard RNN model
The distribution of the THYME corpus. In this table, we show the different types of data in the corpus
| Data | Colon cancer | Brain cancer |
|---|---|---|
| | 293 143 141 | 30 148 |
| | 3833 2078 1952 | 350 1552 |
| | 38,890 20,974 18,990 | 2557 11,510 |
| | 11,150 6163 5894 | 624 1759 |
The temporal expression extraction results on colon cancer. Part 1 shows the results of the six methods we used for temporal expression extraction; Part 2 shows the result of the previous best system
| Part | Method | P | R | F1 |
|---|---|---|---|---|
| Part 1 | Rule-based | 0.412 | 0.594 | 0.486 |
| | CRF | 0.813 | 0.592 | 0.685 |
| | RNN | 0.662 | 0.629 | 0.645 |
| | RNN-att | 0.677 | 0.669 | 0.663 |
| | ARNN | 0.691 | 0.675 | 0.683 |
| | CRF-ARNN | 0.754 | 0.725 | 0.739 |
| Part 2 | BluLab: run 1–3 | 0.797 | 0.664 | 0.725 |
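The P, R, and F1 columns in these tables are related by the standard definition of F1 as the harmonic mean of precision and recall. A minimal helper, with the hypothetical name `f1_score`, makes the relation explicit:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For the CRF-ARNN row, `round(f1_score(0.754, 0.725), 3)` gives 0.739, matching the reported value. Because the published P and R are themselves rounded, recomputed F1 values for other rows may differ from the table in the last digit.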
The temporal expression extraction results on brain cancer. We use six methods for the task; their results are shown in Part 1, and the result of the previous best system is shown in Part 2
| Part | Method | P | R | F1 |
|---|---|---|---|---|
| Part 1 | Rule-based | 0.33 | 0.52 | 0.41 |
| | CRF | 0.72 | 0.55 | 0.62 |
| | RNN | 0.63 | 0.57 | 0.60 |
| | RNN-att | 0.65 | 0.57 | 0.61 |
| | ARNN | 0.66 | 0.60 | 0.63 |
| | CRF-ARNN | 0.69 | 0.65 | 0.67 |
| Part 2 | GUIR | 0.51 | 0.67 | 0.58 |
The event extraction results on colon cancer. We use five methods; all of their results are shown in Part 1, and Part 2 shows the result of the previous best system
| Part | Method | P | R | F1 |
|---|---|---|---|---|
| Part 1 | SVM | 0.860 | 0.843 | 0.851 |
| | CRF | 0.896 | 0.874 | 0.885 |
| | RNN | 0.893 | 0.897 | 0.893 |
| | RNN-att | 0.903 | 0.899 | 0.901 |
| | ARNN | 0.922 | 0.908 | 0.915 |
| Part 2 | BluLab: run 1–3 | 0.887 | 0.864 | 0.875 |
The event extraction results on brain cancer. We use five methods for the task; their results can be compared in Part 1, and the result of the best system is shown in Part 2
| Part | Method | P | R | F1 |
|---|---|---|---|---|
| Part 1 | SVM | 0.55 | 0.69 | 0.61 |
| | CRF | 0.68 | 0.80 | 0.73 |
| | RNN | 0.75 | 0.83 | 0.77 |
| | RNN-att | 0.77 | 0.79 | 0.78 |
| | ARNN | 0.82 | 0.78 | 0.80 |
| Part 2 | LIMSI | 0.69 | 0.85 | 0.76 |
The ER classification results on colon cancer. Part 1 shows the results of the methods we used; Part 2 shows other related works that achieved strong results
| Part | Method | P | R | F1 |
|---|---|---|---|---|
| Part 1 | RNN | 0.697 | 0.721 | 0.709 |
| | RNN-whole | 0.668 | 0.680 | 0.674 |
| | RNN-att | 0.719 | 0.715 | 0.717 |
| | RNN-pie | 0.717 | 0.709 | 0.713 |
| | APRNN-wiki | 0.727 | 0.717 | 0.722 |
| | APRNN-BioASQ | 0.731 | 0.723 | 0.727 |
| | APRNN | 0.733 | 0.711 | 0.729 |
| Part 2 | BluLab: run 1–3 | 0.712 | 0.693 | 0.702 |
| | SVM | 0.678 | 0.658 | 0.668 |
| | Att-BLSTM | 0.721 | 0.715 | 0.718 |
The ER classification results on brain cancer. The results of our proposed methods are shown in Part 1; Part 2 shows the results of other related works
| Part | Method | P | R | F1 |
|---|---|---|---|---|
| Part 1 | RNN | 0.61 | 0.59 | 0.60 |
| | RNN (whole) | 0.59 | 0.55 | 0.57 |
| | RNN-att | 0.61 | 0.61 | 0.61 |
| | RNN-pie | 0.62 | 0.60 | 0.61 |
| | APRNN-wiki | 0.63 | 0.61 | 0.62 |
| | APRNN-BioASQ | 0.64 | 0.62 | 0.63 |
| | APRNN | 0.65 | 0.59 | 0.63 |
| Part 2 | LIMSI | 0.53 | 0.66 | 0.59 |
| | SVM | 0.57 | 0.53 | 0.55 |
| | Att-BLSTM | 0.63 | 0.61 | 0.62 |
The temporal relation classification results on the TimeBank_Dense corpus
| Method | P | R | F1 |
|---|---|---|---|
| ClearTK | 0.397 | 0.091 | 0.147 |
| CAEVO | 0.508 | 0.506 | 0.507 |
| APRNN | 0.511 | 0.507 | 0.509 |
The ER classification results for sentences of different lengths
| Length of sentences | 0–18 | 19–100 | > 100 |
|---|---|---|---|
| Number of sentences | 5311 | 2320 | 96 |
| P | 0.723 | 0.714 | 0.612 |
| R | 0.743 | 0.744 | 0.669 |
| F | 0.733 | 0.729 | 0.639 |
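A breakdown like the one above requires assigning each sentence to a length bucket before scoring. The sketch below shows one way to do that grouping; it assumes length is measured in tokens, which this section does not state, and the names `length_bucket` and `bucket_counts` are hypothetical.

```python
from collections import Counter


def length_bucket(num_tokens):
    """Map a sentence length to one of the three ranges used in the table."""
    if num_tokens < 0:
        raise ValueError("length must be non-negative")
    if num_tokens <= 18:
        return "0–18"
    if num_tokens <= 100:
        return "19–100"
    return "> 100"


def bucket_counts(sentence_lengths):
    """Count how many sentences fall into each length bucket."""
    return Counter(length_bucket(n) for n in sentence_lengths)
```

Precision, recall, and F can then be computed separately over the predictions in each bucket.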