Yuxin Sun, Zhenying Zhao, Zhongyi Wang, Haiyang He, Feng Guo, Yuchen Luo, Qing Gao, Ningjing Wei, Jialin Liu, Guo-Zheng Li, Ziqing Liu.
Abstract
This paper addresses the mixture symptom mention problem that arises in the structuring of Traditional Chinese Medicine (TCM) records. We tackled it by disassembling mixture symptom mentions with entity relation extraction. Over 2,200 clinical notes were annotated to construct the training set, and an end-to-end joint learning model was built to extract entity relations. The model uses a multi-head mechanism to handle overlapping relations and a pretrained transformer encoder to capture context information. Compared with the entity extraction pipeline, the joint learning model was superior in precision, recall, and F1, at 0.825, 0.818, and 0.822, respectively, with an F1 roughly 14 percentage points above the baseline pipeline. The joint learning model extracts features automatically, without extra natural language processing tools, which makes it efficient for disassembling mixture symptom mentions. Its superior performance at identifying overlapping relations should also benefit the downstream reassembling of separated symptom entities.
Year: 2022 | PMID: 35299894 | PMCID: PMC8923793 | DOI: 10.1155/2022/2146236
Source DB: PubMed | Journal: Biomed Res Int | Impact factor: 3.411
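The abstract outlines the architecture: a shared pretrained transformer encoder feeding a token-level entity tagger and a multi-head relation scorer. The following is a minimal PyTorch sketch of that design, not the authors' code; all names and dimensions are illustrative, and the toy encoder stands in for the pretrained BERT (e.g., bert-base-chinese).

```python
import torch
import torch.nn as nn

class JointExtractor(nn.Module):
    """Shared encoder + BIO entity tagger + multi-head relation scorer (sketch)."""
    def __init__(self, hidden=768, n_ent_tags=7, n_rels=2, rel_dim=128):
        # n_ent_tags = 7: B/I tags for the three entity types plus O.
        # n_rels = 2: the two annotated relations (located_at, is_a_description_of).
        super().__init__()
        # Stand-in encoder; the paper uses a pretrained BERT instead
        # (e.g., bert-base-chinese via the HuggingFace transformers library).
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=8, batch_first=True),
            num_layers=2)
        self.ent_head = nn.Linear(hidden, n_ent_tags)    # token-level BIO tags
        # Per-relation scoring via separate query/key projections.
        self.rel_q = nn.Linear(hidden, n_rels * rel_dim)
        self.rel_k = nn.Linear(hidden, n_rels * rel_dim)
        self.n_rels, self.rel_dim = n_rels, rel_dim

    def forward(self, token_embs):                        # (B, T, hidden)
        h = self.encoder(token_embs)
        ent_logits = self.ent_head(h)                     # (B, T, n_ent_tags)
        B, T, _ = h.shape
        q = self.rel_q(h).view(B, T, self.n_rels, self.rel_dim)
        k = self.rel_k(h).view(B, T, self.n_rels, self.rel_dim)
        # One score per (head token, tail token, relation) triple. Decoded with
        # a sigmoid threshold, several relations can fire for the same token,
        # which is how multi-head selection covers overlapping relations.
        rel_logits = torch.einsum('bird,bjrd->bijr', q, k)
        return ent_logits, rel_logits

# Smoke test on random "token embeddings" (a real run would use BERT outputs).
ent_logits, rel_logits = JointExtractor()(torch.randn(2, 16, 768))
```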
Figure 1. The framework for extracting mixture symptom mentions.
Figure 2. An example of annotated sentences.
Entity types in the annotated corpus.
| Type | Count | Percentage |
|---|---|---|
| 症状 (symptom) | 41,004 | 56.26% |
| 部位 (area of the body) | 26,829 | 36.80% |
| 程度 (severity) | 5,061 | 6.94% |
| Total | 72,894 | 100% |
Relation types in the annotated corpus.
| Type | Count | Percentage |
|---|---|---|
| 位于 (located_at) | 35,804 | 86.81% |
| 描述 (is_a_description_of) | 5,442 | 13.19% |
| Total | 41,246 | 100% |
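Figure 2 itself is not reproduced in this record. As a purely hypothetical illustration of the scheme implied by the two tables above, with an invented sentence and offsets, an annotated record might be stored as:

```python
# Hypothetical annotated record (invented text and spans) showing how a
# mixture symptom mention decomposes into typed entities plus binary relations.
record = {
    "text": "左膝关节疼痛明显",  # "marked pain in the left knee joint"
    "entities": [
        {"id": 0, "span": (0, 4), "type": "部位"},  # 左膝关节, area of the body
        {"id": 1, "span": (4, 6), "type": "症状"},  # 疼痛, symptom
        {"id": 2, "span": (6, 8), "type": "程度"},  # 明显, severity
    ],
    "relations": [
        {"head": 1, "tail": 0, "type": "位于"},  # symptom located_at body area
        {"head": 2, "tail": 1, "type": "描述"},  # severity is_a_description_of symptom
    ],
}
```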
Frequency with which an entity is involved in relations.
| Relations per entity | Count | Percentage |
|---|---|---|
| 0 | 4,706 | 6.456% |
| 1 | 56,046 | 76.886% |
| 2 | 10,387 | 14.249% |
| 3 | 1,440 | 1.975% |
| 4 | 241 | 0.331% |
| 5 | 65 | 0.089% |
| 6 | 7 | 0.010% |
| 7 | 3 | 0.004% |
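Roughly one in six entities above (about 16.7%) participates in two or more relations, so decoding must let triples share entities. A minimal sketch of threshold-based decoding over the multi-head scores, following the hypothetical JointExtractor sketch earlier:

```python
import torch

def decode_relations(rel_logits, threshold=0.5):
    # rel_logits: (T, T, R) scores for (head token, tail token, relation).
    # Thresholding sigmoid probabilities keeps every triple above the cutoff,
    # so one token can head or tail several relations; an argmax decode would
    # instead force a single relation per token pair and lose overlapping cases.
    probs = torch.sigmoid(rel_logits)
    idx = torch.nonzero(probs > threshold)     # (N, 3) index rows
    return [(h.item(), t.item(), r.item()) for h, t, r in idx]
```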
Joint learning results with the official BERT-base (Chinese) encoder versus our fine-tuned BERT.
| Model | Label embedding | F1-score | Precision | Recall |
|---|---|---|---|---|
| BERT-base (Chinese) | Without | 0.7968 | 0.7998 | 0.7939 |
| BERT-base (Chinese) | With | 0.8102 | 0.8119 | 0.8085 |
| Fine-tuned BERT (ours) | Without | 0.8016 | 0.8218 | 0.7823 |
| Fine-tuned BERT (ours) | With | 0.8216 | 0.8250 | 0.8183 |
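The "With/Without label embedding" rows contrast feeding the predicted entity tag back into the relation layer. One common realization, sketched here as an assumption rather than the paper's exact wiring, embeds each token's predicted BIO tag and concatenates it to the encoder output before relation scoring:

```python
import torch
import torch.nn as nn

class LabelEmbedding(nn.Module):
    # Embed each token's predicted BIO tag and concatenate it to the token
    # representation, letting the relation scorer condition on entity types.
    def __init__(self, n_ent_tags=7, label_dim=64):
        super().__init__()
        self.emb = nn.Embedding(n_ent_tags, label_dim)

    def forward(self, h, ent_logits):
        tags = ent_logits.argmax(dim=-1)                # (B, T) hard tag predictions
        return torch.cat([h, self.emb(tags)], dim=-1)   # (B, T, hidden + label_dim)
```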
Figure 3. The joint model framework for entity relation extraction.
Comparison of pipeline approaches and joint learning for entity recognition and relation extraction.
| Model | Label embedding | F1-score | Precision | Recall |
|---|---|---|---|---|
| Relation extraction pipeline | Without | 0.6794 | 0.8374 | 0.5716 |
| Multi-head joint learning | Without | 0.7079 | 0.8228 | 0.6212 |
| BERT + relation extraction pipeline | Without | 0.7222 | 0.7596 | 0.6884 |
| BERT + relation extraction pipeline | With | 0.7851 | 0.8496 | 0.7297 |
| BERT + multi-head joint learning | Without | 0.8016 | 0.8218 | 0.7823 |
| BERT + multi-head joint learning | With | 0.8216 | 0.8250 | 0.8183 |
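The joint rows train the tagger and the relation scorer together over a shared encoder, which is what separates them from the pipelines above. Under the same assumptions as the earlier sketches, the usual combined objective is token-level cross-entropy for the tagger plus binary cross-entropy on the relation scores:

```python
import torch
import torch.nn.functional as F

def joint_loss(ent_logits, rel_logits, ent_gold, rel_gold):
    # ent_logits: (B, T, n_tags), ent_gold: (B, T) gold tag ids.
    # rel_logits, rel_gold: (B, T, T, R) scores and 0/1 relation targets.
    ner = F.cross_entropy(ent_logits.flatten(0, 1), ent_gold.flatten())
    rel = F.binary_cross_entropy_with_logits(rel_logits, rel_gold.float())
    # Summing both terms backpropagates through the shared encoder,
    # training entity recognition and relation extraction end to end.
    return ner + rel
```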