Tingyin Chen<sup>1,2</sup>, Yongmei Hu<sup>3</sup>.
Abstract
BACKGROUND: Extracting entities and their relationships from electronic medical records (EMRs) is an important research direction in medical informatization. A recently proposed method uses annotation rules to recast entity relation extraction as entity recognition, so that relations can be extracted by an entity recognition model. However, this method cannot handle one-to-many entity relationships.
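The abstract does not spell out the annotation rules, so the following is only a hypothetical illustration of why a single tag sequence struggles with one-to-many relations. It assumes an invented scheme in which every token of a related entity carries a BIO prefix, a relation type, and a role (`H` for head, `T` for tail); the function name `encode_relations` and the label `LocSym` are illustrative, not the authors'.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets of one entity
Relation = Tuple[str, Span, Span]  # (relation type, head span, tail span)


def encode_relations(n_tokens: int, relations: List[Relation]) -> List[str]:
    """Encode relations as one relation-typed BIO tag per token.

    Hypothetical scheme: "B-LocSym-H" marks the first token of the head
    entity of a LocSym relation, "I-LocSym-T" a later token of its tail.
    Raises ValueError when a token would need a second tag, i.e. when an
    entity takes part in more than one relation (the one-to-many case).
    """
    tags = ["O"] * n_tokens
    for rel_type, head, tail in relations:
        for role, (start, end) in (("H", head), ("T", tail)):
            for i in range(start, end):
                if tags[i] != "O":
                    raise ValueError(f"token {i} already tagged {tags[i]!r}")
                prefix = "B" if i == start else "I"
                tags[i] = f"{prefix}-{rel_type}-{role}"
    return tags


# One location linked to one symptom: representable.
print(encode_relations(4, [("LocSym", (0, 1), (1, 3))]))

# One location linked to two symptoms: the shared head cannot be tagged twice.
try:
    encode_relations(3, [("LocSym", (0, 1), (1, 2)), ("LocSym", (0, 1), (2, 3))])
except ValueError as err:
    print("one-to-many:", err)
```

Because each token holds exactly one tag, a location linked to two symptoms would need two labels on the same tokens, which is the one-to-many situation the abstract says the earlier method cannot represent.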
Keywords: Electronic medical record (EMR); annotation rules; deep learning; entity recognition; relation extraction
Year: 2021 PMID: 34733967 PMCID: PMC8506757 DOI: 10.21037/atm-21-3828
Source DB: PubMed Journal: Ann Transl Med ISSN: 2305-5839
Figure 1. Entity relation extraction.
Figure 2. Annotation of relation entities.
Figure 3. Sequence annotation of the entity relationship.
Figure 4. Model structure.
Parameter adjustment table
| Parameter type | Optimal | Test range |
|---|---|---|
| Word embedding dimension | 100 | 50–300 |
| Character embedding dimension | 100 | 50–300 |
| CNN convolution kernel size | 2, 3 | 2–7 |
| CNN output size | 200 | 100–300 |
| LSTM output size | 300 | 100–300 |
| Learning rate | 0.001 | 0.001–0.1 |
| Minibatch size | 20 | 10–50 |
| Dropout value | 0.5 | 0.5–1 |
CNN, convolutional neural network; LSTM, long short-term memory.
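The tuned values can be collected into a single configuration object. The sketch below records the optimal value of each parameter from the table and checks it against the tested range; the field names and the `out_of_range` helper are hypothetical, and the table does not say whether the dropout value is a drop or a keep probability, so it is stored as given.

```python
from dataclasses import dataclass, fields
from typing import List, Tuple


@dataclass(frozen=True)
class TunedHyperparams:
    """Optimal values from the parameter adjustment table (names hypothetical)."""
    word_emb_dim: int = 100
    char_emb_dim: int = 100
    cnn_kernel_sizes: Tuple[int, ...] = (2, 3)
    cnn_output_size: int = 200
    lstm_output_size: int = 300
    learning_rate: float = 0.001
    minibatch_size: int = 20
    dropout: float = 0.5


# Tested ranges from the table, stored as (low, high).
TEST_RANGES = {
    "word_emb_dim": (50, 300),
    "char_emb_dim": (50, 300),
    "cnn_kernel_sizes": (2, 7),
    "cnn_output_size": (100, 300),
    "lstm_output_size": (100, 300),
    "learning_rate": (0.001, 0.1),
    "minibatch_size": (10, 50),
    "dropout": (0.5, 1.0),
}


def out_of_range(params: TunedHyperparams) -> List[str]:
    """Return the names of parameters whose value lies outside its tested range."""
    bad = []
    for f in fields(params):
        value = getattr(params, f.name)
        low, high = TEST_RANGES[f.name]
        values = value if isinstance(value, tuple) else (value,)
        if not all(low <= v <= high for v in values):
            bad.append(f.name)
    return bad


print(out_of_range(TunedHyperparams()))
```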
Statistics of the experimental data sets
| Data category | 200 articles | 800 articles | 1,000 articles | Average per article |
|---|---|---|---|---|
| Sentences | 6,799 | 27,931 | 34,610 | 34.61 |
| Words | 39,016 | 154,112 | 192,980 | 192.98 |
| Characters | 191,332 | 761,332 | 950,600 | 950.60 |
| Entities | 10,198 | 40,329 | 50,190 | 50.19 |
| Relationships | 2,566 | 9,688 | 12,130 | 12.13 |
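The last column equals the 1,000-article total divided by 1,000, i.e. a per-article mean. A quick check, with totals copied from the table (the dictionary and function names are illustrative):

```python
from typing import Dict

# Totals for the full 1,000-article data set, copied from the table.
TOTALS_1000 = {
    "Sentences": 34_610,
    "Words": 192_980,
    "Characters": 950_600,
    "Entities": 50_190,
    "Relationships": 12_130,
}


def per_article(totals: Dict[str, int], n_articles: int) -> Dict[str, float]:
    """Mean count per article, rounded to two decimals as in the table."""
    return {name: round(count / n_articles, 2) for name, count in totals.items()}


print(per_article(TOTALS_1000, 1000))
```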
Analysis of entity recognition results
| Category | P (%) | R (%) | F1 |
|---|---|---|---|
| All | 90.76 | 91.40 | 0.9108 |
| Disease | 83.76 | 84.09 | 0.8491 |
| Symptoms | 93.36 | 93.33 | 0.9335 |
| Location | 94.26 | 94.68 | 0.9447 |
| Exam | 88.27 | 89.04 | 0.8865 |
| Exam results | 82.29 | 82.51 | 0.8240 |
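In these tables precision (P) and recall (R) are given in percent, while F1, their harmonic mean 2PR/(P + R), is given as a fraction between 0 and 1. A small helper (name illustrative) reproduces, for example, the "All" row:

```python
def f1_from_percent(precision_pct: float, recall_pct: float) -> float:
    """Harmonic mean of precision and recall, both given in percent.

    Returns F1 as a fraction in [0, 1], matching the tables' F1 column.
    """
    p, r = precision_pct / 100.0, recall_pct / 100.0
    if p + r == 0:
        return 0.0
    return 2 * p * r / (p + r)


# "All" row of the entity recognition table: P = 90.76 %, R = 91.40 %.
print(round(f1_from_percent(90.76, 91.40), 4))
```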
Analysis of entity relation extraction results
| Category | P (%) | R (%) | F1 |
|---|---|---|---|
| All | 83.46 | 81.12 | 0.8227 |
| Exam-disease | 82.11 | 81.21 | 0.8166 |
| Exam-exam results | 80.21 | 79.92 | 0.8006 |
| Location-symptoms | 85.19 | 86.23 | 0.8571 |
| Location-exam results | 81.14 | 80.66 | 0.8090 |
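The section does not state how predicted relations were matched against the gold standard. A common convention, assumed here, counts a predicted relation as correct only if its type and both entities exactly match a gold relation, with precision, recall and F1 computed over those triples; per-category rows then restrict both sets to one relation type. The entity strings below are invented examples.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (relation type, head entity, tail entity)


def prf(gold: Set[Triple], pred: Set[Triple]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over relation triples."""
    tp = len(gold & pred)  # predicted triples that exactly match a gold triple
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def prf_by_type(gold: Set[Triple], pred: Set[Triple]) -> Dict[str, Tuple[float, float, float]]:
    """Per-relation-type scores, analogous to the category rows of the table."""
    types = {t[0] for t in gold | pred}
    return {
        t: prf({g for g in gold if g[0] == t}, {q for q in pred if q[0] == t})
        for t in types
    }


gold = {
    ("Location-symptoms", "knee", "pain"),
    ("Location-symptoms", "head", "ache"),
    ("Exam-exam results", "CT", "nodule"),
}
pred = {
    ("Location-symptoms", "knee", "pain"),
    ("Exam-exam results", "CT", "mass"),
}
print(prf(gold, pred))
print(prf_by_type(gold, pred))
```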
Performance comparison of entity recognition results under different systems
| Category | BIO, BiLSTM + CRF P (%) | BIO, BiLSTM + CRF R (%) | BIO, BiLSTM + CRF F1 | BIO, IDCNN + CRF P (%) | BIO, IDCNN + CRF R (%) | BIO, IDCNN + CRF F1 | BIEO, BiLSTM + CRF P (%) | BIEO, BiLSTM + CRF R (%) | BIEO, BiLSTM + CRF F1 | BIEO, IDCNN + CRF P (%) | BIEO, IDCNN + CRF R (%) | BIEO, IDCNN + CRF F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | 92.03 | 91.94 | 0.9201 | 91.62 | 91.52 | 0.9183 | 90.76 | 91.40 | 0.9108 | 90.92 | 91.75 | 0.9178 |
| Disease | 86.52 | 85.87 | 0.8671 | 86.76 | 85.95 | 0.8701 | 83.76 | 84.09 | 0.8491 | 83.92 | 85.12 | 0.8508 |
| Symptoms | 94.06 | 93.83 | 0.9425 | 94.26 | 93.97 | 0.9375 | 93.36 | 93.33 | 0.9335 | 93.86 | 92.53 | 0.9401 |
| Location | 96.06 | 96.88 | 0.9577 | 96.26 | 96.68 | 0.9527 | 94.26 | 94.68 | 0.9447 | 95.16 | 95.08 | 0.9497 |
| Exam | 88.78 | 90.04 | 0.8975 | 89.21 | 89.84 | 0.8885 | 88.27 | 89.04 | 0.8865 | 88.36 | 89.46 | 0.8950 |
| Exam results | 85.25 | 84.95 | 0.8397 | 84.19 | 84.07 | 0.8382 | 82.29 | 82.51 | 0.8240 | 83.01 | 82.94 | 0.8259 |
BIO, Begin-Intermediate-Other sequence annotation; BIEO, Begin-Intermediate-End-Other sequence annotation; BiLSTM, bidirectional long short-term memory; CRF, conditional random field; IDCNN, iterated dilated convolutional neural network.
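The two annotation schemes compared above differ only in whether the last token of a multi-token entity gets its own `E` tag. The sketch below converts entity spans into either tag sequence; the legend does not say how a single-token entity is tagged under BIEO (the scheme as named has no single-token tag), so this sketch assumes it simply receives `B`. The function name `to_tags` is illustrative.

```python
from typing import List, Tuple

Entity = Tuple[int, int, str]  # [start, end) token offsets and entity type


def to_tags(n_tokens: int, entities: List[Entity], scheme: str = "BIO") -> List[str]:
    """Convert non-overlapping entity spans into BIO or BIEO tags."""
    if scheme not in ("BIO", "BIEO"):
        raise ValueError(f"unknown scheme {scheme!r}")
    tags = ["O"] * n_tokens  # "O" (Other) for tokens outside any entity
    for start, end, label in entities:
        for i in range(start, end):
            if i == start:
                prefix = "B"  # assumption: single-token entities also get B
            elif scheme == "BIEO" and i == end - 1:
                prefix = "E"  # last token of a multi-token entity
            else:
                prefix = "I"
            tags[i] = f"{prefix}-{label}"
    return tags


spans = [(0, 3, "Exam"), (4, 5, "Symptoms")]
print(to_tags(5, spans, "BIO"))
print(to_tags(5, spans, "BIEO"))
```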
Performance comparison of entity relation extraction results under different systems
| Category | BIO, BiLSTM + CRF P (%) | BIO, BiLSTM + CRF R (%) | BIO, BiLSTM + CRF F1 | BIO, IDCNN + CRF P (%) | BIO, IDCNN + CRF R (%) | BIO, IDCNN + CRF F1 | BIEO, BiLSTM + CRF P (%) | BIEO, BiLSTM + CRF R (%) | BIEO, BiLSTM + CRF F1 | BIEO, IDCNN + CRF P (%) | BIEO, IDCNN + CRF R (%) | BIEO, IDCNN + CRF F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| All | 84.76 | 82.62 | 0.8387 | 84.36 | 82.31 | 0.8371 | 83.46 | 81.12 | 0.8227 | 83.61 | 82.02 | 0.8269 |
| Exam-disease | 84.14 | 82.72 | 0.8286 | 83.91 | 82.61 | 0.8296 | 82.11 | 81.21 | 0.8166 | 82.82 | 81.98 | 0.8206 |
| Exam-exam results | 82.12 | 81.42 | 0.8260 | 81.81 | 80.92 | 0.8206 | 80.21 | 79.92 | 0.8006 | 80.81 | 79.72 | 0.8010 |
| Location-symptoms | 85.99 | 87.51 | 0.8691 | 86.79 | 87.93 | 0.8691 | 85.19 | 86.23 | 0.8571 | 84.89 | 86.53 | 0.8681 |
| Location-exam results | 83.42 | 82.86 | 0.8197 | 81.82 | 82.46 | 0.8195 | 81.14 | 80.66 | 0.8090 | 81.74 | 81.06 | 0.8079 |
BIO, Begin-Intermediate-Other sequence annotation; BIEO, Begin-Intermediate-End-Other sequence annotation; BiLSTM, bidirectional long short-term memory; CRF, conditional random field; IDCNN, iterated dilated convolutional neural network.
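Before predicted entities, and the relations encoded in them, can be scored, the tag sequence produced by the CRF layer has to be decoded back into spans. The sketch below is one lenient decoder, assumed rather than taken from the paper, that handles both BIO and BIEO output: a span opens at `B`, continues through `I`, closes at `E`, `O`, a new `B`, or a tag of a different type, and stray `I`/`E` tags without a matching `B` are dropped. The name `decode` is illustrative.

```python
from typing import List, Optional, Tuple

Entity = Tuple[int, int, str]  # [start, end) token offsets and entity type


def decode(tags: List[str]) -> List[Entity]:
    """Recover entity spans from a BIO or BIEO tag sequence."""
    spans: List[Entity] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, typ = tag.partition("-")  # "B-Exam" -> ("B", "-", "Exam")
        if prefix == "B":
            if start is not None:  # a new B closes any open span
                spans.append((start, i, label))
            start, label = i, typ
        elif prefix in ("I", "E") and start is not None and typ == label:
            if prefix == "E":  # explicit end of the current span (BIEO)
                spans.append((start, i + 1, label))
                start = label = None
        else:  # "O", or an I/E tag that does not continue the open span
            if start is not None:
                spans.append((start, i, label))
            start = label = None
    if start is not None:  # span running to the end of the sentence
        spans.append((start, len(tags), label))
    return spans


print(decode(["B-Exam", "I-Exam", "E-Exam", "O", "B-Symptoms"]))
print(decode(["B-Exam", "I-Exam", "I-Exam", "O", "B-Symptoms"]))
```

Because both schemes decode to the same spans here, differences between the BIO and BIEO columns come from the models' tagging accuracy rather than from the decoding step.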