Buzhou Tang, Xiaolong Wang, Jun Yan, Qingcai Chen.
Abstract
BACKGROUND: Clinical entity recognition, a fundamental task of clinical text processing, has attracted a great deal of attention during the last decade. However, most studies focus on clinical text in English rather than other languages. Recently, a few researchers have begun to study entity recognition in Chinese clinical text.
Keywords: Chinese clinical entity recognition; Conditional random field; Convolutional neural network; Long-short term memory; Neural network
Year: 2019 PMID: 30943972 PMCID: PMC6448175 DOI: 10.1186/s12911-019-0787-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 Overview architecture of attention-based CNN-LSTM-CRF
Fig. 2 Overview architecture of the CNN layer
Fig. 3 Overview architecture of the attention layer
Statistics of CCKS2017_CNER and ICRC_CNER for entity recognition in Chinese clinical text
| Dataset (CCKS2017_CNER) | #Record | #Body | #Disease | #Symptom | #Test | #Treatment | #All |
|---|---|---|---|---|---|---|---|
| Training (300) | 300 | 10,719 | 722 | 7831 | 9546 | 1048 | 29,866 |
| Test (100) | 100 | 3021 | 553 | 2311 | 3143 | 465 | 9493 |
| Total (400) | 400 | 13,740 | 1275 | 10,142 | 12,689 | 1513 | 39,359 |
| Dataset (ICRC_CNER) | #Record | #Medication | #Disease | #Symptom | #Test | #Treatment | #All |
|---|---|---|---|---|---|---|---|
| Training | 600 | 1293 | 11,470 | 5270 | 17,024 | 3065 | 38,122 |
| | | 0 | 7441 | 75 | 7 | 107 | 7630 |
| Development | 176 | 475 | 3594 | 1738 | 5276 | 938 | 12,021 |
| | | 0 | 2421 | 37 | 3 | 41 | 2502 |
| Test | 400 | 999 | 7932 | 3353 | 11,326 | 2020 | 25,630 |
| | | 3 | 5153 | 57 | 6 | 61 | 5280 |
| Total | 1176 | 2767 | 22,996 | 10,361 | 33,626 | 6023 | 75,773 |
| | | 3 | 15,015 | 169 | 16 | 209 | 15,412 |
Performances of different methods on the two datasets: CCKS2017_CNER and ICRC_CNER
| Dataset | Method | Strict Precision (%) | Strict Recall (%) | Strict F1-score (%) | Relaxed Precision (%) | Relaxed Recall (%) | Relaxed F1-score (%) |
|---|---|---|---|---|---|---|---|
| CCKS2017_CNER | CRF | | 88.20 | 89.69 | | 92.57 | 94.13 |
| | LSTM-CRF | 90.68 | 89.67 | 90.17 | 95.18 | 94.12 | 94.65 |
| | Our Method | 90.73 | | | 94.84 | | |
| ICRC_CNER | CRF | 81.84 | 78.86 | 80.32 | 93.75 | 90.34 | 92.01 |
| | | 83.42 | 79.90 | 81.62 | | 90.05 | 92.00 |
| | LSTM-CRF | | 82.26 | 82.90 | 93.80 | 92.35 | 93.07 |
| | | 82.71 | 83.30 | 83.00 | 92.77 | 93.42 | 93.09 |
| | Our Method | 82.96 | 82.60 | 82.78 | 93.30 | 92.90 | 93.10 |
| | | 82.66 | | | 92.57 | | |
Table 2 shows the performance of the different methods on CCKS2017_CNER and ICRC_CNER, with the highest values in bold (the following sections use the same convention).
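The record does not define the "strict" and "relaxed" criteria, so the following is only a minimal sketch under a common assumption: a strict match requires identical boundaries and category, and a relaxed match requires overlapping boundaries and an identical category. The `Entity` tuple layout, the function names, and the toy spans are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

# Hypothetical entity representation: (start, end_exclusive, category).
Entity = Tuple[int, int, str]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in and out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def overlaps(a: Entity, b: Entity) -> bool:
    """Same category and at least one shared character position."""
    return a[2] == b[2] and a[0] < b[1] and b[0] < a[1]


def evaluate(gold: List[Entity], pred: List[Entity], relaxed: bool = False):
    """Return (precision, recall, F1) in percent.

    Assumed criteria: strict = identical boundaries and category;
    relaxed = overlapping boundaries and identical category.
    """
    if relaxed:
        hit_pred = sum(any(overlaps(p, g) for g in gold) for p in pred)
        hit_gold = sum(any(overlaps(g, p) for p in pred) for g in gold)
    else:
        gold_set, pred_set = set(gold), set(pred)
        hit_pred = sum(p in gold_set for p in pred)
        hit_gold = sum(g in pred_set for g in gold)
    precision = 100.0 * hit_pred / len(pred) if pred else 0.0
    recall = 100.0 * hit_gold / len(gold) if gold else 0.0
    return precision, recall, f1(precision, recall)


# Toy example with made-up spans (not taken from either corpus).
gold = [(0, 2, "Symptom"), (5, 8, "Disease"), (10, 12, "Test")]
pred = [(0, 2, "Symptom"), (5, 7, "Disease"), (14, 16, "Test")]
strict = evaluate(gold, pred)
relaxed = evaluate(gold, pred, relaxed=True)
print(strict, relaxed)
```

In the toy example the second prediction has the wrong right boundary, so it counts only under the relaxed criterion. As a consistency check against the reported numbers, `f1(90.68, 89.67)` rounds to 90.17, the strict F1-score listed for LSTM-CRF on CCKS2017_CNER.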
Effects of the CNN layer and attention layer in our method
| Method | CCKS2017_CNER Precision (%) | CCKS2017_CNER Recall (%) | CCKS2017_CNER F1-score (%) | ICRC_CNER Precision (%) | ICRC_CNER Recall (%) | ICRC_CNER F1-score (%) |
|---|---|---|---|---|---|---|
| Our method | 90.73 | | 90.61 | 82.66 | | |
| w/o CNN | | 90.48 | | | 82.53 | 83.12 |
| w/o attention | 90.61 | 90.23 | 90.42 | 83.16 | 83.29 | 83.22 |
| w/o both | 90.68 | 89.67 | 90.17 | 82.71 | 83.30 | 83.00 |
Performances of our CNN-LSTM-Attention model on each category under the “strict” criterion
| Category | ICRC_CNER Pre. (%) | ICRC_CNER Rec. (%) | ICRC_CNER F1 (%) | CCKS2017_CNER Pre. (%) | CCKS2017_CNER Rec. (%) | CCKS2017_CNER F1 (%) |
|---|---|---|---|---|---|---|
| Disease | 82.84 | 81.67 | 82.25 | 85.06 | 77.22 | 80.95 |
| Symptom | 77.06 | 76.01 | 76.53 | 94.92 | 96.28 | |
| Test | 84.19 | 89.03 | | 93.66 | 93.48 | |
| Treatment | 77.53 | 79.58 | 78.54 | 77.63 | 79.08 | 83.98 |
| Medication | 87.88 | 89.72 | | / | / | / |
| Body | / | / | / | 86.89 | 87.36 | |
Performances of methods on contiguous and discontiguous clinical entities under the “strict” criterion on ICRC_CNER
| Method | Contiguous Precision (%) | Contiguous Recall (%) | Contiguous F1-score (%) | Discontiguous Precision (%) | Discontiguous Recall (%) | Discontiguous F1-score (%) |
|---|---|---|---|---|---|---|
| CRF | 83.52 | 84.35 | 83.93 | | 58.26 | 68.37 |
| LSTM-CRF | 83.35 | 87.35 | 85.30 | 78.70 | 63.62 | 70.36 |
| Our method | | | | 77.17 | | |
Fig. 4 “Strict” F1-scores of different methods on each category of clinical text