Shuli Guo, Wentao Yang, Lina Han, Xiaowei Song, Guowei Wang.
Abstract
OBJECTIVE: Named entity recognition (NER) is a key and fundamental part of many medical and clinical tasks, including the establishment of medical knowledge graphs, decision-making support, and question answering systems. When extracting entities from electronic health records (EHRs), NER models mostly apply long short-term memory (LSTM) and have achieved surprisingly good performance in clinical NER. However, these LSTM-based models often need to increase network depth to capture long-distance dependencies. Consequently, the LSTM-based models that achieve high accuracy generally require long training times and extensive training data, which has hindered their adoption in clinical scenarios where training time is limited.
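To make the task concrete, here is a minimal sketch of clinical NER framed as BERT token classification, assuming the HuggingFace Transformers library and a generic `bert-base-chinese` checkpoint; the label set and sentence are illustrative, this is not the authors' model, and greedy per-token decoding stands in for the CRF layer used in the paper's baselines.

```python
# Hypothetical sketch: BERT-based NER as token classification (not the paper's model).
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

labels = ["O", "B-Disease", "I-Disease", "B-Drug", "I-Drug"]  # toy BIO label set
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-chinese", num_labels=len(labels)  # classifier head is untrained here
)

text = "患者口服阿司匹林治疗冠心病"  # "patient takes aspirin for coronary heart disease"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, num_labels)

# Greedy per-token decoding; a CRF layer would instead decode the best
# label sequence jointly, enforcing valid BIO transitions.
pred_ids = logits.argmax(-1)[0].tolist()
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, i in zip(tokens, pred_ids):
    print(tok, labels[i])
```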
Keywords: Clinical named entity recognition; Clinical text mining; Fine-tuning BERT; Medical information processing; Transformer; Word-character lattice
Year: 2022 PMID: 35908055 PMCID: PMC9338545 DOI: 10.1186/s12911-022-01924-4
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Proportion of entities in the training data
| Label | Count |
|---|---|
| Disease | 2116 |
| Image review | 222 |
| Test | 318 |
| Treatment | 765 |
| Drug | 456 |
| Anatomy | 1486 |
| Total | 5363 |
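The counts above can be turned into the proportions the table caption refers to; the snippet below is our own illustration, not code from the paper.

```python
# Convert the reported entity counts into training-set proportions.
counts = {
    "Disease": 2116, "Image review": 222, "Test": 318,
    "Treatment": 765, "Drug": 456, "Anatomy": 1486,
}
total = sum(counts.values())  # 5363, matching the table's Total row
for label, n in counts.items():
    print(f"{label}: {n} ({n / total:.1%})")
```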
Fig. 1 The architecture of our proposed method
Fig. 2 BERT visualization: a attention-head view of BERT for the inputs; the left and center figures represent different layers/attention heads, and the right figure depicts the same layer/head as the center figure, but with the Sentence A → Sentence B filter selected [31]; b model view of BERT for the same inputs, layer 4; c neuron view of BERT for layer 0, head 0
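A view like Fig. 2a can be reproduced with the BertViz tool cited as [31]; the sketch below assumes a Jupyter notebook, the HuggingFace Transformers library, and an illustrative `bert-base-uncased` checkpoint and sentence pair.

```python
# Hedged sketch of an attention-head view with BertViz (runs in a notebook).
from transformers import BertTokenizer, BertModel
from bertviz import head_view

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat.", "The cat lay down.", return_tensors="pt")
attention = model(**inputs).attentions  # tuple: one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
# sentence_b_start enables the Sentence A -> Sentence B filter shown in Fig. 2a
sentence_b_start = (inputs["token_type_ids"][0] == 1).nonzero()[0].item()
head_view(attention, tokens, sentence_b_start)
```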
Fig. 3 Soft Lattice Chinese Transformer structure
Fig. 4 Multiple layers of Chinese sentence segmentation
Test results on the CCKS2019 dataset
| Models | P (%) | R (%) | F1 (%) | Training time |
|---|---|---|---|---|
| LSTM-CRF | 79.32 | 83.21 | 80.13 | 11 h |
| GRU-CRF | 80.23 | 82.03 | 82.14 | 9 h 7 m |
| BERT-CRF | 83.47 | 80.14 | 82.50 | 7 h 45 m |
| BERT-LSTM-CRF | 86.07 | 86.23 | 86.71 | 8 h 32 m |
| BERT-GRU-CRF | 85.35 | 86.18 | 85.36 | 7 h 23 m |
| BERT-FLAT-CRF | 86.56 | 86.56 | 87.49 | 6 h 6 m |
| BERT-Soft Lattice structure Transformer-CRF | 87.86 | 87.53 | 87.83 | 6 h 15 m |
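The P, R, and F1 columns above follow the usual strict entity-level NER metric; the sketch below illustrates how such scores are typically computed, under the assumption (ours, not stated in this excerpt) that an entity counts as correct only when its span boundaries and type both match exactly.

```python
# Hedged sketch of strict entity-level precision/recall/F1 scoring.
def prf1(gold_spans, pred_spans):
    gold, pred = set(gold_spans), set(pred_spans)
    tp = len(gold & pred)  # exact (start, end, type) matches only
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

gold = [(0, 4, "Disease"), (10, 14, "Drug")]
pred = [(0, 4, "Disease"), (10, 13, "Drug")]  # boundary error on the drug span
print(prf1(gold, pred))  # (0.5, 0.5, 0.5)
```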
Fig. 5 Comparison of the epoch-F1 relationship across different models
Fig. 6 Performance in terms of F1-score versus epoch
Results of models in identifying long entities
| Models | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| LSTM-CRF | 82.5 | 73.4 | 72.3 |
| GRU-CRF | 80.4 | 72.3 | 75.7 |
| BERT-CRF | 85.4 | 76.7 | 76.2 |
| BERT-LSTM-CRF | 82.3 | 83.9 | 82.6 |
| BERT-GRU-CRF | 83.1 | 84.3 | 84.2 |
| BERT-FLAT-CRF | 85.6 | 86.2 | 85.2 |
| BERT-Soft Lattice structure Transformer-CRF | 90.5 | 91.4 | 91.6 |
Results of models in identifying abbreviations and numbers
| Models | P (%) | R (%) | F1 (%) |
|---|---|---|---|
| LSTM-CRF | 79.54 | 80.47 | 80.63 |
| GRU-CRF | 80.43 | 82.35 | 82.27 |
| BERT-CRF | 85.45 | 86.7 | 86.25 |
| BERT-LSTM-CRF | 86.35 | 85.9 | 86.16 |
| BERT-GRU-CRF | 86.13 | 85.3 | 85.25 |
| Soft Lattice structure Transformer-CRF | 90.12 | 90.72 | 90.24 |
| BERT-Soft Lattice structure Transformer-CRF | 90.65 | 90.64 | 90.36 |
Fig. 7 Performance in terms of F1-score with proportions of boundary information