| Literature DB >> 32364514 |
Zhichang Zhang, Lin Zhu, Peilin Yu.
Abstract
BACKGROUND: Medical entity recognition is a key technology that supports the development of smart medicine. Methods for English medical entity recognition are well developed, but progress on Chinese has been slow. Limited by the complexity of the Chinese language and by scarce annotated corpora, existing Chinese methods are based on simple neural networks that cannot effectively extract the deep semantic representations of electronic medical records (EMRs) or cope with scarce medical corpora. We therefore developed a new Chinese EMR (CEMR) dataset with six types of entities and proposed a multi-level representation learning model based on Bidirectional Encoder Representations from Transformers (BERT) for Chinese medical entity recognition.
Keywords: Chinese; electronic medical records; medical entity recognition; multi-head attention mechanism; multi-level representation learning; natural language processing
Year: 2020 PMID: 32364514 PMCID: PMC7235813 DOI: 10.2196/17637
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. A tagging example of Chinese electronic medical records.
Electronic medical record (EMR) data distribution by department.
| Department | EMR count, n (%) |
| Neurosurgery | 77 (1.93) |
| Neurology | 77 (1.93) |
| Cardiology | 77 (1.93) |
| Gynecology and obstetrics | 77 (1.93) |
| Andrology | 77 (1.93) |
| Respiratory medicine | 77 (1.93) |
| Cardiovasology | 77 (1.93) |
| Hepatobiliary surgery | 77 (1.93) |
| Ophthalmology | 77 (1.93) |
| Orthopedics | 77 (1.93) |
| Gynecology | 101 (2.53) |
| Pediatrics | 232 (5.80) |
| Internal medicine | 970 (24.25) |
| Surgery | 1495 (37.38) |
| Other | 432 (10.80) |
| Total | 4000 (100) |
Figure 2. Multi-level representation learning for the entity recognition (ER) model. B-Sym: beginning of the noun phrase for the symptom entity; B-Test: beginning of the noun phrase for the test entity; C: input sentence; E: input embedding; I-Sym: middle of the noun phrase for the symptom entity; I-Test: middle of the noun phrase for the test entity; O: not a noun phrase; Trm: transformer block; y: output sentence’s predicted tag sequence.
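To make the tag scheme concrete, here is a minimal sketch of character-level BIO labeling; the sentence and entity spans are illustrative, not drawn from the CEMR dataset:

```python
# Minimal sketch of the BIO tagging scheme described in Figure 2.
# The sentence and spans are illustrative examples, not from the paper.
sentence = list("患者主诉头痛，行头颅CT检查")  # character-level tokens

# Hypothetical gold spans: (start, end_exclusive, entity_type)
spans = [(4, 6, "Sym"),    # 头痛 (headache)  -> symptom entity
         (8, 12, "Test")]  # 头颅CT (head CT) -> test entity

tags = ["O"] * len(sentence)
for start, end, etype in spans:
    tags[start] = f"B-{etype}"            # beginning of the noun phrase
    for i in range(start + 1, end):
        tags[i] = f"I-{etype}"            # middle of the noun phrase

for char, tag in zip(sentence, tags):
    print(char, tag)
```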
Figure 3. Multi-head attention mechanism. K: key; Q: query; V: value.
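A minimal NumPy sketch of the scaled dot-product, multi-head attention in Figure 3; for brevity the learned per-head projection matrices are replaced by feature slicing, so this is an illustration of the mechanism rather than the paper's implementation:

```python
import numpy as np

def multi_head_attention(Q, K, V, num_heads):
    """Scaled dot-product attention over several heads (illustrative sketch).

    Q, K, V: (seq_len, d_model) arrays; d_model must be divisible by num_heads.
    """
    seq_len, d_model = Q.shape
    d_k = d_model // num_heads
    heads = []
    for h in range(num_heads):
        s = slice(h * d_k, (h + 1) * d_k)            # this head's feature slice
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_k)  # scaled dot products
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
        heads.append(weights @ V[:, s])              # weighted sum of values
    return np.concatenate(heads, axis=-1)            # concat heads -> d_model

x = np.random.randn(5, 8)    # 5 tokens, model width 8 (toy dimensions)
out = multi_head_attention(x, x, x, num_heads=2)
print(out.shape)             # (5, 8)
```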
Components of the two datasets.
| Dataset | Total records | Training set | Validation set | Test set |
| CEMRa dataset | 4000 | 2400 | 800 | 800 |
| CCKSb 2018 | 1000 | 600 | N/Ac | 400 |
aCEMR: Chinese electronic medical record.
bCCKS: China Conference on Knowledge Graph and Semantic Computing.
cN/A: not applicable. The comparison methods do not split off a validation set on the CCKS dataset, so we kept the same setup as the original experiments to make the comparison fair.
Comparison of method performance on the Chinese electronic medical record (CEMR) dataset.
| Method | Precision (%) | Recall (%) | F1 score (%) |
| Conditional random field (CRF) | 88.57 | 68.43 | 77.21 |
| CNNa+BiLSTMb+CRF | 81.51 | 76.92 | 79.15 |
| Lattice long short-term memory (LSTM) | 88.60 | 74.48 | 80.93 |
| Bidirectional Encoder Representations from Transformers (BERT) | 83.73 | 78.76 | 81.17 |
| Multi-level representation learning for entity recognition (multi-level ER) | 85.21 | 79.23 | 82.11 |
aCNN: convolutional neural network.
bBiLSTM: bidirectional long short-term memory.
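The F1 scores in these tables are the harmonic mean of precision and recall; a minimal check in Python (the helper name is ours, not from the paper):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall, both in percent."""
    return 2 * precision * recall / (precision + recall)

# Reproduces the multi-level ER row of the CEMR table above.
print(round(f1(85.21, 79.23), 2))  # 82.11
```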
Comparison of method performance on the China Conference on Knowledge Graph and Semantic Computing 2018 dataset.
| Method | Precision (%) | Recall (%) | F1 score (%) |
| BiLSTMa-CRFb | 65.68 | 69.04 | 67.32 |
| SMc-LSTM-CRF | 80.54 | 79.61 | 80.08 |
| Multi-level representation learning for entity recognition (multi-level ER) | 83.90 | 82.47 | 83.18 |
aBiLSTM: bidirectional long short-term memory.
bCRF: conditional random field.
cSM: self-matching attention mechanism.
The effect of assembling methods.
| Assembling method | Precision (%) | Recall (%) | F1 score (%) |
| Concatenation | 84.22 | 78.97 | 81.51 |
| Sum average | 83.27 | 79.06 | 81.11 |
| Multi-head attention mechanism | 85.21 | 79.23 | 82.11 |
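A minimal NumPy sketch of the three assembling strategies over a stack of per-layer BERT representations; the shapes are illustrative, and the attention variant is a simplified single-head stand-in for the multi-head mechanism of Figure 3, not the paper's exact implementation:

```python
import numpy as np

# layer_reps: (num_layers, seq_len, hidden) stack of BERT layer outputs
# (random placeholders here; in the model they come from the encoder).
num_layers, seq_len, hidden = 12, 5, 768
layer_reps = np.random.randn(num_layers, seq_len, hidden)

# 1) Concatenation: join all layers along the feature axis.
concat = np.concatenate(list(layer_reps), axis=-1)  # (seq_len, num_layers*hidden)

# 2) Sum average: element-wise mean over layers.
sum_avg = layer_reps.mean(axis=0)                   # (seq_len, hidden)

# 3) Attention: let each token attend over its own per-layer vectors and
#    pool them (simplified single-head version for brevity).
per_token = layer_reps.transpose(1, 0, 2)           # (seq_len, num_layers, hidden)
scores = per_token @ per_token.transpose(0, 2, 1) / np.sqrt(hidden)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)           # softmax over layers
attended = (weights @ per_token).mean(axis=1)       # (seq_len, hidden)
```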
The effect of the number of extracted layers.
| Extracted layers | Precision (%) | Recall (%) | F1 score (%) |
| Total layers | 85.21 | 79.23 | 82.11 |
| The last six layers | 85.15 | 78.65 | 81.77 |
| The last four layers | 85.50 | 78.68 | 81.95 |
| The last two layers | 84.51 | 78.68 | 81.49 |
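Restricting the assembly to the last N encoder layers can be sketched with Hugging Face Transformers (an assumed toolchain; the paper does not state its implementation, and bert-base-chinese is an illustrative checkpoint):

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese",
                                  output_hidden_states=True)

inputs = tokenizer("患者主诉头痛", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# hidden_states holds the embedding output plus one tensor per encoder layer.
all_layers = outputs.hidden_states[1:]    # drop the embedding layer
last_four = torch.stack(all_layers[-4:])  # "the last four layers" setting
print(last_four.shape)                    # (4, batch, seq_len, hidden)
```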
Figure 4. The effect of dataset size. BERT: Bidirectional Encoder Representations from Transformers; CNN: convolutional neural network; CRF: conditional random field; LSTM: long short-term memory; Multi-Level ER: multi-level representation learning for entity recognition.
Figure 5. Case studies comparing the multi-level representation learning for entity recognition (Multi-Level ER) model with the Bidirectional Encoder Representations from Transformers (BERT) model.