Tianjiao Yang, Ying He, Ning Yang.
Abstract
Medical texts record detailed clinical data; named entity recognition is the basis of text information processing and an important part of mining valuable information from medical texts. Named entity recognition technology can accurately identify the information needed in medical texts and support medical staff in clinical decision-making, evidence-based medicine, and epidemic disease monitoring. This paper proposes a hybrid neural network model for medical text named entity recognition. First, an encoding method based on a fully self-attentive mechanism is proposed. Through the attention mechanism, the vector representation of each word is related to the entire sentence: the model determines a weight distribution by scoring the characters or words at all positions and thereby identifies the positions in the sentence that deserve the most attention. The encoding vector at each position thus integrates the context of the full sentence, which resolves ambiguity. Second, a multivariate convolutional decoding method is proposed. This method attends to the characteristics of medical text named entity recognition during decoding. It uses two-dimensional convolutional decoding to associate the word at the current position with its surrounding words, improving decoding efficiency while extracting features from the logic of the preceding and following words. By using the same number of convolution kernels as there are entity categories, it can extract effective features along the label dimension. In addition, a special mixed loss is designed according to the characteristics of the named entity recognition task. The experimental results verify that the proposed method is effective and that it improves on several existing medical text named entity recognition methods.
Year: 2022 PMID: 35295179 PMCID: PMC8920682 DOI: 10.1155/2022/3990563
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Figure 1. The structure of the HNN (hybrid neural network).
Figure 2. The structure of the FSAE (fully self-attentive encoder). The model input is the word vector after word embedding.
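Per the abstract, the FSAE relates each word's vector to the entire sentence by scoring all positions and mixing their values. A minimal NumPy sketch of this generic scaled dot-product self-attention follows; the paper's actual layer sizes, projections, and parameters are not given in the abstract, so every dimension and weight matrix here is illustrative.

```python
import numpy as np

def self_attention_encode(X, Wq, Wk, Wv):
    """Fully self-attentive encoding: each position's output is a
    weighted mix of value vectors from every position in the
    sentence, so position i carries full-sentence context."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n) pairwise scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # context-mixed vectors

rng = np.random.default_rng(0)
n, d = 5, 8                                          # 5 tokens, dim 8 (illustrative)
X = rng.normal(size=(n, d))                          # word-embedding matrix
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
H = self_attention_encode(X, Wq, Wk, Wv)
print(H.shape)  # (5, 8): one context-aware vector per token
```

Because every row of `weights` sums to 1 over all positions, each output vector is a convex combination of the whole sentence, which is how this encoding disambiguates a word by its context.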
Figure 3. The structure of the MCD (multivariate convolutional decoder).
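The MCD, as the abstract describes it, slides two-dimensional kernels over the encoder output so each word is scored together with its neighbours, with one kernel per entity label. The sketch below is a hand-rolled NumPy approximation under those assumptions; the label count, kernel window, and padding scheme are all illustrative, not the paper's published configuration.

```python
import numpy as np

N_LABELS = 5  # assumed tag-set size; the paper's label inventory is not in the abstract

def conv2d_decode(H, kernels):
    """Score labels with one 2-D kernel per label: each kernel slides
    over (positions x features), mixing the current word with its
    neighbours before a per-position argmax picks the tag."""
    n, d = H.shape
    k = kernels.shape[1]                     # kernel height = word window
    pad = k // 2
    Hp = np.pad(H, ((pad, pad), (0, 0)))     # pad sequence ends
    scores = np.empty((n, len(kernels)))
    for i in range(n):
        window = Hp[i:i + k]                 # current word + neighbours
        scores[i] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return scores.argmax(axis=1)             # predicted label per word

rng = np.random.default_rng(1)
H = rng.normal(size=(7, 8))                  # encoder output: 7 words, dim 8
kernels = rng.normal(size=(N_LABELS, 3, 8))  # one 3x8 kernel per label
tags = conv2d_decode(H, kernels)
print(tags.shape)  # (7,): one label index per word
```

Matching the number of kernels to the number of entity categories means the channel axis of the convolution output is directly the label dimension, which is the feature-extraction trick the abstract highlights.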
Figure 4. The model's multitask training mechanism.
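The abstract mentions a "special mixed loss" designed for the NER task but does not publish its formula. A hypothetical sketch of what a mixed loss for multitask training could look like is a weighted sum of a per-token label term and an entity-boundary term; the decomposition and the mixing weight `lam` below are assumptions for illustration, not the paper's actual objective.

```python
import numpy as np

def cross_entropy(probs, targets):
    """Mean negative log-likelihood of the target classes."""
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))

def mixed_loss(label_probs, labels, bound_probs, bounds, lam=0.5):
    """Hypothetical mixed loss: weighted sum of a token-label term and
    an entity-boundary term; `lam` is an assumed mixing weight."""
    return (lam * cross_entropy(label_probs, labels)
            + (1 - lam) * cross_entropy(bound_probs, bounds))

# Toy example: 2 tokens, 3 entity labels, 2 boundary classes.
label_probs = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
bound_probs = np.array([[0.6, 0.4], [0.3, 0.7]])
loss = mixed_loss(label_probs, np.array([0, 1]),
                  bound_probs, np.array([0, 1]))
print(loss > 0)  # True: both terms penalise residual uncertainty
```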
Dataset entity statistics.
| Split | Documents | Anatomy | Symptom | Independent | Drugs | Operation |
|---|---|---|---|---|---|---|
| Train | 800 | 7534 | 3019 | 3407 | 1658 | 1427 |
| Test | 400 | 4083 | 1482 | 1632 | 915 | 748 |
Comparison with other methods.
| Method | Precision | Recall | F1 |
|---|---|---|---|
| CPM | 0.873 | 0.848 | 0.861 |
| JIC | 0.892 | 0.871 | 0.882 |
| FSCBR | 0.906 | 0.885 | 0.894 |
| MDD | 0.915 | 0.893 | 0.907 |
| Ours | 0.924 | 0.907 | 0.915 |
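The F1 column is the harmonic mean of precision and recall; as a check, the proposed model's row can be reproduced directly:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(round(f1(0.924, 0.907), 3))  # 0.915, matching the "Ours" row
```

(Other rows may differ in the third decimal place because the table's precision and recall are themselves rounded.)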
Figure 5. The training loss and test performance.
Figure 6. Evaluation of the fully self-attentive encoder.
Figure 7. Evaluation of the multivariate convolutional decoder.
Figure 8. Evaluation of the mixed loss.