Zhaoying Chai¹, Han Jin¹, Shenghui Shi², Siyan Zhan³, Lin Zhuo⁴, Yu Yang⁵.
Abstract
BACKGROUND: Biomedical named entity recognition (BioNER) is a basic and important medical information extraction task that extracts medical entities with specific meanings from medical texts. In recent years, deep learning has become the main research direction in BioNER because of its strong data-driven ability to encode context. In the BioNER task, however, deep learning suffers from poor generalization and instability.
Keywords: BioNLP; Biomedical named entity recognition; Conditional random field; Permutation language model; Transfer learning
Year: 2022 PMID: 34983362 PMCID: PMC8729142 DOI: 10.1186/s12859-021-04551-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
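To make the task concrete: NER models usually predict one tag per token, and entities are then read off the tag sequence. The sketch below assumes the common BIO scheme ("B-X" begins an entity of type X, "I-X" continues it, "O" is outside); the record does not state which tagging scheme the paper uses, and the function name `bio_to_spans` is hypothetical.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (entity_type, surface_text) pairs from BIO tags.

    Assumption: "B-X" opens an entity of type X, "I-X" continues it,
    and "O" marks tokens outside any entity.
    """
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Emit the entity being built (if any) and reset the buffer
        nonlocal current_type, current_tokens
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            # "O", or an "I-" tag that does not continue the open entity
            flush()
            if tag.startswith("I-"):
                # Lenient choice: a stray I- tag starts a new entity
                current_type, current_tokens = tag[2:], [token]
    flush()
    return spans


tokens = ["Aspirin", "reduces", "breast", "cancer", "risk"]
tags = ["B-Chemical", "O", "B-Disease", "I-Disease", "O"]
print(bio_to_spans(tokens, tags))
# → [('Chemical', 'Aspirin'), ('Disease', 'breast cancer')]
```

Treating a stray "I-" tag as the start of a new entity is one lenient convention; stricter evaluators discard such tokens instead.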
Fig. 1 Architecture of the XLNet-CRF model
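In an encoder-CRF model such as XLNet-CRF, the encoder produces a score for every tag at every token, and the CRF layer adds tag-to-tag transition scores; at inference the best tag sequence is found with Viterbi decoding. The sketch below is the standard linear-chain Viterbi algorithm in plain Python, not the paper's code. It assumes emission scores are already computed (here they are made-up numbers) and omits the start/end transition scores that many CRF implementations also learn.

```python
from typing import List, Tuple


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence and its score.

    emissions[t][k]: score of tag k at token t (assumed to come from the encoder).
    transitions[i][j]: score of moving from tag i to tag j.
    """
    num_tags = len(transitions)
    # score[k]: best score of any path that ends in tag k at the current token
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(num_tags):
            # Best previous tag i for reaching tag j at token t
            best_i = max(range(num_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag
    best_last = max(range(num_tags), key=lambda k: score[k])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Hypothetical tags: 0 = O, 1 = B, 2 = I; the transition O -> I is heavily penalized
transitions = [[0.0, 0.0, -10000.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[0.1, 2.0, 0.0],
             [0.5, 0.0, 1.0],
             [2.0, 0.0, 0.1]]
print(viterbi_decode(emissions, transitions))  # → ([1, 2, 0], 5.0)
```

The transition matrix is what lets a CRF rule out invalid sequences such as O followed by I, which a per-token softmax classifier cannot do on its own.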
Fig. 2 Model segmentation schematic for hierarchical shared transfer learning. XLNet-CRF is split into two parts: the token embedding and the lower XLNet layers are shared across tasks, while the upper XLNet layers and the CRF layer are task-specific.
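The sharing scheme in Fig. 2 comes down to object identity: every task model points to the same embedding and lower-layer objects, so training on any task updates them, while each task owns its upper layers and CRF. The sketch below models this with placeholder `Layer` objects in plain Python. The names `Layer`, `TaskModel`, `build_hierarchical_models`, and the split parameter `num_shared` are hypothetical; the caption does not say at which layer the paper splits the model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(eq=False)  # compare by identity: two layers are "the same" only if shared
class Layer:
    """Placeholder for a transformer layer; only its name is stored here."""
    name: str


@dataclass
class TaskModel:
    embedding: Layer     # shared token embedding
    layers: List[Layer]  # full stack: shared lower layers + task-specific upper layers
    crf: Layer           # task-specific CRF


def build_hierarchical_models(tasks: List[str], num_layers: int,
                              num_shared: int) -> Dict[str, TaskModel]:
    """Share the embedding and the lowest `num_shared` layers across all tasks;
    give every task its own upper layers and its own CRF."""
    if not 0 <= num_shared <= num_layers:
        raise ValueError("num_shared must be between 0 and num_layers")
    embedding = Layer("embedding")
    shared = [Layer(f"shared_{i}") for i in range(num_shared)]
    models: Dict[str, TaskModel] = {}
    for task in tasks:
        upper = [Layer(f"{task}_upper_{i}") for i in range(num_shared, num_layers)]
        models[task] = TaskModel(embedding, shared + upper, Layer(f"{task}_crf"))
    return models


models = build_hierarchical_models(["BC5CDR", "NCBI-disease"], num_layers=12, num_shared=8)
a, b = models["BC5CDR"], models["NCBI-disease"]
print(a.layers[0] is b.layers[0], a.layers[8] is b.layers[8], a.crf is b.crf)
# → True False False
```

In a deep learning framework the same idea holds: the shared layers would be a single module instance referenced by every task head, so their parameters receive gradients from all tasks.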
Performance of STL-DS and MTL-XC on all tasks
| Dataset | STL P (%) | STL R (%) | STL F1 (%) | MTL-XC P (%) | MTL-XC R (%) | MTL-XC F1 (%) |
|---|---|---|---|---|---|---|
| BC4CHEMD | 93.00 | 92.40 | 92.02 | 92.49 | 91.25 | |
| BC5CDR | 92.76 | 93.96 | 93.43 | 92.94 | 93.19 | |
| BioNLP11ID | 55.56 | 72.58 | 62.94 | 61.44 | 75.81 | |
| BioNLP13CG | 83.20 | 85.14 | 82.35 | 80.88 | 81.61 | |
| BioNLP13PC | 88.57 | 90.33 | 76.65 | 83.80 | 80.07 | |
| CRAFT | 84.07 | 81.05 | 75.34 | 72.38 | 73.83 | |
| BC5CDR | 84.83 | 88.11 | 86.44 | 86.40 | 87.34 | |
| NCBI-disease | 87.27 | 89.27 | 88.26 | 87.80 | 89.73 | |
| BC2GM | 81.91 | 82.53 | 82.22 | 82.67 | 82.61 | |
| BioNLP09 | 88.20 | 86.82 | 87.50 | 87.23 | 91.95 | |
| BioNLP11EPI | 84.23 | 87.96 | 85.81 | 85.32 | 87.63 | |
| BioNLP11ID | 89.22 | 89.65 | 89.6 | 88.47 | 89.03 | |
| BioNLP13CG | 88.45 | 92.42 | 90.39 | 93.56 | 91.63 | |
| BioNLP13GE | 73.57 | 83.51 | 78.22 | 77.62 | 90.55 | |
| BioNLP13PC | 89.56 | 94.26 | 90.66 | 87.93 | 89.27 | |
| CRAFT | 80.48 | 75.44 | 77.88 | 78.56 | 84.83 | |
| Ex-PTM | 74.79 | 80.46 | 77.52 | 81.83 | 86.83 | |
| JNLPBA | 71.98 | 80.04 | 75.80 | 72.58 | 85.04 | |
| BioNLP11ID | 85.41 | 82.03 | 91.22 | 70.24 | 79.37 | |
| BioNLP13CG | 88.34 | 89.19 | 88.39 | 86.68 | 87.52 | |
| CRAFT | 96.45 | 97.73 | 93.76 | 93.51 | 93.63 | |
| LINNAEUS | 91.70 | 85.62 | 88.43 | 82.14 | 85.17 | |
Better scores of each metric are in bold
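The F1 columns are the harmonic mean of precision (P) and recall (R), F1 = 2PR / (P + R). As a quick check, reading the first two values of the first BC5CDR row as STL precision and recall gives the STL F1 of 93.36 that the slicing-rate table reports for BC5CDR. Published F1 scores are normally computed from entity counts, so recomputing them from rounded P and R can differ in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both in %), returned in %."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# First BC5CDR row: 92.76 (precision) and 93.96 (recall)
print(round(f1_score(92.76, 93.96), 2))  # → 93.36
```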
F1 performance of STL, MTL-XC, and MTL-LS with different slicing rates
| Dataset | STL | MTL-XC | MTL-LS (slicing rate 0.25) | MTL-LS (0.50) | MTL-LS (0.75) | MTL-LS (1.00) |
|---|---|---|---|---|---|---|
| BC4CHEMD | 91.70 | 91.25 | 91.82 | 92.24 | 92.37 | |
| BC5CDR | 93.36 | 93.19 | 93.75 | 93.60 | 93.14 | |
| BioNLP11ID | 62.94 | 67.87 | 42.29 | 52.98 | 73.41 | |
| BioNLP13CG | 84.16 | 81.61 | 78.59 | 78.50 | 82.47 | |
| BioNLP13PC | 80.07 | 81.25 | 82.13 | 84.03 | 86.00 | |
| CRAFT | 73.83 | 76.73 | 79.74 | 77.96 | 79.44 | |
| BC5CDR | 86.44 | 86.87 | 86.10 | 87.04 | 86.28 | |
| NCBI-disease | 88.26 | 88.75 | 85.23 | 86.79 | 88.97 | |
| BC2GM | 82.22 | 82.64 | 79.85 | 81.54 | 81.94 | |
| BioNLP09 | 87.50 | 89.53 | 87.23 | 89.47 | 89.27 | |
| BioNLP11EPI | 85.81 | 86.46 | 84.65 | 85.65 | 86.24 | |
| BioNLP11ID | 89.43 | 89.03 | 83.62 | 82.69 | 86.55 | |
| BioNLP13CG | 90.39 | 92.58 | 91.33 | 92.05 | 92.21 | |
| BioNLP13GE | 78.22 | 83.59 | 81.29 | 82.61 | 82.73 | |
| BioNLP13PC | 91.85 | 89.27 | 89.16 | 90.59 | 90.37 | |
| CRAFT | 77.88 | 81.57 | 79.02 | 82.62 | 82.18 | |
| Ex-PTM | 77.52 | 84.25 | 78.52 | 80.57 | 84.55 | |
| JNLPBA | 75.80 | 75.08 | 77.00 | 77.32 | 77.66 | |
| BioNLP11ID | 83.68 | 79.37 | 70.91 | 77.45 | 79.00 | |
| BioNLP13CG | 88.76 | 86.30 | 87.39 | 86.07 | 86.90 | |
| CRAFT | 93.63 | 95.56 | 95.04 | 94.74 | 95.75 | |
| LINNAEUS | 85.17 | 83.82 | 84.05 | 86.40 | 85.06 | |
Better scores of each metric are in bold
Model performance comparison to other studies
| Method | BC5CDR (Chemical) | BC5CDR (Disease) | BC2GM | BC4CHEMD | NCBI-disease | LINNAEUS |
|---|---|---|---|---|---|---|
| Crichton et al. [ | 89.22 | 80.46 | 73.04 | 82.95 | 80.46 | 83.98 |
| Yoon et al. [ | 93.31 | 84.08 | 79.73 | 88.85 | 86.36 | – |
| Lee et al. [ | 93.44 | 86.56 | 91.41 | |||
| BERT [ | 91.16 | 82.41 | 81.79 | 90.04 | 85.63 | 87.60 |
| PubMedBERT [ | 93.33 | 85.62 | 84.52 | – | 87.82 | – |
| HunFlair [ | – | – | – | – | 88.65 | – |
| STL | 93.36 | 86.44 | 82.52 | 91.70 | 88.26 | 88.56 |
| Proposed | 82.92 | 89.25 | 86.37 | |||
Better scores of each metric are in bold
Fig. 3 Euler diagram of the training entity set (Training), the test set (Labels), and the predicted set (Logits)
Fig. 4 Radar chart of the proportional relationships between the sets for six tasks. Panels: A, B, C, D, E
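Figs. 3 and 4 compare three sets of entities: those seen in training (Training), the gold test entities (Labels), and the model's predictions (Logits). The sketch below computes a few proportions of the kind such a radar chart could plot. The ratio names, the choice of ratios, and the assumption that entities are matched by their surface strings are all hypothetical; the record does not specify which proportions the paper reports.

```python
from typing import Dict, Set


def overlap_proportions(training: Set[str], labels: Set[str],
                        logits: Set[str]) -> Dict[str, float]:
    """Set-overlap proportions between training, gold test, and predicted entities.

    Assumption: entities are compared as plain surface strings.
    """
    def ratio(part: Set[str], whole: Set[str]) -> float:
        return len(part) / len(whole) if whole else 0.0

    unseen = labels - training  # gold test entities never seen in training
    return {
        # share of gold test entities already present in the training set
        "labels_seen_in_training": ratio(labels & training, labels),
        # share of gold test entities that were predicted (set-level recall)
        "labels_predicted": ratio(labels & logits, labels),
        # share of predictions that are gold test entities (set-level precision)
        "predictions_correct": ratio(logits & labels, logits),
        # share of unseen gold entities that the model still predicted
        "unseen_labels_predicted": ratio(unseen & logits, unseen),
    }


training = {"aspirin", "ibuprofen", "cisplatin"}
labels = {"aspirin", "cisplatin", "tamoxifen", "heparin"}
logits = {"aspirin", "tamoxifen", "warfarin"}
print(overlap_proportions(training, labels, logits))
```

Here half of the gold test entities were seen in training, and the model recovered one of the two unseen ones ("tamoxifen"); that last ratio is one way to gauge generalization beyond memorized entities.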