Yufei Li, Xiangyu Zhou, Jie Ma, Xiaoyong Ma, Pengzhen Cheng, Tieliang Gong, Chen Li.
Abstract
BACKGROUND: Bio-entity Coreference Resolution (CR) is a vital task in biomedical text mining. An important issue in CR is the differential representation of identical mentions, because similar representations can make coreference harder to resolve. When extracting features, however, existing neural network-based models tend to produce similar or even identical feature representations for identical mentions, which adds noise when distinguishing them.
Keywords: Context-aware; Coreference resolution; Mention detection; Neural network
Year: 2022 PMID: 35501781 PMCID: PMC9063119 DOI: 10.1186/s12911-022-01862-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 3.298
Fig. 1 Example of linking errors of identical mentions affected by similar representations. The correct prediction is marked in blue. The false link errors are highlighted in red. Correct annotations: {T1, T2, T3} in eg.1, {T7, T8} in eg.2
Statistics of identical mentions on training and development set of BioNLP
| | Frequency | NN/NNS/NP | PRP | WH- | IN | All |
|---|---|---|---|---|---|---|
| Train | 2 | 49 | 66 | 80 | 107 | 302 |
| | 3 | 7 | 13 | 33 | 43 | 96 |
| | > 3 | 14 | 12 | 9 | 28 | 63 |
| | All | 70 | 91 | 122 | 178 | 461 |
| Dev | 2 | 7 | 11 | 18 | 20 | 56 |
| | 3 | 1 | 5 | 11 | 7 | 24 |
| | > 3 | 0 | 0 | 0 | 7 | 7 |
| | All | 8 | 16 | 29 | 34 | 87 |
Proportion of coreferential mention pairs on CRAFT-CR that show string matching, sub-string matching, or neither
| | String matching | Sub-string matching | Others |
|---|---|---|---|
| Train (%) | 61.4 | 8.3 | 30.3 |
| Dev (%) | 58.2 | 7.0 | 34.8 |
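The three categories in this table can be illustrated with a minimal sketch. It assumes case-insensitive, character-level comparison after trimming whitespace; the exact matching criteria used for CRAFT-CR are not stated here, and `match_category` and `category_shares` are hypothetical helper names.

```python
from collections import Counter


def match_category(mention_a: str, mention_b: str) -> str:
    """Classify a mention pair as 'string', 'sub-string', or 'other'.

    Assumption: comparison is case-insensitive and character-level.
    """
    a = mention_a.strip().lower()
    b = mention_b.strip().lower()
    if a == b:
        return "string"
    # Guard against empty strings, which are trivially contained in anything.
    if a and b and (a in b or b in a):
        return "sub-string"
    return "other"


def category_shares(pairs):
    """Return the percentage of pairs in each category, rounded to one decimal."""
    counts = Counter(match_category(a, b) for a, b in pairs)
    total = sum(counts.values())
    if total == 0:
        return {"string": 0.0, "sub-string": 0.0, "other": 0.0}
    return {
        key: round(100 * counts[key] / total, 1)
        for key in ("string", "sub-string", "other")
    }


# Illustrative pairs only; these are not taken from the CRAFT corpus.
example_pairs = [
    ("the protein", "The protein"),
    ("p53", "p53"),
    ("p53", "p53 protein"),
    ("NF-kappa B", "this transcription factor"),
]
shares = category_shares(example_pairs)
```

A token-level comparison would be a stricter alternative, since character-level containment also matches short mentions that happen to occur inside longer words.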
Coreference resolution performance comparison by the average F1 value
| | Dev-F1 | Test-F1 | Feature-based Rule-based | Hybrid | Neural |
|---|---|---|---|---|---|
| [ | 62.4 | 60.9 | √ | ||
| [ | 68.6 | / | √ | ||
| [ | 63.9 | 48.1 | √ | ||
| [ | 67.5 | / | √ | ||
| [ | 72.2 | 62.0 | √ | √ | |
| [ | 63.4 | 51.2 | √ | ||
| [ | 65.6 | 69.5 | √ | ||
| [ | 45.5 | 46.4 | √ | ||
| [ | 33.9 | 36.0 | √ | ||
| [ | / | 57.0 | √ |
All the models are evaluated on the platforms provided by the task organizers
Fig. 2 The Feature Attention model. The model learns to weigh each feature based on contexts
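As a rough illustration of context-dependent feature weighting, the sketch below scores each feature embedding against a context vector and normalizes the scores with a softmax. The dot-product scoring, the function name `attend_features`, and the feature names are assumptions made for illustration; the model in Fig. 2 learns its scoring function, which is not reproduced here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_features(context, features):
    """Weigh each feature embedding by its relevance to the context.

    context:  list of floats (a hypothetical context vector for a span)
    features: dict mapping a feature name to an embedding of the same size
    Returns (weights, reweighted), both keyed by feature name.
    Assumption: relevance is a plain dot product, not a learned scorer.
    """
    names = list(features)
    scores = [
        sum(c * f for c, f in zip(context, features[name])) for name in names
    ]
    weights = dict(zip(names, softmax(scores)))
    reweighted = {
        name: [weights[name] * value for value in features[name]]
        for name in names
    }
    return weights, reweighted


# Hypothetical two-dimensional embeddings for two mention features.
context = [1.0, 0.0]
features = {
    "grammatical_number": [2.0, 0.0],
    "semantic_type": [0.0, 2.0],
}
weights, reweighted = attend_features(context, features)
```

Because the weights depend on the context vector, two identical mentions in different contexts would receive different reweighted features, which is the property the caption describes.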
Fig. 3 The model of computing the span embedding representations
The performance of protein coreference resolution with different models on two evaluation datasets of BioNLP
| | Dev | | | Test | | |
|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 |
| Hybrid | 59.9 | | 67.4 | 55.6 | 67.2 | 60.9 |
| Simple | 63.4 | 64.4 | 63.9 | 46.3 | 50.0 | 48.1 |
| Bio-SCoRes | 72.4 | 63.2 | 67.5 | / | / | / |
| Lee2018-rule | 68.8 | 76.0 | | 60.2 | 63.8 | 62.0 |
| lee2018-neural | 60.4 | 61.9 | 61.2 | 54.9 | 58.0 | 56.4 |
| Bioe2e | 71.7 | 56.7 | 63.1 | 55.6 | 47.5 | 51.2 |
| BioNeu | | 61.9 | 68.7 | 71.5 | 60.9 | 65.8 |
| BioNeu-feature | 75.5 | 65.8 | 70.4 | 69.5 | 60.2 | 64.5 |
| BioNeu + SFA | 73.0 | 65.3 | 69.0 | | 61.6 | 66.5 |
| KE-LSTM | 68.1 | 63.4 | 65.6 | 69.6 | | |
| KE-LSTM-feature | 74.4 | 64.8 | 69.3 | 62.8 | 61.2 | 62.0 |
| KE-LSTM + SFA | 70.8 | 68.3 | 69.6 | 69.5 | 68.2 | 68.8 |
The maximum value is in bold
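The F1 values in this table are consistent with the usual harmonic mean of precision and recall, F1 = 2PR / (P + R). The short check below recomputes F1 for three rows whose P, R, and F1 are all present; the helper name `f1_score` is an assumption, and the check says nothing about how the official evaluation platform computes its scores.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (row label, P, R, reported F1), copied from complete rows of the table.
complete_rows = [
    ("Simple (Dev)", 63.4, 64.4, 63.9),
    ("Bio-SCoRes (Dev)", 72.4, 63.2, 67.5),
    ("KE-LSTM + SFA (Test)", 69.5, 68.2, 68.8),
]

recomputed = {
    label: round(f1_score(p, r), 1) for label, p, r, _ in complete_rows
}
```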
F1 scores of coreference on CRAFT test set in comparison with some baselines
| System | B3 | BLANC | CEAFE | CEAFM | LEA | MUC | Ave |
|---|---|---|---|---|---|---|---|
| E2E_MetaMap | 36.4 | 46.5 | 33.1 | 41.0 | 32.4 | 51.8 | 40.2 |
| BERTfilter | 44.0 | 48.9 | 39.8 | 49.0 | 40.0 | 57.0 | 46.4 |
| BioNeu | 45.0 | 55.4 | 36.1 | 49.8 | 41.8 | 55.1 | 47.2 |
| BioNeu-feature | 45.3 | 53.2 | 36.5 | 49.4 | 42.1 | 56.1 | 47.1 |
| BioNeu + SFA | 45.1 | 56.2 | 37.0 | 49.7 | 42.0 | 56.3 | 47.7 |
| KE-LSTM | 54.9 | 63.1 | 48.6 | 59.4 | 51.3 | 64.5 | 57.0 |
| KE-LSTM-feature | 54.5 | 62.2 | 48.1 | 59.2 | 51.4 | 64.5 | 56.6 |
| KE-LSTM + SFA | | | | | | | |
E2E_MetaMap and BERTfilter are the baselines in [7]
The maximum value is in bold
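The Ave column appears to be the unweighted mean of the six metric F1 scores in each row. The sketch below checks this reading against three complete rows; the name `average_f1` is an assumption, and the mean is only inferred from the numbers, not stated in the table notes.

```python
def average_f1(scores):
    """Unweighted mean of a non-empty list of per-metric F1 scores."""
    return sum(scores) / len(scores)


# Per-row scores in the column order B3, BLANC, CEAFE, CEAFM, LEA, MUC,
# followed by the reported Ave value.
rows = {
    "E2E_MetaMap": ([36.4, 46.5, 33.1, 41.0, 32.4, 51.8], 40.2),
    "BioNeu": ([45.0, 55.4, 36.1, 49.8, 41.8, 55.1], 47.2),
    "KE-LSTM": ([54.9, 63.1, 48.6, 59.4, 51.3, 64.5], 57.0),
}

recomputed_ave = {
    name: round(average_f1(scores), 1) for name, (scores, _) in rows.items()
}
```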
The performance of mention detection with different models on two datasets
| BioNLP | P | R | F1 |
|---|---|---|---|
| Bioe2e | 82.0 | 66.3 | 73.3 |
| BioNeu | | 73.1 | 78.2 |
| BioNeu + SFA | 83.4 | 76.1 | 79.6 |
| KE-LSTM | 78.0 | 84.1 | 80.9 |
| KE-LSTM + SFA | 78.2 | | |
| CRAFT | P | R | F1 |
| E2E_MetaMap | 67.1 | 52.7 | 59.0 |
| BERTfilter | 73.1 | 57.8 | 64.5 |
| BioNeu | 81.6 | 49.6 | 61.7 |
| BioNeu + SFA | | 49.4 | 62.0 |
| KE-LSTM | 79.3 | 63.1 | 70.3 |
| KE-LSTM + SFA | 78.8 | | |
E2E_MetaMap and BERTfilter are the baselines in [7]. Bioe2e is the baseline in [9], and KE-LSTM is the baseline in [10]
The maximum value is in bold
The coreference performance of different models on identical mentions with different POS tags on BioNLP dataset
| | NN/NNS/NP | | | PRP | | | WH | | | IN | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 | P | R | F1 | P | R | F1 |
| BioNeu | 62.5 | 17.8 | 27.7 | 39.0 | 37.8 | 38.4 | 68.3 | 73.0 | 70.6 | 65.6 | 64.3 | 64.9 |
| BioNeu-feature | 42.9 | 21.4 | 28.5 | 46.3 | 41.3 | 43.6 | 69.1 | 75.7 | 72.2 | 70.1 | 74.2 | 72.1 |
| BioNeu + SFA | 52.9 | 32.1 | 40.0 | 45.7 | 34.8 | 39.5 | 72.0 | 73.0 | 72.5 | 71.6 | 72.3 | 71.9 |
| KE-LSTM | 66.7 | 21.4 | 32.4 | 45.0 | 39.1 | 41.8 | 63.5 | 73.0 | 67.9 | 67.6 | 68.3 | 68.0 |
| KE-LSTM-feature | 75.0 | 21.4 | 33.3 | 47.0 | 34.9 | 40.1 | 71.8 | 75.6 | 73.6 | 72.0 | 76.2 | 74.0 |
| KE-LSTM + SFA | 92.3 | 80.0 | 85.7 | 47.5 | 50.0 | 48.7 | 66.3 | 77.5 | 71.5 | 60.4 | 63.4 | 61.9 |
KE-LSTM is the baseline in [10]
The coreference performance of mention pairs on CRAFT-CR in three cases
| | String match | | | Sub-string match | | | Others | | |
|---|---|---|---|---|---|---|---|---|---|
| | P | R | F1 | P | R | F1 | P | R | F1 |
| BioNeu | 62.5 | 71.5 | 66.7 | 42.6 | 27.9 | 33.7 | 44.1 | 29.6 | 35.4 |
| BioNeu-feature | 68.4 | 68.9 | 68.6 | 51.4 | 32.9 | 40.1 | 31.9 | 29.9 | 30.9 |
| BioNeu + SFA | 68.8 | 67.8 | 68.3 | 44.2 | 28.2 | 34.4 | 47.5 | 29.3 | 36.3 |
| KE-LSTM | 64.8 | 83.6 | 73.1 | 39.8 | 35.3 | 37.4 | 27.2 | 34.8 | 30.5 |
| KE-LSTM-feature | 66.6 | 81.5 | 73.3 | 47.1 | 34.3 | 39.7 | 24.0 | 37.4 | 29.3 |
| KE-LSTM + SFA | 70.2 | 82.8 | 76.0 | 41.4 | 36.9 | 39.1 | 29.3 | 32.8 | 30.9 |
KE-LSTM is the baseline in [10]
Fig. 4 Examples of remaining problems. The correct prediction is marked in blue. The spurious link errors are highlighted in red
Fig. 5 The visualization of mentions’ features and their attention weights in the first example. GN means grammatical numbers. Each column shows the attention weights of all the features of the span
Fig. 6 The visualization of mentions’ features and their attention weights in the second example. GN means grammatical numbers. Each column shows the attention weights of all the features of the span. “celf” means Cell Function comd. “inpr” means intellectual product
Fig. 7 The reduced 2-dimensional feature representations before and after using SFA. We use PCA for dimensionality reduction. (a) The initial feature representations before using SFA, and (b) the new feature representations after using SFA