Arslan Erdengasileng, Qing Han, Tingting Zhao, Shubo Tian, Xin Sui, Keqiao Li, Wanjing Wang, Jian Wang, Ting Hu, Feng Pan, Yuan Zhang, Jinfeng Zhang.
Abstract
Large volumes of publications are being produced in the biomedical sciences at an ever-increasing speed. To deal with this large amount of unstructured text data, effective natural language processing (NLP) methods need to be developed for tasks such as document classification and information extraction. The BioCreative Challenge was established to evaluate the effectiveness of information extraction methods in the biomedical domain and to facilitate their development as a community-wide effort. In this paper, we summarize our work and what we have learned from the latest round, BioCreative Challenge VII, in which we participated in all five tracks. Overall, we found three key components for achieving high performance across a variety of NLP tasks: (1) pre-trained NLP models; (2) data augmentation strategies; and (3) ensemble modelling. These three strategies need to be tailored to the specific tasks at hand to achieve high-performing baseline models, which are usually good enough for practical applications. When further combined with task-specific methods, additional improvements (usually rather small) can be achieved, which might be critical for winning competitions. Database URL: https://doi.org/10.1093/database/baac066
Year: 2022 PMID: 35962559 PMCID: PMC9375052 DOI: 10.1093/database/baac066
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 4.462
Results using different pre-trained models. The values in the table are F1 scores on test data
| Pre-trained models | Track 1 | Track 2 | Track 3 | Track 5 |
|---|---|---|---|---|
| BERT | – | 0.8018 | – | – |
| BioBERT | 0.683 | 0.8433 | 0.5957 | 0.9067 |
| PubMedBERT abstract | – | – | 0.5922 | 0.9027 |
| PubMedBERT fulltext | 0.732 | 0.8679 | 0.6257 | 0.9066 |
| BlueBERT | – | 0.8442 | – | 0.8956 |
| SciBERT | – | 0.8495 | – | – |
| ClinicalBERT | – | 0.8114 | – | – |
| T5 | 0.739 | – | – | – |
| RoBERTa | – | 0.8536 | – | – |
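
As a rough illustration of how models like those in the table are applied, the sketch below loads the public PubMedBERT checkpoint from the Hugging Face hub and runs a single fine-tuning step for sequence classification. This is not the authors' pipeline; the texts, labels, number of labels and learning rate are placeholder assumptions.

```python
# Minimal sketch of fine-tuning a pre-trained biomedical transformer for
# document classification (an illustration, not the authors' actual code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Public PubMedBERT checkpoint trained on abstracts + full text.
MODEL_ID = "microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)

# Placeholder training examples with hypothetical binary labels.
texts = ["Aspirin inhibits platelet aggregation.",
         "The meeting was rescheduled to Friday."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
outputs = model(**batch, labels=labels)  # forward pass computes the loss
outputs.loss.backward()                  # one illustrative gradient step
optimizer.step()
```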
Performance of different data augmentation strategies for four tracks. Not all the data augmentation methods were tried for all the tracks due to the differences in the data/tasks. * F1 score on validation data
| Data augmentation methods | Track 1 | Track 2 | Track 3 | Track 5 |
|---|---|---|---|---|
| No data augmentation | 0.721 | 0.8711 | 0.7090 | 0.9298* |
| Dropping a non-essential word | 0.749 | 0.7913 | – | – |
| Replacing words with random strings | – | 0.8744 | 0.800 | – |
| Replacing an entity name with another name of the same type | – | – | 0.837 | – |
| Dropping words with lowest TF-IDF values | – | – | – | 0.9271* |
| Dropping words with highest TF-IDF values | – | – | – | 0.9257* |
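
To make the TF-IDF-based strategies in the table concrete, here is a hedged sketch of dropping the words with the lowest TF-IDF values from a document. It is an assumed reconstruction, not the authors' code; the corpus, whitespace tokenization and drop fraction are placeholders.

```python
# Hypothetical sketch of the "drop words with lowest TF-IDF values"
# augmentation (an assumption about the approach, not the authors' code).
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "aspirin reduces the risk of heart attack",
    "the patient was given aspirin and ibuprofen",
]

vectorizer = TfidfVectorizer()
vectorizer.fit(corpus)
vocab = vectorizer.vocabulary_

def drop_low_tfidf_words(text: str, drop_fraction: float = 0.2) -> str:
    """Return a copy of `text` with its lowest-TF-IDF tokens removed."""
    scores = vectorizer.transform([text]).toarray()[0]
    tokens = text.split()

    # Tokens outside the fitted vocabulary get an infinite score and are
    # always kept; whitespace tokenization here is a simplification.
    def score(tok: str) -> float:
        idx = vocab.get(tok.lower())
        return scores[idx] if idx is not None else float("inf")

    n_drop = int(len(tokens) * drop_fraction)
    keep = sorted(range(len(tokens)), key=lambda i: score(tokens[i]))[n_drop:]
    return " ".join(tokens[i] for i in sorted(keep))

print(drop_low_tfidf_words("the patient took aspirin for the pain"))
```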
Track 2 chemical named entity recognition sub-task results. Our team (#128) ranked second for this sub-task
| File | Strict-P | Strict-R | Strict-F |
|---|---|---|---|
| Run 1 | 0.8544 | 0.8658 | 0.8600 |
| Run 2 | 0.8643 | 0.8403 | 0.8521 |
| Run 3 | 0.8440 | 0.7896 | 0.8159 |
| Run 4 | 0.8457 | 0.8617 | 0.8536 |
| Baseline | 0.8440 | 0.7877 | 0.8149 |
| Best (Team 139) | 0.8759 | 0.8587 | 0.8672 |
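
The abstract names ensemble modelling as the third key component, and the multiple runs above suggest combining model variants. A common, simple way to ensemble NER predictions is majority voting over predicted entity spans; the sketch below is a hedged illustration of that general idea, not the team's actual ensembling method, and the spans shown are invented examples.

```python
# Hypothetical span-level majority voting across several NER runs
# (an illustration of ensembling in general, not the team's method).
from collections import Counter

# Each run's predictions for one document: (start, end, entity_type) spans.
run1 = {(0, 7, "CHEMICAL"), (15, 24, "CHEMICAL")}
run2 = {(0, 7, "CHEMICAL")}
run3 = {(0, 7, "CHEMICAL"), (30, 38, "CHEMICAL")}

def majority_vote(runs, threshold=None):
    """Keep spans predicted by at least `threshold` runs (default: majority)."""
    if threshold is None:
        threshold = len(runs) // 2 + 1
    votes = Counter(span for run in runs for span in run)
    return {span for span, count in votes.items() if count >= threshold}

# Only the span that a majority of runs agree on survives (threshold 2 here).
print(majority_vote([run1, run2, run3]))  # {(0, 7, 'CHEMICAL')}
```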
Track 2 chemical normalization sub-task results. Our team (#128) ranked second for this sub-task
| File | Strict-P | Strict-R | Strict-F |
|---|---|---|---|
| Run 1 | 0.7833 | 0.8339 | 0.8078 |
| Run 2 | 0.7792 | 0.8434 | 0.8101 |
| Run 3 | 0.7780 | 0.8257 | 0.8011 |
| Run 4 | 0.7755 | 0.8318 | 0.8027 |
| Baseline | 0.8151 | 0.7644 | 0.7889 |
| Best (Team 110) | 0.8621 | 0.7702 | 0.8136 |
Track 2 chemical indexing sub-task results. Our team (#128) ranked first for this sub-task (unofficial). The official best-performing team's result is also shown
| File | Strict-P | Strict-R | Strict-F |
|---|---|---|---|
| Run 1 | 0.4424 | 0.5286 | 0.4817 |
| Run 2 | 0.4397 | 0.5344 | 0.4825 |
| Run 3 | 0.3776 | 0.3781 | 0.3779 |
| Run 4 | 0.3805 | 0.3814 | 0.3809 |
| Baseline | 0.3134 | 0.6101 | 0.4141 |
| Best (Team 110) | 0.5351 | 0.4133 | 0.4664 |
Performance of submissions for Track 3
| Submission | Overlapping F1 | Overlapping precision | Overlapping recall | Strict F1 | Strict precision | Strict recall |
|---|---|---|---|---|---|---|
| 1 | 0.764 | 0.747 | 0.782 | 0.738 | 0.721 | 0.755 |
| 2 | 0.763 | 0.712 | 0.823 | 0.732 | 0.682 | 0.789 |
| 3 | 0.794 | 0.744 | 0.85 | 0.762 | 0.714 | 0.816 |
| All participants (mean ± SD) | 0.749 ± 0.0596 | 0.811 | 0.709 | 0.696 ± 0.072 | 0.754 | 0.658 |
| Baseline | 0.773 | 0.908 | 0.673 | 0.758 | 0.890 | 0.660 |
| Best | 0.838 | 0.832 | 0.844 | 0.804 | 0.799 | 0.810 |