Suwisa Kaewphan, Kai Hakala, Niko Miekka, Tapio Salakoski, Filip Ginter.
Abstract
We present a system for automatically identifying a multitude of biomedical entities from the literature. This work is based on our previous efforts in the BioCreative VI: Interactive Bio-ID Assignment shared task, in which our system demonstrated state-of-the-art performance with the highest achieved results in named entity recognition. In this paper we describe the original conditional random field-based system used in the shared task as well as experiments conducted since, including better hyperparameter tuning and character-level modeling, which led to further performance improvements. For normalizing the mentions into unique identifiers we use fuzzy character n-gram matching. The normalization approach has also been improved with a better abbreviation resolution method and stricter guideline compliance, resulting in vastly improved results for various entity types. All tools and models used for both named entity recognition and normalization are publicly available under an open license.
Database URL: https://github.com/TurkuNLP/BioCreativeVI_BioID_assignment
Entities:
Mesh:
Year: 2018 PMID: 30239666 PMCID: PMC6146133 DOI: 10.1093/database/bay096
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
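The abstract mentions fuzzy character n-gram matching for mapping mentions to unique identifiers but gives no implementation details. The sketch below is only a minimal illustration of that general technique, assuming a Dice-style overlap of character trigrams, a hypothetical toy lexicon and an arbitrary similarity threshold; none of these details are taken from the paper.

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Return the multiset of character n-grams of a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def ngram_similarity(a, b, n=3):
    """Dice-style overlap between the character n-gram profiles of two strings."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    if not ga or not gb:
        return 0.0
    overlap = sum((ga & gb).values())
    return 2.0 * overlap / (sum(ga.values()) + sum(gb.values()))

def normalize_mention(mention, lexicon, threshold=0.7):
    """Map a mention to the identifier of the most similar lexicon name.

    `lexicon` maps names (synonyms) to identifiers; returns None when no
    name is similar enough. Threshold and scoring are illustrative choices.
    """
    best_id, best_score = None, threshold
    for name, identifier in lexicon.items():
        score = ngram_similarity(mention, name)
        if score > best_score:
            best_id, best_score = identifier, score
    return best_id

# Toy example with a hypothetical (invented) gene/protein lexicon:
lexicon = {"DIS3 like exosome 3'-5' exoribonuclease": "GENE:0001",
           "Dis3l": "GENE:0001"}
print(normalize_mention("DIS3L", lexicon))  # -> GENE:0001
```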
Figure 1. All processing steps included in the whole pipeline.
Figure 2. Illustration of the tested character-level neural NER model. The inputs are a character sequence (a sentence) and the corresponding IOB tags from the NERsuite model converted to character level. The example phrase demonstrates the tokenization issue: ‘Dis3lcompared’ is missing a word boundary and is consequently tagged as an entity as a whole by the NERsuite system, whereas the neural model aims to detect only the span ‘Dis3l’.
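The caption states that token-level IOB tags from NERsuite are converted to character level before being fed to the neural model. The following is a minimal sketch of such a conversion; the function name, the offset-matching strategy and the example tags are assumptions made for illustration, not details from the paper.

```python
def tags_to_char_level(tokens, tags, sentence):
    """Project token-level IOB tags onto the characters of the original sentence.

    Characters outside any token (e.g. whitespace) stay 'O'; the first
    character of a B-tagged token keeps its 'B-' tag, the rest become 'I-'.
    """
    char_tags = ["O"] * len(sentence)
    pos = 0
    for token, tag in zip(tokens, tags):
        start = sentence.find(token, pos)  # locate the token in the raw sentence
        if start == -1:
            continue
        for i in range(start, start + len(token)):
            if tag == "O":
                char_tags[i] = "O"
            elif i == start and tag.startswith("B-"):
                char_tags[i] = tag
            else:
                char_tags[i] = "I-" + tag.split("-", 1)[1]
        pos = start + len(token)
    return char_tags

# The fused token from the Figure 2 example: NERsuite tags it as a whole,
# so every character of 'Dis3lcompared' receives a gene tag.
sentence = "Dis3lcompared to control"
tokens = ["Dis3lcompared", "to", "control"]
tags = ["B-gene", "O", "O"]
print(list(zip(sentence, tags_to_char_level(tokens, tags, sentence))))
```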
Comparison of annotation counts between tokenization approaches
| Additional tokenization | | | | | | |
|---|---|---|---|---|---|---|
| Without | 97.178 | 99.772 | 95.951 | 96.107 | 97.099 | 93.691 |
| With | 99.187 | 99.866 | 99.842 | 99.424 | 99.559 | 98.921 |
The comparison of annotation counts between preprocessing with only the NERsuite tokenization module (Without) and with both NERsuite tokenization and additional tokenization (With). The numbers are the percentages of annotations, relative to the provided data, shown for each entity type.
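If the percentages above measure how many gold annotation spans can still be aligned to token boundaries after each tokenization scheme (our reading of the caption, not an explicit statement in it), a check of that kind could look roughly as follows; the function names and the offset format are assumptions.

```python
def span_is_tokenizable(start, end, token_offsets):
    """True if an annotation span [start, end) can be represented as a
    contiguous sequence of whole tokens, i.e. both offsets fall on token
    boundaries produced by the tokenizer."""
    starts = {s for s, _ in token_offsets}
    ends = {e for _, e in token_offsets}
    return start in starts and end in ends

def retained_fraction(annotations, token_offsets):
    """Percentage of gold annotations whose spans survive tokenization."""
    if not annotations:
        return 0.0
    kept = sum(span_is_tokenizable(s, e, token_offsets) for s, e in annotations)
    return 100.0 * kept / len(annotations)

# Toy example: the fused token 'Dis3lcompared' hides the boundary after 'Dis3l',
# so the gold span cannot be represented without additional tokenization.
token_offsets = [(0, 13), (14, 16), (17, 24)]   # 'Dis3lcompared', 'to', 'control'
annotations = [(0, 5)]                          # gold span for 'Dis3l'
print(retained_fraction(annotations, token_offsets))  # 0.0
```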
Comparison of NER systems on the development data
| Entity type | Combined | Independent | CNN-BiLSTM-CRF |
|---|---|---|---|
| | 0.796 / 0.698 / 0.744 | 0.796 / 0.714 / 0.752 | 0.803 / 0.639 / 0.712 |
| | 0.759 / 0.611 / 0.677 | 0.710 / 0.682 / 0.696 | 0.725 / 0.633 / 0.676 |
| | 0.771 / 0.726 / 0.748 | 0.755 / 0.738 / 0.746 | 0.833 / 0.779 / 0.805 |
| | 0.878 / 0.696 / 0.776 | 0.872 / 0.757 / 0.810 | 0.856 / 0.713 / 0.778 |
| | 0.825 / 0.579 / 0.681 | 0.724 / 0.653 / 0.687 | 0.740 / 0.595 / 0.659 |
| | 0.816 / 0.566 / 0.668 | 0.750 / 0.696 / 0.722 | 0.730 / 0.607 / 0.663 |
Evaluation of the original model submitted to the shared task (Combined), the improved CRF model (Independent) and the neural character-level model (CNN-BiLSTM-CRF), based on the official evaluation script with strict entity span matching on the development data. Numbers within cells are precision/recall/F-score.
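Strict entity span matching, as used by the official evaluation script, counts a prediction as correct only when its document, type and character span all match a gold annotation exactly. A minimal sketch of such scoring is shown below; the tuple layout and function name are assumptions, not the official script.

```python
def strict_prf(gold, predicted):
    """Precision/recall/F-score with strict matching: a prediction is a true
    positive only if its (document, type, start, end) exactly matches a
    gold annotation."""
    gold_set, pred_set = set(gold), set(predicted)
    tp = len(gold_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(gold_set) if gold_set else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Invented example annotations: one exact match, one miss, one spurious hit.
gold = [("doc1", "gene", 0, 5), ("doc1", "cell", 20, 28)]
predicted = [("doc1", "gene", 0, 5), ("doc1", "gene", 30, 35)]
print(strict_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```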
Official evaluation of NER systems on the test data
| Entity type | Combined | Independent | CNN-BiLSTM-CRF |
|---|---|---|---|
| | 0.783 / 0.708 / 0.743 | 0.767 / 0.749 / 0.758 | 0.769 / 0.641 / 0.699 |
| | 0.673 / 0.508 / 0.579 | 0.630 / 0.571 / 0.599 | 0.634 / 0.495 / 0.556 |
| | 0.729 / 0.739 / 0.734 | 0.728 / 0.745 / 0.736 | 0.764 / 0.768 / 0.766 |
| | 0.860 / 0.809 / 0.834 | 0.823 / 0.852 / 0.837 | 0.789 / 0.771 / 0.780 |
| | 0.775 / 0.587 / 0.668 | 0.661 / 0.681 / 0.671 | 0.667 / 0.595 / 0.629 |
| | 0.727 / 0.575 / 0.642 | 0.650 / 0.700 / 0.674 | 0.646 / 0.622 / 0.634 |
Evaluation of the original model submitted to the shared task (Combined), the improved CRF model (Independent) and the neural character-level model (CNN-BiLSTM-CRF), based on the official evaluation script with strict entity span matching on the test set. Numbers within cells are precision/recall/F-score.
Official evaluation of NEN systems on the development data
| Entity type | | |
|---|---|---|
| | 0.733 / 0.770 / 0.751 | 0.715 / 0.766 / 0.740 |
| | 0.478 / 0.493 / 0.485 | 0.462 / 0.491 / 0.476 |
| | 0.445 / 0.315 / 0.369 | 0.410 / 0.360 / 0.383 |
| | 0.724 / 0.669 / 0.695 | 0.802 / 0.668 / 0.729 |
| | 0.292 / 0.187 / 0.228 | 0.595 / 0.587 / 0.591 |
| | 0.598 / 0.672 / 0.633 | 0.592 / 0.680 / 0.633 |
Comparison of our normalization systems on the development data with gold-standard entities. Numbers within table cells are precision/recall/F-score.
Official evaluation of NER and NEN systems on the test data
| Entity type | Our submitted system | | | Best in shared task | Best performing team |
|---|---|---|---|---|---|
| | 0.600 / 0.576 / 0.588 | 0.630 / 0.664 / 0.647 | 0.674 / 0.610 / 0.641 | 0.784 / 0.557 / 0.651 | Sheng |
| | 0.456 / 0.371 / 0.410 | 0.404 / 0.423 / 0.413 | 0.391 / 0.376 / 0.383 | 0.550 / 0.450 / 0.495 | Sheng |
| | 0.472 / 0.343 / 0.397 | 0.456 / 0.358 / 0.401 | 0.445 / 0.388 / 0.415 | 0.472 / 0.343 / 0.397 | Our submitted system |
| | 0.668 / 0.667 / 0.667 | 0.753 / 0.725 / 0.739 | 0.761 / 0.703 / 0.731 | 0.660 / 0.883 / 0.756 | Singh and Dai |
| | 0.244 / 0.240 / 0.242 | 0.439 / 0.489 / 0.462 | 0.460 / 0.456 / 0.458 | 0.587 / 0.473 / 0.524 | Sheng |
| | 0.531 / 0.490 / 0.510 | 0.427 / 0.565 / 0.486 | 0.451 / 0.542 / 0.493 | 0.531 / 0.490 / 0.510 | Our submitted system |
Comparison of our joint named entity recognition and normalization systems and the best performing systems in the shared task on the official test set. Numbers within table cells are micro-averaged precision/recall/F-score.
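Micro-averaged precision/recall/F-score pools the true positives, false positives and false negatives over all entity types before the scores are computed, so frequent types dominate the result. A small sketch of that aggregation follows; the example counts are invented.

```python
def micro_average(counts):
    """Micro-averaged precision/recall/F-score: TP/FP/FN are summed over all
    entity types before the three scores are computed."""
    tp = sum(c["tp"] for c in counts.values())
    fp = sum(c["fp"] for c in counts.values())
    fn = sum(c["fn"] for c in counts.values())
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return precision, recall, f_score

# Hypothetical per-type counts, purely for illustration.
counts = {"gene": {"tp": 80, "fp": 20, "fn": 40},
          "chemical": {"tp": 30, "fp": 10, "fn": 10}}
print(micro_average(counts))  # approx. (0.786, 0.688, 0.733)
```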