Uncertainty-based Self-training for Biomedical Keyphrase Extraction.

Zelalem Gero, Joyce C Ho.

Abstract

To keep pace with the increasing generation and digitization of documents, automated methods that improve the search, discovery, and mining of the vast body of literature are essential. Keyphrases provide a concise representation of a document by identifying its salient concepts. Various supervised approaches model keyphrase extraction as token-level labeling, using local context to predict a label for each token, and they perform much better than their unsupervised counterparts. However, existing supervised datasets contain too few annotated examples to train stronger deep learning models. In contrast, many domains have large amounts of unannotated data that can be leveraged to improve keyphrase extraction performance. We introduce a self-training model that incorporates uncertainty estimates to select instances from large-scale unlabeled data to augment the small labeled training set. Performance evaluation on a publicly available biomedical dataset demonstrates that our method improves keyphrase extraction performance over state-of-the-art models.
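The abstract does not spell out how uncertainty is estimated or which instances are kept. As a hedged illustration only, the sketch below assumes a BIO tagging scheme (B begins a keyphrase, I continues it, O is outside) and assumes each token comes with tag probabilities from several stochastic forward passes, for example Monte Carlo dropout. It scores each unlabeled document by the mean predictive entropy of its tokens, keeps the k least uncertain documents with their argmax tags as pseudo-labels, and decodes tags back into keyphrases. All names, the tag set, and the entropy criterion are hypothetical choices; this is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical BIO tag set: B = begins a keyphrase, I = inside one, O = outside.
TAGS = ("B", "I", "O")

def mean_probs(prob_samples: Sequence[Sequence[float]]) -> List[float]:
    """Average tag distribution for one token over T stochastic passes."""
    t = len(prob_samples)
    return [sum(p[k] for p in prob_samples) / t for k in range(len(prob_samples[0]))]

def token_entropy(prob_samples: Sequence[Sequence[float]]) -> float:
    """Predictive entropy (natural log) of the averaged tag distribution."""
    return -sum(m * math.log(m) for m in mean_probs(prob_samples) if m > 0.0)

def doc_uncertainty(doc_samples) -> float:
    """Mean token entropy; doc_samples[i][t][k] = P(tag k | token i, pass t)."""
    return sum(token_entropy(tok) for tok in doc_samples) / len(doc_samples)

def pseudo_tags(doc_samples) -> List[str]:
    """Argmax tag of the averaged distribution for every token."""
    tags = []
    for tok in doc_samples:
        m = mean_probs(tok)
        tags.append(TAGS[max(range(len(m)), key=m.__getitem__)])
    return tags

def select_for_self_training(unlabeled, k: int) -> List[Tuple[List[str], List[str]]]:
    """Keep the k least uncertain (tokens, doc_samples) pairs, with pseudo-labels."""
    ranked = sorted(unlabeled, key=lambda item: doc_uncertainty(item[1]))
    return [(tokens, pseudo_tags(samples)) for tokens, samples in ranked[:k]]

def decode_keyphrases(tokens: Sequence[str], tags: Sequence[str]) -> List[str]:
    """Turn a BIO tag sequence into keyphrase strings (a stray I starts a phrase)."""
    phrases, current = [], []
    for tok, tag in zip(tokens, tags):
        if tag == "B" or (tag == "I" and not current):
            if current:
                phrases.append(" ".join(current))
            current = [tok]
        elif tag == "I":
            current.append(tok)
        else:  # "O" closes any open phrase
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases
```

In a self-training loop, the selected pseudo-labeled documents would be added to the labeled set and the tagger retrained, possibly over several rounds. Keeping only low-uncertainty documents trades coverage of the unlabeled pool for cleaner pseudo-labels.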

Keywords:  Biomedical text processing; Document Summarization; Keyphrase Extraction

Year:  2021        PMID: 35775029      PMCID: PMC9241089          DOI: 10.1109/bhi50953.2021.9508592

Source DB:  PubMed          Journal:  IEEE EMBS Int Conf Biomed Health Inform        ISSN: 2641-3590
