
Biomedical named entity recognition and linking datasets: survey and our recent development.

Ming-Siang Huang, Po-Ting Lai, Pei-Yen Lin, Yu-Ting You, Richard Tzong-Han Tsai, Wen-Lian Hsu.

Abstract

Natural language processing (NLP) is widely applied in the biological domain to retrieve information from publications. Systems have been developed for numerous applications, such as biomedical named entity recognition (BNER), named entity normalization (NEN) and protein-protein interaction extraction (PPIE). High-quality datasets help in developing robust and reliable systems; however, because applications keep multiplying and techniques keep evolving, the annotations of benchmark datasets may become outdated and inappropriate. In this study, we first review commonly used BNER datasets and their potential annotation problems, such as inconsistency and low portability. Then, we introduce a revised version of the JNLPBA dataset that addresses potential problems in the original, and we use state-of-the-art named entity recognition systems to evaluate its portability to different kinds of biomedical literature, including protein-protein interaction and biological event texts. Lastly, we introduce an ensembled biomedical entity dataset (EBED) that extends the revised JNLPBA dataset with PubMed Central full-text paragraphs, figure captions and patent abstracts. The EBED is a multi-task dataset whose annotations include gene, disease and chemical entities. In total, it contains 85,000 entity mentions, 25,000 entity mentions with database identifiers and 5,000 attribute tags. To demonstrate how the EBED can be used, we review the BNER track of the AI CUP Biomedical Paper Analysis challenge.

Availability: The revised JNLPBA dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/Revised_JNLPBA.zip. The EBED dataset is available at https://iasl-btm.iis.sinica.edu.tw/BNER/Content/AICUP_EBED_dataset.rar.

Contact: thtsai@g.ncu.edu.tw (Tel. 886-3-4227151 ext. 35203; Fax 886-3-422-2681); hsu@iis.sinica.edu.tw (Tel. 886-2-2788-3799 ext. 2211; Fax 886-2-2782-4814).

Supplementary information: Supplementary data are available at Briefings in Bioinformatics online.
© The Author(s) 2020. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
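The abstract says the revised JNLPBA dataset was evaluated with state-of-the-art NER systems but does not state the scoring protocol. A common convention for BNER benchmarks, including the original JNLPBA 2004 shared task (entity types protein, DNA, RNA, cell_line and cell_type, tagged in IOB2), is exact-match, entity-level precision, recall and F1. The sketch below assumes that convention; the names `iob2_to_spans` and `entity_prf` are hypothetical, and the lenient handling of an `I-` tag with no matching open entity is an illustrative choice, not necessarily what the authors used.

```python
from typing import List, Optional, Set, Tuple

# (start token, end token exclusive, entity type), e.g. (0, 2, "protein")
Span = Tuple[int, int, str]


def iob2_to_spans(tags: List[str]) -> Set[Span]:
    """Convert one sentence's IOB2 tags (B-X, I-X, O) into a set of entity spans."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" closes an entity at sentence end
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if start is not None:
                spans.add((start, i, etype))
            start, etype = None, None
            if tag.startswith(("B-", "I-")):
                # Lenient choice (an assumption): an I- tag that does not continue
                # the open entity starts a new one, as if it were B-.
                start, etype = i, tag[2:]
    return spans


def entity_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall and F1 over aligned sentences."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold, pred):
        g, p = iob2_to_spans(gold_tags), iob2_to_spans(pred_tags)
        tp += len(g & p)  # counted only if both boundaries AND the type match
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy example with JNLPBA-style types; not data from the revised corpus.
    gold = [["B-protein", "I-protein", "O", "B-cell_type"]]
    pred = [["B-protein", "I-protein", "O", "B-cell_line"]]
    print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5): the type error costs one true positive
```

Requiring both boundaries and the type to match is the strictest setting; relaxed variants such as left- or right-boundary match would replace the set intersection with a looser span comparison.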


Keywords:  biological information retrieval; biomedical dataset; biomedical natural language processing; named entity recognition

Year:  2020        PMID: 32602538     DOI: 10.1093/bib/bbaa054

Source DB:  PubMed          Journal:  Brief Bioinform        ISSN: 1467-5463            Impact factor:   11.622

