Literature DB >> 35998105

Task reformulation and data-centric approach for Twitter medication name extraction.

Yu Zhang1, Jong Kang Lee1, Jen-Chieh Han1, Richard Tzong-Han Tsai1,2,3.   

Abstract

Automatically extracting medication names from tweets is challenging in the real world. There are many tweets; however, only a small proportion mentions medications. Thus, datasets are usually highly imbalanced. Moreover, the length of tweets is very short, which makes it hard to recognize medication names from the limited context. This paper proposes a data-centric approach for extracting medications in the BioCreative VII Track 3 (Automatic Extraction of Medication Names in Tweets). Our approach formulates the sequence labeling problem as text entailment and question-answer tasks. As a result, without using the dictionary and ensemble method, our single model achieved a Strict F1 of 0.77 (the official baseline system is 0.758, and the average performance of participants is 0.696). Moreover, combining the dictionary filtering and ensemble method achieved a Strict F1 of 0.804 and had the highest performance for all participants. Furthermore, domain-specific and task-specific pretrained language models, as well as data-centric approaches, are proposed for further improvements. Database URL https://competitions.codalab.org/competitions/23925 and https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/.
© The Author(s) 2022. Published by Oxford University Press.

Entities:  

Mesh:

Year:  2022        PMID: 35998105      PMCID: PMC9397573          DOI: 10.1093/database/baac067

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   4.462


  3 in total

1.  SOCIAL MEDIA MINING SHARED TASK WORKSHOP.

Authors:  Abeed Sarker; Azadeh Nikfarjam; Graciela Gonzalez
Journal:  Pac Symp Biocomput       Date:  2016

2.  TwiMed: Twitter and PubMed Comparable Corpus of Drugs, Diseases, Symptoms, and Their Relations.

Authors:  Nestor Alvaro; Yusuke Miyao; Nigel Collier
Journal:  JMIR Public Health Surveill       Date:  2017-05-03

3.  BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Authors:  Jinhyuk Lee; Wonjin Yoon; Sungdong Kim; Donghyeon Kim; Sunkyu Kim; Chan Ho So; Jaewoo Kang
Journal:  Bioinformatics       Date:  2020-02-15       Impact factor: 6.937

  3 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.