Task reformulation and data-centric approach for Twitter medication name extraction.

Yu Zhang¹, Jong Kang Lee¹, Jen-Chieh Han¹, Richard Tzong-Han Tsai¹,²,³.

Abstract

Automatically extracting medication names from tweets is challenging in real-world settings. Tweets are abundant, but only a small proportion of them mention medications, so datasets are usually highly imbalanced. Moreover, tweets are very short, which makes it hard to recognize medication names from the limited context. This paper proposes a data-centric approach for extracting medications in the BioCreative VII Track 3 (Automatic Extraction of Medication Names in Tweets). Our approach reformulates the sequence labeling problem as text entailment and question-answering tasks. As a result, without using a dictionary or an ensemble method, our single model achieved a Strict F1 of 0.77 (the official baseline system scored 0.758, and the average performance of participants was 0.696). Moreover, combining dictionary filtering with an ensemble method achieved a Strict F1 of 0.804, the highest performance among all participants. Furthermore, domain-specific and task-specific pretrained language models, as well as data-centric approaches, are proposed for further improvements. Database URL: https://competitions.codalab.org/competitions/23925 and https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-3/.
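The reformulation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the prompt wording or the data layout, so the hypothesis sentence, the question, the `Tweet` class, and the SQuAD-like dictionary layout below are all assumptions made for illustration. The idea shown is that a span-annotated tweet can be turned into (a) a premise/hypothesis pair whose label says whether any medication is mentioned, and (b) an extractive question-answering example whose answers are the medication spans.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical prompt strings; the wording used in the paper is not given
# in the abstract, so these are placeholders.
HYPOTHESIS = "This tweet mentions a medication."
QUESTION = "Which medication is mentioned in this tweet?"


@dataclass
class Tweet:
    text: str
    # Character spans (start, end) of medication mentions; empty when none.
    spans: List[Tuple[int, int]]


def to_entailment_pair(tweet: Tweet) -> dict:
    """Tweet-level view: does the tweet mention any medication at all?"""
    return {
        "premise": tweet.text,
        "hypothesis": HYPOTHESIS,
        "label": "entailment" if tweet.spans else "not_entailment",
    }


def to_qa_example(tweet: Tweet) -> dict:
    """Span-level view in a SQuAD-like layout (assumed, not from the paper)."""
    return {
        "question": QUESTION,
        "context": tweet.text,
        "answers": {
            "text": [tweet.text[s:e] for s, e in tweet.spans],
            "answer_start": [s for s, _ in tweet.spans],
        },
    }


if __name__ == "__main__":
    text = "took some ibuprofen for my headache"
    start = text.index("ibuprofen")
    positive = Tweet(text, [(start, start + len("ibuprofen"))])
    negative = Tweet("stuck in traffic again", [])

    print(to_entailment_pair(positive))
    print(to_qa_example(positive))
    print(to_entailment_pair(negative))
```

One plausible reason to pair the two views, given the class imbalance the abstract describes, is that the entailment pair handles the many tweets with no medication at the tweet level, while the question-answering example only has to locate spans. How the two tasks are actually combined in the paper is not stated in the abstract.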

Entities: Chemical

Year: 2022 PMID: 35998105 PMCID: PMC9397573 DOI: 10.1093/database/baac067

Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 4.462
