Boosting drug named entity recognition using an aggregate classifier.

Ioannis Korkontzelos, Dimitrios Piliouras, Andrew W Dowsey, Sophia Ananiadou.

Abstract

OBJECTIVE: Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high-quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we improve the performance of drug NER without relying exclusively on manual annotations.
METHODS: We perform drug NER using either a small gold-standard corpus (120 abstracts) or no corpus at all. In our approach, we develop a voting system that combines a number of heterogeneous models, based on dictionary knowledge, gold-standard corpora and silver annotations, to enhance performance. To improve recall, we employ genetic programming to evolve 11 regular-expression patterns that capture common drug suffixes and use them as an additional means of recognition.
MATERIALS: Our approach uses a dictionary of drug names (DrugBank), a small manually annotated corpus (the pharmacokinetic corpus), and a part of the UKPMC database as raw biomedical text. Gold-standard and silver annotated data are used to train maximum entropy and multinomial logistic regression classifiers.
RESULTS: Aggregating drug NER methods based on gold-standard annotations, dictionary knowledge and patterns improved on the performance of models trained on gold-standard annotations only, achieving a maximum F-score of 95%. In addition, combining models trained on silver annotations, dictionary knowledge and patterns is shown to achieve performance comparable to that of models trained exclusively on gold-standard data. The main reason appears to be the morphological similarities shared among drug names.
CONCLUSION: We conclude that gold-standard data are not a hard requirement for drug NER. Combining heterogeneous models built on dictionary knowledge can achieve classification performance comparable to that of the best-performing model trained on gold-standard annotations.
Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
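
The aggregation idea described under METHODS can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not state the voting rule, so a simple vote threshold is used; the three suffix patterns are hand-written stand-ins for the 11 evolved patterns, which the abstract does not list; and the small word set stands in for a DrugBank-derived dictionary. A trained classifier could be added as one more tagger with the same signature.

```python
import re
from typing import Callable, List

# Hypothetical suffix patterns, written by hand for illustration only.
# The paper evolves 11 patterns with genetic programming; they are not
# given in the abstract and are NOT reproduced here.
SUFFIX_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\w+cillin$", r"\w+azole$", r"\w+statin$")
]

# Toy stand-in for a dictionary of drug names (e.g. derived from DrugBank).
DRUG_DICTIONARY = {"aspirin", "warfarin", "ibuprofen"}

# A tagger maps a token sequence to one boolean per token (True = drug).
Tagger = Callable[[List[str]], List[bool]]


def dictionary_tagger(tokens: List[str]) -> List[bool]:
    """Flag tokens that appear in the drug-name dictionary."""
    return [t.lower() in DRUG_DICTIONARY for t in tokens]


def pattern_tagger(tokens: List[str]) -> List[bool]:
    """Flag tokens whose ending matches any suffix pattern."""
    return [any(p.search(t) for p in SUFFIX_PATTERNS) for t in tokens]


def aggregate(tokens: List[str], taggers: List[Tagger], min_votes: int = 1) -> List[bool]:
    """Label a token as a drug when at least `min_votes` taggers agree.

    The threshold rule is an assumption; min_votes=1 favours recall,
    higher values favour precision.
    """
    votes = [tagger(tokens) for tagger in taggers]
    return [sum(column) >= min_votes for column in zip(*votes)]


if __name__ == "__main__":
    sentence = ["Patients", "received", "amoxicillin", "and", "warfarin"]
    labels = aggregate(sentence, [dictionary_tagger, pattern_tagger])
    print(list(zip(sentence, labels)))
```

With `min_votes=1`, a token flagged by any single tagger is accepted, which reflects the recall-oriented role the abstract assigns to the suffix patterns; raising the threshold requires agreement between taggers.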

Keywords:  Drug named entity recognition; Genetic-programming-evolved string-similarity patterns; Gold-standard vs. silver-standard annotations; Named entity annotation sparsity; Named entity recogniser aggregation

Year:  2015        PMID: 26116947     DOI: 10.1016/j.artmed.2015.05.007

Source DB:  PubMed          Journal:  Artif Intell Med        ISSN: 0933-3657            Impact factor:   5.326


  6 in total

1.  Towards an Obesity-Cancer Knowledge Base: Biomedical Entity Identification and Relation Detection.

Authors:  Juan Antonio Lossio-Ventura; William Hogan; François Modave; Amanda Hicks; Josh Hanna; Yi Guo; Zhe He; Jiang Bian
Journal:  Proceedings (IEEE Int Conf Bioinformatics Biomed)       Date:  2017-01-19

Review 2.  Natural language processing systems for capturing and standardizing unstructured clinical information: A systematic review.

Authors:  Kory Kreimeyer; Matthew Foster; Abhishek Pandey; Nina Arya; Gwendolyn Halford; Sandra F Jones; Richard Forshee; Mark Walderhaug; Taxiarchis Botsis
Journal:  J Biomed Inform       Date:  2017-07-17       Impact factor: 6.317

3.  Extracting Drug Names and Associated Attributes From Discharge Summaries: Text Mining Study.

Authors:  Ghada Alfattni; Maksim Belousov; Niels Peek; Goran Nenadic
Journal:  JMIR Med Inform       Date:  2021-05-05

4.  A New Data Representation Based on Training Data Characteristics to Extract Drug Name Entity in Medical Text.

Authors:  Mujiono Sadikin; Mohamad Ivan Fanany; T Basaruddin
Journal:  Comput Intell Neurosci       Date:  2016-10-24

Review 5.  Social media based surveillance systems for healthcare using machine learning: A systematic review.

Authors:  Aakansha Gupta; Rahul Katarya
Journal:  J Biomed Inform       Date:  2020-07-02       Impact factor: 6.317

6.  Annotation and detection of drug effects in text for pharmacovigilance.

Authors:  Paul Thompson; Sophia Daikou; Kenju Ueno; Riza Batista-Navarro; Jun'ichi Tsujii; Sophia Ananiadou
Journal:  J Cheminform       Date:  2018-08-13       Impact factor: 5.514
