
TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

Robert Leaman, Zhiyong Lu.

Abstract

MOTIVATION: Text mining is increasingly used to manage the accelerating pace of the biomedical literature. Many text mining applications depend on accurate named entity recognition (NER) and normalization (grounding). While high-performing machine learning methods trainable for many entity types exist for NER, normalization methods are usually specialized to a single entity type. NER and normalization systems are also typically used in a serial pipeline, causing cascading errors and limiting the ability of the NER system to directly exploit the lexical information provided by the normalization.
METHODS: We propose the first machine learning model for joint NER and normalization during both training and prediction. The model is trainable for arbitrary entity types and consists of a semi-Markov structured linear classifier, with a rich feature approach for NER and supervised semantic indexing for normalization. We also introduce TaggerOne, a Java implementation of our model as a general toolkit for joint NER and normalization. TaggerOne is not specific to any entity type, requiring only annotated training data and a corresponding lexicon, and has been optimized for high throughput.
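As a rough illustration of the semi-Markov idea described above, the sketch below decodes a sentence into labeled segments (rather than labeling one token at a time) and attaches a concept identifier to each entity segment in the same pass. It is a minimal sketch under stated assumptions, not the TaggerOne implementation: the function names, the `TOY_LEXICON` entries, the `CONCEPT:` identifiers, and the hand-set scores are all hypothetical, and a simple lexicon lookup stands in for the learned rich-feature NER score and the supervised semantic indexing normalization score.

```python
from typing import Callable, List, Optional, Tuple

# A segment is (start, end_exclusive, label, concept_id).
Segment = Tuple[int, int, str, Optional[str]]

# Hypothetical lexicon mapping names to made-up concept identifiers.
TOY_LEXICON = {
    "breast cancer": "CONCEPT:1",
    "cancer": "CONCEPT:2",
}


def toy_score(tokens: List[str], start: int, end: int,
              label: str) -> Tuple[float, Optional[str]]:
    """Return (score, concept_id) for labeling tokens[start:end] as `label`.

    Stand-in for a learned joint score; the weights here are arbitrary.
    """
    if label == "O":
        return 0.0, None
    text = " ".join(tokens[start:end]).lower()
    concept = TOY_LEXICON.get(text)
    if concept is None:
        return -1.0, None
    return 2.0 * (end - start), concept


def semi_markov_decode(
    tokens: List[str],
    labels: List[str],
    score: Callable[[List[str], int, int, str], Tuple[float, Optional[str]]],
    max_len: int = 4,
) -> List[Segment]:
    """Viterbi-style search over segmentations with segments up to max_len."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)   # best[i]: best score covering tokens[:i]
    back: List[Optional[Segment]] = [None] * (n + 1)
    best[0] = 0.0
    for end in range(1, n + 1):
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            for label in labels:
                # Assumption: non-entity ("O") segments are single tokens.
                if label == "O" and length > 1:
                    continue
                seg_score, concept = score(tokens, start, end, label)
                total = best[start] + seg_score
                if total > best[end]:
                    best[end] = total
                    back[end] = (start, end, label, concept)
    # Follow back-pointers from the end of the sentence to recover segments.
    segments: List[Segment] = []
    pos = n
    while pos > 0:
        seg = back[pos]
        assert seg is not None
        segments.append(seg)
        pos = seg[0]
    segments.reverse()
    return segments


if __name__ == "__main__":
    sentence = ["patients", "with", "breast", "cancer", "improved"]
    for seg in semi_markov_decode(sentence, ["O", "Disease"], toy_score):
        print(seg)
```

Because the whole span "breast cancer" is scored as one unit, the lexicon match for the two-token name can outweigh the match for "cancer" alone, which is the kind of lexical information a token-by-token tagger in a serial pipeline cannot use directly.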
RESULTS: We validated TaggerOne with multiple gold-standard corpora containing both mention- and concept-level annotations. Benchmarking results show that TaggerOne achieves high performance on diseases (NCBI Disease corpus, NER f-score: 0.829, normalization f-score: 0.807) and chemicals (BioCreative 5 CDR corpus, NER f-score: 0.914, normalization f-score: 0.895). These results compare favorably to the previous state of the art, notwithstanding the greater flexibility of the model. We conclude that jointly modeling NER and normalization greatly improves performance.
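For readers unfamiliar with the metric, the f-scores above are the harmonic mean of precision and recall. The sketch below shows that arithmetic on sets of annotations. It is illustrative only: the function name and the toy annotations are hypothetical, and treating NER items as (document, start, end) spans and normalization items as (document, concept) pairs is an assumption, since the abstract does not spell out the matching criteria.

```python
from typing import Hashable, Set, Tuple


def precision_recall_f(gold: Set[Hashable],
                       predicted: Set[Hashable]) -> Tuple[float, float, float]:
    """Compute precision, recall and f-score from exact set overlap."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Toy mention-level annotations: (document id, start offset, end offset).
    gold = {("doc1", 0, 13), ("doc1", 40, 48), ("doc2", 5, 11), ("doc2", 30, 36)}
    predicted = {("doc1", 0, 13), ("doc2", 5, 11), ("doc2", 50, 58)}
    p, r, f = precision_recall_f(gold, predicted)
    print(f"precision={p:.3f} recall={r:.3f} f-score={f:.3f}")
```

The same function applies at the concept level by passing sets of (document, concept identifier) pairs instead of spans.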
AVAILABILITY AND IMPLEMENTATION: The TaggerOne source code and an online demonstration are available at: http://www.ncbi.nlm.nih.gov/bionlp/taggerone
CONTACT: zhiyong.lu@nih.gov
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

Year:  2016        PMID: 27283952      PMCID: PMC5018376          DOI: 10.1093/bioinformatics/btw343

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


