Parallel sequence tagging for concept recognition.

Lenz Furrer, Joseph Cornelius, Fabio Rinaldi.

Abstract

BACKGROUND: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined in a serial way, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, where both NER and NEN are modeled as a sequence-labeling task, operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence.
RESULTS: We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task, a competition of the BioNLP Open Shared Tasks 2019. We further refine the systems from the shared task by optimising the harmonisation strategy separately for each annotation set.
CONCLUSIONS: Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This calibration makes it possible to achieve a good trade-off between established knowledge (training set) and novel information (unseen concepts).
© 2022. The Author(s).
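
As an illustration of what merging the predictions of the two classifiers into a single output sequence could look like, the following is a minimal sketch, not the authors' implementation. It assumes token-level predictions: a span tagger (NER) that emits `O` or `I` per token, and a concept-ID tagger (NEN) that emits `O` or a concept identifier per token. The strategy names (`ids-only`, `spans-gate`, `ids-first`), the `UNKNOWN` placeholder, and the identifiers `C1` and `C2` are hypothetical and chosen only for this example.

```python
from typing import List

OUTSIDE = "O"      # label for tokens outside any concept mention
UNKNOWN = "UNK"    # hypothetical placeholder: mention detected, concept not identified


def harmonise(span_tags: List[str], id_tags: List[str], strategy: str) -> List[str]:
    """Merge per-token span predictions and concept-ID predictions.

    span_tags: "O" or "I" per token, from the span (NER) tagger.
    id_tags:   "O" or a concept identifier per token, from the ID (NEN) tagger.
    strategy:  one of the illustrative strategies handled below.
    """
    if len(span_tags) != len(id_tags):
        raise ValueError("both taggers must label the same token sequence")

    merged = []
    for span, concept in zip(span_tags, id_tags):
        if strategy == "ids-only":
            # Ignore the span tagger entirely.
            merged.append(concept)
        elif strategy == "spans-gate":
            # Keep an ID only where the span tagger also sees a mention.
            merged.append(concept if span != OUTSIDE else OUTSIDE)
        elif strategy == "ids-first":
            # Trust the ID tagger; where it is silent but the span tagger
            # fires, keep the mention with a placeholder label.
            if concept != OUTSIDE:
                merged.append(concept)
            elif span != OUTSIDE:
                merged.append(UNKNOWN)
            else:
                merged.append(OUTSIDE)
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return merged


# Toy predictions for a four-token sentence (values are made up).
span_tags = ["O", "I", "I", "O"]
id_tags = ["O", "C1", "O", "C2"]

for name in ("ids-only", "spans-gate", "ids-first"):
    print(name, harmonise(span_tags, id_tags, name))
```

Because each strategy resolves disagreements differently, choosing one per annotation set on a development set, as the conclusions describe, amounts to evaluating each candidate and keeping the best-scoring one for that set.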

Keywords:  Concept recognition; Named entity recognition and normalization; Neural network; Sequence tagging; Text mining

Year: 2022; PMID: 35331131; PMCID: PMC8943923; DOI: 10.1186/s12859-021-04511-y

Source DB: PubMed; Journal: BMC Bioinformatics; ISSN: 1471-2105; Impact factor: 3.169

