Lenz Furrer (1,2), Joseph Cornelius (3,2), Fabio Rinaldi (4,5,6,7; fabio.rinaldi@idsia.ch). 1. Department of Computational Linguistics, University of Zurich, Zurich, Switzerland. 2. Swiss Institute of Bioinformatics, Zurich, Switzerland. 3. Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI/SUPSI), Lugano, Switzerland. 4. Dalle Molle Institute for Artificial Intelligence Research (IDSIA USI/SUPSI), Lugano, Switzerland. 5. Department of Quantitative Biomedicine, University of Zurich, Zurich, Switzerland. 6. Swiss Institute of Bioinformatics, Zurich, Switzerland. 7. Fondazione Bruno Kessler, Trento, Italy.
Abstract
BACKGROUND: Named Entity Recognition (NER) and Normalisation (NEN) are core components of any text-mining system for biomedical texts. In a traditional concept-recognition pipeline, these tasks are combined serially, which is inherently prone to error propagation from NER to NEN. We propose a parallel architecture, in which both NER and NEN are modelled as sequence-labelling tasks operating directly on the source text. We examine different harmonisation strategies for merging the predictions of the two classifiers into a single output sequence. RESULTS: We test our approach on the recent Version 4 of the CRAFT corpus. In all 20 annotation sets of the concept-annotation task, our system outperforms the pipeline system reported as a baseline in the CRAFT shared task, a competition of the BioNLP Open Shared Tasks 2019. We further refine the systems from the shared task by optimising the harmonisation strategy separately for each annotation set. CONCLUSIONS: Our analysis shows that the strengths of the two classifiers can be combined in a fruitful way. However, prediction harmonisation requires individual calibration on a development set for each annotation set. This allows a good trade-off to be achieved between established knowledge (the training set) and novel information (unseen concepts).
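The abstract describes two sequence labellers run in parallel over the same tokens: an NER tagger predicting span boundaries and an NEN tagger predicting concept identifiers, whose outputs are then harmonised into one sequence. A minimal sketch of one such harmonisation strategy is shown below; the label scheme, concept IDs, and the rule "take boundaries from NER, identifiers from NEN, with a placeholder fallback" are illustrative assumptions, not the paper's actual calibrated strategies.

```python
def harmonise(ner_tags, nen_ids, fallback_id="NIL"):
    """Hypothetical harmonisation of two per-token prediction sequences.

    ner_tags: IOB tags from the span-level NER tagger, e.g. "B-CL", "O".
    nen_ids:  concept IDs from the NEN tagger, "O" where it predicts none.
    Strategy (an assumption for illustration): trust NER for boundaries;
    attach the NEN concept ID inside each span, or a fallback placeholder.
    """
    merged = []
    for tag, cid in zip(ner_tags, nen_ids):
        if tag == "O":
            merged.append("O")                  # NER sees no entity here
        elif cid != "O":
            merged.append(tag[:2] + cid)        # keep "B-"/"I-", add concept ID
        else:
            merged.append(tag[:2] + fallback_id)  # span without a known concept
    return merged

# Example with made-up tags over four tokens:
ner = ["O", "B-CL", "I-CL", "O"]
nen = ["O", "CL:0000540", "CL:0000540", "O"]
print(harmonise(ner, nen))
# → ['O', 'B-CL:0000540', 'I-CL:0000540', 'O']
```

In practice the paper reports that the best merging rule differs per annotation set, which is why such a strategy would be selected on a development set rather than fixed globally.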