Hybrid Ensemble-Rule Algorithm for Improved MEDLINE® Sentence Boundary Detection.

Daniel X Le, James G Mork, Sameer Antani.

Abstract

Sentence boundary detection (SBD) is a fundamental building block of the Natural Language Processing (NLP) pipeline. Incorrect SBD can degrade subsequent processing stages and reduce overall performance. In well-behaved corpora, a few simple rules based on punctuation and capitalization are sufficient to detect sentence boundaries. However, a corpus such as MEDLINE citations presents challenges for SBD because of several syntactic ambiguities, e.g., periods in abbreviations and capitalization of the first word of a sentence. In this manuscript we present an algorithm that addresses these challenges through majority voting among three SBD engines (Python NLTK, pySBD, and Syntok), followed by custom post-processing algorithms that rely on spaCy part-of-speech tagging, abbreviation and capital-letter detection, and general sentence statistics. Experiments on several thousand MEDLINE citations show that the proposed combination of multiple SBD engines and post-processing rules performs better than each individual engine. ©2021 AMIA - All rights reserved.
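
The majority-voting idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the outputs of the three engines are aligned, so this sketch assumes each engine reports the character offsets at which it believes a sentence ends, and a boundary is kept when at least two engines agree. The three engines below (`naive_engine`, `capital_engine`, `abbreviation_engine`) are hypothetical regex stand-ins written for this example; they are not NLTK, pySBD, or Syntok, and the abbreviation list is invented for illustration.

```python
import re
from collections import Counter
from typing import Callable, List, Sequence, Set

# Assumption: an "engine" maps text to the set of character offsets
# (just past the closing punctuation) where it thinks a sentence ends.
Engine = Callable[[str], Set[int]]

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = {"e.g.", "i.e.", "vs.", "Dr.", "Fig."}


def naive_engine(text: str) -> Set[int]:
    """Toy engine: any '.', '!' or '?' followed by whitespace or end of text."""
    return {m.end() for m in re.finditer(r"[.!?](?=\s|$)", text)}


def capital_engine(text: str) -> Set[int]:
    """Toy engine: punctuation must precede an uppercase letter or end of text."""
    return {m.end() for m in re.finditer(r"[.!?](?=\s+[A-Z]|\s*$)", text)}


def abbreviation_engine(text: str) -> Set[int]:
    """Toy engine: like naive_engine, but ignores known abbreviations."""
    ends = set()
    for m in re.finditer(r"[.!?](?=\s|$)", text):
        last_token = text[: m.end()].split()[-1]
        if last_token not in ABBREVIATIONS:
            ends.add(m.end())
    return ends


def majority_vote(text: str, engines: Sequence[Engine], min_votes: int = 2) -> List[str]:
    """Split text at every offset proposed by at least min_votes engines."""
    votes = Counter()
    for engine in engines:
        votes.update(engine(text))  # one vote per engine per offset
    boundaries = sorted(off for off, n in votes.items() if n >= min_votes)

    sentences, start = [], 0
    for end in boundaries:
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:  # keep trailing text that no boundary closed
        sentences.append(tail)
    return sentences


text = "The dose was 5 mg vs. placebo. p53 levels rose. Results were stable."
engines = [naive_engine, capital_engine, abbreviation_engine]
for sentence in majority_vote(text, engines):
    print(sentence)
```

In this toy input, only the naive engine splits after "vs.", so that boundary is outvoted, while the boundary before the lowercase "p53" survives because two of the three engines propose it. The post-processing rules described in the abstract (part-of-speech, abbreviation, and capital-letter checks) would run on the voted output and are not modeled here.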

Year:  2022        PMID: 35308957      PMCID: PMC8861722     

Source DB:  PubMed          Journal:  AMIA Annu Symp Proc        ISSN: 1559-4076


  4 in total

1.  GENIA corpus--semantically annotated corpus for bio-textmining.

Authors:  J-D Kim; T Ohta; Y Tateisi; J Tsujii
Journal:  Bioinformatics       Date:  2003       Impact factor: 6.937

2.  2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text.

Authors:  Özlem Uzuner; Brett R South; Shuying Shen; Scott L DuVall
Journal:  J Am Med Inform Assoc       Date:  2011-06-16       Impact factor: 4.497

3.  Detection of sentence boundaries and abbreviations in clinical narratives.

Authors:  Markus Kreuzthaler; Stefan Schulz
Journal:  BMC Med Inform Decis Mak       Date:  2015-06-15       Impact factor: 2.796

4.  A Quantitative and Qualitative Evaluation of Sentence Boundary Detection for the Clinical Domain.

Authors:  Denis Griffis; Chaitanya Shivade; Eric Fosler-Lussier; Albert M Lai
Journal:  AMIA Jt Summits Transl Sci Proc       Date:  2016-07-20