
Comparing and combining chunkers of biomedical text.

Ning Kang, Erik M van Mulligen, Jan A Kors.

Abstract

Text chunking is an essential pre-processing step in information extraction systems. No comparative studies of chunking systems that cover the whole pipeline, including sentence splitting, tokenization and part-of-speech tagging, are available for the biomedical domain. We compared the usability (ease of integration, speed, trainability) and performance of six frequently used, state-of-the-art chunkers for the biomedical domain: GATE chunker, Genia Tagger, Lingpipe, MetaMap, OpenNLP, and Yamcha. We also combined the chunker results to improve chunking performance. All chunkers were integrated into the Unstructured Information Management Architecture (UIMA) framework. The GENIA Treebank corpus was used for training and testing. Performance was assessed for noun-phrase and verb-phrase chunking. For both noun-phrase and verb-phrase chunking, OpenNLP performed best (F-scores of 89.7% and 95.7%, respectively), but its advantage over Genia Tagger and Yamcha was small. With respect to usability, Lingpipe and OpenNLP scored best. When the chunker results were combined by a simple voting scheme, the F-score of the combined system improved by 3.1 percentage points for noun phrases and 0.6 percentage points for verb phrases compared with the best single chunker. Changing the voting threshold offered a simple way to obtain a system with high precision (and moderate recall) or high recall (and moderate precision). This study is the first to compare the performance of the whole chunking pipeline and to combine different existing chunking systems. Several chunkers showed good performance, but OpenNLP scored best in both performance and usability. Combining chunker results by a simple voting scheme can further improve performance and allows for different precision-recall trade-offs.
Copyright © 2010 Elsevier Inc. All rights reserved.
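The voting combination and threshold effect described in the abstract can be illustrated with a small sketch. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes each chunk is a token span with a label, that a vote counts only when chunkers agree exactly on span and label, and that evaluation uses exact-match precision, recall and F-score. The names `vote` and `prf` and the toy chunker outputs are hypothetical; the paper's actual voting scheme (for example, voting on per-token IOB tags) may differ.

```python
from collections import Counter

# Hypothetical representation: (start_token, end_token_exclusive, label), e.g. (0, 2, "NP").
Chunk = tuple[int, int, str]


def vote(chunker_outputs: list[set[Chunk]], threshold: int) -> set[Chunk]:
    """Keep every chunk proposed by at least `threshold` chunkers.

    Each chunker's output is a set, so a chunker votes at most once per chunk.
    Agreement requires an exact match on both span boundaries and label.
    """
    counts = Counter(chunk for output in chunker_outputs for chunk in output)
    return {chunk for chunk, n in counts.items() if n >= threshold}


def prf(predicted: set[Chunk], gold: set[Chunk]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F-score (harmonic mean of P and R)."""
    tp = len(predicted & gold)
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Toy five-token sentence with made-up gold chunks and three made-up chunkers.
    gold = {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")}
    outputs = [
        {(0, 2, "NP"), (2, 3, "VP"), (3, 5, "NP")},
        {(0, 2, "NP"), (2, 3, "VP"), (4, 5, "NP")},  # wrong NP start
        {(0, 1, "NP"), (2, 3, "VP"), (3, 5, "NP")},  # wrong NP end
    ]
    for t in range(1, len(outputs) + 1):
        p, r, f = prf(vote(outputs, t), gold)
        print(f"threshold={t}: P={p:.2f} R={r:.2f} F={f:.2f}")
```

On this toy data, a threshold of 1 (any chunker) keeps every proposed chunk and gives full recall but lower precision; a threshold of 3 (unanimity) keeps only the verb phrase all chunkers agree on, giving full precision but low recall; and the majority threshold of 2 filters out each chunker's individual errors. This mirrors, in miniature, the precision-recall trade-off the abstract reports for varying the voting threshold.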

Year:  2010        PMID: 21056118     DOI: 10.1016/j.jbi.2010.10.005

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  8 in total

1.  Bridging the text-image gap: a decision support tool for real-time PACS browsing.

Authors:  Merlijn Sevenster; Rob van Ommering; Yuechen Qian
Journal:  J Digit Imaging       Date:  2012-04       Impact factor: 4.056

2.  Recognition of chemical entities: combining dictionary-based and grammar-based approaches.

Authors:  Saber A Akhondi; Kristina M Hettne; Eelke van der Horst; Erik M van Mulligen; Jan A Kors
Journal:  J Cheminform       Date:  2015-01-19       Impact factor: 5.514

3.  Comparison of a semi-automatic annotation tool and a natural language processing application for the generation of clinical statement entries.

Authors:  Ching-Heng Lin; Nai-Yuan Wu; Wei-Shao Lai; Der-Ming Liou
Journal:  J Am Med Inform Assoc       Date:  2014-10-20       Impact factor: 4.497

4.  Unsupervised biomedical named entity recognition: experiments with clinical and biological texts.

Authors:  Shaodian Zhang; Noémie Elhadad
Journal:  J Biomed Inform       Date:  2013-08-15       Impact factor: 6.317

5.  Unsupervised acquisition of idiomatic units of symbolic natural language: An n-gram frequency-based approach for the chunking of news articles and tweets.

Authors:  Dario Borrelli; Gabriela Gongora Svartzman; Carlo Lipizzi
Journal:  PLoS One       Date:  2020-06-08       Impact factor: 3.240

6.  Training text chunkers on a silver standard corpus: can silver replace gold?

Authors:  Ning Kang; Erik M van Mulligen; Jan A Kors
Journal:  BMC Bioinformatics       Date:  2012-01-30       Impact factor: 3.169

7.  A modular framework for biomedical concept recognition.

Authors:  David Campos; Sérgio Matos; José Luís Oliveira
Journal:  BMC Bioinformatics       Date:  2013-09-24       Impact factor: 3.169

8.  Knowledge-based extraction of adverse drug events from biomedical text.

Authors:  Ning Kang; Bharat Singh; Chinh Bui; Zubair Afzal; Erik M van Mulligen; Jan A Kors
Journal:  BMC Bioinformatics       Date:  2014-03-04       Impact factor: 3.169