Improving the performance of dictionary-based approaches in protein name recognition.

Yoshimasa Tsuruoka, Jun'ichi Tsujii.

Abstract

Dictionary-based protein name recognition is often a first step in extracting information from biomedical documents because it can provide ID information on recognized terms. However, dictionary-based approaches present two fundamental difficulties: (1) false recognition, mainly caused by short names; and (2) low recall due to spelling variations. In this paper, we tackle the former problem by using machine learning to filter out false positives, and we present two alternative methods for alleviating the latter problem of spelling variations. The first uses approximate string searching, and the second expands the dictionary with a probabilistic variant generator, which we propose in this paper. Experimental results using the GENIA corpus revealed that filtering with a naive Bayes classifier greatly improved precision with only a slight loss of recall, resulting in a 10.8% improvement in F-measure. Dictionary expansion with the variant generator gave a further 1.6% improvement, achieving an F-measure of 66.6%.
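As an illustration of the approximate string searching idea mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes plain Levenshtein edit distance as the similarity measure, which the abstract does not specify. The dictionary entries, the ID strings, and the function names `edit_distance` and `approximate_lookup` are hypothetical and chosen only for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with a rolling one-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_lookup(candidate, dictionary, max_distance=1):
    """Return (name, id) pairs whose name is within max_distance edits
    of the candidate string, compared case-insensitively."""
    key = candidate.lower()
    hits = []
    for name, protein_id in dictionary.items():
        if edit_distance(key, name.lower()) <= max_distance:
            hits.append((name, protein_id))
    return hits


# Hypothetical dictionary: names and IDs are made up for illustration.
PROTEIN_DICTIONARY = {
    "NF-kappa B": "ID:0001",
    "interleukin-2": "ID:0002",
    "p53": "ID:0003",
}

# A spelling variant (space instead of hyphen) still finds its entry.
print(approximate_lookup("NF kappa B", PROTEIN_DICTIONARY))

# A short, unrelated string can also fall within the threshold.
print(approximate_lookup("p63", PROTEIN_DICTIONARY))
```

The second call shows why short names are a likely source of false recognition under this kind of matching: a single edit is a large fraction of a three-character name, so loose matching that helps recall on longer names can hurt precision on short ones. This is the kind of candidate that a downstream filter, such as the classifier described in the abstract, would be expected to reject.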

Year:  2004        PMID: 15542019     DOI: 10.1016/j.jbi.2004.08.003

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


