Improving the performance of dictionary-based approaches in protein name recognition.

Abstract

Dictionary-based protein name recognition is often a first step in extracting information from biomedical documents because it can provide ID information on recognized terms. However, dictionary-based approaches present two fundamental difficulties: (1) false recognition, mainly caused by short names; and (2) low recall due to spelling variations. In this paper, we tackle the former problem using machine learning to filter out false positives, and present two alternative methods for alleviating the latter problem of spelling variations. The first uses approximate string searching, and the second expands the dictionary with a probabilistic variant generator, which we propose in this paper. Experimental results using the GENIA corpus revealed that filtering with a naive Bayes classifier greatly improved precision with only a slight loss of recall, resulting in a 10.8% improvement in F-measure, and that dictionary expansion with the variant generator gave a further 1.6% improvement, achieving an F-measure of 66.6%.
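The approximate string searching mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes plain Levenshtein edit distance as the similarity measure, and the dictionary entries, the identifiers, and the names `edit_distance`, `lookup`, and `f_measure` are hypothetical, chosen only for illustration. The F-measure helper assumes the balanced (F1) form, the harmonic mean of precision and recall.

```python
from typing import Dict, Optional, Tuple


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lookup(term: str, dictionary: Dict[str, str],
           max_dist: int = 1) -> Optional[Tuple[str, str]]:
    """Return the closest (name, id) within max_dist edits, or None."""
    best = None
    for name, ident in dictionary.items():
        d = edit_distance(term.lower(), name.lower())
        if d <= max_dist and (best is None or d < best[0]):
            best = (d, name, ident)
    return None if best is None else (best[1], best[2])


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical dictionary entries and IDs, for illustration only.
PROTEINS = {"NF-kappa B": "ID:0001", "interleukin-2": "ID:0002"}

print(lookup("NF kappa B", PROTEINS))  # spelling variant, one edit away
print(lookup("IL-2", PROTEINS))        # abbreviation, too far to match
```

A small edit-distance threshold recovers hyphen and spacing variants but also makes short names easier to match by accident, which is the kind of false positive the abstract proposes to filter with a classifier.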

Entities: Disease

Substances: Proteins

Year: 2004 PMID: 15542019 DOI: 10.1016/j.jbi.2004.08.003

Source DB: PubMed Journal: J Biomed Inform ISSN: 1532-0464 Impact factor: 6.317
