Protein names and how to find them.

Kristofer Franzén, Gunnar Eriksson, Fredrik Olsson, Lars Asker, Per Lidén, Joakim Cöster.

Abstract

A prerequisite for all higher level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition might be regarded as a solved problem in some domains, it still poses a significant challenge in others. In this work we focus on one of the more difficult tasks, the identification of protein names in text. This task presents several interesting difficulties because of the named entities' variant structural characteristics, their sometimes unclear status as names, the lack of common standards and fixed nomenclatures, and the specifics of the texts in the molecular biology domain in which they appear. We describe how we approached these and other difficulties in the implementation of Yapex, a system for the automatic identification of protein names in text. We also evaluate Yapex under four different notions of correctness and compare its performance to that of another publicly available system for protein name recognition.
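The abstract does not say what the four notions of correctness used to evaluate Yapex are. As a hedged illustration only, the sketch below shows how the choice of matching criterion changes precision and recall for the same output. It compares two commonly used criteria, exact boundary match and any-overlap match. The function names, the `(start, end)` span representation, and the example sentence and annotations are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# Hypothetical representation: a span is (start, end) character offsets, end exclusive.
Span = Tuple[int, int]

def exact_match(pred: Span, gold: Span) -> bool:
    """Strict criterion: both boundaries must coincide exactly."""
    return pred == gold

def overlap_match(pred: Span, gold: Span) -> bool:
    """Lenient criterion: the spans share at least one character."""
    return pred[0] < gold[1] and gold[0] < pred[1]

def precision_recall(pred: List[Span], gold: List[Span],
                     match: Callable[[Span, Span], bool]) -> Tuple[float, float]:
    """Precision and recall; each gold span is matched at most once (greedy, first fit)."""
    used = set()
    tp = 0
    for p in pred:
        for i, g in enumerate(gold):
            if i not in used and match(p, g):
                used.add(i)
                tp += 1
                break
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical annotated sentence: the tagger finds "p53" but misses the full gold span.
text = "the p53 tumor suppressor binds MDM2"
gold = [(4, 24), (31, 35)]   # "p53 tumor suppressor", "MDM2"
pred = [(4, 7), (31, 35)]    # "p53", "MDM2"

print(precision_recall(pred, gold, exact_match))    # (0.5, 0.5)
print(precision_recall(pred, gold, overlap_match))  # (1.0, 1.0)
```

The same predictions score 0.5 under the strict criterion and 1.0 under the lenient one. This is why results for protein name recognition are only comparable when the notion of correctness is stated.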

Substances: Proteins

Year: 2002 PMID: 12460631 DOI: 10.1016/s1386-5056(02)00052-7

Source DB: PubMed Journal: Int J Med Inform ISSN: 1386-5056 Impact factor: 4.046

Cited by (26 in total):

1. A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

2. Identification of related gene/protein names based on an HMM of name variations.

3. NLProt: extracting protein names and sequences from papers.

4. High-recall protein entity recognition using a dictionary.

5. Empirical data on corpus design and usage in biomedical natural language processing.

6. Quantitative assessment of dictionary-based protein named entity tagging.

7. Gene/protein name recognition based on support vector machine using dictionary as features.

8. BioTagger-GM: a gene/protein name recognition system.

9. BioDEAL: community generation of biological annotations.

10. A realistic assessment of methods for extracting gene/protein interactions from free text.