Tagging gene and protein names in biomedical text.

Abstract

MOTIVATION: The MEDLINE database of biomedical abstracts contains scientific knowledge about thousands of interacting genes and proteins. Automated text processing can aid in the comprehension and synthesis of this valuable information. The fundamental task of identifying gene and protein names is a necessary first step towards making full use of the information encoded in biomedical text. This remains a challenging task due to the irregularities and ambiguities in gene and protein nomenclature. We propose to approach the detection of gene and protein names in scientific abstracts as part-of-speech tagging, the most basic form of linguistic corpus annotation.
RESULTS: We present a method for tagging gene and protein names in biomedical text using a combination of statistical and knowledge-based strategies. This method incorporates automatically generated rules from a transformation-based part-of-speech tagger, and manually generated rules from morphological clues, low frequency trigrams, indicator terms, suffixes and part-of-speech information. Results of an experiment on a test corpus of 56K MEDLINE documents demonstrate that our method to extract gene and protein names can be applied to large sets of MEDLINE abstracts, without the need for special conditions or human experts to predetermine relevant subsets.
AVAILABILITY: The programs are available on request from the authors.
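
To make the knowledge-based side of the abstract concrete, the following is a minimal sketch of how manually written rules based on morphological clues, suffixes and indicator terms might flag candidate names. It is not the authors' program: the function names, the clue lists and the stoplist are all hypothetical, chosen only for illustration, and the statistical transformation-based tagger and trigram rules described above are not modeled.

```python
import re

# All clue lists below are hypothetical examples, not taken from the paper.
INDICATOR_TERMS = {"gene", "protein"}          # words that often follow a name
SUFFIX = "ase"                                 # enzyme-like suffix, e.g. "kinase"
SUFFIX_STOPLIST = {"disease", "increase", "decrease", "database", "release"}

# Morphological clue: a token mixing letters and digits, e.g. "p53" or "BRCA1".
MIXED_ALNUM = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")


def tag_tokens(text):
    """Return (token, tag) pairs, where tag is 'GENE' or 'O' (other)."""
    tokens = re.findall(r"[A-Za-z0-9-]+", text)
    tagged = []
    for i, tok in enumerate(tokens):
        lower = tok.lower()
        next_lower = tokens[i + 1].lower() if i + 1 < len(tokens) else ""
        by_morphology = MIXED_ALNUM.match(tok) is not None
        by_suffix = (
            len(tok) >= 6
            and lower.endswith(SUFFIX)
            and lower not in SUFFIX_STOPLIST
        )
        # An all-uppercase symbol directly followed by an indicator term.
        by_indicator = (
            len(tok) >= 2 and tok.isupper() and next_lower in INDICATOR_TERMS
        )
        is_name = by_morphology or by_suffix or by_indicator
        tagged.append((tok, "GENE" if is_name else "O"))
    return tagged


def gene_names(text):
    """Convenience wrapper: only the tokens tagged as GENE, in order."""
    return [tok for tok, tag in tag_tokens(text) if tag == "GENE"]


if __name__ == "__main__":
    print(gene_names("Mutations in the BRCA1 gene and the p53 protein "
                     "affect a tyrosine kinase."))
```

Hand-written rules of this kind are easy to inspect but over-generate on ordinary words that share a suffix, which is one reason to combine them with a statistical tagger as the abstract describes.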

Substances:
Proteins
DNA

Year: 2002 PMID: 12176836 DOI: 10.1093/bioinformatics/18.8.1124

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited by: 73 in total

1. A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Authors: Sergei Egorov; Anton Yuryev; Nikolai Daraselia
Journal: J Am Med Inform Assoc Date: 2004-02-05 Impact factor: 4.497

2. Semantic relations asserting the etiology of genetic diseases.

Authors: Thomas C Rindflesch; Bisharah Libbus; Dimitar Hristovski; Alan R Aronson; Halil Kilicoglu
Journal: AMIA Annu Symp Proc Date: 2003

3. Identification of related gene/protein names based on an HMM of name variations.

Authors: L Yeganova; L Smith; W J Wilbur
Journal: Comput Biol Chem Date: 2004-04 Impact factor: 2.877

4. NLProt: extracting protein names and sequences from papers.

Authors: Sven Mika; Burkhard Rost
Journal: Nucleic Acids Res Date: 2004-07-01 Impact factor: 16.971

5. Toward an automatic method for extracting cancer- and other disease-related point mutations from the biomedical literature.

Authors: Emily Doughty; Attila Kertesz-Farkas; Olivier Bodenreider; Gary Thompson; Asa Adadey; Thomas Peterson; Maricel G Kann
Journal: Bioinformatics Date: 2010-12-07 Impact factor: 6.937

6. Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.

7. Quantitative assessment of dictionary-based protein named entity tagging.

8. The value of parsing as feature generation for gene mention recognition.

9. Biological entity recognition with conditional random fields.

10. Recent progress in automatically extracting information from the pharmacogenomic literature. (Review)