Text mining and protein annotations: the construction and use of protein description sentences.

Martin Krallinger, Rainer Malik, Alfonso Valencia.

Abstract

Existing biological knowledge stored as structured database records has been extracted manually by database curators analyzing the scientific literature. Most of this information was derived from sentences which describe biologically relevant aspects of genes and gene products. We introduce the Protein description sentence (Prodisen) corpus, a useful resource for the automatic identification and construction of text-based protein and gene description records using information extraction and text classification techniques. Basic guidelines and criteria relevant for the construction of a text corpus of functional descriptions of genes and proteins are proposed. The steps used for the corpus construction and its features are presented. Moreover, some of the potential applications of the Prodisen corpus for biomedical text mining purposes are explored and the obtained results are presented.

Substances: Proteins

Year: 2006 PMID: 17503385

Source DB: PubMed Journal: Genome Inform ISSN: 0919-9454

Cited

5 in total

1. Predicting protein functions by applying predicate logic to biomedical literature.

2. New challenges for text mining: mapping between text and manually curated pathways.

3. Assessment of disease named entity recognition on a corpus of annotated sentences.

4. Linking genes to literature: text mining, information extraction, and retrieval applications for biology. (Review)

5. Overview of the protein-protein interaction annotation extraction task of BioCreative II.