Toward information extraction: identifying protein names from biological papers.

K Fukuda, A Tamura, T Tsunoda, T Takagi.

Abstract

To understand the phenomena of life, we must clarify when genes are expressed and how their products interact with each other. Because the continuously updated knowledge on these interactions is massive and is available only in the form of published articles, an intelligent information extraction (IE) system is needed. To extract this information directly from articles, the system must first identify the material names. However, medical and biological documents often include proper nouns newly coined by the authors, and conventional methods based on domain-specific dictionaries cannot detect such unknown words or coinages. In this study, we propose a new method of extracting material names, PROPER, that uses surface clues in character strings. It extracts material names from sentences with 94.70% precision and 98.84% recall, regardless of whether they are already known or newly defined.

Substances: Proteins

Year: 1998 PMID: 9697224

Source DB: PubMed Journal: Pac Symp Biocomput ISSN: 2335-6928

Cited

70 in total

1. Mining molecular binding terminology from biomedical text.

2. Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.

3. Discovering protein similarity using natural language processing.

4. A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

5. Semantic relations asserting the etiology of genetic diseases.

6. Identification of related gene/protein names based on an HMM of name variations.

7. NLProt: extracting protein names and sequences from papers.

8. TRANSLATING BIOLOGY: TEXT MINING TOOLS THAT WORK.

9. A literature search tool for intelligent extraction of disease-associated genes.

10. Textpresso: an ontology-based information retrieval and extraction system for biological literature.