Toward information extraction: identifying protein names from biological papers.

K Fukuda, A Tamura, T Tsunoda, T Takagi.

Abstract

To unravel the mysteries of life phenomena, we must clarify when genes are expressed and how their products interact with one another. Because knowledge of these interactions is massive, continuously updated, and available only in the form of published articles, an intelligent information extraction (IE) system is needed. To extract this information directly from articles, such a system must first identify material names. However, medical and biological documents often include proper nouns newly coined by their authors, and conventional methods based on domain-specific dictionaries cannot detect such unknown words or coinages. In this study, we propose PROPER, a new method for extracting material names that uses surface clues in character strings. It extracts material names from sentences with 94.70% precision and 98.84% recall, regardless of whether the names are already known or newly defined.
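The abstract says only that PROPER relies on surface clues in character strings; it does not list those clues. As a rough illustration of the general idea, and not the authors' algorithm, the sketch below flags candidate material names without any dictionary, using three assumed clues: a token that mixes letters and digits, a token with an uppercase letter after its first character, or an all-caps acronym. The function names `looks_like_name` and `find_candidates` and the clue set itself are hypothetical.

```python
import re

# Hypothetical surface clues (an assumption, not PROPER's actual rule set).
_LETTERS_AND_DIGITS = re.compile(r"(?=.*[A-Za-z])(?=.*\d)")  # e.g. "p53", "IL-2"
_INTERNAL_UPPER = re.compile(r"^.+[A-Z]")                    # e.g. "NFkappaB", "MDM2"
_ACRONYM = re.compile(r"^[A-Z]{2,}$")                        # e.g. "TNF"

_PUNCT = ".,;:()"


def looks_like_name(token: str) -> bool:
    """Return True if the token shows at least one of the assumed surface clues."""
    word = token.strip(_PUNCT)
    if not word:
        return False
    return bool(
        _LETTERS_AND_DIGITS.match(word)
        or _INTERNAL_UPPER.match(word)
        or _ACRONYM.match(word)
    )


def find_candidates(sentence: str) -> list[str]:
    """Return whitespace-separated tokens that look like material names."""
    return [t.strip(_PUNCT) for t in sentence.split() if looks_like_name(t)]


if __name__ == "__main__":
    print(find_candidates("The p53 protein binds MDM2 and IL-2 in vitro."))
    # prints ['p53', 'MDM2', 'IL-2']
```

Because no dictionary is consulted, a newly coined name such as "p53" is caught as readily as a known one. The trade-off is that plain capitalized names like "Ras" are missed and non-material acronyms are flagged, so a practical system would need further context beyond these token-level clues.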

Year:  1998        PMID: 9697224

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  70 in total

1.  Mining molecular binding terminology from biomedical text.

Authors:  T C Rindflesch; L Hunter; A R Aronson
Journal:  Proc AMIA Symp       Date:  1999

2.  Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.

Authors:  Hong Yu; Vasileios Hatzivassiloglou; Carol Friedman; Andrey Rzhetsky; W John Wilbur
Journal:  Proc AMIA Symp       Date:  2002

3.  Discovering protein similarity using natural language processing.

Authors:  Indra N Sarkar; Thomas C Rindflesch
Journal:  Proc AMIA Symp       Date:  2002

4.  A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Authors:  Sergei Egorov; Anton Yuryev; Nikolai Daraselia
Journal:  J Am Med Inform Assoc       Date:  2004-02-05       Impact factor: 4.497

5.  Semantic relations asserting the etiology of genetic diseases.

Authors:  Thomas C Rindflesch; Bisharah Libbus; Dimitar Hristovski; Alan R Aronson; Halil Kilicoglu
Journal:  AMIA Annu Symp Proc       Date:  2003

6.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

7.  NLProt: extracting protein names and sequences from papers.

Authors:  Sven Mika; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

8.  Translating biology: text mining tools that work.

Authors:  K Bretonnel Cohen; Hong Yu; Philip E Bourne; Lynette Hirschman
Journal:  Pac Symp Biocomput       Date:  2008-01-01

9.  A literature search tool for intelligent extraction of disease-associated genes.

Authors:  Jae-Yoon Jung; Todd F DeLuca; Tristan H Nelson; Dennis P Wall
Journal:  J Am Med Inform Assoc       Date:  2013-09-02       Impact factor: 4.497

10.  Textpresso: an ontology-based information retrieval and extraction system for biological literature.

Authors:  Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg
Journal:  PLoS Biol       Date:  2004-09-21       Impact factor: 8.029

