
Automatically identifying gene/protein terms in MEDLINE abstracts.

Hong Yu, Vasileios Hatzivassiloglou, Andrey Rzhetsky, W John Wilbur.

Abstract

MOTIVATION: Natural language processing (NLP) techniques are used to extract information automatically from computer-readable literature. In biology, the identification of terms corresponding to biological substances (e.g., genes and proteins) is a necessary step that precedes the application of other NLP systems that extract biological information (e.g., protein-protein interactions, gene regulation events, and biochemical pathways). We have developed GPmarkup (for "gene/protein-full name mark up"), a software system that automatically identifies gene/protein terms (i.e., symbols or full names) in MEDLINE abstracts. As part of the mark-up process, we also automatically generated a knowledge source of paired gene/protein symbols and full names (e.g., LARD for lymphocyte associated receptor of death) from MEDLINE. We found that many of the pairs in our knowledge source do not appear in the current GenBank database. Therefore, our methods may also be used for automatic lexicon generation.
RESULTS: GPmarkup has 73% recall and 93% precision in identifying and marking up gene/protein terms in MEDLINE abstracts.
AVAILABILITY: A random sample of gene/protein symbols and full names and a sample set of marked up abstracts can be viewed at http://www.cpmc.columbia.edu/homepages/yuh9001/GPmarkup/.
CONTACT: hy52@columbia.edu. Voice: 212-939-7028; fax: 212-666-0140.
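
The idea of pairing a gene/protein symbol with its full name (e.g., LARD for lymphocyte associated receptor of death) can be illustrated with a minimal sketch. The abstract does not say how GPmarkup builds its pairs, so the heuristic below is an assumption and not the authors' method: it looks for a parenthesized symbol and checks whether its letters match the initials of the preceding words, skipping a few function words. The names `STOPWORDS`, `match_full_name`, and `find_symbol_pairs` are hypothetical.

```python
import re

# Hypothetical list of function words that may appear inside a full name
# without contributing a letter to the symbol.
STOPWORDS = {"of", "the", "and", "for", "in"}


def match_full_name(symbol, preceding_words):
    """Align the symbol's letters, right to left, with word initials.

    Returns the matched full name, or None if the initials do not line up.
    """
    letters = [c.lower() for c in symbol if c.isalpha()]
    i = len(preceding_words) - 1
    for letter in reversed(letters):
        # Skip function words that do not supply the expected initial.
        while (
            i >= 0
            and preceding_words[i].lower() in STOPWORDS
            and preceding_words[i][0].lower() != letter
        ):
            i -= 1
        if i < 0 or preceding_words[i][0].lower() != letter:
            return None
        i -= 1
    return " ".join(preceding_words[i + 1:])


def find_symbol_pairs(text):
    """Return (symbol, full_name) pairs for 'full name (SYMBOL)' patterns."""
    pairs = []
    for match in re.finditer(r"\(([A-Z][A-Za-z0-9-]{1,9})\)", text):
        symbol = match.group(1)
        # Words that precede the opening parenthesis.
        words = re.findall(r"[A-Za-z0-9]+", text[:match.start()])
        full_name = match_full_name(symbol, words)
        if full_name is not None:
            pairs.append((symbol, full_name))
    return pairs


if __name__ == "__main__":
    sentence = "The lymphocyte associated receptor of death (LARD) was cloned."
    print(find_symbol_pairs(sentence))
```

A heuristic this simple would miss symbols that are not built from word initials, which is one reason a real system would likely need additional rules or statistical evidence.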

Year:  2002        PMID: 12968781     DOI: 10.1016/s1532-0464(03)00032-7

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  5 in total

1.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

2.  Enhancing acronym/abbreviation knowledge bases with semantic information.

Authors:  Manabu Torii; Hongfang Liu
Journal:  AMIA Annu Symp Proc       Date:  2007-10-11

3.  Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation.

Authors:  Tapio Pahikkala; Filip Ginter; Jorma Boberg; Jouni Järvinen; Tapio Salakoski
Journal:  BMC Bioinformatics       Date:  2005-06-22       Impact factor: 3.169

4.  Argument-predicate distance as a filter for enhancing precision in extracting predications on the genetic etiology of disease.

Authors:  Marco Masseroli; Halil Kilicoglu; François-Michel Lang; Thomas C Rindflesch
Journal:  BMC Bioinformatics       Date:  2006-06-08       Impact factor: 3.169

5.  A comparison study on algorithms of detecting long forms for short forms in biomedical text.

Authors:  Manabu Torii; Zhang-zhi Hu; Min Song; Cathy H Wu; Hongfang Liu
Journal:  BMC Bioinformatics       Date:  2007-11-27       Impact factor: 3.169

