Playing biology's name game: identifying protein names in scientific text.

Daniel Hanisch, Juliane Fluck, Heinz-Theodor Mevissen, Ralf Zimmer.

Abstract

A growing body of work is devoted to the extraction of protein or gene interaction information from the scientific literature. Yet, the basis for most extraction algorithms, i.e. the specific and sensitive recognition of protein and gene names and their numerous synonyms, has not been adequately addressed. Here we describe the construction of a comprehensive general purpose name dictionary and an accompanying automatic curation procedure based on a simple token model of protein names. We designed an efficient search algorithm to analyze all abstracts in MEDLINE in a reasonable amount of time on standard computers. The parameters of our method are optimized using machine learning techniques. Used in conjunction, these ingredients lead to good search performance. A supplementary web page is available at http://cartan.gmd.de/ProMiner/.

Year:  2003        PMID: 12603045

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


  19 in total

1.  Gene indexing: characterization and analysis of NLM's GeneRIFs.

Authors:  Joyce A Mitchell; Alan R Aronson; James G Mork; Lillian C Folk; Susanne M Humphrey; Janice M Ward
Journal:  AMIA Annu Symp Proc       Date:  2003

2.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

3.  NLProt: extracting protein names and sequences from papers.

Authors:  Sven Mika; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

4.  Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.

Authors:  A M Cohen; W R Hersh; C Dubay; K Spackman
Journal:  BMC Bioinformatics       Date:  2005-04-22       Impact factor: 3.169

5.  Quantitative assessment of dictionary-based protein named entity tagging.

Authors:  Hongfang Liu; Zhang-Zhi Hu; Manabu Torii; Cathy Wu; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2006-06-23       Impact factor: 4.497

6.  NERBio: using selected word conjunctions, term normalization, and global patterns to improve biomedical named entity recognition.

Authors:  Richard Tzong-Han Tsai; Cheng-Lung Sung; Hong-Jie Dai; Hsieh-Chuan Hung; Ting-Yi Sung; Wen-Lian Hsu
Journal:  BMC Bioinformatics       Date:  2006-12-18       Impact factor: 3.169

7.  BioTagger-GM: a gene/protein name recognition system.

Authors:  Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2008-12-11       Impact factor: 4.497

8.  Recent progress in automatically extracting information from the pharmacogenomic literature (Review).

Authors:  Yael Garten; Adrien Coulet; Russ B Altman
Journal:  Pharmacogenomics       Date:  2010-10       Impact factor: 2.533

9.  Textpresso: an ontology-based information retrieval and extraction system for biological literature.

Authors:  Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg
Journal:  PLoS Biol       Date:  2004-09-21       Impact factor: 8.029

10.  BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature.

Authors:  Cheng-Ju Kuo; Maurice H T Ling; Kuan-Ting Lin; Chun-Nan Hsu
Journal:  BMC Bioinformatics       Date:  2009-12-03       Impact factor: 3.169
