Protein names and how to find them.

Kristofer Franzén, Gunnar Eriksson, Fredrik Olsson, Lars Asker, Per Lidén, Joakim Cöster.

Abstract

A prerequisite for all higher-level information extraction tasks is the identification of unknown names in text. Today, when large corpora can consist of billions of words, it is of utmost importance to develop accurate techniques for the automatic detection, extraction and categorization of named entities in these corpora. Although named entity recognition might be regarded as a solved problem in some domains, it still poses a significant challenge in others. In this work we focus on one of the more difficult tasks, the identification of protein names in text. This task presents several interesting difficulties because of the named entities' variant structural characteristics, their sometimes unclear status as names, the lack of common standards and fixed nomenclatures, and the specifics of the texts in the molecular biology domain in which they appear. We describe how we approached these and other difficulties in the implementation of Yapex, a system for the automatic identification of protein names in text. We also evaluate Yapex under four different notions of correctness and compare its performance to that of another publicly available system for protein name recognition.

Year:  2002        PMID: 12460631     DOI: 10.1016/s1386-5056(02)00052-7

Source DB:  PubMed          Journal:  Int J Med Inform        ISSN: 1386-5056            Impact factor:   4.046


  26 in total

1.  A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Authors:  Sergei Egorov; Anton Yuryev; Nikolai Daraselia
Journal:  J Am Med Inform Assoc       Date:  2004-02-05       Impact factor: 4.497

2.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

3.  NLProt: extracting protein names and sequences from papers.

Authors:  Sven Mika; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

4.  High-recall protein entity recognition using a dictionary.

Authors:  Zhenzhen Kou; William W Cohen; Robert F Murphy
Journal:  Bioinformatics       Date:  2005-06       Impact factor: 6.937

5.  Empirical data on corpus design and usage in biomedical natural language processing.

Authors:  K Bretonnel Cohen; Lynne Fox; Philip V Ogren; Lawrence Hunter
Journal:  AMIA Annu Symp Proc       Date:  2005

6.  Quantitative assessment of dictionary-based protein named entity tagging.

Authors:  Hongfang Liu; Zhang-Zhi Hu; Manabu Torii; Cathy Wu; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2006-06-23       Impact factor: 4.497

7.  Gene/protein name recognition based on support vector machine using dictionary as features.

Authors:  Tomohiro Mitsumori; Sevrani Fation; Masaki Murata; Kouichi Doi; Hirohumi Doi
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

8.  BioTagger-GM: a gene/protein name recognition system.

Authors:  Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2008-12-11       Impact factor: 4.497

9.  BioDEAL: community generation of biological annotations.

Authors:  Paul Breimyer; Nathan Green; Vinay Kumar; Nagiza F Samatova
Journal:  BMC Med Inform Decis Mak       Date:  2009-11-03       Impact factor: 2.796

10.  A realistic assessment of methods for extracting gene/protein interactions from free text.

Authors:  Renata Kabiljo; Andrew B Clegg; Adrian J Shepherd
Journal:  BMC Bioinformatics       Date:  2009-07-28       Impact factor: 3.169
