A probabilistic model for identifying protein names and their name boundaries.

Kazuhiro Seki, Javed Mostafa.

Abstract

This paper proposes a method for identifying protein names in biomedical texts with an emphasis on detecting protein name boundaries. We use a probabilistic model which exploits several surface clues characterizing protein names and incorporates word classes for generalization. In contrast to previously proposed methods, our approach does not rely on natural language processing tools such as part-of-speech taggers and syntactic parsers, so as to reduce processing overhead and the potential number of probabilistic parameters to be estimated. A notion of certainty is also proposed to improve precision for identification. We implemented a protein name identification system based on our proposed method, and evaluated the system on real-world biomedical texts in conjunction with the previous work. The results showed that overall our system performs comparably to the state-of-the-art protein name identification system and that higher performance is achieved for compound names. In addition, it is demonstrated that our system can further improve precision by restricting the system output to those names with high certainties.

Year:  2003        PMID: 16452800

Source DB:  PubMed          Journal:  Proc IEEE Comput Soc Bioinform Conf        ISSN: 1555-3930


  5 in total

1.  A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Authors:  Sergei Egorov; Anton Yuryev; Nikolai Daraselia
Journal:  J Am Med Inform Assoc       Date:  2004-02-05       Impact factor: 4.497

2.  A Deep Language Model for Symptom Extraction From Clinical Text and its Application to Extract COVID-19 Symptoms From Social Media.

Authors:  Xiao Luo; Priyanka Gandhi; Susan Storey; Kun Huang
Journal:  IEEE J Biomed Health Inform       Date:  2022-04-14       Impact factor: 7.021

3.  Various criteria in the evaluation of biomedical named entity recognition.

Authors:  Richard Tzong-Han Tsai; Shih-Hung Wu; Wen-Chi Chou; Yu-Chun Lin; Ding He; Jieh Hsiang; Ting-Yi Sung; Wen-Lian Hsu
Journal:  BMC Bioinformatics       Date:  2006-02-24       Impact factor: 3.169

4.  Systematic feature evaluation for gene name recognition.

Authors:  Jörg Hakenberg; Steffen Bickel; Conrad Plake; Ulf Brefeld; Hagen Zahn; Lukas Faulstich; Ulf Leser; Tobias Scheffer
Journal:  BMC Bioinformatics       Date:  2005-05-24       Impact factor: 3.169

5.  Retrieval with gene queries.

Authors:  Aditya K Sehgal; Padmini Srinivasan
Journal:  BMC Bioinformatics       Date:  2006-04-21       Impact factor: 3.169
