Identification of related gene/protein names based on an HMM of name variations.

L Yeganova, L Smith, W J Wilbur.

Abstract

Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
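The abstract says the mutation (substitution) matrix in a dynamic-programming alignment can change depending on the state of a hidden Markov model. Here is a minimal sketch of that idea, assuming a two-state model with hand-set scores. The paper's model is fully trainable, and its states, matrices and parameters are not given in this record. Everything below is hypothetical and untrained: the state names `M` and `V`, the `START`/`TRANS`/`GAP` values, `sub_score`, `rank_related`, and the length normalization. The sketch runs Viterbi-style decoding over the alignment lattice, and each hidden state supplies its own substitution and gap scores.

```python
import math

# Hypothetical two-state model (not the paper's): in state "M" characters are
# scored with a strict substitution matrix; in state "V" (a variant region)
# a lenient matrix makes spelling differences cheap.
STATES = ("M", "V")

# Start and transition scores as log probabilities (hand-set, untrained).
START = {"M": math.log(0.8), "V": math.log(0.2)}
TRANS = {
    ("M", "M"): math.log(0.9), ("M", "V"): math.log(0.1),
    ("V", "V"): math.log(0.7), ("V", "M"): math.log(0.3),
}

# Per-character gap score in each state (ad hoc values).
GAP = {"M": -3.0, "V": -1.0}


def sub_score(state: str, a: str, b: str) -> float:
    """Substitution score for aligning characters a and b in the given state."""
    if state == "M":
        if a == b:
            return 2.0
        if a.lower() == b.lower():
            return 1.0  # case-only difference
        return -4.0
    # Lenient state: case-insensitive matches rewarded, mismatches cheap.
    return 1.0 if a.lower() == b.lower() else -1.0


def viterbi_align_score(x: str, y: str) -> float:
    """Best global alignment score of x and y over all hidden-state paths."""
    n, m = len(x), len(y)
    if n == 0 and m == 0:
        return 0.0
    neg = float("-inf")
    # best[s][i][j]: best score for x[:i] vs y[:j] whose last step was in state s
    best = {s: [[neg] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            for s in STATES:
                # (predecessor cell, emission score) for substitution and gap moves
                moves = []
                if i > 0 and j > 0:
                    moves.append((i - 1, j - 1, sub_score(s, x[i - 1], y[j - 1])))
                if i > 0:
                    moves.append((i - 1, j, GAP[s]))
                if j > 0:
                    moves.append((i, j - 1, GAP[s]))
                cands = []
                for pi, pj, emit in moves:
                    if pi == 0 and pj == 0:
                        prev = START[s]  # first step: enter state s from the start
                    else:
                        prev = max(best[t][pi][pj] + TRANS[(t, s)] for t in STATES)
                    cands.append(prev + emit)
                best[s][i][j] = max(cands)
    return max(best[s][n][m] for s in STATES)


def rank_related(query: str, names: list[str], top_k: int = 3) -> list[str]:
    """Rank candidate names by length-normalized alignment score to the query."""
    scored = [
        (viterbi_align_score(query, name) / max(len(query), len(name), 1), name)
        for name in names
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:top_k]]
```

With these made-up scores, `rank_related("IL-2", ["TNF-alpha", "IL2", "il-2", "interleukin 2"], top_k=2)` returns `["il-2", "IL2"]`. In a trainable version, the transition and substitution scores would be estimated from pairs of known name variants rather than set by hand.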

Year: 2004; PMID: 15130538; PMCID: PMC5815558; DOI: 10.1016/j.compbiolchem.2003.12.003

Source DB: PubMed; Journal: Comput Biol Chem; ISSN: 1476-9271; Impact factor: 2.877


