Identification of related gene/protein names based on an HMM of name variations.

Abstract

Gene and protein names follow few, if any, true naming conventions and are subject to great variation in different occurrences of the same name. This gives rise to two important problems in natural language processing. First, can one locate the names of genes or proteins in free text, and second, can one determine when two names denote the same gene or protein? The first of these problems is a special case of the problem of named entity recognition, while the second is a special case of the problem of automatic term recognition (ATR). We study the second problem, that of gene or protein name variation. Here we describe a system which, given a query gene or protein name, identifies related gene or protein names in a large list. The system is based on a dynamic programming algorithm for sequence alignment in which the mutation matrix is allowed to vary under the control of a fully trainable hidden Markov model.
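The alignment step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' system: it is a plain global (Needleman-Wunsch style) dynamic programming alignment with a single fixed scoring function, whereas the abstract describes a mutation matrix that varies under the control of a trained hidden Markov model. The names `substitution_score`, `GAP`, `alignment_score`, and `rank_related`, and all score values, are assumptions chosen for illustration.

```python
# Hypothetical sketch: fixed-score global alignment of two names.
# The paper's HMM-controlled, trainable mutation matrix is NOT implemented here.

GAP = -1  # assumed flat penalty for inserting or deleting one character


def substitution_score(a: str, b: str) -> int:
    """Hand-picked scores (an assumption, not trained values)."""
    if a == b:
        return 2
    if a.lower() == b.lower():
        return 1  # differs only by case, e.g. 'i' vs 'I'
    if a in "-_ /" and b in "-_ /":
        return 1  # treat common separators as near-equivalent
    return -1


def alignment_score(x: str, y: str) -> int:
    """Best global alignment score between x and y, computed row by row."""
    prev = [j * GAP for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * GAP] + [0] * len(y)
        for j in range(1, len(y) + 1):
            curr[j] = max(
                prev[j - 1] + substitution_score(x[i - 1], y[j - 1]),  # align x[i-1] with y[j-1]
                prev[j] + GAP,      # x[i-1] aligned to a gap
                curr[j - 1] + GAP,  # y[j-1] aligned to a gap
            )
        prev = curr
    return prev[-1]


def rank_related(query: str, names: list[str]) -> list[str]:
    """Order candidate names from most to least similar to the query."""
    return sorted(names, key=lambda n: alignment_score(query, n), reverse=True)
```

Calling `rank_related("IL-2", ["TNF", "IL 2", "il-2"])` would place the separator and case variants ahead of the unrelated name. In the system the abstract describes, the substitution scores would presumably depend on the current HMM state rather than being constant as they are here.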

Substances:
Proteins

Year: 2004 PMID: 15130538 PMCID: PMC5815558 DOI: 10.1016/j.compbiolchem.2003.12.003

Source DB: PubMed Journal: Comput Biol Chem ISSN: 1476-9271 Impact factor: 2.877

26 in total

1. RefSeq and LocusLink: NCBI gene-centered resources.

2. Supporting the classification of pathology reports: comparing two information retrieval methods.

3. Getting to the (c)ore of knowledge: mining biomedical literature.

4. Terminology-driven mining of biomedical literature.

5. GenBank.

6. Automatically identifying gene/protein terms in MEDLINE abstracts.

7. Tagging gene and protein names in biomedical text.

8. Hidden Markov models and optimized sequence alignments.

9. Toward information extraction: identifying protein names from biological papers.

10. An improved algorithm for matching biological sequences.

1. CoPub Mapper: mining MEDLINE based on search term co-publication.

2. Normalizing biomedical terms by minimizing ambiguity and variability.

3. Unregistered biological words recognition by Q-learning with transfer learning.