Suregene, a scalable system for automated term disambiguation of gene and protein names.

Raf M Podowski, John G Cleary, Nicholas T Goncharoff, Gregory Amoutzias, William S Hayes.

Abstract

Researchers, hindered by a lack of standard gene- and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system is described that automatically assigns gene names in previously unseen MEDLINE abstracts to their LocusLink IDs (LLIDs). The system is based on supervised learning and builds one model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good-quality models (F-measure >0.7; nearly 60% of these exceeded 0.9) and 13,202 did not, mainly because too few document references were known for them. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that high-quality gene disambiguation is achievable with scalable automated techniques.
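The quality cut-off in the abstract can be illustrated with a minimal sketch. It assumes the F-measure is the balanced form (the harmonic mean of precision and recall), which the abstract does not state explicitly. The function names and the per-gene counts below are hypothetical and are not taken from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Assumption: the abstract's "F-measure" is the balanced (F1) variant.
    """
    if tp == 0:
        # No true positives means precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


GOOD_THRESHOLD = 0.7  # cut-off quoted in the abstract for a good-quality model


def is_good_model(tp: int, fp: int, fn: int,
                  threshold: float = GOOD_THRESHOLD) -> bool:
    """Return True when a per-gene model's F-measure exceeds the threshold."""
    return f_measure(tp, fp, fn) > threshold


# Hypothetical (tp, fp, fn) counts for two per-gene models.
counts = {"gene_a": (90, 5, 5), "gene_b": (3, 4, 6)}
good = {name: is_good_model(*c) for name, c in counts.items()}
```

With these made-up counts, `gene_a` would pass the threshold and `gene_b` would not, mirroring the split between genes with enough known document references and those without.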

Entities: Species

Substances: Proteins

Year: 2005 PMID: 16108092 DOI: 10.1142/s0219720005001223

Source DB: PubMed Journal: J Bioinform Comput Biol ISSN: 0219-7200 Impact factor: 1.122

Cited

4 in total

1. A literature search tool for intelligent extraction of disease-associated genes.

2. Information discovery on electronic health records using authority flow techniques.

3. Retrieval with gene queries.

4. Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data.