
Suregene, a scalable system for automated term disambiguation of gene and protein names.

Raf M Podowski, John G Cleary, Nicholas T Goncharoff, Gregory Amoutzias, William S Hayes.

Abstract

Researchers, hindered by a lack of standard gene and protein-naming conventions, endure long, sometimes fruitless, literature searches. A system that is able to automatically assign gene names to their LocusLink ID (LLID) in previously unseen MEDLINE abstracts is described. The system is based on supervised learning and builds a model for each LLID. The training sets for all LLIDs are extracted automatically from MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good quality models (F-measure >0.7, nearly 60% of which were >0.9) and 13,202 did not, mainly due to insufficient numbers of known document references. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that it is possible to achieve high quality gene disambiguation using scalable automated techniques.
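
The quality criterion in the abstract (F-measure above 0.7 counts as a good model) can be illustrated with a minimal sketch. It assumes the F-measure is the balanced F1 score, i.e. the harmonic mean of precision and recall, which the abstract does not state explicitly. The function names and the LLID keys in the usage note are hypothetical and do not come from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1) from true-positive, false-positive
    and false-negative counts. Assumption: the abstract's F-measure
    is the balanced variant."""
    if tp == 0:
        return 0.0  # no correct assignments, so precision and recall are 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def split_by_quality(
    scores: dict[str, float], threshold: float = 0.7
) -> tuple[dict[str, float], dict[str, float]]:
    """Partition per-LLID F-measures into models above the threshold
    and models at or below it (0.7 is the cut-off quoted in the abstract)."""
    good = {llid: f for llid, f in scores.items() if f > threshold}
    poor = {llid: f for llid, f in scores.items() if f <= threshold}
    return good, poor
```

For example, `split_by_quality({"LLID_A": f_measure(8, 2, 2), "LLID_B": f_measure(1, 4, 5)})` would place the hypothetical `LLID_A` among the good models and `LLID_B` among the rest. The two counts reported in the abstract are consistent with such a partition: 7,344 + 13,202 = 20,546.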

Year:  2005        PMID: 16108092     DOI: 10.1142/s0219720005001223

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  4 in total

1.  A literature search tool for intelligent extraction of disease-associated genes.

Authors:  Jae-Yoon Jung; Todd F DeLuca; Tristan H Nelson; Dennis P Wall
Journal:  J Am Med Inform Assoc       Date:  2013-09-02       Impact factor: 4.497

2.  Information discovery on electronic health records using authority flow techniques.

Authors:  Vagelis Hristidis; Ramakrishna R Varadarajan; Paul Biondich; Michael Weiner
Journal:  BMC Med Inform Decis Mak       Date:  2010-10-22       Impact factor: 2.796

3.  Retrieval with gene queries.

Authors:  Aditya K Sehgal; Padmini Srinivasan
Journal:  BMC Bioinformatics       Date:  2006-04-21       Impact factor: 3.169

4.  Evading the annotation bottleneck: using sequence similarity to search non-sequence gene data.

Authors:  Michael J Gilchrist; Mikkel B Christensen; Richard Harland; Nicolas Pollet; James C Smith; Naoto Ueno; Nancy Papalopulu
Journal:  BMC Bioinformatics       Date:  2008-10-17       Impact factor: 3.169
