
AZuRE, a scalable system for automated term disambiguation of gene and protein names.

Raf M Podowski, John G Cleary, Nicholas T Goncharoff, Gregory Amoutzias, William S Hayes.

Abstract

Researchers, hindered by a lack of standard gene and protein naming conventions, endure long and sometimes fruitless literature searches. A system is described that automatically assigns gene names in previously unseen MEDLINE abstracts to their LocusLink IDs (LLIDs). The system is based on supervised learning and builds one model per LLID. The training set for each LLID is extracted automatically from the MEDLINE references in the LocusLink and SwissProt databases. Performance was validated for all 20,546 human genes with LLIDs. Of these, 7,344 produced good-quality models (F-measure > 0.7, nearly 60% of which exceeded 0.9) and 13,202 did not, mainly because too few documents were known to reference them. A hand validation of MEDLINE documents for a set of 66 genes agreed well with the system's internal accuracy assessment. It is concluded that high-quality gene disambiguation can be achieved with scalable, automated techniques.
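The abstract does not say which learning algorithm AZuRE trains for each LLID. As a hedged illustration only, the sketch below assumes a multinomial naive Bayes classifier (a swapped-in technique, not necessarily the authors' method) trained on abstracts linked to one LLID (positives) versus other abstracts (negatives). It then scores held-out predictions with the F-measure and applies the 0.7 cutoff quoted above to flag a "good quality" model. All names here (`GeneModel`, `f_measure`, `is_good_model`, `tokenize`) are hypothetical.

```python
import math
import re
from collections import Counter

GOOD_MODEL_THRESHOLD = 0.7  # F-measure cutoff quoted in the abstract


def tokenize(text: str) -> list[str]:
    """Lowercase bag-of-words tokens (a simplifying assumption)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class GeneModel:
    """Hypothetical per-LLID binary classifier: multinomial naive Bayes with
    add-one smoothing. True means 'this abstract refers to this LLID'.
    Requires at least one positive and one negative training document."""

    def __init__(self, positives: list[str], negatives: list[str]):
        self.counts = {True: Counter(), False: Counter()}
        for label, docs in ((True, positives), (False, negatives)):
            for doc in docs:
                self.counts[label].update(tokenize(doc))
        total = len(positives) + len(negatives)
        self.log_prior = {True: math.log(len(positives) / total),
                          False: math.log(len(negatives) / total)}
        self.vocab_size = len(set(self.counts[True]) | set(self.counts[False]))
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}

    def log_score(self, text: str, label: bool) -> float:
        """Log prior plus smoothed log likelihood of each token under `label`."""
        score = self.log_prior[label]
        denom = self.totals[label] + self.vocab_size
        for word in tokenize(text):
            score += math.log((self.counts[label][word] + 1) / denom)
        return score

    def predict(self, text: str) -> bool:
        return self.log_score(text, True) > self.log_score(text, False)


def f_measure(predicted: list[bool], actual: list[bool]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def is_good_model(f: float) -> bool:
    return f > GOOD_MODEL_THRESHOLD
```

In use, one `GeneModel` would be built per LLID from its automatically extracted references, and `is_good_model(f_measure(...))` on held-out abstracts would decide whether that gene's model is kept. A gene with very few known references yields a tiny positive set, which is consistent with the abstract's explanation for the 13,202 genes that did not reach the cutoff.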

Year:  2004        PMID: 16448034     DOI: 10.1109/csb.2004.1332454

Source DB:  PubMed          Journal:  Proc IEEE Comput Syst Bioinform Conf        ISSN: 1551-7497


  7 in total

1.  Identifying overrepresented concepts in gene lists from literature: a statistical approach based on Poisson mixture model.

Authors:  Xin He; Moushumi Sen Sarma; Xu Ling; Brant Chee; Chengxiang Zhai; Bruce Schatz
Journal:  BMC Bioinformatics       Date:  2010-05-20       Impact factor: 3.169

2.  Identifying the status of genetic lesions in cancer clinical trial documents using machine learning.

Authors:  Yonghui Wu; Mia A Levy; Christine M Micheel; Paul Yeh; Buzhou Tang; Michael J Cantrell; Stacy M Cooreman; Hua Xu
Journal:  BMC Genomics       Date:  2012-12-17       Impact factor: 3.969

3.  Disclosing ambiguous gene aliases by automatic literature profiling.

Authors:  Roney S Coimbra; Dana E Vanderwall; Guilherme C Oliveira
Journal:  BMC Genomics       Date:  2010-12-22       Impact factor: 3.969

4.  Contextual weighting for Support Vector Machines in literature mining: an application to gene versus protein name disambiguation.

Authors:  Tapio Pahikkala; Filip Ginter; Jorma Boberg; Jouni Järvinen; Tapio Salakoski
Journal:  BMC Bioinformatics       Date:  2005-06-22       Impact factor: 3.169

5.  Thesaurus-based disambiguation of gene symbols.

Authors:  Bob J A Schijvenaars; Barend Mons; Marc Weeber; Martijn J Schuemie; Erik M van Mulligen; Hester M Wain; Jan A Kors
Journal:  BMC Bioinformatics       Date:  2005-06-16       Impact factor: 3.169

6.  Machine learning and word sense disambiguation in the biomedical domain: design and evaluation issues.

Authors:  Hua Xu; Marianthi Markatou; Rositsa Dimova; Hongfang Liu; Carol Friedman
Journal:  BMC Bioinformatics       Date:  2006-07-05       Impact factor: 3.169

7.  The strength of co-authorship in gene name disambiguation.

Authors:  Richárd Farkas
Journal:  BMC Bioinformatics       Date:  2008-01-29       Impact factor: 3.169
