Disambiguating proteins, genes, and RNA in text: a machine learning approach.

V Hatzivassiloglou, P A Duboué, A Rzhetsky.

Abstract

We present an automated system for assigning protein, gene, or mRNA class labels to biological terms in free text. Three machine learning algorithms and several extended ways for defining contextual features for disambiguation are examined, and a fully unsupervised manner for obtaining training examples is proposed. We train and evaluate our system over a collection of 9 million words of molecular biology journal articles, obtaining accuracy rates up to 85%.

Year:  2001        PMID: 11472998     DOI: 10.1093/bioinformatics/17.suppl_1.s97

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


27 in total

1.  Mapping abbreviations to full forms in biomedical articles.

Authors:  Hong Yu; George Hripcsak; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2002 May-Jun       Impact factor: 4.497

2.  Tailoring vocabularies for NLP in sub-domains: a method to detect unused word sense.

Authors:  Rosa L Figueroa; Qing Zeng-Treitler; Sergey Goryachev; Eduardo P Wiechmann
Journal:  AMIA Annu Symp Proc       Date:  2009-11-14

3.  Biomedical language processing: what's beyond PubMed? (Review)

Authors:  Lawrence Hunter; K Bretonnel Cohen
Journal:  Mol Cell       Date:  2006-03-03       Impact factor: 17.970

4.  Quantitative assessment of dictionary-based protein named entity tagging.

Authors:  Hongfang Liu; Zhang-Zhi Hu; Manabu Torii; Cathy Wu; Carol Friedman
Journal:  J Am Med Inform Assoc       Date:  2006-06-23       Impact factor: 4.497

5.  Mining the pharmacogenomics literature--a survey of the state of the art.

Authors:  Udo Hahn; K Bretonnel Cohen; Yael Garten; Nigam H Shah
Journal:  Brief Bioinform       Date:  2012-07       Impact factor: 11.622

6.  Investigating heterogeneous protein annotations toward cross-corpora utilization.

Authors:  Yue Wang; Jin-Dong Kim; Rune Saetre; Sampo Pyysalo; Jun'ichi Tsujii
Journal:  BMC Bioinformatics       Date:  2009-12-09       Impact factor: 3.169

7.  BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature.

Authors:  Cheng-Ju Kuo; Maurice H T Ling; Kuan-Ting Lin; Chun-Nan Hsu
Journal:  BMC Bioinformatics       Date:  2009-12-03       Impact factor: 3.169

8.  Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection.

Authors:  Marc Weeber; Bob J Schijvenaars; Erik M Van Mulligen; Barend Mons; Rob Jelier; Christian C Van Der Eijk; Jan A Kors
Journal:  AMIA Annu Symp Proc       Date:  2003

9.  Automatic extraction of mutations from Medline and cross-validation with OMIM.

Authors:  Dietrich Rebholz-Schuhmann; Stephane Marcel; Sylvie Albert; Ralf Tolle; Georg Casari; Harald Kirsch
Journal:  Nucleic Acids Res       Date:  2004-01-02       Impact factor: 16.971

10.  Fast max-margin clustering for unsupervised word sense disambiguation in biomedical texts.

Authors:  Weisi Duan; Min Song; Alexander Yates
Journal:  BMC Bioinformatics       Date:  2009-03-19       Impact factor: 3.169
