
Learning string similarity measures for gene/protein name dictionary look-up using logistic regression.

Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii, Sophia Ananiadou.

Abstract

MOTIVATION: One of the bottlenecks of biomedical data integration is term variation. Exact string matching often fails to associate a name with its biological concept, i.e. the ID or accession number in the database, because of seemingly small differences between names. Soft string matching can potentially find the relevant ID by considering the similarity between names. However, the accuracy of soft matching depends heavily on the similarity measure employed.
RESULTS: We used logistic regression to learn a string similarity measure from a dictionary. Experiments using several large-scale gene/protein name dictionaries showed that the logistic regression-based similarity measure outperforms existing similarity measures in dictionary look-up tasks.
AVAILABILITY: A dictionary look-up system using the similarity measures described in this article is available at http://text0.mib.man.ac.uk/software/mldic/.
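
As a rough illustration of the approach summarized under RESULTS, the sketch below trains a logistic regression model on name pairs drawn from a toy dictionary and uses the predicted probability as a similarity score for look-up. Everything in it is an assumption made for illustration: the toy entries and IDs, the three features (character-bigram Jaccard overlap, relative length difference, shared first character), the use of synonym pairs as positives and cross-entry pairs as negatives, and the plain gradient-descent trainer. It is not the feature set or training setup of the article.

```python
import math
from itertools import combinations

# Toy dictionary (hypothetical IDs and entries, for illustration only).
TOY_DICT = {
    "ID1": ["NF-kappa B", "NF kappaB", "NFkappaB"],
    "ID2": ["interleukin 2", "interleukin-2"],
    "ID3": ["tumor necrosis factor", "tumour necrosis factor"],
}
NAME_TO_ID = {name: id_ for id_, names in TOY_DICT.items() for name in names}


def char_bigrams(s):
    """Set of character bigrams of the lowercased string, with boundary marks."""
    padded = f"^{s.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def features(a, b):
    """Assumed feature vector for a name pair; the leading 1.0 is the bias term."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    jaccard = len(ga & gb) / len(ga | gb)
    length_gap = abs(len(a) - len(b)) / max(len(a), len(b), 1)
    same_initial = 1.0 if a[:1].lower() == b[:1].lower() else 0.0
    return [1.0, jaccard, length_gap, same_initial]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(pairs, labels, epochs=2000, lr=0.5):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    w = [0.0] * len(features(*pairs[0]))
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for (a, b), y in zip(pairs, labels):
            x = features(a, b)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for j, xj in enumerate(x):
                grad[j] += (p - y) * xj
        w = [wi - lr * g / len(pairs) for wi, g in zip(w, grad)]
    return w


def similarity(w, a, b):
    """Learned similarity: predicted probability that a and b share an ID."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))))


def lookup(w, query, name_to_id):
    """Return (ID, matched name) for the dictionary name most similar to query."""
    best = max(name_to_id, key=lambda name: similarity(w, query, name))
    return name_to_id[best], best


# Training pairs come from the dictionary itself:
# label 1 if both names belong to the same entry, else 0.
names = list(NAME_TO_ID)
pairs = list(combinations(names, 2))
labels = [1.0 if NAME_TO_ID[a] == NAME_TO_ID[b] else 0.0 for a, b in pairs]
weights = train(pairs, labels)
```

Calling `lookup(weights, "NF-kappaB", NAME_TO_ID)` scores the query against every name in the toy dictionary and returns the ID of the best match. A full scan like this is only practical for a small dictionary; a large-scale dictionary would likely need some form of candidate filtering first.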

Year:  2007        PMID: 17698493     DOI: 10.1093/bioinformatics/btm393

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  22 in total

1.  BioTagger-GM: a gene/protein name recognition system.

Authors:  Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal:  J Am Med Inform Assoc       Date:  2008-12-11       Impact factor: 4.497

2.  Moara: a Java library for extracting and normalizing gene and protein mentions.

Authors:  Mariana L Neves; José-María Carazo; Alberto Pascual-Montano
Journal:  BMC Bioinformatics       Date:  2010-03-26       Impact factor: 3.169

3.  Hybrid Semantic Analysis for Mapping Adverse Drug Reaction Mentions in Tweets to Medical Terminology.

Authors:  Ehsan Emadzadeh; Abeed Sarker; Azadeh Nikfarjam; Graciela Gonzalez
Journal:  AMIA Annu Symp Proc       Date:  2018-04-16

4.  TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

Authors:  Robert Leaman; Zhiyong Lu
Journal:  Bioinformatics       Date:  2016-06-09       Impact factor: 6.937

5.  Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text. (Review)

Authors:  G Gonzalez-Hernandez; A Sarker; K O'Connor; G Savova
Journal:  Yearb Med Inform       Date:  2017-09-11

6.  BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature.

Authors:  Cheng-Ju Kuo; Maurice H T Ling; Kuan-Ting Lin; Chun-Nan Hsu
Journal:  BMC Bioinformatics       Date:  2009-12-03       Impact factor: 3.169

7.  Integration of metabolic databases for the reconstruction of genome-scale metabolic networks.

Authors:  Karin Radrich; Yoshimasa Tsuruoka; Paul Dobson; Albert Gevorgyan; Neil Swainston; Gino Baart; Jean-Marc Schwartz
Journal:  BMC Syst Biol       Date:  2010-08-16

8.  Ranking relations between diseases, drugs and genes for a curation task.

Authors:  Simon Clematide; Fabio Rinaldi
Journal:  J Biomed Semantics       Date:  2012-10-05

9.  What the papers say: text mining for genomics and systems biology. (Review)

Authors:  Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

10.  A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text.

Authors:  Makoto Miwa; Tomoko Ohta; Rafal Rak; Andrew Rowley; Douglas B Kell; Sampo Pyysalo; Sophia Ananiadou
Journal:  Bioinformatics       Date:  2013-07-01       Impact factor: 6.937
