Learning string similarity measures for gene/protein name dictionary look-up using logistic regression.

Yoshimasa Tsuruoka, John McNaught, Jun'ichi Tsujii, Sophia Ananiadou.

Abstract

MOTIVATION: One of the bottlenecks of biomedical data integration is variation of terms. Exact string matching often fails to associate a name with its biological concept, i.e. its ID or accession number in the database, because of seemingly small differences between names. Soft string matching can potentially find the relevant ID by considering the similarity between names. However, the accuracy of soft matching depends heavily on the similarity measure employed.
RESULTS: We used logistic regression for learning a string similarity measure from a dictionary. Experiments using several large-scale gene/protein name dictionaries showed that the logistic regression-based similarity measure outperforms existing similarity measures in dictionary look-up tasks.
AVAILABILITY: A dictionary look-up system using the similarity measures described in this article is available at http://text0.mib.man.ac.uk/software/mldic/.
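
The idea of learning a similarity measure from a dictionary can be illustrated with a minimal sketch. It is not the authors' system: the abstract does not say which features or training procedure were used, so the three features below (character-bigram overlap, relative length difference, digit agreement), the plain gradient-descent trainer, the toy dictionary, and its IDs are all assumptions made for illustration. Name pairs that share an ID serve as positive examples, pairs with different IDs as negative examples, and the fitted logistic model's output probability is used as the similarity score.

```python
import math
from itertools import combinations

# Hypothetical toy dictionary: concept ID -> known names (the IDs are made up).
TOY_DICTIONARY = {
    "ID:1": ["interleukin 2", "interleukin-2", "IL-2", "IL2"],
    "ID:2": ["interleukin 6", "interleukin-6", "IL-6", "IL6"],
    "ID:3": ["tumor necrosis factor alpha", "TNF-alpha", "TNF alpha", "TNFa"],
}

def bigrams(name):
    """Set of character bigrams of the lowercased name, with boundary markers."""
    padded = f"^{name.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}

def features(a, b):
    """Assumed feature vector for a name pair: bias, bigram Jaccard overlap,
    relative length difference, and whether the digit sequences agree."""
    set_a, set_b = bigrams(a), bigrams(b)
    jaccard = len(set_a & set_b) / len(set_a | set_b)
    length_gap = abs(len(a) - len(b)) / (max(len(a), len(b)) or 1)
    digits_a = [c for c in a if c.isdigit()]
    digits_b = [c for c in b if c.isdigit()]
    return [1.0, jaccard, length_gap, float(digits_a == digits_b)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def similarity(weights, a, b):
    """Learned similarity: modelled probability that a and b name the same concept."""
    return sigmoid(sum(w * x for w, x in zip(weights, features(a, b))))

def training_pairs(dictionary):
    """Label every pair of dictionary names: 1.0 if same ID, else 0.0."""
    entries = [(n, cid) for cid, names in dictionary.items() for n in names]
    return [(features(n1, n2), 1.0 if c1 == c2 else 0.0)
            for (n1, c1), (n2, c2) in combinations(entries, 2)]

def train(pairs, epochs=2000, learning_rate=0.5):
    """Fit logistic regression weights by batch gradient descent on log loss."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in pairs:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xj in enumerate(x):
                gradient[j] += error * xj
        weights = [w - learning_rate * g / len(pairs)
                   for w, g in zip(weights, gradient)]
    return weights

def look_up(weights, dictionary, query):
    """Return (score, concept ID, matched name) for the most similar entry."""
    return max((similarity(weights, query, n), cid, n)
               for cid, names in dictionary.items() for n in names)

weights = train(training_pairs(TOY_DICTIONARY))
score, concept_id, matched_name = look_up(weights, TOY_DICTIONARY, "IL 2")
print(concept_id, matched_name, round(score, 3))
```

The query "IL 2" does not appear in the toy dictionary, so exact matching would return nothing, while the learned measure still ranks every entry. A real dictionary would need far more pairs, sampled negatives rather than all pairs, and a richer feature set.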

Entities: Chemical

Substances:
Proteins

Year: 2007 PMID: 17698493 DOI: 10.1093/bioinformatics/btm393

Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937

Cited

22 in total

1. BioTagger-GM: a gene/protein name recognition system.

Authors: Manabu Torii; Zhangzhi Hu; Cathy H Wu; Hongfang Liu
Journal: J Am Med Inform Assoc Date: 2008-12-11 Impact factor: 4.497

2. Moara: a Java library for extracting and normalizing gene and protein mentions.

Authors: Mariana L Neves; José-María Carazo; Alberto Pascual-Montano
Journal: BMC Bioinformatics Date: 2010-03-26 Impact factor: 3.169

3. Hybrid Semantic Analysis for Mapping Adverse Drug Reaction Mentions in Tweets to Medical Terminology.

Authors: Ehsan Emadzadeh; Abeed Sarker; Azadeh Nikfarjam; Graciela Gonzalez
Journal: AMIA Annu Symp Proc Date: 2018-04-16

4. TaggerOne: joint named entity recognition and normalization with semi-Markov Models.

Authors: Robert Leaman; Zhiyong Lu
Journal: Bioinformatics Date: 2016-06-09 Impact factor: 6.937

Review 5. Capturing the Patient's Perspective: a Review of Advances in Natural Language Processing of Health-Related Text.

Authors: G Gonzalez-Hernandez; A Sarker; K O'Connor; G Savova
Journal: Yearb Med Inform Date: 2017-09-11

6. BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature.

7. Integration of metabolic databases for the reconstruction of genome-scale metabolic networks.

8. Ranking relations between diseases, drugs and genes for a curation task.

Review 9. What the papers say: text mining for genomics and systems biology.

10. A method for integrating and ranking the evidence for biochemical pathways by mining reactions from text.