Assessment of approximate string matching in a biomedical text retrieval problem.

J F Wang, Z R Li, C Z Cai, Y Z Chen.

Abstract

Text-based search is widely used for biomedical data mining and knowledge discovery. Character errors in the literature reduce the accuracy of data mining, and methods for addressing this problem are still being explored. This work tests the usefulness of the Smith-Waterman algorithm with an affine gap penalty as a method for biomedical literature retrieval. Names of medicinal herbs collected from the herbal medicine literature are matched against those from the medicinal chemistry literature using this algorithm at string identity levels from 80% to 100%. Optimal performance is obtained at a string identity of 88%, where recall and precision are 96.9% and 97.3%, respectively. Our study suggests that the Smith-Waterman algorithm is useful for improving the success rate of biomedical text retrieval.
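The matching procedure described in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the scoring scheme or the exact definition of string identity, so the match/mismatch/gap values, the identity definition (identical aligned positions divided by the length of the longer name), and the helper names `sw_affine`, `string_identity`, `is_match` and `recall_precision` are all hypothetical. Only the 88% threshold comes from the abstract.

```python
# Minimal sketch: Smith-Waterman local alignment with an affine gap penalty
# (Gotoh-style recurrences), used to decide whether two herb names match.
# ASSUMPTION: scoring values, the identity definition and all helper names
# are hypothetical; they are not taken from the paper.

NEG_INF = float("-inf")


def _extend(cell, score, identical):
    """Extend an alignment summarised as (score, identical_positions)."""
    return (cell[0] + score, cell[1] + identical)


def sw_affine(a, b, match=2, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Best local alignment of a and b; returns (score, identical_positions).

    A gap of length k scores gap_open + (k - 1) * gap_extend. Cells hold
    (score, identities) tuples, which Python compares lexicographically,
    so equal scores are resolved in favour of more identical positions.
    """
    n, m = len(a), len(b)
    zero, empty = (0, 0), (NEG_INF, 0)
    H = [[zero] * (m + 1) for _ in range(n + 1)]   # best alignment ending at (i, j)
    E = [[empty] * (m + 1) for _ in range(n + 1)]  # ... ending in a gap in a
    F = [[empty] * (m + 1) for _ in range(n + 1)]  # ... ending in a gap in b
    best = zero
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Open a new gap from H, or extend an existing gap.
            E[i][j] = max(_extend(H[i][j - 1], gap_open, 0),
                          _extend(E[i][j - 1], gap_extend, 0))
            F[i][j] = max(_extend(H[i - 1][j], gap_open, 0),
                          _extend(F[i - 1][j], gap_extend, 0))
            same = a[i - 1] == b[j - 1]
            diag = _extend(H[i - 1][j - 1], match if same else mismatch, int(same))
            # Local alignment: a negative-scoring prefix is dropped (reset to zero).
            H[i][j] = max(zero, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


def string_identity(a, b):
    """ASSUMPTION: identical aligned positions / length of the longer name."""
    if not a or not b:
        return 0.0
    return sw_affine(a, b)[1] / max(len(a), len(b))


def is_match(a, b, threshold=0.88):
    """Accept a pair at or above the identity threshold (0.88 per the abstract)."""
    return string_identity(a, b) >= threshold


def recall_precision(tp, fp, fn):
    """Recall = TP / (TP + FN); precision = TP / (TP + FP)."""
    return tp / (tp + fn), tp / (tp + fp)
```

With these hypothetical settings, `is_match("Glycyrrhiza uralensis", "Glycyrhiza uralensis")` accepts the misspelled variant (identity 20/21, about 0.95), while `is_match("Panax ginseng", "Panax notoginseng")` rejects a different species name (identity 13/17, about 0.76). Keeping (score, identities) pairs in each cell avoids a separate traceback; a fuller implementation would normally trace back to recover the aligned strings themselves. Each comparison costs O(nm) time and memory for names of lengths n and m.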

Year:  2005        PMID: 16124992     DOI: 10.1016/j.compbiomed.2004.06.002

Source DB:  PubMed          Journal:  Comput Biol Med        ISSN: 0010-4825            Impact factor:   4.589
