Rule-based human gene normalization in biomedical text with confidence estimation.

William W Lau, Calvin A Johnson, Kevin G Becker.

Abstract

The ability to identify gene mentions in text and normalize them to the proper unique identifiers is crucial for downstream text mining applications in bioinformatics. We have developed a rule-based algorithm that divides the normalization task into two steps. The first step combines pattern matching for gene symbols with an approximate term searching technique for gene names. In the second step, the algorithm measures several features based on morphological, statistical, and contextual information to estimate the level of confidence that the correct identifier has been selected for a potential mention. Uniqueness, inverse distance, and coverage are three novel features we quantified. The algorithm was evaluated against the BioCreAtIvE datasets. The feature weights were tuned by the Nelder-Mead simplex method. An F-score of 0.7622 and an AUC (area under the recall-precision curve) of 0.7461 were achieved on the test data using the set of weights optimized on the training data.
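The second step can be illustrated with a minimal sketch. The abstract does not state how the features are combined, so the weighted sum below is an assumption, as are the feature scaling to [0, 1], the acceptance threshold, the toy values, and every name in the code (`Candidate`, `confidence`, `f_score`). It shows only how per-candidate feature values and a set of weights could yield a confidence score, and how an F-score could then be computed for a given threshold.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene_id: str       # proposed identifier (hypothetical toy label)
    features: dict     # feature name -> value, assumed scaled to [0, 1]
    is_correct: bool   # gold-standard label, used only for evaluation


def confidence(features, weights):
    """Weighted sum of feature values (an assumed combination rule)."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def f_score(candidates, weights, threshold):
    """Return (precision, recall, F1) when candidates scoring at or above
    the threshold are accepted. Recall is measured over the correct
    candidates in the list, a simplification of a real evaluation."""
    accepted = [c for c in candidates
                if confidence(c.features, weights) >= threshold]
    true_pos = sum(c.is_correct for c in accepted)
    total_correct = sum(c.is_correct for c in candidates)
    precision = true_pos / len(accepted) if accepted else 0.0
    recall = true_pos / total_correct if total_correct else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical weights and toy candidates, not values from the paper.
weights = {"uniqueness": 0.5, "inverse_distance": 0.3, "coverage": 0.2}
candidates = [
    Candidate("A", {"uniqueness": 1.0, "inverse_distance": 0.5, "coverage": 1.0}, True),
    Candidate("B", {"uniqueness": 0.2, "inverse_distance": 0.5, "coverage": 0.5}, False),
    Candidate("C", {"uniqueness": 0.5, "inverse_distance": 1.0, "coverage": 0.5}, True),
    Candidate("D", {"uniqueness": 0.8, "inverse_distance": 0.0, "coverage": 1.0}, False),
]

precision, recall, f1 = f_score(candidates, weights, threshold=0.5)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

Under these assumptions, weight tuning amounts to minimizing the negative F-score on training data as a function of the weight vector, for example by passing such an objective to `scipy.optimize.minimize` with `method="Nelder-Mead"`. Nelder-Mead is derivative-free, which suits an objective like the F-score that changes in steps as candidates cross the threshold.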

Year:  2007        PMID: 17951839

Source DB:  PubMed          Journal:  Comput Syst Bioinformatics Conf        ISSN: 1752-7791


  5 in total

1.  Soft tagging of overlapping high confidence gene mention variants for cross-species full-text gene normalization.

Authors:  Cheng-Ju Kuo; Maurice H T Ling; Chun-Nan Hsu
Journal:  BMC Bioinformatics       Date:  2011-10-03       Impact factor: 3.169

2.  Methods for managing variation in clinical drug names.

Authors:  Lee Peters; Joan E Kapusnik-Uner; Olivier Bodenreider
Journal:  AMIA Annu Symp Proc       Date:  2010-11-13

Review 3.  What the papers say: text mining for genomics and systems biology.

Authors:  Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

Review 4.  A knowledge-driven approach to extract disease-related biomarkers from the literature.

Authors:  À Bravo; M Cases; N Queralt-Rosinach; F Sanz; L I Furlong
Journal:  Biomed Res Int       Date:  2014-04-16       Impact factor: 3.411

5.  Introducing meta-services for biomedical information extraction.

Authors:  Florian Leitner; Martin Krallinger; Carlos Rodriguez-Penagos; Jörg Hakenberg; Conrad Plake; Cheng-Ju Kuo; Chun-Nan Hsu; Richard Tzong-Han Tsai; Hsi-Chuan Hung; William W Lau; Calvin A Johnson; Rune Saetre; Kazuhiro Yoshida; Yan Hua Chen; Sun Kim; Soo-Yong Shin; Byoung-Tak Zhang; William A Baumgartner; Lawrence Hunter; Barry Haddow; Michael Matthews; Xinglong Wang; Patrick Ruch; Frédéric Ehrler; Arzucan Ozgür; Güneş Erkan; Dragomir R Radev; Michael Krauthammer; ThaiBinh Luong; Robert Hoffmann; Chris Sander; Alfonso Valencia
Journal:  Genome Biol       Date:  2008-09-01       Impact factor: 13.583
