Chih-Hsuan Wei¹, Hung-Yu Kao. ¹Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC.
Abstract
BACKGROUND: To access and use the rich information contained in the biomedical literature, the ability to recognize and normalize gene mentions is crucial. In this paper, we focus on improving the accuracy of gene normalization in cases where species information is not provided. Gene names are often ambiguous because the same name can refer to genes of many species, which makes gene normalization a difficult challenge.

METHODS: We define "gene normalization" as a series of tasks: gene name recognition, species assignation, and species-specific gene normalization. We propose an integrated method, GenNorm, consisting of three modules that handle these tasks. Every task can affect overall performance, but species assignation is the most important, because correctly identifying the species reduces the ambiguity among orthologous genes.

RESULTS: In experiments, the proposed model attained the top-ranked threshold average precision (TAP-k) scores of 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20) when tested against 50 articles that had been selected for their difficulty and for having the most divergent results among pooled team submissions. In the silver-standard-507 evaluation, our TAP-k scores were 0.4591 for k=5, 10, and 20, ranking 2nd, 2nd, and 3rd, respectively.

AVAILABILITY: A web service and the input and output formats of GenNorm are available at http://ikmbio.csie.ncku.edu.tw/GN/.
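The three-step view of gene normalization described in METHODS can be illustrated with a toy sketch. This is not the GenNorm implementation: the lexicon, the species cue words, the placeholder identifiers, and all function names below are assumptions made up for illustration. It only shows why assigning a species first narrows the candidate identifiers for an ambiguous gene name.

```python
from collections import Counter

# Toy lexicon (assumption, illustrative only): gene name -> {species: identifier}.
# The identifiers are placeholders, not real database entries.
GENE_LEXICON = {
    "tp53": {"human": "GENE:H1", "mouse": "GENE:M1"},
    "brca1": {"human": "GENE:H2", "mouse": "GENE:M2"},
}

# Toy species cue words (assumption): surface word -> species label.
SPECIES_CUES = {
    "human": "human", "patient": "human", "patients": "human",
    "mouse": "mouse", "mice": "mouse", "murine": "mouse",
}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [t.strip(".,;:()").lower() for t in text.split()]


def recognize_gene_mentions(tokens):
    """Step 1 (gene name recognition): keep tokens found in the toy lexicon."""
    return [t for t in tokens if t in GENE_LEXICON]


def assign_species(tokens):
    """Step 2 (species assignation): most frequent species cue, or None."""
    counts = Counter(SPECIES_CUES[t] for t in tokens if t in SPECIES_CUES)
    return counts.most_common(1)[0][0] if counts else None


def normalize(text):
    """Step 3 (species-specific normalization): map mentions to identifiers."""
    tokens = tokenize(text)
    species = assign_species(tokens)
    results = {}
    for mention in recognize_gene_mentions(tokens):
        candidates = GENE_LEXICON[mention]
        if species in candidates:
            # Species known: a single identifier remains.
            results[mention] = [candidates[species]]
        else:
            # Species unknown: every ortholog stays a candidate.
            results[mention] = sorted(candidates.values())
    return results


print(normalize("Tp53 was deleted in mice."))
print(normalize("TP53 mutations were observed."))
```

In the first call the cue word "mice" selects the mouse entry, so one identifier is returned; in the second call no species cue is present and both orthologous candidates remain, which is the ambiguity the abstract refers to.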