Jörg Hakenberg, Martin Gerner, Maximilian Haeussler, Illés Solt, Conrad Plake, Michael Schroeder, Graciela Gonzalez, Goran Nenadic, Casey M Bergman.
Abstract
SUMMARY: Identifying mentions of named entities, such as genes or diseases, and normalizing them to database identifiers have become important steps in many text- and data-mining pipelines. Despite this need, very few entity normalization systems are publicly available as source code or web services for biomedical text mining. Here we present the GNAT Java library for text retrieval, named entity recognition, and normalization of gene and protein mentions in biomedical text. The library can be used as a component integrated into other text-mining systems, as a framework for user-specific extensions, and as an efficient stand-alone application for identifying gene and protein names for data analysis. On the BioCreative III test data, the current version of GNAT achieves a TAP-20 score of 0.1987. AVAILABILITY: The library and web services are implemented in Java, and the sources are available from http://gnat.sourceforge.net. CONTACT: jorg.hakenberg@roche.com.
Year: 2011 PMID: 21813477 PMCID: PMC3179658 DOI: 10.1093/bioinformatics/btr455
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of the GNAT processing pipeline, with typical components [(1) through (7); see text] and the final output (8). GNAT is designed in a modular manner, with data exchanged between components over HTTP. This allows memory- and CPU-intensive components (A and B) to be run separately on appropriate hardware. Memory-intensive components typically run as (remote or local) services, because their long startup times make launching them anew impractical for small queries. The GNAT client (center) decides which components to invoke and how, and sends data to the components for annotation. Some components rely on annotations provided by other components; for example, the assignment of candidate identifiers in step (5) requires the species annotations from step (3a).
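The caption describes a client that invokes components in order, where some components depend on annotations produced by earlier ones (candidate identifiers need species annotations). The following is a minimal, hypothetical sketch of that orchestration idea, not GNAT's actual API: the class `PipelineSketch`, the `Component` interface, the layer names (`species`, `genes`, `candidates`), the toy taggers, and the `GENE@taxon` candidate format are all illustrative assumptions, and the HTTP transport between client and services is deliberately left out. The only real-world value used is 9606, the NCBI Taxonomy identifier for human.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical sketch of a client that runs annotation components in dependency order. */
public class PipelineSketch {

    /** A text plus the annotation layers added so far (layer name -> values). */
    static class Document {
        final String text;
        final Map<String, List<String>> layers = new HashMap<>();
        Document(String text) { this.text = text; }
    }

    /** One processing step; in a real deployment this could be a local or remote service. */
    interface Component {
        String name();
        Set<String> requires();   // layers that earlier components must have produced
        String provides();        // layer this component adds
        List<String> annotate(Document doc);
    }

    /** Runs components in the given order, refusing to run one whose inputs are missing. */
    static Document run(Document doc, List<Component> components) {
        for (Component c : components) {
            for (String layer : c.requires()) {
                if (!doc.layers.containsKey(layer)) {
                    throw new IllegalStateException(
                        c.name() + " needs layer '" + layer + "', which no earlier component provided");
                }
            }
            doc.layers.put(c.provides(), c.annotate(doc));
        }
        return doc;
    }

    /** Convenience factory that wraps a function as a Component (hypothetical helper). */
    static Component component(String name, Set<String> requires, String provides,
                               Function<Document, List<String>> fn) {
        return new Component() {
            public String name() { return name; }
            public Set<String> requires() { return requires; }
            public String provides() { return provides; }
            public List<String> annotate(Document doc) { return fn.apply(doc); }
        };
    }

    /** Toy pipeline: species tagger, gene tagger, then candidate-ID assignment. */
    static List<Component> demoPipeline() {
        Component species = component("species-tagger", Set.of(), "species", d -> {
            List<String> ids = new ArrayList<>();
            if (d.text.contains("human")) ids.add("9606");   // NCBI Taxonomy ID for human
            return ids;
        });
        Component genes = component("gene-tagger", Set.of(), "genes", d -> {
            List<String> hits = new ArrayList<>();
            for (String tok : d.text.split("\\W+")) {
                // Crude stand-in for NER: uppercase token ending in a digit, e.g. BRCA1.
                if (tok.matches("[A-Z][A-Z0-9]*[0-9]")) hits.add(tok);
            }
            return hits;
        });
        // Stand-in for a database lookup: pair every gene mention with every species.
        Component candidates = component("candidate-ids", Set.of("species", "genes"), "candidates", d -> {
            List<String> out = new ArrayList<>();
            for (String g : d.layers.get("genes")) {
                for (String s : d.layers.get("species")) out.add(g + "@" + s);
            }
            return out;
        });
        return List.of(species, genes, candidates);
    }

    public static void main(String[] args) {
        Document doc = run(new Document("Mutations in human BRCA1 and TP53 were observed."), demoPipeline());
        System.out.println(doc.layers.get("candidates"));   // [BRCA1@9606, TP53@9606]
    }
}
```

Checking `requires()` before each call mirrors the caption's point that step (5) cannot run before step (3a). Placing the candidate-ID component first in the list makes `run` throw rather than silently produce empty results.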