OrganismTagger: detection, normalization and grounding of organism entities in biomedical documents.

Nona Naderi, Thomas Kappler, Christopher J O Baker, René Witte.

Abstract

MOTIVATION: Semantic tagging of organism mentions in full-text articles is an important part of literature mining and semantic enrichment solutions. Tagged organism mentions also play a pivotal role in disambiguating other entities in a text, such as proteins. A high-precision organism tagging system must be able to detect the numerous forms of organism mentions, including common names as well as the traditional taxonomic groups: genus, species and strains. In addition, such a system must resolve abbreviations and acronyms, assign the scientific name and if possible link the detected mention to the NCBI Taxonomy database for further semantic queries and literature navigation.
RESULTS: We present the OrganismTagger, a hybrid rule-based/machine learning system to extract organism mentions from the literature. It includes tools for automatically generating lexical and ontological resources from a copy of the NCBI Taxonomy database, thereby facilitating system updates by end users. Its novel ontology-based resources can also be reused in other semantic mining and linked data tasks. Each detected organism mention is normalized to a canonical name through the resolution of acronyms and abbreviations and subsequently grounded with an NCBI Taxonomy database ID. In particular, our system combines a novel machine-learning approach with rule-based and lexical methods for detecting strain mentions in documents. On our manually annotated OT corpus, the OrganismTagger achieves a precision of 95%, a recall of 94% and a grounding accuracy of 97.5%. On the manually annotated corpus of Linnaeus-100, the results show a precision of 99%, recall of 97% and grounding accuracy of 97.4%.
AVAILABILITY: The OrganismTagger, including supporting tools, resources, training data and manual annotations, as well as end user and developer documentation, is freely available under an open-source license at http://www.semanticsoftware.info/organism-tagger.
CONTACT: witte@semanticsoftware.info.
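
The normalization and grounding steps described in the abstract (resolve an abbreviated or common-name mention to a canonical scientific name, then attach an NCBI Taxonomy ID) can be illustrated with a minimal Python sketch. This is not the OrganismTagger's implementation: the three-entry lexicon, the `normalize` and `ground` function names, and the simple genus-initial matching rule are all assumptions made for illustration. Only the taxonomy IDs themselves (562, 9606, 4932) are real NCBI Taxonomy identifiers.

```python
import re

# Toy lexicon (assumption): canonical scientific name -> NCBI Taxonomy ID.
# A real system would generate this from a copy of the NCBI Taxonomy database.
TAXONOMY = {
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
    "Saccharomyces cerevisiae": 4932,
}

# Toy common-name table (assumption): lower-cased common name -> scientific name.
COMMON_NAMES = {
    "human": "Homo sapiens",
    "baker's yeast": "Saccharomyces cerevisiae",
}

# Matches abbreviated binomials such as "E. coli" (genus initial + species epithet).
ABBREVIATED = re.compile(r"([A-Z])\.\s+([a-z]+)")


def normalize(mention):
    """Return the canonical scientific name for a mention, or None if unresolved."""
    text = mention.strip()
    if text in TAXONOMY:
        return text
    common = COMMON_NAMES.get(text.lower())
    if common is not None:
        return common
    match = ABBREVIATED.fullmatch(text)
    if match:
        initial, species = match.groups()
        candidates = [
            name for name in TAXONOMY
            if name.startswith(initial) and name.split()[1] == species
        ]
        # Only accept an unambiguous expansion of the abbreviation.
        if len(candidates) == 1:
            return candidates[0]
    return None


def ground(mention):
    """Return (canonical name, NCBI Taxonomy ID) for a mention, or None."""
    name = normalize(mention)
    if name is None:
        return None
    return name, TAXONOMY[name]
```

With this toy lexicon, `ground("E. coli")` returns `("Escherichia coli", 562)` and `ground("human")` returns `("Homo sapiens", 9606)`; a mention that cannot be resolved returns `None`. The sketch does not cover strain detection or context-based disambiguation, which the abstract attributes to the system's machine-learning component.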

Year:  2011        PMID: 21828087     DOI: 10.1093/bioinformatics/btr452

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  17 in total

1.  Semantic text mining support for lignocellulose research.

Authors:  Marie-Jean Meurs; Caitlin Murphy; Ingo Morgenstern; Greg Butler; Justin Powlowski; Adrian Tsang; René Witte
Journal:  BMC Med Inform Decis Mak       Date:  2012-04-30       Impact factor: 2.796

2.  Extracting Characteristics of the Study Subjects from Full-Text Articles.

Authors:  Dina Demner-Fushman; James G Mork
Journal:  AMIA Annu Symp Proc       Date:  2015-11-05

3.  Dataset search in biodiversity research: Do metadata in data repositories reflect scholarly information needs?

Authors:  Felicitas Löffler; Valentin Wesp; Birgitta König-Ries; Friederike Klan
Journal:  PLoS One       Date:  2021-03-24       Impact factor: 3.240

4.  Assigning species information to corresponding genes by a sequence labeling framework.

Authors:  Ling Luo; Chih-Hsuan Wei; Po-Ting Lai; Qingyu Chen; Rezarta Islamaj; Zhiyong Lu
Journal:  Database (Oxford)       Date:  2022-10-13       Impact factor: 4.462

5.  The SPECIES and ORGANISMS Resources for Fast and Accurate Identification of Taxonomic Names in Text.

Authors:  Evangelos Pafilis; Sune P Frankild; Lucia Fanini; Sarah Faulwetter; Christina Pavloudi; Aikaterini Vasileiadou; Christos Arvanitidis; Lars Juhl Jensen
Journal:  PLoS One       Date:  2013-06-18       Impact factor: 3.240

6.  Automated extraction and semantic analysis of mutation impacts from the biomedical literature.

Authors:  Nona Naderi; René Witte
Journal:  BMC Genomics       Date:  2012-06-18       Impact factor: 3.969

7.  SR4GN: a species recognition software tool for gene normalization.

Authors:  Chih-Hsuan Wei; Hung-Yu Kao; Zhiyong Lu
Journal:  PLoS One       Date:  2012-06-05       Impact factor: 3.240

8.  Applications of natural language processing in biodiversity science.

Authors:  Anne E Thessen; Hong Cui; Dmitry Mozzherin
Journal:  Adv Bioinformatics       Date:  2012-05-22

9.  Getting more out of biomedical documents with GATE's full lifecycle open source text analytics.

Authors:  Hamish Cunningham; Valentin Tablan; Angus Roberts; Kalina Bontcheva
Journal:  PLoS Comput Biol       Date:  2013-02-07       Impact factor: 4.475

10.  A review on computational systems biology of pathogen-host interactions.

Authors:  Saliha Durmuş; Tunahan Çakır; Arzucan Özgür; Reinhard Guthke
Journal:  Front Microbiol       Date:  2015-04-09       Impact factor: 5.640
