
Biological nomenclatures: a source of lexical knowledge and ambiguity.

O Tuason, L Chen, H Liu, J A Blake, C Friedman.

Abstract

Work on automated systems that use natural language processing (NLP) to recognize and extract genomic information from the literature has increased. Recognizing and identifying biological entities is a critical step in this process. NLP systems generally rely on nomenclatures and ontological specifications as resources for determining the names of the entities, assigning semantic categories that are consistent with the corresponding ontology, and assigning identifiers that map to well-defined entities within a particular nomenclature. Although nomenclatures and ontologies are valuable for text processing systems, they were developed to aid researchers and are heterogeneous in structure and semantics. A uniform resource that is automatically generated from diverse resources and designed for NLP purposes would be a useful tool for the field and would further database interoperability. This paper presents work towards this goal. We have automatically created lexical resources from four model organism nomenclature systems (mouse, fly, worm, and yeast) and have studied the performance of these resources within an existing NLP system, GENIES. Using nomenclatures is not straightforward, because ambiguity, synonymy, and name variation all pose considerable challenges. In this paper we focus mainly on ambiguity. We determined that the proportion of ambiguous gene names within the individual nomenclatures, across the four nomenclatures, and with general English ranged from 0% to 10.18%, 1.187% to 20.30%, and 0% to 2.49%, respectively. When actually processing text, we found the rate of ambiguous occurrences (not counting ambiguities stemming from English words) to range from 2.4% to 32.9%, depending on the organisms considered.
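
The three kinds of ambiguity named in the abstract (within one nomenclature, across nomenclatures, and with general English) can be illustrated with a minimal sketch. This is not the authors' procedure or the GENIES implementation: the function names, the toy gene names, the identifiers, and the choice to compare names case-insensitively are all assumptions made for illustration.

```python
from collections import defaultdict

def build_lexicon(entries):
    """Map each name (lower-cased, an assumption) to the set of identifiers it denotes."""
    lexicon = defaultdict(set)
    for name, identifier in entries:
        lexicon[name.lower()].add(identifier)
    return dict(lexicon)

def within_ambiguity(lexicon):
    """Fraction of names that map to more than one identifier in the same lexicon."""
    if not lexicon:
        return 0.0
    ambiguous = sum(1 for ids in lexicon.values() if len(ids) > 1)
    return ambiguous / len(lexicon)

def cross_ambiguity(lexicon, others):
    """Fraction of names in `lexicon` that also occur in at least one other lexicon."""
    if not lexicon:
        return 0.0
    other_names = set().union(*(other.keys() for other in others))
    shared = sum(1 for name in lexicon if name in other_names)
    return shared / len(lexicon)

def english_ambiguity(lexicon, english_words):
    """Fraction of names in `lexicon` that are also general English words."""
    if not lexicon:
        return 0.0
    words = {w.lower() for w in english_words}
    shared = sum(1 for name in lexicon if name in words)
    return shared / len(lexicon)

# Hypothetical toy data: these names and identifiers are invented, not real records.
mouse = build_lexicon([("abc1", "M:1"), ("abc1", "M:2"), ("xyz", "M:3"), ("white", "M:4")])
fly = build_lexicon([("white", "F:1"), ("xyz", "F:2"), ("qrs", "F:3")])
english = {"white", "for"}

mouse_within = within_ambiguity(mouse)            # one of three names has two identifiers
mouse_cross = cross_ambiguity(mouse, [fly])       # two of three names also appear in fly
mouse_english = english_ambiguity(mouse, english) # one of three names is an English word
```

These figures are dictionary-level rates, counted over distinct names. The rate of ambiguous occurrences in processed text, which the abstract reports separately, would be counted over name mentions instead.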


Year:  2004        PMID: 14992507     DOI: 10.1142/9789812704856_0023

Source DB:  PubMed          Journal:  Pac Symp Biocomput        ISSN: 2335-6928


