Rutabaga by any other name: extracting biological names.

Lynette Hirschman, Alexander A Morgan, Alexander S Yeh.

Abstract

As the pace of biological research accelerates, biologists are becoming increasingly reliant on computers to manage the information explosion. Biologists communicate their research findings by relying on precise biological terms; these terms then provide indices into the literature and across the growing number of biological databases. This article examines emerging techniques to access biological resources through extraction of entity names and relations among them. Information extraction has been an active area of research in natural language processing and there are promising results for information extraction applied to news stories, e.g., balanced precision and recall in the 93-95% range for identifying person, organization and location names. But these results do not seem to transfer directly to biological names, where results remain in the 75-80% range. Multiple factors may be involved, including absence of shared training and test sets for rigorous measures of progress, lack of annotated training data specific to biological tasks, pervasive ambiguity of terms, frequent introduction of new terms, and a mismatch between evaluation tasks as defined for news and real biological problems. We present evidence from a simple lexical matching exercise that illustrates some specific problems encountered when identifying biological names. We conclude by outlining a research agenda to raise performance of named entity tagging to a level where it can be used to perform tasks of biological importance.

Year: 2002  PMID: 12755519  DOI: 10.1016/s1532-0464(03)00014-5

Source DB: PubMed  Journal: J Biomed Inform  ISSN: 1532-0464  Impact factor: 6.317

