Detecting Gene Symbols and Names in Biological Texts: A First Step toward Pertinent Information Extraction.

Abstract

The need to gather data on molecular interactions for a specialized database has motivated the development of a computer system that helps extract pertinent information from texts, relying on advanced linguistic tools complemented by object-oriented knowledge modeling capabilities. As a first step toward this challenging objective, a program for identifying gene symbols and names inside sentences has been devised. The main difficulty is that these names and symbols do not appear to follow construction rules. The program is thus made up of a series of sieves of different natures (lexical, morphological and semantic) that single out, among the words of a sentence, those that can only be potential gene symbols or names. Its performance has been evaluated, in terms of coverage and precision ratios, on a corpus of texts concerning D. melanogaster, for which the list of names of known genes is available for checking.
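
The sieve cascade and its evaluation can be illustrated with a minimal sketch. It is not the authors' program: the abstract does not give the actual rules or lexical resources, so the stop list, the two sieve functions, and the reading of "coverage" as the share of known genes recovered are all assumptions made for illustration. The semantic sieve is omitted.

```python
import re

# Hypothetical stop list; the paper's real lexical resources are not described in the abstract.
COMMON_WORDS = {"the", "of", "and", "in", "gene", "is", "a", "with", "interacts"}


def lexical_sieve(tokens):
    """Drop tokens that are ordinary English words (assumed stop list)."""
    return [t for t in tokens if t.lower() not in COMMON_WORDS]


def morphological_sieve(tokens):
    """Drop tokens whose shape rules them out: single characters or pure numbers (assumed rule)."""
    return [t for t in tokens if len(t) > 1 and not t.isdigit()]


def candidate_symbols(sentence):
    """Pass the words of a sentence through each sieve in turn; survivors are candidates."""
    tokens = re.findall(r"[A-Za-z0-9\-]+", sentence)
    for sieve in (lexical_sieve, morphological_sieve):
        tokens = sieve(tokens)
    return tokens


def coverage_and_precision(candidates, known_genes):
    """Coverage: fraction of known genes found. Precision: fraction of candidates that are known genes."""
    found = set(candidates)
    hits = found & set(known_genes)
    coverage = len(hits) / len(known_genes) if known_genes else 0.0
    precision = len(hits) / len(found) if found else 0.0
    return coverage, precision


if __name__ == "__main__":
    cands = candidate_symbols("The gene hb interacts with Kr in 2 stripes")
    print(cands)
    print(coverage_and_precision(cands, {"hb", "Kr", "eve"}))
```

Each sieve only removes words, so a word such as "stripes" that no sieve rejects remains a false candidate; this is why precision is reported alongside coverage.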

Year:  1998        PMID: 11072323

Source DB:  PubMed          Journal:  Genome Inform Ser Workshop Genome Inform


  20 in total

1.  Mining the bibliome: searching for a needle in a haystack? New computing tools are needed to effectively scan the growing amount of scientific literature for useful information.

Authors:  Les Grivell
Journal:  EMBO Rep       Date:  2002-03       Impact factor: 8.807

2.  Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.

Authors:  Hong Yu; Vasileios Hatzivassiloglou; Carol Friedman; Andrey Rzhetsky; W John Wilbur
Journal:  Proc AMIA Symp       Date:  2002

3.  A method for finding communities of related genes.

Authors:  Dennis M Wilkinson; Bernardo A Huberman
Journal:  Proc Natl Acad Sci U S A       Date:  2004-02-02       Impact factor: 11.205

4.  A simple and practical dictionary-based approach for identification of proteins in Medline abstracts.

Authors:  Sergei Egorov; Anton Yuryev; Nikolai Daraselia
Journal:  J Am Med Inform Assoc       Date:  2004-02-05       Impact factor: 4.497

5.  Identification of related gene/protein names based on an HMM of name variations.

Authors:  L Yeganova; L Smith; W J Wilbur
Journal:  Comput Biol Chem       Date:  2004-04       Impact factor: 2.877

6.  NLProt: extracting protein names and sequences from papers.

Authors:  Sven Mika; Burkhard Rost
Journal:  Nucleic Acids Res       Date:  2004-07-01       Impact factor: 16.971

7.  Using co-occurrence network structure to extract synonymous gene and protein names from MEDLINE abstracts.

Authors:  A M Cohen; W R Hersh; C Dubay; K Spackman
Journal:  BMC Bioinformatics       Date:  2005-04-22       Impact factor: 3.169

8.  Unsupervised biomedical named entity recognition: experiments with clinical and biological texts.

Authors:  Shaodian Zhang; Noémie Elhadad
Journal:  J Biomed Inform       Date:  2013-08-15       Impact factor: 6.317

9.  Textpresso: an ontology-based information retrieval and extraction system for biological literature.

Authors:  Hans-Michael Müller; Eimear E Kenny; Paul W Sternberg
Journal:  PLoS Biol       Date:  2004-09-21       Impact factor: 8.029

10.  BIOADI: a machine learning approach to identifying abbreviations and definitions in biological literature.

Authors:  Cheng-Ju Kuo; Maurice H T Ling; Kuan-Ting Lin; Chun-Nan Hsu
Journal:  BMC Bioinformatics       Date:  2009-12-03       Impact factor: 3.169
