Textpresso: an ontology-based information retrieval and extraction system for biological literature.

Hans-Michael Müller, Eimear E Kenny, Paul W Sternberg.

Abstract

We have developed Textpresso, a new text-mining system for scientific literature whose capabilities go far beyond those of a simple keyword search engine. Textpresso's two major elements are a collection of the full text of scientific articles split into individual sentences, and the implementation of categories of terms for which a database of articles and individual sentences can be searched. The categories are classes of biological concepts (e.g., gene, allele, cell or cell group, phenotype, etc.) and classes that relate two objects (e.g., association, regulation, etc.) or describe one (e.g., biological process, etc.). Together they form a catalog of types of objects and concepts called an ontology. After this ontology is populated with terms, the whole corpus of articles and abstracts is marked up to identify terms of these categories. The current ontology comprises 33 categories of terms. A search engine enables the user to search for one or a combination of these tags and/or keywords within a sentence or document, and as the ontology allows word meaning to be queried, it is possible to formulate semantic queries. Full text access increases recall of biological data types from 45% to 95%. Extraction of particular biological facts, such as gene-gene interactions, can be accelerated significantly by ontologies, with Textpresso automatically performing nearly as well as expert curators to identify sentences; in searches for two uniquely named genes and an interaction term, the ontology confers a 3-fold increase of search efficiency. Textpresso currently focuses on Caenorhabditis elegans literature, with 3,800 full text articles and 16,000 abstracts. The lexicon of the ontology contains 14,500 entries, each of which includes all versions of a specific word or phrase, and it includes all categories of the Gene Ontology database. 
Textpresso is a useful curation tool, as well as a search engine for researchers, and can readily be extended to other organism-specific corpora of text. Textpresso can be accessed at http://www.textpresso.org or via WormBase at http://www.wormbase.org.
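
The sentence-level category search described in the abstract can be illustrated with a minimal Python sketch. This is not Textpresso's implementation: the three-category lexicon, the function names (`split_sentences`, `tag_sentence`, `search`), the naive sentence splitter, and the example sentences are all assumptions made up for illustration. It only shows the general idea of splitting text into sentences, marking each sentence with the ontology categories whose terms it contains, and then querying for combinations of categories and keywords.

```python
import re
from collections import defaultdict

# Hypothetical toy lexicon (category -> terms); the real lexicon is far larger.
LEXICON = {
    "gene": ["lin-11", "daf-2", "let-502", "mel-11"],
    "regulation": ["regulates", "regulate", "represses", "activates"],
    "association": ["binds", "interacts with"],
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def tag_sentence(sentence):
    """Return {category: [matched terms]} for one sentence (case-insensitive)."""
    tags = defaultdict(list)
    for category, terms in LEXICON.items():
        for term in terms:
            # Require that the term is not embedded in a longer word or gene name.
            pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
            if re.search(pattern, sentence, flags=re.IGNORECASE):
                tags[category].append(term)
    return dict(tags)

def search(sentences, required, keywords=()):
    """Return sentences with at least `required[category]` distinct terms of
    each category and containing every literal keyword."""
    hits = []
    for sentence in sentences:
        tags = tag_sentence(sentence)
        has_categories = all(
            len(tags.get(category, [])) >= count
            for category, count in required.items()
        )
        has_keywords = all(k.lower() in sentence.lower() for k in keywords)
        if has_categories and has_keywords:
            hits.append(sentence)
    return hits

# Invented example text, used only to exercise the sketch.
TEXT = (
    "LET-502 regulates cytokinesis in the early embryo. "
    "We found that let-502 interacts with mel-11 genetically. "
    "The daf-2 mutant was examined under hypoxia."
)
SENTENCES = split_sentences(TEXT)

# Semantic query: two distinct gene terms plus an association term in one sentence.
HITS = search(SENTENCES, {"gene": 2, "association": 1})
for hit in HITS:
    print(hit)
```

In this sketch, a query for two gene terms and one association term returns only the second sentence, which loosely mirrors the gene-gene interaction searches mentioned in the abstract. Mixing categories with plain keywords, for example `search(SENTENCES, {"gene": 1}, keywords=["hypoxia"])`, corresponds to the combined tag and keyword queries the abstract describes.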

Year: 2004; PMID: 15383839; PMCID: PMC517822; DOI: 10.1371/journal.pbio.0020309

Source DB: PubMed; Journal: PLoS Biol; ISSN: 1544-9173; Impact factor: 8.029


  214 in total

1.  Evaluation of semantic-based information retrieval methods in the autism phenotype domain.

Authors:  Saeed Hassanpour; Martin J O'Connor; Amar K Das
Journal:  AMIA Annu Symp Proc       Date:  2011-10-22

2.  Algorithms for modeling global and context-specific functional relationship networks.

Authors:  Fan Zhu; Bharat Panwar; Yuanfang Guan
Journal:  Brief Bioinform       Date:  2015-08-06       Impact factor: 11.622

3.  Semantic text mining support for lignocellulose research.

Authors:  Marie-Jean Meurs; Caitlin Murphy; Ingo Morgenstern; Greg Butler; Justin Powlowski; Adrian Tsang; René Witte
Journal:  BMC Med Inform Decis Mak       Date:  2012-04-30       Impact factor: 2.796

4.  MachineProse: an ontological framework for scientific assertions.

Authors:  Deendayal Dinakarpandian; Yugyung Lee; Kartik Vishwanath; Rohini Lingambhotla
Journal:  J Am Med Inform Assoc       Date:  2005-12-15       Impact factor: 4.497

Review 5.  The impact of the NIH public access policy on literature informatics: What role can the neuroinformaticists play?

Authors:  William Bug
Journal:  Neuroinformatics       Date:  2005

Review 6.  Biomedical language processing: what's beyond PubMed?

Authors:  Lawrence Hunter; K Bretonnel Cohen
Journal:  Mol Cell       Date:  2006-03-03       Impact factor: 17.970

7.  A UML profile for the OBO relation ontology.

Authors:  Gabriela D A Guardia; Ricardo Z N Vêncio; Cléver R G de Farias
Journal:  BMC Genomics       Date:  2012-10-19       Impact factor: 3.969

8.  Building an efficient curation workflow for the Arabidopsis literature corpus.

Authors:  Donghui Li; Tanya Z Berardini; Robert J Muller; Eva Huala
Journal:  Database (Oxford)       Date:  2012-12-06       Impact factor: 3.451

9.  Improving the prediction of pharmacogenes using text-derived drug-gene relationships.

Authors:  Yael Garten; Nicholas P Tatonetti; Russ B Altman
Journal:  Pac Symp Biocomput       Date:  2010

10.  One stop shop for everything Dictyostelium: dictyBase and the Dicty Stock Center in 2012.

Authors:  Petra Fey; Robert J Dodson; Siddhartha Basu; Rex L Chisholm
Journal:  Methods Mol Biol       Date:  2013