
Can bibliographic pointers for known biological data be found automatically? Protein interactions as a case study.

C Blaschke, A Valencia

Abstract

The Database of Interacting Proteins (DIP) (Xenarios et al., 2000) is a large repository of protein interactions: its March 2000 release included 2379 protein pairs whose interactions have been detected by experimental methods. Although many of these pairs involve poorly characterized proteins identified in massive yeast two-hybrid screenings, as many as 851 correspond to interactions detected using direct biochemical methods. We used information retrieval technology to search automatically for sentences in Medline abstracts that support these 851 DIP interactions. Surprisingly, we found Medline sentences describing the interaction for only 30% of the DIP protein pairs. This low coverage has interesting consequences for the quality of the annotations (references) entered in the database and for the limits of applying information extraction (IE) technology to molecular biology. It is clear that analyzing abstracts rather than full papers, and the lack of standard protein names, are considerably more serious difficulties than the limitations of the IE methodology employed. A positive finding is the capacity of the IE system to identify new relations between proteins, even in a set of proteins previously characterized by human experts, and these identifications are made with a considerable degree of precision. This is, to our knowledge, the first large-scale assessment of the capacity of IE to detect previously known interactions; we therefore propose the DIP data set as a biological reference for benchmarking IE systems.
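
The kind of coverage measurement described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the abstract does not specify their matching rules, so the sentence splitter, the verb list `INTERACTION_WORDS`, the function names, and the toy protein names (`ProtA`, `ProtB`, and so on) are all assumptions made for illustration. The sketch counts a protein pair as supported when some sentence mentions both names together with an interaction word, then reports the fraction of supported pairs.

```python
import re

# Hypothetical interaction vocabulary; the abstract does not list the terms actually used.
INTERACTION_WORDS = ("interacts", "interact", "binds", "bind", "associates", "activates")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def mentions(sentence, name):
    """True if `name` occurs as a whole token in `sentence` (case-insensitive)."""
    pattern = r"(?<![A-Za-z0-9])" + re.escape(name) + r"(?![A-Za-z0-9])"
    return re.search(pattern, sentence, re.IGNORECASE) is not None


def supporting_sentences(pair, abstracts):
    """Return sentences that mention both proteins of `pair` plus an interaction word."""
    first, second = pair
    hits = []
    for abstract in abstracts:
        for sentence in split_sentences(abstract):
            has_both = mentions(sentence, first) and mentions(sentence, second)
            has_verb = any(mentions(sentence, word) for word in INTERACTION_WORDS)
            if has_both and has_verb:
                hits.append(sentence)
    return hits


def coverage(pairs, abstracts):
    """Fraction of pairs with at least one supporting sentence."""
    if not pairs:
        return 0.0
    supported = sum(1 for pair in pairs if supporting_sentences(pair, abstracts))
    return supported / len(pairs)


# Toy input with made-up protein names, not data from the paper.
abstracts = [
    "ProtA binds ProtB in vitro. ProtC was also studied.",
    "ProtD is a scaffold protein.",
]
pairs = [("ProtA", "ProtB"), ("ProtC", "ProtD")]
print(coverage(pairs, abstracts))  # 0.5
```

Exact name matching of this kind fails whenever an abstract uses a synonym or a spelling variant, which is one concrete way the lack of standard protein names can depress coverage.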


Year:  2001        PMID: 18628915      PMCID: PMC2447212          DOI: 10.1002/cfg.91

Source DB:  PubMed          Journal:  Comp Funct Genomics        ISSN: 1531-6912


