Extraction of protein interaction information from unstructured text using a context-free grammar.

Joshua M Temkin, Mark R Gilder.

Abstract

MOTIVATION: As research into disease pathology and cellular function continues to generate vast amounts of data on protein, gene and small molecule (PGSM) interactions, there is a critical need to capture these results in structured formats that allow computational analysis. Although many efforts have been made to create databases that store this information in computer-readable form, populating them still largely requires manually interpreting and extracting interaction relationships from the biological research literature. Efficient and accurate automated extraction of interactions from unstructured text would greatly improve the content of these databases and provide a way to keep pace with the continued growth of newly published literature.
RESULTS: In this paper, we describe a system for extracting PGSM interactions from unstructured text. Using a lexical analyzer and a context-free grammar (CFG), we demonstrate that efficient parsers can be constructed for extracting these relationships from natural language with high rates of recall and precision. The technique achieved a recall of 83.5% and a precision of 93.1% for recognizing PGSM names, and a recall of 63.9% and a precision of 70.2% for extracting interactions between these entities. In contrast to other published techniques, the use of a CFG significantly reduces the complexity of natural language processing by focusing on domain-specific structure rather than analyzing the semantics of the language as a whole. Additionally, our approach provides a level of abstraction that allows new rules to be added for extracting other types of biological relationships beyond PGSM interactions.
AVAILABILITY: The program and corpus are available on request from the authors.
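
The abstract does not give the lexicon or the grammar rules themselves, so the following Python sketch only illustrates the general idea of pairing a lexical analyzer with a small context-free grammar. Everything in it is an assumption made for illustration: the toy word lists (`ENTITIES`, `VERBS`, `PREPOSITIONS`), the function names, and the two productions `interaction -> ENTITY VERB [PREP] entity_list` and `entity_list -> ENTITY | ENTITY AND entity_list`. It is not the authors' system.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical toy lexicons; the real dictionaries are not described in the abstract.
ENTITIES = {"raf1", "mek1", "p53", "mdm2"}
VERBS = {"binds", "phosphorylates", "inhibits", "activates", "interacts"}
PREPOSITIONS = {"with", "to"}

Token = Tuple[str, str]  # (token type, token text)


def tokenize(sentence: str) -> List[Token]:
    """Lexical analysis: tag each word with a coarse, domain-specific token type."""
    tokens: List[Token] = []
    for word in re.findall(r"[A-Za-z0-9-]+", sentence):
        lower = word.lower()
        if lower in ENTITIES:
            tokens.append(("ENTITY", word))
        elif lower in VERBS:
            tokens.append(("VERB", lower))
        elif lower in PREPOSITIONS:
            tokens.append(("PREP", lower))
        elif lower == "and":
            tokens.append(("AND", lower))
        else:
            tokens.append(("OTHER", word))
    return tokens


def parse_entity_list(tokens: List[Token], i: int) -> Tuple[List[str], int]:
    """entity_list -> ENTITY | ENTITY AND entity_list. Returns (names, next index)."""
    if i >= len(tokens) or tokens[i][0] != "ENTITY":
        return [], i
    names = [tokens[i][1]]
    if i + 1 < len(tokens) and tokens[i + 1][0] == "AND":
        rest, j = parse_entity_list(tokens, i + 2)
        if rest:  # only consume "and" when another entity follows it
            return names + rest, j
    return names, i + 1


def parse_interaction(tokens: List[Token], start: int) -> Optional[List[Tuple[str, str, str]]]:
    """interaction -> ENTITY VERB [PREP] entity_list, matched at position `start`."""
    i = start
    if i >= len(tokens) or tokens[i][0] != "ENTITY":
        return None
    agent = tokens[i][1]
    i += 1
    if i >= len(tokens) or tokens[i][0] != "VERB":
        return None
    verb = tokens[i][1]
    i += 1
    if i < len(tokens) and tokens[i][0] == "PREP":
        i += 1  # the preposition is optional
    targets, _ = parse_entity_list(tokens, i)
    if not targets:
        return None
    return [(agent, verb, target) for target in targets]


def extract_interactions(sentence: str) -> List[Tuple[str, str, str]]:
    """Try the grammar at every token position and collect all matches."""
    tokens = tokenize(sentence)
    found: List[Tuple[str, str, str]] = []
    for start in range(len(tokens)):
        match = parse_interaction(tokens, start)
        if match:
            found.extend(match)
    return found
```

For instance, `extract_interactions("Raf1 binds to MEK1 and MDM2")` yields one triple per target, while a sentence with no recognized entity-verb-entity pattern yields an empty list. Adding a new relationship type in this sketch means adding a production and, if needed, a token type, which mirrors the rule-level extensibility the abstract describes.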

Year: 2003    PMID: 14594709    DOI: 10.1093/bioinformatics/btg279

Source DB: PubMed    Journal: Bioinformatics    ISSN: 1367-4803    Impact factor: 6.937

