Wnt pathway curation using automated natural language processing: combining statistical methods with partial and full parse for knowledge extraction.

Carlos Santos, Daniela Eggle, David J States.

Abstract

MOTIVATION: Wnt signaling is a very active area of research, with highly relevant publications appearing at a rate of more than one per day. Building and maintaining databases that describe signal transduction networks is a time-consuming and demanding task that requires careful literature analysis and extensive domain-specific knowledge. For instance, more than 50 factors involved in Wnt signal transduction had been identified as of late 2003. In this work we describe a natural language processing (NLP) system that identifies references to biological interaction networks in free text and automatically assembles a protein association and interaction map.
RESULTS: A 'gold standard' set of names and assertions, including 53 interactions involved in Wnt signaling, was derived by manual scanning of the Wnt genes website (http://www.stanford.edu/~rnusse/wntwindow.html). The system was used to analyze a corpus of peer-reviewed articles related to Wnt signaling, including 3369 PubMed and 1230 full-text papers. Names for key Wnt-pathway-associated proteins and biological entities were identified using a chi-squared analysis of noun phrases over-represented in the Wnt literature as compared to the general signal transduction literature. Interestingly, we identified several instances where generic terms were used on the website when more specific terms occur in the literature, and one typographic error on the Wnt canonical pathway. Using the named entity list and performing an exhaustive assertion extraction of the corpus, 34 of the 53 interactions in the 'gold standard' Wnt signaling set were successfully identified (64% recall). In addition, the automated extraction found several interactions involving key Wnt-related molecules that were missing from, or different from those in, the canonical diagram, and these were confirmed by manual review of the text. These results suggest that a combination of NLP techniques for information extraction can form a useful first-pass tool for assisting human annotation and maintenance of signal pathway databases.
AVAILABILITY: The pipeline software components are freely available on request to the authors.
CONTACT: dstates@umich.edu
SUPPLEMENTARY INFORMATION: http://stateslab.bioinformatics.med.umich.edu/software.html
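
The chi-squared term selection and the recall figure reported in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the toy phrase counts, and the 3.841 cutoff (the 0.05 critical value of the chi-squared distribution with one degree of freedom) are assumptions made for illustration only.

```python
def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 contingency table.

    a: count of the phrase in the domain corpus
    b: count of all other phrases in the domain corpus
    c: count of the phrase in the background corpus
    d: count of all other phrases in the background corpus
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def overrepresented_phrases(domain_counts, background_counts, threshold=3.841):
    """Return (phrase, score) pairs that are over-represented in the domain
    corpus relative to the background corpus, highest score first.

    The threshold is an assumed cutoff, not a value from the paper.
    """
    domain_total = sum(domain_counts.values())
    background_total = sum(background_counts.values())
    scored = []
    for phrase, a in domain_counts.items():
        c = background_counts.get(phrase, 0)
        # Skip phrases whose relative frequency is not higher in the domain.
        if a * background_total <= c * domain_total:
            continue
        score = chi_squared_2x2(a, domain_total - a, c, background_total - c)
        if score >= threshold:
            scored.append((phrase, score))
    return sorted(scored, key=lambda item: item[1], reverse=True)


def recall(found, gold):
    """Fraction of gold-standard items that were also found."""
    gold = set(gold)
    return len(set(found) & gold) / len(gold)


if __name__ == "__main__":
    # Hypothetical noun-phrase counts, invented for this example.
    wnt_counts = {"beta-catenin": 40, "frizzled": 25, "protein": 100, "cell": 135}
    general_counts = {"beta-catenin": 5, "frizzled": 2, "protein": 400, "cell": 593}
    for phrase, score in overrepresented_phrases(wnt_counts, general_counts):
        print(phrase, round(score, 1))

    # 34 of 53 gold-standard interactions recovered, as stated in the abstract.
    print(round(100 * recall(range(34), range(53))), "% recall")
```

In this toy setup, generic phrases such as "protein" and "cell" are filtered out because they are relatively more frequent in the background corpus, while the domain-specific names pass the cutoff.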

Year:  2004        PMID: 15564295     DOI: 10.1093/bioinformatics/bti165

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  13 in total

1.  A cheminformatic toolkit for mining biomedical knowledge. (Review)

Authors:  Gus R Rosania; Gordon Crippen; Peter Woolf; David States; Kerby Shedden
Journal:  Pharm Res       Date:  2007-03-24       Impact factor: 4.200

2.  BSQA: integrated text mining using entity relation semantics extracted from biological literature of insects.

Authors:  Xin He; Yanen Li; Radhika Khetani; Barry Sanders; Yue Lu; Xu Ling; Chengxiang Zhai; Bruce Schatz
Journal:  Nucleic Acids Res       Date:  2010-07       Impact factor: 16.971

3.  A text-mining system for extracting metabolic reactions from full-text articles.

Authors:  Jan Czarnecki; Irene Nobeli; Adrian M Smith; Adrian J Shepherd
Journal:  BMC Bioinformatics       Date:  2012-07-23       Impact factor: 3.169

4.  What the papers say: text mining for genomics and systems biology. (Review)

Authors:  Nathan Harmston; Wendy Filsell; Michael P H Stumpf
Journal:  Hum Genomics       Date:  2010-10       Impact factor: 4.639

5.  Automatic pathway building in biological association networks.

Authors:  Anton Yuryev; Zufar Mulyukov; Ekaterina Kotelnikova; Sergei Maslov; Sergei Egorov; Alexander Nikitin; Nikolai Daraselia; Ilya Mazo
Journal:  BMC Bioinformatics       Date:  2006-03-24       Impact factor: 3.169

6.  Argument-predicate distance as a filter for enhancing precision in extracting predications on the genetic etiology of disease.

Authors:  Marco Masseroli; Halil Kilicoglu; François-Michel Lang; Thomas C Rindflesch
Journal:  BMC Bioinformatics       Date:  2006-06-08       Impact factor: 3.169

7.  Biomedical text mining and its applications.

Authors:  Raul Rodriguez-Esteban
Journal:  PLoS Comput Biol       Date:  2009-12-24       Impact factor: 4.475

8.  PathBinder--text empirics and automatic extraction of biomolecular interactions.

Authors:  Lifeng Zhang; Daniel Berleant; Jing Ding; Tuan Cao; Eve Syrkin Wurtele
Journal:  BMC Bioinformatics       Date:  2009-10-08       Impact factor: 3.169

9.  New challenges for text mining: mapping between text and manually curated pathways.

Authors:  Kanae Oda; Jin-Dong Kim; Tomoko Ohta; Daisuke Okanohara; Takuya Matsuzaki; Yuka Tateisi; Jun'ichi Tsujii
Journal:  BMC Bioinformatics       Date:  2008-04-11       Impact factor: 3.169

10.  Public databases and software for the pathway analysis of cancer genomes.

Authors:  Ivy F L Tsui; Raj Chari; Timon P H Buys; Wan L Lam
Journal:  Cancer Inform       Date:  2007-12-12