Measuring prediction capacity of individual verbs for the identification of protein interactions.

Dietrich Rebholz-Schuhmann, Antonio Jimeno-Yepes, Miguel Arregui, Harald Kirsch.

Abstract

MOTIVATION: The identification of events such as protein-protein interactions (PPIs) from the scientific literature is a complex task. One reason is that there is no formal syntax to denote such relations in the scientific literature. Nonetheless, it is important to understand such relational event representations to improve information extraction solutions (e.g., for gene regulatory events). In this study, we analyze publicly available protein interaction corpora (AIMed, BioInfer, BioCreAtIve II) to determine the scope of verbs used to denote protein interactions and to measure their predictive capacity for the identification of PPI events. Our analysis is based on syntactic language patterns. This restriction has the advantage that the verb mention is used as the independent variable in the experiments, which makes the results comparable across verbs. The initial selection of verbs was generated from a systematic analysis of the scientific literature and existing corpora for PPIs. We distinguish modifying interactions (MIs), such as posttranslational modifications (PTMs), from non-modifying interactions (NMIs) and assume that MIs have a higher predictive capacity due to stronger scientific evidence proving the interaction. We found that MIs are less frequent in the corpus but can be extracted at the same precision levels as PPIs. A significant portion of correct PPI reports in the BioCreAtIve II corpus use the verb "associate", which semantically does not prove a relation. The performance of every monitored verb is listed, which allows the selection of specific verbs to improve the performance of PPI extraction solutions. Programmatic access to the text processing modules is available online (www.ebi.ac.uk/webservices/whatizit/info.jsf) and the full analysis of Medline abstracts will be made available through the Web pages of the Rebholz group. 2009 Elsevier Inc. All rights reserved.
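The idea of treating the verb as the independent variable can be illustrated with a minimal sketch. It assumes that each pattern match has already been reduced to a pair of the verb and a flag saying whether the corpus annotates a true interaction, and that predictive capacity is read as per-verb precision (correct matches divided by all matches). The function name `verb_precision` and the toy data are hypothetical; the abstract does not describe the authors' actual patterns or evaluation code.

```python
from collections import defaultdict


def verb_precision(candidates):
    """Return per-verb precision from (verb, is_true_interaction) pairs.

    Assumption: precision = correct matches / all matches for that verb.
    """
    counts = defaultdict(lambda: [0, 0])  # verb -> [correct matches, all matches]
    for verb, is_true in candidates:
        counts[verb][1] += 1
        if is_true:
            counts[verb][0] += 1
    return {verb: correct / total for verb, (correct, total) in counts.items()}


# Hypothetical toy data, not taken from AIMed, BioInfer, or BioCreAtIve II.
toy_candidates = [
    ("phosphorylate", True),
    ("phosphorylate", True),
    ("phosphorylate", False),
    ("associate", True),
    ("associate", False),
    ("bind", True),
]

precision = verb_precision(toy_candidates)
for verb, value in sorted(precision.items()):
    print(f"{verb}: {value:.2f}")
```

Ranking verbs by such a score is one way a verb list could be pruned before it is used in an extraction pipeline.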

Year:  2009        PMID: 19818874     DOI: 10.1016/j.jbi.2009.09.007

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  7 in total

1.  Interaction relation ontology learning.

Authors:  Chuan-Xi Li; Ru-Jing Wang; Peng Chen; He Huang; Ya-Ru Su
Journal:  J Comput Biol       Date:  2014-01       Impact factor: 1.479

2.  PaperMaker: validation of biomedical scientific publications.

Authors:  D Rebholz-Schuhmann; S Kavaliauskas; P Pezik
Journal:  Bioinformatics       Date:  2010-03-03       Impact factor: 6.937

3.  Classifying protein-protein interaction articles using word and syntactic features.

Authors:  Sun Kim; W John Wilbur
Journal:  BMC Bioinformatics       Date:  2011-10-03       Impact factor: 3.169

4.  A generalizable NLP framework for fast development of pattern-based biomedical relation extraction systems.

Authors:  Yifan Peng; Manabu Torii; Cathy H Wu; K Vijay-Shanker
Journal:  BMC Bioinformatics       Date:  2014-08-23       Impact factor: 3.169

5.  Triage by ranking to support the curation of protein interactions.

Authors:  Luc Mottin; Emilie Pasche; Julien Gobeill; Valentine Rech de Laval; Anne Gleizes; Pierre-André Michel; Amos Bairoch; Pascale Gaudet; Patrick Ruch
Journal:  Database (Oxford)       Date:  2017-01-01       Impact factor: 3.451

6.  Natural language processing in text mining for structural modeling of protein complexes.

Authors:  Varsha D Badal; Petras J Kundrotas; Ilya A Vakser
Journal:  BMC Bioinformatics       Date:  2018-03-05       Impact factor: 3.169

7.  PCorral--interactive mining of protein interactions from MEDLINE.

Authors:  Chen Li; Antonio Jimeno-Yepes; Miguel Arregui; Harald Kirsch; Dietrich Rebholz-Schuhmann
Journal:  Database (Oxford)       Date:  2013-05-02       Impact factor: 3.451
