
Self-training in significance space of support vectors for imbalanced biomedical event data.

Tsendsuren Munkhdalai, Oyun-Erdene Namsrai, Keun Ryu.   

Abstract

BACKGROUND: Pairwise relationships extracted from biomedical literature are insufficient for formulating biomolecular interactions. Extraction of complex relations (namely, biomedical events) has therefore become the main focus of the text-mining community. However, two critical issues are seldom addressed by existing systems. First, the annotated corpus used to train a prediction model is highly imbalanced. Second, supervised models trained on only a single annotated corpus can limit system performance. Fortunately, a large pool of unlabeled data containing much of the domain background is available to exploit.
RESULTS: In this study, we develop a new semi-supervised learning method to address both issues. The proposed algorithm efficiently exploits the unlabeled data to improve system performance. We further extend the algorithm to a two-phase learning framework: the first phase balances the training data for initial model induction, and the second phase incorporates domain knowledge into the event extraction model. We evaluate the method on the Genia event extraction corpus and a PubMed document pool. The method identifies a small subset of the majority class that is sufficient for building a well-generalized prediction model, and it outperforms the traditional self-training algorithm in terms of F-measure. Our model, trained on the training data and the unlabeled data pool, achieves performance comparable to state-of-the-art systems that are trained on a larger annotated set consisting of training and evaluation data.
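
ILLUSTRATION: The abstract does not specify how the significance space of support vectors is built, so the following is only a minimal sketch of the traditional self-training baseline that the method is compared against, together with the F-measure used for evaluation. A toy one-dimensional nearest-centroid learner stands in for the SVM, and every name and parameter here (f_measure, fit_centroids, predict, self_train, min_margin, max_rounds) is a hypothetical choice for illustration, not the authors' implementation.

```python
from statistics import mean


def f_measure(gold, pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == positive and p == positive)
    fp = sum(1 for g, p in zip(gold, pred) if g != positive and p == positive)
    fn = sum(1 for g, p in zip(gold, pred) if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def fit_centroids(xs, ys):
    """Toy base learner (assumption: stands in for the SVM): one centroid per class."""
    return {c: mean(x for x, y in zip(xs, ys) if y == c) for c in set(ys)}


def predict(centroids, x):
    """Return (label, margin), where margin is how much closer the nearest
    centroid is than the second nearest; it serves as a confidence score."""
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(xs, ys, pool, min_margin=1.0, max_rounds=10):
    """Traditional self-training: repeatedly retrain, then move confidently
    predicted unlabeled points from the pool into the training set."""
    xs, ys, pool = list(xs), list(ys), list(pool)
    for _ in range(max_rounds):
        model = fit_centroids(xs, ys)
        scored = [(x, predict(model, x)) for x in pool]
        confident = [(x, label) for x, (label, margin) in scored
                     if margin >= min_margin]
        if not confident:
            break  # nothing left that the model is confident about
        for x, label in confident:
            xs.append(x)
            ys.append(label)
            pool.remove(x)
    return fit_centroids(xs, ys)


# Imbalanced toy data: two examples of class 0, one of class 1.
model = self_train([0.0, 1.0, 9.0], [0, 0, 1], [0.5, 2.0, 8.0, 10.0, 5.2])
```

In this toy run the ambiguous pool point 5.2 never reaches the confidence threshold and stays unlabeled, which is the usual safeguard against reinforcing early mistakes in self-training.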

Year:  2015        PMID: 25952719      PMCID: PMC4423724          DOI: 10.1186/1471-2105-16-S7-S6

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


