
A resource-saving collective approach to biomedical semantic role labeling.

Richard Tzong-Han Tsai, Po-Ting Lai.

Abstract

BACKGROUND: Biomedical semantic role labeling (BioSRL) is a natural language processing technique that identifies the semantic roles of the words or phrases in sentences describing biological processes and expresses them as predicate-argument structures (PASs). Currently, a major problem of BioSRL is that most systems label every node in a full parse tree independently, even though some nodes always depend on one another. In general SRL, collective approaches based on the Markov logic network (MLN) model have been successful in dealing with this problem. In BioSRL, however, such an approach has not been attempted, because it would require more training data to recognize the more specialized and diverse terms found in biomedical literature, which increases training time and computational complexity.
RESULTS: We first constructed a collective BioSRL system based on MLN. This system, called collective BIOSMILE (CBIOSMILE), is trained on the BioProp corpus. To reduce the resources used in BioSRL training, we employ a tree-pruning filter to remove unlikely nodes from the parse tree and four argument candidate identifiers to retain candidate nodes in the tree. Nodes not recognized by any candidate identifier are discarded. The pruned annotated parse trees are then used to train a resource-saving MLN-based system, referred to as resource-saving collective BIOSMILE (RCBIOSMILE). Our experimental results show that the proposed CBIOSMILE system outperforms BIOSMILE, which is the top BioSRL system. Furthermore, the proposed RCBIOSMILE maintains the same level of accuracy as CBIOSMILE while using 92% less memory and 57% less training time.
CONCLUSIONS: This greatly improved efficiency makes RCBIOSMILE potentially suitable for training on much larger BioSRL corpora over more biomedical domains. Compared to real-world biomedical corpora, BioProp is relatively small, containing only 445 MEDLINE abstracts and 30 event triggers. It is not large enough for practical applications, such as pathway construction. We consider it of primary importance to pursue SRL training on large corpora in the future.
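
The candidate-selection step described under RESULTS can be illustrated with a minimal sketch. The abstract does not specify the pruning filter or the four candidate identifiers, so everything below is a hypothetical stand-in: the `Node` class, the `select_candidates` helper, and the toy rules (two toy identifiers rather than four) are assumptions for illustration only, not the authors' implementation. The sketch also assumes that a node is kept only if it passes the filter and is recognized by at least one identifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List


@dataclass
class Node:
    """A parse-tree constituent (hypothetical structure, for illustration)."""
    label: str                      # constituent label, e.g. "NP"
    text: str                       # surface text covered by the node
    children: List["Node"] = field(default_factory=list)


def walk(node: Node) -> Iterator[Node]:
    """Yield the node and all of its descendants in pre-order."""
    yield node
    for child in node.children:
        yield from walk(child)


def select_candidates(
    root: Node,
    pruning_filter: Callable[[Node], bool],
    identifiers: List[Callable[[Node], bool]],
) -> List[Node]:
    """Keep nodes that pass the pruning filter AND are recognized by at
    least one candidate identifier; all other nodes are discarded."""
    return [
        n for n in walk(root)
        if pruning_filter(n) and any(ident(n) for ident in identifiers)
    ]


# Toy rules (assumptions, not the rules used in the paper).
def toy_filter(n: Node) -> bool:
    return n.label not in {"PUNCT", "DT"}


toy_identifiers = [
    lambda n: n.label == "NP",      # noun phrases as argument candidates
    lambda n: n.label == "PP",      # prepositional phrases as candidates
]

tree = Node("S", "IL-2 activates STAT5 .", [
    Node("NP", "IL-2"),
    Node("VP", "activates STAT5", [
        Node("VBZ", "activates"),
        Node("NP", "STAT5"),
    ]),
    Node("PUNCT", "."),
])

kept = select_candidates(tree, toy_filter, toy_identifiers)
print([n.text for n in kept])       # ['IL-2', 'STAT5']
```

Only the surviving nodes would be passed on to training, which is where the reported memory and training-time savings would come from under this reading.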

Year:  2014        PMID: 24884358      PMCID: PMC4062501          DOI: 10.1186/1471-2105-15-160

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


