Porting a lexicalized-grammar parser to the biomedical domain.

Laura Rimell, Stephen Clark.

Abstract

This paper introduces a state-of-the-art, linguistically motivated statistical parser to the biomedical text mining community, and proposes a method of adapting it to the biomedical domain requiring only limited resources for data annotation. The parser was originally developed using the Penn Treebank and is therefore tuned to newspaper text. Our approach takes advantage of a lexicalized grammar formalism, Combinatory Categorial Grammar (CCG), to train the parser at a lower level of representation than full syntactic derivations. The CCG parser uses three levels of representation: a first level consisting of part-of-speech (POS) tags; a second level consisting of more fine-grained CCG lexical categories; and a third, hierarchical level consisting of CCG derivations. We find that simply retraining the POS tagger on biomedical data leads to a large improvement in parsing performance, and that using annotated data at the intermediate lexical category level of representation improves parsing accuracy further. We describe the procedure involved in evaluating the parser, and obtain accuracies for biomedical data in the same range as those reported for newspaper text, and higher than those previously reported for the biomedical resource on which we evaluate. Our conclusion is that porting newspaper parsers to the biomedical domain, at least for parsers which use lexicalized grammars, may not be as difficult as first thought.
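The three levels of representation named in the abstract can be illustrated with a toy example. The following is a minimal Python sketch under stated assumptions, not the authors' parser: the sentence, the simplified category inventory (plain `S` and `NP`, with no CCGbank features and no N-to-NP unary rule), and the helper names `Functor`, `forward_apply`, `backward_apply`, `combine`, and `cky` are all hypothetical. Only the two function-application combinators are implemented, and the statistical model that ranks competing derivations is omitted. The sketch shows how much information the lexical category level carries: once each word has its category, a derivation follows from a few combinatory rules.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple, Union


@dataclass(frozen=True)
class Functor:
    """A complex CCG category such as (S\\NP)/NP (hypothetical helper)."""
    result: "Category"
    slash: str  # "/" takes its argument from the right, "\\" from the left
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


# An atomic category is a plain string such as "S" or "NP" (simplified: no features).
Category = Union[str, Functor]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application: X/Y  Y  =>  X."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application: Y  X\\Y  =>  X."""
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def combine(left: Category, right: Category) -> Optional[Category]:
    """Try each combinator in this application-only sketch in turn."""
    result = forward_apply(left, right)
    if result is None:
        result = backward_apply(left, right)
    return result


def cky(categories: List[Category]) -> Set[Category]:
    """Fill a chart bottom-up; return the categories spanning the whole sentence."""
    n = len(categories)
    chart: Dict[Tuple[int, int], Set[Category]] = {
        (i, i + 1): {categories[i]} for i in range(n)
    }
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cell: Set[Category] = set()
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        result = combine(left, right)
                        if result is not None:
                            cell.add(result)
            chart[(i, j)] = cell
    return chart[(0, n)]


# Level 1: tokens and POS tags (toy example, hand-assigned).
tokens = ["IL-2", "activates", "STAT5"]
pos_tags = ["NN", "VBZ", "NN"]

# Level 2: lexical categories (hand-assigned here instead of predicted by a supertagger).
transitive_verb = Functor(Functor("S", "\\", "NP"), "/", "NP")  # (S\NP)/NP
categories: List[Category] = ["NP", transitive_verb, "NP"]

# Level 3: the derivation follows from the combinatory rules.
vp = forward_apply(categories[1], categories[2])  # (S\NP)/NP  NP  =>  S\NP
sentence = backward_apply(categories[0], vp)      # NP  S\NP  =>  S
print(vp, sentence)        # (S\NP) S
print(cky(categories))     # {'S'}
```

In this sketch, ambiguity could only enter through the category assignments at level 2, which is one way to see why annotated data at the lexical category level can be useful without annotating full derivations.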

Year:  2008        PMID: 19141332     DOI: 10.1016/j.jbi.2008.12.004

Source DB:  PubMed          Journal:  J Biomed Inform        ISSN: 1532-0464            Impact factor:   6.317


  8 in total

1.  Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations.

Authors:  Yuan Luo; Özlem Uzuner; Peter Szolovits
Journal:  Brief Bioinform       Date:  2016-02-05       Impact factor: 11.622

2.  Domain adaption of parsing for operative notes.

Authors:  Yan Wang; Serguei Pakhomov; James O Ryan; Genevieve B Melton
Journal:  J Biomed Inform       Date:  2015-02-07       Impact factor: 6.317

3.  Exploring subdomain variation in biomedical language.

Authors:  Thomas Lippincott; Diarmuid Ó Séaghdha; Anna Korhonen
Journal:  BMC Bioinformatics       Date:  2011-05-27       Impact factor: 3.169

4.  Complex event extraction at PubMed scale.

Authors:  Jari Björne; Filip Ginter; Sampo Pyysalo; Jun'ichi Tsujii; Tapio Salakoski
Journal:  Bioinformatics       Date:  2010-06-15       Impact factor: 6.937

5.  Automatic recognition of conceptualization zones in scientific articles and two life science applications.

Authors:  Maria Liakata; Shyamasree Saha; Simon Dobnik; Colin Batchelor; Dietrich Rebholz-Schuhmann
Journal:  Bioinformatics       Date:  2012-02-08       Impact factor: 6.937

6.  Applications of natural language processing in biodiversity science.

Authors:  Anne E Thessen; Hong Cui; Dmitry Mozzherin
Journal:  Adv Bioinformatics       Date:  2012-05-22

7.  Making adjustments to event annotations for improved biological event extraction.

Authors:  Seung-Cheol Baek; Jong C Park
Journal:  J Biomed Semantics       Date:  2016-09-16

8.  Text mining for improved exposure assessment.

Authors:  Kristin Larsson; Simon Baker; Ilona Silins; Yufan Guo; Ulla Stenius; Anna Korhonen; Marika Berglund
Journal:  PLoS One       Date:  2017-03-03       Impact factor: 3.240


北京卡尤迪生物科技股份有限公司 © 2022-2023.