Porting a lexicalized-grammar parser to the biomedical domain.

Abstract

This paper introduces a state-of-the-art, linguistically motivated statistical parser to the biomedical text mining community, and proposes a method of adapting it to the biomedical domain requiring only limited resources for data annotation. The parser was originally developed using the Penn Treebank and is therefore tuned to newspaper text. Our approach takes advantage of a lexicalized grammar formalism, Combinatory Categorial Grammar (CCG), to train the parser at a lower level of representation than full syntactic derivations. The CCG parser uses three levels of representation: a first level consisting of part-of-speech (POS) tags; a second level consisting of more fine-grained CCG lexical categories; and a third, hierarchical level consisting of CCG derivations. We find that simply retraining the POS tagger on biomedical data leads to a large improvement in parsing performance, and that using annotated data at the intermediate lexical category level of representation improves parsing accuracy further. We describe the procedure involved in evaluating the parser, and obtain accuracies for biomedical data in the same range as those reported for newspaper text, and higher than those previously reported for the biomedical resource on which we evaluate. Our conclusion is that porting newspaper parsers to the biomedical domain, at least for parsers which use lexicalized grammars, may not be as difficult as first thought.
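The three levels of representation named in the abstract can be illustrated with a toy example. The sketch below is not the parser described in the paper: the sentence, the tag and category assignments, and the names `Atom`, `Functor`, `combine`, and `derive` are all assumptions made for illustration, and the greedy reduction stands in for a real statistical parsing model. It only shows how POS tags (level 1) and CCG lexical categories (level 2) feed a derivation (level 3) built with forward and backward application.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atom:
    """An atomic CCG category such as S or NP."""
    name: str

    def __str__(self) -> str:
        return self.name


@dataclass(frozen=True)
class Functor:
    """A complex category: result/arg seeks arg to the right, result\\arg to the left."""
    result: "Category"
    slash: str
    arg: "Category"

    def __str__(self) -> str:
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atom, Functor]

S, NP = Atom("S"), Atom("NP")
VP = Functor(S, "\\", NP)   # S\NP: a verb phrase seeking its subject on the left
TV = Functor(VP, "/", NP)   # (S\NP)/NP: a transitive verb


def combine(left: Category, right: Category) -> Optional[Category]:
    """Apply forward (X/Y Y => X) or backward (Y X\\Y => X) application."""
    if isinstance(left, Functor) and left.slash == "/" and left.arg == right:
        return left.result
    if isinstance(right, Functor) and right.slash == "\\" and right.arg == left:
        return right.result
    return None


def derive(categories) -> Optional[Category]:
    """Greedily reduce adjacent categories; a toy stand-in for a real parser."""
    items = list(categories)
    while len(items) > 1:
        for i in range(len(items) - 1):
            combined = combine(items[i], items[i + 1])
            if combined is not None:
                items[i:i + 2] = [combined]
                break
        else:
            return None  # no adjacent pair combines
    return items[0] if items else None


# Hypothetical annotation: (word, level-1 POS tag, level-2 lexical category).
sentence = [
    ("IL-2", "NN", NP),
    ("activates", "VBZ", TV),
    ("lymphocytes", "NNS", NP),
]

if __name__ == "__main__":
    for word, pos, cat in sentence:
        print(f"{word}\t{pos}\t{cat}")
    # Level 3: the derivation reduces the category sequence to a single category.
    print("derivation result:", derive(cat for _, _, cat in sentence))
```

In this picture, annotating data at level 2 means assigning one category per word, which is a flatter and cheaper task than drawing the full level-3 derivation for each sentence.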

Year: 2008; PMID: 19141332; DOI: 10.1016/j.jbi.2008.12.004

Source DB: PubMed; Journal: J Biomed Inform; ISSN: 1532-0464; Impact factor: 6.317

Cited (8 in total)

1. Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations.

2. Domain adaption of parsing for operative notes.

3. Exploring subdomain variation in biomedical language.

4. Complex event extraction at PubMed scale.

5. Automatic recognition of conceptualization zones in scientific articles and two life science applications.

6. Applications of natural language processing in biodiversity science.

7. Making adjustments to event annotations for improved biological event extraction.

8. Text mining for improved exposure assessment.