Yifan Peng, Catalina O Tudor, Manabu Torii, Cathy H Wu, K Vijay-Shanker.
Abstract
This article reports the use of the BioC standard format in our sentence simplification system, iSimp, and demonstrates its general utility. iSimp is designed to simplify the complex sentences commonly found in biomedical text, and it has been shown to improve existing text mining applications that rely on the analysis of sentence structure. By adopting the BioC format, we aim to make iSimp readily interoperable with other applications in the biomedical domain. To examine the utility of iSimp in BioC, we implemented a rule-based relation extraction system that uses iSimp as a preprocessing module and BioC for data exchange. Evaluation on the training corpus of the BioNLP-ST 2011 GENIA Event Extraction (GE) task showed that iSimp sentence simplification improved recall by 3.2% without reducing precision. The iSimp simplification-annotated corpora, both our previously used corpus and the GE corpus in the current study, have been converted into the BioC format and made publicly available at the project's Web site: http://research.bioinformatics.udel.edu/isimp/. Database URL: http://research.bioinformatics.udel.edu/isimp/
Year: 2014 PMID: 24850848 PMCID: PMC4028706 DOI: 10.1093/database/bau038
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. NLP pipeline with iSimp.
Figure 2. The workflow of iSimp.
Figure 3. The key file used in iSimp to define the simplification constructs associated with the data.
Figure 4. The key file used in iSimp to define the simplified sentences associated with the data.
Figure 5. An example of sentence simplification annotation in BioC format. The coordination contains two conjuncts (‘phosphorylates’, ‘activates’) and one conjunction (‘and’). Some attributes, such as the location elements, are not shown to save space.
Figure 6. An example of simplified sentences in BioC format (left) and the corresponding text file (right) with locations highlighted.
Figure 7. An example showing ‘equ’ (equivalence) relations in an iSimp-generated BioC file.
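The figures themselves are not reproduced here, so the following is only a minimal sketch of how a coordination like the one in the Figure 5 caption might be encoded with standard BioC elements (`passage`, `annotation`, `location`, `relation`, `node`). The example sentence, the identifiers `T1`–`T3` and `R1`, the infon key `type`, and the role names are assumptions made for illustration; the actual labels are defined by the iSimp key files and may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sentence; only the three highlighted words come from the caption.
SENTENCE = "Kinase A phosphorylates and activates protein B."

# (annotation id, assumed role label, covered word)
PARTS = [
    ("T1", "conjunct", "phosphorylates"),
    ("T2", "conjunction", "and"),
    ("T3", "conjunct", "activates"),
]


def add_annotation(passage, ann_id, ann_type, offset, text):
    """Append a BioC <annotation> covering `text` at character `offset`."""
    ann = ET.SubElement(passage, "annotation", id=ann_id)
    ET.SubElement(ann, "infon", key="type").text = ann_type
    ET.SubElement(ann, "location", offset=str(offset), length=str(len(text)))
    ET.SubElement(ann, "text").text = text
    return ann


def build_passage(sentence, parts):
    """Build a <passage> with one annotation per part and one relation linking them."""
    passage = ET.Element("passage")
    ET.SubElement(passage, "offset").text = "0"
    ET.SubElement(passage, "text").text = sentence
    for ann_id, role, word in parts:
        # str.index finds the first occurrence, which is enough for this sentence.
        add_annotation(passage, ann_id, role, sentence.index(word), word)
    relation = ET.SubElement(passage, "relation", id="R1")
    ET.SubElement(relation, "infon", key="type").text = "coordination"
    for ann_id, role, _ in parts:
        ET.SubElement(relation, "node", refid=ann_id, role=role)
    return passage


passage = build_passage(SENTENCE, PARTS)
print(ET.tostring(passage, encoding="unicode"))
```

Because each annotation stores an offset and length into the passage text, a downstream tool can recover the annotated spans without re-tokenizing, which is what makes stand-off formats of this kind convenient for data exchange between pipeline components.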