Richard Tzong-Han Tsai, Hong-Jie Dai, Chi-Hsin Huang, Wen-Lian Hsu.
Abstract
BACKGROUND: Semantic role labeling (SRL) is an important text analysis technique. In SRL, sentences are represented by one or more predicate-argument structures (PAS). Each PAS is composed of a predicate (verb) and several arguments (noun phrases, adverbial phrases, etc.) with different semantic roles, including main arguments (agent or patient) as well as adjunct arguments (time, manner, or location). PropBank is the most widely used PAS corpus and annotation format in the newswire domain. In the biomedical field, however, more detailed and restrictive PAS annotation formats such as PASBio are popular. Unfortunately, because no annotated PASBio corpus exists, no publicly available machine-learning (ML) based SRL system that uses PASBio has been developed. In previous work, we constructed BioProp, a biomedical corpus based on the PropBank standard, and used it to develop an ML-based SRL system, BIOSMILE. In this paper, we aim to build a system that converts BIOSMILE's BioProp annotation output to PASBio annotation. Our system consists of BIOSMILE in combination with a BioProp-PASBio rule-based converter and an additional semi-automatic rule generator.
Year: 2008 PMID: 19091017 PMCID: PMC2638158 DOI: 10.1186/1471-2105-9-S12-S18
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. A parse tree annotated with semantic roles.
Frameset of verb "delete" in PropBank I and PASBio
| Predicate: delete | | |
| Argument | PropBank I | PASBio |
| Arg0 | entity removing | causer mechanism //mutation, alternative splicing// |
| Arg1 | thing being removed | entity being removed //exon, gene, chromosomal region, cell// |
| Arg2 | removed from | resultant product //transcripts// |
Figure 2. Screenshot of the rule-generation tool.
Figure 3. Conversion rule for the verb "express" for Figure 2.
Figure 4. Multiple overlap for the verb "express".
The frameset of the verb "express" in BioProp and PASBio
| Predicate: express | | |
| Argument | BioProp | PASBio |
| Arg0 | causer of expression | no definition |
| Arg1 | thing expressing | named entity being expressed //gene or gene products// |
| Arg2 | end state | property of the existing named entity [Arg1] |
| Arg3 | start state | location referring to organelle, cell or tissue |
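Because the two framesets assign different meanings to the same argument labels, a converter has to relabel arguments predicate by predicate. The following is a minimal sketch of how such a rule table could be represented and applied. It is not the converter described in the paper: the `Argument`, `Rule`, and `convert` names, the two example rules, the example arguments, and the choice to drop arguments that match no rule are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Argument:
    role: str  # role label, e.g. "Arg1" or "ArgM-LOC"
    text: str  # the text span filling the role


@dataclass(frozen=True)
class Rule:
    predicate: str    # verb the rule applies to
    source_role: str  # role label in the source (BioProp-style) annotation
    target_role: str  # role label in the target (PASBio-style) annotation
    # Optional extra test on the argument; None means "always applies".
    condition: Optional[Callable[[Argument], bool]] = None


def convert(predicate: str, arguments: List[Argument], rules: List[Rule]) -> List[Argument]:
    """Relabel each argument using the first rule that matches it."""
    converted = []
    for arg in arguments:
        for rule in rules:
            if (rule.predicate == predicate
                    and rule.source_role == arg.role
                    and (rule.condition is None or rule.condition(arg))):
                converted.append(Argument(rule.target_role, arg.text))
                break
        # Assumption: an argument with no matching rule is left out of the output.
    return converted


# Hypothetical rules for "express"; illustrative only, not taken from the paper.
EXAMPLE_RULES = [
    Rule("express", "Arg1", "Arg1"),
    Rule("express", "ArgM-LOC", "Arg3", condition=lambda a: a.text.startswith("in ")),
]

# Hypothetical source annotation for an invented sentence.
source_args = [Argument("Arg1", "IL-2"), Argument("ArgM-LOC", "in T cells")]
for arg in convert("express", source_args, EXAMPLE_RULES):
    print(arg.role, "|", arg.text)
```

Keeping the rules as data rather than as code is one way to let a rule generator add or revise entries without touching the conversion logic.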
Rule-based converter performance (on PASBiop)
| Argument Type | Precision | Recall | F-score |
| Arg0 | 86.36 | 92.36 | 89.26 |
| Arg1 | 90.04 | 87.85 | 88.93 |
| Arg2 | 88.03 | 70.55 | 78.33 |
| Arg3 | 90.00 | 64.29 | 75.00 |
| Arg4 | 66.67 | 54.54 | 60.00 |
| ArgM-MNR | 88.89 | 100.00 | 94.12 |
| ArgM-MOD | 100.00 | 100.00 | 100.00 |
| ArgM-NEG | 100.00 | 100.00 | 100.00 |
| ArgR | 75.00 | 33.33 | 46.15 |
| Overall | 88.55 | 82.27 | 85.29 |
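The F-score column is consistent with the balanced F-measure, the harmonic mean of precision and recall. The short check below recomputes two rows of the table from their precision and recall values. The function name `f_score` is a hypothetical helper, and the assumption that the table uses the balanced (F1) form is inferred from the numbers rather than stated in the text.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the rule-based converter table.
rows = {
    "Arg0": (86.36, 92.36),
    "Overall": (88.55, 82.27),
}

for name, (p, r) in rows.items():
    print(f"{name}: {f_score(p, r):.2f}")
# Arg0: 89.26
# Overall: 85.29
```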
Combined system performance
| Argument Type | Precision | Recall | F-score |
| Arg0 | 79.49 | 64.58 | 71.26 |
| Arg1 | 79.65 | 63.89 | 70.91 |
| Arg2 | 87.80 | 49.32 | 63.16 |
| Arg3 | 95.65 | 39.29 | 55.70 |
| Arg4 | 100.00 | 45.45 | 62.50 |
| ArgM-MNR | 88.89 | 100.00 | 94.12 |
| ArgM-MOD | 100.00 | 100.00 | 100.00 |
| ArgM-NEG | 100.00 | 100.00 | 100.00 |
| ArgR | 100.00 | 22.22 | 36.36 |
| Overall | 82.85 | 59.23 | 69.08 |
The frameset of the verb "inhibit" in PASBio and BioProp
| Argument | BioProp |
| Arg0 | inhibitor |
| Arg1 | entity inhibited |
| Argument | PASBio |
| Arg0 | agent |
| Arg1 | the entity being inhibited by agent to get binding |
| Arg2 | the action or property being inhibited |
Figure 5. Coordination ambiguity.
Figure 6. Correlation between BIOSMILE and combined system performance.