Quang Long Nguyen, Domonkos Tikk, Ulf Leser.
Abstract
BACKGROUND: Pattern-based approaches to relation extraction have shown very good results in many areas of biomedical text mining. However, defining the right set of patterns is difficult; approaches are either manual, incurring high cost, or automatic, often resulting in large sets of noisy patterns.
Year: 2010 PMID: 20868467 PMCID: PMC2955645 DOI: 10.1186/2041-1480-1-9
Source DB: PubMed Journal: J Biomed Semantics
A simplified example of an initial pattern of the Ali Baba system (derived from a sentence from document 1520341 of the BioNLP task).
| Token layer | CD19 | protein | is | expressed |
|---|---|---|---|---|
| | PTN | protein | be | express |
| | PTN | NN | VBZ | GEE |
PTN indicates protein name; VBZ, verb in present tense; NN, singular noun; GEE, gene expression event trigger.
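One way to picture the layered pattern above is as a list of positions, each carrying one value per layer. The following is a minimal sketch only: the class name `PatternPosition`, the field names `normalized` and `tag`, and the exact-match rule on the tag layer are assumptions made for illustration and are not taken from the Ali Baba system.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class PatternPosition:
    token: str       # surface form taken from the sentence
    normalized: str  # second layer of the table (field name is an assumption)
    tag: str         # third layer of the table (field name is an assumption)


# The example pattern from the table, one entry per token column.
EXAMPLE_PATTERN: List[PatternPosition] = [
    PatternPosition("CD19", "PTN", "PTN"),
    PatternPosition("protein", "protein", "NN"),
    PatternPosition("is", "be", "VBZ"),
    PatternPosition("expressed", "express", "GEE"),
]


def tags_match(pattern: Sequence[PatternPosition], tags: Sequence[str]) -> bool:
    """Return True if the tag sequence equals the pattern's tag layer.

    Exact equality is a simplification chosen for this sketch.
    """
    return [position.tag for position in pattern] == list(tags)
```

Keeping each position immutable makes a pattern safe to share between pattern sets, for example when several splits select the same pattern.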
Statistics on the three data sets provided by the BioNLP task.
| | Training | Development | Test |
|---|---|---|---|
| Abstracts | 800 | 150 | 260 |
| Sentences | 7,449 | 1,450 | 2,447 |
| Words | 176,146 | 33,937 | 57,367 |
| Gene expression events | 1,738 | 356 | 722 |
| Protein catabolism events | 111 | 21 | 14 |
| Transcription events | 576 | 82 | 137 |
| Phosphorylation events | 169 | 47 | 135 |
Effect of filtering on combined training data (cross-validation folds from development and training corpus) and on the held-back test data set.
| | # patterns | Avg. pattern length | Development precision (per split) | Development recall (per split) | Development F1 (per split) | Test precision | Test recall | Test F1 |
|---|---|---|---|---|---|---|---|---|
| Baseline | 590 | 8.93 | 24.7 | 49.2 | 32.9 | 17.2 | 43.9 | 24.8 |
| Split 1 | 50 | 5.34 | 65.6 | 51.8 | 57.9 | 64.7 | 42.7 | 51.4 |
| Split 2 | 50 | 4.86 | 78.1 | 52.3 | 62.6 | 63.0 | 37.8 | 47.3 |
| Split 3 | 60 | 4.68 | 67.6 | 52.9 | 59.3 | 60.9 | 42.5 | 50.1 |
| Split 4 | 40 | 5.02 | 67.7 | 49.5 | 57.2 | 66.6 | 36.7 | 47.3 |
| Split 5 | 50 | 4.80 | 63.7 | 48.7 | 55.2 | 64.2 | 40.7 | 49.8 |
| Union of patterns | 104 | 5.65 | | | | 58.2 | 46.8 | 51.9 |
| Best 90 | 90 | 5.66 | | | | 59.7 | 45.1 | 51.4 |
| Best 80 | 80 | 5.75 | | | | 64.8 | 37.7 | 47.6 |
| Best 70 | 70 | 6.01 | | | | 69.4 | 26.7 | 38.6 |
| Best 60 | 60 | 6.17 | | | | 60.0 | 10.0 | 17.1 |
| Results of the winner of the shared task [ | | | | | | 78.5 | 69.8 | 73.9 |
For the definition of the splits, see the text under Results (Evaluation of Test Data).
Evaluation of trigger words for the gene expression event.
| Trigger word | FP | TP | Occurrence | Hit rate |
|---|---|---|---|---|
| co-expression | 0 | 2 | 3 | 0.7 |
| coexpressed | 0 | 4 | 6 | 0.7 |
| nonexpressing | 0 | 2 | 3 | 0.7 |
| expressing | 1 | 44 | 75 | 0.6 |
| expressed | 25 | 136 | 232 | 0.6 |
| express | 15 | 43 | 81 | 0.6 |
| production | 6 | 150 | 298 | 0.5 |
| co-transfections | 1 | 1 | 2 | 0.5 |
| resynthesized | 0 | 1 | 2 | 0.5 |
| expression | 106 | 873 | 1768 | 0.5 |
| produce | 2 | 16 | 33 | 0.5 |
| expresses | 0 | 4 | 9 | 0.4 |
| produces | 0 | 2 | 5 | 0.4 |
| overexpression | 53 | 27 | 80 | 0.4 |
FP: number of cases where the trigger word is a trigger of another event type. TP: number of cases where the trigger word is a gene expression trigger. Occurrence: total number of occurrences in the corpus. Hit rate: see the definition in the text under Results (Trigger Word Filter).
Figure 1. F-scores on gene expression event extraction when considering only the .
Evaluation of gene expression event extraction using different combinations of filters.
| Filter | Precision | Recall | F1 |
|---|---|---|---|
| No filter | 24.7 | 49.2 | 32.9 |
| Trigger Word | 39.7 | 48.0 | 43.5 |
| Pattern Length (exactly 4 tokens) | 51.0 | 29.8 | 37.6 |
| Pattern Length (≤ 4 tokens) | 56.2 | 43.5 | 49.0 |
| Pattern Performance (top 50 patterns) | 50.0 | 48.3 | 49.0 |
| Trigger Word + Pattern Length | 65.6 | 39.3 | 49.2 |
| Trigger Word + Pattern Performance | 77.4 | 46.3 | 58.0 |
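The F1 column can be checked against the precision and recall columns, since F1 is the harmonic mean of the two. The sketch below recomputes it for the rows of the table above; the function name `f1_score` and the dictionary layout are choices made for this example. Because the table reports rounded precision and recall, a recomputed value can differ from the reported one by about 0.1.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
FILTER_RESULTS = {
    "No filter": (24.7, 49.2, 32.9),
    "Trigger Word": (39.7, 48.0, 43.5),
    "Pattern Length (exactly 4 tokens)": (51.0, 29.8, 37.6),
    "Pattern Length (<= 4 tokens)": (56.2, 43.5, 49.0),
    "Pattern Performance (top 50 patterns)": (50.0, 48.3, 49.0),
    "Trigger Word + Pattern Length": (65.6, 39.3, 49.2),
    "Trigger Word + Pattern Performance": (77.4, 46.3, 58.0),
}

for name, (precision, recall, reported) in FILTER_RESULTS.items():
    computed = f1_score(precision, recall)
    print(f"{name}: computed {computed:.2f}, reported {reported}")
```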
Figure 2. F-scores on gene expression event extraction when using only patterns consisting of maximal .
Figure 3. Precision of the 590 patterns on gene expression event extraction. Patterns are sorted by decreasing individual precision.
Figure 4. F-score of gene expression event extraction when considering subsets of the 10 to 200 best patterns, sorted by decreasing precision.
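Figures 3 and 4 rely on ranking patterns by their individual precision and then keeping only the best ones. A minimal sketch of such a ranking step follows, assuming that a precision value has already been measured for every pattern. The function name `top_k_by_precision`, the pattern identifiers, and the precision values are all hypothetical, and ties are broken by identifier only to keep the output deterministic.

```python
from typing import Dict, List


def top_k_by_precision(pattern_precision: Dict[str, float], k: int) -> List[str]:
    """Return the ids of the k patterns with the highest precision.

    Ties are broken alphabetically by pattern id (an arbitrary choice).
    """
    ranked = sorted(
        pattern_precision,
        key=lambda pattern_id: (-pattern_precision[pattern_id], pattern_id),
    )
    return ranked[:k]


# Hypothetical pattern ids with made-up precision values.
TOY_PRECISION = {"p1": 0.9, "p2": 0.2, "p3": 0.6, "p4": 0.6}

print(top_k_by_precision(TOY_PRECISION, 2))  # ['p1', 'p3']
```

Sweeping `k` over a range of values and evaluating each subset would produce a curve of the kind described for Figure 4.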
Results for the extraction of event types other than gene expression (phosphorylation, transcription, and protein catabolism) on development and test data sets.
| | Precision | Recall | F1 |
|---|---|---|---|
| Baseline | 3.6 | 42.8 | 6.6 |
| Trigger Word Filter | 3.2 | 38.1 | 5.9 |
| Pattern Performance Filter | 61.5 | 38.1 | 47.0 |
| Baseline | 1.8 | 50.0 | 3.4 |
| Pattern Performance Filter | 50.0 | 14.2 | 22.2 |
| Best in shared task | 66.6 | 42.8 | 52.1 |
| Baseline | 3.8 | 26.8 | 6.7 |
| Trigger Word Filter | 4.4 | 21.9 | 7.4 |
| Pattern Performance Filter | 18.1 | 24.3 | 20.8 |
| Trigger Word + Pattern Perf | 35.3 | 20.2 | 25.5 |
| Baseline | 4.1 | 29.2 | 7.2 |
| Trigger Word + Pattern Perf | 23.8 | 11.2 | 15.3 |
| Best in shared task | 69.2 | 39.4 | 50.2 |
| Baseline | 2.4 | 35.6 | 4.6 |
| Pattern Performance Filter | 40.0 | 34.0 | 36.7 |
| Baseline | 3.0 | 49.2 | 5.7 |
| Pattern Performance Filter | 72.7 | 47.4 | 57.4 |
| Best in shared task | 91.2 | 76.3 | 83.0 |
The table also reports the winner's performance in the shared task on the test data [21]. Best results for phosphorylation are omitted as these are not directly comparable (see text for explanation).