David Martinez, Timothy Baldwin.
Abstract
This paper describes a method for detecting event trigger words in biomedical text based on a word sense disambiguation (WSD) approach. We first investigate the applicability of existing WSD techniques to trigger word disambiguation in the BioNLP 2009 shared task data, and find that we are able to outperform a traditional CRF-based approach for certain word types. On the basis of this finding, we combine the WSD approach with the CRF, and obtain significant improvements over the standalone CRF, gaining particularly in recall.
Year: 2011 PMID: 21489223 PMCID: PMC3073184 DOI: 10.1186/1471-2105-12-S2-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
List of target events for BioNLP 2009
| Gene expression |
| Transcription |
| Protein catabolism |
| Localization |
| Binding |
| Phosphorylation |
| Regulation |
| Positive regulation |
| Negative regulation |
List of target words for our WSD experiment (N = noun, V = verb, J = adjective). We present the number of train and test instances, the number of classes, and the bias of the majority class.
| Word | # Train | # Test | # Classes | Majority class bias |
|---|---|---|---|---|
| expression.N | 1465 | 265 | 5 | 0.51 |
| transcription.N | 1214 | 210 | 2 | 0.83 |
| activation.N | 1177 | 272 | 3 | 0.89 |
| promoter.N | 770 | 141 | 3 | 0.76 |
| binding.N | 625 | 113 | 2 | 0.84 |
| induce.V | 565 | 128 | 3 | 0.65 |
| activate.V | 523 | 93 | 2 | 0.84 |
| effect.N | 416 | 79 | 4 | 0.83 |
| inhibit.V | 412 | 57 | 2 | 0.67 |
| induction.N | 405 | 116 | 4 | 0.57 |
| bind.V | 373 | 80 | 2 | 0.52 |
| role.N | 342 | 75 | 3 | 0.87 |
| express.V | 308 | 55 | 5 | 0.57 |
| increase.V | 289 | 59 | 2 | 0.57 |
| stimulation.N | 284 | 53 | 4 | 0.89 |
| regulation.N | 274 | 52 | 3 | 0.66 |
| regulate.V | 265 | 49 | 3 | 0.59 |
| require.V | 257 | 53 | 3 | 0.74 |
| production.N | 251 | 45 | 4 | 0.49 |
| inhibition.N | 219 | 38 | 2 | 0.79 |
| mediate.V | 218 | 54 | 3 | 0.74 |
| stimulate.V | 215 | 35 | 3 | 0.83 |
| result.V | 180 | 35 | 4 | 0.86 |
| enhance.V | 178 | 27 | 2 | 0.62 |
| phosphorylation.N | 170 | 52 | 3 | 0.64 |
| increase.N | 157 | 27 | 2 | 0.52 |
| lead.V | 146 | 25 | 2 | 0.83 |
| interaction.N | 144 | 41 | 2 | 0.71 |
| associate.V | 138 | 43 | 2 | 0.85 |
| block.V | 138 | 28 | 3 | 0.67 |
| control.N | 124 | 26 | 2 | 0.87 |
| translocation.N | 123 | 20 | 2 | 0.76 |
| tyrosine.N | 119 | 24 | 2 | 0.76 |
| synthesis.N | 119 | 9 | 3 | 0.75 |
| detect.V | 109 | 16 | 7 | 0.76 |
| tNF.N | 108 | 11 | 2 | 0.70 |
| inducible.J | 108 | 28 | 3 | 0.78 |
| affect.V | 107 | 14 | 4 | 0.62 |
| transactivation.N | 106 | 4 | 2 | 0.86 |
| nucleus.N | 104 | 12 | 2 | 0.85 |
| decrease.V | 98 | 12 | 2 | 0.54 |
| reduce.V | 90 | 26 | 2 | 0.52 |
| control.V | 89 | 17 | 2 | 0.65 |
| suppress.V | 89 | 20 | 2 | 0.65 |
| degradation.N | 88 | 15 | 2 | 0.72 |
| produce.V | 87 | 19 | 3 | 0.55 |
| transcript.N | 80 | 21 | 2 | 0.88 |
| occur.V | 80 | 24 | 2 | 0.89 |
| target.N | 75 | 16 | 3 | 0.87 |
| dependent.J | 73 | 16 | 3 | 0.71 |
| cause.V | 72 | 10 | 2 | 0.85 |
| essential.J | 70 | 9 | 2 | 0.84 |
| interact.V | 70 | 15 | 2 | 0.51 |
| heterodimer.N | 69 | 22 | 2 | 0.84 |
| secretion.N | 66 | 11 | 2 | 0.73 |
| prevent.V | 65 | 12 | 2 | 0.60 |
| change.N | 62 | 10 | 3 | 0.81 |
| transfecte.V | 61 | 25 | 3 | 0.85 |
| absence.N | 60 | 10 | 7 | 0.83 |
| modulate.V | 58 | 8 | 3 | 0.79 |
| contribute.V | 55 | 16 | 3 | 0.78 |
| decrease.N | 51 | 10 | 2 | 0.61 |
| cross-linking.N | 50 | 2 | 2 | 0.52 |
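The last column of the table above reports the bias of the majority class, that is, the fraction of instances of a word that carry its most frequent class. A minimal sketch of that computation follows; the function name `majority_class_bias` and the toy labels are hypothetical and are not taken from the BioNLP 2009 data.

```python
from collections import Counter


def majority_class_bias(labels):
    """Return the most frequent label and the fraction of instances carrying it."""
    if not labels:
        raise ValueError("labels must be non-empty")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Hypothetical toy labels for a single target word (illustration only).
toy_labels = ["NON-EVT"] * 17 + ["BINDING"] * 3
label, bias = majority_class_bias(toy_labels)
print(label, round(bias, 2))
```

A classifier that always predicts this majority label gives the kind of baseline against which a learned WSD model is usually compared.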
Entropy for each feature type across the three WSD corpora
| Feature type | BioNLP | NLM | Senseval |
|---|---|---|---|
| Local | 0.301 | 0.176 | 0.380 |
| Syntactic dep. | 0.305 | — | 0.280 |
| BOW | 0.339 | 0.186 | 0.455 |
| MeSH | 0.360 | 0.183 | — |
| Overall | 0.323 | 0.180 | 0.435 |
Number of word types and their average training frequency for different average entropy ranges. An example for each group is provided, together with its most frequent class.
| Average Entropy | # Words | Avg. Freq. | Example | Major Class |
|---|---|---|---|---|
| | 28 | 272.8 | occur.V | NON-EVT |
| .3 ≤ | 20 | 167.3 | change.N | NON-EVT |
| | 15 | 264.8 | express.V | GENE-EXP |
WSD performance of the different classifiers (the best results per column are given in bold)
| System | Acc | Prec | Rec | F-score |
|---|---|---|---|---|
| MC | 72.8 | 55.9 | 27.4 | 36.7 |
| SVM-Weka | 62.7 | 39.9 | 39.6 | 39.8 |
| CRF | 78.4 | | 46.3 | 56.5 |
| VSM | 71.7 | 54.4 | | 58.1 |
| CRF-VSM | | 70.2 | 52.6 | |
WSD result for different entropy ranges (the best score for each evaluation metric and entropy range is shown in bold)
| Average Entropy | CRF Prec | CRF Rec | CRF F-sc. | VSM Prec | VSM Rec | VSM F-sc. |
|---|---|---|---|---|---|---|
| | 25.7 | 37.9 | 39.5 | | | |
| .3 ≤ | 42.1 | 54.3 | 63.3 | | | |
| | 60.9 | 59.9 | 64.3 | | | |
Performance of CRF-VSM by event type, sorted by F-score (Freq. = Frequency of the event in test data).
| Event | Freq. | Prec. | Rec. | F-score |
|---|---|---|---|---|
| PROTEIN CATABOLISM | 13 | 100 | 84.6 | 91.7 |
| GENE EXPRESSION | 208 | 75.9 | 77.4 | 76.7 |
| PHOSPHORYLATION | 34 | 82.8 | 70.6 | 76.2 |
| LOCALIZATION | 13 | 72.7 | 61.5 | 66.7 |
| BINDING | 119 | 78.7 | 52.9 | 63.3 |
| TRANSCRIPTION | 52 | 64.0 | 61.5 | 62.7 |
| POSITIVE REGULATION | 263 | 64.9 | 42.2 | 51.2 |
| REGULATION | 86 | 51.2 | 25.6 | 34.1 |
| NEGATIVE REGULATION | 60 | 50.0 | 23.3 | 31.8 |
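The F-scores in the table above appear consistent with the usual balanced F-score, the harmonic mean of precision and recall. The sketch below assumes that definition (the page does not state the formula) and recomputes three rows from their rounded precision and recall. The helper name `f_score` is hypothetical, and rows whose published F-score was computed from unrounded values may differ by 0.1.

```python
def f_score(precision, recall):
    """Balanced F-score (harmonic mean) of precision and recall, both in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (event, precision, recall, reported F-score) copied from the table above.
rows = [
    ("PROTEIN CATABOLISM", 100.0, 84.6, 91.7),
    ("PHOSPHORYLATION", 82.8, 70.6, 76.2),
    ("BINDING", 78.7, 52.9, 63.3),
]

for event, prec, rec, reported in rows:
    print(event, round(f_score(prec, rec), 1), reported)
```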
WSD experiment performance of VSM by POS. The best result per column is given in bold.
| POS | # Test | Acc | Prec | Rec | F-score |
|---|---|---|---|---|---|
| Noun | 1802 | | | | |
| Verb | 1055 | 70.0 | 53.2 | 53.2 | 53.2 |
| Adj. | 53 | 64.2 | 9.1 | 10.0 | 9.5 |