Yan Wang1, Jian Wang2, Hongfei Lin1, Xiwei Tang3, Shaowu Zhang1, Lishuang Li1.
Abstract
BACKGROUND: Event extraction plays a crucial role in biomedical information extraction. Biological events describe the dynamic effects of, or relationships between, biological entities such as proteins and genes. Event extraction is generally divided into trigger detection and argument recognition, and the performance of trigger detection directly affects the results of event extraction. Traditional methods treat trigger detection as a classification task, using machine learning or rule-based approaches that construct many features to improve the classification results. Moreover, such classification models recognize only triggers composed of a single word, whereas results for multi-word triggers are unsatisfactory.
Keywords: Bidirectional LSTM; Biomedical events; CRF; FastText; Semantic space; Trigger detection
Year: 2018 PMID: 30577839 PMCID: PMC6302454 DOI: 10.1186/s12859-018-2543-1
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The basic process of trigger detection
Fig. 2 The processing flow of the “BIO” label mechanism
An example of annotation information
| Word | Label |
|---|---|
| VEGF | O |
| Plays | B-Regulation |
| A | I-Regulation |
| Key | I-Regulation |
| Role | I-Regulation |
| In | O |
| The | O |
| Angiogenic | B-Blood_vessel_development |
| Response | O |
| That | O |
| Occurs | O |
| With | O |
| Chronic | O |
| Bradycardia | O |
| . | O |
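
The table above can be read back into trigger spans by grouping each `B-` tag with the `I-` tags that follow it. The following is a minimal sketch of that decoding step, not the authors' implementation; the function name `bio_to_spans` and the handling of a stray `I-` tag (treated as outside any span) are assumptions made for illustration.

```python
from typing import List, Tuple


def bio_to_spans(labels: List[str]) -> List[Tuple[int, int, str]]:
    """Group BIO labels into (start, end_exclusive, event_type) spans.

    Hypothetical helper: an I- tag that does not continue an open span
    of the same type is treated like "O" (an assumption, not from the paper).
    """
    spans = []
    start, current = None, None
    for i, label in enumerate(labels):
        if label.startswith("B-"):
            # A new trigger begins; close any span that is still open.
            if current is not None:
                spans.append((start, i, current))
            start, current = i, label[2:]
        elif label.startswith("I-") and current == label[2:]:
            # The open trigger continues.
            continue
        else:
            # "O" or an inconsistent I- tag ends the open span.
            if current is not None:
                spans.append((start, i, current))
            start, current = None, None
    if current is not None:
        spans.append((start, len(labels), current))
    return spans


# Tokens and labels copied from the annotation example (first nine rows).
words = ["VEGF", "Plays", "A", "Key", "Role", "In", "The", "Angiogenic", "Response"]
labels = [
    "O", "B-Regulation", "I-Regulation", "I-Regulation", "I-Regulation",
    "O", "O", "B-Blood_vessel_development", "O",
]

spans = bio_to_spans(labels)
triggers = [(" ".join(words[s:e]), t) for s, e, t in spans]
print(triggers)
```

On this example the sketch recovers one four-word trigger of type `Regulation` and one single-word trigger of type `Blood_vessel_development`, which is the multi-word case that per-word classification handles poorly.
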
Fig. 3 The structure of the FastText model
Fig. 4 The bidirectional LSTM and CRF model
Fig. 5 The distributed vector presentation layer
Fig. 6 The bidirectional LSTM layer
Comparison of trigger detection performance with existing methods
| Method | F-score(%) | Precision(%) | Recall(%) |
|---|---|---|---|
| Pyysalo | 75.67 | 70.65 | 81.46 |
| Zhou | 77.82 | 74.85 | 81.04 |
| Ours | 78.08 | 77.89 | 78.28 |
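
The F-scores in this table are consistent with the balanced F-measure, the harmonic mean of precision and recall. The short check below recomputes them from the precision and recall columns; it assumes the balanced (F1) form, which this page does not state explicitly, and the function name `f_score` is illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean); inputs and output in percent."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the comparison table above.
rows = {
    "Pyysalo": (70.65, 81.46),
    "Zhou": (74.85, 81.04),
    "Ours": (77.89, 78.28),
}

recomputed = {name: round(f_score(p, r), 2) for name, (p, r) in rows.items()}
print(recomputed)
```
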
Comparison of trigger detection performance with different word embeddings
| Semantic Space | F-score(%) | Precision(%) | Recall(%) |
|---|---|---|---|
| Random Embedding | 73.72 | 76.36 | 71.25 |
| GloVe | 74.84 | 79.70 | 70.54 |
| Doc2Vec | 76.03 | 78.78 | 73.47 |
| Word2Vec | 76.71 | 79.61 | 74.02 |
| Ours(FastText) | 78.08 | 77.89 | 78.28 |
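
FastText differs from the other embeddings in this table in that it represents a word through its character n-grams, so related word forms share part of their representation. The sketch below shows only the n-gram extraction step, under the assumption of the fastText library's usual defaults (n-gram lengths 3 to 6, with `<` and `>` as word-boundary markers); it does not train or load any embeddings, and `char_ngrams` is a hypothetical helper name.

```python
from typing import List


def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> List[str]:
    """Character n-grams of a word wrapped in boundary markers."""
    token = f"<{word}>"
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(token) - n + 1):
            grams.append(token[i:i + n])
    return grams


# Trigram-only view of a short word.
print(char_ngrams("where", 3, 3))

# Two related biomedical word forms share many subword units.
shared = set(char_ngrams("angiogenic")) & set(char_ngrams("angiogenesis"))
print(sorted(shared))
```

Whether this subword sharing is what drives the recall difference in the table is not established by this page; the example only illustrates the mechanism.
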
Comparison of trigger detection performance with different models
| Model | F-score(%) | Precision(%) | Recall(%) |
|---|---|---|---|
| CRF | 65.45 | 57.44 | 76.06 |
| LSTM | 72.60 | 78.40 | 67.61 |
| LSTM-CRF | 75.22 | 76.12 | 74.35 |
| BLSTM | 76.39 | 80.02 | 73.08 |
| Ours(BLSTM-CRF) | 78.08 | 77.89 | 78.28 |
Results with entity and POS features
| Features | F-score(%) | Precision(%) | Recall(%) |
|---|---|---|---|
| Ours | 78.08 | 77.89 | 78.28 |
| Ours+POS | 78.12 | 77.99 | 72.22 |
| Ours+entity | 79.58 | 80.58 | 71.57 |
| Ours+POS+entity | 80.64 | 75.28 | 76.86 |
Results on different corpora
| Corpus | System | F-score(%) | Precision(%) | Recall(%) |
|---|---|---|---|---|
| BioNLP 2009 | Ours | 63.01 | 68.21 | 58.55 |
| BioNLP 2009 | Martinez | 60.10 | 70.20 | 52.60 |
| BioNLP 2011 | Ours | 66.81 | 68.44 | 65.26 |
| BioNLP 2011 | Vlachos | 58.98 | 66.76 | 52.82 |
| BioNLP 2013 | Ours | 64.66 | 63.08 | 66.33 |
| BioNLP 2013 | Liu | 50.95 | 54.22 | 48.06 |