Po-Ting Lai, Yu-Yan Lo, Ming-Siang Huang, Yu-Cheng Hsiao, Richard Tzong-Han Tsai.
Abstract
Biological Expression Language (BEL) is one of the most popular languages for representing causal and correlative relationships among biological events. Automatically extracting and representing biomedical events in BEL can help biologists quickly survey and understand the relevant literature. Recently, many researchers have shown interest in biomedical event extraction. However, the task remains a challenge for current systems because of the complexity of integrating different information extraction tasks, such as named entity recognition (NER), named entity normalization (NEN) and relation extraction, into a single system. In this study, we introduce our BelSmile system, which uses a semantic-role-labeling (SRL)-based approach to extract the NEs and events for BEL statements. BelSmile combines our previous NER, NEN and SRL systems. We evaluate BelSmile on the BioCreative V BEL task dataset. Our system achieved an F-score of 27.8%, ∼7% higher than the top BioCreative V system. The three main contributions of this study are (i) an effective pipeline approach to extract BEL statements, (ii) a syntactic-based labeler to extract subject-verb-object tuples, and (iii) a publicly available web-based version of BelSmile at iisrserv.csie.ncu.edu.tw/belsmile.
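The pipeline the abstract describes (NER, then NEN, then SRL producing subject-verb-object tuples that are mapped to BEL) can be sketched as below. Every function name here is a hypothetical placeholder standing in for one of BelSmile's components, not its actual API.

```python
# A minimal sketch of the pipeline architecture described in the abstract.
# The component functions are hypothetical placeholders supplied by the
# caller; BelSmile's real components are separate trained systems.
def extract_bel_statements(sentence, ner, nen, srl, to_bel):
    entities = ner(sentence)                 # named entity recognition
    normalized = [nen(e) for e in entities]  # named entity normalization
    svo_tuples = srl(sentence, normalized)   # SRL yields subject-verb-object tuples
    return [to_bel(t) for t in svo_tuples]   # map each tuple to a BEL statement
```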
Year: 2016 PMID: 27173520 PMCID: PMC4865328 DOI: 10.1093/database/baw064
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. A BEL statement sample from the BioCreative V BEL corpus.
The resources and models used for recognizing different entities
| Type | Algorithm | ML Corpus | Dictionary |
|---|---|---|---|
| Biological process | Dictionary matching | — | BEL dictionary |
| Chemical | CRF and dictionary matching | BioCreative IV CHEMDNER | ChEBI |
| Disease | Dictionary matching | — | BEL dictionary |
| Protein | CRF and dictionary matching | JNLPBA | Entrez Gene |
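The dictionary-matching component listed in the table above can be sketched as a greedy longest-match scan over token spans (the CRF models are omitted here, and the tiny dictionary in the test is purely illustrative):

```python
# A minimal sketch of dictionary matching for entity recognition: greedy
# longest-match over token spans against a case-insensitive dictionary.
def dictionary_match(tokens, dictionary, max_len=5):
    """Yield (start, end, surface) for the longest dictionary entries found."""
    i = 0
    while i < len(tokens):
        # Try the longest candidate span first, shrinking until a hit.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span.lower() in dictionary:
                yield (i, i + n, span)
                i += n
                break
        else:
            i += 1  # no entry starts at this token
```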
Heuristic normalization rules
| Rule | Examples |
|---|---|
| Basic rules | Converts to lowercase; removes hyphens, periods, a leading ‘h’ or ‘human’, and a trailing ‘s’ |
| Parenthesis rules | Transforms ‘AAA(A)’ into ‘AAA-A’ |
| Remove space rule | Transforms ‘IL 2 alpha’ into ‘IL2alpha’ |
| Suitable rules | Removes general words such as ‘group’, ‘residue’, ‘protein’ and ‘atom’. |
| Stop word rules | Removes prepositions and articles |
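The heuristic rules above can be composed into a single normalization function. The rule ordering and the exact word lists below are assumptions for illustration, not the authors' code:

```python
import re

# A minimal sketch of the heuristic normalization rules in the table above.
GENERAL_WORDS = {"group", "residue", "protein", "atom"}  # suitable rules
STOP_WORDS = {"of", "in", "on", "at", "the", "a", "an"}  # assumed stop-word list

def normalize(term):
    t = term.lower()                         # basic rule: lowercase
    t = re.sub(r"^(human|h)\s*", "", t)      # basic rule: drop leading 'h'/'human'
    t = t.replace("-", "").replace(".", "")  # basic rule: drop hyphens, periods
    t = re.sub(r"s$", "", t)                 # basic rule: drop trailing 's'
    t = re.sub(r"\((\w+)\)", r"-\1", t)      # parenthesis rule: 'aaa(a)' -> 'aaa-a'
    words = [w for w in t.split()
             if w not in GENERAL_WORDS and w not in STOP_WORDS]
    return "".join(words)                    # remove-space rule: 'il 2 alpha' -> 'il2alpha'
```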
Examples of function patterns
| Function | No. of patterns | Example pattern |
|---|---|---|
| molecularActivity (act) | 15 | <Protein> activity |
| complexAbundance (complex) | 15 | <Protein>/<Protein> complex |
| degradation (deg) | 11 | <Protein> degradation |
| proteinModification (pmod) | 9 | phosphorylation of <Protein> |
| translocation (tloc) | 11 | translocation of <Protein> |
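Applying such function patterns can be sketched as a small set of regular expressions over sentences whose protein NEs have already been replaced by a placeholder. The pattern set below lists only one example per function, not the authors' full inventory:

```python
import re

# A minimal sketch of function-pattern matching, assuming protein mentions
# have been rewritten to the placeholder "<Protein>" beforehand.
FUNCTION_PATTERNS = [
    ("act",     re.compile(r"<Protein> activity")),
    ("complex", re.compile(r"<Protein>/<Protein> complex")),
    ("deg",     re.compile(r"<Protein> degradation")),
    ("pmod",    re.compile(r"phosphorylation of <Protein>")),
    ("tloc",    re.compile(r"translocation of <Protein>")),
]

def match_functions(sentence):
    """Return the BEL function abbreviations whose pattern fires in the sentence."""
    return [name for name, pat in FUNCTION_PATTERNS if pat.search(sentence)]
```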
Figure 2. An example of transforming an SVO tuple into a BEL statement.
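The SVO-to-BEL transformation illustrated in Figure 2 can be sketched as a verb-to-relation lookup that assembles a statement from normalized entities. The verb mapping, the HGNC namespace, and the restriction to protein abundances are assumptions for illustration:

```python
# A minimal sketch of mapping an SVO tuple to a BEL statement. Only protein
# abundances (p(...)) are handled; the verb map is an assumed example.
VERB_TO_RELATION = {
    "increases": "increases", "upregulates": "increases", "activates": "increases",
    "decreases": "decreases", "downregulates": "decreases", "inhibits": "decreases",
}

def svo_to_bel(subj, verb, obj, ns="HGNC"):
    """Return a BEL statement string, or None if the verb has no known relation."""
    rel = VERB_TO_RELATION.get(verb.lower())
    if rel is None:
        return None
    return f"p({ns}:{subj}) {rel} p({ns}:{obj})"
```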
Figure 3. Example of a parse tree annotated with semantic roles.
Figure 4. An example sentence with an incorrect syntactic tree containing two verbs, ‘downregulated’ and ‘upregulated’.
Figure 5. The distribution of NE types in the training set.
Figure 6. The distribution of NE types in the test set.
The overall performance at each evaluation level
| Evaluation metric | Precision | Recall | F-score |
|---|---|---|---|
| Term-level | 68.45 | 34.11 | 45.53 |
| Term-level with gold terms | 100 | 35.78 | 52.70 |
| Function-level | 55.55 | 7.57 | 13.33 |
| Function-level with gold terms | 90.00 | 13.63 | 23.68 |
| Relation-level | 43.00 | 21.50 | 28.67 |
| Relation-level with gold terms | 70.67 | 26.50 | 38.55 |
| BEL-level | 42.00 | 20.79 | 27.81 |
| BEL-level with gold terms | 69.33 | 25.74 | 37.55 |
The performance of each type (left three columns: test set; right three columns: test set with gold NEs)
| Type | P(%) | R(%) | F(%) | P(%) | R(%) | F(%) |
|---|---|---|---|---|---|---|
| Biological process | 100 | 8.7 | 16.0 | 100 | 8.7 | 16.0 |
| Chemical | 54.5 | 50.0 | 52.1 | 100 | 50.0 | 66.7 |
| Disease | 57.1 | 33.3 | 42.1 | 100 | 16.7 | 28.6 |
| Protein | 71.1 | 35.0 | 46.9 | 100 | 37.9 | 55.0 |
| molecularActivity | 25.0 | 3.2 | 5.7 | 66.7 | 6.5 | 11.8 |
| complexAbundance | 0 | 0 | 0 | 100 | 8.3 | 15.4 |
| degradation | 100 | 40.0 | 57.1 | 100 | 40.0 | 57.1 |
| proteinModification | 100 | 16.7 | 28.6 | 100 | 16.7 | 28.6 |
| translocation | 100 | 14.3 | 25.0 | 100 | 14.3 | 25.0 |
| increases | 42.9 | 21.3 | 28.4 | 66.1 | 23.8 | 35.8 |
| decreases | 43.5 | 22.2 | 29.4 | 84.2 | 35.6 | 50.0 |
| increases | 41.6 | 20.4 | 27.4 | 64.3 | 22.9 | 33.8 |
| decreases | 43.5 | 22.2 | 29.4 | 84.2 | 35.6 | 50.0 |
The relation-level performance when removing individual SRL components (test set with gold NEs)
| System | P(%) | R(%) | F(%) |
|---|---|---|---|
| BelSmile | 70.67 | 26.5 | 38.55 |
| BelSmile without RCBiosmile | 70.27 | 26.0 | 37.96 |
| BelSmile without ME-based SRL | 72.22 | 26.0 | 38.24 |
| BelSmile without rule-based SRL | 73.24 | 26.0 | 38.37 |
The statement-level performance on the BEL test set (left three columns: test set; right three columns: test set with gold NEs)
| System | R(%) | P(%) | F(%) | R(%) | P(%) | F(%) |
|---|---|---|---|---|---|---|
| Choi | 12.4 | 54.4 | 20.2 | 23.8 | 67.6 | 35.2 |
| Liu | 13.9 | 26.4 | 18.2 | 21.3 | 32.1 | 25.6 |
| BelSmile | 20.79 | 42.0 | 27.81 | 25.74 | 69.33 | 37.55 |