Abstract
Interactions between proteins and genes are considered essential to the description of biomolecular phenomena, and networks of such interactions are used in systems biology approaches. Recently, many studies have sought to extract information from biomolecular text using natural language processing technology. Previous studies have asserted that linguistic information is useful for improving the detection of gene interactions, and that syntactic relations in particular are well suited to this task. However, previous systems achieve reasonably good precision but poor recall. To improve recall without sacrificing precision, this paper proposes a three-phase method for detecting gene interactions based on syntactic relations. In the first phase, we retrieve syntactic encapsulation categories for each candidate agent and target. In the second phase, we construct a verb list that indicates the nature of the interaction between pairs of genes. In the last phase, we determine direction rules to detect which of the two genes is the agent and which is the target. Even without biomolecular knowledge, our method performs reasonably well using a small training dataset. The first phase contributes to improving recall, while the second and third phases contribute to improving precision. In experiments using the ICML 05 Workshop on Learning Language in Logic (LLL05) data, our proposed method gave an F-measure of 67.2% on the test data, significantly outperforming previous methods. We also describe the contribution of each phase to the performance.
Year: 2008 PMID: 18385820 PMCID: PMC2277490 DOI: 10.1155/2008/371710
Source DB: PubMed Journal: J Biomed Biotechnol ISSN: 1110-7243
Algorithm 3. Direction rule-learning algorithm.
Figure 2. Reverse syntactic path of Figure 1 for a negative rule.
Positive and negative rule sets for the sentence in Figure 1.
| Positive rule | |
|---|---|
| Negative rule | |
Examples of direction rules learned through the third phase (based on MINIPAR).
| Lexical word | Relation | Direction | Lexical word | Relation | Direction |
|---|---|---|---|---|---|
| activate | aux | RIGHT | affect | aux | LEFT |
| activate | otherwise | ANY | affect | | RIGHT |
| bind | conj | LEFT | affect | obj | ANY |
| bind | | RIGHT | affect | otherwise | ANY |
| ⋯ | ⋯ | ⋯ | drive | obj | LEFT |
| bind | otherwise | ANY | drive | | RIGHT |
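One way to read the direction-rule table is as a lookup keyed on the lexical word and the syntactic relation, with the "otherwise" row acting as a per-word fallback. The following is a minimal sketch under that assumption, not the authors' implementation: the names `DIRECTION_RULES` and `lookup_direction` are hypothetical, only rows whose relation is legible in the table are included, and the interpretation of the RIGHT, LEFT, and ANY labels is left to the paper.

```python
from typing import Optional

# Rules copied from the table above. Rows whose relation cell is not
# legible in the table are deliberately left out rather than guessed.
DIRECTION_RULES = {
    "activate": {"aux": "RIGHT", "otherwise": "ANY"},
    "bind": {"conj": "LEFT", "otherwise": "ANY"},
    "affect": {"aux": "LEFT", "obj": "ANY", "otherwise": "ANY"},
    "drive": {"obj": "LEFT"},
}


def lookup_direction(word: str, relation: str) -> Optional[str]:
    """Return the direction label for (word, relation).

    Assumed semantics: an exact (word, relation) match wins; if there is
    none, the word's "otherwise" rule applies; if the word has no rules
    or no fallback, None is returned.
    """
    rules = DIRECTION_RULES.get(word)
    if rules is None:
        return None
    if relation in rules:
        return rules[relation]
    return rules.get("otherwise")


if __name__ == "__main__":
    print(lookup_direction("activate", "aux"))   # exact match
    print(lookup_direction("bind", "subj"))      # falls back to "otherwise"
    print(lookup_direction("inhibit", "obj"))    # word not in the table
```

Keeping the fallback inside each word's own rule set mirrors the way the table lists an "otherwise" row per lexical word rather than a single global default.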
Performance of our system and previous systems using LLL05 syntactic tags.
| Performance on test data (%), using LLL05 syntactic tags | Hakenberg et al. | Goadrich et al. | Riedel and Klein | Popelinsky and Blatak | Katrenko et al. | Our system (using LLL05 tags) |
|---|---|---|---|---|---|---|
| Precision | 28.1 | 28.3 | 60.9 | 46.5 | 39.2 | 67.9 |
| Recall | 31.4 | 79.6 | 46.2 | 50.0 | 26.5 | 66.6 |
| F-measure | 29.6 | 41.7 | 52.6 | 48.2 | 31.6 | 67.2 |
Performance of our system and previous systems using MINIPAR.
| Performance on test data (%), using MINIPAR | Greenwood et al. | Our system (using MINIPAR) |
|---|---|---|
| Precision | 22.2 | 32.4 |
| Recall | 11.1 | 68.5 |
| F-measure | 14.8 | 44.0 |
Change in performance when one phase is removed.
| Performance on the test data based on LLL05 syntactic relations (%) | Precision | Recall | F-measure |
|---|---|---|---|
| Using all phases | 67.9 | 66.6 | 67.2 |
| Without the first phase (there are no “syntactic encapsulation categories”) | 0 | 0 | 0 |
| Without the second phase (all verbs are considered “interaction verbs”) | 24.6 | 88.8 | 38.5 |
| Without the third phase (there is no syntactic direction information) | 39.7 | 72.2 | 51.3 |
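The F-measure values reported in these tables are consistent with the balanced F-measure, that is, the harmonic mean of precision and recall. The sketch below recomputes it from the rounded percentages; the function name `f_measure` is hypothetical. Because the inputs are already rounded to one decimal place, a recomputed value can differ from a reported one in the last digit.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Inputs may be fractions or percentages; the result is on the same
    scale. Returns 0.0 when both inputs are zero, which avoids a
    division by zero (for example, the row without the first phase).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # (precision, recall) pairs taken from the tables above, in percent.
    rows = {
        "all phases": (67.9, 66.6),
        "without first phase": (0.0, 0.0),
        "without second phase": (24.6, 88.8),
    }
    for name, (p, r) in rows.items():
        print(f"{name}: F = {f_measure(p, r):.1f}")
```

The second-phase row illustrates why the measure is used here: treating every verb as an interaction verb raises recall to 88.8% but drops precision to 24.6%, and the harmonic mean penalizes that imbalance.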