Xu Dongliang, Pan Jingchang, Wang Bailing.
Abstract
BACKGROUND: Automatically extracting protein entity interaction information from the biomedical literature can help build protein relation networks and design new drugs. MEDLINE, the most authoritative textual database in biomedicine, contains more than 20 million literature abstracts, and their number grows exponentially over time. This rapid expansion makes the biomedical literature difficult to absorb or analyze manually. Efficient, automated search engines based on text mining techniques are therefore needed to explore it.
Keywords: Entity relationship extraction; Multi-kernels learning; Tag-graph kernel
Year: 2017 PMID: 29297359 PMCID: PMC5763518 DOI: 10.1186/s13326-017-0138-9
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 Sample sentence describing a protein interaction (the sentence is taken from the article with PMID 23041326; the PMID is the retrieval number that PubMed assigns to each article)
Features extracted for the protein pair (IL)-8 and CXCR1
| Characteristic name | Characteristic value |
|---|---|
| Lexical items of the two protein names | a1_(IL)-8, a2_CXCR1 |
| Lexical items between the two protein names | b1_has, b2_an, b3_important, …, b17_their, b18_receptors |
| Lexical items around the two protein names | l1_Interleukin, r1_and |
| Keyword feature | k_receptors |
| Entity distance | d_16 |
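The feature table encodes a candidate protein pair as a bag of prefixed string features. A minimal sketch of that encoding, together with a linear kernel that simply counts shared features, might look as follows. The prefix scheme (a, b, l, r, k, d) is read off the table; the keyword list `INTERACTION_KEYWORDS`, the context window, the function names, and the choice to define distance as the number of tokens between the two entities are assumptions for illustration, not the authors' implementation (the source does not define the distance feature).

```python
from typing import List, Set

# Hypothetical keyword list; the table only shows that "receptors" was matched.
INTERACTION_KEYWORDS = {"receptor", "receptors", "binds", "interacts", "activates"}


def pair_features(tokens: List[str], i: int, j: int, window: int = 1) -> Set[str]:
    """Encode the candidate pair (tokens[i], tokens[j]), i < j, as a set of
    prefixed features: a = entity names, b = words between, l/r = words to the
    left/right, k = interaction keyword, d = distance (assumed definition)."""
    if not 0 <= i < j < len(tokens):
        raise ValueError("expected 0 <= i < j < len(tokens)")
    feats = {f"a1_{tokens[i]}", f"a2_{tokens[j]}"}
    between = tokens[i + 1:j]
    # Position-indexed words between the two entities: b1_, b2_, ...
    feats.update(f"b{n}_{w}" for n, w in enumerate(between, start=1))
    # Context words left of the first entity and right of the second one.
    for n in range(1, window + 1):
        if i - n >= 0:
            feats.add(f"l{n}_{tokens[i - n]}")
        if j + n < len(tokens):
            feats.add(f"r{n}_{tokens[j + n]}")
    # Keyword features: interaction words found between the entities.
    feats.update(f"k_{w}" for w in between if w.lower() in INTERACTION_KEYWORDS)
    # Assumption: distance = number of tokens between the two entities.
    feats.add(f"d_{len(between)}")
    return feats


def feature_kernel(x: Set[str], y: Set[str]) -> int:
    """Linear kernel on binary feature vectors, i.e. the number of shared features."""
    return len(x & y)
```

For a hypothetical tokenized sentence such as `["Interleukin", "(IL)-8", "binds", "its", "receptor", "CXCR1", "and", "CXCR2"]`, `pair_features(tokens, 1, 5)` yields `a1_(IL)-8`, `a2_CXCR1`, `b1_binds`, `b2_its`, `b3_receptor`, `l1_Interleukin`, `r1_and`, `k_binds`, `k_receptor` and `d_3`. Representing features as strings keeps the kernel a plain set intersection, which is equivalent to a dot product of sparse binary vectors.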
Fig. 2 Demonstration of the extension dependency path tree kernel
Corpus statistics
| Corpus set | Number of texts | Number of sentences | Number of positive examples | Number of negative examples | Total number of examples |
|---|---|---|---|---|---|
| Aimed | 225 | 1955 | 1000 | 4834 | 5834 |
| IEPA | 50 | 145 | 335 | 482 | 817 |
| BioInfer | 863 | 1100 | 2534 | 7132 | 9666 |
| HPRD50 | 200 | 486 | 163 | 270 | 433 |
| LLL | 45 | 77 | 164 | 166 | 330 |
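As a quick arithmetic check on the corpus table, each total equals positives plus negatives, and the share of positive examples varies considerably between corpora. The short sketch below recomputes both from the numbers in the table; the dictionary and function names are illustrative only.

```python
# (positives, negatives, reported total) per corpus, copied from the table above.
CORPORA = {
    "Aimed": (1000, 4834, 5834),
    "IEPA": (335, 482, 817),
    "BioInfer": (2534, 7132, 9666),
    "HPRD50": (163, 270, 433),
    "LLL": (164, 166, 330),
}


def positive_ratio(pos: int, neg: int) -> float:
    """Fraction of candidate pairs labelled as interacting."""
    return pos / (pos + neg)


for name, (pos, neg, total) in CORPORA.items():
    assert pos + neg == total, name  # every total is positives + negatives
    print(f"{name}: {positive_ratio(pos, neg):.1%} positive")
```

By these figures Aimed is the most imbalanced corpus (about 17.1% positive) while LLL is nearly balanced (about 49.7%), which is worth keeping in mind when comparing precision across corpora.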
Performance comparison of the tag graph kernel and the all-paths graph kernel
| Corpus set | Tag graph kernel P | Tag graph kernel R | Tag graph kernel F | All-paths graph kernel P | All-paths graph kernel R | All-paths graph kernel F |
|---|---|---|---|---|---|---|
| BioInfer | 51.64 | 68.92 | 59.73 | 46.89 | 62.13 | 57.25 |
| Aimed | 50.82 | 69.76 | 58.61 | 44.97 | 65.82 | 55.46 |
| HPRD50 | 55.64 | 67.81 | 70.01 | 49.76 | 64.38 | 68.21 |
| IEPA | 61.58 | 76.91 | 74.23 | 56.48 | 72.36 | 70.65 |
| LLL | 71.92 | 70.84 | 77.43 | 67.19 | 66.95 | 72.68 |
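P, R and F denote precision, recall and F-score. The standard F1 is the harmonic mean of P and R; the sketch below computes it (with an optional beta) and prints it next to the reported F for the tag graph kernel rows. Recomputing F1 from the tabulated P and R does not reproduce every reported F: for LLL, the reported 77.43 exceeds both P (71.92) and R (70.84), which a harmonic mean of those two numbers cannot do. The F column was therefore presumably computed differently, for example per cross-validation fold and then averaged, but the source shown here does not say; the function name is illustrative.

```python
def f_score(p: float, r: float, beta: float = 1.0) -> float:
    """F-beta from precision p and recall r (both in percent).
    beta = 1 gives F1, the harmonic mean of P and R."""
    if p + r == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)


# Tag graph kernel rows from the table above: corpus -> (P, R, reported F).
TAG_GRAPH = {
    "BioInfer": (51.64, 68.92, 59.73),
    "Aimed": (50.82, 69.76, 58.61),
    "HPRD50": (55.64, 67.81, 70.01),
    "IEPA": (61.58, 76.91, 74.23),
    "LLL": (71.92, 70.84, 77.43),
}

for corpus, (p, r, reported) in TAG_GRAPH.items():
    print(f"{corpus}: F1 from P/R = {f_score(p, r):.2f}, reported F = {reported:.2f}")
```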
Performance of different kernel methods on the BioInfer corpus
| Method | P | R | F |
|---|---|---|---|
| Characteristics-based kernels | 45.61 | 63.57 | 56.24 |
| Extension dependency path tree kernel | 41.32 | 69.76 | 52.58 |
| Tag graph kernel | 51.64 | 68.92 | 59.73 |
| Feature kernel + path tree kernel | 49.86 | 70.12 | 60.25 |
| Feature kernel + tag graph kernel | 55.43 | 71.62 | 61.30 |
| Path tree kernel + tag graph kernel | 55.47 | 70.29 | 60.37 |
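The last three rows combine two kernels, but this excerpt does not describe how the kernels are fused. One common construction, assumed here rather than taken from the paper, is a non-negative weighted sum of cosine-normalized Gram matrices. Cosine normalization and non-negative sums both preserve positive semidefiniteness, so the result is still a valid kernel that can be passed to an SVM as a precomputed Gram matrix. The functions `normalize` and `combine` and the example weights are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize(K: Matrix) -> Matrix:
    """Cosine-normalize a Gram matrix: K'[i][j] = K[i][j] / sqrt(K[i][i] * K[j][j]).
    Assumes all diagonal entries are positive."""
    n = len(K)
    return [[K[i][j] / math.sqrt(K[i][i] * K[j][j]) for j in range(n)] for i in range(n)]


def combine(kernels: List[Matrix], weights: List[float]) -> Matrix:
    """Weighted sum of normalized Gram matrices over the same examples.
    Non-negative weights keep the combination a valid (PSD) kernel."""
    if len(kernels) != len(weights) or any(w < 0 for w in weights):
        raise ValueError("need one non-negative weight per kernel")
    n = len(kernels[0])
    normed = [normalize(K) for K in kernels]
    return [[sum(w * K[i][j] for w, K in zip(weights, normed)) for j in range(n)]
            for i in range(n)]
```

For example, with a feature-kernel Gram matrix `[[4.0, 2.0], [2.0, 9.0]]` and a tag-graph Gram matrix `[[1.0, 0.5], [0.5, 1.0]]`, `combine([K_feat, K_tag], [0.5, 0.5])` has ones on the diagonal and about 0.417 off the diagonal. Normalizing first stops a kernel with large raw values from dominating the sum; the same function extends unchanged to three kernels.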
Performance of different kernel methods on the five corpora
| Corpus set | Evaluation parameter | Characteristics-based kernels | Extension dependency path tree kernel | Tag graph kernel | Three-kernel fusion |
|---|---|---|---|---|---|
| Aimed | P | 45.34 | 42.31 | 50.82 | 57.45 |
| | R | 61.25 | 68.54 | 69.76 | 72.31 |
| | F | 55.36 | 52.63 | 58.61 | 60.98 |
| IEPA | P | 56.84 | 52.48 | 61.58 | 73.82 |
| | R | 72.92 | 69.35 | 76.91 | 81.06 |
| | F | 87.15 | 63.79 | 74.23 | 79.57 |
| BioInfer | P | 45.61 | 41.32 | 51.64 | 91.69 |
| | R | 63.57 | 69.76 | 68.92 | 71.62 |
| | F | 56.24 | 52.58 | 59.73 | 62.35 |
| HPRD50 | P | 50.26 | 49.96 | 55.64 | 61.87 |
| | R | 67.59 | 66.31 | 67.81 | 72.35 |
| | F | 75.38 | 69.78 | 70.01 | 85.48 |
| LLL | P | 53.59 | 83.34 | 71.92 | 75.69 |
| | R | 70.12 | 69.78 | 70.84 | 78.37 |
| | F | 68.43 | 88.03 | 77.43 | 90.12 |
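To read the five-corpus table at a glance, the sketch below picks the method with the highest reported F on each corpus and averages each method's reported F over the five corpora (an unweighted macro-average). It only rearranges the numbers in the table; the names are illustrative.

```python
from statistics import mean
from typing import Sequence

METHODS = ("Characteristics-based", "Extension dependency path tree",
           "Tag graph", "Three-kernel fusion")

# Reported F values from the five-corpus table above, in METHODS order.
F_TABLE = {
    "Aimed": (55.36, 52.63, 58.61, 60.98),
    "IEPA": (87.15, 63.79, 74.23, 79.57),
    "BioInfer": (56.24, 52.58, 59.73, 62.35),
    "HPRD50": (75.38, 69.78, 70.01, 85.48),
    "LLL": (68.43, 88.03, 77.43, 90.12),
}


def best_method(scores: Sequence[float]) -> str:
    """Name of the method with the highest F in one corpus row."""
    return METHODS[max(range(len(scores)), key=scores.__getitem__)]


for corpus, scores in F_TABLE.items():
    print(f"{corpus}: best F = {max(scores):.2f} ({best_method(scores)})")

# Unweighted mean of each method's F over the five corpora.
macro = {m: mean(row[k] for row in F_TABLE.values()) for k, m in enumerate(METHODS)}
for m, f in macro.items():
    print(f"{m}: mean F over five corpora = {f:.2f}")
```

With the reported numbers, three-kernel fusion has the highest mean F (75.70), although on IEPA the characteristics-based kernel's reported F (87.15) is higher than that of the fusion (79.57).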