Xiaofang Wu, Zhihao Yang, ZhiHeng Li, Hongfei Lin, Jian Wang.
Abstract
The volume of published biomedical literature on disease-related knowledge is expanding rapidly. Traditional information retrieval (IR) techniques, when applied to large databases such as PubMed, often return long, unmanageable lists of citations that do not meet the searcher's information needs. In this paper, we present an approach that automatically constructs disease-related knowledge summaries from biomedical literature. First, Kullback-Leibler divergence combined with a mutual information metric is used to extract salient disease information. Then a deep search based on depth-first search (DFS) is applied to find hidden (indirect) relations between biomedical entities. Finally, a random walk algorithm is used to filter out weak relations. Experimental results show that our approach achieves a precision of 60% and a recall of 61% on salient information extraction for Carcinoma of bladder and outperforms the Combo method.
Year: 2015 PMID: 26413521 PMCID: PMC4561941 DOI: 10.1155/2015/428195
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Figure 1. The framework of our method.
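The deep-search step named in the abstract can be illustrated with a small depth-first enumeration of multi-hop paths. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `indirect_paths`, the `(subject, predicate, object)` triple layout, and the toy edges (including the `INTERACTS_WITH` and `AFFECTS` links) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def indirect_paths(edges, source, max_depth):
    """Enumerate simple paths from `source` that use 2..max_depth edges.

    `edges` is an iterable of (subject, predicate, object) triples.
    Paths with a single edge are direct relations and are skipped.
    """
    graph = defaultdict(list)
    for subj, pred, obj in edges:
        graph[subj].append((pred, obj))

    paths = []

    def dfs(node, path, visited):
        if len(path) >= 2:
            paths.append(list(path))  # record a copy of the current path
        if len(path) == max_depth:
            return  # depth limit reached, do not expand further
        for pred, nxt in graph[node]:
            if nxt in visited:
                continue  # keep paths simple (no repeated entities)
            visited.add(nxt)
            path.append((node, pred, nxt))
            dfs(nxt, path, visited)
            path.pop()
            visited.remove(nxt)

    dfs(source, [], {source})
    return paths


# Hypothetical toy predications, for illustration only.
edges = [
    ("Carcinoma of bladder", "ASSOCIATED_WITH", "TP53"),
    ("TP53", "INTERACTS_WITH", "MDM2"),
    ("MDM2", "AFFECTS", "Apoptosis"),
]

for path in indirect_paths(edges, "Carcinoma of bladder", max_depth=3):
    print(" -> ".join([path[0][0]] + [obj for _, _, obj in path]))
```

The depth limit plays the role of the "depth" parameter mentioned in the box and figure captions, although how the authors define depth exactly is not stated in this record.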
Kullback-Leibler Divergence scores of relations between Carcinoma of bladder and genes.
| Relation | KLD |
|---|---|
| ASSOCIATED_WITH | 0.264763 |
| PART_OF | 0.196826 |
| PREDISPOSES | 0.056031 |
| COEXISTS_WITH | 0.047783 |
| AFFECTS | 0.017736 |
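
The record does not give the formula behind these scores. One common reading, shown here as a hedged sketch, is the per-relation term of the Kullback-Leibler divergence between the relation distribution for the disease of interest (foreground) and the distribution over a background collection. The function name `kld_terms` and all counts below are hypothetical; they are not the data behind the table.

```python
import math


def kld_terms(fg_counts, bg_counts):
    """Per-relation KL terms p * log2(p / q).

    p is the relative frequency of a relation in the foreground counts,
    q its relative frequency in the background counts. Relations with a
    zero foreground count contribute nothing and are skipped.
    """
    fg_total = sum(fg_counts.values())
    bg_total = sum(bg_counts.values())
    scores = {}
    for rel, n in fg_counts.items():
        if n == 0:
            continue
        p = n / fg_total
        q = bg_counts[rel] / bg_total  # assumes every relation occurs in the background
        scores[rel] = p * math.log2(p / q)
    return scores


# Hypothetical counts, for illustration only.
foreground = {"ASSOCIATED_WITH": 40, "PART_OF": 30, "AFFECTS": 30}
background = {"ASSOCIATED_WITH": 200, "PART_OF": 200, "AFFECTS": 600}

for rel, score in sorted(kld_terms(foreground, background).items(),
                         key=lambda kv: kv[1], reverse=True):
    print(f"{rel}\t{score:.6f}")
```

A relation scores high when it is both frequent for the disease and over-represented relative to the background, which is the sense in which such a score can mark a relation as salient.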
KM scores of predicates and semantic types between Carcinoma of bladder and genes.
| Semantic type | Relation | KM |
|---|---|---|
| gngm | ASSOCIATED_WITH | 0.036137 |
| tisu | PART_OF | 0.007876 |
| aapp | ASSOCIATED_WITH | 0.007778 |
| bacs | ASSOCIATED_WITH | 0.004029 |
| comd | COEXISTS_WITH | 0.001836 |
Full name for semantic types in disease-gene predication.
| Semantic type | Full name |
|---|---|
| aapp | Amino acid, peptide, or protein |
| bacs | Biologically active substance |
| celc | Cell component |
| cell | Cell |
| comd | Cell or molecular dysfunction |
| gngm | Gene or genome |
| neop | Neoplastic process |
| tisu | Tissue |
Combo scores of predicates and semantic types between Carcinoma of bladder and genes.
| Semantic type | Relation | Combo |
|---|---|---|
| tisu | PART_OF | 0.355117 |
| gngm | ASSOCIATED_WITH | 0.212097 |
| celc | PART_OF | 0.208429 |
| aapp | ASSOCIATED_WITH | 0.119630 |
| cell | PART_OF | 0.098452 |
Ranking and recall for genes confirmed with reference standard for IDF results.
| Gene | KM rank | KM result | Combo rank | Combo result |
|---|---|---|---|---|
| TP53 | 1 | TP | 1 | TP |
| FGFR3 | 2 | TP | 2 | TP |
| HRAS | 6 | TP | 7 | TP |
| TSC1 | 23 | TP | 26 | TP |
| MDM2 | 29 | TP | 32 | TP |
| RB1 | 58 | TP | 70 | TP |
| ERCC2 | 70 | TP | 82 | TP |
| NAT2 | 75 | TP | 87 | TP |
| RAG1 | NULL | FN | NULL | FN |
| MTCYB | NULL | FN | NULL | FN |
| ATM | NULL | FN | NULL | FN |
| TGFB1 | NULL | FN | NULL | FN |
| ERBB3 | NULL | FN | NULL | FN |
| MAP | 39.46% | | 37.52% | |
Figure 2. R@N of KM and Combo. R@N is the recall of the top N samples in the ranking; N is the number of samples.
Figure 3. P@N of KM and Combo. P@N is the precision of the top N samples in the ranking; N is the number of samples.
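The two definitions in these captions can be made concrete with the KM ranks from the Carcinoma of bladder ranking table. The sketch below assumes that the reference standard consists of the 13 genes listed in that table (8 TP plus 5 FN) and that every ranked item outside the reference standard counts as a false positive; the function name `precision_recall_at_n` is hypothetical.

```python
def precision_recall_at_n(tp_ranks, n_relevant, n):
    """Return (P@N, R@N) given the ranks of the true positives.

    tp_ranks   : ranks (1-based) at which reference genes appear
    n_relevant : size of the reference standard
    n          : cutoff N
    """
    hits = sum(1 for r in tp_ranks if r <= n)
    return hits / n, hits / n_relevant


# KM ranks of the true positives in the Carcinoma of bladder table.
km_ranks = [1, 2, 6, 23, 29, 58, 70, 75]
N_RELEVANT = 13  # assumption: 8 TP + 5 FN rows in that table

for n in (10, 30, 75):
    p, r = precision_recall_at_n(km_ranks, N_RELEVANT, n)
    print(f"N={n}: P@N={p:.3f}, R@N={r:.3f}")
```

Under these assumptions, R@75 for KM is 8/13, roughly 0.615, since all eight retrieved reference genes fall within the top 75.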
Ranking and MAP for genes related to Parkinson Disease confirmed with reference standard.
| Gene | KM rank | KM result | Combo rank | Combo result |
|---|---|---|---|---|
| LRRK2 | 1 | TP | 3 | TP |
| PARK7 | 2 | TP | 7 | TP |
| PINK1 | 5 | TP | 11 | TP |
| SNCA | 8 | TP | 14 | TP |
| GIGYF2 | 9 | TP | 16 | TP |
| NR4A2 | 11 | TP | 28 | TP |
| VPS35 | 12 | TP | 24 | TP |
| PARK2 | 13 | TP | 25 | TP |
| ATP13A2 | 20 | TP | 40 | TP |
| GBA | 24 | TP | 39 | TP |
| UCHL1 | NULL | FN | NULL | FN |
| PRKAG2 | NULL | FN | NULL | FN |
| SNCB | NULL | FN | NULL | FN |
| PLA2G6 | NULL | FN | NULL | FN |
| PDE8B | NULL | FN | NULL | FN |
| NDUFAF3 | NULL | FN | NULL | FN |
| FOXRED1 | NULL | FN | NULL | FN |
| NDUFA11 | NULL | FN | NULL | FN |
| NDUFA1 | NULL | FN | NULL | FN |
| NDUFAF2 | NULL | FN | NULL | FN |
| NDUFAF4 | NULL | FN | NULL | FN |
| NDUFAF5 | NULL | FN | NULL | FN |
| NDUFAF6 | NULL | FN | NULL | FN |
| NDUFS1 | NULL | FN | NULL | FN |
| NDUFS2 | NULL | FN | NULL | FN |
| NDUFS4 | NULL | FN | NULL | FN |
| NUBPL | NULL | FN | NULL | FN |
| SLC6A3 | NULL | FN | NULL | FN |
| DRD4 | NULL | FN | NULL | FN |
| FGF20 | NULL | FN | NULL | FN |
| SNCAIP | NULL | FN | NULL | FN |
| NDUFV2 | NULL | FN | NULL | FN |
| STX1B | NULL | FN | NULL | FN |
| NDUFV1 | NULL | FN | NULL | FN |
| HP | NULL | FN | NULL | FN |
| PARK | NULL | FN | NULL | FN |
| MAP | 62.66% | | 27.86% | |
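
The MAP rows in the two ranking tables can be reproduced from the listed ranks. The sketch below averages precision over the retrieved true positives only; that normalization is an inference from the numbers (it is the choice that matches the reported values), not something the tables state. The function name `average_precision` is hypothetical.

```python
def average_precision(tp_ranks):
    """Average of precision values taken at each true-positive rank.

    The i-th true positive (1-based, in rank order) found at rank r
    contributes precision i / r. The sum is divided by the number of
    retrieved true positives (assumed normalization, see lead-in).
    """
    ranks = sorted(tp_ranks)
    return sum((i + 1) / r for i, r in enumerate(ranks)) / len(ranks)


# Ranks of the TP rows, copied from the two tables.
bladder_km = [1, 2, 6, 23, 29, 58, 70, 75]
bladder_combo = [1, 2, 7, 26, 32, 70, 82, 87]
parkinson_km = [1, 2, 5, 8, 9, 11, 12, 13, 20, 24]
parkinson_combo = [3, 7, 11, 14, 16, 28, 24, 25, 40, 39]

for name, ranks in [
    ("Carcinoma of bladder, KM", bladder_km),
    ("Carcinoma of bladder, Combo", bladder_combo),
    ("Parkinson Disease, KM", parkinson_km),
    ("Parkinson Disease, Combo", parkinson_combo),
]:
    print(f"{name}: {average_precision(ranks) * 100:.2f}%")
# Carcinoma of bladder, KM: 39.46%
# Carcinoma of bladder, Combo: 37.52%
# Parkinson Disease, KM: 62.66%
# Parkinson Disease, Combo: 27.86%
```

Because the false negatives are left out of the denominator, this score reflects how early the retrieved reference genes appear in the ranking rather than how many reference genes were missed.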
Box 1. Top 10 predications scored by random walk with depth 4.
Figure 4. Disease information extracted by random walk with depth 4.
Box 2. Top 10 predications scored by random walk with depth 6.
Figure 5. Disease information extracted by random walk with depth 6.
Box 3. Top 10 predications of co-occurrence scored by random walk with depth 4.
Box 4. Top 10 predications of co-occurrence scored by random walk with depth 6.
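This record does not say which random walk variant produces the scores in the boxes. As a hedged illustration of how a random walk can separate strong from weak relations, the sketch below runs a random walk with restart from the disease node on a small weighted graph. The restart probability, the edge weights, the node `GENE_X`, and the function name `random_walk_with_restart` are all assumptions made for the example.

```python
def random_walk_with_restart(adj, seed, restart=0.15, iters=100):
    """Power iteration for a random walk with restart at `seed`.

    adj maps each node to a dict {neighbor: weight}; every neighbor must
    also be a key of adj. Returns a dict of visit probabilities that
    sums to 1.
    """
    nodes = sorted(adj)
    score = {n: 1.0 if n == seed else 0.0 for n in nodes}
    for _ in range(iters):
        nxt = {n: (restart if n == seed else 0.0) for n in nodes}
        for n in nodes:
            nbrs = adj[n]
            if not nbrs:
                # Dangling node: return its walking mass to the seed.
                nxt[seed] += (1 - restart) * score[n]
                continue
            total = sum(nbrs.values())
            for m, w in nbrs.items():
                nxt[m] += (1 - restart) * score[n] * w / total
        score = nxt
    return score


# Hypothetical weighted graph (weights could be, e.g., predication counts).
adj = {
    "Carcinoma of bladder": {"TP53": 5.0, "GENE_X": 1.0},
    "TP53": {"Carcinoma of bladder": 5.0},
    "GENE_X": {"Carcinoma of bladder": 1.0},
}

scores = random_walk_with_restart(adj, "Carcinoma of bladder")
threshold = 0.1  # hypothetical cutoff for a "weak" relation
kept = [n for n, s in scores.items() if s >= threshold]
print(kept)
```

Nodes reached mostly through low-weight edges accumulate little probability mass, so a cutoff on the score removes them while keeping strongly supported neighbors.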