Jeongkyun Kim, Seongeun So, Hee-Jin Lee, Jong C Park, Jung-Jae Kim, Hyunju Lee.
Abstract
Biological events such as gene expression, regulation, phosphorylation, localization and protein catabolism play important roles in the development of diseases. Identifying the biological events involved in an association between a disease and a gene improves our understanding of that association. Although biological knowledge has accumulated in several databases and can be accessed through the Web, there is not yet a specialized Web tool for querying the relationships among diseases, genes and biological events. For this task, we developed DigSee, which searches MEDLINE abstracts for evidence sentences describing how 'genes' are involved in the development of 'cancer' through 'biological events'. DigSee is available at http://gcancer.org/digsee.
Year: 2013 PMID: 23761452 PMCID: PMC3692119 DOI: 10.1093/nar/gkt531
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Indexing and searching processes in the DigSee system.
Figure 2. Screenshots of the Web interface of the DigSee system. (a) The main search results are MEDLINE abstracts with evidence sentences supporting that genes are related to a given cancer type through biological events. (b) A graph visualizes the genes that have evidence sentences. Nodes are genes, and two nodes are connected if the genes appear in the same document. When the number of genes is large, only the subset of genes with high evidence sentence scores is shown. Raising the threshold on the number of genes displays more genes. Users can expand the neighbor genes of a node by right-clicking it. Clicking an edge shows the list of abstracts in which the two genes appear together.
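The gene graph described for Figure 2(b) can be illustrated with a minimal sketch. This is not DigSee's implementation: the function names `build_cooccurrence_graph` and `top_genes`, the document identifiers and the evidence scores are all hypothetical, and the data are toy values chosen only to show how document co-occurrence yields edges and how a gene-count threshold limits what is displayed.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(doc_genes):
    """Map each unordered gene pair to the set of documents mentioning both genes."""
    edges = defaultdict(set)
    for doc_id, genes in doc_genes.items():
        # Sort the de-duplicated genes so each pair has one canonical key.
        for gene_a, gene_b in combinations(sorted(set(genes)), 2):
            edges[(gene_a, gene_b)].add(doc_id)
    return dict(edges)


def top_genes(gene_scores, limit):
    """Return the `limit` genes with the highest (hypothetical) evidence scores."""
    return sorted(gene_scores, key=gene_scores.get, reverse=True)[:limit]


# Toy input (assumption): document id -> genes found in its evidence sentences.
doc_genes = {
    "doc1": ["TP53", "MDM2"],
    "doc2": ["TP53", "BRCA1", "MDM2"],
    "doc3": ["BRCA1"],
}
# Toy scores (assumption): gene -> best evidence sentence score.
gene_scores = {"TP53": 0.9, "MDM2": 0.7, "BRCA1": 0.4}

graph = build_cooccurrence_graph(doc_genes)
shown = top_genes(gene_scores, limit=2)

# Keep only edges whose two endpoints are both displayed.
visible_edges = {pair: docs for pair, docs in graph.items()
                 if pair[0] in shown and pair[1] in shown}
```

Looking up an edge key such as `("MDM2", "TP53")` returns the documents behind that edge, which mirrors clicking an edge to list the abstracts where both genes appear. Raising `limit` corresponds to raising the gene-count threshold in the interface.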
Gold-standard data
| Events | Binding | Gene expression | Localization | Phosphorylation | Protein catabolism | Transcription |
|---|---|---|---|---|---|---|
| Positive | 11 (18) | 20 (52) | 9 (19) | 19 (18) | 6 (8) | 5 (22) |
| Negative | 26 (29) | 20 (46) | 23 (38) | 24 (38) | 24 (17) | 45 (26) |
| Total | 37 (47) | 40 (98) | 32 (57) | 43 (56) | 30 (25) | 50 (48) |
Positive and negative gold-standard evidence sentences were collected for binding, gene expression, localization, phosphorylation, protein catabolism and transcription. For each event type, the first number is the count of sentences used for feature selection and the number in parentheses is the count used for performance testing.
Accuracies of individual features using performance testing data
| Features | F-measure | AUC |
|---|---|---|
| Normalized event SVM score | 62.7 | 57.8 |
| Normalized edge SVM score | 60.3 | 42.5 |
| Gene–event distance | 62.1 | 52.0 |
| Event–regulation distance | 64.5 | 59.7 |
| Event–cancer distance | 71.5 | 74.1 |
| Cancer keywords count (depending on event–cancer distance) | 72.5 | 72.5 |
| Hallmark keywords count | 64.4 | 58.8 |
| Event depth | 60.2 | 47.7 |
| Negative score | 68.7 | 59.3 |
| Agent (depending on hallmark keywords count) | 64.1 | 62.6 |
| Total | 72.7 | 80.5 |
Accuracies of biological events
| Biological events | Bayesian P | Bayesian R | Bayesian F | Bayesian AUC | SVM P | SVM R | SVM F | SVM AUC | Random order P | Random order R | Random order F | Random order AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Binding | 80.0 | 91.2 | 83.6 | 87.1 | 45.7 | 88.3 | 60.0 | 62.6 | 42.2 | 94.1 | 57.8 | 50.7 |
| Gene expression | 66.7 | 96.2 | 78.7 | 79.3 | 65.7 | 88.7 | 75.5 | 73.7 | 54.4 | 98.6 | 70.0 | 50.7 |
| Localization | 66.7 | 53.3 | 59.0 | 72.5 | 56.0 | 75.0 | 63.9 | 71.5 | 37.5 | 90.7 | 52.1 | 50.5 |
| Phosphorylation | 75.0 | 85.0 | 79.3 | 93.7 | 90.0 | 51.7 | 65.3 | 75.2 | 37.2 | 90.0 | 51.4 | 50.9 |
| Protein catabolism | 100.0 | 70.0 | 80.0 | 96.7 | 40.0 | 80.0 | 52.0 | 61.7 | 42.6 | 87.4 | 55.4 | 52.0 |
| Transcription | 63.3 | 87.0 | 73.1 | 74.7 | 54.3 | 86.0 | 66.4 | 71.0 | 48.5 | 97.2 | 64.5 | 48.6 |
| Total | 62.6 | 86.9 | 72.7 | 80.5 | 53.7 | 81.0 | 64.5 | 71.2 | 43.7 | 93.0 | 59.4 | 49.8 |
In the table, precision is abbreviated as ‘P’, recall as ‘R’ and F-measure as ‘F’.
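For readers unfamiliar with these metrics, the following sketch shows the standard definitions of precision, recall and F-measure (the balanced F1 form) computed from classification counts. The function name `precision_recall_f` and the example counts are hypothetical. The record above does not state how the reported values were aggregated, so this sketch is not expected to reproduce the table entries exactly.

```python
def precision_recall_f(true_pos, false_pos, false_neg):
    """Return (precision, recall, F-measure) as fractions in [0, 1]."""
    predicted_pos = true_pos + false_pos
    actual_pos = true_pos + false_neg
    # Guard against empty denominators by defining the metric as 0.0.
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    total = precision + recall
    # F-measure is the harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Hypothetical counts: 8 correct positives, 2 false alarms, 2 misses.
p, r, f = precision_recall_f(true_pos=8, false_pos=2, false_neg=2)
print(f"P={100 * p:.1f}  R={100 * r:.1f}  F={100 * f:.1f}")
```

Multiplying by 100 gives percentages on the same scale as the table.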