Jay Urbain, Ophir Frieder, Nazli Goharian.
Abstract
We present a passage relevance model for integrating syntactic and semantic evidence of biomedical concepts and topics using a probabilistic graphical model. Component models of topics, concepts, terms, and documents are represented as potential functions within a Markov Random Field. The probability of a passage being relevant to a biologist's information need is represented as the joint distribution across all potential functions. Relevance model feedback of top-ranked passages is used to improve distributional estimates of query concepts and topics in context, and a dimensional indexing strategy is used for efficient aggregation of concept and term statistics. By integrating multiple sources of evidence, including dependencies between topics, concepts, and terms, we seek to improve genomics literature passage retrieval precision. Using this model, we demonstrate statistically significant improvements in retrieval precision on a large genomics literature corpus.
Year: 2009 PMID: 19344479 PMCID: PMC2665051 DOI: 10.1186/1471-2105-10-S3-S3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
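The abstract describes ranking passages by a joint distribution over potential functions in a Markov Random Field. The sketch below shows the generic log-linear form such models usually take for ranking, score = sum_i lambda_i * log psi_i, under loud assumptions: two hypothetical unigram potentials (`unigram_potential`, a Dirichlet-smoothed query likelihood over the passage and over its parent document) stand in for the paper's document, topic, concept, and term models, and the weights, `mu`, and toy data are invented. It is an illustration of the general technique, not the authors' implementation.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple

def dirichlet_log_prob(term: str, counts: Counter, length: int,
                       coll: Counter, coll_len: int, mu: float = 1000.0) -> float:
    """Dirichlet-smoothed log P(term | text), backing off to the collection model."""
    # Add-one smoothing on the collection model keeps unseen query terms from giving log(0).
    p_coll = (coll[term] + 1) / (coll_len + len(coll) + 1)
    return math.log((counts[term] + mu * p_coll) / (length + mu))

# A log-potential maps (query, passage terms, parent-document terms) to log psi.
LogPotential = Callable[[List[str], List[str], List[str]], float]

def unigram_potential(coll: Counter, use_document: bool) -> LogPotential:
    """Hypothetical stand-in for a component model: query likelihood of the
    passage (term-model analogue) or of its parent document (document-model analogue)."""
    coll_len = sum(coll.values())
    def log_psi(query: List[str], passage: List[str], document: List[str]) -> float:
        text = document if use_document else passage
        counts = Counter(text)
        return sum(dirichlet_log_prob(q, counts, len(text), coll, coll_len) for q in query)
    return log_psi

def mrf_score(query: List[str], passage: List[str], document: List[str],
              potentials: List[Tuple[float, LogPotential]]) -> float:
    """Log of the unnormalized joint, sum_i lambda_i * log psi_i.
    The normalizing constant does not depend on the passage, so ranking can ignore it."""
    return sum(lam * log_psi(query, passage, document) for lam, log_psi in potentials)

# Toy usage with made-up stemmed text.
docs = {
    "d1": "raf mutat melanoma cancer braf activ".split(),
    "d2": "lupu serum protein diseas activ".split(),
}
passages = [("d1", "braf mutat melanoma".split()), ("d2", "lupu serum protein".split())]
coll = Counter(t for text in docs.values() for t in text)
potentials = [(0.7, unigram_potential(coll, use_document=False)),
              (0.3, unigram_potential(coll, use_document=True))]
query = ["mutat", "cancer"]
ranked = sorted(passages, key=lambda p: mrf_score(query, p[1], docs[p[0]], potentials),
                reverse=True)
print([doc_id for doc_id, _ in ranked])  # ['d1', 'd2']
```

Working in log space turns the product of potentials into a weighted sum, which is why adding or reweighting a component model only adds or rescales one term of the score.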
Figure 1. Passage Relevance Model.
Topic Relevance
| Topic 200 | Topic 201 | Topic 202 |
| BILAG (0.6010) | B-RAF (0.54793) | 1gangliosid (0.5424) |
| lupu (0.5650) | mutat (0.5039) | brain (0.5010) |
| anticardiolipin (0.3960) | RAF (0.4834) | gangliosid (0.4949) |
| immunodiffus (0.3870) | melanoma (0.4536) | accumul (0.4146) |
| isle (0.3750) | activ (0.4403) | abnorm (0.4008) |
| system (0.3331) | mutation (0.3661) | diseas (0.2690) |
| erythematosu (0.2954) | cell (0.3649) | asialo (0.2393) |
| antibodi (0.2820) | ERK (0.2508) | neuron (0.2323) |
| index (0.2776) | gene (0.2132) | protein (0.2067) |
| diseas (0.2488) | RAS (0.19943) | patient (0.1990) |
| activ (0.24514) | pathwai (0.1804) | cell (0.1985) |
| measur (0.2432) | human (0.1781) | lysosom (0.1895) |
| anticoagul (0.2320) | cancer (0.16233) | respons (0.1836) |
| clinic (0.2193) | autoinhibit (0.1617) | human (0.1793) |
| bacon (0.1807) | express (0.1570) | promin (0.1724) |
| patient (0.1551) | growth (0.1447) | mice (0.16800) |
| EM (0.1492) | phosphoryl (0.1183) | clinic (0.1673) |
| ISI (0.1425) | focus (0.1162) | gangliosidosi (0.1647) |
| hay (0.1404) | signal (0.1152) | phenotyp (0.1599) |
| score (0.1291) | tumor (0.1148) | storag (0.1559) |
| SLE (0.1196) | RAF1 (0.1128) | apoptosi (0.1326) |
200: What serum [proteins] change expression in association with high disease activity in lupus?
201: What [mutations] in the Raf gene are associated with cancer?
202: What [drugs] are associated with lysosomal abnormalities in the nervous system?
Note: Terms are shown in stemmed form, acronyms have been capitalized, and the probabilities are not normalized.
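The topic-relevance terms above come from relevance-model feedback over top-ranked passages. As an assumed, simplified stand-in, the sketch below uses the common RM1-style estimator (Lavrenko and Croft): each top passage contributes its maximum-likelihood term distribution, weighted by its smoothed query likelihood under a uniform passage prior. The function name `relevance_model`, the smoothing parameter `mu`, and the data are hypothetical. Unlike the table, whose values are not normalized, this sketch returns a distribution that sums to 1.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List

def relevance_model(query: List[str], top_passages: List[List[str]],
                    coll: Counter, mu: float = 10.0) -> Dict[str, float]:
    """RM1-style estimate: P(w|R) = sum_p P(w|p) * P(p|Q) over the top-ranked passages.
    Assumes at least one passage and no empty passages."""
    coll_len, vocab = sum(coll.values()), len(coll)

    def smoothed(term: str, counts: Counter, length: int) -> float:
        p_coll = (coll[term] + 1) / (coll_len + vocab + 1)  # add-one collection model
        return (counts[term] + mu * p_coll) / (length + mu)

    bags = [Counter(p) for p in top_passages]
    log_lik = [sum(math.log(smoothed(q, bag, len(p))) for q in query)
               for bag, p in zip(bags, top_passages)]
    # Shift by the max before exp() so the weights cannot all underflow to zero.
    best = max(log_lik)
    weights = [math.exp(ll - best) for ll in log_lik]
    z = sum(weights)

    rm: Dict[str, float] = defaultdict(float)
    for w_p, bag, p in zip(weights, bags, top_passages):
        for term, c in bag.items():
            rm[term] += (w_p / z) * (c / len(p))  # maximum-likelihood P(w|p)
    return dict(rm)

# Toy usage: the third passage matches no query term, so it gets little weight.
top_k = [
    "raf mutat melanoma".split(),
    "braf mutat melanoma activ".split(),
    "lupu serum protein".split(),
]
coll = Counter(t for p in top_k for t in p)
rm = relevance_model(["raf", "mutat"], top_k, coll)
for term, prob in sorted(rm.items(), key=lambda kv: -kv[1])[:3]:
    print(term, round(prob, 4))
```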
Concepts from 2007 TREC Genomics
| antibodi (0.4895) | lupu (0.6477) | lysosom (0.9999) |
| cell (0.2565) | SLE (0.4645) | cell (0.2326) |
| serum (0.2158) | system (0.3527) | protein (0.1848) |
| anti (0.2042) | patient (0.3482) | membran (0.1514) |
| plasma (0.1544) | erythematosu (0.3394) | endosom (0.1297) |
| protein (0.1498) | diseas (0.1749) | enzym (0.1019) |
| membran (0.0869) | antibodi (0.1172) | degrad (0.0912) |
| incub (0.0718) | cell (0.1109) | transport (0.0778) |
| human (0.0713) | anti (0.0779) | acid (0.0704) |
| monoclon (0.0677) | nephriti (0.0744) | compart (0.0644) |
| express (0.0623) | mice (0.0618) | storag (0.0597) |
| bind (0.0596) | autoantibodi (0.06) | target (0.0582) |
| concentr (0.0541) | clinic (0.0579) | pathwai (0.0540) |
| beta (0.0513) | autoimmun (0.0557) | accumul (0.0536) |
| IG (0.0507) | DNA (0.0545) | diseas (0.0528) |
| alpha (0.0477) | human (0.0479) | human (0.0522) |
| mous (0.0477) | syndrom (0.0429) | express (0.0474) |
| rabbit (0.0459) | factor (0.0428) | cathepsin (0.0456) |
| CD (0.0434) | ISI (0.0424) | organel (0.0414) |
| glycoprotein (0.0423) | IG (0.0382) | golgi (0.0402) |
Notes: Each concept is represented by the probability that a sentence containing the concept generates the term.
Concept names, e.g., "Blood Protein" for Topic 1, are extracted from the UMLS Metathesaurus.
Terms, except acronyms, are shown in stemmed form.
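One reading of the notes above, consistent with the table (the concept's own stem, e.g. lysosom at 0.9999, scores near 1, and the columns do not sum to 1), is that P(t | c) is the fraction of sentences mentioning the concept that also contain the term. The sketch below implements that reading; it is an assumption, not the paper's stated estimator, and `concept_term_probs` and the sample sentences are hypothetical. Real concept detection would rely on UMLS tagging rather than the simple surface-form match used here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

def concept_term_probs(sentences: Iterable[Set[str]],
                       concept_forms: Set[str]) -> Dict[str, float]:
    """Estimate P(t | c) as the fraction of sentences mentioning concept c that contain t.

    A sentence "mentions" c if it contains any of the concept's surface forms
    (an assumed, simplified stand-in for UMLS concept tagging)."""
    hits: Dict[str, int] = defaultdict(int)
    n_concept = 0
    for sentence in sentences:
        if sentence & concept_forms:
            n_concept += 1
            for term in sentence:
                hits[term] += 1
    if n_concept == 0:
        return {}
    return {term: k / n_concept for term, k in hits.items()}

# Each sentence is a set of stemmed terms (hypothetical data).
sentences = [
    {"lysosom", "enzym", "degrad"},
    {"lysosom", "membran"},
    {"lysosom", "enzym", "storag"},
    {"golgi", "membran"},
]
probs = concept_term_probs(sentences, {"lysosom"})
print(sorted(probs.items(), key=lambda kv: -kv[1])[:2])  # [('lysosom', 1.0), ('enzym', 0.6666666666666666)]
```

Because each value is a per-sentence occurrence rate rather than a share of one distribution, the values for a concept need not sum to 1.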
Figure 2. Dimensional term index (paragraph).
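The abstract mentions a dimensional indexing strategy for aggregating concept and term statistics, and Figure 2 shows the paragraph level. Below is a minimal sketch of the general idea, assuming postings are stored at the finest (document, paragraph, sentence) coordinate so that coarser counts are roll-ups over key prefixes. `DimensionalIndex`, its methods, and the coordinate layout are hypothetical; the figure's actual schema is not recoverable from the text.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Coord = Tuple[str, int, int]  # (doc_id, paragraph_no, sentence_no): hypothetical dimensions

class DimensionalIndex:
    """Postings keyed by the finest coordinate, so coarser statistics are roll-ups."""

    def __init__(self) -> None:
        self.postings: Dict[str, Counter] = defaultdict(Counter)

    def add_sentence(self, doc_id: str, para: int, sent: int, terms: List[str]) -> None:
        for term in terms:
            self.postings[term][(doc_id, para, sent)] += 1

    def rollup(self, term: str, level: int) -> Counter:
        """Term frequency per coordinate prefix: level 1 = document,
        2 = paragraph, 3 = sentence."""
        totals: Counter = Counter()
        for coord, tf in self.postings.get(term, Counter()).items():
            totals[coord[:level]] += tf
        return totals

idx = DimensionalIndex()
idx.add_sentence("d1", 0, 0, ["raf", "mutat"])
idx.add_sentence("d1", 0, 1, ["raf"])
idx.add_sentence("d1", 1, 0, ["raf", "melanoma"])
idx.add_sentence("d2", 0, 0, ["raf"])
print(dict(idx.rollup("raf", 2)))  # {('d1', 0): 2, ('d1', 1): 1, ('d2', 0): 1}
```

Storing only the finest grain keeps one posting list per term; the cost is that document-level queries must sum over sentences, which a real index might avoid by also materializing the coarser levels.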
Entity resolution
| [Encephalopathy, Bovine Spongiform] | [Mad Cow Disease] |
| [MCD] | |
| [BSE] | |
| [Creutzfeldt-Jakob disease] | |
| [CJD] | |
| [PRNP gene] | [prion protein] |
| [prnp] | |
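The entity-resolution table groups variant names for the same entity (inverted heading forms, common names, acronyms, gene symbols). Below is a minimal sketch of using such groups for query expansion, assuming case-insensitive matching against hand-listed synonym groups. The groups are our reading of the table's rows. The Creutzfeldt-Jakob disease rows are left out because their relationship to the first group is not clear from the table alone. `build_alias_map` and `expand` are hypothetical names.

```python
from typing import Dict, List

# Hypothetical synonym groups read off the entity-resolution table (grouping assumed).
SYNONYM_GROUPS: List[List[str]] = [
    ["Encephalopathy, Bovine Spongiform", "Mad Cow Disease", "MCD", "BSE"],
    ["PRNP gene", "prion protein", "prnp"],
]

def build_alias_map(groups: List[List[str]]) -> Dict[str, int]:
    """Map each lower-cased surface form to the index of its synonym group."""
    return {name.lower(): gid for gid, group in enumerate(groups) for name in group}

def expand(mention: str, groups: List[List[str]], alias: Dict[str, int]) -> List[str]:
    """Return every variant in the mention's group, or the mention alone if unknown."""
    gid = alias.get(mention.lower())
    return list(groups[gid]) if gid is not None else [mention]

alias = build_alias_map(SYNONYM_GROUPS)
print(expand("bse", SYNONYM_GROUPS, alias)[0])  # Encephalopathy, Bovine Spongiform
print(expand("PRNP", SYNONYM_GROUPS, alias))    # ['PRNP gene', 'prion protein', 'prnp']
```

Lower-casing everything is a simplification: it lets the gene symbol PRNP match the lower-case form prnp, but in general it can also conflate unrelated acronyms.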
Results on the 2007 TREC Genomics collection (MAP)
| Top TREC* | 0.3105 | 0.0976 | 0.1097 | 0.2494 |
| Median TREC | 0.1954 | 0.0565 | 0.0391 | 0.1272 |
| TREC 2007 Submission | 0.2385 | 0.09742 | 0.05164 | 0.1647 |
| Document model | 0.2363 | - | - | - |
| Topic model | 0.2034 | - | - | - |
| Topic-relevance model | 0.2605 | 0.0898 | 0.0452 | 0.1383 |
| Concept model | 0.3381 | 0.1087 | 0.0579 | 0.1907 |
| Term model | 0.3226 | 0.1053 | 0.0557 | 0.1856 |
| Concept+Term models | 0.3443 | 0.1100 | 0.0588 | 0.2145 |
| Doc+Concept+Term+Topic-relevance | 0.3554 (+14.46%) | 0.1093 (+11.99%) | 0.0681 (-37.92%) | 0.2412 (-3.29%) |
†Statistically significant using the Wilcoxon signed-rank test (p < 0.05).
*Top results for the TREC 2007 Genomics Track.
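The percentages in the combined-model row are relative changes against the Top TREC row, for example (0.3554 - 0.3105) / 0.3105, about +14.46%. The sketch below reproduces that arithmetic and adds a dependency-free exact Wilcoxon signed-rank test over paired per-topic scores. The per-topic scores in the example are invented, because the table reports only mean values. The paper does not say whether the exact or the normal-approximation form of the test was used; for real data, `scipy.stats.wilcoxon` is the usual library route. `pct_change` and `wilcoxon_signed_rank_exact` are hypothetical names.

```python
import itertools
from typing import Sequence

def pct_change(run: float, baseline: float) -> float:
    """Relative change of run over baseline, in percent."""
    return 100.0 * (run - baseline) / baseline

def wilcoxon_signed_rank_exact(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact p-value for paired samples. Zero differences are dropped and
    tied |differences| get average ranks. Enumerates 2^n sign patterns, so keep n small (<= ~20)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Assign 1-based ranks to |d|, averaging ranks within runs of ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = sum(ranks) / 2  # E[W+] under the null of symmetric differences
    observed = abs(w_plus - mean)
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - mean) >= observed - 1e-12:  # small tolerance for float sums of half-ranks
            extreme += 1
    return extreme / 2 ** n

print(round(pct_change(0.3554, 0.3105), 2))  # 14.46
print(round(pct_change(0.0681, 0.1097), 2))  # -37.92

# Invented per-topic AP scores: the run beats the baseline on all 8 topics.
run = [0.31, 0.42, 0.23, 0.54, 0.45, 0.36, 0.27, 0.58]
base = [0.30, 0.40, 0.20, 0.50, 0.40, 0.30, 0.20, 0.50]
print(wilcoxon_signed_rank_exact(run, base))  # 0.0078125
```

With 8 topics all improving, only the all-positive and all-negative sign patterns are as extreme as the observed one, giving p = 2/256, below the 0.05 threshold in the note.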