T Elizabeth Workman, John F Hurdle.
Abstract
BACKGROUND: Traditional information retrieval techniques typically return excessive output when directed at large bibliographic databases. Natural language processing applications strive to extract salient content from the excessive data. Semantic MEDLINE, a National Library of Medicine (NLM) natural language processing application, highlights relevant information in PubMed data. However, Semantic MEDLINE implements manually coded schemas, which accommodate only a few information needs. Currently, there are only five such schemas, while many more would be needed to realistically accommodate all potential users. The aim of this project was to develop and evaluate a statistical algorithm that automatically identifies relevant bibliographic data; the new algorithm could be incorporated into a dynamic schema to accommodate various information needs in Semantic MEDLINE and eliminate the need for multiple schemas.
Year: 2011 PMID: 21284871 PMCID: PMC3042900 DOI: 10.1186/1472-6947-11-6
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Figure 1. Semantic MEDLINE. The adaptive Combo algorithm described in this paper was designed to be incorporated into the Summarization process.
Figure 2. Visualized Summarized Results. This is an image of the Visualization process displaying summarized data addressing the genetic etiology of bladder cancer.
Combo Scores of Top-Ranking Patterns in Novel Relevance and Novel Connectivity Analyses; non-seed semantic types are indicated in square brackets.
| Relevance Analysis Seed Topic: Carcinoma of bladder | Combo Score |
|---|---|
| [gngm] ASSOCIATED_WITH neop | 0.592531 |
| [gngm] PREDISPOSES neop | 0.205778 |
| [aapp] ASSOCIATED_WITH neop | 0.152883 |
| [aapp] PREDISPOSES neop | 0.039868 |
| gngm ASSOCIATED_WITH [neop] | 0.873016 |
Kullback-Leibler Divergence Scores of Top-Ranking Predicates in Novel Relevance and Novel Connectivity Analysis
| Relevance Analysis Seed Topic: Carcinoma of bladder | KLD Score |
|---|---|
| ASSOCIATED_WITH | 0.561861059 |
| PREDISPOSES | 0.299181776 |
| AFFECTS | 0.088951936 |
| PART_OF | 0.034851914 |
| ASSOCIATED_WITH | 0.5553145 |
RlogF Scores of Top-Ranking Predicate/Non-seed Semantic Type Pairs in Novel Relevance and Novel Connectivity Analysis.
| Relevance Analysis Seed Topic: Carcinoma of bladder | RlogF Score |
|---|---|
| gngm ASSOCIATED_WITH | 4.218344839 |
| topp TREATS | 2.96127605 |
| ISA neop | 2.807354922 |
| gngm PREDISPOSES | 2.751207824 |
| ASSOCIATED_WITH neop | 7.208071323 |
Recall Results with Reference Standard (TP = True Positive; FN = False Negative)
| Gene | Combo Analysis | KLD Analysis | RlogF Analysis | Conventional Schema |
|---|---|---|---|---|
| FGFR3 | TP | TP | TP | TP |
| XPD | TP | TP | TP | FN |
| RAG1 | FN | FN | FN | FN |
| TP53 | TP | TP | TP | TP |
| MTCYB | FN | FN | FN | FN |
| HRAS | TP | TP | TP | FN |
| NAT2 | TP | TP | TP | TP |
| RB1 | TP | TP | TP | FN |
| TSC1 | TP | TP | TP | FN |
| ATM | FN | FN | FN | FN |
| TGFB1 | FN | FN | FN | FN |
| MDM2 | TP | TP | TP | FN |
| ERBB3 | FN | FN | FN | FN |
| Recall | 61% | 61% | 61% | 23% |
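
The recall row can be rechecked from the per-gene outcomes in the table. The sketch below assumes the usual definition, recall = TP / (TP + FN), computed over the 13 reference-standard genes; the variable and function names are illustrative and do not come from the paper.

```python
# Per-gene outcomes transcribed from the recall table above.
# Tuple order: Combo, KLD, RlogF, Conventional Schema.
METHODS = ("Combo", "KLD", "RlogF", "Conventional")

OUTCOMES = {
    "FGFR3": ("TP", "TP", "TP", "TP"),
    "XPD":   ("TP", "TP", "TP", "FN"),
    "RAG1":  ("FN", "FN", "FN", "FN"),
    "TP53":  ("TP", "TP", "TP", "TP"),
    "MTCYB": ("FN", "FN", "FN", "FN"),
    "HRAS":  ("TP", "TP", "TP", "FN"),
    "NAT2":  ("TP", "TP", "TP", "TP"),
    "RB1":   ("TP", "TP", "TP", "FN"),
    "TSC1":  ("TP", "TP", "TP", "FN"),
    "ATM":   ("FN", "FN", "FN", "FN"),
    "TGFB1": ("FN", "FN", "FN", "FN"),
    "MDM2":  ("TP", "TP", "TP", "FN"),
    "ERBB3": ("FN", "FN", "FN", "FN"),
}


def recall(labels):
    """Recall = TP / (TP + FN) over a list of 'TP'/'FN' labels."""
    tp = labels.count("TP")
    fn = labels.count("FN")
    return tp / (tp + fn)


# Recall per method, taken column by column.
recall_by_method = {
    method: recall([row[i] for row in OUTCOMES.values()])
    for i, method in enumerate(METHODS)
}

for method, value in recall_by_method.items():
    print(f"{method}: {value:.3f}")
```

Under this definition the three statistical analyses each recover 8 of the 13 genes (about 61.5%, shown as 61% in the table), and the conventional schema recovers 3 of 13 (about 23%).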
Precision Results (TP = True Positive; FP = False Positive)
| | Combo Analysis | KLD Analysis | RlogF Analysis | Conventional Schema |
|---|---|---|---|---|
| TP | 60 | 54 | 50 | 10 |
| FP | 14 | 23 | 19 | 0 |
| Total | 74 | 77 | 69 | 10 |
| Precision | 81% | 70% | 72% | 100% |
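
The precision row follows from the TP and FP counts in the same table. A minimal sketch, assuming the standard definition precision = TP / (TP + FP); the names used here are illustrative only.

```python
# (TP, FP) counts transcribed from the precision table above.
COUNTS = {
    "Combo": (60, 14),
    "KLD": (54, 23),
    "RlogF": (50, 19),
    "Conventional": (10, 0),
}


def precision(tp, fp):
    """Precision = TP / (TP + FP)."""
    return tp / (tp + fp)


# Precision per method as a whole-number percentage.
precision_pct = {
    method: round(100 * precision(tp, fp))
    for method, (tp, fp) in COUNTS.items()
}

for method, pct in precision_pct.items():
    print(f"{method}: {pct}%")
```

Rounded to whole percentages, these counts reproduce the table's 81%, 70%, 72%, and 100%. The conventional schema's perfect precision rests on only 10 returned items, against 69 to 77 for the statistical analyses.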