Abstract
OBJECTIVES: The purpose of this study was to investigate the effects of query expansion algorithms for MEDLINE retrieval within a pseudo-relevance feedback framework.
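The pseudo-relevance feedback loop named in the objective can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: documents are assumed to be pre-tokenized lists, `R` is assumed to be the number of top-ranked documents treated as relevant and `E` the number of expansion terms added (matching the R and E parameters in the table captions), raw term frequency is a deliberately simple stand-in for the term ranking algorithms, and the linear rank-based weight is a hypothetical reweighting scheme.

```python
from collections import Counter


def expand_query(query_terms, ranked_docs, R=10, E=25):
    """Return an expanded query as a {term: weight} dict.

    ranked_docs: token lists from the initial retrieval, best first
    (hypothetical input format).
    R: number of top-ranked documents assumed to be relevant.
    E: number of expansion terms to add.
    """
    # Step 1: treat the top-R retrieved documents as relevant.
    pseudo_relevant = ranked_docs[:R]

    # Step 2: rank candidate terms. Raw frequency in the pseudo-relevant
    # set stands in for KLD, RSV, LCA, etc.; ties are broken alphabetically.
    counts = Counter(tok for doc in pseudo_relevant for tok in doc)
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    candidates = [t for t, _ in ordered if t not in query_terms]
    expansion = candidates[:E]

    # Step 3: reweight. Original terms keep weight 1.0; expansion terms get
    # a weight that decays linearly with rank (hypothetical scheme).
    weights = {t: 1.0 for t in query_terms}
    for i, t in enumerate(expansion):
        weights[t] = 1.0 - i / len(expansion)
    return weights
```

For example, with `docs = [["heart", "attack", "aspirin"], ["heart", "aspirin", "dose"], ["cancer", "screening"]]`, calling `expand_query(["heart"], docs, R=2, E=2)` adds "aspirin" (weight 1.0) and "attack" (weight 0.5) to the original term "heart".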
Keywords: Evaluation Studies; Information Storage and Retrieval; MEDLINE
Year: 2011 PMID: 21886873 PMCID: PMC3155169 DOI: 10.4258/hir.2011.17.2.120
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Term ranking algorithms and their formulas
p: the probability of occurrence of term t in the set of pseudo-relevant documents, q: the probability of occurrence of term t in the set of non-relevant documents, c: the probability of occurrence of term t in the whole document collection, w: the weight to be assigned to term t, EMIM: expected mutual information measure, F4: F4MODIFIED, IDF: inverse document frequency, KLD: Kullback-Leibler divergence, LCA: local context analysis, RSV: Robertson selection value.
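The formulas themselves are not reproduced in this record, so the sketch below uses the commonly cited textbook forms of four of the listed scores, written with the symbols defined above; the study's exact variants may differ. Two further assumptions: F4MODIFIED is taken to be the Robertson/Sparck Jones F4 relevance weight with the 0.5 correction, computed from document counts `r`, `R`, `n`, `N` (names introduced here for illustration), and EMIM and LCA are omitted.

```python
import math

# p: probability of term t in the pseudo-relevant documents
# q: probability of term t in the non-relevant documents
# c: probability of term t in the whole collection
# w: weight assigned to term t
# Textbook forms only; the paper's exact variants may differ.


def kld(p, c):
    """Kullback-Leibler divergence contribution of term t: p * log(p / c)."""
    return p * math.log(p / c) if p > 0 else 0.0


def rsv(w, p, q):
    """Robertson selection value: w * (p - q)."""
    return w * (p - q)


def idf(n, N):
    """Inverse document frequency: log(N / n), n = docs containing t."""
    return math.log(N / n)


def f4_modified(r, R, n, N):
    """Robertson/Sparck Jones F4 weight with the 0.5 correction (assumed).

    r: pseudo-relevant documents containing t
    R: number of pseudo-relevant documents
    n: documents in the collection containing t
    N: documents in the collection
    """
    return math.log(((r + 0.5) / (R - r + 0.5)) /
                    ((n - r + 0.5) / (N - n - R + r + 0.5)))
```

One natural combination is `rsv(f4_modified(r, R, n, N), p, q)`, which uses the F4 weight as the `w` in the selection value.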
Comparisons of different term ranking algorithms for different term reweighting methods (R = 10 and E = 25)
The mean average precision is presented for each combination of term ranking method (rows) and term reweighting method (columns); the percent (%) improvement over the unexpanded queries is given in parentheses.
EMIM: expected mutual information measure, F4: F4MODIFIED, IDF: inverse document frequency, KLD: Kullback-Leibler divergence, LCA: local context analysis, RSV: Robertson selection value.
Values marked a (p < 0.05) or b (p < 0.01) are shown in bold.
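The table reports mean average precision (MAP) and the percent improvement over unexpanded queries. A minimal sketch of both calculations, assuming each query's run is a ranked list of document IDs paired with a set of relevant IDs (a hypothetical input format; evaluation tools such as trec_eval compute the same non-interpolated average precision):

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision for one query."""
    hits, total = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / k  # precision at each relevant document's rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def percent_improvement(expanded, baseline):
    """Percent change of the expanded-query score over the baseline score."""
    return 100.0 * (expanded - baseline) / baseline
```

For instance, a MAP of 0.33 for expanded queries against 0.30 for the unexpanded queries corresponds to an improvement of about 10%.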
Figure 1. Average percentage of overlapping expansion terms for fifteen high-overlapping pairs of term ranking algorithms with the default parameter setting (R = 10, E = 25). IDF: inverse document frequency, EMIM: expected mutual information measure, LCA: local context analysis, KLD: Kullback-Leibler divergence, RSV: Robertson selection value.
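One plausible way to compute the overlap shown in Figure 1 is sketched below. It assumes, without confirmation from the source, that overlap is the number of terms shared by two algorithms' top-E expansion lists divided by E, averaged over all queries; the function names are hypothetical.

```python
def overlap_percentage(terms_a, terms_b, E=25):
    """Percentage of the top-E expansion terms shared by two ranking algorithms."""
    shared = set(terms_a[:E]) & set(terms_b[:E])
    return 100.0 * len(shared) / E


def average_overlap(per_query_a, per_query_b, E=25):
    """Mean overlap percentage across queries.

    per_query_a, per_query_b: lists of ranked expansion-term lists,
    one list per query, in the same query order.
    """
    pairs = list(zip(per_query_a, per_query_b))
    return sum(overlap_percentage(a, b, E) for a, b in pairs) / len(pairs)
```

Dividing by E rather than by the size of the union keeps the measure comparable across algorithm pairs when every algorithm contributes exactly E terms.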
Results from the paired t-test between term ranking algorithms when the rank_norm term reweighting was applied (R = 10 and E = 25)
EMIM: expected mutual information measure, F4: F4MODIFIED, IDF: inverse document frequency, KLD: Kullback-Leibler divergence, LCA: local context analysis, RSV: Robertson selection value.
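A paired t-test compares two term ranking algorithms on the same queries by testing whether the mean per-query difference in scores is zero. The sketch below assumes the paired values are per-query average precision scores (an assumption; the record does not state the unit of pairing) and uses only the standard library, so it returns the t statistic and degrees of freedom rather than a p-value.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic and degrees of freedom for per-query scores.

    scores_a, scores_b: per-query scores (e.g. average precision) for two
    algorithms, aligned by query.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1
```

If SciPy is available, `scipy.stats.ttest_rel(scores_a, scores_b)` computes the same statistic and also returns a two-sided p-value, which can be compared against the 0.05 and 0.01 thresholds used in the tables.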
Comparison of different term ranking algorithms for different term reweighting methods (R = 50 and E = 15)
The mean average precision is presented for each combination of term ranking method (rows) and term reweighting method (columns); the percent (%) improvement over the unexpanded queries is given in parentheses.
EMIM: expected mutual information measure, F4: F4MODIFIED, IDF: inverse document frequency, KLD: Kullback-Leibler divergence, LCA: local context analysis, RSV: Robertson selection value.
Values marked a (p < 0.05) or b (p < 0.01) are shown in bold.
Results of the paired t-test between term ranking algorithms when rank_norm term reweighting was applied (R = 50 and E = 15)
EMIM: expected mutual information measure, F4: F4MODIFIED, IDF: inverse document frequency, KLD: Kullback-Leibler divergence, LCA: local context analysis, RSV: Robertson selection value.
Figure 2. Comparison of different term reweighting algorithms for queries expanded with the local context analysis term ranking algorithm, in terms of the percentage change in precision in the top 5, 10, 15, 20, and 30 retrieved documents relative to the original queries.
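The quantity plotted in Figure 2 can be sketched as the percentage change in mean precision at each cutoff k between expanded and original queries. This assumes precision is first averaged over queries and the change is then taken on those means (the record does not say which order was used); the input format of (ranking, relevant_set) pairs is hypothetical.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for doc in ranking[:k] if doc in relevant) / k


def mean_precision_at_k(runs, k):
    """runs: list of (ranking, relevant_set) pairs, one per query."""
    return sum(precision_at_k(r, rel, k) for r, rel in runs) / len(runs)


def precision_change(expanded_runs, original_runs, cutoffs=(5, 10, 15, 20, 30)):
    """Percentage change of mean P@k for expanded vs. original queries."""
    changes = {}
    for k in cutoffs:
        base = mean_precision_at_k(original_runs, k)
        new = mean_precision_at_k(expanded_runs, k)
        # Undefined when the original queries retrieve nothing relevant at k.
        changes[k] = 100.0 * (new - base) / base if base > 0 else float("nan")
    return changes
```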