Abstract
OBJECTIVES: The purpose of this study is to validate a method that uses multiple queries to create a set of relevance judgments, which indicate the documents pertinent to each query, when forming a biomedical test collection.
Keywords: Correlation Studies; Evaluation Studies; Gold Standard; Information Storage and Retrieval; MEDLINE
Year: 2012 PMID: 22509475 PMCID: PMC3324757 DOI: 10.4258/hir.2012.18.1.65
Source DB: PubMed Journal: Healthc Inform Res ISSN: 2093-3681
Figure 1. Block diagram of experimental methodology.
Data sets used for experimentation
Figure 2. Methodological diagram for generating aspect-qrels.
Figure 3. System rankings among the original Text Retrieval Conference (TREC) rankings on mean average precision (MAP) and aspect-qrel based rankings on aMAP, test 1 and test 2.
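As a rough illustration of the two scores compared in Figure 3, the sketch below computes mean average precision from a ranked run and a set of relevance judgments. The only difference between MAP and aMAP here is which judgments are passed in: TREC qrels for MAP, aspect-qrels for aMAP. The function names, the dictionary layout, and the toy document identifiers are assumptions made for this example and are not taken from the study.

```python
def average_precision(ranked_docs, relevant):
    """AP for one query: precision at each rank holding a relevant
    document, summed and divided by the number of relevant documents."""
    hits = 0
    precision_sum = 0.0
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(run, qrels):
    """Mean of per-query AP over all judged queries.

    run   -- {query_id: [doc_id, ...]} in ranked order (hypothetical layout)
    qrels -- {query_id: {doc_id, ...}} relevant documents per query
    """
    aps = [average_precision(run.get(q, []), rel) for q, rel in qrels.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Toy data, invented for illustration only.
run = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
trec_qrels = {"q1": {"d1", "d3"}, "q2": {"d6"}}
print(mean_average_precision(run, trec_qrels))
```

Passing a different judgment set for the same run yields the aMAP-style score, so each system can be ranked twice and the two rankings compared.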
Rank correlations between MAP computed from TREC qrels and aMAP computed from 15 aspect query based qrels
Each correlation was calculated under p < 0.01.
MAP: mean average precision, aMAP: aspect-qrels MAP, TREC: Text Retrieval Conference.
a: The highest correlation value.
Figure 4. Experimental results for rank correlations among aspect-qrels: Kendall's tau and Spearman's rho coefficients. TREC: Text Retrieval Conference.
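The two coefficients named in Figure 4 can be sketched as follows for a pair of system rankings, one induced by MAP and one by aMAP. This is a minimal, tie-free version (Kendall's tau-a and the classic Spearman formula) written for illustration. The system names and scores are invented, and the study's actual tooling is not stated here.

```python
from itertools import combinations


def ranks(scores):
    """Rank systems by descending score (1 = best); assumes no tied scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {system: r for r, system in enumerate(ordered, start=1)}


def kendall_tau(a, b):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    ra, rb = ranks(a), ranks(b)
    concordant = discordant = 0
    for s, t in combinations(list(a), 2):
        sign = (ra[s] - ra[t]) * (rb[s] - rb[t])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def spearman_rho(a, b):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    ra, rb = ranks(a), ranks(b)
    n = len(a)
    d2 = sum((ra[s] - rb[s]) ** 2 for s in a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


# Hypothetical per-system scores, invented for illustration only.
map_scores = {"sysA": 0.30, "sysB": 0.25, "sysC": 0.20, "sysD": 0.10}
amap_scores = {"sysA": 0.28, "sysB": 0.18, "sysC": 0.22, "sysD": 0.09}
print(kendall_tau(map_scores, amap_scores), spearman_rho(map_scores, amap_scores))
```

With real evaluation data, tied scores are possible, in which case a tie-aware variant (for example tau-b) would be the safer choice.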
Figure 5. The effect of the number of documents collected per aspect during aspect-qrel creation. TREC: Text Retrieval Conference.
Figure 6. Rho correlation between MAP and aMAP calculated from individual aspect qrels. MAP: mean average precision, aMAP: aspect-qrels MAP.
Rho correlation between MAP and aMAP calculated from individual aspect qrels
Each correlation was calculated under p < 0.01.
MAP: mean average precision, aMAP: aspect-qrels MAP.
a: The highest correlation value in each row.
Baseline correlations using different seed systems: BM25, language model, and PL2
Each correlation was calculated under p < 0.01.
a: The highest correlation value.