Mir S Siadaty, Jianfen Shu, William A Knaus.
Abstract
BACKGROUND: Receiving extraneous articles in response to a query submitted to MEDLINE/PubMed is common. When submitting a multi-word query (as most queries are), the presence of all query words within each article may be a necessary condition for retrieving relevant articles, but it is not sufficient. Ideally, a relationship between the query words in the article is also required. We propose that if two words occur within an article, the probability that a relation between them is explained is higher when the words occur within adjacent sentences than when they occur in remote sentences. Therefore, sentence-level concurrence can be used as a surrogate for the existence of a relationship between the words. To avoid irrelevant articles, one solution would be to increase the search specificity. Another solution is to estimate a relevance score and use it to sort the retrieved articles. However, among the >30 retrieval services available for MEDLINE, only a few estimate a relevance score, and none detects and incorporates the relation between the query words as part of the relevance score.
Year: 2007 PMID: 17214888 PMCID: PMC1780044 DOI: 10.1186/1472-6947-7-1
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Examples of retrieval services for MEDLINE
| Service | Access | Relevance score | Description |
| PubMed | public/free | no | NLM's search engine for MEDLINE |
| SLIM | public/free | no | alternative search interface using slider controllers to implement search limits, methodology filters, and MeSH terminologies |
| askMEDLINE | public/free | no | free-text, natural language query tool for PubMed |
| eTBLAST | public/free | yes | inputs an entire paragraph and returns articles that are similar to it |
| Ovid's MEDLINE | subscription required | no | a search engine to MEDLINE |
| HubMed | public/free | yes | shows first the articles that contain the search terms most frequently in the title and/or abstract |
| PubMedAssistant | public/free | no | biologist-friendly interface for enhanced PubMed search |
| CISMeF | public/free | no | gives ranked list of relevant specialties that relate to topics discussed in each article |
| GoPubMed | public/free | no | classifies the retrieved articles using Gene Ontology terms |
| AnneOTate | public/free | no | a tool for summarizing the results of a PubMed query |
| ArrowSmith | public/free | no | a tool for identifying links between two sets of MEDLINE articles |
| PubMed Gold | public/free | no | finds PDFs for PubMed citations |
Tuning a search engine to attain two different scenarios of retrieval.
Scenario 1. Query with specificity of 99.99% is insufficient for a database of 16 million records.
| | | Relevant records (the truth) | Irrelevant records (the truth) | Total |
| Search engine | Records returned to user | 495 | 1,600 | 2,095 |
| Search engine | Records eliminated | 5 | 15,997,900 | |
| | Total | 500 | 15,999,500 | 16,000,000 |
| Odds ratio | 1,000,000.00 |
| Specificity | 99.99% |
| Sensitivity (recall) | 99.01% |
| Precision | 23.63% |
Scenario 2. The price for a very high specificity: missing a large number of relevant records.
| | | Relevant records (the truth) | Irrelevant records (the truth) | Total |
| Search engine | Records returned to user | 250 | 16 | 266 |
| Search engine | Records eliminated | 250 | 15,999,484 | |
| | Total | 500 | 15,999,500 | 16,000,000 |
| Odds ratio | 1,000,000.00 |
| Specificity | 99.9999% |
| Sensitivity (recall) | 50.00% |
| Precision | 93.99% |
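The specificity, recall, and precision figures in the two scenarios follow from the four cells of each 2x2 table. As a minimal sketch (the function name `retrieval_metrics` and its argument names are illustrative assumptions, not part of the original article), they can be recomputed from the cell counts; the recomputed values may differ from the listed ones in the last rounded digit.

```python
def retrieval_metrics(tp, fp, fn, tn):
    """Return specificity, recall, and precision for a 2x2 retrieval table.

    tp: relevant records returned      fp: irrelevant records returned
    fn: relevant records eliminated    tn: irrelevant records eliminated
    """
    return {
        "specificity": tn / (tn + fp),
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


# Cell counts copied from the two scenario tables above.
scenario_1 = retrieval_metrics(tp=495, fp=1_600, fn=5, tn=15_997_900)
scenario_2 = retrieval_metrics(tp=250, fp=16, fn=250, tn=15_999_484)

for name, metrics in (("Scenario 1", scenario_1), ("Scenario 2", scenario_2)):
    print(name, {key: f"{value:.4%}" for key, value in metrics.items()})
```

The comparison makes the trade-off concrete: with 16 million records, even a small false-positive rate yields many more irrelevant than relevant records returned, and raising specificity further buys precision at the cost of recall.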
Database tables, and their fields
| PMID | PubMed ID number | no |
| SNTNCID | sentence ID number | no |
| Sentence | text of the sentence | yes |
| PMID | PubMed ID number | yes |
| Citation | Citation information for the article | no |
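To illustrate how a sentence-per-row layout like this could support sentence-level concurrence, here is a minimal sketch using Python's built-in `sqlite3` module. The table names (`sentences`, `citations`), the toy rows, and the helper `articles_with_cooccurrence` are hypothetical; the database engine used by the authors is not stated in this record, and the meaning of the yes/no column is not given here, so it is ignored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table names; columns follow the field list above.
    CREATE TABLE sentences (
        PMID     INTEGER,   -- PubMed ID number
        SNTNCID  INTEGER,   -- sentence ID number
        Sentence TEXT       -- text of the sentence
    );
    CREATE TABLE citations (
        PMID     INTEGER PRIMARY KEY,  -- PubMed ID number
        Citation TEXT                  -- citation information for the article
    );
""")

# Toy rows, invented purely for illustration.
conn.executemany("INSERT INTO sentences VALUES (?, ?, ?)", [
    (1, 1, "Aspirin reduced headache severity in adults."),
    (2, 1, "Aspirin doses were recorded at baseline."),
    (2, 2, "Headache was assessed by questionnaire."),
])
conn.executemany("INSERT INTO citations VALUES (?, ?)", [
    (1, "Toy citation A"),
    (2, "Toy citation B"),
])


def articles_with_cooccurrence(conn, word_a, word_b):
    """Return (PMID, Citation) pairs with a single sentence containing both words."""
    rows = conn.execute(
        """
        SELECT DISTINCT c.PMID, c.Citation
        FROM sentences AS s
        JOIN citations AS c ON c.PMID = s.PMID
        WHERE s.Sentence LIKE ? AND s.Sentence LIKE ?
        ORDER BY c.PMID
        """,
        (f"%{word_a}%", f"%{word_b}%"),
    )
    return rows.fetchall()


hits = articles_with_cooccurrence(conn, "aspirin", "headache")
print(hits)
```

In this toy data both articles contain both words, but only article 1 has them in the same sentence, so only it is returned. Substring matching with `LIKE` is a simplification; a real system would more likely tokenize or use a full-text index.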
Figure 1. Format of search results returned by Relemed.
The eight relevance levels defined by Relemed.
| 1 | T and A and M |
| 2 | T and A |
| 3 | T and M |
| 4 | A and M |
| 5 | T |
| 6 | A |
| 7 | M |
| 8 | TAM |
T = title
A = at least one abstract sentence
M = concatenated MeSH terms
TAM = title, abstract, and MeSH concatenated into one sentence
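One way to read the eight levels is as a lookup on three yes/no checks (do all query words concur in T, in A, in M?), with level 8 as the fallback when the words only concur once everything is concatenated. The sketch below follows that reading; the function names and the simple whole-word, case-insensitive matching are assumptions for illustration and may differ from how Relemed actually matches words.

```python
import re


def contains_all(words, text):
    """True if every query word appears as a whole word in text (case-insensitive)."""
    tokens = set(re.findall(r"\w+", text.lower()))
    return all(word.lower() in tokens for word in words)


# (T, A, M) match pattern -> relevance level, following the table above.
LEVELS = {
    (True, True, True): 1,
    (True, True, False): 2,
    (True, False, True): 3,
    (False, True, True): 4,
    (True, False, False): 5,
    (False, True, False): 6,
    (False, False, True): 7,
}


def relevance_level(query_words, title, abstract_sentences, mesh_terms):
    """Return the level (1-8), or None if the article lacks some query word."""
    t = contains_all(query_words, title)
    a = any(contains_all(query_words, s) for s in abstract_sentences)
    m = contains_all(query_words, " ".join(mesh_terms))
    if (t, a, m) in LEVELS:
        return LEVELS[(t, a, m)]
    # Level 8: the words concur only in the concatenated title + abstract + MeSH.
    tam = " ".join([title, *abstract_sentences, *mesh_terms])
    return 8 if contains_all(query_words, tam) else None


level = relevance_level(
    ["aspirin", "headache"],
    title="Aspirin for headache",
    abstract_sentences=["Aspirin was given.", "Headache improved."],
    mesh_terms=["Aspirin", "Adult"],
)
print(level)
```

Here the words concur in the title but in no single abstract sentence and not in the MeSH terms, so the toy article falls in level 5 (T).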
Count of articles in each Relemed relevance level for the two case studies
| Relevance level | Case study #1 | Case study #2 |
| L1 T&A&M | 32 | 0 |
| L2 T&A | 4 | 6 |
| L3 T&M | 36 | 0 |
| L4 A&M | 78 | 0 |
| L5 T | 12 | 2 |
| L6 A | 182 | 68 |
| L7 M | 290 | 0 |
| L8 TAM | 257 | 82 |
| Total | 891 | 158 |
Figure 2. Trend of precision in Relemed versus PubMed for case study #1. The red dots show the observed precision in the 8 groups of PMIDs per search engine. The solid blue line is a fitted smoother curve for the observed binary data (true-positive versus false-positive). The dashed black curves are the estimated 95% global confidence bands.
A false positive article for query of case study #1, where query words do concur, both in text and in MeSH (but not in the same sentence).
| DiFranza JR, Aligne CA, Weitzman M. |
Figure 3. Trend of precision for case study #2. The red dots show the observed precision in the 8 groups of PMIDs per search engine. The solid blue line is a fitted smoother curve for the observed binary data (true-positive versus false-positive). The dashed black curves are the estimated 95% global confidence bands.