Abu Saleh Mohammad Mosa1, Illhoi Yoo.
Abstract
BACKGROUND: The practice of evidence-based medicine requires efficient searching of biomedical literature databases such as PubMed/MEDLINE. Retrieval performance depends heavily on the effective use of search field tags. The purpose of this study was to analyze PubMed log data in order to understand how end users use search field tags in PubMed/MEDLINE searches.
Year: 2013 PMID: 23302604 PMCID: PMC3552776 DOI: 10.1186/1472-6947-13-8
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
PubMed/MEDLINE informational search field tags [14]
| Search field tag | Variations |
|---|---|
| [MESH TERMS] | [MH], [MESH] |
| [MESH MAJOR TOPIC] | [MAJR] |
| [MESH SUBHEADINGS] | [SH], [SUBHEADING] |
| [FILTER]* | [FILTER]* |
| [LANGUAGE] | [LA], [LANG] |
| [EC/RN NUMBER] | [RN], [EC], [ECNO] |
| [OTHER TERM] | [OT], [KEYWORD] |
| [PS]* | [PS]* |
| [SUPPLEMENTARY CONCEPT] | [NM], [SUBS], [SUBSTANCE NAME] |
| [PHARMACOLOGICAL ACTION] | [PA] |
| [PLACE OF PUBLICATION] | [PL] |
| [PUBLICATION TYPE] | [PT], [PTYP] |
| [SUBSET] | [SB] |
| [TEXT WORDS] | [TW], [TEXT], [WORD] |
| [TITLE] | [TI], [TITL] |
| [TITLE/ABSTRACT] | [TIAB] |
| [TRANSLITERATED TITLE]# | [TT]# |
| [ALL FIELDS] | [ALL], [ALL FIELD] |
| COMMENT CORRECTIONS # | N/A |
*No variation of this tag was observed either in the PubMed documentation [14] or query log file.
#This tag did not appear in the user query log file.
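As an illustration of how tag variations like those listed above might be mapped back to a single tag when processing a query log, here is a minimal Python sketch. It is not the authors' pre-processing code: the `TAG_VARIANTS` dictionary covers only a few rows of the table, and the function name `canonical_tags`, the regular expression, and the case-insensitive matching are all assumptions made for this example.

```python
import re

# Variation -> canonical tag, copied from a few rows of the table above.
# This is a deliberately partial, hypothetical mapping; extend it as needed.
TAG_VARIANTS = {
    "MESH TERMS": "MESH TERMS", "MH": "MESH TERMS", "MESH": "MESH TERMS",
    "MESH MAJOR TOPIC": "MESH MAJOR TOPIC", "MAJR": "MESH MAJOR TOPIC",
    "LANGUAGE": "LANGUAGE", "LA": "LANGUAGE", "LANG": "LANGUAGE",
    "PUBLICATION TYPE": "PUBLICATION TYPE", "PT": "PUBLICATION TYPE",
    "PTYP": "PUBLICATION TYPE",
    "TITLE": "TITLE", "TI": "TITLE", "TITL": "TITLE",
    "TITLE/ABSTRACT": "TITLE/ABSTRACT", "TIAB": "TITLE/ABSTRACT",
}

# Anything between square brackets is treated as a candidate tag.
TAG_PATTERN = re.compile(r"\[([^\[\]]+)\]")


def canonical_tags(query):
    """Return the set of canonical tags found in a query string.

    Matching is case-insensitive (an assumption of this sketch), and
    bracketed text that is not in TAG_VARIANTS is ignored.
    """
    found = set()
    for raw in TAG_PATTERN.findall(query):
        key = " ".join(raw.upper().split())  # normalize case and spacing
        if key in TAG_VARIANTS:
            found.add(TAG_VARIANTS[key])
    return found


tags = canonical_tags("asthma[mh] AND english[lang] AND review[ptyp]")
print(sorted(tags))
```

Counting the size of the returned set per query gives the "number of distinct tags" used in the tables that follow.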
PubMed/MEDLINE navigational search field tags [14]
| Search field tag | Variations |
|---|---|
| [AFFILIATION] | [AD], [AFFIL] |
| [ARTICLE IDENTIFIER] | [AID], [DOI], [PII] |
| [AUTHOR NAME] | [AUTHOR], [AU], [AU NAME], [AUTH] |
| [BOOK]* | [BOOK]* |
| [CORPORATE AUTHOR] | [CN] |
| [CREATE DATE]# | [CRDT]# |
| [COMPLETION DATE]# | [DCOM]# |
| [EDITOR]# | [ED]# |
| [ENTREZ DATE] | [EDAT] |
| [FIRST AUTHOR NAME] | [1AU], [FIRST AUTHOR] |
| [FULL AUTHOR NAME] | [FAU], [FULL] |
| [FULL INVESTIGATOR NAME]# | [FIR]# |
| [GRANT NUMBER] | [GR] |
| [INVESTIGATOR]# | [IR]# |
| [ISBN]#* | [ISBN]#* |
| [ISSUE] | [IP], [ISS] |
| [JOURNAL] | [TA], [JOUR], [IS], [JO], [JOURNAL NAME] |
| [LAST AUTHOR]# | [LASTAU]# |
| [LOCATION ID]# | [LID]# |
| [MESH DATE] | [MHDA] |
| [MODIFICATION DATE]# | [LR]# |
| [NLM UNIQUE ID] | [JID], [NLMID] |
| OWNER#* | N/A |
| [PAGINATION] | [PG], [PAGE], [PAGE NUMBER] |
| [PMID] | [UID] |
| [PUBLISHER] | [PUBN]# |
| [PUBLICATION DATE] | [DP], [PDAT] |
| [SECONDARY SOURCE ID] | [SI] |
| [VOLUME] | [VI], [VOLUME NUMBER], [VOL] |
* No variation of this tag was observed either in the PubMed documentation [14] or query log file.
#This tag did not appear in the user query log file.
Figure 1. Sample PubMed query log. This figure presents a total of 10 sample queries from the PubMed query log file that was used in this study.
Figure 2. Data pre-processing steps. This figure demonstrates the data cleaning and pre-processing steps for association mining analysis.
Search tag and queries issued per user
| Number of queries | = | = | = | = | Total |
|---|---|---|---|---|---|
| 1 | 193,935 | 54,930 | 9,002 | 7,758 | 265,625 |
| 2 | 64,502 | 12,461 | 764 | 1,809 | 79,536 |
| 3 | 45,023 | 7,869 | 561 | 1,212 | 54,665 |
| 4 | 31,945 | 6,016 | 394 | 895 | 39,250 |
| 5 | 24,128 | 4,634 | 360 | 709 | 29,831 |
| 6 | 18,248 | 3,898 | 312 | 609 | 23,067 |
| 7 | 14,210 | 3,267 | 254 | 493 | 18,224 |
| 8 | 11,348 | 2,703 | 251 | 484 | 14,786 |
| 9 | 9,053 | 2,295 | 200 | 447 | 11,995 |
| 10 | 7,548 | 1,966 | 154 | 371 | 10,039 |
| 11 to 50 | 42,741 | 15,608 | 1,526 | 3,394 | 63,269 |
| Total (%) | 462,681 | 115,647 | 13,778 | 18,181 | 610,287 |
| | (75.81%) | (18.95%) | (2.26%) | (2.98%) | (100%) |
This table presents the number of users, broken down by the number of distinct tags they used and the number of queries they issued.
Figure 3. Number of users using a different number of distinct tags per number of queries. This histogram presents the number of users, broken down by the number of distinct search field tags used and the number of queries issued.
Number of queries containing distinct number of tags
| Number of distinct tags | Number of queries | Relative frequency (%) |
|---|---|---|
| 0 | 2,585,183 | 88.61989 |
| 1 | 245,838 | 8.42731 |
| 2 | 34,731 | 1.19058 |
| 3 | 27,766 | 0.95182 |
| 4 | 16,320 | 0.55945 |
| 5 | 5,157 | 0.17678 |
| 6 | 1,956 | 0.06705 |
| 7 | 195 | 0.00668 |
| 8 | 10 | 0.00034 |
| 9 | 2 | 0.00007 |
| 10 | 0 | 0.00000 |
| 11 | 1 | 0.00003 |
This table presents the total number of queries (and their relative frequency) containing a given number of distinct tags. The maximum number of distinct tags appearing in a single query is 11.
Figure 4. Number of queries containing different numbers of distinct tags. This histogram presents the total number of queries containing each number of distinct tags.
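The relative frequencies in the table follow directly from the query counts. The short Python sketch below recomputes them from the counts listed above; the variable and function names are chosen for this example only.

```python
# Query counts per number of distinct tags, copied from the table above.
QUERY_COUNTS = {
    0: 2_585_183, 1: 245_838, 2: 34_731, 3: 27_766, 4: 16_320, 5: 5_157,
    6: 1_956, 7: 195, 8: 10, 9: 2, 10: 0, 11: 1,
}


def relative_frequencies(counts):
    """Return each count as a percentage of the total number of queries."""
    total = sum(counts.values())
    return {k: 100.0 * v / total for k, v in counts.items()}


total_queries = sum(QUERY_COUNTS.values())
freqs = relative_frequencies(QUERY_COUNTS)

# Share of queries that contain at least one search field tag.
tagged_share = 100.0 - freqs[0]

for n_tags, pct in freqs.items():
    print(f"{n_tags:>2} tags: {pct:.5f}%")
print(f"queries with at least one tag: {tagged_share:.2f}%")
```

Grouping all tagged queries together in this way shows that roughly one query in nine in this log used any search field tag at all.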
Search tag frequencies
| Category | Search field tag | Queries with this tag only | Queries with this tag and other tag(s) | Total queries |
|---|---|---|---|---|
| Navigational | [AUTHOR] | 179,418 | 23,277 | 202,695 |
| | [PUBLICATION DATE] | 2,197 | 51,021 | 53,218 |
| | [JOURNAL] | 12,153 | 36,383 | 48,536 |
| | [PAGINATION] | 330 | 36,213 | 36,543 |
| | [VOLUME] | 89 | 33,630 | 33,719 |
| | [ISSUE] | 4 | 10,608 | 10,612 |
| | [ENTREZ DATE] | 695 | 3,490 | 4,185 |
| | [FIRST AUTHOR NAME] | 1,000 | 2,478 | 3,478 |
| | [AFFILIATION] | 1,197 | 1,341 | 2,538 |
| | [CORPORATE AUTHOR] | 1,463 | 8 | 1,471 |
| | [PMID] | 1,351 | 65 | 1,416 |
| | [GRANT NUMBER] | 85 | 652 | 737 |
| | [MESH DATE] | 21 | 201 | 222 |
| | [BOOK] | 78 | 1 | 79 |
| | [FULL AUTHOR NAME] | 64 | 30 | 79 |
| | [DATE] | 13 | 53 | 66 |
| | [SECONDARY SOURCE ID] | 34 | 0 | 34 |
| | [ARTICLE IDENTIFIER] | 13 | 0 | 13 |
| | [NLMID] | 6 | 0 | 6 |
| Informational | [MESH TERMS] | 10,195 | 11,704 | 21,899 |
| | [LANGUAGE] | 12,496 | 7,595 | 20,091 |
| | [TITLE] | 7,180 | 3,765 | 10,945 |
| | [TITLE ABSTRACT] | 5,001 | 4,889 | 9,890 |
| | [PUBLICATION TYPE] | 605 | 7,366 | 7,971 |
| | [MESH MAJOR TOPIC] | 2,047 | 5,847 | 7,894 |
| | [TEXT WORD] | 1,227 | 5,950 | 7,177 |
| | [SUBSET] | 2,775 | 4,167 | 6,942 |
| | [ALL FIELDS] | 2,822 | 1,922 | 4,744 |
| | [FILTER] | 466 | 1,564 | 2,030 |
| | [SUBHEADING] | 117 | 1,552 | 1,669 |
| | [EC/RN NUMBER] | 165 | 673 | 838 |
| | [SUBSTANCE] | 263 | 459 | 722 |
| | [SOURCE] | 200 | 44 | 244 |
| | [PHARMACOLOGICAL ACTION] | 23 | 50 | 73 |
| | [PLACE OF PUBLICATION] | 23 | 25 | 48 |
| | [PS] | 19 | 4 | 23 |
| | [OTHER TERM] | 3 | 6 | 9 |
This table presents, for each of 37 different search field tags, the total number of queries containing the tag. It also gives the number of queries containing that tag alone and the number containing it together with one or more other tags.
Figure 5. Search field tag frequency in queries. This histogram shows, for each of 37 search field tags, the total number of queries containing either the tag only or the tag and other tag(s).
Figure 6. Plot of tag count against term count: (a) scatter plot, and (b) boxplot. This figure includes a scatter plot and a boxplot presenting the number of search tags (X) against the total number of search terms (Y) used in a query. A linear regression line, shown as a solid line, is superimposed on both plots.
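To make the regression line in Figure 6 concrete, the following Python sketch fits a straight line to (tag count, term count) pairs by ordinary least squares. The figure does not state which fitting method was used, so ordinary least squares is an assumption here, and the data points are invented purely for illustration; they are not taken from the query log.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data: number of tags (x) and number of terms (y) per query.
tag_counts = [0, 1, 2, 3]
term_counts = [1, 3, 5, 7]

slope, intercept = fit_line(tag_counts, term_counts)
print(f"terms = {intercept:.2f} + {slope:.2f} * tags")
```

A positive slope in such a fit would indicate that queries with more tags also tend to contain more search terms.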
Frequent co-occurrences of informational search field tags and association rules
| Type | Frequent itemset or association rule | Support (itemset) or confidence (rule) |
|---|---|---|
| Frequent itemset | [LANGUAGE], [MESH TERMS], [PUBLICATION TYPE], [SUBSET], [MESH MAJOR TOPIC] | 0.027 |
| Association rule | [MESH TERMS], [PUBLICATION TYPE], [MESH MAJOR TOPIC], [SUBSET] ==> [LANGUAGE] | 0.99 |
| Association rule | [LANGUAGE], [PUBLICATION TYPE], [MESH MAJOR TOPIC], [SUBSET] ==> [MESH TERMS] | 0.99 |
| Association rule | [PUBLICATION TYPE], [MESH MAJOR TOPIC], [SUBSET] ==> [MESH TERMS], [LANGUAGE] | 0.98 |
| Association rule | [MESH TERMS], [LANGUAGE], [MESH MAJOR TOPIC], [SUBSET] ==> [PUBLICATION TYPE] | 0.96 |
| Association rule | [MESH TERMS], [MESH MAJOR TOPIC], [SUBSET] ==> [LANGUAGE], [PUBLICATION TYPE] | 0.95 |
| Association rule | [LANGUAGE], [MESH MAJOR TOPIC], [SUBSET] ==> [MESH TERMS], [PUBLICATION TYPE] | 0.93 |
| Association rule | [MESH TERMS], [LANGUAGE], [PUBLICATION TYPE], [MESH MAJOR TOPIC] ==> [SUBSET] | 0.93 |
| Association rule | [MESH MAJOR TOPIC], [SUBSET] ==> [MESH TERMS], [LANGUAGE], [PUBLICATION TYPE] | 0.91 |
| Association rule | [MESH TERMS], [PUBLICATION TYPE], [MESH MAJOR TOPIC] ==> [LANGUAGE], [SUBSET] | 0.91 |
| Frequent itemset | [LANGUAGE], [MESH TERMS], [PUBLICATION TYPE], [SUBSET], [TEXT WORD] | 0.021 |
| Association rule | [LANGUAGE], [PUBLICATION TYPE], [TEXT WORD], [SUBSET] ==> [MESH TERMS] | 0.99 |
| Association rule | [MESH TERMS], [PUBLICATION TYPE], [TEXT WORD], [SUBSET] ==> [LANGUAGE] | 0.98 |
| Association rule | [PUBLICATION TYPE], [TEXT WORD], [SUBSET] ==> [MESH TERMS], [LANGUAGE] | 0.97 |
| Association rule | [MESH TERMS], [LANGUAGE], [PUBLICATION TYPE], [TEXT WORD] ==> [SUBSET] | 0.95 |
| Association rule | [MESH TERMS], [LANGUAGE], [TEXT WORD], [SUBSET] ==> [PUBLICATION TYPE] | 0.94 |
| Association rule | [LANGUAGE], [TEXT WORD], [SUBSET] ==> [MESH TERMS], [PUBLICATION TYPE] | 0.91 |
| Association rule | [MESH TERMS], [PUBLICATION TYPE], [TEXT WORD] ==> [LANGUAGE], [SUBSET] | 0.9 |
This table presents the results of the association mining analysis demonstrating two interesting frequent itemsets consisting of only informational tags. It also presents 16 association rules generated from these two itemsets.
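For readers unfamiliar with association mining, the two quantities behind itemsets and rules of this kind are support (the fraction of queries containing every tag in an itemset) and confidence (the support of the whole rule divided by the support of its left-hand side). The Python sketch below computes both using the standard definitions on a tiny, invented set of queries. It is not the authors' analysis code, the toy data is hypothetical, and whether the study computed support over all queries or only over tagged queries is not stated here.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    lhs = frozenset(antecedent)
    both = lhs | frozenset(consequent)
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0  # rule never applies in this data
    return support(both, transactions) / lhs_support


# Hypothetical toy log: each query reduced to its set of distinct tags.
queries = [
    frozenset({"LANGUAGE", "MESH TERMS", "SUBSET"}),
    frozenset({"LANGUAGE", "MESH TERMS"}),
    frozenset({"MESH TERMS", "SUBSET"}),
    frozenset({"LANGUAGE"}),
]

print(support({"LANGUAGE", "MESH TERMS"}, queries))
print(confidence({"SUBSET"}, {"MESH TERMS"}, queries))
print(confidence({"MESH TERMS"}, {"LANGUAGE"}, queries))
```

Read this way, a rule with confidence 0.99 means that 99% of the queries containing all the left-hand tags also contained the right-hand tag(s).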
Frequent co-occurrences of navigational search field tags and association rules
| Type | Frequent itemset or association rule | Support (itemset) or confidence (rule) |
|---|---|---|
| Frequent itemset | [PUBLICATION DATE], [JOURNAL], [PAGINATION], [ISSUE], [VOLUME] | 0.025 |
| Association rule | [PUBLICATION DATE], [JOURNAL], [PAGINATION], [ISSUE] ==> [VOLUME] | 0.96 |
| Association rule | [PUBLICATION DATE], [JOURNAL], [VOLUME], [ISSUE] ==> [PAGINATION] | 0.81 |
| Association rule | [PUBLICATION DATE], [JOURNAL], [ISSUE] ==> [PAGINATION], [VOLUME] | 0.75 |
| Association rule | [JOURNAL], [PAGINATION], [VOLUME], [ISSUE] ==> [PUBLICATION DATE] | 0.75 |
| Frequent itemset | [JOURNAL], [VOLUME], [AUTHOR], [PUBLICATION DATE] | 0.026 |
| Association rule | [JOURNAL], [VOLUME], [AUTHOR] ==> [PUBLICATION DATE] | 0.80 |
| Association rule | [PUBLICATION DATE], [VOLUME], [AUTHOR] ==> [JOURNAL] | 0.59 |
| Frequent itemset | [PAGINATION], [VOLUME], [AUTHOR], [PUBLICATION DATE] | 0.032 |
| Association rule | [PAGINATION], [VOLUME], [AUTHOR] ==> [PUBLICATION DATE] | 0.75 |
| Association rule | [PUBLICATION DATE], [VOLUME], [AUTHOR] ==> [PAGINATION] | 0.71 |
| Association rule | [PUBLICATION DATE], [PAGINATION], [AUTHOR] ==> [VOLUME] | 0.69 |
| Association rule | [VOLUME], [AUTHOR] ==> [PUBLICATION DATE], [PAGINATION] | 0.53 |
This table presents the results of the association mining analysis, demonstrating three interesting frequent itemsets comprising only navigational tags. It also presents 10 association rules generated from these three itemsets.
Figure 7. Visualization of association rules consisting of only informational tags. This figure visualizes 16 association rules presented in Table 6 consisting of six informational tags (i.e. [LANGUAGE], [MESH TERMS], [PUBLICATION TYPE], [SUBSET], [MESH MAJOR TOPIC], and [TEXT WORD]).
Figure 8. Visualization of association rules consisting of only navigational tags. This figure visualizes 10 association rules presented in Table 7 consisting of six navigational tags (i.e. [PUBLICATION DATE], [JOURNAL], [PAGINATION], [ISSUE], [VOLUME], and [AUTHOR]).