Ahmed AbdoAziz Ahmed Abdulla, Hongfei Lin, Bo Xu, Santosh Kumar Banbhrani.
Abstract
BACKGROUND: Biomedical literature retrieval is becoming increasingly complex, and there is a fundamental need for advanced information retrieval systems. Information retrieval (IR) programs search unstructured materials, such as text documents, in large data collections that are usually stored on computers. IR is concerned with the representation, storage, and organization of information items, as well as with access to them. One of the main problems in IR is determining which documents are relevant to the user's needs and which are not. At present, users often cannot construct queries precise enough to retrieve particular pieces of data from large collections, and basic IR systems produce low-quality search results. In this paper we present a new technique that refines IR searches to better represent the user's information need and thereby improve retrieval performance. It applies several query expansion techniques and combines their results linearly, two expansion results at a time. Query expansion extends the search query, for example by finding synonyms and reweighting the original terms, and yields significantly more focused search results than basic search queries do.
Keywords: Biomedical information retrieval; Linear combination of query results; Query expansion
Year: 2016 PMID: 27455377 PMCID: PMC4965722 DOI: 10.1186/s12859-016-1092-8
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
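The abstract describes query expansion as adding related terms to the query and reweighting the original ones. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `expand_query`, the parameters `num_terms` and `orig_weight`, and the choice of picking expansion terms by raw frequency in a set of feedback documents are all assumptions made for illustration.

```python
from collections import Counter


def expand_query(query_terms, feedback_docs, num_terms=5, orig_weight=0.7):
    """Return a {term: weight} mapping for an expanded query.

    Hypothetical helper: expansion terms are simply the most frequent
    non-query tokens in `feedback_docs` (an assumed stand-in for the
    paper's expansion sources). `query_terms` must be non-empty,
    lowercase, and free of duplicates.
    """
    query_set = set(query_terms)
    # Count candidate expansion terms across the feedback documents.
    counts = Counter(
        token
        for doc in feedback_docs
        for token in doc.lower().split()
        if token not in query_set
    )
    expansion = [term for term, _ in counts.most_common(num_terms)]

    # Original terms share `orig_weight`; expansion terms share the rest.
    weights = {term: orig_weight / len(query_terms) for term in query_terms}
    if expansion:
        share = (1.0 - orig_weight) / len(expansion)
        for term in expansion:
            weights[term] = share
    return weights


docs = [
    "gene mutation cancer risk",
    "cancer gene therapy",
    "cancer risk factors",
]
print(expand_query(["gene", "mutation"], docs, num_terms=2))
```

In this sketch `num_terms` plays a role loosely comparable to the T.N. settings in the tables below, and `orig_weight` to a feedback weight, but the exact correspondence is an assumption.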
Fig. 1 Linear combination of multiple query expansion techniques
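The linear combination of two expansion results mentioned in the abstract can be sketched as a weighted sum of per-document scores. This is an illustrative sketch under stated assumptions, not the paper's procedure: the names `min_max` and `combine`, the mixing parameter `alpha`, the min-max normalization step, and the rule that a document missing from one run contributes a score of 0 are all hypothetical choices.

```python
def min_max(scores):
    """Rescale a {doc_id: score} run to [0, 1] (assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def combine(run_a, run_b, alpha=0.5):
    """Linearly combine two runs: alpha * a + (1 - alpha) * b.

    Documents absent from a run get 0 for that run (an assumption).
    Returns (doc_id, score) pairs sorted by descending fused score,
    with ties broken by doc_id.
    """
    a, b = min_max(run_a), min_max(run_b)
    fused = {
        doc: alpha * a.get(doc, 0.0) + (1.0 - alpha) * b.get(doc, 0.0)
        for doc in a.keys() | b.keys()
    }
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Two toy runs, e.g. one per expansion technique (made-up scores).
run_feedback = {"d1": 2.0, "d2": 1.0, "d3": 0.0}
run_pubmed = {"d2": 10.0, "d4": 5.0, "d3": 0.0}
for doc, score in combine(run_feedback, run_pubmed, alpha=0.5):
    print(doc, round(score, 3))
```

Normalizing before mixing keeps one run from dominating merely because its scores are on a larger scale, which matters when the two runs come from differently weighted queries.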
Q.E. using M.F.T., D.N. = (10–50)
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| | | | | |
| 20 | 0.2696 | 0.0733 | 0.1960 | 0.0932 |
| 30 | 0.2688 | 0.0750 | 0.1978 | 0.0949 |
| 40 | 0.2532 | 0.0721 | 0.1928 | 0.0900 |
| 50 | 0.2532 | 0.0721 | 0.1928 | 0.0900 |
Q.E. using M.F.T., T.N. = (5–30)
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| 5 | 0.2549 | 0.0681 | 0.1867 | 0.0826 |
| 10 | 0.2574 | 0.0681 | 0.1782 | 0.0870 |
| 15 | 0.2707 | 0.0711 | 0.1788 | 0.0929 |
| | | | | |
| 25 | 0.2658 | 0.0670 | 0.1703 | 0.0898 |
| 30 | 0.2536 | 0.0645 | 0.1509 | 0.0864 |
Q.E. using Lavrenko’s relevance model, feedback D.N. = (5–30)
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| | | | | |
| 10 | 0.2796 | 0.0641 | 0.1910 | 0.0923 |
| 15 | 0.2791 | 0.0632 | 0.1866 | 0.0903 |
| 20 | 0.2778 | 0.0653 | 0.1842 | 0.0920 |
| 25 | 0.2747 | 0.0656 | 0.1893 | 0.0930 |
| 30 | 0.2761 | 0.0639 | 0.1908 | 0.0927 |
Q.E. using Lavrenko’s relevance model, feedback weight = (0.1–0.9)
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| 0.1 | 0.2591 | 0.0523 | 0.1643 | 0.0807 |
| 0.2 | 0.2591 | 0.0523 | 0.1643 | 0.0807 |
| 0.3 | 0.2724 | 0.0561 | 0.1693 | 0.0838 |
| 0.4 | 0.2791 | 0.0591 | 0.1760 | 0.0878 |
| 0.5 | 0.2866 | 0.0621 | 0.1806 | 0.0911 |
| 0.6 | 0.2943 | 0.0655 | 0.1862 | 0.0941 |
| | | | | |
| 0.8 | 0.2931 | 0.0707 | 0.1990 | 0.0982 |
| 0.9 | 0.2836 | 0.0712 | 0.2031 | 0.0972 |
Q.E. using Lavrenko’s relevance model, feedback T.N. = (10–60)
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| 10 | 0.2866 | 0.0621 | 0.1806 | 0.0911 |
| 20 | 0.2938 | 0.0645 | 0.1894 | 0.0919 |
| 30 | 0.2973 | 0.0667 | 0.1945 | 0.0953 |
| 40 | 0.2980 | 0.0669 | 0.1942 | 0.0948 |
| | | | | |
| 60 | 0.2982 | 0.0675 | 0.1932 | 0.0945 |
Unordered term numbers
| Query topics | T.N. in M.Q.E. |
|---|---|
| 6 | 8 |
| 7 | 7 |
| 9 | 7 |
| 11 | 5 |
| 12 | 6 |
| 13 | 8 |
| 14 | 5 |
| 18 | 6 |
| 20 | 8 |
| 21 | 3 |
| 22 | 5 |
| 23 | 8 |
| 26 | 8 |
| 28 | 8 |
| 29 | 7 |
| 30 | 6 |
| 31 | 7 |
| 35 | 9 |
Q.E. using MetaMap thesaurus
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| M.Q.E. T.N. = 3 | 0.1611 | 0.0391 | 0.1419 | 0.0555 |
| Unordered | 0.1554 | 0.0393 | 0.1332 | 0.0542 |
Expanding query by PubMed
| MAP | DOC. | PASS. | ASP. | PASS2. |
|---|---|---|---|---|
| Baseline (Indri) | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| P.Q.E T.N. = 10 | 0.2014 | 0.0446 | 0.1522 | 0.0614 |
| T.N. = 5 | 0.2199 | 0.0499 | 0.1701 | 0.0709 |
Feedback & MetaMap combination
| MetaMap attributes | Feedback attributes | DOC. MAP | PASS. MAP | ASP. MAP | PASS2. MAP |
|---|---|---|---|---|---|
| Baseline (Indri) | | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| T.N. = 3 | D.N. = 5 | 0.2811 | 0.0638 | 0.1903 | 0.0890 |
| U.O.T. | D.N. = 5 | 0.2776 | 0.0629 | 0.1904 | 0.0874 |
| | | | | | |
| U.O.T. | | 0.2843 | 0.0643 | 0.1981 | 0.0883 |
| T.N. = 3 | Weight = 0.7 | 0.2824 | 0.0664 | 0.1977 | 0.0920 |
| U.O.T. | Weight = 0.7 | 0.2780 | 0.0645 | 0.1974 | 0.0890 |
Feedback & PubMed combination
| PubMed attributes | Feedback attributes | DOC. MAP | PASS. MAP | ASP. MAP | PASS2. MAP |
|---|---|---|---|---|---|
| Baseline (Indri) | | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| T.N. = 5 | D.N. = 5 | 0.2993 | 0.0683 | 0.2002 | 0.0959 |
| T.N. = 10 | D.N. = 5 | 0.2961 | 0.0669 | 0.1909 | 0.0942 |
| T.N. = 5 | | 0.3064 | 0.0706 | 0.2059 | 0.0983 |
| | | | | | |
| T.N. = 5 | Weight = 0.7 | 0.3044 | 0.0708 | 0.2018 | 0.0988 |
| T.N. = 10 | Weight = 0.7 | 0.3087 | 0.0704 | 0.2035 | 0.0975 |
Feedback & M.F.T. combination
| M.F.T. attributes | Feedback attributes | DOC. MAP | PASS. MAP | ASP. MAP | PASS2. MAP |
|---|---|---|---|---|---|
| Baseline (Indri) | | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| T.N. = 20 | D.N. = 5 | 0.2929 | 0.0708 | 0.1903 | 0.0977 |
| T.N. = 20 | | | | | |
| T.N. = 20 | Weight = 0.7 | 0.2997 | 0.0725 | 0.1962 | 0.1000 |
M.F.T. & MetaMap combination
| M.F.T. attributes | MetaMap attributes | DOC. MAP | PASS. MAP | ASP. MAP | PASS2. MAP |
|---|---|---|---|---|---|
| Baseline (Indri) | | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| T.N. = 20 | | | | | |
| T.N. = 20 | U.O.T. | 0.2687 | 0.0664 | 0.1857 | 0.0874 |
M.F.T. & PubMed combination
| M.F.T. attributes | PubMed attributes | DOC. MAP | PASS. MAP | ASP. MAP | PASS2. MAP |
|---|---|---|---|---|---|
| Baseline (Indri) | | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| T.N. = 20 | | | | | |
| T.N. = 20 | T.N. = 10 | 0.2881 | 0.0739 | 0.1977 | 0.0982 |
PubMed & MetaMap combination
| PubMed attributes | MetaMap attributes | DOC. MAP | PASS. MAP | ASP. MAP | PASS2. MAP |
|---|---|---|---|---|---|
| Baseline (Indri) | | 0.2571 | 0.0634 | 0.2008 | 0.0847 |
| T.N. = 5 | | | | | |
| T.N. = 10 | | 0.2454 | 0.0573 | 0.1827 | 0.0754 |
| T.N. = 5 | U.O.T. | 0.2407 | 0.0563 | 0.1893 | 0.0761 |
| T.N. = 10 | U.O.T. | 0.2337 | 0.0546 | 0.1756 | 0.0729 |
Fig. 2 Combination of feedback and PubMed Q.E.
Fig. 3 Combination of feedback and M.F.T.
Fig. 4 Q.E. using M.F.T. with D.N. parameter
Fig. 5 Q.E. using M.F.T. with T.N. parameter
Best results for different Q.E.
| Expansions with different parameters | DOC. MAP |
|---|---|
| Baseline (Indri) | 0.2571 |
| Previous study | 0.2906 |
| Feedback D.N. = 5 | 0.2866 |
| | |
| Feedback Weight = 0.7 | 0.2974 |
| M.F.T. D.N. = 20 | 0.2729 |
| M.F.T. T.N. = 20 | 0.2720 |
| MetaMap Thesaurus T.N. = 3 | 0.1611 |
| MetaMap Thesaurus Unordered T.N. | 0.1554 |
| PubMed Dictionary T.N. = 10 | 0.2014 |
| PubMed Dictionary T.N. = 5 | 0.2199 |
Fig. 6 Q.E. individually with their parameters
Best results for Q.E. combinations
| Best combinations | DOC. MAP |
|---|---|
| Baseline (Indri) | 0.2571 |
| Previous study | 0.2906 |
| Feedback T.N. = 40 & MetaMap T.N. = 3 | 0.2901 |
| | |
| Feedback T.N. = 40 & M.F.T. T.N. = 20 | 0.3001 |
| M.F.T. T.N. = 20 & MetaMap T.N. = 3 | 0.2755 |
| M.F.T. T.N. = 20 & PubMed T.N. = 5 | 0.2886 |
| PubMed T.N. = 5 & MetaMap T.N. = 3 | 0.2484 |
Fig. 7 Different Q.E. combinations individually