Yu Zhang, Wei-Nan Zhang, Ke Lu, Rongrong Ji, Fanglin Wang, Ting Liu.
Abstract
The lexical gap in cQA (community question answering) search, caused by the variability of natural language, has been recognized as an important and widespread phenomenon. To address the problem, this paper presents a question reformulation scheme that enhances the question retrieval model by fully exploiting phrase-level paraphrasing. It complements existing paraphrasing research, which falls into either the fine-grained lexical level or the coarse-grained sentence level, at a more suitable granularity. Given a question in natural language, the scheme first detects the key phrases involved by jointly integrating corpus-dependent knowledge and question-aware cues. Next, it automatically extracts paraphrases for each identified key phrase using multiple online translation engines, and then selects the most relevant reformulations from the large pool of question rewrites formed by the full permutation and combination of the generated paraphrases. Extensive evaluations on a real-world data set demonstrate that our model characterizes complex questions well and achieves promising performance compared with state-of-the-art methods.
Year: 2013 PMID: 23805178 PMCID: PMC3689745 DOI: 10.1371/journal.pone.0064601
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
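The abstract above outlines a three-stage pipeline: key-phrase detection, phrase-level paraphrase extraction via multiple translation engines, and selection over the full combination of rewrites. A minimal sketch of that control flow, where all three callables are hypothetical stand-ins rather than the paper's actual components:

```python
from itertools import product

def reformulate(question, detect_key_phrases, paraphrase, score, top_k=5):
    """Sketch of the pipeline described in the abstract (assumed interfaces):
    detect_key_phrases(question) -> list of key phrases
    paraphrase(phrase)           -> candidate paraphrases, e.g. collected
                                    from several online translation engines
    score(rewrite)               -> relevance / generation probability
    """
    key_phrases = detect_key_phrases(question)
    # Each key phrase keeps itself plus its extracted paraphrases.
    candidates = [[p] + paraphrase(p) for p in key_phrases]
    rewrites = set()
    # Full combination of per-phrase candidates spans the rewrite space.
    for combo in product(*candidates):
        q = question
        for original, replacement in zip(key_phrases, combo):
            q = q.replace(original, replacement)
        rewrites.add(q)
    # Keep only the most relevant reformulations.
    return sorted(rewrites, key=score, reverse=True)[:top_k]
```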
Representative questions illustrating the lexical gap.
| Q1 | Can you catch a cold from cold temperature? |
| Q2 | Does cold weather affect actually catching a cold? |
| Q3 | How can you catch a cold? |
| Q4 | Can you catch a cold from getting your head wet? |
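Q1 and Q2 above ask essentially the same thing with largely disjoint wording, while Q3 overlaps heavily with Q1 yet asks something different. A toy word-overlap measure (Jaccard similarity, chosen here only for illustration, not the paper's model) makes the gap concrete:

```python
def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity; stop words kept for simplicity."""
    sa, sb = set(a.lower().rstrip("?").split()), set(b.lower().rstrip("?").split())
    return len(sa & sb) / len(sa | sb)

q1 = "Can you catch a cold from cold temperature?"
q2 = "Does cold weather affect actually catching a cold?"
q3 = "How can you catch a cold?"

# The semantically equivalent pair (Q1, Q2) scores much lower than the
# non-equivalent pair (Q1, Q3): the lexical gap in action.
print(jaccard(q1, q2))  # ~0.17, low overlap despite the same intent
print(jaccard(q1, q3))  # ~0.63, high overlap despite a different intent
```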
Figure 1. Schematic illustration of the proposed question reformulation scheme.
Figure 2. An illustration of key-phrase detection from a given question in natural language.
The heuristic method for key-phrase detection.
| Key-term Decision Rules | Result |
| … | True |
| … | True |
| … | True |
| Otherwise | False |
Figure 3. An example illustrating the term weighting scheme and key-term selection.
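Figure 3's term weighting and key-term selection step can be pictured with a common instantiation. The following sketch assumes TF-IDF weighting over a background corpus plus a simple threshold rule standing in for the decision rules in the table above; the paper's actual scheme also folds in question-aware cues and may differ substantially:

```python
import math
from collections import Counter

def tfidf_weights(question_terms, corpus):
    """corpus: list of documents, each a list of terms (the
    corpus-dependent knowledge); returns a weight per question term."""
    n_docs = len(corpus)
    df = Counter()
    for doc in corpus:
        df.update(set(doc))
    tf = Counter(question_terms)
    return {
        t: (tf[t] / len(question_terms)) * math.log((n_docs + 1) / (df[t] + 1))
        for t in tf
    }

def select_key_terms(question_terms, corpus, threshold=0.05):
    """Keep terms whose weight clears a threshold; a stand-in for the
    True/False decision rules of the heuristic table."""
    w = tfidf_weights(question_terms, corpus)
    return [t for t in question_terms if w[t] >= threshold]
```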
Figure 4. Illustration of question reformulations in the form of a Viterbi decoding structure.
Figure 5. Top 5 reformulated questions with their generation probabilities.
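Figures 4 and 5 suggest that rewrites are organized as a lattice of per-position candidates and decoded Viterbi-style, with each path carrying a generation probability. A minimal sketch under that reading, where the transition scorer `trans` is a hypothetical stand-in for the paper's model:

```python
def viterbi_rewrite(slots, trans):
    """slots: one list of candidate phrases per question position;
    trans(prev, cur) -> transition probability under a language model.
    Returns (generation probability, best phrase sequence)."""
    # Each state keeps the best path probability ending in one candidate.
    states = [(1.0, [])]
    for candidates in slots:
        states = [
            max(
                ((p * trans(path[-1] if path else "<s>", c), path + [c])
                 for p, path in states),
                key=lambda s: s[0],
            )
            for c in candidates
        ]
    return max(states, key=lambda s: s[0])
```

Returning the top 5 paths of Figure 5 rather than the single best would amount to keeping a k-best list per state instead of one maximum.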
Experimental results of phrasal paraphrase extraction: percentage of extracted paraphrases with correct meaning (CM), correct grammar (CG), and both correct (BC), under macro- and micro-averaging.
| | macro-CM | micro-CM | macro-CG | micro-CG | macro-BC | micro-BC |
| Baseline | 0.4724 | 0.5575 | 0.7825 | 0.8090 | 0.4617 | 0.5375 |
| Our Method | 0.8175 | 0.9217 | 0.8383 | 0.9618 | 0.8166 | 0.9031 |
| %chg | +73.05% | +65.33% | +7.13% | +18.89% | +76.87% | +68.02% |
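The macro/micro split in the table is the standard one: macro averages the per-question correctness rates, while micro pools all judged paraphrases before averaging. A small sketch over binary judgments:

```python
def macro_micro(judgments):
    """judgments: list of per-question lists of 0/1 labels
    (e.g., 'this paraphrase has correct meaning')."""
    macro = sum(sum(js) / len(js) for js in judgments) / len(judgments)
    pooled = [j for js in judgments for j in js]
    micro = sum(pooled) / len(pooled)
    return macro, micro

# Questions with many candidates dominate micro but not macro:
print(macro_micro([[1, 1, 1, 1], [0]]))  # (0.5, 0.8)
```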
Overall performance comparison on MRR, MAP, and P@1, where (oq) denotes retrieval with the original question only, (rq) with reformulated questions only, and (oq+rq) with both. All improvements obtained by VD-TLM over the other methods are statistically significant at the 0.95 confidence level under the t-test.
| Question Retrieval Models | TLM (oq) | WN-TLM (rq) | SPG-TLM (rq) | VD-TLM (rq) | WN-TLM (oq+rq) | SPG-TLM (oq+rq) | VD-TLM (oq+rq) |
| MRR | 0.1889 | 0.1875 | 0.2024 | 0.2157 | 0.2206 | 0.2301 | 0.2583 |
| % MRR improvement over TLM (oq) | N/A | N/A | +7.15 | +14.19 | +16.78 | +21.81 | +36.74 |
| % MRR improvement over WN-TLM (rq) | +0.75 | N/A | +7.95 | +15.04 | +17.65 | +22.72 | +37.76 |
| % MRR improvement over SPG-TLM (rq) | N/A | N/A | N/A | +6.57 | +8.99 | +13.69 | +27.62 |
| % MRR improvement over WN-TLM (oq+rq) | N/A | N/A | N/A | N/A | N/A | +4.31 | +17.09 |
| % MRR improvement over SPG-TLM (oq+rq) | N/A | N/A | N/A | N/A | N/A | N/A | +12.26 |
| MAP | 0.2889 | 0.2870 | 0.3037 | 0.3269 | 0.3384 | 0.3664 | 0.4188 |
| P@1 | 0.1928 | 0.1967 | 0.2214 | 0.2357 | 0.2429 | 0.2643 | 0.2786 |
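MRR, MAP, and P@1 in the table are standard ranking metrics. A compact reference implementation over per-query binary relevance lists (my own sketch, not the paper's evaluation code):

```python
def mrr(rankings):
    """rankings: per-query lists of 0/1 relevance in ranked order."""
    return sum(
        next((1 / (i + 1) for i, r in enumerate(ranks) if r), 0.0)
        for ranks in rankings
    ) / len(rankings)

def average_precision(ranks):
    hits, total = 0, 0.0
    for i, r in enumerate(ranks):
        if r:
            hits += 1
            total += hits / (i + 1)
    return total / hits if hits else 0.0

def map_score(rankings):
    return sum(average_precision(r) for r in rankings) / len(rankings)

def p_at_1(rankings):
    """Fraction of queries whose top-ranked result is relevant."""
    return sum(r[0] for r in rankings) / len(rankings)
```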
Figure 6. Performance variation as different numbers of reformulated questions are added to the blending model for question retrieval.
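The (oq+rq) columns and Figure 6 point to a blending model in which retrieval scores for the original question are combined with scores from its top-k reformulations. A hedged sketch using simple weighted interpolation; the mixing weight `alpha` and the probability-weighted sum are assumptions, not the paper's formula:

```python
def blended_score(doc, oq, rewrites, retrieve, alpha=0.5):
    """doc: candidate archived question; oq: original query;
    rewrites: list of (reformulation, generation_probability) pairs;
    retrieve(query, doc) -> base retrieval score (e.g., a TLM score)."""
    rq_score = sum(p * retrieve(rq, doc) for rq, p in rewrites)
    norm = sum(p for _, p in rewrites) or 1.0
    # Interpolate the original-question score with the normalized
    # probability-weighted score of the reformulations.
    return alpha * retrieve(oq, doc) + (1 - alpha) * rq_score / norm
```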