Hamid Gharagozlou, Javad Mohammadzadeh, Azam Bastanfard, Saeed Shiry Ghidary.
Abstract
Answer selection (AS) is a critical subtask of the open-domain question answering (QA) problem. The present paper proposes a method called RLAS-BIABC for AS, which is built on an attention-mechanism-based long short-term memory (LSTM) and the bidirectional encoder representations from transformers (BERT) word embedding, enriched by an improved artificial bee colony (ABC) algorithm for pretraining and a reinforcement learning-based algorithm for training the backpropagation (BP) algorithm. BERT can be incorporated into downstream work and fine-tuned as a unified task-specific architecture, and the pretrained BERT model can capture different linguistic effects. Existing algorithms typically train the AS model as a two-class classifier on positive-negative pairs. A positive pair contains a question and a genuine answer, while a negative one contains a question and a fake answer. The output should be one for positive and zero for negative pairs. Typically, negative pairs far outnumber positive ones, leading to an imbalanced classification that drastically reduces system performance. To deal with this, we define classification as a sequential decision-making process in which the agent takes a sample at each step and classifies it. For each classification operation, the agent receives a reward, where the reward for the majority class is smaller than the reward for the minority class. Ultimately, the agent finds the optimal values for the policy weights. We initialize the policy weights with the improved ABC algorithm. This initialization technique can prevent problems such as getting stuck in a local optimum. Although ABC serves well in most tasks, it still has a weakness: it disregards the fitness of related pairs of individuals when discovering a neighboring food source position.
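The class-dependent reward described above (a smaller reward for the abundant negative pairs than for the scarce positive pairs) can be sketched as follows. The scaling factor `lam` and the exact reward magnitudes are illustrative assumptions, not the authors' published values:

```python
def reward(label, prediction, lam=0.2):
    """Reward for one step of the sequential classification process.

    Minority-class samples (positive pairs, label 1) earn the full
    reward of +/-1; majority-class samples (negative pairs, label 0)
    earn only +/-lam with lam < 1, so the many negative pairs cannot
    dominate the policy update. lam = 0.2 is an illustrative assumption.
    """
    correct = (label == prediction)
    if label == 1:   # minority class: genuine question-answer pair
        return 1.0 if correct else -1.0
    else:            # majority class: fake question-answer pair
        return lam if correct else -lam
```

With this scheme, misclassifying a rare positive pair costs the agent five times more than misclassifying a common negative pair, which is the intended counterweight to the class imbalance.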
Therefore, this paper also proposes a mutual learning technique that modifies the produced candidate food source using the fitter of two individuals selected by a mutual learning factor. We tested our model on three datasets, LegalQA, TrecQA, and WikiQA, and the results show that RLAS-BIABC can be recognized as a state-of-the-art method.
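The mutual-learning modification to the ABC neighbor search can be sketched as below. The standard ABC update `v_ij = x_ij + phi * (x_ij - x_kj)` ignores which of the two individuals is fitter; the variant here instead directs the perturbation toward the fitter individual, scaled by a mutual learning factor `F`. The exact update rule and the value of `F` are assumptions for illustration, not the paper's precise formulation:

```python
import random

def candidate_food_source(x_i, x_k, fit_i, fit_k, F=1.0, j=None):
    """Generate a candidate food source from individuals x_i and x_k.

    Unlike plain ABC, the direction of the perturbation depends on which
    individual has the higher fitness: the candidate is pulled toward the
    fitter one. F is a mutual learning factor (illustrative assumption).
    Only one randomly chosen dimension j is modified, as in standard ABC.
    """
    if j is None:
        j = random.randrange(len(x_i))
    phi = random.uniform(-1.0, 1.0)
    v = list(x_i)
    if fit_i >= fit_k:   # x_i is fitter: keep the standard direction
        v[j] = x_i[j] + F * phi * (x_i[j] - x_k[j])
    else:                # x_k is fitter: pull the candidate toward x_k
        v[j] = x_i[j] + F * phi * (x_k[j] - x_i[j])
    return v
```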
Year: 2022 PMID: 35571722 PMCID: PMC9106472 DOI: 10.1155/2022/7839840
Source DB: PubMed Journal: Comput Intell Neurosci
Figure 1. The proposed LSTM-similarity model.
Figure 2. Architecture of the BERT model.
Figure 3. Placement of weights in a vector.
Statistical information of LegalQA, TrecQA, and WikiQA datasets.
| Dataset | # questions (train/dev/test) | # QA pairs (train/dev/test) | % correct (train/dev/test) |
|---|---|---|---|
| LegalQA | 10,526/1,593/3,035 | 100,590/11,965/26,913 | 21.8/24.4/22.9 |
| TrecQA | 1,229/82/100 | 53,417/1,148/1,517 | 12.0/19.3/18.7 |
| WikiQA | 873/126/243 | 20,360/1,130/2,352 | 12.0/12.4/12.5 |
“% correct” means the proportion of matched QA pairs.
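The evaluation tables below report MAP and MRR. For reference, these ranking metrics can be computed from per-question relevance lists (1 for a correct answer, 0 otherwise, ordered by the model's score) as in this sketch:

```python
def mean_reciprocal_rank(rankings):
    """MRR: average over questions of 1/rank of the first correct answer."""
    total = 0.0
    for rel in rankings:
        rr = 0.0
        for rank, r in enumerate(rel, start=1):
            if r:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(rankings)

def mean_average_precision(rankings):
    """MAP: average over questions of the mean precision at each correct answer."""
    total = 0.0
    for rel in rankings:
        hits, ap = 0, 0.0
        for rank, r in enumerate(rel, start=1):
            if r:
                hits += 1
                ap += hits / rank
        total += ap / max(hits, 1)  # questions with no correct answer contribute 0
    return total / len(rankings)
```

For example, two questions whose ranked relevance lists are `[0, 1, 0]` and `[1, 0, 0]` give MRR = (1/2 + 1)/2 = 0.75 and MAP = (0.5 + 1.0)/2 = 0.75.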
The parameters of the model.
| Parameter | Value |
|---|---|
| Batch size | 128 |
| Embedding dim | 60 |
| Max sentence length | 80 |
| Activation function (LSTM and dense) | ReLU |
| Dense hidden layer | 8 |
The evaluation results of the proposed model and other models.
| Method | LegalQA MAP | LegalQA MRR | TrecQA MAP | TrecQA MRR | WikiQA MAP | WikiQA MRR |
|---|---|---|---|---|---|---|
| KABLSTM [ | 0.751 | 0.790 | 0.792† | 0.844† | 0.732† | 0.749† |
| EATS [ | 0.778 | 0.810 | 0.854† | 0.881† | 0.700† | 0.715† |
| AM-BLSTM [ | 0.786 | 0.836 | 0.818 | 0.827 | 0.780 | 0.788 |
| BERT-Base [ | 0.838 | 0.850 | 0.823 | 0.812 | 0.813† | 0.828† |
| DRCN [ | 0.828 | 0.859 | 0.802 | 0.832 | 0.804† | 0.862† |
| P-CNN [ | 0.715 | 0.729 | 0.680 | 0.698 | 0.734† | 0.737† |
| DARCNN [ | 0.700 | 0.745 | 0.743 | 0.725 | 0.734† | 0.750† |
| DASL [ | 0.804 | 0.816 | 0.824 | 0.831 | 0.768 | 0.795 |
| IKAAS [ | 0.825† | 0.883† | 0.823† | 0.868† | 0.835 | 0.849 |
| AS + random weight | 0.758 ± 0.000 | 0.801 ± 0.001 | 0.796 ± 0.000 | 0.806 ± 0.002 | 0.771 ± 0.002 | 0.792 ± 0.009 |
| AS-BIABC | 0.788 ± 0.012 | 0.815 ± 0.008 | 0.802 ± 0.005 | 0.826 ± 0.002 | 0.803 ± 0.000 | 0.845 ± 0.025 |
| RLAS | 0.855 ± 0.102 | 0.872 ± 0.018 | 0.862 ± 0.014 | 0.883 ± 0.150 | 0.852 ± 0.025 | 0.876 ± 0.026 |
| RLAS-BIABC | 0.895 ± 0.020 | 0.912 ± 0.001 | 0.898 ± 0.015 | 0.906 ± 0.092 | 0.888 ± 0.036 | 0.891 ± 0.017 |
† indicates results taken from the original articles.
Parameter setting of experiments.
| Algorithm | Parameter | Value |
|---|---|---|
| ABC | Limit | |
| ABC | | 50% of the colony |
| ABC | | 50% of the colony |
| ABC | | 1 |
| GWO | No parameters | |
| BAT | Constant for loudness update | 0.4 |
| BAT | Constant for emission rate update | 0.6 |
| BAT | Initial pulse emission rate | 0.002 |
| DA | Scaling factor | 0.3 |
| DA | Crossover probability | 0.7 |
| SSA | No parameters | |
| COA | Discovery rate of alien solutions | |
| HMS | Number of clusters | 5 |
| HMS | Minimum mental processes | 2 |
| HMS | Maximum mental processes | 5 |
| HMS | C | 1 |
| WOA | B | 1 |
The performance of other methods for initialization.
| Method | LegalQA MAP | LegalQA MRR | TrecQA MAP | TrecQA MRR | WikiQA MAP | WikiQA MRR |
|---|---|---|---|---|---|---|
| RLAS-BGDM | 0.796 ± 0.002 | 0.819 ± 0.026 | 0.824 ± 0.093 | 0.836 ± 0.026 | 0.810 ± 0.056 | 0.825 ± 0.136 |
| RLAS-BGDA | 0.783 ± 0.125 | 0.776 ± 0.095 | 0.769 ± 0.025 | 0.786 ± 0.269 | 0.745 ± 0.136 | 0.761 ± 0.002 |
| RLAS-BGDMA | 0.791 ± 0.005 | 0.772 ± 0.103 | 0.796 ± 0.126 | 0.812 ± 0.236 | 0.793 ± 0.026 | 0.793 ± 0.005 |
| RLAS-BOSS | 0.810 ± 0.136 | 0.814 ± 0.004 | 0.853 ± 0.023 | 0.863 ± 0.026 | 0.840 ± 0.027 | 0.855 ± 0.127 |
| RLAS-BBR | 0.842 ± 0.009 | 0.853 ± 0.000 | 0.860 ± 0.036 | 0.878 ± 0.120 | 0.852 ± 0.103 | 0.870 ± 0.035 |
| RLAS-BGWO | 0.771 ± 0.205 | 0.783 ± 0.018 | 0.755 ± 0.072 | 0.781 ± 0.126 | 0.755 ± 0.025 | 0.773 ± 0.026 |
| RLAS-BBAT | 0.862 ± 0.003 | 0.818 ± 0.019 | 0.876 ± 0.093 | 0.880 ± 0.239 | 0.852 ± 0.061 | 0.873 ± 0.082 |
| RLAS-BDA | 0.816 ± 0.072 | 0.829 ± 0.022 | 0.863 ± 0.002 | 0.883 ± 0.056 | 0.836 ± 0.082 | 0.862 ± 0.091 |
| RLAS-BSSA | 0.747 ± 0.029 | 0.769 ± 0.072 | 0.750 ± 0.042 | 0.763 ± 0.025 | 0.746 ± 0.041 | 0.755 ± 0.001 |
| RLAS-BCOA | 0.860 ± 0.085 | 0.889 ± 0.089 | 0.882 ± 0.063 | 0.897 ± 0.237 | 0.872 ± 0.093 | 0.862 ± 0.017 |
| RLAS-BHMS | 0.849 ± 0.002 | 0.880 ± 0.123 | 0.879 ± 0.090 | 0.893 ± 0.036 | 0.840 ± 0.100 | 0.870 ± 0.009 |
| RLAS-BGDM | 0.752 ± 0.012 | 0.753 ± 0.027 | 0.769 ± 0.058 | 0.789 ± 0.085 | 0.731 ± 0.000 | 0.760 ± 0.018 |
| RLAS-BABC | 0.875 ± 0.004 | 0.906 ± 0.021 | 0.888 ± 0.046 | 0.900 ± 0.082 | 0.878 ± 0.016 | 0.889 ± 0.023 |
Figure 4. The process of changing the criteria by modifying the value of λ for the three datasets: (a) LegalQA dataset; (b) TrecQA dataset; (c) WikiQA dataset.
The results of various loss functions on the model.
| Model | LegalQA MAP | LegalQA MRR | TrecQA MAP | TrecQA MRR | WikiQA MAP | WikiQA MRR |
|---|---|---|---|---|---|---|
| AS-BIABC + WCE | 0.781 ± 0.002 | 0.819 ± 0.026 | 0.772 ± 0.005 | 0.780 ± 0.145 | 0.795 ± 0.010 | 0.792 ± 0.012 |
| AS-BIABC + BCE | 0.789 ± 0.000 | 0.812 ± 0.120 | 0.786 ± 0.073 | 0.804 ± 0.025 | 0.783 ± 0.074 | 0.814 ± 0.002 |
| AS-BIABC + FL | 0.842 ± 0.048 | 0.838 ± 0.056 | 0.839 ± 0.090 | 0.829 ± 0.012 | 0.832 ± 0.005 | 0.822 ± 0.006 |
| AS-BIABC + DL | 0.838 ± 0.089 | 0.808 ± 0.135 | 0.810 ± 0.074 | 0.770 ± 0.203 | 0.806 ± 0.082 | 0.804 ± 0.120 |
| AS-BIABC + TL | 0.785 ± 0.096 | 0.783 ± 0.582 | 0.821 ± 0.006 | 0.800 ± 0.041 | 0.823 ± 0.018 | 0.799 ± 0.005 |
For the question “When were the Nobel Prize awards first given?” the table shows the top-5 answers from the model with and without reinforcement learning.
| Rank | Ranked answers w/o RL | Ranked answers by RL |
|---|---|---|
| 1 | The first | The |
| 2 | The five-member | The |
| 3 | | Among them is the winner of the first |
| 4 | The | The first |
| 5 | “We all know that there are still major problems to be faced,” said | A day after the announcement, for example, critic Norman Holmes Pearson grumbled that this woman, Pearl Buck, was |
“In 1901” is the ground truth answer, and italicized words are terms that appear in the question.
The results of various word embeddings on the model.
| Word embedding | LegalQA MAP | LegalQA MRR | TrecQA MAP | TrecQA MRR | WikiQA MAP | WikiQA MRR |
|---|---|---|---|---|---|---|
| One-Hot encoding | 0.679 ± 0.042 | 0.569 ± 0.002 | 0.711 ± 0.120 | 0.653 ± 0.081 | 0.649 ± 0.089 | 0.589 ± 0.093 |
| CBOW | 0.869 ± 0.006 | 0.843 ± 0.000 | 0.889 ± 0.078 | 0.869 ± 0.120 | 0.836 ± 0.012 | 0.828 ± 0.010 |
| Skip-gram | 0.874 ± 0.052 | 0.872 ± 0.075 | 0.878 ± 0.030 | 0.858 ± 0.002 | 0.847 ± 0.014 | 0.853 ± 0.014 |
| GloVe | 0.812 ± 0.027 | 0.853 ± 0.082 | 0.795 ± 0.140 | 0.821 ± 0.074 | 0.782 ± 0.039 | 0.806 ± 0.009 |
| FastText | 0.881 ± 0.002 | 0.901 ± 0.041 | 0.886 ± 0.093 | 0.876 ± 0.002 | 0.861 ± 0.099 | 0.870 ± 0.000 |
Figure 5. The process of changing the criteria by modifying the value of F for the three datasets: (a) LegalQA dataset; (b) TrecQA dataset; (c) WikiQA dataset.