Xian Zhu, Yuanyuan Chen, Yueming Gu, Zhifeng Xiao.
Abstract
Transfer learning has recently been applied across a broad spectrum of natural language processing (NLP) tasks, including question answering (QA). Transfer learning allows a model to inherit domain knowledge from an existing model that has been sufficiently pre-trained. In the biomedical field, most QA datasets are limited by insufficient training examples and the presence of factoid questions. This study proposes a transfer learning-based sentiment-aware model, named SentiMedQAer, for biomedical QA. The proposed method consists of a learning pipeline that utilizes BioBERT to encode text tokens with contextual and domain-specific embeddings, fine-tunes Text-to-Text Transfer Transformer (T5) and RoBERTa models to integrate sentiment information into the model, and trains an XGBoost classifier to output a confidence score that determines the final answer to the question. We validate SentiMedQAer on PubMedQA, a biomedical QA dataset with reasoning-required yes/no questions. Results show that our method outperforms the SOTA by 15.83% and a single human annotator by 5.91%.
Keywords: RoBERTa; T5; XGBoost; biomedical question answering; sentiment analysis; transfer learning
Year: 2022 PMID: 35360832 PMCID: PMC8961296 DOI: 10.3389/fnbot.2022.773329
Source DB: PubMed Journal: Front Neurorobot ISSN: 1662-5218 Impact factor: 2.650
Profile of PubMedQA.

| | PQA-L | PQA-U | PQA-A |
|---|---|---|---|
| # QA pairs | 1,000 | 61,200 | 211,300 |
| Question | Original question-form title | Original question-form title | Original title converted to a question |
| Labels | Yes/no/maybe | Unlabeled | Generated yes/no |
| Yes% | 55.20% | - | 92.80% |
| No% | 33.80% | - | 7.20% |
| Maybe% | 11.00% | - | 0% |
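The labeled subset is small relative to the unlabeled and artificial ones, which is what motivates transfer learning here. For hands-on inspection, a minimal loading sketch follows; the `pubmed_qa` hub identifier, configuration names, and field names refer to the public Hugging Face mirror and are assumptions, not details from the paper:

```python
# Minimal sketch: loading the three PubMedQA subsets from the Hugging Face
# hub (the "pubmed_qa" identifier and field names are assumptions about that
# mirror, not details reported in the paper).
from datasets import load_dataset

pqa_l = load_dataset("pubmed_qa", "pqa_labeled")     # 1k expert-labeled pairs
pqa_u = load_dataset("pubmed_qa", "pqa_unlabeled")   # unlabeled pairs
pqa_a = load_dataset("pubmed_qa", "pqa_artificial")  # generated yes/no pairs

sample = pqa_l["train"][0]
print(sample["question"])        # question-form article title
print(sample["final_decision"])  # "yes", "no", or "maybe"
```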
A sample with annotation in the PubMedQA dataset.

| Field | Content |
|---|---|
| Question | Double balloon enteroscopy: is it efficacious and safe in a community setting? |
| Context | |
| Long answer | |
| Answer | Yes |
Figure 1. Learning framework of SentiMedQAer: two T5 models are fine-tuned to produce short forms of the context and the question, which are fed into a RoBERTa model that outputs their sentiment representations. A sampling module pairs these sentiment values, which are then used to train an XGBoost classifier; the classifier outputs a confidence score that determines the final prediction.
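As a rough illustration of this flow, a sketch with stand-in checkpoints (not the authors' released code; the feature layout is an assumption):

```python
# Rough sketch of the Figure 1 flow. This is NOT the authors' released code:
# the checkpoints are stand-ins and the feature layout is an assumption.
from transformers import pipeline
import numpy as np

# Stage 1: a fine-tuned T5 produces short forms of the question and context.
summarizer = pipeline("summarization", model="t5-small")  # stand-in for the fine-tuned T5

def shorten(text: str, max_len: int = 64) -> str:
    """Generate a short form of the input text."""
    return summarizer(text, max_length=max_len, truncation=True)[0]["summary_text"]

# Stage 2: a sentiment model scores the shortened texts. The paper uses a
# fine-tuned RoBERTa; the default pipeline model is only a placeholder.
sentiment = pipeline("sentiment-analysis")

def senti_score(text: str) -> float:
    """Signed sentiment score: positive label -> +score, otherwise -score."""
    out = sentiment(text, truncation=True)[0]
    return out["score"] if "pos" in out["label"].lower() else -out["score"]

# Stage 3: paired sentiment values form the feature vector for the XGBoost
# classifier, whose confidence score decides the final answer.
def features(question: str, context: str) -> np.ndarray:
    return np.array([senti_score(shorten(question)), senti_score(shorten(context))])
```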
Hyperparameter setting for T5 small.

| Hyperparameter | Search range | Selected |
|---|---|---|
| Weight decay | 0.01 | 0.01 |
| Dropout probability | 0.1 | 0.1 |
| Steps | 20,000 | 20,000 |
| Optimizer | Adam | Adam |
| Learning rate | [1E-2, 1E-3, 1E-4] | 1E-3 |
| Batch size | [8, 16, 32] | 32 |
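For concreteness, a sketch of the selected values expressed as Hugging Face `Seq2SeqTrainingArguments`; the mapping onto this particular API is an assumption, since the paper reports only the hyperparameters:

```python
# The "Selected" column above, expressed as Hugging Face Seq2SeqTrainingArguments
# (the mapping to this API is an assumption; the paper reports only the values).
from transformers import Seq2SeqTrainingArguments

t5_args = Seq2SeqTrainingArguments(
    output_dir="t5-small-pubmedqa",  # hypothetical output path
    max_steps=20_000,
    learning_rate=1e-3,              # selected from [1e-2, 1e-3, 1e-4]
    per_device_train_batch_size=32,  # selected from [8, 16, 32]
    weight_decay=0.01,
    predict_with_generate=True,      # generate text at eval time for ROUGE
)
# Dropout (0.1) lives in the T5 model config itself, and Adam(W) is the
# Trainer's default optimizer family.
```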
Hyperparameter setting for XGB.

| Hyperparameter | Search range | Selected |
|---|---|---|
| eta | 0.015 | 0.015 |
| max_depth | [3, 5, 10] | 5 |
| n_estimator | [20, 40, 60] | 20 |
| sub_sample | 0.5 | 0.5 |
| scale_pos_weight | 1.75 | 1.75 |
| random_state | 2 | 2 |
| eval_metric | logloss | logloss |
| objective | binary:logistic | binary:logistic |
| num_round | 50 | 50 |
| test_frac | 0.2 | 0.2 |
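The same settings written out with the `xgboost` scikit-learn wrapper, as a sketch; in this wrapper `n_estimators` subsumes the boosting rounds the native API takes as `num_round`, and `test_frac` corresponds to a 20% hold-out split made before fitting:

```python
# Sketch of the selected XGB settings via the xgboost scikit-learn wrapper.
from xgboost import XGBClassifier

clf = XGBClassifier(
    learning_rate=0.015,          # eta
    max_depth=5,                  # selected from [3, 5, 10]
    n_estimators=20,              # selected from [20, 40, 60]
    subsample=0.5,                # sub_sample
    scale_pos_weight=1.75,        # up-weights the positive class
    random_state=2,
    eval_metric="logloss",
    objective="binary:logistic",
)
```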
Fine-tuning T5 based on Equation 1.

| Epoch | T.L. | V.L. | Rouge-1 | Rouge-2 | Rouge-L | Rouge-Lsum | G.L. |
|---|---|---|---|---|---|---|---|
| 1 | 1.73 | 0.99 | 5.25 | 4.17 | 6.53 | 5.27 | 1.9 |
| 2 | 0.14 | 0.09 | 84.61 | 86.12 | 91.51 | 91.49 | 29.8 |
| 3 | 0.1 | 0.06 | 91.59 | 90.12 | 91.6 | 91.58 | 33.9 |

T.L., training loss; V.L., validation loss; G.L., generated length.
Fine-tuning T5 based on Equation 3.

| Epoch | T.L. | V.L. | Rouge-1 | Rouge-2 | Rouge-L | Rouge-Lsum | G.L. |
|---|---|---|---|---|---|---|---|
| 1 | 0.23 | 0.23 | 82.25 | 67.19 | 82.24 | 82.19 | 11.41 |
| 2 | 0.23 | 0.21 | 84.69 | 70.5 | 84.68 | 84.68 | 7.46 |
| 3 | 0.18 | 0.2 | 85.27 | 71.19 | 85.26 | 85.25 | 4.44 |
Fine-tuning RoBERTa.

| Epoch | T.L. | V.L. | Accuracy | Error rate |
|---|---|---|---|---|
| 1 | 0.53 | 0.44 | 0.82 | 0.18 |
| 2 | 0.13 | 0.11 | 0.97 | 0.03 |
Ablation study.

| Model | Accuracy (%) | Macro-F1 (%) | Std. |
|---|---|---|---|
| Fine-tuned BioBERT | 66.51 | 52.48 | 1.59 |
| SentiMedQAer w/o T5 | 81.83 | 74.17 | 2.05 |
| SentiMedQAer | 83.91 | 76.92 | 0.41 |
Comparison with the benchmarks.

| Model | Accuracy (%) | Macro-F1 (%) |
|---|---|---|
| Fine-tuned BioBERT (Jin et al., 2019) | 66.51 | 52.48 |
| Single human performance (Jin et al., 2019) | 78.0 | - |
| Multi-phase BioBERT (Jin et al., 2019) | 68.08 | 52.72 |
| BioELECTRA (Kanakarajan et al., 2021) | 64.2 | - |
| SentiMedQAer (ours) | **83.91** | **76.92** |
The bold values indicate the highest score for each metric.
Figure 2. Confusion matrices for (A) the fine-tuned BioBERT and (B) SentiMedQAer.
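Such matrices can be reproduced from model predictions with scikit-learn; a minimal sketch with placeholder data:

```python
# Sketch of producing Figure 2-style confusion matrices with scikit-learn;
# y_true / y_pred here are placeholder data, not the paper's predictions.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

labels = ["yes", "no", "maybe"]                # PubMedQA label set
y_true = ["yes", "no", "maybe", "yes", "no"]   # placeholder gold labels
y_pred = ["yes", "no", "yes", "yes", "maybe"]  # placeholder predictions

cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot()
plt.show()
```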
Hyperparameter setting for RoBERTa.

| Hyperparameter | Search range | Selected |
|---|---|---|
| Sequence length | 128 | 128 |
| Train batch size | [4, 8, 16] | 16 |
| Dev batch size | 8 | 8 |
| Test batch size | 8 | 8 |
| Learning rate | [1E-5, 3E-5, 1E-4, 3E-4] | 1E-4 |
| Epoch number | [3, 6, 9] | 3 |
| Warmup | 0.1 | 0.1 |
| Dropout | 0.1 | 0.1 |
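Likewise, the selected RoBERTa values as Hugging Face `TrainingArguments`, again a sketch of the reported numbers rather than the authors' script:

```python
# The selected RoBERTa settings above as Hugging Face TrainingArguments
# (a sketch of the reported values, not the authors' training script).
from transformers import TrainingArguments

roberta_args = TrainingArguments(
    output_dir="roberta-sentiment",  # hypothetical output path
    per_device_train_batch_size=16,  # selected from [4, 8, 16]
    per_device_eval_batch_size=8,    # dev/test batch size
    learning_rate=1e-4,              # selected from [1E-5, 3E-5, 1E-4, 3E-4]
    num_train_epochs=3,              # selected from [3, 6, 9]
    warmup_ratio=0.1,
)
# The 128-token sequence length is applied at tokenization time, e.g.
# tokenizer(text, max_length=128, truncation=True); dropout (0.1) is the
# RoBERTa config default.
```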