Xin Wang, Jian Wang, Bo Xu, Hongfei Lin, Bo Zhang, Zhihao Yang.
Abstract
BACKGROUND: Question-driven summarization has become a practical and accurate approach to summarizing a source document. The generated summary should be concise and consistent with the question of concern, and it can thus be regarded as the answer to a nonfactoid question. Existing methods do not fully exploit question information over documents or dependencies across sentences. Moreover, most existing summarization evaluation tools, such as recall-oriented understudy for gisting evaluation (ROUGE), calculate N-gram overlaps between the generated summary and the reference summary while neglecting the factual consistency problem.
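For reference, the ROUGE-N scores reported in the tables below are n-gram overlap measures. The following is a minimal sketch of ROUGE-N recall, assuming simple whitespace tokenization; the official ROUGE toolkit additionally supports stemming, stopword removal, and F-measure variants.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return a multiset (Counter) of n-grams from a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_recall(candidate, reference, n=1):
    """ROUGE-N recall: clipped overlapping n-grams / n-grams in the reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # multiset intersection clips repeated n-grams
    return overlap / max(sum(ref.values()), 1)

# Example: 4 of the 5 reference unigrams are covered -> 0.8
print(rouge_n_recall("the drug reduced mortality",
                     "the drug significantly reduced mortality", n=1))
```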
Keywords: algorithm; factual consistency; multi-head attention; natural language processing; pointer network; question answering; question-driven abstractive summarization; transformer; validation
Year: 2022 PMID: 35969463 PMCID: PMC9425173 DOI: 10.2196/38052
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Overview of our model.
Figure 2. Multi-view pointer network. Hq: hidden representation of the question; y: hidden representation of the input summary; Hs: hidden representation of the document.
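To make Figure 2 concrete, below is a minimal PyTorch sketch of the multi-view pointer idea: at each decoding step, a generation distribution over the vocabulary is mixed with copy distributions over both the question and the document through a learned three-way gate. The class name, the dot-product attention, and the gate parameterization are our assumptions; the paper's exact architecture may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewPointer(nn.Module):
    """Sketch of a multi-view pointer: mixes a vocabulary distribution with
    copy distributions over the question and the document (two views)."""

    def __init__(self, hidden, vocab_size):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden, vocab_size)
        self.gate = nn.Linear(3 * hidden, 3)  # generate / copy-question / copy-document

    def forward(self, y, Hq, Hs, q_ids, s_ids):
        # y:  (B, H)     decoder state for the current step
        # Hq: (B, Lq, H) question encoding; Hs: (B, Ls, H) document encoding
        # q_ids/s_ids: (B, Lq)/(B, Ls) int64 token ids for scattering copy mass
        attn_q = F.softmax(torch.bmm(Hq, y.unsqueeze(2)).squeeze(2), dim=-1)  # (B, Lq)
        attn_s = F.softmax(torch.bmm(Hs, y.unsqueeze(2)).squeeze(2), dim=-1)  # (B, Ls)
        ctx_q = torch.bmm(attn_q.unsqueeze(1), Hq).squeeze(1)                 # (B, H)
        ctx_s = torch.bmm(attn_s.unsqueeze(1), Hs).squeeze(1)                 # (B, H)

        p_vocab = F.softmax(self.vocab_proj(y), dim=-1)                        # (B, V)
        g = F.softmax(self.gate(torch.cat([y, ctx_q, ctx_s], dim=-1)), dim=-1) # (B, 3)

        p_final = g[:, 0:1] * p_vocab
        p_final = p_final.scatter_add(1, q_ids, g[:, 1:2] * attn_q)  # copy from question
        p_final = p_final.scatter_add(1, s_ids, g[:, 2:3] * attn_s)  # copy from document
        return p_final

# Shape check with random inputs.
B, Lq, Ls, H, V = 2, 8, 120, 256, 30000
net = MultiViewPointer(H, V)
p = net(torch.randn(B, H), torch.randn(B, Lq, H), torch.randn(B, Ls, H),
        torch.randint(V, (B, Lq)), torch.randint(V, (B, Ls)))
assert p.shape == (B, V)
```

The gate g plays the role of p_gen in a standard pointer-generator network, extended here to two copy sources so that question tokens can be copied directly into the summary.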
Statistics of the PubMedQA data set.
| Task data set | Training, n | Development, n | Test, n |
| QAa pairs | 169,000 | 21,000 | 21,000 |
| Average question length (word count) | 16.3 | 16.4 | 16.3 |
| Average document length (word count) | 238 | 238 | 239 |
| Average summary length (word count) | 41.0 | 41.0 | 40.9 |
| Average number of sentences | 9.32 | 9.31 | 9.33 |
aQA: question-answering.
Comparison with related work on the question-driven summarization task.
| Methods | Types | With question | ROUGEa-1 (%) | ROUGE-2 (%) | ROUGE-L (%) |
| LEAD3 | Extractive | No | 30.94 | 9.79 | 25.89 |
| MMRb | Extractive | No | 29.69 | 9.50 | 24.10 |
| S2SAc | Abstractive | No | 32.40 | 11.00 | 27.30 |
| PGNd | Abstractive | No | 32.89 | 11.51 | 28.10 |
| Transformer | Abstractive | No | 32.38 | 11.34 | 26.32 |
| SD2e | Abstractive | Query based | 32.33 | 10.52 | 26.01 |
| QSf | Abstractive | Query based | 32.60 | 11.10 | 26.70 |
| HSCMg | Extractive | Question driven | 32.34 | 10.07 | 25.98 |
| MSGh | Abstractive | Question driven | | 14.80 | 30.20 |
| Trans-Att (ours) | Abstractive | Question driven | 36.01 | 15.59 | 30.22 |
aROUGE: recall-oriented understudy for gisting evaluation.
bMMR: maximal marginal relevance.
cS2SA: sequence-to-sequence model with attention.
dPGN: pointer-generator network.
eSD2: soft long short-term memory–based diversity attention model.
fQS: query-based summarization using neural networks.
gHSCM: hierarchical and sequential context modeling.
hMSG: multi-hop selective generator.
iItalics indicate the best result.
Comparison with related work on the question-answering task.
| Methods | Accuracy (%) | F1 (%) |
| LEAD3 | 93.80 | 67.06 |
| MMRa | | 75.69 |
| S2SAc | 91.89 | 63.81 |
| PGNd | 91.93 | 64.42 |
| Transformer | 94.18 | 69.59 |
| SD2e | 94.34 | 69.30 |
| HSCMf | 93.78 | 76.48 |
| MSGg | 93.68 | 73.27 |
| Trans-Att (ours) | 94.20 | 77.57 |
| Majority | 92.76 | 48.12 |
| Context | 96.50 | 84.65 |
| Long answer | 99.04 | 96.18 |
| Context + long answer | 99.20 | 96.86 |
aMMR: maximal marginal relevance.
bItalics indicate the best result.
cS2SA: sequence-to-sequence model with attention.
dPGN: pointer-generator network.
eSD2: soft long short-term memory–based diversity attention model.
fHSCM: hierarchical and sequential context modeling.
gMSG: multi-hop selective generator.
An ablation study for our model.
| Methods | ROUGEa-1 (%) | ROUGE-2 (%) | ROUGE-L (%) | Accuracy (%) | F1 (%) |
| Trans-Att | 36.01 | 15.59 | 30.22 | 94.20 | 77.57 |
| Intersentence attention | 34.65 | 13.92 | 28.07 | 93.87 | 73.13 |
| Coattention | 34.05 | 13.61 | 26.50 | 93.40 | 70.62 |
| Overall integration | 34.28 | 14.26 | 28.63 | 94.53 | 72.37 |
| Multi-view pointer network | 35.16 | 13.98 | 29.32 | 94.39 | 75.67 |
aROUGE: recall-oriented understudy for gisting evaluation.
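One of the ablated components above is coattention between the question and the document. For readers unfamiliar with the idea, a generic affinity-matrix coattention layer looks like the sketch below; this is the standard formulation rather than necessarily the paper's exact design, with tensor names following the notation of Figure 2.

```python
import torch
import torch.nn.functional as F

def coattention(Hq, Hs):
    """Generic affinity-matrix coattention.
    Hq: (B, Lq, H) question encoding; Hs: (B, Ls, H) document encoding."""
    A = torch.bmm(Hq, Hs.transpose(1, 2))      # (B, Lq, Ls) token-pair affinities
    q2s = F.softmax(A, dim=2)                  # each question token attends over the document
    s2q = F.softmax(A, dim=1)                  # each document token attends over the question
    Cq = torch.bmm(q2s, Hs)                    # (B, Lq, H) document context per question token
    Cs = torch.bmm(s2q.transpose(1, 2), Hq)    # (B, Ls, H) question context per document token
    return Cq, Cs
```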
Figure 3. Case study from PubMedQA (the bottom example omits the context; the final answer is in parentheses). MSG: multi-hop selective generator; PGN: pointer-generator network; QS: query-based summarization using neural networks; SD2: soft long short-term memory–based diversity attention model; HELLP: hemolysis, elevated liver enzymes, and low platelet count syndrome.
Proportion of novel n-grams.
| Methods | 1-grams (%) | 2-grams (%) | 3-grams (%) | 4-grams (%) |
| Trans-Att | 11.00 | 47.82 | 67.12 | 79.38 |
| MSGa | 13.43 | 54.66 | 74.13 | 85.01 |
| PGNb | 16.29 | 43.73 | 58.38 | 69.14 |
| Reference | 27.83 | 72.11 | 87.17 | 93.55 |
aMSG: multi-hop selective generator.
bPGN: pointer-generator network.
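The novel n-gram proportion above measures abstractiveness: the fraction of n-grams in the generated summary that never appear in the source document (higher means less verbatim copying). A minimal sketch, with the tokenization and function name being our assumptions:

```python
def novel_ngram_proportion(summary_tokens, document_tokens, n):
    """Fraction of summary n-grams that do not occur in the source document."""
    def grams(toks):
        return [tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)]
    summ = grams(summary_tokens)
    doc = set(grams(document_tokens))
    if not summ:
        return 0.0
    return sum(g not in doc for g in summ) / len(summ)

# A fully extractive summary scores 0; more abstractive systems
# (compare MSG vs PGN in the table above) score higher.
```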