Travis R Goodwin, Max E Savery, Dina Demner-Fushman.
Abstract
Recent work has shown that pre-trained Transformers obtain remarkable performance on many natural language processing tasks, including automatic summarization. However, most work has focused on (relatively) data-rich single-document summarization settings. In this paper, we explore highly-abstractive multi-document summarization, where the summary is explicitly conditioned on a user-given topic statement or question. We compare the summarization quality produced by three state-of-the-art transformer-based models: BART, T5, and PEGASUS. We report the performance on four challenging summarization datasets, three from the general domain and one from consumer health, in both zero-shot and few-shot learning settings. While prior work has shown significant differences in performance for these models on standard summarization tasks, our results indicate that with as few as 10 labeled examples, there is no statistically significant difference in summary quality, suggesting the need for more abstractive benchmark collections when determining state-of-the-art.
Year: 2020 PMID: 33293900 PMCID: PMC7720861
Source DB: PubMed Journal: Proc Int Conf Comput Ling ISSN: 1525-2477
Figure 1: Example topic- and question-driven multi-document abstractive summaries (documents omitted).
Abstract multi-document summarization on DUC 2007 with 95% confidence intervals.
| System | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | Repetition |
|---|---|---|---|---|---|
| T5 (ZSL) | 21.21 (20.37 – 22.04) | 4.35 (3.82 – 4.91) | 11.59 (11.17 – 12.03) | 1.45 (1.24 – 1.73) | 33.21 (31.82 – 34.60) |
| T5 (FSL) | 36.35 (34.96 – 37.66) | 9.12 (8.27 – 9.94) | 17.46 (16.85 – 18.10) | 4.81 (4.22 – 5.51) | 54.20 (52.27 – 56.24) |
| BART (ZSL) | 37.36 (36.18 – 38.59) | 8.08 (7.34 – 8.88) | 16.62 (16.08 – 17.18) | 5.14 (4.52 – 5.84) | 44.91 (44.05 – 45.83) |
| BART (FSL) | | | 18.38 (17.93 – 18.85) | | 53.96 (53.17 – 54.69) |
| PEGASUS (ZSL) | 26.36 (25.05 – 27.64) | 5.01 (4.38 – 5.70) | 14.69 (13.95 – 15.34) | 2.18 (1.83 – 2.58) | 65.52 (60.81 – 70.36) |
| PEGASUS (FSL) | 36.02 (34.63 – 37.33) | 7.95 (7.26 – 8.65) | | 5.21 (4.57 – 5.85) | 74.29 (71.73 – 76.92) |
Abstract multi-document summarization on TAC 2009 with 95% confidence intervals.
| System | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | Repetition |
|---|---|---|---|---|---|
| T5 (ZSL) | 29.97 (28.55 – 31.38) | 9.03 (7.78 – 10.30) | 17.98 (16.89 – 19.21) | 3.67 (3.10 – 4.36) | 27.96 (26.76 – 29.31) |
| T5 (FSL) | 38.36 (36.92 – 39.85) | | 21.06 (19.74 – 22.59) | | 33.91 (32.75 – 35.03) |
| BART (ZSL) | 12.82 (11.68 – 13.97) | 3.73 (3.27 – 4.20) | 9.43 (8.76 – 10.12) | 0.57 (0.44 – 0.74) | 7.32 (5.43 – 9.41) |
| BART (FSL) | | 11.33 (10.37 – 12.44) | 21.11 (20.27 – 22.02) | 7.30 (6.49 – 8.11) | 45.30 (44.24 – 46.33) |
| PEGASUS (ZSL) | 25.69 (23.88 – 27.66) | 5.70 (4.74 – 6.69) | 16.72 (15.77 – 17.65) | 3.31 (2.81 – 3.91) | 75.56 (71.07 – 80.36) |
| PEGASUS (FSL) | 38.96 (37.64 – 40.17) | 10.44 (9.51 – 11.40) | | 7.00 (6.24 – 7.88) | 43.59 (41.50 – 45.87) |
Abstract multi-document summarization on DUC 2007 with 95% confidence intervals.
| System | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | Repetition |
|---|---|---|---|---|---|
| T5 (ZSL) | 27.01 (25.65 – 28.35) | 6.25 (5.35 – 7.29) | 15.72 (14.84 – 16.75) | 2.06 (1.72 – 2.45) | 30.47 (29.07 – 31.91) |
| T5 (FSL) | 34.13 (32.72 – 35.77) | 8.36 (7.32 – 9.50) | 17.35 (16.44 – 18.28) | 5.59 (4.70 – 6.54) | 32.60 (31.25 – 34.05) |
| BART (ZSL) | 28.97 (27.48 – 30.70) | 6.32 (5.58 – 7.24) | 15.64 (14.80 – 16.40) | 3.62 (3.11 – 4.22) | 27.96 (26.06 – 29.74) |
| BART (FSL) | | | 20.11 (19.27 – 20.94) | | 39.91 (38.67 – 41.11) |
| PEGASUS (ZSL) | 24.87 (23.14 – 26.48) | 4.99 (4.31 – 5.77) | 14.80 (13.97 – 15.65) | 2.66 (2.26 – 3.19) | 57.15 (51.71 – 62.81) |
| PEGASUS (FSL) | 36.31 (34.95 – 37.63) | 9.21 (8.27 – 10.15) | | 5.81 (5.09 – 6.62) | 40.39 (37.73 – 43.31) |
Abstract multi-document summarization on MEDIQA with 95% confidence intervals.
| System | ROUGE-1 | ROUGE-2 | ROUGE-L | BLEU-4 | Repetition |
|---|---|---|---|---|---|
| T5 (ZSL) | 31.09 (28.46 – 33.72) | 14.63 (11.77 – 17.58) | 22.52 (20.15 – 25.19) | 7.12 (5.07 – 9.36) | 31.00 (29.06 – 32.96) |
| T5 (FSL) | | | | 10.90 (9.08 – 13.07) | 36.19 (34.73 – 37.78) |
| BART (ZSL) | 33.51 (31.21 – 36.14) | 13.87 (11.52 – 16.31) | 20.87 (18.92 – 22.88) | 8.21 (6.38 – 10.18) | 38.24 (36.60 – 39.79) |
| BART (FSL) | 37.65 (35.07 – 40.37) | 17.01 (14.38 – 20.12) | 23.54 (21.34 – 26.00) | 10.83 (8.83 – 13.04) | 41.48 (40.13 – 42.87) |
| PEGASUS (ZSL) | 29.75 (26.20 – 32.89) | 12.17 (9.44 – 15.12) | 20.88 (18.19 – 23.49) | 8.61 (6.53 – 10.84) | 63.87 (58.64 – 69.69) |
| PEGASUS (FSL) | 37.02 (33.86 – 40.33) | 17.04 (13.95 – 20.12) | 24.90 (22.18 – 27.68) | | 46.81 (43.40 – 50.27) |
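
The tables above report each score with a 95% confidence interval, but this record does not say how the intervals were computed. As a minimal sketch, assuming a percentile bootstrap over per-summary scores (an assumption, not necessarily the authors' method), intervals of this kind could be produced as follows. The function names `bootstrap_ci` and `intervals_overlap` are hypothetical helpers introduced here for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean of per-summary scores.

    Assumption: `scores` holds one metric value (e.g. ROUGE-1) per
    generated summary. Returns (mean, lower, upper).
    """
    rng = random.Random(seed)
    n = len(scores)
    # Resample the scores with replacement and record each resample's mean.
    resampled_means = sorted(
        mean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lower = resampled_means[int((alpha / 2) * n_resamples)]
    upper = resampled_means[int((1 - alpha / 2) * n_resamples) - 1]
    return mean(scores), lower, upper


def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# ROUGE-1 intervals copied from the MEDIQA table (few-shot rows).
bart_fsl = (35.07, 40.37)
pegasus_fsl = (33.86, 40.33)
print(intervals_overlap(bart_fsl, pegasus_fsl))  # True

# ROUGE-1 intervals copied from the first DUC 2007 table (zero-shot rows).
t5_zsl = (20.37, 22.04)
bart_zsl = (36.18, 38.59)
print(intervals_overlap(t5_zsl, bart_zsl))  # False
```

Overlapping intervals are only an informal, conservative indication that two systems may not differ; they are not a substitute for a proper significance test.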
Figure 3: Example summaries for the question "What are the causes of childhood obesity?"
Figure 2: ROUGE-L of each model trained under the few-shot (FSL) and zero-shot (ZSL) learning settings.