Catastrophic Forgetting in Deep Graph Networks: A Graph Classification Benchmark
Antonio Carta, Andrea Cossu, Federico Errica, Davide Bacciu.
Abstract
In this work, we study the phenomenon of catastrophic forgetting in the graph representation learning scenario. The primary objective of the analysis is to understand whether classical continual learning techniques for flat and sequential data have a tangible impact on performance when applied to graph data. To do so, we experiment with a structure-agnostic model and a deep graph network in a robust and controlled environment on three different datasets. The benchmark is complemented by an investigation of the effect of structure-preserving regularization techniques on catastrophic forgetting. We find that replay is the most effective strategy, and it also benefits the most from the use of regularization. Our findings suggest interesting future research at the intersection of the continual and graph representation learning fields. Finally, we provide researchers with a flexible software framework to reproduce our results and carry out further experiments.
Keywords: benchmarks; catastrophic-forgetting; continual-learning; deep-graph-networks; lifelong-learning
Year: 2022 PMID: 35187476 PMCID: PMC8855050 DOI: 10.3389/frai.2022.824655
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
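Since the abstract singles out replay as the most effective strategy, the sketch below shows one common way such a strategy is implemented: a fixed-size episodic memory whose samples are mixed into each training batch. This is a minimal illustration under our assumptions; the `ReplayBuffer` class and its reservoir-sampling policy are hypothetical and not the authors' framework, though the capacity of 1,000 matches the memory size reported with the results table below.

```python
# Minimal sketch of a replay strategy for continual graph classification.
# Assumption: samples are (graph, label) pairs; any representation works.
import random

class ReplayBuffer:
    """Fixed-capacity memory of past samples (illustrative, not the authors' API)."""
    def __init__(self, capacity=1000):  # 1,000 matches the memory size in the table
        self.capacity = capacity
        self.memory = []
        self.seen = 0

    def add(self, sample):
        # Reservoir sampling keeps a uniform random subset of everything seen so far.
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(sample)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:
                self.memory[j] = sample

    def sample(self, k):
        return random.sample(self.memory, min(k, len(self.memory)))

# Usage: replayed samples are concatenated with the current step's batch.
buf = ReplayBuffer(capacity=1000)
for sample in [("graph_0", 0), ("graph_1", 1)]:
    buf.add(sample)
replayed = buf.sample(32)
```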
Table 1. Summary of the dataset statistics.

| | MNIST | CIFAR10 | OGBG-PPA |
|---|---|---|---|
| Size | 70,000 | 60,000 | 158,100 |
| Node attrs. | 3 | 5 | 0 |
| Edge attrs. | 0 | 0 | 7 |
| Classes | 10 | 10 | 37 |
| Avg. nodes | 70.57 | 117.63 | 243.4 |
| Avg. edges | 564.63 | 941.07 | 2,266.1 |
| Data split | 55K/5K/15K | 45K/5K/15K | 49%/29%/22% |
| Class split | 2+2+2+2+2 | 2+2+2+2+2 | 17+5+5+5+5 |
“Class split” refers to how we group classes in the Split CL experiment.
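To make the "Class split" row concrete, here is a small sketch of how classes can be partitioned into sequential experiences for a Split CL protocol. The helper `make_class_splits` is hypothetical, introduced only for illustration; only the split sizes come from Table 1.

```python
def make_class_splits(num_classes, sizes):
    """Partition class ids [0..num_classes) into consecutive groups."""
    assert sum(sizes) == num_classes, "sizes must cover every class exactly once"
    classes = list(range(num_classes))
    splits, start = [], 0
    for size in sizes:
        splits.append(classes[start:start + size])
        start += size
    return splits

# MNIST / CIFAR10: five steps of two classes each.
print(make_class_splits(10, [2, 2, 2, 2, 2]))
# -> [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
# OGBG-PPA: a 17-class first step followed by four 5-class steps.
print([len(s) for s in make_class_splits(37, [17, 5, 5, 5, 5])])
# -> [17, 5, 5, 5, 5]
```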
Table 2. Mean accuracy and mean standard deviation among all steps.

| Dataset | Model | Naive | EWC | Replay | LwF |
|---|---|---|---|---|---|
| MNIST | Baseline | 19.56 ± 0.1 | 19.39 ± 0.1 | 86.13 ± 4.5 | 33.16 ± 13.1 |
| | DGN | 19.19 ± 0.1 | 18.95 ± 0.3 | 79.52 ± 1.9 | 32.64 ± 5.0 |
| | DGN+reg | 19.31 ± 0.1 | — | 81.42 ± 2.4 | — |
| CIFAR10 | Baseline | 17.49 ± 0.1 | 17.49 ± 0.1 | 42.87 ± 3.7 | 26.77 ± 5.1 |
| | DGN | 17.11 ± 0.2 | 17.10 ± 0.2 | 39.55 ± 2.3 | 24.13 ± 4.1 |
| | DGN+reg | 17.13 ± 0.1 | — | 46.61 ± 3.5 | — |
| OGBG-PPA | Baseline | 14.53 ± 0.5 | 13.90 ± 0.8 | 55.96 ± 3.0 | 20.83 ± 6.1 |
| | DGN | 14.47 ± 0.3 | 14.15 ± 0.5 | 56.34 ± 2.5 | 18.46 ± 5.4 |
| | DGN+reg | 15.18 ± 0.8 | — | 57.27 ± 3.2 | — |
Lower accuracy indicates larger forgetting of previous knowledge. Replay results refer to a memory size of 1,000. Results are averaged over 5 final runs. We treat the regularization loss as a separate strategy.
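The metric in Table 2 can be read off the step-by-step accuracy matrix that Figure 1 also visualizes. The sketch below encodes one plausible reading of the protocol, where each row holds per-step accuracies measured after training through a given step; the function name and matrix layout are our assumptions, not the authors' exact evaluation code.

```python
# Toy computation of "mean accuracy among all steps" from an accuracy matrix.
import statistics

def mean_step_accuracy(acc_matrix):
    """acc_matrix[i][j]: accuracy on step j after training through step i."""
    final_row = acc_matrix[-1]  # per-step accuracy after the whole stream
    return statistics.mean(final_row), statistics.stdev(final_row)

# Toy 3-step run: diagonal entries are high (just-trained step); earlier
# columns decay, which is the forgetting effect.
acc = [
    [0.95, 0.00, 0.00],
    [0.40, 0.93, 0.00],
    [0.20, 0.35, 0.94],
]
print(mean_step_accuracy(acc))  # mean ~0.50, std ~0.39 for this toy run
```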
Figure 1. Paired plots showing the ACC on each step for different models, for LwF on MNIST (left), LwF on CIFAR10 (middle), and Replay on OGBG-PPA (right). Complete plots are in the Supplementary Material. Each column refers to a model and is composed of pairs of connected points. Each pair refers to a specific step. The leftmost point in the pair represents ACC after training on that specific step. The rightmost point represents ACC after training on all steps. The more vertical the line connecting the points, the larger the forgetting effect. The dashed horizontal line indicates the performance of a random classifier. The red star represents the average performance over all steps.
Figure 2. Comparison of performance between model selection (averaged across all configurations) and model assessment (averaged across 5 final training runs). The difference highlights the sensitivity of LwF to the choice of hyperparameters. Most configurations cause larger forgetting than the best configuration found by model selection.
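Since Figure 2 centers on LwF's hyperparameter sensitivity, a brief sketch of the standard LwF objective may help: cross-entropy on the current step plus a knowledge-distillation term anchored to the previous model's outputs. This is the generic formulation from the LwF literature, not necessarily the paper's exact implementation; `alpha` and `temperature` are the kind of hyperparameters whose choice Figure 2 shows to matter, and their values here are illustrative.

```python
# Generic Learning without Forgetting (LwF) loss: new-task cross-entropy plus
# a distillation term toward the frozen previous model's logits.
import torch
import torch.nn.functional as F

def lwf_loss(logits, targets, old_logits, alpha=1.0, temperature=2.0):
    ce = F.cross_entropy(logits, targets)
    # Soften both distributions and penalize divergence from the old model.
    log_p_new = F.log_softmax(logits / temperature, dim=1)
    p_old = F.softmax(old_logits / temperature, dim=1)
    distill = F.kl_div(log_p_new, p_old, reduction="batchmean") * temperature ** 2
    return ce + alpha * distill

# Usage: old_logits come from a snapshot of the model taken before the
# current step; it is kept frozen and only used as a teacher.
logits = torch.randn(8, 10)
old_logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
print(lwf_loss(logits, targets, old_logits))
```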