Laetitia Teodorescu, Katja Hofmann, Pierre-Yves Oudeyer.
Abstract
An embodied, autonomous agent that sets its own goals must possess geometrical reasoning abilities to judge whether those goals have been achieved: it should be able to identify and discriminate classes of configurations of objects, irrespective of its point of view on the scene. However, this problem has received little attention so far in the deep learning literature. In this paper we make two key contributions. First, we propose SpatialSim (Spatial Similarity), a novel geometrical reasoning diagnostic dataset, and argue that progress on this benchmark would allow for diagnosing more principled approaches to this problem. The benchmark is composed of two tasks, "Identification" and "Discrimination," each instantiated at increasing levels of difficulty. Second, we validate that relational inductive biases, as exhibited by fully-connected message-passing Graph Neural Networks (MPGNNs), are instrumental in solving those tasks, and show their advantages over less relational baselines such as Deep Sets and unstructured models such as Multi-Layer Perceptrons. We additionally showcase the failure of high-capacity CNNs on the hard Discrimination task. Finally, we highlight the current limits of GNNs in both tasks.
Keywords: artificial intelligence; graph neural net; machine learning; neural networks; similarity learning; spatial reasoning; structured representation
Year: 2022 PMID: 35156011 PMCID: PMC8826049 DOI: 10.3389/frai.2021.782081
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
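The contrast the abstract draws between Deep Sets and fully-connected message passing can be illustrated with a minimal sketch. This is not the authors' implementation: `toy_phi` and `toy_message` are hypothetical fixed functions standing in for learned networks, objects are reduced to 2D positions, and the final learned readout is omitted. The point is only that the Deep Set aggregates per-object embeddings, while the message-passing layer aggregates functions of object pairs.

```python
import math


def deep_set_readout(objects, phi):
    """Sum per-object embeddings phi(x_i); each object is embedded in isolation."""
    embedded = [phi(obj) for obj in objects]
    return [sum(dim) for dim in zip(*embedded)]


def message_passing_readout(objects, message):
    """One message-passing round on a fully connected graph, then a sum over nodes."""
    node_states = []
    for i, receiver in enumerate(objects):
        # Every other object sends a message that depends on the (sender, receiver) pair.
        incoming = [message(sender, receiver)
                    for j, sender in enumerate(objects) if j != i]
        node_states.append([sum(dim) for dim in zip(*incoming)])
    return [sum(dim) for dim in zip(*node_states)]


def toy_phi(obj):
    """Hypothetical per-object embedding (stand-in for a learned MLP)."""
    x, y = obj
    return [x, y, x * x + y * y]


def toy_message(sender, receiver):
    """Hypothetical pairwise message (stand-in for a learned MLP on object pairs)."""
    distance = math.hypot(sender[0] - receiver[0], sender[1] - receiver[1])
    return [distance, distance * distance]


config = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
shifted = [(x + 3.0, y - 1.0) for x, y in config]
print(deep_set_readout(config, toy_phi), deep_set_readout(shifted, toy_phi))
print(message_passing_readout(config, toy_message),
      message_passing_readout(shifted, toy_message))
```

Both readouts are invariant to the order of the objects. With these particular toy functions, only the pairwise readout is unchanged when the whole configuration is translated, because the message depends on relative positions; a learned model would have to acquire such behavior from data.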
Figure 1. Visual illustration of SpatialSim. The benchmark is composed of two tasks, Identification and Discrimination. In Identification, the model is tasked with predicting whether a given configuration is the same as a held-out reference one, up to rotation, translation and scaling. In Discrimination, a dual-input model is tasked with recognizing whether a given pair of configurations is the same up to rotation, translation and scaling. We show visual renderings of our objects, but note that for all object-based models we consider, the object-based representation is used as input: see main text.
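The notion of "the same up to rotation, translation and scaling" in the caption above can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the benchmark's generation code: objects are reduced to 2D positions (other object attributes are ignored), and the function names are hypothetical. The invariance check compares sorted pairwise distances normalized by the largest one, which is a necessary condition for two point sets to be related by a similarity transform, not a sufficient one (for instance, it does not distinguish mirror images).

```python
import math


def similarity_transform(points, angle, scale, shift):
    """Rotate by `angle` (radians), scale uniformly, then translate by `shift`."""
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    dx, dy = shift
    return [(scale * (cos_a * x - sin_a * y) + dx,
             scale * (sin_a * x + cos_a * y) + dy)
            for x, y in points]


def normalized_distances(points):
    """Sorted pairwise distances divided by the largest distance.

    Unchanged by rotation, translation and uniform scaling.
    Assumes at least two distinct points.
    """
    dists = sorted(math.dist(p, q)
                   for i, p in enumerate(points)
                   for q in points[i + 1:])
    largest = dists[-1]
    return [d / largest for d in dists]


def may_be_same_configuration(first, second, tol=1e-9):
    """Necessary (not sufficient) test for equality up to a similarity transform."""
    a, b = normalized_distances(first), normalized_distances(second)
    return len(a) == len(b) and all(
        math.isclose(u, v, rel_tol=tol, abs_tol=tol) for u, v in zip(a, b))


points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
positive = similarity_transform(points, angle=0.7, scale=2.5, shift=(4.0, -1.0))
negative = points[:-1] + [(5.0, 5.0)]  # one object displaced
print(may_be_same_configuration(points, positive))
print(may_be_same_configuration(points, negative))
```

A positive example for either task could be produced with `similarity_transform`, and a negative one by displacing objects before transforming; how the actual dataset constructs its negatives is described in the main text of the paper, not here.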
Test classification accuracies (means and standard deviations are given over datasets and seeds) for the different models on the Identification task.
| Model | n = 3–8 | n = 9–20 | n = 21–30 | Parameters |
|---|---|---|---|---|
| MPGNN | 0.97 ± 0.026 | 0.98 ± 0.024 | 0.98 ± 0.028 | 2,208 |
| GCN | 0.54 ± 0.033 | 0.52 ± 0.014 | 0.51 ± 0.013 | 2,530 |
| RDS | 0.91 ± 0.062 | 0.85 ± 0.128 | 0.78 ± 0.19 | 2,038 |
| Deep Set | 0.65 ± 0.079 | 0.60 ± 0.082 | 0.58 ± 0.09 | 2,386 |
| ResNet18 | | | | 11.7M |
| MLP Baseline | 0.82 ± 0.09 | 0.59 ± 0.051 | 0.56 ± 0.051 | 6k/48k/139k |
Bold values indicate highest average accuracy.
Test classification accuracies for the different models on the Discrimination task.
| Model | n = 3–8 | n = 9–20 | n = 21–30 | Parameters |
|---|---|---|---|---|
| MPGNN | | | | 4,686 |
| GCN | 0.55 ± 0.006 | 0.50 ± 0.004 | 0.50 ± 0.05 | 4,962 |
| RDS | 0.80 ± 0.133 | 0.68 ± 0.154 | 0.52 ± 0.04 | 5,326 |
| Deep Set | 0.51 ± 0.014 | 0.50 ± 0.001 | 0.50 ± 0.005 | 5,274 |
| ResNet18 | 0.50 ± 0.002 | 0.50 ± 0.004 | 0.50 ± 0.005 | 11.7M |
| MLP Baseline | 0.55 ± 0.002 | 0.51 ± 0.006 | 0.50 ± 0.004 | 26k/192k/552k |
All models were trained for 5 epochs on each dataset of the curriculum, and all metrics were computed over 10 different seeds. Bold values indicate highest average accuracy.
Figure 2. Magnitude of the difference in predicted score between the positive and negative classes, for a comparison between a 5-object configuration and a perturbed version of this configuration in which one object is displaced over the 2D plane. The value displayed is score+ − score−, where score+ is the logit of the positive class as output by the model and score− is the logit of the negative class. The left row is with RDS layers and the right row is with MPGNN. For each row, the displaced object's position is indicated with a blue star and the other objects with blue dots. The sizes, colors, orientations and shapes of the objects are not represented. Bright yellow means the model would assign the positive class if the displaced object were placed at that location; black means the negative class would be assigned.
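The quantity shown in Figure 2 can be sketched as a sweep over candidate positions for one object. This is an illustrative reconstruction from the caption only: `pair_logits` is a hypothetical stand-in for a trained dual-input model, assumed here to return `(negative_logit, positive_logit)`, and `toy_pair_logits` is a placeholder scorer, not one of the models evaluated in the paper.

```python
import math


def score_gap_map(reference, moved_index, grid, pair_logits):
    """Return score+ - score- for every candidate position of one object.

    `grid` is a list of rows of (x, y) positions; the object at `moved_index`
    is placed at each position in turn and compared with the reference.
    """
    gaps = []
    for row in grid:
        gap_row = []
        for position in row:
            candidate = list(reference)
            candidate[moved_index] = position
            negative, positive = pair_logits(reference, candidate)
            gap_row.append(positive - negative)
        gaps.append(gap_row)
    return gaps


def toy_pair_logits(reference, candidate):
    """Placeholder scorer (not a trained model): penalizes total displacement."""
    displacement = sum(math.dist(p, q) for p, q in zip(reference, candidate))
    return (displacement, 1.0 - displacement)


# A 5-object configuration; the last object is swept over a 5 x 5 grid.
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 2.0)]
grid = [[(x * 0.5, y * 0.5) for x in range(5)] for y in range(5)]
gaps = score_gap_map(reference, 4, grid, toy_pair_logits)
```

Rendering `gaps` as a heatmap gives a picture of the kind described in the caption: large positive values where the model would predict the positive class, negative values elsewhere.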
Generalization results between datasets for Deep Set, RDS and MPGNN. The numbers plotted are averages of testing accuracies.
| Test n | Model | Train n = 3–8 | Train n = 9–20 | Train n = 21–30 |
|---|---|---|---|---|
| 3–8 | Deep Set | 0.51 ± 0.016 | 0.49 ± 0.046 | 0.50 ± 0.043 |
| | RDS | 0.80 ± 0.133 | 0.66 ± 0.138 | 0.51 ± 0.048 |
| | MPGNN | | | |
| 9–20 | Deep Set | 0.51 ± 0.046 | 0.50 ± 0.001 | 0.50 ± 0.047 |
| | RDS | | 0.68 ± 0.154 | 0.52 ± 0.054 |
| | MPGNN | 0.68 ± 0.063 | | |
| 21–30 | Deep Set | 0.50 ± 0.04 | 0.51 ± 0.068 | 0.50 ± 0.05 |
| | RDS | | 0.68 ± 0.15 | 0.52 ± 0.04 |
| | MPGNN | 0.51 ± 0.048 | | |
Columns correspond to training datasets, rows to testing datasets. Each block corresponds to one train-set/test-set combination. In each block, the results are given from top to bottom for Deep Set, RDS and MPGNN. Diagonal blocks correspond to matching train-set/test-set combinations. All reported results are averages and standard deviations over 10 different runs. Rows and columns are annotated with the number of objects n.