Simone Piaggesi, André Panisson.
Abstract
Representation learning models for graphs are a successful family of techniques that project nodes into feature spaces that can be exploited by other machine learning algorithms. Since many real-world networks are inherently dynamic, with interactions among nodes changing over time, these techniques can be defined both for static and for time-varying graphs. Here, we show how the skip-gram embedding approach can be generalized to perform implicit tensor factorization on different tensor representations of time-varying graphs. We show that higher-order skip-gram with negative sampling (HOSGNS) is able to disentangle the role of nodes and time, with a small fraction of the number of parameters needed by other approaches. We empirically evaluate our approach using time-resolved face-to-face proximity data, showing that the learned representations outperform state-of-the-art methods when used to solve downstream tasks such as network reconstruction. Good performance on predicting the outcome of dynamical processes such as disease spreading shows the potential of this method to estimate contagion risk, providing early risk awareness based on contact tracing data.
Supplementary Information: The online version contains supplementary material available at 10.1140/epjds/s13688-022-00344-8.
Keywords: Representation learning; Spreading processes; Temporal link prediction; Time-varying graphs
Year: 2022 PMID: 35668814 PMCID: PMC9143726 DOI: 10.1140/epjds/s13688-022-00344-8
Source DB: PubMed Journal: EPJ Data Sci ISSN: 2193-1127 Impact factor: 3.630
Figure 1. A time-varying graph with three intervals (left) and its corresponding time-respecting supra-adjacency graph (right)
Figure 2. Representation of SGNS and HOSGNS with embedding matrices and operations on embedding vectors. Starting from a random walk realization on a static graph, SGNS takes as input nodes $i$ and $j$ within a context window of size $T$, and maximizes $\sigma(\mathbf{w}_i \cdot \mathbf{c}_j)$. HOSGNS starts from a random walk realization on a higher-order representation of the time-varying graph, takes as input nodes $i^{(k)}$ (node $i$ at time $k$) and $j^{(l)}$ (node $j$ at time $l$) within a context window of size $T$, and maximizes $\sigma([\![\mathbf{w}_i, \mathbf{c}_j, \mathbf{t}_k, \mathbf{s}_l]\!])$. In both cases, for each input sample, we fix $i$ and draw $\kappa$ combinations of $j$ or $j, k, l$ from a noise distribution, and we maximize $\sigma(-\mathbf{w}_i \cdot \mathbf{c}_j)$ (SGNS) or $\sigma(-[\![\mathbf{w}_i, \mathbf{c}_j, \mathbf{t}_k, \mathbf{s}_l]\!])$ (HOSGNS) with their corresponding embedding vectors (negative sampling)
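The scoring step described in the Figure 2 caption can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the log-sigmoid is used because it is the standard per-sample objective in skip-gram with negative sampling, and the HOSGNS score is assumed to be the element-wise product of four embedding vectors summed over the embedding dimension.

```python
import numpy as np

def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -np.logaddexp(0.0, -x)

def sgns_score(w_i, c_j):
    """SGNS score: dot product of a node vector and a context vector."""
    return float(np.dot(w_i, c_j))

def hosgns_score(w_i, c_j, t_k, s_l):
    """Assumed HOSGNS score: 4-way element-wise product summed over dimensions."""
    return float(np.sum(w_i * c_j * t_k * s_l))

def negative_sampling_objective(pos_score, neg_scores):
    """log sigma(positive) + sum of log sigma(-negative); the quantity to maximize."""
    return float(log_sigmoid(pos_score) + sum(log_sigmoid(-s) for s in neg_scores))

rng = np.random.default_rng(0)
d, kappa = 8, 5  # toy embedding dimension and number of negative samples
w_i, c_j, t_k, s_l = rng.normal(size=(4, d))
noise = rng.normal(size=(kappa, 3, d))  # kappa noise draws standing in for (c_j, t_k, s_l)

pos = hosgns_score(w_i, c_j, t_k, s_l)
negs = [hosgns_score(w_i, c, t, s) for c, t, s in noise]
objective = negative_sampling_objective(pos, negs)
```

With the two time vectors set to all ones, `hosgns_score` reduces to `sgns_score`, which is one way to see the higher-order model as a generalization of the static one.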
Summary statistics of the empirical and synthetic time-varying graph data. Columns, in order: number of single nodes, number of time steps, number of events, number of active nodes, average weight of events, nodes density and links density
| Dataset | Nodes | Steps | Events | Active nodes | Average weight | Nodes density | Links density |
|---|---|---|---|---|---|---|---|
| LyonSchool | 242 | 104 | 44,820 | 17,174 | 2.806 | 0.6824 | 0.0148 |
| SFHH | 403 | 127 | 17,223 | 10,815 | 4.079 | 0.2113 | 0.0017 |
| LH10 | 76 | 321 | 7,435 | 4,880 | 4.448 | 0.2000 | 0.0081 |
| T | 327 | 246 | 35,862 | 32,546 | 5.256 | 0.4046 | 0.0027 |
| I | 217 | 691 | 18,791 | 22,451 | 4.164 | 0.1497 | 0.0012 |
| O | 2000 | 100 | 1,243,551 | 198,537 | 1.0 | 0.9927 | 0.0062 |
| O | 5000 | 20 | 632,523 | 99,966 | 1.0 | 0.9997 | 0.0025 |
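As a sanity check on the two density columns, the short script below recomputes them from the other columns. The formulas are an inference from the numbers, not something the caption states: nodes density is taken to be active nodes divided by (nodes × steps), and links density to be events divided by (node pairs × steps). Row labels are abbreviated as printed in the table; the suffixes on the two synthetic rows are added here only to tell them apart.

```python
from math import comb

# (label, nodes, steps, events, active nodes, nodes density, links density)
ROWS = [
    ("L", 242, 104, 44_820, 17_174, 0.6824, 0.0148),  # LyonSchool
    ("SFHH", 403, 127, 17_223, 10_815, 0.2113, 0.0017),
    ("LH10", 76, 321, 7_435, 4_880, 0.2000, 0.0081),
    ("T", 327, 246, 35_862, 32_546, 0.4046, 0.0027),
    ("I", 217, 691, 18_791, 22_451, 0.1497, 0.0012),
    ("O (2000 nodes)", 2000, 100, 1_243_551, 198_537, 0.9927, 0.0062),
    ("O (5000 nodes)", 5000, 20, 632_523, 99_966, 0.9997, 0.0025),
]

def nodes_density(n_nodes, n_steps, n_active):
    """Assumed definition: fraction of (node, step) pairs that are active."""
    return n_active / (n_nodes * n_steps)

def links_density(n_nodes, n_steps, n_events):
    """Assumed definition: fraction of (node pair, step) slots holding an event."""
    return n_events / (comb(n_nodes, 2) * n_steps)

recomputed = {
    label: (round(nodes_density(n, s, a), 4), round(links_density(n, s, e), 4))
    for label, n, s, e, a, _, _ in ROWS
}
```

Under these assumed definitions every recomputed value agrees with the table to four decimal places.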
Macro-F1 scores for classification of nodes in epidemic states according to different SIR epidemic processes over empirical datasets. For each SIR parameter pair we highlight the two highest scores and underline the best one
| SIR parameters | Model | LyonSchool | SFHH | LH10 | T | I |
|---|---|---|---|---|---|---|
| (0.25,0.002) | D | 67.0 ± 1.2 | 52.5 ± 1.7 | |||
| D | 58.7 ± 2.8 | 35.9 ± 1.1 | 34.5 ± 0.7 | 35.5 ± 1.2 | 58.8 ± 1.1 | |
| D | 31.0 ± 0.4 | 28.8 ± 0.4 | 29.9 ± 0.3 | 30.3 ± 0.2 | 30.4 ± 0.2 | |
| D | 27.3 ± 0.2 | 27.4 ± 0.3 | 29.7 ± 0.2 | 30.2 ± 0.2 | 30.5 ± 0.2 | |
| ISGNS | 63.5 ± 0.6 | 60.7 ± 0.8 | 54.1 ± 1.1 | 56.4 ± 0.6 | 52.3 ± 0.6 | |
| 55.5 ± 0.8 | 57.3 ± 1.1 | 45.9 ± 0.9 | 46.9 ± 0.7 | 44.5 ± 0.7 | ||
| 71.8 ± 1.2 | ||||||
| 77.4 ± 0.6 | 64.2 ± 1.0 | |||||
| (0.0625,0.002) | D | 72.2 ± 0.6 | 64.9 ± 1.7 | 68.0 ± 0.5 | ||
| D | 56.4 ± 2.7 | 35.9 ± 4.1 | 35.8 ± 1.2 | 32.9 ± 1.2 | 55.0 ± 0.6 | |
| D | 29.5 ± 0.5 | 33.1 ± 2.5 | 29.6 ± 0.4 | 27.4 ± 0.3 | 28.4 ± 0.2 | |
| D | 26.4 ± 0.2 | 29.5 ± 1.3 | 29.5 ± 0.3 | 26.5 ± 0.2 | 28.5 ± 0.2 | |
| ISGNS | 59.2 ± 0.3 | 57.1 ± 1.6 | 55.9 ± 1.0 | 49.0 ± 0.3 | 47.2 ± 0.3 | |
| 55.5 ± 0.7 | 57.6 ± 2.2 | 49.4 ± 0.8 | 45.5 ± 0.4 | 43.6 ± 0.5 | ||
| 58.2 ± 1.1 | 59.0 ± 0.7 | |||||
| (0.1875,0.001) | D | 67.7 ± 1.2 | 72.7 ± 0.4 | |||
| D | 57.4 ± 2.8 | 36.2 ± 2.6 | 41.4 ± 1.3 | 34.8 ± 1.3 | 61.2 ± 0.9 | |
| D | 32.3 ± 0.5 | 31.5 ± 0.8 | 30.5 ± 0.4 | 27.9 ± 0.3 | 30.0 ± 0.2 | |
| D | 26.4 ± 0.2 | 29.4 ± 0.8 | 30.0 ± 0.3 | 27.7 ± 0.3 | 29.9 ± 0.2 | |
| ISGNS | 65.1 ± 0.5 | 63.0 ± 1.4 | 60.2 ± 1.7 | 56.0 ± 0.5 | 52.5 ± 0.5 | |
| 56.9 ± 0.8 | 59.4 ± 1.7 | 48.5 ± 1.1 | 49.0 ± 0.6 | 46.2 ± 0.8 | ||
| 62.4 ± 1.7 | ||||||
| 74.5 ± 0.4 | 67.3 ± 0.5 | |||||
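To make the task concrete, here is a minimal sketch of a discrete-time SIR process running on timestamped contacts. It is an illustration under stated assumptions, not the simulation behind the table: the first number of each parameter pair is read as a per-contact infection probability and the second as a per-step recovery probability, and the function `simulate_sir` and the toy contact list are hypothetical.

```python
import random

def simulate_sir(contacts, n_steps, beta, mu, seed_node, rng):
    """Run SIR over contacts given as {time step: [(i, j), ...]}.

    Returns a list with one {node: state} snapshot per time step.
    """
    # Every node that appears in a contact starts susceptible ("S").
    state = {node: "S" for pairs in contacts.values() for pair in pairs for node in pair}
    state[seed_node] = "I"
    history = []
    for t in range(n_steps):
        newly_infected = set()
        for i, j in contacts.get(t, []):
            # Contacts are undirected: infection can pass either way.
            for src, dst in ((i, j), (j, i)):
                if state[src] == "I" and state[dst] == "S" and rng.random() < beta:
                    newly_infected.add(dst)
        # Nodes infectious at the start of the step may recover at its end.
        recovered = {n for n, s in state.items() if s == "I" and rng.random() < mu}
        for n in newly_infected:
            state[n] = "I"
        for n in recovered:
            state[n] = "R"
        history.append(dict(state))
    return history

toy_contacts = {0: [(0, 1)], 1: [(1, 2)], 2: [(2, 3)]}
history = simulate_sir(toy_contacts, n_steps=3, beta=0.25, mu=0.002,
                       seed_node=0, rng=random.Random(42))
```

Each snapshot in `history` gives an S, I or R label for every node at one time step, which is the kind of target a classifier on (node, time) embeddings would be asked to predict.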
Macro-F1 scores for temporal event reconstruction in empirical datasets. We highlight in bold the two best scores for each dataset. For baseline models we underline their highest score
| Model | Operator | LyonSchool | SFHH | LH10 | T | I |
|---|---|---|---|---|---|---|
| D | Average | 56.4 ± 0.4 | 52.9 ± 0.5 | 52.3 ± 0.6 | 51.0 ± 0.4 | 52.7 ± 0.4 |
| Hadamard | 89.7 ± 0.3 | 94.7 ± 0.1 | 94.1 ± 0.1 | |||
| Weighted-L1 | 90.2 ± 0.2 | 83.3 ± 0.5 | 73.3 ± 0.7 | 94.7 ± 0.1 | 94.4 ± 0.2 | |
| Weighted-L2 | 84.5 ± 0.5 | 72.0 ± 0.5 | ||||
| Concat | 65.7 ± 0.4 | 53.8 ± 0.4 | 56.2 ± 0.6 | 57.0 ± 0.4 | 50.9 ± 0.4 | |
| D | Average | 57.7 ± 0.5 | 56.8 ± 0.7 | 40.4 ± 1.5 | 42.8 ± 0.9 | |
| Hadamard | 55.1 ± 1.0 | 52.5 ± 1.6 | 40.8 ± 1.5 | 43.7 ± 1.0 | ||
| Weighted-L1 | 58.4 ± 0.6 | 52.3 ± 0.7 | 50.9 ± 1.2 | 44.8 ± 0.9 | ||
| Weighted-L2 | 53.7 ± 0.6 | 47.0 ± 0.8 | 47.0 ± 1.3 | 39.2 ± 1.2 | 43.6 ± 0.6 | |
| Concat | 60.4 ± 0.4 | 48.9 ± 1.7 | 36.9 ± 1.3 | |||
| D | Average | 51.7 ± 0.2 | 56.9 ± 0.4 | 60.2 ± 0.6 | 58.1 ± 0.2 | 56.1 ± 0.3 |
| Hadamard | 60.3 ± 0.3 | 58.9 ± 0.4 | 59.5 ± 0.5 | 62.2 ± 0.3 | 64.7 ± 0.3 | |
| Weighted-L1 | 72.3 ± 0.4 | 75.5 ± 0.6 | 70.8 ± 0.3 | 78.1 ± 0.2 | ||
| Weighted-L2 | 77.4 ± 0.4 | |||||
| Concat | 52.2 ± 0.2 | 53.4 ± 0.3 | 55.9 ± 0.7 | 55.1 ± 0.2 | 53.2 ± 0.3 | |
| D | Average | 51.1 ± 0.3 | 49.6 ± 0.4 | 51.6 ± 0.5 | 50.4 ± 0.2 | 50.1 ± 0.3 |
| Hadamard | 54.8 ± 0.6 | |||||
| Weighted-L1 | 72.4 ± 0.5 | 51.5 ± 0.3 | 56.1 ± 0.6 | 66.4 ± 0.4 | 64.8 ± 0.3 | |
| Weighted-L2 | 72.4 ± 0.5 | 51.7 ± 0.3 | 66.5 ± 0.4 | 63.7 ± 0.4 | ||
| Concat | 50.0 ± 0.3 | 50.1 ± 0.4 | 52.3 ± 0.5 | 49.8 ± 0.2 | 50.9 ± 0.3 | |
| ISGNS | Average | 53.4 ± 0.4 | 50.3 ± 0.5 | 48.1 ± 0.6 | 49.4 ± 0.4 | 45.9 ± 0.5 |
| Hadamard | 87.2 ± 0.4 | 80.8 ± 0.7 | 96.7 ± 0.2 | 96.7 ± 0.2 | ||
| Weighted-L1 | 89.9 ± 0.3 | 87.7 ± 0.4 | 81.6 ± 0.4 | 96.8 ± 0.2 | 96.4 ± 0.2 | |
| Weighted-L2 | 89.7 ± 0.3 | |||||
| Concat | 57.1 ± 0.5 | 50.2 ± 0.4 | 48.8 ± 0.7 | 52.7 ± 0.4 | 43.8 ± 0.4 | |
| Hadamard | ||||||
| Hadamard | 90.3 ± 0.2 | 80.9 ± 0.4 | 68.1 ± 0.7 | 93.5 ± 0.2 | 87.2 ± 0.2 | |
| Hadamard | 86.7 ± 0.4 | 73.6 ± 0.6 | 94.3 ± 0.1 | 89.0 ± 0.2 | ||
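The operators in the second column combine two embedding vectors into one feature vector for an event classifier. The sketch below uses the definitions that are common in the link-prediction literature (for example in node2vec); the text here does not spell them out, so treat them as an assumption. The function name `combine` is hypothetical.

```python
import numpy as np

# Binary operators mapping two embedding vectors to one feature vector.
OPERATORS = {
    "Average": lambda u, v: (u + v) / 2.0,
    "Hadamard": lambda u, v: u * v,
    "Weighted-L1": lambda u, v: np.abs(u - v),
    "Weighted-L2": lambda u, v: (u - v) ** 2,
    "Concat": lambda u, v: np.concatenate([u, v]),
}

def combine(u, v, operator):
    """Apply the named operator to two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return OPERATORS[operator](u, v)

u = [1.0, -2.0, 3.0]
v = [0.5, 4.0, -1.0]
features = {name: combine(u, v, name) for name in OPERATORS}
```

All operators except Concat keep the embedding dimension; Concat doubles it and, unlike the other four, depends on the order of the two vectors.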
Macro-F1 scores for missing event prediction in empirical datasets. We highlight in bold the two best scores for each dataset. For baseline models we underline their highest score
| Model | Operator | LyonSchool | SFHH | LH10 | T | I |
|---|---|---|---|---|---|---|
| D | Average | 56.8 ± 0.6 | 50.6 ± 0.8 | 51.3 ± 1.0 | 49.1 ± 0.6 | 49.3 ± 0.8 |
| Hadamard | 87.3 ± 0.3 | 73.5 ± 0.6 | ||||
| Weighted-L1 | 87.8 ± 0.3 | 73.3 ± 0.6 | 65.9 ± 1.0 | 84.0 ± 0.4 | 78.4 ± 0.6 | |
| Weighted-L2 | 66.1 ± 1.0 | 84.4 ± 0.4 | 78.9 ± 0.6 | |||
| Concat | 64.4 ± 0.5 | 52.4 ± 0.8 | 51.9 ± 1.0 | 57.0 ± 0.6 | 51.4 ± 0.7 | |
| D | Average | 56.2 ± 0.5 | 49.7 ± 0.5 | 50.9 ± 0.7 | ||
| Hadamard | 54.8 ± 0.6 | 51.3 ± 0.7 | 51.7 ± 1.2 | 44.7 ± 0.7 | ||
| Weighted-L1 | 55.5 ± 0.4 | 48.5 ± 0.8 | 50.2 ± 1.0 | 49.8 ± 0.7 | ||
| Weighted-L2 | 53.2 ± 0.7 | 47.8 ± 0.9 | 48.0 ± 1.1 | 48.9 ± 0.6 | 45.3 ± 0.6 | |
| Concat | 50.4 ± 0.8 | 46.4 ± 1.4 | 48.8 ± 0.5 | 49.9 ± 0.6 | ||
| D | Average | 51.4 ± 0.4 | 52.6 ± 0.6 | 53.0 ± 0.8 | 52.0 ± 0.4 | 49.9 ± 0.7 |
| Hadamard | 53.1 ± 0.4 | 49.5 ± 0.6 | 52.0 ± 0.8 | 51.7 ± 0.5 | 49.8 ± 0.6 | |
| Weighted-L1 | 64.3 ± 0.4 | 56.6 ± 0.7 | 54.2 ± 0.9 | 53.6 ± 0.4 | 47.2 ± 0.6 | |
| Weighted-L2 | 47.0 ± 0.6 | |||||
| Concat | 52.6 ± 0.3 | 51.8 ± 0.5 | 52.7 ± 0.9 | 51.5 ± 0.3 | ||
| D | Average | 51.3 ± 0.4 | 51.6 ± 0.6 | 52.5 ± 0.8 | 50.0 ± 0.4 | 50.3 ± 0.6 |
| Hadamard | 56.6 ± 0.7 | 61.5 ± 0.8 | ||||
| Weighted-L1 | 71.3 ± 0.5 | 52.0 ± 0.6 | 63.2 ± 0.6 | |||
| Weighted-L2 | 70.7 ± 0.5 | 51.5 ± 0.7 | 56.5 ± 0.8 | 63.1 ± 0.5 | 63.4 ± 0.5 | |
| Concat | 49.2 ± 0.4 | 48.8 ± 0.8 | 52.4 ± 0.9 | 49.8 ± 0.5 | 50.4 ± 0.6 | |
| ISGNS | Average | 52.4 ± 0.6 | 49.5 ± 0.8 | 44.9 ± 0.9 | 48.0 ± 0.4 | 42.7 ± 0.8 |
| Hadamard | 79.8 ± 0.4 | 59.3 ± 0.7 | 61.1 ± 1.2 | 59.3 ± 0.6 | ||
| Weighted-L1 | 80.8 ± 0.3 | 59.8 ± 0.7 | 61.7 ± 1.0 | 59.0 ± 0.6 | 49.8 ± 0.7 | |
| Weighted-L2 | 51.5 ± 0.7 | |||||
| Concat | 55.8 ± 0.7 | 50.8 ± 0.6 | 46.8 ± 0.8 | 52.2 ± 0.5 | 48.5 ± 0.6 | |
| Hadamard | 52.1 ± 0.4 | 43.8 ± 0.6 | 34.2 ± 0.2 | 55.9 ± 0.6 | 43.0 ± 0.5 | |
| Hadamard | ||||||
| Hadamard | ||||||
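For reference, Macro-F1 is the unweighted mean of the per-class F1 scores, so rare classes count as much as common ones. A minimal pure-Python sketch follows; the function name `macro_f1` and the toy labels are illustrative, and classes are taken from the union of true and predicted labels.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, with F1 = 2*TP / (2*TP + FP + FN)."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

y_true = [0, 0, 1, 1, 2]
y_pred = [0, 1, 1, 1, 2]
score = macro_f1(y_true, y_pred)  # per-class F1: 2/3, 4/5, 1
```

The values in the tables appear to be on a 0 to 100 scale, so a score like this one would be multiplied by 100 before being compared with them.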
Macro-F1 scores for classification of nodes in epidemic states according to different SIR epidemic processes for synthetic datasets. For each SIR parameter pair we highlight the best score
| SIR parameters | Model | O | O |
|---|---|---|---|
| (0.25,0.002) | D | 59.6 ± 1.7 | |
| 31.2 ± 0.1 | 27.8 ± 0.6 | ||
| 57.5 ± 1.8 | |||
| (0.0625,0.002) | D | 53.8 ± 1.3 | |
| 29.8 ± 0.2 | 29.4 ± 1.4 | ||
| 59.5 ± 0.9 | |||
| (0.1875,0.001) | D | 60.3 ± 1.4 | 59.6 ± 1.5 |
| 31.9 ± 0.2 | 27.4 ± 0.7 | ||
Macro-F1 scores in temporal event reconstruction and missing event prediction for synthetic datasets. We highlight in bold the best two scores for each dataset. For the baseline model we underline its highest score
| Model | Operator | O: Reconstruction | O: Prediction | O: Reconstruction | O: Prediction |
|---|---|---|---|---|---|
| D | Average | 52.2 ± 0.1 | 51.7 ± 0.1 | 51.9 ± 0.1 | 51.9 ± 0.1 |
| Hadamard | |||||
| Weighted-L1 | 70.3 ± 0.1 | 67.4 ± 0.2 | 78.2 ± 0.7 | 70.5 ± 0.3 | |
| Weighted-L2 | 70.3 ± 0.1 | 67.7 ± 0.1 | 78.8 ± 0.5 | 70.9 ± 0.3 | |
| Concat | 53.8 ± 0.1 | 54.6 ± 0.1 | 52.5 ± 0.1 | 52.5 ± 0.2 | |
| Hadamard | |||||
| Hadamard | 82.8 ± 0.3 | ||||
Number of trainable parameters and training time of each time-varying graph representation learning model for LyonSchool and the two synthetic datasets. The embedding dimension is fixed to 128; technical specifications of the computing system and the hyper-parameter configuration are reported in Additional file 1
| Model | LyonSchool: Tr. parameters | LyonSchool: Tr. time | O: Tr. parameters | O: Tr. time | O: Tr. parameters | O: Tr. time |
|---|---|---|---|---|---|---|
| D | 4,396,544 | | 50,825,472 | | 25,591,296 | |
| D | 459,270 | | 1,867,428 | | 4,270,428 | |
| D | 3,221,632 | | 25,600,128 | | 12,800,128 | |
| D | 98,336 | | 323,232 | | 707,232 | |
| ISGNS | 61,952 | | 512,000 | | 1,280,000 | |
| | 75,264 | | 524,800 | | 1,282,560 | |
| | 88,576 | | 537,600 | | 1,285,120 | |
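Several parameter counts in this table follow a simple pattern at embedding dimension 128. The check below is an observation about the numbers, not a statement from the text: the ISGNS counts equal two matrices of size nodes × 128, and the two rows after ISGNS (whose model names did not survive in this copy) equal that plus one or two matrices of size steps × 128, which is consistent with the node and time embedding matrices named in the figure captions. Node and step counts come from the summary-statistics table, and the dictionary keys are labels chosen here for illustration.

```python
D = 128  # embedding dimension stated in the caption

# (nodes, steps) for the three dataset columns, in table order
DATASETS = {
    "LyonSchool": (242, 104),
    "O (2000 nodes)": (2000, 100),
    "O (5000 nodes)": (5000, 20),
}

# Candidate parameter-count formulas (assumed, inferred from the numbers)
FORMULAS = {
    "ISGNS": lambda n, t: 2 * n * D,                    # two node matrices
    "first row after ISGNS": lambda n, t: (2 * n + t) * D,       # plus one time matrix
    "second row after ISGNS": lambda n, t: (2 * n + 2 * t) * D,  # plus two time matrices
}

# Counts as printed in the table
REPORTED = {
    "ISGNS": [61_952, 512_000, 1_280_000],
    "first row after ISGNS": [75_264, 524_800, 1_282_560],
    "second row after ISGNS": [88_576, 537_600, 1_285_120],
}

predicted = {
    row: [formula(n, t) for n, t in DATASETS.values()]
    for row, formula in FORMULAS.items()
}
```

Because these counts grow with nodes plus steps rather than with their product, the gap to the largest counts in the table widens on the bigger synthetic datasets.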
Number of members of each labelled class in the LyonSchool dataset
| Class name | Class label | Number of children or teachers |
|---|---|---|
| CP-A | 0 | 23 |
| CP-B | 1 | 25 |
| CE1-A | 2 | 23 |
| CE1-B | 3 | 26 |
| CE2-A | 4 | 23 |
| CE2-B | 5 | 22 |
| CM1-A | 6 | 21 |
| CM1-B | 7 | 23 |
| CM2-A | 8 | 22 |
| CM2-B | 9 | 24 |
| Teachers | 10 | 10 |
Figure 3. Two-dimensional projections of the 128-dimensional embedding manifold spanned by the embedding matrices W (left of each panel) and T (right of each panel) of the HOSGNS model trained on LyonSchool data, for the two training representations shown in panels (a) and (b). These plots show how the community structure and the evolution of time are captured by individual node embeddings and time embeddings
Figure 4. Two-dimensional projections of the 128-dimensional embedding manifold spanned by dynamic node embeddings, trained on LyonSchool data and obtained with Hadamard products between rows of W (node embeddings) and T (time embeddings), from the HOSGNS model for the two training representations shown in panels (a) and (b). We highlight the temporal participation in communities (left of each panel) and the time interval of activation (right of each panel)
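The construction in the Figure 4 caption, one dynamic embedding per (node, time) pair obtained as the Hadamard product of a row of W and a row of T, can be written with NumPy broadcasting. The matrices below are random stand-ins with LyonSchool-like shapes (242 nodes, 104 time steps, dimension 128); the variable names are illustrative and no trained values are implied.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_steps, dim = 242, 104, 128

W = rng.normal(size=(n_nodes, dim))  # stand-in for node embeddings
T = rng.normal(size=(n_steps, dim))  # stand-in for time embeddings

# dynamic[i, k] = W[i] * T[k] (element-wise), for every node i and time step k
dynamic = W[:, None, :] * T[None, :, :]

# One row per (node, time) pair, ready for a 2-D projection method
flat = dynamic.reshape(n_nodes * n_steps, dim)
```

Only W and T need to be stored; the `n_nodes * n_steps` dynamic vectors are derived on demand rather than learned as separate parameters.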
Figure 5. Two-dimensional projections of the 128-dimensional embedding manifold spanned by dynamic node embeddings for LyonSchool data learned with baseline methods. As in Fig. 4, we highlight the temporal participation in communities (top of each panel) and the time interval of activation (bottom of each panel)