Kaiqi Zhang, Cole Hawkins, Zheng Zhang.
Abstract
A major challenge in many machine learning tasks is that a model's expressive power depends on its size. Low-rank tensor methods are an efficient tool for handling the curse of dimensionality in many large-scale machine learning models. The major challenges in training a tensor learning model include how to process high-volume data, how to determine the tensor rank automatically, and how to estimate the uncertainty of the results. While existing tensor learning methods focus on a specific task, this paper proposes a generic Bayesian framework that can be employed to solve a broad class of tensor learning problems, such as tensor completion, tensor regression, and tensorized neural networks. We develop a low-rank tensor prior for automatic rank determination in nonlinear problems. Our method is implemented with both stochastic gradient Hamiltonian Monte Carlo (SGHMC) and Stein variational gradient descent (SVGD), and we compare the automatic rank determination and uncertainty quantification of these two solvers. We demonstrate that the proposed method can determine the tensor rank automatically and can quantify the uncertainty of the obtained results. We validate our framework on tensor completion tasks and tensorized neural network training tasks.
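As a concrete illustration of one of the two solvers named above, here is a minimal, self-contained Stein variational gradient descent loop on a toy 1-D Gaussian target. The toy target, the RBF kernel with median-heuristic bandwidth, and the step size are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

def rbf_kernel(x):
    # RBF kernel with the median-heuristic bandwidth commonly used for SVGD
    diffs = x[:, None] - x[None, :]            # pairwise differences, shape (n, n)
    sq = diffs ** 2
    h = np.median(sq) / np.log(len(x) + 1) + 1e-8
    k = np.exp(-sq / h)                        # k(x_j, x_i)
    grad_k = -2.0 * diffs / h * k              # d k(x_j, x_i) / d x_j
    return k, grad_k

def svgd(x, grad_logp, step=0.1, iters=1000):
    # phi(x_i) = (1/n) sum_j [ k(x_j, x_i) grad log p(x_j) + grad_{x_j} k(x_j, x_i) ]
    x = x.copy()
    n = len(x)
    for _ in range(iters):
        k, grad_k = rbf_kernel(x)
        phi = (k * grad_logp(x)[:, None]).sum(axis=0) / n + grad_k.sum(axis=0) / n
        x += step * phi
    return x

rng = np.random.default_rng(0)
particles = rng.normal(-3.0, 0.5, size=50)     # start far from the target mode
out = svgd(particles, lambda x: -(x - 2.0))    # target density: N(2, 1)
```

The first term of `phi` pulls particles toward high-density regions; the kernel-gradient term is the repulsive force that keeps the particle set spread out, so the final particles approximate the target rather than collapsing to its mode.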
Keywords: Bayesian inference; deep learning; tensor decomposition; tensor learning; uncertainty quantification
Year: 2022 PMID: 35072057 PMCID: PMC8777296 DOI: 10.3389/frai.2021.668353
Source DB: PubMed Journal: Front Artif Intell ISSN: 2624-8212
Equations to calculate the potential energy U(Θ) for different tensor learning tasks and with different low-rank tensor formats.
| Task | Likelihood | Format 1 | Format 2 | Format 3 |
|---|---|---|---|---|
| Tensor completion | Gaussian | (12) + (14) + (10) | (31) + (14) + (10) | (20) + (14) + (10) |
| Neural network classification | Multinomial | (12) + (24) | (31) + (24) | (20) + (24) |
| Neural network regression | Gaussian | (12) + (26) | (31) + (26) | (20) + (26) |
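The equation numbers in the table refer to the paper's full text, which is not reproduced in this record. As a rough illustration of what a potential energy U(Θ) = −log p(D | Θ) − log p(Θ) looks like for Gaussian-likelihood tensor completion, the sketch below uses a CP factorization with a plain Gaussian prior on the factors; the paper's actual low-rank prior and format-specific terms differ:

```python
import numpy as np

def cp_reconstruct(factors):
    # rank-R CP model of a 3-way tensor: T[i, j, k] = sum_r A[i, r] B[j, r] C[k, r]
    A, B, C = factors
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def potential_energy(factors, data, mask, noise_var=0.01, prior_var=1.0):
    # U(Theta) = -log p(D | Theta) - log p(Theta), dropping additive constants:
    # Gaussian likelihood on the observed entries, Gaussian prior on each factor
    resid = mask * (data - cp_reconstruct(factors))
    neg_loglik = 0.5 * np.sum(resid ** 2) / noise_var
    neg_logprior = sum(0.5 * np.sum(F ** 2) / prior_var for F in factors)
    return neg_loglik + neg_logprior

rng = np.random.default_rng(1)
true = tuple(rng.normal(size=(n, 3)) for n in (6, 7, 8))
data = cp_reconstruct(true)                           # noise-free rank-3 tensor
mask = (rng.random(data.shape) < 0.3).astype(float)   # ~30% of entries observed
u_true = potential_energy(true, data, mask)
u_rand = potential_energy(tuple(rng.normal(size=F.shape) for F in true), data, mask)
```

Because the true factors reproduce the observed entries exactly, `u_true` reduces to the prior term alone and is far smaller than `u_rand`; samplers such as SGHMC explore the low-U region of this landscape.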
Numerical results of tensor completion for the synthetic experiment and MRI dataset.
| Data | Noise | Rank | Error 1 | Error 2 | Rank | Error 1 | Error 2 | Rank | Error 1 | Error 2 |
|---|---|---|---|---|---|---|---|---|---|---|
| Uniform rank-5 | 0.001 | 5 | 0.0013 | 0.0047 | 5 | 0.0019 | 0.0011 | 1 | 0.1476 | 0.1517 |
| | 0.003 | 5 | 0.0038 | 0.0040 | 5 | 0.0031 | 0.0016 | 1 | 0.1499 | 0.1607 |
| | 0.01 | 5 | 0.0128 | 0.0118 | 5 | 0.0114 | 0.0098 | 1 | 0.1386 | 0.1365 |
| | 0.03 | 5 | 0.0403 | 0.0318 | 5 | 0.0512 | 0.0071 | 1 | 0.1468 | 0.1523 |
| Gaussian rank-5 | 0.001 | 5 | 0.0013 | 0.0025 | 5 | 0.0019 | 0.0007 | 6 | 0.0005 | 0.0011 |
| | 0.003 | 5 | 0.0038 | 0.0031 | 5 | 0.0027 | 0.0019 | 6 | 0.0033 | 0.0033 |
| | 0.01 | 5 | 0.0130 | 0.0102 | 5 | 0.0119 | 0.0069 | 5 | 0.0106 | 0.0110 |
| | 0.03 | 5 | 0.0418 | 0.0236 | 5 | 0.0336 | 0.0193 | 7 | 0.0338 | 0.0354 |
| MRI dataset | | 65 | 0.0856 | 0.0670 | 65 | 0.0727 | 0.0319 | 17 | 0.1495 | 0.1456 |
Results of different networks on two datasets.
| Dataset | Network | # Params (compression) | LL | Accuracy | LL | Accuracy |
|---|---|---|---|---|---|---|
| Fashion-MNIST | NN | 3.97 × 10⁵ (1×) | −0.7118 | 88.91% | −0.6730 | 89.41% |
| | TT-NN | 2.63 × 10⁴ (15.1×) | −0.6687 | 87.07% | −0.6337 | 87.78% |
| | HMC-BF-TT-NN | 4.02 × 10³ (98.8×) | −0.3317 | 88.24% | −0.3254 | 88.64% |
| | SVGD-BF-TT-NN | 2.8 × 10⁴ (14.1×) | −0.3317 | 88.24% | −0.3261 | 88.57% |
| | Tucker-NN | 2.57 × 10⁵ (1.54×) | −1.1673 | 87.20% | −1.0984 | 87.53% |
| | HMC-BF-Tucker-NN | 3.10 × 10⁴ (12.8×) | −1.2948 | 87.18% | −0.4405 | 88.18% |
| | SVGD-BF-Tucker-NN | 3.10 × 10⁴ (12.8×) | −1.2948 | 87.18% | −0.4705 | 87.86% |
| CIFAR-10 | CNN | 9.91 × 10⁶ (1×) | −0.5337 | 91.54% | −0.5370 | 91.53% |
| | TT-CNN | 6.93 × 10⁵ (14.3×) | −0.6077 | 89.00% | −0.5329 | 90.13% |
| | HMC-BF-TT-CNN | 7.83 × 10⁴ (127×) | −0.3936 | 86.68% | −0.3623 | 88.01% |
| | SVGD-BF-TT-CNN | 7.83 × 10⁴ (127×) | −0.3936 | 86.68% | −0.3419 | 88.41% |
LL, predictive log-likelihood (larger is better); TT, tensor-train decomposition; Tucker, Tucker decomposition; BF, Bayesian low-rank prior.
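The compression ratios in the table come from replacing dense weight matrices with low-rank tensor factorizations. A back-of-the-envelope parameter count for a TT-factorized fully connected layer can be sketched as follows; the layer shape and TT ranks here are hypothetical, not the paper's:

```python
def tt_fc_params(in_modes, out_modes, ranks):
    # A TT-factorized fully connected layer replaces a prod(in) x prod(out)
    # weight matrix with one 4-way core per mode; core g has shape
    # (ranks[g], in_modes[g], out_modes[g], ranks[g + 1]), with boundary ranks 1.
    assert len(in_modes) == len(out_modes) == len(ranks) - 1
    assert ranks[0] == ranks[-1] == 1
    return sum(ranks[g] * in_modes[g] * out_modes[g] * ranks[g + 1]
               for g in range(len(in_modes)))

# hypothetical layer: 784 = 7*4*7*4 inputs, 625 = 5*5*5*5 outputs, TT ranks 8
dense = 784 * 625
tt = tt_fc_params([7, 4, 7, 4], [5, 5, 5, 5], [1, 8, 8, 8, 1])
```

For this hypothetical 784 × 625 layer the TT cores hold 3,960 parameters versus 490,000 dense weights, roughly 124× compression, which is the mechanism behind the (15.1×), (98.8×), etc. entries above; automatic rank determination shrinks the `ranks` vector further.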
Figure 1. The inferred tensor ranks at different layers. (A) 2 TT-FC layers for Fashion-MNIST. (B) 2 Tucker-FC layers for Fashion-MNIST. (C) 4 TT-Conv and 2 TT-FC layers for CIFAR-10. Legend: SGHMC with thermostats; SVGD.
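For reference, a minimal SGHMC step on a toy 1-D Gaussian target, omitting the thermostat variable and using a full (rather than stochastic) gradient, might look like this; all constants are illustrative assumptions:

```python
import numpy as np

def sghmc(grad_u, theta0, step=0.01, friction=0.1, iters=20000, seed=0):
    # SGHMC update (Chen et al., 2014 form, with the gradient-noise estimate
    # set to zero and no thermostat):
    #   v     <- v - step * grad U(theta) - friction * v + N(0, 2 * friction * step)
    #   theta <- theta + v
    rng = np.random.default_rng(seed)
    theta, v = float(theta0), 0.0
    noise_std = np.sqrt(2.0 * friction * step)
    samples = []
    for t in range(iters):
        v += -step * grad_u(theta) - friction * v + rng.normal(0.0, noise_std)
        theta += v
        if t >= iters // 2:              # keep only post-burn-in samples
            samples.append(theta)
    return np.array(samples)

# toy target N(2, 1): U(theta) = (theta - 2)^2 / 2, so grad U = theta - 2
samples = sghmc(lambda th: th - 2.0, theta0=-3.0)
```

The friction term and the matched injected noise keep the kinetic energy in balance; the thermostat variable mentioned in the legend replaces the fixed `friction` constant with an adaptively estimated one to compensate for unknown gradient noise.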