Alberto Testolin, Ivilin Stoianov, Michele De Filippo De Grazia, Marco Zorzi.
Abstract
Deep belief networks hold great promise for the simulation of human cognition because they show how structured and abstract representations may emerge from probabilistic unsupervised learning. These networks build a hierarchy of progressively more complex distributed representations of the sensory data by fitting a hierarchical generative model. However, learning in deep networks typically requires large datasets and can involve millions of connection weights, which makes simulations on standard computers unfeasible. Developing realistic, medium-to-large-scale learning models of cognition would therefore seem to require expertise in programming parallel-computing hardware, and this might explain why the use of this promising approach is still largely confined to the machine learning community. Here we show how simulations of deep unsupervised learning can be easily performed on a desktop PC by exploiting the processors of low-cost graphic cards (graphics processing units) without any specific programming effort, thanks to the use of high-level programming routines (available in MATLAB or Python). We also show that even an entry-level graphic card can outperform a small high-performance computing cluster in terms of learning time, with no loss of learning quality. We therefore conclude that graphic card implementations pave the way for a widespread use of deep learning among cognitive scientists for modeling cognition and behavior.
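The abstract's claim that GPU acceleration needs no low-level programming can be illustrated with a minimal sketch. The snippet below is not the paper's implementation (which relied on high-level MATLAB/Python GPU routines); it assumes the CuPy library as a drop-in replacement for NumPy, so that the dense matrix products that dominate RBM training run on the graphics card without any CUDA code.

```python
# Minimal sketch (not the authors' code): offloading the matrix products that
# dominate RBM training to the GPU via CuPy, a drop-in NumPy replacement.
import numpy as np
import cupy as cp  # assumes CuPy is installed and a CUDA-capable GPU is available

rng = np.random.default_rng(0)
data = rng.random((1000, 784), dtype=np.float32)          # e.g., vectorized 28x28 images
W = (rng.standard_normal((784, 500)) * 0.01).astype(np.float32)

# Move the arrays to GPU memory once...
data_gpu, W_gpu = cp.asarray(data), cp.asarray(W)

# ...then the same high-level array syntax runs on the graphics card.
hidden_probs = 1.0 / (1.0 + cp.exp(-(data_gpu @ W_gpu)))

result = cp.asnumpy(hidden_probs)   # copy back to host memory when needed
print(result.shape)                 # (1000, 500)
```

The same pattern applies in MATLAB, where wrapping matrices with the Parallel Computing Toolbox's gpuArray routine lets existing matrix code execute on the GPU.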
Keywords: GPUs; MPI; cognitive modeling; computer cluster; deep neural networks; hierarchical generative models; parallel-computing architectures; unsupervised learning
Year: 2013 PMID: 23653617 PMCID: PMC3644707 DOI: 10.3389/fpsyg.2013.00251
Source DB: PubMed Journal: Front Psychol ISSN: 1664-1078
Figure 1. Deep network used as a test-bed problem for the comparison of parallel implementations. (A) Structure of the deep network, composed of a stack of three Restricted Boltzmann Machines (RBMs); the input consists of vectorized images and is provided to the lowest layer. (B) Pseudocode of the learning algorithm for one RBM layer, which computes contrastive divergence with k iterations. (C) Sample digit-image reconstructions.
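The pseudocode of Figure 1B is not reproduced in this record, but the update it describes, contrastive divergence with k Gibbs steps (CD-k) for a single RBM layer, can be sketched as follows. This is a generic textbook formulation in NumPy, not the authors' code; the function name, learning rate, and sampling details are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd_k_update(v0, W, b_vis, b_hid, k=1, lr=0.05, rng=None):
    """One CD-k update for a binary RBM on a mini-batch.

    v0    : (batch, n_vis) mini-batch of visible vectors (e.g., vectorized images)
    W     : (n_vis, n_hid) weight matrix
    b_vis : (n_vis,) visible biases;  b_hid : (n_hid,) hidden biases
    """
    if rng is None:
        rng = np.random.default_rng()

    # Positive phase: hidden probabilities driven by the data, then sample states.
    ph0 = sigmoid(v0 @ W + b_hid)
    h = (rng.random(ph0.shape) < ph0).astype(v0.dtype)

    # Negative phase: k steps of Gibbs sampling (reconstruct, re-infer, resample).
    for _ in range(k):
        pv = sigmoid(h @ W.T + b_vis)
        ph = sigmoid(pv @ W + b_hid)
        h = (rng.random(ph.shape) < ph).astype(v0.dtype)

    # Approximate gradient: data statistics minus reconstruction statistics.
    batch = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv.T @ ph) / batch
    b_vis += lr * (v0 - pv).mean(axis=0)
    b_hid += lr * (ph0 - ph).mean(axis=0)
    return W, b_vis, b_hid
```

Stacking RBMs into the deep network of Figure 1A amounts to training one layer with this update, freezing it, and feeding its hidden activations as the "visible" input of the next layer.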
Figure 2. Trade-off between learning time and learning quality as a function of mini-batch size (abscissa, log scale). (A) Unsupervised learning time decreases with mini-batch size on all parallel architectures: the greater the number of patterns processed simultaneously, the more computational resources (e.g., processing cores) are engaged. (B) Zoom-in of learning times highlighting the additional speed-up of the GTX 690 card. (C) Quality of learning for the cluster and GTX 690 implementations, measured as the misclassification rate of a linear classifier trained on the top-layer internal representations. (D) Quality of learning after fine-tuning of the entire deep model (see text for details).
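Panel (C) quantifies learning quality as the error of a linear classifier reading out the top-layer representations. Below is a minimal sketch of such a read-out, assuming a trained stack of RBM layers (weight matrices and hidden biases) and using scikit-learn's logistic regression, an illustrative choice rather than necessarily the classifier used in the paper.

```python
# Minimal sketch (assumed helper names): propagate data through the trained RBM
# stack, fit a linear classifier on the top-layer codes, report misclassification.
import numpy as np
from sklearn.linear_model import LogisticRegression

def top_layer_codes(X, layers):
    """Deterministic (mean-field) pass through a list of (W, b_hid) RBM layers."""
    h = X
    for W, b_hid in layers:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b_hid)))
    return h

def readout_error(layers, X_train, y_train, X_test, y_test):
    clf = LogisticRegression(max_iter=1000)
    clf.fit(top_layer_codes(X_train, layers), y_train)
    return 1.0 - clf.score(top_layer_codes(X_test, layers), y_test)
```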
Unsupervised learning times for various mini-batch sizes on a PC workstation (quad-core CPU), GPUs (MATLAB implementation), and a computer cluster. Values are learning times in seconds (mean ± SD).

| Mini-batch size | Quad-core | GTX 690 | GTX 460 | Cluster (overhead) |
|---|---|---|---|---|
| 125 | 15191 ± 3 | 764 ± 7 | 1337 ± 24 | 16330 ± 430 (5%) |
| 250 | 14184 ± 7 | 526 ± 4 | 895 ± 14 | 8828 ± 173 (11%) |
| 500 | 13679 ± 5 | 393 ± 9 | 665 ± 27 | 4493 ± 485 (17%) |
| 1000 | 13043 ± 3 | 348 ± 16 | 528 ± 5 | 2590 ± 83 (27%) |
| 2000 | 12746 ± 12 | 306 ± 15 | 450 ± 6 | 1411 ± 52 (28%) |
| 5000 | 12832 ± 18 | 294 ± 5 | 429 ± 13 | 757 ± 7 (33%) |
| 7500 | 12934 ± 21 | 285 ± 7 | 417 ± 11 | 590 ± 3 (44%) |
The rightmost column also reports, in parentheses, the communication overhead (as a percentage) on the computer cluster.