Christoph Linse¹, Hammam Alshazly², Thomas Martinetz¹.
Abstract
Within the last decade, deep learning has become a tool for solving challenging problems such as image recognition. Still, convolutional neural networks (CNNs) are considered black boxes that are difficult for humans to understand. Hence, there is a need to visualize CNN architectures, their internal processes, and what they actually learn. Previously, virtual reality has been successfully applied to display small CNNs in immersive 3D environments. In this work, we address the problem of how to feasibly render large-scale CNNs, thereby enabling the 3D visualization of popular architectures with tens of thousands of feature maps and branches in the computational graph. Our software "DeepVisionVR" enables the user to freely walk through the layered network, pick up and place images, move and scale layers for better readability, perform feature visualization, and export the results. We also provide a novel PyTorch module to dynamically link PyTorch with Unity, which gives developers and researchers a convenient interface to visualize their own architectures. The visualization is created directly from the PyTorch class that defines the model used for training and testing. This approach allows full access to the network's internals and direct control over what exactly is visualized. In a use-case study, we apply the module to analyze models with different generalization abilities in order to understand how networks memorize images. We train two recent architectures, CovidResNet and CovidDenseNet, on the Caltech101 and the SARS-CoV-2 datasets and find that poor generalization is driven by high-frequency features and by susceptibility to specific pixel arrangements, which has implications for the practical application of CNNs. The code is available on GitHub: https://github.com/Criscraft/DeepVisionVR.
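The abstract states that the visualization is built directly from the PyTorch model class, with full access to the network's internals. In PyTorch, intermediate outputs are commonly captured with `torch.nn.Module.register_forward_hook`, whose hook is called with `(module, input, output)` after each forward pass. The minimal sketch below mimics that pattern in plain Python so it runs without PyTorch; the `Layer`, `Sequential`, and `capture_activations` names are hypothetical and are not the DeepVisionVR module's API.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Hook = Callable[["Layer", Vector, Vector], None]  # hook(layer, input, output)


class Layer:
    """Hypothetical stand-in for a network layer that supports forward hooks."""

    def __init__(self, name: str, fn: Callable[[Vector], Vector]) -> None:
        self.name = name
        self.fn = fn
        self._hooks: List[Hook] = []

    def register_forward_hook(self, hook: Hook) -> Callable[[], None]:
        """Register a hook and return a function that removes it again."""
        self._hooks.append(hook)
        return lambda: self._hooks.remove(hook)

    def __call__(self, x: Vector) -> Vector:
        out = self.fn(x)
        for hook in list(self._hooks):
            hook(self, x, out)  # hooks run after the forward computation
        return out


class Sequential:
    """Applies layers in order, like a simple feed-forward network."""

    def __init__(self, *layers: Layer) -> None:
        self.layers = list(layers)

    def __call__(self, x: Vector) -> Vector:
        for layer in self.layers:
            x = layer(x)
        return x


def capture_activations(model: Sequential, x: Vector) -> Tuple[Vector, Dict[str, Vector]]:
    """Run one forward pass and record every layer's output under its name."""
    activations: Dict[str, Vector] = {}

    def save(layer: Layer, _inp: Vector, out: Vector) -> None:
        activations[layer.name] = list(out)

    removers = [layer.register_forward_hook(save) for layer in model.layers]
    try:
        output = model(x)
    finally:
        for remove in removers:
            remove()  # avoid accumulating hooks across repeated calls
    return output, activations
```

For example, a two-layer model that doubles its input and then applies a ReLU yields the intermediate activations of both layers from a single call, which is the kind of per-layer data a 3D front end would then render.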
Keywords: Deep convolutional neural network visualization; Explainable artificial intelligence; Human-understandable AI systems; Virtual reality
Year: 2022 PMID: 35996678 PMCID: PMC9387423 DOI: 10.1007/s00521-022-07608-4
Source DB: PubMed Journal: Neural Comput Appl ISSN: 0941-0643 Impact factor: 5.102
Fig. 1 DeepVisionVR architecture
Fig. 2 Representation of CovidResNet in 3D space. Each 2D panel shows the feature maps (channels) of a specific layer. Negative activations are colored blue, zero activations black, and positive activations white
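Fig. 2 encodes the sign of each activation as a color. One plausible mapping, sketched below, normalizes each feature map by its largest absolute value and ramps from black to blue for negative values and from black to white for positive values. The per-map normalization and the 8-bit RGB output are assumptions; the caption does not say how DeepVisionVR scales activations.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]


def activation_to_rgb(value: float, max_abs: float) -> RGB:
    """Map one activation to 8-bit RGB: negative -> blue, zero -> black, positive -> white."""
    if max_abs <= 0.0:
        return (0, 0, 0)  # an all-zero map is drawn black
    level = round(255 * min(abs(value) / max_abs, 1.0))
    if value < 0.0:
        return (0, 0, level)        # black-to-blue ramp
    return (level, level, level)    # black-to-white ramp


def feature_map_to_rgb(fmap: List[List[float]]) -> List[List[RGB]]:
    """Color a 2D feature map, normalizing by its own largest absolute activation (assumed)."""
    max_abs = max((abs(v) for row in fmap for v in row), default=0.0)
    return [[activation_to_rgb(v, max_abs) for v in row] for row in fmap]
```

Normalizing per map keeps weak channels visible; normalizing per layer instead would preserve relative magnitudes across channels, so the choice depends on what the viewer should compare.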
Fig. 3 Left: Dataset panel from which the user can pick images. The software randomly draws images from a provided PyTorch dataset class. Center: User interface and statistics for a specific network layer. Right: The feature visualization generates input images that maximize the mean activation of one specific channel. Each color image is one image generated for that specific channel
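The feature visualization in Fig. 3 generates inputs that maximize the mean activation of one channel. The sketch below shows the underlying idea, gradient ascent on the input, for a deliberately tiny toy channel: a 1D "image", a single convolution kernel, and a ReLU, with the gradient derived by hand. The toy model, step size, step count, and clamping range are all assumptions; a real implementation would differentiate the full CNN automatically and usually adds regularization.

```python
from typing import List

Vector = List[float]


def conv1d_valid(x: Vector, kernel: Vector) -> Vector:
    """1D cross-correlation without padding, as computed by CNN 'convolution' layers."""
    n_out = len(x) - len(kernel) + 1
    return [sum(kernel[j] * x[i + j] for j in range(len(kernel))) for i in range(n_out)]


def mean_channel_activation(x: Vector, kernel: Vector) -> float:
    """Objective to maximize: mean over positions of ReLU(conv(x))."""
    pre = conv1d_valid(x, kernel)
    return sum(max(p, 0.0) for p in pre) / len(pre)


def grad_mean_channel_activation(x: Vector, kernel: Vector) -> Vector:
    """Hand-derived (sub)gradient of the objective with respect to each input value."""
    pre = conv1d_valid(x, kernel)
    grad = [0.0] * len(x)
    for i, p in enumerate(pre):
        if p > 0.0:  # ReLU passes gradient only where it is active
            for j, k in enumerate(kernel):
                grad[i + j] += k / len(pre)
    return grad


def visualize_channel(kernel: Vector, x0: Vector, steps: int = 50, lr: float = 0.5) -> Vector:
    """Gradient ascent on the input, clamping every value to [-1, 1] after each step."""
    x = list(x0)
    for _ in range(steps):
        g = grad_mean_channel_activation(x, kernel)
        x = [min(1.0, max(-1.0, xi + lr * gi)) for xi, gi in zip(x, g)]
    return x
```

With an edge-detecting kernel such as `[1.0, -1.0]`, the optimized input develops high-contrast steps, a small-scale analogue of the patterns that feature visualization reveals for real channels.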
Fig. 4 Representation of different architectures. Left: ResNet basic block. Right: Dense block. Bottom: Inception block
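The three blocks in Fig. 4 differ in how branches are merged, which also determines how many feature maps have to be rendered. A ResNet basic block adds the branch output to the identity shortcut element-wise, so the channel count is unchanged, whereas dense and inception blocks concatenate along the channel axis. The sketch below illustrates this with plain lists. The example numbers (input 64, six layers, growth rate 32; inception branches of 64, 128, 32 and 32 maps) come from the standard DenseNet-121 and GoogLeNet configurations and serve only as illustration, not as CovidResNet or CovidDenseNet settings.

```python
from typing import List

FeatureMap = List[float]   # one flattened 2D map (illustrative)
Tensor = List[FeatureMap]  # a stack of channels


def residual_add(x: Tensor, fx: Tensor) -> Tensor:
    """ResNet-style merge: element-wise sum, so the channel counts must match."""
    if len(x) != len(fx):
        raise ValueError("identity shortcut needs equal channel counts")
    return [[a + b for a, b in zip(mx, mf)] for mx, mf in zip(x, fx)]


def concat_channels(*branches: Tensor) -> Tensor:
    """Dense/Inception-style merge: stack the channels of all branches."""
    merged: Tensor = []
    for branch in branches:
        merged.extend(branch)
    return merged


def dense_block_channel_counts(c_in: int, num_layers: int, growth_rate: int) -> List[int]:
    """Channel count after each dense layer: every layer appends growth_rate new maps."""
    counts = [c_in]
    for _ in range(num_layers):
        counts.append(counts[-1] + growth_rate)
    return counts
```

For instance, `dense_block_channel_counts(64, 6, 32)` gives `[64, 96, 128, 160, 192, 224, 256]`, so a single dense block already exposes several hundred feature maps to display.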
Fig. 5 Three training strategies for obtaining models with different levels of generalization ability
Details on training the models used for visualization
| Experiment | Mode | Epochs | Initial learning rate | Weight decay | Augmentation | Image shape |
|---|---|---|---|---|---|---|
| Caltech CovidResNet | Standard | 200 | 0.002 | On | On | |
| | Labels shuffled | 250 | 0.001 | Off | Off | |
| | Noisy samples | 1000 | 0.001 | Off | Off | |
| SARS CovidResNet | Standard | 150 | 0.002 | On | On | |
| | Labels shuffled | 250 | 0.001 | Off | Off | |
| | Noisy samples | 1000 | 0.001 | Off | Off | |
| SARS CovidDenseNet | Standard | 150 | 0.002 | On | On | |
| | Labels shuffled | 250 | 0.001 | Off | Off | |
| | Noisy samples | 1000 | 0.001 | Off | Off | |
Fig. 6 Example images from the Caltech101 dataset for the classes crocodile head, panda, pyramid, rooster, schooner, Snoopy, sunflower, wild cat, and Yin and Yang
Example images from the SARS-CoV-2 dataset
CovidResNet architecture. The output sizes are determined for an input size of pixels
| Layers | Output size | CovidResNet |
|---|---|---|
| Convolution | | |
| Max pool | | |
| ResNet block 1 | | |
| ResNet block 2 | | |
| ResNet block 3 | | |
| ResNet block 4 | | |
| Classification layer | | Adaptive average pool; fully connected, softmax |
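The output sizes in the architecture tables follow from the standard size formula for convolution and pooling layers given in the PyTorch `Conv2d` documentation: for input size n, kernel size k, stride s, padding p, and dilation d, the output size is ⌊(n + 2p − d(k − 1) − 1)/s⌋ + 1. The sketch below applies it along one spatial axis. The layer settings in the test (a 7×7 stride-2 convolution with padding 3, a 3×3 stride-2 max pool with padding 1, and a 2×2 stride-2 average pool on a 224-pixel input) are the usual ResNet/DenseNet values and are assumptions here, since the cells with the exact CovidResNet and CovidDenseNet settings and input size are missing.

```python
from typing import List, Tuple


def conv_output_size(n: int, kernel: int, stride: int = 1, padding: int = 0, dilation: int = 1) -> int:
    """Spatial output size of a convolution or pooling layer along one axis (floor mode)."""
    return (n + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def trace_sizes(n: int, layers: List[Tuple[str, int, int, int]]) -> List[int]:
    """Propagate a size through (name, kernel, stride, padding) layers; names are labels only."""
    sizes = [n]
    for _name, kernel, stride, padding in layers:
        sizes.append(conv_output_size(sizes[-1], kernel, stride, padding))
    return sizes
```

Stride-1 convolutions with "same" padding (for example 3×3 with padding 1) leave the size unchanged, which is why only the stem, the downsampling blocks, and the transition pools change the output-size column.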
CovidDenseNet architecture. The output sizes are determined for an input size of pixels
| Layers | Output size | CovidDenseNet |
|---|---|---|
| Convolution | | |
| Max pool | | |
| Dense block 1 | | |
| Transition 1 | | |
| | | Average pool |
| Dense block 2 | | |
| Transition 2 | | |
| | | Average pool |
| Dense block 3 | | |
| Transition 3 | | |
| | | Average pool |
| Dense block 4 | | |
| Classification layer | | Adaptive average pool; fully connected, softmax |
Performance metrics for the three different training strategies
| Architecture | Dataset | Type | Test accuracy |
|---|---|---|---|
| CovidResNet | Caltech | Standard | 0.78 |
| CovidResNet | Caltech | Shuffled labels | 0.01 |
| CovidResNet | Caltech | Noisy samples | 0.42 |
| CovidResNet | SARS-CoV-2 | Standard | 0.96 |
| CovidResNet | SARS-CoV-2 | Shuffled labels | 0.49 |
| CovidResNet | SARS-CoV-2 | Noisy samples | 0.97 |
| CovidDenseNet | SARS-CoV-2 | Standard | 0.97 |
| CovidDenseNet | SARS-CoV-2 | Shuffled labels | 0.50 |
| CovidDenseNet | SARS-CoV-2 | Noisy samples | 0.99 |
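The shuffled-label results sit near chance level. If a model's predictions carry no information about the input, its expected accuracy is Σₖ pₖqₖ, where pₖ is the frequency of class k in the test set and qₖ is how often the model predicts class k; with uniform predictions over K classes this reduces to 1/K for any class balance. The sketch below evaluates this. The class counts used (101 object categories for Caltech101, two classes for the SARS-CoV-2 CT-scan dataset) are assumptions about the exact setup, which the table does not state.

```python
from typing import List, Sequence


def expected_accuracy_uninformative(class_freqs: Sequence[float], pred_freqs: Sequence[float]) -> float:
    """Expected accuracy when predictions are statistically independent of the true label."""
    if len(class_freqs) != len(pred_freqs):
        raise ValueError("need one frequency per class")
    return sum(p * q for p, q in zip(class_freqs, pred_freqs))


def uniform(num_classes: int) -> List[float]:
    """Uniform distribution over num_classes classes."""
    return [1.0 / num_classes] * num_classes
```

Under these assumptions, 1/101 ≈ 0.0099 and 1/2 = 0.5, which match the 0.01 and 0.49 to 0.50 test accuracies of the shuffled-label models. Always predicting the majority class can do better than 1/K on an imbalanced test set, so 1/K is a reference point rather than a hard floor.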
Fig. 7 First training strategy: activations of the last convolutional layer
Fig. 8 Second training strategy: activations of the last convolutional layer. Before training, all labels in the train set were shuffled
Fig. 9 Third training strategy: activations of the last convolutional layer. The train set was copied 9 times, and each copy received white noise and random labels. The original dataset with original labels is contained exactly once
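The second and third strategies change only the training set. The sketch below shows one way to build both variants from a list of (image, label) pairs: permuting the labels, which keeps the class histogram intact, and appending nine noisy copies with random labels while keeping the original samples exactly once. Images are flattened lists of floats here; additive Gaussian noise, its standard deviation, and uniformly drawn random labels are assumptions, since the captions only say "white noise and random labels".

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (flattened image, class label)


def shuffle_labels(dataset: List[Sample], seed: int = 0) -> List[Sample]:
    """Second strategy: keep every image, but randomly permute the labels."""
    rng = random.Random(seed)
    labels = [label for _, label in dataset]
    rng.shuffle(labels)
    return [(image, label) for (image, _), label in zip(dataset, labels)]


def noisy_sample_dataset(dataset: List[Sample], num_classes: int, copies: int = 9,
                         noise_std: float = 0.1, seed: int = 0) -> List[Sample]:
    """Third strategy: originals once, plus noisy copies with random labels (assumed Gaussian)."""
    rng = random.Random(seed)
    out = list(dataset)  # original samples with original labels, exactly once
    for _ in range(copies):
        for image, _ in dataset:
            noisy = [v + rng.gauss(0.0, noise_std) for v in image]
            out.append((noisy, rng.randrange(num_classes)))  # hypothetical uniform labels
    return out
```

Both variants force the network to memorize arbitrary image-label pairs, which is what produces the poorly generalizing models analyzed in Figs. 7 to 9.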
Feature visualization for single channels in CovidResNet trained on the Caltech101 dataset with different training strategies. Best viewed in color with zoom
Feature visualization for single channels in CovidResNet trained on the SARS-CoV-2 CT-scan dataset with different training strategies. Best viewed in color with zoom
Feature visualization for single channels in CovidDenseNet trained on the SARS-CoV-2 CT-scan dataset with different training strategies. Best viewed in color with zoom