Tomoyasu Horikawa, Shuntaro C. Aoki, Mitsuaki Tsukamoto, Yukiyasu Kamitani.
Abstract
Achievements of near human-level performance in object recognition by deep neural networks (DNNs) have triggered a flood of comparative studies between the brain and DNNs. Using a DNN as a proxy for hierarchical visual representations, our recent study found that human brain activity patterns measured by functional magnetic resonance imaging (fMRI) can be decoded (translated) into DNN feature values given the same inputs. However, not all DNN features are equally decoded, indicating a gap between the DNN and human vision. Here, we present a dataset derived from DNN feature decoding analyses, which includes fMRI signals of five human subjects during image viewing, decoded feature values of DNNs (AlexNet and VGG19), and decoding accuracies of individual DNN features with their rankings. The decoding accuracies of individual features were highly correlated between subjects, suggesting systematic differences between the brain and DNNs. We hope the present dataset will contribute to revealing the gap between the brain and DNNs and provide an opportunity to make use of the decoded features for further applications.
Year: 2019 PMID: 30747910 PMCID: PMC6371890 DOI: 10.1038/sdata.2019.12
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Overview of the data generation procedures.
Stimulus images were presented to human subjects in the fMRI experiments to collect fMRI signals. DNN feature decoders were first trained to decode the DNN feature values of presented images from the training fMRI data, and were then applied to the test fMRI data to produce sequences of decoded feature values for all DNN units. The same stimulus images were also provided to the DNNs as inputs, and sequences of true DNN feature values were computed for all DNN units. For each individual DNN unit, the decoding accuracy (or “decodability”) was evaluated as the Pearson correlation coefficient between the sequences of decoded and true feature values. The estimated decodability was used to rank the DNN units within each DNN layer. Examples of preferred images of high-ranking units are shown at the bottom right.
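The decodability described above reduces to a per-unit Pearson correlation between the decoded and true feature sequences, followed by within-layer ranking. A minimal sketch of that computation (hypothetical array names; assuming the decoded and true features are arranged as n_images × n_units arrays):

```python
import numpy as np

def unitwise_decodability(decoded, true):
    """Pearson correlation between decoded and true feature values,
    computed independently for each DNN unit.

    decoded, true : arrays of shape (n_images, n_units)
    returns       : array of shape (n_units,)
    """
    d = decoded - decoded.mean(axis=0)
    t = true - true.mean(axis=0)
    denom = np.sqrt((d ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return (d * t).sum(axis=0) / denom

# Toy example: 50 test images, 96 units in one layer.
r = unitwise_decodability(np.random.randn(50, 96), np.random.randn(50, 96))
ranks = (-r).argsort().argsort() + 1  # rank 1 = most decodable unit
```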
Summary of the experimental data.
| Subject | Experiment | Session | # runs |
|---|---|---|---|
| Subject 1 | Training image | 1 | 10 |
| | | 2 | 10 |
| | | 3 | 4 |
| | Test image | 1 | 10 |
| | | 2 | 10 |
| | | 3 | 5 |
| | | 4 | 10 |
| Subject 2 | Training image | 1 | 10 |
| | | 2 | 10 |
| | | 3 | 4 |
| | Test image | 1 | 10 |
| | | 2 | 10 |
| | | 3 | 10 |
| | | 4 | 5 |
| Subject 3 | Training image | 1 | 8 |
| | | 2 | 8 |
| | | 3 | 8 |
| | Test image | 1 | 8 |
| | | 2 | 9 |
| | | 3 | 8 |
| | | 4 | 6 |
| | | 5 | 4 |
| Subject 4 | Training image | 1 | 8 |
| | | 2 | 8 |
| | | 3 | 8 |
| | Test image | 1 | 9 |
| | | 2 | 9 |
| | | 3 | 9 |
| | | 4 | 8 |
| Subject 5 | Training image | 1 | 8 |
| | | 2 | 4 |
| | | 3 | 6 |
| | | 4 | 3 |
| | | 5 | 3 |
| | Test image | 1 | 7 |
| | | 2 | 7 |
| | | 3 | 5 |
| | | 4 | 4 |
| | | 5 | 5 |
| | | 6 | 7 |
Columns in task event files for the image presentation experiments.
| Column | Description |
|---|---|
| onset | Onset time of the event (sec) |
| duration | Duration of the event (sec) |
| trial_no | Trial number |
| event_type | Type of the event |
| stim_id | Stimulus ID |
| response_time | Subject’s response time (sec; elapsed time from the beginning of the run) |
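Files with these columns are plain tab-separated tables, so they can be inspected directly; a minimal sketch with pandas (the file name here is hypothetical):

```python
import pandas as pd

# Hypothetical file name; actual names follow the dataset's layout.
events = pd.read_csv("sub-01_task-perception_run-01_events.tsv", sep="\t")

# Per-trial stimulus sequence: onsets/durations in seconds plus stimulus IDs.
print(events[["onset", "duration", "trial_no", "event_type", "stim_id"]])
```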
ROI mask images included in the dataset.
| File name | ROI |
|---|---|
| sub-*_mask_LH_V1.nii.gz | Left V1 |
| sub-*_mask_RH_V1.nii.gz | Right V1 |
| sub-*_mask_LH_V2.nii.gz | Left V2 |
| sub-*_mask_RH_V2.nii.gz | Right V2 |
| sub-*_mask_LH_V3.nii.gz | Left V3 |
| sub-*_mask_RH_V3.nii.gz | Right V3 |
| sub-*_mask_LH_hV4.nii.gz | Left hV4 |
| sub-*_mask_RH_hV4.nii.gz | Right hV4 |
| sub-*_mask_LH_LOC.nii.gz | Left LOC |
| sub-*_mask_RH_LOC.nii.gz | Right LOC |
| sub-*_mask_LH_FFA.nii.gz | Left FFA |
| sub-*_mask_RH_FFA.nii.gz | Right FFA |
| sub-*_mask_LH_PPA.nii.gz | Left PPA |
| sub-*_mask_RH_PPA.nii.gz | Right PPA |
| sub-*_mask_LH_HVC.nii.gz | Left higher visual cortex (HVC) |
| sub-*_mask_RH_HVC.nii.gz | Right higher visual cortex (HVC) |
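Each mask is a binary NIfTI volume, so ROI voxels can be extracted from preprocessed fMRI data by boolean indexing; a minimal sketch with nibabel (the file names are hypothetical):

```python
import nibabel as nib

# Hypothetical file names; the dataset's masks follow the patterns above.
mask = nib.load("sub-01_mask_LH_V1.nii.gz").get_fdata() > 0  # (x, y, z) booleans
bold = nib.load("sub-01_preproc_bold.nii.gz").get_fdata()    # (x, y, z, t) volume

roi_data = bold[mask].T  # (n_volumes, n_voxels) matrix of left V1 responses
print(roi_data.shape)
```

Left and right hemisphere masks can be combined with a logical OR to form a bilateral ROI.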
Summary of the DNN feature and decodability datasets.
The data files in Figshare and Zenodo (Data Citations 4, 5) are named by combining the field values listed below.

| Data type | Subject | Image | Network | Layer | Data size |
|---|---|---|---|---|---|
| accuracy, decoded, rank, or true | S1, S2, S3, S4, S5, or Averaged (accuracy and rank only) | ImageNet ID of the 50 stimulus images, in the format n*****_**** (decoded and true only) | AlexNet | conv1 | 55 × 55 × 96 |
| | | | | conv2 | 27 × 27 × 256 |
| | | | | conv3 | 13 × 13 × 384 |
| | | | | conv4 | 13 × 13 × 384 |
| | | | | conv5 | 13 × 13 × 256 |
| | | | | fc6 | 1 × 1 × 4096 |
| | | | | fc7 | 1 × 1 × 4096 |
| | | | | fc8 | 1 × 1 × 1000 |
| | | | VGG19 | conv1_1 | 224 × 224 × 64 |
| | | | | conv1_2 | 224 × 224 × 64 |
| | | | | conv2_1 | 112 × 112 × 128 |
| | | | | conv2_2 | 112 × 112 × 128 |
| | | | | conv3_1 | 56 × 56 × 256 |
| | | | | conv3_2 | 56 × 56 × 256 |
| | | | | conv3_3 | 56 × 56 × 256 |
| | | | | conv3_4 | 56 × 56 × 256 |
| | | | | conv4_1 | 28 × 28 × 512 |
| | | | | conv4_2 | 28 × 28 × 512 |
| | | | | conv4_3 | 28 × 28 × 512 |
| | | | | conv4_4 | 28 × 28 × 512 |
| | | | | conv5_1 | 14 × 14 × 512 |
| | | | | conv5_2 | 14 × 14 × 512 |
| | | | | conv5_3 | 14 × 14 × 512 |
| | | | | conv5_4 | 14 × 14 × 512 |
| | | | | fc6 | 1 × 1 × 4096 |
| | | | | fc7 | 1 × 1 × 4096 |
| | | | | fc8 | 1 × 1 × 1000 |
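The data sizes above follow the Caffe convention (height × width × channels) of the models used in the study. As a sanity check, the same layer output shapes (channels-first in PyTorch) can be reproduced with torchvision's VGG19; this sketch only illustrates shape checking and is not the exact Caffe model used to generate the dataset:

```python
import torch
from torchvision.models import vgg19

model = vgg19(weights=None).eval()  # untrained weights suffice for shape checks

x = torch.randn(1, 3, 224, 224)  # one 224 x 224 RGB input
out = x
with torch.no_grad():
    for i, layer in enumerate(model.features):
        out = layer(out)
        if isinstance(layer, torch.nn.Conv2d):
            print(f"features.{i}: {tuple(out.shape[1:])}")
# First line printed: features.0: (64, 224, 224) -> conv1_1 (224 x 224 x 64)
```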
Figure 2. Evaluations of DNN feature decoding.
(a) Violin plots of feature decoding accuracy for each DNN layer and model. Distributions of the decoding accuracies of all individual units in each DNN layer are shown (pooled across five subjects, predicted from VC). Black bars denote mean decoding accuracies averaged across all units and subjects. (b) Scatter plots of decoding accuracies of individual DNN units from two subjects (AlexNet, VC). Each dot denotes the decoding accuracy of a single DNN unit estimated from Subject 1 (vertical axis) and Subject 2 (horizontal axis). The color of each dot indicates the density of the plotted dots. For visualization purposes, a randomly selected subset of at most 1,000 units is shown. (c) Mean correlation coefficients between decoding accuracies of DNN units from different subjects (VC). Pearson correlation coefficients between the decoding accuracies of individual DNN units obtained from different subjects were calculated for all pairs of subjects (10 pairs from 5 subjects). Each dot denotes the correlation coefficient for one pair of subjects.
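The pairwise analysis in panel (c) is straightforward to reproduce once unit-wise accuracies are available; a minimal sketch (the accuracy arrays here are random placeholders):

```python
from itertools import combinations
import numpy as np

# Placeholder unit-wise decoding accuracies per subject, shape (n_units,).
acc = {f"S{i}": np.random.rand(1000) for i in range(1, 6)}

# Pearson correlation for each of the 10 subject pairs (5 choose 2).
pair_r = {(a, b): np.corrcoef(acc[a], acc[b])[0, 1]
          for a, b in combinations(acc, 2)}
print(np.mean(list(pair_r.values())))  # mean inter-subject correlation
```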