| Literature DB >> 35203991 |
Bing Du, Xiaomu Cheng, Yiping Duan, Huansheng Ning.
Abstract
Brain neural activity decoding is an important branch of neuroscience research and a key technology for the brain-computer interface (BCI). Researchers initially developed simple linear models and machine learning algorithms to classify and recognize brain activities. With the great success of deep learning in image recognition and generation, deep neural networks (DNNs) have been applied to reconstructing visual stimuli from human brain activity measured via functional magnetic resonance imaging (fMRI). In this paper, we review brain activity decoding models based on machine learning and deep learning algorithms. Specifically, we focus on the brain activity decoding models currently receiving the most attention: the variational autoencoder (VAE), the generative adversarial network (GAN), and the graph convolutional network (GCN). Furthermore, brain-decoding-enabled fMRI-based BCI applications in the treatment of mental and psychological disorders are presented to illustrate the positive correlation between brain decoding and BCI. Finally, existing challenges and future research directions are addressed.
Keywords: brain decoding; brain–computer interface (BCI); functional magnetic resonance imaging (fMRI); generative adversarial network (GAN); graph convolutional networks (GCN); variational autoencoder (VAE)
Year: 2022 PMID: 35203991 PMCID: PMC8869956 DOI: 10.3390/brainsci12020228
Source DB: PubMed Journal: Brain Sci ISSN: 2076-3425
Figure 1. Visual pathway.
Brain encoding models.
| Literature | Objective | Model/Method | Explanation |
|---|---|---|---|
| [ | Predict cortical responses | A pre-trained DNN | Train a nonlinear mapping from visual features to brain activity with a pre-trained DNN (i.e., AlexNet) using a transfer learning technique. |
| [ | Representation of information in the visual cortex | GLM | A systematic modeling method is proposed to estimate an encoding model for each voxel and then to perform decoding with the estimated encoding model. |
| [ | Predict responses to a wide range of stimuli of the input images | Two-stage cascade model | This encoding model is a two-stage cascade architecture of a linear stage and nonlinear stage. The linear stage involves calculations of local filters and division normalization. The nonlinear stage involves compressive spatial summation and a second-order contrast. |
| [ | Predict the response of a single voxel or brain neurons in a region of interest in any dimensional space of the stimulus | Population receptive field (pRF) | The encoding model quantifies the uncertainty of neuron parameters, pRF size, and location by estimating the covariance of the parameters. |
| [ | Map the brain activity to natural scenes | Feature-weighted receptive field | This method converts visual stimuli into corresponding visual features, assumes that the spatial and feature dimensions are separable, and uses visual feature maps to train deep neural networks. The pre-trained deep neural network weights the contribution of each feature map to voxel activity in brain regions. |
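The encoding models above share a common core: a (regularized) linear map from stimulus features to voxel responses, fitted per voxel. A minimal sketch of that idea, using synthetic data and ridge regression (the feature dimensions, voxel counts, and regularization strength here are illustrative assumptions, not values from the surveyed papers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: 200 stimuli, 50 image-feature dimensions, 30 voxels.
n_stim, n_feat, n_vox = 200, 50, 30
X = rng.standard_normal((n_stim, n_feat))                    # stimulus features
W_true = rng.standard_normal((n_feat, n_vox))                # hidden ground truth
Y = X @ W_true + 0.1 * rng.standard_normal((n_stim, n_vox))  # noisy voxel responses

# Ridge regression, closed form: W = (X^T X + lam*I)^-1 X^T Y.
lam = 1.0
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(n_feat), X.T @ Y)

# Encoding: predict voxel responses to held-out stimuli.
X_new = rng.standard_normal((10, n_feat))
Y_pred = X_new @ W_hat
```

In practice the feature matrix `X` would come from a pre-trained DNN layer (as in the AlexNet-based model above) or from a hand-designed filter bank, and one encoding model is estimated independently for each voxel.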
Brain decoding model based on machine learning.
| Literature | Objective | Model/Method | Explanation |
|---|---|---|---|
| [ | Classification of visual stimuli decoded from brain activity | GLM | For different types of stimuli (objects and pictures), the cerebral cortex shows different response patterns, revealed through fMRI of the ventral temporal cortex. |
| [ | fMRI and brain signals classification | MVPA | For fMRI signals in pre-defined ROIs of the cerebral cortex, the activation patterns are classified by multivariate statistical pattern recognition. |
| [ | fMRI signals classification | SVM | The SVM classifier finds the cortical areas that best distinguish brain states, and the SVM is then trained to predict the brain state from fMRI signals. |
| [ | Classify fMRI activity patterns | MVPA | The Gaussian naive Bayes classifier achieves high classification accuracy. However, because it assumes that every voxel is equally important and does not consider a sparsity constraint, its interpretability is poor. |
| [ | Reconstruct geometric images from brain activity | MVPA | Based on a modular modeling method, the multi-voxel patterns of fMRI signals and multi-scale visual representations are used to reconstruct geometric image stimuli composed of flashing checkerboard patterns. |
| [ | Reconstruct the structure and semantic content of natural images | Bayesian method | Uses Bayes' theorem to combine an encoding model with prior information about natural images, calculating the probability that a measured brain response was evoked by each candidate image. However, only a simple correlation between the reconstructed image and the training images can be established. |
| [ | Improve fMRI Bayesian classifier accuracy | MVPA | The sparsity constraint is added to the multivariate analysis model of the Bayesian network to quantify the uncertainty of voxel features. |
| [ | Reconstruct spatio-temporal stimuli using image priors | Bayesian method | This method uses a large set of videos as prior information and combines them with a Bayesian decoder to reconstruct visual stimuli from fMRI signals. |
| [ | Decoding human dreams | SVM | An SVM classifier is trained to map natural images to brain activities, and a vocabulary database is used to label the images with semantic tags so that the semantic content of dreams can be decoded. |
| [ | Decode the reversible mapping between brain activity and visual images | BCCA | The encoding and decoding network is composed of generative multi-view models. The disadvantage is that its linear structure prevents the model from expressing the multi-level visual features of the image, and its spherical covariance assumption cannot capture the correlations between fMRI voxels, making it more susceptible to noise. |
| [ | Dimensionality reduction of high-dimensional fMRI data | PCA | PCA reduces the dimensionality of the facial training data set, and the partial least squares regression algorithm maps the fMRI activity pattern to the dimensionality-reduced facial features. |
| [ | Infer the semantic category of the reconstructed image | Bayesian method | Proposes a hybrid Bayesian network based on the Gaussian mixture model. The Gaussian mixture model represents the prior distribution of the image and can infer high-order semantic categories from low-order image features by combining the prior distributions of different information sources. |
| [ | Predict object categories in dreams | MVPA | Based on a CNN, a decoder is trained on data from normal visual perception and decodes neural activity into object categories. This process involves two steps: (1) mapping the fMRI signal to the feature space; (2) using correlation analysis to infer the object category from the feature space. |
| [ | Decode visual stimuli from human brain activity | RNN | Uses a CNN to select a small set of fMRI voxel signals as the input and then uses an RNN to classify the selected fMRI voxels. |
| [ | Capture the direct mapping between brain activity and perception | CNN | The generator is directly trained with fMRI data by an end-to-end approach. |
| [ | Reconstruct dynamic video stimuli | CNN | The CNN-based encoding model extracts a linear combination of the input video features and then uses PCA to reduce the dimensionality of the extracted high-dimensional feature space while retaining principal components that explain 99% of the variance. |
Deep generative models based on VAE or GAN.
| Literature | Objective | Model/Method | Explanation |
|---|---|---|---|
| [ | Reconstruct perception images from brain activity | Deep Generative Multiview Model (DGMM) | DGMM first uses a DNN to extract the image's hierarchical features. Based on the fact that the human brain's processing of external stimuli is sparse, a sparse linear model is used to avoid over-fitting the fMRI data. The statistical relationships between the visual stimuli and the evoked fMRI data are modeled by two view-specific generators with a shared latent space, yielding multiple correspondences between fMRI voxel patterns and image pixel patterns. DGMM can be optimized within an auto-encoding variational Bayesian framework [ |
| [ | Reconstruct facial images | Deep Adversarial Neural Decoding (DAND) | DAND uses maximum a posteriori estimation to transform brain activity linearly into hidden features. Then, a pre-trained CNN and adversarial training are used to transform the hidden features nonlinearly to reconstruct human facial images. DAND showed good performance in reconstructing facial details such as gender, skin color, and expression. |
| [ | Improve the quality of reconstructed images | Introspective Variational Autoencoder (IntroVAE) | IntroVAE's generator and inference model can be jointly trained in a self-evaluation manner. The generator takes the inference model's output noise as input to generate the image. The inference model not only learns the latent manifold structure of the input image but also distinguishes real images from generated images, similar to GAN's adversarial learning. |
| [ | Reconstruct natural images from brain activity | Deep Convolutional Generative Adversarial Network (DCGAN) | A large natural image data set is used to train a deep convolutional generative adversarial network in an unsupervised manner, learning the latent space of the stimuli. This DCGAN is then used to generate arbitrary images from the stimulus domain. |
| [ | Reconstruct the visual stimuli of brain activity | GAN | They used an encoding model to create surrogate brain activity samples, with which generative adversarial networks (GANs) are trained to learn a generative model of images; the model is then generalized to real fMRI data measured during the perception of images. The basic outline of the stimuli can finally be reconstructed. |
| [ | Reconstruct visual stimuli (video) of brain activity | VAE | A VAE with a five-layer encoder and a five-layer decoder is trained to learn visual representations from a diverse set of unlabeled images in an unsupervised way. The VAE first converts the fMRI activity into latent variables and then converts the latent variables into reconstructed video frames through the VAE's decoder. However, the VAE provides relatively lower accuracy in higher-order visual areas compared to a CNN. |
| [ | Reconstruct color images and simple gray-scale images | GAN | A pre-trained DNN decodes the measured fMRI patterns into hierarchical features that represent the human visual layering mechanism. The DNN extracts image features and compares them with the decoded brain activity features, which guides the deep generator network (DGN) to reconstruct images while iteratively minimizing the error between the two. A natural image prior introduced by the DGN adds semantic detail to the reconstructions, which improves the visual quality of the generated images. |
| [ | Reconstruct the visual image from brain activity | A structured multi-output regression (SMR) model and Introspective Conditional Generation (ICG) | The SMR model decodes brain activity into intermediate CNN features, which are then mapped to visual images. Combining maximum likelihood estimation and adversarial learning, the ICG model uses divergence and reconstruction error for adversarial optimization, which evaluates the difference between the generated image and the real image. |
| [ | Use semantic features to add details to the generated image | Shape-Semantic GAN | This framework consists of a linear shape decoder, a DNN-based semantic decoder, and a GAN-based image generator. The outputs of the shape decoder and the semantic decoder are fed into the GAN-based image generator, and the semantic features in the GAN supplement the image details to reconstruct high-quality images. |
| [ | Reconstruct natural images from brain activity | Progressively Growing GAN (PG-GAN) | This model adds a priori knowledge of latent features to the PG-GAN. The decoder decodes the measured response of the cerebral cortex into the latent features of the natural image and then reconstructs the natural image through the generator. |
| [ | Reconstruct natural images from brain activity | Similarity-conditioned generative adversarial network (SC-GAN) | SC-GAN not only extracts the response patterns of the cerebral cortex to natural images but also captures the high-level semantic features of natural images. The captured semantic features are fed into the GAN to reconstruct natural images. |
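The VAE-based decoders in this table all rely on the same mechanism: an encoder outputs a latent mean and log-variance, a sample is drawn via the reparameterization trick, and a decoder maps the latent code back to the stimulus space, with a KL term regularizing the latent distribution toward a standard normal prior. A minimal numpy sketch of one such forward pass (the dimensions and the purely linear encoder/decoder are illustrative simplifications, not any surveyed architecture):

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny VAE forward pass: 64-d "fMRI pattern" -> 8-d latent -> 64-d reconstruction.
d_in, d_lat = 64, 8
W_mu  = 0.1 * rng.standard_normal((d_in, d_lat))   # encoder weights for the mean
W_lv  = 0.1 * rng.standard_normal((d_in, d_lat))   # encoder weights for log-variance
W_dec = 0.1 * rng.standard_normal((d_lat, d_in))   # decoder weights

x = rng.standard_normal(d_in)

# Encoder: map input to latent mean and log-variance.
mu, logvar = x @ W_mu, x @ W_lv

# Reparameterization trick: z = mu + sigma * eps keeps sampling differentiable.
eps = rng.standard_normal(d_lat)
z = mu + np.exp(0.5 * logvar) * eps

# Decoder: map the latent sample back to the input space.
x_rec = z @ W_dec

# KL divergence between q(z|x) = N(mu, sigma^2) and the prior N(0, I).
kl = 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)
```

In the video-reconstruction model above, the encoder side is replaced by a linear map from fMRI activity into this latent space, so the pre-trained VAE decoder turns decoded latent variables into video frames.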
GCN-based brain decoding.
| Literature | Objective | Model/Method | Explanation |
|---|---|---|---|
| [ | Localize brain regions and functional connections | Spatio-Temporal Graph Convolution Networks (ST-GCN) | Based on ST-GCN, the representation extracted from the fMRI data expresses both temporal dynamic information of brain activity and functional dependence between brain regions. Through training ST-GCN, this method can learn the edge importance matrix on short sub-sequences of BOLD time series to improve the prediction accuracy and interpretability of the model. |
| [ | Decode the consciousness level from cortical activity recording | BrainNetCNN | BrainNetCNN is a GCN-based decoding model, which uses multi-layer non-linear units to extract features and predict brain consciousness states. |
| [ | Predict human brain cognitive state | GCN | The brain annotation model uses six graph convolutional layers as feature extractors and two fully connected layers as classifiers to decode the cognitive state of the brain. It takes a short series of fMRI data as input, propagates the information through the annotation model network, and generates high-level domain-specific graph representations to predict the brain's cognitive state. |
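Each GCN layer in the decoders above propagates node (brain-region) features along the edges of a connectivity graph before a learned linear transform and nonlinearity. A minimal numpy sketch of one such layer, using the standard symmetric normalization H' = ReLU(D^{-1/2}(A+I)D^{-1/2} H W); the 5-region toy graph and feature sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy brain graph: 5 regions (nodes), each with a 4-d BOLD feature vector.
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)  # binary functional connectivity
H = rng.standard_normal((5, 4))               # node features (one row per region)
W = rng.standard_normal((4, 3))               # learnable layer weights

# Symmetrically normalized adjacency with self-loops: D^-1/2 (A+I) D^-1/2.
A_hat = A + np.eye(5)
d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

# One graph-convolution layer with ReLU activation.
H_next = np.maximum(A_norm @ H @ W, 0.0)
```

Stacking several such layers (six in the annotation model above) lets each region's representation aggregate information from progressively larger graph neighborhoods, after which fully connected layers classify the cognitive state.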
Figure 2. Structured deep generative neural decoding model [51].