Runnan Lu, Ying Zeng, Rongkai Zhang, Bin Yan, Li Tong.
Abstract
Detecting video-induced P3 is crucial to building a video target detection system based on a brain-computer interface. However, studies have shown that the brain response patterns corresponding to video-induced P3 are dynamic and are determined by the interaction of multiple brain regions. This paper proposes a segmentation adaptive spatial-temporal graph convolutional network (SAST-GCN) for P3-based video target detection. To make full use of the dynamic characteristics of the P3 signal, the data are segmented according to the processing stages of the video-induced P3, and a brain network connection is constructed for each segment. The spatial-temporal features of the EEG data are then extracted by adaptive spatial-temporal graph convolution to discriminate between targets and non-targets in the video. In addition, a style-based recalibration module is added to select the feature maps with higher contributions and to strengthen the feature extraction ability of the network. The experimental results show that the proposed model outperforms the baseline methods. The ablation experiments also indicate that segmenting the data to construct the brain connections improves recognition performance by reflecting the dynamic connections between EEG channels more accurately.
Keywords: P3 detection; brain-computer interface (BCI); electroencephalography (EEG); graph convolutional neural networks (GCN); style-based recalibration module (SRM)
Year: 2022 PMID: 35720707 PMCID: PMC9201684 DOI: 10.3389/fnins.2022.913027
Source DB: PubMed Journal: Front Neurosci ISSN: 1662-453X Impact factor: 5.152
FIGURE 1. Experimental paradigm of UAV video target detection (Song et al., 2021a).
FIGURE 2. The overall architecture of the proposed SAST-GCN. The model consists of three parts: the segmented adjacency matrix construction module, the spatial-temporal graph convolution module, and the classification module.
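As a rough illustration of the graph convolution step in the second module, the sketch below applies one standard normalized graph convolution (the common symmetric normalization with self-loops) followed by ELU, the activation listed for the GCN layers. It is not the authors' adaptive layer: the adjacency matrix here is fixed rather than learned, and the names `normalize_adjacency`, `graph_conv`, and all example values are assumptions made for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a non-negative (C, C) adjacency matrix."""
    a = adj + np.eye(adj.shape[0])        # add self-loops so every degree is >= 1
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def elu(x):
    """Exponential linear unit with alpha = 1."""
    return np.where(x > 0, x, np.exp(np.minimum(x, 0)) - 1.0)

def graph_conv(x, adj, weight):
    """One graph convolution: x is (C, F_in) node features, weight is (F_in, F_out)."""
    return elu(normalize_adjacency(adj) @ x @ weight)

# Hypothetical example: two connected electrodes, one feature each.
adj = np.array([[0.0, 1.0], [1.0, 0.0]])
x = np.array([[1.0], [3.0]])
w = np.array([[1.0]])
out = graph_conv(x, adj, w)               # each node receives the average of both nodes
```

In an adaptive variant the entries of `adj` would be trainable parameters updated together with `weight`; that training loop is outside the scope of this sketch.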
FIGURE 3. The brain response to the video target. The EEG signals of all trials are overlaid and averaged per channel; the red curve represents the target trial data, and the blue curve represents the non-target trial data.
FIGURE 4. The adjacency matrices of the information integration, decision process, and neuron response stages. A yellow block represents a connection between the corresponding electrodes, and a blue block represents no connection.
FIGURE 5. The architecture of the style-based recalibration module. The module is composed of two main parts: style pooling and style integration. AvgPool refers to global average pooling; StdPool refers to global standard deviation pooling; CFC refers to the channel-wise fully connected layer; BN refers to batch normalization.
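The two parts named in the caption can be sketched as a single forward pass in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the sigmoid gate follows the general style-based recalibration design rather than anything stated in this caption, batch normalization is computed from the current batch only, and the function name `srm` and the parameters `cfc_weight`, `gamma`, and `beta` are hypothetical.

```python
import numpy as np

def srm(x, cfc_weight, gamma, beta, eps=1e-5):
    """Recalibrate feature maps x of shape (N, C, H, W); returns (output, gate)."""
    n, c = x.shape[:2]
    flat = x.reshape(n, c, -1)
    # Style pooling: per-channel global mean (AvgPool) and standard deviation (StdPool).
    style = np.stack([flat.mean(axis=2), flat.std(axis=2)], axis=2)   # (N, C, 2)
    # Style integration, CFC: each channel combines only its own two statistics.
    z = (style * cfc_weight[None, :, :]).sum(axis=2)                  # (N, C)
    # Style integration, BN: normalize each channel over the batch, then scale and shift.
    z_bn = gamma * (z - z.mean(axis=0)) / np.sqrt(z.var(axis=0) + eps) + beta
    # Assumed gating step: a sigmoid maps each channel score to a weight in (0, 1).
    gate = 1.0 / (1.0 + np.exp(-z_bn))
    return x * gate[:, :, None, None], gate

# Hypothetical example: a batch of 4 trials with 16 feature maps each.
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16, 20, 8))
recalibrated, gate = srm(
    features,
    cfc_weight=rng.normal(size=(16, 2)),
    gamma=np.ones(16),
    beta=np.zeros(16),
)
```

Because the gate is one scalar per channel, the module adds only a few parameters per feature map, which is in line with the small parameter increase between the ASTGCN and ASTGCN+SRM rows of the ablation table.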
The training process of SAST-GCN.
SAST-GCN architecture.
| Block | Layer | Kernel size | Stride | Input | Output | Activation |
| ST-graph convolution | Input | | | | (1,T,C) | |
| | Segmentation | | | (1,T,C) | (1,T1,C) | |
| | STGCN1-TCN | (63,1) | 1 | (1,T1,C) | (8,T1,C) | ELU |
| | STGCN1-GCN | (1,1) | 1 | (8,T1,C) | (8,T1,C) | ELU |
| | STGCN2-TCN | (63,1) | 1 | (8,T1,C) | (16,T1,C) | ELU |
| | STGCN2-GCN | (1,1) | 1 | (16,T1,C) | (16,T1,C) | ELU |
| | Concatenation | | | (16,T1,C) | (16,T,C) | |
| Standard convolution | Conv1 | (1,61) | 1 | (16,T,C) | (32,T,1) | ELU |
| | Avg_pool1 | (5,1) | 1 | (32,T,1) | (32,T/5,1) | |
| | Conv2 | (1,1) | 1 | (32,T/5,1) | (64,T/5,1) | ELU |
| | Avg_pool2 | (5,1) | 1 | (64,T/5,1) | (64,T/25,1) | |
| Classifier | Reshape | | | (64,T/25,1) | (64×T/25×1) | |
| | Full-connection | | | (64×T/25×1) | 2 | Softmax |
Here, T is the total number of time points across all stages; C is the number of channels; and T1, T2, and T3 are the numbers of time points in the information integration, decision process, and neural response stages, respectively.
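The shape bookkeeping in the table can be checked with a few lines of Python. The sketch below splits a trial of T time points into three consecutive stages and computes the flattened feature length that reaches the fully connected layer. The values T = 250 and the stage boundaries (75, 150) are hypothetical, since the actual stage lengths are not given here, and the function names are illustrative. The sketch follows the output shapes in the table (each pooling layer shrinks the time axis by a factor of 5), even though the listed pooling stride is 1.

```python
def split_stages(n_samples, boundaries):
    """Return (start, stop) index pairs for consecutive stages of one trial."""
    edges = [0, *boundaries, n_samples]
    if any(stop <= start for start, stop in zip(edges, edges[1:])):
        raise ValueError("boundaries must be strictly increasing and inside the trial")
    return list(zip(edges, edges[1:]))

def classifier_input_size(n_samples, channels_out=64, pool=5, n_pools=2):
    """Flattened length 64 x T/25 x 1 after two pooling layers along time."""
    t = n_samples
    for _ in range(n_pools):
        t //= pool
    return channels_out * t

# Hypothetical trial length and stage boundaries (in samples).
stages = split_stages(250, (75, 150))
t1, t2, t3 = (stop - start for start, stop in stages)
flat_len = classifier_input_size(250)   # 64 * (250 // 25) = 640
```

Whatever the real boundaries are, T1 + T2 + T3 must equal T for the concatenation layer to restore the (16,T,C) shape listed in the table.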
FIGURE 6. The overall performance of the SAST-GCN model. Average Accuracy refers to the average accuracy across all subjects, which is 0.9055.
The accuracy, F1-score, precision, recall and complexity of different methods.
| Model | Accuracy | F1-score | Precision | Recall | FLOPs | Parameters |
| KNN | 0.6776 | 0.3728 | 0.2897 | 0.5323 | ||
| RF | 0.7143 | 0.4236 | 0.3381 | 0.5834 | ||
| SVM | 0.7856 | 0.5351 | 0.4554 | 0.6864 | ||
| Naive Bayes | 0.6863 | 0.4017 | 0.3114 | 0.6072 | ||
| AdaBoost | 0.6762 | 0.3915 | 0.3147 | 0.5723 | ||
| Fusion of traditional algorithms | 0.7541 | 0.4731 | 0.3983 | 0.6251 | ||
| EEGNET | 0.8274 | 0.6195 | 0.5536 | 0.7390 | 0.837G | 1.2K |
| CNN-LSTM | 0.8509 | 0.5647 | 0.6145 | 0.5418 | 61.34G | 282.2K |
| SAST-GCN | | | | | | |
The bold values represent the results of the proposed method.
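For reference, the four classification metrics in the table can be computed from binary predictions as sketched below, with the target class treated as positive. This is a generic sketch: the example labels are made up, and this record does not state whether the reported values are computed per subject and then averaged.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score with label 1 as the target class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Made-up example with few targets: 3 target trials among 10.
scores = binary_metrics(
    y_true=[1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
    y_pred=[1, 1, 0, 1, 1, 0, 0, 0, 0, 0],
)
```

In this example accuracy is 0.7 while precision is only 0.5, which illustrates why the table reports precision, recall, and F1-score alongside accuracy when target trials are the minority class.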
Ablation studies.
| Operation | Accuracy | F1-score | Precision | Recall | FLOPs | Parameters |
| ASTGCN | 0.8787 | 0.6607 | 0.6414 | 0.6812 | 3.72G | 15.3K |
| ASTGCN+SRM | 0.8883 | 0.6744 | 0.6816 | 0.6814 | 3.72G | 15.5K |
| ASTGCN+SRM+Segmentation | | | | | | |
The bold values represent the results of the proposed method.
Comparison of effects of using the data in different stages on classification results.
| Stage of the brain response for classification | Accuracy | F1-score | Precision | Recall | FLOPs | Parameters |
| Information integration | 0.8772 | 0.6344 | 0.6322 | 0.6521 | 1.23G | 15.1K |
| Decision processing | 0.8836 | 0.6499 | 0.6475 | 0.6594 | 1.30G | 15.1K |
| Neural response | 0.8816 | 0.6447 | 0.6378 | 0.6613 | 2.00G | 15.1K |
| All stages | | | | | | |
The bold values represent the results of the proposed method.
FIGURE 7. (A) Heatmap of the adaptive adjacency matrix. (B) Brain network connections of the three stages.
Comparison of effects of using different adjacency matrix initialization methods.
| Method | Accuracy | F1-score | Precision | Recall |
| Fully connected matrix | 0.8238 | 0.5251 | 0.4659 | 0.6202 |
| Random matrix | 0.8852 | 0.6589 | 0.6619 | 0.6726 |
| Phase locking value matrix | 0.8861 | 0.6635 | 0.6507 | 0.6848 |
| Coherence value matrix | 0.8818 | 0.6512 | 0.6456 | 0.6677 |
| Physical distance matrix | 0.8880 | 0.6767 | 0.6779 | 0.6856 |
| Pearson coefficient matrix | | | | |
The bold values represent the results of the proposed method.
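A Pearson-based initialization of one adjacency matrix per stage could look like the sketch below. It is a minimal illustration under stated assumptions: the binarization threshold of 0.5, the stage boundaries, and the function names are hypothetical, and the exact rule the authors use to turn correlation values into connections is not given in this record.

```python
import numpy as np

def pearson_adjacency(segment, threshold=0.5):
    """Binary (C, C) adjacency from a (T_stage, C) EEG segment via |Pearson r|."""
    corr = np.corrcoef(segment, rowvar=False)      # columns are channels
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 1.0)                     # keep every channel connected to itself
    return adj

def segmented_adjacency(trial, boundaries, threshold=0.5):
    """One adjacency matrix per stage; trial is (T, C), boundaries are sample indices."""
    return [pearson_adjacency(seg, threshold) for seg in np.split(trial, boundaries, axis=0)]

# Hypothetical trial: 250 samples, 6 channels, with channel 1 duplicating channel 0.
rng = np.random.default_rng(0)
trial = rng.normal(size=(250, 6))
trial[:, 1] = trial[:, 0]
stage_adjs = segmented_adjacency(trial, [75, 150])
```

Each returned matrix can serve as the starting point for the adaptive graph convolution of the corresponding stage; the other initializations in the table (random, phase locking value, coherence, physical distance) would replace only `pearson_adjacency`.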
SWOT analysis.
| Strength | Weakness | Opportunity | Threat |
| 1. An adjacency matrix is constructed to represent the brain network connection according to the neural mechanism of video processing | Inter-subject differences disturb network performance | 1. The applicability of graph neural networks to video-induced P3 detection is demonstrated | 1. Uncertain perturbations of the data need to be addressed |
| 2. The spatial-temporal features of EEG data are extracted by adaptive spatial-temporal graph convolution | | 2. Constructing brain network connections by segments is better than a static graph design | 2. The initialization method of the adjacency matrix needs to be optimized |