Enhancing the feasibility of cognitive load recognition in remote learning using physiological measures and an adaptive feature recalibration convolutional neural network.

Chennan Wu¹, Yang Liu², Xiang Guo¹, Tianshui Zhu¹, Zongliang Bao¹.

Abstract

The precise assessment of cognitive load during a learning phase is an important pathway to improving students' learning efficiency and performance. Physiological measures make it possible to continuously monitor learners' cognitive load in remote learning during the COVID-19 outbreak. However, maintaining a good balance between performance and computational cost is still a major challenge in advancing cognitive load recognition technology to real-world applications. This paper introduced an adaptive feature recalibration (AFR) convolutional neural network to overcome this challenge by capturing the most discriminative physiological features (EEG and eye-tracking). The results revealed that the optimal average classification accuracy of the feature combination obtained by the AFR method reached 95.56% with only 60 feature dimensions. Additionally, compared with the best result of the conventional correlation-based feature selection (CFS) method, the introduced AFR algorithm achieved higher accuracy and cheaper computational cost, as well as a 2.06% improvement in accuracy and a 51.21% reduction in feature dimension, which is more in line with the requirements of low delay and real-time performance in practical BCI applications.

Keywords: Cognitive load; Deep learning; EEG; Eye-tracking; Multimodal; Remote learning

Year: 2022 PMID: 36197639 PMCID: PMC9532827 DOI: 10.1007/s11517-022-02670-5

Source DB: PubMed Journal: Med Biol Eng Comput ISSN: 0140-0118 Impact factor: 3.079

Introduction

According to cognitive load theory by Sweller et al. [1-3], people have limited cognitive resources for processing and holding information. Learners might fail to complete cognitive tasks that exceed human working memory capacity due to cognitive overload [4]. Therefore, controlling and optimizing learners’ limited cognitive resources to achieve the best learning effect is important in engineering education [5]. The recent spread of COVID-19 has driven most schools to remote learning, leading to the emergence of a growing number of distance-learning applications. In a real-world classroom, experienced instructors can acutely sense whether the students understand the material and concepts being taught, which allows teachers to adjust teaching strategies immediately and reasonably. However, for remote teaching, it becomes difficult for teachers to judge the teaching effect through observation. Therefore, instructors desperately need new tools for monitoring and evaluating students’ cognitive states to effectively conduct remote teaching. Physiological signals have emerged as an alternative for overcoming this challenge. Traditionally, it is accepted that physiological signals are the most effective indicator of cognitive load because they provide efficient temporal resolution of long-term monitoring [6] and higher feasibility for estimating cognitive load compared to subjective rating methods [7]. In particular, electroencephalography (EEG) has been proven to be an effective, noninvasive method for detecting, estimating, or predicting human brain activities [8, 9]. The different rhythms generated by electrical brain activity, such as delta (1–3 Hz), theta (4–7 Hz), alpha (8–13 Hz), beta (14–30 Hz), and gamma (31–50 Hz), are the most popular features in the context of cognitive load recognition [10], especially the theta and alpha range, which seem to involve higher brain function, reflecting task difficulty or cognitive load among diverse task demands [11-15]. 
Moreover, nonlinear EEG features (i.e., spectral entropy) also perform well in EEG cognitive analysis [16-18]. In addition, eye-tracking is also a commonly used measure in analysing the effectiveness of learning materials [19]. For eye-tracking data, previous studies have confirmed that pupil diameter and fixation duration are sensitive indicators of variations in mental workload [20, 21]. Furthermore, combining EEG and eye-tracking signals can gather multimodal information, thereby leading to better accuracy and cognitive load measurement performance [22, 23]. However, improving classification performance and reducing computational costs is still a major challenge to advancing cognitive load recognition technology from laboratory to real-world applications. To address this, researchers have attempted to develop appropriate classifiers using different machine learning (ML) techniques. Nue et al. developed a lightweight CNN model to classify cognitive states for real-time computing environments, which reduces trainable parameters while maintaining high performance [24]. This paper focuses on improving the feature selection methods to select the most discriminative features to remedy the above challenge. In previous studies, the common algorithm used for feature selection was the CFS method, where the features are selected by ranking their correlation coefficient between the features and class labels [25, 26]. It has been shown to be effective in reducing feature dimensions [27]. Recently, the widespread use of deep learning in a variety of fields has shown its superiority over conventional machine learning, which motivates us to apply an adaptive feature recalibration (AFR) convolutional neural network to select the most discriminative features. The AFR algorithm is representative of the attention mechanism and was first proposed by Hu et al. in computer vision [28, 29]. 
In this research, we applied the AFR method to enhance feature learning in EEG and eye-tracking feature selection. It works by directing attention to the most critical features and disregarding irrelevant ones, thus determining what to focus on. This is a novel perspective on enhancing cognitive load recognition performance, since that performance relies mainly on the quality of the selected features that characterize cognitive activity [30]. To apply cognitive load recognition techniques to a remote learning environment, physiological signals are used as a medium, and the AFR method is introduced to maintain a good balance between performance and complexity. In this paper, we recorded EEG and eye-tracking data at two cognitive load levels from 42 healthy subjects. First, 123 kinds of handcrafted features were extracted, and statistical analysis was performed to verify whether physiological features can be used to discriminate different cognitive load levels in distance learning. Then, the AFR method was employed to select discriminative features for classification, which was performed by a radial basis function support vector machine (RBF-SVM) classifier. Additionally, to demonstrate the advantages of the AFR method, the classification accuracy and the feature dimensions used for classification are compared between the introduced AFR method and the traditional CFS method.

Material

Participants

This study was reviewed and approved by the Ethics Committee of Zhejiang University of Science and Technology. Forty-two undergraduate students (16 females, 26 males) with an average age of 20.81 ± 1.13 years provided written informed consent to participate in the study. In addition, participants’ scores in a programming course were utilized to guarantee that they had similar prior knowledge. Participants with (1) a history of major craniocerebral injury or neurological disease or (2) low arousal levels prior to the trial phase were excluded from the experiments [31].

Task and stimuli

The participants were asked to complete a Python-themed online course learning task. According to the conclusion of Mayer et al., positive emotions in multimedia learning facilitate cognitive processes and learning, which can be produced by the design of various multimedia elements, such as the layout, colour, and sound [32, 33]. Therefore, we took colour as an independent variable to design two versions of the online course that would induce two different cognitive load levels. Screenshots of the two online courses are presented in Fig. 1. As shown in Fig. 1, the Neutral Emotional Design (NED) online course was achromatic, and the Positive Emotional Design (PED) adopted the “Palenight Theme” to highlight code format. All other variables were maintained at constant values (including course content and teacher teaching style). We hypothesized that an online course designed with positive emotion produces lower cognitive load levels.

Fig. 1

Screenshots of two versions of the online courses

Procedures

Subjects were randomly assigned to either the PED group or the NED group. Before the experiment began, we put electrode caps on them, injected conductive adhesive into the electrodes, and calibrated the eye-movement camera. We then asked participants to close their eyes for 3 min so that we could take a baseline measurement. Then, the formal experiment began: students were presented with the designated online course, which lasted an estimated 6 min. Upon viewing the assigned online course, the participants’ perceptions of task difficulty were collected using a nine-point Likert scale (i.e., how easy or difficult was the online course to understand? [34]), which has been shown to be effective in assessing learners’ cognitive load during the learning process [35, 36]. Meanwhile, the participants completed the learning performance test to evaluate their ability to understand and retain the information presented in the courses. The entire process lasted an estimated 40 min.

Methods

The combined recording of EEG and eye-tracking data obtained in the experiment was analysed in a variety of ways, including signal preprocessing, feature extraction, feature selection, and cognitive load classification. An overview of the detailed analysis procedure is presented in Figs. 2 and 4.

Fig. 2

The procedure of data processing (data acquisition, preprocessing, and feature extraction)

Fig. 4

The overall framework of the AFR and CFS models for cognitive load classification

Signal preprocessing

The EEG collection device used in this study was OpenBCI, and the eye-tracking equipment was Tobii T120. EEG was measured at 15 electrode positions with a sampling rate of 125 Hz (see Fig. 3), arranged according to the international 10/20 system [37]. The original EEG and eye-tracking data were preprocessed and analysed in MATLAB using customized scripts. We visually examined the EEG and eye-tracking data and excluded four subjects’ data due to poor signal quality, leaving the PED and NED groups with 19 valid subjects each.

Fig. 3

Electrode placement configuration according to the 10–20 system

Preprocessing consisted of four steps. First, event markers at the start and the end of the stimulus in both the EEG and eye-tracking data served as synchronization events, and the eye-tracking data were upsampled and integrated into the EEG data as additional channels (i.e., channels that contain the pupil diameter of each eye and the fixation duration). Second, empirical mode decomposition (EMD) was used to decompose and filter the EEG signal. Third, the EEG signals were rereferenced to the average of all electrodes. Finally, to correct artefacts such as blinking and muscle tension, the independent component correlation algorithm (ICA) was performed [38-40].
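As an illustration of the average rereferencing step only, the following minimal Python sketch subtracts the across-channel mean from every sample. The function name `average_rereference` and the toy data are assumptions made for illustration; the study itself used customized MATLAB scripts that are not reproduced here.

```python
def average_rereference(eeg):
    """Subtract the across-channel mean from every sample.

    eeg: list of channels, each a list of samples of equal length.
    """
    n_channels = len(eeg)
    n_samples = len(eeg[0])
    # Mean over channels at each time point.
    means = [sum(ch[t] for ch in eeg) / n_channels for t in range(n_samples)]
    return [[ch[t] - means[t] for t in range(n_samples)] for ch in eeg]


# Toy data: 3 channels x 3 samples (hypothetical values).
toy = [[1.0, 2.0, 3.0],
       [3.0, 2.0, 1.0],
       [2.0, 5.0, 2.0]]
rereferenced = average_rereference(toy)
```

After rereferencing, the channel values at each time point sum to zero, which is the defining property of an average reference.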

Feature extraction

The primary purpose of feature extraction is to derive salient features that can map physiological signals into consequent cognitive states. Additionally, extracting appropriate features plays an important role in classification tasks. In this paper, we conducted two separate EEG data analyses (i.e., nonlinear dynamics and wavelet transform) and calculated cognitive-related eye-tracking measures (pupil diameter and fixation duration) based on data epochs with a size of 4 s. Consequently, we extracted 123 kinds of cognitive load-related features, which consisted of 75 frequency band power features (5 rhythm × 15 channels), 45 entropy features (approximate, sample, and wavelet entropy × 15 channels), and 3 eye-tracking indicators (pupil diameter of each eye and fixation duration).

Nonlinear dynamics

EEG signals are highly complex, and nonlinear analysis is an especially important method to analyse physiological time series that contain complex dynamics. In this paper, we studied EEG signals using two nonlinear dynamical methods: approximate entropy and sample entropy [41, 42].

Approximate entropy

Approximate entropy (ApEn), proposed by Pincus in 1991, describes the irregularity of a time series [18]. Its value reflects the degree of complexity of the time series, indicating the probability of generating a new pattern as the dimension increases. Intuitively, the more irregular and complex the time series is, the larger the corresponding ApEn value. ApEn can be reliably estimated from relatively short and noisy data. The details of the algorithm are shown below [43, 44]. A time series $\{u(1), u(2), \ldots, u(N)\}$ consisting of $N$ pieces of data is defined, and the corresponding approximate entropy is calculated according to the following steps:

Step 1: Each run of $m$ successive data points in the time series forms an $m$-dimensional vector, that is:

$$X(i) = [u(i), u(i+1), \ldots, u(i+m-1)]$$

where $i = 1, 2, \ldots, N-m+1$.

Step 2: The maximum distance between the vectors $X(i)$ and $X(j)$ is denoted as $d[X(i), X(j)]$, which is the largest absolute value of the difference between corresponding components:

$$d[X(i), X(j)] = \max_{0 \le k \le m-1} |u(i+k) - u(j+k)|$$

where $i, j = 1, 2, \ldots, N-m+1$.

Step 3: Given a positive threshold $r$, if the maximum distance between two vectors does not exceed $r$, the two vectors are considered similar. The number of vectors similar to $X(i)$ can be counted, and its proportion to the total number of vectors is given as:

$$C_i^m(r) = \frac{N^m(i)}{N-m+1}$$

where $N^m(i)$ represents the number of vectors $X(j)$ that match $X(i)$.

Step 4: Calculate the average of the logarithm of $C_i^m(r)$ over all $i$:

$$\Phi^m(r) = \frac{1}{N-m+1} \sum_{i=1}^{N-m+1} \ln C_i^m(r)$$

Step 5: Increase the dimension to $m+1$. Repeating steps 1–4, $\Phi^{m+1}(r)$ can be calculated as:

$$\Phi^{m+1}(r) = \frac{1}{N-m} \sum_{i=1}^{N-m} \ln C_i^{m+1}(r)$$

Step 6: The approximate entropy is defined as:

$$\mathrm{ApEn}(m, r, N) = \Phi^m(r) - \Phi^{m+1}(r)$$
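The six steps can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the embedding dimension `m=2` and threshold `r=0.2` are assumed defaults (the text does not report the values used), and the helper name `_phi` is hypothetical.

```python
import math


def _phi(u, m, r):
    """Average log-proportion of m-length templates similar to each template."""
    n = len(u) - m + 1
    templates = [u[i:i + m] for i in range(n)]
    total = 0.0
    for xi in templates:
        # Count templates whose maximum componentwise distance to xi
        # does not exceed r (the self-match is included, as in ApEn).
        count = sum(
            1 for xj in templates
            if max(abs(a - b) for a, b in zip(xi, xj)) <= r
        )
        total += math.log(count / n)
    return total / n


def approximate_entropy(u, m=2, r=0.2):
    """ApEn = Phi^m(r) - Phi^(m+1)(r); m and r are assumed defaults."""
    return _phi(u, m, r) - _phi(u, m + 1, r)


constant = [1.0] * 30
periodic = [0.0, 1.0] * 50
apen_constant = approximate_entropy(constant)
apen_periodic = approximate_entropy(periodic)
```

Both toy series are fully regular, so their ApEn values are at or very close to zero; an irregular signal would yield a larger value.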

Sample entropy

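Sample entropy (SampEn), proposed by Richman et al. [45], is a time series complexity measure that improves on ApEn by excluding self-matches, i.e., a vector is never compared with itself [46]. Intuitively, the value of SampEn reflects the degree of complexity in a time series. The sample entropy is defined as follows:

$$\mathrm{SampEn}(m, r, N) = -\ln \frac{A^m(r)}{B^m(r)}$$

where $B^m(r)$ is the number of pairs of distinct $m$-dimensional vectors whose maximum distance does not exceed the threshold $r$, and $A^m(r)$ is the corresponding number of pairs of $(m+1)$-dimensional vectors.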
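A minimal Python sketch of sample entropy is given below, assuming the standard formulation in which pairs of distinct templates of length m and m + 1 are counted and the negative log of their ratio is returned. The defaults `m=2` and `r=0.2` are assumptions, since the text does not state the values used.

```python
import math


def sample_entropy(u, m=2, r=0.2):
    """SampEn = -ln(A / B), counting template pairs without self-matches."""
    n = len(u)

    def count_matches(length):
        # Both lengths use the same n - m starting indices so that
        # the two counts are comparable.
        templates = [u[i:i + length] for i in range(n - m)]
        matches = 0
        for i in range(len(templates)):
            for j in range(i + 1, len(templates)):  # j > i: no self-match
                dist = max(abs(a - b) for a, b in zip(templates[i], templates[j]))
                if dist <= r:
                    matches += 1
        return matches

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return -math.log(a / b)


sampen_constant = sample_entropy([1.0] * 20)
sampen_periodic = sample_entropy([0.0, 1.0] * 10)
```

For perfectly regular series every template of length m that matches also matches at length m + 1, so the ratio is one and the entropy is zero.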

Wavelet transform

Wavelet transform is a typical and practicable time–frequency analysis method that has been widely used in analysing EEG signals [47-49]. By using the Daubechies wavelet as the basis function of the wavelet transform, the EEG signal was decomposed into five subbands [50]: delta (1–3 Hz), theta (4–7 Hz), alpha (8–13 Hz), beta (14–30 Hz), and gamma (31–50 Hz); in this manner, the frequency components of five frequency bands in the EEG signal could be extracted. For EEG signals of 15 channels, three features were extracted from the wavelet coefficients of each subband, which were calculated as [51]:

Wavelet energy

$$E_j = \sum_{k} |D_j(k)|^2$$

Wavelet energy ratio

$$p_j = \frac{E_j}{\sum_{l} E_l}$$

Wavelet entropy

$$WE = -\sum_{j} p_j \ln p_j$$

where $D_j(k)$ represents the wavelet coefficients of the corresponding decomposition level $j$.
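To make the three wavelet features concrete, the sketch below computes energy, energy ratio, and entropy from coefficients that are assumed to be already available per subband; the decomposition itself (Daubechies wavelet) is not reproduced. The function name `wavelet_features` and the toy coefficients are hypothetical.

```python
import math


def wavelet_features(subband_coeffs):
    """Energy, energy ratio, and entropy from per-subband coefficients.

    subband_coeffs: dict mapping a subband name to its list of
    wavelet coefficients (assumed precomputed).
    """
    # Energy of each subband: sum of squared coefficients.
    energy = {band: sum(c * c for c in coeffs)
              for band, coeffs in subband_coeffs.items()}
    total = sum(energy.values())
    # Relative energy of each subband.
    ratio = {band: e / total for band, e in energy.items()}
    # Shannon entropy of the relative energy distribution.
    entropy = -sum(p * math.log(p) for p in ratio.values() if p > 0)
    return energy, ratio, entropy


# Toy coefficients for two subbands with equal energy (hypothetical values).
energy, ratio, entropy = wavelet_features(
    {"theta": [1.0, 1.0], "alpha": [1.0, -1.0]}
)
```

When the energy is spread evenly over the subbands the entropy is maximal (here ln 2 for two subbands); energy concentrated in a single subband gives an entropy of zero.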

Feature selection

Feature selection is a critical step in bringing cognitive load recognition techniques to practical use since selecting appropriate features can improve the model’s learning efficiency and significantly benefit the performance of cognitive load recognition [52]. In this paper, two feature selection methods are compared. The first is the AFR method introduced in this paper, which utilizes a convolutional neural network to select discriminative features. The second is the conventional feature selection method represented by CFS, which selects prominent features according to statistical parameters (i.e., correlation coefficient). Both of them obtain a weight distribution, the former from squeeze-and-excitation networks and the latter from the correlation coefficient (Fig. 4).

Adaptive feature recalibration method

The AFR method aims to improve feature representation ability by adaptively selecting the discriminative features through the squeeze-and-excitation (SE) network [29]. Our AFR CNN model contains three SE blocks, which are inserted into three normal convolution layers. The specific structure is illustrated in Fig. 5, and the parameters of each CNN layer are described in Table 1. A special classification task is set to evaluate the quality of the weight coefficients generated by the first SE block. In particular, we choose a depthwise convolution (DC) layer instead of a generic convolution (GC) for the first SE block since a generic convolution mixes the channel information, which is not conducive to channel interpretation [28].

Fig. 5

The structure of the AFR convolutional neural network (batch normalization after each convolutional block is omitted in the figure)

Table 1

The parameters of each CNN layer

Layer	Param	Output shape (batch, channel, signal)
Input		38×123×90
Depthwise conv	Filter = 123, ksize = 7, stride = 2	38×123×44
Generic conv	Filter = 48, ksize = 5, stride = 2	38×48×21
Generic conv	Filter = 8, ksize = 3, stride = 2	38×8×10
Fully connected	Weight = 38×80×32	38×32×1
Fully connected	Weight = 38×32×16	38×16×1
Fully connected	Weight = 38×16×2	38×2×1
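The signal lengths in Table 1 can be checked with the usual one-dimensional convolution output-length formula, floor((L + 2p - k) / s) + 1. The padding values below (2, 1, 0) are assumptions chosen because they reproduce the reported lengths; the paper does not state the padding that was actually used.

```python
def conv1d_out_len(length, ksize, stride, padding=0):
    """Output length of a 1-D convolution (dilation 1)."""
    return (length + 2 * padding - ksize) // stride + 1


# (name, ksize, stride, assumed padding)
layers = [
    ("Depthwise conv", 7, 2, 2),
    ("Generic conv", 5, 2, 1),
    ("Generic conv", 3, 2, 0),
]

length = 90  # input signal length from Table 1
lengths = []
for name, ksize, stride, padding in layers:
    length = conv1d_out_len(length, ksize, stride, padding)
    lengths.append(length)

# The last conv layer has 8 channels, so flattening gives 8 * 10 values,
# which matches the 80 inputs of the first fully connected layer.
flattened = 8 * lengths[-1]
```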

For each SE block, the “squeeze” operation first aggregates feature maps across the time dimension by carrying out global average pooling, which shrinks the feature map $F \in \mathbb{R}^{N \times d}$ to $S \in \mathbb{R}^{N \times 1}$, where $N$ denotes the total feature size and $d$ denotes the data length. An “excitation” operation follows, which captures the featurewise dependencies and outputs per-feature modulation weights through two fully connected layers. In particular, the first fully connected layer performs dimensionality reduction and is followed by a ReLU activation function. The reduction factor needs to be carefully adjusted within the training process. The second fully connected layer restores the original dimensionality and is followed by a smoothing sigmoid activation function. The output of the SE operation can be formulated as:

$$W = \sigma\left(\mathrm{FC}_2\left(\delta\left(\mathrm{FC}_1(S)\right)\right)\right)$$

where $\mathrm{FC}_1$ and $\mathrm{FC}_2$ refer to the two fully connected layers, $\sigma$ is the sigmoid, and $\delta$ represents the ReLU activation function.

We implemented our model with the PyTorch framework. Binary cross-entropy was chosen as the loss function since our training task can be regarded as a binary task. Its mathematical representation is as follows:

$$L = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right]$$

where $y_i$ refers to the class label of sample $i$, $p_i$ denotes the probability predicted by the model for that label, and $n$ represents the total number of samples. Additionally, ADAM was used as the optimizer with a learning rate of 1e-2 to speed up the minimization of the loss function. The batch size was experimentally selected as 38, and the model was trained for 150 epochs. During training, the dataset was split into 80% for training and the remaining 20% for validation. We adopted fivefold cross-validation and took the average accuracy as our result. After approximately 50 iterations, the accuracy is close to 100% with the loss down to zero.

Because of the limited number of labelled training samples, the neural network may suffer from overfitting. However, our primary goal is to select discriminative features for a given task, regardless of the network's generalization ability on other tasks. On the contrary, a degree of overfitting helps the prominent features to become more obvious [53].
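The squeeze and excitation operations can be illustrated with a small forward pass written in plain Python. This is a sketch of the general SE mechanism only, not the authors' PyTorch model: the toy feature map, the weight values, and the reduction from 4 features to 2 hidden units are all hypothetical.

```python
import math


def squeeze(feature_map):
    # Global average pooling over time: one scalar per feature row.
    return [sum(row) / len(row) for row in feature_map]


def dense(vec, weights, bias):
    # Fully connected layer; `weights` holds one row per output unit.
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def se_weights(feature_map, w1, b1, w2, b2):
    s = squeeze(feature_map)
    # First FC layer reduces the dimension, followed by ReLU.
    hidden = [max(0.0, h) for h in dense(s, w1, b1)]
    # Second FC layer restores the dimension, followed by sigmoid.
    return [1.0 / (1.0 + math.exp(-z)) for z in dense(hidden, w2, b2)]


def recalibrate(feature_map, weights):
    # Scale every feature row by its modulation weight.
    return [[w * x for x in row] for row, w in zip(feature_map, weights)]


# Toy example: N = 4 features, d = 2 time points (hypothetical values).
feature_map = [[1.0, 3.0], [2.0, 2.0], [0.0, 4.0], [5.0, 5.0]]
w1 = [[0.5, -0.5, 0.25, 0.1], [0.2, 0.3, -0.4, 0.0]]      # 4 -> 2
b1 = [0.0, 0.0]
w2 = [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0], [0.0, 2.0]]   # 2 -> 4
b2 = [0.0, 0.0, 0.0, 0.0]

weights = se_weights(feature_map, w1, b1, w2, b2)
scaled = recalibrate(feature_map, weights)
```

Each weight lies strictly between 0 and 1, so the block can attenuate a feature but never amplify it; ranking features by these weights is what makes them usable as a selection criterion.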

Correlation-based feature selection method

The correlation-based feature selection (CFS) method is a supervised method of feature selection, through which each feature obtains a correlation coefficient that determines how highly the feature correlates with the target class [54, 55]. We calculated the correlation coefficient between each feature and the class label to express each feature’s correlation with cognitive load. A large correlation coefficient value indicates a strong correlation between the feature and the target class, which can be used as the basis for selecting features. Features most relevant to cognitive load can be identified by ranking their coefficients in descending order. The correlation coefficient between each feature $X$ and the cognitive load label $Y$ is calculated as follows:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X, Y)}{\sigma_X \sigma_Y}$$

In this index, $\operatorname{cov}$ stands for “covariance,” $\rho$ stands for “correlation coefficient,” and $\sigma_X$ and $\sigma_Y$ are the standard deviations of $X$ and $Y$.
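A minimal sketch of correlation-based ranking is shown below. The feature names and values are hypothetical, and ranking by the absolute value of the coefficient is an assumption made here so that strongly negative correlations also count as relevant; the text only states that coefficients are ranked in descending order.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # The 1/n factors of covariance and standard deviations cancel out.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_features(features, labels):
    """Return feature names sorted by |correlation| with the labels."""
    scores = {name: abs(pearson(values, labels))
              for name, values in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


labels = [0, 0, 1, 1]  # two cognitive load levels (toy data)
features = {
    "noise": [1.0, -1.0, 1.0, -1.0],
    "theta": [1.0, 2.0, 3.0, 4.0],
    "pupil": [0.0, 0.0, 1.0, 1.0],
}
ranking = rank_features(features, labels)
```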

Classification

To demonstrate the effectiveness of the AFR method, a uniform classifier needed to be predefined to compare the classification performance of the distinguishing features selected by the two methods. According to previous research [56], SVMs usually result in better generalization than softmax due to their maximum margin property [57]. In addition, the radial basis function (RBF) can map raw data to infinite-dimensional space, which usually leads to better performance and is widely used in practical applications [58]. Furthermore, the original SVM implementation solves binary class/nonclass separation problems, which is in line with our goal [56]. Thus, RBF-SVM was adopted as a reference to evaluate the performance of the introduced method in our research. The SVM approach was first introduced by Vapnik as a potential alternative to conventional artificial neural networks [59-61]. When SVM is used, a set of training samples needs to be given, where each sample contains several features and is accompanied by a label stating to which of the two categories it belongs. SVM is realized in the following way: a labelled dataset $\{(x_i, y_i)\}_{i=1}^{n}$ is given, with the task of finding a separating hyperplane $w^{T}x + b = 0$ meeting the following condition:

$$y_i \left( w^{T} x_i + b \right) \ge 1, \quad i = 1, \ldots, n$$

where $x_i$ denotes the feature vector and $y_i \in \{-1, +1\}$ stands for its corresponding label. In addition, the RBF-SVM model was trained using fivefold cross-validation on the datasets in this study.
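The RBF kernel that underlies an RBF-SVM can be written in a few lines. This sketch uses the common parameterization exp(-gamma * ||x - z||^2); the value of `gamma` is an assumed hyperparameter, since the text does not report the kernel settings used in the study.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Similarity of two feature vectors under the RBF kernel."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


same = rbf_kernel([1.0, 2.0], [1.0, 2.0])   # identical vectors
near = rbf_kernel([0.0, 0.0], [1.0, 0.0])   # squared distance 1
far = rbf_kernel([0.0, 0.0], [3.0, 0.0])    # squared distance 9
```

The kernel equals one for identical vectors and decays towards zero as they move apart, so the classifier's decision depends on how close a sample is to the support vectors in feature space.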

Results

Statistical analysis of physiological features

To verify the effectiveness of applying physiological measures to distinguish two cognitive states, one-way analysis of variance (ANOVA) and Mann–Whitney U tests were conducted to make comparisons between the NED group and the PED group (19 valid subjects each). The descriptive statistics of all variables in each group are presented in Table 2.

Table 2

Means and standard deviations of all variables by conditions

	NED group		PED group
	M	SD	M	SD
Delta power	35.26	18.9	38.00	15.85
Theta power	10.70	5.08	16.12	5.83
Alpha power	11.72	4.36	16.37	8.60
Beta power	38.11	18.92	26.46	13.02
Gamma power	4.58	2.39	3.11	1.90
Sample entropy	0.66	0.044	0.60	0.43
Approximate entropy	1.44	0.10	1.19	0.093
Wavelet entropy	2.14	0.018	2.13	0.019
Left pupil diameter	3.71	0.42	3.29	0.35
Right pupil diameter	3.72	0.46	3.37	0.31
Fixation duration	314.00	137.50	218.12	103.23
Perceived task difficulty	5.71	1.76	4.28	1.23
Learning performance	3.10	1.73	5.09	2.07

Figure 6 shows the weight distribution of five frequency band powers in different brain regions. One-way ANOVA revealed that there was a main effect with the dependent variable theta band and alpha band power across the two groups. Compared to the NED group, subjects had increased theta and alpha activity while learning the PED online courses, indicating that the participants in the PED group had a lower cognitive load level than those in the NED group [62].

Fig. 6

The weight distribution of five frequency band powers in different brain regions

Boxplots of approximate, sample, and wavelet entropy in the two experimental groups are shown in Fig. 7a. One-way repeated-measures ANOVA revealed a significant main effect across the two groups on approximate entropy, sample entropy, and wavelet entropy. For subjects who watched the online course adopting a positive emotional design, the mean spectral entropy (i.e., approximate, sample, and wavelet entropy) was higher than for those who watched the online course with a neutral emotional design, suggesting that the PED online course can reduce the cognitive load on learners [63].

Fig. 7

Boxplot data for participants across the two groups: a spectral entropy (approximate entropy, sample entropy, and wavelet entropy). b Eye-tracking (left and right pupil diameters, and fixation duration)

Figure 7b shows the boxplot of eye-tracking features across the two conditions. The one-way ANOVA revealed a main effect of the two online courses on the left pupil diameter, right pupil diameter, and fixation duration. Subjects in the PED group had smaller pupil diameters and shorter fixation durations than those in the NED group. These results reveal that learning material employing neutral emotional design imposed a higher cognitive load than the PED online courses [20, 64]. In addition, subjective ratings of cognitive load and objective scores of the learning performance test collected after the experiment were also analysed to confirm the results drawn from the physiological measures. For the ANOVA with the dependent variable subjective ratings, learners who watched NED online courses perceived higher difficulty than those who viewed PED online courses. There was also a significant learning performance effect, indicating that the PED group could better understand and retain the information presented in the courses. These results support our hypothesis and are consistent with the conclusions of Mayer and Estrella [32] and Um et al. [33].

Recognition performance comparison

To demonstrate the advantage of the introduced AFR method, we evaluated the performance of the distinguishing features selected by the AFR method and used the CFS method to make an overall comparison. In the experiments, classification accuracy (ACC) was adopted as the evaluation criterion, and all these methods were retested on the datasets obtained from 38 valid participants in this study. We first calculated the average weight distribution between features and the class labels for all EEG and eye-tracking features and sorted them in descending order. Then, a useful tensor was declared, and the features corresponding to the top 10 weight distributions were initialized as the initial useful tensor. Every time an extra 10 features were added to the useful tensor, an RBF-SVM was employed. The changes in classification accuracy are depicted in Table 3.
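The incremental evaluation described above can be sketched as a generator of growing feature subsets. The function name `incremental_subsets` and the toy weights are hypothetical; in the study each subset would be passed to the RBF-SVM classifier, a step that is not reproduced here.

```python
def incremental_subsets(weights, step=10):
    """Yield growing lists of feature indices, highest weight first."""
    # Sort feature indices by weight in descending order.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    # Subset sizes: step, 2*step, ... and finally all features.
    sizes = list(range(step, len(order), step)) + [len(order)]
    for size in sizes:
        yield order[:size]


# Toy weights for 123 features (hypothetical values).
weights = [i / 123 for i in range(123)]
subsets = list(incremental_subsets(weights))
subset_sizes = [len(s) for s in subsets]
```

With 123 features and a step of 10, this yields the 13 feature dimensions evaluated in the comparison: 10, 20, and so on up to 120, followed by the full set.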

Table 3

Means and standard deviations of classification accuracies achieved through the AFR and CFS methods

Feature dimension	AFR		CFS
Feature dimension	M	SD	M	SD
10	81.06	0.35	78.72	0.12
20	90.36	0.15	88.4	0.29
30	92.92	0.23	91.92	0.40
40	95.32	0.18	93.32	0.15
50	95.2	0.20	93.02	0.19
60	95.56	0.16	92.8	0.11
70	95.12	0.29	92.48	0.16
80	94.76	0.40	92.7	0.18
90	94.1	0.37	92.82	0.12
100	94.58	0.29	92.7	0.17
110	94	0.21	93.24	0.21
120	93.34	0.21	93.28	0.29
All	93.44	0.21	93.4	0.19

Table 3 shows that the AFR method achieves higher average accuracies than the CFS method at every feature dimension. For the top 10 to 110 dimensions, the results achieved by the AFR method are 2.34%, 1.96%, 1.00%, 2.00%, 2.18%, 2.76%, 2.64%, 2.06%, 1.28%, 1.88%, and 0.76% higher than those of the CFS method in average accuracy. Specifically, when the number of dimensions reaches 60, our AFR method reaches the highest average accuracy of 95.56%. Thus, the maximum classification accuracy is obtained with only 60 feature dimensions, far fewer than the original full dimension. In contrast, all features need to be added to the useful tensor for the CFS method to achieve its highest average accuracy (93.5% with a full feature dimension of 123).

Discussion

In this study, we combined EEG and eye-tracking measures to evaluate cognitive load in distance learning. As expected, the statistical analysis results in “Sect. 4.1” indicate the feasibility and validity of using physiological measures to monitor cognitive load in remote learning. Learners with lower cognitive load had higher theta and alpha band power, higher approximate, sample, and wavelet entropy, smaller pupil diameter, and shorter fixation duration when watching online courses. The results in “Sect. 4.2” demonstrate that our AFR method strikes a better compromise between performance and computational cost than the CFS method, with a 2.06% improvement in accuracy and a 51.21% reduction in feature dimension. Therefore, applying physiological measures together with the AFR method enhances the feasibility of cognitive load recognition in distance learning.

There are three possible explanations for the excellent performance of the AFR method. First, the AFR method may be highly sensitive to irrelevant noise, which is consistent with previous findings [65]. Figure 8 shows the classification performance obtained with the discriminative features selected by the AFR and CFS methods. As shown in Table 3, for the AFR method, the classification accuracy achieved with 60 dimensions (95.56%) is better than that achieved with the full feature dimension (93.44%), which demonstrates the ability of the AFR method to effectively remove interfering information. In addition, the accuracy of the AFR method shows an overall downward trend after its peak at 60 dimensions (see Fig. 8), indicating that fusing irrelevant interfering features reduces the recognition accuracy. In contrast, for the CFS method, all features need to be added to the useful tensor to achieve the highest average accuracy. As shown in Fig. 8, the accuracy of the CFS method shows a fluctuating upward trend after 40 dimensions, indicating the poor ability of the CFS method to identify redundant or irrelevant information.
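The accuracy-versus-dimension comparison above relies on ranking features and evaluating progressively larger subsets. The following is a minimal sketch of that bookkeeping only, not the authors' pipeline: the step size of 10 is an assumption inferred from the 11 values reported up to 110 dimensions, and the weights are random placeholders rather than learned AFR weights or CFS scores.

```python
import random


def nested_top_k_subsets(weights, step=10):
    """Map each subset size k to the indices of the k highest-weighted features.

    Sizes are step, 2*step, ... followed by the full feature set.
    The step size is an assumption, not a value stated in the paper.
    """
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    sizes = list(range(step, len(weights), step))
    sizes.append(len(weights))
    return {k: ranked[:k] for k in sizes}


# Placeholder weights for 123 features (hypothetical, seeded for repeatability).
rng = random.Random(0)
weights = [rng.random() for _ in range(123)]

subsets = nested_top_k_subsets(weights)
for k, idx in subsets.items():
    # In a real experiment, a classifier would be trained and scored on idx here.
    print(k, idx[:3])
```

Each subset contains the previous one, so any drop in accuracy when moving to a larger subset can be attributed to the newly added, lower-ranked features.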

Fig. 8

The classification performance of the selected feature combination using the AFR and CFS methods (a standard error of less than 0.3 was omitted in the figure)

Second, we plotted the average weights of all subjects for each feature and highlighted the features whose average weight exceeded “mean + standard deviation” (see Fig. 9). The highlighted physiological features were not limited to a particular type of feature but were scattered across all feature types, indicating that the AFR method can help combine the complementary strengths of multiple, multimodal physiological features to significantly enhance cognitive load recognition performance. These findings are consistent with previous studies showing that combining multimodal physiological features can enhance recognition [66].
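The “mean + standard deviation” highlighting rule described above can be sketched as follows. This is an illustrative reading, not the authors' code: the weights are made-up numbers, and the use of the population standard deviation (`statistics.pstdev`) is an assumption, since the text does not say which estimator was used.

```python
import statistics


def highlight_features(subject_weights):
    """Return indices of features whose across-subject mean weight exceeds
    mean + standard deviation of all mean weights, plus the threshold itself.

    subject_weights: one list of feature weights per subject.
    """
    n_features = len(subject_weights[0])
    mean_w = [
        statistics.fmean(subject[j] for subject in subject_weights)
        for j in range(n_features)
    ]
    # Assumption: population standard deviation over the per-feature means.
    threshold = statistics.fmean(mean_w) + statistics.pstdev(mean_w)
    highlighted = [j for j, w in enumerate(mean_w) if w > threshold]
    return highlighted, threshold


# Hypothetical weights: 3 subjects x 5 features.
example_weights = [
    [0.1, 0.2, 0.9, 0.2, 0.1],
    [0.2, 0.2, 0.8, 0.1, 0.1],
    [0.3, 0.2, 1.0, 0.3, 0.1],
]

highlighted, threshold = highlight_features(example_weights)
print(highlighted, round(threshold, 4))
```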

Fig. 9

The feature weights obtained by the AFR method (“mean weight” and “mean + standard deviation weight” are marked in the figure)

The last possible explanation might lie in the strength of the multiresolution convolutional neural network, which makes the AFR method not only powerful but also superior to statistical and machine learning algorithms [65]. AFR explicitly models the interdependencies between the features and efficiently captures the features that are most salient for a given task through a residual squeeze-and-excitation block, which helps the lower layers of the network exploit more contextual information outside their local receptive fields [67].

The application of physiological measures and the AFR method opens the way for cognitive load recognition during the online teaching of engineering courses, and the results will hopefully serve as helpful feedback for teachers to adjust teaching strategies immediately and reasonably. Additionally, this technology provides evidence for evaluating online courses. In this case, the learners who received the PED online course had higher theta and alpha band power, higher approximate, sample, and wavelet entropy, smaller pupil diameter, and shorter fixation duration, indicating that online courses employing positive emotional design impose less cognitive load than those using neutral emotional design, thereby leading to better learning performance [62].

However, one limitation of this study is that we ignored individual-specific bias. The weight distributions of individual features for each subject are visualized in Fig. 10. Although the results for most subjects are generally consistent with Fig. 9, the discriminative features differed among subjects. This is an important issue for future research, since developing a suitable algorithmic model to address individual-specific bias may further enhance the performance of cognitive load recognition.
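To make the residual squeeze-and-excitation recalibration mentioned in the third explanation more concrete, here is a generic, dependency-free sketch of the idea: squeeze each channel to a scalar, pass the result through a small bottleneck to obtain per-channel gates, rescale the channels, and add the input back. It is not the authors' network; the channel count, reduction ratio, and random untrained weights are all assumptions chosen for illustration.

```python
import math
import random


def squeeze(x):
    """Global average pooling: one scalar per channel."""
    return [sum(channel) / len(channel) for channel in x]


def dense(w, v):
    """Matrix-vector product for a bias-free fully connected layer."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def residual_se_block(x, w1, w2):
    """Recalibrate channels of x and add the input back (residual path).

    x:  list of channels, each a list of feature values
    w1: (C/r x C) bottleneck weights, w2: (C x C/r) expansion weights
    Returns the output and the per-channel gates in (0, 1).
    """
    z = squeeze(x)
    gates = sigmoid(dense(w2, relu(dense(w1, z))))
    out = [[xi + g * xi for xi in channel] for channel, g in zip(x, gates)]
    return out, gates


# Hypothetical sizes: 4 channels, 6 values each, reduction ratio r = 2.
rng = random.Random(0)
x = [[rng.uniform(-1, 1) for _ in range(6)] for _ in range(4)]
w1 = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)]
w2 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(4)]

out, gates = residual_se_block(x, w1, w2)
print([round(g, 3) for g in gates])
```

In a trained network the gates would be learned so that informative channels receive values close to 1 and uninformative ones values close to 0; with the random weights used here they only demonstrate the data flow.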

Fig. 10

The average feature weight on each subject of different kinds of features

Future work will focus on combining the channel attention mechanism with the spatial attention mechanism. The channel attention mechanism directs attention to the most critical features, while the spatial attention mechanism directs attention to the most relevant parts. The combined use of the two mechanisms may further enhance feature learning, leading to higher classification performance and lower total computational cost by emphasizing important information while suppressing noise.

Conclusion

The main contribution of this study is the introduction of physiological measures and the AFR method to make cognitive load recognition more feasible. We utilized multimodal physiological signals as tools to monitor learners’ cognitive load and adopted the AFR algorithm to further enhance the feasibility of cognitive load recognition in remote learning. The results demonstrated that physiological measures can significantly distinguish different cognitive load levels. Additionally, the introduced AFR algorithm can be effectively used to capture discriminative features, thus achieving good performance in terms of accuracy and computational cost. Continuously monitoring learners’ cognitive states provides not only instantaneous recognition of cognitive overload or underload but also valuable feedback to distance-learning system designers and administrators, so that they can take appropriate countermeasures to improve the overall learning effect.
