Nora Hollenstein, Cedric Renggli, Benjamin Glaus, Maria Barrett, Marius Troendle, Nicolas Langer, Ce Zhang.
Abstract
Until recently, human behavioral data from reading has mainly been of interest to researchers seeking to understand human cognition. However, these human language processing signals can also benefit machine learning-based natural language processing tasks. Using EEG brain activity for this purpose remains largely unexplored. In this paper, we present the first large-scale study that systematically analyzes the potential of EEG brain activity data for improving natural language processing tasks, with a special focus on which features of the signal are most beneficial. We present a multi-modal machine learning architecture that learns jointly from textual input and from EEG features. We find that filtering the EEG signals into frequency bands is more beneficial than using the broadband signal. Moreover, for a range of word embedding types, EEG data improves binary and ternary sentiment classification and outperforms multiple baselines. For more complex tasks such as relation detection, only the contextualized BERT embeddings outperform the baselines in our experiments, which calls for further research. Finally, EEG data proves particularly promising when limited training data is available.
Keywords: EEG; brain activity; frequency bands; machine learning; multi-modal learning; natural language processing; neural network; physiological data
Year: 2021; PMID: 34326723; PMCID: PMC8314009; DOI: 10.3389/fnhum.2021.659410
Source DB: PubMed; Journal: Front Hum Neurosci; ISSN: 1662-5161; Impact factor: 3.169
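The abstract's comparison between frequency-band features and the broadband signal can be illustrated with a minimal sketch. It computes mean spectral power per band with a naive discrete Fourier transform. The band edges, the function name `band_powers`, and the synthetic test signal are assumptions made for illustration; they are not taken from the paper.

```python
import cmath
import math

# Conventional band edges in Hz. These are an assumption for illustration;
# the exact cut-offs used in the study are not stated in this record.
BANDS = {
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, 50.0),
}


def band_powers(signal, sample_rate, bands=BANDS):
    """Return the mean spectral power per band using a naive O(n^2) DFT."""
    n = len(signal)
    collected = {name: [] for name in bands}
    for k in range(1, n // 2):
        freq = k * sample_rate / n  # frequency of DFT bin k in Hz
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal)
        )
        power = abs(coeff) ** 2 / n
        for name, (low, high) in bands.items():
            if low <= freq < high:
                collected[name].append(power)
    return {name: (sum(v) / len(v) if v else 0.0) for name, v in collected.items()}


# Synthetic one-second signal: a pure 10 Hz sine sampled at 128 Hz.
rate = 128
sine = [math.sin(2 * math.pi * 10 * t / rate) for t in range(rate)]
result = band_powers(sine, rate)
dominant = max(result, key=result.get)
```

For the synthetic sine, nearly all power falls into the band containing 10 Hz. A real pipeline would typically use an FFT library and artifact-cleaned recordings rather than this quadratic-time loop.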
Details about the ZuCo tasks used in this paper.
| Participants | 12 | 12 | 18 |
| Sentences | 400 | 300 | 349 |
| Words | 7,079 | 6,386 | 6,828 |
| Unique word types | 3,080 | 2,657 | 2,412 |
| Sentiment analysis | ✓ | - | - |
| Relation detection | - | ✓ | ✓ |
In Task SR, participants read sentences from movie reviews; in Task NR, they read sentences from Wikipedia articles.
Example sentences for all three NLP tasks used in this study.
| Binary/ternary sentiment analysis | “The film often achieves a mesmerizing poetry.” | |
| Binary/ternary sentiment analysis | “Flaccid drama and exasperatingly slow journey.” | |
| Ternary sentiment analysis | “A portrait of an artist.” | |
| Relation detection | “He attended Wake Forest University.” | |
| Relation detection | “She attended Beverly Hills High School, but left to become an actress.” | |
Figure 1. (Left) Label distribution of the 11 relation types in the relation detection dataset. (Right) Number of relation types per sentence in the relation detection dataset.
Figure 2. The multi-modal machine learning architecture for the EEG-augmented models. Word embeddings of dimension d are the input for the textual component (yellow); EEG features of dimension e are the input for the cognitive component (blue). The text component consists of recurrent layers followed by two dense layers with dropout. We test multiple architectures for the EEG component (see Figure 3). Finally, the hidden states of both components are concatenated and followed by a final dense layer with softmax activation for classification (green).
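The final fusion step described in the Figure 2 caption (concatenate both hidden states, then apply a dense layer with softmax) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names, vector sizes, and random weights are hypothetical, and the recurrent and dense layers that produce the hidden states are replaced by random stand-in vectors.

```python
import math
import random


def dense(vector, weights, bias):
    """Affine layer: one output value per weight row."""
    return [
        sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)
    ]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(text_hidden, eeg_hidden, weights, bias):
    """Concatenate the two hidden states, then apply dense + softmax."""
    joint = list(text_hidden) + list(eeg_hidden)
    return softmax(dense(joint, weights, bias))


rng = random.Random(42)
text_hidden = [rng.uniform(-1, 1) for _ in range(8)]  # stand-in for the text component output
eeg_hidden = [rng.uniform(-1, 1) for _ in range(4)]   # stand-in for the EEG component output
num_classes = 3  # e.g. ternary sentiment analysis
weights = [[rng.uniform(-0.5, 0.5) for _ in range(12)] for _ in range(num_classes)]
bias = [0.0] * num_classes
probs = fuse_and_classify(text_hidden, eeg_hidden, weights, bias)
```

Because the two modalities are only joined at this last layer, either component can be swapped out (for example, the recurrent versus the convolutional EEG decoder) without changing the classifier head.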
Figure 3. EEG decoding components: (Left) The recurrent model component is analogous to the text component and consists of recurrent layers followed by two dense layers with dropout. (Right) The convolutional inception component consists of an ensemble of convolution filters of varying lengths, which are concatenated and flattened before the subsequent dense layers.
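The idea behind the convolutional inception component, several filters of different lengths whose outputs are concatenated and flattened, can be sketched as follows. This is a simplified illustration under stated assumptions: the kernels are fixed averaging filters rather than learned weights, the input is a single one-dimensional sequence, and padding, pooling, and the subsequent dense layers are omitted. The kernel lengths 1, 4, and 7 follow the hyper-parameter table; all function names are hypothetical.

```python
def conv1d_valid(sequence, kernel):
    """1D 'valid' cross-correlation of a sequence with a single kernel."""
    width = len(kernel)
    return [
        sum(k * sequence[i + j] for j, k in enumerate(kernel))
        for i in range(len(sequence) - width + 1)
    ]


def inception_features(sequence, kernels):
    """Apply every kernel to the sequence and concatenate the outputs into one flat list."""
    flat = []
    for kernel in kernels:
        flat.extend(conv1d_valid(sequence, kernel))
    return flat


# Averaging kernels of lengths 1, 4 and 7 (the weights are illustrative, not learned).
kernels = [[1.0 / size] * size for size in (1, 4, 7)]
eeg_sequence = [float(v) for v in range(10)]
features = inception_features(eeg_sequence, kernels)
```

Short kernels respond to local changes while longer kernels summarize wider windows, so the concatenated vector mixes several temporal scales before classification.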
Tested value ranges included in the hyper-parameter search for our multi-modal machine learning architecture.
| LSTM layer dimension | 64, 128, 256, 512 |
| Number of LSTM layers | 1, 2, 3, 4 |
| CNN filters | 14, 16, 18 |
| CNN kernel sizes | [1,4,7] |
| CNN pool sizes | 3, 5, 7 |
| Dense layer dimension | 8, 16, 32, 64, 128, 256, 512 |
| Dropout | 0.1, 0.3, 0.5 |
| Batch size | 20, 40, 60 |
| Learning rate | 10^-1, 10^-2, 10^-3, 10^-4, 10^-5 |
| Random seeds | 13, 22, 42, 66, 78 |
| Threshold | 0.3, 0.5, 0.7 |
Threshold only applies to relation detection.
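The value ranges in the table above can be expressed as a search space and enumerated. The sketch below assumes an exhaustive grid over the recurrent-model parameters; whether the authors searched exhaustively or sampled is not stated here, and the dictionary keys and the function `iter_configs` are hypothetical names. The CNN parameters are left out for brevity, and the random seeds are treated as repeated runs rather than as a tuned parameter, which is also an assumption.

```python
import itertools

# Value ranges copied from the hyper-parameter table; the key names are hypothetical.
SEARCH_SPACE = {
    "lstm_dim": [64, 128, 256, 512],
    "lstm_layers": [1, 2, 3, 4],
    "dense_dim": [8, 16, 32, 64, 128, 256, 512],
    "dropout": [0.1, 0.3, 0.5],
    "batch_size": [20, 40, 60],
    "learning_rate": [1e-1, 1e-2, 1e-3, 1e-4, 1e-5],
}
THRESHOLDS = [0.3, 0.5, 0.7]  # only used for relation detection


def iter_configs(space, task):
    """Yield one dict per combination; add the threshold only for relation detection."""
    space = dict(space)
    if task == "relation_detection":
        space["threshold"] = THRESHOLDS
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


sentiment_configs = list(iter_configs(SEARCH_SPACE, "sentiment"))
relation_configs = list(iter_configs(SEARCH_SPACE, "relation_detection"))
```

Under these assumptions the grid holds 5,040 configurations for the sentiment tasks and 15,120 for relation detection, which shows why a random or staged search may be more practical than a full sweep.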
Binary sentiment analysis results of the multi-modal model using the recurrent EEG decoding component.
| Baseline | 0.572 | 0.573 | 0.552 (0.07) | 0.751 | 0.738 | 0.728 (0.08) | 0.900 | 0.899 | 0.893 (0.04) |
| + noise | 0.599 | 0.574 | 0.541 (0.08) | 0.721 | 0.715 | 0.709 (0.09) | 0.914 | 0.916 | 0.913 (0.03) |
| + ET | 0.781 (0.06) | 0.913 | 0.907 | 0.904 (0.05) | |||||
| + EEG full | 0.540 | 0.538 | 0.525 (0.06) | 0.738 | 0.729 | 0.725 (0.07) | 0.913 | 0.909 | 0.906 (0.04) |
| + EEG θ | 0.602 | 0.599 | 0.584 | 0.789 | 0.785 | 0.916 | 0.913 | ||
| + EEG α | 0.610 | 0.590 | 0.565 (0.05) | 0.763 | 0.758 | 0.753 (0.05) | 0.912 | 0.908 | 0.906 (0.03) |
| + EEG β | 0.587 | 0.578 | 0.555 (0.07) | 0.781 | 0.777 | 0.774+ (0.06) | 0.911 | 0.911 | 0.907 |
| + EEG γ | 0.614 | 0.591 | 0.553 (0.08) | 0.777 | 0.773 | 0.769 | |||
| +θ+α+β+γ | 0.597 | 0.597 | 0.569 (0.08) | 0.766 | 0.764 | 0.760 | 0.913 | 0.913 | 0.911 |
We report precision (P), recall (R), and F1-score, with standard deviations in parentheses.
* denotes p < 0.05 (uncorrected); + denotes p < 0.003 (Bonferroni-corrected p-value).
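The two significance levels in the table note follow the usual Bonferroni logic: the uncorrected level is divided by the number of comparisons. A minimal sketch is given below. The number of comparisons is not stated in this record; the value 18 is an assumption chosen only because it yields a corrected threshold of roughly 0.003, and the function names and labels are hypothetical.

```python
def bonferroni_threshold(alpha, num_tests):
    """Return the per-test significance threshold after Bonferroni correction."""
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return alpha / num_tests


def significance_level(p_value, alpha=0.05, num_tests=18):
    """Label a p-value as 'corrected', 'uncorrected', or 'not significant'.

    num_tests=18 is an illustrative assumption, not a value from the paper.
    """
    if p_value < bonferroni_threshold(alpha, num_tests):
        return "corrected"
    if p_value < alpha:
        return "uncorrected"
    return "not significant"


corrected_alpha = bonferroni_threshold(0.05, 18)
labels = [significance_level(p) for p in (0.001, 0.01, 0.2)]
```

A result labelled "corrected" would survive the stricter threshold, while "uncorrected" results pass only the conventional 0.05 level and should be read with more caution.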
Ternary sentiment analysis results of the multi-modal model using the recurrent EEG decoding component.
| Baseline | 0.408 | 0.384 | 0.351 (0.07) | 0.510 | 0.507 | 0.496 (0.06) | 0.722 | 0.714 | 0.710 (0.05) |
| + noise | 0.373 | 0.399 | 0.344 (0.10) | 0.531 | 0.519 | 0.504 (0.04) | 0.711 | 0.706 | 0.700 (0.06) |
| + ET | 0.728 | 0.717 | 0.714 (0.05) | ||||||
| + EEG full | 0.391 | 0.387 | 0.353 (0.07) | 0.505 | 0.505 | 0.488 (0.07) | 0.724 | 0.715 | 0.711 (0.06) |
| + EEG θ | 0.397 | 0.409 | 0.360 (0.07) | 0.516 | 0.510 | 0.498 (0.06) | 0.715 | 0.708 | 0.704 (0.05) |
| + EEG α | 0.390 | 0.390 | 0.347 (0.08) | 0.520 | 0.516 | 0.506 (0.05) | 0.720 | 0.712 | 0.707 (0.05) |
| + EEG β | 0.350 | 0.370 | 0.302 (0.09) | 0.523 | 0.519 | 0.509 (0.05) | |||
| + EEG γ | 0.409 | 0.397 | 0.359 (0.07) | 0.517 | 0.513 | 0.502 (0.04) | 0.709 | 0.705 | 0.697 (0.06) |
| +θ+α+β+γ | 0.401 | 0.400 | 0.368 (0.06) | 0.522 | 0.516 | 0.505 (0.05) | 0.722 | 0.717 | 0.713 (0.05) |
We report precision (P), recall (R), and F1-score, with standard deviations in parentheses.
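The metrics named in the table notes can be computed as in the following sketch. Two assumptions are made and should be checked against the paper: per-class scores are macro-averaged, and the spread across runs is the sample standard deviation. The labels, predictions, and per-run scores below are invented toy values, not results from the study.

```python
import statistics


def precision_recall_f1(gold, predicted, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    classes = sorted(set(gold))
    return statistics.mean(precision_recall_f1(gold, predicted, c)[2] for c in classes)


def summarize_runs(scores):
    """Mean and sample standard deviation over repeated runs."""
    return statistics.mean(scores), statistics.stdev(scores)


# Toy ternary example with invented labels.
gold = ["pos", "pos", "neg", "neg", "neu", "neu"]
pred = ["pos", "neg", "neg", "neg", "neu", "pos"]
score = macro_f1(gold, pred)
mean_f1, std_f1 = summarize_runs([0.70, 0.72, 0.71, 0.69, 0.73])  # hypothetical runs
```

Reporting the standard deviation next to the mean makes it easier to judge whether a difference between two rows exceeds the variation between random seeds.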
Relation detection results of the multi-modal model using the recurrent EEG decoding component.
| Baseline | 0.404 | 0.501 | 0.609 | 0.539 (0.05) | 0.522 | 0.623 (0.05) | |||
| + noise | 0.420 | 0.424 | 0.408 (0.07) | 0.577 | 0.497 | 0.532 (0.03) | 0.585 | 0.625 (0.03) | |
| + ET | 0.421 | 0.404 | 0.402 (0.06) | 0.547 | 0.476 | 0.506 (0.04) | 0.661 | 0.631 | 0.644 (0.03) |
| + EEG full | 0.345 | 0.343 | 0.334 (0.05) | 0.511 | 0.387 | 0.432 (0.09) | 0.652 | 0.690 | 0.668 |
| + EEG θ | 0.421 | 0.414 (0.07) | 0.508 | 0.539 (0.07) | 0.646 | 0.736 | 0.684 | ||
| + EEG α | 0.368 | 0.373 | 0.358 (0.12) | 0.515 | 0.652 | 0.715 | 0.679 | ||
| + EEG β | 0.349 | 0.340 | 0.329 (0.09) | 0.581 | 0.497 | 0.532 (0.10) | 0.674 | 0.726 | |
| + EEG γ | 0.410 | 0.399 | 0.397 (0.05) | 0.554 | 0.488 | 0.514 (0.09) | 0.666 | 0.715 | 0.686 |
| +θ+α+β+γ | 0.370 | 0.376 | 0.363 (0.09) | 0.554 | 0.488 | 0.514 (0.09) | 0.675 | 0.646 | 0.659 (0.04) |
We report precision (P), recall (R), and F1-score, with standard deviations in parentheses.
* denotes p < 0.05 (uncorrected); + denotes p < 0.003 (Bonferroni-corrected p-value).
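Since a sentence in the relation detection dataset can carry more than one relation type, a decision threshold is needed to turn per-relation scores into a set of predicted labels. The sketch below shows one plausible use of the tested thresholds 0.3, 0.5, and 0.7. It assumes the model emits one score in [0, 1] per relation type; how those scores are produced is not specified here. "Job Title" and "Visited" are relation types named in this record, "Education" is a hypothetical third label, and the scores are invented.

```python
def predict_relations(scores, relation_names, threshold=0.5):
    """Return every relation whose score reaches the threshold."""
    if len(scores) != len(relation_names):
        raise ValueError("scores and relation_names must have the same length")
    return [name for name, s in zip(relation_names, scores) if s >= threshold]


relation_names = ["Job Title", "Visited", "Education"]  # "Education" is hypothetical
scores = [0.88, 0.38, 0.52]  # invented per-relation scores for one sentence

by_threshold = {t: predict_relations(scores, relation_names, t) for t in (0.3, 0.5, 0.7)}
```

A lower threshold predicts more relations per sentence and tends to raise recall at the cost of precision, while a higher threshold does the opposite, which is why the threshold is tuned alongside the other hyper-parameters.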
Binary sentiment analysis results of the multi-modal model using the convolutional EEG decoding component.
| Baseline | 0.572 | 0.573 | 0.552 (0.07) | 0.751 | 0.738 | 0.728 (0.08) | 0.900 | 0.899 | 0.893 (0.04) |
| + noise | 0.558 | 0.584 | 0.528 (0.11) | 0.780 | 0.767 | 0.762 (0.06) | 0.895 | 0.887 | 0.883 (0.05) |
| + ET | 0.617 | 0.623 | 0.610 (0.07) | 0.790 | 0.790 | 0.783 (0.06) | 0.896 | 0.887 | 0.881 (0.05) |
| + EEG full | 0.588 | 0.583 | 0.572 (0.04) | 0.778 | 0.774 | 0.772+ (0.05) | |||
| + EEG θ | 0.564 | 0.569 | 0.535 (0.08) | 0.792 | 0.791+ (0.04) | 0.922 | 0.919 | 0.917 | |
| + EEG α | 0.596 | 0.593 | 0.563 (0.08) | 0.775 | 0.781 | 0.772 | 0.920 | 0.917 | 0.916 |
| + EEG β | 0.605 | 0.597 | 0.580 (0.08) | 0.802 | 0.920 | 0.914 | 0.914 | ||
| + EEG γ | 0.787 | 0.780 | 0.776+ (0.05) | 0.905 | 0.905 | 0.901 (0.04) | |||
| +θ+α+β+γ | 0.599 | 0.579 | 0.558 (0.07) | 0.800 | 0.794 | 0.786+ (0.05) | 0.909 | 0.910 | 0.907 (0.04) |
We report precision (P), recall (R), and F1-score, with standard deviations in parentheses.
* denotes p < 0.05 (uncorrected); + denotes p < 0.003 (Bonferroni-corrected p-value).
Ternary sentiment analysis results of the multi-modal model using the convolutional EEG decoding component.
| Baseline | 0.408 | 0.384 | 0.351 (0.07) | 0.510 | 0.507 | 0.496 (0.06) | 0.722 | 0.714 | 0.710 (0.05) |
| + noise | 0.359 | 0.388 | 0.334 (0.09) | 0.494 | 0.484 | 0.476 (0.07) | 0.715 | 0.683 | 0.684 (0.05) |
| + ET | 0.417 | 0.399 | 0.372 (0.05) | 0.509 | 0.512 | 0.500 (0.07) | 0.721 | 0.687 | 0.670 (0.05) |
| + EEG full | 0.365 | 0.384 | 0.333 (0.08) | 0.488 | 0.484 | 0.476 (0.06) | 0.738 | 0.724 | |
| + EEG θ | 0.389 | 0.372 | 0.330 (0.06) | 0.511 | 0.495 | 0.477 (0.06) | 0.727 | 0.718 | 0.716+ (0.05) |
| + EEG α | 0.357 | 0.382 | 0.331 (0.11) | 0.534 | 0.525 | 0.515+ (0.06) | 0.732 | 0.715 | 0.713+ (0.04) |
| + EEG β | 0.534 | 0.727 | 0.717 | 0.715 (0.04) | |||||
| + EEG γ | 0.404 | 0.406 | 0.360 (0.08) | 0.521 | 0.514 (0.06) | 0.721+ (0.04) | |||
| +θ+α+β+γ | 0.384 | 0.402 | 0.354 (0.10) | 0.517 | 0.504 | 0.488 (0.05) | 0.717 | 0.715 (0.06) | |
We report precision (P), recall (R), and F1-score, with standard deviations in parentheses.
Relation detection results of the multi-modal model using the convolutional EEG decoding component.
| Baseline | 0.404 | 0.501 | 0.539 (0.05) | 0.522 | 0.623 (0.05) | ||||
| + noise | 0.424 | 0.299 | 0.342 (0.06) | 0.547 | 0.441 | 0.486 (0.06) | 0.532 | 0.493 | 0.511 (0.07) |
| + ET | 0.415 | 0.307 | 0.345 (0.08) | 0.447 | 0.413 | 0.428 (0.07) | 0.558 | 0.665 | 0.593 (0.13) |
| + EEG full | 0.225 | 0.225 | 0.225 (0.06) | 0.548 | 0.408 | 0.464 (0.07) | 0.647 | 0.664 | 0.650 (0.09) |
| + EEG θ | 0.380 | 0.400 (0.05) | 0.620 | 0.493 | 0.547 (0.05) | 0.698 | |||
| + EEG α | 0.372 | 0.366 | 0.352 (0.12) | 0.509 | 0.433 | 0.461 (0.12) | 0.661 | 0.697 | 0.675+ (0.08) |
| + EEG β | 0.394 | 0.328 | 0.338 (0.09) | 0.627 | 0.479 | 0.541 (0.05) | 0.643 | 0.646 | 0.640 (0.11) |
| + EEG γ | 0.405 | 0.363 | 0.366 (0.09) | 0.490 | 0.667 | 0.699 | 0.679+ (0.06) | ||
| +θ+α+β+γ | 0.324 | 0.227 | 0.257 (0.11) | 0.460 | 0.436 | 0.437 (0.14) | 0.610 | 0.562 | 0.584 (0.05) |
We report precision (P), recall (R), and F1-score, with standard deviations in parentheses.
Figure 4. Data ablation for all three word embedding types for the binary sentiment analysis task using the recurrent EEG decoding component. The shaded areas represent the standard deviations.
Figure 5. Data ablation for all three word embedding types for the binary sentiment analysis task using the convolutional EEG decoding component. The shaded areas represent the standard deviations.
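A data ablation of the kind shown in Figures 4 and 5 trains the same model on progressively smaller portions of the training data. The sketch below builds nested training subsets from a seeded shuffle. The fractions, the dataset size, and the function name are illustrative assumptions; the record does not state which proportions the authors used.

```python
import random


def ablation_subsets(examples, fractions, seed):
    """Return {fraction: subset}, where smaller subsets are prefixes of larger ones."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    return {
        fraction: shuffled[: max(1, round(fraction * len(shuffled)))]
        for fraction in fractions
    }


training_ids = list(range(400))        # hypothetical training set of 400 sentence ids
fractions = (0.1, 0.25, 0.5, 1.0)      # hypothetical ablation steps
subsets = ablation_subsets(training_ids, fractions, seed=42)
sizes = {fraction: len(subset) for fraction, subset in subsets.items()}
```

Keeping the subsets nested means each larger training set only adds examples, so changes in the score can be attributed to the amount of data rather than to a different sample. Repeating the procedure with several seeds would give the spread that the shaded areas in such plots represent.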
Binary relation detection results for both EEG decoding components for the relation types Job Title and Visited using GloVe embeddings.
| GloVe | 0.789 | 0.776 | 0.767 (0.05) | 0.789 | 0.776 | 0.767 (0.05) |
| GloVe + EEG full | 0.782 | 0.773 (0.06) | 0.796 | 0.793 | 0.789 (0.05) | |
| GloVe + EEG γ | 0.780 | |||||
| GloVe | 0.762 | 0.756 | 0.734 (0.1) | 0.762 | 0.756 | 0.734 (0.1) |
| GloVe + EEG full | 0.756 | 0.759 | 0.745 (0.1) | 0.766 | 0.758 | 0.750 (0.09) |
| GloVe + EEG γ | ||||||
The best result in each column is marked in bold.