Emmanuel Biau, Sonja A Kotz.
Abstract
How the brain decomposes and integrates information in multimodal speech perception is linked to oscillatory dynamics. However, how speech takes advantage of redundancy between different sensory modalities, and how this translates into specific oscillatory patterns, remain unclear. We address the role of lower beta activity (~20 Hz), generally associated with motor functions, as an amodal central coordinator that receives bottom-up delta-theta copies from specific sensory areas and generates top-down temporal predictions for auditory entrainment. Dissociating temporal prediction from entrainment may explain how and why visual input benefits speech processing rather than adding cognitive load in multimodal speech perception. On the one hand, body movements convey prosodic and syllabic features at delta and theta rates (1–3 Hz and 4–7 Hz, respectively). On the other hand, the natural precedence of visual input before auditory onsets may prepare the brain to anticipate and facilitate the integration of auditory delta-theta copies of the prosodic-syllabic structure. Here, we identify three fundamental criteria, based on recent evidence and hypotheses, which support the notion that lower motor beta frequency may play a central and generic role in temporal prediction during speech perception. First, beta activity must respond to rhythmic stimulation across modalities. Second, beta power must respond to biological motion and speech-related movements conveying temporal information in multimodal speech processing. Third, temporal prediction may recruit a communication loop between motor and primary auditory cortices (PACs) via delta-to-beta cross-frequency coupling. We discuss evidence related to each criterion and extend these concepts to a beta-motivated framework of multimodal speech processing.
Keywords: beta oscillations; biological motion; multimodal speech perception; prosody; temporal predictions
Year: 2018 PMID: 30405383 PMCID: PMC6207805 DOI: 10.3389/fnhum.2018.00434
Source DB: PubMed Journal: Front Hum Neurosci ISSN: 1662-5161 Impact factor: 3.169
Figure 1. Multimodal speech perception improvement by non-verbal information in the lower motor beta oscillatory framework. The figure represents the classic left-lateralized audiovisual speech perception network and its interactions with the left motor areas in a lower beta oscillation framework. Visual and auditory sensory input is processed separately in the modality-specific visual (VC) and primary auditory cortices (PACs). Sampled auditory and visual input then reaches the secondary auditory cortex, the left posterior superior temporal gyrus (lpSTG), where multimodal integration mechanisms engage in speech perception (this may extend to the left inferior parietal lobule (IFP), considered a multisensory integration site in audiovisual speech). Finally, multimodal information is conveyed to the left inferior frontal gyrus (lIFG) for semantic integration and multimodal semantic binding. In the oscillatory framework, the visual cortex tracks non-verbal information that reflects the speech envelope structure at different time scales: mouth and jaw apertures convey syllabic information at theta rate (~4 Hz), while other body parts reflect the prosodic features of the speech envelope at the 1–3 Hz delta rate (i.e., pitch accents, amplitude, duration, silences, etc.). In the primary auditory cortex, delta and theta activity tracks the prosodic and syllabic structures composing the speech envelope of the auditory signal via phase entrainment mechanisms. While processed sensory inputs are transferred to the left STG for multimodal integration, we hypothesize that delta-theta afferent copies from primary areas are also sent to the central amodal coordinator in the motor cortex. These delta-theta afferent copies convey online information on the spectro-temporal structure of the multimodal speech (e.g., cadence, rhythm) and may facilitate the elaboration of neural top-down temporal predictions.
Crucially, visual input feeds the motor cortex first, owing to the natural precedence of visual over auditory onsets in audiovisual speech, and prepares it for the arrival of redundant delta-theta information from the auditory cortex. In the motor cortex, the reception of oscillatory stimulus-driven input allows the generation of temporal predictions supported by lower beta activity. In return, beta power supports feedback to the primary auditory areas (often reported as delta-to-beta cross-frequency coupling), optimizing their activity online for the incoming speech. In theory, if beta feedback improves auditory entrainment in the primary auditory cortex, it should also improve the quality of information conveyed to the secondary auditory cortex and thereby ensure better multimodal integration of the incoming speech. All in all, non-verbal information may boost the generation of lower beta-based temporal predictions by providing an additional copy of the delta-theta temporal structures of speech in the visual modality.
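As an illustration of what "delta-to-beta cross-frequency coupling" can mean in practice, the sketch below computes a normalized mean-vector-length index between delta phase and beta amplitude on a synthetic signal. It is a minimal example under stated assumptions, not the analysis used in the work summarized here: the function names (`band_analytic`, `delta_beta_coupling`), the 15–25 Hz beta band, the sampling rate, and the synthetic signals are all hypothetical choices made for the example. Only the 1–3 Hz delta range and the ~20 Hz beta frequency come from the text.

```python
import cmath
import math


def band_analytic(signal, fs, f_lo, f_hi):
    """Band-limited analytic signal from a direct DFT.

    Keeps only the positive-frequency bins between f_lo and f_hi (Hz),
    which amounts to an ideal band-pass filter followed by a Hilbert transform.
    Assumes 0 < f_lo <= f_hi < fs / 2.
    """
    n = len(signal)
    k_lo = math.ceil(f_lo * n / fs)
    k_hi = math.floor(f_hi * n / fs)
    out = [0j] * n
    for k in range(k_lo, k_hi + 1):
        w = 2 * math.pi * k / n
        # DFT coefficient of the signal at bin k
        coef = sum(x * cmath.exp(-1j * w * t) for t, x in enumerate(signal))
        # Add the doubled positive-frequency component back in the time domain
        for t in range(n):
            out[t] += (2 / n) * coef * cmath.exp(1j * w * t)
    return out


def delta_beta_coupling(signal, fs, delta=(1.0, 3.0), beta=(15.0, 25.0)):
    """Normalized mean vector length between delta phase and beta amplitude.

    The beta band limits are an assumption of this sketch.
    """
    phase = [cmath.phase(z) for z in band_analytic(signal, fs, *delta)]
    amp = [abs(z) for z in band_analytic(signal, fs, *beta)]
    vec = sum(a * cmath.exp(1j * p) for a, p in zip(amp, phase)) / len(amp)
    return abs(vec) / (sum(amp) / len(amp))


# Synthetic data (hypothetical): a 2 Hz delta rhythm plus a 20 Hz beta rhythm.
fs, seconds = 100, 10
t = [i / fs for i in range(fs * seconds)]
delta_wave = [math.cos(2 * math.pi * 2 * ti) for ti in t]

# Coupled case: beta amplitude rises and falls with the delta cycle.
coupled = [d + 0.5 * (1 + 0.8 * d) * math.cos(2 * math.pi * 20 * ti)
           for d, ti in zip(delta_wave, t)]
# Uncoupled case: beta amplitude is constant.
uncoupled = [d + 0.5 * math.cos(2 * math.pi * 20 * ti)
             for d, ti in zip(delta_wave, t)]

coupled_index = delta_beta_coupling(coupled, fs)
uncoupled_index = delta_beta_coupling(uncoupled, fs)
print("coupled:", round(coupled_index, 3), "uncoupled:", round(uncoupled_index, 3))
```

The index should be clearly larger for the coupled signal than for the uncoupled one, where beta amplitude carries no information about delta phase. With real recordings, one would typically use proper band-pass filters and a surrogate-based significance test rather than the ideal DFT filter used here for brevity.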