Are We in Time? How Predictive Coding and Dynamical Systems Explain Musical Synchrony.

Abstract

Humans tend to anticipate events when they synchronize their actions with sound (such as when they clap to music), which has puzzled scientists for decades. What accounts for this anticipation? We review two theoretical mechanisms for synchrony: predictive coding and dynamical systems. Both theories are grounded in neural activation patterns, but there are important distinctions. We contrast their assumptions, their computations, and their musical applications to anticipatory synchronization.

Keywords: anticipatory synchronization; coupling; dynamical systems; internal models; predictive coding

Year: 2022 PMID: 35400858 PMCID: PMC8988459 DOI: 10.1177/09637214211053635

Source DB: PubMed Journal: Curr Dir Psychol Sci ISSN: 0963-7214

Some of the most intricate temporal sequences that humans produce are the sounds they use to communicate, including speech and music. People precisely coordinate their movements with sound; speakers prepare for their turn to talk, and musicians time their tones to synchronize or align with sound. Even young toddlers spontaneously drum or sway in response to auditory rhythms, which suggests that humans are predisposed to synchronize movement with sound. Because movement preparation is often slower (100–250 ms) than the rate of musical events (up to 8–10 tones/s), individuals must anticipate future musical events in order to prepare their movements. A striking feature of musical synchronization is the tendency people show to produce musical events prior to the sound with which they intend to synchronize; musicians produce tones about 30 to 50 ms sooner than a regular auditory beat, and nonmusicians anticipate even sooner (50–80 ms before the beat; Repp & Siu, 2015). This behavior is referred to as anticipatory synchronization, a cornerstone of human synchrony.

Several key variables influence individuals’ ability to synchronize actions with musical sound. One is musical training, which reduces the anticipatory synchrony, defined as negative mean asynchrony (when individuals’ produced actions precede stimulus onsets, such as when they clap to music and their claps precede the onsets of the musical tones). Another factor is sensory feedback; the more feedback available from self-generated and external auditory outcomes, the smaller the asynchrony (Repp & Siu, 2015). A third factor is the predictability of auditory sequences; the more regular a sequence, the smaller the observed variability in asynchrony. Measures of the mean and variance of asynchrony are used by synchronization models in different ways, as we discuss below.

Two prominent theories offer relevant mechanisms for musical synchronization. One is predictive coding (PC), and the other is dynamical systems (DS). We compare these theories as applied to anticipatory synchronization and discuss their assumptions, computations, and limitations (see Table 1 for a summary of key definitions).
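The two summary measures mentioned above, the mean and the variability of asynchrony, can be illustrated with a minimal sketch. The tap and metronome times below are made-up values for illustration, not data from any cited study, and the function name `asynchronies` is hypothetical.

```python
from statistics import mean, stdev


def asynchronies(tap_times, stimulus_onsets):
    """Signed asynchrony per event: tap time minus stimulus onset.

    Negative values mean the tap preceded the sound (anticipation).
    """
    if len(tap_times) != len(stimulus_onsets):
        raise ValueError("each tap must be paired with one stimulus onset")
    return [tap - onset for tap, onset in zip(tap_times, stimulus_onsets)]


# Hypothetical data (ms): a metronome every 500 ms, taps roughly 40 ms early.
onsets_ms = [0, 500, 1000, 1500, 2000]
taps_ms = [-45, 462, 958, 1465, 1960]

asyncs_ms = asynchronies(taps_ms, onsets_ms)
mean_async_ms = mean(asyncs_ms)   # negative mean asynchrony = anticipation
sd_async_ms = stdev(asyncs_ms)    # variability of asynchrony
```

In this sketch the mean captures the direction and size of anticipation, and the standard deviation captures how consistent the taps are from event to event.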

Table 1.

Key Definitions

Predictive coding	Dynamical systems
Oscillation: Waves of excitation and inhibition arising at different hierarchical levels and at different frequencies among brain networks (e.g., prediction errors propagated via gamma-frequency oscillations, predictions propagated via beta-frequency oscillations; Heilbron & Chait, 2018).	Oscillation: A periodic, recurring time series with an associated amplitude and frequency (Pikovsky et al., 2001), such as the movement of a swing at a playground. Often self-sustaining (i.e., continuing in the absence of input), oscillations are modeled with mathematical equations.
Predictive timing: The use of past information to generate an internal model capable of anticipating the timing of future events (Ravignani et al., 2014). Predictive timing is often contrasted with reactive timing.	Anticipatory synchronization: Behavior of an oscillator that maintains a stable negative (anticipatory) phase relationship relative to another oscillator or stimulus. This oscillatory behavior demonstrates anticipatory synchronization in the absence of internal models (Stepp & Turvey, 2010).
Free energy: Energy available for a system to perform work or cause change. Based on thermodynamics principles, free energy is an information-theory measure that constrains surprise arising from model-based prediction (Friston, 2009).	Energy expenditure: The amount of energy used to implement change. States of least energy expenditure define steady (stable) states to which a system returns (Strogatz, 1994), such as runners’ preferred rates that minimize oxygen consumption.
Prediction error: The divergence of the sensory input from the expectation (i.e., the prior). Prediction error is used to update the prior probabilities (Clark, 2013).	Phase: Position within one cycle of a periodic oscillation. The phase relationship between two oscillations is often measured as a proportion of (divided by) the period (1/frequency) of the oscillations. Phase can be measured in radians or in degrees.
Prior (probability): In Bayesian inference, the expected probability of a hypothesized outcome before it is known. In common variations of the term, the word “prior” is followed by “probability,” “distribution” (or “statistical information”), “belief,” “knowledge,” “assumption,” or “expectancy.” The word often refers to internal a priori “knowledge accumulated through experience” (Vuust & Witek, 2014, p. 3).	Coupling: A parameter that defines how oscillators influence each other (share information) or how a stimulus influences an oscillator. Coupling can cause oscillators to match in their phase, their frequency, or both (Pikovsky et al., 2001).
Posterior (probability): In Bayesian inference, the updated probability after the outcome is known. The posterior probability is calculated as the likelihood that the hypothesis predicts the input multiplied by the prior probability of the hypothesis (Denham & Winkler, 2020, p. 1157). Common variants of the term parallel the common variants of “prior” (e.g., “posterior” may be followed by “probability,” “distribution,” “belief,” “knowledge,” “assumption,” or “expectancy”).	Delay coupling: An oscillator model that contains a coupling term combined with a time delay and that is implemented in differential equations (Voss, 2000). The time delay can refer to different functional properties (transmission delay, delay of sensory feedback) of the oscillator.

Predictive Coding

Origins

Early origins of the concept of predictive coding have been traced back to Helmholtz, whose account of learning was based on hierarchical layers of representation (as described in Friston, 2009). More recent precursors to predictive coding include internal models of motor commands (Wolpert et al., 1995) and mental simulations of partners’ joint actions (Sebanz & Knoblich, 2009), according to which internal modeling of future events is based on representations of the mapping between one’s perceptions and actions (Clark, 2013).

Internal models of motor control predict the sensory consequences of actions by simulating behavior and compensating for time delays in sensorimotor systems (Wolpert & Flanagan, 2001). Internal models have been applied to poor-pitch singing through comparison of emulations of vocal-fold tension with auditory feedback from resulting pitches (Pfordresher & Mantell, 2014). Extensions to interpersonal, or joint, action propose that a partner’s actions are simulated through the use of internal models to generate predictions for upcoming events (Sebanz & Knoblich, 2009). A musical application of joint action combines simulation of actions for oneself and for one’s partners, as well as imagining sounds produced by partners, with error correction (van der Steen & Keller, 2013).

Recent PC models compute the difference between an internal model and perceived events with the goal of minimizing the entropy (the inverse of predictability) of predictions relative to future observed outcomes (Friston, 2009, 2018). Influenced by neurophysiological evidence of cascading processes among hierarchical levels in the visual system (Rao & Ballard, 1999), recent PC models rely on Bayesian inference to update internal models and have been applied to explain binocular rivalry (i.e., alternating perception of different images presented to the two eyes) and bistable visual percepts (i.e., percepts with two stable states; see Clark, 2013, for a review). Applications of PC models to auditory perception have recently been assessed (Denham & Winkler, 2020; Heilbron & Chait, 2018).

Assumptions

One key assumption of this approach is that an internal model is used to predict future behavior. Comparison of perceptual input with the prediction generates a difference (prediction error) that is used to adjust the internal model and then discarded. There is no need to store the original perceptual input or the prediction error, and thus the required representation is parsimonious. A second assumption is a hierarchical organization of brain networks in which lower areas receive sensory input that is projected to higher cortical areas. A hierarchy is formed by forward excitatory connections from sensory areas to association areas and by backward inhibitory connections downward, as well as by lateral interactions between units within layers. The hierarchical system permits the computation of prediction error at one level while the internal model is updated at another level; the computations occur in a cascaded fashion. Although the temporal properties of this cascade of forward, backward, and lateral connections are not well specified, the spatial projections are proposed to include the auditory network from brain stem to cortex (Koelsch et al., 2019).

Computations

To create accurate predictions of future states, PC models minimize prediction error over iterations, on the basis of Bayesian inference. The bottom layer of the hierarchy contains error units that sense the incoming information and compute the prediction error, which is sent to higher layers. The top layer encodes the prior probabilities, that is, the likelihoods that given sensory inputs correspond to predicted states. This is the internal model. State units in the top layer (actively) generate the prediction signal and send it downward to the error units. In addition, units within a level are connected laterally. Error units and state units have bidirectional signals: Forward connections convey prediction error to state units, and reciprocal backward connections send the predictions to error units. These bottom-up and top-down messages serve to minimize the use of free energy (Friston, 2009). Minimizing free energy corresponds to maximizing the probability that the sensory input matches the predicted outcome, given the internal model. Over successive iterations, the state units’ prior probabilities (existing beliefs before new evidence is introduced) become closer to the error units’ posterior probabilities (i.e., updated beliefs after new evidence is introduced). Eventually, when prior probabilities match posterior probabilities, the internal model is stable.
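The iterative updating described above can be illustrated with a deliberately simplified sketch. It is not the hierarchical free-energy scheme of Friston (2009): it assumes a single level, Gaussian beliefs, and a fixed observation noise, and it treats each posterior as the next prior. The function name `update_belief` and all numbers are hypothetical.

```python
def update_belief(prior_mean, prior_var, observation, obs_var):
    """One Bayesian update of a Gaussian belief about a timing interval.

    Returns the posterior mean, the posterior variance, and the prediction
    error (observation minus prediction). Assumes Gaussian prior and noise.
    """
    error = observation - prior_mean            # prediction error
    gain = prior_var / (prior_var + obs_var)    # how much the error is trusted
    post_mean = prior_mean + gain * error
    post_var = (1.0 - gain) * prior_var
    return post_mean, post_var, error


# Hypothetical example: the listener expects 600-ms intervals but hears 500 ms.
mean_ms, var_ms2 = 600.0, 400.0
updates = []  # size of each adjustment to the internal model
for _ in range(9):
    new_mean, var_ms2, error = update_belief(mean_ms, var_ms2, 500.0, 400.0)
    updates.append(abs(new_mean - mean_ms))
    mean_ms = new_mean  # the posterior becomes the prior for the next event
```

Under these assumptions the adjustments shrink from one event to the next, which mirrors the statement that the model stabilizes as prior and posterior converge.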

Applications to music

Vuust and Witek (2014) proposed a PC model to explain perception of musical syncopation. Syncopation, the occurrence of musical tones in unexpected metrical locations, is typical of groove music and often causes listeners to move, sway, or dance. Musical meter, a hierarchy of regular pulses that form patterns of strong and weak beats, provides the prior experience that tones will occur more often on strong beats. A listener’s internal model of meter generates predictions that include the movements needed to produce the syncopated rhythm, while suppressing the listener’s tendency toward overt action. This suppressed tendency to move arises from predictions of when the beat is expected to occur (see Koelsch et al., 2019, for related neurophysiological evidence). Prediction errors have two important parameters: a mean value and a variance associated with the mean. Vuust et al. (2018) observed that high amounts of syncopation are associated with high variance in prediction error (low precision). The more precise the prediction error, the more impact that error is expected to have. Only prediction errors associated with small variances cause the higher-level predictions to be adjusted; prediction errors with large variances are ignored (Vuust et al., 2018), a claim that is testable in synchronization tasks.

Two studies have tested PC models of anticipatory synchronization. The first study (Heggli, Konvalinka, et al., 2019) tested dyadic partners’ internal models as they synchronized with each other while they tapped identical duration patterns. Each partner heard over headphones a metrical context that was either the same one their partner heard (which created similar shared internal models with matched top-down predictions) or a different one (which created dissimilar internal models with unmatched top-down predictions). These matched and mismatched metrical contexts manipulated the top-down prior beliefs for when the partners’ taps should occur. Some of the contexts created polyrhythms (i.e., when participants tapped three times to every four context beats) that could be perceived as ambiguous. Small mean negative asynchronies were observed on average. Asynchronies became more variable when the two partners’ metrical contexts differed (although partners recovered quickly within trials), which was interpreted as evidence for prediction errors resulting from the discrepancy in the partners’ different internal models.

The second study that tested PC models of anticipatory synchronization (Elliott et al., 2014) involved individual participants who were presented with two auditory cues that created ambiguous percepts with which they tried to synchronize. The two stimulus cues were altered in their asynchrony (relative phase) and temporal regularity across trials. Participants’ large mean negative asynchronies decreased (i.e., moved toward 0) as the two cues’ relative phase increased (i.e., diverged from 0). Participants used different strategies depending on the cues’ relative phase: They synchronized to the integrated (combined) cues when the cues’ phase difference was small, but they synchronized with one cue or the other when the phase difference was large. Models fitted to the asynchrony variabilities provided support for a Bayesian causal inference model with four free parameters that captured prior probabilities associated with the auditory cues; prediction errors indicated that participants attempted to minimize the variance in asynchrony and extract a temporally regular beat.
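The two strategies described for the two-cue task (combining cues when they are close, following one cue when they are far apart) can be sketched as follows. This is only an illustration and does not reproduce the four-parameter Bayesian causal inference model of Elliott et al. (2014): the fusion step is standard precision-weighted averaging of Gaussian cues, and the hard cutoff `max_gap_ms` is a hypothetical stand-in for the probabilistic decision about whether the cues share a common source.

```python
def fuse_cues(onset_a, var_a, onset_b, var_b):
    """Precision-weighted average of two cue onsets (Gaussian assumption).

    The cue with the smaller variance (higher precision) gets more weight.
    Returns the fused onset and the variance of the fused estimate.
    """
    precision_a, precision_b = 1.0 / var_a, 1.0 / var_b
    weight_a = precision_a / (precision_a + precision_b)
    fused = weight_a * onset_a + (1.0 - weight_a) * onset_b
    fused_var = 1.0 / (precision_a + precision_b)
    return fused, fused_var


def target_onset(onset_a, var_a, onset_b, var_b, max_gap_ms):
    """Pick the onset to synchronize with.

    Integrate the cues when they are close in time; otherwise follow the
    more reliable cue. max_gap_ms is a hypothetical illustrative cutoff.
    """
    if abs(onset_a - onset_b) <= max_gap_ms:
        return fuse_cues(onset_a, var_a, onset_b, var_b)[0]
    return onset_a if var_a <= var_b else onset_b


# Hypothetical cues (ms): cue A is more regular (lower variance) than cue B.
near = target_onset(0.0, 100.0, 30.0, 300.0, max_gap_ms=50.0)   # integrated
far = target_onset(0.0, 100.0, 120.0, 300.0, max_gap_ms=50.0)   # segregated
```

A full causal inference model would replace the cutoff with a posterior probability that the two cues share a cause, so the switch between strategies would be graded rather than abrupt.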

Remaining issues

PC applications to musical synchrony to date have targeted ambiguous percepts in order to demonstrate the internal model’s impact. One open question is whether PC models can predict anticipatory synchrony (captured by mean behavior). Another is whether an internal model can change only in the presence of prediction error (when contextual information is available), or if it will also change in the absence of novel information. A final question is, what are the time costs of implementing a hierarchy that relies on several steps between layers for prediction error to be computed and sent upward and for prediction signals to be adjusted and sent downward?

Dynamical Systems

DS theories, which explain how systems change over time, have their origins in analysis of physical synchrony among pendulum clocks (Huygens, 1665, described in Pikovsky et al., 2001) as well as biological synchrony in circadian rhythms, cardiac rhythms, and motor coordination (Haken et al., 1985; Winfree, 1967). Synchronization arises when oscillators (alternating waveforms that repeat) that are self-sustaining (continue in the absence of external input) share information via coupling, which causes them to adapt to each other or to a stimulus (see Table 1). Their self-sustaining nature is supported by neural (magnetoencephalographic) representations of musical pulse that develop at a beat frequency despite the absence of stimulus energy at that frequency (Tal et al., 2017). When an oscillator is momentarily disturbed by input, it soon returns to its original frequency, its stable state of minimum energy expenditure. An oscillator will resonate (respond with increased amplitude) to a stimulus when its natural frequency is close to the stimulus frequency or when the coupling between oscillator and stimulus is high. Linear oscillators synchronize 1:1 with stimulus events; nonlinear oscillators additionally synchronize at higher resonances (e.g., 1:2; 1:3). The perception of hierarchical meter thus arises from higher-order resonances; internal models are not required (Large, 2008).

Anticipatory synchronization arises in delay-coupled systems when a driven oscillator (i.e., an oscillator affected by external input) compares its own time-delayed memory of a previous state with the current input (Voss, 2000). Delay coupling refers to coupling between two or more oscillators that is modulated by time delay from at least one oscillator. Stepp and Turvey (2010) described delay coupling as “strong” anticipation: Anticipated future states are based on present and past information already in the system without the need for internal models. Machado and Matias (2020) demonstrated the biological plausibility of delay-coupled models by simulating delay coupling in spiking neuronal populations that lead to bistable visual percepts. Time delays in DS models have been implemented to account for intrabrain synchronization dynamics (Deco et al., 2009, 2011) as well as interpersonal synchronization (Varlet et al., 2012).

DS theories of human synchrony assume that behavior arises in a system composed of coupled subsystems of oscillators whose emergent processes explain perception and cognition. Individual oscillations arise from the joint activity between coupled excitatory and inhibitory neurons and are modeled either at the biophysical level or, more commonly, by using oscillator models with simplifying assumptions. Oscillatory time series arise from neuronal interactions, as well as from neurons interacting with external quasiperiodic stimuli, such as musical sequences. Most DS theories assume continuous change over time, but they can also be modeled discretely, as in the case of musical pulse (Large, 2008). Time delay in DS models, often implemented as a constant for simplicity, is assumed to represent the synaptic transmission rate in a neural system (Machado & Matias, 2020).

Delay-coupled DS models have been proposed to account for anticipatory synchronization. Based on self-sustaining oscillators, the coupling term in these models is often adapted from the Kuramoto model (Strogatz, 2000), which assumes that each oscillator has an intrinsic frequency that can be tuned during development via Hebbian learning (strengthened connections between simultaneously firing neurons; Tichko et al., 2021). Oscillators adapt their phase on the basis of how much they differ from incoming stimulus onsets, modulated by the coupling strength; higher coupling means more phase correction and faster synchronization. Time delay compares the oscillator’s past (at a constant delay) with the current input (Voss, 2000), thus providing a type of self-feedback. Most critical is that the oscillator’s past incorporates the past of the stimulus; thus, the oscillator (passively) reacts to present events as a function of present and past states of both stimulus and oscillator.

Roman et al. (2019) tested synchronization patterns of individuals who tapped to a metronome, using a single Hopf oscillator (an oscillator whose intrinsic frequency can adapt to a stimulus frequency) with time delay to represent unidirectional coupling with the metronome. The model, designed to mimic neural oscillations and adapt its frequency, simulated the neural delays necessary to account for the observed tappers’ negative mean asynchrony. The model successfully predicted the difference in the degree of anticipatory synchronization between musicians and nonmusicians who tapped at a range of metronome rates.

Demos et al. (2019) applied a bidirectional delay-coupled model, adapted from the Kuramoto model, to the more complex case of asymmetric anticipatory synchronization between partners playing musical duets. Each partner was modeled as a simple oscillator with parameters for time delay, coupling, and intrinsic frequency. Each oscillator could be driven by or could drive the other oscillator, depending on whether one partner heard feedback, both partners heard feedback, or neither partner heard feedback. The model accounted for anticipatory synchrony under experimental conditions that manipulated auditory feedback to shift the oscillators from bidirectional to unidirectional information transmission. The use of delay-coupling terms successfully predicted the driven partner’s mean anticipatory synchronization (e.g., the driven partner performed earlier than the driver partner) when the driver partner could not hear the driven partner; this was the first delay-coupling implementation to extend beyond single-person models of musical synchrony to interpersonal interaction.

Heggli, Cabral, et al. (2019) examined how partners alter their synchronization in a tapping task, using bidirectional coupling but with no time delay. Two oscillators modeled each individual’s perception and action as coupled; in addition, bidirectional coupling linked each person’s action to the partner’s perception. The model captured different patterns of synchronization variability, such as when partners showed mutual adaptation (increased coupling) or when both partners tried to lead at the same time (reduced coupling). Similar coupled oscillator networks without time delay can produce anticipatory synchronization (Pyragienė & Pyragas, 2015).

The delay-coupling term in DS models accounts for both individual and dyadic anticipatory synchronization. One question is how time delays map to neural features, such as synaptic transmission (Machado & Matias, 2020), and to functional properties of neural networks (Deco et al., 2011). Another question is whether the delay is constrained by the individual, task, or social context. Finally, how can these parameters be interpreted and connected back to behavior given the complexity of their nonlinear interactions?
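The delay-coupling mechanism discussed in this section, in which a driven oscillator compares its own delayed state with the current input, can be illustrated with a minimal phase-oscillator sketch. It is not the Hopf model of Roman et al. (2019) or the duet model of Demos et al. (2019). It assumes a Kuramoto-style sine coupling, a constant delay, a perfectly regular stimulus, and simple Euler integration; the function name and all parameter values are hypothetical.

```python
import math


def steady_state_asynchrony(omega_osc, omega_stim, coupling, delay,
                            dt=0.001, duration=20.0):
    """Simulate d(theta)/dt = omega_osc + K * sin(phi(t) - theta(t - delay)).

    phi(t) = omega_stim * t is the stimulus phase and theta is the driven
    oscillator's phase. Returns the final asynchrony in seconds; negative
    values mean the oscillator's events precede the stimulus events.
    """
    n_steps = int(round(duration / dt))
    lag = int(round(delay / dt))            # delay expressed in time steps
    theta = [0.0] * (n_steps + 1)           # phase history (the "memory")
    for i in range(n_steps):
        phi = omega_stim * (i * dt)         # current stimulus phase
        # Delayed own state; before enough history exists, use the start value.
        theta_delayed = theta[i - lag] if i >= lag else theta[0]
        theta[i + 1] = theta[i] + dt * (
            omega_osc + coupling * math.sin(phi - theta_delayed)
        )
    phase_lead = theta[n_steps] - omega_stim * (n_steps * dt)
    return -phase_lead / omega_stim         # convert phase lead to time


# Hypothetical values: 2-Hz beat, matched intrinsic frequency, 50-ms delay.
# Coupling times delay is kept small so the phase relation settles.
omega = 2.0 * math.pi * 2.0
asynchrony_s = steady_state_asynchrony(omega, omega, coupling=5.0, delay=0.05)
```

Under these assumptions the oscillator settles into leading the stimulus by about the length of the delay, so a negative mean asynchrony emerges from the coupling itself rather than from a stored prediction.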

Model Comparisons

Although assumptions and computations differ between PC and DS models, they have similarities, too. Both classes of models rely on interconnections between excitatory and inhibitory neurons. The primary differences concern the role of time and the organizational architecture: The hierarchical representations in PC models include several layers, more types of nodes than DS models (only PC models include modulatory nodes), and more connections than the bottom-up DS oscillations, which are based solely on excitatory-inhibitory interactions. These computational distinctions suggest that the models diverge in parsimony. These distinctions parallel differences in the synchrony behaviors accounted for: The PC models’ Bayesian properties account for variability in synchrony, whereas the DS models’ delay-coupled differential equations account for mean (directional) synchrony. Although these theories are not mutually exclusive, important paradigmatic and architectural differences prevent an easy merger of them: PC models, evolved from information theory and systems neuroscience, make top-down/bottom-up distinctions (such as in Heggli, Konvalinka et al.’s, 2019, mismatched internal models of musical partners). In contrast, DS models, evolved from physics and mathematical biology, include interactions across multiple spatiotemporal scales (such as unidirectional and bidirectional coupling within and across ensemble musicians). In short, predictive coding explains synchrony by generating predictions based on prior learning; dynamical systems explain synchrony by coupling already existing oscillations without a need to generate predictions.

Conclusions

PC and DS models of musical synchronization share similarities, including the goal to minimize energy; reliance on differences between expected and observed outcomes; and grounding in neurophysiological models of excitatory and inhibitory activation. The models have important differences in how much they rely on prior knowledge about the resulting output, how previous adaptations to stimuli are retained, and whether they are intended to account for the mean or variability of synchrony. To date, only DS theories have successfully modeled anticipatory synchronization. Future research directions include modeling noisy conditions; explaining roles of contextual learning, musical pleasure, and reward; and scaling up to larger groups. Advances are likely to be assisted by machine learning and other mathematical tools for capturing musical synchrony.

Recommended Reading

1. Staying Together: A Bidirectional Delay-Coupled Approach to Joint Action.

2. Whatever next? Predictive brains, situated agents, and the future of cognitive science.

3. A theoretical model of phase transitions in human hand movements.

4. Phase bistability between anticipated and delayed synchronization in neuronal populations.

5. On Strong Anticipation.

6. Chorusing, synchrony, and the evolutionary functions of rhythm.

7. Musical interaction is influenced by underlying predictive models and musical expertise.

8. A Kuramoto model of self-other integration across interpersonal synchronization strategies.

9. The ADaptation and Anticipation Model (ADAM) of sensorimotor synchronization.

10. Delayed feedback embedded in perception-action coordination cycles results in anticipation behavior during synchronized rhythmic action: A dynamical systems approach.

1. Social Interaction and Rate Effects in Models of Musical Synchronization.