Quantitative biomechanical models can identify control parameters that are used during movements, and movement parameters that are encoded by premotor neurons. We fit a mathematical dynamical systems model including subsyringeal pressure, syringeal biomechanics and upper-vocal-tract filtering to the songs of zebra finches. This reduces the dimensionality of singing dynamics, described as trajectories (motor 'gestures') in a space of syringeal pressure and tension. Here we assess model performance by characterizing the auditory response 'replay' of song premotor HVC neurons to the presentation of song variants in sleeping birds, and by examining HVC activity in singing birds. HVC projection neurons were excited and interneurons were suppressed within a few milliseconds of the extreme time points of the gesture trajectories. Thus, the HVC precisely encodes vocal motor output through activity at the times of extreme points of movement trajectories. We propose that the sequential activity of HVC neurons is used as a 'forward' model, representing the sequence of gestures in song to make predictions on expected behaviour and evaluate feedback.
For a given set of movements, sets of movement parameters tend to be correlated with each other, so it is difficult to resolve whether motor cortical neurons encode different sets of static parameters (e.g. position, velocity, direction) or even to distinguish between static and time-dependent parameters (e.g. path trajectory)[1]. In principle, the motor coding problem can be addressed by developing quantitative models that describe the biomechanics of the movements[2]. To the extent that such models capture the actual control elements used to produce a movement, they permit motor neuron activity to be evaluated in a natural framework. We examined motor control in the bird song system from this perspective, creating a dynamical systems model of the avian vocal organ (syrinx) that captures much of the rich set of vocal behaviors that characterize bird songs[3].
We assessed predictions of the biomechanical model by taking advantage of a neuronal replay phenomenon[4-6]. Neurons in the nucleus HVC, a secondary motor/association cortical structure that is the most central structure known to be essential for singing, emit precise premotor activity when a bird sings[5-7]. When a bird listens to playback of song, these neurons have auditory responses that are very similar in timing and structure to their singing-related activity[6] and that are highly selective for the bird’s own song (BOS)[8,9]. In zebra finches, there is a striking state-dependent neuronal replay phenomenon[4] associated with song learning[10], such that the strongest and most selective auditory responses are recorded in sleeping birds. We used responses to song in sleeping adult zebra finches as a proxy for evaluating the structure of singing, and then tested the emerging hypotheses in singing birds.
Validating a song model: static parameters
The avian vocal organ is a nonlinear device[11-13] capable of generating complex sounds even when driven by simple instructions[14,15]. We extended a low-dimensional model of the avian syrinx and vocal tract that can capture a variety of acoustic features, such as the precise relationship between fundamental frequency and spectral content in zebra finch song[16,17]. The model used here is summarized in Fig. 1. A two-dimensional set of equations describes the labial dynamics (Fig. 1, x(t), red trace; see Methods). Flow fluctuations are fed into a vocal tract, generating an input sound P_i(t) (green trace). The tract filters the sound; it is characterized as a trachea, modeled by a tube, which connects to the oro-esophageal cavity (OEC), here modeled as a Helmholtz resonator[18] (see Methods). The output of the model is a time trace representing the uttered sound, P_out(t) (blue trace).
Figure 1
Schematized view of a dynamical systems model describing labial dynamics and vocal tract filtering (trachea and oro-esophageal cavity, OEC)
The syringeal membrane was modeled as a mass (m) with damping (b) and a restitution (spring) force (K). Normal form equations for labial position (x(t), red line) were integrated, computing the input pressure at the vocal tract (P_i(t), green line) and ultimately the total output pressure (P_out(t), blue line). v, sound velocity; T, propagation time along trachea; γ, time constant (see Methods).
Using this model, we created synthetic versions of the songs our test birds sang. Time-dependent parameters of the model describing the labial dynamics were reconstructed to account for the time-dependent acoustic properties of the sound (see Methods). Following previous work[3,16,17], for each bird's song we used an algorithmic procedure to reconstruct unique functions for the air sac pressure (α(t)) and the tension of the syringeal labia (β(t)). The result of the procedure for one song is illustrated in Fig. 2: many features observed in the spectrograph of the recorded song (Fig. 2a) were also apparent in the synthesized song (Fig. 2b). Relatively simple time traces of reconstructed pressure and tension arose from fitting the bird’s song (Fig. 2c). These two functions drove the nonlinear equations for the labia to produce a wide range of diverse acoustic features. The parameter space of pressure vs. tension was organized by bifurcation curves (Fig. 2d, black lines), i.e. curves in the parameter space that separated regions where the model presented qualitatively different dynamics (sound patterns). Only one region (Fig. 2d, gray region) corresponded to oscillatory behavior, i.e. labial oscillations resulting in sound pressure fluctuations. Two features of the pressure-tension trajectories resulting in sound output were apparent (Fig. 2d). First, most of the control parameters were maintained close to bifurcation curves, so that small changes in parameter values could rapidly change the quality of the sound output. Second, many sounds were characterized principally by movements in either pressure or tension, but not both.
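The reconstruction procedure referred to above (detailed in Methods: first match the fundamental frequency over a grid of (α, β) values, then keep the point whose spectral content index, SCI, best matches the song segment) can be sketched as a two-stage table lookup. This is a minimal sketch, not the authors' code: `fit_segment`, the tolerance `f0_tol` and the toy feature tables are hypothetical, and in practice each table entry would come from integrating the model at that grid point.

```python
import numpy as np


def fit_segment(alpha_grid, beta_grid, f0_table, sci_table,
                f0_target, sci_target, f0_tol):
    """Two-stage grid search for one song segment (hypothetical helper).

    f0_table[i, j] and sci_table[i, j] hold the fundamental frequency (Hz) and
    SCI of the synthetic sound at (alpha_grid[i], beta_grid[j]).
    Returns the selected (alpha, beta) pair, or None if no f0 matches.
    """
    # Stage 1: keep grid points whose fundamental frequency matches the song.
    f0_ok = np.abs(f0_table - f0_target) <= f0_tol
    if not f0_ok.any():
        return None
    # Stage 2: among those, take the point whose SCI is closest to the song's.
    sci_err = np.where(f0_ok, np.abs(sci_table - sci_target), np.inf)
    i, j = np.unravel_index(np.argmin(sci_err), sci_err.shape)
    return float(alpha_grid[i]), float(beta_grid[j])


# Toy tables (hypothetical): here f0 depends only on beta and SCI only on alpha.
alpha_grid = np.linspace(0.0, 1.0, 11)
beta_grid = np.linspace(0.0, 1.0, 11)
A, B = np.meshgrid(alpha_grid, beta_grid, indexing="ij")
f0_table = 1000.0 + 4000.0 * B
sci_table = 1.0 + 3.0 * A

best = fit_segment(alpha_grid, beta_grid, f0_table, sci_table,
                   f0_target=3000.0, sci_target=2.5, f0_tol=50.0)
```

Keeping the two criteria in separate stages, rather than minimizing a weighted sum, follows the order described in Methods and avoids having to choose a weight between a frequency in hertz and a dimensionless index.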
Figure 2
A low dimensional model: reconstructing gestures
Spectrographs of a bird's song (a) and the model's synthetic song (b). Song is described by fitted parameters α(t) and β(t), proportional to air sac pressure and labial tension, respectively (c). Each sound is generated by a continuous curve in the parameter space of the model, a "gesture" (d). Oscillations in the vicinity of a saddle-node (SN) bifurcation present rich spectra, typical of zebra finch song. Note that the spectrally poor "high note" (green) is distant from the SN bifurcation. The gray area indicates the region of phonation. The distribution of gesture durations for five birds is displayed in (e).
Song was described by the sequence of these pressure-tension trajectories, which we call gestures, with gesture onsets and offsets defined as discontinuities in either the pressure or tension functions (Fig. 2c). Gestures include movements that do not result in phonation, such as pressure patterns associated with mini-breaths between syllables[19], but our recordings here were limited to airborne sounds. In a sample of 8 modeled songs, there were 13±4 gestures per motif (the largest basic unit of song, a repeated sequence of syllables). The distribution of gesture durations (mode = 22.5±2.5 ms, range 4–142 ms) was non-Gaussian, with 33% of the gestures ≤ 30 ms and a long tail corresponding to slowly varying sounds such as constant-frequency harmonic stacks (Fig. 2e).
This simple model captured essential features of sound production in a framework of labial tension and subsyringeal pressure, over which birds have direct motor control[20-22]. Although the actual syrinx has considerable additional complexity, the model provided substantial dimensionality reduction, capturing a wide range of acoustic features in a small set of time-dependent parameters.
We tested the model by comparing responses of HVC neurons to the broadcast of the modeled song (mBOS) and BOS in sleeping birds (Fig. 3). Responses to a grid of mBOS stimuli with identical timing but spectra different from BOS identified optimal estimates for the two remaining free static parameters (Supplementary Fig. 1). In sleeping birds, song system neurons are exceptionally selective and it was far from trivial to induce a response: for example, mBOS generated without the OEC component failed to elicit a response. In a case where we mis-estimated the duration of a component of BOS by 5 ms, a neuron responded strongly to BOS but not at all to the synthetic song (Supplementary Fig. 2b). Over a population of 30 neurons, the best mBOS elicited 58%±8% of the response to BOS (Supplementary Note 1).
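Because gesture onsets and offsets are defined as discontinuities in α(t) or β(t), the segmentation and the duration statistics quoted above (modal 5 ms bin, fraction of gestures ≤ 30 ms) can be sketched as follows. The sketch assumes uniformly sampled α and β and a user-chosen jump threshold `jump_thr`; the helper names and the threshold are hypothetical, since the text does not state how discontinuities were detected.

```python
import numpy as np


def segment_gestures(alpha, beta, dt_ms, jump_thr):
    """Split sampled alpha(t), beta(t) into gestures at discontinuities.

    Returns a list of (start, end) sample indices (end exclusive) and the
    gesture durations in ms. dt_ms is the sampling interval in ms.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # A boundary sits wherever either parameter jumps by more than jump_thr.
    jumps = (np.abs(np.diff(alpha)) > jump_thr) | (np.abs(np.diff(beta)) > jump_thr)
    bounds = np.concatenate(([0], np.flatnonzero(jumps) + 1, [alpha.size]))
    gestures = [(int(s), int(e)) for s, e in zip(bounds[:-1], bounds[1:])]
    return gestures, np.diff(bounds) * dt_ms


def duration_summary(durations_ms, bin_ms=5.0, short_ms=30.0):
    """Centre of the modal histogram bin and fraction of gestures <= short_ms."""
    d = np.asarray(durations_ms, dtype=float)
    counts, edges = np.histogram(d, bins=np.arange(0.0, d.max() + bin_ms, bin_ms))
    k = int(np.argmax(counts))
    return 0.5 * (edges[k] + edges[k + 1]), float(np.mean(d <= short_ms))


# Toy example (1 ms sampling): alpha jumps at 20 ms and 50 ms, beta is smooth.
alpha = np.r_[np.zeros(20), np.ones(30), np.full(50, 0.5)]
beta = np.linspace(0.0, 0.1, 100)
gestures, durations = segment_gestures(alpha, beta, dt_ms=1.0, jump_thr=0.2)
mode_ms, frac_short = duration_summary(durations)
```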
Both phasic projection neurons (HVC(p)) (N=15) and tonic interneurons (HVC(i)) (N=15) responded selectively to mBOS over non-BOS stimuli (Supplementary Note 1). These results show that a low dimensional model representing an approximation of peripheral mechanics is sufficient to capture behaviourally relevant features of song.
Figure 3
Testing the low dimensional model
The responses of selective HVC neurons in sleeping birds to the presentation of BOS and of modeled BOS (mBOS) were similar. Bold horizontal lines indicate the timing of the three repeated motifs that were presented.
Projection neurons burst at gesture extrema
We then evaluated the activity of HVC neurons relative to model dynamics, analyzing the timing of spike bursting relative to the pressure-tension trajectories used to synthesize mBOS. This identified a compelling relation between the timing of HVC(p) spikes and the pressure-tension trajectories. For example, in Fig. 4a the spiking of two neurons (coded with different colors) is shown relative to the BOS spectrograph, oscillograph and reconstructed pressure and tension time series. One neuron bursts once, at the transition between descending frequency modulations and a constant frequency “high note”. The other neuron bursts twice, once when the pressure during a high note reached a maximum, the other time at the transition between a high frequency chevron and a broadband frequency modulated sound. Similar relations between spike burst timing and gestures were observed for 14 of the 15 HVC(p) (Supplementary Figs 2 and 3). In one case, a neuron emitted bursts in the interval between syllables. We hypothesize this pattern might arise if the bursts are associated with mini-breaths during singing[19]. Only the 17 bursts occurring during phonation were considered for further analysis.
Figure 4
Timing of gestures relative to bursting of projection neurons
a, song spectrograph and oscillograph (top panels); reconstructed pressure and tension parameters (middle panels), with tick marks indicating the times of all GTEs. Bottom panel, raster plots of the responses of two neurons (color coded green and orange), together with their closest GTE, indicated with lines of the same colors. b, the corresponding trajectories in parameter space (same color coding), with a point indicating the mean position of a burst and arrows indicating the trajectory direction. c, distribution of time differences between consecutive GTE occurrences (N = 5 birds). d, distribution of time differences between the time of each spike (Ts) and the time of the closest GTE in sleeping birds (N = 14 HVC(p), 5 birds). e, the same analysis as in d, for singing birds (N = 5 HVC(p), 2 birds).
Examining the responses of the HVC(p) on pressure vs. tension plots demonstrated that neurons burst preferentially at gesture trajectory extrema (GTE) (Fig. 4b). A gesture has at least two GTE, at its beginning and end, and up to two additional GTE if the absolute maxima of pressure and/or tension occur at unique and distinct time points. No additional GTE results when the absolute maximum is not distinct in time, e.g., when there are multiple local maxima of the same magnitude. Of the 17 bursts (14 HVC(p)), 11 (65%) were aligned with onsets/offsets, and 6 (35%) were aligned with pressure or tension maxima. In a sample of 5 songs, there were 28±4 GTE per song (165 total GTE). Of a total of 60 gestures, 20 (33.4%) had only onset and offset GTEs; 30 (50%) also had a unique peak in pressure (3 GTEs per gesture); 5 (8.3%) also had a unique peak in tension (3 GTEs per gesture); and 5 (8.3%) also had unique peaks in both pressure and tension (4 GTEs per gesture). The distribution of time intervals between successive GTE (mode = 9±1 ms, range 4–116 ms) was non-Gaussian, with 66% of the intervals ≤ 30 ms (Fig. 4c), as graphically emphasized by the tick marks showing all GTEs in Fig. 4a and Supplementary Figs 2 and 3. Most gestures corresponded to notes (the smallest unit of song organization recognized by ornithologists), yet motor activity at GTE corresponding to maxima could subdivide notes, for example where a neuron burst as the pressure reached a maximum in the middle of a constant-frequency harmonic stack (Supplementary Fig. 2). These examples highlight that for some HVC(p) the patterns of activity would not be interpretable with a purely spectrographic analysis of song[5]. We also observed cases where HVC(p) burst at the onset of relatively pure pressure-only or tension-only trajectories, with a preponderance of pressure-only trajectories (Fig. 2d).
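The GTE rule above (onset and offset always count; a pressure or tension maximum counts only if it is attained at a single time point distinct from the other GTEs) can be written as a small helper, followed by a check that the gesture counts reproduce the GTE total. `gesture_gtes` and its tolerance `atol` are hypothetical names chosen for illustration; the sketch assumes α and β are sampled on a common grid.

```python
import numpy as np


def gesture_gtes(alpha, beta, start, end, atol=1e-9):
    """Sample indices of the GTEs of one gesture spanning samples [start, end).

    Onset and offset always count. The absolute maximum of pressure (alpha)
    or tension (beta) adds a GTE only if it is reached at a single sample;
    ties between several samples add nothing.
    """
    gtes = {start, end - 1}
    for p in (np.asarray(alpha, dtype=float)[start:end],
              np.asarray(beta, dtype=float)[start:end]):
        hits = np.flatnonzero(np.isclose(p, p.max(), rtol=0.0, atol=atol))
        if hits.size == 1:
            # Adding to a set means a peak at the onset/offset, or a pressure
            # and tension peak at the same sample, is not counted twice.
            gtes.add(start + int(hits[0]))
    return sorted(gtes)


# Pressure peaks once mid-gesture; tension is flat, so its maximum is not unique.
alpha = np.sin(np.linspace(0.0, np.pi, 101))
beta = np.full(101, 0.3)
gtes = gesture_gtes(alpha, beta, 0, 101)

# GTE total implied by the gesture counts in the text
# (number of GTEs per gesture -> number of gestures).
gestures_by_gte_count = {2: 20, 3: 30 + 5, 4: 5}
total_gtes = sum(k * n for k, n in gestures_by_gte_count.items())  # 40 + 105 + 20
```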
If such neurons project to distinct regions of HVC’s efferent targets, which are organized according to the syringeal muscles and interactions with the respiratory system, such an observation could help resolve the long-standing riddle of HVC’s topographic organization.
To quantify these observations, we calculated the time from each spike in each burst to the closest GTE for all 17 bursts. The resulting distribution was approximately Gaussian, with bursts on average preceding the closest GTE (mean = –5.6 ± 0.3 ms, σ = 6.7 ± 0.3 ms; Fig. 4d). A bootstrap procedure (Supplementary Note 2) confirmed that the correspondence to the closest GTE was statistically significant (F test, P<0.045), indicating that the timing of HVC(p) bursts is associated with the timing of GTE. Given a minimal delay between activity of HVC(p) and sound production estimated at 25–50 ms[23], the minimal 15 ms delay for auditory feedback to reach HVC[8], and the great variability in the duration of intervals between GTE (Fig. 4c), it is remarkable that the timing of HVC(p) bursting was synchronized with near-zero time lag to a model of actual behavioral output.
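The spike-to-GTE offsets can be computed with a binary search over the sorted GTE times. This is a minimal sketch: `offsets_to_closest_gte` is a hypothetical helper, the spike and GTE times are made up, and the sample mean and SD at the end are a simple stand-in for the Gaussian fit reported above. Negative values mean the spike precedes its closest GTE, matching the sign convention of the reported mean.

```python
import numpy as np


def offsets_to_closest_gte(spike_times, gte_times):
    """Return Ts - T_GTE for each spike, using the GTE closest in time.

    Negative values mean the spike precedes its closest GTE.
    """
    gte = np.sort(np.asarray(gte_times, dtype=float))
    ts = np.asarray(spike_times, dtype=float)
    idx = np.searchsorted(gte, ts)  # index of the first GTE >= each spike
    left = gte[np.clip(idx - 1, 0, gte.size - 1)]
    right = gte[np.clip(idx, 0, gte.size - 1)]
    d_left, d_right = ts - left, ts - right
    return np.where(np.abs(d_left) <= np.abs(d_right), d_left, d_right)


# Times in ms (hypothetical values).
offsets = offsets_to_closest_gte([5, 12, 45, 98], [10, 50, 100])
mean_ms, sd_ms = offsets.mean(), offsets.std(ddof=1)
```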
Interneurons are suppressed at GTE
We also noted a relation between the minima in the activity of HVC(i) and the timing of GTE. To characterize this, for each interneuron we binned the spikes in 10 ms windows for each acoustic presentation. The resulting average response traces were smoothed and the minima in the smoothed traces were identified (see Methods). For an example neuron, the average response is shown in green, the superimposed smoothed curve in black, and the minima as red dots (Fig. 5a, bottom panel). No single HVC(i) had minima at all GTE, but across all neurons we observed a close alignment between the times of the minima and the times of GTE. (A non-significant relation was observed for maxima of HVC(i) activity; Supplementary Fig. 4.) Computing the differences between the time of each minimum that occurred during phonation and the closest GTE resulted in an approximately Gaussian distribution (mean = –0.82 ms ± 0.60 ms, σ = 7.3 ± 1.4 ms; Fig. 5b). We compared this distribution to the distribution of randomly positioned minima within each motif using the bootstrap procedure and found them to be significantly different (F test, P<0.016, Supplementary Note 2). Additional tests identified marginally significant locking to GTE for one of four birds (Supplementary Note 3). Thus, the precise activity of HVC(i)[7] may help shape the timing of HVC(p). This suggests a simple model in which bursts of activity of HVC(p) suppress activity in HVC(i), whose ongoing activity helps shape the next HVC(p) burst.
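The minima-detection step can be sketched with SciPy: average the spike counts across presentations, smooth with a Savitzky-Golay filter, and keep the points that equal the minimum of a 21-point sliding window (the window given in Methods). The Savitzky-Golay window length and polynomial order (`sg_window=31`, `sg_order=3`) are assumptions, as is the 1 ms default bin; the helper names are hypothetical.

```python
import numpy as np
from scipy.ndimage import minimum_filter1d
from scipy.signal import savgol_filter


def psth(trials, t0, t1, bin_s=0.001):
    """Trial-averaged spike count per bin for spike times (s) between t0 and t1."""
    n_bins = int(round((t1 - t0) / bin_s))
    edges = t0 + bin_s * np.arange(n_bins + 1)
    counts = np.zeros(n_bins)
    for spikes in trials:
        counts += np.histogram(spikes, bins=edges)[0]
    return counts / len(trials)


def response_minima(trace, sg_window=31, sg_order=3, min_window=21):
    """Smooth a response trace and return indices of its sliding-window minima.

    A sample counts as a minimum if it equals the smallest value in the
    min_window-point window centred on it.
    """
    smooth = savgol_filter(trace, window_length=sg_window, polyorder=sg_order)
    window_min = minimum_filter1d(smooth, size=min_window, mode="nearest")
    return np.flatnonzero(smooth == window_min), smooth


# Usage on a synthetic trace with troughs every 100 samples (at 50, 150, ...).
trace = 1.5 + np.cos(2 * np.pi * np.arange(400) / 100)
minima, smooth = response_minima(trace)
```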
Figure 5
Suppressed interneuron activity is associated with GTEs
a, Organized as in Fig. 4a, but with the spike count response to the song (10 ms bins, 20 repetitions; green line) for one HVC(i), and a smoothed measure of the response (black line; see Methods). Red squares indicate the times of the minima in the smoothed measure, and vertical lines indicate the position of the GTE closest to each minimum. b, distribution of time differences between spike response minima and their closest GTE in sleeping birds (15 HVC(i), 5 birds). c, the same analysis in singing birds (10 HVC(i), 3 birds).
A representation of gestures during singing
Given that our results were obtained by broadcasting songs to sleeping birds, it is natural to ask whether the activity of HVC neurons is also locked to gesture transitions during singing. Previous results have demonstrated tight temporal locking between daytime singing activity and auditory-driven responses during sleep of single RA neurons in zebra finches[4], and of HVC neurons that respond to auditory stimulation in awake swamp sparrows and Bengalese finches[6], but similar observations have yet to be reported for zebra finch HVC neurons. We made recordings from HVC in singing birds (N = 3 birds), including 5 phasic neurons bursting during phonation (recorded in two of the three birds; one neuron had two bursts per motif; Fig. 6, Supplementary Fig. 5) and 10 tonic neurons. We confirmed that during singing, all sparse bursts of HVC(p) occurred at gesture transitions (Fig. 4e). Following the same analysis as in sleeping birds (except that each motif of song was modeled independently, since motifs could vary), we observed even more precise timing of HVC(p) in singing birds than in sleeping birds (cf. Fig. 4d, e). The Gaussian fit for the population of phasic neurons recorded during singing (mean = –1.35 ms ± 0.10 ms, σ = 4.0 ± 0.1 ms; Fig. 4e) was significantly different from the bootstrapped random distribution (F test, P<0.025; see Supplementary Note 2 and Supplementary Fig. 6). The activity minima of tonic neurons recorded during singing also showed precise timing relative to GTEs (Gaussian fit for the minima: mean = –0.12 ms ± 0.4 ms, σ = 4.0 ± 0.4 ms; Fig. 5c), and this distribution was significantly different from the bootstrapped random distribution (F test, P<0.002). Additional analyses demonstrated significant locking of minima to GTE in two of three singing birds (Supplementary Note 3). As for sleeping birds, the maxima of tonic neural activity showed no evidence of significant locking to the GTEs (Supplementary Fig. 4c).
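One plausible reading of the bootstrap comparison is sketched below (Supplementary Note 2 is not reproduced here, so the actual statistic may differ): draw random event times uniformly within the motif, measure their offsets to the closest GTE, and compare the spread of this null distribution with that of the observed offsets using a one-sided variance-ratio F test. The helper names, the number of resamples and the use of sample variances are assumptions, and the usage values are made up.

```python
import numpy as np
from scipy import stats


def closest_offsets(times, gte):
    """Signed offset of each time to its nearest GTE (gte sorted ascending)."""
    idx = np.searchsorted(gte, times)
    left = gte[np.clip(idx - 1, 0, gte.size - 1)]
    right = gte[np.clip(idx, 0, gte.size - 1)]
    d_left, d_right = times - left, times - right
    return np.where(np.abs(d_left) <= np.abs(d_right), d_left, d_right)


def bootstrap_f_test(observed, gte, motif_start, motif_end, n_boot=1000, seed=0):
    """Variance-ratio test of observed offsets against randomly placed events.

    Random event times are drawn uniformly within the motif (as many as there
    are observed offsets, n_boot times). Returns F = var(random) / var(observed)
    and the one-sided p value from the F distribution.
    """
    rng = np.random.default_rng(seed)
    gte = np.sort(np.asarray(gte, dtype=float))
    observed = np.asarray(observed, dtype=float)
    rand_times = rng.uniform(motif_start, motif_end, size=(n_boot, observed.size))
    null = closest_offsets(rand_times.ravel(), gte)
    f_ratio = null.var(ddof=1) / observed.var(ddof=1)
    p = stats.f.sf(f_ratio, null.size - 1, observed.size - 1)
    return f_ratio, p


# Usage (ms): GTEs every 30 ms in a 600 ms motif; observed offsets cluster near 0.
gte = np.arange(0.0, 601.0, 30.0)
observed = np.array([-6, -4, -2, 0, 2, 4, 6, -5, -3, -1, 1, 3, 5, 0], dtype=float)
f_ratio, p = bootstrap_f_test(observed, gte, 0.0, 600.0, n_boot=200)
```

A variance ratio is used because tighter-than-chance alignment to GTEs shows up as a narrower offset distribution than that of randomly placed events.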
Finally, examining the data from a prior study of zebra finches[24], we observed that during singing the timing of HVC(RA) bursts was closely associated with the timing of HVC(X) bursts (Supplementary Fig. 7). In light of our results, this supports the hypothesis that all classes of HVC neurons are active in relation to the timing of gestures, although the multiple subtypes of HVC(RA), HVC(X), and HVC(i) have yet to be evaluated.
Figure 6
During singing, HVC(p) fired in the vicinity of GTE
a, An HVC(p) neuron bursts, locked to the vicinity of a GTE, even as the syllable sequence and time interval vary. b, In another bird, the burst of an HVC(p) neuron is locked to a GTE in the vicinity of a subtle acoustic transition.
Previously it was concluded that the timing of song syllables was unrelated to the timing of HVC(p) discharge in singing birds[5,24]. Given the sparse bursting of these cells, this led to the idea that the output of HVC has a clock-like function with a nearly uniform “tick” size of approximately 10 ms[23], supported by a “syn-fire” chain of synaptic activity across HVC(p)[5]. Instead, we find that the bursting of HVC(p) and the modulation of HVC(i) activity are timed to significant instances of motor gestures. The sequential firing across the population of HVC(p) unfolds in an ordered fashion[5], but time is not explicitly represented in HVC; rather, the statistics of HVC activity are closely tied to syringeal/vocal tract mechanics. Given the broad distribution of times between GTE, synchronization of HVC activity with GTEs is inconsistent with a syn-fire network that is active at every moment. The distinction between these two models of HVC has broad additional implications for the functional organization of the song system, for song learning, and for motor coding.
Since gestures vary greatly in duration, and RA only has access to the times of GTE, downstream components of the motor pathway (RA and presumably the brainstem) should generate independent dynamical information to sustain the detailed structure within each gesture (cf.[23,25]). Previous experimental results, including the effects of electrical stimulation of HVC or RA during singing[26] and lesions of nuclei afferent to HVC[27], implicate information in HVC encoding larger units of song. This might arise if some gestures or transitions are over-emphasized in HVC relative to others. Finally, gestures are learned, which is consistent with the physiological properties of HVC neurons: integration over hundreds of milliseconds and multiple syllables, non-linear summation over syllables in a sequence preceding the excitatory response, and selective response to BOS[4,8,9,28,29,30].
The information about groupings of gestures such as syllables can be carried in these integrated signals. This also re-emphasizes that synaptic modification in HVC, not just changes at HVC–RA synapses, is associated with feedback-mediated sensorimotor learning (cf.[23]). HVC also projects to the cortico-basal ganglia pathway, which contributes to learning-mediated synaptic modification in RA by introducing variance into song output[31,32]. This suggests the hypothesis that this variance is structured not in an auditory framework but around specific features of song motor gestures.
A forward model for vocomotor control
If activity in HVC is synchronized, with little time lag, with motor gestures occurring at the periphery, this would tend to bring it into temporal register with fixed (circa 15 ms) delayed auditory[33], proprioceptive[20], or brainstem[34] feedback. This allows movements to be represented in HVC by gestures of greatly varying duration (with dynamics principally generated through internal HVC interactions), while each gesture is referenced to a common time framework for evaluating feedback (with feedback arriving through distinct, extrinsic inputs). This suggests that projection neurons represent a prediction about the actual behavioral output at that moment in time, constituting an unexpected form of a “forward” or predictive model that resolves the problem of delay in sensorimotor control[35]. Assuming that behavior is subdivided into gestures, and that only the transitions (GTE) are represented by HVC output (HVC(p)), the intervals between the transitions could accumulate feedback information by modifying the tonic activity of HVC(i) and subsequently the spike bursting of HVC(p). Indeed, HVC receives multiple sources of feedback, including input from the primary motor cortex RA[36], thalamic input carrying brainstem respiratory, auditory, and proprioceptive information[21,34,37], and forebrain auditory input[38].
We have described song organization based on gestures, taking advantage of the dynamical systems modeling framework to go beyond spectrographs. These features of motor systems organization could apply generally[39]. Our data support Sherrington’s long-standing hypothesis that the motor cortex is a synthetic organ, representing segments of whole movements[1,40]. In humans, the production of speech and the performance of athletes and musicians are exceptional examples of highly precise, learned, skilled behavior that could share mechanisms with those described here.
Developing corresponding models for human speech production should help inform speech and language pathologies where sequential behavior is disrupted.
Methods
Subjects, songs, and surgeries
All procedures were in accordance with a protocol approved by the University of Chicago Institutional Animal Care and Use Committee. Songs were recorded from 12 birds, and electrophysiology was conducted on 9 adult male zebra finches (Taeniopygia guttata) bred in our colony. Birds were prepared for recordings with surgeries using standard techniques to implant a head pin (for auditory experiments)[10] or a motorized microdrive (for singing experiments)[5]. For auditory experiments, adults were maintained on a 16/8 reversed light cycle in sound isolation boxes. Songs were recorded and filtered using custom software (SABER, A.S. Dave), then edited (Praat, P. Boersma and D. Weenink, www.praat.org). Edited songs included two or three repetitions of one motif, and were typically 2–4 s in duration. Birds were allowed to recover for 2 or 3 days before the first day of recordings, and rested for at least 2 days between recording sessions.
Electrophysiology, stimulus presentation, and spike analysis
HVC extracellular recordings were performed in head-fixed sleeping birds or tethered singing birds. Recordings were post-processed with a spike-sorting algorithm (Klusters, L. Hazan, klusters.sourceforge.net, and custom software written by C.D. Meliza) to separate the times of spike events for each unit. For experiments in singing birds, all well-isolated neurons are reported. For auditory experiments, only BOS-responsive neurons were recorded. The auditory stimuli were presented in random order with an interstimulus interval of 7±1 s. The neural response to each song was quantified in terms of the Z score[25]:
$$Z = \frac{\mu_S - \mu_{BG}}{\sqrt{\operatorname{Var}(S - BG)}}$$
where μ_S is the mean response during the auditory stimulus (S) and μ_BG is the mean response during background activity (BG). The denominator of the equation is the standard deviation of (S – BG). The background was estimated by averaging the firing rate during a 2 s period. The Z scores for mBOS, CON (conspecific song) and REV (reversed BOS) were normalized to the BOS Z score, and averages across neurons were reported as mean of normalized responses ± s.e.m. For interneurons, the strength of the response varied across the motifs[42]. We picked the last (second or third) motif, which gave the strongest response, to analyze the timing of spikes relative to GTE; this minimized false peaks and troughs in the response profiles. In singing birds, interneurons fired reliably for each motif, and all motifs were incorporated into the analysis. The average response of each interneuron (1 ms resolution) was smoothed using a Savitzky-Golay filter (polynomial local regression[41]), and the minima were identified using a 21-point sliding window.
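The Z score, Z = (μ_S − μ_BG)/SD(S − BG), and the normalization to the BOS response can be computed from paired per-trial firing rates as sketched below. Pairing each stimulus window with its own background window and using the sample standard deviation (ddof=1) are assumptions; the helper names are hypothetical and the numbers in the usage lines are made up for illustration.

```python
import numpy as np


def z_score(stim_rates, bg_rates):
    """Z = (mean(S) - mean(BG)) / SD(S - BG) from paired per-trial rates.

    SD is the sample standard deviation (ddof=1), an assumption.
    """
    s = np.asarray(stim_rates, dtype=float)
    bg = np.asarray(bg_rates, dtype=float)
    return (s.mean() - bg.mean()) / np.std(s - bg, ddof=1)


def mean_sem(values):
    """Mean and standard error of the mean, e.g. across neurons."""
    v = np.asarray(values, dtype=float)
    return v.mean(), v.std(ddof=1) / np.sqrt(v.size)


# Made-up per-trial rates (spikes/s) for one neuron.
z_bos = z_score([10, 12, 14], [2, 2, 4])
z_mbos = z_score([6, 8, 8], [2, 2, 4])
normalized_mbos = z_mbos / z_bos  # response to mBOS relative to BOS
```

Across a population, `mean_sem` applied to the per-neuron normalized values gives the "mean ± s.e.m." format used in the text.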
Reconstruction of motor gestures
We assumed flow-induced oscillations of opposing labia as a sound source model for birdsong production[14]. This model assumes that for high enough airflow values, the labia start to oscillate with a wavelike motion. Assuming two active basic modes (a flapping-like motion and a lateral displacement of the tissues, appropriately out of phase), a system of equations describes the dynamics of the medial position x(t) of one of the opposing labia, at one of the sound sources. These read
where the first term in the second equation is the restitution in the labium, the second term accounts for the dissipation, and the last term for the force due to the interlabial pressure. The average pressure p can be written in terms of the displacement and its velocity[3]. These equations describe a set of qualitatively different dynamical regimes. To gain independence from the details of any particular model presenting these regimes, we worked with a normal form that unfolds into a saddle-node in limit cycle bifurcation and a Hopf bifurcation. The normal form, which is analytically derived[43], constitutes the simplest set of equations for any model in which oscillations arise in either of these two bifurcations. Once this reduction is performed, the parameter values that yield a sound with specific acoustic features are unique. The normal form equations are shown in Fig. 1, and display the same set of dynamical regimes[3] as the physical model, with scaling through a time constant γ. Once x(t) is computed, the pressure at the input of the tract is computed as P_i(t), where T is the time for a sound wave to reach the end of the tube and return, and α(t) is proportional to the mean velocity of the flow. The transmitted pressure fluctuation forces the air in the glottis, which is approximated by the neck of a Helmholtz resonator (used to model the OEC[3,44]), i.e., a large container with a hole, such that the air in its vicinity oscillates due to the springiness of the air in the cavity. A linear set of three ordinary differential equations accounts for the dynamics of the air flow and pressure in this linear acoustic device[3], resulting in the final output pressure P_out(t) (Fig. 1).
We reconstructed the parameters driving the equations of the normal form (α(t) and β(t)), as well as the parameters describing the tracheal length and the OEC cavity, such that the synthesized sounds presented the same fundamental frequencies and spectral content as natural song. Reconstructions over sequential sound segments gave estimates of the time dependence of the physiological parameters used during song production. A linear integrator (τ = 2.5 ms) was used to compute the envelope of the sound signal, and a threshold was used to identify phonating segments. For phonating segments longer than 20 ms, we decomposed the recorded songs into successive 20 ms segments (time between consecutive segments ∆t = 1/20000 s). These were short enough to avoid large variation of the physiological gestures, and long enough to compute spectral content. For each segment, we computed the spectral content index (SCI)[16] and the fundamental frequency. A search in the parameter space (α(t), β(t)) was performed over a grid so that the synthetic sounds produced would match the fundamental frequency of the song segment being fitted. Over the set of (α(t), β(t)) values selected, a second search was performed so that the SCI of the synthetic sounds matched that of the song segment[3]. For sound segments shorter than 20 ms, the fundamental frequency was computed as follows. First, we selected the relative maxima of the sound signal that reached the sound envelope. Then, the fundamental frequency was computed as the inverse of the time difference between the next two consecutive selected maxima. The SCI at that time was estimated as the average of all possible SCI values corresponding to that frequency in the framework of the model[16]. With these estimates of fundamental frequency and SCI, (α(t), β(t)) were computed. Brief segments were typically fast trills.
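The envelope, the phonation threshold and the short-segment fundamental-frequency estimate described above can be sketched as follows. "Linear integrator" is interpreted here as a first-order leaky integrator of the rectified signal with τ = 2.5 ms; that interpretation, the threshold value and the helper names are assumptions, and the 1 kHz test tone is synthetic.

```python
import numpy as np


def envelope(signal, fs, tau=0.0025):
    """Leaky (first-order) integration of the rectified signal; tau in s."""
    k = 1.0 / (fs * tau)  # dt / tau; needs to be well below 1
    env = np.empty(len(signal))
    e = 0.0
    for n, s in enumerate(np.abs(np.asarray(signal, dtype=float))):
        e += k * (s - e)
        env[n] = e
    return env


def phonating_segments(env, threshold):
    """(start, end) sample pairs, end exclusive, where env exceeds threshold."""
    mask = np.concatenate(([False], np.asarray(env) > threshold, [False]))
    edges = np.diff(mask.astype(int))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends)]


def f0_from_maxima(signal, env, fs, t):
    """f0 (Hz) at time t (s): inverse interval between the next two relative
    maxima of the signal that reach the envelope; None if fewer than two."""
    x = np.asarray(signal, dtype=float)
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:])) + 1
    peaks = peaks[x[peaks] >= env[peaks]]
    later = peaks[peaks >= int(round(t * fs))]
    if later.size < 2:
        return None
    return fs / (later[1] - later[0])


# Usage: a 1 kHz tone switched on between 0.1 s and 0.3 s, sampled at 40 kHz.
fs = 40000
t = np.arange(int(0.4 * fs)) / fs
sig = np.where((t >= 0.1) & (t < 0.3), np.sin(2 * np.pi * 1000 * t), 0.0)
env = envelope(sig, fs)
segments = phonating_segments(env, threshold=0.1)
f0 = f0_from_maxima(sig, env, fs, t=0.2)
```

Because the integrator lags the signal, a detected segment starts slightly after the sound onset and ends a few milliseconds after the sound stops; the threshold trades off these delays against sensitivity to quiet sounds.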
We modeled these as rapid oscillations of pressure and tension, with the amplitude of the pressure oscillations such that the maxima fall in the phonating region, and the amplitude of the tension oscillations such that the frequency range of the vocalization was reproduced. We found that most of the parameters could be well approximated by fractions of sine functions, exponential decays, constants, or combinations of those.
Using these analytic functions as parameters of the model to generate a synthetic copy of the recorded song resulted in a noiseless surrogate song (e.g., Supplementary Fig. 1, Noise=0). The addition of noise allowed the gradual recovery of realistic timbral features. In the text, the dimensionless variable Noise varied between 0 and 40, with Noise=5 corresponding to a fluctuation size equal to 2.5 percent of the maximum range of the β(t) parameter. Note that the timbral effect is more important for low-frequency sounds, which explore a small range of β(t).
For each bird, the length of the trachea was chosen so that the frequencies close to 2.5 kHz and 7 kHz in the bird’s song were the first and second resonances of a tube closed at one end. This corresponds to a length of 3.5 cm[45]. Typically, zebra finch songs present a third important resonance around 4 kHz; the parameters of the Helmholtz resonator were adjusted so that its resonant frequency would account for this resonance[3]. The synthetic songs for sleeping birds were generated before the electrophysiological experiments were performed. For singing birds, all song reconstructions were also performed blind to the spike data.
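The tracheal length follows from the quarter-wave resonances of a tube closed at one end, f_n = (2n − 1)c/(4L), and the Noise scale can be read as linear in the β range. A worked check, assuming a speed of sound c = 343 m/s (dry air at about 20 °C; the text does not state the value used) and a linear Noise scaling; the helper names are hypothetical.

```python
def closed_tube_resonances(length_m, n_modes=2, c=343.0):
    """Resonances (Hz) of a tube closed at one end: f_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, n_modes + 1)]


def noise_amplitude(noise, beta_range):
    """Fluctuation size for a given Noise value, assuming a linear scale:
    Noise = 5 <-> 2.5 % of the beta(t) range, i.e. 0.5 % per unit of Noise."""
    return noise * 0.005 * beta_range


f1, f2 = closed_tube_resonances(0.035)  # roughly 2.45 kHz and 7.35 kHz
amp = noise_amplitude(5, beta_range=1.0)  # 2.5 % of a unit beta range
```

With L = 3.5 cm this gives resonances near 2.45 kHz and 7.35 kHz, consistent with the 2.5 kHz and 7 kHz values used to set the tracheal length.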