
Superregular grammars do not provide additional explanatory power but allow for a compact analysis of animal song.

Abstract

A pervasive belief with regard to the differences between human language and animal vocal sequences (song) is that they belong to different classes of computational complexity, with animal song belonging to regular languages, whereas human language is superregular. This argument, however, lacks empirical evidence since superregular analyses of animal song are understudied. The goal of this paper is to perform a superregular analysis of animal song, using data from gibbons as a case study, and demonstrate that a superregular analysis can be effectively used with non-human data. A key finding is that a superregular analysis does not increase explanatory power but rather provides for compact analysis: fewer grammatical rules are necessary once superregularity is allowed. This pattern is analogous to a previous computational analysis of human language, and accordingly, the null hypothesis, that human language and animal song are governed by the same type of grammatical systems, cannot be rejected.


Keywords: Bayesian analysis; animal song; context-free grammar; gibbon; language

Year: 2019 PMID: 31417719 PMCID: PMC6689648 DOI: 10.1098/rsos.190139

Source DB: PubMed Journal: R Soc Open Sci ISSN: 2054-5703 Impact factor: 2.963

Introduction

Language is often considered a property unique to humans, in contrast with other animal behaviour such as birdsong [1-3]. It is a commonly held belief that human language and animal vocal sequences (which we term song herein) belong to different classes of computational complexity. On this view, animal song belongs to the class of regular languages and is modelled by regular expressions, finite-state automata and left/right-branching grammars, whereas human language requires superregular analyses. This argument, however, is not supported by empirical evidence, since superregular analyses of animal song are significantly understudied. In particular, only regular analyses have been performed in the original studies on animal song cited by proponents of the argument [1-3]: e.g. the song of the Bengalese finch, the most popular example discussed in the literature, has been analysed with n-gram models [4], k-reversible finite-state automata [5] and hidden Markov models [6], all of which are regular; no study has yet assessed superregular models. The goal of this paper is to perform a superregular analysis of animal song and demonstrate that it can be applied to realistic non-human data. We used song data from gibbons, since the gibbon is an ape known for its long vocal sequences [7-9]. This superregular analysis does not increase explanatory power but, instead, provides a compact analysis, since fewer grammatical rules are necessary once superregularity is allowed. This is in accordance with the previous computational analysis of human language reviewed in the next section. Given this, the null hypothesis that human language and animal song are governed by the same kind of grammatical systems cannot be rejected.

Preliminaries

A classic argument for the superregularity of human language is based on a mathematical result from formal language theory. This argument assumes that human language allows an unbounded depth of the centre-embedding schematized as $N^m V^m$, where N and V stand for noun and verb (phrases), respectively, and m represents the number of repeats. For example, The mouse died. (m = 1), The mouse [the cat chased] died. (m = 2), The mouse [the cat [the dog bit] chased] died. (m = 3), …. Under the assumption of unbounded centre-embedding, human language is non-regular [10,11]. However, this assumption is empirically unsupported, since the depth of centre-embedding observed in corpora is m ≤ 3 [12] and sentences with multiple centre-embedded clauses are extremely hard for people to process or accept [13-15]. These empirical results imply that we would have little chance of observing centre-embedding in animal data even if the system behind the data had a superregular architecture. A better empirical argument for the superregularity of human language, in the sense that the approach is more easily applicable to animal studies, comes from probabilistic analysis. Perfors et al. [16] showed that probabilistic context-free grammar (PCFG), a superregular language model, has a greater posterior probability than regular grammars given child-directed speech. This finding is significant because it reveals where the advantage of the PCFG analysis lies, which becomes visible when we decompose the posterior probability. By Bayes’ theorem, the posterior probability p(g | d) of a grammar g given data d is proportional to the product of the prior probability p(g) of g and the likelihood p(d | g) of d given g. Thus, taking the log of the probabilities:

$$\log p(g \mid d) = \log p(g) + \log p(d \mid g) - \log p(d),$$

where the normalizer $\log p(d)$ is the same for all grammars. The likelihood encodes the grammars’ explanatory power of the data. The prior, on the other hand, evaluates the compactness of the grammars, with smaller grammars being more probable.
Table 1 shows the log prior, likelihood and posterior (minus the constant normalizer term, i.e. the sum of the log prior and likelihood) of the best PCFG and regular grammar found by Perfors et al. The advantage of the PCFG over the regular grammar in the posterior comes from the prior, not the likelihood: the log prior of the PCFG is 832 higher than that of the regular grammar, whereas its log likelihood is 521 lower, leaving a net advantage of 311 in the log posterior. In other words, the empirical advantage of a PCFG analysis of human language is not that it can explain more data than regular grammars, even though hypothetical data exist that can be explained only by superregular grammars (unbounded centre-embeddings). Instead, PCFG is a more compact analysis of human language than regular grammars.

Table 1.

Log prior, likelihood and posterior (minus the normalizing constant) probabilities of PCFG and the regular grammar reported in [16]. Scores of the best grammars (‘CFG-L’ for PCFG and ‘REG-B’ for regular) are cited.

probability type	PCFG	regular
log prior	−1111	−1943
log likelihood	−25 889	−25 368
total = log prior + log likelihood (= log posterior + normalizer)	−27 000	−27 311
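As an arithmetic check of Table 1, the short Python sketch below recomputes the unnormalized log posteriors (log prior + log likelihood) and splits the PCFG's advantage into its prior and likelihood parts. The variable names are ours, for illustration only; the numbers are the ones reported in Table 1.

```python
# Log scores of the best grammars reported in [16] (Table 1).
scores = {
    "PCFG": {"log_prior": -1111, "log_likelihood": -25889},
    "regular": {"log_prior": -1943, "log_likelihood": -25368},
}


def unnormalized_log_posterior(s):
    """log p(g) + log p(d | g): the log posterior up to the constant -log p(d)."""
    return s["log_prior"] + s["log_likelihood"]


totals = {g: unnormalized_log_posterior(s) for g, s in scores.items()}
prior_gain = scores["PCFG"]["log_prior"] - scores["regular"]["log_prior"]
likelihood_gain = scores["PCFG"]["log_likelihood"] - scores["regular"]["log_likelihood"]
posterior_gain = totals["PCFG"] - totals["regular"]

print(totals)                                      # {'PCFG': -27000, 'regular': -27311}
print(prior_gain, likelihood_gain, posterior_gain)  # 832 -521 311
```

The positive posterior gain is entirely driven by the prior term; the likelihood term works against the PCFG.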

Overview of the material and methods

In this section, we outline our material and methods. Technical details are available later in the Material and methods section. The data subject to the grammatical analysis are sequences of acoustic feature vectors (mel-frequency cepstrum coefficients), measured at representative data points of gibbon song recordings. This data format is different from the outputs of the standard PCFG: a PCFG outputs strings of discrete symbols (called terminals), such as words in human language syntax. Accordingly, we extend the standard PCFG with multivariate Gaussian emission from terminals (cf. [17]). To generate a data sequence, our model first generates a string of terminals using the PCFG. Then, a Euclidean vector is generated from a multivariate Gaussian conditioned on each of the terminals. Our analysis can be viewed as a joint inference of (i) the discrete categories (corresponding to the terminals) of the acoustic feature tokens and (ii) the grammar behind the sequential patterns of the categories. While these two components have been analysed in separate steps in previous studies on animal song [4-6,18,19], simulation studies of human language learning have shown that joint inference of multiple aspects of the target language, capturing their correlations, is more successful than separate learning of the components [20-23]. Hence, we consider the joint analysis more reliable than the previous two-step analysis. A PCFG consists of grammatical production rules, each associated with a probability of use. A challenge in a PCFG analysis of animal data is that we know neither the set of grammatical rules appropriate for the data nor the probabilities of those rules. In the current study, we take a Bayesian inference approach to this problem, estimating the posterior probability distribution over PCFGs (both rules and their probabilities) conditioned on the gibbon data.
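The two-step generative process described above (a PCFG generates a string of terminals, then each terminal emits a Gaussian vector) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the grammar is a hypothetical toy with 2 nonterminals and 3 terminals and random parameters, not the grammar induced from the gibbon data, and the `max_depth` guard is our addition to keep recursion bounded.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy grammar, NOT the grammar induced from the gibbon data.
N_NT, N_T, D = 2, 3, 13                    # nonterminals, terminals, MFCC dimensions
root_probs = np.array([0.6, 0.4])          # variable root: any nonterminal may label the root
branch_prob = np.array([0.4, 0.3])         # P(A -> B C); otherwise A -> terminal
binary_probs = rng.dirichlet(np.ones(N_NT * N_NT), size=N_NT)  # P(B C | A, branching)
terminal_probs = rng.dirichlet(np.ones(N_T), size=N_NT)        # P(t | A, not branching)
means = rng.normal(size=(N_T, D))          # Gaussian emission mean of each terminal
covs = np.stack([np.eye(D)] * N_T)         # Gaussian emission covariance of each terminal


def expand(a, terminals, depth=0, max_depth=50):
    """Expand nonterminal a depth-first, appending generated terminals left to right."""
    if depth >= max_depth or rng.random() >= branch_prob[a]:
        terminals.append(int(rng.choice(N_T, p=terminal_probs[a])))
        return
    b, c = divmod(int(rng.choice(N_NT * N_NT, p=binary_probs[a])), N_NT)
    expand(b, terminals, depth + 1, max_depth)
    expand(c, terminals, depth + 1, max_depth)


def generate():
    """Step 1: PCFG string of terminals; step 2: one Gaussian vector per terminal."""
    terminals = []
    expand(int(rng.choice(N_NT, p=root_probs)), terminals)
    vectors = np.stack([rng.multivariate_normal(means[t], covs[t]) for t in terminals])
    return terminals, vectors


terms, X = generate()
print(terms, X.shape)  # X has one 13-dimensional row per terminal
```

In the actual analysis the direction is reversed: only the vectors are observed, and the terminals, parse trees and rule probabilities are inferred jointly.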
Importantly, regular grammars are a special case of PCFG that only generate either the left-branching (figure 1b) or right-branching (figure 1c) structures. Hence, the posterior tells us how probable regular grammars are among PCFGs given the gibbon data.
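To make the distinction in figure 1 concrete, the following Python sketch enumerates every unlabelled binary parse of a string and classifies it as left-branching, right-branching, both (possible only for strings shorter than 3) or non-regular. The function names are ours. It shows that non-regular parses first become possible at length 4, and that the total number of parses of a length-l string is the Catalan number $C_{l-1}$, of which at most two are regular.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def trees(i, j):
    """All unlabelled binary trees over leaves i..j-1, as nested tuples of leaf indices."""
    if j - i == 1:
        return (i,)
    return tuple((left, right)
                 for k in range(i + 1, j)
                 for left in trees(i, k)
                 for right in trees(k, j))


def is_left_branching(t):
    """Every right daughter is a single leaf, e.g. (((0, 1), 2), 3)."""
    return isinstance(t, int) or (isinstance(t[1], int) and is_left_branching(t[0]))


def is_right_branching(t):
    """Every left daughter is a single leaf, e.g. (0, (1, (2, 3)))."""
    return isinstance(t, int) or (isinstance(t[0], int) and is_right_branching(t[1]))


def classify(length):
    """Count the parses of a string of the given length by branching type."""
    counts = {"left": 0, "right": 0, "both": 0, "non-regular": 0}
    for t in trees(0, length):
        left, right = is_left_branching(t), is_right_branching(t)
        key = "both" if left and right else "left" if left else "right" if right else "non-regular"
        counts[key] += 1
    return counts


def catalan(n):
    return comb(2 * n, n) // (n + 1)


print(classify(4))  # {'left': 1, 'right': 1, 'both': 0, 'non-regular': 3}
```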

Figure 1.

Possible parses of a string of length 4. The terminal vertices at the bottom of the tree graphs correspond to the individual data points of the string. (a) Non-regular, (b) left-branching (regular) and (c) right-branching (regular).

There are infinitely many sets of context-free grammatical rules, and thus a uniform prior over all possible PCFGs cannot be assumed (it would be improper). This study adopts the hierarchical Dirichlet process (HDP) as the prior over PCFGs [24,25]. The HDP prior introduces a bias toward compact PCFGs: it assigns exponentially smaller prior probability to PCFGs whose production probability mass is spread over a greater number of grammatical rules, favouring those with a smaller number of reusable rules. The Bayesian inference balances the PCFG likelihood (fit to the data) and the HDP prior (compactness); PCFGs with high posterior probability are those that can generate the data with high probability while reusing a limited number of production rules. Similar balancing between explanatory power (likelihood) and compactness (prior) is widely adopted in the scientific evaluation of models [26-28] as well as in modern theories of learning [16,29-31]. In practice, it is difficult to directly measure the posterior probability of the regular and non-regular grammars (even with the help of an approximation of the posterior). Accordingly, we report the expected counts of regular and non-regular parses of the training data and held-out test data based on the posterior. The logic is as follows: if regular grammars were the only probable accounts of the gibbon data in the posterior, then only the regular parses of the data would have large expected counts.
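The compactness bias of the HDP prior can be illustrated with the truncated stick-breaking (GEM) construction that is commonly used for the top-level weights of an HDP, as in the HDP-PCFG of [24,25]. The Python sketch below is not the full HDP-PCFG prior: it only draws top-level weights over K = 40 components and counts how many components are needed to cover 95% of the mass. The concentration values and the 95% threshold are illustrative assumptions.

```python
import numpy as np


def truncated_gem(alpha, K, rng):
    """Stick-breaking weights beta_k = v_k * prod_{j<k} (1 - v_j), truncated at K components."""
    v = rng.beta(1.0, alpha, size=K)
    v[-1] = 1.0  # truncation: the last stick takes all remaining mass, so weights sum to 1
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining


def components_for_mass(beta, mass=0.95):
    """Smallest number of components whose largest weights cover `mass` of the total."""
    cum = np.cumsum(np.sort(beta)[::-1])
    return int(np.searchsorted(cum, mass) + 1)


rng = np.random.default_rng(0)
K = 40  # truncation level; alpha values below are illustrative
for alpha in (1.0, 10.0):
    counts = [components_for_mass(truncated_gem(alpha, K, rng)) for _ in range(200)]
    print(alpha, np.mean(counts))
```

With a small concentration parameter, most of the mass falls on a handful of components, which is the sense in which such a prior favours grammars that reuse a few rules.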
We will show that the expected counts of non-regular parses are in fact far from small (in comparison with the counts of regular parses as well as a random baseline), and thus non-regular grammars are non-negligibly probable analyses of the gibbon data, countering the previous argument for the regularity of animal song [1-3]. This analysis alone, however, does not tell us what made the non-regular grammars probable: (i) Did the non-regular grammars enable a better fit to the data (explanatory power)? (ii) Did the non-regular grammars allow for a more compact explanation of the data, using a more limited variety of production rules (compactness)? To address these questions, we will further diagnose the induced posterior over PCFGs by comparing it with that of hidden Markov models (HMMs) induced from the same data with an HDP prior [24,32]. HMMs restrict possible grammars to the regular ones, and the comparison helps us understand what kinds of contribution the non-regular grammars (only possible when the hypothesis space is the PCFGs) make to the posterior inference. The question about the explanatory power of the non-regular grammars (i) is addressed by comparing the PCFG- and HMM-based posterior predictive probability of the held-out test data. The compactness of grammars (ii) is evaluated by the expected type counts of production rules used for a PCFG/HMM to generate the training data.

Results

Posterior inference of PCFG

The ‘total’ bar of figure 2a shows the expected counts of left-branching, right-branching and non-regular parses of the training data. Data strings shorter than 3 have a unique parse, which is regular (both left- and right-branching); their counts are reported separately (length < 3). A total of 48.00% of the training data (1328.08 strings of the total N = 2767) are expected to have non-regular parses. The ratio increases to 86.97% when we focus on the strings of length at least 4 (N = 1527), for which non-regular parses are logically possible. Note that these large proportions are not due to the greater variety of non-regular parses compared with regular parses. In other words, it is not the case that all possible parses are almost uniformly probable (i.e. induction failure), so that the many non-regular parses of each string (of length l) collectively outweigh its only two regular parses. We can see this from the ‘optimal’ bar of figure 2a, which shows the expected counts of the Bayes-optimal parses (those with the greatest posterior probability) [25]. The expected counts of the optimal parses tell us how confidently the data were parsed: e.g. if a data string has a unique probable parse (= optimal parse), the expected count of that parse is almost 1. If, on the other hand, all possible parses of the data string are equally probable, the expected count of the optimal parse is only $1/C_{l-1}$, where $C_{l-1} = \frac{(2l-2)!}{l!\,(l-1)!}$ is the number of binary-branching parses of a string of length l (the (l − 1)th Catalan number). Figure 2a shows that the expected count of the non-regular optimal parses of the training data is 717.35, smaller than the total expected count of the non-regular parses, meaning that there was some uncertainty in the parsing. However, the expected count of the non-regular optimal parses is still an order of magnitude greater than the uniform baseline (56.11, represented by the ‘uniform baseline’ bar of figure 2a).
The expected counts of the non-regular optimal parses are also greater than those of the left-branching parses (105.92 in total, 37.80 for the optimal) and the right-branching parses (315.00 in total, 378.33 for the optimal).
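The uniform baseline rests on a simple count: a string of length l has $C_{l-1}$ unlabelled binary parses (a Catalan number), so if all of them were equally probable, the optimal parse would receive an expected count of only $1/C_{l-1}$ and the non-regular parses would hold the share $(C_{l-1} - 2)/C_{l-1}$ of the mass. The Python sketch below computes these per-string quantities. It is our illustration; the exact aggregation behind the baseline bars in figure 2 is not spelled out here and is not reproduced.

```python
from math import comb


def n_parses(length):
    """Number of unlabelled binary-branching parses of a string: C_{length-1}."""
    n = length - 1
    return comb(2 * n, n) // (n + 1)


def optimal_count_if_uniform(length):
    """Expected count of the optimal parse when all parses are equally probable."""
    return 1.0 / n_parses(length)


def non_regular_share_if_uniform(length):
    """Share of probability mass on non-regular parses under a uniform distribution."""
    total = n_parses(length)
    regular = 1 if length <= 2 else 2  # left- and right-branching coincide for length <= 2
    return (total - regular) / total


for length in range(3, 9):
    print(length, n_parses(length), round(optimal_count_if_uniform(length), 4),
          round(non_regular_share_if_uniform(length), 4))
```

Because $C_{l-1}$ grows rapidly with l, a uniform (failed) induction would put almost all mass on non-regular parses while giving each optimal parse a tiny expected count; the observed optimal counts rule this out.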

Figure 2.

Expected counts of parses of the training (a) and test (b) data. The ‘total’ bar shows the total expected counts of each parse type. The ‘optimal’ bar shows the expected counts of the optimal parses. The ‘uniform baseline’ bar shows the expected counts of the non-regular parses under the assumption of a uniform distribution over the logically possible parses of each data string.

A similar pattern was observed with the held-out test data (figure 2b). A total of 54.13% of the test data (3974.86 strings of the total N = 7343) and 88.23% of the strings of length at least 4 (N = 4505) are expected to have non-regular parses. The expected count of the non-regular optimal parses is 1895.36, an order of magnitude greater than the uniform baseline (166.23). The expected count of the non-regular optimal parses is also greater than those of the left-branching parses (217.72 in total, 61.51 for the optimal) and the right-branching parses (886.42 in total, 1144.21 for the optimal). In summary, the results suggest that a large portion of the gibbon data was parsed in non-regular ways, and thus non-regular grammars are non-negligibly probable analyses of the data in the posterior.

Predictive power

Figure 3 shows the distribution of the log posterior predictive probability density of the test data divided by the string lengths (and thus ‘per data point’) under the induced PCFG (mean: −42.900044, std: 3.421662) and HMM (mean: −42.797174, std: 3.535390). The mean density under the PCFG is slightly lower than that under the HMM. This holds even when we focus on the strings of length 4 or greater, for which non-regular parses are possible (N = 4505; mean: −41.954381 and std: 1.886496 with PCFG; mean: −41.794641 and std: 1.964568 with HMM). This indicates that the non-regular parses allowed by the PCFG did not provide extra predictive power beyond the regular parses produced by the HMM.
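The per-data-point normalization and the long-string comparison above can be sketched in Python as follows. The function names are hypothetical, and we assume a population standard deviation (ddof = 0), since the text does not state which estimator was used. The last line reproduces the gap between the reported means.

```python
import numpy as np


def per_data_point(log_densities, lengths):
    """Divide each string's log posterior predictive density by its length."""
    return np.asarray(log_densities, dtype=float) / np.asarray(lengths, dtype=float)


def summarize(pcfg_log_dens, hmm_log_dens, lengths, min_length=1):
    """Mean and (population) std of per-data-point scores for each model."""
    keep = np.asarray(lengths) >= min_length
    out = {}
    for name, dens in (("PCFG", pcfg_log_dens), ("HMM", hmm_log_dens)):
        scores = per_data_point(dens, lengths)[keep]
        out[name] = (float(scores.mean()), float(scores.std()))
    return out


# Gap between the reported means (PCFG minus HMM), in log density per data point.
print(round(-42.900044 - (-42.797174), 6))  # -0.10287
```

Dividing by length keeps long strings, whose joint log densities are much more negative, from dominating the comparison; `min_length=4` selects the subset on which non-regular parses are possible.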

Figure 3.

Log per-data-point posterior predictive probability density of test data (N = 7343) with PCFG and HMM.

Compactness

Table 2 reports the expected type counts of production rules used for a PCFG/HMM to generate the training data: the PCFG is expected to generate the training data with 52 (counted with the step function [· ≥ 1]) or 77 (counted with tanh) fewer rules than the HMM, and thus allows for a more compact analysis.

Table 2.

Expected type counts of production rules used for a PCFG/HMM to generate training data. The type counts were calculated in two ways by applying the step function [ · ≥1 ] and tanh(·) to the expected token frequency of rules.

type-count function	HMM	PCFG
[· ≥ 1]	274	222
tanh(·)	307.523023	230.429057
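The two type-count functions in Table 2 can be sketched in Python as below. The step function counts a rule as soon as its expected token frequency reaches 1, while tanh gives a soft count: close to 0 for rarely used rules and close to 1 for rules used a few times or more. The input frequencies in the example are made up; the final line reproduces the HMM-minus-PCFG differences quoted in the text.

```python
import numpy as np


def type_counts(expected_token_freqs):
    """Two ways to turn expected token frequencies of rules into a type count."""
    f = np.asarray(expected_token_freqs, dtype=float)
    step = int(np.sum(f >= 1.0))      # [f >= 1]: a rule counts once it is used at least once
    soft = float(np.sum(np.tanh(f)))  # tanh(f): near 0 for rare rules, near 1 for frequent ones
    return step, soft


print(type_counts([0.0, 0.02, 0.5, 1.0, 3.0, 250.0]))  # made-up frequencies

# Differences between the HMM and PCFG columns of Table 2.
print(274 - 222, round(307.523023 - 230.429057, 2))  # 52 77.09
```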

Discussion and conclusion

The parse induction result shows that non-regular parses were more probable latent structures of gibbon song than regular parses. This implies a superregular strong generative capacity of the gibbon’s grammar, which contradicts the previous opinion that regular grammars are sufficient to analyse animal song [1-3]. We claim neither that the gibbon is an exceptional species that exhibits a superregular system (due to, for example, its phylogenetic closeness to humans compared with songbirds) nor that the boundary between superregular and regular systems coincides with the boundary between primates and non-primates. Superregular analyses of animal song are currently understudied, and thus it is an open question whether such analyses are appropriate for other specific species. If animal song and human language both merit superregular analyses, comparative studies of the two could be more meaningful for our understanding of human language evolution than previously claimed by theoretical linguists [3]. The skepticism about the effectiveness of such comparative studies was based on the assumption that animal song was regular, and thus that there was no hope of finding grammatical similarities between human language and animal song. By analysing human language and animal data with the same class of grammars, we would obtain a deeper understanding of their similarities and differences, which could in turn help us build a more sophisticated theory of human language evolution. Another key finding of the present study is that PCFG does not improve fit to the gibbon data in comparison with HMM. Instead, the advantage of the superregular analysis of gibbon song is its compactness: fewer rules are necessary to analyse the data if non-regular parses are allowed.
Note that this pattern is consistent with previous observations of human syntax: a PCFG is more probable than HMMs/regular grammars given human language sentences, because it reduces the grammar size and thus enables greater prior probability, even though its predictive power (i.e. likelihood) is smaller [16]. The importance of the compactness metric in grammar evaluation has long been emphasized in the field of theoretical linguistics (by the proponents of the regular versus superregular difference between animal song and human language) [33,34]. However, the previous studies of animal song have evaluated models solely by the predictive power of the data [6,19], and the compactness metric has been overlooked. For future studies on animal song syntax, it is important to remember that the advantages of PCFG and other superregular hypotheses may be expressed in the form of compactness/greater prior, rather than improved model explanatory power or likelihood. Bayesian inference and similar approaches (e.g. minimum description length [28,31]) combine the two metrics in a mathematically natural way. Finally, note that the (posterior distribution of) PCFG induced here is not guaranteed to match the actual system behind gibbon song. The induced PCFG is just a computationally optimal hypothesis for the gibbon data at this point, and its biological validity should be assessed in future experimental studies (e.g. callback experiment). Moreover, our hypothesis space was still limited to the class of PCFGs, whereas the actual gibbon grammar might go beyond the context-free generative capacity. Sticking to the same PCFG hypothesis space in future studies would lead to the same error as the previous assumption of regularity. Importantly, linguists have suggested that (P)CFG is insufficient to fully describe human language [35,36], and that at least mildly context-sensitive computational power is necessary [37]. 
The recent success of language models based on recurrent neural networks further implies that some aspects of linguistic data are better captured by a more powerful, Turing-complete architecture. PCFG itself has also been improved so that it better captures frequently recurring structural patterns [38]. We—researchers of animal song—should open our eyes to these more recent achievements in computational and theoretical linguistics and adopt the new analytical techniques to make the comparative studies more fruitful.

Material and methods

Data

We recorded songs of three male gibbons in captivity at the Primate Research Institute, Kyoto University. One was an agile gibbon (H. agilis, age: 44 years or older), and the others were hybrids of H. agilis and H. albibarbis (ages: 19 and 18 years). (Note that H. albibarbis was long considered a subspecies of H. agilis [7,39] until a recent DNA study [40], and thus, the two species are phylogenetically similar.) The songs were originally recorded on an eight-channel microphone array (TAMAGO-03, System in Frontier, Inc., with the FPGA customized such that −12 dBSPL of additional gain was applied), and we used the recordings’ first channels for the analysis. The training data were recorded between 10 and 19 August 2017, between 18.00 and 9:00. The test data were recorded between 1 and 30 September 2017, during the same morning periods. The sampling rate was 16 000 Hz. The multi-channel recordings were intended for use in localization of the sound sources (i.e. singer identification) and separation of sounds from different sources, but neither the localization nor the separation was successful in our recording environment (we used the HARK programs with the default and customized transfer functions [41]). Nevertheless, previous observations of wild gibbons suggest that our recorded songs would have overlapped little among the three singers: it has been reported that males typically vocalize in turn, with those in adjacent territories staying silent during another’s singing [42-44]. The rarity of overlaps is also in accordance with our impression of the recordings. In addition, we suspect that overlaps (if any) would not make significant differences to the results of our grammatical analysis reported here. It is easily proved that switching among multiple finite-state automata can be seen as one large finite-state automaton, and thus data strings produced by such a process constitute a regular language in terms of formal language theory.
Although our approach is probabilistic and thus such a claim from formal language theory does not guarantee the validity of our conclusion, the proof above still suggests that data strings produced by multiple individuals do not immediately lead to superregularity. (Note, however, that female gibbons are known to vocalize simultaneously, in a type of song called great calls, and communication among multiple individuals is of major interest to researchers of the species [8]. Hence, sound localization and separation are important issues that should be addressed in future studies on gibbon song.)
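The finite-state argument above can be illustrated with the standard product construction. In the Python sketch below, we model the mixing of two singers as an arbitrary interleaving of their symbol sequences (our simplifying assumption), each singer being a deterministic finite automaton. The combined machine only ever tracks pairs of component states, so it has at most |Q1| × |Q2| states and is still a (nondeterministic) finite automaton. The two singers are hypothetical.

```python
def shuffle_accepts(dfa1, dfa2, string):
    """Accept strings formed by interleaving a song of dfa1 with a song of dfa2.

    Each DFA is (start_state, accepting_states, transitions), with transitions a dict
    mapping (state, symbol) -> state. The combined machine's states are pairs of
    component states, so there are at most |Q1| * |Q2| of them: still finite-state.
    """
    start1, acc1, delta1 = dfa1
    start2, acc2, delta2 = dfa2
    current = {(start1, start2)}
    for sym in string:
        nxt = set()
        for q1, q2 in current:
            if (q1, sym) in delta1:  # the next symbol was produced by singer 1
                nxt.add((delta1[q1, sym], q2))
            if (q2, sym) in delta2:  # ... or by singer 2
                nxt.add((q1, delta2[q2, sym]))
        if not nxt:
            return False
        current = nxt
    return any(q1 in acc1 and q2 in acc2 for q1, q2 in current)


# Hypothetical singers: A sings (ab)*, B sings c*.
singer_a = (0, {0}, {(0, "a"): 1, (1, "b"): 0})
singer_b = (0, {0}, {(0, "c"): 0})
for s in ["acb", "abcc", "ba", "a"]:
    print(s, shuffle_accepts(singer_a, singer_b, s))
```

Applying the subset construction to this pair-state machine would yield an equivalent deterministic automaton, so interleaved regular songs remain a regular language.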

Preprocessing

The inputs to our grammatical analyses were strings of acoustic features at representative points in the recorded sound streams. We first detected regions containing gibbon song in the sound streams by calculating the ratio r of the energy in the song-sound band (500–1500 Hz) to that in the total frequency band (0–8000 Hz). Each frequency band’s energy was the sum of the squared amplitudes in the band obtained from the sound spectrum calculated by the discrete Fourier transform (DFT windows: 20 ms, DFT overlap: 10 ms). A region of length greater than or equal to 500 ms was judged to contain song if r > 0.6 at every point in the region. We then downsampled the song regions, leaving only representative data points whose acoustic features constituted the data strings subject to the grammatical analyses. By the definition of the song regions above, there were no identifiable silent intervals inside the regions that could define smaller units of the song akin to birdsong syllables [18]. On the other hand, the song regions often consisted of multiple components similar to human speech syllables (figure 4), each of which was considered to correspond to an atomic event of vocal production. Accordingly, acoustic features at local peaks of the wideband envelope were used as the unit components of the string data for the grammatical analysis. The local peaks were systematically detected by a recent method using Hilbert-transformed signals based on bandpass filters designed to model human auditory functions [45]. The acoustic local peaks identified by this method are said to correspond to the articulatory local maxima of mouth opening in the production of human speech syllables. Given the anatomical similarities between the vocal tracts of humans and gibbons [46], we considered the method appropriate for locating representative data points of the gibbon song.
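The song-region detection step can be sketched in Python with NumPy as follows. It computes the in-band energy ratio r per 20 ms frame with a 10 ms hop and keeps runs of frames with r > 0.6 that last at least 500 ms. Assumptions not stated in the text: a Hann window on each frame, and the way a region's start and end times are derived from its first and last frames. The function name is hypothetical.

```python
import numpy as np


def song_regions(x, sr=16000, win_ms=20, hop_ms=10, band=(500.0, 1500.0),
                 r_thresh=0.6, min_ms=500):
    """Return (start_s, end_s) of regions where the in-band energy ratio r exceeds
    r_thresh in every frame and the region lasts at least min_ms."""
    win = int(sr * win_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack([x[i * hop:i * hop + win] for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames * np.hanning(win), axis=1)) ** 2  # Hann window: our assumption
    freqs = np.fft.rfftfreq(win, d=1.0 / sr)                            # 0 ... sr/2
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    r = spec[:, in_band].sum(axis=1) / (spec.sum(axis=1) + 1e-12)
    regions, start = [], None
    for i, is_song in enumerate(np.append(r > r_thresh, False)):  # sentinel closes the last run
        if is_song and start is None:
            start = i
        elif not is_song and start is not None:
            t0, t1 = start * hop / sr, ((i - 1) * hop + win) / sr
            if (t1 - t0) * 1000 >= min_ms:
                regions.append((t0, t1))
            start = None
    return regions


# Synthetic example: 1 s of weak noise, 1 s of a 1000 Hz tone, 1 s of weak noise.
rng = np.random.default_rng(1)
t = np.arange(16000) / 16000
x = np.concatenate([0.01 * rng.normal(size=16000), np.sin(2 * np.pi * 1000 * t),
                    0.01 * rng.normal(size=16000)])
print(song_regions(x))  # a single region spanning roughly 1 s to 2 s
```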

Figure 4.

Local peaks in a single song region. The top and bottom rows show the raw sound wave and the spectrogram containing the song region, respectively. The vertical dotted lines stretching from the middle row to the sound wave represent the local peaks detected by the algorithm.

The details of the local peak detection algorithm are as follows. We first bandpass-filtered waveforms in the song regions into six frequency bands with cut-off frequencies at 80, 260, 600, 1240, 2420, 4650 and 7999 Hz [45,47]. The resulting bandpassed signals were then Hilbert-transformed, and the absolute values of the six transformed signals (narrowband envelopes) were summed to obtain a wideband envelope [45]. The search for the local peaks of the wideband envelope started with the detection of their candidates. Each local peak candidate was the peak within a 300 ms search window slid 10 ms at a time throughout the recordings. The candidates were adopted as local peaks if the following two conditions were met: (i) the difference between the local peak and the minimum in the 300 ms search window was at least 1.0 (Hilbert unit); and (ii) the local peak was not preceded by another one within 150 ms. Finally, we calculated the 13 mel-frequency cepstrum coefficients (MFCCs) of the 100 ms window starting from each of the local peaks. While MFCC was originally designed to capture acoustic features of human speech, it has also been adopted in analyses of vocal activities of primates [48] and other species [49]. The present study also followed this standard approach. Note, however, that neural networks have recently started replacing MFCC in the field of computational linguistics [50], and future studies on gibbon song and other animal vocal activities should also consider such new methods, just as we suggested for grammatical analyses of animal data. The MFCC vectors calculated at the local peaks were grouped into strings according to the song regions containing the peak locations.
This yielded 2767 strings of training data and 7343 strings of test data. (The training data are smaller than the test data because the induction of PCFG was time-consuming, and the amount adopted here was almost the maximum possible given our limited computational resources. The size of the test data, on the other hand, had no such restriction, so we used a greater amount: recordings over a month.) We fed these strings of MFCC vectors to the grammatical analyses described below.
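The local-peak detection described above can be sketched in Python with NumPy only. This is a simplified illustration, not the implementation of [45]: ideal FFT band-pass filters stand in for the auditory-model filters, and keeping only a band's positive-frequency bins (doubled) yields the analytic signal of the band-passed signal, whose magnitude is its Hilbert envelope. Two further assumptions are ours: a window maximum at the window's edge is treated as a slope rather than a peak, and condition (ii) is checked against already accepted peaks. The MFCC step is omitted, and the 1.0 threshold depends on the signal's amplitude scaling.

```python
import numpy as np

EDGES_HZ = [80, 260, 600, 1240, 2420, 4650, 7999]  # six bands, cut-offs as in the text


def wideband_envelope(x, sr=16000):
    """Sum of six narrowband Hilbert envelopes (ideal FFT band-pass filters: our simplification)."""
    spectrum = np.fft.fft(x)
    freqs = np.fft.fftfreq(len(x), d=1.0 / sr)
    env = np.zeros(len(x))
    for lo, hi in zip(EDGES_HZ[:-1], EDGES_HZ[1:]):
        gain = np.where((freqs >= lo) & (freqs < hi), 2.0, 0.0)  # positive band only, doubled
        env += np.abs(np.fft.ifft(spectrum * gain))              # |analytic signal| = envelope
    return env


def local_peaks(env, sr=16000, win_ms=300, hop_ms=10, min_rise=1.0, refractory_ms=150):
    """Sliding-window peak picking following conditions (i) and (ii) in the text."""
    win, hop = int(sr * win_ms / 1000), int(sr * hop_ms / 1000)
    refractory = int(sr * refractory_ms / 1000)
    candidates = set()
    for start in range(0, max(len(env) - win, 0) + 1, hop):
        seg = env[start:start + win]
        i = int(np.argmax(seg))
        # Our assumption: a maximum at the window edge is a slope, not a local peak.
        if 0 < i < len(seg) - 1 and seg[i] - seg.min() >= min_rise:  # condition (i)
            candidates.add(start + i)
    peaks = []
    for p in sorted(candidates):                                      # condition (ii)
        if not peaks or p - peaks[-1] >= refractory:
            peaks.append(p)
    return np.array(peaks, dtype=int)


# Synthetic example: a 1000 Hz tone with four Gaussian amplitude bumps.
sr = 16000
t = np.arange(2 * sr) / sr
centres = [0.3, 0.7, 1.1, 1.5]
amp = sum(5.0 * np.exp(-0.5 * ((t - c) / 0.04) ** 2) for c in centres)
x = amp * np.sin(2 * np.pi * 1000 * t)
print(local_peaks(wideband_envelope(x, sr), sr) / sr)
```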

Generative models

We adopted the basic design of PCFG with an HDP prior (HDP-PCFG) discussed in [24,25] and modified it in three ways. First, we assumed that the roots of PCFG parse trees could be variable, instead of assuming a unique root label (start symbol) as in Chomsky normal form. This modification allowed us to analyse incomplete data strings that appear in naturalistic data. For instance, English corpora can involve subjectless sentences (e.g. Just came back from the party.) and strings solely consisting of a noun phrase (e.g. Great job.). Variable roots enable a uniform treatment of such incomplete strings and substrings of complete strings. For example, just came back from the party could be parsed as a verb phrase whether it stands alone or appears within a complete sentence like John just came back from the party. Second, our data were MFCC vectors (of D := 13 dimensions) rather than discrete symbols, and thus the terminal symbols were latent variables in our HDP-PCFG. Accordingly, we also imposed an HDP prior on the terminal production rules. Finally, we assumed a Gaussian emission of the MFCC vectors conditioned on terminals of the HDP-PCFG, with Gaussian and inverse Wishart priors on the mean and covariance matrix parameters, respectively [23,51,52]. This generative model is formally described in figure 5a,c.

Figure 5.

HDP-PCFG with variable root and Gaussian emission conditioned on terminals (a + c), and HDP-HMM with variable initial state and Gaussian emission (b + c).

We also followed [24] and imposed an HDP prior on the HMM (HDP-HMM). The only difference was that MFCC vectors, rather than discrete symbols, were emitted conditioned on the hidden states [52]. This model is summarized in figure 5b,c. The free concentration parameters of the HDPs and beta distributions, denoted by α with subscripts, were all set to 1. The scale matrix of the inverse Wishart prior of the Gaussian emission was the identity matrix, and the degrees of freedom were ν0 = D − 1 + 0.001 = 12.001. The mean m0 of the Gaussian prior was set to the sample mean of all the MFCC vectors, and the scaling parameter of its covariance matrix was k0 = ν0 = 12.001 (cf. [23]).
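Under Gaussian and inverse Wishart priors, each terminal's emission parameters follow a normal-inverse-Wishart (NIW) distribution. The Python sketch below shows the standard conjugate NIW update with soft counts, which is the form that mean-field updates of q(μ, Σ) typically take (the soft counts being the expected assignments of data points to a terminal), using the hyperparameters stated above. It is our illustration of the textbook formulas, not the authors' code; the function name and arguments are hypothetical. Note that the inverse Wishart requires ν0 > D − 1, so ν0 = 12.001 for D = 13 lies just above the smallest admissible value, i.e. a very weak prior.

```python
import numpy as np

D = 13
nu0 = D - 1 + 0.001   # inverse-Wishart degrees of freedom (must exceed D - 1)
kappa0 = nu0          # scaling parameter of the Gaussian prior on the mean
psi0 = np.eye(D)      # inverse-Wishart scale matrix


def niw_posterior(X, r, m0, kappa0=kappa0, nu0=nu0, psi0=psi0):
    """Conjugate NIW update with soft counts r (e.g. expected terminal assignments)."""
    X = np.asarray(X, dtype=float)
    r = np.asarray(r, dtype=float)
    n = r.sum()
    kappa_n = kappa0 + n
    nu_n = nu0 + n
    xbar = (r @ X) / n if n > 0 else np.zeros_like(m0)
    diff = X - xbar
    scatter = (r[:, None] * diff).T @ diff            # weighted scatter matrix
    d = (xbar - m0)[:, None]
    m_n = (kappa0 * m0 + n * xbar) / kappa_n
    psi_n = psi0 + scatter + (kappa0 * n / kappa_n) * (d @ d.T)
    return m_n, kappa_n, nu_n, psi_n


rng = np.random.default_rng(0)
X = rng.normal(size=(50, D))                 # stand-in for MFCC vectors
m0 = X.mean(axis=0)                          # prior mean = sample mean, as in the text
m_n, kappa_n, nu_n, psi_n = niw_posterior(X, np.ones(len(X)), m0)
print(kappa_n, nu_n, psi_n.shape)
```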

Posterior inference

Obtaining the true posterior distribution of the HDP-PCFG/HMM given data is computationally intractable. Accordingly, we approximated the posterior by variational inference [24,25,51,52]. Variational inference approximates the intractable posterior distribution p(θ|x), where θ bundles the latent variables of the PCFG/HMM in figure 5, by another distribution q(θ) that belongs to a class of tractable distributions. We adopted the mean-field method and assumed independence of the latent variables in q(θ), so that for the HDP-PCFG q(θ) was the product of q(β), q(γ), the factors q(ϕ) of the rule probabilities, q(τ) (τ := (r, t, z, s)) and q(μ, Σ); a similar factorization was assumed for the HDP-HMM. The individual factor distributions were defined as follows. For all types of rule probabilities ϕ, q(ϕ) was a Dirichlet distribution. q(τ) for parses τ was a multinomial distribution. q(μ, Σ) was a Gaussian-inverse-Wishart distribution. Finally, q(β) and q(γ) were degenerate distributions that upper-bounded the possible numbers of non-terminals and terminals by positive integers $K_N$ and $K_T$, respectively: $q(\beta_k > 0) = 0$ for $k > K_N$ and $q(\gamma_k > 0) = 0$ for $k > K_T$. We set $K_N = 40$ and $K_T = 100$, and similarly upper-bounded the possible number of HDP-HMM states by K = 100. Only 35 non-terminals, 64 terminals and 58 states had expected frequencies greater than 0.05 for the training data, so we considered the truncation levels to be sufficiently large. Each variational factor distribution was updated iteratively so that the updates minimized the Kullback–Leibler (KL) divergence of the variational distribution q(θ) from the true posterior p(θ|x) (coordinate ascent algorithm). See [24,25] for details on updates of the factor distributions for the HDP-PCFG/HMM latent variables (figure 5a,b), and [51,52] for those related to the Gaussian emission (figure 5c). See also [53] for updates of the degenerate factor distributions q(β), q(γ) and q(ω).
The coordinate ascent algorithm only leads to a local minimum of the KL divergence, and accordingly, we tried 500 different sets of initial values for the parameters of q and reported the best run among them. Each run terminated either when the improvement in the KL divergence was smaller than 0.1 or when the maximum number of iterations (300) was reached. The statistical values reported in the Results section were estimated as follows based on the variational approximation. The expected count of a parse τ of a training/test data string x was proportional to the exponential of its expected log probability, $\exp(\mathbb{E}_{q}[\log p(x, \tau \mid \theta_{\mathrm{PCFG}})])$, where θPCFG bundles the latent variables of the HDP-PCFG in figure 5a,c [25]. The Bayes-optimal parses are those that maximize this expected log probability. While the number of parse types is intractable, computation of the expected counts of the left-branching, right-branching and Bayes-optimal parses is tractable: we computed the exponential of their expected log probability and normalized it by the total, $\sum_{\tau'} \exp(\mathbb{E}_{q}[\log p(x, \tau' \mid \theta_{\mathrm{PCFG}})])$, which can be computed by dynamic programming (we adopted Earley’s algorithm in particular [54,55]). The expected counts of the non-regular parses are simply the total counts minus the expected counts of the left- and right-branching parses. The posterior predictive probability density of the test data under PCFG/HMM was given by [56,57]. Finally, the expected numbers of production rules (root-labelling, branching and terminal production rules for the PCFG; initial and inter-state transitions for the HMM) used for the training data were estimated by applying [· ≥ 1] and tanh to the expected token frequency of the rules. The expected frequency of a rule A → λ was given by the parameters of its Dirichlet variational distribution q(ϕ) minus its base counts (for terminal production rules), (for branching rules such that λ = (B, C)), or αβ (for other rules).
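The normalization of parse counts described above can be sketched in Python. Each rule is given a weight standing in for exp(E_q[log ϕ]); the total over all parses comes from a dynamic program, and the left- and right-branching counts are the weights of those two fixed trees divided by the total. Note the substitution: the authors used Earley's algorithm, whereas this sketch uses a CKY-style inside algorithm over binary rules, which computes the same total for a binary grammar. All weights, sizes and names here are hypothetical, and each leaf weight is assumed to already sum over terminals and their Gaussian emission densities.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 3  # hypothetical number of nonterminals
# Hypothetical weights standing in for exp(E_q[log phi]); they need not sum to 1.
root_w = rng.uniform(0.1, 1.0, size=K)            # root-labelling rules
binary_w = rng.uniform(0.1, 1.0, size=(K, K, K))  # A -> B C
leaf_w = rng.uniform(0.1, 1.0, size=(5, K))       # A -> terminal -> x_i, summed over terminals


def combine(left, right):
    """Inside vector of a parent from its daughters' inside vectors."""
    return np.einsum("abc,b,c->a", binary_w, left, right)


def inside_total(leaf_w):
    """Total weight of all binary trees and labellings (CKY-style inside algorithm)."""
    n = len(leaf_w)
    chart = {(i, i + 1): leaf_w[i] for i in range(n)}
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            chart[i, j] = sum(combine(chart[i, k], chart[k, j]) for k in range(i + 1, j))
    return float(root_w @ chart[0, n])


def tree_weight(tree, leaf_w):
    """Weight of one unlabelled tree (nested tuples of leaf indices), summed over labels."""
    def inside(t):
        return leaf_w[t] if isinstance(t, int) else combine(inside(t[0]), inside(t[1]))
    return float(root_w @ inside(tree))


def left_tree(n):
    t = 0
    for i in range(1, n):
        t = (t, i)
    return t


def right_tree(n):
    t = n - 1
    for i in range(n - 2, -1, -1):
        t = (i, t)
    return t


def all_trees(i, j):
    """All unlabelled binary trees over leaves i..j-1 (for brute-force checking only)."""
    if j - i == 1:
        return [i]
    return [(l, r) for k in range(i + 1, j) for l in all_trees(i, k) for r in all_trees(k, j)]


def expected_parse_counts(leaf_w):
    """Expected counts of left-branching, right-branching and non-regular parses (length >= 3)."""
    n = len(leaf_w)
    total = inside_total(leaf_w)
    left = tree_weight(left_tree(n), leaf_w) / total
    right = tree_weight(right_tree(n), leaf_w) / total
    return {"left": left, "right": right, "non-regular": 1.0 - left - right}


print(expected_parse_counts(leaf_w))
```

As in the text, the non-regular count is obtained by subtraction, so no enumeration of the exponentially many non-regular parses is needed; `all_trees` is included only so the dynamic program can be checked by brute force on short strings.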
