Hal Whitehead¹, Taylor A Hersh¹
¹ Department of Biology, Dalhousie University, Halifax, Nova Scotia, Canada.
Abstract
Recordings of calls may be used to assess population structure for acoustic species. This can be particularly effective if there are identity calls, produced nearly exclusively by just one population segment. The identity call method, IDcall, classifies calls into types using contaminated mixture models, and then clusters repertoires of calls into identity clades (potential population segments) using identity calls that are characteristic of the repertoires in each identity clade. We show how to calculate the Bayesian posterior probabilities that each repertoire is a member of each identity clade, and display this information as a stacked bar graph. This methodology (IDcallPP) is introduced using the output of IDcall but could easily be adapted to estimate posterior probabilities of clade membership when acoustic clades are delineated using other methods. This output is similar to that of the STRUCTURE software which uses molecular genetic data to assess population structure and has become a standard in conservation genetics. The technique introduced here should be a valuable asset to those who use acoustic data to address evolution, ecology, or conservation, and creates a methodological and conceptual bridge between geneticists and acousticians who aim to assess population structure.
Introduction
Many animals communicate or sense their environment using sound [1]. It is often logistically easier to record acoustic signals than to collect genetic, morphological, or other phenotypic data. Thus, the characteristics of animal calls have been used to examine a range of issues in biology, including evolution [e.g. 2], population structure [e.g. 3, 4] and conservation [e.g. 5]. Call attributes can be genetically or culturally inherited [e.g. 6, 7]. In either case, if there is drift or selection, variation in these attributes may signal population structure. This will especially be the case if the calls themselves structure populations, for instance if song attributes proscribe mate choice [e.g. 8]. Additionally, if the animals themselves use call attributes to identify segments of a population (“us versus them”), and this population structure circumscribes social interactions, and so social learning opportunities, the acoustically-distinguished population segments will tend to have distinct cultural behaviour in various contexts, including non-acoustic behaviour, such as foraging techniques [9].
Thus, there is increasing interest in using acoustic data to examine population structure. This interest is, however, dwarfed by the use of molecular genetic methodologies. The majority of population structures inferred for animal species are based on genetic data, which are processed using a range of analytical methods [10]. Of these, the STRUCTURE package is particularly popular and influential [11]. STRUCTURE uses a Bayesian approach to calculate, from genetic data, the posterior probabilities that individuals belong to each of K source populations, or, in the admixture option, the proportional assignment of each individual to each of the populations [12, 13]. The results are displayed as stacked bar plots of posterior probabilities that each individual is a member of each population segment, or the estimated mixture proportions of source populations for a given individual. STRUCTURE thus gives direct estimates of the number of population segments, their distributions (in space, time, or along other axes), and confidence in allocations of individuals to the different population segments.
An analogous method of analyzing and displaying acoustic data has the potential to be similarly useful for calling animals [14]. The IDcall routine (summarized in Fig 1) uses multivariate information on calls that are grouped into repertoires. It classifies the calls into types using contaminated mixture models. Each call has a probability of being a member of each type. The repertoires of calls are then clustered into identity clades by identity call types: identity clades are marked by one or more identity calls that are made frequently by the repertoires in the identity clade and rarely by those outside it. The IDcall framework is then somewhat analogous to the initial steps of STRUCTURE: the repertoires (from individuals or groups of individuals) are classified into population segments (identity clades), with the number of population segments being determined by the routine. However, only some call types are identity calls, and some repertoires may not be assigned to an identity clade.
Fig 1
The major elements of the IDcall framework.
The recordings of sounds are delineated into identity clades based on the identity calls that characterize them.
Here we show how the output of IDcall can be used to calculate the posterior probabilities that each repertoire is a member of each identity clade, and then display these posterior probabilities as a stacked bar graph using a routine that we call IDcallPP. A similar approach could be used with other methods of clustering acoustic repertoires to ascertain confidence in the assignment of acoustic repertoires to clusters. These outputs, especially the stacked bar graphs, parallel those of STRUCTURE.
Methodology
Theory
In IDcall (see Fig 1), the contaminated mixture model algorithm estimates the probability that each call, i, belongs to each call type, j, as u(i,j) (where u(i,j) = 0 if call i is characterized by a different set of variables than the calls in j). The usage, U, of each call type, j, for each repertoire, r, is calculated by summing the probability of call type membership for all calls {i} in the repertoire and dividing by the total number of calls in the repertoire, n(r):
$$U(j,r) = \frac{1}{n(r)} \sum_{i \in r} u(i,j) \qquad \text{(Eq 1)}$$
The following procedures in IDcallPP are summarized in Fig 2.
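As an illustration, the usage calculation described above (the paper's Eq 1) can be sketched in a few lines of Python. This is not the authors' R implementation: the function name `usage_by_repertoire` and the input layout (one row of type-membership probabilities per call, plus a parallel list of repertoire labels) are assumptions made for this example.

```python
def usage_by_repertoire(u, repertoire_of):
    """Average call-type membership probabilities within each repertoire.

    u[i][j] is the probability that call i belongs to call type j.
    repertoire_of[i] is the (hypothetical) label of call i's repertoire.
    Returns (U, n), where U[r][j] is the usage of type j in repertoire r
    and n[r] is the number of calls in repertoire r.
    """
    sums, n = {}, {}
    for probs, r in zip(u, repertoire_of):
        acc = sums.setdefault(r, [0.0] * len(probs))
        for j, p in enumerate(probs):
            acc[j] += p          # running sum of u(i, j) over calls in r
        n[r] = n.get(r, 0) + 1   # running count n(r)
    U = {r: [s / n[r] for s in acc] for r, acc in sums.items()}
    return U, n


# Three calls, two call types, two repertoires (made-up numbers).
u = [[1.0, 0.0], [0.6, 0.4], [0.0, 1.0]]
U, n = usage_by_repertoire(u, ["A", "A", "B"])
print(U, n)
```

Because each row of `u` sums to one, the usages within a repertoire also sum to one, so each repertoire's usage vector can be read as a distribution over call types.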
Fig 2
The major elements of IDcallPP.
The output of IDcall is used to produce stacked barplots of the posterior probabilities that each repertoire is a member of each identity clade.
Once repertoires are assigned to an identity clade, c, we can estimate the probability distribution of call types in the identity clade as in Eq 2, where r represents the repertoire of interest.
This is somewhat circular: if repertoire R was assigned membership of identity clade c, the call type distribution within R is used to estimate the distribution of call types of c, which will then be used to calculate the likelihood that repertoire R is from identity clade c. In other words, the calls heard in a repertoire are used to delineate identity clades, and this is the very information that is used to calculate the posterior probability that the repertoire is a member of an identity clade. To remove this circularity, we omit repertoire R from the calculation of the call type distribution of identity clade c when we are addressing the likelihood that R is a member of c, giving a revised version of Eq 2 for calls in repertoire R (Eq 3).
Then, using the multinomial distribution, we obtain the likelihood of the distribution of call types in a repertoire R, given that the repertoire is a member of identity clade c (Eq 4).
Bayes’ theorem then gives the posterior probability that repertoire R is a member of identity clade c (Eq 5), where Pr(R∈c) is the prior probability that repertoire R is a member of identity clade c. In IDcallPP, these posterior probabilities are displayed as stacked barplots for each repertoire, as well as being output in a .csv spreadsheet file.
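The chain from clade call-type distributions to posterior probabilities can be sketched as follows. This is a minimal illustration under stated assumptions, not the IDcallPP code: all function names are hypothetical, pooling repertoires within a clade weighted by their number of calls is an assumption, and the multinomial coefficient is dropped because it is identical for every clade and cancels in Bayes' theorem. The optional argument `v` divides the call counts, mirroring the variance inflation check described in the Discussion.

```python
import math


def clade_distribution(U, n, members, leave_out=None):
    """Pooled call-type distribution of one identity clade.

    U[r][j] is the usage of call type j in repertoire r, n[r] the number
    of calls in r, and members the repertoires assigned to the clade.
    Weighting each repertoire by n[r] is an assumption of this sketch.
    """
    pool = [r for r in members if r != leave_out]
    if not pool:
        raise ValueError("clade has no repertoires left after omission")
    total = sum(n[r] for r in pool)
    n_types = len(U[pool[0]])
    return [sum(n[r] * U[r][j] for r in pool) / total for j in range(n_types)]


def log_likelihood(counts, q, v=1.0):
    """Multinomial log-likelihood (up to a clade-independent constant).

    counts[j] is n(R) * U(j, R) and may be fractional; q[j] is the clade's
    probability of call type j; v = 1 leaves the counts unchanged.
    """
    total = 0.0
    for n_j, q_j in zip(counts, q):
        if n_j == 0:
            continue
        if q_j == 0:
            return -math.inf  # the clade never uses a type that R does use
        total += (n_j / v) * math.log(q_j)
    return total


def posterior(log_liks, priors):
    """Posterior clade probabilities from log-likelihoods and positive priors."""
    weights = [ll + math.log(p) for ll, p in zip(log_liks, priors)]
    top = max(weights)
    if top == -math.inf:
        raise ValueError("repertoire has zero likelihood under every clade")
    unnorm = [math.exp(w - top) for w in weights]  # subtract max for stability
    norm = sum(unnorm)
    return [x / norm for x in unnorm]


# Toy data: repertoire "R" is assigned to clade 1 together with "a".
U = {"R": [0.75, 0.25], "a": [0.8, 0.2], "b": [0.2, 0.8]}
n = {"R": 4, "a": 50, "b": 50}
counts = [n["R"] * x for x in U["R"]]
q1 = clade_distribution(U, n, ["R", "a"], leave_out="R")  # R omitted from its own clade
q2 = clade_distribution(U, n, ["b"])
for v in (1.0, 2.0):
    lls = [log_likelihood(counts, q, v) for q in (q1, q2)]
    print(v, posterior(lls, [0.5, 0.5]))
```

With these made-up numbers and equal priors, the posterior for clade 1 is about 0.94 when v = 1 and 0.8 when v = 2, the same pair of values that the Discussion reports for two identity clades and a variance inflation factor of 2.0.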
Priors
There are two simple formulations for prior probabilities:
A. Equal prior probabilities of each identity clade. This is analogous to the “no admixture model” of STRUCTURE [13].
B. Prior probabilities for each identity clade are the proportion of repertoires assigned to the identity clade. This might make sense if sampling was sufficiently random or uniform (over space, time, or other relevant axes) so that the number of assignations to each identity clade was roughly proportional to its incidence in the population being considered. However, if sampling or assignation might be biased, then this option is likely inappropriate.
Other types of priors might be sensible. For instance, in the admixture model of STRUCTURE, the priors for membership of population segments are estimated using Bayesian techniques from the data itself [13]. Such formulations have yet to be implemented for IDcallPP but are a promising avenue for future development.
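The two prior formulations can be written down directly. The sketch below is illustrative only: the function name `clade_priors` and the use of `None` for unassigned repertoires are assumptions, as is the choice to compute the proportions in the second option over assigned repertoires only, so that the priors sum to one.

```python
def clade_priors(assigned, option="A"):
    """Prior probability of each identity clade.

    assigned maps repertoire -> clade label, or None if the repertoire
    was not assigned to any clade. Option "A" gives equal priors; option
    "B" gives each clade's share of the assigned repertoires.
    """
    labels = [c for c in assigned.values() if c is not None]
    clades = sorted(set(labels))
    if not clades:
        raise ValueError("no repertoire is assigned to a clade")
    if option == "A":
        return {c: 1.0 / len(clades) for c in clades}
    if option == "B":
        return {c: labels.count(c) / len(labels) for c in clades}
    raise ValueError("option must be 'A' or 'B'")


# Hypothetical assignments: three repertoires in c1, one in c2, one unassigned.
assigned = {"r1": "c1", "r2": "c1", "r3": "c1", "r4": "c2", "r5": None}
print(clade_priors(assigned, "A"))
print(clade_priors(assigned, "B"))
```

In this toy case the equal-prior option gives 0.5 to each clade, while the proportional option gives 0.75 and 0.25, which shows how strongly uneven sampling could pull the second formulation toward the better-sampled clade.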
Options
IDcallPP has the following options:
Priors
The prior probabilities of identity clade membership are either A (equal) or B (proportion of assigned repertoires), as described above. The default is A.
Call types used
Eq 4 can use all call types or just those found to be identity calls. In our explorations with real data (see below), we found that the “all call types” option produced clearer posterior probability plots, presumably because, in our example data sets, the non-identity calls were distributed differently among identity clades, and so provided useful information when assigning identity clade membership. This need not necessarily be the case for all data sets. However, using all call types is the default.
Repertoire order
The order in which the repertoires are displayed in the stacked bar plot is, by default, the order in the dendrogram plus heat map plot output from IDcall, so that the two plots can be displayed directly above one another with the repertoires lining up (see Figs 3–6). Alternatively, the input order of repertoires may be used. This could be useful if the distribution of identity clades across some axis of interest (such as space or time) is desired.
Fig 3
Identity clades, identity song types, and posterior probabilities of repertoire assignment for crickets.
Output from IDcall (top; taken from [14]) depicts similarity among male Teleogryllus cricket calling songs recorded from individuals derived from 16 field sites in Australia (data from [17]). ‘Oce’ indicates song repertoires recorded from crickets belonging to the oceanicus species (in teal). ‘Com’ denotes song repertoires recorded from crickets belonging to the commodus species (in brown). The letters in parentheses denote field sites (see [17] for site abbreviations). For each song, we created an interval vector comprised of four traits: chirp pulse length, chirp interpulse interval, chirp-trill interval, and trill pulse length (see [17] for details on how song traits were measured). Each repertoire (i.e. branch in the dendrogram) contains all the songs recorded from first-generation crickets that were derived from wild-caught individuals from each field site. (A) The average linkage hierarchical clustering dendrogram thus depicts similarity among song interval vectors of male crickets from the 16 sites. (B) The heatmap shows identity song type usage (rows) for each field site (columns) in shades of grey, with usage calculated based on probabilistic assignment of songs to types. Identity song type codes are on the left of the heat map and centroid song interval vector plots are on the right (with the spaces between the dots representing chirp pulse length, chirp interpulse interval, chirp-trill interval, and trill pulse length, and the scale bar in seconds). (C) The output from IDcallPP shows the posterior assignment probabilities of each repertoire belonging to each identity clade (i.e. species) as a stacked bar plot. See [14] and [17] for additional details.
Fig 6
Identity clades, identity coda types, and posterior probabilities of repertoire assignment for Pacific sperm whales.
Output from IDcall (top; taken from [14]) depicts similarity among coda repertoires of sperm whale groups (data from [14]) recorded in the Pacific Ocean. Colored identity clades correspond to a putative new sperm whale clan (orange) and four known clans: Short (red), Four-Plus (pink), Plus-One (blue), and Regular (green). Location abbreviations are: BAK = Baker Island, CHI = Chile, EAS = Easter Island, ECU = Ecuador, GAL = Galápagos Islands, JAR = Jarvis Island, NEW = New Zealand, PacPAN = Pacific coast of Panama, PER = Peru, and TON = Tonga. Each coda was represented as a vector of inter-click intervals. Each repertoire (i.e. branch in the dendrogram) contains all the codas recorded from a single photo-identified group of sperm whales in a year. (A) The average linkage hierarchical clustering dendrogram thus depicts similarity among 106 sperm whale coda repertoires. (B) The heatmap shows identity coda type usage (rows) for each repertoire (columns) in shades of grey, with usage calculated based on probabilistic assignment of codas to types. Identity coda type codes are on the left of the heat map and centroid coda interval vector plots are on the right (with the spaces between the dots representing the inter-click intervals and the scale bar in seconds). (C) The output from IDcallPP shows the posterior assignment probabilities of each repertoire belonging to each identity clade (i.e. vocal clan) as a stacked bar plot. See [14] for additional details.
Colors of identity clades
By default, the stacked bar plot of posterior probabilities uses the same color for each identity clade as in the dendrogram plus heat map plot output from IDcall (see Figs 3–7). However, these can be changed.
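To make the display options concrete, the sketch below turns a table of posterior probabilities into the segments of a stacked bar plot for a chosen repertoire order and clade order. It is a hypothetical helper rather than part of IDcallPP; the name `stacked_segments` and the dictionary layout are assumptions.

```python
def stacked_segments(posteriors, order, clades):
    """Segments of a stacked bar plot, one bar per repertoire.

    posteriors[r][c] is the posterior probability that repertoire r
    belongs to clade c. order lists repertoires left to right (for
    example dendrogram order or input order); clades fixes the stacking
    order. Each segment is (clade, bottom, height).
    """
    bars = []
    for rep in order:
        bottom = 0.0
        segments = []
        for clade in clades:
            height = posteriors[rep][clade]
            segments.append((clade, bottom, height))
            bottom += height  # next clade is stacked on top of this one
        bars.append((rep, segments))
    return bars


# Made-up posteriors for two repertoires and two clades.
posteriors = {"r1": {"c1": 0.9, "c2": 0.1}, "r2": {"c1": 0.3, "c2": 0.7}}
bars = stacked_segments(posteriors, order=["r2", "r1"], clades=["c1", "c2"])
print(bars)
```

Each `(bottom, height)` pair can be handed to a plotting routine, for instance as the `bottom` and `height` arguments of matplotlib's `bar`, with one color per clade.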
Fig 7
Stacked barplots showing posterior probability distributions of repertoires to identity clades using all calls (above) and just identity calls (below) for the four example data sets.
Application examples
We use the same four example acoustic data sets from three taxa as in [14]: Australian field crickets (Teleogryllus spp.; hereafter crickets), grey-breasted wood-wrens (Henicorhina leucophrys; hereafter wrens) and sperm whales (Physeter macrocephalus; Atlantic/Mediterranean and Pacific datasets). These examples investigate population structure within species (sperm whales), among subspecies (wrens), and among species (crickets). For details of call variables, repertoire definitions, etc., see [14]. In each of Figs 3–6, we show the dendrogram plus heat map output from IDcall [14] above the stacked bar plot of posterior probabilities of identity clade membership from IDcallPP (using the default options listed above).
In all four example data sets, the posterior assignment probability plots from IDcallPP generally support the identity clade assignations of IDcall. The posterior probabilities for the cricket data set (Fig 3) show almost perfect assignation to identity clades. Assignation is also very good for the wren data set (Fig 4), with almost all posterior probabilities to the assigned clade greater than 0.7. The Atlantic/Mediterranean sperm whale data (Fig 5) are also very “clean”, with only two repertoires having posterior probabilities to an identity clade of less than 0.7. One is a repertoire (leftmost arrow in Fig 5) that was not assigned to an identity clade; the other (rightmost arrow in Fig 5) was a repertoire that appeared on initial annotation to be a mixture of the codas from two previously described sperm whale clans (identity clades), Eastern Caribbean 1 and Eastern Caribbean 2 [15]. The posterior probabilities for the Pacific sperm whale repertoires are somewhat less clear (Fig 6). The great majority of the repertoires assigned to four of the identity clades (putative new, Four-Plus, Plus-One, and Regular clans) had posterior probabilities of >0.7 for their assigned clans. A few of the exceptions echo previous analyses. For instance, the recordings of a repertoire without a clearly dominant posterior probability (arrow in Fig 6) were from a day when photoidentification evidence indicated that there might be two clans present. Additionally, the repertoires assigned to the Short clan generally have much lower posterior support, which agrees with conclusions from the original IDcall analysis that the nature and structure of this identity clade were much less certain [14].
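The kind of screening described above, picking out repertoires with weak posterior support, is simple to automate once the posterior probabilities are available (for example from the .csv output). The sketch below is a hypothetical helper, not an IDcallPP function. The 0.7 level is the descriptive benchmark used in the text rather than a formal cut-off, and using the largest posterior for unassigned repertoires is an assumption.

```python
def low_support(posteriors, assigned, threshold=0.7):
    """List repertoires whose posterior support falls below threshold.

    posteriors[r] maps clade -> posterior probability for repertoire r.
    assigned[r] is the clade IDcall gave r, or None if unassigned; for
    unassigned repertoires the largest posterior is used instead.
    """
    flagged = []
    for rep, probs in posteriors.items():
        clade = assigned.get(rep)
        support = max(probs.values()) if clade is None else probs[clade]
        if support < threshold:
            flagged.append((rep, clade, support))
    return flagged


# Made-up example with one well-supported, one ambiguous and one unassigned repertoire.
posteriors = {
    "r1": {"c1": 0.95, "c2": 0.05},
    "r2": {"c1": 0.55, "c2": 0.45},
    "r3": {"c1": 0.40, "c2": 0.60},
}
assigned = {"r1": "c1", "r2": "c1", "r3": None}
print(low_support(posteriors, assigned))
```

Repertoires flagged this way are candidates for closer inspection, as with the possible two-clan recordings discussed above, rather than automatic reassignment.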
Fig 4
Identity clades, identity song types, and posterior probabilities of repertoire assignment for wrens.
Output from IDcall (top; taken from [14]) depicts similarity among male songs (data from [18]) from two subspecies of grey-breasted wood-wren: Henicorhina leucophrys hilaris (salmon) and Henicorhina leucophrys leucophrys (navy). Genotyping abbreviations are: Hil, parental H. l. hilaris; Leu, parental H. l. leucophrys; F1, first-generation hybrid; BC-hil, backcross between Hil and F1; and BC-leu, backcross between Leu and F1. For each song, we created an interval vector comprised of three traits: averaged note peak frequency, minimum song frequency, and maximum song frequency (see [18] for details on how song traits were measured). Each repertoire (i.e. branch in the dendrogram) contains all the songs recorded from a single individual. (A) The average linkage hierarchical clustering dendrogram thus depicts similarity among song interval vectors of 41 male wrens. (B) The heatmap shows identity song type usage (rows) for each wren (columns) in shades of grey, with usage calculated based on probabilistic assignment of songs to types. Identity song type codes are on the left of the heat map and centroid song interval vector plots are on the right (with the spaces between the dots representing averaged note peak frequency, minimum song frequency, and maximum song frequency, and the scale bar in Hertz). (C) The output from IDcallPP shows the posterior assignment probabilities of each repertoire belonging to each identity clade (i.e. subspecies) as a stacked bar plot. See [14] and [18] for additional details.
Fig 5
Identity clades, identity coda types, and posterior probabilities of repertoire assignment for Atlantic/Mediterranean sperm whales.
Output from IDcall (top; taken from [14]) depicts similarity among coda repertoires of sperm whale groups (data from [14]) recorded in the Atlantic Ocean and Mediterranean Sea. Colored identity clades correspond to three sperm whale clans: Mediterranean (cyan), EC2 (gold), and EC1 (purple). Location abbreviations are: AtPAN = Atlantic coast of Panama, BAL = Balearic Islands, CAR = eastern Caribbean islands, and GOM = Gulf of Mexico. Each coda was represented as a vector of inter-click intervals. Each repertoire (i.e. branch in the dendrogram) contains all the codas recorded from a known social unit of whales in a year or, if the identity of the recorded whales was unknown, all the codas recorded on a single day. (A) The average linkage hierarchical clustering dendrogram thus depicts similarity among 82 sperm whale coda repertoires. (B) The heatmap shows identity coda type usage (rows) for each repertoire (columns) in shades of grey, with usage calculated based on probabilistic assignment of codas to types. Identity coda type codes are on the left of the heat map and centroid coda interval vector plots are on the right (with the spaces between the dots representing the inter-click intervals and the scale bar in seconds). (C) The output from IDcallPP shows the posterior assignment probabilities of each repertoire belonging to each identity clade (i.e. vocal clan) as a stacked bar plot. See [14] for additional details.
For all four of these data sets, the posterior probability output using the “all call types” option was clearer than when just identity calls were used (Fig 7). This indicates that while the identity calls are the primary delineators of population structure in these data sets, the other, non-identity, call types also differ somewhat in their usage among population segments.
Discussion
IDcallPP estimates posterior probabilities that each repertoire is a member of each identity clade and provides a range of useful information. It can suggest that the population structure predicted by IDcall is extremely robust (e.g. Fig 3), robust (e.g. Fig 4), robust with an occasional, potentially interesting, outlier (e.g. Fig 5), or that parts of the population structure are well described while others remain unclear (e.g. Fig 6). In cases where different parameter settings for IDcall produce different population structures, it may help guide the choice of parameters.The output stacked barplot of posterior probabilities should provide good guidance for evolutionary biologists, resource managers and conservation biologists as to the structure of their target population, in a similar way to that provided by STRUCTURE [e.g. 16]. However, the output may also address other questions of a species’ biology. For instance, the relative clarity of the posterior probability plots using identity calls versus those using all calls (e.g. Fig 7) might suggest whether the acoustic signatures of identity clades are restricted to identity calls, or manifest more broadly through repertoires.There are important differences between IDcall+IDcallPP and STRUCTURE, in addition to the different data sources (acoustic vs. genetic). Although IDcallPP calculates posterior probabilities of identity clade membership using Bayes’ theorem (Eq 5), the delineation of the identity clades by IDcall uses a non-Bayesian, and generally more conservative, method for determining the number of population segments, and allows some repertoires not to be assigned to identity clades. It should, thus, be less prone to overestimation of the number of population segments and the misassignment of repertoires.An issue which may affect the posterior probabilities is possible non-independence among the calls of a repertoire, thus theoretically invalidating Eq 4. 
We investigated the resulting biases by calculating how posterior probabilities were affected when the {n(R)} in Eq 4 were divided by a variance inflation factor v, where v>1 indicates lack of independence in count data [17]. With two identity clades, v = 1.2, and a true posterior probability of 0.8 for membership of one of the identity clades this was inflated to 0.84, and with five identity clades this became 0.87. When the variance inflation factor was raised to v = 2.0 (indicating substantial non-independence) these posterior probabilities were raised to 0.94 and 0.98, a considerable bias upwards from 0.8. Thus, non-independence of calls may be an important issue for some data sets. A correction could be applied in situations where the variance inflation factor can be estimated.IDcallPP employs only the no-admixture model in which a repertoire must be from only one identity clade or no identity clade at all, so the y-axes in Figs 3–6 are the posterior assignment probabilities. In the current implementation, there is no theoretical possibility that a repertoire contains elements of two or more identity clades: the posterior probabilities are that a repertoire is from a particular identity clade. However, as suggested above for the sperm whale populations, a repertoire could sometimes include calls from two, or possibly more, population segments. Thus, a useful future development would be an admixture model option in IDcallPP.We have developed this procedure of obtaining posterior membership of population segments using the output of IDcall which delineates clades using identity calls, made often by one population segment and rarely by the others. 
However, posterior probabilities can be calculated whenever an acoustic data set is divided into repertoires, the elements of each repertoire can be separated into calls in a manner so that each call can be categorized or at least quantified, and then some technique is used to cluster the repertoires into population segments. The trickiest part of this will often be calculating the likelihoods that each repertoire is a member of each population segment. If the calls can be categorized, or at least assigned probabilities of belonging to different categories (as in IDcall), and can be considered independent, then this is accomplished using Eqs 1–4. When calls are only defined by continuous measures (and not allocated to categories), one would need to obtain probability distributions for each population segment in multivariate space, perhaps using mixture models, and then assess the overlap of the calls of each repertoire with the probability distributions of each population segment.Some of these steps could be simple. For instance calls could be allocated to call types subjectively by humans [e.g. 4] or using a simple clustering method such as K-means [18]. Population segments could be delineated geographically, or by weighting equally all the calls in each repertoire (not emphasizing identity calls as in IDcall).Compared with molecular genetic methods for detecting, assigning, and evaluating population structure, techniques using acoustic data are much more rudimentary. They have mostly been ad-hoc methods developed or appropriated for a particular data set [e.g. 3, 18]. However, although IDcallPP has been developed to work with the output of IDcall, we have outlined a generic methodology that should be generally useful in studies of population structure using acoustic data.The collection and analysis of acoustic data to study population structure will often be less costly, and usually less invasive, than comparable genetic studies. 
Sometimes, as with our wren and cricket examples, the genetic and acoustic data can tell similar stories. In contrast, when acoustic repertoires are socially learned, as with sperm whales, the contrasting patterns of genetic and cultural inheritance may lead to complex population structures [19]. Thus, the analysis of acoustic data may be effective and/or essential if we are to understand population structures.

The IDcall and IDcallPP codes (in program language R) are under active development by the authors and can be accessed, along with the sperm whale datasets, through the Open Science Framework (https://osf.io/5fter/).

21 Mar 2022
PONE-D-21-36851
Posterior probabilities of membership of repertoires in acoustic clades
PLOS ONE
Dear Dr. Whitehead,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The reviewers indicated that this is an interesting and important manuscript and I completely agree. The reviewers have a few minor comments that should be addressed by the authors. I am hoping these revisions can be made without much difficulty.
Please submit your revised manuscript by May 05 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:
A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Christopher Nice, Ph.D.
Academic Editor
PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. PLOS requires an ORCID iD for the corresponding author in Editorial Manager on papers submitted after December 6th, 2016. Please ensure that you have an ORCID iD and that it is validated in Editorial Manager.
To do this, go to ‘Update my Information’ (in the upper left-hand corner of the main menu), and click on the Fetch/Validate link next to the ORCID field. This will take you to the ORCID site and allow you to create a new iD or authenticate a pre-existing iD in Editorial Manager. Please see the following video for instructions on linking an ORCID iD to your Editorial Manager account: https://www.youtube.com/watch?v=_xcclfuvtxQ

3. Please review your reference list to ensure that it is complete and correct. If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions
Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?
The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.
Reviewer #1: Partly
Reviewer #2: Yes

2. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: No
Reviewer #2: Yes

3. Have the authors made all data underlying the findings in their manuscript fully available?
The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.
Reviewer #1: Yes
Reviewer #2: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?
PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.
Reviewer #1: Yes
Reviewer #2: Yes

5. Review Comments to the Author
Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics.
(Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: This is a cool paper that makes a “structure-like” model in order to identify groups that use the same call sets, in the way that Structure identifies sets of individuals with the same gene frequencies.

(1) label the axes. Most structure plots are based on admixture model, so the x axis is estimated ancestry fraction. Here this is a no admixture model, so the axis corresponds to assignment probability.

(2) several of the steps seem unclear. How is the posterior achieved? Structure uses MCMC but what is used here? It says that the model is non Bayesian. What is it instead? How is it decided which repertoires are not assigned to identity groups? It would be very instructive to compare to a fully structure like model (ie bayesian no admixture model) to justify these choices.

(3) the statement this is somewhat circular is confusing. Presumably the logic is similar to the structure paper, ie where gene frequencies can be applied given population memberships and vice versa. If this is what is meant spell it out.

(4) the approach would presumably be more powerful if IDcall and IDcallPP were combined together. It should be easier to identify different call signals given identity clade assignments. This possibility should at least be mentioned.

Reviewer #2: The paper represents a fascinating development in analyzing population structure for species that use acoustic communication. It was well written and the findings clearly presented. I did find the italicizing of keywords to be distracting after a while so I recommend not doing that. One important point that was not emphasized in the Discussion is that their method will generally be much less invasive, and possibly less costly, than genetic methods to accomplish the same task. That said a comparison to a genetic study on the same populations would be informative validation of the approach.

6.
PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.
Reviewer #1: No
Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
6 Apr 2022

Thank you for considering our manuscript “Posterior probabilities of membership of repertoires in acoustic clades” for publication in PLOS ONE. We are happy with the reviews and have revised the manuscript accordingly, as follows:

Reviewer #1: This is a cool paper that makes a “structure-like” model in order to identify groups that use the same call sets, in the way that Structure identifies sets of individuals with the same gene frequencies.

(1) label the axes. Most structure plots are based on admixture model, so the x axis is estimated ancestry fraction. Here this is a no admixture model, so the axis corresponds to assignment probability.

Response: This is a good point. Labelling the y-axes would add to the complexity of the already complex plots, so we have added (in italics) to each of the captions for Figs 3-6: “The output from IDcallPP shows the posterior assignment probabilities of each repertoire…”, and also added (lines 216-7): “…, so the y-axes in Figs 3-6 are the posterior assignment probabilities."

(2) several of the steps seem unclear. How is the posterior achieved? Structure uses MCMC but what is used here? It says that the model is non Bayesian. What is it instead? How is it decided which repertoires are not assigned to identity groups? It would be very instructive to compare to a fully structure like model (ie bayesian no admixture model) to justify these choices.

Response: There is some confusion here. IDcall (the original Hersh et al. methodology) does not use Bayesian methods to find identity calls or identity clades, but the methodology used in the submitted paper (IDcallPP) to produce posterior probabilities does (equation 5 is Bayes’ theorem). In this case the calculation is straightforward, so methods like MCMC are not needed.
To make this clearer, we have added (lines 205-6) “Although IDcallPP calculates posterior probabilities of identity clade membership using Bayes’ theorem (eq 5), the delineation of the identity clades by IDcall uses a non-Bayesian, and generally more conservative, method…”

(3) the statement this is somewhat circular is confusing. Presumably the logic is similar to the structure paper, ie where gene frequencies can be applied given population memberships and vice versa. If this is what is meant spell it out.

Response: We have spelled this out, adding (lines 73-75): “In other words, the calls heard in a repertoire are used to delineate identity clades, the very information that is used to calculate the posterior probability that the repertoire is a member of an identity clade. To remove this circularity, we omit repertoire R from the calculation…”

(4) the approach would presumably be more powerful if IDcall and IDcallPP were combined together. It should be easier to identify different call signals given identity clade assignments. This possibility should at least be mentioned.

Response: Yes, they are now combined. We have added (lines 242-3): “The IDcall and IDcallPP codes (in program language R) are under active development by the authors and can be accessed, along with the sperm whale datasets, through the Open Science Framework (https://osf.io/5fter/).” In the latest version of software, IDcall optionally calls IDcallPP.

Reviewer #2: The paper represents a fascinating development in analyzing population structure for species that use acoustic communication. It was well written and the findings clearly presented. I did find the italicizing of keywords to be distracting after a while so I recommend not doing that.

Response: We have removed the italicizing of key words as suggested by the reviewer.

One important point that was not emphasized in the Discussion is that their method will generally be much less invasive, and possibly less costly, than genetic methods to accomplish the same task.
That said a comparison to a genetic study on the same populations would be informative validation of the approach.

Response: We have added (lines 237-241): “The collection and analysis of acoustic data to study population structure will often be less costly, and usually less invasive, than comparable genetic studies. Sometimes, as with our wren and cricket examples, the genetic and acoustic data can tell similar stories. In contrast, when acoustic repertoires are socially learned, as with sperm whales, the contrasting patterns of genetic and cultural inheritance may lead to complex population structures [18]. Thus, the analysis of acoustic data may be effective and/or essential if we are to understand population structures.”

Submitted filename: Posterior_review_comments.docx

11 Apr 2022
Posterior probabilities of membership of repertoires in acoustic clades
PONE-D-21-36851R1

Dear Dr. Whitehead,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact.
If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
Christopher Nice, Ph.D.
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

14 Apr 2022
PONE-D-21-36851R1
Posterior probabilities of membership of repertoires in acoustic clades

Dear Dr. Whitehead:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. Christopher Nice
Academic Editor
PLOS ONE