Jens C Hegg1, Jonathan Middleton2,3, Ben Luca Robertson4, Brian P Kennedy1,5,6. 1. Dept. of Fish & Wildlife Sciences, University of Idaho, 975 W 6th St, Moscow, ID 83844, United States. 2. Department of Music, Eastern Washington University, 119 Music Building, Cheney, WA 99004, United States. 3. Faculty of Communication Sciences, University of Tampere, 33014, Finland. 4. McIntire Department of Music, University of Virginia, 112 Cabell Drive, Charlottesville, VA 22904, United States. 5. Department of Biology, Life Sciences South 252, University of Idaho, Moscow, ID 83844, United States. 6. Department of Geology, McClure Hall 203, University of Idaho, Moscow, ID 83844, United States.
Abstract
The migration of Pacific salmon is an important part of functioning freshwater ecosystems, but as populations have decreased and ecological conditions have changed, so have migration patterns. Understanding how the environment, and human impacts, change salmon migration behavior requires observing migration at small temporal and spatial scales across large geographic areas. Studying these detailed fish movements is particularly important for one threatened population of Chinook salmon in the Snake River of Idaho whose juvenile behavior may be rapidly evolving in response to dams and anthropogenic impacts. However, exploring movement data sets of large numbers of salmon can present challenges due to the difficulty of visualizing the multivariate, time-series datasets. Previous research indicates that sonification, representing data using sound, has the potential to enhance exploration of multivariate, time-series datasets. We developed sonifications of individual fish movements using a large dataset of salmon otolith microchemistry from Snake River Fall Chinook salmon. Otoliths, a balance and hearing organ in fish, provide a detailed chemical record of fish movements recorded in the tree-like rings they deposit each day the fish is alive. This data represents a scalable, multivariate dataset of salmon movement ideal for sonification. We tested independent listener responses to validate the effectiveness of the sonification tool and mapping methods. The sonifications were presented in a survey to untrained listeners to identify salmon movements with increasingly more fish, with and without visualizations. Our results showed that untrained listeners were most sensitive to transitions mapped to pitch and timbre. Accuracy results were non-intuitive; in aggregate, respondents clearly identified important transitions, but individual accuracy was low. 
This aggregate effect has potential implications for the use of sonification in the context of crowd-sourced data exploration. The addition of more fish, and visuals, to the sonification increased response time in identifying transitions.
Pacific salmon migration provides important inputs to freshwater ecosystems, affecting nutrient cycling and biodiversity in the areas where they spawn (Carlson et al., 2011; Gende et al., 2002; Healey, 2009). Despite this, the combined effects of overfishing, hydropower, and other anthropogenic changes have caused large declines in salmon migrations, particularly in the Columbia River basin in the Northwestern United States (Good et al., 2005; Ruckelshaus et al., 2002). Management and conservation of these salmon species requires a detailed understanding of their migration incorporating both temporal detail and large spatial extent. The resulting data is complex and often multivariate, and new tools may help researchers understand and explore this data. Sonification is a data representation method that uses sound instead of visualizations to represent data. When data is mapped to sound in a pleasing way, the human mind can intuitively process the sound to discover trends or features that may be important to researchers (Barrass and Kramer, 1999; Hermann et al., 2011).

Many traditional methods of studying fish movement lack the temporal and spatial resolution to study salmon movement both at the fine scales at which important effects occur and across the large spatial extent of the migration. Fish ear stones, called otoliths, provide one method for collecting detailed movement data across the span of salmon migration. Otoliths are a balance and hearing organ in the inner ear of fish. Otoliths grow through the addition of daily rings of calcium carbonate, similar to the growth of tree rings (Campana, 2005; Campana and Neilson, 1985). Each stream a fish travels through has a different chemical signature, and otoliths record this chemistry in their daily growth rings.
By measuring the chemistry in these otolith rings, it is possible to reconstruct the location and timing of the movements a fish makes throughout its life (Kennedy et al., 1997, 2002; Thorrold et al., 1998).

For migratory fish, and especially salmon, this technique is a powerful, but data-intensive, way of studying the ecological implications of movements and migration for species under protected status or that otherwise cannot be handled physically for manual tagging (Hamann and Kennedy, 2012; Hegg et al., 2013, 2015). These chemical signatures record the time a juvenile salmon spends in each freshwater habitat, from the location where it hatched, to each new river it enters on its way downstream, to its entry into the ocean (Hegg et al., 2013; Kennedy et al., 2002; Walther et al., 2008).

Reconstructing the movements of a large number of salmon presents challenges for perception and analysis due to the difficulty in visualizing the multivariate time-series datasets. The ability to interpret datasets visually begins to degrade relatively quickly with additional data streams or dimensions (Tufte, 2001; Ware, 2004). For salmon populations, the variation in movement timing within the population is particularly difficult to analyze statistically, despite our ability to collect and analyze large datasets. In this regard otolith microchemistry data shares the same issues as other big data problems: our ability to collect, store, model, and analyze large amounts of data requires concurrent advances in analysis, communication and interpretation of these complex datasets (Keefe and Isenberg, 2013; Overpeck et al., 2011; Wong et al., 2012).

In contrast to visualization, hearing is inherently multidimensional (Moore, 1995) and the human ability to interpret nuanced changes in pattern, and especially timing, in audio signals is striking (Fitch and Kramer, 1994; Kramer et al., 2010; Moore, 1995; Neuhoff, 2011).
This is exemplified by the so-called “cocktail party problem,” the observation that human hearing is remarkably capable of disentangling many simultaneous channels of sonic input to focus only on a sound of interest (McDermott, 2009). This indicates that multivariate data, and time-series data in particular, is especially suited to exploration and interpretation using data sonification and auditory display (Kramer et al., 2010). However, no definitive sonification model for this purpose exists, as the theory and best practices for creating effective sonifications are still under active development (De Campo, 2007; Hermann et al., 2011; Walker and Nees, 2011).

Understanding the timing of large numbers of salmon movements is particularly important in one population of Fall Chinook salmon in the Snake River in the northwestern United States. Recent evidence indicates that the timing of ocean migration in juveniles of this population may be evolving due to human-induced changes in the river system (Waples et al., 2017; Williams et al., 2008). Migration in these fish has changed from exclusively early outmigration in their first summer (sub-yearling) historically, to a mix of migration timings that includes fish which enter the ocean the following spring (yearling) (Connor et al., 2005). Since the selective pressures driving this evolution are likely different in locations across the basin, it is important to understand the timing at which sub-populations of fish decide to move downstream to each new habitat (Connor et al., 2002; Hegg et al., 2013). Sonification of this data has the potential to provide a method to quickly explore temporal details of movement timing, temporal structure which traditional statistical methods struggle to quantify.
As a time-series dataset in which each variable describes unique location and timing data, and unique combinations of covarying signatures can also be used to determine location and movement information, it is also an ideal candidate to explore elements of sonification design. This is particularly true because the temporal complexity of the dataset can be scaled through addition or subtraction of the data from individual fish.

The field of sonification has resulted in exciting recent advances for data exploration (Ballora et al., 2004; Dombois, 2002; Khamis et al., 2012; Loeb and Fitch, 2002), which often requires an understanding of how listeners perceive important changes in the data (Barrass and Kramer, 1999; De Campo et al., 2006; Flowers, 2005; Hermann et al., 2011; Ware, 2004). Although sonification can be paired with visuals in interactive displays, it is often unclear to what degree simultaneous visualization improves listener accuracy in interpretation of sonifications (Hermann and Hunt, 2005; Minghim and Forrest, 1995; Rabenhorst et al., 1990). Further, how users respond to the addition of aural complexity, and its effect on the ability of listeners to identify important changes in the data, is an open question, as most sonifications are limited to relatively few data streams (Ferguson et al., 2011). The complexity of listener responses is one reason for the recommendation that sonification researchers should validate their work with perceptual surveys (Kramer et al., 2010). In the case of otolith microchemistry, the data provided a scalable, multivariate dataset upon which to test listener responses to layers of sonification complexity, with and without visualizations.

Using a sonification of multivariate salmon movement data and naïve listeners, we tested for generalizable trends in the ability of unsupervised listeners to identify changes in an increasingly complex dataset.
Our study was based on a sonification model developed by the authors through an iterative, interdisciplinary process with the goal of creating a useful, and aesthetically interesting, data exploration tool. The sonification used five chemical tracers relevant to fish location: strontium isotope ratio (87Sr/86Sr), and ratios of elemental strontium (Sr), barium (Ba), magnesium (Mg) and manganese (Mn) to calcium (Ca). These data were mapped to pitch, timbre and stereo-location with the intention of creating clear transitions in fish location as well as aesthetically interesting harmonic and timbral effects.

This study had three objectives. The first was to quantify the specific sonic elements that can provide effective markers of data transitions that reflect salmon movements between habitats. Untrained respondents were tested on a suite of four sonic markers and two negative controls to test the hypothesis that pitch and timbre would be the most effective indicators of transition. Our second objective was to test the ability of respondents to identify known transitions within multivariate fish-otolith sonifications of increasing complexity. We hypothesized that respondent accuracy would decrease with increasing sonification complexity. Finally, we tested whether the addition of a simultaneous visualization of the data improved respondent accuracy as complexity increased. We hypothesized that respondent accuracy would be unchanged, based on recent results from Bywater and Middleton (2016), who found that a high percentage of users can perceive similarities between line graphs and corresponding sonifications based mainly on data-to-pitch mapping.
Methods
Salmon movement data
The data used to create the sonifications were taken from a dataset of threatened Fall Chinook salmon in the Snake River in the northwestern United States (Hegg et al., 2013). Juvenile movement timing is important to ecologists and managers because recent evidence suggests that the population may be evolving novel migration patterns in response to dams and other anthropogenic effects across their habitat (Waples et al., 2017; Williams et al., 2008).

The dataset consisted of isotopic and micro-chemical data from forty-five adult salmon otoliths within a larger dataset collected by Hegg et al. (2013). Briefly, otoliths were collected from fish sampled as broodstock for Lyons Ferry Hatchery, the larger of two Fall Chinook hatcheries in the Snake River Basin. Fish destined for Lyons Ferry Hatchery are captured as they pass Lower Granite Dam, the final dam on the Lower Snake River. Otoliths were only collected from presumed-wild fish, those fish lacking a clipped adipose fin or coded wire tag and thus likely to have been naturally spawned. Otoliths were stored dry and processed as described in Hegg et al. (2013) and Secor et al. (1991). These fish are presumed to be a random sample of the entire run up to the date at which the hatchery quota is reached.

Hegg et al. (2013) showed that river location can be reliably determined through the natal, rearing and overwintering phases of the juvenile outmigration using linear discriminant function classification of 87Sr/86Sr ratio. This discriminant function analysis was used to provide location information to the sonification. See McGarigal et al. (2000) for a discussion of the linear discriminant function method in the context of fisheries and wildlife. In addition to the 87Sr/86Sr isotopic signature, the sonification utilized four elemental signatures expressed as a ratio with calcium: Sr/Ca, Ba/Ca, Mg/Ca, and Mn/Ca.
Trace amounts of these elements replace calcium in the calcium carbonate matrix of the otolith as a function of both the dissolved concentration of these elements in the water the fish inhabits and the bioregulation within the body. The data is expressed as a ratio of the abundance of each element in comparison to calcium, the element they substitute for in the otolith matrix (e.g., Sr/Ca).

Analysis of otolith data using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS) is done by moving a laser across the surface of the otolith from the core to the edge, ablating small amounts of otolith material that are drawn into the mass spectrometer and analyzed in sequence (e.g., Hegg et al., 2015). Therefore, the data consists of measurements of each isotopic and elemental ratio at increasing distance from the core of the otolith. This results in a temporal record of the life of the fish, with the core representing birth and the edge representing the death of the fish after returning to spawn. The distance in microns from the core represents the relative time within the life of the fish (Fig. 1).
Fig. 1
Otolith data collection. Otoliths are polished along the sagittal plane to uncover the rings (a). Polishing is stopped when the core is visible. Otolith chemistry is then analyzed by ablating a transect across the otolith from the otolith core to its rim (A). As the laser moves across the otolith, ablated material is swept into the inductively coupled plasma mass spectrometer (ICP-MS), ionized, and the ratio of isotopes and elements contained in the sample is measured. The resulting data (b) shows the changes in chemical values (A) from the birth of the fish (0 μm) to its death (the edge of the otolith and end of the data). Changes in 87Sr/86Sr indicate movements between locations with distinct chemistry (b).
Sonification design
The sonification design was based on an interdisciplinary working process between a scientist and two composers, with the objective of meaningfully representing juvenile salmon movement as sound (Robertson et al., 2015). Within the resulting sonification (Clip1_Full_Sonification.mp3), the distance from the otolith core, measured in microns, represents time, from the start of the file to its end. Across this timeline various life stages were mapped to changes in overall amplitude, with important temporal markers, including birth, the end of maternal influence, and death, acting as breakpoints within these overlapping envelopes. For each fish the end of maternal chemical influence on the developing otolith was considered to be 250 μm (Barnett-Johnson et al., 2008), representing an initial crescendo, with the amplitude ascending at a consistent rate towards a steady value that is sustained until the death of that individual, which begins a sudden decrescendo into silence. During simultaneous playback of all fish (tutti), the sounds of the individual fish (soli) in each watershed are cumulative, giving the listener an indication of how many salmon are currently active within a given watershed or marine system.

For each fish, the sonification mapped strontium isotope ratios to audio parameters associated with spatial orientation, distance, and passage between specific river or marine systems. At the foundation of this model is the ability for the listener to recognize discrete entrances or exits of individuals through one of four chemically distinct river groups within the Snake River watershed defined by Hegg et al. (2013): the Lower Snake River, the Upper Snake River, the Clearwater/Salmon Rivers, and the Grande Ronde/Imnaha/Tucannon Rivers, as well as the Pacific Ocean. The 87Sr/86Sr signatures unique to these locations are ranges defined by the group boundaries of the discriminant function used by Hegg et al. (2013), so that as a fish's otolith signature crosses a group boundary its location changes instantaneously (Table 1, Fig. 1). Therefore, 87Sr/86Sr ratio was mapped to discrete, nearly instantaneous changes in pitch at these transition points, indicating the passage of salmon from one river system into another.
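The threshold-to-pitch mapping described above can be illustrated with a short Python sketch. It is not the authors' Max/MSP implementation. The group boundaries, group labels, base frequency, and whole-number ratios below are all hypothetical placeholders chosen for illustration; the study's actual ranges come from the discriminant function in Hegg et al. (2013).

```python
from bisect import bisect_right
from fractions import Fraction

# HYPOTHETICAL 87Sr/86Sr boundaries separating adjacent groups (illustration only).
BOUNDARIES = [0.7090, 0.7100, 0.7120]
# HYPOTHETICAL group labels, ordered from lowest to highest 87Sr/86Sr.
GROUPS = ["group_A", "group_B", "group_C", "group_D"]

# HYPOTHETICAL base frequency and whole-number (just intonation) ratios per group.
BASE_HZ = 110.0
RATIOS = {
    "group_A": Fraction(1, 1),
    "group_B": Fraction(5, 4),
    "group_C": Fraction(3, 2),
    "group_D": Fraction(7, 4),
}


def classify(sr_ratio):
    """Assign a single 87Sr/86Sr value to a group by its position among the boundaries."""
    return GROUPS[bisect_right(BOUNDARIES, sr_ratio)]


def pitch_events(transect):
    """Return (index, group, frequency_hz) each time the classified group changes.

    `transect` is a sequence of 87Sr/86Sr values ordered from otolith core to rim,
    so the index stands in for time within the life of the fish.
    """
    events = []
    previous = None
    for i, value in enumerate(transect):
        group = classify(value)
        if group != previous:
            events.append((i, group, float(BASE_HZ * RATIOS[group])))
            previous = group
    return events


if __name__ == "__main__":
    example = [0.7125, 0.7126, 0.7095, 0.7094, 0.7085]
    for event in pitch_events(example):
        print(event)
```

Because the pitch only changes when the classified group changes, each event in the output corresponds to one discrete transition a listener would hear.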
Table 1
Summary of Sonification Parameters

Passage between river locations was further punctuated by applying a percussive envelope to each sounding sine tone, creating a sudden, bell-like, audio marker of the transition between habitats. This envelope utilizes a sharp attack (5 milliseconds), a brief decay (100 milliseconds), a sustained amplitude 6 dB lower than the peak value, and a release time of 400 milliseconds. Following the onset of each envelope, the corresponding pitch is sustained at a significantly lower amplitude until another habitat change occurs or the lifecycle of the fish concludes.

All mapped pitches originate from sinusoidal waveforms whose frequencies are derived from whole-number ratios. This system of integral tuning, or just intonation, creates intervallic structures between simultaneously sounding individual fish which form cohesive chordal structures. As these structures often stem from high-order partials, resultant harmonies display distinctly rich microtonal qualities that often deviate from standard musical temperament.

Beyond mapping fish location to pitch, the sonification algorithm also used 87Sr/86Sr thresholds to map fish to a generalized geographic location within the stereo field. In this way, each fish changed location in relation to the listener as it moved downstream, as if the listener were located at the confluence of the Snake and Columbia Rivers (46.233° North Latitude) and facing toward the geographic center of the basin. Latitude ranges for each river group were estimated using the USGS Streamer tool (http://water.usgs.gov/streamer/web/) based on spawning distributions from Garcia et al. (2008). Each fish, at each point during the sonification, was then stochastically assigned a stereo location within the latitude range of the river to which it was assigned (Table 1).
To maintain a consistent perception of loudness across the stereo field, a constant-power panning algorithm is employed.

To supplement this spatial model and suggest proximity to the listener, reverberation was applied in linear proportion to the distance between each fish's virtual location and the listener's virtual location at the confluence of the Snake and Columbia Rivers. A greater proportion of reverberation was used to suggest greater distance from the listener, while a direct, unaffected signal indicated proximity.

In addition to strontium isotope signatures, the intensity of the Sr/Ca ratio was used to determine entry into the ocean, due to the sharp increase in Sr/Ca associated with entry into salt water. Entry into the ocean was defined as a stable, 20-point moving average of 87Sr/86Sr within ±0.0004 of the global marine value (0.70918) as well as Sr/Ca values between 0.9478 and 1.1609 (Fig. 1).

Entry into the Pacific Ocean is heard as a distinctive transformation of spectral quality as spectral bandwidth is broadened and the perception of a single, center pitch is progressively obscured by an increased noise bandwidth, creating a wash of sound rather than the more pure tone of freshwater residence. This timbral change was accomplished using a modified amplitude modulation synthesis in which the audio output is interpolated between a sinusoidal waveform reflecting the frequency value of the previously occupied freshwater system and a random-amplitude carrier waveform (“rand~” object in the programming language Max/MSP). As chemical signatures indicative of entry into the Pacific Ocean begin to stabilize, the random-amplitude waveform is modulated by a steady, 440 Hertz sine wave. Meanwhile, the frequency of the carrier waveform is mapped to a transitional range of Sr/Ca intensity values (0.947882 to 1.160923) using a linear-scaling function. Across this transitional range, the output of the function varies between a minimum of 50 Hertz and a maximum of 400 Hertz.
However, as Sr/Ca values recorded in the study occasionally exceed 2.55, intermodulation effects resulting from higher frequency outputs may be heard as momentary spikes in noise bandwidth, which sound like booming noises during the ocean phase. From an aural perspective, the associative qualities and continuum of “pure” to “noisy” timbres generated by this modified form of AM synthesis illustrate variation in the character of environments encountered during out-migration.
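As a rough illustration of the ocean-entry criterion and the Sr/Ca-to-frequency scaling described in this section, the following Python sketch applies both rules to a transect. It is an assumption-laden sketch, not the authors' Max/MSP patch: the function names are hypothetical, the moving average is taken as a trailing window, the Sr/Ca criterion is read literally as "inside the stated range", and the scaling is left unclamped so that values above the range yield frequencies above 400 Hz, as the text implies.

```python
MARINE_SR = 0.70918                      # global marine 87Sr/86Sr value from the text
SR_TOLERANCE = 0.0004                    # allowed deviation of the moving average
SRCA_LO, SRCA_HI = 0.947882, 1.160923    # transitional Sr/Ca range from the text
FREQ_LO, FREQ_HI = 50.0, 400.0           # carrier frequency range in Hertz
WINDOW = 20                              # points in the moving average


def moving_average(values, window=WINDOW):
    """Trailing moving average; entries are None until a full window is available."""
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]
        averages.append(running / window if i >= window - 1 else None)
    return averages


def ocean_entry_index(sr_isotope, sr_ca):
    """Return the first index meeting both ocean-entry conditions, or None.

    Assumption: Sr/Ca must lie inside [SRCA_LO, SRCA_HI] at the same point where
    the smoothed 87Sr/86Sr is within SR_TOLERANCE of the marine value.
    """
    smoothed = moving_average(sr_isotope)
    for i, (mean_sr, ratio) in enumerate(zip(smoothed, sr_ca)):
        if mean_sr is None:
            continue
        if abs(mean_sr - MARINE_SR) <= SR_TOLERANCE and SRCA_LO <= ratio <= SRCA_HI:
            return i
    return None


def carrier_frequency(sr_ca_value):
    """Linearly scale Sr/Ca to Hertz; deliberately unclamped outside the range."""
    fraction = (sr_ca_value - SRCA_LO) / (SRCA_HI - SRCA_LO)
    return FREQ_LO + fraction * (FREQ_HI - FREQ_LO)


if __name__ == "__main__":
    # Synthetic transect: 30 freshwater points followed by 30 marine points.
    isotope = [0.7120] * 30 + [0.70918] * 30
    strontium_calcium = [0.5] * 30 + [1.0] * 30
    print(ocean_entry_index(isotope, strontium_calcium))
    print(carrier_frequency(2.55))
```

The detected entry lags the true change in chemistry because the 20-point average needs most of its window to fill with marine values before it falls inside the tolerance.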
Perceptual survey
In order to test the integrity of the sonification model, a perceptual survey was created using sonifications of three individual fish from the larger sonification, as well as six short synthesizer clips. Each fish originated in one of three natal locations as defined by the discriminant function analysis in Hegg et al. (2013): the Upper Snake River (fish 5132), Clearwater River (fish m2742), and Imnaha/Grande Ronde/Tucannon Rivers (fish 3354). All fish then moved to the Lower Snake River during the rearing phase, followed by entry into the ocean. Thus, each fish had two major sonic transitions during its life. All otolith sonifications were limited to 1522 μm, the length of the shortest of the three otoliths, and the time span of the sonifications was set to 1 minute and 30 seconds. Each fish was recorded individually, after which the files were combined in the open source Audacity audio editing software (www.audacityteam.com). Known fish movements were determined from the discriminant function analysis in Hegg et al. (2013) and the timing of each location change for individual fish was determined by the author using a stopwatch.

The survey also included a set of shorter sound clips used as controls, which were based on granular syntheses similar in timbral richness to the sonifications. Positive controls represented sonic transitions in left-to-right stereo panning (Clip2_Pan.mp3), adding a pitch (Clip3_Pitch.mp3), adding a new timbre (Clip4_Timbre.mp3), and increasing volume (Clip5_Crecendo.mp3). The two negative controls consisted of steady random static (Clip6_Static.mp3) and a clip with randomly intermittent sounds over a steady bass tone (evoking a vibrato-like sound, Clip7_Intermittent.mp3) (Table 2).
Table 2
Summary of perceptual survey questions.
| Question # | Description | Type | Visuals | Mean Accuracy | Mean Response Delay (seconds) |
| --- | --- | --- | --- | --- | --- |
| 1 | Static | Control | No | 8.6% (a) | - |
| 2 | Left-Right Panning | Control | No | 85.7% (b) | - |
| 3 | Pitch | Control | No | 100% (b) | - |
| 4 | Random intermittent | Control | No | 34.3% (a) | - |
| 5 | Timbre | Control | No | 97.1% (b) | - |
| 6 | Crescendo | Control | No | 82.9% (b) | - |
| 7 | 1-Fish | Experimental | No | 43.5% | 1.47 |
| 8 | 2-Fish | Experimental | No | 54.5% | 1.53 |
| 9 | 3-Fish | Experimental | No | 40.5% | 2.10 |
| 10 | 1-Fish | Experimental | Yes | 45.8% | 1.64 |
| 11 | 2-Fish | Experimental | Yes | 52.9% | 1.86 |
| 12 | 3-Fish | Experimental | Yes | 16.6% | 2.41 |
Description of questions used in perceptual survey of salmon otolith chemistry sonification. Letters (a, b) following the accuracy values indicate significantly different groups among the control responses.
The survey was designed and built in Flash 3.0 using Adobe Animate software (Adobe.com), administered via computer, and is available in an online repository (Hegg et al., 2017). All listening was done through headphones. All sounds were accompanied by a counter showing the seconds elapsed in the right-hand corner of the screen. Sounds were also accompanied by a progress bar showing the remaining length of the clip, with the exception of sounds with visual displays. In these cases the visualizations indicated the progress of the sonification with a clear beginning and end point. Visualizations were animated as sparse graphs of the raw 87Sr/86Sr data (absent x and y value labels and using an aggressive 30-point moving average smoother) such that they revealed themselves in time with the sonification, so that respondents were not able to look forward in the visualization to anticipate transitions.

Respondents (n = 35) were allowed to proceed through the survey at their own pace, with sounds only starting once respondents clicked to start the sound. Responses were recorded on a paper datasheet (Hegg et al., 2017). Respondents were first asked to rate their level of training in Music and Math or Science as these relate to data analysis (none, up to one year, or more than one year). The survey then proceeded to a listening section made up of the controls using sounds based on granular synthesis. For each trial respondents answered “yes” or “no” to the same question, “Do you perceive a transition in the sound?” The answers were recorded after listening to each clip, and participants were offered only one listening experience per trial.
At the end of the control section respondents were then counseled on the survey's new method for identifying transitions in longer clips, in real-time, using a push-button training clicker (http://www.starmarkacademy.com). Respondents were asked to depress the clicker button at the moment they identified a transition in the sound, at which point the test administrator would record the seconds elapsed on the datasheet.

Questions using the sonification data proceeded from a single fish, to the addition of a second fish, to the addition of a third fish. Questions 7–9 were accompanied by a progress bar serving as the only visual aid. Questions 10–12 repeated the same sequence of sonifications, with the inclusion of animated visualizations, proceeding from a single fish (Q10_1Fish_Visuals.mp4), to two fish (Q11_2Fish_Visuals.mp4), and three fish (Q12_3Fish_Visuals.mp4).

At the end of the survey respondents were asked four questions related to their experience taking the survey, with space given for a long-form answer. The questions were:
Comment on your ability to identify transitions in the short, sound only clips.
Comment on your ability to identify transitions in the longer, sound only clips.
Comment on the effect of the visuals in identifying transitions in the sound clips.
Comment on your ability to identify transitions as more sounds were added to the clips.

Survey respondents were intentionally left untrained as to what constituted a “transition” in the sound. The purpose of utilizing untrained listeners was to understand whether the sonification mapping provided an intuitive identification of sonic changes, with the intention that the sonification could be used with minimal training for data exploration. Advertisement for the survey did indicate that the sounds were derived from salmon; however, details were only given after the testing if respondents were interested.

All surveys were administered by Dr.
Jonathan Middleton and a graduate assistant at Eastern Washington University between January 25th and February 24th of 2017. This survey was granted exemption from federal regulations for the protection of human subjects under CFR Title 45, Part 46.101(b) (1–6) by the Institutional Review Board for Human Subjects Research at Eastern Washington University (Review HS-5155). University of Idaho also provided an exemption under CFR Title 45, part 46.101(b) (2,4) (protocol 17-080).
Data analysis
Data analysis proceeded along three main hypotheses, one for each section of the survey.
Control questions
The first hypothesis was that respondents would positively identify each of the four positive control sound clips as transitions, while failing to identify the negative controls as transitions. This was tested using Fisher's Exact test of independence with post-hoc pairwise comparison using Bonferroni correction (Routledge, 2005).
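The pairwise step of this comparison can be sketched in plain Python. The analysis in the study was run in R; this block is only an illustration of a two-sided Fisher's exact test on 2×2 tables followed by a Bonferroni adjustment. The "yes" counts are an assumption: they are back-calculated from the percentages reported in Table 2 with n = 35 respondents, and the function names are hypothetical.

```python
from itertools import combinations
from math import comb

N_RESPONDENTS = 35
# ASSUMED counts of "yes" answers, back-calculated from Table 2 percentages (n = 35).
YES_COUNTS = {
    "Static": 3,
    "Panning": 30,
    "Pitch": 35,
    "Intermittent": 12,
    "Timbre": 34,
    "Crescendo": 29,
}


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same margins
    that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denominator = comb(row1 + row2, col1)

    def probability(x):
        return comb(row1, x) * comb(row2, col1 - x) / denominator

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = probability(a)
    total = sum(
        probability(x)
        for x in range(lowest, highest + 1)
        if probability(x) <= observed * (1 + 1e-9)  # tolerance for float ties
    )
    return min(1.0, total)


def pairwise_bonferroni(yes_counts, n):
    """Bonferroni-adjusted p-values for every pair of questions."""
    pairs = list(combinations(yes_counts, 2))
    adjusted = {}
    for first, second in pairs:
        p_value = fisher_exact_two_sided(
            yes_counts[first], n - yes_counts[first],
            yes_counts[second], n - yes_counts[second],
        )
        adjusted[(first, second)] = min(1.0, p_value * len(pairs))
    return adjusted


if __name__ == "__main__":
    for pair, p_adjusted in pairwise_bonferroni(YES_COUNTS, N_RESPONDENTS).items():
        print(pair, round(p_adjusted, 4))
```

With six control questions there are 15 pairwise comparisons, so each raw p-value is multiplied by 15 (and capped at 1) before being compared with the significance threshold.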
Response accuracy
The second hypothesis was that survey respondents could identify the transitions in the sonifications in real-time. Since clicker responses exhibited a time-delay we calculated accuracy based on an envelope between the actual transition and the end of the estimated response delay. This response delay was calculated by estimating the peak-center and variance of the aggregate responses of all survey respondents for each question. We used the R package {mclust} (Scrucca et al., 2016) to identify the unique density peaks in the aggregate response data, using BIC model selection to identify the number of clusters (limited to between 5 and 20) and whether those clusters had equal or variable variance. This resulted in clusters corresponding to peaks in aggregate responses (i.e., periods where larger numbers of respondents identified a transition in the sound). The clustering algorithm defines these clusters using a normal distribution, and thus a mean and variance were calculated for each peak in responses.
The response peaks which directly followed a known transition within the sonification were identified as “correct” response peaks, and responses within them were considered correct. Inclusion in a “correct” response peak was calculated based upon the properties of a normal distribution. Any response recorded between the time of the known transition and the right-hand tail of the cluster distribution was considered correct. Thus, the “correct” window was calculated as the seconds between the known transition and three standard deviations to the right of the mean of the “correct” response peak. This, according to the properties of the normal distribution, encompasses 99.7% of the responses within the response peak. In cases where the cluster model selected a wide variance we decreased this window to two or one standard deviations to avoid including data from nearby response peaks.
Respondents were only given one “correct” response within that window so that responses were not biased towards respondents who clicked many times. Thus, if a respondent had multiple clicks within the “correct” window following a known transition, only the response closest to the transition was counted as correct and the rest were counted as incorrect. Response accuracy was then calculated as the number of correct clicks for each question divided by the total clicks the respondent made during the duration of that question. We tested the hypothesis that respondents could identify transitions by comparing response accuracy to 50%, the expected response accuracy in the case of random responses.
The third hypothesis was that visualizations would have no effect on the ability of respondents to correctly identify transitions. We analyzed the response accuracy between questions containing visuals and those without, paired by the number of fish used in the sonification, to determine if there was a difference in response accuracy using a Chi-squared test of independence (Agresti and Kateri, 2011).
In addition to hypothesis testing we analyzed the aggregate data to understand the response delay and variance as the complexity of the sonification increased.
Data was analyzed in R version 3.3.2 (https://cran.r-project.org) and RStudio version 1.0.44 (www.rstudio.com).
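The scoring rule for the “correct” window can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's R code: the function names and all timing values are hypothetical, and the windows are assumed not to overlap.

```python
def correct_window(transition, peak_mean, peak_sd, n_sd=3):
    """Envelope from the known transition to n_sd standard deviations
    to the right of the mean of the response peak that follows it."""
    return transition, peak_mean + n_sd * peak_sd


def score_respondent(clicks, windows):
    """Return (n_correct, accuracy) for one respondent on one question.

    At most one click is credited per window, so extra clicks inside a
    window count as incorrect. Accuracy is correct clicks / total clicks.
    """
    if not clicks:
        return 0, None  # accuracy is undefined without any clicks
    n_correct = 0
    for start, end in windows:
        if any(start <= t <= end for t in clicks):
            n_correct += 1  # credit only one click for this transition
    return n_correct, n_correct / len(clicks)


# Hypothetical transitions at 10 s and 30 s; the second peak is assumed to be
# wide, so its window is narrowed to two standard deviations.
windows = [
    correct_window(10.0, peak_mean=11.2, peak_sd=0.5),
    correct_window(30.0, peak_mean=31.5, peak_sd=0.6, n_sd=2),
]
clicks = [10.9, 11.4, 20.0, 31.0]  # seconds elapsed at each click
n_correct, accuracy = score_respondent(clicks, windows)
print(n_correct, accuracy)
```

In this example two of the four clicks are credited: the second click in the first window and the click at 20 s both count against the respondent, which is how frequent clicking lowers accuracy under this rule.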
Results
Analysis of the results from the control questions supported the hypothesis that respondents were able to positively identify sonic transitions (positive controls) without identifying random noise as a transition (negative controls). The results showed clear differences in respondents' determination of a transition between negative and positive controls (Table 2). Respondents (n = 35) identified a transition in the two negative controls at lower rates (Static = 8.57%, Random Intermittent = 34.29%) than for the positive controls. Respondents identified transitions in the positive controls at high rates, ranging from 82.86% for the Crescendo control to 100% for the Pitch control. A chi-square test of independence over the responses to all control questions indicated a significant difference in responses (p = 2.2 × 10⁻¹⁶, α = 0.05). Pairwise comparisons of each control using Holm's correction for multiple comparisons showed that the static control was significantly different from all the positive controls (adj. p ≤ 2.9 × 10⁻⁸ in all cases) but not from the intermittent negative control (adj. p = 0.14). The Random Intermittent control was also significantly different from all the positive controls (adj. p ≤ 0.0008 in all cases, α = 0.05).
Assuming a large effect size of 0.5 and α = 0.05, power for individual post-hoc tests was high (0.84), despite the relatively lower power of the overall chi-squared test across the entire frequency table (0.6). This indicates confidence in the conclusion that respondents were indeed capable of distinguishing the presence of transitions within the controls.
Density estimation of the aggregate responses for the sonification questions using the {mclust} package identified the best fit models as those with clusters of variable variance in all cases. The algorithm identified 6 clusters for both questions with a single fish sonification (questions 7 and 10) and different numbers for all the other questions: question 8 (11 clusters), question 9 (13 clusters), question 11 (9 clusters), and question 12 (10 clusters). To avoid models which conflated response peaks, the number of available models was limited to more than five clusters and up to 20. In the case of question 12 the minimum model was increased to 8 to avoid extremely wide variance clusters (see Fig. 2).
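The idea of choosing the number of response peaks by BIC can be illustrated with a small Python sketch. It swaps in a plain expectation-maximization fit of a one-dimensional Gaussian mixture with unequal variances and is not the {mclust} implementation, which uses its own initialization and model families. The click times are synthetic, and the range of cluster numbers is kept small for the example rather than the 5 to 20 used in the study. BIC follows the {mclust} sign convention, in which larger values are better.

```python
import math
import random


def fit_gmm_1d(xs, k, iters=200, var_floor=1e-3):
    """Fit k Gaussians with unequal variances by EM; return (params, loglik)."""
    xs = sorted(xs)
    n = len(xs)
    # Deterministic start: one component per equal-sized chunk of sorted data.
    chunks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(c) / len(c) for c in chunks]
    grand_mean = sum(xs) / n
    start_var = sum((x - grand_mean) ** 2 for x in xs) / n / k
    variances = [max(start_var, var_floor)] * k
    weights = [1.0 / k] * k

    def densities(x):
        # Weighted normal density of x under each component.
        return [w * math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: responsibility of each component for each click time.
        resp = []
        for x in xs:
            d = densities(x)
            total = sum(d) or 1e-300
            resp.append([di / total for di in d])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)

    loglik = sum(math.log(sum(densities(x)) or 1e-300) for x in xs)
    return list(zip(weights, means, variances)), loglik


def bic(loglik, k, n):
    """BIC in the mclust convention: 2 * loglik - n_params * log(n)."""
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    return 2 * loglik - n_params * math.log(n)


# Synthetic click times clustered after two hypothetical transitions.
random.seed(0)
clicks = ([random.gauss(10.0, 1.0) for _ in range(100)]
          + [random.gauss(30.0, 1.5) for _ in range(100)])

scores = {}
for k in range(1, 5):
    _, ll = fit_gmm_1d(clicks, k)
    scores[k] = bic(ll, k, len(clicks))
best_k = max(scores, key=scores.get)
print(best_k, scores)
```

Each fitted component carries a mean and a variance, which are the quantities used to place the peak center and the right-hand boundary of the correct-response envelope.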
Fig. 2
Determining Correct Response Envelopes. Model based clustering analysis was used to determine density peaks in the aggregate response data for each question (black line). Grey bars indicate the number of responses at that time point. Peak centers (light blue, dashed lines) directly following a known transition (red lines) were identified. The variance of these peaks was used to calculate the right-hand boundary for correct responses, defined as three standard deviations to the right of the peak center (orange, dashed lines). In some cases, the number of standard deviations was decreased to avoid including following data peaks (* denotes peak constrained to 1-st. dev).
The cluster centers directly following a known sound transition were identified and the envelope for correct answers was defined from the point of the known transition to three standard deviations to the right of the associated peak center (Fig. 2). For some questions the peaks defined by {mclust} had wide variance and the number of standard deviations was adjusted to avoid classifying obviously different peaks as correct. This was done for question 8 (4th peak, 2 st. dev.), question 11 (4th peak, 2 st. dev.), question 9 (4th peak, 2 st. dev.), and question 12 (2nd, 4th & 5th peaks, 1 st. dev.; 6th peak, 2 st. dev.).
The models identified several clusters in the period from 70 seconds to the end of the sonification, as well as a cluster at 63 seconds, which were not correlated with known salmon movement locations (Fig. 2). These peaks corresponded to a series of loud booming sounds generated by chemical changes occurring after the fish entered the ocean. These are also the most obvious example of the additional complexity, beyond simple movement data, that was incorporated into the sonification.
Correct responses were calculated for each question using the envelope criteria established from the cluster model. The percentage of correct answers was calculated in aggregate for each question, as well as for individual respondents. Individual accuracy was poor but highly variable, and was not significantly different from the null hypothesis of 50% accuracy (overall 43.5% ± 0.13 st. dev.). Individual accuracy ranged from 0% to 100% across questions 7 through 12, with a mean individual accuracy ranging from 40.5% on question 10 to 16.6% on question 12.
The aggregate frequency of correct and incorrect responses was compared for question pairs with the same number of fish, with and without visualizations, using a chi-squared test for independence. Despite high statistical power (power > 0.9 for all tests at a moderate effect size of 0.3 and α = 0.05), none of the response rates were significantly different between the question pairs (p ≤ 0.91 for all tests), indicating support for the null hypothesis that visualizations did not improve accuracy. The number of correct responses increased with the number of fish included in the sonification. Questions with one fish (questions 7 and 10), with and without visuals, had a 43.5% and 45.8% accuracy, respectively. Questions with two fish (questions 8 and 11), with and without visuals, had an accuracy rate of 54.5% and 52.9% respectively. Questions with three fish (questions 9 and 12), with and without visuals, had an accuracy rate of 52% and 46% respectively.
The response delay was also analyzed, using the difference in time between the known transitions and their associated cluster mean from the {mclust} results. The response delay increased from a minimum of 1.2 seconds with one fish and no visuals (question 7), to a maximum of 2.1 seconds with three fish with visuals (question 12). Both response delay and the variance in those responses increased as more fish were added (Table 2, Fig. 3).
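The comparison of correct and incorrect responses for a question pair amounts to a test of independence on a 2 × 2 table, sketched below in Python. The counts are hypothetical placeholders, not the study's data, and the function name is an assumption. The sketch computes the uncorrected Pearson statistic, whereas R's chisq.test applies a continuity correction to 2 × 2 tables by default, so the two would not agree exactly.

```python
import math


def chi2_independence_2x2(table):
    """Pearson chi-squared test (no continuity correction) for a 2x2 table."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability has a closed form:
    # P(chi2 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: without visuals, with visuals. Columns: correct, incorrect clicks.
# These counts are invented for illustration only.
hypothetical_pair = [[40, 52], [44, 52]]
stat, p_value = chi2_independence_2x2(hypothetical_pair)
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

A large p-value from such a table indicates only that a difference in accuracy was not detected for that pair.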
Fig. 3
Response delay with Increasing Numbers of Fish. The response delay of respondents was calculated for each question. Delay time, as well as the variance of that delay, increased as the number of sonified fish increased. Delay was lower throughout the survey for questions without visuals (red) than for questions that included visuals (blue).
No difference was seen between individual accuracy and the amount of musical or math and science training of respondents.
All raw data is available in an online data repository (Hegg et al., 2017, https://doi.org/10.17632/7sk82n38sh.2).
Discussion
Human hearing is particularly adept at determining changes in pattern and timing within incoming temporal data streams. In contrast to visual representations of multivariate data, which are limited by the number of available dimensions as well as the ability to interpret large numbers of time-series in one visualization, sonification has the potential to provide a method for display and exploration of high-dimensional time series datasets which may be faster and more intuitive for identifying timing shifts within large datasets (Barrass and Kramer, 1999; Kramer et al., 2010).
Kramer et al. (2010), in their report on the status of the field of sonification, identify the need to understand the additive effects of multiple data streams on listener understanding and memory load as a central question. The movement data available from salmon otolith microchemistry studies provides an ideal dataset for the study and development of useful sonification methods. This data is temporal in nature, with discrete changes in chemistry relating directly to easily interpretable movements in individual fish. Since otolith data are inherently multivariate and scalable, each individual fish can be represented by multiple, simultaneous, chemical data streams while the entire dataset can be scaled by adding additional fish. This scalability and temporal nature lend themselves to auditory display, which relies on the ability of the human ear to interpret temporal patterns (Walker and Nees, 2011). This relates to an important ecological question in salmon populations: how individual movement decisions scale to the population level. Our study indicates that sonification could provide a method for data exploration and communication of results on its own or as a complement to traditional statistical methods and visualizations.
In our survey respondents were able to identify transitions in several sonic elements with a high degree of accuracy, and to distinguish transitions from random noise (Table 2). In particular, our results indicate that pitch and timbre are the most easily recognized sonic transitions, with volume and panning transitions being recognized slightly less often. This indicates that our naïve participants fall within the expectations of previous research showing that pitch and timbre are effective, and often used, mappings (Dubus et al., 2013; Neuhoff, 2011).
Another interesting finding from our control responses is that the degree of granularity in random noises appears to determine whether participants view them as random, or as transitions. The intermittent and static negative controls were not significantly different; however, higher numbers of respondents identified the intermittent control as a transition. This may indicate that the more granular random noises become in a sonification, the more likely people may be to interpret them as transitions. Similarly, the more complex, and thus seemingly chaotic or random, a sonification becomes, the more likely listeners might be to identify random noises as transitions.
Overall the control results argue for parsimony in sonification designs. If the most important data streams within a multivariate dataset are known a priori they should be mapped to pitch and timbre given the sensitivity of listeners to transitions in these sonic elements. Further, if the sonification is being developed for exploration of unknown data, attempts should be made to avoid random, granular fluctuations in the data that might be interpreted as important transitions.
Our results indicate interesting interactions between sonification complexity, listener response latency, and accuracy. Most sonification experiments have focused on individual accuracy metrics to interpret whether listeners are able to interpret the contents of the sonification (Schuett and Walker, 2013). Sonification complexity has also been cited as a limiting factor in the utility of sonifications (Marila, 2002; Pauletto and Hunt, 2005). In our tests, individual accuracy was relatively low, and highly variable (Fig. 4). This lack of individual accuracy contrasts with the fact that the control data shows that listeners could distinguish transitions with a high degree of accuracy.
Fig. 4
Accuracy of individual responses by number of responses. The accuracy of individual respondents for each survey question (black dots) shows that accuracy decreases with an increased number of clicks, as expected. Some respondents were very selective in their determination of transitions, while others identified many transitions. This pattern holds for questions with one and two fish included in the sonification, with and without visuals. However, the addition of a third fish shows much more variation in accuracy, indicating a limit to the complexity at which respondents could accurately identify individual transition points.
The sonifications themselves were complex, utilizing pitch, timbre and stereo location within the data streams for each individual fish. Thus, without training, listeners may have been identifying transitions in other sonic elements which were not counted as “correct.” Despite this, there is evidence to indicate that most listeners were identifying the intended transitions. Respondents who clicked fewer times tended to have higher accuracy rates (Fig. 4), indicating that they were identifying the intended transitions and ignoring other sonic changes. Respondents who clicked more often had lower accuracy rates; however, this is likely due to a dilution effect. Those who clicked more often still largely identified the appropriate transitions in addition to other perceived transitions which were not counted as correct.
This leads to the non-intuitive conclusion that although individual accuracy may be low, the natural ability of naïve listeners to identify transitions in pitch and timbre can be useful in aggregate. In essence, our data suggest that untrained listeners were able to “crowd source” the location of sonic transitions in complex, multivariate datasets. The most complex sonifications showed increased variation in individual accuracy (three-fish, no visuals) and a decrease in overall accuracy (three fish, with visuals), indicating that this “crowd sourcing” ability may be limited by complexity.
Decreases in aggregate accuracy with increasing sonification complexity may be explained by the time it took for respondents to identify a transition. Schuett and Walker (2013) argued that response latency indicates the ability of listeners to process sonification information. Response delay increased in our study as more fish were added (Fig. 3), indicating that the additional complexity required more processing time before respondents identified a transition.
The inclusion of visuals resulted in further increases in response time across all levels of sonification complexity (Table 1, Fig. 3). The majority of listeners reported that visuals were either unhelpful or even detrimental to interpretation of the audio:
“[Visuals were] distracting because I wasn't quite sure what to focus on.”
“Watching the visuals was more of a distraction. I decided to just focus on the listening and just watch for fun.”
Several respondents provided feedback that indicated that the visualizations clashed with the audio:
“I did not feel the visuals correlated with the sound clips.”
“The visuals almost were tricky because they made it look like there were more transitions than I heard.”
“The visuals don't necessarily match up with the transitions as far as I could tell.”
These expressed challenges could be interpreted as the result of an additional data stream which increased processing time; however, the numerous responses indicating that the visuals and audio did not match up may indicate another problem. Neuhoff (2011) discusses how visual and audio cues can interact, and that mismatched audio and visual can cause the listener to focus on one stream or the other. The fact that so many respondents felt the visuals did not match the audio indicates some degree of this “ventriloquist effect” in our results that may have increased response time due to increased confusion or switching from audio to visual cues.
The results of this study also highlight the tension inherent in sonification development between the need to clearly represent information, and the desire to create a pleasing listening experience that is scalable for multiple scientific questions (Walker and Nees, 2011). One such example is the strategic use of microtones and “just intonation” to allow for representation of a larger number of fish within one habitat than could be easily represented by increasing loudness of a single tone or chord. This additional complexity is arguably unnecessary in the current study using only three fish; however, future scalability to population-level scientific questions requires this functionality. Further, the desire to avoid listener fatigue through an aesthetically pleasing sonification, regardless of the sample size, is a counterweight to more straightforward pitch-mapping and “auditory graphing” techniques (Song and Beilharz, 2008). Our study highlights the ongoing need to understand best practices for navigating these design trade-offs within the field of sonification (De Campo, 2007; Hermann et al., 2011; Walker and Nees, 2011).
Conclusions
The results of our perceptual testing demonstrate the extent to which sonification could serve as a tool to explore salmon movement in otolith datasets. Even untrained listeners were very sensitive to sonic transitions in pitch and timbre, indicating that sonification can be used to understand fish movement between habitats. Respondents tended to over-report transitions, however, leading to low individual accuracy. This tendency to over-report may be alleviated through listener training in the future. The promise of sonification for otolith migration studies is that these methods may lead to more easily interpreted trends in large, population-level time-series data, with less required time and training than visual data (Ballora et al., 2004; Khamis et al., 2012; Loeb and Fitch, 2002). Therefore, future work should focus on determining listeners' ability to interpret movement patterns in larger otolith datasets, and tailoring sonifications for this purpose.
Beyond salmon migration, this work has implications for crowd-sourced data exploration in complex datasets. Crowdsourcing scientific data is increasingly used successfully to explore large datasets which cannot be analyzed computationally (Bonney et al., 2014; Gura, 2013). The ability of naïve listeners, in aggregate, to identify potentially interesting trends using sonification could be used to improve citizen-science initiatives, or enable effective public outreach for projects based on complex data. However, in developing sonifications, our data indicate that simplicity should be the goal, with an attention to limiting chaotic intermittent sounds and mapping the data of interest to pitch and timbre when possible. Further, our data indicate that the identification of transitions within an auditory display is slowed as complexity increases, which may limit a listener's ability to interpret the sonification. This effect may be overcome by slowing the data stream to allow more processing time between transitions, or by increasing listener training. More work is needed, however, to understand how data complexity affects respondents' ability to process and correctly respond to sonification, and to develop strategies to improve individual perception through sonification design.
Declarations
Author contribution statement
Jens C. Hegg: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Jonathan Middleton: Conceived and designed the experiments; Analyzed and interpreted the data; Performed the experiments; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Ben Luca Robertson: Conceived and designed the experiments; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Brian P. Kennedy: Conceived and designed the experiments; Analyzed and interpreted the data; Contributed reagents, materials, analysis tools or data; Wrote the paper.
Funding statement
This work was supported by Tekes - The Finnish Funding Agency for Innovation (decision 40296/14).
Authors: Rick Bonney; Jennifer L Shirk; Tina B Phillips; Andrea Wiggins; Heidi L Ballard; Abraham J Miller-Rushing; Julia K Parrish Journal: Science Date: 2014-03-28 Impact factor: 47.728
Authors: Pak Chung Wong; Han-Wei Shen; Christopher R Johnson; Chaomei Chen; Robert B Ross Journal: IEEE Comput Graph Appl Date: 2012 Jul-Aug Impact factor: 2.088
Authors: Robin S Waples; Anna Elz; Billy D Arnsberg; James R Faulkner; Jeffrey J Hard; Emma Timmins-Schiffman; Linda K Park Journal: Evol Appl Date: 2017-05-19 Impact factor: 5.183