Sebastian Schneider, Kurt Hammerschmidt, Paul Wilhelm Dierkes.
Abstract
Unsupervised clustering algorithms are widely used in ecology and conservation to classify animal sounds, and they also offer several advantages for basic bioacoustics research. It is therefore important to overcome their existing limitations. A common practice is to extract acoustic features one-dimensionally, i.e., to reduce each feature to a single average value for the entire vocalization. For frequency-modulated vocalizations, whose acoustic features change over time, this can lead to insufficient characterization. In addition, it often remains unclear whether the necessary parameters were set correctly and whether the obtained clustering result reliably classifies the vocalizations. The presented software, CASE, is intended to overcome these challenges. Established and new unsupervised clustering methods (community detection, affinity propagation, HDBSCAN, and fuzzy clustering) are tested in combination with various classifiers (k-nearest neighbor, dynamic time warping, and cross-correlation) using differently transformed animal vocalizations. These methods are compared with predefined clusters to determine their strengths and weaknesses. In addition, a multidimensional data-transformation procedure is presented that better represents the course of multiple acoustic features over time. The results suggest that, especially for frequency-modulated vocalizations, clustering works better with multidimensional feature extraction than with one-dimensional feature extraction. The characterization and clustering of vocalizations in multidimensional space offer great potential for future bioacoustic studies. The software CASE includes the developed method of multidimensional feature extraction, as well as all of the clustering methods used. It allows several clustering algorithms to be applied quickly to one data set, so that their results can be compared and their reliability verified based on their consistency.
Moreover, the software CASE determines the optimal values of most of the necessary parameters automatically. To take advantage of these benefits, the software CASE is provided for free download.
Keywords: bioacoustics; clustering methods; feature extraction; frequency-modulated vocalizations; multidimensional; vocal repertoire; vocalization classification
Year: 2022 PMID: 36009611 PMCID: PMC9404437 DOI: 10.3390/ani12162020
Source DB: PubMed Journal: Animals (Basel) ISSN: 2076-2615 Impact factor: 3.231
Figure 1. Sequence of clustering procedures. Possible combinations are shown, resulting in 48 different procedures.
Figure 2. Illustration of the advantages of the software CASE in comparison with conventional approaches using one-dimensional feature vectors as inputs for a single clustering algorithm.
Figure 3. Cohesion calls of the giant otters. Three of the selected cohesion calls per individual are shown.
Figure 4. The five different vocal types of the harpy eagles that were used.
Figure 5. Process of data transformation using the All-in-one method (upper half) and the windowed method (lower half). All-in-one: For each vocalization (Voc1 to VocN), M acoustic features are extracted or M frequency-domain representations are determined using FFT. These can then either be classified directly via DTW or Xcorr, or transferred to an N×M matrix for further processing via kNN, HDBSCAN, or fuzzy clustering. Windowed: Each vocalization (Voc1 to VocN) is divided into L windows (w1 to wL), from which m acoustic features are extracted or m frequency-domain representations are determined by FFT. These m×L matrices can then either be classified directly via DTW or Xcorr, or transferred into an N×M matrix (where M = L × m) for further processing via kNN, HDBSCAN, or fuzzy clustering.
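The windowed transformation described above can be sketched as follows. This is a minimal illustration in Python (the paper's implementation uses MATLAB toolboxes); the two per-window features here (RMS energy and zero-crossing rate) are simple stand-ins for the acoustic features listed in the table, and all function names are illustrative:

```python
import numpy as np

def windowed_features(signal, n_windows=20):
    """Divide one vocalization into L = n_windows equal windows and
    extract m = 2 illustrative features per window (RMS energy and
    zero-crossing rate), yielding an m x L matrix."""
    windows = np.array_split(np.asarray(signal, dtype=float), n_windows)
    rows = []
    for w in windows:
        rms = np.sqrt(np.mean(w ** 2))                  # energy per window
        zcr = np.mean(np.abs(np.diff(np.sign(w))) > 0)  # zero-crossing rate
        rows.append([rms, zcr])
    return np.array(rows).T  # shape (m, L)

def to_nxm_matrix(vocalizations, n_windows=20):
    """Flatten each m x L matrix into one row vector, giving the
    N x M matrix (M = L * m) used for kNN, HDBSCAN, or fuzzy clustering."""
    return np.vstack([windowed_features(v, n_windows).ravel()
                      for v in vocalizations])
```

Each row of the resulting N×M matrix preserves the time course of every feature, which is the point of the multidimensional transformation: a frequency sweep and a flat tone with the same average frequency end up with different rows.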
List of acoustic features extracted for one vocalization (All-in-one) or for each time window of one vocalization (windowed). The commands (in quotes) and toolboxes mentioned in the reference column all refer to MATLAB.
| Acoustic Feature | Definition (All-in-One) | Definition (Windowed) | Reference |
|---|---|---|---|
| F0 | Median fundamental frequency | Fundamental frequency for each window | Audio Toolbox “pitch” with NCF |
| Delta F0 | Median value of the difference between adjacent values for F0 per window | Value of the difference between adjacent values for F0 | |
| Dominant f | Frequency with highest amplitude in the spectrum | Frequency with highest amplitude in the spectrum of each window | |
| Min f | Lower bound of the 99% occupied bandwidth | Lower bound of the 99% occupied bandwidth | Signal Processing Toolbox “obw” |
| Max f | Upper bound of the 99% occupied bandwidth | Upper bound of the 99% occupied bandwidth | Signal Processing Toolbox “obw” |
| Bandwidth | 99% occupied bandwidth | 99% occupied bandwidth | Signal Processing Toolbox “obw” |
| Duration | Time between onset and offset of a vocalization in seconds. | Duration only determined for win%, not for win. | |
| 1st Quartile | 1st quartile of the energy distribution | 1st quartile of the energy distribution | |
| 2nd Quartile | 2nd quartile of the energy distribution | 2nd quartile of the energy distribution | |
| 3rd Quartile | 3rd quartile of the energy distribution | 3rd quartile of the energy distribution | |
| Max Q1 | Frequency with highest amplitude in the 1st quartile | Frequency with highest amplitude in the 1st quartile | |
| Max Q2 | Frequency with highest amplitude in the 2nd quartile | Frequency with highest amplitude in the 2nd quartile | |
| Max Q3 | Frequency with highest amplitude in the 3rd quartile | Frequency with highest amplitude in the 3rd quartile | |
| Max Q4 | Frequency with highest amplitude in the 4th quartile | Frequency with highest amplitude in the 4th quartile | |
| FB1 | Median frequency of the 1st frequency band determined by LPC (Hz) | Frequency of the 1st frequency band determined by LPC | Signal Processing Toolbox “lpc” with 205 coefficients |
| FB2 | Median frequency of the 2nd frequency band | Frequency of the 2nd frequency band | LPC with 205 coefficients |
| FB3 | Median frequency of the 3rd frequency band | Frequency of the 3rd frequency band | LPC with 205 coefficients |
| BW FB1 | Median bandwidth of FB1 | Bandwidth of FB1 | “findpeaks” |
| BW FB2 | Median bandwidth of FB2 | Bandwidth of FB2 | “findpeaks” |
| BW FB3 | Median bandwidth of FB3 | Bandwidth of FB3 | “findpeaks” |
| Delta FB1-FB2 | Difference between FB1 and FB2 | Difference between FB1 and FB2 | |
| Delta FB2-FB3 | Difference between FB2 and FB3 | Difference between FB2 and FB3 | |
| Number of FB | Number of frequency bands determined by LPC | Number of frequency bands determined by LPC | Number calculated by LPC |
| Harmonic Ratio | Harmonic ratio is returned with values in the range of 0 to 1. A value of 0 represents low harmonicity, and a value of 1 represents high harmonicity | Harmonic ratio is returned with values in the range of 0 to 1. A value of 0 represents low harmonicity, and a value of 1 represents high harmonicity | Audio Toolbox “harmonicRatio” |
| Spectral Flatness | Measures how noisy a signal is. The higher the value, the noisier the signal | Measures how noisy a signal is. The higher the value, the noisier the signal | Audio Toolbox “spectralFlatness” |
| MFCC 1 | 1st mel frequency cepstral coefficient | 1st mel frequency cepstral coefficient | Audio Toolbox “mfcc” |
| MFCC 2 | 2nd mel frequency cepstral coefficient | 2nd mel frequency cepstral coefficient | Audio Toolbox “mfcc” |
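Figure 5 notes that the windowed m×L matrices can be classified directly via DTW. A minimal dynamic-time-warping distance between two such feature sequences might look like the sketch below. This is the textbook DTW recurrence, not the paper's MATLAB implementation; columns are per-window feature vectors:

```python
import numpy as np

def dtw_distance(A, B):
    """Dynamic time warping distance between two feature sequences.
    A, B: arrays of shape (m, La) and (m, Lb), whose columns are
    per-window feature vectors as in the windowed transformation."""
    A, B = np.atleast_2d(A), np.atleast_2d(B)
    La, Lb = A.shape[1], B.shape[1]
    cost = np.full((La + 1, Lb + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, La + 1):
        for j in range(1, Lb + 1):
            d = np.linalg.norm(A[:, i - 1] - B[:, j - 1])  # local distance
            cost[i, j] = d + min(cost[i - 1, j],     # insertion
                                 cost[i, j - 1],     # deletion
                                 cost[i - 1, j - 1]) # match
    return cost[La, Lb]
```

Because DTW warps the time axis, two calls with the same frequency contour but different durations receive a small distance, which is why it suits frequency-modulated vocalizations of varying length.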
Figure 6. Results of the clustering methods obtained using the harpy eagle vocalizations (A) and the cohesion calls of the giant otters (B). The green dashed lines mark the thresholds that NMI values must exceed (>0.5) and that Δ-Cluster values must not exceed (≤1). The double arrows mark the cluster solutions that fulfill both the NMI and the Δ-Cluster threshold.
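The two evaluation measures can be computed as follows. This sketch makes two assumptions that the entry itself does not confirm: NMI is normalized by the arithmetic mean of the two entropies (other variants use min, max, or the geometric mean), and Δ-Cluster is read as the absolute difference between the number of clusters found and the number of predefined classes:

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same
    items. Normalization by the arithmetic mean of the entropies is an
    assumption; the variant used by CASE is not stated here."""
    n = len(labels_a)
    pa, pb = Counter(labels_a), Counter(labels_b)
    pab = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * math.log((c / n) / ((pa[a] / n) * (pb[b] / n)))
             for (a, b), c in pab.items())
    ha = -sum(c / n * math.log(c / n) for c in pa.values())
    hb = -sum(c / n * math.log(c / n) for c in pb.values())
    mean_h = (ha + hb) / 2
    return mi / mean_h if mean_h > 0 else 1.0

def delta_cluster(labels_found, labels_true):
    """Assumed reading of Delta-Cluster: |k_found - k_true|."""
    return abs(len(set(labels_found)) - len(set(labels_true)))
```

NMI is 1 for identical partitions regardless of how the cluster IDs are named, and near 0 for independent partitions, which is why it suits comparing a clustering against a priori labels.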
NMI values of the comparison between the labeling of the different clustering methods and the actual allocation of vocalizations to the corresponding giant otter individuals. The allocation was not rated by humans but was determined by sound localization. Only those NMI values achieved by clustering methods that met the threshold values for both NMI and Δ-Cluster are shown.
| Clustering Method | Acoustic Features, Win | Acoustic Features, Win(%) | Acoustic Features, All-in-One | Spectra, Win | Spectra, Win(%) | Spectra, All-in-One | Data |
|---|---|---|---|---|---|---|---|
| | 0.80 | 0.61 | | | | | Giant Otter |
| | 0.61 | | | | | | |
| | 0.50 | | | | | | |
| | 0.50 | | | | | | |
| | | | | | | | |
| | 0.58 | | | | | | |
| | | | | | | | |
| | | | | | | | |
NMI values of the comparison between the labeling of the different clustering methods and the authors’ rated label. Only the NMI values that were achieved by clustering methods that met the threshold values for both NMI and Δ-Cluster are shown.
| Clustering Method | Acoustic Features, Win | Acoustic Features, Win(%) | Acoustic Features, All-in-One | Spectra, Win | Spectra, Win(%) | Spectra, All-in-One | Data |
|---|---|---|---|---|---|---|---|
| | 0.67 | 0.92 | | | | | Harpy Eagle |
| | 0.85 | 0.79 | 0.51 | | | | |
| | 0.78 | 0.67 | 0.75 | 0.54 | | | |
| | 0.65 | 0.61 | 0.64 | 0.59 | | | |
| | 0.78 | 0.79 | | | | | |
| | 0.65 | 0.74 | 0.87 | | | | |
| | 0.77 | | | | | | |
| | 0.85 | | | | | | |
Comparison of the determined harpy eagle vocalization labels using NMI. NMI values are shaded if NMI > 0.5 and Δ-Cluster ≤ 1.
| | kNN + CD | kNN + AP | DTW + CD | DTW + AP | HDBSCAN | Xcorr + CD | Xcorr + AP | Fuzzy |
|---|---|---|---|---|---|---|---|---|
| | Acoustic Features (Win%) | | | | Spectrogram | | | Acoustic Features (All-in-One) |
| kNN + AP | 0.90 | | | | | | | |
| DTW + CD | 0.70 | 0.65 | | | | | | |
| DTW + AP | 0.64 | 0.63 | 0.86 | | | | | |
| HDBSCAN | 0.68 | 0.68 | 0.92 | 0.88 | | | | |
| Xcorr + CD | 0.91 | 0.81 | 0.65 | 0.60 | 0.68 | | | |
| Xcorr + AP | 0.88 | 0.79 | 0.68 | 0.62 | 0.64 | 0.84 | | |
| Fuzzy | 0.78 | 0.72 | 0.73 | 0.67 | 0.76 | 0.80 | 0.74 | |
Comparison of the determined giant otter vocalization labels using NMI. NMI values are shaded if NMI > 0.5 and Δ-Cluster ≤ 1.
| | kNN + CD | kNN + AP | DTW + CD | DTW + AP | HDBSCAN | Xcorr + CD | Xcorr + AP | Fuzzy |
|---|---|---|---|---|---|---|---|---|
| | Acoustic Features (Win%) | | | | Spectrogram | | | Acoustic Features (All-in-One) |
| kNN + AP | 0.64 | | | | | | | |
| DTW + CD | 0.56 | 0.45 | | | | | | |
| DTW + AP | 0.56 | 0.45 | 1.00 | | | | | |
| HDBSCAN | 0.40 | 0.42 | 0.10 | 0.10 | | | | |
| Xcorr + CD | 0.37 | 0.46 | 0.41 | 0.41 | 0.46 | | | |
| Xcorr + AP | 0.41 | 0.43 | 0.41 | 0.41 | 0.32 | 0.67 | | |
| Fuzzy | 0.57 | 0.59 | 0.51 | 0.51 | 0.47 | 0.63 | 0.42 | |
NMI and Δ-Cluster values achieved by inexperienced human observers compared with the a priori labels.
| Person | NMI (Harpy Eagle) | Δ-Cluster (Harpy Eagle) | NMI (Giant Otter) | Δ-Cluster (Giant Otter) |
|---|---|---|---|---|
| 1 | 0.9233 | 1 | 0.3013 | 1 |
| 2 | 0.9293 | 0 | 0.5079 | 3 |
| 3 | 0.8527 | 1 | 0.381 | 1 |
| 4 | 0.9545 | 0 | 0.3236 | 0 |
| 5 | 0.9233 | 1 | 0.3877 | 0 |