| Literature DB >> 31467331 |
Peter C Bermant, Michael M Bronstein, Robert J Wood, Shane Gero, David F Gruber.
Abstract
We implemented Machine Learning (ML) techniques to advance the study of sperm whale (Physeter macrocephalus) bioacoustics. This entailed employing Convolutional Neural Networks (CNNs) to construct an echolocation click detector designed to classify spectrograms generated from sperm whale acoustic data according to the presence or absence of a click. The click detector achieved 99.5% accuracy in classifying 650 spectrograms. The successful application of CNNs to clicks reveals the potential of future studies to train CNN-based architectures to extract finer-scale details from cetacean spectrograms. Long short-term memory and gated recurrent unit recurrent neural networks were trained to perform classification tasks, including (1) "coda type classification" where we obtained 97.5% accuracy in categorizing 23 coda types from a Dominica dataset containing 8,719 codas and 93.6% accuracy in categorizing 43 coda types from an Eastern Tropical Pacific (ETP) dataset with 16,995 codas; (2) "vocal clan classification" where we obtained 95.3% accuracy for two clan classes from Dominica and 93.1% for four ETP clan types; and (3) "individual whale identification" where we obtained 99.4% accuracy using two Dominica sperm whales. These results demonstrate the feasibility of applying ML to sperm whale bioacoustics and establish the validity of constructing neural networks to learn meaningful representations of whale vocalizations.
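The recurrent (LSTM/GRU) coda classifiers described in the abstract can be illustrated with a minimal sketch. Everything below is an illustrative assumption, not the paper's configuration: codas are encoded as sequences of inter-click intervals, the hidden size is arbitrary, and the weights are random and untrained. Only the update equations are the standard GRU formulation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUCell:
    """Minimal GRU cell: consumes one timestep of a coda sequence
    (e.g. one inter-click interval) and updates the hidden state."""
    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Stacked weights for update gate z, reset gate r, candidate state.
        self.W = rng.uniform(-s, s, (3, hidden_size, input_size))
        self.U = rng.uniform(-s, s, (3, hidden_size, hidden_size))
        self.b = np.zeros((3, hidden_size))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])  # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])  # reset gate
        h_cand = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * h_cand  # interpolate old and new state

def encode_coda(intervals, hidden_size=8):
    """Run a coda's inter-click intervals through the GRU and return
    the final hidden state (the learned fixed-length representation)."""
    cell = GRUCell(input_size=1, hidden_size=hidden_size)
    h = np.zeros(hidden_size)
    for ici in intervals:
        h = cell.step(np.array([ici]), h)
    return h

# Hypothetical 5-click coda: four inter-click intervals in seconds.
rep = encode_coda([0.15, 0.16, 0.18, 0.31])
print(rep.shape)  # (8,)
```

In a trained classifier, this final hidden state would feed a softmax layer over the coda type, vocal clan, or whale ID classes.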
Year: 2019 PMID: 31467331 PMCID: PMC6715799 DOI: 10.1038/s41598-019-48909-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Input testing spectrogram images with the trained network's predicted output labels of (a) Click and (b) No Click. The absence of labeled axes and the low image resolution reflect that these images are used purely as input data when training the CNN. This resolution suffices for training a CNN-based echolocation click detector.
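A CNN-based click detector of the kind shown in Figure 1 can, in outline, convolve learned filters over the spectrogram and feed pooled features to a binary Click / No Click output. The sketch below is a hypothetical forward pass only, with a made-up input size and a single untrained random filter; it is not the paper's architecture.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D cross-correlation of a single-channel spectrogram."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling, trimming any ragged edge."""
    H, W = x.shape
    H, W = H - H % size, W - W % size
    return x[:H, :W].reshape(H // size, size, W // size, size).max(axis=(1, 3))

def click_probability(spectrogram, kernel, w, b):
    """Conv -> ReLU -> max-pool -> global average -> logistic output."""
    feat = np.maximum(conv2d(spectrogram, kernel), 0.0)  # ReLU activation
    pooled = max_pool(feat)
    score = w * pooled.mean() + b                        # tiny linear head
    return 1.0 / (1.0 + np.exp(-score))                  # P(click)

rng = np.random.default_rng(0)
spec = rng.random((32, 32))           # stand-in for a spectrogram image
kernel = rng.standard_normal((3, 3))  # one untrained 3x3 filter
p = click_probability(spec, kernel, w=1.0, b=0.0)
print(0.0 < p < 1.0)  # True
```

A real detector would stack several convolutional layers with many learned filters and train the weights against labeled Click / No Click spectrograms.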
Figure 2. t-SNE visualization of the coda type classifier, with colors denoting different coda types. Standard PCA and t-SNE techniques are used to plot the coda type hidden features in a three-dimensional feature space.
Figure 3. t-SNE visualization of the vocal clan classifier with the two classes representing the two clans identified in the eastern Caribbean (purple points “EC1”, red points “EC2”). We implement PCA and t-SNE to visualize the hidden features of the clan class network.
Figure 4. t-SNE visualization of the whale ID type classifier distinguishing between two identified whales from the “EC1” clan recorded off Dominica (purple points indicate codas produced by whale #5722, and the red points are codas generated by whale #5727). We implement PCA and t-SNE to visualize the hidden features in the whale ID classifier network.
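The PCA step of the visualization pipeline used in Figures 2 through 4 can be sketched as follows. The hidden-layer activations here are random stand-ins with assumed dimensions; PCA is implemented directly via SVD, and in practice t-SNE (e.g. scikit-learn's `TSNE`) would then embed the reduced features for the 3-D scatter plots.

```python
import numpy as np

def pca(X, n_components=3):
    """Project feature vectors onto their top principal components."""
    Xc = X - X.mean(axis=0)  # center each feature dimension
    # SVD of the centered data: rows of Vt are the principal axes,
    # ordered by decreasing explained variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Hypothetical hidden-layer activations: 200 codas, 64 units each.
rng = np.random.default_rng(0)
hidden = rng.standard_normal((200, 64))
reduced = pca(hidden, n_components=3)
print(reduced.shape)  # (200, 3)
# t-SNE would typically follow for the final embedding, e.g.
# sklearn.manifold.TSNE(n_components=3).fit_transform(reduced),
# after which the 3-D points are colored by class label as in Figures 2-4.
```

Running PCA first is a common way to denoise and shrink the feature space before the comparatively expensive t-SNE step.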