Christian Bergler, Hendrik Schröter, Rachael Xi Cheng, Volker Barth, Michael Weber, Elmar Nöth, Heribert Hofer, Andreas Maier.
Abstract
Large bioacoustic archives of wild animals are an important source for identifying recurring communication patterns, which can then be related to recurring behavioral patterns to advance the current understanding of intra-specific communication in non-human animals. A main challenge is that most large-scale bioacoustic archives contain only a small percentage of animal vocalizations and a large amount of environmental noise, which makes it extremely difficult to manually retrieve sufficient vocalizations for further analysis, a task that is particularly important for species with advanced social systems and complex vocalizations. In this study, deep neural networks were trained on 11,509 killer whale (Orcinus orca) signals and 34,848 noise segments. The resulting toolkit, ORCA-SPOT, was tested on a large-scale bioacoustic repository, the Orchive, comprising roughly 19,000 hours of killer whale underwater recordings. An automated segmentation of the entire Orchive (about 2.2 years of audio) took approximately 8 days. It achieved a time-based precision, or positive predictive value (PPV), of 93.2% and an area under the curve (AUC) of 0.9523. This approach enables automated annotation of large bioacoustic databases to extract killer whale sounds, which are essential for the subsequent identification of significant communication patterns. The code will be publicly available in October 2019 to support the application of deep learning to bioacoustic research. ORCA-SPOT can be adapted to other animal species.
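The archive size and processing time quoted above can be cross-checked with simple arithmetic. This is a minimal sketch, not part of ORCA-SPOT; it assumes the approximate figures from the abstract (about 19,000 hours of audio, about 8 days of processing) and that processing ran continuously for the whole period.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 h; leap days ignored


def archive_stats(audio_hours: float, processing_days: float) -> tuple[float, float]:
    """Return (archive length in years, processing speed relative to real time)."""
    years = audio_hours / HOURS_PER_YEAR
    # Audio hours processed per wall-clock hour, assuming round-the-clock processing.
    speedup = audio_hours / (processing_days * 24)
    return years, speedup


# Approximate figures from the abstract.
years, speedup = archive_stats(19_000, 8)
print(f"{years:.1f} years of audio, about {speedup:.0f}x faster than real time")
```

This prints `2.2 years of audio, about 99x faster than real time`, consistent with the "about 2.2 years" stated in the abstract.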
Year: 2019 PMID: 31358873 PMCID: PMC6662697 DOI: 10.1038/s41598-019-47335-w
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Geographic ranges (light shading) of killer whale populations in the northeastern Pacific (British Columbia, Canada) (illustration recreated after Wiles [31]).
Figure 2. Spectrograms of three characteristic killer whale sounds (sampling rate = 44.1 kHz, FFT size = 4,096 samples (≈93 ms), hop size = 441 samples (10 ms)).
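The spectrogram settings in the caption determine the time and frequency resolution of the plots. The sketch below is a minimal NumPy short-time Fourier transform using those settings; the Hann window and the absence of padding are assumptions (the caption does not state them), and the code is illustrative rather than ORCA-SPOT's own preprocessing.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz, from the caption
N_FFT = 4_096         # FFT window length in samples, from the caption
HOP = 441             # hop size in samples, from the caption


def stft_magnitude(signal: np.ndarray) -> np.ndarray:
    """Magnitude STFT with an (assumed) Hann window; shape (frames, N_FFT // 2 + 1)."""
    window = np.hanning(N_FFT)
    n_frames = 1 + (len(signal) - N_FFT) // HOP  # only full windows, no padding
    frames = np.stack([signal[i * HOP : i * HOP + N_FFT] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


print(f"window: {1000 * N_FFT / SAMPLE_RATE:.1f} ms, hop: {1000 * HOP / SAMPLE_RATE:.1f} ms")
print(f"frequency resolution: {SAMPLE_RATE / N_FFT:.2f} Hz per bin")

# One second of a synthetic 5 kHz tone as a stand-in for a recording.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
spec = stft_magnitude(np.sin(2 * np.pi * 5_000 * t))
peak_hz = spec.mean(axis=0).argmax() * SAMPLE_RATE / N_FFT
print(spec.shape, f"peak near {peak_hz:.0f} Hz")
```

A 4,096-sample window spans about 92.9 ms and gives bins about 10.77 Hz apart, while the 441-sample hop advances exactly 10 ms per frame, so the plots trade fine time resolution for fine frequency resolution.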
Table 1. Overview of datasets and data distribution.
| Dataset | Training: killer whale | Training: noise | Training: sum | Training: % of dataset | Validation: killer whale | Validation: noise | Validation: sum | Validation: % of dataset | Test: killer whale | Test: noise | Test: sum | Test: % of dataset |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OACᵇ | 6,829 | 1,213 | 8,042 | 69.9 | 1,425 | 286 | 1,711 | 14.9 | 1,443 | 308 | 1,751 | 15.2 |
| AEOTDᵃ | 1,289 | 13,135 | 14,424 | 80.2 | 276 | 1,511 | 1,787 | 9.9 | 102 | 1,682 | 1,784 | 9.9 |
| DLFD | 3,391 | 20,500 | 23,891 | 74.8 | 1,241 | 2,884 | 4,125 | 12.9 | 1,108 | 2,804 | 3,912 | 12.3 |
| SUM | 11,509 | 34,848 | 46,357 | 75.5 | 2,942 | 4,681 | 7,623 | 12.4 | 2,653 | 4,794 | 7,447 | 12.1 |
ᵃ Dataset available upon request [55,56].
ᵇ Orchive tapes available upon request [55,56].
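The percentage columns in the dataset overview table give each split's share of a dataset's samples (killer whale plus noise). A short check with the sample counts copied from that table reproduces them, including the OAC validation share of 14.9%. This is only a consistency check, not code from the ORCA-SPOT toolkit.

```python
# (training, validation, test) sample counts per dataset, copied from the table.
SPLITS = {
    "OAC": (8_042, 1_711, 1_751),
    "AEOTD": (14_424, 1_787, 1_784),
    "DLFD": (23_891, 4_125, 3_912),
}


def split_percentages(counts: tuple[int, ...]) -> list[float]:
    """Share of each split within one dataset, in percent, rounded to one decimal."""
    total = sum(counts)
    return [round(100 * c / total, 1) for c in counts]


for name, counts in SPLITS.items():
    print(name, split_percentages(counts))

# Pooling the three datasets column by column gives the SUM row.
pooled = tuple(sum(col) for col in zip(*SPLITS.values()))
print("SUM", pooled, split_percentages(pooled))
```

Every dataset is split roughly 70–80% / 10–15% / 10–15%, and the pooled counts (46,357 / 7,623 / 7,447) match the SUM row.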
Figure 3. (a) (left) Expedition route and data collection range of the DeepAL project 2017/2018. (b) (right) Network of hydrophones and acoustic range of OrcaLab [55] (illustration in (b) recreated after OrcaLab [55] and Ness [56]).
Figure 4. ORCA-SPOT network architecture.
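The ResNet variants compared in the model-accuracy table that follows differ in whether a 3 × 3, stride-2 max-pooling layer is applied in the first residual layer. The sketch below shows the effect on feature-map size with the standard output-size formula ⌊(n + 2p − k)/s⌋ + 1. It assumes a standard ResNet stem (7 × 7 convolution, stride 2, padding 3), padding 1 for the pooling layer, and a hypothetical 256 × 128 input; none of these three values is stated in this section.

```python
def out_size(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution or pooling layer along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def stem_output(height: int, width: int, max_pool: bool) -> tuple[int, int]:
    # Assumed stem: 7x7 convolution, stride 2, padding 3 (standard ResNet).
    h, w = out_size(height, 7, 2, 3), out_size(width, 7, 2, 3)
    if max_pool:
        # 3x3 max pooling with stride 2, as in the table caption; padding 1 is assumed.
        h, w = out_size(h, 3, 2, 1), out_size(w, 3, 2, 1)
    return h, w


# Hypothetical 256 x 128 spectrogram input (frequency x time).
print(stem_output(256, 128, max_pool=True))
print(stem_output(256, 128, max_pool=False))
```

Under these assumptions the pooled variant passes a 64 × 32 map to the residual stages and the unpooled variant a 128 × 64 map, i.e. dropping the pooling layer keeps twice the resolution along each axis at the cost of more computation.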
Table 2. Model accuracies (%) of common ResNet architectures with and without max pooling (3 × 3 kernel, stride 2) in the first residual layer; three runs per configuration, with their maximum, mean, and standard deviation.
| Architecture (accuracy, %) | ORCA-SPOT-MAX-POOL run 1 | ORCA-SPOT-MAX-POOL run 2 | ORCA-SPOT-MAX-POOL run 3 | ORCA-SPOT-MAX-POOL max | ORCA-SPOT-MAX-POOL mean | ORCA-SPOT-MAX-POOL stdv | ORCA-SPOT-NO-MAX-POOL run 1 | ORCA-SPOT-NO-MAX-POOL run 2 | ORCA-SPOT-NO-MAX-POOL run 3 | ORCA-SPOT-NO-MAX-POOL max | ORCA-SPOT-NO-MAX-POOL mean | ORCA-SPOT-NO-MAX-POOL stdv |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ResNet18 | 95.39 | 93.99 | 92.84 | 95.39 | 94.08 | 1.28 | 95.88 | 96.15 | 94.40 | 96.15 | 95.48 | 0.94 |
| ResNet34 | 93.65 | 95.72 | 95.20 | 95.72 | 94.86 | 1.08 | 96.13 | 95.65 | 95.12 | 96.13 | 95.64 | 0.51 |
| ResNet50 | 92.39 | 95.76 | 94.88 | 95.76 | 94.35 | 1.75 | 96.37 | 95.90 | 95.61 | 96.37 | 95.96 | 0.38 |
| ResNet101 | 94.39 | 95.33 | 95.01 | 95.33 | 94.91 | 0.47 | 95.81 | 94.10 | 96.24 | 96.24 | 95.39 | 1.13 |
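The statistics columns in the ResNet comparison are consistent with the maximum, mean, and sample standard deviation (n − 1 in the denominator) of the three runs. The check below uses three rows copied from the table. Recomputed means can differ from the table in the last digit (e.g. 94.07 vs. 94.08 for ResNet18 with max pooling), presumably because the table was computed from unrounded run accuracies; that explanation is an assumption.

```python
from statistics import mean, stdev

# Accuracies (%) of the three runs per configuration, copied from the table.
RUNS = {
    ("ResNet18", "max pool"): (95.39, 93.99, 92.84),
    ("ResNet18", "no max pool"): (95.88, 96.15, 94.40),
    ("ResNet50", "no max pool"): (96.37, 95.90, 95.61),
}

for (arch, variant), acc in RUNS.items():
    # stdev() is the sample standard deviation (divides by n - 1).
    print(f"{arch:9s} {variant:12s} max={max(acc):.2f} "
          f"mean={mean(acc):.2f} stdv={stdev(acc):.2f}")
```

With only three runs per configuration, the standard deviations (0.38 to 1.75 points) are of the same order as the gaps between architectures, so the max-pooling effect (higher mean accuracy without pooling for all four depths) is the more robust pattern in the table.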
Figure 5. ORCA-SPOT training, validation, and test set metrics (Table 1).
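Table 3. ORCA-SPOT segmentation results based on 238 tapes (≈191.5 hours) distributed over 23 years. "Detected" counts all segments classified as killer whale; "True" and "False" split these into correct and incorrect killer whale detections. "Samples" is the number of segments and "time" their total duration in minutes; PPV is given for both.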
| Year | Tapes | Detected samples OS1 | Detected samples OS2 | Detected time OS1 (min) | Detected time OS2 (min) | True samples OS1 | True samples OS2 | True time OS1 (min) | True time OS2 (min) | False samples OS1 | False samples OS2 | False time OS1 (min) | False time OS2 (min) | PPV samples OS1 (%) | PPV samples OS2 (%) | PPV time OS1 (%) | PPV time OS2 (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1985 | 20 | 1,923 | 2,072 | 243.94 | 279.80 | 1,835 | 1,966 | 240.08 | 272.78 | 88 | 106 | 3.86 | 7.02 | 95.42 | 94.88 | 98.42 | 97.49 |
| 1986 | 7 | 568 | 492 | 43.44 | 39.40 | 462 | 478 | 38.54 | 38.84 | 106 | 14 | 4.90 | 0.56 | 81.34 | 97.16 | 88.72 | 98.58 |
| 1987 | 9 | 782 | 911 | 63.10 | 79.70 | 761 | 900 | 61.77 | 79.28 | 21 | 11 | 1.33 | 0.42 | 97.31 | 98.79 | 97.90 | 99.47 |
| 1988 | 10 | 690 | 838 | 66.44 | 90.93 | 631 | 752 | 63.81 | 84.26 | 59 | 86 | 2.63 | 6.67 | 91.45 | 89.74 | 96.05 | 92.67 |
| 1989 | 9 | 418 | 486 | 35.54 | 39.80 | 369 | 471 | 32.85 | 39.06 | 49 | 15 | 2.69 | 0.74 | 88.28 | 96.91 | 92.43 | 98.14 |
| 1990 | 10 | 619 | 585 | 67.41 | 67.18 | 544 | 577 | 63.08 | 66.89 | 75 | 8 | 4.33 | 0.29 | 87.88 | 98.63 | 93.57 | 99.57 |
| 1991 | 10 | 552 | 544 | 41.29 | 44.16 | 459 | 504 | 35.13 | 42.22 | 93 | 40 | 6.16 | 1.94 | 83.15 | 92.65 | 85.09 | 95.60 |
| 1992 | 10 | 680 | 625 | 58.79 | 58.89 | 591 | 620 | 54.28 | 58.67 | 89 | 5 | 4.51 | 0.22 | 86.91 | 99.20 | 92.32 | 99.62 |
| 1993 | 9 | 607 | 579 | 93.72 | 98.58 | 578 | 568 | 92.39 | 98.13 | 29 | 11 | 1.33 | 0.45 | 95.22 | 98.10 | 98.59 | 99.54 |
| 1994 | 9 | 891 | 899 | 89.50 | 98.13 | 846 | 870 | 87.79 | 96.83 | 45 | 29 | 1.71 | 1.30 | 94.95 | 96.77 | 98.09 | 98.68 |
| 1995 | 8 | 289 | 753 | 18.37 | 75.23 | 241 | 381 | 16.12 | 40.30 | 48 | 372 | 2.25 | 34.93 | 83.39 | 50.60 | 87.75 | 53.56 |
| 1996 | 9 | 516 | 787 | 48.79 | 62.88 | 374 | 524 | 30.83 | 42.57 | 142 | 263 | 17.96 | 20.31 | 72.48 | 66.58 | 63.19 | 67.70 |
| 1998 | 10 | 735 | 739 | 90.03 | 95.37 | 675 | 732 | 87.20 | 95.11 | 60 | 7 | 2.83 | 0.26 | 91.84 | 99.05 | 96.86 | 99.73 |
| 1999 | 10 | 695 | 763 | 66.86 | 81.47 | 518 | 548 | 56.91 | 65.53 | 177 | 215 | 9.95 | 15.94 | 74.53 | 71.82 | 85.12 | 80.43 |
| 2000 | 6 | 430 | 436 | 46.10 | 47.53 | 423 | 432 | 45.68 | 47.35 | 7 | 4 | 0.42 | 0.18 | 98.37 | 99.08 | 99.10 | 99.63 |
| 2001 | 13 | 1,164 | 1,157 | 109.41 | 117.60 | 1,067 | 1,138 | 102.78 | 116.62 | 97 | 19 | 6.63 | 0.98 | 91.67 | 98.36 | 93.94 | 99.16 |
| 2002 | 8 | 831 | 808 | 95.25 | 106.58 | 752 | 786 | 91.07 | 105.55 | 79 | 22 | 4.18 | 1.03 | 90.49 | 97.28 | 95.61 | 99.03 |
| 2003 | 10 | 669 | 710 | 56.94 | 59.88 | 605 | 697 | 53.68 | 58.98 | 64 | 13 | 3.26 | 0.90 | 90.43 | 98.17 | 94.26 | 98.50 |
| 2004 | 10 | 1,132 | 1,193 | 110.14 | 129.52 | 1,072 | 1,064 | 107.43 | 120.00 | 60 | 129 | 2.71 | 9.52 | 94.70 | 89.19 | 97.53 | 92.65 |
| 2005 | 9 | 1,098 | 1,254 | 106.98 | 147.33 | 975 | 1,032 | 100.98 | 118.87 | 123 | 222 | 6.00 | 28.46 | 88.80 | 82.30 | 94.39 | 80.68 |
| 2006 | 8 | 1,450 | 1,240 | 156.58 | 134.08 | 1,046 | 1,141 | 127.25 | 129.39 | 404 | 99 | 29.33 | 4.69 | 72.14 | 92.02 | 81.27 | 96.50 |
| 2009 | 12 | 1,248 | 1,122 | 106.22 | 104.10 | 955 | 1,060 | 86.68 | 100.63 | 293 | 62 | 19.54 | 3.47 | 76.52 | 94.49 | 81.60 | 96.67 |
| 2010 | 22 | 1,069 | 218 | 68.74 | 10.24 | 867 | 210 | 56.60 | 9.88 | 202 | 8 | 12.14 | 0.36 | 81.10 | 96.33 | 82.34 | 96.42 |
| SUM | 238 | 19,056 | 19,211 | 1,883.58 | 2,068.38 | 16,646 | 17,451 | 1,732.93 | 1,927.74 | 2,410 | 1,760 | 150.65 | 140.64 | 87.35 | 90.84 | 92.00 | 93.20 |
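PPV is the fraction of detections that are correct, TP / (TP + FP). The sample-based variant counts segments, while the time-based variant weights each segment by its duration, so long correct detections count for more than short false alarms. The sketch below recomputes the pooled SUM row from the table's own totals; it is a plain arithmetic check, not ORCA-SPOT code.

```python
def ppv(true_pos: float, false_pos: float) -> float:
    """Positive predictive value (precision) in percent: TP / (TP + FP)."""
    return 100 * true_pos / (true_pos + false_pos)


# Pooled SUM row, (true, false) killer whale detections per model.
counts = {"OS1": (16_646, 2_410), "OS2": (17_451, 1_760)}          # segments
minutes = {"OS1": (1_732.93, 150.65), "OS2": (1_927.74, 140.64)}   # durations

for model in ("OS1", "OS2"):
    print(model,
          f"sample-based PPV = {ppv(*counts[model]):.2f} %,",
          f"time-based PPV = {ppv(*minutes[model]):.2f} %")
```

The time-based PPV of OS2, 93.20%, matches the 93.2% reported in the abstract, and for both models the time-based value exceeds the sample-based one, i.e. false detections tend to be shorter than true ones.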
Figure 6. ORCA-SPOT ROC results (AUC) based on nine fully annotated Orchive tapes (three each with high, medium, and low killer whale activity).
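Figure 6 summarizes detection quality as the area under the ROC curve. As a reminder of what that number measures, the sketch below computes AUC with the rank (Mann–Whitney) formulation: the probability that a randomly chosen killer whale segment receives a higher score than a randomly chosen noise segment, with ties counted as half. The scores and labels are made up for illustration, and the caption does not say how ORCA-SPOT's AUC was computed (for example per segment or per time frame).

```python
def roc_auc(scores: list[float], labels: list[int]) -> float:
    """ROC AUC as P(score of a random positive > score of a random negative).

    Ties count as half a win. O(P * N) pairwise comparison, fine for a small example.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical killer whale probabilities and ground truth (1 = killer whale, 0 = noise).
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))
```

Here one killer whale segment (score 0.40) is ranked below one noise segment (0.70), so 8 of the 9 positive/negative pairs are ordered correctly and the AUC is 8/9 ≈ 0.889. An AUC of 0.9523 therefore means that roughly 95% of such pairs are ordered correctly.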
Figure 7. Spectrograms of noise segments classified by OS2 as potential killer whale sounds, i.e. false positives (sampling rate = 44.1 kHz, FFT size = 4,096 samples (≈93 ms), hop size = 441 samples (10 ms), frequency range: 0–13 kHz).
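The spectrograms in Figure 7 show only 0–13 kHz of the 0–22.05 kHz available at a 44.1 kHz sampling rate. The sketch below shows one way such a display range maps onto FFT bins, assuming a spectrogram stored as (frames × frequency bins) with the caption's 4,096-point FFT; the random array is a placeholder, and the layout is an assumption rather than ORCA-SPOT's data format.

```python
import numpy as np

SAMPLE_RATE = 44_100  # Hz, from the caption
N_FFT = 4_096         # FFT size, from the caption


def crop_to_band(spec: np.ndarray, f_max_hz: float) -> np.ndarray:
    """Keep only frequency bins (last axis) whose centre frequency is <= f_max_hz."""
    freqs = np.fft.rfftfreq(N_FFT, d=1 / SAMPLE_RATE)  # 2,049 bin centres, 0 to 22,050 Hz
    return spec[..., freqs <= f_max_hz]


spec = np.random.rand(100, N_FFT // 2 + 1)  # placeholder spectrogram (frames x bins)
cropped = crop_to_band(spec, 13_000)
print(spec.shape, "->", cropped.shape)
```

With bins about 10.77 Hz apart, the 0–13 kHz range corresponds to the first 1,208 of the 2,049 frequency bins.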