Jacob Andreas1,2, Gašper Beguš3,2, Michael M Bronstein4,5,6,2, Roee Diamant7,2, Denley Delaney8,2, Shane Gero9,10,2, Shafi Goldwasser11, David F Gruber12,2, Sarah de Haas13,2, Peter Malkin13,2, Nikolay Pavlov2, Roger Payne2, Giovanni Petri14,2, Daniela Rus1,2, Pratyusha Sharma1,2, Dan Tchernov7,2, Pernille Tønnesen15,2, Antonio Torralba1,2, Daniel Vogt16,2, Robert J Wood16,2.
Abstract
Machine learning has been advancing dramatically over the past decade. Most strides have come in human-centered applications, driven by the availability of large-scale datasets; however, opportunities are ripe to apply this technology to more deeply understand non-human communication. We detail a scientific roadmap for advancing the understanding of whale communication that can serve as a template for deciphering other forms of animal and non-human communication. Sperm whales, with their highly developed neuroanatomical features, cognitive abilities, social structures, and discrete click-based encoding, make an excellent model for developing advanced tools that can be applied to other animals in the future. We outline the key elements required for collecting and processing massive datasets, detecting basic communication units and language-like higher-level structures, and validating models through interactive playback experiments. The technological capabilities developed by such an undertaking hold potential for cross-application in broader communities investigating non-human communication and behavioral research.
Keywords: Artificial intelligence; Ethology; Linguistics; Natural language processing
Year: 2022 PMID: 35663036 PMCID: PMC9160774 DOI: 10.1016/j.isci.2022.104393
Source DB: PubMed Journal: iScience ISSN: 2589-0042
Figure 1. An approach to sperm whale communication that integrates biology, robotics, machine learning, and linguistics expertise and comprises the following key steps
Record: collect a large-scale, longitudinal, multimodal dataset of whale communication and behavioral data from a variety of sensors. Process: reconcile and process the multi-sensor data. Decode: use machine learning techniques to create a model of whale communication, characterize its structure, and link it to behavior. Encode & Playback: conduct interactive playback experiments and refine the whale language model. Illustration © 2021 Alex Boersma.
Figure 2. Sperm whale bioacoustic system
(A) The sperm whale head contains the spermaceti organ (c), a cavity filled with almost 2,000 L of wax-like liquid, and the junk compartment (f), comprising a series of wafer-like bodies believed to act as acoustic lenses. Together, the spermaceti organ and junk act as two connected tubes, forming a bent, conical horn about 10 m long with a 0.8 m aperture in large mature males. Sound emitted by the phonic lips (i) at the front of the head is focused as it travels through the bent horn, producing a flat wavefront at the exit surface.
(B) Typical temporal structure of sperm whale echolocation and coda clicks. Echolocation clicks are produced with consistent inter-click intervals (approximately 0.4 s), whereas coda clicks are arranged in stereotyped sequences called “codas” lasting less than 2 s. Codas are characterized by their number of constituent clicks and the intervals between them (inter-click intervals, or ICIs). Codas are typically produced in multi-party exchanges that can last from about 10 s to over half an hour. Each click, in turn, consists of a sequence of equally spaced pulses, with an inter-pulse interval (IPI) on the order of 3–4 ms in an adult female, which results from the sound reflecting within the spermaceti organ. Illustration © 2021 Alex Boersma.
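The ICI-based distinction between echolocation trains and codas described above can be sketched in a few lines. This is a toy illustration, not the paper's method: the ~0.4 s echolocation ICI and the <2 s coda duration come from the caption, while the function names and the 0.1 s tolerance are assumptions made here.

```python
import numpy as np

def inter_click_intervals(click_times):
    """Return the intervals (s) between successive click onsets."""
    t = np.asarray(sorted(click_times), dtype=float)
    return np.diff(t)

def label_sequence(click_times, echolocation_ici=0.4, coda_max_duration=2.0):
    """Toy heuristic: a short burst of clicks (< 2 s total) looks like a
    coda; a long train of near-constant ~0.4 s ICIs looks like
    echolocation."""
    t = np.asarray(sorted(click_times), dtype=float)
    duration = t[-1] - t[0]
    icis = np.diff(t)
    if duration < coda_max_duration:
        return "coda"
    if np.allclose(icis, echolocation_ici, atol=0.1):
        return "echolocation"
    return "unknown"

# Echolocation train: regular 0.4 s spacing sustained over many seconds.
train = [i * 0.4 for i in range(20)]
# A five-click coda packed into well under 2 s.
coda = [0.0, 0.25, 0.5, 0.62, 0.74]
print(label_sequence(train))  # echolocation
print(label_sequence(coda))   # coda
```

A real classifier would of course work from detected clicks in noisy audio and learned rhythm templates rather than hand-set thresholds.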
Figure 3. Comparative size of datasets used for training NLP models (represented by circle area)
GPT-3 is only partially visible, while the DSWP dataset is a tiny dot on this plot (located at the center of the dashed circle). Shown in red is the estimated size of a new dataset planned to be collected in Dominica by Project CETI, an interdisciplinary initiative for cetacean communication interpretation. The estimate assumes nearly continuous monitoring of 50–400 whales, with 75%–80% of their vocalizations being echolocation clicks and 20%–25% coda clicks. A typical Caribbean whale coda has five clicks and lasts 4 s (including the silence before the next coda), yielding a rate of 1.25 clicks/s. Overall, we estimate it would be possible to collect between 400M and 4B clicks per year as a longitudinal, continuous recording of bioacoustic signals alongside detailed behavioral and environmental data.
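The 400M–4B figure is an order-of-magnitude estimate that can be checked with back-of-envelope arithmetic. In the sketch below, the 50–400 whale population and the 1.25 clicks/s coda rate come from the caption; the vocally active hours per day and the 2.5 clicks/s echolocation rate (one click per ~0.4 s ICI) are illustrative assumptions made here, not figures from the paper.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365

def clicks_per_year(n_whales, clicks_per_sec, vocal_hours_per_day):
    """Total clicks recorded per year under continuous monitoring."""
    return (n_whales * clicks_per_sec
            * vocal_hours_per_day * SECONDS_PER_HOUR * DAYS_PER_YEAR)

# Assumed: ~6 vocally active hours per whale per day; rates span the
# 1.25 clicks/s coda rate up to ~2.5 clicks/s for echolocation trains.
low = clicks_per_year(50, 1.25, 6)    # ~4.9e8
high = clicks_per_year(400, 2.5, 6)   # ~7.9e9
print(f"{low:.2e} to {high:.2e} clicks/year")
```

Under these assumed parameters the range lands in the same order of magnitude as the paper's 400M–4B estimate.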
Figure 4. Schematic of whale bioacoustic data collection from multiple data sources by several classes of assets
These include tethered buoy arrays (b), which track the whales over a large area in real time by continuously transmitting their data to shore (g), floaters (e), and robotic fishes (d). Tags (c) attached to whales can potentially provide the most detailed bioacoustic and behavioral data. Aerial drones (a) can assist with tag deployment (a1) and recovery (a2) and provide visual observation of the whales (a3). The collected multimodal data (1) must be processed to reconstruct a social network of sperm whales. The raw acoustic data (2) must be analyzed by ML algorithms to detect (3) and classify (4) clicks. Source separation and identification (5) algorithms would allow multi-party conversations to be reconstructed by attributing clicks to the whales producing them. Illustration © 2021 Alex Boersma.
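As a minimal sketch of the click-detection step (3) above, a simple energy-threshold detector on a waveform might look as follows. This is a toy illustration under assumed parameters (a 20 dB threshold over the median noise floor, a 10 ms merge gap), not Project CETI's actual detection pipeline.

```python
import numpy as np

def detect_clicks(signal, fs, threshold_db=20.0, min_gap_s=0.01):
    """Toy click detector: flag samples whose magnitude exceeds the
    median noise floor by `threshold_db`, then group detections
    separated by less than `min_gap_s` into single clicks."""
    env = np.abs(signal)
    noise_floor = np.median(env) + 1e-12
    mask = 20 * np.log10(env / noise_floor + 1e-12) > threshold_db
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        return []
    # Split detected samples into clicks wherever the gap is too large.
    breaks = np.flatnonzero(np.diff(idx) > int(min_gap_s * fs))
    groups = np.split(idx, breaks + 1)
    return [g[0] / fs for g in groups]  # onset time (s) of each click

# Synthetic example: low-level noise with two brief impulses.
fs = 48_000
rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(fs)          # 1 s of background noise
for t in (0.2, 0.7):
    x[int(t * fs):int(t * fs) + 48] += 1.0  # ~1 ms impulsive "clicks"
onsets = detect_clicks(x, fs)
print([round(t, 2) for t in onsets])  # [0.2, 0.7]
```

Production detectors for sperm whale clicks typically add band-pass filtering and matched-filter or learned models; the threshold-and-merge structure shown here is only the skeleton of the idea.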