| Literature DB >> 35336352 |
Lucas C F Domingos, Paulo E Santos, Phillip S M Skelton, Russell S A Brinkworth, Karl Sammut.
Abstract
This paper presents a comprehensive overview of current deep-learning methods for automatic object classification of underwater sonar data for shoreline surveillance, concentrating mostly on the classification of vessels from passive sonar data and the identification of objects of interest from active sonar (such as minelike objects, human figures or debris of wrecked ships). The contribution of this work is not only to provide a systematic description of the state of the art of this field, but also to identify five main ingredients in its current development: deep-learning methods using convolutional layers alone; deep-learning methods that apply biologically inspired feature-extraction filters as a preprocessing step; classification of data from frequency and time–frequency analysis; methods using machine learning to extract features from the original signals; and transfer-learning methods. This paper also describes some of the most important datasets cited in the literature and discusses data-augmentation techniques, which are used to cope with the scarcity of annotated sonar datasets from real maritime missions.
Keywords: deep convolutional neural networks; objects’ classification; underwater acoustics
Year: 2022 PMID: 35336352 PMCID: PMC8954367 DOI: 10.3390/s22062181
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
An overview of methods for classification using convolutional filters alone. In the ‘Metric’ column, ACCY stands for accuracy.
| Reference | Network | Dataset | Metric | Main Contributions |
|---|---|---|---|---|
| Yang et al., 2019 | Auditory | Ocean Network Canada signals a | 81.96% | The use of a bank of multiscale deep convolutional filters as a first processing stage, enabling an end-to-end NN. |
| Hu et al., 2018 | CNN + ELM | Civil Ship b | 93.04% | The substitution of the final fully connected layer of a CNN with an ELM classifier, improving the generalisation of the model. |
| Doan et al., 2020 | UATC-DenseNet | Real-world passive sonar data b | 98.85% | An analysis of different numbers of convolutional blocks and layers, layer configurations, and input features. |
| Tian et al., 2021 | MSRDN | Ocean Network Canada signals a | 83.15% | The development of a deep residual network using soft-thresholding. |
a Dataset uses public data with nonpublic preprocessing techniques. b Dataset is proprietary and unavailable for reproduction.
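The common idea behind the methods in the table above is to apply banks of convolutional filters, at several scales, directly to the raw waveform rather than to hand-crafted features. The following is an illustrative sketch of that front end in plain NumPy, using random, untrained kernels and a synthetic tone-in-noise signal; it is a toy under those assumptions, not a reimplementation of any surveyed network:

```python
import numpy as np

def multiscale_features(signal, kernel_sizes=(8, 16, 32), n_kernels=4, seed=0):
    """Convolve the raw signal with random kernels at several scales and
    max-pool each response to a single value. In a trained network the
    kernels would be learned; here they only illustrate the pipeline."""
    rng = np.random.default_rng(seed)
    feats = []
    for size in kernel_sizes:
        for _ in range(n_kernels):
            kernel = rng.standard_normal(size) / np.sqrt(size)
            response = np.convolve(signal, kernel, mode="valid")
            feats.append(float(np.max(np.abs(response))))  # global max pooling
    return np.array(feats)

# A synthetic 1 s "hydrophone" snippet at 8 kHz: a 440 Hz tone in noise.
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
x = np.sin(2 * np.pi * 440.0 * t)
x += 0.1 * np.random.default_rng(1).standard_normal(8000)
features = multiscale_features(x)
print(features.shape)  # (12,): 3 scales x 4 kernels each
```

A trained classifier would replace the random kernels with learned ones and the pooling with further convolutional and fully connected layers.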
An overview of ML methods for classification using biologically inspired filtering algorithms. In the ‘Metric’ column, ACCY stands for accuracy.
| Reference | Preprocessing | Network | Dataset | Metric | Main Contributions |
|---|---|---|---|---|---|
| Le et al., 2020 | Gabor filter banks | Deep Gabor Neural Network | Australian DSTG data b | 79.93% | The use of Gabor filters to extract textures and outlines from the sonar images. |
| Shen et al., 2018–2020 | Cochlea model | Auditory Inspired CNN | Ocean Network Canada signals a | 87.2% | The use of Gabor filter layers inspired by the cochlea model. |
| Wang et al., 2018 | Mel-filter bank | Inception, Xception, VGG and DenseNet | Whale FM | 84.40% | A comparison of the accuracy of traditional methods and CNNs in the same scenarios. |
| Khishe and Mohammadi, 2019 | MFCC | Fully Connected | Sonar dataset b | 97.12% | A comparison of the accuracy of metaheuristic algorithms and the use of a fully connected NN. |
| Wang et al., 2019 | GFCC and MFCC | Fully Connected | Six-class dataset b | 94.3% | A combination of MFCC and GFCC used for feature extraction, showing improved runtime performance. |
a Dataset uses public data with nonpublic preprocessing techniques. b Dataset is proprietary and unavailable for reproduction.
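Most of these biologically inspired front ends (mel-filter banks, MFCC, GFCC) share the same construction: triangular filters spaced on a perceptual frequency scale, applied to the magnitude spectrum. A minimal NumPy sketch of a mel filter bank, with hypothetical parameter values and not tied to any specific paper above, is:

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Triangular filters, equally spaced on the mel scale, over FFT bins."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    hz_points = mel_to_hz(np.linspace(low, high, n_filters + 2))
    bins = np.floor((n_fft + 1) * hz_points / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, centre):       # rising edge of the triangle
            bank[m - 1, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):      # falling edge of the triangle
            bank[m - 1, k] = (right - k) / max(right - centre, 1)
    return bank

bank = mel_filter_bank(n_filters=26, n_fft=512, sample_rate=16000)
print(bank.shape)  # (26, 257): one triangle per row over the rFFT bins
```

MFCCs then follow by taking the DCT of the log filter-bank energies; GFCC pipelines replace the mel triangles with gammatone filter responses.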
An overview of ML methods for classification from frequency and time–frequency analysis. In the ‘Metric’ column, ACCY stands for accuracy, AP stands for Average Precision, and mAP stands for Mean Average Precision.
| Reference | Preprocessing | Network | Dataset | Metric | Main Contributions |
|---|---|---|---|---|---|
| Ferguson et al., 2017 | Cepstrum, Cepstrogram | CNN | Recorded boat radiated noise a | 99.78% | The cepstral representation of inputs for a NN and improved distance estimation from noisy acoustic sources. |
| Choi et al., 2019 | pCSDM, mCSDM | CNN, RF, SVM and FNN | Simulated acoustic data a | >95% ACCY | A comparison between CNN and FNN. |
| Miao et al., 2021 | ACT | TFFNet | Whale FM | 92.1% mAP | The combination of ACT with the EFP to generate a high-resolution time–frequency representation. |
| Kim et al., 2021 | DWT | CNN | Underwater acoustic signals a | 100% ACCY | The use of the wavelet transform for noise robustness and of data augmentation as a data source; the results converged with an 8-fold reduction in the number of epochs. |
| Cinelli et al., 2018 | Spectrogram, delta, delta–delta frequencies | Fully Connected and CNN | Brazilian Marine dataset a | 88.1% AP | A comparative study of the accuracy of different ML architectures for various input layers. |
| Vahidpour et al., 2015 | Image Histograms | Fully Connected | Five-class acoustic dataset a | 95.13% ACCY b | The generation of a short-time-Fourier-transform-based binary image as input, improving noise robustness. |
| Bach et al., 2021 | Signal demodulation | CNN | ShipsEar | 90% ACCY | An algorithm for detecting the fundamental frequencies of a signal from its amplitude variation, improving the performance of a CNN. |
a Dataset is proprietary and unavailable for reproduction. b Results obtained at 10 dB of SNR.
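The spectrogram, cepstrum and cepstrogram representations in this table are all short-time Fourier constructions. As a point of reference, a bare-bones magnitude spectrogram and real cepstrum can be computed in a few lines of NumPy; this is an illustrative sketch with assumed frame and hop sizes, not any paper's preprocessing chain:

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude STFT: Hann-windowed frames, then |rFFT| per frame."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def cepstrum(frame):
    """Real cepstrum of one frame: inverse FFT of the log magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(frame)) + 1e-12
    return np.fft.irfft(np.log(spectrum))

# Synthetic 1 s signal at 8 kHz: a 500 Hz tone in light noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 8000, endpoint=False)
x = np.sin(2 * np.pi * 500.0 * t) + 0.05 * rng.standard_normal(8000)
S = spectrogram(x)          # shape (61, 129): frames x frequency bins
c = cepstrum(x[:256])       # one 256-sample cepstral frame
print(S.shape)
```

The 500 Hz tone shows up as a peak at bin 16 (500 Hz divided by the 31.25 Hz bin width); periodicities such as propeller shaft rates appear as peaks in the cepstrum instead.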
An overview of methods for classification using machine learning for feature extraction. In the ‘Metric’ column, ACCY stands for accuracy.
| Reference | Preprocessing | Network | Dataset | Metric | Main Contributions |
|---|---|---|---|---|---|
| Luo et al., 2021 | RBM-based autoencoder | BP Neural Network | ShipsEar | 92.6% | Claims better feature-extraction performance than conventional feature-extraction methods. |
| Denos et al., 2017 | Autoencoder NN | CNN | Synthetic realistic images a | 0.87 | The combination of an AE with a CNN provided an application based on image object-detection models (R-CNN). |
| Kamal et al., 2021 | Convolutional Filters | Deep convolutional LSTMs | Indian Ocean acoustic dataset a | 95.2% | The proposal of an adaptive filter to generate a time–frequency representation based on the short-time Fourier transform. |
a Dataset is proprietary and unavailable for reproduction.
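The auto-encoder approaches above share one mechanism: a bottleneck trained to reconstruct its input, whose hidden activations then serve as learned features for a downstream classifier. A deliberately tiny linear version in NumPy, on toy data and with assumed dimensions (so neither an RBM nor any surveyed model), shows that mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "signals": 200 samples of 16-D data lying near a 3-D subspace,
# standing in for spectra whose structure an auto-encoder can compress.
basis = rng.standard_normal((3, 16))
codes = rng.standard_normal((200, 3))
X = codes @ basis + 0.05 * rng.standard_normal((200, 16))

# One-bottleneck linear auto-encoder: encode to 3-D, decode back to 16-D.
W_enc = 0.1 * rng.standard_normal((16, 3))
W_dec = 0.1 * rng.standard_normal((3, 16))
lr = 0.01
mse_before = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
for _ in range(500):
    H = X @ W_enc            # bottleneck activations = learned features
    err = H @ W_dec - X      # reconstruction error
    # Gradient steps on the mean squared reconstruction error.
    W_dec -= lr * H.T @ err / len(X)
    W_enc -= lr * X.T @ (err @ W_dec.T) / len(X)
mse_after = float(np.mean((X @ W_enc @ W_dec - X) ** 2))
```

After training, `X @ W_enc` is the compressed feature representation that would be fed to a classifier such as the BP network in the table.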
An overview of ML methods using transfer learning. In the ‘Metric’ column, ACCY stands for accuracy, and mAP stands for mean average precision.
| Reference | Based Network | Dataset | Metric | Main Contributions |
|---|---|---|---|---|
| Huo et al., 2020 | VGG-19 | Seabed Objects-KLSG | 97.76% | A fine-tuning method using a semi-synthetic dataset. |
| Fuchs et al., 2018 | ResNet50, CNN-SVM | ARACAT dataset | 90% | Results show that ImageNet can be used effectively by a CNN to extract general features that then provide better accuracy when training a sonar classifier on small datasets. |
| Valdenegro-Toro et al., 2021 | ResNet 20, MobileNets, DenseNet121, SqueezeNet, MiniXception, and Autoencoder | Marine Debris Turntable and Watertank, and Gemini 720i Panel-Pipe a | 96.31% | Reported results that reinforce the idea of achieving better accuracy with less data when using transfer learning. |
| Nguyen et al., 2019 | AlexNet and GoogLeNet | CKI, TDI-2017 and TDI-2018 a | 91.6% | The reuse of CNN architectures applied to image-classification tasks. |
| Valdenegro-Toro, 2017 | LeNet and SqueezeNet | Ocean Systems Lab water tank dataset a | 98.7% | Showed that transfer learning achieves acceptable accuracy even when the source and target sets have no classes in common. |
| Ge et al., 2021 | VGG-19 | Seabed Objects-KLSG and proprietary data a | 97.32% | Achieves a high accuracy even with a small sample size obtained from a synthetically generated dataset. |
| Yu et al., 2021 | Transformer and YOLOv5s | Two side-scan sonar datasets constructed from images of shipwrecks and submerged containers | 85.6% | The proposal of a side-scan sonar automatic target-recognition method, including preprocessing, sampling, target recognition and target localisation, using the SOTA YOLOv5s model and an attention mechanism. |
a Dataset is proprietary and unavailable for reproduction.
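All of these transfer-learning pipelines reduce to the same recipe: freeze (or lightly fine-tune) a backbone pretrained on a large source domain, and train only a small head on the scarce sonar data. A schematic NumPy stand-in makes the division of labour explicit; the random frozen "backbone", the synthetic two-class data, and all dimensions are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pretrained backbone: weights that stay frozen, as they
# would after ImageNet-style pretraining (here just a random projection).
W_backbone = rng.standard_normal((64, 16))

def extract_features(inputs):
    """Frozen feature extractor: linear projection plus ReLU."""
    return np.maximum(inputs @ W_backbone, 0.0)

# A small labelled target set: two synthetic 64-D classes, 40 samples each.
n = 40
X = np.vstack([rng.standard_normal((n, 64)),
               rng.standard_normal((n, 64)) + 2.0])
y = np.array([0] * n + [1] * n)

# Train only a logistic-regression head on standardised frozen features.
F = extract_features(X)
F = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)
w, b = np.zeros(F.shape[1]), 0.0
for _ in range(300):
    p = 1.0 / (1.0 + np.exp(-(F @ w + b)))   # sigmoid
    grad = p - y
    w -= 0.1 * F.T @ grad / len(y)
    b -= 0.1 * float(grad.mean())
acc = float(np.mean(((F @ w + b) > 0.0) == y))
```

Only `w` and `b` are updated; the backbone never sees a gradient, which is why so few labelled sonar samples suffice.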
A summary of some of the available datasets.
| Dataset | Description |
|---|---|
| Ocean Network Canada (ONC) | A variety of (nonannotated) datasets containing continuously monitored data for relevant ocean variables on the east, west, and Arctic coasts of Canada are collected, maintained, and distributed by Ocean Networks Canada via their Oceans 3.0 data portal at |
| DeepShip: An Underwater Acoustic Benchmark Dataset | DeepShip is a benchmark dataset (constructed with data from ONC) for underwater ship classification, consisting of 47 h and 4 min of recordings of 265 different ships belonging to four different classes (background sound was not available). This dataset is available for download at |
| ShipsEar: An underwater vessel noise database | ShipsEar is a database of underwater recordings of ship and boat sounds, containing 90 recordings of 11 different vessel types. It also provides useful information about the recordings, such as channel depth, wind, distance and location, to name a few. This dataset is available for download at |
| Five-element acoustic dataset | The main purpose of this dataset is to facilitate research on Doppler-correction techniques for underwater acoustic transmissions. The dataset is composed of 360 communication packets with a duration of 0.5 s generated by a transceiver and captured by five hydrophones at nine different positions, and is available for download at |
| Fish classification with Dual-Frequency Identification Sonar (DIDSON) | Fishery acoustic observation data were collected using Dual-Frequency Identification Sonar (DIDSON) with the purpose of classifying fish species. From 100 h of data, 524 clips were extracted, with eight species labelled. The dataset is available for download at |
| Passive sonar spectrogram images derived from time series | The main purpose of this dataset is to facilitate solutions for the problem of detecting tracks in a spectrogram. It contains 4142 spectrograms generated from synthetic as well as real small-boat data. The dataset is available for download at |
| Marine Debris Turntable (MDT) dataset | This dataset was obtained from a forward-looking sonar (ARIS Explorer 300) placed in a water tank, in which a rotating turntable was used to allow various poses for the observed objects. The MDT dataset contains 2471 images with 12 classes of object, including bottle, pipe, platform and propeller, and it is available from |
An overview of the data-augmentation strategies.
| Reference | Year | Data-Augmentation Method |
|---|---|---|
| Berg and Hjelmervik | 2018 | Real target detections were copied multiple times and only a fraction of the false alarms was kept. |
| Le et al. | 2020 | MLO image snippets overlaid on seabed background images. |
| Huo et al. | 2020 | MLO image snippets overlaid on seabed background images with simulated shadows. |
| Denos et al. | 2017 | Photo-realistic pictures generated from 3D mine-object models combined with synthetically generated seabed backgrounds. |
| Choi et al. | 2019 | Simulated data from a normal-mode propagation model, generated by Monte-Carlo simulation considering a vertical line array (VLA). |
| Luo et al. | 2021 | Reconstruction of audio signals using the output of an RBM auto-encoder. |
| Kim et al. | 2017 | Denoising of sonar images using the structures generated by an auto-encoder. |
| Kim et al. | 2021 | Acoustic signals convolved with impulse-response signals, with white Gaussian noise added to generate extended audio signals. |
| Phung et al. | 2019 | GAN-generated sonar images of MLOs. |
| Sung et al. | 2019 | A ray-tracing method used to generate sonar images, followed by a GAN to translate them into more realistic images. |
| Karjalainen et al. | 2019 | Synthetic contact sonar images created using ray tracing, then refined using CycleGAN. |
| Rixon Fuchs et al. | 2019 | Used a GAN on simulated data to generate high-resolution sonar data (long arrays) from low-resolution counterparts (short arrays). |
| Jegorova et al. | 2020 | Relied on the pix2pix architecture to create the Markov Conditional pix2pix GAN architecture, generating realistic-looking SSS images. |
| Ge et al. | 2021 | SSS data generated from optical images using neural style transfer. |
| Nguyen et al. | 2019 | Added background noise (salt and pepper) and polarised noise to sonar images. |
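The simplest of the strategies above, overlaying an object snippet on a seabed background and adding noise, can be illustrated directly. The following NumPy sketch uses synthetic images and assumed intensity ranges in [0, 1]; it is a schematic of the overlay-plus-noise idea, not any cited pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

def overlay_snippet(background, snippet, top, left):
    """Paste an object snippet onto a seabed background at (top, left),
    keeping the brighter (stronger-return) pixel in the overlap."""
    out = background.copy()
    h, w = snippet.shape
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = np.maximum(region, snippet)
    return out

def add_noise(image, sigma=0.05):
    """Additive white Gaussian noise, clipped to the valid intensity range."""
    return np.clip(image + rng.normal(0.0, sigma, image.shape), 0.0, 1.0)

background = rng.uniform(0.1, 0.4, (64, 64))   # synthetic seabed texture
snippet = np.full((8, 8), 0.9)                 # bright mine-like object
augmented = add_noise(overlay_snippet(background, snippet, top=20, left=30))
print(augmented.shape)  # (64, 64)
```

Each new (background, position, noise draw) combination yields another labelled training image, which is how such schemes multiply a handful of real MLO snippets into a usable training set.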