Allan G de Oliveira1,2, Thiago M Ventura1,2, Todor D Ganchev1,3, Lucas N S Silva1,2, Marinêz I Marques1,4,5,6, Karl-L Schuchmann1,4,6,7.
Abstract
Automated acoustic recognition of birds is considered an important technology in support of biodiversity monitoring and conservation activities, which require processing large amounts of soundscape recordings. Typically, the recordings are transformed into a set of acoustic features, and a machine learning method is used to build models and recognize the sound events of interest. The main obstacle is the scalability of data processing, whether for developing models or for processing recordings collected over long time periods; in such cases, the processing time and resources required may become prohibitive for the average user. To address this problem, we evaluated the applicability of three data reduction methods: random sampling (RS), uniform sampling (US), and piecewise aggregate approximation (PAA). These methods were applied to a series of acoustic feature vectors as an additional postprocessing step, which aims to reduce the computational demand during training. The experimental results obtained using Mel-frequency cepstral coefficients (MFCCs) and hidden Markov models (HMMs) support the finding that a reduction in training data by a factor of 10 does not significantly affect the recognition performance. ©2020 de Oliveira et al.
Keywords: Data reduction; Data representation; Piecewise aggregate approximation; Random sampling; Uniform sampling
Year: 2020 PMID: 32025373 PMCID: PMC6991130 DOI: 10.7717/peerj.8407
Source DB: PubMed Journal: PeerJ ISSN: 2167-8359 Impact factor: 2.984
Figure 1. A four-step workflow summarizing the automated bird recognition process.
Figure 2. The proposed five-step workflow for speeding up the training of automated bird recognition models.
Total processing time required for obtaining different subsets of data that comprised 5%, 10%, 20%, or 40% of the initial dataset.
| Method | 5% | 10% | 20% | 40% |
|---|---|---|---|---|
| RS | 2 min 14 s | 2 min 16 s | 2 min 57 s | 4 min 08 s |
| US | 1 min 53 s | 2 min 20 s | 2 min 42 s | 3 min 56 s |
| PAA | 2 min 18 s | 2 min 30 s | 3 min 11 s | 4 min 50 s |
Figure 3. Overall exponential trend and total processing time for selecting different subsets of audio feature vectors (5–40%) with the different data reduction methods.
Total times for training the HMMs with different subsets of audio features.
| Method | 5% | 10% | 20% | 40% |
|---|---|---|---|---|
| RS | 1 h 25 min | 5 h 58 min | 33 h 10 min | 164 h 04 min |
| US | 1 h 36 min | 7 h 09 min | 37 h 28 min | 290 h 10 min |
| PAA | 1 h 43 min | 7 h 37 min | 41 h 47 min | 303 h 10 min |
Performance in terms of classification precision for different subsets of audio feature vectors, compared against the baseline trained on the full dataset.
| Method | 5% | 10% | 20% | 40% | Baseline |
|---|---|---|---|---|---|
| RS | 56.2% | 58.4% | 53.1% | 47.8% | 64.1% |
| US | 65.6% | 65.8% | 65.3% | 64.6% | 64.1% |
| PAA | 0.0% | 0.0% | 63.3% | 64.7% | 64.1% |
Performance in terms of the classification accuracy for different subsets of audio feature vectors.
| Method | 5% | 10% | 20% | 40% | Baseline |
|---|---|---|---|---|---|
| RS | 96.3% | 96.7% | 96.2% | 95.4% | 97.5% |
| US | 97.6% | 97.7% | 97.6% | 97.6% | 97.5% |
| PAA | 95.8% | 95.8% | 97.2% | 97.6% | 97.5% |
Figure 4. Overall exponential trend and total times for training the HMMs with different subsets of audio features.
Percentage reduction of the overall training times after applying the indicated data reduction methods, compared to the time needed to train a model with the initial dataset.
| Method | 5% | 10% | 20% | 40% |
|---|---|---|---|---|
| RS | 99.9% | 99.7% | 98.1% | 90.7% |
| US | 99.9% | 99.6% | 97.9% | 83.6% |
| PAA | 99.9% | 99.6% | 97.6% | 82.8% |
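The three data reduction methods compared above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the acoustic feature vectors (e.g., MFCC frames) are stored as rows of a NumPy array, and the function names are ours.

```python
import numpy as np

def random_sampling(X, frac, seed=0):
    """RS: keep a random fraction of the feature vectors (rows of X)."""
    rng = np.random.default_rng(seed)
    n_keep = max(1, int(len(X) * frac))
    idx = rng.choice(len(X), size=n_keep, replace=False)
    return X[np.sort(idx)]  # preserve temporal order of the kept frames

def uniform_sampling(X, frac):
    """US: keep every k-th feature vector, with k = round(1/frac)."""
    step = max(1, round(1 / frac))
    return X[::step]

def paa(X, frac):
    """PAA: average contiguous windows of feature vectors, shrinking
    the sequence to a fraction of its original length."""
    n_out = max(1, int(len(X) * frac))
    segments = np.array_split(np.arange(len(X)), n_out)
    return np.vstack([X[s].mean(axis=0) for s in segments])

# Example: 1,000 frames of 13 MFCCs reduced to 10% of the original size.
X = np.random.rand(1000, 13)
print(random_sampling(X, 0.10).shape)   # (100, 13)
print(uniform_sampling(X, 0.10).shape)  # (100, 13)
print(paa(X, 0.10).shape)               # (100, 13)
```

RS and US simply discard frames, while PAA summarizes each window by its mean, which is why PAA incurs slightly more processing time in the selection-time table above.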