Etto L Salomons, Paul J M Havinga.
Abstract
Wireless sensor networks are suitable for gaining context awareness in indoor environments. As sound waves form a rich source of context information, equipping the nodes with microphones can be of great benefit. The algorithms to extract features from sound waves are often highly computationally intensive. This can be problematic, as wireless nodes are usually restricted in resources. To make a proper decision about which features to use, we survey how sound is used in the literature for global sound classification, age and gender classification, emotion recognition, person verification and identification, and indoor and outdoor environmental sound classification. The results of the surveyed algorithms are compared with respect to accuracy and computational load. The accuracies are taken from the surveyed papers; the computational loads are determined by benchmarking the algorithms on an actual sensor node. We conclude that for indoor context awareness, the low-cost algorithms for feature extraction perform as well as the more computationally intensive variants. As the feature extraction still requires a large amount of processing time, we present four possible strategies to deal with this problem.
Year: 2015 PMID: 25822142 PMCID: PMC4431233 DOI: 10.3390/s150407462
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Classifications of sound events.
Figure 2. Linear predictive cepstral coefficient (LPCC) calculation.
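Figure 2 outlines how LPCCs are obtained. As a rough sketch of one common way to compute them, assuming the autocorrelation method with Levinson-Durbin recursion, a Hamming window, and prediction order 12 (none of which are taken from the paper), the following pure-Python code derives LPC coefficients for one frame and converts them to cepstral coefficients with the standard recursion. All function names are hypothetical.

```python
import math


def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[0..max_lag] of one frame."""
    n = len(frame)
    return [sum(frame[t] * frame[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """LPC coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n - k]."""
    if r[0] <= 0.0:                      # silent frame: nothing to predict
        return [0.0] * order
    a = [0.0] * (order + 1)              # a[0] is unused
    err = r[0]                           # prediction error energy
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                    # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
        if err <= 0.0:                   # numerically perfect prediction
            break
    return a[1:]


def lpc_to_cepstrum(a, n_ceps):
    """Standard recursion from predictor coefficients to cepstrum c[1..n_ceps]."""
    p = len(a)
    c = [0.0] * (n_ceps + 1)             # c[0] (gain term) is not computed
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k - 1]
        c[n] = acc
    return c[1:]


def lpcc(frame, order=12, n_ceps=12):
    """ASSUMPTION: Hamming window, order 12, 12 cepstral coefficients."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    r = autocorrelation(windowed, order)
    return lpc_to_cepstrum(levinson_durbin(r, order), n_ceps)
```

For a first-order predictor with coefficient 0.9 the recursion reproduces the closed form c_n = 0.9^n / n, which is a quick sanity check on the indexing.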
Figure 3. Mel (melody)-spaced filter bank.
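The mel scale compresses high frequencies, so the filters get wider as frequency rises. A minimal sketch of such a filter bank, assuming the widely used mapping m = 2595 log10(1 + f/700) and triangular filters whose edges sit on the neighbouring centre frequencies; the filter count, FFT size, and function names are illustrative assumptions, not the paper's settings.

```python
import math


def hz_to_mel(f):
    # ASSUMPTION: common O'Shaughnessy-style mel mapping
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Return n_filters triangular filters, each a list of n_fft // 2 + 1 weights."""
    if f_max is None:
        f_max = sample_rate / 2.0
    n_bins = n_fft // 2 + 1
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 points equally spaced on the mel scale: outer edges plus centres
    step = (m_hi - m_lo) / (n_filters + 1)
    hz_points = [mel_to_hz(m_lo + i * step) for i in range(n_filters + 2)]
    bin_freqs = [k * sample_rate / n_fft for k in range(n_bins)]
    bank = []
    for i in range(1, n_filters + 1):
        left, centre, right = hz_points[i - 1], hz_points[i], hz_points[i + 1]
        weights = []
        for f in bin_freqs:
            if left <= f <= centre:          # rising slope
                weights.append((f - left) / (centre - left))
            elif centre < f <= right:        # falling slope
                weights.append((right - f) / (right - centre))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank
```

With 26 filters on a 512-point FFT at 16 kHz, the lowest filter covers only a handful of FFT bins while the highest covers dozens, which is the mel spacing shown in the figure.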
Figure 4. Mel frequency cepstrum coefficient (MFCC) calculation.
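An MFCC chain such as the one in Figure 4 typically runs: windowing, power spectrum, filter-bank energies, logarithm, and a discrete cosine transform. The sketch below assumes a Hamming window, an unnormalised DCT-II, 13 output coefficients, and a filter bank passed in as a list of per-bin weight vectors (for example mel-spaced triangular filters as in Figure 3). It uses a naive O(N²) DFT purely for readability; a sensor-node implementation would use an FFT. Function names are hypothetical.

```python
import math


def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2 via a naive O(N^2) DFT (illustration only)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(re * re + im * im)
    return spec


def dct_ii(x, n_out):
    """Unnormalised DCT-II: X[k] = sum_n x[n] * cos(pi / N * (n + 0.5) * k)."""
    n_in = len(x)
    return [sum(v * math.cos(math.pi / n_in * (i + 0.5) * k) for i, v in enumerate(x))
            for k in range(n_out)]


def mfcc(frame, filter_bank, n_ceps=13, floor=1e-10):
    """Window -> power spectrum -> filter-bank energies -> log -> DCT-II.

    filter_bank: list of weight vectors, each of length len(frame) // 2 + 1.
    """
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = power_spectrum(windowed)
    energies = [sum(w * p for w, p in zip(weights, spec)) for weights in filter_bank]
    log_energies = [math.log(max(e, floor)) for e in energies]  # floor avoids log(0)
    return dct_ii(log_energies, n_ceps)
```

Because the logarithm turns a gain change into an additive constant and the DCT maps a constant onto coefficient 0 only, scaling the input signal changes just the first coefficient; this is one reason that coefficient is often dropped or treated separately.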
Figure 5. Relative execution time per feature. ZCR, zero crossing rate; STE, short-time energy; F0, base frequency.
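Figure 5 names the cheapest features. A minimal sketch of common definitions, assuming ZCR is the fraction of adjacent sample pairs that change sign, STE is the mean squared amplitude, and F0 is estimated with the autocorrelation method over a 60 to 400 Hz search range; the paper's exact definitions may differ, and the function names are hypothetical.

```python
import math


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (x >= 0 counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def short_time_energy(frame):
    """ASSUMPTION: STE as mean squared amplitude (some authors use the plain sum)."""
    return sum(x * x for x in frame) / len(frame)


def base_frequency(frame, sample_rate, f_min=60.0, f_max=400.0):
    """F0 estimate: lag of the autocorrelation peak within the pitch search range."""
    lo = int(sample_rate / f_max)
    hi = min(int(sample_rate / f_min), len(frame) - 1)
    if hi < lo:
        raise ValueError("frame too short for the pitch search range")

    def r(lag):
        return sum(frame[t] * frame[t + lag] for t in range(len(frame) - lag))

    best_lag = max(range(lo, hi + 1), key=r)
    return sample_rate / best_lag


if __name__ == "__main__":
    sr = 8000
    tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(400)]  # 200 Hz, 50 ms
    print(zero_crossing_rate(tone), short_time_energy(tone), base_frequency(tone, sr))
```

ZCR and STE need one pass with a comparison or a multiply per sample, whereas the F0 search evaluates a full autocorrelation sum for every candidate lag, which is consistent with F0 being the costlier time-domain feature.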
Relative execution time (RET).
| Feature group | Abbreviation | RET |
|---|---|---|
| Time domain features | TD | 1 |
| Haar-like features | Haar | 3.5 |
| Frequency domain features | FD | 15 |
| Long-term features | long | 15 |
| LPCC features | LPCC | 30 |
| MFCC features | MFCC | 78 |
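The table expresses each feature group's cost relative to time-domain features (TD = 1). A minimal sketch of how such numbers can be produced and read, assuming RET is the measured execution time divided by the TD time and, as a hypothetical simplification, that the cost of a combined feature set is the sum of its groups' RETs. The mapping of the literature tables' "simple time" and "frequency" onto TD and FD is likewise an assumption; the function names are hypothetical.

```python
# RET values copied from the table above (time domain features = 1)
RET = {"TD": 1.0, "Haar": 3.5, "FD": 15.0, "long": 15.0, "LPCC": 30.0, "MFCC": 78.0}


def relative_execution_time(times, reference="TD"):
    """Normalise measured execution times to a reference feature group."""
    ref = times[reference]
    return {name: t / ref for name, t in times.items()}


def speedup(cheap, expensive, ret=RET):
    """How many times more expensive one feature group is than another."""
    return ret[expensive] / ret[cheap]


def combined_ret(groups, ret=RET):
    # ASSUMPTION: extracting several feature groups costs the sum of their RETs
    return sum(ret[g] for g in groups)
```

Read this way, MFCC extraction costs about 22 times as much as Haar-like features (78 / 3.5), and a hypothetical "simple time, frequency, LPCC" feature set would come to 1 + 15 + 30 = 46.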
Figure 6. Relative execution time for global sound recognition.
Figure 7. Relative execution time for gender recognition.
Figure 8. Relative execution time for age recognition.
Figure 9. Relative execution time for person recognition.
Figure 10. Relative execution time for emotion recognition.
Figure 11. Relative execution time for indoor environment recognition.
Figure 12. Relative execution time for outdoor environment recognition.
Figure 13. Overview of feature comparisons, grouped by category.
Figure 14. Overview of feature comparisons, grouped by feature.
| Article | Features | Experiment, accuracy | Training method |
|---|---|---|---|
| Lu | simple time, frequency, LPCC | speech: 97.00%, music: 93.00%, environment: 84.00% | KNN/LSP-VQ |
| Nishimura and Kuroda (2008) [ | Haar-like | speech/non-speech: 96.93% | LBG-VQ |
| Article | Features | Experiment, accuracy | Training method |
|---|---|---|---|
| Nishimura (2012) [ | Haar-like | gender: 97.50% | LBG |
| Ting | frequency, MFCC | gender: 96.70% | GMM |
| Zourmand | frequency | gender: 97.50% | NN |
| Pronobis and Magimai-Doss (2009) [ | frequency | gender: 100.00% | SVM |
| Pronobis and Magimai-Doss (2009) [ | MFCC | gender: 97.90% | SVM |
| Pronobis and Magimai-Doss (2009) [ | LPCC | gender: 97.20% | SVM |
| Kim | MFCC | age: 94.60% | GMM |
| Kim | MFCC | gender: 94.90% | GMM |
| Kim | MFCC | age + gender average: 88.90% | GMM |
| Chen | simple time, frequency, long-term | age + gender average: 51.40% | NN |
| Chen | simple time, frequency, long-term | male: 91.40%, female: 81.20% | NN |
| van Heerden | MFCC, long-term | age + gender average: 50.70% | SVM |
| Sadeghi Naini and Homayounpour (2006) [ | MFCC, long-term | gender: 86.50% | NN |
| Sadeghi Naini and Homayounpour (2006) [ | MFCC, long-term | 2 age classes: 72.00%, 3 age classes: 60.70% | NN |
| Li | simple time, frequency | 4 age groups: 52.00% | GMM+SVM |
| Li | simple time, frequency | gender: 88.40% | GMM+SVM |
| Article | Features | Experiment, accuracy | Training method |
|---|---|---|---|
| Nogueiras | simple time, frequency | 7 emotions: 80.00% | HMM |
| Nishimura (2012) [ | Haar-like | 3 emotions: 84.60% | LBG |
| Nwe | frequency | 6 emotions: 78.10% | HMM |
| Busso | frequency | 15 emotions: 77.00% | GMM |
| He | frequency | stress detection: 81.00% | GMM |
| Bou-Ghazale and Hansen (2000) [ | MFCC | stress detection, 4 levels: 83.66% | |
| Neiberg | frequency, MFCC | 3 emotions: 90.00% | GMM |
| Pao | MFCC, LPCC | 6 emotions: 79.55% | weighted D-KNN |
| Tosa and Nakatsu (1996) [ | LPCC | 7 emotions: 60.00% | ANN |
| Ooi | simple time, MFCC | 6 emotions: 75.90% | NN |
| Giannoulis and Potamianos (2012) [ | simple time, frequency, MFCC | 6 emotions: 85.18% | SVM based |
| Article | Features | Experiment, accuracy | Training method |
|---|---|---|---|
| Alpert and Allen (2010) [ | simple time | upstairs: 82.87%, downstairs: 87.59% | NN |
| Nishimura (2012) [ | Haar-like | identification, 12 speakers: 93.00% | LBG |
| Kinnunen | frequency | verification, 170 target speakers: 82.60% | |
| Kinnunen | MFCC | verification, 170 target speakers: 92.70% | |
| Hasan | MFCC | identification, 24 speakers: 100.00% | VQ |
| Tiwari (2010) [ | MFCC | verification, 5 speakers: 85.00% | VQ |
| Reynolds | MFCC | verification, 11 speakers: 90.00% | GMM |
| Murty and Yegnanarayana (2006) [ | MFCC | verification, 149 male speakers: 86.00% | NN |
| Murty and Yegnanarayana (2006) [ | LPCC | verification, 149 male speakers: 78.00% | NN |
| Murty and Yegnanarayana (2006) [ | MFCC, LPCC | verification, 149 male speakers: 89.50% | NN |
| Kim | MFCC | identification, 195 speakers: 95.45% | GMM |
| Article | Features | Experiment, accuracy | Training method |
|---|---|---|---|
| Stäger | simple time, frequency | 5 kitchen sounds: 85.00%, 5 workshop sounds: 67.00% | C4.5 decision tree/3NN |
| Nishimura (2012) [ | Haar-like | 21 sounds: 97.30% | LBG |
| Zhan (2012) [ | Haar-like | 20 sounds: 96.00% | HMM |
| Chen | MFCC | 6 bathroom sounds: 83.50% | HMM |
| Sehili | MFCC | 18 indoor sounds: 75.00% | SVM |
| Guo | long-term | 10 indoor sounds: 92.00% | NN |
| Park | MFCC | 9 events: 91.00% | GMM |
| Rabaoui | simple time, frequency, MFCC | 9 surveillance sounds: 93.00% | HMM |
| Łopatka | simple time, frequency | 5 danger sounds: 97.07% | SVM |
| Peltonen | frequency | 17 environment sounds: 63.40% | 1NN |
| Peltonen | MFCC | 17 environment sounds: 63.40% | GMM |
| Krijnders | frequency | 21 sounds: 42.00% | knowledge network |
| Couvreur | LPCC | 5 noise events: 90.00% | HMM |
| Heittola | MFCC | 10 outdoor contexts: 91.00% | GMM |