Yunbin Kim, Jaewon Sa, Yongwha Chung, Daihee Park, Sungju Lee.
Abstract
The use of IoT (Internet of Things) technology for the management of pet dogs left alone at home is increasing. This includes tasks such as automatic feeding, operation of play equipment, and location detection. Classifying the vocalizations of pet dogs using information from a sound sensor is an important way to analyze the behavior or emotions of dogs that are left alone. The sounds are acquired by attaching an IoT sound sensor to the dog, and the sound events (e.g., barking, growling, howling, and whining) are then classified. However, sound sensors tend to transmit large amounts of data and consume considerable power, which is a problem for resource-constrained IoT sensor devices. In this paper, we propose a way to classify pet dog sound events that improves resource efficiency without significant degradation of accuracy. To achieve this, we acquire only the intensity data of sounds by using a relatively resource-efficient noise sensor. This raises its own difficulty, since intensity data alone lose much of the information in the sound events and make sufficient classification accuracy hard to achieve. To address this problem and avoid significant degradation of classification accuracy, we apply a long short-term memory fully convolutional network (LSTM-FCN), a deep learning method for analyzing time-series data, and exploit bicubic interpolation. Based on experimental results, the proposed method based on noise sensors (i.e., Shapelet and LSTM-FCN for time-series) was found to improve energy efficiency by 10 times without significant degradation of accuracy compared to typical methods based on sound sensors (i.e., mel-frequency cepstrum coefficient (MFCC), spectrogram, and mel-spectrum for feature extraction, and support vector machine (SVM) and k-nearest neighbor (K-NN) for classification).
Keywords: IoT sensor; LSTM-FCN; pet dogs; resource efficiency; separation anxiety; sound events processing
Year: 2018 PMID: 30453674 PMCID: PMC6263678 DOI: 10.3390/s18114019
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Overall structure of the proposed method.
Figure 2. Intensity data acquisition using a noise sensor. The noise sensors collect intensity data, which are transmitted over the wireless network to the IoT analysis platform for processing.
Figure 3. Waveforms of four pet dog sound events: (a) barking; (b) growling; (c) howling; (d) whining.
Information on pet dog sound data file format.
| Pet Dog Sound Events | CM | NC | SR (Hz) | TS | Duration (s) | BPS |
|---|---|---|---|---|---|---|
| Barking | Uncompressed | 1 | 22,050 | 5327 | 0.24 | 16 |
| Growling | Uncompressed | 1 | 22,050 | 11,461 | 0.51 | 16 |
| Howling | Uncompressed | 1 | 22,050 | 32,628 | 1.47 | 16 |
| Whining | Uncompressed | 1 | 22,050 | 6311 | 0.28 | 16 |
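The Duration column appears to follow from the other fields. A minimal sketch of that check, assuming TS is the total number of samples in the example file and SR is the sampling rate in Hz; truncating (not rounding) to two decimals is also an assumption, chosen because it reproduces the listed values:

```python
import math

SAMPLE_RATE_HZ = 22050  # SR column

# TS column, read here as total samples per example file (assumption).
total_samples = {
    "Barking": 5327,
    "Growling": 11461,
    "Howling": 32628,
    "Whining": 6311,
}


def duration_seconds(samples, rate_hz=SAMPLE_RATE_HZ):
    """Duration in seconds, truncated (not rounded) to two decimals."""
    return math.floor(samples / rate_hz * 100) / 100


durations = {name: duration_seconds(ts) for name, ts in total_samples.items()}
print(durations)
```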
Figure 4. Intensity derived from the sound data (left) and intensity level obtained from a noise sensor (right). (a) A barking event has a relatively short duration, and its value decreases rapidly after a certain period; (b) a growling event lasts longer than a barking event and has a jagged profile; (c) a howling event has the longest duration of the four sound events, with high values early in the event that fall off toward the end; (d) a whining event, like a barking event, has a short duration and briefly shows a jagged profile.
RMSE between the intensity from the sound sensor and the intensity level from the noise sensor.
| Sound Sensor (Intensity) | Noise Sensor (Intensity Level): Barking | Noise Sensor (Intensity Level): Growling | Noise Sensor (Intensity Level): Howling | Noise Sensor (Intensity Level): Whining |
|---|---|---|---|---|
| Barking | 4.61 | 14.79 | 8.63 | 13.57 |
| Growling | 10.41 | 4.70 | 8.14 | 8.34 |
| Howling | 8.89 | 8.89 | 3.54 | 7.93 |
| Whining | 9.57 | 8.38 | 7.81 | 3.13 |
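For reference, the root-mean-square error between two equal-length sequences can be computed as in the sketch below. How the two signals were aligned, scaled, or resampled before the comparison is not stated here, so the function assumes the inputs are already the same length, and the sample values are hypothetical rather than data from the paper.

```python
import math


def rmse(a, b):
    """Root-mean-square error between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical toy sequences, not data from the paper.
sound_intensity = [4.0, 3.0, 2.0, 1.0]
noise_level = [4.5, 2.5, 2.0, 1.5]
print(round(rmse(sound_intensity, noise_level), 3))  # 0.433
```

A smaller value on the diagonal than off it, as in the table, indicates that each noise-sensor trace is closest to the sound-sensor intensity of the same event type.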
The minimum, maximum, mean, and median lengths of the intensity data for each sound event.
| Pet Dog Sound Events | Minimum Length | Maximum Length | Mean Length | Median Length |
|---|---|---|---|---|
| Barking | 5 | 47 | 19.24 | 19 |
| Growling | 16 | 405 | 59.59 | 56 |
| Howling | 51 | 646 | 188.60 | 161 |
| Whining | 5 | 198 | 27.97 | 19 |
Differences in voltage values between sound events in the data extracted from the sensor.
| Pet Dog Sound Events | Minimum Voltage | Maximum Voltage | Mean Voltage | Median Voltage |
|---|---|---|---|---|
| Barking | 0.98 | 25.39 | 15.70 | 17.58 |
| Growling | 0.98 | 8.79 | 6.29 | 5.86 |
| Howling | 0.98 | 11.72 | 7.68 | 7.81 |
| Whining | 0.98 | 13.67 | 8.32 | 7.81 |
Figure 5. Sound events of increased length obtained via bicubic interpolation.
Figure 6. LSTM-FCN model for pet dog sound events classification.
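Lengthening a short intensity sequence by interpolation can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it uses the one-dimensional Keys cubic convolution kernel with a = -0.5 (the kernel commonly used per axis in bicubic image resizing), replicates edge samples at the borders, and the function names `cubic_kernel` and `resample_cubic` are hypothetical. The input is the first eight barking values from the example intensity table.

```python
import math


def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (1-D kernel behind bicubic resizing)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0


def resample_cubic(series, factor):
    """Stretch a 1-D series to round(len(series) * factor) samples."""
    n = len(series)
    m = max(2, round(n * factor))
    out = []
    for j in range(m):
        # Map output index j onto the input axis so both endpoints coincide.
        pos = j * (n - 1) / (m - 1)
        base = math.floor(pos)
        value = 0.0
        for k in range(base - 1, base + 3):
            # Clamp neighbour indices at the borders (edge replication).
            idx = min(max(k, 0), n - 1)
            value += series[idx] * cubic_kernel(pos - k)
        out.append(value)
    return out


barking = [4.82, 4.81, 4.79, 4.81, 4.81, 4.34, 4.66, 4.46]
stretched = resample_cubic(barking, 3)
print(len(stretched))  # 24
```

The kernel equals 1 at offset 0 and 0 at other integer offsets, so any output position that falls exactly on an input sample reproduces that sample, and the new points in between are smooth cubic blends of the four nearest samples.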
Examples of intensity data for each pet dog sound event obtained from the noise sensor.
| Pet Dog Sound Events | Time (1/138 s) | Intensity Level | | | | | | | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Barking | 1–16 | 4.82 | 4.81 | 4.79 | 4.81 | 4.81 | 4.34 | 4.66 | 4.46 | 4.47 | 4.19 | 3.48 | 2.84 | 2.23 | 1.68 | 1.59 | 2.38 |
| Barking | 17–32 | 2.84 | 3.62 | 4.34 | 3.92 | 2.97 | 2.31 | 2.28 | 2.54 | 2.82 | 3.09 | 3.38 | 3.50 | 3.26 | 2.85 | 2.63 | 2.84 |
| Barking | 33–48 | 2.23 | 1.68 | 1.59 | 2.38 | 3.62 | 4.34 | 3.92 | 2.97 | 2.31 | 2.28 | 2.54 | 2.82 | 3.09 | 3.38 | 3.50 | 3.26 |
| Barking | 49–64 | 2.85 | 2.63 | 2.84 | 3.23 | 3.4 | 3.07 | 2.52 | 1.94 | 1.91 | 2.16 | 2.22 | 2.55 | 2.85 | 3.00 | 3.11 | 3.33 |
| Growling | 1–16 | 4.45 | 4.28 | 3.83 | 2.99 | 2.49 | 2.63 | 3.11 | 3.54 | 3.80 | 4.01 | 4.04 | 3.73 | 3.26 | 2.93 | 2.94 | 3.11 |
| Growling | 17–32 | 3.17 | 2.96 | 2.64 | 2.37 | 2.14 | 1.96 | 2.03 | 2.58 | 3.37 | 3.91 | 3.85 | 3.52 | 3.39 | 3.74 | 4.28 | 4.55 |
| Growling | 33–48 | 4.29 | 3.76 | 3.22 | 2.66 | 2.09 | 1.81 | 2.01 | 2.49 | 2.94 | 3.25 | 3.53 | 3.66 | 3.48 | 3.15 | 3.06 | 3.49 |
| Growling | 49–64 | 4.15 | 4.50 | 4.25 | 3.70 | 3.14 | 2.57 | 1.99 | 1.65 | 1.68 | 1.94 | 2.25 | 2.63 | 3.05 | 3.19 | 2.77 | 2.03 |
| Howling | 1–16 | 3.61 | 4.08 | 4.28 | 4.15 | 3.95 | 3.52 | 2.64 | 2.18 | 1.95 | 1.88 | 2.01 | 2.47 | 3.13 | 3.58 | 3.56 | 3.32 |
| Howling | 17–32 | 3.19 | 3.31 | 3.54 | 3.75 | 3.88 | 3.99 | 4.03 | 3.92 | 3.75 | 3.65 | 3.77 | 2.55 | 2.15 | 2.62 | 2.57 | 2.08 |
| Howling | 33–48 | 2.48 | 3.30 | 3.89 | 3.97 | 3.81 | 3.06 | 2.48 | 2.20 | 2.54 | 3.18 | 3.54 | 3.27 | 2.73 | 2.37 | 2.38 | 2.57 |
| Howling | 49–64 | 2.80 | 3.11 | 3.45 | 3.50 | 2.93 | 2.08 | 1.55 | 1.71 | 2.19 | 2.50 | 2.35 | 2.03 | 1.82 | 1.85 | 2.00 | 2.11 |
| Whining | 1–16 | 0.01 | 0.33 | 0.76 | 1.05 | 1.04 | 0.89 | 0.71 | 0.48 | 0.19 | 0.76 | 1.41 | 2.20 | 2.62 | 2.27 | 1.56 | 1.11 |
| Whining | 17–32 | 1.69 | 1.98 | 2.16 | 2.96 | 3.16 | 3.31 | 3.41 | 3.46 | 3.47 | 3.35 | 3.02 | 2.57 | 2.20 | 2.01 | 1.85 | 1.65 |
| Whining | 33–48 | 1.22 | 0.70 | 0.31 | 0.10 | 0.01 | 0.15 | 0.39 | 0.52 | 0.40 | 0.18 | 2.20 | 2.54 | 3.18 | 3.54 | 2.73 | 2.37 |
| Whining | 49–64 | 2.08 | 1.94 | 1.81 | 1.63 | 1.44 | 1.38 | 1.57 | 1.90 | 2.09 | 1.97 | 1.72 | 1.58 | 1.94 | 2.21 | 1.95 | 1.68 |
Number of events and intensity data samples for the four pet dog sound events.
| Pet Dog Sound Events | # of Events | # of Intensity Data (Total over All Events) |
|---|---|---|
| Barking | 300 | 5771 |
| Growling | 300 | 17,877 |
| Howling | 300 | 56,579 |
| Whining | 300 | 8390 |
| Total Number of Data | 1200 | 88,617 |
Figure 7. Classification accuracy as the data length is increased through bicubic interpolation. Accuracy improves when the intensity data are lengthened relative to the original data, but it decreases once the interpolated length exceeds three times the original length.
Classification accuracy of each model.
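| Type of Sensor | Type of Data | Feature Extraction Method | Classifier | Accuracy |
|---|---|---|---|---|
| Sound sensor (typical) | Sound data | MFCC | SVM | 0.8545 |
| Sound sensor (typical) | Sound data | MFCC | K-NN | 0.7944 |
| Sound sensor (typical) | Sound data | Spectrogram | SVM | 0.8633 |
| Sound sensor (typical) | Sound data | Spectrogram | K-NN | 0.7834 |
| Sound sensor (typical) | Sound data | Mel-spectrum | SVM | 0.8432 |
| Sound sensor (typical) | Sound data | Mel-spectrum | K-NN | 0.7855 |
| Noise sensor (proposed) | Intensity data | None | Shapelet | 0.6788 |
| Noise sensor (proposed) | Intensity data | None | LSTM-FCN | 0.7396 |
| Noise sensor (proposed) | Intensity data | None | Bicubic + LSTM-FCN | 0.8368 |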
Comparison of data size and power characteristics of the sound sensor, the noise sensor, and the Wi-Fi module.
| Type of Device | Average of Data Size (KB) | Current (mA) | Voltage (V) | Energy (J) |
|---|---|---|---|---|
| Sound sensor (MQ-U300) | 66.4 | 180 | 5.0 | 0.9 |
| Noise sensor (LM-393) | 0.9 | 20 | 5.0 | 0.1 |
| Wi-Fi (ESP8266) | — | 170 | 3.3 | 0.5 |
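The Energy column for the two sensors is consistent with current times voltage sustained for one second; the one-second interval is an assumption, since the table does not state it. A small sketch of that arithmetic (the helper name `power_watts` is hypothetical):

```python
def power_watts(current_ma, voltage_v):
    """Electrical power in watts from current in mA and voltage in V."""
    return current_ma / 1000.0 * voltage_v


# (current in mA, voltage in V) taken from the table above.
devices = {
    "Sound sensor (MQ-U300)": (180, 5.0),
    "Noise sensor (LM-393)": (20, 5.0),
    "Wi-Fi (ESP8266)": (170, 3.3),
}

for name, (current_ma, voltage_v) in devices.items():
    print(f"{name}: {power_watts(current_ma, voltage_v):.3f} W")
```

For the Wi-Fi module the product is 0.561 W, while the table lists 0.5 J, so the listed value appears to be truncated or derived differently.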
Comparison of energy consumption and battery usage time under various network conditions.
| Metric | Sensor | 300 KB/s | 600 KB/s | 900 KB/s | 1200 KB/s |
|---|---|---|---|---|---|
| Sensing Energy (J) | Sound | 0.9 | 0.9 | 0.9 | 0.9 |
| Sensing Energy (J) | Noise | 0.1 | 0.1 | 0.1 | 0.1 |
| Transmission Energy (J) | Sound | 0.111 | 0.056 | 0.037 | 0.028 |
| Transmission Energy (J) | Noise | 0.002 | 0.001 | 0.001 | 0.001 |
| Total Energy (J) | Sound | 1.011 | 0.956 | 0.937 | 0.928 |
| Total Energy (J) | Noise | 0.102 | 0.101 | 0.101 | 0.101 |
| Battery usage time (h) | Sound | 1.9 | 2.0 | 2.1 | 2.2 |
| Battery usage time (h) | Noise | 19.6 | 19.8 | 19.8 | 19.8 |
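The totals in this table can be approximated with a simple model: sensing energy plus the time needed to send the data multiplied by the Wi-Fi energy rate. The sketch below assumes the Wi-Fi figure of 0.5 J applies per second of transmission and uses the average data sizes of 66.4 KB (sound) and 0.9 KB (noise); both the model and the name `total_energy_j` are assumptions, and individual entries may differ from the table in the last digit.

```python
WIFI_ENERGY_RATE_J_PER_S = 0.5  # assumption: Wi-Fi energy per second of sending


def total_energy_j(sensing_j, data_kb, speed_kb_per_s):
    """Sensing energy plus energy spent transmitting data_kb at the given speed."""
    transmission_j = data_kb / speed_kb_per_s * WIFI_ENERGY_RATE_J_PER_S
    return sensing_j + transmission_j


# Sensing energy (J) and average data size (KB) per device type.
sound = total_energy_j(0.9, 66.4, 300)
noise = total_energy_j(0.1, 0.9, 300)

print(f"sound: {sound:.3f} J, noise: {noise:.3f} J, ratio: {sound / noise:.1f}x")
```

At 300 KB/s this gives roughly 1.011 J against 0.102 J, a ratio close to the tenfold energy saving reported for the noise sensor. Sensing dominates both totals, which is why faster links change the result only slightly.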