Haopeng Shi, Zhibo Chen, Haiyan Zhang, Juhu Li, Xuanxin Liu, Lili Ren, Youqing Luo.
Abstract
The larvae of some trunk-boring beetles barely leave traces on the outside of trunks when feeding within, rendering their detection rather difficult. One approach to this problem uses a probe to pick up boring vibrations inside the trunk and identify larval activity from them. Clean boring vibration signals are critical for accurate judgement; unfortunately, field environments are filled with natural and artificial noise. To address this issue, we constructed a boring vibration enhancement model named VibDenoiser, which makes a significant contribution to this rarely studied domain. The model builds on deep learning-based speech enhancement: it consists of convolutional encoder and decoder layers with skip connections, and two layers of SRU++ for sequence modeling. The dataset constructed for this study comprises boring vibrations of Agrilus planipennis Fairmaire, 1888 (Coleoptera: Buprestidae) and environmental noise. VibDenoiser achieves an SNR improvement of 18.57 dB and runs in real time on a laptop CPU. The accuracy of four well-known classification models increased by a large margin on vibration clips enhanced by our model. These results demonstrate the strong enhancement performance of our model and its contribution to better boring vibration detection.
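The SNR improvement reported above can be illustrated on toy data. Below is a minimal NumPy sketch of the metric; the sine "vibration", the noise level, and the stand-in for the model output are synthetic assumptions for illustration, not the paper's data or code:

```python
import numpy as np

def snr_db(clean, estimate):
    """SNR in dB of `estimate` against the reference `clean` signal."""
    noise = clean - estimate
    return 10.0 * np.log10(np.sum(clean**2) / np.sum(noise**2))

# Toy example: a sine "boring vibration" corrupted by white noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 8000, endpoint=False)
clean = np.sin(2 * np.pi * 440 * t)
noisy = clean + 0.5 * rng.standard_normal(t.size)       # probe recording
denoised = clean + 0.05 * rng.standard_normal(t.size)   # stand-in for model output

improvement = snr_db(clean, denoised) - snr_db(clean, noisy)
```

On this toy signal the improvement comes out around 20 dB, the same order as the paper's reported 18.57 dB gain on real recordings.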
Keywords: boring vibration; convolutional recurrent neural network; deep learning; denoising; end to end; trunk-boring beetle
Year: 2022 PMID: 35886772 PMCID: PMC9323076 DOI: 10.3390/insects13070596
Source DB: PubMed Journal: Insects ISSN: 2075-4450 Impact factor: 3.139
Figure 1. Detailed architecture of the encoder (a) and decoder (b) of VibDenoiser, annotated with the kernel size and stride of each layer. The arrows represent the flow of information.
Figure 2. The architecture of the proposed VibDenoiser model, annotated with the number of channels and the layer number. The arrows represent the skip connections.
Figure 3. The architecture of SRU, showing the input vector, the internal state, the previous state, the forget gate, the reset gate, and the output state. * represents matrix multiplication.
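The recurrence in the figure can be written out step by step. The following is a minimal NumPy sketch of the plain SRU cell (forget gate, internal state, reset gate, highway output); the attention block that distinguishes SRU++ and SRU's learned state-scaling vectors are omitted, and all weights here are random placeholders:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_step(x_t, c_prev, W, Wf, bf, Wr, br):
    """One SRU step: x_t is the input vector, c_prev the previous state."""
    f_t = sigmoid(Wf @ x_t + bf)                  # forget gate
    c_t = f_t * c_prev + (1 - f_t) * (W @ x_t)    # internal state
    r_t = sigmoid(Wr @ x_t + br)                  # reset gate
    h_t = r_t * np.tanh(c_t) + (1 - r_t) * x_t    # output state (highway)
    return h_t, c_t

rng = np.random.default_rng(1)
d = 4  # hidden size equals input size, so the highway connection type-checks
W, Wf, Wr = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
bf = br = np.zeros(d)

c = np.zeros(d)
for _ in range(10):  # run the cell over a short random sequence
    h, c = sru_step(rng.standard_normal(d), c, W, Wf, bf, Wr, br)
```

Because the matrix multiplications involve only the current input, all time steps can be batched, which is what makes SRU-family layers much faster than LSTM in the timing tables below.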
Comparison of model performance with a 2-layer LSTM and different numbers of SRU++ layers at 150 epochs.
| Recurrent Network | SNR (dB) | SegSNR (dB) | LLR |
|---|---|---|---|
| 2-layer LSTM | 8.23 | 0.50 | 0.38 |
| 4-layer SRU++ | 8.43 | 0.62 | 0.43 |
| 6-layer SRU++ | 8.43 | 0.61 | 0.35 |
| 7-layer SRU++ | 8.43 | 0.59 | 0.34 |
| 8-layer SRU++ | 8.52 | 0.65 | 0.45 |
| 9-layer SRU++ | 8.50 | 0.67 | 0.36 |
| 10-layer SRU++ | 8.48 | 0.62 | 0.36 |
| 14-layer SRU++ | 8.43 | 0.63 | 0.37 |
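The SegSNR column averages frame-wise SNR rather than computing it over the whole clip. A minimal NumPy sketch of the common definition follows; the frame length and the conventional clamp range used here are assumptions, not values taken from the paper:

```python
import numpy as np

def seg_snr_db(clean, estimate, frame=256, floor=-10.0, ceil=35.0):
    """Segmental SNR: per-frame SNR in dB, clamped, then averaged."""
    n = (len(clean) // frame) * frame
    c = clean[:n].reshape(-1, frame)
    e = estimate[:n].reshape(-1, frame)
    num = np.sum(c**2, axis=1)
    den = np.sum((c - e)**2, axis=1) + 1e-12
    snr = 10.0 * np.log10(num / den + 1e-12)
    return float(np.mean(np.clip(snr, floor, ceil)))

rng = np.random.default_rng(2)
clean = np.sin(np.linspace(0, 200 * np.pi, 4096))
noisy = clean + 0.3 * rng.standard_normal(clean.size)
```

Clamping each frame keeps silent or near-perfect frames from dominating the average, which is why SegSNR values in the tables sit far below the clip-level SNR.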
Performance of the model with 8-layer SRU++ under different loss functions. Models were trained for 50 epochs with a batch size of 56.
| Loss Function | SNR (dB) | SegSNR (dB) | LLR |
|---|---|---|---|
| L1 | 8.62 | 0.70 | 0.40 |
| L2 | 8.59 | 0.65 | 0.36 |
| Huber | 8.54 | 0.67 | 0.36 |
| L1 + STFT | 6.61 | −1.38 | 0.11 |
| L1 + SNR | 7.56 | 0.43 | 10.48 |
| log-cosh | 8.65 | 0.71 | 0.40 |
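The losses compared above are all standard regression losses on the waveform error. A small NumPy sketch of the plain terms (the STFT and SNR penalty terms from the combined losses are not reproduced here):

```python
import numpy as np

def l1_loss(err):
    return np.mean(np.abs(err))

def l2_loss(err):
    return np.mean(err**2)

def huber_loss(err, delta=1.0):
    # Quadratic near zero, linear in the tails.
    a = np.abs(err)
    return np.mean(np.where(a <= delta, 0.5 * err**2, delta * (a - 0.5 * delta)))

def log_cosh_loss(err):
    # Numerically stable log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log(2).
    a = np.abs(err)
    return np.mean(a + np.log1p(np.exp(-2 * a)) - np.log(2.0))

err = np.array([-2.0, -0.1, 0.0, 0.1, 2.0])
```

log-cosh behaves like L2 for small errors and like L1 for large ones, which may explain why it edges out both in the table while staying smooth everywhere.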
Inference time of VibDenoiser with different recurrent layers.
| Recurrent Network | Inference Time on CPU (s) | Inference Time on GPU (s) |
|---|---|---|
| 2-layer LSTM | 0.9586 | 0.1107 |
| 2-layer SRU++ | 0.7635 | 0.0401 |
| 4-layer SRU++ | 0.8692 | - |
| 6-layer SRU++ | 0.9142 | - |
| 7-layer SRU++ | 0.9833 | - |
| 8-layer SRU++ | 1.0024 | 0.0404 |
| 9-layer SRU++ | 1.0468 | - |
| 10-layer SRU++ | 1.0812 | - |
| 14-layer SRU++ | 1.1628 | - |
The boring vibration segments fed into the model were 5 s long. The CPU was an i7-10870H with the model restricted to a single core; the GPU was a laptop RTX 3070.
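The real-time claim can be checked against the table with the real-time factor (RTF), i.e. processing time divided by audio duration; RTF below 1 means the model keeps up with the incoming signal. A small sketch using the 8-layer SRU++ row:

```python
def rtf(inference_seconds, clip_seconds=5.0):
    """Real-time factor for a fixed-length input clip."""
    return inference_seconds / clip_seconds

cpu_rtf = rtf(1.0024)  # 8-layer SRU++, single CPU core (table value)
gpu_rtf = rtf(0.0404)  # 8-layer SRU++, laptop RTX 3070 (table value)
```

Even the slowest configuration (14-layer SRU++, 1.1628 s) processes a 5 s clip at roughly a quarter of real time on one CPU core.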
Model size, parameters, and FLOPs of VibDenoiser with different recurrent layers.
| Recurrent Network | Model Size (MB) | Parameters | FLOPs |
|---|---|---|---|
| 2-layer LSTM | 71.98 | 18.87M | 59.47G |
| 2-layer SRU++ | 41.02 | 10.75M | 52.48G |
| 4-layer SRU++ | 46.11 | 12.07M | 53.62G |
| 6-layer SRU++ | 51.20 | 13.40M | 54.76G |
| 8-layer SRU++ | 56.82 | 14.73M | 55.90G |
| 9-layer SRU++ | 58.83 | 15.39M | 56.47G |
| 10-layer SRU++ | 61.37 | 16.05M | 57.04G |
| 14-layer SRU++ | 71.55 | 18.71M | 59.33G |
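The model-size column is consistent with float32 storage: a quick sanity check multiplies the parameter count by 4 bytes (the small remainder is serialization overhead, an assumption on our part):

```python
def fp32_size_mb(params_millions):
    """Approximate on-disk size of a float32 model, in MiB."""
    return params_millions * 1e6 * 4 / (1024 ** 2)

est = fp32_size_mb(14.73)  # 8-layer SRU++: 14.73M parameters
# table reports 56.82 MB for this configuration
```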
Comparison of model performance with 2-layer and 4-layer SRU++ under different loss functions. Models were trained for 50 epochs with a batch size of 56.
| Recurrent | Loss Function | SNR (dB) | SegSNR (dB) | LLR |
|---|---|---|---|---|
| 2-layer SRU++ | L1 | 8.49 | 0.65 | 0.31 |
| 2-layer SRU++ | log-cosh | 8.57 | 0.66 | 0.33 |
| 4-layer SRU++ | log-cosh | 8.55 | 0.68 | 0.37 |
Training GPU hours used by VibDenoiser with different recurrent layers on 4 GPUs. All models were trained for 150 epochs.
| Recurrent Network | GPU h | Batch Size |
|---|---|---|
| 2-layer LSTM | 90.48 | 64 |
| 2-layer SRU++ | 77.04 | 56 |
| 4-layer SRU++ | 79.72 | 64 |
| 6-layer SRU++ | 83.08 | 64 |
| 7-layer SRU++ | 84.92 | 60 |
| 8-layer SRU++ | 86.48 | 56 |
| 9-layer SRU++ | 88.00 | 56 |
| 10-layer SRU++ | 89.80 | 56 |
| 14-layer SRU++ | 96.16 | 52 |
Figure 4. Frequency spectra of 4 noisy boring vibration segments (column a) and of the same segments after enhancement with VibDenoiser (column b).
Classification results of four well-known classification models on the noisy test set and the VibDenoiser-enhanced test set.
| Classification Model | Accuracy on Noisy Test Set | Accuracy on Enhanced Test Set |
|---|---|---|
| VGG16 | 81.14% | 92.51% |
| ResNet18 | 89.39% | 96.47% |
| SqueezeNet | 78.45% | 88.89% |
| MobileNetV2 | 85.77% | 90.40% |
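The accuracy gains above are easier to compare as relative error-rate reductions. A short arithmetic sketch on the reported values:

```python
def relative_error_reduction(acc_noisy, acc_enhanced):
    """Fraction of classification errors removed by enhancement."""
    err_noisy, err_enh = 1 - acc_noisy, 1 - acc_enhanced
    return (err_noisy - err_enh) / err_noisy

vgg16 = relative_error_reduction(0.8114, 0.9251)
resnet18 = relative_error_reduction(0.8939, 0.9647)
```

By this measure, enhancement removes roughly 60% of VGG16's errors and about two-thirds of ResNet18's.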