Serkan Tokgoz, Issa M S Panahi.
Abstract
Direction-of-arrival (DOA) estimation is a fundamental technique in array signal processing, with wide application in beamforming, speech enhancement, and many other assistive speech processing technologies. In this paper, we devise a novel DOA technique based on randomized singular value decomposition (RSVD) to improve the performance of non-uniform non-linear microphone arrays (NUNLA). Accurate and efficient singular value decomposition of large data matrices is computationally challenging, and randomization provides an effective tool for matrix approximation. The developed DOA estimator therefore uses a modified dictionary-based RSVD method to localize single speech sources at low signal-to-noise ratios (SNR). Unlike previous methods developed for uniform linear microphone arrays, the proposed approach, which uses an L-shaped three-microphone setup, has no 'left-right' ambiguity. We compare the performance of the proposed method with that of other techniques. The experiments show at least 20% performance improvement on simulated data and 25% on real data compared with similar DOA estimation techniques for NUNLA. The proposed method exploits frame-based online time delay of arrival (TDOA) measurements, which allows the algorithm to run on real-time devices. We also show an efficient real-time implementation of the proposed method on a Pixel 3 Android smartphone, using its three built-in microphones, for hearing aid applications.
Keywords: Hearing aid device; low SNR; non-uniform microphone arrays; randomized algorithm; real-time implementation; singular value decomposition; smartphone; speech source localization
Year: 2021 PMID: 34926101 PMCID: PMC8681871 DOI: 10.1109/access.2021.3130180
Source DB: PubMed Journal: IEEE Access ISSN: 2169-3536 Impact factor: 3.367
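The abstract motivates RSVD as a cheaper approximation to a full SVD. The following is a minimal sketch of a generic randomized SVD (random range finder followed by a small exact SVD), written with NumPy. It is not the authors' modified dictionary-based variant, whose details are not given in this record; the function name `randomized_svd` and the parameters `n_oversample`, `n_power_iter`, and `seed` are assumptions made for illustration.

```python
import numpy as np

def randomized_svd(A, rank, n_oversample=10, n_power_iter=2, seed=0):
    """Approximate the top-`rank` SVD of A with a random range finder.

    Generic textbook scheme, NOT the paper's modified dictionary-based RSVD.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + n_oversample, min(m, n))
    # Sample the column space of A with a Gaussian test matrix.
    Y = A @ rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(Y)
    # Power iterations sharpen the subspace when singular values decay slowly.
    for _ in range(n_power_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Project A onto the small subspace and take an exact SVD there.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

# Usage on a synthetic matrix that is exactly rank 5 (illustrative data only).
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 5)) @ rng.standard_normal((5, 80))
U, s, Vt = randomized_svd(A, rank=5)
rel_err = np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A)
print(f"relative reconstruction error: {rel_err:.2e}")
```

The expensive decomposition is applied only to the small projected matrix `B`, which has `rank + n_oversample` rows here. That is where the computational saving over a full SVD of `A` comes from.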
Summary table for recent works.
| Algorithm | Methodology | Highlights and Limitations |
|---|---|---|
| Real-Time Estimation of Direction of Arrival of Speech Source using Three Microphones | Time Delay Estimation (TDE) | Three-microphone DOA approach using generalized cross-correlation. Improved performance under noise, but still falls short at very low SNR. |
| A TDOA-based multiple source localization using delay density maps | TDE | Focuses on multiple source localization using TDOA with volumetric mapping. Not evaluated with different noise types or at low SNRs. |
| An L-shaped microphone array configuration for impulsive acoustic source localization in 2-D using orthogonal clustering based time delay estimation | TDE | Uses an orthogonal clustering algorithm for an L-shaped microphone array. Only impulsive sources are considered, and different SNRs are not examined. |
| Real-Time Convolutional Neural Network-Based Speech Source Localization on Smartphone | Deep Learning | Convolutional neural network (CNN) approach for DOA. High accuracy, but needs a large dataset for training. |
| Multi-Speaker DOA Estimation Using Deep Convolutional Networks Trained With Noise Signals | Deep Learning | CNN approach for multi-speaker DOA. Needs extensive data for model training. |
| A polynomial eigenvalue decomposition MUSIC approach for broadband sound source localization | Multiple Signal Classification (MUSIC) | High-resolution algorithm based on eigenvalue decomposition. Real-time processing is not possible due to its complexity. |
| DOA estimation of a system using MUSIC method | MUSIC | Can be applied to different array geometries. Cannot resolve correlated signals and is computationally complex. |
| A High-Accuracy, Low-Latency Technique for Talker Localization in Reverberant Environments Using Microphone Arrays | Steered Response Power | Robust in noisy environments, but computationally expensive because of the grid search. |
| Non-Uniform Microphone Arrays for Robust Speech Source Localization for Smartphone-Assisted Hearing Aid Devices | Singular Value Decomposition | High performance at low SNR. Computationally very complex. |
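Several rows of the summary table rely on time delay estimation, and the first names generalized cross-correlation. As a rough illustration of that idea, the sketch below estimates the delay between two microphone signals with the PHAT-weighted generalized cross-correlation, using NumPy. The function name `gcc_phat_delay`, the 16 kHz sampling rate, and the synthetic white-noise signal are assumptions for the example; none of the cited works' exact settings are reproduced here.

```python
import numpy as np

def gcc_phat_delay(x, y, fs, max_delay_s=None):
    """Return the delay of y relative to x in seconds (positive if y lags x)."""
    n = len(x) + len(y)  # zero-pad so the circular correlation does not wrap
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    if max_delay_s is not None:
        max_shift = max(1, min(int(round(max_delay_s * fs)), max_shift))
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (int(np.argmax(cc)) - max_shift) / fs

# Usage with a synthetic signal delayed by 5 samples (illustrative data only).
fs = 16000  # assumed sampling rate in Hz
rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = np.concatenate((np.zeros(5), x))
tau = gcc_phat_delay(x, y, fs, max_delay_s=1e-3)
print(f"estimated delay: {tau * fs:.1f} samples")
```

Restricting the search to `max_delay_s` reflects the fact that the physical delay cannot exceed the microphone spacing divided by the speed of sound.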
FIGURE 1. L-shaped 3 microphone array on Pixel 3 smartphone.
FIGURE 2. (a) Uniform linear arrays (ULA) and (b) non-uniform non-linear arrays (NUNLA), where d and v are the inter-element microphone distances.
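To see why an L-shaped array avoids the 'left-right' ambiguity of a linear array, consider a far-field source and two orthogonal microphone pairs with spacings d and v. The sketch below is a simplified geometric illustration, not the proposed RSVD-based estimator. It assumes a plane wave, a corner microphone at the origin with the other two on the x and y axes, a speed of sound of 343 m/s, and placeholder spacings `D_M` and `V_M` that are not taken from the paper.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value
D_M = 0.13  # hypothetical spacing along the x axis, in metres
V_M = 0.04  # hypothetical spacing along the y axis, in metres

def far_field_tdoas(theta_deg, d=D_M, v=V_M, c=SPEED_OF_SOUND):
    """TDOAs of the x-axis and y-axis pairs for a plane wave from theta."""
    theta = math.radians(theta_deg)
    return d * math.cos(theta) / c, v * math.sin(theta) / c

def doa_l_shaped(tau_x, tau_y, d=D_M, v=V_M, c=SPEED_OF_SOUND):
    """Both pairs together give cos and sin, so atan2 covers 0..360 degrees."""
    return math.degrees(math.atan2(tau_y * c / v, tau_x * c / d)) % 360.0

def doa_linear(tau_x, d=D_M, c=SPEED_OF_SOUND):
    """A single pair gives only cos(theta), so theta and -theta coincide."""
    cos_theta = max(-1.0, min(1.0, tau_x * c / d))
    return math.degrees(math.acos(cos_theta))

for true_angle in (60.0, 300.0):
    tau_x, tau_y = far_field_tdoas(true_angle)
    print(true_angle, round(doa_l_shaped(tau_x, tau_y), 3), round(doa_linear(tau_x), 3))
```

Sources at 60° and 300° produce the same delay on the x-axis pair, so the linear estimate cannot tell them apart, while the sign of the y-axis delay separates them.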
FIGURE 3. Block diagram of the real-time processing of the proposed DOA estimation method.
FIGURE 4. Block diagram of speech modeling to obtain f.
FIGURE 5. RMSE (°) results for DOA estimation using simulated data under machinery, traffic, and babble noise at −5 dB, 0 dB, and 5 dB.
FIGURE 6. RMSE (°) results for DOA estimation using recorded data under machinery, traffic, and babble noise at −5 dB, 0 dB, and 5 dB.
FIGURE 7. RMSE results for DOA estimation with and without VAD.
Comparison of processing times for different data lengths.
| Data length (L) | MUSIC | SRP-PHAT | NU-SSL | Proposed |
|---|---|---|---|---|
| 20 ms | 49.8 ms | 98.1 ms | 9.7 ms | 2.8 ms |
| 100 ms | 71.7 ms | 537.5 ms | 23.3 ms | 5.5 ms |
| 500 ms | 97.2 ms | 2914 ms | 47.1 ms | 12.7 ms |
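The processing-time table can be read in two ways: as a speed-up of the proposed method over each baseline, and as a real-time factor (processing time divided by data length). The short sketch below computes both from the table values. The helper names `speedups` and `real_time_factor` are illustrative, and treating a factor below 1 as "keeps up with the input" assumes non-overlapping frames, which this record does not state.

```python
# Processing times in ms, copied from the table above (keys are data lengths in ms).
times_ms = {
    20: {"MUSIC": 49.8, "SRP-PHAT": 98.1, "NU-SSL": 9.7, "Proposed": 2.8},
    100: {"MUSIC": 71.7, "SRP-PHAT": 537.5, "NU-SSL": 23.3, "Proposed": 5.5},
    500: {"MUSIC": 97.2, "SRP-PHAT": 2914.0, "NU-SSL": 47.1, "Proposed": 12.7},
}

def speedups(row):
    """Ratio of each baseline's time to the proposed method's time."""
    return {name: t / row["Proposed"] for name, t in row.items() if name != "Proposed"}

def real_time_factor(data_length_ms, processing_ms):
    """Below 1.0, a frame is processed faster than it is acquired."""
    return processing_ms / data_length_ms

for length_ms, row in times_ms.items():
    ratios = ", ".join(f"{k}: {v:.1f}x" for k, v in speedups(row).items())
    rtf = real_time_factor(length_ms, row["Proposed"])
    print(f"{length_ms} ms -> {ratios}; proposed real-time factor {rtf:.2f}")
```

At a 20 ms data length, for example, the proposed method is about 3.5 times faster than NU-SSL and uses 14% of the frame duration, whereas MUSIC and SRP-PHAT take longer than the frame itself.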
RMSE (°) results for different source angles.
| SNR | 0° | 60° | 120° | 180° | 240° | 300° |
|---|---|---|---|---|---|---|
| −5 dB | 6.46 | 4.92 | 5.07 | 6.12 | 5.39 | 5.86 |
| 0 dB | 4.17 | 2.94 | 3.91 | 4.40 | 3.78 | 3.78 |
| 5 dB | 2.5 | 1.96 | 1.75 | 2.29 | 1.64 | 1.63 |
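For angular quantities such as the RMSE (°) reported above, errors near 0°/360° need care: an estimate of 358° for a source at 0° is off by 2°, not 358°. The sketch below shows one common way to compute an RMSE with wrap-around. Whether the paper wraps errors this way is an assumption, and the estimates in the usage lines are made-up illustrative values, not results from the paper.

```python
import math

def wrap_deg(err):
    """Wrap an angle difference into the interval [-180, 180)."""
    return (err + 180.0) % 360.0 - 180.0

def rmse_deg(estimates_deg, truth_deg):
    """Root-mean-square angular error, with wrap-around at 0/360 degrees."""
    errs = [wrap_deg(e - truth_deg) for e in estimates_deg]
    return math.sqrt(sum(e * e for e in errs) / len(errs))

# Hypothetical per-frame estimates for a source at 0 degrees.
frame_estimates = [358.0, 2.0, 357.0, 3.0]
print(f"RMSE: {rmse_deg(frame_estimates, 0.0):.2f} degrees")
```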
FIGURE 8. Linear directivity pattern (LDP) for the proposed method.
FIGURE 9. Screenshot of the developed application on an Android smartphone.
FIGURE 10. Snapshot of CPU (top), memory (middle), and energy (bottom) consumption of the proposed method on an Android Pixel 3 smartphone.