Literature DB >> 35495551

Joint Calibration and Synchronization of Two Arrays of Microphones and Loudspeakers using Particle Swarm Optimization.

Anton Kovalyov¹, Kashyap Patel¹, Issa Panahi¹.

Abstract

This work presents a methodology for the joint calibration and synchronization of two arrays of microphones and loudspeakers. The problem is modeled as estimation of the rigid motion of one array with respect to the other, as well as estimation of the synchronization mismatch between the two. The proposed method uses dedicated signals emitted by the loudspeakers of the two arrays to compute a set of time of arrival (TOA) estimates. Through a simple transformation, estimated TOAs are converted into a set of linearly independent time difference of arrival (TDOA) measurements, which are modeled by a system of nonlinear equations in the unknown parameters of interest. A maximum likelihood estimate is then given as the solution to a nonlinear weighted least squares (NWLS) problem, which is optimized applying a parallelizable variant of Particle Swarm Optimization (PSO). In this paper, we also derive the Cramér-Rao lower bound (CRLB), and benchmark it against the proposed method in a series of Monte Carlo (MC) simulations. Results show that the proposed method attains high-performance comparable to the CRLB.

Entities: Chemical

Keywords: Microphone array; PSO; calibration; localization; synchronization

Year: 2021 PMID： 35495551 PMCID： PMC9049220 DOI： 10.1109/ojsp.2021.3118574

Source DB: PubMed Journal: IEEE Open J Signal Process ISSN： 2644-1322

Introduction

MICROPHONE arrays can be employed to determine the space-time structure of an acoustic field. They have been used in many practical applications, including speech enhancement [1], sound source localization (SSL) [2], direction of arrival (DOA) estimation [3] and tracking [4]. The performance of these applications generally improves as the number of spatially distributed microphones being deployed increases. This is where wireless acoustic sensor networks (WASNs) are of special interest. A WASN simulates an ad-hoc array of spatially distributed microphones using an array of acoustic sensor nodes interconnected by a wireless medium [5]. Each node includes a processor, a wireless transmitter and receiver, an array of one or more microphones, and possibly one or more loudspeakers. These node characteristics are nowadays easily satisfied by many commercial off-the-shelf (COTS) devices, such as laptops, tablets and smartphones. Most multi-channel signal processing techniques, such as acoustic beamforming [6] and SSL/DOA based on time difference of arrival (TDOA) measurements [7], rely on precise knowledge of microphone array geometry, i.e., relative 3-dimensional (3D) microphone positions, and the assumption that the multiple audio input channels are synchronized. These constraints can be especially hard to achieve in a WASN, where nodes are generally asynchronous, and their relative positions are not necessarily fixed. In these situations, an automatic mechanism for geometric calibration, also known simply as calibration, as well as synchronization of the multiple audio input channels, is desired. A lot of research has been done on WASN calibration. Approaches in literature often model the problem as estimation of microphone pairwise distances, which are then transformed into relative 3D microphone positions applying multidimensional scaling (MDS) [8]–[13]. Detailed information on MDS can be found in [14]. Calibration methodology can be classified into two types: passive calibration, also known as self-calibration; and active calibration. Passive calibration methods estimate the WASN geometry using acoustic signals in the environment. Active calibration methods estimate the WASN geometry using dedicated signals generated by built-in loudspeakers within the nodes in the network. The concept of passive calibration is generally preferred in practice since it does not rely on the emission of potentially disruptive signals of active calibration methods. However, passive calibration methods proposed in the literature typically make certain assumptions about the environment which may jeopardize their implementation in some systems. Work on passive calibration includes [8]–[10], [15]–[17]. Chen et al. [8] assumed acoustic sources and microphones laying on a 2D plane and used energy measurements to estimate positions of both simultaneously. McCowan et al. [9] assumed synchronous microphones in a diffused noise environment to estimate microphone pairwise distances by fitting measured noise coherence with its theoretical model. Hon et al. [10] assumed acoustic sources at end-fire locations to estimate microphone pairwise distances using TDOA measurements. [15]–[17] assumed sources at far field and individual nodes equipped with a synchronized microphone array capable of reliable DOA estimation. These methods use DOAs observed at individual nodes and TDOAs observed between the microphones of different nodes to estimate the WASN geometry. On the other hand, work on active calibration includes [11]–[13], [18]. Peng et al. [18] proposed a system called “BeepBeep” which estimates the distance between two asynchronous nodes. Each node includes a microphone and a loudspeaker conveniently placed near each other. The loudspeakers emit a special “Beep” signal sequentially and a set of TOAs are estimated using the signals acquired by the microphones. Then, an approximation of the distance between the nodes is found by applying a simple algebraic manipulation on the TOA measurements. Cobos et al. [11] later expanded upon the BeepBeep system to allow simultaneous emission of Beep signals among two or more nodes to compute approximate pairwise distances, which greatly reduces calibration time. Their method excites individual loudspeakers simultaneously with a specific pseudonoise (PN) sequence, which is known for its high autocorrelation and low cross-correlation properties, followed by applying self-interference cancellation to the captured signals to improve TOA estimation. Raykar et al. [12] proposed a method that estimates node pairwise distances using a similar strategy to that of BeepBeep with the addition that pairwise distances are then converted into relative node 3D positions employing MDS, followed by applying the Levenberg-Marquardt algorithm (LMA) to further refine estimated positions of microphones and loudspeakers within nodes. Pertila et al. [13] proposed a calibration method for a WASN where individual nodes include an array of synchronized microphones and one loudspeaker. Their method follows similar steps to that of Raykar et al. with two main differences: TOA estimation was improved using known microphone array geometry within individual nodes; and instead of estimating individual microphone and loudspeaker positions, DOAs observed at individual nodes were used to find the node orientations. The aforementioned calibration methods consider nodes including one or more microphones and zero or one loudspeaker, at least when formulating the problem mathematically. However, in practice, acoustic sensor nodes may also include an array of loudspeakers, e.g., many of the aforementioned COTS devices come equipped with a microphone array and stereo loudspeakers. Consequently, it is of interest to develop an efficient joint calibration and synchronization method that uses all microphones and loudspeakers within individual nodes. Such a method can greatly benefit from the following two assumptions generally true in practice: intra-array geometry is known, that is, the relative 3D positions of microphones and loudspeakers within individual nodes are known; and intra-array audio input channels are synchronized. Therefore, this work offers a method for the joint calibration and synchronization of two arrays of microphones and loudspeakers, which, to the best of our knowledge, has not been addressed before. As shown in this paper, an increased number of elements within the two arrays helps improve estimation of inter-array geometry and inter-array synchronization mismatch. Hence, the proposed method can be efficiently applied to jointly calibrate and synchronize a WASN whose individual nodes include an array of microphones and loudspeakers. The proposed method models the problem of joint calibration and synchronization of two arrays of microphones and loudspeakers as estimation of the rigid motion, that is 3D rotation and 3D translation, of one array with respect to the other, as well as estimation of the synchronization offset between the two. The method uses signals emitted by the loudspeakers of the two arrays to compute a set of TOA measurements. Through a simple transformation, measured TOAs are then converted into a set of linearly independent TDOA estimates, which, applying the assumptions of known intra-array geometry and synchronized intra-array audio input channels, are modeled by a system of nonlinear equations in the unknown parameters of interest. A maximum likelihood (ML) estimate is then derived as the solution to a nonlinear weighted least square (NWLS) problem. Then, a parallelizable variant of Particle Swarm Optimization (PSO) is proposed to optimize the NWLS problem. In this work, we also include derivation of the Cramér-Rao lower bound (CRLB), which is benchmarked against the proposed method in a series of Monte Carlo (MC) simulations. Results show that the proposed method attains the CRLB in most cases. This paper is structured as follows. The problem of joint calibration and synchronization of two arrays of microphones and loudspeakers is described in Section II. The solution, given by optimization of a NWLS problem, is derived in Section III. A numerical optimization method based on PSO is presented in Section IV. Derivation of the CRLB is shown in Section V. The proposed method is evaluated in a series of simulation experiments in Section VI. Finally, Section VII concludes the paper. By convention, vectors in this paper are column vectors. They are denoted by lower case bold letters/symbols. Matrices are denoted by upper case bold letters/symbols. is the i-th element of is the transpose of is the Euclidean norm of is the Schur product. denotes expectation. mod(a, b) denotes a modulo denotes a known estimate. denotes an unknown estimate that needs to be found. Finally, denotes additive noise modeled as a zero-mean random variable.

Problem Formulation

Let us consider two arrays of microphones and loudspeakers. We will refer to one array as primary array (PA) and to the other as secondary array (SA). It is assumed that each array is properly calibrated and synchronized, i.e., intra-array geometry is known, and intra-array audio input channels are synchronized. However, the position of SA with respect to PA, as well as the synchronization offset between the two, are not known and need to be estimated.

Local Coordinate Systems and Rigid Motion

Known intra-array geometry is represented here by defining two local coordinate systems (LCSs), one for PA and one for SA. Let I and J be the number of loudspeakers and microphones in PA, respectively. Similarly, let K and L be the number of loudspeakers and microphones in SA, respectively. The 3D positions of PA’s i-th loudspeaker and j-th microphone, associated to PA’s LCS, are represented by s and m, respectively, where and . On the other hand, the 3D positions of SA’s k-th loudspeaker and l-th microphone, associated to SA’s LCS, are represented by and , respectively. For mathematical convenience we let and . Consequently, the position of SA with respect to PA is modeled by where R is a 3×3 rotation matrix and t is a 3×1 translation vector defining the rigid motion that brings SA’s LCS to that of PA. It is well known that the maximum number of degrees of freedom of a rigid body in 3D space is six, that is, three coordinates are required to locate its center of mass and another three to describe its orientation [19]. Consequently, using (1) to model the problem of calibration of SA with respect to PA is especially convenient, since the maximum number of parameters that need to be estimated is fixed regardless of the number of elements in SA. This work is flexible in how R and t are parametrized. For the purpose of demonstration, R can be parametrized, using Tait-Bryan representation, as where the yaw angle , the pitch angle and the roll angle are rotation parameters whose corresponding matrices represent rotation about the z, y and x axes, respectively. Similarly, t can be parametrized, using spherical representation, as where , and are translation parameters denoting range, azimuth angle and elevation angle, respectively. Fig. 1 illustrates the problem of describing the 3D position of SA with respect to PA in terms of the aforementioned parameters.

Fig. 1.

Parameters describing the 3D position of SA with respect to PA.

TOA

As part of the inter-array calibration procedure, each loudspeaker is excited with a known calibration signal such as a chirp signal or a PN sequence. Emitted signals are then captured by each microphone and TOAs are estimated. Let τ be the noise-free TOA of a signal emitted by loudspeaker p, when captured at microphone q, where and . Assuming direct path between loudspeakers and microphones, corresponding noise-free TOAs are given by where c is the propagation speed of the signal, τ is the physical time with reference to some global clock at which loudspeaker p was excited, and δ and δ are time offsets due to different internal clocks and capture times at PA and SA, respectively. This formulation comes directly from the assumption made earlier that intra-array audio input channels are synchronized. TOA estimation is similar to the problem of time delay estimation (TDE) in [20]. Hence, assuming no multipath and high signal-to-noise ratio, an estimate of τ can be found as the peak of the cross-correlation of the known signal emitted by loudspeaker p and the signal captured by microphone q. For best results, calibration signals can be emitted sequentially. However, this may be impractical when the number of loudspeakers is large. In that case, the methodology for TOA estimation when loudspeakers emit calibration signals simultaneously, developed by Cobos et al. in [11], is of special interest. Moreover, known intra-array geometry can also be exploited to further improve estimation of TOAs using the methodology developed by Pertila et al. in [13]. This work is independent of the methodology applied to estimate the TOAs in (4) and (5). Here, we assume direct knowledge of noisy TOA estimates modeled by where is, by definition, zero-mean additive noise.

TDOA and Synchronization Offset

The parameters τ, δ and δ are of no interest to us. Subtracting (4) and (5) results in the following system of nonlinear TDOA equations where is defined here as the synchronization offset between PA and SA that needs to be estimated. Once is found, the two arrays can be synchronized[1] by simply delaying SA’s audio input channels by .

Problem Summary

The problem is formulated as finding the synchronization offset and the parameters defining the rotation matrix R and translation vector t, given the known parameters and , where , , , , , and .

Problem Solution

In this section, we derive the ML estimate of the parameters of interest based on noisy TDOA measurements. Additionally, we discuss the minimum number of microphones and loudspeakers required.

ML Estimate

Since the transformation of SA’s LCS in (1) does not change relative distances between points, the TDOA representation in (8) can be conveniently rewritten as where the only unknowns on the right-hand side of (9) and (10) are the parameters of interest. An estimate of true TDOA is given by where, using (6), we have It then follows that is a zero-mean random variable whose second order statistics can be conveniently expressed in terms of the, assumed to be known, second order statistics of as follows for , , . Note that although the total number of TDOA measurements defined in (11) is , only are linearly independent, that is, there are linearly independent TDOAs associated to a single loudspeaker. For a given loudspeaker p, a plausible set of linearly independent TDOA measurements can be grouped in vector form as follows where the first L elements of group TDOAs between a fixed PA microphone and varying SA microphones, i.e., j = 1 and , and, similarly, the next J − 1 elements group remaining TDOAs between a fixed SA microphone and varying PA microphones, i.e., and . Let be a vector grouping all linearly independent TDOA measurements. Consequently, where r groups the corresponding noise-free TDOAs defined in (9) and (10), and Δr groups the corresponding noise terms defined in (13). Let be a vector grouping the unknown parameters that need to be estimated, where x groups the parameters defining the rotation matrix R, e.g., , and x groups the parameters defining the translation vector t, e.g., . Let r(x) be a vector grouping all the corresponding noise-free linearly independent TDOAs constructed as a function of x, more specifically, R, t and in (9) and (10) are defined using the corresponding parameters in x. Assuming Δr is zero-mean Gaussian, the ML estimate of x is given by the solution to the following NWLS problem where is the covariance matrix of the TDOA noise, whose individual elements are given by (14).

Minimum Number of Microphones and Loudspeakers

Let be the number of parameters in x, where D and D are the number of parameters in x and x, respectively. It follows that the constraint is a necessary condition for the estimator in (19) to work. However, this constraint is by no means a sufficient identifiability condition. For instance, let us consider a scenario where PA and SA each include a single loudspeaker and a large number of microphones satisfying the aforementioned constraint. If we allow all six degrees of freedom to SA, it is then easy to verify that the rotation of SA around the axis intersecting both loudspeakers, as illustrated in Fig. 2, will not affect pairwise distances between microphones and loudspeakers. This fact implies that identifiable 3D positioning of SA with respect to PA is not possible when regardless of the number of microphones. On the other hand, letting would break the ambiguity assuming the loudspeakers do not lie along a single line.

Fig. 2.

Ambiguity in 3D positioning of SA with respect to PA.

Numerical Optimization Method

Minimization of (20) is a highly nonlinear problem. A popular method for solving nonlinear problems is LMA. LMA is an iterative optimization method that combines gradient descent and Gauss-Newton methods. The problem with LMA is that it is prone to get stuck in local optimum and, as such, it greatly relies on the initial guess of the solution. Literature on active calibration methodology typically makes certain assumptions about the problem geometry to simplify finding an approximate solution to the particular nonlinear problem. The approximate solution could then be improved with LMA. A common assumption is that of microphones and loudspeakers corresponding to individual nodes being closely spaced [11]–[13], [18]. Here we prefer not to make additional assumptions and propose solving the nonlinear problem in (19) directly using PSO. PSO is a well-known population-based metaheuristic (PM) applied in a wide variety of fields [23]. PMs involve optimization using a population of candidate solutions or particles that move around the search space in an iterative manner following a predefined update rule. PMs are stochastic in nature and unlike hill climbing approaches are considerably less reliant on initialization. Moreover, computation involving PMs is generally easy to parallelize, and as such, significant speed up can be obtained with the use of a General-Purpose Graphics Processing Unit (GPGPU). These factors make PMs an attractive choice for solving the optimization problem in this paper. Apart from PSO, there exist many other well-known PMs, such as the Genetic Algorithm (GA) [24], Differential Evolution (DE) [25] and Artificial Bee Colony (ABC) [26]. Many variants of these algorithms were proposed in literature with the aim of improving not only the fitness of the solution for a given optimization problem but also convergence speed, that is, the number of function evaluations (FEs) it takes for the method to converge. Among GA, DE, ABC, and PSO, we found the latter to perform the best for solving (19). Hence, we will focus particularly on PSO. As an attempt to balance local exploitation and global exploration, in PSO, a swarm of particles moves throughout the search space by means of acceleration towards a weighted combination of the best solution that they individually found and the best solution that other particles within their neighborhood found. A neighborhood is known as a predefined set of particles within the swarm that a given particle can communicate with. There exist many variants of PSO in literature. In this paper, we implement the constricted PSO with ring communication topology defined in [27] with a slight modification to allow parallelization of computations.

Solution Search with PSO

PSO is applied to solve the optimization problem in (19) as follows. Let N be the number of iterations. Let N be the number of particles. A particle n, for , is represented by four vectors: its position x, i.e., candidate solution to (19); its velocity v; the best solution it individually found p; and the best solution found by particles within its neighborhood g. At each iteration of the algorithm, the solution search procedure updates the entire swarm as follows where is a parameter known as the constriction factor, c1 and c2 are parameters representing the attraction weights of x towards p and g, respectively, a∗, is a vector of size D whose elements are drawn from , and G is a set of best solutions found by particles within the neighborhood of particle n. Definition of G depends on the communication topology used, which is explained in Section IV-B. The algorithm terminates once the final iteration is completed. Then, the PSO solution to (19) is given by

Communication Topology

Communication among particles is a common feature to all PMs, where particles collaborate to find the global optimum. In the context of PSO, the term communication topology gives definition to G in (25). There are two well-known communication topologies in literature. One is the Global Communication Topology (GCT) and the other is the Ring Communication Topology (RCT). In GCT, a particle can communicate with all other particles including itself (see Fig. 3.a), and as such

Fig. 3.

Particle swarm communication topologies.

GCT implies that particles are directly attracted by the global best solution found by the algorithm, which in turn results in fast convergence rate. RCT, on the other hand, only allows a particle to communicate with itself and two adjacent particles in a ring structure (see Fig. 3.b). It then follows that Due to the overlapping nature of RCT, particles are still attracted by the global best solution found by the algorithm, but unlike GCT, the attraction is not direct, which in turn results in slower convergence rate. However, when implementing both to solve (19), we found that the slower convergence rate of RCT made PSO significantly less likely to converge prematurely when compared to the implementation with GCT. Hence, we propose solving (19) using PSO with RCT.

Swarm Initialization

As is common in most implementations of PSO, the positions and velocities of the entire swarm are initialized using uniform distribution. Let b and b be vectors representing the lower and upper boundaries of the search space, respectively. Initially, we let where .

Particles Travelling Outside the Search Space

The swarm update rule in (23) does not prevent particles from travelling outside the search space, which is especially likely to happen at the first few iterations of the algorithm. As recommended in [27], to prevent a bias towards the center of the search space, the proposed variant of PSO puts little restriction on the trajectory of particles and allows them to travel outside the search boundaries. The idea is that the weighted attraction towards known optima in (22) will anyways pull particles back within the search space regardless of their current position. However, to avoid unfeasible solutions, whenever a particle is found to be outside the search boundaries, its FE, defined by (20) is not computed. Additionally, to prevent particles from developing excessively large velocities, their speeds are bounded by v.

Parameters

The proposed optimization method includes five parameters whose values need to be specified, namely, the number of iterations N, the number of particles N, the constriction factor , and the attraction weights c1 and c2. The first two parameters define the maximum number of FEs computed by the algorithm as . Although no exhaustive search was made, simulation results showed that when restricting the algorithm to N = 5 × 105, a good choice is to let N = 100 and consequently N = 4999. The last three parameters, on the other hand, are rarely tuned in PSO, instead, they are given the default assignments = 0.72894 and c1 = c2 = 2.05. These default assignments are known to satisfy a constraint that guarantees convergence of PSO [27]. Proof of convergence of PSO can be found in [28].

Simultaneous vs. Sequential Swarm Update

The swarm update rules in (22)-(25) can be interpreted in two ways: the update is sequential, that is, particles are updated sequentially, one at a time, using (22)-(25); the update is simultaneous, that is, all particles are updated simultaneously using (22), followed by (23) and so on. In sequential update, particles will tend to steer towards the best known solution faster when compared to simultaneous update, hence the trajectories of the swarms in these two approaches will be different. However, the overall behavior of the algorithm will remain the same. The simultaneous swarm update should be preferred in practice since computation can be parallelized/vectorized for all particles, which is especially convenient due to current trend on GPGPU computing and use of array programming languages such as MATLAB and Python. Nonetheless, perhaps for the sake of simplicity, PSO is generally described in literature using sequential swarm update, including the constricted PSO in [27] used as reference in this paper. We did not find noticeable difference in estimation performance when applying either approach to solve (19). Hence, due to its considerable computational speedup, in this work we use simultaneous swarm update.

Algorithm Summary

The proposed algorithm for solving (19), based on constricted PSO with RCT and simultaneous swarm update, is summarized as follows: Step 1: Initialize the parameters N, N, , c1 and c2 using the values defined in Section IV-E. Step 2: Let t = 0. Step 3: Initialize all x and v, using (29). Step 4: Initialize all p as x. Step 5: Compute FEs for all x, using (20). Step 6: Initialize all g, using (25) and (28). Step 7: Increment t. Step 8: Update all v, using (22). Step 9: Update all x, using (23). Step 10: Compute FEs for all x, using (20). Step 11: Update all p, using (24). Step 12: Update all g, using (25) and (28). Step 13: If , return to Step 7, otherwise, give final solution, using (26).

CRLB

The CRLB places a lower bound on the variance of an unbiased estimator [29]. Hence, it is of interest to derive the CRLB for the problem in this paper to be later used as a benchmark against the proposed estimator based on PSO. The CRLB is given by the inverse of the Fisher information matrix (FIM). The FIM is found by where is the probability density function (PDF) of the measurements vector conditioned on the parameters vector x. In our context, x groups the true values of the unknown parameters in (18) and groups the TDOA measurements in (11). The definition of in (16) allows us to split it into which groups all linearly independent , and which groups all linearly independent . Consequently, and where r and r group the corresponding noise-free TDOAs and , respectively, while and group the corresponding zero-mean noise terms and , respectively. Let us assume that and are independent random vectors in and , where and are the corresponding noise covariance matrices, whose elements are computed using the relationship of TOA and TDOA noise second order statistics in (14). It follows that the PDF of conditioned on x is given by Taking the natural logarithm of both sides of (37) and expanding, we get where C is some constant. The expectation of the double partial in (30) is then found to be where computation of the right-hand side of (39) is straight-forward given the following element-wise partials Hence, derivation of the CRLB is summarized as follows. Define the calibration scenario, that is, define and . Compute s and m, using the rigid motion in (1). Let x group x, x, and . Compute all linearly independent , using (9), and group them into r. Compute all linearly independent , using (10), and group them into r. Assuming knowledge of second order TOA noise statistics, compute Q and Q, using (14). Compute the element-wise partials in (40). Substitute the values of the element-wise partials in (40) into the corresponding entries of the vectorized partials in (39). The CRLB is then given by negation followed by inverse of (39). The final answer is a D×D matrix whose diagonal elements represent the lower bound on the variance, i.e., mean squared error (MSE), of an unbiased estimate of x.

Simulation Experiments

Three experiments were conducted to analyze the performance of the proposed estimator. In all experiments, the speed of sound c was fixed at 343m/s. As illustrated in Fig. 4, PA was defined as a cylindrical array with radius 0.05m and height 0.1m. The coordinates of PA’s loudspeakers were generated by where we let I = 3 for all experiments except experiment 2. The coordinates of PA’s microphones, on the other hand, were fixed throughout all experiments. They are given by where we let J = 6. For the sake of simplicity, SA’s geometry was set equal to that of PA. The degrees of freedom of SA were defined using the parametrization example in Section II-A, that is, R was parametrized by α, β and γ; and t was parametrized by ρ, θ and . In all experiments, the range ρ was constrained within [0,2] (m); and the synchronization offset was constrained within [−1,1] (s), which is easily achievable in paractice. Finally, the five parameters of PSO were set using the values introduced in Section IV-E.

Fig. 4.

PA geometry for I = 3.

In the first experiment, we compare the performance of the proposed estimator with the CRLB for varying TOA noise magnitude. The position and synchronization mismatch of SA with respect to PA were fixed by letting , ρ = 1m and . The TDOA measurements were computed using (12) and (13), where the TOA noise on the right hand side of (13) was simulated using white gaussian noise (WGN) with standard deviation σ. Consequently, we let where and . Considering that the proposed method is being compared to the CRLB, the following performance metrics, based on root mean square error (RMSE), are used which represent orientation, ranging, direction, and synchronization RMSEs, respectively. In the case of CRLB, the expectation terms in (44–47) were given by the corresponding diagonal elements of the CRLB matrix. In the case of PSO, the expectation terms in (44–47) were estimated using the mean over a set of 100 MC simulations. Each MC simulation consisted in rerunning PSO with a different random seed for: simulation of additive TOA noise ; stochastic initialization of swarm positions and velocities in (29); and stochastic update of swarm velocities in (22). Fig. 5 shows that PSO attains the CRLB in all metrics except, perhaps, orientation RMSE at higher values of σ. However, results show that when σ is high, reliable estimation of orientation is not possible.

Fig. 5.

Experiment 1. Performance of PSO vs. CRLB when varying the magnitude of additive TOA noise Δτ.

In the second experiment, we compare the performance of the proposed estimator with the CRLB as the number of loudspeakers in PA and SA vary jointly. The experimental setup was the same as in the first experiment, except for σ, which was fixed by letting . As expected, Fig. 6 shows that performance improves as the number of loudspeakers increases. Note that the CRLB for I = K = 1 is not included due to the FIM being singular. A singular FIM implies that an unbiased estimator does not exist [30], which is consistent with the ambiguity argument in Section III-B. Again, results show that PSO attains the CRLB in most cases.

Fig. 6.

Experiment 2. Performance of PSO vs. CRLB as the numbers of loudspeakers in PA and SA, I, K, respectively, vary jointly in [1,5].

Let us now define three practical calibration scenarios which consist in changing the number of degrees of freedom of SA. We will refer to these scenarios as 7D, 5D and 4D. 7D implies that the calibration algorithm is required to estimate seven parameters, that is, all six degrees of freedom of SA, which, in this context, are given by α, β, γ, ρ, θ and ; plus the synchronization offset . 5D, on the other hand, consists in estimating five parameters by assuming . Similarly, 4D consists in estimating only four parameters by assuming . Note that although 7D is more general, 5D and 4D are very practical, since in many cases the gravitational orientation of devices may be known a priori (5D) and the positioning of devices may be further constrained to a flat surface (4D). In the third and final experiment, we evaluate the 4D, 5D and 7D calibration performance of PSO when TDOA measurements are estimated using acoustic signals in a simulated 5×5×3 (m) room and the range ρ varies in [0.2,2] (m). The room was simulated using the image-source method [31] with fixed reverberation time RT60 = 0.5s. PA was placed in the center of the room. The position and synchronization mismatch of SA with respect to PA were generated randomly for each calibration scenario. Special care had to be taken for the 5D and 7D calibration scenarios to restrict the position of SA within the bounds of the room. The calibration procedure was conducted by exciting each loudspeaker independently with a WGN signal of 1s length sampled at 48kHz. The TOA at each microphone was then estimated by applying GCC-PHAT [20] to the captured and reference signals. Additionally, quadratic interpolation [32] was used for improved TOA resolution. The second order statistics of TOA noise were approximated using (43), where we let σ2 = 1. Two performance metrics are used in this experiment, one is the synchronization RMSE in (47) and the other is the localization RMSE, defined by The expectation terms of (47) and (48) were estimated using the mean over a set of 100 MC simulations. Each MC simulation consisted in rerunning PSO with a different random seed for: simulating the position and synchronization mismatch of SA with respect to PA; stochastic initialization of swarm positions and velocities; and stochastic update of swarm velocities. The results in Fig. 7 show that overall RMSE is exceptionally low at close field but deteriorates with increasing ρ up to a point where it diverges completely due to unreliable TOA estimates caused by reverberation. As expected, results also show that the estimator performance tends to improve when the number of degrees of freedom is reduced. Furthermore, it is shown in Fig. 8 that PSO requires a surprisingly low number of iterations to converge in the 4D and 5D scenarios, while it has difficulties converging in the 7D scenario.

Fig. 7.

Experiment 3. Performance of PSO in 4D, 5D and 7D calibration scenarios as range ρ varies in [0.2,2] (m).

Fig. 8.

Experiment 3. ρ = 1.1m. Convergence rate of PSO in 4D, 5D and 7D calibration scenarios.

Conclusion

A methodology for the joint calibration and synchronization of two arrays of microphones and loudspeakers was presented. The problem is modeled as estimation of the rigid motion of SA with respect to PA, as well as estimation of the synchronization offset between the two. It is assumed that intra-array geometry is known, and intra-array audio input channels are synchronized, assumptions which are generally true in practice. The method consists in using signals emitted by the loudspeakers of PA and SA to compute a set of TOA estimates, which, through a simple transformation, are converted into a set of linearly independent TDOAs. The TDOAs are modeled by a system of nonlinear equations in the unknown parameters of interest, whose ML solution is found by means of optimization of a NWLS problem using a parallelizable variant of constricted PSO with RCT. Additionally, we derived the CRLB for the problem and benchmarked it against PSO in a series of MC simulations. Overall results showed that PSO tends to attain the CRLB. Furthermore, acoustic simulation results showed that the performance of PSO, including convergence rate, can be further improved if the number of degrees of freedom is reduced. Although the presented methodology assumes a network of only two arrays, it can be easily applied to bigger networks. For instance, a WASN could be scaled iteratively as new arrays join the network, or, assuming unambiguous solutions, multiple SAs could be calibrated with respect to a designated PA in parallel. The proposed method can also be used in other applications where precise localization and/or synchronization between two devices is necessary.

2 in total

1. Real-Time Estimation of Direction of Arrival of Speech Source Using Three Microphones.

Authors: Serkan Tokgöz; Anton Kovalyov; Issa Panahi
Journal: IEEE Workshop Signal Process Syst Date: 2020-09-23

2. Real-time dual-channel speech enhancement by VAD assisted MVDR beamformer for hearing aid applications using smartphone.

Authors: Nikhil Shankar; Gautam Shreedhar Bhat; Issa M S Panahi
Journal: Annu Int Conf IEEE Eng Med Biol Soc Date: 2020-07

2 in total