Marek Franaszek1. 1. National Institute of Standards and Technology, Gaithersburg, MD 20899.
Abstract
When two six degrees of freedom (6DOF) datasets are registered, a transformation is sought that minimizes the misalignment between the two datasets. Commonly, the measure of misalignment is the sum of the positional and rotational components. This measure has a dimensional mismatch between the positional component (unbounded and having length units) and the rotational component (bounded and dimensionless). The mismatch can be formally corrected by dividing the positional component by some scale factor with units of length. However, the scale factor is set arbitrarily and, depending on its value, more or less importance is associated with the positional component relative to the rotational component. This may result in a poorer registration. In this paper, a new method is introduced that uses the same form of bounded, dimensionless measure of misalignment for both components. Numerical simulations with a wide range of variances of positional and rotational noise show that the transformation obtained by this method is very close to ground truth. Additionally, knowledge of the contribution of noise to the misalignment from individual components enables the formulation of a rational method to handle noise in 6DOF data.
When two six degrees of freedom (6DOF) datasets are registered, a transformation is sought that minimizes the misalignment between the two datasets. Commonly, the measure of misalignment is the sum of the positional and rotational components. This measure has a dimensional mismatch between the positional component (unbounded and having length units) and the rotational component (bounded and dimensionless). The mismatch can be formally corrected by dividing the positional component by some scale factor with units of length. However, the scale factor is set arbitrarily and, depending on its value, more or less importance is associated with the positional component relative to the rotational component. This may result in a poorer registration. In this paper, a new method is introduced that uses the same form of bounded, dimensionless measure of misalignment for both components. Numerical simulations with a wide range of variances of positional and rotational noise show that the transformation obtained by this method is very close to ground truth. Additionally, knowledge of the contribution of noise to the misalignment from individual components enables the formulation of a rational method to handle noise in 6DOF data.
The pose of a three dimensional rigid body is determined by six degrees of freedom: three coordinates of the position vector (defining, e.g., the location of the center of mass) and three angles (e.g., Euler angles or yaw, pitch, and roll) which uniquely parameterize a 3×3 rotation matrix. A pose measuring instrument outputs position and orientation data in a coordinate frame defined by the pose of the instrument in a global coordinate system. Any data acquired by the instrument from two different locations need to be transformed into one coordinate system through a process known as registration. A similar procedure is required when a robot’s vision system collects data in one coordinate frame while the robot’s arm operates in another coordinate frame (robot-world/hand-eye calibration problem). Due to noise present in acquired 6DOF data (both in their positional and rotational parts) the alignment of two datasets is not perfect. Mathematically, both the registration and calibration problems can be formulated as the minimization of some kind of error measure Epose() where the homogeneous transformation being sought consists of a rotation and a translation.Various techniques have been developed to obtain (see [1] for a comprehensive review). Some methods are based on iterative minimization [2,3], while others provide closed form solutions [4,5]. There are techniques that parameterize rotation with the help of quaternions [6,7]; others use an axis and angle representation [8]. Many approaches follow a separable solution strategy: first calculate the rotational part of and then calculate the translation [9]. Fewer procedures simultaneously solve both the rotational and translational components of [10,11]. Another class of techniques falls into a category of structure from motion with improved L∞ norm optimization [12,13], which enables recovery of a correct location scale [14].Since 6DOF data contain both positional and rotational components, three different strategies of data handling are possible in the search for the best rotation: 1) use only the positional part of 6DOF data; 2) use only the rotational part; or 3) use both the positional and the rotational parts. The first method, 3DOF registration, is used in many applications where only 3D points are acquired [15-17]. Of course, whenever 6DOF data are available, the natural choice is to use all the data without wasting any part of them. However, there is a scaling problem associated with existing registrations based on full 6DOF data. The problem is inherent in the definition of the corresponding error function
where Erot is a dimensionless measure of error related to the rotational part of the data (three angles), while Eloc is the error caused by the positional part of data and is expressed in (length)2 units. This ill-defined expression is a consequence of using a homogenous matrix and its Frobenius norm ║║ as a measure of an overall error
where δ is a 3x3 matrix related to the rotational part of 6DOF data, δ is a vector related to the positional part of data, tr() is a trace of a matrix and T is the matrix transpose of . Since Erot is bounded while Eloc is not, changing length units (say, from mm to µm or to meters) can cause either a contribution of Erot or Eloc to Epose to be ignored. Some attempts to fix this problem are based on a heuristic scale factor λ
with λ having a dimensionality of (length)−2. While introducing λ makes the expression for Epose formally correct, this approach fails to resolve the main problem as it does not provide an objective method of setting a value of λ.A self-adjusting weight approach was proposed for the first time in [18]. The error function was defined as
where
and
are the variances of the rotational and positional noise, respectively. Since both variances are usually not known in laboratory measurements, a multistep minimization of the error function defined in (3) was proposed with
Initially, an arbitrary value was assigned to λ and the first minimization of Epose was performed. The weight factor was then updated to λ = Erot/Eloc and substituted back into (3). Next, the new Epose was minimized again and the whole process was repeated. Figure 2 in [18] shows that regardless of the initial value of λ, after only three steps, λ approaches a constant value, which is interpreted as the correct weight between the rotational and positional components of Epose. Unfortunately, this procedure removes only a mismatch in length units between Eloc and
, so they both can be expressed in the same units, like mm or µm. The procedure does not provide a true measure of relative noise levels, because the limit in (4b) is equally as ill defined as Epose in (1). Depending on the size of the bounding box containing positional data, the same positional noise σloc can be declared as large or small, regardless of the actual value of σrot.In this paper the mismatch between Eloc and Erot in the registration procedure is removed in a different way. Instead of using the Euclidean norm to measure the distance between two sets of 3D points, Eloc is expressed as a sum of squared angular differences between matching vectors. Then, the same form of function can be used to calculate Erot. This formulation ensures that the ratio Eloc/Erot is truly a good measure of the relative amount of noise in the positional and rotational components of 6DOF data. Extensive numerical simulations reveal that it may be more advantageous to use only the positional or the rotational part of data in some experimental conditions.
2. Definition of Error Function
Two 6DOF datasets {, } and {, }, j=1,…,N, are acquired in two different coordinate systems where, for each j, and are two corresponding vectors and and are two corresponding 3x3 rotation matrices. The registration transformation consists of a rotation matrix and a translation vector . In this paper, all rotation matrices are used in an axis and angle representation. The axis can be represented as a unit column vector
where (ϑ,φ) are elevation and azimuth angles, respectively. Then, the matrix of rotation about axis by angle ρ can be expressed by the Rodrigues formula
where is 3x3 identity matrix and
Following separable procedures, the rotation is found first and then the translation is calculated as
where ctr and ctr are centroids of N points {} and {}. So, the hard part of registration is to find a correct rotation . When only 3DOF positional data are available, the typical approach is to move the origins of coordinate systems to the corresponding centroids
and then find a rotation which minimizes the following Euclidean norm
Such defined Eloc has dimensionality of (length)2 and causes problems when 6DOF data need to be registered. To avoid this problem, vectors
and
are normalized
and the error function is defined as
where 0 ≤ w ≤ 1 is a weight factor for a given j-th term and · stands for the dot product of two vectors. The error function so defined gauges the angular misalignment between vectors
and
. This error function can now be minimized to find the best rotation by using any (perhaps gradient based) optimization procedure. The gradient of Eloc can be calculated as
where individual partial derivatives of rotation matrix are expressed as
and derivatives of vector can be explicitly evaluated from (5).Erot can be calculated using a similar form of error function as Eloc in (12). Since is the rotation matrix, its three columns are unit vectors, i.e., the first column (:,1) is a unit vector along the rotated x direction, the second column (:,2) is a unit vector along the rotated y, and (:,3) along the rotated z (and similarly for ). Thus, Erot can be defined as
where 0 ≤ s (k) ≤ 1 is a weight factor for the corresponding (j,k) term. The gradient of Erot can be calculated similarly as for Eloc in (13)
with calculated in (14). Finally, the full error function Epose for registering 6DOF data using both the positional and rotational parts of data is defined as in (1), with Eloc defined by (12) and Erot by (15). The gradient of Epose can be calculated from gradients of Eloc and Erot as defined in (13) and (16), respectively.In order to use the above procedure, the weight factors w in (12) and s (k) in (15) must be defined as well as a starting point for minimization (ϑ0, φ0, ρ0). If there were no noise in the acquired rotational data and , then the following would hold for every j
In reality, a slightly different matrix has to be calculated for each j
where and ρ are the axis and the angle of rotation and
. Then, the normalized mean axis and the mean angle of rotation can be used as the starting point (ϑ0, φ0, ρ0) in a minimization of Epose
where 0 depends on angles (ϑ0, φ0) as in (5). If the experimental rotational data and are not heavily affected by noise, then the starting rotation 0 (ϑ0, φ0, ρ0) should be relatively close to the final best fit rotation * (ϑ*, φ*, ρ*). If only 3DOF positional data are available then 0 can be easily constructed from two corresponding triplets of non-collinear data points.Once the starting point for the minimization is determined, the weight factors w in (12) and s (k) in (15) can be calculated
where
and
are defined in (11) and (:,k) is the k-th column of data matrix (and similarly for ). The rationale behind such defined weight factors is that the components of vectors
and
that are parallel to 0 should be close to each other, and similarly for (:,k) and (:,k). If they are not, then a given j-th term is classified as an outlier and a small value is assigned to w in (12) or s (k) in (15), respectively.
3. Numerical Simulations
In order to test the performance of the introduced procedure, extensive numerical simulations were done. First, a primary 6DOF dataset was generated
. For every j, the unit axis vector
was created as in (5) together with the corresponding rotation matrix
as in (6) and the position vector
. Then, a transformation was selected by setting the translation vector 0 and three angles (ϑGT, φGT, ρGT). The unit axis vector (ϑGT, φGT) was created as in (5) and the rotation matrix
was formed as in (6). Then, the secondary 6DOF dataset
could be created as follows. The position vector is
, where
. The positional noise is represented by (0, f), a vector with three components that are pseudo-random numbers obtained from a generator with Gaussian distribution, zero mean and standard deviation equal to f. The rotation matrix is calculated as
where the rotation matrix was calculated as in (6), h is a standard deviation of angular noise, and [ζ, ω, η] are pseudo-random numbers obtained by the standard Gaussian generator (with zero mean and standard deviation equal to one). In order to make an easier comparison between the effects caused by angular noise (h in radians) and positional noise (f in mm), the standard deviation of positional noise was calculated as
where g is expressed in radians and Lavg is the averaged length of vectors centered at ctr
Once a pair of primary and secondary data was generated, the best fit rotation *(ϑ*,φ*,ρ*) was determined by three different methods. In method 1, only positional data were used and Eloc, defined in (12), was minimized. In method 2, only rotational data were used and Erot, as defined in (15), was minimized. In method 3, full 6DOF data were used and Epose was minimized. Each method yielded slightly different best fit rotations
for k = 1, 2, 3. For each rotation, a deviation d from a ground truth rotation GT was calculated using the Frobenius norm (2)
The matrix GT is an inverse of the matrix
which was used to generate a secondary 6DOF data in (21), i.e.,
For every pair of noise parameters (g, h), the above procedure was repeated many times. First, in order to check the stability of results obtained for different noise realizations, for a given transformation
and primary data, Nnoise secondary datasets were generated
, where n = 1,…, Nnoise. Each n-th dataset was obtained by using different sequences of random numbers [ζ,ω,η] and (0, f) for rotational and positional noise, respectively. In addition to this test, Mdata primary datasets were generated
, where m = 1,…, Mdata. For each m-th primary dataset, Nnoise secondary datasets were generated as described previously
. Finally, Ktrans transformations were created
where k = 1,…, Ktrans. For each k-th transformation, each m-th primary dataset and each n-th random sequence, a secondary dataset
was generated. Overall, for every pair of noise parameters (g, h), a total of Ntot = Nnoise x Mdata x Ktrans secondary datasets were generated. Each secondary dataset was registered to its corresponding primary data using each of the three different methods described earlier.All numerical calculations were performed on a 32-bit PC in double precision. A standard quasi-Newton minimization algorithm as implemented by Davidon-Fletcher-Powell (DFP) in [19] was used in all optimizations. Exit criteria from this iterative procedure were set to 10−6 for both the minimum change in step and the flatness of a gradient. Pseudo-random numbers were generated using the gasdev function provided in [19].
4. Results
Figures 1–5 show the results of simulations obtained for 200 × 200 pairs of noise parameters (g, h). On average, 5–9 iterative steps were needed for the DFP minimization procedure to converge. The length of each dataset (primary or secondary) is N = 10, Nnoise = 16, Mdata = 10, Ktrans = 10, so there are Ntot = 1,600 pairs of (primary, secondary) data requiring registration for each (g, h). Figure 1 shows the mean ratio
averaged over all Ntot cases and displayed in a logarithmic scale
where
is the solution obtained by minimizing Epose for the l-th pair of data. Figure 2 shows which method most frequently delivered the smallest deviation from ground truth d as defined in (24). Figure 3 displays how often a given method delivered the best results.
Fig. 1
Contour plot of the mean ratio
defined in (26) on a logarithmic scale as a function of a standard deviation of positional noise g and rotational noise h. For easier interpretation, both noise parameters are shown in milliradians. In simulations g was converted to millimeters using (22).
Fig. 2
The diagram showing which of the three methods delivered most frequently the best registration. In method 1, Eloc was minimized using only the positional data; in method 2, Erot was minimized using only the rotational data; in method 3, Epose was minimized using full 6DOF data. For each pair of noise parameters (g, h), a total of Ntot = 1,600 registrations of different data pairs were performed.
Fig. 3
The frequency of winning by the best method shown in Fig. 2 vs. positional and rotational noise parameters (g, h).
Fig. 4
The frequency of correct predictions vs. noise parameters (g, h). Based on a value of Eloc / Erot, the two best methods are predicted. The central white area corresponds to the inconclusive region where no prediction could be made because Eloc ≈ Erot. Note that the value of color on the color bar for the highest frequency 1.0 deviates intentionally from a linear scale for better visualization.
Fig. 5
The mean ratio
defined in (27) vs. noise parameters (g, h). As in Fig. 4, the central white part of the plot corresponds to an inconclusive region on (g, h) plane.
Figure 4 shows the outcome of a prediction procedure defined as follows. For each pair of noise parameters (g, h) and each l-th pair of data, the ratio α was calculated as in (26). Then, if α ≤ Tlow or α ≥ Thigh, where Tlow < Thigh are predefined parameters, the prediction was made. If α ≥ Thigh, then the best results were expected to be delivered either by method 2 (minimization of Erot, positional data ignored) or by method 3 (minimization of Epose, full 6DOF data used). Similarly, if α ≤ Tlow, then the best results were expected from either method 1 (minimization of Eloc, rotational data ignored) or from method 3. This prediction was compared with the actual deviations from ground truth d, k = 1,2,3. If the prediction was correct, the number of successful predictions for that pair (g, h) was increased by one. This process was repeated for every l-th pair of data (l = 1,…, Ntot). Displayed in Fig. 4 is the number of successful predictions divided by L, where L is the total number of cases for which α ≤ Tlow (or α ≥ Thigh) for a given (g, h). The central white part of the plot corresponds to an inconclusive region in (g, h) where no prediction could be made, because Tlow <α < Thigh, i.e., L = 0. The results presented were obtained for Tlow = (1/3)2 and Thigh = 32. Figure 5 shows the mean ratio
vs. noise parameters (g, h)
where
is the difference between the two smallest deviations d and d obtained with the predicted two best methods k1 and k2 applied to l-th data pair, and
is the difference between the largest and the smallest deviation for the same l-th pair. Similarly as in Fig. 4, the central white part of the plot corresponds to an inconclusive region in (g, h) where L = 0.
5. Discussion
The results presented in Fig. 1 indicate that, as expected, the ratio
correlates well with the ratio of noise parameters g/h. This is important because the systems that are used for pose determination usually do not provide much information about the noise levels present in positional and rotational data. When the ratio
becomes small, one may expect that discarding the rotational part of 6DOF data may, on average, lead to better registration. Similarly, for
large, discarding the positional part of the full data may on average yield a better result. For intermediate values of
using full 6DOF data should lead to the best results. Figure 2 and 3 confirm that these intuitive expectations are indeed correct.For small or large values of
, the procedure introduced in this paper predicts very well which of the three registration methods provides the best results. The data shown in Fig. 4 indicate that the highest ratio of false predictions does not exceed 0.08 (in a region of small
, around noise parameters (g, h) ≈ (15, 120)). For the majority of (g, h) where either
or
, the prediction rate is exactly 1 or very close to 1.Overall, the data shown in Figs. 2–5 confirm that minimization of Epose (the third method, full 6DOF data used) yields consistently good results in a whole range of investigated noise parameters (g, h). The third method most frequently delivers the best or second best result. In the latter case, for
or
, the difference between the two best methods is in most cases two orders of magnitude smaller than the difference between the worst and the best method; see Fig. 5. This means there is a large difference between the worst method and the remaining two methods. At the same time, differences between the two best methods are small. Thus, using full 6DOF data and minimizing Epose seems to be the best or close to the best strategy across the whole range of noise parameters (g, h). For very small or very large values of the ratio Eloc (vpose)/Erot (vpose), it may be worthwhile to discard the noisy part of the 6DOF data and to redo the minimization using Eloc or Erot with only positional or rotational data, respectively. A similar conclusion was formulated already in [3] without a systematic study of the mutual relation between positional and rotational noise. That observation was based on particular parameters used in the simulations: length of position vector ║║ ∈ [500 mm, 1000 mm], length of translation ║0║ = 800 mm, positional noise 1 mm and rotational noise 2.5 mrad (using a uniform random number generator). However, during the simulations, ║0║ and ║║ were calculated in meters. Only positional data were also used in another hand-eye calibration procedure. The minimum variance method introduced in [20] delivered better results than two other methods [8,18]. Systematic studies presented in this paper explain why this apparently surprising conclusion can be correct.One may wonder why for small or large values of
the predictions are not perfect, as the data in Fig. 4 indicate. However, it should be remembered that a residual value of the error function (like Erot or Eloc) provides only an indication of the noise level present in experimental data, not the actual deviation of best fit parameters from the unknown ground truth.
6. Conclusions
Performance of the iterative minimization procedure for registration of 6DOF noisy data was studied in computer simulations. The properly defined error function Epose removed the mismatch between the positional component of the error (unbounded, in length units) and the bounded, dimensionless rotational component. The error function Epose can be minimized and the resulting rotation matrix provides a good approximation to the true rotation across a large range of positional and rotational noise variances. Thus, both the developers and the users of pose determining systems could benefit from being able to properly gauge a relative amount of noise in the positional and the rotational parts of 6DOF data.