Literature DB >> 32260451

Robust Outlier-Adaptive Filtering for Vision-Aided Inertial Navigation.

Abstract

With the advent of unmanned aerial vehicles (UAVs), a major area of interest in the research field of UAVs has been vision-aided inertial navigation systems (V-INS). In the front-end of V-INS, image processing extracts information about the surrounding environment and determines features or points of interest. With the extracted vision data and inertial measurement unit (IMU) dead reckoning, the most widely used algorithm for estimating vehicle and feature states in the back-end of V-INS is an extended Kalman filter (EKF). An important assumption of the EKF is Gaussian white noise. In fact, measurement outliers that arise in various realistic conditions are often non-Gaussian. A lack of compensation for unknown noise parameters often leads to a serious impact on the reliability and robustness of these navigation systems. To compensate for uncertainties of the outliers, we require modified versions of the estimator or the incorporation of other techniques into the filter. The main purpose of this paper is to develop accurate and robust V-INS for UAVs, in particular, those for situations pertaining to such unknown outliers. Feature correspondence in image processing front-end rejects vision outliers, and then a statistic test in filtering back-end detects the remaining outliers of the vision data. For frequent outliers occurrence, variational approximation for Bayesian inference derives a way to compute the optimal noise precision matrices of the measurement outliers. The overall process of outlier removal and adaptation is referred to here as "outlier-adaptive filtering". Even though almost all approaches of V-INS remove outliers by some method, few researchers have treated outlier adaptation in V-INS in much detail. Here, results from flight datasets validate the improved accuracy of V-INS employing the proposed outlier-adaptive filtering framework.

Entities: Chemical Disease Species

Keywords: EKF; IMU; UAV; V-INS; adaptive filtering; camera vision; computer vision; image processing; navigation; outlier rejection; sensor fusion

Year: 2020 PMID： 32260451 PMCID： PMC7181286 DOI： 10.3390/s20072036

Source DB: PubMed Journal: Sensors (Basel) ISSN： 1424-8220 Impact factor: 3.576

1. Introduction

The most widely used algorithms for estimating the states of a dynamic system are a Kalman Filter [1,2] and its nonlinear versions such as an extended Kalman filter (EKF) [3,4]. After the NASA Ames Research Center implemented the Kalman filter into navigation computers to estimate the trajectory of the Apollo program, engineers have developed a myriad of applications of the Kalman filter in navigation system research areas [5]. For example, Magree and Johnson [6] developed a simultaneous localization and mapping (SLAM) architecture with improved numerical stability based on the UD factored EKF, and Song et al. [7] proposed a new EKF system that loosely fuses both absolute state measurements and relative state measurements. Furthermore, Mostafa et al. [8] integrated radar odometry and visual odometry via EKF to help overcome their limitations in navigation. Despite the development of numerous applications of the Kalman filter in various fields, it suffers from inaccurate estimation when required assumptions fail. Estimation using a Kalman filter is optimal when process and measurement noise are Gaussian. However, sensor measurements are often corrupted by unmodeled non-Gaussian or heavy-tailed noise. An abnormal value relative to an overall pattern of the nominal Gaussian noise distribution is called an outlier. In other words, in statistics, an outlier is an observation that deviates so much from other observations as to arouse suspicion that it is generated by a different mechanism [9]. Such outliers have many anomalous causes. They arise due to unanticipated changes in system behavior (e.g., temporary sensor failure or transient environmental disturbance) or unmodeled factors (e.g., human errors or unknown characteristics of intrinsic noise). As an example of measurement outliers in many navigation systems, either computer vision data contaminated by outliers or sonar data corrupted by phase noise lead to erroneous measurements. Process outliers also occur by chance. Inertial measurement unit (IMU) dead reckoning and wheel odometry as a proxy often generate inaccurate dynamic models in visual-inertial odometry (VIO) and SLAM algorithms, respectively. Without accounting for outliers, the accuracy of the estimator significantly degrades, and control systems that rely on high-quality estimation can diverge.

1.1. Related Work

1.1.1. State Estimation for Measurements with Outliers

As the performance of the Kalman filter degrades at the presence of measurement outliers, many researchers have investigated other approaches to mitigate the impact of the outliers. Mehra [10] created adaptive filtering with the identification of noise covariance matrices and showed the asymptotic convergence of the estimates towards their true values. Maybeck [11] and Stengel [12] found other noise-adaptive filtering such as covariance matching. However, all of these filters performed only offline and required filter tuning. To estimate parameter values in unknown covariances without the need for manual parameter tuning, Ting et al. [13] used a variational expectation–maximization (EM) framework. That is, they introduced a scalar weight for each observed data sample and modeled the weights to be Gamma distributed random variables. However, it assumed that noise characteristics are homogeneous across all measurements even though sensors have distinct properties. Särkkä and Nummenmaa [14] provided the online learning of the parameters of the measurement noise variance, but to simultaneously track the system states and the levels of sensor noise, they additionally defined a heuristic transition model for the noise parameters. Piché et al. [15] developed Gaussian assumed density filtering and smoothing framework for nonlinear systems using the multivariate Student t-distribution, and Roth et al. [16] included an approximation step for heavy-tailed process noise, but these filters are not applicable in high dimensions. Next, Solin and Särkkä [17] found that the added flexibility of Student-t processes over Gaussian processes robustifies inference in outlier-contaminated noisy data, but they treated only analytic solutions enabled by the noise entanglement. Recently, Agamennoni et al. developed the outlier-robust Kalman filter (ORKF) [18,19] to obtain the optimal precision matrices of measurement outliers by variational approximation for Bayesian inference [20]. However, this method requires iterations at every time, even when observed data contain no outliers. Graham et al. also established the -norm filter [21] for both types of sparse outliers. However, the filter might not work for nonlinear systems since they derived the constraint of -norm optimization based on only linear system equations. Similar to the ORKF, the -norm filter needs the constrained optimization at all times, even when no additional noise present as outliers. Therefore, these two approaches demand some extensive computational complexity for either iterations or optimization. As outliers do not always arise (i.e., are rare), we reduce such computation cost if a test detects the time when outliers occur. All of the above methods were not validated for complicated systems such as unmanned aerial vehicles or vision-aided inertial navigation [6,22,23] or with sequential measurement updates [24,25].

1.1.2. Outlier Rejection Techniques

One of the primary problems in vision-aided inertial navigation systems (V-INS) is incorrect vision data correspondence or association. Matched features between two different camera views are corrupted by outliers because of image noise, occlusions, and illumination changes that are not modeled by the feature matching techniques. To provide cleaned measurement data to the filter, outlier removal in image processing front-end is essential. One of standard outlier rejection techniques is RANdom SAmple Consensus (RANSAC) [26]. RANSAC is an iterative approach to estimate the parameters of a mathematical model from a set of observed data contaminated by outliers. An underlying assumption is that the data consist of inliers whose distribution is described by some set of the parameters of the model and outliers that do not fit the model. The generated parameters are then verified on the remaining subset of the data, and the model with the highest consensus is a selected solution. In particular, 2-point RANSAC [27,28] is an extended RANSAC-based method for two consecutive views of a camera rigidly mounted on a vehicle platform. Given gyroscopic data from IMU measurements, randomly selected two-feature correspondences hypothesize an ego-motion of the vehicle. This motion constraint discards wrong data associations in the feature matching processes. For detecting remaining outliers that are not rejected in the image processing front-end, outlier detection tests are required in filtering back-end. Most of the statistical tests [29] that require access to the entire set of data samples for detecting outliers might not be a viable option in real-time applications. For example, the typicality and eccentricity data analysis [30,31] used in [32] is an inadequate measure in V-INS, as computing the means and the variances of each residual of sequential measurements is challenging. In V-INS, the tracking of some measured features is possibly lost due to out of sight, and new feature measurements are coming for initialization. For the real-time outlier detection of sequential measurements in V-INS, the Mahalanobis gating test [33] is a useful measure based on the analysis of residual and covariance signals at each feature measurement. The approach builds upon each Mahalanobis distance [34] of residuals and compares each value against a threshold given by the quantile of the chi-squared distribution with a certain degree of freedom. The confidence level of the threshold is designated prior to examining the data. Most commonly, the 95% confidence level is used. This hypothesis testing, called goodness of fit, is a commonly used outlier detection method in practice. Because of such suitability of the Mahalanobis gating test to real-time detection in V-INS, this paper combines the test with the ORKF [18,32] to detect and handle measurement outliers in vision-aided estimation problems. Similar to the derivation of update steps for handling measurement outliers in the ORKF, for computing the optimal precision matrices of unmodeled outliers in V-INS, Section 4 will derive feasible update procedures by variational inference. In other words, whenever unexpected outliers appear, the outlier-adaptive filtering in this paper updates and marginalizes measurement outliers to improve the robustness of the navigation systems.

1.2. Summary of Contributions

This paper presents improving the use of outlier removal techniques in image processing front-end and the development of a robust and adaptive state estimation framework for V-INS when frequent outliers occur. For outlier removal in the image processing front-end of V-INS, feature correspondence constitutes the following three steps: tracking, stereo matching, and 2-point RANSAC. To estimate the states of V-INS in which vision measurements still contain remaining outliers, we propose a novel approach that combines a real-time outlier detection technique with an extended version of the ORKF in the filtering back-end of V-INS. Therefore, our approach does not restrict noise at either a constant or Gaussian level in filtering. The testing results of benchmark flight datasets show that our approach leads to greater improvement in accuracy and robustness under severe illumination environments. Starting from the architecture of the existing vision-aided inertial navigation system, this paper more focuses on contributing to the development of red boxes in Figure 3.

1.3. A Guide to This Document

The remainder of this document contains the following sections. Section 2 introduces background for all of this study. To estimate the states of V-INS in which frequent outliers arise, Section 3 examines outlier rejection techniques in image processing front-end, and Section 4 and Section 5 formulate a novel implementation of robust outlier-adaptive filtering. Section 6 shows testing results of this study on benchmark flight datasets. The last section concludes and plans future work.

2. Preliminaries

2.1. The Extended Kalman Filter

The system equations with continuous-time dynamics and a discrete-time sensor are as follows, where is the state and a measurement. and are the nonlinear dynamic and measurement functions, respectively. Let us assume that these functions are known based on each equation of motion and modeling. To clarify, t denotes continuous time, subscript k represents the k-th time step, and initial condition is given. Moreover, let us assume that both propagation and measurements are corrupted by additive zero-mean white Gaussian noise; that is, and .

2.1.1. Time Update

To estimate the state variables of the system, we design a hybrid EKF in the following steps. In the propagation step, state estimate and its error-covariance are integrated from time to time with respect to variable where let . Hat “ ” denotes an estimate, and superscripts − and + represent a priori and a posteriori estimates, respectively. Here, for one numerical solution of the ordinary differential equation, we use the Heun’s method [35] that refers to the improved Euler’s method or a similar two-stage Runge–Kutta method. Jacobian A, B, and state transition matrix are defined by where . Letting , the time update of error covariance is

2.1.2. Measurement Update

Using actual sensor measurements, the measurement update step of the EKF corrects state estimates and its corresponding error covariance after propagation. Letting , at every time k when is measured, where K is called the Kalman gain and Jacobian C is defined as Equation (7) is the Joseph’s form [36] of the covariance measurement update, so this form preserves its symmetry and positive definite. For more details such as optimality and derivation, see References [24,37].

2.1.3. Sequential Measurement Update

When myriad measurements are observed at one time, sequential Kalman filtering is useful. In fact, we obtain N measurements, , at time k; that is, we first measure , then , ⋯, and finally , shown in Figure 1.

Figure 1

A schematic of the sequential measurement update.

We first initialize a posteriori estimate and covariance after zero measurement is processed; that is, they are equal to the a priori estimate and covariance. For , perform the general measurement update using the i-th measurement. We lastly assign the a posteriori estimate and covariance as and . Based on Simon [24]’s proof that the sequential Kalman filtering is equivalent formulation of the standard EKF, the order of updates does not affect overall performance of estimation.

2.2. Models of Vision-Aided Inertial Navigation

2.2.1. Vehicle Model

The nonlinear dynamics of a vehicle is driven by raw micro-electromechanical system (MEMS) IMU sensor data including specific force and angular velocity inputs. The estimated vehicle state is given by where , are the position and velocity of the vehicle with respect to the inertial frame, respectively. is the error quaternion of the attitude of the vehicle, and its more details are explained in [38,39,40]. and are the acceleration and gyroscope biases of the IMU, respectively. Left superscript i denotes a vector expressed in the inertial frame. The EKF propagates the vehicle state vector by dead reckoning with data from the IMU. Raw MEMS IMU sensor measurements and are corrupted by noise and bias as follows, where and are the true acceleration and angular rate, respectively, and g is the gravitational acceleration in the inertial frame. and are zero-mean, white, Gaussian noise of the accelerometer and gyroscope measurement, and denotes the rotation matrix from the inertial frame to the body frame. and in Equations (9) and (10) are the random walk rate of the acceleration and gyroscope biases. From the works in [40,41,42], the MEMS accelerometer and gyroscope are subject to flicker noise in the electronics and other components susceptible to random flickering. The flicker noise causes their biases to wander over time. Such bias fluctuations are usually modeled as a random walk. In other words, slow variations in the bias of the IMU sensor are modeled with a “Brownian motion” process, also termed a random walk in discrete-time. In practice, the biases are constrained to be within some range, and thus the random walk model is only a good approximation to the true process for short periods of time. The vehicle dynamics is given by where is a skew symmetric matrix, and function maps a 3 by 1 vector of the angular velocity into a 4 by 4 matrix [25]. The use of the 4 by 1 quaternion representation in state estimation causes the covariance matrix to become singular, so it requires considerable accounting for the quaternion constraints. To avoid these difficulties, engineers developed the error-state Kalman filter in which 3 by 1 infinitesimal error quaternion is used instead of 4 by 1 quaternion q in the state vector. In other words, we use attitude error quaternion to express the incremental difference between tracked reference body frame and actual body frame b for the vehicle. Jacobian matrix and are computed in Appendix A.

2.2.2. Camera Model

An intrinsically calibrated pinhole camera model [27,43] is given by where x is the state vector including the vehicle state and the feature state, and measurement is the j-th feature 2D location on the image plane. and are the horizontal and vertical focal lengths, respectively, and and are additive, zero-mean, white, Gaussian noise of the measurement. Vectors , are the j-th feature 3D position with respect to the camera frame and the inertial frame, respectively. Extrinsic parameter and are known and constant. Jacobian matrix is computed in Appendix A. In addition, from Equation (19), if j-th measurement on an image is a new feature, then is unknown so need to be initialized. Details of feature initialization are explained in Appendix B.

3. Outlier Rejection in Image Processing Front-End

3.1. Feature Correspondence

In this study, a feature detector using the Features from Accelerated Segment Test (FAST) algorithm [44,45] maintains a minimum number of features in each image. For each new image, a feature extractor using the Kanade–Lucas–Tomasi (KLT) sparse optical flow algorithm [46] tracks the existing features. Even though Paul et al. [47] proved that descriptor-based methods for temporal feature tracking are more accurate than KLT-based methods, as Sun et al. [48] found that descriptor-based methods require much more computing resource with a small gain in accuracy, we employ the KLT optical flow algorithm in the image processing front-end of this study. Next, our stereo matching using a fixed baseline stereo configuration also applies to the KLT optical flow algorithm for saving computational loads compared to other stereo matching approaches. With the matched features, a 2-point RANSAC [26] is applied to remove remaining outliers by utilizing the RANSAC step in the fundamental matrix test [27]. In the scope of this study, we implement the 2-point RANSAC algorithm by simply running one of open source codes. Similar to [48,49], our outlier rejection is composed of three steps, shown in Figure 2. We assume that features from previous and images are outlier-rejected points, where and are left and right camera frames of a stereo camera, respectively. The three steps form a closed loop of previous and current frames of left and right cameras. The first step is the stereo matching of tracked features on current image to image. The next steps are applying 2-point RANSAC between previous and current images of the left camera and another 2-point RANSAC between previous and current images of the right camera. For steps 2 and 3, stereo-matched features are directly used in each RANSAC.

Figure 2

Close loop steps of outlier rejection in image processing front-end.

3.2. Algorithm of Feature Correspondence

Algorithm 1 summarizes the feature correspondence for outlier rejection. For the scope of this paper, the OpenCV library [50] and open source codes of RANSAC are extremely useful and directly applied. Feature Tracking: functionbuildOpticalFlowPyramid(current or image) ▹ OpenCV [50] return pyramid of current or end function functionpredictFeatures(outlier-rejected points of previous , of , Intrinsic ) return predicted features of current end function functioncalcOpticalFlowPyrLK(pyramids of previous and current , outlier-rejected points of previous , predicted features of current ) ▹ OpenCV [50] return tracked points of previous and , tracked features of current end function Stereo Matching: functionstereoMatching(tracked points of previous and , tracked features of current ) Initialize points by projecting the tracked features of current to using the rotation from stereo extrinsic function calcOpticalFlowPyrLK(pyramid of current and , tracked features of current , initialized points) ▹ OpenCV [50] end function Further remove outliers based on the essential matrix return matched points of previous and , matched features of current and end function 2-Point RANSAC: functiontwoPointRansac(matched points of previous or , matched features of current or , of or , Intrinsic of or ) return outlier-rejected points of current or end function Addition of Newly Detected Features: Create a mask to avoid re-detecting existing features functionfastFeatureDetector(current image, mask) return new features on current end function functionstereoMatching(new features on current ) return matched new features on current end function Group all of outlier-rejected features In Algorithm 1, Pyramid is a type of multi-scale signal representation in which an image is subject to repeated smoothing and sub-sampling.

4. Outlier Adaptation in Filtering Back-End

Even though image processing front-end removes outliers by tracking, stereo matching, and 2-point RANSAC, some outlier features still survive and enter the filter as inputs. This section explains the outlier rejection procedure in filtering back-end.

4.1. Outlier Removal in Feature Initialization

If a measurement is a new feature, our system initializes its 3D position with respect to the inertial frame. In feature initialization, Gaussian–Newton least-squares minimization first estimates the depth of left camera. If either the estimated depth of the left or right camera is negative, then the solution of the minimization is invalid since features are always in front of both camera frames observing them. This process of removing features that has the invalid depth is referred to as outlier removal in feature initialization.

4.2. Outlier Detection by Chi-Squared Statistical Test

Before operating navigation systems, we initialize the chi-squared test table with a 95% confidence level. While the systems estimate the state variable, if j-th measurement at time k is the existing feature, then its residual and Jacobian are computed. Next, we proceed a Mahalanobis gating test [33] for residual to detect remaining outliers. In fact, Mahalanobis distance [34] is a measure of the distance between residual and covariance matrix In the statistic test, we compare value against a threshold given by the 95-th percentile of the distribution with degree of freedom. Here, is the number of observations of the j-th feature minus one. If the feature passes the test, the EKF uses residual to process the measurement update.

4.3. Outlier-Adaptive Filtering

Unlike the extended ORKF (EORKF) [32], for a practical estimation approach in V-INS, this study investigates only measurement outliers due to the following reasons. As the measurement update is not the process performed at every time step, the outlier detection by each residual value cannot directly detect the outliers of IMU measurements. Furthermore, in the sequential measurement update, multiple residuals are computed to update at one IMU time stamp. In other words, as only rare observations among feature measurements from one image are corrupted by the remaining outliers, hypothesizing that the outliers come from the IMU may be faulty. Therefore, in the scope of this paper, we handle only measurement outliers.

4.3.1. Student’s t-Distribution

Despite the true system with outliers, the classical EKF assumes that each model in the filter is corrupted with additive white Gaussian noise. The levels of the noise are assumed to be constant and encoded by sensor covariance matrices Q and R (i.e., ). However, as outliers arise in the realistic system, now we do not restrict noise at either a constant or Gaussian level. Instead, their levels vary over time, or noise have heavier tails than the normal distribution as follows, where denotes a Student’s t-distribution, and is degrees of freedom. Covariance matrix follows the inverse-Wishart distribution, denoted as . is precision matrix. In Bayesian statistics, the inverse-Wishart distribution is used as the conjugate prior for the covariance matrix of a multivariate normal distribution [20]. The probability density function (pdf) of the inverse-Wishart is where denotes the trace of a square matrix in linear algebra. Moreover, in probability and statistics, a Student’s t-distribution is any member of a family of continuous probability distributions that arises when estimating the mean of a normally distributed population in situations where the standard deviation of the population is unknown [51]. Whereas a normal distribution describes a full population, a t-distribution describes samples drawn from a full population; thus, the larger the sample, the more the distribution resembles a normal distribution. Indeed, as the degree of freedom goes to infinity, the t-distribution approaches the standard normal distribution. In other words, when the variance of a normally distributed random variable is unknown and a conjugate prior placed over it that follows an inverse-Wishart distribution, the resulting marginal distribution of the variable follows a Student’s t-distribution [52]. Then, the Student-t, a sub-exponential distribution with much heavier tails than the Gaussian, is more prone to producing outlying values that fall far from its mean.

4.3.2. Variational Inference

The purpose of filtering is generally to find the approximations of posterior distributions , where is the histories of sensor measurements obtained up to time k. For systems with the heavy-tailed noise, we also wish to produce another inference about covariance matrices whose priors follow the inverse-Wishart distribution. Hence, our goal in this section is to find both approximations for posterior distribution and model evidence . Compared to sampling methods, the variational Bayesian method performs approximate posterior inference at low computational cost for a wide range of models [20,52]. In the method, we decompose log marginal probability where p is the true distribution that is intractable for non-Gaussian noise models, and q is a tractable approximate distribution. In probability theory, a measure of the difference between two probability distributions p and q is the Kullback–Leibler divergence, denoted as . If we allow any possible choice for q such as the Gaussian distribution, then lower bound is maximum when the KL divergence vanishes; that is, . To minimize the KL divergence, we seek the member of a restricted family of . Indeed, maximizing is equivalent to minimizing another new KL divergence [52], and thus the minimum occurs when factorized distributions and the following Equations (25) and (26) hold, where represents the expectation with respect to . With assuming that initial state is Gaussian, the measurement update with varying noise covariance , which closely resemble the EKF updates, solve Equation (25). Algorithm 2 describes the details of the updates. Now let us assume that the true priors are IID noise models as the case in this study; that is, follows distribution. Then, second term in the right-hand side of Equation (26) is computed using the pdf of the inverse-Wishart distribution in Equation (21) with its prior noise model. As the term is conjugate prior for Equation (20), the approximations of have same mathematical forms as priors; that is, also follows distribution. As , From Equations (26)–(29), to handle measurement outliers, similar to Agamennoni et al. [18,19]’s derivation, we derive how to compute precision matrix of approximate distribution of as follows, where each feature from one image is independent and Next, in Equation (30), where estimation error and the Jacobian . In the sequential measurement update, and are corrected by Kalman gain that is a function of , so these update steps are coupled. Hence no a closed-form solution exists, and we can only solve iteratively. The purpose of the iteration seems to be similar to that of the online learning of unknown variances of each noise [10]. In addition, similar to Agamennoni et al.’s interpretation [19], the convergence and optimality of the derived update steps for outliers are guaranteed since the variational lower bound is convex with respect to , , and . In particular, as the j-th feature is observed countless times (i.e., →∞), converges to R in the limit of an infinitely precise noise distribution, so the iterative update steps reduce to the standard sequential measurement update of the EKF. If true state is significantly different from its estimate , then statistics dominates, and becomes much larger than R. This is regarded as a measurement outlier at time k. As Kalman gain is a function of the inverse of precision matrix , the larger values, the smaller the Kalman gain. Therefore, to deal with situations where measurement outliers occur, the iteration for Equations (30) and (31) corrects the state estimates and its covariance with low weights.

5. Implementation

5.1. Marginalization of Feature States

If measurement outliers often occur, a few numbers of sequential updates in the EKF are proceeded to correct the state estimates. Without a sufficient number of measurement updates, the EKF is not robust and even diverges. Hence, outlier-adaptive filtering introduced in Section 4.3 performs the modified measurement update even when a residual is detected as an outlier. Indeed, to save computation resources, this study operates the outlier-adaptive filtering for only features detected frequently outliers. For implementation, we count how many numbers of features augmented in state variables are detected as outliers. Once updating feature outliers by the outlier-adaptive filtering approach, we prune the used feature states from the state vector (Figure 3). In addition, similar to that mentioned in Appendix B, to maintain a certain size of the state vector, after the feature initialization, we marginalize the features with the least number of observations among tracked features.

Figure 3

A block diagram of the vision-aided inertial navigation system employing the outlier-adaptive filtering.

5.2. Summarized Algorithm

This section summarizes and describes an implementation of the proposed method. Figure 4 and Algorithm 2 illustrate a flow chart and the pseudocode of the overall process of the outlier-adaptive filtering approach for V-INS, respectively. For the robust outlier-adaptive filter presented in this paper, the blue boxes in Figure 4 are extended from the figure in [25].

Figure 4

A flow chart of the overall process of the outlier-adaptive filtering.

fordo ▹ Image processing front-end in different thread if new image capture then Image Processing: ▹ Algorithm 1 Stereo matching between current images of left camera and right camera RANSAC between previous and current images of camera RANSAC between previous and current images of camera end if ▹ Filtering back-end in different thread if new IMU packet arrival then Time Update: State prediction end if if new vision data packet arrival then for of observed features N do if new feature then Feature Initialization: If any depth of or is negative, j-th feature is outlier else ▹ tracked feature Outlier Gating Test: ▹ Measurement Updates: if outlier detected then ▹ while until converged do Update measurement noise given the state Update the posteriori state given noise end while else ▹ standard sequential EKF in Section 2 end if end if end for ▹ end if end for

6. Flight Datasets Test Results

To examine the influence of outliers in V-INS and validate the reliability and robustness of the proposed outlier-adaptive approach for navigation systems with outliers, we test one of benchmark flight datasets, the so-called “EuRoC MAV datasets” [53]. The visual-inertial sequences of the datasets were recorded onboard a micro aerial vehicle (MAV) while a pilot manually flew around the indoor motion capture environment. For more details, see Appendix C. To articulate the significance of outliers, we select two datasets of the bright scene, called “EuRoC V1 Easy”, and motion blur, called “EuRoC V1 Difficult.” As the images in the difficult dataset are dark scene or motion blur, we hypothesize that outliers occur more frequently in the difficult dataset. The EKF estimates the relative location and orientation from a starting point. As we do not know the exact absolute location of the origin of given datasets, to compare with ground-truth data given in the datasets, we require certain evaluation error metrics such as so-call “absolute trajectory error [54]”. For more details, see Appendix D. The absolute trajectory error as an evaluation error metric yields the following various comparison plots. Figure 5 illustrates the top-down view of the estimated flight trajectory of the difficult dataset.

Figure 5

Top-down view of flight trajectory of the EuRoC V1 difficult dataset by the outlier-adaptive filter.

Figure 6 depict the estimated x, y, z position and their respective estimation errors. All the estimation errors are bounded within each standard deviation envelope, so the proposed approach is reliable vision-aided inertial navigation under even poor illumination environments. Conceptually, the position error gets locked in, and it tends to increase with the length of the trajectory until new features are being mapped.

Figure 6

Position and estimation error of the EuRoC V1 difficult dataset by the outlier-adaptive filter.

Similar to the analysis presented in [25], the adaptive filter is a well-tuned estimator since the performance of doing runs with ×3 or ×10 ( /3 or /10) multiplier on the R matrix used in the filter is worse for all of those, shown in Table 1. That is, the fact that using the multipliers reveals larger root mean square (RMS) estimation errors indicates that our filter is well-tuned.

Table 1

Indication that the outlier-adaptive filter is well-tuned for the EuRoC V1 difficult dataset.

Multiplier on R		/10	/3	1	×3	×10
RMS error [m]		0.9240	0.3801	0.1700	0.5153	0.5610

Figure 7 shows the advantages of the addition of the outlier adaptation proposed in this paper by comparing it with a baseline, the standard EKF in V-INS.

Figure 7

Box plot of absolute estimation error of the position of the EuRoC V1 difficult dataset by the outlier-adaptive filter.

As Lee [25] and other researchers showed that the standard EKF is a basic filter in V-INS, we choose the method as a baseline here. The baseline only rejects outliers whenever the chi-squared test fails, whereas the outlier-adaptive filtering follows all proposed adaptive approaches in Algorithm 2. Although the iteration in the outlier-adaptive filtering might increase computational resources, it significantly improves the accuracy of estimation. Fortunately, the “while” loop iteration in Algorithm 2 rapidly converges to the optimal noise covariance by twice or three times iterations. For sensitivity analysis, RMS position errors resulting from the baseline and the outlier-adaptive filtering are compiled in Table 2.

Table 2

Sensitivity analysis in RMS position error [m] of the outlier-adaptive filtering.

Dataset	EuRoC V1 Easy	EuRoC V1 Difficult
	Slow Motion 0.41 m/s, 16.0 deg/s	Fast Motion 0.75 m/s, 35.5 deg/s
Method	Bright Scene	Motion Blur
Baseline	0.2558	0.3656
Outlier-Adaptive	0.2237	0.2264

Motion blur datasets are more sensitive to outliers as the improvement is larger when applied to those datasets. Although the outlier-adaptive filtering is the best choice for motion blur datasets, we can select an adequate mode depending on computation margin and cost. Although a number of researchers have investigated V-INS of the EuRoC datasets [55], only a few of them thoroughly has focused on vision measurements with outliers. Table 3 reveals that the proposed estimator, the outlier-adaptive filter, outperforms other state-of-the-art V-INS techniques, called “SVO+MSF [56,57]” and “S-MSCKF [48,58] ” in which stereo is available. As SVO+MSF is loosely coupled, its algorithm actually gets diverged.

Table 3

Comparison with other methods in RMS position error [m] of the outlier-adaptive filtering.

Dataset	EuRoC V1 Easy	EuRoC V1 Difficult
Method	Bright Scene	Motion Blur
Outlier-Adaptive Filter	0.2237	0.2264
SVO+MSF [56,57] (loosely coupled)	0.40	×
S-MSCKF [48,58] (stereo-filter)	0.34	0.67

7. Discussion

This paper has presented practical outlier-adaptive filtering for a vision-aided inertial navigation systems (V-INS) and evaluated its performance with flight datasets testing. In other words, this study develops a robust and adaptive state estimation framework for V-INS under frequent outliers occurrence. In the image processing front-end of the framework, we propose the improved utilization of outlier removal techniques. In filtering back-end, for estimating the states of V-INS with measurement outliers, we implement a novel approach of the outlier-robust extended Kalman filter (EKF) to V-INS, for which we derive iterative update steps for computing the precision noise matrices of vision outliers when the Mahalanobis gating test detects remaining outliers. To validate the accuracy of the proposed approach and compare it with other state-of-the-art V-INS algorithms, we test the performance of V-INS employing the outlier-adaptive filtering algorithm in the realistic benchmark flight datasets. In particular, to show more improvements of our method over the others’ approaches, we use the fast motion and motion blur flight datasets. Results from the flight datasets testing show that the novel navigation approach in this study improves the accuracy and reliability of state estimation in V-INS with frequent outliers. Using the outlier-adaptive filtering reduces the root mean square (RMS) error of the estimates and accelerates the robustness of the estimates, especially for the motion blur datasets. The primary goals of future work are listed as follows. Since an inertial measurement unit (IMU) is also a sensor, it could generate outliers in V-INS. With accounting for the process outliers, the accuracy and robustness of the estimator would be improved. If we distinguish process outliers from IMU sensors with measurement outliers from vision data, the extended outlier-robust EKF [32] may be an impressive and innovate approach for this case. Furthermore, the investigation of color noise in V-INS is another possible future work. One of the required assumptions of the Kalman filter is the whiteness of measurement noise. As an illustration, during sampling and transmission in image processing, color noise that may be originated from a multiplicity of sources could degrade the quality of images [59]. The vibrational effects of camera sensors might also produce colored measurement noise [60]. That is, if the residuals of vision data are correlated with themselves at different timestamp, then colored measurement noise occurs in V-INS. Therefore, the images with color noise would be filtered for ensuring the accuracy of locating landmarks. As modeling noise without additional prior knowledge of the noise statistics is typically difficult, the machine learning techniques-based state estimator for colored noise [61,62] may handle the unknown correlations in V-INS. This study showcases the approaches using stereo cameras but is also suitable for monocular V-INS and employable to other filter-based V-INS frameworks. We test benchmark flight datasets to validate the reliability and robustness of this study, but additional validating with other flight datasets or real-time flight tests would help to prove its more robustness. In addition, we can operate unmanned aerial vehicles (UAVs) stacked the navigation algorithms in this study and a controller-in-the-loop. The use of the controller-in-the-loop could be a more important validation criteria due to the potential for navigation-controller coupling. The research objectives and contributions presented here will remarkably advance the state-of-the-art techniques of vision-aided inertial navigation for UAVs.

Table A1

Sensors of EuRoC Datasets.

Sensor	Rate	Characteristics
Stereo Images (Aptina MT9V034)	2 × 20 FPS	Global Shutter, WVGA Monochrome
MEMS IMU (ADIS16448)	200 Hz	Instrumentally Calibrated

2 in total

1. A Multi-Sensor Fusion MAV State Estimation from Long-Range Stereo, IMU, GPS and Barometric Sensors.

Authors: Yu Song; Stephen Nuske; Sebastian Scherer
Journal: Sensors (Basel) Date: 2016-12-22 Impact factor: 3.576

2. Radar and Visual Odometry Integrated System Aided Navigation for UAVS in GNSS Denied Environment.

Authors: Mostafa Mostafa; Shady Zahran; Adel Moussa; Naser El-Sheimy; Abu Sesay
Journal: Sensors (Basel) Date: 2018-08-23 Impact factor: 3.576

2 in total