Zewen Xu1,2, Zheng Rong1,2, Yihong Wu3,4.
Abstract
In recent years, simultaneous localization and mapping in dynamic environments (dynamic SLAM) has attracted significant attention from both academia and industry. Some pioneering work on this technique has expanded the potential of robotic applications. Compared to standard SLAM under the static world assumption, dynamic SLAM divides features into static and dynamic categories and leverages each type of feature properly. Therefore, dynamic SLAM can provide more robust localization for intelligent robots that operate in complex dynamic environments. Additionally, to meet the demands of some high-level tasks, dynamic SLAM can be integrated with multiple object tracking. This article presents a survey on dynamic SLAM from the perspective of feature choices. A discussion of the advantages and disadvantages of different visual features is provided in this article.
Keywords: Data association; Dynamic simultaneous localization and mapping; Feature choices; Multiple objects tracking; Object simultaneous localization and mapping
Year: 2021 PMID: 34269925 PMCID: PMC8285453 DOI: 10.1186/s42492-021-00086-w
Source DB: PubMed Journal: Vis Comput Ind Biomed Art ISSN: 2524-4442
Recent surveys related to dynamic SLAM
| Year | Topic | References |
|---|---|---|
| 2018 | Dynamic SLAM | [ |
| 2019 | Motion segmentation based on optical flow | [ |
| 2020 | Semantics-based V-SLAM | [ |
| 2020 | Deep learning for SLAM | [ |
| 2020 | Feature-based SLAM | [ |
Summary of recent robust SLAM systems
| References | Backbone | CT | Env | MS | HE | P/S | BI | OH | LC | HL |
|---|---|---|---|---|---|---|---|---|---|---|
| Low-level based SLAM (Robust SLAM section) | ||||||||||
| Point-based or pixel-patch-based SLAM | ||||||||||
| Yang et al. [ | ORB-SLAM2 [ | D | I | RE | – | – | – | – | – | – |
| Du et al. [ | ORB-SLAM2 | D | I | E + RE | – | √ | – | – | √ | – |
| Zhang et al. [ | – | D | I | OF + DI | – | √ | √ | – | – | – |
| Tan et al. [ | PTAM [ | M | I | RE | – | – | – | √ | – | – |
| Point-line-based SLAM | ||||||||||
| Zhang et al. [ | – | D | I | 3DE | – | √ | – | – | √ | √ |
| Using high-level features as semantic priors in low-level-feature-based SLAM (Using high-level features as semantic priors for low-level-feature-based SLAM section) | ||||||||||
| Point-based SLAM | ||||||||||
| Bescos et al. [ | ORB-SLAM2 | M, S, D | I, O | SI + DI | S [ | – | √ | √ | √ | – |
| Yu et al. [ | ORB-SLAM2 | D | I | SI + E | S [ | – | – | – | – | – |
| Cui and Ma [ | ORB-SLAM2 | D | I | SI + E | S [ | – | – | – | – | – |
| Han and Xi [ | ORB-SLAM2 | D | I | SI + OF | S [ | – | – | – | – | – |
| Long et al. [ | ORB-SLAM2 | D | I, O | SI + DI | S [ | – | √ | – | – | – |
| Ai et al. [ | ORB-SLAM2 | S, D | I, O | SI | O [ | √ | – | – | √ | – |
| Xiao et al. [ | ORB-SLAM2 | M | I, O | SI + RE | O [ | √ | – | – | √ | – |
| Brasch et al. [ | ORB-SLAM [ | M | O | SI + T | S [ | √ | – | – | √ | – |
| Point-line-based SLAM | ||||||||||
| Zhang et al. [ | – | D | I | SI + DI + E* | O [ | – | – | – | – | √ |
| Using high-level features in object SLAM (Using high-level features in object SLAM section) | ||||||||||
| Yang and Scherer [ | – | M | I, O | E | O [ | – | – | – | – | √ |
System properties: The backbone of the system (Backbone). Camera type (CT): RGB-D (D), monocular (M), stereo (S). Environment (Env): indoor (I), outdoor (O). Implementation details: Method of motion segmentation (MS): reprojection error (RE), epipolar constraint (E), distance between matched and predicted 3D landmarks (3DE), semantic information (SI), depth information (DI), optical flow (OF), triangulation (T). High-level feature extractor (HE): semantic segmentation network (S), object detection network (O). Practical consideration: Uses a probabilistic model or dynamic score (weight) to judge dynamic features (P/S). Long-term consistency (LC). Handles man-made scenes that are low-texture or have few static point features (HL). *The epipolar constraint is only used on point features.
Summary of recent SLAMMOT systems
| References | CT | Env | ON | OMT | MK | MMS | HD | HE | OM | HMD | SR | NP | PD | DR |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Low-level based SLAM (SLAMMOT section) | ||||||||||||||
| Point-based SLAM | ||||||||||||||
| Wang et al. [ | S | I | M | R | – | SSC | – | – | J | – | I | √ | – | √ |
| Judd et al. [ | S | I | M | R | – | MMF | – | – | J | – | I | √ | – | – |
| Using high-level features in low-level-feature-based SLAM (Using high-level features in point-based SLAM section) | ||||||||||||||
| Point-based SLAM | ||||||||||||||
| Nair et al. [ | M | O | M | R | C, O | SI | L | S [ | J | – | √ | – | – | – |
| Huang et al. [ | S | I, O | M | R | – | SI | L | O [ | S | √ | I | √ | √ | – |
| Bescos et al. [ | S, D | O | M | R | – | SI | L | S | J | √ | I | √ | – | – |
| Ballester et al. [ | D | O | M | R | – | SI | L | S [ | J | √ | I | √ | – | – |
| Zhang et al. [ | M, S, D | I, O | M | R | – | SI | L | S [ | J | √ | –¹ | √ | – | – |
| Using high-level features in object SLAM (Using high-level features in object SLAM section) | ||||||||||||||
| Yang and Scherer [ | M | I, O | M | R | – | SI | L | O [ | S | – | – | √ | – | – |
| Qiu et al. [ | M | I | S² | R | C³ | SI | NN [ | O [ | S | – | √ | √ | – | – |
| Strecke et al. [ | D | I | M | R | – | SI | L | S [ | | √ | I | √ | √ | √ |
System properties: Camera type (CT): RGB-D (D), monocular (M), stereo (S). Environment (Env): indoor (I), outdoor (O). Object number (ON): single (S), multiple (M). Object motion type (OMT): rigid (R), non-rigid (NR). Motion knowledge (MK): needs knowledge regarding object motion (O), needs knowledge regarding camera motion (C), needs no knowledge regarding motion (–). Implementation details: Multi-motion segmentation (MMS): sub-space clustering (SSC), multi-motion fitting (MMF), semantic information (SI). High-level data association for object SLAM (HD): low-level-feature-based method (L), neural-network-based method (NN). High-level feature extractor for object SLAM (HE): semantic segmentation network (S), object detection network (O). Optimization method (OM): joint optimization (J), separate optimization (S). Practical consideration: Handles missing data (e.g., due to occlusion, lost tracks, motion blur) (HMD). Solves the relative-scale problem (SR): irrelevant for the type of camera (I). No need for shape priors (NP). Probabilistic data association (PD). Dense reconstruction (DR). Notes: 1. Cannot solve the relative-scale problem of monocular cameras; 2. Can implement MOT using multi-region BA; 3. Camera motion information comes from the IMU.
Fig. 1 (a) Violation of geometric constraints for point features in dynamic environments: (1) the tracked feature lies too far from the epipolar line, (2) back-projected rays from the tracked features do not meet, (3) the fundamental matrix is estimated incorrectly when a dynamic feature is included in pose estimation, (4) the distance between re-projected and observed features is large [16]. (b) Violation of geometric constraints for line features in a dynamic environment: (1) the matched 3D line (green) lies too far from the predicted 3D line (blue)
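Constraint (1) in Fig. 1(a) can be illustrated with a short check. The following is a minimal sketch, not the method of any surveyed system: it assumes a fundamental matrix `F` is already available as a 3×3 nested list, and the function names and the 1-pixel threshold are hypothetical choices made for illustration.

```python
import math


def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (image 2) to the epipolar line of p1 (image 1).

    F is a 3x3 fundamental matrix given as nested lists (assumed known).
    """
    x1 = (p1[0], p1[1], 1.0)  # homogeneous coordinates of p1
    # Epipolar line l = F * x1, written as a*u + b*v + c = 0 in image 2.
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = math.hypot(a, b)
    if norm == 0.0:
        raise ValueError("degenerate epipolar line")
    return abs(a * p2[0] + b * p2[1] + c) / norm


def is_dynamic(F, p1, p2, threshold_px=1.0):
    """Flag a match as dynamic when it lies too far from its epipolar line.

    threshold_px is a hypothetical tuning parameter, not a value from the survey.
    """
    return epipolar_distance(F, p1, p2) > threshold_px


# Example: pure sideways camera translation with identity intrinsics,
# for which epipolar lines are horizontal (v2 == v1 for static points).
F_sideways = [[0.0, 0.0, 0.0],
              [0.0, 0.0, -1.0],
              [0.0, 1.0, 0.0]]
static_match = is_dynamic(F_sideways, (10.0, 20.0), (15.0, 20.0))
moving_match = is_dynamic(F_sideways, (10.0, 20.0), (15.0, 23.0))
```

A feature moving along its own epipolar line passes this test, which is one reason the systems in the summary table often combine the epipolar constraint with other cues.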
Fig. 2 Model for a dynamic camera and dynamic object. The camera observes the same dynamic car at timestamps k − 1 and k. Here, the black solid curves represent camera and object poses in the world frame. Red solid lines represent the position and the speed of the dynamic object in the world frame. Blue dashed lines represent 3D points in camera frames or the world frame
Fig. 3 Approaches to addressing dynamic feature reconstruction for a monocular camera. (a) Trajectory triangulation with a line assumption. When the number of views t is three, the solution is a ruled surface. Therefore, to obtain a unique result, t must be at least five. (b) Illustration of reconstruction using the ground plane
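The ground-plane idea in Fig. 3(b) can be sketched as a ray-plane intersection. This is only an illustration under stated assumptions: a pinhole camera with known intrinsics, a camera frame with the y-axis pointing down, a flat ground plane at a known height below the camera, and an observed feature that actually touches the ground. The function name and parameters are hypothetical.

```python
def backproject_to_ground(u, v, fx, fy, cx, cy, cam_height):
    """Intersect the viewing ray of pixel (u, v) with the ground plane.

    Assumes a camera frame with x right, y down, z forward, and a ground
    plane y = cam_height (metres) in that frame. Returns the 3D point in
    the camera frame, or None if the ray does not hit the ground.
    """
    # Viewing ray direction from the pinhole model (not normalised).
    d = ((u - cx) / fx, (v - cy) / fy, 1.0)
    if d[1] <= 0.0:
        return None  # ray is parallel to the ground or points above the horizon
    s = cam_height / d[1]  # scale at which the ray reaches y = cam_height
    return (s * d[0], s * d[1], s * d[2])


# Hypothetical intrinsics and camera height, chosen only for the example.
point = backproject_to_ground(320.0, 340.0, 500.0, 500.0, 320.0, 240.0, 1.5)
above_horizon = backproject_to_ground(320.0, 100.0, 500.0, 500.0, 320.0, 240.0, 1.5)
```

Because the plane fixes the depth of the ray, a single view is enough here, in contrast to the multi-view requirement of trajectory triangulation in Fig. 3(a).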
Root-mean-squared error of ATE improvement for robust SLAM compared to ORB-SLAM2 on TUM datasets
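| Category | Low-level SLAM | Low-level SLAM | Using high-level features in point-based SLAM | Using high-level features in point-based SLAM | Using high-level features in point-based SLAM | Using high-level features in point-based SLAM |
|---|---|---|---|---|---|---|
| Feature type | Point-based | Point-based | Point-based | Point-based | Point-based | Point-line-based |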
| Year | 2020 | 2020 | 2018 | 2018 | 2019 | 2019 |
| References | Yang et al. [ | Du et al. [ | Bescos et al. [ | Yu et al. [ | Cui and Ma [ | Zhang et al. [ |
| s_static | 23.2% | – | – | 25.9% | 13.0% | 24.1% |
| s_xyz | – | 18.2% | −66.7% | – | – | 3.1% |
| s_rpy | – | – | – | – | – | −15.8% |
| s_halfsphere | – | – | 15.0% | – | – | 58.6% |
| w_static | 98.2% | 94.9% | 93.3% | 97.9% | 98.5% | 98.3% |
| w_xyz | 97.5% | 95.6% | 96.9% | 96.7% | 97.5% | 97.7% |
| w_rpy | 95.8% | 93.8% | 94.7% | 48.7% | 97.2% | 76.4% |
| w_halfsphere | 95.4% | 92.7% | 92.9% | 93.76% | 95.0% | 96.7% |
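The percentages above can be read as relative reductions in ATE RMSE with respect to ORB-SLAM2. The sketch below assumes that reading, i.e. improvement = (baseline − method) / baseline; the survey record does not spell out the formula, and the error values used in the example are made up for illustration rather than taken from any of the cited papers.

```python
import math


def ate_rmse(errors):
    """Root-mean-squared error of per-frame absolute trajectory errors."""
    if not errors:
        raise ValueError("need at least one error value")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def improvement_percent(baseline_rmse, method_rmse):
    """Relative RMSE reduction versus a baseline, in percent.

    Positive values mean the method is more accurate than the baseline;
    negative values mean it is worse.
    """
    if baseline_rmse <= 0.0:
        raise ValueError("baseline RMSE must be positive")
    return 100.0 * (baseline_rmse - method_rmse) / baseline_rmse


# Hypothetical RMSE values in metres (not from the surveyed papers).
better = improvement_percent(0.50, 0.01)    # large gain on a dynamic sequence
worse = improvement_percent(0.009, 0.015)   # regression on a nearly static one
```

Under this reading, a negative entry such as those in the s_xyz and s_rpy rows indicates that the method was less accurate than ORB-SLAM2 on that sequence.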
Fig. 4 Two ways to perform multi-motion segmentation using semantic information: (a) assigning semantic labels with bounding boxes and (b) assigning semantic labels with pixel-wise semantic masks