Anca Morar, Alin Moldoveanu, Irina Mocanu, Florica Moldoveanu, Ion Emilian Radoi, Victor Asavei, Alexandru Gradinaru, Alex Butean.
Abstract
Computer vision based indoor localization methods use either an infrastructure of static cameras to track mobile entities (e.g., people, robots) or cameras attached to the mobile entities. Methods in the first category employ object tracking, while the others map images from mobile cameras with images acquired during a configuration stage or extracted from 3D reconstructed models of the space. This paper offers an overview of the computer vision based indoor localization domain, presenting application areas, commercial tools, existing benchmarks, and other reviews. It provides a survey of indoor localization research solutions, proposing a new classification based on the configuration stage (use of known environment data), sensing devices, type of detected elements, and localization method. It groups 70 of the most recent and relevant image based indoor localization methods according to the proposed classification and discusses their advantages and drawbacks. It highlights localization methods that also offer orientation information, as this is required by an increasing number of applications of indoor localization (e.g., augmented reality).
Keywords: 3D reconstruction; QR codes; computer vision; fiducial markers; indoor localization
Year: 2020 PMID: 32384605 PMCID: PMC7249029 DOI: 10.3390/s20092641
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Indoor localization with a mobile camera (left) or with static cameras (right).
Figure 2. Distribution of selected papers over time. The horizontal axis shows the publication years. The vertical axis shows the number of papers published per year.
Figure 3. Proposed classification based on environment data, sensing devices, detected elements, and localization method.
Classification of computer vision based localization research papers considering the environment data, the sensing devices, the detected elements, and the localization algorithm.
| Environment Data | Sensing Devices | Detected Elements | Localization Method | Research Papers |
|---|---|---|---|---|
| Marker/camera position | 2D static cameras | Artificial | Image analysis | [ |
| Marker/camera position | 2D static cameras | Real | Image analysis | [ |
| Marker/camera position | 2D static cameras | Real | AI | [ |
| Marker/camera position | 2D mobile cameras | Artificial | Image analysis | [ |
| Marker/camera position | 3D mobile cameras | Artificial | Image analysis | [ |
| Marker/camera position | 2D cameras, sensors | Artificial | Image analysis | [ |
| Image/feature database | 2D mobile cameras | Real | Image analysis | [ |
| Image/feature database | 2D mobile cameras | Real | AI | [ |
| Image/feature database | 3D mobile cameras | Real | AI | [ |
| Image/feature database | 2D cameras, sensors | Real | Image analysis | [ |
| Image/feature database | 2D cameras, sensors | Real | AI | [ |
| Image/feature database | 3D cameras, sensors | Real | Image analysis | [ |
| 3D model | 2D mobile cameras | Real | Image analysis | [ |
| 3D model | 2D mobile cameras | Real | AI | [ |
| 3D model | 3D mobile cameras | Real | Image analysis | [ |
| 3D model | 3D mobile cameras | Real | AI | [ |
| 3D model | 2D cameras, sensors | Real | Image analysis | [ |
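
The four classification axes in the table above can be read as a composite key under which papers are grouped. The following is a minimal sketch of that idea, not the authors' tooling: the class names, the `group_by_category` helper, and the paper identifiers are all hypothetical, and only the axis values are taken from the table.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class EnvironmentData(Enum):
    MARKER_OR_CAMERA_POSITION = "Marker/camera position"
    IMAGE_OR_FEATURE_DATABASE = "Image/feature database"
    MODEL_3D = "3D model"


class SensingDevices(Enum):
    STATIC_2D = "2D static cameras"
    MOBILE_2D = "2D mobile cameras"
    MOBILE_3D = "3D mobile cameras"
    CAMERAS_2D_AND_SENSORS = "2D cameras, sensors"
    CAMERAS_3D_AND_SENSORS = "3D cameras, sensors"


class DetectedElements(Enum):
    ARTIFICIAL = "Artificial"
    REAL = "Real"


class LocalizationMethod(Enum):
    IMAGE_ANALYSIS = "Image analysis"
    AI = "AI"


@dataclass(frozen=True)  # frozen makes instances hashable, so they can be dict keys
class Category:
    environment: EnvironmentData
    sensing: SensingDevices
    detected: DetectedElements
    method: LocalizationMethod


def group_by_category(papers):
    """Group (paper_id, Category) pairs into {Category: [paper_id, ...]}."""
    groups = defaultdict(list)
    for paper_id, category in papers:
        groups[category].append(paper_id)
    return dict(groups)


# Hypothetical usage: the identifiers below are placeholders, not real citations.
qr_mobile = Category(
    EnvironmentData.MARKER_OR_CAMERA_POSITION,
    SensingDevices.MOBILE_2D,
    DetectedElements.ARTIFICIAL,
    LocalizationMethod.IMAGE_ANALYSIS,
)
groups = group_by_category([("paper-A", qr_mobile), ("paper-B", qr_mobile)])
```

Each row of the classification table corresponds to one `Category` value; combinations absent from the table simply have no papers assigned.
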
Characteristics of indoor localization solutions with 2D static cameras (with known positions), markers, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: 1 static camera, covering 1.26 m × 1.67 m | avg. 0.2 s per frame on a server | err. between 0.0002 and 0.01 m (max. err.: 1 cm) |
| [ | own dataset: 3 rooms, each with 1 IP camera | real-time | observational |
Characteristics of indoor localization solutions with 2D static cameras (with known positions), real features, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | public datasets: PETS2009 [ | approximately 140 ms on Intel Core2Quad 2.66 GHz with 8 GB RAM | Multiple Object Tracking Accuracy (MOTA): 87.8% on the PETS2009 and 64.2% on the TUD-Stadtmitte |
| [ | own dataset: indoor space with 2.2 m × 6 m, images with | - | less than 7.1 cm |
| [ | own dataset: 12,690 frames acquired with 3 cameras; public dataset: PETS2001 | - | 95.7% hit rate and 96.5% precision |
| [ | own dataset: office with 5.1 m × 8.5 m × 2.7 m | - | mean err. of 0.37 m |
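
The Multiple Object Tracking Accuracy (MOTA) values in the first row above aggregate three kinds of tracking errors. As a rough sketch, assuming the standard CLEAR MOT definition (one minus the sum of misses, false positives, and identity switches divided by the number of ground-truth objects, all summed over frames), it can be computed as follows. The `FrameCounts` type and the example counts are illustrative assumptions, not data from the surveyed papers.

```python
from dataclasses import dataclass


@dataclass
class FrameCounts:
    misses: int            # ground-truth objects with no matched hypothesis
    false_positives: int   # hypotheses with no matched ground-truth object
    id_switches: int       # matched objects whose track identity changed
    ground_truth: int      # number of ground-truth objects in the frame


def mota(frames):
    """MOTA = 1 - (sum of errors over all frames) / (sum of ground-truth objects)."""
    total_gt = sum(f.ground_truth for f in frames)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(f.misses + f.false_positives + f.id_switches for f in frames)
    return 1.0 - errors / total_gt


# Illustrative counts only: 2 errors over 10 ground-truth objects.
example = [FrameCounts(1, 0, 0, 5), FrameCounts(0, 1, 0, 5)]
score = mota(example)
```

Because the error counts are not bounded by the number of ground-truth objects, MOTA can become negative for very poor trackers.
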
Characteristics of indoor localization solutions with 2D static cameras (with known positions), real features, and artificial intelligence.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | public datasets: PETS 2009 (City Center), EPFLterrace dataset | - | Ground Position Error (GPE) metric with total err. rate 0.122/0.131, Projected Position Error (PPE) metric with total err. rate 0.107/0.140 |
| [ | training: 1542 images (own) + 25,608 images from MS COCO; evaluation: 1400 images/robot type + 110/pattern | 50 Hz on a GPU and 10 Hz on a CPU | detection rate between 70% and 97.9%; orientation err. between 1.6 and 11.9 degrees |
| [ | own dataset: 47 employees, 18 rooms and 6 cubicles, 960 ceiling images | 2.8 s per image (offline computation) | 88.2% accuracy for identifying locations |
| [ | own dataset: over 2100 frames in 42 scenarios | 6.25 fps on Jetson TX2 | approximately 45 cm mean err. |
| [ | own dataset: office room with 1 camera and supermarket with 6 cameras | 5 fps on a server | detection success rate of 90% and avg. localization err. of 14.32 cm |
Characteristics of indoor localization solutions with 2D mobile cameras, markers with known positions, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: classroom with area 2.4 m × 1.8 m and 4 QR codes | Nexus 4 Google (fps not mentioned) | localization err. 6–8 cm, heading direction err. 1.2 degrees |
| [ | own dataset, simplified and complex scenarios | 47 ms for QR code extraction on a Raspberry Pi 2 | complex scenario: planar position err. 17.5 cm, 3D pose estimation self-localization err. 10.4 cm |
| [ | public dataset proposed by Mikolajczyk and Schmid [ | 0.11 s, 0.16 s, 0.27 s, 0.14 s; 1–2 iterations to reach the threshold similarity | threshold similarity 0.8 |
| [ | own dataset: hall with 6 QR codes and 2 possible trajectories (circular and 8-shape) | 10 Hz on Linux Ubuntu 12.04 OS running ROS framework | err. for circular path 0.2 m, err. for 8-shape path 0.14 m, orientation err. 0.267 radians |
| [ | own dataset acquired with an RGB camera fixed on an FLIR Pan Tilt Unit mounted on a mobile platform with an SICK s300 laser scanner (ground truth), markers placed on a wall | 0.07 s avg. processing time of a scene with 550 markers (up to 200 times faster than AprilTags) | avg. error of angle estimates: 0.02 rad for pitch/roll |
| [ | own dataset: space covered with 4 × 4 QR codes, placed 50 cm apart | - | the robot can travel more than 7 times on the same route |
| [ | own dataset, images of resolution 1280 × 720 | 18.1 ms on an image with a cluttered scene and a single marker, using a single core 3.70 GHz Intel Xeon | less than 0.6 degrees std. dev. for rotation, less than 0.4 cm std. dev. for translation |
| [ | own dataset in an academic building, four different paths, markers on the ceiling, guidance test with 10 blindfolded users | - | 0 missed detections, 2 false detections out of 40 tests |
Characteristics of indoor localization solutions with 3D mobile cameras, markers with known positions, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset | - | distance from the camera to the QR code: 1 cm err. |
| [ | own dataset | real time | maximum distance and angle from which the robot can see the QR code: 270 cm and 51° |
Characteristics of indoor localization solutions with 2D cameras + other sensors, markers with known positions, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset | less than 4 ms per frame; convergence time 18 s | less than 0.2 m for position and less than 0.1 orientation for EHF |
| [ | own dataset | computational load increases if dead reckoning is invoked with IMU sensors | visual (performance affected if dead reckoning is not used) |
| [ | own dataset: corridor with 100 m × 2.25 m and hall with 14 m × 6.5 m | - | accuracy is within 2 m 80% of the time |
Characteristics of indoor localization solutions with real image/feature databases, 2D mobile cameras, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own datasets: 862 frames, 3674 frames | single CPU/Kepler K 20 chip: 20 ms/1.13 ms for a database of 1000 frames, 160 ms/8.82 ms for 8000 frames | within 2 m in most cases |
| [ | own datasets: hallways 15 m long | - | estimated moving speed compared to ground truth: max absolute err. 0.0643 m/s, RMSE for speed 0.24–0.37 m/s, RMSE for distance 0.16–0.23 m |
| [ | own dataset: 1866 images, 40 key frames | - | - |
| [ | own dataset: over 90,000 annotated frames out of 60 videos from six corridors (approximately 3.5 km of data) | - | 4 m avg. absolute err. for HOG3D and 1.3 m for SF GABOR, over a 50 m traveling distance |
Characteristics of indoor localization solutions with real image/feature databases, 2D mobile cameras, and artificial intelligence.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: 1800 images from 30 different locations, 480 indoor videos of buildings (each lasting around 2–3 s), public dataset: Dubrovnik [ | 0.00092 s for image based localization and 0.0012 s for the video based method | 95.56%/94.44% accuracy for location/orientation with image based localization and 98% with the video based method |
| [ | own dataset: 302 training images, resolution 3024 × 4032 | object detection phase takes 0.3 s | location accuracy is within 1 m |
| [ | own dataset: 112,919 compound images (composed of 4 images taken by 4 Google Nexus phones) of resolution 224 × 224 | close to real-time | avg. median err. after a 20 step moving for compound images is 12.1 cm |
Characteristics of indoor localization solutions with real image/feature databases, 3D mobile cameras, and artificial intelligence.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | ICL-NUIM dataset [ | 296 ms to find the most similar frame and 277 ms to estimate the final pose on Intel Xeon E5-1650 v3 CPU 3.5 GHz, NVidia TITAN GPU | more than 80% of the images are localized within 2.5 degrees and more than 90% are localized within 0.3 m |
| [ | ICL-NUIM dataset [ | - | 0.51 m living room, 0.41 m office |
Characteristics of indoor localization solutions with real image/feature databases, 2D cameras + other sensors, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: 75 location images from 1.5–2 m distance | query time 40–230 ms | mean distance err. rate is 2.5/2.21 m for extended distance estimation method/hybrid approach |
| [ | own dataset | 90 ms FAST-SURF, 100–130 ms indoor positioning, 3–7 ms character detection, 30–45 ms tracking and registration on Honor 3C smartphone | - |
| [ | own dataset (offices and hallways) | 0.5 s per frame | 90% of location and orientation errors are within 25 cm and 2 degrees |
| [ | - | iPhone5s, iPhone X, LG Nexus 5X, Samsung Galaxy S7, S9, Huawei Mate tablet | best results, on Samsung S9: 4.5 deg. avg. rotation and 250 mm position err. from 1 m in front of the marker |
Characteristics of indoor localization solutions with real image/feature databases, 2D cameras + other sensors, and artificial intelligence.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset, 3 blind volunteers | 2 fps on a laptop | visual inspection |
| [ | own dataset, 5 people with different weight and height, walking at 3 different speeds, on two tracks (straight or zig-zag) | real time | 93% accuracy in case of normal speed |
| [ | own dataset: office floor 51 m × 20 m × 2.7 m and 7 WiFi routers | - | panoramic camera based method: mean err. for localization 0.84 m, cumulative probability within localization err. of 1 m/2 m is 70%/86% |
| [ | own dataset: room-level environment and open large environment | 4 s per image (0.8 s fingerprint location on server, 2.9 s image location on smartphone, 1 s data transmission) | less than 0.6 m avg. location err. and less than 6 degrees avg. direction err., 90% location deviations are less than 1 m |
| [ | own dataset: 50 m | real time on Intel5300 NIC laptop with 3 antennas as signal receiver and Ubuntu server with Intel Xeon e5-2609 CPU, GeForce GTX TITAN X GPU and 256 GB RAM | avg. position err. is 0.98/1.46 m for line-of-sight/non-line-of-sight |
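
Several rows above report a mean localization error together with the cumulative probability of the error staying within 1 m or 2 m. A minimal sketch of how such summary figures can be derived from per-sample errors is given below. The function name `error_summary` and the sample values are assumptions made for illustration and do not reproduce any surveyed result.

```python
def error_summary(errors_m, thresholds_m=(1.0, 2.0)):
    """Return (mean error, {threshold: fraction of samples with error <= threshold})."""
    if not errors_m:
        raise ValueError("at least one error sample is required")
    n = len(errors_m)
    mean_err = sum(errors_m) / n
    # Empirical cumulative probability at each threshold.
    within = {t: sum(1 for e in errors_m if e <= t) / n for t in thresholds_m}
    return mean_err, within


# Illustrative per-sample localization errors in metres (made up).
mean_err, within = error_summary([0.5, 0.5, 1.5, 2.5])
```

Reporting both numbers is useful because a small mean can hide a long tail of large errors, which the cumulative fractions make visible.
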
Characteristics of indoor localization solutions with real image/feature databases, 3D mobile cameras + other sensors, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: small number of experiments and short tested trajectories | real time | avg. err. in the X-axis direction is 0.06 m with IMU |
| [ | own dataset | real time | translation error: 0.1043 m, rotation err.: 6.6571 degrees for static environments; translation err.: 0.0431 m ± 0.0080 m, rotation err.: 2.3239 degrees ± 0.4241 degrees for dynamic environments |
Characteristics of indoor localization solutions with an existing/generated 3D model of the environment, 2D mobile cameras, real features, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: simple experiment with obstacles along a route | real time on a single CPU | visual inspection |
| [ | own dataset: images of resolution | 3 fps for SURF on 2.20 GHz dual-core computer | 90% detection success rate and 14.32 cm avg. localization err. |
| [ | 3 own datasets: images of resolution | avg. search time for Dataset 1 (176 frames) is 10 ms and for Dataset 4 (285 frames) is 28 ms | 80–100% accuracy, depending on dataset; 0.173–0.232 m localization error |
| [ | own dataset: reconstruction with RGBD-SLAM, Dataset 1 (50 frames, 139 mp), Dataset 2 (33 frames, 37 mp) | avg. localization time is 0.72 s per frame on an Intel Core i7 with 8 GB RAM | avg. localization error is less than 10 mm: translation err. for Dataset 1 is 0.9–35 mm and for Dataset 2 is 0.3–17 mm |
| [ | own dataset: office environment, with 154 m long route | images captured at 0.8 Hz using a smartphone camera, computing time not mentioned | 1.5 degrees mean accuracy for visual gyroscope, 0.3 m/s mean accuracy for visual odometer, 1.8 m localization err. |
| [ | own dataset: synthetic scenes (20 m × 20 m scene with 88 lines and 160 points, 794 generated images); public dataset: Bicocca_2009 [ | 0.5–1 s for MATLAB version, 25.8 ms avg. running time for C++ version | 0.79 accuracy err. in position on a 967 m path; 0.2 m accuracy err. for synthetic scenes |
| [ | own dataset: Guangdong Key Laboratory, Shantou University | - | only visual inspection, in comparison with the RGB-D SLAM method [ |
| [ | own dataset: indoor environment; public dataset: Karlsruhe outdoor datasets [ | real time on an i7 processor | 94–98% distance and depth measurements accuracy; absolute err. of 5.72/9.63 m for 2 outdoor datasets and 4.07/1.35 cm for 2 indoor datasets |
| [ | own datasets: office building (400 m | approximately 0.1 s for relocalization and navigation on Huawei P10, Nexus 6, Nexus 7, Lenovo Phab2 pro | 98.6% immediate NSR, 93.1% NSR after 1 week, 83.4% NSR after 2 weeks |
Characteristics of indoor localization solutions with an existing/generated 3D model of the environment, 2D mobile cameras, real features, and artificial intelligence.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | public dataset: TUM Dynamic Object dataset (RGB images, depth information, ground truth trajectory) | 5 fps for Mask-RCNN on NVidia Tesla M40 GPU [ | RMSE between 0.006134 and 0.036156 |
| [ | own dataset: 370 m from route; public datasets: TUM dynamic dataset, KITTI dataset (outdoor large scenarios) | real time | the trajectory RMSE err. is 2.29 m, the accuracy is 7.48–62.33% higher than ORB-SLAM2 [ |
Characteristics of indoor localization solutions with an existing/generated 3D model of the environment, 3D mobile cameras, real features, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset: a room, check if the virtual objects have the same size as the real ones | 3–4 fps for map building on a laptop with i7-720qm CPU | difference in dimensions between 3D reconstructed and real objects: dm/cm level accuracy |
| [ | own dataset: barren office hallway; public dataset: TUM dataset [ | - | 0.1–1.5 m translation RPE, 2–18 degrees rotational RPE, 0.02–1.1 m ATE |
| [ | own dataset: room of size 15 × 10 × 3 m | 20 fps on a gaming laptop | visual inspection, checking loop closure |
| [ | own dataset: a path of 70 m through a building | 25 Hz | visual inspection |
| [ | own datasets: taken with a handheld structure sensor; public datasets: Freiburg Benchmark, TUM dataset [ | 5 fps | 0.011–0.062 RMSE of ATE for public datasets; 1.4–4.1 cm closing distance and 1.08–3.32 degrees closure angle for own datasets |
| [ | own dataset collected with 2 wheeled robots (RB-1 and Kobuki) with Asus Xtion RGB-D sensors | less than 30 s | tracking mode (robot starts from a known position): x-mean: 0.082 m, y-mean 0.078 m; global mode (robot starts from unknown position): x-mean 0.27 m, y-mean 0.43 m |
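
Several rows above report the RMSE of the Absolute Trajectory Error (ATE). As a hedged sketch, the function below computes the root mean square of the Euclidean distances between corresponding estimated and ground-truth positions. It assumes the two trajectories are already time-associated and expressed in the same reference frame; benchmark tools typically perform a rigid alignment first, which is omitted here. The function name and the sample trajectories are illustrative.

```python
import math


def ate_rmse(estimated, ground_truth):
    """RMSE of translational differences between paired 3D positions.

    Both arguments are equal-length sequences of (x, y, z) tuples that are
    assumed to be time-associated and pre-aligned.
    """
    if len(estimated) != len(ground_truth) or not estimated:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in zip(estimated, ground_truth)
    ]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative trajectories (made up): one exact pose, one pose 5 m off.
est = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
gt = [(0.0, 0.0, 0.0), (1.0, 3.0, 4.0)]
rmse = ate_rmse(est, gt)
```

The Relative Pose Error (RPE) also listed in the table measures drift over fixed intervals rather than global consistency, so the two metrics are not interchangeable.
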
Characteristics of indoor localization solutions with an existing/generated 3D model of the environment, 3D mobile cameras, real features, and artificial intelligence.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | public dataset: TUM dataset [ | 37.753 ms per frame on Intel i5 2.0 GHz CPU with 3 GB RAM | 0.015 and 0.103 RMSE for the error size of the posture, better than ORB-SLAM [ |
| [ | public dataset: ICL-NUIM [ | 119.0 ms (average) on a desktop PC running Ubuntu 12.04 with an Intel Core i7-2600 CPU at 3.40 GHz and 8 GB RAM | Absolute Trajectory Error (ATE): 1 cm–5 cm, mostly competing with ORB-SLAM2 [ |
Characteristics of indoor localization solutions with an existing/generated 3D model of the environment, 2D mobile cameras + other sensors, real features, and traditional image analysis.
| Research Paper | Dataset Characteristics | Computing Time and Platform | Accuracy |
|---|---|---|---|
| [ | own dataset | - | visual evaluation |
| [ | own dataset: robot moves along a specific path in a lab room, QVGA resolution | 200 ms for basic image processing on LG P970, whole pipeline processed offline (manual extraction of features) | position err. converges from 35–50 cm to less than 3 m |
| [ | own dataset: 120 m indoor hallway with 5200 video frames of size | from 3.2 fps to 23.3 fps on commodity laptop (2.6 GHz quad-core CPU and 4 GB RAM) | 0.17 m position err. |
| [ | own dataset | - | visual inspection |
| [ | own dataset | map building and fusion process: real time on Intel Core i7-8550U CPU | relative error of ORB-SLAM2 [ |
| [ | own dataset and public dataset: EuRoC dataset [ | real time on an embedded board (1.92 GHz processor and 2 GB DDR3L RAM) | 0.01–0.15 m position err. for own dataset; 0.234 m max. err. for EuRoC dataset |
| [ | own dataset: Vienna airport, path of 200 m | 23 s for the proposed method to complete a guiding task | visual inspection |